Question:

Collector in Flink. What does it do?

南宫正阳

2023-03-14

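I'm learning Flink, and one thing that confuses me is the use of an object called Collector, for example in the flatMap function. What does this Collector, with its methods, actually collect? And why does the map function, for example, not need to use it explicitly to pass on its results?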

Some examples of using Collector in a flatMap function can be seen here: https://www.programcreek.com/scala/org.apache.flink.util.Collector

Also, when I search for where Collector fits in Flink's architecture, I can't find any diagram that shows it.

2 answers

阎令

2023-03-14

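As you know, if you want one element of a data stream to produce N outputs, you can use a Collector to emit the output data in a flatMap. A map, by contrast, usually produces its output one-to-one, so it doesn't need one. In a sense, Collector is used widely inside Flink itself. You can look at org.apache.flink.streaming.api.operators.Output (which extends Collector) and org.apache.flink.runtime.operators.shipping.OutputCollector, which is typically used to collect records and send them on to the writers, and so on: collect is called whenever data needs to be written out.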
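To make the idea of "collect hands records to whatever sits downstream" concrete, here is a minimal sketch in Scala. It does not use Flink: `Collector` below is a simplified stand-in that mirrors only the two methods of `org.apache.flink.util.Collector` (`collect` and `close`), and `BufferWriter` and `FanOutCollector` are hypothetical names. The fan-out collector gives every record to each writer, loosely like `OutputCollector` passing records on to its writers.

```scala
import scala.collection.mutable.ListBuffer

// Simplified stand-in for org.apache.flink.util.Collector
// (assumption: only its collect and close methods are mirrored).
trait Collector[T] {
  def collect(record: T): Unit
  def close(): Unit
}

// Hypothetical "writer" that just buffers what it receives,
// standing in for a channel to a downstream task.
final class BufferWriter[T] {
  val records = ListBuffer.empty[T]
  def write(record: T): Unit = records += record
}

// Hypothetical fan-out collector: every collected record is handed to
// each writer, loosely like OutputCollector forwarding to its writers.
final class FanOutCollector[T](writers: Seq[BufferWriter[T]]) extends Collector[T] {
  def collect(record: T): Unit = writers.foreach(_.write(record))
  def close(): Unit = ()
}

val w1 = new BufferWriter[String]
val w2 = new BufferWriter[String]
val out = new FanOutCollector(Seq(w1, w2))

// The user function only sees a Collector; it does not know where records go.
def splitWords(line: String, out: Collector[String]): Unit =
  line.split(" ").foreach(out.collect)

splitWords("to be or", out)
```

The user function `splitWords` only ever calls `out.collect`; where the records end up is decided by whoever supplies the Collector.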

Example (not necessarily exact):

The Scala source of flatMap has three definitions. Let's look at the first one.

  /**
   * Creates a new DataStream by applying the given function to every element and flattening
   * the results.
   */
  def flatMap[R: TypeInformation](fun: (T, Collector[R]) => Unit): DataStream[R] = {
    if (fun == null) {
      throw new NullPointerException("FlatMap function must not be null.")
    }
    val cleanFun = clean(fun)
    val flatMapper = new FlatMapFunction[T, R] {
      def flatMap(in: T, out: Collector[R]) { cleanFun(in, out) }
    }
    flatMap(flatMapper)
  }

An example of using this method:

text.flatMap((input: String, out: Collector[String]) => {
  input.split(" ").foreach(out.collect)
})

With this approach, we have to emit the data manually through the Collector.
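The following sketch shows why this works, again using a simplified stand-in for Flink's `Collector` rather than Flink itself. A hypothetical `runFlatMap` helper plays the role of the runtime: it calls the user function once per input element and passes the same collector each time, so each call may emit zero, one, or many records. `ListCollector` is likewise a hypothetical helper, not a Flink class.

```scala
import scala.collection.mutable.ListBuffer

// Minimal stand-in for org.apache.flink.util.Collector (assumption: only collect/close).
trait Collector[T] {
  def collect(record: T): Unit
  def close(): Unit
}

// Hypothetical helper that buffers everything it is given.
final class ListCollector[T] extends Collector[T] {
  val buffer = ListBuffer.empty[T]
  def collect(record: T): Unit = buffer += record
  def close(): Unit = ()
}

// Toy "runtime": calls the user function once per input element and hands it
// the same Collector, then returns everything that was collected.
def runFlatMap[T, R](input: Seq[T])(fun: (T, Collector[R]) => Unit): List[R] = {
  val out = new ListCollector[R]
  input.foreach(in => fun(in, out))
  out.close()
  out.buffer.toList
}

val words = runFlatMap[String, String](Seq("hello flink", "", "collector")) {
  (input: String, out: Collector[String]) =>
    // As in the example above, but empty tokens are skipped,
    // so the empty line "" emits nothing.
    input.split(" ").filter(_.nonEmpty).foreach(out.collect)
}
```

Here `words` is `List("hello", "flink", "collector")`: two records from the first line, none from the empty line, and one from the last.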

Now let's look at the second definition in the source:

  /**
   * Creates a new DataStream by applying the given function to every element and flattening
   * the results.
   */
  def flatMap[R: TypeInformation](fun: T => TraversableOnce[R]): DataStream[R] = {
    if (fun == null) {
      throw new NullPointerException("FlatMap function must not be null.")
    }
    val cleanFun = clean(fun)
    val flatMapper = new FlatMapFunction[T, R] {
      def flatMap(in: T, out: Collector[R]) { cleanFun(in) foreach out.collect }
    }
    flatMap(flatMapper)
  }

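Here we don't use a Collector to collect the output; instead we return a collection directly, and Flink flattens it for us. Using TraversableOnce also means we must always return a collection, even if it is empty; otherwise the function won't match the required signature.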
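The adaptation done by the second definition (`cleanFun(in) foreach out.collect`) can be sketched on its own. This is an illustration, not Flink code: `toCollectorStyle` and `ListCollector` are hypothetical helpers, `Collector` is a simplified stand-in, and `Iterable` is used in place of `TraversableOnce` (an assumption made so the sketch compiles on both older and newer Scala versions).

```scala
import scala.collection.mutable.ListBuffer

// Simplified stand-in for org.apache.flink.util.Collector.
trait Collector[T] { def collect(record: T): Unit; def close(): Unit }

final class ListCollector[T] extends Collector[T] {
  val buffer = ListBuffer.empty[T]
  def collect(record: T): Unit = buffer += record
  def close(): Unit = ()
}

// Turns a "return a collection" function into the Collector style by
// emitting each element of the returned collection through out.collect.
def toCollectorStyle[T, R](fun: T => Iterable[R]): (T, Collector[R]) => Unit =
  (in: T, out: Collector[R]) => fun(in).foreach(out.collect)

// Same logic as the example below: long lines are split, short ones yield an empty Seq.
val longLinesOnly: String => Iterable[String] =
  input => if (input.length > 15) input.split(" ").toSeq else Seq.empty[String]

val adapted = toCollectorStyle[String, String](longLinesOnly)
val out = new ListCollector[String]
Seq("short line", "this line is long enough").foreach(line => adapted(line, out))
```

The empty `Seq` returned for `"short line"` simply results in no `collect` calls, which is how "return an empty list" and "emit nothing" end up meaning the same thing.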

text.flatMap((input: String) => {
  // Always return a collection (possibly empty); Flink flattens it
  if (input.length > 15) {
    input.split(" ").toSeq
  } else {
    Seq.empty[String]
  }
})

You can find many similar places: wherever data records are being emitted, you will almost always see a Collector.
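One such place is operator chaining, where the output Collector of one operator feeds records straight into the next operator. The sketch below is a simplified illustration of that idea, not Flink's implementation: `ChainedCollector` and `ListCollector` are hypothetical classes, and `Collector` is again a stand-in mirroring only `collect` and `close`.

```scala
import scala.collection.mutable.ListBuffer

// Simplified stand-in for org.apache.flink.util.Collector.
trait Collector[T] { def collect(record: T): Unit; def close(): Unit }

// Hypothetical: a Collector that runs the next flatMap-style function on every
// record it receives, passing that function the downstream Collector.
final class ChainedCollector[A, B](next: (A, Collector[B]) => Unit, downstream: Collector[B])
    extends Collector[A] {
  def collect(record: A): Unit = next(record, downstream)
  def close(): Unit = downstream.close()
}

// Final sink that buffers records and remembers whether it was closed.
final class ListCollector[T] extends Collector[T] {
  val buffer = ListBuffer.empty[T]
  var closed = false
  def collect(record: T): Unit = buffer += record
  def close(): Unit = closed = true
}

val sink = new ListCollector[Int]
// Stage 2: emit the length of each word into the sink.
val lengths = new ChainedCollector[String, Int]((w, out) => out.collect(w.length), sink)
// Stage 1: split lines into words; the Collector it writes to is stage 2.
val split: (String, Collector[String]) => Unit =
  (line, out) => line.split(" ").foreach(out.collect)

Seq("a bb", "ccc").foreach(line => split(line, lengths))
lengths.close()
```

Stage 1 never learns that its Collector is itself running stage 2; each stage only calls `collect`, and the sink ends up with `1, 2, 3`.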

段干博涉

2023-03-14

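Flink passes a Collector to any user function that may emit an arbitrary number of stream elements. A map function doesn't use a Collector because it performs a one-to-one transformation: the map function's return value is the output. A flatMap, on the other hand, can emit zero, one, or many stream elements for each event, and the Collector is a convenient way to accommodate that.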
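The one-to-one versus zero-or-more distinction can be illustrated by writing map and filter in the flatMap style. This is a sketch for intuition only (Flink's `MapFunction` simply returns a value, as described above); `mapAsFlatMap`, `filterAsFlatMap`, `run`, `ListCollector` and the `Collector` stand-in are all hypothetical.

```scala
import scala.collection.mutable.ListBuffer

// Simplified stand-in for org.apache.flink.util.Collector.
trait Collector[T] { def collect(record: T): Unit; def close(): Unit }

final class ListCollector[T] extends Collector[T] {
  val buffer = ListBuffer.empty[T]
  def collect(record: T): Unit = buffer += record
  def close(): Unit = ()
}

// A map is a flatMap that calls collect exactly once per input...
def mapAsFlatMap[T, R](f: T => R): (T, Collector[R]) => Unit =
  (in: T, out: Collector[R]) => out.collect(f(in))

// ...and a filter is one that calls collect zero or one times.
def filterAsFlatMap[T](p: T => Boolean): (T, Collector[T]) => Unit =
  (in: T, out: Collector[T]) => if (p(in)) out.collect(in)

// Runs a flatMap-style function over a sequence and returns what was collected.
def run[T, R](input: Seq[T], fun: (T, Collector[R]) => Unit): List[R] = {
  val out = new ListCollector[R]
  input.foreach(in => fun(in, out))
  out.buffer.toList
}

val doubled = run[Int, Int](Seq(1, 2, 3), mapAsFlatMap[Int, Int](_ * 2))
val evens = run[Int, Int](Seq(1, 2, 3, 4), filterAsFlatMap[Int](_ % 2 == 0))
```

Because a map always calls `collect` exactly once, its return value carries the same information, which is why the map API can do without a Collector.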
