I'm new to Flume. I have a large CSV text file containing records, each about 50 characters long, with the lines terminated by CR-LF. I want to use Flume to ingest this data into HDFS. The result is that only one line of the file gets written to HDFS (if it helps as a clue, it is the second line of the file).
I don't see any errors in the output. Thanks. Details below.
Here is my run command:
flume-ng agent --conf conf --conf-file example.conf --name a1 -Dflume.root.logger=INFO,console
And my configuration:
a1.sources = r1
a1.sinks = k1 k2
a1.channels = c1
a1.sources.r1.type = exec
a1.sources.r1.shell = /bin/bash -c
a1.sources.r1.command = cat /Users/david/flume/testdata.txt
a1.sources.r1.interceptors = a
a1.sources.r1.interceptors.a.type = org.apache.flume.interceptor.TimestampInterceptor$Builder
# Describe the sinks
a1.sinks.k1.type = logger
a1.sinks.k2.type = hdfs
a1.sinks.k2.channel = c1
a1.sinks.k2.hdfs.fileType = DataStream
a1.sinks.k2.hdfs.writeFormat = Text
# a1.sinks.k2.hdfs.path = /flume
a1.sinks.k2.hdfs.path = /flume/trades/%y-%m-%d/%H%M/%S
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
And the output:
13/07/03 22:15:34 INFO lifecycle.LifecycleSupervisor: Starting lifecycle supervisor 1
13/07/03 22:15:34 INFO node.FlumeNode: Flume node starting - a1
13/07/03 22:15:34 INFO nodemanager.DefaultLogicalNodeManager: Node manager starting
13/07/03 22:15:34 INFO lifecycle.LifecycleSupervisor: Starting lifecycle supervisor 9
13/07/03 22:15:34 INFO properties.PropertiesFileConfigurationProvider: Configuration provider starting
13/07/03 22:15:34 INFO properties.PropertiesFileConfigurationProvider: Reloading configuration file:example.conf
13/07/03 22:15:34 INFO conf.FlumeConfiguration: Processing:k2
13/07/03 22:15:34 INFO conf.FlumeConfiguration: Processing:k1
13/07/03 22:15:34 INFO conf.FlumeConfiguration: Processing:k1
13/07/03 22:15:34 INFO conf.FlumeConfiguration: Processing:k2
13/07/03 22:15:34 INFO conf.FlumeConfiguration: Processing:k2
13/07/03 22:15:34 INFO conf.FlumeConfiguration: Processing:k2
13/07/03 22:15:34 INFO conf.FlumeConfiguration: Added sinks: k1 k2 Agent: a1
13/07/03 22:15:34 INFO conf.FlumeConfiguration: Processing:k2
13/07/03 22:15:34 INFO conf.FlumeConfiguration: Post-validation flume configuration contains configuration for agents: [a1]
13/07/03 22:15:34 INFO properties.PropertiesFileConfigurationProvider: Creating channels
13/07/03 22:15:34 INFO instrumentation.MonitoredCounterGroup: Monitoried counter group for type: CHANNEL, name: c1, registered successfully.
13/07/03 22:15:34 INFO properties.PropertiesFileConfigurationProvider: created channel c1
13/07/03 22:15:34 INFO sink.DefaultSinkFactory: Creating instance of sink: k1, type: logger
13/07/03 22:15:34 INFO sink.DefaultSinkFactory: Creating instance of sink: k2, type: hdfs
2013-07-03 22:15:34.777 java[5903:5703] Unable to load realm info from SCDynamicStore
13/07/03 22:15:34 INFO hdfs.HDFSEventSink: Hadoop Security enabled: false
13/07/03 22:15:34 INFO instrumentation.MonitoredCounterGroup: Monitoried counter group for type: SINK, name: k2, registered successfully.
13/07/03 22:15:34 INFO nodemanager.DefaultLogicalNodeManager: Starting new configuration:{ sourceRunners:{r1=EventDrivenSourceRunner: { source:org.apache.flume.source.ExecSource{name:r1,state:IDLE} }} sinkRunners:{k1=SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor@23dae5f1 counterGroup:{ name:null counters:{} } }, k2=SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor@782e439a counterGroup:{ name:null counters:{} } }} channels:{c1=org.apache.flume.channel.MemoryChannel{name: c1}} }
13/07/03 22:15:34 INFO nodemanager.DefaultLogicalNodeManager: Starting Channel c1
13/07/03 22:15:34 INFO instrumentation.MonitoredCounterGroup: Component type: CHANNEL, name: c1 started
13/07/03 22:15:34 INFO nodemanager.DefaultLogicalNodeManager: Starting Sink k1
13/07/03 22:15:34 INFO nodemanager.DefaultLogicalNodeManager: Starting Sink k2
13/07/03 22:15:34 INFO instrumentation.MonitoredCounterGroup: Component type: SINK, name: k2 started
13/07/03 22:15:34 INFO nodemanager.DefaultLogicalNodeManager: Starting Source r1
13/07/03 22:15:34 INFO source.ExecSource: Exec source starting with command:cat /Users/david/flume/testdata.txt
13/07/03 22:15:34 INFO source.ExecSource: Command [cat /Users/david/flume/testdata.txt] exited with 0
13/07/03 22:15:34 INFO sink.LoggerSink: Event: { headers:{timestamp=1372904134964} body: 30 30 32 37 38 35 41 39 2C 30 38 2F 30 35 2F 32 002785A9,08/05/2 }
13/07/03 22:15:34 INFO sink.LoggerSink: Event: { headers:{timestamp=1372904134964} body: 30 30 32 37 38 35 41 39 2C 30 38 2F 30 34 2F 32 002785A9,08/04/2 }
13/07/03 22:15:34 INFO sink.LoggerSink: Event: { headers:{timestamp=1372904134964} body: 30 30 32 37 38 35 41 39 2C 31 30 2F 30 37 2F 32 002785A9,10/07/2 }
13/07/03 22:15:34 INFO sink.LoggerSink: Event: { headers:{timestamp=1372904134964} body: 30 30 32 37 38 35 41 39 2C 31 30 2F 30 34 2F 32 002785A9,10/04/2 }
13/07/03 22:15:34 INFO sink.LoggerSink: Event: { headers:{timestamp=1372904134964} body: 30 30 32 37 38 35 41 39 2C 31 30 2F 30 35 2F 32 002785A9,10/05/2 }
13/07/03 22:15:34 INFO sink.LoggerSink: Event: { headers:{timestamp=1372904134964} body: 30 30 32 37 38 35 41 39 2C 31 30 2F 30 36 2F 32 002785A9,10/06/2 }
13/07/03 22:15:34 INFO sink.LoggerSink: Event: { headers:{timestamp=1372904134964} body: 30 30 32 37 37 38 38 36 2C 31 30 2F 30 34 2F 32 00277886,10/04/2 }
13/07/03 22:15:34 INFO sink.LoggerSink: Event: { headers:{timestamp=1372904134964} body: 30 30 30 30 35 44 42 33 2C 30 39 2F 31 39 2F 32 00005DB3,09/19/2 }
13/07/03 22:15:35 INFO hdfs.BucketWriter: Creating /flume/trades/13-07-03/2215/34/FlumeData.1372904134974.tmp
13/07/03 22:16:05 INFO hdfs.BucketWriter: Renaming /flume/trades/13-07-03/2215/34/FlumeData.1372904134974.tmp to /flume/trades/13-07-03/2215/34/FlumeData.1372904134974
If you want to get streaming data from a cat'ed file, the file has to be located in the flume-ng --
I'm having the same problem. I suspect that once the source fills up the channel
I figured it out. It turns out I needed to be more specific in my configuration file; the difference is mainly the HDFS sink's batch and roll settings. It now looks like this:
# Example command to run:
# flume-ng agent --conf conf --conf-file example.conf --name a1 -Dflume.root.logger=INFO,console
# Name the components on this agent
a1.sources = r1
a1.sinks = k2
a1.channels = c1
# Describe/configure the source
# a1.sources.r1.type = netcat
# a1.sources.r1.bind = localhost
# a1.sources.r1.port = 44444
a1.sources.r1.type = exec
a1.sources.r1.shell = /bin/bash -c
a1.sources.r1.command = cat /Users/david/flume/testdata.txt
a1.sources.r1.interceptors = a
a1.sources.r1.interceptors.a.type = org.apache.flume.interceptor.TimestampInterceptor$Builder
# Describe the sinks
a1.sinks.k1.type = logger
a1.sinks.k2.type = hdfs
a1.sinks.k2.channel = c1
a1.sinks.k2.hdfs.fileType = DataStream
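# Batch/roll settings (these lines were missing from my original configuration above;
# the values are just what worked for my test file, so tune them for your own data volumes)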
a1.sinks.k2.hdfs.batchSize = 2000
a1.sinks.k2.hdfs.rollCount = 5000
a1.sinks.k2.hdfs.rollSize = 1000000
a1.sinks.k2.hdfs.rollInterval = 10
a1.sinks.k2.hdfs.writeFormat = Text
# a1.sinks.k2.hdfs.path = /flume
a1.sinks.k2.hdfs.path = /flume/trades/%y-%m-%d/%H%M/%S
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
a1.sinks.k2.channel = c1
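As a quick sanity check (just a sketch; the paths below assume the same /flume/trades/%y-%m-%d/%H%M/%S layout configured above and my local testdata.txt), I compare the record count that landed in HDFS against the source file:
hdfs dfs -ls -R /flume/trades
hdfs dfs -cat /flume/trades/*/*/*/FlumeData.* | wc -l
wc -l /Users/david/flume/testdata.txt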