问题：

水槽和HDFS集成，HDFS IO错误

公西志文

2023-03-14

我试图将FLUME与HDFS集成，我的FLUME配置文件是

hdfs-agent.sources= netcat-collect
hdfs-agent.sinks = hdfs-write
hdfs-agent.channels= memoryChannel

hdfs-agent.sources.netcat-collect.type = netcat
hdfs-agent.sources.netcat-collect.bind = localhost
hdfs-agent.sources.netcat-collect.port = 11111

hdfs-agent.sinks.hdfs-write.type = FILE_ROLL
hdfs-agent.sinks.hdfs-write.hdfs.path = hdfs://127.0.0.1:50020/user/oracle/flume
hdfs-agent.sinks.hdfs-write.rollInterval = 30
hdfs-agent.sinks.hdfs-write.hdfs.writeFormat=Text
hdfs-agent.sinks.hdfs-write.hdfs.fileType=DataStream

hdfs-agent.channels.memoryChannel.type = memory
hdfs-agent.channels.memoryChannel.capacity=10000
hdfs-agent.sources.netcat-collect.channels=memoryChannel
hdfs-agent.sinks.hdfs-write.channel=memoryChannel.

我的核心站点文件是

<configuration>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://localhost</value>
    </property>
</configuration>

当我尝试运行flume代理时，它正在启动，并且能够从nc命令中读取，但是在写入hdfs时，我得到了下面的异常。我尝试使用< code > Hadoop DFS admin-safe mode leave 在安全模式下启动，但仍然出现以下异常。

2014-02-14 10:31:53,785 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.hdfs.BucketWriter.open(BucketWriter.java:219)] Creating hdfs://127.0.0.1:50020/user/oracle/flume/FlumeData.1392354113707.tmp
2014-02-14 10:31:54,011 (SinkRunner-PollingRunner-DefaultSinkProcessor) [WARN - org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:418)] HDFS IO error
java.io.IOException: Call to /127.0.0.1:50020 failed on local exception: java.io.EOFException
        at org.apache.hadoop.ipc.Client.wrapException(Client.java:1089)
        at org.apache.hadoop.ipc.Client.call(Client.java:1057)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:226)
        at $Proxy5.getProtocolVersion(Unknown Source)
        at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:369)
        at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:111)
        at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:213)
        at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:180)
        at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:89)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1489)
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
        at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:1523)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1505)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:227)
        at org.apache.hadoop.fs.Path.getFileSystem(Path.java:175)
        at org.apache.flume.sink.hdfs.BucketWriter$1.call(BucketWriter.java:226)
        at org.apache.flume.sink.hdfs.BucketWriter$1.call(BucketWriter.java:220)
        at org.apache.flume.sink.hdfs.BucketWriter$8$1.run(BucketWriter.java:536)
        at org.apache.flume.sink.hdfs.BucketWriter.runPrivileged(BucketWriter.java:160)
        at org.apache.flume.sink.hdfs.BucketWriter.access$1000(BucketWriter.java:56)
        at org.apache.flume.sink.hdfs.BucketWriter$8.call(BucketWriter.java:533)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.EOFException
        at java.io.DataInputStream.readInt(DataInputStream.java:375)
        at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:781)
        at org.apache.hadoop.ipc.Client$Connection.run(Client.java:689)

如果在任何属性文件中配置了错误，请告诉我，以便它可以工作。

另外，如果我为此使用了正确的端口，请告诉我

我的目标是集成flume和hadoop。我为hadoop设置了一个单节点服务器

共有1个答案

鄢开诚

2023-03-14

您必须提供带有 fs.default.name 的端口号

示例：

<configuration>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://localhost:9001</value>
    </property>
</configuration>

之后编辑Flume配置文件，如下所示

hdfs-agent.sources= netcat-collect
hdfs-agent.sinks = hdfs-write
hdfs-agent.channels= memoryChannel

hdfs-agent.sources.netcat-collect.type = netcat
hdfs-agent.sources.netcat-collect.bind = localhost
hdfs-agent.sources.netcat-collect.port = 11111

hdfs-agent.sinks.hdfs-write.type = hdfs
hdfs-agent.sinks.hdfs-write.hdfs.path = hdfs://127.0.0.1:9001/user/oracle/flume
hdfs-agent.sinks.hdfs-write.rollInterval = 30
hdfs-agent.sinks.hdfs-write.hdfs.writeFormat=Text
hdfs-agent.sinks.hdfs-write.hdfs.fileType=DataStream

hdfs-agent.channels.memoryChannel.type = memory
hdfs-agent.channels.memoryChannel.capacity=10000
hdfs-agent.sources.netcat-collect.channels=memoryChannel
hdfs-agent.sinks.hdfs-write.channel=memoryChannel

变化：

hdfs-agent.sinks.hdfs-write.type = hdfs(sink type as hdfs)
hdfs-agent.sinks.hdfs-write.hdfs.path = hdfs://127.0.0.1:9001/user/oracle/flume(port number) 
hdfs-agent.sinks.hdfs-write.channel=memoryChannel(Removed the dot symbol after memoryChannel)

类似资料：

水槽和远程hdfs水槽出错

我正在尝试使用hdfs水槽运行水槽。hdfs在不同的机器上正常运行，我甚至可以与水槽机器上的hdfs交互，但是当我运行水槽并向其发送事件时，我收到以下错误：同样，一致性不是问题，因为我可以使用hadoop命令行与hdfs交互（水槽机不是datanode）。最奇怪的是，在杀死水槽后，我可以看到tmp文件是在hdfs中创建的，但它是空的（扩展名仍然是. tmp）。关于为什么会发生这种情况的任何想法
水槽HDFS源

我想使用 flume 将数据从 hdfs 目录传输到 hdfs 中的目录，在此传输中，我想应用处理形态线。例如：我的来源是我的水槽是有水槽可能吗？如果是，源水槽的类型是什么？
Apache Flume Hdfs水槽

我们可以为HDFS Sink添加分隔符吗？写入文件时，我们如何添加记录分隔符？以下是配置：-
以hdfs为水槽的水槽中的NOSUCH方法错误

我试图配置水槽与HDFS作为汇。这是我的flume.conf文件：我的hadoop版本是：水槽版本是：我已将这两个jar文件放在flume/lib目录中我将hadoop common jar放在那里，因为在启动flume代理时出现以下错误：现在代理开始了。这是启动日志：但是当一些事件发生时，下面的错误出现在水槽日志中，并且没有任何东西被写入hdfs。我缺少一些配置或jar文件？
水槽内存香奈儿到HDFS水槽

我遇到了Flume的问题（Cloudera CDH 5.3上的1.5）：我想做的是:每5分钟，大约20个文件被推送到假脱机目录(从远程存储中抓取)。每个文件包含多行，每行是一个日志(在JSON中)。文件大小在10KB到1MB之间。当我启动代理时，所有文件都被成功推送到HDFS。1分钟后（这是我在flume.conf中设置的），文件被滚动（删除. tmp后缀并关闭）。但是，当在假脱机目录中找到
水槽HDFS-200追加

https://cwiki.apache.org/confluence/display/FLUME/Getting 开始的页面说 HDFS sink 支持追加，但我无法找到有关如何启用它的任何信息，每个示例都在滚动文件上。因此，如果可能的话，我将不胜感激有关如何将水槽附加到现有文件的任何信息）使现代化可以将所有滚动属性设置为0，这将使flume写入单个文件，但它不会关闭文件，新记录对其他进程不

水槽和HDFS集成，HDFS IO错误

共有1个答案

相关问答

相关文章

相关阅读

相关工具

相关文档