
Spark reading an HDFS file from the local machine times out: java.net.ConnectException: Connection timed out: no further information

耿运浩
2023-12-01

Detailed description of the error. 1. The program that reads/writes the HDFS file:

import org.apache.spark.sql.{Dataset, SparkSession}

def main(args: Array[String]): Unit = {

  // When running locally on Windows, accessing HDFS from Windows requires a valid user identity
  System.setProperty("HADOOP_USER_NAME", "hdfs")
  val logFile = "/user/zla/test"

  val spark = SparkSession
    .builder
    .appName("demo")
    .config("fs.defaultFS", "hdfs://pyspark001:8020/")
    .master("local[*]")
    .getOrCreate()

  import spark.implicits._

  // read the file and print its first 10 lines
  val stringline: Dataset[String] = spark.read.textFile(logFile)
  val strings: Array[String] = stringline.take(10)
  for (a <- strings) {
    println("-------------" + a)
  }

  spark.stop()
}

2. The error log:

21/05/05 11:28:01 INFO FileScanRDD: Reading File path: hdfs://pyspark001:8020/user/zla/test, range: 0-24, partition values: [empty row]
21/05/05 11:28:01 INFO CodeGenerator: Code generated in 10.1283 ms
21/05/05 11:28:22 WARN BlockReaderFactory: I/O error constructing remote block reader.
java.net.ConnectException: Connection timed out: no further information
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530)
    at org.apache.hadoop.hdfs.DFSClient.newConnectedPeer(DFSClient.java:3101)
    at org.apache.hadoop.hdfs.BlockReaderFactory.nextTcpPeer(BlockReaderFactory.java:755)
    at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:670)
    at org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:337)
    at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:576)
    at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:800)
    at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:847)
    at java.io.DataInputStream.read(DataInputStream.java:100)
    at org.apache.hadoop.util.LineReader.fillBuffer(LineReader.java:180)
    at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:216)
    at org.apache.hadoop.util.LineReader.readLine(LineReader.java:174)
    at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.skipUtfByteOrderMark(LineRecordReader.java:143)
    at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.nextKeyValue(LineRecordReader.java:183)
    at org.apache.spark.sql.execution.datasources.RecordReaderIterator.hasNext(RecordReaderIterator.scala:39)
    at org.apache.spark.sql.execution.datasources.HadoopFileLinesReader.hasNext(HadoopFileLinesReader.scala:50)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
    at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:105)
    at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:177)
    at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:105)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
    at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
    at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:395)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:234)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:228)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
    at org.apache.spark.scheduler.Task.run(Task.scala:108)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
21/05/05 11:28:22 WARN DFSClient: Failed to connect to /172.17.1.123:50010 for block, add to deadNodes and continue. java.net.ConnectException: Connection timed out: no further information
java.net.ConnectException: Connection timed out: no further information
    (same stack trace as above)

3. About the hosts files
   1) From a Windows cmd prompt, the cluster hostnames can be pinged.
   2) From Windows, telnet to hostname:port also connects.
   3) The cluster's /etc/hosts and the Windows hosts file do not match (an address mapping is in place): the Windows hosts file maps the hostnames to the cluster's external (public) IPs, while the cluster's /etc/hosts maps them to internal (private) IPs. A sketch of this layout follows the list.
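
For illustration, a hedged sketch of that split layout. All IPs and the second hostname below are hypothetical; only pyspark001 and the internal address 172.17.1.123, which the log above shows the client dialing on the default DataNode transfer port 50010, come from this post.

Windows hosts (C:\Windows\System32\drivers\etc\hosts):
    203.0.113.10    pyspark001    # external/public IP (hypothetical)
    203.0.113.11    pyspark002    # hypothetical second node

Cluster /etc/hosts:
    172.17.1.120    pyspark001    # internal IP (hypothetical)
    172.17.1.123    pyspark002    # internal DataNode IP seen in the log

When reading a block, the HDFS client asks the NameNode where the block lives, and the NameNode answers with the DataNodes' registered (internal) addresses; those are unreachable from Windows, hence the timeout.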

4. A remaining puzzle
    After running the HDFS program above from IDEA, files can be created in HDFS, but no content can be written to them. The files are created as the hadoop user, and even setting their permissions to 777 still ends in the same error. The likely explanation: creating a file is a metadata-only operation against the NameNode (reachable through the external address), whereas writing block data requires direct connections to the DataNodes, whose internal IPs cannot be reached from Windows. A minimal sketch of this follows.
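
A minimal sketch reproducing that behavior with the plain Hadoop FileSystem API (the path /user/zla/demo.txt is hypothetical; hostname, port, and user are the ones used above):

import java.net.URI
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object CreateVsWrite {
  def main(args: Array[String]): Unit = {
    System.setProperty("HADOOP_USER_NAME", "hdfs")
    val fs = FileSystem.get(new URI("hdfs://pyspark001:8020/"), new Configuration())

    // Metadata-only call: goes to the NameNode and succeeds,
    // so an empty file shows up in HDFS.
    val out = fs.create(new Path("/user/zla/demo.txt"))

    // Writing the data needs a pipeline to the DataNodes; with only
    // internal DataNode IPs advertised, flushing on write/close times
    // out exactly like the stack trace above.
    out.write("hello".getBytes("UTF-8"))
    out.close()

    fs.close()
  }
}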
Solution

This was my first post and I got a reply — many thanks, the problem is solved. Because the cluster uses an address mapping, the local machine cannot reach the cluster over its internal network; adding one configuration option to the program fixes it:

Configuration conf = new Configuration();
// access DataNodes by hostname instead of by their (internal) IPs
conf.set("dfs.client.use.datanode.hostname", "true");

With this the file can be accessed. I found it in a post online after a long search — thank you!!! Reference:
https://www.cnblogs.com/krcys/p/9146329.html
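
To apply the same option in the Spark program from section 1, the Hadoop setting can be forwarded through SparkConf with the documented spark.hadoop. prefix. A minimal sketch, assuming the hostname, user, and path used above:

import org.apache.spark.sql.SparkSession

object Demo {
  def main(args: Array[String]): Unit = {
    System.setProperty("HADOOP_USER_NAME", "hdfs")

    val spark = SparkSession
      .builder
      .appName("demo")
      .master("local[*]")
      // spark.hadoop.* entries are copied into the Hadoop Configuration
      .config("spark.hadoop.fs.defaultFS", "hdfs://pyspark001:8020/")
      // same effect as conf.set("dfs.client.use.datanode.hostname", "true")
      .config("spark.hadoop.dfs.client.use.datanode.hostname", "true")
      .getOrCreate()

    // or, equivalently, set it on the live Hadoop configuration:
    // spark.sparkContext.hadoopConfiguration.set("dfs.client.use.datanode.hostname", "true")

    spark.read.textFile("/user/zla/test").take(10).foreach(println)
    spark.stop()
  }
}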
