Question:

I keep blowing up my Spark cluster despite partitioning

荀学文
2023-03-14
    // Build the RDD and let the cluster build up the ngram list.
    val streamList_rdd = sc.parallelize(streamList).repartition(partitionCount)
    // Flatten each element's list into a single RDD.
    val rdd_results = streamList_rdd.flatMap { x => x.toList }
    // Force evaluation; this is where the job actually runs and fails.
    println(rdd_results.count())
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
at sun.nio.ch.IOUtil.read(IOUtil.java:192)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
at io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:313)
at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:881)
at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:242)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
at java.lang.Thread.run(Thread.java:745)

1 Answer

江建明
2023-03-14

Given the data volume and the memory errors, I think you need to allocate more cluster resources.
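
As a rough sketch of what that could look like, assuming a YARN deployment (the app name and sizes below are illustrative placeholders, not a recommendation for your workload):

    import org.apache.spark.{SparkConf, SparkContext}

    // Illustrative resource settings; tune the values to your cluster.
    val conf = new SparkConf()
      .setAppName("ngram-builder")            // hypothetical app name
      .set("spark.executor.memory", "8g")     // heap per executor
      .set("spark.executor.cores", "4")       // cores per executor
      .set("spark.executor.instances", "10")  // number of executors
    val sc = new SparkContext(conf)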

Increasing the number of partitions improves parallelism, but at the cost of consuming more resources on a cluster that is already undersized. I also suspect the repartition operation triggers a shuffle, which is an expensive operation at best and downright catastrophic when you have enough data to run out of memory. Without the logs, though, that's conjecture.
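
If streamList is a local collection, as the question suggests, one way to sidestep that shuffle is to set the partition count when the RDD is created, since parallelize accepts the number of slices directly:

    // Partition at creation time instead of parallelize(...).repartition(...);
    // this avoids the full shuffle that repartition triggers.
    val streamList_rdd = sc.parallelize(streamList, partitionCount)
    val rdd_results = streamList_rdd.flatMap { x => x.toList }
    println(rdd_results.count())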

The heartbeat failures are probably because the executors are so heavily loaded that they cannot respond in time, or because the process crashed or was killed by YARN...
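
If the executors turn out to be alive but too busy (e.g. stuck in GC) to heartbeat, raising the timeouts can buy some headroom while you investigate. This is a stopgap, not a fix (it won't help if the process was actually killed), and the values are illustrative:

    // Give heavily loaded executors more time before the driver gives up on them.
    // spark.executor.heartbeatInterval should stay well below spark.network.timeout.
    val conf = new SparkConf()
      .set("spark.executor.heartbeatInterval", "30s")  // default 10s
      .set("spark.network.timeout", "600s")            // default 120s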
