Question:

Running Spark from IntelliJ IDEA on a standalone cluster, with the master on the same Windows machine

龚伯寅
2023-03-14

I have been able to run Spark applications successfully from IntelliJ IDEA with the master set to local[*]. However, when I point the master at a separate standalone Spark instance, an exception occurs.

Below is the SparkPi application I am trying to execute.

import scala.math.random

import org.apache.spark.SparkContext
import org.apache.spark.SparkConf

/** Computes an approximation to pi */
object SparkPi {
  def main(args: Array[String]) {
    val conf = new SparkConf().setMaster("spark://tjvrlaptop:7077").setAppName("Spark Pi") //.set("spark.scheduler.mode", "FIFO").set("spark.cores.max", "8")
    val spark = new SparkContext(conf)
    val slices = if (args.length > 0) args(0).toInt else 20
    val n = math.max(100000000L * slices, Int.MaxValue).toInt // avoid overflow

    for(j <- 1 to 1000000) {
      val count = spark.parallelize(1 until n, slices).map { i =>
        val x = random * 2 - 1
        val y = random * 2 - 1
        if (x * x + y * y < 1) 1 else 0
      }.reduce(_ + _)
      println("Pi is roughly " + 4.0 * count / n)
    }
    spark.stop()
  }
}

Here is my build.sbt content:

name := "SBTScalaSparkPi"

version := "1.0"

scalaVersion := "2.10.6"

libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.1"

Here is my plugins.sbt content:

logLevel := Level.Warn

I started the Spark master and worker with the following commands, each in its own command prompt on the same machine.

spark-1.6.1-bin-hadoop2.6\bin>spark-class org.apache.spark.deploy.master.Master --host tjvrlaptop

spark-1.6.1-bin-hadoop2.6\bin>spark-class org.apache.spark.deploy.worker.Worker spark://tjvrlaptop:7077

[The master and worker appear to start without any problems][1]

[1]: http://i.stack.imgur.com/B3BDZ.png

Next, I tried to run the program from IntelliJ. After a while it fails with the following errors:

**Command Prompt where the Master is running**

>     16/03/27 14:44:33 INFO Master: Registering app Spark Pi
>     16/03/27 14:44:33 INFO Master: Registered app Spark Pi with ID app-20160327144433-0000
>     16/03/27 14:44:33 INFO Master: Launching executor app-20160327144433-0000/0 on worker
> worker-20160327140440-192.168.56.1-52701
>     16/03/27 14:44:38 INFO Master: Received unregister request from application app-20160327144433-0000
>     16/03/27 14:44:38 INFO Master: Removing app app-20160327144433-0000
>     16/03/27 14:44:38 INFO Master: TJVRLAPTOP:55368 got disassociated, removing it.
>     16/03/27 14:44:38 INFO Master: 192.168.56.1:55350 got disassociated, removing it.
>     16/03/27 14:44:38 WARN Master: Got status update for unknown executor app-20160327144433-0000/0

**Command Prompt where the Worker is running**

> 16/03/27 14:44:34 INFO Worker: Asked to launch executor
> app-20160327144433-0000/0 for Spark Pi 16/03/27 14:44:34 INFO
> SecurityManager: Changing view acls to: tjoha 16/03/27 14:44:34 INFO
> SecurityManager: Changing modify acls to: tjoha 16/03/27 14:44:34 INFO
> SecurityManager: SecurityManager: authentication disabled; ui acls
> disabled; users with view permissions: Set(tjoha); users with modify
> permissions: Set(tjoha) 16/03/27 14:44:34 INFO ExecutorRunner: Launch
> command: "C:\Program Files\Java\jre1.8.0_77\bin\java" "-cp"
> "C:\Users\tjoha\Documents\spark-1.6.1-bin-hadoop2.6\bin\..\conf\;C:\Users\tjoha\Documents\spark-1.6.1-bin-hadoop2.6\bin\..\lib\spark-assembly-1.6.1-hadoop2.6.0.jar;C:\Users\tjoha\Documents\spark-1.6.1-bin-hadoop2.6\bin\..\lib\datanucleus-api-jdo-3.2.6.jar;C:\Users\tjoha\Documents\spark-1.6.1-bin-hadoop2.6\bin\..\lib\datanucleus-core-3.2.10.jar;C:\Users\tjoha\Documents\spark-1.6.1-bin-hadoop2.6\bin\..\lib\datanucleus-rdbms-3.2.9.jar"
> "-Xms1024M" "-Xmx1024M" "-Dspark.driver.port=55350"
> "org.apache.spark.executor.CoarseGrainedExecutorBackend"
> "--driver-url" "spark://CoarseGrainedScheduler@192.168.56.1:55350"
> "--executor-id" "0" "--hostname" "192.168.56.1" "--cores" "8"
> "--app-id" "app-20160327144433-0000" "--worker-url"
> "spark://Worker@192.168.56.1:52701" 16/03/27 14:44:38 INFO Worker:
> Asked to kill executor app-20160327144433-0000/0 16/03/27 14:44:38
> INFO ExecutorRunner: Runner thread for executor
> app-20160327144433-0000/0 interrupted 16/03/27 14:44:38 INFO
> ExecutorRunner: Killing process! 16/03/27 14:44:38 INFO Worker:
> Executor app-20160327144433-0000/0 finished with state KILLED
> exitStatus 1 16/03/27 14:44:38 INFO Worker: Cleaning up local
> directories for application app-20160327144433-0000 16/03/27 14:44:38
> INFO ExternalShuffleBlockResolver: Application app-20160327144433-0000
> removed, cleanupLocalDirs = true

**IntelliJ Idea Output**

> Using Spark's default log4j profile:
> org/apache/spark/log4j-defaults.properties 16/03/27 15:06:04 INFO
> SparkContext: Running Spark version 1.6.1 16/03/27 15:06:05 WARN
> NativeCodeLoader: Unable to load native-hadoop library for your
> platform... using builtin-java classes where applicable 16/03/27
> 15:06:05 INFO SecurityManager: Changing view acls to: tjoha 16/03/27
> 15:06:05 INFO SecurityManager: Changing modify acls to: tjoha 16/03/27
> 15:06:05 INFO SecurityManager: SecurityManager: authentication
> disabled; ui acls disabled; users with view permissions: Set(tjoha);
> users with modify permissions: Set(tjoha) 16/03/27 15:06:06 INFO
> Utils: Successfully started service 'sparkDriver' on port 56183.
> 16/03/27 15:06:07 INFO Slf4jLogger: Slf4jLogger started 16/03/27
> 15:06:07 INFO Remoting: Starting remoting 16/03/27 15:06:07 INFO
> Remoting: Remoting started; listening on addresses
> :[akka.tcp://sparkDriverActorSystem@192.168.56.1:56196] 16/03/27
> 15:06:07 INFO Utils: Successfully started service
> 'sparkDriverActorSystem' on port 56196. 16/03/27 15:06:07 INFO
> SparkEnv: Registering MapOutputTracker 16/03/27 15:06:07 INFO
> SparkEnv: Registering BlockManagerMaster 16/03/27 15:06:07 INFO
> DiskBlockManager: Created local directory at
> C:\Users\tjoha\AppData\Local\Temp\blockmgr-9623b0f9-81f5-4a10-bbc7-ba077d53a2e5
> 16/03/27 15:06:07 INFO MemoryStore: MemoryStore started with capacity
> 2.4 GB 16/03/27 15:06:07 INFO SparkEnv: Registering OutputCommitCoordinator 16/03/27 15:06:07 WARN Utils: Service
> 'SparkUI' could not bind on port 4040. Attempting port 4041. 16/03/27
> 15:06:07 INFO Utils: Successfully started service 'SparkUI' on port
> 4041. 16/03/27 15:06:07 INFO SparkUI: Started SparkUI at http 192.168.56.1:4041 16/03/27 15:06:08 INFO
> AppClient$ClientEndpoint: Connecting to master
> spark://tjvrlaptop:7077... 16/03/27 15:06:09 INFO
> SparkDeploySchedulerBackend: Connected to Spark cluster with app ID
> app-20160327150608-0002 16/03/27 15:06:09 INFO
> AppClient$ClientEndpoint: Executor added: app-20160327150608-0002/0 on
> worker-20160327150550-192.168.56.1-56057 (192.168.56.1:56057) with 8
> cores 16/03/27 15:06:09 INFO SparkDeploySchedulerBackend: Granted
> executor ID app-20160327150608-0002/0 on hostPort 192.168.56.1:56057
> with 8 cores, 1024.0 MB RAM 16/03/27 15:06:09 INFO
> AppClient$ClientEndpoint: Executor updated: app-20160327150608-0002/0
> is now RUNNING 16/03/27 15:06:09 INFO Utils: Successfully started
> service 'org.apache.spark.network.netty.NettyBlockTransferService' on
> port 56234. 16/03/27 15:06:09 INFO NettyBlockTransferService: Server
> created on 56234 16/03/27 15:06:09 INFO BlockManagerMaster: Trying to
> register BlockManager 16/03/27 15:06:09 INFO
> BlockManagerMasterEndpoint: Registering block manager
> 192.168.56.1:56234 with 2.4 GB RAM, BlockManagerId(driver, 192.168.56.1, 56234) 16/03/27 15:06:09 INFO BlockManagerMaster: Registered BlockManager 16/03/27 15:06:09 INFO
> SparkDeploySchedulerBackend: SchedulerBackend is ready for scheduling
> beginning after reached minRegisteredResourcesRatio: 0.0 16/03/27
> 15:06:10 INFO SparkContext: Starting job: reduce at SparkPi.scala:37
> 16/03/27 15:06:10 INFO DAGScheduler: Got job 0 (reduce at
> SparkPi.scala:37) with 20 output partitions 16/03/27 15:06:10 INFO
> DAGScheduler: Final stage: ResultStage 0 (reduce at SparkPi.scala:37)
> 16/03/27 15:06:10 INFO DAGScheduler: Parents of final stage: List()
> 16/03/27 15:06:10 INFO DAGScheduler: Missing parents: List() 16/03/27
> 15:06:10 INFO DAGScheduler: Submitting ResultStage 0
> (MapPartitionsRDD[1] at map at SparkPi.scala:33), which has no missing
> parents 16/03/27 15:06:10 INFO MemoryStore: Block broadcast_0 stored
> as values in memory (estimated size 1880.0 B, free 1880.0 B) 16/03/27
> 15:06:10 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in
> memory (estimated size 1212.0 B, free 3.0 KB) 16/03/27 15:06:10 INFO
> BlockManagerInfo: Added broadcast_0_piece0 in memory on
> 192.168.56.1:56234 (size: 1212.0 B, free: 2.4 GB) 16/03/27 15:06:10 INFO SparkContext: Created broadcast 0 from broadcast at
> DAGScheduler.scala:1006 16/03/27 15:06:10 INFO DAGScheduler:
> Submitting 20 missing tasks from ResultStage 0 (MapPartitionsRDD[1] at
> map at SparkPi.scala:33) 16/03/27 15:06:10 INFO TaskSchedulerImpl:
> Adding task set 0.0 with 20 tasks 16/03/27 15:06:14 INFO
> SparkDeploySchedulerBackend: Registered executor
> NettyRpcEndpointRef(null) (TJVRLAPTOP:56281) with ID 0 16/03/27
> 15:06:14 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0,
> TJVRLAPTOP, partition 0,PROCESS_LOCAL, 2078 bytes) 16/03/27 15:06:14

> ...

> TJVRLAPTOP, partition 6,PROCESS_LOCAL, 2078 bytes) 16/03/27 15:06:14
> INFO TaskSetManager: Starting task 7.0 in stage 0.0 (TID 7,
> TJVRLAPTOP, partition 7,PROCESS_LOCAL, 2078 bytes) 16/03/27 15:06:14
> INFO BlockManagerMasterEndpoint: Registering block manager
> TJVRLAPTOP:56319 with 511.1 MB RAM, BlockManagerId(0, TJVRLAPTOP,
> 56319) 16/03/27 15:06:15 INFO BlockManagerInfo: Added
> broadcast_0_piece0 in memory on TJVRLAPTOP:56319 (size: 1212.0 B,
> free: 511.1 MB) 16/03/27 15:06:15 INFO TaskSetManager: Starting task
> 8.0 in stage 0.0 (TID 8, TJVRLAPTOP, partition 8,PROCESS_LOCAL, 2078 bytes) 16/03/27 15:06:15 INFO TaskSetManager: Starting task 9.0 in
> stage 0.0 (TID 9, TJVRLAPTOP, partition 9,PROCESS_LOCAL, 2078 bytes)
> 16/03/27 15:06:15 INFO TaskSetManager: Starting task 10.0 in stage 0.0
> (TID 10, TJVRLAPTOP, partition 10,PROCESS_LOCAL, 2078 bytes) 16/03/27
> 15:06:15 INFO TaskSetManager: Starting task 11.0 in stage 0.0 (TID 11,
> TJVRLAPTOP, partition 11,PROCESS_LOCAL, 2078 bytes) 16/03/27 15:06:15

> ...

> java.lang.ClassNotFoundException: SparkPi$$anonfun$main$1$$anonfun$1
>   at java.net.URLClassLoader.findClass(Unknown Source)    at
> java.lang.ClassLoader.loadClass(Unknown Source)   at
> java.lang.ClassLoader.loadClass(Unknown Source)   at
> java.lang.Class.forName0(Native Method)   at
> java.lang.Class.forName(Unknown Source)   at
> org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:68)
>   at java.io.ObjectInputStream.readNonProxyDesc(Unknown Source)   at
> java.io.ObjectInputStream.readClassDesc(Unknown Source)   at
> java.io.ObjectInputStream.readOrdinaryObject(Unknown Source)  at
> java.io.ObjectInputStream.readObject0(Unknown Source)     at
> java.io.ObjectInputStream.defaultReadFields(Unknown Source)   at
> java.io.ObjectInputStream.readSerialData(Unknown Source)  at
> java.io.ObjectInputStream.readOrdinaryObject(Unknown Source)  at
> java.io.ObjectInputStream.readObject0(Unknown Source)     at
> java.io.ObjectInputStream.defaultReadFields(Unknown Source)   at
> java.io.ObjectInputStream.readSerialData(Unknown Source)  at
> java.io.ObjectInputStream.readOrdinaryObject(Unknown Source)  at
> java.io.ObjectInputStream.readObject0(Unknown Source)     at
> java.io.ObjectInputStream.defaultReadFields(Unknown Source)   at
> java.io.ObjectInputStream.readSerialData(Unknown Source)  at
> java.io.ObjectInputStream.readOrdinaryObject(Unknown Source)  at
> java.io.ObjectInputStream.readObject0(Unknown Source)     at
> java.io.ObjectInputStream.readObject(Unknown Source)  at
> org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:76)
>   at
> org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:115)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
>   at org.apache.spark.scheduler.Task.run(Task.scala:89)   at
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>   at java.lang.Thread.run(Unknown Source) 16/03/27 15:06:15 INFO
> TaskSetManager: Lost task 5.0 in stage 0.0 (TID 5) on executor
> TJVRLAPTOP: java.lang.ClassNotFoundException
> (SparkPi$$anonfun$main$1$$anonfun$1) [duplicate 1] 16/03/27 15:06:15
> INFO TaskSetManager: Lost task 3.0 in stage 0.0 (TID 3) on executor
> TJVRLAPTOP: java.lang.ClassNotFoundException

> ...
 
> INFO TaskSetManager: Starting task 10.1 in stage 0.0 (TID 20,
> TJVRLAPTOP, partition 10,PROCESS_LOCAL, 2078 bytes) 16/03/27 15:06:15

> ...

> TJVRLAPTOP, partition 3,PROCESS_LOCAL, 2078 bytes) 16/03/27 15:06:15
> INFO TaskSetManager: Lost task 4.0 in stage 0.0 (TID 4) on executor
> TJVRLAPTOP: java.lang.ClassNotFoundException
> (SparkPi$$anonfun$main$1$$anonfun$1) [duplicate 8] 16/03/27 15:06:15
> INFO TaskSetManager: Lost task 12.0 in stage 0.0 (TID 12) on executor
> TJVRLAPTOP: java.lang.ClassNotFoundException

> ...

> INFO TaskSetManager: Starting task 2.3 in stage 0.0 (TID 39,
> TJVRLAPTOP, partition 2,PROCESS_LOCAL, 2078 bytes) 16/03/27 15:06:16
> MapOutputTrackerMasterEndpoint stopped! 16/03/27 15:06:16 WARN
> TransportChannelHandler: Exception in connection from
> TJVRLAPTOP/192.168.56.1:56281 java.io.IOException: An existing
> connection was forcibly closed by the remote host 16/03/27 15:06:17
> INFO MemoryStore: MemoryStore cleared 16/03/27 15:06:17 INFO
> BlockManager: BlockManager stopped 16/03/27 15:06:17 INFO
> BlockManagerMaster: BlockManagerMaster stopped 16/03/27 15:06:17 INFO
> OutputCommitCoordinator$OutputCommitCoordinatorEndpoint:
> OutputCommitCoordinator stopped! 16/03/27 15:06:17 INFO SparkContext:
> Successfully stopped SparkContext 16/03/27 15:06:17 INFO
> ShutdownHookManager: Shutdown hook called 16/03/27 15:06:17 INFO
> RemoteActorRefProvider$RemotingTerminator: Shutting down remote
> daemon. 16/03/27 15:06:17 INFO ShutdownHookManager: Deleting directory
> C:\Users\tjoha\AppData\Local\Temp\spark-11f8184f-23fb-43be-91bb-113fb74aa8b9

1 answer

万嘉熙
2023-03-14

When you run in embedded mode (local[*]), Spark has all the required code on its classpath.

When you run against a standalone cluster, you must make your application jar explicitly available to Spark, e.g. by copying it into the lib folder.
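Alternatively, the driver can ship the jar to the executors itself via `SparkConf.setJars`. A minimal sketch of how the question's `SparkPi` could do this, assuming the jar was produced by `sbt package` (the jar path below is an assumption and must be adjusted to your project's actual artifact name):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object SparkPi {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setMaster("spark://tjvrlaptop:7077")
      .setAppName("Spark Pi")
      // Ship the application jar to the executors so they can load the
      // anonymous closure classes (e.g. SparkPi$$anonfun$main$1$$anonfun$1).
      // Hypothetical path -- run `sbt package` and adjust to your artifact.
      .setJars(Seq("target/scala-2.10/sbtscalasparkpi_2.10-1.0.jar"))
    val spark = new SparkContext(conf)
    // ... run jobs as in the question ...
    spark.stop()
  }
}
```

This addresses the `ClassNotFoundException` directly: the executors fail because the closures compiled by IntelliJ exist only on the driver's classpath, and `setJars` (or copying the jar as described above) is what makes them available cluster-wide.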
