I created a Dataproc cluster with the Jupyter initialization action, using image version 1.4. I connected to the master node and a worker node over SSH and ran python --version; both report Python 3.6.5 :: Anaconda, Inc.
However, when I try to run Google's example of reading from and writing to BigQuery with Jupyter (PySpark kernel), it fails with the following error:
Py4JJavaError Traceback (most recent call last)
<ipython-input-13-1cf15cbebfd5> in <module>
55
56 # Display 10 results.
---> 57 pprint.pprint(word_counts.take(10))
58
59
/usr/lib/spark/python/pyspark/rdd.py in take(self, num)
1358
1359 p = range(partsScanned, min(partsScanned + numPartsToTry, totalParts))
-> 1360 res = self.context.runJob(self, takeUpToNumLeft, p)
1361
1362 items += res
/usr/lib/spark/python/pyspark/context.py in runJob(self, rdd, partitionFunc, partitions, allowLocal)
1049 # SparkContext#runJob.
1050 mappedRDD = rdd.mapPartitions(partitionFunc)
-> 1051 sock_info = self._jvm.PythonRDD.runJob(self._jsc.sc(), mappedRDD._jrdd, partitions)
1052 return list(_load_from_socket(sock_info, mappedRDD._jrdd_deserializer))
1053
/usr/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py in __call__(self, *args)
1255 answer = self.gateway_client.send_command(command)
1256 return_value = get_return_value(
-> 1257 answer, self.gateway_client, self.target_id, self.name)
1258
1259 for temp_arg in temp_args:
/usr/lib/spark/python/pyspark/sql/utils.py in deco(*a, **kw)
61 def deco(*a, **kw):
62 try:
---> 63 return f(*a, **kw)
64 except py4j.protocol.Py4JJavaError as e:
65 s = e.java_exception.toString()
/usr/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
326 raise Py4JJavaError(
327 "An error occurred while calling {0}{1}{2}.\n".
--> 328 format(target_id, ".", name), value)
329 else:
330 raise Py4JError(
Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.runJob.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 24.0 failed 4 times, most recent failure: Lost task 0.3 in stage 24.0 (TID 563, test-1-w-0.c.abc.internal, executor 3): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/worker.py", line 262, in main
("%d.%d" % sys.version_info[:2], version))
Exception: Python in worker has different version 2.7 than that in driver 3.6, PySpark cannot run with different minor versions.Please check environment variables PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON are correctly set.
at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.handlePythonException(PythonRunner.scala:452)
at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRunner.scala:588)
at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRunner.scala:571)
at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:406)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
at scala.collection.Iterator$GroupedIterator.fill(Iterator.scala:1124)
at scala.collection.Iterator$GroupedIterator.hasNext(Iterator.scala:1130)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
at org.apache.spark.scheduler.Task.run(Task.scala:121)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:402)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:408)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1888)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1876)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1875)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1875)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:926)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:926)
at scala.Option.foreach(Option.scala:257)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:926)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2109)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2058)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2047)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:737)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2061)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2082)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2101)
at org.apache.spark.api.python.PythonRDD$.runJob(PythonRDD.scala:153)
at org.apache.spark.api.python.PythonRDD.runJob(PythonRDD.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.spark.api.python.PythonException: Traceback (most recent call last):
File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/worker.py", line 262, in main
("%d.%d" % sys.version_info[:2], version))
Exception: Python in worker has different version 2.7 than that in driver 3.6, PySpark cannot run with different minor versions.Please check environment variables PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON are correctly set.
at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.handlePythonException(PythonRunner.scala:452)
at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRunner.scala:588)
at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRunner.scala:571)
at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:406)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
at scala.collection.Iterator$GroupedIterator.fill(Iterator.scala:1124)
at scala.collection.Iterator$GroupedIterator.hasNext(Iterator.scala:1130)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
at org.apache.spark.scheduler.Task.run(Task.scala:121)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:402)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:408)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
... 1 more
I don't understand why this driver/worker Python version mismatch is happening. Moreover, when I submit the same job from my local command line, it works fine. Any help or suggestions would be much appreciated.
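In case it helps narrow this down: with the default configuration, the executors launch whatever interpreter the driver's PYSPARK_PYTHON environment variable points at, falling back to plain python when it is unset. A cell like the following (a minimal sketch, nothing Dataproc-specific assumed) shows both the kernel's own version and the value the driver will forward to the executors:

import os
import sys

# Version of the interpreter the Jupyter/PySpark kernel itself runs (the driver).
print("driver Python: %d.%d" % sys.version_info[:2])

# Interpreter the driver will tell executors to launch; if unset, PySpark
# falls back to whatever "python" resolves to on the worker nodes.
print("PYSPARK_PYTHON        =", os.environ.get("PYSPARK_PYTHON"))
print("PYSPARK_DRIVER_PYTHON =", os.environ.get("PYSPARK_DRIVER_PYTHON"))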
The problem you describe should only occur with initialization actions that were originally written for Dataproc 1.2 or earlier. When using Dataproc image version 1.3 or later, you should install Jupyter through the Dataproc optional components rather than an initialization action; this approach is more reliable and ensures that all the relevant Python version settings are consistent across the whole cluster:
gcloud dataproc clusters create cluster-name \
--optional-components=JUPYTER \
--image-version=1.4 \
... other flags
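If you need to keep an existing cluster that was built with the old initialization action, a common stop-gap (a sketch only, not the documented fix; the interpreter path below is an assumption about where the init action put Anaconda, so adjust it to your nodes) is to point both sides at the same Python 3 interpreter before the SparkContext is created, for example from a plain Python 3 kernel or a submitted script:

import os
import sys
from pyspark.sql import SparkSession

# Assumed location of the Anaconda interpreter on master and worker nodes;
# change this to wherever Python 3 actually lives on your cluster.
py3 = "/opt/conda/bin/python"

# Both variables must point at the same major.minor version, and they must
# be set before the SparkSession / SparkContext is created.
os.environ["PYSPARK_PYTHON"] = py3
os.environ["PYSPARK_DRIVER_PYTHON"] = py3

spark = SparkSession.builder.appName("py3-version-check").getOrCreate()

# Run a tiny job so each worker reports the version its Python process sees.
worker_versions = (
    spark.sparkContext.range(2)
    .map(lambda _: "%d.%d" % sys.version_info[:2])
    .distinct()
    .collect()
)
print("driver: %d.%d" % sys.version_info[:2], "workers:", worker_versions)

Either way, the key point is that PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON must resolve to the same major.minor version on every node, which the optional component takes care of for you.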