Question:

Why does pyspark throw an error when I set the value of local to greater than 1?

史高阳
2023-03-14

When I set the local value to 1 it runs fine, but when I set it to 2 the following error is reported:

from pyspark import SparkContext
# Changing 1 to 2 will give you an error
sc = SparkContext("local[2]", "sort")


class MySort:
    def __init__(self, tup):
        self.tup = tup

    def __gt__(self, other):
        if self.tup[0] > other.tup[0]:
            return True
        elif self.tup[0] == other.tup[0]:
            if self.tup[1] >= other.tup[1]:
                return True
            else:
                return False
        else:
            return False


r1 = sc.parallelize([(1, 2), (2, 2), (2, 3), (2, 1), (1, 3)])
r2 = r1.sortBy(MySort)
print(r2.collect())

Caused by: org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "E:\spark2.3.1\spark-2.3.1-bin-hadoop2.7\python\lib\pyspark.zip\pyspark\worker.py", line 230, in main
  File "E:\spark2.3.1\spark-2.3.1-bin-hadoop2.7\python\lib\pyspark.zip\pyspark\worker.py", line 225, in process
  File "E:\spark2.3.1\spark-2.3.1-bin-hadoop2.7\python\lib\pyspark.zip\pyspark\serializers.py", line 376, in dump_stream
    bytes = self.serializer.dumps(vs)
  File "E:\spark2.3.1\spark-2.3.1-bin-hadoop2.7\python\lib\pyspark.zip\pyspark\serializers.py", line 555, in dumps
    return pickle.dumps(obj, protocol)
_pickle.PicklingError: Can't pickle <class '__main__.MySort'>: attribute lookup MySort on __main__ failed

2 Answers

瞿兴朝
2023-03-14

That behavior of Spark is really interesting; I didn't know about it before. I think that when you use a single core, the class never gets pickled (pickling is only needed to ship the class to other workers). But you can still use a plain function as the key (I assume you want to sort by the first two values):

key_func = lambda tup: tup[:2]

r1 = sc.parallelize([(1, 2), (2, 2), (2, 3), (2, 1), (1, 3)])
r2 = r1.sortBy(key_func)
print(r2.collect())
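
As far as I understand, this works because PySpark serializes functions with cloudpickle, which can ship a lambda defined in __main__ by value, and the tuple keys it produces pickle natively. An equivalent named key function (a sketch under that same assumption) would be:

def key_func(tup):
    # sort ascending by the first two elements of each tuple
    return tup[:2]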
巫马英豪
2023-03-14

I think you need to add a parameter so that Spark submits the file containing your class:

your_file.py

because Spark needs to ship this class to the other workers.
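
As far as I know, this can be done either with the pyFiles argument of SparkContext or by calling sc.addPyFile after the context is created. A minimal sketch under that assumption, with the class moved into a hypothetical module named mysort.py:

# mysort.py -- hypothetical module that holds the class
class MySort:
    def __init__(self, tup):
        self.tup = tup

    def __gt__(self, other):
        # ascending by the first element, then by the second
        # (same comparison logic as the original __gt__)
        return self.tup[:2] >= other.tup[:2]

# driver script
from pyspark import SparkContext

# pyFiles ships mysort.py to every worker, so the class can be
# looked up there when the sort keys are unpickled
sc = SparkContext("local[2]", "sort", pyFiles=["mysort.py"])

from mysort import MySort

r1 = sc.parallelize([(1, 2), (2, 2), (2, 3), (2, 1), (1, 3)])
r2 = r1.sortBy(MySort)
print(r2.collect())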
