ProcessPoolExecutor from concurrent.futures is slower than multiprocessing.Pool

方光华

2023-03-14

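Question:

I am experimenting with the shiny new concurrent.futures module introduced in Python 3.2, and I have noticed that, with almost identical code, using the pool from concurrent.futures is way slower than using multiprocessing.Pool.

This is the version using multiprocessing: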

def hard_work(n):
    # Real hard work here
    pass

if __name__ == '__main__':
    from multiprocessing import Pool, cpu_count

    try:
        workers = cpu_count()
    except NotImplementedError:
        workers = 1
    pool = Pool(processes=workers)
    result = pool.map(hard_work, range(100, 1000000))

And this is the version using concurrent.futures:

def hard_work(n):
    # Real hard work here
    pass

if __name__ == '__main__':
    from concurrent.futures import ProcessPoolExecutor
    from multiprocessing import cpu_count
    try:
        workers = cpu_count()
    except NotImplementedError:
        workers = 1
    pool = ProcessPoolExecutor(max_workers=workers)
    result = pool.map(hard_work, range(100, 1000000))

Using a simple factorization function taken from an article by Eli Bendersky, these are the results on my computer (i7, 64-bit, Arch Linux):

[juanlu@nebulae]─[~/Development/Python/test]
└[10:31:10] $ time python pool_multiprocessing.py

real    0m10.330s
user    1m13.430s
sys     0m0.260s
[juanlu@nebulae]─[~/Development/Python/test]
└[10:31:29] $ time python pool_futures.py

real    4m3.939s
user    6m33.297s
sys     0m54.853s

I cannot profile either script with the Python profiler because I get pickle errors. Any ideas?

Answer:

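When you use map from concurrent.futures, each element of the iterable is submitted to the executor separately, and the executor creates a Future object for each call. It then returns an iterator that yields the results returned by the futures.
Future objects are rather heavyweight: they do a lot of work to support all the features they provide (callbacks, the ability to cancel, status checks, and so on).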
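To make the per-item cost concrete, here is a minimal sketch of roughly what Executor.map does, written with submit. The helper name map_via_submit is made up for illustration, and ThreadPoolExecutor is used only to keep the sketch light; the same Executor interface applies to ProcessPoolExecutor.

```python
from concurrent.futures import ThreadPoolExecutor


def hard_work(n):
    # Stand-in for the real CPU-bound function.
    return n * n


def map_via_submit(executor, fn, iterable):
    """Hypothetical helper: one submit() call and one Future per item."""
    # Every item is submitted up front, so a million inputs means
    # a million Future objects.
    futures = [executor.submit(fn, item) for item in iterable]

    def results():
        # Yield results in submission order, blocking on each Future.
        for future in futures:
            yield future.result()

    return results()


if __name__ == '__main__':
    with ThreadPoolExecutor(max_workers=2) as pool:
        print(list(map_via_submit(pool, hard_work, range(5))))
```

With range(100, 1000000) as input, this pattern creates close to a million Future objects, which is where the overhead described above comes from.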

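Compared to that, multiprocessing.Pool has much less overhead. It submits jobs in batches (reducing IPC overhead) and directly uses the result returned by the function. For big batches of jobs, multiprocessing is definitely the better option.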
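The batching idea can also be applied by hand on top of ProcessPoolExecutor. The following is a sketch under assumptions, not how multiprocessing.Pool is implemented internally: the helpers chunked, work_on_chunk and batched_map are hypothetical names, and the chunk size of 10000 is an arbitrary choice.

```python
from concurrent.futures import ProcessPoolExecutor
from itertools import islice


def hard_work(n):
    # Stand-in for the real CPU-bound function.
    return n * n


def chunked(iterable, size):
    """Yield successive tuples of at most `size` items."""
    it = iter(iterable)
    while True:
        chunk = tuple(islice(it, size))
        if not chunk:
            return
        yield chunk


def work_on_chunk(chunk):
    # One task (and one Future) now covers a whole batch of inputs.
    return [hard_work(n) for n in chunk]


def batched_map(executor, iterable, size):
    """Map hard_work over iterable, submitting one task per chunk."""
    for results in executor.map(work_on_chunk, chunked(iterable, size)):
        yield from results


if __name__ == '__main__':
    with ProcessPoolExecutor() as pool:
        result = list(batched_map(pool, range(100, 1000000), 10000))
    print(len(result))
```

On Python 3.5 and later, Executor.map also accepts a chunksize argument that has a similar effect with ProcessPoolExecutor, so the manual helpers are mainly useful for seeing what batching changes.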

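Futures are great if you want to submit long-running jobs where the overhead is not that important, and you want to be notified by a callback, check from time to time whether they are done, or be able to cancel individual jobs.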
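The features mentioned here (callbacks, status checks, cancellation) look roughly like this in code. This is an illustrative sketch: long_job and run_demo are made-up names, and ThreadPoolExecutor with a single worker is an assumption chosen so that the second job is guaranteed to wait in the queue. The Future API is the same for ProcessPoolExecutor.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def long_job(seconds):
    # Stand-in for a long-running job.
    time.sleep(seconds)
    return seconds


def run_demo():
    finished = []
    with ThreadPoolExecutor(max_workers=1) as pool:
        first = pool.submit(long_job, 0.2)
        second = pool.submit(long_job, 0.2)  # queued behind `first`

        # Be notified when the job completes.
        first.add_done_callback(lambda f: finished.append(f.result()))

        # Cancel an individual job; this only succeeds while it is pending.
        cancelled = second.cancel()

        # Poll the status without blocking, then wait for the result.
        print('first done yet?', first.done())
        value = first.result()
    # Leaving the `with` block waits for the worker, so the callback has run.
    return value, cancelled, finished


if __name__ == '__main__':
    print(run_demo())
```

None of this is available through Pool.map, which is the trade-off the answer describes: more bookkeeping per job in exchange for control over each one.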

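Personal note:

I can't really think of many reasons to use Executor.map. It doesn't give you any of the features of futures, except for the ability to specify a timeout. If you're just interested in the results, you're better off using one of multiprocessing.Pool's map functions.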
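For completeness, a small sketch of the timeout feature. The timeout counts from the original call to Executor.map, and the error is raised while iterating over the results. The names slow_square and map_with_deadline are hypothetical, and ThreadPoolExecutor is again used only to keep the example light.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FuturesTimeoutError


def slow_square(n):
    # Stand-in for a job that takes a noticeable amount of time.
    time.sleep(0.2)
    return n * n


def map_with_deadline(fn, items, timeout):
    """Collect results until the deadline passes; report whether it was hit."""
    results = []
    with ThreadPoolExecutor(max_workers=1) as pool:
        try:
            for value in pool.map(fn, items, timeout=timeout):
                results.append(value)
        except FuturesTimeoutError:
            # Raised when the next result is not ready within `timeout`
            # seconds of the map() call.
            return results, True
    return results, False


if __name__ == '__main__':
    print(map_with_deadline(slow_square, [1, 2, 3], timeout=0.05))
    print(map_with_deadline(slow_square, [1, 2], timeout=5))
```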
