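dpark is Douban's clone of Spark, an efficient distributed computing framework. I installed it and ran a quick test; these are my notes.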
1 Download: git clone https://github.com/douban/dpark.git
2 Enter the dpark directory and run: python setup.py install
3 Test code, which estimates π with a Monte Carlo simulation:
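# coding: utf-8
import sys
import random

from dpark import DparkContext

dpark = DparkContext()
# Accumulator that collects the number of hits from all tasks
count = dpark.accumulator(0)


def random_once(*args, **kwargs):
    # Pick a random point in the square [-1, 1] x [-1, 1]
    x = random.random() * 2 - 1
    y = random.random() * 2 - 1
    # Count the point if it falls inside the unit circle
    if x * x + y * y < 1:
        count.add(1)


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("input args: N")
        sys.exit(1)
    N = int(sys.argv[1])
    # Split range(N) into 10 slices and run random_once once per element
    dpark.parallelize(range(0, N), 10).foreach(random_once)
    # The fraction of hits approximates pi / 4
    print("PI is roughly %s" % (4.0 * count.value / N))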
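4 Assuming the script above is saved as dpark_test.py, run: python dpark_test.py 100000. Since the points are random, the output varies slightly between runs; mine was: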
PI is roughly 3.14276
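
To sanity-check that number without dpark, the same estimate can be sketched in plain Python. A point drawn uniformly from the square [-1, 1] x [-1, 1] lands inside the unit circle with probability π/4 (circle area π over square area 4), so 4 times the fraction of hits approximates π. The function name estimate_pi and its seed parameter are my own illustrative choices, not part of dpark; this is a single-process sketch, not how dpark distributes the work.

```python
import random


def estimate_pi(n, seed=None):
    """Estimate pi from n uniform random points in the square [-1, 1] x [-1, 1]."""
    # Hypothetical helper for illustration; a private generator makes
    # seeded runs repeatable without touching the global random state.
    rng = random.Random(seed)
    inside = 0
    for _ in range(n):
        x = rng.random() * 2 - 1
        y = rng.random() * 2 - 1
        if x * x + y * y < 1:  # the point landed inside the unit circle
            inside += 1
    # P(inside) = pi / 4, so scale the observed fraction by 4
    return 4.0 * inside / n


if __name__ == "__main__":
    print("PI is roughly %s" % estimate_pi(100000))
```

With N = 100000 the standard error of this estimate is about 0.005, so a dpark result such as 3.14276 is well within the expected range.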