Project directory on GitHub: https://github.com/dmlc/cxxnet/tree/master/example/multi-machine
sudo apt-get install imagemagick
sudo apt-get install libjpeg-dev
4. Download the training and test data from: https://www.kaggle.com/c/datasciencebowl/data
Below I quote the official documentation and add my own notes to walk you through the build process:
mkdir /home/cxxnet/example/kaggle_bowl/data
python gen_train.py /home/data/bowl/train/ /home/cxxnet/example/kaggle_bowl/data/train/
python gen_test.py /home/data/bowl/test/ /home/cxxnet/example/kaggle_bowl/data/test/
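When converting the images to 48 x 48, some of the conversions fail. It turned out that this is caused by .DS_Store, a special file macOS creates to store a folder's display attributes, which makes the Python script terminate abnormally. Deleting the .DS_Store files from these folders solves the problem.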
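The cleanup can be done in one pass over the data folders. On Linux or macOS, `find /home/data/bowl -name .DS_Store -delete` does it from the shell; the Python sketch below does the same thing. The script name in the usage comment is hypothetical, and it assumes the raw images live under /home/data/bowl/train/ and /home/data/bowl/test/ as in the commands above.

```python
import os
import sys


def remove_ds_store(root):
    """Delete every .DS_Store file under root and return the paths removed."""
    removed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if ".DS_Store" in filenames:
            path = os.path.join(dirpath, ".DS_Store")
            os.remove(path)
            removed.append(path)
    return removed


if __name__ == "__main__":
    # Hypothetical usage:
    #   python remove_ds_store.py /home/data/bowl/train/ /home/data/bowl/test/
    for root in sys.argv[1:]:
        for path in remove_ds_store(root):
            print("removed", path)
```

Run it before gen_train.py and gen_test.py so that the conversion scripts only see image files.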
python gen_img_list.py train /home/data/bowl/sampleSubmission.csv data/train/ train.lst
python gen_img_list.py test /home/data/bowl/sampleSubmission.csv data/test/ test.lst
Partition the data into 8 parts.
./partition.sh train.lst ../../tools/im2bin
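To illustrate what "partition into 8 parts" means, the sketch below splits a .lst file into 8 contiguous pieces whose sizes differ by at most one line. This is an assumption-based illustration, not the contents of partition.sh: the function names and the output naming scheme (train.lst_0 to train.lst_7) are hypothetical. Because partition.sh takes the im2bin path as an argument, it presumably also packs each piece into a binary file with im2bin, which this sketch does not do.

```python
import sys


def partition_lines(lines, num_parts):
    """Split lines into num_parts contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(lines), num_parts)
    parts, start = [], 0
    for i in range(num_parts):
        # The first `extra` chunks take one additional line each.
        size = base + (1 if i < extra else 0)
        parts.append(lines[start:start + size])
        start += size
    return parts


def write_partitions(lst_path, num_parts=8):
    """Write each chunk of lst_path to its own file and return the file names."""
    with open(lst_path) as f:
        lines = f.readlines()
    names = []
    for i, chunk in enumerate(partition_lines(lines, num_parts)):
        # Hypothetical naming scheme: train.lst -> train.lst_0 ... train.lst_7
        name = "%s_%d" % (lst_path, i)
        with open(name, "w") as out:
            out.writelines(chunk)
        names.append(name)
    return names


if __name__ == "__main__" and len(sys.argv) > 1:
    n = int(sys.argv[2]) if len(sys.argv) > 2 else 8
    print("\n".join(write_partitions(sys.argv[1], n)))
```

Contiguous chunks keep the order of train.lst, so if that list happens to be sorted by class, shuffling it first would give each worker a more balanced mix of classes.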
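When running partition.sh, you may find that im2bin has not been built. In that case, make the following changes:
In the export statement of cxxnet/tools/Makefile, add -I../dmlc-core/include
Set BIN = im2bin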
Next, list the machines in a file named hosts, one per line, for example two machines:
$ cat hosts
192.168.0.111
192.168.0.112
The entries in hosts can also be host names (aliases) defined in your machine's local hosts file, for example:
$ cat hosts
wangchao
wangchao1
Further assume each machine has two GPUs, so we put dev = gpu:0,1 in bowl.conf. If mpirun is installed, then launch cxxnet on these two machines by using 2 workers and 2 servers:
./run.sh 2 2 bowl.conf
Since we are using a 4-core CPU rather than GPUs, bowl.conf needs the following change:
dev = cpu:0,1,2,3
In addition, two statements in bowl.conf need to be commented out:
# image_mean = "models/image_mean.bin"
# image_mean = "models/image_mean.bin"
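Both bowl.conf edits can also be applied with a small script. This is a minimal sketch under two assumptions: the device line starts with `dev =`, and every statement to disable starts with `image_mean`. `patch_conf` and the command-line usage are my own illustration, not part of cxxnet, and the default of 4 CPUs matches the machine used here.

```python
import re
import sys


def patch_conf(text, num_cpus=4):
    """Return bowl.conf text with dev set to CPUs 0..num_cpus-1 and image_mean lines commented out."""
    dev = "dev = cpu:" + ",".join(str(i) for i in range(num_cpus))
    out = []
    for line in text.splitlines():
        stripped = line.strip()
        indent = line[:len(line) - len(line.lstrip())]  # keep the original indentation
        if re.match(r"dev\s*=", stripped):
            out.append(indent + dev)
        elif stripped.startswith("image_mean"):
            out.append(indent + "# " + stripped)
        else:
            out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python patch_conf.py bowl.conf 4
    path = sys.argv[1]
    cpus = int(sys.argv[2]) if len(sys.argv) > 2 else 4
    with open(path) as f:
        text = f.read()
    with open(path, "w") as f:
        f.write(patch_conf(text, cpus))
```

Lines that are already commented out start with `#`, so running the script a second time leaves them unchanged.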
Finally, to get it running you also need to set the following environment variables in .bashrc:
export PATH=/root/mpich-install/bin:$PATH
export LD_LIBRARY_PATH=/opt/OpenBLAS/lib:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=/root/cxxnet/ps-lite/deps/lib:$LD_LIBRARY_PATH
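Before rerunning, it can help to confirm that the current shell actually sees these settings, since .bashrc changes only take effect in new shells or after `source ~/.bashrc`. The sketch below is a minimal check; `check_environment` and `missing_lib_dirs` are hypothetical helpers, and the directory list simply mirrors the paths above, so adjust it to your own install locations. Because mpirun starts processes on every machine in hosts, the same check probably needs to pass on each of them.

```python
import os
import shutil

# Mirrors the .bashrc lines above; adjust to your own install locations.
REQUIRED_LIB_DIRS = [
    "/opt/OpenBLAS/lib",
    "/root/cxxnet/ps-lite/deps/lib",
]


def missing_lib_dirs(env, required=REQUIRED_LIB_DIRS):
    """Return the required directories not listed in env['LD_LIBRARY_PATH']."""
    entries = [os.path.normpath(e)
               for e in env.get("LD_LIBRARY_PATH", "").split(os.pathsep) if e]
    return [d for d in required if os.path.normpath(d) not in entries]


def check_environment(env=None):
    """Return a list of human-readable problems; an empty list means the checks passed."""
    env = dict(os.environ) if env is None else env
    problems = []
    if shutil.which("mpirun", path=env.get("PATH")) is None:
        problems.append("mpirun not found on PATH")
    for d in missing_lib_dirs(env):
        problems.append("LD_LIBRARY_PATH is missing " + d)
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("WARNING:", problem)
```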
Run it again:
./run.sh 2 2 bowl.conf
Did it work this time?