Realtime_Multi-Person_Pose_Estimation training pitfalls

阴凯歌

2023-12-01

Preface

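I have recently been working through the training and retraining process for Realtime_Multi-Person_Pose_Estimation.
Reference: https://blog.csdn.net/qq_38469553/article/details/82119292

Also see the official GitHub repository: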
https://github.com/ZheC/Realtime_Multi-Person_Pose_Estimation

Getting started

1）Run cd training; bash getData.sh to obtain the COCO images in dataset/COCO/images/, keypoints annotations in dataset/COCO/annotations/ and COCO official toolbox in dataset/COCO/coco/.

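Opening getData.sh shows that it sets up the folders, downloads the COCO API, and downloads the COCO 2014 dataset, about 40 GB in total. If the first step is already this heavy, what will the rest be like? 0.0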

2）Run getANNO.m in matlab to convert the annotation format from json to mat in dataset/COCO/mat/.
This has to run inside MATLAB, so I also had to download MATLAB for Linux.
Reference: https://blog.51cto.com/ajxiaocainiao/2307618
That is another 10 GB or so. Sigh.

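PS: After the download finishes, install MATLAB. When the installer reaches the second ISO, you must open a new terminal at that point and mount the ISO from there; opening the terminal and mounting in advance causes an error.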

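After installing and activating MATLAB, go to the project's training directory, launch matlab, and run getANNO inside MATLAB.
This generates the following files in dataset/COCO/mat/:
coco_kpt.mat coco_val.mat
They are converted from person_keypoints_train2014.json and person_keypoints_val2014.json in dataset/COCO/annotations/.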

3）Run genCOCOMask.m in matlab to obtain the mask images for unlabeled person. You can use 'parfor' in matlab to speed up the code.
Run genCOCOMask in MATLAB.

Error1:

Undefined function or variable 'maskApiMex'.

Fix:
Go to dataset/COCO/coco/MatlabAPI
and start matlab there.
In MATLAB, run: mex('CFLAGS=\$CFLAGS -Wall -std=c99','-largeArrayDims','private/maskApiMex.c','../common/maskApi.c','-I../common/','-outdir','private');
When it finishes it prints
MEX completed successfully.
After that, genCOCOMask runs normally.

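PS: The dataset is very large, and this part of the pipeline seems to be mainly about parsing the COCO annotations, so you can shrink the dataset yourself (both the images and the JSON) to make the data-processing flow easier to study at first. After I cut it down to 2000 training images and 300 validation images, everything ran much faster. This step generates many mask images. You will notice that many images in the dataset are filtered out, and the masks that are generated all belong to images containing people.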
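The shrinking step can be sketched in Python as below. This is only one possible way to do it, not necessarily how the subset above was produced. It relies on the standard COCO annotation layout (top-level "images" and "annotations" lists linked by "image_id"); the function names subset_coco and subset_file are made up for this example.

```python
import json


def subset_coco(coco, max_images):
    """Keep the first `max_images` images that have at least one annotation."""
    annotated_ids = {ann["image_id"] for ann in coco["annotations"]}
    kept_images = [img for img in coco["images"]
                   if img["id"] in annotated_ids][:max_images]
    kept_ids = {img["id"] for img in kept_images}
    kept_anns = [ann for ann in coco["annotations"]
                 if ann["image_id"] in kept_ids]
    # Copy every other top-level key (info, licenses, categories) unchanged.
    result = {k: v for k, v in coco.items()
              if k not in ("images", "annotations")}
    result["images"] = kept_images
    result["annotations"] = kept_anns
    return result


def subset_file(src_path, dst_path, max_images):
    """Read a COCO annotation file and write a reduced copy (hypothetical helper)."""
    with open(src_path) as f:
        coco = json.load(f)
    with open(dst_path, "w") as f:
        json.dump(subset_coco(coco, max_images), f)
```

For example, subset_file could be called once with 2000 for person_keypoints_train2014.json and once with 300 for person_keypoints_val2014.json. Keep a backup of the original files, and remember that the image folders have to be trimmed to match if disk space is the concern.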

4）Run genJSON('COCO') to generate a json file in dataset/COCO/json/ folder. The json files contain raw information needed for training.

In MATLAB, enter:
genJSON('COCO')
This produces a json file under dataset/COCO/json/ with roughly the following structure:

{"root":[
{
                        "dataset": "COCO",
                        "isValidation": 0.000,
                        "img_paths": "train2014/COCO_train2014_000000000036.jpg",
                        "img_width": 481.000,
                        "img_height": 640.000,
                        "objpos": [322.885,395.485],
                        "image_id": 36.000,
                        "bbox": [167.580,162.890,310.610,465.190],
                        "segment_area": 86145.297,
                        "num_keypoints": 13.000,

                        "joint_self": [
                                [250.000,244.000,1.000],
                                [265.000,223.000,1.000],
                                [235.000,235.000,1.000],
                                [309.000,227.000,1.000],
                                [235.000,253.000,1.000],
                                [355.000,337.000,1.000],
                                [215.000,342.000,1.000],
                                [407.000,494.000,1.000],
                                [213.000,520.000,1.000],
                                [445.000,617.000,1.000],
                                [244.000,447.000,1.000],
                                [338.000,603.000,1.000],
                                [267.000,608.000,1.000],
                                [0.000,0.000,2.000],
                                [0.000,0.000,2.000],
                                [0.000,0.000,2.000],
                                [0.000,0.000,2.000]
                        ],
                        "scale_provided": 1.264,
                        "joint_others": [],
                        "annolist_index": 1.000,
                        "people_index": 1.000,
                        "numOtherPeople": 0.000,
                        "scale_provided_other": {
                                "_ArrayType_": "double",
                                "_ArraySize_": [0,0],
                                "_ArrayData_": null
                        },
                        "objpos_other": {
                                "_ArrayType_": "double",
                                "_ArraySize_": [0,0],
                                "_ArrayData_": null
                        },
                        "bbox_other": {
                                "_ArrayType_": "double",
                                "_ArraySize_": [0,0],
                                "_ArrayData_": null
                        },
                        "segment_area_other": {
                                "_ArrayType_": "double",
                                "_ArraySize_": [0,0],
                                "_ArrayData_": null
                        },
                        "num_keypoints_other": {
                                "_ArrayType_": "double",
                                "_ArraySize_": [0,0],
                                "_ArrayData_": null
                        }
},

{},
...
]}
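To get a feel for these records, a small Python helper can summarize one entry. This is a sketch based only on the sample above: in that sample the rows of joint_self whose third value is 2 have coordinates (0, 0), and the number of remaining rows equals num_keypoints, so the code treats 2 as "not labeled". That reading is inferred from the sample, and the function names are made up for illustration.

```python
def labeled_joints(record):
    """Return (index, x, y) for joints whose third value is not 2."""
    return [(i, x, y)
            for i, (x, y, flag) in enumerate(record["joint_self"])
            if flag != 2]


def summarize(record):
    """Collect a few fields of one record into a plain dict."""
    joints = labeled_joints(record)
    return {
        "image": record["img_paths"],
        "is_validation": bool(record["isValidation"]),
        "labeled": len(joints),
        # Sanity check: labeled rows should agree with num_keypoints.
        "matches_num_keypoints": len(joints) == int(record["num_keypoints"]),
    }
```

Assuming the file parses as JSON with a top-level "root" list as sketched above, the records could be loaded with json.load(f)["root"] and passed to summarize one by one.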

5）Run python genLMDB.py to generate your LMDB. (You can also download our LMDB for the COCO dataset (189GB file) by: bash get_lmdb.sh)
That LMDB is rather large; luckily my dataset is now much smaller.
Before running, change the paths to your own:

elif "COCO" in data[idx]['dataset']:
			path_header = 'xxxx/training/dataset/COCO/images/'

...

if __name__ == "__main__":
	#writeLMDB(['MPI'], '/home/zhecao/MPI_pose/lmdb', 1)
	writeLMDB(['COCO'], 'xxxx/training/dataset/COCO/lmdb', 1)

Then run python3 genLMDB.py from the shell.

Error1:
No module named 'lmdb'
Install it with pip install lmdb

Error2:
No module named 'caffe'
This seems to need the pycaffe built in the next step, so do step 6 first.
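Both errors above are plain import failures, so they can be caught before starting a long run. A minimal sketch (the helper name missing_modules is made up for this example; it works on both Python 2 and Python 3):

```python
def missing_modules(names):
    """Return the names from `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            __import__(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # The two modules that failed to import above.
    print(missing_modules(["lmdb", "caffe"]))
```

Note that caffe is only importable once the directory containing pycaffe is on sys.path, so the check for caffe should run after that path has been added.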

6)Download our modified caffe: caffe_train. Compile pycaffe. It will be merged with caffe_rtpose (for testing) soon.
Build this modified caffe:
make -j4

Error1:

src/caffe/cpm_data_transformer.cpp:4:39: fatal error: opencv2/contrib/contrib.hpp: No such file or directory
 #include <opencv2/contrib/contrib.hpp>

Fix:

vim src/caffe/cpm_data_transformer.cpp

Then comment out the line

#include <opencv2/contrib/contrib.hpp>

and rebuild.

Error2:

build_release/src/caffe/proto/caffe.pb.h:12:2: error: #error This file was generated by a newer version of protoc which is

This is reportedly a protoc version conflict.
Running whereis protoc did indeed show several copies.

Fix:

In the Makefile, change these two lines:
$(Q)protoc --proto_path=$(PROTO_SRC_DIR) --cpp_out=$(PROTO_BUILD_DIR) $<
$(Q)protoc --proto_path=$(PROTO_SRC_DIR) --python_out=$(PY_PROTO_BUILD_DIR) $<
to
$(Q)/usr/bin/protoc --proto_path=$(PROTO_SRC_DIR) --cpp_out=$(PROTO_BUILD_DIR) $<
$(Q)/usr/bin/protoc --proto_path=$(PROTO_SRC_DIR) --python_out=$(PY_PROTO_BUILD_DIR) $<

That is, replace the bare "protoc" at the start with a full path (/usr/bin/protoc is the path of the version I wanted to use).

Note: this change does not affect the system default protoc; the chosen protoc is only used when building caffe.
make clean
make -j4 then no longer shows this error.
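To see which protoc copies are on PATH and what version each reports, something like the following Python sketch can help when deciding which path to hard-code. The function names are made up for this example; the only external command used is protoc --version.

```python
import os
import subprocess


def find_executables(name, path=None):
    """Return every executable called `name` found on PATH, in search order."""
    raw = path if path is not None else os.environ.get("PATH", "")
    found = []
    for directory in raw.split(os.pathsep):
        candidate = os.path.join(directory, name)
        if (directory and os.path.isfile(candidate)
                and os.access(candidate, os.X_OK)
                and candidate not in found):
            found.append(candidate)
    return found


def protoc_versions():
    """Map each protoc on PATH to the text printed by `protoc --version`."""
    versions = {}
    for exe in find_executables("protoc"):
        out = subprocess.check_output([exe, "--version"])
        versions[exe] = out.decode().strip()
    return versions


if __name__ == "__main__":
    for exe, version in protoc_versions().items():
        print(exe, "->", version)
```

The first entry returned by find_executables is the one a bare protoc in the Makefile would pick up, which is why a mismatch with the installed protobuf headers can go unnoticed.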

Error3:

 fatal error: driver_types.h: No such file or directory

Change CUDA_DIR in Makefile.config to your own CUDA path. The project's CUDA version is only 7.5, while mine is 10.0.

Error4:

error: too few arguments to function ‘cudnnStatus_t cudnnSetConvolution2dDescriptor(cudnnConvolutionDescriptor_t,

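The cuDNN versions differ, so the call no longer matches the function signature.
Replace the corresponding file in the tree being compiled with include/caffe/util/cudnn.hpp from the official version of caffe.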

Error5:

hdf5_data_layer.cpp:13:18: fatal error: hdf5.h: No such file or directory

Fix:

First install the package: sudo apt-get install libhdf5-serial-dev
Step 1

On line 85 of Makefile.config, add /usr/include/hdf5/serial/ to INCLUDE_DIRS, that is, change the first line below into the second.

INCLUDE_DIRS := $(PYTHON_INCLUDE) /usr/local/include

INCLUDE_DIRS := $(PYTHON_INCLUDE) /usr/local/include /usr/include/hdf5/serial/

Step 2

On line 173 of the Makefile, change hdf5_hl and hdf5 to hdf5_serial_hl and hdf5_serial, that is, change the first line below into the second.

LIBRARIES += glog gflags protobuf boost_system boost_filesystem m hdf5_hl hdf5


LIBRARIES += glog gflags protobuf boost_system boost_filesystem m hdf5_serial_hl hdf5_serial

Error6:

Unsupported gpu architecture 'compute_20'

Fix:
The latest caffe Makefile.config contains the comment # For CUDA >= 9.0, comment the *_20 and *_21 lines for compatibility. Mine is 10.0, so the *_20 and *_21 lines have to be commented out.

Error7:

/usr/bin/x86_64-linux-gnu-ld: cannot find -lopencv_contrib
My OpenCV installation does not have this library, so remove opencv_contrib from the Makefile.

Error8:

undefined reference to `cv::imwrite(cv::String const&, cv::_InputArray const&, std::vector<int, std::allocator<int> > const&)'
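Print the linker flags from the Makefile with
$(warning >>>>>>>>>>>>>>>>>>>> LDFLAGS :$(LDFLAGS))
and check the library dependencies; conflicting .so files are a likely cause. Mine showed that the 4.1.0 library files were being used. I switched to the 3.2 version that was already on the system, which is closer to the author's development environment, removed the redundant dependencies, then ran make clean
and make -j4, and the build succeeded.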

Next run make pycaffe
You should see the following, with no errors this time:

CXX/LD -o python/caffe/_caffe.so python/caffe/_caffe.cpp
touch python/caffe/proto/__init__.py
PROTOC (python) src/caffe/proto/caffe.proto

The generated pycaffe is in python/caffe

Update the caffe path in genLMDB.py to your own, then go back to step 5.
Error1:

ImportError: dynamic module does not define module export function (PyInit__caffe)
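Reportedly this only works with an older Python version. Exhausting. I set up a Python 2.7 environment with conda; the prompts to install other packages are not listed here. If conda cannot install something, use pip.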
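The symbol name in this error hints at the cause. CPython 3 looks for an init function named PyInit_<module> in an extension module, while CPython 2 looks for init<module>, so a _caffe.so built against Python 2 headers cannot be imported by python3. A minimal sketch of that naming rule (the helper name is made up for illustration):

```python
import sys


def expected_init_symbol(module_name, major=None):
    """Name of the init function CPython looks up in an extension module."""
    if major is None:
        major = sys.version_info[0]
    prefix = "PyInit_" if major >= 3 else "init"
    return prefix + module_name


if __name__ == "__main__":
    # Shows what the running interpreter expects from _caffe.so.
    print(expected_init_symbol("_caffe"))
```

Comparing this with the output of nm -D python/caffe/_caffe.so | grep -i init shows which interpreter the library was actually built for. Switching to Python 2.7, as done here, is one way out; rebuilding pycaffe against Python 3 would be the other, which was not tried here.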

Error2:

/usr/lib/libgdal.so.20: undefined symbol: sqlite3_column_table_name
Install one more package in the environment:
conda install gdal

Error3:

No module named google.protobuf.internal
conda install protobuf

Error4:

TypeError: 'NoneType' object has no attribute '__getitem__'
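Finally an error that is not about the environment. After some debugging I found that the mask2014 folder path did not match the one used by genCOCOMask, so the data could not be read. Change either one so that they agree; here I simply moved mask2014 into images.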
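A NoneType error like this is typical when an image read fails silently: cv2.imread returns None instead of raising when a file cannot be read, and the failure only shows up later when the result is indexed. Assuming the script reads its images and masks that way, a quick existence check before the run makes the real problem obvious. This is a sketch; the helper name and the way the relative paths are collected are assumptions for this example.

```python
import os


def missing_files(path_header, relative_paths):
    """Return the full paths under `path_header` that are not existing files."""
    missing = []
    for rel in relative_paths:
        full = os.path.join(path_header, rel)
        if not os.path.isfile(full):
            missing.append(full)
    return missing
```

The relative paths would be the image and mask file names exactly as genLMDB.py builds them, and path_header the value set in the script. An empty result means every file the script will try to read is where it expects it.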

When the run completes it produces data.mdb and lock.mdb

8)Run python setLayers.py --exp 1 to generate the prototxt and shell file for training.
This step can be skipped, because the official repository provides example_proto.
Edit the relevant places in it to run the training.
