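ZC: Similar to the earlier articles; not much worth reading.
ZC: Mainly read this one.
ZC: (20190905) Training is really slow... The largest step count it specifies is 200k. The article says: "On the TensorBoard main page you can see that the loss drops quickly before 20k steps and fluctuates very little afterwards. After a night of training it is now close to 80k steps with an mAP of 67%, so we can pretty much stop training." But it took me a bit more than half a day to train just 1000 steps (i5 9400F)... Giving up on this for now... (using ssd_mobilenet_v1_coco_2017_11_17)
2.1、Some notes: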
1、Home
In cmd, change to the directory: G:\Tensorflow\models_copy\research
Run the commands:
python object_detection/dataset_tools/zz_create_pascal_tf_record_4raccoon.py --data_dir=G:/Tensorflow_dataset/raccoon_dataset/images --set=G:/Tensorflow_dataset/raccoon_dataset/train.txt --output_path=G:/Tensorflow_dataset/raccoon_dataset/train.record --label_map_path=G:/Tensorflow_dataset/raccoon_dataset/raccoon_label_map.pbtxt --annotations_dir=G:/Tensorflow_dataset/raccoon_dataset/annotations
python object_detection/dataset_tools/zz_create_pascal_tf_record_4raccoon.py --data_dir=G:/Tensorflow_dataset/raccoon_dataset/images --set=G:/Tensorflow_dataset/raccoon_dataset/val.txt --output_path=G:/Tensorflow_dataset/raccoon_dataset/val.record --label_map_path=G:/Tensorflow_dataset/raccoon_dataset/raccoon_label_map.pbtxt --annotations_dir=G:/Tensorflow_dataset/raccoon_dataset/annotations
python model_main.py --logtostderr --pipeline_config_path=G:/Tensorflow_dataset/raccoon_dataset/ssd_mobilenet_v1_raccoon.config --train_dir=G:/Tensorflow_dataset/raccoon_dataset/train
1.1、
ModuleNotFoundError: No module named 'pycocotools'
python -m pip install pycocotools
This fails with an error; the package cannot be installed.
Installing Git on Windows 10 - 勿忘初心's blog - CSDN blog (https://blog.csdn.net/qq_32786873/article/details/80570783)
pip install git+https://github.com/philferriere/cocoapi.git#subdirectory=PythonAPI
ZC: The command above worked for me, so I did not try the one below...
pip3 install "git+https://github.com/philferriere/cocoapi.git#egg=pycocotools&subdirectory=PythonAPI"
1.2、ModuleNotFoundError: No module named 'nets'
ModuleNotFoundError: No module named 'nets' - qq_37644877's blog - CSDN blog (https://blog.csdn.net/qq_37644877/article/details/92772820)
1.2.1、
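error: could not create 'build': 当文件已存在时,无法创建该文件。 (Cannot create a file when that file already exists.)
The cause is that the cloned repository contains a file named BUILD, while the build and install commands need to create a folder named build. Windows file names are case-insensitive, so the two names collide. The purpose of the BUILD file is unclear for now (it appears to be a Bazel build definition). Move that file to another directory and rerun the commands, and the install succeeds.
Fixing ImportError: No module named 'nets' when using the TensorFlow-slim library on Windows - Orankarl's blog - CSDN blog (https://blog.csdn.net/lgczym/article/details/79272579)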
1.3、
python model_main.py --logtostderr --pipeline_config_path=G:/Tensorflow_dataset/raccoon_dataset/ssd_mobilenet_v1_raccoon.config --train_dir=G:/Tensorflow_dataset/raccoon_dataset/train
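Running the command above fails with an error.
ZC: The work machine does not seem to have this problem; I can already see it training there.
ZC: I remember searching yesterday: it was said to be an insufficient-permissions problem, with the suggestion to downgrade to the GPU build of Tensorflow1.2. Could it be that CMD was not given enough privileges??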
2、Work
python object_detection/dataset_tools/zz_create_pascal_tf_record_4raccoon.py --data_dir=G:/Tensorflow_dataset/raccoon_dataset/images --set=G:/Tensorflow_dataset/raccoon_dataset/train.txt --output_path=G:/Tensorflow_dataset/raccoon_dataset/train.record --label_map_path=G:/Tensorflow_dataset/raccoon_dataset/raccoon_label_map.pbtxt --annotations_dir=G:/Tensorflow_dataset/raccoon_dataset/annotations
In the directory "G:\TensorFlow_ZZ\models_copy\research\object_detection":
python eval_util_test.py ^
--logtostderr ^
--pipeline_config_path=G:/Tensorflow_dataset/raccoon_dataset/ssd_mobilenet_v1_raccoon.config ^
--checkpoint_dir=G:/Tensorflow_dataset/raccoon_dataset/train ^
--eval_dir=G:/Tensorflow_dataset/raccoon_dataset/eval
ZC: The "^" line continuation does not seem to work...
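One possible reason the "^" continuation fails: in cmd.exe the caret only continues a line when it is the very last character on that line (a trailing space after it ends the command), and it is not a continuation character in PowerShell at all. A way to sidestep the issue is to build the argument list in Python and launch it with subprocess. The sketch below is only an illustration; the helper names build_eval_command and run_eval are made up here, and the script name and flags are copied from the commands in these notes.

```python
import subprocess
import sys


def build_eval_command(config_path, checkpoint_dir, eval_dir,
                       script="legacy/eval.py"):
    """Return the evaluation command as an argument list.

    Passing a list to subprocess avoids shell quoting and line
    continuation characters entirely.
    """
    return [
        sys.executable, script,
        "--logtostderr",
        f"--pipeline_config_path={config_path}",
        f"--checkpoint_dir={checkpoint_dir}",
        f"--eval_dir={eval_dir}",
    ]


def run_eval(cmd, cwd):
    """Run the command from the object_detection directory (hypothetical helper)."""
    return subprocess.run(cmd, cwd=cwd, check=True)


if __name__ == "__main__":
    base = "G:/Tensorflow_dataset/raccoon_dataset"
    cmd = build_eval_command(
        config_path=f"{base}/ssd_mobilenet_v1_raccoon.config",
        checkpoint_dir=f"{base}/train",
        eval_dir=f"{base}/eval",
    )
    # Only print the command here; call run_eval(cmd, cwd=...) to launch it.
    print(" ".join(cmd))
```

Printing the joined command first makes it easy to compare against the single-line commands recorded in these notes before actually running anything.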
python eval.py --logtostderr --pipeline_config_path=G:/Tensorflow_dataset/raccoon_dataset/ssd_mobilenet_v1_raccoon.config --checkpoint_dir=G:/Tensorflow_dataset/raccoon_dataset/train --eval_dir=G:/Tensorflow_dataset/raccoon_dataset/eval
ZC: It seems "legacy/eval.py" should be used instead?? As follows:
python legacy/eval.py --logtostderr --pipeline_config_path=G:/Tensorflow_dataset/raccoon_dataset/ssd_mobilenet_v1_raccoon.config --checkpoint_dir=G:/Tensorflow_dataset/raccoon_dataset/train --eval_dir=G:/Tensorflow_dataset/raccoon_dataset/eval
python legacy/eval.py --logtostderr --pipeline_config_path=G:/Tensorflow_dataset/raccoon_dataset/ssd_mobilenet_v1_raccoon.config --checkpoint_dir=C:/Users/Administrator/AppData/Local/Temp/tmpqui5qdpk --eval_dir=G:/Tensorflow_dataset/raccoon_dataset/eval
tensorboard --logdir=G:/Tensorflow_dataset/raccoon_dataset/
2.1、
ModuleNotFoundError: No module named 'object_detection'
At the prompt "G:\TensorFlow_ZZ\models_copy\research>", run the command:
python setup.py install
2.2、
pip3 install "git+https://github.com/philferriere/cocoapi.git#egg=pycocotools&subdirectory=PythonAPI"
Used this at work; no response for a long time...
c:\users\admini~1\appdata\local\temp\pip-install-63hmithl\pycocotools
Damn, at work this one is painfully slow too...
pip install git+https://github.com/philferriere/cocoapi.git#subdirectory=PythonAPI
Damn, out of options; copied the repository to gitee ...
pip3 install "git+https://gitee.com/zclxy/cocoapi.git#egg=pycocotools&subdirectory=PythonAPI"
Not slow... installed successfully.
2.3、
Error during training:
File "C:\Program Files\Python37\lib\site-packages\object_detection-0.1-py3.7.egg\object_detection\eval_util.py", line 896, in get_evaluators
raise ValueError('Metric not found: {}'.format(eval_metric_fn_key))
ValueError: Metric not found: pascal_voc_metrics
ZC: G:\Tensorflow_dataset\raccoon_dataset\ssd_mobilenet_v1_raccoon.config specifies "pascal_voc_metrics".
ZC: In the questions below the article, someone said to simply disable this line. I tried commenting that line out in "ssd_mobilenet_v1_raccoon.config" (as I recall, the original file (object_detection/samples/configs/ssd_mobilenet_v1_coco.config) does not specify it either).
ZC: See also this article: "(Tensorflow Object detection Api) Installation - ljyt2's blog - CSDN blog (https://blog.csdn.net/ljyt2/article/details/82143904)" (I found it while searching for "eval_util.py").
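The workaround above (commenting out the metrics line) can be scripted. The following is a minimal sketch, assuming the config contains a line of the form metrics_set: "pascal_voc_metrics"; the function name comment_out_metrics_set and the sample config text are made up for illustration. As far as I know, newer versions of eval_util.py register the PASCAL metric under the name "pascal_voc_detection_metrics", so renaming may be an alternative, but that is an assumption to check against the installed eval_util.py.

```python
import re


def comment_out_metrics_set(config_text, metric_name="pascal_voc_metrics"):
    """Prefix each active `metrics_set: "<metric_name>"` line with '# '.

    Lines that are already commented out are left untouched.
    """
    pattern = re.compile(
        r'^([ \t]*)(metrics_set[ \t]*:[ \t]*"' + re.escape(metric_name) + r'".*)$',
        re.MULTILINE,
    )
    return pattern.sub(r"\1# \2", config_text)


# Hypothetical fragment of an eval_config block, for demonstration only.
sample = '''eval_config: {
  num_examples: 40
  metrics_set: "pascal_voc_metrics"
}
'''

print(comment_out_metrics_set(sample))
```

With the line commented out, the evaluation falls back to whatever default metric the installed version uses; the log later in these notes shows COCO-style metrics being reported.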
2.2、20190905 shortly before 16:00 --> 20190906 08:18: about 16 hours of training, reaching step 3653. The last log records are as follows:
I0906 08:08:33.096599 5416 estimator.py:2099] Saving 'checkpoint_path' summary for global step 3618: C:\Users\ADMINI~1\AppData\Local\Temp\tmpulzzhj3v\model.ckpt-3618
I0906 08:18:19.660601 5416 basic_session_run_hooks.py:606] Saving checkpoints for 3653 into C:\Users\ADMINI~1\AppData\Local\Temp\tmpulzzhj3v\model.ckpt.
I0906 08:18:21.548201 5416 estimator.py:1145] Calling model_fn.
I0906 08:18:22.697801 5416 convolutional_box_predictor.py:151] depth of additional conv before box predictor: 0
I0906 08:18:22.729001 5416 convolutional_box_predictor.py:151] depth of additional conv before box predictor: 0
I0906 08:18:22.759601 5416 convolutional_box_predictor.py:151] depth of additional conv before box predictor: 0
I0906 08:18:22.776201 5416 convolutional_box_predictor.py:151] depth of additional conv before box predictor: 0
I0906 08:18:22.807401 5416 convolutional_box_predictor.py:151] depth of additional conv before box predictor: 0
I0906 08:18:22.823001 5416 convolutional_box_predictor.py:151] depth of additional conv before box predictor: 0
I0906 08:18:24.036201 5416 estimator.py:1147] Done calling model_fn.
I0906 08:18:24.051801 5416 evaluation.py:255] Starting evaluation at 2019-09-06T08:18:24Z
I0906 08:18:24.254601 5416 monitored_session.py:240] Graph was finalized.
I0906 08:18:24.254601 5416 saver.py:1280] Restoring parameters from C:\Users\ADMINI~1\AppData\Local\Temp\tmpulzzhj3v\model.ckpt-3653
I0906 08:18:24.535401 5416 session_manager.py:500] Running local_init_op.
I0906 08:18:24.597801 5416 session_manager.py:502] Done running local_init_op.
I0906 08:18:36.455401 940 coco_evaluation.py:205] Performing evaluation on 40 images.
creating index...
index created!
I0906 08:18:36.455401 940 coco_tools.py:115] Loading and preparing annotation results...
I0906 08:18:36.455401 940 coco_tools.py:137] DONE (t=0.00s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=0.19s).
Accumulating evaluation results...
DONE (t=0.02s).
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.157
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.395
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.101
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.180
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.267
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.360
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.424
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.490
I0906 08:18:36.689401 5416 evaluation.py:275] Finished evaluation at 2019-09-06-08:18:36
I0906 08:18:36.689401 5416 estimator.py:2039] Saving dict for global step 3653: DetectionBoxes_Precision/mAP = 0.1565868, DetectionBoxes_Precision/mAP (large) = 0.1803251, DetectionBoxes_Precision/mAP (medium) = 0.0, DetectionBoxes_Precision/mAP (small) = -1.0, DetectionBoxes_Precision/mAP@.50IOU = 0.39549702, DetectionBoxes_Precision/mAP@.75IOU = 0.10103414, DetectionBoxes_Recall/AR@1 = 0.26666668, DetectionBoxes_Recall/AR@10 = 0.36, DetectionBoxes_Recall/AR@100 = 0.42444444, DetectionBoxes_Recall/AR@100 (large) = 0.4897436, DetectionBoxes_Recall/AR@100 (medium) = 0.0, DetectionBoxes_Recall/AR@100 (small) = -1.0, Loss/classification_loss = 6.1131063, Loss/localization_loss = 2.1550155, Loss/regularization_loss = 0.45052153, Loss/total_loss = 8.718643, global_step = 3653, learning_rate = 0.004, loss = 8.718643
I0906 08:18:36.689401 5416 estimator.py:2099] Saving 'checkpoint_path' summary for global step 3653: C:\Users\ADMINI~1\AppData\Local\Temp\tmpulzzhj3v\model.ckpt-3653
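A note on the -1.000 values in the table above: in the COCO evaluation, -1 means there were no ground-truth boxes in that size category, not that detection failed. The categories are defined by pixel area, with thresholds at 32x32 and 96x96. The sketch below shows that bucketing, with boundary values assigned to the larger bucket for simplicity; the function name coco_area_bucket and the example boxes are made up for illustration and are not taken from the raccoon dataset.

```python
def coco_area_bucket(width, height):
    """Classify a box by pixel area using the COCO small/medium/large thresholds."""
    area = width * height
    if area < 32 ** 2:
        return "small"
    if area < 96 ** 2:
        return "medium"
    return "large"


# Hypothetical (width, height) ground-truth boxes in pixels.
boxes = [(20, 20), (50, 60), (300, 250)]

counts = {"small": 0, "medium": 0, "large": 0}
for w, h in boxes:
    counts[coco_area_bucket(w, h)] += 1

print(counts)
```

Running the same count over the validation annotations would show whether the 40 evaluation images really contain no small boxes, which is what the -1.000 entries suggest.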
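ZC: I did a rough calculation; the speed is about the same as when using ssd_mobilenet_v1_coco_2017_11_17... Roughly 10 minutes per 36 steps, so the 3600-odd steps in "2.2" taking about 16 hours is also about right.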
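The rough calculation can be written down as a small sketch. It assumes the rate stays constant at the observed 36 steps per 10 minutes (which includes the periodic evaluation overhead); the function name hours_for_steps is made up here. At that rate 3653 steps come to about 16.9 hours, and the configured 200k steps would take roughly 38.6 days.

```python
def hours_for_steps(steps, steps_per_interval=36, interval_minutes=10):
    """Estimate wall-clock hours for `steps` at a constant measured rate."""
    return steps / steps_per_interval * interval_minutes / 60


print(f"3653 steps:   {hours_for_steps(3653):.1f} h")
print(f"200000 steps: {hours_for_steps(200_000) / 24:.1f} days")
```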
ZC: The difference between "2.1" and "2.2": in "2.2", these 2 lines in the file ssd_mobilenet_v1_raccoon.config are commented out, as follows:
# fine_tune_checkpoint: "G:/TensorFlow_ZZ/models_copy/research/object_detection/ssd_mobilenet_v1_coco_2017_11_17/model.ckpt"
# from_detection_checkpoint: true
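In "2.1", the 2 lines above are enabled.
According to what I found online: with the 2 lines enabled, training continues from someone else's pre-trained checkpoint (ssd_mobilenet_v1_coco_2017_11_17/model.ckpt); with the 2 lines commented out, the pre-trained checkpoint is not used. (But judging by the training time and speed there seems to be no difference... I am starting to doubt this claim...) Note: the initial weights do not change the cost of a single training step, so equal speed is expected either way; any difference would show up in the loss and mAP at the same step.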
3、Also read this one
4、
5、(20190905) References: (Baidu search for "TensorFlow训练 自己物体检测器", i.e. "TensorFlow train your own object detector")
5.1、Training your own model with the tensorflow object detection API - supplement (code, data annotation tool, training data, test data) - 开到荼蘼's blog - CSDN blog (https://blog.csdn.net/zj1131190425/article/details/84997740)
Training your own object detection model with the tensorflow object detection API (Part 3) - 开到荼蘼's blog - CSDN blog (https://blog.csdn.net/zj1131190425/article/details/80778888)
ZC: Read this author's article series first.
5.2、Training your own object detection model: tensorflow+win7 - twinkle_star1314's blog - CSDN blog (https://blog.csdn.net/twinkle_star1314/article/details/88980551)
ZC: Uses the coco config file, but apparently does not use ssd_mobilenet_v1_coco_2017_11_17.
5.3、Using TensorFlow object detection to train your own model for object recognition - zk_ken's blog - CSDN blog (https://blog.csdn.net/zk_ken/article/details/80826835)
5.4、Win10 Tensorflow: training your own object detector - 简书 (https://www.jianshu.com/p/d9fb1b3799b9)
6、
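7、Baidu search for "tensorflow object detect 训练 中断" ("tensorflow object detect training interrupted")
Excerpt: "During training you can press Ctrl+C at any moment to interrupt it; running the above code again afterwards resumes from the point of interruption instead of starting from scratch (unless all training output files have been deleted)."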
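Resuming works because the training job restores the newest checkpoint it finds in its output directory, so it matters that the same directory is used on the next run. The log earlier in these notes shows checkpoints being written under a Temp\tmp... folder, which as far as I know is what the Estimator does when no model directory is set; in that case a rerun would likely start in a fresh temporary folder. The sketch below only inspects a directory for files named model.ckpt-<step>.index and reports the highest step; the function name latest_checkpoint_step is made up here, and TensorFlow's own tf.train.latest_checkpoint is the real API for this.

```python
import os
import re
import tempfile

# TF1 checkpoints are saved as model.ckpt-<step>.index/.meta/.data-* files.
CKPT_INDEX = re.compile(r"^model\.ckpt-(\d+)\.index$")


def latest_checkpoint_step(model_dir):
    """Return the highest step with a model.ckpt-<step>.index file, or None."""
    steps = []
    for name in os.listdir(model_dir):
        match = CKPT_INDEX.match(name)
        if match:
            steps.append(int(match.group(1)))
    return max(steps) if steps else None


# Demo on a throwaway directory holding empty, hypothetical checkpoint files.
with tempfile.TemporaryDirectory() as demo_dir:
    for step in (3618, 3653):
        open(os.path.join(demo_dir, f"model.ckpt-{step}.index"), "w").close()
    print(latest_checkpoint_step(demo_dir))
```

Pointing this at the directory the training command writes to is a quick way to confirm where a rerun would resume from before starting a long job.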
8、
9、