tensorrtx/retinaface
Accelerating YOLOv5 inference with TensorRT (Part 1)
Accelerating YOLOv5 inference with TensorRT (Part 2)
## System environment
Environment
Operating System + Version: Ubuntu + 16.04
TensorRT Version: 7.1.3.4
GPU Type: GeForce GTX 1650, 4GB
Nvidia Driver Version: 470.63.01
CUDA Version: 10.2.300
CUDNN Version: 7.6.5
Python Version (if applicable): 3.7.3
Anaconda Version:4.10.3
gcc:7.5.0
g++:7.5.0
name: tensorRT-yolov5
channels:
- <unknown>
- http://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
- http://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/r
- http://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/msys2
dependencies:
- _libgcc_mutex=0.1=main
- _openmp_mutex=4.5=1_gnu
- blas=1.0=mkl
- bzip2=1.0.8=h7b6447c_0
- ca-certificates=2021.7.5=h06a4308_1
- certifi=2021.5.30=py37h06a4308_0
- cudatoolkit=10.2.89=hfd86e86_1
- ffmpeg=4.2.2=h20bf706_0
- freetype=2.10.4=h5ab3b9f_0
- gmp=6.2.1=h2531618_2
- gnutls=3.6.15=he1e5248_0
- jpeg=9b=h024ee3a_2
- lame=3.100=h7b6447c_0
- lcms2=2.12=h3be6417_0
- libedit=3.1.20210714=h7f8727e_0
- libffi=3.2.1=hf484d3e_1007
- libgcc-ng=9.3.0=h5101ec6_17
- libgomp=9.3.0=h5101ec6_17
- libidn2=2.3.2=h7f8727e_0
- libopus=1.3.1=h7b6447c_0
- libpng=1.6.37=hbc83047_0
- libstdcxx-ng=9.3.0=hd4cf53a_17
- libtasn1=4.16.0=h27cfd23_0
- libtiff=4.2.0=h85742a9_0
- libunistring=0.9.10=h27cfd23_0
- libuv=1.40.0=h7b6447c_0
- libvpx=1.7.0=h439df22_0
- libwebp-base=1.2.0=h27cfd23_0
- lz4-c=1.9.3=h295c915_1
- mkl_fft=1.3.0=py37h42c9631_2
- mkl_random=1.2.2=py37h51133e4_0
- ncurses=6.2=he6710b0_1
- nettle=3.7.3=hbbd107a_1
- ninja=1.10.2=hff7bd54_1
- numpy-base=1.20.3=py37h74d4b33_0
- openh264=2.1.0=hd408876_0
- openjpeg=2.4.0=h3ad879b_0
- openssl=1.1.1l=h7f8727e_0
- pip=21.2.2=py37h06a4308_0
- python=3.7.3=h0371630_0
- pytorch=1.8.0=py3.7_cuda10.2_cudnn7.6.5_0
- readline=7.0=h7b6447c_5
- setuptools=52.0.0=py37h06a4308_0
- six=1.16.0=pyhd3eb1b0_0
- sqlite=3.33.0=h62c20be_0
- tk=8.6.10=hbc83047_0
- torchvision=0.9.0=py37_cu102
- typing_extensions=3.10.0.0=pyh06a4308_0
- wheel=0.37.0=pyhd3eb1b0_0
- x264=1!157.20191217=h7b6447c_0
- xz=5.2.5=h7b6447c_0
- zlib=1.2.11=h7b6447c_3
- zstd=1.4.9=haebb681_0
- pip:
- appdirs==1.4.4
- charset-normalizer==2.0.4
- cycler==0.10.0
- dpcpp-cpp-rt==2021.3.0
- flatbuffers==2.0
- graphsurgeon==0.4.5
- idna==3.2
- intel-cmplr-lib-rt==2021.3.0
- intel-cmplr-lic-rt==2021.3.0
- intel-opencl-rt==2021.3.0
- intel-openmp==2021.3.0
- kiwisolver==1.3.1
- mako==1.1.5
- markupsafe==2.0.1
- matplotlib==3.4.3
- mkl==2021.3.0
- mkl-fft==1.3.0
- mkl-service==2.4.0
- netron==5.1.6
- numpy==1.21.2
- olefile==0.46
- onnx==1.10.1
- onnx-simplifier==0.3.6
- onnxoptimizer==0.2.6
- onnxruntime==1.8.1
- opencv-python==4.5.3.56
- pandas==1.3.2
- pillow==8.3.2
- protobuf==3.17.3
- pycuda==2021.1
- pyparsing==2.4.7
- python-dateutil==2.8.2
- pytools==2021.2.8
- pytz==2021.1
- pyyaml==5.4.1
- requests==2.26.0
- scipy==1.7.1
- seaborn==0.11.2
- tbb==2021.3.0
- tensorrt==7.1.3.4
- torchsummary==1.5.1
- tqdm==4.62.2
- typing-extensions==3.10.0.2
- uff==0.6.9
- urllib3==1.26.6
prefix: /home/yichao/miniconda3/envs/tensorRT-yolov5
appdirs==1.4.4
certifi==2021.5.30
charset-normalizer==2.0.4
cycler==0.10.0
dpcpp-cpp-rt==2021.3.0
flatbuffers==2.0
graphsurgeon @ file:///home/yichao/360Downloads/TensorRT-7.1.3.4/graphsurgeon/graphsurgeon-0.4.5-py2.py3-none-any.whl
idna==3.2
intel-cmplr-lib-rt==2021.3.0
intel-cmplr-lic-rt==2021.3.0
intel-opencl-rt==2021.3.0
intel-openmp==2021.3.0
kiwisolver==1.3.1
Mako==1.1.5
MarkupSafe==2.0.1
matplotlib==3.4.3
mkl==2021.3.0
mkl-fft==1.3.0
mkl-random @ file:///tmp/build/80754af9/mkl_random_1626179032232/work
mkl-service==2.4.0
netron==5.1.6
numpy==1.21.2
olefile==0.46
onnx==1.10.1
onnx-simplifier==0.3.6
onnxoptimizer==0.2.6
onnxruntime==1.8.1
opencv-python==4.5.3.56
pandas==1.3.2
Pillow==8.3.2
protobuf==3.17.3
pycuda==2021.1
pyparsing==2.4.7
python-dateutil==2.8.2
pytools==2021.2.8
pytz==2021.1
PyYAML==5.4.1
requests==2.26.0
scipy==1.7.1
seaborn==0.11.2
six @ file:///tmp/build/80754af9/six_1623709665295/work
tbb==2021.3.0
tensorrt @ file:///home/yichao/360Downloads/TensorRT-7.1.3.4/python/tensorrt-7.1.3.4-cp37-none-linux_x86_64.whl
torch==1.8.0
torchsummary==1.5.1
torchvision==0.9.0
tqdm==4.62.2
typing-extensions==3.10.0.2
uff @ file:///home/yichao/360Downloads/TensorRT-7.1.3.4/uff/uff-0.6.9-py2.py3-none-any.whl
urllib3==1.26.6
- INPUT_H, INPUT_W: defined in decode.h
- USE_FP16, USE_INT8 or USE_FP32: set in retina_r50.cpp
- DEVICE: set in retina_r50.cpp
- BATCHSIZE: set in retina_r50.cpp
The steps below use FP16 as the example.
git clone https://github.com/wang-xinyu/Pytorch_Retinaface.git
# download its weights 'Resnet50_Final.pth', put it in Pytorch_Retinaface/weights
cd Pytorch_Retinaface
python detect.py --save_model
python genwts.py
# a file 'retinaface.wts' will be generated.
git clone https://github.com/wang-xinyu/tensorrtx.git
cd tensorrtx/retinaface
# put retinaface.wts here
mkdir build
cd build
yichao@yichao:~/MyDocuments/tensorrtx/retinaface/build$ cmake ..
CMake Deprecation Warning at CMakeLists.txt:1 (cmake_minimum_required):
Compatibility with CMake < 2.8.12 will be removed from a future version of
CMake.
Update the VERSION argument <min> value or use a ...<max> suffix to tell
CMake that the project does not need compatibility with older versions.
-- The C compiler identification is GNU 7.5.0
-- The CXX compiler identification is GNU 7.5.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found CUDA: /usr/local/cuda (found version "10.2")
embed_platform off
-- Found OpenCV: /usr/local/opencv3.3.0 (found version "3.3.0")
-- Configuring done
-- Generating done
-- Build files have been written to: /home/yichao/MyDocuments/tensorrtx/retinaface/build
# Print the full build log
make VERBOSE=1
(tensorRT-yolov5) yichao@yichao:~/MyDocuments/tensorrtx/retinaface/build$ make -j8
[ 12%] Building NVCC (Device) object CMakeFiles/decodeplugin.dir/decodeplugin_generated_decode.cu.o
/home/yichao/MyDocuments/tensorrtx/retinaface/decode.h(73): warning: function "nvinfer1::IPluginV2Ext::configurePlugin(const nvinfer1::Dims *, int, const nvinfer1::Dims *, int, const nvinfer1::DataType *, const nvinfer1::DataType *, const __nv_bool *, const __nv_bool *, nvinfer1::PluginFormat, int)" is hidden by "nvinfer1::DecodePlugin::configurePlugin" -- virtual function override intended?
/home/yichao/MyDocuments/tensorrtx/retinaface/decode.h(73): warning: function "nvinfer1::IPluginV2Ext::configurePlugin(const nvinfer1::Dims *, int, const nvinfer1::Dims *, int, const nvinfer1::DataType *, const nvinfer1::DataType *, const bool *, const bool *, nvinfer1::PluginFormat, int)" is hidden by "nvinfer1::DecodePlugin::configurePlugin" -- virtual function override intended?
...
...
...
[ 87%] Linking CXX executable retina_mnet
[100%] Linking CXX executable retina_r50
[100%] Built target retina_r50
[100%] Built target retina_mnet
./retina_r50 -s
(tensorRT-yolov5) yichao@yichao:~/MyDocuments/tensorrtx/retinaface/build$ time ./retina_r50 -s
Loading weights: ../retinaface.wts
Building engine, please wait for a while...
Build engine successfully!
real 1m3.483s
user 0m33.287s
sys 0m5.715s
The generated engine file is 78.2 MB.
Thu Jan 13 16:00:02 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.63.01 Driver Version: 470.63.01 CUDA Version: 11.4 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... Off | 00000000:01:00.0 On | N/A |
| 27% 36C P0 28W / 75W | 828MiB / 3903MiB | 63% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 1623 G /usr/lib/xorg/Xorg 209MiB |
| 0 N/A N/A 23027 C ./retina_r50 615MiB |
+-----------------------------------------------------------------------------+
wget https://github.com/Tencent/FaceDetection-DSFD/raw/master/data/worlds-largest-selfie.jpg
If the image download is too slow, use the mirror instead:
wget https://github.com.cnpmjs.org/Tencent/FaceDetection-DSFD/raw/master/data/worlds-largest-selfie.jpg
(tensorRT-yolov5) yichao@yichao:~/MyDocuments/tensorrtx/retinaface/build$ wget https://github.com.cnpmjs.org/Tencent/FaceDetection-DSFD/raw/master/data/worlds-largest-selfie.jpg
--2022-01-13 15:02:13-- https://github.com.cnpmjs.org/Tencent/FaceDetection-DSFD/raw/master/data/worlds-largest-selfie.jpg
Resolving github.com.cnpmjs.org (github.com.cnpmjs.org)... 47.241.4.205
Connecting to github.com.cnpmjs.org (github.com.cnpmjs.org)|47.241.4.205|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://raw.githubusercontent.com/Tencent/FaceDetection-DSFD/master/data/worlds-largest-selfie.jpg [following]
--2022-01-13 15:02:14-- https://raw.githubusercontent.com/Tencent/FaceDetection-DSFD/master/data/worlds-largest-selfie.jpg
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.72.133
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.72.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 471393 (460K) [image/jpeg]
Saving to: 'worlds-largest-selfie.jpg'
worlds-largest-selfi 100%[===================>] 460.34K 13.0KB/s in 28s
2022-01-13 15:02:44 (16.5 KB/s) - 'worlds-largest-selfie.jpg' saved [471393/471393]
./retina_r50 -d
(tensorRT-yolov5) yichao@yichao:~/MyDocuments/tensorrtx/retinaface/build$ ./retina_r50 -d
445571us
19030us
...
...
...
15157us
15870us
number of detections -> 1433
-> 515.064
after nms -> 256
Change the image path in retinaface_trt.py:
input_image_paths = ["worlds-largest-selfie.jpg"]
(tensorRT-yolov5) yichao@yichao:~/MyDocuments/tensorrtx/retinaface$ python retinaface_trt.py
3.9774467945098877
0.017582416534423828
0.01763463020324707
0.021233797073364258
0.017621517181396484
0.017649412155151367
0.017993688583374023
0.017635107040405273
0.01763153076171875
0.017618894577026367
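The first value printed above (about 4 s) is far larger than the rest and presumably includes one-time initialization, so it is worth excluding when summarizing latency. A minimal sketch using the FP16 timings copied from the output above; the helper name summarize_latency is hypothetical and is not part of retinaface_trt.py:

```python
from statistics import mean, median

# Per-call timings in seconds, copied from the FP16 run above.
fp16_timings = [
    3.9774467945098877,   # first call, treated here as warm-up
    0.017582416534423828,
    0.01763463020324707,
    0.021233797073364258,
    0.017621517181396484,
    0.017649412155151367,
    0.017993688583374023,
    0.017635107040405273,
    0.01763153076171875,
    0.017618894577026367,
]


def summarize_latency(timings_s, warmup=1):
    """Return (mean_ms, median_ms) after dropping the first `warmup` samples."""
    steady = timings_s[warmup:]
    if not steady:
        raise ValueError("no samples left after dropping warm-up runs")
    return mean(steady) * 1000.0, median(steady) * 1000.0


mean_ms, median_ms = summarize_latency(fp16_timings)
print(f"mean {mean_ms:.1f} ms, median {median_ms:.1f} ms")
```

The median is less sensitive than the mean to the occasional slow call, such as the 21 ms sample in this run.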
Set the USE_FP32 macro in retina_r50.cpp; for everything else, follow the key steps above.
(tensorRT-yolov5) yichao@yichao:~/MyDocuments/tensorrtx/retinaface/build$ time ./retina_r50 -s
Loading weights: ../retinaface.wts
Building engine, please wait for a while...
Build engine successfully!
real 0m27.783s
user 0m18.162s
sys 0m2.295s
The generated engine file is 154.2 MB.
Thu Jan 13 16:10:38 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.63.01 Driver Version: 470.63.01 CUDA Version: 11.4 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... Off | 00000000:01:00.0 On | N/A |
| 27% 36C P0 42W / 75W | 834MiB / 3903MiB | 56% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 1623 G /usr/lib/xorg/Xorg 209MiB |
| 0 N/A N/A 23509 C ./retina_r50 621MiB |
+-----------------------------------------------------------------------------+
(tensorRT-yolov5) yichao@yichao:~/MyDocuments/tensorrtx/retinaface/build$ ./retina_r50 -d
436509us
30747us
30568us
...
...
...
29127us
28726us
28716us
number of detections -> 1433
-> 515.075
after nms -> 257
(tensorRT-yolov5) yichao@yichao:~/MyDocuments/tensorrtx/retinaface$ python retinaface_trt.py
3.919330358505249
0.03155779838562012
0.031530141830444336
0.03136157989501953
0.03149151802062988
0.0314486026763916
0.03205513954162598
0.03142070770263672
0.03142905235290527
0.03143477439880371
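For INT8, change the USE_FP16 macro in retina_r50.cpp to USE_INT8 and prepare calibration data:
1. Download the calibration images widerface_calib from GoogleDrive or BaiduPan (pwd: a9wh).
2. Unzip them into the retinaface/build directory.
3. Set USE_INT8 in retina_r50.cpp, then rebuild and serialize the engine: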
make -j8
./retina_r50 -s
(tensorRT-yolov5) yichao@yichao:~/MyDocuments/tensorrtx/retinaface/build$ time ./retina_r50 -s
Loading weights: ../retinaface.wts
Your platform support int8: 1
Building engine, please wait for a while...
reading calib cache: r50_int8calib.table
2--Demonstration_2_Demonstration_Political_Rally_2_488.jpg 0
29--Students_Schoolkids_29_Students_Schoolkids_Students_Schoolkids_29_517.jpg 1
39--Ice_Skating_39_Ice_Skating_Ice_Skating_39_344.jpg 2
...
...
...
61--Street_Battle_61_Street_Battle_streetfight_61_566.jpg 998
2--Demonstration_2_Demonstration_Demonstration_Or_Protest_2_260.jpg 999
reading calib cache: r50_int8calib.table
writing calib cache: r50_int8calib.table size: 12200
Build engine successfully!
real 7m25.594s
user 5m58.694s
sys 1m34.686s
The generated engine file is 30.1 MB.
Thu Jan 13 15:42:58 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.63.01 Driver Version: 470.63.01 CUDA Version: 11.4 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... Off | 00000000:01:00.0 On | N/A |
| 27% 39C P0 45W / 75W | 1073MiB / 3903MiB | 86% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 1623 G /usr/lib/xorg/Xorg 209MiB |
| 0 N/A N/A 22413 C ./retina_r50 860MiB |
+-----------------------------------------------------------------------------+
./retina_r50 -d
(tensorRT-yolov5) yichao@yichao:~/MyDocuments/tensorrtx/retinaface/build$ time ./retina_r50 -d
424574us
13240us
14247us
...
...
...
11711us
11662us
11103us
number of detections -> 1382
-> 11.1058
after nms -> 246
(tensorRT-yolov5) yichao@yichao:~/MyDocuments/tensorrtx/retinaface$ python retinaface_trt.py
3.9951412677764893
0.014085054397583008
0.014075279235839844
0.013991594314575195
0.014072656631469727
0.014059305191040039
0.014052867889404297
0.014079093933105469
0.01405954360961914
0.014012575149536133
Precision | Infer Time |
---|---|
FP32 | 29ms |
FP16 | 15ms |
INT8 | 11ms |
Summary: FP16 is roughly 2x faster than FP32 (15 ms vs. 29 ms), while INT8 brings only a modest further gain over FP16 (11 ms vs. 15 ms).
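To make the comparison concrete, the ratios can be computed directly. A minimal sketch using only numbers reported in this post (latencies from the table, engine sizes from the serialization runs); the speedup helper is a hypothetical name:

```python
# Steady-state latency in ms (from the table) and engine size in MB
# (reported after each `./retina_r50 -s` run).
latency_ms = {"FP32": 29, "FP16": 15, "INT8": 11}
engine_mb = {"FP32": 154.2, "FP16": 78.2, "INT8": 30.1}


def speedup(baseline, candidate, table=latency_ms):
    """How many times faster `candidate` is than `baseline`."""
    return table[baseline] / table[candidate]


for base, cand in [("FP32", "FP16"), ("FP16", "INT8"), ("FP32", "INT8")]:
    size_ratio = engine_mb[cand] / engine_mb[base]
    print(f"{cand} vs {base}: {speedup(base, cand):.2f}x faster, "
          f"engine {size_ratio:.2f}x the size")
```

On these numbers INT8 shrinks the engine much more than it shortens inference time on the GTX 1650.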
CMake Error at /usr/local/opencv3.3.0/share/OpenCV/OpenCVConfig.cmake:108 (message):
OpenCV static library was compiled with CUDA 10.2 support. Please, use the
same version or rebuild OpenCV with CUDA 11.0
Call Stack (most recent call first):
CMakeLists.txt:28 (find_package)
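Cause:
The OpenCV build and the active CUDA version do not match. OpenCV 3.3.0 on this machine was compiled against CUDA 10.2, so it must be used with CUDA 10.2, but the active CUDA version was 11.0.
Solution:
Rebuilding OpenCV is tedious, so simply switch the active CUDA version to 10.2; see the blog post
[Switching between coexisting CUDA versions on Ubuntu](https://blog.csdn.net/m0_37605642/article/details/120098215)
Note: after switching the CUDA version, delete everything in the build directory and run cmake again.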
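The clean step matters because CMake caches the CUDA and OpenCV paths it found on the previous run. A minimal sketch of emptying the build directory, demonstrated on a temporary directory; clean_build_dir is a hypothetical helper, not something shipped with tensorrtx:

```python
import shutil
import tempfile
from pathlib import Path


def clean_build_dir(build_dir):
    """Delete everything inside build_dir but keep the directory itself."""
    build = Path(build_dir)
    build.mkdir(parents=True, exist_ok=True)
    removed = []
    for entry in build.iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)      # e.g. CMakeFiles/
        else:
            entry.unlink()            # e.g. CMakeCache.txt, Makefile
        removed.append(entry.name)
    return sorted(removed)


# Demo on a throwaway directory that mimics a stale CMake build tree.
demo = Path(tempfile.mkdtemp())
(demo / "CMakeFiles").mkdir()
(demo / "CMakeCache.txt").write_text("stale cache")
print(clean_build_dir(demo))
```

Pointed at the real retinaface/build directory, this would also delete any engine files or calibration images stored there, so move those out first, then run cmake .. and make again.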
NvInfer.h file not found
fatal error: NvInfer.h: No such file or directory | TensorRT error handling | [Solved]
yichao@yichao:~/MyDocuments/tensorrtx/retinaface/build$ make -j8
[ 12%] Building NVCC (Device) object CMakeFiles/decodeplugin.dir/decodeplugin_generated_decode.cu.o
In file included from /home/yichao/MyDocuments/tensorrtx/retinaface/decode.cu:1:0:
/home/yichao/MyDocuments/tensorrtx/retinaface/decode.h:6:10: fatal error: NvInfer.h: No such file or directory
#include "NvInfer.h"
^~~~~~~~~~~
compilation terminated.
CMake Error at decodeplugin_generated_decode.cu.o.Debug.cmake:220 (message):
Error generating
/home/yichao/MyDocuments/tensorrtx/retinaface/build/CMakeFiles/decodeplugin.dir//./decodeplugin_generated_decode.cu.o
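Cause:
NvInfer.h is a header that ships with TensorRT; the compiler must be able to find it when building the C++ code.
Solution:
In /home/yichao/MyDocuments/tensorrtx/retinaface/CMakeLists.txt, add the TensorRT include and library directories: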
# tensorRT
include_directories(/home/yichao/360Downloads/TensorRT-7.1.3.4/include)
link_directories(/home/yichao/360Downloads/TensorRT-7.1.3.4/lib/)
32 errors detected in the compilation of "/tmp/tmpxft_00003bbc_00000000-6_decode.cpp1.ii".
-- Removing /home/yichao/MyDocuments/tensorrtx/retinaface/build/CMakeFiles/decodeplugin.dir//./decodeplugin_generated_decode.cu.o
/home/yichao/360Downloads/cmake-3.21.1-linux-x86_64/bin/cmake -E rm -f /home/yichao/MyDocuments/tensorrtx/retinaface/build/CMakeFiles/decodeplugin.dir//./decodeplugin_generated_decode.cu.o
CMake Error at decodeplugin_generated_decode.cu.o.Debug.cmake:280 (message):
Error generating file
/home/yichao/MyDocuments/tensorrtx/retinaface/build/CMakeFiles/decodeplugin.dir//./decodeplugin_generated_decode.cu.o
CMakeFiles/decodeplugin.dir/build.make:75: recipe for target 'CMakeFiles/decodeplugin.dir/decodeplugin_generated_decode.cu.o' failed
make[2]: *** [CMakeFiles/decodeplugin.dir/decodeplugin_generated_decode.cu.o] Error 1
make[2]: Leaving directory '/home/yichao/MyDocuments/tensorrtx/retinaface/build'
CMakeFiles/Makefile2:86: recipe for target 'CMakeFiles/decodeplugin.dir/all' failed
make[1]: *** [CMakeFiles/decodeplugin.dir/all] Error 2
make[1]: Leaving directory '/home/yichao/MyDocuments/tensorrtx/retinaface/build'
Makefile:90: recipe for target 'all' failed
make: *** [all] Error 2
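Cause:
The TensorRT paths in CMakeLists.txt are wrong: the TensorRT version used for compilation must match the TensorRT version installed on the system.
Solution:
In /home/yichao/MyDocuments/tensorrtx/retinaface/CMakeLists.txt, change the TensorRT configuration from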
# tensorRT
include_directories(/home/yichao/360Downloads/TensorRT-8.0.1.6/include)
link_directories(/home/yichao/360Downloads/TensorRT-8.0.1.6/lib/)
to
# tensorRT
include_directories(/home/yichao/360Downloads/TensorRT-7.1.3.4/include)
link_directories(/home/yichao/360Downloads/TensorRT-7.1.3.4/lib/)
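One way to catch this kind of mismatch before compiling is to read the version from the headers that CMakeLists.txt points at and compare it with the installed Python wheel. A minimal sketch, assuming the TensorRT include directory contains NvInferVersion.h with the NV_TENSORRT_MAJOR, NV_TENSORRT_MINOR, NV_TENSORRT_PATCH and NV_TENSORRT_BUILD macros; the function name header_version and the sample text are illustrative only:

```python
import re


def header_version(header_text):
    """Extract 'major.minor.patch.build' from the text of NvInferVersion.h."""
    parts = []
    for key in ("MAJOR", "MINOR", "PATCH", "BUILD"):
        match = re.search(rf"#define\s+NV_TENSORRT_{key}\s+(\d+)", header_text)
        if match is None:
            raise ValueError(f"NV_TENSORRT_{key} not found")
        parts.append(match.group(1))
    return ".".join(parts)


# Illustrative header fragment (an assumption about the file layout); a real
# check would read <TensorRT root>/include/NvInferVersion.h instead.
sample = """
#define NV_TENSORRT_MAJOR 7
#define NV_TENSORRT_MINOR 1
#define NV_TENSORRT_PATCH 3
#define NV_TENSORRT_BUILD 4
"""

expected = "7.1.3.4"  # version of the tensorrt wheel in the pip list above
found = header_version(sample)
print("match" if found == expected else f"mismatch: {found} != {expected}")
```

With the TensorRT-8.0.1.6 include path, the same check would report a mismatch against the 7.1.3.4 wheel.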
Cannot find -lnvinfer
Four ways to fix "/usr/bin/ld: cannot find -lXXX" when running make
/usr/bin/ld: cannot find -lnvinfer
collect2: error: ld returned 1 exit status
CMakeFiles/decodeplugin.dir/build.make:90: recipe for target 'libdecodeplugin.so' failed
make[2]: *** [libdecodeplugin.so] Error 1
CMakeFiles/Makefile2:86: recipe for target 'CMakeFiles/decodeplugin.dir/all' failed
make[1]: *** [CMakeFiles/decodeplugin.dir/all] Error 2
Makefile:90: recipe for target 'all' failed
make: *** [all] Error 2
Cause:
The linker cannot find the nvinfer library. The file is named libnvinfer.so, following the convention lib + library name (the XXX in -lXXX) + .so.
Solution:
1. Locate libnvinfer.so
With find: find / -name libnvinfer.so
or
with locate: locate libnvinfer.so
# Output
/home/yichao/360Downloads/TensorRT-7.1.3.4/lib/libnvinfer.so
2. Create a symlink
sudo ln -s /home/yichao/360Downloads/TensorRT-7.1.3.4/lib/libnvinfer.so /usr/lib/libnvinfer.so
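The lookup in step 1 can also be scripted. A minimal sketch that searches a few candidate directories instead of the whole filesystem; find_library is a hypothetical helper, and the root passed in the usage line is the install path from this post:

```python
from pathlib import Path


def find_library(roots, name="libnvinfer.so"):
    """Return every file called `name` found under the given root directories."""
    hits = []
    for root in roots:
        root = Path(root)
        if root.is_dir():                      # silently skip missing roots
            hits.extend(sorted(root.rglob(name)))
    return hits


# Prints an empty list on machines where this directory does not exist.
print(find_library(["/home/yichao/360Downloads/TensorRT-7.1.3.4"]))
```

Restricting the search to the TensorRT install directory is usually much faster than scanning from /.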
make[2]: *** [CMakeFiles/retina_r50.dir/calibrator.cpp.o] Error 1
make[2]: *** Waiting for unfinished jobs....
/home/yichao/MyDocuments/tensorrtx/retinaface/calibrator.cpp: In member function ‘virtual bool Int8EntropyCalibrator2::getBatch(void**, const char**, int)’:
/home/yichao/MyDocuments/tensorrtx/retinaface/calibrator.cpp:52:131: error: too many arguments to function ‘cv::Mat cv::dnn::experimental_dnn_v1::blobFromImages(const std::vector<cv::Mat>&, double, cv::Size, const Scalar&, bool)’
cv::Mat blob = cv::dnn::blobFromImages(input_imgs_, 1.0, cv::Size(input_w_, input_h_), cv::Scalar(104, 117, 123), false, false);
^
compilation terminated due to -Wfatal-errors.
CMakeFiles/retina_mnet.dir/build.make:75: recipe for target 'CMakeFiles/retina_mnet.dir/calibrator.cpp.o' failed
make[2]: *** [CMakeFiles/retina_mnet.dir/calibrator.cpp.o] Error 1
make[2]: *** Waiting for unfinished jobs....
CMakeFiles/Makefile2:138: recipe for target 'CMakeFiles/retina_mnet.dir/all' failed
make[1]: *** [CMakeFiles/retina_mnet.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....
CMakeFiles/Makefile2:112: recipe for target 'CMakeFiles/retina_r50.dir/all' failed
make[1]: *** [CMakeFiles/retina_r50.dir/all] Error 2
Makefile:90: recipe for target 'all' failed
make: *** [all] Error 2
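Cause:
The source at /home/yichao/MyDocuments/tensorrtx/retinaface/calibrator.cpp:52 passes six arguments to cv::dnn::blobFromImages, but the overload in the installed OpenCV 3.3.0 accepts only five (see the signature in the error message).
Solution:
Drop the last argument, changing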
cv::Mat blob = cv::dnn::blobFromImages(input_imgs_, 1.0, cv::Size(input_w_, input_h_), cv::Scalar(104, 117, 123), false, false);
to
cv::Mat blob = cv::dnn::blobFromImages(input_imgs_, 1.0, cv::Size(input_w_, input_h_), cv::Scalar(104, 117, 123), false);
Cuda Error in allocate: 2 (out of memory) - GPU Memory Leak? #851
There was not enough GPU memory, so building the engine failed.
[TensorRT] ERROR: ../rtSafe/safeRuntime.cpp (25) - Cuda Error in allocate: 2 (out of memory)
[TensorRT] ERROR: ../rtSafe/safeRuntime.cpp (25) - Cuda Error in allocate: 2 (out of memory)
Traceback (most recent call last):
File "/media/yichao/蚁巢文件/YOYOFile/YOYOFile/demo/build_engine.py", line 146, in <module>
main(args)
File "/media/yichao/蚁巢文件/YOYOFile/YOYOFile/demo/build_engine.py", line 126, in main
builder.create_engine(args.engine, args.precision)
File "/media/yichao/蚁巢文件/YOYOFile/YOYOFile/demo/build_engine.py", line 118, in create_engine
with self.builder.build_engine(self.network, self.config) as engine, open(engine_path, "wb") as f:
AttributeError: __enter__
Cause:
Building the engine through the Python API failed on the machine with the GeForce GTX 1650 (4GB). The same test also failed on a Jetson TX2 (8GB) board.
Explanation 1:
Same problem. But this problem only happens when my system is 1080ti+tensorRT7.0+cuda10.0+centos7.6. When I change to 2080ti+tensorRT7.0, everything works fine.
Explanation 2:
I face the problem with 1080 and no problem on 2080. And I don't found any debug means.
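The AttributeError: __enter__ at the end of the traceback is a follow-on symptom rather than the real problem: after the allocation fails, build_engine returns None, and None cannot be used in a with statement. A minimal sketch of a guard that surfaces the real failure, built around the build_engine call shown in the traceback; save_engine and FailingBuilder are hypothetical names, and FailingBuilder only simulates a failed build so the example runs without a GPU:

```python
def save_engine(builder, network, config, engine_path):
    """Build an engine and write it to disk, failing loudly if the build fails."""
    engine = builder.build_engine(network, config)
    if engine is None:
        # The build failed (for example after an out-of-memory error); stop
        # here instead of letting `with None` raise AttributeError: __enter__.
        raise RuntimeError("engine build failed; check the TensorRT log above")
    with open(engine_path, "wb") as f:
        f.write(engine.serialize())


class FailingBuilder:
    """Hypothetical stand-in for the TensorRT builder; always fails."""

    def build_engine(self, network, config):
        return None


try:
    save_engine(FailingBuilder(), network=None, config=None,
                engine_path="model.engine")
except RuntimeError as err:
    print(err)
```

The guard does not fix the out-of-memory condition itself; it only makes the log point at the failed build instead of at the with statement.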