Accelerating RetinaFace Inference with TensorRT (Part 1)

尉迟兴修
2023-12-01

1. References

tensorrtx/retinaface
Accelerating YOLOv5 Inference with TensorRT (Part 1)
Accelerating YOLOv5 Inference with TensorRT (Part 2)

2. Experimental Environment

System environment

Environment
Operating System + Version: Ubuntu + 16.04
TensorRT Version: 7.1.3.4
GPU Type: GeForce GTX 1650, 4 GB
Nvidia Driver Version: 470.63.01
CUDA Version: 10.2.300
CUDNN Version: 7.6.5
Python Version (if applicable): 3.7.3
Anaconda Version: 4.10.3
gcc: 7.5.0
g++: 7.5.0

tensorRT-yolov5.yaml

name: tensorRT-yolov5
channels:
  - <unknown>
  - http://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
  - http://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/r
  - http://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/msys2
dependencies:
  - _libgcc_mutex=0.1=main
  - _openmp_mutex=4.5=1_gnu
  - blas=1.0=mkl
  - bzip2=1.0.8=h7b6447c_0
  - ca-certificates=2021.7.5=h06a4308_1
  - certifi=2021.5.30=py37h06a4308_0
  - cudatoolkit=10.2.89=hfd86e86_1
  - ffmpeg=4.2.2=h20bf706_0
  - freetype=2.10.4=h5ab3b9f_0
  - gmp=6.2.1=h2531618_2
  - gnutls=3.6.15=he1e5248_0
  - jpeg=9b=h024ee3a_2
  - lame=3.100=h7b6447c_0
  - lcms2=2.12=h3be6417_0
  - libedit=3.1.20210714=h7f8727e_0
  - libffi=3.2.1=hf484d3e_1007
  - libgcc-ng=9.3.0=h5101ec6_17
  - libgomp=9.3.0=h5101ec6_17
  - libidn2=2.3.2=h7f8727e_0
  - libopus=1.3.1=h7b6447c_0
  - libpng=1.6.37=hbc83047_0
  - libstdcxx-ng=9.3.0=hd4cf53a_17
  - libtasn1=4.16.0=h27cfd23_0
  - libtiff=4.2.0=h85742a9_0
  - libunistring=0.9.10=h27cfd23_0
  - libuv=1.40.0=h7b6447c_0
  - libvpx=1.7.0=h439df22_0
  - libwebp-base=1.2.0=h27cfd23_0
  - lz4-c=1.9.3=h295c915_1
  - mkl_fft=1.3.0=py37h42c9631_2
  - mkl_random=1.2.2=py37h51133e4_0
  - ncurses=6.2=he6710b0_1
  - nettle=3.7.3=hbbd107a_1
  - ninja=1.10.2=hff7bd54_1
  - numpy-base=1.20.3=py37h74d4b33_0
  - openh264=2.1.0=hd408876_0
  - openjpeg=2.4.0=h3ad879b_0
  - openssl=1.1.1l=h7f8727e_0
  - pip=21.2.2=py37h06a4308_0
  - python=3.7.3=h0371630_0
  - pytorch=1.8.0=py3.7_cuda10.2_cudnn7.6.5_0
  - readline=7.0=h7b6447c_5
  - setuptools=52.0.0=py37h06a4308_0
  - six=1.16.0=pyhd3eb1b0_0
  - sqlite=3.33.0=h62c20be_0
  - tk=8.6.10=hbc83047_0
  - torchvision=0.9.0=py37_cu102
  - typing_extensions=3.10.0.0=pyh06a4308_0
  - wheel=0.37.0=pyhd3eb1b0_0
  - x264=1!157.20191217=h7b6447c_0
  - xz=5.2.5=h7b6447c_0
  - zlib=1.2.11=h7b6447c_3
  - zstd=1.4.9=haebb681_0
  - pip:
    - appdirs==1.4.4
    - charset-normalizer==2.0.4
    - cycler==0.10.0
    - dpcpp-cpp-rt==2021.3.0
    - flatbuffers==2.0
    - graphsurgeon==0.4.5
    - idna==3.2
    - intel-cmplr-lib-rt==2021.3.0
    - intel-cmplr-lic-rt==2021.3.0
    - intel-opencl-rt==2021.3.0
    - intel-openmp==2021.3.0
    - kiwisolver==1.3.1
    - mako==1.1.5
    - markupsafe==2.0.1
    - matplotlib==3.4.3
    - mkl==2021.3.0
    - mkl-fft==1.3.0
    - mkl-service==2.4.0
    - netron==5.1.6
    - numpy==1.21.2
    - olefile==0.46
    - onnx==1.10.1
    - onnx-simplifier==0.3.6
    - onnxoptimizer==0.2.6
    - onnxruntime==1.8.1
    - opencv-python==4.5.3.56
    - pandas==1.3.2
    - pillow==8.3.2
    - protobuf==3.17.3
    - pycuda==2021.1
    - pyparsing==2.4.7
    - python-dateutil==2.8.2
    - pytools==2021.2.8
    - pytz==2021.1
    - pyyaml==5.4.1
    - requests==2.26.0
    - scipy==1.7.1
    - seaborn==0.11.2
    - tbb==2021.3.0
    - tensorrt==7.1.3.4
    - torchsummary==1.5.1
    - tqdm==4.62.2
    - typing-extensions==3.10.0.2
    - uff==0.6.9
    - urllib3==1.26.6
prefix: /home/yichao/miniconda3/envs/tensorRT-yolov5

requirements-gpu.txt

appdirs==1.4.4
certifi==2021.5.30
charset-normalizer==2.0.4
cycler==0.10.0
dpcpp-cpp-rt==2021.3.0
flatbuffers==2.0
graphsurgeon @ file:///home/yichao/360Downloads/TensorRT-7.1.3.4/graphsurgeon/graphsurgeon-0.4.5-py2.py3-none-any.whl
idna==3.2
intel-cmplr-lib-rt==2021.3.0
intel-cmplr-lic-rt==2021.3.0
intel-opencl-rt==2021.3.0
intel-openmp==2021.3.0
kiwisolver==1.3.1
Mako==1.1.5
MarkupSafe==2.0.1
matplotlib==3.4.3
mkl==2021.3.0
mkl-fft==1.3.0
mkl-random @ file:///tmp/build/80754af9/mkl_random_1626179032232/work
mkl-service==2.4.0
netron==5.1.6
numpy==1.21.2
olefile==0.46
onnx==1.10.1
onnx-simplifier==0.3.6
onnxoptimizer==0.2.6
onnxruntime==1.8.1
opencv-python==4.5.3.56
pandas==1.3.2
Pillow==8.3.2
protobuf==3.17.3
pycuda==2021.1
pyparsing==2.4.7
python-dateutil==2.8.2
pytools==2021.2.8
pytz==2021.1
PyYAML==5.4.1
requests==2.26.0
scipy==1.7.1
seaborn==0.11.2
six @ file:///tmp/build/80754af9/six_1623709665295/work
tbb==2021.3.0
tensorrt @ file:///home/yichao/360Downloads/TensorRT-7.1.3.4/python/tensorrt-7.1.3.4-cp37-none-linux_x86_64.whl
torch==1.8.0
torchsummary==1.5.1
torchvision==0.9.0
tqdm==4.62.2
typing-extensions==3.10.0.2
uff @ file:///home/yichao/360Downloads/TensorRT-7.1.3.4/uff/uff-0.6.9-py2.py3-none-any.whl
urllib3==1.26.6

3. Important Notes

3.1 Configuration

  • The input shape is set by INPUT_H and INPUT_W in decode.h.
  • The precision (INT8, FP16, or FP32) is selected by the macro USE_INT8, USE_FP16, or USE_FP32 in retina_r50.cpp.
  • The GPU id is selected by the macro DEVICE in retina_r50.cpp.
  • The batch size is selected by the macro BATCHSIZE in retina_r50.cpp.
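
Switching precision means editing one #define and rebuilding. As a rough illustration, the sketch below rewrites the precision macro in a string of source text. It assumes the file contains exactly one line of the form `#define USE_FP16` (optionally followed by a comment); the helper name set_precision and the sample text are hypothetical and not part of tensorrtx.

```python
import re

# Precision macros named in the notes above.
PRECISIONS = ("USE_FP32", "USE_FP16", "USE_INT8")


def set_precision(source: str, macro: str) -> str:
    """Return `source` with its precision #define replaced by `macro`.

    Assumption: the source has exactly one line such as `#define USE_FP16`.
    """
    if macro not in PRECISIONS:
        raise ValueError(f"unknown precision macro: {macro}")
    pattern = re.compile(r"^(\s*#define\s+)USE_(?:FP32|FP16|INT8)\b", re.MULTILINE)
    new_source, count = pattern.subn(lambda m: m.group(1) + macro, source)
    if count != 1:
        raise ValueError(f"expected one precision #define, found {count}")
    return new_source


# Made-up sample text standing in for the top of retina_r50.cpp.
example = "#define USE_FP16  // set USE_INT8 or USE_FP16 or USE_FP32\n#define DEVICE 0\n"
print(set_precision(example, "USE_INT8"))
```

After changing the macro, the project has to be rebuilt and the engine regenerated with ./retina_r50 -s, since the engine file is specific to the precision it was built with.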

3.2 Downloading pretrained models

face-recognition-models

face-detection-models

face-alignment-models

face-attribute-models

4. Key Steps

FP16 is used as the example throughout.

4.1 Generating the .wts file from the PyTorch pretrained model

4.1.1 Clone the GitHub repository

git clone https://github.com/wang-xinyu/Pytorch_Retinaface.git
# Download the weights file 'Resnet50_Final.pth' and put it in Pytorch_Retinaface/weights

4.1.2 Save the pretrained model

cd Pytorch_Retinaface
python detect.py --save_model

4.1.3 Generate the .wts file

python genwts.py
# A file named 'retinaface.wts' will be generated.
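
This post does not describe what is inside the .wts file. To the best of my understanding of the tensorrtx scripts, it is plain text: the first line holds the number of tensors, and each following line holds a tensor name, its element count, and every float32 value as big-endian hex. The sketch below writes that layout from a plain dict instead of a PyTorch state_dict, so it runs without torch; the layout and the helper name write_wts are assumptions, not a copy of genwts.py.

```python
import io
import struct


def write_wts(tensors, out):
    """Write {name: [float, ...]} to `out` in the assumed .wts text layout."""
    out.write(f"{len(tensors)}\n")
    for name, values in tensors.items():
        # Each float32 becomes 8 hex characters, big-endian.
        hex_words = " ".join(struct.pack(">f", float(v)).hex() for v in values)
        out.write(f"{name} {len(values)} {hex_words}\n")


# Toy tensors with invented names, just to show the resulting text.
buf = io.StringIO()
write_wts({"conv1.weight": [1.0, -2.0], "conv1.bias": [0.5]}, buf)
print(buf.getvalue())
```

A text format like this lets the C++ side load weights by name without depending on PyTorch.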

4.2 Preparing tensorrtx

git clone https://github.com/wang-xinyu/tensorrtx.git
cd tensorrtx/retinaface
# Put retinaface.wts here
mkdir build
cd build

4.3 Configure with CMake

yichao@yichao:~/MyDocuments/tensorrtx/retinaface/build$ cmake ..
CMake Deprecation Warning at CMakeLists.txt:1 (cmake_minimum_required):
  Compatibility with CMake < 2.8.12 will be removed from a future version of
  CMake.

  Update the VERSION argument <min> value or use a ...<max> suffix to tell
  CMake that the project does not need compatibility with older versions.


-- The C compiler identification is GNU 7.5.0
-- The CXX compiler identification is GNU 7.5.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found CUDA: /usr/local/cuda (found version "10.2") 
embed_platform off
-- Found OpenCV: /usr/local/opencv3.3.0 (found version "3.3.0") 
-- Configuring done
-- Generating done
-- Build files have been written to: /home/yichao/MyDocuments/tensorrtx/retinaface/build

4.4 Build with make -j8

# Print the full build log
make VERBOSE=1
(tensorRT-yolov5) yichao@yichao:~/MyDocuments/tensorrtx/retinaface/build$ make -j8
[ 12%] Building NVCC (Device) object CMakeFiles/decodeplugin.dir/decodeplugin_generated_decode.cu.o
/home/yichao/MyDocuments/tensorrtx/retinaface/decode.h(73): warning: function "nvinfer1::IPluginV2Ext::configurePlugin(const nvinfer1::Dims *, int, const nvinfer1::Dims *, int, const nvinfer1::DataType *, const nvinfer1::DataType *, const __nv_bool *, const __nv_bool *, nvinfer1::PluginFormat, int)" is hidden by "nvinfer1::DecodePlugin::configurePlugin" -- virtual function override intended?

/home/yichao/MyDocuments/tensorrtx/retinaface/decode.h(73): warning: function "nvinfer1::IPluginV2Ext::configurePlugin(const nvinfer1::Dims *, int, const nvinfer1::Dims *, int, const nvinfer1::DataType *, const nvinfer1::DataType *, const bool *, const bool *, nvinfer1::PluginFormat, int)" is hidden by "nvinfer1::DecodePlugin::configurePlugin" -- virtual function override intended?
...
...
...
[ 87%] Linking CXX executable retina_mnet
[100%] Linking CXX executable retina_r50
[100%] Built target retina_r50
[100%] Built target retina_mnet

4.5 Build the engine

./retina_r50 -s
(tensorRT-yolov5) yichao@yichao:~/MyDocuments/tensorrtx/retinaface/build$ time ./retina_r50 -s
Loading weights: ../retinaface.wts
Building engine, please wait for a while...
Build engine successfully!

real	1m3.483s
user	0m33.287s
sys	0m5.715s

The generated engine file is 78.2 MB.

4.5.1 GPU memory usage

Thu Jan 13 16:00:02 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.63.01    Driver Version: 470.63.01    CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0  On |                  N/A |
| 27%   36C    P0    28W /  75W |    828MiB /  3903MiB |     63%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1623      G   /usr/lib/xorg/Xorg                209MiB |
|    0   N/A  N/A     23027      C   ./retina_r50                      615MiB |
+-----------------------------------------------------------------------------+

4.6 Inference

4.6.1 Download the test image

wget https://github.com/Tencent/FaceDetection-DSFD/raw/master/data/worlds-largest-selfie.jpg

If the download is too slow, use the mirror instead:
wget https://github.com.cnpmjs.org/Tencent/FaceDetection-DSFD/raw/master/data/worlds-largest-selfie.jpg
(tensorRT-yolov5) yichao@yichao:~/MyDocuments/tensorrtx/retinaface/build$ wget https://github.com.cnpmjs.org/Tencent/FaceDetection-DSFD/raw/master/data/worlds-largest-selfie.jpg
--2022-01-13 15:02:13--  https://github.com.cnpmjs.org/Tencent/FaceDetection-DSFD/raw/master/data/worlds-largest-selfie.jpg
Resolving github.com.cnpmjs.org (github.com.cnpmjs.org)... 47.241.4.205
Connecting to github.com.cnpmjs.org (github.com.cnpmjs.org)|47.241.4.205|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://raw.githubusercontent.com/Tencent/FaceDetection-DSFD/master/data/worlds-largest-selfie.jpg [following]
--2022-01-13 15:02:14--  https://raw.githubusercontent.com/Tencent/FaceDetection-DSFD/master/data/worlds-largest-selfie.jpg
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.72.133
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.72.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 471393 (460K) [image/jpeg]
Saving to: 'worlds-largest-selfie.jpg'

worlds-largest-selfi 100%[===================>] 460.34K  13.0KB/s    in 28s     

2022-01-13 15:02:44 (16.5 KB/s) - 'worlds-largest-selfie.jpg' saved [471393/471393]

4.6.2 Measure inference speed

./retina_r50 -d
(tensorRT-yolov5) yichao@yichao:~/MyDocuments/tensorrtx/retinaface/build$ ./retina_r50 -d
445571us
19030us
...
...
...
15157us
15870us
number of detections -> 1433
 -> 515.064
after nms -> 256
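
The first timing line is far slower than the rest because it includes warm-up, so it should be left out when averaging. The sketch below parses the `<N>us` lines and averages the steady-state runs. The helper name mean_latency_ms is hypothetical, and the sample contains only the four timing lines visible in the log above (the elided lines are not reconstructed).

```python
import re
import statistics


def mean_latency_ms(log_text, skip=1):
    """Average the `<N>us` lines of the demo output, skipping warm-up runs."""
    micros = [int(m.group(1)) for m in re.finditer(r"^(\d+)us$", log_text, re.MULTILINE)]
    steady = micros[skip:]
    if not steady:
        raise ValueError("no timing lines left after skipping warm-up")
    return statistics.mean(steady) / 1000.0


sample = """445571us
19030us
15157us
15870us
number of detections -> 1433
"""
print(f"{mean_latency_ms(sample):.2f} ms")
```

Saving the output of ./retina_r50 -d to a file and passing its contents to this function would give the average over all runs.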

4.7 Python inference

Edit the image path in retinaface_trt.py.

input_image_paths = ["worlds-largest-selfie.jpg"]
(tensorRT-yolov5) yichao@yichao:~/MyDocuments/tensorrtx/retinaface$ python retinaface_trt.py 
3.9774467945098877
0.017582416534423828
0.01763463020324707
0.021233797073364258
0.017621517181396484
0.017649412155151367
0.017993688583374023
0.017635107040405273
0.01763153076171875
0.017618894577026367

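5. TensorRT FP32 Inference

Accelerating YOLOv5 Inference with TensorRT (Part 1)

Change the precision macro in retina_r50.cpp to USE_FP32; the remaining steps are the same as the key steps in section 4 above.

5.1 Build the engine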

(tensorRT-yolov5) yichao@yichao:~/MyDocuments/tensorrtx/retinaface/build$ time ./retina_r50 -s
Loading weights: ../retinaface.wts
Building engine, please wait for a while...
Build engine successfully!

real	0m27.783s
user	0m18.162s
sys	0m2.295s

The generated engine file is 154.2 MB.

5.1.1 GPU memory usage

Thu Jan 13 16:10:38 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.63.01    Driver Version: 470.63.01    CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0  On |                  N/A |
| 27%   36C    P0    42W /  75W |    834MiB /  3903MiB |     56%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1623      G   /usr/lib/xorg/Xorg                209MiB |
|    0   N/A  N/A     23509      C   ./retina_r50                      621MiB |
+-----------------------------------------------------------------------------+

5.2 Inference

(tensorRT-yolov5) yichao@yichao:~/MyDocuments/tensorrtx/retinaface/build$ ./retina_r50 -d
436509us
30747us
30568us
...
...
...
29127us
28726us
28716us
number of detections -> 1433
 -> 515.075
after nms -> 257

5.3 Python inference

(tensorRT-yolov5) yichao@yichao:~/MyDocuments/tensorrtx/retinaface$ python retinaface_trt.py 
3.919330358505249
0.03155779838562012
0.031530141830444336
0.03136157989501953
0.03149151802062988
0.0314486026763916
0.03205513954162598
0.03142070770263672
0.03142905235290527
0.03143477439880371

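6. TensorRT FP16 Inference

Accelerating YOLOv5 Inference with TensorRT (Part 1)

Change the precision macro in retina_r50.cpp to USE_FP16. This is the configuration walked through in section 4.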

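7. TensorRT INT8 Inference

7.1 Calibration dataset

7.1.1 Download the calibration dataset

Download the calibration images widerface_calib from GoogleDrive or BaiduPan (pwd: a9wh).

7.1.2 Extract the images into the retinaface/build directory

7.2 Edit retina_r50.cpp

Change the precision macro to USE_INT8.

7.3 Build with make -j8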

make -j8

7.4 Build the engine

./retina_r50 -s
(tensorRT-yolov5) yichao@yichao:~/MyDocuments/tensorrtx/retinaface/build$ time ./retina_r50 -s
Loading weights: ../retinaface.wts
Your platform support int8: 1
Building engine, please wait for a while...
reading calib cache: r50_int8calib.table
2--Demonstration_2_Demonstration_Political_Rally_2_488.jpg  0
29--Students_Schoolkids_29_Students_Schoolkids_Students_Schoolkids_29_517.jpg  1
39--Ice_Skating_39_Ice_Skating_Ice_Skating_39_344.jpg  2
...
...
...
61--Street_Battle_61_Street_Battle_streetfight_61_566.jpg  998
2--Demonstration_2_Demonstration_Demonstration_Or_Protest_2_260.jpg  999
reading calib cache: r50_int8calib.table
writing calib cache: r50_int8calib.table size: 12200
Build engine successfully!

real	7m25.594s
user	5m58.694s
sys	1m34.686s

The generated engine file is 30.1 MB.

7.4.1 GPU memory usage

Thu Jan 13 15:42:58 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.63.01    Driver Version: 470.63.01    CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0  On |                  N/A |
| 27%   39C    P0    45W /  75W |   1073MiB /  3903MiB |     86%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1623      G   /usr/lib/xorg/Xorg                209MiB |
|    0   N/A  N/A     22413      C   ./retina_r50                      860MiB |
+-----------------------------------------------------------------------------+

7.5 Inference

./retina_r50 -d
(tensorRT-yolov5) yichao@yichao:~/MyDocuments/tensorrtx/retinaface/build$ time ./retina_r50 -d
424574us
13240us
14247us
...
...
...
11711us
11662us
11103us
number of detections -> 1382
 -> 11.1058
after nms -> 246

7.6 Python inference

(tensorRT-yolov5) yichao@yichao:~/MyDocuments/tensorrtx/retinaface$ python retinaface_trt.py 
3.9951412677764893
0.014085054397583008
0.014075279235839844
0.013991594314575195
0.014072656631469727
0.014059305191040039
0.014052867889404297
0.014079093933105469
0.01405954360961914
0.014012575149536133

8. RetinaFace Performance Analysis

Performance Analysis of the RetinaFace Face Detector

Precision    Infer Time
FP32         29 ms
FP16         15 ms
INT8         11 ms

Summary: FP16 is roughly 2x faster than FP32 (15 ms vs 29 ms), while INT8 brings only a modest further gain over FP16 (11 ms vs 15 ms).
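
The ratios behind this summary can be checked with a few lines of arithmetic. The sketch below uses only the three latencies from the table above; the helper name speedup is illustrative.

```python
# Per-image latency (ms) from the table above.
latency_ms = {"FP32": 29, "FP16": 15, "INT8": 11}


def speedup(baseline, candidate):
    """How many times faster `candidate` is than `baseline`."""
    return latency_ms[baseline] / latency_ms[candidate]


for base, cand in [("FP32", "FP16"), ("FP16", "INT8"), ("FP32", "INT8")]:
    print(f"{cand} vs {base}: {speedup(base, cand):.2f}x")
```

These figures come from a single GTX 1650 and one test image, so they are best read as indicative rather than general.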

9. Possible Problems

Q1: OpenCV and CUDA versions do not match, so cmake fails

CMake Error at /usr/local/opencv3.3.0/share/OpenCV/OpenCVConfig.cmake:108 (message):
  OpenCV static library was compiled with CUDA 10.2 support.  Please, use the
  same version or rebuild OpenCV with CUDA 11.0
Call Stack (most recent call first):
  CMakeLists.txt:28 (find_package)
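Cause:
The OpenCV build and the active CUDA version do not match. OpenCV 3.3.0 on this machine was compiled with CUDA 10.2, so it must be used with CUDA 10.2, but the active CUDA version was 11.0.

Solution:
Rebuilding OpenCV is tedious, so simply switch the active CUDA version to 10.2. See the blog post
[Switching between multiple coexisting CUDA versions on Ubuntu](https://blog.csdn.net/m0_37605642/article/details/120098215)

Note: after switching the CUDA version, clear the build directory and run cmake again.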

Q2: NvInfer.h cannot be found

fatal error: NvInfer.h: No such file or directory | TensorRT error handling | [Solved]

yichao@yichao:~/MyDocuments/tensorrtx/retinaface/build$ make -j8
[ 12%] Building NVCC (Device) object CMakeFiles/decodeplugin.dir/decodeplugin_generated_decode.cu.o
In file included from /home/yichao/MyDocuments/tensorrtx/retinaface/decode.cu:1:0:
/home/yichao/MyDocuments/tensorrtx/retinaface/decode.h:6:10: fatal error: NvInfer.h: No such file or directory
 #include "NvInfer.h"
          ^~~~~~~~~~~
compilation terminated.
CMake Error at decodeplugin_generated_decode.cu.o.Debug.cmake:220 (message):
  Error generating
  /home/yichao/MyDocuments/tensorrtx/retinaface/build/CMakeFiles/decodeplugin.dir//./decodeplugin_generated_decode.cu.o
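Cause:
NvInfer.h is a header that ships with TensorRT, and the compiler must be able to find it when building the C++ code.

Solution:
In /home/yichao/MyDocuments/tensorrtx/retinaface/CMakeLists.txt, add the TensorRT include and library directories: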

# tensorRT
include_directories(/home/yichao/360Downloads/TensorRT-7.1.3.4/include)
link_directories(/home/yichao/360Downloads/TensorRT-7.1.3.4/lib/)

Q3: TensorRT 8 is not supported

32 errors detected in the compilation of "/tmp/tmpxft_00003bbc_00000000-6_decode.cpp1.ii".
-- Removing /home/yichao/MyDocuments/tensorrtx/retinaface/build/CMakeFiles/decodeplugin.dir//./decodeplugin_generated_decode.cu.o
/home/yichao/360Downloads/cmake-3.21.1-linux-x86_64/bin/cmake -E rm -f /home/yichao/MyDocuments/tensorrtx/retinaface/build/CMakeFiles/decodeplugin.dir//./decodeplugin_generated_decode.cu.o
CMake Error at decodeplugin_generated_decode.cu.o.Debug.cmake:280 (message):
  Error generating file
  /home/yichao/MyDocuments/tensorrtx/retinaface/build/CMakeFiles/decodeplugin.dir//./decodeplugin_generated_decode.cu.o


CMakeFiles/decodeplugin.dir/build.make:75: recipe for target 'CMakeFiles/decodeplugin.dir/decodeplugin_generated_decode.cu.o' failed
make[2]: *** [CMakeFiles/decodeplugin.dir/decodeplugin_generated_decode.cu.o] Error 1
make[2]: Leaving directory '/home/yichao/MyDocuments/tensorrtx/retinaface/build'
CMakeFiles/Makefile2:86: recipe for target 'CMakeFiles/decodeplugin.dir/all' failed
make[1]: *** [CMakeFiles/decodeplugin.dir/all] Error 2
make[1]: Leaving directory '/home/yichao/MyDocuments/tensorrtx/retinaface/build'
Makefile:90: recipe for target 'all' failed
make: *** [all] Error 2
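Cause:
The TensorRT paths in CMakeLists.txt are wrong. The TensorRT version used for the build must match the TensorRT version installed on the system (7.1.3.4 here), and this code did not compile against TensorRT 8.

Solution:
In /home/yichao/MyDocuments/tensorrtx/retinaface/CMakeLists.txt, change the TensorRT configuration from
# tensorRT
include_directories(/home/yichao/360Downloads/TensorRT-8.0.1.6/include)
link_directories(/home/yichao/360Downloads/TensorRT-8.0.1.6/lib/)
to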
# tensorRT
include_directories(/home/yichao/360Downloads/TensorRT-7.1.3.4/include)
link_directories(/home/yichao/360Downloads/TensorRT-7.1.3.4/lib/)

Q4: -lnvinfer cannot be found

Four ways to fix "/usr/bin/ld: cannot find -lXXX" during make

/usr/bin/ld: cannot find -lnvinfer
collect2: error: ld returned 1 exit status
CMakeFiles/decodeplugin.dir/build.make:90: recipe for target 'libdecodeplugin.so' failed
make[2]: *** [libdecodeplugin.so] Error 1
CMakeFiles/Makefile2:86: recipe for target 'CMakeFiles/decodeplugin.dir/all' failed
make[1]: *** [CMakeFiles/decodeplugin.dir/all] Error 2
Makefile:90: recipe for target 'all' failed
make: *** [all] Error 2
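Cause:
The linker cannot find the nvinfer library. The file should be named "libnvinfer.so"; the naming rule is lib + library name (the xxx in -lxxx) + .so.

Solution:
1. Locate libnvinfer.so
(with find) find / -name libnvinfer.so
or
(with locate) locate libnvinfer.so

# Output
/home/yichao/360Downloads/TensorRT-7.1.3.4/lib/libnvinfer.so

2. Create a symbolic link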
sudo ln -s /home/yichao/360Downloads/TensorRT-7.1.3.4/lib/libnvinfer.so /usr/lib/libnvinfer.so
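
The naming rule can also be turned into a small search helper. The sketch below maps a linker name such as nvinfer to lib<name>.so and walks a list of directories, much like the find command above. The helper name find_shared_lib is hypothetical, and the demo runs on a throwaway directory so that it works on any machine.

```python
import os
import tempfile


def find_shared_lib(name, search_dirs):
    """Return paths of lib<name>.so found under `search_dirs` (as for -l<name>)."""
    target = f"lib{name}.so"
    hits = []
    for root_dir in search_dirs:
        for dirpath, _dirnames, filenames in os.walk(root_dir):
            if target in filenames:
                hits.append(os.path.join(dirpath, target))
    return hits


# Demo: create an empty placeholder file and search for it.
with tempfile.TemporaryDirectory() as tmp:
    lib_dir = os.path.join(tmp, "TensorRT", "lib")
    os.makedirs(lib_dir)
    open(os.path.join(lib_dir, "libnvinfer.so"), "w").close()
    print(find_shared_lib("nvinfer", [tmp]))
```

On a real system the search directory would be wherever TensorRT was unpacked, for example the 360Downloads path shown above.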

Q5: Source code error

Accelerating YOLOv5 Inference with TensorRT (Part 2)

make[2]: *** [CMakeFiles/retina_r50.dir/calibrator.cpp.o] Error 1
make[2]: *** Waiting for unfinished jobs....
/home/yichao/MyDocuments/tensorrtx/retinaface/calibrator.cpp: In member function ‘virtual bool Int8EntropyCalibrator2::getBatch(void**, const char**, int)’:
/home/yichao/MyDocuments/tensorrtx/retinaface/calibrator.cpp:52:131: error: too many arguments to function ‘cv::Mat cv::dnn::experimental_dnn_v1::blobFromImages(const std::vector<cv::Mat>&, double, cv::Size, const Scalar&, bool)’
     cv::Mat blob = cv::dnn::blobFromImages(input_imgs_, 1.0, cv::Size(input_w_, input_h_), cv::Scalar(104, 117, 123), false, false);
                                                                                                                                   ^
compilation terminated due to -Wfatal-errors.
CMakeFiles/retina_mnet.dir/build.make:75: recipe for target 'CMakeFiles/retina_mnet.dir/calibrator.cpp.o' failed
make[2]: *** [CMakeFiles/retina_mnet.dir/calibrator.cpp.o] Error 1
make[2]: *** Waiting for unfinished jobs....
CMakeFiles/Makefile2:138: recipe for target 'CMakeFiles/retina_mnet.dir/all' failed
make[1]: *** [CMakeFiles/retina_mnet.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....
CMakeFiles/Makefile2:112: recipe for target 'CMakeFiles/retina_r50.dir/all' failed
make[1]: *** [CMakeFiles/retina_r50.dir/all] Error 2
Makefile:90: recipe for target 'all' failed
make: *** [all] Error 2
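Cause:
Source code error at
/home/yichao/MyDocuments/tensorrtx/retinaface/calibrator.cpp:52
The call passes six arguments, but as the compiler message shows, blobFromImages in the installed OpenCV 3.3.0 accepts only five.

Solution:
Edit the source, changing

cv::Mat blob = cv::dnn::blobFromImages(input_imgs_, 1.0, cv::Size(input_w_, input_h_), cv::Scalar(104, 117, 123), false, false);
to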
cv::Mat blob = cv::dnn::blobFromImages(input_imgs_, 1.0, cv::Size(input_w_, input_h_), cv::Scalar(104, 117, 123), false);

Q6: Out of GPU memory

Cuda Error in allocate: 2 (out of memory) - GPU Memory Leak? #851

GPU memory is insufficient, so building the engine fails.

[TensorRT] ERROR: ../rtSafe/safeRuntime.cpp (25) - Cuda Error in allocate: 2 (out of memory)
[TensorRT] ERROR: ../rtSafe/safeRuntime.cpp (25) - Cuda Error in allocate: 2 (out of memory)
Traceback (most recent call last):
  File "/media/yichao/蚁巢文件/YOYOFile/YOYOFile/demo/build_engine.py", line 146, in <module>
    main(args)
  File "/media/yichao/蚁巢文件/YOYOFile/YOYOFile/demo/build_engine.py", line 126, in main
    builder.create_engine(args.engine, args.precision)
  File "/media/yichao/蚁巢文件/YOYOFile/YOYOFile/demo/build_engine.py", line 118, in create_engine
    with self.builder.build_engine(self.network, self.config) as engine, open(engine_path, "wb") as f:
AttributeError: __enter__
Cause:

I used the Python API, and building the engine failed on a GeForce GTX 1650 (4 GB) server. It also failed on a Jetson TX2 (8 GB) development board.
Explanation 1:
Same problem. But this problem only happens when my system is 1080ti+tensorRT7.0+cuda10.0+centos7.6. When I change to 2080ti+tensorRT7.0, everything works fine. 

Explanation 2:
I face the problem with 1080 and no problem on 2080. And I don't found any debug means.
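
The AttributeError: __enter__ at the end of the traceback is a follow-on error. In the TensorRT Python API, build_engine returns None when the build fails, and None cannot be used in a with statement. A guard such as the hypothetical save_engine helper below would surface the real failure instead. FakeEngine is a stand-in that only mimics serialize(), so the sketch runs without TensorRT installed.

```python
import os
import tempfile


def save_engine(engine, engine_path):
    """Serialize `engine` to `engine_path`, failing loudly if the build returned None."""
    if engine is None:
        raise RuntimeError("engine build failed (check the TensorRT log, e.g. out of memory)")
    with open(engine_path, "wb") as f:
        f.write(engine.serialize())


class FakeEngine:
    """Stand-in for a built engine; only mimics serialize()."""

    def serialize(self):
        return b"engine-bytes"


path = os.path.join(tempfile.mkdtemp(), "demo.engine")
save_engine(FakeEngine(), path)  # normal case: bytes are written to disk
try:
    save_engine(None, path)  # failed build: clear error instead of __enter__
except RuntimeError as err:
    print(err)
```

This does not fix the out-of-memory condition itself; it only makes the failure easier to read.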