Accelerating RetinaFace Inference with TensorRT (Part 1)

尉迟兴修

2023-12-01

1. References

tensorrtx/retinaface
Accelerating YOLOv5 Inference with TensorRT (Part 1)
Accelerating YOLOv5 Inference with TensorRT (Part 2)

2. Experimental Environment

## System environment

Environment
Operating System + Version: Ubuntu + 16.04
TensorRT Version: 7.1.3.4
GPU Type: GeForce GTX1650,4GB
Nvidia Driver Version: 470.63.01
CUDA Version: 10.2.300
CUDNN Version: 7.6.5
Python Version (if applicable): 3.7.3
Anaconda Version: 4.10.3
gcc: 7.5.0
g++: 7.5.0

tensorRT-yolov5.yaml

name: tensorRT-yolov5
channels:
  - <unknown>
  - http://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
  - http://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/r
  - http://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/msys2
dependencies:
  - _libgcc_mutex=0.1=main
  - _openmp_mutex=4.5=1_gnu
  - blas=1.0=mkl
  - bzip2=1.0.8=h7b6447c_0
  - ca-certificates=2021.7.5=h06a4308_1
  - certifi=2021.5.30=py37h06a4308_0
  - cudatoolkit=10.2.89=hfd86e86_1
  - ffmpeg=4.2.2=h20bf706_0
  - freetype=2.10.4=h5ab3b9f_0
  - gmp=6.2.1=h2531618_2
  - gnutls=3.6.15=he1e5248_0
  - jpeg=9b=h024ee3a_2
  - lame=3.100=h7b6447c_0
  - lcms2=2.12=h3be6417_0
  - libedit=3.1.20210714=h7f8727e_0
  - libffi=3.2.1=hf484d3e_1007
  - libgcc-ng=9.3.0=h5101ec6_17
  - libgomp=9.3.0=h5101ec6_17
  - libidn2=2.3.2=h7f8727e_0
  - libopus=1.3.1=h7b6447c_0
  - libpng=1.6.37=hbc83047_0
  - libstdcxx-ng=9.3.0=hd4cf53a_17
  - libtasn1=4.16.0=h27cfd23_0
  - libtiff=4.2.0=h85742a9_0
  - libunistring=0.9.10=h27cfd23_0
  - libuv=1.40.0=h7b6447c_0
  - libvpx=1.7.0=h439df22_0
  - libwebp-base=1.2.0=h27cfd23_0
  - lz4-c=1.9.3=h295c915_1
  - mkl_fft=1.3.0=py37h42c9631_2
  - mkl_random=1.2.2=py37h51133e4_0
  - ncurses=6.2=he6710b0_1
  - nettle=3.7.3=hbbd107a_1
  - ninja=1.10.2=hff7bd54_1
  - numpy-base=1.20.3=py37h74d4b33_0
  - openh264=2.1.0=hd408876_0
  - openjpeg=2.4.0=h3ad879b_0
  - openssl=1.1.1l=h7f8727e_0
  - pip=21.2.2=py37h06a4308_0
  - python=3.7.3=h0371630_0
  - pytorch=1.8.0=py3.7_cuda10.2_cudnn7.6.5_0
  - readline=7.0=h7b6447c_5
  - setuptools=52.0.0=py37h06a4308_0
  - six=1.16.0=pyhd3eb1b0_0
  - sqlite=3.33.0=h62c20be_0
  - tk=8.6.10=hbc83047_0
  - torchvision=0.9.0=py37_cu102
  - typing_extensions=3.10.0.0=pyh06a4308_0
  - wheel=0.37.0=pyhd3eb1b0_0
  - x264=1!157.20191217=h7b6447c_0
  - xz=5.2.5=h7b6447c_0
  - zlib=1.2.11=h7b6447c_3
  - zstd=1.4.9=haebb681_0
  - pip:
    - appdirs==1.4.4
    - charset-normalizer==2.0.4
    - cycler==0.10.0
    - dpcpp-cpp-rt==2021.3.0
    - flatbuffers==2.0
    - graphsurgeon==0.4.5
    - idna==3.2
    - intel-cmplr-lib-rt==2021.3.0
    - intel-cmplr-lic-rt==2021.3.0
    - intel-opencl-rt==2021.3.0
    - intel-openmp==2021.3.0
    - kiwisolver==1.3.1
    - mako==1.1.5
    - markupsafe==2.0.1
    - matplotlib==3.4.3
    - mkl==2021.3.0
    - mkl-fft==1.3.0
    - mkl-service==2.4.0
    - netron==5.1.6
    - numpy==1.21.2
    - olefile==0.46
    - onnx==1.10.1
    - onnx-simplifier==0.3.6
    - onnxoptimizer==0.2.6
    - onnxruntime==1.8.1
    - opencv-python==4.5.3.56
    - pandas==1.3.2
    - pillow==8.3.2
    - protobuf==3.17.3
    - pycuda==2021.1
    - pyparsing==2.4.7
    - python-dateutil==2.8.2
    - pytools==2021.2.8
    - pytz==2021.1
    - pyyaml==5.4.1
    - requests==2.26.0
    - scipy==1.7.1
    - seaborn==0.11.2
    - tbb==2021.3.0
    - tensorrt==7.1.3.4
    - torchsummary==1.5.1
    - tqdm==4.62.2
    - typing-extensions==3.10.0.2
    - uff==0.6.9
    - urllib3==1.26.6
prefix: /home/yichao/miniconda3/envs/tensorRT-yolov5

requirements-gpu.txt

appdirs==1.4.4
certifi==2021.5.30
charset-normalizer==2.0.4
cycler==0.10.0
dpcpp-cpp-rt==2021.3.0
flatbuffers==2.0
graphsurgeon @ file:///home/yichao/360Downloads/TensorRT-7.1.3.4/graphsurgeon/graphsurgeon-0.4.5-py2.py3-none-any.whl
idna==3.2
intel-cmplr-lib-rt==2021.3.0
intel-cmplr-lic-rt==2021.3.0
intel-opencl-rt==2021.3.0
intel-openmp==2021.3.0
kiwisolver==1.3.1
Mako==1.1.5
MarkupSafe==2.0.1
matplotlib==3.4.3
mkl==2021.3.0
mkl-fft==1.3.0
mkl-random @ file:///tmp/build/80754af9/mkl_random_1626179032232/work
mkl-service==2.4.0
netron==5.1.6
numpy==1.21.2
olefile==0.46
onnx==1.10.1
onnx-simplifier==0.3.6
onnxoptimizer==0.2.6
onnxruntime==1.8.1
opencv-python==4.5.3.56
pandas==1.3.2
Pillow==8.3.2
protobuf==3.17.3
pycuda==2021.1
pyparsing==2.4.7
python-dateutil==2.8.2
pytools==2021.2.8
pytz==2021.1
PyYAML==5.4.1
requests==2.26.0
scipy==1.7.1
seaborn==0.11.2
six @ file:///tmp/build/80754af9/six_1623709665295/work
tbb==2021.3.0
tensorrt @ file:///home/yichao/360Downloads/TensorRT-7.1.3.4/python/tensorrt-7.1.3.4-cp37-none-linux_x86_64.whl
torch==1.8.0
torchsummary==1.5.1
torchvision==0.9.0
tqdm==4.62.2
typing-extensions==3.10.0.2
uff @ file:///home/yichao/360Downloads/TensorRT-7.1.3.4/uff/uff-0.6.9-py2.py3-none-any.whl
urllib3==1.26.6

3. Important Notes

3.1 Configuration

Input shape INPUT_H, INPUT_W defined in decode.h
INT8/FP16/FP32 can be selected by the macro USE_FP16 or USE_INT8 or USE_FP32 in retina_r50.cpp
GPU id can be selected by the macro DEVICE in retina_r50.cpp
Batchsize can be selected by the macro BATCHSIZE in retina_r50.cpp

3.2 Downloading pretrained models

face-recognition-models

face-detection-models

face-alignment-models

face-attribute-models

4. Key Steps

The steps below use FP16 as the example.

4.1 Generating the .wts file from the PyTorch pretrained model

4.1.1 Clone the GitHub repository

git clone https://github.com/wang-xinyu/Pytorch_Retinaface.git
// download its weights 'Resnet50_Final.pth', put it in Pytorch_Retinaface/weights

4.1.2 Save the pretrained model

cd Pytorch_Retinaface
python detect.py --save_model

4.1.3 Generate the .wts file

python genwts.py
// a file 'retinaface.wts' will be generated.
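The post does not show what ends up inside retinaface.wts. As a rough illustration only, the sketch below assumes the plain-text layout that tensorrtx weight-export scripts commonly use: a tensor count on the first line, then one line per tensor holding its name, its element count, and each float32 value as big-endian hex. The helper name `dump_wts` and the tensor name `conv.weight` are hypothetical, and plain Python lists stand in for the PyTorch tensors that genwts.py would read.

```python
import struct


def dump_wts(weights):
    """Render {name: list of floats} in the assumed .wts text layout."""
    lines = [str(len(weights))]  # first line: number of tensors
    for name, values in weights.items():
        # Each value is packed as a big-endian float32 and written as hex.
        hex_values = [struct.pack(">f", float(v)).hex() for v in values]
        lines.append(" ".join([name, str(len(values))] + hex_values))
    return "\n".join(lines) + "\n"


# Hypothetical two-element tensor, just to show the layout.
demo_text = dump_wts({"conv.weight": [1.0, -2.0]})
print(demo_text, end="")
```

A text format like this is easy to inspect and to parse from C++, at the cost of a larger file than a binary dump.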

4.2 Preparing tensorrtx

git clone https://github.com/wang-xinyu/tensorrtx.git
cd tensorrtx/retinaface
// put retinaface.wts here
mkdir build
cd build

4.3 Configure with cmake

yichao@yichao:~/MyDocuments/tensorrtx/retinaface/build$ cmake ..
CMake Deprecation Warning at CMakeLists.txt:1 (cmake_minimum_required):
  Compatibility with CMake < 2.8.12 will be removed from a future version of
  CMake.

  Update the VERSION argument <min> value or use a ...<max> suffix to tell
  CMake that the project does not need compatibility with older versions.


-- The C compiler identification is GNU 7.5.0
-- The CXX compiler identification is GNU 7.5.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found CUDA: /usr/local/cuda (found version "10.2") 
embed_platform off
-- Found OpenCV: /usr/local/opencv3.3.0 (found version "3.3.0") 
-- Configuring done
-- Generating done
-- Build files have been written to: /home/yichao/MyDocuments/tensorrtx/retinaface/build

4.4 Build with make -j8

# Print the full build log
make VERBOSE=1

(tensorRT-yolov5) yichao@yichao:~/MyDocuments/tensorrtx/retinaface/build$ make -j8
[ 12%] Building NVCC (Device) object CMakeFiles/decodeplugin.dir/decodeplugin_generated_decode.cu.o
/home/yichao/MyDocuments/tensorrtx/retinaface/decode.h(73): warning: function "nvinfer1::IPluginV2Ext::configurePlugin(const nvinfer1::Dims *, int, const nvinfer1::Dims *, int, const nvinfer1::DataType *, const nvinfer1::DataType *, const __nv_bool *, const __nv_bool *, nvinfer1::PluginFormat, int)" is hidden by "nvinfer1::DecodePlugin::configurePlugin" -- virtual function override intended?

/home/yichao/MyDocuments/tensorrtx/retinaface/decode.h(73): warning: function "nvinfer1::IPluginV2Ext::configurePlugin(const nvinfer1::Dims *, int, const nvinfer1::Dims *, int, const nvinfer1::DataType *, const nvinfer1::DataType *, const bool *, const bool *, nvinfer1::PluginFormat, int)" is hidden by "nvinfer1::DecodePlugin::configurePlugin" -- virtual function override intended?
...
...
...
[ 87%] Linking CXX executable retina_mnet
[100%] Linking CXX executable retina_r50
[100%] Built target retina_r50
[100%] Built target retina_mnet

4.5 Build the engine

./retina_r50 -s

(tensorRT-yolov5) yichao@yichao:~/MyDocuments/tensorrtx/retinaface/build$ time ./retina_r50 -s
Loading weights: ../retinaface.wts
Building engine, please wait for a while...
Build engine successfully!

real	1m3.483s
user	0m33.287s
sys	0m5.715s

The generated engine file is 78.2 MB.

4.5.1 GPU memory usage

Thu Jan 13 16:00:02 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.63.01    Driver Version: 470.63.01    CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0  On |                  N/A |
| 27%   36C    P0    28W /  75W |    828MiB /  3903MiB |     63%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1623      G   /usr/lib/xorg/Xorg                209MiB |
|    0   N/A  N/A     23027      C   ./retina_r50                      615MiB |
+-----------------------------------------------------------------------------+
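To compare figures like these across the FP32, FP16, and INT8 runs without reading each table by eye, the memory field can be pulled out of the nvidia-smi text. This is a minimal sketch: `parse_memory_usage` is a hypothetical helper, and the sample line is copied from the output above.

```python
import re


def parse_memory_usage(smi_text):
    """Return (used_mib, total_mib) from the first 'NNNMiB / NNNMiB' field."""
    match = re.search(r"(\d+)MiB\s*/\s*(\d+)MiB", smi_text)
    if match is None:
        raise ValueError("no 'used / total' memory field found")
    return int(match.group(1)), int(match.group(2))


sample_line = "| 27%   36C    P0    28W /  75W |    828MiB /  3903MiB |     63%      Default |"
used, total = parse_memory_usage(sample_line)
print(f"{used} MiB of {total} MiB in use ({100 * used / total:.1f}%)")
```

The per-process rows (for example the 615MiB line for ./retina_r50) use a different layout and would need their own pattern.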

4.6 Inference

4.6.1 Download the test image

wget https://github.com/Tencent/FaceDetection-DSFD/raw/master/data/worlds-largest-selfie.jpg

If the download is too slow, use the mirror instead:
wget https://github.com.cnpmjs.org/Tencent/FaceDetection-DSFD/raw/master/data/worlds-largest-selfie.jpg

(tensorRT-yolov5) yichao@yichao:~/MyDocuments/tensorrtx/retinaface/build$ wget https://github.com.cnpmjs.org/Tencent/FaceDetection-DSFD/raw/master/data/worlds-largest-selfie.jpg
--2022-01-13 15:02:13--  https://github.com.cnpmjs.org/Tencent/FaceDetection-DSFD/raw/master/data/worlds-largest-selfie.jpg
Resolving github.com.cnpmjs.org (github.com.cnpmjs.org)... 47.241.4.205
Connecting to github.com.cnpmjs.org (github.com.cnpmjs.org)|47.241.4.205|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://raw.githubusercontent.com/Tencent/FaceDetection-DSFD/master/data/worlds-largest-selfie.jpg [following]
--2022-01-13 15:02:14--  https://raw.githubusercontent.com/Tencent/FaceDetection-DSFD/master/data/worlds-largest-selfie.jpg
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.72.133
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.72.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 471393 (460K) [image/jpeg]
Saving to: ‘worlds-largest-selfie.jpg’

worlds-largest-selfi 100%[===================>] 460.34K  13.0KB/s    in 28s     

2022-01-13 15:02:44 (16.5 KB/s) - ‘worlds-largest-selfie.jpg’ saved [471393/471393]

4.6.2 Measure inference speed

./retina_r50 -d

(tensorRT-yolov5) yichao@yichao:~/MyDocuments/tensorrtx/retinaface/build$ ./retina_r50 -d
445571us
19030us
...
...
...
15157us
15870us
number of detections -> 1433
 -> 515.064
after nms -> 256
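The first timing in the log is much larger than the rest, which suggests a warm-up iteration that should be left out of any average. A minimal sketch of that calculation follows; `mean_latency_ms` is a hypothetical helper, and the sample text only contains the lines quoted above, so the elided iterations are not part of the result.

```python
import re
from statistics import mean


def mean_latency_ms(log_text, warmup=1):
    """Average the 'NNNus' lines in milliseconds, skipping warm-up iterations."""
    times_us = [int(t) for t in re.findall(r"^[ \t]*(\d+)us[ \t]*$", log_text, flags=re.MULTILINE)]
    steady = times_us[warmup:]
    if not steady:
        raise ValueError("no timing lines left after dropping warm-up")
    return mean(steady) / 1000.0


sample_log = """445571us
19030us
...
15157us
15870us
"""
print(f"{mean_latency_ms(sample_log):.1f} ms per image")
```

Lines such as `...` or `after nms -> 256` are ignored because they do not match the timing pattern.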

4.7 python infer

Edit the image path in retinaface_trt.py.

input_image_paths = ["worlds-largest-selfie.jpg"]

(tensorRT-yolov5) yichao@yichao:~/MyDocuments/tensorrtx/retinaface$ python retinaface_trt.py 
3.9774467945098877
0.017582416534423828
0.01763463020324707
0.021233797073364258
0.017621517181396484
0.017649412155151367
0.017993688583374023
0.017635107040405273
0.01763153076171875
0.017618894577026367

5. TensorRT FP32 Inference

Accelerating YOLOv5 Inference with TensorRT (Part 1)

Select the USE_FP32 macro in retina_r50.cpp; for everything else, follow the key steps above.

5.1 Build the engine

(tensorRT-yolov5) yichao@yichao:~/MyDocuments/tensorrtx/retinaface/build$ time ./retina_r50 -s
Loading weights: ../retinaface.wts
Building engine, please wait for a while...
Build engine successfully!

real	0m27.783s
user	0m18.162s
sys	0m2.295s

The generated engine file is 154.2 MB.

5.1.1 GPU memory usage

Thu Jan 13 16:10:38 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.63.01    Driver Version: 470.63.01    CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0  On |                  N/A |
| 27%   36C    P0    42W /  75W |    834MiB /  3903MiB |     56%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1623      G   /usr/lib/xorg/Xorg                209MiB |
|    0   N/A  N/A     23509      C   ./retina_r50                      621MiB |
+-----------------------------------------------------------------------------+

5.2 Inference

(tensorRT-yolov5) yichao@yichao:~/MyDocuments/tensorrtx/retinaface/build$ ./retina_r50 -d
436509us
30747us
30568us
...
...
...
29127us
28726us
28716us
number of detections -> 1433
 -> 515.075
after nms -> 257

5.3 python infer

(tensorRT-yolov5) yichao@yichao:~/MyDocuments/tensorrtx/retinaface$ python retinaface_trt.py 
3.919330358505249
0.03155779838562012
0.031530141830444336
0.03136157989501953
0.03149151802062988
0.0314486026763916
0.03205513954162598
0.03142070770263672
0.03142905235290527
0.03143477439880371

6. TensorRT FP16 Inference

Accelerating YOLOv5 Inference with TensorRT (Part 1)

Select the USE_FP16 macro in retina_r50.cpp.

7. TensorRT INT8 Inference

7.1 Calibration dataset

7.1.1 Download the calibration dataset

download my calibration images widerface_calib from GoogleDrive or BaiduPan pwd: a9wh

7.1.2 Extract into the `retinaface/build` directory
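The INT8 build takes several minutes, so it can be worth checking the calibration folder before starting it. The sketch below assumes the archive unpacks to a folder named widerface_calib (the name from the download note) with the images directly inside it; that flat layout and the helper name `list_calib_images` are assumptions. The demo builds a throwaway folder so the snippet runs anywhere.

```python
import tempfile
from pathlib import Path


def list_calib_images(calib_dir, exts=(".jpg", ".jpeg", ".png")):
    """Return the image files found directly inside calib_dir, sorted by path."""
    root = Path(calib_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"calibration folder not found: {root}")
    return sorted(p for p in root.iterdir() if p.is_file() and p.suffix.lower() in exts)


# Demo on a temporary folder; point the function at build/widerface_calib in practice.
with tempfile.TemporaryDirectory() as tmp:
    calib = Path(tmp) / "widerface_calib"
    calib.mkdir()
    for name in ("a.jpg", "b.JPG", "notes.txt"):
        (calib / name).write_bytes(b"")
    demo_names = [p.name for p in list_calib_images(calib)]

print(demo_names)
```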

7.2 Edit `retina_r50.cpp`

USE_INT8

7.3 Build with make -j8

make -j8

7.4 Build the engine

./retina_r50 -s

(tensorRT-yolov5) yichao@yichao:~/MyDocuments/tensorrtx/retinaface/build$ time ./retina_r50 -s
Loading weights: ../retinaface.wts
Your platform support int8: 1
Building engine, please wait for a while...
reading calib cache: r50_int8calib.table
2--Demonstration_2_Demonstration_Political_Rally_2_488.jpg  0
29--Students_Schoolkids_29_Students_Schoolkids_Students_Schoolkids_29_517.jpg  1
39--Ice_Skating_39_Ice_Skating_Ice_Skating_39_344.jpg  2
...
...
...
61--Street_Battle_61_Street_Battle_streetfight_61_566.jpg  998
2--Demonstration_2_Demonstration_Demonstration_Or_Protest_2_260.jpg  999
reading calib cache: r50_int8calib.table
writing calib cache: r50_int8calib.table size: 12200
Build engine successfully!

real	7m25.594s
user	5m58.694s
sys	1m34.686s

The generated engine file is 30.1 MB.

7.4.1 GPU memory usage

Thu Jan 13 15:42:58 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.63.01    Driver Version: 470.63.01    CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0  On |                  N/A |
| 27%   39C    P0    45W /  75W |   1073MiB /  3903MiB |     86%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1623      G   /usr/lib/xorg/Xorg                209MiB |
|    0   N/A  N/A     22413      C   ./retina_r50                      860MiB |
+-----------------------------------------------------------------------------+

7.5 Inference

./retina_r50 -d

(tensorRT-yolov5) yichao@yichao:~/MyDocuments/tensorrtx/retinaface/build$ time ./retina_r50 -d
424574us
13240us
14247us
...
...
...
11711us
11662us
11103us
number of detections -> 1382
 -> 11.1058
after nms -> 246

7.6 python infer

(tensorRT-yolov5) yichao@yichao:~/MyDocuments/tensorrtx/retinaface$ python retinaface_trt.py 
3.9951412677764893
0.014085054397583008
0.014075279235839844
0.013991594314575195
0.014072656631469727
0.014059305191040039
0.014052867889404297
0.014079093933105469
0.01405954360961914
0.014012575149536133

8. RetinaFace Performance Analysis

Performance analysis of the RetinaFace face detector

Precision	Infer Time
FP32	29ms
FP16	15ms
INT8	11ms

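Summary: FP16 is roughly 2x faster than FP32 (15 ms vs. 29 ms per image), while INT8 adds only a modest further gain over FP16 (11 ms vs. 15 ms).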
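The ratios behind this summary can be worked out directly from the numbers reported in this post. The sketch below uses the latencies from the table and the engine file sizes from sections 4.5, 5.1, and 7.4; the helper name `relative_to` is hypothetical.

```python
# Figures reported in this post: per-image latency and engine file size.
latency_ms = {"FP32": 29, "FP16": 15, "INT8": 11}
engine_mb = {"FP32": 154.2, "FP16": 78.2, "INT8": 30.1}


def relative_to(baseline, table):
    """Return baseline / value for every entry, so larger means a bigger gain."""
    return {mode: table[baseline] / value for mode, value in table.items()}


speedup = relative_to("FP32", latency_ms)
shrink = relative_to("FP32", engine_mb)
for mode in latency_ms:
    print(f"{mode}: {speedup[mode]:.2f}x the FP32 speed, engine {shrink[mode]:.2f}x smaller")

# INT8 measured against FP16 rather than FP32.
int8_vs_fp16 = latency_ms["FP16"] / latency_ms["INT8"]
print(f"INT8 vs FP16: {int8_vs_fp16:.2f}x")
```

These are single-machine measurements on a GTX 1650, so the ratios should be read as indicative rather than general.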

9. Troubleshooting

Q1: OpenCV and CUDA versions do not match, so cmake fails

CMake Error at /usr/local/opencv3.3.0/share/OpenCV/OpenCVConfig.cmake:108 (message):
  OpenCV static library was compiled with CUDA 10.2 support.  Please, use the
  same version or rebuild OpenCV with CUDA 11.0
Call Stack (most recent call first):
  CMakeLists.txt:28 (find_package)

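Cause:
The OpenCV build and the active CUDA version do not match. The installed OpenCV 3.3.0 was compiled against CUDA 10.2, but the CUDA version currently in use is 11.0.

Solution:
Rebuilding OpenCV is tedious, so simply switch back to CUDA 10.2. See the blog post
[Switching between multiple coexisting CUDA versions on Ubuntu](https://blog.csdn.net/m0_37605642/article/details/120098215)

Note: after switching the CUDA version, delete everything in the build directory and rerun cmake.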
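Before rerunning cmake, it may help to confirm which CUDA release the toolchain actually reports. A minimal sketch: `parse_nvcc_release` is a hypothetical helper that reads the "release X.Y" field printed by `nvcc --version`, and the sample string is an illustrative line in that format, not output captured from the machine used in this post.

```python
import re


def parse_nvcc_release(nvcc_output):
    """Return (major, minor) from the 'release X.Y' field of nvcc --version output."""
    match = re.search(r"release\s+(\d+)\.(\d+)", nvcc_output)
    if match is None:
        raise ValueError("no 'release X.Y' field found")
    return int(match.group(1)), int(match.group(2))


# Illustrative sample line (assumed format).
sample_output = "Cuda compilation tools, release 10.2, V10.2.89"
expected = (10, 2)  # the CUDA version OpenCV 3.3.0 was built against in this post
found = parse_nvcc_release(sample_output)
print("match" if found == expected else f"mismatch: found {found}, expected {expected}")
```

In practice the text would come from something like `subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout`.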

Q2: Cannot find `NvInfer.h`

fatal error: NvInfer.h: No such file or directory | TensorRT error handling | [Solved]

yichao@yichao:~/MyDocuments/tensorrtx/retinaface/build$ make -j8
[ 12%] Building NVCC (Device) object CMakeFiles/decodeplugin.dir/decodeplugin_generated_decode.cu.o
In file included from /home/yichao/MyDocuments/tensorrtx/retinaface/decode.cu:1:0:
/home/yichao/MyDocuments/tensorrtx/retinaface/decode.h:6:10: fatal error: NvInfer.h: No such file or directory
 #include "NvInfer.h"
          ^~~~~~~~~~~
compilation terminated.
CMake Error at decodeplugin_generated_decode.cu.o.Debug.cmake:220 (message):
  Error generating
  /home/yichao/MyDocuments/tensorrtx/retinaface/build/CMakeFiles/decodeplugin.dir//./decodeplugin_generated_decode.cu.o

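Cause:
NvInfer.h is a header that ships with TensorRT; the compiler must be able to find it when building the C++ code.

Solution:
In /home/yichao/MyDocuments/tensorrtx/retinaface/CMakeLists.txt, add the TensorRT include and library directories: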

# tensorRT
include_directories(/home/yichao/360Downloads/TensorRT-7.1.3.4/include)
link_directories(/home/yichao/360Downloads/TensorRT-7.1.3.4/lib/)
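If the TensorRT install location is not known, the directory to pass to include_directories can be found by searching for the header. A minimal sketch, assuming TensorRT was unpacked somewhere under a few candidate roots; `find_file` is a hypothetical helper, and the demo builds a temporary tree that mimics the layout used in this post.

```python
import tempfile
from pathlib import Path


def find_file(name, roots):
    """Search each existing root recursively and return every path named `name`."""
    hits = []
    for root in roots:
        root = Path(root)
        if root.is_dir():
            hits.extend(sorted(root.rglob(name)))
    return hits


# Demo on a temporary tree; in practice pass roots such as the downloads folder.
with tempfile.TemporaryDirectory() as tmp:
    include_dir = Path(tmp) / "TensorRT-7.1.3.4" / "include"
    include_dir.mkdir(parents=True)
    (include_dir / "NvInfer.h").write_text("// placeholder header\n")
    demo_hits = [p.relative_to(tmp).as_posix() for p in find_file("NvInfer.h", [tmp, "/no/such/dir"])]

print(demo_hits)
```

The parent directory of each hit is what belongs in include_directories.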

Q3: TensorRT 8 is not supported

32 errors detected in the compilation of "/tmp/tmpxft_00003bbc_00000000-6_decode.cpp1.ii".
-- Removing /home/yichao/MyDocuments/tensorrtx/retinaface/build/CMakeFiles/decodeplugin.dir//./decodeplugin_generated_decode.cu.o
/home/yichao/360Downloads/cmake-3.21.1-linux-x86_64/bin/cmake -E rm -f /home/yichao/MyDocuments/tensorrtx/retinaface/build/CMakeFiles/decodeplugin.dir//./decodeplugin_generated_decode.cu.o
CMake Error at decodeplugin_generated_decode.cu.o.Debug.cmake:280 (message):
  Error generating file
  /home/yichao/MyDocuments/tensorrtx/retinaface/build/CMakeFiles/decodeplugin.dir//./decodeplugin_generated_decode.cu.o


CMakeFiles/decodeplugin.dir/build.make:75: recipe for target 'CMakeFiles/decodeplugin.dir/decodeplugin_generated_decode.cu.o' failed
make[2]: *** [CMakeFiles/decodeplugin.dir/decodeplugin_generated_decode.cu.o] Error 1
make[2]: Leaving directory '/home/yichao/MyDocuments/tensorrtx/retinaface/build'
CMakeFiles/Makefile2:86: recipe for target 'CMakeFiles/decodeplugin.dir/all' failed
make[1]: *** [CMakeFiles/decodeplugin.dir/all] Error 2
make[1]: Leaving directory '/home/yichao/MyDocuments/tensorrtx/retinaface/build'
Makefile:90: recipe for target 'all' failed
make: *** [all] Error 2

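Cause:
A TensorRT configuration problem in CMakeLists.txt: the TensorRT version used by make must match the TensorRT version of the environment. Here CMakeLists.txt pointed at TensorRT 8.0.1.6 while the environment uses 7.1.3.4.

Solution:
In /home/yichao/MyDocuments/tensorrtx/retinaface/CMakeLists.txt, change the TensorRT configuration from
# tensorRT
include_directories(/home/yichao/360Downloads/TensorRT-8.0.1.6/include)
link_directories(/home/yichao/360Downloads/TensorRT-8.0.1.6/lib/)
to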
# tensorRT
include_directories(/home/yichao/360Downloads/TensorRT-7.1.3.4/include)
link_directories(/home/yichao/360Downloads/TensorRT-7.1.3.4/lib/)

Q4: Cannot find `-lnvinfer`

Four ways to fix the "/usr/bin/ld: cannot find -lXXX" error during make

/usr/bin/ld: cannot find -lnvinfer
collect2: error: ld returned 1 exit status
CMakeFiles/decodeplugin.dir/build.make:90: recipe for target 'libdecodeplugin.so' failed
make[2]: *** [libdecodeplugin.so] Error 1
CMakeFiles/Makefile2:86: recipe for target 'CMakeFiles/decodeplugin.dir/all' failed
make[1]: *** [CMakeFiles/decodeplugin.dir/all] Error 2
Makefile:90: recipe for target 'all' failed
make: *** [all] Error 2

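Cause:
The linker cannot find the nvinfer library. The file should be named "libnvinfer.so", following the convention lib + library name (the xxx in -lxxx) + .so.

Solution:
1. Locate libnvinfer.so
(with find) find / -name libnvinfer.so
or
(with locate) locate libnvinfer.so

# Output
/home/yichao/360Downloads/TensorRT-7.1.3.4/lib/libnvinfer.so

2. Create a symbolic link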
sudo ln -s /home/yichao/360Downloads/TensorRT-7.1.3.4/lib/libnvinfer.so /usr/lib/libnvinfer.so

Q5: Source code error

Accelerating YOLOv5 Inference with TensorRT (Part 2)

make[2]: *** [CMakeFiles/retina_r50.dir/calibrator.cpp.o] Error 1
make[2]: *** Waiting for unfinished jobs....
/home/yichao/MyDocuments/tensorrtx/retinaface/calibrator.cpp: In member function ‘virtual bool Int8EntropyCalibrator2::getBatch(void**, const char**, int)’:
/home/yichao/MyDocuments/tensorrtx/retinaface/calibrator.cpp:52:131: error: too many arguments to function ‘cv::Mat cv::dnn::experimental_dnn_v1::blobFromImages(const std::vector<cv::Mat>&, double, cv::Size, const Scalar&, bool)’
     cv::Mat blob = cv::dnn::blobFromImages(input_imgs_, 1.0, cv::Size(input_w_, input_h_), cv::Scalar(104, 117, 123), false, false);
                                                                                                                                   ^
compilation terminated due to -Wfatal-errors.
CMakeFiles/retina_mnet.dir/build.make:75: recipe for target 'CMakeFiles/retina_mnet.dir/calibrator.cpp.o' failed
make[2]: *** [CMakeFiles/retina_mnet.dir/calibrator.cpp.o] Error 1
make[2]: *** Waiting for unfinished jobs....
CMakeFiles/Makefile2:138: recipe for target 'CMakeFiles/retina_mnet.dir/all' failed
make[1]: *** [CMakeFiles/retina_mnet.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....
CMakeFiles/Makefile2:112: recipe for target 'CMakeFiles/retina_r50.dir/all' failed
make[1]: *** [CMakeFiles/retina_r50.dir/all] Error 2
Makefile:90: recipe for target 'all' failed
make: *** [all] Error 2

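Cause:
A source code error at
/home/yichao/MyDocuments/tensorrtx/retinaface/calibrator.cpp:52
The call passes six arguments, but the blobFromImages signature reported by the compiler for the installed OpenCV 3.3.0 takes only five.

Solution:
Edit the source and drop the last argument. Change

cv::Mat blob = cv::dnn::blobFromImages(input_imgs_, 1.0, cv::Size(input_w_, input_h_), cv::Scalar(104, 117, 123), false, false);
to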
cv::Mat blob = cv::dnn::blobFromImages(input_imgs_, 1.0, cv::Size(input_w_, input_h_), cv::Scalar(104, 117, 123), false);

Q6: Out of GPU memory

Cuda Error in allocate: 2 (out of memory) - GPU Memory Leak? #851

There is not enough GPU memory, so building the engine fails.

[TensorRT] ERROR: ../rtSafe/safeRuntime.cpp (25) - Cuda Error in allocate: 2 (out of memory)
[TensorRT] ERROR: ../rtSafe/safeRuntime.cpp (25) - Cuda Error in allocate: 2 (out of memory)
Traceback (most recent call last):
  File "/media/yichao/蚁巢文件/YOYOFile/YOYOFile/demo/build_engine.py", line 146, in <module>
    main(args)
  File "/media/yichao/蚁巢文件/YOYOFile/YOYOFile/demo/build_engine.py", line 126, in main
    builder.create_engine(args.engine, args.precision)
  File "/media/yichao/蚁巢文件/YOYOFile/YOYOFile/demo/build_engine.py", line 118, in create_engine
    with self.builder.build_engine(self.network, self.config) as engine, open(engine_path, "wb") as f:
AttributeError: __enter__

Cause:

Building the engine with the Python API failed on a machine with a GeForce GTX 1650 (4 GB). It also failed on a Jetson TX2 (8 GB) development board.

Explanation 1:
Same problem. But this problem only happens when my system is 1080ti+tensorRT7.0+cuda10.0+centos7.6. When I change to 2080ti+tensorRT7.0, everything works fine. 

Explanation 2:
I face the problem with 1080 and no problem on 2080. And I don't found any debug means.
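The `AttributeError: __enter__` at the end of the traceback above is consistent with the build call having returned None after the out-of-memory errors, since None cannot be used in a `with` statement. A clearer failure can be had by checking the result before using it. This is a minimal sketch, not the code from build_engine.py: `save_engine` is a hypothetical helper, and `_FakeEngine` is a stand-in so the snippet runs without a GPU.

```python
import os
import tempfile


def save_engine(engine, engine_path):
    """Serialize a built engine, failing loudly if the build returned None."""
    if engine is None:
        raise RuntimeError("engine build failed (see the TensorRT errors logged earlier)")
    with open(engine_path, "wb") as f:
        f.write(engine.serialize())


class _FakeEngine:
    """Stand-in for a TensorRT engine object; only serialize() is mimicked."""

    def serialize(self):
        return b"engine-bytes"


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.engine")
    save_engine(_FakeEngine(), path)
    demo_size = os.path.getsize(path)

try:
    save_engine(None, "unused.engine")
    demo_error = None
except RuntimeError as exc:
    demo_error = str(exc)

print(demo_size, demo_error)
```

This does not fix the memory shortage itself; it only replaces the confusing AttributeError with a message that points at the real failure.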