Fixing RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED on Win10 (2080 Ti) + CUDA 9.2 + pytorch-gpu

晁绍辉
2023-12-01

Problem

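After installing CUDA 9.2 + Anaconda 5.0 + PyTorch 1.0.0 (build py3.7_cuda90_cudnn7_1), programs that do not use the GPU run normally, but as soon as cuda() is called the program fails with: RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED
The same code runs correctly on Ubuntu.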

The error output is as follows:
Traceback (most recent call last):
  File "D:/ProjectWork/Pythonworkp/DFT01/dft1.py", line 98, in <module>
    rnn.cuda()
  File "C:\Users\OFC\Anaconda3\envs\torch2\lib\site-packages\torch\nn\modules\module.py", line 260, in cuda
    return self._apply(lambda t: t.cuda(device))
  File "C:\Users\OFC\Anaconda3\envs\torch2\lib\site-packages\torch\nn\modules\module.py", line 187, in _apply
    module._apply(fn)
  File "C:\Users\OFC\Anaconda3\envs\torch2\lib\site-packages\torch\nn\modules\rnn.py", line 117, in _apply
    self.flatten_parameters()
  File "C:\Users\OFC\Anaconda3\envs\torch2\lib\site-packages\torch\nn\modules\rnn.py", line 113, in flatten_parameters
    self.batch_first, bool(self.bidirectional))
RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED
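Before reinstalling anything, it can help to confirm which CUDA and cuDNN versions the installed PyTorch binary was built against. The following is a minimal sketch, not part of the original fix: the function name describe_torch_cuda is made up for illustration, and it returns a small dictionary instead of failing when torch cannot be imported.

```python
import importlib


def describe_torch_cuda():
    """Return a dict describing the CUDA/cuDNN setup PyTorch reports.

    Hypothetical helper for illustration; it only reads version
    attributes and does not run any GPU kernels.
    """
    info = {"torch_installed": False}
    try:
        torch = importlib.import_module("torch")
    except (ImportError, OSError):
        # torch is missing, or its native libraries failed to load
        return info

    info["torch_installed"] = True
    info["torch_version"] = torch.__version__
    # CUDA version the binary was built with (None for CPU-only builds)
    info["built_with_cuda"] = torch.version.cuda
    info["cuda_available"] = torch.cuda.is_available()
    # cuDNN version as an integer, or None if cuDNN is unavailable
    info["cudnn_version"] = torch.backends.cudnn.version()
    if info["cuda_available"]:
        info["device_name"] = torch.cuda.get_device_name(0)
    return info


if __name__ == "__main__":
    for key, value in describe_torch_cuda().items():
        print(f"{key}: {value}")
```

If built_with_cuda differs from the CUDA toolkit installed on the machine (9.2 in this post), that mismatch is a reasonable first suspect.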

Solution

  1. Reinstall PyTorch
# The version previously installed online
C:\Windows\system32>activate torch2

(torch2) C:\Windows\system32>conda install pytorch torchvision cuda92 -c pytorch
Fetching package metadata ...............
Solving package specifications: .

Package plan for installation in environment C:\Users\OFC\Anaconda3\envs\torch2:

The following NEW packages will be INSTALLED:

   cuda92:      1.0-0                       pytorch
   ninja:       1.8.2-py37he980bc4_1
   pytorch:     1.0.0-py3.7_cuda90_cudnn7_1 pytorch
   torchvision: 0.2.1-py_2                  pytorch

Proceed ([y]/n)? y
# Uninstall
(torch2) C:\Windows\system32>conda uninstall pytorch
Fetching package metadata .............
Solving package specifications: .
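Note that the package plan above installed pytorch 1.0.0-py3.7_cuda90_cudnn7_1, a cuda90 build, even though cuda92 was requested. The sketch below extracts the CUDA and cuDNN tags from a conda build string so they can be compared with the installed toolkit. It assumes the naming pattern seen in the two build strings in this post (cudaNN and cudnnN); the function names are hypothetical.

```python
import re


def parse_build_tags(build):
    """Extract CUDA and cuDNN tags from a conda build string.

    Assumes the 'cuda<digits>' / 'cudnn<digits>' pattern seen in this
    post, e.g. 'py3.7_cuda90_cudnn7_1'. Missing tags become None.
    """
    cuda = re.search(r"cuda(\d+)", build)
    cudnn = re.search(r"cudnn(\d+)", build)

    def dotted(digits):
        # The last digit is the minor version: "90" -> "9.0", "92" -> "9.2"
        return f"{digits[:-1]}.{digits[-1]}"

    return {
        "cuda": dotted(cuda.group(1)) if cuda else None,
        "cudnn": cudnn.group(1) if cudnn else None,
    }


def matches_toolkit(build, toolkit_version):
    """True if the build's CUDA tag equals the installed toolkit version."""
    return parse_build_tags(build)["cuda"] == toolkit_version


if __name__ == "__main__":
    for build in ("py3.7_cuda90_cudnn7_1", "py37_cuda92_cudnn7he774522_1"):
        print(build, parse_build_tags(build), matches_toolkit(build, "9.2"))
```

Under this check, the online build does not match a CUDA 9.2 toolkit, while the offline package used in the next step does.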

Download the offline PyTorch package
Offline PyTorch packages can be downloaded from: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/pytorch/win-64/
Put the downloaded package in C:\Users\OFC\Anaconda3\envs, then install PyTorch:
conda install pytorch-0.4.1-py37_cuda92_cudnn7he774522_1.tar.bz2
pip install torchvision

# Reinstall PyTorch
(torch2) C:\Users\OFC\Anaconda3\envs>conda install pytorch-0.4.1-py37_cuda92_cudnn7he774522_1.tar.bz2
(torch2) C:\Users\OFC\Anaconda3\envs>pip install torchvision

Then, as prompted by the installer output:
pip install PyHamcrest==1.9.0
python -m pip install --upgrade pip

  2. Run the program; it now fails with:
Warning! HDF5 library version mismatched error
The HDF5 header files used to compile this application do not match
the version used by the HDF5 library to which this application is linked.
Data corruption or segmentation faults may occur if the application continues.
This can happen when an application was compiled by one version of HDF5 but
linked with a different version of static or shared HDF5 library.
You should recompile the application or check your shared library related
settings such as 'LD_LIBRARY_PATH'.
You can, at your own risk, disable this warning by setting the environment
variable 'HDF5_DISABLE_VERSION_CHECK' to a value of '1'.
Setting it to 2 or higher will suppress the warning messages totally.
Headers are 1.10.2, library is 1.10.1
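The last line of the warning names the two versions involved, which determines the hdf5 version to install. As a small sketch (the function name is hypothetical and the pattern is taken only from the message shown above), the versions can be pulled out like this:

```python
import re


def parse_hdf5_mismatch(message):
    """Return (headers_version, library_version) from the HDF5 warning.

    Looks for the 'Headers are X, library is Y' line shown in the
    warning text; returns None if the line is not present.
    """
    match = re.search(r"Headers are ([\d.]+), library is ([\d.]+)", message)
    if match is None:
        return None
    return match.group(1), match.group(2)


if __name__ == "__main__":
    versions = parse_hdf5_mismatch("Headers are 1.10.2, library is 1.10.1")
    if versions is not None:
        headers, library = versions
        # The application was compiled against `headers`, so that is
        # the library version it expects to find at run time.
        print(f"expected hdf5={headers}, found hdf5={library}")
```

Here the application was compiled against 1.10.2 but loaded 1.10.1, so 1.10.2 is the version to install.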

Install hdf5 in this virtual environment: conda install -c anaconda hdf5=1.10.2

(torch2) C:\Users\OFC\Anaconda3\envs>conda install -c anaconda hdf5=1.10.2
Fetching package metadata ...............
Solving package specifications: .
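  3. Set environment variables in PyCharm. The likely cause is that the program still loads hdf5 1.10.1 from the base Anaconda installation instead of the hdf5 1.10.2 just installed in the virtual environment, so the environment variables need to be set.

  4. Run -> Edit Configurations... -> Environment -> Environment variables: click the folder icon at the right of the field, then click "+" to add the following environment variables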
    LD_LIBRARY_PATH: C:\Users\OFC\Anaconda3\envs\torch2\Library\mingw-w64
    PATH: C:\Users\OFC\Anaconda3\envs\torch2\Library\bin

  5. Run the program again: the GPU is now used successfully!
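The PATH change made in PyCharm could in principle also be applied at the top of the script, before the libraries that depend on hdf5 are imported. This is a sketch under assumptions, not something the original post tested: prepend_path is a hypothetical helper, and on Python 3.8 and later Windows no longer consults PATH when resolving DLLs for extension modules, so os.add_dll_directory would be needed there instead.

```python
import os


def prepend_path(env, directory, var="PATH"):
    """Return a copy of env with directory placed first in var.

    Existing occurrences of the same directory are dropped so that
    repeated calls do not keep growing the variable.
    """
    parts = [p for p in env.get(var, "").split(os.pathsep) if p]
    parts = [
        p for p in parts
        if os.path.normcase(p) != os.path.normcase(directory)
    ]
    new_env = dict(env)
    new_env[var] = os.pathsep.join([directory] + parts)
    return new_env
```

For the environment in this post, the call would look like os.environ.update(prepend_path(os.environ, r"C:\Users\OFC\Anaconda3\envs\torch2\Library\bin")), placed before any import that loads hdf5.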
