当前位置: 首页 > 知识库问答 >
问题:

我是利库布拉斯。所以10.0:无法打开共享对象文件:没有此类文件或目录

陶寒
2023-03-14

我已经在Ubuntu18.04上安装了Cuda 10.1和cudnn,它似乎已正确安装为nvcc和nvidia smi类型,我得到了正确的响应:

user:~$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Fri_Feb__8_19:08:17_PST_2019
Cuda compilation tools, release 10.1, V10.1.105
user:~$ nvidia-smi 
Mon Mar 18 14:36:47 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.43       Driver Version: 418.43       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro K5200        Off  | 00000000:03:00.0  On |                  Off |
| 26%   39C    P8    14W / 150W |    225MiB /  8118MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1538      G   /usr/lib/xorg/Xorg                            32MiB |
|    0      1583      G   /usr/bin/gnome-shell                           5MiB |
|    0      3008      G   /usr/lib/xorg/Xorg                           100MiB |
|    0      3120      G   /usr/bin/gnome-shell                          82MiB |
+-----------------------------------------------------------------------------+

我已经使用:用户:~$sudo pi3安装-升级tensorflow-gpu安装了tensorflow

The directory '/home/amin/.cache/pip/http' or its parent directory is not owned by the current user and the cache has been disabled. Please check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.
The directory '/home/amin/.cache/pip' or its parent directory is not owned by the current user and caching wheels has been disabled. check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.
Requirement already up-to-date: tensorflow-gpu in /usr/local/lib/python3.6/dist-packages (1.13.1)
Requirement already satisfied, skipping upgrade: keras-applications>=1.0.6 in /usr/local/lib/python3.6/dist-packages (from tensorflow-gpu) (1.0.7)
Requirement already satisfied, skipping upgrade: protobuf>=3.6.1 in /usr/local/lib/python3.6/dist-packages (from tensorflow-gpu) (3.6.1)
Requirement already satisfied, skipping upgrade: wheel>=0.26 in /usr/local/lib/python3.6/dist-packages (from tensorflow-gpu) (0.32.3)
Requirement already satisfied, skipping upgrade: absl-py>=0.1.6 in /usr/local/lib/python3.6/dist-packages (from tensorflow-gpu) (0.7.0)
Requirement already satisfied, skipping upgrade: keras-preprocessing>=1.0.5 in /usr/local/lib/python3.6/dist-packages (from tensorflow-gpu) (1.0.9)
Requirement already satisfied, skipping upgrade: gast>=0.2.0 in /usr/local/lib/python3.6/dist-packages (from tensorflow-gpu) (0.2.2)
Requirement already satisfied, skipping upgrade: termcolor>=1.1.0 in /usr/local/lib/python3.6/dist-packages (from tensorflow-gpu) (1.1.0)
Requirement already satisfied, skipping upgrade: grpcio>=1.8.6 in /usr/local/lib/python3.6/dist-packages (from tensorflow-gpu) (1.18.0)
Requirement already satisfied, skipping upgrade: tensorflow-estimator<1.14.0rc0,>=1.13.0 in /usr/local/lib/python3.6/dist-packages (from tensorflow-gpu) (1.13.0)
Requirement already satisfied, skipping upgrade: six>=1.10.0 in /usr/lib/python3/dist-packages (from tensorflow-gpu) (1.11.0)
Requirement already satisfied, skipping upgrade: numpy>=1.13.3 in /usr/lib/python3/dist-packages (from tensorflow-gpu) (1.13.3)
Requirement already satisfied, skipping upgrade: astor>=0.6.0 in /usr/local/lib/python3.6/dist-packages (from tensorflow-gpu) (0.7.1)
Requirement already satisfied, skipping upgrade: tensorboard<1.14.0,>=1.13.0 in /usr/local/lib/python3.6/dist-packages (from tensorflow-gpu) (1.13.1)
Requirement already satisfied, skipping upgrade: h5py in /usr/local/lib/python3.6/dist-packages (from keras-applications>=1.0.6->tensorflow-gpu) (2.9.0)
Requirement already satisfied, skipping upgrade: setuptools in /usr/local/lib/python3.6/dist-packages (from protobuf>=3.6.1->tensorflow-gpu) (40.6.3)
Requirement already satisfied, skipping upgrade: mock>=2.0.0 in /usr/local/lib/python3.6/dist-packages (from tensorflow-estimator<1.14.0rc0,>=1.13.0->tensorflow-gpu) (2.0.0)
Requirement already satisfied, skipping upgrade: werkzeug>=0.11.15 in /usr/local/lib/python3.6/dist-packages (from tensorboard<1.14.0,>=1.13.0->tensorflow-gpu) (0.14.1)
Requirement already satisfied, skipping upgrade: markdown>=2.6.8 in /usr/local/lib/python3.6/dist-packages (from tensorboard<1.14.0,>=1.13.0->tensorflow-gpu) (3.0.1)
Requirement already satisfied, skipping upgrade: pbr>=0.11 in /usr/local/lib/python3.6/dist-packages (from mock>=2.0.0->tensorflow-estimator<1.14.0rc0,>=1.13.0->tensorflow-gpu) (5.1.1)

然而,当我试图导入tenstorflow我得到关于10.0libcublas.so.错误:

user:~$ python3
Python 3.6.7 (default, Oct 22 2018, 11:32:17) 
[GCC 8.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/pywrap_tensorflow.py", line 58, in <module>
    from tensorflow.python.pywrap_tensorflow_internal import *
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
    _pywrap_tensorflow_internal = swig_import_helper()
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
    _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
  File "/usr/lib/python3.6/imp.py", line 243, in load_module
    return load_dynamic(name, filename, file)
  File "/usr/lib/python3.6/imp.py", line 343, in load_dynamic
    return _load(spec)
ImportError: libcublas.so.10.0: cannot open shared object file: No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/__init__.py", line 24, in <module>
    from tensorflow.python import pywrap_tensorflow  # pylint: disable=unused-import
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/__init__.py", line 49, in <module>
    from tensorflow.python import pywrap_tensorflow
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/pywrap_tensorflow.py", line 74, in <module>
    raise ImportError(msg)
ImportError: Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/pywrap_tensorflow.py", line 58, in <module>
    from tensorflow.python.pywrap_tensorflow_internal import *
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
    _pywrap_tensorflow_internal = swig_import_helper()
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
    _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
  File "/usr/lib/python3.6/imp.py", line 243, in load_module
    return load_dynamic(name, filename, file)
  File "/usr/lib/python3.6/imp.py", line 343, in load_dynamic
    return _load(spec)
ImportError: libcublas.so.10.0: cannot open shared object file: No such file or directory


Failed to load the native TensorFlow runtime.

See https://www.tensorflow.org/install/errors

for some common reasons and solutions.  Include the entire stack trace
above this error message when asking for help.

我错过了什么?我该如何解决这个问题?

谢谢

共有3个答案

彭骏
2023-03-14

CUDA 10.1(按照tensorflow文档安装)无法找到libcublas。所以10.0错误。lib存在于/usr/local/cuda-10.1/targets/x86_64-linux/lib/中,但名称错误。

另一篇(丢失的)stackoverflow帖子说,这是软件包的固定依赖性问题,可以使用apt的额外cli标志进行修复。这似乎没有解决我的问题。

经过测试的解决方法是修改指令,以降级到CUDA 10.0

# Uninstall packages from tensorflow installation instructions 
sudo apt-get remove cuda-10-1 \
    libcudnn7 \
    libcudnn7-dev \
    libnvinfer6 \
    libnvinfer-dev \
    libnvinfer-plugin6

# WORKS: Downgrade to CUDA-10.0
sudo apt-get install -y --no-install-recommends \
    cuda-10-0 \
    libcudnn7=7.6.4.38-1+cuda10.0  \
    libcudnn7-dev=7.6.4.38-1+cuda10.0;
sudo apt-get install -y --no-install-recommends \
    libnvinfer6=6.0.1-1+cuda10.0 \
    libnvinfer-dev=6.0.1-1+cuda10.0 \
    libnvinfer-plugin6=6.0.1-1+cuda10.0;

升级到CUDA-10.2似乎也遇到了同样的问题

# BROKEN: Upgrade to CUDA-10.2 
# use `apt show -a libcudnn7 libnvinfer7` to find 10.2 compatable version numbers
sudo apt-get install -y --no-install-recommends \
    cuda-10-2 \
    libcudnn7=7.6.5.32-1+cuda10.2  \
    libcudnn7-dev=7.6.5.32-1+cuda10.2;
sudo apt-get install -y --no-install-recommends \
    libnvinfer7=7.0.0-1+cuda10.2 \
    libnvinfer-dev=7.0.0-1+cuda10.2 \
    libnvinfer-plugin7=7.0.0-1+cuda10.2;

在Python中测试GPU可见性

python3
>>> import tensorflow as tf
>>> tf.test.is_gpu_available()

FutureWarning关于tensorflow导入的警告

https://github.com/tensorflow/tensorflow/issues/30427

两种解决方案:

  • pip3每晚安装tf gpu
  • pip3安装“numpy”

更新:

您还需要正确的tensorflow版本才能与CUDA版本匹配

Tensorflow/CUDA版本组合:

  • Tensorflow v2。x不支持CUDA9(Ubuntu 18.4默认)

有关完整列表,请参见:https://www.tensorflow.org/install/source#tested_build_configurations

您可能需要重新安装与CUDA匹配的命名版本的tenstorflow

pip uninstall tensorflow tensorflow-gpu
pip install tensorflow==2.1.0 tensorflow-gpu==2.1.0

然后在~/. bashrc中添加CUDA到$PATH和$LD_LIBRARY_PATH

~/.巴什尔

# CUDA Environment Setup: https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#environment-setup
for CUDA_BIN_DIR in `find /usr/local/cuda-*/bin   -maxdepth 0`; do export PATH="$PATH:$CUDA_BIN_DIR"; done;
for CUDA_LIB_DIR in `find /usr/local/cuda-*/lib64 -maxdepth 0`; do export LD_LIBRARY_PATH="${LD_LIBRARY_PATH:+${LD_LIBRARY_PATH}:}$CUDA_LIB_DIR"; done;

export            PATH=`echo $PATH            | tr ':' '\n' | awk '!x[$0]++' | tr '\n' ':' | sed 's/:$//g'` # Deduplicate $PATH
export LD_LIBRARY_PATH=`echo $LD_LIBRARY_PATH | tr ':' '\n' | awk '!x[$0]++' | tr '\n' ':' | sed 's/:$//g'` # Deduplicate $LD_LIBRARY_PATH

冯星阑
2023-03-14

如果使用Cuda 10.1(按照中的指示https://www.tensorflow.org/install/gpu)问题是libcublas。所以10已从cuda-10.1目录移到cuda-10.2(!)

从这个答案中复制:https://github.com/tensorflow/tensorflow/issues/26182#issuecomment-684993950

...libcublas.so.10位于 /usr/local/cuda-10.2/lib64(来自nvidia的惊喜10.1的安装安装了一些10.2的东西),但只有 /usr/local/cuda在指向 /usr/local/cuda-10.1.的包含路径中

修复方法是将其添加到包含路径:

export LD_LIBRARY_PATH=/usr/local/cuda-10.2/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}

注意:已知此修复程序在Cuda 10.1 V10中可用。1.243(使用nvcc-V打印您的版本)。

司徒志强
2023-03-14

我从以下链接cuda 10.0下载了cuda 10.0

然后,我使用以下命令安装了它:

sudo dpkg -i cuda-repo-ubuntu1804_10.0.130-1_amd64.deb
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub
sudo apt-get update
sudo apt-get install cuda-10-0

然后,我通过链接CUDNN下载安装了CUDA 10.0的cudnn v7.5.0,您需要使用帐户登录。

在选择了正确的版本后,我通过link CUDNN power link下载,然后我添加了CUDNN的include和lib文件,如下所示:

sudo cp -P cuda/targets/ppc64le-linux/include/cudnn.h /usr/local/cuda-10.0/include/
sudo cp -P cuda/targets/ppc64le-linux/lib/libcudnn* /usr/local/cuda-10.0/lib64/
sudo chmod a+r /usr/local/cuda-10.0/lib64/libcudnn*

修改了cuda 10.0的lib和path的. bashrc后,如果没有,需要将它们添加到. bashrc中

export PATH=/usr/local/cuda-10.0/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-10.0/lib64:${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}

经过所有这些步骤,我成功地将tensorflow导入到python3中。

 类似资料: