Question:

tensorflow and torch.cuda can find the GPU, but Keras alone cannot

王声
2023-03-14

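I tried the answers to "How do I check if Keras is using the GPU version of TensorFlow?", but all I could confirm is that Keras does not see the GPU.

I reinstalled all the requirements, including tensorflow-gpu, the keras module, and even CUDA.

I am using a remote IPython kernel through Jupyter.

The list below shows the versions of the modules I installed: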

...
keras                     2.2.4
keras-applications        1.0.8
keras-preprocessing       1.1.0
...
tensorflow-gpu            1.14.0
...
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130

I checked the following:

from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())
print()

from keras import backend
print(backend.tensorflow_backend._get_available_gpus())
print()

from torch import cuda
print(cuda.is_available())
print(cuda.device_count())
print(cuda.get_device_name(cuda.current_device()))
print()

Result:

device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 15355337614284368930
, name: "/device:XLA_CPU:0"
device_type: "XLA_CPU"
memory_limit: 17179869184
locality {
}
incarnation: 5758691101165968939
physical_device_desc: "device: XLA_CPU device"
, name: "/device:XLA_GPU:0"
device_type: "XLA_GPU"
memory_limit: 17179869184
locality {
}
incarnation: 17050701241022830982
physical_device_desc: "device: XLA_GPU device"
, name: "/device:XLA_GPU:1"
device_type: "XLA_GPU"
memory_limit: 17179869184
locality {
}
incarnation: 15949544090620437264
physical_device_desc: "device: XLA_GPU device"
]

[]

True
2
GeForce GTX 1080 Ti
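
The result above lists two XLA_GPU devices but no plain /device:GPU:N entry, which is consistent with the Keras call returning an empty list. Below is a minimal sketch of telling the two kinds apart by name. split_gpu_devices is a hypothetical helper, not a TensorFlow or Keras function, and it assumes device names follow the /device:TYPE:N pattern seen in the output.

```python
def split_gpu_devices(device_names):
    """Separate plain GPU devices from XLA_GPU devices by name."""
    plain, xla = [], []
    for name in device_names:
        lowered = name.lower()
        if "device:gpu:" in lowered:
            plain.append(name)
        elif "device:xla_gpu:" in lowered:
            xla.append(name)
    return plain, xla


# Device names that appear in the output above.
names = [
    "/device:XLA_CPU:0",
    "/device:XLA_GPU:0",
    "/device:XLA_GPU:1",
]
plain_gpus, xla_gpus = split_gpu_devices(names)
print("plain GPU devices:", plain_gpus)
print("XLA GPU devices:", xla_gpus)
```

An empty first list alongside a non-empty second list suggests that the driver sees the cards but TensorFlow did not register them as regular GPU devices.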

============= Added ==========

In addition, I read "How to tell if TensorFlow is using GPU acceleration from inside the Python shell?" and ran its check in a terminal. I tried:

import tensorflow as tf

with tf.device('/gpu:0'):
    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
    b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
    c = tf.matmul(a, b)

with tf.Session() as sess:
    print(sess.run(c))

Result:

2019-08-08 16:16:57.060679: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcuda.so.1
2019-08-08 16:16:57.075040: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties: 
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.6705
pciBusID: 0000:05:00.0
2019-08-08 16:16:57.076003: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 1 with properties: 
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.6705
pciBusID: 0000:0a:00.0
2019-08-08 16:16:57.076256: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2019-08-08 16:16:57.078074: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
2019-08-08 16:16:57.080007: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcufft.so.10.0
2019-08-08 16:16:57.080436: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcurand.so.10.0
2019-08-08 16:16:57.083506: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusolver.so.10.0
2019-08-08 16:16:57.085629: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusparse.so.10.0
2019-08-08 16:16:57.086483: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcudnn.so.7'; dlerror: libcudnn.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda/lib64:/tink/dlgks224/conda/lib:
2019-08-08 16:16:57.086537: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1663] Cannot dlopen some GPU libraries. Skipping registering GPU devices...
2019-08-08 16:16:57.087195: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-08-08 16:16:57.117070: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2198685000 Hz
2019-08-08 16:16:57.119097: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55eab648cdc0 executing computations on platform Host. Devices:
2019-08-08 16:16:57.119231: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): <undefined>, <undefined>
2019-08-08 16:16:57.119383: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-08-08 16:16:57.119397: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187]      
2019-08-08 16:16:57.483390: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55eab653adf0 executing computations on platform CUDA. Devices:
2019-08-08 16:16:57.483443: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): GeForce GTX 1080 Ti, Compute Capability 6.1
2019-08-08 16:16:57.483454: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (1): GeForce GTX 1080 Ti, Compute Capability 6.1
Traceback (most recent call last):
  File "/tink/dlgks224/conda/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1356, in _do_call
    return fn(*args)
  File "/tink/dlgks224/conda/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1339, in _run_fn
    self._extend_graph()
  File "/tink/dlgks224/conda/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1374, in _extend_graph
    tf_session.ExtendSession(self._session)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot assign a device for operation MatMul: {{node MatMul}}was explicitly assigned to /device:GPU:0 but available devices are [ /job:localhost/replica:0/task:0/device:CPU:0, /job:localhost/replica:0/task:0/device:XLA_CPU:0, /job:localhost/replica:0/task:0/device:XLA_GPU:0, /job:localhost/replica:0/task:0/device:XLA_GPU:1 ]. Make sure the device specification refers to a valid device.
     [[MatMul]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/tink/dlgks224/conda/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 950, in run
    run_metadata_ptr)
  File "/tink/dlgks224/conda/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1173, in _run
    feed_dict_tensor, options, run_metadata)
  File "/tink/dlgks224/conda/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1350, in _do_run
    run_metadata)
  File "/tink/dlgks224/conda/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1370, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot assign a device for operation MatMul: node MatMul (defined at <stdin>:4) was explicitly assigned to /device:GPU:0 but available devices are [ /job:localhost/replica:0/task:0/device:CPU:0, /job:localhost/replica:0/task:0/device:XLA_CPU:0, /job:localhost/replica:0/task:0/device:XLA_GPU:0, /job:localhost/replica:0/task:0/device:XLA_GPU:1 ]. Make sure the device specification refers to a valid device.
     [[MatMul]]

Errors may have originated from an input operation.
Input Source operations connected to node MatMul:
 a (defined at <stdin>:2)   
 b (defined at <stdin>:3)
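
The log above is long, and the line that matters ("Could not dlopen library ...") is easy to miss among the "Successfully opened" lines. Here is a rough sketch of scanning such a log for load failures. missing_libraries is a hypothetical helper, and the pattern assumes the message wording shown in this particular log, which may differ in other TensorFlow versions.

```python
import re

# Matches the dso_loader failure message as it is worded in the log above.
DLOPEN_FAILURE = re.compile(r"Could not dlopen library '([^']+)'")


def missing_libraries(log_text):
    """Return library names that the log reports as failing to load."""
    return DLOPEN_FAILURE.findall(log_text)


# Two lines copied from the log above: one success, one failure.
sample = (
    "2019-08-08 16:16:57.085629: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] "
    "Successfully opened dynamic library libcusparse.so.10.0\n"
    "2019-08-08 16:16:57.086483: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] "
    "Could not dlopen library 'libcudnn.so.7'; dlerror: libcudnn.so.7: "
    "cannot open shared object file: No such file or directory\n"
)
print(missing_libraries(sample))  # prints ['libcudnn.so.7']
```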

1 answer

单于旭东
2023-03-14

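Solved!

It turned out to be a surprisingly silly problem.

The error message had been telling me what was wrong all along.

I checked libcudnn.so.7 again, and it was installed in the wrong place.

When you run into a similar error, verify this!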

2019-08-08 16:16:57.086483: I tensorflow/stream_executor/platform/default/dso_loader.cc:53]
Could not dlopen library 'libcudnn.so.7'; dlerror: libcudnn.so.7:
cannot open shared object file: No such file or directory;
LD_LIBRARY_PATH: usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda/lib64:/tink/dlgks224/conda/lib:
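
One way to verify this is to look for the file in each directory of the search path printed in the message. The sketch below does that with a hypothetical helper, locate_library. It only inspects the path string it is given, while the dynamic loader also consults other locations (for example the ldconfig cache), so an empty result is a hint and not proof that the library is absent. It also reports entries that are not absolute paths, since the first entry in the logged LD_LIBRARY_PATH appears without a leading slash.

```python
import os


def locate_library(lib_name, search_path):
    """Check each entry of a colon-separated search path for lib_name.

    Returns (found_in, relative_entries): the entries whose directory
    contains the file, and the entries that are not absolute paths.
    """
    found_in, relative_entries = [], []
    for entry in search_path.split(":"):
        if not entry:
            continue  # this sketch ignores empty entries such as a trailing colon
        if not os.path.isabs(entry):
            relative_entries.append(entry)
        if os.path.isfile(os.path.join(entry, lib_name)):
            found_in.append(entry)
    return found_in, relative_entries


# Inspect the current environment; the library name is the one from the log.
path = os.environ.get("LD_LIBRARY_PATH", "")
found, relative = locate_library("libcudnn.so.7", path)
print("found in:", found)
print("relative entries:", relative)
```

If the file turns up in a directory that is not on the search path, moving it or adding that directory to LD_LIBRARY_PATH before starting Python are the usual options.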