问题：

Tensorflow GPU内存分配

曹兴贤

2023-03-14

我正在尝试使用我的GPU而不是CPU来训练一个自定义的对象检测模型。我遵循了以下教程中给出的所有说明：https://tensorflow-object-detection-api-tutorial.readthedocs.io/

我已经测试了我的软件，一切都已安装并正常工作。

目前正在使用：

Windows 10

但问题是，在训练几秒钟后，它停止使用GPU，并发出以下警告消息。


2020-12-29 15:01:15.444931: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
2020-12-29 15:01:18.923079: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2020-12-29 15:01:18.928526: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library nvcuda.dll
2020-12-29 15:01:19.830691: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: Quadro P1000 computeCapability: 6.1
coreClock: 1.5185GHz coreCount: 4 deviceMemorySize: 4.00GiB deviceMemoryBandwidth: 89.53GiB/s
2020-12-29 15:01:19.838069: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
2020-12-29 15:01:19.849650: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublas64_11.dll
2020-12-29 15:01:19.854098: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublasLt64_11.dll
2020-12-29 15:01:19.861632: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cufft64_10.dll
2020-12-29 15:01:19.867525: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library curand64_10.dll
2020-12-29 15:01:19.879754: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusolver64_10.dll
2020-12-29 15:01:19.886521: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusparse64_11.dll
2020-12-29 15:01:19.891603: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudnn64_8.dll
2020-12-29 15:01:19.895368: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0
2020-12-29 15:01:19.900144: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2020-12-29 15:01:19.910485: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: Quadro P1000 computeCapability: 6.1
coreClock: 1.5185GHz coreCount: 4 deviceMemorySize: 4.00GiB deviceMemoryBandwidth: 89.53GiB/s
2020-12-29 15:01:19.917796: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
2020-12-29 15:01:19.922273: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublas64_11.dll
2020-12-29 15:01:19.926687: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublasLt64_11.dll
2020-12-29 15:01:19.930618: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cufft64_10.dll
2020-12-29 15:01:19.934399: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library curand64_10.dll
2020-12-29 15:01:19.938808: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusolver64_10.dll
2020-12-29 15:01:19.943155: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusparse64_11.dll
2020-12-29 15:01:19.947005: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudnn64_8.dll
2020-12-29 15:01:19.950826: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0
2020-12-29 15:01:20.491701: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1261] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-12-29 15:01:20.496963: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1267]      0
2020-12-29 15:01:20.500990: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1280] 0:   N
2020-12-29 15:01:20.504027: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1406] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 2991 MB memory) -> physical GPU (device: 0, name: Quadro P1000, pci bus id: 0000:01:00.0, compute capability: 6.1)
2020-12-29 15:01:20.512219: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
INFO:tensorflow:Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0',)
I1229 15:01:20.515150  5872 mirrored_strategy.py:350] Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0',)
INFO:tensorflow:Maybe overwriting train_steps: None
I1229 15:01:20.515150  5872 config_util.py:552] Maybe overwriting train_steps: None
INFO:tensorflow:Maybe overwriting use_bfloat16: False
I1229 15:01:20.515150  5872 config_util.py:552] Maybe overwriting use_bfloat16: False
WARNING:tensorflow:From C:\Users\USER-\Anaconda3\lib\site-packages\object_detection\model_lib_v2.py:523: StrategyBase.experimental_distribute_datasets_from_function (from tensorflow.python.distribute.distribute_lib) is deprecated and will be removed in a future version.
Instructions for updating:
rename to distribute_datasets_from_function
W1229 15:01:20.530780  5872 deprecation.py:339] From C:\Users\USER-\Anaconda3\lib\site-packages\object_detection\model_lib_v2.py:523: StrategyBase.experimental_distribute_datasets_from_function (from tensorflow.python.distribute.distribute_lib) is deprecated and will be removed in a future version.
Instructions for updating:
rename to distribute_datasets_from_function
INFO:tensorflow:Reading unweighted datasets: ['annotations/train.record']
I1229 15:01:20.546404  5872 dataset_builder.py:148] Reading unweighted datasets: ['annotations/train.record']
INFO:tensorflow:Reading record datasets for input file: ['annotations/train.record']
I1229 15:01:20.546404  5872 dataset_builder.py:77] Reading record datasets for input file: ['annotations/train.record']
INFO:tensorflow:Number of filenames to read: 1
I1229 15:01:20.546404  5872 dataset_builder.py:78] Number of filenames to read: 1
WARNING:tensorflow:num_readers has been reduced to 1 to match input file shards.
W1229 15:01:20.546404  5872 dataset_builder.py:86] num_readers has been reduced to 1 to match input file shards.
WARNING:tensorflow:From C:\Users\USER-\Anaconda3\lib\site-packages\object_detection\builders\dataset_builder.py:103: parallel_interleave (from tensorflow.python.data.experimental.ops.interleave_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.interleave(map_func, cycle_length, block_length, num_parallel_calls=tf.data.AUTOTUNE)` instead. If sloppy execution is desired, use `tf.data.Options.experimental_deterministic`.
W1229 15:01:20.546404  5872 deprecation.py:339] From C:\Users\USER-\Anaconda3\lib\site-packages\object_detection\builders\dataset_builder.py:103: parallel_interleave (from tensorflow.python.data.experimental.ops.interleave_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.interleave(map_func, cycle_length, block_length, num_parallel_calls=tf.data.AUTOTUNE)` instead. If sloppy execution is desired, use `tf.data.Options.experimental_deterministic`.
WARNING:tensorflow:From C:\Users\USER-\Anaconda3\lib\site-packages\object_detection\builders\dataset_builder.py:222: DatasetV1.map_with_legacy_function (from tensorflow.python.data.ops.dataset_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.map()
W1229 15:01:20.562029  5872 deprecation.py:339] From C:\Users\USER-\Anaconda3\lib\site-packages\object_detection\builders\dataset_builder.py:222: DatasetV1.map_with_legacy_function (from tensorflow.python.data.ops.dataset_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.map()
WARNING:tensorflow:From C:\Users\USER-\Anaconda3\lib\site-packages\tensorflow\python\util\dispatch.py:201: sparse_to_dense (from tensorflow.python.ops.sparse_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Create a `tf.sparse.SparseTensor` and use `tf.sparse.to_dense` instead.
W1229 15:01:25.685788  5872 deprecation.py:339] From C:\Users\USER-\Anaconda3\lib\site-packages\tensorflow\python\util\dispatch.py:201: sparse_to_dense (from tensorflow.python.ops.sparse_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Create a `tf.sparse.SparseTensor` and use `tf.sparse.to_dense` instead.
WARNING:tensorflow:From C:\Users\USER-\Anaconda3\lib\site-packages\tensorflow\python\util\dispatch.py:201: sample_distorted_bounding_box (from tensorflow.python.ops.image_ops_impl) is deprecated and will be removed in a future version.
Instructions for updating:
`seed2` arg is deprecated.Use sample_distorted_bounding_box_v2 instead.
W1229 15:01:27.908942  5872 deprecation.py:339] From C:\Users\USER-\Anaconda3\lib\site-packages\tensorflow\python\util\dispatch.py:201: sample_distorted_bounding_box (from tensorflow.python.ops.image_ops_impl) is deprecated and will be removed in a future version.
Instructions for updating:
`seed2` arg is deprecated.Use sample_distorted_bounding_box_v2 instead.
WARNING:tensorflow:From C:\Users\USER-\Anaconda3\lib\site-packages\object_detection\inputs.py:281: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.cast` instead.
W1229 15:01:29.229117  5872 deprecation.py:339] From C:\Users\USER-\Anaconda3\lib\site-packages\object_detection\inputs.py:281: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.cast` instead.
2020-12-29 15:01:31.781125: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
C:\Users\USER-\Anaconda3\lib\site-packages\tensorflow\python\keras\backend.py:434: UserWarning: `tf.keras.backend.set_learning_phase` is deprecated and will be removed after 2020-10-11. To update it, simply pass a True/False value to the `training` argument of the `__call__` method of your layer or model.
  warnings.warn('`tf.keras.backend.set_learning_phase` is deprecated and '
2020-12-29 15:01:48.972736: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublas64_11.dll
2020-12-29 15:01:49.258182: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublasLt64_11.dll
2020-12-29 15:01:49.287771: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudnn64_8.dll
2020-12-29 15:01:49.822205: I tensorflow/core/platform/windows/subprocess.cc:308] SubProcess ended with return code: 0

2020-12-29 15:01:49.866004: I tensorflow/core/platform/windows/subprocess.cc:308] SubProcess ended with return code: 0

WARNING:tensorflow:Unresolved object in checkpoint: (root).model._groundtruth_lists
W1229 15:01:52.823682  5872 util.py:161] Unresolved object in checkpoint: (root).model._groundtruth_lists
WARNING:tensorflow:Unresolved object in checkpoint: (root).model._box_predictor
W1229 15:01:52.823682  5872 util.py:161] Unresolved object in checkpoint: (root).model._box_predictor
WARNING:tensorflow:Unresolved object in checkpoint: (root).model._batched_prediction_tensor_names
W1229 15:01:52.823682  5872 util.py:161] Unresolved object in checkpoint: (root).model._batched_prediction_tensor_names
WARNING:tensorflow:Unresolved object in checkpoint: (root).model._box_predictor._box_prediction_head
W1229 15:01:52.823682  5872 util.py:161] Unresolved object in checkpoint: (root).model._box_predictor._box_prediction_head
WARNING:tensorflow:Unresolved object in checkpoint: (root).model._box_predictor._prediction_heads
W1229 15:01:52.823682  5872 util.py:161] Unresolved object in checkpoint: (root).model._box_predictor._prediction_heads
WARNING:tensorflow:Unresolved object in checkpoint: (root).model._box_predictor._sorted_head_names
W1229 15:01:52.823682  5872 util.py:161] Unresolved object in checkpoint: (root).model._box_predictor._sorted_head_names
WARNING:tensorflow:Unresolved object in checkpoint: (root).model._box_predictor._additional_projection_layers
W1229 15:01:52.823682  5872 util.py:161] Unresolved object in checkpoint: (root).model._box_predictor._additional_projection_layers
WARNING:tensorflow:Unresolved object in checkpoint: (root).model._box_predictor._base_tower_layers_for_heads
W1229 15:01:52.839355  5872 util.py:161] Unresolved object in checkpoint: (root).model._box_predictor._base_tower_layers_for_heads
WARNING:tensorflow:Unresolved object in checkpoint: (root).model._box_predictor._head_scope_conv_layers
W1229 15:01:52.839355  5872 util.py:161] Unresolved object in checkpoint: (root).model._box_predictor._head_scope_conv_layers
WARNING:tensorflow:Unresolved object in checkpoint: (root).model._box_predictor._box_prediction_head._box_encoder_layers
W1229 15:01:52.839355  5872 util.py:161] Unresolved object in checkpoint: (root).model._box_predictor._box_prediction_head._box_encoder_layers
WARNING:tensorflow:Unresolved object in checkpoint: (root).model._box_predictor._prediction_heads.class_predictions_with_background
W1229 15:01:52.839355  5872 util.py:161] Unresolved object in checkpoint: (root).model._box_predictor._prediction_heads.class_predictions_with_background
WARNING:tensorflow:Unresolved object in checkpoint: (root).model._box_predictor._additional_projection_layers.0
W1229 15:01:52.839355  5872 util.py:161] Unresolved object in checkpoint: (root).model._box_predictor._additional_projection_layers.0
WARNING:tensorflow:Unresolved object in checkpoint: (root).model._box_predictor._additional_projection_layers.1
W1229 15:01:52.839355  5872 util.py:161] Unresolved object in checkpoint: (root).model._box_predictor._additional_projection_layers.1
WARNING:tensorflow:Unresolved object in checkpoint: (root).model._box_predictor._additional_projection_layers.2
W1229 15:01:52.839355  5872 util.py:161] Unresolved object in checkpoint: (root).model._box_predictor._additional_projection_layers.2
WARNING:tensorflow:Unresolved object in checkpoint: (root).model._box_predictor._additional_projection_layers.3
W1229 15:01:52.839355  5872 util.py:161] Unresolved object in checkpoint: (root).model._box_predictor._additional_projection_layers.3
WARNING:tensorflow:Unresolved object in checkpoint: (root).model._box_predictor._additional_projection_layers.4
W1229 15:01:52.839355  5872 util.py:161] Unresolved object in checkpoint: (root).model._box_predictor._additional_projection_layers.4
W1229 15:01:53.076874  5872 util.py:161] Unresolved object in checkpoint: (root).model._box_predictor._base_tower_layers_for_heads.class_predictions_with_background.4.7.moving_variance
WARNING:tensorflow:Unresolved object in checkpoint: (root).model._box_predictor._base_tower_layers_for_heads.class_predictions_with_background.4.10.axis
W1229 15:01:53.076874  5872 util.py:161] Unresolved object in checkpoint: (root).model._box_predictor._base_tower_layers_for_heads.class_predictions_with_background.4.10.axis
WARNING:tensorflow:Unresolved object in checkpoint: (root).model._box_predictor._base_tower_layers_for_heads.class_predictions_with_background.4.10.gamma
W1229 15:01:53.076874  5872 util.py:161] Unresolved object in checkpoint: (root).model._box_predictor._base_tower_layers_for_heads.class_predictions_with_background.4.10.gamma
WARNING:tensorflow:Unresolved object in checkpoint: (root).model._box_predictor._base_tower_layers_for_heads.class_predictions_with_background.4.10.beta
W1229 15:01:53.076874  5872 util.py:161] Unresolved object in checkpoint: (root).model._box_predictor._base_tower_layers_for_heads.class_predictions_with_background.4.10.beta
WARNING:tensorflow:Unresolved object in checkpoint: (root).model._box_predictor._base_tower_layers_for_heads.class_predictions_with_background.4.10.moving_mean
W1229 15:01:53.076874  5872 util.py:161] Unresolved object in checkpoint: (root).model._box_predictor._base_tower_layers_for_heads.class_predictions_with_background.4.10.moving_mean
WARNING:tensorflow:Unresolved object in checkpoint: (root).model._box_predictor._base_tower_layers_for_heads.class_predictions_with_background.4.10.moving_variance
W1229 15:01:53.076874  5872 util.py:161] Unresolved object in checkpoint: (root).model._box_predictor._base_tower_layers_for_heads.class_predictions_with_background.4.10.moving_variance
WARNING:tensorflow:A checkpoint was restored (e.g. tf.train.Checkpoint.restore or tf.keras.Model.load_weights) but not all checkpointed values were used. See above for specific issues. Use expect_partial() on the load status object, e.g. tf.train.Checkpoint.restore(...).expect_partial(), to silence these warnings, or use assert_consumed() to make the check explicit. See https://www.tensorflow.org/guide/checkpoint#loading_mechanics for details.
W1229 15:01:53.076874  5872 util.py:169] A checkpoint was restored (e.g. tf.train.Checkpoint.restore or tf.keras.Model.load_weights) but not all checkpointed values were used. See above for specific issues. Use expect_partial() on the load status object, e.g. tf.train.Checkpoint.restore(...).expect_partial(), to silence these warnings, or use assert_consumed() to make the check explicit. See https://www.tensorflow.org/guide/checkpoint#loading_mechanics for details.
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
I1229 15:01:53.468799  5872 cross_device_ops.py:565] Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
I1229 15:01:53.468799  5872 cross_device_ops.py:565] Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
I1229 15:01:53.468799  5872 cross_device_ops.py:565] Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
I1229 15:01:53.468799  5872 cross_device_ops.py:565] Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
I1229 15:01:53.468799  5872 cross_device_ops.py:565] Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
I1229 15:01:53.468799  5872 cross_device_ops.py:565] Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
I1229 15:01:53.468799  5872 cross_device_ops.py:565] Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
I1229 15:01:53.484427  5872 cross_device_ops.py:565] Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
I1229 15:01:53.484427  5872 cross_device_ops.py:565] Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
I1229 15:01:53.484427  5872 cross_device_ops.py:565] Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
WARNING:tensorflow:From C:\Users\USER-\Anaconda3\lib\site-packages\tensorflow\python\util\deprecation.py:605: calling map_fn_v2 (from tensorflow.python.ops.map_fn) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Use fn_output_signature instead
W1229 15:01:59.423827 15152 deprecation.py:537] From C:\Users\USER-\Anaconda3\lib\site-packages\tensorflow\python\util\deprecation.py:605: calling map_fn_v2 (from tensorflow.python.ops.map_fn) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Use fn_output_signature instead
2020-12-29 15:02:11.320699: W tensorflow/core/common_runtime/bfc_allocator.cc:248] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.73GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2020-12-29 15:02:11.351326: W tensorflow/core/common_runtime/bfc_allocator.cc:248] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.74GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2020-12-29 15:02:11.751709: W tensorflow/core/common_runtime/bfc_allocator.cc:248] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.08GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2020-12-29 15:02:11.784850: W tensorflow/core/common_runtime/bfc_allocator.cc:248] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.09GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2020-12-29 15:02:12.607912: W tensorflow/core/common_runtime/bfc_allocator.cc:248] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.14GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2020-12-29 15:02:12.644507: W tensorflow/core/common_runtime/bfc_allocator.cc:248] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.15GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2020-12-29 15:02:13.057969: W tensorflow/core/common_runtime/bfc_allocator.cc:248] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.05GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2020-12-29 15:02:13.092341: W tensorflow/core/common_runtime/bfc_allocator.cc:248] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.06GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2020-12-29 15:02:13.299573: W tensorflow/core/common_runtime/bfc_allocator.cc:248] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.04GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2020-12-29 15:02:13.331704: W tensorflow/core/common_runtime/bfc_allocator.cc:248] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.06GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.

此外，我没有在我的设备上运行任何其他程序，所以内存不足似乎有点奇怪。

诸葛砚

2023-03-14

我看到这个错误消息有四个不同的原因，有不同的解决方案：

也许您的GPU内存已满，当TensorFlow进行初始化时，您的计算图形最终使用了物理设备的所有内存，那么这个问题就出现了。解决方案是在GPU选项中使用allowgrowth=True。如果为GPU启用了内存增长，则运行时初始化将不会分配设备上的所有内存。导入后使用下面的代码段可能会解决您的问题。

import tensorflow as tf
physical_devices = tf.config.experimental.list_physical_devices('GPU')
if len(physical_devices) > 0:
    tf.config.experimental.set_memory_growth(physical_devices[0], True)

我经常通过关闭python进程，删除~/，来解决这个错误。nv目录（在linux上，rm-rf~/.nv），并重新启动Python进程。我不知道为什么会这样。这可能至少部分与第二种选择有关：

Keras包含在上面的TensorFlow 2.0中。所以呢

移除导入KERA并更换KERA中的。单元模块导入类语句--

例如，从keras更换。图层从tensorflow导入Conv3D、ConvLSTM2D、Conv3DTranspose和输入。克拉斯。图层导入Conv3D、ConvLSTM2D、Conv3DTranspose、输入

如果您从未使用过类似的型号，您的VRAM没有用完，您的导入如第3步所述正确，并且您的缓存干净，我将返回并使用可用的最佳安装指南设置CUDA TensorFlow-我在以下说明中获得了最大的成功：https://www.tensorflow.org/install/gpu而不是那些在英伟达/CUDA网站。Lambda堆栈：https://lambdalabs.com/lambda-stack-deep-learning-software这也是一个好办法。

Tensorflow GPU内存分配

共有1个答案

相关问答

相关文章

相关阅读

相关工具

相关文档