当前位置: 首页 > 知识库问答 >
问题:

基于GPU的Tensorflow

戚祺
2023-03-14

我在Tensorflow中的LSTM-RNN上训练一些音乐数据,遇到了GPU内存分配的一些问题,我不明白:我遇到了OOM,而实际上似乎还有足够的VRAM可用。一些背景:我正在使用GTX1060 6GB、英特尔至强E3-1231V3和8GB内存开发Ubuntu Gnome 16.04。现在,首先是我能理解的错误消息的一部分,在中,我将在最后再次添加整个错误消息,以供任何可能要求帮助的人使用:

I tensorflow/core/common_runtime/bfc_分配器。cc:696]8个大小为256的区块,总计2.0kb I tensorflow/core/common_runtime/bfc_分配器。cc:696]1个大小为1280的区块,总计1.2kb I tensorflow/core/common_runtime/bfc_分配器。cc:696]5个大小为44288的区块,总计216.2kb I tensorflow/core/common_runtime/bfc_分配器。cc:696]5个区块大小56064,总计273.8KiB I tensorflow/core/common_runtime/bfc_分配器。cc:696]4块大小154350080,总计588.80MiB I tensorflow/core/common_runtime/bfc_分配器。cc:696]3块大小813400064,总计2.27GiB I tensorflow/core/common_runtime/bfc_分配器。cc:696]1块大小1612612352,总计1.50GiB Itensorflow/core/common_runtime/bfc_分配器。cc:700]正在使用的区块总数:4.35GiB I tensorflow/core/common_runtime/bfc_分配器。cc:702]统计:

限额:5484118016

因纽斯:4670717952

MaxInuse: 5484118016

努马洛斯:29

MaxAllocSize:1612612352

W tensorflow/core/common_runtime/bfc_allocator.cc:274]************************************************************************************uuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuu*********************************************************************************************************************************************************************************************************************************************************************************************************************。有关内存状态,请参阅日志。W tensorflow/core/framework/op_kernel.cc:993]资源耗尽:在使用形状[1452514000]分配tensor时OOM

因此,我可以看出,最多需要分配5484118016个字节,4670717952个字节已准备就绪,另外777.72MB=775720000个字节需要分配。根据我的计算器,5484118016字节-4670717952字节-775720000字节=37680064字节。因此,在为他想要推入的新张量分配空间后,仍然应该有37MB的空闲VRAM。这对我来说似乎也是相当合法的,因为Tensorflow可能(我猜?)不会尝试分配比仍然可用的VRAM更多的VRAM,而只是将其余的数据保留在RAM或其他东西中。

现在我想我的想法中有一些很大的错误,但是如果有人能给我解释一下,这个错误是什么,我将非常感激。我的问题的明显解决策略是让我的批次小一点,每个批次在1.5GB左右可能太大了。我仍然很想知道那里的实际问题是什么。

编辑:我发现了一些告诉我要尝试的东西:

config = tf.ConfigProto()
config.gpu_options.allocator_type = 'BFC'
with tf.Session(config = config) as s:

这仍然不起作用,但由于tensorflow文档缺少对什么的解释

 gpu_options.allocator_type = 'BFC'

我很想问问你们。

为感兴趣的人添加其余错误消息:

抱歉长时间复制/粘贴,但也许有人需要/想看它,

非常感谢你,里昂

(gputensorflow) leon@ljksUbuntu:~/Tensorflow$ python Netzwerk_v0.5.1_gamma.py 
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcurand.so.8.0 locally
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE3 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:910] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties: 
name: GeForce GTX 1060 6GB
major: 6 minor: 1 memoryClockRate (GHz) 1.7335
pciBusID 0000:01:00.0
Total memory: 5.93GiB
Free memory: 5.40GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0:   Y 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1060 6GB, pci bus id: 0000:01:00.0)
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (256):   Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (512):   Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (1024):  Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (2048):  Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (4096):  Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (8192):  Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (16384):     Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (32768):     Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (65536):     Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (131072):    Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (262144):    Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (524288):    Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (1048576):   Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (2097152):   Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (4194304):   Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (8388608):   Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (16777216):  Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (33554432):  Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (67108864):  Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (134217728):     Total Chunks: 1, Chunks in use: 0 147.20MiB allocated for chunks. 147.20MiB client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (268435456):     Total Chunks: 1, Chunks in use: 0 628.52MiB allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:660] Bin for 775.72MiB was 256.00MiB, Chunk State: 
I tensorflow/core/common_runtime/bfc_allocator.cc:666]   Size: 628.52MiB | Requested Size: 0B | in_use: 0, prev:   Size: 147.20MiB | Requested Size: 147.20MiB | in_use: 1, next:   Size: 54.8KiB | Requested Size: 54.7KiB | in_use: 1
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x10208000000 of size 1280
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x10208000500 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x10208000600 of size 56064
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1020800e100 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1020800e200 of size 44288
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x10208018f00 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x10208019000 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x10208019100 of size 813400064
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x102387d1100 of size 56064
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x102387dec00 of size 154350080
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x10241b11e00 of size 44288
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x10241b1cb00 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x10241b1cc00 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x10241b1cd00 of size 154350080
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x102722d4d00 of size 56064
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1027b615a00 of size 44288
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1027b620700 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1027b620800 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1027b620900 of size 813400064
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x102abdd8900 of size 813400064
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x102dc590900 of size 56064
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x102dc59e400 of size 56064
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x102dc5abf00 of size 154350080
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x102e58df100 of size 154350080
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x102eec12300 of size 44288
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x102eec1d000 of size 44288
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x102eec27d00 of size 1612612352
I tensorflow/core/common_runtime/bfc_allocator.cc:687] Free at 0x1024ae4ff00 of size 659049984
I tensorflow/core/common_runtime/bfc_allocator.cc:687] Free at 0x102722e2800 of size 154350080
I tensorflow/core/common_runtime/bfc_allocator.cc:693]      Summary of in-use Chunks by size: 
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 8 Chunks of size 256 totalling 2.0KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 1280 totalling 1.2KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 5 Chunks of size 44288 totalling 216.2KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 5 Chunks of size 56064 totalling 273.8KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 4 Chunks of size 154350080 totalling 588.80MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 3 Chunks of size 813400064 totalling 2.27GiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 1612612352 totalling 1.50GiB
I tensorflow/core/common_runtime/bfc_allocator.cc:700] Sum Total of in-use chunks: 4.35GiB
I tensorflow/core/common_runtime/bfc_allocator.cc:702] Stats: 
Limit:                  5484118016
InUse:                  4670717952
MaxInUse:               5484118016
NumAllocs:                      29
MaxAllocSize:           1612612352

W tensorflow/core/common_runtime/bfc_allocator.cc:274] *********************___________*__***************************************************xxxxxxxxxxxxxx
W tensorflow/core/common_runtime/bfc_allocator.cc:275] Ran out of memory trying to allocate 775.72MiB.  See logs for memory state.
W tensorflow/core/framework/op_kernel.cc:993] Resource exhausted: OOM when allocating tensor with shape[14525,14000]
Traceback (most recent call last):
  File "/home/leon/anaconda3/envs/gputensorflow/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1022, in _do_call
    return fn(*args)
  File "/home/leon/anaconda3/envs/gputensorflow/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1004, in _run_fn
    status, run_metadata)
  File "/home/leon/anaconda3/envs/gputensorflow/lib/python3.5/contextlib.py", line 66, in __exit__
    next(self.gen)
  File "/home/leon/anaconda3/envs/gputensorflow/lib/python3.5/site-packages/tensorflow/python/framework/errors_impl.py", line 469, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[14525,14000]
     [[Node: rnn/basic_lstm_cell/weights/Initializer/random_uniform = Add[T=DT_FLOAT, _class=["loc:@rnn/basic_lstm_cell/weights"], _device="/job:localhost/replica:0/task:0/gpu:0"](rnn/basic_lstm_cell/weights/Initializer/random_uniform/mul, rnn/basic_lstm_cell/weights/Initializer/random_uniform/min)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "Netzwerk_v0.5.1_gamma.py", line 171, in <module>
    session.run(tf.global_variables_initializer())
  File "/home/leon/anaconda3/envs/gputensorflow/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 767, in run
    run_metadata_ptr)
  File "/home/leon/anaconda3/envs/gputensorflow/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 965, in _run
    feed_dict_string, options, run_metadata)
  File "/home/leon/anaconda3/envs/gputensorflow/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1015, in _do_run
    target_list, options, run_metadata)
  File "/home/leon/anaconda3/envs/gputensorflow/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1035, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[14525,14000]
     [[Node: rnn/basic_lstm_cell/weights/Initializer/random_uniform = Add[T=DT_FLOAT, _class=["loc:@rnn/basic_lstm_cell/weights"], _device="/job:localhost/replica:0/task:0/gpu:0"](rnn/basic_lstm_cell/weights/Initializer/random_uniform/mul, rnn/basic_lstm_cell/weights/Initializer/random_uniform/min)]]

Caused by op 'rnn/basic_lstm_cell/weights/Initializer/random_uniform', defined at:
  File "Netzwerk_v0.5.1_gamma.py", line 94, in <module>
    initial_state=initial_state, time_major=False)       # time_major = FALSE currently
  File "/home/leon/anaconda3/envs/gputensorflow/lib/python3.5/site-packages/tensorflow/python/ops/rnn.py", line 545, in dynamic_rnn
    dtype=dtype)
  File "/home/leon/anaconda3/envs/gputensorflow/lib/python3.5/site-packages/tensorflow/python/ops/rnn.py", line 712, in _dynamic_rnn_loop
    swap_memory=swap_memory)
  File "/home/leon/anaconda3/envs/gputensorflow/lib/python3.5/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2626, in while_loop
    result = context.BuildLoop(cond, body, loop_vars, shape_invariants)
  File "/home/leon/anaconda3/envs/gputensorflow/lib/python3.5/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2459, in BuildLoop
    pred, body, original_loop_vars, loop_vars, shape_invariants)
  File "/home/leon/anaconda3/envs/gputensorflow/lib/python3.5/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2409, in _BuildLoop
    body_result = body(*packed_vars_for_body)
  File "/home/leon/anaconda3/envs/gputensorflow/lib/python3.5/site-packages/tensorflow/python/ops/rnn.py", line 697, in _time_step
    (output, new_state) = call_cell()
  File "/home/leon/anaconda3/envs/gputensorflow/lib/python3.5/site-packages/tensorflow/python/ops/rnn.py", line 683, in <lambda>
    call_cell = lambda: cell(input_t, state)
  File "/home/leon/anaconda3/envs/gputensorflow/lib/python3.5/site-packages/tensorflow/contrib/rnn/python/ops/core_rnn_cell_impl.py", line 179, in __call__
    concat = _linear([inputs, h], 4 * self._num_units, True, scope=scope)
  File "/home/leon/anaconda3/envs/gputensorflow/lib/python3.5/site-packages/tensorflow/contrib/rnn/python/ops/core_rnn_cell_impl.py", line 747, in _linear
    "weights", [total_arg_size, output_size], dtype=dtype)
  File "/home/leon/anaconda3/envs/gputensorflow/lib/python3.5/site-packages/tensorflow/python/ops/variable_scope.py", line 988, in get_variable
    custom_getter=custom_getter)
  File "/home/leon/anaconda3/envs/gputensorflow/lib/python3.5/site-packages/tensorflow/python/ops/variable_scope.py", line 890, in get_variable
    custom_getter=custom_getter)
  File "/home/leon/anaconda3/envs/gputensorflow/lib/python3.5/site-packages/tensorflow/python/ops/variable_scope.py", line 348, in get_variable
    validate_shape=validate_shape)
  File "/home/leon/anaconda3/envs/gputensorflow/lib/python3.5/site-packages/tensorflow/python/ops/variable_scope.py", line 333, in _true_getter
    caching_device=caching_device, validate_shape=validate_shape)
  File "/home/leon/anaconda3/envs/gputensorflow/lib/python3.5/site-packages/tensorflow/python/ops/variable_scope.py", line 684, in _get_single_variable
    validate_shape=validate_shape)
  File "/home/leon/anaconda3/envs/gputensorflow/lib/python3.5/site-packages/tensorflow/python/ops/variables.py", line 226, in __init__
    expected_shape=expected_shape)
  File "/home/leon/anaconda3/envs/gputensorflow/lib/python3.5/site-packages/tensorflow/python/ops/variables.py", line 303, in _init_from_args
    initial_value(), name="initial_value", dtype=dtype)
  File "/home/leon/anaconda3/envs/gputensorflow/lib/python3.5/site-packages/tensorflow/python/ops/variable_scope.py", line 673, in <lambda>
    shape.as_list(), dtype=dtype, partition_info=partition_info)
  File "/home/leon/anaconda3/envs/gputensorflow/lib/python3.5/site-packages/tensorflow/python/ops/init_ops.py", line 360, in __call__
    dtype, seed=self.seed)
  File "/home/leon/anaconda3/envs/gputensorflow/lib/python3.5/site-packages/tensorflow/python/ops/random_ops.py", line 246, in random_uniform
    return math_ops.add(rnd * (maxval - minval), minval, name=name)
  File "/home/leon/anaconda3/envs/gputensorflow/lib/python3.5/site-packages/tensorflow/python/ops/gen_math_ops.py", line 73, in add
    result = _op_def_lib.apply_op("Add", x=x, y=y, name=name)
  File "/home/leon/anaconda3/envs/gputensorflow/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 763, in apply_op
    op_def=op_def)
  File "/home/leon/anaconda3/envs/gputensorflow/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 2395, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/home/leon/anaconda3/envs/gputensorflow/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1264, in __init__
    self._traceback = _extract_stack()

ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[14525,14000]
     [[Node: rnn/basic_lstm_cell/weights/Initializer/random_uniform = Add[T=DT_FLOAT, _class=["loc:@rnn/basic_lstm_cell/weights"], _device="/job:localhost/replica:0/task:0/gpu:0"](rnn/basic_lstm_cell/weights/Initializer/random_uniform/mul, rnn/basic_lstm_cell/weights/Initializer/random_uniform/min)]]

共有3个答案

况唯
2023-03-14

一个接一个地运行模型排列时遇到相同的OOM问题。在完成一个模型,然后定义并运行一个新模型之后,似乎GPU内存没有完全从以前的模型中清除,内存中正在积累一些东西,并最终导致OOM错误。

g-eoj对另一个问题的回答:

keras.backend.clear_session()

应该清除以前的模型。从…起https://keras.io/backend/ 销毁当前TF图并创建新TF图。有助于避免旧模型/层的混乱。运行并保存一个模型后,清除会话,然后运行下一个模型。

程鸿波
2023-03-14

试着看看这个

小心不要在同一GPU上运行评估和训练二进制文件,否则可能会耗尽内存。考虑在一个单独的GPU上运行评估,或者在同一GPU上运行评估时暂停训练二进制。

https://www.tensorflow.org/tutorials/deep_cnn

罗学林
2023-03-14

我通过减少batch_size=52来解决这个问题,只有减少内存使用才能减少batch_size。

Batch_size取决于你的gpu显卡,VRAM大小,缓存内存等。

请选择另一个堆栈溢出链接

 类似资料:
  • 我正在尝试使用我的RTX 2060 Super与Keras进行预测。出于某种原因,它似乎在我的CPU上运行。 这是我用来调试的测试脚本: 以下是打印到控制台的结果: 下面是一个屏幕截图,显示了我在Task Manager中的CPU和GPU利用率: 任何帮助都将不胜感激!

  • 本文向大家介绍基于Tensorflow使用CPU而不用GPU问题的解决,包括了基于Tensorflow使用CPU而不用GPU问题的解决的使用技巧和注意事项,需要的朋友参考一下 之前的文章讲过用Tensorflow的object detection api训练MobileNetV2-SSDLite,然后发现训练的时候没有利用到GPU,反而CPU占用率贼高(可能会有Could not dlopen l

  • 我正在尝试创建和训练一个CNN模型。但每次我运行代码时,tensorflow并没有使用GPU,而是使用CPU。我已经安装了tensorflow的最新版本。附上以下详细信息。 在运行时,我得到以下带有警告消息的输出。(平台:VS代码) 2021-07-28 15:35:13.163991: W tenstorflow/core/common_runtime/bfc_allocator.cc:337]

  • GPU

    TensorFlow 支持 CPU 和 GPU 这两种设备,标识设备的方法为: /cpu:0:机器中的 CPU /gpu:0:机器中的 GPU, 如果你有一个的话. /gpu:1:机器中的第二个 GPU, 以此类推… 记录设备指派情况 通过设置 log_device_placement 选项来记录 operations 和 Tensor 被指派到哪个设备上运行: import tensorflow

  • 问题内容: 在JavaScript中,每个对象同时是一个实例和一个类。要进行继承,可以将任何对象实例用作原型。 在Python,C ++等中,有类和实例作为单独的概念。为了进行继承,您必须使用基类创建一个新类,然后可以使用该新类来生成派生实例。 为什么JavaScript朝这个方向发展(基于原型的面向对象)?与传统的基于类的OO相比,基于原型的OO有哪些优点和缺点? 问题答案: 这里大约有一百个术

  • 本文向大家介绍使用Tensorflow-GPU禁用GPU设置(CPU与GPU速度对比),包括了使用Tensorflow-GPU禁用GPU设置(CPU与GPU速度对比)的使用技巧和注意事项,需要的朋友参考一下 禁用GPU设置 CPU与GPU对比 显卡:GTX 1066 CPU GPU 简单测试:GPU比CPU快5秒 补充知识:tensorflow使用CPU可以跑(运行),但是使用GPU却不能用的情况