The code comes from hmi88's repository on GitHub.
Problem: InternalError: Blas GEMM launch failed : a.shape=(16, 64), b.shape=(64, 8192)
Solution: cap the fraction of GPU memory the process is allowed to take when the session is created:
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.333)
https://blog.csdn.net/sunwuhuang1/article/details/53946462/
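A minimal sketch of where this option might be wired in; only the GPUOptions line comes from the fix above, and the assumption that the tf.Session in main.py is built this way is mine:

import tensorflow as tf

# Cap this process at roughly 1/3 of the GPU memory so cuBLAS still has room
# to allocate its own workspace (a Blas GEMM launch failure often means the
# GPU memory was already fully claimed).
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.333)

# Hypothetical session construction with that option applied.
sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))

An alternative with the same intent is setting gpu_options.allow_growth = True in the ConfigProto, which makes TensorFlow claim GPU memory on demand instead of reserving nearly all of it up front.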
Problem: nothing further happens after the preprocessing prints:
shuffle…
shuffle done
Solution:
Look at line 23 of ./Data/celeba/face_detect.py: cv2.imwrite('128_crop/' + fn[:-4] + '_crop' + fn[-4:], image_resize). This write fails silently when its target folder does not exist.
The directory ./Data/celeba/128_crop therefore has to exist, so create a 128_crop folder under celeba.
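The directory can also be created from code so the script cannot fail this way again. A small sketch, assuming face_detect.py runs with ./Data/celeba as its working directory (only the 128_crop name comes from the line quoted above):

import os

# cv2.imwrite returns False instead of raising an exception when the target
# folder is missing, which is why the run above stopped without any error.
# Create the output directory up front; exist_ok makes reruns harmless.
os.makedirs('128_crop', exist_ok=True)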
Problem:
2021-04-29 13:21:33.096838: W T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:279] ****************************_*****************************************************************xxxxxx
2021-04-29 13:21:33.102561: W T:\src\github\tensorflow\tensorflow\core\framework\op_kernel.cc:1318] OP_REQUIRES failed at conv_ops.cc:673 : Resource exhausted: OOM when allocating tensor with shape[16,128,128,128] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
Traceback (most recent call last):
File "D:\Anaconda\lib\site-packages\tensorflow\python\client\session.py", line 1322, in _do_call
return fn(*args)
File "D:\Anaconda\lib\site-packages\tensorflow\python\client\session.py", line 1307, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "D:\Anaconda\lib\site-packages\tensorflow\python\client\session.py", line 1409, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[16,256,128,128] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[Node: disc_/conv3_enc_0/Conv2D = Conv2D[T=DT_FLOAT, data_format="NCHW", dilations=[1, 1, 1, 1], padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](disc_/Elu_2, disc_/conv3_enc_0/weight/read)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
[[Node: Mean_1/_497 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_1377_Mean_1", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "main.py", line 56, in <module>
main()
File "main.py", line 51, in main
model.train(args.flag)
File "D:\pycharmProject\BEGAN_hmi88\src\operator\op_BEGAN.py", line 106, in train
_, loss_g, d_real_loss, d_fake_loss = self.sess.run(g_opt, feed_dict=feed_dict)
File "D:\Anaconda\lib\site-packages\tensorflow\python\client\session.py", line 900, in run
run_metadata_ptr)
File "D:\Anaconda\lib\site-packages\tensorflow\python\client\session.py", line 1135, in _run
feed_dict_tensor, options, run_metadata)
File "D:\Anaconda\lib\site-packages\tensorflow\python\client\session.py", line 1316, in _do_run
run_metadata)
File "D:\Anaconda\lib\site-packages\tensorflow\python\client\session.py", line 1335, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[16,256,128,128] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[Node: disc_/conv3_enc_0/Conv2D = Conv2D[T=DT_FLOAT, data_format="NCHW", dilations=[1, 1, 1, 1], padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](disc_/Elu_2, disc_/conv3_enc_0/weight/read)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
[[Node: Mean_1/_497 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_1377_Mean_1", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
Caused by op 'disc_/conv3_enc_0/Conv2D', defined at:
File "main.py", line 56, in <module>
main()
File "main.py", line 47, in main
model = began.BEGAN(args, sess)
File "D:\pycharmProject\BEGAN_hmi88\src\models\BEGAN.py", line 7, in __init__
Operator.__init__(self, args, sess)
File "D:\pycharmProject\BEGAN_hmi88\src\operator\op_BEGAN.py", line 12, in __init__
self.build_model()
File "D:\pycharmProject\BEGAN_hmi88\src\operator\op_BEGAN.py", line 25, in build_model
d_real = self.decoder(self.encoder(self.y))
File "D:\pycharmProject\BEGAN_hmi88\src\models\BEGAN.py", line 71, in encoder
x = conv2d(x, [1, 1, f, 2 * f], stride=1, padding=p,name='conv3_enc_0')
File "D:\pycharmProject\BEGAN_hmi88\src\layer\layers.py", line 15, in conv2d
x = tf.nn.conv2d(x, weight, [1, stride, stride, 1], padding=padding)
File "D:\Anaconda\lib\site-packages\tensorflow\python\ops\gen_nn_ops.py", line 1042, in conv2d
data_format=data_format, dilations=dilations, name=name)
File "D:\Anaconda\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "D:\Anaconda\lib\site-packages\tensorflow\python\framework\ops.py", line 3392, in create_op
op_def=op_def)
File "D:\Anaconda\lib\site-packages\tensorflow\python\framework\ops.py", line 1718, in __init__
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access
ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[16,256,128,128] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[Node: disc_/conv3_enc_0/Conv2D = Conv2D[T=DT_FLOAT, data_format="NCHW", dilations=[1, 1, 1, 1], padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](disc_/Elu_2, disc_/conv3_enc_0/weight/read)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
[[Node: Mean_1/_497 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_1377_Mean_1", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
Solution: change batch_size from 16 to 8. The OOM tensor shape [16, 256, 128, 128] carries the batch dimension of 16, so halving the batch size roughly halves the activation memory the discriminator needs; a sketch of where the value might be set follows the link below.
https://blog.csdn.net/qq_33221533/article/details/100188050
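Where batch_size is defined depends on the repo's configuration code; the sketch below only shows one hypothetical way to expose it as a command-line flag in main.py so the value can be lowered without editing the source (the flag name and default are assumptions, not taken from hmi88's code):

import argparse

parser = argparse.ArgumentParser()
# Hypothetical flag; lowering the default from 16 to 8 halves the activation
# memory of tensors such as the [16, 256, 128, 128] one in the OOM message.
parser.add_argument('--batch_size', type=int, default=8)
args = parser.parse_args()

If a smaller batch still runs out of memory, the hint in the log can be followed literally: pass tf.RunOptions(report_tensor_allocations_upon_oom=True) as the options argument of sess.run to list the tensors that were alive when the OOM occurred.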