BEGAN-hmi88 Code Debugging

沈曜灿
2023-12-01

The code comes from hmi88's BEGAN repository on GitHub.


Problem: InternalError: Blas GEMM launch failed : a.shape=(16, 64), b.shape=(64, 8192)

Solution: limit the fraction of GPU memory TensorFlow claims per process:
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.333)

https://blog.csdn.net/sunwuhuang1/article/details/53946462/
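For reference, a minimal sketch of how this option is usually passed to the session in TF 1.x; the config/sess names below are placeholders, not the variables used in this repo:

import tensorflow as tf

# Keep TensorFlow to roughly one third of the GPU memory so cuBLAS can still
# create its handle ("Blas GEMM launch failed" typically means the GPU was
# already full when the handle was requested).
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.333)
config = tf.ConfigProto(gpu_options=gpu_options)
sess = tf.Session(config=config)

# Alternatively, config.gpu_options.allow_growth = True lets memory grow on demand.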


Problem: the preprocessing run prints only the following, and no cropped images are produced:
shuffle…
shuffle done

Solution:
Look at line 23 of ./Data/celeba/face_detect.py: cv2.imwrite('128_crop/' + fn[:-4] + '_crop' + fn[-4:], image_resize)
The output path requires the directory ./Data/celeba/128_crop to exist, so create a 128_crop folder under celeba.
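
The folder can also be created programmatically so the crops are not silently dropped; a small sketch, assuming the repository layout above:

import os

# face_detect.py writes crops to '128_crop/...' relative to ./Data/celeba,
# and cv2.imwrite fails silently when the target folder does not exist.
os.makedirs('./Data/celeba/128_crop', exist_ok=True)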


Problem: training runs out of GPU memory (ResourceExhaustedError):

2021-04-29 13:21:33.096838: W T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:279] ****************************_*****************************************************************xxxxxx
2021-04-29 13:21:33.102561: W T:\src\github\tensorflow\tensorflow\core\framework\op_kernel.cc:1318] OP_REQUIRES failed at conv_ops.cc:673 : Resource exhausted: OOM when allocating tensor with shape[16,128,128,128] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
Traceback (most recent call last):
  File "D:\Anaconda\lib\site-packages\tensorflow\python\client\session.py", line 1322, in _do_call
    return fn(*args)
  File "D:\Anaconda\lib\site-packages\tensorflow\python\client\session.py", line 1307, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "D:\Anaconda\lib\site-packages\tensorflow\python\client\session.py", line 1409, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[16,256,128,128] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
         [[Node: disc_/conv3_enc_0/Conv2D = Conv2D[T=DT_FLOAT, data_format="NCHW", dilations=[1, 1, 1, 1], padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](disc_/Elu_2, disc_/conv3_enc_0/weight/read)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

         [[Node: Mean_1/_497 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_1377_Mean_1", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "main.py", line 56, in <module>
    main()
  File "main.py", line 51, in main
    model.train(args.flag)
  File "D:\pycharmProject\BEGAN_hmi88\src\operator\op_BEGAN.py", line 106, in train
    _, loss_g, d_real_loss, d_fake_loss = self.sess.run(g_opt, feed_dict=feed_dict)
  File "D:\Anaconda\lib\site-packages\tensorflow\python\client\session.py", line 900, in run
    run_metadata_ptr)
  File "D:\Anaconda\lib\site-packages\tensorflow\python\client\session.py", line 1135, in _run
    feed_dict_tensor, options, run_metadata)
  File "D:\Anaconda\lib\site-packages\tensorflow\python\client\session.py", line 1316, in _do_run
    run_metadata)
  File "D:\Anaconda\lib\site-packages\tensorflow\python\client\session.py", line 1335, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[16,256,128,128] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
         [[Node: disc_/conv3_enc_0/Conv2D = Conv2D[T=DT_FLOAT, data_format="NCHW", dilations=[1, 1, 1, 1], padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](disc_/Elu_2, disc_/conv3_enc_0/weight/read)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

         [[Node: Mean_1/_497 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_1377_Mean_1", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
Caused by op 'disc_/conv3_enc_0/Conv2D', defined at:
  File "main.py", line 56, in <module>
    main()
  File "main.py", line 47, in main
    model = began.BEGAN(args, sess)
  File "D:\pycharmProject\BEGAN_hmi88\src\models\BEGAN.py", line 7, in __init__
    Operator.__init__(self, args, sess)
  File "D:\pycharmProject\BEGAN_hmi88\src\operator\op_BEGAN.py", line 12, in __init__
    self.build_model()
  File "D:\pycharmProject\BEGAN_hmi88\src\operator\op_BEGAN.py", line 25, in build_model
    d_real = self.decoder(self.encoder(self.y))
  File "D:\pycharmProject\BEGAN_hmi88\src\models\BEGAN.py", line 71, in encoder
    x = conv2d(x, [1, 1, f, 2 * f], stride=1,  padding=p,name='conv3_enc_0')
  File "D:\pycharmProject\BEGAN_hmi88\src\layer\layers.py", line 15, in conv2d
    x = tf.nn.conv2d(x, weight, [1, stride, stride, 1], padding=padding)
  File "D:\Anaconda\lib\site-packages\tensorflow\python\ops\gen_nn_ops.py", line 1042, in conv2d
    data_format=data_format, dilations=dilations, name=name)
  File "D:\Anaconda\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "D:\Anaconda\lib\site-packages\tensorflow\python\framework\ops.py", line 3392, in create_op
    op_def=op_def)
  File "D:\Anaconda\lib\site-packages\tensorflow\python\framework\ops.py", line 1718, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[16,256,128,128] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
         [[Node: disc_/conv3_enc_0/Conv2D = Conv2D[T=DT_FLOAT, data_format="NCHW", dilations=[1, 1, 1, 1], padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](disc_/Elu_2, disc_/conv3_enc_0/weight/read)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

         [[Node: Mean_1/_497 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_1377_Mean_1", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

Solution: reduce batch_size from 16 to 8.
https://blog.csdn.net/qq_33221533/article/details/100188050
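
Besides lowering the batch size, the hint in the traceback can be followed to see which tensors are alive when the OOM hits. A sketch for TF 1.x, assuming the training step looks like the one in the traceback (self.sess.run(g_opt, feed_dict=feed_dict) in op_BEGAN.py):

import tensorflow as tf

# Ask TensorFlow to report live tensor allocations if this run hits an OOM,
# as suggested by the hint in the error message.
run_options = tf.RunOptions(report_tensor_allocations_upon_oom=True)

_, loss_g, d_real_loss, d_fake_loss = self.sess.run(
    g_opt, feed_dict=feed_dict, options=run_options)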

