Debugging TensorFlow


Welcome!

Contents

  • Introduction: Why debugging in TensorFlow is difficult
  • Basic and advanced methods for debugging TensorFlow codes
  • General tips and guidelines for easily debuggable code
  • Benchmarking and profiling TensorFlow codes

A Disclaimer

  • This talk is NOT about how to debug your ML/DL model (e.g. my model is not fitting well), but about how to debug your TF code from a programming perspective
  • I assume the audience is somewhat familiar with the basics of TensorFlow and Deep Learning; it would be very helpful if you have written a TensorFlow program by yourself
  • Questions are highly welcomed! Please feel free to interrupt!

Debugging?

Debugging a TensorFlow application is ...

  • Difficult!
  • Do you agree? Life is not that easy

Review: TensorFlow Computation Mechanics

(from TensorFlow docs)

TensorFlow programs are usually structured into

  • a construction phase, that assembles a graph, and
  • an execution phase that uses a session to execute ops in the graph.
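
A minimal sketch of the two phases (illustrative only):

import tensorflow as tf

# (1) construction phase: assemble a graph; nothing is computed yet
a = tf.constant(2.0)
b = tf.placeholder(tf.float32)
c = a * b          # just a node in the graph, not a value

# (2) execution phase: a session executes ops in the graph
with tf.Session() as session:
    print(session.run(c, feed_dict={b: 3.0}))  # prints 6.0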

Review: in pure numpy ...

W1, b1, W2, b2, W3, b3 = init_parameters()

def multilayer_perceptron(x, y_truth):
    # (i) feed-forward pass
    assert x.dtype == np.float32 and x.shape == (batch_size, 784)  # numpy!
    fc1 = fully_connected(x, W1, b1, activation_fn='relu')    # [B, 256]
    fc2 = fully_connected(fc1, W2, b2, activation_fn='relu')  # [B, 256]
    out = fully_connected(fc2, W3, b3, activation_fn=None)    # [B, 10]
    # (ii) loss and gradient, backpropagation
    loss = softmax_cross_entropy_loss(out, y_truth)  # loss as a scalar
    param_gradients = _compute_gradient(...)  # just an artificial example :)
    return out, loss, param_gradients

def train():
    for epoch in range(10):
        epoch_loss = 0.0
        batch_steps = mnist.train.num_examples // batch_size
        for step in range(batch_steps):
            batch_x, batch_y = mnist.train.next_batch(batch_size)
            y_pred, loss, gradients = multilayer_perceptron(batch_x, batch_y)
            for v, grad_v in zip(all_params, gradients):
                v -= learning_rate * grad_v  # in-place SGD update
            epoch_loss += loss / batch_steps
        print "Epoch %02d, Loss = %.6f" % (epoch, epoch_loss)

Review: with TensorFlow

def multilayer_perceptron(x):
    fc1 = layers.fully_connected(x, 256, activation_fn=tf.nn.relu)
    fc2 = layers.fully_connected(fc1, 256, activation_fn=tf.nn.relu)
    out = layers.fully_connected(fc2, 10, activation_fn=None)
    return out

x = tf.placeholder(tf.float32, [None, 784])
y = tf.placeholder(tf.float32, [None, 10])
pred = multilayer_perceptron(x)
loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(logits=pred, labels=y))
train_op = tf.train.AdamOptimizer(learning_rate=0.001).minimize(loss)

def train(session):
    batch_size = 200
    session.run(tf.global_variables_initializer())
    for epoch in range(10):
        epoch_loss = 0.0
        batch_steps = mnist.train.num_examples // batch_size
        for step in range(batch_steps):
            batch_x, batch_y = mnist.train.next_batch(batch_size)
            _, c = session.run([train_op, loss], {x: batch_x, y: batch_y})
            epoch_loss += c / batch_steps
        print "Epoch %02d, Loss = %.6f" % (epoch, epoch_loss)

Review: The Issues

def multilayer_perceptron(x):
    fc1 = layers.fully_connected(x, 256, activation_fn=tf.nn.relu)
    fc2 = layers.fully_connected(fc1, 256, activation_fn=tf.nn.relu)
    out = layers.fully_connected(fc2, 10, activation_fn=None)
    return out

x = tf.placeholder(tf.float32, [None, 784])
y = tf.placeholder(tf.float32, [None, 10])
pred = multilayer_perceptron(x)
  • The actual computation is done inside session.run(); all we have done above is build a computation graph
  • The model-building part (e.g. multilayer_perceptron()) is called only once before training, so we cannot easily access the intermediate tensors
    • e.g. Inspecting activations of fc1/fc2 is not trivial!
_, c = session.run([train_op, loss], {x: batch_x, y: batch_y})

Session.run()

The most important method in TensorFlow --- where every computation is performed!

  • tf.Session.run(fetches, feed_dict) runs the operations and evaluates the tensors in fetches, substituting the placeholder values given in feed_dict for the corresponding input values.
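
A minimal sketch (names are illustrative): fetches can be a single tensor or a list of tensors, and feed_dict supplies a value for each placeholder:

x = tf.placeholder(tf.float32)
y = x * 2.0 + 1.0
session = tf.Session()
# fetch a single tensor ...
print(session.run(y, feed_dict={x: 3.0}))       # 7.0
# ... or several tensors at once (returns a list)
print(session.run([y, x], feed_dict={x: 3.0}))  # [7.0, 3.0]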

Why is debugging TensorFlow difficult?

  • The concept of Computation Graph might be unfamiliar to us.
  • The "Inversion of Control"
    • The actual computation (feed-forward, training) of model runs inside Session.run(), upon the computation graph, but not upon the Python code we wrote
    • What is exactly being done during an execution of session is under an abstraction barrier
  • Therefore, we cannot retrieve intermediate values during the computation, unless we explicitly fetch them via Session.run()

Debugging Facilities in TensorFlow

Debugging Scenarios

We may wish to ...

  • inspect intra-layer activations (during training)
    • e.g. See the output of conv5, fc7 in CNNs
  • inspect parameter weights (during training)
  • under some conditions, pause the execution (i.e. breakpoint) and evaluate some expressions for debugging
  • during training, NaN occurs in the loss or variables, and we don't know why

Of course, in TensorFlow, we can do these very elegantly!

Debugging in TensorFlow: Overview

Basic ways:

  • Explicitly fetch, and print (or do whatever you want)!
    • Session.run()
  • Tensorboard: Histogram and Image Summary
  • the tf.Print() operation

Advanced ways:

  • Interpose any python codelet in the computation graph
  • A step-by-step debugger
  • tfdbg: The TensorFlow debugger

(1) Fetch tensors via Session.run()

TensorFlow allows us to run parts of the graph in isolation, i.e. only the relevant part of the graph is executed (rather than executing everything)

x = tf.placeholder(tf.float32)
y = tf.placeholder(tf.float32)
bias = tf.Variable(1.0)
y_pred = x ** 2 + bias    # x -> x^2 + bias
loss = (y - y_pred) ** 2  # L2 loss

# Error: to compute loss, y is required as a dependency
print('Loss(x,y) = %.3f' % session.run(loss, {x: 3.0}))
# OK, prints 1.000 = (3**2 + 1 - 9)**2
print('Loss(x,y) = %.3f' % session.run(loss, {x: 3.0, y: 9.0}))
# OK, prints 10.000; to evaluate y_pred only, no input to y is required
print('pred_y(x) = %.3f' % session.run(y_pred, {x: 3.0}))
# OK, prints 1.000; bias evaluates to 1.0
print('bias = %.3f' % session.run(bias))

Tensor Fetching: Example

We need access to the tensors as Python expressions:

def multilayer_perceptron(x):
    fc1 = layers.fully_connected(x, 256, activation_fn=tf.nn.relu)
    fc2 = layers.fully_connected(fc1, 256, activation_fn=tf.nn.relu)
    out = layers.fully_connected(fc2, 10, activation_fn=None)
    return out, fc1, fc2

net = {}
x = tf.placeholder(tf.float32, [None, 784])
y = tf.placeholder(tf.float32, [None, 10])
pred, net['fc1'], net['fc2'] = multilayer_perceptron(x)

to fetch and evaluate them:

_, c, fc1, fc2, out = session.run(
    [train_op, loss, net['fc1'], net['fc2'], pred],
    feed_dict={x: batch_x, y: batch_y})

# ... and do something with the fetched numpy arrays
if step == 0:  # XXX Debug
    print fc1[0].mean(), fc2[0].mean(), out[0]

(1) Fetch tensors via Session.run()

The Good:

  • Simple and Easy.
  • The most basic method to get debugging information.
  • We can fetch any evaluation result in numpy arrays, anywhere (except inside Session.run() or the computation graph).

The Bad:

  • We need to hold references to the tensors we want to inspect, which can become burdensome as the model grows complex and big
    (or, we can simply pass the tensor name such as fc0/Relu:0 — see the sketch below)
  • The feed-forward needs to be done in an atomic way (i.e. a single call of Session.run())
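
A small sketch of fetching by tensor name rather than by Python reference (the name 'fc1/Relu:0' is hypothetical and depends on the actual scoping):

# Session.run() also accepts tensor names as fetches
fc1_val = session.run('fc1/Relu:0', feed_dict={x: batch_x})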

Tensor Fetching: The Bad (i)

We need to hold references to the tensors we want to inspect, which can become burdensome as the model grows complex and big

def alexnet(x):
    assert x.get_shape().as_list() == [224, 224, 3]
    conv1 = conv_2d(x, 96, 11, strides=4, activation='relu')
    pool1 = max_pool_2d(conv1, 3, strides=2)
    conv2 = conv_2d(pool1, 256, 5, activation='relu')
    pool2 = max_pool_2d(conv2, 3, strides=2)
    conv3 = conv_2d(pool2, 384, 3, activation='relu')
    conv4 = conv_2d(conv3, 384, 3, activation='relu')
    conv5 = conv_2d(conv4, 256, 3, activation='relu')
    pool5 = max_pool_2d(conv5, 3, strides=2)
    fc6 = fully_connected(pool5, 4096, activation='relu')
    fc7 = fully_connected(fc6, 4096, activation='relu')
    output = fully_connected(fc7, 1000, activation='softmax')
    return output, conv1, pool1, conv2, pool2, conv3, conv4, conv5, pool5, fc6, fc7

output, conv1, pool1, conv2, pool2, conv3, conv4, conv5, pool5, fc6, fc7 = \
    alexnet(images)  # ?!

_, loss_, conv1_, conv2_, conv3_, conv4_, conv5_, fc6_, fc7_ = session.run(
    [train_op, loss, conv1, conv2, conv3, conv4, conv5, fc6, fc7],
    feed_dict={...})

Tensor Fetching: The Bad (i)

Suggestion: Using a dict or class instance (e.g. self.conv5) is a very good idea

def alexnet(x, net={}):
    assert x.get_shape().as_list() == [224, 224, 3]
    net['conv1'] = conv_2d(x, 96, 11, strides=4, activation='relu')
    net['pool1'] = max_pool_2d(net['conv1'], 3, strides=2)
    net['conv2'] = conv_2d(net['pool1'], 256, 5, activation='relu')
    net['pool2'] = max_pool_2d(net['conv2'], 3, strides=2)
    net['conv3'] = conv_2d(net['pool2'], 384, 3, activation='relu')
    net['conv4'] = conv_2d(net['conv3'], 384, 3, activation='relu')
    net['conv5'] = conv_2d(net['conv4'], 256, 3, activation='relu')
    net['pool5'] = max_pool_2d(net['conv5'], 3, strides=2)
    net['fc6'] = fully_connected(net['pool5'], 4096, activation='relu')
    net['fc7'] = fully_connected(net['fc6'], 4096, activation='relu')
    net['output'] = fully_connected(net['fc7'], 1000, activation='softmax')
    return net['output']

net = {}
output = alexnet(images, net)
# access intermediate layers like net['conv5'], net['fc7'], etc.

Tensor Fetching: The Bad (i)

Suggestion: Using a dict or class instance (e.g. self.conv5) is a very good idea

class AlexNetModel(object):
    # ...
    def build_model(self, x):
        assert x.get_shape().as_list() == [224, 224, 3]
        self.conv1 = conv_2d(x, 96, 11, strides=4, activation='relu')
        self.pool1 = max_pool_2d(self.conv1, 3, strides=2)
        self.conv2 = conv_2d(self.pool1, 256, 5, activation='relu')
        self.pool2 = max_pool_2d(self.conv2, 3, strides=2)
        self.conv3 = conv_2d(self.pool2, 384, 3, activation='relu')
        self.conv4 = conv_2d(self.conv3, 384, 3, activation='relu')
        self.conv5 = conv_2d(self.conv4, 256, 3, activation='relu')
        self.pool5 = max_pool_2d(self.conv5, 3, strides=2)
        self.fc6 = fully_connected(self.pool5, 4096, activation='relu')
        self.fc7 = fully_connected(self.fc6, 4096, activation='relu')
        self.output = fully_connected(self.fc7, 1000, activation='softmax')
        return self.output

model = AlexNetModel()
output = model.build_model(images)
# access intermediate layers like model.conv5, model.fc7, etc.

Tensor Fetching: The Bad (ii)

The feed-forward (sometimes) needs to be done in an atomic way (i.e. a single call of Session.run())

# a single step of training ...
[loss_value, _] = session.run([loss_op, train_op],
                              feed_dict={images: batch_image})
# after this, the model parameters have been changed due to `train_op`

if np.isnan(loss_value):
    # DEBUG: can we see the intermediate values for the current input?
    [fc7, prob] = session.run([net['fc7'], net['prob']],
                              feed_dict={images: batch_image})

In other words, if any input is fed via feed_dict, we may have to fetch the non-debugging-related tensors and the debugging-related tensors at the same time.

Tensor Fetching: The Bad (ii)

In fact, we can just perform an additional session.run() for debugging purposes, if it does not involve any side effect

# for debugging only, get the intermediate layer outputs
[fc7, prob] = session.run([net['fc7'], net['prob']],
                          feed_dict={images: batch_image})

# yet another feed-forward: 'fc7' is computed once more ...
[loss_value, _] = session.run([loss_op, train_op],
                              feed_dict={images: batch_image})

A workaround: Use session.partial_run() (undocumented, and still experimental)

h = sess.partial_run_setup([net['fc7'], loss_op, train_op], [images])
[loss_value, _] = sess.partial_run(h, [loss_op, train_op],
                                   feed_dict={images: batch_image})
fc7 = sess.partial_run(h, net['fc7'])

(2) Tensorboard

Tensorboard: A Quick Tutorial

def multilayer_perceptron(x):
    # inside this, variables 'fc1/weights' and 'fc1/biases' are created
    fc1 = layers.fully_connected(x, 256, activation_fn=tf.nn.relu,
                                 scope='fc1')
    tf.summary.histogram('fc1', fc1)
    tf.summary.histogram('fc1/sparsity', tf.nn.zero_fraction(fc1))
    fc2 = layers.fully_connected(fc1, 256, activation_fn=tf.nn.relu,
                                 scope='fc2')
    tf.summary.histogram('fc2', fc2)
    tf.summary.histogram('fc2/sparsity', tf.nn.zero_fraction(fc2))
    out = layers.fully_connected(fc2, 10, scope='out')
    return out

# ... (omitted) ...
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(
    logits=pred, labels=y))
tf.summary.scalar('loss', loss)

# histogram summary for all trainable variables (slow?)
for v in tf.trainable_variables():
    tf.summary.histogram(v.name, v)

global_step = tf.Variable(0, dtype=tf.int32, trainable=False)
train_op = tf.train.AdamOptimizer(learning_rate=0.001)\
    .minimize(loss, global_step=global_step)

def train(session):
    batch_size = 200
    session.run(tf.global_variables_initializer())
    merged_summary_op = tf.summary.merge_all()
    summary_writer = tf.summary.FileWriter(FLAGS.train_dir, session.graph)
    # ... (omitted) ...
    for step in range(batch_steps):
        batch_x, batch_y = mnist.train.next_batch(batch_size)
        _, c, summary = session.run(
            [train_op, loss, merged_summary_op],
            feed_dict={x: batch_x, y: batch_y})
        summary_writer.add_summary(summary,
                                   global_step.eval(session=session))
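
To view the written summaries, launch TensorBoard against the log directory (the path below is just an example; use whatever FLAGS.train_dir points to):

$ tensorboard --logdir /tmp/mnist_train   # then open http://localhost:6006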

Tensorboard: A Quick Tutorial (Demo)

Scalar Summary

Tensorboard: A Quick Tutorial (Demo)

Histogram Summary (activations and variables)

Tensorboard and Summary: Notes

Fetching histogram summaries is extremely slow!

  • GPU utilization can become very low (if the serialized values are huge)
  • In non-debugging mode, disable them completely, or fetch summaries only periodically, e.g.:

    eval_tensors = [self.loss, self.train_op]
    if step % 200 == 0:
        eval_tensors += [self.merged_summary_op]

    eval_ret = session.run(eval_tensors, feed_dict)
    eval_ret = dict(zip(eval_tensors, eval_ret))  # as a dict
    current_loss = eval_ret[self.loss]

    if self.merged_summary_op in eval_tensors:
        self.summary_writer.add_summary(
            eval_ret[self.merged_summary_op], current_step)

  • I recommend keeping only simple and essential scalar summaries (e.g. train/validation loss, overall accuracy), and including debugging stuff only on demand

Tensorboard and Summary: Notes

Some other recommendations:

  • Use proper names (prefixed or scoped) for tensors and variables (by specifying name=... in the tensor/variable declaration)
  • Include both the training loss and the validation loss, plus train/validation accuracy (if possible), plotted over steps

(3) tf.Print()

During run-time evaluation, we can print the value of a tensor without explicitly fetching and returning it to the code (i.e. via session.run())

tf.Print(input_, data, message=None,
         first_n=None, summarize=None, name=None)

It creates an identity op with the side effect of printing data, when this op is evaluated.

(3) tf.Print(): Examples

def multilayer_perceptron(x):
    fc1 = layers.fully_connected(x, 256, activation_fn=tf.nn.relu)
    fc2 = layers.fully_connected(fc1, 256, activation_fn=tf.nn.relu)
    out = layers.fully_connected(fc2, 10, activation_fn=None)
    out = tf.Print(out, [tf.argmax(out, 1)],
                   'argmax(out) = ', summarize=20, first_n=7)
    return out

For the first seven evaluations (i.e. 7 feed-forward or SGD steps), it will print the predicted labels of the first 20 (out of batch_size) examples

I tensorflow/core/kernels/logging_ops.cc:79] argmax(out) = [6 6 6 4 4 6 4 4 6 6 4 0 6 4 6 4 4 6 0 4...]
I tensorflow/core/kernels/logging_ops.cc:79] argmax(out) = [6 6 0 0 3 6 4 3 6 6 3 4 4 4 4 4 3 4 6 7...]
I tensorflow/core/kernels/logging_ops.cc:79] argmax(out) = [3 4 0 6 6 6 0 7 3 0 6 7 3 6 0 3 4 3 3 6...]
I tensorflow/core/kernels/logging_ops.cc:79] argmax(out) = [6 1 0 0 0 3 3 7 0 8 1 2 0 9 9 0 3 4 6 6...]
I tensorflow/core/kernels/logging_ops.cc:79] argmax(out) = [6 0 0 9 0 4 9 9 0 8 2 7 3 9 1 8 3 9 7 8...]
I tensorflow/core/kernels/logging_ops.cc:79] argmax(out) = [6 0 1 1 9 0 8 3 0 9 9 0 2 6 7 7 3 3 3 9...]
I tensorflow/core/kernels/logging_ops.cc:79] argmax(out) = [3 6 9 8 3 9 1 0 1 1 9 3 2 3 9 9 3 0 6 6...]
[2016-06-03 00:11:08.661563] Epoch 00, Loss = 0.332199

(3) tf.Print(): Some drawbacks ...

Cons:

  • It is hard to take full control of print formats (e.g. how do we print a 2D tensor in a matrix format?)
  • Usually, we want to print debugging values conditionally
    (i.e. only if some condition is met) or periodically
    (i.e. only once per epoch)
    • tf.Print() has limitations in achieving this
    • TensorFlow has control flow operations; an overkill? (see the sketch below)
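
For instance, a rough sketch of a conditional print using control flow ops (tf.cond and tf.is_nan are real TF ops, but this wiring is illustrative only, and arguably an overkill):

# print `out` only when the loss becomes NaN; otherwise a no-op pass-through
loss = tf.cond(tf.is_nan(loss),
               lambda: tf.Print(loss, [out], 'NaN loss! out = '),
               lambda: tf.identity(loss))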

(3) tf.Assert()

  • tf.Assert(condition, data, summarize=None, name=None)
  • Asserts that the given condition is true when evaluated (i.e. during the computation)
  • If condition evaluates to False, the list of tensors in data is printed and an error is thrown; summarize determines how many entries of the tensors to print

tf.Assert: Examples

def multilayer_perceptron(x):
    fc1 = layers.fully_connected(x, 256, activation_fn=tf.nn.relu, scope='fc1')
    fc2 = layers.fully_connected(fc1, 256, activation_fn=tf.nn.relu, scope='fc2')
    out = layers.fully_connected(fc2, 10, activation_fn=None, scope='out')
    # let's ensure that all the outputs in `out` are positive
    tf.Assert(tf.reduce_all(out > 0), [out], name='assert_out_positive')
    return out

The assertion will not work!

tf.Assert is also an op, so it should be executed as well

We need to ensure that assert_op is being executed when evaluating out:

def multilayer_perceptron(x):
    fc1 = layers.fully_connected(x, 256, activation_fn=tf.nn.relu, scope='fc1')
    fc2 = layers.fully_connected(fc1, 256, activation_fn=tf.nn.relu, scope='fc2')
    out = layers.fully_connected(fc2, 10, activation_fn=None, scope='out')
    # let's ensure that all the outputs in `out` are positive
    assert_op = tf.Assert(tf.reduce_all(out > 0), [out],
                          name='assert_out_positive')
    with tf.control_dependencies([assert_op]):
        out = tf.identity(out, name='out')
    return out

... somewhat ugly? ... or

    # ... same as above ...
    out = tf.with_dependencies([assert_op], out)
    return out

Another good way: store all the created assertion operations into a collection, (merge them into a single op), and explicitly evaluate them using Session.run()

def multilayer_perceptron(x):
    fc1 = layers.fully_connected(x, 256, activation_fn=tf.nn.relu, scope='fc1')
    fc2 = layers.fully_connected(fc1, 256, activation_fn=tf.nn.relu, scope='fc2')
    out = layers.fully_connected(fc2, 10, activation_fn=None, scope='out')
    tf.add_to_collection('Asserts',
        tf.Assert(tf.reduce_all(out > 0), [out], name='assert_out_gt_0')
    )
    return out

# merge all assertion ops from the collection
assert_op = tf.group(*tf.get_collection('Asserts'))
... = session.run([train_op, assert_op], feed_dict={...})

Some built-in useful Assert ops

See Asserts and boolean checks in the docs!

tf.assert_negative(x, data=None, summarize=None, name=None)
tf.assert_positive(x, data=None, summarize=None, name=None)
tf.assert_proper_iterable(values)
tf.assert_non_negative(x, data=None, summarize=None, name=None)
tf.assert_non_positive(x, data=None, summarize=None, name=None)
tf.assert_equal(x, y, data=None, summarize=None, name=None)
tf.assert_integer(x, data=None, summarize=None, name=None)
tf.assert_less(x, y, data=None, summarize=None, name=None)
tf.assert_less_equal(x, y, data=None, summarize=None, name=None)
tf.assert_rank(x, rank, data=None, summarize=None, name=None)
tf.assert_rank_at_least(x, rank, data=None, summarize=None, name=None)
tf.assert_type(tensor, tf_type)
tf.is_non_decreasing(x, name=None)
tf.is_numeric_tensor(tensor)
tf.is_strictly_increasing(x, name=None)

They are useful whenever we need runtime assertions during the computation.
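
For example, a minimal sketch with one of these checks (the surrounding names are illustrative):

# runtime check: fc1 must be non-negative (e.g. after a ReLU)
assert_op = tf.assert_non_negative(fc1, name='fc1_non_negative')
with tf.control_dependencies([assert_op]):
    fc1 = tf.identity(fc1)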

(4) Step-by-step Debugger

Python already has powerful debugging utilities:

  • pdb (built into the standard library), ipdb, and pudb

which are all interactive debuggers (like gdb for C/C++) that can

  • set breakpoints
  • pause and continue execution
  • inspect the stack trace upon an exception
  • watch variables and evaluate expressions interactively

Debugger: Usage

Insert set_trace() for a breakpoint

def multilayer_perceptron(x):
    fc1 = layers.fully_connected(x, 256, activation_fn=tf.nn.relu, scope='fc1')
    fc2 = layers.fully_connected(fc1, 256, activation_fn=tf.nn.relu, scope='fc2')
    out = layers.fully_connected(fc2, 10, activation_fn=None, scope='out')
    import ipdb; ipdb.set_trace()  # XXX BREAKPOINT
    return out

(yeah, it is a breakpoint on model building)

Debugger: Usage

Debug breakpoints can be conditional:

for i in range(batch_steps):
    batch_x, batch_y = mnist.train.next_batch(batch_size)
    if (np.argmax(batch_y, axis=1)[:7] == [4, 9, 6, 2, 9, 6, 5]).all():
        import pudb; pudb.set_trace()  # XXX BREAKPOINT
    _, c = session.run([train_op, loss],
                       feed_dict={x: batch_x, y: batch_y})

Live Demo and Case Example!! (20-mnist-pdb.py)

  • Let's break on training loop if some condition is met
  • Get the fc2 tensor, and fetch its evaluation result (given x)
    • Session.run() can be invoked and executed anywhere, even in the debugger
  • Wow.... gonna love it..

Hint: Some Useful TensorFlow APIs

To get any operations or tensors that might not be stored explicitly:

  • tf.get_default_graph(): Get the current (default) graph
  • G.get_operations(): List all the TF ops in the graph
  • G.get_operation_by_name(name): Retrieve a specific TF op
    (Q. How to convert an operation to a tensor? See the sketch after this list.)
  • G.get_tensor_by_name(name): Retrieve a specific tensor
  • tf.get_collection(tf.GraphKeys.~~): Get the collection of some tensors

To get variables:

  • tf.get_variable_scope(): Get the current variable scope
  • tf.get_variable(): Get a variable (see Sharing Variables)
  • tf.trainable_variables(): List all the (trainable) variables
  • [v for v in tf.all_variables() if v.name == 'fc2/weights:0'][0]
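
A quick sketch putting these together (the names 'fc2/Relu' and 'fc2/weights:0' are illustrative); note that an operation's output tensors are available via its outputs attribute, which answers the question above:

G = tf.get_default_graph()
op = G.get_operation_by_name('fc2/Relu')   # a tf.Operation
t = op.outputs[0]                          # its first output tensor
t2 = G.get_tensor_by_name('fc2/Relu:0')    # the same tensor, fetched by name
fc2_w = [v for v in tf.trainable_variables()
         if v.name == 'fc2/weights:0'][0]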

ipdb/pudb vs. IPython.embed()

ipdb/pudb:

import pudb; pudb.set_trace()

  • These are debuggers: we can set breakpoints, see stack traces, watch expressions, ... (much more general)

embed:

from IPython import embed; embed()

  • Opens an IPython shell in the current context; mostly useful for watching expressions only

(5) Debugging 'inside' the computation graph

Our debugging tools so far can be used for debugging outside Session.run().

Question: How can we run a 'custom' operation? (e.g. custom layer)

  • TensorFlow allows us to write a custom operation in C++ !

  • The 'custom' operation can be designed for logging or debugging purposes (like PrintOp)

  • ... but very burdensome (need to compile, define op interface, and use it ...)

(5) Interpose any python code in the computation graph

We can also embed and interpose a python function in the graph: tf.py_func() comes to the rescue!

tf.py_func(func, inp, Tout, stateful=True, name=None)

  • Wraps a python function and uses it as a TensorFlow op.
  • Given a python function func, which takes numpy arrays as its inputs and returns numpy arrays as its outputs, the function is wrapped as an operation:

def my_func(x):
    # x will be a numpy array with the contents of the placeholder below
    return np.sinh(x)

inp = tf.placeholder(tf.float32, [...])
y = tf.py_func(my_func, [inp], [tf.float32])

In other words, we are now able to use the following (hacky) tricks by intercepting the computation being executed on the graph:

  • Print any intermediate values (e.g. layer activations) in a custom form without fetching them
  • Use python debugger (e.g. trace and breakpoint)
  • Draw a graph or plot using matplotlib, or save images into file

Warning: Some limitations exist, e.g. thread-safety issues, and the wrapped function is not allowed to manipulate the state of the session.

Case Example (i): Print

An ugly example ...

def multilayer_perceptron(x):
    fc1 = layers.fully_connected(x, 256, activation_fn=tf.nn.relu, scope='fc1')
    fc2 = layers.fully_connected(fc1, 256, activation_fn=tf.nn.relu, scope='fc2')
    out = layers.fully_connected(fc2, 10, activation_fn=None, scope='out')

    def _debug_print_func(fc1_val, fc2_val):
        print 'FC1 : {}, FC2 : {}'.format(fc1_val.shape, fc2_val.shape)
        print 'min, max of FC2 = {}, {}'.format(fc2_val.min(), fc2_val.max())
        return False

    debug_print_op = tf.py_func(_debug_print_func, [fc1, fc2], [tf.bool])
    with tf.control_dependencies(debug_print_op):
        out = tf.identity(out, name='out')
    return out

Case Example (ii): Breakpoints!

An ugly example to attach breakpoints ...

def multilayer_perceptron(x):
    fc1 = layers.fully_connected(x, 256, activation_fn=tf.nn.relu, scope='fc1')
    fc2 = layers.fully_connected(fc1, 256, activation_fn=tf.nn.relu, scope='fc2')
    out = layers.fully_connected(fc2, 10, activation_fn=None, scope='out')

    def _debug_func(x_val, fc1_val, fc2_val, out_val):
        if (out_val == 0.0).any():
            import ipdb; ipdb.set_trace()  # XXX BREAKPOINT
            from IPython import embed; embed()  # XXX DEBUG
        return False

    debug_op = tf.py_func(_debug_func, [x, fc1, fc2, out], [tf.bool])
    with tf.control_dependencies(debug_op):
        out = tf.identity(out, name='out')
    return out

Another one: The tdb library

A third-party TensorFlow debugging tool: https://github.com/ericjang/tdb
(not actively maintained and looks clunky, but still good for prototyping)

(6) tfdbg: The official TensorFlow debugger

Recent versions of TensorFlow include an official debugger (tfdbg). It is still experimental, but works quite well!

Check out the HOW-TOs and Examples on tfdbg!!!

from tensorflow.python import debug as tf_debug

sess = tf.Session()
# create a debug wrapper session
sess = tf_debug.LocalCLIDebugWrapperSession(sess)
# add a tensor filter (similar to a conditional breakpoint)
sess.add_tensor_filter("has_inf_or_nan", tf_debug.has_inf_or_nan)

# each session.run() call will be intercepted by the debugger,
# and we can inspect the value of tensors via the debugger interface
sess.run(loss, feed_dict={x: ...})
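
Once wrapped, the sess.run() call above drops into the tfdbg CLI; from there, run -f has_inf_or_nan keeps executing run() calls until the registered filter fires (see the command reference below).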

tfdbg: Features and Quick References

Conceptually, a wrapper session (currently, a CLI debugger session) is employed; it can intercept each call of session.run()

  • run / r : Execute the run() call with debug tensor-watching
  • run -n / r -n : Execute the run() call without debug tensor-watching
  • run -f <filter_name> : Keep executing run() calls until a dumped tensor passes a registered filter (conditional breakpoint)
    • e.g. has_inf_or_nan

tfdbg: Tensor Filters

Registering tensor filters:

sess = tf_debug.LocalCLIDebugWrapperSession(sess)
sess.add_tensor_filter("has_inf_or_nan", tf_debug.has_inf_or_nan)

Tensor filters are just python functions (datum, tensor) -> bool:

def has_inf_or_nan(datum, tensor):
    _ = datum  # datum metadata is unused in this predicate
    if tensor is None:
        # an uninitialized tensor doesn't have bad numerical values
        return False
    elif (np.issubdtype(tensor.dtype, np.float) or
          np.issubdtype(tensor.dtype, np.complex) or
          np.issubdtype(tensor.dtype, np.integer)):
        return np.any(np.isnan(tensor)) or np.any(np.isinf(tensor))
    else:
        return False

Running tensor filters is, therefore, quite slow.

tfdbg: Tensor Fetching

In a tensor dump mode (the run-end UI), the debugger shows the list of tensors dumped in the session.run() call:

tfdbg: Tensor Fetching

Commands:

  • list_tensors (lt) : Show the list of dumped tensor(s).
  • print_tensor (pt) : Print the value of a dumped tensor.
  • node_info (ni) : Show information about a node
    • ni -t : Shows the traceback of tensor creation
  • list_inputs (li) : Show inputs to a node
  • list_outputs (lo) : Show outputs of a node
  • run_info (ri) : Show the information of current run
    (e.g. what to fetch, what feed_dict is)
  • invoke_stepper (s) : Invoke the stepper!
  • run (r) : Move to the next run

tfdbg: Tensor Fetching

Example: print_tensor fc2/Relu:0

  • Slicing: pt fc2/Relu:0[0:10]
  • Dumping: pt fc2/Relu:0 > /tmp/debug/fc2.txt

See also: tfdbg CLI Frequently-Used Commands

tfdbg: Stepper

Shows the tensor value(s) in a topologically-sorted order for the run.

tfdbg: Screencast and Demo!

From Google Brain Team

See also: Debug TensorFlow Models with tfdbg (@Google Developers Blog)

tfdbg: Other Remarks

  • Currently it is actively being developed (still experimental)
  • In the near future, a web-based interactive debugger (integrated with TensorBoard) will be released!

Debugging: Summary

  • Session.run(): Explicitly fetch, and print
  • Tensorboard: Histogram and Image Summary
  • tf.Print(), tf.Assert() operation
  • Use python debugger (ipdb, pudb)
  • Interpose your debugging python code in the graph
  • The TensorFlow debugger: tfdbg

There is no silver bullet; one might need to choose the most convenient and suitable debugging tool, depending on the case

Other General Tips

(in a Programmer's Perspective)

General Tips of Debugging

  • Learn to use debugging tools, but do not solely rely on them when a problem occurs.
  • Sometimes, just sitting down and reading through