Debugging TensorFlow


Welcome!

Contents

  • Introduction: Why debugging in TensorFlow is difficult
  • Basic and advanced methods for debugging TensorFlow codes
  • General tips and guidelines for easily debuggable code
  • Benchmarking and profiling TensorFlow codes

A Disclaimer

  • This talk is NOT about how to debug your ML/DL model (e.g. my model is not fitting well), but about how to debug your TF code from a programming perspective
  • I assume the audience is somewhat familiar with the basics of TensorFlow and Deep Learning; it would be very helpful if you have written a TensorFlow program by yourself
  • Questions are highly welcomed! Please feel free to interrupt!

Debugging?

Debugging a TensorFlow application is ...

  • Difficult!
  • Do you agree? Life is not that easy

Review: TensorFlow Computation Mechanics

(from TensorFlow docs)

TensorFlow programs are usually structured into

  • a construction phase, that assembles a graph, and
  • an execution phase that uses a session to execute ops in the graph.
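
A minimal sketch of the two phases (illustrative only):

import tensorflow as tf

# (1) construction phase: assemble a graph; nothing is computed yet
a = tf.constant(2.0)
b = tf.placeholder(tf.float32)
c = a * b          # just a node in the graph, not a value

# (2) execution phase: a session executes ops in the graph
with tf.Session() as session:
    print(session.run(c, feed_dict={b: 3.0}))  # prints 6.0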

Review: in pure numpy ...

W1, b1, W2, b2, W3, b3 = init_parameters()

def multilayer_perceptron(x, y_truth):
    # (i) feed-forward pass
    assert x.dtype == np.float32 and x.shape == (batch_size, 784)  # numpy!
    fc1 = fully_connected(x, W1, b1, activation_fn='relu')    # [B, 256]
    fc2 = fully_connected(fc1, W2, b2, activation_fn='relu')  # [B, 256]
    out = fully_connected(fc2, W3, b3, activation_fn=None)    # [B, 10]
    # (ii) loss and gradient, backpropagation
    loss = softmax_cross_entropy_loss(out, y_truth)  # loss as a scalar
    param_gradients = _compute_gradient(...)  # just an artificial example :)
    return out, loss, param_gradients

def train():
    for epoch in range(10):
        epoch_loss = 0.0
        batch_steps = mnist.train.num_examples // batch_size
        for step in range(batch_steps):
            batch_x, batch_y = mnist.train.next_batch(batch_size)
            y_pred, loss, gradients = multilayer_perceptron(batch_x, batch_y)
            for v, grad_v in zip(all_params, gradients):
                v -= learning_rate * grad_v  # in-place SGD update
            epoch_loss += loss / batch_steps
        print "Epoch %02d, Loss = %.6f" % (epoch, epoch_loss)

Review: with TensorFlow

def multilayer_perceptron(x):
    fc1 = layers.fully_connected(x, 256, activation_fn=tf.nn.relu)
    fc2 = layers.fully_connected(fc1, 256, activation_fn=tf.nn.relu)
    out = layers.fully_connected(fc2, 10, activation_fn=None)
    return out

x = tf.placeholder(tf.float32, [None, 784])
y = tf.placeholder(tf.float32, [None, 10])
pred = multilayer_perceptron(x)
loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(logits=pred, labels=y))
train_op = tf.train.AdamOptimizer(learning_rate=0.001).minimize(loss)

def train(session):
    batch_size = 200
    session.run(tf.global_variables_initializer())
    for epoch in range(10):
        epoch_loss = 0.0
        batch_steps = mnist.train.num_examples // batch_size
        for step in range(batch_steps):
            batch_x, batch_y = mnist.train.next_batch(batch_size)
            _, c = session.run([train_op, loss], {x: batch_x, y: batch_y})
            epoch_loss += c / batch_steps
        print "Epoch %02d, Loss = %.6f" % (epoch, epoch_loss)

Review: The Issues

def multilayer_perceptron(x):
    fc1 = layers.fully_connected(x, 256, activation_fn=tf.nn.relu)
    fc2 = layers.fully_connected(fc1, 256, activation_fn=tf.nn.relu)
    out = layers.fully_connected(fc2, 10, activation_fn=None)
    return out

x = tf.placeholder(tf.float32, [None, 784])
y = tf.placeholder(tf.float32, [None, 10])
pred = multilayer_perceptron(x)
  • The actual computation is done inside session.run(); all we have done above is build a computation graph
  • The model-building part (e.g. multilayer_perceptron()) is called only once before training, so we cannot easily access the intermediate tensors
    • e.g. Inspecting activations of fc1/fc2 is not trivial!
_, c = session.run([train_op, loss], {x: batch_x, y: batch_y})

Session.run()

The most important method in TensorFlow --- where every computation is performed!

  • tf.Session.run(fetches, feed_dict) runs the operations and evaluates the tensors in fetches, substituting the placeholder values given in feed_dict for the corresponding input values.
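
A minimal sketch (names are illustrative): fetches can be a single tensor or a list of tensors, and feed_dict supplies a value for each placeholder:

x = tf.placeholder(tf.float32)
y = x * 2.0 + 1.0
session = tf.Session()
# fetch a single tensor ...
print(session.run(y, feed_dict={x: 3.0}))       # 7.0
# ... or several tensors at once (returns a list)
print(session.run([y, x], feed_dict={x: 3.0}))  # [7.0, 3.0]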

Why is debugging TensorFlow difficult?

  • The concept of Computation Graph might be unfamiliar to us.
  • The "Inversion of Control"
    • The actual computation (feed-forward, training) of model runs inside Session.run(), upon the computation graph, but not upon the Python code we wrote
    • What is exactly being done during an execution of session is under an abstraction barrier
  • Therefore, we cannot retrieve intermediate values during the computation, unless we explicitly fetch them via Session.run()

Debugging Facilities in TensorFlow

Debugging Scenarios

We may wish to ...

  • inspect intra-layer activations (during training)
    • e.g. See the output of conv5, fc7 in CNNs
  • inspect parameter weights (during training)
  • under some conditions, pause the execution (i.e. breakpoint) and evaluate some expressions for debugging
  • during training, NaN occurs in the loss or variables, and we don't know why

Of course, in TensorFlow, we can do these very elegantly!

Debugging in TensorFlow: Overview

Basic ways:

  • Explicitly fetch, and print (or do whatever you want)!
    • Session.run()
  • Tensorboard: Histogram and Image Summary
  • the tf.Print() operation

Advanced ways:

  • Interpose any python codelet in the computation graph
  • A step-by-step debugger
  • tfdbg: The TensorFlow debugger

(1) Fetch tensors via Session.run()

TensorFlow allows us to run parts of the graph in isolation, i.e. only the relevant part of the graph is executed (rather than executing everything)

x = tf.placeholder(tf.float32)
y = tf.placeholder(tf.float32)
bias = tf.Variable(1.0)
y_pred = x ** 2 + bias    # x -> x^2 + bias
loss = (y - y_pred) ** 2  # L2 loss

# Error: to compute loss, y is required as a dependency
print('Loss(x,y) = %.3f' % session.run(loss, {x: 3.0}))
# OK, prints 1.000 = (3**2 + 1 - 9)**2
print('Loss(x,y) = %.3f' % session.run(loss, {x: 3.0, y: 9.0}))
# OK, prints 10.000; to evaluate y_pred only, no input to y is required
print('pred_y(x) = %.3f' % session.run(y_pred, {x: 3.0}))
# OK, prints 1.000; bias evaluates to 1.0
print('bias = %.3f' % session.run(bias))

Tensor Fetching: Example

We need access to the tensors as Python expressions:

def multilayer_perceptron(x):
    fc1 = layers.fully_connected(x, 256, activation_fn=tf.nn.relu)
    fc2 = layers.fully_connected(fc1, 256, activation_fn=tf.nn.relu)
    out = layers.fully_connected(fc2, 10, activation_fn=None)
    return out, fc1, fc2

net = {}
x = tf.placeholder(tf.float32, [None, 784])
y = tf.placeholder(tf.float32, [None, 10])
pred, net['fc1'], net['fc2'] = multilayer_perceptron(x)

to fetch and evaluate them:

_, c, fc1, fc2, out = session.run(
    [train_op, loss, net['fc1'], net['fc2'], pred],
    feed_dict={x: batch_x, y: batch_y})

# ... and do something with the fetched numpy arrays
if step == 0:  # XXX Debug
    print fc1[0].mean(), fc2[0].mean(), out[0]

(1) Fetch tensors via Session.run()

The Good:

  • Simple and Easy.
  • The most basic method to get debugging information.
  • We can fetch any evaluation result in numpy arrays, anywhere (except inside Session.run() or the computation graph).

The Bad:

  • We need to hold references to the tensors we want to inspect, which can become burdensome as the model grows complex and big
    (or, we can simply pass the tensor name such as fc0/Relu:0 — see the sketch below)
  • The feed-forward needs to be done in an atomic way (i.e. a single call of Session.run())
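
A small sketch of fetching by tensor name rather than by Python reference (the name 'fc1/Relu:0' is hypothetical and depends on the actual scoping):

# Session.run() also accepts tensor names as fetches
fc1_val = session.run('fc1/Relu:0', feed_dict={x: batch_x})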

Tensor Fetching: The Bad (i)

We need to hold references to the tensors we want to inspect, which can become burdensome as the model grows complex and big

def alexnet(x):
    assert x.get_shape().as_list() == [224, 224, 3]
    conv1 = conv_2d(x, 96, 11, strides=4, activation='relu')
    pool1 = max_pool_2d(conv1, 3, strides=2)
    conv2 = conv_2d(pool1, 256, 5, activation='relu')
    pool2 = max_pool_2d(conv2, 3, strides=2)
    conv3 = conv_2d(pool2, 384, 3, activation='relu')
    conv4 = conv_2d(conv3, 384, 3, activation='relu')
    conv5 = conv_2d(conv4, 256, 3, activation='relu')
    pool5 = max_pool_2d(conv5, 3, strides=2)
    fc6 = fully_connected(pool5, 4096, activation='relu')
    fc7 = fully_connected(fc6, 4096, activation='relu')
    output = fully_connected(fc7, 1000, activation='softmax')
    return output, conv1, pool1, conv2, pool2, conv3, conv4, conv5, pool5, fc6, fc7

output, conv1, pool1, conv2, pool2, conv3, conv4, conv5, pool5, fc6, fc7 = \
    alexnet(images)  # ?!

_, loss_, conv1_, conv2_, conv3_, conv4_, conv5_, fc6_, fc7_ = session.run(
    [train_op, loss, conv1, conv2, conv3, conv4, conv5, fc6, fc7],
    feed_dict={...})

Tensor Fetching: The Bad (i)

Suggestion: Using a dict or class instance (e.g. self.conv5) is a very good idea

def alexnet(x, net={}):
    assert x.get_shape().as_list() == [224, 224, 3]
    net['conv1'] = conv_2d(x, 96, 11, strides=4, activation='relu')
    net['pool1'] = max_pool_2d(net['conv1'], 3, strides=2)
    net['conv2'] = conv_2d(net['pool1'], 256, 5, activation='relu')
    net['pool2'] = max_pool_2d(net['conv2'], 3, strides=2)
    net['conv3'] = conv_2d(net['pool2'], 384, 3, activation='relu')
    net['conv4'] = conv_2d(net['conv3'], 384, 3, activation='relu')
    net['conv5'] = conv_2d(net['conv4'], 256, 3, activation='relu')
    net['pool5'] = max_pool_2d(net['conv5'], 3, strides=2)
    net['fc6'] = fully_connected(net['pool5'], 4096, activation='relu')
    net['fc7'] = fully_connected(net['fc6'], 4096, activation='relu')
    net['output'] = fully_connected(net['fc7'], 1000, activation='softmax')
    return net['output']

net = {}
output = alexnet(images, net)
# access intermediate layers like net['conv5'], net['fc7'], etc.

Tensor Fetching: The Bad (i)

Suggestion: Using a dict or class instance (e.g. self.conv5) is a very good idea

class AlexNetModel(object):
    # ...
    def build_model(self, x):
        assert x.get_shape().as_list() == [224, 224, 3]
        self.conv1 = conv_2d(x, 96, 11, strides=4, activation='relu')
        self.pool1 = max_pool_2d(self.conv1, 3, strides=2)
        self.conv2 = conv_2d(self.pool1, 256, 5, activation='relu')
        self.pool2 = max_pool_2d(self.conv2, 3, strides=2)
        self.conv3 = conv_2d(self.pool2, 384, 3, activation='relu')
        self.conv4 = conv_2d(self.conv3, 384, 3, activation='relu')
        self.conv5 = conv_2d(self.conv4, 256, 3, activation='relu')
        self.pool5 = max_pool_2d(self.conv5, 3, strides=2)
        self.fc6 = fully_connected(self.pool5, 4096, activation='relu')
        self.fc7 = fully_connected(self.fc6, 4096, activation='relu')
        self.output = fully_connected(self.fc7, 1000, activation='softmax')
        return self.output

model = AlexNetModel()
output = model.build_model(images)
# access intermediate layers like model.conv5, model.fc7, etc.

Tensor Fetching: The Bad (ii)

The feed-forward (sometimes) needs to be done in an atomic way (i.e. a single call of Session.run())

# a single step of training ...
[loss_value, _] = session.run([loss_op, train_op],
                              feed_dict={images: batch_image})
# after this, the model parameters have been changed due to `train_op`

if np.isnan(loss_value):
    # DEBUG: can we see the intermediate values for the current input?
    [fc7, prob] = session.run([net['fc7'], net['prob']],
                              feed_dict={images: batch_image})

In other words, if any input is fed via feed_dict, we may have to fetch the non-debugging-related tensors and the debugging-related tensors at the same time.

Tensor Fetching: The Bad (ii)

In fact, we can just perform an additional session.run() for debugging purposes, if it does not involve any side effect

# for debugging only, get the intermediate layer outputs
[fc7, prob] = session.run([net['fc7'], net['prob']],
                          feed_dict={images: batch_image})

# yet another feed-forward: 'fc7' is computed once more ...
[loss_value, _] = session.run([loss_op, train_op],
                              feed_dict={images: batch_image})

A workaround: Use session.partial_run() (undocumented, and still experimental)

h = sess.partial_run_setup([net['fc7'], loss_op, train_op], [images])
[loss_value, _] = sess.partial_run(h, [loss_op, train_op],
                                   feed_dict={images: batch_image})
fc7 = sess.partial_run(h, net['fc7'])

(2) Tensorboard

Tensorboard: A Quick Tutorial

def multilayer_perceptron(x):
    # inside this, variables 'fc1/weights' and 'fc1/biases' are created
    fc1 = layers.fully_connected(x, 256, activation_fn=tf.nn.relu,
                                 scope='fc1')
    tf.summary.histogram('fc1', fc1)
    tf.summary.histogram('fc1/sparsity', tf.nn.zero_fraction(fc1))
    fc2 = layers.fully_connected(fc1, 256, activation_fn=tf.nn.relu,
                                 scope='fc2')
    tf.summary.histogram('fc2', fc2)
    tf.summary.histogram('fc2/sparsity', tf.nn.zero_fraction(fc2))
    out = layers.fully_connected(fc2, 10, scope='out')
    return out

# ... (omitted) ...
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(
    logits=pred, labels=y))
tf.summary.scalar('loss', loss)

# histogram summary for all trainable variables (slow?)
for v in tf.trainable_variables():
    tf.summary.histogram(v.name, v)

global_step = tf.Variable(0, dtype=tf.int32, trainable=False)
train_op = tf.train.AdamOptimizer(learning_rate=0.001)\
    .minimize(loss, global_step=global_step)

def train(session):
    batch_size = 200
    session.run(tf.global_variables_initializer())
    merged_summary_op = tf.summary.merge_all()
    summary_writer = tf.summary.FileWriter(FLAGS.train_dir, session.graph)
    # ... (omitted) ...
    for step in range(batch_steps):
        batch_x, batch_y = mnist.train.next_batch(batch_size)
        _, c, summary = session.run(
            [train_op, loss, merged_summary_op],
            feed_dict={x: batch_x, y: batch_y})
        summary_writer.add_summary(summary,
                                   global_step.eval(session=session))
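
To view the written summaries, launch TensorBoard against the log directory (the path below is just an example; use whatever FLAGS.train_dir points to):

$ tensorboard --logdir /tmp/mnist_train   # then open http://localhost:6006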

Tensorboard: A Quick Tutorial (Demo)

Scalar Summary

Tensorboard: A Quick Tutorial (Demo)

Histogram Summary (activations and variables)

Tensorboard and Summary: Notes

Fetching histogram summaries is extremely slow!

  • GPU utilization can become very low (if the serialized values are huge)
  • In non-debugging mode, disable them completely, or fetch summaries only periodically, e.g.:

    eval_tensors = [self.loss, self.train_op]
    if step % 200 == 0:
        eval_tensors += [self.merged_summary_op]

    eval_ret = session.run(eval_tensors, feed_dict)
    eval_ret = dict(zip(eval_tensors, eval_ret))  # as a dict
    current_loss = eval_ret[self.loss]

    if self.merged_summary_op in eval_tensors:
        self.summary_writer.add_summary(
            eval_ret[self.merged_summary_op], current_step)

  • I recommend keeping only simple and essential scalar summaries (e.g. train/validation loss, overall accuracy), and including debugging stuff only on demand

Tensorboard and Summary: Notes

Some other recommendations:

  • Use proper names (prefixed or scoped) for tensors and variables (by specifying name=... in the tensor/variable declaration)
  • Include both the training loss and the validation loss, plus train/validation accuracy (if possible), plotted over steps

(3) tf.Print()

During run-time evaluation, we can print the value of a tensor without explicitly fetching and returning it to the code (i.e. via session.run())

tf.Print(input_, data, message=None,
         first_n=None, summarize=None, name=None)

It creates an identity op with the side effect of printing data, when this op is evaluated.

(3) tf.Print(): Examples

def multilayer_perceptron(x):
    fc1 = layers.fully_connected(x, 256, activation_fn=tf.nn.relu)
    fc2 = layers.fully_connected(fc1, 256, activation_fn=tf.nn.relu)
    out = layers.fully_connected(fc2, 10, activation_fn=None)
    out = tf.Print(out, [tf.argmax(out, 1)],
                   'argmax(out) = ', summarize=20, first_n=7)
    return out

For the first seven evaluations (i.e. 7 feed-forward or SGD steps), it will print the predicted labels of the first 20 (out of batch_size) examples

I tensorflow/core/kernels/logging_ops.cc:79] argmax(out) = [6 6 6 4 4 6 4 4 6 6 4 0 6 4 6 4 4 6 0 4...]
I tensorflow/core/kernels/logging_ops.cc:79] argmax(out) = [6 6 0 0 3 6 4 3 6 6 3 4 4 4 4 4 3 4 6 7...]
I tensorflow/core/kernels/logging_ops.cc:79] argmax(out) = [3 4 0 6 6 6 0 7 3 0 6 7 3 6 0 3 4 3 3 6...]
I tensorflow/core/kernels/logging_ops.cc:79] argmax(out) = [6 1 0 0 0 3 3 7 0 8 1 2 0 9 9 0 3 4 6 6...]
I tensorflow/core/kernels/logging_ops.cc:79] argmax(out) = [6 0 0 9 0 4 9 9 0 8 2 7 3 9 1 8 3 9 7 8...]
I tensorflow/core/kernels/logging_ops.cc:79] argmax(out) = [6 0 1 1 9 0 8 3 0 9 9 0 2 6 7 7 3 3 3 9...]
I tensorflow/core/kernels/logging_ops.cc:79] argmax(out) = [3 6 9 8 3 9 1 0 1 1 9 3 2 3 9 9 3 0 6 6...]
[2016-06-03 00:11:08.661563] Epoch 00, Loss = 0.332199

(3) tf.Print(): Some drawbacks ...

Cons:

  • It is hard to take full control of print formats (e.g. how do we print a 2D tensor in a matrix format?)
  • Usually, we want to print debugging values conditionally
    (i.e. only if some condition is met) or periodically
    (i.e. only once per epoch)
    • tf.Print() has limitations in achieving this
    • TensorFlow has control flow operations; an overkill? (see the sketch below)
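
For instance, a rough sketch of a conditional print using control flow ops (tf.cond and tf.is_nan are real TF ops, but this wiring is illustrative only, and arguably an overkill):

# print `out` only when the loss becomes NaN; otherwise a no-op pass-through
loss = tf.cond(tf.is_nan(loss),
               lambda: tf.Print(loss, [out], 'NaN loss! out = '),
               lambda: tf.identity(loss))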

(3) tf.Assert()

  • tf.Assert(condition, data, summarize=None, name=None)
  • Asserts that the given condition is true when evaluated (i.e. during the computation)
  • If condition evaluates to False, the list of tensors in data is printed and an error is thrown; summarize determines how many entries of the tensors to print

tf.Assert: Examples

def multilayer_perceptron(x):
    fc1 = layers.fully_connected(x, 256, activation_fn=tf.nn.relu, scope='fc1')
    fc2 = layers.fully_connected(fc1, 256, activation_fn=tf.nn.relu, scope='fc2')
    out = layers.fully_connected(fc2, 10, activation_fn=None, scope='out')
    # let's ensure that all the outputs in `out` are positive
    tf.Assert(tf.reduce_all(out > 0), [out], name='assert_out_positive')
    return out

The assertion will not work!

tf.Assert is also an op, so it should be executed as well

We need to ensure that assert_op is being executed when evaluating out:

def multilayer_perceptron(x):
    fc1 = layers.fully_connected(x, 256, activation_fn=tf.nn.relu, scope='fc1')
    fc2 = layers.fully_connected(fc1, 256, activation_fn=tf.nn.relu, scope='fc2')
    out = layers.fully_connected(fc2, 10, activation_fn=None, scope='out')
    # let's ensure that all the outputs in `out` are positive
    assert_op = tf.Assert(tf.reduce_all(out > 0), [out],
                          name='assert_out_positive')
    with tf.control_dependencies([assert_op]):
        out = tf.identity(out, name='out')
    return out

... somewhat ugly? ... or

    # ... same as above ...
    out = tf.with_dependencies([assert_op], out)
    return out

Another good way: store all the created assertion operations into a collection, (merge them into a single op), and explicitly evaluate them using Session.run()

def multilayer_perceptron(x):
    fc1 = layers.fully_connected(x, 256, activation_fn=tf.nn.relu, scope='fc1')
    fc2 = layers.fully_connected(fc1, 256, activation_fn=tf.nn.relu, scope='fc2')
    out = layers.fully_connected(fc2, 10, activation_fn=None, scope='out')
    tf.add_to_collection('Asserts',
        tf.Assert(tf.reduce_all(out > 0), [out], name='assert_out_gt_0')
    )
    return out

# merge all assertion ops from the collection
assert_op = tf.group(*tf.get_collection('Asserts'))
... = session.run([train_op, assert_op], feed_dict={...})

Some built-in useful Assert ops

See Asserts and boolean checks in the docs!

tf.assert_negative(x, data=None, summarize=None, name=None)
tf.assert_positive(x, data=None, summarize=None, name=None)
tf.assert_proper_iterable(values)
tf.assert_non_negative(x, data=None, summarize=None, name=None)
tf.assert_non_positive(x, data=None, summarize=None, name=None)
tf.assert_equal(x, y, data=None, summarize=None, name=None)
tf.assert_integer(x, data=None, summarize=None, name=None)
tf.assert_less(x, y, data=None, summarize=None, name=None)
tf.assert_less_equal(x, y, data=None, summarize=None, name=None)
tf.assert_rank(x, rank, data=None, summarize=None, name=None)
tf.assert_rank_at_least(x, rank, data=None, summarize=None, name=None)
tf.assert_type(tensor, tf_type)
tf.is_non_decreasing(x, name=None)
tf.is_numeric_tensor(tensor)
tf.is_strictly_increasing(x, name=None)

They are useful whenever we need runtime assertions during the computation.
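
For example, a minimal sketch with one of these checks (the surrounding names are illustrative):

# runtime check: fc1 must be non-negative (e.g. after a ReLU)
assert_op = tf.assert_non_negative(fc1, name='fc1_non_negative')
with tf.control_dependencies([assert_op]):
    fc1 = tf.identity(fc1)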

(4) Step-by-step Debugger

Python already has powerful debugging utilities:

  • pdb (built into the standard library), ipdb, and pudb

which are all interactive debuggers (like gdb for C/C++) that can

  • set breakpoints
  • pause and continue execution
  • inspect the stack trace upon an exception
  • watch variables and evaluate expressions interactively

Debugger: Usage

Insert set_trace() for a breakpoint

def multilayer_perceptron(x):
    fc1 = layers.fully_connected(x, 256, activation_fn=tf.nn.relu, scope='fc1')
    fc2 = layers.fully_connected(fc1, 256, activation_fn=tf.nn.relu, scope='fc2')
    out = layers.fully_connected(fc2, 10, activation_fn=None, scope='out')
    import ipdb; ipdb.set_trace()  # XXX BREAKPOINT
    return out

(yeah, it is a breakpoint on model building)

Debugger: Usage

Debug breakpoints can be conditional:

for i in range(batch_steps):
    batch_x, batch_y = mnist.train.next_batch(batch_size)
    if (np.argmax(batch_y, axis=1)[:7] == [4, 9, 6, 2, 9, 6, 5]).all():
        import pudb; pudb.set_trace()  # XXX BREAKPOINT
    _, c = session.run([train_op, loss],
                       feed_dict={x: batch_x, y: batch_y})

Live Demo and Case Example!! (20-mnist-pdb.py)

  • Let's break on training loop if some condition is met
  • Get the fc2 tensor, and fetch its evaluation result (given x)
    • Session.run() can be invoked and executed anywhere, even in the debugger
  • Wow.... gonna love it..

Hint: Some Useful TensorFlow APIs

To get any operations or tensors that might not be stored explicitly:

  • tf.get_default_graph(): Get the current (default) graph
  • G.get_operations(): List all the TF ops in the graph
  • G.get_operation_by_name(name): Retrieve a specific TF op
    (Q. How to convert an operation to a tensor? See the sketch after this list.)
  • G.get_tensor_by_name(name): Retrieve a specific tensor
  • tf.get_collection(tf.GraphKeys.~~): Get the collection of some tensors

To get variables:

  • tf.get_variable_scope(): Get the current variable scope
  • tf.get_variable(): Get a variable (see Sharing Variables)
  • tf.trainable_variables(): List all the (trainable) variables
  • [v for v in tf.all_variables() if v.name == 'fc2/weights:0'][0]
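
A quick sketch putting these together (the names 'fc2/Relu' and 'fc2/weights:0' are illustrative); note that an operation's output tensors are available via its outputs attribute, which answers the question above:

G = tf.get_default_graph()
op = G.get_operation_by_name('fc2/Relu')   # a tf.Operation
t = op.outputs[0]                          # its first output tensor
t2 = G.get_tensor_by_name('fc2/Relu:0')    # the same tensor, fetched by name
fc2_w = [v for v in tf.trainable_variables()
         if v.name == 'fc2/weights:0'][0]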

ipdb/pudb vs. IPython.embed()

ipdb/pudb:

import pudb; pudb.set_trace()

  • These are debuggers: we can set breakpoints, see stack traces, watch expressions, ... (much more general)

embed:

from IPython import embed; embed()

  • Opens an IPython shell in the current context; mostly useful for watching expressions only

(5) Debugging 'inside' the computation graph

Our debugging tools so far can be used for debugging outside Session.run().

Question: How can we run a 'custom' operation? (e.g. custom layer)

  • TensorFlow allows us to write a custom operation in C++ !

  • The 'custom' operation can be designed for logging or debugging purposes (like PrintOp)

  • ... but very burdensome (need to compile, define op interface, and use it ...)

(5) Interpose any python code in the computation graph

We can also embed and interpose a python function in the graph: tf.py_func() comes to the rescue!

tf.py_func(func, inp, Tout, stateful=True, name=None)

  • Wraps a python function and uses it as a TensorFlow op.
  • Given a python function func, which takes numpy arrays as its inputs and returns numpy arrays as its outputs, the function is wrapped as an operation:

def my_func(x):
    # x will be a numpy array with the contents of the placeholder below
    return np.sinh(x)

inp = tf.placeholder(tf.float32, [...])
y = tf.py_func(my_func, [inp], [tf.float32])

In other words, we are now able to use the following (hacky) tricks by intercepting the computation being executed on the graph:

  • Print any intermediate values (e.g. layer activations) in a custom form without fetching them
  • Use python debugger (e.g. trace and breakpoint)
  • Draw a graph or plot using matplotlib, or save images into file

Warning: Some limitations exist, e.g. thread-safety issues, and the wrapped function is not allowed to manipulate the state of the session.

Case Example (i): Print

An ugly example ...

def multilayer_perceptron(x):
    fc1 = layers.fully_connected(x, 256, activation_fn=tf.nn.relu, scope='fc1')
    fc2 = layers.fully_connected(fc1, 256, activation_fn=tf.nn.relu, scope='fc2')
    out = layers.fully_connected(fc2, 10, activation_fn=None, scope='out')

    def _debug_print_func(fc1_val, fc2_val):
        print 'FC1 : {}, FC2 : {}'.format(fc1_val.shape, fc2_val.shape)
        print 'min, max of FC2 = {}, {}'.format(fc2_val.min(), fc2_val.max())
        return False

    debug_print_op = tf.py_func(_debug_print_func, [fc1, fc2], [tf.bool])
    with tf.control_dependencies(debug_print_op):
        out = tf.identity(out, name='out')
    return out

Case Example (ii): Breakpoints!

An ugly example to attach breakpoints ...

def multilayer_perceptron(x):
    fc1 = layers.fully_connected(x, 256, activation_fn=tf.nn.relu, scope='fc1')
    fc2 = layers.fully_connected(fc1, 256, activation_fn=tf.nn.relu, scope='fc2')
    out = layers.fully_connected(fc2, 10, activation_fn=None, scope='out')

    def _debug_func(x_val, fc1_val, fc2_val, out_val):
        if (out_val == 0.0).any():
            import ipdb; ipdb.set_trace()  # XXX BREAKPOINT
            from IPython import embed; embed()  # XXX DEBUG
        return False

    debug_op = tf.py_func(_debug_func, [x, fc1, fc2, out], [tf.bool])
    with tf.control_dependencies(debug_op):
        out = tf.identity(out, name='out')
    return out

Another one: The tdb library

A third-party TensorFlow debugging tool: https://github.com/ericjang/tdb
(not actively maintained and looks clunky, but still good for prototyping)

(6) tfdbg: The official TensorFlow debugger

Recent versions of TensorFlow include an official debugger (tfdbg). It is still experimental, but works quite well!

Check out the HOW-TOs and Examples on tfdbg!!!

from tensorflow.python import debug as tf_debug

sess = tf.Session()
# create a debug wrapper session
sess = tf_debug.LocalCLIDebugWrapperSession(sess)
# add a tensor filter (similar to a conditional breakpoint)
sess.add_tensor_filter("has_inf_or_nan", tf_debug.has_inf_or_nan)

# each session.run() call will be intercepted by the debugger,
# and we can inspect the value of tensors via the debugger interface
sess.run(loss, feed_dict={x: ...})
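
Once wrapped, the sess.run() call above drops into the tfdbg CLI; from there, run -f has_inf_or_nan keeps executing run() calls until the registered filter fires (see the command reference below).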

tfdbg: Features and Quick References

Conceptually, a wrapper session (currently, a CLI debugger session) is employed; it can intercept each call of session.run()

  • run / r : Execute the run() call with debug tensor-watching
  • run -n / r -n : Execute the run() call without debug tensor-watching
  • run -f <filter_name> : Keep executing run() calls until a dumped tensor passes a registered filter (conditional breakpoint)
    • e.g. has_inf_or_nan

tfdbg: Tensor Filters

Registering tensor filters:

sess = tf_debug.LocalCLIDebugWrapperSession(sess)
sess.add_tensor_filter("has_inf_or_nan", tf_debug.has_inf_or_nan)

Tensor filters are just python functions (datum, tensor) -> bool:

def has_inf_or_nan(datum, tensor):
    _ = datum  # datum metadata is unused in this predicate
    if tensor is None:
        # an uninitialized tensor doesn't have bad numerical values
        return False
    elif (np.issubdtype(tensor.dtype, np.float) or
          np.issubdtype(tensor.dtype, np.complex) or
          np.issubdtype(tensor.dtype, np.integer)):
        return np.any(np.isnan(tensor)) or np.any(np.isinf(tensor))
    else:
        return False

Running tensor filters is, therefore, quite slow.

tfdbg: Tensor Fetching

In a tensor dump mode (the run-end UI), the debugger shows the list of tensors dumped in the session.run() call:

tfdbg: Tensor Fetching

Commands:

  • list_tensors (lt) : Show the list of dumped tensor(s).
  • print_tensor (pt) : Print the value of a dumped tensor.
  • node_info (ni) : Show information about a node
    • ni -t : Shows the traceback of tensor creation
  • list_inputs (li) : Show inputs to a node
  • list_outputs (lo) : Show outputs of a node
  • run_info (ri) : Show the information of current run
    (e.g. what to fetch, what feed_dict is)
  • invoke_stepper (s) : Invoke the stepper!
  • run (r) : Move to the next run

tfdbg: Tensor Fetching

Example: print_tensor fc2/Relu:0

  • Slicing: pt fc2/Relu:0[0:10]
  • Dumping: pt fc2/Relu:0 > /tmp/debug/fc2.txt

See also: tfdbg CLI Frequently-Used Commands

tfdbg: Stepper

Shows the tensor value(s) in a topologically-sorted order for the run.

tfdbg: Screencast and Demo!

From Google Brain Team

See also: Debug TensorFlow Models with tfdbg (@Google Developers Blog)

tfdbg: Other Remarks

  • Currently it is actively being developed (still experimental)
  • In the near future, a web-based interactive debugger (integrated with TensorBoard) will be released!

Debugging: Summary

  • Session.run(): Explicitly fetch, and print
  • Tensorboard: Histogram and Image Summary
  • tf.Print(), tf.Assert() operation
  • Use python debugger (ipdb, pudb)
  • Interpose your debugging python code in the graph
  • The TensorFlow debugger: tfdbg

There is no silver bullet; one might need to choose the most convenient and suitable debugging tool, depending on the case

Other General Tips

(in a Programmer's Perspective)

General Tips of Debugging

  • Learn to use debugging tools, but do not solely rely on them when a problem occurs.
  • Sometimes, just sitting down and reading through