1.52 How to Use TensorFlow Debugger (tfdbg) with tf.contrib.learn

小牛编辑
2023-12-01


In @{$debugger$a previous tutorial}, we described how to use TensorFlow Debugger (tfdbg) to debug TensorFlow graphs running in @{tf.Session} objects that you manage yourself. However, many users find @{$tflearn$tf.contrib.learn} @{tf.contrib.learn.Estimator$Estimator}s to be a convenient higher-level API for creating and using models in TensorFlow. Part of that convenience is that Estimators manage Sessions internally. Fortunately, you can still use tfdbg with Estimators by adding special hooks.

Debugging tf.contrib.learn Estimators

Currently, tfdbg can debug the @{tf.contrib.learn.BaseEstimator.fit$fit()} and @{tf.contrib.learn.BaseEstimator.evaluate$evaluate()} methods of tf-learn Estimators. To debug Estimator.fit(), create a LocalCLIDebugHook and pass it in a list as the monitors argument. For example:

# First, let your BUILD target depend on "//tensorflow/python/debug:debug_py"
# (You don't need to worry about the BUILD dependency if you are using a pip
#  install of open-source TensorFlow.)
from tensorflow.python import debug as tf_debug

hooks = [tf_debug.LocalCLIDebugHook()]

# Create a local CLI debug hook and use it as a monitor when calling fit().
classifier.fit(x=training_set.data,
               y=training_set.target,
               steps=1000,
               monitors=hooks)

To debug Estimator.evaluate(), pass the same list as the hooks argument instead:

accuracy_score = classifier.evaluate(x=test_set.data,
                                     y=test_set.target,
                                     hooks=hooks)["accuracy"]

For a detailed example based on @{$tflearn$tf-learn's iris tutorial}, run:

python -m tensorflow.python.debug.examples.debug_tflearn_iris --debug
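
The --debug flag in the command above suggests a common pattern: attach the debug hook only when it is asked for, so that normal runs are unaffected. The following is a minimal sketch of that pattern, not the code of the debug_tflearn_iris example itself. The names build_debug_hooks, hook_factory, and parse_args are hypothetical and introduced only for illustration; the only tfdbg name used is LocalCLIDebugHook, as shown earlier.

```python
import argparse


def build_debug_hooks(debug, hook_factory=None):
    """Return a one-element hook list when debug is true, otherwise None.

    build_debug_hooks and hook_factory are hypothetical names used for
    illustration; they are not part of tfdbg.
    """
    if not debug:
        return None
    if hook_factory is None:
        # Imported lazily so that runs without --debug never load tfdbg.
        from tensorflow.python import debug as tf_debug
        hook_factory = tf_debug.LocalCLIDebugHook
    return [hook_factory()]


def parse_args(argv=None):
    """Parse a single --debug switch from the command line."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--debug", action="store_true",
                        help="Attach the tfdbg CLI hook to fit()/evaluate().")
    return parser.parse_args(argv)
```

With this in place, a script could compute hooks = build_debug_hooks(parse_args().debug) once and pass the result as monitors=hooks to fit() and as hooks=hooks to evaluate(). This assumes that None is an acceptable value for both arguments when debugging is off.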

Debugging tf.contrib.learn Experiments

Experiment is a construct in tf.contrib.learn at a higher level than Estimator. It provides a single interface for training and evaluating a model. To debug the train() and evaluate() calls to an Experiment object, you can use the keyword arguments train_monitors and eval_hooks, respectively, when calling its constructor. For example:

# First, let your BUILD target depend on "//tensorflow/python/debug:debug_py"
# (You don't need to worry about the BUILD dependency if you are using a pip
#  install of open-source TensorFlow.)
from tensorflow.python import debug as tf_debug

hooks = [tf_debug.LocalCLIDebugHook()]

ex = experiment.Experiment(classifier,
                           train_input_fn=iris_input_fn,
                           eval_input_fn=iris_input_fn,
                           train_steps=FLAGS.train_steps,
                           eval_delay_secs=0,
                           eval_steps=1,
                           train_monitors=hooks,
                           eval_hooks=hooks)

ex.train()
accuracy_score = ex.evaluate()["accuracy"]

To run the debug_tflearn_iris example in Experiment mode, use:

python -m tensorflow.python.debug.examples.debug_tflearn_iris \
    --use_experiment --debug

Debugging Estimators and Experiments without Terminal Access

If your Estimator or Experiment is running in an environment to which you do not have command-line access (e.g., a remote server), you can use the non-interactive DumpingDebugHook. For example:

# Let your BUILD target depend on "//tensorflow/python/debug:debug_py"
# (You don't need to worry about the BUILD dependency if you are using a pip
#  install of open-source TensorFlow.)
from tensorflow.python import debug as tf_debug

hooks = [tf_debug.DumpingDebugHook("/shared/storage/location/tfdbg_dumps_1")]

This hook can then be used in the same way as the LocalCLIDebugHook examples above. As the training and/or evaluation of the Estimator or Experiment proceeds, directories with the naming pattern /shared/storage/location/tfdbg_dumps_1/run_<epoch_timestamp_microsec>_<uuid> will appear. Each directory corresponds to one Session.run() call underlying the fit() or evaluate() call. You can load these directories and inspect them offline in a command-line interface using the offline_analyzer provided by tfdbg. For example:

python -m tensorflow.python.debug.cli.offline_analyzer \
    --dump_dir="/shared/storage/location/tfdbg_dumps_1/run_<epoch_timestamp_microsec>_<uuid>"
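
Because every Session.run() call produces its own run directory, a long training job can leave many of them under the dump root. The sketch below shows one way to list them in chronological order and build the matching offline_analyzer command for each. It uses only the Python standard library and relies solely on the run_<epoch_timestamp_microsec>_<uuid> naming pattern described above. The helper names list_run_dirs and analyzer_command are hypothetical, and the loose match on the uuid part is an assumption.

```python
import os
import re

# Assumed from the naming pattern run_<epoch_timestamp_microsec>_<uuid>.
# The uuid part is matched loosely as "anything after the second underscore".
_RUN_DIR_RE = re.compile(r"^run_(\d+)_(.+)$")


def list_run_dirs(dump_root):
    """Return (timestamp_usec, path) pairs for run directories, oldest first."""
    runs = []
    for name in os.listdir(dump_root):
        match = _RUN_DIR_RE.match(name)
        path = os.path.join(dump_root, name)
        # Skip plain files and directories that do not follow the pattern.
        if match and os.path.isdir(path):
            runs.append((int(match.group(1)), path))
    return sorted(runs)


def analyzer_command(run_dir):
    """Build the offline_analyzer command line for one run directory."""
    return ["python", "-m", "tensorflow.python.debug.cli.offline_analyzer",
            "--dump_dir=" + run_dir]
```

For example, list_run_dirs("/shared/storage/location/tfdbg_dumps_1") would return the run directories sorted by timestamp, and the list returned by analyzer_command() for any of them can be passed to subprocess.call() to open that run in the offline analyzer.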

The LocalCLIDebugHook also allows you to configure a watch_fn that can be used to flexibly specify which Tensors to watch on different Session.run() calls, as a function of the fetches, the feed_dict, and other state. See @{tfdbg.DumpingDebugWrapperSession.__init__$this API doc} for more details.