RNN Examples
tf.SequenceExample
tf.SequenceExample is a convenient way to handle sequence data. It consists of two parts:
- context: non-sequential data
- feature_lists: the sequence data
The benefits of using tf.SequenceExample are: (1) it works well with distributed training; (2) it makes the data reusable across models; (3) it plays nicely with other built-in TensorFlow functions such as tf.parse_single_sequence_example; (4) it separates data preprocessing from the model.
In practice, you typically preprocess your data into tf.SequenceExample protos, save them to TFRecord files, and then read and parse them with tf.TFRecordReader and tf.parse_single_sequence_example.
Converting raw data to the tf.SequenceExample format
import tensorflow as tf
import numpy as np
import tempfile
sequences = [[1, 2, 3], [4, 5, 1], [1, 2]]
label_sequences = [[0, 1, 0], [1, 0, 0], [1, 1]]
def make_example(sequence, labels):
    # The object we return
    ex = tf.train.SequenceExample()
    # A non-sequential feature of our example
    sequence_length = len(sequence)
    ex.context.feature["length"].int64_list.value.append(sequence_length)
    # Feature lists for the two sequential features of our example
    fl_tokens = ex.feature_lists.feature_list["tokens"]
    fl_labels = ex.feature_lists.feature_list["labels"]
    for token, label in zip(sequence, labels):
        fl_tokens.feature.add().int64_list.value.append(token)
        fl_labels.feature.add().int64_list.value.append(label)
    return ex

# Write all examples into a TFRecords file
with tempfile.NamedTemporaryFile() as fp:
    writer = tf.python_io.TFRecordWriter(fp.name)
    for sequence, label_sequence in zip(sequences, label_sequences):
        ex = make_example(sequence, label_sequence)
        writer.write(ex.SerializeToString())
    writer.close()
    print("Wrote to {}".format(fp.name))
Parsing the data back
tf.reset_default_graph()
# A single serialized example
# (You can read this from a file using TFRecordReader)
ex = make_example([1, 2, 3], [0, 1, 0]).SerializeToString()
# Define how to parse the example
context_features = {
    "length": tf.FixedLenFeature([], dtype=tf.int64)
}
sequence_features = {
    "tokens": tf.FixedLenSequenceFeature([], dtype=tf.int64),
    "labels": tf.FixedLenSequenceFeature([], dtype=tf.int64)
}

# Parse the example (returns a dictionary of tensors)
context_parsed, sequence_parsed = tf.parse_single_sequence_example(
    serialized=ex,
    context_features=context_features,
    sequence_features=sequence_features
)
context = tf.contrib.learn.run_n(context_parsed, n=1, feed_dict=None)
print(context[0])
sequence = tf.contrib.learn.run_n(sequence_parsed, n=1, feed_dict=None)
print(sequence[0])
{'length': 3}
{'tokens': array([1, 2, 3]), 'labels': array([0, 1, 0])}
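The snippet above parses a single serialized example held in memory. When reading back the TFRecord file written earlier, a typical queue-based input pipeline of this TensorFlow era would look roughly like the sketch below (the file name and queue setup are assumptions, not part of the original example):

# Sketch of a file-based reading pipeline (file name is assumed)
filename_queue = tf.train.string_input_producer(["sequences.tfrecords"])
reader = tf.TFRecordReader()
_, serialized_example = reader.read(filename_queue)
context_parsed, sequence_parsed = tf.parse_single_sequence_example(
    serialized=serialized_example,
    context_features=context_features,
    sequence_features=sequence_features
)
# context_parsed["length"], sequence_parsed["tokens"] and sequence_parsed["labels"]
# can then be batched, e.g. with tf.train.batch (see the next section).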
Batching and padding
TensorFlow's RNN functions expect tensors of shape [B, T, ...], where B is the batch_size and T is the number of time steps in each input. In practice, not all sequences have the same length, so the shorter ones need to be padded.
Fortunately, TensorFlow has built-in support for batch padding. If you set dynamic_pad=True when calling tf.train.batch, the returned batch is automatically padded with 0s. Be careful, though: if your classification problem uses 0 as a class ID, the padding becomes indistinguishable from real labels, so it is best to avoid 0 and start the class IDs at 1.
# Example with tf.train.batch dynamic padding
# ==================================================
import tensorflow as tf
import numpy as np
tf.reset_default_graph()
# Create a tensor [1, 2, 3, ..., 9]
x = tf.range(1, 10, name="x")
# A queue that outputs 0, 1, 2, 3, ...
range_q = tf.train.range_input_producer(limit=5, shuffle=False)
slice_end = range_q.dequeue()
# Slice x to variable length, i.e. [], [1], [1, 2], [1, 2, 3], ...
y = tf.slice(x, [0], [slice_end], name="y")
print(y)

# Batch the variable-length tensor with dynamic padding
batched_data = tf.train.batch(
    tensors=[y],
    batch_size=5,
    dynamic_pad=True,
    name="y_batch"
)
print(batched_data)
# Run the graph
# tf.contrib.learn takes care of starting the queues for us
res = tf.contrib.learn.run_n({"y": batched_data}, n=1, feed_dict=None)
# Print the result
print("Batch shape: {}".format(res[0]["y"].shape))
print(res[0]["y"])
Tensor("y:0", shape=(?,), dtype=int32)
Tensor("y_batch:0", shape=(5, ?), dtype=int32)
Batch shape: (5, 4)
[[0 0 0 0]
[1 0 0 0]
[1 2 0 0]
[1 2 3 0]
[1 2 3 4]]
Of course, you can also use a tf.PaddingFIFOQueue to do the padding.
# Example with PaddingFIFOQueue
# ==================================================
import tensorflow as tf
import numpy as np
tf.reset_default_graph()
# Create a tensor [1, 2, 3, ..., 9]
x = tf.range(1, 10, name="x")
# A queue that outputs 0, 1, 2, 3, ...
range_q = tf.train.range_input_producer(limit=5, shuffle=False)
slice_end = range_q.dequeue()
# Slice x to variable length, i.e. [], [1], [1, 2], [1, 2, 3], ...
y = tf.slice(x, [0], [slice_end], name="y")
# Create a queue that pads enqueued elements to the batch's maximum length
padding_q = tf.PaddingFIFOQueue(
    capacity=10,
    dtypes=tf.int32,
    shapes=[[None]])
# Enqueue the examples
enqueue_op = padding_q.enqueue([y])
# Add the queue runner to the graph
qr = tf.train.QueueRunner(padding_q, [enqueue_op])
tf.train.add_queue_runner(qr)
# Dequeue padded data
batched_data = padding_q.dequeue_many(5)
print(batched_data)
# Run the graph
# tf.contrib.learn takes care of starting the queues for us
res = tf.contrib.learn.run_n({"y": batched_data}, n=1, feed_dict=None)
# Print the result
print("Batch shape: {}".format(res[0]["y"].shape))
print(res[0]["y"])
Tensor("padding_fifo_queue_DequeueMany:0", shape=(5, ?), dtype=int32)
Batch shape: (5, 4)
[[0 0 0 0]
[1 0 0 0]
[1 2 0 0]
[1 2 3 0]
[1 2 3 4]]
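To see why the earlier warning about class ID 0 matters: padded positions come back as 0, so a label-based mask (like the one used in the loss section below) cannot tell a genuine class-0 label apart from padding. A minimal sketch of the issue (the label values here are made up for illustration):

# One padded example: real labels [2, 0, 5] followed by one padded step.
# If 0 were a legitimate class ID, the mask would wrongly drop the real label too.
labels = tf.constant([2, 0, 5, 0])
mask = tf.sign(tf.to_float(labels))  # -> [1., 0., 1., 0.]
# Reserving 0 for padding (classes start at 1) makes tf.sign a valid padding mask.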
tf.nn.rnn and tf.nn.dynamic_rnn
Internally, tf.nn.rnn creates an unrolled graph for a fixed RNN length. That means, if you call tf.nn.rnn with inputs having 200 time steps you are creating a static graph with 200 RNN steps. First, graph creation is slow. Second, you’re unable to pass in longer sequences (> 200) than you’ve originally specified.
tf.nn.dynamic_rnn solves this. It uses a tf.while_loop to dynamically construct the graph when it is executed, which means graph creation is faster and you can feed batches of variable size. What about performance? You might think the static RNN is faster than its dynamic counterpart because it pre-builds the graph; in my experience that's not the case.
In short, just use tf.nn.dynamic_rnn. There is no benefit to tf.nn.rnn, and I wouldn't be surprised if it were deprecated in the future.
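For reference, the two APIs also take their inputs in different shapes; here is a rough sketch of both calls under TF 0.x-era signatures (all names and sizes below are illustrative, not from the original post):

import tensorflow as tf
import numpy as np

batch_size, max_time, input_size = 2, 10, 8
inputs = tf.constant(np.random.randn(batch_size, max_time, input_size), dtype=tf.float32)
cell = tf.nn.rnn_cell.BasicLSTMCell(num_units=64, state_is_tuple=True)

# Static unrolling: tf.nn.rnn wants a length-T Python list of
# [batch_size, input_size] tensors and builds max_time cell copies into the graph.
inputs_as_list = [tf.squeeze(t, [1]) for t in tf.split(1, max_time, inputs)]
with tf.variable_scope("static"):
    outputs_static, state_static = tf.nn.rnn(cell, inputs_as_list, dtype=tf.float32)

# Dynamic unrolling: tf.nn.dynamic_rnn takes the [batch_size, max_time, input_size]
# tensor directly and unrolls with a while loop at run time.
with tf.variable_scope("dynamic"):
    outputs_dynamic, state_dynamic = tf.nn.dynamic_rnn(cell, inputs, dtype=tf.float32)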
Passing sequence_length to the RNN
If you don't pass sequence_length, you will get incorrect results! Suppose, for example, that a sequence of true length 13 is padded out to 20 time steps. Without sequence_length, TensorFlow keeps calculating the state until T=20 instead of simply copying the state from T=13 onward. That means the state is computed from the padded elements as well, which is not what you want.
import tensorflow as tf
import numpy as np
tf.reset_default_graph()
# Create input data
X = np.random.randn(2, 10, 8)
# The second example is of length 6
X[1,6:] = 0
X_lengths = [10, 6]
cell = tf.nn.rnn_cell.LSTMCell(num_units=64, state_is_tuple=True)
outputs, last_states = tf.nn.dynamic_rnn(
    cell=cell,
    dtype=tf.float64,
    sequence_length=X_lengths,
    inputs=X)

result = tf.contrib.learn.run_n(
    {"outputs": outputs, "last_states": last_states},
    n=1,
    feed_dict=None)

assert result[0]["outputs"].shape == (2, 10, 64)
print(result[0]["outputs"])
# Outputs for the second example past length 6 should be 0
assert (result[0]["outputs"][1,7,:] == np.zeros(cell.output_size)).all()
Bidirectional RNN
import tensorflow as tf
import numpy as np
tf.reset_default_graph()
# Create input data
X = np.random.randn(2, 10, 8)
# The second example is of length 6
X[1,6:] = 0
X_lengths = [10, 6]
cell = tf.nn.rnn_cell.LSTMCell(num_units=64, state_is_tuple=True)
outputs, states = tf.nn.bidirectional_dynamic_rnn(
    cell_fw=cell,
    cell_bw=cell,
    dtype=tf.float64,
    sequence_length=X_lengths,
    inputs=X)

output_fw, output_bw = outputs
states_fw, states_bw = states

result = tf.contrib.learn.run_n(
    {"output_fw": output_fw,
     "output_bw": output_bw,
     "states_fw": states_fw,
     "states_bw": states_bw},
    n=1,
    feed_dict=None)
print(result[0]["output_fw"].shape)
print(result[0]["output_bw"].shape)
print(result[0]["states_fw"].h.shape)
print(result[0]["states_bw"].h.shape)
RNN cells, wrappers and multi-layer RNNs
As of the time of this writing, the basic RNN cells and wrappers are:
- BasicRNNCell – A vanilla RNN cell.
- GRUCell – A Gated Recurrent Unit cell.
- BasicLSTMCell – An LSTM cell based on Recurrent Neural Network Regularization. No peephole connections or cell clipping.
- LSTMCell – A more complex LSTM cell that allows for optional peephole connections and cell clipping.
- MultiRNNCell – A wrapper to combine multiple cells into a multi-layer cell.
- DropoutWrapper – A wrapper to add dropout to the input and/or output connections of a cell.
and the contributed RNN cells and wrappers:
- CoupledInputForgetGateLSTMCell – An extended LSTMCell with coupled input and forget gates, based on LSTM: A Search Space Odyssey.
- TimeFreqLSTMCell – A time-frequency LSTM cell based on Modeling Time-Frequency Patterns with LSTM vs. Convolutional Architectures for LVCSR Tasks.
- GridLSTMCell – The cell from Grid Long Short-Term Memory.
- AttentionCellWrapper – Adds attention to an existing RNN cell, based on Long Short-Term Memory-Networks for Machine Reading.
- LSTMBlockCell – A faster version of the basic LSTM cell (note: this one lives in lstm_ops.py).
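All of these implement the same RNNCell interface, so they can be swapped into tf.nn.dynamic_rnn interchangeably. A quick sketch of constructing a few of the core cells (the parameter values are arbitrary):

cell_rnn = tf.nn.rnn_cell.BasicRNNCell(num_units=64)
cell_gru = tf.nn.rnn_cell.GRUCell(num_units=64)
# LSTMCell with its optional peephole connections and cell clipping enabled
cell_lstm = tf.nn.rnn_cell.LSTMCell(
    num_units=64, use_peepholes=True, cell_clip=10.0, state_is_tuple=True)
# Any of these can be passed as cell= to tf.nn.dynamic_rnn, as in the example below.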
import tensorflow as tf
import numpy as np
tf.reset_default_graph()
# Create input data
X = np.random.randn(2, 10, 8)
# The second example is of length 6
X[1,6:] = 0
X_lengths = [10, 6]
cell = tf.nn.rnn_cell.LSTMCell(num_units=64, state_is_tuple=True)
cell = tf.nn.rnn_cell.DropoutWrapper(cell=cell, output_keep_prob=0.5)
cell = tf.nn.rnn_cell.MultiRNNCell(cells=[cell] * 4, state_is_tuple=True)
outputs, last_states = tf.nn.dynamic_rnn(
    cell=cell,
    dtype=tf.float64,
    sequence_length=X_lengths,
    inputs=X)

result = tf.contrib.learn.run_n(
    {"outputs": outputs, "last_states": last_states},
    n=1,
    feed_dict=None)

print(result[0]["outputs"].shape)
print(result[0]["outputs"])
assert result[0]["outputs"].shape == (2, 10, 64)
# Outputs for the second example past length 6 should be 0
assert (result[0]["outputs"][1,7,:] == np.zeros(cell.output_size)).all()
print(result[0]["last_states"][0].h.shape)
print(result[0]["last_states"][0].h)
Loss calculation
import tensorflow as tf
import numpy as np
tf.reset_default_graph()
tf.set_random_seed(10)
np.random.seed(10)
# Batch size
B = 4
# (Maximum) number of time steps in this batch
T = 8
RNN_DIM = 128
NUM_CLASSES = 10
# The *actual* lengths of the examples
example_len = [1, 2, 3, 8]

# The classes of the examples at each step (between 1 and 9, 0 means padding)
y = np.random.randint(1, 10, [B, T])
for i, length in enumerate(example_len):
    y[i, length:] = 0
# The RNN outputs
rnn_outputs = tf.convert_to_tensor(np.random.randn(B, T, RNN_DIM), dtype=tf.float32)
# Output layer weights
W = tf.get_variable(
    name="W",
    initializer=tf.random_normal_initializer(),
    shape=[RNN_DIM, NUM_CLASSES])

# Calculate logits and probs
# Reshape so we can calculate them all at once
rnn_outputs_flat = tf.reshape(rnn_outputs, [-1, RNN_DIM])
logits_flat = tf.matmul(rnn_outputs_flat, W)
probs_flat = tf.nn.softmax(logits_flat)
# Calculate the losses
y_flat = tf.reshape(y, [-1])
losses = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits_flat, labels=y_flat)
# Mask the losses
mask = tf.sign(tf.to_float(y_flat))
masked_losses = mask * losses
# Bring back to [B, T] shape
masked_losses = tf.reshape(masked_losses, tf.shape(y))
# Calculate mean loss
mean_loss_by_example = tf.reduce_sum(masked_losses, reduction_indices=1) / example_len
mean_loss = tf.reduce_mean(mean_loss_by_example)
result = tf.contrib.learn.run_n(
    {
        "masked_losses": masked_losses,
        "mean_loss_by_example": mean_loss_by_example,
        "mean_loss": mean_loss
    },
    n=1,
    feed_dict=None)
print(result[0]["masked_losses"])
print(result[0]["mean_loss_by_example"])
print(result[0]["mean_loss"])
References
This post is a translation of http://www.wildml.com/2016/08/rnns-in-tensorflow-a-practical-guide-and-undocumented-features/.