TensorFlow: Upstream changes to git.
Changes:

- Update a lot of documentation, installation instructions, requirements, etc.
- Add RNN models directory for recurrent neural network examples to go along with the tutorials.

Base CL: 107290480
This commit is contained in:
parent cd9e60c1cd
commit 8bd3b38e66
@@ -15,3 +15,14 @@ Follow either of the two links above to access the appropriate CLA and instructi

***NOTE***: Only original source code from you and other people that have signed the CLA can be accepted into the main repository.

## Contributing code

We currently use Gerrit to host and handle code changes to TensorFlow. The main
site is
[https://tensorflow-review.googlesource.com/](https://tensorflow-review.googlesource.com/).
See Gerrit [docs](https://gerrit-review.googlesource.com/Documentation/) for
information on how Gerrit's code review system works.

We are currently working on improving our external acceptance process, so
please be patient with us as we work out the details.
@@ -14,7 +14,7 @@ variety of other domains, as well.

# Download and Setup

For detailed installation instructions, see
-[here](g3doc/get_started/os_setup.md).
+[here](tensorflow/g3doc/get_started/os_setup.md).

## Binary Installation
@@ -27,7 +27,7 @@

* [class tf.RegisterShape](#RegisterShape)
* [class tf.TensorShape](#TensorShape)
* [class tf.Dimension](#Dimension)
-* [tf.op_scope(*args, **kwds)](#op_scope)
+* [tf.op_scope(values, name, default_name)](#op_scope)
* [tf.get_seed(op_seed)](#get_seed)
@@ -235,7 +235,7 @@ def my_func(pred, tensor):

- - -

-#### tf.Graph.device(*args, **kwds) {#Graph.device}
+#### tf.Graph.device(device_name_or_function) {#Graph.device}

Returns a context manager that specifies the default device to use.
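As an illustration of the documented signature, a minimal sketch (the graph `g` is hypothetical, and a GPU-enabled build is assumed):

```python
import tensorflow as tf

g = tf.Graph()
with g.as_default():
  # Ops created inside this block default to the first GPU, if one exists.
  with g.device("/gpu:0"):
    a = tf.constant([1.0, 2.0], name="a")
```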
@@ -287,7 +287,7 @@ with g.device(matmul_on_gpu):

- - -

-#### tf.Graph.name_scope(*args, **kwds) {#Graph.name_scope}
+#### tf.Graph.name_scope(name) {#Graph.name_scope}

Returns a context manager that creates hierarchical names for operations.
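A minimal sketch of the documented behavior (the scope and op names are illustrative):

```python
import tensorflow as tf

with tf.Graph().as_default() as g:
  with g.name_scope("layer1"):
    # Ops created here get names prefixed with the scope name.
    c = tf.constant(5.0, name="c")  # c.op.name == "layer1/c"
```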
@@ -611,7 +611,7 @@ the default graph.

- - -

-#### tf.Graph.gradient_override_map(*args, **kwds) {#Graph.gradient_override_map}
+#### tf.Graph.gradient_override_map(op_type_map) {#Graph.gradient_override_map}

EXPERIMENTAL: A context manager for overriding gradient functions.
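A sketch of how the override map is used; `CustomSquare` and its gradient function are hypothetical names, not part of the library:

```python
import tensorflow as tf

@tf.RegisterGradient("CustomSquare")
def _custom_square_grad(op, grad):
  # Hypothetical replacement gradient; passes the incoming gradient through.
  return grad

with tf.Graph().as_default() as g:
  x = tf.constant(3.0)
  with g.gradient_override_map({"Square": "CustomSquare"}):
    y = tf.square(x)  # gradients of y now route through _custom_square_grad
```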
@@ -2023,7 +2023,7 @@ The value of this dimension, or None if it is unknown.

- - -

-### tf.op_scope(*args, **kwds) <div class="md-anchor" id="op_scope">{#op_scope}</div>
+### tf.op_scope(values, name, default_name) <div class="md-anchor" id="op_scope">{#op_scope}</div>

Returns a context manager for use when defining a Python op.
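A minimal sketch of the documented signature inside a user-defined op (`my_add` is a hypothetical helper):

```python
import tensorflow as tf

def my_add(a, b, name=None):
  # Groups the ops created below under one name scope.
  with tf.op_scope([a, b], name, "MyAdd") as scope:
    a = tf.convert_to_tensor(a, name="a")
    b = tf.convert_to_tensor(b, name="b")
    return tf.add(a, b, name=scope)
```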
@@ -267,7 +267,6 @@

* [`depthwise_conv2d`](nn.md#depthwise_conv2d)
* [`dropout`](nn.md#dropout)
* [`embedding_lookup`](nn.md#embedding_lookup)
* [`embedding_lookup_sparse`](nn.md#embedding_lookup_sparse)
* [`fixed_unigram_candidate_sampler`](nn.md#fixed_unigram_candidate_sampler)
* [`in_top_k`](nn.md#in_top_k)
* [`l2_loss`](nn.md#l2_loss)
@@ -35,7 +35,6 @@ accepted by [`tf.convert_to_tensor`](framework.md#convert_to_tensor).

* [tf.nn.softmax_cross_entropy_with_logits(logits, labels, name=None)](#softmax_cross_entropy_with_logits)
* [Embeddings](#AUTOGENERATED-embeddings)
* [tf.nn.embedding_lookup(params, ids, name=None)](#embedding_lookup)
* [tf.nn.embedding_lookup_sparse(params, sp_ids, sp_weights, name=None, combiner='mean')](#embedding_lookup_sparse)
* [Evaluation](#AUTOGENERATED-evaluation)
* [tf.nn.top_k(input, k, name=None)](#top_k)
* [tf.nn.in_top_k(predictions, targets, k, name=None)](#in_top_k)
@@ -130,17 +129,18 @@ sum is unchanged.

By default, each element is kept or dropped independently. If `noise_shape`
is specified, it must be
[broadcastable](http://docs.scipy.org/doc/numpy/user/basics.broadcasting.html)
-to the shape of `x`, and only dimensions with `noise_shape[i] == x.shape[i]`
-will make independent decisions. For example, if `x.shape = [b, x, y, c]` and
-`noise_shape = [b, 1, 1, c]`, each batch and channel component will be
+to the shape of `x`, and only dimensions with `noise_shape[i] == shape(x)[i]`
+will make independent decisions. For example, if `shape(x) = [k, l, m, n]`
+and `noise_shape = [k, 1, 1, n]`, each batch and channel component will be
kept independently and each row and column will be kept or not kept together.

##### Args:


* <b>x</b>: A tensor.
-* <b>keep_prob</b>: Float probability that each element is kept.
-* <b>noise_shape</b>: Shape for randomly generated keep/drop flags.
+* <b>keep_prob</b>: A Python float. The probability that each element is kept.
+* <b>noise_shape</b>: A 1-D `Tensor` of type `int32`, representing the
+  shape for randomly generated keep/drop flags.
* <b>seed</b>: A Python integer. Used to create a random seed.
  See [`set_random_seed`](constant_op.md#set_random_seed) for behavior.
* <b>name</b>: A name for this operation (optional).
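A sketch of the `noise_shape` behavior described above (the shapes are illustrative):

```python
import tensorflow as tf

x = tf.ones([4, 10, 10, 3])  # hypothetical [batch, height, width, channels]
# One keep/drop flag per (batch, channel) pair: each feature map is kept or
# dropped as a whole, matching the [k, 1, 1, n] example above.
y = tf.nn.dropout(x, keep_prob=0.5, noise_shape=[4, 1, 1, 3])
```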
@@ -247,10 +247,10 @@ are as follows. If the 4-D `input` has shape
`[batch, in_height, in_width, ...]` and the 4-D `filter` has shape
`[filter_height, filter_width, ...]`, then

-    output.shape = [batch,
-                    (in_height - filter_height + 1) / strides[1],
-                    (in_width - filter_width + 1) / strides[2],
-                    ...]
+    shape(output) = [batch,
+                     (in_height - filter_height + 1) / strides[1],
+                     (in_width - filter_width + 1) / strides[2],
+                     ...]

    output[b, i, j, :] =
        sum_{di, dj} input[b, strides[1] * i + di, strides[2] * j + dj, ...] *
@@ -262,7 +262,7 @@ vectors. For `depthwise_conv_2d`, each scalar component `input[b, i, j, k]`
is multiplied by a vector `filter[di, dj, k]`, and all the vectors are
concatenated.

-In the formula for `output.shape`, the rounding direction depends on padding:
+In the formula for `shape(output)`, the rounding direction depends on padding:

* `padding = 'SAME'`: Round down (only full size windows are considered).
* `padding = 'VALID'`: Round up (partial windows are included).
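The rounding rule as stated can be checked with a small sketch (the sizes are illustrative):

```python
import math

def conv_output_dim(in_size, filter_size, stride, padding):
  # Applies the formula above with the stated rounding rule:
  # 'SAME' rounds down, 'VALID' rounds up.
  raw = float(in_size - filter_size + 1) / stride
  return int(math.floor(raw)) if padding == 'SAME' else int(math.ceil(raw))

print(conv_output_dim(28, 4, 3, 'SAME'))   # floor(25 / 3) == 8
print(conv_output_dim(28, 4, 3, 'VALID'))  # ceil(25 / 3) == 9
```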
@@ -411,7 +411,7 @@ In detail, the output is

for each tuple of indices `i`. The output shape is

-    output.shape = (value.shape - ksize + 1) / strides
+    shape(output) = (shape(value) - ksize + 1) / strides

where the rounding direction depends on padding:
@@ -722,103 +722,43 @@ and the same dtype (either `float32` or `float64`).

## Embeddings <div class="md-anchor" id="AUTOGENERATED-embeddings">{#AUTOGENERATED-embeddings}</div>

-TensorFlow provides several operations that help you compute embeddings.
+TensorFlow provides library support for looking up values in embedding
+tensors.

- - -

### tf.nn.embedding_lookup(params, ids, name=None) <div class="md-anchor" id="embedding_lookup">{#embedding_lookup}</div>

-Return a tensor of embedding values by looking up "ids" in "params".
+Looks up `ids` in a list of embedding tensors.
+
+This function is used to perform parallel lookups on the list of
+tensors in `params`. It is a generalization of
+[`tf.gather()`](array_ops.md#gather), where `params` is interpreted
+as a partition of a larger embedding tensor.
+
+If `len(params) > 1`, each element `id` of `ids` is partitioned between
+the elements of `params` by computing `p = id % len(params)`, and is
+then used to look up the slice `params[p][id // len(params), ...]`.
+
+The results of the lookup are then concatenated into a dense
+tensor. The returned tensor has shape `shape(ids) + shape(params)[1:]`.

##### Args:


-* <b>params</b>: List of tensors of the same shape. A single tensor is
-  treated as a singleton list.
-* <b>ids</b>: Tensor of integers containing the ids to be looked up in
-  'params'. Let P be len(params). If P > 1, then the ids are
-  partitioned by id % P, and we do separate lookups in params[p]
-  for 0 <= p < P, and then stitch the results back together into
-  a single result tensor.
-* <b>name</b>: Optional name for the op.
+* <b>params</b>: A list of tensors with the same shape and type.
+* <b>ids</b>: A `Tensor` with type `int32` containing the ids to be looked
+  up in `params`.
+* <b>name</b>: A name for the operation (optional).

##### Returns:

-  A tensor of shape ids.shape + params[0].shape[1:] containing the
-  values params[i % P][i] for each i in ids.
+  A `Tensor` with the same type as the tensors in `params`.

##### Raises:


* <b>ValueError</b>: if some parameters are invalid.
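A minimal sketch of a lookup with a single (unpartitioned) parameter tensor; the values are illustrative:

```python
import numpy as np
import tensorflow as tf

params = tf.constant(np.arange(15.0).reshape(5, 3))  # a 5 x 3 embedding matrix
ids = tf.constant([0, 2, 4])
# shape(ids) + shape(params)[1:] == [3, 3]: rows 0, 2 and 4 of params.
embedded = tf.nn.embedding_lookup(params, ids)
```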
- - -

### tf.nn.embedding_lookup_sparse(params, sp_ids, sp_weights, name=None, combiner='mean') <div class="md-anchor" id="embedding_lookup_sparse">{#embedding_lookup_sparse}</div>

Computes embeddings for the given ids and weights.

This op assumes that there is at least one id for each row in the dense tensor
represented by sp_ids (i.e. there are no rows with empty features), and that
all the indices of sp_ids are in canonical row-major order.

It also assumes that all id values lie in the range [0, p0), where p0
is the sum of the size of params along dimension 0.

##### Args:


* <b>params</b>: A single tensor representing the complete embedding tensor,
  or a list of P tensors all of same shape except for the first dimension,
  representing sharded embedding tensors. In the latter case, the ids are
  partitioned by id % P, and we do separate lookups in params[p] for
  0 <= p < P, and then stitch the results back together into a single
  result tensor. The first dimension is allowed to vary as the vocab
  size is not necessarily a multiple of P.
* <b>sp_ids</b>: N x M SparseTensor of int64 ids (typically from FeatureValueToId),
  where N is typically batch size and M is arbitrary.
* <b>sp_weights</b>: either a SparseTensor of float / double weights, or None to
  indicate all weights should be taken to be 1. If specified, sp_weights
  must have exactly the same shape and indices as sp_ids.
* <b>name</b>: Optional name for the op.
* <b>combiner</b>: A string specifying the reduction op. Currently "mean" and "sum"
  are supported. "sum" computes the weighted sum of the embedding results for
  each row. "mean" is the weighted sum divided by the total weight.

##### Returns:

  A dense tensor representing the combined embeddings for the
  sparse ids. For each row in the dense tensor represented by sp_ids, the op
  looks up the embeddings for all ids in that row, multiplies them by the
  corresponding weight, and combines these embeddings as specified.

  In other words, if

    shape(combined params) = [p0, p1, ..., pm]

  and

    shape(sp_ids) = shape(sp_weights) = [d0, d1, ..., dn]

  then

    shape(output) = [d0, d1, ..., dn-1, p1, ..., pm].

  For instance, if params is a 10x20 matrix, and sp_ids / sp_weights are

    [0, 0]: id 1, weight 2.0
    [0, 1]: id 3, weight 0.5
    [1, 0]: id 0, weight 1.0
    [2, 3]: id 1, weight 3.0

  with combiner="mean", then the output will be a 3x20 matrix where

    output[0, :] = (params[1, :] * 2.0 + params[3, :] * 0.5) / (2.0 + 0.5)
    output[1, :] = params[0, :] * 1.0
    output[2, :] = params[1, :] * 3.0

##### Raises:


* <b>TypeError</b>: If sp_ids is not a SparseTensor, or if sp_weights is neither
  None nor SparseTensor.
* <b>ValueError</b>: If combiner is not one of {"mean", "sum"}.
* <b>ValueError</b>: If `params` is empty.
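A quick numpy check of the first output row in the worked example above (`params` stands in for the hypothetical 10x20 matrix):

```python
import numpy as np

params = np.arange(200.0).reshape(10, 20)
# Row 0 of the output: the weighted mean of embedding rows 1 and 3.
out0 = (params[1] * 2.0 + params[3] * 0.5) / (2.0 + 0.5)
```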
@@ -23,7 +23,7 @@ accepted by [`tf.convert_to_tensor`](framework.md#convert_to_tensor).

* [Sharing Variables](#AUTOGENERATED-sharing-variables)
* [tf.get_variable(name, shape=None, dtype=tf.float32, initializer=None, trainable=True, collections=None)](#get_variable)
* [tf.get_variable_scope()](#get_variable_scope)
-* [tf.variable_scope(*args, **kwds)](#variable_scope)
+* [tf.variable_scope(name_or_scope, reuse=None, initializer=None)](#variable_scope)
* [tf.constant_initializer(value=0.0)](#constant_initializer)
* [tf.random_normal_initializer(mean=0.0, stddev=1.0, seed=None)](#random_normal_initializer)
* [tf.truncated_normal_initializer(mean=0.0, stddev=1.0, seed=None)](#truncated_normal_initializer)
@@ -896,7 +896,7 @@ Returns the current variable scope.

- - -

-### tf.variable_scope(*args, **kwds) <div class="md-anchor" id="variable_scope">{#variable_scope}</div>
+### tf.variable_scope(name_or_scope, reuse=None, initializer=None) <div class="md-anchor" id="variable_scope">{#variable_scope}</div>

Returns a context for variable scope.
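A minimal sketch of the documented parameters (`"foo"` and `"v"` are hypothetical names):

```python
import tensorflow as tf

with tf.variable_scope("foo", initializer=tf.constant_initializer(0.5)):
  v = tf.get_variable("v", [1])    # created as "foo/v"
with tf.variable_scope("foo", reuse=True):
  v1 = tf.get_variable("v", [1])   # returns the same variable as v
```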
@@ -36,7 +36,33 @@ Install TensorFlow (only CPU binary version is currently available).
$ sudo pip install https://storage.googleapis.com/tensorflow/mac/tensorflow-0.5.0-py2-none-any.whl
```

-### Try your first TensorFlow program
+## Docker-based installation

We also support running TensorFlow via [Docker](http://docker.com/), which lets
you avoid worrying about setting up dependencies.

First, [install Docker](http://docs.docker.com/engine/installation/). Once
Docker is up and running, you can start a container with one command:

```sh
$ docker run -it b.gcr.io/tensorflow/tensorflow
```

This will start a container with TensorFlow and all its dependencies already
installed.

### Additional images

The default Docker image above contains just a minimal set of libraries for
getting up and running with TensorFlow. We also have several other containers,
which you can use in the `docker run` command above:

* `b.gcr.io/tensorflow/tensorflow-full`: Contains a complete TensorFlow source
  installation, including all utilities needed to build and run TensorFlow. This
  makes it easy to experiment directly with the source, without needing to
  install any of the dependencies described above.
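For example, to start a container from the full image instead:

```sh
$ docker run -it b.gcr.io/tensorflow/tensorflow-full
```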

## Try your first TensorFlow program

```sh
$ python
@@ -133,6 +159,13 @@ $ sudo apt-get install python-numpy swig python-dev

In order to build TensorFlow with GPU support, both Cuda Toolkit 7.0 and CUDNN
6.5 V2 from NVIDIA need to be installed.

TensorFlow GPU support requires having a GPU card with NVidia Compute Capability >= 3.5. Supported cards include but are not limited to:

* NVidia Titan
* NVidia Titan X
* NVidia K20
* NVidia K40

##### Download and install Cuda Toolkit 7.0

https://developer.nvidia.com/cuda-toolkit-70
@@ -227,7 +260,7 @@ Notes : You need to install

Follow installation instructions [here](http://docs.scipy.org/doc/numpy/user/install.html).


-### Create the pip package and install
+### Create the pip package and install {#create-pip}

```sh
$ bazel build -c opt //tensorflow/tools/pip_package:build_pip_package
@@ -238,7 +271,7 @@ $ bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
$ pip install /tmp/tensorflow_pkg/tensorflow-0.5.0-cp27-none-linux_x86_64.whl
```

-### Train your first TensorFlow neural net model
+## Train your first TensorFlow neural net model

From the root of your source tree, run:
@@ -127,10 +127,9 @@ To do this for the `ZeroOut` op, add the following to `zero_out.cc`:
REGISTER_KERNEL_BUILDER(Name("ZeroOut").Device(DEVICE_CPU), ZeroOutOp);
```

-TODO: instructions or pointer to building TF
-
-At this point, the Tensorflow system can reference and use the Op when
-requested.
+Once you
+[build and reinstall TensorFlow](../../get_started/os_setup.md#create-pip), the
+Tensorflow system can reference and use the Op when requested.

-## Generate the client wrapper <div class="md-anchor" id="AUTOGENERATED-generate-the-client-wrapper">{#AUTOGENERATED-generate-the-client-wrapper}</div>
+### The Python Op wrapper <div class="md-anchor" id="AUTOGENERATED-the-python-op-wrapper">{#AUTOGENERATED-the-python-op-wrapper}</div>
@@ -101,7 +101,7 @@ w_twice = tf.Variable(weights.initialized_value() * 0.2, name="w_twice")

The convenience function `tf.initialize_all_variables()` adds an Op to
initialize *all variables* in the model. You can also pass it an explicit list
of variables to initialize. See the
-[Variables Documentation](../../api_docs/python/state_op.md) for more options,
+[Variables Documentation](../../api_docs/python/state_ops.md) for more options,
including checking if variables are initialized.
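A minimal sketch of the usual pattern:

```python
import tensorflow as tf

init_op = tf.initialize_all_variables()
with tf.Session() as sess:
  sess.run(init_op)  # all variables now hold their initial values
```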

## Saving and Restoring
@@ -6,18 +6,18 @@ answer on one of the TensorFlow [community resources](index.md).

<!-- TOC-BEGIN This section is generated by neural network: DO NOT EDIT! -->
## Contents
* [Building a TensorFlow graph](#AUTOGENERATED-building-a-tensorflow-graph)
* [Running a TensorFlow computation](#AUTOGENERATED-running-a-tensorflow-computation)
* [Variables](#AUTOGENERATED-variables)
* [Tensor shapes](#AUTOGENERATED-tensor-shapes)
* [TensorBoard](#AUTOGENERATED-tensorboard)
* [Extending TensorFlow](#AUTOGENERATED-extending-tensorflow)
* [Miscellaneous](#AUTOGENERATED-miscellaneous)


<!-- TOC-END This section was generated by neural network, THANKS FOR READING! -->

-### Building a TensorFlow graph <div class="md-anchor" id="AUTOGENERATED-building-a-tensorflow-graph">{#AUTOGENERATED-building-a-tensorflow-graph}</div>
+## Building a TensorFlow graph <div class="md-anchor" id="AUTOGENERATED-building-a-tensorflow-graph">{#AUTOGENERATED-building-a-tensorflow-graph}</div>

See also the
[API documentation on building graphs](../api_docs/python/framework.md).
@@ -55,7 +55,7 @@ uses multiple GPUs.
TensorFlow supports a variety of different data types and tensor shapes. See the
[ranks, shapes, and types reference](dims_types.md) for more details.

-### Running a TensorFlow computation <div class="md-anchor" id="AUTOGENERATED-running-a-tensorflow-computation">{#AUTOGENERATED-running-a-tensorflow-computation}</div>
+## Running a TensorFlow computation <div class="md-anchor" id="AUTOGENERATED-running-a-tensorflow-computation">{#AUTOGENERATED-running-a-tensorflow-computation}</div>

See also the
[API documentation on running graphs](../api_docs/python/client.md).
@@ -175,7 +175,7 @@ for
[using `QueueRunner` objects to drive queues and readers](../how_tos/reading_data/index.md#QueueRunners)
for more information on how to use them.

-### Variables <div class="md-anchor" id="AUTOGENERATED-variables">{#AUTOGENERATED-variables}</div>
+## Variables <div class="md-anchor" id="AUTOGENERATED-variables">{#AUTOGENERATED-variables}</div>

See also the how-to documentation on [variables](../how_tos/variables/index.md)
and [variable scopes](../how_tos/variable_scope/index.md), and
@@ -196,7 +196,7 @@ operations to a variable are allowed to run with no mutual exclusion. To acquire
a lock when assigning to a variable, pass `use_locking=True` to
[`Variable.assign()`](../api_docs/python/state_ops.md#Variable.assign).
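For example (a sketch, assuming the 0.x-era `Variable` API):

```python
import tensorflow as tf

v = tf.Variable(0.0)
# Serializes concurrent assignments to v instead of running them unlocked.
update = v.assign(v + 1.0, use_locking=True)
```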

-### Tensor shapes <div class="md-anchor" id="AUTOGENERATED-tensor-shapes">{#AUTOGENERATED-tensor-shapes}</div>
+## Tensor shapes <div class="md-anchor" id="AUTOGENERATED-tensor-shapes">{#AUTOGENERATED-tensor-shapes}</div>

See also the
[`TensorShape` API documentation](../api_docs/python/framework.md#TensorShape).
@@ -248,7 +248,7 @@ to encode the batch size as a Python constant, but instead to use a symbolic
[`tf.placeholder(..., shape=[None, ...])`](../api_docs/python/io_ops.md#placeholder). The
`None` element of the shape corresponds to a variable-sized dimension.
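For instance (784 features per example is illustrative):

```python
import tensorflow as tf

# The batch dimension stays symbolic, so any batch size can be fed at run time.
x = tf.placeholder(tf.float32, shape=[None, 784])
```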

-### TensorBoard <div class="md-anchor" id="AUTOGENERATED-tensorboard">{#AUTOGENERATED-tensorboard}</div>
+## TensorBoard <div class="md-anchor" id="AUTOGENERATED-tensorboard">{#AUTOGENERATED-tensorboard}</div>

See also the
[how-to documentation on TensorBoard](../how_tos/graph_viz/index.md).
@@ -260,7 +260,7 @@ of these summaries to a log directory. Then, startup TensorBoard using
<SOME_COMMAND> and pass the --logdir flag so that it points to your
log directory. For more details, see <YET_UNWRITTEN_TENSORBOARD_TUTORIAL>.

-### Extending TensorFlow <div class="md-anchor" id="AUTOGENERATED-extending-tensorflow">{#AUTOGENERATED-extending-tensorflow}</div>
+## Extending TensorFlow <div class="md-anchor" id="AUTOGENERATED-extending-tensorflow">{#AUTOGENERATED-extending-tensorflow}</div>

See also the how-to documentation for
[adding a new operation to TensorFlow](../how_tos/adding_an_op/index.md).
@@ -293,7 +293,7 @@ how-to documentation for
[adding an op with a list of inputs or outputs](../how_tos/adding_an_op/index.md#list-input-output)
for more details of how to define these different input types.

-### Miscellaneous <div class="md-anchor" id="AUTOGENERATED-miscellaneous">{#AUTOGENERATED-miscellaneous}</div>
+## Miscellaneous <div class="md-anchor" id="AUTOGENERATED-miscellaneous">{#AUTOGENERATED-miscellaneous}</div>

#### Does TensorFlow work with Python 3?
|||
|
|
@ -17,7 +17,7 @@ Listed below are some of the many uses of TensorFlow.
|
|||
|
||||
* **Inception Image Classification Model**
|
||||
* **Organization**: Google
|
||||
* **Description**: Baseline model and follow on research into highly accurate computer vision models, starting with the model that won the 2014 Imagenet iamge classification challenge
|
||||
* **Description**: Baseline model and follow on research into highly accurate computer vision models, starting with the model that won the 2014 Imagenet image classification challenge
|
||||
* **More Info**: Baseline model described in [Arxiv paper](http://arxiv.org/abs/1409.4842)
|
||||
|
||||
* **SmartReply**
|
||||
|
|
@@ -33,6 +33,6 @@ Listed below are some of the many uses of TensorFlow.

* **On-Device Computer Vision for OCR**
  * **Organization**: Google
-  * **Description**: On-device computer vision model to do optical character recoignition to enable real-time translation.
+  * **Description**: On-device computer vision model to do optical character recognition to enable real-time translation.
  * **More info**: [Google Research blog post](http://googleresearch.blogspot.com/2015/07/how-google-translate-squeezes-deep.html)
@@ -1,4 +1,4 @@
-# Convolutional Neural Networks for Object Recognition
+# Convolutional Neural Networks

**NOTE:** This tutorial is intended for *advanced* users of TensorFlow
and assumes expertise and experience in machine learning.
@@ -264,7 +264,7 @@ in `cifar10.py`.

`cifar10_train.py` periodically [saves](../../api_docs/python/state_ops.md#Saver)
all model parameters in
-[checkpoint files](../../how_tos/variables.md#saving-and-restoring)
+[checkpoint files](../../how_tos/variables/index.md#saving-and-restoring)
but it does *not* evaluate the model. The checkpoint file
will be used by `cifar10_eval.py` to measure the predictive
performance (see [Evaluating a Model](#evaluating-a-model) below).
@@ -1,7 +1,7 @@
# Overview

-## ML for Beginners
+## MNIST For ML Beginners

If you're new to machine learning, we recommend starting here. You'll learn
about a classic problem, handwritten digit classification (MNIST), and get a
@@ -10,7 +10,7 @@ gentle introduction to multiclass classification.
[View Tutorial](mnist/beginners/index.md)

-## MNIST for Pros
+## Deep MNIST for Experts

If you're already familiar with other deep learning software packages, and are
already familiar with MNIST, this tutorial will give you a very brief primer on
@@ -1,4 +1,4 @@

# Mandelbrot Set

```
#Import libraries for simulation
@@ -1,4 +1,4 @@
-# MNIST Softmax Regression (For Beginners)
+# MNIST For ML Beginners

*This tutorial is intended for readers who are new to both machine learning and
TensorFlow. If you already
@@ -1,4 +1,4 @@
-# Downloading MNIST
+# MNIST Data Download

Code: [tensorflow/g3doc/tutorials/mnist/](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/g3doc/tutorials/mnist/)
@@ -1,4 +1,4 @@
-# MNIST Deep Learning Example (For Experts)
+# Deep MNIST for Experts

TensorFlow is a powerful library for doing large-scale numerical computation.
One of the tasks at which it excels is implementing and training deep neural
@@ -1,4 +1,4 @@
-# Handwritten Digit Classification
+# TensorFlow Mechanics 101

Code: [tensorflow/g3doc/tutorials/mnist/](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/g3doc/tutorials/mnist/)
@@ -1,3 +1,4 @@
# Partial Differential Equations

## Basic Setup
@@ -1,4 +1,4 @@
-# Sequence-to-Sequence Models: Learning to Translate
+# Sequence-to-Sequence Models

Recurrent neural networks can learn to model language, as already discussed
in the [RNN Tutorial](../recurrent/index.md)
@@ -1,4 +1,4 @@
-# Learning Vector Representations of Words
+# Vector Representations of Words

In this tutorial we look at the word2vec model by
[Mikolov et al.](http://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf).
tensorflow/models/rnn/BUILD (new file, 106 lines)

@@ -0,0 +1,106 @@
# Description:
# Example RNN models, including language models and sequence-to-sequence models.

package(default_visibility = ["//visibility:public"])

licenses(["notice"])  # Apache 2.0

exports_files(["LICENSE"])

load("/tensorflow/tensorflow", "cuda_py_tests")

py_library(
    name = "linear",
    srcs = [
        "linear.py",
    ],
    deps = [
        "//tensorflow:tensorflow_py",
    ],
)

py_test(
    name = "linear_test",
    size = "small",
    srcs = ["linear_test.py"],
    deps = [
        ":linear",
        "//tensorflow:tensorflow_py",
    ],
)

py_library(
    name = "rnn_cell",
    srcs = [
        "rnn_cell.py",
    ],
    deps = [
        ":linear",
        "//tensorflow:tensorflow_py",
    ],
)

py_test(
    name = "rnn_cell_test",
    size = "small",
    srcs = ["rnn_cell_test.py"],
    deps = [
        ":rnn_cell",
        "//tensorflow:tensorflow_py",
    ],
)

py_library(
    name = "rnn",
    srcs = [
        "rnn.py",
    ],
    deps = [
        ":rnn_cell",
        "//tensorflow:tensorflow_py",
    ],
)

cuda_py_tests(
    name = "rnn_tests",
    srcs = [
        "rnn_test.py",
    ],
    additional_deps = [
        ":rnn",
    ],
)

py_library(
    name = "seq2seq",
    srcs = [
        "seq2seq.py",
    ],
    deps = [
        ":rnn",
        "//tensorflow:tensorflow_py",
    ],
)

py_test(
    name = "seq2seq_test",
    srcs = [
        "seq2seq_test.py",
    ],
    deps = [
        ":seq2seq",
        "//tensorflow:tensorflow_py",
    ],
)

filegroup(
    name = "all_files",
    srcs = glob(
        ["**/*"],
        exclude = [
            "**/METADATA",
            "**/OWNERS",
        ],
    ),
    visibility = ["//tensorflow:__subpackages__"],
)
tensorflow/models/rnn/README.md (new file, 21 lines)

@@ -0,0 +1,21 @@
This directory contains functions for creating recurrent neural networks
and sequence-to-sequence models. Detailed instructions on how to get started
and use them are available in the tutorials.

* [RNN Tutorial](http://tensorflow.org/tutorials/recurrent/)
* [Sequence-to-Sequence Tutorial](http://tensorflow.org/tutorials/seq2seq/)

Here is a short overview of what is in this directory.

File | What's in it?
--- | ---
`linear.py` | Basic helper functions for creating linear layers.
`linear_test.py` | Unit tests for `linear.py`.
`rnn_cell.py` | Cells for recurrent neural networks, e.g., LSTM.
`rnn_cell_test.py` | Unit tests for `rnn_cell.py`.
`rnn.py` | Functions for building recurrent neural networks.
`rnn_test.py` | Unit tests for `rnn.py`.
`seq2seq.py` | Functions for building sequence-to-sequence models.
`seq2seq_test.py` | Unit tests for `seq2seq.py`.
`ptb/` | PTB language model, see the [RNN Tutorial](http://tensorflow.org/tutorials/recurrent/)
`translate/` | Translation model, see the [Sequence-to-Sequence Tutorial](http://tensorflow.org/tutorials/seq2seq/)
tensorflow/models/rnn/__init__.py (new executable file, 0 lines)

tensorflow/models/rnn/linear.py (new file, 49 lines)

@@ -0,0 +1,49 @@
"""Basic linear combinations that implicitly generate variables."""

import tensorflow as tf


def linear(args, output_size, bias, bias_start=0.0, scope=None):
  """Linear map: sum_i(args[i] * W[i]), where W[i] is a variable.

  Args:
    args: a 2D Tensor or a list of 2D, batch x n, Tensors.
    output_size: int, second dimension of W[i].
    bias: boolean, whether to add a bias term or not.
    bias_start: starting value to initialize the bias; 0 by default.
    scope: VariableScope for the created subgraph; defaults to "Linear".

  Returns:
    A 2D Tensor with shape [batch x output_size] equal to
    sum_i(args[i] * W[i]), where W[i]s are newly created matrices.

  Raises:
    ValueError: if some of the arguments have unspecified or wrong shape.
  """
  assert args
  if not isinstance(args, (list, tuple)):
    args = [args]

  # Calculate the total size of arguments on dimension 1.
  total_arg_size = 0
  shapes = [a.get_shape().as_list() for a in args]
  for shape in shapes:
    if len(shape) != 2:
      raise ValueError("Linear is expecting 2D arguments: %s" % str(shapes))
    if not shape[1]:
      raise ValueError("Linear expects shape[1] of arguments: %s" % str(shapes))
    else:
      total_arg_size += shape[1]

  # Now the computation.
  with tf.variable_scope(scope or "Linear"):
    matrix = tf.get_variable("Matrix", [total_arg_size, output_size])
    if len(args) == 1:
      res = tf.matmul(args[0], matrix)
    else:
      res = tf.matmul(tf.concat(1, args), matrix)
    if not bias:
      return res
    bias_term = tf.get_variable("Bias", [output_size],
                                initializer=tf.constant_initializer(bias_start))
  return res + bias_term
tensorflow/models/rnn/linear_test.py (new file, 35 lines)

@@ -0,0 +1,35 @@
# pylint: disable=g-bad-import-order,unused-import
import tensorflow.python.platform

import numpy as np
import tensorflow as tf

from tensorflow.models.rnn import linear


class LinearTest(tf.test.TestCase):

  def testLinear(self):
    with self.test_session() as sess:
      with tf.variable_scope("root", initializer=tf.constant_initializer(1.0)):
        x = tf.zeros([1, 2])
        l = linear.linear([x], 2, False)
        sess.run([tf.initialize_all_variables()])
        res = sess.run([l], {x.name: np.array([[1., 2.]])})
        self.assertAllClose(res[0], [[3.0, 3.0]])

        # Checks prevent you from accidentally creating a shared function.
        with self.assertRaises(ValueError) as exc:
          l1 = linear.linear([x], 2, False)
        self.assertEqual(exc.exception.message[:12], "Over-sharing")

        # But you can create a new one in a new scope and share the variables.
        with tf.variable_scope("l1") as new_scope:
          l1 = linear.linear([x], 2, False)
        with tf.variable_scope(new_scope, reuse=True):
          linear.linear([l1], 2, False)
        self.assertEqual(len(tf.trainable_variables()), 2)


if __name__ == "__main__":
  tf.test.main()
tensorflow/models/rnn/ptb/BUILD (new file, 49 lines)

@@ -0,0 +1,49 @@
# Description:
# Python support for TensorFlow.

package(default_visibility = ["//tensorflow:internal"])

licenses(["notice"])  # Apache 2.0

exports_files(["LICENSE"])

py_library(
    name = "reader",
    srcs = ["reader.py"],
    deps = ["//tensorflow:tensorflow_py"],
)

py_test(
    name = "reader_test",
    srcs = ["reader_test.py"],
    deps = [
        ":reader",
        "//tensorflow:tensorflow_py",
    ],
)

py_binary(
    name = "ptb_word_lm",
    srcs = [
        "ptb_word_lm.py",
    ],
    deps = [
        ":reader",
        "//tensorflow:tensorflow_py",
        "//tensorflow/models/rnn",
        "//tensorflow/models/rnn:rnn_cell",
        "//tensorflow/models/rnn:seq2seq",
    ],
)

filegroup(
    name = "all_files",
    srcs = glob(
        ["**/*"],
        exclude = [
            "**/METADATA",
            "**/OWNERS",
        ],
    ),
    visibility = ["//tensorflow:__subpackages__"],
)
tensorflow/models/rnn/ptb/__init__.py (new executable file, 0 lines)

tensorflow/models/rnn/ptb/ptb_word_lm.py (new file, 292 lines)

@@ -0,0 +1,292 @@
"""Example / benchmark for building a PTB LSTM model.
|
||||
|
||||
Trains the model described in:
|
||||
(Zaremba, et. al.) Recurrent Neural Network Regularization
|
||||
http://arxiv.org/abs/1409.2329
|
||||
|
||||
The data required for this example is in the data/ dir of the
|
||||
PTB dataset from Tomas Mikolov's webpage:
|
||||
|
||||
http://www.fit.vutbr.cz/~imikolov/rnnlm/simple-examples.tgz
|
||||
|
||||
There are 3 supported model configurations:
|
||||
===========================================
|
||||
| config | epochs | train | valid | test
|
||||
===========================================
|
||||
| small | 13 | 37.99 | 121.39 | 115.91
|
||||
| medium | 39 | 48.45 | 86.16 | 82.07
|
||||
| large | 55 | 37.87 | 82.62 | 78.29
|
||||
The exact results may vary depending on the random initialization.
|
||||
|
||||
The hyperparameters used in the model:
|
||||
- init_scale - the initial scale of the weights
|
||||
- learning_rate - the initial value of the learning rate
|
||||
- max_grad_norm - the maximum permissible norm of the gradient
|
||||
- num_layers - the number of LSTM layers
|
||||
- num_steps - the number of unrolled steps of LSTM
|
||||
- hidden_size - the number of LSTM units
|
||||
- max_epoch - the number of epochs trained with the initial learning rate
|
||||
- max_max_epoch - the total number of epochs for training
|
||||
- keep_prob - the probability of keeping weights in the dropout layer
|
||||
- lr_decay - the decay of the learning rate for each epoch after "max_epoch"
|
||||
- batch_size - the batch size
|
||||
|
||||
To compile on CPU:
|
||||
bazel build -c opt tensorflow/models/rnn/ptb:ptb_word_lm
|
||||
To compile on GPU:
|
||||
bazel build -c opt tensorflow --config=cuda \
|
||||
tensorflow/models/rnn/ptb:ptb_word_lm
|
||||
To run:
|
||||
./bazel-bin/.../ptb_word_lm \
|
||||
--data_path=/tmp/simple-examples/data/ --alsologtostderr
|
||||
|
||||
"""
|
||||
|
||||
import time
|
||||
|
||||
import tensorflow.python.platform
|
||||
|
||||
import numpy as np
|
||||
import tensorflow as tf
|
||||
|
||||
from tensorflow.models.rnn import rnn_cell
|
||||
from tensorflow.models.rnn import seq2seq
|
||||
from tensorflow.models.rnn.ptb import reader
|
||||
|
||||
flags = tf.flags
|
||||
logging = tf.logging
|
||||
|
||||
flags.DEFINE_string(
|
||||
"model", "small",
|
||||
"A type of model. Possible options are: small, medium, large.")
|
||||
flags.DEFINE_string("data_path", None, "data_path")
|
||||
|
||||
FLAGS = flags.FLAGS
|
||||
|
||||
|
||||
class PTBModel(object):
|
||||
"""The PTB model."""
|
||||
|
||||
def __init__(self, is_training, config):
|
||||
self.batch_size = batch_size = config.batch_size
|
||||
self.num_steps = num_steps = config.num_steps
|
||||
size = config.hidden_size
|
||||
vocab_size = config.vocab_size
|
||||
|
||||
self._input_data = tf.placeholder(tf.int32, [batch_size, num_steps])
|
||||
self._targets = tf.placeholder(tf.int32, [batch_size, num_steps])
|
||||
|
||||
# Slightly better results can be obtained with forget gate biases
|
||||
# initialized to 1 but the hyperparameters of the model would need to be
|
||||
# different than reported in the paper.
|
||||
lstm_cell = rnn_cell.BasicLSTMCell(size, forget_bias=0.0)
|
||||
if is_training and config.keep_prob < 1:
|
||||
lstm_cell = rnn_cell.DropoutWrapper(
|
||||
lstm_cell, output_keep_prob=config.keep_prob)
|
||||
cell = rnn_cell.MultiRNNCell([lstm_cell] * config.num_layers)
|
||||
|
||||
self._initial_state = cell.zero_state(batch_size, tf.float32)
|
||||
|
||||
with tf.device("/cpu:0"):
|
||||
embedding = tf.get_variable("embedding", [vocab_size, size])
|
||||
inputs = tf.split(
|
||||
1, num_steps, tf.nn.embedding_lookup(embedding, self._input_data))
|
||||
inputs = [tf.squeeze(input_, [1]) for input_ in inputs]
|
||||
|
||||
if is_training and config.keep_prob < 1:
|
||||
inputs = [tf.nn.dropout(input_, config.keep_prob) for input_ in inputs]
|
||||
|
||||
# Simplified version of tensorflow.models.rnn.rnn.py's rnn().
|
||||
# This builds an unrolled LSTM for tutorial purposes only.
|
||||
# In general, use the rnn() or state_saving_rnn() from rnn.py.
|
||||
#
|
||||
# The alternative version of the code below is:
|
||||
#
|
||||
# from tensorflow.models.rnn import rnn
|
||||
# outputs, states = rnn.rnn(cell, inputs, initial_state=self._initial_state)
|
||||
outputs = []
|
||||
states = []
|
||||
state = self._initial_state
|
||||
with tf.variable_scope("RNN"):
|
||||
for time_step, input_ in enumerate(inputs):
|
||||
if time_step > 0: tf.get_variable_scope().reuse_variables()
|
||||
(cell_output, state) = cell(input_, state)
|
||||
outputs.append(cell_output)
|
||||
states.append(state)
|
||||
|
||||
output = tf.reshape(tf.concat(1, outputs), [-1, size])
|
||||
logits = tf.nn.xw_plus_b(output,
|
||||
tf.get_variable("softmax_w", [size, vocab_size]),
|
||||
tf.get_variable("softmax_b", [vocab_size]))
|
||||
loss = seq2seq.sequence_loss_by_example([logits],
|
||||
[tf.reshape(self._targets, -1)],
|
||||
[tf.ones([batch_size * num_steps])],
|
||||
vocab_size)
|
||||
self._cost = cost = tf.reduce_sum(loss) / batch_size
|
||||
self._final_state = states[-1]
|
||||
|
||||
if not is_training:
|
||||
return
|
||||
|
||||
self._lr = tf.Variable(0.0, trainable=False)
|
||||
tvars = tf.trainable_variables()
|
||||
grads, _ = tf.clip_by_global_norm(tf.gradients(cost, tvars),
|
||||
config.max_grad_norm)
|
||||
optimizer = tf.train.GradientDescentOptimizer(self.lr)
|
||||
self._train_op = optimizer.apply_gradients(zip(grads, tvars))
|
||||
|
||||
def assign_lr(self, session, lr_value):
|
||||
session.run(tf.assign(self.lr, lr_value))
|
||||
|
||||
@property
|
||||
def input_data(self):
|
||||
return self._input_data
|
||||
|
||||
@property
|
||||
def targets(self):
|
||||
return self._targets
|
||||
|
||||
@property
|
||||
def initial_state(self):
|
||||
return self._initial_state
|
||||
|
||||
@property
|
||||
def cost(self):
|
||||
return self._cost
|
||||
|
||||
@property
|
||||
def final_state(self):
|
||||
return self._final_state
|
||||
|
||||
@property
|
||||
def lr(self):
|
||||
return self._lr
|
||||
|
||||
@property
|
||||
def train_op(self):
|
||||
return self._train_op
|
||||
|
||||
|
||||
class SmallConfig(object):
|
||||
"""Small config."""
|
||||
init_scale = 0.1
|
||||
learning_rate = 1.0
|
||||
max_grad_norm = 5
|
||||
num_layers = 2
|
||||
num_steps = 20
|
||||
hidden_size = 200
|
||||
max_epoch = 4
|
||||
max_max_epoch = 13
|
||||
keep_prob = 1.0
|
||||
lr_decay = 0.5
|
||||
batch_size = 20
|
||||
vocab_size = 10000
|
||||
|
||||
|
||||
class MediumConfig(object):
|
||||
"""Medium config."""
|
||||
init_scale = 0.05
|
||||
learning_rate = 1.0
|
||||
max_grad_norm = 5
|
||||
num_layers = 2
|
||||
num_steps = 35
|
||||
hidden_size = 650
|
||||
max_epoch = 6
|
||||
max_max_epoch = 39
|
||||
keep_prob = 0.5
|
||||
lr_decay = 0.8
|
||||
batch_size = 20
|
||||
vocab_size = 10000
|
||||
|
||||
|
||||
class LargeConfig(object):
|
||||
"""Large config."""
|
||||
init_scale = 0.04
|
||||
learning_rate = 1.0
|
||||
max_grad_norm = 10
|
||||
num_layers = 2
|
||||
num_steps = 35
|
||||
hidden_size = 1500
|
||||
max_epoch = 14
|
||||
max_max_epoch = 55
|
||||
keep_prob = 0.35
|
||||
lr_decay = 1 / 1.15
|
||||
batch_size = 20
|
||||
vocab_size = 10000
|
||||
|
||||
|
||||
def run_epoch(session, m, data, eval_op, verbose=False):
|
||||
"""Runs the model on the given data."""
|
||||
epoch_size = ((len(data) / m.batch_size) - 1) / m.num_steps
|
||||
start_time = time.time()
|
||||
costs = 0.0
|
||||
iters = 0
|
||||
state = m.initial_state.eval()
|
||||
for step, (x, y) in enumerate(reader.ptb_iterator(data, m.batch_size,
|
||||
m.num_steps)):
|
||||
cost, state, _ = session.run([m.cost, m.final_state, eval_op],
|
||||
{m.input_data: x,
|
||||
m.targets: y,
|
||||
m.initial_state: state})
|
||||
costs += cost
|
||||
iters += m.num_steps
|
||||
|
||||
if verbose and step % (epoch_size / 10) == 10:
|
||||
print("%.3f perplexity: %.3f speed: %.0f wps" %
|
||||
(step * 1.0 / epoch_size, np.exp(costs / iters),
|
||||
iters * m.batch_size / (time.time() - start_time)))
|
||||
|
||||
return np.exp(costs / iters)
|
||||
|
||||
|
||||
def get_config():
|
||||
if FLAGS.model == "small":
|
||||
return SmallConfig()
|
||||
elif FLAGS.model == "medium":
|
||||
return MediumConfig()
|
||||
elif FLAGS.model == "large":
|
||||
return LargeConfig()
|
||||
else:
|
||||
raise ValueError("Invalid model: %s", FLAGS.model)
|
||||
|
||||
|
||||
def main(unused_args):
|
||||
if not FLAGS.data_path:
|
||||
raise ValueError("Must set --data_path to PTB data directory")
|
||||
|
||||
raw_data = reader.ptb_raw_data(FLAGS.data_path)
|
||||
train_data, valid_data, test_data, _ = raw_data
|
||||
|
||||
config = get_config()
|
||||
eval_config = get_config()
|
||||
eval_config.batch_size = 1
|
||||
eval_config.num_steps = 1
|
||||
|
||||
with tf.Graph().as_default(), tf.Session() as session:
|
||||
initializer = tf.random_uniform_initializer(-config.init_scale,
|
||||
config.init_scale)
|
||||
with tf.variable_scope("model", reuse=None, initializer=initializer):
|
||||
m = PTBModel(is_training=True, config=config)
|
||||
with tf.variable_scope("model", reuse=True, initializer=initializer):
|
||||
mvalid = PTBModel(is_training=False, config=config)
|
||||
mtest = PTBModel(is_training=False, config=eval_config)
|
||||
|
||||
tf.initialize_all_variables().run()
|
||||
|
||||
for i in range(config.max_max_epoch):
|
||||
lr_decay = config.lr_decay ** max(i - config.max_epoch, 0.0)
|
||||
m.assign_lr(session, config.learning_rate * lr_decay)
|
||||
|
||||
print("Epoch: %d Learning rate: %.3f" % (i + 1, session.run(m.lr)))
|
||||
train_perplexity = run_epoch(session, m, train_data, m.train_op,
|
||||
verbose=True)
|
||||
print("Epoch: %d Train Perplexity: %.3f" % (i + 1, train_perplexity))
|
||||
valid_perplexity = run_epoch(session, mvalid, valid_data, tf.no_op())
|
||||
print("Epoch: %d Valid Perplexity: %.3f" % (i + 1, valid_perplexity))
|
||||
|
||||
test_perplexity = run_epoch(session, mtest, test_data, tf.no_op())
|
||||
print("Test Perplexity: %.3f" % test_perplexity)
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
tf.app.run()
|
||||
tensorflow/models/rnn/ptb/reader.py (new file, 105 lines)

@@ -0,0 +1,105 @@
# pylint: disable=unused-import,g-bad-import-order

"""Utilities for parsing PTB text files."""
import collections
import os
import sys
import time

import tensorflow.python.platform

import numpy as np
import tensorflow as tf

from tensorflow.python.platform import gfile


def _read_words(filename):
  with gfile.GFile(filename, "r") as f:
    return f.read().replace("\n", "<eos>").split()


def _build_vocab(filename):
  data = _read_words(filename)

  counter = collections.Counter(data)
  count_pairs = sorted(counter.items(), key=lambda x: -x[1])

  words, _ = zip(*count_pairs)
  word_to_id = dict(zip(words, range(len(words))))

  return word_to_id


def _file_to_word_ids(filename, word_to_id):
  data = _read_words(filename)
  return [word_to_id[word] for word in data]


def ptb_raw_data(data_path=None):
  """Load PTB raw data from data directory "data_path".

  Reads PTB text files, converts strings to integer ids,
  and performs mini-batching of the inputs.

  The PTB dataset comes from Tomas Mikolov's webpage:

  http://www.fit.vutbr.cz/~imikolov/rnnlm/simple-examples.tgz

  Args:
    data_path: string path to the directory where simple-examples.tgz has
      been extracted.

  Returns:
    tuple (train_data, valid_data, test_data, vocabulary)
    where each of the data objects can be passed to PTBIterator.
  """

  train_path = os.path.join(data_path, "ptb.train.txt")
  valid_path = os.path.join(data_path, "ptb.valid.txt")
  test_path = os.path.join(data_path, "ptb.test.txt")

  word_to_id = _build_vocab(train_path)
  train_data = _file_to_word_ids(train_path, word_to_id)
  valid_data = _file_to_word_ids(valid_path, word_to_id)
  test_data = _file_to_word_ids(test_path, word_to_id)
  vocabulary = len(word_to_id)
  return train_data, valid_data, test_data, vocabulary


def ptb_iterator(raw_data, batch_size, num_steps):
  """Iterate on the raw PTB data.

  This generates batch_size pointers into the raw PTB data, and allows
  minibatch iteration along these pointers.

  Args:
    raw_data: one of the raw data outputs from ptb_raw_data.
    batch_size: int, the batch size.
    num_steps: int, the number of unrolls.

  Yields:
    Pairs of the batched data, each a matrix of shape [batch_size, num_steps].
    The second element of the tuple is the same data time-shifted to the
    right by one.

  Raises:
    ValueError: if batch_size or num_steps are too high.
  """
  raw_data = np.array(raw_data, dtype=np.int32)

  data_len = len(raw_data)
  batch_len = data_len / batch_size
  data = np.zeros([batch_size, batch_len], dtype=np.int32)
  for i in range(batch_size):
    data[i] = raw_data[batch_len * i:batch_len * (i + 1)]

  epoch_size = (batch_len - 1) / num_steps

  if epoch_size == 0:
    raise ValueError("epoch_size == 0, decrease batch_size or num_steps")

  for i in range(epoch_size):
    x = data[:, i*num_steps:(i+1)*num_steps]
    y = data[:, i*num_steps+1:(i+1)*num_steps+1]
    yield (x, y)
tensorflow/models/rnn/ptb/reader_test.py (new file, 47 lines)

@@ -0,0 +1,47 @@
"""Tests for tensorflow.models.ptb_lstm.ptb_reader."""
|
||||
|
||||
import os.path
|
||||
|
||||
# pylint: disable=g-bad-import-order,unused-import
|
||||
import tensorflow.python.platform
|
||||
|
||||
import numpy as np
|
||||
import tensorflow as tf
|
||||
|
||||
from tensorflow.models.rnn.ptb import reader
|
||||
from tensorflow.python.platform import gfile
|
||||
|
||||
|
||||
class PtbReaderTest(tf.test.TestCase):
|
||||
|
||||
def setUp(self):
|
||||
self._string_data = "\n".join(
|
||||
[" hello there i am",
|
||||
" rain as day",
|
||||
" want some cheesy puffs ?"])
|
||||
|
||||
def testPtbRawData(self):
|
||||
tmpdir = tf.test.get_temp_dir()
|
||||
for suffix in "train", "valid", "test":
|
||||
filename = os.path.join(tmpdir, "ptb.%s.txt" % suffix)
|
||||
with gfile.GFile(filename, "w") as fh:
|
||||
fh.write(self._string_data)
|
||||
# Smoke test
|
||||
output = reader.ptb_raw_data(tmpdir)
|
||||
self.assertEqual(len(output), 4)
|
||||
|
||||
def testPtbIterator(self):
|
||||
raw_data = [4, 3, 2, 1, 0, 5, 6, 1, 1, 1, 1, 0, 3, 4, 1]
|
||||
batch_size = 3
|
||||
num_steps = 2
|
||||
output = list(reader.ptb_iterator(raw_data, batch_size, num_steps))
|
||||
self.assertEqual(len(output), 2)
|
||||
o1, o2 = (output[0], output[1])
|
||||
self.assertEqual(o1[0].shape, (batch_size, num_steps))
|
||||
self.assertEqual(o1[1].shape, (batch_size, num_steps))
|
||||
self.assertEqual(o2[0].shape, (batch_size, num_steps))
|
||||
self.assertEqual(o2[1].shape, (batch_size, num_steps))
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
tf.test.main()
|
||||
tensorflow/models/rnn/rnn.py (new file, 128 lines)

@@ -0,0 +1,128 @@
"""RNN helpers for TensorFlow models."""
|
||||
|
||||
import tensorflow as tf
|
||||
|
||||
from tensorflow.models.rnn import rnn_cell
|
||||
from tensorflow.python.ops import control_flow_ops
|
||||
|
||||
|
||||
def rnn(cell, inputs, initial_state=None, dtype=None,
|
||||
sequence_length=None, scope=None):
|
||||
"""Creates a recurrent neural network specified by RNNCell "cell".
|
||||
|
||||
The simplest form of RNN network generated is:
|
||||
state = cell.zero_state(...)
|
||||
outputs = []
|
||||
states = []
|
||||
for input_ in inputs:
|
||||
output, state = cell(input_, state)
|
||||
outputs.append(output)
|
||||
states.append(state)
|
||||
return (outputs, states)
|
||||
|
||||
However, a few other options are available:
|
||||
|
||||
An initial state can be provided.
|
||||
If sequence_length is provided, dynamic calculation is performed.
|
||||
|
||||
Dynamic calculation returns, at time t:
|
||||
(t >= max(sequence_length)
|
||||
? (zeros(output_shape), zeros(state_shape))
|
||||
: cell(input, state)
|
||||
|
||||
Thus saving computational time when unrolling past the max sequence length.
|
||||
|
||||
Args:
|
||||
cell: An instance of RNNCell.
|
||||
inputs: A length T list of inputs, each a vector with shape [batch_size].
|
||||
initial_state: (optional) An initial state for the RNN. This must be
|
||||
a tensor of appropriate type and shape [batch_size x cell.state_size].
|
||||
dtype: (optional) The data type for the initial state. Required if
|
||||
initial_state is not provided.
|
||||
sequence_length: An int64 vector (tensor) size [batch_size].
|
||||
scope: VariableScope for the created subgraph; defaults to "RNN".
|
||||
|
||||
Returns:
|
||||
A pair (outputs, states) where:
|
||||
outputs is a length T list of outputs (one for each input)
|
||||
states is a length T list of states (one state following each input)
|
||||
|
||||
Raises:
|
||||
TypeError: If "cell" is not an instance of RNNCell.
|
||||
ValueError: If inputs is None or an empty list.
|
||||
"""
|
||||
|
||||
if not isinstance(cell, rnn_cell.RNNCell):
|
||||
raise TypeError("cell must be an instance of RNNCell")
|
||||
if not isinstance(inputs, list):
|
||||
raise TypeError("inputs must be a list")
|
||||
if not inputs:
|
||||
raise ValueError("inputs must not be empty")
|
||||
|
||||
outputs = []
|
||||
states = []
|
||||
with tf.variable_scope(scope or "RNN"):
|
||||
batch_size = tf.shape(inputs[0])[0]
|
||||
if initial_state is not None:
|
||||
state = initial_state
|
||||
else:
|
||||
if not dtype:
|
||||
raise ValueError("If no initial_state is provided, dtype must be.")
|
||||
state = cell.zero_state(batch_size, dtype)
|
||||
|
||||
if sequence_length: # Prepare variables
|
||||
zero_output_state = (
|
||||
tf.zeros(tf.pack([batch_size, cell.output_size]),
|
||||
inputs[0].dtype),
|
||||
tf.zeros(tf.pack([batch_size, cell.state_size]),
|
||||
state.dtype))
|
||||
max_sequence_length = tf.reduce_max(sequence_length)
|
||||
|
||||
output_state = (None, None)
|
||||
for time, input_ in enumerate(inputs):
|
||||
if time > 0:
|
||||
tf.get_variable_scope().reuse_variables()
|
||||
output_state = cell(input_, state)
|
||||
if sequence_length:
|
||||
(output, state) = control_flow_ops.cond(
|
||||
time >= max_sequence_length,
|
||||
lambda: zero_output_state, lambda: output_state)
|
||||
else:
|
||||
(output, state) = output_state
|
||||
|
||||
outputs.append(output)
|
||||
states.append(state)
|
||||
|
||||
return (outputs, states)
|
||||
|
||||
|
||||
def state_saving_rnn(cell, inputs, state_saver, state_name,
                     sequence_length=None, scope=None):
  """RNN that accepts a state saver for time-truncated RNN calculation.

  Args:
    cell: An instance of RNNCell.
    inputs: A length T list of inputs, each a 2D Tensor of shape
      [batch_size x cell.input_size].
    state_saver: A StateSaver object.
    state_name: The name to use with the state_saver.
    sequence_length: (optional) An int64 vector (tensor) of size [batch_size].
      See the documentation for rnn() for more details about sequence_length.
    scope: VariableScope for the created subgraph; defaults to "RNN".

  Returns:
    A pair (outputs, states) where:
      outputs is a length T list of outputs (one for each input)
      states is a length T list of states (one state following each input)

  Raises:
    TypeError: If "cell" is not an instance of RNNCell.
    ValueError: If inputs is None or an empty list.
  """
  initial_state = state_saver.State(state_name)
  (outputs, states) = rnn(cell, inputs, initial_state=initial_state,
                          sequence_length=sequence_length, scope=scope)
  save_state = state_saver.SaveState(state_name, states[-1])
  with tf.control_dependencies([save_state]):
    outputs[-1] = tf.identity(outputs[-1])

  return (outputs, states)

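# Example (editor's sketch, not part of the original file): any object with
# State(name) and SaveState(name, state) methods works as a state saver; see
# TestStateSaver in rnn_test.py below for a minimal implementation.
#
#   saver = TestStateSaver(batch_size, cell.state_size)  # hypothetical wiring
#   outputs, states = state_saving_rnn(
#       cell, inputs, state_saver=saver, state_name="lstm_state")
#   # Fetching outputs[-1] also runs the SaveState op, because of the
#   # control_dependencies added above.
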
605 tensorflow/models/rnn/rnn_cell.py Normal file
@ -0,0 +1,605 @@
"""Module for constructing RNN Cells."""
|
||||
|
||||
import math
|
||||
|
||||
import tensorflow as tf
|
||||
|
||||
from tensorflow.models.rnn import linear
|
||||
|
||||
|
||||
class RNNCell(object):
|
||||
"""Abstract object representing an RNN cell.
|
||||
|
||||
An RNN cell, in the most abstract setting, is anything that has
|
||||
a state -- a vector of floats of size self.state_size -- and performs some
|
||||
operation that takes inputs of size self.input_size. This operation
|
||||
results in an output of size self.output_size and a new state.
|
||||
|
||||
This module provides a number of basic commonly used RNN cells, such as
|
||||
LSTM (Long Short Term Memory) or GRU (Gated Recurrent Unit), and a number
|
||||
of operators that allow add dropouts, projections, or embeddings for inputs.
|
||||
Constructing multi-layer cells is supported by a super-class, MultiRNNCell,
|
||||
defined later. Every RNNCell must have the properties below and and
|
||||
implement __call__ with the following signature.
|
||||
"""
|
||||
|
||||
  def __call__(self, inputs, state, scope=None):
    """Run this RNN cell on inputs, starting from the given state.

    Args:
      inputs: 2D Tensor with shape [batch_size x self.input_size].
      state: 2D Tensor with shape [batch_size x self.state_size].
      scope: VariableScope for the created subgraph; defaults to class name.

    Returns:
      A pair containing:
      - Output: A 2D Tensor with shape [batch_size x self.output_size]
      - New state: A 2D Tensor with shape [batch_size x self.state_size].
    """
    raise NotImplementedError("Abstract method")

  @property
  def input_size(self):
    """Integer: size of inputs accepted by this cell."""
    raise NotImplementedError("Abstract method")

  @property
  def output_size(self):
    """Integer: size of outputs produced by this cell."""
    raise NotImplementedError("Abstract method")

  @property
  def state_size(self):
    """Integer: size of state used by this cell."""
    raise NotImplementedError("Abstract method")

  def zero_state(self, batch_size, dtype):
    """Return state tensor (shape [batch_size x state_size]) filled with 0.

    Args:
      batch_size: int, float, or unit Tensor representing the batch size.
      dtype: the data type to use for the state.

    Returns:
      A 2D Tensor of shape [batch_size x state_size] filled with zeros.
    """
    zeros = tf.zeros(tf.pack([batch_size, self.state_size]), dtype=dtype)
    # The reshape below is a no-op, but it allows shape inference of shape[1].
    return tf.reshape(zeros, [-1, self.state_size])

class BasicRNNCell(RNNCell):
  """The most basic RNN cell."""

  def __init__(self, num_units):
    self._num_units = num_units

  @property
  def input_size(self):
    return self._num_units

  @property
  def output_size(self):
    return self._num_units

  @property
  def state_size(self):
    return self._num_units

  def __call__(self, inputs, state, scope=None):
    """Most basic RNN: output = new_state = tanh(W * input + U * state + B)."""
    with tf.variable_scope(scope or type(self).__name__):  # "BasicRNNCell"
      output = tf.tanh(linear.linear([inputs, state], self._num_units, True))
    return output, output

class GRUCell(RNNCell):
  """Gated Recurrent Unit cell (cf. http://arxiv.org/abs/1406.1078)."""

  def __init__(self, num_units):
    self._num_units = num_units

  @property
  def input_size(self):
    return self._num_units

  @property
  def output_size(self):
    return self._num_units

  @property
  def state_size(self):
    return self._num_units

  def __call__(self, inputs, state, scope=None):
    """Gated recurrent unit (GRU) with nunits cells."""
    with tf.variable_scope(scope or type(self).__name__):  # "GRUCell"
      with tf.variable_scope("Gates"):  # Reset gate and update gate.
        # We start with bias of 1.0 to not reset and not update.
        r, u = tf.split(1, 2, linear.linear([inputs, state],
                                            2 * self._num_units, True, 1.0))
        r, u = tf.sigmoid(r), tf.sigmoid(u)
      with tf.variable_scope("Candidate"):
        c = tf.tanh(linear.linear([inputs, r * state], self._num_units, True))
      new_h = u * state + (1 - u) * c
    return new_h, new_h

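# For reference (editor's note, not part of the original file), the update
# computed by GRUCell.__call__ above, writing x for inputs and h for state:
#
#   r = sigmoid(W_r [x, h] + b_r)    # reset gate
#   u = sigmoid(W_u [x, h] + b_u)    # update gate
#   c = tanh(W_c [x, r * h] + b_c)   # candidate activation
#   new_h = u * h + (1 - u) * c
#
# Both gates come from a single linear() call that is split in two; their
# biases start at 1.0 so the cell initially neither resets nor updates.
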
class BasicLSTMCell(RNNCell):
  """Basic LSTM recurrent network cell.

  The implementation is based on: http://arxiv.org/pdf/1409.2329v5.pdf.

  It does not allow cell clipping or a projection layer, and it does not
  use peep-hole connections: it is the basic baseline.

  Biases of the forget gate are initialized by default to 1 in order to reduce
  the scale of forgetting at the beginning of training.
  """

  def __init__(self, num_units, forget_bias=1.0):
    self._num_units = num_units
    self._forget_bias = forget_bias

  @property
  def input_size(self):
    return self._num_units

  @property
  def output_size(self):
    return self._num_units

  @property
  def state_size(self):
    return 2 * self._num_units

  def __call__(self, inputs, state, scope=None):
    """Long short-term memory cell (LSTM)."""
    with tf.variable_scope(scope or type(self).__name__):  # "BasicLSTMCell"
      # Parameters of gates are concatenated into one multiply for efficiency.
      c, h = tf.split(1, 2, state)
      concat = linear.linear([inputs, h], 4 * self._num_units, True)

      # i = input_gate, j = new_input, f = forget_gate, o = output_gate
      i, j, f, o = tf.split(1, 4, concat)

      new_c = c * tf.sigmoid(f + self._forget_bias) + tf.sigmoid(i) * tf.tanh(j)
      new_h = tf.tanh(new_c) * tf.sigmoid(o)

    return new_h, tf.concat(1, [new_c, new_h])

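# For reference (editor's note, not part of the original file), the update
# computed by BasicLSTMCell.__call__ above, with state split into [c, h]:
#
#   i, j, f, o = split(W [x, h] + b)   # one matmul for all four gates
#   new_c = c * sigmoid(f + forget_bias) + sigmoid(i) * tanh(j)
#   new_h = tanh(new_c) * sigmoid(o)
#
# The returned state is concat([new_c, new_h]), which is why state_size
# is 2 * num_units.
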
class LSTMCell(RNNCell):
  """Long short-term memory unit (LSTM) recurrent network cell.

  This implementation is based on:

    https://research.google.com/pubs/archive/43905.pdf

  Hasim Sak, Andrew Senior, and Francoise Beaufays.
  "Long short-term memory recurrent neural network architectures for
  large scale acoustic modeling." INTERSPEECH, 2014.

  It uses peep-hole connections, optional cell clipping, and an optional
  projection layer.
  """

  def __init__(self, num_units, input_size,
               use_peepholes=False, cell_clip=None,
               initializer=None, num_proj=None,
               num_unit_shards=1, num_proj_shards=1):
    """Initialize the parameters for an LSTM cell.

    Args:
      num_units: int, The number of units in the LSTM cell.
      input_size: int, The dimensionality of the inputs into the LSTM cell.
      use_peepholes: bool, set True to enable diagonal/peephole connections.
      cell_clip: (optional) A float value, if provided the cell state is
        clipped by this value prior to the cell output activation.
      initializer: (optional) The initializer to use for the weight and
        projection matrices.
      num_proj: (optional) int, The output dimensionality for the projection
        matrices. If None, no projection is performed.
      num_unit_shards: How to split the weight matrix. If >1, the weight
        matrix is stored across num_unit_shards.
        Note that num_unit_shards must evenly divide num_units * 4.
      num_proj_shards: How to split the projection matrix. If >1, the
        projection matrix is stored across num_proj_shards.
        Note that num_proj_shards must evenly divide num_proj
        (if num_proj is not None).

    Raises:
      ValueError: if num_unit_shards doesn't divide 4 * num_units or
        num_proj_shards doesn't divide num_proj.
    """
    self._num_units = num_units
    self._input_size = input_size
    self._use_peepholes = use_peepholes
    self._cell_clip = cell_clip
    self._initializer = initializer
    self._num_proj = num_proj
    self._num_unit_shards = num_unit_shards
    self._num_proj_shards = num_proj_shards

    if (num_units * 4) % num_unit_shards != 0:
      raise ValueError("num_unit_shards must evenly divide 4 * num_units")
    if num_proj and num_proj % num_proj_shards != 0:
      raise ValueError("num_proj_shards must evenly divide num_proj")

    if num_proj:
      self._state_size = num_units + num_proj
      self._output_size = num_proj
    else:
      self._state_size = 2 * num_units
      self._output_size = num_units

  @property
  def input_size(self):
    return self._input_size

  @property
  def output_size(self):
    return self._output_size

  @property
  def state_size(self):
    return self._state_size

  def __call__(self, input_, state, scope=None):
    """Run one step of LSTM.

    Args:
      input_: input Tensor, 2D, batch x input_size.
      state: state Tensor, 2D, batch x state_size.
      scope: VariableScope for the created subgraph; defaults to "LSTMCell".

    Returns:
      A tuple containing:
      - A 2D, batch x output_dim, Tensor representing the output of the LSTM
        after reading "input_" when previous state was "state".
        Here output_dim is:
          num_proj if num_proj was set,
          num_units otherwise.
      - A 2D, batch x state_size, Tensor representing the new state of LSTM
        after reading "input_" when previous state was "state".
    """
    num_proj = self._num_units if self._num_proj is None else self._num_proj

    c_prev = tf.slice(state, [0, 0], [-1, self._num_units])
    m_prev = tf.slice(state, [0, self._num_units], [-1, num_proj])

    dtype = input_.dtype

    unit_shard_size = (4 * self._num_units) / self._num_unit_shards

    with tf.variable_scope(scope or type(self).__name__):  # "LSTMCell"
      w = tf.concat(
          1, [tf.get_variable("W_%d" % i,
                              shape=[self.input_size + num_proj,
                                     unit_shard_size],
                              initializer=self._initializer,
                              dtype=dtype)
              for i in range(self._num_unit_shards)])

      b = tf.get_variable(
          "B", shape=[4 * self._num_units],
          initializer=tf.zeros_initializer, dtype=dtype)

      # i = input_gate, j = new_input, f = forget_gate, o = output_gate
      cell_inputs = tf.concat(1, [input_, m_prev])
      i, j, f, o = tf.split(1, 4, tf.nn.bias_add(tf.matmul(cell_inputs, w), b))

      # Diagonal connections
      if self._use_peepholes:
        w_f_diag = tf.get_variable(
            "W_F_diag", shape=[self._num_units], dtype=dtype)
        w_i_diag = tf.get_variable(
            "W_I_diag", shape=[self._num_units], dtype=dtype)
        w_o_diag = tf.get_variable(
            "W_O_diag", shape=[self._num_units], dtype=dtype)

      if self._use_peepholes:
        c = (tf.sigmoid(f + 1 + w_f_diag * c_prev) * c_prev +
             tf.sigmoid(i + w_i_diag * c_prev) * tf.tanh(j))
      else:
        c = (tf.sigmoid(f + 1) * c_prev + tf.sigmoid(i) * tf.tanh(j))

      if self._cell_clip is not None:
        c = tf.clip_by_value(c, -self._cell_clip, self._cell_clip)

      if self._use_peepholes:
        m = tf.sigmoid(o + w_o_diag * c) * tf.tanh(c)
      else:
        m = tf.sigmoid(o) * tf.tanh(c)

      if self._num_proj is not None:
        proj_shard_size = self._num_proj / self._num_proj_shards
        w_proj = tf.concat(
            1, [tf.get_variable("W_P_%d" % i,
                                shape=[self._num_units, proj_shard_size],
                                initializer=self._initializer, dtype=dtype)
                for i in range(self._num_proj_shards)])
        # TODO(ebrevdo), use matmulsum
        m = tf.matmul(m, w_proj)

    return m, tf.concat(1, [c, m])

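# Example (editor's sketch, not part of the original file), mirroring
# testLSTMCell in rnn_cell_test.py below: a peephole LSTM with a projection
# layer, so outputs are num_proj wide and the state is num_units + num_proj.
#
#   cell = LSTMCell(num_units=8, input_size=2, use_peepholes=True, num_proj=6)
#   x = tf.placeholder(tf.float32, [batch_size, 2])
#   state = cell.zero_state(batch_size, tf.float32)  # [batch_size, 14]
#   m, new_state = cell(x, state)                    # m: [batch_size, 6]
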
class OutputProjectionWrapper(RNNCell):
  """Operator adding an output projection to the given cell.

  Note: in many cases it may be more efficient to not use this wrapper,
  but instead concatenate the whole sequence of your outputs in time,
  do the projection on this batch-concatenated sequence, then split it
  if needed or directly feed it into a softmax.
  """

  def __init__(self, cell, output_size):
    """Create a cell with output projection.

    Args:
      cell: an RNNCell, a projection to output_size is added to it.
      output_size: integer, the size of the output after projection.

    Raises:
      TypeError: if cell is not an RNNCell.
      ValueError: if output_size is not positive.
    """
    if not isinstance(cell, RNNCell):
      raise TypeError("The parameter cell is not an RNNCell.")
    if output_size < 1:
      raise ValueError("Parameter output_size must be > 0: %d." % output_size)
    self._cell = cell
    self._output_size = output_size

  @property
  def input_size(self):
    return self._cell.input_size

  @property
  def output_size(self):
    return self._output_size

  @property
  def state_size(self):
    return self._cell.state_size

  def __call__(self, inputs, state, scope=None):
    """Run the cell and output projection on inputs, starting from state."""
    output, res_state = self._cell(inputs, state)
    # Default scope: "OutputProjectionWrapper"
    with tf.variable_scope(scope or type(self).__name__):
      projected = linear.linear(output, self._output_size, True)
    return projected, res_state

class InputProjectionWrapper(RNNCell):
  """Operator adding an input projection to the given cell.

  Note: in many cases it may be more efficient to not use this wrapper,
  but instead concatenate the whole sequence of your inputs in time,
  do the projection on this batch-concatenated sequence, then split it.
  """

  def __init__(self, cell, input_size):
    """Create a cell with input projection.

    Args:
      cell: an RNNCell, a projection of inputs is added before it.
      input_size: integer, the size of the inputs before projection.

    Raises:
      TypeError: if cell is not an RNNCell.
      ValueError: if input_size is not positive.
    """
    if not isinstance(cell, RNNCell):
      raise TypeError("The parameter cell is not an RNNCell.")
    if input_size < 1:
      raise ValueError("Parameter input_size must be > 0: %d." % input_size)
    self._cell = cell
    self._input_size = input_size

  @property
  def input_size(self):
    return self._input_size

  @property
  def output_size(self):
    return self._cell.output_size

  @property
  def state_size(self):
    return self._cell.state_size

  def __call__(self, inputs, state, scope=None):
    """Run the input projection and then the cell."""
    # Default scope: "InputProjectionWrapper"
    with tf.variable_scope(scope or type(self).__name__):
      projected = linear.linear(inputs, self._cell.input_size, True)
    return self._cell(projected, state)

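# Example (editor's sketch, not part of the original file): the projection
# wrappers compose with any cell, as long as the sizes line up with the
# wrapped cell's input_size and output_size.
#
#   cell = OutputProjectionWrapper(GRUCell(3), 2)  # 3-wide outputs -> 2-wide
#   cell = InputProjectionWrapper(GRUCell(3), 2)   # 2-wide inputs -> 3-wide
#
# As both class docstrings note, projecting the time-concatenated sequence
# with a single matmul is usually cheaper than projecting every step.
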
class DropoutWrapper(RNNCell):
  """Operator adding dropout to inputs and outputs of the given cell."""

  def __init__(self, cell, input_keep_prob=1.0, output_keep_prob=1.0,
               seed=None):
    """Create a cell with added input and/or output dropout.

    Dropout is never used on the state.

    Args:
      cell: an RNNCell, dropout is added to its inputs and/or outputs.
      input_keep_prob: unit Tensor or float between 0 and 1, input keep
        probability; if it is a float and 1, no input dropout will be added.
      output_keep_prob: unit Tensor or float between 0 and 1, output keep
        probability; if it is a float and 1, no output dropout will be added.
      seed: (optional) integer, the randomness seed.

    Raises:
      TypeError: if cell is not an RNNCell.
      ValueError: if input_keep_prob or output_keep_prob is not between
        0 and 1.
    """
    if not isinstance(cell, RNNCell):
      raise TypeError("The parameter cell is not an RNNCell.")
    if (isinstance(input_keep_prob, float) and
        not (input_keep_prob >= 0.0 and input_keep_prob <= 1.0)):
      raise ValueError("Parameter input_keep_prob must be between 0 and 1: %g"
                       % input_keep_prob)
    if (isinstance(output_keep_prob, float) and
        not (output_keep_prob >= 0.0 and output_keep_prob <= 1.0)):
      raise ValueError("Parameter output_keep_prob must be between 0 and 1: %g"
                       % output_keep_prob)
    self._cell = cell
    self._input_keep_prob = input_keep_prob
    self._output_keep_prob = output_keep_prob
    self._seed = seed

  @property
  def input_size(self):
    return self._cell.input_size

  @property
  def output_size(self):
    return self._cell.output_size

  @property
  def state_size(self):
    return self._cell.state_size

  def __call__(self, inputs, state):
    """Run the cell with the declared dropouts."""
    if (not isinstance(self._input_keep_prob, float) or
        self._input_keep_prob < 1):
      inputs = tf.nn.dropout(inputs, self._input_keep_prob, seed=self._seed)
    output, new_state = self._cell(inputs, state)
    if (not isinstance(self._output_keep_prob, float) or
        self._output_keep_prob < 1):
      output = tf.nn.dropout(output, self._output_keep_prob, seed=self._seed)
    return output, new_state

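# Example (editor's sketch, not part of the original file): feeding the keep
# probability as a unit Tensor lets one graph serve training and evaluation;
# a Python float of 1.0 would skip the dropout ops entirely.
#
#   keep_prob = tf.placeholder(tf.float32, [])
#   cell = DropoutWrapper(GRUCell(3), input_keep_prob=keep_prob,
#                         output_keep_prob=keep_prob)
#   # Feed keep_prob as 0.5 during training and 1.0 during evaluation.
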
class EmbeddingWrapper(RNNCell):
  """Operator adding input embedding to the given cell.

  Note: in many cases it may be more efficient to not use this wrapper,
  but instead concatenate the whole sequence of your inputs in time,
  do the embedding on this batch-concatenated sequence, then split it and
  feed it into your RNN.
  """

  def __init__(self, cell, embedding_classes=0, embedding=None,
               initializer=None):
    """Create a cell with an added input embedding.

    Args:
      cell: an RNNCell, an embedding will be put before its inputs.
      embedding_classes: integer, how many symbols will be embedded.
      embedding: Variable, the embedding to use; if None, a new embedding
        will be created; if set, then embedding_classes is not required.
      initializer: an initializer to use when creating the embedding;
        if None, the initializer from variable scope or a default one is used.

    Raises:
      TypeError: if cell is not an RNNCell.
      ValueError: if neither embedding nor a positive embedding_classes was
        passed, or if the given embedding does not match the declared classes
        or the cell's input size.
    """
    if not isinstance(cell, RNNCell):
      raise TypeError("The parameter cell is not an RNNCell.")
    if embedding_classes < 1 and embedding is None:
      raise ValueError("Pass an embedding, or embedding_classes must be "
                       "> 0: %d." % embedding_classes)
    if embedding_classes > 0 and embedding is not None:
      if embedding.size[0] != embedding_classes:
        raise ValueError("You declared embedding_classes=%d but passed an "
                         "embedding for %d classes." % (embedding_classes,
                                                        embedding.size[0]))
      if embedding.size[1] != cell.input_size:
        raise ValueError("You passed an embedding with output size %d and a "
                         "cell that accepts size %d." % (embedding.size[1],
                                                         cell.input_size))
    self._cell = cell
    self._embedding_classes = embedding_classes
    self._embedding = embedding
    self._initializer = initializer

  @property
  def input_size(self):
    return 1

  @property
  def output_size(self):
    return self._cell.output_size

  @property
  def state_size(self):
    return self._cell.state_size

  def __call__(self, inputs, state, scope=None):
    """Run the cell on embedded inputs."""
    with tf.variable_scope(scope or type(self).__name__):  # "EmbeddingWrapper"
      with tf.device("/cpu:0"):
        if self._embedding:
          embedding = self._embedding
        else:
          if self._initializer:
            initializer = self._initializer
          elif tf.get_variable_scope().initializer:
            initializer = tf.get_variable_scope().initializer
          else:
            # Default initializer for embeddings should have variance=1.
            sqrt3 = math.sqrt(3)  # Uniform(-sqrt(3), sqrt(3)) has variance=1.
            initializer = tf.random_uniform_initializer(-sqrt3, sqrt3)
          embedding = tf.get_variable("embedding", [self._embedding_classes,
                                                    self._cell.input_size],
                                      initializer=initializer)
        embedded = tf.nn.embedding_lookup(embedding, tf.reshape(inputs, [-1]))
    return self._cell(embedded, state)

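# Example (editor's sketch, not part of the original file), mirroring
# testEmbeddingWrapper in rnn_cell_test.py below: inputs become symbol ids of
# shape [batch_size, 1], embedded on the CPU to the wrapped cell's input_size.
#
#   cell = EmbeddingWrapper(GRUCell(2), embedding_classes=3)
#   ids = tf.placeholder(tf.int32, [batch_size, 1])
#   out, new_state = cell(ids, state)
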
class MultiRNNCell(RNNCell):
  """RNN cell composed sequentially of multiple simple cells."""

  def __init__(self, cells):
    """Create an RNN cell composed sequentially of a number of RNNCells.

    Args:
      cells: list of RNNCells that will be composed in this order.

    Raises:
      ValueError: if cells is empty (not allowed) or if their sizes don't
        match.
    """
    if not cells:
      raise ValueError("Must specify at least one cell for MultiRNNCell.")
    for i in xrange(len(cells) - 1):
      if cells[i + 1].input_size != cells[i].output_size:
        raise ValueError("In MultiRNNCell, the input size of each next"
                         " cell must match the output size of the previous"
                         " one. Mismatched output size in cell %d." % i)
    self._cells = cells

  @property
  def input_size(self):
    return self._cells[0].input_size

  @property
  def output_size(self):
    return self._cells[-1].output_size

  @property
  def state_size(self):
    return sum([cell.state_size for cell in self._cells])

  def __call__(self, inputs, state, scope=None):
    """Run this multi-layer cell on inputs, starting from state."""
    with tf.variable_scope(scope or type(self).__name__):  # "MultiRNNCell"
      cur_state_pos = 0
      cur_inp = inputs
      new_states = []
      for i, cell in enumerate(self._cells):
        with tf.variable_scope("Cell%d" % i):
          cur_state = tf.slice(state, [0, cur_state_pos], [-1, cell.state_size])
          cur_state_pos += cell.state_size
          cur_inp, new_state = cell(cur_inp, cur_state)
          new_states.append(new_state)
    return cur_inp, tf.concat(1, new_states)

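# Example (editor's sketch, not part of the original file): stacking two GRU
# layers. The combined state concatenates the per-layer states, so its width
# is the sum of the layers' state sizes (4 here).
#
#   cell = MultiRNNCell([GRUCell(2)] * 2)
#   state = cell.zero_state(batch_size, tf.float32)  # [batch_size, 4]
#   out, new_state = cell(x, state)
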
154 tensorflow/models/rnn/rnn_cell_test.py Normal file
@ -0,0 +1,154 @@
"""Tests for RNN cells."""
|
||||
|
||||
# pylint: disable=g-bad-import-order,unused-import
|
||||
import tensorflow.python.platform
|
||||
|
||||
import numpy as np
|
||||
import tensorflow as tf
|
||||
|
||||
from tensorflow.models.rnn import rnn_cell
|
||||
|
||||
|
||||
class RNNCellTest(tf.test.TestCase):
|
||||
|
||||
def testBasicRNNCell(self):
|
||||
with self.test_session() as sess:
|
||||
with tf.variable_scope("root", initializer=tf.constant_initializer(0.5)):
|
||||
x = tf.zeros([1, 2])
|
||||
m = tf.zeros([1, 2])
|
||||
g, _ = rnn_cell.BasicRNNCell(2)(x, m)
|
||||
sess.run([tf.variables.initialize_all_variables()])
|
||||
res = sess.run([g], {x.name: np.array([[1., 1.]]),
|
||||
m.name: np.array([[0.1, 0.1]])})
|
||||
self.assertEqual(res[0].shape, (1, 2))
|
||||
|
||||
def testGRUCell(self):
|
||||
with self.test_session() as sess:
|
||||
with tf.variable_scope("root", initializer=tf.constant_initializer(0.5)):
|
||||
x = tf.zeros([1, 2])
|
||||
m = tf.zeros([1, 2])
|
||||
g, _ = rnn_cell.GRUCell(2)(x, m)
|
||||
sess.run([tf.variables.initialize_all_variables()])
|
||||
res = sess.run([g], {x.name: np.array([[1., 1.]]),
|
||||
m.name: np.array([[0.1, 0.1]])})
|
||||
# Smoke test
|
||||
self.assertAllClose(res[0], [[0.175991, 0.175991]])
|
||||
|
||||
def testBasicLSTMCell(self):
|
||||
with self.test_session() as sess:
|
||||
with tf.variable_scope("root", initializer=tf.constant_initializer(0.5)):
|
||||
x = tf.zeros([1, 2])
|
||||
m = tf.zeros([1, 8])
|
||||
g, out_m = rnn_cell.MultiRNNCell([rnn_cell.BasicLSTMCell(2)] * 2)(x, m)
|
||||
sess.run([tf.variables.initialize_all_variables()])
|
||||
res = sess.run([g, out_m], {x.name: np.array([[1., 1.]]),
|
||||
m.name: 0.1 * np.ones([1, 8])})
|
||||
self.assertEqual(len(res), 2)
|
||||
# The numbers in results were not calculated, this is just a smoke test.
|
||||
self.assertAllClose(res[0], [[0.24024698, 0.24024698]])
|
||||
expected_mem = np.array([[0.68967271, 0.68967271,
|
||||
0.44848421, 0.44848421,
|
||||
0.39897051, 0.39897051,
|
||||
0.24024698, 0.24024698]])
|
||||
self.assertAllClose(res[1], expected_mem)
|
||||
|
||||
def testLSTMCell(self):
|
||||
with self.test_session() as sess:
|
||||
num_units = 8
|
||||
num_proj = 6
|
||||
state_size = num_units + num_proj
|
||||
batch_size = 3
|
||||
input_size = 2
|
||||
with tf.variable_scope("root", initializer=tf.constant_initializer(0.5)):
|
||||
x = tf.zeros([batch_size, input_size])
|
||||
m = tf.zeros([batch_size, state_size])
|
||||
output, state = rnn_cell.LSTMCell(
|
||||
num_units=num_units, input_size=input_size, num_proj=num_proj)(x, m)
|
||||
sess.run([tf.variables.initialize_all_variables()])
|
||||
res = sess.run([output, state],
|
||||
{x.name: np.array([[1., 1.], [2., 2.], [3., 3.]]),
|
||||
m.name: 0.1 * np.ones((batch_size, state_size))})
|
||||
self.assertEqual(len(res), 2)
|
||||
# The numbers in results were not calculated, this is mostly just a
|
||||
# smoke test.
|
||||
self.assertEqual(res[0].shape, (batch_size, num_proj))
|
||||
self.assertEqual(res[1].shape, (batch_size, state_size))
|
||||
# Different inputs so different outputs and states
|
||||
for i in range(1, batch_size):
|
||||
self.assertTrue(
|
||||
float(np.linalg.norm((res[0][0,:] - res[0][i,:]))) > 1e-6)
|
||||
self.assertTrue(
|
||||
float(np.linalg.norm((res[1][0,:] - res[1][i,:]))) > 1e-6)
|
||||
|
||||
def testOutputProjectionWrapper(self):
|
||||
with self.test_session() as sess:
|
||||
with tf.variable_scope("root", initializer=tf.constant_initializer(0.5)):
|
||||
x = tf.zeros([1, 3])
|
||||
m = tf.zeros([1, 3])
|
||||
cell = rnn_cell.OutputProjectionWrapper(rnn_cell.GRUCell(3), 2)
|
||||
g, new_m = cell(x, m)
|
||||
sess.run([tf.variables.initialize_all_variables()])
|
||||
res = sess.run([g, new_m], {x.name: np.array([[1., 1., 1.]]),
|
||||
m.name: np.array([[0.1, 0.1, 0.1]])})
|
||||
self.assertEqual(res[1].shape, (1, 3))
|
||||
# The numbers in results were not calculated, this is just a smoke test.
|
||||
self.assertAllClose(res[0], [[0.231907, 0.231907]])
|
||||
|
||||
def testInputProjectionWrapper(self):
|
||||
with self.test_session() as sess:
|
||||
with tf.variable_scope("root", initializer=tf.constant_initializer(0.5)):
|
||||
x = tf.zeros([1, 2])
|
||||
m = tf.zeros([1, 3])
|
||||
cell = rnn_cell.InputProjectionWrapper(rnn_cell.GRUCell(3), 2)
|
||||
g, new_m = cell(x, m)
|
||||
sess.run([tf.variables.initialize_all_variables()])
|
||||
res = sess.run([g, new_m], {x.name: np.array([[1., 1.]]),
|
||||
m.name: np.array([[0.1, 0.1, 0.1]])})
|
||||
self.assertEqual(res[1].shape, (1, 3))
|
||||
# The numbers in results were not calculated, this is just a smoke test.
|
||||
self.assertAllClose(res[0], [[0.154605, 0.154605, 0.154605]])
|
||||
|
||||
def testDropoutWrapper(self):
|
||||
with self.test_session() as sess:
|
||||
with tf.variable_scope("root", initializer=tf.constant_initializer(0.5)):
|
||||
x = tf.zeros([1, 3])
|
||||
m = tf.zeros([1, 3])
|
||||
keep = tf.zeros([1]) + 1
|
||||
g, new_m = rnn_cell.DropoutWrapper(rnn_cell.GRUCell(3),
|
||||
keep, keep)(x, m)
|
||||
sess.run([tf.variables.initialize_all_variables()])
|
||||
res = sess.run([g, new_m], {x.name: np.array([[1., 1., 1.]]),
|
||||
m.name: np.array([[0.1, 0.1, 0.1]])})
|
||||
self.assertEqual(res[1].shape, (1, 3))
|
||||
# The numbers in results were not calculated, this is just a smoke test.
|
||||
self.assertAllClose(res[0], [[0.154605, 0.154605, 0.154605]])
|
||||
|
||||
def testEmbeddingWrapper(self):
|
||||
with self.test_session() as sess:
|
||||
with tf.variable_scope("root", initializer=tf.constant_initializer(0.5)):
|
||||
x = tf.zeros([1, 1], dtype=tf.int32)
|
||||
m = tf.zeros([1, 2])
|
||||
g, new_m = rnn_cell.EmbeddingWrapper(rnn_cell.GRUCell(2), 3)(x, m)
|
||||
sess.run([tf.variables.initialize_all_variables()])
|
||||
res = sess.run([g, new_m], {x.name: np.array([[1]]),
|
||||
m.name: np.array([[0.1, 0.1]])})
|
||||
self.assertEqual(res[1].shape, (1, 2))
|
||||
# The numbers in results were not calculated, this is just a smoke test.
|
||||
self.assertAllClose(res[0], [[0.17139, 0.17139]])
|
||||
|
||||
def testMultiRNNCell(self):
|
||||
with self.test_session() as sess:
|
||||
with tf.variable_scope("root", initializer=tf.constant_initializer(0.5)):
|
||||
x = tf.zeros([1, 2])
|
||||
m = tf.zeros([1, 4])
|
||||
_, ml = rnn_cell.MultiRNNCell([rnn_cell.GRUCell(2)] * 2)(x, m)
|
||||
sess.run([tf.variables.initialize_all_variables()])
|
||||
res = sess.run(ml, {x.name: np.array([[1., 1.]]),
|
||||
m.name: np.array([[0.1, 0.1, 0.1, 0.1]])})
|
||||
# The numbers in results were not calculated, this is just a smoke test.
|
||||
self.assertAllClose(res, [[0.175991, 0.175991,
|
||||
0.13248, 0.13248]])
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
tf.test.main()
|
||||
472 tensorflow/models/rnn/rnn_test.py Normal file
@ -0,0 +1,472 @@
"""Tests for rnn module."""
|
||||
|
||||
# pylint: disable=g-bad-import-order,unused-import
|
||||
import tensorflow.python.platform
|
||||
|
||||
import numpy as np
|
||||
import tensorflow as tf
|
||||
|
||||
from tensorflow.models.rnn import rnn
|
||||
from tensorflow.models.rnn import rnn_cell
|
||||
|
||||
|
||||
class Plus1RNNCell(rnn_cell.RNNCell):
|
||||
"""RNN Cell generating (output, new_state) = (input + 1, state + 1)."""
|
||||
|
||||
@property
|
||||
def output_size(self):
|
||||
return 5
|
||||
|
||||
@property
|
||||
def state_size(self):
|
||||
return 5
|
||||
|
||||
def __call__(self, input_, state):
|
||||
return (input_ + 1, state + 1)
|
||||
|
||||
|
||||
class TestStateSaver(object):
|
||||
|
||||
def __init__(self, batch_size, state_size):
|
||||
self._batch_size = batch_size
|
||||
self._state_size = state_size
|
||||
|
||||
def State(self, _):
|
||||
return tf.zeros(tf.pack([self._batch_size, self._state_size]))
|
||||
|
||||
def SaveState(self, _, state):
|
||||
self.saved_state = state
|
||||
return tf.identity(state)
|
||||
|
||||
|
||||
class RNNTest(tf.test.TestCase):
|
||||
|
||||
def setUp(self):
|
||||
self._seed = 23489
|
||||
np.random.seed(self._seed)
|
||||
|
||||
def testRNN(self):
|
||||
cell = Plus1RNNCell()
|
||||
batch_size = 2
|
||||
inputs = [tf.placeholder(tf.float32, shape=(batch_size, 5))] * 10
|
||||
outputs, states = rnn.rnn(cell, inputs, dtype=tf.float32)
|
||||
self.assertEqual(len(outputs), len(inputs))
|
||||
for out, inp in zip(outputs, inputs):
|
||||
self.assertEqual(out.get_shape(), inp.get_shape())
|
||||
self.assertEqual(out.dtype, inp.dtype)
|
||||
|
||||
with self.test_session(use_gpu=False) as sess:
|
||||
input_value = np.random.randn(batch_size, 5)
|
||||
values = sess.run(outputs + [states[-1]],
|
||||
feed_dict={inputs[0]: input_value})
|
||||
|
||||
# Outputs
|
||||
for v in values[:-1]:
|
||||
self.assertAllClose(v, input_value + 1.0)
|
||||
|
||||
# Final state
|
||||
self.assertAllClose(
|
||||
values[-1], 10.0*np.ones((batch_size, 5), dtype=np.float32))
|
||||
|
||||
def testDropout(self):
|
||||
cell = Plus1RNNCell()
|
||||
full_dropout_cell = rnn_cell.DropoutWrapper(
|
||||
cell, input_keep_prob=1e-12, seed=0)
|
||||
batch_size = 2
|
||||
inputs = [tf.placeholder(tf.float32, shape=(batch_size, 5))] * 10
|
||||
with tf.variable_scope("share_scope"):
|
||||
outputs, states = rnn.rnn(cell, inputs, dtype=tf.float32)
|
||||
with tf.variable_scope("drop_scope"):
|
||||
dropped_outputs, _ = rnn.rnn(full_dropout_cell, inputs, dtype=tf.float32)
|
||||
self.assertEqual(len(outputs), len(inputs))
|
||||
for out, inp in zip(outputs, inputs):
|
||||
self.assertEqual(out.get_shape().as_list(), inp.get_shape().as_list())
|
||||
self.assertEqual(out.dtype, inp.dtype)
|
||||
|
||||
with self.test_session(use_gpu=False) as sess:
|
||||
input_value = np.random.randn(batch_size, 5)
|
||||
values = sess.run(outputs + [states[-1]],
|
||||
feed_dict={inputs[0]: input_value})
|
||||
full_dropout_values = sess.run(dropped_outputs,
|
||||
feed_dict={inputs[0]: input_value})
|
||||
|
||||
for v in values[:-1]:
|
||||
self.assertAllClose(v, input_value + 1.0)
|
||||
for d_v in full_dropout_values[:-1]: # Add 1.0 to dropped_out (all zeros)
|
||||
self.assertAllClose(d_v, np.ones_like(input_value))
|
||||
|
||||
def testDynamicCalculation(self):
|
||||
cell = Plus1RNNCell()
|
||||
sequence_length = tf.placeholder(tf.int64)
|
||||
batch_size = 2
|
||||
inputs = [tf.placeholder(tf.float32, shape=(batch_size, 5))] * 10
|
||||
with tf.variable_scope("drop_scope"):
|
||||
dynamic_outputs, dynamic_states = rnn.rnn(
|
||||
cell, inputs, sequence_length=sequence_length, dtype=tf.float32)
|
||||
self.assertEqual(len(dynamic_outputs), len(inputs))
|
||||
self.assertEqual(len(dynamic_states), len(inputs))
|
||||
|
||||
with self.test_session(use_gpu=False) as sess:
|
||||
input_value = np.random.randn(batch_size, 5)
|
||||
dynamic_values = sess.run(dynamic_outputs,
|
||||
feed_dict={inputs[0]: input_value,
|
||||
sequence_length: [2, 3]})
|
||||
dynamic_state_values = sess.run(dynamic_states,
|
||||
feed_dict={inputs[0]: input_value,
|
||||
sequence_length: [2, 3]})
|
||||
|
||||
# fully calculated for t = 0, 1, 2
|
||||
for v in dynamic_values[:3]:
|
||||
self.assertAllClose(v, input_value + 1.0)
|
||||
for vi, v in enumerate(dynamic_state_values[:3]):
|
||||
self.assertAllEqual(v, 1.0 * (vi + 1) * np.ones((batch_size, 5)))
|
||||
# zeros for t = 3+
|
||||
for v in dynamic_values[3:]:
|
||||
self.assertAllEqual(v, np.zeros_like(input_value))
|
||||
for v in dynamic_state_values[3:]:
|
||||
self.assertAllEqual(v, np.zeros_like(input_value))
|
||||
|
||||
|
||||
class LSTMTest(tf.test.TestCase):
|
||||
|
||||
def setUp(self):
|
||||
self._seed = 23489
|
||||
np.random.seed(self._seed)
|
||||
|
||||
def _testNoProjNoSharding(self, use_gpu):
|
||||
num_units = 3
|
||||
input_size = 5
|
||||
batch_size = 2
|
||||
with self.test_session(use_gpu=use_gpu, graph=tf.Graph()) as sess:
|
||||
initializer = tf.random_uniform_initializer(-0.01, 0.01, seed=self._seed)
|
||||
cell = rnn_cell.LSTMCell(
|
||||
num_units, input_size, initializer=initializer)
|
||||
inputs = 10 * [
|
||||
tf.placeholder(tf.float32, shape=(batch_size, input_size))]
|
||||
outputs, _ = rnn.rnn(cell, inputs, dtype=tf.float32)
|
||||
self.assertEqual(len(outputs), len(inputs))
|
||||
for out in outputs:
|
||||
self.assertEqual(out.get_shape().as_list(), [batch_size, num_units])
|
||||
|
||||
tf.initialize_all_variables().run()
|
||||
input_value = np.random.randn(batch_size, input_size)
|
||||
sess.run(outputs, feed_dict={inputs[0]: input_value})
|
||||
|
||||
def _testCellClipping(self, use_gpu):
|
||||
num_units = 3
|
||||
input_size = 5
|
||||
batch_size = 2
|
||||
with self.test_session(use_gpu=use_gpu, graph=tf.Graph()) as sess:
|
||||
initializer = tf.random_uniform_initializer(-0.01, 0.01, seed=self._seed)
|
||||
cell = rnn_cell.LSTMCell(
|
||||
num_units, input_size, use_peepholes=True,
|
||||
cell_clip=0.0, initializer=initializer)
|
||||
inputs = 10 * [
|
||||
tf.placeholder(tf.float32, shape=(batch_size, input_size))]
|
||||
outputs, _ = rnn.rnn(cell, inputs, dtype=tf.float32)
|
||||
self.assertEqual(len(outputs), len(inputs))
|
||||
for out in outputs:
|
||||
self.assertEqual(out.get_shape().as_list(), [batch_size, num_units])
|
||||
|
||||
tf.initialize_all_variables().run()
|
||||
input_value = np.random.randn(batch_size, input_size)
|
||||
values = sess.run(outputs, feed_dict={inputs[0]: input_value})
|
||||
|
||||
for value in values:
|
||||
# if cell c is clipped to 0, tanh(c) = 0 => m==0
|
||||
self.assertAllEqual(value, np.zeros((batch_size, num_units)))
|
||||
|
||||
def _testNoProjNoShardingSimpleStateSaver(self, use_gpu):
|
||||
num_units = 3
|
||||
input_size = 5
|
||||
batch_size = 2
|
||||
with self.test_session(use_gpu=use_gpu, graph=tf.Graph()) as sess:
|
||||
initializer = tf.random_uniform_initializer(-0.01, 0.01, seed=self._seed)
|
||||
state_saver = TestStateSaver(batch_size, 2*num_units)
|
||||
cell = rnn_cell.LSTMCell(
|
||||
num_units, input_size, use_peepholes=False, initializer=initializer)
|
||||
inputs = 10 * [
|
||||
tf.placeholder(tf.float32, shape=(batch_size, input_size))]
|
||||
with tf.variable_scope("share_scope"):
|
||||
outputs, states = rnn.state_saving_rnn(
|
||||
cell, inputs, state_saver=state_saver, state_name="save_lstm")
|
||||
self.assertEqual(len(outputs), len(inputs))
|
||||
for out in outputs:
|
||||
self.assertEqual(out.get_shape().as_list(), [batch_size, num_units])
|
||||
|
||||
tf.initialize_all_variables().run()
|
||||
input_value = np.random.randn(batch_size, input_size)
|
||||
(last_state_value, saved_state_value) = sess.run(
|
||||
[states[-1], state_saver.saved_state],
|
||||
feed_dict={inputs[0]: input_value})
|
||||
self.assertAllEqual(last_state_value, saved_state_value)
|
||||
|
||||
def _testProjNoSharding(self, use_gpu):
|
||||
num_units = 3
|
||||
input_size = 5
|
||||
batch_size = 2
|
||||
num_proj = 4
|
||||
with self.test_session(use_gpu=use_gpu, graph=tf.Graph()) as sess:
|
||||
initializer = tf.random_uniform_initializer(-0.01, 0.01, seed=self._seed)
|
||||
inputs = 10 * [
|
||||
tf.placeholder(tf.float32, shape=(None, input_size))]
|
||||
cell = rnn_cell.LSTMCell(
|
||||
num_units, input_size, use_peepholes=True,
|
||||
num_proj=num_proj, initializer=initializer)
|
||||
outputs, _ = rnn.rnn(cell, inputs, dtype=tf.float32)
|
||||
self.assertEqual(len(outputs), len(inputs))
|
||||
|
||||
tf.initialize_all_variables().run()
|
||||
input_value = np.random.randn(batch_size, input_size)
|
||||
sess.run(outputs, feed_dict={inputs[0]: input_value})
|
||||
|
||||
def _testProjSharding(self, use_gpu):
|
||||
num_units = 3
|
||||
input_size = 5
|
||||
batch_size = 2
|
||||
num_proj = 4
|
||||
num_proj_shards = 4
|
||||
num_unit_shards = 2
|
||||
with self.test_session(use_gpu=use_gpu, graph=tf.Graph()) as sess:
|
||||
initializer = tf.random_uniform_initializer(-0.01, 0.01, seed=self._seed)
|
||||
|
||||
inputs = 10 * [
|
||||
tf.placeholder(tf.float32, shape=(None, input_size))]
|
||||
|
||||
cell = rnn_cell.LSTMCell(
|
||||
num_units,
|
||||
input_size=input_size,
|
||||
use_peepholes=True,
|
||||
num_proj=num_proj,
|
||||
num_unit_shards=num_unit_shards,
|
||||
num_proj_shards=num_proj_shards,
|
||||
initializer=initializer)
|
||||
|
||||
outputs, _ = rnn.rnn(cell, inputs, dtype=tf.float32)
|
||||
|
||||
self.assertEqual(len(outputs), len(inputs))
|
||||
|
||||
tf.initialize_all_variables().run()
|
||||
input_value = np.random.randn(batch_size, input_size)
|
||||
sess.run(outputs, feed_dict={inputs[0]: input_value})
|
||||
|
||||
def _testDoubleInput(self, use_gpu):
|
||||
num_units = 3
|
||||
input_size = 5
|
||||
batch_size = 2
|
||||
num_proj = 4
|
||||
num_proj_shards = 4
|
||||
num_unit_shards = 2
|
||||
with self.test_session(use_gpu=use_gpu, graph=tf.Graph()) as sess:
|
||||
initializer = tf.random_uniform_initializer(-1, 1, seed=self._seed)
|
||||
inputs = 10 * [tf.placeholder(tf.float64)]
|
||||
|
||||
cell = rnn_cell.LSTMCell(
|
||||
num_units,
|
||||
input_size=input_size,
|
||||
use_peepholes=True,
|
||||
num_proj=num_proj,
|
||||
num_unit_shards=num_unit_shards,
|
||||
num_proj_shards=num_proj_shards,
|
||||
initializer=initializer)
|
||||
|
||||
outputs, _ = rnn.rnn(
|
||||
cell, inputs, initial_state=cell.zero_state(batch_size, tf.float64))
|
||||
|
||||
self.assertEqual(len(outputs), len(inputs))
|
||||
|
||||
tf.initialize_all_variables().run()
|
||||
input_value = np.asarray(np.random.randn(batch_size, input_size),
|
||||
dtype=np.float64)
|
||||
values = sess.run(outputs, feed_dict={inputs[0]: input_value})
|
||||
self.assertEqual(values[0].dtype, input_value.dtype)
|
||||
|
||||
def _testShardNoShardEquivalentOutput(self, use_gpu):
|
||||
num_units = 3
|
||||
input_size = 5
|
||||
batch_size = 2
|
||||
num_proj = 4
|
||||
num_proj_shards = 4
|
||||
num_unit_shards = 2
|
||||
with self.test_session(use_gpu=use_gpu, graph=tf.Graph()) as sess:
|
||||
inputs = 10 * [tf.placeholder(tf.float32)]
|
||||
initializer = tf.constant_initializer(0.001)
|
||||
|
||||
cell_noshard = rnn_cell.LSTMCell(
|
||||
num_units, input_size,
|
||||
num_proj=num_proj,
|
||||
use_peepholes=True,
|
||||
initializer=initializer,
|
||||
num_unit_shards=num_unit_shards,
|
||||
num_proj_shards=num_proj_shards)
|
||||
|
||||
cell_shard = rnn_cell.LSTMCell(
|
||||
num_units, input_size, use_peepholes=True,
|
||||
initializer=initializer, num_proj=num_proj)
|
||||
|
||||
with tf.variable_scope("noshard_scope"):
|
||||
outputs_noshard, states_noshard = rnn.rnn(
|
||||
cell_noshard, inputs, dtype=tf.float32)
|
||||
with tf.variable_scope("shard_scope"):
|
||||
outputs_shard, states_shard = rnn.rnn(
|
||||
cell_shard, inputs, dtype=tf.float32)
|
||||
|
||||
self.assertEqual(len(outputs_noshard), len(inputs))
|
||||
self.assertEqual(len(outputs_noshard), len(outputs_shard))
|
||||
|
||||
tf.initialize_all_variables().run()
|
||||
input_value = np.random.randn(batch_size, input_size)
|
||||
feeds = dict((x, input_value) for x in inputs)
|
||||
values_noshard = sess.run(outputs_noshard, feed_dict=feeds)
|
||||
values_shard = sess.run(outputs_shard, feed_dict=feeds)
|
||||
state_values_noshard = sess.run(states_noshard, feed_dict=feeds)
|
||||
state_values_shard = sess.run(states_shard, feed_dict=feeds)
|
||||
self.assertEqual(len(values_noshard), len(values_shard))
|
||||
self.assertEqual(len(state_values_noshard), len(state_values_shard))
|
||||
for (v_noshard, v_shard) in zip(values_noshard, values_shard):
|
||||
self.assertAllClose(v_noshard, v_shard, atol=1e-3)
|
||||
for (s_noshard, s_shard) in zip(state_values_noshard, state_values_shard):
|
||||
self.assertAllClose(s_noshard, s_shard, atol=1e-3)
|
||||
|
||||
def _testDoubleInputWithDropoutAndDynamicCalculation(
|
||||
self, use_gpu):
|
||||
"""Smoke test for using LSTM with doubles, dropout, dynamic calculation."""
|
||||
|
||||
num_units = 3
|
||||
input_size = 5
|
||||
batch_size = 2
|
||||
num_proj = 4
|
||||
num_proj_shards = 4
|
||||
num_unit_shards = 2
|
||||
with self.test_session(use_gpu=use_gpu, graph=tf.Graph()) as sess:
|
||||
sequence_length = tf.placeholder(tf.int64)
|
||||
initializer = tf.random_uniform_initializer(-0.01, 0.01, seed=self._seed)
|
||||
inputs = 10 * [tf.placeholder(tf.float64)]
|
||||
|
||||
cell = rnn_cell.LSTMCell(
|
||||
num_units,
|
||||
input_size=input_size,
|
||||
use_peepholes=True,
|
||||
num_proj=num_proj,
|
||||
num_unit_shards=num_unit_shards,
|
||||
num_proj_shards=num_proj_shards,
|
||||
initializer=initializer)
|
||||
dropout_cell = rnn_cell.DropoutWrapper(cell, 0.5, seed=0)
|
||||
|
||||
outputs, states = rnn.rnn(
|
||||
dropout_cell, inputs, sequence_length=sequence_length,
|
||||
initial_state=cell.zero_state(batch_size, tf.float64))
|
||||
|
||||
self.assertEqual(len(outputs), len(inputs))
|
||||
self.assertEqual(len(outputs), len(states))
|
||||
|
||||
tf.initialize_all_variables().run()
|
||||
input_value = np.asarray(np.random.randn(batch_size, input_size),
|
||||
dtype=np.float64)
|
||||
values = sess.run(outputs, feed_dict={inputs[0]: input_value,
|
||||
sequence_length: [2, 3]})
|
||||
state_values = sess.run(states, feed_dict={inputs[0]: input_value,
|
||||
sequence_length: [2, 3]})
|
||||
self.assertEqual(values[0].dtype, input_value.dtype)
|
||||
self.assertEqual(state_values[0].dtype, input_value.dtype)
|
||||
|
||||
def testSharingWeightsWithReuse(self):
|
||||
num_units = 3
|
||||
input_size = 5
|
||||
batch_size = 2
|
||||
num_proj = 4
|
||||
with self.test_session(graph=tf.Graph()) as sess:
|
||||
initializer = tf.random_uniform_initializer(-1, 1, seed=self._seed)
|
||||
inputs = 10 * [
|
||||
tf.placeholder(tf.float32, shape=(None, input_size))]
|
||||
cell = rnn_cell.LSTMCell(
|
||||
num_units, input_size, use_peepholes=True,
|
||||
num_proj=num_proj, initializer=initializer)
|
||||
|
||||
with tf.variable_scope("share_scope"):
|
||||
outputs0, _ = rnn.rnn(cell, inputs, dtype=tf.float32)
|
||||
with tf.variable_scope("share_scope", reuse=True):
|
||||
outputs1, _ = rnn.rnn(cell, inputs, dtype=tf.float32)
|
||||
with tf.variable_scope("diff_scope"):
|
||||
outputs2, _ = rnn.rnn(cell, inputs, dtype=tf.float32)
|
||||
|
||||
tf.initialize_all_variables().run()
|
||||
input_value = np.random.randn(batch_size, input_size)
|
||||
output_values = sess.run(
|
||||
outputs0 + outputs1 + outputs2, feed_dict={inputs[0]: input_value})
|
||||
outputs0_values = output_values[:10]
|
||||
outputs1_values = output_values[10:20]
|
||||
outputs2_values = output_values[20:]
|
||||
self.assertEqual(len(outputs0_values), len(outputs1_values))
|
||||
self.assertEqual(len(outputs0_values), len(outputs2_values))
|
||||
for o1, o2, o3 in zip(outputs0_values, outputs1_values, outputs2_values):
|
||||
# Same weights used by both RNNs so outputs should be the same.
|
||||
self.assertAllEqual(o1, o2)
|
||||
# Different weights used so outputs should be different.
|
||||
self.assertTrue(np.linalg.norm(o1-o3) > 1e-6)
|
||||
|
||||
def testSharingWeightsWithDifferentNamescope(self):
|
||||
num_units = 3
|
||||
input_size = 5
|
||||
batch_size = 2
|
||||
num_proj = 4
|
||||
with self.test_session(graph=tf.Graph()) as sess:
|
||||
initializer = tf.random_uniform_initializer(-1, 1, seed=self._seed)
|
||||
inputs = 10 * [
|
||||
tf.placeholder(tf.float32, shape=(None, input_size))]
|
||||
cell = rnn_cell.LSTMCell(
|
||||
num_units, input_size, use_peepholes=True,
|
||||
num_proj=num_proj, initializer=initializer)
|
||||
|
||||
with tf.name_scope("scope0"):
|
||||
with tf.variable_scope("share_scope"):
|
||||
outputs0, _ = rnn.rnn(cell, inputs, dtype=tf.float32)
|
||||
with tf.name_scope("scope1"):
|
||||
with tf.variable_scope("share_scope", reuse=True):
|
||||
outputs1, _ = rnn.rnn(cell, inputs, dtype=tf.float32)
|
||||
|
||||
tf.initialize_all_variables().run()
|
||||
input_value = np.random.randn(batch_size, input_size)
|
||||
output_values = sess.run(
|
||||
outputs0 + outputs1, feed_dict={inputs[0]: input_value})
|
||||
outputs0_values = output_values[:10]
|
||||
outputs1_values = output_values[10:]
|
||||
self.assertEqual(len(outputs0_values), len(outputs1_values))
|
||||
for out0, out1 in zip(outputs0_values, outputs1_values):
|
||||
self.assertAllEqual(out0, out1)
|
||||
|
||||
def testNoProjNoShardingSimpleStateSaver(self):
|
||||
self._testNoProjNoShardingSimpleStateSaver(False)
|
||||
self._testNoProjNoShardingSimpleStateSaver(True)
|
||||
|
||||
def testNoProjNoSharding(self):
|
||||
self._testNoProjNoSharding(False)
|
||||
self._testNoProjNoSharding(True)
|
||||
|
||||
def testCellClipping(self):
|
||||
self._testCellClipping(False)
|
||||
self._testCellClipping(True)
|
||||
|
||||
def testProjNoSharding(self):
|
||||
self._testProjNoSharding(False)
|
||||
self._testProjNoSharding(True)
|
||||
|
||||
def testProjSharding(self):
|
||||
self._testProjSharding(False)
|
||||
self._testProjSharding(True)
|
||||
|
||||
def testShardNoShardEquivalentOutput(self):
|
||||
self._testShardNoShardEquivalentOutput(False)
|
||||
self._testShardNoShardEquivalentOutput(True)
|
||||
|
||||
def testDoubleInput(self):
|
||||
self._testDoubleInput(False)
|
||||
self._testDoubleInput(True)
|
||||
|
||||
def testDoubleInputWithDropoutAndDynamicCalculation(self):
|
||||
self._testDoubleInputWithDropoutAndDynamicCalculation(False)
|
||||
self._testDoubleInputWithDropoutAndDynamicCalculation(True)
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
tf.test.main()
|
||||
749 tensorflow/models/rnn/seq2seq.py Normal file
@ -0,0 +1,749 @@
"""Library for creating sequence-to-sequence models."""
|
||||
|
||||
import tensorflow.python.platform
|
||||
|
||||
import tensorflow as tf
|
||||
|
||||
from tensorflow.models.rnn import linear
|
||||
from tensorflow.models.rnn import rnn
|
||||
from tensorflow.models.rnn import rnn_cell
|
||||
|
||||
|
||||
def rnn_decoder(decoder_inputs, initial_state, cell, loop_function=None,
|
||||
scope=None):
|
||||
"""RNN decoder for the sequence-to-sequence model.
|
||||
|
||||
Args:
|
||||
decoder_inputs: a list of 2D Tensors [batch_size x cell.input_size].
|
||||
initial_state: 2D Tensor with shape [batch_size x cell.state_size].
|
||||
cell: rnn_cell.RNNCell defining the cell function and size.
|
||||
loop_function: if not None, this function will be applied to i-th output
|
||||
in order to generate i+1-th input, and decoder_inputs will be ignored,
|
||||
except for the first element ("GO" symbol). This can be used for decoding,
|
||||
but also for training to emulate http://arxiv.org/pdf/1506.03099v2.pdf.
|
||||
Signature -- loop_function(prev, i) = next
|
||||
* prev is a 2D Tensor of shape [batch_size x cell.output_size],
|
||||
* i is an integer, the step number (when advanced control is needed),
|
||||
* next is a 2D Tensor of shape [batch_size x cell.input_size].
|
||||
scope: VariableScope for the created subgraph; defaults to "rnn_decoder".
|
||||
|
||||
Returns:
|
||||
outputs: A list of the same length as decoder_inputs of 2D Tensors with
|
||||
shape [batch_size x cell.output_size] containing generated outputs.
|
||||
states: The state of each cell in each time-step. This is a list with
|
||||
length len(decoder_inputs) -- one item for each time-step.
|
||||
Each item is a 2D Tensor of shape [batch_size x cell.state_size].
|
||||
(Note that in some cases, like basic RNN cell or GRU cell, outputs and
|
||||
states can be the same. They are different for LSTM cells though.)
|
||||
"""
|
||||
with tf.variable_scope(scope or "rnn_decoder"):
|
||||
states = [initial_state]
|
||||
outputs = []
|
||||
prev = None
|
||||
for i in xrange(len(decoder_inputs)):
|
||||
inp = decoder_inputs[i]
|
||||
if loop_function is not None and prev is not None:
|
||||
with tf.variable_scope("loop_function", reuse=True):
|
||||
# We do not propagate gradients over the loop function.
|
||||
inp = tf.stop_gradient(loop_function(prev, i))
|
||||
if i > 0:
|
||||
tf.get_variable_scope().reuse_variables()
|
||||
output, new_state = cell(inp, states[-1])
|
||||
outputs.append(output)
|
||||
states.append(new_state)
|
||||
if loop_function is not None:
|
||||
prev = tf.stop_gradient(output)
|
||||
return outputs, states
|
||||
|
||||
|
||||
def basic_rnn_seq2seq(
    encoder_inputs, decoder_inputs, cell, dtype=tf.float32, scope=None):
  """Basic RNN sequence-to-sequence model.

  This model first runs an RNN to encode encoder_inputs into a state vector,
  and then runs a decoder, initialized with the last encoder state, on
  decoder_inputs. Encoder and decoder use the same RNN cell type, but don't
  share parameters.

  Args:
    encoder_inputs: a list of 2D Tensors [batch_size x cell.input_size].
    decoder_inputs: a list of 2D Tensors [batch_size x cell.input_size].
    cell: rnn_cell.RNNCell defining the cell function and size.
    dtype: The dtype of the initial state of the RNN cell (default: tf.float32).
    scope: VariableScope for the created subgraph; default: "basic_rnn_seq2seq".

  Returns:
    outputs: A list of the same length as decoder_inputs of 2D Tensors with
      shape [batch_size x cell.output_size] containing the generated outputs.
    states: The state of each decoder cell in each time-step. This is a list
      with length len(decoder_inputs) -- one item for each time-step.
      Each item is a 2D Tensor of shape [batch_size x cell.state_size].
  """
  with tf.variable_scope(scope or "basic_rnn_seq2seq"):
    _, enc_states = rnn.rnn(cell, encoder_inputs, dtype=dtype)
    return rnn_decoder(decoder_inputs, enc_states[-1], cell)

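# Example (editor's sketch, not part of the original file): encoder and
# decoder are each unrolled over lists of [batch_size x cell.input_size]
# tensors, and the decoder starts from the encoder's final state.
#
#   cell = rnn_cell.GRUCell(num_units)
#   enc_inputs = [tf.placeholder(tf.float32, [batch_size, num_units])
#                 for _ in range(encoder_steps)]
#   dec_inputs = [tf.placeholder(tf.float32, [batch_size, num_units])
#                 for _ in range(decoder_steps)]
#   outputs, states = basic_rnn_seq2seq(enc_inputs, dec_inputs, cell)
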
def tied_rnn_seq2seq(encoder_inputs, decoder_inputs, cell,
                     loop_function=None, dtype=tf.float32, scope=None):
  """RNN sequence-to-sequence model with tied encoder and decoder parameters.

  This model first runs an RNN to encode encoder_inputs into a state vector,
  and then runs a decoder, initialized with the last encoder state, on
  decoder_inputs. Encoder and decoder use the same RNN cell and share
  parameters.

  Args:
    encoder_inputs: a list of 2D Tensors [batch_size x cell.input_size].
    decoder_inputs: a list of 2D Tensors [batch_size x cell.input_size].
    cell: rnn_cell.RNNCell defining the cell function and size.
    loop_function: if not None, this function will be applied to the i-th
      output in order to generate the (i+1)-th input, and decoder_inputs will
      be ignored, except for the first element ("GO" symbol); see rnn_decoder
      for details.
    dtype: The dtype of the initial state of the rnn cell (default: tf.float32).
    scope: VariableScope for the created subgraph; default: "tied_rnn_seq2seq".

  Returns:
    outputs: A list of the same length as decoder_inputs of 2D Tensors with
      shape [batch_size x cell.output_size] containing the generated outputs.
    states: The state of each decoder cell in each time-step. This is a list
      with length len(decoder_inputs) -- one item for each time-step.
      Each item is a 2D Tensor of shape [batch_size x cell.state_size].
  """
  with tf.variable_scope("combined_tied_rnn_seq2seq"):
    scope = scope or "tied_rnn_seq2seq"
    _, enc_states = rnn.rnn(
        cell, encoder_inputs, dtype=dtype, scope=scope)
    tf.get_variable_scope().reuse_variables()
    return rnn_decoder(decoder_inputs, enc_states[-1], cell,
                       loop_function=loop_function, scope=scope)

def embedding_rnn_decoder(decoder_inputs, initial_state, cell, num_symbols,
                          output_projection=None, feed_previous=False,
                          scope=None):
  """RNN decoder with embedding and a pure-decoding option.

  Args:
    decoder_inputs: a list of 1D batch-sized int32 Tensors (decoder inputs).
    initial_state: 2D Tensor [batch_size x cell.state_size].
    cell: rnn_cell.RNNCell defining the cell function.
    num_symbols: integer, how many symbols come into the embedding.
    output_projection: None or a pair (W, B) of output projection weights and
      biases; W has shape [cell.output_size x num_symbols] and B has shape
      [num_symbols]; if provided and feed_previous=True, each fed previous
      output will first be multiplied by W and have B added.
    feed_previous: Boolean; if True, only the first of decoder_inputs will be
      used (the "GO" symbol), and all other decoder inputs will be generated by
      next = embedding_lookup(embedding, argmax(previous_output)).
      In effect, this implements a greedy decoder. It can also be used
      during training to emulate http://arxiv.org/pdf/1506.03099v2.pdf.
      If False, decoder_inputs are used as given (the standard decoder case).
    scope: VariableScope for the created subgraph; defaults to
      "embedding_rnn_decoder".

  Returns:
    outputs: A list of the same length as decoder_inputs of 2D Tensors with
      shape [batch_size x cell.output_size] containing the generated outputs.
    states: The state of each decoder cell in each time-step. This is a list
      with length len(decoder_inputs) -- one item for each time-step.
      Each item is a 2D Tensor of shape [batch_size x cell.state_size].

  Raises:
    ValueError: when output_projection has the wrong shape.
  """
  if output_projection is not None:
    proj_weights = tf.convert_to_tensor(output_projection[0], dtype=tf.float32)
    proj_weights.get_shape().assert_is_compatible_with([cell.output_size,
                                                        num_symbols])
    proj_biases = tf.convert_to_tensor(output_projection[1], dtype=tf.float32)
    proj_biases.get_shape().assert_is_compatible_with([num_symbols])

  with tf.variable_scope(scope or "embedding_rnn_decoder"):
    with tf.device("/cpu:0"):
      embedding = tf.get_variable("embedding", [num_symbols, cell.input_size])

    def extract_argmax_and_embed(prev, _):
      """Loop_function that extracts the symbol from prev and embeds it."""
      if output_projection is not None:
        prev = tf.nn.xw_plus_b(prev, output_projection[0], output_projection[1])
      prev_symbol = tf.stop_gradient(tf.argmax(prev, 1))
      return tf.nn.embedding_lookup(embedding, prev_symbol)

    loop_function = None
    if feed_previous:
      loop_function = extract_argmax_and_embed

    emb_inp = [tf.nn.embedding_lookup(embedding, i) for i in decoder_inputs]
    return rnn_decoder(emb_inp, initial_state, cell,
                       loop_function=loop_function)

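# A minimal illustrative sketch (not used by the library; the helper name and
# toy values below are made up for this example): the greedy step that
# extract_argmax_and_embed performs for feed_previous decoding -- argmax over
# the previous output's logits, then an embedding lookup, with the gradient
# stopped at the symbol choice.
def _greedy_step_sketch():
  embedding = tf.constant([[0., 0.], [1., 1.], [2., 2.]])  # 3 symbols, size 2.
  prev_logits = tf.constant([[0.1, 2.0, 0.3]])  # Batch of 1, 3 symbols.
  prev_symbol = tf.stop_gradient(tf.argmax(prev_logits, 1))  # -> [1]
  return tf.nn.embedding_lookup(embedding, prev_symbol)  # -> [[1., 1.]]
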
def embedding_rnn_seq2seq(encoder_inputs, decoder_inputs, cell,
                          num_encoder_symbols, num_decoder_symbols,
                          output_projection=None, feed_previous=False,
                          dtype=tf.float32, scope=None):
  """Embedding RNN sequence-to-sequence model.

  This model first embeds encoder_inputs by a newly created embedding (of shape
  [num_encoder_symbols x cell.input_size]). Then it runs an RNN to encode
  embedded encoder_inputs into a state vector. Next, it embeds decoder_inputs
  by another newly created embedding (of shape [num_decoder_symbols x
  cell.input_size]). Then it runs an RNN decoder, initialized with the last
  encoder state, on embedded decoder_inputs.

  Args:
    encoder_inputs: a list of 1D batch-sized int32 Tensors (encoder inputs).
    decoder_inputs: a list of 1D batch-sized int32 Tensors (decoder inputs).
    cell: rnn_cell.RNNCell defining the cell function and size.
    num_encoder_symbols: integer; number of symbols on the encoder side.
    num_decoder_symbols: integer; number of symbols on the decoder side.
    output_projection: None or a pair (W, B) of output projection weights and
      biases; W has shape [cell.output_size x num_decoder_symbols] and B has
      shape [num_decoder_symbols]; if provided and feed_previous=True, each
      fed previous output will first be multiplied by W and have B added.
    feed_previous: Boolean or scalar Boolean Tensor; if True, only the first
      of decoder_inputs will be used (the "GO" symbol), and all other decoder
      inputs will be taken from previous outputs (as in embedding_rnn_decoder).
      If False, decoder_inputs are used as given (the standard decoder case).
    dtype: The dtype of the initial state for both the encoder and decoder
      rnn cells (default: tf.float32).
    scope: VariableScope for the created subgraph; defaults to
      "embedding_rnn_seq2seq".

  Returns:
    outputs: A list of the same length as decoder_inputs of 2D Tensors with
      shape [batch_size x num_decoder_symbols] containing the generated outputs.
    states: The state of each decoder cell in each time-step. This is a list
      with length len(decoder_inputs) -- one item for each time-step.
      Each item is a 2D Tensor of shape [batch_size x cell.state_size].
  """
  with tf.variable_scope(scope or "embedding_rnn_seq2seq"):
    # Encoder.
    encoder_cell = rnn_cell.EmbeddingWrapper(cell, num_encoder_symbols)
    _, encoder_states = rnn.rnn(encoder_cell, encoder_inputs, dtype=dtype)

    # Decoder.
    if output_projection is None:
      cell = rnn_cell.OutputProjectionWrapper(cell, num_decoder_symbols)

    if isinstance(feed_previous, bool):
      return embedding_rnn_decoder(decoder_inputs, encoder_states[-1], cell,
                                   num_decoder_symbols, output_projection,
                                   feed_previous)
    else:  # If feed_previous is a Tensor, we construct 2 graphs and use cond.
      outputs1, states1 = embedding_rnn_decoder(
          decoder_inputs, encoder_states[-1], cell, num_decoder_symbols,
          output_projection, True)
      tf.get_variable_scope().reuse_variables()
      outputs2, states2 = embedding_rnn_decoder(
          decoder_inputs, encoder_states[-1], cell, num_decoder_symbols,
          output_projection, False)

      outputs = tf.control_flow_ops.cond(feed_previous,
                                         lambda: outputs1, lambda: outputs2)
      states = tf.control_flow_ops.cond(feed_previous,
                                        lambda: states1, lambda: states2)
      return outputs, states

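# Illustrative usage sketch (hypothetical, mirroring the unit tests further
# below; all sizes are toy values): wiring embedding_rnn_seq2seq for greedy
# decoding with feed_previous=True.
def _embedding_rnn_seq2seq_sketch():
  cell = rnn_cell.GRUCell(8)
  # One 1D batch-sized int32 Tensor of symbol ids per time-step.
  enc_inp = [tf.placeholder(tf.int32, shape=[None]) for _ in xrange(3)]
  dec_inp = [tf.placeholder(tf.int32, shape=[None]) for _ in xrange(4)]
  outputs, states = embedding_rnn_seq2seq(
      enc_inp, dec_inp, cell, num_encoder_symbols=10, num_decoder_symbols=10,
      feed_previous=True)
  return outputs, states
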
def embedding_tied_rnn_seq2seq(encoder_inputs, decoder_inputs, cell,
                               num_symbols, output_projection=None,
                               feed_previous=False, dtype=tf.float32,
                               scope=None):
  """Embedding RNN sequence-to-sequence model with tied (shared) parameters.

  This model first embeds encoder_inputs by a newly created embedding (of shape
  [num_symbols x cell.input_size]). Then it runs an RNN to encode embedded
  encoder_inputs into a state vector. Next, it embeds decoder_inputs using
  the same embedding. Then it runs an RNN decoder, initialized with the last
  encoder state, on embedded decoder_inputs.

  Args:
    encoder_inputs: a list of 1D batch-sized int32 Tensors (encoder inputs).
    decoder_inputs: a list of 1D batch-sized int32 Tensors (decoder inputs).
    cell: rnn_cell.RNNCell defining the cell function and size.
    num_symbols: integer; number of symbols for both encoder and decoder.
    output_projection: None or a pair (W, B) of output projection weights and
      biases; W has shape [cell.output_size x num_symbols] and B has shape
      [num_symbols]; if provided and feed_previous=True, each fed previous
      output will first be multiplied by W and have B added.
    feed_previous: Boolean or scalar Boolean Tensor; if True, only the first
      of decoder_inputs will be used (the "GO" symbol), and all other decoder
      inputs will be taken from previous outputs (as in embedding_rnn_decoder).
      If False, decoder_inputs are used as given (the standard decoder case).
    dtype: The dtype to use for the initial RNN states (default: tf.float32).
    scope: VariableScope for the created subgraph; defaults to
      "embedding_tied_rnn_seq2seq".

  Returns:
    outputs: A list of the same length as decoder_inputs of 2D Tensors with
      shape [batch_size x num_symbols] containing the generated outputs.
    states: The state of each decoder cell in each time-step. This is a list
      with length len(decoder_inputs) -- one item for each time-step.
      Each item is a 2D Tensor of shape [batch_size x cell.state_size].

  Raises:
    ValueError: when output_projection has the wrong shape.
  """
  if output_projection is not None:
    proj_weights = tf.convert_to_tensor(output_projection[0], dtype=dtype)
    proj_weights.get_shape().assert_is_compatible_with([cell.output_size,
                                                        num_symbols])
    proj_biases = tf.convert_to_tensor(output_projection[1], dtype=dtype)
    proj_biases.get_shape().assert_is_compatible_with([num_symbols])

  with tf.variable_scope(scope or "embedding_tied_rnn_seq2seq"):
    with tf.device("/cpu:0"):
      embedding = tf.get_variable("embedding", [num_symbols, cell.input_size])

    emb_encoder_inputs = [tf.nn.embedding_lookup(embedding, x)
                          for x in encoder_inputs]
    emb_decoder_inputs = [tf.nn.embedding_lookup(embedding, x)
                          for x in decoder_inputs]

    def extract_argmax_and_embed(prev, _):
      """Loop_function that extracts the symbol from prev and embeds it."""
      if output_projection is not None:
        prev = tf.nn.xw_plus_b(prev, output_projection[0], output_projection[1])
      prev_symbol = tf.stop_gradient(tf.argmax(prev, 1))
      return tf.nn.embedding_lookup(embedding, prev_symbol)

    if output_projection is None:
      cell = rnn_cell.OutputProjectionWrapper(cell, num_symbols)

    if isinstance(feed_previous, bool):
      loop_function = extract_argmax_and_embed if feed_previous else None
      return tied_rnn_seq2seq(emb_encoder_inputs, emb_decoder_inputs, cell,
                              loop_function=loop_function, dtype=dtype)
    else:  # If feed_previous is a Tensor, we construct 2 graphs and use cond.
      outputs1, states1 = tied_rnn_seq2seq(
          emb_encoder_inputs, emb_decoder_inputs, cell,
          loop_function=extract_argmax_and_embed, dtype=dtype)
      tf.get_variable_scope().reuse_variables()
      outputs2, states2 = tied_rnn_seq2seq(
          emb_encoder_inputs, emb_decoder_inputs, cell, dtype=dtype)

      outputs = tf.control_flow_ops.cond(feed_previous,
                                         lambda: outputs1, lambda: outputs2)
      states = tf.control_flow_ops.cond(feed_previous,
                                        lambda: states1, lambda: states2)
      return outputs, states

def attention_decoder(decoder_inputs, initial_state, attention_states, cell,
                      output_size=None, num_heads=1, loop_function=None,
                      dtype=tf.float32, scope=None):
  """RNN decoder with attention for the sequence-to-sequence model.

  Args:
    decoder_inputs: a list of 2D Tensors [batch_size x cell.input_size].
    initial_state: 2D Tensor [batch_size x cell.state_size].
    attention_states: 3D Tensor [batch_size x attn_length x attn_size].
    cell: rnn_cell.RNNCell defining the cell function and size.
    output_size: size of the output vectors; if None, we use cell.output_size.
    num_heads: number of attention heads that read from attention_states.
    loop_function: if not None, this function will be applied to the i-th
      output in order to generate the (i+1)-th input, and decoder_inputs will
      be ignored, except for the first element ("GO" symbol). This can be used
      for decoding, but also for training to emulate
      http://arxiv.org/pdf/1506.03099v2.pdf.
      Signature -- loop_function(prev, i) = next
        * prev is a 2D Tensor of shape [batch_size x cell.output_size],
        * i is an integer, the step number (when advanced control is needed),
        * next is a 2D Tensor of shape [batch_size x cell.input_size].
    dtype: The dtype to use for the RNN initial state (default: tf.float32).
    scope: VariableScope for the created subgraph; default: "attention_decoder".

  Returns:
    outputs: A list of the same length as decoder_inputs of 2D Tensors of shape
      [batch_size x output_size]. These represent the generated outputs.
      Output i is computed from input i (which is either the i-th element of
      decoder_inputs or loop_function(output {i-1}, i)) as follows. First, we
      run the cell on a combination of the input and previous attention masks:
        cell_output, new_state = cell(linear(input, prev_attn), prev_state).
      Then, we calculate new attention masks:
        new_attn = softmax(V^T * tanh(W * attention_states + U * new_state))
      and then we calculate the output:
        output = linear(cell_output, new_attn).
    states: The state of each decoder cell in each time-step. This is a list
      with length len(decoder_inputs) -- one item for each time-step.
      Each item is a 2D Tensor of shape [batch_size x cell.state_size].

  Raises:
    ValueError: when num_heads is not positive, there are no inputs, or shapes
      of attention_states are not set.
  """
  if not decoder_inputs:
    raise ValueError("Must provide at least 1 input to attention decoder.")
  if num_heads < 1:
    raise ValueError("With fewer than one head, use a non-attention decoder.")
  if not attention_states.get_shape()[1:2].is_fully_defined():
    raise ValueError("Shape[1] and [2] of attention_states must be known: %s"
                     % attention_states.get_shape())
  if output_size is None:
    output_size = cell.output_size

  with tf.variable_scope(scope or "attention_decoder"):
    batch_size = tf.shape(decoder_inputs[0])[0]  # Needed for reshaping.
    attn_length = attention_states.get_shape()[1].value
    attn_size = attention_states.get_shape()[2].value

    # To calculate W1 * h_t we use a 1-by-1 convolution, need to reshape before.
    hidden = tf.reshape(attention_states, [-1, attn_length, 1, attn_size])
    hidden_features = []
    v = []
    attention_vec_size = attn_size  # Size of query vectors for attention.
    for a in xrange(num_heads):
      k = tf.get_variable("AttnW_%d" % a, [1, 1, attn_size, attention_vec_size])
      hidden_features.append(tf.nn.conv2d(hidden, k, [1, 1, 1, 1], "SAME"))
      v.append(tf.get_variable("AttnV_%d" % a, [attention_vec_size]))

    states = [initial_state]

    def attention(query):
      """Put attention masks on hidden using hidden_features and query."""
      ds = []  # Results of attention reads will be stored here.
      for a in xrange(num_heads):
        with tf.variable_scope("Attention_%d" % a):
          y = linear.linear(query, attention_vec_size, True)
          y = tf.reshape(y, [-1, 1, 1, attention_vec_size])
          # Attention mask is a softmax of v^T * tanh(...).
          s = tf.reduce_sum(v[a] * tf.tanh(hidden_features[a] + y), [2, 3])
          mask = tf.nn.softmax(s)
          # Now calculate the attention-weighted vector d.
          d = tf.reduce_sum(tf.reshape(mask, [-1, attn_length, 1, 1]) * hidden,
                            [1, 2])
          ds.append(tf.reshape(d, [-1, attn_size]))
      return ds

    outputs = []
    prev = None
    batch_attn_size = tf.pack([batch_size, attn_size])
    attns = [tf.zeros(batch_attn_size, dtype=dtype)
             for _ in xrange(num_heads)]
    for a in attns:  # Ensure the second shape of attention vectors is set.
      a.set_shape([None, attn_size])
    for i in xrange(len(decoder_inputs)):
      if i > 0:
        tf.get_variable_scope().reuse_variables()
      inp = decoder_inputs[i]
      # If loop_function is set, we use it instead of decoder_inputs.
      if loop_function is not None and prev is not None:
        with tf.variable_scope("loop_function", reuse=True):
          inp = tf.stop_gradient(loop_function(prev, i))
      # Merge input and previous attentions into one vector of the right size.
      x = linear.linear([inp] + attns, cell.input_size, True)
      # Run the RNN.
      cell_output, new_state = cell(x, states[-1])
      states.append(new_state)
      # Run the attention mechanism.
      attns = attention(new_state)
      with tf.variable_scope("AttnOutputProjection"):
        output = linear.linear([cell_output] + attns, output_size, True)
      if loop_function is not None:
        # We do not propagate gradients over the loop function.
        prev = tf.stop_gradient(output)
      outputs.append(output)

  return outputs, states

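# Illustrative sketch (hypothetical helper; argument shapes are stated in the
# comments): the per-head attention mask computed inside attention() above,
# i.e. softmax(v^T * tanh(W*h + U*s)), where hidden_features already holds
# W*h (from the 1-by-1 convolution) and query_features holds U*s.
def _attention_mask_sketch(hidden_features, v, query_features):
  # hidden_features: [batch, attn_length, 1, size]; v: [size];
  # query_features: [batch, 1, 1, size]. Returns [batch, attn_length].
  s = tf.reduce_sum(v * tf.tanh(hidden_features + query_features), [2, 3])
  return tf.nn.softmax(s)
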
def embedding_attention_decoder(decoder_inputs, initial_state, attention_states,
                                cell, num_symbols, num_heads=1,
                                output_size=None, output_projection=None,
                                feed_previous=False, dtype=tf.float32,
                                scope=None):
  """RNN decoder with embedding and attention and a pure-decoding option.

  Args:
    decoder_inputs: a list of 1D batch-sized int32 Tensors (decoder inputs).
    initial_state: 2D Tensor [batch_size x cell.state_size].
    attention_states: 3D Tensor [batch_size x attn_length x attn_size].
    cell: rnn_cell.RNNCell defining the cell function.
    num_symbols: integer, how many symbols come into the embedding.
    num_heads: number of attention heads that read from attention_states.
    output_size: size of the output vectors; if None, use cell.output_size.
    output_projection: None or a pair (W, B) of output projection weights and
      biases; W has shape [output_size x num_symbols] and B has shape
      [num_symbols]; if provided and feed_previous=True, each fed previous
      output will first be multiplied by W and have B added.
    feed_previous: Boolean; if True, only the first of decoder_inputs will be
      used (the "GO" symbol), and all other decoder inputs will be generated by
      next = embedding_lookup(embedding, argmax(previous_output)).
      In effect, this implements a greedy decoder. It can also be used
      during training to emulate http://arxiv.org/pdf/1506.03099v2.pdf.
      If False, decoder_inputs are used as given (the standard decoder case).
    dtype: The dtype to use for the RNN initial states (default: tf.float32).
    scope: VariableScope for the created subgraph; defaults to
      "embedding_attention_decoder".

  Returns:
    outputs: A list of the same length as decoder_inputs of 2D Tensors with
      shape [batch_size x output_size] containing the generated outputs.
    states: The state of each decoder cell in each time-step. This is a list
      with length len(decoder_inputs) -- one item for each time-step.
      Each item is a 2D Tensor of shape [batch_size x cell.state_size].

  Raises:
    ValueError: when output_projection has the wrong shape.
  """
  if output_size is None:
    output_size = cell.output_size
  if output_projection is not None:
    proj_weights = tf.convert_to_tensor(output_projection[0], dtype=dtype)
    proj_weights.get_shape().assert_is_compatible_with([cell.output_size,
                                                        num_symbols])
    proj_biases = tf.convert_to_tensor(output_projection[1], dtype=dtype)
    proj_biases.get_shape().assert_is_compatible_with([num_symbols])

  with tf.variable_scope(scope or "embedding_attention_decoder"):
    with tf.device("/cpu:0"):
      embedding = tf.get_variable("embedding", [num_symbols, cell.input_size])

    def extract_argmax_and_embed(prev, _):
      """Loop_function that extracts the symbol from prev and embeds it."""
      if output_projection is not None:
        prev = tf.nn.xw_plus_b(prev, output_projection[0], output_projection[1])
      prev_symbol = tf.stop_gradient(tf.argmax(prev, 1))
      emb_prev = tf.nn.embedding_lookup(embedding, prev_symbol)
      return emb_prev

    loop_function = None
    if feed_previous:
      loop_function = extract_argmax_and_embed

    emb_inp = [tf.nn.embedding_lookup(embedding, i) for i in decoder_inputs]
    return attention_decoder(
        emb_inp, initial_state, attention_states, cell, output_size=output_size,
        num_heads=num_heads, loop_function=loop_function)

def embedding_attention_seq2seq(encoder_inputs, decoder_inputs, cell,
                                num_encoder_symbols, num_decoder_symbols,
                                num_heads=1, output_projection=None,
                                feed_previous=False, dtype=tf.float32,
                                scope=None):
  """Embedding sequence-to-sequence model with attention.

  This model first embeds encoder_inputs by a newly created embedding (of shape
  [num_encoder_symbols x cell.input_size]). Then it runs an RNN to encode
  embedded encoder_inputs into a state vector. It keeps the outputs of this
  RNN at every step to use for attention later. Next, it embeds decoder_inputs
  by another newly created embedding (of shape [num_decoder_symbols x
  cell.input_size]). Then it runs an attention decoder, initialized with the
  last encoder state, on embedded decoder_inputs and attending to encoder
  outputs.

  Args:
    encoder_inputs: a list of 1D batch-sized int32 Tensors (encoder inputs).
    decoder_inputs: a list of 1D batch-sized int32 Tensors (decoder inputs).
    cell: rnn_cell.RNNCell defining the cell function and size.
    num_encoder_symbols: integer; number of symbols on the encoder side.
    num_decoder_symbols: integer; number of symbols on the decoder side.
    num_heads: number of attention heads that read from attention_states.
    output_projection: None or a pair (W, B) of output projection weights and
      biases; W has shape [cell.output_size x num_decoder_symbols] and B has
      shape [num_decoder_symbols]; if provided and feed_previous=True, each
      fed previous output will first be multiplied by W and have B added.
    feed_previous: Boolean or scalar Boolean Tensor; if True, only the first
      of decoder_inputs will be used (the "GO" symbol), and all other decoder
      inputs will be taken from previous outputs (as in embedding_rnn_decoder).
      If False, decoder_inputs are used as given (the standard decoder case).
    dtype: The dtype of the initial RNN state (default: tf.float32).
    scope: VariableScope for the created subgraph; defaults to
      "embedding_attention_seq2seq".

  Returns:
    outputs: A list of the same length as decoder_inputs of 2D Tensors with
      shape [batch_size x num_decoder_symbols] containing the generated outputs.
    states: The state of each decoder cell in each time-step. This is a list
      with length len(decoder_inputs) -- one item for each time-step.
      Each item is a 2D Tensor of shape [batch_size x cell.state_size].
  """
  with tf.variable_scope(scope or "embedding_attention_seq2seq"):
    # Encoder.
    encoder_cell = rnn_cell.EmbeddingWrapper(cell, num_encoder_symbols)
    encoder_outputs, encoder_states = rnn.rnn(
        encoder_cell, encoder_inputs, dtype=dtype)

    # First calculate a concatenation of encoder outputs to put attention on.
    top_states = [tf.reshape(e, [-1, 1, cell.output_size])
                  for e in encoder_outputs]
    attention_states = tf.concat(1, top_states)

    # Decoder.
    output_size = None
    if output_projection is None:
      cell = rnn_cell.OutputProjectionWrapper(cell, num_decoder_symbols)
      output_size = num_decoder_symbols

    if isinstance(feed_previous, bool):
      return embedding_attention_decoder(
          decoder_inputs, encoder_states[-1], attention_states, cell,
          num_decoder_symbols, num_heads, output_size, output_projection,
          feed_previous)
    else:  # If feed_previous is a Tensor, we construct 2 graphs and use cond.
      outputs1, states1 = embedding_attention_decoder(
          decoder_inputs, encoder_states[-1], attention_states, cell,
          num_decoder_symbols, num_heads, output_size, output_projection, True)
      tf.get_variable_scope().reuse_variables()
      outputs2, states2 = embedding_attention_decoder(
          decoder_inputs, encoder_states[-1], attention_states, cell,
          num_decoder_symbols, num_heads, output_size, output_projection, False)

      outputs = tf.control_flow_ops.cond(feed_previous,
                                         lambda: outputs1, lambda: outputs2)
      states = tf.control_flow_ops.cond(feed_previous,
                                        lambda: states1, lambda: states2)
      return outputs, states

def sequence_loss_by_example(logits, targets, weights, num_decoder_symbols,
                             average_across_timesteps=True,
                             softmax_loss_function=None, name=None):
  """Weighted cross-entropy loss for a sequence of logits (per example).

  Args:
    logits: list of 2D Tensors of shape [batch_size x num_decoder_symbols].
    targets: list of 1D batch-sized int32 Tensors of the same length as logits.
    weights: list of 1D batch-sized float Tensors of the same length as logits.
    num_decoder_symbols: integer, number of decoder symbols (output classes).
    average_across_timesteps: If set, divide the returned cost by the total
      label weight.
    softmax_loss_function: function (inputs-batch, labels-batch) -> loss-batch
      to be used instead of the standard softmax (the default if this is None).
    name: optional name for this operation, default: "sequence_loss_by_example".

  Returns:
    1D batch-sized float Tensor: the log-perplexity for each sequence.

  Raises:
    ValueError: if len(logits) is different from len(targets) or len(weights).
  """
  if len(targets) != len(logits) or len(weights) != len(logits):
    raise ValueError("Lengths of logits, weights, and targets must be the "
                     "same: %d, %d, %d." % (len(logits), len(weights),
                                            len(targets)))
  with tf.op_scope(logits + targets + weights, name,
                   "sequence_loss_by_example"):
    batch_size = tf.shape(targets[0])[0]
    log_perp_list = []
    length = batch_size * num_decoder_symbols
    for i in xrange(len(logits)):
      if softmax_loss_function is None:
        # TODO(lukaszkaiser): There is no SparseCrossEntropy in TensorFlow, so
        # we need to first cast targets into a dense representation, and as
        # SparseToDense does not accept batched inputs, we need to do this by
        # re-indexing and re-sizing. When TensorFlow adds SparseCrossEntropy,
        # rewrite this method.
        indices = targets[i] + num_decoder_symbols * tf.range(0, batch_size)
        with tf.device("/cpu:0"):  # Sparse-to-dense must happen on CPU for now.
          dense = tf.sparse_to_dense(indices, tf.expand_dims(length, 0), 1.0,
                                     0.0)
        target = tf.reshape(dense, [-1, num_decoder_symbols])
        crossent = tf.nn.softmax_cross_entropy_with_logits(
            logits[i], target, name="SequenceLoss/CrossEntropy{0}".format(i))
      else:
        crossent = softmax_loss_function(logits[i], targets[i])
      log_perp_list.append(crossent * weights[i])
    log_perps = tf.add_n(log_perp_list)
    if average_across_timesteps:
      total_size = tf.add_n(weights)
      total_size += 1e-12  # Just to avoid division by 0 for all-0 weights.
      log_perps /= total_size
    return log_perps

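# Worked example (toy values; the helper name is made up) of the re-indexing
# trick above: with batch_size=2 and num_decoder_symbols=3, targets [2, 0]
# become flat indices [2, 3]; the dense length-6 vector [0,0,1,1,0,0] then
# reshapes into the one-hot rows [[0,0,1], [1,0,0]].
def _dense_target_sketch():
  targets = tf.constant([2, 0])           # One int32 target per example.
  indices = targets + 3 * tf.range(0, 2)  # -> [2, 3]
  length = tf.constant(2 * 3)
  dense = tf.sparse_to_dense(indices, tf.expand_dims(length, 0), 1.0, 0.0)
  return tf.reshape(dense, [-1, 3])       # -> [[0, 0, 1], [1, 0, 0]]
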
def sequence_loss(logits, targets, weights, num_decoder_symbols,
                  average_across_timesteps=True, average_across_batch=True,
                  softmax_loss_function=None, name=None):
  """Weighted cross-entropy loss for a sequence of logits, batch-collapsed.

  Args:
    logits: list of 2D Tensors of shape [batch_size x num_decoder_symbols].
    targets: list of 1D batch-sized int32 Tensors of the same length as logits.
    weights: list of 1D batch-sized float Tensors of the same length as logits.
    num_decoder_symbols: integer, number of decoder symbols (output classes).
    average_across_timesteps: If set, divide the returned cost by the total
      label weight.
    average_across_batch: If set, divide the returned cost by the batch size.
    softmax_loss_function: function (inputs-batch, labels-batch) -> loss-batch
      to be used instead of the standard softmax (the default if this is None).
    name: optional name for this operation, defaults to "sequence_loss".

  Returns:
    A scalar float Tensor: the average log-perplexity per symbol (weighted).

  Raises:
    ValueError: if len(logits) is different from len(targets) or len(weights).
  """
  with tf.op_scope(logits + targets + weights, name, "sequence_loss"):
    cost = tf.reduce_sum(sequence_loss_by_example(
        logits, targets, weights, num_decoder_symbols,
        average_across_timesteps=average_across_timesteps,
        softmax_loss_function=softmax_loss_function))
    if average_across_batch:
      batch_size = tf.shape(targets[0])[0]
      return cost / tf.cast(batch_size, tf.float32)
    else:
      return cost

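# Worked numbers (these match testSequenceLoss in seq2seq_test.py below): with
# uniform logits over 5 symbols, the per-step cross-entropy is log(5) ~= 1.609.
# Over 3 time-steps with weight 1.0 each, the per-sequence loss is
# 3 * log(5) ~= 4.828; average_across_timesteps divides by the total weight
# 3.0 (back to ~1.609), and average_across_batch divides by the batch size.
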
def model_with_buckets(encoder_inputs, decoder_inputs, targets, weights,
                       buckets, num_decoder_symbols, seq2seq,
                       softmax_loss_function=None, name=None):
  """Create a sequence-to-sequence model with support for bucketing.

  The seq2seq argument is a function that defines a sequence-to-sequence model,
  e.g., seq2seq = lambda x, y: basic_rnn_seq2seq(x, y, rnn_cell.GRUCell(24))

  Args:
    encoder_inputs: a list of Tensors to feed the encoder; first seq2seq input.
    decoder_inputs: a list of Tensors to feed the decoder; second seq2seq input.
    targets: a list of 1D batch-sized int32 Tensors (desired output sequence).
    weights: list of 1D batch-sized float Tensors to weight the targets.
    buckets: a list of pairs of (input size, output size) for each bucket.
    num_decoder_symbols: integer, number of decoder symbols (output classes).
    seq2seq: a sequence-to-sequence model function; it takes two inputs that
      agree with encoder_inputs and decoder_inputs, and returns a pair
      consisting of outputs and states (as, e.g., basic_rnn_seq2seq).
    softmax_loss_function: function (inputs-batch, labels-batch) -> loss-batch
      to be used instead of the standard softmax (the default if this is None).
    name: optional name for this operation, defaults to "model_with_buckets".

  Returns:
    outputs: The outputs for each bucket. Its j'th element consists of a list
      of 2D Tensors of shape [batch_size x num_decoder_symbols] (j'th outputs).
    losses: List of scalar Tensors, representing losses for each bucket.

  Raises:
    ValueError: if length of encoder_inputs, targets, or weights is smaller
      than the largest (last) bucket.
  """
  if len(encoder_inputs) < buckets[-1][0]:
    raise ValueError("Length of encoder_inputs (%d) must be at least that of "
                     "last bucket (%d)." % (len(encoder_inputs),
                                            buckets[-1][0]))
  if len(targets) < buckets[-1][1]:
    raise ValueError("Length of targets (%d) must be at least that of last "
                     "bucket (%d)." % (len(targets), buckets[-1][1]))
  if len(weights) < buckets[-1][1]:
    raise ValueError("Length of weights (%d) must be at least that of last "
                     "bucket (%d)." % (len(weights), buckets[-1][1]))

  all_inputs = encoder_inputs + decoder_inputs + targets + weights
  losses = []
  outputs = []
  with tf.op_scope(all_inputs, name, "model_with_buckets"):
    for j in xrange(len(buckets)):
      if j > 0:
        tf.get_variable_scope().reuse_variables()
      bucket_encoder_inputs = [encoder_inputs[i]
                               for i in xrange(buckets[j][0])]
      bucket_decoder_inputs = [decoder_inputs[i]
                               for i in xrange(buckets[j][1])]
      bucket_outputs, _ = seq2seq(bucket_encoder_inputs,
                                  bucket_decoder_inputs)
      outputs.append(bucket_outputs)

      bucket_targets = [targets[i] for i in xrange(buckets[j][1])]
      bucket_weights = [weights[i] for i in xrange(buckets[j][1])]
      losses.append(sequence_loss(
          outputs[-1], bucket_targets, bucket_weights, num_decoder_symbols,
          softmax_loss_function=softmax_loss_function))

  return outputs, losses

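# Illustrative sketch (hypothetical helper; the placeholder and loss lists are
# assumed to come from a model_with_buckets setup like the one in
# seq2seq_test.py): at feed time, pick the smallest bucket that fits the
# sequence pair, pad up to that bucket's sizes, and run only that bucket's
# loss.
def _bucket_feed_sketch(session, inp, out, losses, buckets,
                        encoder_ids, decoder_ids, pad_id=0):
  bucket_id = min(b for b in xrange(len(buckets))
                  if buckets[b][0] >= len(encoder_ids)
                  and buckets[b][1] >= len(decoder_ids))
  enc_size, dec_size = buckets[bucket_id]
  enc = encoder_ids + [pad_id] * (enc_size - len(encoder_ids))
  dec = decoder_ids + [pad_id] * (dec_size - len(decoder_ids))
  feed = {}
  for l in xrange(enc_size):
    feed[inp[l].name] = [enc[l]]  # Batch of one for simplicity.
  for l in xrange(dec_size):
    feed[out[l].name] = [dec[l]]
  return session.run(losses[bucket_id], feed)
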
tensorflow/models/rnn/seq2seq_test.py (new file, 384 lines)

@@ -0,0 +1,384 @@
"""Tests for functional style sequence-to-sequence models."""
|
||||
import math
|
||||
import random
|
||||
|
||||
import tensorflow.python.platform
|
||||
|
||||
import numpy as np
|
||||
import tensorflow as tf
|
||||
|
||||
from tensorflow.models.rnn import rnn
|
||||
from tensorflow.models.rnn import rnn_cell
|
||||
from tensorflow.models.rnn import seq2seq
|
||||
|
||||
|
||||
class Seq2SeqTest(tf.test.TestCase):
|
||||
|
||||
def testRNNDecoder(self):
|
||||
with self.test_session() as sess:
|
||||
with tf.variable_scope("root", initializer=tf.constant_initializer(0.5)):
|
||||
inp = [tf.constant(0.5, shape=[2, 2]) for _ in xrange(2)]
|
||||
_, enc_states = rnn.rnn(rnn_cell.GRUCell(2), inp, dtype=tf.float32)
|
||||
dec_inp = [tf.constant(0.4, shape=[2, 2]) for _ in xrange(3)]
|
||||
cell = rnn_cell.OutputProjectionWrapper(rnn_cell.GRUCell(2), 4)
|
||||
dec, mem = seq2seq.rnn_decoder(dec_inp, enc_states[-1], cell)
|
||||
sess.run([tf.initialize_all_variables()])
|
||||
res = sess.run(dec)
|
||||
self.assertEqual(len(res), 3)
|
||||
self.assertEqual(res[0].shape, (2, 4))
|
||||
|
||||
res = sess.run(mem)
|
||||
self.assertEqual(len(res), 4)
|
||||
self.assertEqual(res[0].shape, (2, 2))
|
||||
|
||||
def testBasicRNNSeq2Seq(self):
|
||||
with self.test_session() as sess:
|
||||
with tf.variable_scope("root", initializer=tf.constant_initializer(0.5)):
|
||||
inp = [tf.constant(0.5, shape=[2, 2]) for _ in xrange(2)]
|
||||
dec_inp = [tf.constant(0.4, shape=[2, 2]) for _ in xrange(3)]
|
||||
cell = rnn_cell.OutputProjectionWrapper(rnn_cell.GRUCell(2), 4)
|
||||
dec, mem = seq2seq.basic_rnn_seq2seq(inp, dec_inp, cell)
|
||||
sess.run([tf.initialize_all_variables()])
|
||||
res = sess.run(dec)
|
||||
self.assertEqual(len(res), 3)
|
||||
self.assertEqual(res[0].shape, (2, 4))
|
||||
|
||||
res = sess.run(mem)
|
||||
self.assertEqual(len(res), 4)
|
||||
self.assertEqual(res[0].shape, (2, 2))
|
||||
|
||||
def testTiedRNNSeq2Seq(self):
|
||||
with self.test_session() as sess:
|
||||
with tf.variable_scope("root", initializer=tf.constant_initializer(0.5)):
|
||||
inp = [tf.constant(0.5, shape=[2, 2]) for _ in xrange(2)]
|
||||
dec_inp = [tf.constant(0.4, shape=[2, 2]) for _ in xrange(3)]
|
||||
cell = rnn_cell.OutputProjectionWrapper(rnn_cell.GRUCell(2), 4)
|
||||
dec, mem = seq2seq.tied_rnn_seq2seq(inp, dec_inp, cell)
|
||||
sess.run([tf.initialize_all_variables()])
|
||||
res = sess.run(dec)
|
||||
self.assertEqual(len(res), 3)
|
||||
self.assertEqual(res[0].shape, (2, 4))
|
||||
|
||||
res = sess.run(mem)
|
||||
self.assertEqual(len(res), 4)
|
||||
self.assertEqual(res[0].shape, (2, 2))
|
||||
|
||||
  def testEmbeddingRNNDecoder(self):
    with self.test_session() as sess:
      with tf.variable_scope("root", initializer=tf.constant_initializer(0.5)):
        inp = [tf.constant(0.5, shape=[2, 2]) for _ in xrange(2)]
        cell = rnn_cell.BasicLSTMCell(2)
        _, enc_states = rnn.rnn(cell, inp, dtype=tf.float32)
        dec_inp = [tf.constant(i, tf.int32, shape=[2]) for i in xrange(3)]
        dec, mem = seq2seq.embedding_rnn_decoder(dec_inp, enc_states[-1],
                                                 cell, 4)
        sess.run([tf.initialize_all_variables()])
        res = sess.run(dec)
        self.assertEqual(len(res), 3)
        self.assertEqual(res[0].shape, (2, 2))

        res = sess.run(mem)
        self.assertEqual(len(res), 4)
        self.assertEqual(res[0].shape, (2, 4))
  def testEmbeddingRNNSeq2Seq(self):
    with self.test_session() as sess:
      with tf.variable_scope("root", initializer=tf.constant_initializer(0.5)):
        enc_inp = [tf.constant(1, tf.int32, shape=[2]) for _ in xrange(2)]
        dec_inp = [tf.constant(i, tf.int32, shape=[2]) for i in xrange(3)]
        cell = rnn_cell.BasicLSTMCell(2)
        dec, mem = seq2seq.embedding_rnn_seq2seq(enc_inp, dec_inp, cell, 2, 5)
        sess.run([tf.initialize_all_variables()])
        res = sess.run(dec)
        self.assertEqual(len(res), 3)
        self.assertEqual(res[0].shape, (2, 5))

        res = sess.run(mem)
        self.assertEqual(len(res), 4)
        self.assertEqual(res[0].shape, (2, 4))

        # Test externally provided output projection.
        w = tf.get_variable("proj_w", [2, 5])
        b = tf.get_variable("proj_b", [5])
        with tf.variable_scope("proj_seq2seq"):
          dec, _ = seq2seq.embedding_rnn_seq2seq(
              enc_inp, dec_inp, cell, 2, 5, output_projection=(w, b))
        sess.run([tf.initialize_all_variables()])
        res = sess.run(dec)
        self.assertEqual(len(res), 3)
        self.assertEqual(res[0].shape, (2, 2))

        # Test that previous-feeding model ignores inputs after the first.
        dec_inp2 = [tf.constant(0, tf.int32, shape=[2]) for _ in xrange(3)]
        tf.get_variable_scope().reuse_variables()
        d1, _ = seq2seq.embedding_rnn_seq2seq(enc_inp, dec_inp, cell, 2, 5,
                                              feed_previous=True)
        d2, _ = seq2seq.embedding_rnn_seq2seq(enc_inp, dec_inp2, cell, 2, 5,
                                              feed_previous=True)
        d3, _ = seq2seq.embedding_rnn_seq2seq(enc_inp, dec_inp2, cell, 2, 5,
                                              feed_previous=tf.constant(True))
        res1 = sess.run(d1)
        res2 = sess.run(d2)
        res3 = sess.run(d3)
        self.assertAllClose(res1, res2)
        self.assertAllClose(res1, res3)
  def testEmbeddingTiedRNNSeq2Seq(self):
    with self.test_session() as sess:
      with tf.variable_scope("root", initializer=tf.constant_initializer(0.5)):
        enc_inp = [tf.constant(1, tf.int32, shape=[2]) for _ in xrange(2)]
        dec_inp = [tf.constant(i, tf.int32, shape=[2]) for i in xrange(3)]
        cell = rnn_cell.BasicLSTMCell(2)
        dec, mem = seq2seq.embedding_tied_rnn_seq2seq(enc_inp, dec_inp, cell, 5)
        sess.run([tf.initialize_all_variables()])
        res = sess.run(dec)
        self.assertEqual(len(res), 3)
        self.assertEqual(res[0].shape, (2, 5))

        res = sess.run(mem)
        self.assertEqual(len(res), 4)
        self.assertEqual(res[0].shape, (2, 4))

        # Test externally provided output projection.
        w = tf.get_variable("proj_w", [2, 5])
        b = tf.get_variable("proj_b", [5])
        with tf.variable_scope("proj_seq2seq"):
          dec, _ = seq2seq.embedding_tied_rnn_seq2seq(
              enc_inp, dec_inp, cell, 5, output_projection=(w, b))
        sess.run([tf.initialize_all_variables()])
        res = sess.run(dec)
        self.assertEqual(len(res), 3)
        self.assertEqual(res[0].shape, (2, 2))

        # Test that previous-feeding model ignores inputs after the first.
        dec_inp2 = [tf.constant(0, tf.int32, shape=[2]) for _ in xrange(3)]
        tf.get_variable_scope().reuse_variables()
        d1, _ = seq2seq.embedding_tied_rnn_seq2seq(enc_inp, dec_inp, cell, 5,
                                                   feed_previous=True)
        d2, _ = seq2seq.embedding_tied_rnn_seq2seq(enc_inp, dec_inp2, cell, 5,
                                                   feed_previous=True)
        d3, _ = seq2seq.embedding_tied_rnn_seq2seq(
            enc_inp, dec_inp2, cell, 5, feed_previous=tf.constant(True))
        res1 = sess.run(d1)
        res2 = sess.run(d2)
        res3 = sess.run(d3)
        self.assertAllClose(res1, res2)
        self.assertAllClose(res1, res3)
  def testAttentionDecoder1(self):
    with self.test_session() as sess:
      with tf.variable_scope("root", initializer=tf.constant_initializer(0.5)):
        cell = rnn_cell.GRUCell(2)
        inp = [tf.constant(0.5, shape=[2, 2]) for _ in xrange(2)]
        enc_outputs, enc_states = rnn.rnn(cell, inp, dtype=tf.float32)
        attn_states = tf.concat(1, [tf.reshape(e, [-1, 1, cell.output_size])
                                    for e in enc_outputs])
        dec_inp = [tf.constant(0.4, shape=[2, 2]) for _ in xrange(3)]
        dec, mem = seq2seq.attention_decoder(dec_inp, enc_states[-1],
                                             attn_states, cell, output_size=4)
        sess.run([tf.initialize_all_variables()])
        res = sess.run(dec)
        self.assertEqual(len(res), 3)
        self.assertEqual(res[0].shape, (2, 4))

        res = sess.run(mem)
        self.assertEqual(len(res), 4)
        self.assertEqual(res[0].shape, (2, 2))

  def testAttentionDecoder2(self):
    with self.test_session() as sess:
      with tf.variable_scope("root", initializer=tf.constant_initializer(0.5)):
        cell = rnn_cell.GRUCell(2)
        inp = [tf.constant(0.5, shape=[2, 2]) for _ in xrange(2)]
        enc_outputs, enc_states = rnn.rnn(cell, inp, dtype=tf.float32)
        attn_states = tf.concat(1, [tf.reshape(e, [-1, 1, cell.output_size])
                                    for e in enc_outputs])
        dec_inp = [tf.constant(0.4, shape=[2, 2]) for _ in xrange(3)]
        dec, mem = seq2seq.attention_decoder(dec_inp, enc_states[-1],
                                             attn_states, cell, output_size=4,
                                             num_heads=2)
        sess.run([tf.initialize_all_variables()])
        res = sess.run(dec)
        self.assertEqual(len(res), 3)
        self.assertEqual(res[0].shape, (2, 4))

        res = sess.run(mem)
        self.assertEqual(len(res), 4)
        self.assertEqual(res[0].shape, (2, 2))
  def testEmbeddingAttentionDecoder(self):
    with self.test_session() as sess:
      with tf.variable_scope("root", initializer=tf.constant_initializer(0.5)):
        inp = [tf.constant(0.5, shape=[2, 2]) for _ in xrange(2)]
        cell = rnn_cell.GRUCell(2)
        enc_outputs, enc_states = rnn.rnn(cell, inp, dtype=tf.float32)
        attn_states = tf.concat(1, [tf.reshape(e, [-1, 1, cell.output_size])
                                    for e in enc_outputs])
        dec_inp = [tf.constant(i, tf.int32, shape=[2]) for i in xrange(3)]
        dec, mem = seq2seq.embedding_attention_decoder(dec_inp, enc_states[-1],
                                                       attn_states, cell, 4,
                                                       output_size=3)
        sess.run([tf.initialize_all_variables()])
        res = sess.run(dec)
        self.assertEqual(len(res), 3)
        self.assertEqual(res[0].shape, (2, 3))

        res = sess.run(mem)
        self.assertEqual(len(res), 4)
        self.assertEqual(res[0].shape, (2, 2))
  def testEmbeddingAttentionSeq2Seq(self):
    with self.test_session() as sess:
      with tf.variable_scope("root", initializer=tf.constant_initializer(0.5)):
        enc_inp = [tf.constant(1, tf.int32, shape=[2]) for _ in xrange(2)]
        dec_inp = [tf.constant(i, tf.int32, shape=[2]) for i in xrange(3)]
        cell = rnn_cell.BasicLSTMCell(2)
        dec, mem = seq2seq.embedding_attention_seq2seq(
            enc_inp, dec_inp, cell, 2, 5)
        sess.run([tf.initialize_all_variables()])
        res = sess.run(dec)
        self.assertEqual(len(res), 3)
        self.assertEqual(res[0].shape, (2, 5))

        res = sess.run(mem)
        self.assertEqual(len(res), 4)
        self.assertEqual(res[0].shape, (2, 4))

        # Test externally provided output projection.
        w = tf.get_variable("proj_w", [2, 5])
        b = tf.get_variable("proj_b", [5])
        with tf.variable_scope("proj_seq2seq"):
          dec, _ = seq2seq.embedding_attention_seq2seq(
              enc_inp, dec_inp, cell, 2, 5, output_projection=(w, b))
        sess.run([tf.initialize_all_variables()])
        res = sess.run(dec)
        self.assertEqual(len(res), 3)
        self.assertEqual(res[0].shape, (2, 2))

        # Test that previous-feeding model ignores inputs after the first.
        dec_inp2 = [tf.constant(0, tf.int32, shape=[2]) for _ in xrange(3)]
        tf.get_variable_scope().reuse_variables()
        d1, _ = seq2seq.embedding_attention_seq2seq(
            enc_inp, dec_inp, cell, 2, 5, feed_previous=True)
        d2, _ = seq2seq.embedding_attention_seq2seq(
            enc_inp, dec_inp2, cell, 2, 5, feed_previous=True)
        d3, _ = seq2seq.embedding_attention_seq2seq(
            enc_inp, dec_inp2, cell, 2, 5, feed_previous=tf.constant(True))
        res1 = sess.run(d1)
        res2 = sess.run(d2)
        res3 = sess.run(d3)
        self.assertAllClose(res1, res2)
        self.assertAllClose(res1, res3)
  def testSequenceLoss(self):
    with self.test_session() as sess:
      output_classes = 5
      logits = [tf.constant(i + 0.5, shape=[2, 5]) for i in xrange(3)]
      targets = [tf.constant(i, tf.int32, shape=[2]) for i in xrange(3)]
      weights = [tf.constant(1.0, shape=[2]) for _ in xrange(3)]

      average_loss_per_example = seq2seq.sequence_loss(
          logits, targets, weights, output_classes,
          average_across_timesteps=True,
          average_across_batch=True)
      res = sess.run(average_loss_per_example)
      self.assertAllClose(res, 1.60944)

      average_loss_per_sequence = seq2seq.sequence_loss(
          logits, targets, weights, output_classes,
          average_across_timesteps=False,
          average_across_batch=True)
      res = sess.run(average_loss_per_sequence)
      self.assertAllClose(res, 4.828314)

      total_loss = seq2seq.sequence_loss(
          logits, targets, weights, output_classes,
          average_across_timesteps=False,
          average_across_batch=False)
      res = sess.run(total_loss)
      self.assertAllClose(res, 9.656628)

  def testSequenceLossByExample(self):
    with self.test_session() as sess:
      output_classes = 5
      logits = [tf.constant(i + 0.5, shape=[2, output_classes])
                for i in xrange(3)]
      targets = [tf.constant(i, tf.int32, shape=[2]) for i in xrange(3)]
      weights = [tf.constant(1.0, shape=[2]) for _ in xrange(3)]

      average_loss_per_example = seq2seq.sequence_loss_by_example(
          logits, targets, weights, output_classes,
          average_across_timesteps=True)
      res = sess.run(average_loss_per_example)
      self.assertAllClose(res, np.asarray([1.609438, 1.609438]))

      loss_per_sequence = seq2seq.sequence_loss_by_example(
          logits, targets, weights, output_classes,
          average_across_timesteps=False)
      res = sess.run(loss_per_sequence)
      self.assertAllClose(res, np.asarray([4.828314, 4.828314]))
  def testModelWithBuckets(self):
    """Larger test that does full sequence-to-sequence model training."""
    # We learn to copy 10 symbols in 2 buckets: length 4 and length 8.
    classes = 10
    buckets = [(4, 4), (8, 8)]
    # We use sampled softmax so we keep output projection separate.
    w = tf.get_variable("proj_w", [24, classes])
    w_t = tf.transpose(w)
    b = tf.get_variable("proj_b", [classes])
    # Here comes a sample Seq2Seq model using GRU cells.
    def SampleGRUSeq2Seq(enc_inp, dec_inp, weights):
      """Example sequence-to-sequence model that uses GRU cells."""
      def GRUSeq2Seq(enc_inp, dec_inp):
        cell = rnn_cell.MultiRNNCell([rnn_cell.GRUCell(24)] * 2)
        return seq2seq.embedding_attention_seq2seq(
            enc_inp, dec_inp, cell, classes, classes, output_projection=(w, b))
      targets = [dec_inp[i+1] for i in xrange(len(dec_inp) - 1)] + [0]
      def SampledLoss(inputs, labels):
        labels = tf.reshape(labels, [-1, 1])
        return tf.nn.sampled_softmax_loss(w_t, b, inputs, labels, 8, classes)
      return seq2seq.model_with_buckets(enc_inp, dec_inp, targets, weights,
                                        buckets, classes, GRUSeq2Seq,
                                        softmax_loss_function=SampledLoss)
    # Now we construct the copy model.
    with self.test_session() as sess:
      tf.set_random_seed(111)
      batch_size = 32
      inp = [tf.placeholder(tf.int32, shape=[None]) for _ in xrange(8)]
      out = [tf.placeholder(tf.int32, shape=[None]) for _ in xrange(8)]
      weights = [tf.ones_like(inp[0], dtype=tf.float32) for _ in xrange(8)]
      with tf.variable_scope("root"):
        _, losses = SampleGRUSeq2Seq(inp, out, weights)
        updates = []
        params = tf.all_variables()
        optimizer = tf.train.AdamOptimizer(0.03, epsilon=1e-5)
        for i in xrange(len(buckets)):
          full_grads = tf.gradients(losses[i], params)
          grads, _ = tf.clip_by_global_norm(full_grads, 30.0)
          update = optimizer.apply_gradients(zip(grads, params))
          updates.append(update)
        sess.run([tf.initialize_all_variables()])
        for ep in xrange(3):
          log_perp = 0.0
          for _ in xrange(50):
            bucket = random.choice(range(len(buckets)))
            length = buckets[bucket][0]
            i = [np.array([np.random.randint(9) + 1 for _ in xrange(batch_size)],
                          dtype=np.int32) for _ in xrange(length)]
            # 0 is our "GO" symbol here.
            o = [np.array([0 for _ in xrange(batch_size)], dtype=np.int32)] + i
            feed = {}
            for l in xrange(length):
              feed[inp[l].name] = i[l]
              feed[out[l].name] = o[l]
            if length < 8:  # For the 4-bucket, we need the 5th as target.
              feed[out[length].name] = o[length]
            res = sess.run([updates[bucket], losses[bucket]], feed)
            log_perp += float(res[1])
          perp = math.exp(log_perp / 100)
          print "step %d avg. perp %f" % ((ep + 1)*50, perp)
        self.assertLess(perp, 2.5)


if __name__ == "__main__":
  tf.test.main()
tensorflow/models/rnn/translate/BUILD (new file, 71 lines)

@@ -0,0 +1,71 @@
# Description:
# Example neural translation models.

package(default_visibility = ["//visibility:public"])

licenses(["notice"])  # Apache 2.0

exports_files(["LICENSE"])

py_library(
    name = "data_utils",
    srcs = [
        "data_utils.py",
    ],
    deps = [
        "//tensorflow:tensorflow_py",
    ],
)

py_library(
    name = "seq2seq_model",
    srcs = [
        "seq2seq_model.py",
    ],
    deps = [
        ":data_utils",
        "//tensorflow:tensorflow_py",
        "//tensorflow/models/rnn:seq2seq",
    ],
)

py_binary(
    name = "translate",
    srcs = [
        "translate.py",
    ],
    deps = [
        ":data_utils",
        ":seq2seq_model",
        "//tensorflow:tensorflow_py",
    ],
)

py_test(
    name = "translate_test",
    size = "medium",
    srcs = [
        "translate.py",
    ],
    args = [
        "--self_test=True",
    ],
    main = "translate.py",
    deps = [
        ":data_utils",
        ":seq2seq_model",
        "//tensorflow:tensorflow_py",
    ],
)

filegroup(
    name = "all_files",
    srcs = glob(
        ["**/*"],
        exclude = [
            "**/METADATA",
            "**/OWNERS",
        ],
    ),
    visibility = ["//tensorflow:__subpackages__"],
)
tensorflow/models/rnn/translate/__init__.py (new executable file, 0 lines)

tensorflow/models/rnn/translate/data_utils.py (new file, 264 lines)

@@ -0,0 +1,264 @@
"""Utilities for downloading data from WMT, tokenizing, vocabularies."""
|
||||
|
||||
import gzip
|
||||
import os
|
||||
import re
|
||||
import tarfile
|
||||
import urllib
|
||||
|
||||
from tensorflow.python.platform import gfile
|
||||
|
||||
# Special vocabulary symbols - we always put them at the start.
|
||||
_PAD = "_PAD"
|
||||
_GO = "_GO"
|
||||
_EOS = "_EOS"
|
||||
_UNK = "_UNK"
|
||||
_START_VOCAB = [_PAD, _GO, _EOS, _UNK]
|
||||
|
||||
PAD_ID = 0
|
||||
GO_ID = 1
|
||||
EOS_ID = 2
|
||||
UNK_ID = 3
|
||||
|
||||
# Regular expressions used to tokenize.
|
||||
_WORD_SPLIT = re.compile("([.,!?\"':;)(])")
|
||||
_DIGIT_RE = re.compile(r"\d")
|
||||
|
||||
# URLs for WMT data.
|
||||
_WMT_ENFR_TRAIN_URL = "http://www.statmt.org/wmt10/training-giga-fren.tar"
|
||||
_WMT_ENFR_DEV_URL = "http://www.statmt.org/wmt15/dev-v2.tgz"
|
||||
|
||||
|
||||
def maybe_download(directory, filename, url):
  """Download filename from url unless it's already in directory."""
  if not os.path.exists(directory):
    print "Creating directory %s" % directory
    os.mkdir(directory)
  filepath = os.path.join(directory, filename)
  if not os.path.exists(filepath):
    print "Downloading %s to %s" % (url, filepath)
    filepath, _ = urllib.urlretrieve(url, filepath)
    statinfo = os.stat(filepath)
    print "Successfully downloaded", filename, statinfo.st_size, "bytes"
  return filepath


def gunzip_file(gz_path, new_path):
  """Unzips from gz_path into new_path."""
  print "Unpacking %s to %s" % (gz_path, new_path)
  with gzip.open(gz_path, "rb") as gz_file:
    with open(new_path, "w") as new_file:
      for line in gz_file:
        new_file.write(line)

def get_wmt_enfr_train_set(directory):
  """Download the WMT en-fr training corpus to directory unless it's there."""
  train_path = os.path.join(directory, "giga-fren.release2")
  if not (gfile.Exists(train_path + ".fr") and gfile.Exists(train_path + ".en")):
    corpus_file = maybe_download(directory, "training-giga-fren.tar",
                                 _WMT_ENFR_TRAIN_URL)
    print "Extracting tar file %s" % corpus_file
    with tarfile.open(corpus_file, "r") as corpus_tar:
      corpus_tar.extractall(directory)
    gunzip_file(train_path + ".fr.gz", train_path + ".fr")
    gunzip_file(train_path + ".en.gz", train_path + ".en")
  return train_path

def get_wmt_enfr_dev_set(directory):
  """Download the WMT en-fr dev set to directory unless it's there."""
  dev_name = "newstest2013"
  dev_path = os.path.join(directory, dev_name)
  if not (gfile.Exists(dev_path + ".fr") and gfile.Exists(dev_path + ".en")):
    dev_file = maybe_download(directory, "dev-v2.tgz", _WMT_ENFR_DEV_URL)
    print "Extracting tgz file %s" % dev_file
    with tarfile.open(dev_file, "r:gz") as dev_tar:
      fr_dev_file = dev_tar.getmember("dev/" + dev_name + ".fr")
      en_dev_file = dev_tar.getmember("dev/" + dev_name + ".en")
      fr_dev_file.name = dev_name + ".fr"  # Extract without "dev/" prefix.
      en_dev_file.name = dev_name + ".en"
      dev_tar.extract(fr_dev_file, directory)
      dev_tar.extract(en_dev_file, directory)
  return dev_path

def basic_tokenizer(sentence):
  """Very basic tokenizer: split the sentence into a list of tokens."""
  words = []
  for space_separated_fragment in sentence.strip().split():
    words.extend(re.split(_WORD_SPLIT, space_separated_fragment))
  return [w for w in words if w]

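# Example (toy input, shown as an interpreter session): the tokenizer splits
# on whitespace and on each punctuation character captured by _WORD_SPLIT,
# dropping empty strings:
#   >>> basic_tokenizer("Hello, world!")
#   ['Hello', ',', 'world', '!']
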
def create_vocabulary(vocabulary_path, data_path, max_vocabulary_size,
                      tokenizer=None, normalize_digits=True):
  """Create vocabulary file (if it does not exist yet) from data file.

  Data file is assumed to contain one sentence per line. Each sentence is
  tokenized and digits are normalized (if normalize_digits is set).
  Vocabulary contains the most-frequent tokens up to max_vocabulary_size.
  We write it to vocabulary_path in a one-token-per-line format, so that later
  the token in the first line gets id=0, the token in the second line gets
  id=1, and so on.

  Args:
    vocabulary_path: path where the vocabulary will be created.
    data_path: data file that will be used to create vocabulary.
    max_vocabulary_size: limit on the size of the created vocabulary.
    tokenizer: a function to use to tokenize each data sentence;
      if None, basic_tokenizer will be used.
    normalize_digits: Boolean; if true, all digits are replaced by 0s.
  """
  if not gfile.Exists(vocabulary_path):
    print "Creating vocabulary %s from data %s" % (vocabulary_path, data_path)
    vocab = {}
    with gfile.GFile(data_path, mode="r") as f:
      counter = 0
      for line in f:
        counter += 1
        if counter % 100000 == 0: print "  processing line %d" % counter
        tokens = tokenizer(line) if tokenizer else basic_tokenizer(line)
        for w in tokens:
          word = re.sub(_DIGIT_RE, "0", w) if normalize_digits else w
          if word in vocab:
            vocab[word] += 1
          else:
            vocab[word] = 1
      vocab_list = _START_VOCAB + sorted(vocab, key=vocab.get, reverse=True)
      if len(vocab_list) > max_vocabulary_size:
        vocab_list = vocab_list[:max_vocabulary_size]
      with gfile.GFile(vocabulary_path, mode="w") as vocab_file:
        for w in vocab_list:
          vocab_file.write(w + "\n")

def initialize_vocabulary(vocabulary_path):
  """Initialize vocabulary from file.

  We assume the vocabulary is stored one-item-per-line, so a file:
    dog
    cat
  will result in a vocabulary {"dog": 0, "cat": 1}, and this function will
  also return the reversed-vocabulary ["dog", "cat"].

  Args:
    vocabulary_path: path to the file containing the vocabulary.

  Returns:
    a pair: the vocabulary (a dictionary mapping string to integers), and
    the reversed vocabulary (a list, which reverses the vocabulary mapping).

  Raises:
    ValueError: if the provided vocabulary_path does not exist.
  """
  if gfile.Exists(vocabulary_path):
    rev_vocab = []
    with gfile.GFile(vocabulary_path, mode="r") as f:
      rev_vocab.extend(f.readlines())
    rev_vocab = [line.strip() for line in rev_vocab]
    vocab = dict([(x, y) for (y, x) in enumerate(rev_vocab)])
    return vocab, rev_vocab
  else:
    raise ValueError("Vocabulary file %s not found." % vocabulary_path)

def sentence_to_token_ids(sentence, vocabulary,
                          tokenizer=None, normalize_digits=True):
  """Convert a string to list of integers representing token-ids.

  For example, a sentence "I have a dog" may become tokenized into
  ["I", "have", "a", "dog"] and with vocabulary {"I": 1, "have": 2,
  "a": 4, "dog": 7} this function will return [1, 2, 4, 7].

  Args:
    sentence: a string, the sentence to convert to token-ids.
    vocabulary: a dictionary mapping tokens to integers.
    tokenizer: a function to use to tokenize each sentence;
      if None, basic_tokenizer will be used.
    normalize_digits: Boolean; if true, all digits are replaced by 0s.

  Returns:
    a list of integers, the token-ids for the sentence.
  """
  if tokenizer:
    words = tokenizer(sentence)
  else:
    words = basic_tokenizer(sentence)
  if not normalize_digits:
    return [vocabulary.get(w, UNK_ID) for w in words]
  # Normalize digits by 0 before looking words up in the vocabulary.
  return [vocabulary.get(re.sub(_DIGIT_RE, "0", w), UNK_ID) for w in words]

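# For example, using the toy vocabulary from the docstring above:
#
#   sentence_to_token_ids("I have a dog",
#                         {"I": 1, "have": 2, "a": 4, "dog": 7})
#   # -> [1, 2, 4, 7]; tokens missing from the vocabulary map to UNK_ID.
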
def data_to_token_ids(data_path, target_path, vocabulary_path,
                      tokenizer=None, normalize_digits=True):
  """Tokenize data file and turn into token-ids using given vocabulary file.

  This function loads data line-by-line from data_path, calls the above
  sentence_to_token_ids, and saves the result to target_path. See comment
  for sentence_to_token_ids on the details of token-ids format.

  Args:
    data_path: path to the data file in one-sentence-per-line format.
    target_path: path where the file with token-ids will be created.
    vocabulary_path: path to the vocabulary file.
    tokenizer: a function to use to tokenize each sentence;
      if None, basic_tokenizer will be used.
    normalize_digits: Boolean; if true, all digits are replaced by 0s.
  """
  if not gfile.Exists(target_path):
    print "Tokenizing data in %s" % data_path
    vocab, _ = initialize_vocabulary(vocabulary_path)
    with gfile.GFile(data_path, mode="r") as data_file:
      with gfile.GFile(target_path, mode="w") as tokens_file:
        counter = 0
        for line in data_file:
          counter += 1
          if counter % 100000 == 0: print "  tokenizing line %d" % counter
          token_ids = sentence_to_token_ids(line, vocab, tokenizer,
                                            normalize_digits)
          tokens_file.write(" ".join([str(tok) for tok in token_ids]) + "\n")

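# A usage sketch with hypothetical paths:
#
#   data_to_token_ids("/tmp/train.en", "/tmp/train.ids40000.en",
#                     "/tmp/vocab40000.en")
#   # Each line of the target file is a space-separated list of token-ids.
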
def prepare_wmt_data(data_dir, en_vocabulary_size, fr_vocabulary_size):
  """Get WMT data into data_dir, create vocabularies and tokenize data.

  Args:
    data_dir: directory in which the data sets will be stored.
    en_vocabulary_size: size of the English vocabulary to create and use.
    fr_vocabulary_size: size of the French vocabulary to create and use.

  Returns:
    A tuple of 6 elements:
      (1) path to the token-ids for English training data-set,
      (2) path to the token-ids for French training data-set,
      (3) path to the token-ids for English development data-set,
      (4) path to the token-ids for French development data-set,
      (5) path to the English vocabulary file,
      (6) path to the French vocabulary file.
  """
  # Get wmt data to the specified directory.
  train_path = get_wmt_enfr_train_set(data_dir)
  dev_path = get_wmt_enfr_dev_set(data_dir)

  # Create vocabularies of the appropriate sizes.
  fr_vocab_path = os.path.join(data_dir, "vocab%d.fr" % fr_vocabulary_size)
  en_vocab_path = os.path.join(data_dir, "vocab%d.en" % en_vocabulary_size)
  create_vocabulary(fr_vocab_path, train_path + ".fr", fr_vocabulary_size)
  create_vocabulary(en_vocab_path, train_path + ".en", en_vocabulary_size)

  # Create token ids for the training data.
  fr_train_ids_path = train_path + (".ids%d.fr" % fr_vocabulary_size)
  en_train_ids_path = train_path + (".ids%d.en" % en_vocabulary_size)
  data_to_token_ids(train_path + ".fr", fr_train_ids_path, fr_vocab_path)
  data_to_token_ids(train_path + ".en", en_train_ids_path, en_vocab_path)

  # Create token ids for the development data.
  fr_dev_ids_path = dev_path + (".ids%d.fr" % fr_vocabulary_size)
  en_dev_ids_path = dev_path + (".ids%d.en" % en_vocabulary_size)
  data_to_token_ids(dev_path + ".fr", fr_dev_ids_path, fr_vocab_path)
  data_to_token_ids(dev_path + ".en", en_dev_ids_path, en_vocab_path)

  return (en_train_ids_path, fr_train_ids_path,
          en_dev_ids_path, fr_dev_ids_path,
          en_vocab_path, fr_vocab_path)

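# A usage sketch (the directory is illustrative; the WMT corpus is
# downloaded on demand by the get_wmt_enfr_* helpers above):
#
#   (en_train, fr_train, en_dev, fr_dev,
#    en_vocab, fr_vocab) = prepare_wmt_data("/tmp", 40000, 40000)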

tensorflow/models/rnn/translate/seq2seq_model.py (new file, 268 lines)
@ -0,0 +1,268 @@

"""Sequence-to-sequence model with an attention mechanism."""

import random

import numpy as np
import tensorflow as tf

from tensorflow.models.rnn import rnn_cell
from tensorflow.models.rnn import seq2seq

from tensorflow.models.rnn.translate import data_utils


class Seq2SeqModel(object):
  """Sequence-to-sequence model with attention and for multiple buckets.

  This class implements a multi-layer recurrent neural network as encoder,
  and an attention-based decoder. This is the same as the model described in
  this paper: http://arxiv.org/abs/1412.7449 - please look there for details,
  or into the seq2seq library for a complete model implementation.
  This class also supports GRU cells in addition to LSTM cells, and
  sampled softmax to handle a large output vocabulary. A single-layer
  version of this model, but with a bi-directional encoder, was presented in
    http://arxiv.org/abs/1409.0473
  and sampled softmax is described in Section 3 of the following paper:
    http://arxiv.org/pdf/1412.2007v2.pdf
  """

  def __init__(self, source_vocab_size, target_vocab_size, buckets, size,
               num_layers, max_gradient_norm, batch_size, learning_rate,
               learning_rate_decay_factor, use_lstm=False,
               num_samples=512, forward_only=False):
    """Create the model.

    Args:
      source_vocab_size: size of the source vocabulary.
      target_vocab_size: size of the target vocabulary.
      buckets: a list of pairs (I, O), where I specifies maximum input length
        that will be processed in that bucket, and O specifies maximum output
        length. Training instances that have inputs longer than I or outputs
        longer than O will be pushed to the next bucket and padded accordingly.
        We assume that the list is sorted, e.g., [(2, 4), (8, 16)].
      size: number of units in each layer of the model.
      num_layers: number of layers in the model.
      max_gradient_norm: gradients will be clipped to maximally this norm.
      batch_size: the size of the batches used during training;
        the model construction is independent of batch_size, so it can be
        changed after initialization if this is convenient, e.g., for decoding.
      learning_rate: learning rate to start with.
      learning_rate_decay_factor: decay learning rate by this much when needed.
      use_lstm: if true, we use LSTM cells instead of GRU cells.
      num_samples: number of samples for sampled softmax.
      forward_only: if set, we do not construct the backward pass in the model.
    """
    self.source_vocab_size = source_vocab_size
    self.target_vocab_size = target_vocab_size
    self.buckets = buckets
    self.batch_size = batch_size
    self.learning_rate = tf.Variable(float(learning_rate), trainable=False)
    self.learning_rate_decay_op = self.learning_rate.assign(
        self.learning_rate * learning_rate_decay_factor)
    self.global_step = tf.Variable(0, trainable=False)

    # If we use sampled softmax, we need an output projection.
    output_projection = None
    softmax_loss_function = None
    # Sampled softmax only makes sense if we sample less than vocabulary size.
    if num_samples > 0 and num_samples < self.target_vocab_size:
      with tf.device("/cpu:0"):
        w = tf.get_variable("proj_w", [size, self.target_vocab_size])
        w_t = tf.transpose(w)
        b = tf.get_variable("proj_b", [self.target_vocab_size])
      output_projection = (w, b)

      def sampled_loss(inputs, labels):
        with tf.device("/cpu:0"):
          labels = tf.reshape(labels, [-1, 1])
          return tf.nn.sampled_softmax_loss(w_t, b, inputs, labels, num_samples,
                                            self.target_vocab_size)
      softmax_loss_function = sampled_loss

    # Create the internal multi-layer cell for our RNN.
    single_cell = rnn_cell.GRUCell(size)
    if use_lstm:
      single_cell = rnn_cell.BasicLSTMCell(size)
    cell = single_cell
    if num_layers > 1:
      cell = rnn_cell.MultiRNNCell([single_cell] * num_layers)

    # The seq2seq function: we use embedding for the input and attention.
    def seq2seq_f(encoder_inputs, decoder_inputs, do_decode):
      return seq2seq.embedding_attention_seq2seq(
          encoder_inputs, decoder_inputs, cell, source_vocab_size,
          target_vocab_size, output_projection=output_projection,
          feed_previous=do_decode)

    # Feeds for inputs.
    self.encoder_inputs = []
    self.decoder_inputs = []
    self.target_weights = []
    for i in xrange(buckets[-1][0]):  # Last bucket is the biggest one.
      self.encoder_inputs.append(tf.placeholder(tf.int32, shape=[None],
                                                name="encoder{0}".format(i)))
    for i in xrange(buckets[-1][1] + 1):
      self.decoder_inputs.append(tf.placeholder(tf.int32, shape=[None],
                                                name="decoder{0}".format(i)))
      self.target_weights.append(tf.placeholder(tf.float32, shape=[None],
                                                name="weight{0}".format(i)))

    # Our targets are decoder inputs shifted by one.
    targets = [self.decoder_inputs[i + 1]
               for i in xrange(len(self.decoder_inputs) - 1)]

    # Training outputs and losses.
    if forward_only:
      self.outputs, self.losses = seq2seq.model_with_buckets(
          self.encoder_inputs, self.decoder_inputs, targets,
          self.target_weights, buckets, self.target_vocab_size,
          lambda x, y: seq2seq_f(x, y, True),
          softmax_loss_function=softmax_loss_function)
      # If we use output projection, we need to project outputs for decoding.
      if output_projection is not None:
        for b in xrange(len(buckets)):
          self.outputs[b] = [tf.nn.xw_plus_b(output, output_projection[0],
                                             output_projection[1])
                             for output in self.outputs[b]]
    else:
      self.outputs, self.losses = seq2seq.model_with_buckets(
          self.encoder_inputs, self.decoder_inputs, targets,
          self.target_weights, buckets, self.target_vocab_size,
          lambda x, y: seq2seq_f(x, y, False),
          softmax_loss_function=softmax_loss_function)

    # Gradients and SGD update operation for training the model.
    params = tf.trainable_variables()
    if not forward_only:
      self.gradient_norms = []
      self.updates = []
      opt = tf.train.GradientDescentOptimizer(self.learning_rate)
      for b in xrange(len(buckets)):
        gradients = tf.gradients(self.losses[b], params)
        clipped_gradients, norm = tf.clip_by_global_norm(gradients,
                                                         max_gradient_norm)
        self.gradient_norms.append(norm)
        self.updates.append(opt.apply_gradients(
            zip(clipped_gradients, params), global_step=self.global_step))

    self.saver = tf.train.Saver(tf.all_variables())

  def step(self, session, encoder_inputs, decoder_inputs, target_weights,
           bucket_id, forward_only):
    """Run a step of the model feeding the given inputs.

    Args:
      session: tensorflow session to use.
      encoder_inputs: list of numpy int vectors to feed as encoder inputs.
      decoder_inputs: list of numpy int vectors to feed as decoder inputs.
      target_weights: list of numpy float vectors to feed as target weights.
      bucket_id: which bucket of the model to use.
      forward_only: whether to do the backward step or only forward.

    Returns:
      A triple consisting of gradient norm (or None if we did not do backward),
      average perplexity, and the outputs.

    Raises:
      ValueError: if length of encoder_inputs, decoder_inputs, or
        target_weights disagrees with bucket size for the specified bucket_id.
    """
    # Check if the sizes match.
    encoder_size, decoder_size = self.buckets[bucket_id]
    if len(encoder_inputs) != encoder_size:
      raise ValueError("Encoder length must be equal to the one in bucket,"
                       " %d != %d." % (len(encoder_inputs), encoder_size))
    if len(decoder_inputs) != decoder_size:
      raise ValueError("Decoder length must be equal to the one in bucket,"
                       " %d != %d." % (len(decoder_inputs), decoder_size))
    if len(target_weights) != decoder_size:
      raise ValueError("Weights length must be equal to the one in bucket,"
                       " %d != %d." % (len(target_weights), decoder_size))

    # Input feed: encoder inputs, decoder inputs, target_weights, as provided.
    input_feed = {}
    for l in xrange(encoder_size):
      input_feed[self.encoder_inputs[l].name] = encoder_inputs[l]
    for l in xrange(decoder_size):
      input_feed[self.decoder_inputs[l].name] = decoder_inputs[l]
      input_feed[self.target_weights[l].name] = target_weights[l]

    # Since our targets are decoder inputs shifted by one, we need one more.
    last_target = self.decoder_inputs[decoder_size].name
    input_feed[last_target] = np.zeros([self.batch_size], dtype=np.int32)

    # Output feed: depends on whether we do a backward step or not.
    if not forward_only:
      output_feed = [self.updates[bucket_id],  # Update Op that does SGD.
                     self.gradient_norms[bucket_id],  # Gradient norm.
                     self.losses[bucket_id]]  # Loss for this batch.
    else:
      output_feed = [self.losses[bucket_id]]  # Loss for this batch.
      for l in xrange(decoder_size):  # Output logits.
        output_feed.append(self.outputs[bucket_id][l])

    outputs = session.run(output_feed, input_feed)
    if not forward_only:
      return outputs[1], outputs[2], None  # Gradient norm, loss, no outputs.
    else:
      return None, outputs[0], outputs[1:]  # No gradient norm, loss, outputs.

  def get_batch(self, data, bucket_id):
    """Get a random batch of data from the specified bucket, prepare for step.

    To feed data in step(..) it must be a list of batch-major vectors, while
    data here contains single length-major cases. So the main logic of this
    function is to re-index data cases to be in the proper format for feeding.

    Args:
      data: a tuple of size len(self.buckets) in which each element contains
        lists of pairs of input and output data that we use to create a batch.
      bucket_id: integer, which bucket to get the batch for.

    Returns:
      The triple (encoder_inputs, decoder_inputs, target_weights) for
      the constructed batch that has the proper format to call step(...) later.
    """
    encoder_size, decoder_size = self.buckets[bucket_id]
    encoder_inputs, decoder_inputs = [], []

    # Get a random batch of encoder and decoder inputs from data,
    # pad them if needed, reverse encoder inputs and add GO to decoder.
    for _ in xrange(self.batch_size):
      encoder_input, decoder_input = random.choice(data[bucket_id])

      # Encoder inputs are padded and then reversed.
      encoder_pad = [data_utils.PAD_ID] * (encoder_size - len(encoder_input))
      encoder_inputs.append(list(reversed(encoder_input + encoder_pad)))

      # Decoder inputs get an extra "GO" symbol, and are then padded.
      decoder_pad_size = decoder_size - len(decoder_input) - 1
      decoder_inputs.append([data_utils.GO_ID] + decoder_input +
                            [data_utils.PAD_ID] * decoder_pad_size)

    # Now we create batch-major vectors from the data selected above.
    batch_encoder_inputs, batch_decoder_inputs, batch_weights = [], [], []

    # Batch encoder inputs are just re-indexed encoder_inputs.
    for length_idx in xrange(encoder_size):
      batch_encoder_inputs.append(
          np.array([encoder_inputs[batch_idx][length_idx]
                    for batch_idx in xrange(self.batch_size)], dtype=np.int32))

    # Batch decoder inputs are re-indexed decoder_inputs; we create weights.
    for length_idx in xrange(decoder_size):
      batch_decoder_inputs.append(
          np.array([decoder_inputs[batch_idx][length_idx]
                    for batch_idx in xrange(self.batch_size)], dtype=np.int32))

      # Create target_weights to be 0 for targets that are padding.
      batch_weight = np.ones(self.batch_size, dtype=np.float32)
      for batch_idx in xrange(self.batch_size):
        # We set weight to 0 if the corresponding target is a PAD symbol.
        # The corresponding target is decoder_input shifted by 1 forward.
        if length_idx < decoder_size - 1:
          target = decoder_inputs[batch_idx][length_idx + 1]
        if length_idx == decoder_size - 1 or target == data_utils.PAD_ID:
          batch_weight[batch_idx] = 0.0
      batch_weights.append(batch_weight)
    return batch_encoder_inputs, batch_decoder_inputs, batch_weights
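
# A construction sketch, assuming the hyperparameters that translate.py uses
# (illustrative only; see translate.py for the real training loop):
#
#   model = Seq2SeqModel(source_vocab_size=40000, target_vocab_size=40000,
#                        buckets=[(5, 10), (10, 15), (20, 25), (40, 50)],
#                        size=1024, num_layers=3, max_gradient_norm=5.0,
#                        batch_size=64, learning_rate=0.5,
#                        learning_rate_decay_factor=0.99)
#   encoder_inputs, decoder_inputs, target_weights = model.get_batch(data, 0)
#   _, step_loss, _ = model.step(session, encoder_inputs, decoder_inputs,
#                                target_weights, bucket_id=0,
#                                forward_only=False)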

tensorflow/models/rnn/translate/translate.py (new file, 260 lines)
@ -0,0 +1,260 @@

"""Binary for training translation models and decoding from them.

Running this program without --decode will download the WMT corpus into
the directory specified as --data_dir and tokenize it in a very basic way,
and then start training a model saving checkpoints to --train_dir.

Running with --decode starts an interactive loop so you can see how
the current checkpoint translates English sentences into French.

See the following papers for more information on neural translation models.
 * http://arxiv.org/abs/1409.3215
 * http://arxiv.org/abs/1409.0473
 * http://arxiv.org/pdf/1412.2007v2.pdf
"""

import math
import os
import random
import sys
import time

import tensorflow.python.platform

import numpy as np
import tensorflow as tf

from tensorflow.models.rnn.translate import data_utils
from tensorflow.models.rnn.translate import seq2seq_model
from tensorflow.python.platform import gfile


tf.app.flags.DEFINE_float("learning_rate", 0.5, "Learning rate.")
tf.app.flags.DEFINE_float("learning_rate_decay_factor", 0.99,
                          "Learning rate decays by this much.")
tf.app.flags.DEFINE_float("max_gradient_norm", 5.0,
                          "Clip gradients to this norm.")
tf.app.flags.DEFINE_integer("batch_size", 64,
                            "Batch size to use during training.")
tf.app.flags.DEFINE_integer("size", 1024, "Size of each model layer.")
tf.app.flags.DEFINE_integer("num_layers", 3, "Number of layers in the model.")
tf.app.flags.DEFINE_integer("en_vocab_size", 40000, "English vocabulary size.")
tf.app.flags.DEFINE_integer("fr_vocab_size", 40000, "French vocabulary size.")
tf.app.flags.DEFINE_string("data_dir", "/tmp", "Data directory")
tf.app.flags.DEFINE_string("train_dir", "/tmp", "Training directory.")
tf.app.flags.DEFINE_integer("max_train_data_size", 0,
                            "Limit on the size of training data (0: no limit).")
tf.app.flags.DEFINE_integer("steps_per_checkpoint", 200,
                            "How many training steps to do per checkpoint.")
tf.app.flags.DEFINE_boolean("decode", False,
                            "Set to True for interactive decoding.")
tf.app.flags.DEFINE_boolean("self_test", False,
                            "Run a self-test if this is set to True.")

FLAGS = tf.app.flags.FLAGS

# We use a number of buckets and pad to the closest one for efficiency.
# See seq2seq_model.Seq2SeqModel for details of how they work.
_buckets = [(5, 10), (10, 15), (20, 25), (40, 50)]
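# An illustrative case: a source sentence of 8 token-ids with a target of
# 18 ids (19 once EOS is appended) does not fit (5, 10) or (10, 15), so it
# lands in the (20, 25) bucket and is padded up to lengths 20 and 25.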


def read_data(source_path, target_path, max_size=None):
  """Read data from source and target files and put into buckets.

  Args:
    source_path: path to the file with token-ids for the source language.
    target_path: path to the file with token-ids for the target language;
      it must be aligned with the source file: the n-th line contains the
      desired output for the n-th line from the source_path.
    max_size: maximum number of lines to read; all others will be ignored.
      If 0 or None, data files will be read completely (no limit).

  Returns:
    data_set: a list of length len(_buckets); data_set[n] contains a list of
      (source, target) pairs read from the provided data files that fit
      into the n-th bucket, i.e., such that len(source) < _buckets[n][0] and
      len(target) < _buckets[n][1]; source and target are lists of token-ids.
  """
  data_set = [[] for _ in _buckets]
  with gfile.GFile(source_path, mode="r") as source_file:
    with gfile.GFile(target_path, mode="r") as target_file:
      source, target = source_file.readline(), target_file.readline()
      counter = 0
      while source and target and (not max_size or counter < max_size):
        counter += 1
        if counter % 100000 == 0:
          print "  reading data line %d" % counter
          sys.stdout.flush()
        source_ids = [int(x) for x in source.split()]
        target_ids = [int(x) for x in target.split()]
        target_ids.append(data_utils.EOS_ID)
        for bucket_id, (source_size, target_size) in enumerate(_buckets):
          if len(source_ids) < source_size and len(target_ids) < target_size:
            data_set[bucket_id].append([source_ids, target_ids])
            break
        source, target = source_file.readline(), target_file.readline()
  return data_set


def create_model(session, forward_only):
  """Create translation model and initialize or load parameters in session."""
  model = seq2seq_model.Seq2SeqModel(
      FLAGS.en_vocab_size, FLAGS.fr_vocab_size, _buckets,
      FLAGS.size, FLAGS.num_layers, FLAGS.max_gradient_norm, FLAGS.batch_size,
      FLAGS.learning_rate, FLAGS.learning_rate_decay_factor,
      forward_only=forward_only)
  ckpt = tf.train.get_checkpoint_state(FLAGS.train_dir)
  if ckpt and gfile.Exists(ckpt.model_checkpoint_path):
    print "Reading model parameters from %s" % ckpt.model_checkpoint_path
    model.saver.restore(session, ckpt.model_checkpoint_path)
  else:
    print "Created model with fresh parameters."
    session.run(tf.variables.initialize_all_variables())
  return model


def train():
  """Train an en->fr translation model using WMT data."""
  # Prepare WMT data.
  print "Preparing WMT data in %s" % FLAGS.data_dir
  en_train, fr_train, en_dev, fr_dev, _, _ = data_utils.prepare_wmt_data(
      FLAGS.data_dir, FLAGS.en_vocab_size, FLAGS.fr_vocab_size)

  with tf.Session() as sess:
    # Create model.
    print "Creating %d layers of %d units." % (FLAGS.num_layers, FLAGS.size)
    model = create_model(sess, False)

    # Read data into buckets and compute their sizes.
    print ("Reading development and training data (limit: %d)."
           % FLAGS.max_train_data_size)
    dev_set = read_data(en_dev, fr_dev)
    train_set = read_data(en_train, fr_train, FLAGS.max_train_data_size)
    train_bucket_sizes = [len(train_set[b]) for b in xrange(len(_buckets))]
    train_total_size = float(sum(train_bucket_sizes))

    # A bucket scale is a list of increasing numbers from 0 to 1 that we'll
    # use to select a bucket. The length of [scale[i], scale[i+1]] is
    # proportional to the size of the i-th training bucket, as used later.
    train_buckets_scale = [sum(train_bucket_sizes[:i + 1]) / train_total_size
                           for i in xrange(len(train_bucket_sizes))]
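    # An assumed illustration: with train_bucket_sizes = [100, 300, 600],
    # train_total_size is 1000.0 and train_buckets_scale is [0.1, 0.4, 1.0].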

    # This is the training loop.
    step_time, loss = 0.0, 0.0
    current_step = 0
    previous_losses = []
    while True:
      # Choose a bucket according to data distribution. We pick a random number
      # in [0, 1] and use the corresponding interval in train_buckets_scale.
      random_number_01 = np.random.random_sample()
      bucket_id = min([i for i in xrange(len(train_buckets_scale))
                       if train_buckets_scale[i] > random_number_01])

      # Get a batch and make a step.
      start_time = time.time()
      encoder_inputs, decoder_inputs, target_weights = model.get_batch(
          train_set, bucket_id)
      _, step_loss, _ = model.step(sess, encoder_inputs, decoder_inputs,
                                   target_weights, bucket_id, False)
      step_time += (time.time() - start_time) / FLAGS.steps_per_checkpoint
      loss += step_loss / FLAGS.steps_per_checkpoint
      current_step += 1

      # Once in a while, we save checkpoint, print statistics, and run evals.
      if current_step % FLAGS.steps_per_checkpoint == 0:
        # Print statistics for the previous epoch.
        perplexity = math.exp(loss) if loss < 300 else float('inf')
        print ("global step %d learning rate %.4f step-time %.2f perplexity "
               "%.2f" % (model.global_step.eval(), model.learning_rate.eval(),
                         step_time, perplexity))
        # Decrease learning rate if no improvement was seen over last 3 times.
        if len(previous_losses) > 2 and loss > max(previous_losses[-3:]):
          sess.run(model.learning_rate_decay_op)
        previous_losses.append(loss)
        # Save checkpoint and zero timer and loss.
        checkpoint_path = os.path.join(FLAGS.train_dir, "translate.ckpt")
        model.saver.save(sess, checkpoint_path, global_step=model.global_step)
        step_time, loss = 0.0, 0.0
        # Run evals on development set and print their perplexity.
        for bucket_id in xrange(len(_buckets)):
          encoder_inputs, decoder_inputs, target_weights = model.get_batch(
              dev_set, bucket_id)
          _, eval_loss, _ = model.step(sess, encoder_inputs, decoder_inputs,
                                       target_weights, bucket_id, True)
          eval_ppx = math.exp(eval_loss) if eval_loss < 300 else float('inf')
          print "  eval: bucket %d perplexity %.2f" % (bucket_id, eval_ppx)
        sys.stdout.flush()


def decode():
  """Interactively translate sentences typed on standard input."""
  with tf.Session() as sess:
    # Create model and load parameters.
    model = create_model(sess, True)
    model.batch_size = 1  # We decode one sentence at a time.

    # Load vocabularies.
    en_vocab_path = os.path.join(FLAGS.data_dir,
                                 "vocab%d.en" % FLAGS.en_vocab_size)
    fr_vocab_path = os.path.join(FLAGS.data_dir,
                                 "vocab%d.fr" % FLAGS.fr_vocab_size)
    en_vocab, _ = data_utils.initialize_vocabulary(en_vocab_path)
    _, rev_fr_vocab = data_utils.initialize_vocabulary(fr_vocab_path)

    # Decode from standard input.
    sys.stdout.write("> ")
    sys.stdout.flush()
    sentence = sys.stdin.readline()
    while sentence:
      # Get token-ids for the input sentence.
      token_ids = data_utils.sentence_to_token_ids(sentence, en_vocab)
      # Which bucket does it belong to?
      bucket_id = min([b for b in xrange(len(_buckets))
                       if _buckets[b][0] > len(token_ids)])
      # Get a 1-element batch to feed the sentence to the model.
      encoder_inputs, decoder_inputs, target_weights = model.get_batch(
          {bucket_id: [(token_ids, [])]}, bucket_id)
      # Get output logits for the sentence.
      _, _, output_logits = model.step(sess, encoder_inputs, decoder_inputs,
                                       target_weights, bucket_id, True)
      # This is a greedy decoder - outputs are just argmaxes of output_logits.
      outputs = [int(np.argmax(logit, axis=1)) for logit in output_logits]
      # If there is an EOS symbol in outputs, cut them at that point.
      if data_utils.EOS_ID in outputs:
        outputs = outputs[:outputs.index(data_utils.EOS_ID)]
      # Print out the French sentence corresponding to outputs.
      print " ".join([rev_fr_vocab[output] for output in outputs])
      print "> ",
      sys.stdout.flush()
      sentence = sys.stdin.readline()


def self_test():
  """Test the translation model."""
  with tf.Session() as sess:
    print "Self-test for neural translation model."
    # Create model with vocabularies of 10, 2 small buckets, 2 layers of 32.
    model = seq2seq_model.Seq2SeqModel(10, 10, [(3, 3), (6, 6)], 32, 2,
                                       5.0, 32, 0.3, 0.99, num_samples=8)
    sess.run(tf.variables.initialize_all_variables())

    # Fake data set for both the (3, 3) and (6, 6) bucket.
    data_set = ([([1, 1], [2, 2]), ([3, 3], [4]), ([5], [6])],
                [([1, 1, 1, 1, 1], [2, 2, 2, 2, 2]), ([3, 3, 3], [5, 6])])
    for _ in xrange(5):  # Train the fake model for 5 steps.
      bucket_id = random.choice([0, 1])
      encoder_inputs, decoder_inputs, target_weights = model.get_batch(
          data_set, bucket_id)
      model.step(sess, encoder_inputs, decoder_inputs, target_weights,
                 bucket_id, False)


def main(_):
  if FLAGS.self_test:
    self_test()
  elif FLAGS.decode:
    decode()
  else:
    train()


if __name__ == "__main__":
  tf.app.run()

@ -233,6 +233,14 @@ class Library(Document):
        # signatures.
        continue
      args_list.append(arg)

    # TODO(mrry): This is a workaround for documenting the signature of
    # functions that have the @contextlib.contextmanager decorator.
    # We should do something better.
    if argspec.varargs == "args" and argspec.keywords == "kwds":
      original_func = func.func_closure[0].cell_contents
      return self._generate_signature_for_function(original_func)

    if argspec.defaults:
      for arg, default in zip(
          argspec.args[first_arg_with_default:], argspec.defaults):

@ -76,7 +76,8 @@ def all_libraries(module_to_name, members, documented):
                    "xw_plus_b", "relu_layer", "lrn",
                    "batch_norm_with_global_normalization",
                    "batch_norm_with_global_normalization_grad",
                    "all_candidate_sampler",
                    "embedding_lookup_sparse"],
            prefix=PREFIX_TEXT),
    library('client', "Running Graphs", client_lib,
            exclude_symbols=["InteractiveSession"]),

@ -8,24 +8,31 @@ from tensorflow.python.ops import math_ops


def embedding_lookup(params, ids, name=None):
  """Looks up `ids` in a list of embedding tensors.

  This function is used to perform parallel lookups on the list of
  tensors in `params`. It is a generalization of
  [`tf.gather()`](array_ops.md#gather), where `params` is interpreted
  as a partition of a larger embedding tensor.

  If `len(params) > 1`, each element `id` of `ids` is partitioned between
  the elements of `params` by computing `p = id % len(params)`, and is
  then used to look up the slice `params[p][id // len(params), ...]`.

  The results of the lookup are then concatenated into a dense
  tensor. The returned tensor has shape `shape(ids) + shape(params)[1:]`.

  Args:
    params: A list of tensors with the same shape and type.
    ids: A `Tensor` with type `int32` containing the ids to be looked
      up in `params`.
    name: A name for the operation (optional).

  Returns:
    A `Tensor` with the same type as the tensors in `params`.

  Raises:
    ValueError: If `params` is empty.
  """
  if not isinstance(params, list):
    params = [params]
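
  # An assumed numeric illustration of the modular partitioning rule above:
  # with len(params) == 2, id 4 is looked up as params[4 % 2][4 // 2, ...]
  # (row 2 of params[0]) and id 5 as params[5 % 2][5 // 2, ...]
  # (row 2 of params[1]).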

@ -43,10 +43,10 @@ are as follows. If the 4-D `input` has shape
`[batch, in_height, in_width, ...]` and the 4-D `filter` has shape
`[filter_height, filter_width, ...]`, then

    shape(output) = [batch,
                     (in_height - filter_height + 1) / strides[1],
                     (in_width - filter_width + 1) / strides[2],
                     ...]

    output[b, i, j, :] =
        sum_{di, dj} input[b, strides[1] * i + di, strides[2] * j + dj, ...] *
@ -58,7 +58,7 @@ vectors. For `depthwise_conv_2d`, each scalar component `input[b, i, j, k]`
is multiplied by a vector `filter[di, dj, k]`, and all the vectors are
concatenated.

In the formula for `shape(output)`, the rounding direction depends on padding:

* `padding = 'SAME'`: Round down (only full size windows are considered).
* `padding = 'VALID'`: Round up (partial windows are included).
@ -81,7 +81,7 @@ In detail, the output is

for each tuple of indices `i`. The output shape is

    shape(output) = (shape(value) - ksize + 1) / strides

where the rounding direction depends on padding:
@ -119,10 +119,10 @@ TensorFlow provides several operations that help you perform classification.

## Embeddings

TensorFlow provides library support for looking up values in embedding
tensors.

@@embedding_lookup
@@embedding_lookup_sparse

## Evaluation
@ -336,15 +336,16 @@ def dropout(x, keep_prob, noise_shape=None, seed=None, name=None):
  By default, each element is kept or dropped independently. If `noise_shape`
  is specified, it must be
  [broadcastable](http://docs.scipy.org/doc/numpy/user/basics.broadcasting.html)
  to the shape of `x`, and only dimensions with `noise_shape[i] == shape(x)[i]`
  will make independent decisions. For example, if `shape(x) = [k, l, m, n]`
  and `noise_shape = [k, 1, 1, n]`, each batch and channel component will be
  kept independently and each row and column will be kept or not kept together.

  Args:
    x: A tensor.
    keep_prob: A Python float. The probability that each element is kept.
    noise_shape: A 1-D `Tensor` of type `int32`, representing the
      shape for randomly generated keep/drop flags.
    seed: A Python integer. Used to create a random seed.
      See [`set_random_seed`](constant_op.md#set_random_seed) for behavior.
    name: A name for this operation (optional).

@ -66,3 +66,7 @@ RUN bazel clean && \
    bazel build -c opt tensorflow/tools/docker:simple_console

ENV PYTHONPATH=/tensorflow/bazel-bin/tensorflow/tools/docker/simple_console.runfiles/:$PYTHONPATH

# We want to start Jupyter in the directory with our getting started
# tutorials.
WORKDIR /notebooks