Summary: Added a new RNNCell, DropoutCell, which wraps an existing RNNCell and applies dropout to its primary output (as defined by get_output_state_index()).
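For illustration, a minimal sketch of the wrapping pattern (not the actual Caffe2 class; the method signatures are simplified and the blob names are placeholders):

    class DropoutCellSketch(object):
        """Conceptual wrapper: dropout is applied only to the wrapped cell's
        primary output, i.e. the state selected by get_output_state_index()."""

        def __init__(self, internal_cell, dropout_ratio):
            self.internal_cell = internal_cell
            self.dropout_ratio = dropout_ratio

        def get_output_state_index(self):
            # Defer to the wrapped cell's notion of its primary output.
            return self.internal_cell.get_output_state_index()

        def apply(self, model, input_blob, states):
            states = list(self.internal_cell.apply(model, input_blob, states))
            idx = self.get_output_state_index()
            dropped, _mask = model.net.Dropout(
                states[idx],
                [str(states[idx]) + "_dropout",
                 str(states[idx]) + "_dropout_mask"],
                ratio=self.dropout_ratio,
                is_test=0,
            )
            # Other states (e.g. the LSTM cell state) pass through unchanged.
            states[idx] = dropped
            return states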
Reviewed By: salexspb
Differential Revision: D5084871
fbshipit-source-id: 60474af84e5757a12e7fdc3814840dc9ba8e32a1
Summary: As noted by salexspb, MultiRNNCell had unreliable gradient computation. The problem was that the recurrent gradient and the gradient computed within the backward step net were not being accumulated during the backward pass, but were instead written to the same blob, overwriting each other. This diff fixes that by artificially introducing an extra blob for the internal output and then accumulating it into the gradient coming from the recurrent connection.
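As a rough illustration of the accumulation idea (made-up blob names; this is not the actual RecurrentNetwork gradient plumbing):

    from caffe2.python import core, workspace
    import numpy as np

    # Give the recurrent gradient and the step-net-internal gradient separate
    # blobs, then accumulate them with Sum, instead of letting both write to
    # the same blob and overwrite each other.
    workspace.FeedBlob(
        "hidden_grad_from_recurrence", np.random.randn(4, 8).astype(np.float32))
    workspace.FeedBlob(
        "hidden_grad_internal", np.random.randn(4, 8).astype(np.float32))

    net = core.Net("gradient_accumulation_sketch")
    net.Sum(["hidden_grad_from_recurrence", "hidden_grad_internal"],
            "hidden_grad")
    workspace.RunNetOnce(net)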
Reviewed By: salexspb
Differential Revision: D5110059
fbshipit-source-id: 16add50989fe8866361bbc21afce5f214c5292fd
Summary:
Incorporate arbitrary dropout for encoder and decoder layers of Caffe2 NMT models using the current configuration. This involves separate output processing (_prepare_output() and _prepare_output_sequence()) for the final layer in a MultiRNNCell.
Switching to the newly introduced forward_only switch for RNN cells revealed an unrelated bug in our NetGradientChecker test, which urikz is investigating.
Reviewed By: salexspb
Differential Revision: D5031964
fbshipit-source-id: 19b49607d551aa3e2140041ef4e585f128c8f178
Summary:
Residual connections for the multilayer RNN encoder/decoder of the Caffe2 NMT model. Only 'add' connections are supported (the standard approach, which ves's TF experiments concluded was at least as good as other approaches), and they are only implemented for residual_level >= 1 (which also fits our use case).
It is the responsibility of the config to ensure dimension compatibility: each level at and beyond residual_level (in both the encoder and decoder) should have the same number of units, with the exception that a bidirectional initial encoder layer should have half the number of units of the succeeding layer if that next layer is a residual layer.
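A minimal sketch of the 'add' connection and the dimension requirement (illustrative blob names and shapes, not the NMT config itself):

    from caffe2.python import core, workspace
    import numpy as np

    # A residual layer's final output is its input plus what the layer itself
    # computes, so both must have the same number of units (D below).
    T, N, D = 5, 2, 256  # sequence length, batch size, units per residual layer
    workspace.FeedBlob("layer_input", np.random.randn(T, N, D).astype(np.float32))
    workspace.FeedBlob("layer_output", np.random.randn(T, N, D).astype(np.float32))

    net = core.Net("residual_add_sketch")
    net.Add(["layer_input", "layer_output"], "residual_output")
    workspace.RunNetOnce(net)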
Differential Revision: D5023160
fbshipit-source-id: f38c1b140638fee78cf3ef7d6b4602dd462484ee
Summary:
Update rnn_cell.py and the char_rnn.py example with the new `brew` model.
- Deprecate CNNModelHelper
- Replace all helper functions with brew helper functions
- Use the `model.net.<SingleOp>` format to create bare-bones operators for better clarity (see the sketch below)
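A small sketch of the two styles mentioned above (blob names and dimensions are illustrative):

    from caffe2.python import brew, model_helper

    model = model_helper.ModelHelper(name="brew_style_example")
    # Helper functions go through brew.<helper>(model, ...)
    fc1 = brew.fc(model, "data", "fc1", dim_in=128, dim_out=64)
    # Bare-bones operators are added directly with model.net.<SingleOp>(...)
    relu1 = model.net.Relu(fc1, "relu1")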
Reviewed By: salexspb
Differential Revision: D5062963
fbshipit-source-id: 254f7b9059a29621027d2b09e932f3f81db2e0ce
Summary:
Incorporating a definition of a cell's output and illustrating its usage by adding dropout to all cell types.
I think we should try to get rid of aliases in RecurrentNetwork, so that the output of applied_over_sequence is also always (state_1_all, state_2_all, ...). This way we can merge get_output_from_single_step, get_output_from_sequence and get_outputs_with_grads into a single method.
Let me know what you think!
Reviewed By: jhcross
Differential Revision: D4992913
fbshipit-source-id: 737939be336ad145f84e8733cd255d4f7188ef70
Summary:
This diff creates a generalized AttentionCell class, which will allow us to construct attention decoders out of arbitrary RNNCell components (with a particular view to using stacked, multi-layer RNNs).
In order to do this, we introduce a new optional input for RNNCell._apply which allows us to provide an additional input that is not processed by prepare_input(). Note that this is an argument only to _apply, not apply, since it is only meant to be used for additional recurrent connections to "embedded" cells, not for standalone RNNs.
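A conceptual sketch of the apply/_apply split described above (method signatures are simplified and not the exact rnn_cell ones):

    class CellSketch(object):
        def prepare_input(self, model, input_blob):
            # e.g. a learned projection of the raw input; identity here for brevity
            return input_blob

        def apply(self, model, input_t, seq_lengths, states, timestep):
            # Public entry point: the regular input always goes through
            # prepare_input() first.
            return self._apply(
                model, self.prepare_input(model, input_t),
                seq_lengths, states, timestep,
            )

        def _apply(self, model, input_t, seq_lengths, states, timestep,
                   extra_inputs=None):
            # extra_inputs (e.g. an attention context fed to an embedded cell)
            # bypasses prepare_input() and is only reachable via _apply.
            raise NotImplementedError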
Reviewed By: urikz
Differential Revision: D4998465
fbshipit-source-id: 473009ea4917e86e365f9d23aa2f11a46a94fd65
Summary:
Use rnn_cell's multi-layer cell for the LSTM benchmark. While doing this, I had not changed the initial_states and got an inconsistent result from rnn_cell, so I added an assertion that the length of initial_states is 2 * num_layers.
+ Fix a division-by-zero error
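The added check, in spirit (names are illustrative, not the benchmark's actual blob names):

    # A stacked LSTM carries one hidden state and one cell state per layer,
    # so the benchmark must pass 2 * num_layers initial state blobs.
    num_layers = 4
    initial_states = []
    for i in range(num_layers):
        initial_states.extend(
            ["hidden_init_{}".format(i), "cell_init_{}".format(i)])
    assert len(initial_states) == 2 * num_layers, \
        "expected one hidden and one cell initial state per layer"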
Reviewed By: salexspb
Differential Revision: D5003177
fbshipit-source-id: a8250b825394c352428a0f067098dfcd7516ab2a
Summary: Add Python support for arbitrary (unidirectional) recurrent networks with the MultiRNNCell abstraction. Since the combined step net for all layers is created at one time (in the _apply method), this may be optimizable as-is. The LSTM() function is extended to accept a list of numbers of units for the dim_out argument, producing a multi-layer LSTM in that case.
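A sketch of the extended call (the argument order and the initial-state ordering are assumed from the rnn_cell module around this diff; blob names are illustrative):

    from caffe2.python import model_helper, rnn_cell

    model = model_helper.ModelHelper(name="stacked_lstm_example")
    initial_states = [
        "hidden_init_0", "cell_init_0",  # layer 0
        "hidden_init_1", "cell_init_1",  # layer 1
    ]
    lstm_outputs = rnn_cell.LSTM(
        model,
        "input_sequence",    # (T, N, 64) blob, fed elsewhere
        "seq_lengths",
        initial_states,
        dim_in=64,
        dim_out=[128, 128],  # a list builds a two-layer stacked LSTM
        scope="stacked_lstm",
    )
    # lstm_outputs[0] is (assumed here) the per-step output of the last layer.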
Reviewed By: salexspb
Differential Revision: D4965001
fbshipit-source-id: 39c069468d5b40bf803503cf62046a479ca83cbb
Summary:
This is a naive layering approach until we have a better one. It could be C++-based and support diagonal execution. Not integrating this into the main LSTM API yet, as it might be revised a bit. I would like to land it so we can compare against the current implementation in the benchmark and also use it as an example of how LSTMs can be combined (as some folks are doing similar things with some variations).
Later we can make LSTM() support the API of layered_LSTM() and also change it under the hood so that it stacks cells into a bigger cell instead. That way, if we make the RNN op use a kind of DAG net, the RNN op can provide more parallelism across stacked cells.
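A sketch of the naive layering pattern (separate LSTMs chained output-to-input; the rnn_cell.LSTM signature and return ordering are assumed, and blob names are illustrative):

    from caffe2.python import model_helper, rnn_cell

    model = model_helper.ModelHelper(name="naive_layered_lstm")
    dims = [64, 128, 128]  # input dim followed by each layer's hidden dim
    layer_input = "input_sequence"
    for i in range(len(dims) - 1):
        outputs = rnn_cell.LSTM(
            model,
            layer_input,
            "seq_lengths",
            ["hidden_init_{}".format(i), "cell_init_{}".format(i)],
            dim_in=dims[i],
            dim_out=dims[i + 1],
            scope="layer_{}".format(i),
        )
        # Assumed ordering: the first returned blob is the per-step hidden
        # output, which becomes the next layer's input sequence.
        layer_input = outputs[0]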
Reviewed By: urikz
Differential Revision: D4936015
fbshipit-source-id: b1e25f12d985dda582f0c67d9a02508027e5497f
Summary:
This is useful when the data consists of standalone sequences that are not connected to each other by any meaningful context.
Reviewed By: yqwangustc
Differential Revision: D4835164
fbshipit-source-id: f95626acc26acc3eba3bca7efb08ed1dbdb36c83
Summary:
Added a forward_only option to recurrent_net and the RNNCells. If it is set, the backward_step_net is not passed to the operator.
When backward_step_net is not available, the operator knows it is in forward-only mode and does not create a workspace for each step, but instead cycles through a single private workspace.
Note: we could avoid doing a lot of work in the recurrent.py:recurrent_network call when the backward step is not needed, but doing that nicely requires more refactoring than I wanted to do now. Thus, we still create the backward step nets etc., but just don't pass them to the op.
This can be used to create more efficient inference models. You can also sanitize existing inference nets and remove the backward_step_net argument to get the benefits.
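For example, an inference-only net could be built like this (assuming the LSTM() helper passes the flag through to the cell and recurrent_net, as described above; blob names are illustrative):

    from caffe2.python import model_helper, rnn_cell

    model = model_helper.ModelHelper(name="lstm_inference")
    rnn_cell.LSTM(
        model,
        "input_sequence",
        "seq_lengths",
        ["hidden_init", "cell_init"],
        dim_in=64,
        dim_out=128,
        scope="inference_lstm",
        # With forward_only=True, no backward_step_net is attached and the op
        # cycles through a single private workspace instead of one per step.
        forward_only=True,
    )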
Reviewed By: salexspb
Differential Revision: D4916482
fbshipit-source-id: c99b93c9cb897c32b0f449253f7f6d6a942618ad
Summary: CuDNN LSTM weights were incorrectly sized for layers > 0: there was an assumption that the input size of the middle layers is the same as for the first layer, but a middle layer actually gets its input from the layer below, whose dimension equals the output (hidden) dimension. This worked fine when input_dim and hidden_dim were equal, as they are in the default params for lstm_benchmark.
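A worked example of the sizing logic (plain arithmetic, biases omitted; the numbers are illustrative):

    # Only layer 0 sees the raw input; every layer above it takes the previous
    # layer's hidden output as input, so its input-to-hidden weights must be
    # hidden_dim wide, not input_dim wide.
    input_dim, hidden_dim, num_gates = 40, 100, 4  # 4 gates per LSTM unit

    def lstm_layer_weight_count(layer_input_dim, hidden_dim):
        # input->hidden plus hidden->hidden weights, for all gates
        return num_gates * hidden_dim * (layer_input_dim + hidden_dim)

    layer0_weights = lstm_layer_weight_count(input_dim, hidden_dim)   # uses input_dim
    layer1_weights = lstm_layer_weight_count(hidden_dim, hidden_dim)  # uses hidden_dim
    assert layer0_weights != layer1_weights  # differs whenever input_dim != hidden_dim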
Reviewed By: salexspb
Differential Revision: D4922824
fbshipit-source-id: 3ed05529dcb0a4e66ad440084a55df1c5932fd33
Summary: This is a nice way to reuse RNN layers for training and for inference.
Reviewed By: salexspb
Differential Revision: D4825894
fbshipit-source-id: 779c69758cee8caca6f36bc507e3ea0566f7652a