Commit Graph

10 Commits

Author SHA1 Message Date
Yury Zemlyanskiy
ae924be3ac Removing extra Reshapes in MILSTM with new broadcasted ops
Summary: D4873222 introduced SumReduceLike and removed the use_grad_hack ... hack. Remove unnecessary reshapes and kill use_grad_hack parameters.

Reviewed By: jamesr66a

Differential Revision: D4894243

fbshipit-source-id: c4f3f84abf95572d436b58bbdc2b18b21583c2f1
2017-05-09 14:11:04 -07:00
Yury Zemlyanskiy
11052d03aa RNNCell API change: returns states and outputs
Summary:
Incorporating a definition of a cell's output and illustrating its usage by adding dropout to all types of cells.

I think that we should try to get rid of aliases in RecurrentNetwork, so the output of applied_over_sequence is also always (state_1_all, state_2_all, ...). This way we can merge get_output_from_single_step, get_output_from_sequence and get_outputs_with_grads into a single method.

Let me know what you think!

Reviewed By: jhcross

Differential Revision: D4992913

fbshipit-source-id: 737939be336ad145f84e8733cd255d4f7188ef70
2017-05-08 15:19:48 -07:00
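
As an editorial illustration of the API change above (not the actual caffe2.python.rnn_cell code; the class and method names here are hypothetical), a minimal sketch of a cell that returns its new states plus a separately defined output, with dropout applied only to that output:

    import numpy as np

    class SketchLSTMCell:
        """Hypothetical cell: apply() returns the new states plus an output
        defined by the cell itself; dropout touches only the output."""

        def __init__(self, dropout_ratio=0.0, seed=0):
            self.dropout_ratio = dropout_ratio
            self.rng = np.random.default_rng(seed)

        def apply(self, x, states):
            hidden, cell = states                    # (hidden, cell) state pair
            new_cell = 0.5 * (cell + np.tanh(x))     # placeholder cell update
            new_hidden = np.tanh(new_cell + hidden)  # placeholder recurrence
            new_states = (new_hidden, new_cell)
            return new_states, self.output(new_states)

        def output(self, states):
            hidden, _ = states
            if self.dropout_ratio > 0.0:
                keep = self.rng.random(hidden.shape) >= self.dropout_ratio
                hidden = hidden * keep / (1.0 - self.dropout_ratio)
            return hidden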
James Cross
5c667ebe4e AttentionCell
Summary:
This diff creates a generalized AttentionCell class, which will allow us to construct attention decoders out of arbitrary RNNCell components (with a particular view to using stacked, multi-layer RNNs).

In order to do this, we introduce a new optional input for RNNCell._apply which allows us to provide an additional input that is not processed by prepare_input(). Note that this is an argument only to _apply, not apply, since it is only meant to be used for additional recurrent connections to "embedded" cells, not for standalone RNNs.

Reviewed By: urikz

Differential Revision: D4998465

fbshipit-source-id: 473009ea4917e86e365f9d23aa2f11a46a94fd65
2017-05-05 12:33:01 -07:00
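
A rough sketch of the _apply extension described above; this is not the real Caffe2 class, and the extra_inputs name and signatures are assumptions. The point is that apply() still runs prepare_input() on the ordinary input, while an additional recurrent input reaches only _apply() and bypasses that preprocessing:

    class SketchRNNCell:
        """Hypothetical base class showing the apply/_apply split."""

        def prepare_input(self, x):
            return x                 # e.g. an input projection; identity here

        def apply(self, x, states):
            # Public entry point for standalone RNNs: no extra input here.
            return self._apply(self.prepare_input(x), states)

        def _apply(self, x, states, extra_inputs=None):
            raise NotImplementedError


    class SketchAttentionCell(SketchRNNCell):
        """Wraps an arbitrary decoder cell and feeds the attention context
        to it through the extra_inputs path, bypassing prepare_input()."""

        def __init__(self, decoder_cell, attention_fn):
            self.decoder_cell = decoder_cell
            self.attention_fn = attention_fn   # maps decoder output -> context

        def _apply(self, x, states, extra_inputs=None):
            decoder_states, prev_context = states[:-1], states[-1]
            new_decoder_states, decoder_out = self.decoder_cell._apply(
                x, decoder_states, extra_inputs=prev_context)
            context = self.attention_fn(decoder_out)
            return (*new_decoder_states, context), decoder_out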
Aapo Kyrola
d312dcc881 lstm_benchmark use rnn_cell.LSTM multicell + assertion
Summary:
Use the rnn_cell multi-cell for the LSTM benchmark. While doing this, I had not changed the initial_states and got an inconsistent result from rnn_cell, so I added an assertion to check that the number of initial states is 2 * the number of layers.

+ fix division by zero error

Reviewed By: salexspb

Differential Revision: D5003177

fbshipit-source-id: a8250b825394c352428a0f067098dfcd7516ab2a
2017-05-04 17:02:32 -07:00
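
The assertion mentioned above boils down to a length check; a sketch with assumed variable names, not the benchmark's actual code:

    # Each LSTM layer carries two recurrent states (hidden and cell), so a
    # stacked LSTM with N layers expects 2 * N initial state blobs.
    num_layers = 3
    initial_states = ["h_0", "c_0", "h_1", "c_1", "h_2", "c_2"]

    assert len(initial_states) == 2 * num_layers, (
        "Expected %d initial states (hidden + cell per layer), got %d"
        % (2 * num_layers, len(initial_states)))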
James Cross
ddc4d101ad MultiRNNCell (Caffe2)
Summary: Add Python support for arbitrary (unidirectional) recurrent networks with the MultiRNNCell abstraction. Since the combined step net for all layers is created at one time (in the _apply method), this may be optimizable as-is. The LSTM() function is extended to accept a list of unit counts for the dim_out argument, producing a multi-layer LSTM in that case.

Reviewed By: salexspb

Differential Revision: D4965001

fbshipit-source-id: 39c069468d5b40bf803503cf62046a479ca83cbb
2017-05-03 10:02:31 -07:00
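
A possible usage sketch of the extended LSTM() helper described above; the exact rnn_cell.LSTM argument names and defaults are assumptions here, so treat this as illustrative rather than authoritative:

    from caffe2.python import model_helper, rnn_cell

    model = model_helper.ModelHelper(name="stacked_lstm_sketch")

    # Passing a list for dim_out is what (per the summary) yields a
    # multi-layer LSTM built from the MultiRNNCell abstraction.
    outputs = rnn_cell.LSTM(
        model,
        input_blob="input_sequence",   # assumed blob names
        seq_lengths="seq_lengths",
        initial_states=None,
        dim_in=64,
        dim_out=[128, 128],            # two stacked layers of 128 units
        scope="stacked_lstm",
    )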
Alexander Sidorov
bf50599c70 Layered LSTM (naive version)
Summary:
This is a naive layering approach until we have a better
one. It could be C++-based and support diagonal execution. I am not integrating it into the main LSTM API yet, as this might be revised a bit. I would like to land it so we can compare against the current implementation in the benchmark and also use it as an example of how LSTMs could be combined (as some folks are doing similar things with some variations).

Later we can make LSTM() support the API of layered_LSTM() and also change it under the hood so that it stacks cells into a bigger cell instead. That way, if we make the RNN op use a kind of DAG net, the RNN op can provide more parallelism in stacked cells.

Reviewed By: urikz

Differential Revision: D4936015

fbshipit-source-id: b1e25f12d985dda582f0c67d9a02508027e5497f
2017-04-27 19:16:58 -07:00
Alexander Sidorov
ad6204eb0b LSTM: support dropping hidden / cell states when sequence
Summary:
This is useful when the data has standalone sequences that are
not connected to each other by any meaningful context.

Reviewed By: yqwangustc

Differential Revision: D4835164

fbshipit-source-id: f95626acc26acc3eba3bca7efb08ed1dbdb36c83
2017-04-27 11:47:29 -07:00
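
A conceptual, non-Caffe2 sketch of what dropping the hidden/cell states between standalone sequences means: whenever a batch entry starts a new, unrelated sequence, its carried-over states are zeroed instead of reused (names and shapes here are illustrative only):

    import numpy as np

    def drop_states(states, seq_continues):
        """Zero the recurrent states for batch entries that begin a new,
        unrelated sequence (seq_continues == 0); keep them otherwise."""
        mask = seq_continues[:, None]            # shape (batch, 1)
        return [s * mask for s in states]

    hidden = np.ones((4, 8))
    cell = np.ones((4, 8))
    seq_continues = np.array([1.0, 1.0, 0.0, 1.0])  # 3rd entry starts fresh
    hidden, cell = drop_states([hidden, cell], seq_continues)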
Aapo Kyrola
9cb901caf0 Forward-only rnns
Summary:
Added a forward_only option to recurrent_net and the RNNCells. If this is set, the backward_step_net is not passed to the operator.
When backward_step_net is not available, the operator knows it is in forward_only mode and does not create a workspace for each step, but instead cycles through a single private workspace.

Note: we could avoid doing a lot of work in the recurrent.py:recurrent_network call when the backward step is not needed, but doing that nicely requires more refactoring than I wanted to do now. Thus, we still create the backward step nets etc., but just don't pass them to the op.

This can be used to create more efficient inference models. You can also sanitize existing inference nets and remove the backward_step_net argument to
get the benefits.

Reviewed By: salexspb

Differential Revision: D4916482

fbshipit-source-id: c99b93c9cb897c32b0f449253f7f6d6a942618ad
2017-04-24 15:52:27 -07:00
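
The "sanitize existing inference nets" idea above can be sketched as a small protobuf pass that strips the backward_step_net argument from RecurrentNetwork ops; the helper below is an assumption-laden illustration, not code from the diff:

    from caffe2.proto import caffe2_pb2

    def strip_backward_step_net(net_proto):
        """Remove the backward_step_net argument from RecurrentNetwork ops so
        that the operator runs in forward-only mode (single private workspace)."""
        for op in net_proto.op:
            if op.type != "RecurrentNetwork":
                continue
            kept = []
            for arg in op.arg:
                if arg.name == "backward_step_net":
                    continue
                copied = caffe2_pb2.Argument()
                copied.CopyFrom(arg)
                kept.append(copied)
            del op.arg[:]
            op.arg.extend(kept)
        return net_proto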
Aapo Kyrola
6ed36c37e6 fix CUDNN layer weight size calculation for multiple layers
Summary: CuDNN LSTM weights were incorrectly sized for layers > 0: there was an assumption that the input size for the middle layers is the same as for the first layer, but actually a middle layer gets its input from the layer below, whose dimension equals the output dimension (the hidden dimension). This worked fine when input_dim and hidden_dim were equal, as they are in the default params for lstm_benchmark.

Reviewed By: salexspb

Differential Revision: D4922824

fbshipit-source-id: 3ed05529dcb0a4e66ad440084a55df1c5932fd33
2017-04-20 15:02:48 -07:00
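
The sizing bug reduces to simple arithmetic; a simplified sketch (the precise CuDNN parameter layout is more involved, so take the bias handling here as an approximation): layer 0 consumes the input dimension, while every deeper layer consumes the hidden dimension of the layer below.

    def lstm_layer_param_count(layer, input_dim, hidden_dim):
        """Approximate per-layer LSTM parameter count with 4 gates and two
        bias vectors per gate (roughly how CuDNN lays them out)."""
        layer_input_dim = input_dim if layer == 0 else hidden_dim  # the fix
        gates = 4
        weights = gates * hidden_dim * (layer_input_dim + hidden_dim)
        biases = gates * 2 * hidden_dim
        return weights + biases

    # With input_dim != hidden_dim (as below), layer 0 and the deeper layers
    # get different sizes; when the two are equal, the wrong and the corrected
    # sizes coincide, which is why lstm_benchmark's default params hid the bug.
    print([lstm_layer_param_count(l, input_dim=40, hidden_dim=80) for l in range(3)])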
Yury Zemlyanskiy
4bf559eddb RNNCell, LSTMCell, LSTMWithAttentionCell
Summary: This is a nice way to re-use RNN layers for training and for inference.

Reviewed By: salexspb

Differential Revision: D4825894

fbshipit-source-id: 779c69758cee8caca6f36bc507e3ea0566f7652a
2017-04-18 00:47:20 -07:00