Commit Graph

10 Commits

Author SHA1 Message Date
Yury Zemlyanskiy
ae924be3ac Removing extra Reshapes in MILSTM with new broadcasted ops
Summary: D4873222 introduced SumReduceLike and removed the use_grad_hack ... hack. Remove unnecessary reshapes and kill use_grad_hack parameters.

Reviewed By: jamesr66a

Differential Revision: D4894243

fbshipit-source-id: c4f3f84abf95572d436b58bbdc2b18b21583c2f1
2017-05-09 14:11:04 -07:00
Yury Zemlyanskiy
11052d03aa RNNCell API change: returns states and outputs
Summary:
Incorporating a definition of a cell's output and illustrating its usage by adding dropout to all types of cells.

I think that we should try to get rid of aliases in RecurrentNetwork, so the output of applied_over_sequence is also always (state_1_all, state_2_all, ...). This way we can merge get_output_from_single_step, get_output_from_sequence and get_outputs_with_grads into a single method.

Let me know what you think!

Reviewed By: jhcross

Differential Revision: D4992913

fbshipit-source-id: 737939be336ad145f84e8733cd255d4f7188ef70
2017-05-08 15:19:48 -07:00
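
As an editorial illustration of the API change above (not the actual caffe2.python.rnn_cell code; the class and method names here are hypothetical), a minimal sketch of a cell that returns its new states plus a separately defined output, with dropout applied only to that output:

    import numpy as np

    class SketchLSTMCell:
        """Hypothetical cell: apply() returns the new states plus an output
        defined by the cell itself; dropout touches only the output."""

        def __init__(self, dropout_ratio=0.0, seed=0):
            self.dropout_ratio = dropout_ratio
            self.rng = np.random.default_rng(seed)

        def apply(self, x, states):
            hidden, cell = states                    # (hidden, cell) state pair
            new_cell = 0.5 * (cell + np.tanh(x))     # placeholder cell update
            new_hidden = np.tanh(new_cell + hidden)  # placeholder recurrence
            new_states = (new_hidden, new_cell)
            return new_states, self.output(new_states)

        def output(self, states):
            hidden, _ = states
            if self.dropout_ratio > 0.0:
                keep = self.rng.random(hidden.shape) >= self.dropout_ratio
                hidden = hidden * keep / (1.0 - self.dropout_ratio)
            return hidden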
James Cross
5c667ebe4e AttentionCell
Summary:
This diff creates a generalized AttentionCell class, which will allow us to construct attention decoders out of arbitrary RNNCell components (with a particular view to using stacked, multi-layer RNNs).

In order to do this, we introduce a new optional input for RNNCell._apply which allows us to provide an additional input that is not processed by prepare_input(). Note that this is an argument only to _apply, not apply, since it is only meant to be used for additional recurrent connections to "embedded" cells, not for standalone RNNs.

Reviewed By: urikz

Differential Revision: D4998465

fbshipit-source-id: 473009ea4917e86e365f9d23aa2f11a46a94fd65
2017-05-05 12:33:01 -07:00
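
A rough sketch of the _apply extension described above; this is not the real Caffe2 class, and the extra_inputs name and signatures are assumptions. The point is that apply() still runs prepare_input() on the ordinary input, while an additional recurrent input reaches only _apply() and bypasses that preprocessing:

    class SketchRNNCell:
        """Hypothetical base class showing the apply/_apply split."""

        def prepare_input(self, x):
            return x                 # e.g. an input projection; identity here

        def apply(self, x, states):
            # Public entry point for standalone RNNs: no extra input here.
            return self._apply(self.prepare_input(x), states)

        def _apply(self, x, states, extra_inputs=None):
            raise NotImplementedError


    class SketchAttentionCell(SketchRNNCell):
        """Wraps an arbitrary decoder cell and feeds the attention context
        to it through the extra_inputs path, bypassing prepare_input()."""

        def __init__(self, decoder_cell, attention_fn):
            self.decoder_cell = decoder_cell
            self.attention_fn = attention_fn   # maps decoder output -> context

        def _apply(self, x, states, extra_inputs=None):
            decoder_states, prev_context = states[:-1], states[-1]
            new_decoder_states, decoder_out = self.decoder_cell._apply(
                x, decoder_states, extra_inputs=prev_context)
            context = self.attention_fn(decoder_out)
            return (*new_decoder_states, context), decoder_out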
Aapo Kyrola
d312dcc881 lstm_benchmark use rnn_cell.LSTM multicell + assertion
Summary:
Use the rnn_cell multi-cell for the LSTM benchmark. While doing this, I had not changed the initial_states and got an inconsistent result from rnn_cell, so I added an assertion to check that the number of initial states is 2 * the number of layers.

+ fix division by zero error

Reviewed By: salexspb

Differential Revision: D5003177

fbshipit-source-id: a8250b825394c352428a0f067098dfcd7516ab2a
2017-05-04 17:02:32 -07:00
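
The assertion mentioned above boils down to a length check; a sketch with assumed variable names, not the benchmark's actual code:

    # Each LSTM layer carries two recurrent states (hidden and cell), so a
    # stacked LSTM with N layers expects 2 * N initial state blobs.
    num_layers = 3
    initial_states = ["h_0", "c_0", "h_1", "c_1", "h_2", "c_2"]

    assert len(initial_states) == 2 * num_layers, (
        "Expected %d initial states (hidden + cell per layer), got %d"
        % (2 * num_layers, len(initial_states)))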
James Cross
ddc4d101ad MultiRNNCell (Caffe2)
Summary: Add Python support for arbitrary (unidirectional) recurrent networks with the MultiRNNCell abstraction. Since the combined step net for all layers is created at one time (in the _apply method), this may be optimizable as-is. The LSTM() function is extended to accept a list of unit counts for the dim_out argument, producing a multi-layer LSTM in that case.

Reviewed By: salexspb

Differential Revision: D4965001

fbshipit-source-id: 39c069468d5b40bf803503cf62046a479ca83cbb
2017-05-03 10:02:31 -07:00
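
A possible usage sketch of the extended LSTM() helper described above; the exact rnn_cell.LSTM argument names and defaults are assumptions here, so treat this as illustrative rather than authoritative:

    from caffe2.python import model_helper, rnn_cell

    model = model_helper.ModelHelper(name="stacked_lstm_sketch")

    # Passing a list for dim_out is what (per the summary) yields a
    # multi-layer LSTM built from the MultiRNNCell abstraction.
    outputs = rnn_cell.LSTM(
        model,
        input_blob="input_sequence",   # assumed blob names
        seq_lengths="seq_lengths",
        initial_states=None,
        dim_in=64,
        dim_out=[128, 128],            # two stacked layers of 128 units
        scope="stacked_lstm",
    )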
Alexander Sidorov
bf50599c70 Layered LSTM (naive version)
Summary:
This is a naive layering approach until we have a better
one. It could be C++-based and support diagonal execution. I am not integrating it into the main LSTM API yet, as this might be revised a bit. I would like to land it so we can compare against the current implementation in the benchmark and also use it as an example of how LSTMs could be combined (as some folks are doing similar things with some variations).

Later we can make LSTM() support the API of layered_LSTM() and also change it under the hood so that it stacks cells into a bigger cell instead. That way, if we make the RNN op use a kind of DAG net, the RNN op can provide more parallelism in stacked cells.

Reviewed By: urikz

Differential Revision: D4936015

fbshipit-source-id: b1e25f12d985dda582f0c67d9a02508027e5497f
2017-04-27 19:16:58 -07:00
Alexander Sidorov
ad6204eb0b LSTM: support dropping hidden / cell states when sequence
Summary:
This is useful when the data has standalone sequences that are
not connected to each other by any meaningful context.

Reviewed By: yqwangustc

Differential Revision: D4835164

fbshipit-source-id: f95626acc26acc3eba3bca7efb08ed1dbdb36c83
2017-04-27 11:47:29 -07:00
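
A conceptual, non-Caffe2 sketch of what dropping the hidden/cell states between standalone sequences means: whenever a batch entry starts a new, unrelated sequence, its carried-over states are zeroed instead of reused (names and shapes here are illustrative only):

    import numpy as np

    def drop_states(states, seq_continues):
        """Zero the recurrent states for batch entries that begin a new,
        unrelated sequence (seq_continues == 0); keep them otherwise."""
        mask = seq_continues[:, None]            # shape (batch, 1)
        return [s * mask for s in states]

    hidden = np.ones((4, 8))
    cell = np.ones((4, 8))
    seq_continues = np.array([1.0, 1.0, 0.0, 1.0])  # 3rd entry starts fresh
    hidden, cell = drop_states([hidden, cell], seq_continues)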
Aapo Kyrola
9cb901caf0 Forward-only rnns
Summary:
Added a forward_only option to recurrent_net and the RNNCells. If this is set, the backward_step_net is not passed to the operator.
When backward_step_net is not available, the operator knows it is in forward_only mode and does not create a workspace for each step, but instead cycles through a single private workspace.

Note: we could avoid doing a lot of work in the recurrent.py:recurrent_network call when the backward step is not needed, but doing that nicely requires more refactoring than I wanted to do now. Thus, we still create the backward step nets etc., but just don't pass them to the op.

This can be used to create more efficient inference models. You can also sanitize existing inference nets and remove the backward_step_net argument to
get the benefits.

Reviewed By: salexspb

Differential Revision: D4916482

fbshipit-source-id: c99b93c9cb897c32b0f449253f7f6d6a942618ad
2017-04-24 15:52:27 -07:00
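
The "sanitize existing inference nets" idea above can be sketched as a small protobuf pass that strips the backward_step_net argument from RecurrentNetwork ops; the helper below is an assumption-laden illustration, not code from the diff:

    from caffe2.proto import caffe2_pb2

    def strip_backward_step_net(net_proto):
        """Remove the backward_step_net argument from RecurrentNetwork ops so
        that the operator runs in forward-only mode (single private workspace)."""
        for op in net_proto.op:
            if op.type != "RecurrentNetwork":
                continue
            kept = []
            for arg in op.arg:
                if arg.name == "backward_step_net":
                    continue
                copied = caffe2_pb2.Argument()
                copied.CopyFrom(arg)
                kept.append(copied)
            del op.arg[:]
            op.arg.extend(kept)
        return net_proto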
Aapo Kyrola
6ed36c37e6 fix CUDNN layer weight size calculation for multiple layers
Summary: CuDNN LSTM weights were incorrectly sized for layers > 0: there was an assumption that the input size for the middle layers is the same as for the first layer, but actually a middle layer gets its input from the layer below, whose dimension equals the output dimension (the hidden dimension). This worked fine when input_dim and hidden_dim were equal, as they are in the default params for lstm_benchmark.

Reviewed By: salexspb

Differential Revision: D4922824

fbshipit-source-id: 3ed05529dcb0a4e66ad440084a55df1c5932fd33
2017-04-20 15:02:48 -07:00
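
The sizing bug reduces to simple arithmetic; a simplified sketch (the precise CuDNN parameter layout is more involved, so take the bias handling here as an approximation): layer 0 consumes the input dimension, while every deeper layer consumes the hidden dimension of the layer below.

    def lstm_layer_param_count(layer, input_dim, hidden_dim):
        """Approximate per-layer LSTM parameter count with 4 gates and two
        bias vectors per gate (roughly how CuDNN lays them out)."""
        layer_input_dim = input_dim if layer == 0 else hidden_dim  # the fix
        gates = 4
        weights = gates * hidden_dim * (layer_input_dim + hidden_dim)
        biases = gates * 2 * hidden_dim
        return weights + biases

    # With input_dim != hidden_dim (as below), layer 0 and the deeper layers
    # get different sizes; when the two are equal, the wrong and the corrected
    # sizes coincide, which is why lstm_benchmark's default params hid the bug.
    print([lstm_layer_param_count(l, input_dim=40, hidden_dim=80) for l in range(3)])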
Yury Zemlyanskiy
4bf559eddb RNNCell, LSTMCell, LSTMWithAttentionCell
Summary: This is a nice way to re-use RNN layers for training and for inference.

Reviewed By: salexspb

Differential Revision: D4825894

fbshipit-source-id: 779c69758cee8caca6f36bc507e3ea0566f7652a
2017-04-18 00:47:20 -07:00