Commit Graph

15 Commits

Author SHA1 Message Date
James Reed
10d95bd0f0 Remove batch_size parameter from attention and LSTMWithAttention interfaces
Summary: Reshape based on tensor shapes in the graph rather than based on a passed-in batch_size parameter

Reviewed By: urikz

Differential Revision: D4702086

fbshipit-source-id: c4c1d8425cd36c1e86695918eaba2667c27e9601
2017-03-16 11:47:52 -07:00
Aapo Kyrola
26628d10ff Fix workspace clashes
Summary: For example, test and train nets could have shared workspaces, leading to race condition. This adds an assertion and adds a running counter to the workspace-blob name.

Reviewed By: jhcross

Differential Revision: D4712152

fbshipit-source-id: 808d7069095bac24ebfe0c9d31ebd134f4cf0956
2017-03-14 23:33:28 -07:00
Aapo Kyrola
91f468b15c fixes to make data parallel model work for RecurrentNet + test case
Summary:
First, this diff includes a full test of data-parallel LSTM, which confirms it works correctly. To make it work, some changes had to be made:
 - cell net/step net external inputs must be namespace scoped
 - prevent double-namescoping of cellnet inputs
 - make data parallel model understand recurrentnets so the device-mapping works

Reviewed By: salexspb

Differential Revision: D4708840

fbshipit-source-id: 4b0ddc43642d449076a2b6f67ad1c47f84138ff4
2017-03-14 15:48:07 -07:00
Karthik Prasad
965a7daf9b Implement MILSTM in caffe2
Summary:
Created a new function with specifics related to MI LSTM implementation in caffe2
See https://arxiv.org/pdf/1606.06630.pdf for details.
See D4478877 for the implementation of the same in tensorflow

Reviewed By: jhcross

Differential Revision: D4669882

fbshipit-source-id: 095bbcf187dbdac2cd79558ff0c8f9f67d8af639
2017-03-09 16:32:47 -08:00
James Reed
8de1db9eb6 Implement recurrent attention in C2
Summary: Super rough implementation of recurrent attention. Planning to factor out the common code between the two functions as well as train and eval. I want to get this out and get eyes on it sooner rather than later

Differential Revision: D4647837

fbshipit-source-id: 54bc4e8ed0df6f04c86c425926decbe89f73b068
2017-03-08 11:21:28 -08:00
Yury Zemlyanskiy
4a53ab3cb6 LSTMWithAttention implementation in Caffe2
Summary:
Implementation of ##LSTMWithAttention##

Still TBD:
1. There are problems with back propagation, because gradient is not implemented for ops with broadcasting
2. I need to make initial_recurrent_state to be of shape [dim] rather than [1, batch_size, dim], so one doesn't need to provide batch_size to LSTMWithAttention

Differential Revision: D4298735

fbshipit-source-id: 8903fcff4d6a66647ee6d45a6ef28803fc3091e5
2017-02-23 04:08:34 -08:00
James Cross
b436788b16 LSTMUnit: pass through H values
Summary:
Pass through the h-value recurrent output unchanged at each LSTM step beyond the valid part of a sequence (computed based on seqLengths, allowing batching of sequences of different length). This enables using the final-step output of each sequence as the output when one vector is desired for the entire sequence. Gradient also passed back unchanged.

Also made some cosmetic changes to recurrent_network_test.py (seq_lengths offset corrected, should be in [1, T] rather than [0, T-1]).

Reviewed By: urikz

Differential Revision: D4540307

fbshipit-source-id: 73a9f6326069d713dcb0cdc8d17869317c6dbe96
2017-02-16 15:31:38 -08:00
James Cross
63901e9aca allow recurrent network gradient op to receive gradient on any combination of network output blobs
Summary:
(Caffe2) Modified RecurrentNetworkGradient operator so that training is possible with any of the output blob(s) receiving gradient during the backward pass. This is realized through a new argument for the RecurrentNetwork op, outputs_with_grads, which takes a list of the indices of the output blobs which will receive gradient. The default case (only receiving gradient from the first output blob) remains the default.

New unit test covers the case where outputs_with_grads = [1, 2] using Python LSTM wrapper.

Reviewed By: urikz

Differential Revision: D4518516

fbshipit-source-id: 5c531582b20f3cf727d1aa91239b4d5a2b8a7c1f
2017-02-15 16:00:45 -08:00
James Cross
93795406c5 Adapt NLU proj code for Caffe2 RecurrentNetworkOp changes
Summary: Updates function revise_recurrent_network_op() which supports cloning recurrent networks by adding a blob-name prefix to string arguments to maintain correspondence. Previously relied on many hard-coded indices referring to the positions of arguments and inputs of RecurrentNetworkOp and its corresponding gradient operator, and therefore broke when the implementation changed. This fix should make it more general and robust

Differential Revision: D4559768

fbshipit-source-id: fb85b0b1ffb1393dc84760d6ae5dc473e8b764b0
2017-02-15 16:00:44 -08:00
Alexander Sidorov
b7fa6b2a8b remove recurrent_inputs in a favor of recurrent_input_ids
Summary:
I have forgotten to remove this one. The rest of indexing
instead of string names is comming after  D4446813 lands as scratches
aren't inputs or outputs and thus can't be indexed.

Reviewed By: urikz

Differential Revision: D4465748

fbshipit-source-id: 2ccbedfb35541ef4a2231d1480eef59025bd5290
2017-01-31 13:14:33 -08:00
Alexander Sidorov
d019ec793c improve fluky test
Summary: On some inputs TestWarden was failing

Reviewed By: Yangqing

Differential Revision: D4487293

fbshipit-source-id: 3da4b310a619c2b57f033b2dd7727f71403bfd68
2017-01-30 22:14:27 -08:00
Yury Zemlyanskiy
22e1bdd6d1 Use stack workspaces in RecurrentNetwork
Summary: This diff use stack workspaces in RecurrentNetwork, which allows to simplify the implementation and get rid of scratches.

Reviewed By: salexspb

Differential Revision: D4446813

fbshipit-source-id: 514eec7e4300bdf492a9cb192b40cf4f89acf656
2017-01-27 11:44:26 -08:00
Yury Zemlyanskiy
0e3146e1e8 Remove recurrent_sizes from RecurrentNetwork
Summary: Remove usage of recurrent_sizes, so recurrent states' sizes can depend on input (in case of attention matrix for beam decoder). I removed recurrent_sizes from forward and backward steps.

Reviewed By: salexspb

Differential Revision: D4427688

fbshipit-source-id: 580420a294d309c86ec5cb4e677058623b7228e1
2017-01-24 23:14:25 -08:00
Alexander Sidorov
b1472a173a don't hardcode outputs order to work only for lstm + don't pass blob names for parameters
Summary:
In this diff I stop passing parameters by name and also remove hardcoded output ids which were there specifically for LSTM to work. It also allows to avoid using recurrent_sizes in the backward pass (for forward this is done in D4427688)

Using similar technic it should be simple enough to eliminate blob name passing at all. Then we can fix scoping. These can be done in a next diff.

Reviewed By: urikz

Differential Revision: D4444614

fbshipit-source-id: 3580a76365502b9f2f09e3d8b7e78084ca739f00
2017-01-24 16:29:23 -08:00
Yury Zemlyanskiy
c2d28fb874 RNNs API simplification
Summary:
This is a first step in improving our RNN story. It provides a wrapper around current RecurrentNetworkOp implementation which infers most of the redundant parameters and makes API much simpler.

Also in order to support general step nets I added an extra argument to the RecurrentNetworkOp.

Future work:

1. Inferring step net output and internal blobs (scratches) sizes and type
2. Avoid accessing blobs by names in c++ part
3. Remove requirement for inputs / output 1:1 correspondence in the step net
4. Make python API support networks with operators like Sum being on the boarder of the Cell net (currently there is an issue with such networks where gradient blobs which are on the side are not explicitly created).

Differential Revision: D4268503

fbshipit-source-id: f8a66491c2b55daa730caeed7e9f2b3921541b49
2016-12-21 09:29:43 -08:00