Commit Graph

14 Commits

Author SHA1 Message Date
James Cross
51033f19d7 unbreak test_seq2seq_caffe2_model_cnn_one_stack_encoder
Summary: Fixes unit test test_seq2seq_caffe2_model_cnn_one_stack_encoder, broken by D4905003. (Also some commas.)

Differential Revision: D4920699

fbshipit-source-id: 2fe501095e3e26a475d666afcae8e48c953f2eef
2017-04-20 10:06:25 -07:00
Aapo Kyrola
1e5140aa76 option to recompute blobs backward pass with massive memory savings
Summary:
This diff adds an option to recurrent_net to define some cell blobs to be recomputed on backward step, and thus they don't need to be stored in the step workspace. This is done by modifying the backward step to automatically include all operators that are needed to produce the output that is to be recomputed, and by storing those blobs in a shared workspace. To enable the shared workspace, i had to modify the stepworkspaces blob to also store a forward shared workspace. Making it a class field won't work since the lifecycle of the blob does not match the lifecycle of the operator.

For basic LSTM, the performance hit is quite modest (about 15% with one setting, but your mileage might vary. For Attention models, I am sure this is beneficial as computing the attention blobs is not expensive.

For basic LSTM, the memory saving is wonderful: each forward workspace only has 4 bytes (for timestep).

I also modified the neural_mt LSTM Cells, but there is no test available, so I am not 100% sure I did it correctly. Please have a look.

Added options to LSTM, MILSTM and LSTMAttention to enable memory mode.

Reviewed By: urikz

Differential Revision: D4853890

fbshipit-source-id: d8d0e0e75a5330d174fbfa39b96d8e4e8c446baa
2017-04-11 13:03:48 -07:00
James Reed
66d00b3a63 Use CUDNN softmax implementation
Summary: The caffe2 implementation of bare Softmax() has a race condition that wipes out the numerical stability trick. Use the CUDNN implementation instead

Reviewed By: urikz

Differential Revision: D4831298

fbshipit-source-id: d11b1de700e3954629e7ed43225a2416c27b3252
2017-04-04 20:02:21 -07:00
Aapo Kyrola
e13e9c1302 cuDNN version of TransposeOp
Summary:
Uses the cudnnTransformTensor function. It works by shuffling the strides according to the transpose axis. Significant speedup over current GPU version .
+ moves the transpose test under utility_ops, because hypothesis_test is too big

Reviewed By: jamesr66a

Differential Revision: D4810993

fbshipit-source-id: 82577c4ced1389e70bd5992820ae4d8297a3817f
2017-04-03 13:33:10 -07:00
Aaron Markham
58f7f2b441 doxygen python block added
Summary: Closes https://github.com/caffe2/caffe2/pull/226

Differential Revision: D4793550

Pulled By: JoelMarcey

fbshipit-source-id: cc33e58186304fa8dcac2ee9115dcc271d785b1e
2017-03-29 06:46:16 -07:00
Deepak Gopinath
422c65ca35 Removing unnecessary Copy after fixing gradients for external parameters
Summary: Apart from copying gradient blobs for inputs with initial_cell_input, we needed to perform a similar operation for external parameters used by the step net

Reviewed By: salexspb

Differential Revision: D4752259

fbshipit-source-id: 13ee48cf583ed86221a4cc1cc9f57f5c3a7d2450
2017-03-23 15:04:22 -07:00
Yury Zemlyanskiy
ea66516d5e Output attention weights from apply_xxx_attention methods
Summary: OSS diff. We need it later for beam decoding.

Differential Revision: D4747785

fbshipit-source-id: ce2d53ee2434216ace3c4ddbd40a9b68e9db7ec5
2017-03-21 19:01:58 -07:00
Yury Zemlyanskiy
93ff338ca7 Beam decoder for NMT in Caffe2
Summary: yolo5

Differential Revision: D4685076

fbshipit-source-id: b5534e441bb453f90e5210294f2dfff6b5c3b5b1
2017-03-20 22:03:59 -07:00
James Reed
33f41c06c0 Remove more instances of batch_size
Summary: D4734505 part 2. Remove more instances of the batch_size parameter

Reviewed By: urikz

Differential Revision: D4736906

fbshipit-source-id: fc9d374e9308017d61c427890364c5ab9cec2edf
2017-03-19 22:31:30 -07:00
James Reed
17da5856ed Remove batch_size parameter from attention and LSTMWithAttention interfaces
Summary: Reshape based on tensor shapes in the graph rather than based on a passed-in batch_size parameter

Reviewed By: urikz

Differential Revision: D4734505

fbshipit-source-id: d9c23d85be84f61124106e752ef2b4f6945e2a07
2017-03-19 18:16:28 -07:00
Yury Zemlyanskiy
d1424c3265 Revert D4702086: Remove batch_size parameter from attention and LSTMWithAttention interfaces
Summary: This reverts commit c4c1d8425cd36c1e86695918eaba2667c27e9601

Differential Revision: D4702086

fbshipit-source-id: 4620610b182bb84b9297b5de32782761ae89d20b
2017-03-17 17:36:47 -07:00
James Reed
10d95bd0f0 Remove batch_size parameter from attention and LSTMWithAttention interfaces
Summary: Reshape based on tensor shapes in the graph rather than based on a passed-in batch_size parameter

Reviewed By: urikz

Differential Revision: D4702086

fbshipit-source-id: c4c1d8425cd36c1e86695918eaba2667c27e9601
2017-03-16 11:47:52 -07:00
James Reed
8de1db9eb6 Implement recurrent attention in C2
Summary: Super rough implementation of recurrent attention. Planning to factor out the common code between the two functions as well as train and eval. I want to get this out and get eyes on it sooner rather than later

Differential Revision: D4647837

fbshipit-source-id: 54bc4e8ed0df6f04c86c425926decbe89f73b068
2017-03-08 11:21:28 -08:00
Yury Zemlyanskiy
4a53ab3cb6 LSTMWithAttention implementation in Caffe2
Summary:
Implementation of ##LSTMWithAttention##

Still TBD:
1. There are problems with back propagation, because gradient is not implemented for ops with broadcasting
2. I need to make initial_recurrent_state to be of shape [dim] rather than [1, batch_size, dim], so one doesn't need to provide batch_size to LSTMWithAttention

Differential Revision: D4298735

fbshipit-source-id: 8903fcff4d6a66647ee6d45a6ef28803fc3091e5
2017-02-23 04:08:34 -08:00