Commit Graph

19 Commits

Author SHA1 Message Date
Alexander Sidorov
016f72537a ModelHelper.create_param, Initializer abstraction and ParameterInfo for optimizers
Summary:
This is going to unblock Nvidia in their work on adding fp16
support to Caffe2. I discussed this with kennyhorror before to make
sure this fits into his work on parameter sharing.
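
A minimal sketch of what the new parameter-creation path might look like; the exact argument names, the `Initializer` import path, and the fill-operator name are assumptions based on this summary rather than on the diff itself.

```python
from caffe2.python.model_helper import ModelHelper
from caffe2.python.modeling.initializers import Initializer  # assumed import path

model = ModelHelper(name="example")

# create_param routes parameter creation through an Initializer object, so the
# fill operator (and eventually its dtype, e.g. fp16) becomes pluggable, and the
# resulting ParameterInfo can be picked up by optimizers.
fc_w = model.create_param(
    param_name="fc_w",
    shape=[256, 128],
    initializer=Initializer("XavierFill"),
)
```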

Reviewed By: kennyhorror

Differential Revision: D5127797

fbshipit-source-id: 4db155d320b1862570c23b77c4252bdacbf2296f
2017-05-25 22:03:15 -07:00
Yiming Wu
a28b01c155 rnn with brew
Summary:
Update rnn_cell.py and the char_rnn.py example with the new `brew` model.

- Deprecate CNNModelHelper
- Replace all helper functions with `brew` helper functions (see the sketch below)
- Use the `model.net.<SingleOp>` format to create bare-bones Operators for better clarity.
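
A hedged sketch of the `brew`-based style referenced above (the layer names and dimensions here are illustrative only): helper functions that create parameters go through `brew`, while bare operators are created directly on `model.net`.

```python
from caffe2.python import brew, model_helper

model = model_helper.ModelHelper(name="char_rnn_example")

# Helper functions (which create and register parameters) go through brew:
fc1 = brew.fc(model, "data", "fc1", dim_in=128, dim_out=256)

# Bare-bones, parameter-free operators are created directly on model.net:
relu1 = model.net.Relu(fc1, "relu1")
```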

Reviewed By: salexspb

Differential Revision: D5062963

fbshipit-source-id: 254f7b9059a29621027d2b09e932f3f81db2e0ce
2017-05-16 13:33:44 -07:00
Yury Zemlyanskiy
48de1ea165 Drop extra Reshape in attention calculation
Summary: We can avoid this extra Reshape.

Reviewed By: jamesr66a

Differential Revision: D5032874

fbshipit-source-id: 92bd568bc6bec53d7f81a64cfa96d2c610823f8c
2017-05-09 17:16:36 -07:00
Yury Zemlyanskiy
b6a8dd1438 don't recompute small blob in attention
Summary: decoder_hidden_encoder_outputs_sum_tmp is tiny after D5010109, so there is no need to recompute it.

Reviewed By: akyrola

Differential Revision: D5014335

fbshipit-source-id: cc9e8f91372889d10bd99c79366018cb3943a435
2017-05-08 13:06:06 -07:00
Yury Zemlyanskiy
d7f20c94fd Optimize memory for RNN attention
Summary:
The fix should save us (source_len - 1) * target_len * batch_size * encoder_output_size * 4 bytes for the forward pass. With typical values this is 100 * 100 * 128 * 512 * 4 bytes ≈ 2.4 GB.
Not entirely sure about the backward pass.

Reviewed By: akyrola

Differential Revision: D5010109

fbshipit-source-id: 2ed68f3ebfd3b8362916d24af991482f1686e064
2017-05-05 12:18:50 -07:00
James Cross
51033f19d7 unbreak test_seq2seq_caffe2_model_cnn_one_stack_encoder
Summary: Fixes unit test test_seq2seq_caffe2_model_cnn_one_stack_encoder, broken by D4905003. (Also fixes some commas.)

Differential Revision: D4920699

fbshipit-source-id: 2fe501095e3e26a475d666afcae8e48c953f2eef
2017-04-20 10:06:25 -07:00
Aapo Kyrola
1e5140aa76 option to recompute blobs backward pass with massive memory savings
Summary:
This diff adds an option to recurrent_net to mark some cell blobs to be recomputed on the backward step, so they don't need to be stored in the step workspace. This is done by modifying the backward step to automatically include all operators needed to produce the output that is to be recomputed, and by storing those blobs in a shared workspace. To enable the shared workspace, I had to modify the stepworkspaces blob to also store a forward shared workspace. Making it a class field won't work since the lifecycle of the blob does not match the lifecycle of the operator.

For basic LSTM, the performance hit is quite modest (about 15% with one setting, but your mileage might vary). For attention models, I am sure this is beneficial, as computing the attention blobs is not expensive.

For basic LSTM, the memory saving is wonderful: each forward workspace only holds 4 bytes (for the timestep).

I also modified the neural_mt LSTM Cells, but there is no test available, so I am not 100% sure I did it correctly. Please have a look.

Added options to LSTM, MILSTM and LSTMAttention to enable memory mode.
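
A hedged sketch of what enabling the memory mode could look like when building an LSTM; the keyword shown here (`memory_optimization`), the blob names, and the return value are assumptions about the rnn_cell API, not verified against this diff.

```python
from caffe2.python import model_helper, rnn_cell

model = model_helper.ModelHelper(name="lstm_mem")

# Blob names and dimensions are placeholders for illustration.
lstm_outputs = rnn_cell.LSTM(
    model,
    input_blob="input",                    # (T, N, dim_in) input sequence
    seq_lengths="seq_lengths",
    initial_states=("h_init", "c_init"),
    dim_in=64,
    dim_out=128,
    scope="lstm",
    memory_optimization=True,  # assumed flag: recompute cell blobs on the backward step
)
```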

Reviewed By: urikz

Differential Revision: D4853890

fbshipit-source-id: d8d0e0e75a5330d174fbfa39b96d8e4e8c446baa
2017-04-11 13:03:48 -07:00
James Reed
66d00b3a63 Use CUDNN softmax implementation
Summary: The Caffe2 implementation of bare Softmax() has a race condition that wipes out the numerical stability trick. Use the cuDNN implementation instead.
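
For context, the numerical-stability trick referred to here is the standard max-subtraction before exponentiation; a small NumPy illustration (not Caffe2 code):

```python
import numpy as np

def stable_softmax(x):
    # Subtracting the row-wise max keeps exp() from overflowing on large logits;
    # the result is mathematically identical to exp(x) / sum(exp(x)).
    z = x - x.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

print(stable_softmax(np.array([[1000.0, 1001.0, 1002.0]])))  # no overflow
```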

Reviewed By: urikz

Differential Revision: D4831298

fbshipit-source-id: d11b1de700e3954629e7ed43225a2416c27b3252
2017-04-04 20:02:21 -07:00
Aapo Kyrola
e13e9c1302 cuDNN version of TransposeOp
Summary:
Uses the cudnnTransformTensor function. It works by shuffling the strides according to the transpose axes. Significant speedup over the current GPU version.
+ moves the transpose test under utility_ops, because hypothesis_test is too big
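
A rough NumPy illustration of the stride-shuffling idea (this is not the cuDNN call itself): a transpose of a dense tensor can be described by permuting the per-dimension strides, and cudnnTransformTensor then materializes such a strided description into a contiguous output.

```python
import numpy as np

x = np.arange(24).reshape(2, 3, 4)
axes = (2, 0, 1)  # the transpose permutation

# Permuting shape and strides yields a transposed view of the same buffer;
# the cuDNN path then copies this strided description into contiguous output.
view = np.lib.stride_tricks.as_strided(
    x,
    shape=tuple(x.shape[a] for a in axes),
    strides=tuple(x.strides[a] for a in axes),
)
assert (view == x.transpose(axes)).all()
```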

Reviewed By: jamesr66a

Differential Revision: D4810993

fbshipit-source-id: 82577c4ced1389e70bd5992820ae4d8297a3817f
2017-04-03 13:33:10 -07:00
Aaron Markham
58f7f2b441 doxygen python block added
Summary: Closes https://github.com/caffe2/caffe2/pull/226

Differential Revision: D4793550

Pulled By: JoelMarcey

fbshipit-source-id: cc33e58186304fa8dcac2ee9115dcc271d785b1e
2017-03-29 06:46:16 -07:00
Deepak Gopinath
422c65ca35 Removing unnecessary Copy after fixing gradients for external parameters
Summary: Apart from copying gradient blobs for inputs with initial_cell_input, we needed to perform a similar operation for external parameters used by the step net

Reviewed By: salexspb

Differential Revision: D4752259

fbshipit-source-id: 13ee48cf583ed86221a4cc1cc9f57f5c3a7d2450
2017-03-23 15:04:22 -07:00
Yury Zemlyanskiy
ea66516d5e Output attention weights from apply_xxx_attention methods
Summary: OSS diff. We need it later for beam decoding.

Differential Revision: D4747785

fbshipit-source-id: ce2d53ee2434216ace3c4ddbd40a9b68e9db7ec5
2017-03-21 19:01:58 -07:00
Yury Zemlyanskiy
93ff338ca7 Beam decoder for NMT in Caffe2
Summary: yolo5

Differential Revision: D4685076

fbshipit-source-id: b5534e441bb453f90e5210294f2dfff6b5c3b5b1
2017-03-20 22:03:59 -07:00
James Reed
33f41c06c0 Remove more instances of batch_size
Summary: D4734505 part 2. Remove more instances of the batch_size parameter

Reviewed By: urikz

Differential Revision: D4736906

fbshipit-source-id: fc9d374e9308017d61c427890364c5ab9cec2edf
2017-03-19 22:31:30 -07:00
James Reed
17da5856ed Remove batch_size parameter from attention and LSTMWithAttention interfaces
Summary: Reshape based on tensor shapes in the graph rather than based on a passed-in batch_size parameter
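
A hedged sketch of the shape-driven approach: rather than baking a Python-level batch_size into the target shape, the reshape can rely on the graph itself. Caffe2's Reshape op accepts 0 to copy a dimension from the input and -1 to infer one; whether this is exactly the mechanism used in the diff is an assumption.

```python
from caffe2.python import core

net = core.Net("reshape_example")

# Before: the batch size is hard-coded at graph-construction time, e.g.
#   net.Reshape(["x"], ["x_2d", "x_old_shape"], shape=[batch_size, -1])
# After: 0 copies the corresponding input dimension at runtime, so no
# batch_size parameter needs to be threaded through the Python interface.
net.Reshape(["x"], ["x_2d", "x_old_shape"], shape=[0, -1])
```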

Reviewed By: urikz

Differential Revision: D4734505

fbshipit-source-id: d9c23d85be84f61124106e752ef2b4f6945e2a07
2017-03-19 18:16:28 -07:00
Yury Zemlyanskiy
d1424c3265 Revert D4702086: Remove batch_size parameter from attention and LSTMWithAttention interfaces
Summary: This reverts commit c4c1d8425cd36c1e86695918eaba2667c27e9601

Differential Revision: D4702086

fbshipit-source-id: 4620610b182bb84b9297b5de32782761ae89d20b
2017-03-17 17:36:47 -07:00
James Reed
10d95bd0f0 Remove batch_size parameter from attention and LSTMWithAttention interfaces
Summary: Reshape based on tensor shapes in the graph rather than based on a passed-in batch_size parameter

Reviewed By: urikz

Differential Revision: D4702086

fbshipit-source-id: c4c1d8425cd36c1e86695918eaba2667c27e9601
2017-03-16 11:47:52 -07:00
James Reed
8de1db9eb6 Implement recurrent attention in C2
Summary: Super rough implementation of recurrent attention. Planning to factor out the common code between the two functions, as well as between train and eval. I want to get this out and get eyes on it sooner rather than later.

Differential Revision: D4647837

fbshipit-source-id: 54bc4e8ed0df6f04c86c425926decbe89f73b068
2017-03-08 11:21:28 -08:00
Yury Zemlyanskiy
4a53ab3cb6 LSTMWithAttention implementation in Caffe2
Summary:
Implementation of `LSTMWithAttention`

Still TBD:
1. There are problems with backpropagation, because the gradient is not implemented for ops with broadcasting
2. I need to make initial_recurrent_state be of shape [dim] rather than [1, batch_size, dim], so one doesn't need to provide batch_size to LSTMWithAttention

Differential Revision: D4298735

fbshipit-source-id: 8903fcff4d6a66647ee6d45a6ef28803fc3091e5
2017-02-23 04:08:34 -08:00