pytorch/caffe2/python
Aapo Kyrola 1e5140aa76 option to recompute blobs backward pass with massive memory savings
Summary:
This diff adds an option to recurrent_net to mark some cell blobs to be recomputed on the backward step, so they don't need to be stored in the step workspace. This is done by modifying the backward step to automatically include all operators that are needed to produce the output that is to be recomputed, and by storing those blobs in a shared workspace. To enable the shared workspace, I had to modify the stepworkspaces blob to also store a forward shared workspace. Making it a class field won't work since the lifecycle of the blob does not match the lifecycle of the operator.
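The idea of recomputing cell internals on the backward step can be sketched in plain Python. This is an illustrative toy only, not the recurrent_net implementation: the scalar gated cell, the function names (`cell_internals`, `forward`, `backward`) and the `stored` list standing in for the per-step workspaces are all assumptions made for the example. It shows that dropping the internal blobs on the forward pass and recomputing them from the previous hidden state and the input gives the same gradients.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def cell_internals(params, h_prev, x):
    """Compute the internal blobs (gate g, candidate c) of one step."""
    w_g, u_g, w_c, u_c = params
    g = sigmoid(w_g * h_prev + u_g * x)
    c = math.tanh(w_c * h_prev + u_c * x)
    return g, c


def forward(params, xs, h0, recompute):
    """Run all timesteps; return hidden states and per-step storage."""
    hs = [h0]
    stored = []  # hypothetical stand-in for the per-step workspaces
    for x in xs:
        g, c = cell_internals(params, hs[-1], x)
        hs.append(g * c + (1.0 - g) * hs[-1])
        # In recompute mode the internal blobs are not kept.
        stored.append(None if recompute else (g, c))
    return hs, stored


def backward(params, xs, hs, stored):
    """Gradient of the final hidden state w.r.t. (w_g, u_g, w_c, u_c)."""
    w_g, u_g, w_c, u_c = params
    grads = [0.0, 0.0, 0.0, 0.0]
    dh = 1.0  # the toy loss is simply the last hidden state
    for t in reversed(range(len(xs))):
        h_prev, x = hs[t], xs[t]
        if stored[t] is not None:
            g, c = stored[t]
        else:
            # Re-run the forward operators that produce the missing blobs.
            g, c = cell_internals(params, h_prev, x)
        dzg = dh * (c - h_prev) * g * (1.0 - g)  # through the sigmoid gate
        dzc = dh * g * (1.0 - c * c)             # through the tanh candidate
        grads[0] += dzg * h_prev
        grads[1] += dzg * x
        grads[2] += dzc * h_prev
        grads[3] += dzc * x
        dh = dh * (1.0 - g) + dzg * w_g + dzc * w_c
    return grads


params = (0.5, -0.3, 0.8, 0.2)
xs = [1.0, -2.0, 0.5, 1.5]
hs_a, stored_a = forward(params, xs, 0.1, recompute=False)
hs_b, stored_b = forward(params, xs, 0.1, recompute=True)
grads_store = backward(params, xs, hs_a, stored_a)
grads_recompute = backward(params, xs, hs_b, stored_b)
```

In this sketch the recurrent states `hs` are still kept for every timestep; only the cell-internal values are dropped, at the cost of one extra evaluation of the cell per backward step.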

For basic LSTM, the performance hit is quite modest (about 15% with one setting, but your mileage might vary). For Attention models, I am sure this is beneficial as computing the attention blobs is not expensive.

For basic LSTM, the memory saving is wonderful: each forward workspace only has 4 bytes (for timestep).

I also modified the neural_mt LSTM Cells, but there is no test available, so I am not 100% sure I did it correctly. Please have a look.

Added options to LSTM, MILSTM and LSTMAttention to enable memory mode.

Reviewed By: urikz

Differential Revision: D4853890

fbshipit-source-id: d8d0e0e75a5330d174fbfa39b96d8e4e8c446baa
2017-04-11 13:03:48 -07:00
docs doxygen python block added 2017-03-29 06:46:16 -07:00
examples Configurable CuDNN workspace limit in resnet50_trainer 2017-04-05 10:50:00 -07:00
helpers create helpers package and add dropout 2017-04-07 17:33:49 -07:00
layers create bucket-based calibration - layer 2017-04-11 12:30:26 -07:00
mint doxygen python block added 2017-03-29 06:46:16 -07:00
models Downloader fix 2017-04-07 10:16:58 -07:00
operator_test option to recompute blobs backward pass with massive memory savings 2017-04-11 13:03:48 -07:00
predictor Predictor exporter open-sourcing 2017-04-06 10:01:42 -07:00
_import_c_extension.py doxygen python block added 2017-03-29 06:46:16 -07:00
attention.py option to recompute blobs backward pass with massive memory savings 2017-04-11 13:03:48 -07:00
caffe_translator_test.py Allow test discovery in caffe2/python/ 2017-03-14 18:16:41 -07:00
caffe_translator.py Add Reduction layer in caffe_translator 2017-04-07 16:17:07 -07:00
checkpoint_test.py Skips the initialization phase of the individual checkpoint objects. 2017-03-31 10:10:56 -07:00
checkpoint.py Skips the initialization phase of the individual checkpoint objects. 2017-03-31 10:10:56 -07:00
CMakeLists.txt CMake completions work 2017-01-11 16:59:22 -08:00
cnn.py create helpers package and add dropout 2017-04-07 17:33:49 -07:00
context_test.py Make ContextManager thread-safe 2017-02-13 19:45:35 -08:00
context.py doxygen python block added 2017-03-29 06:46:16 -07:00
control_test.py fbsync. TODO: check if build files need update. 2016-11-15 00:00:46 -08:00
control.py doxygen python block added 2017-03-29 06:46:16 -07:00
convnet_benchmarks_test.py chunky sync - build scripts to be written 2016-07-21 10:16:42 -07:00
convnet_benchmarks.py doxygen python block added 2017-03-29 06:46:16 -07:00
core_gradients_test.py add inference for gradient ops + a couple of missing shape inference functions + fix to scalars 2017-02-28 23:33:32 -08:00
core_test.py NextScopedBlob with well-defined behavior and respect namescope 2017-02-16 17:16:36 -08:00
core.py [Caffe2/Recurrent] recurrent.py API to cuDNN LSTM 2017-04-05 14:20:23 -07:00
crf.py cuDNN version of TransposeOp 2017-04-03 13:33:10 -07:00
cudnn_recurrent_test.py [Caffe2/Recurrent] recurrent.py API to cuDNN LSTM 2017-04-05 14:20:23 -07:00
data_parallel_model_test.py make memonger work with RecurrentNetwork(Gradient) 2017-04-05 09:48:25 -07:00
data_parallel_model.py doxygen python block added 2017-03-29 06:46:16 -07:00
data_workers_test.py close blobs queues when stopping + test 2017-02-27 10:07:57 -08:00
data_workers.py doxygen python block added 2017-03-29 06:46:16 -07:00
dataio_test.py Stop multi_reader if we run out of data before max_examples 2017-03-10 18:03:57 -08:00
dataio.py doxygen python block added 2017-03-29 06:46:16 -07:00
dataset.py doxygen python block added 2017-03-29 06:46:16 -07:00
db_test.py Fix db_test under tsan 2016-11-29 15:18:37 -08:00
device_checker.py doxygen python block added 2017-03-29 06:46:16 -07:00
dyndep.py doxygen python block added 2017-03-29 06:46:16 -07:00
experiment_util.py doxygen python block added 2017-03-29 06:46:16 -07:00
extension_loader.py Make extension loader properly handle visibility. 2017-03-30 14:38:38 -07:00
gradient_check_test.py gradient checker for nets 2017-03-28 13:03:14 -07:00
gradient_checker.py doxygen python block added 2017-03-29 06:46:16 -07:00
hsm_util.py doxygen python block added 2017-03-29 06:46:16 -07:00
hypothesis_test_util.py doxygen python block added 2017-03-29 06:46:16 -07:00
hypothesis_test.py feature processing ops 2017-04-11 07:07:51 -07:00
layer_model_helper.py Remove unused optimizers 2017-04-05 21:18:29 -07:00
layer_model_instantiator.py doxygen python block added 2017-03-29 06:46:16 -07:00
layer_test_util.py uniform_sampling layer 2017-03-29 14:36:12 -07:00
layers_test.py Add option to subtract log odd from sampled trained prediction. 2017-04-03 17:50:58 -07:00
load_save_test.py Improve error message from LogFileDB on missing file 2017-03-10 23:31:28 -08:00
lstm_benchmark.py option to recompute blobs backward pass with massive memory savings 2017-04-11 13:03:48 -07:00
memonger_test.py Added a DP + recursion algorithm for finding optimal blob assignments based on blob sizes. 2017-04-07 02:18:08 -07:00
memonger.py Added a DP + recursion algorithm for finding optimal blob assignments based on blob sizes. 2017-04-07 02:18:08 -07:00
mkl_test_util.py doxygen python block added 2017-03-29 06:46:16 -07:00
model_device_test.py Comment out NHWC Alexnet test for now 2017-01-23 13:59:29 -08:00
model_helper.py ability to disable inputs for extract predictor net 2017-04-06 17:05:32 -07:00
model_helpers_test.py create helpers package and add dropout 2017-04-07 17:33:49 -07:00
model_helpers.py create helpers package and add dropout 2017-04-07 17:33:49 -07:00
mpi_python.cc Move mpi_python.cc to the python folder to be more consistent about source file locations. 2017-01-09 10:59:39 -08:00
muji_test.py chunky sync - build scripts to be written 2016-07-21 10:16:42 -07:00
muji.py doxygen python block added 2017-03-29 06:46:16 -07:00
net_builder_test.py Allow test discovery in caffe2/python/ 2017-03-14 18:16:41 -07:00
net_builder.py doxygen python block added 2017-03-29 06:46:16 -07:00
net_drawer.py doxygen python block added 2017-03-29 06:46:16 -07:00
net_printer_test.py Debug/Analysis tools for Jobs/ExecutionSteps 2017-02-06 17:31:20 -08:00
net_printer.py doxygen python block added 2017-03-29 06:46:16 -07:00
optimizer_test_util.py doxygen python block added 2017-03-29 06:46:16 -07:00
optimizer_test.py Allow test discovery in caffe2/python/ 2017-03-14 18:16:41 -07:00
optimizer.py doxygen python block added 2017-03-29 06:46:16 -07:00
pipeline.py doxygen python block added 2017-03-29 06:46:16 -07:00
predictor_constants.py Constant string is generated from Protobuf instead of Thrift 2017-04-04 15:03:39 -07:00
pybind_state_gpu.cc Cudnn v6 2017-02-28 17:46:33 -08:00
pybind_state_mkl.cc Expose MKLMemory to the Python Feed and Fetch interface, and misc changes 2016-11-29 15:18:36 -08:00
pybind_state.cc Add CAFFE_ENFORCE to protobuf parsing 2017-04-06 14:34:30 -07:00
pybind_state.h bugfix for Windows, esp. VS 2017 2017-03-21 05:17:59 -07:00
python_op_test.py Allow PythonOp to access the workspace 2016-12-05 11:53:26 -08:00
queue_util.py doxygen python block added 2017-03-29 06:46:16 -07:00
record_queue.py doxygen python block added 2017-03-29 06:46:16 -07:00
recurrent.py option to recompute blobs backward pass with massive memory savings 2017-04-11 13:03:48 -07:00
schema_test.py Struct nested field name lookup supports List 2017-03-24 18:17:19 -07:00
schema.py distributed training for dper2 2017-03-30 19:04:50 -07:00
scope_test.py fbsync. TODO: check if build files need update. 2016-11-15 00:00:46 -08:00
scope.py doxygen python block added 2017-03-29 06:46:16 -07:00
session_test.py NextScopedBlob with well-defined behavior and respect namescope 2017-02-16 17:16:36 -08:00
session.py doxygen python block added 2017-03-29 06:46:16 -07:00
sparse_to_dense_mask_test.py Fix few more operators to handle empty batches correctly. 2016-11-29 15:18:37 -08:00
task.py doxygen python block added 2017-03-29 06:46:16 -07:00
test_util.py doxygen python block added 2017-03-29 06:46:16 -07:00
text_file_reader.py doxygen python block added 2017-03-29 06:46:16 -07:00
timeout_guard.py doxygen python block added 2017-03-29 06:46:16 -07:00
toy_regression_test.py sync 2016-08-10 11:02:15 -07:00
tt_core_test.py sync 2016-08-10 11:02:15 -07:00
tt_core.py doxygen python block added 2017-03-29 06:46:16 -07:00
utils.py doxygen python block added 2017-03-29 06:46:16 -07:00
visualize.py doxygen python block added 2017-03-29 06:46:16 -07:00
workspace_test.py Added predictor bindings to python interface 2017-03-15 11:17:54 -07:00
workspace.py doxygen python block added 2017-03-29 06:46:16 -07:00