pytorch/caffe2/python
Luke Yeager 82e318cf8b Optimizer: one LR op per (device, optimizer)
Summary:
Try running this script through `nvprof`:
```py
import numpy as np
from caffe2.proto import caffe2_pb2
from caffe2.python import brew, core, optimizer, workspace
from caffe2.python.model_helper import ModelHelper

do = core.DeviceOption(caffe2_pb2.CUDA, 0)
with core.DeviceScope(do):
    model = ModelHelper(arg_scope={'order': 'NCHW'})
    conv1 = brew.conv(model, 'data', 'conv1', 1, 20, 5)
    pool1 = brew.max_pool(model, conv1, 'pool1', kernel=2, stride=2)
    conv2 = brew.conv(model, pool1, 'conv2', 20, 50, 5)
    pool2 = brew.max_pool(model, conv2, 'pool2', kernel=2, stride=2)
    fc3 = brew.fc(model, pool2, 'fc3', 50 * 4 * 4, 500)
    fc3 = brew.relu(model, fc3, fc3)
    pred = brew.fc(model, fc3, 'pred', 500, 10)
    softmax, loss = model.SoftmaxWithLoss([pred, 'label'], ['softmax', 'loss'])
    model.AddGradientOperators([loss])
    optimizer.build_sgd(model, 0.01,
                        policy='step', stepsize=1, gamma=0.999,
                        momentum=0.9, nesterov=False)
    workspace.FeedBlob('data', np.zeros((1, 1, 28, 28), dtype=np.float32))
    workspace.FeedBlob('label', np.zeros((1, 1), dtype=np.int32))

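# Initialize the parameters once, then instantiate the training net.
workspace.RunNetOnce(model.param_init_net)
workspace.CreateNet(model.net)

# Run 100 training iterations so per-iteration copies show up in the profile.
for _ in range(100):
    workspace.RunNet(model.net)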
```
Before this change:
```
                    1.55%  1.4185ms       837  1.6940us  1.6630us  2.4000us  [CUDA memcpy HtoD]
                    0.72%  656.03us       200  3.2800us  3.1350us  3.5840us  [CUDA memcpy DtoD]
                    0.39%  7.1574ms      1034  6.9220us  3.8300us  18.677us  cudaMemcpyAsync
                    0.00%  34.180us         3  11.393us  9.0960us  12.910us  cudaMemcpy
```
And after it (look at the third column, the number of calls):
```
                    0.73%  657.15us       200  3.2850us  3.1040us  3.6160us  [CUDA memcpy DtoD]
                    0.26%  235.07us       137  1.7150us  1.6640us  2.3680us  [CUDA memcpy HtoD]
                    0.20%  3.4493ms       334  10.327us  6.4220us  16.958us  cudaMemcpyAsync
                    0.00%  37.376us         3  12.458us  9.4120us  15.412us  cudaMemcpy
```
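That makes a pretty big difference in performance: over the 100 iterations, host-to-device copies drop from 837 to 137 and `cudaMemcpyAsync` calls from 1034 to 334 (700 fewer of each, or 7 per iteration), and total `cudaMemcpyAsync` time falls from 7.1574ms to 3.4493ms. That is consistent with the network's eight parameter blobs sharing one `LearningRate` op instead of each having its own. Is there any particular reason you decided to have a separate `LearningRate` op for every parameter in 1317e3498c?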
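The idea in the commit title can be sketched without Caffe2: rather than building one learning-rate op per parameter, cache it under a (device, optimizer) key and reuse it. Everything below (`LRCache`, `get_or_create_lr`, the string device labels, the parameter names) is hypothetical and is not Caffe2's API or the actual change to `optimizer.py`; it only illustrates the caching pattern.

```python
# Hypothetical sketch of "one LR op per (device, optimizer)"; not Caffe2 code.


class LRCache:
    """Hands out one learning-rate blob name per (device, optimizer) pair."""

    def __init__(self):
        self._lr_by_key = {}   # (device, optimizer) -> LR blob name
        self.ops_created = 0   # how many LR ops were actually built

    def get_or_create_lr(self, device, optimizer_name):
        key = (device, optimizer_name)
        if key not in self._lr_by_key:
            # Only the first parameter seen for this key pays for a new op.
            self.ops_created += 1
            self._lr_by_key[key] = "{}_lr_{}".format(optimizer_name, device)
        return self._lr_by_key[key]


# Illustrative names for the eight parameter blobs (weights and biases)
# of the four trainable layers in the example network.
params = ["conv1_w", "conv1_b", "conv2_w", "conv2_b",
          "fc3_w", "fc3_b", "pred_w", "pred_b"]

cache = LRCache()
lr_blobs = [cache.get_or_create_lr("gpu_0", "sgd") for _ in params]

per_param_ops = len(params)      # old scheme: one LR op per parameter
shared_ops = cache.ops_created   # new scheme: one per (device, optimizer)
print(per_param_ops, shared_ops)
```

With eight parameters on a single GPU and a single optimizer, every parameter receives the same learning-rate blob; a second device or a second optimizer would each add one more entry.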
Closes https://github.com/caffe2/caffe2/pull/893

Reviewed By: kennyhorror

Differential Revision: D5372541

Pulled By: asaadaldien

fbshipit-source-id: 57357e1be2d58ce294058e9422fb3b1eddfca24d
2017-07-12 21:17:49 -07:00
docs Dict fixes/improvements and unittest targets for Python 3 in caffe2 core 2017-06-29 17:05:41 -07:00
examples Fixed typo 2017-06-23 14:02:40 -07:00
helpers Adding tanh to brew 2017-07-11 18:17:52 -07:00
layers Allow to import subclasses of layers 2017-07-12 20:19:47 -07:00
mint doxygen python block added 2017-03-29 06:46:16 -07:00
mkl Deprecate CNNModelHelper - Inception() 2017-06-15 14:03:27 -07:00
modeling allow param_info to set optimizer 2017-07-12 08:49:48 -07:00
models fast simple-net memonger for C++ 2017-07-06 15:17:07 -07:00
operator_test Implemented GRUCell 2017-07-10 17:52:25 -07:00
predictor Dict fixes/improvements and unittest targets for Python 3 in caffe2 core 2017-06-29 17:05:41 -07:00
rnn Moved sigmoid, tanh, and _prepare_lstm (renamed) to a util file. 2017-07-10 17:52:22 -07:00
_import_c_extension.py doxygen python block added 2017-03-29 06:46:16 -07:00
attention.py Unrolled test for AttentionCell 2017-06-25 17:21:24 -07:00
brew_test.py quick fix for model_helper __init__ 2017-07-12 08:49:48 -07:00
brew.py Adding tanh to brew 2017-07-11 18:17:52 -07:00
caffe_translator_test.py Allow test discovery in caffe2/python/ 2017-03-14 18:16:41 -07:00
caffe_translator.py Read pretrained weights using binary mode in caffe_translator.py 2017-07-08 10:17:57 -07:00
checkpoint_test.py Allow tasks/execution_steps to be cloned at runtime 2017-06-20 22:32:07 -07:00
checkpoint.py Adds interfaces to check the existence of a DB 2017-04-11 14:07:49 -07:00
CMakeLists.txt CMake completions work 2017-01-11 16:59:22 -08:00
cnn.py cnnmodelhelper deprecate warning 2017-05-18 23:35:26 -07:00
context_test.py Make ContextManager thread-safe 2017-02-13 19:45:35 -08:00
context.py doxygen python block added 2017-03-29 06:46:16 -07:00
control_test.py
control.py Dict fixes/improvements and unittest targets for Python 3 in caffe2 core 2017-06-29 17:05:41 -07:00
convnet_benchmarks_test.py
convnet_benchmarks.py brew API in convnet benchmark 2017-07-05 10:34:48 -07:00
core_gradients_test.py add debug information when there is blob version mismatch 2017-06-30 16:22:46 -07:00
core_test.py single trainer hybrid device 2017-06-27 22:06:30 -07:00
core.py Fix communication_schema decoding 2017-07-02 13:04:20 -07:00
crf.py Deprecate CNNModelHelper in python/crf.py 2017-06-14 08:49:27 -07:00
data_parallel_model_test.py Added device scope checks to data_parallel_model and data_parallel_rendevous 2017-07-12 10:47:28 -07:00
data_parallel_model.py Added device scope checks to data_parallel_model and data_parallel_rendevous 2017-07-12 10:47:28 -07:00
data_workers_test.py fix a rare race condition by initializing scratch blobs beforehand 2017-06-26 10:18:18 -07:00
data_workers.py Dict fixes/improvements and unittest targets for Python 3 in caffe2 core 2017-06-29 17:05:41 -07:00
dataio_test.py Allow tasks/execution_steps to be cloned at runtime 2017-06-20 22:32:07 -07:00
dataio.py Fix a few typos and grammars in comment 2017-06-14 18:22:39 -07:00
dataset.py Add random shuffle through the data to the benchmark workflow 2017-06-16 13:22:46 -07:00
db_test.py String-related fixes for Python 3 2017-05-26 16:04:32 -07:00
device_checker.py Dict fixes/improvements and unittest targets for Python 3 in caffe2 core 2017-06-29 17:05:41 -07:00
dyndep.py doxygen python block added 2017-03-29 06:46:16 -07:00
empty.so Adding video data layer for caffe2 2017-05-05 14:16:38 -07:00
experiment_util.py Dict fixes/improvements and unittest targets for Python 3 in caffe2 core 2017-06-29 17:05:41 -07:00
extension_loader.py Make extension loader properly handle visibility. 2017-03-30 14:38:38 -07:00
gradient_check_test.py Cos, Sin, and Abs operators 2017-07-03 22:18:32 -07:00
gradient_checker.py Fix a few typos and grammars in comment 2017-06-14 18:22:39 -07:00
gru_cell.py Implemented GRUCell 2017-07-10 17:52:25 -07:00
hsm_util.py doxygen python block added 2017-03-29 06:46:16 -07:00
hypothesis_test_util.py Add min_satisfying_examples 2017-06-29 12:48:01 -07:00
hypothesis_test.py Cos, Sin, and Abs operators 2017-07-03 22:18:32 -07:00
layer_model_helper.py Dict fixes/improvements and unittest targets for Python 3 in caffe2 core 2017-06-29 17:05:41 -07:00
layer_model_instantiator.py Remove map() and filter() in favor of comprehensions 2017-05-30 15:32:58 -07:00
layer_test_util.py Core unit test fixes for Python 3 2017-06-23 13:22:16 -07:00
layers_test.py make functional layer return scalar if only one output 2017-07-12 11:34:31 -07:00
load_save_test.py Allow Load operator to load into overriden names 2017-04-27 01:18:12 -07:00
lstm_benchmark.py Added flags to lstm, convnet and sparse_nn_benchmarks to print out operators 2017-06-30 23:47:04 -07:00
memonger_test.py fast simple-net memonger for C++ 2017-07-06 15:17:07 -07:00
memonger.py fix for back-and-forth models, pass reference instead of copy 2017-07-11 10:52:14 -07:00
mkl_test_util.py doxygen python block added 2017-03-29 06:46:16 -07:00
model_device_test.py Deprecate CNNModelHelper in caffe2/python/model_device_test.py 2017-06-22 15:37:17 -07:00
model_helper.py quick fix for model_helper __init__ 2017-07-12 08:49:48 -07:00
mpi_python.cc Fix pybind11 module name for MPI helpers 2017-05-02 23:18:50 -07:00
muji_test.py Fixes range/xrange for Python 3 2017-06-07 00:04:26 -07:00
muji.py Fixes range/xrange for Python 3 2017-06-07 00:04:26 -07:00
net_builder_test.py Allow tasks/execution_steps to be cloned at runtime 2017-06-20 22:32:07 -07:00
net_builder.py Allow tasks/execution_steps to be cloned at runtime 2017-06-20 22:32:07 -07:00
net_drawer.py Dict fixes/improvements and unittest targets for Python 3 in caffe2 core 2017-06-29 17:05:41 -07:00
net_printer_test.py Allow tasks/execution_steps to be cloned at runtime 2017-06-20 22:32:07 -07:00
net_printer.py Fix net_printer.py 2017-07-11 15:26:52 -07:00
optimizer_context.py allow param_info to set optimizer 2017-07-12 08:49:48 -07:00
optimizer_test_util.py Fp16 training initializers 2017-06-01 08:34:46 -07:00
optimizer_test.py allow param_info to set optimizer 2017-07-12 08:49:48 -07:00
optimizer.py Optimizer: one LR op per (device, optimizer) 2017-07-12 21:17:49 -07:00
parallelize_gpu_bmuf_distributed_test.py Add distributed BMUF implementation. 2017-06-21 16:18:11 -07:00
pipeline.py Enable runtime cloning of tasks. 2017-06-21 03:18:20 -07:00
predictor_constants.py Re-apply #266 2017-04-25 21:17:04 -07:00
pybind_state_gpu.cc Cudnn v6 2017-02-28 17:46:33 -08:00
pybind_state_mkl.cc
pybind_state.cc fast simple-net memonger for C++ 2017-07-06 15:17:07 -07:00
pybind_state.h fast simple-net memonger for C++ 2017-07-06 15:17:07 -07:00
python_op_test.py Fix some typos 2017-06-28 13:50:48 -07:00
queue_util.py doxygen python block added 2017-03-29 06:46:16 -07:00
record_queue.py Fix a few typos and grammars in comment 2017-06-14 18:22:39 -07:00
recurrent.py Dict fixes/improvements and unittest targets for Python 3 in caffe2 core 2017-06-29 17:05:41 -07:00
rnn_cell.py Dict fixes/improvements and unittest targets for Python 3 in caffe2 core 2017-06-29 17:05:41 -07:00
schema_test.py Add __sub__ function for schema.Struct 2017-06-28 11:24:01 -07:00
schema.py IndexHash 2017-07-07 23:06:11 -07:00
scope_test.py Fix corruption of NameScope when exception is thrown 2017-04-24 22:46:27 -07:00
scope.py Dict fixes/improvements and unittest targets for Python 3 in caffe2 core 2017-06-29 17:05:41 -07:00
session_test.py Warn on setting blob on Scalar 2017-05-01 20:18:30 -07:00
session.py Allow tasks/execution_steps to be cloned at runtime 2017-06-20 22:32:07 -07:00
sparse_to_dense_mask_test.py String-related fixes for Python 3 2017-05-26 16:04:32 -07:00
task.py Dict fixes/improvements and unittest targets for Python 3 in caffe2 core 2017-06-29 17:05:41 -07:00
test_util.py doxygen python block added 2017-03-29 06:46:16 -07:00
text_file_reader.py doxygen python block added 2017-03-29 06:46:16 -07:00
timeout_guard.py Dict fixes/improvements and unittest targets for Python 3 in caffe2 core 2017-06-29 17:05:41 -07:00
toy_regression_test.py
tt_core_test.py
tt_core.py Fix a few typos and grammars in comment 2017-06-14 18:22:39 -07:00
utils.py Fast path for serializing large floating-point tensors to protobuf 2017-07-10 17:52:22 -07:00
visualize.py Python 3 compatible integer division 2017-07-06 11:47:12 -07:00
workspace_test.py Core unit test fixes for Python 3 2017-06-23 13:22:16 -07:00
workspace.py Dict fixes/improvements and unittest targets for Python 3 in caffe2 core 2017-06-29 17:05:41 -07:00