Commit Graph

13 Commits

Qinqing Zheng
ce62c65c18 momentum sgd
Summary: Add support for SparseMomentumSGDUpdate and tests for momentum SGD in both dense and sparse cases

Reviewed By: akyrola

Differential Revision: D6234834

fbshipit-source-id: 9848c29ea06794ef35f1ebaff0f5e81eac4f4db9
2017-11-03 16:17:17 -07:00
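A minimal NumPy sketch of the dense and sparse momentum updates that ce62c65c18 describes; the function names and the plain (non-Nesterov) update form are illustrative assumptions, not the Caffe2 MomentumSGDUpdate / SparseMomentumSGDUpdate operators themselves.

```python
import numpy as np

def momentum_sgd_update(w, grad, m, lr, momentum=0.9):
    """Dense case: m <- momentum * m + lr * grad, then w <- w - m."""
    m[:] = momentum * m + lr * grad
    w -= m

def sparse_momentum_sgd_update(w, grad_rows, indices, m, lr, momentum=0.9):
    """Sparse case: only the rows named by `indices` are touched."""
    for i, row in zip(indices, grad_rows):
        m[i] = momentum * m[i] + lr * row
        w[i] -= m[i]

# Usage on a small embedding-style parameter: only rows 0 and 3 receive gradients.
w = np.random.randn(5, 3).astype(np.float32)
m = np.zeros_like(w)
grad_rows = np.random.randn(2, 3).astype(np.float32)
sparse_momentum_sgd_update(w, grad_rows, indices=[0, 3], m=m, lr=0.01)
```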
Yangqing Jia
8286ce1e3a Re-license to Apache
Summary: Closes https://github.com/caffe2/caffe2/pull/1260

Differential Revision: D5906739

Pulled By: Yangqing

fbshipit-source-id: e482ba9ba60b5337d9165f28f7ec68d4518a0902
2017-09-28 16:22:00 -07:00
Christopher Hay
cc3662e939 Added support for scaling learning rate of Caffe2 optimizers during training
Summary: While there is currently support for scaling the base learning rate when loading the model, there is no support for scaling it during training. This is needed for LATTE's seq2seq translation models, whose learning schedule is not predefined and is modified at runtime.

Reviewed By: jhcross

Differential Revision: D5701391

fbshipit-source-id: ae3bec45f238db1a2be7af9c04d720067e9095d5
2017-08-25 19:04:47 -07:00
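The point of cc3662e939 is that the effective learning rate becomes the base rate multiplied by a scale factor that can be adjusted while training is already running. A rough, framework-agnostic sketch of that idea; the ScaledLR class and its method names are hypothetical, not the Caffe2 optimizer API.

```python
class ScaledLR:
    """Base learning rate multiplied by a runtime-adjustable scale."""

    def __init__(self, base_lr):
        self.base_lr = base_lr
        self.scale = 1.0

    def scale_learning_rate(self, factor):
        # Called mid-training, e.g. when a dynamic seq2seq schedule
        # decides to decay or boost the rate.
        self.scale *= factor

    @property
    def lr(self):
        return self.base_lr * self.scale

opt = ScaledLR(base_lr=0.1)
opt.scale_learning_rate(0.5)   # halve the rate once some condition triggers
assert abs(opt.lr - 0.05) < 1e-12
```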
Christopher Hay
ad07f5f05d Added norm-based gradient clipping to optimizer library
Summary: Moved the code for global norm-based gradient clipping from fb-specific workflows (seq2seq) to the open-source caffe2 optimizer library.

Reviewed By: jhcross

Differential Revision: D5637453

fbshipit-source-id: 7e73c9a1c97c28a152c188467b27a6449f79242e
2017-08-24 10:17:50 -07:00
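Global norm-based clipping, which ad07f5f05d moves into the open-source optimizer library, rescales every gradient by the same factor whenever the L2 norm of all gradients taken together exceeds a threshold. A minimal NumPy sketch; the function name and the epsilon guard are illustrative assumptions.

```python
import numpy as np

def clip_by_global_norm(grads, clip_norm):
    """Scale all gradients by clip_norm / global_norm when the global
    L2 norm across every tensor exceeds clip_norm."""
    global_norm = np.sqrt(sum(float(np.sum(g ** 2)) for g in grads))
    if global_norm > clip_norm:
        scale = clip_norm / (global_norm + 1e-6)
        grads = [g * scale for g in grads]
    return grads, global_norm

# Usage: two gradient tensors whose combined norm is well above 1.0.
grads = [np.ones((2, 2)), 3.0 * np.ones(4)]
clipped, norm = clip_by_global_norm(grads, clip_norm=1.0)
```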
Simon Layton
58874ad5bf Fp16 training initializers
Summary:
Re-open for re-importing :)
Closes https://github.com/caffe2/caffe2/pull/721

Differential Revision: D5164345

Pulled By: akyrola

fbshipit-source-id: e80b32556cd25610602df91a4225b93edc0ca40b
2017-06-01 08:34:46 -07:00
Aapo Kyrola
0f8c8f37a8 Revert D5159712: [caffe2][PR] Fp16 training initializers
Summary: This reverts commit 60a889494d2e2f4df1d720331e19f638c5eb95cc

Differential Revision: D5159712

fbshipit-source-id: 16040c911b260648857f656f92b165f92c2daae0
2017-06-01 00:17:14 -07:00
Simon Layton
2bfacff426 Fp16 training initializers
Summary:
Adds support for generating and training pfp16 models. Also adds an SGD optimizer for multi-precision trainers and a new callback to data_parallel_model to help multi-precision models keep their different copies of parameters in sync during training.
Closes https://github.com/caffe2/caffe2/pull/697

Differential Revision: D5159712

Pulled By: salexspb

fbshipit-source-id: 60a889494d2e2f4df1d720331e19f638c5eb95cc
2017-05-31 17:46:58 -07:00
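The multi-precision training described in 2bfacff426 keeps an fp32 master copy of each parameter next to the reduced-precision copy used for compute, and the two have to be re-synchronized after every update. A simplified NumPy sketch of that bookkeeping; the function names are illustrative, and this is not the data_parallel_model callback itself.

```python
import numpy as np

def init_multiprecision_param(shape):
    # The fp32 master copy owns the "true" value; the fp16 copy is what
    # the forward/backward pass actually reads.
    master = np.random.randn(*shape).astype(np.float32)
    compute = master.astype(np.float16)
    return master, compute

def multiprecision_sgd_step(master, compute, grad_fp16, lr):
    # Apply the fp16 gradient to the fp32 master weights, then refresh
    # the fp16 copy so both copies stay in sync.
    master -= lr * grad_fp16.astype(np.float32)
    compute[:] = master.astype(np.float16)
    return master, compute

master, compute = init_multiprecision_param((4, 4))
grad = np.random.randn(4, 4).astype(np.float16)
master, compute = multiprecision_sgd_step(master, compute, grad, lr=0.01)
```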
Aapo Kyrola
44257ea5ed automatically infer device scope for param
Summary:
hankun is using the optimizer, but has a mixed set of GPU and CPU operators. Currently this won't work with the optimizer, since it adds optimizers for all parameters in the current device scope. But we can actually infer the device that a param belongs to by looking at the device option in the param_init_net.

Added a test as well.

Reviewed By: salexspb

Differential Revision: D5133652

fbshipit-source-id: ad8689d75ac1f5c78981bae1b6978fe91e40ef0f
2017-05-30 12:02:19 -07:00
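The inference described in 44257ea5ed amounts to: instead of trusting the current device scope, look up the operator in param_init_net that produces a given parameter blob and read its device_option. A hedged sketch against the Caffe2 NetDef protobuf; the helper name is hypothetical.

```python
def infer_param_device(param_init_net, param_name):
    """Return the device_option of the op in param_init_net that creates
    `param_name`, or None if no such op exists."""
    for op in param_init_net.Proto().op:
        if param_name in op.output:
            return op.device_option
    return None

# Usage inside a model-building function, e.g.:
#   device = infer_param_device(model.param_init_net, str(param))
```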
Yangqing Jia
cf317d1106 create_net: explicitly specify if one wants to overwrite the network.
Summary:
This is from discussion with dzhulgakov: as a step towards revisiting the
core.Net autonaming, we will first guard against accidental overwrites of
existing networks in the workspace.

ajtulloch: since we are doing Predictors in mobile, this should be safe, right?

azzolini - I assume this would be safe, but would love to get your approval.

akyrola - would this hurt xray?

Reviewed By: dzhulgakov

Differential Revision: D4897725

fbshipit-source-id: aa41271927ad6671f07a53b9505283623f8c49e5
2017-04-17 21:46:53 -07:00
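With cf317d1106, creating a net whose name already exists in the workspace is no longer a silent overwrite; the caller has to opt in explicitly. A small usage sketch, assuming the `overwrite` keyword argument this commit introduces on workspace.CreateNet.

```python
from caffe2.python import core, workspace

net = core.Net("my_net")
net.ConstantFill([], ["x"], shape=[1], value=1.0)

workspace.CreateNet(net)                  # first creation succeeds
workspace.CreateNet(net, overwrite=True)  # re-creating the same name now
                                          # requires an explicit overwrite
```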
Bor-Yiing Su
7270471ed6 Returns auxiliary parameters in the optimizers.
Summary:
1. Adds a function that returns the auxiliary parameters of each optimizer. This function can be used to serialize the optimizers so that they can be recovered.
2. Fixes a bug where the iteration blob was not incremented by exactly one per iteration: with k parameters using the Adam learning-rate optimizer, the original implementation incremented the iteration blob by k.

Reviewed By: azzolini

Differential Revision: D4872397

fbshipit-source-id: d86711feedda2ba83af5f2a18141b06a6a473733
2017-04-17 10:16:32 -07:00
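Two ideas from 7270471ed6, sketched in plain Python: an optimizer exposes its auxiliary state (here, moment estimates plus a shared iteration counter) so it can be serialized and recovered, and the iteration counter advances once per training step rather than once per parameter. The ToyAdam class is purely illustrative, not the caffe2.python.optimizer interface.

```python
class ToyAdam:
    """Illustrative Adam-like optimizer: one shared iteration counter
    plus per-parameter moment estimates (its auxiliary parameters)."""

    def __init__(self, lr=1e-3, beta=0.9):
        self.lr, self.beta = lr, beta
        self.iteration = 0   # shared across all parameters
        self.moments = {}    # per-parameter auxiliary state

    def step(self, params, grads):
        # Bump the counter once per training step, not once per parameter
        # (with k parameters the old code effectively advanced it by k).
        self.iteration += 1
        for name in params:
            m = self.moments.get(name, 0.0)
            m = self.beta * m + (1.0 - self.beta) * grads[name]
            self.moments[name] = m
            params[name] -= self.lr * m

    def get_auxiliary_parameters(self):
        # Everything needed to serialize and later recover the optimizer.
        return {"iteration": self.iteration, "moments": dict(self.moments)}
```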
Aaron Markham
58f7f2b441 doxygen python block added
Summary: Closes https://github.com/caffe2/caffe2/pull/226

Differential Revision: D4793550

Pulled By: JoelMarcey

fbshipit-source-id: cc33e58186304fa8dcac2ee9115dcc271d785b1e
2017-03-29 06:46:16 -07:00
Luke Yeager
014d1fe5c4 Allow test discovery in caffe2/python/
Summary:
These are all essentially no-op changes which allow for nose-style (or pytest-style) test discovery.

With this patch, you can use any of these methods to discover and run tests under `caffe2/python`:
```
python -m unittest discover -p '*test*.py' caffe2/python/
python -m nose caffe2/python/
python -m pytest caffe2/python/
```

Future work:

* Get all of the tests to pass
  * Some seem to be testing operations which don't have GPU implementations
  * I get a segfault unless I set `CUDA_VISIBLE_DEVICES=0`
  * Some tests are flaky
* Allow test discovery throughout the whole project (e.g. the `experiments/` dir)
Closes https://github.com/caffe2/caffe2/pull/199

Reviewed By: pietern

Differential Revision: D4704504

Pulled By: Yangqing

fbshipit-source-id: 8f5687ec9c8aa873dfaff30dbf44272bc38a206b
2017-03-14 18:16:41 -07:00
Huazhong Ning
83437853ad refactor and modularize optimizers
Summary:
The current optimizer code in c2/python has the following issues:
(1) the optimizers in sgd.py cannot be configured per param blob;
(2) sgd.py is a bad file name; optimizer.py is a better name;
(3) layer_model_helper.py has a separate set of optimizer code (which does support per-param-blob optimizers).

This diff does the following:
(1) creates optimizer objects so that we can configure an optimizer per param blob, while remaining compatible with the existing optimizer code;
(2) makes the new optimizer code much more modular;
(3) moves the optimizer code to a file with a better name (optimizer.py);
(4) replaces the optimizer imports in the existing code.

Will do in subsequent diffs:
(1) optimizers with structured parameters for dper2;
(2) get rid of the optimizer code in layer_model_helper.py.

Reviewed By: salexspb

Differential Revision: D4609013

fbshipit-source-id: 2e2d6dfa8685d10498f89069157453d9feca3f27
2017-03-07 18:46:47 -08:00
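The per-param-blob configuration that 83437853ad introduces can be pictured as a mapping from each parameter blob to its own optimizer object, with a model-wide default for everything else. A schematic sketch; the OptimizerRegistry class and its methods are hypothetical, not the optimizer.py interface.

```python
class OptimizerRegistry:
    """Map each param blob to its own optimizer, with a default fallback."""

    def __init__(self, default_optimizer):
        self.default = default_optimizer
        self.per_param = {}

    def set_optimizer_for(self, param_name, optimizer):
        self.per_param[param_name] = optimizer

    def optimizer_for(self, param_name):
        return self.per_param.get(param_name, self.default)

# e.g. give a sparse embedding its own Adagrad while everything else
# uses plain SGD (optimizers shown as plain strings for brevity).
registry = OptimizerRegistry(default_optimizer="SGD(lr=0.1)")
registry.set_optimizer_for("embedding_w", "Adagrad(lr=0.01)")
assert registry.optimizer_for("fc_w") == "SGD(lr=0.1)"
assert registry.optimizer_for("embedding_w") == "Adagrad(lr=0.01)"
```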