Commit Graph

56 Commits

Yiming Wu
0aeffa985e make sure mutex is on CPU too
Summary: Mutex is only supported on CPU, so we need to make sure that the mutex and the following AtomicIter op are both on CPU. This is critical for GPU SparseNN training.
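
A minimal sketch of the placement constraint, assuming the standard Caffe2 Python API (net and blob names here are illustrative, not the exact diff):

```python
from caffe2.proto import caffe2_pb2
from caffe2.python import core, workspace

init_net = core.Net("optimizer_init")
train_net = core.Net("optimizer_train")

# Keep the mutex and the iteration counter on CPU: CreateMutex and
# AtomicIter only have CPU implementations, even in a GPU training net.
with core.DeviceScope(core.DeviceOption(caffe2_pb2.CPU)):
    mutex = init_net.CreateMutex([], ["iteration_mutex"])
    iteration = init_net.ConstantFill(
        [], ["optimizer_iteration"], shape=[1], value=0,
        dtype=core.DataType.INT64,
    )
    train_net.AtomicIter([mutex, iteration], [iteration])

workspace.RunNetOnce(init_net)
workspace.CreateNet(train_net)
workspace.RunNet(train_net.Name())  # "optimizer_iteration" is now 1
```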

Differential Revision: D5093184

fbshipit-source-id: 021e6ba699a3208449fa4761cad6b0ec4544957e
2017-05-19 12:17:17 -07:00
Xiaolong Wang
add840510f Refactor Optimizer to Allow scale_learning_rate
Summary:
In transfer learning, parameters initialized from a pretrained model might require
a different learning rate than freshly initialized ones. To this end, we
implement a Python solution where `base_learning_rate` is scaled by `scale`,
which is in turn set by `scale_learning_rate`. Alternatively, we could achieve
the same effect by rewriting the LearningRate operator in C++.
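
A hedged sketch of the scaling idea (the class layout and the `effective_learning_rate` helper are illustrative; only `base_learning_rate`, `scale`, and `scale_learning_rate` come from the diff):

```python
class Optimizer(object):
    """Toy model of the refactored optimizer's learning-rate scaling."""

    def __init__(self, base_learning_rate):
        self.base_learning_rate = base_learning_rate
        self.scale = 1.0

    def scale_learning_rate(self, factor):
        # Parameters initialized from a pretrained model can register a
        # multiplier here instead of requiring a separate optimizer or a
        # C++ change to the LearningRate operator.
        self.scale *= factor
        return self

    @property
    def effective_learning_rate(self):
        return self.base_learning_rate * self.scale


# Fine-tune pretrained layers at one tenth of the base rate:
opt = Optimizer(base_learning_rate=0.1).scale_learning_rate(0.1)
assert abs(opt.effective_learning_rate - 0.01) < 1e-12
```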

Reviewed By: kennyhorror

Differential Revision: D4992827

fbshipit-source-id: 8d7e87a61c95b3eb8ef733ec436f4060e865c0ac
2017-05-09 13:16:21 -07:00
Alisson Gusatti Azzolini
20d8de8d51 Parameter cost estimation job
Summary:
Adds a parameter cost estimation step before the actual training starts. The costs are later used to better shard the parameters across instances of the parameter server (a sketch of the sharding idea follows below).

Things I needed to modify:
- A few changes to make ModelLayerHelper picklable
- Add support for stopping a distributed job after a number of stats reporting steps.
- Refactored run_dist_job to support collocating the reader with the trainer even when PS are present.
- Option to disable dense updates (when num_dense_servers=0).

Currently there's significant overhead from having to launch a child workflow. I'll try to address that in a subsequent diff.

This is WIP because the other workflows need to be migrated as well.

I can break this down into smaller diffs if reviewers would prefer it.
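
A hedged illustration of how estimated costs could drive sharding across parameter-server instances (a greedy least-loaded assignment; the function and the example costs are illustrative, not the internal job):

```python
import heapq

def shard_params(param_costs, num_ps):
    """Assign each parameter to the currently least-loaded parameter server.

    param_costs: {param_name: estimated_cost}; returns {ps_id: [param_names]}.
    """
    shards = {ps: [] for ps in range(num_ps)}
    heap = [(0.0, ps) for ps in range(num_ps)]  # (total_cost, ps_id)
    heapq.heapify(heap)
    # Place the most expensive parameters first for a better balance.
    for name, cost in sorted(param_costs.items(), key=lambda kv: -kv[1]):
        load, ps = heapq.heappop(heap)
        shards[ps].append(name)
        heapq.heappush(heap, (load + cost, ps))
    return shards

print(shard_params({"emb_a": 8.0, "emb_b": 5.0, "fc_w": 1.0, "fc_b": 0.5}, 2))
# {0: ['emb_a'], 1: ['emb_b', 'fc_w', 'fc_b']}
```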

Reviewed By: kennyhorror

Differential Revision: D4974752

fbshipit-source-id: 04c336acb2945f8f11324a221ffc6967818c0672
2017-05-09 13:02:24 -07:00
Bor-Yiing Su
7270471ed6 Returns auxiliary parameters in the optimizers.
Summary:
1. Adds a function to return the auxiliary parameters of each optimizer. This function can be used to serialize the optimizers so that they can be recovered later.
2. Fixes a bug where the iteration blob was not incremented by exactly one per iteration: if k parameters use the Adam optimizer, the original implementation increments the iteration blob by k.
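
A hedged sketch of what the auxiliary parameters might look like for Adam (blob names are illustrative): the per-parameter moment blobs are local state, while a single iteration blob is shared across parameters, which is why it must be incremented only once per step.

```python
from collections import namedtuple

# local: blobs belonging to one parameter; shared: blobs used by all of them.
AuxParams = namedtuple("AuxParams", ["local", "shared"])

def adam_auxiliary_parameters(param_names):
    local = []
    for p in param_names:
        local += ["{}_first_moment".format(p), "{}_second_moment".format(p)]
    # One shared AtomicIter blob for the whole net, incremented once per step.
    shared = ["optimizer_iteration"]
    return AuxParams(local=local, shared=shared)

print(adam_auxiliary_parameters(["fc_w", "fc_b"]))
```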

Reviewed By: azzolini

Differential Revision: D4872397

fbshipit-source-id: d86711feedda2ba83af5f2a18141b06a6a473733
2017-04-17 10:16:32 -07:00
Aaron Markham
58f7f2b441 doxygen python block added
Summary: Closes https://github.com/caffe2/caffe2/pull/226

Differential Revision: D4793550

Pulled By: JoelMarcey

fbshipit-source-id: cc33e58186304fa8dcac2ee9115dcc271d785b1e
2017-03-29 06:46:16 -07:00
Huazhong Ning
83437853ad refactor and modularize optimizers
Summary:
The current optimizer code in c2/python has the following issues:
(1) the optimizers in sgd.py cannot be configured per param blob;
(2) sgd.py is a bad file name; optimizer.py is a better one;
(3) layer_model_helper.py has another set of optimizer code (which does support per-param-blob configuration).

This diff does the following (a sketch of the per-param-blob configuration idea follows below):
(1) creates optimizer objects, so that we can configure an optimizer per param blob while staying compatible with the existing optimizer code;
(2) makes the new optimizer code much more modular;
(3) moves the optimizer code to a file with a better name (optimizer.py);
(4) replaces the optimizer imports in the existing code.

To do in subsequent diffs:
(1) optimizers with structured parameters for dper2;
(2) get rid of the optimizer code in layer_model_helper.py.
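
A hedged sketch of the per-param-blob configuration idea (class and blob names are illustrative, not the real optimizer.py API): each blob can carry its own optimizer object and falls back to a model-wide default otherwise.

```python
class SgdOptimizer(object):
    def __init__(self, base_learning_rate):
        self.base_learning_rate = base_learning_rate

class AdagradOptimizer(object):
    def __init__(self, alpha, epsilon=1e-4):
        self.alpha, self.epsilon = alpha, epsilon

# Model-wide default plus per-blob overrides, e.g. Adagrad for a sparse
# embedding table while dense layers keep plain SGD.
default_optimizer = SgdOptimizer(base_learning_rate=0.01)
per_blob_optimizer = {
    "embedding_w": AdagradOptimizer(alpha=0.05),
}

def optimizer_for(blob_name):
    return per_blob_optimizer.get(blob_name, default_optimizer)

assert isinstance(optimizer_for("embedding_w"), AdagradOptimizer)
assert isinstance(optimizer_for("fc_w"), SgdOptimizer)
```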

Reviewed By: salexspb

Differential Revision: D4609013

fbshipit-source-id: 2e2d6dfa8685d10498f89069157453d9feca3f27
2017-03-07 18:46:47 -08:00