Commit Graph

19 Commits

Author SHA1 Message Date
Jiyan Yang
00e5afea6a Adding dedup aggregator options to sgd optimizer
Summary: As desc.

Reviewed By: xianjiec

Differential Revision: D5324671

fbshipit-source-id: 27f3a58f618cd5ea11c2ea2e756df3f73635c2c8
2017-07-04 02:10:18 -07:00
Yiming Wu
1fce3eac4e single trainer hybrid device
Summary:
First try at single-trainer hybrid-device training for SparseNN

Comparison results with CPU training:
https://our.intern.facebook.com/intern/fblearner/run/compare/?compare_to[0]=20016969&compare_to[1]=19660293&baseline_run=19660293&all_runs[0]=20016969&all_runs[1]=19660293

Reviewed By: dzhulgakov

Differential Revision: D5205723

fbshipit-source-id: 4a024324ac2efc3248dd470d4c533cf2ecec2e92
2017-06-27 22:06:30 -07:00
Simon Layton
eaacfc7e25 Fix multi-precision SGD outputs
Summary:
salexspb This fixes a major perf issue (40% boost on alexnet end-to-end perf) in the multi-precision SGD optimizer - it was causing repeated cudaMalloc / cudaFree calls during training iterations due to the changing size of the `grad` blob as it moved from fp16 <-> fp32.
Closes https://github.com/caffe2/caffe2/pull/797
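The perf issue described above is allocation churn: a blob that keeps changing size forces a fresh alloc/free every iteration. A pure-Python sketch of the buffer-reuse idea (names hypothetical; this stands in for device memory, not real CUDA code):

```python
class BufferPool:
    """Reuse persistent buffers keyed by size instead of allocating and
    freeing every iteration -- a stand-in for the cudaMalloc/cudaFree churn
    caused by the grad blob flipping between fp16 and fp32 sizes."""
    def __init__(self):
        self.buffers = {}
        self.allocs = 0  # counts real allocations

    def get(self, size):
        if size not in self.buffers:
            self.buffers[size] = bytearray(size)
            self.allocs += 1  # only pays on the first request for this size
        return self.buffers[size]

pool = BufferPool()
pool.get(2)  # fp16-sized grad
pool.get(4)  # fp32-sized grad
pool.get(2)  # reused, no new allocation
```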

Differential Revision: D5246978

Pulled By: salexspb

fbshipit-source-id: ec3d7ef18445e19eaf5aac908d0a7bcd5957eb60
2017-06-14 11:36:43 -07:00
Wael Abdelghani
ebecafbcca Support for position weighted in distributed PS
Summary: Title

Reviewed By: azzolini

Differential Revision: D5081871

fbshipit-source-id: 68a97c2112522fbcbcdfd9e0f717b8bce60fe028
2017-06-05 17:04:42 -07:00
Aapo Kyrola
401908d570 add_weight_decay + restore weight decay to resnet50_trainer
Summary:
Add add_weight_decay to optimizer + test.

In D5142973 I accidentally removed weight decay from resnet50 trainer, so this restores it.
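Weight decay amounts to folding an L2 penalty into each weight's gradient, typically skipping biases. A minimal sketch of the arithmetic with hypothetical names (not the actual optimizer code):

```python
def apply_weight_decay(param, grad, weight_decay):
    # L2 regularization: d/dw (wd/2 * w^2) = wd * w, added to the gradient
    return grad + weight_decay * param

def decayed_grads(params, grads, weight_decay, skip=()):
    """Apply decay to every param except those in `skip` (e.g. biases)."""
    return {name: (g if name in skip
                   else apply_weight_decay(params[name], g, weight_decay))
            for name, g in grads.items()}

g = decayed_grads({"w": 2.0, "b": 1.0}, {"w": 0.1, "b": 0.1},
                  weight_decay=0.01, skip=("b",))
```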

Reviewed By: asaadaldien

Differential Revision: D5173594

fbshipit-source-id: c736d8955eddff151632ae6be11afde0883f7531
2017-06-02 14:16:56 -07:00
Simon Layton
58874ad5bf Fp16 training initializers
Summary:
Re-open for re-importing :)
Closes https://github.com/caffe2/caffe2/pull/721

Differential Revision: D5164345

Pulled By: akyrola

fbshipit-source-id: e80b32556cd25610602df91a4225b93edc0ca40b
2017-06-01 08:34:46 -07:00
Aapo Kyrola
ffbba0fae7 add model_helper Validate() + sprinkle around
Summary:
A recent diff introduced a duplicate parameter to the model, which would hurt performance and also affect correctness (duplicate momentum updates, for example). We unfortunately had no checks for duplicate params outside of data_parallel_model, which fortunately brought this to our attention.

But it is better to have a Validate() function in model_helper and call it before adding gradient ops and querying for parameters. Added to brew_test calls as well.
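The duplicate check itself is simple; a pure-Python sketch of what such a Validate() pass might do (illustrative, not the actual model_helper code):

```python
def validate(params):
    """Raise if the model holds the same parameter more than once --
    duplicates lead to duplicate gradient and momentum updates."""
    seen, dupes = set(), []
    for p in params:
        if p in seen:
            dupes.append(p)
        seen.add(p)
    if dupes:
        raise ValueError("duplicate parameters in model: %s" % dupes)
    return True
```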

Reviewed By: kennyhorror

Differential Revision: D5163458

fbshipit-source-id: 35692e8bfcc359d4e8bc73e6f2358659f6e45ceb
2017-06-01 02:36:47 -07:00
Aapo Kyrola
0f8c8f37a8 Revert D5159712: [caffe2][PR] Fp16 training initializers
Summary: This reverts commit 60a889494d2e2f4df1d720331e19f638c5eb95cc

Differential Revision: D5159712

fbshipit-source-id: 16040c911b260648857f656f92b165f92c2daae0
2017-06-01 00:17:14 -07:00
Simon Layton
2bfacff426 Fp16 training initializers
Summary:
Adds support for generating and training fp16 models. Adds an SGD optimizer for multi-precision trainers and a new callback in data_parallel_model to help multi-precision models keep their different copies of parameters in sync during training.
Closes https://github.com/caffe2/caffe2/pull/697
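The core idea of multi-precision training: keep an fp32 master copy for the update, then write rounded fp16 copies back for the forward pass. A stdlib-only sketch (struct's 'e' format emulates IEEE half; names are illustrative, not the real callback API):

```python
import struct

def to_half(x):
    """Round a Python float to fp16 precision and back (IEEE half, 'e')."""
    return struct.unpack('<e', struct.pack('<e', x))[0]

def mixed_precision_step(master_fp32, grads, lr):
    """Update the fp32 master copy, then refresh the fp16 copy used by the
    forward pass -- the kind of per-step sync the new callback performs."""
    master_fp32 = [w - lr * g for w, g in zip(master_fp32, grads)]
    fp16_copy = [to_half(w) for w in master_fp32]
    return master_fp32, fp16_copy

master, half = mixed_precision_step([1.0], [0.5], lr=0.1)
```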

Differential Revision: D5159712

Pulled By: salexspb

fbshipit-source-id: 60a889494d2e2f4df1d720331e19f638c5eb95cc
2017-05-31 17:46:58 -07:00
Aapo Kyrola
ce7ce46ca1 fix secondary device check by gradient, if it is sparse
Summary: Fix an issue where, when a parameter is not created in param_init_net or net, we fall back to checking which device's op outputs the gradient. This did not work if the gradient was a GradientSlice.

Reviewed By: harouwu

Differential Revision: D5153102

fbshipit-source-id: 20eae660ea32e5a9ea484bf93c04c8f8c71a51ed
2017-05-30 20:47:17 -07:00
Aapo Kyrola
cdb50fbf2b add optimizer support to data_parallel_model; Use MomentumSGDUpdate
Summary:
This diff does two things:
- adds support for optimizers to data_parallel_model. The user can supply optimizer_builder_fun instead of param_update_builder_fun. The latter is called for each GPU separately, with the proper namescope and devicescope, while the optimizer builder is called only once and adds optimizers to the whole model.

- uses MomentumSGDUpdate instead of MomentumSGD + WeightedSum. This brings major perf benefits.

Changes resnet50 trainer to use optimizer.

This relies on D5133652
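The perf win comes from fusing the momentum update and the parameter update into a single op, rather than running MomentumSGD followed by WeightedSum. The arithmetic of the fused update, as a plain-Python sketch:

```python
def momentum_sgd_update(param, grad, moment, lr, momentum=0.9):
    """One fused pass: refresh the momentum buffer and apply it to the
    parameter -- two ops' worth of work in a single step."""
    new_moment = [momentum * m + lr * g for m, g in zip(moment, grad)]
    new_param = [p - nm for p, nm in zip(param, new_moment)]
    return new_param, new_moment

p, m = momentum_sgd_update([1.0, 2.0], [0.5, 0.5], [0.0, 0.0], lr=0.1)
```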

Reviewed By: dzhulgakov

Differential Revision: D5142973

fbshipit-source-id: 98e1114f5fae6c657314b3296841ae2dad0dc0e2
2017-05-30 12:49:57 -07:00
Aapo Kyrola
44257ea5ed automatically infer device scope for param
Summary:
hankun is using the optimizer, but has a mixed set of GPU and CPU operators. Currently this won't work with the optimizer, since it adds optimizers for all parameters in the current device scope. But we can actually infer the device a param belongs to by looking at the device option in param_init_net.

Added a test as well.
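The inference boils down to: scan the init net for the op that outputs the parameter and take that op's device option. A pure-Python sketch with a simplified op representation (plain dicts, not real Caffe2 OperatorDef protos):

```python
def infer_param_device(param_name, init_ops, default_device="CPU"):
    """Return the device of the op that created `param_name`, mirroring
    the idea of reading device_option from param_init_net."""
    for op in init_ops:
        if param_name in op["outputs"]:
            return op.get("device", default_device)
    return default_device

init_ops = [
    {"outputs": ["fc_w", "fc_b"], "device": "GPU:0"},
    {"outputs": ["emb_w"]},  # no device option set -> default
]
```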

Reviewed By: salexspb

Differential Revision: D5133652

fbshipit-source-id: ad8689d75ac1f5c78981bae1b6978fe91e40ef0f
2017-05-30 12:02:19 -07:00
Alexander Sidorov
016f72537a ModelHelper.create_param, Initializer abstraction and ParameterInfo for optimizers
Summary:
This is going to unblock Nvidia in their work on adding fp16
support to Caffe2. I discussed this with kennyhorror before to make
sure this fits into his work on parameter sharing.
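An Initializer abstraction lets callers swap the dtype and fill logic without touching the model code, which is what makes fp16 parameters pluggable. A minimal hypothetical sketch (not the real create_param signature):

```python
class ConstantInitializer:
    """Carries both the fill value and the dtype, so an fp16 variant can
    be dropped in without changing the create_param call site."""
    def __init__(self, value=0.0, dtype="float32"):
        self.value, self.dtype = value, dtype

    def create(self, shape):
        n = 1
        for d in shape:
            n *= d
        return [self.value] * n

def create_param(name, shape, initializer):
    # hypothetical create_param: records metadata, materializes the blob
    return {"name": name, "dtype": initializer.dtype,
            "data": initializer.create(shape)}

p = create_param("fc_w", (2, 3), ConstantInitializer(1.0, dtype="float16"))
```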

Reviewed By: kennyhorror

Differential Revision: D5127797

fbshipit-source-id: 4db155d320b1862570c23b77c4252bdacbf2296f
2017-05-25 22:03:15 -07:00
Yiming Wu
0aeffa985e make sure mutex is on CPU too
Summary: Mutex is only supported on CPU, so we need to make sure the mutex and the following atomicIter are both on CPU. This is critical for GPU SparseNN training.
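In host-side Python the constraint reduces to "keep the counter and its lock on the CPU". A threading-based sketch of a mutex-guarded iteration counter (illustrative, not the CreateMutex/AtomicIter ops themselves):

```python
import threading

class CpuIterationCounter:
    """A mutex plus counter living on the host, mirroring the CPU-only
    mutex op and the AtomicIter that follows it."""
    def __init__(self):
        self._mutex = threading.Lock()
        self.iteration = 0

    def atomic_iter(self):
        with self._mutex:
            self.iteration += 1
            return self.iteration

counter = CpuIterationCounter()
counter.atomic_iter()
```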

Differential Revision: D5093184

fbshipit-source-id: 021e6ba699a3208449fa4761cad6b0ec4544957e
2017-05-19 12:17:17 -07:00
Xiaolong Wang
add840510f Refactor Optimizer to Allow scale_learning_rate
Summary:
In transfer learning, parameters initialized from a pretrained model might require
a different learning rate than freshly initialized ones. To this end, we
implement a Python solution where `base_learning_rate` is scaled by `scale`,
which is in turn set by `scale_learning_rate`. Alternatively, the same effect
could be achieved by rewriting the LearningRate operator in C++.
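The mechanism is a per-parameter multiplier on the base rate. A sketch following the names in the summary (the real operator plumbing differs):

```python
def effective_lr(base_learning_rate, scale=1.0):
    # `scale` is what scale_learning_rate sets for a given parameter
    return base_learning_rate * scale

pretrained_lr = effective_lr(0.01, scale=0.1)  # fine-tune pretrained layers slowly
fresh_lr = effective_lr(0.01)                  # freshly initialized layers, full rate
```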

Reviewed By: kennyhorror

Differential Revision: D4992827

fbshipit-source-id: 8d7e87a61c95b3eb8ef733ec436f4060e865c0ac
2017-05-09 13:16:21 -07:00
Alisson Gusatti Azzolini
20d8de8d51 Parameter cost estimation job
Summary:
Adds a parameter cost estimation step before the actual training starts. The costs are later used in order to better shard the parameters across instances of the parameter server.

Things I needed to modify:
- A few changes to make ModelLayerHelper picklable
- Add support for stopping a distributed job after a number of stats reporting steps.
- Refactored run_dist_job to support collocating the reader with the trainer even when PS are present.
- Option to disable dense updates (when num_dense_servers=0).

Currently there's a huge overhead posed by having to launch a child workflow. I'll try to address this in a subsequent diff.

This is WIP because the other workflows need to be migrated as well.

I can break this down into smaller diffs if reviewers would prefer it.
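Once per-parameter costs are estimated, placing them across PS instances is a load-balancing problem. A greedy pure-Python sketch of cost-aware sharding (illustrative only; not the actual sharding code):

```python
def shard_by_cost(param_costs, num_shards):
    """Assign each parameter (heaviest first) to the currently lightest
    shard, so estimated costs balance across parameter-server instances."""
    shards = [[] for _ in range(num_shards)]
    loads = [0.0] * num_shards
    for name, cost in sorted(param_costs.items(), key=lambda kv: -kv[1]):
        i = loads.index(min(loads))  # lightest shard so far
        shards[i].append(name)
        loads[i] += cost
    return shards
```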

Reviewed By: kennyhorror

Differential Revision: D4974752

fbshipit-source-id: 04c336acb2945f8f11324a221ffc6967818c0672
2017-05-09 13:02:24 -07:00
Bor-Yiing Su
7270471ed6 Returns auxiliary parameters in the optimizers.
Summary:
1. Adds a function that returns the auxiliary parameters of each optimizer. This function can be used to serialize the optimizers so that they can be recovered.
2. Fixes a bug where the iteration blob was not incremented by one per iteration: with k parameters using the Adam learning-rate optimizer, the original implementation incremented the iteration blob by k.
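The bug and the fix, side by side as a pure-Python sketch: each Adam parameter's update used to bump the shared counter, so k parameters advanced it by k per step instead of 1.

```python
def run_step_buggy(iteration, params):
    # original behavior: every param's optimizer increments the counter
    for _ in params:
        iteration += 1
    return iteration

def run_step_fixed(iteration, params):
    # fixed behavior: one increment per training iteration, regardless of k
    return iteration + 1
```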

Reviewed By: azzolini

Differential Revision: D4872397

fbshipit-source-id: d86711feedda2ba83af5f2a18141b06a6a473733
2017-04-17 10:16:32 -07:00
Aaron Markham
58f7f2b441 doxygen python block added
Summary: Closes https://github.com/caffe2/caffe2/pull/226

Differential Revision: D4793550

Pulled By: JoelMarcey

fbshipit-source-id: cc33e58186304fa8dcac2ee9115dcc271d785b1e
2017-03-29 06:46:16 -07:00
Huazhong Ning
83437853ad refactor and modulize optimizers
Summary:
The current optimizer code in c2/python has the following issues:
(1) the optimizers in sgd.py cannot be configured per param-blob;
(2) sgd.py is a bad file name; optimizer.py is a better name;
(3) layer_model_helper.py has another set of optimizer code (which does support per-param-blob optimizers).

This diff does the following:
(1) creates optimizer objects, so that we can configure a per-param-blob optimizer, while staying compatible with the existing optimizer code;
(2) makes the new optimizer code much more modularized;
(3) moves the optimizer code to a file with a better name (optimizer.py);
(4) replaces the optimizer imports in the existing code.

Will do in subsequent diffs:
(1) optimizers with structured parameters for dper2;
(2) get rid of the optimizer code in layer_model_helper.py.
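The refactor's key shape: optimizers become objects that can be attached per parameter blob, rather than one global update policy. A minimal hypothetical sketch:

```python
class SgdOptimizer:
    """One optimizer instance per configuration; different blobs can hold
    different instances -- the per-param-blob config the diff enables."""
    def __init__(self, lr):
        self.lr = lr

    def apply(self, param, grad):
        return param - self.lr * grad

# per-blob configuration instead of a single global optimizer
per_param = {"dense_w": SgdOptimizer(0.01), "sparse_emb": SgdOptimizer(0.1)}
```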

Reviewed By: salexspb

Differential Revision: D4609013

fbshipit-source-id: 2e2d6dfa8685d10498f89069157453d9feca3f27
2017-03-07 18:46:47 -08:00