Commit Graph

84 Commits

Author SHA1 Message Date
Aaron Gokaslan
8fce9a09cd [BE]: pyupgrade Python to 3.8 - imports and object inheritance only (#94308)
Apply parts of pyupgrade to torch (starting with the safest changes).
This PR does only two things: it removes redundant inheritance from `object` and removes unused `__future__` imports.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94308
Approved by: https://github.com/ezyang, https://github.com/albanD
2023-02-07 21:10:56 +00:00
Rahul Arunapuram Gokul
6eaf96961d [codemod] fix tautological imports
Test Plan: waitforsandcastle

Reviewed By: koronthaly

Differential Revision: D27310963

fbshipit-source-id: 9ca0a6468e00d481b1583ab98578dc70f80bb3bf
2021-03-27 01:15:57 -07:00
Bugra Akyildiz
27c7158166 Remove __future__ imports for legacy Python2 supports (#45033)
Summary:
The `2to3` tool has a `future` fixer that removes these imports; the `caffe2` directory has the most redundant ones:

```2to3 -f future -w caffe2```
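For reference, a minimal sketch of what the `future` fixer deletes from a module (hypothetical file contents):

```python
# The four lines below are legacy Python 2 compatibility imports; running
# `2to3 -f future -w <file>` removes them and leaves the rest of the module intact.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals

import numpy as np  # regular imports like this one are untouched
```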

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45033

Reviewed By: seemethere

Differential Revision: D23808648

Pulled By: bugra

fbshipit-source-id: 38971900f0fe43ab44a9168e57f2307580d36a38
2020-09-23 17:57:02 -07:00
Xiang Gao
20ac736200 Remove py2 compatible future imports (#44735)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44735

Reviewed By: mruberry

Differential Revision: D23731306

Pulled By: ezyang

fbshipit-source-id: 0ba009a99e475ddbe22981be8ac636f8a1c8b02f
2020-09-16 12:55:57 -07:00
Stanislau Hlebik
b774ce54f8 remediation of S205607
fbshipit-source-id: 798decc90db4f13770e97cdce3c0df7d5421b2a3
2020-07-17 17:19:47 -07:00
Stanislau Hlebik
8fdea489af remediation of S205607
fbshipit-source-id: 5113fe0c527595e4227ff827253b7414abbdf7ac
2020-07-17 17:17:03 -07:00
Jeff Daily
4a2c642e1f fix ROCm bench CI by increasing first iter timeout (#37633)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/37633

Differential Revision: D21395519

Pulled By: ngimel

fbshipit-source-id: 03b31417dde0758db6c189c21b6cb5771c776115
2020-05-04 20:49:32 -07:00
Brian Wignall
f326045b37 Fix typos, via a Levenshtein-type corrector (#31523)
Summary:
Should be non-semantic.

Uses https://en.wikipedia.org/wiki/Wikipedia:Lists_of_common_misspellings/For_machines to find likely typos, with https://github.com/bwignall/typochecker to help automate the checking.

Uses an updated version of the tool used in https://github.com/pytorch/pytorch/pull/30606 .
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31523

Differential Revision: D19216749

Pulled By: mrshenli

fbshipit-source-id: 7fd489cb9a77cd7e4950c1046f925d57524960ea
2020-01-17 16:03:19 -08:00
Xiaodong Wang
36b73d5a1b Hipify contrib/nccl (#29385)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29385

hipify contrib/gloo

Test Plan: OSS & sandcastle build

Reviewed By: bddppq

Differential Revision: D18373308

fbshipit-source-id: 39c232db36318af116c341f64d03642639575ecd
2019-11-08 10:39:17 -08:00
Abhinav Jauhri
31aefd9b09 Adding models to jenkins benchmark script (#21010)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21010

Adding to and editing the Jenkins benchmark script to accommodate both the ResNeXt and ShuffleNet models.

Reviewed By: bddppq

Differential Revision: D15515354

fbshipit-source-id: 2a92c272b0b74ed3ecc78af6544a06337c7753cf
2019-05-30 15:17:40 -07:00
Abhinav Jauhri
f93e0619f3 Adding ShufflenetV2 to caffe2's benchmark suite. (#20180)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20180

Adding ShuffleNetV2 (Ma et al., 2018) to Caffe2's benchmark suite.

To run, use: `buck run mode/opt caffe2/caffe2/python/examples:imagenet_trainer -- --train_data null --batch_size 128 --epoch_size 3200 --num_epochs 2 --num_gpus 2 --model shufflenet`

Reviewed By: bddppq, xw285cornell

Differential Revision: D15094282

fbshipit-source-id: 0e1ce9c5975868e917b0f179e2c5b15647a76b4e
2019-05-23 20:40:17 -07:00
Junjie Bai
2ad5dcbbe4 Make timeout in resnet50_trainer configurable (#17058)
Summary:
xw285cornell petrex dagamayank
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17058

Differential Revision: D14068458

Pulled By: bddppq

fbshipit-source-id: 15df4007859067a22df4c6c407df4121e19aaf97
2019-02-13 17:03:48 -08:00
Junjie Bai
f169f398d0 Change the default image size from 227 to 224 in resnet50 trainer (#16924)
Summary:
cc xw285cornell
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16924

Differential Revision: D14018509

Pulled By: bddppq

fbshipit-source-id: fdbc9e94816ce6e4b1ca6f7261007bda7b80e1e5
2019-02-09 11:18:58 -08:00
rohithkrn
ddeaa541aa fix typo in resnet50_trainer.py
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16219

Differential Revision: D13776742

Pulled By: bddppq

fbshipit-source-id: 10a6ab4c58159b3f619b739074f773662722c1d9
2019-01-22 17:28:04 -08:00
Shane Li
620ff25bdb Enhance cpu support on gloo based multi-nodes mode. (#11330)
Summary:
1. Add some Gloo communication operators to the related fallback list;
2. Work around compile errors when using a fallback operator whose CPU operator inherits directly from 'OperatorBase', like PrefetchOperator;
3. Add CPU context support to some Python module files and the resnet50 training example file.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11330

Reviewed By: yinghai

Differential Revision: D13624519

Pulled By: wesolwsk

fbshipit-source-id: ce39d57ddb8cd7786db2e873bfe954069d972f4f
2019-01-15 11:47:10 -08:00
Orion Reblitz-Richardson
febc7ff99f Add __init__.py so files get picked up on install (#14898)
Summary:
This will let us install tests and other Caffe2 python code as a part of running Caffe2 tests in PyTorch.

Broken out of https://github.com/pytorch/pytorch/pull/13733/

cc pjh5 yf225
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14898

Reviewed By: pjh5

Differential Revision: D13381123

Pulled By: orionr

fbshipit-source-id: 0ec96629b0570f6cc2abb1d1d6fce084e7464dbe
2018-12-07 13:40:23 -08:00
rohithkrn
0d663cec30 Unify cuda and hip device types in Caffe2 python front end (#14221)
Summary:
The goal of this PR is to unify the cuda and hip device types in the Caffe2 Python front end.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14221

Differential Revision: D13148564

Pulled By: bddppq

fbshipit-source-id: ef9bd2c7d238200165f217097ac5727e686d887b
2018-11-29 14:00:16 -08:00
Xiaodong Wang
eb7a298489 Add resnext model to OSS (#11468)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11468

Add the resnext model to the OSS Caffe2 repo.

Reviewed By: orionr, kuttas

Differential Revision: D9506000

fbshipit-source-id: 236005d5d7dbeb8c2864014b1eea03810618d8e8
2018-09-12 15:59:20 -07:00
Orion Reblitz-Richardson
6223bfdb1d Update from Facebook (#6692)
* [GanH][Easy]: Add assertion to adaptive weighting layer

A 0 weight causes numeric instability and exploding NE.

* [Easy] Add cast op before computing norm in diagnose options

As LpNorm only takes floats, we add a manual cast here.

* Introduce a new caching device allocator

`cudaMalloc` and `cudaFree` calls are slow, and become slower the
more GPUs there are. Essentially, they grab a host-wide (not device-wide) lock
because GPU memory is transparently shared across all GPUs. Normally, this
isn't much of a concern since workloads allocate memory upfront, and reuse it
during later computation.

However, under some computation models (specifically, memory conserving
approaches like checkpoint-and-recompute, see
https://medium.com/@yaroslavvb/fitting-larger-networks-into-memory-583e3c758ff9)
this assumption is no longer true. In these situations, `cudaMalloc` and
`cudaFree` are common and frequent. Furthermore, in data parallel contexts,
these calls happen at nearly the same time from all GPUs worsening lock
contention.

A common solution to this problem is to add a custom allocator. In fact,
NVIDIA provides one out of the box: CUB, which Caffe2 already supports.
Unfortunately, the CUB allocator suffers from very high fragmentation. This is
primarily because it is a "buddy" allocator which neither splits nor merges
free cached blocks. Study
https://github.com/NVlabs/cub/blob/1.8.0/cub/util_allocator.cuh#L357 if you
want to convince yourself.

This diff adapts a caching allocator from the Torch codebase
https://github.com/torch/cutorch/blob/master/lib/THC/THCCachingAllocator.cpp
which does splitting and merging and ends up working really well, at least for
workloads like the checkpoint-and-recompute computation models noted above.

I simplified the implementation a little bit and made it a bit more C++-like. I
also removed a bunch of stream synchronization primitives for this diff. I
plan to add them back in subsequent diffs. (A conceptual sketch of the
split-and-merge idea appears after this change list.)

* Report reader progress in fblearner workflows

Integrate with fblearner progress reporting API and add support to report training progress from reader nodes.
If the reader is constructed with batch limits, report based on finished batches vs. total batches. The finished count may exceed the total because we evaluate whether to stop processing every time we dequeue a split.
If the reader has no limit, report based on finished splits (Hive files) vs. total splits. This is fairly accurate.

* [GanH][Diagnose]: fix plotting

1. ganh diagnose needs to set plot options
2. the modifier's blob name is used for the metric field and needs to be fixed before generating the net

* Automatic update of fbcode/onnx to 985af3f5a0f7e7d29bc0ee6b13047e7ead9c90c8

* Make CompositeReader stops as soon as one reader finishes

Previously, CompositeReader called all readers before stopping. This resulted in a flaky test, since the last batch may be read by different threads, resulting in dropped data.

* [dper] make sure loss is not nan

as desc.

* [rosetta2] [mobile-vision] Option to export NHWC order for RoIWarp/RoIAlign

Thanks for finding this @stzpz and @wangyanghan. Looks like NHWC is more
optimized. For OCR it doesn't help yet, since NHWC uses more memory bandwidth,
but it will soon become important.

* Intra-op parallel FC operator

Intra-op parallel FC operator

* [C2 Proto] extra info in device option

passing extra information in device option

design doc: https://fb.quip.com/yAiuAXkRXZGx

* Unregister MKL fallbacks for NCHW conversions

* Tracing for more executors

Modified Tracer to work with other executors and add more tracing

* Remove ShiftActivationDevices()

* Check for blob entry iff it is present

When processing the placeholder ops, ignore the entry if the blob is not present in blob_to_device.

* Internalize use of eigen tensor

Move use of eigen tensor out of the header file so we don't get template partial specialization errors when building other libraries.

* feature importance for transformed features.

* Fix unused parameter warnings

The changes in this diff comment out unused parameters.
This will allow us to enable -Wunused-parameter as error.

#accept2ship

* add opencv dependencies to caffe2

The video input op requires additional OpenCV packages. This adds them to
CMake so that it can build.

* Add clip_by_value option in gradient clipping

Add clip_by_value option in gradient clipping

when a value is greater than max or less than min, clip it to the bound

* std::round compat
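As referenced above, a rough conceptual sketch in Python of the split-and-merge caching idea (the real allocator is C++ and also tracks CUDA streams; all names here are illustrative):

```python
# Conceptual sketch only: a caching allocator that splits oversized free blocks
# and merges adjacent ones on free, instead of the buddy scheme used by CUB.
class CachingAllocator:
    def __init__(self):
        self.free_blocks = []   # list of (offset, size), kept sorted by offset
        self.next_offset = 0    # grows only when no cached block fits (cudaMalloc in reality)

    def malloc(self, size):
        for i, (off, blk_size) in enumerate(self.free_blocks):
            if blk_size >= size:
                del self.free_blocks[i]
                if blk_size > size:
                    # split: keep the remainder cached instead of wasting it
                    self.free_blocks.append((off + size, blk_size - size))
                    self.free_blocks.sort()
                return off
        off = self.next_offset
        self.next_offset += size
        return off

    def free(self, offset, size):
        # return the block to the cache and merge with adjacent free blocks,
        # which is what keeps fragmentation low for malloc/free-heavy workloads
        self.free_blocks.append((offset, size))
        self.free_blocks.sort()
        merged = []
        for off, sz in self.free_blocks:
            if merged and merged[-1][0] + merged[-1][1] == off:
                merged[-1] = (merged[-1][0], merged[-1][1] + sz)
            else:
                merged.append((off, sz))
        self.free_blocks = merged
```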
2018-04-17 23:36:40 -07:00
Orion Reblitz-Richardson
1d5780d42c Remove Apache headers from source.
* LICENSE file contains details, so removing from individual source files.
2018-03-27 13:10:18 -07:00
Kutta Srinivasan
0ee53bf7fe Fix one more naming issue in resnet50_trainer.py for PR 2205 2018-03-09 13:51:42 -08:00
Kutta Srinivasan
ed05ca9fec Clean up naming of FP16-related code, add comments 2018-03-09 13:51:42 -08:00
Lukasz Wesolowski
29a4c942fe Add support for multi-device batch normalization through an option to data_parallel_model
Summary: Stage 3 in a stack of diffs for supporting multi-device batch normalization. Adds an input parameter to data_parallel_model to enable multi-device batch normalization. Depends on D6699258.

Reviewed By: pietern

Differential Revision: D6700387

fbshipit-source-id: 24ed62915483fa4da9b1760eec0c1ab9a64b94f8
2018-01-24 13:24:06 -08:00
Anirban Roychowdhury
158e001238 Checking for positive epoch size before running epoch
Summary: Checking for positive epoch size before running epoch

Reviewed By: pietern

Differential Revision: D6738966

fbshipit-source-id: 64e1fb461d784786b20a316999e4c037787f3a14
2018-01-18 11:48:35 -08:00
Aapo Kyrola
2caca70a37 Allow shifting of activations / ops to other GPUs in data parallel model
Summary:
(Work in progress.) This diff will allow shifting activations to other GPUs in case the model does not fit into memory. To see the API, check the code in data_parallel_model_test, which tests shifting two activations from GPUs 0 and 1 to GPU 4, and from GPUs 2 and 3 to GPU 5.

I will need to test further on ResNets, and probably add copy operations to handle device change points.

Reviewed By: asaadaldien

Differential Revision: D5591674

fbshipit-source-id: eb12d23651a56d64fa4db91090c6474218705270
2017-11-29 21:17:00 -08:00
Aapo Kyrola
b71cebb11f Fix LoadModel() in resnet50_trainer
Summary:
The resnet50 trainer saves the 'optimizer_iteration' blob in checkpoints, but loads it in GPU context. This fails because AtomicIter/Iter expect the blob to be in CPU context. So manually reset optimizer_iteration in CPU context.

I am thinking of making the iter operators do this switch automatically, but in the meantime this unbreaks the trainer.
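A minimal sketch of that manual reset, assuming the Caffe2 workspace/device-scope API; `start_iter` is a placeholder for the iteration restored from the checkpoint:

```python
# Sketch: force the iteration counter into CPU context after loading a
# checkpoint, since AtomicIter/Iter expect a CPU blob.
import numpy as np
from caffe2.proto import caffe2_pb2
from caffe2.python import core, workspace

start_iter = 0  # placeholder: iteration count restored from the checkpoint
with core.DeviceScope(core.DeviceOption(caffe2_pb2.CPU)):
    workspace.FeedBlob("optimizer_iteration", np.array([start_iter], dtype=np.int64))
```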

Reviewed By: sf-wind

Differential Revision: D6232626

fbshipit-source-id: da7c183a87803e008f94c86b6574b879c3b76438
2017-11-03 11:15:25 -07:00
Aapo Kyrola
b5c053b1c4 fix fp16 issues with resnet trainer
Summary:
My commit bab5bc broke things with fp16 compute, as I had tested it only with the null input, which actually produced fp32 data (even though dtype was given as float16). Also, I had confused the concepts of "float16 compute" and fp16 data. Issue #1408.

This fixes those issues, tested with both Volta and M40 GPUs. It basically restores much of the previous code and fixes the null input to do FloatToHalf.
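A sketch of the null-input idea, assuming a CPU-side FloatToHalf conversion (blob names and shapes are illustrative):

```python
# Sketch: fill the synthetic input as fp32, then convert it to real fp16 data
# with the FloatToHalf operator so downstream fp16 compute sees half-precision.
import numpy as np
from caffe2.python import core, workspace

workspace.FeedBlob("data_fp32", np.random.rand(32, 3, 224, 224).astype(np.float32))
workspace.RunOperatorOnce(core.CreateOperator("FloatToHalf", ["data_fp32"], ["data"]))
```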

Reviewed By: pietern

Differential Revision: D6211849

fbshipit-source-id: 5b41cffdd605f61a438a4c34c56972ede9eee28e
2017-11-01 13:30:08 -07:00
Aapo Kyrola
669ec0ccba Added FP16 compute support to FC Op
Summary: Allow the GEMMs in the FC/FCGradient Op to do FP16 compute instead of FP32 if the appropriate op flag is set.

Reviewed By: asaadaldien

Differential Revision: D5839777

fbshipit-source-id: 8051daedadf72bf56c298c1cf830b019b7019f43
2017-10-30 17:03:51 -07:00
Aapo Kyrola
1b71bf1d36 Updated resnet50_trainer and resnet for more FP16 support
Summary: Added FP16SgdOptimizer to resnet50_trainer

Reviewed By: wesolwsk

Differential Revision: D5841408

fbshipit-source-id: 3c8c0709fcd115377c13ee58d5bb35f1f83a7105
2017-10-24 09:19:06 -07:00
Luke Yeager
ee143d31ef Fix ImageInput op in resnet50_trainer.py
Summary:
Fix #1269 (from fa0fcd4053dd42a4ec3a2a12085662179f0e11df).
Closes https://github.com/caffe2/caffe2/pull/1314

Reviewed By: bwasti

Differential Revision: D6021171

Pulled By: bddppq

fbshipit-source-id: 7d7c45f8b997c25f34530f826729d700a9c522d4
2017-10-10 11:20:52 -07:00
Junjie Bai
d894a6362f Add missing is_test argument in ImageInput ops
Summary: reported in Github Issue https://github.com/caffe2/caffe2/issues/1269
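A sketch of the fix, assuming the `brew.image_input` helper and an LMDB reader (the DB path and batch size are placeholders):

```python
# Sketch: ImageInput now requires is_test to be passed explicitly.
from caffe2.python import brew, model_helper

model = model_helper.ModelHelper(name="imagenet_test", arg_scope={"order": "NCHW"})
reader = model.CreateDB("reader", db="/path/to/imagenet_lmdb", db_type="lmdb")  # placeholder path
data, label = brew.image_input(
    model, reader, ["data", "label"],
    batch_size=32,
    is_test=True,  # the argument this commit adds to the example scripts
)
```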

Reviewed By: salexspb

Differential Revision: D6004461

fbshipit-source-id: 03f4bccfe085010b30109ab7b6fe7325caa160ef
2017-10-10 10:03:13 -07:00
Yangqing Jia
8286ce1e3a Re-license to Apache
Summary: Closes https://github.com/caffe2/caffe2/pull/1260

Differential Revision: D5906739

Pulled By: Yangqing

fbshipit-source-id: e482ba9ba60b5337d9165f28f7ec68d4518a0902
2017-09-28 16:22:00 -07:00
Aapo Kyrola
9ec981b866 for CPU-data parallel, allow sharing model
Summary: On CPU, there is no need to replicate parameters, so try using only one copy (cpu_0). Made resnet50_trainer use the shared model in CPU mode.

Reviewed By: wesolwsk

Differential Revision: D5812181

fbshipit-source-id: 93254733edbc4a62bd74a629a68f5fa23f7e96ea
2017-09-15 16:19:37 -07:00
Pieter Noordhuis
27dde63358 Allow run of example resnet50_trainer without training data
Summary:
This is useful for pure throughput tests where
we don't care about training a real model.

Reviewed By: akyrola

Differential Revision: D5834293

fbshipit-source-id: dab528c9269fb713e6f6b42457966219c06e0a35
2017-09-15 09:45:11 -07:00
Aapo Kyrola
1e37145872 Resnet50 should run param init net before creating test net
Summary: Otherwise weights and biases are not created, and test net creation fails

Reviewed By: gsethi523

Differential Revision: D5836438

fbshipit-source-id: 32a75313b6b9ebecbfaa43ebd39f19c8eaba8cd1
2017-09-14 16:06:01 -07:00
Pieter Noordhuis
d43ab4bec5 Create Gloo common world through MPI rendezvous
Summary:
Before this change there were two ways for machines to rendezvous for a
distributed run: shared file system or Redis. If you're using an MPI
cluster it is much more convenient to simply execute mpirun and expect
the "right thing (tm)" to happen. This change adds the "mpi_rendezvous"
option to the CreateCommonWorld operator. If this is set, the common
world size and rank will be pulled from the MPI context and Gloo
rendezvous takes place using MPI. Note that this does NOT mean the MPI
BTL is used; MPI is only used for rendezvous.
Closes https://github.com/caffe2/caffe2/pull/1190

Reviewed By: akyrola

Differential Revision: D5796060

Pulled By: pietern

fbshipit-source-id: f8276908d3f3afef2ac88594ad377e38c17d0226
2017-09-08 17:18:47 -07:00
Pieter Noordhuis
b8eb8ced7d Add transport/interface arguments to CreateCommonWorld operator
Summary:
These arguments control which Gloo transport (TCP or IB) and which
network interface is used for the common world. If not specified, it
defaults to using TCP and the network interface for the IP that the
machine's hostname resolves to.

The valid values for the transport argument are "tcp" and "ibverbs".
For ibverbs to work, Gloo must have been compiled with ibverbs
support. If Gloo is built as part of Caffe2 (sourced from the
third_party directory), then you can pass -DUSE_IBVERBS=ON to CMake to
enable ibverbs support in Gloo.
Closes https://github.com/caffe2/caffe2/pull/1177
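A sketch of how these arguments might be passed when building the operator (the store-handler input and size/rank values are placeholders):

```python
# Sketch: choose the Gloo transport and NIC when building the common world.
from caffe2.python import core

create_cw = core.CreateOperator(
    "CreateCommonWorld",
    ["store_handler"],       # placeholder: a previously created store handler blob
    ["comm_world"],
    size=2,                  # placeholder: total number of participants
    rank=0,                  # placeholder: this machine's rank
    transport="ibverbs",     # "tcp" (default) or "ibverbs"; ibverbs needs -DUSE_IBVERBS=ON in Gloo
    interface="eth0",        # NIC to use; defaults to the interface for the hostname's IP
)
```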

Reviewed By: akyrola

Differential Revision: D5789729

Pulled By: pietern

fbshipit-source-id: 0dea1a115c729e54c5c1f9fdd5fb29c14a834a82
2017-09-08 10:57:41 -07:00
Luke Yeager
f7ece79949 Add fp16 and tensorcore support to resnet50_trainer
Summary:
Use like `--dtype=float16 --enable-tensor-core`
Closes https://github.com/caffe2/caffe2/pull/1093

Differential Revision: D5634840

Pulled By: harouwu

fbshipit-source-id: 18c1e70236ba5ef8661ff55fb524caae1be19310
2017-08-17 15:16:24 -07:00
Luke Yeager
f92fdd850d Important typo in resnet50_trainer
Summary: Closes https://github.com/caffe2/caffe2/pull/1092

Reviewed By: Yangqing

Differential Revision: D5637489

Pulled By: harouwu

fbshipit-source-id: 13609a3e14a45e640849268821fd8565fd7aae4d
2017-08-15 19:03:15 -07:00
Pieter Noordhuis
d177846dbf Add prefix argument to FileStoreHandler
Summary:
This brings it up to par with how the RedisStoreHandler
works. The store handler configuration does not have to change and
only the run ID parameter changes across runs.

This was inconsistent and came up in https://github.com/caffe2/caffe2/issues/984.

Reviewed By: Yangqing

Differential Revision: D5539299

fbshipit-source-id: 3b5f31c6549b46c24bbd70ebc0bec150eac8b76c
2017-08-03 10:37:26 -07:00
Aapo Kyrola
26645154bb warn about using test/val model with init_params=True + fixed some cases
Summary: It is a common mistake to create a test/validation model with init_params=True. When its param_init_net is run, it will overwrite the training model's params, and with DPM those won't be synchronized to all GPUs. I don't want to make this an assertion yet, since it might break people's trainers (it is OK to have init_params=True if you never run the param_init_net...).
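A minimal sketch of the safe pattern (model name and arg_scope are illustrative):

```python
# Sketch: build the validation model with init_params=False so its
# param_init_net cannot overwrite the training model's parameters.
from caffe2.python import model_helper

test_model = model_helper.ModelHelper(
    name="resnet50_test",
    arg_scope={"order": "NCHW"},
    init_params=False,  # reuse parameters initialized by the training model
)
```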

Reviewed By: asaadaldien

Differential Revision: D5509963

fbshipit-source-id: 63b1a16ec0af96e3790e226850f6e0e64689143f
2017-07-27 13:20:27 -07:00
Ahmed Taei
804ebf7c41 Populate learning rate blob name into data_parallel_model and fix resnet50_trainer example.
Reviewed By: akyrola

Differential Revision: D5463772

fbshipit-source-id: 10b8963af778503a3de6edbabb869747bd1e986d
2017-07-21 16:24:10 -07:00
Hyungsuk Kang
ca2b608f83 Fixed typo
Summary:
peaces -> pieces, peace -> piece
Closes https://github.com/caffe2/caffe2/pull/819

Differential Revision: D5312417

Pulled By: aaronmarkham

fbshipit-source-id: 59d2c3f475197a5f29dc7cf3ecaf675a242d3cdf
2017-06-23 14:02:40 -07:00
Aapo Kyrola
34eaa19d27 CPU data parallel model
Summary:
CPU version of data parallel model. The great thing is that now we can run data_parallel_model_test in Sandcastle (as it does not have GPUs).

Pretty simple change, really. I did not change all variable names with "gpu" in them, to reduce risk (and out of a bit of laziness). Can improve later.

Reviewed By: wesolwsk

Differential Revision: D5277350

fbshipit-source-id: 682e0c5f9f4ce94a8f5bd089905b0f8268bd2210
2017-06-20 23:19:08 -07:00
Xiaoti Hu
969831ea33 Deprecate CNNModelHelper in lmdb_create_example
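A sketch of the deprecation pattern this follows, with illustrative layer parameters:

```python
# Sketch: the CNNModelHelper methods move to brew helpers on a plain ModelHelper.
from caffe2.python import brew, model_helper

model = model_helper.ModelHelper(name="lmdb_create_example")
# old style: model = cnn.CNNModelHelper(...); model.Conv("data", "conv1", 3, 16, kernel=5)
brew.conv(model, "data", "conv1", dim_in=3, dim_out=16, kernel=5)
brew.relu(model, "conv1", "conv1")
```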
Reviewed By: akyrola

Differential Revision: D5233793

fbshipit-source-id: bae745791f071bc36fd45bd81145ce86c8ba9ed0
2017-06-19 13:04:02 -07:00
Aapo Kyrola
feba1eed00 resnet50: fetch right lr
Summary: I broke resnet50 when switching to use the optimizer, which uses an LR per parameter. This only happens after each epoch, and I did not test patiently enough. As a stop-gap, while asaadaldien works on a better solution, just fetch the LR of the conv1_w param.

Reviewed By: asaadaldien

Differential Revision: D5207552

fbshipit-source-id: f3474cd5eb0e291a59880e2834375491883fddfc
2017-06-07 21:46:35 -07:00
Thomas Dudziak
60c78d6160 Fixes range/xrange for Python 3
Summary: As title

Differential Revision: D5151894

fbshipit-source-id: 7badce5d3122e8f2526a7170fbdcf0d0b66e2638
2017-06-07 00:04:26 -07:00
Xiangyu Wang
c9c862fa8f 16117716 [Caffe2 OSS] make char-rnn example use build_sgd
Summary: replace the hand-made SGD with build_sgd

Reviewed By: salexspb

Differential Revision: D5186331

fbshipit-source-id: 3c7b4b370e29a1344b95819766463bae3812c9a6
2017-06-06 13:54:59 -07:00
Aapo Kyrola
401908d570 add_weight_decay + restore weight decay to resnet50_trainer
Summary:
Add add_weight_decay to optimizer + test.

In D5142973 I accidentally removed weight decay from resnet50 trainer, so this restores it.
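A sketch of the restored setup, assuming the `optimizer` module's `add_weight_decay` and `build_sgd` helpers (hyperparameters are illustrative):

```python
# Sketch: apply weight decay and then build the SGD optimizer for the model.
from caffe2.python import model_helper, optimizer

model = model_helper.ModelHelper(name="resnet50")
optimizer.add_weight_decay(model, 1e-4)             # restores the weight decay term
optimizer.build_sgd(model, base_learning_rate=0.1,
                    policy="step", stepsize=1, gamma=0.999, momentum=0.9)
```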

Reviewed By: asaadaldien

Differential Revision: D5173594

fbshipit-source-id: c736d8955eddff151632ae6be11afde0883f7531
2017-06-02 14:16:56 -07:00
Aapo Kyrola
cdb50fbf2b add optimizer support to data_parallel_model; Use MomentumSGDUpdate
Summary:
This diff does two things:
- add support for optimizers to data_parallel_model. The user can supply optimizer_builder_fun instead of param_update_builder_fun. The latter is called separately for each GPU with the proper namescope and devicescope, while the optimizer builder is called only once and adds optimizers to the whole model.

- use MomentumSGDUpdate instead of MomentumSGD + WeightedSum. This brings major perf benefits.

Changes resnet50 trainer to use optimizer.

This relies on D5133652
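A toy sketch of the new flow (tiny FC model; builder names and hyperparameters are illustrative, not the resnet50 trainer's actual builders):

```python
# Sketch: supply optimizer_builder_fun to data_parallel_model instead of a
# per-GPU param_update_builder_fun.
from caffe2.python import brew, data_parallel_model, model_helper, optimizer

train_model = model_helper.ModelHelper(name="toy_dpm", arg_scope={"order": "NCHW"})

def add_input(model):
    # toy constant input; a real trainer would read from a DB
    model.param_init_net.ConstantFill([], "data", shape=[8, 16], value=1.0)

def add_forward(model, loss_scale):
    fc = brew.fc(model, "data", "fc", dim_in=16, dim_out=1)
    loss = model.net.AveragedLoss(fc, "loss")
    return [loss]

def add_optimizer(model):
    # called once for the whole model, not once per device
    return optimizer.build_sgd(model, base_learning_rate=0.1, momentum=0.9,
                               policy="step", stepsize=1, gamma=0.999)

data_parallel_model.Parallelize_GPU(
    train_model,
    input_builder_fun=add_input,
    forward_pass_builder_fun=add_forward,
    optimizer_builder_fun=add_optimizer,  # replaces param_update_builder_fun
    devices=[0, 1],
)
```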

Reviewed By: dzhulgakov

Differential Revision: D5142973

fbshipit-source-id: 98e1114f5fae6c657314b3296841ae2dad0dc0e2
2017-05-30 12:49:57 -07:00