pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-06 12:20:52 +01:00

Author	SHA1	Message	Date
Xuehai Pan	8d45f555d7	[BE] [1/3] Rewrite `super()` calls in caffe2 and benchmarks (#94587 ) Rewrite Python built-in class `super()` calls. Only non-semantic changes should be applied. - #94587 - #94588 - #94592 Also, methods with only a `super()` call are removed: ```diff class MyModule(nn.Module): - def __init__(self): - super().__init__() - def forward(self, ...): ... ``` Some cases that change the semantics should be kept unchanged. E.g.: `f152a79be9/caffe2/python/net_printer.py (L184-L190)` `f152a79be9/test/test_jit_fuser_te.py (L2628-L2635)` Pull Request resolved: https://github.com/pytorch/pytorch/pull/94587 Approved by: https://github.com/ezyang	2023-02-11 18:19:48 +00:00
Aaron Gokaslan	8fce9a09cd	[BE]: pyupgrade Python to 3.8 - imports and object inheritance only (#94308 ) Apply parts of pyupgrade to torch (starting with the safest changes). This PR only does two things: removes the need to inherit from object and removes unused future imports. Pull Request resolved: https://github.com/pytorch/pytorch/pull/94308 Approved by: https://github.com/ezyang, https://github.com/albanD	2023-02-07 21:10:56 +00:00
Serkan Karakulak	52e8af57a6	[3/N] Update ema_teacher_arch in the backward call (#92080 ) Summary: adding support for updating ema_teacher_arch in C2 backend Test Plan: baseline f397096610 EMA run f397096864 Differential Revision: D41124891 Pull Request resolved: https://github.com/pytorch/pytorch/pull/92080 Approved by: https://github.com/kit1980	2023-01-20 02:29:42 +00:00
Wenguang Mao	755b39ba66	[LRD] Allowing using dedicated iteration counter for learning rate (#85195 ) Summary: So that we could manipulate the iteration counter for lrarning rate separately (for learning rate decay or learning rate re-warming up etc), without affecting other techniques relying on iterations (such as EMA) Test Plan: Unit tests: ``` ✓ Pass: caffe2/caffe2/python:optimizer_test - testSparse (caffe2.caffe2.python.optimizer_test.TestAdagradWithDedicatedLRIteration) (46.475) ✓ Pass: caffe2/caffe2/python:optimizer_test - test_global_norm_based_gradient_clipping (caffe2.caffe2.python.optimizer_test.TestAdagradWithDedicatedLRIteration) (46.475) ✓ Pass: caffe2/caffe2/python:optimizer_test - test_lr_injection (caffe2.caffe2.python.optimizer_test.TestAdagradWithDedicatedLRIteration) (46.475) ✓ Pass: caffe2/caffe2/python:optimizer_test - main (46.475) Summary Pass: 5 Skip: 1 ↻ caffe2/caffe2/python:optimizer_test - testGPUDense (caffe2.caffe2.python.optimizer_test.TestAdagradWithDedicatedLRIteration) ListingSuccess: 1 ``` Reviewed By: liangming168 Differential Revision: D38747417 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85195 Approved by: https://github.com/liangming168, https://github.com/eellison	2022-09-27 00:56:57 +00:00
Stephen Macke	785b6905de	reduce plan generation log spam (#70880 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70880 Change loglevel to `debug` in caffe2 `optimizer.py` for logging rowwise Adagrad engine. Test Plan: CI + sandcastle Reviewed By: boryiingsu Differential Revision: D33439337 fbshipit-source-id: b158249b8df771c0ec8b642210ede39972929b00	2022-01-08 10:07:06 -08:00
Atul Jangra	49f1605392	[RFC] Reduce logging noise from AdagradOptimizer (#66443 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66443 For some reason, this logging is adding noise to a lot of flow jobs. I am not sure if this is actually needed. This is called from the __init__ so it's logged all the time and logs all key:values the current local symbol. Test Plan: N/A Reviewed By: chowarfb Differential Revision: D31534372 fbshipit-source-id: bed032b66fed548c97a6f66b1b9e905fd2738851	2021-10-11 13:25:41 -07:00
Jamie King	812bc1dde6	Smart Decay for Adam - DPER3 (#62058 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62058 This is the second diff in this stack. This diff includes the changes to DPER3; the first diff includes the changes to Caffe2. We want to decay learning parameters properly. Previously this was not done when a parameter is absent from a minibatch. We fix this by keeping track of missed minibatches and making decay catch up accordingly. The exponential moving averages (EMA) for the first and second moments used in Adam are updated only for parameters seen in a minibatch. Actually, for these parameters, 0 should be added to the EMAs and the EMAs should then be decayed by multiplying by beta1 and beta2 respectively. To avoid the computational overhead of touching every parameter for every minibatch, we: * keep track of the last time a parameter is seen * instead of decaying the EMAs by multiplying by beta1 and beta2, we multiply by beta1^k and beta2^k, where k is the number of minibatches since the parameter was last seen. We hope this will significantly improve the inconsistent learning parameter issue we have seen with Adam. Differential Revision: D29638897 fbshipit-source-id: 18d8e227d72c2e23010ca81e0f6eeb78872c8d3c	2021-07-23 13:26:30 -07:00
Baichuan Yuan	dca97b4394	Weighted decay with frequency (count-based) (#60382 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60382 Instead of setting weight_decay w uniformly for all ids, for each row i in the sparse embedding table, the actual weight_decay `w_i` becomes `wfreq_i` where `freq_i = halflife/counter_i \in [\log(2), halflife]`. Counter is from `rowwise_counter` with definition `counter_i = 1 + \exp(-iter_{\delta}\rho)*counter_i`. Test Plan: buck test //caffe2/caffe2/python/operator_test:adagrad_test -- test_row_wise_sparse_adagrad buck test caffe2/caffe2/fb/dper/layer_models/tests/split_1:sparse_nn_test_weight_decay Reviewed By: 0x10cxR1 Differential Revision: D25581030 fbshipit-source-id: 54b3831b20516c76c559b13d8deb809e2ee3b446	2021-06-21 18:46:35 -07:00
Zhijing Li	88a1e8eb01	Add EMA to DecayAdagrad (#57866 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/57866 As titled Test Plan: f271267365 Reviewed By: lanlanfb Differential Revision: D28292875 fbshipit-source-id: f6532048eb558afce87fdada3b7dfa8457a1f538	2021-05-07 23:09:08 -07:00
Lanlan Liu	695eef05a4	optimizer exploration - v1 and v2 + fix position_weighted optimizer + decoupled weight decay (#54042 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/54042 Pull Request resolved: https://github.com/pytorch/pytorch/pull/53881 1. Fix position_weighted optimizer: Position weighted layer uses default optimizer but is actually gradient_slice, which will cause problem if we do not handle it properly in the new optimizier. The solution is to use sparseadagrad when it is gradient_slices. 2. Optimizer implementation of v1 and v2: using 1st momentum with/without bias_correction. 3. also implemented decoupled weight decay in the new optimizer. Test Plan: buck test //caffe2/caffe2/fb/dper/layer_models/tests/split_1:sparse_nn_test_2 -- test_mlp_optimization buck test //caffe2/caffe2/python:optimizer_test -- TestDecayAdagrad buck test //caffe2/caffe2/python/operator_test:decay_adagrad_test ctr_mbl_feed work flow: f255731660 oc work flow: f255739503 Reviewed By: 0x10cxR1 Differential Revision: D26839668 fbshipit-source-id: 2b6881c1a88540ef5766be40f5e80001257e2199	2021-03-27 23:03:29 -07:00
Neha Shah	f3c00047ce	Reset Optimizer counter while deserializing netWithBackwardOptions Summary: Add ability to reset optimizer counter.. Test Plan: will wait for integration tests to run on diff. Differential Revision: D27248286 fbshipit-source-id: a608df1bd61b64eb317c9ffd9cfdd804c5288f6d	2021-03-23 11:16:11 -07:00
Zhijing Li	05542f6222	EMA op (#50393 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50393 Exponential Moving Average Usage: add ema_options in adagrad optimizer. For details, plz refer to the test workflow setting. if ema_end == -1, it means ema will never end. Test Plan: buck test caffe2/caffe2/fb/optimizers:ema_op_optimizer_test buck test caffe2/caffe2/fb/optimizers:ema_op_test f240459719 Differential Revision: D25416056 fbshipit-source-id: a25e676a364969e3be2bc47750011c812fc3a62f	2021-01-13 08:58:01 -08:00
Bugra Akyildiz	27c7158166	Remove __future__ imports for legacy Python2 supports (#45033 ) Summary: There is a module called `2to3` which you can target for future specifically to remove these, the directory of `caffe2` has the most redundant imports: ```2to3 -f future -w caffe2``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/45033 Reviewed By: seemethere Differential Revision: D23808648 Pulled By: bugra fbshipit-source-id: 38971900f0fe43ab44a9168e57f2307580d36a38	2020-09-23 17:57:02 -07:00
Gang Shen	058d7228ec	Expose the interface of nesterov of SGD Optimizer from caffe2 to dper Summary: Expose the interface of `nesterov` of SGD Optimizer from caffe2 to dper. dper sgd optimizer (https://fburl.com/diffusion/chpobg0h) has referred to NAG sgdoptimizer in caffe2: https://fburl.com/diffusion/uat2lnan. So just need to add the parameter 'nesterov' in dper sgd optimizer. Analysis of run resutls: N345540. - train_ne increases as momentum (m) decreases. - for m=0.95, 0.9: eval_ne is lower with NAG than production (no NAG, m = 0.95). - for m=0.99: eval_ne with or without NAG is higher than production. It indicates larger variance in validation and overfit in training (lower train_ne). Test Plan: 1. unit tests: `buck test caffe2/caffe2/fb/dper/layer_models/tests/split_1:sparse_nn_test -- test_sgd_without_nesterov` `buck test caffe2/caffe2/fb/dper/layer_models/tests/split_1:sparse_nn_test -- test_sgd_with_nesterov` . 1. build dper front end package: `flow-cli canary ads.dper3.workflows.sparse_nn.train --mode opt --entitlement ads_global --run-as-secure-group team_ads_ml_ranking`. The build result (refreshed) is here https://www.internalfb.com/intern/buck/build/2a368b55-d94b-45c1-8617-2753fbce994b. Flow package version is ads_dper3.canary:856b545cc6b249c0bd328f845adeb0d2. . 2. To build dper back end package: `flow-cli canary dper.workflows.dper3.train --mode opt --entitlement ads_global --run-as-secure-group team_ads_ml_ranking`. The build result (refreshed) is here: https://www.internalfb.com/intern/buck/build/70fa91cd-bf6e-4a08-8a4d-41e41a77fb52. Flow package version is aml.dper2.canary:84123a34be914dfe86b1ffd9925869de. . 3. Compare prod with NAG-enabled runs: a) refreshed prod run (m=0.95): f213877098 NAG enabled run (m=0.95): f213887113 . b) prod run (m=0.9): f214065288 NAG enabled run (m=0.9): f214066319 . c) prod run (m=0.99): f214065804 NAG enabled run (m=0.99): f214066725 . d) change date type of nestrov to `bool` and launched a validation run NAG enabled (m=0.95): f214500597 Reviewed By: ustctf Differential Revision: D23152229 fbshipit-source-id: 61703ef6b4e72277f4c73171640fb8afc6d31f3c	2020-09-09 19:37:00 -07:00
Rui Liu	92b7347fd7	Enforce counter value to double type in rowwise_counter Summary: Enforce counter value to double type in rowwise_counter. Context: The existing implementation is using float type for counter value. But due to the precision limit of a floating number [1], we observed that the counter value can't increment beyond 16777216.0 (i.e., the max value is 16777216.0) in our earlier experiments. We decide to enforce double type to avoid this issue. [1] https://stackoverflow.com/questions/12596695/why-does-a-float-variable-stop-incrementing-at-16777216-in-c Test Plan: op test ``` ruixliu@devvm1997:~/fbsource/fbcode/caffe2/caffe2/python/operator_test(f0b0b48c)$ buck test :rowwise_counter_test Trace available for this run at /tmp/testpilot.20200728-083200.729292.log TestPilot test runner for Facebook. See https://fburl.com/testpilot for details. Testpilot build revision cd2638f1f47250eac058b8c36561760027d16add fbpkg f88726c8ebde4ba288e1172a348c7f46 at Mon Jul 27 18:11:43 2020 by twsvcscm from /usr/local/fbprojects/packages/testinfra.testpilot/887/t.par Discovering tests Running 1 test Started new test run: https://our.intern.facebook.com/intern/testinfra/testrun/7881299364977047 ✓ caffe2/caffe2/python/operator_test:rowwise_counter_test - test_rowwise_counter (caffe2.caffe2.python.operator_test.rowwise_counter_test.TestRowWiseCounter) 0.265 1/1 (passed) ✓ caffe2/caffe2/python/operator_test:rowwise_counter_test - main 14.414 (passed) Finished test run: https://our.intern.facebook.com/intern/testinfra/testrun/7881299364977047 Summary (total time 18.51s): PASS: 2 FAIL: 0 SKIP: 0 FATAL: 0 TIMEOUT: 0 OMIT: 0 ``` optimizer test ``` ruixliu@devvm1997:~/fbsource/fbcode/caffe2/caffe2/python(7d66fbb9)$ buck test :optimizer_test Finished test run: https://our.intern.facebook.com/intern/testinfra/testrun/7036874434841896 Summary (total time 64.87s): PASS: 48 FAIL: 0 SKIP: 24 caffe2/caffe2/python:optimizer_test - testGPUDense (caffe2.caffe2.python.optimizer_test.TestMomentumSgd) caffe2/caffe2/python:optimizer_test - testGPUDense (caffe2.caffe2.python.optimizer_test.TestGFtrl) caffe2/caffe2/python:optimizer_test - test_caffe2_cpu_vs_numpy (caffe2.caffe2.python.optimizer_test.TestYellowFin) caffe2/caffe2/python:optimizer_test - testGPUDense (caffe2.caffe2.python.optimizer_test.TestSparseRAdam) caffe2/caffe2/python:optimizer_test - testGPUDense (caffe2.caffe2.python.optimizer_test.TestRowWiseAdagradWithCounter) caffe2/caffe2/python:optimizer_test - testGPUDense (caffe2.caffe2.python.optimizer_test.TestAdagrad) caffe2/caffe2/python:optimizer_test - test_caffe2_gpu_vs_numpy (caffe2.caffe2.python.optimizer_test.TestYellowFin) caffe2/caffe2/python:optimizer_test - testDense (caffe2.caffe2.python.optimizer_test.TestRowWiseAdagrad) caffe2/caffe2/python:optimizer_test - testGPUDense (caffe2.caffe2.python.optimizer_test.TestFtrl) caffe2/caffe2/python:optimizer_test - testSparse (caffe2.caffe2.python.optimizer_test.TestRmsProp) ...and 14 more not shown... FATAL: 0 TIMEOUT: 0 OMIT: 0 ``` param download test ``` ruixliu@devvm1997:~/fbsource/fbcode/caffe2/caffe2/fb/net_transforms/tests(7ef20a38)$ sudo buck test :param_download_test Finished test run: Finished test run: https://our.intern.facebook.com/intern/testinfra/testrun/6473924481526935 ``` e2e flow: f208394929 f207991149 f207967273 ANP notebook to check the counter value loaded from the flows https://fburl.com/anp/5fdcbnoi screenshot of the loaded counter (note that counter max is larger than 16777216.0) {F250926501} Reviewed By: ellie-wen Differential Revision: D22711514 fbshipit-source-id: 426fed7415270aa3f276dda8141907534734337f	2020-08-05 20:40:51 -07:00
Rui Liu	9d8dc0318b	[pruning] add rowwise counter to sparse adagrad Summary: Use the newly added counter op in sparse adagrad Reviewed By: chocjy, ellie-wen Differential Revision: D19221100 fbshipit-source-id: d939d83e3b5b3179f57194be2e8864d0fbbee2c1	2020-06-30 14:40:02 -07:00
Taiqing Wang	ad91a3a11f	Skipping L2 regularization on sparse biases Summary: # Motivations As explained in the [link](https://stats.stackexchange.com/questions/86991/reason-for-not-shrinking-the-bias-intercept-term-in-regression/161689#161689), regularizing biases will cause mis-calibration of predicted probabilities. In SparseNN, the unary processor may use 1d embedding tables for the sparse features to serve as biases. In this diff, the regularization term is automatically skipped for the 1d sparse parameters to avoid regularizing biases. # Experiments Experiments were conducted to verify that it has no significant impact on the NE to skip the regularization on 1d sparse parameters. Baseline.1 (no L2 regularization): f193105372 Baseline.2 (L2 regularization in prod): f193105522 Treatment (skipping L2 regularization on 1d sparse params): f193105708 {F239859690} Test Plan: Experiments were conducted to verify that it has no significant impact on the NE to skip the regularization on 1d sparse parameters using a canary package: `aml.dper2.canary:9efc576b35b24361bb600dcbf94d31ea`. Baseline.1 (no L2 regularization): f193105372 Baseline.2 (L2 regularization in prod): f193105522 Treatment (skipping L2 regularization on 1d sparse params): f193105708 Reviewed By: zhongyx12 Differential Revision: D21757902 fbshipit-source-id: ced126e1eab270669b9981c9ecc287dfc9dee995	2020-06-11 11:21:25 -07:00
Lu Fang	9efbc19f75	Fix the issue with C2 cont build Summary: Issue was introduced in D21258652. We need to make sure it compiles with opt mode. We may still have some left over py2 packages. Let's just use some format work with both. Test Plan: ci Reviewed By: xush6528 Differential Revision: D21457394 fbshipit-source-id: cde79a0fc6b4feba307bd9d45e1a1d4a42de9263	2020-05-07 19:33:00 -07:00
Taiqing Wang	8cb1f2f9dc	implement L2 regularization for Adagrad in caffe2 and dper (#37705 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/37705 Pull Request resolved: https://github.com/pytorch/pytorch/pull/37372 Posted note: [Regularizing SparseNN Against Over-fitting](https://fb.workplace.com/notes/taiqing-wang/regularizing-sparsenn-against-over-fitting/220306075902708/) Problem formulation L(w) = J(w) + lambda/2 * \|\|w\|\|^2 J(w) is the empirical loss, and \|\|w\|\|^2 is the squared L2 norm of the parameters, a.k.a. L2 regularizer. dL(w)/ dw_i = dJ(w)/dw_i + lambda w_i dL(w)/ dw_i is the gradient of L(w) w.r.t. w_i. To implement the L2 regularizer, the gradient of J(w) w.r.t. w_i is added with w_i. lambda is called as weight decay in this implementation. Code changes * In the initialization method of AdagradOptimizer, a new input argument, weight_decay, is added. * In the _run function of AdagradOptimizer, the weight decay will be skipped for 1d bias vectors. * In the parameter update functions of Adagrad, the gradient is updated by weight_decay * w_i. The default value for weight_decay is zero. Test Plan: ` buck build caffe2/caffe2/fb/dper/layer_models/tests/split_1:sparse_nn_test_weight_decay ` ` ./buck-out/gen/caffe2/caffe2/fb/dper/layer_models/tests/split_1/sparse_nn_test_weight_decay#binary.par ` Reviewed By: jspark1105 Differential Revision: D21258652 fbshipit-source-id: d2366ddcd736a03205a2d16f914703b16d9fce8f	2020-05-03 10:42:49 -07:00
Yuxi Hu	cf27d07e04	Implementation of STORM optimizer caffe2 python wrapper (#36399 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/36399 Added caffe2 python wrapper and unit test for the STORM C++ operator. Test Plan: All newly added unit tests passed using "buck test //caffe2/caffe2/python:optimizer_test -- TestStorm" {F233644598} Reviewed By: chocjy Differential Revision: D18841013 fbshipit-source-id: f692bc18412839db140202ec9a971e556db0e54f	2020-04-14 23:05:45 -07:00
Xing Wang	4b3e3d8227	[improve logging] add the param information when logging the optimizer engine (#36558 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/36558 In the log, frequently see a large trunk of Using engine xx for rowWise Adagrad, but without information on which parameter is applied. Test Plan: Should be covered by existing testing that use optimizer Reviewed By: chocjy Differential Revision: D20985176 fbshipit-source-id: 6eb4e19e5307db53fc89b38594a3f303f1492a1c	2020-04-14 07:42:24 -07:00
Dhruv Choudhary	f0ea6862ba	Support for pruning delays in Adagrad Optimizer (#34527 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34527 Adding support for prune_delays and prune ratios in Adagrad optimizer. Test Plan: Tested via unit tests in masked_adagrad_optimizer_test. Added unit test for prune_delay versions of MaskedAdagrad buck build caffe2/caffe2/fb/optimizers:masked_adagrad_optimizer_test; buck-out/gen/caffe2/caffe2/fb/optimizers/masked_adagrad_optimizer_test#binary.par buck test caffe2/caffe2/fb/dper/layer_models/tests/split_1:sparse_nn_test -- 'test_pruning' All Dper tests passed https://our.intern.facebook.com/intern/testinfra/testrun/7599824380741217 Reviewed By: chocjy Differential Revision: D20313419 fbshipit-source-id: 5c2c8d4e0fc2ec538bcd6f145c6b87a2381f90f3	2020-04-09 12:59:23 -07:00
Fei Tian	845b19c4ef	Add weight_scale in Adagrad (#34944 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34944 Reviewed By: chonglinsun Differential Revision: D20506032 fbshipit-source-id: ef025e536da01fdcabc783466bc065685b80ab9a	2020-03-20 22:36:51 -07:00
Zhonghao Liu	e3272559e4	[caffe2] SWA operator (#34394 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34394 # SWA operator In this diff, we added a new operator `SWA` which will be used in `AdaGradOptimizer`. The algorithm looks like: {F230902995} # Background In our testings, we found that this operator could improve our models' reproducibility a lot. (KT: 0.86 -> .92) So we hope to land this operator and in future, enable this by default in our Models. Test Plan: Local build `aml.dper3:30f068668cfb408fbb40141fb17129f2` and bento kernel. - Local test: n215857 - f174600345 Reviewed By: chocjy Differential Revision: D20165239 fbshipit-source-id: c03cdd048cb10b091e5f06323f4c0f3999f95d8a	2020-03-20 08:17:08 -07:00
Jiyan Yang	b102550d2c	Allow to pass in masks through db (#31676 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/31676 Facebook： Previously we assumed mask is passed in as a tensor which is not feasible for sparse parameter. Here we allow to pass in the mask through db path which requires the masks to be stored in some db first. Test Plan: unit tests Reviewed By: ellie-wen Differential Revision: D18928753 fbshipit-source-id: 75ca894de0f0dcd64ce17b13652484b3550cbdac	2019-12-30 20:54:27 -08:00
Jiyan Yang	90a187618e	Integrate masked sparse Adagrad (#31641 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/31641 Assuming mask is provided as a tensor Test Plan: unit test Reviewed By: ellie-wen Differential Revision: D18928737 fbshipit-source-id: a4f3dd51769c2b56e5890043e91c18e6128be082	2019-12-27 18:40:50 -08:00
Jiyan Yang	4983ef8de1	Integrating MaskedAdagrad Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/31640 Test Plan: unit test Reviewed By: ellie-wen Differential Revision: D18805278 fbshipit-source-id: 1def4a89b7e4e04385c762bf127d95c5e513180e	2019-12-26 17:18:39 -08:00
Mengshi Zhang	5b6dd52e3c	Build Unit Test of SparseRAdam Summary: We added caffe2 python wrapper and unit test for the SparseRAdam C++ operator. Test Plan: Unit test is constructed following the design pattern of [Wngrad optimizer](https://our.intern.facebook.com/intern/diff/D8655724/). Test passed smoothly. buck test //caffe2/caffe2/python:optimizer_test -- TestSparseRAdam Test result: {F221144048} Reviewed By: wx1988 Differential Revision: D18330650 fbshipit-source-id: e0f4724c2b616b665e2a0fe2e5c3430696cca7ee	2019-11-18 15:22:37 -08:00
Jiyan Yang	c48e1679f9	Add validator for optimizers when parameters are shared Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18497 Reviewed By: kennyhorror Differential Revision: D14614738 fbshipit-source-id: beddd8349827dcc8ccae36f21e5d29627056afcd	2019-04-17 21:10:38 -07:00
rohithkrn	0d663cec30	Unify cuda and hip device types in Caffe2 python front end (#14221 ) Summary: Goal of this PR is to unify cuda and hip device types in caffe2 python front end. Pull Request resolved: https://github.com/pytorch/pytorch/pull/14221 Differential Revision: D13148564 Pulled By: bddppq fbshipit-source-id: ef9bd2c7d238200165f217097ac5727e686d887b	2018-11-29 14:00:16 -08:00
Jiyan Yang	a2fcd4dee5	Ensure FP16 rowwise Adagrad can be run Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12317 Reviewed By: hyuen Differential Revision: D10190778 fbshipit-source-id: 720a9aaa4e6b1736023d8c6326a613e4ea592b31	2018-11-28 02:15:36 -08:00
Junjie Bai	f54ab540af	Rename cuda_gpu_id to device_id in DeviceOption (#12456 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12456 codemod with 'Yes to all' codemod -d . --extensions h,cc,cpp,cu,py,proto,pbtxt,pb.txt,config cuda_gpu_id device_id Overload TextFormat::ParseFromString to do string replace when parsing from protobuf format Reviewed By: Yangqing Differential Revision: D10240535 fbshipit-source-id: 5e6992bec961214be8dbe26f16f5794154a22b25	2018-10-09 15:54:04 -07:00
Junjie Bai	ff608a9ff3	Back out "Revert D10123245: Back out "codemod cuda_gpu_id to device_id"" (#12232 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12232 Original commit changeset: fca91fea58b7 This adds proper modifications to the DeviceType <->DeviceOption conversion code added in D10033396 Reviewed By: jerryzh168 Differential Revision: D10132473 fbshipit-source-id: 801ef777e2950982cb47b48051b1471a0a91e64b	2018-10-01 21:54:52 -07:00
Rick Ratmansky	3010dc4208	Revert D10123245: Back out "codemod cuda_gpu_id to device_id" Differential Revision: D10123245 Original commit changeset: d83da8e00a12 fbshipit-source-id: fca91fea58b7df208edc2e218a1d514f9821ec7b	2018-10-01 12:22:36 -07:00
Yang Liu	7d7d336c45	Back out "codemod cuda_gpu_id to device_id" Summary: Original commit changeset: f5614a5d2607 D9986213 is causing Multifeed Aggregator a [huge performance different](https://our.intern.facebook.com/intern/ads/analyze_canary/412951953278781781/) and is blocking aggregator push since last Friday night: https://fburl.com/feedtools/b6izvwjz We need to land this revert ASAP to unblock aggregator push. Reviewed By: orionr Differential Revision: D10123245 fbshipit-source-id: d83da8e00a1250f5d09811a0a587c127e377aab2	2018-10-01 11:31:14 -07:00
Jiyan Yang	40aa212cd6	Support fp16 mkl engine in training Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12080 Reviewed By: hyuen Differential Revision: D10037719 fbshipit-source-id: 618ce894eccc4c87a038dc3ab836684f16843cde	2018-09-29 21:55:11 -07:00
Junjie Bai	3eb5940cf5	codemod cuda_gpu_id to device_id (#12022 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12022 codemod -d . --extensions h,cc,cpp,cu,py,proto,pbtxt,pb.txt,config cuda_gpu_id device_id codemod with 'Yes to all' Reviewed By: orionr Differential Revision: D9986213 fbshipit-source-id: f5614a5d26078817aee8caf79a494abfd6a95ff1	2018-09-27 20:24:53 -07:00
Chenguang Xi	cdefc27795	Support lr adaption for SparseAdam and RowWiseSparseAdam (#11162 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11162 as title, fix pr test failure Reviewed By: chocjy Differential Revision: D9619308 fbshipit-source-id: 0a2228841ed8fadb15f07e94d3575aa701b10146	2018-09-17 10:29:03 -07:00
Jiyan Yang	c5f7da3f4a	Support FP16 sparse lookup (#11674 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11674 Pull Request resolved: https://github.com/pytorch/pytorch/pull/11658 Reviewed By: hyuen Differential Revision: D9676950 fbshipit-source-id: 89a115b9664b84e4e4436b7da033e5a428c2246d	2018-09-14 02:40:08 -07:00
Edward Yang	3073051a18	Revert D9554375: Support lr adaption for SparseAdam and RowWiseSparseAdam Differential Revision: D9554375 Original commit changeset: b88768f470ef fbshipit-source-id: 2c103c616c8680684892c7d9085fd7bb8289d2f1	2018-08-31 07:54:31 -07:00
Chenguang Xi	0555768e0f	Support lr adaption for SparseAdam and RowWiseSparseAdam (#10993 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10993 as title Reviewed By: chocjy Differential Revision: D9554375 fbshipit-source-id: b88768f470ef7d023dd481c6a97b91594892f422	2018-08-31 00:55:39 -07:00
Lin Li	4a2f3cc45f	Improve lars operator by applying clipping (#9905 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9905 This diff improves lars operator in Caffe2 by applying clipping to the computed learning rate Reviewed By: pjh5 Differential Revision: D9020606 fbshipit-source-id: b579f1d628113c09366feac9406002f1ef4bd54f	2018-08-02 11:54:28 -07:00
Xiuyan Ni	db96a0951f	Add SIMD version to GFTRL optimizer (#9698 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9698 Add SIMD version to GFTRL optimizer Differential Revision: D8949723 fbshipit-source-id: 835ce2ce49630ae43fc6bac63c545c14b25f5a26	2018-07-30 15:27:24 -07:00
Siddharth Goyal	4b61760738	Add Adadelta optimizer to caffe2 (#9088 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9088 Closes https://github.com/pytorch/pytorch/pull/9088 - Added CPU/GPU implementations of Adadelta and SparseAdadelta. - Added corresponding Python unittests Reviewed By: BIT-silence Differential Revision: D8712169 fbshipit-source-id: 544e99e13b230a919672a7341b3715d64597c0be	2018-07-24 20:09:21 -07:00
Jian Zhang	099a6d5e08	Implementation of Wngrad optimizer caffe2 python wrapper and unit test on least square regression (#9001 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9001 Closes https://github.com/pytorch/pytorch/pull/9001 We added caffe2 python wrapper and unit test for the Wngrad C++ operator. Reviewed By: chocjy Differential Revision: D8655724 fbshipit-source-id: fb259afd6fd50231691bd75c52852b20a1e1aec8	2018-07-13 18:54:52 -07:00
Xiuyan Ni	4e5369349f	Add FTRL Optimzier with Group Lasso regularizer (#9074 ) Summary: Closes https://github.com/pytorch/pytorch/pull/9074 Implement an optimzier based on FTRL Optimzier which support Group Lasso regularizer. The relevant paper list for this optimizer: 1. About the FTRL Optimizer: https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/41159.pdf, 2. About the group lasso regularizer solver: http://www.cse.cuhk.edu.hk/~king/PUB/ICML2010-Yang-473.pdf Differential Revision: D8623146 fbshipit-source-id: 40e08aa6319d1ad7aa95e8716e3de83b9cfb8452	2018-07-06 13:41:00 -07:00
Orion Reblitz-Richardson	edb88b5f3a	Update from Facebook (#8887 ) * add opencl + fpga context adds an opencl context inside caffe2/fb which can be used for fpga access * [Caffe2] Force tensor inference checks to be triggered during testing We've started to rely on TensorInference functions more for different analysis. This diff ensures that the TensorInference function's result matches what is expected from the definition of the operator. * Enable building //caffe2:torch with @mode/opt In @mode/opt, python runs out of a PAR, which breaks a lot of assumptions in the code about where templates/ folders live relative to __file__. Rather than introduce hacks with parutil, I simply turn template_path into a parameter for all the relevant functions and thread it through from the top level. * [Caffe2] Fix cost models for DotProduct and Div. Update Tensor Inference for dot product As title. DotProduct states that output is a 1-D tensor (https://caffe2.ai/docs/operators-catalogue.html#dotproduct) though code suggests it is either 0- or 1-D depending on inputs. TensorInference defined to support implementation. * [SG-MoE] Add an option to make the experts NOT as components * [nomnigraph] Rename and fixup convertToNeuralNetOperator API This will make things a bit cleaner * no longer symlink THNN.h and THCUNN.h * forced decoder network (onnx export) Closes https://github.com/pytorch/translate/pull/95 Add networks in ensemble_export.py to create a forced decoding network from PyTorch NMT checkpoints. This network takes an arbitrary numberized (source, target) pair and returns the model score for the translation, including penalties. Vocabulary reduction networks are also supported, but note that target indices which are not in the possible_translation_tokens generated for the source input will be trea * Revert schema change to fix production models Revert schema change to fix production models * MockLogDeviceReader - rebase on FIX # Goal 1), Build a make_mock_log_device_reader using make_mock_reader 2), Replace the real log_device_reader here: https://fburl.com/raihwf1p # Log by D8151734 Real log_device_reader: ``` I0529 20:29:05.373108 954994 tensor.h:839] Tensor print_net/log of type std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >. Dims: (): read_net/ParseOpenTrainingRow:0 I0529 20:29:05.373244 954994 tensor.h:839] Tensor read_net/ParseOpenTrainin * [C2/D2][1/n]: Nonnegative-Constrained Optimization -- log barrier implement log barrier as a regularization method * Add teacher weight screening. Add teacher weight sceening according to teacher labels. If teacher label is zero, we do not use the distill loss in the objective function. * Add NormalizerContext See task for more detail. This implementation is a copy of what exists for RegularizerContext except for how the parameters are defined in the model_definition thrift file. I'll try an alternative implementation which overrides the default arguments of functions instead like for argscopes in tensorflow. https://github.com/pytorch/pytorch/compare/master...MaximeBoucher:update-from-facebook-0939578c068c?expand=1 * Adding cosine similarity option in dot processor Add pairwise cosine similarity option in dot product. Add an option to concate dot product and cosine similarity. Add test cases. * [nomnigraph][redo] Concat elim for sparseNN Same as D7962948, which was reverted because Operator Schema was not defined * [pytorch] Revert pytorch/pytorch#7918 'Release GIL when copying to shared memory', breaks ASAN Revert this pytorch diff that breaks ASAN when running Filament in dev mode; in opt mode it gives "bad file descriptor" errors. Looks like a race when copying tensors to shared memory in multiple mp.Queue's (which spawn separate threads). https://github.com/pytorch/pytorch/pull/7918/files * [nomnigraph][mobile] Enable nomnigraph by default, use -Oz on nomnigraph related code to reduce code size enables nomnigraph and reduces codesize * [Warmup] Allow both offline incremental training and online training Change plan name on saving side and reading side to support both training type This diff depends on D8128530 and D8168651. * Revert D7802642: [Warmup] Allow both offline incremental training and online training This reverts commit afc213cf9b36cecf75333a788391c4d09f4afccc @bypass-lint An infra SEV is better than not reverting this diff. If you copy this password, see you in SEV Review! @cause_a_sev_many_files * Add legacy grad logic to fix div op on old graphs. Add legacy grad logic to fix div op on old graphs. * Correctly propagate operator failures Propagate errors from operators that throw exceptions and return false * Revert D8374829: [caffe2][nomnigraph][redo] Concat elim for sparseNN This reverts commit 6dda028c463e54bb5c32188bbbe9202107e188a5 @bypass-lint An infra SEV is better than not reverting this diff. If you copy this password, see you in SEV Review! @cause_a_sev_many_files * [Caffe2] Added extra_info to core.DeviceOption(), enforced extra_info to be inherited in scope.DeviceScope extra_info is a newly defined field in DeviceOption proto. This diff added extra_info to the core.DeviceOption(). And, In scope.DeviceScope(), this diff enforce the new scope to inherit the extra_info from old scope. * [opt] hgdirsync wasn't enabled, merge diverged code Here's the damage, P59732616 basically xplat was left behind but had the change from assert to CAFFE_ENFORCE * OMP parallelism over RoIs for RoIAlign op Simpler to parallelize over RoIs. Shouldn't affect other uses as it relies on the number of OMP threads set during startup. PR: https://github.com/pytorch/pytorch/pull/8562 * Use int64_t for shape in FillOps to avoid overflow of int32 * Implement Rotated RoIAlign op Based on Rotated RPNs as explained in https://arxiv.org/abs/1703.01086. The idea is simple - orientation/angle is added as an RPN anchor parameter and then the angle is further regressed similar to bbox coords. There are some additional changes related to NMS and IoU, but besides that it's a direct extension to Faster-RCNN. Further details in https://fb.quip.com/sZHlA1iMfWPZ. RoIs are represented in [center_x, center_y, width, height, angle] format. `angle` repre * Rotated RoIAlign op CUDA forward implementation CUDA forward impl for D8415490 * RoIAlignRotated op CUDA backward pass implementation TSIA * All remaining fixes to eliminate process_github.sh Most of this diff has already been reviewed separately, except for the parts relating to _thnn/utils.py and _utils._internal.py remove skipIf(True, 'Fbcode') line from process_github.sh replace sed of cpp file with #ifdef to control cudnnDestroy use undo sync-time deletion of .gitattributes, remove process_github.sh switch to using _utils._internal rather than try-import-except This diff also fixes the open-source bug where rebuilds have * Back out "Revert D7802642: [Warmup] Allow both offline incremental training and online training" Original commit changeset: 7707d2efe60e The original diff is backout becuase the online trainer package is backed out. This code would only work with new online trainer package * [easy] improve error log in adagrad op as title * re-allow use of thnn_h_path This fixes cffi usage in OSS * [4/4] [tum] paralyzing layerNorm for GPU full sync as title * add compile=False to pytorch tests, remove hack with pyc * Add shape and type inference for RowWiseArgMax operator See title * Revert D8515341: Back out "Revert D7802642: [Warmup] Allow both offline incremental training and online training" This reverts commit 78167eeef0af16b60f72c82f9dcdda9b41b4dcbd @bypass-lint An infra SEV is better than not reverting this diff. If you copy this password, see you in SEV Review! @cause_a_sev_many_files * [fix-flaky-test] mock_hive_reader_test flaky, because GlobalCounter collects local counts intervally # Problem `MockHiveReader` uses `GlobalCounter` to limit `max_examples`. GlobalCounter on server node collect local counts from worker nodes every 1 sec. This 1 sec delay makes it impossible to limit exactly to the `max_examples`, it will definitely exceed `max_examples`. # Plan Given, ``` Expected num_examples = max_examples + num_examples/sec (Read Speed) x 1 sec (GlobalCounter Sync Int * [Caffe2] Fix FCGradient cost inference. Prevent overflow in cost inference FCGradient missed a factor 2 in the `num_outputs == 3` case. Overflow was occurring with flop calculation for FC. Changed types to `uint64_t` to prevent future problems. * Fix binary ops with empty inputs Fix binary ops with empty inputs * Support the filling of input blob with provided data as title for Biz Integrity case * Back out "Revert D8515341: Back out "Revert D7802642: [Warmup] Allow both offline incremental training and online training"" Original commit changeset: 30c55dd38816 Original diff is reverted due to introducing bad integration test. Fixed the integration test. * [c2][easy] improve pack ops error loggings as desc. * Add ShapeTypeInference for LpNorm operator As desc * Shard test_nn to reduce runtime for each test target Closes https://github.com/pytorch/pytorch/pull/8793 The current test_nn would time out and be disabled in GreenWarden, and we need to have an option to split it up in order to pass the stress test. Right now GreenWarden roughly allows running 100 test cases in test_nn before timing out, and here we have an option to divide test_nn into 30 shards (with ~40 tests in each shard) to allow for some test suite growth in the future. * Change default caffe2_streams_per_gpu to 1 * Remove IN_SANDCASTLE from common.py and test_nn.py We prefer to disable the failing tests through Sandcastle UI instead. * Add a new class for an updated prof_dag.proto This diff contains: - An updated prof_dag.proto that contains blob profiles. - A class to deserialize this information (serialization is in a follow up diff) - Update to separate profiling information from NeuralNet (and use it as part of the class above). - Unit tests * Lambdarank for SparseNN This diff adds a lambda_rank_layer for SparseNN. changes include 1) Adds support for multi sessions in c2 op 2) Adds support for two different loss functions in c2 op 3) Unit tests for op * Revert D8586950: Back out "Revert D8515341: Back out "Revert D7802642: [Warmup] Allow both offline incremental training and online training"" This reverts commit 012220ed63eccc35659a57b31d16a3625da6317b @bypass-lint An infra SEV is better than not reverting this diff. If you copy this password, see you in SEV Review! @cause_a_sev_many_files * [easy] A few fixups to multithread predictor benchmark (1) support perf on T6 server (2) remove dead code * fix a bug about the map size as title * Fix reduce sum on in-place case. Fix reduce sum on in-place case. * [Warmup] Reland reverted diff Allow both offline incremental training and online training Closes https://github.com/pytorch/pytorch/pull/8827 fix net transform integration test. Allow offline and online trainer to coexist D7802642. * Add StoreHandlerNotAvailableException Add an exception for a store that is not available or has been deleted. * Use exception handling for fault tolerance, missing KV store Remove status blobs to communication ops so that exceptions propagate on failure. * [C2/D2][2/n]: Nonnegative-Constrained Optimization -- bounded grad proj for simple bounded constrained optimization, incl non-negative box constraints. * [GanH]: Adaptive Weighting with More Estimations With implemented postivity optimization, we now learn adaptive weights with different parameterizations. This improves parameter estimation and training stability. * Revert some changes for landing * Remove AutoNoGIL in StorageSharing * Temporarily disable net_tests * Revert "[Caffe2] Force tensor inference checks to be triggered during testing" This reverts commit 67ef05c22b2f71b4a489695384932f968384a2a4. * Revert "Fix reduce sum on in-place case." This reverts commit 6cb8a8e1b3db7b6d20941b0053e3f3836068eb64. * Revert "Revert "Fix reduce sum on in-place case."" This reverts commit 130a257c0893dc09f4bd6e6a45d112261807fd2c.	2018-06-26 14:55:48 -07:00
bddppq	f94ae3ba1d	Update from facebook (#7696 ) * Fix handling of empty batches in SumReduceDimsOp As titled * Deferrable async_scheduling finishRun fix Proper order of finishing run operations in deferrable_async_scheduling net * Simplify exception handling in async_scheduling Simplify exception handling, no need to busy wait, thread that processes the last task can finish the run * [C2]worker_coordinator_memorize_worker_ids As titled. This is related to T28689868, where the number of blobs we want to create is equal to the number of worker ids * Add unit test for nets with no type set * Ignore total length argument in sympolic_pad_packed_sequence 1- There was a mistake in the code that total_length was added to the wrong symbolic function (pack_padded_sequence) instead of (pad_packed_sequence) 2- No need to throw an exception if total_length is given since it is only used to enable data_parallel training on multi-gpus and doesn't have anything to do with onnx export, so just ignore it. https://fburl.com/tk4gciqp * Add support for MKLDNN to async_scheduling Just add MKLDNN as a possible CPU option to async_scheduling's pool function * [AuFL][ensemble] support branch output for prediction This diff supports using predictions from different branches and thus enables model ensembling (not fully independent). * Fix a bug in add_loss in layer_model_helper As titled. * Support lradaption for adam 1.lr adaption operator 2.apply to dense adam * Perf tweaks for async_scheduling Restore single pool option + remove unnecessary (no-ops) calls * add quantization to SparseSimdAdagradOp add a bunch of quantization signatures to SparseSimdAdagradOp, implementations to come next * [sr] [codemod] Change all SR callsites to use new API @allow-large-files This diff refactors all callsites of SR to use the slightly changed API introduced in the diff below. Really what this means is that you need to include the correct header. Also if you were using `ClientFactory::newFactory` you need to not prefix it with `ClientFactory::`. ``` cd ~/fbsource/fbcode find ./ -type f -exec sed -i -e 's:#include "servicerouter/client/cpp2/ClientFactory.h":#include "servicerouter/client/cpp2/ServiceRouter.h":' -e 's:#include <servicerouter/client/cpp2/ClientFactory.h>:#include <servicerouter/client/cpp2/ServiceRouter.h>:' -e 's/ClientFactory::newFactory(/newFactory(/g' {} \; ``` Also manually fixed spots that couldn't be done automatically (or broke because they depended on transitive includes). * Back out "Fix handling of empty batches in SumReduceDimsOp" Original commit changeset: 282da1730cc2 This commit is blocking the Github->fbcode sync, which really needs to get merged ASAP. D7881937 which this diff depends on will be reverted in the sync D7990948 which causes this to break. The sync diff cannot be patched with this reversion because it must be landed against base revision 5c8c099 , and D7881937 must not be included in the sync diff because it is breaking GPU tests that are not available in sandcastle : https://ci.pytorch.org/jenkins/job/caffe2-builds/job/py2-cuda8.0-cudnn6-ubuntu16.04-test/3638/console for one example. * Add the flow to support operator benchmark 1) generate model with the operator 2) upload to everstore 3) generate model spec into json file 4) start running the benchmark * [tum][gpu] Connect DPM trainer with flow and unit tests This diff: - Fix some small bugs for Yiming's recent changes to parallelizer, so it suits real use cases. - Add correct tags to the TUM code, so we can do data parallel transform - pass extra info when instantiation. - add unit test for using DPM in TUM model After this diff, we can do simple box, multi-gpu fully-sync trainer for TUM in Fblearner workflow, but may still need to do speed benchmarking. * w/o normalized lradaption for adam dense only The previous lr adaption includes a normalization step when performing the dot product operation. This is not exactly same as what is proposed in the paper. I add normalization as an option. Without it, the operator performs exactly what the paper proposed. With the option, we add the normalization step * [fb] Use SharedPromise in DeferrableAsyncSchedulingNet This code is to simplify DeferrableAsyncSchedulingNet by removing condition variable + small fixes * [tum] implement cuda sparseLengthsMean and LengthsMean as title * Adding an optional parameter to allow use of protobufs in InferShapesAndTypes function. Adding an optional parameter to allow use of protobufs in InferShapesAndTypes function. * Move feature_to_index to FeatureSpec.feature_to_index move feature_to_index to FeatureSpec.feature_to_index to avoid override other fields * [Caffe2] Rename bytes_moved to bytes_written Just a rename in preparation for supporting bytes_read. * [c2] fix ReduceFrontSumOp for empty case by setting 0 otherwise, it may use the results from last iteration when it's empty batch. * [Caffe2] [Int8] Improve Intel CPU performance * [Easy] Improve PrependDim op logging as titled * DBFileReader expand db_path using os.path.expanduser(..) Since there are a lot of possible use cases of `DBFileReader` to read from user home path, like `~/local/sample.db`, I want to save people's trouble of calling `os.path.expanduser(db_path)` themselves. * [Caffe2] Add bytes_read to cost structure We're adding analytical read bytes to cost functions. This extends the structure accordingly for all CostInference defined operators. Additionally, some small bug fixes were performed: 1) Cost functions now extract type information of operands instead of assuming float * Fix sleef on aarch64 for hhvm @bypass-lint Rename flag * Remove duplicated part in caffe2/ideep/operators/conv_op.cc should be sync error * Rename test helper function test_adagrad_sparse_helper to adagrad_sparse_test_helper to avoid confusing pytest	2018-05-19 23:10:48 -07:00
Paul Jesse Hellemn	b875fb281c	Update from facebook (#7451 ) * [bootcamp] Improve "Shape" operator to support axes specification To improve .shape operator of Caffe2 to support x.shape(tensor, axes), which takes an optional int array "axes" as input. For example, x.shape(tensor, [1, 0]) will return the dimension for axis 1 and 0 following the specified order. For current version, "axes" input allows duplications and can have arbitrary length. * Back out "Add barrier net that runs before training nets" Original commit changeset: b373fdc9c30f. Need additional changes to some callers to support barrier failures. * Change warning to verbose log to reduce log spam The `LOG(WARNING)` was a bit spammy for regular use so lets just make it a `VLOG`. * Extract the shared code from different caffe2_benchmark binaries The OSS benchmark and Internal benchmark will share most functions in the benchmark. * Support MFR in sequence training As titled. * Make knowledge distillation work with using logged prediction feature as teacher label. 1) Add loading raw dense feature as teacher label. 2) Optional calibration function for teacher label 3) Add teacher label into generic unit test 4) Deprecated TTSN workflow version using feature_options to config teacher label * [C2/CUDA]: unjoined cross entropy sigmoid as desc * Add async_scheduling executor into deferrable_net_exec_test Add async_scheduling into tests and fix some exception cases * Fix Event disabled error When disabling event in RNN ops make sure we don't call Finish on disabled event from op's RunAsync * cuda ensure cpu output op can handle both TensorCPU and TensorCUDA as desc. * [C2 Core] Infer input device option in C2 hypothesis_test checkers Improve how we default input blob device options. Previously it defaults as where op lives but it is not necessarily the case. For example: CopyCPUToGPU * [C2 Op]SplitByLengthsOp CPU/GPU implementation [C2 Op]SplitByLengthsOp CPU/GPU implementation * fix undefined symbol error not sure why we're getting undefined symbol even with link_whole = True Need to figure out why but need this workaround for now * Add tools in DAIPlayground platform to help debugging models Add additional tools to allow Plauground override individual method defined in AnyExp. This will allow user to create module that specificly change certain default method behavior. An example included in this diff is deactivating test model and checkpointing. When debugging any model problems, switching off components helps me quickly narrow down the location of the bug. The technique is extensively used in task T27038712 (Steady memory increase in EDPM, eventually resulting in gloo/cuda.cu:34: out of memory) * add shape and type inference for int8 conversion operator * Fix flaky test for group_norm Fix flaky test for group_norm * Fix group_norm_op_test flaky Fix group_norm_op_test flaky * Implementation of composite learning rate policy In many state-of-the-arts deep learning works, people use a simple trick to schedule the learning rate: use a fixed learning rate until error plateaus and then switch to a different fixed learning rate, and so on. In this diff, we implemented a simple version of the composite learning rate. The user gives a set of learning rates policies and corresponding iteration nums, and the optimizer will change the learning rate policy based on the number of iterations so far. For example, the user give two learning rate policies, one is FixedLearningRate and PolyLearningRate, with an iteration number of 1k. Then the first 1k iteration, we use FixedLearningRate. For the following iterations, we use PolyLearningRate. * Split two use cases of CachedReader into two classes, DBFileReader and CachedReader # Use Cases: 1). input: DB file -> output: DatasetReader. Use DBFileReader. 2). input: Reader -> build cache DB file -> output: DatasetReader. Use CachedReader. # Changes to CachedReader: 1). Move db_path to the constructor. Because in mock reader. cache will always be built ahead. # Changes to tests: 1). Make a separate TestCase class for CachedReader and DBFileReader. 2). Make it possible to add more test functions by adding setUp, tearDown and _make_temp_path. 3). Make delete db_path more general. `db_path` could be a file for `log_file_db`, but could also be a directory for `leveldb`. * Back out "On Mobile phones, call GlobalInit with no arguments in predictor in case we need to perform initialization" Original commit changeset: 4489c6133f11 * Fix LARS bug Fixed a bug in the LARS implementation which caused all subsequent blobs not using LARS to have the LARS learning rate multiplier applied to them. * [tum] support sparse init & add uniformFill option as title * Propagate exception for async nets Capture the exception when an exception is thrown in async nets and re-throw it after wait(). This allows exceptions to be propagated up to the caller. This diff was a part of D7752068. We split the diff so that C2 core files changes are in a separate diff. * Automatic update of fbcode/onnx to 69894f207dfcd72d1e70497d387201cec327efbc Previous import was 403ccfbd0161c38f0834413d790bad0874afbf9a Included changes: - [69894f2](https://github.com/onnx/onnx/commit/69894f2): Use op schema.all tensor types in random like definitions (#865) <Scott McKay> - [b9d6b90](https://github.com/onnx/onnx/commit/b9d6b90): Clarify random like operators (#846) <Scott McKay> - [fc6b5fb](https://github.com/onnx/onnx/commit/fc6b5fb): Refactor shape inference implementation (#855) <anderspapitto> - [b7d8dc8](https://github.com/onnx/onnx/commit/b7d8dc8): fix cmake warning message (#863) <Eric S. Yu> - [f585c5d](https://github.com/onnx/onnx/commit/f585c5d): add pytorch-operator test for tile (#831) <Wenhao Hu> - [993fe70](https://github.com/onnx/onnx/commit/993fe70): add install step (#832) <Eric S. Yu> - [68bc26c](https://github.com/onnx/onnx/commit/68bc26c): add type inference for traditional ml ops except classifier ops. (#857) <Ke Zhang> - [9cc0cda](https://github.com/onnx/onnx/commit/9cc0cda): fix string representation of scalar types (#858) <G. Ramalingam> - [1078925](https://github.com/onnx/onnx/commit/1078925): fix y in pow test case to scalar (#852) <Wenhao Hu> - [c66fb6f](https://github.com/onnx/onnx/commit/c66fb6f): Add some math function shape inference (#845) <anderspapitto> - [ff667d1](https://github.com/onnx/onnx/commit/ff667d1): Refactor return type and docs for ONNXIFI_BACKEND_DIRECTX_ID (#853) <Marat Dukhan> - [11c6876](https://github.com/onnx/onnx/commit/11c6876): clear initializer names when clear initializer (#849) <Wenhao Hu> - [73c34ae](https://github.com/onnx/onnx/commit/73c34ae): Clarify FeatureVectorizer description. (#843) <Scott McKay> - [1befb9b](https://github.com/onnx/onnx/commit/1befb9b): Remove useless text in docs (#850) <Lu Fang> - [e84788f](https://github.com/onnx/onnx/commit/e84788f): Fix SELU attributes' default values (#839) <Lu Fang> - [ebac046](https://github.com/onnx/onnx/commit/ebac046): Add tile test case (#823) <Wenhao Hu> - [8b7a925](https://github.com/onnx/onnx/commit/8b7a925): a few more shape inference functions (#772) <anderspapitto> - [9718f42](https://github.com/onnx/onnx/commit/9718f42): Make the coefficient non optional for LinearClassifier (#836) <Jaliya Ekanayake> - [ef083d0](https://github.com/onnx/onnx/commit/ef083d0): Add save_tensor and load_tensor functions for Protos (#770) <Lu Fang> - [45ceb55](https://github.com/onnx/onnx/commit/45ceb55): Check if CMAKE_BUILD_TYPE set before project(). (#812) <Sergii Dymchenko> - [4b3d2b0](https://github.com/onnx/onnx/commit/4b3d2b0): [WIP] reenable shape inference tests (#834) <anderspapitto> - [22d17ee](https://github.com/onnx/onnx/commit/22d17ee): RNN tests: LSTM, GRU, SimpleRNN (#739) <Peyman Manikashani> - [de65b95](https://github.com/onnx/onnx/commit/de65b95): dimension denotation (#443) <Tian Jin> - [eccc76e](https://github.com/onnx/onnx/commit/eccc76e): fix field number issue in onnx operator proto and enable its build (#829) <Ke Zhang> - [d582beb](https://github.com/onnx/onnx/commit/d582beb): disable shape inference test to unbreak ci (#830) <Lu Fang> - [485b787](https://github.com/onnx/onnx/commit/485b787): function proto for composite op. (#802) <Ke Zhang> - [cd58928](https://github.com/onnx/onnx/commit/cd58928): specify defaults for attributes of Affine op (#820) <G. Ramalingam> - [7ee2cf9](https://github.com/onnx/onnx/commit/7ee2cf9): merge the dummy backend back into the main one (#743) <anderspapitto> - [1c03a5a](https://github.com/onnx/onnx/commit/1c03a5a): [Proposal] ONNX Interface for Framework Integration (previously ONNX Backend API) header and docs (#551) <Marat Dukhan> - [3769a98](https://github.com/onnx/onnx/commit/3769a98): Rename real model test case from VGG-16 to ZFNet (#821) <Lu Fang> * [C2]ReluN Op relu n op. tf reference: https://www.tensorflow.org/api_docs/python/tf/nn/relu6 * Call destructor when assigning a blob value * Add executor overrides Add executor overrides flag to enable migration to async_scheduling executor * Add barrier net that runs before training nets - attempt #2 Add a synchonize barrier net that is run before training nets. With this net, shards that are faster will wait for other shards before start training. This reduce chances of the faster shards timing out during GLOO AllReduce. Removed explicit data_parallel_model.py.synchronize call in holmes workflow. This change was landed previously but caused errors for some EDPM workflows - See https://fb.facebook.com/groups/1426530000692545/permalink/1906766366002237/ - because EDPM assumes any call to CreateOrCloneCommonWorld and Gloo ops are wrapped in exception handlers but in this case exception thrown in the barrier init net is not handled. To address this issue, we add _CreateOrCloneCommonWorld to the param_init_net instead of a new barrier init net. Since errors for param_init_net run is handled gracefully and re-rendezvous, it should fixes the problem. * Handle empty nets in async_scheduling Make sure we don't get stuck on empty nets * use CUDA_ARCH for conditional compile * [C2 fix] infer function for ensure_cpu_output_op * Update group_norm test to reduce flaky test * Fix lr_multiplier for GPU	2018-05-10 23:14:27 -07:00
Lu Fang	664fe34e0a	[Caffe2][fbcode=>GH sync] Update from facebook 4323b18ce13c (#7116 ) * [fix] Re-enable events in RNN ops We have earlier added event disabling in RNN ops as back then we didn't use events, with current use cases this is no longer true (https://fburl.com/8vd0lp8y) * use ops with cude impl * Revert D7729695: [caffe2][fix] Re-enable events in RNN ops This reverts commit 4b215c7496fb724656ff4c776933a15bdbbcde5e @bypass-lint An infra SEV is better than not reverting this diff. If you copy this password, see you in SEV Review! @cause_a_sev_many_files * [observer] Clean up observer_config.h #accept2ship * [1/n] Refactor dataio_test.py Replace code duplication with a common function * Add barrier net that runs before training nets Add a synchonize barrier net that is run before training nets. With this net, shards that are faster will wait for other shards before start training. This reduce chances of the faster shards timing out during GLOO AllReduce. Removed explicit data_parallel_model.py.synchronize call in holmes workflow. Similar change in speech/asr_training workflow will come in another diff. * Support the dnnlowp backend in caffe2_benchmark This is for SHARE operator latency evaluation * Migrate integral_image_op to main caffe2 migrate integral_image_op(GPU version) given by https://fburl.com/yvqezigi to caffe2/caffe2/operators and implement its CPU version. Write up a test using the hypothesis_test mechanism * [pos_disc, fbcode] Implement unjoined lr loss As explained in https://our.intern.facebook.com/intern/wiki/Model_Based_Calibration/, when the dataset is an joined data set, where labels might change later, we need to use unjoined logloss. The implementation is almost the same as in Sigrid (https://fburl.com/1trngsls), where loss = y (log(p) - log(1-p)) + (1-y)(log(1-p)) = xy - (1-y)x - (1-y)log(1+exp(-x)) For x < 0, to ensure stability and avoid overflow, we reformulate the above exp as loss = xy - (1-y)x - (1-y)x + (1-y)log(1+exp(x)) = xy + (1-y)log(1+exp(x)) Then the final expression becomes loss = xy + (y - 1) x (x >= 0) - (1 - y) log(1 + exp(x - 2 x (x >= 0))) where y is the true label, x is the dot product and p = logistic(x). This kind of implementation is align with the current implementation of the original cross entropy in https://phabricator.intern.facebook.com/diffusion/FBS/browse/master/fbcode/caffe2/caffe2/operators/cross_entropy_op.cc;0bae3b5d0f825897c5e0dd0ff10f489d7271bf25$7-13 * Keep the array to fix the conflict * [C2] Compute Adagrad effective LR The AdagradWithLR op outputs an extra blob which is contains the average effective learning rate across all weights in this blob. * Open-source extractMetaNetDef & runGlobalInitialization, add new Predictor constructor from db file, and add run_map_outputs 1. Open-source extractMetaNetDef and runGlobalInitialization, for use in 2. new Predictor constructor from db file. 3. Add new run function that returns outputs as TensorMap * Disable eigen cpu Disable eigen cpu in transpose and reduce * Introduce request_only/object_only property of ModelLayer by default this is False * A simple TC Caffe2 benchmark We can run tunner, get MappingOptions and then use them to compare against cuBLAS currently broken due to LLVM issues. How to run: hg checkout eec1ab31b59c03b8deded1c755a9abaf8c45be01 add D7401202 add D7434625 add D7506031 add D7540728 buck run @mode/dev-nosan tc/tc/benchmarks_python:caffe2_benchmark * Move Caffe2 feature_maps_ops to open source Need feature maps operators in open source project facebookresearch/BlueWhale * Manually fix the conflicts in channel shuffle op * Fix the inconsistency between different gh and fbcode * Skip Adagrad GPU Test (Because some gpu implementation is missing) * Fix another test to make sure it won't run on gpu when implementation is not available yet	2018-05-01 20:49:00 -07:00

1 2

98 Commits