pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 00:21:07 +01:00

Author	SHA1	Message	Date
Masaki Kozuki	7a6c4d0c50	[mta] APEX style Fused Adam (#81705 ) This PR implements an APEX style FusedAdam in PyTorch. This is different from the APEX one in that this is compatible with `torch.cuda.amp.GradScaler` by setting `_step_supports_amp_scaling` to `True` and unscales gradients inside its CUDA kernel. related: https://github.com/pytorch/pytorch/issues/68041, https://github.com/pytorch/pytorch/issues/71274, https://github.com/pytorch/pytorch/issues/80167 possibly related to https://github.com/pytorch/pytorch/issues/80595#issuecomment-1178519436 cc @ptrblck @ngimel Pull Request resolved: https://github.com/pytorch/pytorch/pull/81705 Approved by: https://github.com/ngimel	2022-09-20 17:18:33 +00:00
kshitij12345	faac3dbce2	[optim] asgd : handle complex params as independent real params (#84472 ) Ref: #65711 Pull Request resolved: https://github.com/pytorch/pytorch/pull/84472 Approved by: https://github.com/Lezcano, https://github.com/soulitzer	2022-09-06 16:58:42 +00:00
Chen, Jian Ping	e72256604f	Enhance add_out_dense_sparse_cpu for hybrid sparse tensor (#23057) This improves performance for hybrid sparse COO tensors on the CPU path. This case appears in the DLRM terabyte test. With this fix, according to the previous performance test data, DLRM execution is ~10x faster. Without this fix, DLRM reports: Finished training it 100/1000 of epoch 0, 2969.25 ms/it, loss 0.220505, accuracy 0.000 %. With this fix, DLRM reports: Finished training it 100/1000 of epoch 0, 270.71 ms/it, loss 0.220505, accuracy 0.000 %. Pull Request resolved: https://github.com/pytorch/pytorch/pull/23057 Approved by: https://github.com/VitalyFedyunin, https://github.com/malfet	2022-08-24 22:42:53 +00:00
kshitij12345	7c20ad3dfa	[optim] rprop: handle complex params as independent real params (#83858 ) Ref #65711 Pull Request resolved: https://github.com/pytorch/pytorch/pull/83858 Approved by: https://github.com/albanD	2022-08-23 08:39:35 +00:00
Kshiteej K	09331c947c	[optim] rmsprop: handle complex params as independent real params (#83860 ) Ref: #65711 Pull Request resolved: https://github.com/pytorch/pytorch/pull/83860 Approved by: https://github.com/albanD	2022-08-22 21:55:01 +00:00
Emilio Castillo	f0eb841d20	Make `torch.optim.RMSprop` differentiable (#83578 ) Blocked by #82205 Pull Request resolved: https://github.com/pytorch/pytorch/pull/83578 Approved by: https://github.com/albanD	2022-08-22 03:37:10 +00:00
Emilio Castillo	5aab57e112	Make Adam optimizer differentiable (#82205 ) Continues [80938](https://github.com/pytorch/pytorch/pull/80938) Pull Request resolved: https://github.com/pytorch/pytorch/pull/82205 Approved by: https://github.com/albanD	2022-08-17 07:20:37 +00:00
Rob Zinkov	ff75562cff	Adding maximize to rprop (#81864) Adds the maximize flag (#68052) to the rprop optimizer and updates the respective tests. Pull Request resolved: https://github.com/pytorch/pytorch/pull/81864 Approved by: https://github.com/albanD	2022-08-16 08:19:46 +00:00
Federico Pozzi	f8a10a7f79	feat: add PolynomialLR scheduler (#82769) ### Description Add PolynomialLR scheduler. ### Issue Closes #79511. ### Testing I added tests for PolynomialLR. Pull Request resolved: https://github.com/pytorch/pytorch/pull/82769 Approved by: https://github.com/datumbox	2022-08-10 18:21:00 +00:00
Rob Zinkov	c54d18dbc7	Handle complex optimization in Adamax by treating complex numbers as 2D real numbers (#80319 ) This commit partially addresses #65711 Pull Request resolved: https://github.com/pytorch/pytorch/pull/80319 Approved by: https://github.com/albanD	2022-08-05 21:03:18 +00:00
Rob Zinkov	dcbe9ce2ad	Handle complex optimization in AdamW by treating complex numbers as 2D real numbers (#80280 ) This commit partially addresses #65711 Pull Request resolved: https://github.com/pytorch/pytorch/pull/80280 Approved by: https://github.com/albanD	2022-08-05 13:47:14 +00:00
Mikayla Gawarecki	b1922e03ab	Test that multi_tensor optimizer state buffers match with single_tensor state buffers (#81894 ) Add testing for state of multitensor optimizers suggested in #78807 (previously only the equality of model parameters after a few optimizer steps was being tested) Pull Request resolved: https://github.com/pytorch/pytorch/pull/81894 Approved by: https://github.com/albanD	2022-08-04 23:23:17 +00:00
Rob Zinkov	f9ef363982	Modifying Adam to support complex numbers as 2d real numbers (#80279 ) This commit addresses issues in #65711 Pull Request resolved: https://github.com/pytorch/pytorch/pull/80279 Approved by: https://github.com/albanD	2022-07-27 18:39:40 +00:00
Emilio Castillo	49b4f45781	Add initial support for differentiable optimizers (#80938 ) Adds the `differentiable` argument, a method for updating parameters in an existing optimizer, and a template for testing the differentiability of multiple optimizers. This is all based in discussions with @albanD & @jbschlosser Pull Request resolved: https://github.com/pytorch/pytorch/pull/80938 Approved by: https://github.com/albanD	2022-07-25 13:37:08 +00:00
Rob Zinkov	50c655d5e3	Adding maximize to ASGD (#81875) Adds the maximize flag (#68052) to the ASGD optimizer and updates the respective tests. Pull Request resolved: https://github.com/pytorch/pytorch/pull/81875 Approved by: https://github.com/albanD	2022-07-22 17:05:41 +00:00
PyTorch MergeBot	135af0fe30	Revert "Adding maximize to ASGD (#80323 )" This reverts commit `14bd5bd6ee`. Reverted https://github.com/pytorch/pytorch/pull/80323 on behalf of https://github.com/albanD due to Broke rocm test	2022-07-08 13:35:31 +00:00
PyTorch MergeBot	0b8a5ca01b	Revert "Adding maximize to rprop (#80335 )" This reverts commit `495aa9bc3a`. Reverted https://github.com/pytorch/pytorch/pull/80335 on behalf of https://github.com/albanD due to Broke rocm and windows test	2022-07-08 13:34:02 +00:00
Rob Zinkov	f24c94d7ae	Adding maximize to SparseAdam (#80336) Adds the maximize flag (#68052) to the SparseAdam optimizer and updates the respective tests. Pull Request resolved: https://github.com/pytorch/pytorch/pull/80336 Approved by: https://github.com/albanD	2022-07-08 12:17:27 +00:00
Rob Zinkov	495aa9bc3a	Adding maximize to rprop (#80335) Adds the maximize flag (#68052) to the rprop optimizer and updates the respective tests. Pull Request resolved: https://github.com/pytorch/pytorch/pull/80335 Approved by: https://github.com/albanD	2022-07-08 08:04:38 +00:00
Rob Zinkov	a1fd5b4273	Adding maximize to RMSprop (#80326) Adds the maximize flag (#68052) to the RMSprop optimizer and updates the respective tests. Pull Request resolved: https://github.com/pytorch/pytorch/pull/80326 Approved by: https://github.com/albanD	2022-07-08 08:04:26 +00:00
Rob Zinkov	14bd5bd6ee	Adding maximize to ASGD (#80323) Adds the maximize flag (#68052) to the ASGD optimizer and updates the respective tests. Pull Request resolved: https://github.com/pytorch/pytorch/pull/80323 Approved by: https://github.com/albanD	2022-07-08 08:03:36 +00:00
Antonio Kim	765b6a8fab	Fix SequentialLR initialization (#72856) When multiple learning rate schedulers are used, the order in which they are initialized was not being taken into account. This is a problem if they are initialized in sequential order (as one might intuitively do). Each scheduler calls `step()` on initialization and sets the `lr` in its optimizer's `param_groups`. However, this means that step 0 will use the `lr` that was set by the very last scheduler (in the case of initializing schedulers sequentially) instead of the first scheduler. The fix in this PR addresses the above bug by performing a call to the appropriate scheduler on initialization after decrementing the `last_epoch` values in order to keep them the same post-step. This ensures that the correct scheduler is the one setting the `lr` values for the optimizer's `param_groups`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/72856 Approved by: https://github.com/jbschlosser	2022-06-21 20:21:13 +00:00
Rob Zinkov	2a496e2f80	Adding maximize to Adamax (#77409) Adds the maximize flag (#68052) to the Adamax optimizer and updates the respective tests. Pull Request resolved: https://github.com/pytorch/pytorch/pull/77409 Approved by: https://github.com/albanD	2022-05-16 17:34:44 +00:00
Emilio Castillo	e5ee6f5cf7	Fix `CosineAnnealingLR` on restart Fixes #60265 The initial LR for this scheduler is not consistent when a new instance is created with `last_epoch != -1` Maybe we can refactor the testing code to test `last_epoch != -1` in schedulers that can recreate their state from the current epoch? Pull Request resolved: https://github.com/pytorch/pytorch/pull/60339 Approved by: https://github.com/albanD	2022-04-20 13:35:01 +00:00
Rob Zinkov	6642e88ad2	Adding maximize flag to Adagrad This adds maximize to Adagrad (#68052) and updates the respective tests. Pull Request resolved: https://github.com/pytorch/pytorch/pull/75968 Approved by: https://github.com/albanD	2022-04-20 08:29:03 +00:00
arindamroy-eng	7478ce187a	ROCM:Unskip more tests for ROCM5.0 Re-enabling more tests which are working on ROCM5.0 Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/75353 Approved by: https://github.com/ezyang	2022-04-19 19:45:55 +00:00
francescocastelli	58a44523c1	Add maximize flag to Adadelta Added the maximize flag to Adadelta optimizer (#68052) and adjusted tests to take maximize into account. Pull Request resolved: https://github.com/pytorch/pytorch/pull/75330 Approved by: https://github.com/cpuhrsch	2022-04-08 20:32:35 +00:00
Mikayla Gawarecki	10bb0ffe69	Fix casting bug in state_step for optimizers when loading state dict Pull Request resolved: https://github.com/pytorch/pytorch/pull/75214 Approved by: https://github.com/albanD	2022-04-05 01:27:18 +00:00
Mikayla Gawarecki	03662b32d5	Uncomment step no-op test in test_optim (#70953 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70953 Test Plan: Imported from OSS Reviewed By: anjali411 Differential Revision: D33767861 Pulled By: mikaylagawarecki fbshipit-source-id: 8b41c8ee5d0e045436b10da5f68e9d5c5852c334 (cherry picked from commit `9224afc453`)	2022-02-15 18:02:08 +00:00
Mikayla Gawarecki	8e8d170674	Optim foreach cleanup for Adadelta (#69980) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69980 - Merged `torch/optim/adadelta.py` and `torch/optim/_multi_tensor/adadelta.py` into `torch/optim/adadelta.py` - Moved adadelta functional forms from `torch/optim/_functional.py` and `torch/optim/_multi_tensor/_functional.py` to `torch/optim/adadelta.py` - `torch/optim/_functional.py` just imports from `torch/optim/adadelta.py` - Added a test `test_optimizers_foreach_flag` which replicates `test_multi_tensor_optimizers` in `test/test_optim.py` - Added a test `test_adadelta_new` that replicates the behavior of `test_adadelta` but with the `foreach` flag instead of using the multitensor adadelta class. If we delete `_multi_tensor/` we could replace `test_adadelta` with this Remaining TODO: - [ ] single_tensor adadelta supports complex but multitensor does not, need to integrate the singletensor logic in multitensor and switch the `test_adadelta_complex` to test for foreach in [True, False] Test Plan: Imported from OSS Reviewed By: VitalyFedyunin, albanD Differential Revision: D33413059 Pulled By: mikaylagawarecki fbshipit-source-id: 92a9fa98705762bb6bd464261671e49aef40070e (cherry picked from commit `a008227d22`)	2022-02-09 16:52:12 +00:00
Mikayla Gawarecki	7176c92687	[optim] update step in functional and pass state_steps instead of state (#71333) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/71333 Updated: - Adagrad - Adamax - Adam - AdamW - RAdam. Changes: make multi_tensor functionals take `state_steps: List[Tensor]` instead of `states: List[Dict]`; change `state_steps: List[int]` to `state_steps: List[Tensor]`, where each entry is a singleton tensor, so that step can be updated within the functional. (NAdam and ASGD were updated in separate diffs to fold their handling of state into the functionals.) Test Plan: Imported from OSS Reviewed By: anjali411 Differential Revision: D33767872 Pulled By: mikaylagawarecki fbshipit-source-id: 9baa7cafb6375eab839917df9287c65a437891f2 (cherry picked from commit `831c02b3d0`)	2022-02-08 16:51:19 +00:00
Rishi Puri	b066931106	fixing of usage of rel_tol for test adadelta (#71880 ) Summary: Recently I made a PR to change some test tolerances: https://github.com/pytorch/pytorch/pull/69919 It turns out that the previous decorator does not work with the test optim unit test framework. I have summarized the issue in the following doc: https://docs.google.com/document/d/1BOrp29r31A2WXwM0O6ydsCs43wi01sAgdduKd7is_ec/edit?usp=sharing Pull Request resolved: https://github.com/pytorch/pytorch/pull/71880 Reviewed By: cpuhrsch Differential Revision: D33801967 Pulled By: jbschlosser fbshipit-source-id: 094feba10e2ee2a94e3ab754e4140e16b634ea09 (cherry picked from commit `d504ddd950`)	2022-01-26 23:33:28 +00:00
Prabhat Roy	942a084c46	Remove state_dict from AveragedModel and use buffers instead (#71763 ) Summary: Fixes [https://github.com/pytorch/pytorch/issues/66686](https://github.com/pytorch/pytorch/issues/66686) Pull Request resolved: https://github.com/pytorch/pytorch/pull/71763 Reviewed By: anjali411 Differential Revision: D33770907 Pulled By: prabhat00155 fbshipit-source-id: ee32f2cb8475c9add4e1a9a5d3d784ef95825efc (cherry picked from commit `a15898b072`)	2022-01-26 13:31:30 +00:00
Jake Tae	a4196a9abf	Remove unused `optimizers` variable in test (#70668 ) Summary: In `TestLRScheduler._test()`, an unused variable `optimizers` is created. This PR is a minor refactoring that removes the variable and the loop block that populates the set. Pull Request resolved: https://github.com/pytorch/pytorch/pull/70668 Reviewed By: wenleix Differential Revision: D33586236 Pulled By: albanD fbshipit-source-id: cabf870a8221f144df9d3e2f2b564cdc5c255f5a	2022-01-14 11:59:49 -08:00
Alban Desmaison	e1b84e1b6b	fix loading of older models that don't have maximize (#71023 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/71023 Reviewed By: jbschlosser Differential Revision: D33483687 Pulled By: albanD fbshipit-source-id: 2f3c6e97a9579be9ba15eca0756fc1f2c466fbb6	2022-01-10 06:01:24 -08:00
Jake Tae	dd1121435b	SequentialLR update _last_lr on step (#70558 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/68956. Pull Request resolved: https://github.com/pytorch/pytorch/pull/70558 Reviewed By: dagitses Differential Revision: D33430213 Pulled By: albanD fbshipit-source-id: 446f182610de32db224d55b244d76c3076e8080f	2022-01-07 10:36:35 -08:00
Rishi Puri	f9e1a1c97f	Increase tolerance for test_adadelta (#69919 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/69698 Pull Request resolved: https://github.com/pytorch/pytorch/pull/69919 Reviewed By: cpuhrsch Differential Revision: D33286427 Pulled By: jbschlosser fbshipit-source-id: a2ca90683c14b6669f9b1804881ac675ba925fc5	2022-01-05 15:02:10 -08:00
Adnios	a9c7d626e1	Add the `maximize` flag to AdamW (#70146 ) Summary: Related issue: https://github.com/pytorch/pytorch/issues/68052 cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang Pull Request resolved: https://github.com/pytorch/pytorch/pull/70146 Reviewed By: malfet Differential Revision: D33254561 Pulled By: albanD fbshipit-source-id: f190c836a4162f936c5953e076747c345df21421	2021-12-23 09:20:29 -08:00
oliver	3d358a7678	Adds a `maximize` flag to Adam (#68164 ) Summary: Solves the next most important use case in https://github.com/pytorch/pytorch/issues/68052. I have kept the style as close to that in SGD as seemed reasonable, given the slight differences in their internal implementations. All feedback welcome! cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang Pull Request resolved: https://github.com/pytorch/pytorch/pull/68164 Reviewed By: VitalyFedyunin Differential Revision: D32994129 Pulled By: albanD fbshipit-source-id: 65c57c3f3dbbd3e3e5338d51def54482503e8850	2021-12-13 05:53:53 -08:00
Kurt Mohler	52219b1017	Fix `ChainedScheduler.get_last_lr()` (#69112 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/68820 cc vincentqb jbschlosser albanD Pull Request resolved: https://github.com/pytorch/pytorch/pull/69112 Reviewed By: zou3519 Differential Revision: D32796626 Pulled By: albanD fbshipit-source-id: bde9d4e473527be4c0a7f21cb57f795a67a99eaa	2021-12-02 13:44:12 -08:00
oliver	94b6fa6f8b	Adds an optimizer instance variable to ChainedScheduler (#68010) Summary: Fixes https://github.com/pytorch/pytorch/issues/67601. As simple a fix as I could make it. I even managed to delete some testing code! I checked calling `super()` and, as I had feared, it doesn't work out of the box, so perhaps that ought to be revisited later. As it stands, https://github.com/pytorch/pytorch/issues/20124 still applies to the chained scheduler, but I think this change is still an improvement. Pull Request resolved: https://github.com/pytorch/pytorch/pull/68010 Reviewed By: zou3519 Differential Revision: D32278139 Pulled By: albanD fbshipit-source-id: 4c6f9f1b2822affdf63a6d22ddfdbcb1c6afd579	2021-11-10 01:31:47 -08:00
oliver	f8297d40fc	Adds a `maximize` flag to SGD. (#67847 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/46480 -- for SGD. ## Notes: - I have modified the existing tests to take a new `constructor_accepts_maximize` flag. When this is set to true, the ` _test_basic_cases_template` function will test both maximizing and minimizing the sample function. - This was the clearest way I could think of testing the changes -- I would appreciate feedback on this strategy. ## Work to be done: [] I need to update the docs. Pull Request resolved: https://github.com/pytorch/pytorch/pull/67847 Reviewed By: H-Huang Differential Revision: D32252631 Pulled By: albanD fbshipit-source-id: 27915a3cc2d18b7e4d17bfc2d666fe7d2cfdf9a4	2021-11-09 00:43:07 -08:00
oliver	b3770766c4	Fixes deprecation warnings in `test_optim.py` (#67954 ) Summary: Catches deprecation warnings when we call `scheduler.step(epoch)` in tests. Removes duplicate parameters to optimizers unless we are specifically testing for that Fixes https://github.com/pytorch/pytorch/issues/67696 There is one warning remaining when I run this locally -- however that is due to the implementation of the `SequentialLR` Scheduler. I will open a new issue relating to that. Pull Request resolved: https://github.com/pytorch/pytorch/pull/67954 Reviewed By: H-Huang Differential Revision: D32244056 Pulled By: albanD fbshipit-source-id: 2ab3086a58e10c8d29809ccbaab80606a1ec61d8	2021-11-08 09:36:08 -08:00
oliver	53ebccbe78	Fix warnings produced when running test_optim.py (#67756) Summary: Fixes part of https://github.com/pytorch/pytorch/issues/67696 by adding calls to `optimizer.step()` in various places. ## Notes for reviewers: - It is not entirely clear which is the right optimizer to step in each case. I have favoured the more explicit approach of creating a set of optimizers and calling step on each of them. - At the time of writing, the only Scheduler without an `optimizer` instance variable is `ChainedScheduler`, which I need to deal with once. I use `hasattr` to do this check. Let me know if this ought to be changed. - I am opening this PR for review when it only solves part of the issue, as I'd rather get feedback sooner. I think it is fine to fix the issue in several PRs too. Pull Request resolved: https://github.com/pytorch/pytorch/pull/67756 Reviewed By: jbschlosser Differential Revision: D32187864 Pulled By: albanD fbshipit-source-id: fd0d133bcaa3a24588e5a997ad198fdf5879ff5a	2021-11-05 07:12:13 -07:00
Christopher Gray Howard	dfa7225a38	[Pytorch][Bootcamp] Add fix and testing for non-vectorized Adadelta optimizer to handle complex numbers (#66587 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66587 Made some changes in the step function of the non-vectorized Adadelta optimizer to handle complex numbers as two real numbers as per 65711 on github ghstack-source-id: 141484731 Test Plan: buck test mode/dev caffe2/test:optim -- 'test_adadelta_complex' https://pxl.cl/1R7kJ Reviewed By: albanD Differential Revision: D31630069 fbshipit-source-id: 2741177b837960538ce39772897af36bbce7b7d8	2021-10-26 17:35:01 -07:00
Christopher Gray Howard	acb340de75	[Pytorch][Bootcamp] Add fixes and vanilla testing for Adagrad non-vectorized and vectorized optimizers to handle complex numbers (#66671 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66671 Made changes in the step function of the vectorized and non-vectorized adagrad optimizers to handle complex numbers as two real numbers as per 65711 on github ghstack-source-id: 141442350 Test Plan: buck test mode/dev caffe2/test:optim -- 'test_adagrad_complex' https://pxl.cl/1Rd44 Reviewed By: albanD Differential Revision: D31673503 fbshipit-source-id: 90a0d0c69b556716e2d17c59ce80f09c750fc464	2021-10-25 10:13:21 -07:00
Jane Xu	fd608cd313	[skip ci] Set test owners for optim tests (#66861 ) Summary: Action following https://github.com/pytorch/pytorch/issues/66232 cc vincentqb jbschlosser albanD Pull Request resolved: https://github.com/pytorch/pytorch/pull/66861 Reviewed By: albanD Differential Revision: D31761369 Pulled By: janeyx99 fbshipit-source-id: 57829e1f1509fc2af321530a4b55c9d33b7fb150	2021-10-19 08:39:35 -07:00
Christopher Gray Howard	87df043f63	[Bootcamp][Pytorch]Add testing for complex parameters in Adagrad optimizer (#66501 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66501 Add testing for the Adagrad optimizer to ensure that it behaves as if complex numbers are two real numbers in R^2 as per issue 65711 on github ghstack-source-id: 140414042 Test Plan: buck test mode/dev caffe2/test:optim -- 'test_adagrad_complex' https://pxl.cl/1R27M Reviewed By: albanD Differential Revision: D31584240 fbshipit-source-id: 5c9938084566b8ea49cc8ff002789731f62fe87e	2021-10-13 07:05:20 -07:00
Mikayla Gawarecki	0e2d1b221a	[Bootcamp][Pytorch Core] Add testing for complex non-vanilla SGD Summary: Adding test to ensure non-Vanilla SGD behaves as if complex numbers are two real numbers in R^2 as per issue 65711 on github Test Plan: ```buck test mode/dev caffe2/test:optim -- 'test_sgd_complex'``` https://pxl.cl/1QLxw Reviewed By: albanD Differential Revision: D31477212 fbshipit-source-id: 500678e561a05ac96759223b4c87a37cab26c6a6	2021-10-07 14:07:39 -07:00
Mikayla Gawarecki	1e4bcbdddb	[Bootcamp][Pytorch Core] Add test for complex numbers for vanilla SGD (#66230 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66230 Adding test to ensure Vanilla SGD behaves as if complex numbers are two real numbers in R^2 as per issue 65711 on github https://github.com/pytorch/pytorch/issues/65711 ghstack-source-id: 139918862 Test Plan: ```buck test mode/dev caffe2/test:optim -- 'test_sgd_complex'``` https://pxl.cl/1QHvX Reviewed By: albanD Differential Revision: D31449289 fbshipit-source-id: da8b00421085796a23b643e73f96b19b5b560a32	2021-10-07 07:14:05 -07:00
Prabhat Roy	2ea724b1fd	Added option to update parameters using state_dict in AveragedModel (#65495 ) Summary: While implementing [EMA](https://github.com/pytorch/vision/pull/4381)(which extends AveragedModel) in torchvision, update_parameters() from AveragedModel could not be used as it did not handle state_dict(), so a custom update_parameters() needed to be defined in [EMA class](https://github.com/pytorch/vision/pull/4406). This PR aims to handle this scenario removing the need for this custom update_parameters() implementation. Discussion: https://github.com/pytorch/vision/pull/4406#pullrequestreview-753734102 Pull Request resolved: https://github.com/pytorch/pytorch/pull/65495 Reviewed By: datumbox Differential Revision: D31176742 Pulled By: prabhat00155 fbshipit-source-id: 326d14876018f21cf602bab5eaba344678dbabe2	2021-09-28 03:34:49 -07:00
Ilqar Ramazanli	2b41bf40c5	To add SequentialLR to PyTorch Core Schedulers (#64037) Summary: Partially resolves https://github.com/pytorch/vision/issues/4281 This PR proposes a new scheduler, SequentialLR, which takes a list of schedulers and calls each one during a different period of the training process. The main motivation is the recent popularity of a warm-up phase at the start of training: it has been shown that taking small steps in the initial stages of training can speed up convergence. SequentialLR mainly makes it possible to use a small constant (or linearly increasing) learning rate followed by the actual target learning rate scheduler. ```python scheduler1 = ConstantLR(optimizer, factor=0.1, total_iters=2) scheduler2 = ExponentialLR(optimizer, gamma=0.9) scheduler = SequentialLR(optimizer, schedulers=[scheduler1, scheduler2], milestones=[5]) for epoch in range(100): train(...) validate(...) scheduler.step() ``` This code snippet calls `ConstantLR` in the first 5 epochs and follows up with `ExponentialLR` in the following epochs. The scheduler can be used to run any group of schedulers one after another. The main consideration is that every time we switch to a new scheduler, we assume the new scheduler starts from the beginning, i.e. its zeroth epoch. We also add Chained Scheduler to the `optim.rst` and `lr_scheduler.pyi` files here. Pull Request resolved: https://github.com/pytorch/pytorch/pull/64037 Reviewed By: albanD Differential Revision: D30841099 Pulled By: iramazanli fbshipit-source-id: 94f7d352066ee108eef8cda5f0dcb07f4d371751	2021-09-09 09:36:32 -07:00
Ilqar Ramazanli	f767cf6683	To change WarmUp Scheduler with ConstantLR and LinearLR (#64395) Summary: Partially unblocks https://github.com/pytorch/vision/issues/4281 Previously, WarmUp schedulers were added to PyTorch Core in https://github.com/pytorch/pytorch/pull/60836, with two modes of execution, linear and constant, depending on the warm-up function. This PR changes the interface to a more direct form by splitting the linear and constant modes into separate schedulers. In particular ```Python scheduler1 = WarmUpLR(optimizer, warmup_factor=0.1, warmup_iters=5, warmup_method="constant") scheduler2 = WarmUpLR(optimizer, warmup_factor=0.1, warmup_iters=5, warmup_method="linear") ``` will look like ```Python scheduler1 = ConstantLR(optimizer, warmup_factor=0.1, warmup_iters=5) scheduler2 = LinearLR(optimizer, warmup_factor=0.1, warmup_iters=5) ``` correspondingly. Pull Request resolved: https://github.com/pytorch/pytorch/pull/64395 Reviewed By: datumbox Differential Revision: D30753688 Pulled By: iramazanli fbshipit-source-id: e47f86d12033f80982ddf1faf5b46873adb4f324	2021-09-07 08:42:31 -07:00
Ilqar Ramazanli	5a12cb611f	To add Chained Scheduler to the list of PyTorch schedulers. (#63491) Summary: This PR introduces ChainedScheduler, which was initially proposed in the discussion https://github.com/pytorch/pytorch/pull/26423#discussion_r329976246 . The idea is to provide a user-friendly chaining method for schedulers, especially for cases where many of them are involved and we want a clean, easy-to-read interface. This method will be even more crucial once CompositeSchedulers and schedulers for different types of parameters are involved. The immediate application of Chained Scheduler is expected to be in the TorchVision library, to combine WarmUpLR and MultiStepLR https://github.com/pytorch/vision/blob/master/references/video_classification/scheduler.py#L5 . However, the method can be expected to apply in many other use cases as well. ### Example The usage is as simple as below: ```python sched=ChainedScheduler([ExponentialLR(self.opt, gamma=0.9), WarmUpLR(self.opt, warmup_factor=0.2, warmup_iters=4, warmup_method="constant"), StepLR(self.opt, gamma=0.1, step_size=3)]) ``` Then calling ```python sched.step() ``` would trigger the step function of all three schedulers consecutively. Partially resolves https://github.com/pytorch/vision/issues/4281 Pull Request resolved: https://github.com/pytorch/pytorch/pull/63491 Reviewed By: datumbox, mruberry Differential Revision: D30576180 Pulled By: iramazanli fbshipit-source-id: b43f0749f55faab25079641b7d91c21a891a87e4	2021-08-26 13:30:21 -07:00
Ilqar Ramazanli	e7c4988b52	To fix the chainability at epoch zero for some schedulers (#63457) Summary: As discussed in https://github.com/pytorch/pytorch/pull/60836#issuecomment-899084092, we have observed an obstacle to chaining some types of learning rate schedulers. In particular we observed * some of the learning rate schedulers return the initial learning rates at epoch 0 as ``` return self.base_lrs ``` * This can be a problem when two schedulers are chained as ``` scheduler1.step() scheduler2.step() ``` in particular, we completely ignore the effect of scheduler1 at epoch 0. This would not be an issue if scheduler1 had no effect at epoch 0, as is the case for many schedulers; however, for schedulers such as WarmUp schedulers, whose multiplicative value at epoch 0 is smaller than 1, this could lead to undesired behavior. The following code snippet illustrates the problem better ## Reproducing the bug ```python import torch from torch.nn import Parameter from torch.optim import SGD from torch.optim.lr_scheduler import WarmUpLR, ExponentialLR model = [Parameter(torch.randn(2, 2, requires_grad=True))] optimizer = SGD(model, 1.0) scheduler1 = WarmUpLR(optimizer, warmup_factor=0.1, warmup_iters=5, warmup_method="constant") scheduler2 = ExponentialLR(optimizer, gamma=0.9) for epoch in range(10): print(epoch, scheduler2.get_last_lr()[0]) optimizer.step() scheduler1.step() scheduler2.step() ``` ### Current Result ``` 0 1.0 1 0.9 2 0.81 3 0.7290000000000001 4 0.6561000000000001 5 5.904900000000001 6 5.314410000000001 7 4.782969000000001 8 4.304672100000001 9 3.874204890000001 ``` ### Expected Result ``` 0 1.0 1 0.9 2 0.81 3 0.7290000000000001 4 0.6561000000000001 5 0.5904900000000001 6 0.5314410000000001 7 0.4782969000000001 8 0.4304672100000001 9 0.3874204890000001 ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/63457 Reviewed By: datumbox Differential Revision: D30424160 Pulled By: iramazanli fbshipit-source-id: 3e15af8d278c872cd6f53406b55f4d3ce5002867	2021-08-19 07:17:03 -07:00
Ilqar Ramazanli	cec08e7032	To add warm-up scheduler to optim (#60836 ) Summary: Learning rate warm-up was initially discussed by Priya Goyal et al. in the paper https://arxiv.org/pdf/1706.02677.pdf . In section 2.2 of the paper they proposed warming up the learning rate in order to prevent large variance / noise in the learning rate. The idea has been discussed further in the following papers:
* Akilesh Gotmare et al. https://arxiv.org/abs/1810.13243
* Bernstein et al. http://proceedings.mlr.press/v80/bernstein18a/bernstein18a.pdf
* Liyuan Liu et al. https://arxiv.org/pdf/1908.03265.pdf

There are two popular types of learning rate warm-up:
* Constant warmup (start with a very small constant learning rate)
* Linear warmup (start with a small learning rate and gradually increase it)

In this PR we add warm-up as a learning rate scheduler. Note that learning rate schedulers are chainable, which means that we can combine the warm-up scheduler with any other learning rate scheduler to build a more sophisticated schedule.
## Linear Warmup
Linear warmup multiplies the learning rate by a pre-defined constant, warmup_factor, in the first epoch (epoch 0), then increases this multiplier to one over warmup_iters epochs. Hence the multiplier at the i-th step is
`warmup_factor + (1 - warmup_factor) * i / warmup_iters`
Moreover, the ratio of this quantity at step i to its value at step i-1 is
`1 + (1.0 - warmup_factor) / [warmup_iters * warmup_factor + (i - 1) * (1 - warmup_factor)]`
which is used in the get_lr() method in our implementation. Below is an example of how to use the linear warmup scheduler and how it behaves.
```python
import torch
from torch.nn import Parameter
from torch.optim import SGD
from torch.optim.lr_scheduler import WarmUpLR

model = [Parameter(torch.randn(2, 2, requires_grad=True))]
optimizer = SGD(model, 0.1)
scheduler = WarmUpLR(optimizer, warmup_factor=0.1, warmup_iters=10, warmup_method="linear")

for epoch in range(15):
    print(epoch, scheduler.get_last_lr()[0])
    optimizer.step()
    scheduler.step()
```
```
0 0.010000000000000002
1 0.019000000000000003
2 0.028000000000000008
3 0.03700000000000001
4 0.04600000000000001
5 0.055000000000000014
6 0.06400000000000002
7 0.07300000000000002
8 0.08200000000000003
9 0.09100000000000004
10 0.10000000000000005
11 0.10000000000000005
12 0.10000000000000005
13 0.10000000000000005
14 0.10000000000000005
```
## Constant Warmup
Constant warmup is straightforward: multiply the learning rate by warmup_factor until epoch warmup_iters is reached, then leave it unchanged for the following epochs.
```python
import torch
from torch.nn import Parameter
from torch.optim import SGD
from torch.optim.lr_scheduler import WarmUpLR

model = [Parameter(torch.randn(2, 2, requires_grad=True))]
optimizer = SGD(model, 0.1)
scheduler = WarmUpLR(optimizer, warmup_factor=0.1, warmup_iters=5, warmup_method="constant")

for epoch in range(10):
    print(epoch, scheduler.get_last_lr()[0])
    optimizer.step()
    scheduler.step()
```
```
0 0.010000000000000002
1 0.010000000000000002
2 0.010000000000000002
3 0.010000000000000002
4 0.010000000000000002
5 0.10000000000000002
6 0.10000000000000002
7 0.10000000000000002
8 0.10000000000000002
9 0.10000000000000002
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60836 Reviewed By: saketh-are Differential Revision: D29537615 Pulled By: iramazanli fbshipit-source-id: d910946027acc52663b301f9c56ade686e62cb69	2021-08-15 12:31:45 -07:00
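The linear warmup arithmetic described in the commit above can be checked without PyTorch. The following is a minimal sketch, not the scheduler's actual code: the function names `linear_warmup_factor`, `linear_warmup_ratio` and `warmup_schedule` are hypothetical, and the only inputs taken from the commit message are the two formulas and the example values (base learning rate 0.1, warmup_factor 0.1, warmup_iters 10).

```python
def linear_warmup_factor(i, warmup_factor, warmup_iters):
    """Closed-form multiplier applied to the base learning rate at epoch i."""
    if i >= warmup_iters:
        return 1.0
    return warmup_factor + (1.0 - warmup_factor) * i / warmup_iters


def linear_warmup_ratio(i, warmup_factor, warmup_iters):
    """Ratio factor(i) / factor(i - 1), valid for 1 <= i <= warmup_iters."""
    return 1.0 + (1.0 - warmup_factor) / (
        warmup_iters * warmup_factor + (i - 1) * (1.0 - warmup_factor)
    )


def warmup_schedule(base_lr, warmup_factor, warmup_iters, epochs):
    """Build the learning rates recursively, applying one ratio per step."""
    lr = base_lr * warmup_factor
    lrs = [lr]
    for i in range(1, epochs):
        if i <= warmup_iters:
            lr *= linear_warmup_ratio(i, warmup_factor, warmup_iters)
        lrs.append(lr)
    return lrs


# Same settings as the linear warmup example in the commit message.
lrs = warmup_schedule(base_lr=0.1, warmup_factor=0.1, warmup_iters=10, epochs=15)
for epoch, lr in enumerate(lrs):
    closed_form = 0.1 * linear_warmup_factor(epoch, 0.1, 10)
    print(epoch, round(lr, 6), round(closed_form, 6))
```

Both columns agree up to floating-point rounding, which is the point of the ratio form: a chainable scheduler can update the current learning rate in place instead of recomputing it from the base value.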
Shen Li	1022443168	Revert D30279364: [codemod][lint][fbcode/c*] Enable BLACK by default Test Plan: revert-hammer Differential Revision: D30279364 (`b004307252`) Original commit changeset: c1ed77dfe43a fbshipit-source-id: eab50857675c51e0088391af06ec0ecb14e2347e	2021-08-12 11:45:01 -07:00
Zsolt Dollenstein	b004307252	[codemod][lint][fbcode/c*] Enable BLACK by default Test Plan: manual inspection & sandcastle Reviewed By: zertosh Differential Revision: D30279364 fbshipit-source-id: c1ed77dfe43a3bde358f92737cd5535ae5d13c9a	2021-08-12 10:58:35 -07:00
Ilqar Ramazanli	5ed6e4429e	To fix variance computation for complex Adam (#62946 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/59998 As discussed in the issue, the variance term of the Adam optimizer is currently not computed correctly in the complex domain. As stated in the "Generalization to complex numbers" section of https://en.wikipedia.org/wiki/Variance , the variance of a complex random variable X is computed as `E[(X - mu)(X - mu)*]` (where mu = E[X] and `*` stands for the complex conjugate). However, the current implementation of Adam computes `E[(X - mu)(X - mu)]`, which does not return the right variance value; in particular, it returns a complex number. Variance is defined to be a real number even when the underlying random variable is complex. We fix this issue here and test that the resulting variance is indeed a real number. Pull Request resolved: https://github.com/pytorch/pytorch/pull/62946 Reviewed By: albanD Differential Revision: D30196038 Pulled By: iramazanli fbshipit-source-id: ab0a6f31658aeb56bdcb211ff86eaa29f3f0d718	2021-08-09 17:54:43 -07:00
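The difference between the two expressions in the commit above can be seen with plain Python complex numbers. This is an illustrative sketch only: `second_moment_update` is a hypothetical helper that mimics one exponential-moving-average step of a squared gradient, and it is not the code changed by the PR.

```python
def second_moment_update(exp_avg_sq, grad, beta2, conjugate=True):
    """One moving-average step of the squared gradient.

    With conjugate=True the square is grad * conj(grad), whose imaginary
    part is zero. With conjugate=False it is grad * grad, which is
    generally complex.
    """
    sq = grad * grad.conjugate() if conjugate else grad * grad
    return beta2 * exp_avg_sq + (1.0 - beta2) * sq


g = 3 + 4j  # a single complex "gradient" value chosen for illustration
fixed = second_moment_update(0j, g, beta2=0.999)
broken = second_moment_update(0j, g, beta2=0.999, conjugate=False)

print("with conjugate:   ", fixed)
print("without conjugate:", broken)
```

With the conjugate, the accumulated value has zero imaginary part and equals (1 - beta2) times |g|^2; without it, a nonzero imaginary part leaks into a quantity that is supposed to estimate a variance.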
ramvenkat98	4a544df00d	Implement and benchmark a torch.optim.multi_tensor.adagrad implementation (#59155 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59155 Test Plan: Imported from OSS Reviewed By: iramazanli Differential Revision: D29525213 Pulled By: ramvenkat98 fbshipit-source-id: 6d7e8da91c965d1f4e955a084ed875bab641dc9a	2021-07-07 08:08:32 -07:00
Ilqar Ramazanli	f0e972a481	To add Nesterov Adam algorithm for multi-tensor optimizers API (#59165 ) Summary: Previously in the PR: https://github.com/pytorch/pytorch/issues/59009 we added NAdam to Optimizers. Here in this PR we are proposing multi-tensor version of NAdam for PyTorch. Nadam has been proposed in the paper https://openreview.net/forum?id=OM0jvwB8jIp57ZJjtNEZ and report and report : http://cs229.stanford.edu/proj2015/054_report.pdf by Timothy Dozat. It has been one of the most used algorithm in Deep Learning community. It worth to noting that the implementation of NAdam is inspired by the implementation for Keras : `f9d3868495/tensorflow/python/keras/optimizer_v2/nadam.py` Pull Request resolved: https://github.com/pytorch/pytorch/pull/59165 Reviewed By: vincentqb Differential Revision: D29360577 Pulled By: iramazanli fbshipit-source-id: 0fe14016303b2df2cb8cc31912a2674acf63d1e5	2021-06-27 17:00:41 -07:00
Ilqar Ramazanli	5563f4bda0	To add Rectified Adam algorithm for multi-tensor optimizers API (#59161 ) Summary: Previously in the PR: https://github.com/pytorch/pytorch/issues/58968 we added RAdam to Optimizers. Here in this PR we are proposing multi-tensor version of RAdam for PyTorch. Radam has been proposed in the paper https://arxiv.org/pdf/1908.03265.pdf Liyuan Liu et al. It has been one of the most used algorithm in Deep Learning community. Differing from the paper, we selected variance tractability cut-off as 5 instead of 4 as it is the common practice. Pull Request resolved: https://github.com/pytorch/pytorch/pull/59161 Reviewed By: vincentqb Differential Revision: D29360576 Pulled By: iramazanli fbshipit-source-id: 7ccdbf12b1ee7f12e66f7d7992123a70cc818b6b	2021-06-27 13:01:20 -07:00
Ilqar Ramazanli	63219f1f9f	To add Rectified Adam Algorithm to Optimizers (#58968 ) Summary: Fixes : https://github.com/pytorch/pytorch/issues/24892 In the paper : https://arxiv.org/pdf/1908.03265.pdf Liyuan Liu et al. suggested a new optimization algorithm with an essence of similar to Adam Algorithm. It has been discussed in the paper that, without warmup heuristic, in the early stage of adaptive optimization / learning algorithms sometimes we can get undesirable large variance which can slow overall convergence process. Authors proposed the idea of rectification of variance of adaptive learning rate when it is expected to be high. Differing from the paper, we selected variance tractability cut-off as 5 instead of 4. This adjustment is common practice, and could be found in the code-repository and also tensorflow swift optim library as well : `2f03dd1970/radam/radam.py (L156)` `f51ee4618d/Sources/TensorFlow/Optimizers/MomentumBased.swift (L638)` Pull Request resolved: https://github.com/pytorch/pytorch/pull/58968 Reviewed By: vincentqb Differential Revision: D29310601 Pulled By: iramazanli fbshipit-source-id: b7bd487f72f1074f266687fd9c0c6be264a748a9	2021-06-23 18:27:57 -07:00
Ilqar Ramazanli	e8690dacb2	To add Nesterov Adam Algorithm to Optimizers (#59009 ) Summary: Fixes : https://github.com/pytorch/pytorch/issues/5804 In the paper : https://openreview.net/forum?id=OM0jvwB8jIp57ZJjtNEZ Timothy Dozat suggested a new optimization algorithm with an essence of combination of NAG and Adam algorithms. It is known that the idea of momentum can be improved with the Nesterov acceleration in optimization algorithms, and Dozat is investigating to apply this idea to momentum component of Adam algorithm. Author provided experiment evidence in their work to show excellence of the idea. In this PR we are implementing the proposed algorithm NAdam in the mentioned paper. Author has a preliminary work http://cs229.stanford.edu/proj2015/054_report.pdf where he shows the decay base constant should be taken as 0.96 which we also followed the same phenomenon here in this implementation similar to Keras. Moreover, implementation / coding practice have been followed similar to Keras in some other places as well: `f9d3868495/tensorflow/python/keras/optimizer_v2/nadam.py` Pull Request resolved: https://github.com/pytorch/pytorch/pull/59009 Reviewed By: gchanan, vincentqb Differential Revision: D29220375 Pulled By: iramazanli fbshipit-source-id: 4b4bb4b15f7e16f7527f368bbf4207ed345751aa	2021-06-23 08:21:43 -07:00
Sam Estep	1abf45e37f	Revert D29241736: [pytorch][PR] To add Rectified Adam Algorithm to Optimizers Test Plan: revert-hammer Differential Revision: D29241736 (`0d2a936176`) Original commit changeset: 288b9b1f3125 fbshipit-source-id: 56c4ec98647c6f1822b130726741a1c9ca193670	2021-06-22 12:08:31 -07:00
Ilqar Ramazanli	0d2a936176	To add Rectified Adam Algorithm to Optimizers (#58968 ) Summary: Fixes : https://github.com/pytorch/pytorch/issues/24892 In the paper : https://arxiv.org/pdf/1908.03265.pdf Liyuan Liu et al. suggested a new optimization algorithm with an essence of similar to Adam Algorithm. It has been discussed in the paper that, without warmup heuristic, in the early stage of adaptive optimization / learning algorithms sometimes we can get undesirable large variance which can slow overall convergence process. Authors proposed the idea of rectification of variance of adaptive learning rate when it is expected to be high. Differing from the paper, we selected variance tractability cut-off as 5 instead of 4. This adjustment is common practice, and could be found in the code-repository and also tensorflow swift optim library as well : `2f03dd1970/radam/radam.py (L156)` `f51ee4618d/Sources/TensorFlow/Optimizers/MomentumBased.swift (L638)` Pull Request resolved: https://github.com/pytorch/pytorch/pull/58968 Reviewed By: gchanan Differential Revision: D29241736 Pulled By: iramazanli fbshipit-source-id: 288b9b1f3125fdc6c7a7bb23fde1ea5c201c0448	2021-06-22 10:38:41 -07:00
Yukio Siraichi	93bf0ae6fc	Remove legacy constructor calls from pytorch codebase. (#54142 ) Summary: Follow up from https://github.com/pytorch/pytorch/issues/53889 Related to https://github.com/pytorch/pytorch/issues/47112 Removing every occurrence of the legacy constructor call present in PyTorch at: - _docs_ - _benchmarks_ - _test_ - _caffe2_ - _CONTRIBUTING.md_ Pull Request resolved: https://github.com/pytorch/pytorch/pull/54142 Reviewed By: ngimel Differential Revision: D27699450 Pulled By: mruberry fbshipit-source-id: 530aa3f5746cc8bc1407d5d51b2bbd8075e30546	2021-04-11 15:45:17 -07:00
Kyle Chen	bf5e5bf901	[ROCm] Enable test in test_linalg.py, test_optim.py and test_vmap.py … (#52818 ) Summary: Enable test in test_linalg.py, test_optim.py, and test_vmap.py for ROCm because they are passing. Signed-off-by: Kyle Chen <kylechen@amd.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/52818 Reviewed By: H-Huang Differential Revision: D26694091 Pulled By: mruberry fbshipit-source-id: 285d17aa7f271f4d94b5fa9d9f6620de8a70847b	2021-03-04 02:29:45 -08:00
Wanchao Liang	f8238d7917	[optim] bugfix when all parameters have no grad (#52944 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/52944 This fix the bug introduced during refactoring optimizers https://github.com/pytorch/pytorch/pull/50411. When all parameters have no grads, we should still allows `beta` like hyper params to be defined. Reviewed By: ngimel Differential Revision: D26699827 fbshipit-source-id: 8a7074127704c7a4a1fbc17d48a81e23a649f280	2021-03-03 11:56:09 -08:00
Michael Carilli	e36576d153	Probable fix for out of place BinaryOpScalar bad values and/or IMAs on 11.2 (ci-all edition) (#52634 ) Summary: Should close https://github.com/pytorch/pytorch/issues/51992. ci-all resubmit of https://github.com/pytorch/pytorch/pull/52591. The plot also thickened considerably since then. Every foreach functor, it turns out, has bad `r_args` accesses for certain code paths and instantiations. Also, I noticed the [`n % kILP == 0`](`2680ff7759/aten/src/ATen/native/cuda/ForeachFunctors.cuh (L87)`) condition for vectorization in all functors is way too restrictive: it'll refuse to vectorize anything on any tensor whose overall numel is not a multiple of ILP. That's out of scope though. Pull Request resolved: https://github.com/pytorch/pytorch/pull/52634 Reviewed By: H-Huang Differential Revision: D26725991 Pulled By: izdeby fbshipit-source-id: 4bade0ac186bf85527baddc1c44b2c2b8e3c9777	2021-03-01 12:41:24 -08:00
Jane Xu	09516d2d0c	Reenables skipped tests for all CUDA versions except 11.2 (#52359 ) Summary: This PR adds functionality to skip a test based on CUDA version. This way, we can be more specific when skipping a test, such as when the test only fails for a particular CUDA version. This allows us to add back the skipped tests for CUDA 11.2 for other CUDA versions, such as 10.1 and 11.1. I tested this locally (by using 11.0 instead of 11.2), but will run all the CI to make sure it works. Pull Request resolved: https://github.com/pytorch/pytorch/pull/52359 Reviewed By: walterddr Differential Revision: D26487951 Pulled By: janeyx99 fbshipit-source-id: 45c71cc6105ffd9985054880009cf68ea5ef3f6a	2021-02-19 15:30:55 -08:00
Jane Xu	a1b8f3d4b6	Replace CUDA 11.1 Linux CI with CUDA 11.2 (#51905 ) Summary: Adding 11.2 to CI with BUILD_SPLIT_CUDA enabled. Disabled the following tests as they were failing in test_optim.py: test_adadelta test_adam test_adamw test_multi_tensor_optimizers test_rmsprop (Issue tracking that is here: https://github.com/pytorch/pytorch/issues/51992) Pull Request resolved: https://github.com/pytorch/pytorch/pull/51905 Reviewed By: VitalyFedyunin Differential Revision: D26368575 Pulled By: janeyx99 fbshipit-source-id: 31612c7d04d51afb3f18956e43dc7f7db8a91749	2021-02-10 11:43:50 -08:00
Natalia Gimelshein	4d169258ef	Revert D25976245: [pytorch][PR] Enable Skipped ROCM Tests in common_nn.py Test Plan: revert-hammer Differential Revision: D25976245 (`24a0272132`) Original commit changeset: 801032534f91 fbshipit-source-id: 561e6d761cb694451d5f87557b4f96f37d19dd90	2021-01-21 13:28:37 -08:00
Arindam Roy	24a0272132	Enable Skipped ROCM Tests in common_nn.py (#50753 ) Summary: Removed test_cuda=(not TEST_WITH_ROCM) in common_nn.py to enable the skipped tests for ROCM. Signed-off-by: Arindam Roy <rarindam@gmail.com> Fixes #{issue number} Pull Request resolved: https://github.com/pytorch/pytorch/pull/50753 Reviewed By: mrshenli Differential Revision: D25976245 Pulled By: ngimel fbshipit-source-id: 801032534f911d24d231bc9f0d3235a4506412c0	2021-01-21 09:48:47 -08:00
Alexander Grund	5b0f400488	Replace list(map(...)) constructs by list comprehensions (#46461 ) Summary: As discussed in https://github.com/pytorch/pytorch/issues/46392 this makes the code more readable and possibly more performant. It also fixes a bug detected by this where the argument order of `map` was confused: `030a24906e (diff-5bb26bd3a23ee3bb540aeadcc0385df2a4e48de39f87ed9ea76b21990738fe98L1537-R1537)` Fixes https://github.com/pytorch/pytorch/issues/46392 Pull Request resolved: https://github.com/pytorch/pytorch/pull/46461 Reviewed By: ailzhang Differential Revision: D24367015 Pulled By: ezyang fbshipit-source-id: d55a67933cc22346b00544c9671f09982ad920e7	2020-10-19 18:42:49 -07:00
Aiden Nibali	2bc6caa9e4	Add three-phase option to OneCycleLR (#42715 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/40362 The new `three_phase` option provides a way of constructing schedules according to the scheme recommended in [Super-Convergence: Very Fast Training of Neural Networks Using Large Learning Rates](https://arxiv.org/abs/1708.07120). Note that this change maintains backwards compatibility, and as a result the default behaviour of OneCycleLR remains quite counter-intuitive. vincentqb Pull Request resolved: https://github.com/pytorch/pytorch/pull/42715 Reviewed By: heitorschueroff Differential Revision: D24289744 Pulled By: vincentqb fbshipit-source-id: e4aad87880716bb14613c0aa8631e43b04a93e5c	2020-10-14 15:05:14 -07:00
Iurii Zdebskyi	1a57b390e8	Add torch._foreach_maximum(TensorList, TensorList) & torch._foreach_minimum(TensorList, TensorList) APIs (#45692 ) Summary: - Adding torch._foreach_maximum(TensorList, TensorList) API - Adding torch._foreach_minimum(TensorList, TensorList) API - Updated Adam/AdamW optimizers Tested via unit tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/45692 Reviewed By: anjali411 Differential Revision: D24142464 Pulled By: izdeby fbshipit-source-id: 6a4fc343a1613cb1e26c8398450ac9cea0a2eb51	2020-10-13 09:22:30 -07:00
Iurii Zdebskyi	939e0389de	Update test_multi_tensor_optimizers test (#45510 ) Summary: Following up on previous [feedback](https://github.com/pytorch/pytorch/pull/45475/files#r496330797). Pull Request resolved: https://github.com/pytorch/pytorch/pull/45510 Reviewed By: heitorschueroff Differential Revision: D23992304 Pulled By: izdeby fbshipit-source-id: 4784ed8d79e09da3aa61880add6443e3a8d322e4	2020-09-30 08:59:18 -07:00
Iurii Zdebskyi	637570405b	Disable multi tensor tesnor tests on rocm (#45535 ) Summary: Disable multi tensor test on rocm Pull Request resolved: https://github.com/pytorch/pytorch/pull/45535 Reviewed By: ngimel Differential Revision: D24002557 Pulled By: izdeby fbshipit-source-id: 608c9389e3d9cd7dac49ea42c9bb0af55662c754	2020-09-29 15:49:21 -07:00
Iurii Zdebskyi	8c309fc052	Add more tests for mt optimizers (#45475 ) Summary: Add more test cases for mt optimizers and fix Adam/AdamW Pull Request resolved: https://github.com/pytorch/pytorch/pull/45475 Reviewed By: soumith Differential Revision: D23982727 Pulled By: izdeby fbshipit-source-id: 4b24d37bd52a2fa3719d3e3a5dcf3b96990b0f5b	2020-09-28 23:59:58 -07:00
Iurii Zdebskyi	722faeb2a4	[RELAND] Added optimizers based on multi tensor apply (#45408 ) Summary: Original PR https://github.com/pytorch/pytorch/pull/45299. The present PR fixes minor bugs that caused the revert. Adding a new namespace `torch.optim._multi_tensor` with a bunch of updated optimizers. Those optimizers use the _foreach APIs, which improve performance significantly.
### Tests
- updated existing tests to use both optimizers
- added `test_multi_tensor_optimizers` test to verify correctness.
### Perf results
Adam timeit: 42.69 ms --> 10.16 ms autorange: 41.96 ms --> 10.28 ms
AdamW timeit: 51.38 ms --> 15.63 ms autorange: 50.82 ms --> 16.07 ms
SGD timeit: 6.28 ms --> 4.40 ms autorange: 6.13 ms --> 4.73 ms
RMSprop timeit: 28.63 ms --> 5.89 ms autorange: 28.27 ms --> 5.76 ms
Rprop timeit: 213.30 --> 178.42 autorange: 212.03 --> 178.03
ASGD timeit: 21.67 --> 9.33 autorange: 21.64 --> 9.27
Adamax timeit: 55.60 --> 48.29 autorange: 55.22 --> 49.13

Perf script used
```
import torch
import time
import torch.optim as optim
from torch.autograd import Variable
from torch.optim.lr_scheduler import ExponentialLR, ReduceLROnPlateau, StepLR
import torch.nn as nn
import torchvision
import torch.utils._benchmark as benchmark_utils

device = "cuda"
model = torchvision.models.resnet.resnet101(pretrained=True).to(device)
targets = torch.randint(0, 1000, (100, 100), device=device)
criterion = nn.CrossEntropyLoss()

optimizer = optim.SGD(model.parameters(), lr=1e-3)  # <----------------------- optimizer.
# would compare optim.SGD vs optim._multi_tensor.SGD
running_loss = 0.0
target = torch.empty(128, dtype=torch.long, device=device).random_(5)

# Run one forward/backward pass so that every parameter has a gradient.
optimizer.zero_grad()
inputs = torch.rand(128, 3, 100, 100, device=device, requires_grad=True)
outputs = model(inputs)
loss = criterion(outputs, target)
loss.backward()
optimizer.step()
running_loss += loss.item()

def main():
    timer = benchmark_utils.Timer(
        stmt="optimizer.step()",
        globals=globals(),
        label="str(optimizer)",
    )

    for i in range(1):
        print(f"Run: {i}\n{'-' * 40}")
        print(f"timeit:\n{timer.timeit(1000)}\n")
        print(f"autorange:\n{timer.blocked_autorange()}\n\n")

if __name__ == "__main__":
    main()
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45408 Reviewed By: gchanan Differential Revision: D23956680 Pulled By: izdeby fbshipit-source-id: c5eab7bf5fce14a287c15cead1cdc26e42cfed94	2020-09-28 13:14:04 -07:00
Mike Ruberry	54a253fded	Revert D23931987: Added optimizers based on multi tensor apply Test Plan: revert-hammer Differential Revision: D23931987 (`2b21e7767e`) Original commit changeset: 582134ef2d40 fbshipit-source-id: ffd500aea55fda34155442fb15e2529cb9c00100	2020-09-26 18:11:54 -07:00
Iurii Zdebskyi	2b21e7767e	Added optimizers based on multi tensor apply (#45299 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45299 Adding a new namespace `torch.optim._multi_tensor` with a bunch of updated optimizers. Those optimizers use the _foreach APIs, which improve performance significantly.
### Tests
- updated existing tests to use both optimizers
- added `test_multi_tensor_optimizers` test to verify correctness.
### Perf results
Adam timeit: 42.69 ms --> 10.16 ms autorange: 41.96 ms --> 10.28 ms
AdamW timeit: 51.38 ms --> 15.63 ms autorange: 50.82 ms --> 16.07 ms
SGD timeit: 6.28 ms --> 4.40 ms autorange: 6.13 ms --> 4.73 ms
RMSprop timeit: 28.63 ms --> 5.89 ms autorange: 28.27 ms --> 5.76 ms
Rprop timeit: 213.30 --> 178.42 autorange: 212.03 --> 178.03
ASGD timeit: 21.67 --> 9.33 autorange: 21.64 --> 9.27
Adamax timeit: 55.60 --> 48.29 autorange: 55.22 --> 49.13

Perf script used
```
import torch
import time
import torch.optim as optim
from torch.autograd import Variable
from torch.optim.lr_scheduler import ExponentialLR, ReduceLROnPlateau, StepLR
import torch.nn as nn
import torchvision
import torch.utils._benchmark as benchmark_utils

device = "cuda"
model = torchvision.models.resnet.resnet101(pretrained=True).to(device)
targets = torch.randint(0, 1000, (100, 100), device=device)
criterion = nn.CrossEntropyLoss()

optimizer = optim.SGD(model.parameters(), lr=1e-3)  # <----------------------- optimizer.
# would compare optim.SGD vs optim._multi_tensor.SGD
running_loss = 0.0
target = torch.empty(128, dtype=torch.long, device=device).random_(5)

# Run one forward/backward pass so that every parameter has a gradient.
optimizer.zero_grad()
inputs = torch.rand(128, 3, 100, 100, device=device, requires_grad=True)
outputs = model(inputs)
loss = criterion(outputs, target)
loss.backward()
optimizer.step()
running_loss += loss.item()

def main():
    timer = benchmark_utils.Timer(
        stmt="optimizer.step()",
        globals=globals(),
        label="str(optimizer)",
    )

    for i in range(1):
        print(f"Run: {i}\n{'-' * 40}")
        print(f"timeit:\n{timer.timeit(1000)}\n")
        print(f"autorange:\n{timer.blocked_autorange()}\n\n")

if __name__ == "__main__":
    main()
```
Test Plan: Imported from OSS Reviewed By: ngimel Differential Revision: D23931987 Pulled By: izdeby fbshipit-source-id: 582134ef2d402909d27d89a45c5b588fb7130ea1	2020-09-26 12:17:43 -07:00
Wanchao Liang	08caf15502	[optimizer] refactor Adam to use functional API (#44791 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44791 Test Plan: Imported from OSS Reviewed By: ailzhang Differential Revision: D23935257 Pulled By: wanchaol fbshipit-source-id: 6f6e22a9287f5515d2e4e6abd4dee2fe7e17b945	2020-09-25 17:13:08 -07:00
Randall Hunt	24eea364f7	Check SparseAdam params are dense on init (#41966 ) (#43668 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/41966 Raises a value error if user attempts to create SparseAdam optimizer with sparse parameter tensors. Pull Request resolved: https://github.com/pytorch/pytorch/pull/43668 Reviewed By: glaringlee Differential Revision: D23388109 Pulled By: ranman fbshipit-source-id: 1fbcc7527d49eac6fae9ce51b3307c609a6ca38b	2020-09-01 14:25:59 -07:00
mariosasko	4281240cb5	Raise error for duplicate params in param group #40967 (#41597 ) Summary: This PR fixes an issue in https://github.com/pytorch/pytorch/issues/40967 where duplicate parameters across different parameter groups are not allowed, but duplicates inside the same parameter group are accepted. After this PR, both cases are treated equally and raise `ValueError`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/41597 Reviewed By: zou3519 Differential Revision: D22608019 Pulled By: vincentqb fbshipit-source-id: 6df41dac62b80db042cfefa6e53fb021b49f4399	2020-07-27 12:25:52 -07:00
Mike Ruberry	b2b8af9645	Removes assertAlmostEqual (#41514 ) Summary: This test function is confusing since our `assertEqual` behavior allows for tolerance to be specified, and this is a redundant mechanism. Pull Request resolved: https://github.com/pytorch/pytorch/pull/41514 Reviewed By: ngimel Differential Revision: D22569348 Pulled By: mruberry fbshipit-source-id: 2b2ff8aaa9625a51207941dfee8e07786181fe9f	2020-07-16 10:35:12 -07:00
Alex Hedges	a3c87c4922	Make Optimizer.state_dict() nondeterministic (#37347 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/36831. Instead of using `id()`, an arbitrary yet consistent order-based index is used instead. This results in a deterministic output between runs. I am not the biggest fan of using `nonlocal` (it appears to be used sparingly in the codebase) to get `start_index` between calls to `pack_group()`, but the alternatives had larger issues: - Using the last value added to `param_mappings` would be ideal, but that only works if `dict` iteration order is consistent, and PyTorch currently supports Python <3.7. - Using the maximum value added to `param_mappings` wouldn't have that issue but would not be constant time. For testing, I confirmed that `test_optim.py` works before and after these changes. Randomizing the indices in `param_mappings` causes the tests to fail, which is further evidence these changes work. I'm not 100% if these tests are sufficient, but they're a start. Pull Request resolved: https://github.com/pytorch/pytorch/pull/37347 Differential Revision: D21353820 Pulled By: vincentqb fbshipit-source-id: e549f1f154833a461b1f4df6d07ad509aab34ea1	2020-06-01 15:32:02 -07:00
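The order-based indexing described in the commit above can be illustrated with a simplified sketch. This is not the code from the PR: `pack_param_groups` and `FakeParam` are hypothetical names, a plain loop with a local counter stands in for the `nonlocal start_index` closure the message mentions, and ordinary Python objects stand in for tensors.

```python
def pack_param_groups(param_groups):
    """Replace parameter objects with stable, order-based integer indices.

    id(p) is used only as a temporary lookup key inside this call, so the
    returned structure no longer depends on memory addresses.
    """
    param_mappings = {}  # id(param) -> index in order of first appearance
    start_index = 0
    packed_groups = []
    for group in param_groups:
        # Copy every hyperparameter except the parameter list itself.
        packed = {k: v for k, v in group.items() if k != "params"}
        for p in group["params"]:
            if id(p) not in param_mappings:
                param_mappings[id(p)] = start_index
                start_index += 1
        packed["params"] = [param_mappings[id(p)] for p in group["params"]]
        packed_groups.append(packed)
    return packed_groups


class FakeParam:
    """Stand-in for a tensor parameter (assumption for illustration)."""


groups = [
    {"lr": 0.1, "params": [FakeParam(), FakeParam()]},
    {"lr": 0.01, "params": [FakeParam()]},
]
packed = pack_param_groups(groups)
print(packed)
# prints [{'lr': 0.1, 'params': [0, 1]}, {'lr': 0.01, 'params': [2]}]
```

Because the indices depend only on the order in which parameters appear, two runs that build the same groups produce identical output, whereas raw `id()` values typically differ from run to run.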
Mike Ruberry	13120bf677	Updates assertEqual to require atol and rtol, removes positional atol (#38872 ) Summary: This updates assertEqual and assertEqual-like functions to either require both or neither of atol and rtol be specified. This should improve clarity around handling precision in the test suite, and it allows us to remove the legacy positional atol argument from assertEqual. In addition, the "message" kwarg is replace with a kwarg-only "msg" argument whose name is consistent with unittest's assertEqual argument. In the future we could make "msg" an optional third positional argument to be more consistent with unittest's assertEqual, but requiring it be specified should be clear, and we can easily update the signature to make "msg" an optional positional argument in the future, too. Pull Request resolved: https://github.com/pytorch/pytorch/pull/38872 Differential Revision: D21740237 Pulled By: mruberry fbshipit-source-id: acbc027aa1d7877a49664d94db9a5fff91a07042	2020-05-27 06:31:07 -07:00
Rohan Varma	63e545e0fe	Revert D21717199: [pytorch][PR] Updates assertEqual to require atol and rtol, removes positional atol Test Plan: revert-hammer Differential Revision: D21717199 Original commit changeset: 9feb856f94ee fbshipit-source-id: bfde9c39a5ce99f0ca6183a7dde703c65b7c8259	2020-05-26 18:23:59 -07:00
Mike Ruberry	6ddca30b2d	Updates assertEqual to require atol and rtol, removes positional atol (#38872 ) Summary: This updates assertEqual and assertEqual-like functions to either require both or neither of atol and rtol be specified. This should improve clarity around handling precision in the test suite, and it allows us to remove the legacy positional atol argument from assertEqual. In addition, the "message" kwarg is replace with a kwarg-only "msg" argument whose name is consistent with unittest's assertEqual argument. In the future we could make "msg" an optional third positional argument to be more consistent with unittest's assertEqual, but requiring it be specified should be clear, and we can easily update the signature to make "msg" an optional positional argument in the future, too. Pull Request resolved: https://github.com/pytorch/pytorch/pull/38872 Differential Revision: D21717199 Pulled By: mruberry fbshipit-source-id: 9feb856f94eee911b44f6c7140a1d07c1b026d3a	2020-05-26 08:30:23 -07:00
Mike Ruberry	9cfc10d52e	Updates assertEqual to use torch.isclose-like logic (#37294 ) Summary: Edit: this has been updated to reflect the PR's current status, which has changed after review. This PR updates the behavior of the assertEqual, assertNotEqual, and assert_allclose to be consistent with each other and torch.isclose. It corrects several additional bugs in the current implementations and adds extensive testing and comments, too. These updates follow from changes to assertEqual like https://github.com/pytorch/pytorch/pull/34258 and https://github.com/pytorch/pytorch/pull/37069, and from our discussion of torch.isclose for complex tensors (see https://github.com/pytorch/pytorch/issues/36462), where we decided to implement a NumPy-compatible mathematical notion of "closeness" for complex tensors that is not a great fit for our testing framework. The detailed changelist is: - New test framework functions for comparing tensors and scalars - Tensors are compared using isclose; the real and imaginary parts of complex tensors are compared independently - Scalars are compared using the same algorithm - assertEqual and assert_allclose now use this common comparison function, instead of each implementing their own with divergent behavior - assertEqual-like debug messages are now available for all tensor and scalar comparisons, with additional context when comparing the components of sparse, quantized, and complex tensors - Extensive testing of the comparison behavior and debug messages - Small Updates - assertEqual now takes an "exact_device" argument, analogous to "exact_dtype", which should be useful in multidevice tests - assertEqual now takes an "equal_nan" argument for argument consistency with torch.isclose - assertEqual no longer takes the "allow_inf" keyword, which misleadingly only applied to scalar comparisons, was only ever set (rarely) to true, and is not supported by torch.isclose - Bug fixes: - the exact_dtype attribute has been removed (no longer needed 
after https://github.com/pytorch/pytorch/pull/38103) - message arguments passed to assertEqual are now handled correctly - bool x other dtype comparisons are now supported - uint8 and int8 tensor comparisons now function properly - rtol for integer comparisons is now supported (default is zero) - rtol and atol for scalar comparisons are now supported - complex scalar comparisons are now supported, analogous to complex tensor comparisons - assertNotEqual is now equivalent to the logical negation of assertEqual Pull Request resolved: https://github.com/pytorch/pytorch/pull/37294 Differential Revision: D21596830 Pulled By: mruberry fbshipit-source-id: f2576669f7113a06f82581fc71883e6b772de19b	2020-05-15 16:24:03 -07:00
Pavel Izmailov	22ac071d9a	Add SWA to PyTorch mainline (#35032 ) Summary: This PR is based on the issue https://github.com/pytorch/pytorch/issues/29994#issue-524418771 and the discussion in the previous version of the PR https://github.com/pytorch/pytorch/pull/30559. Specifically, I followed the interface outlined in this [comment](https://github.com/pytorch/pytorch/pull/30559#issuecomment-574864768). ## Structure - `torch/optim/swa_utils.py` contains the implementation of `AveragedModel` class, `SWALR` learning rate scheduler and `update_bn` utility - `test/test_optim.py` contains unit tests for the three components of SWA - `torch/optim/swa_utils.pyi` describes the interface of `torch/optim/swa_utils.py` The new implementation consists of - `AveragedModel` class; this class creates a copy of a given model and allows to compute running averages of the parameters. - `SWALR` learning rate scheduler; after a certain number of epochs switches to a constant learning rate; this scheduler is supposed to be chained with other schedulers. - `update_bn` utility; updates the Batch Normalization activation statistics for a given model and dataloader; this utility is meant to be applied to `AveragedModel` instances. For `update_bn` I simplified the implementation compared to the [original PR](https://github.com/pytorch/pytorch/pull/30559) according to the sugestions by vadimkantorov. ## Example ```python loader, optimizer, model = ... 
swa_model = torch.optim.swa_utils.AveragedModel(model) # You can use custom averaging functions with `avg_fun` parameter ema_avg = lambda p_avg, p, n_avg: 0.1 * p_avg + 0.9 * p ema_model = torch.optim.swa_utils.AveragedModel(model, avg_function=ema_avg) scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=300) swa_start = 160 swa_scheduler = SWALR(optimizer, start_epoch=swa_start, swa_lr=0.05) for i in range(300): for input, target in loader: optimizer.zero_grad() loss_fn(model(input), target).backward() optimizer.step() scheduler.step() swa_scheduler.step() if i > swa_start: swa_model.update_parameters(model) # Update bn statistics for the swa_model at the end torch.optim.swa_utils.update_bn(loader, swa_model) ``` UPDATED: ```python3 loader, optimizer, model, loss_fn = ... swa_model = torch.optim.swa_utils.AveragedModel(model) scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=300) swa_start = 160 swa_scheduler = SWALR(optimizer, swa_lr=0.05) for i in range(300): for input, target in loader: optimizer.zero_grad() loss_fn(model(input), target).backward() optimizer.step() if i > swa_start: swa_model.update_parameters(model) swa_scheduler.step() else: scheduler.step() # Update bn statistics for the swa_model at the end torch.optim.swa_utils.update_bn(loader, swa_model) ``` Fixes https://github.com/pytorch/pytorch/issues/29994 cc soumith vincentqb andrewgordonwilson vadimkantorov Pull Request resolved: https://github.com/pytorch/pytorch/pull/35032 Differential Revision: D21079606 Pulled By: vincentqb fbshipit-source-id: e07f5e821f72ada63789814c2dcbdc31f0160c37	2020-04-27 07:42:19 -07:00
Wanchao Liang	3526627f46	Use unittest assertWarns instead (#36411 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/36411 This PR removes the PyTorch-specific `assertWarns` and uses the `unittest` one instead; it also formats some tests. Test Plan: Imported from OSS Differential Revision: D20998159 Pulled By: wanchaol fbshipit-source-id: 1280ecff2dd293b95a639d13cc7417fc819c2201	2020-04-13 15:56:42 -07:00
Derun Gu	5857a125df	Turn on exact_dtype by default on test_optim.py (#34825 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34825 Test Plan: Imported from OSS Differential Revision: D20498111 Pulled By: great-way fbshipit-source-id: e689ca40c496b6b4cccb0df30bdae89b2c024f31	2020-03-17 14:41:13 -07:00
Vincent Quenneville-Belair	be3bc1deb1	convert counter back to list #33229 (#33356 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/33229 Pull Request resolved: https://github.com/pytorch/pytorch/pull/33356 Differential Revision: D20003196 Pulled By: vincentqb fbshipit-source-id: 96f9e0fc7e99a7c2e202f932d1a2ffa158afad92	2020-03-10 15:46:24 -07:00
Nikolay Novik	d19a50bf27	Add missing weight_decay parameter validation for Adam and AdamW (#33126 ) Summary: Adam and AdamW are missing parameter validation for weight_decay. Other optimisers have this check present. Pull Request resolved: https://github.com/pytorch/pytorch/pull/33126 Differential Revision: D19860366 Pulled By: vincentqb fbshipit-source-id: 286d7dc90e2f4ccf6540638286d2fe17939648fc	2020-02-20 11:11:51 -08:00
Vincent Quenneville-Belair	e7f0b15473	Remove return value for __exit__ (#32997 )

Summary: When an error is raised and `__exit__` in a context manager returns `True`, the error is suppressed; otherwise the error is raised. No return value should be given to maintain the default behavior of context manager. Fixes https://github.com/pytorch/pytorch/issues/32639.

The `get_lr` function was overridden with a function taking an epoch parameter, which is not allowed. However, the relevant error was not being raised.

```python
import torch

class MultiStepLR(torch.optim.lr_scheduler._LRScheduler):
    def __init__(self, optimizer, gamma, milestones, last_epoch = -1):
        self.init_lr = [group['lr'] for group in optimizer.param_groups]
        self.gamma = gamma
        self.milestones = milestones
        super().__init__(optimizer, last_epoch)

    def get_lr(self, step):
        global_step = self.last_epoch #iteration number in pytorch
        gamma_power = ([0] + [i + 1 for i, m in enumerate(self.milestones) if global_step >= m])[-1]
        return [init_lr * (self.gamma ** gamma_power) for init_lr in self.init_lr]

optimizer = torch.optim.SGD([torch.rand(1)], lr = 1)
scheduler = MultiStepLR(optimizer, gamma = 1, milestones = [10, 20])
```

```
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-1-7fad6ba050b0> in <module>
     14
     15 optimizer = torch.optim.SGD([torch.rand(1)], lr = 1)
---> 16 scheduler = MultiStepLR(optimizer, gamma = 1, milestones = [10, 20])

<ipython-input-1-7fad6ba050b0> in __init__(self, optimizer, gamma, milestones, last_epoch)
      6         self.gamma = gamma
      7         self.milestones = milestones
----> 8         super().__init__(optimizer, last_epoch)
      9
     10     def get_lr(self, step):

~/anaconda3/envs/pytorch/lib/python3.7/site-packages/torch/optim/lr_scheduler.py in __init__(self, optimizer, last_epoch)
     75         self._step_count = 0
     76
---> 77         self.step()
     78
     79     def state_dict(self):

~/anaconda3/envs/pytorch/lib/python3.7/site-packages/torch/optim/lr_scheduler.py in step(self, epoch)
    141                 print("1a")
    142                 # try:
--> 143                 values = self.get_lr()
    144                 # except TypeError:
    145                 #     raise RuntimeError

TypeError: get_lr() missing 1 required positional argument: 'step'
```

May be related to https://github.com/pytorch/pytorch/issues/32898. Pull Request resolved: https://github.com/pytorch/pytorch/pull/32997 Differential Revision: D19737731 Pulled By: vincentqb fbshipit-source-id: 5cf84beada69b91f91e36b20c3278e9920343655	2020-02-11 09:27:29 -08:00
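
The `__exit__` behavior described in the commit above can be illustrated with a small standalone sketch. It uses only plain Python and no PyTorch code; the class names `Suppressing` and `Propagating` and the helper `run` are hypothetical names chosen for this illustration and do not appear in the PR.

```python
class Suppressing:
    """Context manager whose __exit__ returns True (hypothetical example)."""

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # A truthy return value tells Python to swallow the exception.
        return True


class Propagating:
    """Context manager whose __exit__ returns nothing (hypothetical example)."""

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Implicitly returns None, so the exception keeps propagating.
        pass


def run(context_manager):
    """Raise a TypeError inside the context manager and report what happened."""
    try:
        with context_manager:
            raise TypeError("get_lr() missing 1 required positional argument")
    except TypeError:
        return "raised"
    return "suppressed"


print(run(Suppressing()))
print(run(Propagating()))
```

With a truthy return value the `TypeError` never reaches the caller, which matches the symptom reported in the commit: the error from the overridden `get_lr` was hidden until the return value was removed.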
Pritam Damania	f050b16dd9	Move pytorch distributed tests to separate folder for contbuild. (#30445 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30445 Create distributed and rpc directories under caffe/test for better management of unit tests. Differential Revision: D18702786 fbshipit-source-id: e9daeed0cfb846ef68806f6decfcb57c0e0e3606	2020-01-22 21:16:59 -08:00
Vincent Quenneville-Belair	e4f40bf3b2	Add multiplicative lr. (#27254 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27254 `MultiplicativeLR` consumes a function providing the multiplicative factor at each epoch. It mimics `LambdaLR` in its syntax. Test Plan: Imported from OSS Differential Revision: D17728088 Pulled By: vincentqb fbshipit-source-id: 1c4a8e19a4f24c87b5efccda01630c8a970dc5c9	2019-10-23 11:38:45 -07:00
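
As a rough illustration of the multiplicative rule mentioned in the commit above, the following plain-Python sketch compounds a per-epoch factor onto the previous learning rate. It does not call PyTorch; `multiplicative_schedule` is a hypothetical helper written for this example, and the assumption that the factor is first applied at epoch 1 (with epoch 0 keeping the base rate) is ours rather than something stated in the commit message.

```python
from typing import Callable, List


def multiplicative_schedule(
    base_lr: float, lr_lambda: Callable[[int], float], epochs: int
) -> List[float]:
    """Return the learning rate used at each of the first `epochs` epochs.

    Hypothetical helper: each epoch's rate is the previous rate times
    lr_lambda(epoch). Epoch 0 keeps base_lr (an assumption of this sketch).
    """
    lrs = [base_lr]
    for epoch in range(1, epochs):
        lrs.append(lrs[-1] * lr_lambda(epoch))
    return lrs


# Halve the learning rate at every epoch after the first.
lrs = multiplicative_schedule(0.1, lambda epoch: 0.5, 4)
print(lrs)
```

The factor compounds because each rate is derived from the previous one. A `LambdaLR`-style rule instead scales the initial learning rate by the function's value at each epoch.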


216 Commits