Commit Graph

30 Commits

Author SHA1 Message Date
Mikayla Gawarecki
2a5aaf1c49 Optim foreach cleanup for AdamW (#70484)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70484

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D33767869

Pulled By: mikaylagawarecki

fbshipit-source-id: 2f5273bbfeea3ed502c5d77da4bebe1674243e86
(cherry picked from commit 2dd9b77917)
2022-02-15 18:02:08 +00:00
Mikayla Gawarecki
dff58d519f Optim foreach cleanup for Rprop (#70483)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70483

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D33767866

Pulled By: mikaylagawarecki

fbshipit-source-id: ffc5ae68eeea8fa09385862b853b731554b77bcb
(cherry picked from commit 3a0fe29580)
2022-02-15 18:02:08 +00:00
Mikayla Gawarecki
ce3094f5f6 Optim foreach cleanup for Rmsprop (#70482)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70482

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D33767862

Pulled By: mikaylagawarecki

fbshipit-source-id: 8e2e9c986d5a3774093a79755940372945f1b3a9
(cherry picked from commit baea537277)
2022-02-15 18:02:08 +00:00
Mikayla Gawarecki
2cb03e926f Optim foreach cleanup for SGD (#70481)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70481

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D33767868

Pulled By: mikaylagawarecki

fbshipit-source-id: 89b9227a4ddf99602855973cbc343c58ae3d5328
(cherry picked from commit ffea8ddcfd)
2022-02-15 18:02:08 +00:00
Mikayla Gawarecki
5f9590681d Optim foreach cleanup for Adam (#70295)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70295

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D33767870

Pulled By: mikaylagawarecki

fbshipit-source-id: f922f15ecb0307458c8ecee737325c42c4f3ce8b
(cherry picked from commit 66233a8a3e)
2022-02-15 18:02:08 +00:00
Mikayla Gawarecki
0972db5b7d Optim foreach cleanup for ASGD (#70231)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70231

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D33767867

Pulled By: mikaylagawarecki

fbshipit-source-id: 4406824acbb6f427d52c1ced2d8a02a98c943b86
(cherry picked from commit cbd9a4da15)
2022-02-09 16:52:13 +00:00
Mikayla Gawarecki
5948522e9c Optim foreach cleanup for RAdam (#70230)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70230

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D33767874

Pulled By: mikaylagawarecki

fbshipit-source-id: 9379db24266a7bbcc2c23849f87ae0af2e6729c0
(cherry picked from commit ecf7b31fc3)
2022-02-09 16:52:13 +00:00
Mikayla Gawarecki
3653f07c7c Optim foreach cleanup for NAdam (#70229)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70229

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D33767873

Pulled By: mikaylagawarecki

fbshipit-source-id: 833ead14c1d1659351ebfbeb41045a3c7eb96dad
(cherry picked from commit 9415df6b5c)
2022-02-09 16:52:13 +00:00
Mikayla Gawarecki
d9acfef831 Optim foreach cleanup for Adamax (#69982)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69982

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D33767865

Pulled By: mikaylagawarecki

fbshipit-source-id: c5efd351e359825d38b71f57a2c61a2055c3c114
(cherry picked from commit 37bb80c2d7)
2022-02-09 16:52:13 +00:00
Mikayla Gawarecki
dabfea8363 Optim foreach cleanup for Adagrad (#69981)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69981

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D33767863

Pulled By: mikaylagawarecki

fbshipit-source-id: 1c99abe4ac4eb2a9eb896dff4837b539b94f68e7
(cherry picked from commit 61c28d0645)
2022-02-09 16:52:12 +00:00
Mikayla Gawarecki
8e8d170674 Optim foreach cleanup for Adadelta (#69980)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69980

- Merged `torch/optim/adadelta.py` and `torch/optim/_multi_tensor/adadelta.py` into `torch/optim/adadelta.py`
- Moved adadelta functional forms from `torch/optim/_functional.py` and `torch/optim/_multi_tensor/_functional.py` to `torch/optim/adadelta.py`
- `torch/optim/_functional.py` just imports from `torch/optim/adadelta.py`
- Added a test `test_optimizers_foreach_flag` which replicates `test_multi_tensor_optimizers` in `test/test_optim.py`
- Added a test `test_adadelta_new` that replicates the behavior of `test_adadelta` but uses the `foreach` flag instead of the multi-tensor Adadelta class (a usage sketch follows below). If we delete `_multi_tensor/`, we could replace `test_adadelta` with this.

Remaining TODO:

- [ ] Single-tensor Adadelta supports complex but multi-tensor does not; the single-tensor logic needs to be integrated into the multi-tensor path, and `test_adadelta_complex` switched to test `foreach` in [True, False]
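
A minimal sketch of how the merged class might be exercised, assuming the `foreach` keyword is the one exposed on the constructor (tensor shapes and values are illustrative only):

```python
import torch

# Sketch only: single- vs. multi-tensor paths selected via `foreach`,
# replacing the separate torch.optim._multi_tensor.Adadelta class.
params = [torch.randn(2, 3, requires_grad=True)]

opt_loop = torch.optim.Adadelta(params, lr=1.0, foreach=False)    # for-loop implementation
opt_foreach = torch.optim.Adadelta(params, lr=1.0, foreach=True)  # horizontally fused implementation

params[0].sum().backward()
opt_foreach.step()
```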

Test Plan: Imported from OSS

Reviewed By: VitalyFedyunin, albanD

Differential Revision: D33413059

Pulled By: mikaylagawarecki

fbshipit-source-id: 92a9fa98705762bb6bd464261671e49aef40070e
(cherry picked from commit a008227d22)
2022-02-09 16:52:12 +00:00
Mikayla Gawarecki
8bb1d06702 [optim] ASGD fold state updates into functional and pass list of vars rather than states (#71335)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/71335

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D33767871

Pulled By: mikaylagawarecki

fbshipit-source-id: 84ebe1fafb1c27572f08c8c8026c882dd7e054c1
(cherry picked from commit 7613ebb391)
2022-02-08 23:58:41 +00:00
Mikayla Gawarecki
ccc1a01dcb [optim] NAdam fold state updates into functional (#71334)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/71334

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D33767864

Pulled By: mikaylagawarecki

fbshipit-source-id: 4d985e9e346f40110bd4231e0f16e5643fbc448d
(cherry picked from commit 58aa77e367)
2022-02-08 23:58:41 +00:00
Mikayla Gawarecki
7176c92687 [optim] update step in functional and pass state_steps instead of state (#71333)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71333

Updated
- Adagrad
- Adamax
- Adam
- AdamW
- RAdam
- Make the multi_tensor functionals take `state_steps: List[Tensor]` instead of `states: List[Dict]`
- Change `state_steps: List[int]` to `state_steps: List[Tensor]`, where each entry is a singleton tensor so the step can be updated within the functional (a toy sketch follows below)

NAdam and ASGD were updated in separate diffs to fold their handling of state into the functionals.
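
A toy sketch of the new calling convention (the function below is illustrative, not the internal PyTorch functional): each entry of `state_steps` is a singleton tensor, so the step counter can be bumped in place inside the functional instead of by the caller mutating a state dict.

```python
from typing import List

import torch
from torch import Tensor

def toy_multi_tensor_step(params: List[Tensor],
                          grads: List[Tensor],
                          state_steps: List[Tensor],
                          lr: float) -> None:
    # Bump each singleton step tensor in place, then apply a fused update.
    for step_t in state_steps:
        step_t += 1
    torch._foreach_add_(params, grads, alpha=-lr)

params = [torch.zeros(3), torch.zeros(2)]
grads = [torch.ones(3), torch.ones(2)]
state_steps = [torch.tensor(0.0), torch.tensor(0.0)]  # previously plain ints
toy_multi_tensor_step(params, grads, state_steps, lr=0.1)
```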

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D33767872

Pulled By: mikaylagawarecki

fbshipit-source-id: 9baa7cafb6375eab839917df9287c65a437891f2
(cherry picked from commit 831c02b3d0)
2022-02-08 16:51:19 +00:00
Adnios
a9c7d626e1 Add the maximize flag to AdamW (#70146)
Summary:
Related issue: https://github.com/pytorch/pytorch/issues/68052

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70146

Reviewed By: malfet

Differential Revision: D33254561

Pulled By: albanD

fbshipit-source-id: f190c836a4162f936c5953e076747c345df21421
2021-12-23 09:20:29 -08:00
oliver
3d358a7678 Adds a maximize flag to Adam (#68164)
Summary:
Solves the next most important use case in https://github.com/pytorch/pytorch/issues/68052.

I have kept the style as close to that in SGD as seemed reasonable, given the slight differences in their internal implementations.

All feedback welcome!

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68164

Reviewed By: VitalyFedyunin

Differential Revision: D32994129

Pulled By: albanD

fbshipit-source-id: 65c57c3f3dbbd3e3e5338d51def54482503e8850
2021-12-13 05:53:53 -08:00
oliver
f8297d40fc Adds a maximize flag to SGD. (#67847)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/46480 -- for SGD.

## Notes:
- I have modified the existing tests to take a new `constructor_accepts_maximize` flag. When this is set to true, the `_test_basic_cases_template` function tests both maximizing and minimizing the sample function (a usage sketch follows below).
- This was the clearest way I could think of testing the changes -- I would appreciate feedback on this strategy.

## Work to be done:
- [ ] I need to update the docs.
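
A small usage sketch of the flag's effect, assuming `maximize` is the keyword exposed on the SGD constructor (the objective is illustrative):

```python
import torch

# With maximize=True the update follows the gradient, i.e. gradient ascent.
w = torch.tensor([0.0], requires_grad=True)
opt = torch.optim.SGD([w], lr=0.1, maximize=True)

objective = -(w - 3.0) ** 2  # concave, maximized at w = 3
objective.sum().backward()
opt.step()  # w moves toward 3 rather than away from it
```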

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67847

Reviewed By: H-Huang

Differential Revision: D32252631

Pulled By: albanD

fbshipit-source-id: 27915a3cc2d18b7e4d17bfc2d666fe7d2cfdf9a4
2021-11-09 00:43:07 -08:00
Christopher Gray Howard
dfa7225a38 [Pytorch][Bootcamp] Add fix and testing for non-vectorized Adadelta optimizer to handle complex numbers (#66587)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66587

Made changes in the step function of the non-vectorized Adadelta optimizer to handle complex numbers as two real numbers, as per issue 65711 on GitHub (see the sketch below).
ghstack-source-id: 141484731
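
A minimal sketch of the "complex as two real numbers" treatment, assuming the step operates on the real view of each complex tensor; the exact in-place calls inside Adadelta may differ.

```python
import torch

rho = 0.9
grad = torch.randn(4, dtype=torch.cfloat)
square_avg = torch.zeros(4, dtype=torch.cfloat)

# View each complex tensor as a (..., 2) real tensor and run the usual update;
# writes through the view update the underlying complex state.
grad_r = torch.view_as_real(grad)
square_avg_r = torch.view_as_real(square_avg)
square_avg_r.mul_(rho).addcmul_(grad_r, grad_r, value=1 - rho)
```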

Test Plan:
buck test mode/dev caffe2/test:optim -- 'test_adadelta_complex'

https://pxl.cl/1R7kJ

Reviewed By: albanD

Differential Revision: D31630069

fbshipit-source-id: 2741177b837960538ce39772897af36bbce7b7d8
2021-10-26 17:35:01 -07:00
Christopher Gray Howard
acb340de75 [Pytorch][Bootcamp] Add fixes and vanilla testing for Adagrad non-vectorized and vectorized optimizers to handle complex numbers (#66671)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66671

Made changes in the step functions of the vectorized and non-vectorized Adagrad optimizers to handle complex numbers as two real numbers, as per issue 65711 on GitHub.
ghstack-source-id: 141442350

Test Plan:
buck test mode/dev caffe2/test:optim -- 'test_adagrad_complex'
https://pxl.cl/1Rd44

Reviewed By: albanD

Differential Revision: D31673503

fbshipit-source-id: 90a0d0c69b556716e2d17c59ce80f09c750fc464
2021-10-25 10:13:21 -07:00
Ilqar Ramazanli
5ed6e4429e To fix variance computation for complex Adam (#62946)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/59998

It has been discussed in the issue that the variance term of the Adam optimizer is currently not computed correctly for the complex domain. As stated in the "Generalization to Complex numbers" section in https://en.wikipedia.org/wiki/Variance, the variance of a complex random variable X is computed as E[(X - mu)(X - mu)*] (where mu = E[X] and * denotes the conjugate).

However, the current Adam implementation computes it as E[(X - mu)(X - mu)], which does not yield the right variance value; in particular, it returns a complex number. Variance is defined to be a real number even when the underlying random variable is complex.

We fix this issue here and test that the resulting variance is indeed real.
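
A hedged sketch of the corrected second-moment update: multiplying the gradient by its conjugate (i.e. |grad|^2) keeps the running variance estimate real-valued. The exact in-place calls inside Adam may differ from this illustration.

```python
import torch

beta2 = 0.999
grad = torch.randn(3, dtype=torch.cfloat)
exp_avg_sq = torch.zeros(3, dtype=torch.cfloat)

# Pre-fix: grad * grad can have a nonzero imaginary part.
# exp_avg_sq.mul_(beta2).addcmul_(grad, grad, value=1 - beta2)

# Fixed: grad * conj(grad) has zero imaginary part.
exp_avg_sq.mul_(beta2).addcmul_(grad, torch.conj(grad), value=1 - beta2)
assert torch.all(exp_avg_sq.imag == 0)
```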

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62946

Reviewed By: albanD

Differential Revision: D30196038

Pulled By: iramazanli

fbshipit-source-id: ab0a6f31658aeb56bdcb211ff86eaa29f3f0d718
2021-08-09 17:54:43 -07:00
Ilqar Ramazanli
7c2938bf67 To refactor Sparse Adam algorithm for functional form (#59171)
Summary:
Adds a functional interface for the Sparse Adam optimizer.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59171

Reviewed By: vincentqb

Differential Revision: D29360582

Pulled By: iramazanli

fbshipit-source-id: 5ceffd7f4b7abd1e0b758a5b8445abdf5555eba0
2021-06-25 06:35:39 -07:00
Ilqar Ramazanli
63219f1f9f To add Rectified Adam Algorithm to Optimizers (#58968)
Summary:
Fixes : https://github.com/pytorch/pytorch/issues/24892

In the paper https://arxiv.org/pdf/1908.03265.pdf, Liyuan Liu et al. propose a new optimization algorithm similar in spirit to the Adam algorithm.

The paper discusses how, without a warmup heuristic, adaptive optimization algorithms can exhibit undesirably large variance in the early stage of training, which can slow overall convergence.

The authors propose rectifying the variance of the adaptive learning rate when it is expected to be high.

Differing from the paper, we selected a variance-tractability cut-off of 5 instead of 4. This adjustment is common practice and can also be found in the reference code repository and in the TensorFlow Swift optimizer library:

2f03dd1970/radam/radam.py (L156)

f51ee4618d/Sources/TensorFlow/Optimizers/MomentumBased.swift (L638)
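
A hedged sketch of the rectification rule with the cut-off of 5 described above (standalone math only, not the exact PyTorch implementation):

```python
import math

def radam_rectification(step: int, beta2: float = 0.999):
    # Maximum length of the approximated SMA and its value at the current step.
    rho_inf = 2.0 / (1.0 - beta2) - 1.0
    rho_t = rho_inf - 2.0 * step * beta2**step / (1.0 - beta2**step)
    if rho_t > 5.0:  # variance is tractable: use the rectified adaptive step
        return math.sqrt(
            (rho_t - 4.0) * (rho_t - 2.0) * rho_inf
            / ((rho_inf - 4.0) * (rho_inf - 2.0) * rho_t)
        )
    return None  # otherwise fall back to an unadapted (momentum-only) step

print(radam_rectification(1), radam_rectification(1000))
```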

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58968

Reviewed By: vincentqb

Differential Revision: D29310601

Pulled By: iramazanli

fbshipit-source-id: b7bd487f72f1074f266687fd9c0c6be264a748a9
2021-06-23 18:27:57 -07:00
Ilqar Ramazanli
e8690dacb2 To add Nesterov Adam Algorithm to Optimizers (#59009)
Summary:
Fixes : https://github.com/pytorch/pytorch/issues/5804

In the paper https://openreview.net/forum?id=OM0jvwB8jIp57ZJjtNEZ, Timothy Dozat proposes a new optimization algorithm that combines the NAG and Adam algorithms.

It is known that momentum can be improved with Nesterov acceleration in optimization algorithms, and Dozat investigates applying this idea to the momentum component of Adam. The author provides experimental evidence of the idea's effectiveness.

In this PR we implement the NAdam algorithm proposed in the paper. In preliminary work (http://cs229.stanford.edu/proj2015/054_report.pdf) the author shows that the decay base constant should be taken as 0.96; we follow the same choice here, as Keras does. Implementation and coding practice also follow Keras in some other places:

f9d3868495/tensorflow/python/keras/optimizer_v2/nadam.py
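
A hedged sketch of the momentum-decay schedule with the 0.96 base noted above (a plain-Python illustration; the actual implementation operates on optimizer state):

```python
def nadam_mu(step: int, beta1: float = 0.9, momentum_decay: float = 0.004) -> float:
    # mu_t = beta1 * (1 - 0.5 * 0.96 ** (t * psi)), with decay base 0.96 as in Keras.
    return beta1 * (1.0 - 0.5 * 0.96 ** (step * momentum_decay))

mu_t = nadam_mu(1)     # momentum coefficient for the current step
mu_next = nadam_mu(2)  # used for the Nesterov look-ahead correction
```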

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59009

Reviewed By: gchanan, vincentqb

Differential Revision: D29220375

Pulled By: iramazanli

fbshipit-source-id: 4b4bb4b15f7e16f7527f368bbf4207ed345751aa
2021-06-23 08:21:43 -07:00
Sam Estep
1abf45e37f Revert D29241736: [pytorch][PR] To add Rectified Adam Algorithm to Optimizers
Test Plan: revert-hammer

Differential Revision: D29241736 (0d2a936176)

Original commit changeset: 288b9b1f3125

fbshipit-source-id: 56c4ec98647c6f1822b130726741a1c9ca193670
2021-06-22 12:08:31 -07:00
Ilqar Ramazanli
0d2a936176 To add Rectified Adam Algorithm to Optimizers (#58968)
Summary:
Fixes : https://github.com/pytorch/pytorch/issues/24892

In the paper https://arxiv.org/pdf/1908.03265.pdf, Liyuan Liu et al. propose a new optimization algorithm similar in spirit to the Adam algorithm.

The paper discusses how, without a warmup heuristic, adaptive optimization algorithms can exhibit undesirably large variance in the early stage of training, which can slow overall convergence.

The authors propose rectifying the variance of the adaptive learning rate when it is expected to be high.

Differing from the paper, we selected a variance-tractability cut-off of 5 instead of 4. This adjustment is common practice and can also be found in the reference code repository and in the TensorFlow Swift optimizer library:

2f03dd1970/radam/radam.py (L156)

f51ee4618d/Sources/TensorFlow/Optimizers/MomentumBased.swift (L638)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58968

Reviewed By: gchanan

Differential Revision: D29241736

Pulled By: iramazanli

fbshipit-source-id: 288b9b1f3125fdc6c7a7bb23fde1ea5c201c0448
2021-06-22 10:38:41 -07:00
Ilqar Ramazanli
9a622f4cd9 refactor ASGD to use functional API (#58410)
Summary:
The functional API is used in large-scale distributed training to enable multithreaded rather than multiprocess training, as it gives better resource utilization and efficiency.

In this PR, we migrate and refactor the ASGD algorithm to the functional API.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58410

Reviewed By: ailzhang

Differential Revision: D28546702

Pulled By: iramazanli

fbshipit-source-id: 4f62b6037d53f35b19f98340e88af2ebb6243a4f
2021-05-19 18:55:52 -07:00
Wanchao Liang
4611387608 [optim] take kw-only argument for functional optim APIs (#56185)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56185

ghstack-source-id: 126670123

Reviewed By: albanD

Differential Revision: D27802169

fbshipit-source-id: f5e1cb2046dcdeecf5f6b0f70892828bf0adb22f
2021-04-15 20:08:04 -07:00
Wanchao Liang
8ef13cf976 [optim] refactor rprop to use functional API (#55832)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55832

ghstack-source-id: 126325541

Reviewed By: driazati

Differential Revision: D27703877

fbshipit-source-id: 34d4ce7b7d124c0cd75e2f6d0bc8f836713b7301
2021-04-15 15:19:41 -07:00
Wanchao Liang
bb245b6444 [optim] refactor adamax to use functional API (#55830)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55830

ghstack-source-id: 126325537

Reviewed By: driazati

Differential Revision: D26561017

fbshipit-source-id: 41273d200e546d4ac08d39b57865d63c624f143a
2021-04-15 15:19:39 -07:00
Vincent Quenneville-Belair
50d903f19f [optim] make functional api be private (#51316) (#51665)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51665

This reverts commit 896f82aa92.

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D26232608

Pulled By: vincentqb

fbshipit-source-id: ca006baf4fb672c11c1bb003c39a29cbadb63dd3
2021-02-03 17:59:05 -08:00