Commit Graph

93 Commits

Author SHA1 Message Date
albanD
375668cd96 Remove overly restrictive assert in adam (#80222)
This is causing issues if the user has the step on cuda for a good reason.

This assert causes code that used to run just fine to fail.
Note that this is a pretty bad thing to do for performance though so it is ok to try and push users away from doing it.

For the 1.12.1 milestone: this is not asking for a dot release to fix this (as this is bad practice anyways). But it would be a great thing to add if we do one: it is very low risk and will prevent breakage for users.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80222
Approved by: https://github.com/jbschlosser, https://github.com/ngimel
2022-06-24 17:08:34 +00:00
Michael Carilli
ba27ee9e8f [CUDA graphs] Allows Adam and AdamW to be capture-safe (#77862)
Near term fix for https://github.com/pytorch/pytorch/issues/76368.

Q. Why does the user need to request `capturable=True` in the optimizer constructor? Why can't capture safety be completely automatic?
A. We need to set up capture-safe (device-side) state variables before capture. If we don't, and step() internally detects capture is underway, it's too late: the best we could do is create a device state variable and copy the current CPU value into it, which is not something we want baked into the graph.

Q. Ok, why not just do the capture-safe approach with device-side state variables all the time?
A. It incurs several more kernel launches per parameter, which could really add up and regress cpu overhead for ungraphed step()s. If the optimizer won't be captured, we should allow step() to stick with its current cpu-side state handling.

Q. But cuda RNG is a stateful thing that maintains its state on the cpu outside of capture and replay, and we capture it automatically. Why can't we do the same thing here?
A. The graph object can handle RNG generator increments because its capture_begin, capture_end, and replay() methods can see and access the generator object. But the graph object has no explicit knowledge of or access to optimizer steps in its capture scope. We could let the user tell the graph object what optimizers will be stepped in its scope, i.e., something like
```python
graph.will_use_optimizer(opt)
graph.capture_begin()
...
```
but that seems clunkier than an optimizer constructor arg.

I'm open to other ideas, but right now I think constructor arg is necessary and the least bad approach.

Long term, https://github.com/pytorch/pytorch/issues/71274 is a better fix.
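
To make the first answer above concrete, here is a toy sketch in plain Python. `ToyGraph` is a hypothetical stand-in for a captured graph, not the `torch.cuda` API: it only records callables and replays them. The one-element list standing in for a device tensor is likewise an assumption for illustration. A value read from host-side state during capture is frozen into the recording, while state held in a buffer that the recorded op itself updates keeps advancing on replay.

```python
class ToyGraph:
    """Hypothetical stand-in for a captured graph: records callables, replays them."""

    def __init__(self):
        self.ops = []

    def record(self, op):
        self.ops.append(op)

    def replay(self):
        for op in self.ops:
            op()


def capture_with_host_step(host_step, log):
    # The step count is read on the host while capturing, so the recorded
    # op only ever sees the value it had at capture time.
    graph = ToyGraph()
    frozen = host_step + 1
    graph.record(lambda: log.append(frozen))
    return graph


def capture_with_device_step(device_step, log):
    # The step count lives in a mutable buffer (a one-element list standing
    # in for a device tensor) and the increment is part of the recording.
    graph = ToyGraph()

    def increment_and_use():
        device_step[0] += 1
        log.append(device_step[0])

    graph.record(increment_and_use)
    return graph


host_log, device_log = [], []
host_graph = capture_with_host_step(3, host_log)
device_graph = capture_with_device_step([3], device_log)
for _ in range(3):
    host_graph.replay()
    device_graph.replay()

print(host_log)    # [4, 4, 4]
print(device_log)  # [4, 5, 6]
```

The frozen `4` in the first log is the toy analogue of a CPU value being baked into the graph, which is why the state has to be set up in its capture-safe form before capture begins.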
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77862
Approved by: https://github.com/ezyang
2022-06-13 01:56:47 +00:00
Mikayla Gawarecki
5f9590681d Optim foreach cleanup for Adam (#70295)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70295

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D33767870

Pulled By: mikaylagawarecki

fbshipit-source-id: f922f15ecb0307458c8ecee737325c42c4f3ce8b
(cherry picked from commit 66233a8a3e)
2022-02-15 18:02:08 +00:00
Mikayla Gawarecki
7176c92687 [optim] update step in functional and pass state_steps instead of state (#71333)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71333

Updated
- Adagrad
- Adamax
- Adam
- AdamW
- RAdam
The multi_tensor functionals now take `state_steps: List[Tensor]` instead of `states: List[Dict]`.
`state_steps` changes from `List[int]` to `List[Tensor]`, where each element is a singleton tensor, so that step can be updated within the functional.

NAdam and ASGD were updated in separate diffs to fold their handling of state into the functionals.
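
The motivation for moving from `int` steps to singleton tensors can be illustrated without PyTorch. In this sketch a one-element list stands in for a singleton tensor (an assumption for illustration), and `bump_ints` and `bump_boxes` are hypothetical names: rebinding an `int` inside a function is invisible to the caller, whereas an in-place update to a mutable holder is not.

```python
from typing import List


def bump_ints(state_steps: List[int]) -> None:
    # Rebinding the loop variable creates new ints; the caller's list is untouched.
    for step in state_steps:
        step += 1


def bump_boxes(state_steps: List[List[int]]) -> None:
    # Each element is a mutable one-element holder, so the in-place
    # update is visible to whoever owns the holder.
    for step in state_steps:
        step[0] += 1


int_steps = [0, 5]
box_steps = [[0], [5]]
bump_ints(int_steps)
bump_boxes(box_steps)

print(int_steps)  # [0, 5]
print(box_steps)  # [[1], [6]]
```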

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D33767872

Pulled By: mikaylagawarecki

fbshipit-source-id: 9baa7cafb6375eab839917df9287c65a437891f2
(cherry picked from commit 831c02b3d0)
2022-02-08 16:51:19 +00:00
Alban Desmaison
e1b84e1b6b fix loading of older models that don't have maximize (#71023)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/71023

Reviewed By: jbschlosser

Differential Revision: D33483687

Pulled By: albanD

fbshipit-source-id: 2f3c6e97a9579be9ba15eca0756fc1f2c466fbb6
2022-01-10 06:01:24 -08:00
Adnios
15f14ce0dc fix typo in adam docs (#70387)
Summary:
Fix the typo in [adam docs in master branch](https://pytorch.org/docs/master/generated/torch.optim.Adam.html#torch.optim.Adam)

![image](https://user-images.githubusercontent.com/41060790/147345284-37e180d1-fd06-4a62-9c79-2d17b8aa5cd3.png)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70387

Reviewed By: H-Huang

Differential Revision: D33309283

Pulled By: albanD

fbshipit-source-id: d20c5d8f2498ac64013f71e202a6b50dcc069f2b
2021-12-28 07:35:39 -08:00
oliver
3d358a7678 Adds a maximize flag to Adam (#68164)
Summary:
Solves the next most important use case in https://github.com/pytorch/pytorch/issues/68052.

I have kept the style as close to that in SGD as seemed reasonable, given the slight differences in their internal implementations.

All feedback welcome!

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68164

Reviewed By: VitalyFedyunin

Differential Revision: D32994129

Pulled By: albanD

fbshipit-source-id: 65c57c3f3dbbd3e3e5338d51def54482503e8850
2021-12-13 05:53:53 -08:00
Ilqar Ramazanli
43248d9112 [doc][hackathon] To add Adam Optimizer to the documentation (#63251)
Summary:
It has been discussed before that adding descriptions of optimization algorithms to the PyTorch core documentation may result in a nice optimization research tutorial. The following tracking issue lists all the necessary algorithms with links to the originally published papers: https://github.com/pytorch/pytorch/issues/63236.

This PR adds a description of the Adam algorithm to the documentation. For more details, see the paper: https://arxiv.org/abs/1412.6980

<img width="442" alt="Screen Shot 2021-08-27 at 6 37 54 PM" src="https://user-images.githubusercontent.com/73658284/131195297-35fce613-3691-4fed-b42d-db234d4fcd7c.png">

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63251

Reviewed By: albanD

Differential Revision: D30779163

Pulled By: iramazanli

fbshipit-source-id: 319a80fc3952793b0d064d0e641ddc1de3c05a86
2021-09-07 11:03:35 -07:00
Wanchao Liang
4611387608 [optim] take kw-only argument for functional optim APIs (#56185)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56185

ghstack-source-id: 126670123

Reviewed By: albanD

Differential Revision: D27802169

fbshipit-source-id: f5e1cb2046dcdeecf5f6b0f70892828bf0adb22f
2021-04-15 20:08:04 -07:00
Jay Patel
4f62c622b3 Cleanup of unused list in adam.py (#53874)
Summary:
Code cleanup.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/53874

Reviewed By: jbschlosser

Differential Revision: D27036819

Pulled By: ngimel

fbshipit-source-id: c267e20c8d91224cd3c01b302a75f43aa309b560
2021-03-15 09:49:27 -07:00
Wanchao Liang
f8238d7917 [optim] bugfix when all parameters have no grad (#52944)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52944

This fixes the bug introduced during the optimizer refactoring in https://github.com/pytorch/pytorch/pull/50411. When all parameters have no grads, we should still allow `beta`-like hyperparameters to be defined.
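
The class of bug described here can be sketched in plain Python. The functions below are hypothetical and only mimic the shape of an optimizer `step()`; they are not the code from the PR. If a hyperparameter is unpacked only inside the branch that handles parameters with gradients, it is never bound when every gradient is `None`.

```python
def step_buggy(group):
    grads = []
    for p in group["params"]:
        if p["grad"] is not None:
            grads.append(p["grad"])
            beta1, beta2 = group["betas"]  # only bound if some grad exists
    # Raises UnboundLocalError when no parameter had a gradient.
    return beta1, beta2, grads


def step_fixed(group):
    beta1, beta2 = group["betas"]  # bound unconditionally
    grads = [p["grad"] for p in group["params"] if p["grad"] is not None]
    return beta1, beta2, grads


group = {"betas": (0.9, 0.999), "params": [{"grad": None}, {"grad": None}]}

try:
    step_buggy(group)
    buggy_failed = False
except UnboundLocalError:
    buggy_failed = True

print(buggy_failed)       # True
print(step_fixed(group))  # (0.9, 0.999, [])
```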

Reviewed By: ngimel

Differential Revision: D26699827

fbshipit-source-id: 8a7074127704c7a4a1fbc17d48a81e23a649f280
2021-03-03 11:56:09 -08:00
Vincent Quenneville-Belair
50d903f19f [optim] make functional api be private (#51316) (#51665)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51665

This reverts commit 896f82aa92.

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D26232608

Pulled By: vincentqb

fbshipit-source-id: ca006baf4fb672c11c1bb003c39a29cbadb63dd3
2021-02-03 17:59:05 -08:00
Vincent Quenneville-Belair
896f82aa92 [optim] make functional api be private (#51316)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51316

Make optim functional API be private until we release with beta

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D26213469

fbshipit-source-id: b0fd001a8362ec1c152250bcd57c7205ed893107
2021-02-03 09:29:33 -08:00
Samuel Marks
e6779d4357 [*.py] Rename "Arguments:" to "Args:" (#49736)
Summary:
I've written custom parsers and emitters for everything from docstrings to classes and functions. However, I recently came across an issue when I was parsing/generating from the TensorFlow codebase: inconsistent use of `Args:` and `Arguments:` in its docstrings.

```sh
(pytorch#c348fae)$ for name in 'Args:' 'Arguments:'; do
    printf '%-10s %04d\n' "$name" "$(rg -IFtpy --count-matches "$name" | paste -s -d+ -- | bc)"; done
Args:      1095
Arguments: 0336
```

It is easy enough to extend my parsers to support both variants, however it looks like `Arguments:` is wrong anyway, as per:

  - https://google.github.io/styleguide/pyguide.html#doc-function-args @ [`ddccc0f`](https://github.com/google/styleguide/blob/ddccc0f/pyguide.md)

  - https://chromium.googlesource.com/chromiumos/docs/+/master/styleguide/python.md#describing-arguments-in-docstrings @ [`9fc0fc0`](https://chromium.googlesource.com/chromiumos/docs/+/9fc0fc0/styleguide/python.md)

  - https://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_google.html @ [`c0ae8e3`](https://github.com/sphinx-contrib/napoleon/blob/c0ae8e3/docs/source/example_google.rst)

Therefore, only `Args:` is valid. This PR replaces them throughout the codebase.

PS: For related PRs, see tensorflow/tensorflow/pull/45420

PPS: The trackbacks automatically appearing below are sending the same changes to other repositories in the [PyTorch](https://github.com/pytorch) organisation.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49736

Reviewed By: albanD

Differential Revision: D25710534

Pulled By: soumith

fbshipit-source-id: 61e8ff01abb433e9f78185c2d1d0cbd7c22c1619
2020-12-28 09:34:47 -08:00
Wanchao Liang
08caf15502 [optimizer] refactor Adam to use functional API (#44791)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44791

Test Plan: Imported from OSS

Reviewed By: ailzhang

Differential Revision: D23935257

Pulled By: wanchaol

fbshipit-source-id: 6f6e22a9287f5515d2e4e6abd4dee2fe7e17b945
2020-09-25 17:13:08 -07:00
Xiang Gao
6bc77f4d35 Use amax/maximum instead of max in optimizers (#43797)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43797

Reviewed By: malfet

Differential Revision: D23406641

Pulled By: mruberry

fbshipit-source-id: 0cd075124aa6533b21375fe2c90c44a5d05ad6e6
2020-09-15 10:39:40 -07:00
farhadrgh
4b4273a04e Update Adam documentation (#41679)
Summary:
This PR fixes https://github.com/pytorch/pytorch/issues/41477

The Adam implementation performs L2 regularization, not decoupled weight decay. However, the change mentioned in https://github.com/pytorch/pytorch/issues/41477 was motivated by Line 12 of Algorithm 2 in the [Decoupled Weight Decay Regularization](https://arxiv.org/pdf/1711.05101.pdf) paper.
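
The distinction can be seen on a single scalar parameter. The following is a minimal sketch, assuming zero-initialised moments and one step only; `adam_step` and its `decoupled` switch are hypothetical names for illustration, not the library's API. With L2 regularization the penalty is folded into the gradient and then normalised by Adam like any other gradient term; with decoupled weight decay the parameter is shrunk directly.

```python
import math


def adam_step(param, grad, lr=0.1, betas=(0.9, 0.999), eps=1e-8,
              weight_decay=0.0, decoupled=False):
    """One Adam step from zero-initialised moments on a single scalar."""
    beta1, beta2 = betas
    if decoupled:
        # Decoupled style: shrink the parameter directly, leave the gradient alone.
        param = param * (1.0 - lr * weight_decay)
    else:
        # L2 style: fold the penalty into the gradient before the moments.
        grad = grad + weight_decay * param
    exp_avg = (1.0 - beta1) * grad
    exp_avg_sq = (1.0 - beta2) * grad * grad
    bias_correction1 = 1.0 - beta1
    bias_correction2 = 1.0 - beta2
    denom = math.sqrt(exp_avg_sq / bias_correction2) + eps
    return param - lr * (exp_avg / bias_correction1) / denom


plain = adam_step(1.0, 0.5)
l2 = adam_step(1.0, 0.5, weight_decay=0.1)
decoupled = adam_step(1.0, 0.5, weight_decay=0.1, decoupled=True)

print(round(plain, 4))      # 0.9
print(round(l2, 4))         # 0.9
print(round(decoupled, 4))  # 0.89
```

On this first step the L2 variant lands where the undecayed one does, because Adam rescales the combined gradient to roughly unit magnitude, while the decoupled variant additionally shrinks the parameter by `lr * weight_decay`.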

Please let me know if you have other suggestions about how to deliver this info in the docs.
cc ezyang

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41679

Reviewed By: izdeby

Differential Revision: D22671329

Pulled By: vincentqb

fbshipit-source-id: 2caf60e4f62fe31f29aa35a9532d1c6895a24224
2020-07-23 09:25:41 -07:00
albanD
6e2bb1c054 End of the .data removal in torch/optim (#34211)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34211

Test Plan: Imported from OSS

Differential Revision: D20248684

Pulled By: albanD

fbshipit-source-id: 2294bfa41b82ff47f000bc98860780f59d7d4421
2020-03-09 06:40:39 -07:00
Eleanor Dwight Holland
6a97777f72 Remove use of .data from optimizers (#33640)
Summary:
Removes all uses of `.data` from optimizers.

Or tries to.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33640

Reviewed By: vincentqb

Differential Revision: D20203216

Pulled By: albanD

fbshipit-source-id: 9bfe78bbed00fd4aaa690801cff0201f0bd680a0
2020-03-03 13:21:55 -08:00
Xiao Wang
c1dd70688a Fix deprecated python "add" calls (#33428)
Summary:
This PR fixes the Python `add` calls that use the deprecated signature `add(Scalar, Tensor)`. The alternative signature `add(Tensor, alpha=Scalar)` is used instead.

cc csarofeen zasdfgbnm ptrblck ngimel
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33428

Differential Revision: D20002534

Pulled By: vincentqb

fbshipit-source-id: 81f2dd6170a47a9b53a17e5817c26e70d8afa130
2020-02-26 09:02:31 -08:00
Nikolay Novik
d19a50bf27 Add missing weight_decay parameter validation for Adam and AdamW (#33126)
Summary:
Adam and AdamW are missing parameter validation for weight_decay. Other optimisers have this check present.
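
A range check of this kind might look like the following sketch. `check_adam_hyperparams` is a hypothetical standalone helper written for illustration; the exact conditions and messages used in the optimizers are assumptions here, not quoted from the PR.

```python
def check_adam_hyperparams(lr=1e-3, betas=(0.9, 0.999), eps=1e-8, weight_decay=0.0):
    """Hypothetical helper: reject out-of-range hyperparameters up front."""
    if not 0.0 <= lr:
        raise ValueError(f"Invalid learning rate: {lr}")
    if not 0.0 <= eps:
        raise ValueError(f"Invalid epsilon value: {eps}")
    for index, beta in enumerate(betas):
        if not 0.0 <= beta < 1.0:
            raise ValueError(f"Invalid beta parameter at index {index}: {beta}")
    if not 0.0 <= weight_decay:
        raise ValueError(f"Invalid weight_decay value: {weight_decay}")


check_adam_hyperparams(weight_decay=0.01)  # valid values pass silently

try:
    check_adam_hyperparams(weight_decay=-0.01)
    rejected = False
    message = ""
except ValueError as err:
    rejected = True
    message = str(err)

print(rejected)  # True
```

Writing each condition as `not 0.0 <= x` rather than `x < 0.0` also rejects NaN, since every comparison with NaN is false.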
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33126

Differential Revision: D19860366

Pulled By: vincentqb

fbshipit-source-id: 286d7dc90e2f4ccf6540638286d2fe17939648fc
2020-02-20 11:11:51 -08:00
albanD
b0871f211b Make all optimizers consistent so that they don't change gradients inplace
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30257

Test Plan: Imported from OSS

Differential Revision: D18665461

Pulled By: albanD

fbshipit-source-id: cfdafef919468a41007881b82fd288b7128baf95
2019-11-26 12:16:25 -08:00
Vitaly Fedyunin
877c96cddf explicitly provide memory format when calling to *_like operators
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30008

Test Plan: Imported from OSS

Differential Revision: D18575981

Pulled By: VitalyFedyunin

fbshipit-source-id: ec3418257089ad57913932be1a8608cd20ce054c
2019-11-19 16:19:29 -08:00
Farhad Ramezanghorbani
fed5ca192c Adam/AdamW implementation minor fix (#22628)
Summary:
I have noticed a small discrepancy between theory and the implementation of AdamW and in general Adam. The epsilon in the denominator of the following Adam update should not be scaled by the bias correction [(Algorithm 2, L9-12)](https://arxiv.org/pdf/1711.05101.pdf). Only the running average of the gradient (_m_) and squared gradients (_v_) should be scaled by their corresponding bias corrections.

![adam_update](https://user-images.githubusercontent.com/13050245/60894105-11117f00-a230-11e9-9ba0-adad2ae2e0ae.png)

In the current implementation, the epsilon is scaled by the square root of `bias_correction2`.  I have plotted this ratio as a function of step given `beta2 = 0.999` and `eps = 1e-8`. In the early steps of optimization, this ratio slightly deviates from theory (denoted by the horizontal red line).

![plot](https://user-images.githubusercontent.com/13050245/60893952-cabc2000-a22f-11e9-8dc2-6353ad5d674d.png)
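
The effect can be reproduced on scalars. This is a sketch under stated assumptions: `old_update` and `new_update` are hypothetical scalar versions of the update before and after the change, and `eps_inflation` is one way to express how much larger the effective epsilon is in the old form, which may not be exactly the quantity shown in the plot above.

```python
import math


def old_update(m, v, step, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    # Pre-fix form: eps is added before the bias correction is applied,
    # so it is effectively divided by sqrt(bias_correction2).
    bias_correction1 = 1.0 - beta1 ** step
    bias_correction2 = 1.0 - beta2 ** step
    step_size = lr * math.sqrt(bias_correction2) / bias_correction1
    return step_size * m / (math.sqrt(v) + eps)


def new_update(m, v, step, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    # Post-fix form: only sqrt(v) is bias-corrected; eps is added afterwards.
    bias_correction1 = 1.0 - beta1 ** step
    bias_correction2 = 1.0 - beta2 ** step
    denom = math.sqrt(v) / math.sqrt(bias_correction2) + eps
    return (lr / bias_correction1) * m / denom


def eps_inflation(step, beta2=0.999):
    # Factor by which the old form enlarges eps relative to its nominal value.
    return 1.0 / math.sqrt(1.0 - beta2 ** step)


for step in (1, 10, 1000, 100000):
    print(step, round(eps_inflation(step), 3))
```

The factor is largest at the first step and decays toward 1 as `beta2 ** step` vanishes, so the two forms only differ noticeably early in training and when `sqrt(v)` is comparable to `eps`.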
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22628

Differential Revision: D16589914

Pulled By: vincentqb

fbshipit-source-id: 8791eb338236faea9457c0845ccfdba700e5f1e7
2019-08-01 11:42:04 -07:00
Soumith Chintala
cf235e0894 fix lint after new flake8 release added new style constraints (#13047)
Summary:
fix lint after new flake8 release added new style constraints
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13047

Differential Revision: D10527804

Pulled By: soumith

fbshipit-source-id: 6f4d02662570b6339f69117b61037c8394b0bbd8
2018-10-24 09:03:38 -07:00
Jerry Ma
383d340e88 Small optimization for adam (#12107)
Summary:
Apply weight decay for Adam in-place instead of via copy.

Synced offline with soumith, who mentioned that it should be OK. This is also consistent with other optimizers, e.g. eee01731a5/torch/optim/sgd.py (L93)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12107

Reviewed By: soumith

Differential Revision: D10071787

Pulled By: jma127

fbshipit-source-id: 5fd7939c79039693b225c44c4c80450923b8d673
2018-09-26 21:43:46 -07:00
rasbt
eee01731a5 Adds the default value for the amsgrad arg to the Adam docstring (#9971)
Summary:
Minor addition to the docstring of `torch.optim.Adam`, adding the default argument description for the `amsgrad` argument to the docstring for consistency.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9971

Differential Revision: D9040820

Pulled By: soumith

fbshipit-source-id: 168744a6bb0d1422331beffd7e694b9d6f61900c
2018-07-28 09:23:45 -07:00
lazypanda1
063946d2b3 Added parameter range checks for all optimizers (#6000)
2018-03-28 11:22:23 +02:00
li-roy
df88373f88 set default ams param in adam optimizer (#5501)
2018-03-02 11:43:06 +01:00
lazypanda1
a061000250 Added check and test for betas parameter in Adam optimizer (#5147)
* Added check and test for betas parameter in Adam optimizer

* Simplified test
2018-02-11 20:24:43 -05:00
Dr. Kashif Rasul
68c0998cbe added AMSgrad optimizer to Adam and SparseAdam (#4034)
* initial AMSGrad

* added test for amsgrad

* added amsgrad to adam

* fixed tests

* added option to sparse adam

* flake8
2017-12-18 13:24:49 -05:00
SsnL
f76d6c029c Sparse Adam optimizer for sparse gradients (#3137)
* sparse adam

* Favor dense addition over sparse_mask
2017-11-06 14:20:51 -05:00
Martin Raison
f17cfe4293 sparse tensor operations (#735)
2017-03-03 18:37:03 +01:00
Adam McCarthy
7926324385 Corrected parameter typo in Adam docstring (#697)
2017-02-07 19:00:10 +01:00
Luke Yeager
e7c1e6a8e3 [pep8] Fix most lint automatically with autopep8
Here's the command I used to invoke autopep8 (in parallel!):

    git ls-files | grep '\.py$' | xargs -n1 -P`nproc` autopep8 -i

Several rules are ignored in setup.cfg. The goal is to let autopep8
handle everything which it can handle safely, and to disable any rules
which are tricky or controversial to address. We may want to come back
and re-enable some of these rules later, but I'm trying to make this
patch as safe as possible.

Also configures flake8 to match pep8's behavior.

Also configures TravisCI to check the whole project for lint.
2017-01-28 01:15:51 +01:00
Adam Paszke
ecfcf39f30 Improve optimizer serialization
Also, add optimizer.load_state_dict
2017-01-24 17:30:50 -05:00
Sergey Zagoruyko
2748b920ab make adam have the same lr as lua torch (#576)
2017-01-24 16:35:28 -05:00
Adam Paszke
95f0fa8a92 Change .grad attribute of Variables to be a Variable
2017-01-16 12:59:47 -05:00
Adam Paszke
604e13775f Add optim docs
2017-01-16 12:59:47 -05:00
Adam Paszke
09493603f6 Change optimizer API
2016-11-08 18:12:56 +01:00
Adam Paszke
df59b89fbb Add more optimizers
2016-11-07 22:50:56 +01:00
Adam Paszke
2f342af22f Move optim to legacy
2016-08-01 12:01:46 -04:00
Adam Paszke
554a1d8336 Add optim
2016-07-21 16:42:06 -04:00