Commit Graph

549 Commits

Author SHA1 Message Date
joncrall
b136f3f310 More doctest refinements. (#83317)
Follow up to #82797

Now that the doctests themselves are in a better state, we should be able to enable xdoctest on the CI so they stay that way.

@ezyang @vadimkantorov
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83317
Approved by: https://github.com/ezyang
2022-08-22 20:07:26 +00:00
Emilio Castillo
f0eb841d20 Make torch.optim.RMSprop differentiable (#83578)
Blocked by #82205
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83578
Approved by: https://github.com/albanD
2022-08-22 03:37:10 +00:00
albanD
84c4b07932 Make sure that we can load old optimizer checkpoint (#83588)
We want to make sure that we can load checkpoints that were saved with an older version of the code (which didn't contain the `differentiable` attribute).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83588
Approved by: https://github.com/mikaylagawarecki
2022-08-17 15:08:05 +00:00
Emilio Castillo
5aab57e112 Make Adam optimizer differentiable (#82205)
Continues [80938](https://github.com/pytorch/pytorch/pull/80938)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82205
Approved by: https://github.com/albanD
2022-08-17 07:20:37 +00:00
Rob Zinkov
ff75562cff Adding maximize to rprop (#81864)
Added the maximize flag (#68052) to the Rprop optimizer and updated the respective tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81864
Approved by: https://github.com/albanD
2022-08-16 08:19:46 +00:00
joncrall
4618371da5 Integrate xdoctest - Rebased (#82797)
This is a new version of #15648 based on the latest master branch.

Unlike the previous PR where I fixed a lot of the doctests in addition to integrating xdoctest, I'm going to reduce the scope here. I'm simply going to integrate xdoctest, and then I'm going to mark all of the failing tests as "SKIP". This will let xdoctest run on the dashboards, provide some value, and still let the dashboards pass. I'll leave fixing the doctests themselves to another PR.

In my initial commit, I do the bare minimum to get something running with failing dashboards. The few tests that I marked as skip are causing segfaults. Running xdoctest results in 293 failed, 201 passed tests. The next commits will be to disable those tests. (unfortunately I don't have a tool that will insert the `#xdoctest: +SKIP` directive over every failing test, so I'm going to do this mostly manually.)
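For reference, a minimal sketch of where such a skip directive sits inside a docstring example (the function and doctest below are made up for illustration, not taken from the PR):

```python
import torch

def double(x):
    """
    Example:
        >>> # xdoctest: +SKIP
        >>> double(torch.ones(2))
        tensor([2., 2.])
    """
    return 2 * x
```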

Fixes https://github.com/pytorch/pytorch/issues/71105

@ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82797
Approved by: https://github.com/ezyang
2022-08-12 02:08:01 +00:00
Federico Pozzi
f8a10a7f79 feat: add PolynomialLR scheduler (#82769)
### Description

Add PolynomialLR scheduler.
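A hedged usage sketch (the constructor parameters below are my reading of the merged API, not quoted from this PR):

```python
import torch
from torch.optim.lr_scheduler import PolynomialLR

opt = torch.optim.SGD([torch.randn(2, requires_grad=True)], lr=0.1)
# Decays the learning rate polynomially down to zero over total_iters steps.
sched = PolynomialLR(opt, total_iters=5, power=1.0)
for _ in range(5):
    opt.step()
    sched.step()
```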

### Issue
Closes #79511.

### Testing
I added tests for PolynomialLR.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/82769
Approved by: https://github.com/datumbox
2022-08-10 18:21:00 +00:00
Rob Zinkov
c54d18dbc7 Handle complex optimization in Adamax by treating complex numbers as 2D real numbers (#80319)
This commit partially addresses #65711
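For context, a small illustration of the "complex as 2D real" idea referenced above (illustrative only, not the code in this PR):

```python
import torch

# A complex tensor viewed as a real tensor with a trailing dimension of 2,
# which is how the optimizer arithmetic treats each complex parameter.
p = torch.randn(4, dtype=torch.complex64)
p_as_real = torch.view_as_real(p)          # shape (4, 2), dtype float32
assert torch.equal(torch.view_as_complex(p_as_real), p)
```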

Pull Request resolved: https://github.com/pytorch/pytorch/pull/80319
Approved by: https://github.com/albanD
2022-08-05 21:03:18 +00:00
Rob Zinkov
dcbe9ce2ad Handle complex optimization in AdamW by treating complex numbers as 2D real numbers (#80280)
This commit partially addresses #65711

Pull Request resolved: https://github.com/pytorch/pytorch/pull/80280
Approved by: https://github.com/albanD
2022-08-05 13:47:14 +00:00
Masaki Kozuki
3139722679 [foreach][mta] Inplace maximum and minimum (#82523)
### Description
Implement `torch._foreach_maximum_` and `torch._foreach_minimum_` mainly for `_multi_tensor_adam` and `_multi_tensor_adamw` with `amsgrad=True` to correctly update their `max_exp_avg_sqs`.

### Issue
- https://github.com/pytorch/pytorch/issues/78807
- https://github.com/pytorch/pytorch/pull/81894
- https://github.com/pytorch/pytorch/pull/81348
- https://github.com/pytorch/pytorch/pull/81705
- https://github.com/pytorch/pytorch/issues/58833
- https://github.com/pytorch/pytorch/issues/68041

### Testing
Updated `test_foreach.py::TestForeach::_minmax_test` to compare the outputs of `_foreach_maximum_` (and `_foreach_minimum_`) against those of `[torch.maximum(a, b) for a, b in zip(tensors1, tensors2)]`
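A small sketch of the behaviour being tested, mirroring the reference computation above (assumes a build that includes this PR, since `_foreach_maximum_` is a private API):

```python
import torch

xs = [torch.randn(3) for _ in range(4)]
ys = [torch.randn(3) for _ in range(4)]
expected = [torch.maximum(x, y) for x, y in zip(xs, ys)]   # the per-tensor reference
torch._foreach_maximum_(xs, ys)                            # in-place elementwise max, one call
assert all(torch.equal(x, e) for x, e in zip(xs, expected))
```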

cc @ngimel @albanD @mikaylagawarecki
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82523
Approved by: https://github.com/albanD
2022-08-03 03:40:42 +00:00
ProGamerGov
71d50f4f89 Change docstring type callable to Callable for consistency (#82487)
### Description

Across PyTorch's docstrings, both `callable` and `Callable` are used for variable types. `Callable` should be capitalized, as we are referring to the `Callable` type and not the Python `callable()` function.

### Testing

There shouldn't be any testing required.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/82487
Approved by: https://github.com/albanD
2022-08-01 17:26:09 +00:00
ProGamerGov
357b7d589c Fix docstring inconsistencies: string -> str, boolean -> bool (#82410)
### Description

Throughout the PyTorch docs and codebase, the `string` type in docstrings is referred to by two separate names. This leads to inconsistent docs, like you can see here: https://pytorch.org/docs/stable/generated/torch.nn.Conv3d.html#torch.nn.Conv3d

This PR fixes this issue by ensuring that all mentions of the string type in docstrings use the same format that Sphinx generates hyperlinks for.

### Testing
No testing should be required for this change

Pull Request resolved: https://github.com/pytorch/pytorch/pull/82410
Approved by: https://github.com/jbschlosser
2022-07-28 21:29:57 +00:00
Rob Zinkov
f9ef363982 Modifying Adam to support complex numbers as 2d real numbers (#80279)
This commit addresses issues in #65711

Pull Request resolved: https://github.com/pytorch/pytorch/pull/80279
Approved by: https://github.com/albanD
2022-07-27 18:39:40 +00:00
Sudarshan Raghunathan
52aae5aa19 [Sparse Adam] Fix error in loading serialized models due to introduction of new parameter (#82273)
### Description
PR #80336 introduced a new parameter to the Sparse Adam optimizer. The new parameter is accessed inside the `step` method of the optimizer. If we deserialize an optimizer that was saved before this change was introduced and run it, it fails when `step` tries to access the missing parameter.

I have added a workaround to set a default value in case the parameter is unavailable in the optimizer.


### Testing
* Testing on PyTorch CI
* Manual validation against existing serialized models to make sure they continue to work
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82273
Approved by: https://github.com/mehtanirav, https://github.com/albanD
2022-07-27 12:48:38 +00:00
albanD
312ece7f65 fix sgd maximize when momentum is involved (#81859)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81859
Approved by: https://github.com/jbschlosser
2022-07-26 16:48:32 +00:00
Emilio Castillo
49b4f45781 Add initial support for differentiable optimizers (#80938)
Adds the `differentiable` argument, a method for updating parameters in an existing optimizer, and a template for testing the differentiability of multiple optimizers.

This is all based on discussions with @albanD & @jbschlosser
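A minimal constructor-level sketch of the opt-in, assuming the flag is exposed as a keyword argument (as later PRs in this log do for individual optimizers):

```python
import torch
from torch.optim import SGD

w = torch.randn(3, requires_grad=True)
# With differentiable=True, step() is intended to run with autograd enabled
# so the parameter update itself can be differentiated through.
opt = SGD([w], lr=0.1, differentiable=True)
```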
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80938
Approved by: https://github.com/albanD
2022-07-25 13:37:08 +00:00
Rob Zinkov
50c655d5e3 Adding maximize to ASGD (#81875)
Added the maximize flag (#68052) to the ASGD optimizer and updated the respective tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81875
Approved by: https://github.com/albanD
2022-07-22 17:05:41 +00:00
PyTorch MergeBot
135af0fe30 Revert "Adding maximize to ASGD (#80323)"
This reverts commit 14bd5bd6ee.

Reverted https://github.com/pytorch/pytorch/pull/80323 on behalf of https://github.com/albanD due to Broke rocm test
2022-07-08 13:35:31 +00:00
PyTorch MergeBot
0b8a5ca01b Revert "Adding maximize to rprop (#80335)"
This reverts commit 495aa9bc3a.

Reverted https://github.com/pytorch/pytorch/pull/80335 on behalf of https://github.com/albanD due to Broke rocm and windows test
2022-07-08 13:34:02 +00:00
Rob Zinkov
f24c94d7ae Adding maximize to SparseAdam (#80336)
Added the maximize flag (#68052) to the SparseAdam optimizer and updated the respective tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80336
Approved by: https://github.com/albanD
2022-07-08 12:17:27 +00:00
Rob Zinkov
495aa9bc3a Adding maximize to rprop (#80335)
Added the maximize flag (#68052) to the Rprop optimizer and updated the respective tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80335
Approved by: https://github.com/albanD
2022-07-08 08:04:38 +00:00
Rob Zinkov
a1fd5b4273 Adding maximize to RMSprop (#80326)
Added the maximize flag (#68052) to the RMSprop optimizer and updated the respective tests.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/80326
Approved by: https://github.com/albanD
2022-07-08 08:04:26 +00:00
Rob Zinkov
14bd5bd6ee Adding maximize to ASGD (#80323)
Added the maximize flag (#68052) to the ASGD optimizer and updated the respective tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80323
Approved by: https://github.com/albanD
2022-07-08 08:03:36 +00:00
albanD
9d20af5060 remove overly restrictive checks for cudagraph (#80881)
Finish fixing https://github.com/pytorch/pytorch/issues/80809
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80881
Approved by: https://github.com/jbschlosser
2022-07-06 18:08:49 +00:00
Edward Z. Yang
57f001f35a Don't error if _warned_capturable_if_run_uncaptured not set (#80345)
This can happen if an optimizer was pickled.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80345
Approved by: https://github.com/malfet, https://github.com/albanD
2022-06-29 03:46:22 +00:00
anjali411
bda04e9f5e Add __all__ for torch.optim and torch.nn.modules modules (#80237)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80237
Approved by: https://github.com/albanD
2022-06-24 21:34:10 +00:00
Sergii Dymchenko
de7219e8a7 Use generators with all/any in torch/optim (#78142)
Generator expressions with any/all are less verbose and potentially help to save memory/CPU: https://eklitzke.org/generator-comprehensions-and-using-any-and-all-in-python
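For example (a generic illustration, not code from this PR):

```python
import torch

params = [torch.randn(2, requires_grad=True) for _ in range(3)]

# List comprehension: materializes the full list before any() runs.
has_grad_list = any([p.grad is not None for p in params])
# Generator expression: lazily evaluated, so any() can short-circuit.
has_grad_gen = any(p.grad is not None for p in params)
assert has_grad_list == has_grad_gen
```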

To make JIT work with this change, I added code to convert GeneratorExp to ListComp. So the whole PR is basically a no-op for JIT, but potentially a memory and speed improvement for eager mode.

Also, I removed a test from test/jit/test_parametrization.py. The test was incomplete (it had a TODO to actually implement it) and only checked that UnsupportedNodeError is thrown; with GeneratorExp support, a different error would be thrown.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78142
Approved by: https://github.com/malfet, https://github.com/albanD
2022-06-24 17:23:45 +00:00
albanD
375668cd96 Remove overly restrictive assert in adam (#80222)
This is causing issues if the user has the step on CUDA for a good reason.

These asserts cause code that used to run just fine to fail.
Note that keeping the step on CUDA is pretty bad for performance, so it is ok to try and push users away from doing it.

For the 1.12.1 milestone: this is not asking for a dot release to fix this (as this is bad practice anyways). But it would be a great thing to add if we do one: it is very low risk and will prevent breakage for users.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80222
Approved by: https://github.com/jbschlosser, https://github.com/ngimel
2022-06-24 17:08:34 +00:00
Antonio Kim
765b6a8fab Fix SequentialLR initialization (#72856)
What was happening is that when we have multiple learning rate schedulers, the order in which they are initialized is not taken into account. This is a problem if they are initialized in sequential order (as one might intuitively do).

Each scheduler calls `step()` on initialization and sets the `lr` in its optimizer's `params_groups`. However, this means that step 0 will be using the `lr` that was set by the very last scheduler (in the case of initializing schedulers sequentially) instead of the first scheduler.

The fix in this PR addresses the above bug by performing a call to the appropriate scheduler on initialization after decrementing the `last_epoch` values, in order to keep them the same post-step. This ensures that the correct scheduler is the one setting the `lr` values for the optimizer's `param_groups`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72856
Approved by: https://github.com/jbschlosser
2022-06-21 20:21:13 +00:00
Janosh Riebesell
660d9ddef4 Fix SWALR doc string (#79836)
In `torch/optim/swa_utils.py`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79836
Approved by: https://github.com/albanD
2022-06-20 12:57:07 +00:00
Sebastian Brodehl
fb9d8de379 Make LR scheduler stub complete, including OneCycleLR and class attributes. (#59476)
This PR completes the stub file for lr_scheduler, adds a previously missing scheduler, namely `OneCycleLR`, and adds additional class attributes and methods for all LR schedulers.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59476
Approved by: https://github.com/jbschlosser
2022-06-17 16:39:13 +00:00
Madhushan B
9acbaaaf05 Fix typo in ChainedScheduler docstring (#79775)
### Goal
Fixes https://github.com/pytorch/pytorch/issues/79720

### Approach
In the docstring, replace "Chains list of learning rate schedulers. It takes a list of chainable learning rate schedulers and performs consecutive step() functions **belong** to them by just one call." with "Chains list of learning rate schedulers. It takes a list of chainable learning rate schedulers and performs consecutive step() functions **belonging** to them by just one call."

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79775
Approved by: https://github.com/albanD
2022-06-17 14:18:42 +00:00
Michael Carilli
ba27ee9e8f [CUDA graphs] Allows Adam and AdamW to be capture-safe (#77862)
Near term fix for https://github.com/pytorch/pytorch/issues/76368.

Q. Why does the user need to request `capturable=True` in the optimizer constructor? Why can't capture safety be completely automatic?
A. We need to set up capture-safe (device-side) state variables before capture. If we don't, and step() internally detects capture is underway, it's too late: the best we could do is create a device state variable and copy the current CPU value into it, which is not something we want baked into the graph.

Q. Ok, why not just do the capture-safe approach with device-side state variables all the time?
A. It incurs several more kernel launches per parameter, which could really add up and regress cpu overhead for ungraphed step()s. If the optimizer won't be captured, we should allow step() to stick with its current cpu-side state handling.

Q. But cuda RNG is a stateful thing that maintains its state on the cpu outside of capture and replay, and we capture it automatically. Why can't we do the same thing here?
A. The graph object can handle RNG generator increments because its capture_begin, capture_end, and replay() methods can see and access the generator object. But the graph object has no explicit knowledge of or access to optimizer steps in its capture scope. We could let the user tell the graph object what optimizers will be stepped in its scope, i.e. something like
```python
graph.will_use_optimizer(opt)
graph.capture_begin()
...
```
but that seems clunkier than an optimizer constructor arg.

I'm open to other ideas, but right now I think a constructor arg is necessary and the least bad approach.

Long term, https://github.com/pytorch/pytorch/issues/71274 is a better fix.
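A minimal constructor-level sketch of the opt-in described above (requires a CUDA device; graph capture itself follows the usual `torch.cuda.CUDAGraph` workflow and is omitted here):

```python
import torch

# Parameters and optimizer state must live on the GPU for capture to be safe.
params = [torch.zeros(3, device="cuda", requires_grad=True)]
opt = torch.optim.Adam(params, lr=1e-3, capturable=True)   # device-side state variables
```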
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77862
Approved by: https://github.com/ezyang
2022-06-13 01:56:47 +00:00
Rob Zinkov
2a496e2f80 Adding maximize to Adamax (#77409)
Added the maximize flag (#68052) to the Adamax optimizer and updated the respective tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77409
Approved by: https://github.com/albanD
2022-05-16 17:34:44 +00:00
James Reed
57b54dfec5 Fix Optimizer.zero_grad type annotation (#76998)
`Optimizer.zero_grad()` defines the `set_to_none` argument as `bool`, not `Optional[bool]`


Pull Request resolved: https://github.com/pytorch/pytorch/pull/76998
Approved by: https://github.com/albanD
2022-05-11 00:05:26 +00:00
tomMoral
ff94c9dee4 DOC fix momentum equation for nesterov
Fix https://github.com/pytorch/pytorch/issues/72395

This is a small fix in the doc for an index in this equation:

![image](https://user-images.githubusercontent.com/3321081/166165461-140855b5-96b5-4417-85fc-2a170f95700a.png)

I think the index should not be `t-1` but `t`. This is consistent with [the implementation](https://github.com/pytorch/pytorch/blob/master/torch/optim/sgd.py#L236) and with what is done, for instance, in [Keras](https://www.tensorflow.org/api_docs/python/tf/keras/optimizers/SGD).
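For reference, a sketch of the corrected update as I read it from the linked implementation (momentum \(\mu\), gradient \(g_t\), learning rate \(\gamma\), dampening omitted); the point of the fix is that the Nesterov correction uses the buffer at step \(t\), not \(t-1\):

```latex
b_t      = \mu\, b_{t-1} + g_t \\
g_t      \leftarrow g_t + \mu\, b_t \quad \text{(nesterov)} \\
\theta_t = \theta_{t-1} - \gamma\, g_t
```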
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76639
Approved by: https://github.com/albanD
2022-05-04 20:40:21 +00:00
Emilio Castillo
e5ee6f5cf7 Fix CosineAnnealingLR on restart
Fixes #60265

The initial LR for this scheduler is not consistent when a new instance is created with `last_epoch != -1`

Maybe we can refactor the testing code to test `last_epoch != -1` in schedulers that can recreate their state from the current epoch?

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60339
Approved by: https://github.com/albanD
2022-04-20 13:35:01 +00:00
Rob Zinkov
6642e88ad2 Adding maximize flag to Adagrad
This adds maximize to Adagrad (#68052) along with updates to the respective tests.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75968
Approved by: https://github.com/albanD
2022-04-20 08:29:03 +00:00
Jake Tae
3b18bc36f3 Docs: Add missing zero-ing step in Rprop algorithm
Fixes #70418.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75555
Approved by: https://github.com/albanD
2022-04-11 21:57:13 +00:00
francescocastelli
58a44523c1 Add maximize flag to Adadelta
Added the maximize flag to Adadelta optimizer (#68052) and adjusted tests to take maximize into account.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75330
Approved by: https://github.com/cpuhrsch
2022-04-08 20:32:35 +00:00
Mikayla Gawarecki
10bb0ffe69 Fix casting bug in state_step for optimizers when loading state dict
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75214

Approved by: https://github.com/albanD
2022-04-05 01:27:18 +00:00
Jan Zikes
715a0dc5c0 [PyTorch/d2go] fix optim _multi_tensor (#73215)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73215

Fixing an issue in optimizers from _multi_tensor, for `sgd_mt` introduced in 2cb03e926f

Reviewed By: mikaylagawarecki

Differential Revision: D34389034

fbshipit-source-id: ede153d52dca15909c6c022853589707f18dc8d1
(cherry picked from commit cc8a58e584)
2022-02-23 10:29:48 +00:00
Sergii Dymchenko
313557a613 Add missing import (#72840)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/72840

Reviewed By: H-Huang

Differential Revision: D34242612

Pulled By: albanD

fbshipit-source-id: 3dd34de96dbf1ae8f3c3ea45888d211d95862c49
(cherry picked from commit d2650ffa75)
2022-02-15 19:43:54 +00:00
Mikayla Gawarecki
2a5aaf1c49 Optim foreach cleanup for AdamW (#70484)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70484

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D33767869

Pulled By: mikaylagawarecki

fbshipit-source-id: 2f5273bbfeea3ed502c5d77da4bebe1674243e86
(cherry picked from commit 2dd9b77917)
2022-02-15 18:02:08 +00:00
Mikayla Gawarecki
dff58d519f Optim foreach cleanup for Rprop (#70483)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70483

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D33767866

Pulled By: mikaylagawarecki

fbshipit-source-id: ffc5ae68eeea8fa09385862b853b731554b77bcb
(cherry picked from commit 3a0fe29580)
2022-02-15 18:02:08 +00:00
Mikayla Gawarecki
ce3094f5f6 Optim foreach cleanup for Rmsprop (#70482)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70482

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D33767862

Pulled By: mikaylagawarecki

fbshipit-source-id: 8e2e9c986d5a3774093a79755940372945f1b3a9
(cherry picked from commit baea537277)
2022-02-15 18:02:08 +00:00
Mikayla Gawarecki
2cb03e926f Optim foreach cleanup for SGD (#70481)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70481

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D33767868

Pulled By: mikaylagawarecki

fbshipit-source-id: 89b9227a4ddf99602855973cbc343c58ae3d5328
(cherry picked from commit ffea8ddcfd)
2022-02-15 18:02:08 +00:00
Mikayla Gawarecki
5f9590681d Optim foreach cleanup for Adam (#70295)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70295

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D33767870

Pulled By: mikaylagawarecki

fbshipit-source-id: f922f15ecb0307458c8ecee737325c42c4f3ce8b
(cherry picked from commit 66233a8a3e)
2022-02-15 18:02:08 +00:00
Mikayla Gawarecki
0972db5b7d Optim foreach cleanup for ASGD (#70231)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70231

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D33767867

Pulled By: mikaylagawarecki

fbshipit-source-id: 4406824acbb6f427d52c1ced2d8a02a98c943b86
(cherry picked from commit cbd9a4da15)
2022-02-09 16:52:13 +00:00
Mikayla Gawarecki
5948522e9c Optim foreach cleanup for RAdam (#70230)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70230

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D33767874

Pulled By: mikaylagawarecki

fbshipit-source-id: 9379db24266a7bbcc2c23849f87ae0af2e6729c0
(cherry picked from commit ecf7b31fc3)
2022-02-09 16:52:13 +00:00
Mikayla Gawarecki
3653f07c7c Optim foreach cleanup for NAdam (#70229)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70229

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D33767873

Pulled By: mikaylagawarecki

fbshipit-source-id: 833ead14c1d1659351ebfbeb41045a3c7eb96dad
(cherry picked from commit 9415df6b5c)
2022-02-09 16:52:13 +00:00
Mikayla Gawarecki
d9acfef831 Optim foreach cleanup for Adamax (#69982)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69982

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D33767865

Pulled By: mikaylagawarecki

fbshipit-source-id: c5efd351e359825d38b71f57a2c61a2055c3c114
(cherry picked from commit 37bb80c2d7)
2022-02-09 16:52:13 +00:00
Mikayla Gawarecki
dabfea8363 Optim foreach cleanup for Adagrad (#69981)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69981

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D33767863

Pulled By: mikaylagawarecki

fbshipit-source-id: 1c99abe4ac4eb2a9eb896dff4837b539b94f68e7
(cherry picked from commit 61c28d0645)
2022-02-09 16:52:12 +00:00
Mikayla Gawarecki
8e8d170674 Optim foreach cleanup for Adadelta (#69980)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69980

- Merged `torch/optim/adadelta.py` and `torch/optim/_multi_tensor/adadelta.py` into `torch/optim/adadelta.py`
- Moved adadelta functional forms from `torch/optim/_functional.py` and `torch/optim/_multi_tensor/_functional.py` to `torch/optim/adadelta.py`
- `torch/optim/_functional.py` just imports from `torch/optim/adadelta.py`
- Added a test `test_optimizers_foreach_flag` which replicates `test_multi_tensor_optimizers` in `test/test_optim.py`
- Added a test `test_adadelta_new` that replicates the behavior of `test_adadelta` but with the `foreach` flag instead of using the multi-tensor Adadelta class. If we delete `_multi_tensor/` we could replace `test_adadelta` with this

Remaining TODO:

- [ ] single-tensor Adadelta supports complex but multi-tensor does not; we need to integrate the single-tensor logic into the multi-tensor path and switch `test_adadelta_complex` to test for foreach in [True, False]
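For context, a hedged sketch of how the consolidated API is selected after this cleanup (the `foreach` flag is the mechanism named in the tests above; the exact defaults are not quoted from this PR):

```python
import torch
from torch.optim import Adadelta

params = [torch.randn(4, requires_grad=True)]
# The multi-tensor (foreach) path is chosen with a constructor flag rather than
# a separate class under torch.optim._multi_tensor.
opt = Adadelta(params, lr=1.0, foreach=True)
```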

Test Plan: Imported from OSS

Reviewed By: VitalyFedyunin, albanD

Differential Revision: D33413059

Pulled By: mikaylagawarecki

fbshipit-source-id: 92a9fa98705762bb6bd464261671e49aef40070e
(cherry picked from commit a008227d22)
2022-02-09 16:52:12 +00:00
Mikayla Gawarecki
8bb1d06702 [optim] ASGD fold state updates into functional and pass list of vars rather than states (#71335)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/71335

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D33767871

Pulled By: mikaylagawarecki

fbshipit-source-id: 84ebe1fafb1c27572f08c8c8026c882dd7e054c1
(cherry picked from commit 7613ebb391)
2022-02-08 23:58:41 +00:00
Mikayla Gawarecki
ccc1a01dcb [optim] NAdam fold state updates into functional (#71334)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/71334

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D33767864

Pulled By: mikaylagawarecki

fbshipit-source-id: 4d985e9e346f40110bd4231e0f16e5643fbc448d
(cherry picked from commit 58aa77e367)
2022-02-08 23:58:41 +00:00
Mikayla Gawarecki
7176c92687 [optim] update step in functional and pass state_steps instead of state (#71333)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71333

Updated
- Adagrad
- Adamax
- Adam
- AdamW
- RAdam

to make the multi-tensor functionals take `state_steps: List[Tensor]` instead of `states: List[Dict]`, i.e. changing `state_steps: List[int]` to `state_steps: List[Tensor]`, where each entry is a singleton tensor so the step can be updated within the functional.

NAdam and ASGD were updated in separate diffs to fold their handling of state into the functionals.
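A quick illustration of why singleton step tensors help here (illustrative only, not the PR's code): the functional can bump the step in place, instead of returning a new Python int for the caller to write back into state.

```python
import torch

state_steps = [torch.tensor(0.0) for _ in range(3)]
for step_t in state_steps:
    step_t += 1                      # in-place update, visible to the caller
assert all(int(s) == 1 for s in state_steps)
```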

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D33767872

Pulled By: mikaylagawarecki

fbshipit-source-id: 9baa7cafb6375eab839917df9287c65a437891f2
(cherry picked from commit 831c02b3d0)
2022-02-08 16:51:19 +00:00
Prabhat Roy
942a084c46 Remove state_dict from AveragedModel and use buffers instead (#71763)
Summary:
Fixes [https://github.com/pytorch/pytorch/issues/66686](https://github.com/pytorch/pytorch/issues/66686)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/71763

Reviewed By: anjali411

Differential Revision: D33770907

Pulled By: prabhat00155

fbshipit-source-id: ee32f2cb8475c9add4e1a9a5d3d784ef95825efc
(cherry picked from commit a15898b072)
2022-01-26 13:31:30 +00:00
Alban Desmaison
e1b84e1b6b fix loading of older models that don't have maximize (#71023)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/71023

Reviewed By: jbschlosser

Differential Revision: D33483687

Pulled By: albanD

fbshipit-source-id: 2f3c6e97a9579be9ba15eca0756fc1f2c466fbb6
2022-01-10 06:01:24 -08:00
Jake Tae
dd1121435b SequentialLR update _last_lr on step (#70558)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/68956.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70558

Reviewed By: dagitses

Differential Revision: D33430213

Pulled By: albanD

fbshipit-source-id: 446f182610de32db224d55b244d76c3076e8080f
2022-01-07 10:36:35 -08:00
Alban Desmaison
c6e727d05b Fix adamw formula doc (#68587)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/68482

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68587

Reviewed By: dagitses, jbschlosser

Differential Revision: D33478646

Pulled By: albanD

fbshipit-source-id: 4e6419829c3faa7449c041e7d467a6dab30fe917
2022-01-07 10:15:16 -08:00
Mikayla Gawarecki
3a21f38a2e Integrate multi_tensor zero_grad into Optimizer base class (#69936)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69936

Currently, the optimizers in `torch/optim/_multi_tensor/` all override the base Optimizer class' implementation of `zero_grad` with the same foreach zero_grad implementation (e.g. [here](https://github.com/pytorch/pytorch/blob/master/torch/optim/_multi_tensor/adadelta.py#L93-L114)). There is a TODO that indicates that this should be refactored to the base class once the foreach ops are in good shape. This PR is intended to address that TODO.
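A rough sketch of the shared foreach-style zero_grad being consolidated (the function name below is illustrative, not the code that was moved into the base class):

```python
import torch

def foreach_zero_grad(params):
    grads = [p.grad for p in params if p.grad is not None]
    if grads:
        torch._foreach_zero_(grads)   # one fused call instead of a per-tensor loop

params = [torch.randn(2, requires_grad=True) for _ in range(3)]
sum((p ** 2).sum() for p in params).backward()
foreach_zero_grad(params)
assert all(int(p.grad.count_nonzero()) == 0 for p in params)
```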

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D33346748

Pulled By: mikaylagawarecki

fbshipit-source-id: 6573f4776aeac757b6a778894681868191a1b4c7
2022-01-05 15:46:23 -08:00
Adnios
15f14ce0dc fix typo in adam docs (#70387)
Summary:
Fix the typo in [adam docs in master branch](https://pytorch.org/docs/master/generated/torch.optim.Adam.html#torch.optim.Adam)

![image](https://user-images.githubusercontent.com/41060790/147345284-37e180d1-fd06-4a62-9c79-2d17b8aa5cd3.png)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70387

Reviewed By: H-Huang

Differential Revision: D33309283

Pulled By: albanD

fbshipit-source-id: d20c5d8f2498ac64013f71e202a6b50dcc069f2b
2021-12-28 07:35:39 -08:00
Adnios
a9c7d626e1 Add the maximize flag to AdamW (#70146)
Summary:
Related issue: https://github.com/pytorch/pytorch/issues/68052

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70146

Reviewed By: malfet

Differential Revision: D33254561

Pulled By: albanD

fbshipit-source-id: f190c836a4162f936c5953e076747c345df21421
2021-12-23 09:20:29 -08:00
Rohit Gupta
5f3f327a9d update SequentialLR signature (#69817)
Summary:
- ~optimizer isn't required for `SequentialLR` since it's already present in the schedulers. Trying to match the signature of it with `ChainedScheduler`.~
- ~`verbose` isn't really used anywhere so removed it.~

updated missing docs and added a small check

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69817

Reviewed By: ngimel

Differential Revision: D33069589

Pulled By: albanD

fbshipit-source-id: f015105a35a2ca39fe94c70acdfd55cdf5601419
2021-12-16 12:58:00 -08:00
John Muradeli
fdcb78df38 print fix in lr_scheduler (#68338)
Summary:
`{:5d}` fails for `CosineAnnealingWarmRestarts`, which has a float `epoch`.
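A minimal reproduction of the failure mode (generic Python, not the scheduler code itself):

```python
"{:5d}".format(3)            # fine for integer epochs
try:
    "{:5d}".format(1.5)      # CosineAnnealingWarmRestarts passes a float epoch
except ValueError as e:
    print(e)                 # Unknown format code 'd' for object of type 'float'
```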

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68338

Reviewed By: jbschlosser

Differential Revision: D33063970

Pulled By: albanD

fbshipit-source-id: 992e987f8d5f6f8f5067924df4671e9725b6d884
2021-12-14 09:05:19 -08:00
oliver
3d358a7678 Adds a maximize flag to Adam (#68164)
Summary:
Solves the next most important use case in https://github.com/pytorch/pytorch/issues/68052.

I have kept the style as close to that in SGD as seemed reasonable, given the slight differences in their internal implementations.

All feedback welcome!

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68164

Reviewed By: VitalyFedyunin

Differential Revision: D32994129

Pulled By: albanD

fbshipit-source-id: 65c57c3f3dbbd3e3e5338d51def54482503e8850
2021-12-13 05:53:53 -08:00
Kurt Mohler
52219b1017 Fix ChainedScheduler.get_last_lr() (#69112)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/68820

cc vincentqb jbschlosser albanD

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69112

Reviewed By: zou3519

Differential Revision: D32796626

Pulled By: albanD

fbshipit-source-id: bde9d4e473527be4c0a7f21cb57f795a67a99eaa
2021-12-02 13:44:12 -08:00
Santiago Castro
263125a962 Fix RAdam docstring on LR default value (#69186)
Summary:

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69186

Reviewed By: albanD

Differential Revision: D32759614

Pulled By: H-Huang

fbshipit-source-id: b11819c50156a538cd6003e9cddde0390c853f67
2021-12-01 14:32:07 -08:00
Artsiom Sanakoyeu
c0e6dc9ac7 [pytorch] Fix loading from checkpoint after "maximize" flag was introduced in SGD (#68733)
Summary:
After the 'maximize' flag was introduced in https://github.com/pytorch/pytorch/issues/46480, some jobs fail because they resume training from checkpoints.

After we load old checkpoints, we get an error during the optimizer.step() call in the backward pass (torch/optim/sgd.py, line 129) because there is no key 'maximize' in the parameter groups of the SGD optimizer.

To circumvent this, I add a default value `group.setdefault('maximize', False)` when the optimizer state is restored.
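A hedged sketch of the pattern described above (shape assumed, not the exact diff); the class name is hypothetical:

```python
from torch.optim.optimizer import Optimizer

class SGDWithDefault(Optimizer):
    def __setstate__(self, state):
        super().__setstate__(state)
        # Param groups restored from checkpoints that predate the flag
        # simply get the old behaviour as a default.
        for group in self.param_groups:
            group.setdefault('maximize', False)
```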

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68733

Reviewed By: albanD

Differential Revision: D32480963

Pulled By: asanakoy

fbshipit-source-id: 4e367fe955000a6cb95090541c143a7a1de640c2
2021-11-23 11:42:16 -08:00
nhankiet
a2e35e167b refactor: update f-string for swa.utils.py (#68718)
Summary:
Update some old-style string formatting to f-strings for consistency across the file.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68718

Reviewed By: jbschlosser

Differential Revision: D32593746

Pulled By: albanD

fbshipit-source-id: fcc17958f8af6a3260beca883bc1065f019dcf0e
2021-11-22 11:23:18 -08:00
oliver
94b6fa6f8b Adds an optimizer instance variable to ChainedScheduler (#68010)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/67601.

As simple a fix as I could make it. I even managed to delete some testing code!

I checked calling `super()` and, as I had feared, it doesn't work out of the box, so perhaps that ought to be revisited later.

As it stands, https://github.com/pytorch/pytorch/issues/20124 still applies to the chained scheduler, but I think this change is still an improvement.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68010

Reviewed By: zou3519

Differential Revision: D32278139

Pulled By: albanD

fbshipit-source-id: 4c6f9f1b2822affdf63a6d22ddfdbcb1c6afd579
2021-11-10 01:31:47 -08:00
oliver
f8297d40fc Adds a maximize flag to SGD. (#67847)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/46480 -- for SGD.
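For context, a minimal sketch of what the new flag does (gradient ascent instead of descent); the numbers are just a worked illustration, not taken from the PR:

```python
import torch
from torch.optim import SGD

w = torch.zeros(3, requires_grad=True)
opt = SGD([w], lr=0.1, maximize=True)            # ascend on the objective
obj = (w * torch.tensor([1.0, 2.0, 3.0])).sum()
obj.backward()
opt.step()
print(w)   # tensor([0.1000, 0.2000, 0.3000], requires_grad=True): moved uphill by lr * grad
```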

## Notes:
- I have modified the existing tests to take a new `constructor_accepts_maximize` flag. When this is set to true, the `_test_basic_cases_template` function will test both maximizing and minimizing the sample function.
- This was the clearest way I could think of testing the changes -- I would appreciate feedback on this strategy.

## Work to be done:
- [ ] I need to update the docs.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67847

Reviewed By: H-Huang

Differential Revision: D32252631

Pulled By: albanD

fbshipit-source-id: 27915a3cc2d18b7e4d17bfc2d666fe7d2cfdf9a4
2021-11-09 00:43:07 -08:00
hesom
07a08fb95f Fix typo in LinearLR docs (#67840)
Summary:
The final learning rate should be 0.05, like the lr passed as the argument to the optimizer, not 0.005.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67840

Reviewed By: jbschlosser

Differential Revision: D32187091

Pulled By: albanD

fbshipit-source-id: 8aff691bba3896a847d7b9d9d669a65f67a6f066
2021-11-05 07:16:15 -07:00
Yiwen Song
6696c59af4 Adding optimizer attribute to SequentialLR (#67406)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/67318 :)

cc albanD, datumbox

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67406

Reviewed By: jbschlosser

Differential Revision: D31997873

Pulled By: albanD

fbshipit-source-id: f579fb886d049a545673fd92ef5892fcf501bcc6
2021-10-28 14:43:40 -07:00
Christopher Gray Howard
dfa7225a38 [Pytorch][Bootcamp] Add fix and testing for non-vectorized Adadelta optimizer to handle complex numbers (#66587)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66587

Made some changes in the step function of the non-vectorized Adadelta optimizer to handle complex numbers as two real numbers, as per #65711 on GitHub
ghstack-source-id: 141484731

Test Plan:
buck test mode/dev caffe2/test:optim -- 'test_adadelta_complex'

https://pxl.cl/1R7kJ

Reviewed By: albanD

Differential Revision: D31630069

fbshipit-source-id: 2741177b837960538ce39772897af36bbce7b7d8
2021-10-26 17:35:01 -07:00
Christopher Gray Howard
acb340de75 [Pytorch][Bootcamp] Add fixes and vanilla testing for Adagrad non-vectorized and vectorized optimizers to handle complex numbers (#66671)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66671

Made changes in the step function of the vectorized and non-vectorized Adagrad optimizers to handle complex numbers as two real numbers, as per #65711 on GitHub
ghstack-source-id: 141442350

Test Plan:
buck test mode/dev caffe2/test:optim -- 'test_adagrad_complex'
https://pxl.cl/1Rd44

Reviewed By: albanD

Differential Revision: D31673503

fbshipit-source-id: 90a0d0c69b556716e2d17c59ce80f09c750fc464
2021-10-25 10:13:21 -07:00
Prabhat Roy
c7748fc172 Added validation of mode parameter in AveragedModel (#65921)
Summary:
Discussion: https://github.com/pytorch/pytorch/pull/65495#issuecomment-930460469

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65921

Reviewed By: albanD

Differential Revision: D31310105

Pulled By: prabhat00155

fbshipit-source-id: 417691832a7c793744830c11e0ce53e3972d21a3
2021-10-03 08:42:28 -07:00
Prabhat Roy
2ea724b1fd Added option to update parameters using state_dict in AveragedModel (#65495)
Summary:
While implementing [EMA](https://github.com/pytorch/vision/pull/4381) (which extends AveragedModel) in torchvision, update_parameters() from AveragedModel could not be used as it did not handle state_dict(), so a custom update_parameters() needed to be defined in the [EMA class](https://github.com/pytorch/vision/pull/4406). This PR aims to handle this scenario, removing the need for that custom update_parameters() implementation.

Discussion: https://github.com/pytorch/vision/pull/4406#pullrequestreview-753734102
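A hedged sketch of the EMA-style use case that motivates this (the decay value and `avg_fn` below are illustrative, not torchvision's implementation):

```python
import torch
from torch.optim.swa_utils import AveragedModel

model = torch.nn.Linear(4, 2)

def ema_avg(avg_param, param, num_averaged):
    return 0.999 * avg_param + 0.001 * param   # exponential moving average

ema_model = AveragedModel(model, avg_fn=ema_avg)
ema_model.update_parameters(model)   # fold the current weights into the running average
```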

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65495

Reviewed By: datumbox

Differential Revision: D31176742

Pulled By: prabhat00155

fbshipit-source-id: 326d14876018f21cf602bab5eaba344678dbabe2
2021-09-28 03:34:49 -07:00
Balaji
32f0387ee8 Bug in CosineAnnealingWarmRestarts in optim/lr_scheduler.py (#64758)
Summary:
## 🐛 Bug
'CosineAnnealingWarmRestarts' object has no attribute 'T_cur'.
In the constructor of CosineAnnealingWarmRestarts, we call the constructor of the parent class (_LRScheduler), which in turn calls the step method of CosineAnnealingWarmRestarts.
The called method tries to update the object's attribute 'T_cur', which is not defined yet, so it raises the error.
This only happens when we pass 0 or a positive integer for the last_epoch argument while initializing the CosineAnnealingWarmRestarts object.

![Bug_in_CosineAnnealingWarmRestarts](https://user-images.githubusercontent.com/77477328/132552212-70abc8b5-0357-4c35-90a9-832648bac607.png)
## To Reproduce

Steps to reproduce the behavior:

1. Give the value for the last_epoch argument as zero, OR
2. Give the value for the last_epoch argument as a positive integer.

## Expected behavior

I only expected the 'CosineAnnealingWarmRestarts' object to be initialized.

## Environment

PyTorch version: 1.9.0+cpu
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A
OS: Ubuntu 20.04.2 LTS (x86_64)
GCC version: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Clang version: Could not collect
CMake version: version 3.21.2
Libc version: glibc-2.31
Python version: 3.8.10  [GCC 9.4.0] (64-bit runtime)
Python platform: Linux-5.8.0-59-generic-x86_64-with-glibc2.29
Is CUDA available: False
CUDA runtime version: No CUDA

## Additional context
We can solve this bug by moving the line 'self.T_cur = self.last_epoch' above the 'super(CosineAnnealingWarmRestarts, self).__init__()' line, since 'self.T_cur' is then already initialized on the object when step() runs.
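A minimal sketch of the reported scenario (assuming an affected build; on fixed versions the constructor succeeds):

```python
import torch
from torch.optim.lr_scheduler import CosineAnnealingWarmRestarts

opt = torch.optim.SGD([torch.randn(2, requires_grad=True)], lr=0.1)
opt.param_groups[0]['initial_lr'] = 0.1   # present when genuinely resuming
# On affected versions, constructing with last_epoch >= 0 raises
# AttributeError: 'CosineAnnealingWarmRestarts' object has no attribute 'T_cur',
# because the parent constructor calls step() before T_cur is assigned.
sched = CosineAnnealingWarmRestarts(opt, T_0=10, last_epoch=0)
```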

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64758

Reviewed By: ezyang

Differential Revision: D31113694

Pulled By: jbschlosser

fbshipit-source-id: 98c0e292291775895dc3566fda011f2d6696f721
2021-09-22 16:55:14 -07:00
Ilqar Ramazanli
df3d649380 To add state dict and load_dict for Chained Scheduler (#65034)
Summary:
Adding state_dict() and load_state_dict() methods for Chained Scheduler

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65034

Reviewed By: prabhat00155, nateanl

Differential Revision: D30958207

Pulled By: datumbox

fbshipit-source-id: 1a587a330d34e0548e891a39f8fb5a3d251b71fa
2021-09-15 13:11:41 -07:00
Ilqar Ramazanli
211ad231dc To add state_dict and load_state_dict to SequentialLR (#65035)
Summary:
To add state_dict() and load_state_dict() methods to SequentialLR

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65035

Reviewed By: prabhat00155, nateanl

Differential Revision: D30958204

Pulled By: datumbox

fbshipit-source-id: 65114e1b07146526ae2680233f5cd42b2534d67a
2021-09-15 12:01:51 -07:00
Ilqar Ramazanli
dafa0a5a3b [doc][hackathon] To add Adadelta Optimizer to the documentation (#63255)
Summary:
It has been discussed before that adding descriptions of optimization algorithms to the PyTorch Core documentation may result in a nice optimization research tutorial. In the following tracking issue we mention all the necessary algorithms and links to the originally published papers: https://github.com/pytorch/pytorch/issues/63236.

In this PR we are adding description of AdaDelta Algorithm to the documentation.  For more details, we refer to the paper  here https://arxiv.org/abs/1212.5701

<img width="654" alt="AdaDeltaalg" src="https://user-images.githubusercontent.com/73658284/132770544-82ccf90a-1d54-4ad5-8fc4-51c8dec63a12.png">

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63255

Reviewed By: ngimel

Differential Revision: D30867589

Pulled By: iramazanli

fbshipit-source-id: 5ba602c20c724a4486bdd38b73e1b64c0e767bdc
2021-09-10 16:49:12 -07:00
Ilqar Ramazanli
54b72a99ef To add Rprop documentation (#63866)
Summary:
It has been discussed before that adding descriptions of optimization algorithms to the PyTorch Core documentation may result in a nice optimization research tutorial. In the following tracking issue we mention all the necessary algorithms and links to the originally published papers: https://github.com/pytorch/pytorch/issues/63236.

In this PR we are adding description of Rprop to the documentation.  For more details, we refer to the paper  http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.21.1417

<img width="657" alt="Rpropalg" src="https://user-images.githubusercontent.com/73658284/132750009-a5ec059e-6d53-4c67-917b-57174c8ca27b.png">

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63866

Reviewed By: ngimel

Differential Revision: D30867590

Pulled By: iramazanli

fbshipit-source-id: 0d2d4ffc6c4d939290bbbaa84d2c6e901ed8b54a
2021-09-10 09:49:10 -07:00
Ilqar Ramazanli
d4b09dbab3 [doc][hackathon] To add Adagrad Optimizer to the documentation (#63254)
Summary:
It has been discussed before that adding descriptions of optimization algorithms to the PyTorch Core documentation may result in a nice optimization research tutorial. In the following tracking issue we mention all the necessary algorithms and links to the originally published papers: https://github.com/pytorch/pytorch/issues/63236.

In this PR we are adding description of Adagrad to the documentation.  For more details, we refer to the paper
http://jmlr.org/papers/v12/duchi11a.html

<img width="658" alt="AdaGradAlgo" src="https://user-images.githubusercontent.com/73658284/132743276-a52ea3fb-70a5-4788-94b7-f99367907a26.png">

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63254

Reviewed By: albanD

Differential Revision: D30852139

Pulled By: iramazanli

fbshipit-source-id: 9e496560a97e92be8386585b01d9bd3bba4b0c66
2021-09-09 15:41:29 -07:00
Ilqar Ramazanli
2b41bf40c5 To add SequentialLR to PyTorch Core Schedulers (#64037)
Summary:
Partially resolves https://github.com/pytorch/vision/issues/4281

In this PR we are proposing a new scheduler, SequentialLR, which enables a list of different schedulers to be called in different periods of the training process.

The main motivation for this scheduler is the recently gained popularity of a warm-up phase in training. It has been shown that using small steps in the initial stages of training can help convergence happen faster.

With the help of SequentialLR we mainly enable calling a small constant (or linearly increasing) learning rate scheduler followed by the actual target learning rate scheduler.

```python
scheduler1 = ConstantLR(optimizer, factor=0.1, total_iters=2)
scheduler2 = ExponentialLR(optimizer, gamma=0.9)
scheduler = SequentialLR(optimizer, schedulers=[scheduler1, scheduler2], milestones=[5])

for epoch in range(100):
    train(...)
    validate(...)
    scheduler.step()
```

This code snippet will call `ConstantLR` for the first 5 epochs and will follow up with `ExponentialLR` in the following epochs.

This scheduler can be used to call any group of schedulers one after the other. The main consideration is that every time we switch to a new scheduler, we assume that the new scheduler starts from the beginning (the zeroth epoch).

We also add the ChainedScheduler to the `optim.rst` and `lr_scheduler.pyi` files here.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64037

Reviewed By: albanD

Differential Revision: D30841099

Pulled By: iramazanli

fbshipit-source-id: 94f7d352066ee108eef8cda5f0dcb07f4d371751
2021-09-09 09:36:32 -07:00
Ilqar Ramazanli
239366c9c2 To add Rectified Adam Description to Documentation (#63772)
Summary:
It has been discussed before that adding descriptions of optimization algorithms to the PyTorch Core documentation may result in a nice optimization research tutorial. In the following tracking issue we mention all the necessary algorithms and links to the originally published papers: https://github.com/pytorch/pytorch/issues/63236.

In this PR we are adding description of Rectified Adam Algorithm to the documentation.  For more details, we refer to the paper  https://arxiv.org/abs/1908.03265

<img width="446" alt="RadamAlgo" src="https://user-images.githubusercontent.com/73658284/132587815-4764b642-df53-4e41-975f-72e0f40fdc48.png">

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63772

Reviewed By: datumbox

Differential Revision: D30839694

Pulled By: iramazanli

fbshipit-source-id: 6f5629ce56e10c66a451433334b587b99eda1610
2021-09-09 07:10:36 -07:00
Ilqar Ramazanli
5b21f172a4 [doc][hackathon] To add AdamW Optimizer to the documentation (#63252)
Summary:
It has been discussed before that adding descriptions of optimization algorithms to the PyTorch Core documentation may result in a nice optimization research tutorial. In the following tracking issue we mention all the necessary algorithms and links to the originally published papers: https://github.com/pytorch/pytorch/issues/63236.

In this PR we are adding description of AdamW Algorithm to the documentation.  For more details, we refer to the paper  here https://arxiv.org/abs/1711.05101

<img width="442" alt="AdamWalgo" src="https://user-images.githubusercontent.com/73658284/132589957-6d381e96-cb62-40d0-990f-82a32ec455be.png">

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63252

Reviewed By: datumbox

Differential Revision: D30839685

Pulled By: iramazanli

fbshipit-source-id: 1a426c874ab86408d286a34f41aefcf5b21167c0
2021-09-09 07:05:31 -07:00
Ilqar Ramazanli
39ce801d1f To add Adamax algorithm to documentation (#63903)
Summary:
It has been discussed before that adding descriptions of optimization algorithms to the PyTorch Core documentation may result in a nice optimization research tutorial. In the following tracking issue we mention all the necessary algorithms and links to the originally published papers: https://github.com/pytorch/pytorch/issues/63236.

In this PR we are adding description of Adamax Algorithm to the documentation.  For more details, we refer to the paper  https://arxiv.org/abs/1412.6980

<img width="447" alt="Adamx" src="https://user-images.githubusercontent.com/73658284/132577306-878ce64c-627a-4086-808c-d0482868d4a1.png">

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63903

Reviewed By: albanD

Differential Revision: D30819055

Pulled By: iramazanli

fbshipit-source-id: 37f748cbea9f93bf37193ee30fc295fb1a1e9ffd
2021-09-09 06:42:33 -07:00
Ilqar Ramazanli
149f1114fe To add Stochastic Gradient Descent to Documentation (#63805)
Summary:
It has been discussed before that adding descriptions of optimization algorithms to the PyTorch Core documentation may result in a nice optimization research tutorial. In the following tracking issue we mention all the necessary algorithms and links to the originally published papers: https://github.com/pytorch/pytorch/issues/63236.

In this PR we are adding description of Stochastic Gradient Descent to the documentation.

<img width="466" alt="SGDalgo" src="https://user-images.githubusercontent.com/73658284/132585881-b351a6d4-ece0-4825-b9c0-126d7303ed53.png">

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63805

Reviewed By: albanD

Differential Revision: D30818947

Pulled By: iramazanli

fbshipit-source-id: 3812028e322c8a64f4343552b0c8c4582ea382f3
2021-09-08 15:22:30 -07:00
Ilqar Ramazanli
43248d9112 [doc][hackathon] To add Adam Optimizer to the documentation (#63251)
Summary:
It has been discussed before that adding descriptions of optimization algorithms to the PyTorch Core documentation may result in a nice optimization research tutorial. In the following tracking issue we mention all the necessary algorithms and links to the originally published papers: https://github.com/pytorch/pytorch/issues/63236.

In this PR we are adding description of Adam Algorithm to the documentation.  For more details, we refer to the paper  https://arxiv.org/abs/1412.6980

<img width="442" alt="Screen Shot 2021-08-27 at 6 37 54 PM" src="https://user-images.githubusercontent.com/73658284/131195297-35fce613-3691-4fed-b42d-db234d4fcd7c.png">

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63251

Reviewed By: albanD

Differential Revision: D30779163

Pulled By: iramazanli

fbshipit-source-id: 319a80fc3952793b0d064d0e641ddc1de3c05a86
2021-09-07 11:03:35 -07:00
Ilqar Ramazanli
f767cf6683 To change WarmUp Scheduler with ConstantLR and LinearLR (#64395)
Summary:
Partially unblocks https://github.com/pytorch/vision/issues/4281

Previously we added WarmUp schedulers to PyTorch Core in PR https://github.com/pytorch/pytorch/pull/60836, which had two modes of execution, linear and constant, depending on the warm-up function.

In this PR we are changing this interface to a more direct form by separating the linear and constant modes into separate schedulers. In particular,

```Python
scheduler1 = WarmUpLR(optimizer, warmup_factor=0.1, warmup_iters=5, warmup_method="constant")
scheduler2 = WarmUpLR(optimizer, warmup_factor=0.1, warmup_iters=5, warmup_method="linear")
```

will look like

```Python
scheduler1 = ConstantLR(optimizer, warmup_factor=0.1, warmup_iters=5)
scheduler2 = LinearLR(optimizer, warmup_factor=0.1, warmup_iters=5)
```

correspondingly.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64395

Reviewed By: datumbox

Differential Revision: D30753688

Pulled By: iramazanli

fbshipit-source-id: e47f86d12033f80982ddf1faf5b46873adb4f324
2021-09-07 08:42:31 -07:00
Tugsbayasgalan (Tugsuu) Manlaibaatar
52d7dd7398 [DOC] improve docstring for Optimizer.state_dict (#63153)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63153

Fixes: https://github.com/pytorch/pytorch/issues/60121

Test Plan: Imported from OSS

Reviewed By: pbelevich

Differential Revision: D30629462

Pulled By: tugsbayasgalan

fbshipit-source-id: a9160e02ac53bb1a6219879747d73aae9ebe4d2f
2021-08-29 10:20:58 -07:00
Ilqar Ramazanli
aefa2f3e64 To add RMSProp algorithm documentation (#63721)
Summary:
It has been discussed before that adding descriptions of optimization algorithms to the PyTorch Core documentation may result in a nice optimization research tutorial. In the following tracking issue we mention all the necessary algorithms and links to the originally published papers: https://github.com/pytorch/pytorch/issues/63236.

In this PR we are adding description of RMSProp to the documentation.  For more details, we refer to the paper   https://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf

<img width="464" alt="RMSProp" src="https://user-images.githubusercontent.com/73658284/131179226-3fb6fe5a-5301-4948-afbe-f38bf57f24ff.png">

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63721

Reviewed By: albanD

Differential Revision: D30612426

Pulled By: iramazanli

fbshipit-source-id: c3ac630a9658d1282866b53c86023ac10cf95398
2021-08-28 15:55:56 -07:00
Ilqar Ramazanli
9ccb9299e0 To add Nesterov Adam algorithm description to documentation (#63793)
Summary:
It has been discussed before that adding descriptions of optimization algorithms to the PyTorch Core documentation may result in a nice optimization research tutorial. In the following tracking issue we mention all the necessary algorithms and links to the originally published papers: https://github.com/pytorch/pytorch/issues/63236.

In this PR we are adding description of Nesterov Adam Algorithm to the documentation.  For more details, we refer to the paper  https://openreview.net/forum?id=OM0jvwB8jIp57ZJjtNEZ

<img width="439" alt="NAdam" src="https://user-images.githubusercontent.com/73658284/131185124-e81b2edf-33d9-4a9d-a7bf-f7e5eea47d7c.png">

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63793

Reviewed By: NivekT

Differential Revision: D30617057

Pulled By: iramazanli

fbshipit-source-id: cd2054b0d9b6883878be74576e86e307f32f1435
2021-08-27 19:29:34 -07:00
Ilqar Ramazanli
5a12cb611f To add Chained Scheduler to the list of PyTorch schedulers. (#63491)
Summary:
In this PR we are introducing ChainedScheduler which initially proposed in the discussion https://github.com/pytorch/pytorch/pull/26423#discussion_r329976246 .

The idea is to provide a user-friendly chaining method for schedulers, especially for cases where many of them are involved and we want a clean, easy-to-read interface. This method will be even more crucial once composite schedulers and schedulers for different types of parameters are involved.

The immediate application of ChainedScheduler is expected to happen in the TorchVision library to combine WarmUpLR and MultiStepLR (https://github.com/pytorch/vision/blob/master/references/video_classification/scheduler.py#L5). However, this method can be expected to apply to many other use cases as well.

### Example
The usage is as simple as below:

```python
sched=ChainedScheduler([ExponentialLR(self.opt, gamma=0.9),
                        WarmUpLR(self.opt, warmup_factor=0.2, warmup_iters=4, warmup_method="constant"),
                        StepLR(self.opt, gamma=0.1, step_size=3)])
```

Then calling
```python
sched.step()
```
would trigger step function for all three schedulers consecutively

Partially resolves https://github.com/pytorch/vision/issues/4281

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63491

Reviewed By: datumbox, mruberry

Differential Revision: D30576180

Pulled By: iramazanli

fbshipit-source-id: b43f0749f55faab25079641b7d91c21a891a87e4
2021-08-26 13:30:21 -07:00
Ilqar Ramazanli
e7c4988b52 To fix the chainability at epoch zero for some schedulers (#63457)
Summary:
It has been discussed in https://github.com/pytorch/pytorch/pull/60836#issuecomment-899084092 that we have observed an obstacle to chaining some types of learning rate schedulers. In particular, we observed:

* some of the learning rate schedulers return initial learning rates at epoch 0 as
```
       return self.base_lrs
```

* This can be a problem when two schedulers are chained and called as

```
     scheduler1.step()
     scheduler2.step()
```

in particular, we completely ignore the effect of scheduler1 at epoch 0. This would not be an issue if scheduler1 were ineffective at epoch 0, as is the case for many schedulers; however, for schedulers such as warm-up schedulers, where the multiplicative value at epoch 0 is smaller than 1, this can lead to undesired behavior.

The following code snippet illustrates the problem better

## Reproducing the bug

```python
import torch
from torch.nn import Parameter
from torch.optim import SGD
from torch.optim.lr_scheduler import WarmUpLR, ExponentialLR

model = [Parameter(torch.randn(2, 2, requires_grad=True))]
optimizer = SGD(model, 1.0)
scheduler1 = WarmUpLR(optimizer, warmup_factor=0.1, warmup_iters=5, warmup_method="constant")
scheduler2 = ExponentialLR(optimizer, gamma=0.9)

for epoch in range(10):
     print(epoch, scheduler2.get_last_lr()[0])
     optimizer.step()
     scheduler1.step()
     scheduler2.step()
```

### Current Result

```
0 1.0
1 0.9
2 0.81
3 0.7290000000000001
4 0.6561000000000001
5 5.904900000000001
6 5.314410000000001
7 4.782969000000001
8 4.304672100000001
9 3.874204890000001
```

### Expected Result

```
0 1.0
1 0.9
2 0.81
3 0.7290000000000001
4 0.6561000000000001
5 0.5904900000000001
6 0.5314410000000001
7 0.4782969000000001
8 0.4304672100000001
9 0.3874204890000001
```
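
A minimal sketch of the chainable pattern (assumed, simplified scheduler internals; not the exact PyTorch fix): at epoch 0 a scheduler should read the learning rate currently stored in the optimizer rather than returning `self.base_lrs`, so a preceding scheduler's epoch-0 effect is preserved.

```python
# Hedged sketch of a chainable get_lr(); `self.gamma`, `self.last_epoch` and
# `self.optimizer` mirror typical scheduler attributes and are assumptions here.
def get_lr(self):
    if self.last_epoch == 0:
        # read the current lr instead of base_lrs so scheduler1's effect survives
        return [group['lr'] for group in self.optimizer.param_groups]
    return [group['lr'] * self.gamma for group in self.optimizer.param_groups]
```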

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63457

Reviewed By: datumbox

Differential Revision: D30424160

Pulled By: iramazanli

fbshipit-source-id: 3e15af8d278c872cd6f53406b55f4d3ce5002867
2021-08-19 07:17:03 -07:00
Ilqar Ramazanli
cec08e7032 To add warm-up scheduler to optim (#60836)
Summary:
Warm-up of learning rate scheduling was initially discussed by Priya et al. in the paper https://arxiv.org/pdf/1706.02677.pdf .

In section 2.2 of the paper they discussed and proposed the idea of warming up learning rate schedules in order to prevent large variance / noise in the learning rate. The idea has been further discussed in the following papers:
  * Akilesh Gotmare et al. https://arxiv.org/abs/1810.13243
  * Bernstein et al  http://proceedings.mlr.press/v80/bernstein18a/bernstein18a.pdf
  * Liyuan Liu et al: https://arxiv.org/pdf/1908.03265.pdf

There are two popular types of learning rate warm-up:
  * Constant warm-up (start with a very small constant learning rate)
  * Linear warm-up (start with a small learning rate and gradually increase it)

In this PR we are adding warm-up as a learning rate scheduler. Note that learning rate schedulers are chainable, which means that we can combine the warm-up scheduler with any other learning rate scheduler to build a more sophisticated schedule.

## Linear Warmup

Linear warm-up multiplies the learning rate by a pre-defined constant, warmup_factor, in the first epoch (epoch 0), and then linearly increases this multiplicative constant to one over warmup_iters epochs. Hence the multiplicative constant at the i-th step is:

                    warmup_factor + (1 - warmup_factor) * i / warmup_iters

Moreover, the ratio of this quantity at step i to step i-1 gives

           1 + (1.0 - warmup_factor) / [warmup_iters*warmup_factor + (i-1)*(1-warmup_factor)]

which is used in the get_lr() method of our implementation. Below we provide an example of how to use the linear warm-up scheduler and show how it behaves.

```python
import torch
from torch.nn import Parameter
from torch.optim import SGD
from torch.optim.lr_scheduler import WarmUpLR

model = [Parameter(torch.randn(2, 2, requires_grad=True))]
optimizer = SGD(model, 0.1)
scheduler = WarmUpLR(optimizer, warmup_factor=0.1, warmup_iters=10, warmup_method="linear")

for epoch in range(15):

    print(epoch, scheduler.get_last_lr()[0])

    optimizer.step()
    scheduler.step()
```

```
0 0.010000000000000002
1 0.019000000000000003
2 0.028000000000000008
3 0.03700000000000001
4 0.04600000000000001
5 0.055000000000000014
6 0.06400000000000002
7 0.07300000000000002
8 0.08200000000000003
9 0.09100000000000004
10 0.10000000000000005
11 0.10000000000000005
12 0.10000000000000005
13 0.10000000000000005
14 0.10000000000000005
```

## Constant Warmup

Constant warm-up has a straightforward idea: multiply the learning rate by warmup_factor until we reach epoch warmup_iters, then do nothing for the following epochs.

```python
import torch
from torch.nn import Parameter
from torch.optim import SGD
from torch.optim.lr_scheduler import WarmUpLR

model = [Parameter(torch.randn(2, 2, requires_grad=True))]
optimizer = SGD(model, 0.1)
scheduler = WarmUpLR(optimizer, warmup_factor=0.1, warmup_iters=5, warmup_method="constant")

for epoch in range(10):

    print(epoch, scheduler.get_last_lr()[0])

    optimizer.step()
    scheduler.step()
```

```
0 0.010000000000000002
1 0.010000000000000002
2 0.010000000000000002
3 0.010000000000000002
4 0.010000000000000002
5 0.10000000000000002
6 0.10000000000000002
7 0.10000000000000002
8 0.10000000000000002
9 0.10000000000000002
```
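
A minimal sketch of the multiplicative factor described above (illustrative only; the function and parameter names are assumptions, not the PyTorch source):

```python
# Per-epoch multiplicative factor for the two warm-up methods; multiplying it
# by the base lr (0.1 in the examples above) reproduces the printed schedules.
def warmup_factor_at(i, warmup_factor, warmup_iters, method):
    if i >= warmup_iters:
        return 1.0
    if method == "constant":
        return warmup_factor
    # linear: warmup_factor + (1 - warmup_factor) * i / warmup_iters
    return warmup_factor + (1.0 - warmup_factor) * i / warmup_iters

# e.g. linear with warmup_factor=0.1, warmup_iters=10 gives 0.1, 0.19, 0.28, ... 1.0
```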

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60836

Reviewed By: saketh-are

Differential Revision: D29537615

Pulled By: iramazanli

fbshipit-source-id: d910946027acc52663b301f9c56ade686e62cb69
2021-08-15 12:31:45 -07:00
Ilqar Ramazanli
5ed6e4429e To fix variance computation for complex Adam (#62946)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/59998

It has been discussed in the issue that the variance term of the Adam optimizer is currently not computed correctly in the complex domain. As stated in the "Generalization to complex numbers" section of https://en.wikipedia.org/wiki/Variance, the variance of a complex random variable X is computed as E[(X - mu)(X - mu)*] (where mu = E[X] and * stands for the conjugate).

However, the current implementation of Adam computes E[(X - mu)(X - mu)], which does not return the right variance value; in particular, it returns a complex number. Variance is defined to be a real number even when the underlying random variable is complex.

We fix this issue here, and test that the resulting variance is indeed a real number.
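
A minimal sketch of the corrected second-moment update for a complex gradient (illustrative only; the names `grad` and `exp_avg_sq` mirror typical Adam internals and are assumptions here, not the exact fix):

```python
import torch

beta2 = 0.999
grad = torch.randn(4, dtype=torch.complex64)
exp_avg_sq = torch.zeros(4, dtype=torch.complex64)

# E[(X - mu)(X - mu)*]: multiply by the conjugate so the accumulated second
# moment has zero imaginary part, unlike grad * grad which is complex in general.
exp_avg_sq.mul_(beta2).addcmul_(grad, grad.conj(), value=1 - beta2)
assert torch.allclose(exp_avg_sq.imag, torch.zeros_like(exp_avg_sq.imag))
```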

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62946

Reviewed By: albanD

Differential Revision: D30196038

Pulled By: iramazanli

fbshipit-source-id: ab0a6f31658aeb56bdcb211ff86eaa29f3f0d718
2021-08-09 17:54:43 -07:00
Eugene Yang
27135f86fd fix docstring default value of last_epoch for SWALR in torch/optim/… (#62799)
Summary:
…swa_utils

Fixes https://github.com/pytorch/pytorch/issues/62633

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62799

Reviewed By: zou3519

Differential Revision: D30131929

Pulled By: H-Huang

fbshipit-source-id: 741c077073bbe398492dff0761836acdbba7be78
2021-08-06 08:15:10 -07:00
Philip Meier
8423ab4f99 Fix CosineAnnealingWarmRestart annotation (#61106)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/44770.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61106

Reviewed By: 1ntEgr8

Differential Revision: D29635764

Pulled By: walterddr

fbshipit-source-id: ddc45a7f04532a76d033ae7774706da1fa8608f7
2021-07-09 08:28:18 -07:00
ramvenkat98
4a544df00d Implement and benchmark a torch.optim.multi_tensor.adagrad implementation (#59155)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59155

Test Plan: Imported from OSS

Reviewed By: iramazanli

Differential Revision: D29525213

Pulled By: ramvenkat98

fbshipit-source-id: 6d7e8da91c965d1f4e955a084ed875bab641dc9a
2021-07-07 08:08:32 -07:00
Ilqar Ramazanli
f0e972a481 To add Nesterov Adam algorithm for multi-tensor optimizers API (#59165)
Summary:
Previously, in the PR https://github.com/pytorch/pytorch/issues/59009, we added NAdam to the optimizers. Here we propose a multi-tensor version of NAdam for PyTorch.

NAdam was proposed in the paper https://openreview.net/forum?id=OM0jvwB8jIp57ZJjtNEZ and the report http://cs229.stanford.edu/proj2015/054_report.pdf by Timothy Dozat.

It has become one of the most widely used algorithms in the deep learning community.

It is worth noting that the implementation of NAdam is inspired by the Keras implementation:
f9d3868495/tensorflow/python/keras/optimizer_v2/nadam.py

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59165

Reviewed By: vincentqb

Differential Revision: D29360577

Pulled By: iramazanli

fbshipit-source-id: 0fe14016303b2df2cb8cc31912a2674acf63d1e5
2021-06-27 17:00:41 -07:00
Ilqar Ramazanli
5563f4bda0 To add Rectified Adam algorithm for multi-tensor optimizers API (#59161)
Summary:
Previously, in the PR https://github.com/pytorch/pytorch/issues/58968, we added RAdam to the optimizers. Here we propose a multi-tensor version of RAdam for PyTorch.

RAdam was proposed in the paper https://arxiv.org/pdf/1908.03265.pdf by Liyuan Liu et al.

It has become one of the most widely used algorithms in the deep learning community.

Differing from the paper, we selected a variance tractability cut-off of 5 instead of 4, as is common practice.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59161

Reviewed By: vincentqb

Differential Revision: D29360576

Pulled By: iramazanli

fbshipit-source-id: 7ccdbf12b1ee7f12e66f7d7992123a70cc818b6b
2021-06-27 13:01:20 -07:00
Ilqar Ramazanli
e1bd4963e2 To introduce Functional API for multi-tensor (#60735)
Summary:
In this PR we move the multi-tensor optimizers to the functional API.

We can see in the file https://github.com/pytorch/pytorch/blob/master/torch/optim/_functional.py that a functional API has been defined for most optimizers. However, we do not have a similar file / functionality for the multi-tensor optimizers:
https://github.com/pytorch/pytorch/tree/master/torch/optim/_multi_tensor

Therefore we add it in this PR.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60735

Reviewed By: vincentqb

Differential Revision: D29392253

Pulled By: iramazanli

fbshipit-source-id: cebc8e7b07ab11156370f5297cfb419cd9f20b46
2021-06-25 13:09:26 -07:00
Ilqar Ramazanli
7c2938bf67 To refactor Sparse Adam algorithm for functional form (#59171)
Summary:
Adds Functional Interface for Sparse Adam Optimizer.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59171

Reviewed By: vincentqb

Differential Revision: D29360582

Pulled By: iramazanli

fbshipit-source-id: 5ceffd7f4b7abd1e0b758a5b8445abdf5555eba0
2021-06-25 06:35:39 -07:00
Ilqar Ramazanli
63219f1f9f To add Rectified Adam Algorithm to Optimizers (#58968)
Summary:
Fixes : https://github.com/pytorch/pytorch/issues/24892

In the paper https://arxiv.org/pdf/1908.03265.pdf, Liyuan Liu et al. suggested a new optimization algorithm similar in essence to the Adam algorithm.

The paper discusses how, without a warm-up heuristic, adaptive optimization algorithms can sometimes exhibit undesirably large variance in their early stages, which can slow the overall convergence process.

The authors proposed rectifying the variance of the adaptive learning rate when it is expected to be high.

Differing from the paper, we selected a variance tractability cut-off of 5 instead of 4. This adjustment is common practice, and can be found in the author's code repository as well as the TensorFlow Swift optimizer library:

2f03dd1970/radam/radam.py (L156)

f51ee4618d/Sources/TensorFlow/Optimizers/MomentumBased.swift (L638)
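
A rough sketch of the rectification check with the cut-off at 5 (illustrative only; the helper name is an assumption and this is not the PyTorch implementation):

```python
import math

def radam_rectification(step, beta2=0.999):
    # step is assumed >= 1; returns the rectification multiplier r_t, or None
    # when the variance is intractable (rho_t <= 5) and a plain momentum step
    # should be taken instead of the adaptive one.
    rho_inf = 2.0 / (1.0 - beta2) - 1.0
    rho_t = rho_inf - 2.0 * step * beta2 ** step / (1.0 - beta2 ** step)
    if rho_t > 5.0:
        return math.sqrt(((rho_t - 4) * (rho_t - 2) * rho_inf)
                         / ((rho_inf - 4) * (rho_inf - 2) * rho_t))
    return None
```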

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58968

Reviewed By: vincentqb

Differential Revision: D29310601

Pulled By: iramazanli

fbshipit-source-id: b7bd487f72f1074f266687fd9c0c6be264a748a9
2021-06-23 18:27:57 -07:00
Ilqar Ramazanli
e8690dacb2 To add Nesterov Adam Algorithm to Optimizers (#59009)
Summary:
Fixes : https://github.com/pytorch/pytorch/issues/5804

In the paper https://openreview.net/forum?id=OM0jvwB8jIp57ZJjtNEZ, Timothy Dozat suggested a new optimization algorithm that essentially combines the NAG and Adam algorithms.

It is known that momentum can be improved with Nesterov acceleration, and Dozat investigates applying this idea to the momentum component of the Adam algorithm. The author provides experimental evidence in the paper demonstrating the merits of the idea.

In this PR we implement the NAdam algorithm proposed in that paper. In a preliminary report, http://cs229.stanford.edu/proj2015/054_report.pdf, the author shows that the decay base constant should be taken as 0.96, which we follow here, in line with Keras. The implementation also follows Keras coding practice in some other places:

f9d3868495/tensorflow/python/keras/optimizer_v2/nadam.py
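
A minimal sketch of the Keras-style momentum-decay schedule referred to above (the 0.96 base); the function and parameter names are assumptions, not the exact PyTorch code:

```python
def nadam_mu(step, beta1=0.9, momentum_decay=4e-3):
    # mu_t damps the effective momentum early on and approaches beta1 over time
    return beta1 * (1.0 - 0.5 * 0.96 ** (step * momentum_decay))
```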

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59009

Reviewed By: gchanan, vincentqb

Differential Revision: D29220375

Pulled By: iramazanli

fbshipit-source-id: 4b4bb4b15f7e16f7527f368bbf4207ed345751aa
2021-06-23 08:21:43 -07:00
Sam Estep
1abf45e37f Revert D29241736: [pytorch][PR] To add Rectified Adam Algorithm to Optimizers
Test Plan: revert-hammer

Differential Revision:
D29241736 (0d2a936176)

Original commit changeset: 288b9b1f3125

fbshipit-source-id: 56c4ec98647c6f1822b130726741a1c9ca193670
2021-06-22 12:08:31 -07:00
Ilqar Ramazanli
0d2a936176 To add Rectified Adam Algorithm to Optimizers (#58968)
Summary:
Fixes : https://github.com/pytorch/pytorch/issues/24892

In the paper : https://arxiv.org/pdf/1908.03265.pdf  Liyuan Liu et al. suggested a new optimization algorithm with an essence of similar to Adam Algorithm.

It has been discussed in the paper that, without warmup heuristic, in the early stage of adaptive optimization / learning algorithms sometimes we can get undesirable large variance which can slow overall convergence process.

Authors proposed the idea of rectification of variance of adaptive learning rate when it is expected to be high.

Differing from the paper, we selected variance tractability cut-off as 5 instead of 4. This adjustment is common practice, and could be found in the code-repository and also tensorflow swift optim library as well :

2f03dd1970/radam/radam.py (L156)

f51ee4618d/Sources/TensorFlow/Optimizers/MomentumBased.swift (L638)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58968

Reviewed By: gchanan

Differential Revision: D29241736

Pulled By: iramazanli

fbshipit-source-id: 288b9b1f3125fdc6c7a7bb23fde1ea5c201c0448
2021-06-22 10:38:41 -07:00
Ilqar Ramazanli
9a622f4cd9 refactor ASGD to use functional API (#58410)
Summary:
The functional API is used in large-scale distributed training to enable multithreaded training instead of multiprocess training, as it gives better resource utilization and efficiency.

In this PR, we migrate and refactor the ASGD algorithm to the functional API.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58410

Reviewed By: ailzhang

Differential Revision: D28546702

Pulled By: iramazanli

fbshipit-source-id: 4f62b6037d53f35b19f98340e88af2ebb6243a4f
2021-05-19 18:55:52 -07:00
Wanchao Liang
4611387608 [optim] take kw-only argument for functional optim APIs (#56185)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56185

ghstack-source-id: 126670123

Reviewed By: albanD

Differential Revision: D27802169

fbshipit-source-id: f5e1cb2046dcdeecf5f6b0f70892828bf0adb22f
2021-04-15 20:08:04 -07:00
Wanchao Liang
8ef13cf976 [optim] refactor rprop to use functional API (#55832)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55832

ghstack-source-id: 126325541

Reviewed By: driazati

Differential Revision: D27703877

fbshipit-source-id: 34d4ce7b7d124c0cd75e2f6d0bc8f836713b7301
2021-04-15 15:19:41 -07:00
Wanchao Liang
bb245b6444 [optim] refactor adamax to use functional API (#55830)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55830

ghstack-source-id: 126325537

Reviewed By: driazati

Differential Revision: D26561017

fbshipit-source-id: 41273d200e546d4ac08d39b57865d63c624f143a
2021-04-15 15:19:39 -07:00
mattip
40d74e6f71 breakup optim, cuda documentation (#55673)
Summary:
Related to https://github.com/pytorch/pytorch/issues/52256

Use autosummary instead of autofunction to create subpages for optim and cuda functions/classes.

Also fix some minor formatting issues in optim.LBFGS and cuda.stream docstings

Pull Request resolved: https://github.com/pytorch/pytorch/pull/55673

Reviewed By: jbschlosser

Differential Revision: D27747741

Pulled By: zou3519

fbshipit-source-id: 070681f840cdf4433a44af75be3483f16e5acf7d
2021-04-14 12:44:00 -07:00
Sam Estep
5bcbbf5373 Lint trailing newlines (#54737)
Summary:
*Context:* https://github.com/pytorch/pytorch/issues/53406 added a lint for trailing whitespace at the ends of lines. However, in order to pass FB-internal lints, that PR also had to normalize the trailing newlines in four of the files it touched. This PR adds an OSS lint to normalize trailing newlines.

The changes to the following files (made in 54847d0adb9be71be4979cead3d9d4c02160e4cd) are the only manually-written parts of this PR:

- `.github/workflows/lint.yml`
- `mypy-strict.ini`
- `tools/README.md`
- `tools/test/test_trailing_newlines.py`
- `tools/trailing_newlines.py`

I would have liked to make this just a shell one-liner like the other three similar lints, but nothing I could find quite fit the bill. Specifically, all the answers I tried from the following Stack Overflow questions were far too slow (at least a minute and a half to run on this entire repository):

- [How to detect file ends in newline?](https://stackoverflow.com/q/38746)
- [How do I find files that do not end with a newline/linefeed?](https://stackoverflow.com/q/4631068)
- [How to list all files in the Git index without newline at end of file](https://stackoverflow.com/q/27624800)
- [Linux - check if there is an empty line at the end of a file [duplicate]](https://stackoverflow.com/q/34943632)
- [git ensure newline at end of each file](https://stackoverflow.com/q/57770972)

To avoid giving false positives during the few days after this PR is merged, we should probably only merge it after https://github.com/pytorch/pytorch/issues/54967.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/54737

Test Plan:
Running the shell script from the "Ensure correct trailing newlines" step in the `quick-checks` job of `.github/workflows/lint.yml` should print no output and exit in a fraction of a second with a status of 0. That was not the case prior to this PR, as shown by this failing GHA workflow run on an earlier draft of this PR:

- https://github.com/pytorch/pytorch/runs/2197446987?check_suite_focus=true

In contrast, this run (after correcting the trailing newlines in this PR) succeeded:

- https://github.com/pytorch/pytorch/pull/54737/checks?check_run_id=2197553241

To unit-test `tools/trailing_newlines.py` itself (this is run as part of our "Test tools" GitHub Actions workflow):
```
python tools/test/test_trailing_newlines.py
```

Reviewed By: malfet

Differential Revision: D27409736

Pulled By: samestep

fbshipit-source-id: 46f565227046b39f68349bbd5633105b2d2e9b19
2021-03-30 13:09:52 -07:00
Jay Patel
4f62c622b3 Cleanup of unused list in adam.py (#53874)
Summary:
Code cleanup.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/53874

Reviewed By: jbschlosser

Differential Revision: D27036819

Pulled By: ngimel

fbshipit-source-id: c267e20c8d91224cd3c01b302a75f43aa309b560
2021-03-15 09:49:27 -07:00
Sam Estep
8c798e0622 Forbid trailing whitespace (#53406)
Summary:
Context: https://github.com/pytorch/pytorch/pull/53299#discussion_r587882857

These are the only hand-written parts of this diff:
- the addition to `.github/workflows/lint.yml`
- the file endings changed in these four files (to appease FB-internal land-blocking lints):
  - `GLOSSARY.md`
  - `aten/src/ATen/core/op_registration/README.md`
  - `scripts/README.md`
  - `torch/csrc/jit/codegen/fuser/README.md`

The rest was generated by running this command (on macOS):
```
git grep -I -l ' $' -- . ':(exclude)**/contrib/**' ':(exclude)third_party' | xargs gsed -i 's/ *$//'
```

I looked over the auto-generated changes and didn't see anything that looked problematic.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/53406

Test Plan:
This run (after adding the lint but before removing existing trailing spaces) failed:
- https://github.com/pytorch/pytorch/runs/2043032377

This run (on the tip of this PR) succeeded:
- https://github.com/pytorch/pytorch/runs/2043296348

Reviewed By: walterddr, seemethere

Differential Revision: D26856620

Pulled By: samestep

fbshipit-source-id: 3f0de7f7c2e4b0f1c089eac9b5085a58dd7e0d97
2021-03-05 17:22:55 -08:00
Wanchao Liang
f8238d7917 [optim] bugfix when all parameters have no grad (#52944)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52944

This fixes a bug introduced while refactoring the optimizers in https://github.com/pytorch/pytorch/pull/50411. When all parameters have no grads, we should still allow `beta`-like hyperparameters to be defined.

Reviewed By: ngimel

Differential Revision: D26699827

fbshipit-source-id: 8a7074127704c7a4a1fbc17d48a81e23a649f280
2021-03-03 11:56:09 -08:00
MiHarsh
c871abecf5 Added torch.no_grad() to update_bn (#52654)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/52055

This fixes the **out of memory error** while using update_bn in **SWA**, by not allocating memory for backpropagation.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/52654

Reviewed By: malfet

Differential Revision: D26620077

Pulled By: albanD

fbshipit-source-id: 890b5a78ba9c1a148f3ab7c63472a73d8f6412a4
2021-02-25 11:35:38 -08:00
Chester Liu
58eb23378f Clean up usage of torch._six partially (#49785)
Summary:
See https://github.com/pytorch/pytorch/issues/42919

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49785

Reviewed By: mruberry

Differential Revision: D25963833

Pulled By: bugra

fbshipit-source-id: 11c90d6b8d3f206c9d0a4d8621b773beb10c6ba2
2021-02-08 13:58:34 -08:00
Vincent Quenneville-Belair
50d903f19f [optim] make functional api be private (#51316) (#51665)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51665

This reverts commit 896f82aa92.

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D26232608

Pulled By: vincentqb

fbshipit-source-id: ca006baf4fb672c11c1bb003c39a29cbadb63dd3
2021-02-03 17:59:05 -08:00
Jasha
a651696ab4 fix misspelling in swa_utils.pyi (#51608)
Summary:
Change `avg_fun -> avg_fn` to match the spelling in the `.py` file.
(`swa_utils.pyi` should match `swa_utils.py`)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51608

Reviewed By: glaringlee

Differential Revision: D26224779

Pulled By: zou3519

fbshipit-source-id: 01ff7173ba0a996f1b7a653438acb6b6b4659de6
2021-02-03 10:51:22 -08:00
Vincent Quenneville-Belair
896f82aa92 [optim] make functional api be private (#51316)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51316

Make the optim functional API private until we release it with beta

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D26213469

fbshipit-source-id: b0fd001a8362ec1c152250bcd57c7205ed893107
2021-02-03 09:29:33 -08:00
Jan
a5b65ae40a Fix small typo (#51542)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/51541

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51542

Reviewed By: albanD

Differential Revision: D26199174

Pulled By: H-Huang

fbshipit-source-id: 919fc4a70d901916eae123672d010e9eb8e8b977
2021-02-02 10:14:17 -08:00
Wanchao Liang
5cbe1e4933 [dist_optim] add distributed functional Adam optimizer (#50624)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50624

Add TorchScript compatible Adam functional optimizer to distributed optimizer

Test Plan: Imported from OSS

Reviewed By: rohan-varma

Differential Revision: D25932770

Pulled By: wanchaol

fbshipit-source-id: cab3f1164c76186969c284a2c52481b79bbb7190
2021-01-23 01:01:37 -08:00
Wanchao Liang
df96344968 [optimizer] refactor AdamW to use functional API (#50411)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50411

Test Plan: Imported from OSS

Reviewed By: izdeby

Differential Revision: D25932776

Pulled By: wanchaol

fbshipit-source-id: e8e1696b3390ba7909b36fd0107c58b892520432
2021-01-21 11:00:45 -08:00
Wanchao Liang
ce1781d8db [optimizer] refactor RMSProp to use functional API (#50410)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50410

Test Plan: Imported from OSS

Reviewed By: izdeby

Differential Revision: D25932779

Pulled By: wanchaol

fbshipit-source-id: b0d6007ea83d77e2d70d04681163ea7e4632c5cd
2021-01-21 11:00:41 -08:00
Wanchao Liang
d6fb27ce72 [optimizer] refactor Adadelta to use functional API (#50409)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50409

Test Plan: Imported from OSS

Reviewed By: izdeby

Differential Revision: D25932780

Pulled By: wanchaol

fbshipit-source-id: 2fc025f66a0e0863f21689892e19d8a5681f2f2f
2021-01-21 11:00:36 -08:00
Wanchao Liang
a0cf5566d8 [optimizer] refactor SGD to use functional API (#45597)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45597

Test Plan: Imported from OSS

Reviewed By: izdeby

Differential Revision: D25932773

Pulled By: wanchaol

fbshipit-source-id: bc5f830d6812f847475b9bdcc67865d9968e3282
2021-01-21 10:57:08 -08:00
Samuel Marks
e6779d4357 [*.py] Rename "Arguments:" to "Args:" (#49736)
Summary:
I've written custom parsers and emitters for everything from docstrings to classes and functions. However, I recently came across an issue when I was parsing/generating from the TensorFlow codebase: inconsistent use of `Args:` and `Arguments:` in its docstrings.

```sh
(pytorch#c348fae)$ for name in 'Args:' 'Arguments:'; do
    printf '%-10s %04d\n' "$name" "$(rg -IFtpy --count-matches "$name" | paste -s -d+ -- | bc)"; done
Args:      1095
Arguments: 0336
```

It is easy enough to extend my parsers to support both variants, however it looks like `Arguments:` is wrong anyway, as per:

  - https://google.github.io/styleguide/pyguide.html#doc-function-args @ [`ddccc0f`](https://github.com/google/styleguide/blob/ddccc0f/pyguide.md)

  - https://chromium.googlesource.com/chromiumos/docs/+/master/styleguide/python.md#describing-arguments-in-docstrings @ [`9fc0fc0`](https://chromium.googlesource.com/chromiumos/docs/+/9fc0fc0/styleguide/python.md)

  - https://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_google.html @ [`c0ae8e3`](https://github.com/sphinx-contrib/napoleon/blob/c0ae8e3/docs/source/example_google.rst)

Therefore, only `Args:` is valid. This PR replaces them throughout the codebase.

PS: For related PRs, see tensorflow/tensorflow/pull/45420

PPS: The trackbacks automatically appearing below are sending the same changes to other repositories in the [PyTorch](https://github.com/pytorch) organisation.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49736

Reviewed By: albanD

Differential Revision: D25710534

Pulled By: soumith

fbshipit-source-id: 61e8ff01abb433e9f78185c2d1d0cbd7c22c1619
2020-12-28 09:34:47 -08:00
Iurii Zdebskyi
6230e337d5 Add torch._foreach_zero_ API (#47286)
Summary:
**In this PR**
- add `_foreach_zero_` API
- Update all optimizers under /_multi_tensor/ to use `_foreach_zero_` in `zero_grad` method

Performance improvement
----------------- OP:  zero_  -----------------
for-loop: 630.36 us
foreach: 90.84 us

script

```
import torch
import torch.optim as optim
import torch.nn as nn
import torchvision
import torch.utils.benchmark as benchmark_utils

inputs = [torch.rand(3, 200, 200, device="cuda") for _ in range(100)]

def main():
    for op in [
            "zero_"
        ]:
        print("\n\n----------------- OP: ", op, " -----------------")
        stmt = "[torch.{op}(t) for t in inputs]"
        timer = benchmark_utils.Timer(
            stmt=stmt.format(op = op),
            globals=globals(),
            label="str(optimizer)",
        )
        print(f"autorange:\n{timer.blocked_autorange()}\n\n")

        stmt = "torch._foreach_{op}(inputs)"
        timer_mta = benchmark_utils.Timer(
            stmt=stmt.format(op = op),
            globals=globals(),
            label="str(optimizer_mta)",
        )
        print(f"autorange:\n{timer_mta.blocked_autorange()}\n\n")

if __name__ == "__main__":
    main()

```
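A minimal sketch of how `zero_grad` can use the fused call (simplified, toy tensors; not the actual `_multi_tensor` implementation):

```python
import torch

params = [torch.randn(3, requires_grad=True) for _ in range(4)]
for p in params:
    p.grad = torch.ones_like(p)

# gather the existing grads and zero them with one fused call instead of a loop
grads = [p.grad for p in params if p.grad is not None]
torch._foreach_zero_(grads)
```
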
**TODO**
- Refactor zero_grad once foreach APIs are stable.

**Tested** via unit tests

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47286

Reviewed By: ngimel

Differential Revision: D24706240

Pulled By: izdeby

fbshipit-source-id: aac69d6d134d65126ae8e5916f3627b73d8a94bf
2020-12-16 20:04:25 -08:00
Daniil Osokin
09173ae65e Allow zero annealing epochs (#47579)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/47578.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47579

Reviewed By: H-Huang

Differential Revision: D25429403

Pulled By: vincentqb

fbshipit-source-id: c42fbcd71b46e07c672a1e9661468848ac16de38
2020-12-16 14:09:43 -08:00
lixinyu
94e328c038 fix optimizer.pyi typo 'statue'->'state' (#49388)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49388

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D25553672

Pulled By: glaringlee

fbshipit-source-id: e9f2233bd678a90768844af2d8d5e2994d59e304
2020-12-15 23:41:56 -08:00
Teng Gao
1c31f76297 Add high level profiling trace for dataloading and optimizer (#47655)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/47441

To give users more information about Python-level functions in profiler traces, we propose to instrument the following functions:

```
_BaseDataLoaderIter.__next__
Optimizer.step
Optimizer.zero_grad
```

Because record_function already uses `if (!active)` to check whether the profiler is enabled, we don't explicitly call torch.autograd._profiler_enabled() before each instrumented call.

Acknowledgement: nbcsm, guotuofeng, gunandrose4u , guyang3532 , mszhanyi

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47655

Reviewed By: smessmer

Differential Revision: D24960386

Pulled By: ilia-cher

fbshipit-source-id: 2eb655789e2e2f506e1b8f95ad3d470c83281102
2020-12-09 00:13:56 -08:00
mariosasko
f2c3efd51f Fix generator exhaustion in SparseAdam (#47724)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/47594

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47724

Reviewed By: heitorschueroff

Differential Revision: D25304131

Pulled By: albanD

fbshipit-source-id: 67c058b0836b9b4fba4f7b966396e4f3fa61f939
2020-12-07 09:38:07 -08:00
jsrozner
42e6951e62 Remove save_state_warning in LambdaLR (#46813)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/46405, https://github.com/pytorch/pytorch/issues/43352

I updated the docstring in the local file (function-level comments). Do I also need to edit it somewhere else or recompile the docstrings?

Also, though I didn't change any types here, how is the typing documentation (for IDE type checking) generated and used?

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46813

Reviewed By: ezyang

Differential Revision: D24923112

Pulled By: vincentqb

fbshipit-source-id: be7818e0d4593bfc5d74023b9c361ac2a538589a
2020-12-04 13:19:59 -08:00
Alban Desmaison
46b252b83a Revert D24262885: [pytorch][PR] Added foreach_zero_ API
Test Plan: revert-hammer

Differential Revision:
D24262885 (8e37dcb1f3)

Original commit changeset: 144c283dd009

fbshipit-source-id: 451b202e23bc1fcb11b20d26c11d9a1329789d22
2020-10-28 06:48:59 -07:00
iurii zdebskyi
8e37dcb1f3 Added foreach_zero_ API (#46215)
Summary:
Added the foreach_zero_(TensorList) API.

Tested via unit tests

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46215

Reviewed By: zhangguanheng66

Differential Revision: D24262885

Pulled By: izdeby

fbshipit-source-id: 144c283dd00924083096d6d92eb9085cbd6097d3
2020-10-27 18:03:34 -07:00
Alexander Grund
5b0f400488 Replace list(map(...)) constructs by list comprehensions (#46461)
Summary:
As discussed in https://github.com/pytorch/pytorch/issues/46392 this makes the code more readable and possibly more performant.

It also fixes a bug detected by this where the argument order of `map` was confused: 030a24906e (diff-5bb26bd3a23ee3bb540aeadcc0385df2a4e48de39f87ed9ea76b21990738fe98L1537-R1537)

Fixes https://github.com/pytorch/pytorch/issues/46392

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46461

Reviewed By: ailzhang

Differential Revision: D24367015

Pulled By: ezyang

fbshipit-source-id: d55a67933cc22346b00544c9671f09982ad920e7
2020-10-19 18:42:49 -07:00
Iurii Zdebskyi
e7564b076c Refactor scalar list APIs to use overloads (#45673)
Summary:
Refactor foreach APIs to use overloads in case of scalar list inputs.
Tested via unit tests.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45673

Reviewed By: heitorschueroff

Differential Revision: D24053424

Pulled By: izdeby

fbshipit-source-id: 35976cc50b4acfe228a32ed26cede579d5621cde
2020-10-19 09:28:49 -07:00
Aiden Nibali
2bc6caa9e4 Add three-phase option to OneCycleLR (#42715)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/40362

The new `three_phase` option provides a way of constructing schedules according to the scheme recommended in [Super-Convergence: Very Fast Training of Neural Networks Using Large Learning Rates](https://arxiv.org/abs/1708.07120).

Note that this change maintains backwards compatibility, and as a result the default behaviour of OneCycleLR remains quite counter-intuitive.

vincentqb
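
A minimal usage sketch of the new option (the model and hyper-parameter values are arbitrary assumptions):

```python
import torch
from torch.optim.lr_scheduler import OneCycleLR

model = torch.nn.Linear(2, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
# three_phase=True: ramp up to max_lr, ramp back down, then anneal far below
# the initial lr, following the scheme from the paper referenced above.
scheduler = OneCycleLR(optimizer, max_lr=0.1, total_steps=1000, three_phase=True)

for _ in range(1000):
    optimizer.step()
    scheduler.step()
```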

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42715

Reviewed By: heitorschueroff

Differential Revision: D24289744

Pulled By: vincentqb

fbshipit-source-id: e4aad87880716bb14613c0aa8631e43b04a93e5c
2020-10-14 15:05:14 -07:00
Iurii Zdebskyi
8a074af929 Added scalar lists APIs for addcdiv and addcmul (#45932)
Summary:
1) Added new APIs:
 _foreach_addcdiv(Tensor(a!)[] self, Tensor[] tensor1, Tensor[] tensor2, float[] scalars)
 _foreach_addcdiv_(Tensor(a!)[] self, Tensor[] tensor1, Tensor[] tensor2, float[] scalars)
 _foreach_addcmul(Tensor(a!)[] self, Tensor[] tensor1, Tensor[] tensor2, float[] scalars)
 _foreach_addcmul_(Tensor(a!)[] self, Tensor[] tensor1, Tensor[] tensor2, float[] scalars)

2) Updated optimizers to use new APIs

Tested via unit tests
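
A hedged usage sketch of the scalar-list overload on toy tensors (not the actual optimizer code; the variable names are assumptions):

```python
import torch

params  = [torch.zeros(3) for _ in range(2)]
exp_avg = [torch.ones(3) for _ in range(2)]
denom   = [torch.full((3,), 2.0) for _ in range(2)]
step_sizes = [-0.001, -0.002]   # one scalar per tensor in the lists

# params[i] += step_sizes[i] * exp_avg[i] / denom[i], fused across the lists
torch._foreach_addcdiv_(params, exp_avg, denom, step_sizes)
```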

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45932

Reviewed By: navahgar

Differential Revision: D24150306

Pulled By: izdeby

fbshipit-source-id: c2e65dedc95d9d81a2fdd116e41df0accb0b6f26
2020-10-14 08:12:37 -07:00
Iurii Zdebskyi
1a57b390e8 Add torch._foreach_maximum(TensorList, TensorList) & torch._foreach_minimum(TensorList, TensorList) APIs (#45692)
Summary:
- Adding torch._foreach_maximum(TensorList, TensorList) API
- Adding torch._foreach_minimum(TensorList, TensorList) API
- Updated Adam/AdamW optimizers

Tested via unit tests
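
A hedged sketch of the elementwise, list-wise maximum on toy tensors, of the kind the amsgrad path needs (not the actual Adam code; names are assumptions):

```python
import torch

max_exp_avg_sqs = [torch.zeros(3) for _ in range(2)]
exp_avg_sqs     = [torch.rand(3) for _ in range(2)]

# running elementwise maximum of the second-moment estimates, in one fused call
max_exp_avg_sqs = torch._foreach_maximum(max_exp_avg_sqs, exp_avg_sqs)
```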

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45692

Reviewed By: anjali411

Differential Revision: D24142464

Pulled By: izdeby

fbshipit-source-id: 6a4fc343a1613cb1e26c8398450ac9cea0a2eb51
2020-10-13 09:22:30 -07:00
Iurii Zdebskyi
8c309fc052 Add more tests for mt optimizers (#45475)
Summary:
Add more test cases for mt optimizers and fix Adam/AdamW

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45475

Reviewed By: soumith

Differential Revision: D23982727

Pulled By: izdeby

fbshipit-source-id: 4b24d37bd52a2fa3719d3e3a5dcf3b96990b0f5b
2020-09-28 23:59:58 -07:00
Iurii Zdebskyi
722faeb2a4 [RELAND] Added optimizers based on multi tensor apply (#45408)
Summary:
Original PR: https://github.com/pytorch/pytorch/pull/45299. The present PR fixes the minor bugs that caused the revert.

Adding a new namespace `torch.optim._multi_tensor` with a bunch of updated optimizers. Those optimizers are using _foreach APIs which improve performance significantly.

### Tests
- updated existing tests to use both optimizers
- added `test_multi_tensor_optimizers` test to verify correctness.

### Perf results

**Adam**
timeit: 42.69 ms --> 10.16 ms
autorange: 41.96 ms --> 10.28 ms

**AdamW**
timeit: 51.38 ms --> 15.63 ms
autorange: 50.82 ms --> 16.07 ms

**SGD**
timeit: 6.28 ms --> 4.40 ms
autorange: 6.13 ms --> 4.73 ms

**RMSprop**
timeit: 28.63 ms --> 5.89 ms
autorange: 28.27 ms -->  5.76 ms

**Rprop**
timeit: 213.30 --> 178.42
autorange: 212.03 --> 178.03

**ASGD**
timeit: 21.67 --> 9.33
autorange: 21.64 --> 9.27

**Adamax**
timeit: 55.60 --> 48.29
autorange: 55.22 -> 49.13

**Perf script used**

```
import torch
import time
import torch.optim as optim
from torch.autograd import Variable
from torch.optim.lr_scheduler import ExponentialLR, ReduceLROnPlateau, StepLR
import torch.nn as nn
import time
import torchvision
import torch.utils._benchmark as benchmark_utils

device = "cuda"
model = torchvision.models.resnet.resnet101(pretrained=True).to(device)
targets = torch.randint(0, 1000, (100, 100), device=device)
criterion = nn.CrossEntropyLoss()

optimizer = optim.SGD(model.parameters(), lr=1e-3) # <----------------------- optimizer.
                                                          # would compare optim.SGD vs optim._multi_tensor.SGD
running_loss = 0.0
target = torch.empty(128, dtype=torch.long, device=device).random_(5)

optimizer.zero_grad()
inputs = torch.rand(128, 3, 100, 100, device=device , requires_grad=True)
outputs = model(inputs)
loss = criterion(outputs, target)
loss.backward()
optimizer.step()
running_loss += loss.item()

def main():
    timer = benchmark_utils.Timer(
        stmt="optimizer.step()",
        globals=globals(),
        label="str(optimizer)",
    )

    for i in range(1):
        print(f"Run: {i}\n{'-' * 40}")
        print(f"timeit:\n{timer.timeit(1000)}\n")
        print(f"autorange:\n{timer.blocked_autorange()}\n\n")

if __name__ == "__main__":
    main()
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45408

Reviewed By: gchanan

Differential Revision: D23956680

Pulled By: izdeby

fbshipit-source-id: c5eab7bf5fce14a287c15cead1cdc26e42cfed94
2020-09-28 13:14:04 -07:00
Mike Ruberry
54a253fded Revert D23931987: Added optimizers based on multi tensor apply
Test Plan: revert-hammer

Differential Revision:
D23931987 (2b21e7767e)

Original commit changeset: 582134ef2d40

fbshipit-source-id: ffd500aea55fda34155442fb15e2529cb9c00100
2020-09-26 18:11:54 -07:00
Iurii Zdebskyi
2b21e7767e Added optimizers based on multi tensor apply (#45299)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45299

Adding a new namespace `torch.optim._multi_tensor` with a bunch of updated optimizers. Those optimizers are using _foreach APIs which improve performance significantly.

### Tests
- updated existing tests to use both optimizers
- added `test_multi_tensor_optimizers` test to verify correctness.

### Perf results

**Adam**
timeit: 42.69 ms --> 10.16 ms
autorange: 41.96 ms --> 10.28 ms

**AdamW**
timeit: 51.38 ms --> 15.63 ms
autorange: 50.82 ms --> 16.07 ms

**SGD**
timeit: 6.28 ms --> 4.40 ms
autorange: 6.13 ms --> 4.73 ms

**RMSprop**
timeit: 28.63 ms --> 5.89 ms
autorange: 28.27 ms -->  5.76 ms

**Rprop**
timeit: 213.30 --> 178.42
autorange: 212.03 --> 178.03

**ASGD**
timeit: 21.67 --> 9.33
autorange: 21.64 --> 9.27

**Adamax**
timeit: 55.60 --> 48.29
autorange: 55.22 -> 49.13

**Perf script used**

```
import torch
import time
import torch.optim as optim
from torch.autograd import Variable
from torch.optim.lr_scheduler import ExponentialLR, ReduceLROnPlateau, StepLR
import torch.nn as nn
import time
import torchvision
import torch.utils._benchmark as benchmark_utils

device = "cuda"
model = torchvision.models.resnet.resnet101(pretrained=True).to(device)
targets = torch.randint(0, 1000, (100, 100), device=device)
criterion = nn.CrossEntropyLoss()

optimizer = optim.SGD(model.parameters(), lr=1e-3) # <----------------------- optimizer.
                                                          # would compare optim.SGD vs optim._multi_tensor.SGD
running_loss = 0.0
target = torch.empty(128, dtype=torch.long, device=device).random_(5)

optimizer.zero_grad()
inputs = torch.rand(128, 3, 100, 100, device=device , requires_grad=True)
outputs = model(inputs)
loss = criterion(outputs, target)
loss.backward()
optimizer.step()
running_loss += loss.item()

def main():
    timer = benchmark_utils.Timer(
        stmt="optimizer.step()",
        globals=globals(),
        label="str(optimizer)",
    )

    for i in range(1):
        print(f"Run: {i}\n{'-' * 40}")
        print(f"timeit:\n{timer.timeit(1000)}\n")
        print(f"autorange:\n{timer.blocked_autorange()}\n\n")

if __name__ == "__main__":
    main()
```

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D23931987

Pulled By: izdeby

fbshipit-source-id: 582134ef2d402909d27d89a45c5b588fb7130ea1
2020-09-26 12:17:43 -07:00
Wanchao Liang
32c355af5b [dist_optim] introduce distributed functional optimizer (#45221)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45221

This PR introduces a distributed functional optimizer, so that the
distributed optimizer can reuse the functional optimizer APIs and
maintain its own state. This enables the TorchScript-compatible
functional optimizer when using the distributed optimizer, helps get rid
of the GIL, and improves the overall performance of training, especially
distributed model-parallel training.

Test Plan: Imported from OSS

Reviewed By: ailzhang

Differential Revision: D23935256

Pulled By: wanchaol

fbshipit-source-id: 59b6d77ff4693ab24a6e1cbb6740bcf614cc624a
2020-09-25 17:13:10 -07:00
Wanchao Liang
08caf15502 [optimizer] refactor Adam to use functional API (#44791)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44791

Test Plan: Imported from OSS

Reviewed By: ailzhang

Differential Revision: D23935257

Pulled By: wanchaol

fbshipit-source-id: 6f6e22a9287f5515d2e4e6abd4dee2fe7e17b945
2020-09-25 17:13:08 -07:00
Wanchao Liang
0444c372e1 [optimizer] introduce optimizer functional API, refactor Adagrad (#44715)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44715

We have provided a nice and intuitive API in Python. But in the context of large scale distributed training (e.g. Distributed Model Parallel), users often want to use multithreaded training instead of multiprocess training as it provides better resource utilization and efficiency.

This PR introduces the functional optimizer concept (similar to the concept of `nn.functional`): we split the optimizer into two parts, 1. optimizer state management and 2. optimizer computation. We expose the computation part as a separate functional API available to internal and OSS developers; the caller of the functional API maintains its own state and calls the functional API directly. While keeping the end-user API the same, the functional API is TorchScript friendly and can be used by the distributed optimizer to speed up training without the GIL.
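
A hedged sketch of the split described above, using hypothetical names (this is not the actual torch.optim._functional signature): the caller owns the state and passes it explicitly to a stateless update function.

```python
from typing import List
import torch

def functional_adagrad_step(params: List[torch.Tensor],
                            grads: List[torch.Tensor],
                            state_sums: List[torch.Tensor],
                            lr: float, eps: float = 1e-10) -> None:
    # pure computation: no hidden state, so it can be driven from multiple
    # threads that each manage their own `state_sums`
    for p, g, s in zip(params, grads, state_sums):
        s.addcmul_(g, g, value=1.0)
        p.addcdiv_(g, s.sqrt().add_(eps), value=-lr)
```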

Test Plan: Imported from OSS

Reviewed By: ailzhang

Differential Revision: D23935258

Pulled By: wanchaol

fbshipit-source-id: d2a5228439edb3bc64f7771af2bb9e891847136a
2020-09-25 17:10:26 -07:00
Michael Carilli
3e6bb5233f Reference amp tutorial (recipe) from core amp docs (#44725)
Summary:
https://pytorch.org/tutorials/recipes/recipes/amp_recipe.html is live.  Core amp docs should reference it.

Also I fixed some typos in the `zero_grad` docs that we missed when git was behaving weirdly during ngimel's merge of https://github.com/pytorch/pytorch/pull/44423.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44725

Reviewed By: mruberry

Differential Revision: D23723807

Pulled By: ngimel

fbshipit-source-id: ca0b76365f8ca908bd978e3b38bf81857fa6c2a3
2020-09-16 11:37:58 -07:00
Kent Gauen
2efc618f19 lr_schedule.py redundant code (#44613)
Summary:
The subclass sets "self.last_epoch" when this is set in the parent class's init function. Why would we need to set last_epoch twice? I think calling "super" resets last_epoch anyway, so I am not sure why we would want to include this in the subclass. Am I missing something?

For the record, I am just a Pytorch enthusiast. I hope my question isn't totally silly.

Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44613

Reviewed By: albanD

Differential Revision: D23691770

Pulled By: mrshenli

fbshipit-source-id: 080d9acda86e1a2bfaafe2c6fcb8fc1544f8cf8a
2020-09-15 20:28:39 -07:00
Xiang Gao
6bc77f4d35 Use amax/maximum instead of max in optimizers (#43797)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43797

Reviewed By: malfet

Differential Revision: D23406641

Pulled By: mruberry

fbshipit-source-id: 0cd075124aa6533b21375fe2c90c44a5d05ad6e6
2020-09-15 10:39:40 -07:00
taiyuanz
c515881137 Add reset_grad() function (#44423)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44423

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42754

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D23010859

Pulled By: ngimel

fbshipit-source-id: 56eec43eba88b98cbf714841813977c68f983564
2020-09-09 22:05:45 -07:00
Randall Hunt
24eea364f7 Check SparseAdam params are dense on init (#41966) (#43668)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/41966

Raises a ValueError if the user attempts to create a SparseAdam optimizer with sparse parameter tensors.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43668

Reviewed By: glaringlee

Differential Revision: D23388109

Pulled By: ranman

fbshipit-source-id: 1fbcc7527d49eac6fae9ce51b3307c609a6ca38b
2020-09-01 14:25:59 -07:00
NTT123
103887892c Fix "non-negative integer" error messages (#42734)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/42662

Use "positive integer" error message for consistency with: 17f76f9a78/torch/optim/lr_scheduler.py (L958-L959)
ad7133d3c1/torch/utils/data/sampler.py (L102-L104)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42734

Reviewed By: zdevito

Differential Revision: D23039575

Pulled By: smessmer

fbshipit-source-id: 1be1e0caa868891540ecdbe6f471a6cd51c40ede
2020-08-10 19:39:37 -07:00
Vincent Quenneville-Belair
7221a3d1aa enable torch.optim.swa_utils.SWALR (#42574)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/42435

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42574

Reviewed By: zou3519

Differential Revision: D22949369

Pulled By: vincentqb

fbshipit-source-id: f2f319ec94a97e0afe4d4327c866504ae632a986
2020-08-05 12:37:45 -07:00
Yanli Zhao
79cfd85987 grad detach_ only when it has grad_fn in zero_grad call (#41283)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41283

In optimizer.zero_grad(), detach_ is only useful for avoiding a memory leak when the grad has a grad_fn, so add a check and call grad.detach_ only when the grad has a grad_fn in the zero_grad() function.
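
A hedged sketch of the guarded logic (written as a simplified free function; not the exact Optimizer.zero_grad source):

```python
import torch

def zero_grad(params):
    for p in params:
        if p.grad is None:
            continue
        if p.grad.grad_fn is not None:
            p.grad.detach_()              # only detach when a graph actually exists
        else:
            p.grad.requires_grad_(False)
        p.grad.zero_()
```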
ghstack-source-id: 108702289

Test Plan: unit test

Reviewed By: mrshenli

Differential Revision: D22487315

fbshipit-source-id: 861909b15c8497f1da57f092d8963d4920c85e38
2020-07-29 11:40:13 -07:00
YifanShenSZ
e7ed0b3fae Avoid zero division in _cubic_interpolate (#42093)
Summary:
I encountered a zero division problem when using LBFGS:

File "/home/yshen/anaconda3/lib/python3.7/site-packages/torch/optim/lbfgs.py", line 118, in _strong_wolfe
    bracket[1], bracket_f[1], bracket_gtd[1])
File "/home/yshen/anaconda3/lib/python3.7/site-packages/torch/optim/lbfgs.py", line 21, in _cubic_interpolate
    d1 = g1 + g2 - 3 * (f1 - f2) / (x1 - x2)
ZeroDivisionError: float division by zero

My solution is to check whether the line-search bracket is too small before calling _cubic_interpolate.
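
A hedged sketch of the guard (illustrative only; the helper name and tolerance are assumptions, not the exact lbfgs.py change): skip the cubic fit when (x1 - x2) has collapsed, which is what makes the d1 expression above divide by zero.

```python
def safe_interpolate(x1, f1, g1, x2, f2, g2, cubic_interpolate, eps=1e-10):
    if abs(x1 - x2) < eps:        # bracket too small: the cubic fit is ill-posed
        return (x1 + x2) / 2.0    # fall back to the midpoint
    return cubic_interpolate(x1, f1, g1, x2, f2, g2)
```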

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42093

Reviewed By: pbelevich

Differential Revision: D22770667

Pulled By: mrshenli

fbshipit-source-id: f8fdfcbd3fd530235901d255208fef8005bf898c
2020-07-28 08:32:00 -07:00
mariosasko
4281240cb5 Raise error for duplicate params in param group #40967 (#41597)
Summary:
This PR fixes an issue in https://github.com/pytorch/pytorch/issues/40967 where duplicate parameters across different parameter groups are not allowed, but duplicates inside the same parameter group are accepted. After this PR, both cases are treated equally and raise `ValueError`.
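
A hedged sketch of the kind of check involved (illustrative only; not the actual add_param_group code, and the message text is an assumption):

```python
def check_no_duplicate_params(params):
    # compare by identity: the same tensor listed twice in one group is an error
    if len(params) != len({id(p) for p in params}):
        raise ValueError("optimizer got a parameter group with duplicate parameters")
```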

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41597

Reviewed By: zou3519

Differential Revision: D22608019

Pulled By: vincentqb

fbshipit-source-id: 6df41dac62b80db042cfefa6e53fb021b49f4399
2020-07-27 12:25:52 -07:00
Zhijian Liu
7646f3c77f Fix type annotation for CosineAnnealingLR (#41866)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/41866

Reviewed By: izdeby

Differential Revision: D22703576

Pulled By: mrshenli

fbshipit-source-id: 10a0f593ffaaae82a2923a42815c36793a9043d5
2020-07-23 15:56:50 -07:00
guol-fnst
17f76f9a78 Verbose param for schedulers that don't have it #38726 (#41580)
Summary:
Verbose param for schedulers that don't have it https://github.com/pytorch/pytorch/issues/38726

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41580

Reviewed By: izdeby

Differential Revision: D22671163

Pulled By: vincentqb

fbshipit-source-id: 53a6c9e929141d411b6846bc25f3fe7f46fdf3be
2020-07-23 09:57:33 -07:00
Jeong Ukjae
e831299bae Fix typing error of torch/optim/lr_scheduler.pyi (#41775)
Summary:
* add `_LRScheduler.get_last_lr` type stub.
* remove `CosineAnnealingWarmRestarts.step` because its signature is the same as `_LRScheduler`'s.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41775

Reviewed By: izdeby

Differential Revision: D22649350

Pulled By: vincentqb

fbshipit-source-id: 5355dd062a5af437f4fc153244dda793a2382e7e
2020-07-23 09:30:32 -07:00
farhadrgh
4b4273a04e Update Adam documentation (#41679)
Summary:
This PR fixes https://github.com/pytorch/pytorch/issues/41477

The Adam implementation does L2 regularization, not decoupled weight decay. However, the change mentioned in https://github.com/pytorch/pytorch/issues/41477 was motivated by line 12 of Algorithm 2 in the [Decoupled Weight Decay Regularization](https://arxiv.org/pdf/1711.05101.pdf) paper.

Please let me know if you have other suggestions about how to deliver this info in the docs.
cc ezyang

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41679

Reviewed By: izdeby

Differential Revision: D22671329

Pulled By: vincentqb

fbshipit-source-id: 2caf60e4f62fe31f29aa35a9532d1c6895a24224
2020-07-23 09:25:41 -07:00
wudenggang
9600ed9af3 typo fixes (#41632)
Summary:
typo fixes

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41632

Reviewed By: ezyang

Differential Revision: D22617827

Pulled By: mrshenli

fbshipit-source-id: c2bfcb7cc36913a8dd32f13fc9adc3aa0a9b682f
2020-07-20 07:23:00 -07:00
Edward Leardi
6b50874cb7 Fix HTTP links in documentation to HTTPS (#40878)
Summary:
I ran `make linkcheck` using `sphinx.builders.linkcheck` on the documentation and noticed a few links weren't using HTTPS so I quickly updated them all.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/40878

Differential Revision: D22404647

Pulled By: ngimel

fbshipit-source-id: 9c9756db59197304023fddc28f252314f6cf4af3
2020-07-06 20:05:21 -07:00
vfdev
a6a2dd14ea Fix typo in warning message (#39854)
Summary:
Fix typo
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39854

Reviewed By: ezyang

Differential Revision: D22193544

Pulled By: zou3519

fbshipit-source-id: 04b9f59da7b6ba0649fc6d315adcf20685e10930
2020-06-23 16:47:35 -07:00
Ram Rachum
f6b9848c25 Use chain.from_iterable in optimizer.py (#40156)
Summary:
This is a faster and more idiomatic way of using `itertools.chain`. Instead of computing all the items in the iterable and storing them in memory, they are computed one-by-one and never stored as a huge list. This can save on both runtime and memory space.
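
A small illustration of the difference (toy data):

```python
from itertools import chain

groups = [[1, 2], [3, 4], [5]]

# chain(*groups) unpacks the outer list eagerly into call arguments, while
# chain.from_iterable(groups) walks it lazily, one inner iterable at a time.
assert list(chain.from_iterable(groups)) == [1, 2, 3, 4, 5]
```
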
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40156

Reviewed By: ezyang

Differential Revision: D22189038

Pulled By: vincentqb

fbshipit-source-id: 160b2c27f442686821a6ea541e1f48f4a846c186
2020-06-23 14:07:05 -07:00
Alex Hedges
a3c87c4922 Make Optimizer.state_dict() deterministic (#37347)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/36831.

Instead of using `id()`, an arbitrary yet consistent order-based index is used. This results in deterministic output between runs.

I am not the biggest fan of using `nonlocal` (it appears to be used sparingly in the codebase) to get `start_index` between calls to `pack_group()`, but the alternatives had larger issues:
- Using the last value added to `param_mappings` would be ideal, but that only works if `dict` iteration order is consistent, and PyTorch currently supports Python <3.7.
- Using the maximum value added to `param_mappings` wouldn't have that issue but would not be constant time.

For testing, I confirmed that `test_optim.py` works before and after these changes. Randomizing the indices in `param_mappings` causes the tests to fail, which is further evidence that these changes work. I'm not 100% sure these tests are sufficient, but they're a start.
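
A hedged sketch of the order-based index idea (simplified; the helper name is an assumption and this is not the exact state_dict() code):

```python
def build_param_mappings(param_groups):
    param_mappings, start_index = {}, 0
    for group in param_groups:
        # ids are only transient in-memory keys; the *positions* are what end
        # up in the saved state, so the output is deterministic across runs
        param_mappings.update({
            id(p): i for i, p in enumerate(group['params'], start_index)
            if id(p) not in param_mappings
        })
        start_index += len(group['params'])
    return param_mappings
```
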
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37347

Differential Revision: D21353820

Pulled By: vincentqb

fbshipit-source-id: e549f1f154833a461b1f4df6d07ad509aab34ea1
2020-06-01 15:32:02 -07:00
Ralf Gommers
9fe8243536 Fix minor issue in type stub for Optimizer (#38067)
Summary:
Closes gh-23731
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38067

Differential Revision: D21471021

Pulled By: ezyang

fbshipit-source-id: 8e7ee7f437bfa8e78a47ac6cf572b0fc9b5c6939
2020-05-07 20:11:40 -07:00
Bartosz Gasiorzewski
867e05921f Fix multiple issues with type annotations (#36358)
Summary:
- added tests that showcase the problems
- fixed the problems

These changes would allow me to remove many "# type: ignore" comments in my codebase.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36358

Differential Revision: D21230704

Pulled By: ezyang

fbshipit-source-id: e6d475a0aa1fb40258fa0231ade28c38108355fb
2020-04-29 11:16:39 -07:00
Pavel Izmailov
22ac071d9a Add SWA to PyTorch mainline (#35032)
Summary:
This PR is based on the issue https://github.com/pytorch/pytorch/issues/29994#issue-524418771 and the discussion in the previous version of the PR https://github.com/pytorch/pytorch/pull/30559. Specifically, I followed the interface outlined in this [comment](https://github.com/pytorch/pytorch/pull/30559#issuecomment-574864768).

## Structure
- `torch/optim/swa_utils.py` contains the implementation of  `AveragedModel` class, `SWALR` learning rate scheduler and `update_bn` utility
- `test/test_optim.py` contains unit tests for the three components of SWA
- `torch/optim/swa_utils.pyi` describes the interface of `torch/optim/swa_utils.py`

The new implementation consists of
- `AveragedModel` class; this class creates a copy of a given model and allows to compute running averages of the parameters.
- `SWALR` learning rate scheduler; after a certain number of epochs switches to a constant learning rate; this scheduler is supposed to be chained with other schedulers.
- `update_bn` utility; updates the Batch Normalization activation statistics for a given model and dataloader; this utility is meant to be applied to `AveragedModel` instances.

For `update_bn` I simplified the implementation compared to the [original PR](https://github.com/pytorch/pytorch/pull/30559) according to the suggestions by vadimkantorov.

## Example
```python
loader, optimizer, model = ...
swa_model = torch.optim.swa_utils.AveragedModel(model)
# You can use custom averaging functions with the `avg_fn` parameter
ema_avg = lambda p_avg, p, n_avg: 0.1 * p_avg + 0.9 * p
ema_model = torch.optim.swa_utils.AveragedModel(model,
                                    avg_fn=ema_avg)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer,
                                    T_max=300)
swa_start = 160
swa_scheduler = SWALR(optimizer, start_epoch=swa_start, swa_lr=0.05)

for i in range(300):
     for input, target in loader:
         optimizer.zero_grad()
         loss_fn(model(input), target).backward()
         optimizer.step()
         scheduler.step()
         swa_scheduler.step()

     if i > swa_start:
         swa_model.update_parameters(model)

# Update bn statistics for the swa_model at the end
torch.optim.swa_utils.update_bn(loader, swa_model)
```

UPDATED:
```python3
loader, optimizer, model, loss_fn = ...
swa_model = torch.optim.swa_utils.AveragedModel(model)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=300)
swa_start = 160
swa_scheduler = SWALR(optimizer, swa_lr=0.05)
for i in range(300):
     for input, target in loader:
         optimizer.zero_grad()
         loss_fn(model(input), target).backward()
         optimizer.step()
     if i > swa_start:
         swa_model.update_parameters(model)
         swa_scheduler.step()
     else:
         scheduler.step()

# Update bn statistics for the swa_model at the end
torch.optim.swa_utils.update_bn(loader, swa_model)
```

Fixes https://github.com/pytorch/pytorch/issues/29994
cc soumith vincentqb andrewgordonwilson vadimkantorov
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35032

Differential Revision: D21079606

Pulled By: vincentqb

fbshipit-source-id: e07f5e821f72ada63789814c2dcbdc31f0160c37
2020-04-27 07:42:19 -07:00
Masaki Kozuki
7403545518 Fix exception message of torch.optim.AdamW. (#36088)
Summary:
PyTorch does not implement `SparseAdamW`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36088

Differential Revision: D20932357

Pulled By: gchanan

fbshipit-source-id: 49e5b72c34ff8ce0deb6b3807662b8b7d67d959f
2020-04-09 08:02:10 -07:00
lordeddard
2de4f245c6 Fix typo in documentation (#34581)
Summary:
Update the parameter description of `total_steps` in `OneCycleLR`. References https://github.com/pytorch/pytorch/issues/34531
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34581

Differential Revision: D20386306

Pulled By: albanD

fbshipit-source-id: f8b424a01760e8f5d4de5367b6c60fb342019689
2020-03-11 13:57:10 -07:00
Vincent Quenneville-Belair
be3bc1deb1 convert counter back to list #33229 (#33356)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/33229
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33356

Differential Revision: D20003196

Pulled By: vincentqb

fbshipit-source-id: 96f9e0fc7e99a7c2e202f932d1a2ffa158afad92
2020-03-10 15:46:24 -07:00
prajjwal1
b1bd950a4d Fixed stub for AdamW (#34299)
Summary:
Fixes [https://github.com/pytorch/pytorch/issues/33757](https://github.com/pytorch/pytorch/issues/33757)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34299

Differential Revision: D20337844

Pulled By: ezyang

fbshipit-source-id: 54bf174a09b8db9bf6e0c3c717730dd7c795d76b
2020-03-09 08:45:51 -07:00
albanD
6e2bb1c054 End of the .data removal in torch/optim (#34211)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34211

Test Plan: Imported from OSS

Differential Revision: D20248684

Pulled By: albanD

fbshipit-source-id: 2294bfa41b82ff47f000bc98860780f59d7d4421
2020-03-09 06:40:39 -07:00
Eleanor Dwight Holland
6a97777f72 Remove use of .data from optimizers (#33640)
Summary:
Removes all uses of `.data` from optimizers.

Or tries to.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33640

Reviewed By: vincentqb

Differential Revision: D20203216

Pulled By: albanD

fbshipit-source-id: 9bfe78bbed00fd4aaa690801cff0201f0bd680a0
2020-03-03 13:21:55 -08:00
HearyShen
edd5c009f7 fix docs mistakes in lr_scheduler.MultiplicativeLR (#33805)
Summary:
This PR references the issue: [The docs of `MultiplicativeLR` use `LambdaLR` as an example](https://github.com/pytorch/pytorch/issues/33752#issue-570374087)

https://github.com/pytorch/pytorch/issues/33752
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33805

Differential Revision: D20121314

Pulled By: mruberry

fbshipit-source-id: 5afa63bbe83d35ce4e55705b8cbd96326a907651
2020-02-27 14:11:57 -08:00
JeongUkJae
b10761d890 fix type stub errors (#33762)
Summary:
I've been using pytorch with type hints, and I found errors that can be easily fixed, so I'm creating this PR to fix these type bugs.

I expected the code below to type-check without any errors.

```python
import torch
from torch.nn import Linear
from torch.autograd import Variable
from torch.optim import AdamW
from torch.utils import hooks

# nn.Module should have training attribute
module = Linear(10, 20)
module.training

# torch should have dtype bfloat16
tensor2 = torch.tensor([1,2,3], dtype=torch.bfloat16)

# torch.Tensor.cuda should accept int or str value
torch.randn(5).cuda(1)
torch.tensor(5).cuda('cuda:0')

# optimizer should have default attribute
module = Linear(10, 20)
print(AdamW(module.weight).default)

# torch.Tensor should have these boolean attributes
torch.tensor([1]).is_sparse
torch.tensor([1]).is_quantized
torch.tensor([1]).is_mkldnn

# Size class should tuple of int
a, b = torch.tensor([[1,2,3]]).size()

# check modules can be accessed
torch.nn.parallel
torch.autograd.profiler
torch.multiprocessing
torch.sparse
torch.onnx
torch.jit
torch.hub
torch.random
torch.distributions
torch.quantization
torch.__config__
torch.__future__

torch.ops
torch.classes

# Variable class's constructor should return Tensor
def fn_to_test_variable(t: torch.Tensor):
    return None

v = Variable(torch.tensor(1))
fn_to_test_variable(v)

# check RemovableHandle attributes can be accessed
handle = hooks.RemovableHandle({})
handle.id
handle.next_id

# check torch function hints
torch.is_grad_enabled()
```

But the current master branch raises errors (I checked with pyright):

```
$ pyright test.py
Searching for source files
Found 1 source file
test.py
  12:45 - error: 'bfloat16' is not a known member of module
  15:21 - error: Argument of type 'Literal[1]' cannot be assigned to parameter 'device' of type 'Optional[device]'
  'int' is incompatible with 'device'
  Cannot assign to 'None'
  16:22 - error: Argument of type 'Literal['cuda:0']' cannot be assigned to parameter 'device' of type 'Optional[device]'
  'str' is incompatible with 'device'
  Cannot assign to 'None'
  23:19 - error: Cannot access member 'is_sparse' for type 'Tensor'
  Member 'is_sparse' is unknown
  24:19 - error: Cannot access member 'is_quantized' for type 'Tensor'
  Member 'is_quantized' is unknown
  25:19 - error: Cannot access member 'is_mkldnn' for type 'Tensor'
  Member 'is_mkldnn' is unknown
  32:7 - error: 'autograd' is not a known member of module
  33:7 - error: 'multiprocessing' is not a known member of module
  34:7 - error: 'sparse' is not a known member of module
  35:7 - error: 'onnx' is not a known member of module
  36:7 - error: 'jit' is not a known member of module
  37:7 - error: 'hub' is not a known member of module
  38:7 - error: 'random' is not a known member of module
  39:7 - error: 'distributions' is not a known member of module
  40:7 - error: 'quantization' is not a known member of module
  41:7 - error: '__config__' is not a known member of module
  42:7 - error: '__future__' is not a known member of module
  44:7 - error: 'ops' is not a known member of module
  45:7 - error: 'classes' is not a known member of module
  60:7 - error: 'is_grad_enabled' is not a known member of module
20 errors, 0 warnings
Completed in 1.436sec
```

The items below are not reported as errors, but I think they are problems too.

* `nn.Module.training` is not typed as a boolean
* the return type of `torch.Tensor.size()` is `Tuple[Unknown]`.

 ---

Related issues:

https://github.com/pytorch/pytorch/issues/23731, https://github.com/pytorch/pytorch/issues/32824, https://github.com/pytorch/pytorch/issues/31753
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33762

Differential Revision: D20118884

Pulled By: albanD

fbshipit-source-id: 41557d66674a11b8e7503a48476d4cdd0f278eab
2020-02-27 06:58:53 -08:00
Xiao Wang
c1dd70688a Fix deprecated python "add" calls (#33428)
Summary:
This PR fixes the Python `add` calls that use the deprecated signature `add(Scalar, Tensor)`; the alternative signature `add(Tensor, alpha=Scalar)` is used instead.
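
For illustration, a minimal sketch of the rewrite pattern (the tensor names here are hypothetical; the actual call sites vary across the optimizers):

```python
import torch

buf, grad, momentum = torch.zeros(3), torch.ones(3), 0.9

# Deprecated form: the scalar is passed positionally, add_(Scalar, Tensor)
# buf.mul_(momentum).add_(1 - momentum, grad)

# Preferred form: the scalar is passed as the alpha keyword, add_(Tensor, alpha=Scalar)
buf.mul_(momentum).add_(grad, alpha=1 - momentum)
```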

cc csarofeen zasdfgbnm ptrblck ngimel
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33428

Differential Revision: D20002534

Pulled By: vincentqb

fbshipit-source-id: 81f2dd6170a47a9b53a17e5817c26e70d8afa130
2020-02-26 09:02:31 -08:00
Hong Xu
a6a72ac68f Fix all occurrences of C416. (#33429)
Summary:
C416: Unnecessary (list/set) comprehension - rewrite using list/set().

See https://pypi.org/project/flake8-comprehensions/
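
For reference, a minimal example of the pattern C416 flags (illustrative only, not taken from the diff):

```python
values = (1, 2, 3)

items = [v for v in values]  # C416: unnecessary list comprehension
items = list(values)         # preferred rewrite
```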
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33429

Differential Revision: D19972858

Pulled By: ezyang

fbshipit-source-id: faac042a94c59d737bd5ae983121a0a029346e23
2020-02-21 08:32:22 -08:00
Nikolay Novik
d19a50bf27 Add missing weight_decay parameter validation for Adam and AdamW (#33126)
Summary:
Adam and AdamW are missing parameter validation for weight_decay. Other optimisers have this check present.
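
A hedged sketch of the kind of check added here, mirroring the validation the other optimizers already perform (the helper is illustrative, not the exact diff):

```python
def _validate_weight_decay(weight_decay: float) -> None:
    # Reject negative values, as the other optimizers do for their hyperparameters.
    if not 0.0 <= weight_decay:
        raise ValueError("Invalid weight_decay value: {}".format(weight_decay))
```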
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33126

Differential Revision: D19860366

Pulled By: vincentqb

fbshipit-source-id: 286d7dc90e2f4ccf6540638286d2fe17939648fc
2020-02-20 11:11:51 -08:00
Edgar Andrés Margffoy Tuay
cdf381c967 Fix LambdaLR scheduler side effects (#32848)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/32756
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32848

Differential Revision: D19859736

Pulled By: vincentqb

fbshipit-source-id: 43b3cbb2b6bed208c75aad37aebc2a8a9565fe0d
2020-02-20 11:09:56 -08:00
Jeong Ukjae
879cf0b15a fix typing bug of LambdaLR.__init__ (#33271)
Summary:
## problem

```python
class LambdaLR(_LRScheduler):
    """Sets the learning rate of each parameter group to the initial lr
    times a given function. When last_epoch=-1, sets initial lr as lr.

    Args:
        optimizer (Optimizer): Wrapped optimizer.
        lr_lambda (function or list): A function which computes a multiplicative
            factor given an integer parameter epoch, or a list of such
            functions, one for each group in optimizer.param_groups.
        last_epoch (int): The index of last epoch. Default: -1.

    Example:
        >>> # Assuming optimizer has two groups.
        >>> lambda1 = lambda epoch: epoch // 30
        >>> lambda2 = lambda epoch: 0.95 ** epoch
        >>> scheduler = LambdaLR(optimizer, lr_lambda=[lambda1, lambda2])
        >>> for epoch in range(100):
        >>>     train(...)
        >>>     validate(...)
        >>>     scheduler.step()
    """
```

`LambdaLR` takes a lambda that takes an int (the epoch) and returns a float, or a list of such lambdas.
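
A hedged sketch of the intended annotation (the alias names are illustrative, not the stub's exact spelling):

```python
from typing import Callable, List, Union

# A single function mapping an epoch index to a multiplicative factor...
LRLambda = Callable[[int], float]
# ...or one such function per parameter group.
LRLambdaArg = Union[LRLambda, List[LRLambda]]
```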

## related issue

Resolve https://github.com/pytorch/pytorch/issues/32645
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33271

Differential Revision: D19878665

Pulled By: vincentqb

fbshipit-source-id: 50b16caea13de5a3cbd187e688369f33500499d0
2020-02-18 09:10:00 -08:00
cshesse
1487137c5b add missing default value for LRScheduler.step() (#32411)
Summary:
see also other type errors in https://github.com/pytorch/pytorch/pull/30576 and https://github.com/pytorch/pytorch/pull/30441
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32411

Differential Revision: D19697245

Pulled By: ezyang

fbshipit-source-id: d0295d747541adec5d6fad646f4cf4bb2f04abf5
2020-02-11 20:34:33 -08:00
Vincent Quenneville-Belair
e7f0b15473 Remove return value for __exit__ (#32997)
Summary:
When an error is raised and `__exit__` in a context manager returns `True`, the error is suppressed; otherwise the error is raised. No return value should be given, in order to keep the default behavior of the context manager.
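
A minimal illustration of the difference (hypothetical classes, not the PyTorch code):

```python
class Suppressing:
    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        return True  # a truthy return value suppresses any exception from the with-block


class Default:
    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        return None  # no return value, so exceptions propagate as usual


with Suppressing():
    raise TypeError("silently swallowed")

with Default():
    raise TypeError("raised as expected")
```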

Fixes https://github.com/pytorch/pytorch/issues/32639. The `get_lr` function was overridden with a function taking an epoch parameter, which is not allowed. However, the relevant error was not being raised.

```python
In [1]: import torch
   ...:
   ...: class MultiStepLR(torch.optim.lr_scheduler._LRScheduler):
   ...:     def __init__(self, optimizer, gamma, milestones, last_epoch = -1):
   ...:         self.init_lr = [group['lr'] for group in optimizer.param_groups]
   ...:         self.gamma = gamma
   ...:         self.milestones = milestones
   ...:         super().__init__(optimizer, last_epoch)
   ...:
   ...:     def get_lr(self, step):
   ...:         global_step = self.last_epoch #iteration number in pytorch
   ...:         gamma_power = ([0] + [i + 1 for i, m in enumerate(self.milestones) if global_step >= m])[-1]
   ...:         return [init_lr * (self.gamma ** gamma_power) for init_lr in self.init_lr]
   ...:
   ...: optimizer = torch.optim.SGD([torch.rand(1)], lr = 1)
   ...: scheduler = MultiStepLR(optimizer, gamma = 1, milestones = [10, 20])
```
```
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-1-7fad6ba050b0> in <module>
     14
     15 optimizer = torch.optim.SGD([torch.rand(1)], lr = 1)
---> 16 scheduler = MultiStepLR(optimizer, gamma = 1, milestones = [10, 20])

<ipython-input-1-7fad6ba050b0> in __init__(self, optimizer, gamma, milestones, last_epoch)
      6         self.gamma = gamma
      7         self.milestones = milestones
----> 8         super().__init__(optimizer, last_epoch)
      9
     10     def get_lr(self, step):

~/anaconda3/envs/pytorch/lib/python3.7/site-packages/torch/optim/lr_scheduler.py in __init__(self, optimizer, last_epoch)
     75         self._step_count = 0
     76
---> 77         self.step()
     78
     79     def state_dict(self):

~/anaconda3/envs/pytorch/lib/python3.7/site-packages/torch/optim/lr_scheduler.py in step(self, epoch)
    141                 print("1a")
    142                 # try:
--> 143                 values = self.get_lr()
    144                 # except TypeError:
    145                     # raise RuntimeError

TypeError: get_lr() missing 1 required positional argument: 'step'
```

May be related to https://github.com/pytorch/pytorch/issues/32898.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32997

Differential Revision: D19737731

Pulled By: vincentqb

fbshipit-source-id: 5cf84beada69b91f91e36b20c3278e9920343655
2020-02-11 09:27:29 -08:00
Enealor
e085c55e53 Fix \\ warnings/errors when building optim documentation (#32911)
Summary:
This PR fixes the warnings and errors attributed to the use of `\\` outside of a proper environment. While rendered correctly in the documentation, it produces the warning
```
LaTeX-incompatible input and strict mode is set to 'warn': In LaTeX, \\ or \newline does nothing in display mode [newLineInDisplayMode]
```
on the CI tools and errors with
```
ParseError: KaTeX parse error: Expected 'EOF', got '\\' at position (x): ...
```
when not set to warn.

This PR also makes minor formatting adjustments. The `CosineAnnealingLR` documentation has been adjusted to remove an unnecessarily large fraction and to improve spacing. The `SGD` documentation has been adjusted so that variables are consistently typeset and so that it follows the convention of punctuating equations. I attached images of the current documentation, the new documentation and a marked version to highlight differences.

* SGD:
New: ![new_sgd](https://user-images.githubusercontent.com/53704971/73596383-98795500-44d6-11ea-97ce-bac02a0a1638.png)
Current: ![current_sgd](https://user-images.githubusercontent.com/53704971/73596384-98795500-44d6-11ea-86d3-b407cebbb513.png)
Marked new: ![marked_sgd](https://user-images.githubusercontent.com/53704971/73596385-98795500-44d6-11ea-9e06-9ac5e5e27270.png)

* CosineAnnealingLR:
New: ![new_calr](https://user-images.githubusercontent.com/53704971/73596382-98795500-44d6-11ea-9c90-02406d297bae.png)
Current: ![current_calr](https://user-images.githubusercontent.com/53704971/73596387-9911eb80-44d6-11ea-93fb-ee72d695312a.png)
Marked new: ![marked_calr](https://user-images.githubusercontent.com/53704971/73596386-9911eb80-44d6-11ea-91a6-ed7a62b4e255.png)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32911

Differential Revision: D19697114

Pulled By: ezyang

fbshipit-source-id: 567304bd4adcfa4086eae497cb818cf74375fe5d
2020-02-03 09:54:38 -08:00
Nikolay Novik
e87887ccb4 Update type hints for torch.optim.optimizer.Optimizer (#32900)
Summary:
This PR fixes type hints for `torch.optim.optimizer.Optimizer` object, issue also reported in https://github.com/pytorch/pytorch/issues/23731

To test things, I used the following optimizer implementation, which is fully covered with type hints:

```python
from typing import Optional, Callable, Union, Iterable

from torch import Tensor
from torch.optim.optimizer import Optimizer

OptClosure = Optional[Callable[[], float]]
_params_t = Union[Iterable[Tensor], Iterable[dict]]

class SGD(Optimizer):
    def __init__(self, params: _params_t, lr: float = 0.1) -> None:
        defaults = dict(lr=lr)
        super(SGD, self).__init__(params, defaults)

    def __setstate__(self, state: dict) -> None:
        super(SGD, self).__setstate__(state)

    def step(self, closure: OptClosure = None) -> Optional[float]:
        loss = None
        if closure is not None:
            loss = closure()

        for group in self.param_groups:
            for p in group['params']:
                if p.grad is None:
                    continue
                d_p = p.grad.data
                p.data.add_(-group['lr'], d_p)
        return loss
```

Without the fix, `mypy` reports a bunch of type inconsistencies and missing properties:

```bash
$ mypy  torch_optimizer/sgd.py
torch_optimizer/sgd.py:14: error: Too many arguments for "__init__" of "Optimizer"
torch_optimizer/sgd.py:17: error: "__setstate__" undefined in superclass
torch_optimizer/sgd.py:19: error: Return type "Optional[float]" of "step" incompatible with return type "None" in supertype "Optimizer"
torch_optimizer/sgd.py:24: error: "SGD" has no attribute "param_groups"
Found 4 errors in 1 file (checked 1 source file)
```

With the fix, no issues:
```bash
$ mypy  torch_optimizer/sgd.py
Success: no issues found in 1 source file
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32900

Differential Revision: D19697175

Pulled By: ezyang

fbshipit-source-id: d5e2b3c421f69da3df8c32b3d53b4b6d15d61a41
2020-02-03 09:00:01 -08:00
Kirayue
9e9bfbfd8d Update old scheduler example usage (#31358)
Summary:
Update the old example usage in the CosineAnnealingWarmRestarts docs; `scheduler.step()` should be called after `optimizer.step()`.
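
A minimal, self-contained sketch of the documented call order (the model and data are replaced here by a single dummy parameter):

```python
import torch

param = torch.randn(1, requires_grad=True)
optimizer = torch.optim.SGD([param], lr=0.1)
scheduler = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(optimizer, T_0=10)

for epoch in range(20):
    optimizer.zero_grad()
    param.sum().backward()
    optimizer.step()   # update the parameters first
    scheduler.step()   # then advance the learning rate schedule
```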

https://github.com/pytorch/pytorch/issues/20028#issuecomment-566061580
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31358

Differential Revision: D19199311

Pulled By: vincentqb

fbshipit-source-id: cb29b95f8277d2dfa75ec2a83c1af03a5c9c9a69
2020-01-02 09:15:04 -08:00
Vincent Quenneville-Belair
9459db86bf Raise warning for schedulers following chainable schedulers (#31125)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/29697.

Raise warning for schedulers following chainable schedulers in https://github.com/pytorch/pytorch/issues/26423. See explanation for
* [new warning when load/save](https://github.com/pytorch/pytorch/issues/29697#issuecomment-564655802)
* [change from deprecation to user warning](https://github.com/pytorch/pytorch/issues/29697#issuecomment-564659775).

gchanan -- This should go in the upcoming release following https://github.com/pytorch/pytorch/issues/26423.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31125

Differential Revision: D19143740

Pulled By: vincentqb

fbshipit-source-id: 35b55fe6c5b39ca5a68b1a6e19f14eb95b9a784e
2019-12-23 08:24:22 -08:00
Stephen Roller
159835e666 Add types for the remaining optimizers. (#31130)
Summary:
**Patch Description**
Round out the rest of the optimizer types in torch.optim by creating the stubs for the rest of them.

**Testing**:
I ran mypy looking just for errors in that optim folder. There are no *new* mypy errors created.
```
$ mypy torch/optim | grep optim
$ git checkout master; mypy torch/optim | wc -l
968
$ git checkout typeoptims; mypy torch/optim | wc -l
968
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31130

Reviewed By: stephenroller

Differential Revision: D18947145

Pulled By: vincentqb

fbshipit-source-id: 5b8582223833b1d9123d829acc1ed8243df87561
2019-12-12 06:36:41 -08:00
albanD
b0871f211b Make all optimizers consistent so that they don't change gradients inplace
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30257

Test Plan: Imported from OSS

Differential Revision: D18665461

Pulled By: albanD

fbshipit-source-id: cfdafef919468a41007881b82fd288b7128baf95
2019-11-26 12:16:25 -08:00
Vitaly Fedyunin
877c96cddf explicitly provide memory format when calling to *_like operators
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30008

Test Plan: Imported from OSS

Differential Revision: D18575981

Pulled By: VitalyFedyunin

fbshipit-source-id: ec3418257089ad57913932be1a8608cd20ce054c
2019-11-19 16:19:29 -08:00
Adam J. Stewart
23483406aa Fix missing space in lr_scheduler warning msg
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29527

Differential Revision: D18422662

Pulled By: ngimel

fbshipit-source-id: 80191232ee0b639274ba3561e0d89ddcb40434e7
2019-11-10 22:51:35 -08:00
Igor Fedan
43d4d019c4 explicitly provide memory format when calling to clone() at rprop.py
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28693

Test Plan: Imported from OSS

Differential Revision: D18333379

Pulled By: ifedan

fbshipit-source-id: 4430efc0602a3fc6ef05adac07df845a696449f7
2019-11-07 09:00:37 -08:00
Igor Fedan
b05e9d4521 explicitly provide memory format when calling to clone() at lbfgs.py
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28692

Test Plan: Imported from OSS

Differential Revision: D18333356

Pulled By: ifedan

fbshipit-source-id: ca0de6b721f695893c0756ea1b3b469df1a2b249
2019-11-07 08:20:11 -08:00
なるみ
d83389d327 Ignore F401 in all __init__.py without putting noqa (#25823)
Summary:
By adding `per-file-ignores = __init__.py: F401` to `.flake8` with `flake8>=3.7`, we can ignore F401 in all `__init__.py` files without adding `# noqa: F401` line by line.

http://flake8.pycqa.org/en/latest/user/options.html?highlight=per-file-ignores#cmdoption-flake8-per-file-ignores
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25823

Differential Revision: D17252182

Pulled By: soumith

fbshipit-source-id: 87b174075b79e4078953a7521bd1a8f82405646b
2019-10-23 15:28:13 -07:00
Vincent Quenneville-Belair
cbddc77ac5 fix docs for lr (#28026)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28026

Documentation for learning rate does not render well. #27730.

Test Plan: Imported from OSS

Differential Revision: D17953395

Pulled By: vincentqb

fbshipit-source-id: 9e84df3e7de43f11399a67bc99c76ef241b1120f
2019-10-23 13:49:34 -07:00
Timothy Man
1c53a74e26 Fixed behavior of div_factor parameter in optim.lr_scheduler.OneCycleLR (#28217)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/28216
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28217

Differential Revision: D18070759

Pulled By: vincentqb

fbshipit-source-id: ed032190c0e3eab834fc9a8f408b75b56f0f35ec
2019-10-23 13:39:05 -07:00
Vincent Quenneville-Belair
e4f40bf3b2 Add multiplicative lr. (#27254)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27254

`MultiplicativeLR` consumes a function providing the multiplicative factor at each epoch. It mimics `LambdaLR` in its syntax.
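
A minimal usage sketch (a dummy optimizer and a constant factor of 0.95, chosen only for illustration):

```python
import torch

optimizer = torch.optim.SGD([torch.randn(1, requires_grad=True)], lr=0.1)
scheduler = torch.optim.lr_scheduler.MultiplicativeLR(optimizer, lr_lambda=lambda epoch: 0.95)

for epoch in range(5):
    optimizer.step()
    scheduler.step()  # lr is multiplied by 0.95 each epoch
    print(optimizer.param_groups[0]['lr'])
```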

Test Plan: Imported from OSS

Differential Revision: D17728088

Pulled By: vincentqb

fbshipit-source-id: 1c4a8e19a4f24c87b5efccda01630c8a970dc5c9
2019-10-23 11:38:45 -07:00
Vincent Quenneville-Belair
d1d2358d31 Correct math formatting for lr scheduler (#28467)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28467

Correcting formatting error from #27874. Also making the size of the parentheses more natural.

![Screen Shot 2019-10-22 at 5 38 22 PM](https://user-images.githubusercontent.com/3047868/67336492-76ddfa00-f4f3-11e9-9d79-70a0aa4f6d29.png)

Closes #27874

Test Plan: Imported from OSS

Differential Revision: D18076085

Pulled By: vincentqb

fbshipit-source-id: cb7c52b347d6d11ea4a2d3c94d00a42f849c0a83
2019-10-23 11:11:25 -07:00
zou3519
e5d6b75319 Bag of documentation fixes; fix more sphinx warnings (#27850)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27850

Many of these are real problems in the documentation (i.e., link or
bullet point doesn't display correctly).

Test Plan: - built and viewed the documentation for each change locally.

Differential Revision: D17908123

Pulled By: zou3519

fbshipit-source-id: 65c92a352c89b90fb6b508c388b0874233a3817a
2019-10-15 07:31:14 -07:00
Vincent Quenneville-Belair
28b1f586f6 Change schedulers to chainable form (#26423)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26423

Enable chainable schedulers as requested in #13022 by implementing the changes mentioned below from [comment](https://github.com/pytorch/pytorch/pull/21800#issuecomment-513370208).

* Changing the behavior of schedulers to the chainable formula when available
* Using the closed form whenever epoch is different from None until the next release with a deprecation warning
* Making `get_computed_values` the supported way of obtaining the last computed learning rate by the scheduler (see [comment](https://github.com/pytorch/pytorch/pull/21800#issuecomment-513940729) for new syntax)
* Returning a deprecation warning when invoking the undocumented get_lr function (see [comment](https://github.com/pytorch/pytorch/pull/21800#discussion_r294305485)) referring to `get_computed_values`, and deprecating it in the next release.
* `CosineAnnealingWarmRestart` still takes an epoch parameter as it is the only one with a mechanic relying on fractional epoch
* `MultiplicativeLR` consumes a function providing the multiplicative factor at each epoch. It mimics `LambdaLR` in its syntax.

# #20527

### Before

The user calls the scheduler with a constant epoch, either across loops or within the same loop.
```
import torch.optim as optim
from torch import nn

conv = nn.Conv2d(3,3,3)
optimizer = optim.Adam(conv.parameters())
lr_scheduler = optim.lr_scheduler.StepLR(optimizer, 2)

# Scheduler with sometimes-constant epoch number
for epoch in [0, 0, 1, 1, 2, 2, 3, 3]:
  lr_scheduler.step(epoch)
  print(optimizer.param_groups[0]['lr'])
```

### After

If the user wants to step only when the epoch number has changed:
```
import torch.optim as optim
from torch import nn

conv = nn.Conv2d(3,3,3)
optimizer = optim.Adam(conv.parameters())
lr_scheduler = optim.lr_scheduler.StepLR(optimizer, 2)

last_epoch = -1
for epoch in [0, 0, 1, 1, 2, 2, 3, 3]:

  # Check if epoch number has changed manually
  if epoch-last_epoch > 0:
    lr_scheduler.step()
  last_epoch = epoch

  print(epoch, lr_scheduler.get_computed_values())
```

# #22107

### Before

```
import torch
from torchvision.models import resnet18
net = resnet18()

optimizer = torch.optim.SGD(net.parameters(), 0.1)
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[3, 6, 9], gamma=0.1)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, 3, gamma=0.1)

for i in range(10):
  # Scheduler computes and returns new learning rate, leading to unexpected behavior
  print(i, scheduler.get_lr())
  scheduler.step()
```

### After

```
import torch
from torchvision.models import resnet18

net = resnet18()
optimizer = torch.optim.SGD(net.parameters(), 0.1)
lr_scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[3, 6, 9], gamma=0.1)
lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer, 3, gamma=0.1)

for i in range(10):
    # Returns last computed learning rate by scheduler
    print(i, lr_scheduler.get_computed_values())
    lr_scheduler.step()
```

# ghstack

This contains the changes from #24352. Opening again since they were reverted.

This reverts commit 1c477b7e1f.

Test Plan: Imported from OSS

Differential Revision: D17460427

Pulled By: vincentqb

fbshipit-source-id: 8c10f4e7246d6756ac91df734e8bed65bdef63c9
2019-10-04 08:53:14 -07:00
Vincent Quenneville-Belair
e4fba752cb fix type annotation
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26930

Test Plan: Imported from OSS

Differential Revision: D17614745

Pulled By: vincentqb

fbshipit-source-id: 1c29543f74d9cf307e9665aa890b4830b886fe63
2019-09-27 13:39:36 -07:00
Vincent Quenneville-Belair
660d9e24dd Highlighting in the doc that square root comes before adding epsilon
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26735

Test Plan: Imported from OSS

Differential Revision: D17558505

Pulled By: vincentqb

fbshipit-source-id: 36449c501f3ab3bc7cadd1f580258904b39369d4
2019-09-25 15:52:28 -07:00
Zecong Hu
b8ae4d0f1c Resolve #25605 cyclic reference in _LRScheduler (#25776)
Summary:
A cyclic reference was introduced in a previous version due to the runtime overwriting of the bound method `optimizer.step`. This is now avoided by keeping a weak reference to the optimizer instance.

Credit: https://stackoverflow.com/questions/26157952/why-set-a-bound-method-to-python-object-create-a-circular-reference
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25776

Differential Revision: D17420770

Pulled By: ezyang

fbshipit-source-id: 546ec94cf725ebfddb310b24e6a2e146ddecd1f6
2019-09-18 06:08:35 -07:00
Vincent Quenneville-Belair
a3f0d988d9 Revert D17349760: Change schedulers to chainable form
Test Plan: revert-hammer

Differential Revision:
D17349760

Original commit changeset: 0a6ac01e2a6b

fbshipit-source-id: 41c2c136215dabc26cad5098a08eff2a2a29b715
2019-09-13 12:54:59 -07:00
Vincent Quenneville-Belair
939ae80de1 Change schedulers to chainable form (#24352)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24352

Enable chainable schedulers as requested in #13022 by implementing the changes mentioned below from [comment](https://github.com/pytorch/pytorch/pull/21800#issuecomment-513370208).

* Changing the behavior of schedulers to the chainable formula when available
* Using the closed form whenever epoch is different from None until the next release with a deprecation warning
* Making `get_computed_values` the supported way of obtaining the last computed learning rate by the scheduler (see [comment](https://github.com/pytorch/pytorch/pull/21800#issuecomment-513940729) for new syntax)
* Returning a deprecation warning when invoking the undocumented get_lr function (see [comment](https://github.com/pytorch/pytorch/pull/21800#discussion_r294305485)) referring to `get_computed_values`, and deprecating it in the next release.
* `CosineAnnealingWarmRestart` still takes an epoch parameter as it is the only one with a mechanic relying on fractional epoch
* `MultiplicativeLR` consumes a function providing the multiplicative factor at each epoch. It mimics `LambdaLR` in its syntax.

# #20527

### Before

The user calls the scheduler with a constant epoch, either across loops or within the same loop.
```
import torch.optim as optim
from torch import nn

conv = nn.Conv2d(3,3,3)
optimizer = optim.Adam(conv.parameters())
lr_scheduler = optim.lr_scheduler.StepLR(optimizer, 2)

# Scheduler with sometimes-constant epoch number
for epoch in [0, 0, 1, 1, 2, 2, 3, 3]:
  lr_scheduler.step(epoch)
  print(optimizer.param_groups[0]['lr'])
```

### After

If the user wants to step only when the epoch number has changed:
```
import torch.optim as optim
from torch import nn

conv = nn.Conv2d(3,3,3)
optimizer = optim.Adam(conv.parameters())
lr_scheduler = optim.lr_scheduler.StepLR(optimizer, 2)

last_epoch = -1
for epoch in [0, 0, 1, 1, 2, 2, 3, 3]:

  # Check if epoch number has changed manually
  if epoch-last_epoch > 0:
    lr_scheduler.step()
  last_epoch = epoch

  print(epoch, lr_scheduler.get_computed_values())
```

# #22107

### Before

```
import torch
from torchvision.models import resnet18
net = resnet18()

optimizer = torch.optim.SGD(net.parameters(), 0.1)
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[3, 6, 9], gamma=0.1)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, 3, gamma=0.1)

for i in range(10):
  # Scheduler computes and returns new learning rate, leading to unexpected behavior
  print(i, scheduler.get_lr())
  scheduler.step()
```

### After

```
import torch
from torchvision.models import resnet18

net = resnet18()
optimizer = torch.optim.SGD(net.parameters(), 0.1)
lr_scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[3, 6, 9], gamma=0.1)
lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer, 3, gamma=0.1)

for i in range(10):
    # Returns last computed learning rate by scheduler
    print(i, lr_scheduler.get_computed_values())
    lr_scheduler.step()
```

Test Plan: Imported from OSS

Differential Revision: D17349760

Pulled By: vincentqb

fbshipit-source-id: 0a6ac01e2a6b45000bc6f9df732033dd81f0d89f
2019-09-13 07:36:05 -07:00
Vincent Quenneville-Belair
135bbc261d fix base_lr overridden in cyclic lr (#26105)
Summary:
The base_lr parameter was being overridden by the super `__init__`; see https://github.com/pytorch/pytorch/issues/21965.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26105

Reviewed By: yf225

Differential Revision: D17346724

Pulled By: vincentqb

fbshipit-source-id: 4b146bd64f4f385c0a9c4f4df8eb8991312fb15c
2019-09-12 15:53:03 -07:00
Vincent Quenneville-Belair
05f1fed693 Add OneCycleLR (#25324)
Summary:
Squash rebase of https://github.com/pytorch/pytorch/issues/21258

ghstack-source-id: 7d3ce522ac4dd3050bc6c6bbda1eaaeb8bc4b2c1
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25324
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25325

Differential Revision: D17095722

Pulled By: vincentqb

fbshipit-source-id: 7fe69b210924ee3b39223dd78122aea61267234a
2019-08-28 16:59:40 -07:00
lili
1b7f7aa12a change LBFGS's default tolerance_grad to 1e-7 (#25240)
Summary:
Hi,

I noticed that after v1.2.0 the implementation of the LBFGS optimizer was changed. In the new implementation, the termination condition was changed from the sum of the gradients to the max value of the gradients (see: b15d91490a/torch/optim/lbfgs.py (L313)). But the default tolerance_grad parameter was not changed (and is too large for the max of the gradients), so lots of my old code either does not optimize at all or only optimizes for one or two steps.

So I opened this pull request to suggest changing tolerance_grad to a smaller value.
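
To make the difference concrete, a hedged sketch of the two termination checks described above (the variable names are illustrative; see lbfgs.py for the real code):

```python
import torch

flat_grad = torch.full((1000,), 1e-6)    # a plausible flattened gradient
old_default, proposed_default = 1e-5, 1e-7

# Old check (sum of |grad|): rarely satisfied this early.
print(flat_grad.abs().sum() <= old_default)       # tensor(False)

# New check (max of |grad|): already satisfied with the old default,
# which is why optimization stops after one or two steps.
print(flat_grad.abs().max() <= old_default)       # tensor(True)
print(flat_grad.abs().max() <= proposed_default)  # tensor(False)
```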
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25240

Differential Revision: D17102713

Pulled By: vincentqb

fbshipit-source-id: d46acacdca1c319c1db669f75da3405a7db4a7cb
2019-08-28 16:46:04 -07:00
Roy Li
14ac7a1d87 Add epsilon argument to Adagrad optimizer (#24980)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24980

We'll need this internally, so we're just updating the open-source version. The other optimizers have this argument anyway.

Test Plan: Imported from OSS

Differential Revision: D16945279

Pulled By: li-roy

fbshipit-source-id: 0b8cc86f15387cd65660747899d3d7dd870cff27
2019-08-21 16:36:51 -07:00
meijieru
bd054e7cef reduce memory usage for centered rmsprop (#24170)
Summary:
Reduce GPU memory usage by using an in-place operation.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24170

Differential Revision: D16784495

Pulled By: vincentqb

fbshipit-source-id: 03820cdc9a3952b95b9af0f87d3a9bb0f21e9b4d
2019-08-13 12:18:31 -07:00
Geovanni Zhang
a5f697619c Add interfaces in lr_scheduler.pyi (#23934)
Summary:
Some interfaces of schedulers defined in lr_scheduler.py are missing in lr_scheduler.pyi.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23934

Differential Revision: D16726622

Pulled By: ezyang

fbshipit-source-id: 45fd2d28fbb658c71f6fcd33b8997d6ee8e2b17d
2019-08-12 07:03:41 -07:00
Horace He
bb41e62e3b Updated SGD docs with subscripts (#23985)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/23982

Obvious improvement imo.

Also changed `rho` to `mu`, since `rho` and `p` look very similar.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23985

Differential Revision: D16733037

Pulled By: Chillee

fbshipit-source-id: 5431615d1983f24d6582da6fc8103ac0093b5832
2019-08-09 10:32:40 -07:00
Gregory Chanan
fc82ec298b Update CosineAnnealingWarmRestarts to follow PyTorch 1.1+ Step Order. (#23833)
Summary:
Fixes: https://github.com/pytorch/pytorch/issues/23480.

I only verified that the schedule reaches the restart at the expected step, as specified in the issue; it would be good to have someone else verify correctness here.

Script:
```
scheduler = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(torch.optim.SGD([torch.randn(1, requires_grad=True)], lr=0.5), T_0=1, T_mult=2)
for i in range(9):
    print(i)
    print(scheduler.get_lr())
    scheduler.step()
```
Output:
```
0
[0.5]
1
[0.5]
2
[0.25]
3
[0.5]
4
[0.42677669529663687]
5
[0.25]
6
[0.07322330470336313]
7
[0.5]
8
[0.4809698831278217]
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23833

Differential Revision: D16657251

Pulled By: gchanan

fbshipit-source-id: 713973cb7cbfc85dc333641cbe9feaf917718eb9
2019-08-07 07:15:50 -07:00
Farhad Ramezanghorbani
fed5ca192c Adam/AdamW implementation minor fix (#22628)
Summary:
I have noticed a small discrepancy between theory and the implementation of AdamW, and of Adam in general. The epsilon in the denominator of the following Adam update should not be scaled by the bias correction [(Algorithm 2, L9-12)](https://arxiv.org/pdf/1711.05101.pdf). Only the running average of the gradients (_m_) and of the squared gradients (_v_) should be scaled by their corresponding bias corrections.

![adam_update](https://user-images.githubusercontent.com/13050245/60894105-11117f00-a230-11e9-9ba0-adad2ae2e0ae.png)

In the current implementation, the epsilon is effectively scaled by the square root of `bias_correction2`. I have plotted this scaling factor as a function of the step, given `beta2 = 0.999` and `eps = 1e-8`. In the early steps of optimization, it deviates slightly from theory (denoted by the horizontal red line).

![plot](https://user-images.githubusercontent.com/13050245/60893952-cabc2000-a22f-11e9-8dc2-6353ad5d674d.png)
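
For concreteness, a hedged one-step sketch of the two variants (the names follow the spirit of adam.py but are illustrative, not the exact code):

```python
import math
import torch

lr, beta1, beta2, eps, step = 1e-3, 0.9, 0.999, 1e-8, 1
bias_correction1 = 1 - beta1 ** step
bias_correction2 = 1 - beta2 ** step

param, grad = torch.randn(10), torch.randn(10)
exp_avg = (1 - beta1) * grad            # running average of the gradient (m) after step 1
exp_avg_sq = (1 - beta2) * grad * grad  # running average of the squared gradient (v) after step 1

# Current behaviour: the step size carries sqrt(bias_correction2), so eps is
# effectively divided by that factor instead of being added to sqrt(v_hat) directly.
denom_old = exp_avg_sq.sqrt().add_(eps)
update_old = (lr * math.sqrt(bias_correction2) / bias_correction1) * exp_avg / denom_old

# Fixed behaviour: only m and v are bias-corrected; eps is added unscaled.
denom_new = (exp_avg_sq.sqrt() / math.sqrt(bias_correction2)).add_(eps)
update_new = (lr / bias_correction1) * exp_avg / denom_new

# Tiny for well-scaled gradients, noticeable when sqrt(v_hat) is comparable to eps.
print((update_old - update_new).abs().max())
```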
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22628

Differential Revision: D16589914

Pulled By: vincentqb

fbshipit-source-id: 8791eb338236faea9457c0845ccfdba700e5f1e7
2019-08-01 11:42:04 -07:00
Pavel Belevich
7b229342ca Renamed CosineAnnealingLr to CosineAnnealingLR (#23242)
Summary:
fixing https://github.com/pytorch/pytorch/issues/23160
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23242

Differential Revision: D16443348

Pulled By: pbelevich

fbshipit-source-id: af0edf4e841e04a8016c98bfee72696581f3f070
2019-07-23 14:54:15 -07:00
Michael Acar
a4b2f3e213 Implement AdamW optimizer (#21250)
Summary:
# What is this?
This is an implementation of the AdamW optimizer as implemented in [the fastai library](803894051b/fastai/callback.py) and as initially introduced in the paper [Decoupled Weight Decay Regularization](https://arxiv.org/abs/1711.05101). It decouples the weight decay regularization step from the optimization step during training.

There have already been several abortive attempts to push this into pytorch in some form or fashion: https://github.com/pytorch/pytorch/pull/17468, https://github.com/pytorch/pytorch/pull/10866, https://github.com/pytorch/pytorch/pull/3740, https://github.com/pytorch/pytorch/pull/4429. Hopefully this one goes through.
# Why is this important?
Via a simple reparameterization, it can be shown that L2 regularization has a weight decay effect in the case of SGD optimization. Because of this, L2 regularization became synonymous with the concept of weight decay. However, it can be shown that the equivalence of L2 regularization and weight decay breaks down for more complex adaptive optimization schemes. It was shown in the paper [Decoupled Weight Decay Regularization](https://arxiv.org/abs/1711.05101) that this is the reason why models trained with SGD achieve better generalization than those trained with Adam. Weight decay is a very effective regularizer. L2 regularization, in and of itself, is much less effective. By explicitly decaying the weights, we can achieve state-of-the-art results while also taking advantage of the quick convergence properties that adaptive optimization schemes have.
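
A hedged, schematic sketch of the distinction (plain tensors and a stand-in for the adaptive update, not the actual optimizer classes):

```python
import torch

lr, weight_decay = 1e-3, 1e-2
param = torch.randn(10)
grad = torch.randn(10)

def adaptive_update(g):
    # Stand-in for the Adam machinery; what matters here is only what it is fed.
    return lr * g

# L2 regularization: the decay term is folded into the gradient, so it also
# passes through the adaptive rescaling.
param_l2 = param - adaptive_update(grad + weight_decay * param)

# Decoupled weight decay (AdamW): the decay is applied to the parameters
# directly, independently of the adaptive gradient statistics.
param_adamw = param - adaptive_update(grad) - lr * weight_decay * param
```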
# How was this tested?
There were test cases added to `test_optim.py` and I also ran a [little experiment](https://gist.github.com/mjacar/0c9809b96513daff84fe3d9938f08638) to validate that this implementation is equivalent to the fastai implementation.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21250

Differential Revision: D16060339

Pulled By: vincentqb

fbshipit-source-id: ded7cc9cfd3fde81f655b9ffb3e3d6b3543a4709
2019-07-02 09:09:10 -07:00
Vincent Quenneville-Belair
f176950a67 Use lower case for strong wolfe option. (#22092)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22092
ghimport-source-id: ccc53ed2f1e16865237334a4dde4d162e21762e5

Test Plan: Imported from OSS

Differential Revision: D15955996

Pulled By: vincentqb

fbshipit-source-id: 8ffbea3b9ef8ff7021d42524fa46112da8a3438e
2019-06-26 08:20:25 -07:00
fehiepsi
ad73ea22f7 Add strong Wolfe line search for lbfgs (#8824)
Summary:
This pull request adds a line search for lbfgs. "strong Wolfe" is the default line search method in [minFunc](https://www.cs.ubc.ca/~schmidtm/Software/minFunc.html) and it is also recommended in the [Numerical Optimization](https://www.springer.com/gp/book/9780387303031) book.

The implementation is based on four sources:
+ https://www.cs.ubc.ca/~schmidtm/Software/minFunc.html
+ https://www.springer.com/gp/book/9780387303031 Algorithms 3.5, 3.6, formula 3.59
+ https://github.com/torch/optim/blob/master/lswolfe.lua
+ https://github.com/torch/optim/blob/master/polyinterp.lua

The Lua version is based on an old version of `minFunc`, which was updated in 2012. I made a couple of small changes based on the updated version. Because of that, the test comparing against the `.lua` version is not consistent (that is why I changed a learning rate in the test).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8824

Differential Revision: D15783067

Pulled By: vincentqb

fbshipit-source-id: 5316d9088233981120376d79c7869d5f97e51b69
2019-06-12 11:32:41 -07:00
Ejaaz Merali
fb9fbc009c Fix momentum bug in CyclicLR (#20401)
Summary:
Resolves issue https://github.com/pytorch/pytorch/issues/19003

The author of this issue also asked that `cycle_momentum` default to `False` if the optimizer does not have a momentum parameter, but I'm not sure what the best way to do this would be. Silently changing the value based on the optimizer may confuse the user in some cases (say the user explicitly set `cycle_momentum=True` but doesn't know that the Adam optimizer doesn't use momentum).

Maybe printing a warning when switching this argument's value would suffice?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20401

Differential Revision: D15765463

Pulled By: ezyang

fbshipit-source-id: 88ddabd9e960c46f3471f37ea46013e6b4137eaf
2019-06-11 15:10:28 -07:00
Edward Yang
3889855a5b Revert "Redefine scheduler to set learning rate using recursive formula" #14010 (#21463)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21463
ghimport-source-id: 1b0ea4a282b41388d5c6f6a5d18d37c14ae874ad

Differential Revision: D15747426

Pulled By: ezyang

fbshipit-source-id: 0708394f907b98a9f45bcfa26e5cc450fda8cf76
2019-06-10 15:26:25 -07:00
vfn
8ece538a79 Addresses bad behavior with overridden optimizer.step by #20124 (#21460)
Summary:
This PR addresses the problem described in the comment: https://github.com/pytorch/pytorch/pull/20203#issuecomment-499231276
and the previously coded bad behaviour:
- a warning was raised every time lr scheduling was initialized

Now the code checks that:
- on the second call of `lr_scheduler.step`, `optimizer.step` has already been called, otherwise a warning is raised (as was done in #20203)
- if the optimizer's step is overridden -> raise another warning once to make the user aware of the new pattern:
`opt.step()` -> `lrs.step()`, as we cannot check this.

Now tests check that:
- at initialization (`lrs = StepLR(...)`) there are no warnings
- if we replace `optimizer.step` with something else (similarly to the [code of nvidia/apex](https://github.com/NVIDIA/apex/blob/master/apex/amp/_process_optimizer.py#L287)), another warning is raised.

cc ezyang

PS. Honestly, I would say that there is a lot of overhead introduced for simple warnings. I hope all these checks will be removed in a future version, `1.2.0` or later...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21460

Differential Revision: D15701776

Pulled By: ezyang

fbshipit-source-id: eac5712b9146d9d3392a30f6339cd33d90c497c7
2019-06-06 13:54:42 -07:00
vfdev
449a2c3555 Fixes #20124 (#20203)
Summary:
Fixes #20124

Description:
The code wraps the `optimizer.step()` method to detect whether the user is following the new pattern or the old one. If the old pattern is detected, a UserWarning is raised. The documentation is also updated to reflect the change:

![Screen Shot 2019-05-07 at 11 05 17](https://user-images.githubusercontent.com/2459423/57287527-04e63580-70b8-11e9-9ddd-5d159ef0ed2f.png)
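
A hedged sketch of the wrapping idea (the helper name and flag are hypothetical, not the exact PyTorch code):

```python
import functools
import torch

def _patch_step(optimizer):
    original_step = optimizer.step

    @functools.wraps(original_step)
    def wrapper(*args, **kwargs):
        wrapper._step_called = True  # flag the scheduler can inspect later
        return original_step(*args, **kwargs)

    wrapper._step_called = False
    optimizer.step = wrapper

optimizer = torch.optim.SGD([torch.randn(1, requires_grad=True)], lr=0.1)
_patch_step(optimizer)
print(optimizer.step._step_called)  # False until optimizer.step() is actually called
# A scheduler following the old pattern would see the flag still False at its
# first step() and could emit the UserWarning described above.
```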

cc SsnL, bado-lee
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20203

Differential Revision: D15543060

Pulled By: ezyang

fbshipit-source-id: 3605e1afdb6ffc2dfd5e75e92e01b967c4d065b5
2019-05-29 14:15:01 -07:00
njdalton
d190450a35 Fix typo in CyclicLR docs (#21021)
Summary:
Fixes a typo in the CyclicLR docs by adding the `lr_scheduler` module path and putting in the other required arguments.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21021

Differential Revision: D15530109

Pulled By: soumith

fbshipit-source-id: 98781bdab8d82465257229e50fa3bd0015da1286
2019-05-28 21:18:50 -07:00
Sam Pepose
082936f033 Clarify cycliclr param docs (#20880)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20880

This clarifies how the momentum parameters should be used.

Reviewed By: soumith

Differential Revision: D15482450

fbshipit-source-id: e3649a38876c5912cb101d8e404abca7c3431766
2019-05-28 12:07:47 -07:00
Dmytro Dzhulgakov
c25e33789e Lightweight at-most-once logging for API usage (#20745)
Summary:
Resubmit #20698 which got messed up.

The idea is that when PyTorch is used in a custom build environment (e.g. Facebook), it's useful to track usage of various APIs centrally. This PR introduces a simple, very lightweight mechanism to do so: only the first invocation of a trigger point is logged. This is significantly more lightweight than #18235, and thus we can allow putting logging in e.g. TensorImpl.

Also adds an initial list of trigger points. Trigger points are added in such a way that no static initialization triggers them, i.e. just linking with libtorch.so will not cause any logging. Further suggestions of what to log are welcomed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20745

Differential Revision: D15429196

Pulled By: dzhulgakov

fbshipit-source-id: a5e41a709a65b7ebccc6b95f93854e583cf20aca
2019-05-23 23:17:59 -07:00
Michael Kösel
4e0d098ace Fix optimizer type hint (#20648)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/20548
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20648

Differential Revision: D15453935

Pulled By: ezyang

fbshipit-source-id: 8778e819c58fdc2620f123ec5b5fd568e23b7705
2019-05-22 11:27:40 -07:00
Edward Yang
74bdcd44c4 Remove tab. (#20715)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20715
ghimport-source-id: 600e244581b37152d86614cca6c9fb5fee6cdcde

Differential Revision: D15417984

Pulled By: ezyang

fbshipit-source-id: 939425de1a95ecc3798384e121b12faaba3a27b8
2019-05-20 11:57:18 -07:00
kirayue
d0c742134d #20028 (#20696)
Summary:
Hi, ezyang
Sorry to trouble you.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20696

Differential Revision: D15413694

Pulled By: ezyang

fbshipit-source-id: 1c19d18e00c3a66a52bb9230aa25d7530f6e659c
2019-05-20 07:51:55 -07:00
Edward Z. Yang
9b1dbffba5
Re-sync with internal repository (#20702) 2019-05-20 09:22:57 -04:00
Dmytro Dzhulgakov
d3059b9c49 Lightweight logging for once-only API usage 2019-05-19 23:04:40 -07:00
Edward Yang
839a69f587 Revert D15393514: [pytorch][PR] Refine CosineAnnealingWarmRestarts doc for issue #20028
Differential Revision:
D15393514

Original commit changeset: 03f270a577fc

fbshipit-source-id: 3633f4e9916bdadf018288a64df89078b14af563
2019-05-17 09:55:56 -07:00
kirayue
3c69c9a7fe Refine CosineAnnealingWarmRestarts doc for issue #20028 (#20267)
Summary:
Fixes #20028
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20267

Differential Revision: D15393514

Pulled By: ezyang

fbshipit-source-id: 03f270a577fc3e0414d3f07d97512a409b08f7cd
2019-05-17 09:02:28 -07:00
vfdev
61f1242b7f Formula typo fix (#20110)
Summary:
T_{cur + 1} -> T_{cur} + 1
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20110

Differential Revision: D15218135

Pulled By: ezyang

fbshipit-source-id: fb914d977cac447867921510bf57b59e62e4f68c
2019-05-06 08:08:37 -07:00
Ricky Chen
57948414ac Fix small typo T_mul->T_mult
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20148

Differential Revision: D15217485

Pulled By: ezyang

fbshipit-source-id: cb183cdc2eb3e42c685ef024742a18745923d283
2019-05-06 06:32:33 -07:00
barrh
767c82e151 Initialize last_epoch in _LRScheduler.__init__() (#20059)
Summary:
Class attributes should preferably be explicitly initialized
within the __init__() call. Otherwise, overriding step() is
prone to bugs.

This patch partially reverts #7889
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20059

Differential Revision: D15195747

Pulled By: soumith

fbshipit-source-id: 3d1a51d8c725d6f14e3e91ee94c7bc7a7d6c1713
2019-05-02 22:38:12 -07:00
Soumith Chintala
75754beca3 Revert D14577575: [pytorch][PR] Fix lack of state init for adagrad and add share_memory flag
Differential Revision:
D14577575

Original commit changeset: 12440079ac96

fbshipit-source-id: 935106385e608471dc280fc61cfedf19d330812d
2019-04-26 15:43:04 -07:00
kirayue
af06d6342c Add SGDR(Stochastic Gradient Descent with Warm Restarts) scheduler (#17226)
Summary:
Because of a merge error with master in #15042, opening a new PR for ezyang.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17226

Differential Revision: D14418145

Pulled By: mrshenli

fbshipit-source-id: 099ba225b28e6aba71760b81b2153ad1c40fbaae
2019-04-25 09:26:31 -07:00
Kaiyu Shi
444f792fa6 Fix lack of state init for adagrad and add share_memory flag (#17679)
Summary:
The current code initializes the `state` in the `__init__` method, but the initialization is not invoked in `add_param_group`.

I followed the same approach as the other optimizers to initialize the `state`.

```python
import torch

emb = torch.nn.Embedding(10,10)
emb2 = torch.nn.Embedding(10,10)

optim = torch.optim.Adagrad(emb.parameters())
print(optim.state[emb.weight])  # already initialized

optim.add_param_group({'params': emb2.parameters()})
print(optim.state[emb2.weight])  # empty dict

loss = emb2.weight.sum() + emb.weight.sum()
loss.backward()
optim.step()  # raised KeyError
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17679

Differential Revision: D14577575

Pulled By: ezyang

fbshipit-source-id: 12440079ac964b9eedad48e393d47f558babe300
2019-04-23 12:22:19 -07:00
Chandler Zuo
e3f1504621 Fix the Division by Zero Bug of CosineAnnealingLR (#19180)
Summary:
Added the formula for the corner case. Updated unit tests.

Fixes #17913
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19180

Differential Revision: D14942023

Pulled By: ezyang

fbshipit-source-id: 167c109b97a7830d5b24541dc91e4788d531feec
2019-04-23 09:54:28 -07:00
Bado Lee
36084908e4 Fix lr_scheduler's last_epoch value at the time of initialization (BC BREAKING!) (#7889)
Summary:
Hello everyone :) !!

I've found that lr_scheduler was initialized with last_epoch as -1.
This causes the learning rate of the scheduler's optimizer to remain unchanged
even after the first step (not the one in init but an explicit step of the scheduler).
```python
>>> import torch
>>> cc = torch.nn.Conv2d(10,10,3)
>>> myinitial_lr = 0.1
>>> myoptimizer = torch.optim.Adam(cc.parameters(), lr=myinitial_lr)
>>> mylrdecay = 0.5
>>> myscheduler = torch.optim.lr_scheduler.ExponentialLR(myoptimizer,mylrdecay)

>>> myscheduler.get_lr()
[0.2]    # this is because get_lr calculates lr as 0.1 * 0.5^-1
>>> myscheduler.optimizer.param_groups[0]["lr"]
0.1    # this is not consistent with get_lr value
>>> myscheduler.last_epoch
-1

>>> myscheduler.step()
>>> myscheduler.get_lr()
[0.1]    # this should be the value right after the init, not after first step
>>> myscheduler.optimizer.param_groups[0]["lr"]
0.1    # since this is after first step, it should have been decayed as 0.05
>>> myscheduler.last_epoch
0

>>> myscheduler.step()
>>> myscheduler.last_epoch
1
>>> myscheduler.get_lr()
[0.05]
>>> myscheduler.optimizer.param_groups[0]["lr"]
0.05
>>> myscheduler.last_epoch
1
```

The first problem is that even after the init of lr_scheduler, you get inconsistent parameter values.

The second problem is that you are stuck with the same learning rate for the first 2 epochs if the step function of lr_scheduler is not called at the beginning of the epoch loop.
Of course, you can avoid this by calling lr_scheduler's step at the beginning,
but I don't think this is proper use since, in the case of the optimizer, step is called at the end of the iteration loop.

I've simply avoided all the above issues by setting last_epoch to 0 after the initialization.

This also makes sense when you init with some value of last_epoch which is not -1.
For example, if you want to init with last epoch 10,
the lr should not be set as if it had decayed one step further, which is what happens
in the previous code because last_epoch gets +1 in
`base_lr * self.gamma ** self.last_epoch`.

Instead, it should be set to the exact value for step 10.

I hope this fix finds its way in with all your help :)
I'm really looking forward & excited to become a contributor for pytorch!
Pytorch Rocks!!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/7889

Differential Revision: D15012769

Pulled By: ezyang

fbshipit-source-id: 258fc3009ea7b7390a3cf2e8a3682eafb506b08b
2019-04-23 08:54:09 -07:00
barrh
557b1b362f Fix copied optimizer (#19308)
Summary:
Add the defaults field to the copied object.
Prior to this patch, optimizer.__getattr__ excluded the defaults
attribute of the source optimizer object, which is required by some LR schedulers (e.g. CyclicLR with momentum).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19308

Differential Revision: D15012801

Pulled By: soumith

fbshipit-source-id: 95801b269f6f9d78d531d4fed95c973b280cc96f
2019-04-19 10:27:01 -07:00
Jon Malmaud
1b25fdbcd0 More type stubs (#18511)
Summary:
Added stubs for:

* The `device` module
* The `cuda` module
* Parts of the `optim` module
* Began adding stubs for the `autograd` module. I'll annotate more later but `no_grad` and friends are probably the most used exports from it so it seemed like a good place to start.

This would close #16996, although comments on that issue reference other missing stubs so maybe it's worth keeping open as an umbrella issue.

The big remaining missing package is `nn`.

Also added a `py.typed` file so mypy will pick up on the type stubs. That closes #17639.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18511

Differential Revision: D14715053

Pulled By: ezyang

fbshipit-source-id: 9e4882ac997063650e6ce47604b3eaf1232c61c9
2019-04-01 16:03:58 -07:00
Edward Yang
173f224570 Turn on F401: Unused import warning. (#18598)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18598
ghimport-source-id: c74597e5e7437e94a43c163cee0639b20d0d0c6a

Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18598 Turn on F401: Unused import warning.**

This was requested by someone at Facebook; this lint is turned
on for Facebook by default.  "Sure, why not."

I had to noqa a number of imports in __init__.  Hypothetically
we're supposed to use __all__ in this case, but I was too lazy
to fix it.  Left for future work.

Be careful!  flake8-2 and flake8-3 behave differently with
respect to import resolution for # type: comments.  flake8-3 will
report an import unused; flake8-2 will not.  For now, I just
noqa'd all these sites.

All the changes were done by hand.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Differential Revision: D14687478

fbshipit-source-id: 30d532381e914091aadfa0d2a5a89404819663e3
2019-03-30 09:01:17 -07:00
Søren Rasmussen
95d3825e48 ReduceLrOnPlateau: best=current -> best=copy(current) (#16364) (#16697)
Summary:
Fixes #16364
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16697

Differential Revision: D14680879

Pulled By: soumith

fbshipit-source-id: c50c22f3eacea4474fb3a04fe85fbf11d5a177c9
2019-03-29 06:56:51 -07:00
Sam Pepose
8635078d9e Adds Cyclical Learning Rate and Momentum (#18001)
Summary:
This implements a cyclical learning rate (CLR) schedule with an optional inverse cyclical momentum. More info about CLR: https://github.com/bckenstler/CLR

This is finishing what #2016 started. Resolves #1909.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18001

Differential Revision: D14451845

Pulled By: sampepose

fbshipit-source-id: 8f682e0c3dee3a73bd2b14cc93fcf5f0e836b8c9
2019-03-27 19:56:04 -07:00