Commit Graph

705 Commits

Masaki Kozuki
7a3503dfd8 Add _foreach_sign (#106343)
Rel:
- #106221

Should we add foreach of [`torch.sgn`](https://pytorch.org/docs/stable/generated/torch.sgn.html) as well?
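For intuition, a pure-Python sketch (not the actual ATen kernel) of the foreach semantics `_foreach_sign` provides, with lists of floats standing in for tensors:

```python
def foreach_sign(tensor_lists):
    # Apply sign() to every element of every "tensor" in one logical pass;
    # the real op does this with one fused kernel per device/dtype group.
    sign = lambda v: float((v > 0) - (v < 0))
    return [[sign(v) for v in t] for t in tensor_lists]

signs = foreach_sign([[-2.0, 0.0, 3.0], [1.5, -0.5]])
```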

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106343
Approved by: https://github.com/janeyx99
2023-08-01 22:33:34 +00:00
Jane Xu
59d0dea90f Only make a shallow copy when loading optimizer state_dict (#106082)
The one thing we still deep copy is `param_groups`, which is much lighter weight. This should also save memory when loading from a checkpoint.

The deepcopy was introduced in ecfcf39f30, but module.py had only a shallow copy at that point, so it did not actually bring parity.

Incorporates an XLA fix, which is why I'm updating the pin to ca5eab87a7
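A minimal sketch of the semantic difference, with plain dicts standing in for optimizer state (not the actual `load_state_dict` code):

```python
import copy

saved = {"state": {0: {"step": 10}}, "param_groups": [{"lr": 0.1}]}

# Shallow copy: the outer dict is new, but inner entries are shared objects.
shallow = dict(saved["state"])
# Deep copy: everything is duplicated, costing extra time and memory.
deep = copy.deepcopy(saved["param_groups"])

assert shallow[0] is saved["state"][0]          # shared (cheap)
assert deep[0] is not saved["param_groups"][0]  # independent (expensive)
```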

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106082
Approved by: https://github.com/albanD, https://github.com/Skylion007
2023-08-01 05:33:31 +00:00
Jane Xu
23f47f746b [optim][rprop] Minimize intermediates=1 for foreach to save memory (#105193)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105193
Approved by: https://github.com/albanD
2023-07-31 20:59:26 +00:00
Jane Xu
ad3af0aead Change phrasing on optim state hook docs (#106209)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106209
Approved by: https://github.com/albanD
2023-07-28 18:59:21 +00:00
Jane Xu
dffa4e14b9 Add Optimizer state_dict hooks (#105953)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105953
Approved by: https://github.com/albanD
2023-07-28 11:52:41 +00:00
Jane Xu
ec0ffac33b [BE] Document optimizer state_dict better, use example (#105958)
![image](https://github.com/pytorch/pytorch/assets/31798555/50ce293c-d884-47ab-b5f5-9ba41e3b4bad)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105958
Approved by: https://github.com/albanD
2023-07-27 23:08:42 +00:00
shibo19
21ede4547a remove duplicated code in optimizer (#106022)
As the title says, the check code has duplicates.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106022
Approved by: https://github.com/janeyx99
2023-07-26 17:01:28 +00:00
Matthew Hoffman
0616952d13 Merge and improve torch optim optimizer type stubs (#102593)
Fixes #102428

Also improves hook registration type hints:

```python
from typing import Any, Dict, Tuple

from torch import nn
from torch.optim import Adam, Adagrad, Optimizer

linear = nn.Linear(2,2)
optimizer = Adam(linear.parameters(), lr=0.001)

def pre_hook_fn_return_none(optimizer: Adam, inputs: Tuple[Any, ...], kwargs: Dict[str, Any]) -> None:
    return None

def pre_hook_fn_return_modified(
    optimizer: Optimizer, inputs: Tuple[Any, ...], kwargs: Dict[str, Any]
) -> Tuple[Tuple[Any, ...], Dict[str, Any]]:
    return inputs, kwargs

def hook_fn(optimizer: Optimizer, inputs: Tuple[Any, ...], kwargs: Dict[str, Any]) -> None:
    return None

def hook_fn_other_optimizer(optimizer: Adagrad, inputs: Tuple[Any, ...], kwargs: Dict[str, Any]) -> None:
    return None

optimizer.register_step_post_hook(hook_fn)  # OK

optimizer.register_step_pre_hook(pre_hook_fn_return_none)  # OK
optimizer.register_step_pre_hook(pre_hook_fn_return_modified)  # OK

optimizer.register_step_post_hook(hook_fn_other_optimizer)  # Parameter 1: type "Adam" cannot be assigned to type "Adagrad"

```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102593
Approved by: https://github.com/janeyx99, https://github.com/malfet
2023-07-26 11:56:42 +00:00
janEbert
b0708654c0 Implement NAdamW optimizer (#103881)
NAdamW, which is simply NAdam with the AdamW weight decay term, has shown strong performance in optimizer comparisons such as
1. https://arxiv.org/abs/2211.09760
2. https://arxiv.org/abs/2306.07179

[The VeLO paper](https://arxiv.org/abs/2211.09760) argues its power lies in its ability to act as a superset of other popular optimizers.

This PR adds NAdamW by ~~copying and making very small adaptations to the NAdam implementation (just like AdamW and Adam). To see the small changes in better detail, you can `diff torch/optim/nadam.py torch/optim/nadamw.py`.~~ adding a boolean flag `decoupled_weight_decay` that activates NAdamW behavior (`False` by default) to NAdam.
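A scalar sketch of what the flag toggles (simplified; the real implementation operates on tensors inside the NAdam step):

```python
def apply_weight_decay(param, grad, lr, wd, decoupled):
    # Sketch of the two weight-decay styles toggled by `decoupled_weight_decay`.
    if decoupled:
        # NAdamW/AdamW style: decay the parameter directly.
        return param * (1.0 - lr * wd), grad
    # Classic NAdam/Adam style: fold L2 decay into the gradient.
    return param, grad + wd * param

decoupled_result = apply_weight_decay(1.0, 0.5, lr=0.1, wd=0.01, decoupled=True)
coupled_result = apply_weight_decay(1.0, 0.5, lr=0.1, wd=0.01, decoupled=False)
```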

Interest in the optimizer has also been shown in the PyTorch forums:
https://discuss.pytorch.org/t/nadamw-and-demon-optimizers/179778

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103881
Approved by: https://github.com/janeyx99
2023-07-24 19:29:26 +00:00
Aaron Gokaslan
6d43c89f37 [BE]: Update Ruff to 0.0.280 (#105724)
Removes unused loop values in Python dictionary iteration. Automated fix from Ruff master

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105724
Approved by: https://github.com/ezyang, https://github.com/janeyx99
2023-07-22 23:03:34 +00:00
Jane Xu
1959802548 [AdamW] Fix complex x amsgrad support (#104990)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104990
Approved by: https://github.com/albanD
2023-07-21 23:43:26 +00:00
Jane Xu
e1296a7f8d [Adam] Fix complex x amsgrad support (#104989)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104989
Approved by: https://github.com/albanD
2023-07-21 23:43:26 +00:00
Justin Chu
4cc1745b13 [BE] f-stringify torch/ and scripts (#105538)
This PR is a follow up on the pyupgrade series to convert more strings to use f-strings using `flynt`.

- https://docs.python.org/3/reference/lexical_analysis.html#f-strings
- https://pypi.org/project/flynt/

Command used:

```
flynt torch/ -ll 120
flynt scripts/ -ll 120
flynt tools/ -ll 120
```

and excluded `collect_env.py`
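For example, the kind of rewrite `flynt` performs automatically:

```python
name, count = "optim", 3
# Before: percent-formatting (flynt also handles str.format calls).
before = "module %s has %d files" % (name, count)
# After: the equivalent f-string flynt emits.
after = f"module {name} has {count} files"
assert before == after
```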

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105538
Approved by: https://github.com/ezyang, https://github.com/malfet
2023-07-21 19:35:24 +00:00
Jane Xu
25d80c69ce [foreach] super minor BE: remove unnecessary cast (#105601)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105601
Approved by: https://github.com/albanD
2023-07-20 17:06:52 +00:00
Jane Xu
e855348cdf [foreach][SGD] minimize intermediates=1 to decrease peak memory (#105599)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105599
Approved by: https://github.com/albanD
2023-07-20 17:06:52 +00:00
Justin Chu
3721fa5612 [BE] Enable ruff's UP rules and autoformat optim/ (#105426)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105426
Approved by: https://github.com/malfet, https://github.com/albanD, https://github.com/aaronenyeshi, https://github.com/janeyx99
2023-07-18 21:07:43 +00:00
Nikita Shulga
5837e95d30 [Reland] Update mypy to 1.4.1 (#105227)
This PR re-lands
- [Typing] Fix PEP 484 Violation (#105022)
- Update mypy to 1.4.1 (#91983)

That were reverted due to the conflict with internal source repo.

Mostly fixes for PEP-484 violations (i.e., when a default arg is set to None, but the type is not annotated as Optional)
Plus a few real fixes:
  - Add missing `_get_upgraders_entry_map` to `torch/_C/__init__.pyi`
  - Add missing return statement to `torch._export.deserialize_graph`
  - Fix error message in `torch.ao.ns.fx.weight_utils.get_lstm_mod_weights`
  - Add assert in `torch/optim/optimizer.py` that an Optional list is not None
TODO (in followup PR):
  - Fix erroneous `isinstance` check in `torch/ao/quantization/_pt2e/qat_utils.py`

Unrelated, to bypass CI failures due to the gcc9 dependency update in Ubuntu-18.04:
- Add hack to squash the older libstdc++ from the conda environment in favor of the one from the OS to `.ci/docker/install_conda.sh`
- Update bazel cuda builds to focal, as with libstdc++-6.0.32 bazel builds lose the ability to catch exceptions (probably because they link with cupti statically, but I could not find where that is done)
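For reference, the PEP-484 pattern most of these fixes address (function names hypothetical; the violation is flagged by mypy, not at runtime):

```python
from typing import Optional

def implicit(x: int = None):  # flagged: default is None, type not Optional
    return x

def explicit(x: Optional[int] = None):  # PEP-484 compliant
    return x
```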
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105227
Approved by: https://github.com/atalman, https://github.com/albanD, https://github.com/Skylion007
2023-07-15 20:30:20 +00:00
PyTorch MergeBot
15fd1ea118 Revert "[Reland] Update mypy to 1.4.1 (#105227)"
This reverts commit c9c4f8efc3.

Reverted https://github.com/pytorch/pytorch/pull/105227 on behalf of https://github.com/atalman due to trying to mitigate ci sev #105248 ([comment](https://github.com/pytorch/pytorch/pull/105227#issuecomment-1636510935))
2023-07-14 22:28:35 +00:00
Nikita Shulga
c9c4f8efc3 [Reland] Update mypy to 1.4.1 (#105227)
This PR re-lands
- [Typing] Fix PEP 484 Violation (#105022)
- Update mypy to 1.4.1 (#91983)

That were reverted due to the conflict with internal source repo.

Mostly fixes for PEP-484 violations (i.e., when a default arg is set to None, but the type is not annotated as Optional)
Plus a few real fixes:
  - Add missing `_get_upgraders_entry_map` to `torch/_C/__init__.pyi`
  - Add missing return statement to `torch._export.deserialize_graph`
  - Fix error message in `torch.ao.ns.fx.weight_utils.get_lstm_mod_weights`
  - Add assert in `torch/optim/optimizer.py` that an Optional list is not None
TODO (in followup PR):
  - Fix erroneous `isinstance` check in `torch/ao/quantization/_pt2e/qat_utils.py`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105227
Approved by: https://github.com/atalman, https://github.com/albanD, https://github.com/Skylion007
2023-07-14 20:45:12 +00:00
PyTorch MergeBot
1646d6f939 Revert "Merge and improve torch optim optimizer type stubs (#102593)"
This reverts commit 3279f06410.

Reverted https://github.com/pytorch/pytorch/pull/102593 on behalf of https://github.com/malfet due to There is nothing wrong with this PR, but it fails some internal builds that depend on outdated typing_extensions, will reland when update is done ([comment](https://github.com/pytorch/pytorch/pull/102593#issuecomment-1636062515))
2023-07-14 16:04:54 +00:00
PyTorch MergeBot
3c5a494d7a Revert "Update mypy to 1.4.1 (#91983)"
This reverts commit 634659e262.

Reverted https://github.com/pytorch/pytorch/pull/91983 on behalf of https://github.com/malfet due to It's dependent change was reverted, so reverting this one as well, to keep CI clean ([comment](https://github.com/pytorch/pytorch/pull/91983#issuecomment-1636059709))
2023-07-14 15:59:16 +00:00
Jane Xu
cd15229950 [foreach][RMSprop] Minimize intermediates=2 to decrease peak memory (#105161)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105161
Approved by: https://github.com/albanD
2023-07-13 23:18:54 +00:00
Jane Xu
219cf2a1c8 [foreach][ASGD] Minimize intermediates=1 to decrease peak memory (#105146)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105146
Approved by: https://github.com/Skylion007, https://github.com/albanD
2023-07-13 23:18:54 +00:00
albanD
ef05c5f202 Use plain power operator in Adam/Adamw when capturing (#104254)
The goal is to fix the problem from https://github.com/pytorch/pytorch/pull/102858

The full error this used to raise was :
```
2023-06-27T15:12:15.0663239Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/optim/adamw.py", line 409, in _single_tensor_adamw
2023-06-27T15:12:15.0663699Z     bias_correction1 = 1 - beta1 ** step
2023-06-27T15:12:15.0664200Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_tensor.py", line 40, in wrapped
2023-06-27T15:12:15.0664547Z     return f(*args, **kwargs)
2023-06-27T15:12:15.0665031Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_tensor.py", line 882, in __rpow__
2023-06-27T15:12:15.0665483Z     return torch.tensor(other, dtype=dtype, device=self.device) ** self
2023-06-27T15:12:15.0665899Z RuntimeError: CUDA error: operation not permitted when stream is capturing
2023-06-27T15:12:15.0666401Z CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
```

This pow issue was fixed in https://github.com/pytorch/pytorch/pull/104264 and so this problem should be solvable now.
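For context, the expression at issue, with illustrative scalar values (in the real capturable path `step` is a CUDA tensor, so `beta1 ** step` dispatches to `Tensor.__rpow__`, which used to allocate a tensor mid-capture):

```python
beta1, step = 0.9, 3
# The bias correction the optimizer computes; with scalars this is the
# plain power operator, with a tensor `step` it goes through __rpow__.
bias_correction1 = 1 - beta1 ** step
```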
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104254
Approved by: https://github.com/janeyx99, https://github.com/aws-murandoo
2023-07-13 19:24:25 +00:00
Nikita Shulga
634659e262 Update mypy to 1.4.1 (#91983)
Mostly fixes for PEP-484 violations (i.e., when a default arg is set to None, but the type is not annotated as Optional)
Plus a few real fixes:
  - Add missing `_get_upgraders_entry_map` to `torch/_C/__init__.pyi`
  - Add missing return statement to `torch._export.deserialize_graph`
  - Fix error message in `torch.ao.ns.fx.weight_utils.get_lstm_mod_weights`
TODO (in followup PR):
  - Fix erroneous `isinstance` check in `torch/ao/quantization/_pt2e/qat_utils.py`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91983
Approved by: https://github.com/kit1980, https://github.com/ZainRizvi, https://github.com/huydhn, https://github.com/thiagocrepaldi, https://github.com/aaronenyeshi
2023-07-13 16:30:36 +00:00
Jane Xu
0bc382ea55 [foreach][Adamax] Minimize intermediates=1 to decrease peak memory (#104991)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104991
Approved by: https://github.com/albanD, https://github.com/Skylion007
2023-07-12 03:09:17 +00:00
Jane Xu
ea6a563a8c [foreach][Adagrad] Minimize intermediates=2 to decrease peak memory (#104988)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104988
Approved by: https://github.com/albanD, https://github.com/Skylion007
2023-07-12 03:09:17 +00:00
Jane Xu
455f495f04 [foreach][Adadelta] Minimize intermediates=3 to decrease peak memory (#104983)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104983
Approved by: https://github.com/albanD, https://github.com/Skylion007
2023-07-12 03:09:15 +00:00
Jane Xu
15aa401baa [foreach][NAdam] Minimize use of intermediates to decrease peak memory (#104910)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104910
Approved by: https://github.com/Skylion007, https://github.com/albanD
2023-07-11 17:08:07 +00:00
Jane Xu
6878d3a157 [foreach][RAdam] Minimize use of intermediates to decrease peak memory (#104904)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104904
Approved by: https://github.com/albanD, https://github.com/Skylion007
2023-07-11 17:08:07 +00:00
Jane Xu
231364fd06 [optim] use lerp whenever possible (#104796)
This is a better copy (with fixes) of #104781.

Test plan:
CI will pass once https://github.com/pytorch/pytorch/pull/104784 is landed

Internal CI (and the newly enabled compiled optim tests) will pass after https://github.com/pytorch/pytorch/pull/104866 is landed.
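The identity being exploited, shown on scalars (the optimizer versions use the in-place tensor ops, e.g. `exp_avg.lerp_(grad, 1 - beta1)` instead of `mul_` plus `add_`):

```python
def lerp(start, end, weight):
    # Linear interpolation: start + weight * (end - start).
    return start + weight * (end - start)

beta1, exp_avg, grad = 0.9, 2.0, 5.0
# Classic two-op EMA update: multiply then add.
via_mul_add = exp_avg * beta1 + grad * (1 - beta1)
# Single fused op: lerp toward the gradient.
via_lerp = lerp(exp_avg, grad, 1 - beta1)
assert abs(via_mul_add - via_lerp) < 1e-12
```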

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104796
Approved by: https://github.com/albanD
2023-07-11 14:32:59 +00:00
Matthew Hoffman
3279f06410 Merge and improve torch optim optimizer type stubs (#102593)
Fixes #102428

Also improves hook registration type hints:

```python
from typing import Any, Dict, Tuple

from torch import nn
from torch.optim import Adam, Adagrad, Optimizer

linear = nn.Linear(2,2)
optimizer = Adam(linear.parameters(), lr=0.001)

def pre_hook_fn_return_none(optimizer: Adam, inputs: Tuple[Any, ...], kwargs: Dict[str, Any]) -> None:
    return None

def pre_hook_fn_return_modified(
    optimizer: Optimizer, inputs: Tuple[Any, ...], kwargs: Dict[str, Any]
) -> Tuple[Tuple[Any, ...], Dict[str, Any]]:
    return inputs, kwargs

def hook_fn(optimizer: Optimizer, inputs: Tuple[Any, ...], kwargs: Dict[str, Any]) -> None:
    return None

def hook_fn_other_optimizer(optimizer: Adagrad, inputs: Tuple[Any, ...], kwargs: Dict[str, Any]) -> None:
    return None

optimizer.register_step_post_hook(hook_fn)  # OK

optimizer.register_step_pre_hook(pre_hook_fn_return_none)  # OK
optimizer.register_step_pre_hook(pre_hook_fn_return_modified)  # OK

optimizer.register_step_post_hook(hook_fn_other_optimizer)  # Parameter 1: type "Adam" cannot be assigned to type "Adagrad"

```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102593
Approved by: https://github.com/janeyx99
2023-07-11 00:07:30 +00:00
Jane Xu
7e9c891056 [foreach][AdamW] Minimize intermediates to save peak memory (#104898)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104898
Approved by: https://github.com/albanD
2023-07-10 23:40:52 +00:00
Jane Xu
35f0e35529 [foreach][Adam] Minimize use of intermediates to decrease peak memory (#104780)
Starts addressing https://github.com/pytorch/pytorch/issues/97712 by
- Minimizing intermediates usage for foreach Adam
- Document the extra memory usage
- Add comments within the code for clarity now that we reuse intermediates
- Add tests
- Did some refactoring

Next steps involve doing this for all other foreach implementations. Note that even after this change, foreach memory usage will be higher than the for-loop implementation, because we have a minimum budget of 1 intermediate (to avoid clobbering the input values) and that intermediate is larger. For capturable, memory usage is higher due to moving more tensors to CUDA.
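A pure-Python sketch of the idea, with lists of floats standing in for tensor lists (the real code reuses `torch._foreach_*` result buffers in place):

```python
def step_many_intermediates(params, grads, lr):
    scaled = [g * lr for g in grads]    # intermediate 1
    negated = [-s for s in scaled]      # intermediate 2: extra peak memory
    return [p + n for p, n in zip(params, negated)]

def step_one_intermediate(params, grads, lr):
    tmp = [g * lr for g in grads]       # the single intermediate budget
    for i in range(len(tmp)):
        tmp[i] = -tmp[i]                # reuse the same buffer in place
    return [p + t for p, t in zip(params, tmp)]
```

Both produce identical updates; only the peak number of live intermediates differs.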

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104780
Approved by: https://github.com/albanD
2023-07-10 17:38:46 +00:00
PyTorch MergeBot
e7fe2a797c Revert "[optim] use lerp whenever possible (#104796)"
This reverts commit fbe2a7e50a.

Reverted https://github.com/pytorch/pytorch/pull/104796 on behalf of https://github.com/DanilBaibak due to Break internal build ([comment](https://github.com/pytorch/pytorch/pull/104796#issuecomment-1628591105))
2023-07-10 09:36:41 +00:00
Jane Xu
fbe2a7e50a [optim] use lerp whenever possible (#104796)
This is a better copy (with fixes) of #104781.

Test plan:
CI will pass once https://github.com/pytorch/pytorch/pull/104784 is landed

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104796
Approved by: https://github.com/albanD
2023-07-08 07:13:38 +00:00
Animesh Jain
0444f9f85b [dynamo] Reland #104317 - Lazy disable_dynamo API out-of-dynamo (#104664)
The internal build failed because of torch.deploy issues with disable_dynamo in fx/* and _jit/* files. Removing disable_dynamo for both. Added a comment in the code.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104664
Approved by: https://github.com/wconstab
2023-07-06 00:48:02 +00:00
Michael Lazos
a290cbf32b Enable fused foreach Adam compilation (#104121)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104121
Approved by: https://github.com/janeyx99
2023-07-05 23:40:03 +00:00
PyTorch MergeBot
54e320d4d1 Revert "[dynamo] Lazy disable_dynamo API out-of-dynamo (#104317)"
This reverts commit 5c12a810ac.

Reverted https://github.com/pytorch/pytorch/pull/104317 on behalf of https://github.com/huydhn due to This has been reverted internally by D47166892, so I need to also revert it on OSS to keep them in sync ([comment](https://github.com/pytorch/pytorch/pull/104317#issuecomment-1621099151))
2023-07-05 06:21:48 +00:00
Animesh Jain
5c12a810ac [dynamo] Lazy disable_dynamo API out-of-dynamo (#104317)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104317
Approved by: https://github.com/jansel, https://github.com/wconstab, https://github.com/mlazos
2023-06-29 13:30:17 +00:00
Ali Moezzi
8c3958eddc Fix lr_scheduler serialization contains bound methods issue (#102627)
Fixes #42376
`torch.save` serializes bound methods inside the LR scheduler, resulting in a large serialized file.

Test cases include checking the file size, checking whether `anneal_func` is a bound method, and verifying the file loads correctly.
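A minimal sketch of the failure mode and the fix (toy classes, not the actual scheduler code):

```python
import pickle

class Before:
    def __init__(self):
        # Storing a bound method in __dict__: pickling it drags the
        # whole instance along, bloating the checkpoint.
        self.anneal_func = self._cos
    def _cos(self, t):
        return t

class After:
    def __init__(self):
        self._anneal_name = "cos"  # store a plain key instead
    def anneal(self, t):
        # Resolve the method lazily from the stored key.
        return getattr(self, f"_{self._anneal_name}")(t)
    def _cos(self, t):
        return t

restored = pickle.loads(pickle.dumps(After()))
assert restored.anneal(2.0) == 2.0
```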
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102627
Approved by: https://github.com/albanD
2023-06-23 03:53:15 +00:00
Michael Lazos
5a97c947c6 Fix optimizer grad mode state interaction with dynamo (#103952)
Graph break before restoring the grad mode to ensure dynamo respects `no_grad`. This isn't a bug necessarily, but this will allow us to get good perf until aot is updated.

https://github.com/pytorch/pytorch/issues/104053

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103952
Approved by: https://github.com/janeyx99
2023-06-23 02:07:08 +00:00
Niklas Nolte
4624afaa30 use reset_running_stats in swa_utils.update_bn (#103801)
The stat reset in `swa_utils.update_bn` already exists in `NormBase.reset_running_stats`, so use that instead.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103801
Approved by: https://github.com/janeyx99
2023-06-23 01:17:13 +00:00
milesial
f73ff54f9a Use torch._foreach_lerp for SWA update (#103550)
Launch fewer kernels during a SWA update thanks to `torch._foreach_lerp`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103550
Approved by: https://github.com/janeyx99
2023-06-21 15:35:20 +00:00
Nikita Shulga
6d2887cc06 Reland "Move tensor grouping to ATen" (#103912)
This is a reland of https://github.com/pytorch/pytorch/pull/100007 with a build fix for Windows debug builds.
`at::native::ParamsHash` only works on structs with standard layout, but `std::string` isn't one in Visual C++ debug builds, which one can easily verify by running something like:
```cpp
#define _DEBUG
#include <type_traits>
#include <string>
static_assert(std::is_standard_layout_v<std::string>, "Oh noes");
```
If the above condition is not met, instead of printing the static_assert message, VC++ raises very cryptic compilation errors; see https://github.com/pytorch/pytorch/pull/100007#discussion_r1227116292 for more detail.

Also, using `std::hash` for string should result in a faster hash function.

(cherry picked from commit 74b7a6c75e)

### <samp>🤖 Generated by Copilot at 5914771</samp>

This pull request introduces a new function `_group_tensors_by_device_and_dtype` that can group tensors by their device and dtype, and updates the `foreach` utilities and several optimizers to use this function. The goal is to improve the performance, readability, and compatibility of the code that handles tensors with different properties. The pull request also adds a test case and type annotations for the new function, and some error checks for the `fused` argument in Adam and AdamW.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103912
Approved by: https://github.com/janeyx99
2023-06-21 09:26:33 +00:00
WEN Hao
67babf7a45 Enhance decorator _use_grad_for_differentiable (#103567)
Aim: enhance the `_use_grad_for_differentiable` decorator so that functions (methods) decorated by it keep their docstrings and signatures unchanged.

Fixes #103566
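The standard mechanism for this is `functools.wraps`; a minimal sketch of the enhanced decorator shape (the real gradient-mode logic is elided):

```python
import functools

def _use_grad_for_differentiable(func):
    @functools.wraps(func)  # preserves __doc__, __name__, and signature metadata
    def _use_grad(*args, **kwargs):
        # (enable/restore grad-mode handling would go here)
        return func(*args, **kwargs)
    return _use_grad

@_use_grad_for_differentiable
def step():
    """Performs a single optimization step."""

assert step.__doc__ == "Performs a single optimization step."
```

Without `functools.wraps`, `step.__doc__` would be the wrapper's (None) and `step.__name__` would be `"_use_grad"`.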

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103567
Approved by: https://github.com/janeyx99
2023-06-16 18:33:31 +00:00
Andrew Gu
9152d0e5be Silence has_cuda deprecation in optim (#103610)
```
UserWarning: 'has_cuda' is deprecated, please use 'torch.backends.cuda.is_built()'
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103610
Approved by: https://github.com/janeyx99, https://github.com/Skylion007
2023-06-14 22:09:22 +00:00
Ravikiran Parameshwara
8340762211 Update lr_scheduler.py to check the type of eta_min (#97003)
Add float assertion to `eta_min` parameter in `CosineAnnealingWarmRestarts`.

Fixes #87757

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97003
Approved by: https://github.com/janeyx99
2023-06-14 02:13:05 +00:00
Jane Xu
fa893f3f58 Fix optim state_dict casting to allow step to cast to CPU (#102619)
I'm guessing this should fix https://github.com/pytorch/pytorch/pull/88015#issuecomment-1569523106, but I am waiting on @ychfan to supply more details so I can write a good test case.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102619
Approved by: https://github.com/albanD
2023-06-13 00:46:40 +00:00
PyTorch MergeBot
0cb5bc3b04 Revert "Move tensor grouping to ATen (#100007)"
This reverts commit 74b7a6c75e.

Reverted https://github.com/pytorch/pytorch/pull/100007 on behalf of https://github.com/izaitsevfb due to Breaks internal builds, see D46629727 ([comment](https://github.com/pytorch/pytorch/pull/100007#issuecomment-1587861598))
2023-06-12 18:30:33 +00:00
shibo19
900226f20a add multi swa support for custom device (#103297)
Add multi SWA support for custom devices.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103297
Approved by: https://github.com/janeyx99
2023-06-10 10:01:08 +00:00
Masaki Kozuki
74b7a6c75e Move tensor grouping to ATen (#100007)
rel: #94344
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100007
Approved by: https://github.com/janeyx99
2023-06-09 15:44:46 +00:00
shibo19
e4a42bcf56 add foreach support for custom device (#102047)
For custom devices, we want to support foreach, so this adds a function that lets us set another device type; the default value is cuda.
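A hypothetical sketch of such a hook (all names invented for illustration; see the PR for the actual API):

```python
_foreach_device: str = "cuda"  # hypothetical module-level default

def set_foreach_device(device_type: str) -> None:
    """Hypothetical hook letting a custom backend opt into foreach paths."""
    global _foreach_device
    _foreach_device = device_type

def supports_foreach(tensor_device: str) -> bool:
    # Fast foreach paths apply only to tensors on the registered device.
    return tensor_device == _foreach_device

set_foreach_device("xpu")  # a custom backend registers its device type
```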

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102047
Approved by: https://github.com/janeyx99
2023-06-07 13:59:20 +00:00
Michael Lazos
00f1bb0963 Fix optimizer cuda health check graph break (can be done in the compiler) (#102765)
- Ignore the health check if we are compiling
- Don't disable the function anymore

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102765
Approved by: https://github.com/albanD
2023-06-03 03:42:23 +00:00
Michael Lazos
4da88447ea Disable grouping by dtype and device if compiling (#102771)
Disable grouping if we are compiling; it happens during lowering instead.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102771
Approved by: https://github.com/janeyx99
2023-06-02 21:04:49 +00:00
PyTorch MergeBot
9d77949b9e Revert "add foreach support for custom device (#102047)"
This reverts commit b088ff4677.

Reverted https://github.com/pytorch/pytorch/pull/102047 on behalf of https://github.com/malfet due to Broke inductor, see b088ff4677 ([comment](https://github.com/pytorch/pytorch/pull/102047#issuecomment-1572368942))
2023-06-01 16:33:03 +00:00
shibo19
b088ff4677 add foreach support for custom device (#102047)
For custom devices, we want to support foreach, so this adds a function that lets us set another device type; the default value is cuda.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102047
Approved by: https://github.com/janeyx99
2023-06-01 06:22:44 +00:00
vfdev
76af22103b Fixed type hints for CosineAnnealingWarmRestarts (#102067)
Fixed type hints for CosineAnnealingWarmRestarts:
- `T_mult` is not `Optional[int]` but just `int`
- `eta_min` is not `Optional[float]` but just `float`
- removed `step` method specific annotation as it is compatible with the base class

e132f09e88/torch/optim/lr_scheduler.py (L1365-L1375)

Otherwise, a computation like `self.T_i * self.T_mult` in `self.step` does not type-check:
```
error: Unsupported operand types for * ("int" and "None")
```
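With `T_mult` annotated as a plain `int`, the multiplication is well-typed; a tiny sketch:

```python
def next_restart_period(T_i: int, T_mult: int = 1) -> int:
    # With T_mult: int (not Optional[int]) this product type-checks cleanly.
    return T_i * T_mult

period = next_restart_period(10, 2)
```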
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102067
Approved by: https://github.com/janeyx99
2023-05-23 19:06:07 +00:00
Jane Xu
3135bec4a0 [docs] Clarify when to use SparseAdam (#101465)
![image](https://github.com/pytorch/pytorch/assets/31798555/ff19a522-2630-4578-bc0e-6a704aa94d4e)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101465
Approved by: https://github.com/albanD
2023-05-17 21:16:20 +00:00
Jane Xu
f558af2a55 [adam] Use the right params in weight_decay, rename for clarity, fixes #100707 (#100973)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100973
Approved by: https://github.com/Skylion007, https://github.com/albanD
2023-05-09 17:00:27 +00:00
milesial
45bf3f6216 Optimized EMA implementation (#94820)
This PR proposes an optimized way to do Exponential Moving Average (EMA), which is faster than the current way using `swa_utils.AveragedModel` described in https://pytorch.org/docs/stable/optim.html#custom-averaging-strategies.

This implementation is asynchronous, and is built as an optimizer wrapper so that the EMA weight update happens without any additional CPU/GPU sync, just after optimizer steps, and with limited code changes.

Example usage:
```
model = Model().to(device)
opt = torch.optim.Adam(model.parameters())

opt = EMAOptimizer(opt, device, 0.9999)

for epoch in range(epochs):
    training_loop(model, opt)

    regular_eval_accuracy = evaluate(model)

    with opt.swap_ema_weights():
        ema_eval_accuracy = evaluate(model)
```
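The core update the wrapper applies after each optimizer step is the standard EMA rule (scalar sketch; the PR applies it to tensors without a CPU/GPU sync):

```python
def ema_update(ema_weights, model_weights, decay=0.9999):
    # ema <- decay * ema + (1 - decay) * w, applied weight-by-weight.
    return [decay * e + (1.0 - decay) * w
            for e, w in zip(ema_weights, model_weights)]

updated = ema_update([1.0], [2.0], decay=0.9)
```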

Here are some benchmarks (time per iteration) on various torchvision models:

|model|this PR iteration time                      |swa_utils.AveragedModel iteration time| iteration speedup                                      |
|-----|-----------------------------|-----------------------|---------------------------------------------|
|regnet_x_1_6gf|62.73                        |67.998                 |1.08                                         |
|regnet_x_3_2gf|101.75                       |109.422                |1.08                                         |
|regnet_x_400mf|25.13                        |32.005                 |1.27                                         |
|regnet_x_800mf|33.01                        |37.466                 |1.13                                         |
|regnet_x_8gf|128.13                       |134.868                |1.05                                         |
|regnet_y_16gf|252.91                       |261.292                |1.03                                         |
|regnet_y_1_6gf|72.14                        |84.22                  |1.17                                         |
|regnet_y_3_2gf|99.99                        |109.296                |1.09                                         |
|regnet_y_400mf|29.53                        |36.506                 |1.24                                         |
|regnet_y_800mf|37.82                        |43.634                 |1.15                                         |
|regnet_y_8gf|196.63                       |203.317                |1.03                                         |
|resnet101|128.80                       |137.434                |1.07                                         |
|resnet152|182.85                       |196.498                |1.07                                         |
|resnet18|29.06                        |29.975                 |1.03                                         |
|resnet34|50.73                        |53.443                 |1.05                                         |
|resnet50|76.88                        |80.602                 |1.05                                         |
|resnext101_32x8d|277.29                       |280.759                |1.01                                         |
|resnext101_64x4d|269.56                       |281.052                |1.04                                         |
|resnext50_32x4d|100.73                       |101.102                |1.00                                         |
|shufflenet_v2_x0_5|10.56                        |15.419                 |1.46                                         |
|shufflenet_v2_x1_0|13.11                        |18.525                 |1.41                                         |
|shufflenet_v2_x1_5|18.05                        |23.132                 |1.28                                         |
|shufflenet_v2_x2_0|25.04                        |30.008                 |1.20                                         |
|squeezenet1_1|14.26                        |14.325                 |1.00                                         |
|swin_b|264.52                       |274.613                |1.04                                         |
|swin_s|180.66                       |188.914                |1.05                                         |
|swin_t|108.62                       |112.632                |1.04                                         |
|swin_v2_s|220.29                       |231.153                |1.05                                         |
|swin_v2_t|127.27                       |133.586                |1.05                                         |
|vgg11|95.52                        |103.714                |1.09                                         |
|vgg11_bn|106.49                       |120.711                |1.13                                         |
|vgg13|132.94                       |147.063                |1.11                                         |
|vgg13_bn|149.73                       |165.256                |1.10                                         |
|vgg16|158.19                       |172.865                |1.09                                         |
|vgg16_bn|177.04                       |192.888                |1.09                                         |
|vgg19|184.76                       |194.194                |1.05                                         |
|vgg19_bn|203.30                       |213.334                |1.05                                         |
|vit_b_16|217.31                       |219.748                |1.01                                         |
|vit_b_32|69.47                        |75.692                 |1.09                                         |
|vit_l_32|223.20                       |258.487                |1.16                                         |
|wide_resnet101_2|267.38                       |279.836                |1.05                                         |
|wide_resnet50_2|145.06                       |154.918                |1.07                                         |

You can see that in all cases it is faster than using `AveragedModel`. In fact, in many cases adding EMA does not add any overhead, since the computation is hidden behind the usual iteration flow.
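The update being benchmarked can be sketched framework-free. This is an illustrative sketch of the EMA rule (`ema = decay * ema + (1 - decay) * param`) applied in the training loop, not the PR's fused implementation; the function name is made up:

```python
def ema_update(ema_params, params, decay=0.999):
    """Update each EMA weight list in place toward the current weights."""
    for ema, p in zip(ema_params, params):
        for i in range(len(p)):
            # classic exponential moving average of model weights
            ema[i] = decay * ema[i] + (1.0 - decay) * p[i]

ema = [[0.0, 0.0]]
weights = [[1.0, 2.0]]
ema_update(ema, weights, decay=0.9)
# each EMA entry moves 10% of the way toward the current weight
```

With fused/foreach kernels the inner loops collapse into a handful of batched calls, which is where the speedup over `AveragedModel` comes from.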

This is a similar implementation to the one currently in [NVIDIA NeMo](https://github.com/NVIDIA/NeMo).

If the team is interested in merging this, let me know and I'll add some documentation similar to `swa_utils` and tests.

Credits to @szmigacz for the implementation.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94820
Approved by: https://github.com/janeyx99
2023-04-26 18:02:11 +00:00
Arthur
7a8d0ccddf Correct LBFGS tolerance_grad doc string (#99792)
LBFGS' `tolerance_grad` parameter has had a default value of `1e-7` since #25240. The doc string wasn't updated in that PR to match the change https://github.com/pytorch/pytorch/blob/main/torch/optim/lbfgs.py#L207.

There was no open issue for it; I just happened to set it to 1e-7 and was surprised my results didn't change :-) I eventually noticed the inconsistency in the doc, and it seemed like an easy opportunity to figure out how to contribute.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99792
Approved by: https://github.com/janeyx99
2023-04-22 20:19:01 +00:00
PyTorch MergeBot
4637c5ae5b Revert "Simplify _use_grad_for_differentiable (#98706)"
This reverts commit b9da79d280.

Reverted https://github.com/pytorch/pytorch/pull/98706 on behalf of https://github.com/huydhn due to Sorry for reverting your PR but a bunch of inductor tests are failing after this commit, so reverting the PR just to be sure
2023-04-22 00:35:56 +00:00
Jason Ansel
b9da79d280 Simplify _use_grad_for_differentiable (#98706)
This makes it so that dynamo can trace through it.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98706
Approved by: https://github.com/janeyx99
2023-04-21 20:47:19 +00:00
Masaki Kozuki
22ea21da3d Change 1D Tensor of 1 element to 0D Tensor (#96994)
Adds 0D tensors to the graphed Adam/AdamW tests.

Affected:
- `torch.cuda.amp.GradScaler`'s `found_inf`, `_scale`, and `_growth_tracker`
- `step` of Adam & AdamW of `capturable`

Fixes #96776 🤞

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96994
Approved by: https://github.com/janeyx99
2023-03-21 18:24:19 +00:00
Jane Xu
aacbf091db Allow fused optimizers to call _foreach_zero_ in zero_grad (#97159)
Fixes #97032

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97159
Approved by: https://github.com/Skylion007
2023-03-20 19:03:26 +00:00
Aaron Gokaslan
5471621497 [BE] Remove unnecessary dict comprehensions (#97116)
Removes unnecessary dict comprehensions that optimize creation of dicts from iterables
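The kind of rewrite this covers can be illustrated as follows (illustrative snippet, not code from the PR):

```python
pairs = [("lr", 0.01), ("momentum", 0.9)]

# Before: a dict comprehension that only repackages an iterable of pairs
before = {k: v for k, v in pairs}

# After: dict() consumes the iterable directly, which is simpler and faster
after = dict(pairs)

assert before == after == {"lr": 0.01, "momentum": 0.9}
```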

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97116
Approved by: https://github.com/kit1980
2023-03-20 00:56:57 +00:00
Aaron Gokaslan
dd9ade6377 Remove unnecessary items() call in zero_grad (#97040)
Micro-optimization to zero_grad(), which is performance-critical.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97040
Approved by: https://github.com/ezyang, https://github.com/albanD
2023-03-17 21:34:14 +00:00
David
e8b0f504e2 Fix unpicklable object in AveragedModel (#95979)
Fixes #95376

Don't store the callable `avg_fn`; instead, test whether `avg_fn` is None and fall back to the default implementation when it is.
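A minimal sketch of the pattern, with hypothetical class and function names (the real fix lives in `torch.optim.swa_utils.AveragedModel`):

```python
import pickle

def default_avg(avg, new, num):
    # equal-weight running average, as an illustrative default
    return avg + (new - avg) / (num + 1)

class Averager:
    def __init__(self, avg_fn=None):
        # Store None instead of a (possibly unpicklable) lambda/local callable
        self.avg_fn = avg_fn

    def update(self, avg, new, num):
        # Dispatch at call time: fall back to the default when avg_fn is None
        if self.avg_fn is None:
            return default_avg(avg, new, num)
        return self.avg_fn(avg, new, num)

# The instance pickles cleanly because no callable is stored by default
a = pickle.loads(pickle.dumps(Averager()))
```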
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95979
Approved by: https://github.com/janeyx99
2023-03-12 05:13:22 +00:00
Masaki Kozuki
7d765cdc66 Fix wrong handling of grad_scale & found_inf in fused optimizers (#95847)
Fixes #95781.
The cause seems to be that the current implementation doesn't correctly pass `found_inf` when `grad_scale` is `None`. As a result, parameters can get mistakenly updated by gradients some of whose elements are invalid, i.e. NaN or inf.

Related #94060

I forgot about this wrong handling after #94344

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95847
Approved by: https://github.com/janeyx99
2023-03-04 01:21:21 +00:00
Jane Xu
75cb99e549 [optim] Widen the cases for defaulting to foreach (#95820)
Big OOP correction continued. Also added a test this time to verify the defaulting was as expected.

The key here is realizing that the grouping for foreach already assumes that the non-param tensorlists follow suit in dtype and device, so it is too narrow to check that _all_ tensors were on CUDA. The main leeway this allowed was state_steps, which are sometimes cpu tensors. Since foreach _can_ handle cpu tensors, this should not introduce breakage.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95820
Approved by: https://github.com/albanD
2023-03-02 04:15:33 +00:00
Jane Xu
2bcf863fad [optim] include nn.Parameter as foreach supported (#95811)
This PR is the result of a realization that models are NOT subscribed to the foreach defaulting, as has been claimed in our documentation for months now. BIG OOPS.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95811
Approved by: https://github.com/albanD
2023-03-02 04:15:33 +00:00
Xuehai Pan
1fd119948e [3/3] Update .pyi Python stub files and enable 'UFMT' linter (#95268)
Changes:

- #95200

1. Recognize `.py.in` and `.pyi.in` files as Python in VS Code for a better development experience.
2. Fix deep setting merge in `tools/vscode_settings.py`.

- #95267

3. Use `NamedTuple` rather than `namedtuple` + `__annotations__` for `torch.nn.utils.rnn.PackedSequence_`:

    `namedtuple + __annotations__`:

    ```python
    PackedSequence_ = namedtuple('PackedSequence_',
                                 ['data', 'batch_sizes', 'sorted_indices', 'unsorted_indices'])

    # type annotation for PackedSequence_ to make it compatible with TorchScript
    PackedSequence_.__annotations__ = {'data': torch.Tensor, 'batch_sizes': torch.Tensor,
                                       'sorted_indices': Optional[torch.Tensor],
                                       'unsorted_indices': Optional[torch.Tensor]}
    ```

    `NamedTuple` (Python 3.6+):

    ```python
    class PackedSequence_(NamedTuple):
        data: torch.Tensor
        batch_sizes: torch.Tensor
        sorted_indices: Optional[torch.Tensor]
        unsorted_indices: Optional[torch.Tensor]
    ```

- => this PR: #95268

4. Sort import statements and remove unnecessary imports in `.pyi`, `.pyi.in` files.
5. Format `.pyi`, `.pyi.in` files and remove unnecessary ellipsis `...` in type stubs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95268
Approved by: https://github.com/huydhn
2023-03-01 23:50:56 +00:00
Kiersten Stokes
60a1d29585 Correct OneCycleLR doc example code to explicitly call optimizer.step() (#95730)
Fixes #89358 as suggested in the issue comment

A screenshot of the example code in the built docs:
<img width="1168" alt="Screenshot 2023-02-28 at 4 46 45 PM" src="https://user-images.githubusercontent.com/31816267/221999156-02b28f2a-85b3-4aa8-841d-e4c66a39a33f.png">

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95730
Approved by: https://github.com/janeyx99
2023-03-01 02:15:50 +00:00
Jane Xu
e5b9d98752 Rephrase zero_grad docs (#95643)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95643
Approved by: https://github.com/albanD
2023-02-28 22:04:23 +00:00
Jane Xu
097679478e [optim] Set defaults to foreach, NOT fused (#95241)
Rolling back the default change for Adam and rectifying the docs to reflect that AdamW never defaulted to fused.

Since our fused implementations are relatively newer, let's give them a longer bake-in time before flipping the switch for every user.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95241
Approved by: https://github.com/ngimel
2023-02-22 04:47:32 +00:00
Masaki Kozuki
3e9df622fb [mta] implement _foreach_pow (#92303)
Mainly for foreach path of `Adam` and `AdamW`

rel: https://github.com/pytorch/pytorch/issues/58833
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92303
Approved by: https://github.com/albanD
2023-02-16 02:28:26 +00:00
Xuehai Pan
b005ec62b9 [BE] Remove dependency on six and future (#94709)
Remove the Python 2 and 3 compatibility library [six](https://pypi.org/project/six) and [future](https://pypi.org/project/future) and `torch._six`. We only support Python 3.8+ now. It's time to retire them.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94709
Approved by: https://github.com/malfet, https://github.com/Skylion007
2023-02-14 09:14:14 +00:00
Xuehai Pan
5b1cedacde [BE] [2/3] Rewrite super() calls in functorch and torch (#94588)
Rewrite Python built-in class `super()` calls. Only non-semantic changes should be applied.

- #94587
- #94588
- #94592

Also, methods with only a `super()` call are removed:

```diff
class MyModule(nn.Module):
-   def __init__(self):
-       super().__init__()
-
    def forward(self, ...):
        ...
```

Some cases that change the semantics should be kept unchanged. E.g.:

f152a79be9/caffe2/python/net_printer.py (L184-L190)

f152a79be9/test/test_jit_fuser_te.py (L2628-L2635)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94588
Approved by: https://github.com/ezyang, https://github.com/albanD
2023-02-10 21:16:33 +00:00
Aaron Gokaslan
1e2d82b8e4 [BE] Merge isinstance calls together (#94419)
Simplifies and speeds up isinstance calls by checking for multiple types at the same time.
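The rewrite in question looks like this (illustrative snippet, not code from the PR):

```python
x = 3.5

# Before: two separate isinstance calls combined with `or`
slow = isinstance(x, int) or isinstance(x, float)

# After: one call with a tuple of types (a | union also works on Python 3.10+)
fast = isinstance(x, (int, float))

assert slow == fast == True
```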

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94419
Approved by: https://github.com/ezyang
2023-02-09 00:47:26 +00:00
Aaron Gokaslan
3ce1ebb6fb Apply some safe comprehension optimizations (#94323)
Optimizes away unnecessary collection casts, unnecessary calls to list, tuple, and dict, and simplifies calls to the sorted builtin. This should strictly improve speed and readability.
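A few examples of the patterns this kind of cleanup targets (illustrative, not code from the PR):

```python
data = [3, 1, 2]

# Unnecessary cast: sorted() already accepts any iterable and returns a list
assert sorted(list(data)) == sorted(data) == [1, 2, 3]

# Unnecessary generator wrapping: a list comprehension is direct and faster
assert list(x * 2 for x in data) == [x * 2 for x in data] == [6, 2, 4]

# Use sorted(..., reverse=True) instead of reversing afterwards
assert list(reversed(sorted(data))) == sorted(data, reverse=True) == [3, 2, 1]
```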

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94323
Approved by: https://github.com/albanD
2023-02-07 23:53:46 +00:00
Aaron Gokaslan
8fce9a09cd [BE]: pyupgrade Python to 3.8 - imports and object inheritance only (#94308)
Apply parts of pyupgrade to torch (starting with the safest changes).
This PR only does two things: removes the need to inherit from object and removes unused future imports.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94308
Approved by: https://github.com/ezyang, https://github.com/albanD
2023-02-07 21:10:56 +00:00
Masaki Kozuki
6ba041fcae Look up group["capturable"], not defaults["capturable"] in Adam(W) (#94149)
Different values can be set in each `param_group` when calling `__init__` of `torch.optim` optimizers, as in e.g. https://github.com/pytorch/pytorch/issues/89987.

So check whether or not `capturable` is `True` among all the `param_group`s.
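An illustrative sketch of why the group-level lookup matters (plain dicts standing in for real param groups):

```python
# defaults holds constructor-level values; each param_group may override them
defaults = {"capturable": False}
param_groups = [
    {"params": ["p0"], "capturable": False},
    {"params": ["p1"], "capturable": True},   # per-group override
]

# Buggy check: only consults the defaults dict, misses the override
wrong = defaults["capturable"]

# Fixed check: consults every group
right = any(g["capturable"] for g in param_groups)

assert wrong is False and right is True
```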
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94149
Approved by: https://github.com/albanD
2023-02-07 00:24:35 +00:00
Masaki Kozuki
a23ed38f9a [mta][foreach] Implement fused adamw (#88015)
related: https://github.com/pytorch/pytorch/issues/68041, https://github.com/pytorch/pytorch/issues/71274, https://github.com/pytorch/pytorch/issues/80167
possibly related to https://github.com/pytorch/pytorch/issues/80595#issuecomment-1178519436

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88015
Approved by: https://github.com/albanD, https://github.com/ngimel
2023-02-01 19:32:29 +00:00
Masaki Kozuki
d7a3f2128f pass None instead of False inside Adam.__setstate__ (#93289)
With a061f139dc, `fused`'s type hint is `Optional[bool]` and its default value is `None`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/93289
Approved by: https://github.com/janeyx99, https://github.com/Skylion007
2023-01-31 09:41:35 +00:00
Jane Xu
4fc19e1a71 [optim][adam] use fastest impl whenever possible, add util (#93184)
This makes it so that ONLY when the user doesn't set anything for foreach or fused do we switch the default, cascading Adam so that we default to fused, then foreach, then single-tensor.

To clarify:
* if the user puts True in foreach _only_, it will run the foreach implementation.
* if the user puts True in fused _only_, it will run the fused implementation.
* if the user puts True in foreach AND for fused, it will run the fused implementation.

And:
* if the user puts False in foreach _only_, it will run the single tensor implementation.
* if the user puts False in fused _only_, it will still run the single tensor implementation.
* if the user puts False in foreach AND for fused, it will run the single tensor implementation.
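The precedence described above can be sketched as follows (hypothetical helper, not the actual PyTorch code; the real logic lives in the optimizer constructors):

```python
def choose_impl(fused, foreach):
    # Explicit user choice wins; fused beats foreach when both are requested.
    if fused:
        return "fused"
    if foreach:
        return "foreach"
    if fused is None and foreach is None:
        return "auto"  # cascade: try fused, then foreach, then single-tensor
    # Any explicit False (without the other being True) means single-tensor.
    return "single_tensor"

assert choose_impl(fused=None, foreach=True) == "foreach"
assert choose_impl(fused=True, foreach=None) == "fused"
assert choose_impl(fused=True, foreach=True) == "fused"
assert choose_impl(fused=None, foreach=False) == "single_tensor"
assert choose_impl(fused=False, foreach=None) == "single_tensor"
assert choose_impl(fused=False, foreach=False) == "single_tensor"
```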

I also didn't trust myself that much with the helper function, so I ran some local asserts on `_default_to_fused_or_foreach`. The only point left to really test is the `type(p) == torch.Tensor` check, but I think the distributed tests will catch that in CI.
```
cuda_only_fp_list = [
    torch.rand((1, 2), device="cuda", dtype=torch.float32),
    torch.rand((1, 2), device="cuda", dtype=torch.float64),
    torch.rand((1, 2), device="cuda", dtype=torch.float16),
    torch.rand((1, 2), device="cuda", dtype=torch.bfloat16),
]

cuda_only_int_list = [
    torch.randint(1024, (1, 2), device="cuda", dtype=torch.int64),
]

cpu_list = [
    torch.rand((1, 2), device="cpu", dtype=torch.float32),
    torch.rand((1, 2), device="cpu", dtype=torch.float64),
    torch.rand((1, 2), device="cpu", dtype=torch.float16),
]

none_list = [None]

# differentiable should always make it return false for both
assert _default_to_fused_or_foreach([cuda_only_fp_list], True, True) == (False, False)
assert _default_to_fused_or_foreach([cuda_only_fp_list], True, False) == (False, False)

# cpu lists should always make it return false for both
assert _default_to_fused_or_foreach([cuda_only_fp_list, cpu_list], False, True) == (False, False)
assert _default_to_fused_or_foreach([cpu_list], False, True) == (False, False)
assert _default_to_fused_or_foreach([cuda_only_fp_list, cpu_list], False, False) == (False, False)
assert _default_to_fused_or_foreach([cpu_list], False, False) == (False, False)

# has fused triggers correctly
assert _default_to_fused_or_foreach([cuda_only_fp_list], False, True) == (True, False)
assert _default_to_fused_or_foreach([cuda_only_fp_list], False, False) == (False, True)

# ints always goes to foreach
assert _default_to_fused_or_foreach([cuda_only_fp_list, cuda_only_int_list], False, True) == (False, True)
assert _default_to_fused_or_foreach([cuda_only_fp_list, cuda_only_int_list], False, False) == (False, True)

# Nones don't error
assert _default_to_fused_or_foreach([cuda_only_fp_list, none_list], False, True) == (True, False)
assert _default_to_fused_or_foreach([cuda_only_fp_list, cuda_only_int_list, none_list], False, True) == (False, True)
assert _default_to_fused_or_foreach([none_list], False, True) == (True, False)
assert _default_to_fused_or_foreach([none_list], False, False) == (False, True)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/93184
Approved by: https://github.com/albanD
2023-01-30 19:58:55 +00:00
Jane Xu
e714e37a06 [optim][sgd] default to foreach when CUDA + differentiable=False (#92730)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92730
Approved by: https://github.com/albanD
2023-01-26 04:52:58 +00:00
Jane Xu
8c9f745af1 [foreach] guard default support on native tensors only (#92923)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92923
Approved by: https://github.com/ngimel, https://github.com/crcrpar
2023-01-26 04:52:58 +00:00
Jane Xu
b90496eef5 [nn] zero_grad() set_to_none default True (#92731)
Attempts to fix #92656

BC-breaking! This changes the default of zero_grad in optim and in nn to set grads to None by default instead of zero tensors. We are changing the default because there are proven perf wins and existing code has typically not regressed due to this change. (We will probably have to flesh out this note more.)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92731
Approved by: https://github.com/ngimel
2023-01-26 01:04:28 +00:00
Jane Xu
0d870b50d3 [optim][nadam] group tensors in foreach, make it default (#92715)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92715
Approved by: https://github.com/albanD
2023-01-21 05:43:37 +00:00
Jane Xu
9ccf9362c2 [optim][rprop] default to foreach when CUDA + differentiable=False (#92728)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92728
Approved by: https://github.com/albanD
2023-01-21 05:31:22 +00:00
Jane Xu
c628654724 [optim][rmsprop] default to foreach when CUDA + differentiable=False (#92727)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92727
Approved by: https://github.com/albanD
2023-01-21 05:31:22 +00:00
Jane Xu
7277247a8c [optim][radam] default to foreach when CUDA + differentiable=False (#92726)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92726
Approved by: https://github.com/albanD
2023-01-21 05:31:22 +00:00
Jane Xu
9f356568ab [optim][asgd] default to foreach when CUDA + differentiable=False (#92724)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92724
Approved by: https://github.com/albanD
2023-01-21 05:31:22 +00:00
Jane Xu
30bda6b12b [optim][adamax] default to foreach when CUDA + differentiable=False (#92723)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92723
Approved by: https://github.com/albanD
2023-01-21 05:31:22 +00:00
Jane Xu
9b4a778420 [optim][adagrad] default to foreach when CUDA + differentiable=False (#92716)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92716
Approved by: https://github.com/albanD
2023-01-21 05:31:22 +00:00
Jane Xu
de0375e79d [optim][foreach] Do NOT inplace modify gradients (#92706)
SGD and ASGD already had out-of-place grads.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92706
Approved by: https://github.com/ngimel, https://github.com/albanD
2023-01-21 00:12:28 +00:00
Jane Xu
2b885e1f6c [optim][NAdam] Fix discrepancy between mt vs st impl (#92699)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92699
Approved by: https://github.com/albanD
2023-01-21 00:12:28 +00:00
milesial
e4d83d54a6 Foreach gradient clipping (#91846)
Faster gradient clipping using the foreach functions

```
[------------------------ (tensors, scalar) -------------------------]
                                   |  without foreach  |  with foreach |    apex
1 threads: ----------------------------------------------------------------------
      10 tensors of size 4         |         120.5     |       61.1    |     50.3
      100 tensors of size 4        |         946.2     |      239.5    |    136.3
      1000 tensors of size 4       |        9808.5     |     2151.1    |   1006.9
      10000 tensors of size 4      |       96871.2     |    22637.4    |  10119.1
      10 tensors of size 16        |         121.0     |       64.1    |     52.5
      100 tensors of size 16       |         993.4     |      252.6    |    136.7
      1000 tensors of size 16      |        9427.7     |     2151.2    |   1049.5
      10000 tensors of size 16     |       97437.1     |    22203.1    |  10340.0
      10 tensors of size 256       |         118.9     |       62.3    |     51.5
      100 tensors of size 256      |         955.2     |      243.1    |    134.2
      1000 tensors of size 256     |        9374.9     |     2140.7    |   1009.6
      10000 tensors of size 256    |       95302.5     |    21849.4    |  10215.5
      10 tensors of size 65536     |         118.5     |       62.4    |     51.1
      100 tensors of size 65536    |        1740.7     |      243.3    |    225.3
      1000 tensors of size 65536   |       17364.1     |     2228.7    |   2004.5
      10000 tensors of size 65536  |      177510.1     |    25410.4    |  20678.2
```
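The core idea, independent of the foreach kernels, is to compute one global norm and apply one scale to every gradient. A framework-free sketch under that assumption (the real implementation batches these loops into `torch._foreach_*` calls, and the function name here is illustrative):

```python
import math

def clip_grads_by_total_norm(grads, max_norm, eps=1e-6):
    # One global L2 norm across all gradient lists ...
    total_norm = math.sqrt(sum(g * g for gs in grads for g in gs))
    # ... then a single scale factor applied to every element
    scale = min(1.0, max_norm / (total_norm + eps))
    for gs in grads:
        for i in range(len(gs)):
            gs[i] *= scale
    return total_norm

grads = [[3.0], [4.0]]          # total norm = 5.0
clip_grads_by_total_norm(grads, max_norm=1.0)
# gradients are scaled by ~1/5 so the clipped total norm is ~1.0
```

The foreach version wins because the per-element loops become a constant number of kernel launches per dtype/device group instead of one per tensor.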
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91846
Approved by: https://github.com/janeyx99
2023-01-20 21:43:29 +00:00
Jane Xu
b2ca2c8662 [optim][adagrad] group tensors in foreach to maximize perf (#92362)
another one
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92362
Approved by: https://github.com/albanD
2023-01-20 16:24:39 +00:00
Jane (Yuan) Xu
3ba5eae72a [optim][radam] fix eps discrepancy for foreach (#92551)
Will likely race with https://github.com/pytorch/pytorch/pull/92365

eps was not being used at all in the mta/foreach impl. There was also a discrepancy between the docs and the implementation: the implementation was doing `sqrt(x) + eps` while the docs said `sqrt(x + eps)`.

I've fixed the docs + extended the current multi_tensor test case to capture this issue.

![image](https://user-images.githubusercontent.com/31798555/213300617-61cbb763-da2d-48e0-b3b6-0190594dd049.png)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92551
Approved by: https://github.com/albanD
2023-01-19 14:38:59 +00:00
Jane Xu
c5cb46ecdb [optim][asgd] group tensors in foreach to maximize perf (#92364)
faster foreach
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92364
Approved by: https://github.com/albanD
2023-01-18 23:09:55 +00:00
Jane Xu
fbafcecf8d [optim][radam] group tensors in foreach to maximize perf (#92365)
Also noticed that eps is neither used nor tested at all for the mta impl of RAdam.

Will fix in a followup PR before turning foreach to default!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92365
Approved by: https://github.com/albanD
2023-01-18 22:32:27 +00:00
Jane Xu
de459bdfaa [optim][rmsprop] group tensors in foreach to maximize perf (#92369)
Test plan:
CI
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92369
Approved by: https://github.com/albanD
2023-01-18 22:28:52 +00:00
Jane Xu
07800c52af [optim][adam] group tensors in foreach to maximize perf (#92349)
same idea as https://github.com/pytorch/pytorch/pull/92338
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92349
Approved by: https://github.com/albanD
2023-01-18 22:05:42 +00:00
Jane (Yuan) Xu
e2433e420c [optim][adamax] group tensors in foreach to maximize perf (#92363)
make foreach faster
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92363
Approved by: https://github.com/albanD
2023-01-18 21:32:28 +00:00
Jane Xu
bb34461f00 [optim][rprop] group tensors in foreach to maximize perf (#92372)
This one had a few more for loops than I was expecting.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92372
Approved by: https://github.com/albanD
2023-01-18 20:03:11 +00:00
Jane Xu
0070c546b5 [BE][optim] abstract out docstrings, add differentiable docs (#92336)
1. abstract out common doc strings --> I'm sure there are more, but let this be a first step.
2. Add differentiable docs to those who are actually differentiable
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92336
Approved by: https://github.com/albanD
2023-01-18 15:09:28 +00:00
Jane Xu
a41f00ed70 [optim][sgd] group tensors in foreach to maximize perf (#92338)
Make foreach faster for SGD
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92338
Approved by: https://github.com/albanD
2023-01-18 04:02:41 +00:00
Jane Xu
0157e2ef4e [optim][adamw] default to foreach when CUDA + differentiable=False (#92306)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92306
Approved by: https://github.com/albanD
2023-01-18 00:13:50 +00:00
Jane Xu
4fc796daf9 [optim] abstract out _default_to_foreach_util (#92305)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92305
Approved by: https://github.com/albanD
2023-01-17 19:42:20 +00:00
Jane Xu
d41b5d7c14 [adam] Add not torch.jit.is_scripting() as a requirement for switching to fused (#92181)
A "fix" following https://github.com/pytorch/pytorch/pull/90865. While looking at a later line, I realized that fused is not compatible with `torch.jit.is_scripting()`.

Took the opportunity to make the code cleaner/slightly more performant (with the extends) as well.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92181
Approved by: https://github.com/albanD
2023-01-14 19:05:27 +00:00
Jane Xu
d3765509df [optim][adadelta] default to foreach when CUDA + differentiable=False (#91896)
following up to https://github.com/pytorch/pytorch/pull/90865 and https://github.com/pytorch/pytorch/pull/92048
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91896
Approved by: https://github.com/albanD
2023-01-14 01:21:33 +00:00
Jane Xu
4af5939d7a [optim] Improve adadelta foreach, group tensors to maximize fast path (#92048)
The old behavior would have adadelta foreach send tensors to the slow path if they were not all the same dtype or on the same device.

This PR adds grouping for adadelta optimizer so that it would run foreach in batches, allowing more users to benefit from foreach perf.

Of course, we should ensure that the new implementation works, so there are new tests to ensure this behavior is not broken.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92048
Approved by: https://github.com/albanD
2023-01-14 00:35:14 +00:00
Anupam Bhatnagar
f4b804eeaa Call profiler step via optimizer post hook (#90101)
This PR adds the `_profile_using_dynolog` function to `torch/__init__.py`. The `_profile_using_dynolog` method allows registering the optimizer step post hook. This is required to collect iteration based traces using dynolog.

Other related changes for tests to pass:
1. Updated `optimizer.pyi`
1. Updated `overrides.py`
1. The test `test_kineto_profiler_multiple_steppers` in `test_profiler.py` has been broken down into two cases:
     - `test_kineto_profiler_multiple_steppers_with_override_True` : this test uses the override argument
     - `test_kineto_profiler_multiple_steppers_with_override_False` : this test uses the environment variable
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90101
Approved by: https://github.com/albanD
2023-01-13 18:07:40 +00:00
Nouran Ali
a60125e298 add docstring for adam differentiable parameter (#91881)
Fixes #90467

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91881
Approved by: https://github.com/janeyx99
2023-01-13 17:08:27 +00:00
albanD
60e37a6e08 Update sgd doc to insist on momentum buffer initial value (#92111)
Following the discussion in https://github.com/pytorch/pytorch/pull/91108
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92111
Approved by: https://github.com/soumith, https://github.com/janeyx99
2023-01-13 15:50:57 +00:00
milesial
9412778d51 Fix OneCycleLR error log (#92040)
If we call the scheduler 11 times but the number of expected steps is 10, we should print `Tried to step 11 times`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92040
Approved by: https://github.com/janeyx99
2023-01-13 02:46:59 +00:00
Jane Xu
ed7885c254 [utils][foreach] Add group tensor by device and dtype util (#92014)
Add util that will be commonly used throughout optim
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92014
Approved by: https://github.com/albanD
2023-01-11 23:37:20 +00:00
PyTorch MergeBot
7f2b5ea1e1 Revert "Avoid device casting for all singleton tensors in optimizer states (#91454)"
This reverts commit 1e725c9747.

Reverted https://github.com/pytorch/pytorch/pull/91454 on behalf of https://github.com/janeyx99 due to Likely caused regression where checkpoint resume fails during training
2023-01-10 18:57:50 +00:00
Joel Schlosser
1e725c9747 Avoid device casting for all singleton tensors in optimizer states (#91454)
Fixes #75224
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91454
Approved by: https://github.com/janeyx99
2023-01-04 17:55:00 +00:00
joncrall
ad782ff7df Enable xdoctest runner in CI for real this time (#83816)
Builds on #83317 and enables running the doctests. Just need to figure out what is causing the failures.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/83816
Approved by: https://github.com/ezyang, https://github.com/malfet
2022-12-29 05:32:42 +00:00
Adrian Wälchli
f5e20d6060 Make the state dict of CyclicLR scheduler pickleable (#91400)
Fixes #90414

This PR drops the unpicklable `weakref.WeakMethod` object from the CyclicLR scheduler's state dict and re-initializes it once the state dict is loaded. This makes the state picklable, so you can include it in your checkpoint. Also fixes https://github.com/Lightning-AI/lightning/issues/15901
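A minimal sketch of the pattern with a hypothetical scheduler class (the real fix is in `torch.optim.lr_scheduler.CyclicLR`):

```python
import pickle
import weakref

class Sched:
    def __init__(self, scale=2.0):
        self.scale = scale
        self._init_scale_fn()

    def _init_scale_fn(self):
        # WeakMethod avoids a reference cycle, but weakrefs cannot be pickled
        self._scale_fn_ref = weakref.WeakMethod(self._scale_fn)

    def _scale_fn(self, x):
        return self.scale * x

    def __getstate__(self):
        state = self.__dict__.copy()
        state.pop("_scale_fn_ref")   # drop the unpicklable weakref
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        self._init_scale_fn()        # re-create the weakref on load

# Round-trips through pickle; the weakref is rebuilt on the new instance
s = pickle.loads(pickle.dumps(Sched(scale=3.0)))
```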

A simple test was added that `pickle.dumps` the state.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91400
Approved by: https://github.com/albanD
2022-12-28 18:05:24 +00:00
Jane Xu
a061f139dc [optim] Adam defaults to fused when CUDA + differentiable=False (#90865)
Step 1 in faster default optimizers.

Preliminary benchmarks show gaps in improvement on CUDA for BERT_pytorch and resnet18:
![image](https://user-images.githubusercontent.com/31798555/207707118-14221802-77ce-4ee0-96e3-04638c07924c.png)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90865
Approved by: https://github.com/albanD
2022-12-27 01:28:47 +00:00
richardachen
dafd0432ee Update __init__.py (#91196)
Fixes #91080

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91196
Approved by: https://github.com/janeyx99
2022-12-20 23:38:25 +00:00
Michael Lazos
1accd915a4 Re-enable optimizers (#90709)
Fixes
https://github.com/pytorch/pytorch/issues/90165
https://github.com/pytorch/torchdynamo/issues/328

Re-enables optimizer capture + compilation now that the dynamo slowdowns have been fixed

and it has speedups, numbers to come soon

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90709
Approved by: https://github.com/anijain2305, https://github.com/jansel, https://github.com/yanboliang
2022-12-19 04:07:41 +00:00
Soumith Chintala
06326a7721 [optim] skip .item calls in all optimizers when compiling with dynamo (#88173)
@mlazos: skips `item()` calls if compiling with dynamo by defining a helper function `_get_value`, which returns either the result of `.item()` or, when compiling with dynamo, the scalar CPU tensor. This was done because removing `item()` calls significantly regresses eager perf. Additionally, `_dispatch_sqrt` calls the appropriate sqrt function (`math.sqrt` or `torch.sqrt`).

Fixes https://github.com/pytorch/torchdynamo/issues/1083

This PR will no longer be needed once symint support is default.

This PR closes all remaining graph breaks in the optimizers (!!)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88173
Approved by: https://github.com/albanD
2022-12-12 17:32:35 +00:00
Mauricio Villegas
aacafd2cba Fixed a couple of mistakes in type annotations in optim package (#90216)
Doing some tests with all Optimizer and LRScheduler classes in the optim package, I noticed a couple of mistakes in type annotations, so I created a pull request to fix them.

- In Optimizer class, incorrectly named parameter `default` instead of `defaults` in pyi file
- In SGD class, type for `maximize` and `differentiable` not available in either py or pyi files

I don't know if there is a plan to move all types from pyi to py files, so I wasn't sure where to fix what.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90216
Approved by: https://github.com/janeyx99
2022-12-09 03:20:21 +00:00
Anupam Bhatnagar
6f4dea562d Implement post and pre hooks for optimizer (#89176)
Fixes #88446

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89176
Approved by: https://github.com/albanD
2022-12-02 07:03:45 +00:00
Michael Lazos
c63afb283c Disable dynamo on optimizer lazy initialization (#89902)
Helps with https://github.com/pytorch/torchdynamo/issues/1803

Separate out the group initialization and disable dynamo on it

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89902
Approved by: https://github.com/soumith, https://github.com/albanD
2022-12-02 01:15:11 +00:00
Michael Lazos
3d47c74cfe Update code style for optimizer code (#89862)
Separating out whitespace-only changes
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89862
Approved by: https://github.com/albanD, https://github.com/soumith
2022-11-30 00:53:05 +00:00
albanD
c3e85d879c Mention discrepency between original impl and our impl of RAdam (#89575)
Fixes https://github.com/pytorch/pytorch/issues/88836

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89575
Approved by: https://github.com/mruberry
2022-11-24 17:11:42 +00:00
Jane Xu
310335de48 Update lr_scheduler.pyi to match lr_scheduler.py (#88818)
Following #88503, we should also update the pyi file

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88818
Approved by: https://github.com/soulitzer
2022-11-11 04:02:44 +00:00
Jane Xu
0a69c50a46 Publicly expose _LRScheduler to LRScheduler (#88503)
Fixes #61232

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88503
Approved by: https://github.com/soulitzer
2022-11-07 21:15:10 +00:00
Kazuaki Ishizaki
2ddefbdc3c Fix typos used in documents under torch directory (#88300)
This PR fixes typos, in comments of Python files, that are found from a search box at https://pytorch.org/docs/master/search.html

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88300
Approved by: https://github.com/lezcano
2022-11-02 09:38:13 +00:00
RangiLyu
512a3a48e3 sync AveragedModel buffers when use_buffers=False (#84054)
Fixes #84053

As described in the issue, the AveragedModel will deep copy the model during initialization, which means that the buffers in the averaged model cannot be updated together with the model.

One solution is to make the buffers equal to the source model every time when calling `update_parameters`.
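The effect can be illustrated with plain dicts as stand-ins for a model's parameters and buffers (a toy sketch, not the real `AveragedModel`):

```python
import copy

# Toy illustration: the deepcopy at construction time means buffers stop
# tracking the source model, so they must be re-synced explicitly, e.g. on
# every update_parameters call as described above.
model = {"weight": 1.0, "running_mean": 0.0}   # one param, one buffer
averaged = copy.deepcopy(model)

model["running_mean"] = 5.0      # buffer changes, e.g. batch-norm statistics
stale = averaged["running_mean"]                    # 0.0 -- stale copy

averaged["running_mean"] = model["running_mean"]    # the explicit sync
print(stale, averaged["running_mean"])              # 0.0 5.0
```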
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84054
Approved by: https://github.com/samdow
2022-10-24 16:03:14 +00:00
Emilio Castillo
1b43883fd6 Make AdamW, NAdam & RAdam differentiable (#86183)
Blocked by #86096
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86183
Approved by: https://github.com/albanD
2022-10-17 04:32:08 +00:00
mikael10j
7dcfbedce0 Fix LinearLR scheduler start_factor (#86695)
Fixes #86454

The `start_factor` must lie in the interval (0, 1] rather than [0, 1] to avoid division by zero. This PR changes the lower-limit check on the parameter.
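A minimal sketch of the corrected bound check (illustrative and simplified; the real scheduler's error message and validation code may differ):

```python
# start_factor == 0 would later lead to a division by zero, hence the
# strict lower bound of the interval (0, 1].
def check_start_factor(start_factor):
    if not 0.0 < start_factor <= 1.0:
        raise ValueError("start_factor must be in (0, 1]")

check_start_factor(0.5)        # accepted
try:
    check_start_factor(0.0)    # previously allowed, now rejected
    rejected = False
except ValueError:
    rejected = True
print(rejected)                # True
```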

Pull Request resolved: https://github.com/pytorch/pytorch/pull/86695
Approved by: https://github.com/albanD
2022-10-13 17:31:36 +00:00
Emilio Castillo
cb4867a71a Make ASGD & RProp differentiable (#86258)
Blocked by #86183
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86258
Approved by: https://github.com/albanD
2022-10-13 04:06:13 +00:00
Emilio Castillo
aacb9f3ac6 Make Adadelta,Adagrad & Adamax differentiable (#86096)
Continuing the differentiable optimizers support

Pull Request resolved: https://github.com/pytorch/pytorch/pull/86096
Approved by: https://github.com/janeyx99
2022-10-12 23:16:29 +00:00
kshitij12345
82229d1e33 [optim] fix: empty grad support for SparseAdam (#86459)
Fixes #82486

Pull Request resolved: https://github.com/pytorch/pytorch/pull/86459
Approved by: https://github.com/albanD
2022-10-07 19:24:59 +00:00
Check Deng
b3fdb02fb2 Fix memory leak in _LRScheduler.step() (#85602)
Fixes #85410

This diff removed the cyclic references in `_LRScheduler.step()`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85602
Approved by: https://github.com/albanD
2022-10-07 15:55:55 +00:00
Tongzhou Wang
5ed75ec1d7 Fix SparseAdam consuming iterator (#86210)
Fixes https://github.com/pytorch/pytorch/issues/86209
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86210
Approved by: https://github.com/cpuhrsch
2022-10-06 23:11:25 +00:00
PyTorch MergeBot
233d6f195a Revert "Fix memory leak in _LRScheduler.step() (#85602)"
This reverts commit eb32330d6b.

Reverted https://github.com/pytorch/pytorch/pull/85602 on behalf of https://github.com/albanD due to newly added test is flaky
2022-10-06 22:02:02 +00:00
Chengqi Deng
eb32330d6b Fix memory leak in _LRScheduler.step() (#85602)
Fixes #85410

This diff removed the cyclic references in `_LRScheduler.step()`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85602
Approved by: https://github.com/albanD
2022-10-06 17:07:36 +00:00
Masaki Kozuki
5f26df0345 resubmit: "resubmit: [mta] APEX style Fused Adam (#81705) (#85507)" (#85739)
Embarrassingly move the pow implementations around [ATen/native/cuda/PowKernel.cu#L21-L66](849b08f14b/aten/src/ATen/native/cuda/PowKernel.cu (L21-L66)) to a new header file and let FusedAdam use them to tame MSVC, hopefully.

cc @ngimel @ptrblck
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85739
Approved by: https://github.com/ngimel
2022-09-29 16:58:59 +00:00
Seonglyong Gong
f80ef73d1c [Profiler] tracking Optimizer (part 2 of Record Optimizer) (#84920)
Summary:
Part 2 of Record Optimizer param_groups and states (https://github.com/pytorch/pytorch/pull/84063)
- hooking from optimizer step
- PyOptCall Type
- declare data type for collection
- python binding
- simple unit test case

Test Plan: buck run mode/opt //caffe2/test:profiler

Differential Revision: D39402667

Pull Request resolved: https://github.com/pytorch/pytorch/pull/84920
Approved by: https://github.com/robieta
2022-09-28 02:48:07 +00:00
Peter Jung
9f1468ae6c CyclicLR memory leak fix (#85462)
Hi, we noticed in our team that using CyclicLR causes a problem with memory clearance on the GPU (it is probably the case without the GPU as well, but that was our use case). After initializing CyclicLR, GPU memory is not cleared even after the model, optimizer, and scheduler are out of scope (i.e., their reference counts are zero). This is because the `__init__` method of `CyclicLR` creates references to its own methods, which are not removed until `gc.collect()` is called manually. This is a problem if people want to test multiple models in one run of a script: after testing the first model, the second one will fail with a `CUDA out of memory` error because the first one has not been cleared from memory.

I propose a simple fix by using `weakref`, similarly as in `_LRScheduler` base class, but if you have any comments I am happy to change it.

Here is the code to reproduce the bug:

```python
import torch
import weakref
from transformers import DetrForObjectDetection

class X:
    def __init__(self, optimizer):
        self.optimizer = optimizer

        # Will cause cyclic reference.
        self.func = self.dummy

        # Will work as expected, memory cleared after instance count is zero.
        # self.func = weakref.WeakMethod(self.dummy)

    def dummy(self, x):
        return 1.

def test():
    model = DetrForObjectDetection.from_pretrained('facebook/detr-resnet-50')
    model.to('cuda')
    optimizer = torch.optim.Adam(model.parameters())
    x = X(optimizer)

test()
print(f'{torch.cuda.memory_reserved()}, {torch.cuda.memory_allocated()}')  # Should print (<some memory>, 0), but with cyclic reference, it will print (<some memory>, <some memory>).
```
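The fix itself can be demonstrated without a GPU or any external model (toy class names; a minimal sketch of the `weakref.WeakMethod` approach described above):

```python
import gc
import weakref

class WithCycle:
    def __init__(self):
        self.func = self.dummy  # self -> self.func -> self: reference cycle

    def dummy(self, x):
        return 1.0

class WithWeakMethod:
    def __init__(self):
        # WeakMethod stores the bound method without keeping `self` alive
        self.func = weakref.WeakMethod(self.dummy)

    def dummy(self, x):
        return 1.0

obj = WithWeakMethod()
probe = weakref.ref(obj)
assert obj.func()(3) == 1.0   # calling through the weak reference still works
del obj
print(probe() is None)        # True: freed as soon as the refcount hits zero

gc.disable()                  # keep the demonstration deterministic
obj = WithCycle()
probe = weakref.ref(obj)
del obj
alive_after_del = probe() is not None
gc.enable()
gc.collect()
print(alive_after_del, probe() is None)   # True True
```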
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85462
Approved by: https://github.com/albanD
2022-09-27 17:41:58 +00:00
PyTorch MergeBot
7167996346 Revert "resubmit: [mta] APEX style Fused Adam (#81705) (#85507)"
This reverts commit 4615d1bcfa.

Reverted https://github.com/pytorch/pytorch/pull/85507 on behalf of https://github.com/atalman due to Break internal windows builds
2022-09-27 16:59:35 +00:00
Masaki Kozuki
4615d1bcfa resubmit: [mta] APEX style Fused Adam (#81705) (#85507)
This PR implements an APEX style FusedAdam in PyTorch. This is different from the APEX one in that this is compatible with `torch.cuda.amp.GradScaler` by setting `_step_supports_amp_scaling` to `True` and unscales gradients inside its CUDA kernel.

related: https://github.com/pytorch/pytorch/issues/68041, https://github.com/pytorch/pytorch/issues/71274, https://github.com/pytorch/pytorch/issues/80167 possibly related to https://github.com/pytorch/pytorch/issues/80595#issuecomment-1178519436

Pull Request resolved: https://github.com/pytorch/pytorch/pull/81705
Approved by: https://github.com/ngimel

cc @ptrblck @ngimel
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85507
Approved by: https://github.com/ngimel
2022-09-23 18:56:00 +00:00
PyTorch MergeBot
e505360eb8 Revert "[mta] APEX style Fused Adam (#81705)"
This reverts commit 7a6c4d0c50.

Reverted https://github.com/pytorch/pytorch/pull/81705 on behalf of https://github.com/dagitses due to broke internal builds, details to come
2022-09-22 19:37:29 +00:00
Masaki Kozuki
7a6c4d0c50 [mta] APEX style Fused Adam (#81705)
This PR implements an APEX style FusedAdam in PyTorch.
This is different from the APEX one in that this is compatible with `torch.cuda.amp.GradScaler` by setting `_step_supports_amp_scaling` to `True` and unscales gradients inside its CUDA kernel.

related: https://github.com/pytorch/pytorch/issues/68041, https://github.com/pytorch/pytorch/issues/71274, https://github.com/pytorch/pytorch/issues/80167
possibly related to https://github.com/pytorch/pytorch/issues/80595#issuecomment-1178519436

cc @ptrblck @ngimel
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81705
Approved by: https://github.com/ngimel
2022-09-20 17:18:33 +00:00
F-G Fernandez
7243264c61 fix: Allowed optimizers with more than 2 betas (#84486)
Hello there 👋

As discussed in #84485, this PR enables more flexibility on the optimizers that are wrapped by LR schedulers in PyTorch. Currently, it is incompatible with optimizers that have a number of betas different than 2. This PR fixes that with minimal modifications.

Fixes #84485

Any feedback is welcome!

Pull Request resolved: https://github.com/pytorch/pytorch/pull/84486
Approved by: https://github.com/Lezcano, https://github.com/soulitzer
2022-09-06 19:24:10 +00:00
kshitij12345
faac3dbce2 [optim] asgd : handle complex params as independent real params (#84472)
Ref: #65711
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84472
Approved by: https://github.com/Lezcano, https://github.com/soulitzer
2022-09-06 16:58:42 +00:00
kshitij12345
7c20ad3dfa [optim] rprop: handle complex params as independent real params (#83858)
Ref #65711

Pull Request resolved: https://github.com/pytorch/pytorch/pull/83858
Approved by: https://github.com/albanD
2022-08-23 08:39:35 +00:00
Kshiteej K
09331c947c [optim] rmsprop: handle complex params as independent real params (#83860)
Ref: #65711
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83860
Approved by: https://github.com/albanD
2022-08-22 21:55:01 +00:00
joncrall
b136f3f310 More doctest refinements. (#83317)
Follow up to #82797

Now that the doctests themselves are in a better state, we should be able to enable xdoctest on the CI so they stay that way.

@ezyang @vadimkantorov
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83317
Approved by: https://github.com/ezyang
2022-08-22 20:07:26 +00:00
Emilio Castillo
f0eb841d20 Make torch.optim.RMSprop differentiable (#83578)
Blocked by #82205
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83578
Approved by: https://github.com/albanD
2022-08-22 03:37:10 +00:00
albanD
84c4b07932 Make sure that we can load old optimizer checkpoint (#83588)
We want to make sure that we can load checkpoints that were saved with older version of the code (which doesn't contain the differentiable attribute).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83588
Approved by: https://github.com/mikaylagawarecki
2022-08-17 15:08:05 +00:00
Emilio Castillo
5aab57e112 Make Adam optimizer differentiable (#82205)
Continues [80938](https://github.com/pytorch/pytorch/pull/80938)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82205
Approved by: https://github.com/albanD
2022-08-17 07:20:37 +00:00
Rob Zinkov
ff75562cff Adding maximize to rprop (#81864)
Added the maximize flag #68052 to rprop optimizer and updates the respective tests.
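What the flag means can be shown with a plain scalar update (a toy SGD-style stand-in, not the real Rprop logic):

```python
# Toy illustration of maximize=True: the gradient is negated, so the
# update ascends the objective instead of descending it.
def step(param, grad, lr=0.1, maximize=False):
    if maximize:
        grad = -grad
    return param - lr * grad

print(step(1.0, 0.5))                 # ~0.95  (descend)
print(step(1.0, 0.5, maximize=True))  # ~1.05  (ascend)
```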
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81864
Approved by: https://github.com/albanD
2022-08-16 08:19:46 +00:00
joncrall
4618371da5 Integrate xdoctest - Rebased (#82797)
This is a new version of #15648 based on the latest master branch.

Unlike the previous PR where I fixed a lot of the doctests in addition to integrating xdoctest, I'm going to reduce the scope here. I'm simply going to integrate xdoctest, and then I'm going to mark all of the failing tests as "SKIP". This will let xdoctest run on the dashboards, provide some value, and still let the dashboards pass. I'll leave fixing the doctests themselves to another PR.

In my initial commit, I do the bare minimum to get something running with failing dashboards. The few tests that I marked as skip are causing segfaults. Running xdoctest results in 293 failed, 201 passed tests. The next commits will be to disable those tests. (unfortunately I don't have a tool that will insert the `#xdoctest: +SKIP` directive over every failing test, so I'm going to do this mostly manually.)

Fixes https://github.com/pytorch/pytorch/issues/71105

@ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82797
Approved by: https://github.com/ezyang
2022-08-12 02:08:01 +00:00
Federico Pozzi
f8a10a7f79 feat: add PolynomialLR scheduler (#82769)
### Description
<!-- What did you change and why was it needed? -->

Add PolynomialLR scheduler.

### Issue
Closes #79511.

### Testing
I added tests for PolynomialLR.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/82769
Approved by: https://github.com/datumbox
2022-08-10 18:21:00 +00:00
Rob Zinkov
c54d18dbc7 Handle complex optimization in Adamax by treating complex numbers as 2D real numbers (#80319)
This commit partially addresses #65711
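The idea of treating complex numbers as 2D real numbers can be illustrated with a plain-Python scalar update (a toy sketch, not the Adamax math):

```python
# A complex parameter is updated as two independent real numbers: the same
# real-valued rule is applied to the real part and to the imaginary part
# separately, then the parts are reassembled.
z = complex(3.0, -4.0)       # parameter
g = complex(1.0, 2.0)        # gradient
lr = 0.1
z_new = complex(z.real - lr * g.real, z.imag - lr * g.imag)
print(z_new)                 # approximately (2.9-4.2j)
```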

Pull Request resolved: https://github.com/pytorch/pytorch/pull/80319
Approved by: https://github.com/albanD
2022-08-05 21:03:18 +00:00
Rob Zinkov
dcbe9ce2ad Handle complex optimization in AdamW by treating complex numbers as 2D real numbers (#80280)
This commit partially addresses #65711

Pull Request resolved: https://github.com/pytorch/pytorch/pull/80280
Approved by: https://github.com/albanD
2022-08-05 13:47:14 +00:00
Masaki Kozuki
3139722679 [foreach][mta] Inplace maximum and minimum (#82523)
### Description
<!-- What did you change and why was it needed? -->
Implement `torch._foreach_maximum_` and `torch._foreach_minimum_` mainly for `_multi_tensor_adam` and `_multi_tensor_adamw` with `amsgrad=True` to correctly update their `max_exp_avg_sqs`.

### Issue
<!-- Link to Issue ticket or RFP -->
- https://github.com/pytorch/pytorch/issues/78807
- https://github.com/pytorch/pytorch/pull/81894
- https://github.com/pytorch/pytorch/pull/81348
- https://github.com/pytorch/pytorch/pull/81705
- https://github.com/pytorch/pytorch/issues/58833
- https://github.com/pytorch/pytorch/issues/68041

### Testing
<!-- How did you test your change? -->
Updated `test_foreach.py::TestForeach::_minmax_test` to compare the outputs of `_foreach_maximum_` (and `_foreach_minimum_`) against those of `[torch.maximum(a, b) for a, b in zip(tensors1, tensors2)]`

cc @ngimel @albanD @mikaylagawarecki
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82523
Approved by: https://github.com/albanD
2022-08-03 03:40:42 +00:00
ProGamerGov
71d50f4f89 Change docstring type callable to Callable for consistency (#82487)
### Description

Across PyTorch's docstrings, both `callable` and `Callable` are used for variable types. `Callable` should be capitalized, as we are referring to the `Callable` type and not the Python `callable()` function.

### Testing

There shouldn't be any testing required.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/82487
Approved by: https://github.com/albanD
2022-08-01 17:26:09 +00:00
ProGamerGov
357b7d589c Fix docstring inconsistencies: string -> str, boolean -> bool (#82410)
### Description

Throughout the PyTorch docs and codebase, the `string` type in docstrings is referred to by two separate names. This leads to inconsistent docs, like you can see here: https://pytorch.org/docs/stable/generated/torch.nn.Conv3d.html#torch.nn.Conv3d

This PR fixes this issue by ensuring that all mentions of the string type in docstrings, are using the same format that Sphinx generates hyperlinks for.

### Testing
No testing should be required for this change

Pull Request resolved: https://github.com/pytorch/pytorch/pull/82410
Approved by: https://github.com/jbschlosser
2022-07-28 21:29:57 +00:00
Rob Zinkov
f9ef363982 Modifying Adam to support complex numbers as 2d real numbers (#80279)
This commit addresses issues in #65711

Pull Request resolved: https://github.com/pytorch/pytorch/pull/80279
Approved by: https://github.com/albanD
2022-07-27 18:39:40 +00:00
Sudarshan Raghunathan
52aae5aa19 [Sparse Adam] Fix error in loading serialized models due to introduction of new parameter (#82273)
### Description
PR #80336 introduced a new parameter to the SparseAdam optimizer. The new parameter is accessed inside the `step` method of the optimizer. If we deserialize and run an optimizer that was serialized before this change was introduced, it fails at the step that tries to access the missing parameter.

I have added a workaround to set a default value in case the parameter is unavailable in the optimizer.

### Issue
<!-- Link to Issue ticket or RFP -->

### Testing
* Testing on PyTorch CI
* Manual validation against existing serialized models to make sure they continue to work
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82273
Approved by: https://github.com/mehtanirav, https://github.com/albanD
2022-07-27 12:48:38 +00:00
albanD
312ece7f65 fix sgd maximize when momentum is involved (#81859)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81859
Approved by: https://github.com/jbschlosser
2022-07-26 16:48:32 +00:00
Emilio Castillo
49b4f45781 Add initial support for differentiable optimizers (#80938)
Adds the `differentiable` argument, a method for updating parameters in an existing optimizer, and a template for testing the differentiability of multiple optimizers.

This is all based on discussions with @albanD & @jbschlosser
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80938
Approved by: https://github.com/albanD
2022-07-25 13:37:08 +00:00
Rob Zinkov
50c655d5e3 Adding maximize to ASGD (#81875)
Added the maximize flag #68052 to ASGD optimizer and updates the respective tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81875
Approved by: https://github.com/albanD
2022-07-22 17:05:41 +00:00
PyTorch MergeBot
135af0fe30 Revert "Adding maximize to ASGD (#80323)"
This reverts commit 14bd5bd6ee.

Reverted https://github.com/pytorch/pytorch/pull/80323 on behalf of https://github.com/albanD due to Broke rocm test
2022-07-08 13:35:31 +00:00
PyTorch MergeBot
0b8a5ca01b Revert "Adding maximize to rprop (#80335)"
This reverts commit 495aa9bc3a.

Reverted https://github.com/pytorch/pytorch/pull/80335 on behalf of https://github.com/albanD due to Broke rocm and windows test
2022-07-08 13:34:02 +00:00
Rob Zinkov
f24c94d7ae Adding maximize to SparseAdam (#80336)
Added the maximize flag #68052 to SparseAdam optimizer and updates the respective tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80336
Approved by: https://github.com/albanD
2022-07-08 12:17:27 +00:00
Rob Zinkov
495aa9bc3a Adding maximize to rprop (#80335)
Added the maximize flag #68052 to rprop optimizer and updates the respective tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80335
Approved by: https://github.com/albanD
2022-07-08 08:04:38 +00:00
Rob Zinkov
a1fd5b4273 Adding maximize to RMSprop (#80326)
Added the maximize flag #68052 to RMSprop optimizer and updates the respective tests.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/80326
Approved by: https://github.com/albanD
2022-07-08 08:04:26 +00:00
Rob Zinkov
14bd5bd6ee Adding maximize to ASGD (#80323)
Added the maximize flag #68052 to ASGD optimizer and updates the respective tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80323
Approved by: https://github.com/albanD
2022-07-08 08:03:36 +00:00
albanD
9d20af5060 remove overly restrictive checks for cudagraph (#80881)
Finish fixing https://github.com/pytorch/pytorch/issues/80809
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80881
Approved by: https://github.com/jbschlosser
2022-07-06 18:08:49 +00:00
Edward Z. Yang
57f001f35a Don't error if _warned_capturable_if_run_uncaptured not set (#80345)
This can happen if an optimizer was pickled.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80345
Approved by: https://github.com/malfet, https://github.com/albanD
2022-06-29 03:46:22 +00:00
anjali411
bda04e9f5e Add __all__ for torch.optim and torch.nn.modules modules (#80237)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80237
Approved by: https://github.com/albanD
2022-06-24 21:34:10 +00:00
Sergii Dymchenko
de7219e8a7 Use generators with all/any in torch/optim (#78142)
Generator comprehensions with any/all are less verbose and potentially help to save memory/CPU : https://eklitzke.org/generator-comprehensions-and-using-any-and-all-in-python

To make JIT work with this change, I added code to convert GeneratorExp to ListComp. So the whole PR is basically NoOp for JIT, but potentially memory and speed improvement for eager mode.

Also I removed a test from test/jit/test_parametrization.py. The test was bad and had a TODO to actually implement and just tested that UnsupportedNodeError is thrown, and with GeneratorExp support a different error would be thrown.
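The difference is easy to demonstrate (a minimal example, not the actual optim code):

```python
# A list comprehension materializes every element before any/all sees them;
# a generator expression is lazy and lets all() stop at the first False.
calls = []

def positive(p):
    calls.append(p)
    return p > 0

params = [-1.0, 2.0, 3.0]

calls.clear()
all([positive(p) for p in params])   # list: evaluates every element
n_list = len(calls)                  # 3

calls.clear()
all(positive(p) for p in params)     # generator: short-circuits
n_gen = len(calls)                   # 1

print(n_list, n_gen)                 # 3 1
```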
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78142
Approved by: https://github.com/malfet, https://github.com/albanD
2022-06-24 17:23:45 +00:00
albanD
375668cd96 Remove overly restrictive assert in adam (#80222)
This is causing issues if the user has the step on cuda for a good reason.

These assert prevents code that used to run just fine to fail.
Note that this is a pretty bad thing to do for performance though so it is ok to try and push users away from doing it.

For the 1.12.1 milestone: this is not asking for a dot release to fix this (as this is bad practice anyways). But it would be a great thing to add if we do one: it is very low risk and will prevent breakage for users.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80222
Approved by: https://github.com/jbschlosser, https://github.com/ngimel
2022-06-24 17:08:34 +00:00
Antonio Kim
765b6a8fab Fix SequentialLR initialization (#72856)
What was happening is that when we have multiple learning rate schedulers, the order in which they are initialized is not taken into account. This is a problem if they are initialized in sequential order (as one might intuitively do).

Each scheduler calls `step()` on initialization and sets the `lr` in its optimizer's `param_groups`. However, this means that step 0 will be using the `lr` that was set by the very last scheduler (in the case of initializing schedulers sequentially) instead of the first scheduler.

The fix in this PR addresses the above bug by performing a call to the appropriate scheduler on initialization, after decrementing the `last_epoch` values to keep them the same post-step. This ensures that the correct scheduler is the one setting the `lr` values for the optimizer's `param_groups`.
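The bug can be reproduced with toy stand-ins (illustrative classes only; the real `_LRScheduler` machinery is more involved):

```python
class ToyOptimizer:
    def __init__(self):
        self.lr = None

class ToyScheduler:
    # mimics _LRScheduler: __init__ performs an initial step(), writing
    # its lr into the optimizer
    def __init__(self, opt, lr):
        self.opt, self.lr = opt, lr
        self.step()

    def step(self):
        self.opt.lr = self.lr

opt = ToyOptimizer()
warmup = ToyScheduler(opt, 0.01)   # intended to control step 0
decay = ToyScheduler(opt, 0.001)   # constructed later, overwrites the lr
print(opt.lr)                      # 0.001 -- the last-constructed scheduler wins
```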
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72856
Approved by: https://github.com/jbschlosser
2022-06-21 20:21:13 +00:00
Janosh Riebesell
660d9ddef4 Fix SWALR doc string (#79836)
In `torch/optim/swa_utils.py`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79836
Approved by: https://github.com/albanD
2022-06-20 12:57:07 +00:00
Sebastian Brodehl
fb9d8de379 Make LR scheduler stub complete, including OneCycleLR and class attributes. (#59476)
This PR completes the stub file for lr scheduler and includes a previously missing scheduler, namely `OneCycleLR, and adds additional class attributes and methods for all lr scheduler.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59476
Approved by: https://github.com/jbschlosser
2022-06-17 16:39:13 +00:00
Madhushan B
9acbaaaf05 Fix typo in ChainedScheduler docstring (#79775)
### Goal
Fixes https://github.com/pytorch/pytorch/issues/79720

### Approach
In the `ChainedScheduler` docstring, change "…performs consecutive step() functions **belong** to them by just one call." to "…performs consecutive step() functions **belonging** to them by just one call."

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79775
Approved by: https://github.com/albanD
2022-06-17 14:18:42 +00:00
Michael Carilli
ba27ee9e8f [CUDA graphs] Allows Adam and AdamW to be capture-safe (#77862)
Near term fix for https://github.com/pytorch/pytorch/issues/76368.

Q. Why does the user need to request `capturable=True` in the optimizer constructor? Why can't capture safety be completely automatic?
A. We need to set up capture-safe (device-side) state variables before capture. If we don't, and step() internally detects capture is underway, it's too late: the best we could do is create a device state variable and copy the current CPU value into it, which is not something we want baked into the graph.

Q. Ok, why not just do the capture-safe approach with device-side state variables all the time?
A. It incurs several more kernel launches per parameter, which could really add up and regress cpu overhead for ungraphed step()s. If the optimizer won't be captured, we should allow step() to stick with its current cpu-side state handling.

Q. But cuda RNG is a stateful thing that maintains its state on the cpu outside of capture and replay, and we capture it automatically. Why can't we do the same thing here?
A. The graph object can handle RNG generator increments because its capture_begin, capture_end, and replay() methods can see and access generator object. But the graph object has no explicit knowledge of or access to optimizer steps in its capture scope. We could let the user tell the graph object what optimizers will be stepped in its scope, ie something like
```python
graph.will_use_optimizer(opt)
graph.capture_begin()
...
```
but that seems clunkier than an optimizer constructor arg.

I'm open to other ideas, but right now I think constructor arg is necessary and the least bad approach.

Long term, https://github.com/pytorch/pytorch/issues/71274 is a better fix.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77862
Approved by: https://github.com/ezyang
2022-06-13 01:56:47 +00:00
Rob Zinkov
2a496e2f80 Adding maximize to Adamax (#77409)
Added the maximize flag #68052 to Adamax optimizer and updates the respective tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77409
Approved by: https://github.com/albanD
2022-05-16 17:34:44 +00:00
James Reed
57b54dfec5 Fix Optimizer.zero_grad type annotation (#76998)
`Optimizer.zero_grad()` defines the `set_to_none` argument as `bool`, not `Optional[bool]`

Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76998
Approved by: https://github.com/albanD
2022-05-11 00:05:26 +00:00
tomMoral
ff94c9dee4 DOC fix momentum equation for nesterov
Fix https://github.com/pytorch/pytorch/issues/72395

This is a small fix in the doc for an index in this equation:

![image](https://user-images.githubusercontent.com/3321081/166165461-140855b5-96b5-4417-85fc-2a170f95700a.png)

I think the index should not be `t-1` but `t`. This is consistent with [the implementation](https://github.com/pytorch/pytorch/blob/master/torch/optim/sgd.py#L236) and with what is done, for instance, in [keras](https://www.tensorflow.org/api_docs/python/tf/keras/optimizers/SGD).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76639
Approved by: https://github.com/albanD
2022-05-04 20:40:21 +00:00
Emilio Castillo
e5ee6f5cf7 Fix CosineAnnealingLR on restart
Fixes #60265

The initial LR for this scheduler is not consistent when a new instance is created with `last_epoch != -1`

Maybe we can refactor the testing code to test `last_epoch != -1` in schedulers that can recreate their state from the current epoch?

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60339
Approved by: https://github.com/albanD
2022-04-20 13:35:01 +00:00
Rob Zinkov
6642e88ad2 Adding maximize flag to Adagrad
This adds maximize to Adagrad (#68052) along with updates the respective tests.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75968
Approved by: https://github.com/albanD
2022-04-20 08:29:03 +00:00
Jake Tae
3b18bc36f3 Docs: Add missing zero-ing step in Rprop algorithm
Fixes #70418.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75555
Approved by: https://github.com/albanD
2022-04-11 21:57:13 +00:00
francescocastelli
58a44523c1 Add maximize flag to Adadelta
Added the maximize flag to Adadelta optimizer (#68052) and adjusted tests to take maximize into account.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75330
Approved by: https://github.com/cpuhrsch
2022-04-08 20:32:35 +00:00
Mikayla Gawarecki
10bb0ffe69 Fix casting bug in state_step for optimizers when loading state dict
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75214

Approved by: https://github.com/albanD
2022-04-05 01:27:18 +00:00
Jan Zikes
715a0dc5c0 [PyTorch/d2go] fix optim _multi_tensor (#73215)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73215

Fixing an issue in optimizers from _multi_tensor, for `sgd_mt` introduced in 2cb03e926f

Reviewed By: mikaylagawarecki

Differential Revision: D34389034

fbshipit-source-id: ede153d52dca15909c6c022853589707f18dc8d1
(cherry picked from commit cc8a58e584)
2022-02-23 10:29:48 +00:00
Sergii Dymchenko
313557a613 Add missing import (#72840)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/72840

Reviewed By: H-Huang

Differential Revision: D34242612

Pulled By: albanD

fbshipit-source-id: 3dd34de96dbf1ae8f3c3ea45888d211d95862c49
(cherry picked from commit d2650ffa75)
2022-02-15 19:43:54 +00:00
Mikayla Gawarecki
2a5aaf1c49 Optim foreach cleanup for AdamW (#70484)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70484

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D33767869

Pulled By: mikaylagawarecki

fbshipit-source-id: 2f5273bbfeea3ed502c5d77da4bebe1674243e86
(cherry picked from commit 2dd9b77917)
2022-02-15 18:02:08 +00:00
Mikayla Gawarecki
dff58d519f Optim foreach cleanup for Rprop (#70483)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70483

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D33767866

Pulled By: mikaylagawarecki

fbshipit-source-id: ffc5ae68eeea8fa09385862b853b731554b77bcb
(cherry picked from commit 3a0fe29580)
2022-02-15 18:02:08 +00:00
Mikayla Gawarecki
ce3094f5f6 Optim foreach cleanup for Rmsprop (#70482)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70482

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D33767862

Pulled By: mikaylagawarecki

fbshipit-source-id: 8e2e9c986d5a3774093a79755940372945f1b3a9
(cherry picked from commit baea537277)
2022-02-15 18:02:08 +00:00
Mikayla Gawarecki
2cb03e926f Optim foreach cleanup for SGD (#70481)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70481

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D33767868

Pulled By: mikaylagawarecki

fbshipit-source-id: 89b9227a4ddf99602855973cbc343c58ae3d5328
(cherry picked from commit ffea8ddcfd)
2022-02-15 18:02:08 +00:00
Mikayla Gawarecki
5f9590681d Optim foreach cleanup for Adam (#70295)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70295

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D33767870

Pulled By: mikaylagawarecki

fbshipit-source-id: f922f15ecb0307458c8ecee737325c42c4f3ce8b
(cherry picked from commit 66233a8a3e)
2022-02-15 18:02:08 +00:00
Mikayla Gawarecki
0972db5b7d Optim foreach cleanup for ASGD (#70231)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70231

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D33767867

Pulled By: mikaylagawarecki

fbshipit-source-id: 4406824acbb6f427d52c1ced2d8a02a98c943b86
(cherry picked from commit cbd9a4da15)
2022-02-09 16:52:13 +00:00
Mikayla Gawarecki
5948522e9c Optim foreach cleanup for RAdam (#70230)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70230

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D33767874

Pulled By: mikaylagawarecki

fbshipit-source-id: 9379db24266a7bbcc2c23849f87ae0af2e6729c0
(cherry picked from commit ecf7b31fc3)
2022-02-09 16:52:13 +00:00
Mikayla Gawarecki
3653f07c7c Optim foreach cleanup for NAdam (#70229)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70229

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D33767873

Pulled By: mikaylagawarecki

fbshipit-source-id: 833ead14c1d1659351ebfbeb41045a3c7eb96dad
(cherry picked from commit 9415df6b5c)
2022-02-09 16:52:13 +00:00
Mikayla Gawarecki
d9acfef831 Optim foreach cleanup for Adamax (#69982)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69982

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D33767865

Pulled By: mikaylagawarecki

fbshipit-source-id: c5efd351e359825d38b71f57a2c61a2055c3c114
(cherry picked from commit 37bb80c2d7)
2022-02-09 16:52:13 +00:00
Mikayla Gawarecki
dabfea8363 Optim foreach cleanup for Adagrad (#69981)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69981

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D33767863

Pulled By: mikaylagawarecki

fbshipit-source-id: 1c99abe4ac4eb2a9eb896dff4837b539b94f68e7
(cherry picked from commit 61c28d0645)
2022-02-09 16:52:12 +00:00
Mikayla Gawarecki
8e8d170674 Optim foreach cleanup for Adadelta (#69980)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69980

- Merged `torch/optim/adadelta.py` and `torch/optim/_multi_tensor/adadelta.py` into `torch/optim/adadelta.py`
- Moved the adadelta functional forms from `torch/optim/_functional.py` and `torch/optim/_multi_tensor/_functional.py` to `torch/optim/adadelta.py`
- `torch/optim/_functional.py` just imports from `torch/optim/adadelta.py`
- Added a test `test_optimizers_foreach_flag` which replicates `test_multi_tensor_optimizers` in `test/test_optim.py`
- Added a test `test_adadelta_new` that replicates the behavior of `test_adadelta` but with the `foreach` flag instead of using the multi-tensor adadelta class. If we delete `_multi_tensor/` we could replace `test_adadelta` with this

Remaining TODO:

- [ ] single-tensor adadelta supports complex but multi-tensor does not; we need to integrate the single-tensor logic into the multi-tensor path and switch `test_adadelta_complex` to test `foreach` in [True, False]

Test Plan: Imported from OSS

Reviewed By: VitalyFedyunin, albanD

Differential Revision: D33413059

Pulled By: mikaylagawarecki

fbshipit-source-id: 92a9fa98705762bb6bd464261671e49aef40070e
(cherry picked from commit a008227d22)
2022-02-09 16:52:12 +00:00
Mikayla Gawarecki
8bb1d06702 [optim] ASGD fold state updates into functional and pass list of vars rather than states (#71335)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/71335

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D33767871

Pulled By: mikaylagawarecki

fbshipit-source-id: 84ebe1fafb1c27572f08c8c8026c882dd7e054c1
(cherry picked from commit 7613ebb391)
2022-02-08 23:58:41 +00:00
Mikayla Gawarecki
ccc1a01dcb [optim] NAdam fold state updates into functional (#71334)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/71334

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D33767864

Pulled By: mikaylagawarecki

fbshipit-source-id: 4d985e9e346f40110bd4231e0f16e5643fbc448d
(cherry picked from commit 58aa77e367)
2022-02-08 23:58:41 +00:00
Mikayla Gawarecki
7176c92687 [optim] update step in functional and pass state_steps instead of state (#71333)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71333

Updated
- Adagrad
- Adamax
- Adam
- AdamW
- RAdam
Make the multi-tensor functionals take `state_steps: List[Tensor]` instead of `states: List[Dict]`.
Change `state_steps: List[int]` to `state_steps: List[Tensor]`, where each entry is a singleton tensor so the step can be updated within the functional.

(NAdam and ASGD) were updated in separate diffs to fold their handling of state into the functionals
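The singleton-tensor trick described above can be sketched as follows (a hedged, standalone sketch with a hypothetical `bump_steps` helper, not the actual functional implementation):

```python
import torch

# Sketch (not the real optimizer code) of why state_steps become
# singleton tensors: the functional can increment each step in place,
# and the optimizer state that owns the tensor sees the update.
state_steps = [torch.tensor(0.0), torch.tensor(4.0)]

def bump_steps(state_steps):
    # In the multi-tensor path this could be a fused
    # torch._foreach_add_(state_steps, 1) instead of a Python loop.
    for step_t in state_steps:
        step_t.add_(1)

bump_steps(state_steps)
print([int(s) for s in state_steps])  # [1, 5]
```

With plain `List[int]` the functional would receive copies and the caller's step counters would never advance; the in-place tensor update avoids that.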

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D33767872

Pulled By: mikaylagawarecki

fbshipit-source-id: 9baa7cafb6375eab839917df9287c65a437891f2
(cherry picked from commit 831c02b3d0)
2022-02-08 16:51:19 +00:00
Prabhat Roy
942a084c46 Remove state_dict from AveragedModel and use buffers instead (#71763)
Summary:
Fixes [https://github.com/pytorch/pytorch/issues/66686](https://github.com/pytorch/pytorch/issues/66686)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/71763

Reviewed By: anjali411

Differential Revision: D33770907

Pulled By: prabhat00155

fbshipit-source-id: ee32f2cb8475c9add4e1a9a5d3d784ef95825efc
(cherry picked from commit a15898b072)
2022-01-26 13:31:30 +00:00
Alban Desmaison
e1b84e1b6b fix loading of older models that don't have maximize (#71023)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/71023

Reviewed By: jbschlosser

Differential Revision: D33483687

Pulled By: albanD

fbshipit-source-id: 2f3c6e97a9579be9ba15eca0756fc1f2c466fbb6
2022-01-10 06:01:24 -08:00
Jake Tae
dd1121435b SequentialLR update _last_lr on step (#70558)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/68956.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70558

Reviewed By: dagitses

Differential Revision: D33430213

Pulled By: albanD

fbshipit-source-id: 446f182610de32db224d55b244d76c3076e8080f
2022-01-07 10:36:35 -08:00
Alban Desmaison
c6e727d05b Fix adamw formula doc (#68587)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/68482

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68587

Reviewed By: dagitses, jbschlosser

Differential Revision: D33478646

Pulled By: albanD

fbshipit-source-id: 4e6419829c3faa7449c041e7d467a6dab30fe917
2022-01-07 10:15:16 -08:00
Mikayla Gawarecki
3a21f38a2e Integrate multi_tensor zero_grad into Optimizer base class (#69936)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69936

Currently, the optimizers in `torch/optim/_multi_tensor/` all override the base Optimizer class' implementation of `zero_grad` with the same foreach zero_grad implementation (e.g. [here](https://github.com/pytorch/pytorch/blob/master/torch/optim/_multi_tensor/adadelta.py#L93-L114)). There is a TODO that indicates that this should be refactored to the base class once the foreach ops are in good shape. This PR is intended to address that TODO.
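A minimal sketch of the foreach `zero_grad` idea being folded into the base class (an illustrative assumption, not the exact refactored method):

```python
import torch

# Collect all grads across params, then zero them with one fused
# foreach call instead of a per-tensor Python loop.
params = [torch.ones(3, requires_grad=True) for _ in range(2)]
for p in params:
    p.grad = torch.ones_like(p)

grads = [p.grad for p in params if p.grad is not None]
torch._foreach_zero_(grads)  # one call zeroes every grad

print(all(int(p.grad.abs().sum()) == 0 for p in params))  # True
```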

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D33346748

Pulled By: mikaylagawarecki

fbshipit-source-id: 6573f4776aeac757b6a778894681868191a1b4c7
2022-01-05 15:46:23 -08:00
Adnios
15f14ce0dc fix typo in adam docs (#70387)
Summary:
Fix the typo in [adam docs in master branch](https://pytorch.org/docs/master/generated/torch.optim.Adam.html#torch.optim.Adam)

![image](https://user-images.githubusercontent.com/41060790/147345284-37e180d1-fd06-4a62-9c79-2d17b8aa5cd3.png)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70387

Reviewed By: H-Huang

Differential Revision: D33309283

Pulled By: albanD

fbshipit-source-id: d20c5d8f2498ac64013f71e202a6b50dcc069f2b
2021-12-28 07:35:39 -08:00
Adnios
a9c7d626e1 Add the maximize flag to AdamW (#70146)
Summary:
Related issue: https://github.com/pytorch/pytorch/issues/68052

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70146

Reviewed By: malfet

Differential Revision: D33254561

Pulled By: albanD

fbshipit-source-id: f190c836a4162f936c5953e076747c345df21421
2021-12-23 09:20:29 -08:00
Rohit Gupta
5f3f327a9d update SequentialLR signature (#69817)
Summary:
- ~optimizer isn't required for `SequentialLR` since it's already present in the schedulers. Trying to match the signature of it with `ChainedScheduler`.~
- ~`verbose` isn't really used anywhere so removed it.~

updated missing docs and added a small check

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69817

Reviewed By: ngimel

Differential Revision: D33069589

Pulled By: albanD

fbshipit-source-id: f015105a35a2ca39fe94c70acdfd55cdf5601419
2021-12-16 12:58:00 -08:00
John Muradeli
fdcb78df38 print fix in lr_scheduler (#68338)
Summary:
`{:5d}` fails for `CosineAnnealingWarmRestarts` which has float `epoch`
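A minimal standalone reproduction of the failure, with a float-friendly fallback (the `format_epoch` helper is hypothetical, not the scheduler's actual print code):

```python
# '{:5d}' only accepts integers; CosineAnnealingWarmRestarts can step
# with a fractional epoch, which raises ValueError under the 'd' code.
def format_epoch(epoch):
    try:
        return "Epoch {:5d}".format(epoch)
    except ValueError:
        return "Epoch {:5.4f}".format(epoch)

print(format_epoch(3))    # integer epochs still work
print(format_epoch(3.5))  # float epochs no longer raise
```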

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68338

Reviewed By: jbschlosser

Differential Revision: D33063970

Pulled By: albanD

fbshipit-source-id: 992e987f8d5f6f8f5067924df4671e9725b6d884
2021-12-14 09:05:19 -08:00
oliver
3d358a7678 Adds a maximize flag to Adam (#68164)
Summary:
Solves the next most important use case in https://github.com/pytorch/pytorch/issues/68052.

I have kept the style as close to that in SGD as seemed reasonable, given the slight differences in their internal implementations.

All feedback welcome!

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68164

Reviewed By: VitalyFedyunin

Differential Revision: D32994129

Pulled By: albanD

fbshipit-source-id: 65c57c3f3dbbd3e3e5338d51def54482503e8850
2021-12-13 05:53:53 -08:00
Kurt Mohler
52219b1017 Fix ChainedScheduler.get_last_lr() (#69112)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/68820

cc vincentqb jbschlosser albanD

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69112

Reviewed By: zou3519

Differential Revision: D32796626

Pulled By: albanD

fbshipit-source-id: bde9d4e473527be4c0a7f21cb57f795a67a99eaa
2021-12-02 13:44:12 -08:00
Santiago Castro
263125a962 Fix RAdam docstring on LR default value (#69186)
Summary:
Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69186

Reviewed By: albanD

Differential Revision: D32759614

Pulled By: H-Huang

fbshipit-source-id: b11819c50156a538cd6003e9cddde0390c853f67
2021-12-01 14:32:07 -08:00
Artsiom Sanakoyeu
c0e6dc9ac7 [pytorch] Fix loading from checkpoint after "maximize" flag was introduced in SGD (#68733)
Summary:
After the 'maximize' flag was introduced in https://github.com/pytorch/pytorch/issues/46480, some jobs fail because they resume training from checkpoints.

After we load an old checkpoint, we get an error during the optimizer.step() call in the backward pass (torch/optim/sgd.py, line 129) because there is no key 'maximize' in the parameter groups of the SGD optimizer.

To circumvent this, I add a default value `group.setdefault('maximize', False)` when the optimizer state is restored.
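The backward-compatibility pattern is simple enough to sketch in isolation (a simplified stand-in for the optimizer's state-restore path, not the actual `SGD.__setstate__`):

```python
# When restoring an old checkpoint, param groups saved before the
# 'maximize' flag existed simply lack the key; default it to False
# so optimizer.step() can read it safely.
def restore_param_groups(param_groups):
    for group in param_groups:
        group.setdefault("maximize", False)
    return param_groups

old_groups = [{"lr": 0.1, "momentum": 0.9}]  # pre-'maximize' checkpoint
restored = restore_param_groups(old_groups)
print(restored[0]["maximize"])  # False
```

`setdefault` leaves already-present values untouched, so checkpoints saved after the flag was added are unaffected.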

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68733

Reviewed By: albanD

Differential Revision: D32480963

Pulled By: asanakoy

fbshipit-source-id: 4e367fe955000a6cb95090541c143a7a1de640c2
2021-11-23 11:42:16 -08:00
nhankiet
a2e35e167b refactor: update f-string for swa.utils.py (#68718)
Summary:
- Update some old-style format strings to f-strings for overall consistency.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68718

Reviewed By: jbschlosser

Differential Revision: D32593746

Pulled By: albanD

fbshipit-source-id: fcc17958f8af6a3260beca883bc1065f019dcf0e
2021-11-22 11:23:18 -08:00
oliver
94b6fa6f8b Adds an optimizer instance variable to ChainedScheduler (#68010)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/67601.

As simple a fix as I could make it. I even managed to delete some testing code!

I checked calling `super()` and, as I had feared, it doesn't work out of the box, so perhaps that ought to be revisited later.

As it stands,  https://github.com/pytorch/pytorch/issues/20124, still applies to the chained scheduler, but I think this change is still an improvement.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68010

Reviewed By: zou3519

Differential Revision: D32278139

Pulled By: albanD

fbshipit-source-id: 4c6f9f1b2822affdf63a6d22ddfdbcb1c6afd579
2021-11-10 01:31:47 -08:00
oliver
f8297d40fc Adds a maximize flag to SGD. (#67847)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/46480 -- for SGD.

## Notes:
- I have modified the existing tests to take a new `constructor_accepts_maximize` flag. When this is set to true, the `_test_basic_cases_template` function will test both maximizing and minimizing the sample function.
- This was the clearest way I could think of to test the changes; I would appreciate feedback on this strategy.

## Work to be done:
- [ ] Update the docs.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67847

Reviewed By: H-Huang

Differential Revision: D32252631

Pulled By: albanD

fbshipit-source-id: 27915a3cc2d18b7e4d17bfc2d666fe7d2cfdf9a4
2021-11-09 00:43:07 -08:00
hesom
07a08fb95f Fix typo in LinearLR docs (#67840)
Summary:
The final learning rate should be 0.05, matching the lr passed as the argument to the optimizer, not 0.005.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67840

Reviewed By: jbschlosser

Differential Revision: D32187091

Pulled By: albanD

fbshipit-source-id: 8aff691bba3896a847d7b9d9d669a65f67a6f066
2021-11-05 07:16:15 -07:00
Yiwen Song
6696c59af4 Adding optimizer attribute to SequentialLR (#67406)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/67318 :)

cc albanD, datumbox

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67406

Reviewed By: jbschlosser

Differential Revision: D31997873

Pulled By: albanD

fbshipit-source-id: f579fb886d049a545673fd92ef5892fcf501bcc6
2021-10-28 14:43:40 -07:00
Christopher Gray Howard
dfa7225a38 [Pytorch][Bootcamp] Add fix and testing for non-vectorized Adadelta optimizer to handle complex numbers (#66587)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66587

Made some changes in the step function of the non-vectorized Adadelta optimizer to handle complex numbers as two real numbers, as per #65711 on GitHub.
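The "complex as two real numbers" trick can be sketched as follows (an illustrative example, not the optimizer's actual code):

```python
import torch

# View a complex tensor as (..., 2) real pairs; a plain real-valued
# update applied through the view mutates the complex storage in place.
p = torch.tensor([1.0 + 2.0j])      # a toy "parameter"
grad = torch.tensor([0.5 + 0.5j])   # a toy "gradient"

torch.view_as_real(p).add_(torch.view_as_real(grad), alpha=-0.1)

print(p)  # p is now 0.95 + 1.95j: real and imag parts updated together
```

This lets real-only optimizer math (squared averages, etc.) apply unchanged to complex parameters.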
ghstack-source-id: 141484731

Test Plan:
buck test mode/dev caffe2/test:optim -- 'test_adadelta_complex'

https://pxl.cl/1R7kJ

Reviewed By: albanD

Differential Revision: D31630069

fbshipit-source-id: 2741177b837960538ce39772897af36bbce7b7d8
2021-10-26 17:35:01 -07:00
Christopher Gray Howard
acb340de75 [Pytorch][Bootcamp] Add fixes and vanilla testing for Adagrad non-vectorized and vectorized optimizers to handle complex numbers (#66671)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66671

Made changes in the step functions of the vectorized and non-vectorized Adagrad optimizers to handle complex numbers as two real numbers, as per #65711 on GitHub.
ghstack-source-id: 141442350

Test Plan:
buck test mode/dev caffe2/test:optim -- 'test_adagrad_complex'
https://pxl.cl/1Rd44

Reviewed By: albanD

Differential Revision: D31673503

fbshipit-source-id: 90a0d0c69b556716e2d17c59ce80f09c750fc464
2021-10-25 10:13:21 -07:00
Prabhat Roy
c7748fc172 Added validation of mode parameter in AveragedModel (#65921)
Summary:
Discussion: https://github.com/pytorch/pytorch/pull/65495#issuecomment-930460469

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65921

Reviewed By: albanD

Differential Revision: D31310105

Pulled By: prabhat00155

fbshipit-source-id: 417691832a7c793744830c11e0ce53e3972d21a3
2021-10-03 08:42:28 -07:00
Prabhat Roy
2ea724b1fd Added option to update parameters using state_dict in AveragedModel (#65495)
Summary:
While implementing [EMA](https://github.com/pytorch/vision/pull/4381) (which extends AveragedModel) in torchvision, update_parameters() from AveragedModel could not be used as it did not handle state_dict(), so a custom update_parameters() needed to be defined in the [EMA class](https://github.com/pytorch/vision/pull/4406). This PR handles that scenario, removing the need for the custom update_parameters() implementation.

Discussion: https://github.com/pytorch/vision/pull/4406#pullrequestreview-753734102

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65495

Reviewed By: datumbox

Differential Revision: D31176742

Pulled By: prabhat00155

fbshipit-source-id: 326d14876018f21cf602bab5eaba344678dbabe2
2021-09-28 03:34:49 -07:00
Balaji
32f0387ee8 Bug in CosineAnnealingWarmRestarts in optim/lr_scheduler.py (#64758)
Summary:
## 🐛 Bug
'CosineAnnealingWarmRestarts' object has no attribute 'T_cur'.
In the constructor of CosineAnnealingWarmRestarts, we call the constructor of the parent class (_LRScheduler), which in turn calls the step method of CosineAnnealingWarmRestarts.
The called method tries to update the object's attribute 'T_cur', which is not defined yet, so it raises the error.
This only happens when the last_epoch argument is given as 0 or greater than 0 while initializing the object.

![Bug_in_CosineAnnealingWarmRestarts](https://user-images.githubusercontent.com/77477328/132552212-70abc8b5-0357-4c35-90a9-832648bac607.png)
## To Reproduce

Steps to reproduce the behavior:

1. Give the value for the last_epoch argument as zero, OR
2. Give the value for the last_epoch argument as a positive integer.

## Expected behavior

I only expected the 'CosineAnnealingWarmRestarts' object to be initialized.

## Environment

PyTorch version: 1.9.0+cpu
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A
OS: Ubuntu 20.04.2 LTS (x86_64)
GCC version: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Clang version: Could not collect
CMake version: version 3.21.2
Libc version: glibc-2.31
Python version: 3.8.10  [GCC 9.4.0] (64-bit runtime)
Python platform: Linux-5.8.0-59-generic-x86_64-with-glibc2.29
Is CUDA available: False
CUDA runtime version: No CUDA

## Additional context
We can solve this bug by moving the line `self.T_cur = self.last_epoch` above the `super(CosineAnnealingWarmRestarts, self).__init__()` call, so that `self.T_cur` is initialized before the parent constructor invokes `step()`.
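The attribute-ordering issue can be illustrated with a pure-Python sketch (simplified stand-ins for `_LRScheduler` and the scheduler, not the actual classes):

```python
class BaseScheduler:                  # stands in for _LRScheduler
    def __init__(self, last_epoch=0):
        self.last_epoch = last_epoch
        self.step()                   # the parent ctor calls step() immediately

class WarmRestarts(BaseScheduler):    # stands in for CosineAnnealingWarmRestarts
    def __init__(self, last_epoch=0):
        self.T_cur = last_epoch       # must be set BEFORE super().__init__ ...
        super().__init__(last_epoch)  # ... because step() reads self.T_cur

    def step(self):
        self.T_cur += 1               # would raise AttributeError if T_cur were unset

sched = WarmRestarts(last_epoch=1)
print(sched.T_cur)  # 2
```

Swapping the two lines in `WarmRestarts.__init__` reproduces the reported AttributeError.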

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64758

Reviewed By: ezyang

Differential Revision: D31113694

Pulled By: jbschlosser

fbshipit-source-id: 98c0e292291775895dc3566fda011f2d6696f721
2021-09-22 16:55:14 -07:00
Ilqar Ramazanli
df3d649380 To add state dict and load_dict for Chained Scheduler (#65034)
Summary:
Adding state_dict() and load_state_dict() methods for Chained Scheduler

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65034

Reviewed By: prabhat00155, nateanl

Differential Revision: D30958207

Pulled By: datumbox

fbshipit-source-id: 1a587a330d34e0548e891a39f8fb5a3d251b71fa
2021-09-15 13:11:41 -07:00
Ilqar Ramazanli
211ad231dc To add state_dict and load_state_dict to SequentialLR (#65035)
Summary:
To add state_dict() and load_state_dict() methods to SequentialLR

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65035

Reviewed By: prabhat00155, nateanl

Differential Revision: D30958204

Pulled By: datumbox

fbshipit-source-id: 65114e1b07146526ae2680233f5cd42b2534d67a
2021-09-15 12:01:51 -07:00
Ilqar Ramazanli
dafa0a5a3b [doc][hackathon] To add Adadelta Optimizer to the documentation (#63255)
Summary:
It has been discussed before that adding descriptions of optimization algorithms to the PyTorch core documentation may result in a nice optimization research tutorial. In the following tracking issue we listed all the necessary algorithms and links to the originally published papers: https://github.com/pytorch/pytorch/issues/63236.

In this PR we add a description of the AdaDelta algorithm to the documentation. For more details, see the paper: https://arxiv.org/abs/1212.5701

<img width="654" alt="AdaDeltaalg" src="https://user-images.githubusercontent.com/73658284/132770544-82ccf90a-1d54-4ad5-8fc4-51c8dec63a12.png">

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63255

Reviewed By: ngimel

Differential Revision: D30867589

Pulled By: iramazanli

fbshipit-source-id: 5ba602c20c724a4486bdd38b73e1b64c0e767bdc
2021-09-10 16:49:12 -07:00
Ilqar Ramazanli
54b72a99ef To add Rprop documentation (#63866)
Summary:
It has been discussed before that adding descriptions of optimization algorithms to the PyTorch core documentation may result in a nice optimization research tutorial. In the following tracking issue we listed all the necessary algorithms and links to the originally published papers: https://github.com/pytorch/pytorch/issues/63236.

In this PR we add a description of Rprop to the documentation. For more details, see the paper: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.21.1417

<img width="657" alt="Rpropalg" src="https://user-images.githubusercontent.com/73658284/132750009-a5ec059e-6d53-4c67-917b-57174c8ca27b.png">

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63866

Reviewed By: ngimel

Differential Revision: D30867590

Pulled By: iramazanli

fbshipit-source-id: 0d2d4ffc6c4d939290bbbaa84d2c6e901ed8b54a
2021-09-10 09:49:10 -07:00
Ilqar Ramazanli
d4b09dbab3 [doc][hackathon] To add Adagrad Optimizer to the documentation (#63254)
Summary:
It has been discussed before that adding descriptions of optimization algorithms to the PyTorch core documentation may result in a nice optimization research tutorial. In the following tracking issue we listed all the necessary algorithms and links to the originally published papers: https://github.com/pytorch/pytorch/issues/63236.

In this PR we add a description of Adagrad to the documentation. For more details, see the paper: http://jmlr.org/papers/v12/duchi11a.html

<img width="658" alt="AdaGradAlgo" src="https://user-images.githubusercontent.com/73658284/132743276-a52ea3fb-70a5-4788-94b7-f99367907a26.png">

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63254

Reviewed By: albanD

Differential Revision: D30852139

Pulled By: iramazanli

fbshipit-source-id: 9e496560a97e92be8386585b01d9bd3bba4b0c66
2021-09-09 15:41:29 -07:00
Ilqar Ramazanli
2b41bf40c5 To add SequentialLR to PyTorch Core Schedulers (#64037)
Summary:
Partially resolves https://github.com/pytorch/vision/issues/4281

In this PR we propose a new scheduler, SequentialLR, which enables a list of different schedulers to be called in different periods of the training process.

The main motivation for this scheduler is the recently gained popularity of a warm-up phase at training time. It has been shown that taking small steps in the initial stages of training can help the convergence procedure go faster.

With the help of SequentialLR, we mainly enable calling a small constant (or linearly increasing) learning rate scheduler, followed by the actual target learning rate scheduler.

```PyThon
scheduler1 = ConstantLR(optimizer, factor=0.1, total_iters=2)
scheduler2 = ExponentialLR(optimizer, gamma=0.9)
scheduler = SequentialLR(optimizer, schedulers=[scheduler1, scheduler2], milestones=[5])

for epoch in range(100):
    train(...)
    validate(...)
    scheduler.step()
```

This code snippet will call `ConstantLR` for the first 5 epochs and follow up with `ExponentialLR` in the subsequent epochs.

This scheduler can be used to chain any group of schedulers one after another. The main consideration to keep in mind is that every time we switch to a new scheduler, that scheduler starts from the beginning, i.e. the zeroth epoch.

We also add Chained Scheduler to `optim.rst` and `lr_scheduler.pyi` files here.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64037

Reviewed By: albanD

Differential Revision: D30841099

Pulled By: iramazanli

fbshipit-source-id: 94f7d352066ee108eef8cda5f0dcb07f4d371751
2021-09-09 09:36:32 -07:00
Ilqar Ramazanli
239366c9c2 To add Rectified Adam Description to Documentation (#63772)
Summary:
It has been discussed before that adding descriptions of optimization algorithms to the PyTorch core documentation may result in a nice optimization research tutorial. In the following tracking issue we listed all the necessary algorithms and links to the originally published papers: https://github.com/pytorch/pytorch/issues/63236.

In this PR we add a description of the Rectified Adam algorithm to the documentation. For more details, see the paper: https://arxiv.org/abs/1908.03265

<img width="446" alt="RadamAlgo" src="https://user-images.githubusercontent.com/73658284/132587815-4764b642-df53-4e41-975f-72e0f40fdc48.png">

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63772

Reviewed By: datumbox

Differential Revision: D30839694

Pulled By: iramazanli

fbshipit-source-id: 6f5629ce56e10c66a451433334b587b99eda1610
2021-09-09 07:10:36 -07:00
Ilqar Ramazanli
5b21f172a4 [doc][hackathon] To add AdamW Optimizer to the documentation (#63252)
Summary:
It has been discussed before that adding descriptions of optimization algorithms to the PyTorch core documentation may result in a nice optimization research tutorial. In the following tracking issue we listed all the necessary algorithms and links to the originally published papers: https://github.com/pytorch/pytorch/issues/63236.

In this PR we add a description of the AdamW algorithm to the documentation. For more details, see the paper: https://arxiv.org/abs/1711.05101

<img width="442" alt="AdamWalgo" src="https://user-images.githubusercontent.com/73658284/132589957-6d381e96-cb62-40d0-990f-82a32ec455be.png">

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63252

Reviewed By: datumbox

Differential Revision: D30839685

Pulled By: iramazanli

fbshipit-source-id: 1a426c874ab86408d286a34f41aefcf5b21167c0
2021-09-09 07:05:31 -07:00
Ilqar Ramazanli
39ce801d1f To add Adamax algorithm to documentation (#63903)
Summary:
It has been discussed before that adding descriptions of optimization algorithms to the PyTorch core documentation may result in a nice optimization research tutorial. In the following tracking issue we listed all the necessary algorithms and links to the originally published papers: https://github.com/pytorch/pytorch/issues/63236.

In this PR we add a description of the Adamax algorithm to the documentation. For more details, see the paper: https://arxiv.org/abs/1412.6980

<img width="447" alt="Adamx" src="https://user-images.githubusercontent.com/73658284/132577306-878ce64c-627a-4086-808c-d0482868d4a1.png">

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63903

Reviewed By: albanD

Differential Revision: D30819055

Pulled By: iramazanli

fbshipit-source-id: 37f748cbea9f93bf37193ee30fc295fb1a1e9ffd
2021-09-09 06:42:33 -07:00
Ilqar Ramazanli
149f1114fe To add Stochastic Gradient Descent to Documentation (#63805)
Summary:
It has been discussed before that adding descriptions of optimization algorithms to the PyTorch core documentation may result in a nice optimization research tutorial. In the following tracking issue we listed all the necessary algorithms and links to the originally published papers: https://github.com/pytorch/pytorch/issues/63236.

In this PR we add a description of Stochastic Gradient Descent to the documentation.

<img width="466" alt="SGDalgo" src="https://user-images.githubusercontent.com/73658284/132585881-b351a6d4-ece0-4825-b9c0-126d7303ed53.png">

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63805

Reviewed By: albanD

Differential Revision: D30818947

Pulled By: iramazanli

fbshipit-source-id: 3812028e322c8a64f4343552b0c8c4582ea382f3
2021-09-08 15:22:30 -07:00
Ilqar Ramazanli
43248d9112 [doc][hackathon] To add Adam Optimizer to the documentation (#63251)
Summary:
It has been discussed before that adding descriptions of optimization algorithms to the PyTorch core documentation may result in a nice optimization research tutorial. In the following tracking issue we listed all the necessary algorithms and links to the originally published papers: https://github.com/pytorch/pytorch/issues/63236.

In this PR we add a description of the Adam algorithm to the documentation. For more details, see the paper: https://arxiv.org/abs/1412.6980

<img width="442" alt="Screen Shot 2021-08-27 at 6 37 54 PM" src="https://user-images.githubusercontent.com/73658284/131195297-35fce613-3691-4fed-b42d-db234d4fcd7c.png">

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63251

Reviewed By: albanD

Differential Revision: D30779163

Pulled By: iramazanli

fbshipit-source-id: 319a80fc3952793b0d064d0e641ddc1de3c05a86
2021-09-07 11:03:35 -07:00
Ilqar Ramazanli
f767cf6683 To change WarmUp Scheduler with ConstantLR and LinearLR (#64395)
Summary:
Partially unblocks https://github.com/pytorch/vision/issues/4281

Previously we added WarmUp schedulers to PyTorch core in PR https://github.com/pytorch/pytorch/pull/60836, which had two modes of execution, linear and constant, depending on the warm-up function.

In this PR we change this interface to a more direct form, separating the linear and constant modes into separate schedulers. In particular,

```Python
scheduler1 = WarmUpLR(optimizer, warmup_factor=0.1, warmup_iters=5, warmup_method="constant")
scheduler2 = WarmUpLR(optimizer, warmup_factor=0.1, warmup_iters=5, warmup_method="linear")
```

will look like

```Python
scheduler1 = ConstantLR(optimizer, warmup_factor=0.1, warmup_iters=5)
scheduler2 = LinearLR(optimizer, warmup_factor=0.1, warmup_iters=5)
```

respectively.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64395

Reviewed By: datumbox

Differential Revision: D30753688

Pulled By: iramazanli

fbshipit-source-id: e47f86d12033f80982ddf1faf5b46873adb4f324
2021-09-07 08:42:31 -07:00
Tugsbayasgalan (Tugsuu) Manlaibaatar
52d7dd7398 [DOC] improve docstring for Optimizer.state_dict (#63153)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63153

Fixes: https://github.com/pytorch/pytorch/issues/60121

Test Plan: Imported from OSS

Reviewed By: pbelevich

Differential Revision: D30629462

Pulled By: tugsbayasgalan

fbshipit-source-id: a9160e02ac53bb1a6219879747d73aae9ebe4d2f
2021-08-29 10:20:58 -07:00
Ilqar Ramazanli
aefa2f3e64 To add RMSProp algorithm documentation (#63721)
Summary:
It has been discussed before that adding descriptions of optimization algorithms to the PyTorch core documentation may result in a nice optimization research tutorial. In the following tracking issue we listed all the necessary algorithms and links to the originally published papers: https://github.com/pytorch/pytorch/issues/63236.

In this PR we add a description of RMSProp to the documentation. For more details, see the lecture slides: https://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf

<img width="464" alt="RMSProp" src="https://user-images.githubusercontent.com/73658284/131179226-3fb6fe5a-5301-4948-afbe-f38bf57f24ff.png">

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63721

Reviewed By: albanD

Differential Revision: D30612426

Pulled By: iramazanli

fbshipit-source-id: c3ac630a9658d1282866b53c86023ac10cf95398
2021-08-28 15:55:56 -07:00