pytorch/torch/optim
bilzard 18a58f0bd6 Implement "RAdamW" optimizer (#107507)
Fixes #107282

## Overview

- The basic design decisions follow those made in #103881 (tensor operations, test cases, order & position of arguments, etc.).
- For the decoupled weight decay algorithm, I referred to [1, 2] (see the sketch after this list).
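
A minimal sketch contrasting classic L2 regularization with decoupled weight decay as described in [1]. This is not the actual `torch.optim.radam` implementation; the function and variable names are illustrative.

```python
import torch

def apply_weight_decay(param: torch.Tensor, grad: torch.Tensor,
                       lr: float, weight_decay: float,
                       decoupled_weight_decay: bool) -> torch.Tensor:
    if decoupled_weight_decay:
        # Decoupled (AdamW-style, [1]): shrink the parameter directly,
        # independently of the adaptive gradient-based update.
        param.mul_(1 - lr * weight_decay)
    else:
        # Classic L2 regularization: fold the decay into the gradient,
        # so it is rescaled by the adaptive terms of the optimizer.
        grad = grad.add(param, alpha=weight_decay)
    return grad
```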

## Backwards-incompatible changes

- A new positional argument `decoupled_weight_decay` is added to:
    - `torch.optim.radam`

Existing code that calls these APIs may be affected.

Note: the positional argument `decoupled_weight_decay` is also added to `torch.optim.RAdam`. However, since it is appended in the last position and has a default value, existing callers are not affected.
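
A minimal usage sketch of the new flag. The model and hyperparameter values are illustrative, and it is assumed that the default `decoupled_weight_decay=False` preserves the previous behaviour.

```python
import torch

model = torch.nn.Linear(10, 1)

# Default behaviour (decoupled_weight_decay=False) is unchanged:
opt = torch.optim.RAdam(model.parameters(), lr=1e-3, weight_decay=1e-2)

# Opt in to RAdamW-style decoupled weight decay explicitly:
opt_w = torch.optim.RAdam(
    model.parameters(), lr=1e-3, weight_decay=1e-2, decoupled_weight_decay=True
)
```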

## References

- [1] [Decoupled Weight Decay Regularization](https://arxiv.org/abs/1711.05101)
- [2] [LiyuanLucasLiu/RAdam reference implementation](https://github.com/LiyuanLucasLiu/RAdam/blob/master/radam/radam.py#L5-L94)

## TODO

- [x] implement tensor operations
- [x] implement test cases
- [x] update the docstring
- [x] pass unit tests locally: `python test/test_optim.py -k test_radam`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107507
Approved by: https://github.com/janeyx99
2023-08-28 20:50:25 +00:00
| File | Last commit | Date |
| --- | --- | --- |
| _multi_tensor | | |
| __init__.py | | |
| __init__.pyi | | |
| _functional.py | | |
| adadelta.py | [BE]: Update Ruff to 0.0.280 (#105724) | 2023-07-22 23:03:34 +00:00 |
| adadelta.pyi | Merge and improve torch optim optimizer type stubs (#102593) | 2023-07-26 11:56:42 +00:00 |
| adagrad.py | [BE]: Update Ruff to 0.0.280 (#105724) | 2023-07-22 23:03:34 +00:00 |
| adagrad.pyi | Merge and improve torch optim optimizer type stubs (#102593) | 2023-07-26 11:56:42 +00:00 |
| adam.py | [optim] FusedAdam/W accepts lr: Tensor without h2ds (#106916) | 2023-08-21 23:00:44 +00:00 |
| adam.pyi | [optim] FusedAdam/W accepts lr: Tensor without h2ds (#106916) | 2023-08-21 23:00:44 +00:00 |
| adamax.py | [BE]: Update Ruff to 0.0.280 (#105724) | 2023-07-22 23:03:34 +00:00 |
| adamax.pyi | Merge and improve torch optim optimizer type stubs (#102593) | 2023-07-26 11:56:42 +00:00 |
| adamw.py | [optim] FusedAdam/W accepts lr: Tensor without h2ds (#106916) | 2023-08-21 23:00:44 +00:00 |
| adamw.pyi | [optim] FusedAdam/W accepts lr: Tensor without h2ds (#106916) | 2023-08-21 23:00:44 +00:00 |
| asgd.py | [BE]: Update Ruff to 0.0.280 (#105724) | 2023-07-22 23:03:34 +00:00 |
| asgd.pyi | Merge and improve torch optim optimizer type stubs (#102593) | 2023-07-26 11:56:42 +00:00 |
| lbfgs.py | Correct LBFGS tolerance_grad doc string (#99792) | 2023-04-22 20:19:01 +00:00 |
| lbfgs.pyi | Merge and improve torch optim optimizer type stubs (#102593) | 2023-07-26 11:56:42 +00:00 |
| lr_scheduler.py | [BE]: Update ruff to 0.285 (#107519) | 2023-08-22 23:16:38 +00:00 |
| lr_scheduler.pyi | Fixed type hints for CosineAnnealingWarmRestarts (#102067) | 2023-05-23 19:06:07 +00:00 |
| nadam.py | Fix docs, missed a // in LaTeX for nadam (#107736) | 2023-08-23 21:36:27 +00:00 |
| nadam.pyi | Merge and improve torch optim optimizer type stubs (#102593) | 2023-07-26 11:56:42 +00:00 |
| optimizer.py | Revert "[optim] Make casting to match params a hook (#106725)" | 2023-08-25 13:47:19 +00:00 |
| radam.py | Implement "RAdamW" optimizer (#107507) | 2023-08-28 20:50:25 +00:00 |
| radam.pyi | Implement "RAdamW" optimizer (#107507) | 2023-08-28 20:50:25 +00:00 |
| rmsprop.py | [BE]: Update Ruff to 0.0.280 (#105724) | 2023-07-22 23:03:34 +00:00 |
| rmsprop.pyi | Merge and improve torch optim optimizer type stubs (#102593) | 2023-07-26 11:56:42 +00:00 |
| rprop.py | Add in-place _foreach_copy (#107226) | 2023-08-17 00:11:18 +00:00 |
| rprop.pyi | Merge and improve torch optim optimizer type stubs (#102593) | 2023-07-26 11:56:42 +00:00 |
| sgd.py | Fixes #107737 SGD doc blank line (#107738) | 2023-08-25 19:48:30 +00:00 |
| sgd.pyi | Merge and improve torch optim optimizer type stubs (#102593) | 2023-07-26 11:56:42 +00:00 |
| sparse_adam.py | [BE]: Update Ruff to 0.0.280 (#105724) | 2023-07-22 23:03:34 +00:00 |
| sparse_adam.pyi | Merge and improve torch optim optimizer type stubs (#102593) | 2023-07-26 11:56:42 +00:00 |
| swa_utils.py | use reset_running_stats in swa_utils.update_bn (#103801) | 2023-06-23 01:17:13 +00:00 |
| swa_utils.pyi | | |