pytorch/radam.pyi at 3058700f7f91170a7f34ea2dd1fa0ae32cc901b4 - pytorch - Carlos Sousa's Git

OSSForks/pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 00:21:07 +01:00

bilzard 18a58f0bd6 Implement "RAdamW" optimizer (#107507)

Fixes #107282

## Overview

- This PR follows the basic design decisions made in #103881 (tensor operations, test cases, order and position of arguments, etc.).
- The algorithm for decoupled weight decay follows [1, 2].
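
To make the distinction concrete, here is a minimal sketch of how decoupled weight decay differs from the usual L2-style (coupled) decay, following the idea in [1]. It works on a single scalar parameter in plain Python. The helper name `apply_weight_decay` is hypothetical and is not part of this PR or of `torch.optim`.

```python
def apply_weight_decay(param, grad, lr, weight_decay, decoupled):
    """Apply the weight decay step to one scalar parameter.

    Hypothetical helper for illustration; returns the (param, grad) pair
    that the rest of the optimizer update would then consume.
    """
    if decoupled:
        # Decoupled (AdamW-style): shrink the parameter directly.
        # The gradient, and therefore the adaptive moments, stay untouched.
        param = param * (1.0 - lr * weight_decay)
    else:
        # Coupled (L2-style): fold the penalty into the gradient, so it
        # later passes through the adaptive rescaling with the loss gradient.
        grad = grad + weight_decay * param
    return param, grad


# Same inputs, two behaviours.
print(apply_weight_decay(2.0, 0.5, lr=0.1, weight_decay=0.01, decoupled=True))
print(apply_weight_decay(2.0, 0.5, lr=0.1, weight_decay=0.01, decoupled=False))
```

In the decoupled case the decay strength depends only on `lr` and `weight_decay`, not on the adaptive scaling applied to the gradient.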

## Backwards-incompatible changes

- positional argument `decoupled_weight_decay` is added to:
    -  `torch.optim.radam`

Existing code that calls this API, especially with positional arguments, may be affected.

Note: The positional argument `decoupled_weight_decay` is also added to `torch.optim.RAdam`. However, because it is added in the last position and has a default value, existing calls to `torch.optim.RAdam` are not affected.
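
The following sketch illustrates why the position of a new argument matters for existing callers. All three functions are hypothetical stand-ins with made-up signatures; they are not the real `torch.optim` signatures, and the second existing argument (`foreach`) is chosen only for illustration.

```python
# Hypothetical signatures for illustration only; NOT the real torch.optim API.

def step_old(params, foreach=None):
    """Signature before the change."""
    return {"foreach": foreach}


def step_inserted(params, decoupled_weight_decay=False, foreach=None):
    """New argument inserted before an existing positional argument."""
    return {"decoupled_weight_decay": decoupled_weight_decay, "foreach": foreach}


def step_appended(params, foreach=None, decoupled_weight_decay=False):
    """New argument appended in the last position, with a default value."""
    return {"foreach": foreach, "decoupled_weight_decay": decoupled_weight_decay}


# A call written against the old signature, passing `foreach` positionally.
print(step_old([1.0], True))       # True is bound to `foreach`
print(step_inserted([1.0], True))  # True is now bound to `decoupled_weight_decay`
print(step_appended([1.0], True))  # True is still bound to `foreach`
```

Callers that pass every optional argument by keyword are unaffected in either case, which is one reason to prefer keyword arguments when calling optimizer APIs.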

## Reference

- [1] [Decoupled Weight Decay Regularization](https://arxiv.org/abs/1711.05101)
- [2] https://github.com/LiyuanLucasLiu/RAdam/blob/master/radam/radam.py#L5-L94

## TODO

- [x] implement tensor operation
- [x] implement test cases
- [x] modify docstring
- [x] pass unit tests locally: `python test/test_optim.py -k test_radam`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107507
Approved by: https://github.com/janeyx99

2023-08-28 20:50:25 +00:00



from typing import Tuple

from .optimizer import Optimizer, params_t

class RAdam(Optimizer):
    def __init__(
        self,
        params: params_t,
        lr: float = ...,
        betas: Tuple[float, float] = ...,
        eps: float = ...,
        weight_decay: float = ...,
        decoupled_weight_decay: bool = ...,
    ) -> None: ...