Mirror of https://github.com/zebrajr/pytorch.git, synced 2025-12-07 12:21:27 +01:00
Summary: Fixes https://github.com/pytorch/pytorch/issues/24892

In the paper https://arxiv.org/pdf/1908.03265.pdf, Liyuan Liu et al. propose a new optimization algorithm (RAdam) that is similar in spirit to the Adam algorithm. The paper discusses how, without a warmup heuristic, adaptive learning-rate algorithms can exhibit undesirably large variance in their early stages, which can slow overall convergence. The authors propose rectifying the variance of the adaptive learning rate when it is expected to be high.

Differing from the paper, we select the variance tractability cut-off as 5 instead of 4. This adjustment is common practice and can be found in the reference code repository (2f03dd1970/radam/radam.py (L156)) as well as in the TensorFlow Swift optimizer library (f51ee4618d/Sources/TensorFlow/Optimizers/MomentumBased.swift (L638)).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58968
Reviewed By: vincentqb
Differential Revision: D29310601
Pulled By: iramazanli
fbshipit-source-id: b7bd487f72f1074f266687fd9c0c6be264a748a9
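For illustration, here is a minimal sketch of the variance-rectification factor described above, written in standard Adam notation (beta2 is the second-moment decay rate, step is the 1-based iteration count). The helper name, defaults, and structure are illustrative assumptions, not the actual torch.optim.RAdam implementation.

import math

def rectification_term(step, beta2=0.999, threshold=5.0):
    # rho_inf: maximum length of the approximated simple moving average (SMA).
    rho_inf = 2.0 / (1.0 - beta2) - 1.0
    # rho_t: SMA length at this step; small early in training, approaches rho_inf later.
    rho_t = rho_inf - 2.0 * step * (beta2 ** step) / (1.0 - beta2 ** step)
    if rho_t <= threshold:
        # Variance is not yet tractable: fall back to an un-rectified,
        # SGD-with-momentum-like update (cut-off of 5 here; the paper uses 4).
        return None
    # Rectification factor applied to the adaptive learning rate.
    return math.sqrt(
        (rho_t - 4.0) * (rho_t - 2.0) * rho_inf
        / ((rho_inf - 4.0) * (rho_inf - 2.0) * rho_t)
    )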
39 lines
834 B
Python
"""
|
|
:mod:`torch.optim` is a package implementing various optimization algorithms.
|
|
Most commonly used methods are already supported, and the interface is general
|
|
enough, so that more sophisticated ones can be also easily integrated in the
|
|
future.
|
|
"""
|
|
|
|
from .adadelta import Adadelta
|
|
from .adagrad import Adagrad
|
|
from .adam import Adam
|
|
from .adamw import AdamW
|
|
from .sparse_adam import SparseAdam
|
|
from .adamax import Adamax
|
|
from .asgd import ASGD
|
|
from .sgd import SGD
|
|
from .radam import RAdam
|
|
from .rprop import Rprop
|
|
from .rmsprop import RMSprop
|
|
from .optimizer import Optimizer
|
|
from .nadam import NAdam
|
|
from .lbfgs import LBFGS
|
|
from . import lr_scheduler
|
|
from . import swa_utils
|
|
|
|
del adadelta
|
|
del adagrad
|
|
del adam
|
|
del adamw
|
|
del sparse_adam
|
|
del adamax
|
|
del asgd
|
|
del sgd
|
|
del radam
|
|
del rprop
|
|
del rmsprop
|
|
del optimizer
|
|
del nadam
|
|
del lbfgs
|
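As a quick usage illustration (a sketch, not part of the file above), every optimizer exported by this package follows the same construct/zero_grad/backward/step interface; the model, data, and hyperparameters below are placeholders chosen for the example.

import torch
import torch.nn.functional as F

model = torch.nn.Linear(10, 1)                     # any nn.Module works here
optimizer = torch.optim.RAdam(model.parameters(), lr=1e-3)

inputs = torch.randn(32, 10)                       # dummy batch
targets = torch.randn(32, 1)

for _ in range(100):
    optimizer.zero_grad()                          # clear gradients from the previous step
    loss = F.mse_loss(model(inputs), targets)
    loss.backward()                                # compute gradients
    optimizer.step()                               # apply the RAdam update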