Commit Graph

13 Commits

Author SHA1 Message Date
Michael Acar
a4b2f3e213 Implement AdamW optimizer (#21250)
Summary:
# What is this?
This is an implementation of the AdamW optimizer, following [the fastai library](803894051b/fastai/callback.py) and the paper that introduced it, [Decoupled Weight Decay Regularization](https://arxiv.org/abs/1711.05101). It decouples the weight-decay regularization step from the optimization step during training.

There have already been several abortive attempts to get this into PyTorch in some form: https://github.com/pytorch/pytorch/pull/17468, https://github.com/pytorch/pytorch/pull/10866, https://github.com/pytorch/pytorch/pull/3740, https://github.com/pytorch/pytorch/pull/4429. Hopefully this one goes through.
# Why is this important?
Via a simple reparameterization, it can be shown that L2 regularization has a weight-decay effect under SGD, which is why L2 regularization became synonymous with weight decay. However, this equivalence breaks down for more complex adaptive optimization schemes. The paper [Decoupled Weight Decay Regularization](https://arxiv.org/abs/1711.05101) shows that this is why models trained with SGD generalize better than those trained with Adam: weight decay is a very effective regularizer, while L2 regularization on its own is much less effective. By explicitly decaying the weights, we can achieve state-of-the-art results while still enjoying the fast convergence of adaptive optimization schemes. A minimal sketch of the difference follows below.
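As a rough illustration of the two update rules (the model below is a hypothetical stand-in; `torch.optim.AdamW` is the optimizer this PR adds):

```python
import torch

model = torch.nn.Linear(10, 2)  # hypothetical stand-in model

# L2 regularization folds the penalty into the gradient, so Adam's
# per-parameter adaptive step size rescales it:
#     grad <- grad + weight_decay * param
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-2)

# Decoupled weight decay (AdamW) instead shrinks the weights directly,
# independently of the gradient statistics:
#     param <- param - lr * weight_decay * param, then the usual Adam step
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-2)
```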
# How was this tested?
Test cases were added to `test_optim.py`, and I also ran a [little experiment](https://gist.github.com/mjacar/0c9809b96513daff84fe3d9938f08638) to validate that this implementation is equivalent to the fastai implementation.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21250

Differential Revision: D16060339

Pulled By: vincentqb

fbshipit-source-id: ded7cc9cfd3fde81f655b9ffb3e3d6b3543a4709
2019-07-02 09:09:10 -07:00
vfdev
449a2c3555 Fixes #20124 (#20203)
Summary:
Fixes #20124

Description:
The code wraps the `optimizer.step()` method to detect whether the user is following the new pattern or the old one. If the old pattern is detected, a `UserWarning` is raised. The documentation is also updated to reflect the change:

![Screen Shot 2019-05-07 at 11 05 17](https://user-images.githubusercontent.com/2459423/57287527-04e63580-70b8-11e9-9ddd-5d159ef0ed2f.png)
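For context, a minimal sketch of the new pattern this warning enforces (stepping the scheduler after the optimizer, per #20124); the model, optimizer, and scheduler below are illustrative stand-ins:

```python
import torch

model = torch.nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30)

for epoch in range(100):
    # ... forward pass, loss.backward() ...
    optimizer.step()   # new pattern: step the optimizer first
    scheduler.step()   # then advance the learning-rate schedule
# Reversing the two calls above is the old pattern and now raises a UserWarning.
```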

cc SsnL, bado-lee
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20203

Differential Revision: D15543060

Pulled By: ezyang

fbshipit-source-id: 3605e1afdb6ffc2dfd5e75e92e01b967c4d065b5
2019-05-29 14:15:01 -07:00
Tongzhou Wang
4f5e72600e fix lint in optim doc
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18883

Differential Revision: D14793365

Pulled By: ezyang

fbshipit-source-id: c1b46c98e3319badec3e0e772d0ddea24cbf9c89
2019-04-04 19:08:13 -07:00
Sam Pepose
8635078d9e Adds Cyclical Learning Rate and Momentum (#18001)
Summary:
This implements a cyclical learning rate (CLR) schedule with an optional inverse cyclical momentum. More info about CLR: https://github.com/bckenstler/CLR

This finishes what #2016 started. Resolves #1909. A usage sketch follows below.
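A minimal usage sketch of the scheduler this adds (landed as `torch.optim.lr_scheduler.CyclicLR`; the model and hyperparameters below are illustrative):

```python
import torch

model = torch.nn.Linear(10, 2)  # hypothetical stand-in model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

# lr cycles between base_lr and max_lr; with the default cycle_momentum=True
# the momentum cycles inversely (high momentum while lr is low, and vice versa).
scheduler = torch.optim.lr_scheduler.CyclicLR(
    optimizer, base_lr=0.001, max_lr=0.1, step_size_up=2000)

for batch_idx in range(4000):
    # ... forward/backward/optimizer.step() ...
    scheduler.step()  # CyclicLR is stepped per batch, not per epoch
```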
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18001

Differential Revision: D14451845

Pulled By: sampepose

fbshipit-source-id: 8f682e0c3dee3a73bd2b14cc93fcf5f0e836b8c9
2019-03-27 19:56:04 -07:00
livc
ecc5e623a2 fix punctuation
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17973

Differential Revision: D14438725

Pulled By: zou3519

fbshipit-source-id: 30a5485b508b4ae028057e0b66a8abb2b163d66b
2019-03-13 08:14:30 -07:00
Kai Arulkumaran
e9ef20eab5 Add Cosine Annealing LR Scheduler (#3311)
* Add Cosine Annealing LR Scheduler

* Update eta_min in tests to prevent numerical mistakes

* Use non-zero min_eta in test_cos_anneal_lr
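
For reference, a minimal usage sketch of the added scheduler (the model and values below are illustrative; `eta_min` defaults to 0):

```python
import torch

model = torch.nn.Linear(10, 2)  # hypothetical stand-in model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Anneals the lr from its initial value toward eta_min following
#     eta_t = eta_min + (lr - eta_min) * (1 + cos(pi * t / T_max)) / 2
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=50, eta_min=1e-5)
```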
2017-12-18 02:43:08 -05:00
SsnL
e2f33eb6a2 add doc for sparse_adam (#3519) 2017-11-06 18:37:15 -05:00
SsnL
9260f0e5ee Fix a typo in optim.rst (#3069) 2017-10-11 16:47:14 +02:00
SsnL
828048f578 Add document on how Module.cuda() and optims should work together (#3056) 2017-10-10 22:55:23 -04:00
Jiaming Liu
630af4d7d8 add learning rate schedulers (#1370) 2017-05-25 16:21:43 -04:00
Adam Paszke
f8ae34706e Port L-BFGS from Lua optim 2017-01-22 18:02:40 -05:00
Adam Paszke
604e13775f Add optim docs 2017-01-16 12:59:47 -05:00
Sam Gross
126a1cc398 Add Sphinx docs 2016-12-28 00:03:39 +01:00