Commit Graph

13 Commits

Author SHA1 Message Date
Michael Acar
a4b2f3e213 Implement AdamW optimizer (#21250)
Summary:
# What is this?
This is an implementation of the AdamW optimizer, following [the fastai library](803894051b/fastai/callback.py) and the paper that introduced it, [Decoupled Weight Decay Regularization](https://arxiv.org/abs/1711.05101). It decouples the weight-decay regularization step from the optimization step during training.

There have already been several abortive attempts to get this into PyTorch in some form: https://github.com/pytorch/pytorch/pull/17468, https://github.com/pytorch/pytorch/pull/10866, https://github.com/pytorch/pytorch/pull/3740, https://github.com/pytorch/pytorch/pull/4429. Hopefully this one goes through.
# Why is this important?
Via a simple reparameterization, it can be shown that L2 regularization has a weight-decay effect under SGD, which is why L2 regularization became synonymous with weight decay. However, this equivalence breaks down for more complex adaptive optimization schemes. The paper [Decoupled Weight Decay Regularization](https://arxiv.org/abs/1711.05101) shows that this is why models trained with SGD generalize better than those trained with Adam: weight decay is a very effective regularizer, while L2 regularization on its own is much less effective. By explicitly decaying the weights, we can achieve state-of-the-art results while still enjoying the fast convergence of adaptive optimization schemes. A minimal sketch of the difference follows below.
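As a rough illustration of the two update rules (the model below is a hypothetical stand-in; `torch.optim.AdamW` is the optimizer this PR adds):

```python
import torch

model = torch.nn.Linear(10, 2)  # hypothetical stand-in model

# L2 regularization folds the penalty into the gradient, so Adam's
# per-parameter adaptive step size rescales it:
#     grad <- grad + weight_decay * param
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-2)

# Decoupled weight decay (AdamW) instead shrinks the weights directly,
# independently of the gradient statistics:
#     param <- param - lr * weight_decay * param, then the usual Adam step
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-2)
```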
# How was this tested?
Test cases were added to `test_optim.py`, and I also ran a [little experiment](https://gist.github.com/mjacar/0c9809b96513daff84fe3d9938f08638) to validate that this implementation is equivalent to the fastai implementation.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21250

Differential Revision: D16060339

Pulled By: vincentqb

fbshipit-source-id: ded7cc9cfd3fde81f655b9ffb3e3d6b3543a4709
2019-07-02 09:09:10 -07:00
vfdev
449a2c3555 Fixes #20124 (#20203)
Summary:
Fixes #20124

Description:
The code wraps the `optimizer.step()` method to detect whether the user is following the new pattern or the old one. If the old pattern is detected, a `UserWarning` is raised. The documentation is also updated to reflect the change:

![Screen Shot 2019-05-07 at 11 05 17](https://user-images.githubusercontent.com/2459423/57287527-04e63580-70b8-11e9-9ddd-5d159ef0ed2f.png)
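For context, a minimal sketch of the new pattern this warning enforces (stepping the scheduler after the optimizer, per #20124); the model, optimizer, and scheduler below are illustrative stand-ins:

```python
import torch

model = torch.nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30)

for epoch in range(100):
    # ... forward pass, loss.backward() ...
    optimizer.step()   # new pattern: step the optimizer first
    scheduler.step()   # then advance the learning-rate schedule
# Reversing the two calls above is the old pattern and now raises a UserWarning.
```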

cc SsnL, bado-lee
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20203

Differential Revision: D15543060

Pulled By: ezyang

fbshipit-source-id: 3605e1afdb6ffc2dfd5e75e92e01b967c4d065b5
2019-05-29 14:15:01 -07:00
Tongzhou Wang
4f5e72600e fix lint in optim doc
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18883

Differential Revision: D14793365

Pulled By: ezyang

fbshipit-source-id: c1b46c98e3319badec3e0e772d0ddea24cbf9c89
2019-04-04 19:08:13 -07:00
Sam Pepose
8635078d9e Adds Cyclical Learning Rate and Momentum (#18001)
Summary:
This implements a cyclical learning rate (CLR) schedule with an optional inverse cyclical momentum. More info about CLR: https://github.com/bckenstler/CLR

This finishes what #2016 started. Resolves #1909. A usage sketch follows below.
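A minimal usage sketch of the scheduler this adds (landed as `torch.optim.lr_scheduler.CyclicLR`; the model and hyperparameters below are illustrative):

```python
import torch

model = torch.nn.Linear(10, 2)  # hypothetical stand-in model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

# lr cycles between base_lr and max_lr; with the default cycle_momentum=True
# the momentum cycles inversely (high momentum while lr is low, and vice versa).
scheduler = torch.optim.lr_scheduler.CyclicLR(
    optimizer, base_lr=0.001, max_lr=0.1, step_size_up=2000)

for batch_idx in range(4000):
    # ... forward/backward/optimizer.step() ...
    scheduler.step()  # CyclicLR is stepped per batch, not per epoch
```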
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18001

Differential Revision: D14451845

Pulled By: sampepose

fbshipit-source-id: 8f682e0c3dee3a73bd2b14cc93fcf5f0e836b8c9
2019-03-27 19:56:04 -07:00
livc
ecc5e623a2 fix punctuation
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17973

Differential Revision: D14438725

Pulled By: zou3519

fbshipit-source-id: 30a5485b508b4ae028057e0b66a8abb2b163d66b
2019-03-13 08:14:30 -07:00
Kai Arulkumaran
e9ef20eab5 Add Cosine Annealing LR Scheduler (#3311)
* Add Cosine Annealing LR Scheduler

* Update eta_min in tests to prevent numerical mistakes

* Use non-zero min_eta in test_cos_anneal_lr
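
For reference, a minimal usage sketch of the added scheduler (the model and values below are illustrative; `eta_min` defaults to 0):

```python
import torch

model = torch.nn.Linear(10, 2)  # hypothetical stand-in model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Anneals the lr from its initial value toward eta_min following
#     eta_t = eta_min + (lr - eta_min) * (1 + cos(pi * t / T_max)) / 2
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=50, eta_min=1e-5)
```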
2017-12-18 02:43:08 -05:00
SsnL
e2f33eb6a2 add doc for sparse_adam (#3519) 2017-11-06 18:37:15 -05:00
SsnL
9260f0e5ee Fix a typo in optim.rst (#3069) 2017-10-11 16:47:14 +02:00
SsnL
828048f578 Add document on how Module.cuda() and optims should work together (#3056) 2017-10-10 22:55:23 -04:00
Jiaming Liu
630af4d7d8 add learning rate schedulers (#1370) 2017-05-25 16:21:43 -04:00
Adam Paszke
f8ae34706e Port L-BFGS from Lua optim 2017-01-22 18:02:40 -05:00
Adam Paszke
604e13775f Add optim docs 2017-01-16 12:59:47 -05:00
Sam Gross
126a1cc398 Add Sphinx docs 2016-12-28 00:03:39 +01:00