pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 00:21:07 +01:00

Author	SHA1	Message	Date
Xiang Gao	6bc77f4d35	Use amax/maximum instead of max in optimizers (#43797) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43797 Reviewed By: malfet Differential Revision: D23406641 Pulled By: mruberry fbshipit-source-id: 0cd075124aa6533b21375fe2c90c44a5d05ad6e6	2020-09-15 10:39:40 -07:00
Masaki Kozuki	7403545518	Fix exception message of `torch.optim.AdamW`. (#36088) Summary: PyTorch does not implement `SparseAdamW`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/36088 Differential Revision: D20932357 Pulled By: gchanan fbshipit-source-id: 49e5b72c34ff8ce0deb6b3807662b8b7d67d959f	2020-04-09 08:02:10 -07:00
albanD	6e2bb1c054	End of the .data removal in torch/optim (#34211) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34211 Test Plan: Imported from OSS Differential Revision: D20248684 Pulled By: albanD fbshipit-source-id: 2294bfa41b82ff47f000bc98860780f59d7d4421	2020-03-09 06:40:39 -07:00
Eleanor Dwight Holland	6a97777f72	Remove use of `.data` from optimizers (#33640) Summary: Removes all uses of `.data` from optimizers. Or tries to. Pull Request resolved: https://github.com/pytorch/pytorch/pull/33640 Reviewed By: vincentqb Differential Revision: D20203216 Pulled By: albanD fbshipit-source-id: 9bfe78bbed00fd4aaa690801cff0201f0bd680a0	2020-03-03 13:21:55 -08:00
Xiao Wang	c1dd70688a	Fix deprecated python "add" calls (#33428) Summary: This PR fixed those python "add" calls using deprecated signature `add(Scalar, Tensor)`. The alternative signature `add(Tensor, alpha = Scalar)` is used. cc csarofeen zasdfgbnm ptrblck ngimel Pull Request resolved: https://github.com/pytorch/pytorch/pull/33428 Differential Revision: D20002534 Pulled By: vincentqb fbshipit-source-id: 81f2dd6170a47a9b53a17e5817c26e70d8afa130	2020-02-26 09:02:31 -08:00
Nikolay Novik	d19a50bf27	Add missing weight_decay parameter validation for Adam and AdamW (#33126) Summary: Adam and AdamW are missing parameter validation for weight_decay. Other optimisers have this check present. Pull Request resolved: https://github.com/pytorch/pytorch/pull/33126 Differential Revision: D19860366 Pulled By: vincentqb fbshipit-source-id: 286d7dc90e2f4ccf6540638286d2fe17939648fc	2020-02-20 11:11:51 -08:00
Vitaly Fedyunin	877c96cddf	explicitly provide memory format when calling to *_like operators Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30008 Test Plan: Imported from OSS Differential Revision: D18575981 Pulled By: VitalyFedyunin fbshipit-source-id: ec3418257089ad57913932be1a8608cd20ce054c	2019-11-19 16:19:29 -08:00
Farhad Ramezanghorbani	fed5ca192c	Adam/AdamW implementation minor fix (#22628) Summary: I have noticed a small discrepancy between theory and the implementation of AdamW and in general Adam. The epsilon in the denominator of the following Adam update should not be scaled by the bias correction [(Algorithm 2, L9-12)](https://arxiv.org/pdf/1711.05101.pdf). Only the running average of the gradient (_m_) and squared gradients (_v_) should be scaled by their corresponding bias corrections. ![adam_update](https://user-images.githubusercontent.com/13050245/60894105-11117f00-a230-11e9-9ba0-adad2ae2e0ae.png) In the current implementation, the epsilon is scaled by the square root of `bias_correction2`. I have plotted this ratio as a function of step given `beta2 = 0.999` and `eps = 1e-8`. In the early steps of optimization, this ratio slightly deviates from theory (denoted by the horizontal red line). ![plot](https://user-images.githubusercontent.com/13050245/60893952-cabc2000-a22f-11e9-8dc2-6353ad5d674d.png) Pull Request resolved: https://github.com/pytorch/pytorch/pull/22628 Differential Revision: D16589914 Pulled By: vincentqb fbshipit-source-id: 8791eb338236faea9457c0845ccfdba700e5f1e7	2019-08-01 11:42:04 -07:00
Michael Acar	a4b2f3e213	Implement AdamW optimizer (#21250) Summary: # What is this? This is an implementation of the AdamW optimizer as implemented in [the fastai library](`803894051b/fastai/callback.py`) and as initially introduced in the paper [Decoupled Weight Decay Regularization](https://arxiv.org/abs/1711.05101). It decouples the weight decay regularization step from the optimization step during training. There have already been several abortive attempts to push this into pytorch in some form or fashion: https://github.com/pytorch/pytorch/pull/17468, https://github.com/pytorch/pytorch/pull/10866, https://github.com/pytorch/pytorch/pull/3740, https://github.com/pytorch/pytorch/pull/4429. Hopefully this one goes through. # Why is this important? Via a simple reparameterization, it can be shown that L2 regularization has a weight decay effect in the case of SGD optimization. Because of this, L2 regularization became synonymous with the concept of weight decay. However, it can be shown that the equivalence of L2 regularization and weight decay breaks down for more complex adaptive optimization schemes. It was shown in the paper [Decoupled Weight Decay Regularization](https://arxiv.org/abs/1711.05101) that this is the reason why models trained with SGD achieve better generalization than those trained with Adam. Weight decay is a very effective regularizer. L2 regularization, in and of itself, is much less effective. By explicitly decaying the weights, we can achieve state-of-the-art results while also taking advantage of the quick convergence properties that adaptive optimization schemes have. # How was this tested? There were test cases added to `test_optim.py` and I also ran a [little experiment](https://gist.github.com/mjacar/0c9809b96513daff84fe3d9938f08638) to validate that this implementation is equivalent to the fastai implementation. Pull Request resolved: https://github.com/pytorch/pytorch/pull/21250 Differential Revision: D16060339 Pulled By: vincentqb fbshipit-source-id: ded7cc9cfd3fde81f655b9ffb3e3d6b3543a4709	2019-07-02 09:09:10 -07:00
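
The two algorithmic points in the commit messages above (decoupled weight decay from #21250, and adding epsilon after the bias correction from #22628) can be illustrated with a minimal scalar sketch. This is not the code in `torch/optim`; the function names `adamw_step` and `effective_eps_ratio`, the plain-float state, and the default hyperparameters are assumptions chosen for illustration.

```python
import math


def adamw_step(p, grad, m, v, step, lr=1e-3, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=1e-2):
    """One AdamW update on a single scalar parameter (illustrative sketch).

    p, grad: parameter value and its gradient
    m, v:    running averages of the gradient and the squared gradient
    step:    1-based step counter
    Returns the updated (p, m, v).
    """
    # Decoupled weight decay: shrink the parameter directly instead of
    # adding weight_decay * p to the gradient (which would be L2 regularization).
    p = p * (1.0 - lr * weight_decay)

    # Exponential moving averages of the gradient and squared gradient.
    m = beta1 * m + (1.0 - beta1) * grad
    v = beta2 * v + (1.0 - beta2) * grad * grad

    bias_correction1 = 1.0 - beta1 ** step
    bias_correction2 = 1.0 - beta2 ** step

    # eps is added after the bias correction of v, so eps itself is not rescaled.
    denom = math.sqrt(v) / math.sqrt(bias_correction2) + eps
    p = p - (lr / bias_correction1) * m / denom
    return p, m, v


def effective_eps_ratio(step, beta2=0.999):
    """Factor by which eps is effectively scaled if it is instead added to
    sqrt(v) before the bias correction is applied, i.e. 1 / sqrt(bias_correction2)."""
    return 1.0 / math.sqrt(1.0 - beta2 ** step)
```

With zero gradient and zero state, `adamw_step` only applies the decay factor `1 - lr * weight_decay`, which is what separates decoupled weight decay from an L2 penalty routed through the adaptive update. `effective_eps_ratio` is largest at early steps and tends to 1 as `step` grows, matching the early-step deviation described in #22628.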

9 Commits