pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Author	SHA1	Message	Date
albanD	375668cd96	Remove overly restrictive assert in adam (#80222 ) This is causing issues if the user has the step on cuda for a good reason. This assert causes code that used to run just fine to fail. Note that keeping the step on cuda is pretty bad for performance, so it is ok to try to push users away from doing it. For the 1.12.1 milestone: this is not asking for a dot release to fix this (as this is bad practice anyway). But it would be a great thing to add if we do one: it is very low risk and will prevent breakage for users. Pull Request resolved: https://github.com/pytorch/pytorch/pull/80222 Approved by: https://github.com/jbschlosser, https://github.com/ngimel	2022-06-24 17:08:34 +00:00
Michael Carilli	ba27ee9e8f	[CUDA graphs] Allows Adam and AdamW to be capture-safe (#77862 ) Near term fix for https://github.com/pytorch/pytorch/issues/76368. Q. Why does the user need to request `capturable=True` in the optimizer constructor? Why can't capture safety be completely automatic? A. We need to set up capture-safe (device-side) state variables before capture. If we don't, and step() internally detects capture is underway, it's too late: the best we could do is create a device state variable and copy the current CPU value into it, which is not something we want baked into the graph. Q. Ok, why not just do the capture-safe approach with device-side state variables all the time? A. It incurs several more kernel launches per parameter, which could really add up and regress cpu overhead for ungraphed step()s. If the optimizer won't be captured, we should allow step() to stick with its current cpu-side state handling. Q. But cuda RNG is a stateful thing that maintains its state on the cpu outside of capture and replay, and we capture it automatically. Why can't we do the same thing here? A. The graph object can handle RNG generator increments because its capture_begin, capture_end, and replay() methods can see and access generator object. But the graph object has no explicit knowledge of or access to optimizer steps in its capture scope. We could let the user tell the graph object what optimizers will be stepped in its scope, ie something like ```python graph.will_use_optimizer(opt) graph.capture_begin() ... ``` but that seems clunkier than an optimizer constructor arg. I'm open to other ideas, but right now I think constructor arg is necessary and the least bad approach. Long term, https://github.com/pytorch/pytorch/issues/71274 is a better fix. Pull Request resolved: https://github.com/pytorch/pytorch/pull/77862 Approved by: https://github.com/ezyang	2022-06-13 01:56:47 +00:00
Mikayla Gawarecki	5f9590681d	Optim foreach cleanup for Adam (#70295 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70295 Test Plan: Imported from OSS Reviewed By: anjali411 Differential Revision: D33767870 Pulled By: mikaylagawarecki fbshipit-source-id: f922f15ecb0307458c8ecee737325c42c4f3ce8b (cherry picked from commit `66233a8a3e`)	2022-02-15 18:02:08 +00:00
Mikayla Gawarecki	7176c92687	[optim] update step in functional and pass state_steps instead of state (#71333 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/71333 Updated Adagrad, Adamax, Adam, AdamW, and RAdam: the multi_tensor functionals now take `state_steps: List[Tensor]` instead of `states: List[Dict]`, and `state_steps` changes from `List[int]` to `List[Tensor]`, where each element is a singleton tensor so that step can be updated within the functional. NAdam and ASGD were updated in separate diffs to fold their handling of state into the functionals. Test Plan: Imported from OSS Reviewed By: anjali411 Differential Revision: D33767872 Pulled By: mikaylagawarecki fbshipit-source-id: 9baa7cafb6375eab839917df9287c65a437891f2 (cherry picked from commit `831c02b3d0`)	2022-02-08 16:51:19 +00:00
Alban Desmaison	e1b84e1b6b	fix loading of older models that don't have maximize (#71023 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/71023 Reviewed By: jbschlosser Differential Revision: D33483687 Pulled By: albanD fbshipit-source-id: 2f3c6e97a9579be9ba15eca0756fc1f2c466fbb6	2022-01-10 06:01:24 -08:00
Adnios	15f14ce0dc	fix typo in adam docs (#70387 ) Summary: Fix the typo in [adam docs in master branch](https://pytorch.org/docs/master/generated/torch.optim.Adam.html#torch.optim.Adam) ![image](https://user-images.githubusercontent.com/41060790/147345284-37e180d1-fd06-4a62-9c79-2d17b8aa5cd3.png) Pull Request resolved: https://github.com/pytorch/pytorch/pull/70387 Reviewed By: H-Huang Differential Revision: D33309283 Pulled By: albanD fbshipit-source-id: d20c5d8f2498ac64013f71e202a6b50dcc069f2b	2021-12-28 07:35:39 -08:00
oliver	3d358a7678	Adds a `maximize` flag to Adam (#68164 ) Summary: Solves the next most important use case in https://github.com/pytorch/pytorch/issues/68052. I have kept the style as close to that in SGD as seemed reasonable, given the slight differences in their internal implementations. All feedback welcome! cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang Pull Request resolved: https://github.com/pytorch/pytorch/pull/68164 Reviewed By: VitalyFedyunin Differential Revision: D32994129 Pulled By: albanD fbshipit-source-id: 65c57c3f3dbbd3e3e5338d51def54482503e8850	2021-12-13 05:53:53 -08:00
Ilqar Ramazanli	43248d9112	[doc][hackathon] To add Adam Optimizer to the documentation (#63251 ) Summary: It has been discussed before that adding descriptions of optimization algorithms to the PyTorch core documentation may result in a nice optimization research tutorial. The tracking issue https://github.com/pytorch/pytorch/issues/63236 lists all the necessary algorithms with links to the originally published papers. This PR adds a description of the Adam algorithm to the documentation. For more details, we refer to the paper https://arxiv.org/abs/1412.6980 Pull Request resolved: https://github.com/pytorch/pytorch/pull/63251 Reviewed By: albanD Differential Revision: D30779163 Pulled By: iramazanli fbshipit-source-id: 319a80fc3952793b0d064d0e641ddc1de3c05a86	2021-09-07 11:03:35 -07:00
Wanchao Liang	4611387608	[optim] take kw-only argument for functional optim APIs (#56185 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/56185 ghstack-source-id: 126670123 Reviewed By: albanD Differential Revision: D27802169 fbshipit-source-id: f5e1cb2046dcdeecf5f6b0f70892828bf0adb22f	2021-04-15 20:08:04 -07:00
Jay Patel	4f62c622b3	Cleanup of unused list in adam.py (#53874 ) Summary: Code cleanup. Pull Request resolved: https://github.com/pytorch/pytorch/pull/53874 Reviewed By: jbschlosser Differential Revision: D27036819 Pulled By: ngimel fbshipit-source-id: c267e20c8d91224cd3c01b302a75f43aa309b560	2021-03-15 09:49:27 -07:00
Wanchao Liang	f8238d7917	[optim] bugfix when all parameters have no grad (#52944 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/52944 This fixes the bug introduced while refactoring optimizers in https://github.com/pytorch/pytorch/pull/50411. When all parameters have no grads, we should still allow `beta`-like hyperparameters to be defined. Reviewed By: ngimel Differential Revision: D26699827 fbshipit-source-id: 8a7074127704c7a4a1fbc17d48a81e23a649f280	2021-03-03 11:56:09 -08:00
Vincent Quenneville-Belair	50d903f19f	[optim] make functional api be private (#51316 ) (#51665 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/51665 This reverts commit `896f82aa92`. Test Plan: Imported from OSS Reviewed By: gchanan Differential Revision: D26232608 Pulled By: vincentqb fbshipit-source-id: ca006baf4fb672c11c1bb003c39a29cbadb63dd3	2021-02-03 17:59:05 -08:00
Vincent Quenneville-Belair	896f82aa92	[optim] make functional api be private (#51316 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/51316 Make optim functional API be private until we release with beta Test Plan: Imported from OSS Reviewed By: albanD Differential Revision: D26213469 fbshipit-source-id: b0fd001a8362ec1c152250bcd57c7205ed893107	2021-02-03 09:29:33 -08:00
Samuel Marks	e6779d4357	[*.py] Rename "Arguments:" to "Args:" (#49736 ) Summary: I've written custom parsers and emitters for everything from docstrings to classes and functions. However, I recently came across an issue when I was parsing/generating from the TensorFlow codebase: inconsistent use of `Args:` and `Arguments:` in its docstrings. ```sh (pytorch#c348fae)$ for name in 'Args:' 'Arguments:'; do printf '%-10s %04d\n' "$name" "$(rg -IFtpy --count-matches "$name" \| paste -s -d+ -- \| bc)"; done Args: 1095 Arguments: 0336 ``` It is easy enough to extend my parsers to support both variants, however it looks like `Arguments:` is wrong anyway, as per: - https://google.github.io/styleguide/pyguide.html#doc-function-args @ [`ddccc0f`](https://github.com/google/styleguide/blob/ddccc0f/pyguide.md) - https://chromium.googlesource.com/chromiumos/docs/+/master/styleguide/python.md#describing-arguments-in-docstrings @ [`9fc0fc0`](https://chromium.googlesource.com/chromiumos/docs/+/9fc0fc0/styleguide/python.md) - https://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_google.html @ [`c0ae8e3`](https://github.com/sphinx-contrib/napoleon/blob/c0ae8e3/docs/source/example_google.rst) Therefore, only `Args:` is valid. This PR replaces them throughout the codebase. PS: For related PRs, see tensorflow/tensorflow/pull/45420 PPS: The trackbacks automatically appearing below are sending the same changes to other repositories in the [PyTorch](https://github.com/pytorch) organisation. Pull Request resolved: https://github.com/pytorch/pytorch/pull/49736 Reviewed By: albanD Differential Revision: D25710534 Pulled By: soumith fbshipit-source-id: 61e8ff01abb433e9f78185c2d1d0cbd7c22c1619	2020-12-28 09:34:47 -08:00
Wanchao Liang	08caf15502	[optimizer] refactor Adam to use functional API (#44791 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44791 Test Plan: Imported from OSS Reviewed By: ailzhang Differential Revision: D23935257 Pulled By: wanchaol fbshipit-source-id: 6f6e22a9287f5515d2e4e6abd4dee2fe7e17b945	2020-09-25 17:13:08 -07:00
Xiang Gao	6bc77f4d35	Use amax/maximum instead of max in optimizers (#43797 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43797 Reviewed By: malfet Differential Revision: D23406641 Pulled By: mruberry fbshipit-source-id: 0cd075124aa6533b21375fe2c90c44a5d05ad6e6	2020-09-15 10:39:40 -07:00
farhadrgh	4b4273a04e	Update Adam documentation (#41679 ) Summary: This PR fixes https://github.com/pytorch/pytorch/issues/41477 Adam implementation is doing L2 regularization and not decoupled weight decay. However, the change mentioned in https://github.com/pytorch/pytorch/issues/41477 was motivated by Line 12 of algorithm 2 in [Decoupled Weight Decay Regularization](https://arxiv.org/pdf/1711.05101.pdf) paper. Please let me know if you have other suggestions about how to deliver this info in the docs. cc ezyang Pull Request resolved: https://github.com/pytorch/pytorch/pull/41679 Reviewed By: izdeby Differential Revision: D22671329 Pulled By: vincentqb fbshipit-source-id: 2caf60e4f62fe31f29aa35a9532d1c6895a24224	2020-07-23 09:25:41 -07:00
albanD	6e2bb1c054	End of the .data removal in torch/optim (#34211 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34211 Test Plan: Imported from OSS Differential Revision: D20248684 Pulled By: albanD fbshipit-source-id: 2294bfa41b82ff47f000bc98860780f59d7d4421	2020-03-09 06:40:39 -07:00
Eleanor Dwight Holland	6a97777f72	Remove use of `.data` from optimizers (#33640 ) Summary: Removes all uses of `.data` from optimizers. Or tries to. Pull Request resolved: https://github.com/pytorch/pytorch/pull/33640 Reviewed By: vincentqb Differential Revision: D20203216 Pulled By: albanD fbshipit-source-id: 9bfe78bbed00fd4aaa690801cff0201f0bd680a0	2020-03-03 13:21:55 -08:00
Xiao Wang	c1dd70688a	Fix deprecated python "add" calls (#33428 ) Summary: This PR fixed those python "add" calls using deprecated signature `add(Scalar, Tensor)`. The alternative signature `add(Tensor, alpha = Scalar)` is used. cc csarofeen zasdfgbnm ptrblck ngimel Pull Request resolved: https://github.com/pytorch/pytorch/pull/33428 Differential Revision: D20002534 Pulled By: vincentqb fbshipit-source-id: 81f2dd6170a47a9b53a17e5817c26e70d8afa130	2020-02-26 09:02:31 -08:00
Nikolay Novik	d19a50bf27	Add missing weight_decay parameter validation for Adam and AdamW (#33126 ) Summary: Adam and AdamW are missing parameter validation for weight_decay. Other optimisers have this check present. Pull Request resolved: https://github.com/pytorch/pytorch/pull/33126 Differential Revision: D19860366 Pulled By: vincentqb fbshipit-source-id: 286d7dc90e2f4ccf6540638286d2fe17939648fc	2020-02-20 11:11:51 -08:00
albanD	b0871f211b	Make all optimizers consistent so that they don't change gradients inplace Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30257 Test Plan: Imported from OSS Differential Revision: D18665461 Pulled By: albanD fbshipit-source-id: cfdafef919468a41007881b82fd288b7128baf95	2019-11-26 12:16:25 -08:00
Vitaly Fedyunin	877c96cddf	explicitly provide memory format when calling to *_like operators Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30008 Test Plan: Imported from OSS Differential Revision: D18575981 Pulled By: VitalyFedyunin fbshipit-source-id: ec3418257089ad57913932be1a8608cd20ce054c	2019-11-19 16:19:29 -08:00
Farhad Ramezanghorbani	fed5ca192c	Adam/AdamW implementation minor fix (#22628 ) Summary: I have noticed a small discrepancy between theory and the implementation of AdamW and in general Adam. The epsilon in the denominator of the following Adam update should not be scaled by the bias correction [(Algorithm 2, L9-12)](https://arxiv.org/pdf/1711.05101.pdf). Only the running average of the gradient (_m_) and squared gradients (_v_) should be scaled by their corresponding bias corrections. ![adam_update](https://user-images.githubusercontent.com/13050245/60894105-11117f00-a230-11e9-9ba0-adad2ae2e0ae.png) In the current implementation, the epsilon is scaled by the square root of `bias_correction2`. I have plotted this ratio as a function of step given `beta2 = 0.999` and `eps = 1e-8`. In the early steps of optimization, this ratio slightly deviates from theory (denoted by the horizontal red line). ![plot](https://user-images.githubusercontent.com/13050245/60893952-cabc2000-a22f-11e9-8dc2-6353ad5d674d.png) Pull Request resolved: https://github.com/pytorch/pytorch/pull/22628 Differential Revision: D16589914 Pulled By: vincentqb fbshipit-source-id: 8791eb338236faea9457c0845ccfdba700e5f1e7	2019-08-01 11:42:04 -07:00
Soumith Chintala	cf235e0894	fix lint after new flake8 release added new style constraints (#13047 ) Summary: fix lint after new flake8 release added new style constraints Pull Request resolved: https://github.com/pytorch/pytorch/pull/13047 Differential Revision: D10527804 Pulled By: soumith fbshipit-source-id: 6f4d02662570b6339f69117b61037c8394b0bbd8	2018-10-24 09:03:38 -07:00
Jerry Ma	383d340e88	Small optimization for adam (#12107 ) Summary: Apply weight decay for Adam in-place instead of via copy. Synced offline with soumith , who mentioned that it should be OK. This is also consistent with other optimizers, e.g. `eee01731a5/torch/optim/sgd.py (L93)` Pull Request resolved: https://github.com/pytorch/pytorch/pull/12107 Reviewed By: soumith Differential Revision: D10071787 Pulled By: jma127 fbshipit-source-id: 5fd7939c79039693b225c44c4c80450923b8d673	2018-09-26 21:43:46 -07:00
rasbt	eee01731a5	Adds the default value for the amsgrad arg to the Adam docstring (#9971 ) Summary: Minor addition to the docstring of `torch.optim.Adam`, adding the default argument description for the `amsgrad` argument to the docstring for consistency. Pull Request resolved: https://github.com/pytorch/pytorch/pull/9971 Differential Revision: D9040820 Pulled By: soumith fbshipit-source-id: 168744a6bb0d1422331beffd7e694b9d6f61900c	2018-07-28 09:23:45 -07:00
lazypanda1	063946d2b3	Added parameter range checks for all optimizers (#6000 )	2018-03-28 11:22:23 +02:00
li-roy	df88373f88	set default ams param in adam optimizer (#5501 )	2018-03-02 11:43:06 +01:00
lazypanda1	a061000250	Added check and test for betas parameter in Adam optimizer (#5147 ) * Added check and test for betas parameter in Adam optimizer * Simplified test	2018-02-11 20:24:43 -05:00
Dr. Kashif Rasul	68c0998cbe	added AMSgrad optimizer to Adam and SparseAdam (#4034 ) * initial AMSGrad * added test for amsgrad * added amsgrad to adam * fixed tests * added option to sparse adam * flake8	2017-12-18 13:24:49 -05:00
SsnL	f76d6c029c	Sparse Adam optimizer for sparse gradients (#3137 ) * sparse adam * Favor dense addition over sparse_mask	2017-11-06 14:20:51 -05:00
Martin Raison	f17cfe4293	sparse tensor operations (#735 )	2017-03-03 18:37:03 +01:00
Adam McCarthy	7926324385	Corrected parameter typo in Adam docstring (#697 )	2017-02-07 19:00:10 +01:00
Luke Yeager	e7c1e6a8e3	[pep8] Fix most lint automatically with autopep8 Here's the command I used to invoke autopep8 (in parallel!): git ls-files \| grep '\.py$' \| xargs -n1 -P`nproc` autopep8 -i Several rules are ignored in setup.cfg. The goal is to let autopep8 handle everything which it can handle safely, and to disable any rules which are tricky or controversial to address. We may want to come back and re-enable some of these rules later, but I'm trying to make this patch as safe as possible. Also configures flake8 to match pep8's behavior. Also configures TravisCI to check the whole project for lint.	2017-01-28 01:15:51 +01:00
Adam Paszke	ecfcf39f30	Improve optimizer serialization Also, add optimizer.load_state_dict	2017-01-24 17:30:50 -05:00
Sergey Zagoruyko	2748b920ab	make adam have the same lr as lua torch (#576 )	2017-01-24 16:35:28 -05:00
Adam Paszke	95f0fa8a92	Change .grad attribute of Variables to be a Variable	2017-01-16 12:59:47 -05:00
Adam Paszke	604e13775f	Add optim docs	2017-01-16 12:59:47 -05:00
Adam Paszke	09493603f6	Change optimizer API	2016-11-08 18:12:56 +01:00
Adam Paszke	df59b89fbb	Add more optimizers	2016-11-07 22:50:56 +01:00
Adam Paszke	2f342af22f	Move optim to legacy	2016-08-01 12:01:46 -04:00
Adam Paszke	554a1d8336	Add optim	2016-07-21 16:42:06 -04:00
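
Note on commit fed5ca192c (#22628): the message says epsilon should be added after the bias correction of the squared-gradient average, not before. The following is a minimal pure-Python sketch of that difference for a single scalar parameter, not the code from `adam.py`. The function name `adam_updates`, the constant-gradient setup, and the exact form of the "old" update are assumptions reconstructed from the commit description.

```python
import math


def adam_updates(grad, step, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Return (old_update, new_update) for one scalar parameter.

    Hypothetical helper: assumes zero-initialised moments and a constant
    gradient, so the running averages have a closed form.
    """
    bias_correction1 = 1 - beta1 ** step
    bias_correction2 = 1 - beta2 ** step
    m = bias_correction1 * grad        # running average of the gradient
    v = bias_correction2 * grad ** 2   # running average of the squared gradient

    # Assumed pre-fix form: eps is added to sqrt(v) before bias correction,
    # so it is effectively divided by sqrt(bias_correction2).
    old = lr * math.sqrt(bias_correction2) / bias_correction1 * m / (math.sqrt(v) + eps)

    # Post-fix form described in the commit: only m and v are bias-corrected,
    # and eps is added unscaled.
    new = lr / bias_correction1 * m / (math.sqrt(v) / math.sqrt(bias_correction2) + eps)
    return old, new


# A gradient as small as eps makes the difference visible at step 1.
old, new = adam_updates(grad=1e-8, step=1)
print(old, new)
```

Under these assumptions the two forms agree whenever the gradient is much larger than `eps`, and they differ mainly in the early steps, when `bias_correction2` is still small.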

93 Commits