@mlazos: Skips `item()` calls when compiling with dynamo by defining a helper function `_get_value`, which returns either the result of `.item()` or, when compiling with dynamo, the scalar CPU tensor itself. The helper is used (rather than dropping `item()` entirely) because removing `item()` calls significantly regresses eager perf. Additionally, `_dispatch_sqrt` calls the appropriate sqrt function (`math.sqrt` or `torch.sqrt`).
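A rough sketch of what these helpers look like (the helper names are from the PR; the exact compile check used is an assumption):
```python
import math
import torch

def _get_value(x):
    # Under dynamo, keep the scalar CPU tensor so no graph break is
    # introduced; in eager mode, .item() is noticeably faster.
    if torch._dynamo.is_compiling():  # assumption: the exact compile check may differ
        return x
    return x.item()

def _dispatch_sqrt(x):
    # Dispatch on whatever _get_value returned:
    # tensor -> torch.sqrt, plain float -> math.sqrt.
    if isinstance(x, torch.Tensor):
        return torch.sqrt(x)
    return math.sqrt(x)
```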
Fixes https://github.com/pytorch/torchdynamo/issues/1083
This PR will no longer be needed once symint support is default.
This PR closes all remaining graph breaks in the optimizers (!!)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88173
Approved by: https://github.com/albanD
### Description
Across PyTorch's docstrings, both `callable` and `Callable` are used for variable types. `Callable` should be capitalized, since we are referring to the `Callable` type and not the Python `callable()` function.
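For illustration only, a hypothetical docstring snippet showing the intended capitalization:
```python
def apply_fn(fn):
    r"""Applies ``fn`` to every parameter.

    Args:
        fn (Callable): function applied to each parameter. ``Callable`` is
            the type; the built-in ``callable()`` predicate stays lowercase.
    """
```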
### Testing
There shouldn't be any testing required.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82487
Approved by: https://github.com/albanD
Generator comprehensions with any/all are less verbose and can save memory and CPU: https://eklitzke.org/generator-comprehensions-and-using-any-and-all-in-python
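For illustration (not code from the PR), the kind of rewrite this enables:
```python
import torch

tensors = [torch.randn(3) for _ in range(1000)]

# list comprehension: builds the full list before any() even looks at it
has_nan = any([torch.isnan(t).any() for t in tensors])

# generator comprehension: short-circuits on the first True and
# allocates no intermediate list
has_nan = any(torch.isnan(t).any() for t in tensors)
```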
To make JIT work with this change, I added code to convert GeneratorExp to ListComp. So the whole PR is effectively a no-op for JIT, but a potential memory and speed improvement for eager mode.
I also removed a test from test/jit/test_parametrization.py. The test had a TODO to actually implement it and only checked that UnsupportedNodeError is thrown; with GeneratorExp support, a different error would be thrown.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78142
Approved by: https://github.com/malfet, https://github.com/albanD
This is causing issues for users who keep the step on CUDA for a good reason.
These asserts cause code that used to run just fine to fail.
Note that keeping the step on CUDA is quite bad for performance, so it is ok to try to push users away from doing it.
For the 1.12.1 milestone: this is not asking for a dot release to fix this (as it is bad practice anyway), but it would be a great thing to include if we do one: it is very low risk and will prevent breakage for users.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80222
Approved by: https://github.com/jbschlosser, https://github.com/ngimel
Near term fix for https://github.com/pytorch/pytorch/issues/76368.
Q. Why does the user need to request `capturable=True` in the optimizer constructor? Why can't capture safety be completely automatic?
A. We need to set up capture-safe (device-side) state variables before capture. If we don't, and step() internally detects capture is underway, it's too late: the best we could do is create a device state variable and copy the current CPU value into it, which is not something we want baked into the graph.
Q. Ok, why not just do the capture-safe approach with device-side state variables all the time?
A. It incurs several more kernel launches per parameter, which could really add up and regress cpu overhead for ungraphed step()s. If the optimizer won't be captured, we should allow step() to stick with its current cpu-side state handling.
Q. But CUDA RNG is a stateful thing that maintains its state on the CPU outside of capture and replay, and we capture it automatically. Why can't we do the same thing here?
A. The graph object can handle RNG generator increments because its capture_begin, capture_end, and replay() methods can see and access the generator object. But the graph object has no explicit knowledge of or access to the optimizer steps in its capture scope. We could let the user tell the graph object which optimizers will be stepped in its scope, i.e. something like
```python
graph.will_use_optimizer(opt)
graph.capture_begin()
...
```
but that seems clunkier than an optimizer constructor arg.
I'm open to other ideas, but right now I think a constructor arg is necessary and the least bad approach.
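For context, a hedged sketch of how `capturable=True` fits into the whole-network capture workflow from torch.cuda.graphs (simplified; warmup and static-input handling abbreviated):
```python
import torch

params = [torch.randn(8, device="cuda", requires_grad=True)]
# capturable=True keeps step (and related state) as device tensors,
# so opt.step() is safe to capture into a CUDA graph.
opt = torch.optim.Adam(params, lr=1e-3, capturable=True)

# Warm up a few iterations on a side stream before capture.
s = torch.cuda.Stream()
s.wait_stream(torch.cuda.current_stream())
with torch.cuda.stream(s):
    for _ in range(3):
        opt.zero_grad(set_to_none=True)
        loss = (params[0] ** 2).sum()
        loss.backward()
        opt.step()
torch.cuda.current_stream().wait_stream(s)

# Capture forward, backward, and the optimizer step as one graph.
g = torch.cuda.CUDAGraph()
opt.zero_grad(set_to_none=True)
with torch.cuda.graph(g):
    static_loss = (params[0] ** 2).sum()
    static_loss.backward()
    opt.step()

g.replay()  # replays the whole captured training step
```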
Long term, https://github.com/pytorch/pytorch/issues/71274 is a better fix.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77862
Approved by: https://github.com/ezyang
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71333
Updated
- Adagrad
- Adamax
- Adam
- AdamW
- RAdam
Make the multi_tensor functionals take `state_steps: List[Tensor]` instead of `states: List[Dict]`.
Change `state_steps: List[int]` to `state_steps: List[Tensor]`, where each entry is a singleton tensor, so the step can be updated within the functional.
NAdam and ASGD were updated in separate diffs to fold their handling of state into the functionals.
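A hedged sketch of the change in calling convention (signatures abbreviated, hypothetical helper names; not the exact code in torch.optim._functional):
```python
from typing import List
import torch

# before: plain ints, so the caller has to bump the step count separately
def adam_step_before(params: List[torch.Tensor], state_steps: List[int],
                     beta1: float = 0.9, **kw):
    for p, step in zip(params, state_steps):
        bias_correction1 = 1 - beta1 ** step
        ...  # rest of the update elided

# after: singleton tensors, so the functional can increment the step
# in place and the optimizer's state dict stays consistent
def adam_step_after(params: List[torch.Tensor], state_steps: List[torch.Tensor],
                    beta1: float = 0.9, **kw):
    for p, step_t in zip(params, state_steps):
        step_t += 1                      # updated inside the functional
        bias_correction1 = 1 - beta1 ** step_t.item()
        ...  # rest of the update elided
```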
Test Plan: Imported from OSS
Reviewed By: anjali411
Differential Revision: D33767872
Pulled By: mikaylagawarecki
fbshipit-source-id: 9baa7cafb6375eab839917df9287c65a437891f2
(cherry picked from commit 831c02b3d0)
Summary:
Solves the next most important use case in https://github.com/pytorch/pytorch/issues/68052.
I have kept the style as close to that in SGD as seemed reasonable, given the slight differences in their internal implementations.
All feedback welcome!
cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68164
Reviewed By: VitalyFedyunin
Differential Revision: D32994129
Pulled By: albanD
fbshipit-source-id: 65c57c3f3dbbd3e3e5338d51def54482503e8850
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52944
This fixes a bug introduced while refactoring the optimizers in https://github.com/pytorch/pytorch/pull/50411. When all parameters have no grads, we should still allow `beta`-like hyperparameters to be defined.
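A hypothetical minimal repro of the scenario (not taken from the PR):
```python
import torch

param = torch.nn.Parameter(torch.zeros(4))   # never receives a gradient
opt = torch.optim.Adam([param], betas=(0.8, 0.99))

# Before the fix, stepping when no parameter has a grad tripped over the
# beta-style hyperparameter handling introduced by the refactor; after
# the fix this is simply a no-op.
opt.step()
```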
Reviewed By: ngimel
Differential Revision: D26699827
fbshipit-source-id: 8a7074127704c7a4a1fbc17d48a81e23a649f280
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51316
Make the optim functional API private until we release it as beta
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D26213469
fbshipit-source-id: b0fd001a8362ec1c152250bcd57c7205ed893107
Summary:
Adam and AdamW are missing parameter validation for `weight_decay`. Other optimizers already have this check.
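The added check mirrors the existing hyperparameter validations in the constructors, roughly:
```python
# in Adam/AdamW __init__, alongside the existing lr/eps/betas checks
if not 0.0 <= weight_decay:
    raise ValueError("Invalid weight_decay value: {}".format(weight_decay))
```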
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33126
Differential Revision: D19860366
Pulled By: vincentqb
fbshipit-source-id: 286d7dc90e2f4ccf6540638286d2fe17939648fc
Summary:
Apply weight decay for Adam in-place instead of via a copy.
Synced offline with soumith, who mentioned that it should be OK. This is also consistent with other optimizers, e.g. eee01731a5/torch/optim/sgd.py (L93)
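For illustration, the copy vs in-place difference (a sketch using the current keyword `alpha` overload, not the exact diff):
```python
import torch

p = torch.randn(3, requires_grad=True)
p.grad = torch.randn(3)
weight_decay = 1e-2

with torch.no_grad():  # optimizer steps run under no_grad
    # via copy: allocates a new tensor holding the decayed gradient
    grad = p.grad.add(p, alpha=weight_decay)

    # in-place: reuses p.grad's storage (matches sgd.py's handling)
    p.grad.add_(p, alpha=weight_decay)
```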
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12107
Reviewed By: soumith
Differential Revision: D10071787
Pulled By: jma127
fbshipit-source-id: 5fd7939c79039693b225c44c4c80450923b8d673
Summary:
Minor addition to the docstring of `torch.optim.Adam`, adding the default value description for the `amsgrad` argument for consistency.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9971
Differential Revision: D9040820
Pulled By: soumith
fbshipit-source-id: 168744a6bb0d1422331beffd7e694b9d6f61900c