This is a new version of #15648 based on the latest master branch.
Unlike the previous PR where I fixed a lot of the doctests in addition to integrating xdoctest, I'm going to reduce the scope here. I'm simply going to integrate xdoctest, and then I'm going to mark all of the failing tests as "SKIP". This will let xdoctest run on the dashboards, provide some value, and still let the dashboards pass. I'll leave fixing the doctests themselves to another PR.
In my initial commit, I do the bare minimum to get something running, with the dashboards still failing. The few tests that I marked as skip were causing segfaults. Running xdoctest results in 293 failed and 201 passed tests. The next commits will disable those failing tests. (Unfortunately, I don't have a tool that will insert the `# xdoctest: +SKIP` directive over every failing test, so I'm going to do this mostly manually.)
Fixes https://github.com/pytorch/pytorch/issues/71105
@ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82797
Approved by: https://github.com/ezyang
### Description
Across PyTorch's docstrings, both `callable` and `Callable` are used for variable types. `Callable` should be capitalized, as we are referring to the `Callable` type and not the Python `callable()` function.
### Testing
There shouldn't be any testing required.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82487
Approved by: https://github.com/albanD
### Description
PR #80336 introduced a new parameter to the Sparse Adam optimizer. The new parameter is accessed inside the `step` method of the optimizer. If we deserialize and run a version of the optimizer serialized before this change was introduced, it fails at the step that tries to access the missing parameter.
I have added a workaround that sets a default value in case the parameter is unavailable in the optimizer.
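A minimal sketch of that workaround, assuming the new parameter is the `maximize` flag from #80336 (the subclass here is purely illustrative; the real change lives in the optimizer itself):
```python
from torch.optim import SparseAdam

class PatchedSparseAdam(SparseAdam):
    def __setstate__(self, state):
        super().__setstate__(state)
        for group in self.param_groups:
            # older checkpoints predate the flag, so default it on load
            group.setdefault('maximize', False)
```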
### Testing
* Testing on PyTorch CI
* Manual validation against existing serialized models to make sure they continue to work
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82273
Approved by: https://github.com/mehtanirav, https://github.com/albanD
Adds the `differentiable` argument, a method for updating parameters in an existing optimizer, and a template for testing the differentiability of multiple optimizers.
This is all based on discussions with @albanD & @jbschlosser
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80938
Approved by: https://github.com/albanD
Generator comprehensions with any/all are less verbose and can potentially save memory/CPU: https://eklitzke.org/generator-comprehensions-and-using-any-and-all-in-python
To make JIT work with this change, I added code to convert GeneratorExp to ListComp. So the whole PR is basically a no-op for JIT, but a potential memory and speed improvement for eager mode.
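For illustration, the pattern this moves toward: `any`/`all` consume a generator expression directly, so no intermediate list is materialized:
```python
values = range(10**6)
all([v >= 0 for v in values])  # builds a million-element list first
all(v >= 0 for v in values)    # evaluates lazily and can short-circuit
```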
I also removed a test from test/jit/test_parametrization.py. The test was a placeholder (it had a TODO to actually implement it) and only checked that UnsupportedNodeError is thrown; with GeneratorExp support, a different error would be thrown.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78142
Approved by: https://github.com/malfet, https://github.com/albanD
This is causing issues for users who have the step on CUDA for a good reason.
These asserts cause code that used to run just fine to fail.
Note that keeping the step on CUDA is a pretty bad thing to do for performance, so it is OK to try and push users away from doing it.
For the 1.12.1 milestone: this is not asking for a dot release to fix this (as this is bad practice anyway), but it would be a great thing to include if we do one: it is very low risk and will prevent breakage for users.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80222
Approved by: https://github.com/jbschlosser, https://github.com/ngimel
What was happening is that when we have multiple learning rate schedulers, the order in which they are initialized is not taken into account. This is a problem if they are initialized in sequential order (as one might intuitively do).
Each scheduler calls `step()` on initialization and sets the `lr` in its optimizer's `param_groups`. However, this means that step 0 will use the `lr` that was set by the very last scheduler (in the case of initializing schedulers sequentially) instead of the first scheduler.
The fix in this PR addresses the above bug by performing a call to the appropriate scheduler on initialization, after decrementing the `last_epoch` values in order to keep them the same post-step. This ensures that the correct scheduler is the one setting the `lr` values for the optimizer's `param_groups`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72856
Approved by: https://github.com/jbschlosser
### Goal
Fixes https://github.com/pytorch/pytorch/issues/79720
### Approach
Replace `Chains list of learning rate schedulers. It takes a list of chainable learning rate schedulers and performs consecutive step() functions` **`belong`** `to them by just one call.` with `Chains list of learning rate schedulers. It takes a list of chainable learning rate schedulers and performs consecutive step() functions` **`belonging`** `to them by just one call.`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79775
Approved by: https://github.com/albanD
Near term fix for https://github.com/pytorch/pytorch/issues/76368.
Q. Why does the user need to request `capturable=True` in the optimizer constructor? Why can't capture safety be completely automatic?
A. We need to set up capture-safe (device-side) state variables before capture. If we don't, and step() internally detects capture is underway, it's too late: the best we could do is create a device state variable and copy the current CPU value into it, which is not something we want baked into the graph.
Q. Ok, why not just do the capture-safe approach with device-side state variables all the time?
A. It incurs several more kernel launches per parameter, which could really add up and regress cpu overhead for ungraphed step()s. If the optimizer won't be captured, we should allow step() to stick with its current cpu-side state handling.
Q. But cuda RNG is a stateful thing that maintains its state on the cpu outside of capture and replay, and we capture it automatically. Why can't we do the same thing here?
A. The graph object can handle RNG generator increments because its capture_begin, capture_end, and replay() methods can see and access the generator object. But the graph object has no explicit knowledge of or access to optimizer steps in its capture scope. We could let the user tell the graph object what optimizers will be stepped in its scope, i.e. something like
```python
graph.will_use_optimizer(opt)
graph.capture_begin()
...
```
but that seems clunkier than an optimizer constructor arg.
I'm open to other ideas, but right now I think constructor arg is necessary and the least bad approach.
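A hedged usage sketch of the flag (assuming the Adam-style constructor this PR touches); the point is that device-side state is set up ahead of capture:
```python
import torch

params = [torch.zeros(4, device='cuda', requires_grad=True)]
opt = torch.optim.Adam(params, lr=1e-3, capturable=True)
```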
Long term, https://github.com/pytorch/pytorch/issues/71274 is a better fix.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77862
Approved by: https://github.com/ezyang
Fixes #60265
The initial LR for this scheduler is not consistent when a new instance is created with `last_epoch != -1`.
Maybe we can refactor the testing code to test `last_epoch != -1` in schedulers that can recreate their state from the current epoch?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60339
Approved by: https://github.com/albanD
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73215
Fixes an issue in the optimizers from `_multi_tensor`, for `sgd_mt`, introduced in 2cb03e926f
Reviewed By: mikaylagawarecki
Differential Revision: D34389034
fbshipit-source-id: ede153d52dca15909c6c022853589707f18dc8d1
(cherry picked from commit cc8a58e584)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69980
- Merged `torch/optim/adadelta.py` and `torch/optim/_multitensor/adadelta.py` into `torch/optim/adadelta.py`
- Moved adadelta functional forms from `torch/optim/_functional.py` and `torch/optim/_multi_tensor/_functional.py` to `torch/optim/adadelta.py`
- `torch/optim/_functional.py` just imports from `torch/optim/adadelta.py`
- Added a test `test_optimizers_foreach_flag` which replicates `test_multi_tensor_optimizers` in `test/test_optim.py`
- Add a test `test_adadelta_new` that replicates the behavior of `test_adadelta` but with the `foreach` flag instead of using the multi-tensor adadelta class. If we delete `_multi_tensor/`, we could replace `test_adadelta` with this.
Remaining TODO:
- [ ] single-tensor adadelta supports complex but multi-tensor does not; we need to integrate the single-tensor logic into multi-tensor and switch `test_adadelta_complex` to test `foreach` in [True, False]
Test Plan: Imported from OSS
Reviewed By: VitalyFedyunin, albanD
Differential Revision: D33413059
Pulled By: mikaylagawarecki
fbshipit-source-id: 92a9fa98705762bb6bd464261671e49aef40070e
(cherry picked from commit a008227d22)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71333
Updated
- Adagrad
- Adamax
- Adam
- AdamW
- RAdam
Make the multi_tensor functionals take `state_steps: List[Tensor]` instead of `states: List[Dict]`.
Change `state_steps: List[int]` to `state_steps: List[Tensor]`, where each is a singleton tensor, so the step can be updated within the functional.
(NAdam and ASGD were updated in separate diffs to fold their handling of state into the functionals.)
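A hedged sketch of the new convention (illustrative shapes and names, not the exact upstream code): each step counter becomes a singleton tensor so the functional itself can bump it in place:
```python
import torch

state_steps = [torch.zeros((1,)) for _ in range(3)]  # was: List[int]
torch._foreach_add_(state_steps, 1)  # the functional updates steps in place
```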
Test Plan: Imported from OSS
Reviewed By: anjali411
Differential Revision: D33767872
Pulled By: mikaylagawarecki
fbshipit-source-id: 9baa7cafb6375eab839917df9287c65a437891f2
(cherry picked from commit 831c02b3d0)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69936
Currently, the optimizers in `torch/optim/_multi_tensor/` all override the base Optimizer class' implementation of `zero_grad` with the same foreach zero_grad implementation (e.g. [here](https://github.com/pytorch/pytorch/blob/master/torch/optim/_multi_tensor/adadelta.py#L93-L114)). There is a TODO that indicates that this should be refactored to the base class once the foreach ops are in good shape. This PR is intended to address that TODO.
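A hedged sketch of the shared foreach `zero_grad` being hoisted into the base class (per-device/dtype grouping omitted for brevity; names are illustrative):
```python
import torch

def foreach_zero_grad(param_groups):
    # gather every existing grad across all groups
    grads = [p.grad for group in param_groups
             for p in group['params'] if p.grad is not None]
    for g in grads:
        g.detach_()  # drop autograd history before zeroing
    torch._foreach_zero_(grads)  # zero all grads with one foreach call
```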
Test Plan: Imported from OSS
Reviewed By: mrshenli
Differential Revision: D33346748
Pulled By: mikaylagawarecki
fbshipit-source-id: 6573f4776aeac757b6a778894681868191a1b4c7
Summary:
- ~optimizer isn't required for `SequentialLR` since it's already present in the schedulers. Trying to match the signature of it with `ChainedScheduler`.~
- ~`verbose` isn't really used anywhere so removed it.~
Updated missing docs and added a small check.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69817
Reviewed By: ngimel
Differential Revision: D33069589
Pulled By: albanD
fbshipit-source-id: f015105a35a2ca39fe94c70acdfd55cdf5601419
Summary:
Solves the next most important use case in https://github.com/pytorch/pytorch/issues/68052.
I have kept the style as close to that in SGD as seemed reasonable, given the slight differences in their internal implementations.
All feedback welcome!
cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68164
Reviewed By: VitalyFedyunin
Differential Revision: D32994129
Pulled By: albanD
fbshipit-source-id: 65c57c3f3dbbd3e3e5338d51def54482503e8850
Summary:
After the 'maximize' flag was introduced in https://github.com/pytorch/pytorch/issues/46480, some jobs fail because they resume training from old checkpoints.
After we load an old checkpoint, we get an error during the optimizer.step() call in the backward pass (`torch/optim/sgd.py`, line 129) because there is no key 'maximize' in the parameter groups of the SGD optimizer.
To circumvent this, I add a default value `group.setdefault('maximize', False)` when the optimizer state is restored.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68733
Reviewed By: albanD
Differential Revision: D32480963
Pulled By: asanakoy
fbshipit-source-id: 4e367fe955000a6cb95090541c143a7a1de640c2
Summary:
Fixes https://github.com/pytorch/pytorch/issues/67601.
As simple a fix as I could make it. I even managed to delete some testing code!
I checked calling `super()` and, as I had feared, it doesn't work out of the box, so perhaps that ought to be revisited later.
As it stands, https://github.com/pytorch/pytorch/issues/20124, still applies to the chained scheduler, but I think this change is still an improvement.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68010
Reviewed By: zou3519
Differential Revision: D32278139
Pulled By: albanD
fbshipit-source-id: 4c6f9f1b2822affdf63a6d22ddfdbcb1c6afd579
Summary:
Fixes https://github.com/pytorch/pytorch/issues/46480 -- for SGD.
## Notes:
- I have modified the existing tests to take a new `constructor_accepts_maximize` flag. When this is set to true, the `_test_basic_cases_template` function will test both maximizing and minimizing the sample function.
- This was the clearest way I could think of testing the changes -- I would appreciate feedback on this strategy.
## Work to be done:
- [ ] I need to update the docs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67847
Reviewed By: H-Huang
Differential Revision: D32252631
Pulled By: albanD
fbshipit-source-id: 27915a3cc2d18b7e4d17bfc2d666fe7d2cfdf9a4
Summary:
The final learning rate should be 0.05, matching the lr passed as the argument to the optimizer, not 0.005.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67840
Reviewed By: jbschlosser
Differential Revision: D32187091
Pulled By: albanD
fbshipit-source-id: 8aff691bba3896a847d7b9d9d669a65f67a6f066
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66587
Made some changes in the step function of the non-vectorized Adadelta optimizer to handle complex numbers as two real numbers, as per issue #65711 on GitHub.
ghstack-source-id: 141484731
Test Plan:
buck test mode/dev caffe2/test:optim -- 'test_adadelta_complex'
https://pxl.cl/1R7kJ
Reviewed By: albanD
Differential Revision: D31630069
fbshipit-source-id: 2741177b837960538ce39772897af36bbce7b7d8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66671
Made changes in the step function of the vectorized and non-vectorized Adagrad optimizers to handle complex numbers as two real numbers, as per issue #65711 on GitHub.
ghstack-source-id: 141442350
Test Plan:
buck test mode/dev caffe2/test:optim -- 'test_adagrad_complex'
https://pxl.cl/1Rd44
Reviewed By: albanD
Differential Revision: D31673503
fbshipit-source-id: 90a0d0c69b556716e2d17c59ce80f09c750fc464
Summary:
## 🐛 Bug
'CosineAnnealingWarmRestarts' object has no attribute 'T_cur'.
In the constructor of CosineAnnealingWarmRestarts, we call the constructor of the parent class (`_LRScheduler`), which in turn calls the `step` method of CosineAnnealingWarmRestarts.
The called method tries to update the object's attribute `T_cur`, which is not defined yet, so it raises the error.
This only happens when we pass a value of 0 or greater for the last_epoch argument while initializing the object.

## To Reproduce
Steps to reproduce the behavior:
1. Give the value for the last_epoch argument as zero, OR
2. Give the value for the last_epoch argument as a positive integer.
## Expected behavior
I only expected the 'CosineAnnealingWarmRestarts' object to be initialized.
## Environment
PyTorch version: 1.9.0+cpu
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A
OS: Ubuntu 20.04.2 LTS (x86_64)
GCC version: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Clang version: Could not collect
CMake version: version 3.21.2
Libc version: glibc-2.31
Python version: 3.8.10 [GCC 9.4.0] (64-bit runtime)
Python platform: Linux-5.8.0-59-generic-x86_64-with-glibc2.29
Is CUDA available: False
CUDA runtime version: No CUDA
## Additional context
We are able to solve this bug by moving the line `self.T_cur = self.last_epoch` above the `super(CosineAnnealingWarmRestarts, self).__init__()` line, since this initializes `self.T_cur` on the object before the parent constructor calls `step()`.
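A hedged sketch of the reordering (constructor body abbreviated to the relevant attributes):
```python
from torch.optim.lr_scheduler import _LRScheduler

class CosineAnnealingWarmRestarts(_LRScheduler):
    def __init__(self, optimizer, T_0, T_mult=1, eta_min=0, last_epoch=-1):
        self.T_0 = T_0
        self.T_mult = T_mult
        self.eta_min = eta_min
        self.T_cur = last_epoch  # must exist before super().__init__ calls step()
        super(CosineAnnealingWarmRestarts, self).__init__(optimizer, last_epoch)
```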
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64758
Reviewed By: ezyang
Differential Revision: D31113694
Pulled By: jbschlosser
fbshipit-source-id: 98c0e292291775895dc3566fda011f2d6696f721
Summary:
Partially resolves https://github.com/pytorch/vision/issues/4281
In this PR we propose a new scheduler --SequentialLR-- which enables a list of different schedulers to be called in different periods of the training process.
The main motivation for this scheduler is the recently gained popularity of a warm-up phase at training time. It has been shown that taking small steps in the initial stages of training can help convergence proceed faster.
With the help of SequentialLR we mainly enable calling a small constant (or linearly increasing) learning rate followed by the actual target learning rate scheduler.
```python
scheduler1 = ConstantLR(optimizer, factor=0.1, total_iters=2)
scheduler2 = ExponentialLR(optimizer, gamma=0.9)
scheduler = SequentialLR(optimizer, schedulers=[scheduler1, scheduler2], milestones=[5])
for epoch in range(100):
    train(...)
    validate(...)
    scheduler.step()
```
This code snippet will call `ConstantLR` in the first 5 epochs and will follow up with `ExponentialLR` in the following epochs.
This scheduler can be used to call any group of schedulers one after another. The main consideration to make is that every time we switch to a new scheduler, we assume that the new scheduler starts from the beginning - the zeroth epoch.
We also add ChainedScheduler to the `optim.rst` and `lr_scheduler.pyi` files here.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64037
Reviewed By: albanD
Differential Revision: D30841099
Pulled By: iramazanli
fbshipit-source-id: 94f7d352066ee108eef8cda5f0dcb07f4d371751
Summary:
It has been discussed before that adding descriptions of optimization algorithms to the PyTorch core documentation could amount to a nice optimization research tutorial. In the following tracking issue we list all the necessary algorithms and link to the originally published papers: https://github.com/pytorch/pytorch/issues/63236.
In this PR we add the description of Stochastic Gradient Descent to the documentation.
<img width="466" alt="SGDalgo" src="https://user-images.githubusercontent.com/73658284/132585881-b351a6d4-ece0-4825-b9c0-126d7303ed53.png">
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63805
Reviewed By: albanD
Differential Revision: D30818947
Pulled By: iramazanli
fbshipit-source-id: 3812028e322c8a64f4343552b0c8c4582ea382f3
Summary:
Partially unblocks https://github.com/pytorch/vision/issues/4281
Previously we added warm-up schedulers to PyTorch core in PR https://github.com/pytorch/pytorch/pull/60836, which had two modes of execution - linear and constant - depending on the warming-up function.
In this PR we change this interface to a more direct form, separating the linear and constant modes into separate schedulers. In particular,
```python
scheduler1 = WarmUpLR(optimizer, warmup_factor=0.1, warmup_iters=5, warmup_method="constant")
scheduler2 = WarmUpLR(optimizer, warmup_factor=0.1, warmup_iters=5, warmup_method="linear")
```
will look like
```python
scheduler1 = ConstantLR(optimizer, warmup_factor=0.1, warmup_iters=5)
scheduler2 = LinearLR(optimizer, warmup_factor=0.1, warmup_iters=5)
```
correspondingly.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64395
Reviewed By: datumbox
Differential Revision: D30753688
Pulled By: iramazanli
fbshipit-source-id: e47f86d12033f80982ddf1faf5b46873adb4f324
Summary:
In this PR we introduce ChainedScheduler, which was initially proposed in the discussion https://github.com/pytorch/pytorch/pull/26423#discussion_r329976246 .
The idea is to provide a user-friendly chaining method for schedulers, especially for cases where many of them are involved and we want a clean and easy-to-read interface for them. This method will be even more crucial once CompositeSchedulers and schedulers for different types of parameters are involved.
The immediate application of ChainedScheduler is expected to be in the TorchVision library, to combine WarmUpLR and MultiStepLR: https://github.com/pytorch/vision/blob/master/references/video_classification/scheduler.py#L5 . However, it can be expected that this method could be applied in many other use cases as well.
### Example
The usage is as simple as below:
```python
sched = ChainedScheduler([ExponentialLR(self.opt, gamma=0.9),
                          WarmUpLR(self.opt, warmup_factor=0.2, warmup_iters=4, warmup_method="constant"),
                          StepLR(self.opt, gamma=0.1, step_size=3)])
```
Then calling
```python
sched.step()
```
would trigger the step function of all three schedulers consecutively.
Partially resolves https://github.com/pytorch/vision/issues/4281
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63491
Reviewed By: datumbox, mruberry
Differential Revision: D30576180
Pulled By: iramazanli
fbshipit-source-id: b43f0749f55faab25079641b7d91c21a891a87e4
Summary:
It has been discussed in https://github.com/pytorch/pytorch/pull/60836#issuecomment-899084092 that we have observed an obstacle to chaining some types of learning rate schedulers. In particular, we observed:
* some of the learning rate schedulers return their initial learning rates at epoch 0 as
```
return self.base_lrs
```
* This can be a problem when two schedulers are chained as
```
scheduler1.step()
scheduler2.step()
```
in particular, we completely ignore the effect of scheduler1 at epoch 0. This would not be an issue if scheduler1 were ineffective at epoch 0, as is the case for many schedulers; however, for schedulers such as warm-up schedulers, whose multiplicative value at epoch 0 is smaller than 1, this can lead to undesired behavior.
The following code snippet illustrates the problem better
## Reproducing the bug
```python
import torch
from torch.nn import Parameter
from torch.optim import SGD
from torch.optim.lr_scheduler import WarmUpLR, ExponentialLR
model = [Parameter(torch.randn(2, 2, requires_grad=True))]
optimizer = SGD(model, 1.0)
scheduler1 = WarmUpLR(optimizer, warmup_factor=0.1, warmup_iters=5, warmup_method="constant")
scheduler2 = ExponentialLR(optimizer, gamma=0.9)
for epoch in range(10):
    print(epoch, scheduler2.get_last_lr()[0])
    optimizer.step()
    scheduler1.step()
    scheduler2.step()
```
### Current Result
```
0 1.0
1 0.9
2 0.81
3 0.7290000000000001
4 0.6561000000000001
5 5.904900000000001
6 5.314410000000001
7 4.782969000000001
8 4.304672100000001
9 3.874204890000001
```
### Expected Result
```
0 1.0
1 0.9
2 0.81
3 0.7290000000000001
4 0.6561000000000001
5 0.5904900000000001
6 0.5314410000000001
7 0.4782969000000001
8 0.4304672100000001
9 0.3874204890000001
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63457
Reviewed By: datumbox
Differential Revision: D30424160
Pulled By: iramazanli
fbshipit-source-id: 3e15af8d278c872cd6f53406b55f4d3ce5002867
Summary:
Warm-up of learning rate scheduling was initially discussed by Priya et al. in the paper https://arxiv.org/pdf/1706.02677.pdf .
In section 2.2 of the paper, they discuss and propose the idea of warming up learning rate schedules in order to prevent large variance / noise in the learning rate. The idea has been further discussed in the following papers:
* Akilesh Gotmare et al. https://arxiv.org/abs/1810.13243
* Bernstein et al http://proceedings.mlr.press/v80/bernstein18a/bernstein18a.pdf
* Liyuan Liu et al: https://arxiv.org/pdf/1908.03265.pdf
There are two types of popularly used learning rate warm-up:
* Constant warm-up (start with a very small constant learning rate)
* Linear warm-up (start with a small learning rate and gradually increase it)
In this PR we add warm-up as a learning rate scheduler. Note that learning rate schedulers are chainable, which means that we can merge the warm-up scheduler with any other learning rate scheduler to make a more sophisticated schedule.
## Linear Warmup
Linear warm-up multiplies the learning rate by a pre-defined constant, warmup_factor, in the first epoch (epoch 0), then increases this multiplicative constant towards one over warmup_iters epochs. Hence the multiplicative constant at the i-th step is:
warmup_factor + (1 - warmup_factor) * i / warmup_iters
Moreover, the ratio of this quantity at point i to point i-1 gives:
1 + (1 - warmup_factor) / [warmup_iters * warmup_factor + (i - 1) * (1 - warmup_factor)]
which is used in the get_lr() method of our implementation. Below we provide an example of how to use the linear warm-up scheduler and show how it works.
```python
import torch
from torch.nn import Parameter
from torch.optim import SGD
from torch.optim.lr_scheduler import WarmUpLR
model = [Parameter(torch.randn(2, 2, requires_grad=True))]
optimizer = SGD(model, 0.1)
scheduler = WarmUpLR(optimizer, warmup_factor=0.1, warmup_iters=10, warmup_method="linear")
for epoch in range(15):
    print(epoch, scheduler.get_last_lr()[0])
    optimizer.step()
    scheduler.step()
```
```
0 0.010000000000000002
1 0.019000000000000003
2 0.028000000000000008
3 0.03700000000000001
4 0.04600000000000001
5 0.055000000000000014
6 0.06400000000000002
7 0.07300000000000002
8 0.08200000000000003
9 0.09100000000000004
10 0.10000000000000005
11 0.10000000000000005
12 0.10000000000000005
13 0.10000000000000005
14 0.10000000000000005
```
## Constant Warmup
The idea of constant warm-up is straightforward: multiply the learning rate by warmup_factor until we reach epoch warmup_iters, then do nothing for the following epochs.
```python
import torch
from torch.nn import Parameter
from torch.optim import SGD
from torch.optim.lr_scheduler import WarmUpLR
model = [Parameter(torch.randn(2, 2, requires_grad=True))]
optimizer = SGD(model, 0.1)
scheduler = WarmUpLR(optimizer, warmup_factor=0.1, warmup_iters=5, warmup_method="constant")
for epoch in range(10):
    print(epoch, scheduler.get_last_lr()[0])
    optimizer.step()
    scheduler.step()
```
```
0 0.010000000000000002
1 0.010000000000000002
2 0.010000000000000002
3 0.010000000000000002
4 0.010000000000000002
5 0.10000000000000002
6 0.10000000000000002
7 0.10000000000000002
8 0.10000000000000002
9 0.10000000000000002
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60836
Reviewed By: saketh-are
Differential Revision: D29537615
Pulled By: iramazanli
fbshipit-source-id: d910946027acc52663b301f9c56ade686e62cb69
Summary:
Fixes https://github.com/pytorch/pytorch/issues/59998
It has been discussed in the issue that the variance term of the Adam optimizer is currently not computed correctly for the complex domain. As stated in the "Generalization to complex random variables" section of https://en.wikipedia.org/wiki/Variance, the variance of a complex random variable X is computed as E[(X - mu)(X - mu)*] (where mu = E[X] and * denotes the conjugate).
However, the current computation in the implementation of Adam is E[(X - mu)(X - mu)], which doesn't return the right variance value; in particular, it returns a complex number. Variance is defined to be a real number even when the underlying random variable is complex.
We fix this issue here, and test that the resulting variance is indeed a real number.
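A minimal sketch of the corrected second-moment update, assuming the usual Adam buffer names (`exp_avg_sq`, `beta2`): multiplying `grad` by its conjugate keeps the accumulated variance real-valued:
```python
import torch

beta2 = 0.999
grad = torch.randn(3, dtype=torch.complex64)
exp_avg_sq = torch.zeros_like(grad)  # complex buffer, but values stay real
# E[(X - mu)(X - mu)*]: multiply by the conjugate, not by grad itself
exp_avg_sq.mul_(beta2).addcmul_(grad, grad.conj(), value=1 - beta2)
assert torch.all(exp_avg_sq.imag == 0)
```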
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62946
Reviewed By: albanD
Differential Revision: D30196038
Pulled By: iramazanli
fbshipit-source-id: ab0a6f31658aeb56bdcb211ff86eaa29f3f0d718
Summary:
Previously, in PR https://github.com/pytorch/pytorch/issues/58968, we added RAdam to the optimizers. Here we propose a multi-tensor version of RAdam for PyTorch.
RAdam was proposed in the paper https://arxiv.org/pdf/1908.03265.pdf by Liyuan Liu et al.
It has been one of the most used algorithms in the deep learning community.
Differing from the paper, we selected the variance tractability cut-off as 5 instead of 4, as is the common practice.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59161
Reviewed By: vincentqb
Differential Revision: D29360576
Pulled By: iramazanli
fbshipit-source-id: 7ccdbf12b1ee7f12e66f7d7992123a70cc818b6b
Summary:
Fixes: https://github.com/pytorch/pytorch/issues/24892
In the paper https://arxiv.org/pdf/1908.03265.pdf, Liyuan Liu et al. suggest a new optimization algorithm similar in essence to the Adam algorithm.
It is discussed in the paper that, without a warm-up heuristic, adaptive optimization / learning algorithms can sometimes exhibit undesirably large variance in their early stages, which can slow the overall convergence process.
The authors propose rectifying the variance of the adaptive learning rate when it is expected to be high.
Differing from the paper, we selected the variance tractability cut-off as 5 instead of 4. This adjustment is common practice, and can be found in the reference code repository as well as the TensorFlow Swift optim library:
2f03dd1970/radam/radam.py (L156)
f51ee4618d/Sources/TensorFlow/Optimizers/MomentumBased.swift (L638)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58968
Reviewed By: vincentqb
Differential Revision: D29310601
Pulled By: iramazanli
fbshipit-source-id: b7bd487f72f1074f266687fd9c0c6be264a748a9
Summary:
Fixes: https://github.com/pytorch/pytorch/issues/5804
In the paper https://openreview.net/forum?id=OM0jvwB8jIp57ZJjtNEZ, Timothy Dozat suggests a new optimization algorithm that is essentially a combination of the NAG and Adam algorithms.
It is known that the idea of momentum can be improved with Nesterov acceleration in optimization algorithms, and Dozat investigates applying this idea to the momentum component of the Adam algorithm. The author provides experimental evidence in the paper to show the strength of the idea.
In this PR we implement the NAdam algorithm proposed in the paper. The author has a preliminary work, http://cs229.stanford.edu/proj2015/054_report.pdf, where he shows the decay base constant should be taken as 0.96; we follow the same convention here, similar to Keras. Moreover, implementation / coding practice follows Keras in some other places as well:
f9d3868495/tensorflow/python/keras/optimizer_v2/nadam.py
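A hedged sketch of how the 0.96 decay base enters the momentum schedule (using the psi = 0.004 momentum-decay convention from the preliminary report; names are illustrative, not the exact PyTorch internals):
```python
beta1, psi = 0.9, 0.004

def mu(t: int) -> float:
    # momentum coefficient at step t with decay base 0.96
    return beta1 * (1.0 - 0.5 * 0.96 ** (t * psi))
```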
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59009
Reviewed By: gchanan, vincentqb
Differential Revision: D29220375
Pulled By: iramazanli
fbshipit-source-id: 4b4bb4b15f7e16f7527f368bbf4207ed345751aa
Summary:
Fixes: https://github.com/pytorch/pytorch/issues/24892
In the paper https://arxiv.org/pdf/1908.03265.pdf, Liyuan Liu et al. suggest a new optimization algorithm similar in essence to the Adam algorithm.
It is discussed in the paper that, without a warm-up heuristic, adaptive optimization / learning algorithms can sometimes exhibit undesirably large variance in their early stages, which can slow the overall convergence process.
The authors propose rectifying the variance of the adaptive learning rate when it is expected to be high.
Differing from the paper, we selected the variance tractability cut-off as 5 instead of 4. This adjustment is common practice, and can be found in the reference code repository as well as the TensorFlow Swift optim library:
2f03dd1970/radam/radam.py (L156)
f51ee4618d/Sources/TensorFlow/Optimizers/MomentumBased.swift (L638)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58968
Reviewed By: gchanan
Differential Revision: D29241736
Pulled By: iramazanli
fbshipit-source-id: 288b9b1f3125fdc6c7a7bb23fde1ea5c201c0448
Summary:
The functional API is used in large-scale distributed training to enable multithreaded training instead of multiprocess training, as it gives more optimal resource utilization and efficiency.
In this PR, we provide code migration and refactoring of the functional API for the ASGD algorithm.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58410
Reviewed By: ailzhang
Differential Revision: D28546702
Pulled By: iramazanli
fbshipit-source-id: 4f62b6037d53f35b19f98340e88af2ebb6243a4f
Summary:
Related to https://github.com/pytorch/pytorch/issues/52256
Use autosummary instead of autofunction to create subpages for optim and cuda functions/classes.
Also fix some minor formatting issues in the optim.LBFGS and cuda.stream docstrings.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55673
Reviewed By: jbschlosser
Differential Revision: D27747741
Pulled By: zou3519
fbshipit-source-id: 070681f840cdf4433a44af75be3483f16e5acf7d
Summary:
*Context:* https://github.com/pytorch/pytorch/issues/53406 added a lint for trailing whitespace at the ends of lines. However, in order to pass FB-internal lints, that PR also had to normalize the trailing newlines in four of the files it touched. This PR adds an OSS lint to normalize trailing newlines.
The changes to the following files (made in 54847d0adb9be71be4979cead3d9d4c02160e4cd) are the only manually-written parts of this PR:
- `.github/workflows/lint.yml`
- `mypy-strict.ini`
- `tools/README.md`
- `tools/test/test_trailing_newlines.py`
- `tools/trailing_newlines.py`
I would have liked to make this just a shell one-liner like the other three similar lints, but nothing I could find quite fit the bill. Specifically, all the answers I tried from the following Stack Overflow questions were far too slow (at least a minute and a half to run on this entire repository):
- [How to detect file ends in newline?](https://stackoverflow.com/q/38746)
- [How do I find files that do not end with a newline/linefeed?](https://stackoverflow.com/q/4631068)
- [How to list all files in the Git index without newline at end of file](https://stackoverflow.com/q/27624800)
- [Linux - check if there is an empty line at the end of a file [duplicate]](https://stackoverflow.com/q/34943632)
- [git ensure newline at end of each file](https://stackoverflow.com/q/57770972)
To avoid giving false positives during the few days after this PR is merged, we should probably only merge it after https://github.com/pytorch/pytorch/issues/54967.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54737
Test Plan:
Running the shell script from the "Ensure correct trailing newlines" step in the `quick-checks` job of `.github/workflows/lint.yml` should print no output and exit in a fraction of a second with a status of 0. That was not the case prior to this PR, as shown by this failing GHA workflow run on an earlier draft of this PR:
- https://github.com/pytorch/pytorch/runs/2197446987?check_suite_focus=true
In contrast, this run (after correcting the trailing newlines in this PR) succeeded:
- https://github.com/pytorch/pytorch/pull/54737/checks?check_run_id=2197553241
To unit-test `tools/trailing_newlines.py` itself (this is run as part of our "Test tools" GitHub Actions workflow):
```
python tools/test/test_trailing_newlines.py
```
Reviewed By: malfet
Differential Revision: D27409736
Pulled By: samestep
fbshipit-source-id: 46f565227046b39f68349bbd5633105b2d2e9b19
Summary:
Context: https://github.com/pytorch/pytorch/pull/53299#discussion_r587882857
These are the only hand-written parts of this diff:
- the addition to `.github/workflows/lint.yml`
- the file endings changed in these four files (to appease FB-internal land-blocking lints):
- `GLOSSARY.md`
- `aten/src/ATen/core/op_registration/README.md`
- `scripts/README.md`
- `torch/csrc/jit/codegen/fuser/README.md`
The rest was generated by running this command (on macOS):
```
git grep -I -l ' $' -- . ':(exclude)**/contrib/**' ':(exclude)third_party' | xargs gsed -i 's/ *$//'
```
I looked over the auto-generated changes and didn't see anything that looked problematic.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53406
Test Plan:
This run (after adding the lint but before removing existing trailing spaces) failed:
- https://github.com/pytorch/pytorch/runs/2043032377
This run (on the tip of this PR) succeeded:
- https://github.com/pytorch/pytorch/runs/2043296348
Reviewed By: walterddr, seemethere
Differential Revision: D26856620
Pulled By: samestep
fbshipit-source-id: 3f0de7f7c2e4b0f1c089eac9b5085a58dd7e0d97
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52944
This fixes the bug introduced while refactoring optimizers in https://github.com/pytorch/pytorch/pull/50411. When all parameters have no grads, we should still allow `beta`-like hyperparameters to be defined.
Reviewed By: ngimel
Differential Revision: D26699827
fbshipit-source-id: 8a7074127704c7a4a1fbc17d48a81e23a649f280
Summary:
Fixes https://github.com/pytorch/pytorch/issues/52055
This fixes the **out of memory error** when using update_bn in **SWA** by not allocating memory for backpropagation.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52654
Reviewed By: malfet
Differential Revision: D26620077
Pulled By: albanD
fbshipit-source-id: 890b5a78ba9c1a148f3ab7c63472a73d8f6412a4
Summary:
Change `avg_fun -> avg_fn` to match the spelling in the `.py` file.
(`swa_utils.pyi` should match `swa_utils.py`)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51608
Reviewed By: glaringlee
Differential Revision: D26224779
Pulled By: zou3519
fbshipit-source-id: 01ff7173ba0a996f1b7a653438acb6b6b4659de6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51316
Make the optim functional API private until we release it as beta.
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D26213469
fbshipit-source-id: b0fd001a8362ec1c152250bcd57c7205ed893107
Summary:
Fixes https://github.com/pytorch/pytorch/issues/47441
To give users more information about Python-level functions in profiler traces, we propose to instrument the following functions:
```
_BaseDataLoaderIter.__next__
Optimizer.step
Optimizer.zero_grad
```
Because record_function already uses `if (!active)` to check whether the profiler is enabled, we don't explicitly call torch.autograd._profiler_enabled() before each instrumentation.
Acknowledgement: nbcsm, guotuofeng, gunandrose4u , guyang3532 , mszhanyi
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47655
Reviewed By: smessmer
Differential Revision: D24960386
Pulled By: ilia-cher
fbshipit-source-id: 2eb655789e2e2f506e1b8f95ad3d470c83281102
Summary:
Fixes https://github.com/pytorch/pytorch/issues/46405, https://github.com/pytorch/pytorch/issues/43352
I updated the docstring in the local file (function-level comments). Do I also need to edit somewhere else or recompile the docstrings?
Also, though I didn't change any types here, how is typing (for IDE type checking) documentation generated / used?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46813
Reviewed By: ezyang
Differential Revision: D24923112
Pulled By: vincentqb
fbshipit-source-id: be7818e0d4593bfc5d74023b9c361ac2a538589a
Summary:
Refactor foreach APIs to use overloads in case of scalar list inputs.
Tested via unit tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45673
Reviewed By: heitorschueroff
Differential Revision: D24053424
Pulled By: izdeby
fbshipit-source-id: 35976cc50b4acfe228a32ed26cede579d5621cde
Summary:
Fixes https://github.com/pytorch/pytorch/issues/40362
The new `three_phase` option provides a way of constructing schedules according to the scheme recommended in [Super-Convergence: Very Fast Training of Neural Networks Using Large Learning Rates](https://arxiv.org/abs/1708.07120).
Note that this change maintains backwards compatibility, and as a result the default behaviour of OneCycleLR remains quite counter-intuitive.
vincentqb
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42715
Reviewed By: heitorschueroff
Differential Revision: D24289744
Pulled By: vincentqb
fbshipit-source-id: e4aad87880716bb14613c0aa8631e43b04a93e5c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45221
This PR introduces a distributed functional optimizer, so that the distributed optimizer can reuse the functional optimizer APIs and maintain its own states. This enables a TorchScript-compatible functional optimizer when using the distributed optimizer, which helps get rid of the GIL and improves the overall performance of training, especially for distributed model parallel training.
Test Plan: Imported from OSS
Reviewed By: ailzhang
Differential Revision: D23935256
Pulled By: wanchaol
fbshipit-source-id: 59b6d77ff4693ab24a6e1cbb6740bcf614cc624a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44715
We have provided a nice and intuitive API in Python. But in the context of large-scale distributed training (e.g. distributed model parallel), users often want to use multithreaded training instead of multiprocess training, as it provides better resource utilization and efficiency.
This PR introduces the functional optimizer concept (similar to the concept of `nn.functional`): we split the optimizer into two parts: 1. optimizer state management, 2. optimizer computation. We expose the computation part as a separate functional API that is available to internal and OSS developers; the caller of the functional API maintains their own state in order to directly call it. While the end-user API stays the same, the functional API is TorchScript friendly and can be used by the distributed optimizer to speed up training without the GIL.
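A minimal sketch of that split (illustrative, not the exact upstream signatures): the computation is a pure function over explicitly passed state, and the caller owns its buffers:
```python
from typing import List
import torch

def sgd_functional(params: List[torch.Tensor],
                   grads: List[torch.Tensor],
                   lr: float) -> None:
    # pure computation: nothing is read from or stored on an object
    with torch.no_grad():
        for p, g in zip(params, grads):
            p.add_(g, alpha=-lr)
```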
Test Plan: Imported from OSS
Reviewed By: ailzhang
Differential Revision: D23935258
Pulled By: wanchaol
fbshipit-source-id: d2a5228439edb3bc64f7771af2bb9e891847136a
Summary:
The subclass sets `self.last_epoch`, which is already set in the parent class's init function. Why would we need to set last_epoch twice? I think calling `super` resets last_epoch anyway, so I am not sure why we would want to include this in the subclass. Am I missing something?
For the record, I am just a PyTorch enthusiast. I hope my question isn't totally silly.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44613
Reviewed By: albanD
Differential Revision: D23691770
Pulled By: mrshenli
fbshipit-source-id: 080d9acda86e1a2bfaafe2c6fcb8fc1544f8cf8a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41283
In optimizer.zero_grad(), detach_ is useful for avoiding a memory leak only when the grad has a grad_fn, so add a check to call grad.detach_ in zero_grad() only when the grad has a grad_fn.
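A hedged sketch of the guarded loop described above:
```python
def zero_grad(param_groups):
    for group in param_groups:
        for p in group['params']:
            if p.grad is None:
                continue
            if p.grad.grad_fn is not None:
                p.grad.detach_()  # detach only when grad carries autograd history
            else:
                p.grad.requires_grad_(False)
            p.grad.zero_()
```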
ghstack-source-id: 108702289
Test Plan: unit test
Reviewed By: mrshenli
Differential Revision: D22487315
fbshipit-source-id: 861909b15c8497f1da57f092d8963d4920c85e38
Summary:
I encountered a zero division problem when using LBFGS:
File "/home/yshen/anaconda3/lib/python3.7/site-packages/torch/optim/lbfgs.py", line 118, in _strong_wolfe
bracket[1], bracket_f[1], bracket_gtd[1])
File "/home/yshen/anaconda3/lib/python3.7/site-packages/torch/optim/lbfgs.py", line 21, in _cubic_interpolate
d1 = g1 + g2 - 3 * (f1 - f2) / (x1 - x2)
ZeroDivisionError: float division by zero
My solution is to check whether the line-search bracket is too small before calling _cubic_interpolate.
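A hedged sketch of the shape of that check, reusing names visible in `_strong_wolfe` (`bracket`, `d_norm`, `tolerance_change`); it would sit inside the zoom loop, before the interpolation:
```python
# stop zooming once the bracket has numerically collapsed, instead of
# interpolating over a zero-width interval
if abs(bracket[1] - bracket[0]) * d_norm < tolerance_change:
    break
```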
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42093
Reviewed By: pbelevich
Differential Revision: D22770667
Pulled By: mrshenli
fbshipit-source-id: f8fdfcbd3fd530235901d255208fef8005bf898c
Summary:
This PR fixes an issue raised in https://github.com/pytorch/pytorch/issues/40967 where duplicate parameters across different parameter groups are not allowed, but duplicates inside the same parameter group are accepted. After this PR, both cases are treated equally and raise a `ValueError`.
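A small illustration of the unified behavior (hypothetical snippet, per the description above):
```python
import torch

w = torch.randn(2, requires_grad=True)
# duplicates across groups: already raised ValueError before this PR
torch.optim.SGD([{'params': [w]}, {'params': [w]}], lr=0.1)
# duplicates inside one group: now also raises ValueError
torch.optim.SGD([w, w], lr=0.1)
```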
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41597
Reviewed By: zou3519
Differential Revision: D22608019
Pulled By: vincentqb
fbshipit-source-id: 6df41dac62b80db042cfefa6e53fb021b49f4399
Summary:
I ran `make linkcheck` using `sphinx.builders.linkcheck` on the documentation and noticed a few links weren't using HTTPS, so I quickly updated them all.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40878
Differential Revision: D22404647
Pulled By: ngimel
fbshipit-source-id: 9c9756db59197304023fddc28f252314f6cf4af3
Summary:
This is a faster and more idiomatic way of using `itertools.chain`. Instead of computing all the items in the iterable and storing them in memory, they are computed one by one and never stored as a huge list. This can save on both runtime and memory.
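A hedged illustration of the idiom (the exact call site isn't shown in the summary):
```python
from itertools import chain

param_groups = [{'params': [0, 1]}, {'params': [2]}]

# eager: materializes the full intermediate list before chaining
list(chain(*[g['params'] for g in param_groups]))

# lazy and more idiomatic: groups are consumed one at a time
list(chain.from_iterable(g['params'] for g in param_groups))
```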
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40156
Reviewed By: ezyang
Differential Revision: D22189038
Pulled By: vincentqb
fbshipit-source-id: 160b2c27f442686821a6ea541e1f48f4a846c186
Summary:
Fixes https://github.com/pytorch/pytorch/issues/36831.
Instead of using `id()`, an arbitrary yet consistent order-based index is used. This results in deterministic output between runs.
I am not the biggest fan of using `nonlocal` (it appears to be used sparingly in the codebase) to get `start_index` between calls to `pack_group()`, but the alternatives had larger issues:
- Using the last value added to `param_mappings` would be ideal, but that only works if `dict` iteration order is consistent, and PyTorch currently supports Python <3.7.
- Using the maximum value added to `param_mappings` wouldn't have that issue but would not be constant time.
For testing, I confirmed that `test_optim.py` passes before and after these changes. Randomizing the indices in `param_mappings` causes the tests to fail, which is further evidence these changes work. I'm not 100% sure these tests are sufficient, but they're a start.
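A hedged sketch of the order-based indexing described above (names follow the PR description; details may differ from the merged code):
```python
def pack_param_groups(param_groups):
    param_mappings = {}
    start_index = 0

    def pack_group(group):
        nonlocal start_index
        param_mappings.update({id(p): i
                               for i, p in enumerate(group['params'], start_index)
                               if id(p) not in param_mappings})
        packed = {k: v for k, v in group.items() if k != 'params'}
        # deterministic per-run indices replace raw id() values
        packed['params'] = [param_mappings[id(p)] for p in group['params']]
        start_index += len(packed['params'])
        return packed

    return [pack_group(g) for g in param_groups]
```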
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37347
Differential Revision: D21353820
Pulled By: vincentqb
fbshipit-source-id: e549f1f154833a461b1f4df6d07ad509aab34ea1
Summary:
- added tests that showcase the problems
- fixed the problems
These changes would allow me to remove many "# type: ignore" comments in my codebase.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36358
Differential Revision: D21230704
Pulled By: ezyang
fbshipit-source-id: e6d475a0aa1fb40258fa0231ade28c38108355fb
Summary:
This PR is based on the issue https://github.com/pytorch/pytorch/issues/29994#issue-524418771 and the discussion in the previous version of the PR https://github.com/pytorch/pytorch/pull/30559. Specifically, I followed the interface outlined in this [comment](https://github.com/pytorch/pytorch/pull/30559#issuecomment-574864768).
## Structure
- `torch/optim/swa_utils.py` contains the implementation of `AveragedModel` class, `SWALR` learning rate scheduler and `update_bn` utility
- `test/test_optim.py` contains unit tests for the three components of SWA
- `torch/optim/swa_utils.pyi` describes the interface of `torch/optim/swa_utils.py`
The new implementation consists of
- `AveragedModel` class; this class creates a copy of a given model and allows to compute running averages of the parameters.
- `SWALR` learning rate scheduler; after a certain number of epochs switches to a constant learning rate; this scheduler is supposed to be chained with other schedulers.
- `update_bn` utility; updates the Batch Normalization activation statistics for a given model and dataloader; this utility is meant to be applied to `AveragedModel` instances.
For `update_bn` I simplified the implementation compared to the [original PR](https://github.com/pytorch/pytorch/pull/30559) according to the suggestions by vadimkantorov.
## Example
```python
loader, optimizer, model = ...
swa_model = torch.optim.swa_utils.AveragedModel(model)
# You can use custom averaging functions with `avg_fun` parameter
ema_avg = lambda p_avg, p, n_avg: 0.1 * p_avg + 0.9 * p
ema_model = torch.optim.swa_utils.AveragedModel(model,
                                                avg_function=ema_avg)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer,
                                                       T_max=300)
swa_start = 160
swa_scheduler = SWALR(optimizer, start_epoch=swa_start, swa_lr=0.05)
for i in range(300):
    for input, target in loader:
        optimizer.zero_grad()
        loss_fn(model(input), target).backward()
        optimizer.step()
    scheduler.step()
    swa_scheduler.step()
    if i > swa_start:
        swa_model.update_parameters(model)
# Update bn statistics for the swa_model at the end
torch.optim.swa_utils.update_bn(loader, swa_model)
```
UPDATED:
```python3
loader, optimizer, model, loss_fn = ...
swa_model = torch.optim.swa_utils.AveragedModel(model)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=300)
swa_start = 160
swa_scheduler = SWALR(optimizer, swa_lr=0.05)
for i in range(300):
    for input, target in loader:
        optimizer.zero_grad()
        loss_fn(model(input), target).backward()
        optimizer.step()
    if i > swa_start:
        swa_model.update_parameters(model)
        swa_scheduler.step()
    else:
        scheduler.step()
# Update bn statistics for the swa_model at the end
torch.optim.swa_utils.update_bn(loader, swa_model)
```
Fixes https://github.com/pytorch/pytorch/issues/29994
cc soumith vincentqb andrewgordonwilson vadimkantorov
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35032
Differential Revision: D21079606
Pulled By: vincentqb
fbshipit-source-id: e07f5e821f72ada63789814c2dcbdc31f0160c37
Summary:
I've been using PyTorch with type hints, and I found errors that can be easily fixed, so I'm creating this PR to fix those type bugs.
I expected the code below to type-check without any errors.
```python
import torch
from torch.nn import Linear
from torch.autograd import Variable
from torch.optim import AdamW
from torch.utils import hooks
# nn.Module should have training attribute
module = Linear(10, 20)
module.training
# torch should have dtype bfloat16
tensor2 = torch.tensor([1,2,3], dtype=torch.bfloat16)
# torch.Tensor.cuda should accept int or str value
torch.randn(5).cuda(1)
torch.tensor(5).cuda('cuda:0')
# optimizer should have default attribute
module = Linear(10, 20)
print(AdamW(module.weight).default)
# torch.Tensor should have these boolean attributes
torch.tensor([1]).is_sparse
torch.tensor([1]).is_quantized
torch.tensor([1]).is_mkldnn
# Size class should tuple of int
a, b = torch.tensor([[1,2,3]]).size()
# check modules can be accessed
torch.nn.parallel
torch.autograd.profiler
torch.multiprocessing
torch.sparse
torch.onnx
torch.jit
torch.hub
torch.random
torch.distributions
torch.quantization
torch.__config__
torch.__future__
torch.ops
torch.classes
# Variable class's constructor should return Tensor
def fn_to_test_variable(t: torch.Tensor):
    return None
v = Variable(torch.tensor(1))
fn_to_test_variable(v)
# check RemovableHandle attributes can be accessed
handle = hooks.RemovableHandle({})
handle.id
handle.next_id
# check torch function hints
torch.is_grad_enabled()
```
But the current master branch raises errors (I checked with pyright):
```
$ pyright test.py
Searching for source files
Found 1 source file
test.py
12:45 - error: 'bfloat16' is not a known member of module
15:21 - error: Argument of type 'Literal[1]' cannot be assigned to parameter 'device' of type 'Optional[device]'
'int' is incompatible with 'device'
Cannot assign to 'None'
16:22 - error: Argument of type 'Literal['cuda:0']' cannot be assigned to parameter 'device' of type 'Optional[device]'
'str' is incompatible with 'device'
Cannot assign to 'None'
23:19 - error: Cannot access member 'is_sparse' for type 'Tensor'
Member 'is_sparse' is unknown
24:19 - error: Cannot access member 'is_quantized' for type 'Tensor'
Member 'is_quantized' is unknown
25:19 - error: Cannot access member 'is_mkldnn' for type 'Tensor'
Member 'is_mkldnn' is unknown
32:7 - error: 'autograd' is not a known member of module
33:7 - error: 'multiprocessing' is not a known member of module
34:7 - error: 'sparse' is not a known member of module
35:7 - error: 'onnx' is not a known member of module
36:7 - error: 'jit' is not a known member of module
37:7 - error: 'hub' is not a known member of module
38:7 - error: 'random' is not a known member of module
39:7 - error: 'distributions' is not a known member of module
40:7 - error: 'quantization' is not a known member of module
41:7 - error: '__config__' is not a known member of module
42:7 - error: '__future__' is not a known member of module
44:7 - error: 'ops' is not a known member of module
45:7 - error: 'classes' is not a known member of module
60:7 - error: 'is_grad_enabled' is not a known member of module
20 errors, 0 warnings
Completed in 1.436sec
```
And the list below is not reported as errors, but I think these are errors too:
* `nn.Module.training` is not boolean
* return type of `torch.Tensor.size()` is `Tuple[Unknown]`.
---
Related issues:
https://github.com/pytorch/pytorch/issues/23731, https://github.com/pytorch/pytorch/issues/32824, https://github.com/pytorch/pytorch/issues/31753
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33762
Differential Revision: D20118884
Pulled By: albanD
fbshipit-source-id: 41557d66674a11b8e7503a48476d4cdd0f278eab
Summary:
Adam and AdamW are missing parameter validation for weight_decay. Other optimisers have this check present.
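A hedged sketch of the check being added, mirroring the constructor validation the other optimizers already perform:
```python
if not 0.0 <= weight_decay:
    raise ValueError("Invalid weight_decay value: {}".format(weight_decay))
```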
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33126
Differential Revision: D19860366
Pulled By: vincentqb
fbshipit-source-id: 286d7dc90e2f4ccf6540638286d2fe17939648fc
Summary:
## Problem
```python
class LambdaLR(_LRScheduler):
    """Sets the learning rate of each parameter group to the initial lr
    times a given function. When last_epoch=-1, sets initial lr as lr.

    Args:
        optimizer (Optimizer): Wrapped optimizer.
        lr_lambda (function or list): A function which computes a multiplicative
            factor given an integer parameter epoch, or a list of such
            functions, one for each group in optimizer.param_groups.
        last_epoch (int): The index of last epoch. Default: -1.

    Example:
        >>> # Assuming optimizer has two groups.
        >>> lambda1 = lambda epoch: epoch // 30
        >>> lambda2 = lambda epoch: 0.95 ** epoch
        >>> scheduler = LambdaLR(optimizer, lr_lambda=[lambda1, lambda2])
        >>> for epoch in range(100):
        >>>     train(...)
        >>>     validate(...)
        >>>     scheduler.step()
    """
```
`LambdaLR` takes a lambda that takes an int and returns a float, or a list of such lambdas.
## Related issue
Resolves https://github.com/pytorch/pytorch/issues/32645
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33271
Differential Revision: D19878665
Pulled By: vincentqb
fbshipit-source-id: 50b16caea13de5a3cbd187e688369f33500499d0
Summary:
When an error is raised and `__exit__` in a context manager returns `True`, the error is suppressed; otherwise the error is raised. No return value should be given to maintain the default behavior of the context manager.
Fixes https://github.com/pytorch/pytorch/issues/32639. The `get_lr` function was overridden with a function taking an epoch parameter, which is not allowed; however, the relevant error was not being raised.
```python
In [1]: import torch
...:
...: class MultiStepLR(torch.optim.lr_scheduler._LRScheduler):
...:     def __init__(self, optimizer, gamma, milestones, last_epoch = -1):
...:         self.init_lr = [group['lr'] for group in optimizer.param_groups]
...:         self.gamma = gamma
...:         self.milestones = milestones
...:         super().__init__(optimizer, last_epoch)
...:
...:     def get_lr(self, step):
...:         global_step = self.last_epoch  # iteration number in pytorch
...:         gamma_power = ([0] + [i + 1 for i, m in enumerate(self.milestones) if global_step >= m])[-1]
...:         return [init_lr * (self.gamma ** gamma_power) for init_lr in self.init_lr]
...:
...: optimizer = torch.optim.SGD([torch.rand(1)], lr = 1)
...: scheduler = MultiStepLR(optimizer, gamma = 1, milestones = [10, 20])
```
```
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-1-7fad6ba050b0> in <module>
14
15 optimizer = torch.optim.SGD([torch.rand(1)], lr = 1)
---> 16 scheduler = MultiStepLR(optimizer, gamma = 1, milestones = [10, 20])
<ipython-input-1-7fad6ba050b0> in __init__(self, optimizer, gamma, milestones, last_epoch)
6 self.gamma = gamma
7 self.milestones = milestones
----> 8 super().__init__(optimizer, last_epoch)
9
10 def get_lr(self, step):
~/anaconda3/envs/pytorch/lib/python3.7/site-packages/torch/optim/lr_scheduler.py in __init__(self, optimizer, last_epoch)
75 self._step_count = 0
76
---> 77 self.step()
78
79 def state_dict(self):
~/anaconda3/envs/pytorch/lib/python3.7/site-packages/torch/optim/lr_scheduler.py in step(self, epoch)
141 print("1a")
142 # try:
--> 143 values = self.get_lr()
144 # except TypeError:
145 # raise RuntimeError
TypeError: get_lr() missing 1 required positional argument: 'step'
```
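For reference, the context-manager rule in isolation: a truthy return from `__exit__` suppresses the in-flight exception:
```python
class Suppressing:
    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        return True  # truthy: swallows the TypeError raised below

with Suppressing():
    raise TypeError("never propagates")
print("still running")
```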
May be related to https://github.com/pytorch/pytorch/issues/32898.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32997
Differential Revision: D19737731
Pulled By: vincentqb
fbshipit-source-id: 5cf84beada69b91f91e36b20c3278e9920343655
Summary:
This PR fixes type hints for the `torch.optim.optimizer.Optimizer` object, an issue also reported in https://github.com/pytorch/pytorch/issues/23731.
To test things I used the following optimizer implementation, which is fully covered with type hints:
```python
from typing import Optional, Callable, Union, Iterable
from torch import Tensor
from torch.optim.optimizer import Optimizer
OptClosure = Optional[Callable[[], float]]
_params_t = Union[Iterable[Tensor], Iterable[dict]]
class SGD(Optimizer):
    def __init__(self, params: _params_t, lr: float = 0.1) -> None:
        defaults = dict(lr=lr)
        super(SGD, self).__init__(params, defaults)

    def __setstate__(self, state: dict) -> None:
        super(SGD, self).__setstate__(state)

    def step(self, closure: OptClosure = None) -> Optional[float]:
        loss = None
        if closure is not None:
            loss = closure()
        for group in self.param_groups:
            for p in group['params']:
                if p.grad is None:
                    continue
                d_p = p.grad.data
                p.data.add_(-group['lr'], d_p)
        return loss
```
Without the fix, `mypy` reports a bunch of inconsistencies in types and missing properties:
```bash
$ mypy torch_optimizer/sgd.py
torch_optimizer/sgd.py:14: error: Too many arguments for "__init__" of "Optimizer"
torch_optimizer/sgd.py:17: error: "__setstate__" undefined in superclass
torch_optimizer/sgd.py:19: error: Return type "Optional[float]" of "step" incompatible with return type "None" in supertype "Optimizer"
torch_optimizer/sgd.py:24: error: "SGD" has no attribute "param_groups"
Found 4 errors in 1 file (checked 1 source file)
```
With the fix, no issues:
```bash
$ mypy torch_optimizer/sgd.py
Success: no issues found in 1 source file
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32900
Differential Revision: D19697175
Pulled By: ezyang
fbshipit-source-id: d5e2b3c421f69da3df8c32b3d53b4b6d15d61a41
Summary:
**Patch Description**
Round out the rest of the optimizer types in torch.optim by creating stubs for them.
**Testing**:
I ran mypy looking for just errors in that optim folder. There are no *new* mypy errors created.
```
$ mypy torch/optim | grep optim
$ git checkout master; mypy torch/optim | wc -l
968
$ git checkout typeoptims; mypy torch/optim | wc -l
968
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31130
Reviewed By: stephenroller
Differential Revision: D18947145
Pulled By: vincentqb
fbshipit-source-id: 5b8582223833b1d9123d829acc1ed8243df87561
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27254
`MultiplicativeLR` consumes a function providing the multiplicative factor at each epoch. It mimics `LambdaLR` in its syntax.
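A minimal usage sketch (the optimizer and the 0.95 factor are illustrative, not taken from this PR):
```python
import torch

model = torch.nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
# lr is multiplied by the returned factor at each step
scheduler = torch.optim.lr_scheduler.MultiplicativeLR(
    optimizer, lr_lambda=lambda epoch: 0.95
)
for epoch in range(5):
    optimizer.step()
    scheduler.step()
    print(epoch, optimizer.param_groups[0]["lr"])
```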
Test Plan: Imported from OSS
Differential Revision: D17728088
Pulled By: vincentqb
fbshipit-source-id: 1c4a8e19a4f24c87b5efccda01630c8a970dc5c9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27850
Many of these are real problems in the documentation (e.g., a link or
bullet point doesn't display correctly).
Test Plan: - built and viewed the documentation for each change locally.
Differential Revision: D17908123
Pulled By: zou3519
fbshipit-source-id: 65c92a352c89b90fb6b508c388b0874233a3817a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26423
Enable chainable schedulers as requested in #13022 by implementing the changes mentioned below from [comment](https://github.com/pytorch/pytorch/pull/21800#issuecomment-513370208).
* Changing the behavior of schedulers to the chainable formula when available
* Using the closed form whenever epoch is different from None until the next release with a deprecation warning
* Making `get_computed_values` the supported way of obtaining the last computed learning rate by the scheduler (see [comment](https://github.com/pytorch/pytorch/pull/21800#issuecomment-513940729) for new syntax)
* Returning a deprecation warning when invoking the undocumented get_lr function (see [comment](https://github.com/pytorch/pytorch/pull/21800#discussion_r294305485)) referring to `get_computed_values`, and deprecating it in the next release.
* `CosineAnnealingWarmRestarts` still takes an epoch parameter, as it is the only scheduler with a mechanism relying on fractional epochs
* `MultiplicativeLR` consumes a function providing the multiplicative factor at each epoch. It mimics `LambdaLR` in its syntax.
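As a sketch of what chaining enables (the particular schedulers and factors below are arbitrary choices, not from this PR):
```python
import torch

opt = torch.optim.SGD([torch.randn(2, requires_grad=True)], lr=1.0)
s1 = torch.optim.lr_scheduler.ExponentialLR(opt, gamma=0.9)
s2 = torch.optim.lr_scheduler.StepLR(opt, step_size=3, gamma=0.1)
for epoch in range(6):
    opt.step()
    s1.step()  # lr *= 0.9 every epoch
    s2.step()  # additionally, lr *= 0.1 every 3 epochs
    print(epoch, opt.param_groups[0]["lr"])
```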
# #20527
### Before
The user calls the scheduler with a constant epoch, either across loops or within the same loop.
```
import torch.optim as optim
from torch import nn
conv = nn.Conv2d(3,3,3)
optimizer = optim.Adam(conv.parameters())
lr_scheduler = optim.lr_scheduler.StepLR(optimizer, 2)
# Scheduler with sometimes-constant epoch number
for epoch in [0, 0, 1, 1, 2, 2, 3, 3]:
    lr_scheduler.step(epoch)
    print(optimizer.param_groups[0]['lr'])
```
### After
If the user wants to step only when the epoch number has changed:
```
import torch.optim as optim
from torch import nn
conv = nn.Conv2d(3,3,3)
optimizer = optim.Adam(conv.parameters())
lr_scheduler = optim.lr_scheduler.StepLR(optimizer, 2)
last_epoch = -1
for epoch in [0, 0, 1, 1, 2, 2, 3, 3]:
    # Check if epoch number has changed manually
    if epoch - last_epoch > 0:
        lr_scheduler.step()
        last_epoch = epoch
    print(epoch, lr_scheduler.get_computed_values())
```
# #22107
### Before
```
import torch
from torchvision.models import resnet18
net = resnet18()
optimizer = torch.optim.SGD(net.parameters(), 0.1)
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[3, 6, 9], gamma=0.1)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, 3, gamma=0.1)
for i in range(10):
    # Scheduler computes and returns new learning rate, leading to unexpected behavior
    print(i, scheduler.get_lr())
    scheduler.step()
```
### After
```
import torch
from torchvision.models import resnet18
net = resnet18()
optimizer = torch.optim.SGD(net.parameters(), 0.1)
lr_scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[3, 6, 9], gamma=0.1)
lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer, 3, gamma=0.1)
for i in range(10):
    # Returns last computed learning rate by scheduler
    print(i, lr_scheduler.get_computed_values())
    lr_scheduler.step()
```
# ghstack
This contains the changes from #24352. Opening again since they were reverted.
This reverts commit 1c477b7e1f.
Test Plan: Imported from OSS
Differential Revision: D17460427
Pulled By: vincentqb
fbshipit-source-id: 8c10f4e7246d6756ac91df734e8bed65bdef63c9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24352
Enable chainable schedulers as requested in #13022 by implementing the changes mentioned below from [comment](https://github.com/pytorch/pytorch/pull/21800#issuecomment-513370208).
* Changing the behavior of schedulers to the chainable formula when available
* Using the closed form whenever epoch is different from None until the next release with a deprecation warning
* Making `get_computed_values` the supported way of obtaining the last computed learning rate by the scheduler (see [comment](https://github.com/pytorch/pytorch/pull/21800#issuecomment-513940729) for new syntax)
* Returning a deprecation warning when invoking the undocumented get_lr function (see [comment](https://github.com/pytorch/pytorch/pull/21800#discussion_r294305485)) referring to `get_computed_values`, and deprecating it in the next release.
* `CosineAnnealingWarmRestarts` still takes an epoch parameter, as it is the only scheduler with a mechanism relying on fractional epochs
* `MultiplicativeLR` consumes a function providing the multiplicative factor at each epoch. It mimics `LambdaLR` in its syntax.
# #20527
### Before
The user calls the scheduler with a constant epoch, either across loops or within the same loop.
```
import torch.optim as optim
from torch import nn
conv = nn.Conv2d(3,3,3)
optimizer = optim.Adam(conv.parameters())
lr_scheduler = optim.lr_scheduler.StepLR(optimizer, 2)
# Scheduler with sometimes-constant epoch number
for epoch in [0, 0, 1, 1, 2, 2, 3, 3]:
    lr_scheduler.step(epoch)
    print(optimizer.param_groups[0]['lr'])
```
### After
If the user wants to step only when the epoch number has changed:
```
import torch.optim as optim
from torch import nn
conv = nn.Conv2d(3,3,3)
optimizer = optim.Adam(conv.parameters())
lr_scheduler = optim.lr_scheduler.StepLR(optimizer, 2)
last_epoch = -1
for epoch in [0, 0, 1, 1, 2, 2, 3, 3]:
    # Check if epoch number has changed manually
    if epoch - last_epoch > 0:
        lr_scheduler.step()
        last_epoch = epoch
    print(epoch, lr_scheduler.get_computed_values())
```
# #22107
### Before
```
import torch
from torchvision.models import resnet18
net = resnet18()
optimizer = torch.optim.SGD(net.parameters(), 0.1)
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[3, 6, 9], gamma=0.1)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, 3, gamma=0.1)
for i in range(10):
    # Scheduler computes and returns new learning rate, leading to unexpected behavior
    print(i, scheduler.get_lr())
    scheduler.step()
```
### After
```
import torch
from torchvision.models import resnet18
net = resnet18()
optimizer = torch.optim.SGD(net.parameters(), 0.1)
lr_scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[3, 6, 9], gamma=0.1)
lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer, 3, gamma=0.1)
for i in range(10):
    # Returns last computed learning rate by scheduler
    print(i, lr_scheduler.get_computed_values())
    lr_scheduler.step()
```
Test Plan: Imported from OSS
Differential Revision: D17349760
Pulled By: vincentqb
fbshipit-source-id: 0a6ac01e2a6b45000bc6f9df732033dd81f0d89f
Summary:
Hi,
I noticed that after v1.2.0 the implementation of the LBFGS optimizer changed. In the new implementation, the stopping condition changed from the sum of the gradients to the max value in the gradients (see: b15d91490a/torch/optim/lbfgs.py (L313)), but the default tolerance_grad parameter was not updated (it is too large for a max of gradients), so a lot of my old code either stopped optimizing or optimized for only one or two steps.
So I opened this pull request to suggest changing tolerance_grad to a smaller value.
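An illustrative sketch (the gradient values are made up) of why the same threshold behaves very differently for the two stopping tests:
```python
import torch

grad = torch.full((1000,), 1e-6)  # a long, uniformly small gradient vector
tol = 1e-5                        # the old default tolerance_grad

print(grad.abs().sum() <= tol)  # False: the old sum-based test keeps optimizing
print(grad.abs().max() <= tol)  # True: the max-based test stops immediately
```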
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25240
Differential Revision: D17102713
Pulled By: vincentqb
fbshipit-source-id: d46acacdca1c319c1db669f75da3405a7db4a7cb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24980
We'll need this internally, so just updating the open source version. The other optimizers have this argument anyway.
Test Plan: Imported from OSS
Differential Revision: D16945279
Pulled By: li-roy
fbshipit-source-id: 0b8cc86f15387cd65660747899d3d7dd870cff27
Summary:
Some interfaces of schedulers defined in lr_scheduler.py are missing in lr_scheduler.pyi.
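A hedged sketch of the kind of entry such a stub file contains; the signatures below are illustrative, not copied from this PR:
```python
# lr_scheduler.pyi-style stubs (illustrative)
from typing import List

from torch.optim.optimizer import Optimizer

class _LRScheduler:
    def __init__(self, optimizer: Optimizer, last_epoch: int = ...) -> None: ...
    def step(self, epoch: int = ...) -> None: ...
    def get_lr(self) -> List[float]: ...

class StepLR(_LRScheduler):
    def __init__(self, optimizer: Optimizer, step_size: int, gamma: float = ...,
                 last_epoch: int = ...) -> None: ...
```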
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23934
Differential Revision: D16726622
Pulled By: ezyang
fbshipit-source-id: 45fd2d28fbb658c71f6fcd33b8997d6ee8e2b17d
Summary:
Fixes: https://github.com/pytorch/pytorch/issues/23480.
I only verified that the schedule reaches the restart at the expected step as specified in the issue; it would be good to have someone else verify correctness here.
Script:
```
scheduler = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(torch.optim.SGD([torch.randn(1, requires_grad=True)], lr=0.5), T_0=1, T_mult=2)
for i in range(9):
    print(i)
    print(scheduler.get_lr())
    scheduler.step()
```
Output:
```
0
[0.5]
1
[0.5]
2
[0.25]
3
[0.5]
4
[0.42677669529663687]
5
[0.25]
6
[0.07322330470336313]
7
[0.5]
8
[0.4809698831278217]
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23833
Differential Revision: D16657251
Pulled By: gchanan
fbshipit-source-id: 713973cb7cbfc85dc333641cbe9feaf917718eb9
Summary:
# What is this?
This is an implementation of the AdamW optimizer as implemented in [the fastai library](803894051b/fastai/callback.py) and as initially introduced in the paper [Decoupled Weight Decay Regularization](https://arxiv.org/abs/1711.05101). It decouples the weight decay regularization step from the optimization step during training.
There have already been several abortive attempts to push this into pytorch in some form or fashion: https://github.com/pytorch/pytorch/pull/17468, https://github.com/pytorch/pytorch/pull/10866, https://github.com/pytorch/pytorch/pull/3740, https://github.com/pytorch/pytorch/pull/4429. Hopefully this one goes through.
# Why is this important?
Via a simple reparameterization, it can be shown that L2 regularization has a weight decay effect in the case of SGD optimization. Because of this, L2 regularization became synonymous with the concept of weight decay. However, it can be shown that the equivalence of L2 regularization and weight decay breaks down for more complex adaptive optimization schemes. It was shown in the paper [Decoupled Weight Decay Regularization](https://arxiv.org/abs/1711.05101) that this is the reason why models trained with SGD achieve better generalization than those trained with Adam. Weight decay is a very effective regularizer. L2 regularization, in and of itself, is much less effective. By explicitly decaying the weights, we can achieve state-of-the-art results while also taking advantage of the quick convergence properties that adaptive optimization schemes have.
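As a sketch of the distinction (not this PR's implementation), compare where the decay term enters a generic update:
```python
import torch

def step_l2(p, grad, lr, wd):
    # L2 regularization: decay is folded into the gradient before the update
    return p - lr * (grad + wd * p)

def step_decoupled(p, grad, lr, wd):
    # Decoupled weight decay: decay is applied directly to the weights
    return p - lr * wd * p - lr * grad

p, g = torch.tensor([1.0]), torch.tensor([0.5])
print(step_l2(p, g, lr=0.1, wd=0.01))         # identical for plain SGD...
print(step_decoupled(p, g, lr=0.1, wd=0.01))  # ...but the two diverge once the
                                              # gradient is rescaled by Adam's statistics
```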
# How was this tested?
There were test cases added to `test_optim.py` and I also ran a [little experiment](https://gist.github.com/mjacar/0c9809b96513daff84fe3d9938f08638) to validate that this implementation is equivalent to the fastai implementation.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21250
Differential Revision: D16060339
Pulled By: vincentqb
fbshipit-source-id: ded7cc9cfd3fde81f655b9ffb3e3d6b3543a4709
Summary:
Resolves issue https://github.com/pytorch/pytorch/issues/19003
The author of this issue also asked that `cycle_momentum` default to `False` if the optimizer does not have a momentum parameter, but I'm not sure what the best way to do this would be. Silently changing the value based on the optimizer may confuse the user in some cases (say the user explicitly set `cycle_momentum=True` but doesn't know that the Adam optimizer doesn't use momentum).
Maybe printing a warning when switching this argument's value would suffice?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20401
Differential Revision: D15765463
Pulled By: ezyang
fbshipit-source-id: 88ddabd9e960c46f3471f37ea46013e6b4137eaf
Summary:
This PR addresses the problem described in the comment: https://github.com/pytorch/pytorch/pull/20203#issuecomment-499231276
and the previously coded bad behaviour:
- a warning was raised every time lr scheduling was initialized
Now the code checks that:
- on the second call of `lr_scheduler.step`, `optimizer.step` has already been called; otherwise a warning is raised (as was done in #20203)
- if the optimizer's step is overridden, another warning is raised once to make the user aware of the new pattern
`opt.step()` -> `lrs.step()`, since we cannot check this.
Now tests check that:
- at initialization (`lrs = StepLR(...)`) there are no warnings
- if we replace `optimizer.step` by something else (similarly to the [code of nvidia/apex](https://github.com/NVIDIA/apex/blob/master/apex/amp/_process_optimizer.py#L287)) another warning is raised.
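A minimal sketch of the call order that avoids the new warning (the model and scheduler below are arbitrary):
```python
import torch

model = torch.nn.Linear(4, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
lrs = torch.optim.lr_scheduler.StepLR(opt, step_size=10)
for epoch in range(3):
    opt.step()  # optimizer first...
    lrs.step()  # ...then scheduler, so no warning is raised
```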
cc ezyang
PS. Honestly, I would say that there is a lot of overhead introduced for simple warnings. I hope all these checks can be removed in `1.2.0` or another future version...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21460
Differential Revision: D15701776
Pulled By: ezyang
fbshipit-source-id: eac5712b9146d9d3392a30f6339cd33d90c497c7
Summary:
Fixes a typo in the CyclicLR docs by adding the lr_scheduler directory and putting in other required arguments.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21021
Differential Revision: D15530109
Pulled By: soumith
fbshipit-source-id: 98781bdab8d82465257229e50fa3bd0015da1286
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20880
This clarifies how the momentum parameters should be used.
Reviewed By: soumith
Differential Revision: D15482450
fbshipit-source-id: e3649a38876c5912cb101d8e404abca7c3431766
Summary:
Resubmit of #20698, which got messed up.
The idea is that when PyTorch is used in a custom build environment (e.g. Facebook), it's useful to track usage of various APIs centrally. This PR introduces a very lightweight mechanism to do so - only the first invocation of a trigger point is logged. This is significantly more lightweight than #18235, and thus we can allow putting logging in e.g. TensorImpl.
Also adds an initial list of trigger points. Trigger points are added in such a way that no static initialization triggers them, i.e. just linking with libtorch.so will not cause any logging. Further suggestions of what to log are welcome.
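A minimal Python sketch (the real mechanism lives in C++; the names here are hypothetical) of the "log only the first invocation" semantics:
```python
_seen = set()

def log_api_usage_once(event: str) -> None:
    if event in _seen:
        return  # subsequent invocations are no-ops
    _seen.add(event)
    print(f"API usage: {event}")  # stand-in for the real logging backend

log_api_usage_once("torch.optimizer")  # logged
log_api_usage_once("torch.optimizer")  # ignored
```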
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20745
Differential Revision: D15429196
Pulled By: dzhulgakov
fbshipit-source-id: a5e41a709a65b7ebccc6b95f93854e583cf20aca
Summary:
Class attributes should preferably be explicitly initialized within
the __init__() call. Otherwise, overriding step() is
prone to bugs.
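A minimal sketch of the preferred pattern (the class and attribute names are illustrative):
```python
class Scheduler:
    def __init__(self, last_epoch: int = 0) -> None:
        # explicit: the attribute exists before step() is ever called
        self.last_epoch = last_epoch

    def step(self) -> None:
        # subclasses overriding step() can rely on last_epoch being set
        self.last_epoch += 1
```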
This patch partially reverts #7889
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20059
Differential Revision: D15195747
Pulled By: soumith
fbshipit-source-id: 3d1a51d8c725d6f14e3e91ee94c7bc7a7d6c1713
Summary:
Because of a merge error with master in #15042, opening a new PR for ezyang.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17226
Differential Revision: D14418145
Pulled By: mrshenli
fbshipit-source-id: 099ba225b28e6aba71760b81b2153ad1c40fbaae
Summary:
The current code initializes `state` in the `__init__` method, but the initialization is not invoked in `add_param_group`.
I followed the same approach as the other optimizers to init the `state`.
```python
import torch
emb = torch.nn.Embedding(10,10)
emb2 = torch.nn.Embedding(10,10)
optim = torch.optim.Adagrad(emb.parameters())
print(optim.state[emb.weight]) # already initialized
optim.add_param_group({'params': emb2.parameters()})
print(optim.state[emb2.weight]) # empty dict
loss = emb2.weight.sum() + emb.weight.sum()
loss.backward()
optim.step() # raised KeyError
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17679
Differential Revision: D14577575
Pulled By: ezyang
fbshipit-source-id: 12440079ac964b9eedad48e393d47f558babe300
Summary:
Added the formula for the corner case. Updated unit tests.
Fixes #17913
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19180
Differential Revision: D14942023
Pulled By: ezyang
fbshipit-source-id: 167c109b97a7830d5b24541dc91e4788d531feec
Summary:
Hello everyone :) !!
I've found that lr_scheduler is initialized with last_epoch as -1.
Because of this, even after the first step (not the one in init, but an explicit step of the scheduler),
the learning rate of the scheduler's optimizer remains at its previous value.
```python
>>> import torch
>>> cc = torch.nn.Conv2d(10,10,3)
>>> myinitial_lr = 0.1
>>> myoptimizer = torch.optim.Adam(cc.parameters(), lr=myinitial_lr)
>>> mylrdecay = 0.5
>>> myscheduler = torch.optim.lr_scheduler.ExponentialLR(myoptimizer,mylrdecay)
>>> myscheduler.get_lr()
[0.2] # this is because get_lr calculates lr as 0.1 * 0.5^-1
>>> myscheduler.optimizer.param_groups[0]["lr"]
0.1 # this is not consistent with get_lr value
>>> myscheduler.last_epoch
-1
>>> myscheduler.step()
>>> myscheduler.get_lr()
[0.1] # this should be the value right after the init, not after first step
>>> myscheduler.optimizer.param_groups[0]["lr"]
0.1 # since this is after first step, it should have been decayed as 0.05
>>> myscheduler.last_epoch
0
>>> myscheduler.step()
>>> myscheduler.last_epoch
1
>>> myscheduler.get_lr()
[0.05]
>>> myscheduler.optimizer.param_groups[0]["lr"]
0.05
>>> myscheduler.last_epoch
1
```
The first problem is that, even right after the init of lr_scheduler, you get inconsistent parameter values.
The second problem is that you are stuck with the same learning rate for the first 2 epochs if the step function of lr_scheduler is not called at the beginning of the epoch loop.
Of course, you can avoid this by calling lr_scheduler's step at the beginning,
but I don't think this is proper use since, in the case of the optimizer, step is called at the end of the iteration loop.
I've simply avoided all of the above issues by setting last_epoch to 0 after the initialization.
This also makes sense when you init with some value of last_epoch which is not -1.
For example, if you want to init at last epoch 10, lr should not be decayed one step further,
which is what the previous code did by effectively applying last_epoch + 1 in
base_lr * self.gamma ** self.last_epoch
Instead, it should be set to the exact value for step 10.
I hope this fix finds its way with all your help :)
I'm really looking forward & excited to become a contributor for pytorch!
Pytorch Rocks!!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/7889
Differential Revision: D15012769
Pulled By: ezyang
fbshipit-source-id: 258fc3009ea7b7390a3cf2e8a3682eafb506b08b
Summary:
Add the defaults field to the copied object.
Prior to this patch, optimizer.__getattr__ excluded the defaults
attribute of the source optimizer object, which is required by some LR schedulers (e.g. CyclicLR with momentum).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19308
Differential Revision: D15012801
Pulled By: soumith
fbshipit-source-id: 95801b269f6f9d78d531d4fed95c973b280cc96f
Summary:
Added stubs for:
* The `device` module
* The `cuda` module
* Parts of the `optim` module
* Began adding stubs for the `autograd` module. I'll annotate more later but `no_grad` and friends are probably the most used exports from it so it seemed like a good place to start.
This would close #16996, although comments on that issue reference other missing stubs, so maybe it's worth keeping it open as an umbrella issue.
The big remaining missing package is `nn`.
Also added a `py.typed` file so mypy will pick up on the type stubs. That closes #17639.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18511
Differential Revision: D14715053
Pulled By: ezyang
fbshipit-source-id: 9e4882ac997063650e6ce47604b3eaf1232c61c9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18598
ghimport-source-id: c74597e5e7437e94a43c163cee0639b20d0d0c6a
Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18598 Turn on F401: Unused import warning.**
This was requested by someone at Facebook; this lint is turned
on for Facebook by default. "Sure, why not."
I had to noqa a number of imports in __init__. Hypothetically
we're supposed to use __all__ in this case, but I was too lazy
to fix it. Left for future work.
Be careful! flake8-2 and flake8-3 behave differently with
respect to import resolution for # type: comments. flake8-3 will
report an import unused; flake8-2 will not. For now, I just
noqa'd all these sites.
All the changes were done by hand.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Differential Revision: D14687478
fbshipit-source-id: 30d532381e914091aadfa0d2a5a89404819663e3
Summary:
This implements a cyclical learning rate (CLR) schedule with an optional inverse cyclical momentum. More info about CLR: https://github.com/bckenstler/CLR
This is finishing what #2016 started. Resolves #1909.
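A minimal usage sketch of the new scheduler (the bounds and step size are illustrative):
```python
import torch

model = torch.nn.Linear(4, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
scheduler = torch.optim.lr_scheduler.CyclicLR(
    opt, base_lr=0.001, max_lr=0.1, step_size_up=2000, cycle_momentum=True
)
for batch in range(5):
    opt.step()
    scheduler.step()  # CyclicLR is stepped per batch, not per epoch
```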
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18001
Differential Revision: D14451845
Pulled By: sampepose
fbshipit-source-id: 8f682e0c3dee3a73bd2b14cc93fcf5f0e836b8c9