Commit Graph

149 Commits

Author SHA1 Message Date
Animesh Jain
0444f9f85b [dynamo] Reland #104317 - Lazy disable_dynamo API out-of-dynamo (#104664)
The internal diff failed because of torch.deploy issues with disable_dynamo in fx/* and _jit/* files. Removed disable_dynamo for both and added a comment in the code.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104664
Approved by: https://github.com/wconstab
2023-07-06 00:48:02 +00:00
Michael Lazos
a290cbf32b Enable fused foreach Adam compilation (#104121)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104121
Approved by: https://github.com/janeyx99
2023-07-05 23:40:03 +00:00
PyTorch MergeBot
54e320d4d1 Revert "[dynamo] Lazy disable_dynamo API out-of-dynamo (#104317)"
This reverts commit 5c12a810ac.

Reverted https://github.com/pytorch/pytorch/pull/104317 on behalf of https://github.com/huydhn due to This has been reverted internally by D47166892, so I need to also revert it on OSS to keep them in sync ([comment](https://github.com/pytorch/pytorch/pull/104317#issuecomment-1621099151))
2023-07-05 06:21:48 +00:00
Animesh Jain
5c12a810ac [dynamo] Lazy disable_dynamo API out-of-dynamo (#104317)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104317
Approved by: https://github.com/jansel, https://github.com/wconstab, https://github.com/mlazos
2023-06-29 13:30:17 +00:00
Michael Lazos
5a97c947c6 Fix optimizer grad mode state interaction with dynamo (#103952)
Insert a graph break before restoring the grad mode to ensure dynamo respects `no_grad`. This isn't necessarily a bug, but it will allow us to get good perf until aot is updated.

https://github.com/pytorch/pytorch/issues/104053

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103952
Approved by: https://github.com/janeyx99
2023-06-23 02:07:08 +00:00
Nikita Shulga
6d2887cc06 Reland "Move tensor grouping to ATen" (#103912)
This is a reland of https://github.com/pytorch/pytorch/pull/100007 with a build fix for Windows debug builds.
`at::native::ParamsHash` only works on structs with standard layout, but `std::string` isn't one in Visual C++ debug builds, which one can easily verify by running something like:
```cpp
#define _DEBUG
#include <type_traits>
#include <string>
static_assert(std::is_standard_layout_v<std::string>, "Oh noes");
```
If the above condition is not met, instead of printing the static_assert message, VC++ raises very cryptic compilation errors; see https://github.com/pytorch/pytorch/pull/100007#discussion_r1227116292 for more detail.

Also, using `std::hash` for string should result in a faster hash function.

(cherry picked from commit 74b7a6c75e)

### 🤖 Generated by Copilot at 5914771

This pull request introduces a new function `_group_tensors_by_device_and_dtype` that can group tensors by their device and dtype, and updates the `foreach` utilities and several optimizers to use this function. The goal is to improve the performance, readability, and compatibility of the code that handles tensors with different properties. The pull request also adds a test case and type annotations for the new function, and some error checks for the `fused` argument in Adam and AdamW.
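To illustrate what grouping by device and dtype means, here is a minimal pure-Python sketch; the function name and return shape are illustrative only and do not reflect the actual ATen helper, which also tracks per-bucket indices and handles `None` entries.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Tuple

import torch


def group_tensors_by_device_and_dtype(
    tensorlists: List[List[torch.Tensor]],
) -> Dict[Tuple[torch.device, torch.dtype], List[List[torch.Tensor]]]:
    # Bucket parallel tensorlists (params, grads, exp_avgs, ...) by the
    # (device, dtype) of the leading list, keeping the lists aligned per bucket.
    grouped: DefaultDict[
        Tuple[torch.device, torch.dtype], List[List[torch.Tensor]]
    ] = defaultdict(lambda: [[] for _ in tensorlists])
    for i, t in enumerate(tensorlists[0]):
        key = (t.device, t.dtype)
        for j, tl in enumerate(tensorlists):
            grouped[key][j].append(tl[i])
    return dict(grouped)


params = [torch.zeros(2), torch.zeros(2, dtype=torch.float64)]
grads = [torch.ones(2), torch.ones(2, dtype=torch.float64)]
buckets = group_tensors_by_device_and_dtype([params, grads])
# -> two buckets, (cpu, float32) and (cpu, float64), each holding aligned [params, grads]
```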
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103912
Approved by: https://github.com/janeyx99
2023-06-21 09:26:33 +00:00
WEN Hao
67babf7a45 Enhance decorator _use_grad_for_differentiable (#103567)
Aim: enhance decorator _use_grad_for_differentiable so that functions (methods) decorated by it keep their docstrings and signatures unchanged.

Fixes #103566

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103567
Approved by: https://github.com/janeyx99
2023-06-16 18:33:31 +00:00
Andrew Gu
9152d0e5be Silence has_cuda deprecation in optim (#103610)
```
UserWarning: 'has_cuda' is deprecated, please use 'torch.backends.cuda.is_built()'
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103610
Approved by: https://github.com/janeyx99, https://github.com/Skylion007
2023-06-14 22:09:22 +00:00
Jane Xu
fa893f3f58 Fix optim state_dict casting to allow step to cast to CPU (#102619)
I'm guessing this should fix https://github.com/pytorch/pytorch/pull/88015#issuecomment-1569523106 but am waiting on @ychfan to supply more details so I could write a good test case.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102619
Approved by: https://github.com/albanD
2023-06-13 00:46:40 +00:00
PyTorch MergeBot
0cb5bc3b04 Revert "Move tensor grouping to ATen (#100007)"
This reverts commit 74b7a6c75e.

Reverted https://github.com/pytorch/pytorch/pull/100007 on behalf of https://github.com/izaitsevfb due to Breaks internal builds, see D46629727 ([comment](https://github.com/pytorch/pytorch/pull/100007#issuecomment-1587861598))
2023-06-12 18:30:33 +00:00
Masaki Kozuki
74b7a6c75e Move tensor grouping to ATen (#100007)
rel: #94344
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100007
Approved by: https://github.com/janeyx99
2023-06-09 15:44:46 +00:00
shibo19
e4a42bcf56 add foreach support for custom device (#102047)
Fixes #ISSUE_NUMBER
For custom devices, we want to support foreach, so I added a function that lets us set another device type; the default value is cuda.
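A minimal sketch of the idea; the helper names here (`_foreach_supported_devices`, `_supports_foreach`) are illustrative assumptions, not the exact API added by the PR.

```python
import torch

# Hypothetical module-level list of device types eligible for the foreach path;
# "cuda" stays the default, and a custom backend could register its own type.
_foreach_supported_devices = ["cuda"]


def _supports_foreach(params: list) -> bool:
    # Use the foreach path only if every tensor lives on a supported device type.
    return all(
        isinstance(p, torch.Tensor) and p.device.type in _foreach_supported_devices
        for p in params
    )


# A custom out-of-tree backend could opt in with something like:
# _foreach_supported_devices.append("my_device")
```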

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102047
Approved by: https://github.com/janeyx99
2023-06-07 13:59:20 +00:00
Michael Lazos
00f1bb0963 Fix optimizer cuda health check graph break (can be done in the compiler) (#102765)
- Ignore the health check if we are compiling
- Don't disable the function anymore

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102765
Approved by: https://github.com/albanD
2023-06-03 03:42:23 +00:00
Michael Lazos
4da88447ea Disable grouping by dtype and device if compiling (#102771)
Disable grouping if we are compiling; the grouping happens during lowering instead.
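A rough sketch of the guard, assuming a dynamo compile check such as `torch._dynamo.is_compiling()`; the exact predicate and call site in the PR may differ.

```python
import torch
import torch._dynamo


def maybe_group_by_device_and_dtype(grouping_fn, tensorlists):
    # Under dynamo, skip eager grouping entirely: inductor's lowering performs
    # its own grouping by device and dtype.
    if torch._dynamo.is_compiling():
        return {None: tensorlists}
    return grouping_fn(tensorlists)
```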
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102771
Approved by: https://github.com/janeyx99
2023-06-02 21:04:49 +00:00
PyTorch MergeBot
9d77949b9e Revert "add foreach support for custom device (#102047)"
This reverts commit b088ff4677.

Reverted https://github.com/pytorch/pytorch/pull/102047 on behalf of https://github.com/malfet due to Broke inductor, see b088ff4677 ([comment](https://github.com/pytorch/pytorch/pull/102047#issuecomment-1572368942))
2023-06-01 16:33:03 +00:00
shibo19
b088ff4677 add foreach support for custom device (#102047)
Fixes #ISSUE_NUMBER
For custom devices, we want to support foreach, so I added a function that lets us set another device type; the default value is cuda.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102047
Approved by: https://github.com/janeyx99
2023-06-01 06:22:44 +00:00
PyTorch MergeBot
4637c5ae5b Revert "Simplify _use_grad_for_differentiable (#98706)"
This reverts commit b9da79d280.

Reverted https://github.com/pytorch/pytorch/pull/98706 on behalf of https://github.com/huydhn due to Sorry for reverting your PR but a bunch of inductor tests are failing after this commit, so reverting the PR just to be sure
2023-04-22 00:35:56 +00:00
Jason Ansel
b9da79d280 Simplify _use_grad_for_differentiable (#98706)
This makes it so dynamo can trace through it

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98706
Approved by: https://github.com/janeyx99
2023-04-21 20:47:19 +00:00
Jane Xu
aacbf091db Allow fused optimizers to call _foreach_zero_ in zero_grad (#97159)
Fixes #97032

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97159
Approved by: https://github.com/Skylion007
2023-03-20 19:03:26 +00:00
Aaron Gokaslan
5471621497 [BE] Remove unnecessary dict comprehensions (#97116)
Removes unnecessary dict comprehensions to optimize creation of dicts from iterables

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97116
Approved by: https://github.com/kit1980
2023-03-20 00:56:57 +00:00
Aaron Gokaslan
dd9ade6377 Remove unnecessary items() call in zero_grad (#97040)
Micro-optimization to zero_grad(), which is performance-critical
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97040
Approved by: https://github.com/ezyang, https://github.com/albanD
2023-03-17 21:34:14 +00:00
Jane Xu
75cb99e549 [optim] Widen the cases for defaulting to foreach (#95820)
Big OOP correction continued. Also added a test this time to verify the defaulting was as expected.

The key here is realizing that the grouping for foreach already assumes that the non-param tensorlists follow suit in dtype and device, so it is too narrow to check that _all_ tensors were on CUDA. The main leeway this allowed was state_steps, which are sometimes cpu tensors. Since foreach _can_ handle cpu tensors, this should not introduce breakage.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95820
Approved by: https://github.com/albanD
2023-03-02 04:15:33 +00:00
Jane Xu
2bcf863fad [optim] include nn.Parameter as foreach supported (#95811)
This PR is a result of a realization that models are NOT subscribed to the foreach defaulting as has been claimed in our documentation for months now. BIG OOPS.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95811
Approved by: https://github.com/albanD
2023-03-02 04:15:33 +00:00
Jane Xu
e5b9d98752 Rephrase zero_grad docs (#95643)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95643
Approved by: https://github.com/albanD
2023-02-28 22:04:23 +00:00
Jane Xu
097679478e [optim] Set defaults to foreach, NOT fused (#95241)
Rolling back the default change for Adam and rectifying the docs to reflect that AdamW never defaulted to fused.

Since our fused implementations are relatively newer, let's give them a longer bake-in time before flipping the switch for every user.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95241
Approved by: https://github.com/ngimel
2023-02-22 04:47:32 +00:00
Aaron Gokaslan
8fce9a09cd [BE]: pyupgrade Python to 3.8 - imports and object inheritance only (#94308)
Apply parts of pyupgrade to torch (starting with the safest changes).
This PR only does two things: removes the need to inherit from object and removes unused future imports.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94308
Approved by: https://github.com/ezyang, https://github.com/albanD
2023-02-07 21:10:56 +00:00
Masaki Kozuki
6ba041fcae Look up group["capturable"], not defaults["capturable"] in Adam(W) (#94149)
We could set different values in each `param_group` when calling dunder init of `torch.optim` optimizers as in e.g.  https://github.com/pytorch/pytorch/issues/89987.

So check whether or not `capturable` is `True` among all the `param_group`s.
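A minimal sketch of the per-group check; the helper name is illustrative.

```python
def any_capturable(param_groups) -> bool:
    # Look at each group's own flag instead of defaults["capturable"],
    # since per-group overrides are allowed at construction time.
    return any(group.get("capturable", False) for group in param_groups)
```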
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94149
Approved by: https://github.com/albanD
2023-02-07 00:24:35 +00:00
Jane Xu
4fc19e1a71 [optim][adam] use fastest impl whenever possible, add util (#93184)
This makes it so that ONLY when the user doesn't set anything for foreach or fused do we switch the default, and it cascades Adam so that we default to fused, then foreach, then single-tensor.

To clarify:
* if the user puts True in foreach _only_, it will run the foreach implementation.
* if the user puts True in fused _only_, it will run the fused implementation.
* if the user puts True in foreach AND in fused, it will run the fused implementation.

And:
* if the user puts False in foreach _only_, it will run the single tensor implementation.
* if the user puts False in fused _only_, it will still run the single tensor implementation.
* if the user puts False in foreach AND in fused, it will run the single tensor implementation.

I also didn't trust myself that much with the helper function, so I ran some local asserts on _default_to_fused_or_foreach. The only point left to really test is the type(p) == torch.Tensor check, but I think the distributed tests will catch that in CI.
```
cuda_only_fp_list = [
    torch.rand((1, 2), device="cuda", dtype=torch.float32),
    torch.rand((1, 2), device="cuda", dtype=torch.float64),
    torch.rand((1, 2), device="cuda", dtype=torch.float16),
    torch.rand((1, 2), device="cuda", dtype=torch.bfloat16),
]

cuda_only_int_list = [
    torch.randint(1024, (1, 2), device="cuda", dtype=torch.int64),
]

cpu_list = [
    torch.rand((1, 2), device="cpu", dtype=torch.float32),
    torch.rand((1, 2), device="cpu", dtype=torch.float64),
    torch.rand((1, 2), device="cpu", dtype=torch.float16),
]

none_list = [None]

# differentiable should always make it return false for both
assert _default_to_fused_or_foreach([cuda_only_fp_list], True, True) == (False, False)
assert _default_to_fused_or_foreach([cuda_only_fp_list], True, False) == (False, False)

# cpu lists should always make it return false for both
assert _default_to_fused_or_foreach([cuda_only_fp_list, cpu_list], False, True) == (False, False)
assert _default_to_fused_or_foreach([cpu_list], False, True) == (False, False)
assert _default_to_fused_or_foreach([cuda_only_fp_list, cpu_list], False, False) == (False, False)
assert _default_to_fused_or_foreach([cpu_list], False, False) == (False, False)

# has fused triggers correctly
assert _default_to_fused_or_foreach([cuda_only_fp_list], False, True) == (True, False)
assert _default_to_fused_or_foreach([cuda_only_fp_list], False, False) == (False, True)

# ints always goes to foreach
assert _default_to_fused_or_foreach([cuda_only_fp_list, cuda_only_int_list], False, True) == (False, True)
assert _default_to_fused_or_foreach([cuda_only_fp_list, cuda_only_int_list], False, False) == (False, True)

# Nones don't error
assert _default_to_fused_or_foreach([cuda_only_fp_list, none_list], False, True) == (True, False)
assert _default_to_fused_or_foreach([cuda_only_fp_list, cuda_only_int_list, none_list], False, True) == (False, True)
assert _default_to_fused_or_foreach([none_list], False, True) == (True, False)
assert _default_to_fused_or_foreach([none_list], False, False) == (False, True)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/93184
Approved by: https://github.com/albanD
2023-01-30 19:58:55 +00:00
Jane Xu
8c9f745af1 [foreach] guard default support on native tensors only (#92923)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92923
Approved by: https://github.com/ngimel, https://github.com/crcrpar
2023-01-26 04:52:58 +00:00
Jane Xu
b90496eef5 [nn] zero_grad() set_to_none default True (#92731)
Attempts to fix #92656

BC-breaking! This changes the default of zero_grad in optim and in nn to set grads to None by default instead of zero tensors. We are changing the default because there are proven perf wins and existing code has typically not regressed due to this change. (We will probably have to flesh out this note more.)
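A short usage sketch of the new default versus the old behavior; the model and optimizer here are placeholders.

```python
import torch

model = torch.nn.Linear(4, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

model(torch.randn(1, 4)).sum().backward()
opt.zero_grad()                   # new default: grads become None
assert all(p.grad is None for p in model.parameters())

model(torch.randn(1, 4)).sum().backward()
opt.zero_grad(set_to_none=False)  # old behavior: grads become zero tensors
assert all(p.grad is not None and not p.grad.any() for p in model.parameters())
```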

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92731
Approved by: https://github.com/ngimel
2023-01-26 01:04:28 +00:00
Jane Xu
0d870b50d3 [optim][nadam] group tensors in foreach, make it default (#92715)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92715
Approved by: https://github.com/albanD
2023-01-21 05:43:37 +00:00
Jane Xu
0070c546b5 [BE][optim] abstract out docstrings, add differentiable docs (#92336)
1. abstract out common doc strings --> I'm sure there are more, but let this be a first step.
2. Add differentiable docs to those that are actually differentiable
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92336
Approved by: https://github.com/albanD
2023-01-18 15:09:28 +00:00
Jane Xu
4fc796daf9 [optim] abstract out _default_to_foreach_util (#92305)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92305
Approved by: https://github.com/albanD
2023-01-17 19:42:20 +00:00
PyTorch MergeBot
7f2b5ea1e1 Revert "Avoid device casting for all singleton tensors in optimizer states (#91454)"
This reverts commit 1e725c9747.

Reverted https://github.com/pytorch/pytorch/pull/91454 on behalf of https://github.com/janeyx99 due to Likely caused regression where checkpoint resume fails during training
2023-01-10 18:57:50 +00:00
Joel Schlosser
1e725c9747 Avoid device casting for all singleton tensors in optimizer states (#91454)
Fixes #75224
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91454
Approved by: https://github.com/janeyx99
2023-01-04 17:55:00 +00:00
Soumith Chintala
06326a7721 [optim] skip .item calls in all optimizers when compiling with dynamo (#88173)
@mlazos: skips `item()` calls if compiling with dynamo by defining a helper function `_get_value`, which either returns the result of `.item()` or the scalar CPU tensor when compiling with dynamo. This was done because removing `item()` calls entirely significantly regresses eager perf. Additionally, `_dispatch_sqrt` calls the appropriate sqrt function (math.sqrt or torch.sqrt).

Fixes https://github.com/pytorch/torchdynamo/issues/1083

This PR will no longer be needed once symint support is default.

This PR closes all remaining graph breaks in the optimizers (!!)
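A minimal sketch of the two helpers as described; this is a simplified approximation, and the in-tree `_get_value`/`_dispatch_sqrt` and their compile check may differ in detail.

```python
import math

import torch
import torch._dynamo


def _is_compiling() -> bool:
    # Assumed compile check; the predicate used inside torch.optim may differ.
    return torch._dynamo.is_compiling()


def get_value(x):
    # Under dynamo, return the 0-dim CPU tensor as-is to avoid the graph break
    # an .item() call would cause; in eager, call .item() to keep perf unchanged.
    if isinstance(x, torch.Tensor) and not _is_compiling():
        return x.item()
    return x


def dispatch_sqrt(x):
    # torch.sqrt for tensors, math.sqrt for plain Python scalars.
    return torch.sqrt(x) if isinstance(x, torch.Tensor) else math.sqrt(x)
```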

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88173
Approved by: https://github.com/albanD
2022-12-12 17:32:35 +00:00
Anupam Bhatnagar
6f4dea562d Implement post and pre hooks for optimizer (#89176)
Fixes #88446

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89176
Approved by: https://github.com/albanD
2022-12-02 07:03:45 +00:00
Seonglyong Gong
f80ef73d1c [Profiler] tracking Optimizer (part 2 of Record Optimizer) (#84920)
Summary:
Part 2 of Record Optimizer param_groups and states (https://github.com/pytorch/pytorch/pull/84063)
- hooking from optimizer step
- PyOptCall Type
- declare data type for collection
- python binding
- simple unit test case

Test Plan: buck run mode/opt //caffe2/test:profiler

Differential Revision: D39402667

Pull Request resolved: https://github.com/pytorch/pytorch/pull/84920
Approved by: https://github.com/robieta
2022-09-28 02:48:07 +00:00
ProGamerGov
71d50f4f89 Change docstring type callable to Callable for consistency (#82487)
### Description

Across PyTorch's docstrings, both `callable` and `Callable` are used for variable types. Callable should be capitalized when we are referring to the `Callable` type, and not the Python `callable()` function.

### Testing

There shouldn't be any testing required.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/82487
Approved by: https://github.com/albanD
2022-08-01 17:26:09 +00:00
Emilio Castillo
49b4f45781 Add initial support for differentiable optimizers (#80938)
Adds the `differentiable` argument, a method for updating parameters in an existing optimizer, and a template for testing the differentiability of multiple optimizers.

This is all based on discussions with @albanD & @jbschlosser.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80938
Approved by: https://github.com/albanD
2022-07-25 13:37:08 +00:00
Edward Z. Yang
57f001f35a Don't error if _warned_capturable_if_run_uncaptured not set (#80345)
This can happen if an optimizer was pickled.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80345
Approved by: https://github.com/malfet, https://github.com/albanD
2022-06-29 03:46:22 +00:00
anjali411
bda04e9f5e Add __all__ for torch.optim and torch.nn.modules modules (#80237)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80237
Approved by: https://github.com/albanD
2022-06-24 21:34:10 +00:00
Michael Carilli
ba27ee9e8f [CUDA graphs] Allows Adam and AdamW to be capture-safe (#77862)
Near term fix for https://github.com/pytorch/pytorch/issues/76368.

Q. Why does the user need to request `capturable=True` in the optimizer constructor? Why can't capture safety be completely automatic?
A. We need to set up capture-safe (device-side) state variables before capture. If we don't, and step() internally detects capture is underway, it's too late: the best we could do is create a device state variable and copy the current CPU value into it, which is not something we want baked into the graph.

Q. Ok, why not just do the capture-safe approach with device-side state variables all the time?
A. It incurs several more kernel launches per parameter, which could really add up and regress cpu overhead for ungraphed step()s. If the optimizer won't be captured, we should allow step() to stick with its current cpu-side state handling.

Q. But cuda RNG is a stateful thing that maintains its state on the cpu outside of capture and replay, and we capture it automatically. Why can't we do the same thing here?
A. The graph object can handle RNG generator increments because its capture_begin, capture_end, and replay() methods can see and access the generator object. But the graph object has no explicit knowledge of or access to optimizer steps in its capture scope. We could let the user tell the graph object which optimizers will be stepped in its scope, i.e. something like
```python
graph.will_use_optimizer(opt)
graph.capture_begin()
...
```
but that seems clunkier than an optimizer constructor arg.

I'm open to other ideas, but right now I think constructor arg is necessary and the least bad approach.

Long term, https://github.com/pytorch/pytorch/issues/71274 is a better fix.
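A hedged usage sketch of the constructor flag together with CUDA graph capture (requires a CUDA build and device); warmup and static input/output handling are trimmed to the bare minimum.

```python
import torch

model = torch.nn.Linear(8, 8).cuda()
opt = torch.optim.Adam(model.parameters(), lr=1e-3, capturable=True)

static_x = torch.randn(4, 8, device="cuda")

# Warm up on a side stream before capture, including optimizer steps.
s = torch.cuda.Stream()
s.wait_stream(torch.cuda.current_stream())
with torch.cuda.stream(s):
    for _ in range(3):
        opt.zero_grad(set_to_none=True)
        model(static_x).sum().backward()
        opt.step()
torch.cuda.current_stream().wait_stream(s)

# Capture forward, backward, and the optimizer step into one graph.
g = torch.cuda.CUDAGraph()
opt.zero_grad(set_to_none=True)
with torch.cuda.graph(g):
    static_loss = model(static_x).sum()
    static_loss.backward()
    opt.step()

static_x.copy_(torch.randn(4, 8, device="cuda"))
g.replay()  # replays the captured iteration, including the step()
```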
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77862
Approved by: https://github.com/ezyang
2022-06-13 01:56:47 +00:00
Mikayla Gawarecki
10bb0ffe69 Fix casting bug in state_step for optimizers when loading state dict
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75214

Approved by: https://github.com/albanD
2022-04-05 01:27:18 +00:00
Mikayla Gawarecki
3a21f38a2e Integrate multi_tensor zero_grad into Optimizer base class (#69936)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69936

Currently, the optimizers in `torch/optim/_multi_tensor/` all override the base Optimizer class' implementation of `zero_grad` with the same foreach zero_grad implementation (e.g. [here](https://github.com/pytorch/pytorch/blob/master/torch/optim/_multi_tensor/adadelta.py#L93-L114)). There is a TODO that indicates that this should be refactored to the base class once the foreach ops are in good shape. This PR is intended to address that TODO.
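A rough sketch of the refactored base-class logic; it is simplified and omits the detach_/requires_grad_ handling of the real method.

```python
from collections import defaultdict

import torch


def foreach_zero_grad(params, set_to_none: bool = False) -> None:
    # Bucket existing grads by (device, dtype) and zero each bucket with one
    # torch._foreach_zero_ call instead of one .zero_() per parameter.
    per_device_and_dtype_grads = defaultdict(list)
    for p in params:
        if p.grad is None:
            continue
        if set_to_none:
            p.grad = None
        else:
            per_device_and_dtype_grads[(p.grad.device, p.grad.dtype)].append(p.grad)
    for grads in per_device_and_dtype_grads.values():
        torch._foreach_zero_(grads)
```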

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D33346748

Pulled By: mikaylagawarecki

fbshipit-source-id: 6573f4776aeac757b6a778894681868191a1b4c7
2022-01-05 15:46:23 -08:00
oliver
f8297d40fc Adds a maximize flag to SGD. (#67847)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/46480 -- for SGD.

## Notes:
- I have modified the existing tests to take a new `constructor_accepts_maximize` flag. When this is set to true, the `_test_basic_cases_template` function will test both maximizing and minimizing the sample function.
- This was the clearest way I could think of testing the changes -- I would appreciate feedback on this strategy.

## Work to be done:
- [ ] I need to update the docs.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67847

Reviewed By: H-Huang

Differential Revision: D32252631

Pulled By: albanD

fbshipit-source-id: 27915a3cc2d18b7e4d17bfc2d666fe7d2cfdf9a4
2021-11-09 00:43:07 -08:00
Tugsbayasgalan (Tugsuu) Manlaibaatar
52d7dd7398 [DOC] improve docstring for Optimizer.state_dict (#63153)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63153

Fixes: https://github.com/pytorch/pytorch/issues/60121

Test Plan: Imported from OSS

Reviewed By: pbelevich

Differential Revision: D30629462

Pulled By: tugsbayasgalan

fbshipit-source-id: a9160e02ac53bb1a6219879747d73aae9ebe4d2f
2021-08-29 10:20:58 -07:00
Chester Liu
58eb23378f Clean up usage of torch._six partially (#49785)
Summary:
See https://github.com/pytorch/pytorch/issues/42919

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49785

Reviewed By: mruberry

Differential Revision: D25963833

Pulled By: bugra

fbshipit-source-id: 11c90d6b8d3f206c9d0a4d8621b773beb10c6ba2
2021-02-08 13:58:34 -08:00
Jan
a5b65ae40a Fix small typo (#51542)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/51541

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51542

Reviewed By: albanD

Differential Revision: D26199174

Pulled By: H-Huang

fbshipit-source-id: 919fc4a70d901916eae123672d010e9eb8e8b977
2021-02-02 10:14:17 -08:00
Samuel Marks
e6779d4357 [*.py] Rename "Arguments:" to "Args:" (#49736)
Summary:
I've written custom parsers and emitters for everything from docstrings to classes and functions. However, I recently came across an issue when I was parsing/generating from the TensorFlow codebase: inconsistent use of `Args:` and `Arguments:` in its docstrings.

```sh
(pytorch#c348fae)$ for name in 'Args:' 'Arguments:'; do
    printf '%-10s %04d\n' "$name" "$(rg -IFtpy --count-matches "$name" | paste -s -d+ -- | bc)"; done
Args:      1095
Arguments: 0336
```

It is easy enough to extend my parsers to support both variants; however, it looks like `Arguments:` is wrong anyway, as per:

  - https://google.github.io/styleguide/pyguide.html#doc-function-args @ [`ddccc0f`](https://github.com/google/styleguide/blob/ddccc0f/pyguide.md)

  - https://chromium.googlesource.com/chromiumos/docs/+/master/styleguide/python.md#describing-arguments-in-docstrings @ [`9fc0fc0`](https://chromium.googlesource.com/chromiumos/docs/+/9fc0fc0/styleguide/python.md)

  - https://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_google.html @ [`c0ae8e3`](https://github.com/sphinx-contrib/napoleon/blob/c0ae8e3/docs/source/example_google.rst)

Therefore, only `Args:` is valid. This PR replaces them throughout the codebase.

PS: For related PRs, see tensorflow/tensorflow/pull/45420

PPS: The trackbacks automatically appearing below are sending the same changes to other repositories in the [PyTorch](https://github.com/pytorch) organisation.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49736

Reviewed By: albanD

Differential Revision: D25710534

Pulled By: soumith

fbshipit-source-id: 61e8ff01abb433e9f78185c2d1d0cbd7c22c1619
2020-12-28 09:34:47 -08:00
Teng Gao
1c31f76297 Add high level profiling trace for dataloading and optimizer (#47655)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/47441

To give users more information about Python-level functions in profiler traces, we propose to instrument the following functions:

```
_BaseDataLoaderIter.__next__
Optimizer.step
Optimizer.zero_grad
```

Because record_function already uses `if (!active)` to check whether the profiler is enabled, we don't explicitly call torch.autograd._profiler_enabled() before each instrumented call.
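A hedged sketch of the instrumentation pattern using `record_function`; the decorator-style wrapping and the range-name format here are illustrative, not the in-tree hook.

```python
import torch
from torch.autograd.profiler import record_function


def profile_step(optimizer):
    # Wrap step() so it appears as a named range in profiler traces;
    # record_function is effectively a no-op while the profiler is inactive.
    original_step = optimizer.step

    def step(*args, **kwargs):
        with record_function("Optimizer.step#" + type(optimizer).__name__ + ".step"):
            return original_step(*args, **kwargs)

    optimizer.step = step
    return optimizer


opt = profile_step(torch.optim.SGD(torch.nn.Linear(2, 2).parameters(), lr=0.1))
```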

Acknowledgement: nbcsm, guotuofeng, gunandrose4u , guyang3532 , mszhanyi

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47655

Reviewed By: smessmer

Differential Revision: D24960386

Pulled By: ilia-cher

fbshipit-source-id: 2eb655789e2e2f506e1b8f95ad3d470c83281102
2020-12-09 00:13:56 -08:00
Alban Desmaison
46b252b83a Revert D24262885: [pytorch][PR] Added foreach_zero_ API
Test Plan: revert-hammer

Differential Revision:
D24262885 (8e37dcb1f3)

Original commit changeset: 144c283dd009

fbshipit-source-id: 451b202e23bc1fcb11b20d26c11d9a1329789d22
2020-10-28 06:48:59 -07:00
iurii zdebskyi
8e37dcb1f3 Added foreach_zero_ API (#46215)
Summary:
Added the foreach_zero_(TensorList) API.

Tested via unit tests

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46215

Reviewed By: zhangguanheng66

Differential Revision: D24262885

Pulled By: izdeby

fbshipit-source-id: 144c283dd00924083096d6d92eb9085cbd6097d3
2020-10-27 18:03:34 -07:00
Michael Carilli
3e6bb5233f Reference amp tutorial (recipe) from core amp docs (#44725)
Summary:
https://pytorch.org/tutorials/recipes/recipes/amp_recipe.html is live.  Core amp docs should reference it.

Also I fixed some typos in the `zero_grad` docs we ignored when git was behaving weirdly during ngimel's merge of https://github.com/pytorch/pytorch/pull/44423.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44725

Reviewed By: mruberry

Differential Revision: D23723807

Pulled By: ngimel

fbshipit-source-id: ca0b76365f8ca908bd978e3b38bf81857fa6c2a3
2020-09-16 11:37:58 -07:00
taiyuanz
c515881137 Add reset_grad() function (#44423)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44423

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42754

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D23010859

Pulled By: ngimel

fbshipit-source-id: 56eec43eba88b98cbf714841813977c68f983564
2020-09-09 22:05:45 -07:00
Yanli Zhao
79cfd85987 grad detach_ only when it has grad_fn in zero_grad call (#41283)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41283

In optimizer.zero_grad(), detach_ is useful to avoid a memory leak only when the grad has a grad_fn, so add a check to call grad.detach_ only when the grad has a grad_fn in the zero_grad() function.
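A minimal sketch of the check described, with the surrounding loop trimmed down; a simplification, not the exact in-tree code.

```python
import torch


def zero_grad(params) -> None:
    for p in params:
        if p.grad is None:
            continue
        if p.grad.grad_fn is not None:
            # The grad is itself part of an autograd graph (e.g. produced with
            # create_graph=True); detach in place so zeroing drops that graph.
            p.grad.detach_()
        else:
            # Plain grad with no history: detaching is unnecessary.
            p.grad.requires_grad_(False)
        p.grad.zero_()
```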
ghstack-source-id: 108702289

Test Plan: unit test

Reviewed By: mrshenli

Differential Revision: D22487315

fbshipit-source-id: 861909b15c8497f1da57f092d8963d4920c85e38
2020-07-29 11:40:13 -07:00
mariosasko
4281240cb5 Raise error for duplicate params in param group #40967 (#41597)
Summary:
This PR fixes an issue in https://github.com/pytorch/pytorch/issues/40967 where duplicate parameters across different parameter groups are not allowed, but duplicates inside the same parameter group are accepted. After this PR, both cases are treated equally and raise `ValueError`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41597

Reviewed By: zou3519

Differential Revision: D22608019

Pulled By: vincentqb

fbshipit-source-id: 6df41dac62b80db042cfefa6e53fb021b49f4399
2020-07-27 12:25:52 -07:00
Ram Rachum
f6b9848c25 Use chain.from_iterable in optimizer.py (#40156)
Summary:
This is a faster and more idiomatic way of using `itertools.chain`. Instead of computing all the items in the iterable and storing them in memory, they are computed one-by-one and never stored as a huge list. This can save on both runtime and memory space.
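A tiny illustration of the difference; the values are placeholders.

```python
from itertools import chain

param_groups = [[1, 2], [3, 4], [5]]

# chain(*param_groups) unpacks every group eagerly at call time.
eager = chain(*param_groups)

# chain.from_iterable consumes the outer iterable lazily, group by group,
# so nothing is materialized up front even if param_groups is a generator.
lazy = chain.from_iterable(g for g in param_groups)

assert list(eager) == list(lazy) == [1, 2, 3, 4, 5]
```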
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40156

Reviewed By: ezyang

Differential Revision: D22189038

Pulled By: vincentqb

fbshipit-source-id: 160b2c27f442686821a6ea541e1f48f4a846c186
2020-06-23 14:07:05 -07:00
Alex Hedges
a3c87c4922 Make Optimizer.state_dict() deterministic (#37347)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/36831.

Instead of using `id()`, an arbitrary yet consistent order-based index is used. This results in a deterministic output between runs.

I am not the biggest fan of using `nonlocal` (it appears to be used sparingly in the codebase) to get `start_index` between calls to `pack_group()`, but the alternatives had larger issues:
- Using the last value added to `param_mappings` would be ideal, but that only works if `dict` iteration order is consistent, and PyTorch currently supports Python <3.7.
- Using the maximum value added to `param_mappings` wouldn't have that issue but would not be constant time.

For testing, I confirmed that `test_optim.py` works before and after these changes. Randomizing the indices in `param_mappings` causes the tests to fail, which is further evidence these changes work. I'm not 100% sure these tests are sufficient, but they're a start.
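A condensed sketch of the approach; the names mirror the description above, not necessarily the exact code.

```python
import torch


def pack_state_dict(param_groups, state):
    # Map each parameter to a stable, order-based index instead of id(),
    # so repeated runs serialize identically.
    param_mappings = {}
    start_index = 0

    def pack_group(group):
        nonlocal start_index
        packed = {k: v for k, v in group.items() if k != "params"}
        param_mappings.update(
            {id(p): i for i, p in enumerate(group["params"], start_index)
             if id(p) not in param_mappings}
        )
        packed["params"] = [param_mappings[id(p)] for p in group["params"]]
        start_index += len(packed["params"])
        return packed

    packed_groups = [pack_group(g) for g in param_groups]
    packed_state = {
        (param_mappings[id(k)] if isinstance(k, torch.Tensor) else k): v
        for k, v in state.items()
    }
    return {"state": packed_state, "param_groups": packed_groups}
```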
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37347

Differential Revision: D21353820

Pulled By: vincentqb

fbshipit-source-id: e549f1f154833a461b1f4df6d07ad509aab34ea1
2020-06-01 15:32:02 -07:00
albanD
b0871f211b Make all optimizers consistent so that they don't change gradients inplace
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30257

Test Plan: Imported from OSS

Differential Revision: D18665461

Pulled By: albanD

fbshipit-source-id: cfdafef919468a41007881b82fd288b7128baf95
2019-11-26 12:16:25 -08:00
Dmytro Dzhulgakov
c25e33789e Lightweight at-most-once logging for API usage (#20745)
Summary:
Resubmit #20698 which got messed up.

The idea is that when PyTorch is used in a custom build environment (e.g. Facebook), it's useful to track usage of various APIs centrally. This PR introduces a simple, very lightweight mechanism to do so - only the first invocation of a trigger point is logged. This is significantly more lightweight than #18235, and thus we can allow putting logging in e.g. TensorImpl.

Also adds an initial list of trigger points. Trigger points are added in such a way that no static initialization triggers them, i.e. just linking with libtorch.so will not cause any logging. Further suggestions of what to log are welcomed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20745

Differential Revision: D15429196

Pulled By: dzhulgakov

fbshipit-source-id: a5e41a709a65b7ebccc6b95f93854e583cf20aca
2019-05-23 23:17:59 -07:00
Edward Z. Yang
9b1dbffba5
Re-sync with internal repository (#20702) 2019-05-20 09:22:57 -04:00
Dmytro Dzhulgakov
d3059b9c49 Lightweight logging for once-only API usage 2019-05-19 23:04:40 -07:00
barrh
557b1b362f Fix copied optimizer (#19308)
Summary:
Add the defaults field to the copied object.
Prior to this patch, optimizer.__getattr__ has excluded the defaults
attribute of optimizer source object, required by some LR schedulers. (e.g. CyclicLR with momentum)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19308

Differential Revision: D15012801

Pulled By: soumith

fbshipit-source-id: 95801b269f6f9d78d531d4fed95c973b280cc96f
2019-04-19 10:27:01 -07:00
Jerry Ma
7956e9718b Add name for required optimizer parameter. (#13202)
Summary:
Small change -- the benefit is that the docs will show
``<required parameter>`` instead of ``<object object>``
for these required parameters.
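A minimal sketch of the sentinel pattern this refers to; the class name mirrors what torch.optim has historically used, but treat the exact spelling and usage as an assumption.

```python
class _RequiredParameter:
    """Singleton marker for arguments that must be supplied by the caller."""

    def __repr__(self) -> str:
        # This repr is what generated docs render for the default value.
        return "<required parameter>"


required = _RequiredParameter()


def sgd_like(params, lr=required):
    # help()/Sphinx now show lr=<required parameter> instead of <object object at 0x...>.
    if lr is required:
        raise ValueError("lr is a required argument")
```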
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13202

Reviewed By: SsnL

Differential Revision: D12826252

Pulled By: jma127

fbshipit-source-id: 5f2c8495e5c56920377e4e012b8711e8f2a6e30e
2018-10-29 15:02:21 -07:00
Jeff Smith
05e06f7de2 migrating deprecated calls without abc module for containers (#11515)
Summary:
Implementing #10540.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11515

Reviewed By: apaszke

Differential Revision: D9771045

Pulled By: jeffreyksmithjr

fbshipit-source-id: 85ea39abaa9b465805a969f122b626b11fc85ef6
2018-09-13 15:09:22 -07:00
Domagoj Alagić
f43e067128 Make optimizer not complain about parameters with requires_grad=False (#7419) 2018-05-09 11:34:52 -04:00
Tongzhou Wang
1c01eabd3c
Codemod to update our codebase to 0.4 standard (#6641)
* Codemod to update our codebase to 0.4 standard

* Update some of the test scripts

* remove Variable in test_clip_grad_value

* fix _symbolic_override_wrapper_maker
2018-04-17 22:06:54 -04:00
Kento NOZAWA
3b58b859b2 Fix typos in docs (#6389) 2018-04-07 12:41:15 -04:00
Jiaming Liu
31c0e2321a Block set from param_group['params'] (#6031)
* Block set from param_group['params']

This might cause `list(params)` to produce elements in random order. In this case, in `load_state_dict()`, `id_map` would not be matched correctly (see the sketch after this list).

* Update Error Message

* Add Warning on Optimizer Docs

* Update optimizer.py
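A hedged sketch of the check and why sets are rejected; the helper name and error text are illustrative.

```python
import torch


def check_params(params):
    # A set iterates in arbitrary order, so indices built from list(params)
    # could not be matched back up by id_map in load_state_dict().
    if isinstance(params, torch.Tensor):
        return [params]
    if isinstance(params, set):
        raise TypeError(
            "optimizer parameters need to be organized in ordered collections; "
            "the ordering of tensors in sets will change between runs"
        )
    return list(params)
```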
2018-03-28 07:45:19 -07:00
Sam Gross
30ec06c140
Merge Variable and Tensor classes (#5225)
This replaces the torch.Tensor constructors with factories that produce
Variables. Similarly, functions on the torch module (e.g. torch.randn)
now return Variables.

To keep the PR to a reasonable size, I've left most of the unused tensor
code. Subsequent PRs will remove the dead code, clean-up calls to
torch.autograd.Variable, and rename Variable to Tensor everywhere.

There are some breaking changes because Variable and Tensors had
slightly different semantics. There's a list of those changes here:

 https://github.com/pytorch/pytorch/wiki/Breaking-Changes-from-Variable-and-Tensor-merge
2018-02-23 18:03:31 -05:00
Vishwak Srinivasan
89acc10f85 Adding description for Optimizers (#4371) 2017-12-28 16:55:52 +01:00
Sam Gross
d605058212
Replace Variable.volatile with torch.no_grad() (#3970)
This removes volatile from Variable. The functionality is mostly
replaced by a global (thread-local) flag, which is controlled by
torch.set_grad_enabled() and the context manager torch.no_grad().

In C++, the flag is exposed through GradMode::is_enabled() and GradMode::set_enabled()

Fixes #3627
2017-12-18 15:46:13 -05:00
Adam Paszke
af9fd35d82 Cast tensors when loading optimizer state dicts (#3658) 2017-11-28 09:56:39 -05:00
Taehoon Lee
5d9de014bd Fix typos 2017-10-01 03:09:25 -04:00
randxie
1a83c372ec address issue #1488 by using defaultdict in load_state_dict 2017-09-20 14:56:21 -04:00
Michael Dietz
e69063405e Allow param groups to be added to Optimizer dynamically (#2374) 2017-08-30 11:20:58 -04:00
Yan Wang
a76098ac15 fix optimizer when given single parameters (instead of an iterable)
When I use named_parameters to modify the lr and weight decay, I hit a bug, because the value named_parameters returns is a torch.nn.parameter.Parameter, not a generator of Parameters.
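A small sketch of the fix's intent (accept a single Parameter as well as an iterable); the helper name is illustrative and the real change lives in the optimizer constructor path.

```python
import torch


def normalize_params(params):
    # Accept a single Tensor/Parameter (e.g. one value taken from
    # named_parameters()) as well as an iterable of them.
    if isinstance(params, torch.Tensor):
        return [params]
    return list(params)


w = torch.nn.Parameter(torch.randn(2))
out = normalize_params(w)
assert len(out) == 1 and out[0] is w
```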
2017-06-05 23:47:56 -04:00
Adam Paszke
feef54ec34 Don't modify non-volatile grads in zero_grad 2017-05-10 16:43:14 +02:00
Adam Paszke
20aa5b066f Convert some of the functions to new format
Also, fix a lot of issues that appeared after the previous commits.
2017-05-01 16:44:56 -04:00
Adam Paszke
2ca787fcf4 Refactor attribute names in autograd 2017-05-01 16:44:56 -04:00
Martin Raison
f17cfe4293 sparse tensor operations (#735) 2017-03-03 18:37:03 +01:00
Adam Paszke
bd7a5ad6f0 Make Optimizer.load_state_dict use __setstate__ 2017-02-26 20:02:42 +01:00
Luke Yeager
3ed720079e [pep8] Fix most remaining lint manually 2017-01-28 01:15:51 +01:00
Luke Yeager
e7c1e6a8e3 [pep8] Fix most lint automatically with autopep8
Here's the command I used to invoke autopep8 (in parallel!):

    git ls-files | grep '\.py$' | xargs -n1 -P`nproc` autopep8 -i

Several rules are ignored in setup.cfg. The goal is to let autopep8
handle everything which it can handle safely, and to disable any rules
which are tricky or controversial to address. We may want to come back
and re-enable some of these rules later, but I'm trying to make this
patch as safe as possible.

Also configures flake8 to match pep8's behavior.

Also configures TravisCI to check the whole project for lint.
2017-01-28 01:15:51 +01:00
Adam Paszke
ecfcf39f30 Improve optimizer serialization
Also, add optimizer.load_state_dict
2017-01-24 17:30:50 -05:00
Adam Paszke
3238786ea1 Improve optimizer error messages 2017-01-22 18:32:51 -05:00
Adam Paszke
95f0fa8a92 Change .grad attribute of Variables to be a Variable 2017-01-16 12:59:47 -05:00
Adam Paszke
676ffee542 Check params type in optimizers 2017-01-16 12:59:47 -05:00
Adam Paszke
604e13775f Add optim docs 2017-01-16 12:59:47 -05:00
Sam Gross
162170fd7b Add optional weight decay to optim.SGD (#269) 2016-11-29 20:35:40 -05:00
Adam Paszke
09493603f6 Change optimizer API 2016-11-08 18:12:56 +01:00
Adam Paszke
df59b89fbb Add more optimizers 2016-11-07 22:50:56 +01:00
Adam Paszke
3cbe66ba8c Change requires_grad default to False 2016-10-05 08:46:34 -07:00
Adam Paszke
99de537a2e Remove CUDA sync points from losses and trainer 2016-10-05 08:46:31 -07:00
Adam Paszke
4db6667923 Allow specifying per-parameter optimization parameters 2016-10-04 18:21:50 -07:00
Adam Paszke
58b134b793 Allow exporting optimizer state as a dict 2016-10-04 17:33:49 -07:00
Adam Paszke
ff785e5f17 Make optimizers accept a closure 2016-08-25 09:23:39 -07:00
Adam Paszke
7bcb2a4081 Initial optim version 2016-08-23 19:03:30 -07:00