pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Author	SHA1	Message	Date
Justin Chu	abc1cadddb	[BE] Enable ruff's UP rules and autoformat utils/ (#105424 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/105424 Approved by: https://github.com/ezyang, https://github.com/malfet	2023-07-18 20:17:25 +00:00
Rohan Varma	b7b44e766b	[Checkpoint] Separate implementation into generator (#105101 ) Separates the non-reentrant AC implementation into a generator so that other APIs such as composable checkpoint API can use the generator as pre and post forward logic. Differential Revision: [D47419387](https://our.internmc.facebook.com/intern/diff/D47419387/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/105101 Approved by: https://github.com/soulitzer	2023-07-14 06:27:13 +00:00
soulitzer	91dcc3b272	Fix activation checkpoint for mps (#104787 ) Fixes https://github.com/pytorch/pytorch/issues/104478 Pull Request resolved: https://github.com/pytorch/pytorch/pull/104787 Approved by: https://github.com/albanD	2023-07-08 14:57:05 +00:00
Animesh Jain	8c191d8eef	[dynamo][ac] Reland #104397 - Remove disable monkeypatching of utils.checkpoint (#104665 ) NO CHANGE from before. The ancestor diff was reverted, so this diff got reverted as well. Pull Request resolved: https://github.com/pytorch/pytorch/pull/104665 Approved by: https://github.com/wconstab	2023-07-06 00:48:02 +00:00
PyTorch MergeBot	40f53912cf	Revert "[dynamo][ac] Remove disable monkeypatching of utils.checkpoint (#104397 )" This reverts commit `537a6c0651`. Reverted https://github.com/pytorch/pytorch/pull/104397 on behalf of https://github.com/huydhn due to This has been reverted internally by D47216591, so I need to also revert it on OSS to keep them in sync ([comment](https://github.com/pytorch/pytorch/pull/104397#issuecomment-1621086360))	2023-07-05 06:11:08 +00:00
Animesh Jain	537a6c0651	[dynamo][ac] Remove disable monkeypatching of utils.checkpoint (#104397 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/104397 Approved by: https://github.com/wconstab	2023-06-30 02:27:06 +00:00
soulitzer	73c927f901	Improve debuggability of activation checkpoint (#103859 ) This PR makes some improvements for debuggability of checkpointing: - improved error messages that are more understandable - errors are now `CheckpointError` which subclasses `RuntimeError` (only `CheckpointError` triggers debug message, see below) - stricter error checking by default: - shapes, dtypes, and device are compared - we also now error when more tensors are being saved for backward during recompute - NOTE: checks are relaxed if it is detected that you are doing backward within forward - shapes, dtype, and device checking can be disabled by passing `determinism_check="none"` - new debug flag: more helpful error message when `debug=True` Note: - cpp stack trace is only included for x86 linux machines - the error message if cpp stack trace is included can be quite long. For a function checkpointed with 8 operators, the log was around 1300 lines! (should this be hidden behind a flag?) [Error message when debug='True' (python stack trace only)](https://gist.github.com/soulitzer/3d5e19c7cceae8e22f9bdd625ec39dd4) [Error message when debug='True' (with python and cpp stacktrace)](https://gist.github.com/soulitzer/ff8fd8c3ccbb2c90dfe3df6d7713b167) Pull Request resolved: https://github.com/pytorch/pytorch/pull/103859 Approved by: https://github.com/albanD	2023-06-22 03:57:36 +00:00
Rohan Varma	60547fcbee	Autoformat torch/utils/checkpoint (#101649 ) Per title Differential Revision: [D45933467](https://our.internmc.facebook.com/intern/diff/D45933467/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/101649 Approved by: https://github.com/Skylion007, https://github.com/soulitzer	2023-05-18 21:55:05 +00:00
soulitzer	70ef0bb45a	Fix checkpoint doc small formatting issue (#101419 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/101419 Approved by: https://github.com/albanD	2023-05-15 21:33:56 +00:00
soulitzer	98f6b815b7	[BE] Make some simplifications to torch.utils.checkpoint logic (#101193 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/101193 Approved by: https://github.com/albanD	2023-05-12 04:35:22 +00:00
shibo	6aeb85add8	add checkpoint support for custom device (#99626 ) 1. Add checkpoint support for custom devices. 2. Add a device argument: I wanted to add a `device="cuda"` parameter to the `forward` function of `CheckpointFunction` so that the device type can be specified at the call site, but the `apply` function of `torch.autograd.Function` does not support `kwargs`, so I added a variable named `_device`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/99626 Approved by: https://github.com/soulitzer	2023-05-04 00:23:42 +00:00
soulitzer	e552b91286	torch.utils.checkpoint warns if user does not pass use_reentrant explicitly (#100551 ) Now that we have updated all internal callsites, per https://fb.workplace.com/groups/pytorch.oss.dev/permalink/1635183750239493/ we should raise a warning when use_reentrant is not explicitly passed for 2.1 Deprecation note: - Not passing in use_reentrant explicitly is now deprecated and will raise a warning. In the future the default value of use_reentrant will be False. To preserve the existing behavior you can pass in use_reentrant=True. It is recommended that you use use_reentrant=False. Pull Request resolved: https://github.com/pytorch/pytorch/pull/100551 Approved by: https://github.com/Skylion007	2023-05-03 20:48:07 +00:00
Kazuaki Ishizaki	622a11d512	Fix typos under torch/utils directory (#97516 ) This PR fixes typos in comments and messages of `.py` files under `torch/utils` directory Pull Request resolved: https://github.com/pytorch/pytorch/pull/97516 Approved by: https://github.com/ezyang	2023-03-24 16:53:39 +00:00
soulitzer	7a8b691388	Make early stop the default for checkpoint and expose a way to disable (#96866 ) Why did I choose context manager instead of per-call? Early stopping is not part of the model definition, and depending on how a particular model is used, e.g., with PT2 or not we may or may not want to disable early stopping. Pull Request resolved: https://github.com/pytorch/pytorch/pull/96866 Approved by: https://github.com/albanD	2023-03-22 20:03:56 +00:00
soulitzer	89d116d961	[BE][docs]Improve and update checkpoint documentation (#96862 ) Updates: - ~recommend user to use non-reentrant, mention that reentrant will be deprecated in the future~ - merges all the warnings into a single list of non-reentrant improvements over reentrant - adds an additional entry to the list about allowing backward inside checkpointed region Pull Request resolved: https://github.com/pytorch/pytorch/pull/96862 Approved by: https://github.com/albanD	2023-03-22 16:53:29 +00:00
soulitzer	f3db2a6341	Expose API to specify custom context manager for checkpoint (#96783 ) Per [design](https://docs.google.com/document/d/1v-yqRqiWA6dIUOw5OpqFs2PqSQIbDEkwRPGk9FcYnxg/edit) we want (1) to allow the user to pass in a function that returns two context managers (2) a per-call API only for now, and (3) do not upstream selective checkpoint for the short term. Pull Request resolved: https://github.com/pytorch/pytorch/pull/96783 Approved by: https://github.com/albanD	2023-03-15 20:37:33 +00:00
soulitzer	d30db9a251	Replace non-reentrant checkpoint with a rewrite that can be nested and contain grad (#90105 ) Changes: - bc-breaking change: The main difference between this and the old non-reentrant impl that it replaces is that we clear recomputed tensors on backward immediately upon unpack, even if retain_graph=True. This has the following additional implications: - Accessing _saved_tensors multiple times will silently recompute forward multiple times. - Accessing ctx.saved_tensor twice in the same backward will now raise an error. - To avoid dealing with the potential consequences, early stopping has been hidden behind a global flag that is by default False, and can be enabled via a context manager. We can remove this in a follow up. Some features of nesting as a result do not work by default. Before land: - import to check for more bc-breakingness - implement any workarounds for the bc-breaking-ness, if we decide on any - update docs to reflect new lifetime of recomputed variables - update docs to mention the early stop feature Follow ups: - enable early-stopping by default - update docs/tutorial to feature nested use cases Related docs: - code comment: https://github.com/pytorch/pytorch/pull/90105/files#diff-9dcd955620b52ce128e18e3567be88edbb238810460d1288a86fabc20e483b30R448 - design doc: https://docs.google.com/document/d/1UDLhTNv6_kvuDTRlsjfj9WdqtNaQNr8ahrvdBIB6914/edit# - retains_grad <> checkpoint https://docs.google.com/document/d/1maiGmuFUxysQL0AdYUU88kngAaXh_L0XpDcLDh_5Ors/edit Pull Request resolved: https://github.com/pytorch/pytorch/pull/90105 Approved by: https://github.com/albanD	2023-03-14 20:38:36 +00:00
Aaron Gokaslan	67d9790985	[BE] Apply almost all remaining flake8-comprehension checks (#94676 ) Applies the remaining flake8-comprehension fixes and checks. This change replaces all remaining unnecessary generator expressions with list/dict/set comprehensions, which are more succinct, performant, and better supported by our torch.jit compiler. It also removes useless generators such as `set(a for a in b)`, resolving them into just the set call. Pull Request resolved: https://github.com/pytorch/pytorch/pull/94676 Approved by: https://github.com/ezyang	2023-02-12 01:01:25 +00:00
Rohan Varma	d93b1b9c4e	Address feedback from previous PR (#86622 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/86622 Approved by: https://github.com/albanD	2022-10-10 18:53:41 +00:00
Rohan Varma	7a411952fb	CheckpointSequential support non-reentrant (#86331 ) Closes https://github.com/pytorch/pytorch/issues/86328 Adds `use_reentrant` argument to `checkpoint_sequential`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/86331 Approved by: https://github.com/zhaojuanmao, https://github.com/albanD	2022-10-06 23:10:18 +00:00
joncrall	4618371da5	Integrate xdoctest - Rebased (#82797 ) This is a new version of #15648 based on the latest master branch. Unlike the previous PR where I fixed a lot of the doctests in addition to integrating xdoctest, I'm going to reduce the scope here. I'm simply going to integrate xdoctest, and then I'm going to mark all of the failing tests as "SKIP". This will let xdoctest run on the dashboards, provide some value, and still let the dashboards pass. I'll leave fixing the doctests themselves to another PR. In my initial commit, I do the bare minimum to get something running with failing dashboards. The few tests that I marked as skip are causing segfaults. Running xdoctest results in 293 failed, 201 passed tests. The next commits will be to disable those tests. (unfortunately I don't have a tool that will insert the `#xdoctest: +SKIP` directive over every failing test, so I'm going to do this mostly manually.) Fixes https://github.com/pytorch/pytorch/issues/71105 @ezyang Pull Request resolved: https://github.com/pytorch/pytorch/pull/82797 Approved by: https://github.com/ezyang	2022-08-12 02:08:01 +00:00
albanD	7dd795cbed	Prevent ref cycle creation in inner hook (#82776 ) Towards fixing https://github.com/pytorch/pytorch/issues/82482 This PR fixes two things: ## 1) memory leak The .detach() call prevents a true memory leak in some cases where the user function is using multiple ops in a row that save their inputs. The following chain of objects keep each other alive - the `storage` object - a recomputed Tensor y - y's grad_fn FooBackward (in c++) - FooBackward's SavedVariables (in c++) - SavedVariable Hook - the `inner_pack` function - captures `storage` Since part of this cycle is in c++, the python gc is not able to break it. Should THPCppFunction_traverse actually visit its SavedVariables, which in turn should visit their hooks? I think the answer is yes, but I haven't dived into which python object is traversing what; if there is non-unique ownership of the c++ object, it makes the traversal a lot trickier. @ezyang do you think we should dive into this more? In this case, this can be easily solved anyway by storing `y.detach()` in the `storage` object, as we don't care about the temporary backward graph that gets created during the second forward call. ## 2) Lifetime of the recomputed buffers The new storage system is now such that the lifetime of the recomputed buffer is directly linked to the SavedVariable c++ object, meaning that this buffer will get deleted if and only if the SavedVariable is cleared. This means that we now get the exact same behavior as the version without the saved variable hook, where Tensors are saved directly on the SavedVariable object. This is great as it solves all the cases where the non-checkpoint version used to work but the checkpoint version does not (even double access or retain_graph=True). The one drawback of this approach though is that the buffers do NOT get cleared when the user passes in `retain_graph=True`! The next backward won't even re-run the forward as it already has all the buffers available. Is this a problem that you think we would need to find a solution for @rohan-varma or is it niche enough that we don't care for now? Pull Request resolved: https://github.com/pytorch/pytorch/pull/82776 Approved by: https://github.com/ezyang, https://github.com/rohan-varma	2022-08-06 00:31:22 +00:00
ProGamerGov	8def154e00	Fix multiple docstring type mistakes (#82474 ) ### Description * Docstrings using `(tuple of ints)` shows up as `(tuple of python:ints)`, so I fixed them by making the `int` no longer plural. Example: https://pytorch.org/docs/stable/generated/torch.permute.html#torch.permute * A docstring type in JIT had one of its types incorrectly highlighted as code. Example: https://pytorch.org/docs/stable/generated/torch.jit.script.html#torch.jit.script * I found some docstring type usages of `string` that had not yet been converted to `str` after #82410 * Some docstrings incorrectly listed their defaults inside the docstring types. * I also found a docstring that was missing its type ### Testing No testing should be required. --- In the developer guidelines, there should probably be standards listed for the docstring types. Pull Request resolved: https://github.com/pytorch/pytorch/pull/82474 Approved by: https://github.com/albanD	2022-07-29 17:45:37 +00:00
Rohan Varma	98cad3d305	[Checkpoint] Fix autocasting (#81766 ) Add support for the correct autocasting in the non-reentrant checkpoint as it exists in the reentrant-version. This was noticed by @awgu. Pull Request resolved: https://github.com/pytorch/pytorch/pull/81766 Approved by: https://github.com/albanD	2022-07-22 21:33:56 +00:00
Rohan Varma	e14941ef79	Add kwarg support for no_reentrant checkpoint (#80987 ) Supports kwargs input to function when `torch.utils.checkpoint` with use_reentrant=False. This is required to unblock T5 activation checkpointing and MetaSeq use cases. Closes https://github.com/pytorch/pytorch/issues/79887 Pull Request resolved: https://github.com/pytorch/pytorch/pull/80987 Approved by: https://github.com/zhaojuanmao	2022-07-09 05:07:13 +00:00
Rohan Varma	0c5fdfd95f	Revert "Revert "[FSDP Optim State] Remove checkpoint prefix (#80480 )"" (#80936 ) This reverts commit `fe361dede4`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/80936 Approved by: https://github.com/awgu	2022-07-06 22:21:07 +00:00
PyTorch MergeBot	58532256e9	Revert "Add __all__ for torch.distributed and fx modules (#80460 )" This reverts commit `5d40c3d5c8`. Reverted https://github.com/pytorch/pytorch/pull/80460 on behalf of https://github.com/malfet due to Broke MacOS testing, see https://github.com/pytorch/pytorch/runs/7105579664?check_suite_focus=true	2022-06-29 16:20:55 +00:00
anjali411	5d40c3d5c8	Add __all__ for torch.distributed and fx modules (#80460 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/80460 Approved by: https://github.com/albanD, https://github.com/rohan-varma	2022-06-29 02:53:56 +00:00
Rohan Varma	44fe851feb	[WIP] Fix non-reentrant hooks based checkpointing Pull Request resolved: https://github.com/pytorch/pytorch/pull/78752 Approved by: https://github.com/albanD	2022-06-14 01:13:33 +00:00
Kiarash Jamali	bc3c7a6cbd	Fix issue with _checkpoint_without_reentrant Fixes #76737 I also added a test case for this bug. Pull Request resolved: https://github.com/pytorch/pytorch/pull/76890 Approved by: https://github.com/albanD	2022-05-05 17:37:31 +00:00
Michael Carilli	cf3ef23713	Propagate full autocast state to CheckpointFunction's forward-inside-backward (#71169 ) Summary: Should fix https://github.com/pytorch/pytorch/issues/71124 (implements https://github.com/pytorch/pytorch/issues/71124#issuecomment-1009436056). cc mcarilli ptrblck Pull Request resolved: https://github.com/pytorch/pytorch/pull/71169 Reviewed By: albanD Differential Revision: D33793556 Pulled By: ngimel fbshipit-source-id: 80a4b4f0657b922002e3446fb6b48f082fa98453 (cherry picked from commit `cf9beee28b`)	2022-01-27 00:31:53 +00:00
Rohan Varma	049debd97d	[Reland][Autograd/Checkpoint] Checkpoint implementation without reentrant autograd (#69508 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69508 Original Phabricator Diff: D32704467 (`e032dae329`) Reland, fix is to not test traditional checkpoint when input does not require grad as that is unsupported as documented. Original PR body: Resubmission of https://github.com/pytorch/pytorch/pull/62964 with the suggestions and tests discussed in https://github.com/pytorch/pytorch/issues/65537. Adds a `use_reentrant=False` flag to `checkpoint` function. When `use_reentrant=False` is specified, a checkpointing implementation that uses SavedVariableHooks instead of re-entrant autograd is used. This makes it more composable with things such as `autograd.grad` as well as DDP (still need to add thorough distributed testing). As discussed in https://github.com/pytorch/pytorch/issues/65537, the tests that we need to add are: - [x] Gradient hooks are called once - [x] works when input does require grads but Tensors that require grads are captured (like first layer in a nn) - [x] works for functions with arbitrary input/output objects - [x] distributed tests (next PR) Note that this is only for `torch.utils.checkpoint`, if this approach overall looks good, we will do something similar for `checkpoint_sequential`. ghstack-source-id: 144948501 Test Plan: CI Reviewed By: zhaojuanmao Differential Revision: D32902634 fbshipit-source-id: 2ee87006e5045e5471ff80c36a07fbecc2bea3fe	2021-12-07 16:31:23 -08:00
Michael Suo	59e98b66ac	Revert D32704467: [Autograd/Checkpoint] Checkpoint implementation without reentrant autograd Test Plan: revert-hammer Differential Revision: D32704467 (`e032dae329`) Original commit changeset: 6eea1cce6b93 fbshipit-source-id: 1a788c1fd57cee46bba82e216e6162d078359cc2	2021-12-06 16:33:32 -08:00
Rohan Varma	e032dae329	[Autograd/Checkpoint] Checkpoint implementation without reentrant autograd (#69027 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69027 Resubmission of https://github.com/pytorch/pytorch/pull/62964 with the suggestions and tests discussed in https://github.com/pytorch/pytorch/issues/65537. Adds a `use_reentrant=False` flag to `checkpoint` function. When `use_reentrant=False` is specified, a checkpointing implementation that uses SavedVariableHooks instead of re-entrant autograd is used. This makes it more composable with things such as `autograd.grad` as well as DDP (still need to add thorough distributed testing). As discussed in https://github.com/pytorch/pytorch/issues/65537, we have added the following tests: -[ ] Gradient hooks are called once ghstack-source-id: 144644859 Test Plan: CI Reviewed By: pbelevich Differential Revision: D32704467 fbshipit-source-id: 6eea1cce6b935ef5a0f90b769e395120900e4412	2021-12-06 13:29:37 -08:00
epwalsh	c738c13304	Fix typo in checkpoint docs (#59646 ) Summary: This small typo was causing a valuable piece of information to be excluded from the docs. The last "warning" is missing a second ":", so it doesn't render in the docs. Pull Request resolved: https://github.com/pytorch/pytorch/pull/59646 Reviewed By: mruberry Differential Revision: D28972541 Pulled By: jbschlosser fbshipit-source-id: d10c6688d8db4d4ec4b02858a4c7b352365219c0	2021-06-09 12:48:18 -07:00
Pritam Damania	4fa47e5e7d	Support non-tensor inputs and outputs for checkpointed functions. (#52422 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/52422 As mentioned in https://github.com/pytorch/pytorch/issues/52415, `torch.utils.checkpoint` doesn't support checkpointing for functions which have non-tensor inputs and outputs. This PR resolves this issue by ensuring the autograd machinery ignores the non-tensor inputs and outputs and processes the tensors accordingly. ghstack-source-id: 124406867 Test Plan: 1) unit test 2) waitforbuildbot Reviewed By: albanD Differential Revision: D26507228 fbshipit-source-id: 0a5a1591570814176185362e83ad18dabd9c84b0	2021-03-19 21:29:03 -07:00
Jeffrey Wan	7b9ca54ecf	Reset checkpoint_valid flag when error happens during function execution (#51746 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/37874, https://github.com/pytorch/pytorch/issues/51743 Uses RAII to manage the flag so that it gets reset properly on exception Pull Request resolved: https://github.com/pytorch/pytorch/pull/51746 Reviewed By: izdeby Differential Revision: D26319619 Pulled By: soulitzer fbshipit-source-id: ea1235438ba516f99195c83fa23d5880f9977c93	2021-02-08 17:48:25 -08:00
Michael Carilli	ee271047b5	torch.utils.checkpoint.checkpoint + torch.cuda.amp (#49757 ) Summary: Adds a test to orphaned original PR (https://github.com/pytorch/pytorch/pull/40221). Should fix https://github.com/pytorch/pytorch/issues/49738 and https://github.com/pytorch/pytorch/issues/47183 Pull Request resolved: https://github.com/pytorch/pytorch/pull/49757 Reviewed By: mruberry Differential Revision: D25689609 Pulled By: ngimel fbshipit-source-id: 0a6adc11eb98382048ef9a9775e185dcdeff6010	2020-12-22 22:25:11 -08:00
Weiyi Zheng	22f4a58a45	[pytorch] activation checkpointing: enable mixing tensor without requires_grad (#45934 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45934 https://pytorch.org/docs/stable/checkpoint.html PyTorch checkpoint requires all inputs to the function being checkpointed to require grad, but this assumption is not necessarily true. Consider the following two examples: ``` output = MultiheadedMaskedAtten(input, mask) output = LSTM(input, seq_length) ``` Both length and mask are tensors that won't require grad; currently, if you try to checkpoint, torch.autograd.backward will complain: ``` File "/mnt/xarfuse/uid-124297/7d159c34-seed-nspid4026531836-ns-4026531840/torch/autograd/function.py", line 87, in apply return self._forward_cls.backward(self, *args) File "/mnt/xarfuse/uid-124297/7d159c34-seed-nspid4026531836-ns-4026531840/torch/utils/checkpoint.py", line 99, in backward torch.autograd.backward(outputs, args) File "/mnt/xarfuse/uid-124297/7d159c34-seed-nspid4026531836-ns-4026531840/torch/autograd/__init__.py", line 132, in backward allow_unreachable=True) # allow_unreachable flag RuntimeError: element 1 of tensors does not require grad and does not have a grad_fn ``` This diff allows skipping the non-grad-requiring tensors when running autograd.backward. Added documentation for this feature as well. Test Plan: added unit test to make sure partial tensor grads can be used in checkpoint(). Differential Revision: D24094764 fbshipit-source-id: 6557e8e74132d5a392526adc7b57b6998609ed12	2020-10-14 21:28:02 -07:00
Xiang Gao	20ac736200	Remove py2 compatible future imports (#44735 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44735 Reviewed By: mruberry Differential Revision: D23731306 Pulled By: ezyang fbshipit-source-id: 0ba009a99e475ddbe22981be8ac636f8a1c8b02f	2020-09-16 12:55:57 -07:00
Nikita Shulga	6753157c5a	Enable torch.utils typechecks (#42960 ) Summary: Fix typos in torch.utils/_benchmark/README.md Add empty __init__.py to examples folder to make example invocations from README.md correct Fixed uniform distribution logic generation when minval and maxval are None Fixes https://github.com/pytorch/pytorch/issues/42984 Pull Request resolved: https://github.com/pytorch/pytorch/pull/42960 Reviewed By: seemethere Differential Revision: D23095399 Pulled By: malfet fbshipit-source-id: 0546ce7299b157d9a1f8634340024b10c4b7e7de	2020-08-13 15:24:56 -07:00
Justin Liang	16f4501cd4	Improve checkpoint docs to warn users about detached gradient issues (#37266 ) Summary: See https://discuss.pytorch.org/t/training-with-gradient-checkpoints-torch-utils-checkpoint-appears-to-reduce-performance-of-model/78102/3?u=jwl for details. Updated the docs to warn users about issues with checkpointing models that use `detach()` or `torch.no_grad()` to freeze their model layers/weights during training. When they do this, training with `checkpoint` will fail as it forces the outputs to require gradients when the model itself does not. Hence, during the backward pass it will output the error: ``` [4]<stderr>:RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn ``` Maybe it is possible to fix this directly in the code, but I am not sure how in the current codebase. Pull Request resolved: https://github.com/pytorch/pytorch/pull/37266 Differential Revision: D21262558 Pulled By: mrshenli fbshipit-source-id: 529cf370534504baf8937ef17dac5d6916fbf5ae	2020-04-27 15:25:23 -07:00
Brian Wignall	f326045b37	Fix typos, via a Levenshtein-type corrector (#31523 ) Summary: Should be non-semantic. Uses https://en.wikipedia.org/wiki/Wikipedia:Lists_of_common_misspellings/For_machines to find likely typos, with https://github.com/bwignall/typochecker to help automate the checking. Uses an updated version of the tool used in https://github.com/pytorch/pytorch/pull/30606 . Pull Request resolved: https://github.com/pytorch/pytorch/pull/31523 Differential Revision: D19216749 Pulled By: mrshenli fbshipit-source-id: 7fd489cb9a77cd7e4950c1046f925d57524960ea	2020-01-17 16:03:19 -08:00
Heungsub Hans Lee	fa251cfd97	Fully deprecate variadic inputs of checkpoint_sequential (#25985 ) Summary: Support for variadic inputs of `checkpoint_sequential` was deprecated in https://github.com/pytorch/pytorch/issues/21006. This case should emit a `DeprecationWarning` in PyTorch 1.2, but it should simply fail with `TypeError` from PyTorch 1.3 onward. This patch removes the `DeprecationWarning` for PyTorch 1.2. Pull Request resolved: https://github.com/pytorch/pytorch/pull/25985 Differential Revision: D18809875 Pulled By: albanD fbshipit-source-id: e84dd8629c04979c4b2dc63e8ada94292e8cedd0	2019-12-05 09:23:28 -08:00
Brian Wignall	e7fe64f6a6	Fix typos (#30606 ) Summary: Should be non-semantic. Uses https://en.wikipedia.org/wiki/Wikipedia:Lists_of_common_misspellings/For_machines to find likely typos. Pull Request resolved: https://github.com/pytorch/pytorch/pull/30606 Differential Revision: D18763028 Pulled By: mrshenli fbshipit-source-id: 896515a2156d062653408852e6c04b429fc5955c	2019-12-02 20:17:42 -08:00
Hans Lee	ffdce79078	Deprecate variadic inputs of checkpoint_sequential (#21006 ) Summary: I've reported inconsistency between `checkpoint_sequential` and `nn.Sequential` at https://github.com/pytorch/pytorch/issues/19260. Both should provide the same input signature but they don't. I think the consistency is important and I agree with apaszke that `nn.Sequential`'s semantics should be kept instead of `checkpoint_sequential`. I hope `checkpoint_sequential` raises `TypeError` on variadic arguments since PyTorch 1.2.0. But for now, it's okay just to warn as `DeprecationWarning`. I've talked about this approach with soumith. Please review this pull request. Any comment will be my pleasure. Pull Request resolved: https://github.com/pytorch/pytorch/pull/21006 Differential Revision: D15530801 Pulled By: soumith fbshipit-source-id: 0ceb2cc6a17dcc547d0d00ebaf9df8603be53183	2019-05-28 21:33:45 -07:00
Choongwoo Han	40074d647c	Allow None for checkpoint (#17969 ) Summary: Currently, we cannot run a checkpointed function with None argument. ```python out = torch.utils.checkpoint.checkpoint(run_fn, input_var, None) ``` ``` File "/home/tunz/anaconda3/envs/torchdev/lib/python3.7/site-packages/torch/utils/checkpoint.py", line 14, in detach_variable x = inp.detach() AttributeError: 'NoneType' object has no attribute 'detach' ``` This PR makes checkpoint function to safely handle None argument. Pull Request resolved: https://github.com/pytorch/pytorch/pull/17969 Differential Revision: D14475148 Pulled By: ezyang fbshipit-source-id: 9afe9e9aac511a6df1e1620e9ac341536890d451	2019-03-15 07:38:41 -07:00
Michael Carilli	5d3a347685	Stashing checkpointing RNG states based on devices of arg tensors (#14518 ) Summary: This PR intends to address apaszke's concerns in https://github.com/pytorch/pytorch/pull/14253#issuecomment-441740016. Preserving the rng state is now controlled by a kwarg rather than a global state, hopefully in a python 2.7-compatible way. Additionally, the checkpointing function stashes and restores the RNG states of 1. devices associated with all input tensor args to run_fn as well as 2. the current device. I could easily change this to only save and restore the RNG states associated 1. alone. This would simplify the logic to create a [deduplicated, ordered](https://github.com/pytorch/pytorch/compare/master...mcarilli:checkpointing_rng_touchup?expand=1#diff-58da227fc9b1d56752b7dfad90428fe0R37) list of devices considered active. I'm wondering if the [get_device_states](https://github.com/pytorch/pytorch/compare/master...mcarilli:checkpointing_rng_touchup?expand=1#diff-58da227fc9b1d56752b7dfad90428fe0R32) and [set_device_states](https://github.com/pytorch/pytorch/compare/master...mcarilli:checkpointing_rng_touchup?expand=1#diff-58da227fc9b1d56752b7dfad90428fe0R47) functions are general enough to reside elsewhere (presumably torch/random.py). I'm also wondering if the check on [torch.cuda._initialized](https://github.com/pytorch/pytorch/compare/master...mcarilli:checkpointing_rng_touchup?expand=1#diff-58da227fc9b1d56752b7dfad90428fe0R47) would be better placed within `get_device_states`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/14518 Differential Revision: D13356210 Pulled By: ezyang fbshipit-source-id: afa4cc21ce7862142d5cb1dec3750018df222039	2018-12-11 09:48:45 -08:00
Andy Chen	33ea7eafef	Make checkpoint_sequential work with multiple arguments (#14278 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14278 In this commit, we make checkpoint_sequential work for models with multiple tensor inputs. Previously, it only processed the first tensor and ignored the rest. We introduce a new test in test/test_utils.py that replicates the issue referenced in this [GitHub issue](https://github.com/pytorch/pytorch/issues/11093), and we make sure that the test passes by changing the behavior of checkpoint_sequential to process all input tensors. Reviewed By: ezyang Differential Revision: D13144672 fbshipit-source-id: 24f58233a65a0f5b80b89c8d8cbced6f814004f7	2018-12-04 18:47:43 -08:00
Michael Carilli	c36156eded	Option to preserve bitwise accuracy of gradient checkpointed vs non-checkpointed dropout (#14253 ) Summary: This issue was noticed, and fix proposed, by raulpuric. Checkpointing is implemented by rerunning a forward-pass segment for each checkpointed segment during backward. This can result in the RNG state advancing more than it would without checkpointing, which can cause checkpoints that include dropout invocations to lose end-to-end bitwise accuracy as compared to non-checkpointed passes. The present PR contains optional logic to juggle the RNG states such that checkpointed passes containing dropout achieve bitwise accuracy with non-checkpointed equivalents.** The user requests this behavior by supplying `preserve_rng_state=True` to `torch.utils.checkpoint` or `torch.utils.checkpoint_sequential`. Currently, `preserve_rng_state=True` may incur a moderate performance hit because restoring MTGP states can be expensive. However, restoring Philox states is dirt cheap, so syed-ahmed's [RNG refactor](https://github.com/pytorch/pytorch/pull/13070#discussion_r235179882), once merged, will make this option more or less free. I'm a little wary of the [def checkpoint(function, args, preserve_rng_state=False):](https://github.com/pytorch/pytorch/pull/14253/files#diff-58da227fc9b1d56752b7dfad90428fe0R75) argument-passing method (specifically, putting a kwarg after a variable argument list). Python 3 seems happy with it. Edit: It appears Python 2.7 is NOT happy with a [kwarg after args](https://travis-ci.org/pytorch/pytorch/builds/457706518?utm_source=github_status&utm_medium=notification). `preserve_rng_state` also needs to be communicated in a way that doesn't break any existing usage. I'm open to suggestions (a global flag perhaps)? **Batchnorm may still be an issue, but that's a battle for another day. Pull Request resolved: https://github.com/pytorch/pytorch/pull/14253 Differential Revision: D13166665 Pulled By: soumith fbshipit-source-id: 240cddab57ceaccba038b0276151342344eeecd7	2018-11-23 08:09:43 -08:00
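
Note on commit c36156eded (RNG state preservation): the idea of stashing the RNG state before a checkpointed segment and replaying it during recomputation can be illustrated without PyTorch. The sketch below is only an analogy that uses Python's standard `random` module; `dropout_mask` and `forward_then_recompute` are hypothetical names, not PyTorch functions, and the real implementation handles per-device generator states rather than a single global one.

```python
import random
from typing import List, Tuple


def dropout_mask(n: int, p: float = 0.5) -> List[int]:
    """Toy dropout mask drawn from Python's global RNG (stand-in for a dropout op)."""
    return [0 if random.random() < p else 1 for _ in range(n)]


def forward_then_recompute(n: int, preserve_rng_state: bool) -> Tuple[List[int], List[int]]:
    """Run a segment once ("forward"), then rerun it as a checkpoint would in "backward"."""
    saved_state = random.getstate() if preserve_rng_state else None
    first = dropout_mask(n)               # original forward pass
    random.random()                       # unrelated work advances the RNG in between
    if preserve_rng_state:
        current = random.getstate()       # stash the present state
        random.setstate(saved_state)      # replay from the forward-time state
    second = dropout_mask(n)              # recomputation during backward
    if preserve_rng_state:
        random.setstate(current)          # restore so later code is unaffected
    return first, second


random.seed(0)
first, second = forward_then_recompute(8, preserve_rng_state=True)
```

With `preserve_rng_state=True` the recomputed mask matches the one drawn in the forward pass; with `False` the two masks are drawn from different points in the random stream and generally differ.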
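
Note on commit b7b44e766b (generator split): separating an implementation into a generator lets a caller run the part before `yield` as pre-forward logic and the part after it as post-forward logic. The following is a minimal plain-Python sketch of that pattern, assuming nothing about the actual PyTorch internals; `pre_post_logic` and `run_with_hooks` are hypothetical names used only for illustration.

```python
from typing import Callable, Iterator, List


def pre_post_logic(log: List[str]) -> Iterator[None]:
    """Hypothetical generator: code before `yield` is pre-forward, code after is post-forward."""
    log.append("pre-forward")
    yield
    log.append("post-forward")


def run_with_hooks(forward: Callable[[int], int], x: int, log: List[str]) -> int:
    gen = pre_post_logic(log)
    next(gen)                  # run the pre-forward half
    out = forward(x)           # the caller owns the forward call itself
    try:
        next(gen)              # run the post-forward half
    except StopIteration:
        pass                   # the generator is exhausted, as expected
    return out


log: List[str] = []
result = run_with_hooks(lambda v: v * 2, 21, log)
```

Because the forward call sits between the two `next` calls, a different API can reuse the same generator while supplying its own way of invoking the wrapped function.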
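
Note on commit 7b9ca54ecf (flag reset on error): that fix uses RAII so that a flag is restored even when the wrapped function raises. A rough Python analogue of the same guarantee is a context manager with `try`/`finally`. The flag and helper names below are hypothetical and only echo the commit title; they are not the real PyTorch symbols.

```python
from contextlib import contextmanager
from typing import Iterator

checkpoint_valid = True  # hypothetical module-level flag


@contextmanager
def checkpoint_invalid_scope() -> Iterator[None]:
    """Temporarily clear the flag and always restore the previous value on exit."""
    global checkpoint_valid
    previous = checkpoint_valid
    checkpoint_valid = False
    try:
        yield
    finally:
        checkpoint_valid = previous  # runs even when the body raises


def failing_function() -> None:
    raise ValueError("error during function execution")


try:
    with checkpoint_invalid_scope():
        failing_function()
except ValueError:
    pass  # the flag has already been restored by the finally clause
```

Without the `finally` clause, the exception would skip the restore step and leave the flag stuck at `False` for every later call.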


56 Commits