Commit Graph

83 Commits

Author SHA1 Message Date
Rohan Varma
98cad3d305 [Checkpoint] Fix autocasting (#81766)
Add support for correct autocasting in the non-reentrant checkpoint, matching the behavior that already exists in the reentrant version.

This was noticed by @awgu.
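For illustration only (not code from this PR), a minimal sketch of the usage this fix targets, assuming a CUDA device and a stand-in module `linear`: the recomputed forward during backward should observe the same autocast state as the original forward.

```
import torch
from torch.utils.checkpoint import checkpoint

linear = torch.nn.Linear(8, 8).cuda()

def run_fn(x):
    # Under autocast, the matmul runs in reduced precision; the non-reentrant
    # checkpoint is expected to replay the same autocast behavior when it
    # recomputes this forward during backward.
    return linear(x)

x = torch.randn(4, 8, device="cuda", requires_grad=True)
with torch.cuda.amp.autocast():
    out = checkpoint(run_fn, x, use_reentrant=False)
out.sum().backward()
```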
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81766
Approved by: https://github.com/albanD
2022-07-22 21:33:56 +00:00
Rohan Varma
e14941ef79 Add kwarg support for no_reentrant checkpoint (#80987)
Supports passing kwargs to the checkpointed function when using `torch.utils.checkpoint` with `use_reentrant=False`. This is required to unblock the T5 activation checkpointing and MetaSeq use cases.
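For illustration (not taken from the PR), a sketch of the call pattern this enables, with a hypothetical function `run_fn` and keyword arguments `scale` and `mask`:

```
import torch
from torch.utils.checkpoint import checkpoint

def run_fn(x, scale=1.0, mask=None):
    out = x * scale
    if mask is not None:
        out = out * mask
    return out

x = torch.randn(2, 4, requires_grad=True)
mask = torch.ones(2, 4)

# Keyword arguments are forwarded to run_fn; this only works with
# use_reentrant=False.
out = checkpoint(run_fn, x, scale=2.0, mask=mask, use_reentrant=False)
out.sum().backward()
```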

Closes https://github.com/pytorch/pytorch/issues/79887
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80987
Approved by: https://github.com/zhaojuanmao
2022-07-09 05:07:13 +00:00
Rohan Varma
0c5fdfd95f Revert "Revert "[FSDP Optim State] Remove checkpoint prefix (#80480)"" (#80936)
This reverts commit fe361dede4.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80936
Approved by: https://github.com/awgu
2022-07-06 22:21:07 +00:00
PyTorch MergeBot
58532256e9 Revert "Add __all__ for torch.distributed and fx modules (#80460)"
This reverts commit 5d40c3d5c8.

Reverted https://github.com/pytorch/pytorch/pull/80460 on behalf of https://github.com/malfet due to Broke MacOS testing, see https://github.com/pytorch/pytorch/runs/7105579664?check_suite_focus=true
2022-06-29 16:20:55 +00:00
anjali411
5d40c3d5c8 Add __all__ for torch.distributed and fx modules (#80460)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80460
Approved by: https://github.com/albanD, https://github.com/rohan-varma
2022-06-29 02:53:56 +00:00
Rohan Varma
44fe851feb [WIP] Fix non-reentrant hooks based checkpointing
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78752

Approved by: https://github.com/albanD
2022-06-14 01:13:33 +00:00
Kiarash Jamali
bc3c7a6cbd Fix issue with _checkpoint_without_reentrant
Fixes #76737
I also added a test case for this bug.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76890
Approved by: https://github.com/albanD
2022-05-05 17:37:31 +00:00
Michael Carilli
cf3ef23713 Propagate full autocast state to CheckpointFunction's forward-inside-backward (#71169)
Summary:
Should fix https://github.com/pytorch/pytorch/issues/71124 (implements https://github.com/pytorch/pytorch/issues/71124#issuecomment-1009436056).
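A rough, illustrative sketch (not the PR's test), using CPU autocast so it runs without a GPU: with the reentrant checkpoint, the forward is re-run inside backward, so the autocast state that was active during the original forward must be restored for that re-run.

```
import torch
from torch.utils.checkpoint import checkpoint

linear = torch.nn.Linear(8, 8)

x = torch.randn(4, 8, requires_grad=True)
with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    # Default (reentrant) checkpoint: the forward below is recomputed
    # inside backward and should see the same autocast state.
    out = checkpoint(lambda inp: linear(inp), x)
out.sum().backward()
```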

cc mcarilli ptrblck

Pull Request resolved: https://github.com/pytorch/pytorch/pull/71169

Reviewed By: albanD

Differential Revision: D33793556

Pulled By: ngimel

fbshipit-source-id: 80a4b4f0657b922002e3446fb6b48f082fa98453
(cherry picked from commit cf9beee28b)
2022-01-27 00:31:53 +00:00
Rohan Varma
049debd97d [Reland][Autograd/Checkpoint] Checkpoint implementation without reentrant autograd (#69508)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69508

Original Phabricator Diff: D32704467 (e032dae329)

Reland; the fix is to not test the traditional checkpoint when the input does not require grad, as that is unsupported (as documented).

Original PR body:

Resubmission of https://github.com/pytorch/pytorch/pull/62964 with the
suggestions and tests discussed in
https://github.com/pytorch/pytorch/issues/65537.

Adds a `use_reentrant=False` flag to the `checkpoint` function. When
`use_reentrant=False` is specified, a checkpointing implementation that uses
SavedVariableHooks instead of re-entrant autograd is used. This makes it more
composable with things such as `autograd.grad` as well as DDP (we still need
to add thorough distributed testing).
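A minimal sketch of the composability claim above (illustrative only; `layer` is a stand-in module): `torch.autograd.grad` works through the checkpointed segment because recomputation is driven by saved-tensor hooks rather than a re-entrant backward call.

```
import torch
from torch.utils.checkpoint import checkpoint

layer = torch.nn.Linear(16, 16)

x = torch.randn(2, 16, requires_grad=True)
out = checkpoint(layer, x, use_reentrant=False)

# autograd.grad, not just .backward(), composes with the hooks-based path.
(grad_x,) = torch.autograd.grad(out.sum(), x)
```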

As discussed in https://github.com/pytorch/pytorch/issues/65537, the tests that we need to add are:

- [x] Gradient hooks are called once
- [x] works when the input does not require grad but Tensors that require grad are captured (like the first layer in a nn)
- [x] works for functions with arbitrary input/output objects
- [x] distributed tests (next PR)

Note that this is only for `torch.utils.checkpoint`; if this approach overall looks good, we will do something similar for `checkpoint_sequential`.
ghstack-source-id: 144948501

Test Plan: CI

Reviewed By: zhaojuanmao

Differential Revision: D32902634

fbshipit-source-id: 2ee87006e5045e5471ff80c36a07fbecc2bea3fe
2021-12-07 16:31:23 -08:00
Michael Suo
59e98b66ac Revert D32704467: [Autograd/Checkpoint] Checkpoint implementation without reentrant autograd
Test Plan: revert-hammer

Differential Revision:
D32704467 (e032dae329)

Original commit changeset: 6eea1cce6b93

fbshipit-source-id: 1a788c1fd57cee46bba82e216e6162d078359cc2
2021-12-06 16:33:32 -08:00
Rohan Varma
e032dae329 [Autograd/Checkpoint] Checkpoint implementation without reentrant autograd (#69027)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69027

Resubmission of https://github.com/pytorch/pytorch/pull/62964 with the
suggestions and tests discussed in
https://github.com/pytorch/pytorch/issues/65537.

Adds a `use_reentrant=False` flag to the `checkpoint` function. When
`use_reentrant=False` is specified, a checkpointing implementation that uses
SavedVariableHooks instead of re-entrant autograd is used. This makes it more
composable with things such as `autograd.grad` as well as DDP (we still need
to add thorough distributed testing).

As discussed in https://github.com/pytorch/pytorch/issues/65537, we have added
the following tests:

- [ ] Gradient hooks are called once
ghstack-source-id: 144644859

Test Plan: CI

Reviewed By: pbelevich

Differential Revision: D32704467

fbshipit-source-id: 6eea1cce6b935ef5a0f90b769e395120900e4412
2021-12-06 13:29:37 -08:00
epwalsh
c738c13304 Fix typo in checkpoint docs (#59646)
Summary:
This small typo was causing this valuable piece of information to be excluded from the docs.

<img width="876" alt="image" src="https://user-images.githubusercontent.com/8812459/121240517-47f2d400-c84f-11eb-9288-23c551c1591a.png">

The last "warning" is missing a second ":", so it doesn't render in the docs 👇

<img width="875" alt="image" src="https://user-images.githubusercontent.com/8812459/121240467-39a4b800-c84f-11eb-9dd6-ec26754c43d3.png">

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59646

Reviewed By: mruberry

Differential Revision: D28972541

Pulled By: jbschlosser

fbshipit-source-id: d10c6688d8db4d4ec4b02858a4c7b352365219c0
2021-06-09 12:48:18 -07:00
Pritam Damania
4fa47e5e7d Support non-tensor inputs and outputs for checkpointed functions. (#52422)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52422

As mentioned in https://github.com/pytorch/pytorch/issues/52415,
`torch.utils.checkpoint` doesn't support checkpointing for functions which have
non-tensor inputs and outputs.

This PR resolves this issue by ensuring the autograd machinery ignores the
non-tensor inputs and outputs and processes the tensors accordingly.
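A sketch of the behavior this change targets (illustrative, not the PR's unit test): non-tensor arguments and outputs (here an int and a string) pass through the checkpoint, while autograd tracks only the tensors.

```
import torch
from torch.utils.checkpoint import checkpoint

def run_fn(x, num_repeats, tag):
    # Returns a tensor plus a plain string; only the tensor participates
    # in autograd.
    return x * num_repeats, f"{tag}-done"

x = torch.randn(3, requires_grad=True)
out, status = checkpoint(run_fn, x, 2, "stage1")
out.sum().backward()
```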
ghstack-source-id: 124406867

Test Plan:
1) unit test
2) waitforbuildbot

Reviewed By: albanD

Differential Revision: D26507228

fbshipit-source-id: 0a5a1591570814176185362e83ad18dabd9c84b0
2021-03-19 21:29:03 -07:00
Jeffrey Wan
7b9ca54ecf Reset checkpoint_valid flag when error happens during function execution (#51746)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/37874, https://github.com/pytorch/pytorch/issues/51743

Uses RAII to manage the flag so that it gets reset properly on exception
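Illustrative only, not the actual PyTorch code: the same RAII-style idea expressed as a Python context manager, where a validity flag is restored even if the wrapped computation raises.

```
from contextlib import contextmanager

_checkpoint_valid = True  # hypothetical module-level flag

@contextmanager
def invalid_checkpoint_region():
    global _checkpoint_valid
    prev, _checkpoint_valid = _checkpoint_valid, False
    try:
        yield
    finally:
        # Restored on both normal exit and exception, like an RAII guard.
        _checkpoint_valid = prev
```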

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51746

Reviewed By: izdeby

Differential Revision: D26319619

Pulled By: soulitzer

fbshipit-source-id: ea1235438ba516f99195c83fa23d5880f9977c93
2021-02-08 17:48:25 -08:00
Michael Carilli
ee271047b5 torch.utils.checkpoint.checkpoint + torch.cuda.amp (#49757)
Summary:
Adds a test to orphaned original PR (https://github.com/pytorch/pytorch/pull/40221).

Should fix https://github.com/pytorch/pytorch/issues/49738 and https://github.com/pytorch/pytorch/issues/47183

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49757

Reviewed By: mruberry

Differential Revision: D25689609

Pulled By: ngimel

fbshipit-source-id: 0a6adc11eb98382048ef9a9775e185dcdeff6010
2020-12-22 22:25:11 -08:00
Weiyi Zheng
22f4a58a45 [pytorch] activation checkpointing: enable mixing tensor without requires_grad (#45934)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45934

The PyTorch checkpoint (https://pytorch.org/docs/stable/checkpoint.html) requires all inputs to the function being checkpointed to require grad, but this assumption does not necessarily hold. Consider the following two examples:

```
output = MultiheadedMaskedAtten(input, mask)

output = LSTM(input, seq_length)
```
Both `length` and `mask` are tensors that won't require grad; currently, if you try to checkpoint such a function, `torch.autograd.backward` will complain:

```
  File "/mnt/xarfuse/uid-124297/7d159c34-seed-nspid4026531836-ns-4026531840/torch/autograd/function.py
", line 87, in apply
    return self._forward_cls.backward(self, *args)
  File "/mnt/xarfuse/uid-124297/7d159c34-seed-nspid4026531836-ns-4026531840/torch/utils/checkpoint.py"
, line 99, in backward
    torch.autograd.backward(outputs, args)
  File "/mnt/xarfuse/uid-124297/7d159c34-seed-nspid4026531836-ns-4026531840/torch/autograd/__init__.py
", line 132, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: element 1 of tensors does not require grad and does not have a grad_fn
```

This diff allows skipping the non-grad-requiring tensors when running autograd.backward.

Added documentation for this feature as well.
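A minimal sketch of the now-supported pattern (illustrative; `attn` and `masked_fn` are stand-ins): `mask` never requires grad, and backward simply skips it instead of raising the error above.

```
import torch
from torch.utils.checkpoint import checkpoint

attn = torch.nn.Linear(8, 8)

def masked_fn(x, mask):
    return attn(x) * mask

x = torch.randn(4, 8, requires_grad=True)
mask = torch.ones(4, 8)  # requires_grad=False

out = checkpoint(masked_fn, x, mask)
out.sum().backward()  # grads flow to x; mask is skipped
```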

Test Plan: added unit test to make sure partial tensor grads can be used in checkpoint().

Differential Revision: D24094764

fbshipit-source-id: 6557e8e74132d5a392526adc7b57b6998609ed12
2020-10-14 21:28:02 -07:00
Xiang Gao
20ac736200 Remove py2 compatible future imports (#44735)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44735

Reviewed By: mruberry

Differential Revision: D23731306

Pulled By: ezyang

fbshipit-source-id: 0ba009a99e475ddbe22981be8ac636f8a1c8b02f
2020-09-16 12:55:57 -07:00
Nikita Shulga
6753157c5a Enable torch.utils typechecks (#42960)
Summary:
Fix typos in torch.utils/_benchmark/README.md
Add empty __init__.py to examples folder to make example invocations from README.md correct
Fixed uniform distribution generation logic when minval and maxval are None

Fixes https://github.com/pytorch/pytorch/issues/42984

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42960

Reviewed By: seemethere

Differential Revision: D23095399

Pulled By: malfet

fbshipit-source-id: 0546ce7299b157d9a1f8634340024b10c4b7e7de
2020-08-13 15:24:56 -07:00
Justin Liang
16f4501cd4 Improve checkpoint docs to warn users about detached gradient issues (#37266)
Summary:
See https://discuss.pytorch.org/t/training-with-gradient-checkpoints-torch-utils-checkpoint-appears-to-reduce-performance-of-model/78102/3?u=jwl for details.

Updated the docs to warn users about issues with checkpointing models that use `detach()` or `torch.no_grad()` to freeze their model layers/weights during training. When they do this, training with `checkpoint` will fail as it forces the outputs to require gradients when the model itself does not. Hence, during the backward pass it will output the error:
```
[4]<stderr>:RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
```

Maybe it is possible to fix this directly in the code, but I am not sure how in the current codebase.
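An illustrative sketch of the documented pitfall (hypothetical `frozen` layer, not code from this PR): freezing the layer with `torch.no_grad()` inside the checkpointed function means the recomputed output does not require grad, so backward through the checkpoint fails with the error above.

```
import torch
from torch.utils.checkpoint import checkpoint

frozen = torch.nn.Linear(8, 8)

def run_fn(x):
    with torch.no_grad():  # layer "frozen" during training
        return frozen(x)

x = torch.randn(2, 8, requires_grad=True)
out = checkpoint(run_fn, x)
# out.sum().backward()  # RuntimeError: element 0 of tensors does not require grad
```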
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37266

Differential Revision: D21262558

Pulled By: mrshenli

fbshipit-source-id: 529cf370534504baf8937ef17dac5d6916fbf5ae
2020-04-27 15:25:23 -07:00
Brian Wignall
f326045b37 Fix typos, via a Levenshtein-type corrector (#31523)
Summary:
Should be non-semantic.

Uses https://en.wikipedia.org/wiki/Wikipedia:Lists_of_common_misspellings/For_machines to find likely typos, with https://github.com/bwignall/typochecker to help automate the checking.

Uses an updated version of the tool used in https://github.com/pytorch/pytorch/pull/30606 .
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31523

Differential Revision: D19216749

Pulled By: mrshenli

fbshipit-source-id: 7fd489cb9a77cd7e4950c1046f925d57524960ea
2020-01-17 16:03:19 -08:00
Heungsub Hans Lee
fa251cfd97 Fully deprecate variadic inputs of checkpoint_sequential (#25985)
Summary:
Support for variadic inputs of `checkpoint_sequential` was deprecated in https://github.com/pytorch/pytorch/issues/21006. That case was only warned about with a `DeprecationWarning` in PyTorch 1.2, but since PyTorch 1.3 it should simply fail with a `TypeError`. This patch removes the `DeprecationWarning` added for PyTorch 1.2.
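For illustration (assuming the behavior after this patch; `model` and `other` are hypothetical): `checkpoint_sequential` takes a single input, mirroring `nn.Sequential`, and the old variadic form fails.

```
import torch
from torch.utils.checkpoint import checkpoint_sequential

model = torch.nn.Sequential(
    torch.nn.Linear(8, 8), torch.nn.ReLU(), torch.nn.Linear(8, 8)
)
x = torch.randn(4, 8, requires_grad=True)

out = checkpoint_sequential(model, 2, x)  # single input, like nn.Sequential
out.sum().backward()

# checkpoint_sequential(model, 2, x, other)  # variadic form: TypeError
```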
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25985

Differential Revision: D18809875

Pulled By: albanD

fbshipit-source-id: e84dd8629c04979c4b2dc63e8ada94292e8cedd0
2019-12-05 09:23:28 -08:00
Brian Wignall
e7fe64f6a6 Fix typos (#30606)
Summary:
Should be non-semantic.

Uses https://en.wikipedia.org/wiki/Wikipedia:Lists_of_common_misspellings/For_machines to find likely typos.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30606

Differential Revision: D18763028

Pulled By: mrshenli

fbshipit-source-id: 896515a2156d062653408852e6c04b429fc5955c
2019-12-02 20:17:42 -08:00
Hans Lee
ffdce79078 Deprecate variadic inputs of checkpoint_sequential (#21006)
Summary:
I've reported an inconsistency between `checkpoint_sequential` and `nn.Sequential` at https://github.com/pytorch/pytorch/issues/19260. Both should accept the same input signature, but they don't. I think consistency is important, and I agree with apaszke that `nn.Sequential`'s semantics should be kept instead of `checkpoint_sequential`'s.

I hope `checkpoint_sequential` will raise a `TypeError` on variadic arguments starting with PyTorch 1.2.0, but for now it's okay to just warn with a `DeprecationWarning`. I've discussed this approach with soumith.

Please review this pull request. Any comment will be my pleasure.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21006

Differential Revision: D15530801

Pulled By: soumith

fbshipit-source-id: 0ceb2cc6a17dcc547d0d00ebaf9df8603be53183
2019-05-28 21:33:45 -07:00
Choongwoo Han
40074d647c Allow None for checkpoint (#17969)
Summary:
Currently, we cannot run a checkpointed function with a None argument.

```python
out = torch.utils.checkpoint.checkpoint(run_fn, input_var, None)
```

```
  File "/home/tunz/anaconda3/envs/torchdev/lib/python3.7/site-packages/torch/utils/checkpoint.py", line 14, in detach_variable
    x = inp.detach()
AttributeError: 'NoneType' object has no attribute 'detach'
```

This PR makes the checkpoint function safely handle None arguments.
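A sketch of the now-supported call (illustrative `run_fn`, not the PR's test): a `None` argument passes through safely instead of hitting `detach()` on a `NoneType`.

```
import torch
from torch.utils.checkpoint import checkpoint

def run_fn(x, bias):
    out = x * 2
    if bias is not None:
        out = out + bias
    return out

x = torch.randn(3, requires_grad=True)
out = checkpoint(run_fn, x, None)  # None is handled, no AttributeError
out.sum().backward()
```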
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17969

Differential Revision: D14475148

Pulled By: ezyang

fbshipit-source-id: 9afe9e9aac511a6df1e1620e9ac341536890d451
2019-03-15 07:38:41 -07:00
Michael Carilli
5d3a347685 Stashing checkpointing RNG states based on devices of arg tensors (#14518)
Summary:
This PR intends to address apaszke's concerns in https://github.com/pytorch/pytorch/pull/14253#issuecomment-441740016.  Preserving the RNG state is now controlled by a kwarg rather than a global state, hopefully in a Python 2.7-compatible way.

Additionally, the checkpointing function stashes and restores the RNG states of
1. devices associated with all input tensor args to run_fn as well as
2. the current device.
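Purely illustrative, hand-rolled sketch of the mechanism described in 1. and 2. above (not this PR's implementation): stash the CPU RNG state plus the CUDA RNG state of each device that holds an input tensor, then restore them before re-running the forward.

```
import torch

def stash_rng_states(*tensor_args):
    # Item 1: CUDA devices of all tensor args. The PR also stashes the
    # current device (item 2); torch.cuda.current_device() would cover it.
    devices = {t.get_device() for t in tensor_args if t.is_cuda}
    cpu_state = torch.get_rng_state()
    gpu_states = {d: torch.cuda.get_rng_state(d) for d in devices}
    return cpu_state, gpu_states

def restore_rng_states(cpu_state, gpu_states):
    torch.set_rng_state(cpu_state)
    for d, state in gpu_states.items():
        torch.cuda.set_rng_state(state, d)
```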

I could easily change this to only save and restore the RNG states associated with 1. alone.  This would simplify the logic to create a [deduplicated, ordered](https://github.com/pytorch/pytorch/compare/master...mcarilli:checkpointing_rng_touchup?expand=1#diff-58da227fc9b1d56752b7dfad90428fe0R37) list of devices considered active.

I'm wondering if the [get_device_states](https://github.com/pytorch/pytorch/compare/master...mcarilli:checkpointing_rng_touchup?expand=1#diff-58da227fc9b1d56752b7dfad90428fe0R32) and [set_device_states](https://github.com/pytorch/pytorch/compare/master...mcarilli:checkpointing_rng_touchup?expand=1#diff-58da227fc9b1d56752b7dfad90428fe0R47) functions are general enough to reside elsewhere (presumably torch/random.py).  I'm also wondering if the check on [torch.cuda._initialized](https://github.com/pytorch/pytorch/compare/master...mcarilli:checkpointing_rng_touchup?expand=1#diff-58da227fc9b1d56752b7dfad90428fe0R47) would be better placed within `get_device_states`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14518

Differential Revision: D13356210

Pulled By: ezyang

fbshipit-source-id: afa4cc21ce7862142d5cb1dec3750018df222039
2018-12-11 09:48:45 -08:00
Andy Chen
33ea7eafef Make checkpoint_sequential work with multiple arguments (#14278)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14278

In this commit, we make checkpoint_sequential work for models with multiple tensor inputs. Previously, it only processed the first tensor and ignored the rest.

We introduce a new test in test/test_utils.py that replicates the issue referenced in this [GitHub issue](https://github.com/pytorch/pytorch/issues/11093), and we make sure that the test passes by changing the behavior of checkpoint_sequential to process all input tensors.

Reviewed By: ezyang

Differential Revision: D13144672

fbshipit-source-id: 24f58233a65a0f5b80b89c8d8cbced6f814004f7
2018-12-04 18:47:43 -08:00
Michael Carilli
c36156eded Option to preserve bitwise accuracy of gradient checkpointed vs non-checkpointed dropout (#14253)
Summary:
This issue was noticed, and fix proposed, by raulpuric.

Checkpointing is implemented by rerunning a forward-pass segment for each checkpointed segment during backward.  This can result in the RNG state advancing more than it would without checkpointing, which can cause checkpoints that include dropout invocations to lose end-to-end bitwise accuracy as compared to non-checkpointed passes.

The present PR contains optional logic to juggle the RNG states such that checkpointed passes containing dropout achieve bitwise accuracy with non-checkpointed equivalents.**  The user requests this behavior by supplying `preserve_rng_state=True` to `torch.utils.checkpoint` or `torch.utils.checkpoint_sequential`.

Currently, `preserve_rng_state=True` may incur a moderate performance hit because restoring MTGP states can be expensive.  However, restoring Philox states is dirt cheap, so syed-ahmed's [RNG refactor](https://github.com/pytorch/pytorch/pull/13070#discussion_r235179882), once merged, will make this option more or less free.
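A minimal usage sketch of the requested behavior (illustrative, not the PR's test): with RNG-state preservation, dropout inside the checkpointed segment draws the same mask during recomputation as in the original forward, keeping gradients bitwise-identical to a non-checkpointed run.

```
import torch
from torch.utils.checkpoint import checkpoint

drop = torch.nn.Dropout(p=0.5)

x = torch.randn(16, 32, requires_grad=True)
out = checkpoint(drop, x, preserve_rng_state=True)
out.sum().backward()
```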

I'm a little wary of the [def checkpoint(function, *args, preserve_rng_state=False):](https://github.com/pytorch/pytorch/pull/14253/files#diff-58da227fc9b1d56752b7dfad90428fe0R75) argument-passing method (specifically, putting a kwarg after a variable argument list).  Python 3 seems happy with it.
Edit:  It appears Python 2.7 is NOT happy with a [kwarg after *args](https://travis-ci.org/pytorch/pytorch/builds/457706518?utm_source=github_status&utm_medium=notification).  `preserve_rng_state` also needs to be communicated in a way that doesn't break any existing usage.  I'm open to suggestions (a global flag perhaps)?

**Batchnorm may still be an issue, but that's a battle for another day.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14253

Differential Revision: D13166665

Pulled By: soumith

fbshipit-source-id: 240cddab57ceaccba038b0276151342344eeecd7
2018-11-23 08:09:43 -08:00
Yangqing Jia
c47f680086 arc lint torch/utils (#13141)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13141

This is an example diff to show what lint rules are being applied.

Reviewed By: mingzhe09088

Differential Revision: D10858478

fbshipit-source-id: cbeb013f10f755b0095478adf79366e7cf7836ff
2018-10-25 14:59:03 -07:00
Thomas Viehmann
3799b10c44 various documentation formatting (#9359)
Summary:
This is a grab-bag of documentation formatting fixes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9359

Differential Revision: D8831400

Pulled By: soumith

fbshipit-source-id: 8dac02303168b2ea365e23938ee528d8e8c9f9b7
2018-07-13 02:48:25 -07:00
vfdev
2dc177ac50 Update checkpoint.py (#6943) 2018-04-25 08:43:58 -04:00
Priya Goyal
7d32f6fdc3 Adding runtime warning for checkpointing inputs to have requires_grad=True (#6883)
* Adding the warning for the checkpointing inputs to have requires_grad=True

* fix bug
2018-04-23 22:43:35 -04:00
Tongzhou Wang
c2187790e3 Improve utils.checkpoint docs (#6526)
* improve utils.checkpoint docs

* change volatile to no_grad, and add more explanation

* address comments
2018-04-12 16:59:06 -04:00
Priya Goyal
e3196e0ea8 [Re-checkpointing] Autograd container for trading compute for memory (#6467)
* Autograd container for trading compute for memory

* add a unit test for checkpoint

* address comments

* address review comments

* adding some docs for the checkpoint api

* more comments

* more comments

* repro bug

* Fix a subtle bug/apply some review comments

* Update checkpoint.py

* Run everything in grad mode

* fix flake and chunk=1

* use imperative backward as per discussion

* remove Variable and also add models and test for models

* Add a simple thread local variable to check for autograd grad mode

* remove models and models test after debugging

* address review comments

* address more comments

* address more comments
2018-04-10 15:26:24 -04:00