Commit Graph

83 Commits

Author SHA1 Message Date
Rohan Varma
98cad3d305 [Checkpoint] Fix autocasting (#81766)
Add support for correct autocasting in the non-reentrant checkpoint, matching the behavior that already exists in the reentrant version.

This was noticed by @awgu.
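For illustration only (not code from this PR), a minimal sketch of the usage this fix targets, assuming a CUDA device and a stand-in module `linear`: the recomputed forward during backward should observe the same autocast state as the original forward.

```
import torch
from torch.utils.checkpoint import checkpoint

linear = torch.nn.Linear(8, 8).cuda()

def run_fn(x):
    # Under autocast, the matmul runs in reduced precision; the non-reentrant
    # checkpoint is expected to replay the same autocast behavior when it
    # recomputes this forward during backward.
    return linear(x)

x = torch.randn(4, 8, device="cuda", requires_grad=True)
with torch.cuda.amp.autocast():
    out = checkpoint(run_fn, x, use_reentrant=False)
out.sum().backward()
```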
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81766
Approved by: https://github.com/albanD
2022-07-22 21:33:56 +00:00
Rohan Varma
e14941ef79 Add kwarg support for no_reentrant checkpoint (#80987)
Supports passing kwargs to the checkpointed function when using `torch.utils.checkpoint` with `use_reentrant=False`. This is required to unblock the T5 activation checkpointing and MetaSeq use cases.
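For illustration (not taken from the PR), a sketch of the call pattern this enables, with a hypothetical function `run_fn` and keyword arguments `scale` and `mask`:

```
import torch
from torch.utils.checkpoint import checkpoint

def run_fn(x, scale=1.0, mask=None):
    out = x * scale
    if mask is not None:
        out = out * mask
    return out

x = torch.randn(2, 4, requires_grad=True)
mask = torch.ones(2, 4)

# Keyword arguments are forwarded to run_fn; this only works with
# use_reentrant=False.
out = checkpoint(run_fn, x, scale=2.0, mask=mask, use_reentrant=False)
out.sum().backward()
```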

Closes https://github.com/pytorch/pytorch/issues/79887
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80987
Approved by: https://github.com/zhaojuanmao
2022-07-09 05:07:13 +00:00
Rohan Varma
0c5fdfd95f Revert "Revert "[FSDP Optim State] Remove checkpoint prefix (#80480)"" (#80936)
This reverts commit fe361dede4.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80936
Approved by: https://github.com/awgu
2022-07-06 22:21:07 +00:00
PyTorch MergeBot
58532256e9 Revert "Add __all__ for torch.distributed and fx modules (#80460)"
This reverts commit 5d40c3d5c8.

Reverted https://github.com/pytorch/pytorch/pull/80460 on behalf of https://github.com/malfet due to Broke MacOS testing, see https://github.com/pytorch/pytorch/runs/7105579664?check_suite_focus=true
2022-06-29 16:20:55 +00:00
anjali411
5d40c3d5c8 Add __all__ for torch.distributed and fx modules (#80460)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80460
Approved by: https://github.com/albanD, https://github.com/rohan-varma
2022-06-29 02:53:56 +00:00
Rohan Varma
44fe851feb [WIP] Fix non-reentrant hooks based checkpointing
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78752

Approved by: https://github.com/albanD
2022-06-14 01:13:33 +00:00
Kiarash Jamali
bc3c7a6cbd Fix issue with _checkpoint_without_reentrant
Fixes #76737
I also added a test case for this bug.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76890
Approved by: https://github.com/albanD
2022-05-05 17:37:31 +00:00
Michael Carilli
cf3ef23713 Propagate full autocast state to CheckpointFunction's forward-inside-backward (#71169)
Summary:
Should fix https://github.com/pytorch/pytorch/issues/71124 (implements https://github.com/pytorch/pytorch/issues/71124#issuecomment-1009436056).
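A rough, illustrative sketch (not the PR's test), using CPU autocast so it runs without a GPU: with the reentrant checkpoint, the forward is re-run inside backward, so the autocast state that was active during the original forward must be restored for that re-run.

```
import torch
from torch.utils.checkpoint import checkpoint

linear = torch.nn.Linear(8, 8)

x = torch.randn(4, 8, requires_grad=True)
with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    # Default (reentrant) checkpoint: the forward below is recomputed
    # inside backward and should see the same autocast state.
    out = checkpoint(lambda inp: linear(inp), x)
out.sum().backward()
```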

cc mcarilli ptrblck

Pull Request resolved: https://github.com/pytorch/pytorch/pull/71169

Reviewed By: albanD

Differential Revision: D33793556

Pulled By: ngimel

fbshipit-source-id: 80a4b4f0657b922002e3446fb6b48f082fa98453
(cherry picked from commit cf9beee28b)
2022-01-27 00:31:53 +00:00
Rohan Varma
049debd97d [Reland][Autograd/Checkpoint] Checkpoint implementation without reentrant autograd (#69508)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69508

Original Phabricator Diff: D32704467 (e032dae329)

Reland; the fix is to not test the traditional checkpoint when the input does not require grad, as that is unsupported (as documented).

Original PR body:

Resubmission of https://github.com/pytorch/pytorch/pull/62964 with the
suggestions and tests discussed in
https://github.com/pytorch/pytorch/issues/65537.

Adds a `use_reentrant=False` flag to the `checkpoint` function. When
`use_reentrant=False` is specified, a checkpointing implementation that uses
SavedVariableHooks instead of re-entrant autograd is used. This makes it more
composable with things such as `autograd.grad` as well as DDP (we still need
to add thorough distributed testing).
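A minimal sketch of the composability claim above (illustrative only; `layer` is a stand-in module): `torch.autograd.grad` works through the checkpointed segment because recomputation is driven by saved-tensor hooks rather than a re-entrant backward call.

```
import torch
from torch.utils.checkpoint import checkpoint

layer = torch.nn.Linear(16, 16)

x = torch.randn(2, 16, requires_grad=True)
out = checkpoint(layer, x, use_reentrant=False)

# autograd.grad, not just .backward(), composes with the hooks-based path.
(grad_x,) = torch.autograd.grad(out.sum(), x)
```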

As discussed in https://github.com/pytorch/pytorch/issues/65537, the tests that we need to add are:

- [x] Gradient hooks are called once
- [x] works when the input does not require grad but Tensors that require grad are captured (like the first layer in a nn)
- [x] works for functions with arbitrary input/output objects
- [x] distributed tests (next PR)

Note that this is only for `torch.utils.checkpoint`; if this approach overall looks good, we will do something similar for `checkpoint_sequential`.
ghstack-source-id: 144948501

Test Plan: CI

Reviewed By: zhaojuanmao

Differential Revision: D32902634

fbshipit-source-id: 2ee87006e5045e5471ff80c36a07fbecc2bea3fe
2021-12-07 16:31:23 -08:00
Michael Suo
59e98b66ac Revert D32704467: [Autograd/Checkpoint] Checkpoint implementation without reentrant autograd
Test Plan: revert-hammer

Differential Revision:
D32704467 (e032dae329)

Original commit changeset: 6eea1cce6b93

fbshipit-source-id: 1a788c1fd57cee46bba82e216e6162d078359cc2
2021-12-06 16:33:32 -08:00
Rohan Varma
e032dae329 [Autograd/Checkpoint] Checkpoint implementation without reentrant autograd (#69027)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69027

Resubmission of https://github.com/pytorch/pytorch/pull/62964 with the
suggestions and tests discussed in
https://github.com/pytorch/pytorch/issues/65537.

Adds a `use_reentrant=False` flag to the `checkpoint` function. When
`use_reentrant=False` is specified, a checkpointing implementation that uses
SavedVariableHooks instead of re-entrant autograd is used. This makes it more
composable with things such as `autograd.grad` as well as DDP (we still need
to add thorough distributed testing).

As discussed in https://github.com/pytorch/pytorch/issues/65537, we have added
the following tests:

- [ ] Gradient hooks are called once
ghstack-source-id: 144644859

Test Plan: CI

Reviewed By: pbelevich

Differential Revision: D32704467

fbshipit-source-id: 6eea1cce6b935ef5a0f90b769e395120900e4412
2021-12-06 13:29:37 -08:00
epwalsh
c738c13304 Fix typo in checkpoint docs (#59646)
Summary:
This small typo was causing this valuable piece of information to be excluded from the docs.

<img width="876" alt="image" src="https://user-images.githubusercontent.com/8812459/121240517-47f2d400-c84f-11eb-9288-23c551c1591a.png">

The last "warning" is missing a second ":", so it doesn't render in the docs 👇

<img width="875" alt="image" src="https://user-images.githubusercontent.com/8812459/121240467-39a4b800-c84f-11eb-9dd6-ec26754c43d3.png">

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59646

Reviewed By: mruberry

Differential Revision: D28972541

Pulled By: jbschlosser

fbshipit-source-id: d10c6688d8db4d4ec4b02858a4c7b352365219c0
2021-06-09 12:48:18 -07:00
Pritam Damania
4fa47e5e7d Support non-tensor inputs and outputs for checkpointed functions. (#52422)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52422

As mentioned in https://github.com/pytorch/pytorch/issues/52415,
`torch.utils.checkpoint` doesn't support checkpointing for functions which have
non-tensor inputs and outputs.

This PR resolves this issue by ensuring the autograd machinery ignores the
non-tensor inputs and outputs and processes the tensors accordingly.
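A sketch of the behavior this change targets (illustrative, not the PR's unit test): non-tensor arguments and outputs (here an int and a string) pass through the checkpoint, while autograd tracks only the tensors.

```
import torch
from torch.utils.checkpoint import checkpoint

def run_fn(x, num_repeats, tag):
    # Returns a tensor plus a plain string; only the tensor participates
    # in autograd.
    return x * num_repeats, f"{tag}-done"

x = torch.randn(3, requires_grad=True)
out, status = checkpoint(run_fn, x, 2, "stage1")
out.sum().backward()
```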
ghstack-source-id: 124406867

Test Plan:
1) unit test
2) waitforbuildbot

Reviewed By: albanD

Differential Revision: D26507228

fbshipit-source-id: 0a5a1591570814176185362e83ad18dabd9c84b0
2021-03-19 21:29:03 -07:00
Jeffrey Wan
7b9ca54ecf Reset checkpoint_valid flag when error happens during function execution (#51746)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/37874, https://github.com/pytorch/pytorch/issues/51743

Uses RAII to manage the flag so that it gets reset properly on exception
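Illustrative only, not the actual PyTorch code: the same RAII-style idea expressed as a Python context manager, where a validity flag is restored even if the wrapped computation raises.

```
from contextlib import contextmanager

_checkpoint_valid = True  # hypothetical module-level flag

@contextmanager
def invalid_checkpoint_region():
    global _checkpoint_valid
    prev, _checkpoint_valid = _checkpoint_valid, False
    try:
        yield
    finally:
        # Restored on both normal exit and exception, like an RAII guard.
        _checkpoint_valid = prev
```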

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51746

Reviewed By: izdeby

Differential Revision: D26319619

Pulled By: soulitzer

fbshipit-source-id: ea1235438ba516f99195c83fa23d5880f9977c93
2021-02-08 17:48:25 -08:00
Michael Carilli
ee271047b5 torch.utils.checkpoint.checkpoint + torch.cuda.amp (#49757)
Summary:
Adds a test to orphaned original PR (https://github.com/pytorch/pytorch/pull/40221).

Should fix https://github.com/pytorch/pytorch/issues/49738 and https://github.com/pytorch/pytorch/issues/47183

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49757

Reviewed By: mruberry

Differential Revision: D25689609

Pulled By: ngimel

fbshipit-source-id: 0a6adc11eb98382048ef9a9775e185dcdeff6010
2020-12-22 22:25:11 -08:00
Weiyi Zheng
22f4a58a45 [pytorch] activation checkpointing: enable mixing tensor without requires_grad (#45934)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45934

The PyTorch checkpoint (https://pytorch.org/docs/stable/checkpoint.html) requires all inputs to the function being checkpointed to require grad, but this assumption does not necessarily hold. Consider the following two examples:

```
output = MultiheadedMaskedAtten(input, mask)

output = LSTM(input, seq_length)
```
Both `length` and `mask` are tensors that won't require grad; currently, if you try to checkpoint such a function, `torch.autograd.backward` will complain:

```
  File "/mnt/xarfuse/uid-124297/7d159c34-seed-nspid4026531836-ns-4026531840/torch/autograd/function.py
", line 87, in apply
    return self._forward_cls.backward(self, *args)
  File "/mnt/xarfuse/uid-124297/7d159c34-seed-nspid4026531836-ns-4026531840/torch/utils/checkpoint.py"
, line 99, in backward
    torch.autograd.backward(outputs, args)
  File "/mnt/xarfuse/uid-124297/7d159c34-seed-nspid4026531836-ns-4026531840/torch/autograd/__init__.py
", line 132, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: element 1 of tensors does not require grad and does not have a grad_fn
```

This diff allows skipping the non-grad-requiring tensors when running autograd.backward.

Added documentation for this feature as well.
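A minimal sketch of the now-supported pattern (illustrative; `attn` and `masked_fn` are stand-ins): `mask` never requires grad, and backward simply skips it instead of raising the error above.

```
import torch
from torch.utils.checkpoint import checkpoint

attn = torch.nn.Linear(8, 8)

def masked_fn(x, mask):
    return attn(x) * mask

x = torch.randn(4, 8, requires_grad=True)
mask = torch.ones(4, 8)  # requires_grad=False

out = checkpoint(masked_fn, x, mask)
out.sum().backward()  # grads flow to x; mask is skipped
```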

Test Plan: added unit test to make sure partial tensor grads can be used in checkpoint().

Differential Revision: D24094764

fbshipit-source-id: 6557e8e74132d5a392526adc7b57b6998609ed12
2020-10-14 21:28:02 -07:00
Xiang Gao
20ac736200 Remove py2 compatible future imports (#44735)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44735

Reviewed By: mruberry

Differential Revision: D23731306

Pulled By: ezyang

fbshipit-source-id: 0ba009a99e475ddbe22981be8ac636f8a1c8b02f
2020-09-16 12:55:57 -07:00
Nikita Shulga
6753157c5a Enable torch.utils typechecks (#42960)
Summary:
Fix typos in torch.utils/_benchmark/README.md
Add empty __init__.py to examples folder to make example invocations from README.md correct
Fixed uniform distribution generation logic when minval and maxval are None

Fixes https://github.com/pytorch/pytorch/issues/42984

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42960

Reviewed By: seemethere

Differential Revision: D23095399

Pulled By: malfet

fbshipit-source-id: 0546ce7299b157d9a1f8634340024b10c4b7e7de
2020-08-13 15:24:56 -07:00
Justin Liang
16f4501cd4 Improve checkpoint docs to warn users about detached gradient issues (#37266)
Summary:
See https://discuss.pytorch.org/t/training-with-gradient-checkpoints-torch-utils-checkpoint-appears-to-reduce-performance-of-model/78102/3?u=jwl for details.

Updated the docs to warn users about issues with checkpointing models that use `detach()` or `torch.no_grad()` to freeze their model layers/weights during training. When they do this, training with `checkpoint` will fail as it forces the outputs to require gradients when the model itself does not. Hence, during the backward pass it will output the error:
```
[4]<stderr>:RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
```

Maybe it is possible to fix this directly in the code, but I am not sure how in the current codebase.
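An illustrative sketch of the documented pitfall (hypothetical `frozen` layer, not code from this PR): freezing the layer with `torch.no_grad()` inside the checkpointed function means the recomputed output does not require grad, so backward through the checkpoint fails with the error above.

```
import torch
from torch.utils.checkpoint import checkpoint

frozen = torch.nn.Linear(8, 8)

def run_fn(x):
    with torch.no_grad():  # layer "frozen" during training
        return frozen(x)

x = torch.randn(2, 8, requires_grad=True)
out = checkpoint(run_fn, x)
# out.sum().backward()  # RuntimeError: element 0 of tensors does not require grad
```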
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37266

Differential Revision: D21262558

Pulled By: mrshenli

fbshipit-source-id: 529cf370534504baf8937ef17dac5d6916fbf5ae
2020-04-27 15:25:23 -07:00
Brian Wignall
f326045b37 Fix typos, via a Levenshtein-type corrector (#31523)
Summary:
Should be non-semantic.

Uses https://en.wikipedia.org/wiki/Wikipedia:Lists_of_common_misspellings/For_machines to find likely typos, with https://github.com/bwignall/typochecker to help automate the checking.

Uses an updated version of the tool used in https://github.com/pytorch/pytorch/pull/30606 .
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31523

Differential Revision: D19216749

Pulled By: mrshenli

fbshipit-source-id: 7fd489cb9a77cd7e4950c1046f925d57524960ea
2020-01-17 16:03:19 -08:00
Heungsub Hans Lee
fa251cfd97 Fully deprecate variadic inputs of checkpoint_sequential (#25985)
Summary:
Support for variadic inputs of `checkpoint_sequential` was deprecated in https://github.com/pytorch/pytorch/issues/21006. That case was only warned about with a `DeprecationWarning` in PyTorch 1.2, but since PyTorch 1.3 it should simply fail with a `TypeError`. This patch removes the `DeprecationWarning` added for PyTorch 1.2.
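For illustration (assuming the behavior after this patch; `model` and `other` are hypothetical): `checkpoint_sequential` takes a single input, mirroring `nn.Sequential`, and the old variadic form fails.

```
import torch
from torch.utils.checkpoint import checkpoint_sequential

model = torch.nn.Sequential(
    torch.nn.Linear(8, 8), torch.nn.ReLU(), torch.nn.Linear(8, 8)
)
x = torch.randn(4, 8, requires_grad=True)

out = checkpoint_sequential(model, 2, x)  # single input, like nn.Sequential
out.sum().backward()

# checkpoint_sequential(model, 2, x, other)  # variadic form: TypeError
```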
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25985

Differential Revision: D18809875

Pulled By: albanD

fbshipit-source-id: e84dd8629c04979c4b2dc63e8ada94292e8cedd0
2019-12-05 09:23:28 -08:00
Brian Wignall
e7fe64f6a6 Fix typos (#30606)
Summary:
Should be non-semantic.

Uses https://en.wikipedia.org/wiki/Wikipedia:Lists_of_common_misspellings/For_machines to find likely typos.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30606

Differential Revision: D18763028

Pulled By: mrshenli

fbshipit-source-id: 896515a2156d062653408852e6c04b429fc5955c
2019-12-02 20:17:42 -08:00
Hans Lee
ffdce79078 Deprecate variadic inputs of checkpoint_sequential (#21006)
Summary:
I've reported an inconsistency between `checkpoint_sequential` and `nn.Sequential` at https://github.com/pytorch/pytorch/issues/19260. Both should accept the same input signature, but they don't. I think consistency is important, and I agree with apaszke that `nn.Sequential`'s semantics should be kept instead of `checkpoint_sequential`'s.

I hope `checkpoint_sequential` will raise a `TypeError` on variadic arguments starting with PyTorch 1.2.0, but for now it's okay to just warn with a `DeprecationWarning`. I've discussed this approach with soumith.

Please review this pull request. Any comment will be my pleasure.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21006

Differential Revision: D15530801

Pulled By: soumith

fbshipit-source-id: 0ceb2cc6a17dcc547d0d00ebaf9df8603be53183
2019-05-28 21:33:45 -07:00
Choongwoo Han
40074d647c Allow None for checkpoint (#17969)
Summary:
Currently, we cannot run a checkpointed function with a None argument.

```python
out = torch.utils.checkpoint.checkpoint(run_fn, input_var, None)
```

```
  File "/home/tunz/anaconda3/envs/torchdev/lib/python3.7/site-packages/torch/utils/checkpoint.py", line 14, in detach_variable
    x = inp.detach()
AttributeError: 'NoneType' object has no attribute 'detach'
```

This PR makes the checkpoint function safely handle None arguments.
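A sketch of the now-supported call (illustrative `run_fn`, not the PR's test): a `None` argument passes through safely instead of hitting `detach()` on a `NoneType`.

```
import torch
from torch.utils.checkpoint import checkpoint

def run_fn(x, bias):
    out = x * 2
    if bias is not None:
        out = out + bias
    return out

x = torch.randn(3, requires_grad=True)
out = checkpoint(run_fn, x, None)  # None is handled, no AttributeError
out.sum().backward()
```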
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17969

Differential Revision: D14475148

Pulled By: ezyang

fbshipit-source-id: 9afe9e9aac511a6df1e1620e9ac341536890d451
2019-03-15 07:38:41 -07:00
Michael Carilli
5d3a347685 Stashing checkpointing RNG states based on devices of arg tensors (#14518)
Summary:
This PR intends to address apaszke's concerns in https://github.com/pytorch/pytorch/pull/14253#issuecomment-441740016.  Preserving the RNG state is now controlled by a kwarg rather than a global state, hopefully in a Python 2.7-compatible way.

Additionally, the checkpointing function stashes and restores the RNG states of
1. devices associated with all input tensor args to run_fn as well as
2. the current device.
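Purely illustrative, hand-rolled sketch of the mechanism described in 1. and 2. above (not this PR's implementation): stash the CPU RNG state plus the CUDA RNG state of each device that holds an input tensor, then restore them before re-running the forward.

```
import torch

def stash_rng_states(*tensor_args):
    # Item 1: CUDA devices of all tensor args. The PR also stashes the
    # current device (item 2); torch.cuda.current_device() would cover it.
    devices = {t.get_device() for t in tensor_args if t.is_cuda}
    cpu_state = torch.get_rng_state()
    gpu_states = {d: torch.cuda.get_rng_state(d) for d in devices}
    return cpu_state, gpu_states

def restore_rng_states(cpu_state, gpu_states):
    torch.set_rng_state(cpu_state)
    for d, state in gpu_states.items():
        torch.cuda.set_rng_state(state, d)
```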

I could easily change this to only save and restore the RNG states associated with 1. alone.  This would simplify the logic to create a [deduplicated, ordered](https://github.com/pytorch/pytorch/compare/master...mcarilli:checkpointing_rng_touchup?expand=1#diff-58da227fc9b1d56752b7dfad90428fe0R37) list of devices considered active.

I'm wondering if the [get_device_states](https://github.com/pytorch/pytorch/compare/master...mcarilli:checkpointing_rng_touchup?expand=1#diff-58da227fc9b1d56752b7dfad90428fe0R32) and [set_device_states](https://github.com/pytorch/pytorch/compare/master...mcarilli:checkpointing_rng_touchup?expand=1#diff-58da227fc9b1d56752b7dfad90428fe0R47) functions are general enough to reside elsewhere (presumably torch/random.py).  I'm also wondering if the check on [torch.cuda._initialized](https://github.com/pytorch/pytorch/compare/master...mcarilli:checkpointing_rng_touchup?expand=1#diff-58da227fc9b1d56752b7dfad90428fe0R47) would be better placed within `get_device_states`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14518

Differential Revision: D13356210

Pulled By: ezyang

fbshipit-source-id: afa4cc21ce7862142d5cb1dec3750018df222039
2018-12-11 09:48:45 -08:00
Andy Chen
33ea7eafef Make checkpoint_sequential work with multiple arguments (#14278)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14278

In this commit, we make checkpoint_sequential work for models with multiple tensor inputs. Previously, it only processed the first tensor and ignored the rest.

We introduce a new test in test/test_utils.py that replicates the issue referenced in this [GitHub issue](https://github.com/pytorch/pytorch/issues/11093), and we make sure that the test passes by changing the behavior of checkpoint_sequential to process all input tensors.

Reviewed By: ezyang

Differential Revision: D13144672

fbshipit-source-id: 24f58233a65a0f5b80b89c8d8cbced6f814004f7
2018-12-04 18:47:43 -08:00
Michael Carilli
c36156eded Option to preserve bitwise accuracy of gradient checkpointed vs non-checkpointed dropout (#14253)
Summary:
This issue was noticed, and fix proposed, by raulpuric.

Checkpointing is implemented by rerunning a forward-pass segment for each checkpointed segment during backward.  This can result in the RNG state advancing more than it would without checkpointing, which can cause checkpoints that include dropout invocations to lose end-to-end bitwise accuracy as compared to non-checkpointed passes.

The present PR contains optional logic to juggle the RNG states such that checkpointed passes containing dropout achieve bitwise accuracy with non-checkpointed equivalents.**  The user requests this behavior by supplying `preserve_rng_state=True` to `torch.utils.checkpoint` or `torch.utils.checkpoint_sequential`.

Currently, `preserve_rng_state=True` may incur a moderate performance hit because restoring MTGP states can be expensive.  However, restoring Philox states is dirt cheap, so syed-ahmed's [RNG refactor](https://github.com/pytorch/pytorch/pull/13070#discussion_r235179882), once merged, will make this option more or less free.
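A minimal usage sketch of the requested behavior (illustrative, not the PR's test): with RNG-state preservation, dropout inside the checkpointed segment draws the same mask during recomputation as in the original forward, keeping gradients bitwise-identical to a non-checkpointed run.

```
import torch
from torch.utils.checkpoint import checkpoint

drop = torch.nn.Dropout(p=0.5)

x = torch.randn(16, 32, requires_grad=True)
out = checkpoint(drop, x, preserve_rng_state=True)
out.sum().backward()
```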

I'm a little wary of the [def checkpoint(function, *args, preserve_rng_state=False):](https://github.com/pytorch/pytorch/pull/14253/files#diff-58da227fc9b1d56752b7dfad90428fe0R75) argument-passing method (specifically, putting a kwarg after a variable argument list).  Python 3 seems happy with it.
Edit:  It appears Python 2.7 is NOT happy with a [kwarg after *args](https://travis-ci.org/pytorch/pytorch/builds/457706518?utm_source=github_status&utm_medium=notification).  `preserve_rng_state` also needs to be communicated in a way that doesn't break any existing usage.  I'm open to suggestions (a global flag perhaps)?

**Batchnorm may still be an issue, but that's a battle for another day.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14253

Differential Revision: D13166665

Pulled By: soumith

fbshipit-source-id: 240cddab57ceaccba038b0276151342344eeecd7
2018-11-23 08:09:43 -08:00
Yangqing Jia
c47f680086 arc lint torch/utils (#13141)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13141

This is an example diff to show what lint rules are being applied.

Reviewed By: mingzhe09088

Differential Revision: D10858478

fbshipit-source-id: cbeb013f10f755b0095478adf79366e7cf7836ff
2018-10-25 14:59:03 -07:00
Thomas Viehmann
3799b10c44 various documentation formatting (#9359)
Summary:
This is a grab-bag of documentation formatting fixes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9359

Differential Revision: D8831400

Pulled By: soumith

fbshipit-source-id: 8dac02303168b2ea365e23938ee528d8e8c9f9b7
2018-07-13 02:48:25 -07:00
vfdev
2dc177ac50 Update checkpoint.py (#6943) 2018-04-25 08:43:58 -04:00
Priya Goyal
7d32f6fdc3 Adding runtime warning for checkpointing inputs to have requires_grad=True (#6883)
* Adding the warning for the checkpointing inputs to have requires_grad=True

* fix bug
2018-04-23 22:43:35 -04:00
Tongzhou Wang
c2187790e3 Improve utils.checkpoint docs (#6526)
* improve utils.checkpoint docs

* change volatile to no_grad, and add more explanation

* address comments
2018-04-12 16:59:06 -04:00
Priya Goyal
e3196e0ea8 [Re-checkpointing] Autograd container for trading compute for memory (#6467)
* Autograd container for trading compute for memory

* add a unit test for checkpoint

* address comments

* address review comments

* adding some docs for the checkpoint api

* more comments

* more comments

* repro bug

* Fix a subtle bug/apply some review comments

* Update checkpoint.py

* Run everything in grad mode

* fix flake and chunk=1

* use imperative backward as per discussion

* remove Variable and also add models and test for models

* Add a simple thread local variable to check for autograd grad mode

* remove models and models test after debugging

* address review comments

* address more comments

* address more comments
2018-04-10 15:26:24 -04:00