pytorch/torch/distributed/algorithms
Rohan Varma 55ca6901a7 [CheckpointWrapper] Decouple CPU offload (#84907)
This fixes activation offload for the checkpoint wrapper, which was previously broken because offload was tightly coupled with activation checkpointing, i.e. we did:

```
# save_on_cpu from torch.autograd.graph; checkpoint from torch.utils.checkpoint
with save_on_cpu():
    out = checkpoint(module, *args)
```

which would not offload any activation tensors to CPU: the checkpoint implementation takes priority, so autograd never saves those activations in the first place, leaving `save_on_cpu` nothing to offload.

Now, if `offload_to_cpu` is specified, we apply only `save_on_cpu` and no checkpointing, so all intermediate tensors are offloaded to CPU instead of being checkpointed.
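For illustration, here is a minimal sketch of the offload-only path, using `torch.autograd.graph.save_on_cpu` directly (the toy module and shapes are hypothetical, not from this PR):

```
import torch
from torch.autograd.graph import save_on_cpu

# Hypothetical toy module; any forward would do.
lin = torch.nn.Linear(8, 8)
x = torch.randn(2, 8, requires_grad=True)

# With save_on_cpu alone (no checkpointing), every tensor autograd saves
# for backward is moved to CPU and copied back on demand during backward.
with save_on_cpu(pin_memory=True):
    out = lin(x)

out.sum().backward()  # saved activations are fetched back from CPU here
```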

These wrappers can be composed, i.e. if we have

`(Linear, Linear) -> (Linear, Linear) -> (Linear, Linear)`

we can do

`Offload(checkpoint(Linear, Linear) -> checkpoint(Linear, Linear) -> checkpoint(Linear, Linear))`

and the inner activations would be checkpointed while the outer ones would be offloaded, as in the sketch below.
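A sketch of that composition, assuming the `checkpoint_wrapper` API from `torch.distributed.algorithms._checkpoint.checkpoint_wrapper` with the `offload_to_cpu` flag this PR describes (the layer sizes and nesting are illustrative):

```
import torch
import torch.nn as nn
from torch.distributed.algorithms._checkpoint.checkpoint_wrapper import (
    checkpoint_wrapper,
)

def block():
    # Illustrative (Linear, Linear) block from the example above.
    return nn.Sequential(nn.Linear(8, 8), nn.Linear(8, 8))

# Inner wrappers checkpoint each block (activations recomputed in backward);
# the outer wrapper only offloads the tensors saved at block boundaries to CPU.
model = checkpoint_wrapper(
    nn.Sequential(
        checkpoint_wrapper(block()),
        checkpoint_wrapper(block()),
        checkpoint_wrapper(block()),
    ),
    offload_to_cpu=True,
)

out = model(torch.randn(2, 8))
out.sum().backward()
```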

Differential Revision: [D39448882](https://our.internmc.facebook.com/intern/diff/D39448882/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84907
Approved by: https://github.com/awgu
2022-09-15 00:30:23 +00:00
| Name | Last commit | Date |
| --- | --- | --- |
| `_checkpoint` | [CheckpointWrapper] Decouple CPU offload (#84907) | 2022-09-15 00:30:23 +00:00 |
| `_comm_hooks` | Enforce explicit ProcessGroup passed into DefaultState (#84105) | 2022-08-29 14:52:58 +00:00 |
| `_optimizer_overlap` | make fsdp folder to be public (#72084) | 2022-02-02 15:50:14 +00:00 |
| `_quantization` | Change docstring type callable to Callable for consistency (#82487) | 2022-08-01 17:26:09 +00:00 |
| `ddp_comm_hooks` | Add `__all__` for a few distributed modules plus a little typing (reland) (#84872) | 2022-09-13 21:57:49 +00:00 |
| `model_averaging` | Integrate xdoctest - Rebased (#82797) | 2022-08-12 02:08:01 +00:00 |
| `__init__.py` | | |
| `join.py` | Integrate xdoctest - Rebased (#82797) | 2022-08-12 02:08:01 +00:00 |