Commit Graph

18 Commits

Author SHA1 Message Date
CaoE
2f53d570fe Update document for autocast on CPU (#135299)
Update document for autocast on CPU due to the support of float16 and changes in the operator list.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/135299
Approved by: https://github.com/jgong5, https://github.com/leslie-fang-intel, https://github.com/svekars
2024-09-13 09:11:47 +00:00
Jing Xu
7fe9ab9ccc update amp example to device-agnostic (#127278)
As support for Intel GPU has been upstreamed, this PR is to make the AMP example doc device-agnostic.

Co-authored-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127278
Approved by: https://github.com/dvrogozh, https://github.com/EikanWang, https://github.com/svekars
2024-06-13 02:01:16 +00:00
eqy
eff74ed7bd [AMP] Use generic autocast in example, specify dtype (#79579)
CC @mruberry @ptrblck
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79579
Approved by: https://github.com/mruberry, https://github.com/ngimel
2022-06-17 21:32:51 +00:00
leslie-fang-intel
3a112ebb57 add autocast cpu doc
As discussed in https://github.com/pytorch/pytorch/issues/55374#issuecomment-968333614, here we update the cpu autocast operation list in autocast API document.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68567
Approved by: https://github.com/ezyang
2022-03-22 02:02:43 +00:00
Elio
088a4feb41 Update the documentation for AMP with DataParallel (#69218)
Summary:
Following https://github.com/pytorch/pytorch/issues/60540 and pull request https://github.com/pytorch/pytorch/issues/43102

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69218

Reviewed By: gchanan

Differential Revision: D32803814

Pulled By: ngimel

fbshipit-source-id: 06fdbbee2c7734153271be70ec4bc24263c8c367
2021-12-03 14:58:47 -08:00
Michael Carilli
3e6bb5233f Reference amp tutorial (recipe) from core amp docs (#44725)
Summary:
https://pytorch.org/tutorials/recipes/recipes/amp_recipe.html is live.  Core amp docs should reference it.

Also i fixed some typos in the `zero_grad` docs we ignored when git was behaving weirdly during ngimel 's merge of https://github.com/pytorch/pytorch/pull/44423.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44725

Reviewed By: mruberry

Differential Revision: D23723807

Pulled By: ngimel

fbshipit-source-id: ca0b76365f8ca908bd978e3b38bf81857fa6c2a3
2020-09-16 11:37:58 -07:00
Michael Carilli
2fd142a2ef Small clarification to amp gradient penalty example (#44667)
Summary:
requested by https://discuss.pytorch.org/t/what-is-the-correct-way-of-computing-a-grad-penalty-using-amp/95827/3

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44667

Reviewed By: mruberry

Differential Revision: D23692768

Pulled By: ngimel

fbshipit-source-id: 83c61b94e79ef9f86abed2cc066f188dce0c8456
2020-09-14 21:56:09 -07:00
Michael Carilli
d927aee312 Small clarification of torch.cuda.amp multi-model example (#41203)
Summary:
some people have been confused by `retain_graph` in the snippet, they thought it was an additional requirement imposed by amp.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41203

Differential Revision: D22463700

Pulled By: ngimel

fbshipit-source-id: e6fc8871be2bf0ecc1794b1c6f5ea99af922bf7e
2020-07-10 11:13:26 -07:00
Michael Carilli
3b040c478a Make custom_fwd a no-op when not executed under autocast (#36171)
Summary:
Currently, a custom autograd function written with
```
torch.cuda.amp.custom_fwd(cast_inputs=dtype)
def forward(ctx, *args):
    ...
```
casts incoming floating-point CUDA tensors to `dtype` unconditionally, regardless of whether the function executes in an autocast-enabled region.  I think I had the wrong idea there.  Autocast-disabled regions should give the user control of input types.  Also, `custom_fwd(cast_inputs=dtype)`-decorated functions' behavior should align with native fp32list/fp16list functions.  C++-side casting wrappers have no effect when autocast is disabled, and  `custom_fwd`'s casting should behave the same way.

The present PR changes `custom_fwd` so it only casts in autocast-enabled regions (also updates custom_fwd to ignore fp64 inputs, like the C++ wrappers).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36171

Differential Revision: D22179511

Pulled By: ngimel

fbshipit-source-id: 5a93d070179a43206066bce19da0a5a19ecaabbd
2020-06-23 10:23:02 -07:00
Michael Carilli
e6bc34f549 Amp gradient accumulation example (#36601)
Summary:
Several people have asked me about proper Amp usage with gradient accumulation.  In particular, it's [unclear to people](https://github.com/NVIDIA/apex/issues/439#issuecomment-610351482) that you should only call `scaler.unscale_()` (if desired) and `scaler.update()` in iterations where you actually plan to step.  This PR adds a minimal accumulation example.

I built the docs locally and it looks free from sphinx errors, at least.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36601

Differential Revision: D21082295

Pulled By: ngimel

fbshipit-source-id: b2faa6c02b9f7e1972618a0f1d5360a03f0450ac
2020-04-17 09:56:36 -07:00
Michael Carilli
0f0271e255 [RELAND2] Eager autocasting, out-of-place ops only (with MSVC 2017 fix) (#35102)
Summary:
This is the second reland attempt for https://github.com/pytorch/pytorch/pull/32140.

The first reland attempt https://github.com/pytorch/pytorch/pull/35011 failed due a [small incompatible change](https://github.com/pytorch/pytorch/pull/35011#issuecomment-601754216) in recent master (`skipIfRocm` was removed from `test_data_parallel.py`).

The present PR restores skipIfRocm.

Description from first reland attempt https://github.com/pytorch/pytorch/pull/35011:

> https://github.com/pytorch/pytorch/pull/32140 was approved and merged, but [reverted](d0577e19f0) because it broke builds with versions of Visual Studio older than 15.8 that were not represented in public CI.  The build failures were caused by a [known VS bug](https://developercommunity.visualstudio.com/content/problem/27729/allow-function-with-internal-linkage-as-template-n.html), fixed in versions 15.8 and newer.
>
> The present PR reverts the revert (restoring https://github.com/pytorch/pytorch/pull/32140 's diffs) and adds a workaround to enable compilation with VS < 15.8.  The workaround isn't pretty, but it's guarded by macros such that it's only used when compiling with VS < 15.8.  All other builds compile with the same code/control flow as was merged in https://github.com/pytorch/pytorch/pull/32140.
>
> Original description of https://github.com/pytorch/pytorch/pull/32140:
> > Initial integration of eager autocasting, supporting out-of-place ops only for easier review.
> Relevant issue/RFC: https://github.com/pytorch/pytorch/issues/25081
>
> > In-place ops and ops with user-supplied out=... can certainly be supported as well (my initial WIP https://github.com/pytorch/pytorch/issues/29552 handled many) but require substantially more complex special casing in the autocasting backend and tests. Support for these ops (much of which has already been written) will be broken into later PRs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35102

Differential Revision: D20596918

Pulled By: ezyang

fbshipit-source-id: 60caa279bb0ce4a9bb0b28c1d585d42cf1cc7e50
2020-03-24 09:08:04 -07:00
Mike Ruberry
fe276d541e Revert D20541921: [pytorch][PR] [RELAND] Eager autocasting, out-of-place ops only (with MSVC 2017 fix)
Test Plan: revert-hammer

Differential Revision:
D20541921

Original commit changeset: abb5488dca86

fbshipit-source-id: d2c6038978f80e5429632f8b49107090a8a247f4
2020-03-19 22:39:12 -07:00
Michael Carilli
991b97277a [RELAND] Eager autocasting, out-of-place ops only (with MSVC 2017 fix) (#35011)
Summary:
https://github.com/pytorch/pytorch/pull/32140 was approved and merged, but [reverted](d0577e19f0) because it broke builds with versions of Visual Studio older than 15.8 that were not represented in public CI.  The build failures were caused by a [known VS bug](https://developercommunity.visualstudio.com/content/problem/27729/allow-function-with-internal-linkage-as-template-n.html), fixed in versions 15.8 and newer.

The present PR reverts the revert (restoring https://github.com/pytorch/pytorch/pull/32140 's diffs) and adds a workaround to enable compilation with VS < 15.8.  The workaround isn't pretty, but it's guarded by macros such that it's only used when compiling with VS < 15.8.  All other builds compile with the same code/control flow as was merged in https://github.com/pytorch/pytorch/pull/32140.

Original description of https://github.com/pytorch/pytorch/pull/32140:
> Initial integration of eager autocasting, supporting out-of-place ops only for easier review.
Relevant issue/RFC: https://github.com/pytorch/pytorch/issues/25081

> In-place ops and ops with user-supplied out=... can certainly be supported as well (my initial WIP https://github.com/pytorch/pytorch/issues/29552 handled many) but require substantially more complex special casing in the autocasting backend and tests. Support for these ops (much of which has already been written) will be broken into later PRs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35011

Differential Revision: D20541921

Pulled By: ezyang

fbshipit-source-id: abb5488dca8620b0daac4306ebf2bb47fc36e4f5
2020-03-19 20:18:18 -07:00
Edward Yang
d0577e19f0 Revert D20346700: [pytorch][PR] Eager autocasting, out-of-place ops only
Test Plan: revert-hammer

Differential Revision:
D20346700

Original commit changeset: 12d77b391731

fbshipit-source-id: 108d72bf24232f443c0be293ec932c0c478d6a60
2020-03-18 11:42:51 -07:00
Michael Carilli
aaa8f02156 Eager autocasting, out-of-place ops only (#32140)
Summary:
Initial integration of eager autocasting, supporting out-of-place ops only for easier review.
Relevant issue/RFC: https://github.com/pytorch/pytorch/issues/25081

In-place ops and ops with user-supplied `out=...` can certainly be supported as well (my initial WIP https://github.com/pytorch/pytorch/pull/29552 handled many) but require substantially more complex special casing in the autocasting backend and tests.  Support for these ops (much of which has already been written) will be broken into later PRs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32140

Differential Revision: D20346700

Pulled By: ezyang

fbshipit-source-id: 12d77b3917310186fbddf11c59b2794dc859131f
2020-03-18 10:28:21 -07:00
Michael Carilli
fc6a153688 [WIP] Reanimate gradient scaling API with original scale update heuristic (#33366)
Summary:
Also, windows memory failures responsible for the earlier reversion have been fixed.

This PR (initially) contains 2 commits:
* a revert of the revert
* all changes to implement the original Apex scale update heuristic, squashed into a single commit for easier diff review
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33366

Differential Revision: D20099026

Pulled By: ngimel

fbshipit-source-id: 339b9b6bd5134bf055057492cd1eedb7e4461529
2020-02-25 19:00:34 -08:00
Edward Yang
ae53f8dd25 Revert D19859905: [pytorch][PR] Gradient scaling API
Test Plan: revert-hammer

Differential Revision:
D19859905

Original commit changeset: bb8ae6966214

fbshipit-source-id: 28f1c93e8a00e3a4bbe8cc981499b15468f0b970
2020-02-14 11:03:27 -08:00
Michael Carilli
40246fa63c Gradient scaling API (#26512)
Summary:
This PR implements the gradient scaling API that mruberry, jjsjann123, ngimel, zdevito, gchanan and I have been discussing.  Relevant issue/RFC: https://github.com/pytorch/pytorch/issues/25081.

Volume-wise, this PR is mostly documentation and tests.  The Python API (found entirely in `torch/cuda/amp/amp_scaler.py`) is lightweight .  The exposed functions are intended to make the implementation and control flow of gradient scaling convenient, intuitive, and performant.

The API is probably easiest to digest by looking at the documentation and examples. `docs/source/amp.rst` is the homepage for the Automatic Mixed Precision package.  `docs/source/notes/amp_examples.rst` includes several examples demonstrating common but not-immediately-obvious use cases.  Examples are backed by tests in `test_cuda.py` (and thankfully the tests pass :P).

Two small utility kernels have been added in `native/cuda/AmpKernels.cu` to improve performance and avoid host-device synchronizations wherever possible.

Existing optimizers, both in the wild and in Pytorch core, do not need to change to use the scaling API.

However, the API was also designed to establish a contract between user scripts and optimizers such that writers of _new_ custom optimizers have the control points they need to implement fast, optionally sync-free updates.  User scripts that obey the scaling API can drop such custom optimizers in and reap performance benefits without having to change anything aside from the optimizer constructor itself.  [I know what the contract with custom optimizers should be](35829f24ef/torch/cuda/amp/amp_scaler.py (L179-L184)), but I'm waiting for review on the rest of the API before I go about documenting it (it will be given a dedicated section in `docs/source/notes/amp_examples.rst`.

Currently, the gradient scaling examples do not include the auto-casting API as discussed in https://github.com/pytorch/pytorch/issues/25081.  The gradient scaling API is intended to be orthogonal/modular relative to autocasting.  Without auto-casting the gradient scaling API is fully use-_able_, but not terribly use-_ful_, so it's up to you guys whether you want to wait until auto-casting is ready before merging the scaling API as well.

### Todo
- [ ] How do I get c10 registered status for my two custom kernels?  They're very simple.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26512

Differential Revision: D19859905

Pulled By: mruberry

fbshipit-source-id: bb8ae6966214718dfee11345db824389e4286923
2020-02-13 11:06:06 -08:00