# Motivation
## for `torch.amp.GradScaler`,
- `torch.cpu.amp.GradScaler(args...)` is completely equivalent to `torch.amp.GradScaler("cpu", args...)`.
- `torch.cuda.amp.GradScaler(args...)` is completely equivalent to `torch.amp.GradScaler("cuda", args...)`.
So, we intend to deprecate them and **strongly recommend** developers to use `torch.amp.GradScaler` (see the sketch below).
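A minimal sketch of the migration, assuming a PyTorch build where `torch.amp.GradScaler` accepts the device string (a CUDA device is assumed for the first line):
```python
import torch

scaler = torch.amp.GradScaler("cuda")    # preferred, device-generic form
# scaler = torch.cuda.amp.GradScaler()   # deprecated; equivalent to the line above
# scaler = torch.cpu.amp.GradScaler()    # deprecated; equivalent to torch.amp.GradScaler("cpu")
```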
## for `custom_fwd` and `custom_bwd`,
this mechanism lets a custom autograd function behave correctly whether or not it runs inside an autocast-enabled region, and it can be shared by other backends, like CPU and XPU.
So we generalize it to be device-agnostic, put the decorators into `torch/amp/autocast_mode.py`, and re-expose them as `torch.amp.custom_fwd` and `torch.amp.custom_bwd`. Meanwhile, we deprecate `torch.cuda.amp.custom_fwd` and `torch.cuda.amp.custom_bwd`.
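A sketch of the device-agnostic form, assuming a build that already exposes `torch.amp.custom_fwd`/`torch.amp.custom_bwd` with a `device_type` argument (the `MyMM` function below is illustrative):
```python
import torch

class MyMM(torch.autograd.Function):
    # Device-agnostic decorators; `device_type` replaces the implicit "cuda" of
    # the deprecated torch.cuda.amp.custom_fwd/custom_bwd.
    @staticmethod
    @torch.amp.custom_fwd(device_type="cuda", cast_inputs=torch.float16)
    def forward(ctx, a, b):
        ctx.save_for_backward(a, b)
        return a.mm(b)

    @staticmethod
    @torch.amp.custom_bwd(device_type="cuda")
    def backward(ctx, grad):
        a, b = ctx.saved_tensors
        return grad.mm(b.t()), a.t().mm(grad)
```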
# Additional Context
Add a UT to cover the deprecation warning.
No need for more UTs to cover the functionality of `torch.amp.custom_f/bwd`; the existing UTs that previously covered the functionality of `torch.cuda.amp.custom_f/bwd` cover them as well.
To facilitate the review, we separate these code changes into two PRs. The first PR covers `torch.amp.GradScaler`; the follow-up covers `custom_fwd` and `custom_bwd`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/126527
Approved by: https://github.com/jgong5, https://github.com/gujinghui, https://github.com/janeyx99, https://github.com/EikanWang
Add non-package Python modules to the public API checks.
The original change is to remove the `ispkg` check in this line
https://github.com/pytorch/pytorch/blob/main/docs/source/conf.py#L518
Everything else is to:
- add the appropriate modules to the rst files,
- make sure every module we provide can be imported (fixed by either making optional dependencies optional or just deleting files that have been un-importable for 3 years),
- make APIs that are both modules and functions (like `torch.autograd.gradcheck`) render properly on the docs website without confusion, and
- add every non-documented API to the allow list (~3k of them).
Next steps will be to try and fix these missing docs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110568
Approved by: https://github.com/zou3519
Summary:
Working towards https://docs.google.com/document/d/10yx2-4gs0gTMOimVS403MnoAWkqitS8TUHX73PN8EjE/edit?pli=1#
This PR:
- Ensure that all the submodules are listed in an rst file (this ensures they are considered by the coverage tool)
- Remove some long-deprecated code that just errors out on import
- Remove the allow list altogether to ensure nothing gets added back there
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73983
Reviewed By: anjali411
Differential Revision: D34787908
Pulled By: albanD
fbshipit-source-id: 163ce61e133b12b2f2e1cbe374f979e3d6858db7
(cherry picked from commit c9edfead7a01dc45bfc24eaf7220d2a84ab1f62e)
Summary:
Should close https://github.com/pytorch/pytorch/issues/42218
Numerically, `grid_sampler` is fine in fp16 or fp32, but takes several inputs and expects their dtypes to match, so it belongs on the autocast promote list.
`grid_sampler` currently uses `gpuAtomicAdd`, notoriously slow in fp16 because it calls CUDA's `atomicAdd` `__half` overload, which uses a software compare-and-swap loop internally. To allow good performance if both inputs happen to be fp16, the PR also modifies the `grid_sampler_[2,3]d_backward_kernel`s to use `fastAtomicAdd` instead.
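A rough illustration of the promote-list behavior (the tensors are made up, and the expected dtype is an assumption about the policy described above; it can vary across PyTorch versions):
```python
import torch
import torch.nn.functional as F

# Mixed-precision inputs: under autocast, a promote-list op runs in the widest
# floating dtype among its inputs instead of failing the dtype-match check.
inp = torch.randn(1, 1, 8, 8, device="cuda", dtype=torch.half)
grid = torch.rand(1, 4, 4, 2, device="cuda", dtype=torch.float) * 2 - 1
with torch.autocast("cuda"):
    out = F.grid_sample(inp, grid, align_corners=False)
print(out.dtype)  # expected: torch.float32 (promoted to the wider input dtype)
```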
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58618
Reviewed By: mruberry
Differential Revision: D29257199
Pulled By: ngimel
fbshipit-source-id: 3cc7505945b480427f2fc1beb36bee80bf3853b3
Summary:
Temporary fix for https://github.com/pytorch/pytorch/issues/42218.
Numerically, grid_sampler should be fine in fp32 or fp16, so grid_sampler really belongs on the promote list. But performance-wise, the native grid_sampler backward kernels use gpuAtomicAdd, which is notoriously slow in fp16. So the simplest functionality fix is to put grid_sampler on the fp32 list.
In https://github.com/pytorch/pytorch/pull/58618 I implement the right long-term fix (refactoring kernels to use fp16-friendly fastAtomicAdd and moving grid_sampler to the promote list). But that's more invasive, and for 1.9 ngimel says this simple temporary fix is preferred.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58679
Reviewed By: soulitzer
Differential Revision: D28576559
Pulled By: ngimel
fbshipit-source-id: d653003f37eaedcbb3eaac8d7fec26c343acbc07
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52859
This reverts commit 92a4ee1cf6.
Added support for bfloat16 on CUDA 11 and removed the fast path for empty input tensors that was affecting the autograd graph.
Test Plan: Imported from OSS
Reviewed By: H-Huang
Differential Revision: D27402390
Pulled By: heitorschueroff
fbshipit-source-id: 73c5ccf54f3da3d29eb63c9ed3601e2fe6951034
Summary:
[index_put](https://pytorch.org/docs/master/tensors.html#torch.Tensor.index_put) requires the src and dst tensors to be the same dtype, so imo it belongs on the promote list when autocast is active (the output should be the widest dtype among the input dtypes).
I also put some other registrations in alphabetical order.
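A rough sketch of the rationale (the tensors are made up, and the expected dtype is an assumption about the promote behavior described above):
```python
import torch

# index_put requires matching src/dst dtypes; on the promote list, autocast widens
# the fp16 source to match the fp32 destination instead of erroring.
dst = torch.zeros(4, 4, device="cuda", dtype=torch.float)
src = torch.randn(2, device="cuda", dtype=torch.half)
idx = (torch.tensor([0, 1], device="cuda"), torch.tensor([2, 3], device="cuda"))
with torch.autocast("cuda"):
    out = dst.index_put(idx, src)
print(out.dtype)  # expected: torch.float32
```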
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41035
Differential Revision: D22418305
Pulled By: ngimel
fbshipit-source-id: b467cb16ac6c2ba1f9e43531f69a144b17f00b87
Summary:
Currently, a custom autograd function written with
```
@torch.cuda.amp.custom_fwd(cast_inputs=dtype)
def forward(ctx, *args):
    ...
```
casts incoming floating-point CUDA tensors to `dtype` unconditionally, regardless of whether the function executes in an autocast-enabled region. I think I had the wrong idea there. Autocast-disabled regions should give the user control of input types. Also, `custom_fwd(cast_inputs=dtype)`-decorated functions' behavior should align with native fp32list/fp16list functions. C++-side casting wrappers have no effect when autocast is disabled, and `custom_fwd`'s casting should behave the same way.
The present PR changes `custom_fwd` so it only casts in autocast-enabled regions (also updates custom_fwd to ignore fp64 inputs, like the C++ wrappers).
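A sketch of the resulting behavior, using the `torch.cuda.amp` names from this era (the `Square` function and the expected dtypes are illustrative):
```python
import torch

class Square(torch.autograd.Function):
    @staticmethod
    @torch.cuda.amp.custom_fwd(cast_inputs=torch.float16)
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return x * x

    @staticmethod
    @torch.cuda.amp.custom_bwd
    def backward(ctx, grad):
        (x,) = ctx.saved_tensors
        return 2 * x * grad

x = torch.randn(4, device="cuda", requires_grad=True)  # fp32 input
with torch.cuda.amp.autocast():
    y = Square.apply(x)  # autocast enabled: inputs cast to fp16
z = Square.apply(x)      # autocast disabled: x passes through as fp32
print(y.dtype, z.dtype)  # expected: torch.float16 torch.float32
```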
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36171
Differential Revision: D22179511
Pulled By: ngimel
fbshipit-source-id: 5a93d070179a43206066bce19da0a5a19ecaabbd
Summary: NVIDIA's Apex is updating to no longer rely on this behavior, but we're reverting this Python 2 -> Python 3 update to unblock internal Apex users.
Test Plan: Sandcastle + OSS CI.
Reviewed By: ngimel
Differential Revision: D22146782
fbshipit-source-id: f9483d2cbf9dc3a469ad48a6c863edea3ae51070
Summary:
BC-breaking note:
If a user is using one of these dunders directly, they will no longer be available. Users should update to the Python 3 compatible dunders.
Original PR note:
`__div__` (and `__idiv__` and `__rdiv__`) are no longer special dunders in Python 3. This PR replaces them with the `__truediv__` (`__itruediv__`, `__rtruediv__`) dunders, since we no longer support Python 2.
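For illustration, the Python 3 replacements look like the following (the `Quantity` class is hypothetical, not PyTorch code):
```python
class Quantity:
    def __init__(self, value):
        self.value = value

    def __truediv__(self, other):   # replaces __div__
        return Quantity(self.value / other)

    def __rtruediv__(self, other):  # replaces __rdiv__
        return Quantity(other / self.value)

    def __itruediv__(self, other):  # replaces __idiv__
        self.value /= other
        return self
```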
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39151
Differential Revision: D22075713
Pulled By: mruberry
fbshipit-source-id: d318b47b51f7cc4c3728b1606a34d81e49ba0fa1
Summary:
Initial integration of eager autocasting, supporting out-of-place ops only for easier review.
Relevant issue/RFC: https://github.com/pytorch/pytorch/issues/25081
In-place ops and ops with user-supplied `out=...` can certainly be supported as well (my initial WIP https://github.com/pytorch/pytorch/pull/29552 handled many) but require substantially more complex special casing in the autocasting backend and tests. Support for these ops (much of which has already been written) will be broken into later PRs.
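A minimal usage sketch of the eager autocast context this PR introduces, restricted to out-of-place ops (the tensors are made up, and the expected dtypes assume the usual CUDA op lists):
```python
import torch

a = torch.randn(8, 8, device="cuda")
b = torch.randn(8, 8, device="cuda")
with torch.cuda.amp.autocast():
    c = a @ b          # fp16-list op: runs and returns in torch.float16
    d = torch.log(c)   # fp32-list op: input cast up, returns torch.float32
print(c.dtype, d.dtype)
```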
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32140
Differential Revision: D20346700
Pulled By: ezyang
fbshipit-source-id: 12d77b3917310186fbddf11c59b2794dc859131f
Summary:
Also, Windows memory failures responsible for the earlier reversion have been fixed.
This PR (initially) contains 2 commits:
* a revert of the revert
* all changes to implement the original Apex scale update heuristic, squashed into a single commit for easier diff review
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33366
Differential Revision: D20099026
Pulled By: ngimel
fbshipit-source-id: 339b9b6bd5134bf055057492cd1eedb7e4461529
Summary:
This PR implements the gradient scaling API that mruberry, jjsjann123, ngimel, zdevito, gchanan and I have been discussing. Relevant issue/RFC: https://github.com/pytorch/pytorch/issues/25081.
Volume-wise, this PR is mostly documentation and tests. The Python API (found entirely in `torch/cuda/amp/amp_scaler.py`) is lightweight. The exposed functions are intended to make the implementation and control flow of gradient scaling convenient, intuitive, and performant.
The API is probably easiest to digest by looking at the documentation and examples. `docs/source/amp.rst` is the homepage for the Automatic Mixed Precision package. `docs/source/notes/amp_examples.rst` includes several examples demonstrating common but not-immediately-obvious use cases. Examples are backed by tests in `test_cuda.py` (and thankfully the tests pass :P).
Two small utility kernels have been added in `native/cuda/AmpKernels.cu` to improve performance and avoid host-device synchronizations wherever possible.
Existing optimizers, both in the wild and in Pytorch core, do not need to change to use the scaling API.
However, the API was also designed to establish a contract between user scripts and optimizers such that writers of _new_ custom optimizers have the control points they need to implement fast, optionally sync-free updates. User scripts that obey the scaling API can drop such custom optimizers in and reap performance benefits without having to change anything aside from the optimizer constructor itself. [I know what the contract with custom optimizers should be](35829f24ef/torch/cuda/amp/amp_scaler.py (L179-L184)), but I'm waiting for review on the rest of the API before I go about documenting it (it will be given a dedicated section in `docs/source/notes/amp_examples.rst`).
Currently, the gradient scaling examples do not include the auto-casting API as discussed in https://github.com/pytorch/pytorch/issues/25081. The gradient scaling API is intended to be orthogonal/modular relative to autocasting. Without auto-casting the gradient scaling API is fully use-_able_, but not terribly use-_ful_, so it's up to you guys whether you want to wait until auto-casting is ready before merging the scaling API as well.
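A sketch of the intended usage pattern, using the class name as it eventually shipped (`torch.cuda.amp.GradScaler`); the tiny model and random data are placeholders:
```python
import torch

model = torch.nn.Linear(16, 1).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scaler = torch.cuda.amp.GradScaler()

for _ in range(10):
    x = torch.randn(32, 16, device="cuda")
    y = torch.randn(32, 1, device="cuda")
    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(model(x), y)
    scaler.scale(loss).backward()  # backprop on the scaled loss
    scaler.step(optimizer)         # unscales grads; skips the step if infs/NaNs appear
    scaler.update()                # adjusts the scale factor for the next iteration
```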
### Todo
- [ ] How do I get c10 registered status for my two custom kernels? They're very simple.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26512
Differential Revision: D19859905
Pulled By: mruberry
fbshipit-source-id: bb8ae6966214718dfee11345db824389e4286923