pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-06 12:20:52 +01:00

Author	SHA1	Message	Date
eqy	eff74ed7bd	[AMP] Use generic autocast in example, specify dtype (#79579 ) CC @mruberry @ptrblck Pull Request resolved: https://github.com/pytorch/pytorch/pull/79579 Approved by: https://github.com/mruberry, https://github.com/ngimel	2022-06-17 21:32:51 +00:00
Rhys Goodall	62ba548cac	[DOC] Missing line in serialization notes (#79454 ) Small typo fix to serialization docs where there was a missing line in one of the examples. Pull Request resolved: https://github.com/pytorch/pytorch/pull/79454 Approved by: https://github.com/mruberry	2022-06-17 18:26:47 +00:00
Mike Ruberry	1d47e0df5a	Updates TF32 docs (#79401 ) Updates TF32 docs to reflect PyTorch 1.12 updates. Pull Request resolved: https://github.com/pytorch/pytorch/pull/79401 Approved by: https://github.com/ngimel	2022-06-13 21:02:00 +00:00
lezcano	a8ea58afee	Add randomness case to the autograd notes I also took this chance to clean up the Sphinx formatting a bit and reword a few minor things. Pull Request resolved: https://github.com/pytorch/pytorch/pull/78617 Approved by: https://github.com/soulitzer, https://github.com/albanD	2022-06-08 21:27:03 +00:00
Kurt Mohler	a4403c17c7	Improve reproducibility docs for RNG (#78849 ) * Mention that operations may change RNG state and how to deal with it * Add link to Reproducibility note in `use_deterministic_algorithms` docs * Also fix a broken link Fixes #77206 Pull Request resolved: https://github.com/pytorch/pytorch/pull/78849 Approved by: https://github.com/mruberry	2022-06-06 14:53:59 +00:00
albanD	b30b1f3dec	update mps note with more details (#78669 ) Follow up to the comments in https://github.com/pytorch/pytorch/pull/77767#pullrequestreview-978807521 Pull Request resolved: https://github.com/pytorch/pytorch/pull/78669 Approved by: https://github.com/kulinseth, https://github.com/anjali411	2022-06-02 20:53:19 +00:00
vfdev	642fc94501	Update extending.rst (#78707 ) Follow-up fix for https://github.com/pytorch/pytorch/pull/78073 : https://github.com/pytorch/pytorch/pull/78073#discussion_r887621219 Pull Request resolved: https://github.com/pytorch/pytorch/pull/78707 Approved by: https://github.com/albanD	2022-06-02 17:24:00 +00:00
Philip Meier	288b23bc52	fix MetadataTensor example (#78073 ) ```py [bar if bar for bar in foo] ``` is invalid Python syntax. The `if` clause needs to be at the end: ```py [bar for bar in foo if bar] ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/78073 Approved by: https://github.com/albanD	2022-05-31 21:34:19 +00:00
Alban Desmaison	dcd2ba3538	improve mps note to describe the different functions available (#77767 ) Fixing https://github.com/pytorch/pytorch/issues/77748 Pull Request resolved: https://github.com/pytorch/pytorch/pull/77767 Approved by: https://github.com/soulitzer	2022-05-18 20:17:23 +00:00
Jeff Daily	de86146c61	rocblas alt impl during backward pass only (#71881 ) In preparation of adopting future rocblas library options, it is necessary to track when the backward pass of training is executing. The scope-based helper class `BackwardPassGuard` is provided to toggle state. Pull Request resolved: https://github.com/pytorch/pytorch/pull/71881 Approved by: https://github.com/albanD	2022-05-18 19:42:58 +00:00
Kulin Seth	e011a8e18b	Enable PyTorch operations on MPS Backend. (#77343 ) Add PyTorch operations to MPS backend. - https://github.com/pytorch/pytorch/issues/77394 Pull Request resolved: https://github.com/pytorch/pytorch/pull/77343 Approved by: https://github.com/albanD	2022-05-13 18:28:53 +00:00
James Reed	286d788029	Properly capitalize PyTorch (#77308 ) pytorch -> PyTorch Pull Request resolved: https://github.com/pytorch/pytorch/pull/77308 Approved by: https://github.com/bertmaher, https://github.com/mthrok	2022-05-12 18:07:32 +00:00
Alban Desmaison	d5210a4269	Add gradient choice detail to autograd doc Trying to clarify what our backward functions should compute. Pull Request resolved: https://github.com/pytorch/pytorch/pull/76898 Approved by: https://github.com/soulitzer, https://github.com/Lezcano	2022-05-06 21:12:25 +00:00
Smark	ab57876420	fix docs error in Autograd Mechanics Fixes #74682 Pull Request resolved: https://github.com/pytorch/pytorch/pull/74807 Approved by: https://github.com/albanD	2022-03-29 18:32:16 +00:00
leslie-fang-intel	3a112ebb57	add autocast cpu doc As discussed in https://github.com/pytorch/pytorch/issues/55374#issuecomment-968333614, here we update the cpu autocast operation list in autocast API document. Pull Request resolved: https://github.com/pytorch/pytorch/pull/68567 Approved by: https://github.com/ezyang	2022-03-22 02:02:43 +00:00
Jaewon Lee	11ea09effc	[CUDACachingAlloc/GPUInference] Implement garbage collection without GPU sync (#74261 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/74261 ### Goal Implement a cheap way to reclaim GPU memory (garbage collection) without incurring GPU sync. ### Why do we need this? Currently, there are only two ways to reclaim GPU memory block already assigned to a particular stream. - `release_available_cached_blocks(params)`: Free blocks exceeding the `CachingAllocatorConfig::max_split_size()` until we can satisfy the request. Issue: If the `max_split_size` is unset (default), this function is a no-op. Even if this is set, the reclamation is quite conservative (e.g., never frees blocks under max_split_size). - `release_cached_blocks()`: Waits for all the in-flight events and then reclaim blocks. Issue: 'waiting for all event' is very expensive as it will likely stall all the GPU operations. Many GPU applications without a proper handling of potential GPU throttling would suffer/crash. ### Proposed idea - If the garbage collection threshold is set, try to reclaim some memory blocks without synchronization. It should be safe to do so, as `release_available_cached_blocks` essentially does the same thing (but less aggressively). - GC is triggered only when we fail to serve a `malloc` request from the block pool. No need to free blocks when the block pool is functioning just fine. - Prioritize reclaiming blocks that weren't reused for long time. Reclamation stops once the used memory capacity < threshold. - This code path is totally optional; by default it won't be invoked. Test Plan: - Unit tests - Manually checked that the GPU memory usage stays as indicated by the garbage collector. If not the caching allocator at least tries to keep freeing the blocks. 
Reviewed By: jianyuh Differential Revision: D34482514 fbshipit-source-id: d5eae62ac60b94b0bca851f9d233a092d086e3c2 (cherry picked from commit 05780f1ed4b176f05e765b2411c9eaa2eaeb48b0)	2022-03-21 18:46:02 +00:00
Banit Agrawal	ac3effd150	[PyTorch GPU Allocator] Better use of blocks with rounding of allocation sizes (#74213) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/74213 In the current CUDACachingAllocator, sizes are rounded up to a multiple of the block size of 512, which works for smaller sizes. For large sizes, however, we can end up with many differently sized blocks in the larger pool. This is problematic with variable batch sizes such as 1001, 1021, 1023: each goes to a different block size and creates a block of a different size. This leaves many blocks unused and wastes GPU memory capacity. This diff adds a rounding approach to allocation size. It rounds the size up to the nearest power-of-2 division, and the number of divisions can be changed with an env variable setting. For example, if we need to round up a size of 1200 and the number of divisions is 4, the size 1200 lies between 1024 and 2048, and 4 divisions between them give the values 1024, 1280, 1536, and 1792. So the function will return 1280 as the nearest ceiling of power-2 division. env setting: export PYTORCH_CUDA_ALLOC_CONF=roundup_power2_divisions:4 ghstack-source-id: 151446017 Reviewed By: ezyang Differential Revision: D34868036 fbshipit-source-id: 494785add16e6b37c920dcb5a2b81d4c637b554a (cherry picked from commit 548454ccacbd8700e7ffd2d762e40b4ba37abbae)	2022-03-16 02:53:53 +00:00
Rohit Goswami	801abc0cdd	MAINT, DOC: Trivial spellings and warnings (#72745 ) Summary: Fixes N/A. Just minor annoyances. Pull Request resolved: https://github.com/pytorch/pytorch/pull/72745 Reviewed By: samdow Differential Revision: D34216016 Pulled By: albanD fbshipit-source-id: b65600b50e41a1dd7bf7d076b0dd3e2d1c99caf9 (cherry picked from commit `b959392a5f`)	2022-02-14 21:55:19 +00:00
Felix Divo	340fae4363	[Doc] Better formatting in autograd.rst (#72586 ) Summary: See title. Pull Request resolved: https://github.com/pytorch/pytorch/pull/72586 Reviewed By: soulitzer Differential Revision: D34177704 Pulled By: albanD fbshipit-source-id: 1adf6ebed4f64ec4d8fff160df300c8e6ee528ea (cherry picked from commit `bbb586d67d`)	2022-02-11 22:46:10 +00:00
Felix Divo	25fba4a019	[DOC] Add link to "double backward" from "extending pytorch" page (#72584 ) Summary: It is probably the most user friendly to link to that (lesser known?) feature. Pull Request resolved: https://github.com/pytorch/pytorch/pull/72584 Reviewed By: soulitzer Differential Revision: D34173999 Pulled By: albanD fbshipit-source-id: 99fff7a55412faf54888f8317ab2388f4d7d30e4 (cherry picked from commit `2191ee7657`)	2022-02-11 20:34:13 +00:00
Mike Ruberry	9b9b878c89	Fixes jiterator cache macro include + updates CUDA note with cache variables (#71452 ) Summary: Per title. Pull Request resolved: https://github.com/pytorch/pytorch/pull/71452 Reviewed By: ngimel Differential Revision: D33646495 Pulled By: mruberry fbshipit-source-id: bbf627e6d7a724a83a3ea2ae9c0f50430f8d578e (cherry picked from commit `d1e72b144a`)	2022-01-19 03:45:05 +00:00
Rohan Varma	4fd1992a60	[Docs][BE] DDP doc fix (#71363 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/71363 Looks like DDP example is currently broken as per https://discuss.pytorch.org/t/official-ddp-example-is-broken/141493. Fix the issue by setting the correct env variable. ghstack-source-id: 147080377 Test Plan: CI Reviewed By: mrshenli Differential Revision: D33607250 fbshipit-source-id: e0e7d03cc365c186253b959c4c5405a5e3609218 (cherry picked from commit `32472884ec`)	2022-01-18 22:24:51 +00:00
Jake Tae	23f902f7e4	Fix incorrect variable in autograd docs (#70884 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/68362. Pull Request resolved: https://github.com/pytorch/pytorch/pull/70884 Reviewed By: mruberry Differential Revision: D33463331 Pulled By: ngimel fbshipit-source-id: 834ba9c450972710e0424cc92af222551f0b4a4a	2022-01-06 20:53:10 -08:00
Peter Bell	e279963eef	Remove remaining THC code (#69039 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69039 Test Plan: Imported from OSS Reviewed By: anjali411 Differential Revision: D32872476 Pulled By: ngimel fbshipit-source-id: 7972aacc24aef9450fb59b707ed6396c501bcb31	2021-12-08 12:18:08 -08:00
Rodrigo Bermúdez Schettino	1a202b0c39	Docs: Fix broken code syntax in autograd.rst (#69362 ) Summary: The backticks around `nn.Parameters` were not rendered correctly because the word was enclosed in an italics block. Spotted the issue on https://pytorch.org/docs/stable/notes/autograd.html#locally-disable-grad-doc. Pull Request resolved: https://github.com/pytorch/pytorch/pull/69362 Reviewed By: zou3519 Differential Revision: D32924093 Pulled By: albanD fbshipit-source-id: 5a310ac3f3d13a5116f7aa911817b9452eee711d	2021-12-07 12:03:15 -08:00
Michael Carilli	da023611d7	[CUDA graphs] Fixes make_graphed_callables example typos (#69379 ) Summary: cc mcarilli Pull Request resolved: https://github.com/pytorch/pytorch/pull/69379 Reviewed By: mruberry Differential Revision: D32841260 Pulled By: ngimel fbshipit-source-id: a7d0b9db0578526907547b201eddd55827812b63	2021-12-03 16:51:14 -08:00
Elio	088a4feb41	Update the documentation for AMP with DataParallel (#69218 ) Summary: Following https://github.com/pytorch/pytorch/issues/60540 and pull request https://github.com/pytorch/pytorch/issues/43102 Pull Request resolved: https://github.com/pytorch/pytorch/pull/69218 Reviewed By: gchanan Differential Revision: D32803814 Pulled By: ngimel fbshipit-source-id: 06fdbbee2c7734153271be70ec4bc24263c8c367	2021-12-03 14:58:47 -08:00
Vansh Sharma	ff125a3624	Minor changes in documentation (#68557 ) Summary: Fixed some small typos Pull Request resolved: https://github.com/pytorch/pytorch/pull/68557 Reviewed By: mruberry Differential Revision: D32538749 Pulled By: ngimel fbshipit-source-id: 09a9cd4031463b6a40d7307bd8fcb7d364444ac3	2021-11-18 17:57:16 -08:00
eqy	790763b0fe	Add an option to disable reduced precision reductions for FP16 GEMM (#67946 ) Summary: https://github.com/pytorch/pytorch/issues/67578 disabled reduced precision reductions for FP16 GEMMs. After benchmarking, we've found that this has substantial performance impacts for common GEMM shapes (e.g., those found in popular instantiations of multiheaded-attention) on architectures such as Volta. As these performance regressions may come as a surprise to current users, this PR adds a toggle to disable reduced precision reductions `torch.backends.cuda.matmul.allow_fp16_reduced_precision_reduction = ` rather than making it the default behavior. CC ngimel ptrblck stas00 Note that the behavior after the previous PR can be replicated with `torch.backends.cuda.matmul.allow_fp16_reduced_precision_reduction = False` Pull Request resolved: https://github.com/pytorch/pytorch/pull/67946 Reviewed By: zou3519 Differential Revision: D32289896 Pulled By: ngimel fbshipit-source-id: a1ea2918b77e27a7d9b391e030417802a0174abe	2021-11-09 17:27:20 -08:00
Alban Desmaison	708f7b1209	Update extending doc to cover forward mode AD (#66962 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66962 Reviewed By: VitalyFedyunin Differential Revision: D31897782 Pulled By: albanD fbshipit-source-id: 64164783a14a7ed4cedc17da28f1181d9807a499	2021-10-27 14:18:38 -07:00
Natalia Gimelshein	fdd9f49cf5	add a note on numerical accuracy (#65947 ) Summary: Per title Fixes https://github.com/pytorch/pytorch/issues/54437 Pull Request resolved: https://github.com/pytorch/pytorch/pull/65947 Reviewed By: albanD Differential Revision: D31612445 Pulled By: ngimel fbshipit-source-id: 5c155891a088aef3b9813f253d0dc1ee4d51ae1c	2021-10-13 12:43:55 -07:00
Rodrigo Berriel	7e772e7685	Update link to tutorial on defining NN modules (#65534 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/65527. Please, see my comment in the issue: https://github.com/pytorch/pytorch/issues/65527#issuecomment-925863193. The file was renamed in `ce58d5904c (diff-e5ef486bd89eb38de15752211d9437953681b8caa8f44d7c86bb820d13151df2)`, but the link in this repository was not updated. It doesn't change the fact that the old link is still working, but I guess this has to be fixed in [pytorch/tutorials](https://github.com/pytorch/tutorials) instead of here. Pull Request resolved: https://github.com/pytorch/pytorch/pull/65534 Reviewed By: soulitzer Differential Revision: D31144269 Pulled By: H-Huang fbshipit-source-id: f70744a21113b7dc84510e2992d87f0fed793985	2021-09-23 11:26:50 -07:00
Rodrigo Berriel	f0ada4bd54	[docs] Remove .data from some docs (#65358 ) Summary: Related to https://github.com/pytorch/pytorch/issues/30987. Fix the following task: - [ ] Remove the use of `.data` in all our internal code: - [ ] ... - [x] `docs/source/scripts/build_activation_images.py` and `docs/source/notes/extending.rst` In `docs/source/scripts/build_activation_images.py`, I used `nn.init` because the snippet already assumes `nn` is available (the class inherits from `nn.Module`). cc albanD Pull Request resolved: https://github.com/pytorch/pytorch/pull/65358 Reviewed By: malfet Differential Revision: D31061790 Pulled By: albanD fbshipit-source-id: be936c2035f0bdd49986351026fe3e932a5b4032	2021-09-21 06:32:31 -07:00
Michael Carilli	e3210ca184	[CUDA graphs] Beta, not prototype (#65247 ) Summary: Powers have decided this API should be listed as beta. Pull Request resolved: https://github.com/pytorch/pytorch/pull/65247 Reviewed By: malfet Differential Revision: D31057940 Pulled By: ngimel fbshipit-source-id: 137b63cbd2c7409fecdc161a22135619bfc96bfa	2021-09-20 13:32:36 -07:00
albanD	473e55d5b2	Use classmethods for overrides (#64841 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64841 Test Plan: Imported from OSS Reviewed By: heitorschueroff Differential Revision: D30991424 Pulled By: albanD fbshipit-source-id: 551e2119768f3a4292713f3bfa83930f5506adbd	2021-09-17 08:32:49 -07:00
Jane Xu	4c4c03124b	Remove old references to 9.2 in documentation (#65059 ) Summary: Removes references in .rst and README.md and comments in the Dockerfile Pull Request resolved: https://github.com/pytorch/pytorch/pull/65059 Reviewed By: malfet Differential Revision: D30961110 Pulled By: janeyx99 fbshipit-source-id: 702a9a81bf08125ec4ac38bc656fc2c128c30018	2021-09-16 13:24:05 -07:00
Michael Carilli	36cac2be4d	[CUDA graphs] moves memory sharing intro paragraph (#64996 ) Summary: Puts memory sharing intro under Sharing memory... header, where it should have been all along. Pull Request resolved: https://github.com/pytorch/pytorch/pull/64996 Reviewed By: mruberry Differential Revision: D30948619 Pulled By: ngimel fbshipit-source-id: 5d9dd267b34e9d3fc499d4738377b58a22da1dc2	2021-09-14 17:53:43 -07:00
Michael Carilli	8d08b103be	[CUDA graphs] Prototype API and documentation (#63269 ) Summary: RFC: https://github.com/pytorch/pytorch/issues/61880 Pull Request resolved: https://github.com/pytorch/pytorch/pull/63269 Reviewed By: mruberry Differential Revision: D30596643 Pulled By: ngimel fbshipit-source-id: b1f8061406364b667e2c2d4d30fbce1f0d8456be	2021-08-31 13:34:23 -07:00
Joel Schlosser	196fd3ee7a	Modules note v2 (#63963 ) Summary: This PR expands the [note on modules](https://pytorch.org/docs/stable/notes/modules.html) with additional info for 1.10. It adds the following: * Examples of using hooks * Examples of using apply() * Examples for ParameterList / ParameterDict * register_parameter() / register_buffer() usage * Discussion of train() / eval() modes * Distributed training overview / links * TorchScript overview / links * Quantization overview / links * FX overview / links * Parametrization overview / link to tutorial Pull Request resolved: https://github.com/pytorch/pytorch/pull/63963 Reviewed By: albanD Differential Revision: D30606604 Pulled By: jbschlosser fbshipit-source-id: c1030b19162bcb5fe7364bcdc981a2eb6d6e89b4	2021-08-27 11:30:18 -07:00
Jithun Nair	730ce29baf	Add note on ifdefing based on CUDA_VERSION for ROCm path (#62850 ) Summary: CUDA_VERSION and HIP_VERSION follow very unrelated versioning schemes, so it does not make sense to use CUDA_VERSION to determine the ROCm path. This note explicitly addresses it. Pull Request resolved: https://github.com/pytorch/pytorch/pull/62850 Reviewed By: mruberry Differential Revision: D30547562 Pulled By: malfet fbshipit-source-id: 02990fa66a88466c2330ab85f446b25b78545150	2021-08-25 15:02:03 -07:00
Victor Quach	b95ce1591d	Add docs describing saved tensor hooks (#62362 ) Summary: Add section to the Autograd mechanics docs to describe the recently exposed saved tensors (https://github.com/pytorch/pytorch/issues/52451), how to register packing / unpacking hooks (https://github.com/pytorch/pytorch/issues/60975) and how to use default hooks (https://github.com/pytorch/pytorch/issues/61834) Sister PR: https://github.com/pytorch/pytorch/issues/62361 (will add a link from autograd.rst to notes/autograd in whatever PR does not land first) Pull Request resolved: https://github.com/pytorch/pytorch/pull/62362 Reviewed By: soulitzer Differential Revision: D30453177 Pulled By: Varal7 fbshipit-source-id: f5759977b069ff0ef36a47b08856d297691a6caa	2021-08-20 11:10:51 -07:00
soulitzer	2f615f6313	Improve custom function docs (#60312 ) Summary: - Adds some code examples for `ctx` methods and make requirements of arguments more clear - Type annotations for `save_for_backward`, `mark_dirty`, `mark_non_differentiable`, and `set_materialize_grads` (BC-breaking?) - Refactor `torch.autograd.Function` doc Pull Request resolved: https://github.com/pytorch/pytorch/pull/60312 Reviewed By: VitalyFedyunin Differential Revision: D30314961 Pulled By: soulitzer fbshipit-source-id: a284314b65662e26390417bd2b6b12cd85e68dc8	2021-08-18 11:31:31 -07:00
kyshel	e75ed4a4b5	add comma to prevent syntax errors (#62492 ) Summary: Fixes #{issue number} Pull Request resolved: https://github.com/pytorch/pytorch/pull/62492 Reviewed By: VitalyFedyunin Differential Revision: D30304684 Pulled By: ezyang fbshipit-source-id: db08ca39bcecbfd79ea50df18536bf4e87f51e15	2021-08-16 12:27:31 -07:00
cpatru	6d896cb545	Update faq.rst so OOM section mentions checkpoint (#62709 ) Summary: This FAQ has a section for CUDA OOMs where there are lots of don'ts. This limits modeling solution. Deep nets can blow up memory due to output caching during training. It's a known problem with a known solution: to trade-off compute for memory via checkpointing. FAQ should mention it. Pull Request resolved: https://github.com/pytorch/pytorch/pull/62709 Reviewed By: nairbv Differential Revision: D30103326 Pulled By: ezyang fbshipit-source-id: 3a8b465a7fbe19aae88f83cc50fe82ebafcb56c9	2021-08-05 07:40:08 -07:00
Victor Quach	5830f122f1	Add docstrings for save_on_cpu hooks (#62410 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62410 This PR adds docstrings for CPU hooks introduced in #61928. Also uncomments the warning about pinned memory in CUDA semantics docs. Depends on: #62361. For now docstrings are an orphan page at https://docs-preview.pytorch.org/62410/generated/torch.autograd.graph.set_save_on_cpu_hooks.html#torch-autograd-graph-set-save-on-cpu-hooks Test Plan: Imported from OSS Reviewed By: soulitzer Differential Revision: D29990129 Pulled By: Varal7 fbshipit-source-id: 7a98eeee6a0abb11e2c2d9169cd1aa35ad7ba3f4	2021-08-03 17:53:45 -07:00
Michael Dagitses	58df01c3b8	clarify default value of requires_grad for tensors (#61038 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61038 Test Plan: Imported from OSS Reviewed By: albanD Differential Revision: D29491984 Pulled By: dagitses fbshipit-source-id: 7e6b7f8e81d77f38c881b86a68c17d3cf5483dad	2021-07-12 12:57:37 -07:00
Jithun Nair	336970c03e	Add note on torch.distributed backends on ROCm (#58975 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/58975 Reviewed By: soulitzer Differential Revision: D29595510 Pulled By: rohan-varma fbshipit-source-id: 384bb67fcd003d65b76e957a474406b2a38099b9	2021-07-10 03:51:19 -07:00
Michael Carilli	2fa6c7627e	[CUDA graphs][BC-breaking] Removes post-backward syncs on default stream (#60421 ) Summary: Before https://github.com/pytorch/pytorch/pull/57833, calls to backward() or grad() synced only the calling thread's default stream with autograd leaf streams at the end of backward. This made the following weird pattern safe: ```python with torch.cuda.stream(s): # imagine forward used many streams, so backward leaf nodes may run on many streams loss.backward() # no sync use grads ``` but a more benign-looking pattern was unsafe: ```python with torch.cuda.stream(s): # imagine forward used a lot of streams, so backward leaf nodes may run on many streams loss.backward() # backward() syncs the default stream with all the leaf streams, but does not sync s with anything, # so counterintuitively (even though we're in the same stream context as backward()!) # it is NOT SAFE to use grads here, and there's no easy way to make it safe, # unless you manually sync on all the streams you used in forward, # or move "use grads" back to default stream outside the context. use grads ``` mruberry ngimel and I decided backward() should have the [same user-facing stream semantics as any cuda op](https://pytorch.org/docs/master/notes/cuda.html#stream-semantics-of-backward-passes). In other words, the weird pattern should be unsafe, and the benign-looking pattern should be safe. Implementationwise, this meant backward() should sync its calling thread's current stream, not default stream, with the leaf streams. After https://github.com/pytorch/pytorch/pull/57833, backward syncs the calling thread's current stream AND default stream with all leaf streams at the end of backward. The default stream syncs were retained for temporary backward compatibility. This PR finishes https://github.com/pytorch/pytorch/pull/57833's work by deleting syncs on the default stream. 
With this PR, graph-capturing an entire backward() call should be possible (see the [test_graph_grad_scaling diffs](https://github.com/pytorch/pytorch/compare/master...mcarilli:streaming_backwards_remove_default_syncs?expand=1#diff-893b1eea27352f336f4cd832919e48d721e4e90186e63400b8596db6b82e7450R3641-R3642)). first paragraph has a formatting error which this PR should also fix. Pull Request resolved: https://github.com/pytorch/pytorch/pull/60421 Reviewed By: albanD Differential Revision: D29370344 Pulled By: ngimel fbshipit-source-id: 3248bc5fb92fc517db0c15c897e5d7250f67d7fe	2021-06-24 17:34:02 -07:00
Luca Wehrstedt	bb9e1150ea	Revert D29342234: [pytorch][PR] [CUDA graphs][BC-breaking] Removes post-backward syncs on default stream Test Plan: revert-hammer Differential Revision: D29342234 (`675cea1adb`) Original commit changeset: 98e6be7fdd85 fbshipit-source-id: 84022973248b2254210eee57402df2c4f4bc43c6	2021-06-24 04:49:28 -07:00
Michael Carilli	675cea1adb	[CUDA graphs][BC-breaking] Removes post-backward syncs on default stream (#60421 ) Summary: Before https://github.com/pytorch/pytorch/pull/57833, calls to backward() or grad() synced only the calling thread's default stream with autograd leaf streams at the end of backward. This made the following weird pattern safe: ```python with torch.cuda.stream(s): # imagine forward used many streams, so backward leaf nodes may run on many streams loss.backward() # no sync use grads ``` but a more benign-looking pattern was unsafe: ```python with torch.cuda.stream(s): # imagine forward used a lot of streams, so backward leaf nodes may run on many streams loss.backward() # backward() syncs the default stream with all the leaf streams, but does not sync s with anything, # so counterintuitively (even though we're in the same stream context as backward()!) # it is NOT SAFE to use grads here, and there's no easy way to make it safe, # unless you manually sync on all the streams you used in forward, # or move "use grads" back to default stream outside the context. use grads ``` mruberry ngimel and I decided backward() should have the [same user-facing stream semantics as any cuda op](https://pytorch.org/docs/master/notes/cuda.html#stream-semantics-of-backward-passes). In other words, the weird pattern should be unsafe, and the benign-looking pattern should be safe. Implementationwise, this meant backward() should sync its calling thread's current stream, not default stream, with the leaf streams. After https://github.com/pytorch/pytorch/pull/57833, backward syncs the calling thread's current stream AND default stream with all leaf streams at the end of backward. The default stream syncs were retained for temporary backward compatibility. This PR finishes https://github.com/pytorch/pytorch/pull/57833's work by deleting syncs on the default stream. 
With this PR, graph-capturing an entire backward() call should be possible (see the [test_graph_grad_scaling diffs](https://github.com/pytorch/pytorch/compare/master...mcarilli:streaming_backwards_remove_default_syncs?expand=1#diff-893b1eea27352f336f4cd832919e48d721e4e90186e63400b8596db6b82e7450R3641-R3642)). first paragraph has a formatting error which this PR should also fix. Pull Request resolved: https://github.com/pytorch/pytorch/pull/60421 Reviewed By: VitalyFedyunin, albanD Differential Revision: D29342234 Pulled By: ngimel fbshipit-source-id: 98e6be7fdd8550872f0a78f9a66cb8dfe75abf63	2021-06-23 23:35:24 -07:00
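
The power-of-2 division rounding described in commit ac3effd150 (#74213) can be illustrated with a small sketch. This is not the allocator's actual code: the function name `roundup_power2_divisions` is borrowed from the env setting purely for illustration, and the handling of edge cases (non-power-of-2 division counts, very small sizes) is an assumption.

```python
def roundup_power2_divisions(size: int, divisions: int) -> int:
    """Round `size` up to the next boundary obtained by splitting the
    interval between neighbouring powers of two into `divisions` equal steps.

    Hypothetical helper for illustration only; not PyTorch's implementation.
    """
    if size < 1 or divisions < 1:
        raise ValueError("size and divisions must be positive")

    # Largest power of two that is <= size, e.g. 1024 for size 1200.
    lower = 1 << (size.bit_length() - 1)

    # Width of one division between lower and 2 * lower (at least 1 byte).
    step = max(lower // divisions, 1)

    # Number of whole steps needed to cover the distance above `lower`.
    steps_needed = -(-(size - lower) // step)  # ceiling division

    # Never round past the next power of two.
    return min(lower + steps_needed * step, 2 * lower)


if __name__ == "__main__":
    for requested in (1200, 1001, 1021, 1023):
        print(requested, "->", roundup_power2_divisions(requested, 4))
```

With 4 divisions, 1200 rounds up to 1280, matching the example in the commit message, while 1001, 1021, and 1023 all round up to 1024, so the three batch sizes would share one block size under this sketch.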


274 Commits