pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Author	SHA1	Message	Date
Sriram Kumar	a19b667bca	[ROCm] Update CUDAPluggableAllocator.h (#1984 ) (#150010 ) Altering the flag to use the correct streamType in CUDAPluggableAllocator class for ROCm gpu. The flag TORCH_HIP_VERSION does not work for ROCm as intended. This flag is replaced with USE_ROCM. This is impacting Distributed Fused Adam in Rocm/APEX when using nccl_ub feature. This has been tested with rocm/apex. See PR https://github.com/ROCm/apex/pull/184 Pull Request resolved: https://github.com/pytorch/pytorch/pull/150010 Approved by: https://github.com/jeffdaily	2025-04-01 16:49:03 +00:00
Shivam Raikundalia	a11538aa46	[GPU Snapshot] Add Clear History Flag (#149352 ) Summary: Oftentimes, users complain that a bunch of extra events are prepended to their desired GPU snapshot. This is because they usually attach an OOM logger without knowing and when they go to collect the actual snapshot, it adds all the OOM logger contents. Since OOM and regular snapshot use the same backend, we currently don't have the infra in place to split these snapshots. As a solution we add a flag to the snapshot frontend to clear out the history when starting the auto-trace record memory history. A more thorough solution would be to have a user pass in a handle and to have snapshots per handle to seperate the events. However, this would likely be complicated and more work than it is worth as we would have to change the callbacks in the caching allocator and pass these objects between python and cpp. Test Plan: See diff below Differential Revision: D71159720 Pull Request resolved: https://github.com/pytorch/pytorch/pull/149352 Approved by: https://github.com/eqy, https://github.com/aaronenyeshi	2025-03-19 21:44:20 +00:00
Marko Radmilac	c65ee728f0	Initial implementation of host memory stats (#147660 ) This is an initial attempt to provide some statistics for the pinned host memory allocations flowing through CachingHostAllocator. Many times in the past we have had inexplicable slowdowns that would be much easier to diagnose if we had some host memory characteristics. This change tries very hard not to disrupt the initial design of the allocator, and it uses existing locking mechanism, whenever possible, to gather statistics "for free". Only deviation from that is on the "slow path" where we incur CUDA calls anyway, so taking a short lock is not going to hurt the performance much, especially in the steady state where most allocations will come from cache. As mentioned before, this is the first PR, to introduce the concept and to see if it fits the right paradigm. We can always add more later. Metrics that would require more involved changes to the code base and locks, like requested memory, have been punted for now. I also tried to reuse the Stat structure used in CUDA caching allocator, in order to maintain symmetry. Pull Request resolved: https://github.com/pytorch/pytorch/pull/147660 Approved by: https://github.com/ngimel	2025-03-05 16:13:19 +00:00
Richard Barnes	5301710b15	[codemod] Fix unused-value issue in caffe2/aten/src/ATen/cuda/detail/CUDAHooks.cpp +4 (#147555 ) Summary: LLVM has a warning `-Wunused-value` which we treat as an error because it's so often diagnostic of a code issue. Unused values often indicate a programming mistake, but can also just be unnecessary cruft that harms readability and performance. For questions/comments, contact r-barnes. - If you approve of this diff, please use the "Accept & Ship" button :-) Test Plan: Sandcastle Differential Revision: D69945678 Pull Request resolved: https://github.com/pytorch/pytorch/pull/147555 Approved by: https://github.com/Skylion007, https://github.com/eqy	2025-03-01 19:46:13 +00:00
PyTorch MergeBot	a983b2b11a	Revert "Initial implementation of host memory stats (#147660 )" This reverts commit `945e359fc1`. Reverted https://github.com/pytorch/pytorch/pull/147660 on behalf of https://github.com/mradmila due to There is an issue with ambiguous definition of Stat structure when different C++ tools are used. Backing out for now. ([comment](https://github.com/pytorch/pytorch/pull/147660#issuecomment-2692346379))	2025-03-01 18:05:45 +00:00
Marko Radmilac	945e359fc1	Initial implementation of host memory stats (#147660 ) This is an initial attempt to provide some statistics for the pinned host memory allocations flowing through CachingHostAllocator. Many times in the past we have had inexplicable slowdowns that would be much easier to diagnose if we had some host memory characteristics. This change tries very hard not to disrupt the initial design of the allocator, and it uses existing locking mechanism, whenever possible, to gather statistics "for free". Only deviation from that is on the "slow path" where we incur CUDA calls anyway, so taking a short lock is not going to hurt the performance much, especially in the steady state where most allocations will come from cache. As mentioned before, this is the first PR, to introduce the concept and to see if it fits the right paradigm. We can always add more later. Metrics that would require more involved changes to the code base and locks, like requested memory, have been punted for now. I also tried to reuse the Stat structure used in CUDA caching allocator, in order to maintain symmetry. Pull Request resolved: https://github.com/pytorch/pytorch/pull/147660 Approved by: https://github.com/ngimel	2025-02-28 18:36:44 +00:00
x41lakazam	30375cb326	Fix minor typo in python_nccl (#148088 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/148088 Approved by: https://github.com/Skylion007	2025-02-28 00:47:09 +00:00
PyTorch MergeBot	9a883007a2	Revert "Implement cuda graphs implementation of torch.cond and torch.while_loop (#140979 )" This reverts commit `c7515da7b0`. Reverted https://github.com/pytorch/pytorch/pull/140979 on behalf of https://github.com/huydhn due to This change has been reported to break internal code ([comment](https://github.com/pytorch/pytorch/pull/140979#issuecomment-2657361940))	2025-02-13 18:04:26 +00:00
Daniel Galvez	c7515da7b0	Implement cuda graphs implementation of torch.cond and torch.while_loop (#140979 ) This is a new PR for #130386 , which got stale and was closed. Since I force-pushed to that branch in order to rebase it on top of main, the PR can no longer be reopened, according to https://github.com/isaacs/github/issues/361 I fixed the possibly-not-warmed-up problem described here: https://github.com/pytorch/pytorch/pull/130386/files#r1690856534 Since starting this, torch.cond and torch.while_loop now apparently have support for backward passes. I will look into what it might take to support that. Pull Request resolved: https://github.com/pytorch/pytorch/pull/140979 Approved by: https://github.com/eqy, https://github.com/eellison	2025-02-11 18:16:15 +00:00
cyy	25aa7ca62d	Cleanup CallOnce.h (#146700 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/146700 Approved by: https://github.com/albanD	2025-02-07 16:44:45 +00:00
PyTorch MergeBot	00dc5b10f6	Revert "[Environment Variable][7/N] Use thread-safe getenv functions (#140211 )" This reverts commit `2fd1b6b361`. Reverted https://github.com/pytorch/pytorch/pull/140211 on behalf of https://github.com/atalman due to Breaks executorch tests ([comment](https://github.com/pytorch/pytorch/pull/140211#issuecomment-2632202864))	2025-02-03 22:04:28 +00:00
cyy	2fd1b6b361	[Environment Variable][7/N] Use thread-safe getenv functions (#140211 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/140211 Approved by: https://github.com/ezyang, https://github.com/eqy	2025-02-01 12:33:41 +00:00
PyTorch MergeBot	284f217011	Revert "[Environment Variable][7/N] Use thread-safe getenv functions (#140211 )" This reverts commit `97b3b73f3e`. Reverted https://github.com/pytorch/pytorch/pull/140211 on behalf of https://github.com/ZainRizvi due to Sorry but this is failing internally. @eqy @ezyang can you please help this get remerged? See D68779772. ([comment](https://github.com/pytorch/pytorch/pull/140211#issuecomment-2622504898))	2025-01-29 18:24:29 +00:00
cyyever	97b3b73f3e	[Environment Variable][7/N] Use thread-safe getenv functions (#140211 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/140211 Approved by: https://github.com/ezyang, https://github.com/eqy	2025-01-28 15:21:12 +00:00
cyy	29f52e3972	[2/N] Remove unnecessary once flag usage (#145057 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/145057 Approved by: https://github.com/albanD	2025-01-23 09:48:46 +00:00
Will Constable	2859b11bdb	[pytorch/ncclx] Remove Alltoallv specialization for PTD all_to_all (#145045 ) Summary: PTD all_to_all uses a list of tensors, while ncclAllToAllv (provided by NCCLX and RCCL) assumes that a single contiguous buffer is used. These are fundamentally mismatched. The list of tensors might not be contiguous or even ordered (buffer addresses might not be in increasing order). This patch removes the ncclAllToAllv specialization for PTD all_to_all, and instead let's it directly call ncclSend/ncclRecv. Co-authored by @pavanbalaji Pull Request resolved: https://github.com/pytorch/pytorch/pull/145045 Approved by: https://github.com/pavanbalaji, https://github.com/d4l3k, https://github.com/fduwjj, https://github.com/ezyang	2025-01-18 05:26:55 +00:00
Tom Ritchford	13d35ea67a	[BE] Add missing throw of `std::runtime_error` in scrc/cuda/utils.cpp (#144962 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/144962 Approved by: https://github.com/amjames, https://github.com/Skylion007, https://github.com/malfet	2025-01-16 17:35:39 +00:00
cyy	b0be30dd79	[19/N] Fix extra warnings brought by clang-tidy-17 (#144448 ) Apply more clang-tidy fixes. There was a bug introduced by #144014 due to incorrect namespace concatenation which is reverted here. Pull Request resolved: https://github.com/pytorch/pytorch/pull/144448 Approved by: https://github.com/albanD	2025-01-09 15:58:05 +00:00
Michal Gallus	37e9da0687	[ROCm][Windows] Disable roctracer-related code (#143329 ) Currently, the roctracer for Windows is not available. This PR disables any mentions of its usage for Windows, and creates dummy functions for Windows to keep compatibility with existing code, but which warn the user about the lack of Windows' availability. Pull Request resolved: https://github.com/pytorch/pytorch/pull/143329 Approved by: https://github.com/sraikund16	2025-01-03 01:51:01 +00:00
Yu, Guangye	3848de55ed	Add get_stream_from_external API for CUDA backend (#143799 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/143799 Approved by: https://github.com/albanD, https://github.com/EikanWang ghstack dependencies: #142347, #141119, #141123	2024-12-31 11:15:59 +00:00
Richard Barnes	9283c40ba8	[codemod] Decorate unused variables with `[[maybe_unused]]` (#143381 ) Summary: LLVM-15 has a warning `-Wunused-variable` which we treat as an error because it's so often diagnostic of a code issue. Unused variables can compromise readability or, worse, performance. This diff either (a) removes an unused variable and, possibly, it's associated code or (b) qualifies the variable with `[[maybe_unused]]`. - If you approve of this diff, please use the "Accept & Ship" button :-) Test Plan: Sandcastle Reviewed By: palmje Pull Request resolved: https://github.com/pytorch/pytorch/pull/143381 Approved by: https://github.com/malfet	2024-12-17 20:36:03 +00:00
Nichols A. Romero	c0a39ad35a	[ROCm] Fix TunableOp UTs: Rotating Buffer (#143172 ) TunableOp's rotating buffer feature cannot be properly tested because the environment variable that controls this feature is sticky. A Python API is introduced to modify this value. Additional items in this PR: * UT for rotating buffer API * Clean up UTs that were setting the rotating buffer via the environment variable * Align behavior of environment variable and Python API when a negative value (< 0) is set. * Update documentation. Pull Request resolved: https://github.com/pytorch/pytorch/pull/143172 Approved by: https://github.com/jeffdaily	2024-12-14 06:18:11 +00:00
Peter Bell	96c3b2c388	Expose remaining sharedMem cudaDeviceProps to python (#143226 ) Was a bit too fast with my earlier PR, `sharedMemPerMultiprocessor` includes some memory that is reserved for the system. The amount a kernel can actually use is limited by `sharedMemPerBlockOptin`. I also expose `sharedMemPerBlock` for completeness. Pull Request resolved: https://github.com/pytorch/pytorch/pull/143226 Approved by: https://github.com/ezyang	2024-12-14 06:13:28 +00:00
Peter Bell	82a45d19b4	Expose sharedMemPerMultiprocessor device property to python (#143119 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/143119 Approved by: https://github.com/ezyang	2024-12-13 16:53:57 +00:00
Alex Denisov	539286a67b	Inductor annotations (#130429 ) Add NVTX annotations around training phases and buffer computations RFC/discussion: https://dev-discuss.pytorch.org/t/rfc-performance-profiling-at-scale-with-details-nvtx-annotations/2224 <img width="2160" alt="Screenshot 2024-07-10 at 11 48 04" src="https://github.com/pytorch/pytorch/assets/1175576/9ade139c-d393-473f-9b68-6c25da367dc4"> Pull Request resolved: https://github.com/pytorch/pytorch/pull/130429 Approved by: https://github.com/aorenste, https://github.com/eellison, https://github.com/albanD Co-authored-by: Cedric GESTES <cedric.gestes@flex.ai>	2024-12-10 08:53:39 +00:00
Benjamin Glass	4959784dac	Add API query for available per-process CUDA memory (#140620 ) Certain `cpp_wrapper`-enabled tests were OOM-ing in the CI pipeline, with error messages suggesting that sufficient memory was accessible. This ultimately resulted from an internal memory limitation that was not queryable in the API. This PR adds querying for that limit. Additionally, the failing tests had incorrect memory availability checks, and are updated with measured memory requirements. Pull Request resolved: https://github.com/pytorch/pytorch/pull/140620 Approved by: https://github.com/malfet, https://github.com/eqy ghstack dependencies: #141367	2024-12-03 00:24:03 +00:00
Jesse Cai	5accae4197	[sparse] add extra options to _cslt_spare_mm (#137427 ) Summary: Splitting this PR into two, one for the cuSPARSELt improvements, and one for the inductor lowering. This PR adds in the additional cuSPARSELt bindings into pytorch. * `torch._cslt_sparse_mm_search` will be deprecated in a future PR, so a warning has been added * Added a header file for cuSPARSELtOps.cpp * max_id is now available in `torch.backends.cusparselt` via `torch.backends.cusparselt.get_max_alg_id()` * fixed meta registrations for float8 Test Plan: python test/test_sparse_semi_structured.py Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/137427 Approved by: https://github.com/cpuhrsch, https://github.com/eqy	2024-11-27 05:32:45 +00:00
PyTorch MergeBot	5318bf8baf	Revert "[sparse] add extra options to _cslt_spare_mm (#137427 )" This reverts commit `f1451163ec`. Reverted https://github.com/pytorch/pytorch/pull/137427 on behalf of https://github.com/huydhn due to This looks like the test is still failing, plz do a rebase ([comment](https://github.com/pytorch/pytorch/pull/137427#issuecomment-2499918590))	2024-11-26 08:01:24 +00:00
Jesse Cai	f1451163ec	[sparse] add extra options to _cslt_spare_mm (#137427 ) Summary: Splitting this PR into two, one for the cuSPARSELt improvements, and one for the inductor lowering. This PR adds in the additional cuSPARSELt bindings into pytorch. * `torch._cslt_sparse_mm_search` will be deprecated in a future PR, so a warning has been added * Added a header file for cuSPARSELtOps.cpp * max_id is now available in `torch.backends.cusparselt` via `torch.backends.cusparselt.get_max_alg_id()` * fixed meta registrations for float8 Test Plan: python test/test_sparse_semi_structured.py Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/137427 Approved by: https://github.com/cpuhrsch, https://github.com/eqy	2024-11-25 23:45:41 +00:00
Xiaodong Wang	4a378d77d4	[AMD] Add ncclRemoteError back (#141461 ) Summary: It looks RCCL does have the support for those two error types:: ncclRemoteError and ncclnProgress: https://github.com/ROCm/rccl/blob/develop/src/nccl.h.in#L57. And I do see my job throwing out those errors. But pytorch just said: ``` RuntimeError: Unconvertible NCCL type ``` Even though nccl says: ``` develop/src/init.cc.hip:502 NCCL WARN Attempt to use communicator before the previous operation returned ncclSuccess ``` Therefore just enabling those. Test Plan: CI Differential Revision: D66434341 Pull Request resolved: https://github.com/pytorch/pytorch/pull/141461 Approved by: https://github.com/eqy	2024-11-25 17:49:34 +00:00
PyTorch MergeBot	cc90ba8924	Revert "[sparse] add extra options to _cslt_spare_mm (#137427 )" This reverts commit `45b30a5aec`. Reverted https://github.com/pytorch/pytorch/pull/137427 on behalf of https://github.com/huydhn due to Sorry for reverting your change but test_sparse_semi_structured is failing in trunk after it lands ([comment](https://github.com/pytorch/pytorch/pull/137427#issuecomment-2494047577))	2024-11-22 15:40:21 +00:00
Jesse Cai	45b30a5aec	[sparse] add extra options to _cslt_spare_mm (#137427 ) Summary: Splitting this PR into two, one for the cuSPARSELt improvements, and one for the inductor lowering. This PR adds in the additional cuSPARSELt bindings into pytorch. * `torch._cslt_sparse_mm_search` will be deprecated in a future PR, so a warning has been added * Added a header file for cuSPARSELtOps.cpp * max_id is now available in `torch.backends.cusparselt` via `torch.backends.cusparselt.get_max_alg_id()` * fixed meta registrations for float8 Test Plan: python test/test_sparse_semi_structured.py Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/137427 Approved by: https://github.com/cpuhrsch, https://github.com/eqy	2024-11-21 23:37:36 +00:00
PyTorch MergeBot	614e727191	Revert "[Environment Variable][7/N] Use thread-safe getenv functions (#140211 )" This reverts commit `cd942d00dd`. Reverted https://github.com/pytorch/pytorch/pull/140211 on behalf of https://github.com/izaitsevfb due to causes crash internally during test listing ([comment](https://github.com/pytorch/pytorch/pull/140211#issuecomment-2492328790))	2024-11-21 21:05:22 +00:00
cyyever	cd942d00dd	[Environment Variable][7/N] Use thread-safe getenv functions (#140211 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/140211 Approved by: https://github.com/ezyang, https://github.com/eqy	2024-11-21 00:25:20 +00:00
cyy	00b3b61076	Add and use thread-safe strerror (#140472 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/140472 Approved by: https://github.com/ezyang	2024-11-19 04:24:17 +00:00
Richard Barnes	c8be6f1196	[codemod] Remove unused-variable in pytorch (#140569 ) Summary: LLVM-15 has a warning `-Wunused-variable` which we treat as an error because it's so often diagnostic of a code issue. Unused variables can compromise readability or, worse, performance. This diff either (a) removes an unused variable and, possibly, it's associated code or (b) qualifies the variable with `[[maybe_unused]]`. #buildsonlynotests - Builds are sufficient - If you approve of this diff, please use the "Accept & Ship" button :-) Test Plan: Sandcastle Reviewed By: meyering Differential Revision: D65833225 Pull Request resolved: https://github.com/pytorch/pytorch/pull/140569 Approved by: https://github.com/Skylion007	2024-11-13 20:38:03 +00:00
PyTorch MergeBot	4a18e26ff5	Revert "[Environment Variable][7/N] Use thread-safe getenv functions (#140211 )" This reverts commit `a3cff4bbd4`. Reverted https://github.com/pytorch/pytorch/pull/140211 on behalf of https://github.com/ezyang due to One of these diffs had incorrect downstream optional handling, we must reaudit all of these diffs ([comment](https://github.com/pytorch/pytorch/pull/140211#issuecomment-2473709246))	2024-11-13 14:05:01 +00:00
cyy	a3cff4bbd4	[Environment Variable][7/N] Use thread-safe getenv functions (#140211 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/140211 Approved by: https://github.com/ezyang, https://github.com/eqy	2024-11-12 18:49:51 +00:00
cyy	bf1b8adee6	Turn static inline into static function (#139843 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/139843 Approved by: https://github.com/ezyang	2024-11-07 23:58:18 +00:00
PyTorch MergeBot	68f1b52d8a	Revert "Turn static inline into static function (#139843 )" This reverts commit `72d3f5b26d`. Reverted https://github.com/pytorch/pytorch/pull/139843 on behalf of https://github.com/ZainRizvi due to Sorry but this is causing tests to fail on trunk. See [GH job link](https://github.com/pytorch/pytorch/actions/runs/11729669425/job/32675829894) [HUD commit link](`72d3f5b26d`) ([comment](https://github.com/pytorch/pytorch/pull/139843#issuecomment-2463354131))	2024-11-07 22:29:45 +00:00
cyy	72d3f5b26d	Turn static inline into static function (#139843 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/139843 Approved by: https://github.com/ezyang	2024-11-07 19:08:41 +00:00
cyy	64d9ee88d7	[11/N] Fix extra warnings brought by clang-tidy-17 (#139599 ) Follows #139385 Pull Request resolved: https://github.com/pytorch/pytorch/pull/139599 Approved by: https://github.com/sraikund16	2024-11-04 23:57:41 +00:00
cyy	7f387fa612	[10/N] Fix extra warnings brought by clang-tidy-17 (#139385 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/139385 Approved by: https://github.com/Skylion007	2024-11-04 00:47:19 +00:00
cyy	f95c71867e	[9/N] Fix extra warnings brought by clang-tidy-17 (#139286 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/139286 Approved by: https://github.com/ezyang	2024-10-31 05:20:31 +00:00
cyyever	456c87c8a2	[8/N] Fix extra warnings brought by clang-tidy-17 (#139151 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/139151 Approved by: https://github.com/ezyang Co-authored-by: Aaron Gokaslan <aaronGokaslan@gmail.com>	2024-10-30 14:20:08 +00:00
cyy	4d9b5a87e4	[3/N] Fix cppcoreguidelines-special-member-functions warnings (#138796 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/138796 Approved by: https://github.com/ezyang	2024-10-28 10:53:11 +00:00
PyTorch MergeBot	9ca749d6cd	Revert " [3/N] Fix cppcoreguidelines-special-member-functions warnings (#138796 )" This reverts commit `7cb3cef05f`. Reverted https://github.com/pytorch/pytorch/pull/138796 on behalf of https://github.com/wdvr due to reverting since this started failing a windows test ([comment](https://github.com/pytorch/pytorch/pull/138796#issuecomment-2440710865))	2024-10-28 07:06:00 +00:00
cyyever	7cb3cef05f	[3/N] Fix cppcoreguidelines-special-member-functions warnings (#138796 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/138796 Approved by: https://github.com/ezyang	2024-10-28 01:38:02 +00:00
Ke Wen	ee11e2da1e	[PGNCCL] Use non-blocking mode by default in eager init (#138527 ) ### Why use non-blocking mode in eager init? For overlapping comm init and model init, etc. ![image](https://github.com/user-attachments/assets/9b0bf7a9-be26-4d16-827b-dbe861f083cd) ### Why can we set non-blocking as default? If the setting is dangling -- i.e. not passed in by user nor set via env -- `ProcessGroupNCCL` can have some preferred logic. And torch-level API semantics does not change whether the NCCL comm is blocking or non-blocking (handled within `ProcessGroupNCCL`). ### Why not make non-blocking default for lazy mode as well? PR https://github.com/pytorch/pytorch/pull/137544 tried it. Two reasons why that's not preferred today: 1. It is hard -- too big a blast. 2. There is no gain by doing lazy init in non-blocking mode, because the right next CPU call is a collective, and we will block there waiting for comm to be ready, so same effect as blocked init, no "opening" compared to eager mode. Pull Request resolved: https://github.com/pytorch/pytorch/pull/138527 Approved by: https://github.com/wconstab ghstack dependencies: #138860	2024-10-27 17:40:43 +00:00
PyTorch MergeBot	144d75d934	Revert "[PGNCCL] Use non-blocking mode by default in eager init (#138527 )" This reverts commit `07e30eae2a`. Reverted https://github.com/pytorch/pytorch/pull/138527 on behalf of https://github.com/huydhn due to Sorry for reverting your change, but it is failing on ROCm ([comment](https://github.com/pytorch/pytorch/pull/138527#issuecomment-2440070035))	2024-10-27 15:39:33 +00:00

1 2 3 4 5 ...

640 Commits