pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 00:21:07 +01:00

Author	SHA1	Message	Date
fduwjj	67e68e0785	[c10d] Cleanup split_group logic using the newly built splitGroup (#158488 ) with https://github.com/pytorch/pytorch/pull/157716 merged we want to further clean up the code on the python side for `split_group` API. We do need to keep some old global book keeping for bc. The rest of logic is now all in cpp. Regarding the change brought in https://github.com/pytorch/pytorch/pull/152175, we did clean up in https://github.com/pytorch/pytorch/pull/158790 (including internal changes) so that we can safely remove it. Differential Revision: [D78777152](https://our.internmc.facebook.com/intern/diff/D78777152) Pull Request resolved: https://github.com/pytorch/pytorch/pull/158488 Approved by: https://github.com/d4l3k	2025-07-29 03:27:11 +00:00
fduwjj	f58a680d09	[c10d]Prototype of remote_group_merge (#158287 ) Tentative implementation of merge_remote_group per the proposal here: [docs.google.com/document/d/13R-1t_yESTvmAjcCN-wQjQQadIEu0JNIdS65uZawZzY/edit?tab=t.0#heading=h.3ctbqqopzc89](https://docs.google.com/document/d/13R-1t_yESTvmAjcCN-wQjQQadIEu0JNIdS65uZawZzY/edit?tab=t.0#heading=h.3ctbqqopzc89) Pull Request resolved: https://github.com/pytorch/pytorch/pull/158287 Approved by: https://github.com/d4l3k ghstack dependencies: #157716	2025-07-16 19:33:57 +00:00
fduwjj	6b2bef10af	[c10d] Prototype of `group_split` for dist2 work (#157716 ) This is to implement group_split as proposed in [docs.google.com/document/d/13R-1t_yESTvmAjcCN-wQjQQadIEu0JNIdS65uZawZzY/edit?tab=t.0#heading=h.3ctbqqopzc89](https://docs.google.com/document/d/13R-1t_yESTvmAjcCN-wQjQQadIEu0JNIdS65uZawZzY/edit?tab=t.0#heading=h.3ctbqqopzc89) Pull Request resolved: https://github.com/pytorch/pytorch/pull/157716 Approved by: https://github.com/d4l3k	2025-07-14 21:04:12 +00:00
Tristan Rice	7ffadff286	c10d/ProcessGroup: cleanup abort and shutdown (#148798 ) This adds `abort` and `shutdown` to `Backend` and `ProcessGroup` objects. This simplifies the logic in `distributed_c10d.py` by having a default noop implementation for all PGs. This will be useful for torchft and upcoming versions of NCCL which will handle abort correctly. Currently `torchft` would have to call internal methods `_abort` on the PGNCCL object directly but with this change we can now just call `.abort()` and have it work for any PG implementation. Test plan: ``` pytest distributed/test_backends.py distributed/test_c10d_common.py distributed/test_c10d_pypg.py ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/148798 Approved by: https://github.com/kwen2501	2025-03-08 18:33:18 +00:00
Tristan Rice	68631f6e87	PyWork: preserve Python reference counting when used in functional collectives (#146376 ) @fegin found an issue where torchft is not compatible with functional collectives. Found in https://github.com/pytorch/torchtitan/pull/806 The root cause is because PyProcessGroup/PyWork are not compatible with functional collectives due to a nasty ownership bug. PyWork relies on a pybind trampoline to propagate requests to Python unfortunately the way Pybind works is that the Python object owns the C++ object rather than some form of shared ownership. Thus what happens is that the PyWork Python object will collected when returned to C++ from the PyProcessGroup but the C++ PyWork object still exists. When the PyWork object is used, this causes a deadlock as the corresponding Python object no longer exists To solve this, we introduce a new `PyWorkHolder` class which holds a reference to the `py::object` as well as the trampoline class. This resolves any dependency issues since we can now hold ownership in C++ to both the Python and C++ objects. To make this cleaner we introduce a `WORK_OVERRIDE` macro which is a patched version of `PYBIND11_OVERRIDE` that returns a `PyWorkHolder` rather than just `PyWork` and use for all collectives in PyProcessGroup. Test plan: ``` cd pytorch pytest test/distributed/test_c10d_functional_native.py ``` ``` cd torchft pytest torchft/process_group_test.py -k functional -v -x -s ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/146376 Approved by: https://github.com/yifuwang	2025-02-07 18:07:53 +00:00
Tristan Rice	9f4f061f89	PyProcessGroup: support rank, world size, group name/desc overrides (#141529 ) This improves `PyProcessGroup` so you can override rank, world size and group name/desc methods from Python. These will be needed to support resizable process groups in torchft. This also has some small fixes in test_c10d_pypg.py to use threads instead of processes which speeds up the test execution by ~10x. Test plan: ``` pytest test/distributed/test_c10d_pypg.py ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/141529 Approved by: https://github.com/fegin	2024-11-26 20:56:57 +00:00
cyy	f9ae3fac8c	[Distributed] [19/N] Fix clang-tidy warnings in torch/csrc/distributed/ (#138903 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/138903 Approved by: https://github.com/ezyang	2024-10-28 05:29:25 +00:00
cyy	f5f1d0a753	Fix build warnings for torch_python (#134981 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/134981 Approved by: https://github.com/ezyang	2024-09-12 03:59:34 +00:00
Yifu Wang	a136a7d623	[Functional Collective] enable custom work registration from python (#130354 ) This PR does two things: - Allow tensor -> work registration in Python via `torch._C._distributed_c10d.register_work`. Calling `torch.ops._c10d_functional.wait_tensor` on a tensor would trigger `.wait()` on the registered work object. - Allow user-defined work object in Python to work with functional collectives. Pull Request resolved: https://github.com/pytorch/pytorch/pull/130354 Approved by: https://github.com/wanchaol, https://github.com/fegin, https://github.com/wconstab	2024-07-22 21:45:19 +00:00
cyy	4457cd9a30	[Distributed] [7/N] Fix clang-tidy warnings in torch/csrc/distributed/c10d (#124987 ) This PR continues to clean clang-tidy warnings in torch/csrc/distributed/c10d, following #124701. Pull Request resolved: https://github.com/pytorch/pytorch/pull/124987 Approved by: https://github.com/malfet	2024-05-11 00:03:52 +00:00
PyTorch MergeBot	724c7491d0	Revert " [Distributed] [7/N] Fix clang-tidy warnings in torch/csrc/distributed/c10d (#124987 )" This reverts commit `b3fd94d15e`. Reverted https://github.com/pytorch/pytorch/pull/124987 on behalf of https://github.com/ezyang due to broke downstream extensions ([comment](https://github.com/pytorch/pytorch/pull/124987#issuecomment-2083956511))	2024-04-30 00:37:53 +00:00
cyy	b3fd94d15e	[Distributed] [7/N] Fix clang-tidy warnings in torch/csrc/distributed/c10d (#124987 ) This PR continues to clean clang-tidy warnings in torch/csrc/distributed/c10d, following #124701. In addition, libfmt dependency is added in CMake code to enable using it in the headers. The libfmt has to be added as private dependency to torch_cuda and torch_hip because they include torch/csrc/distributed/c10d/Utils.hpp which uses libfmt. Pull Request resolved: https://github.com/pytorch/pytorch/pull/124987 Approved by: https://github.com/malfet	2024-04-27 07:22:27 +00:00
Tristan Rice	358ace1a1b	functional_collectives: add first differentiable collective -- all_to_all_single_grad (#123599 ) This adds the differentiable collective -- all_to_all_single_grad. This is the initial proof of concept PR and I will be adding the remaining collectives in follow up PRs. This adds a new function called `all_to_all_single_autograd` which is the autograd variant of `all_to_all_single`. For backwards compatibility + initial testing we wanted to make the autograd variant separate to avoid regressions. This uses `autograd::Function` to register an Autograd op that calls the original `_c10d_functional::all_to_all_single` via the dispatcher. This works with compile and inductor as opposed to the previous Python implementation that had issues. As this uses the existing `_c10d_functional` ops we don't need to register any meta functions or lowering. To avoid cudaStream issues this explicitly calls `wait_tensor` in the backward method to ensure it runs under the same stream as the async operation. This hurts performance but can be alleviated potentially using `compile`. Related work: https://github.com/pytorch/torchrec/blob/main/torchrec/distributed/comm_ops.py Test plan: ``` pytest test/distributed/test_functional_api.py -k test_all_to_all_single_compile pytest test/distributed/test_functional_api.py ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/123599 Approved by: https://github.com/yifuwang	2024-04-12 01:48:49 +00:00
Yifu Wang	2d6c0cc81b	Run test_functional_api.py with both legacy and native funcol impls (#119982 ) Additional changes: tests in test_functional_api.py uses multi-threaded pg which is implemented in Python. For the native ops to call into the Python pg implementation, glue code in PyProcessGroup is required for each collective. This PR also adds a few pieces of previously missing glue code, which are necessary for running test_functional_api.py with native funcol. Pull Request resolved: https://github.com/pytorch/pytorch/pull/119982 Approved by: https://github.com/wanchaol	2024-02-20 21:15:37 +00:00
Shen Li	45128ab67c	[Reland] Add OnCompletion Hook to ProcessGroup (#106988 ) (#107233 ) This allows infra/trainers to get detailed stats about communication efficiencies without know anything about what model or distributed training paradigms have been used. This is helpful as infra/trainer package usually prefers to be as model/algorithm agnostic as possible. Therefore, we cannot assume that infra/trainer can have access to all collectives used by the model authors. This commit adds an `OnCompletion` hook to `ProcessGroupNCCL` which will be fired on every work completion event. Pull Request resolved: https://github.com/pytorch/pytorch/pull/107233 Approved by: https://github.com/kumpera	2023-08-15 17:35:14 +00:00
PyTorch MergeBot	fd214aa8be	Revert "Add OnCompletion Hook to ProcessGroup (#106988 )" This reverts commit `ba1da47e8f`. Reverted https://github.com/pytorch/pytorch/pull/106988 on behalf of https://github.com/huydhn due to Sorry for reverting you change, but it is failing Windows build with some linker error. The Windows failures on PR looks legit ([comment](https://github.com/pytorch/pytorch/pull/106988#issuecomment-1678580899))	2023-08-15 08:24:33 +00:00
Shen Li	ba1da47e8f	Add OnCompletion Hook to ProcessGroup (#106988 ) This allows infra/trainers to get detailed stats about communication efficiencies without know anything about what model or distributed training paradigms have been used. This is helpful as infra/trainer package usually prefers to be as model/algorithm agnostic as possible. Therefore, we cannot assume that infra/trainer can have access to all collectives used by the model authors. This commit adds an `OnCompletion` hook to `ProcessGroupNCCL` which will be fired on every work completion event. Pull Request resolved: https://github.com/pytorch/pytorch/pull/106988 Approved by: https://github.com/kumpera, https://github.com/H-Huang ghstack dependencies: #107140, #107141, #107160	2023-08-15 04:32:23 +00:00
Shen Li	dd6319198d	Apply clang-format to distributed/c10d folder (#107140 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/107140 Approved by: https://github.com/H-Huang	2023-08-14 23:16:38 +00:00
mikey dagitses	322e4b4c8a	set -Wsuggest-override for builds (#89852 ) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/pytorch/pytorch/pull/89852). * __->__ #89852 * #89851 set -Wsuggest-override for builds Summary: This was flagged by a Meta internal build. Test Plan: Rely on CI. Pull Request resolved: https://github.com/pytorch/pytorch/pull/89852 Approved by: https://github.com/malfet	2022-12-19 22:08:47 +00:00
Min Si	1ad0048b64	Refactor distribuetd to use absolute header path (#85780 ) Headers under torch/csrc/distributed may be referened with relative path, e.g., "<c10d/...>". However, relative path cannot be gracefully handled by Meta internal build when the NCCL PG is hipified to support AMD/RCCL because the "hipified" header files are generated in other directories. Moreover, using absolute path for header inclusion is the state-of-the-art in most components in Pytorch. Thus, this patch refactors all header paths in torch/csrc/distributed to be absolute. See D39835774 for more details about Meta internal complication. How to test: commit 9e5d199 removes -I./torch/csrc/distributed in compile options. Thus use it to verify we don't miss any relative path use of torch/csrc/distributed headers. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85780 Approved by: https://github.com/kumpera, https://github.com/huydhn	2022-09-30 05:13:50 +00:00
PyTorch MergeBot	a50d8864fc	Revert "Refactor distribuetd to use absolute header path (#85780 )" This reverts commit `668082718a`. Reverted https://github.com/pytorch/pytorch/pull/85780 on behalf of https://github.com/huydhn due to Sorry for reverting your PR but it breaks build due to a missing file <c10d/Store.hpp>	2022-09-30 02:04:29 +00:00
Min Si	668082718a	Refactor distribuetd to use absolute header path (#85780 ) Headers under torch/csrc/distributed may be referened with relative path, e.g., "<c10d/...>". However, relative path cannot be gracefully handled by Meta internal build when the NCCL PG is hipified to support AMD/RCCL because the "hipified" header files are generated in other directories. Moreover, using absolute path for header inclusion is the state-of-the-art in most components in Pytorch. Thus, this patch refactors all header paths in torch/csrc/distributed to be absolute. See D39835774 for more details about Meta internal complication. How to test: commit 9e5d199 removes -I./torch/csrc/distributed in compile options. Thus use it to verify we don't miss any relative path use of torch/csrc/distributed headers. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85780 Approved by: https://github.com/kumpera	2022-09-30 00:27:24 +00:00
Howard Huang	74ead61944	[2/N] [Dispatchable Collectives] Extract ProcessGroup::Work into a separate class and update references (#83680 ) ### Changes - Move ProcessGroup::Work into its own class and update all the references to it / header includes. #### Motivation In the future PRs we will repurpose ProcessGroup to instead contain a list of Backends (ProcessGroupNCCL/Gloo/UCC) and perform dispatching to them based on tensor type. This change is prevent a circular dependency with ProcessGroup depending on Backend and Backend depending on ProcessGroup::Work. Differential Revision: [D38839212](https://our.internmc.facebook.com/intern/diff/D38839212) Pull Request resolved: https://github.com/pytorch/pytorch/pull/83680 Approved by: https://github.com/kwen2501	2022-09-14 13:05:58 +00:00
Rodrigo Kumpera	f6a45f7984	[distributed] Make DDP work with python process group (#79176 ) This PR enables python process group usage with DDP by doing the following: - Surface PG::Work::getFuture() as overridable() - Use Work::getFuture() to retrieve values from a PG. - Add _create_work_from_future python method that creates a Work object that wraps a Future. To test this changes we use both strategies to run DDP with a python based PG. The reason for offering two methods is that both have short-comings. The wrapper method is harder to troubleshoot as there's no visibility of how the future is used. The subclass method has memory management issues as can be noticed in the test suite by having to keep Work instances alive by storing them in PG fields. Pull Request resolved: https://github.com/pytorch/pytorch/pull/79176 Approved by: https://github.com/rohan-varma	2022-06-28 17:14:21 +00:00
Shen Li	a3aa9df59f	Add barrier to ProcessGroup trampoline (#67236 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67236 cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang Test Plan: Imported from OSS Reviewed By: rohan-varma Differential Revision: D31916706 Pulled By: mrshenli fbshipit-source-id: f3d2bcd938a384ec297f4094831c69d4059316bb	2021-10-27 08:18:07 -07:00
Shen Li	58fefa6516	Add pybind trampoline for ProcessGroup and Work (#66338 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66338 This commit exposes c10d extension API to Python land. Users can now override c10d communication behaviors in pure Python, and no longer needs to go through the cpp extension steps. Test Plan: Imported from OSS Reviewed By: rohan-varma Differential Revision: D31514351 Pulled By: mrshenli fbshipit-source-id: a8b94af0af7960c078e1006c29b25f7f3bd86c81	2021-10-11 06:41:06 -07:00

26 Commits