This adds the differentiable collective all_to_all_single_grad. This is an initial proof-of-concept PR; I will be adding the remaining collectives in follow-up PRs.
This adds a new function called `all_to_all_single_autograd`, which is the autograd variant of `all_to_all_single`. For backwards compatibility and initial testing, we wanted to keep the autograd variant separate to avoid regressions.
This uses `autograd::Function` to register an autograd op that calls the original `_c10d_functional::all_to_all_single` via the dispatcher. Unlike the previous Python implementation, which had issues, this works with compile and Inductor. Since it reuses the existing `_c10d_functional` ops, we don't need to register any meta functions or lowerings.
To avoid CUDA stream issues, the backward method explicitly calls `wait_tensor` to ensure it runs on the same stream as the async operation. This hurts performance, but the cost can potentially be alleviated by using `compile`.
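For illustration, here is a minimal Python sketch of the pattern described above. The actual registration lives in C++ via `autograd::Function`; the class name below and the use of the Python functional-collectives wrappers are illustrative assumptions, not the PR's code.
```python
import torch
import torch.distributed._functional_collectives as ft_c


class _AllToAllSingleSketch(torch.autograd.Function):
    """Illustrative sketch: forward and backward both issue the functional
    all_to_all_single, and the backward waits on its result explicitly."""

    @staticmethod
    def forward(ctx, x, output_split_sizes, input_split_sizes, group):
        ctx.output_split_sizes = output_split_sizes
        ctx.input_split_sizes = input_split_sizes
        ctx.group = group
        return ft_c.all_to_all_single(x, output_split_sizes, input_split_sizes, group)

    @staticmethod
    def backward(ctx, grad_out):
        # The split sizes are swapped for the backward all-to-all.
        grad_in = ft_c.all_to_all_single(
            grad_out.contiguous(),
            ctx.input_split_sizes,
            ctx.output_split_sizes,
            ctx.group,
        )
        # Wait explicitly so the collective completes on the expected stream.
        grad_in = ft_c.wait_tensor(grad_in)
        return grad_in, None, None, None
```
Usage would then look like `out = all_to_all_single_autograd(inp, out_splits, in_splits, group)` followed by a normal `backward()`, assuming the autograd variant mirrors the signature of `all_to_all_single`.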
Related work: https://github.com/pytorch/torchrec/blob/main/torchrec/distributed/comm_ops.py
Test plan:
```
pytest test/distributed/test_functional_api.py -k test_all_to_all_single_compile
pytest test/distributed/test_functional_api.py
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123599
Approved by: https://github.com/yifuwang
The test has been failing sporadically in CI recently, and the failures are not reproducible locally. This is likely due to a nasty race condition arising from a combination of MultiThreadedTestCase, the use of global state and finalizers, and the recently introduced test decorator for the native funcol migration.
Switching the test to use MultiProcessTestCase to provide better isolation.
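For reference, a minimal sketch of the MultiProcessTestCase pattern the test is moving to; the class and test names below are illustrative, not the actual test_functional_api.py code.
```python
import torch
import torch.distributed as dist
from torch.testing._internal.common_distributed import MultiProcessTestCase
from torch.testing._internal.common_utils import run_tests


class TestWithProcessIsolation(MultiProcessTestCase):
    @property
    def world_size(self) -> int:
        return 2

    def setUp(self) -> None:
        super().setUp()
        # Each test runs in freshly spawned subprocesses, so global state and
        # finalizers cannot leak between tests.
        self._spawn_processes()

    def _init_process_group(self) -> None:
        store = dist.FileStore(self.file_name, self.world_size)
        dist.init_process_group(
            "gloo", rank=self.rank, world_size=self.world_size, store=store
        )

    def test_example(self) -> None:
        self._init_process_group()
        t = torch.ones(4) * self.rank
        dist.all_reduce(t)
        dist.destroy_process_group()


if __name__ == "__main__":
    run_tests()
```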
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121046
Approved by: https://github.com/weifengpy
Additional changes: the tests in test_functional_api.py use a multi-threaded process group, which is implemented in Python. For the native ops to call into the Python process group implementation, glue code in PyProcessGroup is required for each collective. This PR also adds a few pieces of previously missing glue code, which are necessary for running test_functional_api.py with native funcol.
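To illustrate the code path this glue enables, here is a hedged, hypothetical test (not from the PR): a collective issued under the multi-threaded process group dispatches through the native `_c10d_functional` ops, which can only reach the Python process group implementation if PyProcessGroup forwards that collective.
```python
import torch
import torch.distributed as dist
import torch.distributed._functional_collectives as ft_c
from torch.testing._internal.common_distributed import spawn_threads_and_init_comms


@spawn_threads_and_init_comms(world_size=4)
def run_threaded_allreduce(self):
    # Under the multi-threaded pg, the native all_reduce op has to call back
    # into the Python pg implementation via the PyProcessGroup glue code.
    t = torch.ones(4) * (dist.get_rank() + 1)
    out = ft_c.all_reduce(t, "sum", [0, 1, 2, 3])
    assert torch.equal(out, torch.full((4,), 10.0))


if __name__ == "__main__":
    run_threaded_allreduce(None)
```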
Pull Request resolved: https://github.com/pytorch/pytorch/pull/119982
Approved by: https://github.com/wanchaol
This PR mimics what we have done to trace ProcessGroup. It allows us to use FakeProcessGroup with torch.compile. FakeProcessGroup lets us use world_size > 1 without creating multiple processes, which enables using PDB to debug the bucketing of DDP allreduce in Inductor. We could theoretically use GLOO with world_size == 1 to achieve the same goal; however, the `wait()` seems to be optimized away when the world_size is 1.
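A rough sketch of the debugging workflow this enables, assuming the `fake` backend and `FakeStore` from `torch.testing._internal.distributed.fake_pg`; the compiled function below is illustrative, not the PR's test code.
```python
import torch
import torch.distributed as dist
import torch.distributed._functional_collectives as ft_c
from torch.testing._internal.distributed.fake_pg import FakeStore

# Fake process group: world_size > 1 in a single process, collectives are no-ops.
dist.init_process_group(backend="fake", rank=0, world_size=8, store=FakeStore())


@torch.compile
def bucketed_allreduce(x):
    y = x * 2
    y = ft_c.all_reduce(y, "sum", list(range(8)))
    return y + 1


# Everything runs in one process, so a plain pdb/breakpoint() session can step
# through Dynamo tracing and Inductor's generated code.
out = bucketed_allreduce(torch.randn(16))
```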
Differential Revision: [D51136463](https://our.internmc.facebook.com/intern/diff/D51136463/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113314
Approved by: https://github.com/wanchaol
We cannot use the inner tensors for finalizers, as they are not collectable until they have been waited on.
This PR adds a bunch of tests for the observable behavior we want, including the necessary scaffolding for testing whether collectives get waited on.
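For context, a hedged sketch of the kind of observable behavior such tests check; the helper below is illustrative and uses the public functional-collectives API rather than the PR's actual scaffolding.
```python
import torch
import torch.distributed as dist
import torch.distributed._functional_collectives as ft_c
from torch.testing._internal.common_distributed import spawn_threads_and_init_comms


@spawn_threads_and_init_comms(world_size=2)
def check_wait_behavior(self):
    t = torch.ones(4)
    out = ft_c.all_reduce(t, "sum", [0, 1])
    # out is an AsyncCollectiveTensor: the collective has been issued but not
    # necessarily waited. Using the tensor (or calling wait_tensor) must
    # trigger the wait before the data is observed.
    assert isinstance(out, ft_c.AsyncCollectiveTensor)
    result = out * 1  # first use forces the wait
    assert torch.equal(result, torch.full((4,), 2.0))


if __name__ == "__main__":
    check_wait_behavior(None)
```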
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107250
Approved by: https://github.com/wconstab
Inductor codegen is suboptimal when calling all_reduce_coalesced with input args; we need to fix Inductor's calling convention for that, or find another approach.
This might not work if any of the outputs is unused.
Test code:
```python
import torch
import torch.distributed as dist
import torch.nn.functional as F
from functorch import make_fx
import os
import torch.distributed._functional_collectives as ft_c
import torch._dynamo
from torch.testing._internal.common_distributed import (
    spawn_threads_and_init_comms,
)
from torch._inductor.compile_fx import compile_fx_inner


def my_fun(a, b):
    c = a * 3
    # Coalesced all_reduce over three tensors; the group is given as the rank
    # list [0], i.e. the whole world when world_size == 1.
    tensors = ft_c.all_reduce_coalesced([a, c, b], "sum", [0])
    return ((tensors[1] + tensors[0] + tensors[2]).sum(),)


@spawn_threads_and_init_comms(world_size=1)
def inductor_main(self):
    x = torch.arange(4).cuda() * (dist.get_rank() + 1)
    y = torch.arange(4).cuda() * (dist.get_rank() + 1)
    x = x.to(torch.float)
    y = y.to(torch.float) * 0.5
    # Trace the function to an FX graph, then lower it with Inductor.
    res = make_fx(my_fun)(x, y)
    print(f"fx graph:\n{res.graph}")
    ind = compile_fx_inner(res, [x, y])
    print(f"inductor done:\n{ind}")


os.environ["PROXY_TENSOR_TRACING"] = "1"
os.environ["TORCH_COMPILE_DEBUG"] = "1"
torch._dynamo.config.output_code = True

if __name__ == "__main__":
    inductor_main(None)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97157
Approved by: https://github.com/fegin