Wanchao Liang
f026b32008
[device_mesh][BE] reduce_scatter fallback to funcol and remove from DM (#105642)
...
For reasons similar to https://github.com/pytorch/pytorch/pull/105605
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105642
Approved by: https://github.com/kumpera, https://github.com/wz337, https://github.com/fduwjj
2023-07-27 01:33:05 +00:00
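A minimal sketch of what the fallback means for callers (the (mesh, dim) group argument and exact funcol signatures are assumptions based on the functional collectives API of this era):

```python
import torch
import torch.distributed as dist
import torch.distributed._functional_collectives as funcol
from torch.distributed._tensor import DeviceMesh

# Assumes dist.init_process_group(...) has already run on every rank.
mesh = DeviceMesh("cuda", torch.arange(dist.get_world_size()))

x = torch.randn(dist.get_world_size() * 4, device="cuda")
# Instead of the removed DeviceMesh.reduce_scatter, call the functional
# collective directly; (mesh, 0) selects the group of mesh dimension 0.
out = funcol.reduce_scatter_tensor(x, "sum", scatter_dim=0, group=(mesh, 0))
# The result is async and waits automatically on first use.
```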
Wanchao Liang
2fa063e1e0
[device_mesh][BE] remove allgather from DM (#105614)
...
For reasons similar to https://github.com/pytorch/pytorch/pull/105605
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105614
Approved by: https://github.com/rohan-varma, https://github.com/wz337, https://github.com/fduwjj
2023-07-27 01:33:05 +00:00
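The all_gather removal follows the same pattern; a hedged sketch, with the group given as an explicit rank list as the test code later in this log does:

```python
import torch
import torch.distributed as dist
import torch.distributed._functional_collectives as funcol

# Assumes an initialized default process group.
ranks = list(range(dist.get_world_size()))
x = torch.randn(4, 8)
# Replacement for the removed DeviceMesh.all_gather: shards are concatenated
# along gather_dim, so the result here has shape (world_size * 4, 8).
gathered = funcol.all_gather_tensor(x, gather_dim=0, group=ranks)
```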
Wanchao Liang
8b94280008
[functional collective] parameterize allreduce tests (#105604)
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105604
Approved by: https://github.com/rohan-varma
2023-07-24 22:21:19 +00:00
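A rough sketch of the parametrization pattern PyTorch's test suite uses for this kind of change (the test body here is hypothetical):

```python
import torch
from torch.testing._internal.common_utils import (
    TestCase,
    instantiate_parametrized_tests,
    parametrize,
    run_tests,
)


class TestAllReduce(TestCase):
    # One test body covers several reduce ops instead of near-duplicate tests.
    @parametrize("reduce_op", ["sum", "max", "min", "avg"])
    def test_all_reduce(self, reduce_op):
        ...  # run the collective with reduce_op and check the result


instantiate_parametrized_tests(TestAllReduce)

if __name__ == "__main__":
    run_tests()
```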
Rodrigo Kumpera
17ab4f85e9
[c10d] Adopt allgather_into_tensor_coalesced for NCCL. (#103086)
...
This is done by adding a c10d::_allgather_into_tensor_coalesced wrapper.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103086
Approved by: https://github.com/rohan-varma
2023-07-06 15:05:55 +00:00
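A hedged sketch of the call this enables, assuming the funcol entry point all_gather_into_tensor_coalesced added by #98642 below:

```python
import torch
import torch.distributed as dist
import torch.distributed._functional_collectives as ft_c

# Assumes an initialized NCCL process group.
ranks = list(range(dist.get_world_size()))
xs = [torch.randn(4, device="cuda"), torch.randn(8, device="cuda")]
# One coalesced call instead of one all_gather per tensor; with this change
# the c10d::_allgather_into_tensor_coalesced wrapper can route to NCCL's
# coalesced implementation rather than a generic fallback.
outs = ft_c.all_gather_into_tensor_coalesced(xs, group=ranks)
```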
Rodrigo Kumpera
c17bdb3247
[C10D] Add functional collective reduce_scatter_into_tensor_coalesced. (#101023)
...
The implementation uses a fallback that does no coalescing.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101023
Approved by: https://github.com/wanchaol
2023-06-23 19:24:11 +00:00
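Conceptually, a fallback that does no coalescing is just a per-tensor loop; a hypothetical sketch (the helper name is invented for illustration):

```python
import torch
import torch.distributed._functional_collectives as ft_c


def reduce_scatter_coalesced_fallback(inputs, reduce_op, scatter_dim, group):
    # Issue one reduce_scatter per tensor; correct, but forfeits the
    # bandwidth benefit that real coalescing would provide.
    return [
        ft_c.reduce_scatter_tensor(t, reduce_op, scatter_dim, group)
        for t in inputs
    ]
```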
Rodrigo Kumpera
63fe26809d
Implement all_gather_into_tensor_coalesced. (#98642)
...
The implementation is suboptimal since it uses c10d's group coalescing, which
is known to be inefficient.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98642
Approved by: https://github.com/wanchaol
2023-06-13 15:06:52 +00:00
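A hedged sketch of the group-coalescing pattern the message refers to (the _coalescing_manager import path and semantics are assumptions):

```python
import torch
import torch.distributed as dist
from torch.distributed.distributed_c10d import _coalescing_manager


def all_gather_coalesced_via_group(outputs, inputs, group):
    # Batch several all_gathers inside a coalescing context so c10d can
    # issue them as one grouped call; this is the path the commit message
    # flags as inefficient compared to a native coalesced kernel.
    with _coalescing_manager(group=group):
        for out, inp in zip(outputs, inputs):
            dist.all_gather_into_tensor(out, inp, group=group)
```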
Rodrigo Kumpera
5b4a523583
Add all_reduce_coalesced to functional collectives (#98640)
...
This adds all_reduce_coalesced to MTPG to ease testing.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98640
Approved by: https://github.com/wanchaol
2023-04-26 17:05:54 +00:00
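A minimal sketch of the kind of MTPG-backed test this eases, modeled on the threaded helper used elsewhere in this log:

```python
import torch
import torch.distributed as dist
import torch.distributed._functional_collectives as ft_c
from torch.testing._internal.common_distributed import (
    spawn_threads_and_init_comms,
)


# MTPG runs each "rank" as a thread in one process, so collectives can be
# exercised without a multi-process launch.
@spawn_threads_and_init_comms(world_size=4)
def check_all_reduce_coalesced(self):
    xs = [torch.ones(2) * (dist.get_rank() + 1), torch.ones(3)]
    ys = ft_c.all_reduce_coalesced(xs, "sum", list(range(4)))
    assert ys[0].eq(10).all()  # 1 + 2 + 3 + 4
    assert ys[1].eq(4).all()   # ones summed across 4 ranks


if __name__ == "__main__":
    check_all_reduce_coalesced(None)
```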
PyTorch MergeBot
e778bcec05
Revert "fix allgather func collective to use maybe_wrap_tensor ( #98866 )"
...
This reverts commit ada7dfff71.
Reverted https://github.com/pytorch/pytorch/pull/98866 on behalf of https://github.com/izaitsevfb due to Conflicts with the co-dev diff D44921259, reverting to unblock the diff train
2023-04-14 00:30:16 +00:00
Wanchao Liang
ada7dfff71
fix allgather func collective to use maybe_wrap_tensor (#98866)
...
It looks like we forgot to switch allgather to use maybe_wrap_tensor;
this PR switches it over and adds a test to guard the tracing behavior
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98866
Approved by: https://github.com/mrshenli
2023-04-12 19:13:46 +00:00
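A conceptual sketch of the wrapping behavior being fixed here (the real helper is internal to functional collectives; this is not its actual code):

```python
import torch
from torch.distributed._functional_collectives import (
    AsyncCollectiveTensor,
    wait_tensor,
)


def maybe_wrap_tensor_sketch(tensor, is_tracing):
    # Under tracing, materialize the wait so it is recorded in the graph;
    # in eager mode, wrap the result so the wait is deferred to first use.
    if is_tracing:
        return wait_tensor(tensor)
    return AsyncCollectiveTensor(tensor)
```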
PyTorch MergeBot
fa08e546f3
Revert "Add all_reduce_coalesced functional collective ( #97157 )"
...
This reverts commit a3fc3531f5.
Reverted https://github.com/pytorch/pytorch/pull/97157 on behalf of https://github.com/huydhn due to Sorry for reverting your PR, but it seems to have a land race with https://github.com/pytorch/pytorch/pull/96226 and fails lint on trunk
2023-04-04 01:50:49 +00:00
Rodrigo Kumpera
a3fc3531f5
Add all_reduce_coalesced functional collective (#97157)
...
Inductor codegen is suboptimal when calling all_reduce_coalesced with input args; we need to fix Inductor's calling convention for that, or find another approach.
This might not work if any output is unused.
Test code:
```python
import torch
import torch.distributed as dist
import torch.nn.functional as F
from functorch import make_fx
import os
import torch.distributed._functional_collectives as ft_c
from torch.testing._internal.common_distributed import (
    spawn_threads_and_init_comms,
)
from torch._inductor.compile_fx import compile_fx_inner
import torch._dynamo  # needed for torch._dynamo.config below


def my_fun(a, b):
    c = a * 3
    tensors = ft_c.all_reduce_coalesced([a, c, b], "sum", [0])
    return ((tensors[1] + tensors[0] + tensors[2]).sum(),)


@spawn_threads_and_init_comms(world_size=1)
def inductor_main(self):
    x = torch.arange(4).cuda() * (dist.get_rank() + 1)
    y = torch.arange(4).cuda() * (dist.get_rank() + 1)
    x = x.to(torch.float)
    y = y.to(torch.float) * 0.5

    res = make_fx(my_fun)(x, y)
    print(f"fx graph:\n{res.graph}")

    ind = compile_fx_inner(res, [x, y])
    print(f"inductor done:\n{ind}")


os.environ["PROXY_TENSOR_TRACING"] = "1"
os.environ["TORCH_COMPILE_DEBUG"] = "1"
torch._dynamo.config.output_code = True

if __name__ == "__main__":
    inductor_main(None)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97157
Approved by: https://github.com/fegin
2023-04-04 01:13:18 +00:00
Wanchao Liang
848bf8103b
fix functional collective to not generate getattr node (#97924)
...
Use mesh.get_dim_groups directly instead of doing mesh tensor operations.
This helps us get rid of the getattr ops during tracing.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97924
Approved by: https://github.com/kumpera
2023-03-30 20:14:50 +00:00
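A hedged before/after sketch of the fix, assuming the DeviceMesh API of the time:

```python
import torch
from torch.distributed._tensor import DeviceMesh

# Assumes an initialized process group with 4 ranks.
mesh = DeviceMesh("cuda", torch.arange(4))

# Before: deriving ranks from the mesh tensor (e.g. via mesh.mesh slicing)
# leaves getattr/aten nodes in the traced FX graph.
# After: ask the mesh for its process group directly; nothing to trace.
dim0_group = mesh.get_dim_groups()[0]
```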
Rodrigo Kumpera
e22d791287
[PTD] Introduce tracing friendly collectives. (#93990)
...
This change adds torch.distributed.traceable_collectives.
This experimental API enables collectives to be fully traced by dynamo and FX.
See #93173 for the RFC.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/93990
Approved by: https://github.com/wconstab, https://github.com/wanchaol, https://github.com/H-Huang
2023-02-16 15:35:01 +00:00
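A small sketch of what "fully traced" means in practice, reusing the imports from the test code earlier in this log:

```python
import torch
import torch.distributed as dist
import torch.distributed._functional_collectives as ft_c
from functorch import make_fx
from torch.testing._internal.common_distributed import (
    spawn_threads_and_init_comms,
)


def fn(x):
    # Functional form: returns a new tensor instead of mutating in place,
    # which is what lets dynamo/FX trace through the collective.
    return ft_c.all_reduce(x, "sum", list(range(dist.get_world_size()))) + 1


@spawn_threads_and_init_comms(world_size=2)
def trace_it(self):
    gm = make_fx(fn)(torch.ones(4))
    print(gm.graph)  # contains all_reduce and wait_tensor graph nodes


if __name__ == "__main__":
    trace_it(None)
```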