Commit Graph

45 Commits

Author SHA1 Message Date
Will Constable
d64bada876 Refactor funcol for readability and dynamo tracing (#104387)
Move the eager kernel impls to a separate file, which is easier to read
(users may be confused by two versions of each kernel in the same file)
and makes it easier to set a dynamo policy that currently traces only the first file.
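A hedged sketch of the resulting layering: a user-facing wrapper module backed by a separate eager-impl module. The call below assumes the wrapper keeps the rank-list group convention used elsewhere in this log.

```python
import torch
import torch.distributed as dist
import torch.distributed._functional_collectives as ft_c

def allreduce_sum(t: torch.Tensor) -> torch.Tensor:
    # In eager mode this dispatches through to the eager kernels in the
    # separate impl file; under dynamo, only this wrapper layer is traced.
    return ft_c.all_reduce(t, "sum", list(range(dist.get_world_size())))
```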

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104387
Approved by: https://github.com/wanchaol, https://github.com/fduwjj, https://github.com/kumpera
2023-07-06 23:29:49 +00:00
Rodrigo Kumpera
17ab4f85e9 [c10d] Adopt allgather_into_tensor_coalesced for NCCL. (#103086)
This is done by adding a c10d::_allgather_into_tensor_coalesced wrapper.
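A hedged usage sketch of the fused path; the funcol-level function name matches the commits below, but the argument order is an assumption.

```python
import torch
import torch.distributed as dist
import torch.distributed._functional_collectives as ft_c

def gather_all(tensors):
    # One fused NCCL all-gather for the whole list instead of one
    # collective per tensor.
    ranks = list(range(dist.get_world_size()))
    outs = ft_c.all_gather_into_tensor_coalesced(tensors, ranks)
    return [ft_c.wait_tensor(o) for o in outs]
```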
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103086
Approved by: https://github.com/rohan-varma
2023-07-06 15:05:55 +00:00
Wanchao Liang
db1ac4e29b fix functional collective's allgather for gloo (#104681)
Summary: We should explicitly check for the gloo backend instead of relying on the shard's device, because a user might pass a GPU tensor as input with a gloo process group as the pg and expect that to work.
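A minimal sketch of the kind of check described, using a hypothetical helper; the real fix lives inside the allgather implementation.

```python
import torch
import torch.distributed as dist

def _maybe_move_for_gloo(shard: torch.Tensor, pg: dist.ProcessGroup) -> torch.Tensor:
    # Decide based on the process group's backend, not the shard's device:
    # a CUDA tensor used with a gloo pg must round-trip through the CPU.
    if dist.get_backend(pg) == "gloo" and shard.is_cuda:
        return shard.cpu()
    return shard
```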

Differential Revision: D47249172

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104681
Approved by: https://github.com/rohan-varma, https://github.com/fduwjj
2023-07-06 09:52:48 +00:00
Will Constable
d0509fe32d Document how functional collectives work under eager/dynamo (#104386)
Moves the user-facing APIs to the top of the file for visibility
(strictly code motion in this PR, aside from the added comments).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104386
Approved by: https://github.com/voznesenskym, https://github.com/wanchaol
2023-06-30 01:12:55 +00:00
Rodrigo Kumpera
c17bdb3247 [C10D] Add functional collective reduce_scatter_into_tensor_coalesced. (#101023)
Implementation uses a fallback that does no coalescing.
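A minimal sketch of a no-coalescing fallback consistent with that description; the function name and async pattern are assumptions.

```python
import torch
import torch.distributed as dist

def reduce_scatter_coalesced_fallback(outputs, inputs, group=None):
    # Fallback: one reduce_scatter_tensor per output/input pair instead of
    # a single fused call; correct, but with no coalescing benefit.
    works = [
        dist.reduce_scatter_tensor(out, inp, group=group, async_op=True)
        for out, inp in zip(outputs, inputs)
    ]
    for w in works:
        w.wait()
```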

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101023
Approved by: https://github.com/wanchaol
2023-06-23 19:24:11 +00:00
Rodrigo Kumpera
0beec88c93 Inductor support for all_gather_into_tensor_coalesced. (#98643)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98643
Approved by: https://github.com/wanchaol
2023-06-21 19:25:03 +00:00
Rodrigo Kumpera
63fe26809d Implement all_gather_into_tensor_coalesced. (#98642)
The implementation is suboptimal since it uses c10d's group coalescing, which
is known to be inefficient.
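A hedged sketch of the group-coalescing approach referred to here; `_coalescing_manager` is a private c10d API and its signature has changed across releases.

```python
import torch
import torch.distributed as dist

def all_gather_coalesced(outputs, inputs, group=None):
    # Batch the individual all-gathers inside a coalescing context so the
    # backend can fuse them; known to be slower than a dedicated fused kernel.
    with dist._coalescing_manager(group=group):
        for out, inp in zip(outputs, inputs):
            dist.all_gather_into_tensor(out, inp, group=group)
```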
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98642
Approved by: https://github.com/wanchaol
2023-06-13 15:06:52 +00:00
PyTorch MergeBot
caecb55223 Revert "Log functional_collectives apis to distributed logger (#103288)"
This reverts commit 37359c36fd.

Reverted https://github.com/pytorch/pytorch/pull/103288 on behalf of https://github.com/malfet due to Broke test_inductor_collectives, see 37359c36fd ([comment](https://github.com/pytorch/pytorch/pull/103288#issuecomment-1587677705))
2023-06-12 16:37:57 +00:00
Will Constable
37359c36fd Log functional_collectives apis to distributed logger (#103288)
This logs functional collectives API calls at debug log level only.

(The `+` prefix in the TORCH_LOGS command line enables debug level; without it, only info level is shown.)
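A sketch of enabling that debug output; the TORCH_LOGS alias for the distributed logger is an assumption, and the logger name below follows the module path.

```python
# Shell equivalent (alias assumed): TORCH_LOGS="+distributed" python train.py
# The "+" prefix lowers the threshold to DEBUG; without it, only INFO shows.
import logging

logging.basicConfig()
logging.getLogger("torch.distributed").setLevel(logging.DEBUG)
```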

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103288
Approved by: https://github.com/wanchaol, https://github.com/fduwjj
2023-06-12 06:33:26 +00:00
Wanchao Liang
d31707a257 Get rid of dim_groups attribute from DeviceMesh (#103105)
This PR gets rid of the dim_groups attribute from DeviceMesh. The main
motivation is that c10d should store the process groups created during
initialization, not DeviceMesh; DeviceMesh should just handle ranks correctly.

This could make DTensor picklable (torch.save/load could become
possible), which I will try in the next PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103105
Approved by: https://github.com/XilunWu, https://github.com/fduwjj
2023-06-09 04:11:15 +00:00
Will Constable
77f97019b7 Dynamo remaps legacy allgather to traceable one (#102232)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102232
Approved by: https://github.com/voznesenskym
2023-05-30 16:45:25 +00:00
albanD
59dff01319 Add top level function to check if running with deploy (#101420)
Also, I'm not sure whether this should be a public function. Leaving it private for now, but let me know if you would prefer it to be public.

FYI @nikitaved this will logically conflict with your triton kernel PR.
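A hedged usage sketch of the new check (the helper is private, per the note above):

```python
import torch

# Guard code paths that misbehave inside torch::deploy / MultiPy.
if not torch._running_with_deploy():
    import torch._dynamo  # skipped when running under deploy
```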
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101420
Approved by: https://github.com/malfet
2023-05-16 16:05:49 +00:00
Will Constable
793bd6993a Work around torchdynamo import error with functional collectives (#100901)
Summary:
Currently there are build configs where the torchdynamo import trips over a
strange SystemError related to some module's __dict__.items() returning NULL
while torchdynamo iterates all torch modules to build its allowed-functions
list.

While this is hard to repro, we should be able to work around it and then fix
it properly.
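A sketch of the defensive pattern described, not the literal patch:

```python
from types import ModuleType

def iter_module_members(mod: ModuleType):
    try:
        return list(mod.__dict__.items())
    except SystemError:
        # Some extension modules return NULL from __dict__.items() in
        # certain builds; skip them rather than failing the whole import.
        return []
```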

Test Plan: Rely on others to test this, assuming CI passes.

Reviewed By: anijain2305

Differential Revision: D45663313

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100901
Approved by: https://github.com/yanboliang, https://github.com/malfet
2023-05-09 16:09:42 +00:00
Rodrigo Kumpera
7a15e82388 Fix tensor registration to work with coalescing collectives. (#99763)
We do it by making it possible to register multiple tensors for the same
work object and coordinating waiting/cleanup among them.

This ensures that waiting on any number of the output tensors results in a
single stream sync, which simplifies codegen by Inductor.
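A minimal sketch of a many-tensors-per-work registry like the one described; all names here are assumptions.

```python
# Map each registered output tensor to the shared work object.
_work_by_tensor: dict = {}

def register_tensors(work, tensors):
    # All outputs of one coalesced collective share a single work object.
    for t in tensors:
        _work_by_tensor[id(t)] = work

def wait_on(tensor):
    # Waiting on any one output syncs the stream once, then clears the
    # registration for every sibling tensor of the same work object.
    work = _work_by_tensor.pop(id(tensor), None)
    if work is not None:
        work.wait()
        for key, w in list(_work_by_tensor.items()):
            if w is work:
                del _work_by_tensor[key]
```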

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99763
Approved by: https://github.com/wanchaol
2023-05-05 14:25:35 +00:00
Will Constable
2dca418112 Reland basic dynamo support for traceable collectives (#100476)
Relative to the original land, this also contains:
- A fix for the torchdeploy import of functional collectives
- A workaround for torchdynamo utils failing to import due to torch._refs being missing

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100476
Approved by: https://github.com/kumpera
2023-05-04 04:25:35 +00:00
Shabab Ayub
287f74c4fc Revert D45387167: Multisect successfully blamed D45387167 for test or build failures (#100424)
Summary:
This diff is reverting D45387167
D45387167: Basic dynamo support for traceable collectives (#94440) by wconstab has been identified as causing the following test or build failures (internal)

If you believe this diff has been generated in error you may Commandeer and Abandon it.

Test Plan: NA

Reviewed By: s4ayub

Differential Revision: D45448312

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100424
Approved by: https://github.com/rohan-varma, https://github.com/kumpera
2023-05-03 16:10:54 +00:00
Will Constable
100a25d021 Basic dynamo support for traceable collectives (#94440)
Make traceable collectives work with torchdynamo,
bypassing problems with tracing the AsyncTensor subclass.

Accept a suboptimal solution for now, and optimize it later.
For now, wait happens immediately, which generally forces an early sync.

Later, find a way in either dynamo or the AOT stack to handle
AsyncCollectiveTensor so the wait lands in the optimal place.

Note on implementation (see the sketch after this list):
- Dynamo traces 'user-level' fc apis that are designed to behave differently
  in eager vs compiled.  In eager, there will be work-obj registration and
  a wrapper subclass will insert a 'wait' call at the appropriate time.
  In compile/trace mode, wait will be immediately called, and work-obj
  registration is required to be handled by the compile backend at runtime.
- Dynamo needs to trace into some of the helper functions in the 'user-level'
  api, such as '_expand_group', which is essentially a constant transformation.
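A hedged sketch of a constant-only group expansion like the `_expand_group` mentioned above; the real signature and return shape may differ.

```python
from typing import List, Tuple, Union

import torch.distributed as dist

def expand_group(
    group: Union[List[int], dist.ProcessGroup], tag: str = ""
) -> Tuple[str, List[int], int]:
    # Pure data manipulation on constants, so dynamo can trace through it
    # without graph breaks.
    if isinstance(group, list):
        ranks = group
    else:
        ranks = dist.get_process_group_ranks(group)
    return tag, ranks, len(ranks)
```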

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94440
Approved by: https://github.com/kumpera
2023-04-27 05:38:36 +00:00
Rodrigo Kumpera
5b4a523583 Add all_reduce_coalesced to functional collectives (#98640)
This also adds all_reduce_coalesced to MTPG (the multithreaded test ProcessGroup) to ease testing.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98640
Approved by: https://github.com/wanchaol
2023-04-26 17:05:54 +00:00
Rodrigo Kumpera
38e964056b Reland python ops (#99170)
Waiting for the revert to land.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99170
Approved by: https://github.com/albanD
2023-04-18 15:15:46 +00:00
PyTorch MergeBot
1c042a2137 Revert "Reland python ops (#99170)"
This reverts commit d4de64ae8d.

Reverted https://github.com/pytorch/pytorch/pull/99170 on behalf of https://github.com/DanilBaibak due to Break internal build
2023-04-18 11:37:43 +00:00
Rodrigo Kumpera
d4de64ae8d Reland python ops (#99170)
Waiting for the revert to land.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99170
Approved by: https://github.com/albanD
2023-04-17 21:53:41 +00:00
Rodrigo Kumpera
a910045add [PATCH] Back out "Move functional collectives implementation to python. (#98595)" (#99168)
Summary:
Original commit changeset: ba36f8751adc

Original Phabricator Diff: D44788697

Test Plan: model loading is fine after reverting the diff

Reviewed By: zyan0, sayitmemory

Differential Revision: D44921259

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99168
Approved by: https://github.com/izaitsevfb
2023-04-14 23:48:19 +00:00
PyTorch MergeBot
e778bcec05 Revert "fix allgather func collective to use maybe_wrap_tensor (#98866)"
This reverts commit ada7dfff71.

Reverted https://github.com/pytorch/pytorch/pull/98866 on behalf of https://github.com/izaitsevfb due to Conflicts with the co-dev diff D44921259, reverting to unblock the diff train
2023-04-14 00:30:16 +00:00
Wanchao Liang
ada7dfff71 fix allgather func collective to use maybe_wrap_tensor (#98866)
It looks like we forgot to switch allgather to use maybe_wrap_tensor;
this PR switches it over and adds a test to guard the tracing behavior.
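A minimal sketch of the helper's contract; the real maybe_wrap_tensor derives the tracing decision itself rather than taking a flag.

```python
import torch
import torch.distributed._functional_collectives as ft_c

def maybe_wrap_tensor(t: torch.Tensor, tracing: bool) -> torch.Tensor:
    # Tracing: wait immediately and return a plain tensor so no subclass
    # appears in the graph. Eager: defer the wait via the wrapper subclass.
    if tracing:
        return ft_c.wait_tensor(t)
    return ft_c.AsyncCollectiveTensor(t)
```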
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98866
Approved by: https://github.com/mrshenli
2023-04-12 19:13:46 +00:00
Rodrigo Kumpera
24d9001527 Move functional collectives implementation to python. (#98595)
This greatly simplifies the work needed to add new ops.

This relands the previous PR; not sure why it was reverted.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98595
Approved by: https://github.com/wconstab
2023-04-07 21:48:05 +00:00
PyTorch MergeBot
67d1a77086 Revert "Move functional collectives implementation to python. (#98315)"
This reverts commit 8b0374f83c.

Reverted https://github.com/pytorch/pytorch/pull/98315 on behalf of https://github.com/huydhn due to Sorry for reverting for PR. This is failing in trunk probably due to a landrace
2023-04-06 16:49:40 +00:00
Rodrigo Kumpera
8b0374f83c Move functional collectives implementation to python. (#98315)
This greatly simplifies the work needed to add new ops.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98315
Approved by: https://github.com/albanD, https://github.com/wconstab, https://github.com/Neilblaze
2023-04-06 14:06:16 +00:00
Kazuaki Ishizaki
6514d71add Fix typos under torch/distributed directory (#98225)
This PR fixes typos in comments and messages of `.py` files under `torch/distributed` directory

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98225
Approved by: https://github.com/soulitzer, https://github.com/kit1980
2023-04-05 00:21:33 +00:00
PyTorch MergeBot
fa08e546f3 Revert "Add all_reduce_coalesced functional collective (#97157)"
This reverts commit a3fc3531f5.

Reverted https://github.com/pytorch/pytorch/pull/97157 on behalf of https://github.com/huydhn due to Sorry for reverting your PR, but it seems to have a land race with https://github.com/pytorch/pytorch/pull/96226 and fails lint on trunk
2023-04-04 01:50:49 +00:00
Rodrigo Kumpera
a3fc3531f5 Add all_reduce_coalesced functional collective (#97157)
Inductor codegen is suboptimal when calling all_reduce_coalesced with input
args; we need to fix Inductor's calling convention for that, or find another
approach.

This might not work if any output is unused.

Test code:

```python
import os

import torch
import torch._dynamo  # needed below for torch._dynamo.config
import torch.distributed as dist
from functorch import make_fx

import torch.distributed._functional_collectives as ft_c
from torch.testing._internal.common_distributed import (
    spawn_threads_and_init_comms,
)
from torch._inductor.compile_fx import compile_fx_inner

def my_fun(a, b):
    c = a * 3
    # Coalesced all-reduce over three tensors on the rank group [0].
    tensors = ft_c.all_reduce_coalesced([a, c, b], "sum", [0])
    return ((tensors[1] + tensors[0] + tensors[2]).sum(),)

@spawn_threads_and_init_comms(world_size=1)
def inductor_main(self):
    x = torch.arange(4).cuda() * (dist.get_rank() + 1)
    y = torch.arange(4).cuda() * (dist.get_rank() + 1)
    x = x.to(torch.float)
    y = y.to(torch.float) * 0.5
    res = make_fx(my_fun)(x, y)
    print(f"fx graph:\n{res.graph}")
    ind = compile_fx_inner(res, [x, y])
    print(f"inductor done:\n{ind}")

os.environ["PROXY_TENSOR_TRACING"] = "1"
os.environ["TORCH_COMPILE_DEBUG"] = "1"
torch._dynamo.config.output_code = True

if __name__ == "__main__":
    inductor_main(None)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97157
Approved by: https://github.com/fegin
2023-04-04 01:13:18 +00:00
Rodrigo Kumpera
9ad66dd588 Switch reduce_scatter and all_gather in DeviceMesh to use functional collectives (#96226)
Among the changes is the introduction of gather_dim and scatter_dim in DeviceMesh collectives to simplify user code.

The current plan is to keep padding and gather/scatter dim support in DeviceMesh while we explore optimization opportunities in Inductor.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96226
Approved by: https://github.com/wanchaol
2023-04-04 00:58:33 +00:00
Rodrigo Kumpera
3b188c5883 Don't use subclass when tracing and call wait_tensor immediately. (#98001)
This change expects proper scheduling of the wait_tensor call to happen over the traced graph.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98001
Approved by: https://github.com/wconstab, https://github.com/wanchaol
2023-03-31 18:33:20 +00:00
PyTorch MergeBot
f4f1a5b5b3 Revert "Move functional collectives to the right namespace (#97793)"
This reverts commit 184bfbc3d7.

Reverted https://github.com/pytorch/pytorch/pull/97793 on behalf of https://github.com/atalman due to breaks internal builds
2023-03-31 16:02:07 +00:00
Rodrigo Kumpera
184bfbc3d7 Move functional collectives to the right namespace (#97793)
This moves them from `torch._C._nn` to `torch._C._dist`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97793
Approved by: https://github.com/albanD
2023-03-30 22:18:13 +00:00
Wanchao Liang
848bf8103b fix functional collective to not generate getattr node (#97924)
Use mesh.get_dim_groups directly instead of doing mesh tensor operations.

This helps us get rid of the getattr ops during tracing.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97924
Approved by: https://github.com/kumpera
2023-03-30 20:14:50 +00:00
Kazuaki Ishizaki
35fd5c548e Fix typos under torch/distributed directory (#95638)
This PR fixes typos in comments and messages of `.py` files under torch/distributed directory

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95638
Approved by: https://github.com/usamah1, https://github.com/H-Huang, https://github.com/kit1980
2023-03-27 21:13:44 +00:00
Rodrigo Kumpera
c7bd9b9490 Switch AsyncCollectiveTensor to be a wrapper subclass. (#96105)
We use it as a wrapper, so it makes sense to implement it as one.

This makes it possible for FakeTensorMode to work.
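A heavily simplified skeleton of a wrapper subclass in this style; the real AsyncCollectiveTensor implements much more (e.g. the deferred wait).

```python
import torch
import torch.utils._pytree as pytree

class AsyncTensorSketch(torch.Tensor):
    @staticmethod
    def __new__(cls, elem: torch.Tensor):
        # Wrapper subclass: holds the inner tensor rather than aliasing its
        # storage, which is what lets FakeTensorMode see through it.
        r = torch.Tensor._make_wrapper_subclass(
            cls, elem.shape, dtype=elem.dtype, device=elem.device
        )
        r.elem = elem
        return r

    @classmethod
    def __torch_dispatch__(cls, func, types, args=(), kwargs=None):
        kwargs = kwargs or {}
        unwrap = lambda t: t.elem if isinstance(t, AsyncTensorSketch) else t
        return func(*pytree.tree_map(unwrap, args),
                    **pytree.tree_map(unwrap, kwargs))
```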
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96105
Approved by: https://github.com/wanchaol, https://github.com/wconstab
2023-03-10 15:13:32 +00:00
Rodrigo Kumpera
5b2ab0dd4f Multiple fixes for functional collectives. (#95897)
- _functional_collectives.py: ensure we always wait on all collectives.
- derivatives.yaml: mark all_reduce as non-differentiable.
- gen_variable_type.py: add all_reduce to DONT_ENFORCE_TENSOR_IMPL_USE_COUNT.
- common_dtensor.py: replace dist.barrier with all_reduce.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95897
Approved by: https://github.com/wconstab, https://github.com/fegin
2023-03-06 15:35:07 +00:00
Will Constable
92a2107375 Support Inductor collectives with wait or collective outside graph (#95893)
Inductor implementations of collectives/wait must match the
eager impls in _functional_collectives in terms of how they interact
with the _register_tensor_work API.  If they do, then splitting
a collective/wait pair so that one half is in a compiled graph should
work fine.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95893
Approved by: https://github.com/kumpera
2023-03-03 09:00:48 +00:00
Wanchao Liang
f397d1700f Inductor reduce_scatter_tensor (#95764)
This adds reduce_scatter to the functional collectives and adds the
Inductor lowering support.
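A hedged usage sketch of the functional form; the argument order here is an assumption.

```python
import torch
import torch.distributed as dist
import torch.distributed._functional_collectives as ft_c

def reduce_scatter_sum(t: torch.Tensor) -> torch.Tensor:
    # Functional form: returns the reduced shard for this rank.
    ranks = list(range(dist.get_world_size()))
    return ft_c.reduce_scatter_tensor(t, "sum", scatter_dim=0, group=ranks)
```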

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95764
Approved by: https://github.com/kumpera
2023-03-02 22:05:30 +00:00
Rodrigo Kumpera
3e8eedd78e Round of fixes for functional collectives (#95714)
Move collective registration to torch.__init__ to handle MultiPy warmup.
Fix all_reduce with non-contiguous tensors.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95714
Approved by: https://github.com/wconstab
2023-03-01 17:52:14 +00:00
Will Constable
cc6da7b901 Inductor allgather_into_tensor (#95530)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95530
Approved by: https://github.com/kumpera
2023-02-27 21:38:36 +00:00
PyTorch MergeBot
d950f45577 Revert "[Functional Collectives] Migrate DeviceMesh::all_reduce to use functional all_reduce. (#95009)"
This reverts commit 0765dbc25e.

Reverted https://github.com/pytorch/pytorch/pull/95009 on behalf of https://github.com/jeanschmidt due to this PR is causing internal breakages. Check https://fburl.com/diff/me41urq8
2023-02-27 19:21:58 +00:00
Rodrigo Kumpera
0765dbc25e [Functional Collectives] Migrate DeviceMesh::all_reduce to use functional all_reduce. (#95009)
BC: This changes the signature and semantics of DeviceMesh::all_reduce.

DeviceMesh::all_reduce now uses a functional collective under the hood which makes it more easily traceable.
You no longer need to use CommTensor to get a trace.

all_reduce is now async-only and uses AsyncCollectiveTensor to ensure proper stream synchronization.

Signature change: removed the `async_op` param and changed the return type from `Optional[Work]` to `torch.Tensor`.
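A hedged sketch of the new call shape; the import path and the mesh_dim default are assumptions based on the code of this era.

```python
import torch
from torch.distributed._tensor import DeviceMesh

def mesh_sum(mesh: DeviceMesh, t: torch.Tensor) -> torch.Tensor:
    # No async_op flag anymore: the call returns a Tensor, and the first
    # real use of it performs the stream sync via AsyncCollectiveTensor.
    return mesh.all_reduce(t, mesh_dim=0)
```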

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95009
Approved by: https://github.com/wanchaol
2023-02-24 02:10:55 +00:00
Rodrigo Kumpera
e22d791287 [PTD] Introduce tracing friendly collectives. (#93990)
This change adds torch.distributed.traceable_collectives.

This experimental API enables collectives to be fully traced by dynamo and FX.

See #93173 for the RFC
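A hedged end-to-end sketch; commits above in this log move the API to torch.distributed._functional_collectives, which is what the sketch imports.

```python
import torch
import torch.distributed as dist
import torch.distributed._functional_collectives as ft_c

def step(x: torch.Tensor) -> torch.Tensor:
    # The collective is a pure function of its inputs, so dynamo/FX can
    # capture it as an ordinary graph node.
    y = ft_c.all_reduce(x * 2, "sum", list(range(dist.get_world_size())))
    return y.sum()

compiled = torch.compile(step)  # the collective appears as a node in the graph
```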

Pull Request resolved: https://github.com/pytorch/pytorch/pull/93990
Approved by: https://github.com/wconstab, https://github.com/wanchaol, https://github.com/H-Huang
2023-02-16 15:35:01 +00:00