Commit Graph

36 Commits

Oguz Ulgen
71e22e0959 Add basic mypy annotations to dynamo (#132415)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132415
Approved by: https://github.com/XuehaiPan, https://github.com/jamesjwu
2024-08-01 20:14:25 +00:00
Xuehai Pan
e74ba1b34a [BE][Easy][15/19] enforce style for empty lines in import segments in torch/_d*/ (#129767)
See https://github.com/pytorch/pytorch/pull/129751#issue-2380881501. Most changes are auto-generated by the linter.

You can review these PRs via:

```bash
git diff --ignore-all-space --ignore-blank-lines HEAD~1
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129767
Approved by: https://github.com/anijain2305
2024-07-31 21:18:11 +00:00
Oguz Ulgen
7a42470bcb Annotate all InstructionTranslator (#131509)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/131509
Approved by: https://github.com/zou3519
2024-07-24 23:45:53 +00:00
PyTorch MergeBot
5db5865614 Revert "Annotate all InstructionTranslator (#131509)"
This reverts commit eafbd20f23.

Reverted https://github.com/pytorch/pytorch/pull/131509 on behalf of https://github.com/clee2000 due to sorry need to revert this to revert something else, I think you only need to rebase and remerge ([comment](https://github.com/pytorch/pytorch/pull/131509#issuecomment-2249000843))
2024-07-24 22:29:49 +00:00
Oguz Ulgen
eafbd20f23 Annotate all InstructionTranslator (#131509)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/131509
Approved by: https://github.com/zou3519
2024-07-24 05:31:01 +00:00
Brian Hirsh
7058563078 support as_python_constant on PlacementClassVariable (#124398)
Fixes an error for torchtitan + internal

Pull Request resolved: https://github.com/pytorch/pytorch/pull/124398
Approved by: https://github.com/ezyang, https://github.com/wanchaol, https://github.com/yoyoyocmu
2024-05-01 21:56:01 +00:00
Yifu Wang
3bede14fa7 Don't create world pg variable out of thin air when rewriting c10d collectives (#122561)
Fixes https://github.com/pytorch/pytorch/issues/122404

Previously, when rewriting c10d collectives, if the group argument was
unspecified or None, we created a world pg variable out of thin air and
passed it to the rewrite target. That approach was problematic, as it
assumed the symbol `torch` was available in the scope (see #122404).

After #120560, dynamo can trace dist.group.WORLD. If the group
argument is unspecified, we can simply set it to dist.group.WORLD in the
rewrite target.
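
A minimal sketch of the idea (the helper below is hypothetical, not the actual dynamo rewrite target):

```python
import torch.distributed as dist

# Hypothetical rewrite target mirroring the c10d collective signature.
# When no group is given, fall back to dist.group.WORLD, which dynamo can
# now trace (see #120560), instead of conjuring a world pg out of thin air.
def _rewritten_all_reduce(tensor, op=dist.ReduceOp.SUM, group=None, async_op=False):
    if group is None:
        group = dist.group.WORLD
    return dist.all_reduce(tensor, op=op, group=group, async_op=async_op)
```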

Testing

pytest test/distributed/test_inductor_collectives.py -k test_dynamo_rewrite_dist_allreduce

Also verified with the repro provided in #122404

Pull Request resolved: https://github.com/pytorch/pytorch/pull/122561
Approved by: https://github.com/wconstab
ghstack dependencies: #120560
2024-03-26 20:12:08 +00:00
Yifu Wang
36188360dd [dynamo] support torch.distributed.{group.WORLD, GroupMember.WORLD, distributed_c10d._get_default_group} (#120560)
Fixes https://github.com/pytorch/pytorch/issues/120431

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120560
Approved by: https://github.com/wconstab
2024-03-24 11:13:05 +00:00
Brian Hirsh
2e44b12dd4 dynamo: handle DTensor.device_mesh.device_type (#118803)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118803
Approved by: https://github.com/wanchaol, https://github.com/yanboliang
2024-03-22 14:42:22 +00:00
Jason Ansel
3c8c7e2a46 [dynamo] Tweak naming for module hook bw_state (#121609)
Some minor changes not related to the other PRs in the stack

Pull Request resolved: https://github.com/pytorch/pytorch/pull/121609
Approved by: https://github.com/yanboliang
2024-03-12 16:27:56 +00:00
Yifu Wang
d7a5e59647 [dynamo] support group=None when rewriting collectives (#121043)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121043
Approved by: https://github.com/awgu
2024-03-06 21:37:19 +00:00
Jason Ansel
e3dbd194f4 [dynamo] Support module backwards hooks (#120685)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/120685
Approved by: https://github.com/yanboliang, https://github.com/xmfan
2024-03-01 02:24:26 +00:00
Sergii Dymchenko
d341b66e96 Revert [dynamo] support group=None when rewriting collectives (#12018) (#120677)
This reverts commit 298c686d3f.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120677
Approved by: https://github.com/yifuwang, https://github.com/huydhn
2024-02-27 00:33:35 +00:00
Yifu Wang
298c686d3f [dynamo] support group=None when rewriting collectives (#120118)
Resolves case 2 in #120082.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120118
Approved by: https://github.com/wconstab
ghstack dependencies: #120370
2024-02-25 03:12:10 +00:00
Yifu Wang
11e4a9266d Temporarily support ranks + tag as pg identifier in native funcol (#120226)
As communicated in https://github.com/pytorch/pytorch/issues/93173#issuecomment-1907095208, although we are dropping `(ranks, tag)` as a group identifier in funcols, there will be a grace period for migration. This PR adds temporary `(ranks, tag)` support in native funcols. It also helps us decouple the py funcol -> native funcol transition from the API change.
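
For illustration, a hedged sketch of the two calling conventions during the grace period (assumes an initialized default process group; exact accepted group types may vary by release):

```python
import torch
import torch.distributed as dist
import torch.distributed._functional_collectives as funcol

t = torch.ones(4)

# Preferred: identify the group directly (e.g. a ProcessGroup or DeviceMesh).
out = funcol.all_reduce(t, "sum", dist.group.WORLD)

# Temporarily still supported in native funcol: identify the group by (ranks, tag).
out_legacy = funcol.all_reduce(t, "sum", list(range(dist.get_world_size())), tag="")
```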

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120226
Approved by: https://github.com/wanchaol, https://github.com/wconstab
ghstack dependencies: #120042, #120043, #120070
2024-02-22 20:24:16 +00:00
Yifu Wang
2d6c0cc81b Run test_functional_api.py with both legacy and native funcol impls (#119982)
Additional changes: tests in test_functional_api.py use a multi-threaded pg, which is implemented in Python. For the native ops to call into the Python pg implementation, glue code in PyProcessGroup is required for each collective. This PR also adds a few pieces of previously missing glue code, which are necessary for running test_functional_api.py with native funcol.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/119982
Approved by: https://github.com/wanchaol
2024-02-20 21:15:37 +00:00
Brian Hirsh
26fbbc3e84 DTensor + dynamo: fix is_shard/replicate always inlining to False (#118668)
Fixes an internal enablement bug. When dynamo traces `is_shard`/`is_replicate`, it would unconditionally assume the result was False.
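
A hedged sketch of the kind of code affected (import paths follow the DTensor layout of that era and may differ by version):

```python
import torch
from torch.distributed._tensor import Shard, Replicate

# Before the fix, dynamo inlined is_shard()/is_replicate() as if they always
# returned False, so the sharded branch below would never be taken under compile.
def scale_if_sharded(x, placement):
    if placement.is_shard():
        return x * 2
    return x + 1

compiled = torch.compile(scale_if_sharded, backend="eager")
print(compiled(torch.ones(2), Shard(0)))     # should take the sharded branch
print(compiled(torch.ones(2), Replicate()))  # should take the replicated branch
```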

Pull Request resolved: https://github.com/pytorch/pytorch/pull/118668
Approved by: https://github.com/wconstab, https://github.com/wanchaol
ghstack dependencies: #117667, #117666, #118209, #118191, #118667
2024-02-20 15:23:48 +00:00
Jason Ansel
f2cf0768d1 [dynamo][distributed] handle _rank_not_in_group, _get_or_create_default_group (#119628)
Copy of #117692

Pull Request resolved: https://github.com/pytorch/pytorch/pull/119628
Approved by: https://github.com/yanboliang
2024-02-18 22:34:35 +00:00
Yifu Wang
cd08dc37f8 Support tracing native functional collective via python APIs (#119103)
Summary:
- Inlined `torch.distributed.distributed_c10d._get_group_size_by_name`
- Updated all torch.compile tests in test_c10d_functional_native.py to use funcol python APIs (as opposed to the dispatcher ops)
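
A rough usage sketch of the traced pattern (assumes an initialized default process group; not taken verbatim from the test file):

```python
import torch
import torch.distributed as dist
import torch.distributed._functional_collectives as funcol

# The compiled region calls the funcol *python* API rather than the dispatcher
# ops directly; dynamo inlines the group-size/name lookup while tracing.
@torch.compile(fullgraph=True)
def allreduce_then_scale(x: torch.Tensor) -> torch.Tensor:
    y = funcol.all_reduce(x, "sum", dist.group.WORLD)
    return y * 0.5
```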

Pull Request resolved: https://github.com/pytorch/pytorch/pull/119103
Approved by: https://github.com/wconstab, https://github.com/fegin, https://github.com/wanchaol
2024-02-15 03:33:49 +00:00
Edward Z. Yang
d03173e88c Unify MYPYINDUCTOR and MYPY (#118432)
The original motivation for MYPYINDUCTOR was a faster type checking configuration that only checked a subset of files. With the removal of `follow_imports = ignore`, we are now able to use dmypy to do fast incremental typechecking, eliminating the need for this.

Perhaps erroneously, when I teed up this PR I elected to delete the `follow_imports = skip` designations in the mypy-inductor.ini. This led to a number of extra type error suppressions that I manually edited. You will need to review.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/118432
Approved by: https://github.com/Skylion007
ghstack dependencies: #118414, #118418
2024-01-27 17:23:20 +00:00
Yue Dong
a8978d3676 [dynamo] Add size(), get_coordinate() support for DeviceMesh in dynamo (#117710)
Summary: This fix is part of: https://github.com/pytorch/pytorch/issues/117670
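
A hedged sketch of what the added support enables (assumes a launched distributed job so a DeviceMesh can be built; illustrative only):

```python
import torch
from torch.distributed.device_mesh import init_device_mesh

# Dynamo can now trace DeviceMesh.size() and DeviceMesh.get_coordinate()
# inside a compiled region instead of graph-breaking on them.
@torch.compile(fullgraph=True)
def scale_by_mesh(x: torch.Tensor, mesh) -> torch.Tensor:
    world = mesh.size()             # total number of devices in the mesh
    coord = mesh.get_coordinate()   # this rank's coordinate, or None
    return x * world if coord is not None else x

# mesh = init_device_mesh("cuda", (2, 2))  # example construction on a 4-rank job
# out = scale_by_mesh(torch.ones(8), mesh)
```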

Test Plan: Unit test and CI

Differential Revision: D52857348

Pull Request resolved: https://github.com/pytorch/pytorch/pull/117710
Approved by: https://github.com/wconstab, https://github.com/yanboliang, https://github.com/wanchaol, https://github.com/anijain2305
2024-01-23 07:10:52 +00:00
Yue Dong
56ef5afdee [dynamo] Add more dynamo call_methods and getattr support for Placement (#117733)
Summary:
Explained by title.
This fix is part of: https://github.com/pytorch/pytorch/issues/117670

Test Plan:
Unit test and CI
- Unit test: `buck2 test mode/dev-nosan //caffe2/test/distributed/_tensor:dtensor_compile -- test_placement_compile`

Differential Revision: D52863073

Pull Request resolved: https://github.com/pytorch/pytorch/pull/117733
Approved by: https://github.com/yanboliang
2024-01-22 18:22:54 +00:00
Wanchao Liang
4720109d7f [dynamo] add common methods to DistributedVariable (#117590)
This PR refactors the distributed-related variables to use
DistributedVariable for common methods, so that things like
`python_type` work for all distributed variables.

Maybe we can add `as_python_constant` to DistributedVariable too? I
didn't add it in this PR, but if that makes sense I can update.
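
Roughly the shape of the shared base class being described (simplified; names and details follow the PR description rather than the exact source):

```python
from torch._dynamo.variables.base import VariableTracker

class DistributedVariable(VariableTracker):
    """Common base for dynamo variables wrapping distributed objects
    (process groups, device meshes, placements, ...)."""

    def __init__(self, value, **kwargs):
        super().__init__(**kwargs)
        self.value = value

    def python_type(self):
        # Shared implementation so python_type works for all distributed VTs.
        return type(self.value)
```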

Pull Request resolved: https://github.com/pytorch/pytorch/pull/117590
Approved by: https://github.com/voznesenskym
2024-01-18 17:32:31 +00:00
Iris Zhang (PyTorch)
23fa9621e4 [DeviceMesh] Rename _device_mesh.py to device_mesh.py to prepare for beta (#115099) (#115193)
Summary:

Rename _device_mesh.py to device_mesh.py, update all callsites, add documentation.
We created stubs for the public class and methods in torch.distributed.device_mesh so that torch.distributed.device_mesh can be imported whether or not distributed is available().
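
A hedged usage sketch of the now-public module (the guard mirrors the stub behavior described above; constructing a mesh still requires a launched distributed job):

```python
import torch
from torch.distributed.device_mesh import DeviceMesh, init_device_mesh  # import works even without distributed

if torch.distributed.is_available():
    # Example: a 1-D mesh over 2 ranks (only meaningful inside a 2-rank job).
    mesh = init_device_mesh("cpu", (2,))
```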

Original diff reverted: D51629761
Original PR reverted: https://github.com/pytorch/pytorch/pull/115099
Prior to landing, all CI signals passed. Shipit added the "ci/trunk" label to the PR, DID NOT wait for it, and went ahead with committing. More context can be found in the reverted PR above.

Test Plan: CI.

Differential Revision: D51861018

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115193
Approved by: https://github.com/fegin
2023-12-08 08:44:32 +00:00
Nikita Shulga
a827ac71f2 Revert "[DeviceMesh] Rename _device_mesh.py to device_mesh.py to prepare for beta (#115099)"
This reverts commit eaa64339d6.
2023-12-05 08:59:36 -08:00
Iris Zhang (PyTorch)
eaa64339d6 [DeviceMesh] Rename _device_mesh.py to device_mesh.py to prepare for beta (#115099)
Summary:
Rename _device_mesh.py to device_mesh.py, update all callsites, add documentation.

Original diff reverted: D51629761
Original PR reverted: https://github.com/pytorch/pytorch/pull/114991
It was failing a public module binding test on macOS due to the change in import order in torch/distributed/fsdp/_common_utils.py. Since the original import would still work, we removed the changes in this file.

Test Plan: CI.

Differential Revision: D51825114

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115099
Approved by: https://github.com/wanchaol, https://github.com/fegin
2023-12-05 05:44:52 +00:00
PyTorch MergeBot
3a2e2044cd Revert "[DeviceMesh] Rename _device_mesh.py to device_mesh.py to prepare for beta (#114710) (#114991)"
This reverts commit 729ac7317a.

Reverted https://github.com/pytorch/pytorch/pull/114991 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/114991#issuecomment-1837214567))
2023-12-02 17:55:51 +00:00
Iris Zhang (PyTorch)
729ac7317a [DeviceMesh] Rename _device_mesh.py to device_mesh.py to prepare for beta (#114710) (#114991)
Summary:

Same content of changes as https://github.com/pytorch/pytorch/pull/114710

Rename _device_mesh.py to device_mesh.py, update all callsites, add documentation.
ghstack-source-id: 208980207
exported-using-ghexport

Test Plan: CI.

Reviewed By: wanchaol

Differential Revision: D51629761

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114991
Approved by: https://github.com/wanchaol, https://github.com/fduwjj, https://github.com/fegin
2023-12-02 04:39:41 +00:00
Chien-Chin Huang
08641a3232 Make FakeProcessGroup traceable (#113314)
This PR mimics what we have done to trace ProcessGroup. This allows us to use FakeProcessGroup with torch.compile. FakeProcessGroup allows us to use world_size > 1 without creating multiple processes, thus enabling the use of PDB to debug bucketing of DDP allreduce in Inductor. We could theoretically use GLOO with world_size==1 to achieve the same goal; however, the `wait()` seems to be optimized away when the world_size is 1.
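
A hedged usage sketch (the FakeStore import path is taken from PyTorch's internal test utilities and may change): a single process pretends to be rank 0 of a 2-rank job, so compiled distributed code can be stepped through with pdb.

```python
import torch
import torch.distributed as dist
from torch.testing._internal.distributed.fake_pg import FakeStore

# Registering the "fake" backend lets one process act as rank 0 of world_size=2
# without spawning peers; collectives return immediately with fake results.
store = FakeStore()
dist.init_process_group(backend="fake", rank=0, world_size=2, store=store)
```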

Differential Revision: [D51136463](https://our.internmc.facebook.com/intern/diff/D51136463/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113314
Approved by: https://github.com/wanchaol
2023-11-10 16:03:38 +00:00
Jason Ansel
5fe96eaaf4 [dynamo] Remove VariableTracker.propagate (#111726)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111726
Approved by: https://github.com/voznesenskym
ghstack dependencies: #111306, #111415, #111725
2023-11-07 19:55:19 +00:00
Jason Ansel
843a8ecd24 [dynamo] Remove VariableTracker.add_options (#111725)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111725
Approved by: https://github.com/voznesenskym
ghstack dependencies: #111306, #111415
2023-11-07 19:55:19 +00:00
Michael Voznesensky
a902150a1e [Easy] ConstantVariable() -> .create (#109896)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109896
Approved by: https://github.com/ezyang
2023-09-22 22:30:15 +00:00
Wanchao Liang
a29b9101fa [dynamo] fix dynamo + DTensor to work with 2d (#108329)
Pair-debugged with @wconstab and we found some issues on both the dynamo
side and the TP FSDP-extension side. This PR fixes the dynamo + DTensor
integration so that the current graph-break FSDP can work with tensor
parallel by moving torch.compile after the FSDP wrapping.
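
A minimal sketch of the ordering this enables (the module below is a stand-in for a TP-parallelized model; assumes an initialized process group):

```python
import torch
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

model = nn.Linear(16, 16)      # stand-in for the real model
model = FSDP(model)            # wrap with FSDP first
model = torch.compile(model)   # then compile, after the FSDP wrapping
```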

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108329
Approved by: https://github.com/Skylion007, https://github.com/wconstab
2023-08-31 22:46:26 +00:00
Michael Voznesensky
8549abc347 Grab bag of DTensor enablement stuff (Enable whole graph capture for DTensor) (#105787)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105787
Approved by: https://github.com/ezyang
2023-07-30 00:17:45 +00:00
Wanchao Liang
c76c84bde4 [dynamo] make ProcessGroupVariable a DistributedVariable (#105593)
This PR moves ProcessGroupVariable from UDO to DistributedVariable
so that the distributed VTs are consolidated together.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105593
Approved by: https://github.com/voznesenskym
2023-07-26 06:42:50 +00:00
Wanchao Liang
f139aab2f4 [dynamo] add initial dynamo support for DTensor (#103146)
This PR adds initial dynamo support for DTensor. In particular, it:
- allows DTensor to be passed into a compiled function, and allows fakifying DTensor during dynamo tracing by turning the inner local tensor into a meta tensor.
- uses `allow_in_graph` to include `DTensor` and `DTensor.from_local`, so they are represented as `TorchVariable`.
- the created DTensor becomes a normal `TensorVariable`, and it inserts any tensor operations into the output graph just like torch.Tensor.
- note that DTensor has a new instance method `redistribute` compared to a plain tensor, and we currently special-case it in `TensorVariable`.

`from_local` and `redistribute` both accept some non-trivial metadata as arguments (i.e. DeviceMesh, Placement), which fx.Graph does not support. In order to let these two APIs appear in the dynamo-captured graph, we encode the metadata into a new function (like `functools.partial`) that only accepts prim args (i.e. tensors), then we put a `call_function` node with this new function into the graph. This was suggested by @ezyang. The underlying rationale is that the metadata will not change across graph invocations, so it is safe to encode it.
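
An illustrative sketch of this metadata-binding trick (the helper name is hypothetical; the actual prim functions appear in the captured graph below):

```python
import functools
from torch.distributed._tensor import DTensor

# Hypothetical helper: bind the non-tensor metadata (mesh, placements) ahead of
# time so the call_function node in the fx graph only takes prim args (tensors).
def make_prim_from_local(mesh, placements, run_check=False):
    return functools.partial(
        DTensor.from_local,
        device_mesh=mesh,
        placements=placements,
        run_check=run_check,
    )
```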

Captured graph:
```
    def forward(self, L_x_ : torch.Tensor):
        l_x_ = L_x_

        # File: /scratch/wanchaol/work/pytorch/test/distributed/_tensor/test_dtensor.py:685, code: dt = DTensor.from_local(x, mesh, [Shard(0)], run_check=False)
        prim_from_local = torch__dynamo_variables_torch_prim_from_local(l_x_, run_check = False);  l_x_ = None

        # File: /scratch/wanchaol/work/pytorch/test/distributed/_tensor/test_dtensor.py:686, code: return dt.redistribute(mesh, [Replicate()]).to_local() + 2
        prim_redistribute = torch__dynamo_variables_tensor_prim_redistribute(prim_from_local);  prim_from_local = None
        to_local = prim_redistribute.to_local();  prim_redistribute = None
        add = to_local + 2;  to_local = None
        return (add,)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103146
Approved by: https://github.com/voznesenskym
2023-07-19 16:01:12 +00:00