Commit Graph

22 Commits

Aaron Orenstein
5a0068cc69 [BE] mypy: disallow untyped decorators (#131428)
Untyped decorators strip the types from the functions they decorate, so even if the underlying function is fully typed, callers get no benefit from its type annotations.

Step 1 - Enable the error and override in all the offending files.

#131429
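For illustration only (not code from this PR), here is a minimal sketch of the problem and the usual fix: an untyped decorator turns the decorated function into `Any` for mypy, while a `ParamSpec`-typed decorator (Python 3.10+) preserves the signature.

```
from typing import Callable, ParamSpec, TypeVar

P = ParamSpec("P")  # requires Python 3.10+
R = TypeVar("R")


def untyped_log(func):
    # mypy treats the decorated function as Any, so callers lose all checking
    def wrapper(*args, **kwargs):
        print(f"calling {func.__name__}")
        return func(*args, **kwargs)
    return wrapper


def typed_log(func: Callable[P, R]) -> Callable[P, R]:
    # the original signature flows through the decorator unchanged
    def wrapper(*args: P.args, **kwargs: P.kwargs) -> R:
        print(f"calling {func.__name__}")
        return func(*args, **kwargs)
    return wrapper


@typed_log
def add(x: int, y: int) -> int:
    return x + y


add(1, 2)  # mypy still checks argument and return types here
```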

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131428
Approved by: https://github.com/justinchuby, https://github.com/oulgen
2024-07-23 21:50:55 +00:00
Xuehai Pan
22d258427b [BE][Easy] enable UFMT for torch/distributed/_shard/ (#128867)
Part of #123062

- #123062

Pull Request resolved: https://github.com/pytorch/pytorch/pull/128867
Approved by: https://github.com/fegin
ghstack dependencies: #128866
2024-06-18 14:39:25 +00:00
Aaron Orenstein
3a0d088517 Flip default value for mypy disallow_untyped_defs [5/11] (#127842)
See #127836 for details.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127842
Approved by: https://github.com/oulgen
2024-06-08 18:49:18 +00:00
Wanchao Liang
2c9a420da3 [dtensor] move some modules to private namespace (#127339)
As titled: move some modules that are mainly for DTensor-internal use into a private namespace.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127339
Approved by: https://github.com/awgu
ghstack dependencies: #127338
2024-05-29 05:18:47 +00:00
Wanchao Liang
afee5bea92 [dtensor] refactor schema suggestions in output sharding (#122929)
This PR refactors schema_suggestions in OutputSharding to be a single
OpSchema instead of a list of schemas. In practice there is only ever one
suggestion, and the multiple-resharding case has moved to OpStrategy, so
no case needs it to be a list.
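A hedged sketch of the shape of this change (field and class names are illustrative, not the exact DTensor definitions):

```
from dataclasses import dataclass
from typing import Optional


@dataclass
class OpSchema:
    op_name: str
    args_schema: tuple = ()


@dataclass
class OutputSharding:
    output_spec: object = None
    # before: schema_suggestions: Optional[List[OpSchema]] = None  (always length 1 in practice)
    # after: a single optional suggestion; multi-resharding goes through OpStrategy instead
    schema_suggestion: Optional[OpSchema] = None
```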

Pull Request resolved: https://github.com/pytorch/pytorch/pull/122929
Approved by: https://github.com/tianyu-l
2024-04-01 17:39:39 +00:00
Adrian Wälchli
866457e746 Fix pydocstyle errors in fully_sharded_data_parallel.py, api.py, graph_utils.py, distribute.py, iter_graph_module.py, comm_tensor.py, experimental_ops.py, batch_dim_utils.py, data_parallel.py, graph_optimization.py (#113216)
Fixes #113191

```
pydocstyle torch/distributed/fsdp/fully_sharded_data_parallel.py --count
```

On master: 80
After my changes on this PR: 3

```
pydocstyle torch/distributed/_spmd/comm_tensor.py --count
```
On master: 5
After my changes on this PR: 3

```
pydocstyle torch/distributed/_spmd/experimental_ops.py --count
```
On master: 3
After my changes on this PR: 1

```
pydocstyle torch/distributed/_spmd/iter_graph_module.py --count
```
On master: 39
After my changes on this PR: 27

```
pydocstyle torch/distributed/_spmd/graph_utils.py --count
```
On master: 16
After my changes on this PR: 4

```
pydocstyle torch/distributed/_spmd/distribute.py --count
```
On master: 19
After my changes on this PR: 10

```
pydocstyle torch/distributed/_spmd/api.py --count
```
On master: 10
After my changes on this PR: 3

```
pydocstyle torch/distributed/_spmd/batch_dim_utils.py  --count
```
On master: 14
After my changes on this PR: 3

```
pydocstyle torch/distributed/_spmd/data_parallel.py --count
```
On master: 34
After my changes on this PR: 2

```
pydocstyle torch/distributed/_spmd/graph_optimization.py --count
```
On master: 35
After my changes on this PR: 13
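For context, the fixes are mostly of this form (this example is illustrative and not taken from the PR):

```
def gather_bad(tensor):
    """gathers tensors from all ranks
    and concatenates them"""  # D400: no period, D205: no blank line, D403: capitalization


def gather_good(tensor):
    """Gather tensors from all ranks and concatenate them.

    The summary line is capitalized, imperative, ends with a period, and is
    separated from the extended description by a blank line.
    """
```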

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113216
Approved by: https://github.com/ezyang
2023-11-10 03:08:32 +00:00
Wanchao Liang
09f3e08bcc [dtensor][3/n] use dedicated TensorMeta instead of the fx one (#108261)
This PR switches from fx's shape-prop TensorMetadata to DTensor's own
dedicated TensorMeta. DTensor only cares about three fields (shape, stride,
and dtype); all other fields are unnecessary and can be inferred from the
local tensor directly. This significantly simplifies how we deal with tensor
metadata, since we no longer track the extra fields.
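A hedged sketch of the idea (simplified relative to the real definition): a small named tuple carrying only the three fields DTensor needs, with everything else read off the local tensor when required.

```
from typing import NamedTuple, Tuple

import torch


class TensorMeta(NamedTuple):
    shape: torch.Size
    stride: Tuple[int, ...]
    dtype: torch.dtype


def meta_from_local(local_tensor: torch.Tensor) -> TensorMeta:
    # device, layout, requires_grad, etc. are intentionally omitted; they can
    # be inferred from the local tensor directly when needed
    return TensorMeta(local_tensor.shape, local_tensor.stride(), local_tensor.dtype)


print(meta_from_local(torch.randn(4, 8)))
```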
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108261
Approved by: https://github.com/fduwjj
ghstack dependencies: #107306
2023-09-13 04:08:02 +00:00
Wanchao Liang
fc1dcfb9ab [dtensor][2/n] use op overload instead of function schema (#107306)
The function schema doesn't give us anything extra, since we can also get it from `op._schema`. Including the op directly in op_schema makes it easier for sharding prop to do fake execution, and in principle it should also make hash comparisons faster: instead of hashing the function schema we just hash `id(op)`, which is constant.

This PR is just a refactor to include the op in OpSchema instead of the function schema; there are no other logic changes.
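A hedged, simplified sketch of what this buys us (not the exact OpSchema definition):

```
import torch


class OpSchema:
    def __init__(self, op, args_schema=()):
        self.op = op                  # an OpOverload, e.g. torch.ops.aten.mm.default
        self.args_schema = args_schema

    def __hash__(self):
        # id(op) is constant for the lifetime of the process and cheap to hash,
        # unlike hashing the full function schema
        return hash((id(self.op), self.args_schema))


schema = OpSchema(torch.ops.aten.mm.default)
print(schema.op._schema)              # the FunctionSchema is still reachable on demand
out = schema.op(torch.randn(2, 3), torch.randn(3, 4))  # sharding prop can invoke the op directly
```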
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107306
Approved by: https://github.com/fduwjj
2023-09-13 04:08:02 +00:00
fduwjj
4a6ca4cc05 [TP][DTensor Perf] Some perf improvement to reduce DTensor CPU overhead (#106524)
By inspecting a small TP benchmark, we found a couple of things we can optimize:
1. We call deep_copy many times when we initialize DTensor.
2. Some sharding_prop results are not cached successfully.
3. We are still calling redistribute when it is not necessary.

![image](https://github.com/pytorch/pytorch/assets/6937752/b847d110-eea1-45df-9298-066d0ba07dd7)

![image](https://github.com/pytorch/pytorch/assets/6937752/fc08f564-caed-496b-80d7-275c1dba3806)

![image](https://github.com/pytorch/pytorch/assets/6937752/fdc06cc4-a4ba-48e8-a118-c041bbd04f5e)

So we want to:
1. Remove the deep_copy, and make placements a tuple so we are sure it's immutable.
2. Somehow the op_schema gets changed during sharding op propagation, so we store a hashed version of it before passing it to sharding_prop. Ideally we want to figure out why `op_schema` gets changed, but it looks like it changes for index as well as detach/view ops, so it will take more time to debug.
3. When hashing op_schema, hash the entire args_schema, not just the args_spec, which only contains the DTensorSpecs from args that are DTensors.
4. It turns out that DTensor sometimes has mem_format set to None (not contiguous), which triggers redistribute unnecessarily, so we now only compare type, shape, and stride in the metadata.

Also, we need to ensure that _Partial and Shard have different hash values in the DTensorSpec (see the sketch below).

![image](https://github.com/pytorch/pytorch/assets/6937752/321e6890-1ab6-4975-adc9-524c6ef9a76b)
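A hedged illustration of two of the points above (simplified, not the real DTensor placement classes): placements become an immutable tuple so no deep copy is needed, and the placement type participates in the hash so _Partial and Shard cannot collide.

```
from dataclasses import dataclass


@dataclass(frozen=True)
class Shard:
    dim: int

    def __hash__(self):
        return hash((type(self).__name__, self.dim))


@dataclass(frozen=True)
class _Partial:
    reduce_op: str = "sum"

    def __hash__(self):
        return hash((type(self).__name__, self.reduce_op))


placements = (Shard(0), _Partial())        # a tuple, not a list: safe to share without deepcopy
assert hash(Shard(0)) != hash(_Partial())  # distinct placement types hash differently
```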

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106524
Approved by: https://github.com/wanchaol
2023-08-14 20:03:19 +00:00
Aaron Gokaslan
e2a3817dfd [BE] Enable C419 rule for any all shortcircuiting (#99890)
Apparently https://github.com/pytorch/pytorch/pull/78142 made torch.JIT accept simple generator expressions, which allows us to enable rules that replace unnecessary list comprehensions with generators in any()/all(). This was originally part of #99280, but I split it off into this PR so that it can be easily reverted should anything break.
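A small illustration of the C419 rewrite (not taken from the PR): pass a generator expression to any()/all() instead of materializing a list, so evaluation can short-circuit.

```
values = [1, 2, 0, 3]

found_list = any([v == 0 for v in values])  # builds the whole list first (flagged by C419)
found_gen = any(v == 0 for v in values)     # stops at the first match

assert found_list == found_gen
```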

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99890
Approved by: https://github.com/justinchuby, https://github.com/kit1980, https://github.com/malfet
2023-04-25 15:02:13 +00:00
Shen Li
ca89e7942a [SPMD][Easy] switch to tree_map_only to simplify code (#99547)
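A hedged usage sketch (assuming the torch.utils._pytree API of this era): tree_map_only applies a function only to leaves of a given type, replacing a tree_map wrapped around a manual isinstance check.

```
import torch
from torch.utils._pytree import tree_map_only

tree = {"weight": torch.ones(2, 2), "step": 3, "grads": [torch.zeros(2)]}
# only tensor leaves are transformed; the int passes through untouched
doubled = tree_map_only(torch.Tensor, lambda t: t * 2, tree)
print(doubled["step"], doubled["weight"])
```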
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99547
Approved by: https://github.com/fegin
2023-04-19 20:40:09 +00:00
Shen Li
7c0c663a4c [SPMD] Add aten.stack and aten.select to DTensor prop (#99417)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99417
Approved by: https://github.com/fegin
2023-04-19 14:55:34 +00:00
Shen Li
54b168484d Support LayerNorm without weight or bias parameters (#98687)
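A hedged example of the configuration this commit adds support for: a LayerNorm constructed with no learnable weight or bias.

```
import torch

ln = torch.nn.LayerNorm(64, elementwise_affine=False)
print(ln.weight, ln.bias)      # both None: no parameters to shard
out = ln(torch.randn(4, 64))
```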
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98687
Approved by: https://github.com/yifuwang
2023-04-09 02:13:10 +00:00
Shen Li
d255c8e1ad Add NLLLoss to DTensor prop rule (#98512)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98512
Approved by: https://github.com/wanchaol
2023-04-08 01:22:36 +00:00
Shen Li
02179827cb [Easy] Include SPMD and DTensor files in UFMT checks (#98148)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98148
Approved by: https://github.com/fegin
2023-04-02 15:34:49 +00:00
Shen Li
e8d39606eb [SPMD] Enable fused Adam in full train step tracing (#98113)
Differential Revision: [](https://our.internmc.facebook.com/intern/diff/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98113
Approved by: https://github.com/yifuwang, https://github.com/fegin
2023-04-01 15:54:13 +00:00
Shen Li
9ec6fdb29b Enable adam foreach in full train step tracing (#97897)
Main changes:

1. Registered several foreach ops with both meta and DTensor.
2. Skip redundant getitem nodes when expanding foreach ops with DTensor (the sketch below shows where these getitem nodes come from).
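A hedged sketch (not the PR's code) of where those getitem nodes come from: tracing a list-returning foreach op with make_fx records one operator.getitem node per accessed element.

```
import torch
from torch.fx.experimental.proxy_tensor import make_fx


def step(params):
    outs = torch._foreach_add(params, 1.0)  # a list-valued aten op
    return outs[0] + outs[1]                # element accesses become getitem nodes


gm = make_fx(step)([torch.ones(2), torch.ones(2)])
print(gm.graph)  # expect _foreach_add followed by getitem nodes
```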
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97897
Approved by: https://github.com/wanchaol, https://github.com/fegin
2023-03-30 16:47:10 +00:00
Shen Li
379fb47654 [SPMD] Support foreach optimizers with functionalization (#97853)
My first attempt was to apply the same solution proxy_tensor.py uses for
other in-place ops. However, foreach is different: its schema in
`native_functions.yaml` does not return anything, whereas ops like
`addcmul_` and `addcdiv_` do return Tensors (thanks bdhirsh for teaching
me this!). As a result, the proxy output during tracing does not wrap
anything, and hence we cannot correctly connect it with subsequent
operators. Modifying `native_functions.yaml` is not a preferred solution.
After discussing with bdhirsh, the temporary solution is to do foreach
functionalization as a graph pass for now. Later, when
https://github.com/pytorch/pytorch/issues/97852 is addressed, we will
switch to default functionalization.

Edit: the latest version follows @bdhirsh 's suggestion to use
`make_fx`'s `decomposition_table` instead of implementing manual
fx.Graph transforms to functionalize `_foreach_add_`.
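A hedged illustration of the schema difference described above (plain eager calls, no tracing): the in-place foreach variant returns nothing, so there is no output proxy to connect to later operators, while the out-of-place variant produces new tensors that can be tracked.

```
import torch

params = [torch.ones(2), torch.ones(3)]
grads = [torch.full((2,), 0.1), torch.full((3,), 0.1)]

new_params = torch._foreach_add(params, grads)  # out-of-place: returns new tensors
torch._foreach_add_(params, grads)              # in-place: mutates params, returns nothing to track
```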
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97853
Approved by: https://github.com/fegin, https://github.com/wanchaol
2023-03-30 11:27:10 +00:00
Shen Li
021de486ff [Easy] Apply black to format _spmd files (#97534)
No real changes. Format code to prepare for the PR on top.

Differential Revision: [D44376380](https://our.internmc.facebook.com/intern/diff/D44376380)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97534
Approved by: https://github.com/wanchaol
2023-03-25 01:09:41 +00:00
Wanchao Liang
738beaa6b8 [dtensor] fix experimental_op slice_scatter (#95894)
Test Plan: test with spmd e2e flow

Differential Revision: D43740349

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95894
Approved by: https://github.com/fegin
2023-03-03 08:41:22 +00:00
Wanchao Liang
bb9a05b116 [dtensor] use tracing for metadata prop (#95456)
This PR uses tracing for metadata prop, so that we get correct
shape/stride metadata without computing it by hand ourselves.

The follow-up PR will adopt tracing for the sharding prop itself.
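A hedged sketch of the general idea (FakeTensorMode stands in here for the tracing machinery; this is not the PR's code): let the op itself propagate output shape/stride/dtype instead of computing them manually per operator.

```
import torch
from torch._subclasses.fake_tensor import FakeTensorMode

with FakeTensorMode():
    a = torch.empty(8, 16)
    b = torch.empty(16, 32)
    out = torch.mm(a, b)  # no real compute; only metadata is propagated
    print(out.shape, out.stride(), out.dtype)
```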

Differential Revision: [D43643578](https://our.internmc.facebook.com/intern/diff/D43643578)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95456
Approved by: https://github.com/XilunWu
2023-02-28 17:54:22 +00:00
Chien-Chin Huang
250c054bdd [SPMD] Pull the minimal working distribute API and SPMD module to PyTorch (#94802)
Pull the minimal working distribute API and SPMD module to PyTorch. The original code is on https://github.com/pytorch/tau/tree/main/spmd/compiler.

Other main contributors to the original code base: @anj-s, @lessw2020, @wanchaol @aazzolini

Differential Revision: [D43197230](https://our.internmc.facebook.com/intern/diff/D43197230/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94802
Approved by: https://github.com/anj-s, https://github.com/wanchaol
2023-02-16 00:36:16 +00:00