Commit Graph

8 Commits

Aaron Gokaslan
9c3fbe7475 [BE] Enable flake8-simplify checks (#97984)
Enable some sensible flake8-simplify rules. Mainly wanted to enable the SIM101 and `yield from` (SIM104) checks. @kit1980 since you wanted to be tagged on this CI check.

Enabling this check also helped flag one logical bug (also fixed in this PR), so it's definitely beneficial.
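For reference, an illustrative sketch of the two rewrites these checks push for (not code from this PR):

```python
# SIM101: repeated isinstance checks on the same object should be merged.
def is_number_before(x):
    return isinstance(x, int) or isinstance(x, float)  # flagged by SIM101

def is_number_after(x):
    return isinstance(x, (int, float))  # suggested rewrite

# yield from: delegate to an iterable instead of looping manually.
def passthrough_before(items):
    for item in items:  # flagged: can use `yield from`
        yield item

def passthrough_after(items):
    yield from items  # suggested rewrite
```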

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97984
Approved by: https://github.com/ezyang
2023-03-31 03:40:21 +00:00
Shen Li
9ec6fdb29b Enable adam foreach in full train step tracing (#97897)
Main changes:

1. Registered several foreach ops for both meta and DTensor (a registration sketch follows below)
2. Skipped redundant getitem nodes when expanding foreach ops with DTensor
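A rough sketch of what registering a meta kernel for one foreach op could look like, assuming the public `torch.library` registration API; this illustrates the idea, not the PR's actual code:

```python
import torch

# Illustrative only: a meta kernel does no real computation and exists to
# propagate shapes/dtypes during tracing. Registering against aten here is
# an assumption; it raises if a Meta kernel is already registered upstream.
meta_lib = torch.library.Library("aten", "IMPL", "Meta")

def _foreach_add_meta(self, other, alpha=1):
    # Metadata propagation only: one empty meta tensor per input tensor.
    return [torch.empty_like(t) for t in self]

meta_lib.impl("_foreach_add.List", _foreach_add_meta)
```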
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97897
Approved by: https://github.com/wanchaol, https://github.com/fegin
2023-03-30 16:47:10 +00:00
Shen Li
379fb47654 [SPMD] Support foreach optimizers with functionalization (#97853)
My first attempt was to apply the same solution that proxy_tensor.py
uses for other inplace ops. However, foreach is different in that its
schema in `native_functions.yaml` does not return anything,
whereas ops like `addcmul_` and `addcdiv_` do return Tensors (Thanks
bdhirsh for teaching me this!). As a result, the proxy output
during tracing does not wrap anything, and hence we cannot correctly
connect it with subsequent operators. Modifying `native_functions.yaml`
is not a preferred solution. After discussing with bdhirsh, the
temporary solution is to do foreach functionalization as a graph
pass for now. Later, when https://github.com/pytorch/pytorch/issues/97852
is addressed, we will switch to default functionalization.

Edit: the latest version follows @bdhirsh's suggestion to use
`make_fx`'s `decomposition_table` instead of implementing manual
fx.Graph transforms to functionalize `_foreach_add_`.
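A minimal sketch of that `decomposition_table` approach, assuming a simplified decomposition that rewrites the in-place op as its out-of-place variant plus copies (not the PR's exact code):

```python
import torch
from torch.fx.experimental.proxy_tensor import make_fx

aten = torch.ops.aten

def _foreach_add__decomp(self, other, alpha=1):
    # Functionalize _foreach_add_: compute out-of-place results, then copy
    # them back so the traced graph carries real data dependencies.
    updated = aten._foreach_add.List(self, other, alpha=alpha)
    for dst, src in zip(self, updated):
        dst.copy_(src)

def step(params, grads):
    torch._foreach_add_(params, grads, alpha=-0.1)
    return params

params = [torch.randn(4) for _ in range(2)]
grads = [torch.randn(4) for _ in range(2)]
gm = make_fx(
    step,
    decomposition_table={aten._foreach_add_.List: _foreach_add__decomp},
)(params, grads)
print(gm.graph)  # shows _foreach_add + copy_ instead of _foreach_add_
```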
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97853
Approved by: https://github.com/fegin, https://github.com/wanchaol
2023-03-30 11:27:10 +00:00
Chien-Chin Huang
942e587d40 [SPMD] Make compile cache the compilation result and add option to perform transformation (#97836)
This PR changes the ``compile()`` decorator to cache the compilation result so that compilation is done only once. A ``gm_transformation`` option is also added to ``compile()`` so that, after compilation is done, users can perform any graph transformation on the compiled graph module.
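In spirit, the caching works like the sketch below; ``compile`` and ``gm_transformation`` follow the description above, while ``trace_and_expand`` is a hypothetical stand-in for the real tracing pipeline, not an actual API:

```python
import functools
from typing import Callable, Optional

def trace_and_expand(fn, *args, **kwargs):
    # Hypothetical stand-in for the real trace + DTensor expansion pipeline.
    return fn

def compile(gm_transformation: Optional[Callable] = None):
    def wrapper(train_step):
        cache = {"gm": None}  # compiled once, reused on every later call

        @functools.wraps(train_step)
        def inner(*args, **kwargs):
            if cache["gm"] is None:
                gm = trace_and_expand(train_step, *args, **kwargs)
                if gm_transformation is not None:
                    gm = gm_transformation(gm)  # user-supplied pass
                cache["gm"] = gm
            return cache["gm"](*args, **kwargs)

        return inner
    return wrapper
```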

Differential Revision: [D44484033](https://our.internmc.facebook.com/intern/diff/D44484033/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97836
Approved by: https://github.com/mrshenli, https://github.com/wconstab
2023-03-29 20:51:22 +00:00
Shen Li
c39f1c1490 Allow DTensor to trigger collectives before inplace ops (#97787)
Mainly two fixes:

1. `make_fx` seems to trace through DeviceMesh operations. This commit removes them from the DTensor-expanded graph.
2. During DTensor expansion, autograd complains about inplace changes on leaf nodes. This commit wraps the entire DTensor expansion code in `torch.no_grad()` (illustrated below).
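A self-contained illustration of the autograd behavior behind point 2 (not code from this commit):

```python
import torch

# In-place updates on a leaf tensor that requires grad raise a RuntimeError;
# wrapping the update in torch.no_grad() avoids it, mirroring how the
# DTensor expansion code is wrapped.
leaf = torch.randn(4, requires_grad=True)
try:
    leaf.add_(1.0)
except RuntimeError as e:
    print(f"autograd rejects the in-place op: {e}")

with torch.no_grad():
    leaf.add_(1.0)  # fine under no_grad
```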

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97787
Approved by: https://github.com/wanchaol
2023-03-28 21:06:51 +00:00
Shen Li
75fb0b6c9f Enable full train_step tracing and customizable dist graph expansion (#97416)
This commit adds an entry point for full `train_step` tracing and
expansion. Model forward, backward, and optimizer step will be included
in one graph. DTensor expansion will be applied on top to insert
collective communications. Users can also provide an `Override`
implementation to skip non-traceable submodules and directly install
submodule logic into the DTensor-expanded graph by inserting `fx.Node`s.
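One plausible shape for such an `Override` hook; the method names are illustrative assumptions, not the exact API:

```python
from abc import ABC, abstractmethod

import torch.fx as fx
import torch.nn as nn

class Override(ABC):
    """Hypothetical interface for handling non-traceable submodules."""

    @abstractmethod
    def replacement(self, orig_submodule: nn.Module) -> nn.Module:
        """Return a traceable stand-in used while capturing the full graph."""

    @abstractmethod
    def transform(self, gm: fx.GraphModule) -> fx.GraphModule:
        """After DTensor expansion, insert fx.Nodes implementing the real
        submodule logic into the expanded graph."""
```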

Differential Revision: [D44325177](https://our.internmc.facebook.com/intern/diff/D44325177)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97416
Approved by: https://github.com/yifuwang, https://github.com/wanchaol
2023-03-25 09:24:21 +00:00
Shen Li
021de486ff [Easy] Apply black to format _spmd files (#97534)
No real changes. Format code to prepare for the PR on top.

Differential Revision: [D44376380](https://our.internmc.facebook.com/intern/diff/D44376380)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97534
Approved by: https://github.com/wanchaol
2023-03-25 01:09:41 +00:00
Chien-Chin Huang
250c054bdd [SPMD] Pull the minimal working distribute API and SPMD module to PyTorch (#94802)
Pull the minimal working distribute API and SPMD module to PyTorch. The original code is at https://github.com/pytorch/tau/tree/main/spmd/compiler.

Other main contributors to the original code base: @anj-s, @lessw2020, @wanchaol, @aazzolini

Differential Revision: [D43197230](https://our.internmc.facebook.com/intern/diff/D43197230/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94802
Approved by: https://github.com/anj-s, https://github.com/wanchaol
2023-02-16 00:36:16 +00:00