Adrian Wälchli
866457e746
Fix pydocstyle errors in fully_sharded_data_parallel.py, api.py, graph_utils.py, distribute.py, iter_graph_module.py, comm_tensor.py, experimental_ops.py, batch_dim_utils.py, data_parallel.py, graph_optimization.py ( #113216 )
...
Fixes #113191
Running `pydocstyle <file> --count` on master vs. after the changes in this PR:

| File | On master | After this PR |
| --- | --- | --- |
| torch/distributed/fsdp/fully_sharded_data_parallel.py | 80 | 3 |
| torch/distributed/_spmd/comm_tensor.py | 5 | 3 |
| torch/distributed/_spmd/experimental_ops.py | 3 | 1 |
| torch/distributed/_spmd/iter_graph_module.py | 39 | 27 |
| torch/distributed/_spmd/graph_utils.py | 16 | 4 |
| torch/distributed/_spmd/distribute.py | 19 | 10 |
| torch/distributed/_spmd/api.py | 10 | 3 |
| torch/distributed/_spmd/batch_dim_utils.py | 14 | 3 |
| torch/distributed/_spmd/data_parallel.py | 34 | 2 |
| torch/distributed/_spmd/graph_optimization.py | 35 | 13 |
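Typical pydocstyle complaints are missing docstrings, missing blank lines, and summary lines that do not end with a period. As a hedged illustration only (the function below is made up, not taken from the PR), a pydocstyle-clean docstring looks like:
```python
def clamp_grad_norm(norm: float, max_norm: float) -> float:
    """Clamp a gradient norm to ``max_norm``.

    The one-line summary ends with a period and is separated from the
    extended description by a blank line, which is what pydocstyle checks.
    """
    return min(norm, max_norm)
```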
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113216
Approved by: https://github.com/ezyang
2023-11-10 03:08:32 +00:00
Peter Bell
66c32d099a
Use pytree.arg_tree_leaves everywhere ( #112394 )
...
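The commit body is truncated above. For context, a minimal sketch of the pattern, assuming the `torch.utils._pytree` helper `arg_tree_leaves(*args, **kwargs)`, which flattens a call's positional and keyword arguments in one step:
```python
import torch
import torch.utils._pytree as pytree

def count_tensor_args(*args, **kwargs):
    # Instead of pytree.tree_flatten((args, kwargs))[0], flatten the
    # call's arguments directly.
    leaves = pytree.arg_tree_leaves(*args, **kwargs)
    return sum(isinstance(leaf, torch.Tensor) for leaf in leaves)

count_tensor_args(torch.ones(2), scale=3, bias=torch.zeros(2))  # -> 2
```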
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112394
Approved by: https://github.com/lezcano
ghstack dependencies: #112391 , #112392 , #112393
2023-10-31 15:57:06 +00:00
Peter Bell
bbd5b935e4
Use pytree.tree_leaves everywhere ( #112324 )
...
This changes all the instances I could find of `tree_flatten(...)[0]` or
`x, _ = tree_flatten(...)` to use `tree_leaves`.
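A minimal before/after sketch, assuming `torch.utils._pytree`:
```python
import torch.utils._pytree as pytree

nested = ([1, 2], (3, [4, 5]))

# Before: flatten and discard the spec just to get the leaves.
leaves, _spec = pytree.tree_flatten(nested)

# After: ask for the leaves directly.
leaves = pytree.tree_leaves(nested)
assert leaves == [1, 2, 3, 4, 5]
```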
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112324
Approved by: https://github.com/lezcano
ghstack dependencies: #112327 , #112323
2023-10-30 03:39:04 +00:00
Kazuaki Ishizaki
b5f9696d81
Fix typo under torch directory ( #110824 )
...
This PR fixes the typo `the the` in comments and exception messages in files under the `torch` directory.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110824
Approved by: https://github.com/H-Huang
2023-10-09 19:16:43 +00:00
Yeonju Ro
06f656c5d1
[distributed] implemented find_all_descendants ( #102138 )
...
Fixes #100397
Implemented the `find_all_descendants` function, which identifies the list of nodes that need to be moved. Added a unit test.
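The merged helper lives in the _spmd graph utilities; the following is only a hypothetical sketch of the idea (a traversal over `node.users` in an `fx.Graph`), with made-up names:
```python
import torch.fx as fx

def find_all_descendants_sketch(graph: fx.Graph, roots):
    # Collect every node reachable from the roots by following user edges.
    seen = set(roots)
    stack = list(roots)
    while stack:
        for user in stack.pop().users:
            if user not in seen:
                seen.add(user)
                stack.append(user)
    # Report the descendants in the graph's current topological order.
    return [node for node in graph.nodes if node in seen]
```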
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102138
Approved by: https://github.com/fegin
2023-05-24 21:47:59 +00:00
Chien-Chin Huang
e0a2b49f0b
[SPMD] Introduce prerequisites to graph_optimization_pass ( #99970 )
...
Some optimizations require prerequisite passes. It is hard to debug an optimization pass that fails because its prerequisites have not been applied. Adding this check makes the error easier to discover.
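A hypothetical sketch of the kind of check this adds (the real decorator lives in torch/distributed/_spmd/graph_optimization.py; every name below is made up):
```python
_applied_passes = set()

def graph_optimization_pass_sketch(prerequisites=()):
    # Hypothetical decorator: refuse to run a pass before its prerequisites.
    def decorator(pass_fn):
        def wrapped(gm, *args, **kwargs):
            missing = [p for p in prerequisites if p not in _applied_passes]
            if missing:
                raise RuntimeError(
                    f"{pass_fn.__name__} requires these passes to run first: {missing}"
                )
            result = pass_fn(gm, *args, **kwargs)
            _applied_passes.add(pass_fn.__name__)
            return result
        return wrapped
    return decorator
```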
Differential Revision: [D45255377](https://our.internmc.facebook.com/intern/diff/D45255377/ )
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99970
Approved by: https://github.com/lessw2020
2023-04-28 18:38:01 +00:00
Chien-Chin Huang
01de8ee845
[SPMD][Easy] Add time counter in graph_optimization_pass ( #99969 )
...
This gives an idea of how expensive each pass is.
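A minimal sketch of such a counter (hypothetical; the actual decorator records the duration as part of its pass bookkeeping):
```python
import functools
import logging
import time

logger = logging.getLogger(__name__)

def timed_pass_sketch(pass_fn):
    # Hypothetical wrapper: log how long an optimization pass takes.
    @functools.wraps(pass_fn)
    def wrapped(gm, *args, **kwargs):
        begin = time.perf_counter()
        result = pass_fn(gm, *args, **kwargs)
        logger.info("%s took %.3fs", pass_fn.__name__, time.perf_counter() - begin)
        return result
    return wrapped
```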
Differential Revision: [D45255366](https://our.internmc.facebook.com/intern/diff/D45255366/ )
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99969
Approved by: https://github.com/lessw2020
2023-04-27 17:56:07 +00:00
Chien-Chin Huang
41d7969590
[SPMD] Upstream iter_move_grads_and_optimizers ( #98785 )
...
This PR upstreams `iter_move_grads_and_optimizer`, which delays the application of some gradients and the corresponding optimizer step to the next iteration. D44512863 (credit to @lessw2020) is the internal implementation, which only works with the old _SPMD expansion. This PR changes the implementation to use the new APIs.
Differential Revision: [D44836486](https://our.internmc.facebook.com/intern/diff/D44836486/ )
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98785
Approved by: https://github.com/mrshenli
2023-04-19 06:40:33 +00:00
Shen Li
19c2804614
[SPMD][EASY] Remove unnecessary torch.ops prefix ( #99331 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99331
Approved by: https://github.com/dracifer
2023-04-17 19:33:45 +00:00
Chien-Chin Huang
148d49260a
[SPMD] Implement split_fused_optimizer to split one fused_optimizer node to two ( #98784 )
...
Several optimization passes require the ability to split the fused optimizer. This PR adds the API to support these use cases.
Differential Revision: [D44806450](https://our.internmc.facebook.com/intern/diff/D44806450/ )
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98784
Approved by: https://github.com/mrshenli
2023-04-17 10:02:07 +00:00
Chien-Chin Huang
99aacf5c68
[SPMD] Expedite the allreduce call before doing comm_fusion ( #98922 )
...
The order of the allreduce calls and the order of the gradients may differ, which can undermine the benefit of comm_fusion. This PR reorders the graph so that each allreduce call happens right after its last input is produced.
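A hedged sketch of the reordering idea on an `fx.Graph` (the merged pass has to handle the matching wait ops and other details that this ignores):
```python
import torch.fx as fx

def hoist_allreduce_sketch(graph: fx.Graph, is_allreduce):
    # Move every allreduce node so it sits immediately after the last of
    # its inputs in the current node order.
    order = {node: idx for idx, node in enumerate(graph.nodes)}
    for node in list(graph.nodes):
        if not is_allreduce(node) or not node.all_input_nodes:
            continue
        last_input = max(node.all_input_nodes, key=order.__getitem__)
        last_input.append(node)  # fx relinks the node right after last_input
    graph.lint()
```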
Differential Revision: [D44900738](https://our.internmc.facebook.com/intern/diff/D44900738/ )
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98922
Approved by: https://github.com/mrshenli
2023-04-12 23:26:37 +00:00
Chien-Chin Huang
f3080997e5
[SPMD] Introduce remove_copy_for_optimizer optimization ( #98580 )
...
This PR adds the ability to remove unused `copy_` nodes (`len(node.users) == 0`) that are generated by tracing the optimizer.
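A minimal sketch of the idea, assuming the copies show up as `aten.copy_.default` call_function nodes in the traced graph (the merged pass in graph_optimization.py handles more cases):
```python
import torch
import torch.fx as fx

aten = torch.ops.aten

def remove_dead_copy_sketch(graph: fx.Graph):
    # Erase copy_ nodes whose result is never consumed by another node.
    for node in reversed(list(graph.nodes)):
        if (
            node.op == "call_function"
            and node.target == aten.copy_.default
            and len(node.users) == 0
        ):
            graph.erase_node(node)
    graph.lint()
```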
Differential Revision: [D44761556](https://our.internmc.facebook.com/intern/diff/D44761556/ )
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98580
Approved by: https://github.com/mrshenli
2023-04-12 00:51:22 +00:00
Chien-Chin Huang
07a1378f52
[SPMD] Introduce schedule_comm_wait ( #98578 )
...
`schedule_comm_wait` delays the wait_tensor ops as late as possible. Note that this optimization currently does not reorder the computation ops. For `foreach`-based optimizers, we observe that reordering the computation ops is required to achieve good performance.
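A hedged sketch of the scheduling idea (the merged pass is more careful about which users count and, as noted above, does not reorder the computation ops):
```python
import torch.fx as fx

def delay_waits_sketch(graph: fx.Graph, is_wait_tensor):
    # Move each wait_tensor node right before its first consumer so that
    # unrelated computation can overlap with the communication.
    order = {node: idx for idx, node in enumerate(graph.nodes)}
    for node in list(graph.nodes):
        if not is_wait_tensor(node) or not node.users:
            continue
        first_user = min(node.users, key=order.__getitem__)
        first_user.prepend(node)
    graph.lint()
```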
Differential Revision: [D44761487](https://our.internmc.facebook.com/intern/diff/D44761487/ )
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98578
Approved by: https://github.com/mrshenli
2023-04-12 00:51:19 +00:00
Chien-Chin Huang
dd3e2ddc0a
[SPMD] Introduce graph_optimization_pass and comm_fusion_with_cat ( #98285 )
...
This PR adds the `graph_optimization_pass` decorator, which should wrap all graph optimization passes. It also introduces the first such optimization, `comm_fusion_with_cat`, as the first use case of the decorator.
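For intuition only, an eager-mode sketch of the fusion idea behind `comm_fusion_with_cat` (the actual pass rewrites the traced graph and is wrapped by the new decorator; nothing below is the merged code):
```python
import torch
import torch.distributed as dist

def fused_allreduce_sketch(grads):
    # Concatenate many small gradients, issue a single allreduce, then
    # split the result back into the original shapes.
    flat = torch.cat([g.reshape(-1) for g in grads])
    dist.all_reduce(flat)
    out, offset = [], 0
    for g in grads:
        out.append(flat[offset : offset + g.numel()].view_as(g))
        offset += g.numel()
    return out
```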
Differential Revision: [D44661608](https://our.internmc.facebook.com/intern/diff/D44661608/ )
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98285
Approved by: https://github.com/yifuwang
2023-04-12 00:51:16 +00:00