FFFrog
9a1cdcb8a0
Format: fixing multiple string concatenation in single line ( #106013 )
...
Fixes multiple string concatenations on a single line.
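A hedged sketch of the kind of single-line cleanup this refers to; the actual call sites are in the PR diff, and the message text here is hypothetical:

```python
# Before: several literals glued together with `+` on one line.
msg = "tensor must be contiguous " + "and reside " + "on the same device"

# After: a single literal (adjacent pieces merged), avoiding runtime concatenation.
msg = "tensor must be contiguous and reside on the same device"
print(msg)
```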
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106013
Approved by: https://github.com/albanD
2023-07-26 18:39:18 +00:00
Xilun Wu
e799f565eb
[DTensor][TP][Random] Introduce TensorParallelRNGTracker to integrate parallel RNG state with Tensor Parallel ( #103910 )
...
This PR enables the automatic use of `TensorParallelRNGTracker` in the Tensor Parallel API. Unit tests will be added to cover this.
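A minimal usage sketch, assuming the 2023-era TP API (`parallelize_module`, `ColwiseParallel`) and that the tracker is installed automatically as described; module paths may have moved since:

```python
# Sketch: after this change, no manual RNG-tracker setup should be needed;
# parallelize_module is expected to wire up TensorParallelRNGTracker so that
# random ops (e.g. dropout) draw per-rank values on sharded tensors.
import torch
from torch.distributed._tensor import DeviceMesh           # private path as of 2023
from torch.distributed.tensor.parallel import parallelize_module, ColwiseParallel

mesh = DeviceMesh("cuda", list(range(4)))                  # 4-way tensor parallel
model = torch.nn.Linear(16, 16)
model = parallelize_module(model, mesh, ColwiseParallel()) # tracker set up internally
```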
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103910
Approved by: https://github.com/wanchaol , https://github.com/fduwjj
2023-06-30 08:06:41 +00:00
fduwjj
d4380edb9b
[TP] Add API logging for TP high level API ( #102209 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102209
Approved by: https://github.com/wz337 , https://github.com/wanchaol
2023-05-25 03:33:00 +00:00
Wanchao Liang
a1aa32e204
[dtensor] tensor ops to use strategy based sharding prop ( #100607 )
...
This is the first in a series of PRs that adopt a strategy-based approach for operator impls: each op uses OpStrategy and PlacementStrategy to generate its own strategy. The strategy-based approach, together with the op graph, lets us enable more advanced op implementations (decomposition becomes possible) and turn sharding prop into more of a constraint-satisfaction problem.
This PR alone only adds some basic tensor op strategies, and it works directly on the op graph that was used for metadata propagation. The tensor ops added here mainly follow the strategy of one of their args. The next set of PRs will add strategies for more ops.
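A conceptual sketch of what a strategy function might look like under this scheme; the names mirror the PR (`OpStrategy`, `PlacementStrategy`) but the internal API is private and not stable:

```python
# Conceptual sketch, not the actual DTensor internals: a strategy function
# enumerates candidate placements for an op's output, here by simply
# following the strategy of the first argument (as the tensor ops in this
# PR mainly do).
from torch.distributed._tensor.op_schema import OpStrategy, PlacementStrategy

def follow_first_arg_strategy(op_schema) -> OpStrategy:
    arg_strategy = op_schema.args_schema[0]     # an OpStrategy for the input
    return OpStrategy(
        [PlacementStrategy(output_spec=s.output_spec) for s in arg_strategy.strategies]
    )
```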
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100607
Approved by: https://github.com/XilunWu
2023-05-11 02:47:20 +00:00
fduwjj
953aa6d90e
[TP] Enable more generic attn in Tensor Parallelism ( #100508 )
...
To make TP more generic for the Attention module, we introduce a new col/rowwise parallel style.
The idea behind it is:
We only run DTensor ops for the col/rowwise-sharded part; the rest of the ATen ops are left to plain tensor ops.
This behavior is the default for the Colwise and Rowwise parallel styles. If people want to customize it, they can always pass in a different prepare_input or prepare_output.
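A hedged sketch of the default vs. customized behavior, using helper names from the 2023 codebase (they may have since moved or been renamed):

```python
from torch.distributed.tensor.parallel import ColwiseParallel
from torch.distributed.tensor.parallel.style import (
    make_input_replicate_1d,    # 2023-era helpers; treat as assumptions
    make_output_replicate_1d,
)

# Default: only the sharded matmul runs as a DTensor op; the output is
# handed back to plain tensor ops for the remaining ATen ops.
default_style = ColwiseParallel()

# Customized: keep the output as a replicated DTensor instead.
custom_style = ColwiseParallel(
    _prepare_input=make_input_replicate_1d,
    _prepare_output=make_output_replicate_1d,
)
```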
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100508
Approved by: https://github.com/wanchaol
2023-05-07 18:15:49 +00:00
fduwjj
89b1e67d0a
[Tensor Parallel] Add a new Colwise Parallel style when Pairwise cannot directly used ( #100137 )
...
In some use cases, users cannot directly use `PairwiseParallelStyle` and might need to specify colwise and rowwise styles separately.
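For illustration, a sketch of specifying the two styles separately (submodule names are hypothetical, and `model`/`device_mesh` are assumed to exist):

```python
from torch.distributed.tensor.parallel import (
    parallelize_module, ColwiseParallel, RowwiseParallel,
)

# Instead of one PairwiseParallel over the whole block, name each linear:
plan = {
    "net1": ColwiseParallel(),   # shard the first projection column-wise
    "net2": RowwiseParallel(),   # shard the second row-wise
}
model = parallelize_module(model, device_mesh, plan)
```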
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100137
Approved by: https://github.com/wz337
2023-04-28 03:27:51 +00:00
Rohan Varma
be8c7c06b6
[Tensor Parallel] Simplify distribute for MHA ( #100046 )
...
This function is only called for nn.MHA or the custom MHA we use, and if it is the former, it is converted to the latter. So this check can actually be an assert.
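Roughly the pattern described, sketched with a hypothetical helper name (`TensorParallelMultiheadAttention` is the custom MHA in the TP code of that era):

```python
from torch.distributed.tensor.parallel.multihead_attention_tp import (
    TensorParallelMultiheadAttention,  # path as of 2023
)

def _distribute_mha(module):  # hypothetical helper, for illustration only
    # By the time we get here, any nn.MultiheadAttention has already been
    # converted, so the type check is an invariant rather than a branch.
    assert isinstance(module, TensorParallelMultiheadAttention)
    ...
```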
Differential Revision: [D45300396](https://our.internmc.facebook.com/intern/diff/D45300396/ )
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100046
Approved by: https://github.com/wanchaol
2023-04-27 00:54:21 +00:00
Kazuaki Ishizaki
35fd5c548e
Fix typos under torch/distributed directory ( #95638 )
...
This PR fixes typos in comments and messages of `.py` files under torch/distributed directory
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95638
Approved by: https://github.com/usamah1 , https://github.com/H-Huang , https://github.com/kit1980
2023-03-27 21:13:44 +00:00
fduwjj
b209d8fa0d
[PT-D][Sequence Parallelism] Enable DTensor based Naive sequence parallelism ( #94369 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94369
Approved by: https://github.com/wanchaol
2023-02-16 21:21:00 +00:00
Wanchao Liang
cd9ca4c73f
[tp] additional doc fixes ( #94786 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94786
Approved by: https://github.com/fduwjj
2023-02-15 21:25:26 +00:00
PyTorch MergeBot
28ed0bdb37
Revert "[tp] additional doc fixes ( #94786 )"
...
This reverts commit 7522ca55f1.
Reverted https://github.com/pytorch/pytorch/pull/94786 on behalf of https://github.com/huydhn due to Sorry for reverting your PR, but the doc failure looks related and they are also failing in trunk 7522ca55f1
2023-02-14 05:43:37 +00:00
Wanchao Liang
7522ca55f1
[tp] additional doc fixes ( #94786 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94786
Approved by: https://github.com/fduwjj
2023-02-14 04:52:04 +00:00
Wanchao Liang
2db12e3844
[tp] minor update to TP docs ( #94748 )
...
minor update to TP docs for beta release
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94748
Approved by: https://github.com/fduwjj
2023-02-13 21:54:19 +00:00
Aaron Gokaslan
1e2d82b8e4
[BE] Merge isinstance calls together ( #94419 )
...
Simplifies and speeds up `isinstance` calls by checking for multiple types at the same time.
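For example:

```python
def is_number(x):
    # Before: isinstance(x, int) or isinstance(x, float)
    # After: one call with a tuple of types; equivalent and slightly faster.
    return isinstance(x, (int, float))

print(is_number(3), is_number(2.5), is_number("3"))  # True True False
```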
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94419
Approved by: https://github.com/ezyang
2023-02-09 00:47:26 +00:00
fduwjj
3fb6e119e2
[PT-D][TP] Fix the module registration in TP API ( #93412 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/93412
Approved by: https://github.com/XilunWu
2023-02-01 21:03:56 +00:00
Wanchao Liang
9a56997fe1
[dtensor][5/N] add cached propagator for TP ( #90734 )
...
This PR adds a cached propagator for TP use: it caches the sharding prop decision for the same input sharding on an operator, which could improve eager-mode performance.
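A conceptual sketch of the caching idea (not the actual propagator code):

```python
from functools import lru_cache

def _sharding_prop(op_name: str, input_placements: tuple) -> tuple:
    # Stand-in for the real (expensive) propagation rule.
    return input_placements

@lru_cache(maxsize=None)  # keyed on (op, hashable per-input placements)
def cached_sharding_prop(op_name: str, input_placements: tuple) -> tuple:
    return _sharding_prop(op_name, input_placements)
```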
Differential Revision: [D42876249](https://our.internmc.facebook.com/intern/diff/D42876249 )
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90734
Approved by: https://github.com/XilunWu , https://github.com/fduwjj
2023-02-01 05:04:08 +00:00
fduwjj
913866efbf
[PT-D][TP] Fix TP API for FQN path based parallelization ( #93029 )
...
We had not tested dict-based parallelize_module, and it turns out we had mistakes here.
1. Fix the error.
2. Add unit test cases for it.
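A sketch of the dict-based, FQN-path usage this fixes (the dotted paths are hypothetical, and `model`/`device_mesh` are assumed to exist):

```python
from torch.distributed.tensor.parallel import (
    parallelize_module, ColwiseParallel, RowwiseParallel,
)

# Keys are dotted module paths (FQNs) relative to the root being parallelized.
plan = {
    "block.attn.qkv_proj": ColwiseParallel(),
    "block.attn.out_proj": RowwiseParallel(),
}
model = parallelize_module(model, device_mesh, plan)
```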
Pull Request resolved: https://github.com/pytorch/pytorch/pull/93029
Approved by: https://github.com/wz337
2023-01-26 09:10:21 +00:00
joncrall
ad782ff7df
Enable xdoctest runner in CI for real this time ( #83816 )
...
Builds on #83317 and enables running the doctests. Just need to figure out what is causing the failures.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83816
Approved by: https://github.com/ezyang , https://github.com/malfet
2022-12-29 05:32:42 +00:00
Wanchao Liang
9b5e6b029f
[tp] ufmt distributed.tensor.parallel ( #89969 )
...
cmd: `ufmt format torch/distributed/tensor`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89969
Approved by: https://github.com/fduwjj
2022-12-01 20:58:16 +00:00
Wanchao Liang
4451eb24e6
Move tensor_parallel out to distributed.tensor folder ( #89878 )
...
This PR moves tensor parallel from torch.distributed._tensor.parallel
to torch.distributed.tensor.parallel, to prepare for the beta release.
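In terms of imports:

```python
# Old (private) path, prior to this move:
#   from torch.distributed._tensor.parallel import parallelize_module
# New public path:
from torch.distributed.tensor.parallel import parallelize_module
```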
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89878
Approved by: https://github.com/fduwjj
2022-11-30 22:13:10 +00:00