Commit Graph

108 Commits

Author SHA1 Message Date
Will Constable
346343e6b5 [DeviceMesh] Make _validate_tp_mesh_dim support 3D (#125763)
Currently, a 3D mesh with a submesh sliced out for TP will fail
this check.

According to @wanchaol in [this
comment](https://github.com/pytorch/pytorch/pull/125250#discussion_r1586653669)
it should be OK to remove these checks.  Though I would appreciate a
more careful review here, since I'm not too sure if there are other edge
cases where these checks are important.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/125763
Approved by: https://github.com/wz337, https://github.com/wanchaol
2024-05-08 21:22:11 +00:00
wz337
2716e77cf7 [FSDP1][2D] Fix FSDP1 2D state_dict to use run_check=False (#123802)
`from_local` with a replicate placement would run mesh_broadcast if `run_check=True`, which is the default. For the FSDP state_dict case we know for sure that the tensors are already replicated on the dp dimension (FSDP + TP), so we don't need to force the check.
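
A minimal sketch of the difference, assuming a 1-D device mesh and an already-replicated local tensor (not the actual FSDP state_dict code):
```
import torch
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed._tensor import DTensor, Replicate

mesh = init_device_mesh("cuda", (4,))
local = torch.randn(8, 8, device="cuda")

# run_check=True (the default) would issue a mesh_broadcast to verify replication;
# run_check=False trusts the caller and skips that collective.
dt = DTensor.from_local(local, mesh, [Replicate()], run_check=False)
```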

Pull Request resolved: https://github.com/pytorch/pytorch/pull/123802
Approved by: https://github.com/wanchaol
2024-04-24 01:25:11 +00:00
Wanchao Liang
05addd5658 [tp] add kwargs support to prepare_module_input (#124114)
As titled, this PR adds kwargs support to the PrepareModuleInput style. Some modules take only keyword-argument inputs and no positional args, so we should support this case.
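
A hedged sketch of what the kwargs support might look like; the argument names `input_kwarg_layouts` / `desired_input_kwarg_layouts` are assumed from this PR rather than quoted from it:
```
from torch.distributed._tensor import Replicate, Shard
from torch.distributed.tensor.parallel import PrepareModuleInput

# Annotate a keyword-only input of a submodule (e.g. `hidden_states`, a hypothetical name).
style = PrepareModuleInput(
    input_kwarg_layouts={"hidden_states": Shard(1)},             # how the kwarg arrives
    desired_input_kwarg_layouts={"hidden_states": Replicate()},  # how the module wants it
)
```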

Pull Request resolved: https://github.com/pytorch/pytorch/pull/124114
Approved by: https://github.com/XilunWu
2024-04-22 21:46:31 +00:00
Ke Wen
5027ef7e9c [TP] Add wildcard support (#122968)
Adding wildcard support for TP's `parallelize_module` API.

Example patterns:
`layers.*.linear`: any characters
`layers.?.linear`: single character
`layers.[1-2]`: digit range, matches `layers.1` and `layers.2`

Example use case:
A model has multiple layers, and we want to parallelize the linear module `lin` inside each layer.
```
model_tp = parallelize_module(
    model,
    device_mesh,
    {
        "layers.*.lin": ColwiseParallel(),
    },
)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/122968
Approved by: https://github.com/XilunWu, https://github.com/wz337, https://github.com/wanchaol
ghstack dependencies: #122919
2024-04-02 21:23:39 +00:00
Ke Wen
0a038cf0cf [TP] Avoid splitting path twice (#122919)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/122919
Approved by: https://github.com/awgu, https://github.com/wanchaol
2024-04-02 02:06:11 +00:00
Wanchao Liang
a26480a4d1 [dtensor] move early return check into redistribute autograd function (#121653)
This PR fixes a redistribute bug by moving the early-return check into the
redistribute autograd function: even when we redistribute to the same
placement, the grad_placements from the `to_local` call might be different,
so the redistribute backward still needs to happen.
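
A minimal sketch of the situation this covers, assuming `dt` is a DTensor that requires grad and is placed as `[Shard(0)]` on a 1-D mesh:
```
from torch.distributed._tensor import Replicate, Shard

same = dt.redistribute(placements=[Shard(0)])         # same placements: no forward communication
local = same.to_local(grad_placements=[Replicate()])   # but gradients come back with a different placement
local.sum().backward()                                 # the backward redistribution must still run
```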

Pull Request resolved: https://github.com/pytorch/pytorch/pull/121653
Approved by: https://github.com/awgu
2024-03-12 17:37:30 +00:00
Wanchao Liang
242e03ba86 [dtensor] add async_op option to redistribute and some refactor (#121477)
The async output option was only available in the `full_tensor()` call, but it is
generally useful to expose it on the `redistribute` call directly
so that the user can control it.

This PR adds an async_op option to the redistribute call, allowing the user to
control whether the tensor redistribution is performed asynchronously.

By default this is set to False, following the semantics of the c10d
collectives.
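
A minimal sketch, assuming `dt` is a sharded DTensor on a 1-D mesh:
```
from torch.distributed._tensor import Replicate

out = dt.redistribute(placements=[Replicate()], async_op=True)
# The collective is issued asynchronously; using the data forces the wait.
result = out.to_local() + 1
```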

Pull Request resolved: https://github.com/pytorch/pytorch/pull/121477
Approved by: https://github.com/wz337
2024-03-09 06:17:23 +00:00
Wanchao Liang
08460f4bae [tp] remove deprecated tp_mesh_dim arg (#121432)
This PR removes the deprecated tp_mesh_dim arg to prepare for the release.
Since this arg has been deprecated for a while (by throwing deprecation
messages), we should remove it before the release.
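
The recommended replacement (a sketch): slice the TP submesh out of a 2-D device mesh and pass it to parallelize_module instead of the removed tp_mesh_dim argument:
```
from torch.distributed.device_mesh import init_device_mesh

mesh_2d = init_device_mesh("cuda", (2, 4), mesh_dim_names=("dp", "tp"))
tp_mesh = mesh_2d["tp"]   # 1-D submesh used for tensor parallelism
# model_tp = parallelize_module(model, tp_mesh, plan)
```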

#suppress-api-compatibility-check

Pull Request resolved: https://github.com/pytorch/pytorch/pull/121432
Approved by: https://github.com/wz337
ghstack dependencies: #121431
2024-03-08 17:46:44 +00:00
Wanchao Liang
30982ce072 [tp] doc fixes (#121431)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121431
Approved by: https://github.com/wz337
2024-03-08 17:46:44 +00:00
Tianyu Liu
dc514b967e [dtensor][TP] check funcol calls and improve doc for loss parallel (#121366)
Since CommDebugMode is fixed, we can check that loss parallel is working as expected.

Under loss parallel, the forward computation should invoke 3 all-reduces, and the backward computation should invoke no functional collectives.
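
A sketch of the kind of check the test performs; the exact CommDebugMode import path and method names are assumed here rather than taken from the PR:
```
from torch.distributed._tensor.debug import CommDebugMode

comm_mode = CommDebugMode()
with comm_mode:
    loss = compute_loss_under_loss_parallel()   # hypothetical helper
    loss.backward()
# Inspect which functional collectives ran and how often.
print(comm_mode.get_comm_counts())
```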

Co-authored-by: Wanchao <wanchaol@users.noreply.github.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121366
Approved by: https://github.com/wanchaol
2024-03-08 01:41:31 +00:00
Wanchao Liang
1a28ebffb3 [TP] Introduce Sequence Parallel Style for LayerNorm/RMSNorm/Dropout (#121295)
As titled, this PR introduces a dedicated `ParallelStyle` to shard the
nn.LayerNorm/nn.Dropout/RMSNorm layers. We were mainly using manual
distribute_module calls before when sharding the RMSNorm layer, but I
think we should have a dedicated TP API to easily shard those layers,
instead of users manually using DTensors.

I call this SequenceParallel, which might cause some confusion since we
technically "deprecated" a SequenceParallel style months ago. But this
time the SequenceParallel style is significantly different from the
previous one (which used to shard two consecutive Linear layers). I
believe giving it the right name is the first priority, rather than
worrying about reusing the old name.
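
A minimal usage sketch, assuming a transformer block whose norm layers are named "attention_norm" and "ffn_norm" (hypothetical names):
```
from torch.distributed.tensor.parallel import parallelize_module, SequenceParallel

block_tp = parallelize_module(
    block,
    tp_mesh,
    {
        "attention_norm": SequenceParallel(),   # shard activations on the sequence dim
        "ffn_norm": SequenceParallel(),
    },
)
```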

Pull Request resolved: https://github.com/pytorch/pytorch/pull/121295
Approved by: https://github.com/awgu, https://github.com/tianyu-l
ghstack dependencies: #121294
2024-03-07 02:04:59 +00:00
Wanchao Liang
2e50566722 [dtensor] change distribute_module input/output_fn to accept module (#120895)
This is a BC-breaking change to distribute_module. The underlying rationale
for this change is that sometimes in the input_fn/output_fn, users want
access to the current module for some of its attributes. This might not be
common, but in some cases it is worth having access to the module.

An outstanding use case we want to support is float8: if we want to make
float8 work with the TP API, the input_fn/output_fn of the TP parallel
styles would need access to the module, where the module might
encapsulate a `dynamic_linear.emulate` attribute that is useful for
input/output casting.

Since this is needed for fp8 and DTensor is still under prototype release,
I feel it's worth the change, and it's better we make the change as early
as possible.

Right now this is a soft BC break, which means we still maintain BC
but throw deprecation messages.
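
A sketch of the new, module-aware hook signatures; the attribute check inside input_fn is hypothetical:
```
import torch
from torch.distributed._tensor import distribute_module, distribute_tensor, Shard

def partition_fn(name, module, device_mesh):
    # Shard each parameter of this submodule on dim 0 (illustrative choice).
    for pname, param in module.named_parameters(recurse=False):
        sharded = distribute_tensor(param, device_mesh, [Shard(0)])
        module.register_parameter(pname, torch.nn.Parameter(sharded))

def input_fn(module, inputs, device_mesh):
    # The module is now passed in, so the hook can read attributes off it,
    # e.g. a hypothetical fp8 casting flag: getattr(module, "emulate", False)
    return inputs

def output_fn(module, outputs, device_mesh):
    return outputs

sharded_model = distribute_module(model, device_mesh, partition_fn, input_fn, output_fn)
```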

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120895
Approved by: https://github.com/tianyu-l
2024-03-04 07:22:32 +00:00
Tianyu Liu
af5376c444 [dtensor] add support for loss parallel (#119877)
Loss parallel is the last piece of sequence parallelism to enable. It enables efficient distributed cross entropy computation when the input is sharded on the class dimension (in a classification problem with many classes). The implementation is via a context manager `loss_parallel`, after enabling which users can directly use `torch.nn.functional.cross_entropy` or `torch.nn.CrossEntropyLoss` without modifying other parts of their code.
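
A minimal usage sketch: under the context manager, the regular cross_entropy call works on logits that are sharded on the class dimension:
```
import torch.nn.functional as F
from torch.distributed.tensor.parallel import loss_parallel

with loss_parallel():
    # `logits` is assumed to be a DTensor sharded on the last (class) dimension.
    loss = F.cross_entropy(logits, labels)
    loss.backward()
```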

Here are the underlying rationales why we are going through these op replacements:

1. `nn.functional.cross_entropy` is the common method that OSS users use for things like transformer training. To avoid changing user code, we want users to keep using this function for loss calculation if they already do.
2. `nn.functional.cross_entropy` boils down to `aten.log_softmax` and `aten.nll_loss_forward/backward`, and DTensor already supports those ops (#117723 #119255 #118917 #119256). They do the computation with the input *replicated* on the class dimension.
3. However, when the input of this loss calculation is **sharded on the class dimension**, to run the sharded computation efficiently we need to run both `aten.log_softmax` and `aten.nll_loss_forward` with multiple all-reduce collectives **in the middle of** those aten ops. This is not possible if we just override these two ops, so we need some way to **decompose** them into smaller ops so that collectives can run in between.
4. We explored the existing decompositions (#118950). They seem to work, except that `log_softmax_backward` and `nll_loss_backward` combined together in aten are implemented in an inefficient way, which would trigger an additional expensive collective. Recently some users also reported similar issues https://github.com/pytorch/pytorch/issues/119261.
5. Therefore, we currently do our own decomposition inside a context manager, specifically for sequence parallelism. Once we have a better decomposition in core, we can take that instead of reinventing the wheel here.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/119877
Approved by: https://github.com/wanchaol
2024-03-02 05:06:26 +00:00
Mihir Patel
6efda849b5 Update chunk_dtensor to support HYBRID_SHARD (#119481)
Fixes https://github.com/pytorch/pytorch/issues/118639.

Adds support to replicate across HSDP dimensions instead of sharding for shard placement

Pull Request resolved: https://github.com/pytorch/pytorch/pull/119481
Approved by: https://github.com/Skylion007, https://github.com/wz337
2024-02-09 01:30:53 +00:00
Mihir Patel
88e346680b Patch all_gather to support HSDP + TP (#118638)
Update all_gather to support HSDP + TP.

Currently, the `_all_gather_dtensor` function for dtensors only replaces the first dimension with replicate (the FSDP dimension) and does not touch the second dimension (which is assumed to be the TP dimension). With HSDP, we have two dimensions ahead of the TP dimension instead of one. This PR updates it to replace all non-TP dimensions with replicate before running the all-gather.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/118638
Approved by: https://github.com/fegin, https://github.com/awgu, https://github.com/wz337
2024-02-05 18:29:23 +00:00
Catherine Lee
4f5785b6b3 Enable possibly-undefined error code (#118533)
Fixes https://github.com/pytorch/pytorch/issues/118129

Suppressions automatically added with

```
import re

with open("error_file.txt", "r") as f:
    errors = f.readlines()

error_lines = {}
for error in errors:
    match = re.match(r"(.*):(\d+):\d+: error:.*\[(.*)\]", error)
    if match:
        file_path, line_number, error_type = match.groups()
        if file_path not in error_lines:
            error_lines[file_path] = {}
        error_lines[file_path][int(line_number)] = error_type

for file_path, lines in error_lines.items():
    with open(file_path, "r") as f:
        code = f.readlines()
    for line_number, error_type in sorted(lines.items(), key=lambda x: x[0], reverse=True):
        code[line_number - 1] = code[line_number - 1].rstrip() + f"  # type: ignore[{error_type}]\n"
    with open(file_path, "w") as f:
        f.writelines(code)
```

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Co-authored-by: Catherine Lee <csl@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118533
Approved by: https://github.com/Skylion007, https://github.com/zou3519
2024-01-30 21:07:01 +00:00
PyTorch MergeBot
40ece2e579 Revert "Enable possibly-undefined error code (#118533)"
This reverts commit 4f13f69a45.

Reverted https://github.com/pytorch/pytorch/pull/118533 on behalf of https://github.com/clee2000 due to sorry i'm trying to figure out a codev merge conflict, if this works i'll be back to rebase and merge ([comment](https://github.com/pytorch/pytorch/pull/118533#issuecomment-1917695185))
2024-01-30 19:00:34 +00:00
Edward Z. Yang
4f13f69a45 Enable possibly-undefined error code (#118533)
Fixes https://github.com/pytorch/pytorch/issues/118129

Suppressions automatically added with

```
import re

with open("error_file.txt", "r") as f:
    errors = f.readlines()

error_lines = {}
for error in errors:
    match = re.match(r"(.*):(\d+):\d+: error:.*\[(.*)\]", error)
    if match:
        file_path, line_number, error_type = match.groups()
        if file_path not in error_lines:
            error_lines[file_path] = {}
        error_lines[file_path][int(line_number)] = error_type

for file_path, lines in error_lines.items():
    with open(file_path, "r") as f:
        code = f.readlines()
    for line_number, error_type in sorted(lines.items(), key=lambda x: x[0], reverse=True):
        code[line_number - 1] = code[line_number - 1].rstrip() + f"  # type: ignore[{error_type}]\n"
    with open(file_path, "w") as f:
        f.writelines(code)
```

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/118533
Approved by: https://github.com/Skylion007, https://github.com/zou3519
2024-01-30 05:08:10 +00:00
Wanchao Liang
e696fa1ee7 [tp] enable rowwise embedding sharding in RowwiseParallel (#118242)
As titled, this PR enables rowwise embedding sharding in the
RowwiseParallel style, and adds tests to ensure it works as expected.
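
A sketch of the newly supported case: RowwiseParallel applied to an nn.Embedding, assuming the embedding submodule is named "tok_embeddings" (hypothetical name):
```
from torch.distributed.tensor.parallel import parallelize_module, RowwiseParallel

model_tp = parallelize_module(
    model,
    tp_mesh,
    {"tok_embeddings": RowwiseParallel()},   # shard the embedding table row-wise (vocab dim)
)
```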

Pull Request resolved: https://github.com/pytorch/pytorch/pull/118242
Approved by: https://github.com/tianyu-l
ghstack dependencies: #118079, #118080
2024-01-26 19:01:24 +00:00
PyTorch MergeBot
bc67f87559 Revert "[tp] enable rowwise embedding sharding in RowwiseParallel (#118242)"
This reverts commit 7a9012d7e8.

Reverted https://github.com/pytorch/pytorch/pull/118242 on behalf of https://github.com/DanilBaibak due to Break internal build ([comment](https://github.com/pytorch/pytorch/pull/118079#issuecomment-1911681293))
2024-01-26 08:47:14 +00:00
Wanchao Liang
7a9012d7e8 [tp] enable rowwise embedding sharding in RowwiseParallel (#118242)
As titled, this PR enables rowwise embedding sharding in the
RowwiseParallel style, and adds tests to ensure it works as expected.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/118242
Approved by: https://github.com/tianyu-l
ghstack dependencies: #118079, #118080
2024-01-26 01:36:24 +00:00
wz337
e1f9eca113 [DeviceMesh] Reuse sub_group pg if exists (#115716)
Currently, we create a new_group for the sub_group pg during mesh initialization. The PR changes this so we will:
1) re-use the sub_group pg if it exists,
2) create a new sub_group pg if it does not exist.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115716
Approved by: https://github.com/wanchaol
2024-01-25 18:07:16 +00:00
Wanchao Liang
2bb2cc0b71 [tp] add clarification to doc and improve TP examples (#117618)
This PR adds a clarification about the evenly-sharded assumption in the main
TP doc and improves the TP examples by adding device mesh construction.

fixes https://github.com/pytorch/pytorch/issues/100044

Pull Request resolved: https://github.com/pytorch/pytorch/pull/117618
Approved by: https://github.com/wconstab, https://github.com/awgu
2024-01-22 18:56:50 +00:00
Wanchao Liang
d8bdb50379 [reland] pass shape/stride during tensor unflatten (#117340)
Reland of https://github.com/pytorch/pytorch/pull/113547, as the previous
PR was reverted because of a torch.compile symbolic shape issue. Since we now
disable tensor unflatten with dynamo.disable, we should not hit this issue again.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/117340
Approved by: https://github.com/Skylion007
ghstack dependencies: #117336
2024-01-13 19:33:47 +00:00
Wanchao Liang
848cfe8d45 [reland] unflatten_tensor on compute stream for DTensorExtension (#117020)
reland of https://github.com/pytorch/pytorch/pull/116559, which was reverted by internal.

The underlying reason for the revert is that torch.dynamo.disable can't be used in the
PyTorch codebase, as it conflicts with torch.deploy; although the latter
only runs inference, it somehow takes a dependency on FSDP.

We have seen this issue with our functional collectives as well: we can't
use any dynamo components, otherwise torch.deploy would complain.

Verified internally that after removing torch.dynamo.disable the test
passed again.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/117020
Approved by: https://github.com/awgu
2024-01-09 21:25:15 +00:00
Qinfan Wu
b847290ddd Back out "[2d] unflatten_tensor on compute stream for DTensorExtension (#116559)" (#116939)
Summary:
Original commit changeset: 65298112f3db

Original Phabricator Diff: D52530451

Differential Revision: D52583345

Pull Request resolved: https://github.com/pytorch/pytorch/pull/116939
Approved by: https://github.com/842974287
2024-01-07 03:53:40 +00:00
Wanchao Liang
b10cb168a7 [tp] disable some assertion temporarily for torch.compile (#116573)
Disable some runtime assertions for now, as they do not work properly with
torch.compile. I'll have a follow-up fix in dynamo and then re-enable
this check.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/116573
Approved by: https://github.com/awgu, https://github.com/XilunWu
ghstack dependencies: #116426, #116559
2024-01-03 23:01:19 +00:00
Wanchao Liang
d9c0e37bab [2d] unflatten_tensor on compute stream for DTensorExtension (#116559)
Context: the existing FSDPExtension has a bug when unflattening a tensor
involves compute/communication on a CUDA stream. The current FSDPExtension
logic runs unflatten_tensor in the unshard stream, so the runtime loses
synchronization with the compute stream; if there are dependencies between
the compute stream and the unflatten tensor logic, the missing sync point
could possibly lead to NaNs.

This PR makes the FSDPExtension record the compute stream and lets the
DTensorExtension use the compute stream directly for unflatten_tensor.

In the long term we might want to make the FSDP runtime logic only perform
the unshard in the unshard stream, and run the unshard views in the compute
stream. We currently fix this in the Extension directly, as it is the
simplest thing to do without affecting the FSDP runtime logic.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/116559
Approved by: https://github.com/awgu, https://github.com/fduwjj, https://github.com/yifuwang
ghstack dependencies: #116426
2024-01-03 07:29:08 +00:00
Wanchao Liang
dfccaac31b [2d] Ensure gradient clear out pending AsyncCollectiveTensor in FSDP Extension (#116122)
As titled, this PR adds a gradient hook to the FSDP DTensor extension to check whether any gradients are AsyncCollectiveTensors; if so, we eagerly wait on them there.

This is needed because a parameter's gradient might still be a pending AsyncCollectiveTensor; if we feed it directly to FSDP, FSDP would use the ACT's storage for the reduce_scatter, which is wrong.
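
A sketch of the idea behind the hook (not the extension code itself); the `wait()` call on AsyncCollectiveTensor is assumed here:
```
from torch.distributed._functional_collectives import AsyncCollectiveTensor

def _wait_pending_grad(param):
    grad = param.grad
    if isinstance(grad, AsyncCollectiveTensor):
        # Materialize the pending collective before FSDP touches the storage.
        param.grad = grad.wait()
```
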
Pull Request resolved: https://github.com/pytorch/pytorch/pull/116122
Approved by: https://github.com/awgu, https://github.com/fduwjj
2024-01-02 22:11:36 +00:00
Carlos Mocholí
9df4ee8d38 Fix ColwiseParallel typo (#116151)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/116151
Approved by: https://github.com/wanchaol
2023-12-20 06:40:32 +00:00
Tianyu Liu
2a5659a797 add length assertion to PrepareModuleInput and PrepareModuleOutput (#115957)
## summary

`zip(inputs, self.input_layouts, self.desired_input_layouts)` is used in `_prepare_input_fn`; similar for `_prepare_output_fn`. Without an assertion, unmatched entries in inputs/outputs are silently dropped, potentially causing unexpected behaviors.
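
The pitfall being guarded against, as a generic sketch: zip stops at the shortest iterable, so a mismatched number of layouts would silently drop inputs:
```
inputs = ("a", "b", "c")
input_layouts = ("x", "y")               # one layout short

print(list(zip(inputs, input_layouts)))  # [('a', 'x'), ('b', 'y')] -- "c" silently dropped

# The added assertion surfaces the mismatch instead of losing an input.
assert len(inputs) == len(input_layouts), "inputs and input_layouts must have the same length"
```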

## test plan
`python test/distributed/tensor/parallel/test_tp_style.py`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115957
Approved by: https://github.com/wanchaol
2023-12-18 21:50:18 +00:00
Wanchao Liang
a1a0b290d2 [tp] further fix the docs (#115974)
A typo resulted in the note section not rendering properly; this wasn't
visible from the last PR directly, as the last PR only showed the first
commit's documentation :(

Also makes the parallelize_module doc example more concrete.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115974
Approved by: https://github.com/wz337
2023-12-18 20:41:53 +00:00
Wanchao Liang
61abacf829 [tp] improve documentation (#115880)
Improve the TP documentation in terms of format and descriptions

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115880
Approved by: https://github.com/XilunWu
2023-12-15 18:44:22 +00:00
Iris Zhang (PyTorch)
23fa9621e4 [DeviceMesh] Rename _device_mesh.py to device_mesh.py to prepare for beta (#115099) (#115193)
Summary:

Rename _device_mesh.py to device_mesh.py, update all callsites, and add documentation.
We created stubs for the public class and methods in torch.distributed.device_mesh so that torch.distributed.device_mesh can be imported whether or not distributed is available.

Original diff reverted: D51629761
Original PR reverted: https://github.com/pytorch/pytorch/pull/115099
Prior to landing, all CI signals had passed. ShipIt added the "ci/trunk" label to the PR, DID NOT wait for it, and went ahead with committing. More context can be found in the reverted PR above.

Test Plan: CI.

Differential Revision: D51861018

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115193
Approved by: https://github.com/fegin
2023-12-08 08:44:32 +00:00
Nikita Shulga
a827ac71f2 Revert "[DeviceMesh] Rename _device_mesh.py to device_mesh.py to prepare for beta (#115099)"
This reverts commit eaa64339d6.
2023-12-05 08:59:36 -08:00
Iris Zhang (PyTorch)
eaa64339d6 [DeviceMesh] Rename _device_mesh.py to device_mesh.py to prepare for beta (#115099)
Summary:
Rename _device_mesh.py to device_mesh.py, update all callsites, and add documentation.

Original diff reverted: D51629761
Original PR reverted: https://github.com/pytorch/pytorch/pull/114991
It was failing a public module binding test on macOS due to the change in import order in torch/distributed/fsdp/_common_utils.py. Since the original import still works, we removed the changes to this file.

Test Plan: CI.

Differential Revision: D51825114

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115099
Approved by: https://github.com/wanchaol, https://github.com/fegin
2023-12-05 05:44:52 +00:00
PyTorch MergeBot
3a2e2044cd Revert "[DeviceMesh] Rename _device_mesh.py to device_mesh.py to prepare for beta (#114710) (#114991)"
This reverts commit 729ac7317a.

Reverted https://github.com/pytorch/pytorch/pull/114991 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/114991#issuecomment-1837214567))
2023-12-02 17:55:51 +00:00
Wanchao Liang
28925902fa [TP] fully rewrite Tensor Parallel APIs (#114732)
This PR rewrites the Tensor Parallel implementation. The Tensor Parallel APIs
are supposed to be a very thin wrapper around the DTensor APIs, but the current
implementation got too messy and buggy, and it's really hard to debug what
went wrong when using it. It's crucially important for advanced users and
developers to understand the API and its implementation easily, without
going through all the different functions and utils, so that
they can trust what happens under the hood.

In particular this PR:

* Make ParallelStyle a real contract API for parallelize_module to take; each
  concrete ParallelStyle only needs to implement `apply` to apply the sharding
  to an nn.Module, and all unnecessary fields are removed. This also enables
  easier ParallelStyle authoring going forward.
* Keep the ColwiseParallel and RowwiseParallel public interface, but refactor
  them so that the parameter sharding and the input/output handling live within
  the style itself, making it easy to understand how Linear/Embedding layers
  are sharded and how the input/output transformations are performed
  (a usage sketch follows this list).
* Remove the private _prepare_input/_prepare_output_fn fields for both
  ColwiseParallel/RowwiseParallel. Since we have thrown deprecation messages in
  nightly for a while, TP is a prototype release, and the fields are private,
  it should be safe to remove them.
* Refactor the recently landed PrepareModuleInput/Output styles: change
  output_layouts to desired_input/output_layouts, group the function inside
  the style itself, and make these two styles take no default arguments so
  that users must specify them and think about the sharding layouts. Also
  fixed bugs around not handling the `use_local_output` flag.
* Make default arguments None instead of Placement objects; it is standard
  Python practice not to use a custom object instance as a default argument.
* Remove all dead APIs (i.e. the PairwiseParallel and SequenceParallel styles,
  and all prepare input/output functions), as we have thrown deprecation
  messages for a while and are in the process of removing all of them from the tests.
* Throw a deprecation warning for `tp_mesh_dim`, as we recommend using device
  mesh slicing/indexing instead of manually specifying the mesh dim.
* Rewrite the documentation for every ParallelStyle and make it clearer
  about what each style is doing.
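
A minimal sketch of the rewritten API surface, assuming a module with submodules named "w1" and "w2" (hypothetical names):
```
from torch.distributed.tensor.parallel import (
    ColwiseParallel,
    RowwiseParallel,
    parallelize_module,
)

model_tp = parallelize_module(
    model,
    tp_mesh,
    {
        "w1": ColwiseParallel(),   # shard the weight on the output dimension
        "w2": RowwiseParallel(),   # shard the weight on the input dimension
    },
)
```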

TODOs:
* Rewrite TP tests to adjust for the changes we have in this PR
* add more tests to guard the bug fixes

Differential Revision: [D51761183](https://our.internmc.facebook.com/intern/diff/D51761183)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114732
Approved by: https://github.com/wz337, https://github.com/fduwjj
2023-12-02 08:18:12 +00:00
Iris Zhang (PyTorch)
729ac7317a [DeviceMesh] Rename _device_mesh.py to device_mesh.py to prepare for beta (#114710) (#114991)
Summary:

Same content of changes as https://github.com/pytorch/pytorch/pull/114710

Rename _device_mesh.py to device_mesh.py, update all callsites, and add documentation.
ghstack-source-id: 208980207
exported-using-ghexport

Test Plan: CI.

Reviewed By: wanchaol

Differential Revision: D51629761

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114991
Approved by: https://github.com/wanchaol, https://github.com/fduwjj, https://github.com/fegin
2023-12-02 04:39:41 +00:00
Wanchao Liang
4875e4d63f [tp] delete dead code (#114731)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114731
Approved by: https://github.com/fegin, https://github.com/wz337
2023-12-01 06:35:42 +00:00
wz337
7b3e45be59 [DeviceMesh] Rename get_dim_groups to get_group (#114708)
Rename get_dim_groups to get_group and update all callsites.
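
A minimal sketch of the renamed accessor on a 2-D mesh (the mesh construction here is illustrative):
```
from torch.distributed.device_mesh import init_device_mesh

mesh_2d = init_device_mesh("cuda", (2, 4), mesh_dim_names=("dp", "tp"))
tp_pg = mesh_2d.get_group(1)   # previously: get_dim_groups(1)
```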

Differential Revision: [D51629801](https://our.internmc.facebook.com/intern/diff/D51629801/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114708
Approved by: https://github.com/XilunWu, https://github.com/wanchaol, https://github.com/fegin
2023-11-30 23:40:14 +00:00
wz337
febbc48f43 [DeviceMesh] Make our mesh_dim kwarg naming consistent (#114707)
Changing `size(self, dim: Optional[int] = None)` to `size(self, mesh_dim: Optional[int] = None)` so it is consistent with the rest of our APIs.

We also update this API usage change in both PT and internal (pyper, APS).

Differential Revision: [D51602986](https://our.internmc.facebook.com/intern/diff/D51602986/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114707
Approved by: https://github.com/XilunWu, https://github.com/wanchaol, https://github.com/fegin
2023-11-29 19:43:23 +00:00
Iris Zhang
72ce5dd13e [2D] Remove enable_2d_with_fsdp() API and make remove_enable_2d_with_fsdp private (#112473)
As we have our new 2D flow out, we want to remove `enable_2d_with_fsdp()`.
In addition, we change pre_dp_module_transform to private, as we may need to change the UX later on.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112473
Approved by: https://github.com/fegin, https://github.com/wanchaol
2023-11-16 01:14:00 +00:00
PyTorch MergeBot
277474f1a0 Revert "[2d] pass shape/stride during tensor unflatten (#113547)"
This reverts commit 93372455a7.

Reverted https://github.com/pytorch/pytorch/pull/113547 on behalf of https://github.com/wanchaol due to broken compile test ([comment](https://github.com/pytorch/pytorch/pull/113547#issuecomment-1813048318))
2023-11-15 18:32:54 +00:00
Wanchao Liang
93372455a7 [2d] pass shape/stride during tensor unflatten (#113547)
As titled. Built on top of the work @wz337 enabled, this saves some
runtime CPU time when recreating DTensor parameters with the correct
shape/stride, and avoids issues with unevenly sharded parameters.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113547
Approved by: https://github.com/XilunWu
ghstack dependencies: #113323, #113324
2023-11-14 09:28:09 +00:00
NVS Abhilash
44c0521e8c fix: docstring error in torch/distributed module (#113241)
Fixes: #113193

`pydocstyle <all_files_in_issue> --count`

- Before: 345
- After: 130

For deprecated methods, I have added a `noqa` to ignore them. I was not able to find the file `torch/distributed/tensor/parallel/multihead_attention_tp.py`, so I've ignored it for this PR.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113241
Approved by: https://github.com/kit1980
2023-11-09 19:10:20 +00:00
Iris Zhang
12c1465d76 [DeviceMesh] Make mesh_resources private (#112294)
This is to prepare moving DeviceMesh as a standalone distributed package.

`_mesh_resources` should only be used in torch.distributed package.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112294
Approved by: https://github.com/fegin
2023-10-28 17:28:46 +00:00
Wanchao Liang
033680c9af [tp] fix PrepareModuleInput for multiple inputs (#112204)
Not all inputs need sharding annotations and conversion to DTensors. If the
user annotates only some inputs and marks the rest as None, we should skip
creating DTensors for those.
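
A sketch of the fixed behavior, using the style's later argument names (input_layouts / desired_input_layouts) for illustration:
```
from torch.distributed._tensor import Replicate, Shard
from torch.distributed.tensor.parallel import PrepareModuleInput

style = PrepareModuleInput(
    input_layouts=(Shard(0), None),            # second input: left as a plain tensor
    desired_input_layouts=(Replicate(), None),
)
```
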
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112204
Approved by: https://github.com/fduwjj
2023-10-27 05:08:05 +00:00
Iris Z
185e76238d [2D][Documentation] Add some comments to _chunk_dtensor (#111775)
As title.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111775
Approved by: https://github.com/awgu
2023-10-23 20:43:03 +00:00
fduwjj
fdc29f58c6 [TP] Refactor style to make it work with torch.compile (#111625)
We are refactoring the parallel styles to address the following:
1. Further simplify the code logic to make it more readable for users.
2. Remove the tuple check so that it works with dynamo for now. Ideally dynamo should support this case, and we will fix that in parallel.
3. Add tests for the newly added parallel styles in UT and torch.compile tests so that we can catch regressions due to code changes.
4. Move the placements early-return check into DTensor, since it is bypassed by dynamo.
5. Remove PairwiseParallelStyle from unit tests in favor of the new Col/Rowwise parallel styles.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111625
Approved by: https://github.com/wanchaol
2023-10-20 19:20:43 +00:00