pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Author	SHA1	Message	Date
Carlos Mocholí	9df4ee8d38	Fix ColwiseParallel typo (#116151) Pull Request resolved: https://github.com/pytorch/pytorch/pull/116151 Approved by: https://github.com/wanchaol	2023-12-20 06:40:32 +00:00
Tianyu Liu	2a5659a797	add length assertion to PrepareModuleInput and PrepareModuleOutput (#115957) ## summary `zip(inputs, self.input_layouts, self.desired_input_layouts)` is used in `_prepare_input_fn`; similar for `_prepare_output_fn`. Without the assertion, unmatched entries in inputs/outputs are silently dropped, potentially causing unexpected behaviors. ## test plan `python test/distributed/tensor/parallel/test_tp_style.py` Pull Request resolved: https://github.com/pytorch/pytorch/pull/115957 Approved by: https://github.com/wanchaol	2023-12-18 21:50:18 +00:00
Wanchao Liang	a1a0b290d2	[tp] further fix the docs (#115974) Some typos caused the note section to not render properly; this wasn't visible in the last PR directly, as the last PR only showed the first commit's documentation :( Also make the parallelize_module doc example more concrete. Pull Request resolved: https://github.com/pytorch/pytorch/pull/115974 Approved by: https://github.com/wz337	2023-12-18 20:41:53 +00:00
Wanchao Liang	61abacf829	[tp] improve documentation (#115880) Improve the TP documentation in terms of format and descriptions. Pull Request resolved: https://github.com/pytorch/pytorch/pull/115880 Approved by: https://github.com/XilunWu	2023-12-15 18:44:22 +00:00
Wanchao Liang	28925902fa	[TP] fully rewrite Tensor Parallel APIs (#114732) This PR rewrites the Tensor Parallel implementation. The Tensor Parallel APIs are supposed to be a very thin wrapper over the DTensor APIs, but the current implementation got too messy and buggy, and it's really hard to debug what went wrong when using it. It's crucially important for advanced users and developers to understand the API and its implementation easily, without going through all the different types of functions and utils, so that they can trust what happens under the hood. In particular this PR: * Makes ParallelStyle a real contract API for parallelize_module to take; each concrete ParallelStyle only needs to implement `apply` to apply the sharding to an nn.Module, and all unnecessary fields are removed. This also enables easier ParallelStyle authoring going forward. * Keeps the ColwiseParallel and RowwiseParallel public interface, but refactors them so that parameter sharding and input/output handling live within the style itself, making it easy to understand how Linear/Embedding layers are sharded and how the input/output transformations are performed. * Removes all the private _prepare_input/_prepare_output_fn fields for both ColwiseParallel and RowwiseParallel. Since we have thrown deprecation messages in nightly for a while, TP is on prototype release, and the fields are private, it should be safe to remove them. * Refactors the recently landed PrepareModuleInput/Output styles: changes output_layouts to desired_input/output_layouts, groups the functions inside the style itself, and gives these two styles no default arguments, so users need to specify them and think about the sharding layouts. Fixes bugs about not handling the `use_local_output` flag. * Makes default arguments None instead of Placement objects; it is standard Python practice not to use custom object instances as default arguments. * Removes all dead APIs (i.e. PairwiseParallel and SequenceParallel styles, all prepare input/output functions), as we have thrown deprecation messages for a while and are in the process of removing all of them from the tests. * Throws a deprecation warning for `tp_mesh_dim`, as we recommend using device mesh slicing/indexing instead of manually specifying the mesh dim. * Rewrites the documentation for every ParallelStyle to make it clearer what each style is doing. TODOs: * Rewrite TP tests to adjust for the changes in this PR * Add more tests to guard the bug fixes Differential Revision: [D51761183](https://our.internmc.facebook.com/intern/diff/D51761183) Pull Request resolved: https://github.com/pytorch/pytorch/pull/114732 Approved by: https://github.com/wz337, https://github.com/fduwjj	2023-12-02 08:18:12 +00:00
NVS Abhilash	44c0521e8c	fix: docstring error in torch/distributed module (#113241 ) Fixes: #113193 `pydocstyle <all_files_in_issue> --count` - Before: 345 - After: 130 For deprecated methods, I have added a `noqa` to ignore them. I was not able to find the file `torch/distributed/tensor/parallel/multihead_attention_tp.py`, so I've ignored it for this PR. Pull Request resolved: https://github.com/pytorch/pytorch/pull/113241 Approved by: https://github.com/kit1980	2023-11-09 19:10:20 +00:00
Wanchao Liang	033680c9af	[tp] fix PrepareModuleInput for multiple inputs (#112204) Not all inputs need sharding annotations and conversion to DTensors. If the user annotates only one input and marks the rest as None, we should skip creating DTensors for the unannotated inputs. Pull Request resolved: https://github.com/pytorch/pytorch/pull/112204 Approved by: https://github.com/fduwjj	2023-10-27 05:08:05 +00:00
fduwjj	fdc29f58c6	[TP] Refactor style to make it work with torch.compile (#111625) We are refactoring the parallel style to address the following: 1. Further simplify the code logic to make it more readable for users. 2. Remove the tuple check so that we can work with dynamo for now. Ideally dynamo needs to support this case and we will fix it in parallel. 3. Add tests for the newly added parallel styles in UT and the torch compile test so that we can catch regressions due to code changes. 4. Move the placements early-return check into DTensor since it is bypassed by dynamo. 5. Remove PairwiseParallelStyle from unit tests and use the new Col/Rowwise parallel styles. Pull Request resolved: https://github.com/pytorch/pytorch/pull/111625 Approved by: https://github.com/wanchaol	2023-10-20 19:20:43 +00:00
Wanchao Liang	03e28bde2e	[tp] fix torch compile regression (#111521) The most recent refactor of TP, https://github.com/pytorch/pytorch/pull/111160, breaks the torch compile path, so revert the behavior by: 1. using the old default prepare_input/output 2. adding the colwise/rowwise parallel test instead Pull Request resolved: https://github.com/pytorch/pytorch/pull/111521 Approved by: https://github.com/fduwjj	2023-10-19 10:27:10 +00:00
Wanchao Liang	59281d5631	[tp] fix SP style regression (#111353) Although we want to remove prepare_input/output, we should still keep the old behavior for SequenceParallel. Pull Request resolved: https://github.com/pytorch/pytorch/pull/111353 Approved by: https://github.com/fduwjj	2023-10-16 17:18:17 +00:00
fduwjj	bfcd86955e	[TP] Fix TP doc format to show examples correctly (#111346 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/111346 Approved by: https://github.com/wanchaol ghstack dependencies: #111160, #111166, #111176, #111177	2023-10-16 06:15:10 +00:00
fduwjj	25a2845d78	[TP] Enable embedding sharding in TP API (#111177) We see use cases where embedding sharding is also needed in the TP API, so we enabled it in the API since DTensor already supports colwise embedding sharding. Pull Request resolved: https://github.com/pytorch/pytorch/pull/111177 Approved by: https://github.com/wanchaol ghstack dependencies: #111160, #111166, #111176	2023-10-15 11:49:56 +00:00
fduwjj	8085e08a84	[TP] Add prepareInput and output for input/output DTensor layout annotation in the parent module in TP API (#111166) In some use cases, we found that users might want to annotate the input/output DTensor layout for the parent module rather than for the submodule whose parameters are to be distributed. We therefore add these two classes so that users can annotate input/output DTensor layouts, and we register pre-FWD/FWD hooks for the TP-lized module. Pull Request resolved: https://github.com/pytorch/pytorch/pull/111166 Approved by: https://github.com/wanchaol ghstack dependencies: #111160	2023-10-14 15:37:52 +00:00
fduwjj	3a8b10e2da	[TP] Refactor Parallel Style to make it more usable (#111160) One thing we find challenging for users is the concept of prepare_input and prepare_out, which we don't want to expose since there are so many func names for users to select from, which is quite confusing. On the other hand, colwise and rowwise parallel always need input(out) and output(in) to be in a certain layout, so we can simplify the logic here and make it more usable. So we added three public attributes to ParallelStyle, and the code logic is like:
```python
class ParallelStyle(ABC):
    """
    The parallel style user wants the module or submodule to be parallelized.
    We can add more in future, but this seems sufficient for immediate needs.
    Users can extend this class to build their own parallel style with
    customized input/output preparations.
    """
    input_layouts: Union[placement, Tuple[placement]]
    output_layouts: Union[placement, Tuple[placement]]
    use_local: bool

class RowwiseParallel(ParallelStyle):
    """
    Partitioning the row of a module. We assume the input to be a sharded
    DTensor and output to be a replicate Tensor.
    """
    def __init__(self):
        super().__init__(input_layouts=Shard(-1), output_layouts=Replicate(), use_local=True)

class ColwiseParallel(ParallelStyle):
    """
    Partitioning the column of a module. We assume the input to be a
    Replicated DTensor and output to be a sharded DTensor.
    """
    def __init__(self):
        super().__init__(input_layouts=Replicate(), output_layouts=Shard(-1), use_local=True)

# For the case of Sequence parallel, users just set different input_shard,
# Shard(0) or Shard(1) instead of Replicate()

class PrepareModuleInput(ParallelStyle):
    """
    Only used to specify the input distribute spec for a module.
    """
    def __init__(self):
        super().__init__(input_layouts=Shard(0), output_layouts=Replicate(), use_local=False)

class PrepareModuleOutput(ParallelStyle):
    """
    Only used to specify the output distribute spec for a module.
    """
    def __init__(self):
        super().__init__(input_layouts=Replicate(), output_layouts=Shard(0), use_local=True)

parallelize_plan = {
    "embedding": ColwiseParallel(output_shard=Replicate()),
    "attn": PrepareModuleInput(),
    "attn.w1": ColwiseParallel(),
    "attn.w2": ColwiseParallel(),
    "attn.w3": ColwiseParallel(),
    "attn.wo": RowwiseParallel(),
}

parallelize_module(
    module=block,  # this can be a submodule or module
    device_mesh=mesh['tp'],
    parallelize_plan=parallelize_plan,
)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111160 Approved by: https://github.com/wanchaol	2023-10-14 15:26:36 +00:00
fduwjj	3828cd4b79	[TP][EZ] Update doc for TP parallel style (#107819) We need to update the doc for PairwiseParallel and SequenceParallel so that users don't get the wrong impression that these work for ``nn.Transformer``. Pull Request resolved: https://github.com/pytorch/pytorch/pull/107819 Approved by: https://github.com/awgu, https://github.com/wanchaol	2023-08-24 00:13:52 +00:00
fduwjj	953aa6d90e	[TP] Enable more generic attn in Tensor Parallelism (#100508) To make TP more generic for the Attention module, we came up with this new col/rowwise parallel style. The idea is that we only do DTensor ops for the col/rowwise sharded part and leave the rest of the ATen ops to Tensor ops. We set this behavior as the default for the Colwise and Rowwise parallel styles. If people want to customize it, they can always pass in a different prepare_input or prepare_output. Pull Request resolved: https://github.com/pytorch/pytorch/pull/100508 Approved by: https://github.com/wanchaol	2023-05-07 18:15:49 +00:00
fduwjj	89b1e67d0a	[Tensor Parallel] Add a new Colwise Parallel style when Pairwise cannot be directly used (#100137) In some use cases, users cannot directly use `PairwiseParallelStyle` and might need to specify colwise and rowwise separately. Pull Request resolved: https://github.com/pytorch/pytorch/pull/100137 Approved by: https://github.com/wz337	2023-04-28 03:27:51 +00:00
fduwjj	b209d8fa0d	[PT-D][Sequence Parallelism] Enable DTensor based Naive sequence parallelism (#94369 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/94369 Approved by: https://github.com/wanchaol	2023-02-16 21:21:00 +00:00
fduwjj	41e3189222	[PT-D][Tensor parallelism] Add documentations for TP (#94421) This is far from complete and we will definitely polish it down the road. Pull Request resolved: https://github.com/pytorch/pytorch/pull/94421 Approved by: https://github.com/wz337	2023-02-09 02:31:06 +00:00
Wanchao Liang	9b5e6b029f	[tp] ufmt distributed.tensor.parallel (#89969) cmd: `ufmt format torch/distributed/tensor` Pull Request resolved: https://github.com/pytorch/pytorch/pull/89969 Approved by: https://github.com/fduwjj	2022-12-01 20:58:16 +00:00
Wanchao Liang	4451eb24e6	Move tensor_parallel out to distributed.tensor folder (#89878 ) This PR moves tensor parallel from torch.distributed._tensor.parallel to torch.distributed.tensor.parallel, to prepare for beta release Pull Request resolved: https://github.com/pytorch/pytorch/pull/89878 Approved by: https://github.com/fduwjj	2022-11-30 22:13:10 +00:00
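
The length assertion added in #115957 guards against `zip` silently truncating to its shortest argument. A minimal sketch of that failure mode in plain Python, not the PyTorch implementation: the helper `pair_with_layouts` is hypothetical, the strings merely stand in for placement objects, and it raises `ValueError` where the PR description mentions an assertion.

```python
from typing import Any, List, Sequence, Tuple


def pair_with_layouts(
    inputs: Sequence[Any],
    input_layouts: Sequence[Any],
    desired_input_layouts: Sequence[Any],
) -> List[Tuple[Any, Any, Any]]:
    """Hypothetical helper: pair each input with its current and desired layout."""
    # Without this check, zip() stops at the shortest sequence and the
    # remaining inputs are dropped without any error.
    if not (len(inputs) == len(input_layouts) == len(desired_input_layouts)):
        raise ValueError(
            "inputs, input_layouts and desired_input_layouts must have the same length"
        )
    return list(zip(inputs, input_layouts, desired_input_layouts))


# Plain zip() silently loses the second input when only one layout is given.
truncated = list(zip(["x", "y"], ["Shard(0)"], ["Replicate()"]))

# With matching lengths, every input keeps its layout pair (None = not annotated).
paired = pair_with_layouts(["x", "y"], ["Shard(0)", None], ["Replicate()", None])
```

On Python 3.10 and later, `zip(..., strict=True)` raises on a length mismatch as well, though an explicit check allows a more descriptive error message.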

21 Commits