pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Author	SHA1	Message	Date
Iris Zhang (PyTorch)	eaa64339d6	[DeviceMesh] Rename _device_mesh.py to device_mesh.py to prepare for beta (#115099 ) Summary: Rename _device_mesh.py to device_mesh.py, update all callsites, adds documentation. Original diff reverted: D51629761 Original PR reverted: https://github.com/pytorch/pytorch/pull/114991 It was failing because failing a public module binding tests in MacOS, and this is due to the change in import order for torch/distributed/fsdp/_common_utils.py. Since this original import would still work, we remove the changes in this file. Test Plan: CI. Differential Revision: D51825114 Pull Request resolved: https://github.com/pytorch/pytorch/pull/115099 Approved by: https://github.com/wanchaol, https://github.com/fegin	2023-12-05 05:44:52 +00:00
PyTorch MergeBot	3a2e2044cd	Revert "[DeviceMesh] Rename _device_mesh.py to device_mesh.py to prepare for beta (#114710 ) (#114991 )" This reverts commit `729ac7317a`. Reverted https://github.com/pytorch/pytorch/pull/114991 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/114991#issuecomment-1837214567))	2023-12-02 17:55:51 +00:00
Wanchao Liang	28925902fa	[TP] fully rewrite Tensor Parallel APIs (#114732 ) This PR rewrites Tensor Parallel implementation. Tensor Parallel APIs supposed to be a very thin-wrapper to DTensor APIs, but the current implementation got too messy and buggy. It's really hard to debug what went wrong when using it. It's crucially important for advanced users or developers to understand the API and its implementation easily without going through all different types of functions and utils, so that they could trust what happen under the hood. In particular this PR: * Make ParallelStyle to be a real contract API for parallelize_module to take, each concrete ParallelStyle only needs to implement `apply` to apply the sharding to nn.Module, remove all non-necessary fields. This also enable easier ParallelStyle authoring going forward. * Keep the ColwiseParallel and RowwiseParallel public interface, but refactor them in a way that makes the parameter sharding, inputs and outputs handling lives within the style itself, so that it's easy to understand how Linear/Embedding layers are sharded and how the inputs/outputs transformations are performed. * remove all those private _prepare_input/_prepare_output_fn fields for both ColwiseParallel/RowwiseParallel. Since we throw deprecation messages in nightly for a while and TP is on prototype release, the fields are also private, it should be safe to remove them * Refactor the recently landed PrepareModuleInput/Output style, change output_layouts to desired_input/output_layouts, group the function inside the style itself, no default arguments for these two styles and user need to specify them to think about the sharding layouts. Fixed bugs about not handling `use_local_output` flag. * Make default arguments be None instead of Placement object, this is standard python practice to not have custom object instance as default argument * Remove all dead APIs (i.e. PairwiseParallel and SequenceParallel style, all prepare input/output functions) as we throw deprecation msgs for a while, and in the progress of removing all of them from the tests. * throw deprecation warning for `tp_mesh_dim` as we recomemnd use device mesh slice/indexing instead of manually specify mesh dim * Rewrite all documentations for every ParallelStyle and make the documentation more clear about what each style is doing TODOs: * Rewrite TP tests to adjust for the changes we have in this PR * add more tests to guard the bug fixes Differential Revision: [D51761183](https://our.internmc.facebook.com/intern/diff/D51761183) Pull Request resolved: https://github.com/pytorch/pytorch/pull/114732 Approved by: https://github.com/wz337, https://github.com/fduwjj	2023-12-02 08:18:12 +00:00
Iris Zhang (PyTorch)	729ac7317a	[DeviceMesh] Rename _device_mesh.py to device_mesh.py to prepare for beta (#114710 ) (#114991 ) Summary: Same content of changes as https://github.com/pytorch/pytorch/pull/114710 Rename _device_mesh.py to device_mesh.py, update all callsites, adds documentation. ghstack-source-id: 208980207 exported-using-ghexport Test Plan: CI. Reviewed By: wanchaol Differential Revision: D51629761 Pull Request resolved: https://github.com/pytorch/pytorch/pull/114991 Approved by: https://github.com/wanchaol, https://github.com/fduwjj, https://github.com/fegin	2023-12-02 04:39:41 +00:00
Wanchao Liang	4875e4d63f	[tp] delete dead code (#114731 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/114731 Approved by: https://github.com/fegin, https://github.com/wz337	2023-12-01 06:35:42 +00:00
wz337	7b3e45be59	[DeviceMesh] Rename get_dim_groups to get_group (#114708 ) Rename get_dim_groups to get_group and update all callsites. Differential Revision: [D51629801](https://our.internmc.facebook.com/intern/diff/D51629801/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/114708 Approved by: https://github.com/XilunWu, https://github.com/wanchaol, https://github.com/fegin	2023-11-30 23:40:14 +00:00
wz337	febbc48f43	[DeviceMesh] Make our mesh_dim kwarg naming consistent (#114707 ) Changing size(self, dim: Optional[int] = None) to def size(self, mesh_dim: Optional[int] = None) so it is consistent with the rest of our APIs. We also update this API usage change in both PT and internal (pyper, APS). Differential Revision: [D51602986](https://our.internmc.facebook.com/intern/diff/D51602986/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/114707 Approved by: https://github.com/XilunWu, https://github.com/wanchaol, https://github.com/fegin	2023-11-29 19:43:23 +00:00
Iris Zhang	72ce5dd13e	[2D] Remove enable_2d_with_fsdp() API and make remove_enable_2d_with_fsdp private (#112473 ) As we have our new 2D flow out, we want to remove `enable_2d_with_fsdp()`. In addition, we change pre_dp_module_transform to private, as we may need to change the UX later on. Pull Request resolved: https://github.com/pytorch/pytorch/pull/112473 Approved by: https://github.com/fegin, https://github.com/wanchaol	2023-11-16 01:14:00 +00:00
PyTorch MergeBot	277474f1a0	Revert "[2d] pass shape/stride during tensor unflatten (#113547 )" This reverts commit `93372455a7`. Reverted https://github.com/pytorch/pytorch/pull/113547 on behalf of https://github.com/wanchaol due to broken compile test ([comment](https://github.com/pytorch/pytorch/pull/113547#issuecomment-1813048318))	2023-11-15 18:32:54 +00:00
Wanchao Liang	93372455a7	[2d] pass shape/stride during tensor unflatten (#113547 ) as titled, built on top of the work @wz337 enabled, this could save some runtime CPU time to recreate DTensor parameters with correct shape/stride, and avoid issues when un-even sharding parameters Pull Request resolved: https://github.com/pytorch/pytorch/pull/113547 Approved by: https://github.com/XilunWu ghstack dependencies: #113323, #113324	2023-11-14 09:28:09 +00:00
NVS Abhilash	44c0521e8c	fix: docstring error in torch/distributed module (#113241 ) Fixes: #113193 `pydocstyle <all_files_in_issue> --count` - Before: 345 - After: 130 For deprecated methods, I have added a `noqa` to ignore them. I was not able to find the file `torch/distributed/tensor/parallel/multihead_attention_tp.py`, so I've ignored it for this PR. Pull Request resolved: https://github.com/pytorch/pytorch/pull/113241 Approved by: https://github.com/kit1980	2023-11-09 19:10:20 +00:00
Iris Zhang	12c1465d76	[DeviceMesh] Make mesh_resources private (#112294 ) This is to prepare moving DeviceMesh as a standalone distributed package. `_mesh_resources` should only be used in torch.distributed package. Pull Request resolved: https://github.com/pytorch/pytorch/pull/112294 Approved by: https://github.com/fegin	2023-10-28 17:28:46 +00:00
Wanchao Liang	033680c9af	[tp] fix PrepareModuleInput for multiple inputs (#112204 ) Not all inputs needs to annotate shardings and convert to DTensors, if user annotate only one inputs are mark the rest as Nones, we should skip creating DTensors Pull Request resolved: https://github.com/pytorch/pytorch/pull/112204 Approved by: https://github.com/fduwjj	2023-10-27 05:08:05 +00:00
Iris Z	185e76238d	[2D][Documentation] Add some comments to _chunk_dtensor (#111775 ) As title. Pull Request resolved: https://github.com/pytorch/pytorch/pull/111775 Approved by: https://github.com/awgu	2023-10-23 20:43:03 +00:00
fduwjj	fdc29f58c6	[TP] Refactor style to make it work with torch.compile (#111625 ) We are refactoring parallel style to solve the following things: 1. To further simplifying code logic to make more readable for users. 2. To remove tuple check so that we can work with dynamo for now. Ideally dynamo needs to support this case and we will fix it in parallel. 3. Add tests for newly added parallel style in UT and torch compile test so that we can capture regression due to code change. 4. Move placements early return check into DTensor since it is by passed by dynamo. 5. Remove PairwiseParallelStyle from unit tests to use the new Col/Rowwise parallel style. Pull Request resolved: https://github.com/pytorch/pytorch/pull/111625 Approved by: https://github.com/wanchaol	2023-10-20 19:20:43 +00:00
Wanchao Liang	03e28bde2e	[tp] fix torch compile regression (#111521 ) The most recent refactor of TP https://github.com/pytorch/pytorch/pull/111160 breaks torch compile path, so reverting the behavior back by: 1. use the old default prepare_input/output 2. add the colwise/rowwise parallel test instead Pull Request resolved: https://github.com/pytorch/pytorch/pull/111521 Approved by: https://github.com/fduwjj	2023-10-19 10:27:10 +00:00
Wanchao Liang	59281d5631	[tp] fix SP style regression (#111353 ) [tp] fix SP style regression Although we want to remove prepare_input/output, we should still keep the old behavior for SequenceParallel Pull Request resolved: https://github.com/pytorch/pytorch/pull/111353 Approved by: https://github.com/fduwjj	2023-10-16 17:18:17 +00:00
fduwjj	bfcd86955e	[TP] Fix TP doc format to show examples correctly (#111346 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/111346 Approved by: https://github.com/wanchaol ghstack dependencies: #111160, #111166, #111176, #111177	2023-10-16 06:15:10 +00:00
fduwjj	25a2845d78	[TP] Enable embedding sharding in TP API (#111177 ) We see use cases where embedding sharding is also needed in TP API so we enabled it in the API since DTensor already support colwise embedding sharding. Pull Request resolved: https://github.com/pytorch/pytorch/pull/111177 Approved by: https://github.com/wanchaol ghstack dependencies: #111160, #111166, #111176	2023-10-15 11:49:56 +00:00
fduwjj	8085e08a84	[TP] Add prepareInput and output for input/output DTensor layout annotation in the parent module in TP API (#111166 ) In some use cases, we found that users might want to annote the input/output DTensor layout for the parent module rather than the submodule whose parameters are to be distributed so that we want to have these two class for users to annote input/output DTensor layouts so that we register pre-FWD/FWD hook for the TP-lized module. Pull Request resolved: https://github.com/pytorch/pytorch/pull/111166 Approved by: https://github.com/wanchaol ghstack dependencies: #111160	2023-10-14 15:37:52 +00:00
fduwjj	3a8b10e2da	[TP] Refactor Parallel Style to make it more usable (#111160 ) One thing we find it challenging for users is that we don't want to expose the concept of prepare_input and prepare_out to users since there are so many func names for users to select from which is quite confusing. On the other hand, the colwise and rowwise parallel always need input(out) and output(in) to be certain layout so we can somehow simplify the logic here and make it more usable. So we added three public attributes to the parallelStyle here and the code logic is like: ```python class ParallelStyle(ABC): """ The parallel style user wants the module or submodule to be parallelized. We can add more in future, but this seems sufficient for immediate needs. Users can extend this class to build their own parallel style with customized input/output preparations. """ input_layouts: Union[placement, Tuple[placement]] output_layouts: Union[placement, Tuple[placement]] use_local: bool class RowwiseParallel(ParallelStyle): """ Partitioning the row of a module. We assume the input to be a sharded DTensor and output to be a replicate Tensor. """ def __init__(self): super().__init__(input_layouts=Shard(-1), output_layouts=Replicate(), use_local=True) Class ColwiseParallel(ParallelStyle): """ Partitioning the column of a module. We assume the input to be a Replicated DTensor and output to be a sharded DTensor. """ def __init__(self): super().__init__(input_layouts=Replicate(), output_layouts=Shard(-1), use_local=True) # For the case of Sequence parallel, users just set different input_shard, Shard(0) or Shard(1) instead of Replicate() Class PrepareModuleInput(ParallelStyle): """ Only used to specify the input distribute spec for a module. """ def __init__(self): super().__init__(input_layouts=Shard(0), output_layouts=Replicate(), use_local=False) Class PrepareModuleOutput(ParallelStyle): """ Only used to specify the output distribute spec for a module. """ def __init__(self): super().__init__(input_layouts=Replicate(), output_layouts=Shard(0), use_local=True) parallelize_plan = { "embedding": ColwiseParallel(output_shard=Replicate()), "attn": PrepareModuleInput(), "attn.w1": ColwiseParallel(), "attn.w2": ColwiseParallel(), "attn.w3": ColwiseParallel(), "attn.wo": RowwiseParallel(), } parallelize_module( module=block, # this can be a submodule or module device_mesh=mesh['tp'], parallelize_plan=parallelize_plan, ) ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/111160 Approved by: https://github.com/wanchaol	2023-10-14 15:26:36 +00:00
wz337	8e32e62f67	[TP] Validate TP mesh dim for 2D composition (#111001 ) Currently, we only support intranode TP when compositin TP with other parallelism. This PR adds additional check to validate the TP mesh dim in TP initialization when parent mesh exists. cc. @fegin, @fduwjj Pull Request resolved: https://github.com/pytorch/pytorch/pull/111001 Approved by: https://github.com/fduwjj	2023-10-12 02:11:44 +00:00
wz337	80dfc974dd	[2D] Enable 2D FSDP+TP model.load_state_dict() (#110925 ) This PR adds a all_gather_dtensor() method to fsdp/_fsdp_extensions.py and the actual implementation in tensor/parallel/fsdp.py. This enables FSDP to load 2D DTensor state_dict into model when calling `model.load_state_dict()`. cc. @fegin Pull Request resolved: https://github.com/pytorch/pytorch/pull/110925 Approved by: https://github.com/fegin ghstack dependencies: #110831, #110846	2023-10-11 18:22:20 +00:00
wz337	6c136c3302	[2D] Enable 2D DTensor state_dict for FSDP + TP (#110846 ) This PR adds a `chunk_dtensor()` method to fsdp/_fsdp_extensions.py and the actual implementation of `chunk_dtensor()` in tensor/parallel/fsdp.py. This enables FSDP to return 2D DTensor state_dict when composing FSDP with TP. cc. @fegin Pull Request resolved: https://github.com/pytorch/pytorch/pull/110846 Approved by: https://github.com/fegin, https://github.com/wanchaol ghstack dependencies: #110831	2023-10-11 17:40:39 +00:00
wz337	8140494afd	[3/N][2D] Enable training with new 2D flow (#110034 ) Replacing https://github.com/pytorch/pytorch/pull/109553 as it gets reverted. This PR enables training with new 2D flow and adds associated test. In addition, this PR moves the tensor/parallel/_data_parallel_utils.py that are fsdp specific back to tensor/parallel/fsdp.py to avoid circular dependency for ddp.py and test/distributed/tensor/parallel/test_ddp_2d_parallel.py. state_dict related changes would be in later PRs. cc. @fegin, @fduwjj, @wanchaol, @awgu Pull Request resolved: https://github.com/pytorch/pytorch/pull/110034 Approved by: https://github.com/fduwjj	2023-09-26 09:14:15 +00:00
PyTorch MergeBot	f5886bf352	Revert "[3/N][2D] Enable training with new 2D flow (#109553 )" This reverts commit `217b37c023`. Reverted https://github.com/pytorch/pytorch/pull/109553 on behalf of https://github.com/huydhn due to Sorry for reverting your change, but those distributed failures look legit and they are failing in trunk https://hud.pytorch.org/pr/109553 ([comment](https://github.com/pytorch/pytorch/pull/109553#issuecomment-1734100546))	2023-09-25 16:37:19 +00:00
wz337	217b37c023	[3/N][2D] Enable training with new 2D flow (#109553 ) This PR enables training with new 2D flow and adds associated test. state_dict related changes would be in later PRs. cc. @fegin, @fduwjj, @wanchaol, @awgu Pull Request resolved: https://github.com/pytorch/pytorch/pull/109553 Approved by: https://github.com/fegin, https://github.com/awgu	2023-09-25 05:32:07 +00:00
Chien-Chin Huang	1b3e5b53f3	[FSDP][optim_state_dict] Add device to _shard_utils.py to explicitly use the device from fsdp_state (#109631 ) _get_pg_default_device does not always get the device we want. This PR let the user explicitly tell use the correct device. Differential Revision: [D49425743](https://our.internmc.facebook.com/intern/diff/D49425743/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/109631 Approved by: https://github.com/awgu, https://github.com/fduwjj, https://github.com/wz337	2023-09-20 01:59:38 +00:00
Wanchao Liang	a29b9101fa	[dynamo] fix dynamo + DTensor to work with 2d (#108329 ) pair debugged with @wconstab and we found some issue in both dynamo and the TP's fsdp extension side. This PR fixes the dynamo + DTensor integration so that the current graph break FSDP can work with tensor parallel by moving the torch.compile after FSDP wrapping. Pull Request resolved: https://github.com/pytorch/pytorch/pull/108329 Approved by: https://github.com/Skylion007, https://github.com/wconstab	2023-08-31 22:46:26 +00:00
fduwjj	3828cd4b79	[TP][EZ] Update doc for TP parallel style (#107819 ) We need to update the doc for PairwiseParallel and SequenceParallel so that users don't get wrong impressions that these working for ``nn.Transformer``. Pull Request resolved: https://github.com/pytorch/pytorch/pull/107819 Approved by: https://github.com/awgu, https://github.com/wanchaol	2023-08-24 00:13:52 +00:00
Wanchao Liang	da765995fb	[2d] remove ShardedTensor from fsdp extension (#107472 ) 2D Parallel won't use ShardedTensor, and it causes headable for dynamo to recoginize it, removing it from the runtime flatten/unflatten path Pull Request resolved: https://github.com/pytorch/pytorch/pull/107472 Approved by: https://github.com/fduwjj	2023-08-21 17:16:07 +00:00
Wanchao Liang	d8f2ef10a6	[dtensor][1/n] refactor op dispatch logic to reduce overhead (#107305 ) This PR is the first change of a series of refactors to the op dispatch logic to: 1. remove the redundant logic in the op dispatch, simplify the error checking 2. reduce the number of tree_map/tree_flatten/unflatten needed to reduce the overhead coming from those operations 3. remove the CachedShardingPropagator by using lru_cache from functools directly, this makes it not only helps TP, but general DTensor operations could be faster! 4. change the view ops behavior by inplace changing the op_schema, which is dangerous for sharding prop caching, model the view op as one type of resharding too 5. enrich output sharding to include whether the op needs redistribute so that we don't need explicit op schema comparison to know it. This should help with further reducing the CPU overhead, benchmark results: before (without this change), aten.addmm latency: 0.476ms ![Screenshot 2023-08-16 at 10 46 26 AM](https://github.com/pytorch/pytorch/assets/9443650/7692e6c1-1936-4c7f-bf9c-6c8c9b8f6c76) after (with this change), aten.addmm latency: 0.341ms ![Screenshot 2023-08-16 at 11 05 49 AM](https://github.com/pytorch/pytorch/assets/9443650/15a53f0b-7a95-444e-ab2f-3ee0ad2fa47f) overall one layer of mlp time reduced from 13.535 -> 9.665ms Apart from overhead reduction, this PR simplifies the op dispatching logic and the resharding logic (more refactor needed to make things more clean, which will be done in later PRs) Pull Request resolved: https://github.com/pytorch/pytorch/pull/107305 Approved by: https://github.com/fduwjj	2023-08-18 18:30:46 +00:00
fduwjj	983fd5ba79	[2D][TP] Enable DDP TP integration with unit test (#106583 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/106583 Approved by: https://github.com/kumpera, https://github.com/fegin, https://github.com/wanchaol ghstack dependencies: #107313	2023-08-17 02:54:17 +00:00
fduwjj	f3b0d83fe3	[EZ][TP] Refactor FSDP 2D integration extension code so that it can re-used (#107313 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/107313 Approved by: https://github.com/wz337	2023-08-16 22:01:17 +00:00
fduwjj	4a6ca4cc05	[TP][DTensor Perf] Some perf improvement to reduce DTensor CPU overhead (#106524 ) By inspecting a small TP benchmark, we found couple things we can optimize: 1. We call deep_copy so many times when we initialize DTensor. 2. Some shading_prop is not cached successfully. 3. We are still calling redistribute when not necessary. ![image](https://github.com/pytorch/pytorch/assets/6937752/b847d110-eea1-45df-9298-066d0ba07dd7) ![image](https://github.com/pytorch/pytorch/assets/6937752/fc08f564-caed-496b-80d7-275c1dba3806) ![image](https://github.com/pytorch/pytorch/assets/6937752/fdc06cc4-a4ba-48e8-a118-c041bbd04f5e) So we want to: 1. Remove the deep_copy, and we now make placements a tuple so we are sure it's immutable. 2. Somehow the op_schema gets changed during sharding_op propogation, so we store a hash version of it before passing it to sharding_prop. Ideally we want to figure out why `op_schema` gets changed, but looks like in both index and detach/view op, all get changed, it might take more time to debug. 3. Also when we do hashing of op_schema, we want to hash the entire args_schema not just the args_spec which only contains the DTensorSpec from args which are Dtensors. 4. It turns out that sometimes, DTensor has mem_format to be None (not contiguous) and this will lead to redistribute get triggered, so that we only need to compare type/shape and stride in the metadata. Also we need to ensure _Partial and Shard have different hash value in the DTensorSpec. ![image](https://github.com/pytorch/pytorch/assets/6937752/321e6890-1ab6-4975-adc9-524c6ef9a76b) Pull Request resolved: https://github.com/pytorch/pytorch/pull/106524 Approved by: https://github.com/wanchaol	2023-08-14 20:03:19 +00:00
alanhe151220037	1afbc985fe	Make RNGStateTracker support cuda-like device (#106771 ) replace `CudaRNGStateTracker` with `RNGStateTracker` by rewriting some Cuda-binding code with `device_handle` Pull Request resolved: https://github.com/pytorch/pytorch/pull/106771 Approved by: https://github.com/wanchaol	2023-08-10 19:14:33 +00:00
fduwjj	487ebcac3b	Clean up unsed MHA code to avoid confusion (#105956 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/105956 Approved by: https://github.com/wz337, https://github.com/ezyang, https://github.com/wanchaol	2023-07-27 17:10:17 +00:00
FFFrog	9a1cdcb8a0	Format: fixing multiple string concatenation in single line (#106013 ) Fixing multiple string concatenation in single line Pull Request resolved: https://github.com/pytorch/pytorch/pull/106013 Approved by: https://github.com/albanD	2023-07-26 18:39:18 +00:00
Nikita Shulga	5837e95d30	[Reland] Update mypy to 1.4.1 (#105227 ) This PR re-lands - [Typing] Fix PEP 484 Violation (#105022) - Update mypy to 1.4.1 (#91983) That were reverted due to the conflict with internal source repo. Mostly fixes for PEP-484 violation (i.e. when default arg is set to None, but type is not annotated as optional) Plus few real fixes: - Add missing `_get_upgraders_entry_map` to `torch/_C/__init__.pyi` - Add missing return statement to `torch._export. deserialize_graph` - Fix error message in `torch.ao.ns.fx.weight_utils.get_lstm_mod_weights` - Add assert it `torch/optim/optimizer.py` that Optional list is not None TODO (in followup PR): - Fix erroneous `isinstance` check in `torch/ao/quantization/_pt2e/qat_utils.py` Unrelated, to bypass CI failures due to the gcc9 dependency update in Ubuntu-18.04: - Add hack to squash older libstdc++ from conda environment in favor one from OS to `.ci/docker/install_conda.sh` - Update bazel cuda builds to focal, as with libstdc++-6.0.32 bazel builds loose the ability to catch exceptions (probably because they link with cupti statically, but I could not found where it is done) Pull Request resolved: https://github.com/pytorch/pytorch/pull/105227 Approved by: https://github.com/atalman, https://github.com/albanD, https://github.com/Skylion007	2023-07-15 20:30:20 +00:00
PyTorch MergeBot	15fd1ea118	Revert "[Reland] Update mypy to 1.4.1 (#105227 )" This reverts commit `c9c4f8efc3`. Reverted https://github.com/pytorch/pytorch/pull/105227 on behalf of https://github.com/atalman due to trying to mitigate ci sev #105248 ([comment](https://github.com/pytorch/pytorch/pull/105227#issuecomment-1636510935))	2023-07-14 22:28:35 +00:00
Nikita Shulga	c9c4f8efc3	[Reland] Update mypy to 1.4.1 (#105227 ) This PR re-lands - [Typing] Fix PEP 484 Violation (#105022) - Update mypy to 1.4.1 (#91983) That were reverted due to the conflict with internal source repo. Mostly fixes for PEP-484 violation (i.e. when default arg is set to None, but type is not annotated as optional) Plus few real fixes: - Add missing `_get_upgraders_entry_map` to `torch/_C/__init__.pyi` - Add missing return statement to `torch._export. deserialize_graph` - Fix error message in `torch.ao.ns.fx.weight_utils.get_lstm_mod_weights` - Add assert it `torch/optim/optimizer.py` that Optional list is not None TODO (in followup PR): - Fix erroneous `isinstance` check in `torch/ao/quantization/_pt2e/qat_utils.py` Pull Request resolved: https://github.com/pytorch/pytorch/pull/105227 Approved by: https://github.com/atalman, https://github.com/albanD, https://github.com/Skylion007	2023-07-14 20:45:12 +00:00
PyTorch MergeBot	3c5a494d7a	Revert "Update mypy to 1.4.1 (#91983 )" This reverts commit `634659e262`. Reverted https://github.com/pytorch/pytorch/pull/91983 on behalf of https://github.com/malfet due to It's dependent change was reverted, so reverting this one as well, to keep CI clean ([comment](https://github.com/pytorch/pytorch/pull/91983#issuecomment-1636059709))	2023-07-14 15:59:16 +00:00
Nikita Shulga	634659e262	Update mypy to 1.4.1 (#91983 ) Mostly fixes for PEP-484 violation (i.e. when default arg is set to None, but type is not annotated as optional) Plus few real fixes: - Add missing `_get_upgraders_entry_map` to `torch/_C/__init__.pyi` - Add missing return statement to `torch._export. deserialize_graph` - Fix error message in `torch.ao.ns.fx.weight_utils.get_lstm_mod_weights` - TODO (in followup PR): - Fix erroneous `isinstance` check in `torch/ao/quantization/_pt2e/qat_utils.py` Pull Request resolved: https://github.com/pytorch/pytorch/pull/91983 Approved by: https://github.com/kit1980, https://github.com/ZainRizvi, https://github.com/huydhn, https://github.com/thiagocrepaldi, https://github.com/aaronenyeshi	2023-07-13 16:30:36 +00:00
Xilun Wu	e799f565eb	[DTensor][TP][Random] Introduce TensorParallelRNGTracker to integrate parallel RNG state with Tensor Parallel (#103910 ) This PR enables the automatic use of `TensorParallelRNGTracker` in Tensor Parallel api. Some unit tests are going to be added to cover. Pull Request resolved: https://github.com/pytorch/pytorch/pull/103910 Approved by: https://github.com/wanchaol, https://github.com/fduwjj	2023-06-30 08:06:41 +00:00
fduwjj	23b7035b3c	[TP] Add an input resharding wrapper for TP and unit test for 2D + AC (#103334 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/103334 Approved by: https://github.com/kumpera	2023-06-23 04:05:01 +00:00
Wanchao Liang	d31707a257	Get rid of dim_groups attribute from DeviceMesh (#103105 ) This PR get rids of the dim_groups attribute from DeviceMesh, the main motivation behind this is that we should let c10d store the process groups during its creation instead of DeviceMesh, DeviceMesh should just handle ranks correctly. This could enable DTensor becomes picklable! (torch.save/load could be possible), which I will give it a try in the next PR Pull Request resolved: https://github.com/pytorch/pytorch/pull/103105 Approved by: https://github.com/XilunWu, https://github.com/fduwjj	2023-06-09 04:11:15 +00:00
fduwjj	d4380edb9b	[TP] Add API logging for TP high level API (#102209 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/102209 Approved by: https://github.com/wz337, https://github.com/wanchaol	2023-05-25 03:33:00 +00:00
Wanchao Liang	a1aa32e204	[dtensor] tensor ops to use strategy based sharding prop (#100607 ) This is the first series of PR that adopts operator impls to use a strategy based approach, each op utilizes OpStrategy and PlacementStrategy to generate their own strategy. By utilizing the strategy based approach along with the op graph, we could enable more advanced op implementation (decomp is possible), and turn the sharding prop to be more like a contraint satisfication problem. This PR alone only adds some basic tensor op strategies, and it directly works on the op graph that was used for metadata propagation. The tensor ops added in this PR mainly follows one of the arg strategy. The next set of PRs would add more op strategies to other ops. Pull Request resolved: https://github.com/pytorch/pytorch/pull/100607 Approved by: https://github.com/XilunWu	2023-05-11 02:47:20 +00:00
fduwjj	953aa6d90e	[TP] Enable more generic attn in Tensor Parallelism (#100508 ) To make TP more generic for Attention module, we come up with this new col/rowwise parallel style. Basically, the idea behind is that: We only do DTensor op for Col/Rowwise sharded part. For the rest of ATen ops, we will leave it to Tensor ops. And we set this behavior as default for Colwise and Rowwise parallel style. If people want to customize it, they can always pass in different prepare_input or prepare_output Pull Request resolved: https://github.com/pytorch/pytorch/pull/100508 Approved by: https://github.com/wanchaol	2023-05-07 18:15:49 +00:00
fduwjj	89b1e67d0a	[Tensor Parallel] Add a new Colwise Parallel style when Pairwise cannot directly used (#100137 ) Some use cases, users cannot directly `PairwiseParallelStyle` and they might need to specify colwise and rowwise separately. Pull Request resolved: https://github.com/pytorch/pytorch/pull/100137 Approved by: https://github.com/wz337	2023-04-28 03:27:51 +00:00
Rohan Varma	be8c7c06b6	[Tensor Parallel] Simplify distribute for MHA (#100046 ) This function is only called for nn.MHA or the custom MHA we use, and if it is the former it is converted to the latter. So this check can actually be an assert. Differential Revision: [D45300396](https://our.internmc.facebook.com/intern/diff/D45300396/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/100046 Approved by: https://github.com/wanchaol	2023-04-27 00:54:21 +00:00
Xilun Wu	ce60997376	[BE][DTensor] validate the mesh argument in DeviceMesh construction (#99094 ) ## What's in this PR DeviceMesh's __init__ function now requires all calling ranks to pass the same `mesh` argument. ## Why We want to enforce SPMD style of programs using DTensor. Before this PR, 2-D Parallel API (e.g. _create_1d_device_mesh) defines different DeviceMesh on different ranks. After this PR, it defines each sub-meshes and simply perform communications on the one that it is associated with. Differential Revision: [D45165511](https://our.internmc.facebook.com/intern/diff/D45165511) Pull Request resolved: https://github.com/pytorch/pytorch/pull/99094 Approved by: https://github.com/wanchaol	2023-04-21 23:47:51 +00:00
Kazuaki Ishizaki	35fd5c548e	Fix typos under torch/distributed directory (#95638 ) This PR fixes typos in comments and messages of `.py` files under torch/distributed directory Pull Request resolved: https://github.com/pytorch/pytorch/pull/95638 Approved by: https://github.com/usamah1, https://github.com/H-Huang, https://github.com/kit1980	2023-03-27 21:13:44 +00:00
Wanchao Liang	16e7e5a24b	[dtensor] lazy init process groups in device mesh (#96700 ) This PR adds a private flag to allow process grou lazy initialization, this is replacing the previous `dim_groups` arg, as no one is using that now This could help avoid creating process groups when not necessary Differential Revision: [D44044664](https://our.internmc.facebook.com/intern/diff/D44044664) Pull Request resolved: https://github.com/pytorch/pytorch/pull/96700 Approved by: https://github.com/fduwjj, https://github.com/XilunWu	2023-03-20 17:50:04 +00:00
Wanchao Liang	261eb46ddd	[dtensor] refactor get_coordiniate (#95457 ) This refactor get_coordinate to return a optional[list] instead of directly the coordinate on dim, this is so that we can check if the rank is inside the mesh easily Differential Revision: [D43643579](https://our.internmc.facebook.com/intern/diff/D43643579) Pull Request resolved: https://github.com/pytorch/pytorch/pull/95457 Approved by: https://github.com/XilunWu	2023-02-28 17:54:26 +00:00
Wanchao Liang	bb9a05b116	[dtensor] use tracing for metadata prop (#95456 ) This PR uses tracing for metadata prop, so that we can get correct shape/stride metadata without manual calculation by ourselves. The follow up PR on this would be adopt tracing for the sharding prop itself Differential Revision: [D43643578](https://our.internmc.facebook.com/intern/diff/D43643578) Pull Request resolved: https://github.com/pytorch/pytorch/pull/95456 Approved by: https://github.com/XilunWu	2023-02-28 17:54:22 +00:00
fduwjj	b209d8fa0d	[PT-D][Sequence Parallelism] Enable DTensor based Naive sequence parallelism (#94369 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/94369 Approved by: https://github.com/wanchaol	2023-02-16 21:21:00 +00:00
Wanchao Liang	cd9ca4c73f	[tp] additional doc fixes (#94786 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/94786 Approved by: https://github.com/fduwjj	2023-02-15 21:25:26 +00:00
fduwjj	39511697d4	[PT-D][BE] Update 2D parallelism API name and docs (#94771 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/94771 Approved by: https://github.com/wanchaol	2023-02-14 08:13:15 +00:00
PyTorch MergeBot	28ed0bdb37	Revert "[tp] additional doc fixes (#94786 )" This reverts commit `7522ca55f1`. Reverted https://github.com/pytorch/pytorch/pull/94786 on behalf of https://github.com/huydhn due to Sorry for reverting your PR, but the doc failure looks related and they are also failing in trunk `7522ca55f1`	2023-02-14 05:43:37 +00:00
Wanchao Liang	7522ca55f1	[tp] additional doc fixes (#94786 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/94786 Approved by: https://github.com/fduwjj	2023-02-14 04:52:04 +00:00
Wanchao Liang	2db12e3844	[tp] minor update to TP docs (#94748 ) minor update to TP docs for beta release Pull Request resolved: https://github.com/pytorch/pytorch/pull/94748 Approved by: https://github.com/fduwjj	2023-02-13 21:54:19 +00:00
Xuehai Pan	5b1cedacde	[BE] [2/3] Rewrite `super()` calls in functorch and torch (#94588 ) Rewrite Python built-in class `super()` calls. Only non-semantic changes should be applied. - #94587 - #94588 - #94592 Also, methods with only a `super()` call are removed: ```diff class MyModule(nn.Module): - def __init__(self): - super().__init__() - def forward(self, ...): ... ``` Some cases that change the semantics should be kept unchanged. E.g.: `f152a79be9/caffe2/python/net_printer.py (L184-L190)` `f152a79be9/test/test_jit_fuser_te.py (L2628-L2635)` Pull Request resolved: https://github.com/pytorch/pytorch/pull/94588 Approved by: https://github.com/ezyang, https://github.com/albanD	2023-02-10 21:16:33 +00:00
fduwjj	41e3189222	[PT-D][Tensor parallelism] Add documentations for TP (#94421 ) This is far from completed and we will definitely polish it down the road. Pull Request resolved: https://github.com/pytorch/pytorch/pull/94421 Approved by: https://github.com/wz337	2023-02-09 02:31:06 +00:00
Aaron Gokaslan	1e2d82b8e4	[BE] Merge isinstance calls together (#94419 ) Simplify and speeds up isinstance calls by checking for multiple types at the same time. Pull Request resolved: https://github.com/pytorch/pytorch/pull/94419 Approved by: https://github.com/ezyang	2023-02-09 00:47:26 +00:00
fduwjj	3fb6e119e2	[PT-D][TP] Fix the module registration in TP API (#93412 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/93412 Approved by: https://github.com/XilunWu	2023-02-01 21:03:56 +00:00
Wanchao Liang	9a56997fe1	[dtensor][5/N] add cached propagator for TP (#90734 ) This PR adds a cached propagator for TP use, it caches the sharding prop decision for the same input sharding on an operator. This could improve eager mode performance. Differential Revision: [D42876249](https://our.internmc.facebook.com/intern/diff/D42876249) Pull Request resolved: https://github.com/pytorch/pytorch/pull/90734 Approved by: https://github.com/XilunWu, https://github.com/fduwjj	2023-02-01 05:04:08 +00:00
fduwjj	913866efbf	[PT-D][TP] Fix TP API for FQN path based parallelization (#93029 ) We have not tested dict based parallelize_module and turns out we had mistakes here. 1. Fix the error. 2. Add unit test cases for it. Pull Request resolved: https://github.com/pytorch/pytorch/pull/93029 Approved by: https://github.com/wz337	2023-01-26 09:10:21 +00:00
joncrall	ad782ff7df	Enable xdoctest runner in CI for real this time (#83816 ) Builds on #83317 and enables running the doctests. Just need to figure out what is causing the failures. Pull Request resolved: https://github.com/pytorch/pytorch/pull/83816 Approved by: https://github.com/ezyang, https://github.com/malfet	2022-12-29 05:32:42 +00:00
PyTorch MergeBot	cba96366a2	Revert "remove torch.equal usages (#89527 )" This reverts commit `4095ef8b80`. Reverted https://github.com/pytorch/pytorch/pull/89527 on behalf of https://github.com/clee2000 due to broke periodic multigpu tests `4095ef8b80` https://github.com/pytorch/pytorch/actions/runs/3592806602/jobs/6049368502	2022-12-02 21:36:13 +00:00
Wanchao Liang	9b5e6b029f	[tp] umft distributed.tensor.parallel (#89969 ) cmd: `ufmt format torch/distributed/tensor` Pull Request resolved: https://github.com/pytorch/pytorch/pull/89969 Approved by: https://github.com/fduwjj	2022-12-01 20:58:16 +00:00
Philip Meier	4095ef8b80	remove torch.equal usages (#89527 ) Preparation for the next PR in this stack: #89559. I replaced - `self.assertTrue(torch.equal(...))` with `self.assertEqual(..., rtol=0, atol=0, exact_device=True)`, - the same for `self.assertFalse(...)` with `self.assertNotEqual(...)`, and - `assert torch.equal(...)` with `torch.testing.assert_close(..., rtol=0, atol=0)` (note that we don't need to set `check_device=True` here since that is the default). There were a few instances where the result of `torch.equal` is used directly. In that cases I've replaced with `(... == ...).all().item()` while sometimes also dropping the `.item()` depending on the context. Pull Request resolved: https://github.com/pytorch/pytorch/pull/89527 Approved by: https://github.com/mruberry	2022-12-01 11:22:52 +00:00
Wanchao Liang	4451eb24e6	Move tensor_parallel out to distributed.tensor folder (#89878 ) This PR moves tensor parallel from torch.distributed._tensor.parallel to torch.distributed.tensor.parallel, to prepare for beta release Pull Request resolved: https://github.com/pytorch/pytorch/pull/89878 Approved by: https://github.com/fduwjj	2022-11-30 22:13:10 +00:00

1 2 3 4 5

223 Commits