pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-06 12:20:52 +01:00

Author	SHA1	Message	Date
PyTorch MergeBot	35f36363ec	Revert "[dtensor] move DTensor to public namespace (#133113 )" This reverts commit `2ee6b97464`. Reverted https://github.com/pytorch/pytorch/pull/133113 on behalf of https://github.com/wanchaol due to looks like it break some internal type imports ([comment](https://github.com/pytorch/pytorch/pull/133113#issuecomment-2295670911))	2024-08-19 05:00:19 +00:00
Wanchao Liang	2ee6b97464	[dtensor] move DTensor to public namespace (#133113 ) Moving DTensor to be in the public namespace, to formally add the documentation page that includes all the public APIs. This includes: * many path renames and path import fixes * a dedicated doc page without too much content yet (adding in the next PRs) * To preserve the BC for users still using the `torch.distributed._tensor`, I added a shim script to redirect old path calls to the new module. BC preservation is evidenced by the fact that all DTensor tests still pass without changing the public imports, so it is safe to land the changes. Pull Request resolved: https://github.com/pytorch/pytorch/pull/133113 Approved by: https://github.com/XilunWu ghstack dependencies: #133305, #133306	2024-08-17 05:09:52 +00:00
Wanchao Liang	9f17037e8b	[dtensor] move tensor constructors to the api module (#133129 ) This is to ensure __init__.py only contain public APIs Pull Request resolved: https://github.com/pytorch/pytorch/pull/133129 Approved by: https://github.com/awgu, https://github.com/tianyu-l	2024-08-13 06:09:56 +00:00
PyTorch MergeBot	00aa086298	Revert "[dtensor] move tensor constructors to a separate module (#133129 )" This reverts commit `e890d888d9`. Reverted https://github.com/pytorch/pytorch/pull/133129 on behalf of https://github.com/fbgheith due to breaking internal tests ([comment](https://github.com/pytorch/pytorch/pull/133129#issuecomment-2285090400))	2024-08-12 23:55:08 +00:00
Wanchao Liang	e890d888d9	[dtensor] move tensor constructors to a separate module (#133129 ) This is to ensure __init__.py only contain public APIs Pull Request resolved: https://github.com/pytorch/pytorch/pull/133129 Approved by: https://github.com/awgu, https://github.com/tianyu-l	2024-08-10 02:51:42 +00:00
Wei Feng	472b0daeaa	[DDP][FSDP2] keep DTensor params for replicate(fully_shard) (#133059 ) current status: for `replicate(fully_shard)`, DDP lazy_init will convert DTensor into local tensor, and that breaks FSDP unshard this PR keeps FSDP params untouched during DDP lazy_init I came across it because of a CI error in FSDP2's unit test #132978 thanks @awgu for fix proposal Pull Request resolved: https://github.com/pytorch/pytorch/pull/133059 Approved by: https://github.com/Skylion007, https://github.com/fegin	2024-08-09 18:38:05 +00:00
wz337	87053132ea	[DeviceMesh] Remove parent mesh concept from _MeshEnv and replace by root mesh (#132339 ) Previously, when we slice out a submesh from a mesh, we assign the mesh as the parent mesh of the submesh. In this case, when we have a 3D mesh topology, the parent mesh of a 1D mesh sliced out from the 3D mesh is different from the parent mesh of the same 1D mesh sliced out from the 2D submesh of the 3D mesh. For example: ``` mesh_3d = init_device_mesh("cuda", (2,2,2), ("dim0", "dim1", "dim2")) mesh_dim0 = mesh_3d["dim0"] mesh_2d = mesh_3d["dim0", "dim1"] mesh_dim0_2 = mesh_2d["dim0"] # This would evaluate to be True print(_mesh_resources.get_parent_mesh(mesh_dim0) != _mesh_resources.get_parent_mesh(mesh_dim0_2)) ``` We can always reconstruct the mesh needed from the mesh dim names, as long as two dims come from the same root. For simplicity, we do not see the necessity of building a tree structure to represent child-parent relationship. Therefore, we are replacing the parent mesh concept with a root mesh concept in `_MeshEnv` so we would have: ``` mesh_3d = init_device_mesh("cuda", (2,2,2), ("dim0", "dim1", "dim2")) mesh_dim0 = mesh_3d["dim0"] mesh_2d = mesh_3d["dim0", "dim1"] mesh_dim0_2 = mesh_2d["dim0"] # This would evaluate to be True print(_mesh_resources.get_root_mesh(mesh_dim0) == _mesh_resources.get_root_mesh(mesh_dim0_2)) ``` With this change, we will have two types of meshes in an environment. 1. `device_mesh != _mesh_resources.get_root_mesh(device_mesh)` means that the device_mesh is created by slicing. 2. `device_mesh == _mesh_resources.get_root_mesh(device_mesh)` means that the device_mesh is a root mesh not created through slicing. Pull Request resolved: https://github.com/pytorch/pytorch/pull/132339 Approved by: https://github.com/wanchaol ghstack dependencies: #132310, #132311	2024-08-07 07:01:12 +00:00
PyTorch MergeBot	cbee9c1fd2	Revert "Deprecate `torch._utils.is_compiling()` and `torch._dynamo.external_utils.is_compiling()` (#127690 )" This reverts commit `0e7e61f7ce`. Reverted https://github.com/pytorch/pytorch/pull/127690 on behalf of https://github.com/kit1980 due to breaking internal builds ([comment](https://github.com/pytorch/pytorch/pull/127690#issuecomment-2272370386))	2024-08-07 00:05:20 +00:00
Xuehai Pan	0e7e61f7ce	Deprecate `torch._utils.is_compiling()` and `torch._dynamo.external_utils.is_compiling()` (#127690 ) This PR is split from PR #126898. - #126898 ------ Pull Request resolved: https://github.com/pytorch/pytorch/pull/127690 Approved by: https://github.com/Skylion007, https://github.com/malfet	2024-08-03 09:43:38 +00:00
Ke Wen	3d7f541597	[BE][TP] Check module has bias before access (#132137 ) Some linear modules, such as the ones reconstructed by `torch.export.unflatten()`, may not have the `bias` attribute, if the original linear module has `bias=None`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/132137 Approved by: https://github.com/wanchaol	2024-07-31 13:45:28 +00:00
Tianyu Liu	7b0e10f0e5	fix _MaskPartial when multiple embeddings coexist (#131264 ) Previously, using _MaskPartial when multiple embeddings have the following issues: 1. Suppose an `nn.Embedding` has shape `[vocab_size, emb_size]`. When there are more than one embeddings, sharing the same `vocab_size` but with different `emb_size`s. Then they would not share `OpStrategy` since each, when involved in computation, would have different `OpSchema`; however, there would be cache hit for redistribute (specifically `_gen_transform_infos` in `torch/distributed/_tensor/_redistribute.py` when doing `Replicate` -> `_MaskPartial`) as the `_MaskPartial` only has `vocab_size` as `logical_dim_size` but not `emb_size` as attribute. This cache hit is undesirable and would cause trouble when doing all-reduce/reduce-scatter on the new `_MaskPartial` in a separate `OpStrategy`. The error was reported in #130725. In this PR, we introduce `offset_shape` to represent the embedding's full shape to avoid cache hit from embeddings of different shapes. 2. The second issue is when we have two `nn.Embedding`s `emb1` and `emb2` with the same shape. There will be cache hit not only in `_gen_transform_infos`, but also in `OpStrategy` generation. Previously, if we sequentially do `Replicate` -> `_MaskPartial` for both `emb1` `emb2` and then sequentially do reduction on the `_MaskPartial` of `emb1`, it would destroy the `MaskBuffer` and `emb2` would hit error. This PR adds a `refcount` for the `MaskBuffer` so that it can be properly shared by multiple `nn.Embedding`s. Pull Request resolved: https://github.com/pytorch/pytorch/pull/131264 Approved by: https://github.com/wanchaol	2024-07-29 00:40:58 +00:00
Wanchao Liang	1c58aacbc8	[dtensor] move ops to private (#131211 ) as titled Differential Revision: [D60132519](https://our.internmc.facebook.com/intern/diff/D60132519) Pull Request resolved: https://github.com/pytorch/pytorch/pull/131211 Approved by: https://github.com/XilunWu, https://github.com/wz337 ghstack dependencies: #131212	2024-07-25 20:59:55 +00:00
Wanchao Liang	35a0e0f018	[tp] improve SequenceParallel and its documentation (#131346 ) SequenceParallel style assumes the input torch.Tensor ALREADY sharded on the sequence dimension if not passing in DTensor. Since it causes some user confusion on the documentation, this PR: 1. for the case where input passed in is already a DTensor, we check the input placements and redistribute if it's not sharded on the sequence dimension 2. update the doc to make it more explicit about the case when user passed in a torch.Tensor and DTensor This would fix https://github.com/pytorch/pytorch/issues/129355 Pull Request resolved: https://github.com/pytorch/pytorch/pull/131346 Approved by: https://github.com/awgu	2024-07-23 03:57:01 +00:00
Mayank Mishra	e5657024b5	Fix loss_parallel with BF16 logits (#130550 ) Fixes #130549 This PR uses the specific dtype for the `grad_input` buffer and fixes the error Pull Request resolved: https://github.com/pytorch/pytorch/pull/130550 Approved by: https://github.com/tianyu-l	2024-07-12 15:47:38 +00:00
Xuehai Pan	cec31050b4	[BE][Easy] enable UFMT for `torch/distributed/{tensor,_tensor}/` (#128868 ) Part of #123062 - #123062 Pull Request resolved: https://github.com/pytorch/pytorch/pull/128868 Approved by: https://github.com/fegin	2024-06-18 21:49:02 +00:00
Wanchao Liang	7775fee10f	[tp] refactor and fix PrepareModuleInput for DTensor inputs (#128431 ) as titled, this PR refactors the PrepareModuleInput style to have common method prepare_input_arg, allow both args/kwargs to reuse this logic This also fixes https://github.com/pytorch/pytorch/issues/128365 Pull Request resolved: https://github.com/pytorch/pytorch/pull/128431 Approved by: https://github.com/awgu	2024-06-12 19:16:33 +00:00
PyTorch MergeBot	a421699998	Revert "[tp] refactor and fix PrepareModuleInput for DTensor inputs (#128431 )" This reverts commit `089f9a116a`. Reverted https://github.com/pytorch/pytorch/pull/128431 on behalf of https://github.com/DanilBaibak due to Sorry for the revert. Your changes broke the linter. Here you can find more details - `089f9a116a` ([comment](https://github.com/pytorch/pytorch/pull/128431#issuecomment-2162197858))	2024-06-12 06:25:53 +00:00
Wanchao Liang	089f9a116a	[tp] refactor and fix PrepareModuleInput for DTensor inputs (#128431 ) as titled, this PR refactors the PrepareModuleInput style to have common method prepare_input_arg, allow both args/kwargs to reuse this logic This also fixes https://github.com/pytorch/pytorch/issues/128365 Pull Request resolved: https://github.com/pytorch/pytorch/pull/128431 Approved by: https://github.com/awgu	2024-06-12 05:22:24 +00:00
PyTorch MergeBot	90bb510ece	Revert "Deprecate `torch._utils.is_compiling()` and `torch._dynamo.external_utils.is_compiling()` (#127690 )" This reverts commit `348b181a97`. Reverted https://github.com/pytorch/pytorch/pull/127690 on behalf of https://github.com/clee2000 due to sorry I think https://github.com/pytorch/pytorch/pull/126898#issuecomment-2142884456 is still relevant, I will reach out to them to see what needs to be done in internal to get this remerged ([comment](https://github.com/pytorch/pytorch/pull/127690#issuecomment-2159248859))	2024-06-10 20:44:42 +00:00
Aaron Orenstein	7c12cc7ce4	Flip default value for mypy disallow_untyped_defs [6/11] (#127843 ) See #127836 for details. Pull Request resolved: https://github.com/pytorch/pytorch/pull/127843 Approved by: https://github.com/oulgen ghstack dependencies: #127842	2024-06-08 18:49:29 +00:00
Xuehai Pan	348b181a97	Deprecate `torch._utils.is_compiling()` and `torch._dynamo.external_utils.is_compiling()` (#127690 ) This PR is split from PR #126898. - #126898 ------ Pull Request resolved: https://github.com/pytorch/pytorch/pull/127690 Approved by: https://github.com/Skylion007	2024-06-08 15:25:03 +00:00
Iris Z	1d84c7e100	[DeviceMesh] Update get_group and add get_all_groups (#128097 ) Fixes #121984 Pull Request resolved: https://github.com/pytorch/pytorch/pull/128097 Approved by: https://github.com/wconstab, https://github.com/wanchaol	2024-06-08 04:28:56 +00:00
Wanchao Liang	4f87f47ea1	[dtensor] reuse DTensorSpec as much as possible (#128112 ) as titled, given that our DTensorSpec is immutable, we can always reuse the spec if the input/output have the same tensor metadata. this helps two fold: 1. We don't need to re-calculate the hash everytime we produce a DTensorSpec, reduce runtime operator overhead 2. reduce the DTensor construction overhead. Some local benchmark on a 800 parameter clip_grad_norm shows that for foreach_norm the CPU overhead reduces from 11ms -> 7.8ms (around 30% improvement) Pull Request resolved: https://github.com/pytorch/pytorch/pull/128112 Approved by: https://github.com/awgu	2024-06-06 16:55:50 +00:00
Xuehai Pan	8b08b0f340	[BE] enable ruff rule `Q` from flake8-quotes (#127713 ) Enable [ruff rule `Q`](https://docs.astral.sh/ruff/rules/#flake8-quotes-q) from flake8-quotes. Fixes: - [avoidable-escaped-quote (Q003)](https://docs.astral.sh/ruff/rules/avoidable-escaped-quote/#avoidable-escaped-quote-q003) - [unnecessary-escaped-quote (Q004)](https://docs.astral.sh/ruff/rules/unnecessary-escaped-quote/#unnecessary-escaped-quote-q004) Pull Request resolved: https://github.com/pytorch/pytorch/pull/127713 Approved by: https://github.com/ezyang	2024-06-02 23:25:26 +00:00
Xuehai Pan	67ef2683d9	[BE] wrap deprecated function/class with `typing_extensions.deprecated` (#127689 ) Use `typing_extensions.deprecated` for deprecation annotation if possible. Otherwise, add `category=FutureWarning` to `warnings.warn("message")` if the category is missing. Note that only warnings that their messages contain `[Dd]eprecat(ed\|ion)` are updated in this PR. Resolves #126888 - #126888 This PR is split from PR #126898. - #126898 ------ Pull Request resolved: https://github.com/pytorch/pytorch/pull/127689 Approved by: https://github.com/Skylion007	2024-06-02 12:30:43 +00:00
PyTorch MergeBot	033e733021	Revert "[BE] wrap deprecated function/class with `typing_extensions.deprecated` (#126898 )" This reverts commit `749a132fb0`. Reverted https://github.com/pytorch/pytorch/pull/126898 on behalf of https://github.com/fbgheith due to switching typing-extensions=4.3.0 to 4.9.0 causes internal failure ([comment](https://github.com/pytorch/pytorch/pull/126898#issuecomment-2142884456))	2024-05-31 19:47:24 +00:00
Xuehai Pan	749a132fb0	[BE] wrap deprecated function/class with `typing_extensions.deprecated` (#126898 ) Use `typing_extensions.deprecated` for deprecation annotation if possible. Otherwise, add `category=FutureWarning` to `warnings.warn("message")` if the category is missing. Note that only warnings that their messages contain `[Dd]eprecat(ed\|ion)` are updated in this PR. UPDATE: Use `FutureWarning` instead of `DeprecationWarning`. Resolves #126888 - #126888 Pull Request resolved: https://github.com/pytorch/pytorch/pull/126898 Approved by: https://github.com/albanD	2024-05-29 12:09:27 +00:00
Will Constable	346343e6b5	[DeviceMesh] Make _validate_tp_mesh_dim support 3D (#125763 ) Currently a 3D mesh with a submesh sliced out for TP is going to fail this check. According to @wanchaol in [this comment](https://github.com/pytorch/pytorch/pull/125250#discussion_r1586653669) it should be OK to remove these checks. Though I would appreciate a more careful review here, since I'm not too sure if there are other edge cases where these checks are important. Pull Request resolved: https://github.com/pytorch/pytorch/pull/125763 Approved by: https://github.com/wz337, https://github.com/wanchaol	2024-05-08 21:22:11 +00:00
wz337	2716e77cf7	[FSDP1][2D] Fix FSDP1 2D state_dict to use run_check=False (#123802 ) `from_local` with replicate placement would run mesh_broadcast if `run_check=True`, by default `from_local` have `run_check=True`, but for FSDP state_dict case we are for sure that these are replicated on dp dimension (FSDP + TP) already, so we don't need to check/force check it. Pull Request resolved: https://github.com/pytorch/pytorch/pull/123802 Approved by: https://github.com/wanchaol	2024-04-24 01:25:11 +00:00
Wanchao Liang	05addd5658	[tp] add kwargs support to prepare_module_input (#124114 ) as titled, this PR adds kwargs support to PrepareModuleInput style, where there might be modules who have only kwargs inputs but no positional args, so we should support this Pull Request resolved: https://github.com/pytorch/pytorch/pull/124114 Approved by: https://github.com/XilunWu	2024-04-22 21:46:31 +00:00
Ke Wen	5027ef7e9c	[TP] Add wildcard support (#122968 ) Adding wildcard support for TP's `parallelize_module` API. Example patterns: `layers.*.linear`: any characters; `layers.?.linear`: single character; `layers.[1-2]`: digit range, matches `layers.1` and `layers.2`. Example use case: A model has multiple layers, and we want to parallelize the linear module `lin` inside each layer. ``` model_tp = parallelize_module( model, device_mesh, { "layers.*.lin": ColwiseParallel(), }, ) ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/122968 Approved by: https://github.com/XilunWu, https://github.com/wz337, https://github.com/wanchaol ghstack dependencies: #122919	2024-04-02 21:23:39 +00:00
Ke Wen	0a038cf0cf	[TP] Avoid splitting path twice (#122919 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/122919 Approved by: https://github.com/awgu, https://github.com/wanchaol	2024-04-02 02:06:11 +00:00
Wanchao Liang	a26480a4d1	[dtensor] move early return check into redistribute autograd function (#121653 ) This PR fixed the bug of redistribute to move early return check into the redistribute autograd function, so that even though we redistribute the same placement, the grad_placements from the `to_local` call might be different, the redistribute backward still need to happen Pull Request resolved: https://github.com/pytorch/pytorch/pull/121653 Approved by: https://github.com/awgu	2024-03-12 17:37:30 +00:00
Wanchao Liang	242e03ba86	[dtensor] add async_op option to redistribute and some refactor (#121477 ) async output option was only available in `full_tensor()` call, but I think it's generally good to make this option available in the `redistribute` call directly so that user can control it This PR adds async_op option to redistribute call, to allow user control whether to perform tensor redistribution asynchronously or not. By default we set this to False, this is to follow the semantics of the c10d collectives. Pull Request resolved: https://github.com/pytorch/pytorch/pull/121477 Approved by: https://github.com/wz337	2024-03-09 06:17:23 +00:00
Wanchao Liang	08460f4bae	[tp] remove deprecated tp_mesh_dim arg (#121432 ) This PR removes the deprecated tp_mesh_dim arg to prepare for release. As we deprecated this arg for a while (by throwing deprecating messages), we should remove it before the release #suppress-api-compatibility-check Pull Request resolved: https://github.com/pytorch/pytorch/pull/121432 Approved by: https://github.com/wz337 ghstack dependencies: #121431	2024-03-08 17:46:44 +00:00
Wanchao Liang	30982ce072	[tp] doc fixes (#121431 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/121431 Approved by: https://github.com/wz337	2024-03-08 17:46:44 +00:00
Tianyu Liu	dc514b967e	[dtensor][TP] check funcol calls and improve doc for loss parallel (#121366 ) Since CommDebugMode is fixed, we can check that loss parallel is working as expected. Under loss parallel, the forward computation should invoke 3 all-reduces, and the backward computation should invoke no functional collectives. Co-authored-by: Wanchao <wanchaol@users.noreply.github.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/121366 Approved by: https://github.com/wanchaol	2024-03-08 01:41:31 +00:00
Wanchao Liang	1a28ebffb3	[TP] Introduce Sequence Parallel Style for LayerNorm/RMSNorm/Dropout (#121295 ) As titled, this PR introduces a dedicated `ParallelStyle` to shard the nn.LayerNorm/nn.Dropout/RMSNorm layers. We were mainly using manual distribute_module calls before when sharding the RMSNorm layer, but I think we should have a dedicated TP API to easily shard those layers, instead of users manually using DTensors. I call this SequenceParallel, which might bring some confusion because we technically "deprecated" a SequenceParallel style months ago. But this time the SequenceParallel style is significantly different from the previous one (which used to shard two consecutive Linear layers). I believe giving it the right name is the first priority, instead of worrying about the issue of reusing the old name. Pull Request resolved: https://github.com/pytorch/pytorch/pull/121295 Approved by: https://github.com/awgu, https://github.com/tianyu-l ghstack dependencies: #121294	2024-03-07 02:04:59 +00:00
Wanchao Liang	2e50566722	[dtensor] change distribute_module input/output_fn to accept module (#120895 ) This is a BC breaking change to distribute_module. The underlying rationale for this change is that sometimes in the input_fn/output_fn, the user would want access to the current module for some attributes. This might not be common enough, but in some cases it's worth having access to the module. An outstanding use case we want to support is float8: if we want to make float8 work with the TP API, the input_fn/output_fn of TP parallel styles would need access to the module, where the module might encapsulate the `dynamic_linear.emulate` attribute, which is useful for input/output casting. Since this is needed for fp8 and DTensor is still under prototype release, I feel it's worth the change and it's better to make the change early. Right now this is a soft BC break, which means we still maintain BC but throw deprecation messages. Pull Request resolved: https://github.com/pytorch/pytorch/pull/120895 Approved by: https://github.com/tianyu-l	2024-03-04 07:22:32 +00:00
Tianyu Liu	af5376c444	[dtensor] add support for loss parallel (#119877 ) Loss parallel is the last piece of sequence parallelism to enable. It enables efficient distributed cross entropy computation when the input is sharded on the class dimension (in a classification problem with many classes). The implementation is via a context manager `loss_parallel`, after enabling which users can directly use `torch.nn.functional.cross_entropy` or `torch.nn.CrossEntropyLoss` without modifying other parts of their code. Here are the underlying rationales why we are going through these op replacements: 1. `nn.functional.cross_entropy` is the common method that OSS users are using for things like transformer training; to avoid changing user code, we want users to still use this function for loss calculation if they are already using it. 2. `nn.functional.cross_entropy` boils down into `aten.log_softmax` and `aten.nll_loss_forward/backward`, and DTensor now supports those ops already (#117723 #119255 #118917 #119256). They are doing computation with input replicated on the class dimension. 3. However when the input of this loss calculation is sharded on the class dimension, to run sharded computation efficiently, we need to run both `aten.log_softmax` and `aten.nll_loss_forward` with multiple all-reduce collectives in the middle of those aten ops. This is not possible if we are just overriding these two ops, so we need to have some way to decompose these two ops into smaller ops to have collectives run in the middle of these two ops. 4. We explored the existing decompositions (#118950). It seems to work, except that `log_softmax_backward` and `nll_loss_backward` combined together in aten are implemented in an inefficient way, which would trigger an additional expensive collective. Recently some users also reported similar issues https://github.com/pytorch/pytorch/issues/119261. 5. Therefore, currently we are doing our own decomposition inside a context manager for sequence parallelism specifically. Once we have a better decomposition in core, we can possibly take that instead of reinventing the wheel here. Pull Request resolved: https://github.com/pytorch/pytorch/pull/119877 Approved by: https://github.com/wanchaol	2024-03-02 05:06:26 +00:00
Mihir Patel	6efda849b5	Update chunk_dtensor to support HYBRID_SHARD (#119481 ) Fixes https://github.com/pytorch/pytorch/issues/118639. Adds support to replicate across HSDP dimensions instead of sharding for shard placement Pull Request resolved: https://github.com/pytorch/pytorch/pull/119481 Approved by: https://github.com/Skylion007, https://github.com/wz337	2024-02-09 01:30:53 +00:00
Mihir Patel	88e346680b	Patch all_gather to support HSDP + TP (#118638 ) Update all_gather to support HSDP + TP. Currently, the `_all_gather_dtensor` function for dtensors only replaces the first dimension with replicate (the FSDP dimension) and does not touch the second dimension (which is assumed to be the TP dimension). With HSDP, we have two dimensions ahead of the TP dimension as opposed to 1. This PR updates to replace all other dimensions with replicate to run the all-gather. Pull Request resolved: https://github.com/pytorch/pytorch/pull/118638 Approved by: https://github.com/fegin, https://github.com/awgu, https://github.com/wz337	2024-02-05 18:29:23 +00:00
Catherine Lee	4f5785b6b3	Enable possibly-undefined error code (#118533 ) Fixes https://github.com/pytorch/pytorch/issues/118129 Suppressions automatically added with ``` import re with open("error_file.txt", "r") as f: errors = f.readlines() error_lines = {} for error in errors: match = re.match(r"(.*):(\d+):\d+: error:.*\[(.*)\]", error) if match: file_path, line_number, error_type = match.groups() if file_path not in error_lines: error_lines[file_path] = {} error_lines[file_path][int(line_number)] = error_type for file_path, lines in error_lines.items(): with open(file_path, "r") as f: code = f.readlines() for line_number, error_type in sorted(lines.items(), key=lambda x: x[0], reverse=True): code[line_number - 1] = code[line_number - 1].rstrip() + f" # type: ignore[{error_type}]\n" with open(file_path, "w") as f: f.writelines(code) ``` Signed-off-by: Edward Z. Yang <ezyang@meta.com> Co-authored-by: Catherine Lee <csl@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/118533 Approved by: https://github.com/Skylion007, https://github.com/zou3519	2024-01-30 21:07:01 +00:00
PyTorch MergeBot	40ece2e579	Revert "Enable possibly-undefined error code (#118533 )" This reverts commit `4f13f69a45`. Reverted https://github.com/pytorch/pytorch/pull/118533 on behalf of https://github.com/clee2000 due to sorry i'm trying to figure out a codev merge conflict, if this works i'll be back to rebase and merge ([comment](https://github.com/pytorch/pytorch/pull/118533#issuecomment-1917695185))	2024-01-30 19:00:34 +00:00
Edward Z. Yang	4f13f69a45	Enable possibly-undefined error code (#118533 ) Fixes https://github.com/pytorch/pytorch/issues/118129 Suppressions automatically added with ``` import re with open("error_file.txt", "r") as f: errors = f.readlines() error_lines = {} for error in errors: match = re.match(r"(.*):(\d+):\d+: error:.*\[(.*)\]", error) if match: file_path, line_number, error_type = match.groups() if file_path not in error_lines: error_lines[file_path] = {} error_lines[file_path][int(line_number)] = error_type for file_path, lines in error_lines.items(): with open(file_path, "r") as f: code = f.readlines() for line_number, error_type in sorted(lines.items(), key=lambda x: x[0], reverse=True): code[line_number - 1] = code[line_number - 1].rstrip() + f" # type: ignore[{error_type}]\n" with open(file_path, "w") as f: f.writelines(code) ``` Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/118533 Approved by: https://github.com/Skylion007, https://github.com/zou3519	2024-01-30 05:08:10 +00:00
Wanchao Liang	e696fa1ee7	[tp] enable rowwise embedding sharding in RowwiseParallel (#118242 ) As titled, this PR enables the rowwise embedding sharding in the RowwiseParallel style, and add tests to ensure it's working as expected Pull Request resolved: https://github.com/pytorch/pytorch/pull/118242 Approved by: https://github.com/tianyu-l ghstack dependencies: #118079, #118080	2024-01-26 19:01:24 +00:00
PyTorch MergeBot	bc67f87559	Revert "[tp] enable rowwise embedding sharding in RowwiseParallel (#118242 )" This reverts commit `7a9012d7e8`. Reverted https://github.com/pytorch/pytorch/pull/118242 on behalf of https://github.com/DanilBaibak due to Break internal build ([comment](https://github.com/pytorch/pytorch/pull/118079#issuecomment-1911681293))	2024-01-26 08:47:14 +00:00
Wanchao Liang	7a9012d7e8	[tp] enable rowwise embedding sharding in RowwiseParallel (#118242 ) As titled, this PR enables the rowwise embedding sharding in the RowwiseParallel style, and add tests to ensure it's working as expected Pull Request resolved: https://github.com/pytorch/pytorch/pull/118242 Approved by: https://github.com/tianyu-l ghstack dependencies: #118079, #118080	2024-01-26 01:36:24 +00:00
wz337	e1f9eca113	[DeviceMesh] Reuse sub_group pg if exists (#115716 ) Currently, we create new_group for sub_group pg during mesh initialization. The PR changes this so we will: 1) re-use sub_group pg if it exists, 2) create new sub_group pg if it does not exist. Pull Request resolved: https://github.com/pytorch/pytorch/pull/115716 Approved by: https://github.com/wanchaol	2024-01-25 18:07:16 +00:00
Wanchao Liang	2bb2cc0b71	[tp] add clarification to doc and improve TP examples (#117618 ) This PR adds a clarification about evenly sharded assumption in the main tp doc and improved the tp examples by adding device mesh constructions fixes https://github.com/pytorch/pytorch/issues/100044 Pull Request resolved: https://github.com/pytorch/pytorch/pull/117618 Approved by: https://github.com/wconstab, https://github.com/awgu	2024-01-22 18:56:50 +00:00
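
The wildcard patterns listed in the "[TP] Add wildcard support" entry (#122968) read like shell-style globs: `*` for any characters, `?` for a single character, `[1-2]` for a digit range. The following is a minimal sketch of that matching behavior, assuming each dot-separated segment of a module path is matched with Python's standard `fnmatch` rules. The helper `match_module_paths` and the sample paths are hypothetical and only illustrate the pattern semantics; this is not the `parallelize_module` implementation.

```python
from fnmatch import fnmatchcase


def match_module_paths(pattern, module_paths):
    """Return the dotted paths that match `pattern` segment by segment.

    Hypothetical helper: each dot-separated atom of the pattern is treated
    as a shell-style glob and compared against the atom at the same
    position in the module path.
    """
    pattern_atoms = pattern.split(".")
    matched = []
    for path in module_paths:
        atoms = path.split(".")
        if len(atoms) != len(pattern_atoms):
            continue  # different depth, cannot match
        if all(fnmatchcase(a, p) for a, p in zip(atoms, pattern_atoms)):
            matched.append(path)
    return matched


# Hypothetical fully qualified module names, for illustration only.
paths = ["layers.0.lin", "layers.1.lin", "layers.2.lin", "layers.10.lin", "head.lin"]

all_layers = match_module_paths("layers.*.lin", paths)       # any characters
single_char = match_module_paths("layers.?.lin", paths)      # exactly one character
digit_range = match_module_paths("layers.[1-2].lin", paths)  # digit range
```

Under these assumptions `layers.*.lin` selects every layer index, `layers.?.lin` skips `layers.10.lin` because `10` is two characters, and `layers.[1-2].lin` selects only layers 1 and 2.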


135 Commits
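
The suppression script quoted in the "Enable possibly-undefined error code" entries (#118533) hinges on one regular expression that splits a type-checker error line into file path, line number, and error code. Below is a small sketch of just that parsing step, assuming the wildcards in the pattern are `.*` and that error lines have the `path:line:col: error: message  [code]` shape. The function name `parse_error_line` and the sample line are hypothetical.

```python
import re

# Pattern assumed from the commit message: path, line number, column,
# then the error code inside the trailing square brackets.
ERROR_RE = re.compile(r"(.*):(\d+):\d+: error:.*\[(.*)\]")


def parse_error_line(line):
    """Return (file_path, line_number, error_type), or None if no match."""
    match = ERROR_RE.match(line)
    if match is None:
        return None
    file_path, line_number, error_type = match.groups()
    return file_path, int(line_number), error_type


# Hypothetical error line, for illustration only.
sample = 'torch/foo.py:12:5: error: Name "x" may be undefined  [possibly-undefined]'
parsed = parse_error_line(sample)
skipped = parse_error_line("Found 1 error in 1 file")
```

The script in the commit message then appends `# type: ignore[<error_type>]` to each reported line, walking line numbers in reverse order within a file.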