Commit Graph

31 Commits

Author SHA1 Message Date
Wanchao Liang
2ee6b97464 [dtensor] move DTensor to public namespace (#133113)
Moving DTensor to be in the public namespace, to formally add the
documentation page that includes all the public APIs. This includes:

* many path renames and path import fixes
* a dedicated doc page without too much content yet (adding in the next
  PRs)
* To preserve BC for users still importing from `torch.distributed._tensor`,
  I added a shim script that redirects the old import paths to the new module

BC preservation is evidenced by the fact that all DTensor tests still pass
without changing the public imports, so it's safe to land the changes.
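
A minimal sketch of what such a redirect shim might look like (the file contents and warning text below are assumptions for illustration, not the PR's code):

```
# Illustrative shim for the old `torch.distributed._tensor` path: warn callers
# to migrate, then re-export everything from the new public module.
import warnings

warnings.warn(
    "torch.distributed._tensor is deprecated; import from "
    "torch.distributed.tensor instead.",
    FutureWarning,
    stacklevel=2,
)

from torch.distributed.tensor import *  # noqa: E402,F401,F403
```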

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133113
Approved by: https://github.com/XilunWu
ghstack dependencies: #133305, #133306
2024-08-17 05:09:52 +00:00
wz337
87053132ea [DeviceMesh] Remove parent mesh concept from _MeshEnv and replace by root mesh (#132339)
Previously, when we slice out a submesh from a mesh, we assign the mesh as the parent mesh of the submesh. In this case, when we have a 3D mesh topology, the parent mesh of a 1D mesh sliced out from the 3D mesh is different from the parent mesh of the same 1D mesh sliced out from the 2D submesh of the 3D mesh. For example:
```
mesh_3d = init_device_mesh("cuda", (2,2,2), ("dim0", "dim1", "dim2"))
mesh_dim0 = mesh_3d["dim0"]

mesh_2d = mesh_3d["dim0", "dim1"]
mesh_dim0_2 = mesh_2d["dim0"]

# This would evaluate to True
print(_mesh_resources.get_parent_mesh(mesh_dim0) != _mesh_resources.get_parent_mesh(mesh_dim0_2))
```

We can always reconstruct the mesh we need from the mesh dim names, as long as the dims come from the same root. For simplicity, we do not see a need to build a tree structure to represent the child-parent relationship. Therefore, we are replacing the parent mesh concept with a root mesh concept in `_MeshEnv`, so we would have:

```
mesh_3d = init_device_mesh("cuda", (2,2,2), ("dim0", "dim1", "dim2"))
mesh_dim0 = mesh_3d["dim0"]

mesh_2d = mesh_3d["dim0", "dim1"]
mesh_dim0_2 = mesh_2d["dim0"]

# This would evaluate to True
print(_mesh_resources.get_root_mesh(mesh_dim0) == _mesh_resources.get_root_mesh(mesh_dim0_2))
```
With this change, we will have two types of meshes in an environment, as the small check sketched below illustrates:
1. `device_mesh != _mesh_resources.get_root_mesh(device_mesh)` means that the device_mesh was created by slicing.
2. `device_mesh == _mesh_resources.get_root_mesh(device_mesh)` means that the device_mesh is a root mesh, not created through slicing.
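
A hypothetical helper illustrating the two cases above (the helper name is made up for this sketch, and `_mesh_resources` is private API):

```
# Hypothetical helper, for illustration only: a mesh that differs from its root
# was created by slicing; a mesh equal to its root is a root mesh.
def is_sliced_submesh(device_mesh) -> bool:
    from torch.distributed.device_mesh import _mesh_resources
    return device_mesh != _mesh_resources.get_root_mesh(device_mesh)
```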

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132339
Approved by: https://github.com/wanchaol
ghstack dependencies: #132310, #132311
2024-08-07 07:01:12 +00:00
Xuehai Pan
b25ef91bf1 [BE][Easy][18/19] enforce style for empty lines in import segments in torch/d*/ (#129770)
See https://github.com/pytorch/pytorch/pull/129751#issue-2380881501. Most changes are auto-generated by the linter.

You can review these PRs via:

```bash
git diff --ignore-all-space --ignore-blank-lines HEAD~1
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129770
Approved by: https://github.com/wconstab
2024-08-01 04:22:50 +00:00
Jeeja
16e0868a3d [FSDP] Add hpu device to _get_remote_device_str (#132120)
In `_create_chunk_sharded_tensor`, `_get_remote_device_str` is used. By default it uses the node count to determine the `device:instance` string. For HPU, we need to use the current device to get the device instance.
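
A sketch of the special-casing described above (the signature and the HPU call are assumptions for illustration, not copied from the diff; `torch.hpu` is provided by the Habana plugin):

```
import torch

def _get_remote_device_str(rank: int, device_type: str, num_devices_per_node: int) -> str:
    if device_type.lower() == "cpu":
        # CPU placements carry no device index.
        return f"rank:{rank}/{device_type}"
    elif device_type.lower() == "hpu":
        # Use the currently selected device instead of rank % devices-per-node.
        return f"rank:{rank}/{device_type}:{torch.hpu.current_device()}"
    else:
        return f"rank:{rank}/{device_type}:{rank % num_devices_per_node}"
```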

Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132120
Approved by: https://github.com/awgu
2024-07-30 14:24:24 +00:00
Aaron Orenstein
7c12cc7ce4 Flip default value for mypy disallow_untyped_defs [6/11] (#127843)
See #127836 for details.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127843
Approved by: https://github.com/oulgen
ghstack dependencies: #127842
2024-06-08 18:49:29 +00:00
Shawn Xu
5b8c81eb82 [PT] [FSDP] fix HSDP sharding placement (#123778)
Summary:
https://github.com/pytorch/pytorch/pull/123230 formalized the contract for `ShardedTensor` sub-group rank placement validation by making sure the placement rank is the global rank, in line with the general `torch.distributed` convention.

The current HSDP allows for both `ShardedTensor` and `DTensor`. While `DTensor` will eventually replace `ShardedTensor`, it is still in use, and there's at least one test verifying the state dict with ST output.

This got broken because the test only runs periodically, so it didn't block the other PR.
Fixes [#123749](https://github.com/pytorch/pytorch/issues/123749)
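
A minimal sketch of the convention being restored (the helper is hypothetical; `dist.get_global_rank` is the public API for mapping a subgroup rank to a global rank):

```
import torch.distributed as dist

# Hypothetical helper: ShardedTensor placements built from a sharding subgroup
# must record the global rank, not the rank within the subgroup.
def _shard_placement_str(shard_pg, group_rank: int, device: str) -> str:
    global_rank = dist.get_global_rank(shard_pg, group_rank)
    return f"rank:{global_rank}/{device}"
```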

Test Plan: CI

Differential Revision: D55991256

Pull Request resolved: https://github.com/pytorch/pytorch/pull/123778
Approved by: https://github.com/Skylion007, https://github.com/wz337
2024-04-12 00:05:49 +00:00
Wanchao Liang
0fef82b3df [dcp] fix fsdp state_dict to use run_check=False (#114995)
`from_local` with a replicate placement runs `mesh_broadcast` when
`run_check=True`, which is the default. For the FSDP state_dict case we know
for sure that these tensors are already replicas, so we don't need to check or
force-check them.
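
A sketch of the resulting call pattern (`local_tensor` and `device_mesh` are placeholders):

```
from torch.distributed._tensor import DTensor, Replicate

# `local_tensor` is this rank's replica and `device_mesh` the FSDP mesh.
dtensor = DTensor.from_local(
    local_tensor,
    device_mesh,
    placements=[Replicate()],
    run_check=False,  # skip mesh_broadcast; FSDP already guarantees replicas
)
```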

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114995
Approved by: https://github.com/fegin, https://github.com/XilunWu, https://github.com/wz337
2023-12-02 04:16:37 +00:00
Chien-Chin Huang
a66f2a1b99 [state_dict] Move _gather_state_dict to dcp module (#112835)
This API is used by more than just FSDP, so this PR moves it to the DCP module.

Differential Revision: [D50962966](https://our.internmc.facebook.com/intern/diff/D50962966/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112835
Approved by: https://github.com/wz337
2023-11-08 19:42:56 +00:00
Iris Zhang
c84dbd2c03 [2D] Enable 2D optimizer set_state_dict() (#111778)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111778
Approved by: https://github.com/fegin, https://github.com/fduwjj
ghstack dependencies: #111774
2023-10-27 04:33:00 +00:00
PyTorch MergeBot
d8e19bb03a Revert "[2D] Enable 2D optimizer set_state_dict() (#111778)"
This reverts commit 52eec50d31.

Reverted https://github.com/pytorch/pytorch/pull/111778 on behalf of https://github.com/huydhn due to Sorry for reverting your change, but it is failing the multigpu test in trunk 52eec50d31 ([comment](https://github.com/pytorch/pytorch/pull/111778#issuecomment-1780227820))
2023-10-26 00:18:30 +00:00
wz337
52eec50d31 [2D] Enable 2D optimizer set_state_dict() (#111778)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111778
Approved by: https://github.com/fegin
ghstack dependencies: #111774
2023-10-25 04:27:13 +00:00
wz337
80dfc974dd [2D] Enable 2D FSDP+TP model.load_state_dict() (#110925)
This PR adds an all_gather_dtensor() method to fsdp/_fsdp_extensions.py and the actual implementation in tensor/parallel/fsdp.py. This enables FSDP to load a 2D DTensor state_dict into the model when calling `model.load_state_dict()`.
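
A rough sketch of what such a hook has to do (illustrative only, not the PR's implementation):

```
import torch
from torch.distributed._tensor import DTensor, Replicate

# Illustrative only: gather a 2D (FSDP x TP) DTensor back into a full local
# tensor so FSDP can copy it into the flat parameter during load_state_dict().
def all_gather_dtensor_sketch(tensor: DTensor) -> torch.Tensor:
    placements = [Replicate()] * tensor.device_mesh.ndim
    return tensor.redistribute(tensor.device_mesh, placements).to_local()
```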

cc. @fegin

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110925
Approved by: https://github.com/fegin
ghstack dependencies: #110831, #110846
2023-10-11 18:22:20 +00:00
wz337
6c136c3302 [2D] Enable 2D DTensor state_dict for FSDP + TP (#110846)
This PR adds a `chunk_dtensor()` method to fsdp/_fsdp_extensions.py and the actual implementation of `chunk_dtensor()` in tensor/parallel/fsdp.py. This enables FSDP to return a 2D DTensor state_dict when composing FSDP with TP.
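
A rough sketch of the shape of such a hook (illustrative only; the placements below assume the parameter is row-wise sharded on both mesh dims, whereas the real extension derives them from the TP layout):

```
import torch
from torch.distributed._tensor import DTensor, Shard

# Illustrative only: wrap this rank's local FSDP shard of a TP-sharded
# parameter as a DTensor over the 2D (dp, tp) mesh, so the sharded state_dict
# entry records both levels of sharding.
def chunk_dtensor_sketch(local_shard: torch.Tensor, mesh_2d) -> DTensor:
    return DTensor.from_local(
        local_shard, mesh_2d, placements=[Shard(0), Shard(0)], run_check=False
    )
```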

cc. @fegin
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110846
Approved by: https://github.com/fegin, https://github.com/wanchaol
ghstack dependencies: #110831
2023-10-11 17:40:39 +00:00
wz337
d9eb5a57aa [FSDP] Change _create_chunk_dtensor in fsdp/_shard_utils.py to use public API from DTensor (#110831)
This PR:
1) updates _create_chunk_dtensor() in _shard_utils.py to use public APIs from DTensor. This avoids the global_size calculation error from using DTensor.from_local() for uneven-sharded parameters, as described in https://github.com/pytorch/pytorch/issues/110762 and illustrated in the sketch after this list.
2) updates test/distributed/fsdp/test_fsdp_dtensor_state_dict.py to include a unit test for a model with uneven sharding.
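
As referenced in item 1, a toy illustration of the uneven-sharding pitfall (this is not the PR's code; import paths follow current PyTorch and a 2-rank process group is assumed to be initialized):

```
import torch
from torch.distributed._tensor import Shard, distribute_tensor
from torch.distributed.device_mesh import init_device_mesh

# For a length-5 parameter sharded over 2 ranks, inferring the global shape
# from the local shard (3 rows on rank 0) would give 6 rows instead of 5.
# Distributing from the full tensor keeps the true global shape.
device_mesh = init_device_mesh("cuda", (2,))
full_param = torch.arange(5.0)
dtensor = distribute_tensor(full_param, device_mesh, placements=[Shard(0)])
assert dtensor.shape == full_param.shape  # global shape preserved
```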

cc. @wanchaol, @fegin

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110831
Approved by: https://github.com/wanchaol, https://github.com/fegin
2023-10-10 21:04:27 +00:00
Chien-Chin Huang
88616349d7 [state_dict][1/N] Implement the basic functions of distributed.checkpoint._state_dict (#105902)
This PR implements the basic functions of distributed.checkpoint._state_dict. It currently also contains the flattening of the optimizer state_dict, which makes the PR quite large; a later version may split it into two for easier code review.

Differential Revision: [D47647719](https://our.internmc.facebook.com/intern/diff/D47647719/)

**NOTE FOR REVIEWERS**: This PR has internal Meta-specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D47647719/)!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105902
Approved by: https://github.com/wz337
2023-10-05 20:04:15 +00:00
Chien-Chin Huang
1b3e5b53f3 [FSDP][optim_state_dict] Add device to _shard_utils.py to explicitly use the device from fsdp_state (#109631)
_get_pg_default_device does not always return the device we want. This PR lets the user explicitly tell us the correct device.

Differential Revision: [D49425743](https://our.internmc.facebook.com/intern/diff/D49425743/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109631
Approved by: https://github.com/awgu, https://github.com/fduwjj, https://github.com/wz337
2023-09-20 01:59:38 +00:00
wz337
66af4f6ec7 [HSDP] Add device_mesh to FSDP kwarg and add dtensor state_dict support for HSDP (#107533)
This PR:
1) Adds a device_mesh kwarg to FSDP and removes init_device_mesh() from _runtime_utils.py, as the device_mesh is now passed in by the user as a kwarg (see the usage sketch after this list).
2) Changes the use_dtensor flag for state_dict_config and optim_state_dict_config to be private. If a device_mesh is used with a sharded model/optim state dict, the _use_dtensor flag is set to True and the model/optim state dict returns a DTensor state_dict. Otherwise, the _use_dtensor flag is set to False and the model/optim state dict returns a sharded_tensor state_dict.
3) Updates _optim_utils.py, _shard_utils.py, and _state_dict_utils.py to add support for HSDP to return a 2D DTensor state_dict.
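
A usage sketch of item 1 (mesh shape, dim names, and `model` are placeholders; import paths follow current PyTorch):

```
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, ShardingStrategy

# 2 replica groups x 4 shard ranks; passing the 2D mesh enables HSDP and makes
# sharded model/optim state_dicts come back as DTensors.
mesh_2d = init_device_mesh("cuda", (2, 4), mesh_dim_names=("replicate", "shard"))
sharded_model = FSDP(
    model,  # placeholder: the module to wrap
    device_mesh=mesh_2d,
    sharding_strategy=ShardingStrategy.HYBRID_SHARD,
)
```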

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107533
Approved by: https://github.com/fegin, https://github.com/awgu, https://github.com/wanchaol
2023-09-05 21:21:21 +00:00
Chien-Chin Huang
591cb776af [FSDP][state_dict][optim_state_dict] Log slow optim and model state_dict paths (#108290)
This PR adds SimpleProfiler for FSDP state_dict/load_state_dict logging purposes. SimpleProfiler uses class variables to record profiling results and does everything in Python, which can be slow, so it is only suitable for logging slow actions such as initialization and state_dict/load_state_dict.

This PR uses SimpleProfiler to log some critical/slow paths of the model and optimizer state_dict/load_state_dict.
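
A minimal sketch of the kind of class-variable profiler described above (illustrative, not the actual implementation):

```
import time
from collections import defaultdict
from contextlib import contextmanager

# Results accumulate in class variables and all timing happens in Python, so
# this is only appropriate for slow paths such as initialization and
# state_dict/load_state_dict.
class SimpleProfiler:
    results = defaultdict(float)

    @classmethod
    @contextmanager
    def profile(cls, name: str):
        start = time.monotonic()
        try:
            yield
        finally:
            cls.results[name] += time.monotonic() - start
```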

Differential Revision: [D48774406](https://our.internmc.facebook.com/intern/diff/D48774406/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108290
Approved by: https://github.com/wz337
2023-09-01 06:57:59 +00:00
PyTorch MergeBot
ab5b4c4419 Revert "[HSDP] Add device_mesh to FSDP and add dtensor state_dict support for HSDP (#107533)"
This reverts commit cc220e45a8.

Reverted https://github.com/pytorch/pytorch/pull/107533 on behalf of https://github.com/huydhn due to Sorry for reverting this, but it is failing in trunk with the same failure on test_dynamo_distributed cc220e45a8 ([comment](https://github.com/pytorch/pytorch/pull/107533#issuecomment-1701983247))
2023-09-01 01:26:30 +00:00
wz337
cc220e45a8 [HSDP] Add device_mesh to FSDP and add dtensor state_dict support for HSDP (#107533)
This PR:
1) Adds a device_mesh kwarg to FSDP and removes init_device_mesh() from _runtime_utils.py, as the device_mesh is now passed in by the user as a kwarg.
2) Changes the use_dtensor flag for state_dict_config and optim_state_dict_config to be private. If a device_mesh is used with a sharded model/optim state dict, the _use_dtensor flag is set to True and the model/optim state dict returns a DTensor state_dict. Otherwise, the _use_dtensor flag is set to False and the model/optim state dict returns a sharded_tensor state_dict.
3) Updates _optim_utils.py, _shard_utils.py, and _state_dict_utils.py to add support for HSDP to return a 2D DTensor state_dict.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107533
Approved by: https://github.com/fegin, https://github.com/awgu, https://github.com/wanchaol
2023-09-01 00:15:00 +00:00
Iris
6b2d48e78c [8/n][FSDP] make use_dtensor=True work with offload_to_cpu=True for optim.load_state_dict() (#105690)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105690
Approved by: https://github.com/fegin
2023-07-21 18:55:01 +00:00
Iris
613970eb05 [5/n][FSDP] Update _sharded_post_state_dict_hook to use DTensor when use_dtensor=True in state_dict_config (#103921)
This allows us to use use_dtensor=True in ShardedStateDictConfig() before calling model.state_dict().

The load_state_dict hook updates will be in the next PR.
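
A usage sketch of the flag described above (kwarg spelling taken from this description; a later PR in this list makes the flag private, and `model` is a placeholder for an FSDP-wrapped module):

```
from torch.distributed.fsdp import (
    FullyShardedDataParallel as FSDP,
    ShardedStateDictConfig,
    StateDictType,
)

FSDP.set_state_dict_type(
    model,
    StateDictType.SHARDED_STATE_DICT,
    state_dict_config=ShardedStateDictConfig(use_dtensor=True),
)
sharded_sd = model.state_dict()  # values come back as DTensors
```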

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103921
Approved by: https://github.com/fduwjj, https://github.com/fegin
2023-06-22 08:32:19 +00:00
Rodrigo Kumpera
f83ebfe1bb [FSDP] Improve support for CPU tensors. (#103171)
Don't emit a device index when using CPU devices.
Don't call Tensor::record_stream, as it's a CUDA-only op.
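
A sketch of the kind of guard the second point implies (illustrative):

```
import torch

def _maybe_record_stream(tensor: torch.Tensor, stream: torch.cuda.Stream) -> None:
    # record_stream is a CUDA-only op, so skip it for CPU tensors.
    if tensor.is_cuda:
        tensor.record_stream(stream)
```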
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103171
Approved by: https://github.com/rohan-varma, https://github.com/wz337
2023-06-20 21:08:19 +00:00
Iris
d991ce6da3 [FSDP][3/N]_shard_utils update for dtensor state_dict support (#103479)
Same as https://github.com/pytorch/pytorch/pull/102545 (this branch is corrupted so have to re-submit).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103479
Approved by: https://github.com/fegin
2023-06-14 06:45:28 +00:00
medivh-xp
8b7bd81902 determined collective device by _get_pg_default_device rather than explicit cuda (#101533)
There are many communication operations for ShardedTensor in the FSDP state dict. They use the externally passed-in pg (or the default pg), which currently supports CUDA devices. Before communication, the memory is moved to CUDA, and this is implicit (because it is essentially moving data to the memory type required by the pg, not to the computing device type). Similarly, when users run FSDP on a custom backend, they pass in a custom pg (which does not support CUDA devices), which may cause FSDP to not work properly in some cases. This PR obtains the memory type supported by the pg through _get_pg_default_device during communication and moves the data to it when needed.
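
A sketch of the pattern (`_get_pg_default_device` is private API; `process_group` and `local_tensor` are placeholders):

```
from torch.distributed.distributed_c10d import _get_pg_default_device

# Move the tensor to whatever device the process group communicates on,
# instead of hard-coding CUDA.
pg_device = _get_pg_default_device(process_group)
if local_tensor.device != pg_device:
    local_tensor = local_tensor.to(pg_device)
```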
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101533
Approved by: https://github.com/awgu
2023-05-24 13:48:43 +00:00
Chien-Chin Huang
4f62e7cb10 [FSDP][BE] Remove unused code (#99731)
Remove the unused code. https://github.com/pytorch/pytorch/pull/99675 is a duplicate, so we should land this PR instead.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99731
Approved by: https://github.com/wz337
2023-04-21 23:11:37 +00:00
Kazuaki Ishizaki
35fd5c548e Fix typos under torch/distributed directory (#95638)
This PR fixes typos in comments and messages of `.py` files under torch/distributed directory

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95638
Approved by: https://github.com/usamah1, https://github.com/H-Huang, https://github.com/kit1980
2023-03-27 21:13:44 +00:00
Andrew Gu
e3cf81e0a7 [FSDP] ufmt /fsdp (#87811)
This applies `ufmt` to all of the FSDP files in the `torch/distributed/fsdp/` directory.

**Test Plan**
CI

**Notes**
For VSCode users,
- Install `ufmt`: https://pypi.org/project/ufmt/
- Install VSCode `ufmt` extension: https://marketplace.visualstudio.com/items?itemName=omnilib.ufmt
- Include in `settings.json`:
```
{
    "[python]": {
        "editor.defaultFormatter": "omnilib.ufmt",
        "editor.formatOnSave": true,
    },
}
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/87811
Approved by: https://github.com/rohan-varma, https://github.com/fegin
2022-10-27 04:25:55 +00:00
Andrew Gu
ff71f45788 [FSDP] Add FSDPExtensions for TP support (#85039)
This adds `FSDPExtensions` to enable TP + FSDP composability. To be agnostic to both `ShardedTensor` and `DistributedTensor`, the design relies on customizable hooks.

Some notes:
- I preferred the `_ext` prefix (short for "extension") over `_param_extension` simply because it is shorter. It should not matter much because it is purely internal facing.
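
A rough sketch of what a hook-based extension can look like (the method names below are illustrative, not copied from the PR):

```
from abc import ABC, abstractmethod
from typing import Any, Optional, Tuple

import torch

# A TP implementation of these hooks decides how a parameter is transformed
# before flattening and how it is chunked for the sharded state_dict, keeping
# FSDP itself agnostic to ShardedTensor vs. DistributedTensor.
class FSDPExtensions(ABC):
    @abstractmethod
    def pre_flatten_transform(self, tensor: torch.Tensor) -> Tuple[torch.Tensor, Optional[Any]]:
        """Strip the TP wrapper off a parameter and remember how to restore it."""

    @abstractmethod
    def post_unflatten_transform(self, tensor: torch.Tensor, extension: Any) -> torch.Tensor:
        """Re-apply the TP wrapper to an unflattened view."""

    @abstractmethod
    def chunk_tensor(self, tensor: torch.Tensor, rank: int, world_size: int) -> torch.Tensor:
        """Chunk a tensor for this rank's entry in the sharded state_dict."""
```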
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85039
Approved by: https://github.com/kumpera, https://github.com/fegin
2022-09-28 18:34:17 +00:00
Chien-Chin Huang
3e1fc85b23 [FSDP] Implement sharded_optim_state_dict and flatten_sharded_optim_state_dict. (#77628)
As title

Differential Revision: [D36436496](https://our.internmc.facebook.com/intern/diff/D36436496/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77628
Approved by: https://github.com/awgu
2022-08-18 16:38:58 +00:00
Chien-Chin Huang
244690205f [FSDP] Use _init_from_local_tensor to create ShardedTensor to avoid communication overhead (#82911)
FSDP originally uses `_init_from_local_shards_and_global_metadata()` to create a ShardedTensor for sharded_state_dict(). We have seen some non-trivial overhead when the number of tensors is large. Using `_init_from_local_tensor` can significantly reduce the overhead. For a model with ~250 tensors in the state_dict trained with 16 GPUs, the original `sharded_state_dict` takes ~1.7 seconds and this PR reduces the overhead to ~0.6 seconds.

Differential Revision: [D38452170](https://our.internmc.facebook.com/intern/diff/D38452170/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82911
Approved by: https://github.com/awgu
2022-08-17 16:40:20 +00:00