12 Commits

Author SHA1 Message Date
Iris Zhang (PyTorch)
23fa9621e4 [DeviceMesh] Rename _device_mesh.py to device_mesh.py to prepare for beta (#115099) (#115193)
Summary:

Rename _device_mesh.py to device_mesh.py, update all callsites, add documentation.
We created stubs for the public classes and methods in torch.distributed.device_mesh so that torch.distributed.device_mesh can be imported whether or not torch.distributed.is_available() returns True.

Original diff reverted: D51629761
Original PR reverted: https://github.com/pytorch/pytorch/pull/115099
Prior to landing, all CI signals had passed. Shipit added the "ci/trunk" label to the PR but DID NOT wait for it and went ahead with the commit. More context can be found in the reverted PR above.

Test Plan: CI.

Differential Revision: D51861018

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115193
Approved by: https://github.com/fegin
2023-12-08 08:44:32 +00:00
wz337
dacf5d6e92 [DTensor] Remove assert to allow tensor sharding dimension < Shard(x).ndim (#115114)
Consolidated with changes made by @yoyoyocmu. https://www.internalfb.com/diff/D51821717
Remove assert to allow tensor dimension < Shard(x).ndim. With the current padding, we do support this already.

Follow-up: we still need to fix the size mismatch and the `full_tensor()` hang when the tensor is unevenly sharded.
Created issue here: https://github.com/pytorch/pytorch/issues/115310

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115114
Approved by: https://github.com/yoyoyocmu, https://github.com/wanchaol
2023-12-07 21:57:30 +00:00
Nikita Shulga
a827ac71f2 Revert "[DeviceMesh] Rename _device_mesh.py to device_mesh.py to prepare for beta (#115099)"
This reverts commit eaa64339d6.
2023-12-05 08:59:36 -08:00
Iris Zhang (PyTorch)
eaa64339d6 [DeviceMesh] Rename _device_mesh.py to device_mesh.py to prepare for beta (#115099)
Summary:
Rename _device_mesh.py to device_mesh.py, update all callsites, add documentation.

Original diff reverted: D51629761
Original PR reverted: https://github.com/pytorch/pytorch/pull/114991
It failed a public module binding test on macOS because of the change in import order in torch/distributed/fsdp/_common_utils.py. Since the original import still works, we removed the changes to this file.

Test Plan: CI.

Differential Revision: D51825114

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115099
Approved by: https://github.com/wanchaol, https://github.com/fegin
2023-12-05 05:44:52 +00:00
PyTorch MergeBot
3a2e2044cd Revert "[DeviceMesh] Rename _device_mesh.py to device_mesh.py to prepare for beta (#114710) (#114991)"
This reverts commit 729ac7317a.

Reverted https://github.com/pytorch/pytorch/pull/114991 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/114991#issuecomment-1837214567))
2023-12-02 17:55:51 +00:00
Iris Zhang (PyTorch)
729ac7317a [DeviceMesh] Rename _device_mesh.py to device_mesh.py to prepare for beta (#114710) (#114991)
Summary:

Same changes as https://github.com/pytorch/pytorch/pull/114710

Rename _device_mesh.py to device_mesh.py, update all callsites, add documentation.
ghstack-source-id: 208980207
exported-using-ghexport

Test Plan: CI.

Reviewed By: wanchaol

Differential Revision: D51629761

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114991
Approved by: https://github.com/wanchaol, https://github.com/fduwjj, https://github.com/fegin
2023-12-02 04:39:41 +00:00
wz337
49aa8d19dd [DTensor] Replace usage of compute_local_offset by compute_local_shape_and_global_offset (#108547)
This PR removes four usages of compute_local_offset() in the PyTorch repo and replaces them with the new API compute_local_shape_and_global_offset().

We will remove the compute_local_offset() API in the next diff, as there are still internal usages.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108547
Approved by: https://github.com/wanchaol
2023-09-06 04:53:44 +00:00
wz337
13e4cce83c [DTensor] Add util API to compute_local_shape_and_global_offset for checkpointing purpose (#107996)
The compute_local_shape_and_global_offset API does the following:
1) Calculate both local_shape and global_offset in one API to replace two API calls (compute_local_size and compute_local_shape).
2) Generate the correct global_offset for checkpointing purposes. We are currently using compute_local_offset for downstream checkpoint components, which could lead to incorrect results. For checkpointing, we need global_offset instead of local_offset. In some cases, global_offset does not equal local_offset, namely when a dimension is sharded multiple times on different mesh dimensions (e.g. placements = [Shard(0), Shard(0)]).

Follow-up PRs:
1) Replace related downstream components to use compute_local_shape_and_global_offset instead of compute_local_size and compute_local_offset.
2) Audit existing code base to see if we can remove compute_local_size and compute_local_offset, since they are currently being used.

cc. @wanchaol
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107996
Approved by: https://github.com/wanchaol
2023-08-30 02:46:50 +00:00
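
To make the global_offset versus local_offset distinction in the commit above concrete, here is a minimal pure-Python sketch. It is not the PyTorch implementation: the helper names (`shard_size_and_offset`, `sketch_local_shape_and_global_offset`) are hypothetical, placements are simplified to a tensor dimension index (or None for replication) per mesh dimension, and chunk-style sharding is assumed.

```python
import math
from typing import Optional, Sequence, Tuple


def shard_size_and_offset(dim_size: int, num_shards: int, index: int) -> Tuple[int, int]:
    """Size and start offset of shard `index` under chunk-style sharding (assumed)."""
    full = math.ceil(dim_size / num_shards)
    offset = min(full * index, dim_size)
    size = min(full, dim_size - offset)
    return size, offset


def sketch_local_shape_and_global_offset(
    global_shape: Sequence[int],
    mesh_shape: Sequence[int],
    coordinate: Sequence[int],
    placements: Sequence[Optional[int]],
) -> Tuple[Tuple[int, ...], Tuple[int, ...]]:
    """Hypothetical sketch: placements[i] is the tensor dim sharded on mesh dim i, or None."""
    local_shape = list(global_shape)
    global_offset = [0] * len(global_shape)
    for mesh_dim, tensor_dim in enumerate(placements):
        if tensor_dim is None:
            continue  # replicated on this mesh dimension: nothing changes
        size, offset = shard_size_and_offset(
            local_shape[tensor_dim], mesh_shape[mesh_dim], coordinate[mesh_dim]
        )
        local_shape[tensor_dim] = size
        # Accumulate: each later sharding is relative to the shard produced so far.
        global_offset[tensor_dim] += offset
    return tuple(local_shape), tuple(global_offset)
```

For an (8, 3) tensor on a 2x2 mesh with both mesh dimensions sharding tensor dimension 0, the rank at coordinate (1, 1) holds rows 6 and 7. Its offset within the last sharding step alone is 2, while the offset a checkpoint needs is 6, which is what the accumulation in this sketch produces.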
Iris
0d2b55c459 [DTensor] Change Sharding algorithm to be in line with `torch.chunk()` (#98722)
As functional collectives are being updated, using tensor_split() as the underlying sharding algorithm would require padding and unpadding on multiple ranks. Therefore, we are changing the sharding algorithm to be in line with ``torch.chunk()``, which confines padding to the last two ranks in most scenarios.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98722
Approved by: https://github.com/wanchaol
2023-04-21 02:05:22 +00:00
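
The padding argument in the commit above can be illustrated with a small pure-Python sketch that compares per-rank shard sizes under the two splitting rules. The function names are hypothetical and no tensors are involved; it only mirrors the size rules of ``torch.chunk()`` and ``torch.tensor_split()`` as I understand them, with trailing zeros standing for ranks that would receive an empty shard (``torch.chunk()`` itself would simply return fewer chunks).

```python
import math
from typing import List


def chunk_style_sizes(n: int, num_ranks: int) -> List[int]:
    """Every rank gets ceil(n / num_ranks) elements until the data runs out."""
    full = math.ceil(n / num_ranks)
    sizes = []
    remaining = n
    for _ in range(num_ranks):
        size = min(full, remaining)
        sizes.append(size)
        remaining -= size
    return sizes


def tensor_split_style_sizes(n: int, num_ranks: int) -> List[int]:
    """The first n % num_ranks ranks get one extra element; the rest get n // num_ranks."""
    base, extra = divmod(n, num_ranks)
    return [base + 1] * extra + [base] * (num_ranks - extra)


def ranks_needing_padding(sizes: List[int]) -> List[int]:
    """Ranks whose shard is smaller than the largest shard and so must be padded."""
    target = max(sizes)
    return [rank for rank, size in enumerate(sizes) if size < target]


if __name__ == "__main__":
    for n in (13, 5):
        for rule in (chunk_style_sizes, tensor_split_style_sizes):
            sizes = rule(n, 4)
            print(n, rule.__name__, sizes, ranks_needing_padding(sizes))
```

With 13 elements on 4 ranks, the chunk-style rule pads only the last rank, whereas the tensor_split-style rule pads three ranks. Very small inputs (for example 1 element on 4 ranks) still pad more than two ranks under the chunk-style rule, which matches the commit's "most scenarios" qualifier.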
Shen Li
02179827cb [Easy] Include SPMD and DTensor files in UFMT checks (#98148)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98148
Approved by: https://github.com/fegin
2023-04-02 15:34:49 +00:00
Wanchao Liang
789fc4c292 [dtensor] refactor shape/offset calculation (#95923)
Shape and offset calculations are commonly used, so extract them into a separate util.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95923
Approved by: https://github.com/fduwjj
2023-03-05 06:33:32 +00:00
Ke Sang
6c061e5145 [DTensor][Shampoo] add _tensor.zero function (#95863)
Summary:
Implement a zeros function inside the DTensor API.
- The user specifies the shape of the zeros tensor, and the function creates the local zero tensor given the placement information.

Test Plan:
- unit test for the util function compute_local_tensor_size
- unit test for _tensor.zeros

Reviewed By: wanchaol

Differential Revision: D43630718

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95863
Approved by: https://github.com/wanchaol
2023-03-03 19:36:44 +00:00