Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72141
We currently have many sharding components:
torch.distributed._sharded_tensor, torch.distributed._sharding_spec,
torch.distributed._sharded_optimizer, and more coming.
As a result, this change organizes all of them under the `torch.distributed._shard`
package. For BC reasons, I'm still keeping the old packages and having them just
reference the new package.
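A minimal sketch of what such a BC shim could look like (the exact re-export and warning behavior shown here are illustrative, not the actual diff):
```
# torch/distributed/_sharded_tensor/__init__.py -- illustrative BC shim.
# Keep the old import path importable by forwarding to the new package.
import warnings

from torch.distributed._shard.sharded_tensor import *  # noqa: F401,F403

warnings.warn(
    "torch.distributed._sharded_tensor has moved to "
    "torch.distributed._shard.sharded_tensor; please update your imports.",
    DeprecationWarning,
)
```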
ghstack-source-id: 148150861
Test Plan: waitforbuildbot
Reviewed By: fduwjj
Differential Revision: D33904585
fbshipit-source-id: 057e847eb7521b536a3ee4e0f94871aacc752062
(cherry picked from commit 29a70dd7af)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71742
We currently have many sharding components:
torch.distributed._sharded_tensor, torch.distributed._sharding_spec,
torch.distributed._sharded_optimizer, and more coming.
As a result, this change organizes all of them under the `torch.distributed.shard`
package. For BC reasons, I'm still keeping the old packages and having them just
reference the new package.
ghstack-source-id: 147899768
Test Plan: waitforbuildbot
Reviewed By: fduwjj, wanchaol
Differential Revision: D33755913
fbshipit-source-id: dc692b31e2607063d55dfcb3db33ec53961d5a5b
(cherry picked from commit 5b6885f358)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66464
Dimension sizes are generally referred to as `size` in PyTorch, hence
rename `shard_lengths` to `shard_sizes`.
Closes: https://github.com/pytorch/pytorch/issues/65794
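For illustration, a `ShardMetadata` under the renamed field looks roughly like the sketch below (the import path reflects the later `torch.distributed._shard` consolidation; the placement string is an arbitrary example):
```
from torch.distributed._shard.sharding_spec import ShardMetadata

# A 4x8 shard starting at offset [0, 0], described with the renamed
# `shard_sizes` field (previously `shard_lengths`).
shard = ShardMetadata(
    shard_offsets=[0, 0],
    shard_sizes=[4, 8],
    placement="rank:0/cuda:0",
)
```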
ghstack-source-id: 143866449
Test Plan: waitforbuildbot
Reviewed By: fduwjj, wanchaol
Differential Revision: D31564153
fbshipit-source-id: 6273426c4b0e079358806070d0d9644740adb257
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68021
Reland of https://github.com/pytorch/pytorch/pull/64481, as the previous one had some internal failures that weren't caught when it first landed.
This simplifies the `init_from_local_shards` API in sharded tensor to only require the user to pass in a list of `Shard`s and the `overall_size`, instead of a ShardedTensorMetadata. We do an all_gather inside to form a valid ShardedTensorMetadata instead.
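A rough usage sketch of the simplified API (the shapes, ranks, and module paths, which follow the later `torch.distributed._shard` consolidation, are illustrative assumptions):
```
import torch
import torch.distributed as dist
from torch.distributed._shard.sharded_tensor import Shard, init_from_local_shards
from torch.distributed._shard.sharding_spec import ShardMetadata

# Each rank describes only its own shard; the global ShardedTensorMetadata
# is assembled internally via an all_gather across ranks.
rank = dist.get_rank()
local_shard = Shard(
    tensor=torch.rand(4, 8, device=f"cuda:{rank}"),
    metadata=ShardMetadata(
        shard_offsets=[4 * rank, 0],
        shard_sizes=[4, 8],
        placement=f"rank:{rank}/cuda:{rank}",
    ),
)
# Only the overall size of the logical tensor is passed in, not the full metadata.
st = init_from_local_shards([local_shard], 4 * dist.get_world_size(), 8)
```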
TODO: add more test cases to improve coverage.
ghstack-source-id: 143661119
Test Plan: TestShardedTensorFromLocalShards
Reviewed By: pritamdamania87
Differential Revision: D32147888
fbshipit-source-id: 897128b75224f4b9644471a04a64079f51e0d5fe
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67188
This diff/PR implements ShardedEmbeddingBag using ShardedTensor.
We support both row-wise and column-wise sharding of the embedding bag. The detailed logic can be found in the comments.
Several caveats:
1. Only the sharding of one weight is supported now.
2. We support limited input params for the op; support for more params is on the way.
3. We only support chunk sharding for now.
4. We only support a single local shard per rank for now.
Some other changes include:
1. Refactor the ShardedEmbedding code so that the common logic can be reused.
2. Fix tiny typos and a corner case in the API `get_chunked_dim_size`, where it would return -1 for dim_size = 5, split_size = 2, idx = 3. (This is a valid case because when chunks = 4 and dim_size = 5, the split_size is 2; see the sketch below.)
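A minimal sketch of that corner case (not the exact PyTorch implementation; it assumes `torch.chunk`-style splitting where trailing chunks may be empty):
```
def get_chunked_dim_size(dim_size: int, split_size: int, idx: int) -> int:
    # Size of chunk `idx` when a dimension of length `dim_size` is split
    # torch.chunk-style into chunks of at most `split_size` elements.
    # Clamping at 0 avoids returning -1 for trailing empty chunks.
    return max(min(dim_size, split_size * (idx + 1)) - split_size * idx, 0)

# dim_size = 5, split_size = 2 -> chunk sizes [2, 2, 1, 0]; idx = 3 yields 0, not -1.
assert [get_chunked_dim_size(5, 2, i) for i in range(4)] == [2, 2, 1, 0]
```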
ghstack-source-id: 142325915
Test Plan: Unit test and CI
Reviewed By: pritamdamania87
Differential Revision: D31749458
fbshipit-source-id: ed77e05e4ec94ef1a01b1feda8bbf32dc5d5da1b
Summary: Original commit changeset: 6e97d95ffafd
Test Plan: unit test
Reviewed By: wanchaol
Differential Revision: D32023341
fbshipit-source-id: 2a9f7b637c0ff18700bcc3e44466fffcff861698
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64481
This simplifies the `init_from_local_shards` API in sharded tensor to only require the user to pass in a list of `Shard`s and the `overall_size`, instead of a ShardedTensorMetadata. We do an all_gather inside to form a valid ShardedTensorMetadata instead.
TODO: add more test cases to improve coverage.
ghstack-source-id: 141742350
Test Plan: TestShardedTensorFromLocalShards
Reviewed By: pritamdamania87
Differential Revision: D30748504
fbshipit-source-id: 6e97d95ffafde6b5f3970e2c2ba33b76cabd8d8a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65007
Relax the shard size check in ShardMetadata to allow zero-size local shards.
When sharding a tensor across N ranks, some ranks may have an empty shard allocated. As we are assuming SPMD, the ranks with an empty shard still need to participate in all collectives, and we need to allow this in ShardMetadata.
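For example, splitting 3 rows across 4 ranks leaves the last rank with an empty shard; a sketch of the corresponding metadata (field names follow the later `shard_sizes` naming and the placement is illustrative):
```
from torch.distributed._shard.sharding_spec import ShardMetadata

# Rank 3 holds a zero-size shard but still participates in all collectives.
empty_shard = ShardMetadata(
    shard_offsets=[3, 0],
    shard_sizes=[0, 8],
    placement="rank:3/cuda:3",
)
```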
Test Plan: Unit tests and CLI
Reviewed By: jiaqizhai, wanchaol
Differential Revision: D30926566
fbshipit-source-id: afa562c94ffa8f8d91d65ddb4c348156d871dc36
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64128
This PR implements a sharded nn.Linear layer using ShardedTensors with
the following limitations:
1) Works only for ChunkShardingSpec.
2) The implementation is only aimed at demonstrating functionality and is most
likely not performant at all.
The PR also introduces a `shard_parameter` API to easily shard parameters of
`nn.Modules`. This also has the following limitations:
1) Works only for ChunkShardingSpec.
2) Is not performant since it uses broadcast instead of scatter, because
ProcessGroupNCCL doesn't yet support scatter.
Overall user API for running a sharded linear would be something like this:
```
# SPMD programming paradigm running the same code on all nodes.
# (Imports shown for completeness; these module paths are illustrative and
# were later consolidated under torch.distributed._shard.)
import torch
import torch.nn as nn
from torch.distributed._sharded_tensor import shard_parameter
from torch.distributed._sharding_spec import ChunkShardingSpec

fc = nn.Linear(10, 10)

# Setup sharding.
sharding_spec = ChunkShardingSpec(...)
shard_parameter(fc, 'weight', sharding_spec, src_rank=0)

# Run as a normal linear layer.
inp = torch.rand(10, 10)
output = fc(inp)
```
ghstack-source-id: 138500985
Test Plan:
1) unit tests.
2) waitforbuildbot
Reviewed By: wanchaol, bowangbj
Differential Revision: D30621215
fbshipit-source-id: 1aa7478568c18a4572f6c3462fdf24a4cbde01d6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63426
placement should be either a string or a _remote_device; this fixes the type annotation to match the behavior
ghstack-source-id: 136041125
Reviewed By: pritamdamania87
Differential Revision: D30379702
fbshipit-source-id: 34e226494240923b433e3a39cc08c84d42cdad6b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62927
As part of the ShardedTensor work, we realized we do need some sort of
_RemoteDevice structure that deals with our format of "workername/device" so
that users don't have to worry about parsing this string directly.
Right now this structure is just the bare minimum and is mostly a container for
describing a remote device. It is currently only used in ShardedTensor,
ShardingSpec and RemoteModule.
Once we actually have a consolidated remote device proposal, this class can be
extended appropriately if needed.
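A hedged sketch of how the structure is used to parse the "workername/device" format (the example strings are assumptions):
```
from torch.distributed import _remote_device

# Parse a "workername/device" string into its components.
d = _remote_device("trainer0/cuda:0")
print(d.worker_name())  # 'trainer0'
print(d.device())       # device(type='cuda', index=0)

# A rank-based form is also used by the sharding specs.
r = _remote_device("rank:1/cuda:1")
print(r.rank())         # 1
```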
ghstack-source-id: 135534086
Test Plan:
1) unit tests
2) waitforbuildbot
Reviewed By: SciPioneer
Differential Revision: D30170689
fbshipit-source-id: 1ac2e81c7a597dc40bf3fbf2c1168c382c66649f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60479
This adds an `init_from_local_shards` API to construct a ShardedTensor from local shards and the global sharded_tensor_metadata. It also refactors the utils in ShardingSpec so that they can be used by sharded_tensor for sanity checks.
Test Plan:
test_init_from_local_shards
test_init_from_local_shards_invalid_sharding
Reviewed By: pritamdamania87
Differential Revision: D29276777
fbshipit-source-id: 011c1d70426bc560a59b8d858c68f1aa12db8481
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61388
The doc for the `placement` argument was outdated and is now fixed.
ghstack-source-id: 133184441
Test Plan: waitforbuildbot
Reviewed By: wanchaol
Differential Revision: D29601316
fbshipit-source-id: a0817f799382bf91a5192c54dfeea4d253eb0d56
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59990
ShardingSpecs accepted a Device/PlacementSpec and were initially
written this way for flexibility. However, this is slightly confusing given
there is no general use case for it. As a result, to keep things simple, I've
ensured that both specs only accept devices for now.
We can always extend this to include a general PlacementSpec later on.
ghstack-source-id: 131842525
Test Plan: waitforbuildbot
Reviewed By: SciPioneer, rohan-varma
Differential Revision: D29116463
fbshipit-source-id: a6f2b3f1346ac6afab91c9595d4cae4f4da04fda
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58517
Building upon the sharding specifications, this PR introduces the
initial skeleton of ShardedTensor and allows building a ShardedTensor by
specifying a ChunkShardingSpec.
In follow-up PRs, I'll add further support for GenericShardingSpec.
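A rough sketch of building a ShardedTensor from a ChunkShardingSpec (the module path follows the later `torch.distributed._shard` consolidation and the placements are illustrative):
```
from torch.distributed._shard import sharded_tensor
from torch.distributed._shard.sharding_spec import ChunkShardingSpec

# Shard dim 0 of a 10x20 tensor across two GPUs.
spec = ChunkShardingSpec(
    dim=0,
    placements=["rank:0/cuda:0", "rank:1/cuda:1"],
)
st = sharded_tensor.empty(spec, 10, 20)
print(st.local_shards()[0].tensor.shape)  # torch.Size([5, 20]) on each rank
```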
ghstack-source-id: 129917841
Test Plan:
1) unit tests.
2) waitforbuildbot
Reviewed By: SciPioneer
Differential Revision: D28526012
fbshipit-source-id: 8e62847b58957d284e40f57a644302c171289138
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57409
Full design: https://github.com/pytorch/pytorch/issues/55207
In https://github.com/pytorch/pytorch/issues/55207, we proposed
`MeshShardingSpec` as a generic sharding mechanism. However, that proposal does
not provide the flexibility to specify shards with uneven
sizes/partitions and assumes even partitioning. Uneven partitioning is one of
the requirements of an internal use case.
As a result, we instead introduce a `GenericShardingSpec`, which allows
specifying any arbitrary partitioning of a multi-dimensional tensor. Basically,
it specifies the start offsets of each shard and the length of each dim of the
shard, allowing for greater flexibility.
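A hedged illustration of the kind of uneven layout this allows (shown with `EnumerableShardingSpec`, the name this spec carries in current PyTorch; placements are arbitrary examples):
```
from torch.distributed._shard.sharding_spec import (
    EnumerableShardingSpec,
    ShardMetadata,
)

# An uneven split of a 10x8 tensor: 6 rows on rank 0, 4 rows on rank 1.
spec = EnumerableShardingSpec([
    ShardMetadata(shard_offsets=[0, 0], shard_sizes=[6, 8], placement="rank:0/cuda:0"),
    ShardMetadata(shard_offsets=[6, 0], shard_sizes=[4, 8], placement="rank:1/cuda:1"),
])
```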
ghstack-source-id: 129604155
Test Plan:
1) unit tests
2) waitforbuildbot
Reviewed By: SciPioneer
Differential Revision: D28137616
fbshipit-source-id: 61255762485fb8fa3ec3a43c27bbb222ca25abff
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55728
Full design: https://github.com/pytorch/pytorch/issues/55207
This PR introduces ChunkShardingSpec (SingleShardingSpec in the design). We used
the name ChunkShardingSpec since it is very similar to `torch.chunk` in terms
of how a Tensor is split up, and it feels clearer than SingleShardingSpec.
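A brief sketch of the `torch.chunk` analogy (the placements are illustrative):
```
import torch
from torch.distributed._shard.sharding_spec import ChunkShardingSpec

# torch.chunk splits a dimension into contiguous, roughly equal pieces...
chunks = torch.chunk(torch.arange(10), 4)
print([c.shape[0] for c in chunks])  # [3, 3, 3, 1]

# ...and ChunkShardingSpec describes the same style of split across ranks.
spec = ChunkShardingSpec(
    dim=0,
    placements=[f"rank:{r}/cuda:{r}" for r in range(4)],
)
```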
ghstack-source-id: 129603318
Test Plan: waitforbuildbot
Reviewed By: SciPioneer
Differential Revision: D27694108
fbshipit-source-id: c8764abe6a4d5fc56d023fda29b74b5af2a73b49