Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72141
We currently have many sharding components:
torch.distributed._sharded_tensor, torch.distributed._sharding_spec,
torch.distributed._sharded_optimizer, and more coming.
As a result, this change organizes all of them under the `torch.distributed._shard`
package. For BC reasons, I'm still keeping the old packages and having them just
reference the new package.
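A minimal sketch of what such a BC shim could look like (the exact re-export and warning behavior shown here are illustrative, not the actual diff):
```
# torch/distributed/_sharded_tensor/__init__.py -- illustrative BC shim.
# Keep the old import path importable by forwarding to the new package.
import warnings

from torch.distributed._shard.sharded_tensor import *  # noqa: F401,F403

warnings.warn(
    "torch.distributed._sharded_tensor has moved to "
    "torch.distributed._shard.sharded_tensor; please update your imports.",
    DeprecationWarning,
)
```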
ghstack-source-id: 148150861
Test Plan: waitforbuildbot
Reviewed By: fduwjj
Differential Revision: D33904585
fbshipit-source-id: 057e847eb7521b536a3ee4e0f94871aacc752062
(cherry picked from commit 29a70dd7af)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71742
We currently have many sharding components:
torch.distributed._sharded_tensor, torch.distributed._sharding_spec,
torch.distributed._sharded_optimizer, and more coming.
As a result, this change organizes all of them under the `torch.distributed.shard`
package. For BC reasons, I'm still keeping the old packages and having them just
reference the new package.
ghstack-source-id: 147899768
Test Plan: waitforbuildbot
Reviewed By: fduwjj, wanchaol
Differential Revision: D33755913
fbshipit-source-id: dc692b31e2607063d55dfcb3db33ec53961d5a5b
(cherry picked from commit 5b6885f358)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66464
Dimension sizes are generally referred to as `size` in PyTorch, hence
rename `shard_lengths` to `shard_sizes`.
Closes: https://github.com/pytorch/pytorch/issues/65794
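For illustration, a `ShardMetadata` under the renamed field looks roughly like the sketch below (the import path reflects the later `torch.distributed._shard` consolidation; the placement string is an arbitrary example):
```
from torch.distributed._shard.sharding_spec import ShardMetadata

# A 4x8 shard starting at offset [0, 0], described with the renamed
# `shard_sizes` field (previously `shard_lengths`).
shard = ShardMetadata(
    shard_offsets=[0, 0],
    shard_sizes=[4, 8],
    placement="rank:0/cuda:0",
)
```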
ghstack-source-id: 143866449
Test Plan: waitforbuildbot
Reviewed By: fduwjj, wanchaol
Differential Revision: D31564153
fbshipit-source-id: 6273426c4b0e079358806070d0d9644740adb257
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68021
Reland of https://github.com/pytorch/pytorch/pull/64481, as the previous one had some internal failures that weren't caught when it first landed.
This simplifies the `init_from_local_shards` API in sharded tensor to only require the user to pass in a list of `Shard`s and the `overall_size`, instead of a ShardedTensorMetadata. We do an all_gather inside to form a valid ShardedTensorMetadata instead.
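A rough usage sketch of the simplified API (the shapes, ranks, and module paths, which follow the later `torch.distributed._shard` consolidation, are illustrative assumptions):
```
import torch
import torch.distributed as dist
from torch.distributed._shard.sharded_tensor import Shard, init_from_local_shards
from torch.distributed._shard.sharding_spec import ShardMetadata

# Each rank describes only its own shard; the global ShardedTensorMetadata
# is assembled internally via an all_gather across ranks.
rank = dist.get_rank()
local_shard = Shard(
    tensor=torch.rand(4, 8, device=f"cuda:{rank}"),
    metadata=ShardMetadata(
        shard_offsets=[4 * rank, 0],
        shard_sizes=[4, 8],
        placement=f"rank:{rank}/cuda:{rank}",
    ),
)
# Only the overall size of the logical tensor is passed in, not the full metadata.
st = init_from_local_shards([local_shard], 4 * dist.get_world_size(), 8)
```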
TODO: add more test cases to improve coverage.
ghstack-source-id: 143661119
Test Plan: TestShardedTensorFromLocalShards
Reviewed By: pritamdamania87
Differential Revision: D32147888
fbshipit-source-id: 897128b75224f4b9644471a04a64079f51e0d5fe
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67188
This diff/PR implements ShardedEmbeddingBag using ShardedTensor.
We support both row-wise and column-wise sharding of the embedding bag. The detailed logic can be found in the comments.
Several caveats:
1. Only the sharding of one weight is supported now.
2. We support limited input params for the op; support for more params is on the way.
3. We only support chunk sharding for now.
4. We only support a single local shard per rank for now.
Some other changes include:
1. Refactor the ShardedEmbedding code so that the common logic can be reused.
2. Fix tiny typos and a corner case in the API `get_chunked_dim_size`, where it would return -1 for dim_size = 5, split_size = 2, idx = 3. (This is a valid case because when chunks = 4 and dim_size = 5, the split_size is 2; see the sketch below.)
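A minimal sketch of that corner case (not the exact PyTorch implementation; it assumes `torch.chunk`-style splitting where trailing chunks may be empty):
```
def get_chunked_dim_size(dim_size: int, split_size: int, idx: int) -> int:
    # Size of chunk `idx` when a dimension of length `dim_size` is split
    # torch.chunk-style into chunks of at most `split_size` elements.
    # Clamping at 0 avoids returning -1 for trailing empty chunks.
    return max(min(dim_size, split_size * (idx + 1)) - split_size * idx, 0)

# dim_size = 5, split_size = 2 -> chunk sizes [2, 2, 1, 0]; idx = 3 yields 0, not -1.
assert [get_chunked_dim_size(5, 2, i) for i in range(4)] == [2, 2, 1, 0]
```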
ghstack-source-id: 142325915
Test Plan: Unit test and CI
Reviewed By: pritamdamania87
Differential Revision: D31749458
fbshipit-source-id: ed77e05e4ec94ef1a01b1feda8bbf32dc5d5da1b
Summary: Original commit changeset: 6e97d95ffafd
Test Plan: unit test
Reviewed By: wanchaol
Differential Revision: D32023341
fbshipit-source-id: 2a9f7b637c0ff18700bcc3e44466fffcff861698
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64481
This simplifies the `init_from_local_shards` API in sharded tensor to only require the user to pass in a list of `Shard`s and the `overall_size`, instead of a ShardedTensorMetadata. We do an all_gather inside to form a valid ShardedTensorMetadata instead.
TODO: add more test cases to improve coverage.
ghstack-source-id: 141742350
Test Plan: TestShardedTensorFromLocalShards
Reviewed By: pritamdamania87
Differential Revision: D30748504
fbshipit-source-id: 6e97d95ffafde6b5f3970e2c2ba33b76cabd8d8a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65007
Relax the shard size check in ShardMetadata to allow zero-size local shards.
When sharding a tensor across N ranks, some ranks may have an empty shard allocated. As we are assuming SPMD, the ranks with an empty shard still need to participate in all collectives, and we need to allow this in ShardMetadata.
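For example, splitting 3 rows across 4 ranks leaves the last rank with an empty shard; a sketch of the corresponding metadata (field names follow the later `shard_sizes` naming and the placement is illustrative):
```
from torch.distributed._shard.sharding_spec import ShardMetadata

# Rank 3 holds a zero-size shard but still participates in all collectives.
empty_shard = ShardMetadata(
    shard_offsets=[3, 0],
    shard_sizes=[0, 8],
    placement="rank:3/cuda:3",
)
```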
Test Plan: Unit tests and CLI
Reviewed By: jiaqizhai, wanchaol
Differential Revision: D30926566
fbshipit-source-id: afa562c94ffa8f8d91d65ddb4c348156d871dc36
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64128
This PR implements a sharded nn.Linear layer using ShardedTensors with
the following limitations:
1) Works only for ChunkShardingSpec.
2) The implementation is only aimed at demonstrating functionality and is most
likely not performant at all.
The PR also introduces a `shard_parameter` API to easily shard parameters of
`nn.Modules`. This also has the following limitations:
1) Works only for ChunkShardingSpec.
2) Is not performant since it uses broadcast instead of scatter, because
ProcessGroupNCCL doesn't yet support scatter.
Overall user API for running a sharded linear would be something like this:
```
# SPMD programming paradigm running the same code on all nodes.
# (Imports shown for completeness; these module paths are illustrative and
# were later consolidated under torch.distributed._shard.)
import torch
import torch.nn as nn
from torch.distributed._sharded_tensor import shard_parameter
from torch.distributed._sharding_spec import ChunkShardingSpec

fc = nn.Linear(10, 10)

# Setup sharding.
sharding_spec = ChunkShardingSpec(...)
shard_parameter(fc, 'weight', sharding_spec, src_rank=0)

# Run as a normal linear layer.
inp = torch.rand(10, 10)
output = fc(inp)
```
ghstack-source-id: 138500985
Test Plan:
1) unit tests.
2) waitforbuildbot
Reviewed By: wanchaol, bowangbj
Differential Revision: D30621215
fbshipit-source-id: 1aa7478568c18a4572f6c3462fdf24a4cbde01d6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63426
placement should be either a string or a _remote_device; this fixes the type annotation to match the behavior
ghstack-source-id: 136041125
Reviewed By: pritamdamania87
Differential Revision: D30379702
fbshipit-source-id: 34e226494240923b433e3a39cc08c84d42cdad6b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62927
As part of the ShardedTensor work, we realized we do need some sort of
_RemoteDevice structure that deals with our format of "workername/device" so
that users don't have to worry about parsing this string directly.
Right now this structure is just the bare minimum and is mostly a container for
describing a remote device. It is currently only used in ShardedTensor,
ShardingSpec and RemoteModule.
Once we actually have a consolidated remote device proposal, this class can be
extended appropriately if needed.
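A hedged sketch of how the structure is used to parse the "workername/device" format (the example strings are assumptions):
```
from torch.distributed import _remote_device

# Parse a "workername/device" string into its components.
d = _remote_device("trainer0/cuda:0")
print(d.worker_name())  # 'trainer0'
print(d.device())       # device(type='cuda', index=0)

# A rank-based form is also used by the sharding specs.
r = _remote_device("rank:1/cuda:1")
print(r.rank())         # 1
```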
ghstack-source-id: 135534086
Test Plan:
1) unit tests
2) waitforbuildbot
Reviewed By: SciPioneer
Differential Revision: D30170689
fbshipit-source-id: 1ac2e81c7a597dc40bf3fbf2c1168c382c66649f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60479
This adds an `init_from_local_shards` API to construct a ShardedTensor from local shards and the global sharded_tensor_metadata. It also refactors the utils in ShardingSpec so that they can be used by sharded_tensor for sanity checks.
Test Plan:
test_init_from_local_shards
test_init_from_local_shards_invalid_sharding
Reviewed By: pritamdamania87
Differential Revision: D29276777
fbshipit-source-id: 011c1d70426bc560a59b8d858c68f1aa12db8481
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61388
The doc for the `placement` argument was outdated and is now fixed.
ghstack-source-id: 133184441
Test Plan: waitforbuildbot
Reviewed By: wanchaol
Differential Revision: D29601316
fbshipit-source-id: a0817f799382bf91a5192c54dfeea4d253eb0d56
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59990
ShardingSpecs accepted a Device/PlacementSpec and were initially
written this way for flexibility. However, this is slightly confusing given
there is no general use case for it. As a result, to keep things simple, I've
ensured that both specs only accept devices for now.
We can always extend this to include a general PlacementSpec later on.
ghstack-source-id: 131842525
Test Plan: waitforbuildbot
Reviewed By: SciPioneer, rohan-varma
Differential Revision: D29116463
fbshipit-source-id: a6f2b3f1346ac6afab91c9595d4cae4f4da04fda
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58517
Building upon the sharding specifications, this PR introduces the
initial skeleton of ShardedTensor and allows building a ShardedTensor by
specifying a ChunkShardingSpec.
In follow-up PRs, I'll add further support for GenericShardingSpec.
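A rough sketch of building a ShardedTensor from a ChunkShardingSpec (the module path follows the later `torch.distributed._shard` consolidation and the placements are illustrative):
```
from torch.distributed._shard import sharded_tensor
from torch.distributed._shard.sharding_spec import ChunkShardingSpec

# Shard dim 0 of a 10x20 tensor across two GPUs.
spec = ChunkShardingSpec(
    dim=0,
    placements=["rank:0/cuda:0", "rank:1/cuda:1"],
)
st = sharded_tensor.empty(spec, 10, 20)
print(st.local_shards()[0].tensor.shape)  # torch.Size([5, 20]) on each rank
```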
ghstack-source-id: 129917841
Test Plan:
1) unit tests.
2) waitforbuildbot
Reviewed By: SciPioneer
Differential Revision: D28526012
fbshipit-source-id: 8e62847b58957d284e40f57a644302c171289138
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57409
Full design: https://github.com/pytorch/pytorch/issues/55207
In https://github.com/pytorch/pytorch/issues/55207, we proposed
`MeshShardingSpec` as a generic sharding mechanism. However, that proposal does
not provide the flexibility to specify shards with uneven
sizes/partitions and assumes even partitioning. Uneven partitioning is one of
the requirements of an internal use case.
As a result, we instead introduce a `GenericShardingSpec`, which allows
specifying any arbitrary partitioning of a multi-dimensional tensor. Basically,
it specifies the start offsets of each shard and the length of each dim of the
shard, allowing for greater flexibility.
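A hedged illustration of the kind of uneven layout this allows (shown with `EnumerableShardingSpec`, the name this spec carries in current PyTorch; placements are arbitrary examples):
```
from torch.distributed._shard.sharding_spec import (
    EnumerableShardingSpec,
    ShardMetadata,
)

# An uneven split of a 10x8 tensor: 6 rows on rank 0, 4 rows on rank 1.
spec = EnumerableShardingSpec([
    ShardMetadata(shard_offsets=[0, 0], shard_sizes=[6, 8], placement="rank:0/cuda:0"),
    ShardMetadata(shard_offsets=[6, 0], shard_sizes=[4, 8], placement="rank:1/cuda:1"),
])
```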
ghstack-source-id: 129604155
Test Plan:
1) unit tests
2) waitforbuildbot
Reviewed By: SciPioneer
Differential Revision: D28137616
fbshipit-source-id: 61255762485fb8fa3ec3a43c27bbb222ca25abff
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55728
Full design: https://github.com/pytorch/pytorch/issues/55207
This PR introduces ChunkShardingSpec (SingleShardingSpec in the design). We used
the name ChunkShardingSpec since it is very similar to `torch.chunk` in terms
of how a Tensor is split up, and it feels clearer than SingleShardingSpec.
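A brief sketch of the `torch.chunk` analogy (the placements are illustrative):
```
import torch
from torch.distributed._shard.sharding_spec import ChunkShardingSpec

# torch.chunk splits a dimension into contiguous, roughly equal pieces...
chunks = torch.chunk(torch.arange(10), 4)
print([c.shape[0] for c in chunks])  # [3, 3, 3, 1]

# ...and ChunkShardingSpec describes the same style of split across ranks.
spec = ChunkShardingSpec(
    dim=0,
    placements=[f"rank:{r}/cuda:{r}" for r in range(4)],
)
```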
ghstack-source-id: 129603318
Test Plan: waitforbuildbot
Reviewed By: SciPioneer
Differential Revision: D27694108
fbshipit-source-id: c8764abe6a4d5fc56d023fda29b74b5af2a73b49