pytorch/torch/distributed
Ke Wen daed3bf8f9 Implement coalesced all_gather_into_tensor (#101157)
This PR adds support for the following use cases:
- Sync style:
```
with dist._coalescing_manager():
    for i in range(num_coll):
        dist.all_gather_into_tensor(output_tensors[i], input_tensors[i])
```
- Async style:
```
with dist._coalescing_manager(async_ops=True) as cm:
    for i in range(num_coll):
        dist.all_gather_into_tensor(output_tensors[i], input_tensors[i])

# do a bunch of other things
cm.wait()
# do things that depend on the all-gathers
```
Each `all_gather_into_tensor` is independent in terms of its data and buffer location, but the calls can be executed in parallel by supported backends (such as NCCL).
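For reference, a minimal, self-contained sketch of the async pattern above. It assumes a single-node launch via `torchrun` (e.g. `torchrun --nproc-per-node=2 demo.py`), the NCCL backend with one CUDA device per rank, and illustrative choices for `num_coll` and the tensor shapes; only `dist._coalescing_manager`, `dist.all_gather_into_tensor`, and `cm.wait()` come from this PR.
```
# Hypothetical demo script; launch with: torchrun --nproc-per-node=2 demo.py
import os
import torch
import torch.distributed as dist

def main():
    # torchrun sets MASTER_ADDR/MASTER_PORT/RANK/WORLD_SIZE/LOCAL_RANK.
    dist.init_process_group(backend="nccl")
    rank = dist.get_rank()
    world_size = dist.get_world_size()
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)
    device = torch.device("cuda", local_rank)

    num_coll = 4  # illustrative number of coalesced collectives
    # Each collective has its own, unrelated input and output buffers.
    input_tensors = [torch.full((8,), float(rank), device=device) for _ in range(num_coll)]
    output_tensors = [torch.empty(8 * world_size, device=device) for _ in range(num_coll)]

    # Coalesce the all-gathers so the backend (e.g. NCCL) can launch them together.
    with dist._coalescing_manager(async_ops=True) as cm:
        for i in range(num_coll):
            dist.all_gather_into_tensor(output_tensors[i], input_tensors[i])

    # Overlap other work here, then block until the coalesced group completes.
    cm.wait()

    # Each output_tensors[i] now holds every rank's 8-element contribution, concatenated.
    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```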
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101157
Approved by: https://github.com/kumpera, https://github.com/wanchaol
2023-05-11 20:58:47 +00:00
| Name | Last commit message | Last commit time |
| --- | --- | --- |
| `_composable` | [replicate] support simpler device_id (#100217) | 2023-05-04 21:06:04 +00:00 |
| `_shard` | [distributed][sharded_tensor] Move local_shards check from ShardedTensorBase to ShardedTensor (#100197) | 2023-05-02 12:42:24 +00:00 |
| `_sharded_tensor` | | |
| `_sharding_spec` | | |
| `_spmd` | handle new param from torch.compile (Inductor pattern matcher), enable_log (#100814) | 2023-05-08 18:34:45 +00:00 |
| `_tensor` | [dtensor] tensor ops to use strategy based sharding prop (#100607) | 2023-05-11 02:47:20 +00:00 |
| `_tools` | Fix typos under torch/distributed directory (#95638) | 2023-03-27 21:13:44 +00:00 |
| `algorithms` | Properly propagates checkpoint wrapper args and kwargs (#99791) | 2023-05-03 23:19:21 +00:00 |
| `autograd` | | |
| `benchmarks` | | |
| `checkpoint` | [BE] Fix flake8 B027 errors - missing abstractmethod decorator (#100715) | 2023-05-09 17:28:48 +00:00 |
| `elastic` | [BE] Fix flake8 B027 errors - missing abstractmethod decorator (#100715) | 2023-05-09 17:28:48 +00:00 |
| `examples` | Fix typos under torch/distributed directory (#95638) | 2023-03-27 21:13:44 +00:00 |
| `fsdp` | [FSDP][state_dict] Make sharded_state_dict work with composable fully_shard (#100856) | 2023-05-10 15:32:45 +00:00 |
| `launcher` | Convert logging f-strings to use % format, part four (#98705) | 2023-04-11 13:17:59 +00:00 |
| `nn` | Convert logging f-strings to use % format (#98697) | 2023-04-10 12:19:31 +00:00 |
| `optim` | Convert logging f-strings to use % format, part four (#98705) | 2023-04-11 13:17:59 +00:00 |
| `pipeline` | Enable ruff in lintrunner (#99785) | 2023-04-24 16:18:44 +00:00 |
| `rpc` | [BE] Fix all B022 useless-contextlib-suppress (#100335) | 2023-04-30 18:47:40 +00:00 |
| `tensor` | [dtensor] tensor ops to use strategy based sharding prop (#100607) | 2023-05-11 02:47:20 +00:00 |
| `__init__.py` | [c10d] Faster coalescing (#98793) | 2023-04-24 21:27:26 +00:00 |
| `_composable_state.py` | | |
| `_functional_collectives.py` | Work around torchdynamo import error with functional collectives (#100901) | 2023-05-09 16:09:42 +00:00 |
| `argparse_util.py` | | |
| `c10d_error_logger.py` | | |
| `constants.py` | | |
| `CONTRIBUTING.md` | | |
| `distributed_c10d.py` | Implement coalesced all_gather_into_tensor (#101157) | 2023-05-11 20:58:47 +00:00 |
| `launch.py` | Fix typos under torch/distributed directory (#95638) | 2023-03-27 21:13:44 +00:00 |
| `logging_handlers.py` | | |
| `remote_device.py` | | |
| `rendezvous.py` | Revisit torch._six.string_classes removal (#94709) (#97863) | 2023-03-30 17:02:45 +00:00 |
| `run.py` | Convert logging f-strings to use % format, part four (#98705) | 2023-04-11 13:17:59 +00:00 |
| `utils.py` | [PyTorch/Distributed] Only sync buffers when broadcast_buffers is True (#100729) | 2023-05-08 16:34:29 +00:00 |