This PR adds support for the following use cases:
- Sync style:
```
# Collectives issued inside the context are coalesced into one batched
# launch when the context exits.
with dist._coalescing_manager():
    for i in range(num_coll):
        dist.all_gather_into_tensor(output_tensors[i], input_tensors[i])
```
- Async style:
```
# With async_ops=True, the coalesced collectives run asynchronously;
# cm.wait() blocks until they have all completed.
with dist._coalescing_manager(async_ops=True) as cm:
    for i in range(num_coll):
        dist.all_gather_into_tensor(output_tensors[i], input_tensors[i])
# do a bunch of other things
cm.wait()
# do things that depend on the all-gathers
```
Each `all_gather_into_tensor` is independent in terms of its data and buffer location, but the calls can be executed in parallel by backends that support coalescing (such as NCCL). A self-contained sketch of the async style follows.
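The snippets above leave the process-group setup and tensor allocation implicit. Below is a minimal runnable sketch of the async style, assuming a launch via `torchrun` with the NCCL backend; `num_coll`, the shard size, and the helper variable names are illustrative and not part of the PR.

```
import os

import torch
import torch.distributed as dist


def main():
    # torchrun sets RANK, WORLD_SIZE, LOCAL_RANK, MASTER_ADDR, MASTER_PORT.
    dist.init_process_group(backend="nccl")
    rank = dist.get_rank()
    world_size = dist.get_world_size()
    device = torch.device("cuda", int(os.environ["LOCAL_RANK"]))
    torch.cuda.set_device(device)

    num_coll = 4
    shard = 1024  # illustrative per-rank shard size
    input_tensors = [
        torch.full((shard,), float(rank), device=device) for _ in range(num_coll)
    ]
    output_tensors = [
        torch.empty(world_size * shard, device=device) for _ in range(num_coll)
    ]

    # Async style: the coalesced all-gathers are launched together on context
    # exit and can overlap with the unrelated work below.
    with dist._coalescing_manager(async_ops=True) as cm:
        for i in range(num_coll):
            dist.all_gather_into_tensor(output_tensors[i], input_tensors[i])
    unrelated = torch.randn(shard, device=device).sum()  # work independent of the gathers
    cm.wait()

    # Safe to consume the gathered outputs now.
    print(f"rank {rank}: gathered[0][0]={output_tensors[0][0].item()}, unrelated={unrelated.item():.3f}")
    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

Run with, for example, `torchrun --nproc_per_node=2 coalesce_example.py` (a hypothetical file name) on a machine with at least two GPUs.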
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101157
Approved by: https://github.com/kumpera, https://github.com/wanchaol
| Name |
|---|
| _composable |
| _shard |
| _sharded_tensor |
| _sharding_spec |
| _spmd |
| _tensor |
| _tools |
| algorithms |
| autograd |
| benchmarks |
| checkpoint |
| elastic |
| examples |
| fsdp |
| launcher |
| nn |
| optim |
| pipeline |
| rpc |
| tensor |
| __init__.py |
| _composable_state.py |
| _functional_collectives.py |
| argparse_util.py |
| c10d_error_logger.py |
| constants.py |
| CONTRIBUTING.md |
| distributed_c10d.py |
| launch.py |
| logging_handlers.py |
| remote_device.py |
| rendezvous.py |
| run.py |
| utils.py |