pytorch/torch/distributed

Latest commit: 60aaba4128 "create function to get ProcessGroupNCCL uid" (#121132) by Shengbao Zheng, 2024-03-07 18:34:38 +00:00

Summary: expose ProcessGroupNCCL uid

Differential Revision: D54446056
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121132
Approved by: https://github.com/aaronenyeshi
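The headline commit adds an accessor that exposes the ProcessGroupNCCL uid, which is useful for correlating profiler or log output with a specific process group. Below is a minimal sketch of how such an accessor might be called from Python; the method name `_get_uid` is an assumption for illustration only, while `init_process_group`, `group.WORLD`, and `_get_backend` are established torch.distributed APIs.

```python
# Hypothetical sketch of reading the ProcessGroupNCCL uid added in #121132.
# The accessor name (_get_uid) is an assumption, not confirmed by this page.
import torch
import torch.distributed as dist

def report_nccl_uid() -> None:
    # Assumes launch via torchrun, so RANK, WORLD_SIZE, and
    # MASTER_ADDR/MASTER_PORT are already set in the environment.
    dist.init_process_group(backend="nccl")
    pg = dist.group.WORLD
    # _get_backend returns the device-specific backend object; for NCCL
    # jobs this is the ProcessGroupNCCL instance the new accessor lives on.
    nccl_backend = pg._get_backend(torch.device("cuda"))
    uid = nccl_backend._get_uid()  # assumed name for the new accessor
    print(f"rank {dist.get_rank()}: process group uid = {uid}")
```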
_composable [FSDP2] Relaxed check for parent mesh (#121360) 2024-03-07 08:09:25 +00:00
_shard [dist][sharded_tensor] Fix ChunkShardingSpec metadata offsets for empty shards (#121002) 2024-03-02 08:58:48 +00:00
_sharded_tensor
_sharding_spec
_spmd Fix output typos (#120870) 2024-02-29 08:29:14 +00:00
_tensor [DTensor][XLA] refactor DTensor _xla API (#113214) 2024-03-07 06:18:05 +00:00
_tools
algorithms [DDP] Use compiled_autograd to trace DDP backward allreduce (#110662) 2024-02-08 03:03:15 +00:00
autograd
benchmarks Enable possibly-undefined error code (#118533) 2024-01-30 21:07:01 +00:00
checkpoint [DCP] Adds main in format utils (#120128) 2024-03-07 01:18:17 +00:00
elastic [Torchelastic] Create root log directory by default (#121257) 2024-03-06 18:50:38 +00:00
examples
fsdp [FSDP][StateDict] Allow FULL_STATE_DICT option for 2D (#120837) 2024-03-05 10:03:44 +00:00
launcher [Torchelastic][Logging] Pluggable log specs using Python entrypoints and an option to specify one by name (#120942) 2024-03-02 08:07:52 +00:00
nn Fix a bug where nn.functional._AllGather.backward produces wrong gradients (#120582) 2024-02-26 09:58:27 +00:00
optim [mta] Fused SGD (#116585) 2024-01-16 23:54:38 +00:00
pipeline
rpc [BE]: Use itertools.chain.from_iterable where possible (#116376) 2023-12-27 19:20:07 +00:00
tensor [TP] Introduce Sequence Parallel Style for LayerNorm/RMSNorm/Dropout (#121295) 2024-03-07 02:04:59 +00:00
__init__.py Fix torch.distributed.breakpoint (#115705) 2023-12-13 20:33:56 +00:00
_composable_state.py
_functional_collectives_impl.py [functional collective] don't import torchdynamo when running torchdeploy (#120900) 2024-02-29 19:20:54 +00:00
_functional_collectives.py [dynamo] support group=None when rewriting collectives (#121043) 2024-03-06 21:37:19 +00:00
_state_dict_utils.py [DCP][state_dict] Implement pin_memory and shared_memory copy for _offload_state_dict_to_cpu (#120378) 2024-03-05 17:48:15 +00:00
argparse_util.py Add --local-ranks-filter to torchrun: allow filtering logs by rank (#118562) 2024-02-07 04:29:54 +00:00
c10d_logger.py
collective_utils.py
constants.py
CONTRIBUTING.md
device_mesh.py [DeviceMesh] Ensure mesh tensor is a cpu tensor (#120046) 2024-02-22 22:03:13 +00:00
distributed_c10d.py create function to get ProcessGroupNCCL uid (#121132) 2024-03-07 18:34:38 +00:00
launch.py
logging_handlers.py
remote_device.py
rendezvous.py Enable local_partial_types (#118467) 2024-01-28 13:38:22 +00:00
run.py [Torchelastic][Logging] Pluggable log specs using Python entrypoints and an option to specify one by name (#120942) 2024-03-02 08:07:52 +00:00
utils.py [fsdp][torch.compile] FSDP changes (#115497) 2023-12-19 18:44:36 +00:00
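For orientation, distributed_c10d.py and device_mesh.py in the listing above carry the core user-facing APIs of this package. The sketch below exercises both using only stable, documented calls (`init_process_group`, `all_reduce`, `init_device_mesh`); the gloo backend and tensor values are illustrative choices, not anything mandated by the commits listed here.

```python
# Minimal orientation sketch for the modules listed above: process group
# setup and a collective from distributed_c10d.py, plus a 1-D mesh from
# device_mesh.py. Run under torchrun, e.g.:
#   torchrun --nproc-per-node=2 this_script.py
import torch
import torch.distributed as dist
from torch.distributed.device_mesh import init_device_mesh

def main() -> None:
    # gloo is chosen so the sketch runs without GPUs; use "nccl" on CUDA.
    dist.init_process_group(backend="gloo")
    rank = dist.get_rank()
    world_size = dist.get_world_size()

    # Collective from distributed_c10d.py: sum a tensor across all ranks.
    t = torch.tensor([float(rank)])
    dist.all_reduce(t, op=dist.ReduceOp.SUM)
    print(f"rank {rank}: all_reduce sum = {t.item()}")

    # DeviceMesh from device_mesh.py: a 1-D mesh spanning all ranks.
    mesh = init_device_mesh("cpu", (world_size,))
    print(f"rank {rank}: mesh = {mesh}")

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```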