pytorch/torch/distributed
2024-10-17 01:05:41 +00:00
_composable [Traceable FSDP2] Add compiled_autograd_enabled helper function (#138105) 2024-10-17 00:04:06 +00:00
_shard [BE]: Update mypy to 1.11.2 (#133816) 2024-09-16 19:44:11 +00:00
_sharded_tensor
_sharding_spec
_symmetric_memory [SymmetricMemory] fix a race condition in _pipelined_produce_and_all2all that can cause correctness issues for very small chunk_producers (#138126) 2024-10-17 01:05:41 +00:00
_tensor [reland][dtensor] move DTensor to public namespace (#134203) 2024-09-08 17:08:40 +00:00
_tools Selective Activation Checkpointing (SAC) Estimator for estimating memory and recomputation time trade-offs. (#135208) 2024-10-14 13:56:40 +00:00
algorithms [BE]: Update mypy to 1.11.2 (#133816) 2024-09-16 19:44:11 +00:00
autograd
benchmarks [BE][Easy] enable ruff rule PIE790: unnecessary pass statement (#133200) 2024-08-15 15:50:19 +00:00
checkpoint [Distributed] fix FileSystemWriter __init__ (#136135) 2024-09-16 19:11:08 +00:00
elastic Fix rendezvous error due to EtcdStore get method not waiting in some cases (#137056) 2024-10-02 01:45:00 +00:00
examples
fsdp Enable channels_last format for FSDP (#137382) 2024-10-11 03:47:16 +00:00
launcher
nn Revert "added persistent option to buffers and namedbuffers (#132994)" 2024-08-09 18:14:53 +00:00
optim [BE]: Update mypy to 1.11.2 (#133816) 2024-09-16 19:44:11 +00:00
pipelining [Pipelining] Refactor Interleaved1F1B and ZeroBubble (#137783) 2024-10-16 03:05:14 +00:00
rpc [BE][Easy] enable ruff rule PIE790: unnecessary pass statement (#133200) 2024-08-15 15:50:19 +00:00
tensor Allow parallelize_module to get device_mesh from ambient context (#134247) 2024-10-09 00:19:03 +00:00
__init__.py Remove ProcessGroupRoundRobin (#132888) 2024-08-08 01:07:40 +00:00
_checkpointable.py
_composable_state.py
_functional_collectives_impl.py
_functional_collectives.py [BE]: Update mypy to 1.11.2 (#133816) 2024-09-16 19:44:11 +00:00
_state_dict_utils.py [DSD] Fix loading uneven full tensor into sharded state dict (#136365) 2024-09-23 16:35:58 +00:00
argparse_util.py
c10d_logger.py
collective_utils.py
constants.py
CONTRIBUTING.md
device_mesh.py [DeviceMesh] Fixed from_group when passing tensor mesh (#137713) 2024-10-11 14:53:51 +00:00
distributed_c10d.py [c10d][experimental] Add _abort_process_group (#132291) 2024-10-11 05:04:17 +00:00
launch.py
logging_handlers.py
remote_device.py
rendezvous.py [reland] [torchelastic][c10d] Fix store prefix race in rendezvous (#136768) 2024-09-26 17:37:07 +00:00
run.py fix torchrun log message (#131652) 2024-07-25 14:50:10 +00:00
utils.py Revert "[compiled autograd] Compiled autograd configs in TLS (#137821)" 2024-10-16 16:38:29 +00:00