pytorch/torch/distributed
Jeeja 7f1d5aba93 [FSDP] Use generic device handle instead of cuda (#121620)
In FSDP _optim_utils.py, use a generic device handle instead of cuda to support other backends (see the sketch below).

Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/121620
Approved by: https://github.com/awgu, https://github.com/wz337
2024-05-13 18:07:08 +00:00
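The commit swaps direct torch.cuda calls for a backend-agnostic device handle. A minimal sketch of that pattern, assuming the private helper torch.distributed.device_mesh._get_device_handle and a hypothetical _sync_if_needed wrapper (neither is the actual _optim_utils.py diff):

```python
# Minimal sketch: resolve the device module for the active backend
# instead of hard-coding torch.cuda.
import torch
from torch.distributed.device_mesh import _get_device_handle  # private helper

def _sync_if_needed(device_type: str) -> None:
    # e.g. _get_device_handle("cuda") -> torch.cuda, "xpu" -> torch.xpu
    device_handle = _get_device_handle(device_type)
    if device_handle is not None and device_handle.is_available():
        device_handle.synchronize()

# Usage: _sync_if_needed("cuda") on NVIDIA GPUs, _sync_if_needed("xpu") on Intel GPUs.
```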
_composable [fsdp2] Accommodate FSDP2 to accept parent mesh > 2 (#125778) 2024-05-09 05:02:21 +00:00
_shard Revert "[FX] Update type hints in torch.fx._compatibility.py (#125469)" 2024-05-06 18:36:43 +00:00
_sharded_tensor
_sharding_spec
_spmd Revert "[FX] Update type hints in torch.fx._compatibility.py (#125469)" 2024-05-06 18:36:43 +00:00
_tensor [DTensor] allow numel 1 tensor operand to be implicitly replicate DTensor (#125073) 2024-05-08 19:47:47 +00:00
_tools [BE]: Try TCH autofixes on torch/ (#125536) 2024-05-05 23:13:59 +00:00
algorithms [DDP] Use compiled_autograd to trace DDP backward allreduce (#110662) 2024-02-08 03:03:15 +00:00
autograd
benchmarks Move doc links to point to main (#121823) 2024-03-15 19:49:37 +00:00
checkpoint [DSD] Implement broadcast_from_rank0 option for optim state_dict (#125339) 2024-05-08 07:22:20 +00:00
elastic Prevent rendezvous shutdown on worker restarts (#124819) 2024-05-09 02:40:31 +00:00
examples
fsdp [FSDP] Use generic device handle instead of cuda (#121620) 2024-05-13 18:07:08 +00:00
launcher torchelastic: change monitor_interval default to 0.1 (#124692) 2024-04-24 01:44:41 +00:00
nn Fix get_rank under a non-default group. (#120481) 2024-03-11 05:40:54 +00:00
optim [optim] add fused_adagrad support for CPU device (#124905) 2024-05-13 01:16:20 +00:00
pipeline [BE]: Update ruff to v0.4.4 (#125031) 2024-05-12 20:02:37 +00:00
pipelining [pipelining] Add pipeline schedules (#125975) 2024-05-11 21:17:53 +00:00
rpc [BE]: Use iterable.chain.from_iterable where possible (#116376) 2023-12-27 19:20:07 +00:00
tensor [DeviceMesh] Make _validate_tp_mesh_dim support 3D (#125763) 2024-05-08 21:22:11 +00:00
__init__.py [C10D] Add dist.get_node_local_rank helper (#123992) 2024-04-16 00:09:46 +00:00
_composable_state.py Fix docstring errors in _composable_state.py, remote_device.py, value_ranges.py, utils.py, run.py, rendezvous.py, launch.py, argparse_util.py, __init__.py, _cycles.py (#112953) 2023-11-08 01:13:09 +00:00
_functional_collectives_impl.py Make c10d_functional ops call into _c10d_functional ops (#124979) 2024-04-27 08:08:02 +00:00
_functional_collectives.py AsyncCollectiveTensor: prevent wait_tensor() calls on graph inputs from getting DCEd (#125677) 2024-05-08 15:54:01 +00:00
_state_dict_utils.py [DSD] Implement broadcast_from_rank0 option for optim state_dict (#125339) 2024-05-08 07:22:20 +00:00
argparse_util.py Add --local-ranks-filter to torchrun: allow logs filtering by rank (#118562) 2024-02-07 04:29:54 +00:00
c10d_logger.py [DCP] Adds better handling in logging of specific kwargs (#123658) 2024-04-11 21:09:38 +00:00
collective_utils.py
constants.py Switch env variable use in test harnesses to the non-deprecated names to fix warnings (#114880) 2023-12-01 20:08:23 +00:00
CONTRIBUTING.md
device_mesh.py Fix device type issue in _get_device_handle (#124390) 2024-04-30 06:59:56 +00:00
distributed_c10d.py [PT2D] Fix the circular import issue (#125618) 2024-05-07 05:10:18 +00:00
launch.py [Docs][Distributed] Add migration notes for --local-rank option style change for torchrun in PyTorch 2.0 (#109480) 2024-04-16 05:51:57 +00:00
logging_handlers.py
remote_device.py Fix docstring errors in _composable_state.py, remote_device.py, value_ranges.py, utils.py, run.py, rendezvous.py, launch.py, argparse_util.py, __init__.py, _cycles.py (#112953) 2023-11-08 01:13:09 +00:00
rendezvous.py Enable local_partial_types (#118467) 2024-01-28 13:38:22 +00:00
run.py [BE]: Improve exception typing. Remove NOQAs (#125535) 2024-05-08 14:07:13 +00:00
utils.py [fsdp][torch.compile] FSDP changes (#115497) 2023-12-19 18:44:36 +00:00