pytorch/torch/distributed
Rohan Varma 782ee6c7e7 [FSDP][Reland] Implement local_state_dict and load_local_state_dict
1. Implement the framework to allow users to choose among `state_dict`, `local_state_dict`, and `sharded_state_dict`.
2. Implement ShardedTensor-compatible local_state_dict() and load_local_state_dict().
ghstack-source-id: 149625958

Differential Revision: [D34383925](https://our.internmc.facebook.com/intern/diff/D34383925/)

[ghstack-poisoned]
2022-02-23 07:57:34 -08:00
_shard Revert D34284271: [TLC][checkpoint] Add unit test for StatefulComponentCheckpointAgent 2022-02-19 21:28:55 +00:00
_sharded_tensor [reland] Create torch.distributed._shard package. (#72141) 2022-02-02 06:58:20 +00:00
_sharding_spec [reland] Create torch.distributed._shard package. (#72141) 2022-02-02 06:58:20 +00:00
algorithms [Join][BE] Fix typo; remove obsolete method (#72886) 2022-02-16 15:03:09 +00:00
autograd Add Python declaration of torch._C and torch._C._autograd modules. (#46622) 2020-11-06 01:25:47 -08:00
benchmarks Add lint for unqualified type: ignore (#56290) 2021-04-21 08:07:23 -07:00
elastic [codemod][type-comments] Convert type comments in api.py (#73084) 2022-02-19 00:31:45 +00:00
fsdp [FSDP][Reland] Implement local_state_dict and load_local_state_dict 2022-02-23 07:57:34 -08:00
launcher (torch/elastic) fix scale down bug caused by calling rdzv_handler.shutdown() on premature agent failures (#67749) 2021-11-05 12:18:46 -07:00
nn Revert D33716716: [pytorch][PR] Added remove_duplicate parameter to nn.Module 2022-02-03 09:04:29 +00:00
optim [ZeRO] (Reland) Add ctor support for multiple param groups (#72932) 2022-02-22 16:29:55 +00:00
pipeline Remove dtype from torch.Storage and use only torch.ByteStorage (#62030) 2021-10-05 13:50:34 -07:00
rpc [distributed] Make rref_proxy._invoke_rpc truly async when needed. (#70206) 2022-01-19 23:37:15 +00:00
__init__.py Add pybind trampoline for ProcessGroup and Work (#66338) 2021-10-11 06:41:06 -07:00
argparse_util.py [19/n][torch/elastic][upstream] Replace pytorch.distributed.launch with torchelastic launcher (#56214) 2021-04-16 13:38:23 -07:00
constants.py make ProcessGroupDefaultTimeout the same as python (#56549) 2021-04-21 17:56:05 -07:00
CONTRIBUTING.md Update distributed contributing guide to show how to run one test in test_distributed_spawn (#67801) 2021-11-04 08:54:31 -07:00
distributed_c10d.py Stop writing logs to root logger (#72649) 2022-02-11 21:30:53 +00:00
launch.py Introduce the torchrun entrypoint (#64049) 2021-08-26 20:17:48 -07:00
remote_device.py Basic implementation of ShardedLinear using ShardedTensor. (#64128) 2021-09-20 18:31:11 -07:00
rendezvous.py Update _create_c10d_store to check port value (#71863) 2022-01-26 22:29:33 +00:00
run.py (torch/elastic) add fqdn hostname to error printout (#66182) 2021-10-07 01:40:02 -07:00