pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-06 12:20:52 +01:00

Rohan Varma 782ee6c7e7 [FSDP][Reland] Implement local_state_dict and load_local_state_dict 1. Implement the framework to allow user to choose among `state_dict`, `local_state_dict`, and `sharded_state_dict`. 2. Implement ShardedTensor compatible local_state_dict() and load_local_state_dict(). ghstack-source-id: 149625958 Differential Revision: [D34383925](https://our.internmc.facebook.com/intern/diff/D34383925/) [ghstack-poisoned]		2022-02-23 07:57:34 -08:00
_shard	Revert D34284271: [TLC][checkpoint] Add unit test for StatefulComponentCheckpointAgent	2022-02-19 21:28:55 +00:00
_sharded_tensor	[reland] Create torch.distributed._shard package. (#72141)	2022-02-02 06:58:20 +00:00
_sharding_spec	[reland] Create torch.distributed._shard package. (#72141)	2022-02-02 06:58:20 +00:00
algorithms	[Join][BE] Fix typo; remove obsolete method (#72886)	2022-02-16 15:03:09 +00:00
autograd	Add Python declaration of torch._C and torch._C._autograd modules. (#46622)	2020-11-06 01:25:47 -08:00
benchmarks	Add lint for unqualified `type: ignore` (#56290)	2021-04-21 08:07:23 -07:00
elastic	[codemod][type-comments] Convert type comments in api.py (#73084)	2022-02-19 00:31:45 +00:00
fsdp	[FSDP][Reland] Implement local_state_dict and load_local_state_dict	2022-02-23 07:57:34 -08:00
launcher	(torch/elastic) fix scale down bug caused by calling rdzv_handler.shutdown() on premature agent failures (#67749)	2021-11-05 12:18:46 -07:00
nn	Revert D33716716: [pytorch][PR] Added remove_duplicate parameter to `nn.Module`	2022-02-03 09:04:29 +00:00
optim	[ZeRO] (Reland) Add ctor support for multiple param groups (#72932)	2022-02-22 16:29:55 +00:00
pipeline	Remove dtype from torch.Storage and use only torch.ByteStorage (#62030)	2021-10-05 13:50:34 -07:00
rpc	[distributed] Make rref_proxy._invoke_rpc trully async when needed. (#70206)	2022-01-19 23:37:15 +00:00
__init__.py	Add pybind trampoline for ProcessGroup and Work (#66338)	2021-10-11 06:41:06 -07:00
argparse_util.py	[19/n][torch/elastic][upstream] Replace pytorch.distributed.launch with torchelastic launcher (#56214)	2021-04-16 13:38:23 -07:00
constants.py	make ProcessGroupDefaultTimeout the same as python (#56549)	2021-04-21 17:56:05 -07:00
CONTRIBUTING.md	Update distributed contributing guide to show how to run one test in test_distributed_spawn (#67801)	2021-11-04 08:54:31 -07:00
distributed_c10d.py	Stop writing logs to root logger (#72649)	2022-02-11 21:30:53 +00:00
launch.py	Introduce the torchrun entrypoint (#64049)	2021-08-26 20:17:48 -07:00
remote_device.py	Basic implementation of ShardedLinear using ShardedTensor. (#64128)	2021-09-20 18:31:11 -07:00
rendezvous.py	Update _create_c10d_store to check port value (#71863)	2022-01-26 22:29:33 +00:00
run.py	(torch/elastic) add fqdn hostname to error printout (#66182)	2021-10-07 01:40:02 -07:00