pytorch/torch/distributed
Rohan Varma a0b3814433 Clean prefixes when searching for params / buffers to ignore (#78278)
Co-authored with: @awgu

When `state_dict` has a prefix attached to it, the current logic for ignoring parameters and buffers does not work since it doesn't account for this prefix. To fix this, we make the following changes:

- Clean the key if it starts with the prefix (see the sketch after this list). Note that not all keys will start with the prefix: for example, when the current module's state_dict post-hook runs, a sibling module at the same level of the hierarchy may already have contributed its `state_dict` entries.
- This prefixing means it is no longer correct to override a child module's ignored params and buffers with the root FSDP instance's (that would break, for example, if a child FSDP instance had ignored modules but the root did not). We fix this by having each parent know about the ignored modules of its children and by computing fully qualified names for ignored params and buffers.
- As a result, each FSDP instance knows the fully qualified names of its own and its children's ignored params and buffers. It does not know about its parents' ignored params and buffers, but it does not need to store that data.
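
To make the prefix handling concrete, here is a minimal sketch of the key-cleaning idea, assuming an illustrative post-hook signature and an `ignored_fqns` set; the names and structure are assumptions for illustration, not the PR's actual implementation.

```python
# Minimal sketch (not the actual FSDP code): a state_dict post-hook that strips
# this module's prefix from keys before matching them against the fully
# qualified names of ignored params/buffers. `ignored_fqns` and
# `_post_state_dict_hook` are illustrative names, not real FSDP internals.
from typing import Dict, Set

import torch


def clean_key(key: str, prefix: str) -> str:
    # Not every key carries this module's prefix: sibling modules' entries may
    # already be present in `state_dict`, so only strip when the prefix matches.
    return key[len(prefix):] if key.startswith(prefix) else key


def _post_state_dict_hook(
    state_dict: Dict[str, torch.Tensor],
    prefix: str,
    ignored_fqns: Set[str],  # FQNs ignored by this instance and its children
) -> Dict[str, torch.Tensor]:
    for key in list(state_dict.keys()):
        if clean_key(key, prefix) in ignored_fqns:
            # Leave ignored entries untouched; skip FSDP post-processing.
            continue
        # ... FSDP-specific post-processing of managed params would go here ...
    return state_dict
```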
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78278
Approved by: https://github.com/awgu
2022-05-26 02:43:03 +00:00
_shard [PT-D] Enable nan_to_num op for sharded tensor 2022-05-25 18:03:42 +00:00
_sharded_tensor [reland] Create torch.distributed._shard package. (#72141) 2022-02-02 06:58:20 +00:00
_sharding_spec [reland] Create torch.distributed._shard package. (#72141) 2022-02-02 06:58:20 +00:00
algorithms CheckpointWrapper state_dict fix (#77224) 2022-05-17 03:39:31 +00:00
autograd
benchmarks
elastic [lint] upgrade mypy to latest version 2022-05-03 20:51:34 +00:00
fsdp Clean prefixes when searching for params / buffers to ignore (#78278) 2022-05-26 02:43:03 +00:00
launcher (torch/elastic) add documentation clarifying that torchrun is a console script to torch.distributed.run (#73598) 2022-03-03 08:35:50 +00:00
nn [Reland] load_state_dict post hook (#77392) 2022-05-14 06:06:23 +00:00
optim Adding maximize to Adamax (#77409) 2022-05-16 17:34:44 +00:00
pipeline Add type hints for a few random functions/classes 2022-05-04 13:53:00 +00:00
rpc [RPC small change] Improving logging for store.wait error 2022-05-05 18:23:17 +00:00
__init__.py [Dynamic RPC] Allow for optional world_size argument in init_rpc (#73372) 2022-03-24 16:19:28 +00:00
argparse_util.py
constants.py
CONTRIBUTING.md Update distributed contributing guide to show how to run one test in test_distributed_spawn (#67801) 2021-11-04 08:54:31 -07:00
distributed_c10d.py [lint] upgrade mypy to latest version 2022-05-03 20:51:34 +00:00
launch.py Introduce the torchrun entrypoint (#64049) 2021-08-26 20:17:48 -07:00
remote_device.py Rewrite ShardedTensor.gather to use dist.gather instead of gather_object (#77272) 2022-05-17 02:14:40 +00:00
rendezvous.py Improving typing and typing-related performance in rendezvous.py 2022-04-24 21:49:51 +00:00
run.py (torch/elastic) add documentation clarifying that torchrun is a console script to torch.distributed.run (#73598) 2022-03-03 08:35:50 +00:00
utils.py FSDP parameter sync 2022-05-17 19:58:49 +00:00