mirror of
https://github.com/zebrajr/pytorch.git
synced 2025-12-07 12:21:27 +01:00
This is one step toward the ultimate goal: removing the overridden `state_dict` in FSDP. All the logic should live in either `pre_state_dict_hook` or `post_state_dict_hook`. Since the current `nn.Module` does not support `pre_state_dict_hook`, this PR mimics `pre_state_dict_hook` by calling the pre hook inside the post hook, effectively discarding all the work done by `nn.Module.state_dict`. Once `nn.Module` supports `pre_state_dict_hook`, these pre-hook calls can be moved out of the post hooks and registered to `nn.Module.pre_state_dict_hook`.

The major issue with this temporary solution is that `post_state_dict_hook` is called from the leaf nodes up to the root node. This makes `module._lazy_init()` invalid there, because FSDP assumes `_lazy_init()` is called from the root. As a result, `FSDP.state_dict` currently contains only one piece of logic: calling `module._lazy_init()`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/87900
Approved by: https://github.com/rohan-varma

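
The hook-ordering problem described above can be illustrated with a small pure-Python model. `ToyModule`, `emulated_pre_hook`, and `post_hook` are hypothetical names for illustration only, not PyTorch APIs; the sketch only assumes, as the description states, that a module's post hooks run after its children have been processed. It shows that an emulated pre hook on the root fires last, after every descendant's hooks.

```python
from typing import Callable, Dict, List, Optional


class ToyModule:
    """Hypothetical stand-in for nn.Module; this is NOT the PyTorch API."""

    def __init__(self, name: str, children: Optional[List["ToyModule"]] = None):
        self.name = name
        self.children = children or []
        # Each hook: hook(module, state_dict, prefix, trace) -> state_dict
        self.post_hooks: List[Callable[..., Dict[str, str]]] = []

    def state_dict(self, trace: List[str], prefix: str = "") -> Dict[str, str]:
        # Save this module's own entry first.
        sd = {prefix + "weight": self.name}
        # Recurse into children before running this module's hooks ...
        for child in self.children:
            sd.update(child.state_dict(trace, prefix + child.name + "."))
        # ... so post hooks fire in post-order: leaves first, root last.
        for hook in self.post_hooks:
            sd = hook(self, sd, prefix, trace)
        return sd


def emulated_pre_hook(module: ToyModule, prefix: str, trace: List[str]) -> None:
    # Stand-in for work that really belongs *before* state_dict runs
    # (e.g. a lazy initialization that must start at the root).
    trace.append(f"pre:{module.name}")


def post_hook(module: ToyModule, sd: Dict[str, str], prefix: str,
              trace: List[str]) -> Dict[str, str]:
    # Workaround: call the "pre" hook at the start of the post hook.
    emulated_pre_hook(module, prefix, trace)
    trace.append(f"post:{module.name}")
    return sd


leaf = ToyModule("leaf")
mid = ToyModule("mid", [leaf])
root = ToyModule("root", [mid])
for m in (root, mid, leaf):
    m.post_hooks.append(post_hook)

trace: List[str] = []
sd = root.state_dict(trace)
print(trace)
# prints ['pre:leaf', 'post:leaf', 'pre:mid', 'post:mid', 'pre:root', 'post:root']
```

Because the root's emulated pre hook runs only after all descendants have finished, setup that must begin at the root (such as FSDP's `_lazy_init()`) cannot be moved into it, which is why that call stays in `FSDP.state_dict` for now.
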
| Name | | |
|---|---|---|
| _C | ||
| _C_flatbuffer | ||
| _decomp | ||
| _dispatch | ||
| _dynamo | ||
| _inductor | ||
| _lazy | ||
| _prims | ||
| _prims_common | ||
| _refs | ||
| _subclasses | ||
| amp | ||
| ao | ||
| autograd | ||
| backends | ||
| contrib | ||
| cpu | ||
| csrc | ||
| cuda | ||
| distributed | ||
| distributions | ||
| fft | ||
| futures | ||
| fx | ||
| jit | ||
| legacy | ||
| lib | ||
| linalg | ||
| masked | ||
| monitor | ||
| multiprocessing | ||
| nested | ||
| nn | ||
| onnx | ||
| optim | ||
| package | ||
| profiler | ||
| quantization | ||
| signal | ||
| sparse | ||
| special | ||
| testing | ||
| utils | ||
| __config__.py | ||
| __future__.py | ||
| __init__.py | ||
| _appdirs.py | ||
| _classes.py | ||
| _deploy.py | ||
| _jit_internal.py | ||
| _linalg_utils.py | ||
| _lobpcg.py | ||
| _lowrank.py | ||
| _meta_registrations.py | ||
| _namedtensor_internals.py | ||
| _ops.py | ||
| _python_dispatcher.py | ||
| _six.py | ||
| _sources.py | ||
| _storage_docs.py | ||
| _tensor_docs.py | ||
| _tensor_str.py | ||
| _tensor.py | ||
| _torch_docs.py | ||
| _utils_internal.py | ||
| _utils.py | ||
| _VF.py | ||
| _vmap_internals.py | ||
| _weights_only_unpickler.py | ||
| abi-check.cpp | ||
| CMakeLists.txt | ||
| custom_class_detail.h | ||
| custom_class.h | ||
| extension.h | ||
| functional.py | ||
| hub.py | ||
| library.h | ||
| library.py | ||
| overrides.py | ||
| py.typed | ||
| quasirandom.py | ||
| random.py | ||
| README.txt | ||
| return_types.py | ||
| script.h | ||
| serialization.py | ||
| storage.py | ||
| torch_version.py | ||
| types.py | ||

Note [TH abstraction violation]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

TH/THC provide some hpp headers, which are proper C++ headers rather than C headers. These headers serve double duty as *internal implementation detail* headers, whose contents should largely not be used by external clients.

Ideally, we would not install these headers at all; instead, you should use public functions (in headers like `THTensor.h`, NOT `THTensor.hpp`) to manipulate these structs. However, there are a few places in torch/csrc where we violate this abstraction. They are marked with a pointer to this note. Each of those sites will have to be refactored when we refactor the guts of THTensor and related structures.