pytorch/torch/distributed/checkpoint/__init__.py
Ankita George 8a40fca9a1 Support HuggingFace reading and writing for the multi-rank case (#148189)
Summary: This diff adds the ability for the HF reader/writer to read and write in a distributed way. We do this by sending all tensors destined for the same file to the same rank.

Test Plan:
Ensure existing tests pass.
I also ran a full end-to-end test on my devserver to read from and write to my HF repo.

Differential Revision: D70096439

Pull Request resolved: https://github.com/pytorch/pytorch/pull/148189
Approved by: https://github.com/joecummings, https://github.com/saumishr
2025-03-26 14:47:31 +00:00
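
Below is a minimal sketch (not from this PR) of the collective save/load flow that the module re-exports, to show where a storage writer such as the HF one plugs in. It uses the public FileSystemWriter/FileSystemReader rather than the private HuggingFace classes, whose constructor arguments are not shown here; the checkpoint path, backend choice, and toy state dict are assumptions, and the script is assumed to be launched with torchrun so every rank joins the same collective save/load calls.

import torch
import torch.distributed as dist
import torch.distributed.checkpoint as dcp
from torch.distributed.checkpoint import FileSystemReader, FileSystemWriter

CKPT_DIR = "/tmp/dcp_checkpoint"  # hypothetical path, adjust as needed


def main() -> None:
    # Assumes a launch like `torchrun --nproc-per-node=2 this_script.py`,
    # which sets the env vars init_process_group needs.
    dist.init_process_group("gloo")

    # Plain (non-sharded) tensors are treated as replicated across ranks,
    # so the default planner assigns each item to a single writing rank.
    state_dict = {"weight": torch.arange(4, dtype=torch.float32)}

    # Collective save: every rank participates in planning and I/O.
    dcp.save(state_dict, storage_writer=FileSystemWriter(CKPT_DIR))

    # Collective load back into pre-allocated tensors of matching shape.
    loaded = {"weight": torch.empty(4)}
    dcp.load(loaded, storage_reader=FileSystemReader(CKPT_DIR))

    dist.destroy_process_group()


if __name__ == "__main__":
    main()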


from . import _extension
from ._hf_planner import _HuggingFaceLoadPlanner, _HuggingFaceSavePlanner
from ._hf_storage import _HuggingFaceStorageReader, _HuggingFaceStorageWriter
from .api import CheckpointException
from .default_planner import DefaultLoadPlanner, DefaultSavePlanner
from .filesystem import FileSystemReader, FileSystemWriter
from .metadata import (
    BytesStorageMetadata,
    ChunkStorageMetadata,
    Metadata,
    TensorStorageMetadata,
)
from .optimizer import load_sharded_optimizer_state_dict
from .planner import LoadPlan, LoadPlanner, ReadItem, SavePlan, SavePlanner, WriteItem
from .state_dict_loader import load, load_state_dict
from .state_dict_saver import async_save, save, save_state_dict
from .storage import StorageReader, StorageWriter