pytorch/torch/distributed/checkpoint
Ankita George 78fe079c97 Support having no metadata file for HuggingFaceStorageReader (#150701)
Summary: If there is only one safetensors file, users do not need to provide a metadata file; the metadata can be constructed from the keys of that file. Some HuggingFace models ship this way, so this change adds support for that case.

Test Plan:
Ensured existing tests pass.
Tested end to end in a notebook.

Differential Revision: D72472490

Pull Request resolved: https://github.com/pytorch/pytorch/pull/150701
Approved by: https://github.com/joecummings
2025-04-07 22:10:39 +00:00
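
The commit summary above describes deriving metadata from the keys of a single safetensors file. The following is a minimal sketch of that idea, not the code from this commit: it relies only on the documented safetensors layout (an 8-byte little-endian header length followed by a JSON header), and the function names `read_safetensors_keys` and `build_weight_map` are hypothetical, chosen for illustration.

```python
import json
import struct
from pathlib import Path


def read_safetensors_keys(path):
    """Return the tensor names listed in a safetensors file header.

    Hypothetical helper for illustration; not part of torch.distributed.checkpoint.
    """
    with open(path, "rb") as f:
        # The first 8 bytes hold the JSON header length as a little-endian u64.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    # "__metadata__" is an optional free-form entry, not a tensor.
    return [key for key in header if key != "__metadata__"]


def build_weight_map(path):
    """Map every tensor name to the single file that contains it.

    Assumption: with only one file, every key trivially lives in that file,
    so no separate metadata or index file is required.
    """
    file_name = Path(path).name
    return {key: file_name for key in read_safetensors_keys(path)}
```

Calling `build_weight_map("model.safetensors")` on a single-file checkpoint would yield a dictionary with one entry per tensor, all pointing at `model.safetensors`. How `HuggingFaceStorageReader` represents this internally is not shown in this listing.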
examples
__init__.py Support huggingface reading and writing for multi rank case (#148189) 2025-03-26 14:47:31 +00:00
_async_executor.py [DCP] Introduce process based async checkpointing (#147039) 2025-03-04 13:33:28 +00:00
_async_process_executor.py [DCP] Introduce process based async checkpointing (#147039) 2025-03-04 13:33:28 +00:00
_async_thread_executor.py [DCP] Introduce process based async checkpointing (#147039) 2025-03-04 13:33:28 +00:00
_checkpointer.py PEP585 update - torch/distributed/elastic torch/distributed/checkpoint (#145163) 2025-01-19 20:55:59 +00:00
_dedup_save_plans.py Support huggingface reading and writing for multi rank case (#148189) 2025-03-26 14:47:31 +00:00
_dedup_tensors.py PEP585 update - torch/distributed/elastic torch/distributed/checkpoint (#145163) 2025-01-19 20:55:59 +00:00
_extension.py PEP585: Missed conversions (#145342) 2025-01-29 05:24:36 +00:00
_fsspec_filesystem.py Support having no metadata file for HuggingFaceStorageReader (#150701) 2025-04-07 22:10:39 +00:00
_hf_planner.py Support huggingface reading and writing for multi rank case (#148189) 2025-03-26 14:47:31 +00:00
_hf_storage.py Support having no metadata file for HuggingFaceStorageReader (#150701) 2025-04-07 22:10:39 +00:00
_nested_dict.py PEP585 update - torch/distributed/elastic torch/distributed/checkpoint (#145163) 2025-01-19 20:55:59 +00:00
_sharded_tensor_utils.py
_storage_utils.py PEP585 update - torch/distributed/elastic torch/distributed/checkpoint (#145163) 2025-01-19 20:55:59 +00:00
_traverse.py PEP585 update - torch/distributed/elastic torch/distributed/checkpoint (#145163) 2025-01-19 20:55:59 +00:00
_version.py
api.py PEP585 update - torch/distributed/elastic torch/distributed/checkpoint (#145163) 2025-01-19 20:55:59 +00:00
default_planner.py Support huggingface reading and writing for multi rank case (#148189) 2025-03-26 14:47:31 +00:00
filesystem.py Support having no metadata file for HuggingFaceStorageReader (#150701) 2025-04-07 22:10:39 +00:00
format_utils.py PEP585 update - torch/distributed/elastic torch/distributed/checkpoint (#145163) 2025-01-19 20:55:59 +00:00
logger.py [DCP] Introduce process based async checkpointing (#147039) 2025-03-04 13:33:28 +00:00
logging_handlers.py PEP585 update - torch/distributed/elastic torch/distributed/checkpoint (#145163) 2025-01-19 20:55:59 +00:00
metadata.py [DCP] Introduce modules metadata in the storage_meta (#146654) 2025-02-13 17:44:30 +00:00
optimizer.py [BE][PYFMT] migrate PYFMT for torch.{distributed,distributions} to ruff format (#144547) 2025-02-28 07:35:56 +00:00
planner_helpers.py [DCP] Cache save plan metadata to reduce the collective overhead (#149785) 2025-03-25 02:00:15 +00:00
planner.py [DCP] Cache save plan metadata to reduce the collective overhead (#149785) 2025-03-25 02:00:15 +00:00
resharding.py PEP585 update - torch/distributed/elastic torch/distributed/checkpoint (#145163) 2025-01-19 20:55:59 +00:00
staging.py Fix staging for CPU tensors in OSS DCP async_save (#145408) 2025-01-23 12:49:26 -08:00
state_dict_loader.py Fix bug in _load_state_dict_from_keys method (#150058) 2025-03-27 16:36:00 +00:00
state_dict_saver.py [DCP] Introduce process based async checkpointing (#147039) 2025-03-04 13:33:28 +00:00
state_dict.py [State_dict] Remove functools.cache and add unit test (#149354) 2025-03-18 17:30:41 +00:00
stateful.py PEP585 update - torch/distributed/elastic torch/distributed/checkpoint (#145163) 2025-01-19 20:55:59 +00:00
storage.py PEP585 update - torch/distributed/elastic torch/distributed/checkpoint (#145163) 2025-01-19 20:55:59 +00:00
utils.py [DCP][OSS] Introduce barrier util in the DistWrapper for rank local checkpointing (#150748) 2025-04-07 17:33:07 +00:00