Commit Graph

14 Commits

Author SHA1 Message Date
Aaron Gokaslan
b7b2178204 [BE]: Remove useless lambdas (#113602)
Applies PLW0108, which removes useless lambdas in Python. The rule is still in preview, so it is not ready to be enabled by default just yet; these are the autofixes from the rule.
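For illustration, a typical autofix of this kind looks like the following (hypothetical snippet, not taken from this PR):

```python
# Before: a lambda that only forwards its argument (flagged by PLW0108)
print_fn = lambda msg: print(msg)

# After: reference the wrapped callable directly
print_fn = print
```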

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113602
Approved by: https://github.com/albanD
2023-11-14 20:06:48 +00:00
NVS Abhilash
44c0521e8c fix: docstring error in torch/distributed module (#113241)
Fixes: #113193

`pydocstyle <all_files_in_issue> --count`

- Before: 345
- After: 130

For deprecated methods, I have added a `noqa` to ignore them. I was not able to find the file `torch/distributed/tensor/parallel/multihead_attention_tp.py`, so I've ignored it for this PR.
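As an illustration, a typical pydocstyle fix looks like the following (hypothetical snippet, not taken from this PR):

```python
# Before: D400 - first line should end with a period
def broadcast(tensor, src):
    """Broadcast a tensor from src to all other ranks"""

# After: summary line ends with a period
def broadcast(tensor, src):
    """Broadcast a tensor from ``src`` to all other ranks."""
```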

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113241
Approved by: https://github.com/kit1980
2023-11-09 19:10:20 +00:00
Hanzhi Zhou
bb45f89cd9 Hackable distributed filesystem reader and writer (#106635)
I propose some changes so that `FileSystemReader` and `FileSystemWriter` can be used on other file systems. The user only needs to provide `path` as a subclass of `Path` that overrides the necessary interfaces.

For example, one can utilize `tf.io.gfile` to implement an interface to save to or load from HDFS. The following code snippet shows a working implementation.

```python
from pathlib import Path
import tensorflow as tf

class GFileWrapper(tf.io.gfile.GFile):
    def __init__(self, path, mode="r") -> None:
        super().__init__(path, mode)

    def write(self, data):
        return super().write(bytes(data))

    # a not quite efficient readinto, but it works
    def readinto(self, buffer):
        # read up to buffer's length
        data = self.read(len(buffer))
        length = len(data)
        buffer[:length] = data
        return length

class HdfsPath(type(Path())):
    def __new__(cls, *pathsegments):
        return super().__new__(cls, *pathsegments)

    @staticmethod
    def _fix_path(path):
        path = str(path)
        if path.startswith("hdfs:/") and not path.startswith("hdfs://"):
            # pathlib collapses "//" after the scheme; restore it once.
            path = path.replace("hdfs:/", "hdfs://", 1)
        return path

    def open(self, mode="r", *args, **kwargs):
        return GFileWrapper(HdfsPath._fix_path(self), mode=mode)

    def mkdir(self, **kwargs) -> None:
        return tf.io.gfile.makedirs(HdfsPath._fix_path(self))

    def rename(self, target):
        return tf.io.gfile.rename(HdfsPath._fix_path(self), HdfsPath._fix_path(target))
```

```python
writer = FileSystemWriter(HdfsPath("hdfs://..."), sync_files=False)
reader = FileSystemReader(HdfsPath("hdfs://..."))
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106635
Approved by: https://github.com/fduwjj
2023-10-31 19:36:18 +00:00
dilililiwhy
ff37f6018d Enable custom device support in fsdp checkpoint (#107289)
Fixes https://github.com/pytorch/pytorch/issues/104390
Enable custom device (privateuse1 backend) support in checkpointing via a dynamically resolved abstract device module.
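A minimal sketch of the dynamic device-module pattern (illustrative; not the exact helper added by this PR):

```python
import torch

def _get_device_module(device_type: str):
    # Resolve torch.cuda, torch.npu, ... by name, so a privateuse1
    # backend works without hard-coding a specific device module.
    device_module = getattr(torch, device_type, None)
    if device_module is None:
        raise RuntimeError(f"Device module torch.{device_type} was not found.")
    return device_module
```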
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107289
Approved by: https://github.com/wz337
2023-08-25 11:50:03 +00:00
Rodrigo Kumpera
4833dc10b8 [DCP] Rewrite read slicing to use a wrapper. (#99167)
Moved SlicedBufferedReader to utils and renamed to _ReaderView.

It no longer depends on file handles and is a pure wrapper, which makes it general enough to handle non-file stream objects like fsspec's.

Should help with #98386
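A rough sketch of the wrapper idea (names and details are illustrative, not the exact implementation):

```python
import io

class ReaderView(io.IOBase):
    """Expose the window [offset, offset + length) of a base stream."""

    def __init__(self, base_stream, offset: int, length: int):
        self.base_stream = base_stream
        self.offset = offset
        self.length = length
        base_stream.seek(offset)

    def seek(self, pos, whence=io.SEEK_SET):
        # Translate window-relative positions into base-stream positions.
        if whence == io.SEEK_SET:
            pos += self.offset
        elif whence == io.SEEK_END:
            whence = io.SEEK_SET
            pos += self.offset + self.length
        return self.base_stream.seek(pos, whence)

    def read(self, size=-1):
        # Clamp reads so the view never escapes its window.
        remaining = self.offset + self.length - self.base_stream.tell()
        if size < 0 or size > remaining:
            size = max(remaining, 0)
        return self.base_stream.read(size)
```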
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99167
Approved by: https://github.com/wz337
2023-06-08 13:52:13 +00:00
Aaron Gokaslan
8769fb854d [BE] Fix flake8 B027 errors - missing abstractmethod decorator (#100715)
Enables B027 and fixes the errors by adding `abstractmethod` decorators. Autofixes generated by ruff master.
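For illustration (hypothetical class, not taken from this PR), B027 flags an empty method in an abstract base class that is missing the decorator:

```python
from abc import ABC, abstractmethod

class Serializer(ABC):
    # Before: B027 - empty method in an abstract base class
    # def reset(self):
    #     ...

    # After: the abstract intent is explicit
    @abstractmethod
    def reset(self):
        ...
```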

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100715
Approved by: https://github.com/ezyang
2023-05-09 17:28:48 +00:00
Kazuaki Ishizaki
35fd5c548e Fix typos under torch/distributed directory (#95638)
This PR fixes typos in comments and messages of `.py` files under the torch/distributed directory.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95638
Approved by: https://github.com/usamah1, https://github.com/H-Huang, https://github.com/kit1980
2023-03-27 21:13:44 +00:00
Iris
bebe58bd71 [DCP] Set single_file_per_rank default to True (#94501)
The default behavior of FileSystemWriter should produce one file per rank instead of one file per tensor/blob.
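A hypothetical usage sketch (the path is illustrative):

```python
from torch.distributed.checkpoint import FileSystemWriter

# One file per rank is now the default; pass False to restore the old
# one-file-per-tensor/blob behavior.
writer = FileSystemWriter("/tmp/checkpoint", single_file_per_rank=True)
```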
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94501
Approved by: https://github.com/fegin
2023-02-09 21:45:31 +00:00
Aaron Gokaslan
748bac8757 [BE]: Apply pyupgrade yield from and unit test alias upgrades (#94309)
Applies some more harmless pyupgrades. This one removes deprecated aliases in unit tests and upgrades more `yield`-in-a-loop patterns into `yield from` generators, which are more performant and propagate more information and exceptions from the original generator. This is the modern, recommended way of forwarding generators.
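For illustration, the shape of the upgrade (hypothetical snippet, not taken from this PR):

```python
def flatten_old(lists):
    for lst in lists:
        for item in lst:  # forwards items one by one
            yield item

def flatten_new(lists):
    for lst in lists:
        yield from lst  # delegates, also forwarding send()/throw()
```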
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94309
Approved by: https://github.com/albanD
2023-02-07 20:08:58 +00:00
Iris
dd05f028e2 [PT-D][Checkpoint] Rename DCP storage layer init() (#92869)
Rename DCP storage layer init() and update tests accordingly.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92869
Approved by: https://github.com/kumpera
2023-01-25 23:52:45 +00:00
Iris
eee2869ea7 [PT-D][checkpoint] Resolve no such file or directory issue when checkpointing on multi hosts (#92553)
Previously, we only created the directory on rank 0. Therefore, when running on multiple hosts with multiple GPUs, the other hosts would run into "No such file or directory" errors.

This is the fix for it.
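A sketch of the kind of fix (illustrative; not the exact diff):

```python
from pathlib import Path

def _ensure_checkpoint_dir(path: Path) -> None:
    # Create the directory on every host instead of only on global rank 0;
    # exist_ok avoids races when multiple local ranks call this at once.
    path.mkdir(parents=True, exist_ok=True)
```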
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92553
Approved by: https://github.com/kumpera
2023-01-20 21:54:04 +00:00
Iris
0cc0e5ef65 [PT-D][Checkpoint]Add MultiThreaded FileSystemWriter for distributed checkpointing and Update tests (#87987)
This PR includes:

- Changes from @kumpera (https://github.com/pytorch/pytorch/pull/86327): add a multi-threaded FileSystemWriter for distributed checkpointing, which adds two knobs to FileSystemWriter: `thread_count` and `per_thread_copy_ahead`. This yields up to a 50% performance improvement on 32-GPU workloads on AWS.
- Add parametrized tests to /test/distributed/_shard/checkpoint/test_file_system_checkpoint.py and /test/distributed/_shard/checkpoint/test_file_system_checkpoint_cpu.py.
- Modify `@with_comms` in ShardedTensorTestBase to take in `*args` and `**kwargs`.
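A hypothetical usage sketch of the two new knobs (values are illustrative):

```python
from torch.distributed.checkpoint import FileSystemWriter

writer = FileSystemWriter(
    "/tmp/checkpoint",
    thread_count=4,                    # writer threads per rank
    per_thread_copy_ahead=10_000_000,  # bytes staged ahead per thread
)
```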
Tests:

```
python3 test/distributed/checkpoint/test_file_system_checkpoint_cpu.py
```

test/distributed/checkpoint/test_file_system_checkpoint.py (GPU tests) runs fine locally but times out on CI. We will switch to a thread-based process group and update this test in a follow-up PR.

[T134844615]

## Add docstrings and update comments in follow-up PRs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87987
Approved by: https://github.com/fduwjj
2022-11-30 08:19:41 +00:00
Iris
cefece3726 Fix typo in filesystem.py (#89849)
As title.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89849
Approved by: https://github.com/H-Huang
2022-11-30 01:06:58 +00:00
Iris
aee96bbf5a [PT-D][Checkpointing] Move distributed checkpointing from torch.distributed._shard.checkpoint to torch.distributed.checkpoint (#88698)
Context in RFC: https://github.com/pytorch/pytorch/issues/86620

The .rst file will be finalized in subsequent PRs.
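For users, the move amounts to an import change (sketch; `save_state_dict` shown as an example API):

```python
# Before the move:
# from torch.distributed._shard.checkpoint import save_state_dict

# After the move:
from torch.distributed.checkpoint import save_state_dict
```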
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88698
Approved by: https://github.com/wanchaol
2022-11-16 21:06:38 +00:00