Commit Graph

20 Commits

Xuehai Pan
22d258427b [BE][Easy] enable UFMT for torch/distributed/_shard/ (#128867)
Part of #123062

Pull Request resolved: https://github.com/pytorch/pytorch/pull/128867
Approved by: https://github.com/fegin
ghstack dependencies: #128866
2024-06-18 14:39:25 +00:00
Xuehai Pan
67ef2683d9 [BE] wrap deprecated function/class with typing_extensions.deprecated (#127689)
Use `typing_extensions.deprecated` for deprecation annotations where possible. Otherwise, add `category=FutureWarning` to `warnings.warn("message")` calls that are missing a category.

Note that only warnings whose messages contain `[Dd]eprecat(ed|ion)` are updated in this PR.
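
As a rough illustration of the first pattern, a minimal sketch of the decorator (the function names are hypothetical, not from this diff):

```python
# A sketch of the annotation pattern; assumes typing_extensions >= 4.5.
# `old_helper` / `new_helper` are hypothetical names for illustration.
from typing_extensions import deprecated


@deprecated(
    "`old_helper` is deprecated, please use `new_helper` instead.",
    category=FutureWarning,
)
def old_helper() -> None:
    ...
```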

Resolves #126888

This PR is split from PR #126898.

------

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127689
Approved by: https://github.com/Skylion007
2024-06-02 12:30:43 +00:00
PyTorch MergeBot
033e733021 Revert "[BE] wrap deprecated function/class with typing_extensions.deprecated (#126898)"
This reverts commit 749a132fb0.

Reverted https://github.com/pytorch/pytorch/pull/126898 on behalf of https://github.com/fbgheith due to switching typing-extensions=4.3.0 to 4.9.0 causes internal failure ([comment](https://github.com/pytorch/pytorch/pull/126898#issuecomment-2142884456))
2024-05-31 19:47:24 +00:00
Xuehai Pan
749a132fb0 [BE] wrap deprecated function/class with typing_extensions.deprecated (#126898)
Use `typing_extensions.deprecated` for deprecation annotations where possible. Otherwise, add `category=FutureWarning` to `warnings.warn("message")` calls that are missing a category.

Note that only warnings whose messages contain `[Dd]eprecat(ed|ion)` are updated in this PR.

UPDATE: Use `FutureWarning` instead of `DeprecationWarning`.
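
For the fallback case, a minimal sketch of the explicit-category fix (`legacy_entry_point` is a hypothetical name):

```python
import warnings


def legacy_entry_point():
    # A bare warnings.warn(...) defaults to UserWarning; the fix is to pass
    # the category explicitly (FutureWarning, per the UPDATE above).
    warnings.warn(
        "`legacy_entry_point` is deprecated, use the new API instead.",
        category=FutureWarning,
        stacklevel=2,
    )
```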

Resolves #126888

Pull Request resolved: https://github.com/pytorch/pytorch/pull/126898
Approved by: https://github.com/albanD
2024-05-29 12:09:27 +00:00
Iris
aee96bbf5a [PT-D][Checkpointing] Move distributed checkpointing from torch.distributed._shard.checkpoint to torch.distributed.checkpoint (#88698)
Context in RFC: https://github.com/pytorch/pytorch/issues/86620

The `.rst` file will be finalized in subsequent PRs.
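
For callers, the move amounts to an import-path change; a sketch, assuming the public function names survive the move unchanged:

```python
# Old, private location (before this PR):
# from torch.distributed._shard.checkpoint import save_state_dict, load_state_dict

# New, public location (after this PR):
from torch.distributed.checkpoint import save_state_dict, load_state_dict
```
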
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88698
Approved by: https://github.com/wanchaol
2022-11-16 21:06:38 +00:00
Kurt Mohler
ee28b865ee Deprecate TypedStorage, its derived classes, and all of their public methods (#85303)
Part of #85302

Pull Request resolved: https://github.com/pytorch/pytorch/pull/85303
Approved by: https://github.com/ezyang
2022-11-08 18:11:01 +00:00
Rodrigo Kumpera
f66be71d77 [checkpoint] Adopt Planner interface across the board. (#83781)
Change StorageReader and StorageWriter to follow the new SavePlanner / LoadPlanner design.

Add an optional `planner` param to `load_state_dict` and `save_state_dict`, and implement the new protocol.

This includes a small rework of the FileSystem layer to support a single file per rank and to make fsync optional, matching `torch.save` behavior.
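
A minimal sketch of the new optional parameter, assuming the module path of this era and that `planner=None` selects the default planner:

```python
import torch
from torch.distributed._shard.checkpoint import FileSystemWriter, save_state_dict

state_dict = {"weight": torch.rand(4, 4)}

# `planner` is the new optional argument; passing None (or omitting it)
# falls back to the default planner. Assumes a process group is initialized.
save_state_dict(
    state_dict=state_dict,
    storage_writer=FileSystemWriter("/tmp/ckpt"),
    planner=None,
)
```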

Pull Request resolved: https://github.com/pytorch/pytorch/pull/83781
Approved by: https://github.com/wanchaol, https://github.com/fduwjj
2022-08-29 14:38:32 +00:00
Sergii Dymchenko
591222f5d9 Fix use-dict-literal lint (#83718)
Fix use-dict-literal pylint suggestions by changing `dict()` to `{}`. This PR makes the change in every Python file except test/jit/test_list_dict.py, where I think the intent is to test the constructor.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83718
Approved by: https://github.com/albanD
2022-08-24 00:26:46 +00:00
joncrall
b136f3f310 More doctest refinements. (#83317)
Follow-up to #82797

Now that the doctests themselves are in a better state, we should be able to enable xdoctest on the CI so they stay that way.

@ezyang @vadimkantorov
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83317
Approved by: https://github.com/ezyang
2022-08-22 20:07:26 +00:00
Rodrigo Kumpera
d11d3dd036 [dist.cp] Introduce LoadPlanner and SavePlanner extensibility API. (#83419)
The planners come with default implementations in default_planner.py.

The default planners expose their core functionality as separate functions, making it easy for other checkpoint implementations to reuse it.
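
A sketch of the extensibility point this enables: a custom planner that subclasses the default and reuses its core logic (the class name and the exact `set_up_planner` signature are assumptions of this sketch):

```python
from torch.distributed._shard.checkpoint.default_planner import DefaultSavePlanner


class FilteringSavePlanner(DefaultSavePlanner):
    """Hypothetical planner: drop private keys, then reuse the default logic."""

    def set_up_planner(self, state_dict, is_coordinator):
        filtered = {k: v for k, v in state_dict.items() if not k.startswith("_")}
        super().set_up_planner(filtered, is_coordinator)
```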

Pull Request resolved: https://github.com/pytorch/pytorch/pull/83419
Approved by: https://github.com/wanchaol
2022-08-18 19:40:15 +00:00
joncrall
4618371da5 Integrate xdoctest - Rebased (#82797)
This is a new version of #15648 based on the latest master branch.

Unlike the previous PR where I fixed a lot of the doctests in addition to integrating xdoctest, I'm going to reduce the scope here. I'm simply going to integrate xdoctest, and then I'm going to mark all of the failing tests as "SKIP". This will let xdoctest run on the dashboards, provide some value, and still let the dashboards pass. I'll leave fixing the doctests themselves to another PR.

In my initial commit, I do the bare minimum to get something running with failing dashboards. The few tests that I marked as skip are causing segfaults. Running xdoctest results in 293 failed, 201 passed tests. The next commits will disable those tests. (Unfortunately, I don't have a tool that will insert the `# xdoctest: +SKIP` directive over every failing test, so I'm going to do this mostly manually.)
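
For reference, the directive looks like this inside a docstring (the function is illustrative):

```python
def troublesome_fn():
    """
    Example:
        >>> # xdoctest: +SKIP
        >>> troublesome_fn()  # would segfault on CI, so xdoctest skips it
    """
```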

Fixes https://github.com/pytorch/pytorch/issues/71105

@ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82797
Approved by: https://github.com/ezyang
2022-08-12 02:08:01 +00:00
Rodrigo Kumpera
f4ee37453c [dist.checkpoint] Change metadata format and improve error reporting (#82078)
This PR implements the following changes.

Move to a new checkpoint metadata format that splits logical and storage data.
This is a step toward extensible checkpointing, as it moves us away from the hardcoded storage model enforced by the FileSystem storage layer.

Change CheckpointException to include the exception traceback. Exception tracebacks are not serializable, so we need to take care of that ourselves; otherwise we give users horribly bad errors.
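
A minimal sketch of the workaround (the helper name is illustrative): traceback objects cannot be pickled, so they have to be formatted to strings before crossing rank boundaries.

```python
import traceback


def format_for_transport(exc):
    # Traceback objects don't pickle; serialize to a string before shipping
    # the failure to other ranks.
    return "".join(traceback.format_exception(type(exc), exc, exc.__traceback__))
```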

Finally, remove `validate_state_dict`, as it has lost its usefulness. Loading is becoming flexible enough that the only reasonable way to verify whether a given configuration can be loaded is to actually try it.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/82078
Approved by: https://github.com/wanchaol, https://github.com/fduwjj
2022-08-03 17:00:12 +00:00
Rodrigo Kumpera
69eecdbc9c Introduce MetadataIndex and helper to use it. (#81909)
MetadataIndex simplifies indexing into a state dict and into Metadata.

This includes a `find_state_dict_object` helper that searches into a state dict.

This PR doesn't include search over Metadata, as it requires changes that will land in a subsequent PR.
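
A hedged usage sketch (the module paths and exact constructor are assumptions based on this description):

```python
import torch
from torch.distributed._shard.checkpoint.metadata import MetadataIndex
from torch.distributed._shard.checkpoint.utils import find_state_dict_object

state_dict = {"layer.weight": torch.rand(4, 4)}

# Index a plain tensor by FQN; for sharded tensors an offset additionally
# selects the matching shard.
idx = MetadataIndex(fqn="layer.weight")
obj = find_state_dict_object(state_dict, idx)
```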

Pull Request resolved: https://github.com/pytorch/pytorch/pull/81909
Approved by: https://github.com/wanchaol
2022-07-28 12:20:58 +00:00
Rodrigo Kumpera
d2078fac11 [dist.checkpoint] Cleanup usage of collectives and introduce narrow helper (#81828)
Introduce a `_DistWrapper` class that wraps a process group and provides functional variants of collectives. It works without c10d enabled and is exception-robust.

Introduce `tensor_narrow_n`, which handles narrowing over multiple dimensions.
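
A minimal sketch of what narrowing over multiple dimensions means, in plain tensor ops (`tensor_narrow_n` itself is internal; this is not its actual implementation):

```python
import torch


def narrow_n(tensor, offsets, sizes):
    # Apply Tensor.narrow once per dimension, which is what a multi-dim
    # narrow helper has to do under the hood.
    for dim, (offset, size) in enumerate(zip(offsets, sizes)):
        tensor = tensor.narrow(dim, offset, size)
    return tensor


t = torch.arange(16).reshape(4, 4)
print(narrow_n(t, offsets=(1, 2), sizes=(2, 2)))  # the 2x2 block at (1, 2)
```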

Pull Request resolved: https://github.com/pytorch/pytorch/pull/81828
Approved by: https://github.com/wanchaol
2022-07-27 12:59:58 +00:00
Sergii Dymchenko
d61ae1a773 Remove unused variables from state_dict_loader (#81513)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81513
Approved by: https://github.com/mrshenli
2022-07-15 15:31:34 +00:00
Sergii Dymchenko
fe34bf1201 Remove unused storage_size (#81514)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81514
Approved by: https://github.com/mrshenli
2022-07-15 15:30:52 +00:00
zilinzhu
3d9cef8c98 Clone tensor to write in ShardedTensor checkpoint (#79400)
The `torch.save` API saves the original tensor backing a view, which results in a much larger checkpoint when parameters are fused, e.g. in torchrec.

Relates to #79016
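
A small sketch of the underlying problem: `torch.save` serializes a view's whole backing storage, so cloning first keeps the checkpoint at the view's size.

```python
import io

import torch

base = torch.zeros(1_000_000)
view = base[:10]


def saved_bytes(t):
    buf = io.BytesIO()
    torch.save(t, buf)
    return buf.getbuffer().nbytes


print(saved_bytes(view))          # ~4 MB: the whole backing storage is written
print(saved_bytes(view.clone()))  # tiny: only the 10 cloned elements (plus framing)
```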

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79400
Approved by: https://github.com/kumpera
2022-06-29 03:47:24 +00:00
Rodrigo Kumpera
270c518be0 [checkpoint] Implement interop between Tensor and Sharded Tensor (#78120)
This allows loading a Tensor from a checkpoint that stores a ShardedTensor under the same FQN.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78120
Approved by: https://github.com/pritamdamania87
2022-06-16 15:31:09 +00:00
Rodrigo Kumpera
c9570e4b88 [checkpoint] Synchronize error handling across all ranks (#77091)
Introduce error handling across all ranks when loading and saving checkpoints.

This makes it a lot simpler for users to handle failures and, as a positive side effect, to coordinate on when a save or load has successfully finished.

This change requires 3 collectives when saving and 1 when loading.
All of these collectives carry a small payload, so they will be latency-bound, and write time should dominate.
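
From the caller's side, the synchronized behavior plausibly reduces to one try/except per rank; a sketch, treating the exception's module path and its `failures` attribute as assumptions:

```python
import torch
from torch.distributed._shard.checkpoint import FileSystemWriter, save_state_dict
from torch.distributed._shard.checkpoint.api import CheckpointException

state_dict = {"weight": torch.rand(4, 4)}

try:
    # Assumes the default process group is already initialized.
    save_state_dict(state_dict, storage_writer=FileSystemWriter("/tmp/ckpt"))
except CheckpointException as exc:
    # With synchronized error handling, every rank raises together and can
    # inspect which ranks failed.
    print(exc.failures)
```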

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77091
Approved by: https://github.com/pritamdamania87, https://github.com/wanchaol
2022-05-18 21:24:09 +00:00
Rodrigo Kumpera
710246ea99 Introduce distributed checkpoint with ShardedTensor.
This is a copy of #76123.
I had to create a new PR due to some infra limitations, so please look at the other PR for the comment history.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76897
Approved by: https://github.com/wanchaol
2022-05-05 20:28:12 +00:00