Commit Graph

99 Commits

Author SHA1 Message Date
Chien-Chin Huang
db8d409d08 [DCP][BE] Apply ufmt to DCP and turn on lintrunner for DCP (#115302)
No logic change. Just typing and ufmt.

Differential Revision: [D51914982](https://our.internmc.facebook.com/intern/diff/D51914982/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115302
Approved by: https://github.com/XilunWu, https://github.com/wz337, https://github.com/LucasLLC
ghstack dependencies: #115523
2023-12-13 10:32:36 +00:00
Chien-Chin Huang
cc28f61fa3 [DCP][BE] Move DCP._state_dict_utils out from DCP (#115523)
DCP._state_dict_utils is also used by FSDP. This can sometimes cause a circular import. Move it out of DCP to avoid the circular import.

Differential Revision: [D52022440](https://our.internmc.facebook.com/intern/diff/D52022440/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115523
Approved by: https://github.com/wz337
2023-12-13 08:59:48 +00:00
Lucas Pasqualin
ffb2a28a67 Fixes expected behavior when no_dist=True in state_dict_loader.load (#115660)
Fixes expected behavior when `no_dist=True` in `state_dict_loader.load`

Fixes #115591

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115660
Approved by: https://github.com/wz337, https://github.com/fegin
2023-12-12 22:21:16 +00:00
Chien-Chin Huang
d954ef208f [DCP][state_dict] DCP state_dict cannot correctly find FQN when the leaf module is wrapped by FSDP (#115592)
Summary: The original logic has an incorrect assumption that there is at least one object name left when traversing the module tree. This is not correct when the leaf module is wrapped by FSDP.

Test Plan: CI

Differential Revision: D52049293

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115592
Approved by: https://github.com/wz337
2023-12-12 19:22:23 +00:00
Lucas Pasqualin
5432088098 Adds Checkpointer Wrapper for DCP [3/N] (#114603)
Adds a useful high-level wrapper for calling `dist.save/load` with the correct storage readers and writers.

Instead of doing:

```
DCP.save(
    state_dict={...},
    storage_writer=StorageWriter(...)
)

DCP.load(
    state_dict={...},
    storage_reader=StorageReader(...)
)
```

We can now do:

```
checkpointer = Checkpointer(...)

checkpointer.save(state_dict={...})
checkpointer.load(state_dict={...})
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114603
Approved by: https://github.com/fegin, https://github.com/wz337
2023-12-08 01:03:21 +00:00
Lucas Pasqualin
753c07bbe0 All gather keys before processing Stateful objects in save/load [2/N] (#114304)
Accounts for the case where `state_dict` keys may be present in different orders across ranks. Since users may be calling collectives in their `state_dict` and `load_state_dict` calls, differently ordered keys could cause a deadlock. This is mostly a defensive move, meant to match the feature in TSS.
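
A minimal sketch of the key-gathering idea, assuming an initialized process group (illustrative helper name, not the DCP internals):

```python
import torch.distributed as dist

# Gather every rank's state_dict keys and iterate them in one deterministic
# order, so any collectives issued while processing a key line up across ranks.
def globally_ordered_keys(state_dict):
    local_keys = sorted(state_dict.keys())
    gathered = [None] * dist.get_world_size()
    dist.all_gather_object(gathered, local_keys)
    # Union of all ranks' keys, deterministically ordered.
    return sorted({key for keys in gathered for key in keys})
```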

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114304
Approved by: https://github.com/fegin, https://github.com/wz337
2023-12-04 18:31:14 +00:00
Lucas Pasqualin
f073dcd4f7 Stateful Checkpointing for Distributed [1/N] (#113867)
First pass at adding a save/load API, as well as a definition of Stateful objects.

Amongst a couple of TODOs, we still need to explore adding an `all_gather` and potentially a `barrier` while iterating through state keys.
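
For illustration, a minimal sketch of what a Stateful object looks like (illustrative class, not the definition added in this PR): anything exposing `state_dict()`/`load_state_dict()` can be checkpointed uniformly by the save/load entry points.

```python
# Illustrative training-state object implementing the Stateful idea.
class TrainState:
    def __init__(self):
        self.step = 0
        self.best_loss = float("inf")

    def state_dict(self):
        return {"step": self.step, "best_loss": self.best_loss}

    def load_state_dict(self, state_dict):
        self.step = state_dict["step"]
        self.best_loss = state_dict["best_loss"]
```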

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113867
Approved by: https://github.com/fegin, https://github.com/wz337
2023-12-01 19:21:03 +00:00
Aaron Gokaslan
b7b2178204 [BE]: Remove useless lambdas (#113602)
Applies PLW0108, which removes useless lambdas in Python. The rule is in preview, so it is not ready to be enabled by default just yet. These are the autofixes from the rule.
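
For illustration (not taken from this PR), the kind of pattern PLW0108 autofixes:

```python
# A lambda that only forwards its argument to another callable is replaced by
# the callable itself.
values = ["10", "2", "33"]
before = sorted(values, key=lambda x: int(x))  # useless lambda
after = sorted(values, key=int)                # autofixed form
assert before == after == ["2", "10", "33"]
```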

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113602
Approved by: https://github.com/albanD
2023-11-14 20:06:48 +00:00
Chien-Chin Huang
2bcff4d8e3 [state_dict][11/N] Implement cpu_offload and full_state_dict for get_state_dict (#112837)
As title

Differential Revision: [D50962991](https://our.internmc.facebook.com/intern/diff/D50962991/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112837
Approved by: https://github.com/LucasLLC, https://github.com/wz337
ghstack dependencies: #112836, #112885
2023-11-13 10:03:06 +00:00
Chien-Chin Huang
6e714d7315 [state_dict] Rewrite _gather_state_dict to extract the traversal logic (#112885)
This allows us to do cpu_offload with the same traversal logic.
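
A rough sketch of what such a shared traversal could look like, with illustrative names rather than the actual `_gather_state_dict` code:

```python
import torch

# One recursive walk over a nested state_dict, parameterized by what to do at
# each tensor, so gathering and CPU offload can share the same logic.
def traverse_state_dict(obj, tensor_fn):
    if isinstance(obj, dict):
        return {k: traverse_state_dict(v, tensor_fn) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return type(obj)(traverse_state_dict(v, tensor_fn) for v in obj)
    if isinstance(obj, torch.Tensor):
        return tensor_fn(obj)
    return obj

offloaded = traverse_state_dict({"w": torch.ones(2, 2)}, lambda t: t.cpu())
```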

Differential Revision: [D50982355](https://our.internmc.facebook.com/intern/diff/D50982355/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112885
Approved by: https://github.com/LucasLLC, https://github.com/wz337
ghstack dependencies: #112836
2023-11-10 17:07:52 +00:00
Chien-Chin Huang
d4c810cc11 [state_dict] Add cpu_only and ranks_only support for _gather_state_dict (#112836)
Add cpu_only and ranks_only support for _gather_state_dict

Differential Revision: [D50962980](https://our.internmc.facebook.com/intern/diff/D50962980/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112836
Approved by: https://github.com/LucasLLC, https://github.com/wz337
2023-11-10 16:03:46 +00:00
NVS Abhilash
44c0521e8c fix: docstring error in torch/distributed module (#113241)
Fixes: #113193

`pydocstyle <all_files_in_issue> --count`

- Before: 345
- After: 130

For deprecated methods, I have added a `noqa` to ignore them. I was not able to find the file `torch/distributed/tensor/parallel/multihead_attention_tp.py`, so I've ignored it for this PR.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113241
Approved by: https://github.com/kit1980
2023-11-09 19:10:20 +00:00
Chien-Chin Huang
a66f2a1b99 [state_dict] Move _gather_state_dict to dcp module (#112835)
This API is used by more than just FSDP. This PR moves it to the DCP module.

Differential Revision: [D50962966](https://our.internmc.facebook.com/intern/diff/D50962966/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112835
Approved by: https://github.com/wz337
2023-11-08 19:42:56 +00:00
Chien-Chin Huang
e9d7fac89c [state_dict][10/N] Let set_state_dict returns IncompatibleKeys (#112414)
load_state_dict returns IncompatibleKeys, so set_state_dict should also return the same information to users.
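
For context, the core `nn.Module` behavior this mirrors:

```python
import torch.nn as nn

# load_state_dict reports which keys were missing or unexpected.
model = nn.Linear(2, 2)
result = model.load_state_dict({"weight": model.weight.detach().clone()}, strict=False)
print(result.missing_keys)     # ['bias']
print(result.unexpected_keys)  # []
```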

Differential Revision: [D50748157](https://our.internmc.facebook.com/intern/diff/D50748157/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112414
Approved by: https://github.com/wz337
ghstack dependencies: #112167, #112203
2023-11-02 22:39:38 +00:00
Chien-Chin Huang
9d0c3e21d0 [state_dict][9/N] Add get and set APIs for model and optimizer state_dict (#112203)
The original get_state_dict and set_state_dict pair is too complicated because of the possible combinations of usages. This PR adds APIs to get/set model_state_dict and optimizer_state_dict separately.
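
A hedged usage sketch of the split APIs; the import path follows the current `torch.distributed.checkpoint.state_dict` layout and may differ slightly across versions:

```python
import torch
import torch.nn as nn
from torch.distributed.checkpoint.state_dict import (
    get_model_state_dict,
    get_optimizer_state_dict,
)

model = nn.Linear(4, 4)
optim = torch.optim.SGD(model.parameters(), lr=0.1)

# Model and optimizer state are now fetched through separate entry points.
model_sd = get_model_state_dict(model)
optim_sd = get_optimizer_state_dict(model, optim)
```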

Differential Revision: [D50713584](https://our.internmc.facebook.com/intern/diff/D50713584/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112203
Approved by: https://github.com/wz337
ghstack dependencies: #112167
2023-11-02 22:03:57 +00:00
Brian
07c9b053f7 Enable planner to be used for loading sharded optimizer state dict (#112259)
This creates a more consistent interface for saving and loading sharded state dicts. A planner can be specified when saving a sharded optimizer state dict, but there is currently no planner support for loading one. This change does not affect the default behavior of the function.
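
A hedged usage sketch; argument and class names follow the current `torch.distributed.checkpoint` modules and should be checked against the exact signature for your version (`model_state_dict` is assumed to have been produced already):

```python
from torch.distributed.checkpoint import FileSystemReader
from torch.distributed.checkpoint.default_planner import DefaultLoadPlanner
from torch.distributed.checkpoint.optimizer import load_sharded_optimizer_state_dict

optim_state = load_sharded_optimizer_state_dict(
    model_state_dict=model_state_dict,               # assumed to exist
    optimizer_key="optim",
    storage_reader=FileSystemReader("/tmp/checkpoint"),
    planner=DefaultLoadPlanner(),                    # newly accepted optional planner
)
```
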
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112259
Approved by: https://github.com/wz337
2023-11-02 21:40:30 +00:00
Chien-Chin Huang
b1f50ead4f [state_dict][8/N] Ignore meta parameters (#112167)
This PR lets `get_state_dict` ignore parameters that are on the meta device.

This PR also demonstrates a possible use case of ignoring meta parameters -- checkpointing pipeline parallelism.

Differential Revision: [D50672521](https://our.internmc.facebook.com/intern/diff/D50672521/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112167
Approved by: https://github.com/wz337
2023-11-02 17:10:03 +00:00
PyTorch MergeBot
16953482d9 Revert "Enable planner to be used for loading sharded optimizer state dict (#112259)"
This reverts commit 6188f2e899.

Reverted https://github.com/pytorch/pytorch/pull/112259 on behalf of https://github.com/ZainRizvi due to Sorry, but this breaks internal builds. @wz337 can you please help fix this? ([comment](https://github.com/pytorch/pytorch/pull/112259#issuecomment-1788119247))
2023-10-31 22:27:48 +00:00
Hanzhi Zhou
bb45f89cd9 Hackable distributed filesystem reader and writer (#106635)
I propose some changes so that the `FileSystemReader` and `FileSystemWriter` can be used on other file systems. Users only need to provide `path` as a subclass of `Path` that overrides the necessary interfaces.

For example, one can utilize `tf.io.gfile` to implement an interface to save to or load from HDFS. The following code snippet shows a working implementation.

```python
from pathlib import Path
import tensorflow as tf

class GFileWrapper(tf.io.gfile.GFile):
    def __init__(self, path, mode="r") -> None:
        super().__init__(path, mode)

    def write(self, data):
        return super().write(bytes(data))

    # a not quite efficient readinto, but it works
    def readinto(self, buffer):
        # read up to buffer's length
        data = self.read(len(buffer))
        length = len(data)
        buffer[:length] = data
        return length

class HdfsPath(type(Path())):
    def __new__(cls, *pathsegments):
        return super().__new__(cls, *pathsegments)

    @staticmethod
    def _fix_path(path):
        path = str(path)
        if path.startswith("hdfs:/") and not path.startswith("hdfs://"):
            path = path.replace("hdfs:/", "hdfs://")
        return path

    def open(self, mode="r", *args, **kwargs):
        return GFileWrapper(HdfsPath._fix_path(self), mode=mode)

    def mkdir(self, **kwargs) -> None:
        return tf.io.gfile.makedirs(HdfsPath._fix_path(self))

    def rename(self, target):
        return tf.io.gfile.rename(HdfsPath._fix_path(self), HdfsPath._fix_path(target))
```

```python
writer = FileSystemWriter(HdfsPath("hdfs://..."), sync_files=False)
reader = FileSystemReader(HdfsPath("hdfs://..."))
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106635
Approved by: https://github.com/fduwjj
2023-10-31 19:36:18 +00:00
Brian
6188f2e899 Enable planner to be used for loading sharded optimizer state dict (#112259)
This creates a more consistent interface for saving and loading sharded state dicts. A planner can be specified when saving a sharded optimizer state dict, but there is currently no planner support for loading one. This change does not affect the default behavior of the function.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112259
Approved by: https://github.com/wz337
2023-10-30 22:51:09 +00:00
Rohan Varma
192e795f3f Change save -> load in comment (#112217)
Change save -> load in comment because this is the load_state_dict API


Pull Request resolved: https://github.com/pytorch/pytorch/pull/112217
Approved by: https://github.com/wz337
2023-10-27 19:39:02 +00:00
Iris Zhang (PyTorch)
aa9e65d8f5 [DCP] Add fsspec.transaction context when writing checkpoint to storage (#112191)
Summary: Adding fsspec.transaction to safeguard checkpoint writing. With the context, writes should only be committed if there was no exception and discarded otherwise.
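
A minimal sketch of the fsspec transaction pattern, using a local filesystem path for illustration rather than the DCP fsspec writer itself:

```python
import fsspec

# Writes inside the transaction context are committed only if no exception
# escapes, and discarded otherwise.
fs, root = fsspec.core.url_to_fs("file:///tmp/ckpt_example")
fs.makedirs(root, exist_ok=True)
with fs.transaction:
    with fs.open(f"{root}/shard_0.bin", "wb") as f:
        f.write(b"checkpoint bytes")
```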

Test Plan:
```
command: buck test @//mode/dev-nosan  //caffe2/test/distributed/checkpoint/fb:test_fsspec_filesystem -- --print-passing-details
```

Reviewed By: rohan-varma

Differential Revision: D50701929

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112191
Approved by: https://github.com/rohan-varma
2023-10-27 04:27:29 +00:00
Chien-Chin Huang
19a6487ad4 [state_dict][6/N] Change API names to avoid conflict and simplify the API signatures (#111120)
`state_dict` is a very common variable name people use to represent a local
state_dict and `load_state_dict` conflicts with DCP's `load_state_dict`.

This PR changes `state_dict` to `get_state_dict`. `get_state_dict` is closer to what this API does -- users use the API to get the current state_dict for saving or for loading (passed to DCP for loading in-place).

This PR also changes `load_state_dict` to `set_state_dict`. `set_state_dict` is less ideal compared to `get_state_dict` but is symmetric. We can still change the API name before it goes to beta.

This PR also simplifies the API signatures. `model_only` is removed and `optim_only` only exists for `get_state_dict`.

Differential Revision: [D50213931](https://our.internmc.facebook.com/intern/diff/D50213931/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111120
Approved by: https://github.com/wz337
ghstack dependencies: #111106, #111107, #111275, #111109, #111110
2023-10-17 00:15:31 +00:00
Chien-Chin Huang
9683a26c55 [state_dict][5/N] Add submodules save and load support (#111110)
It is not easy for users to save and load submodules (e.g., for fine-tuning) because FSDP requires access to the root module. This PR enables support for submodule save and load.

Differential Revision: [D50209727](https://our.internmc.facebook.com/intern/diff/D50209727/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111110
Approved by: https://github.com/wz337
ghstack dependencies: #111106, #111107, #111275, #111109
2023-10-16 23:25:37 +00:00
Chien-Chin Huang
7df287dc18 [state_dict][4/N] Support strict flag for model.load_state_dict (#111109)
As title

Differential Revision: [D50209723](https://our.internmc.facebook.com/intern/diff/D50209723/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111109
Approved by: https://github.com/wz337
ghstack dependencies: #111106, #111107, #111275
2023-10-15 04:58:15 +00:00
Chien-Chin Huang
7c67139e7b [state_dict][3/N] Cleanup StateDictOptions, make it more readable (#111275)
This is a reland PR for https://github.com/pytorch/pytorch/pull/111108 with the proper docstring fix.

1. Rename DistributedStateDictOptions to StateDictOptions.
2. Remove cpu_offload as we have not yet required this option.
3. Rename save_frozen_parameters to ignore_frozen_params.

Differential Revision: [D50294352](https://our.internmc.facebook.com/intern/diff/D50294352/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111275
Approved by: https://github.com/wz337
ghstack dependencies: #111106, #111107
2023-10-14 15:34:52 +00:00
PyTorch MergeBot
581d97c19e Revert "[state_dict][3/N] Cleanup StateDictOptions, make it more readable (#111108)"
This reverts commit b1db959085.

Reverted https://github.com/pytorch/pytorch/pull/111108 on behalf of https://github.com/huydhn due to Sorry for reverting your change, but I think it is cleaner to reland this change ([comment](https://github.com/pytorch/pytorch/pull/111108#issuecomment-1762504496))
2023-10-14 02:22:19 +00:00
Chien-Chin Huang
b1db959085 [state_dict][3/N] Cleanup StateDictOptions, make it more readable (#111108)
1. Rename DistributedStateDictOptions to StateDictOptions.
2. Remove cpu_offload as we have not yet required this option.
3. Rename save_frozen_parameters to ignore_frozen_params.

Differential Revision: [D50209711](https://our.internmc.facebook.com/intern/diff/D50209711/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111108
Approved by: https://github.com/wz337
ghstack dependencies: #111106, #111107
2023-10-13 21:03:51 +00:00
Chien-Chin Huang
e99abaae2f [state_dict][2/N] Let distributed.state_dict accepts single optimizer (#111107)
It's quite annoying that users have to create a tuple of optimizers even if there is only one optimizer. This PR makes most users' lives easier.

Differential Revision: [D50209704](https://our.internmc.facebook.com/intern/diff/D50209704/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111107
Approved by: https://github.com/wz337
ghstack dependencies: #111106
2023-10-13 18:40:57 +00:00
wz337
e0eaa95e99 [DCP] Remove _shard_tensor() call in load_sharded_optimizer_state_dict in optimizer.py (#111096)
`_shard_tensor()` calls into `dist.all_gather_object()` and this is causing optimizer state dict loading to be super slow. Workaround: call `FSDP._shard_utils._create_chunk_sharded_tensor()` to construct ShardedTensor without any communication.

Thanks to @fegin for suggesting the fix!
Thanks @mvpatel2000 for reporting the issue and providing profiling details to help us isolate the problematic source code quickly.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111096
Approved by: https://github.com/fegin
2023-10-12 20:27:06 +00:00
wz337
a614281ea9 Add current_device() to torch.cpu (#110987)
To better support device-agnostic code, add a "cpu" return for `current_device()` in torch.cpu so that we won't run into `AttributeError: module 'torch.cpu' has no attribute 'current_device'`.
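
A small sketch of the device-agnostic pattern this enables (assuming a build where `torch.cpu.current_device()` is available):

```python
import torch

# Resolve the device module from a device-type string and ask it for the
# current device, which now also works when the device type is plain "cpu".
device_type = "cuda" if torch.cuda.is_available() else "cpu"
device_module = getattr(torch, device_type)
print(device_module.current_device())  # an index for CUDA, "cpu" for CPU
```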

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110987
Approved by: https://github.com/wanchaol
2023-10-11 05:13:10 +00:00
Chien-Chin Huang
88616349d7 [state_dict][1/N] Implement the basic functions of distributed.checkpoint._state_dict (#105902)
This PR implements the basic functions of distributed.checkpoint._state_dict. This PR currently contains the flattening of optimizer state_dict which makes the PR too large. A later version may split it into 2 for a better code review.

Differential Revision: [D47647719](https://our.internmc.facebook.com/intern/diff/D47647719/)

**NOTE FOR REVIEWERS**: This PR has internal Meta-specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D47647719/)!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105902
Approved by: https://github.com/wz337
2023-10-05 20:04:15 +00:00
wz337
a588648759 [DCP] Fix 'torch.cpu' has no attribute 'current_device' in checkpoint/optimizer.py (#110299)
When running on the "gloo" or "cpu:gloo,cuda:nccl" backends, it runs into the following error.

```
-- Process 1 terminated with the following error:
Traceback (most recent call last):
  File "/data/users/irisz/pytorch/torch/multiprocessing/spawn.py", line 74, in _wrap
    fn(i, *args)
  File "/data/users/irisz/pytorch/torch/distributed/checkpoint/examples/fsdp_checkpoint_example.py", line 105, in run_fsdp_checkpoint_example
    optim_state = load_sharded_optimizer_state_dict(
  File "/data/users/irisz/pytorch/torch/distributed/checkpoint/optimizer.py", line 295, in load_sharded_optimizer_state_dict
    _alloc_tensor(value.properties, value.size, dp_pg_device_type), sharding_spec
  File "/data/users/irisz/pytorch/torch/distributed/checkpoint/optimizer.py", line 109, in _alloc_tensor
    device=cast(torch.device, _get_device_module(device_type).current_device()),
AttributeError: module 'torch.cpu' has no attribute 'current_device'
```

This PR fixes the error in optimizer.py. A follow-up will add "cpu:gloo,cuda:nccl" support in DTensorBase so we can update the unit tests to include this backend.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110299
Approved by: https://github.com/kumpera
2023-10-01 21:54:13 +00:00
Aaron Gokaslan
6b39cf863f Fix invalid arg to getLogger in torch distributed checkpoint (#110008)
Ran the experimental LOG002 ruff check and found a bug in our codebase. A logger should not be instantiated from `__file__`; it should be instantiated from `__name__`.

https://docs.astral.sh/ruff/rules/invalid-get-logger-argument/
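
For illustration, the pattern LOG002 enforces:

```python
import logging

# Pass the module's __name__, not __file__, so the logger name follows the
# package hierarchy.
logger = logging.getLogger(__name__)     # correct
# logger = logging.getLogger(__file__)   # the bug being fixed
```
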
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110008
Approved by: https://github.com/ezyang
2023-09-25 18:21:18 +00:00
Brian
ab99a95470 Update planner.py (#107998)
Fixes #107997
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107998
Approved by: https://github.com/wz337
2023-09-15 18:12:45 +00:00
Iris
b6f9d4dbc4 [DCP] Enable nD device_mesh resharding DTensor in DCP and add associated tests (#106230)
This PR:
1. Drop the assert for the 1D DeviceMesh check to allow DTensor with an nD DeviceMesh when creating write_item.
2. Add tests for both placement changes and mesh changes for both 1D and 2D scenarios.

cc. @kumpera  @wanchaol  @fegin
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106230
Approved by: https://github.com/kumpera
2023-09-12 00:47:58 +00:00
wz337
49aa8d19dd [DTensor] Replace usage of compute_local_offset by compute_local_shape_and_global_offset (#108547)
This PR removes four usages of compute_local_offset() in the PyTorch repo and replaces them with the new API compute_local_shape_and_global_offset().

We will be removing the compute_local_offset() API in the next diff, as there are still usages internally.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108547
Approved by: https://github.com/wanchaol
2023-09-06 04:53:44 +00:00
dilililiwhy
ff37f6018d Enable custom device support in fsdp checkpoint (#107289)
Fixes https://github.com/pytorch/pytorch/issues/104390
Enable custom device (privateuse1 backend) support in checkpointing via a dynamic abstract device module.
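
A rough sketch of the dynamic device-module lookup idea, with an illustrative helper name rather than the actual checkpoint code:

```python
import torch

# Resolve the backend module from the device-type string instead of
# hard-coding torch.cuda, so a privateuse1 backend registered under its own
# name can participate.
def get_device_module(device_type: str = "cuda"):
    return getattr(torch, device_type)

print(get_device_module("cuda").is_available())
```
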
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107289
Approved by: https://github.com/wz337
2023-08-25 11:50:03 +00:00
Brian
3361fae89b Fix FP16Planner documentation (#107620)
Fixes #107619

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107620
Approved by: https://github.com/awgu
2023-08-22 02:05:27 +00:00
Brian
24968383b5 Fix RenamePlanner documentation (#107535)
Fixes #107490

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107535
Approved by: https://github.com/awgu, https://github.com/fduwjj
2023-08-21 07:51:57 +00:00
Eddy Ogola Onyango
cbcd9083be [DCP] Modify tensor saving logic in DCP (#106415)
Currently, DCP treats tensors as duplicates and only saves them on rank0. This won't work for PiPPy, as PiPPy has unique tensors across different ranks. With the current setup, we would only be saving the tensors on rank0 (the coordinator rank).

In this PR, we change to letting each rank create its own WriteItem for tensors. For tensors that are replicated across ranks, we handle deduplication through dedup_tensors(), which dedups the replicated WriteItems so we only do the actual writing once.
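
A rough sketch of the dedup idea with illustrative names (not the `dedup_tensors()` implementation):

```python
from collections import defaultdict

# Every rank proposes write items; items proposed by several ranks are
# assigned to exactly one of them so each tensor is written once.
def dedup(per_rank_items):
    owner = {}
    for rank in sorted(per_rank_items):
        for key in per_rank_items[rank]:
            owner.setdefault(key, rank)  # first proposing rank keeps the write
    plans = defaultdict(list)
    for key, rank in owner.items():
        plans[rank].append(key)
    return dict(plans)

print(dedup({0: ["w", "b"], 1: ["w", "b", "stage1.w"]}))
# {0: ['w', 'b'], 1: ['stage1.w']}
```
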
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106415
Approved by: https://github.com/wz337
2023-08-09 00:16:10 +00:00
Jon Bolin
1032a2541e Add option to disable rewriting index hints in default global save plan (#105861)
With distributed checkpointing in PyTorch/XLA SPMD, the WriteItem index hints should not be modified when creating the global plan. In order to reuse the default planner logic for checkpoint metadata creation, we need to make the behavior of rewriting index hints optional.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105861
Approved by: https://github.com/kumpera
2023-07-25 06:00:13 +00:00
Justin Chu
232b96b6e2 [BE] Enable ruff's UP rules and autoformat distributed/ (#105433)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105433
Approved by: https://github.com/albanD
2023-07-19 14:27:11 +00:00
Nikita Shulga
5837e95d30 [Reland] Update mypy to 1.4.1 (#105227)
This PR re-lands
- [Typing] Fix PEP 484 Violation (#105022)
- Update mypy to 1.4.1 (#91983)

Those were reverted due to a conflict with the internal source repo.

Mostly fixes for PEP-484 violation (i.e. when default arg is set to None, but type is not annotated as optional)
Plus a few real fixes:
  - Add missing `_get_upgraders_entry_map` to `torch/_C/__init__.pyi`
  - Add missing return statement to `torch._export.deserialize_graph`
  - Fix error message in `torch.ao.ns.fx.weight_utils.get_lstm_mod_weights`
  - Add assert in `torch/optim/optimizer.py` that the Optional list is not None
TODO (in followup PR):
  - Fix erroneous `isinstance` check in `torch/ao/quantization/_pt2e/qat_utils.py`

Unrelated, to bypass CI failures due to the gcc9 dependency update in Ubuntu-18.04:
- Add hack to squash older libstdc++ from conda environment in favor one from OS to `.ci/docker/install_conda.sh`
- Update bazel cuda builds to focal, as with libstdc++-6.0.32 bazel builds lose the ability to catch exceptions (probably because they link with cupti statically, but I could not find where it is done)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105227
Approved by: https://github.com/atalman, https://github.com/albanD, https://github.com/Skylion007
2023-07-15 20:30:20 +00:00
PyTorch MergeBot
15fd1ea118 Revert "[Reland] Update mypy to 1.4.1 (#105227)"
This reverts commit c9c4f8efc3.

Reverted https://github.com/pytorch/pytorch/pull/105227 on behalf of https://github.com/atalman due to trying to mitigate ci sev #105248 ([comment](https://github.com/pytorch/pytorch/pull/105227#issuecomment-1636510935))
2023-07-14 22:28:35 +00:00
Nikita Shulga
c9c4f8efc3 [Reland] Update mypy to 1.4.1 (#105227)
This PR re-lands
- [Typing] Fix PEP 484 Violation (#105022)
- Update mypy to 1.4.1 (#91983)

Those were reverted due to a conflict with the internal source repo.

Mostly fixes for PEP-484 violation (i.e. when default arg is set to None, but type is not annotated as optional)
Plus a few real fixes:
  - Add missing `_get_upgraders_entry_map` to `torch/_C/__init__.pyi`
  - Add missing return statement to `torch._export.deserialize_graph`
  - Fix error message in `torch.ao.ns.fx.weight_utils.get_lstm_mod_weights`
  - Add assert in `torch/optim/optimizer.py` that the Optional list is not None
TODO (in followup PR):
  - Fix erroneous `isinstance` check in `torch/ao/quantization/_pt2e/qat_utils.py`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105227
Approved by: https://github.com/atalman, https://github.com/albanD, https://github.com/Skylion007
2023-07-14 20:45:12 +00:00
PyTorch MergeBot
b4d91b1c5b Revert "[Typing] Fix PEP 484 Violation (#105022)"
This reverts commit 4148b7bada.

Reverted https://github.com/pytorch/pytorch/pull/105022 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/105022#issuecomment-1635967734))
2023-07-14 14:45:09 +00:00
PyTorch MergeBot
5b4aacd691 Revert "[DCP] Add FsspecReader and FsspecWriter to checkpoint __init__.py (#105088)"
This reverts commit 76a053d55c.

Reverted https://github.com/pytorch/pytorch/pull/105088 on behalf of https://github.com/atalman due to broke trunk and  linux-focal-py3.9-clang7-asan ([comment](https://github.com/pytorch/pytorch/pull/105088#issuecomment-1633385350))
2023-07-13 00:59:55 +00:00
Iris
76a053d55c [DCP] Add FsspecReader and FsspecWriter to checkpoint __init__.py (#105088)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105088
Approved by: https://github.com/kumpera
2023-07-12 23:40:35 +00:00
Nikita Shulga
4148b7bada [Typing] Fix PEP 484 Violation (#105022)
Not sure how it worked before, but arguments must be annotated as Optional if they default to None.

Towards enabling mypy-1.4.1 in lintrunner
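
A minimal illustration of the rule being enforced:

```python
from typing import Optional

# A parameter that defaults to None must be typed as Optional (or `X | None`),
# not as the bare type.
def resize(shape: Optional[tuple] = None) -> tuple:
    return shape if shape is not None else ()

# def resize(shape: tuple = None) -> tuple: ...   # flagged by mypy 1.4.1
```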

### <samp>🤖 Generated by Copilot at 5e1b9f4</samp>

> _We annotate the arguments of doom_
> _To show the `None` values of gloom_
> _We improve the type checking and readability_
> _With `Optional` annotations of metal-ity_

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105022
Approved by: https://github.com/izaitsevfb, https://github.com/huydhn, https://github.com/Skylion007
2023-07-12 10:20:48 +00:00