This PR re-lands
- [Typing] Fix PEP 484 Violation (#105022)
- Update mypy to 1.4.1 (#91983)
That were reverted due to the conflict with internal source repo.
Mostly fixes for PEP-484 violation (i.e. when default arg is set to None, but type is not annotated as optional)
Plus few real fixes:
- Add missing `_get_upgraders_entry_map` to `torch/_C/__init__.pyi`
- Add missing return statement to `torch._export. deserialize_graph`
- Fix error message in `torch.ao.ns.fx.weight_utils.get_lstm_mod_weights`
- Add assert it `torch/optim/optimizer.py` that Optional list is not None
TODO (in followup PR):
- Fix erroneous `isinstance` check in `torch/ao/quantization/_pt2e/qat_utils.py`
Unrelated, to bypass CI failures due to the gcc9 dependency update in Ubuntu-18.04:
- Add hack to squash older libstdc++ from conda environment in favor one from OS to `.ci/docker/install_conda.sh`
- Update bazel cuda builds to focal, as with libstdc++-6.0.32 bazel builds loose the ability to catch exceptions (probably because they link with cupti statically, but I could not found where it is done)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105227
Approved by: https://github.com/atalman, https://github.com/albanD, https://github.com/Skylion007
This PR re-lands
- [Typing] Fix PEP 484 Violation (#105022)
- Update mypy to 1.4.1 (#91983)
That were reverted due to the conflict with internal source repo.
Mostly fixes for PEP-484 violation (i.e. when default arg is set to None, but type is not annotated as optional)
Plus few real fixes:
- Add missing `_get_upgraders_entry_map` to `torch/_C/__init__.pyi`
- Add missing return statement to `torch._export. deserialize_graph`
- Fix error message in `torch.ao.ns.fx.weight_utils.get_lstm_mod_weights`
- Add assert it `torch/optim/optimizer.py` that Optional list is not None
TODO (in followup PR):
- Fix erroneous `isinstance` check in `torch/ao/quantization/_pt2e/qat_utils.py`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105227
Approved by: https://github.com/atalman, https://github.com/albanD, https://github.com/Skylion007
Not sure, how it worked before, but if arguments must be annotated is optional if they are defaulted to None
Towards enabling mypy-1.4.1 in lintrunner
<!--
copilot:poem
-->
### <samp>🤖 Generated by Copilot at 5e1b9f4</samp>
> _We annotate the arguments of doom_
> _To show the `None` values of gloom_
> _We improve the type checking and readability_
> _With `Optional` annotations of metal-ity_
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105022
Approved by: https://github.com/izaitsevfb, https://github.com/huydhn, https://github.com/Skylion007
Summary: The "kill worker process" event was logged to Scuba only when the worker process was really reaped. We want to add a new event "timer expired", no matter the worker process will be reaped or not. This will help collect data before we enable the JustKnob to kill the worker process on timeout.
Test Plan:
### Unit Test
```
buck test mode/dev-nosan //caffe2/test/distributed/elastic/agent/server/test:local_agent_test
```
```
Test Session: https://www.internalfb.com/intern/testinfra/testrun/7318349508929624
RE: reSessionID-ea464c43-54e7-44f2-942b-14ea8aa98c74 Up: 10.5 KiB Down: 1.1 MiB
Jobs completed: 100. Time elapsed: 3206.9s. Cache hits: 91%. Commands: 11 (cached: 10, remote: 1, local: 0)
Tests finished: Pass 55. Fail 0. Fatal 0. Skip 0. 0 builds failed
```
--------
```
buck test mode/dev-nosan //caffe2/test/distributed/elastic/agent/server/test/fb:local_agent_fb_internal_test
```
```
Test Session: https://www.internalfb.com/intern/testinfra/testrun/6473924579130483
RE: reSessionID-231a47b7-a43d-4c0f-9f73-64713ffcbbd3 Up: 5.7 MiB Down: 1.9 GiB
Jobs completed: 182156. Time elapsed: 282.4s. Cache hits: 99%. Commands: 72112 (cached: 72107, remote: 1, local: 4)
Tests finished: Pass 2. Fail 0. Fatal 0. Skip 0. 0 builds failed
```
Differential Revision: D39903376
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85861
Approved by: https://github.com/d4l3k
Summary:
This diff implements a named pipe based watchdog timer (`FileTimerClient` and `FileTimerServer`). This is similar to the existing `LocalTimerClient` and `LocalTimerServer` (https://fburl.com/code/j4b9pyya).
The motivation is from the need of handling various timeout issues. The training process occasionally get stuck. We need a proper watchdog to monitor the liveness of the training processes. This timer allows the TorchElastic agent (as the watchdog) to monitor the progress of the training processes that it spawned. If a timeout occurred, he TorchElastic agent can take some action to kill the stuck process and creating a core dump for it.
`LocalTimerClient` and `LocalTimerServer` require a `multiprocessing.Queue()` to work. So they can only be used between `multiprocessing` parent and child processes.
`FileTimerClient` and `FileTimerServer` does not have such limitation.
Test Plan:
### Unit Test
```
buck test mode/opt caffe2/test/distributed/elastic/timer:file_based_timer_test
```
```
RemoteExecution session id: reSessionID-06d70a77-043c-4d9d-b0f2-94c24460740a-tpx
Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/844425186732666
✓ ListingSuccess: caffe2/test/distributed/elastic/timer:file_based_timer_test : 12 tests discovered (2.177)
✓ Pass: caffe2/test/distributed/elastic/timer:file_based_timer_test - test_happy_path (file_based_local_timer_test.FileTimerTest) (2.463)
✓ Pass: caffe2/test/distributed/elastic/timer:file_based_timer_test - test_expired_timers (file_based_local_timer_test.FileTimerServerTest) (1.889)
✓ Pass: caffe2/test/distributed/elastic/timer:file_based_timer_test - test_send_request_release (file_based_local_timer_test.FileTimerServerTest) (1.700)
✓ Pass: caffe2/test/distributed/elastic/timer:file_based_timer_test - test_valid_timers (file_based_local_timer_test.FileTimerServerTest) (1.873)
✓ Pass: caffe2/test/distributed/elastic/timer:file_based_timer_test - test_watchdog_call_count (file_based_local_timer_test.FileTimerServerTest) (1.715)
✓ Pass: caffe2/test/distributed/elastic/timer:file_based_timer_test - test_watchdog_empty_queue (file_based_local_timer_test.FileTimerServerTest) (1.609)
✓ Pass: caffe2/test/distributed/elastic/timer:file_based_timer_test - test_exception_propagation (file_based_local_timer_test.FileTimerTest) (1.633)
✓ Pass: caffe2/test/distributed/elastic/timer:file_based_timer_test - test_multiple_clients_interaction (file_based_local_timer_test.FileTimerTest) (2.189)
✓ Pass: caffe2/test/distributed/elastic/timer:file_based_timer_test - test_get_timer_recursive (file_based_local_timer_test.FileTimerTest) (2.295)
✓ Pass: caffe2/test/distributed/elastic/timer:file_based_timer_test - test_no_client (file_based_local_timer_test.FileTimerTest) (1.753)
✓ Pass: caffe2/test/distributed/elastic/timer:file_based_timer_test - test_timer (file_based_local_timer_test.FileTimerTest) (2.151)
✓ Pass: caffe2/test/distributed/elastic/timer:file_based_timer_test - test_client_interaction (file_based_local_timer_test.FileTimerTest) (1.895)
Summary
Pass: 12
ListingSuccess: 1
Finished test run: https://www.internalfb.com/intern/testinfra/testrun/844425186732666
```
Differential Revision: D38604238
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83695
Approved by: https://github.com/d4l3k
Summary:
As this diff shows, currently there are a couple hundred instances of raw `noqa` in the codebase, which just ignore all errors on a given line. That isn't great, so this PR changes all existing instances of that antipattern to qualify the `noqa` with respect to a specific error code, and adds a lint to prevent more of this from happening in the future.
Interestingly, some of the examples the `noqa` lint catches are genuine attempts to qualify the `noqa` with a specific error code, such as these two:
```
test/jit/test_misc.py:27: print(f"{hello + ' ' + test}, I'm a {test}") # noqa E999
test/jit/test_misc.py:28: print(f"format blank") # noqa F541
```
However, those are still wrong because they are [missing a colon](https://flake8.pycqa.org/en/3.9.1/user/violations.html#in-line-ignoring-errors), which actually causes the error code to be completely ignored:
- If you change them to anything else, the warnings will still be suppressed.
- If you add the necessary colons then it is revealed that `E261` was also being suppressed, unintentionally:
```
test/jit/test_misc.py:27:57: E261 at least two spaces before inline comment
test/jit/test_misc.py:28:35: E261 at least two spaces before inline comment
```
I did try using [flake8-noqa](https://pypi.org/project/flake8-noqa/) instead of a custom `git grep` lint, but it didn't seem to work. This PR is definitely missing some of the functionality that flake8-noqa is supposed to provide, though, so if someone can figure out how to use it, we should do that instead.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56272
Test Plan:
CI should pass on the tip of this PR, and we know that the lint works because the following CI run (before this PR was finished) failed:
- https://github.com/pytorch/pytorch/runs/2365189927
Reviewed By: janeyx99
Differential Revision: D27830127
Pulled By: samestep
fbshipit-source-id: d6dcf4f945ebd18cd76c46a07f3b408296864fcb