Changes by apply order:
1. Replace all `".."` and `os.pardir` usage with `os.path.dirname(...)`.
2. Replace nested `os.path.dirname(os.path.dirname(...))` call with `str(Path(...).parent.parent)`.
3. Reorder `.absolute()` ~/ `.resolve()`~ and `.parent`: always resolve the path first.
`.parent{...}.absolute()` -> `.absolute().parent{...}`
4. Replace chained `.parent x N` with `.parents[${N - 1}]`: the code is easier to read (see 5.; a short sketch follows this list)
`.parent.parent.parent.parent` -> `.parents[3]`
5. ~Replace `.parents[${N - 1}]` with `.parents[${N} - 1]`: the code is easier to read and does not introduce any runtime overhead.~
~`.parents[3]` -> `.parents[4 - 1]`~
6. ~Replace `.parents[2 - 1]` with `.parent.parent`: because the code is shorter and easier to read.~
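For illustration, here is a minimal sketch of what rules 3 and 4 amount to; the file layout is hypothetical:
```python
from pathlib import Path

p = Path("/a/b/c/d/e.py")

# Rule 4: a chain of `.parent` lookups collapses into one `.parents[N - 1]` index.
assert p.parent.parent.parent.parent == p.parents[3] == Path("/a")

# Rule 3: make the path absolute before walking up, so the ancestors are taken
# from the full path rather than from a possibly relative one.
repo_root = Path(__file__).absolute().parents[1]  # hypothetical repo layout
```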
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129374
Approved by: https://github.com/justinchuby, https://github.com/malfet
As title, this patch prevents developers from importing third party
libraries to patch things in Dynamo, unless there's no other easy
workaround (in which case one would add the library to the allowlist in
`import_linter.py`, as instructed by the lint error).
For instance, if we remove `einops` from the allowlist, we'd get this:
```verbatim
>>> Lint for torch/_dynamo/decorators.py:
Error (IMPORT) Disallowed import
importing from einops is not allowed, if you believe there's a valid
reason, please add it to import_linter.py
608 |# Note: this carefully avoids eagerly import einops.
609 |# TODO: we should delete this whole _allow_in_graph_einops logic by approximately 2024 Q2
610 |def _allow_in_graph_einops():
>>> 611 | import einops
612 |
613 | try:
614 | # requires einops > 0.6.1, torch >= 2.0
Error (IMPORT) Disallowed import
importing from einops is not allowed, if you believe there's a valid
reason, please add it to import_linter.py
612 |
613 | try:
614 | # requires einops > 0.6.1, torch >= 2.0
>>> 615 | from einops._torch_specific import ( # type: ignore[attr-defined] # noqa: F401
616 | _ops_were_registered_in_torchdynamo,
617 | )
618 |
```
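For reference, a rough sketch of the kind of AST walk such an import linter performs; the real `import_linter.py` differs, and the allowlist contents and stdlib handling here are simplified assumptions:
```python
import ast

ALLOWLIST = {"einops"}  # hypothetical allowlist contents

def disallowed_imports(source: str):
    """Yield (lineno, module) for third-party imports outside the allowlist."""
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module:
            modules = [node.module]
        else:
            continue
        for module in modules:
            top = module.split(".")[0]
            # Simplified: treat anything that isn't torch itself as third party.
            if top != "torch" and top not in ALLOWLIST:
                yield node.lineno, top

for lineno, mod in disallowed_imports("import torch\nimport numpy"):
    print(f"line {lineno}: importing from {mod} is not allowed")
```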
Pull Request resolved: https://github.com/pytorch/pytorch/pull/143312
Approved by: https://github.com/zou3519
Now syntax errors look like this:
```
torch/_dynamo/variables/base.py:20:
Error (RUFF) E999
SyntaxError: Expected ',', found indent.
See https://beta.ruff.rs/docs/rules/
>>> 19 |class SourceType(Enum]:
20 | """
21 | This Enum divides VariableTracker into 2 cases, depending on the variable
22 | it represents:
[...more errors...]
```
Note that most syntax errors lead to a cascade of other errors, so the exception is generally wrong, but the location and name are good.
Before they looked like this:
```
>>> General linter failure:
Error (RUFF) Linter failed
Linter failed. This a bug, please file an issue against the linter
maintainer.
CONTEXT:
Linter command failed with non-zero exit code.
STDERR:
<MainThread:DEBUG> $ /home/rec/.conda/envs/pytorch-dev-constant/
bin/python3 -m ruff check --exit-zero --quiet --output-format=json
--config=pyproject.toml /home/rec/git-constant/pytorch/torch/_dynamo/
variables/base.py
<MainThread:DEBUG> took 38ms
Traceback (most recent call last):
File "/home/rec/git-constant/pytorch/tools/linter/adapters/
ruff_linter.py", line 465, in <module>
main()
File "/home/rec/git-constant/pytorch/tools/linter/adapters/
ruff_linter.py", line 424, in main
lint_messages = check_files(
File "/home/rec/git-constant/pytorch/tools/linter/adapters/
ruff_linter.py", line 273, in check_files
return [
File "/home/rec/git-constant/pytorch/tools/linter/adapters/
ruff_linter.py", line 288, in <listcomp>
severity=severities.get(vuln["code"],
get_issue_severity(vuln["code"])),
File "/home/rec/git-constant/pytorch/tools/linter/adapters/
ruff_linter.py", line 172, in get_issue_severity
if any(
File "/home/rec/git-constant/pytorch/tools/linter/adapters/
ruff_linter.py", line 173, in <genexpr>
code.startswith(x)
AttributeError: 'NoneType' object has no attribute 'startswith'
STDOUT:
```
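The traceback shows the old failure mode: ruff reports a bare syntax error with `code = None`, and the adapter then called `.startswith` on it. A plausible defensive piece of the fix, as a sketch (the severity enum and prefix list are stand-ins, not the adapter's real ones):
```python
from enum import Enum

class LintSeverity(str, Enum):  # stand-in for the adapter's severity enum
    ERROR = "error"
    ADVICE = "advice"

ADVICE_PREFIXES = ("B9", "PLC", "PLR")  # illustrative, not the real list

def get_issue_severity(code):
    # ruff reports syntax errors with code=None, which used to crash here,
    # so guard before calling str methods on it.
    if code is None:
        return LintSeverity.ERROR
    if any(code.startswith(x) for x in ADVICE_PREFIXES):
        return LintSeverity.ADVICE
    return LintSeverity.ERROR
```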
Pull Request resolved: https://github.com/pytorch/pytorch/pull/142312
Approved by: https://github.com/Skylion007
Minor: adds a linter that ensures that all jobs triggered on pull_request, schedule, push, etc. have an `if: github.repository_owner == 'pytorch'` check or depend on a job that has that check.
There is also a setting in GitHub repos that can disable all workflows for a repo.
A lot of these are unnecessary because many jobs use reusable workflows that already have that check. However, this is a one-time change, so I'm not that bothered.
Unfortunately I can't put this at the workflow level, which would make this better
Lots of weird string parsing
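For a sense of what the check amounts to, here is a rough sketch using PyYAML instead of the string parsing the linter actually does; all names are illustrative:
```python
import yaml

OWNER_CHECK = "github.repository_owner == 'pytorch'"

def jobs_missing_owner_check(workflow_path: str) -> list[str]:
    with open(workflow_path) as f:
        jobs = yaml.safe_load(f).get("jobs", {})
    # Jobs carrying the check directly pass...
    checked = {name for name, job in jobs.items()
               if OWNER_CHECK in str(job.get("if", ""))}
    missing = []
    for name, job in jobs.items():
        needs = job.get("needs", [])
        needs = [needs] if isinstance(needs, str) else needs
        # ...as do jobs that depend on a job carrying it.
        if name not in checked and not any(n in checked for n in needs):
            missing.append(name)
    return missing
```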
Pull Request resolved: https://github.com/pytorch/pytorch/pull/138849
Approved by: https://github.com/malfet
Add a new option `--cuda` to `tools/nightly.py` to pull the nightly packages with CUDA support.
```bash
# installs pytorch-nightly with cpuonly
tools/nightly.py pull
# The following options are only available on Linux and Windows
# installs pytorch-nightly with latest CUDA we support
tools/nightly.py pull --cuda
# installs pytorch-nightly with CUDA 12.1
tools/nightly.py pull --cuda 12.1
```
Also add targets in the `Makefile` and instructions in the contribution guidelines.
```bash
# setup conda environment with pytorch-nightly
make setup-env
# setup conda environment with pytorch-nightly with CUDA support
make setup-env-cuda
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/131133
Approved by: https://github.com/ezyang
This PR follows https://github.com/pytorch/pytorch/pull/129374#pullrequestreview-2136555775 cc @malfet:
> Lots of formatting changes unrelated to PR goal, please keep them as part of separate PR (and please add lint rule if you want to enforce those, or at least cite one)
`usort` allows empty lines within import segments. For example, `usort` does not change any of the following code snippets:
```python
import torch.aaa
import torch.bbb
import torch.ccc

x = ... # some code
```
```python
import torch.aaa

import torch.bbb
import torch.ccc

x = ... # some code
```
```python
import torch.aaa

import torch.bbb

import torch.ccc

x = ... # some code
```
This PR first sorts imports via `isort`, then re-sorts the files using `ufmt` (`usort` + `black`). This enforces the following import style:
1. no empty lines within segments.
2. single empty line between segments.
3. two empty lines after import statements.
All the code snippets above will be formatted to:
```python
import torch.aaa
import torch.bbb
import torch.ccc


x = ... # some code
```
which produces a consistent code style.
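The two passes are, in effect, just sequential tool invocations. A minimal sketch of the equivalent manual run, assuming an example target file (the PR wires this into lintrunner instead):
```python
import subprocess

files = ["torch/_dynamo/decorators.py"]  # example target
# Pass 1: isort collapses the stray empty lines within import segments.
subprocess.run(["isort", *files], check=True)
# Pass 2: ufmt (usort + black) applies the final import style.
subprocess.run(["ufmt", "format", *files], check=True)
```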
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129751
Approved by: https://github.com/malfet
Update ruff to 0.5.0 so we can enable some of the new checks I've been wanting to add to the codebase. First, just updating the code to comply with some rule changes and a couple of minor API changes / deprecations.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129744
Approved by: https://github.com/ezyang
## Problem this PR resolves
Today, most of distributed tests are arranged like this:
```
def test_allreduce(self):
    pg = self._create_process_group_nccl(store, self.opts())
    pg.allreduce(tensor)
    ...
```
Thus, we are paying PG creation time **per test**. That's bad. But why were we doing that? Is there a constraint?
If we look deeper, we would find that most of our test cases inherit from `torch.testing._internal.common_distributed.MultiProcessTestCase`. From the name, nothing seems wrong, and it probably fits distributed well. But a "problem" exists in its `setUp()` and `tearDown()` methods, which basically do the following:
```
def setUp(self):
    self._spawn_processes()

def tearDown(self):
    for p in self.processes:
        p.terminate()
```
Since `setUp` and `tearDown` are **"test-scope fixtures"**, meaning they are called per test, each test will have brand-new processes. Of course we'd have to recreate the ProcessGroup every time.
## How we are fixing it
First, obviously, we need to put a PG's lifetime into a longer scope. Python `unittest` provides such a helper, called **"class-scope fixtures."** It is embodied by a `setUpClass` method and a `tearDownClass` method (note the name difference), which are called only once for all tests in the same test class. Therefore, we would do:
```
@classmethod
def setUpClass(cls):
    dist.init_process_group(...)

@classmethod
def tearDownClass(cls):
    dist.destroy_process_group()
```
**In this PR, we create a new test template for distributed: `MultiProcContinousTest`, to hold this class-scope fixture.**
Second, we'd need to avoid per-test process spawning and termination. That's easy; we can either:
1. launch the whole test file with `torchrun --nproc-per-node=...` or
2. use `mp.spawn()` under `if __name__ == "__main__":`.
Point is, launch the processes only once. A minimal sketch of option 2 follows.
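This assumes a gloo backend with env-var rendezvous; the port and world size are arbitrary:
```python
import os

import torch.distributed as dist
import torch.multiprocessing as mp

def run(rank: int, world_size: int) -> None:
    os.environ["MASTER_ADDR"] = "localhost"
    os.environ["MASTER_PORT"] = "29500"
    # One ProcessGroup for the lifetime of the process, i.e. for all tests.
    dist.init_process_group("gloo", rank=rank, world_size=world_size)
    # ... run every test in this process against the shared PG ...
    dist.destroy_process_group()

if __name__ == "__main__":
    world_size = 2
    mp.spawn(run, args=(world_size,), nprocs=world_size)
```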
## Result
We moved the "positive tests" from test_c10d_nccl.py to test_c10d_ops_nccl.py.
Before this PR:
```
$ python test_c10d_nccl.py -k ProcessGroupNCCLTest
Ran 24 tests in 174.457s
```
After this PR:
```
$ torchrun --nproc-per-node 2 test_c10d_ops_nccl.py
or
$ python test_c10d_ops_nccl.py
Ran 24 tests in 16.247s
```
10X speedup.
## Limitation
For tests intended to test destroy or abort of PGs, we'd need to go back to the old style. So it would make sense to divide our tests into two classes: one for positive tests where we would reuse the PGs, and the other one for abort/destroy and negative tests like watchdog timeout.
## Next step
Migrate the tests of distributed that would fit with this test style!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125648
Approved by: https://github.com/wconstab
When you make a git worktree, the .git "folder" in the worktree is not a directory; it's a file pointing at the actual .git directory.
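A sketch of the kind of handling this implies (not the PR's actual code): follow the `gitdir:` pointer when `.git` turns out to be a file.
```python
from pathlib import Path

def locate_git_dir(repo_root: str) -> Path:
    dot_git = Path(repo_root) / ".git"
    if dot_git.is_dir():
        return dot_git  # ordinary clone
    # In a worktree, .git is a one-line file: "gitdir: /path/to/real/git/dir"
    text = dot_git.read_text().strip()
    if text.startswith("gitdir:"):
        return Path(text[len("gitdir:"):].strip())
    raise RuntimeError(f"{dot_git} is neither a directory nor a gitdir pointer")
```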
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123060
Approved by: https://github.com/albanD
We can use Sapling (hg) with the pytorch repo, but there are a couple of minor issues to fix to teach our scripting to be happier with having either a git or hg repo.
This change fixes some issues in:
- setup.py
- lintrunner
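As a rough sketch, the detection boils down to probing for the checkout's control directory; this assumes Sapling checkouts use `.sl` (a sketch, not the PR's exact logic):
```python
from pathlib import Path

def detect_repo_kind(root: str) -> str:
    top = Path(root)
    if (top / ".git").exists():  # true for both dirs and worktree files
        return "git"
    if (top / ".sl").exists() or (top / ".hg").exists():
        return "hg"
    raise RuntimeError(f"no git or hg checkout found at {top}")
```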
Pull Request resolved: https://github.com/pytorch/pytorch/pull/122072
Approved by: https://github.com/ezyang
Constant-time access of the first value in a collection. This is a constant-time operation, instead of converting the collection to a list to get the first item, which is linear. The rule is turned on, which automatically autofixes and enforces this.
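For example (illustrative):
```python
data = {"a": 1, "b": 2, "c": 3}

# Linear: materializes the whole collection just to read one element.
first_key = list(data)[0]

# Constant time: pulls a single item from the iterator.
first_key = next(iter(data))
```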
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115507
Approved by: https://github.com/malfet
The linter uses libcst to check for a call to run_tests or a raised exception when the test file is run as main to ensure that all test files either get run in OSS CI or don't run and are expected to not run.
A better option instead of making this into a linter might be to add this code in run_test since there's also a list of blocklisted tests there that needs to be updated when a test file raises an exception.
This is possibly overkill, since run on its own without the multiprocessing, the code takes only ~1 minute on all the files.
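The real linter is built on libcst, but the idea can be sketched with the stdlib `ast` module (simplified; it ignores aliased imports and nested guards):
```python
import ast

def has_valid_main_block(source: str) -> bool:
    """True if a top-level `if __name__ == "__main__":` raises or calls run_tests."""
    for node in ast.parse(source).body:
        if not isinstance(node, ast.If):
            continue
        test_src = ast.unparse(node.test)
        if "__name__" not in test_src or "__main__" not in test_src:
            continue
        for stmt in ast.walk(node):
            if isinstance(stmt, ast.Raise):
                return True
            if (isinstance(stmt, ast.Call)
                    and isinstance(stmt.func, ast.Name)
                    and stmt.func.id == "run_tests"):
                return True
    return False
```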
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114882
Approved by: https://github.com/kit1980
This PR addresses the persistent issue of merge conflicts in the benchmarks/dynamo/ci_expected_accuracy/ directory, specifically those arising from frequently updated CSV files. Based on @malfet's suggestion, the solution implemented adds three newlines between each line in the CSV files. This approach has proven effective in preventing merge conflicts, as evidenced in [D50239634](https://www.internalfb.com/intern/diff/D50239634/). Regardless of these changes, the extra newlines should still allow the CSVs to be ingested as normal.
If you have access to the diff:
Normally, modifying a line that is later altered in the stack results in a merge conflict during restacking. With this new spacing strategy, lines that are not modified further down the stack will not trigger merge conflicts, achieving our intended outcome.
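As a quick sanity check that the extra blank lines are harmless to ingestion, assuming a pandas-style reader (which skips blank lines by default):
```python
import io

import pandas as pd

raw = "name,accuracy\n\n\n\nmodel_a,pass\n\n\n\nmodel_b,fail\n"
df = pd.read_csv(io.StringIO(raw))  # skip_blank_lines=True is the default
print(df.to_dict("records"))
# [{'name': 'model_a', 'accuracy': 'pass'}, {'name': 'model_b', 'accuracy': 'fail'}]
```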
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111163
Approved by: https://github.com/malfet, https://github.com/huydhn
Enables two ruff rules derived from pylint:
* PLR1722 replaces any exit() calls with sys.exit(). exit() is only designed to be used in REPL contexts and may not always be available by default; this always uses the version in the sys module, which is better.
* PLW3301 replaces nested min / max calls with simplified versions (i.e. `min(a, min(b, c))` => `min(a, b, c)`). The new version is more idiomatic and more efficient.
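For example (illustrative before/after):
```python
import sys

# PLW3301: nested min/max calls flatten into one variadic call.
smallest = min(1, 2, 3)  # instead of: min(1, min(2, 3))

# PLR1722: exit() comes from the site module and isn't guaranteed to exist;
# sys.exit() always is.
if smallest < 0:
    sys.exit(1)  # instead of: exit(1)
```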
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109461
Approved by: https://github.com/ezyang
When running on a host with multiple CPUs, the ufmt linter was not able to use them very effectively. The biggest single culprit seems to be debug logging inside blib2to3 trying to acquire a lock, but disabling that doesn't help much; I suppose this must be GIL contention. Changing to a ProcessPoolExecutor makes it much faster.
The following timings are on a PaperSpace GPU+ instance with 8 vCPUs (the cores show up as Intel(R) Xeon(R) CPU E5-2623 v4 @ 2.60GHz but I'm not entirely clear if those are shared with other instances).
On main:
```
$ time lintrunner --all-files --take UFMT
ok No lint issues.
real 7m46.140s
user 8m0.828s
sys 0m5.446s
```
On this branch:
```
$ time lintrunner --all-files --take UFMT
ok No lint issues.
real 1m7.255s
user 8m13.388s
sys 0m3.506s
```
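The change itself is essentially a one-line executor swap; a minimal sketch, with the adapter's real per-file work stubbed out:
```python
import os
from concurrent.futures import ProcessPoolExecutor  # was: ThreadPoolExecutor

def format_file(path: str) -> bool:
    # CPU-bound blib2to3/black work: threads serialize on the GIL here,
    # separate processes do not.
    return True  # placeholder for the real per-file ufmt run

if __name__ == "__main__":
    paths = ["a.py", "b.py"]  # hypothetical file list
    with ProcessPoolExecutor(max_workers=os.cpu_count()) as pool:
        results = list(pool.map(format_file, paths))
```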
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106123
Approved by: https://github.com/ezyang
To `slow.yml` and `mac-mps.yml`, based on the results of the following grep:
```
% grep "sync-tag: " .github/workflows/*.yml
.github/workflows/mac-mps.yml: sync-tag: macos-12-py3-arm64-build
.github/workflows/mac-mps.yml: sync-tag: macos-12-py3-arm64-mps-test
.github/workflows/pull.yml: sync-tag: asan-build
.github/workflows/pull.yml: sync-tag: asan-test
.github/workflows/pull.yml: sync-tag: win-cpu-build
.github/workflows/pull.yml: sync-tag: rocm-build
.github/workflows/slow.yml: sync-tag: asan-build
.github/workflows/slow.yml: sync-tag: asan-test
.github/workflows/trunk.yml: sync-tag: macos-12-py3-arm64-build
.github/workflows/trunk.yml: sync-tag: macos-12-py3-arm64-mps-test
.github/workflows/trunk.yml: sync-tag: win-cpu-build
.github/workflows/trunk.yml: sync-tag: win-cuda-build
.github/workflows/trunk.yml: sync-tag: rocm-build
```
Allow synced workflows to diverge with regard to `test-matrix`, to allow for both `mac-mps` and the slow part of the ASAN tests.
Discovered while working on https://github.com/pytorch/pytorch/pull/105260 that the slow sync-tag is not checked.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106331
Approved by: https://github.com/huydhn, https://github.com/atalman, https://github.com/seemethere
This PR re-lands
- [Typing] Fix PEP 484 Violation (#105022)
- Update mypy to 1.4.1 (#91983)
That were reverted due to the conflict with internal source repo.
Mostly fixes for PEP-484 violations (i.e. when a default arg is set to None, but the type is not annotated as Optional)
Plus a few real fixes:
- Add missing `_get_upgraders_entry_map` to `torch/_C/__init__.pyi`
- Add missing return statement to `torch._export.deserialize_graph`
- Fix error message in `torch.ao.ns.fx.weight_utils.get_lstm_mod_weights`
- Add assert in `torch/optim/optimizer.py` that an Optional list is not None
TODO (in followup PR):
- Fix erroneous `isinstance` check in `torch/ao/quantization/_pt2e/qat_utils.py`
Unrelated, to bypass CI failures due to the gcc9 dependency update in Ubuntu-18.04:
- Add hack to squash older libstdc++ from the conda environment in favor of the one from the OS to `.ci/docker/install_conda.sh`
- Update bazel cuda builds to focal, as with libstdc++-6.0.32 bazel builds lose the ability to catch exceptions (probably because they link with cupti statically, but I could not find where it is done)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105227
Approved by: https://github.com/atalman, https://github.com/albanD, https://github.com/Skylion007
### This change
- Implements the ruff linter in pytorch lintrunner. It is adapted from https://github.com/justinchuby/lintrunner-adapters/blob/main/lintrunner_adapters/adapters/ruff_linter.py. It does **both linting and fixing**. 🔧
- Migrated all flake8 configs to the ruff config and enabled it for the repo. ✅
- **`ruff` lints the whole repo in under 2s** 🤯
Fixes https://github.com/pytorch/pytorch/issues/94737 Replaces #99280
@huydhn @Skylion007
### <samp>🤖 Generated by Copilot at 6b982dd</samp>
### Summary
🧹🛠️🎨
Add `[tool.ruff]` section to `pyproject.toml` to configure `ruff` code formatter and linter. This change aims to improve code quality and consistency with a single tool.
> _`ruff` cleans the code_
> _like a spring breeze in the fields_
> _`pyproject.toml`_
### Walkthrough
* Configure `ruff` code formatter and linter for the whole project ([link](https://github.com/pytorch/pytorch/pull/99785/files?diff=unified&w=0#diff-50c86b7ed8ac2cf95bd48334961bf0530cdc77b5a56f852c5c61b89d735fd711R22-R79))
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99785
Approved by: https://github.com/malfet, https://github.com/Skylion007
Merges startswith/endswith calls into a single call that feeds in a tuple. Not only are these calls more readable, but they are also more efficient, as each string is iterated through only once.
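For example:
```python
name = "torch.nn.functional"

# Before: two separate method calls, one per prefix.
if name.startswith("torch.") or name.startswith("caffe2."):
    pass

# After: a single call fed a tuple of prefixes.
if name.startswith(("torch.", "caffe2.")):
    pass
```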
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96754
Approved by: https://github.com/ezyang
An action item from https://github.com/pytorch/pytorch/issues/94346
Although the security practice of setting the checksum is good, it doesn't work when the archive is downloaded from some sites like GitHub because it can change. Specifically, GitHub gives no guarantee to keep the same value forever https://github.com/community/community/discussions/46034.
This also adds a new linter to make sure that SHA checksums of GitHub archives can be removed quickly. The WORKSPACE file is actually updated using the new linter:
```
>>> Lint for WORKSPACE:
Advice (BAZEL_LINTER) format
Redundant SHA checksum. Run `lintrunner -a` to apply this patch.
You can run `lintrunner -a` to apply this patch.
5 5 |
6 6 | http_archive(
7 7 | name = "rules_cuda",
7 |- sha256 = "f80438bee9906e9ecb1a8a4ae2365374ac1e8a283897281a2db2fb7fcf746333",
9 8 | strip_prefix = "runtime-b1c7cce21ba4661c17ac72421c6a0e2015e7bef3/third_party/rules_cuda",
10 9 | urls = ["b1c7cce21b.tar.gz"],
11 10 | )
--------------------------------------------------------------------------------
29 28 | name = "pybind11_bazel",
30 29 | strip_prefix = "pybind11_bazel-992381ced716ae12122360b0fbadbc3dda436dbf",
31 30 | urls = ["992381ced7.zip"],
31 |- sha256 = "3dc6435bd41c058453efe102995ef084d0a86b0176fd6a67a6b7100a2e9a940e",
33 31 | )
34 32 |
35 33 | new_local_repository(
--------------------------------------------------------------------------------
52 50 | urls = [
53 51 | "https://github.com/gflags/gflags/archive/v2.2.2.tar.gz",
54 52 | ],
54 |- sha256 = "34af2f15cf7367513b352bdcd2493ab14ce43692d2dcd9dfc499492966c64dcf",
56 53 | )
57 54 |
58 55 | new_local_repository(
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95039
Approved by: https://github.com/ZainRizvi
Preferring dash over underscore in command-line options. Add `--command-arg-name` to the argument parser. The old arguments with underscores `--command_arg_name` are kept for backward compatibility.
Both dashes and underscores are used in the PyTorch codebase. Some argument parsers only have dashes or only have underscores in their arguments. For example, the `torchrun` utility for distributed training only accepts underscore arguments (e.g., `--master_port`). Dashes are more common in other command-line tools, and they look to be the default choice in the Python standard library:
`argparse.BooleanOptionalAction`: 4a9dff0e5a/Lib/argparse.py (L893-L895)
```python
class BooleanOptionalAction(Action):
    def __init__(...):
        if option_string.startswith('--'):
            option_string = '--no-' + option_string[2:]
            _option_strings.append(option_string)
```
It adds `--no-argname`, not `--no_argname`. Also, typing `_` requires pressing the Shift or Caps Lock key, unlike `-`.
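A sketch of keeping both spellings with argparse (the option name is illustrative): argparse derives the attribute name from the first long option string, normalizing dashes to underscores, so both spellings land on the same dest.
```python
import argparse

parser = argparse.ArgumentParser()
# The dashed form is canonical; the underscore form remains a compatible alias.
parser.add_argument("--master-port", "--master_port", type=int, default=29500)

args = parser.parse_args(["--master_port", "1234"])  # old spelling still parses
print(args.master_port)  # 1234
```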
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94505
Approved by: https://github.com/ezyang, https://github.com/seemethere
There are some occurrences where the clang-tidy linter fails flakily with the following error, which is very weird:
```
>>> Lint for FILE:
Error (CLANGTIDY) command-failed
Failed due to FileNotFoundError:
[Errno 2] No such file or directory: '.lintbin/clang-tidy'
```
For example:
* 0a93e6db5a
* 203b2cad3e
The binary is definitely there, as the log shows that it has been downloaded successfully from S3. Looking a bit closer, I notice that the linter uses `os.chdir` to jump around between the workspace and the build folder, and it also refers to the binary with the relative path `.lintbin/clang-tidy`, which doesn't exist in the latter. AFAIK, the current working directory is per process (https://stackoverflow.com/questions/16388400/what-is-a-thread-specific-os-chdir-and-mkdir-in-python), so I suspect that there is a race here where one thread chdirs into build while another thread tries to lint another file. Thus the fix: use the absolute path to clang-tidy.
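In other words, resolve the binary path once, before any thread has a chance to `chdir`. A sketch of the shape of the fix, not the adapter's exact code:
```python
import os
import subprocess

# Resolved against the original working directory, exactly once at startup.
CLANG_TIDY = os.path.abspath(".lintbin/clang-tidy")

def lint_in_build_dir(build_dir: str, filename: str) -> None:
    os.chdir(build_dir)  # the CWD is per process, so this races across threads...
    subprocess.run([CLANG_TIDY, filename])  # ...but an absolute path survives it
```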
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94093
Approved by: https://github.com/malfet
It would be good to make the s3_init_config.json instructions
more detailed (like step-by-step for how to run the custom build)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91978
Approved by: https://github.com/malfet
I missed the fine print in https://github.com/actions/setup-python/blob/main/README.md#caching-packages-dependencies when setting up the cache using setup-python GHA
> Restored cache will not be used if the requirements.txt file is not updated for a long time and a newer version of the dependency is available which can lead to an increase in total build time.
The latter part is important because it implies that even with the cache, pip will still try to check if a newer version exists, and that part can be flaky, e.g. https://github.com/pytorch/pytorch/actions/runs/3313764038/jobs/5472180293
This undesired behavior can be turned off by setting the advanced option `check-latest` to false https://github.com/actions/setup-python/blob/main/docs/advanced-usage.md#check-latest-version. Per my understanding, this should tell pip install in these workflows to use the local cached copy of the package, avoiding the need to query PyPI every single time.
`check-latest` was added quite recently https://github.com/actions/setup-python/pull/406, so `actionlint-1.6.15` fails to recognize it. Thus, this PR also upgrades `actionlint` to the latest 1.6.21 to pass the linter check. Here is an example error from 1.6.15 from https://github.com/pytorch/pytorch/actions/runs/3315388073/jobs/5475918454:
```
>>> Lint for .github/workflows/lint.yml:
Error (ACTIONLINT) [action]
input "check-latest" is not defined in action "actions/setup-python@v4".
available inputs are "architecture", "cache", "cache-dependency-path",
"python-version", "python-version-file", "token"
25 | with:
26 | python-version: 3.8
27 | architecture: x64
>>> 28 | check-latest: false
29 | cache: pip
30 | cache-dependency-path: |
31 | **/.github/requirements-gha-cache.txt
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87621
Approved by: https://github.com/ZainRizvi
We define specializations for pybind11 defined templates
(in particular, PYBIND11_DECLARE_HOLDER_TYPE) and consequently
it is important that these specializations *always* be #include'd
when making use of pybind11 templates whose behavior depends on
these specializations, otherwise we can cause an ODR violation.
The easiest way to ensure that all the specializations are always
loaded is to designate a header (in this case, torch/csrc/util/pybind.h)
that ensures the specializations are defined, and then add a lint
to ensure this header is included whenever pybind11 headers are
included.
The existing grep linter didn't have enough knobs to do this
conveniently, so I added some features. I'm open to suggestions
for how to structure the features better. The main changes:
- Added an --allowlist-pattern flag, which turns off the grep lint
if some other line exists. This is used to stop the grep
lint from complaining about pybind11 includes if the util
include already exists.
- Added --match-first-only flag, which lets grep only match against
the first matching line. This is because, even if there are multiple
includes that are problematic, I only need to fix one of them.
We don't /really/ need this, but when I was running lintrunner -a
to fixup the preexisting codebase it was annoying without this,
as the lintrunner overall driver fails if there are multiple edits
on the same file.
I excluded any files that didn't otherwise have a dependency on torch/ATen; this was mostly caffe2 and the valgrind wrapper compat bindings.
Note the grep replacement is kind of crappy, but clang-tidy lint
cleaned it up in most cases.
See also https://github.com/pybind/pybind11/issues/4099
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82552
Approved by: https://github.com/albanD
This PR will:
1. Update actionlint to fix false positives from https://github.com/pytorch/pytorch/issues/81807
2. Establish a new naming convention for S3 file paths for linter adapters which allows older commits of pytorch to no longer be broken
3. Add update instructions to the s3_init_config.json file.
**Why are the instructions embedded in this json file and not the pytorch wiki?**
Anyone who tries to update the binaries will easily find this file and can see the instructions above. The wiki is not nearly as searchable and is likely to not get noticed
**Why embed the comment as data in the json file?**
JSON doesn't support native comments. But since nothing is validating the exact shape of this json file, adding an extra dictionary entry to serve as a comment is perfectly safe.
## Testing
I validated the architectures of the old binaries by running `file actionlint` on them and inspecting the outputs
I validated the hash was sha256 by checking tools/linter/adapters/s3_init.py and by also downloading the binaries from s3 and verifying their sha256 matches what's in s3_init_config.json
I validated end to end behavior by:
1. Deleting `.lintbin\actionlint` locally, running `lintrunner init` and verifying it got installed correctly and could lint files
2. Changing the sha to an invalid value and verifying `lintrunner init` failed to install actionlint
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81922
Approved by: https://github.com/kit1980, https://github.com/janeyx99
In order to maintain consistency between jobs, introduce a linter that
checks whether jobs sharing the same `sync-tag` are indeed the same.
`sync-tag` is just a dummy input on the reusable workflow. I chose to
use a dummy input over the following alternatives:
- The job's id isn't great, because we are likely to change a job's id
(say, when upgrading CUDA or linux versions)
- The job's name doesn't work as we have build/test jobs that share the
same name
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80200
Approved by: https://github.com/janeyx99
grep_linter.py was using the `-P` flag of `grep`, which is available in
GNU grep but notably *not* available in the BSD grep that is installed
on Macs.
Use `-E` instead, which uses ERE instead of PCRE. Sadly we were actually
using two PCRE features in our linters:
- Negative lookaheads. I changed these to less-accurate-but-still-good-enough
versions that use `[^...]` expressions.
- Apparently ERE doesn't support the `\t` atom lol. So I used a literal tab
character instead (and then had to disable the TAB linter for
`.lintrunner.toml`, lol).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76947
Approved by: https://github.com/ezyang
This PR allows users to author a CUDA kernel in Python.
```
import torch
from torch.cuda.jiterator import create_jit_fn
code_string = "template <typename T> T my_kernel(T x, T y, T alpha) { return -x * y + x - y + alpha; }"
jitted_fn = create_jit_fn(code_string, alpha=0)
a = torch.rand(3, device='cuda')
b = torch.rand(3, device='cuda')
result = jitted_fn(a, b, alpha=1.0)
```
Limitations:
- Only supports elementwise kernels
- 1~8 tensor inputs (empty input, e.g. factory methods, is not supported)
- input tensors must live on a CUDA device
- CPU Scalars are not supported
- kwargs must be pre-declared when calling create_jit_fn
- kwargs must be convertible to at::Scalar, one of float64, int64_t, bool (complex is not supported for now)
TODOs:
- [x] consolidate union and c10::variant implementation
- [x] plug into existing op testing framework
- [ ] rename files, place files in the right folder
- [ ] place util functions in the right file
- [x] enforce assumptions in the python interface, e.g. <8 inputs, kwargs types
- [x] Add user-facing documentation
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76394
Approved by: https://github.com/mruberry
We would for some reason report formatting-based lints as showing up at
line 1 column 1. This removes them for now. Maybe eventually we can
recover better line numbers from the formatting diff and post messages
for each diff cluster, but that requires actual changes to the linting
engine.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75928
Approved by: https://github.com/janeyx99