Adds a ruff lint rule to ban raising raw exceptions. Most of these should at the very least be runtime errors, value errors, type errors, or some other more specific error. There are hundreds of instances of these bad exception types already in the codebase, so I have noqa'd most of them. Hopefully this error code will get committers to rethink what exception type they should raise when they submit a PR.
I also encourage people to gradually go and fix all the existing noqas that have been added so they can be removed over time and our exception typing can be improved.
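As an illustration of the kind of change the rule nudges toward, here is a minimal sketch. The function `parse_port` is hypothetical and not taken from the codebase; it only shows a raw `Exception` being replaced with more specific built-in types.

```python
def parse_port(value):
    """Hypothetical helper, used only to illustrate exception choice."""
    # Wrong kind of input: raise TypeError rather than a bare Exception.
    if isinstance(value, bool) or not isinstance(value, (int, str)):
        raise TypeError(f"port must be int or str, got {type(value).__name__}")
    try:
        port = int(value)
    except ValueError as e:
        # Right type, unparsable content: ValueError, chained to the cause.
        raise ValueError(f"port is not an integer: {value!r}") from e
    # Right type, out-of-range value: ValueError.
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    return port
```

Callers can then catch `TypeError` and `ValueError` separately instead of matching on the message of a generic `Exception`.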
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124570
Approved by: https://github.com/ezyang
#121313 changed precompiled patterns so they are more integrated with the pattern matching code. This resulted in a list of "known" patterns (with their example data) being stored globally. Unfortunately, since small FakeTensors store a constant of the original tensor, this meant that we leaked CUDA tensors in the example data.
Fix this by clearing out the constant storage for the example data that we keep around.
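The leak mechanism can be sketched without torch. Everything below is a toy stand-in (`BigBuffer`, `FakeExample`, `KNOWN_PATTERNS` and `register_pattern` are hypothetical names, not the pattern matcher's code), and the liveness check relies on CPython's reference counting.

```python
import weakref

class BigBuffer:
    """Stand-in for an expensive resource such as a CUDA tensor."""

class FakeExample:
    """Toy stand-in for a small fake tensor that caches its source data."""
    def __init__(self, source):
        self.constant = source  # strong reference to the real data

KNOWN_PATTERNS = []  # module-level registry, lives for the whole process

def register_pattern(name, example, clear_constant):
    if clear_constant:
        example.constant = None  # drop the reference before storing globally
    KNOWN_PATTERNS.append((name, example))

def is_leaked(clear_constant):
    buf = BigBuffer()
    ref = weakref.ref(buf)
    register_pattern("demo", FakeExample(buf), clear_constant)
    del buf
    # If the object is still alive, the global registry is what keeps it alive.
    return ref() is not None
```

In this sketch `is_leaked(False)` reports the buffer as still alive, while `is_leaked(True)` does not, which mirrors the effect of clearing the constant storage on the stored example data.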
Fixes #124081
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124345
Approved by: https://github.com/xuzhao9
The check_fn portion of pattern_matcher was retracing the pattern even if a pre-traced pattern was provided.
I think that as long as the patterns don't have control flow based on their inputs then this should be safe.
For this benchmark
```
python benchmarks/dynamo/huggingface.py --training --amp --performance --only MobileBertForQuestionAnswering --backend=inductor
```
this improves the performance of `joint_graph_passes` from about 9s down to 3s.
In the performance dashboard it seems to be a small win - most of the compilation times dropped by a couple seconds:
Torchbench 126s -> 124s
Huggingface 114s -> 110s
TIMM models 209s -> 208s
Dynamic 44s -> 43s
Blueberries 84s -> 81s
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121314
Approved by: https://github.com/eellison
ghstack dependencies: #121313
Make it easier to serialize patterns by adding `pattern_matcher.gen_register_replacement()` which is like `pattern_matcher.register_replacement()` but also requires the replacement to be precompiled.
To precompile patterns (and save to disk) run:
```
torchgen/fuse_attention_patterns/gen_attention_patterns.py
```
- Updated the sfdp patterns to use `gen_register_replacement`.
- Added serialized patterns for mm_pattern and bmm_pattern (the 'misc' patterns don't serialize cleanly, so they can't be added).
- Updated the testing so it checks that the round-tripped patterns match, not just that they serialize the same way.
- Checking that the patterns round-trip properly revealed that the `users` field wasn't being serialized properly.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121313
Approved by: https://github.com/eellison
We'd like to get auto_functionalized to work with AOTInductor. To get
there, we decompose `output = auto_functionalized(inplace_op, ...)` into its
corresponding aten ops (clones + inplace_op) before the Inductor lowering phase.
This decomposition must happen at the end of the Inductor FX passes
because it introduces in-place operations.
The pattern matcher's "replace this single node with multiple nodes" API
isn't robust enough here. The problem is that `auto_functionalized`
returns a single output (this output is a List), but the decomposition
ends up returning the unpacked List (e.g. it may return two tensors).
Previously, there was an assertion that this was not the case; I fixed
up `replace_with_graph` to handle this.
Future: Not all of the clones are necessary (e.g. if the input's last
usage is this operator, then we don't need to clone it). We can add this
logic later.
Test Plan:
- existing tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118673
Approved by: https://github.com/oulgen
Fixes https://github.com/pytorch/pytorch/issues/117361
The implementation here slightly diverges from what was proposed in the issue, so I will recap what this PR is doing here. Today, when doing computations involving size-like unbacked SymInts, we assume for all operations that the compile time range of the integer is `[2, inf]`, even though at runtime we also accept zero and one.
This PR removes the carte blanche assumption, and instead does the analysis in a much more limited and controlled fashion: only for guards which we have designated as "size oblivious" are we willing to do the analysis under the assumption that the range of all size-like unbacked SymInts is `[2, inf]`; otherwise, we will faithfully only do analysis with `[0, inf]` (or whatever the user provided) bounds.
The infra pieces of this PR are:
* Remove runtime_var_to_range from torch/fx/experimental/symbolic_shapes.py; modify `_constrain_range_for_size` to refine the range without clamping min to 2, and instead add the symbol to a `size_like` set in the ShapeEnv
* When evaluating an expression, if the expression is requested to be evaluated in a `size_oblivious` way, we attempt to statically compute the value of the expression with the assumption that all symbols in `size_like` are updated to assume that they are `>= 2`.
* Add Python and C++ APIs for guarding on a SymBool in a size-oblivious way. In C++, I also need to add some helpers for performing symbolic comparisons, since the stock comparisons immediately specialize in the "normal" way.
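The infra pieces above can be modeled with a toy example. This is a sketch only: `ToyShapeEnv` and its methods are hypothetical names and do not reflect the real ShapeEnv API. It shows the range being recorded without clamping, and the `>= 2` assumption being applied only to queries that ask for it.

```python
import math

class ToyShapeEnv:
    """Toy model only; not the real ShapeEnv API."""
    def __init__(self):
        self.var_to_range = {}  # symbol name -> (lo, hi)
        self.size_like = set()  # symbols known to be used as sizes

    def constrain_range_for_size(self, sym, lo=0, hi=math.inf):
        # Record the true runtime range; do not clamp the minimum to 2.
        self.var_to_range[sym] = (lo, hi)
        self.size_like.add(sym)

    def statically_eq(self, sym, value, size_oblivious=False):
        """Return True/False if `sym == value` is decidable, else None."""
        lo, hi = self.var_to_range[sym]
        if size_oblivious and sym in self.size_like:
            lo = max(lo, 2)  # assume >= 2 only for this one query
        if lo == hi == value:
            return True
        if value < lo or value > hi:
            return False
        return None  # undecidable from the ranges alone
```

With `u0` constrained as a size, `statically_eq("u0", 1)` is undecidable under the faithful `[0, inf]` range, while the same query with `size_oblivious=True` resolves to `False`.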
The rest of the changes of the PR are marking various spots in PyTorch framework code as size oblivious, based on what our current test suite exercises.
As you review the places where we have marked things as size oblivious, it may become clear why I ended up not opting for the "designate a branch as the default branch when it's not statically obvious which way to go" approach: for some of the conditions, this answer is rather non-obvious. I think potentially there is another refinement on top of this PR, which is something like "I don't care if you can't figure it out with ValueRange analysis, go down this path anyway if there are unbacked sizes involved." But even if we add this API, I think we are obligated to attempt the ValueRange analysis first, since it can lead to better outcomes sometimes (e.g., we are able to figure out that something is contiguous no matter what the unbacked size is).
When is it permissible to mark something as size oblivious? Heuristically, it is OK anywhere in framework code if it gets you past a "guard on unbacked SymInt" problem. It is somewhat difficult to provide a true semantic answer, however. In particular, these annotations don't have any observational equivalence guarantee; for example, if I have `torch.empty(u0, 1).squeeze()`, we will always produce a `[u0]` size tensor, even though if `u0 == 1` PyTorch will actually produce a `[]` size tensor. The argument that I gave to Lezcano is that we are in fact defining an alternate semantics for a "special" size = 0, 1, for which we have these alternate eager mode semantics. In particular, suppose that we have a constant `special1` which semantically denotes 1, but triggers alternate handling rules. We would define `torch.empty(special1, 1).squeeze()` to always produce a `[special1]` size tensor, making its semantics coincide with unbacked SymInt semantics. In this model, the decision to designate guards as size oblivious is simply a user API question: you put them wherever you need some handling for special1! As we conservatively error out whenever it is not obvious what `special1` semantics should be, it is always valid to expand these semantics to cover more cases (although you can always choose the wrong semantics!).
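The `squeeze()` example can be made concrete with a toy shape computation. This is a sketch under stated assumptions: `Unbacked`, `is_one_size_oblivious` and `squeeze_shape` are hypothetical names operating on plain Python lists, not PyTorch code.

```python
class Unbacked:
    """Toy placeholder for a size-like unbacked SymInt."""
    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return self.name

def is_one_size_oblivious(dim):
    # Concrete ints are compared normally. An unbacked size is assumed
    # to be >= 2 for this check, so it is never treated as 1.
    if isinstance(dim, Unbacked):
        return False
    return dim == 1

def squeeze_shape(shape):
    # Drop every dimension that is (size-obliviously) known to be 1.
    return [d for d in shape if not is_one_size_oblivious(d)]
```

Here `squeeze_shape([Unbacked("u0"), 1])` keeps the unbacked dimension, matching the `[u0]` result described above, even though an eager run with `u0 == 1` would squeeze it away.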
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118579
Approved by: https://github.com/eellison, https://github.com/lezcano
Fixes https://github.com/pytorch/pytorch/issues/118129
Suppressions automatically added with
```
import re

# Parse mypy output lines of the form "path:line:col: error: message [code]".
with open("error_file.txt", "r") as f:
    errors = f.readlines()

error_lines = {}
for error in errors:
    match = re.match(r"(.*):(\d+):\d+: error:.*\[(.*)\]", error)
    if match:
        file_path, line_number, error_type = match.groups()
        if file_path not in error_lines:
            error_lines[file_path] = {}
        error_lines[file_path][int(line_number)] = error_type

# Append a matching "type: ignore" comment to each offending source line.
for file_path, lines in error_lines.items():
    with open(file_path, "r") as f:
        code = f.readlines()
    for line_number, error_type in sorted(lines.items(), key=lambda x: x[0], reverse=True):
        code[line_number - 1] = code[line_number - 1].rstrip() + f"  # type: ignore[{error_type}]\n"
    with open(file_path, "w") as f:
        f.writelines(code)
```
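To show what the regular expression in the script above extracts, here is a small check against a single error line. The path `torch/example.py`, the error message, and the source line are made up for illustration.

```python
import re

PATTERN = re.compile(r"(.*):(\d+):\d+: error:.*\[(.*)\]")

# Made-up mypy output line: path, line, column, message, error code.
sample = "torch/example.py:12:5: error: Incompatible types in assignment  [assignment]\n"
match = PATTERN.match(sample)
file_path, line_number, error_type = match.groups()

# The suppression is appended to the end of the offending source line.
source_line = "x: int = 'a'\n"
suppressed = source_line.rstrip() + f"  # type: ignore[{error_type}]\n"
```

The three groups are the file path, the line number (as a string), and the bracketed error code, which becomes the argument of the `type: ignore` comment.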
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Co-authored-by: Catherine Lee <csl@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118533
Approved by: https://github.com/Skylion007, https://github.com/zou3519
The original motivation for MYPYINDUCTOR was a faster type checking configuration that only checked a subset of files. With the removal of `follow_imports = ignore`, we are now able to use dmypy to do fast incremental typechecking, eliminating the need for this.
Perhaps erroneously, when I tee'ed up this PR I elected to delete the `follow_imports = skip` designations in the mypy-inductor.ini. This led to a number of extra type error suppressions that I manually edited. You will need to review.
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118432
Approved by: https://github.com/Skylion007
ghstack dependencies: #118414, #118418
Many of our pattern matching replacements are specified as a `search_fn` and a `replacement_fn`. The `search_fn`s are traced out once with static shapes, converted to a pattern, and then matched on every graph compiled with inductor.
The static shape patterns would not match with graphs that are traced out with dynamic shapes because SymInts would be added to the graph as `sym_size` fx nodes which added additional uses and prevented matching. The previous PR partially addresses this by deduping SymInts that are resolvable to graph inputs, as is the calling convention in aot autograd.
This PR adjusts our matching of the `search_fn` by adding SymInts to the arguments we trace out the search_fn with so that their symint accesses are deduped. Later, if we have a match, we will trace out the replacement graph with the correct Tensors and corresponding symbolic shapes that will get added to the graph.
Note: the replacement patterns will insert sym_size uses which could potentially be removed, but I'll leave that for follow up.
Fix for https://github.com/pytorch/pytorch/issues/111190.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115441
Approved by: https://github.com/jansel
ghstack dependencies: #116158
In this PR, we try to keep input mutations in the forward graph if and only if the input mutation is a data mutation (not a metadata mutation) and the input doesn't require grad. This is for optimizing inductor training graphs. (For more details: https://github.com/pytorch/pytorch/issues/109240)
We keep the input mutation in the graph by wrapping the original callable in a wrapper function that ends with an `input.copy_(updated_input)` call, which is then traced via make_fx. Previously, this was only enabled for the forward-only path and unconditionally disabled for the joint graph.
Another caveat is that when we are tracing through tensor subclasses, we won't allow any input mutations to be preserved in the graph. The reason is that it makes the code logic quite ugly for no obvious performance improvement.
Most of the changes in this PR are mechanical and I didn't have to make any change to the partitioner. Previously forward/backward heavily relied on the metadata field `num_mutated_inps` to figure out whether something is returned as an extra output or not. But now since we keep some mutations in the graph, we need to propagate something similar to `num_mutated_inps - num_graph_handled_inps`.
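The wrapper idea and the output bookkeeping can be sketched with plain Python lists standing in for tensors. All function names below are hypothetical; slice assignment plays the role of `input.copy_(updated_input)`.

```python
def functional_add_one(inp):
    """Functionalized body: returns the updated input instead of mutating."""
    updated = [x + 1 for x in inp]
    out = sum(updated)
    return updated, out

def keep_mutation_in_graph(fn):
    """Wrap `fn` so the mutation is applied inside the wrapped callable."""
    def wrapper(inp):
        updated, out = fn(inp)
        inp[:] = updated  # list analogue of input.copy_(updated_input)
        return out        # the updated input is no longer an extra output
    return wrapper

def num_extra_outputs(num_mutated_inps, num_graph_handled_inps):
    # Mutations handled inside the graph need not be returned as outputs.
    return num_mutated_inps - num_graph_handled_inps
```

Calling the wrapped function on `[1, 2, 3]` mutates the list in place and returns only the real output; with three mutated inputs of which two are handled in the graph, one extra output remains.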
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111046
Approved by: https://github.com/ezyang, https://github.com/bdhirsh
Re-land of https://github.com/pytorch/pytorch/pull/111011.
The original PR ended up having a bad interaction with code that tried to run `torch.compile` under `with torch.inference_mode`, which caused some internal tests to fail.
The issue was that:
(1) AOTInductor invokes the pattern matcher passes in inductor
(2) The pattern matcher registers some code with [training_graph](https://github.com/pytorch/pytorch/blob/main/torch/_inductor/fx_passes/pad_mm.py#L461)
(3) The `training_graph` function expects to be able to set the global autograd state to `requires_grad`, and always get out a joint graph (assertion [here](https://github.com/pytorch/pytorch/blob/main/torch/_inductor/pattern_matcher.py#L1196)).
(4) However, when inference_mode is activated, and you try to run AOTAutograd, AOTAutograd will witness that all outputs to the traced function will not require grad, and (now correctly) think that we are tracing an inference graph, which fails the above assert.
After talking to Bin, it sounds like these training-only patterns aren't necessary when we know we are compiling an inference graph (which should always be the case if you're running torch.compile with inference_mode). So I updated the pattern matcher to ignore any pattern matches using `training_graph`, when inference_mode is enabled.
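A rough sketch of that skip logic, using toy stand-ins rather than the real pattern matcher: `toy_training_graph`, `toy_inference_graph` and `toy_register` are hypothetical, and the boolean argument stands in for querying whether inference mode is enabled.

```python
PATTERNS = []

def toy_training_graph(fn):
    """Toy tracer that would need autograd enabled to build a joint graph."""
    return ("joint", fn.__name__)

def toy_inference_graph(fn):
    """Toy tracer for forward-only patterns."""
    return ("inference", fn.__name__)

def toy_register(search_fn, trace_fn, inference_mode_enabled):
    # Training-only patterns are skipped under inference mode, since
    # tracing them would not produce the joint graph they expect.
    if trace_fn is toy_training_graph and inference_mode_enabled:
        return False
    PATTERNS.append(trace_fn(search_fn))
    return True
```

Forward-only patterns are still registered in either mode; only the patterns that depend on a joint graph are dropped when inference mode is active.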
This reverts commit cf6b1cdf6a.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111347
Approved by: https://github.com/Chillee