I was originally trying to solve https://github.com/pytorch/pytorch/issues/120799 but got sidetracked along the way.
This PR contains a couple fixes. Let me know if you want me to split them up!
- Properly handle invalid user code where `super()` is called from a non-method/classmethod; it now raises the same error as CPython
- Fix base VariableTracker `__str__` method shadowing all `__repr__` methods defined in subclasses
- Fix accessing a classmethod on a user object to bind `cls` and not `self`
- Fix custom class handling of super() call to properly handle mixed regular/class/static methods
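For context, here is a minimal sketch of the kind of user code these fixes target (class and method names are made up for illustration):
```python
class Base:
    def regular(self):
        return "base"

    @classmethod
    def klass(cls):
        return cls.__name__


class Child(Base):
    def regular(self):
        return super().regular() + "/child"  # binds `self`

    @classmethod
    def klass(cls):
        return super().klass() + "/child"  # must bind `cls`, not `self`

    @staticmethod
    def broken():
        # zero-arg super() has no `self`/`cls` to bind here; CPython raises
        # "RuntimeError: super(): no arguments", and compiled code now surfaces the same error
        return super().regular()
```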
Locally, `test_repros.py -k test_batch_norm_act` still fails, where the generated graph module is:
```
Call using an FX-traced Module, line 8 of the traced Module's generated forward function:
x = self.forward(l_x_); self = l_x_ = None
x_1 = self.L__self___act(x); x = None
```
note that "self" is being unset on the first line even though it is used on the second one.
For reference, this is the test c268ce4a6d/test/dynamo/test_repros.py (L1368-L1369)
I cannot figure out where the generated forward function comes from though, any hint would be welcome!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121365
Approved by: https://github.com/jansel
Inductor codegen for `_assert_async` is currently disabled because we don't really understand how to codegen `scalar_to_tensor` on a Sympy expression. I initially tried to see if I could get this to work, but I got into some weird problem involving stride sorting, so I decided to fix it properly by not going through a tensor.
So we introduce an `_assert_scalar` which takes a scalar as an argument, avoiding needing to turn a SymBool into a tensor before asserting on it. I also add `_functional_assert_scalar` for good luck, although this doesn't do anything right now because https://github.com/pytorch/pytorch/pull/104203 still hasn't been landed.
I need to customize the codegen for this operator, so I decide to directly implement it in Inductor, rather than trying to treat it as a generic ExternKernel. This leads to the new AssertScalar IR node. This is written carefully so that it doesn't get DCE'd by Inductor.
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114148
Approved by: https://github.com/jansel
ghstack dependencies: #120800
Pre-emptive test in OSS to ensure that models relying on the "non-overlapping guards" checks do not suffer drastically w.r.t. guard slowness. Current plan is to follow up on this with a "real" fix, to generate a linear number of these guards.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/120106
Approved by: https://github.com/mlazos
`dv = at::empty_like(k)` and `dv = at::empty_like(v)` can be materially different, because `empty_like` tries to preserve the strides of the input when possible. So if `k` is contiguous but `v` is transposed, then before this PR, `dv` would be computed to be contiguous.
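As a small illustration of the difference (shapes chosen arbitrarily), `empty_like` preserves the strides of a non-contiguous input:
```python
import torch

k = torch.randn(2, 16, 513, 64)                  # contiguous
v = torch.randn(2, 513, 16, 64).transpose(1, 2)  # non-contiguous (transposed)

print(torch.empty_like(k).is_contiguous())  # True
print(torch.empty_like(v).is_contiguous())  # False: v's strides are preserved
```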
Alternatively, we could change the meta implementation of `aten._scaled_dot_product_flash_attention` to this:
```
grad_q = torch.empty_like(query.transpose(1, 2)).transpose(1, 2)
grad_k = torch.empty_like(key.transpose(1, 2)).transpose(1, 2)
grad_v = torch.empty_like(value.transpose(1, 2)).transpose(1, 2)
return grad_q, grad_k, grad_v
```
But (I think?) the logic in the sdpa backward impl was a typo.
I noticed this because changing the meta formula as above was enough to fix the issue with the `aot_eager` backend in this [link](https://github.com/pytorch/pytorch/issues/116935#issuecomment-1914310523).
A minimal repro that I made looks like this:
```
import torch
# in this repro, "grad_out" and "value" are transposed tensors,
# but "key" and "value" are contiguous
a = torch.randn(2, 513, 16, 64, dtype=torch.float16, device='cuda').transpose(1, 2)
b = torch.randn(2, 16, 513, 64, dtype=torch.float16, device='cuda')
c = torch.randn(2, 16, 513, 64, dtype=torch.float16, device='cuda')
d = torch.randn(2, 513, 16, 64, dtype=torch.float16, device='cuda').transpose(1, 2)
e = torch.randn(2, 16, 513, 64, dtype=torch.float16, device='cuda')
f = torch.randn(2, 16, 513, device='cuda')
g = None
h = None
i = 513
j = 513
k = 0.0
l = False
m = torch.tensor(1, dtype=torch.int64)
n = torch.tensor(1, dtype=torch.int64)
out1_ref, out2_ref, out3_ref = torch.ops.aten._scaled_dot_product_flash_attention_backward(a, b, c, d, e, f, g, h, i, j, k, l, m, n, scale=0.125)
from torch._meta_registrations import meta__scaled_dot_product_flash_backward
out1_test, out2_test, out3_test = meta__scaled_dot_product_flash_backward(a, b, c, d, e, f, g, h, i, j, k, l, m, n, scale=0.125)
# prints True True
print(out1_ref.is_contiguous())
print(out1_test.is_contiguous())
# prints True True
print(out2_ref.is_contiguous())
print(out2_test.is_contiguous())
# prints True False
print(out3_ref.is_contiguous())
print(out3_test.is_contiguous())
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/119500
Approved by: https://github.com/drisspg, https://github.com/ezyang, https://github.com/Skylion007
The torch "fake" ndarray had some mismatches vs numpy.ndarray which caused test_sparse_to_sparse_compressed to fail under dynamo.
This also fixes (because the test now hits it) a problem where unpacking a sequence with the incorrect number of args would assert in dynamo instead of graph breaking (because it would throw an exception). Added a unit test for this condition.
Fixed:
- torch._numpy._ndarray.astype() (actually used by the test)
- torch._numpy._ndarray.put() (drive-by discovery)
- torch._numpy._ndarray.view() (drive-by discovery)
(burndown item 7)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/117952
Approved by: https://github.com/yanboliang
ghstack dependencies: #117951
Don't require using it as `@requires_cuda()`; use `@requires_cuda` instead. There is no need for the partial function to be invoked many times.
Split out this change from the initial large refactoring in #117741 to hopefully get merged before conflicts arise
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118281
Approved by: https://github.com/ezyang
This prepares the PR where we implement sets in terms of dicts.
To do so, rather than storing internally a dictionary that maps literals
to VariableTrackers, it stores (pretty much) a dictionary from VTs to VTs.
To do so, keys are wrapped in an opaque internal class `_Hashable`.
The `_Hashable` class is opaque on purpose so that it fails hard if
it inadvertently leaks back into user code.
We also found and fixed a number of latent bugs and inconsistencies
in the way Dynamo checked what can be a dict key. More generally, we
make it much clearer what needs to be modified to add
a new supported key type to dicts.
Fixes https://github.com/pytorch/pytorch/issues/107595
Fixes https://github.com/pytorch/pytorch/issues/111603
Re-PR of https://github.com/pytorch/pytorch/pull/111196; sadly, due to reverts, we could not reuse @lezcano's original PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/116785
Approved by: https://github.com/mlazos
After #113734 landed (adding dynamic storage offsets), we found that compilation times increased significantly. The reason: `tensors_definitely_do_not_overlap` was doing comparisons on storage offsets, which were adding guards:
626b7dc847/torch/_functorch/_aot_autograd/input_output_analysis.py (L268-L276)
This guard is added on all pairs of tensors which are views of the same source tensor - i.e. the number of guards can be quadratic in the number of input tensors. This PR adds a test to prevent similar regressions.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115793
Approved by: https://github.com/yanboliang
Currently this skip imports torchvision, so if your torchvision install
is broken then the entire file fails at collection time. This instead
means only the test itself will fail.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115677
Approved by: https://github.com/lezcano
Constant-time access of the first value in a collection: pulling the first item from an iterator is constant time, instead of converting the collection to a list to get the first item, which is linear. The rule is turned on, which automatically fixes and enforces this.
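A minimal sketch of the pattern the rule rewrites (illustrative values):
```python
data = {"a": 1, "b": 2, "c": 3}

# linear: materializes the whole collection just to read one element
first = list(data)[0]

# constant time: pull the first element straight from the iterator
first = next(iter(data))
```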
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115507
Approved by: https://github.com/malfet
**Dynamo**
We don't want setattr in the graph. Setting data has interesting implications on both aliasing and on the autograd engine.
The safe recipe is:
1) Disable grad
2) Call set_()
3) Manually lower the version counter on the object to hide it from the autograd engine
This is effectively the same exact thing as setting .data, and it composes properly with aot_autograd and inductor.
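A rough sketch of that recipe at the user level, assuming the internal helper `torch._C._autograd._unsafe_set_version_counter` for step 3 (the actual generated code may differ):
```python
import torch

def data_style_swap(dst: torch.Tensor, src: torch.Tensor) -> None:
    # 1) disable grad so set_() is not recorded by autograd
    with torch.no_grad():
        # 2) swap dst's storage/metadata to src's
        dst.set_(src)
    # 3) lower the version counter so the autograd engine doesn't flag the mutation
    #    (assumed internal binding; an implementation detail of the generated code)
    torch._C._autograd._unsafe_set_version_counter(dst, 0)
```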
**aot_autograd**
For aot_autograd, there's another snag.
Specifically, when we invoke aot_autograd, we call `fake_mode.from_tensor()`, relying on memo to get the right tensor out. For .data mutations, this doesn't work, because the memoized fake_tensor is in the state it will be in at the end of the trace, not at the beginning. This means that the .data call is already applied, and the tensor shape (as in the case of these tests) mismatches. aot_autograd produces an invalid graph, with illegal calls like `torch.ops.aten.view.default(primals_2, [0])` where primals is actually sized `([6])` on input.
The new plan here is to:
1) Record tensor fakification policy in dynamo
2) provide a fresh fake mode to all backends
3) Invoke from_tensor with the stored policy to get fresh new fake tensors in aot_autograd
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113080
Approved by: https://github.com/bdhirsh
During the course of fake tensor propagation (and, potentially, also Dynamo execution, although I do not believe it is possible to exercise this right now), we may generate deferred runtime asserts, which represent "guards" on unbacked symbols which cannot be immediately checked on entry to a code block; instead, they have to be checked at runtime. However, we currently accumulate these deferred runtime asserts into the ShapeEnv, and don't do anything with them.
This PR modifies Dynamo to automatically insert these runtime asserts into the FX graph, before passing it on to the backend compiler. The assert format coincides with the export assert format as practiced in `torch/_export/passes/add_runtime_assertions_for_constraints_pass.py`, but actually these passes are completely disjoint right now as I only handle deferred runtime asserts, while export only handles ranges (which I should probably also handle, but don't in this PR.)
The assertions must be inserted by Dynamo, because you could potentially then pass the asserts onto another backend like "eager" which no longer looks at the ShapeEnv before. Thanks to previous work in export, these asserts are preserved in AOTAutograd, but they are dropped by Inductor, which needs to be fixed in future work. This piece will be a bit awkward, as Inductor would have preferred to work with the Sympy expressions directly, ah well.
Here is what the Dynamo traced FX graph looks like for the test in question:
```
<eval_with_key>.0 class GraphModule(torch.nn.Module):
    def forward(self, L_x_ : torch.Tensor):
        l_x_ = L_x_
        # File: /data/users/ezyang/c/pytorch/wu.py:8, code: y = x.item()
        item = l_x_.item()
        # No stacktrace found for following nodes
        ge_1 = item >= 0
        scalar_tensor_default = torch.ops.aten.scalar_tensor.default(ge_1); ge_1 = None
        _assert_async_msg = torch.ops.aten._assert_async.msg(scalar_tensor_default, "Deferred runtime assert failed: i0 >= 0, where i0 was defined by 'item' (for more information, run with TORCH_LOGS=+dynamo,dynamic)"); scalar_tensor_default = None
        # File: /data/users/ezyang/c/pytorch/wu.py:9, code: torch._check_is_size
        _check_is_size = torch._check_is_size(item)
        # File: /data/users/ezyang/c/pytorch/wu.py:10, code: if y >= 0:
        ge = item >= 0; item = None
        # File: /data/users/ezyang/c/pytorch/wu.py:11, code: return x * 2
        mul = l_x_ * 2; l_x_ = None
        return (mul,)
```
Note that we actually keep the `_check_is_size` in the graph redundantly. However, assert_async is retained in the graph, whereas _check_is_size ends up getting DCE'ed.
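For reference, the user function being traced can be reconstructed roughly from the `File:` comments in the graph above (a sketch, not the exact test source):
```python
import torch

# assumed: torch._dynamo.config.capture_scalar_outputs = True so that .item() is captured
@torch.compile()
def f(x):
    y = x.item()
    torch._check_is_size(y)
    if y >= 0:
        return x * 2
```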
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113958
Approved by: https://github.com/aakhundov, https://github.com/tugsbayasgalan
ghstack dependencies: #113978
Copied from @ezyang 's #113693.
The motivation for this change is that we'd like to guard on storage offset in inductor, to make assumptions about data alignment.
create_symbolic_sizes_strides_storage_offset() creates the sizes/strides/offset for fake tensors - they can either be integers or symints. This PR changes storage_offset to always be dynamic. In variables/builder.py, we remove a conditional so that all tensors get added to tracked_fakes. This is because the storage offset will be dynamic even if the other logic in builder.py suggests that it will be static; otherwise, we run into this issue:
1e260c851b/torch/fx/experimental/symbolic_shapes.py (L892-L895)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113734
Approved by: https://github.com/ezyang
Fixes https://github.com/pytorch/pytorch/issues/90552. This is a simpler fix that just detects the situation where AOTAutograd can't create a proper backward graph, and graph breaks. This was technically a silent correctness issue before.
This PR tries to always graph break when we see a factory function that returns a tensor requiring grad. I check this by seeing if the op returned a `TensorVariable` in dynamo, and if one of the input arguments was a `requires_grad=True` kwarg. I think this is high-fidelity enough, and I'm also hoping that this is uncommon enough that a graph break is reasonable here.
The fix to avoid the graph break in user land is also pretty easy - just instantiate your tensor outside of the compiled region and plumb it in.
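A minimal sketch of the pattern and the workaround (names are illustrative):
```python
import torch

@torch.compile()
def inside(x):
    w = torch.ones(3, requires_grad=True)  # factory call with requires_grad=True -> graph break
    return (x * w).sum()

# workaround: instantiate the leaf outside the compiled region and plumb it in
w = torch.ones(3, requires_grad=True)

@torch.compile()
def outside(x, w):
    return (x * w).sum()
```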
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113277
Approved by: https://github.com/eellison
ghstack dependencies: #113267, #113416, #113584
Fixes https://github.com/pytorch/pytorch/issues/113010
In eager mode, when you call an out= op like `add(..., out=out_arg)` with an out argument that is noncontiguous, the noncontiguous out arg will be returned directly. When we functionalize though, functionalization replaces it with a call to `add(...)` which ignores the contiguity of the original out arg.
Instead of trying to support this, this PR detects that situation and graph breaks
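A small sketch of the situation (shapes are illustrative):
```python
import torch

x = torch.randn(2, 3)
y = torch.randn(2, 3)
out = torch.empty(3, 2).t()  # non-contiguous out argument

# in eager, add writes into `out` and returns it with its transposed strides
torch.add(x, y, out=out)

# under functionalization this becomes a plain add(x, y), which ignores the
# contiguity of `out`, so Dynamo now graph-breaks on this pattern instead
```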
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113267
Approved by: https://github.com/albanD
This prepares the PR where we implement sets in terms of dicts.
To do so, rather than storing internally a dictionary that maps literals
to VariableTrackers, it stores (pretty much) a dictionary from VTs to VTs.
To do so, keys are wrapped in an opaque internal class `_Hashable`.
The `_Hashable` class is opaque on purpose so that it fails hard if
it inadvertently leaks back into user code.
We also found and fixed a number of latent bugs and inconsistencies
in the way Dynamo checked what can be a dict key. More generally, we
make it much clearer what needs to be modified to add
a new supported key type to dicts.
Fixes https://github.com/pytorch/pytorch/issues/107595
Fixes https://github.com/pytorch/pytorch/issues/111603
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111196
Approved by: https://github.com/jansel
Main: `RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn`
This PR: graph breaks and eager applies the mutation, new tensors are tracked
Fixes https://github.com/pytorch/pytorch/issues/109505 (the original bug does not occur, but a new bug where the mutation isn't applied - because AOTAutograd is not `requires_grad` mutation aware - is mitigated)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113163
Approved by: https://github.com/bdhirsh
Thanks aakhundov for constructing the test case. This PR was constructed by running the failing test case, and then fixing problems until we got all the way to the end. There are a few distinct fixes:
* AOTAutograd performs equality tests on tensor metadata to determine if a metadata mutation had occurred. If we test i0 vs i1, we should report these are NOT equal, since obviously we have somehow resized the tensor from i0 to i1 (even if, on a particular run, it is possible i0 == i1).
* There's a sketchy fix for `test_aot_autograd_exhaustive_matmul_cpu_float32` where we check if the output shape equals the tangent shape. Unfortunately, the same `definitely_true` treatment does not work here, it still fails on the example. I piled an extra sketchy fix on top of it, where I just try my best to avoid doing the view. Maybe we should have some sort of logging here.
* Partitioner needs to get out a size for unbacked SymInt when partitioning. I just feed it a random heuristic value in this case, similar to how we've been dealing with this in Inductor.
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113159
Approved by: https://github.com/aakhundov, https://github.com/bdhirsh
Fixes https://github.com/pytorch/pytorch/issues/113030
Alias information needs to be applied in eager before we can continue to trace the graph.
----
Perhaps this is too strict - couldn't we fx trace through the in-graph (pointer) aliasing, and track mutations through fake tensors instead, and still apply the aliasing mutation epilogue for further mutations outside of graph? 🤔
Regardless, it didn't seem to work too well when I tried this. Seems that `Tensor.__setattr__` doesn't work well in fx graph.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113043
Approved by: https://github.com/ezyang, https://github.com/voznesenskym
Dynamo currently runs the real graph module with real inputs as a way to match the return result of the graph module with the eager return type. This is unsafe when the graph module is side-effectful. In the long term, we will get rid of this step. But in the short term, we just fakify the graph module again and run it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107271
Approved by: https://github.com/ezyang
Add similar semantics for creating a buffer object similar to creating a parameter. This is done by introducing a new `Buffer` class that can be used for type disambiguation. The underlying functionality of registering a buffer remains the same as the `register_buffer` method has not been changed. The `persistent` parameter in the `Buffer` type is to indicate whether a buffer object should be persistent or not. Other non-test changes have to do with getting the new `Buffer` type recognized by inductor and dynamo. Remaining changes are test changes to make sure that the `Buffer` type can be used as a drop in replacement for `register_buffer` as it just leads to `register_buffer` being called. The addition of this new functionality still allows for normal tensors to be used as buffers so these changes are intended to be backwards compatible.
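A minimal usage sketch, mirroring how `nn.Parameter` is used (the module here is made up):
```python
import torch
import torch.nn as nn

class Counter(nn.Module):
    def __init__(self):
        super().__init__()
        # equivalent to self.register_buffer("count", torch.zeros(1), persistent=False)
        self.count = nn.Buffer(torch.zeros(1), persistent=False)

    def forward(self, x):
        self.count += 1
        return x
```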
Fixes #35735
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104069
Approved by: https://github.com/mikaylagawarecki
The main complexity comes from the `__init__` function of dataclass variables, which looks something like this:
```
[2023-07-10 05:01:29,548] torch._dynamo.symbolic_convert: [DEBUG] INLINING <code object __init__ at 0x7f7015154450, file "<string>", line 2>
3 0 LOAD_FAST 1 (b)
2 LOAD_FAST 0 (self)
4 STORE_ATTR 0 (b)
4 6 LOAD_FAST 2 (named_tensors)
8 LOAD_DEREF 0 (_HAS_DEFAULT_FACTORY)
10 IS_OP 0
12 POP_JUMP_IF_FALSE 20
14 LOAD_DEREF 1 (_dflt_named_tensors)
16 CALL_FUNCTION 0
18 JUMP_FORWARD 2 (to 22)
>> 20 LOAD_FAST 2 (named_tensors)
>> 22 LOAD_FAST 0 (self)
24 STORE_ATTR 1 (named_tensors)
26 LOAD_CONST 0 (None)
28 RETURN_VALUE
```
There are multiple issues
* The VariableBuilder call in functions.py was wrong. We were calling `*options` as args.
* We were not setting the source while tracking the new object. This led to no source for the dataclass variable, which has some new variables in its closures, as seen in the above bytecode.
* There is an IS_OP in the above bytecode, which brings more cases.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104840
Approved by: https://github.com/jansel
This PR turns translation validation on by default for tests and accuracy benchmark
runs. It also installs Z3 on CI.
The main changes are:
- Add `--no-translation-validation` as an option in _test/run_tests.py_
- Set `PYTORCH_TEST_WITH_TV` environment variable
- Add `TEST_WITH_TV` variable in _torch/testing/_internal/common_utils.py_
- Turn translation validation on for accuracy benchmarks in _benchmarks/dynamo/common.py_
- Add Z3 installation on CI scripts
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103611
Approved by: https://github.com/ezyang
In some cases, a UserFunctionVariable can be constructed when the underlying function is actually a TorchVariable. One example is when an attribute on a UnspecializedNNModuleVariable is a torch function. In those cases, we should treat the UserFunctionVariable as a TorchVariable.
This adds a check in UserDefinedObjectVariable.var_getattr() to try to create a TorchVariable instead of a UserFunctionVariable.
Fixes#104172
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104231
Approved by: https://github.com/williamwen42, https://github.com/jansel
Fixes https://github.com/pytorch/pytorch/issues/103132
This is kind of annoying: Functionalization (and also vmap, I think?) manually figures out which ops have C++ CompositeImplicit decomps, and directly registers them to the Functionalize key. This is a problem for the PyDispatcher: We normally want the PyDispatcher to take precedence over the regular dispatcher. But in this case, we have a python decomp registered to `CompositeImplicitAutograd`, and a C++ decomp registered *directly* to the `Functionalize` key, so the C++ decomp gets precedence over the python decomp.
The way this showed up was that a model was running `matmul()` under inference mode, so we never hit the autograd dispatch key, and go straight to the functionalize dispatch key. Matmul has both a python decomp and a c++ decomp, but we were running the C++ decomp. That C++ decomp isn't meant to be used with dynamic shapes, so we were failing with the "tried to call `.sizes()` on a tensor with dynamic shapes" error.
For now, I had the PyDispatcher mimic the behavior of functionalization codegen: when you register a python decomp to the `CompositeImplicitAutograd` key, this PR just automatically registers that decomp to the `Functionalize` key at the same time.
I'm trying to remember now why we didn't just add `Functionalize` (and all of the other functorch transform keys) directly to the `CompositeImplicitAutograd` alias keyset, but I couldn't remember (@zou3519 any chance you remember?).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103275
Approved by: https://github.com/ezyang, https://github.com/zou3519
The main concept behind this refactor is this: if we know that a size/stride/etc is constant, do NOT trace it into the graph, EXCEPT for any preexisting special cases that applied for static shapes. The refactor unfolds like this:
1. Delete the `dynamic_shapes` branches in torch/_dynamo/variables/builder.py which accept int/float/bool outputs. This is over-aggressive and we don't want to allow this (because if the operator returns a constant, we shouldn't have called wrap_fx_proxy in the first place.) This causes a bunch of failures because we are blindly feeding the result of size() call to wrap_fx_proxy when dynamic shapes is enabled.
2. Modify TensorVariable.call_method in torch/_dynamo/variables/tensor.py to avoid sending constant ints to wrap_fx_proxy. After normal specialization (which should be deleted, see https://github.com/pytorch/pytorch/pull/103434) we consult the fake tensor to see if the values in question have free variables or not. If they don't we short circuit tracing into graph. We only trace into graph if the operation in question is truly symbolic. Note that there is a near miss here: it's OK to trace x.size() call entirely into the graph, even if it doesn't have all dynamic shapes, because operator.getitem with int output is special cased in builder.py. This is a preexisting special case and I don't try to get rid of it.
3. It turns out that the change here also breaks torch_np compatibility layer. So I completely rewrite getattr handling in torch/_dynamo/variables/tensor.py to follow the same pattern (only trace into graph if truly dynamic).
There's some minor housekeeping in torch/fx/experimental/symbolic_shapes.py and some test files.
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103438
Approved by: https://github.com/larryliu0820
Previously, I accidentally thought setitem takes each argument as a
list. But if you write `x[:, b]`, that is actually passed in as a tuple.
Try harder.
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103052
Approved by: https://github.com/desertfire
Fixes#102315
The root cause: for `UnspecializedNNModuleVariable`, which extends `UserDefinedObjectVariable`, if `__bool__` is missing, we should use `__len__` to infer a truth value.
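An illustrative user pattern that hits this path (module name made up): a module with `__len__` but no `__bool__`, used in a truth test inside compiled code.
```python
import torch
import torch.nn as nn

class Blocks(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = nn.ModuleList([nn.Linear(4, 4)])

    def __len__(self):
        return len(self.layers)

@torch.compile()
def f(blocks, x):
    if blocks:  # no __bool__, so truthiness falls back to __len__()
        x = blocks.layers[0](x)
    return x
```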
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102583
Approved by: https://github.com/jansel
This PR adds support for tracing autograd.Function with grad.
A few important bullet points outlining our approach:
1) Our goal is to verify soundness in order to add a `call_function` node for the autograd.Function's `apply` to the graph.
2) We achieve (1) by either verifying or rejecting soundness, i.e. by ensuring that both the forward and backward of the autograd.Function are sound.
3) For the forward, if we verify soundness, we install its guards into the graph.
4) For the backward, if we verify soundness, we throw it out. However, backwards soundness verification is more onerous, and has a config driven set of banned attrs and methods for tensors.
1-4 above are achieved by turning the forward and backward into UserDefinedFunctionVariables, and inlining through them, relying on dynamo's soundness detection. If we graph break in these, we raise and treat them as unsound. As noted above, backwards is stricter yet.
For the tracing, the safety comes from dynamo's HigherOrderOperator system. That system ensures that not only do we trace soundly, but that no new variables are lifted into inputs during the tracing, and that the forward and backwards are entirely self contained.
Whenever we reject a function as unsound, we restore back, as usual.
Due to some limitations in the lifting logic, we have an escape hatch we implemented for tensors that are known in forward, but cross into backwards through save_tensors (save) /saved_tensors (load). We escape hatch here to avoid having the known saved tensors coming from forward end up being accidentally treated as lifted variables (and rejected). This is sound, but a little hacky feeling.
Additionally, due to some limitations in fx node removal, combined with how we produce subgraphs for the traces installed from HigherOrderOperators, we had to improve our node removal logic. In the event of a restore, we remove the old nodes from the graph, as usual in dynamo. However, because the references to these nodes may exist in subgraphs, we traverse any nodes users and remove them first if and only if they are in another graph. This is always sound, because removal should only be downstream of restoration at this point.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99483
Approved by: https://github.com/zou3519
This PR introduces a new operator called aten._assert_async.msg, which allows passing a tensor value and assertion message as inputs. As part of TorchDynamo, we're replacing the use of torch._assert with this new operator so that make_fx also knows how to handle assertions. This is subset of https://github.com/pytorch/pytorch/pull/98878, refer there for historic reviews.
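A small usage sketch of the new overload as described (a tensor condition plus a message):
```python
import torch

cond = torch.tensor(True)
torch.ops.aten._assert_async.msg(cond, "expected condition to hold")
```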
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100101
Approved by: https://github.com/jansel
Currently, we return `unimplemented` without a graph break on seeing an `x.unsqueeze_()` when `x` is an input. This essentially means we fall back to the original frame.
This PR actually graph breaks so that we can generate the continuation frame for the rest of the function. Instead of graph breaking at LOAD_ATTR, we delay the graph break to the actual CALL_FUNCTION, where it's cleaner to graph break.
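A minimal sketch of the pattern in question:
```python
import torch

@torch.compile()
def f(x):
    x.unsqueeze_(0)  # in-place shape change on a graph input -> graph break at this call
    return x + 1
```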
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99986
Approved by: https://github.com/jansel
Summary of changes:
- Add CPython exceptiontable parsing/assembling functions in torch/_dynamo/bytecode_transformation.py, based on https://github.com/python/cpython/blob/3.11/Objects/exception_handling_notes.txt.
- Add optional `exn_tab_entry` field to dynamo `Instruction`s in torch/_dynamo/bytecode_transformation.py in order to virtualize exception table entries (start, end, target instructions).
- Add checks guarding against duplicate instructions in dynamo, so that jump/exceptiontable targets are unambiguous. See `get_indexof` in torch/_dynamo/bytecode_analysis.py. Ensure that bytecode generation throughout dynamo does not generate duplicate instructions.
- Allow dynamo bytecode generation logic to generate nested exception table entries for developer convenience. CPython expects entries to not overlap, so we flatten nested entries during assembly in torch/_dynamo/bytecode_transformation.py:compute_exception_table.
- Simulate the block stack in torch/_dynamo/symbolic_convert.py. CPython removed the block stack in 3.11, but dynamo needs it in order to keep track of active contexts. So we simulate the block stack as before by looking at exceptiontable entries in order to determine the current blocks.
- Update context codegen in torch/_dynamo/resume_execution.py. The `SETUP_FINALLY` bytecode, which conveniently had a jump target to the finally block, was removed in 3.11, so we need to keep track of the jump target of the finally block using exceptiontables. Generating resume functions is more difficult since the original exceptiontable entries pointing to old cleanup code need to be modified to point to new cleanup code.
- Fix a push_null bug in torch/_dynamo/variables/functions.py introduced by https://github.com/pytorch/pytorch/pull/98699
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96511
Approved by: https://github.com/jansel, https://github.com/yanboliang, https://github.com/albanD
Summary
There is confusion between `_dynamo.skip` and `_dynamo.disable`. This removes the `_dynamo.skip` API; the functionality is still available via `_dynamo.disable(recursive=False)`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98899
Approved by: https://github.com/jansel
### Overview
This PR de-duplicates graph inputs in TorchDynamo, using the `Source` as the unique identifier for each input. This closes https://github.com/pytorch/pytorch/issues/98743 and https://github.com/pytorch/pytorch/issues/98625.
### Details
`VariableBuilder.wrap_tensor()` should return a `VariableTracker` for the passed-in `value: Tensor`. If `value` is duplicated, we should avoid calling `OutputGraph.create_graph_input()` and `OutputGraph.add_grapharg()`.
- Note that `create_graph_input()` and `add_grapharg()` are not 1:1. For a constant source and either `wrap_sym()` or `wrap_unspecialized_primitive()`, TorchDynamo still calls `create_graph_input()` but not `add_grapharg()`.
- Note that `create_graph_input()` should be called before constructing the corresponding `VariableTracker`. TorchDynamo needs the `fx.Proxy` object to pass to `wrap_fx_proxy()`.
In this PR, the `OutputGraph` saves an additional mapping `input_source_to_var` from each graph input's `Source` to its `VariableTracker`, which works because `Source` is now hashable. This mapping should be updated each time `create_graph_input()` is called. However, since we must construct the `VariableTracker` after `create_graph_input()` returns, we must have a separate call to the `OutputGraph` to update the mapping.
If anyone has any suggestion on how to coalesce this logic and avoid having to remember to update `input_source_to_var` for each `create_graph_input()`, I would love to hear it.
<details>
<summary> Alternate Approach</summary>
Initially, I tried having TorchDynamo construct a new but equivalent `VariableTracker` for the duplicated tensor. However, I abandoned this approach after hitting an assertion in `def wrap_fx_proxy_cls()` due to `"example_value"` already being in the proxy node's metadata because we were reusing the primary tensor's `Proxy` object. Reusing the exact `VariableTracker` also seems less error-prone instead of requiring constructing a new but identical `VariableTracker`.
</details>
### Testing
#### Global Variable Test
```
import torch

@torch.compile()
def f():
    return x + x

x = torch.randn(3)
f()
```
Before:
```
====== Forward graph 0 ======
<eval_with_key>.6 class <lambda>(torch.nn.Module):
    def forward(self, arg0_1: f32[3], arg1_1: f32[3]):
        # File: /data/users/ezyang/b/pytorch/ff.py:5, code: return x + x
        add: f32[3] = torch.ops.aten.add.Tensor(arg0_1, arg1_1); arg0_1 = arg1_1 = None
        return (add,)
```
After (only `arg0_1` and no more `arg1_1`):
```
====== Forward graph 0 ======
<eval_with_key>.4 class <lambda>(torch.nn.Module):
    def forward(self, arg0_1: f32[3]):
        # File: dynamo/test_dup_global.py:8, code: return x + x
        add: f32[3] = torch.ops.aten.add.Tensor(arg0_1, arg0_1); arg0_1 = None
        return (add,)
```
#### FSDP Test
Before we error on
```
File "/.../pytorch/torch/_guards.py", line 244, in __post_init__
assert self.input_source_a != self.input_source_b
```
and now there is no error.
---
The rename from `name_to_input` to `input_name_to_proxy` is not part of the core logic change and is a remnant from initial attempts. I can undo it later if desired, but I also feel that the new name is more informative. It also fixes the type annotation.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98775
Approved by: https://github.com/ezyang, https://github.com/voznesenskym
Summary:
`:test_dynamo` has been broken for a long time internally at Meta. This PR is to fix the broken test and re-enable it internally.
- Using the root `pytest.ini` for pytest
- Decouple tests so that one can be disabled without affecting others
- Temporarily disable the test cases that require additional efforts to fix
**OSS CI doesn't provide test code coverage info. Meta internal test infra does. The value of re-enabling these tests internally is not only to collect test coverage info but help fbcode developers to build/test from fbcode.**
Test Plan:
`buck test mode/dev-nosan //caffe2/test:test_dynamo`
https://www.internalfb.com/intern/testinfra/testrun/7318349540623516
Differential Revision: D44325238
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97937
Approved by: https://github.com/ezyang
The purpose of this API is to accomplish a few large pieces of work:
1) Refactor all the internals of plumbing dynamic dimension information after dynamo to be stateless
2) Decouple allocation controls around dynamic dimensions from verification
3) For (2), for allocation, create an enum that dictates whether we are in DUCK (default today), STATIC (aka assume_static_default in the past), or DYNAMIC (aka user constrained, do not duck shape)
4) For (2), for verification, we separate out the list of dynamic ranges entirely from allocation. This means the shape_env does no tracking of what we verify; instead, it is the caller's job to invoke produce_guards() with the various things they want verified, specifically with the valid ranges. We do use constrain ranges to refine value ranges when doing analysis.
5) We have decided, therefore, as an extension of (4) to double down on "late" checks versus "eager" checks, primarily because the mechanisms for gathering what actually matters happens during guards, and should be a purview of the caller seeking guards, not the shape env. However, for dynamo, these structures are essentially one and the same.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96699
Approved by: https://github.com/avikchaudhuri, https://github.com/ezyang
Enables the last few files under pytest.
xdist was causing problems with `profiler/test_profiler` `test_source_multithreaded` due to creating extra threads. Luckily we don't use it so we can disable it with `-p no:xdist`, but this is incompatible with pytest-rerunfailures==10.2, so upgrade to 10.3. I'd update the windows ami but idk how.
`dynamo/test_optimizers` and `dynamo/test_repros` both had tests that used skip_if_pytest. https://github.com/pytorch/pytorch/pull/93251/files suggests that it is due to pytest assertion rewriting, so I added `PYTEST_DONT_REWRITE` to their module docstrings to prevent pytest from rewriting assertions.
Disable test by issue in `dynamo/test_dynamic_shapes` seems sane.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96698
Approved by: https://github.com/huydhn, https://github.com/malfet
Planning to do a full writeup later. The short story is, sometimes the following chain of events happens:
1. We turn on Dynamo's custom frame handler
2. GC triggers (and all of the finalizers run under Dynamo)
3. GC hits a GeneratorExit frame
4. You end up in the custom frame handler with throw_flag == TRUE and PyErr_Occurred() != NULL
If this happens and we blindly call into other Python functions (like the Python callback), the executed Python code will immediately raise an exception (because there's already an ambient exception set.) This is very, very confusing. The fix is to defer to the regular handler when throw_flag is TRUE.
I triggered this locally with
```
PYTHONUNBUFFERED=1 pytest test/dynamo/test_dynamic_shapes.py -k 'Unspec and export and not dupes and not reorder' -v -x -s
```
But I also have some tests which trigger the problem synthetically.
Fixes https://github.com/pytorch/pytorch/issues/93781
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96488
Approved by: https://github.com/albanD
OK, so this PR used to be about reducing the number of constants we specialize on, but it turns out that unspecialization was ~essentially never used (because we still constant specialized way too aggressively) and I ended up having to fix a bunch of issues to actually get tests to pass. So this PR is now "make int unspecialization actually work". As part of this, I have to turn off unspecialization by default, as there are still latent bugs in inductor.
The general strategy is that an unspecialized int is represented as a SymInt. Representing it as a 0d tensor (which is what the code used to do) is untenable: (1) we often need unspecialized ints to participate in size computations, but we have no way of propagating sympy expressions through tensor compute, and (2) a lot of APIs work when passed SymInt, but not when passed a Tensor. However, I continue to represent Numpy scalars as Tensors, as they are rarely used for size computation and they have an explicit dtype, so they are more accurately modeled as 0d tensors.
* I folded in the changes from https://github.com/pytorch/pytorch/pull/95099 as I cannot represent unspecialized ints as SymInts without also turning on dynamic shapes. This also eliminates the necessity for test_unspec.py, as toggling specialization without dynamic shapes doesn't do anything. As dynamic shapes defaults to unspecializing, I just deleted this entirely; for the specialization case, I rely on regular static shape tests to catch it. (Hypothetically, we could also rerun all the tests with dynamic shapes, but WITH int/float specialization, but this seems... not that useful? I mean, I guess export wants it, but I'd kind of like our Source heuristic to improve enough that export doesn't have to toggle this either.)
* Only 0/1 integers get specialized by default now
* A hodgepodge of fixes. I'll comment on the PR about them.
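A rough illustration of what this means for user code, assuming int unspecialization is enabled (which this PR leaves off by default):
```python
import torch

@torch.compile()
def scale(x, n: int):
    return x * n

x = torch.randn(4)
scale(x, 0)  # n == 0: still constant-specialized into the graph
scale(x, 7)  # n == 7: modeled as a SymInt rather than burned in as a constant
```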
Fixes https://github.com/pytorch/pytorch/issues/95469
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95621
Approved by: https://github.com/jansel, https://github.com/Chillee
Applies the remaining flake8-comprehension fixes and checks. This change replaces all remaining unnecessary generator expressions with list/dict/set comprehensions, which are more succinct, performant, and better supported by our torch.jit compiler. It also removes useless generators such as `set(a for a in b)`, resolving them into just the set call.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94676
Approved by: https://github.com/ezyang
The functorch setting still exists, but it is no longer necessary:
we infer use of the Python dispatcher by checking if the ambient
FakeTensorMode has a ShapeEnv or not. The setting now only controls
direct AOTAutograd use; for PT2,
it's sufficient to use torch._dynamo.config.dynamic_shapes.
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94469
Approved by: https://github.com/Chillee, https://github.com/voznesenskym, https://github.com/jansel
Optimize unnecessary collection cast calls, unnecessary calls to list, tuple, and dict, and simplify calls to the sorted builtin. This should strictly improve speed and improve readability.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94323
Approved by: https://github.com/albanD
As found in #92709, thanks to @ngimel and @jansel, currently `torch.Tensor.fn` points to `UserDefinedObjectVariable` rather than `TorchVariable`. The root cause is due to https://github.com/pytorch/pytorch/pull/92709#pullrequestreview-1273357406. To prevent this, build `TorchVariable` of `torch.Tensor.fn` pointing to `torch.ops.aten.fn`.
This issue propagates to `torch.Tensor.fn`, causing a graph break with `nopython=True`:
```python
import torch
import torch._dynamo as dynamo
#op = torch.ops.aten.abs_ # no graph break
op = torch.Tensor.abs_ # graph break
args = torch.empty(10)
def foo(args):
    return op(args)
opt_foo = dynamo.optimize("inductor", nopython=True)(foo)
y_ = opt_foo(args)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/93243
Approved by: https://github.com/jansel
Previously, Dynamo faked support for item() when `capture_scalar_outputs` was True by representing it internally as a Tensor. With dynamic shapes, this is no longer necessary; we can represent it directly as a SymInt/SymFloat. Do so. Doing this requires you to use dynamic shapes; in principle we could support scalar outputs WITHOUT dynamic shapes but I won't do this unless someone hollers for it.
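A small sketch of the behavior described (config flag name taken from the text above):
```python
import torch
torch._dynamo.config.capture_scalar_outputs = True

@torch.compile(dynamic=True)
def f(x):
    s = x.sum().item()  # now traced as a SymFloat, not a fake 0-d tensor
    return x * s
```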
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Differential Revision: [D42885775](https://our.internmc.facebook.com/intern/diff/D42885775)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/93150
Approved by: https://github.com/voznesenskym
When we run the node with a fake value for tensor.item(), it would previously error because the utility method doesn't know how to handle placeholder nodes. The tensor we are calling item() on can be a user input, which will be a placeholder in the graph.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91668
Approved by: https://github.com/voznesenskym
GraphArgs worked fairly well, but it was still missing sources
sometimes. Now, we maintain an auxiliary data structure which we
MUST populate whenever we fakeify a tensor / allocate a bare SymInt.
This should guarantee once and for all that every symbol is available.
Should fix swin_base_patch4_window7_224.
While I was at it, I moved the fakeification utility back to builder
as it was only used at one call site.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90911
Approved by: https://github.com/voznesenskym
This will be the last disruptive functorch internals change.
Why are we moving these files?
- As a part of rationalizing functorch we are moving the code in
functorch/_src to torch/_functorch
- This is so that we can offer the functorch APIs as native PyTorch APIs
(coming soon) and resolve some internal build issues.
Why are we moving all of these files at once?
- It's better to break developers all at once rather than many times
Test Plan:
- wait for tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90091
Approved by: https://github.com/anijain2305, https://github.com/ezyang
See code comment for details. I also had to do some extra fixes:
* `run_functionalized_fw_and_collect_metadata` now is able to handle duplicated arguments
* `aot_wrapper_dedupe` now always returns boxed compiled functions
* `aot_wrapper_dedupe` is now applied to inference compiler along with autograd compiler (preexisting)
Fixes https://github.com/pytorch/torchdynamo/issues/1939
Fixes DebertaV2ForQuestionAnswering DebertaForMaskedLM DebertaForQuestionAnswering DebertaV2ForMaskedLM
Repro command:
```
python benchmarks/dynamo/huggingface.py --performance --float32 -dcuda --training --inductor --no-skip --dashboard --only DebertaForQuestionAnswering --cold_start_latency
```
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89896
Approved by: https://github.com/bdhirsh
This will be the last disruptive functorch internals change.
Why are we moving these files?
- As a part of rationalizing functorch we are moving the code in
functorch/_src to torch/_functorch
- This is so that we can offer the functorch APIs as native PyTorch APIs
(coming soon) and resolve some internal build issues.
Why are we moving all of these files at once?
- It's better to break developers all at once rather than many times
Test Plan:
- wait for tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88756
Approved by: https://github.com/ezyang
A previous version of this patch graph breaks when torch.tensor fails, but that causes
```
PYTORCH_TEST_WITH_DYNAMO=1 python test/nn/test_embedding.py -k test_embedding_bag_1D_padding_idx_cpu_float32
```
to start failing. Probably another latent bug that needs investigating.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89645
Approved by: https://github.com/albanD
I'm not really sure what desertfire's intended follow up was
on https://github.com/pytorch/pytorch/pull/87490 because when I remove
the unsupported() call, dynamo tests pass. But the change here is
conservative and I think strictly better than the current situation.
The idea is to force fake tensor prop on for the test, and then just
observe that we are doing a graph break. Clearly, export doesn't work,
so I manually xfail it.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89641
Approved by: https://github.com/anjali411
This fixes an excessive recompilation issue in tacotron2 but has a few caveats - https://github.com/pytorch/torchdynamo/issues/330
For tacotron2, the repro is something like this
~~~
def inner(x):
    return torch.sin(x)

def fn(x):
    for _ in range(100):
        inner(x)
        torch._dynamo.graph_break()
    return x
~~~
The problem here is that Dynamo has guards on the TUPLE_ITERATOR_LEN whenever a graph break happens. Therefore, we keep on recompiling.
This PR checks if there is a backedge (helps with while loop) in presence of a graph break. If there is, Dynamo skips processing this frame. Therefore, Dynamo gets called when inner is called, and we compile only once.
Note that, if there was no graph break, we will unroll the original loop, and see one graph with 100 sin operations (just as before, so no changes there).
The caveat is - We are skipping the frame, so if we have something like this
~~~
def fn(x):
    for _ in range(100):
        # 1000s of lines of PyTorch code
        torch._dynamo.graph_break()
    return x
~~~
Dynamo will skip processing this frame, and might miss out on the optimization.
Completely open for suggestions. Happy to re-implement if there is a better way to handle this.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88857
Approved by: https://github.com/jansel, https://github.com/yanboliang
**Introduces symbolic shape guards into dynamo.**
In this PR, we take the existing fake tensor infra and plumbing in dynamo and we start passing a shape_env around. This shape_env does not get plumbed down to middle layers / backend yet - it only collects expressions from frontend invocations at the moment. We then translate these expressions into guards at the point where we take other guards installed throughout dynamo - and add them to check_fn.
Part 1 of https://docs.google.com/document/d/1QJ-M4zfMkD-fjHIqW089RptjLl9EgozZGCceUbvmgfY/edit#
cc @jansel @lezcano @fdrocha @mlazos @soumith @yanboliang @penguinwu @anijain2305
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87570
Approved by: https://github.com/ezyang
I noticed that a lot of bugs are being suppressed by torchdynamo's default
error suppression, and worse yet, there's no way to unsuppress them. After
discussion with voz and soumith, we decided that we will unify error suppression
into a single option (suppress_errors) and default suppression to False.
If your model used to work and no longer works, try TORCHDYNAMO_SUPPRESS_ERRORS=1
to bring back the old suppression behavior.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
cc @jansel @lezcano @fdrocha @mlazos @soumith @voznesenskym @yanboliang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87440
Approved by: https://github.com/voznesenskym, https://github.com/albanD