Commit Graph

436 Commits

Author SHA1 Message Date
Joel Schlosser
fb146fc3c6 Only store necessary tensor_dict fields in node meta (#132805)
Fixes #132290

This PR attempts a more invasive / complete solution than the one from #132338, which removed immediate tensor fields from the `tensor_dict` copy stored in node meta. The approach taken here is to store only those fields of the `tensor_dict` that are actually used somewhere else.

So far, this appears to be limited to:
* `_dynamo_static_input_type`
* `tag` (at least in the tests). Discussion at #94080 appears to indicate this is depended on for export

(CI may point out more)
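A minimal sketch of the pruning described above (the constant and helper names are illustrative, not the actual identifiers from this PR):

```python
# Keep only the tensor_dict fields known to be consumed downstream; the
# tensor-valued entries that bloated node.meta are dropped. Names are
# hypothetical.
KEEP_TENSOR_DICT_KEYS = ("_dynamo_static_input_type", "tag")

def prune_tensor_dict(tensor_dict: dict) -> dict:
    return {k: v for k, v in tensor_dict.items() if k in KEEP_TENSOR_DICT_KEYS}
```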

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132805
Approved by: https://github.com/mlazos
2024-08-07 13:35:16 +00:00
Brian Hirsh
af8b8a47cb fsdp.set_: convey to functionalization that it mutates storage (#132322)
Fixes https://github.com/pytorch/pytorch/issues/132197

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132322
Approved by: https://github.com/albanD, https://github.com/yf225
ghstack dependencies: #132243, #132337
2024-08-05 21:28:59 +00:00
Brian Hirsh
4db368a475 make functorch CSE respect mutations as barriers (like fsdp.set_) (#132243)
Fixes https://github.com/pytorch/pytorch/issues/132200

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132243
Approved by: https://github.com/albanD, https://github.com/zou3519, https://github.com/yf225
2024-08-05 21:28:55 +00:00
William Wen
01cdcbf7c8 [dynamo] revert map/zip iterator related changes (#132528)
Need to revert due to internal hangs: S437700

This reverts commit b6c1490cc0.

Revert "[dynamo] implement IteratorVariable and polyfill fallbacks for enumerate (#131725)"

This reverts commit 2576dbbc35.

Revert "[dynamo] add itertools repeat/count bytecode reconstruction (#131716)"

This reverts commit 35b4de32fa.

Revert "[dynamo] add lazy IteratorVariable implementations for map and zip (#131413)"

This reverts commit 7d282d8755.

Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132528
Approved by: https://github.com/ZainRizvi
2024-08-04 18:46:55 +00:00
PyTorch MergeBot
0a25666f92 Revert "[dynamo] revert map/zip iterator related changes (#132528)"
This reverts commit e81e74ca6c.

Reverted https://github.com/pytorch/pytorch/pull/132528 on behalf of https://github.com/ZainRizvi due to This stack entered a weird state in the diff train. Reverting and relanding to clean the state ([comment](https://github.com/pytorch/pytorch/pull/132528#issuecomment-2267628475))
2024-08-04 18:26:09 +00:00
Pian Pawakapan
a896fb1b36 check unsupported sympy functions for runtime asserts (#132457)
Some sympy Functions aren't supported by sympy_interp(); we can't turn them into FX nodes, so currently the runtime asserts CSE pass avoids CSE'ing on any expression containing a sympy Function. https://github.com/pytorch/pytorch/pull/132325 started tracking unsupported functions, so we switch the check to that to be more precise. We also check for and skip unsupported functions when adding asserts - previously we only did the check for CSE, not when adding new expressions.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132457
Approved by: https://github.com/avikchaudhuri
2024-08-03 10:17:25 +00:00
William Wen
e81e74ca6c [dynamo] revert map/zip iterator related changes (#132528)
Need to revert due to internal hangs: S437700

This reverts commit b6c1490cc0.

Revert "[dynamo] implement IteratorVariable and polyfill fallbacks for enumerate (#131725)"

This reverts commit 2576dbbc35.

Revert "[dynamo] add itertools repeat/count bytecode reconstruction (#131716)"

This reverts commit 35b4de32fa.

Revert "[dynamo] add lazy IteratorVariable implementations for map and zip (#131413)"

This reverts commit 7d282d8755.

Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132528
Approved by: https://github.com/ZainRizvi
2024-08-02 19:40:57 +00:00
Michael Lazos
93979e7063 Skip frame if torch dispatch mode enabled (#131828)
Fixes https://github.com/pytorch/pytorch/issues/105929

We now skip frames if a dispatch mode is enabled.
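For illustration, a minimal sketch of the behavior (the `LoggingMode` here is a stand-in, not code from this PR): with a `TorchDispatchMode` active, Dynamo now skips the frame and the function runs eagerly under the mode instead of erroring.

```python
import torch
from torch.utils._python_dispatch import TorchDispatchMode

class LoggingMode(TorchDispatchMode):  # stand-in dispatch mode
    def __torch_dispatch__(self, func, types, args=(), kwargs=None):
        return func(*args, **(kwargs or {}))

@torch.compile
def f(x):
    return x + 1

with LoggingMode():
    f(torch.ones(2))  # frame is skipped; f runs eagerly under the mode
```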

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131828
Approved by: https://github.com/bdhirsh, https://github.com/anijain2305
2024-08-01 19:06:20 +00:00
Oguz Ulgen
920f0426ae Add None return type to init -- tests rest (#132376)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132376
Approved by: https://github.com/jamesjwu
ghstack dependencies: #132335, #132351, #132352
2024-08-01 15:44:51 +00:00
YangQun1
589aef4bb0 Fix py codegen to delete values that don't have any users (#131028)
Fixes #131025

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131028
Approved by: https://github.com/ezyang
2024-08-01 03:18:37 +00:00
ekamiti
9e473fd868 Make adding Buffers more like adding Parameters (#125971)
Add semantics for creating a buffer object similar to those for creating a parameter, by introducing a new Buffer class that can be used for type disambiguation. The underlying functionality of registering a buffer remains the same: the register_buffer method has not been changed. The persistent parameter on the Buffer type indicates whether the buffer should be persistent or not. The other non-test changes get the new Buffer type recognized by inductor and dynamo. The remaining test changes verify that the Buffer type can be used as a drop-in replacement for register_buffer, since it just leads to register_buffer being called. Normal tensors can still be used as buffers, so these changes are intended to be backwards compatible.
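A short usage sketch of the new type (assuming the `torch.nn.Buffer` name introduced here):

```python
import torch
from torch import nn

class Norm(nn.Module):
    def __init__(self):
        super().__init__()
        # Assigning a Buffer is equivalent to calling
        # self.register_buffer("running_mean", torch.zeros(4), persistent=False)
        self.running_mean = nn.Buffer(torch.zeros(4), persistent=False)

m = Norm()
print(dict(m.named_buffers()))           # contains "running_mean"
print("running_mean" in m.state_dict())  # False, since persistent=False
```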

Fixes #35735

Co-authored-by: Mikayla Gawarecki <mikaylagawarecki@gmail.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125971
Approved by: https://github.com/albanD, https://github.com/anijain2305, https://github.com/mlazos
2024-07-31 10:32:40 +00:00
Yiming Zhou
e9d1c26275 fix uniform op in dynamo (#132160)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132160
Approved by: https://github.com/anijain2305
2024-07-31 06:48:43 +00:00
Xuehai Pan
918ece4f4d [BE][Easy][11/19] enforce style for empty lines in import segments in test/dy*/ (#129762)
See https://github.com/pytorch/pytorch/pull/129751#issue-2380881501. Most changes are auto-generated by linter.

You can review these PRs via:

```bash
git diff --ignore-all-space --ignore-blank-lines HEAD~1
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129762
Approved by: https://github.com/anijain2305
2024-07-27 17:43:53 +00:00
Brian Hirsh
e4ace1a396 AOTDispatcher: properly bump version counter on input mutations in inference graphs (#131665)
This ensures that in an inference setting, we properly bump the VC of mutated graph inputs. Previously, we would only properly bump the VC for training graphs.
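A sketch of the guarantee (illustrative; `_version` reads the tensor's version counter, and the assertion reflects the post-fix behavior this PR claims):

```python
import torch

@torch.compile(backend="aot_eager")
def f(x):
    x.add_(1)  # graph input mutation
    return x * 2

with torch.no_grad():  # inference graph: no autograd tracking
    x = torch.ones(3)
    before = x._version
    f(x)
    assert x._version > before  # the VC is now bumped, matching eager
```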

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131665
Approved by: https://github.com/ezyang, https://github.com/zou3519
ghstack dependencies: #131403, #131482
2024-07-26 14:22:20 +00:00
Brian Hirsh
5570a0da0a dont dispatch aten.conj(scalar_tensor) back to python (#131482)
https://github.com/pytorch/pytorch/issues/105290

The problem in the original flow is that:

(1) the user calls `torch.mul(complex_tensor, complex_scalar)`
(2) python arg parser wraps the complex scalar in a `scalar_tensor`, and dispatches to `aten.mul.Tensor(self, scalar_other)`
(3) autograd sees `aten.mul.Tensor`, calls `scalar_other.conj()` [here](https://github.com/pytorch/pytorch/blob/main/torch/csrc/autograd/FunctionsManual.cpp#L597)
(4) during proxy tensor tracing, this gets dispatched to `aten._conj(scalar_tensor)`
(5) when we hit __torch_dispatch__, the scalar_tensor is converted back into a plain python scalar
(6) we error during tracing, because in `FunctionalTensorMode.__torch_dispatch__` we try to redispatch on `aten._conj.default(plain_python_scalar)`, and this overload does not accept python scalars.
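A minimal sketch of the triggering pattern (not the exact repro from the linked issue):

```python
import torch

@torch.compile(backend="aot_eager")
def f(x):
    # the complex python scalar gets wrapped into a scalar_tensor by the arg parser
    return torch.mul(x, 2 + 3j)

x = torch.randn(3, dtype=torch.complex64, requires_grad=True)
f(x)  # previously errored during tracing at aten._conj(plain_python_scalar)
```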

My attempted fix in this PR is to update `TensorBase::conj()` to check if the current tensor is a scalar tensor (wrapped number), and if so, manually:
(1) convert the scalar tensor back into a scalar
(2) call scalar.conj() directly
(3) convert the result back into a wrapped tensor

This avoids having to go through python entirely in the tracing case (which is fine, because these scalar tensors are constants that we can const-prop during tracing anyway).

Notably, I did **not** add e.g. a new `aten._conj.Scalar` overload. This would not actually fix the problem, since the bug is that we call `aten._conj.default(python_scalar)` directly; we would also need to muck with all `__torch_dispatch__` call sites to know to convert python scalars back into tensors directly.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131482
Approved by: https://github.com/zou3519, https://github.com/ezyang
ghstack dependencies: #131403
2024-07-26 14:22:20 +00:00
Brian Hirsh
8bb9aa93a7 dynamo: mutations on .data should be invisible to autograd (#131403)
Fixes https://github.com/pytorch/pytorch/issues/121353

Our handling of `.data` in dynamo today basically just converts `y = x.data` into `y = x.detach()`. The semantics of these two ops are not quite the same, because:

(1) any future mutations on `x.data` will be fully ignored by autograd
(2) any mutations on `x.detach()` will bump x's version counter

the linked model does a .data mutation that is hidden from autograd in eager, but ends up erroring during AOTDispatcher tracing.
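A sketch of the eager-mode difference between (1) and (2) above:

```python
import torch

x = torch.ones(3, requires_grad=True)

d = x.detach()
v = x._version
d.add_(1)
assert x._version == v + 1  # detach() shares the version counter

y = x.data
v = x._version
y.add_(1)
assert x._version == v  # .data mutations are invisible to autograd
```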

I updated dynamo's handling so that:

(1) when dynamo sees a call to `getattr(tensor, "data")` and calls `.detach()` we set a flag on the returned `TensorVariable` indicating it came from `.data`

(2) on any tensor method that we call with an input `TensorVariable` with this flag turned on, we proxy autograd's `preserve_version_counter` logic into the graph, to properly reset the VC after the op is run.

One thing to note is that I don't actually do this on every op that we pass the tensor to: I only do it for tensor methods that appear to be mutations (by checking for a trailing underscore). My thought was that:

(1) I didn't want to do this for **every** op that you pass `y` into, since that will e.g. triple the number of nodes in the graph, and could cause compile time regressions if you use .data

(2) this situation is pretty rare in general, and I'm hoping that "tensor method mutations" cover most reasonable mutation cases. If we manage to miss a case, you will get a loud error during tracing anyway, so there is not a safety issue.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131403
Approved by: https://github.com/anijain2305, https://github.com/zou3519
2024-07-26 14:22:20 +00:00
William Wen
7d282d8755 [dynamo] add lazy IteratorVariable implementations for map and zip (#131413)
Fixes https://github.com/pytorch/pytorch/issues/130750.

Repro of lazy/eager `map` discrepancy without `islice`:
```python
    def fn(a, b):
        y = 1

        def f(x):
            nonlocal y
            y += 1
            return x

        l = list(zip([a, b], map(f, [1, 2, 3, 4])))
        return a + y
```

The major change is that we implement `MapVariable` and `ZipVariable` based on `IteratorVariable`. Before, `map` and `zip` were being traced by immediately unpacking the result as a `TupleVariable`, which is wrong in cases such as the example above.

`MapVariable`s are not allowed to be unpacked, while `ZipVariable`s can only be unpacked if all of their iterables can also be unpacked.

We also add new `[has_]force_unpack_var_sequence` methods to `VariableTracker` for the case where it is safe to unpack the entire sequence lazily, e.g., when building a list from a map (i.e. `list(map(f, ...))`).
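As a plain-Python sanity check of the eager semantics being matched here (no Dynamo involved), laziness matters in the repro because `zip` stops pulling from `map` early:

```python
y = 1

def f(x):
    global y
    y += 1
    return x

list(zip(["a", "b"], map(f, [1, 2, 3, 4])))
print(y)  # 3: zip stops after 2 items, so f ran only twice; eagerly
          # unpacking map(f, ...) first would run it 4 times, giving 5
```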

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131413
Approved by: https://github.com/anijain2305
2024-07-26 10:47:38 +00:00
Michael Lazos
51f4f87718 [Reland] Ensure staticmethods can be allowed in graph (#131789)
Fixes https://github.com/pytorch/pytorch/issues/124735

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131789
Approved by: https://github.com/anijain2305
2024-07-25 22:54:18 +00:00
William Wen
2423d89d0c [dynamo] mirror training flag in OptimizedModule (#131546)
Fixes https://github.com/pytorch/pytorch/issues/122414.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131546
Approved by: https://github.com/yanboliang, https://github.com/anijain2305
2024-07-25 17:43:09 +00:00
PyTorch MergeBot
c3679bed35 Revert "Fix py codegen to delete values that don't have any users (#131028)"
This reverts commit 91aba7baac.

Reverted https://github.com/pytorch/pytorch/pull/131028 on behalf of https://github.com/clee2000 due to broke inductor/test_triton_kernels inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_functionalize [GH job link](https://github.com/pytorch/pytorch/actions/runs/10094659640/job/27915271250) [HUD commit link](91aba7baac) ([comment](https://github.com/pytorch/pytorch/pull/131028#issuecomment-2251058374))
2024-07-25 17:42:18 +00:00
YangQun1
91aba7baac Fix py codegen to delete values that don't have any users (#131028)
Fixes #131025

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131028
Approved by: https://github.com/ezyang
2024-07-25 13:04:23 +00:00
PyTorch MergeBot
236e06f9f9 Revert "Ensure staticmethods can be allowed in graph (#130882)"
This reverts commit 93fdd0237d.

Reverted https://github.com/pytorch/pytorch/pull/130882 on behalf of https://github.com/clee2000 due to torchrec test still broken internally D59945836 ([comment](https://github.com/pytorch/pytorch/pull/130882#issuecomment-2249003059))
2024-07-24 22:32:41 +00:00
PyTorch MergeBot
8ffd109a00 Revert "Fix py codegen to delete values that don't have any users (#131028)"
This reverts commit 466c167b71.

Reverted https://github.com/pytorch/pytorch/pull/131028 on behalf of https://github.com/atalman due to breaks CI ([comment](https://github.com/pytorch/pytorch/pull/131028#issuecomment-2247771530))
2024-07-24 12:21:43 +00:00
YangQun1
466c167b71 Fix py codegen to delete values that don't have any users (#131028)
Fixes #131025

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131028
Approved by: https://github.com/ezyang
2024-07-24 01:03:56 +00:00
Michael Lazos
93fdd0237d Ensure staticmethods can be allowed in graph (#130882)
Fixes https://github.com/pytorch/pytorch/issues/124735

Pull Request resolved: https://github.com/pytorch/pytorch/pull/130882
Approved by: https://github.com/anijain2305, https://github.com/williamwen42
2024-07-23 18:59:19 +00:00
Animesh Jain
ddde9dd25c [dynamo][automatic_dynamic] Trigger dynamism on stride changes (#130232)
Fixes https://github.com/pytorch/pytorch/issues/129798

Pull Request resolved: https://github.com/pytorch/pytorch/pull/130232
Approved by: https://github.com/ezyang
2024-07-21 03:45:54 +00:00
Animesh Jain
00e54e74ff [dynamo][cpp-guards] Fix bug in dict tags (#131056)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/131056
Approved by: https://github.com/williamwen42, https://github.com/jansel
2024-07-19 04:42:38 +00:00
PyTorch MergeBot
9f6db5d0e2 Revert "Ensure staticmethods can be allowed in graph (#130882)"
This reverts commit b0387449db.

Reverted https://github.com/pytorch/pytorch/pull/130882 on behalf of https://github.com/atalman due to failing torchrec tests internally, please fix and reland ([comment](https://github.com/pytorch/pytorch/pull/130882#issuecomment-2236528473))
2024-07-18 13:31:30 +00:00
Michael Lazos
b0387449db Ensure staticmethods can be allowed in graph (#130882)
Fixes https://github.com/pytorch/pytorch/issues/124735

Pull Request resolved: https://github.com/pytorch/pytorch/pull/130882
Approved by: https://github.com/anijain2305, https://github.com/williamwen42
2024-07-17 19:18:30 +00:00
Michael Lazos
e4f9d01cd9 Add test for dataclass field accesses (#130848)
Fixes https://github.com/pytorch/pytorch/issues/120108

Pull Request resolved: https://github.com/pytorch/pytorch/pull/130848
Approved by: https://github.com/williamwen42, https://github.com/anijain2305
2024-07-17 19:14:23 +00:00
William Wen
3928ca2ab6 [dynamo] update call map to allow multiple input parameters (#130748)
Fixes https://github.com/pytorch/pytorch/issues/128072.

Commandeering https://github.com/pytorch/pytorch/pull/128282 since the issue is now hi pri.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/130748
Approved by: https://github.com/Skylion007, https://github.com/anijain2305
2024-07-15 22:16:49 +00:00
PyTorch MergeBot
dff9d68f18 Revert "Fix names conflict when lifting (#129817)"
This reverts commit 53cf46b8c6.

Reverted https://github.com/pytorch/pytorch/pull/129817 on behalf of https://github.com/clee2000 due to Failing inductor/test_flex_attention.py https://github.com/pytorch/pytorch/actions/runs/9940532858/job/27478084137 74da2a467f Sorry for the churn, possibly a landrace? ([comment](https://github.com/pytorch/pytorch/pull/129817#issuecomment-2229519886))
2024-07-15 22:08:45 +00:00
Zhanghan Wang
53cf46b8c6 Fix names conflict when lifting (#129817)
## Bug description
When pending args that may be lifted [here](58f346c874/torch/_dynamo/output_graph.py (L1866)) share the same base name, like `contiguous` and `contiguous_1`, the call into [create_graph_input](58f346c874/torch/_dynamo/output_graph.py (L2081)) can create a name ([here](58f346c874/torch/fx/graph.py (L1008))) that overwrites another arg to be lifted, producing a wrong graph output.

## Reproducing
Below is a reproducible example:
```python
import logging
from typing import List

import torch
from functorch.compile import aot_module_simplified, make_boxed_func

@torch.library.custom_op("mylib::somefunc_forward", mutates_args=())
def somefunc_forward(
    input_: torch.Tensor,
    weight: torch.Tensor,
    shape: List[int],
) -> torch.Tensor:
    return torch.ones_like(input_)

@somefunc_forward.register_fake
def _(input_, weight, shape):
    return torch.empty_like(input_)

@torch.library.custom_op("mylib::somefunc_backward", mutates_args=())
def somefunc_backward(
    grad_output: torch.Tensor,
    input_: torch.Tensor,
    weight: torch.Tensor,
    shape: List[int],
) -> torch.Tensor:
    print(f"backward.{grad_output.shape=}")
    print(f"backward.{input_.shape=}")
    print(f"backward.{weight.shape=}")
    print(f"backward.{shape=}")
    assert list(weight.shape) == shape
    return torch.ones_like(weight)

@somefunc_backward.register_fake
def _(grad_output, input_, weight, shape):
    return torch.empty_like(weight)

def a_func(grad_output, input_, weight_, shape):
    return torch.ones_like(input_.sum() * weight_)

class SomeFunc(torch.autograd.Function):
    @staticmethod
    def forward(ctx, input, weight, normalized_shape):
        ctx.normalized_shape = normalized_shape
        input_ = input.contiguous()
        weight_ = weight.contiguous()
        output = somefunc_forward(input_, weight_, ctx.normalized_shape)
        ctx.save_for_backward(input_, weight_)
        return output

    @staticmethod
    def backward(ctx, grad_output):
        input_, weight_ = ctx.saved_tensors
        # grad_weight = a_func(grad_output, input_, weight_, ctx.normalized_shape)
        grad_weight = somefunc_backward(
            grad_output.contiguous(),
            input_,
            weight_,
            ctx.normalized_shape,
        )
        return None, grad_weight, None

class MyModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.weight = torch.nn.Parameter(torch.ones(7))

    def forward(self, x):
        return SomeFunc.apply(x, self.weight, [7])

model = MyModel()
torch._logging.set_logs(dynamo=logging.DEBUG, aot=logging.DEBUG, graph_code=True)

def aot_print_backend(gm, sample_inputs):
    # Forward compiler capture
    def fw(gm, sample_inputs):
        print(f"----- fw")
        gm.print_readable()
        return make_boxed_func(gm.forward)

    # Backward compiler capture
    def bw(gm, sample_inputs):
        print(f"----- bw")
        gm.print_readable()
        return make_boxed_func(gm.forward)

    # Call AOTAutograd
    gm_forward = aot_module_simplified(
        gm, sample_inputs, fw_compiler=fw, bw_compiler=bw
    )
    return gm_forward

model = torch.compile(
    model,
    backend=aot_print_backend,
    dynamic=False,
)
out = model(torch.rand((128, 4, 7)))
out.mean().backward()
```

I can see logs showing calls into `create_graph_input`:
```log
V0629 02:08:46.839914 8200981504 torch/_dynamo/output_graph.py:2042] [0/0] create_graph_input contiguous (none)
V0629 02:08:46.839998 8200981504 torch/_dynamo/output_graph.py:2042] [0/0] create_graph_input contiguous_1 (none)
```

And the generated backward graph looks like:
```log
class GraphModule(torch.nn.Module):
    def forward(self, function_ctx, somefunc_forward_default: "f32[128, 4, 7]", contiguous: "f32[128, 4, 7]", contiguous_1: "f32[7]"):
        contiguous_1 = contiguous
        contiguous_2 = contiguous_1

        # No stacktrace found for following nodes
        _set_grad_enabled = torch._C._set_grad_enabled(False)

         # File: /Users/bytedance/testtorch/test_custom_op_bug.py:61 in backward, code: grad_output.contiguous(),
        contiguous: "f32[128, 4, 7]" = somefunc_forward_default.contiguous();  somefunc_forward_default = None

         # File: /opt/tiger/pytorch/torch/_library/custom_ops.py:506 in __call__, code: return self._opoverload(*args, **kwargs)
        somefunc_backward_default: "f32[7]" = torch.ops.mylib.somefunc_backward.default(contiguous, contiguous_1, contiguous_2, [7]);  contiguous = contiguous_1 = contiguous_2 = None

        # No stacktrace found for following nodes
        _set_grad_enabled_1 = torch._C._set_grad_enabled(True)
        return (None, somefunc_backward_default)
```

The original code of `somefunc_backward` takes the inputs `grad_output`, `input_`, `weight`, and `shape`, where `weight` should have shape `torch.Size([7])`. However, in the graph, `contiguous_1` and `contiguous_2` are assigned from `contiguous`, which triggers the assertion failure I added in `somefunc_backward`.

## Environment
```log
Collecting environment information...
PyTorch version: 2.5.0a0+git0b7e8df
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: macOS 14.5 (arm64)
GCC version: Could not collect
Clang version: 15.0.0 (clang-1500.3.9.4)
CMake version: version 3.26.4
Libc version: N/A

Python version: 3.9.19 (main, May  6 2024, 14:39:30)  [Clang 14.0.6 ] (64-bit runtime)
Python platform: macOS-14.5-arm64-arm-64bit
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Apple M3 Pro

Versions of relevant libraries:
[pip3] numpy==2.0.0
[pip3] optree==0.11.0
[pip3] torch==2.5.0a0+git0b7e8df
[pip3] torchgraph==0.0.1
[conda] numpy                     2.0.0                    pypi_0    pypi
[conda] optree                    0.11.0                   pypi_0    pypi
[conda] torch                     2.5.0a0+git0b7e8df           dev_0    <develop>
[conda] torchgraph                0.0.1                     dev_0    <develop>
```

## How to fix?

I put in a naive fix that adds the potential args to lift into `used_names`. This touches private variables; I will clean that up if this issue makes sense to you.

@zou3519 @oulgen

Co-authored-by: rzou <zou3519@gmail.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129817
Approved by: https://github.com/zou3519
2024-07-15 18:49:12 +00:00
PyTorch MergeBot
1e897a0ca4 Revert "Fix names conflict when lifting (#129817)"
This reverts commit 74da2a467f.

Reverted https://github.com/pytorch/pytorch/pull/129817 on behalf of https://github.com/clee2000 due to broke dynamo/test_inline_inbuilt_nn_modules.py https://github.com/pytorch/pytorch/actions/runs/9940532858/job/27461141919 74da2a467f.  Test passed on PR, possibly a landrace? ([comment](https://github.com/pytorch/pytorch/pull/129817#issuecomment-2228993570))
2024-07-15 17:09:52 +00:00
Zhanghan Wang
74da2a467f Fix names conflict when lifting (#129817)
## Bug description
When pending args that may be lifted [here](58f346c874/torch/_dynamo/output_graph.py (L1866)) share the same base name, like `contiguous` and `contiguous_1`, the call into [create_graph_input](58f346c874/torch/_dynamo/output_graph.py (L2081)) can create a name ([here](58f346c874/torch/fx/graph.py (L1008))) that overwrites another arg to be lifted, producing a wrong graph output.

## Reproducing
Below is a reproducible example:
```python
import logging
from typing import List

import torch
from functorch.compile import aot_module_simplified, make_boxed_func

@torch.library.custom_op("mylib::somefunc_forward", mutates_args=())
def somefunc_forward(
    input_: torch.Tensor,
    weight: torch.Tensor,
    shape: List[int],
) -> torch.Tensor:
    return torch.ones_like(input_)

@somefunc_forward.register_fake
def _(input_, weight, shape):
    return torch.empty_like(input_)

@torch.library.custom_op("mylib::somefunc_backward", mutates_args=())
def somefunc_backward(
    grad_output: torch.Tensor,
    input_: torch.Tensor,
    weight: torch.Tensor,
    shape: List[int],
) -> torch.Tensor:
    print(f"backward.{grad_output.shape=}")
    print(f"backward.{input_.shape=}")
    print(f"backward.{weight.shape=}")
    print(f"backward.{shape=}")
    assert list(weight.shape) == shape
    return torch.ones_like(weight)

@somefunc_backward.register_fake
def _(grad_output, input_, weight, shape):
    return torch.empty_like(weight)

def a_func(grad_output, input_, weight_, shape):
    return torch.ones_like(input_.sum() * weight_)

class SomeFunc(torch.autograd.Function):
    @staticmethod
    def forward(ctx, input, weight, normalized_shape):
        ctx.normalized_shape = normalized_shape
        input_ = input.contiguous()
        weight_ = weight.contiguous()
        output = somefunc_forward(input_, weight_, ctx.normalized_shape)
        ctx.save_for_backward(input_, weight_)
        return output

    @staticmethod
    def backward(ctx, grad_output):
        input_, weight_ = ctx.saved_tensors
        # grad_weight = a_func(grad_output, input_, weight_, ctx.normalized_shape)
        grad_weight = somefunc_backward(
            grad_output.contiguous(),
            input_,
            weight_,
            ctx.normalized_shape,
        )
        return None, grad_weight, None

class MyModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.weight = torch.nn.Parameter(torch.ones(7))

    def forward(self, x):
        return SomeFunc.apply(x, self.weight, [7])

model = MyModel()
torch._logging.set_logs(dynamo=logging.DEBUG, aot=logging.DEBUG, graph_code=True)

def aot_print_backend(gm, sample_inputs):
    # Forward compiler capture
    def fw(gm, sample_inputs):
        print(f"----- fw")
        gm.print_readable()
        return make_boxed_func(gm.forward)

    # Backward compiler capture
    def bw(gm, sample_inputs):
        print(f"----- bw")
        gm.print_readable()
        return make_boxed_func(gm.forward)

    # Call AOTAutograd
    gm_forward = aot_module_simplified(
        gm, sample_inputs, fw_compiler=fw, bw_compiler=bw
    )
    return gm_forward

model = torch.compile(
    model,
    backend=aot_print_backend,
    dynamic=False,
)
out = model(torch.rand((128, 4, 7)))
out.mean().backward()
```

I can see logs showing calls into `create_graph_input`:
```log
V0629 02:08:46.839914 8200981504 torch/_dynamo/output_graph.py:2042] [0/0] create_graph_input contiguous (none)
V0629 02:08:46.839998 8200981504 torch/_dynamo/output_graph.py:2042] [0/0] create_graph_input contiguous_1 (none)
```

And the generated backward graph looks like:
```log
class GraphModule(torch.nn.Module):
    def forward(self, function_ctx, somefunc_forward_default: "f32[128, 4, 7]", contiguous: "f32[128, 4, 7]", contiguous_1: "f32[7]"):
        contiguous_1 = contiguous
        contiguous_2 = contiguous_1

        # No stacktrace found for following nodes
        _set_grad_enabled = torch._C._set_grad_enabled(False)

         # File: /Users/bytedance/testtorch/test_custom_op_bug.py:61 in backward, code: grad_output.contiguous(),
        contiguous: "f32[128, 4, 7]" = somefunc_forward_default.contiguous();  somefunc_forward_default = None

         # File: /opt/tiger/pytorch/torch/_library/custom_ops.py:506 in __call__, code: return self._opoverload(*args, **kwargs)
        somefunc_backward_default: "f32[7]" = torch.ops.mylib.somefunc_backward.default(contiguous, contiguous_1, contiguous_2, [7]);  contiguous = contiguous_1 = contiguous_2 = None

        # No stacktrace found for following nodes
        _set_grad_enabled_1 = torch._C._set_grad_enabled(True)
        return (None, somefunc_backward_default)
```

The original code of `somefunc_backward` takes the inputs `grad_output`, `input_`, `weight`, and `shape`, where `weight` should have shape `torch.Size([7])`. However, in the graph, `contiguous_1` and `contiguous_2` are assigned from `contiguous`, which triggers the assertion failure I added in `somefunc_backward`.

## Environment
```log
Collecting environment information...
PyTorch version: 2.5.0a0+git0b7e8df
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: macOS 14.5 (arm64)
GCC version: Could not collect
Clang version: 15.0.0 (clang-1500.3.9.4)
CMake version: version 3.26.4
Libc version: N/A

Python version: 3.9.19 (main, May  6 2024, 14:39:30)  [Clang 14.0.6 ] (64-bit runtime)
Python platform: macOS-14.5-arm64-arm-64bit
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Apple M3 Pro

Versions of relevant libraries:
[pip3] numpy==2.0.0
[pip3] optree==0.11.0
[pip3] torch==2.5.0a0+git0b7e8df
[pip3] torchgraph==0.0.1
[conda] numpy                     2.0.0                    pypi_0    pypi
[conda] optree                    0.11.0                   pypi_0    pypi
[conda] torch                     2.5.0a0+git0b7e8df           dev_0    <develop>
[conda] torchgraph                0.0.1                     dev_0    <develop>
```

## How to fix?

I put in a naive fix that adds the potential args to lift into `used_names`. This touches private variables; I will clean that up if this issue makes sense to you.

@zou3519 @oulgen

Co-authored-by: rzou <zou3519@gmail.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129817
Approved by: https://github.com/zou3519
2024-07-15 13:41:46 +00:00
Animesh Jain
8714b7fc69 [dynamo][cpp-guards] Use dict tags to skip guards on immutable dict getitems (#130654)
Reduces the guard overhead from 3.7k units to 2.1k units.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/130654
Approved by: https://github.com/jansel
2024-07-13 15:31:10 +00:00
Pian Pawakapan
1b3b4c2fb9 [runtime asserts] deduplicate runtime asserts & CSE (#128599) (#130380)
original PR: https://github.com/pytorch/pytorch/pull/128599 (re-created after revert + poisoned diff train)

Summary:
This PR adds deduplication and CSE for runtime asserts. Existing size computation in the graph is CSE'd along with added runtime asserts, and redundant asserts are removed. Shape calls on intermediate tensors are also turned into compute on input sizes if possible, allowing intermediate tensors to be freed earlier. For example:
```
z = torch.cat([x, x], dim=0)  # 2*s0
w = z.repeat(y.shape[0])  # 2*s0*s1
_w = w.shape[0]
# something with _w ...

# turns into ->
s0 = x.shape[0]
s1 = y.shape[0]
_w0 = 2 * s0
_w = _w0 * s1
```

Additionally, constrain_range calls are deduplicated. Single-symbol bound checks for unbacked symbols (e.g. u0 >= 0, u0 <= 5) and sym_constrain_range.default calls are also removed, since they accumulate range info in the ShapeEnv, and are replaced with two _assert_scalar.default calls that check the min/max bounds. For example:
```
torch.sym_constrain_range_for_size(n, min=2, max=16)
torch.sym_constrain_range(n, min=4, max=20)
torch._check(n >= 0)
torch._check(n >= 3)
torch._check(n <= 14)

# turns into
torch.sym_constrain_range_for_size(n)
torch._check(n >= 4)
torch._check(n <= 14)
```

Test Plan:
contbuild & OSS CI, see 940e4477ab

Original Phabricator Test Plan:
Imported from GitHub, without a `Test Plan:` line.

Differential Revision: D59543603

Pull Request resolved: https://github.com/pytorch/pytorch/pull/130380
Approved by: https://github.com/izaitsevfb
2024-07-10 19:23:37 +00:00
PyTorch MergeBot
9c9744c3ac Revert "[runtime asserts] deduplicate runtime asserts & CSE (#128599)"
This reverts commit 940e4477ab.

Reverted https://github.com/pytorch/pytorch/pull/128599 on behalf of https://github.com/izaitsevfb due to breaking internal APS tests, see D59498864 ([comment](https://github.com/pytorch/pytorch/pull/128599#issuecomment-2218724762))
2024-07-09 21:03:49 +00:00
Animesh Jain
f053be2a97 [dynamo] Graph break on random_ op (#130222)
Fixes https://github.com/pytorch/pytorch/issues/121621

Pull Request resolved: https://github.com/pytorch/pytorch/pull/130222
Approved by: https://github.com/jansel
2024-07-08 06:10:24 +00:00
Pian Pawakapan
940e4477ab [runtime asserts] deduplicate runtime asserts & CSE (#128599)
This PR adds deduplication and CSE for runtime asserts. Existing size computation in the graph is CSE'd along with added runtime asserts, and redundant asserts are removed. Shape calls on intermediate tensors are also turned into compute on input sizes if possible, allowing intermediate tensors to be freed earlier. For example:
```
z = torch.cat([x, x], dim=0)  # 2*s0
w = z.repeat(y.shape[0])  # 2*s0*s1
_w = w.shape[0]
# something with _w ...

# turns into ->
s0 = x.shape[0]
s1 = y.shape[0]
_w0 = 2 * s0
_w = _w0 * s1
```

Additionally, constrain_range calls are deduplicated. Single-symbol bound checks for unbacked symbols (e.g. u0 >= 0, u0 <= 5) and sym_constrain_range.default calls are also removed, since they accumulate range info in the ShapeEnv, and are replaced with two _assert_scalar.default calls that check the min/max bounds. For example:
```
torch.sym_constrain_range_for_size(n, min=2, max=16)
torch.sym_constrain_range(n, min=4, max=20)
torch._check(n >= 0)
torch._check(n >= 3)
torch._check(n <= 14)

# turns into
torch.sym_constrain_range_for_size(n)
torch._check(n >= 4)
torch._check(n <= 14)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/128599
Approved by: https://github.com/ezyang
2024-07-07 20:10:14 +00:00
Xuehai Pan
a3ce9eddd6 [BE][Easy] apply autofix for ruff rule unnecessary-literal-set (C405) and unnecessary-map (C417) (#130198)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130198
Approved by: https://github.com/Skylion007
2024-07-07 00:58:22 +00:00
PyTorch MergeBot
963f430d13 Revert "[runtime asserts] deduplicate runtime asserts & CSE (#128599)"
This reverts commit 0267b2ddcb.

Reverted https://github.com/pytorch/pytorch/pull/128599 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it seems to cause a landrace and fails inductor/test_cudagraph_trees in trunk 0267b2ddcb ([comment](https://github.com/pytorch/pytorch/pull/128599#issuecomment-2211690518))
2024-07-06 07:20:05 +00:00
Pian Pawakapan
0267b2ddcb [runtime asserts] deduplicate runtime asserts & CSE (#128599)
This PR adds deduplication and CSE for runtime asserts. Existing size computation in the graph is CSE'd along with added runtime asserts, and redundant asserts are removed. Shape calls on intermediate tensors are also turned into compute on input sizes if possible, allowing intermediate tensors to be freed earlier. For example:
```
z = torch.cat([x, x], dim=0)  # 2*s0
w = z.repeat(y.shape[0])  # 2*s0*s1
_w = w.shape[0]
# something with _w ...

# turns into ->
s0 = x.shape[0]
s1 = y.shape[0]
_w0 = 2 * s0
_w = _w0 * s1
```

Additionally, constrain_range calls are deduplicated. Single-symbol bound checks for unbacked symbols (e.g. u0 >= 0, u0 <= 5) and sym_constrain_range.default calls are also removed, since they accumulate range info in the ShapeEnv, and are replaced with two _assert_scalar.default calls that check the min/max bounds. For example:
```
torch.sym_constrain_range_for_size(n, min=2, max=16)
torch.sym_constrain_range(n, min=4, max=20)
torch._check(n >= 0)
torch._check(n >= 3)
torch._check(n <= 14)

# turns into
torch.sym_constrain_range_for_size(n)
torch._check(n >= 4)
torch._check(n <= 14)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/128599
Approved by: https://github.com/ezyang
2024-07-06 03:44:49 +00:00
Edward Z. Yang
29c68df600 Stop immediately specializing common constants 0/1 for plain int (#128327)
Fixes https://github.com/pytorch/pytorch/issues/128319

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/128327
Approved by: https://github.com/lezcano
ghstack dependencies: #129983
2024-07-03 16:41:51 +00:00
Edward Z. Yang
8af58f66bb Fix typo in floordiv solver code that affects flipped relation (#129888)
Fixes https://github.com/pytorch/pytorch/issues/123535

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129888
Approved by: https://github.com/lezcano
2024-07-03 04:47:32 +00:00
Simon Fan
be2d79a16b [dynamic] config to disable duck sizing (#129804)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129804
Approved by: https://github.com/ezyang
2024-07-03 00:20:54 +00:00
PyTorch MergeBot
c22e66896f Revert "Fix typo in floordiv solver code that affects flipped relation (#129888)"
This reverts commit 3c6c3b9448.

Reverted https://github.com/pytorch/pytorch/pull/129888 on behalf of https://github.com/huydhn due to Sorry for reverting your change but the updated test starts to fail flakily in trunk somehow, so I am reverting the change to see if it helps ([comment](https://github.com/pytorch/pytorch/pull/129888#issuecomment-2204442653))
2024-07-02 21:16:59 +00:00
Edward Z. Yang
3c6c3b9448 Fix typo in floordiv solver code that affects flipped relation (#129888)
Fixes https://github.com/pytorch/pytorch/issues/123535

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129888
Approved by: https://github.com/lezcano
2024-07-02 11:15:03 +00:00
Brian Hirsh
b91a9dc328 [Brian's PR #128754] Use torch.ops.fsdp.set_ for FSDP2 storage resize; dont functionalize resize_, set_, split_with_sizes_copy.out (#129203)
This is a copy of Brian's PR https://github.com/pytorch/pytorch/pull/128754, with some changes in the test_distributed_patterns.py unit tests to more closely reflect FSDP2 patterns. Also disabled two tests `test_input_mutation_storage_resize_up_down` and `test_input_mutation_storage_resize_not_supported` in test_aotdispatch.py until we figure out the right behavior for them.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129203
Approved by: https://github.com/bdhirsh
2024-06-23 06:07:19 +00:00
Animesh Jain
6b5fbc544e [dynamo] Use polyfill to trace through the attributes of torch.jit.* and lru_cache_wrapper (#128336)
Earlier we were taking the VT for `obj` and then monkeypatching `vt.source` to be `obj._torchdynamo_inline`. If one accessed `obj.attr_a`, this caused problems because Dynamo would then search for it in `obj._torchdynamo_inline.attr_a`. This PR makes it more functional, so that we have different VTs for `obj` and `obj._torchdynamo_inline`.

Fixes https://github.com/pytorch/pytorch/issues/93698

Pull Request resolved: https://github.com/pytorch/pytorch/pull/128336
Approved by: https://github.com/jansel, https://github.com/yanboliang
ghstack dependencies: #129117
2024-06-21 07:44:44 +00:00
chilli
a2b1673dfb [Horace's PR #126446] Prevent partitioner from ever saving views (#129039)
Most of the work was done by Horace in https://github.com/pytorch/pytorch/issues/126446; this PR just adds the config for it.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129039
Approved by: https://github.com/Chillee
2024-06-19 23:21:16 +00:00
Animesh Jain
c5e0b84484 [dynamo][trace_rules] Remove incorrectly classified Ingraph functions (#128428)
Co-authored-by: Laith Sakka <lsakka@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/128428
Approved by: https://github.com/yanboliang, https://github.com/mlazos
2024-06-19 00:06:46 +00:00
PyTorch MergeBot
5bc9835d64 Revert "[dynamo][trace_rules] Remove incorrectly classified Ingraph functions (#128428)"
This reverts commit c52eda896e.

Reverted https://github.com/pytorch/pytorch/pull/128428 on behalf of https://github.com/anijain2305 due to luca saw bad compile time ([comment](https://github.com/pytorch/pytorch/pull/128453#issuecomment-2176877667))
2024-06-18 20:09:00 +00:00
Animesh Jain
b0282071c4 [dynamo] override torch.nn.modules.activation._is_make_fx_tracing (#128748)
Discovered while inlining `MultiHeadAttention` nn Module.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/128748
Approved by: https://github.com/jansel
ghstack dependencies: #128315
2024-06-17 08:49:29 +00:00
Animesh Jain
7e092a62e6 [dynamo] Support weakref objects (#128533)
Fixes https://github.com/pytorch/pytorch/issues/125720

I was earlier worried that DELETE_* or STORE_* on referent values would have to result in a graph break, because they could invalidate the weakref. But then @zou3519 pointed out that weakref invalidation happens EVENTUALLY; CPython provides no guarantees on when the weakref will be invalidated (even when the user calls del x and x is the last reference).

So any code that relies on del x to invalidate the weakref of x right away is BAD code; CPython provides no guarantees. Therefore we can (ab)use this nuance and just ignore DELETE_* or STORE_* on the referent objects.

The only corner case is when Dynamo is reconstructing the weakref object. Dynamo will have a hard time being correct here, so we just SKIP_FRAME in such a case. This is rare.
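A minimal sketch of the kind of pattern this enables (illustrative only; the `Holder` class and closed-over weakref are assumptions for the example):

```python
import weakref

import torch

class Holder:
    def __init__(self, t):
        self.t = t

h = Holder(torch.ones(3))
r = weakref.ref(h)

@torch.compile(backend="eager")
def f(x):
    return x + r().t  # dereference the weakref inside the compiled region

print(f(torch.ones(3)))
```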

CPython notes
1) https://docs.python.org/3/library/weakref.html
2) https://docs.python.org/3/reference/datamodel.html#index-2

Pull Request resolved: https://github.com/pytorch/pytorch/pull/128533
Approved by: https://github.com/jansel
2024-06-15 02:16:25 +00:00
Simon Fan
4b96575a09 [dynamo][aot autograd] Silently disable default saved tensor hooks during tracing (#123196)
FIXES #113263. Same idea as in https://github.com/pytorch/pytorch/pull/113417, but we need a more intrusive C API to silently no-op the default saved tensor hooks, in order to support user code that uses torch.autograd.disable_saved_tensors_hooks (see test_unpack_hooks_can_be_disabled). We mock the output of get_hooks while leaving push/pop untouched.

For compiled autograd, we're firing pack hooks once and unpack hooks twice right now; I'll look into this separately from this issue.
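For context, a sketch of the user-facing API involved (the change here concerns how Dynamo treats the *default* hooks during tracing, not this API itself):

```python
import torch

def pack(t):
    return t.cpu()  # e.g. offload saved activations

def unpack(t):
    return t  # bring them back on unpack

x = torch.ones(3, requires_grad=True)
with torch.autograd.graph.saved_tensors_hooks(pack, unpack):
    y = torch.compile(lambda a: (a.sin() ** 2).sum(), backend="aot_eager")(x)
y.backward()
```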

Pull Request resolved: https://github.com/pytorch/pytorch/pull/123196
Approved by: https://github.com/soulitzer
2024-06-14 20:28:08 +00:00
Animesh Jain
c52eda896e [dynamo][trace_rules] Remove incorrectly classified Ingraph functions (#128428)
Co-authored-by: Laith Sakka <lsakka@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/128428
Approved by: https://github.com/yanboliang, https://github.com/mlazos
ghstack dependencies: #126578, #128440, #128470, #128453, #128484
2024-06-13 06:08:56 +00:00
Animesh Jain
05711eece9 [dynamo][inlining inbuilt modules] Ensure BC for nn_module_stack (#128295)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/128295
Approved by: https://github.com/ydwu4
2024-06-10 23:11:04 +00:00
laithsakka
5b3624117a update test_issue175 to handle inline_inbuilt_nn_modules (#128026)
With inlining, the output graph has more function calls; the test that counts the number of function calls is updated to reflect this.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/128026
Approved by: https://github.com/anijain2305
ghstack dependencies: #127553
2024-06-07 22:07:16 +00:00
laithsakka
78a6b0c479 update test_reformer_train test to handle nn module inlining (#127467)
The number of call nodes increases due to inlining.
before inlining:
```
 class GraphModule(torch.nn.Module):
        def forward(self, function_ctx, cat: "f32[1, s0, 512]"):
            # No stacktrace found for following nodes
            _set_grad_enabled = torch._C._set_grad_enabled(False)

            # File: /data/users/lsakka/pytorch/pytorch/test/dynamo/test_repros.py:283 in backward, code: grad_attn_output, grad_hidden_states = torch.chunk(
            chunk = torch.chunk(cat, 2, dim = -1);  cat = None
            getitem: "f32[1, s0, 256]" = chunk[0]
            getitem_1: "f32[1, s0, 256]" = chunk[1];  chunk = None

            # No stacktrace found for following nodes
            _set_grad_enabled_1 = torch._C._set_grad_enabled(True)
            return (getitem_1, None)
```

after inlining:
```
class GraphModule(torch.nn.Module):
    def forward(self, s0: "Sym(s0)", L_hidden_states_: "f32[1, s0, 256]", L_self_layers_0_weight: "f32[256, 256]", L_self_layers_0_bias: "f32[256]", L_self_layer_norm_weight: "f32[512]", L_self_layer_norm_bias: "f32[512]", L_self_layer_norm_normalized_shape_0_: "Sym(512)"):
        l_hidden_states_ = L_hidden_states_
        l_self_layers_0_weight = L_self_layers_0_weight
        l_self_layers_0_bias = L_self_layers_0_bias
        l_self_layer_norm_weight = L_self_layer_norm_weight
        l_self_layer_norm_bias = L_self_layer_norm_bias
        l_self_layer_norm_normalized_shape_0_ = L_self_layer_norm_normalized_shape_0_

        # File: /data/users/lsakka/pytorch/pytorch/test/dynamo/test_repros.py:332 in forward, code: hidden_states = torch.cat([hidden_states, hidden_states], dim=-1)
        hidden_states: "f32[1, s0, 512]" = torch.cat([l_hidden_states_, l_hidden_states_], dim = -1);  l_hidden_states_ = None

        # File: /data/users/lsakka/pytorch/pytorch/test/dynamo/test_repros.py:333 in forward, code: hidden_states = _ReversibleFunction.apply(
        function_ctx = torch.autograd.function.FunctionCtx()

        # File: /data/users/lsakka/pytorch/pytorch/test/dynamo/test_repros.py:258 in forward, code: hidden_states, attn_output = torch.chunk(hidden_states, 2, dim=-1)
        chunk = torch.chunk(hidden_states, 2, dim = -1);  hidden_states = None
        hidden_states_1: "f32[1, s0, 256]" = chunk[0]
        attn_output: "f32[1, s0, 256]" = chunk[1];  chunk = None

        # File: /data/users/lsakka/pytorch/pytorch/torch/nn/modules/linear.py:116 in forward, code: return F.linear(input, self.weight, self.bias)
        attn_output_1: "f32[1, s0, 256]" = torch._C._nn.linear(attn_output, l_self_layers_0_weight, l_self_layers_0_bias);  attn_output = l_self_layers_0_weight = l_self_layers_0_bias = None

        # File: /data/users/lsakka/pytorch/pytorch/test/dynamo/test_repros.py:272 in forward, code: ctx.save_for_backward(attn_output.detach(), hidden_states.detach())
        detach: "f32[1, s0, 256]" = attn_output_1.detach()
        detach_1: "f32[1, s0, 256]" = hidden_states_1.detach()

        # File: /data/users/lsakka/pytorch/pytorch/test/dynamo/test_repros.py:279 in forward, code: return torch.cat([attn_output, hidden_states], dim=-1)
        hidden_states_2: "f32[1, s0, 512]" = torch.cat([attn_output_1, hidden_states_1], dim = -1);  attn_output_1 = hidden_states_1 = None

        # File: /data/users/lsakka/pytorch/pytorch/torch/nn/modules/normalization.py:201 in forward, code: return F.layer_norm(
        hidden_states_3: "f32[1, s0, 512]" = torch.nn.functional.layer_norm(hidden_states_2, (l_self_layer_norm_normalized_shape_0_,), l_self_layer_norm_weight, l_self_layer_norm_bias, 1e-12);  hidden_states_2 = l_self_layer_norm_normalized_shape_0_ = l_self_layer_norm_weight = l_self_layer_norm_bias = None

        # File: /data/users/lsakka/pytorch/pytorch/test/dynamo/test_repros.py:352 in forward, code: hidden_states = torch.nn.functional.dropout(
        hidden_states_4: "f32[1, s0, 512]" = torch.nn.functional.dropout(hidden_states_3, p = 0.5, training = True);  hidden_states_3 = None
        return (hidden_states_4,)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127467
Approved by: https://github.com/anijain2305
ghstack dependencies: #126444, #127146, #127424, #127440
2024-06-06 17:56:36 +00:00
Animesh Jain
c7e936a56a [dynamo] Tensorvariable - track grad with _grad field (#127785)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127785
Approved by: https://github.com/jansel
2024-06-04 18:25:46 +00:00
Animesh Jain
4aa7a1efcf [dynamo] Initial exception handling support (#126923)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/126923
Approved by: https://github.com/williamwen42, https://github.com/jansel
2024-06-01 13:00:32 +00:00
Animesh Jain
159632aecd [dynamo] Support hasattr on BuiltinVariable (#127372)
Fixes https://github.com/pytorch/pytorch/issues/127172

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127372
Approved by: https://github.com/williamwen42, https://github.com/yanboliang
ghstack dependencies: #127377
2024-05-31 04:23:56 +00:00
Animesh Jain
51b22d9cf2 [dynamo] Support enum construction (#127364)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127364
Approved by: https://github.com/yanboliang
ghstack dependencies: #127263
2024-05-29 08:09:21 +00:00
Michael Hsu
85172fbe84 Back out "Prevent partitioner from ever saving views (#126446)" (#127316)
Summary: Revert "Prevent partitioner from ever saving views (#126446)" due to a torchinductor failure on CU Training Framework tests.

Reviewed By: Chillee

Differential Revision: D57868343

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127316
Approved by: https://github.com/Chillee
2024-05-29 00:29:44 +00:00
Xuehai Pan
26f4f10ac8 [5/N][Easy] fix typo for usort config in pyproject.toml (kown -> known): sort torch (#127126)
The `usort` config in `pyproject.toml` has no effect due to a typo. Fixing the typo makes `usort` do more and generates the changes in this PR. Except for `pyproject.toml`, all changes are generated by `lintrunner -a --take UFMT --all-files`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127126
Approved by: https://github.com/kit1980
2024-05-27 14:49:57 +00:00
PyTorch MergeBot
55c0ab2887 Revert "[5/N][Easy] fix typo for usort config in pyproject.toml (kown -> known): sort torch (#127126)"
This reverts commit 7763c83af6.

Reverted https://github.com/pytorch/pytorch/pull/127126 on behalf of https://github.com/XuehaiPan due to Broken CI ([comment](https://github.com/pytorch/pytorch/pull/127126#issuecomment-2133044286))
2024-05-27 09:22:08 +00:00
Xuehai Pan
7763c83af6 [5/N][Easy] fix typo for usort config in pyproject.toml (kown -> known): sort torch (#127126)
The `usort` config in `pyproject.toml` has no effect due to a typo. Fixing the typo makes `usort` do more and generates the changes in this PR. Except for `pyproject.toml`, all changes are generated by `lintrunner -a --take UFMT --all-files`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127126
Approved by: https://github.com/kit1980
ghstack dependencies: #127122, #127123, #127124, #127125
2024-05-27 04:22:18 +00:00
Xuehai Pan
a28bfb5ed5 [4/N][Easy] fix typo for usort config in pyproject.toml (kown -> known): sort functorch (#127125)
The `usort` config in `pyproject.toml` has no effect due to a typo. Fixing the typo makes `usort` do more and generates the changes in this PR. Except for `pyproject.toml`, all changes are generated by `lintrunner -a --take UFMT --all-files`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127125
Approved by: https://github.com/Skylion007
ghstack dependencies: #127122, #127123, #127124
2024-05-25 22:45:38 +00:00
Yidi Wu
2d6d2dbc0b [dynamo] make callable(nn_module) return True (#127026)
Before this PR, we had a graph break for `callable(nn_module)`:
```python
class M(nn.Module):
    def forward(self, x):
        return x.sin()

def f(m):
    return callable(m)

res = torch.compile(f, fullgraph=True)(M())
```

```
Traceback (most recent call last):
  File "/data/users/yidi/pytorch/t.py", line 17, in <module>
    out = torch.compile(f, backend="eager", fullgraph=True)(M())
  File "/data/users/yidi/pytorch/torch/_dynamo/eval_frame.py", line 414, in _fn
    return fn(*args, **kwargs)
  File "/data/users/yidi/pytorch/torch/_dynamo/convert_frame.py", line 1077, in catch_errors
    return callback(frame, cache_entry, hooks, frame_state, skip=1)
  File "/data/users/yidi/pytorch/torch/_dynamo/convert_frame.py", line 456, in _convert_frame_assert
    return _compile(
  File "/data/users/yidi/pytorch/torch/_utils_internal.py", line 74, in wrapper_function
    return function(*args, **kwargs)
  File "/home/yidi/.conda/envs/pytorch/lib/python3.10/contextlib.py", line 79, in inner
    return func(*args, **kwds)
  File "/data/users/yidi/pytorch/torch/_dynamo/convert_frame.py", line 799, in _compile
    guarded_code = compile_inner(code, one_graph, hooks, transform)
  File "/data/users/yidi/pytorch/torch/_dynamo/utils.py", line 210, in time_wrapper
    r = func(*args, **kwargs)
  File "/data/users/yidi/pytorch/torch/_dynamo/convert_frame.py", line 618, in compile_inner
    out_code = transform_code_object(code, transform)
  File "/data/users/yidi/pytorch/torch/_dynamo/bytecode_transformation.py", line 1167, in transform_code_object
    transformations(instructions, code_options)
  File "/data/users/yidi/pytorch/torch/_dynamo/convert_frame.py", line 177, in _fn
    return fn(*args, **kwargs)
  File "/data/users/yidi/pytorch/torch/_dynamo/convert_frame.py", line 564, in transform
    tracer.run()
  File "/data/users/yidi/pytorch/torch/_dynamo/symbolic_convert.py", line 2244, in run
    super().run()
  File "/data/users/yidi/pytorch/torch/_dynamo/symbolic_convert.py", line 886, in run
    while self.step():
  File "/data/users/yidi/pytorch/torch/_dynamo/symbolic_convert.py", line 801, in step
    self.dispatch_table[inst.opcode](self, inst)
  File "/data/users/yidi/pytorch/torch/_dynamo/symbolic_convert.py", line 496, in wrapper
    return inner_fn(self, inst)
  File "/data/users/yidi/pytorch/torch/_dynamo/symbolic_convert.py", line 1255, in CALL_FUNCTION
    self.call_function(fn, args, {})
  File "/data/users/yidi/pytorch/torch/_dynamo/symbolic_convert.py", line 739, in call_function
    self.push(fn.call_function(self, args, kwargs))
  File "/data/users/yidi/pytorch/torch/_dynamo/variables/builtin.py", line 948, in call_function
    return handler(tx, args, kwargs)
  File "/data/users/yidi/pytorch/torch/_dynamo/variables/builtin.py", line 711, in <lambda>
    return lambda tx, args, kwargs: obj.call_function(
  File "/data/users/yidi/pytorch/torch/_dynamo/variables/builtin.py", line 948, in call_function
    return handler(tx, args, kwargs)
  File "/data/users/yidi/pytorch/torch/_dynamo/variables/builtin.py", line 835, in builtin_dipatch
    unimplemented(error_msg)
  File "/data/users/yidi/pytorch/torch/_dynamo/exc.py", line 216, in unimplemented
    raise Unsupported(msg)
torch._dynamo.exc.Unsupported: builtin: callable [<class 'torch._dynamo.variables.nn_module.NNModuleVariable'>] False
```
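After this change, the snippet above compiles without the graph break and `res` is `True`.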

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127026
Approved by: https://github.com/jansel
2024-05-24 18:31:43 +00:00
Animesh Jain
f0366de414 [dynamo] Support __contains__ on obj.__dict__ (#126922)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/126922
Approved by: https://github.com/jansel, https://github.com/yanboliang
2024-05-23 09:01:29 +00:00
chilli
d4ec18bdad Prevent partitioner from ever saving views (#126446)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/126446
Approved by: https://github.com/anijain2305
ghstack dependencies: #126615
2024-05-22 17:28:46 +00:00
PyTorch MergeBot
0f37fd06d9 Revert "Prevent partitioner from ever saving views (#126446)"
This reverts commit da2292ce6b.

Reverted https://github.com/pytorch/pytorch/pull/126446 on behalf of https://github.com/DanilBaibak due to Break internal build ([comment](https://github.com/pytorch/pytorch/pull/126615#issuecomment-2124169157))
2024-05-22 08:23:40 +00:00
chilli
da2292ce6b Prevent partitioner from ever saving views (#126446)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/126446
Approved by: https://github.com/anijain2305
ghstack dependencies: #126615
2024-05-20 23:40:56 +00:00
Huy Do
abc4b66124 Forward fix the failed new test from D57474327 (#126596)
Summary: TSIA. The two look the same to me, but buck was failing with the following error when `with torch._inductor.utils.fresh_inductor_cache()` is used:

```
_________________________ ReproTests.test_issue126128 __________________________

self = <caffe2.test.dynamo.test_repros.ReproTests testMethod=test_issue126128>

    def test_issue126128(self):
        def fn():
            x = torch.randn(1, 10)
            y = torch.randn(10, 1)
            return torch.mm(x, y).sum()

        def fn2():
            x = torch.randn(10, 100)
            y = torch.randn(100, 10)
            return torch.mm(x, y).sum()

>       with torch._inductor.utils.fresh_inductor_cache():
E       AttributeError: module 'torch._inductor' has no attribute 'utils'
```

Test Plan: `buck2 test 'fbcode//mode/opt' fbcode//caffe2/test/dynamo:test_dynamo -- --exact 'caffe2/test/dynamo:test_dynamo - test_repros.py::ReproTests::test_issue126128'`

Differential Revision: D57516676

Pull Request resolved: https://github.com/pytorch/pytorch/pull/126596
Approved by: https://github.com/xmfan
2024-05-18 23:56:03 +00:00
Matthew Hoffman
81277baa0c Remove removed ruff rule TRY200 (#126256)
My TOML linter is complaining that "TRY200" is not acceptable for the `tool.ruff.lint` schema.

From the ruff docs: https://docs.astral.sh/ruff/rules/reraise-no-cause/

> This rule has been removed and its documentation is only available for historical reasons.
>
> This rule is identical to [B904](https://docs.astral.sh/ruff/rules/raise-without-from-inside-except/) which should be used instead.

and we are currently explicitly ignoring B904.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/126256
Approved by: https://github.com/Skylion007
2024-05-17 16:31:05 +00:00
Simon Fan
cef7756c9c [inductor] Clear cache on ctx manager exit (#126146)
FIXES https://github.com/pytorch/pytorch/issues/126128.

Right now, we only clear the cache on ctx manager enter, so stale state persists unless we call fresh_inductor_cache again; this is usually fine in tests.

Cue compiled autograd tests when going from TestCompiledAutograd -> TestAutogradWithCompiledAutograd: TestCompiledAutograd uses the ctx manager, but TestAutogradWithCompiledAutograd doesn't.
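
A minimal sketch of the fix's shape (the dict here is a stand-in for inductor's real in-memory cache, not the actual implementation):

```
import contextlib

_CACHE = {}  # stand-in for inductor's in-memory compilation cache

@contextlib.contextmanager
def fresh_cache():
    _CACHE.clear()  # previous behavior: clear only on enter
    try:
        yield
    finally:
        _CACHE.clear()  # the fix: also clear on exit so state cannot leak
```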

Pull Request resolved: https://github.com/pytorch/pytorch/pull/126146
Approved by: https://github.com/jgong5, https://github.com/oulgen
ghstack dependencies: #126144
2024-05-16 22:23:02 +00:00
William Wen
56a89fcc08 [dynamo] graph break on issubclass call with non-const args (#125943)
Fixes https://github.com/pytorch/pytorch/issues/125942

Pull Request resolved: https://github.com/pytorch/pytorch/pull/125943
Approved by: https://github.com/jansel
ghstack dependencies: #125882
2024-05-15 23:22:06 +00:00
William Wen
100e3c1205 [dynamo] graph break on const dict KeyError (#125882)
Fixes https://github.com/pytorch/pytorch/issues/125866

Pull Request resolved: https://github.com/pytorch/pytorch/pull/125882
Approved by: https://github.com/jansel
2024-05-15 23:22:06 +00:00
William Wen
4a8db9d45b [dynamo] reset grad state in aotdispatch test, add failing trace functional tensor test to dynamo (#126113)
Workaround for https://github.com/pytorch/pytorch/issues/125568.

We could add additional global state to reset (e.g. autocast?) or move this setup/teardown to a more general place.

Also added a minimal repro for the linked issue - will investigate in a followup PR.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/126113
Approved by: https://github.com/ezyang, https://github.com/bdhirsh
2024-05-14 20:42:49 +00:00
Edward Z. Yang
db3b38202b Improve dead code elimination of unnecessary int arguments (#126074)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/126074
Approved by: https://github.com/lezcano
ghstack dependencies: #125325, #125915
2024-05-14 17:22:30 +00:00
Brian Hirsh
f25c7c9699 functionalize storage resizing, minimal ppFSDP traceable forward (#122434)
More details further down, but first a more high-level description of "how do we functionalize storage resizing"

Today, dynamo converts `param.untyped_storage().resize_(x)` calls that it sees from fsdp into a custom op, `ops.inductor.resize_storage_bytes_(x)`

So given this setup, there are 3 main cases that I think we want to handle:

(1) graph input starts with a real storage size, gets resized down to zero in the graph
(2) graph input starts with 0 storage size, gets resized up in the graph
(3) graph input starts with 0 storage size, gets resized up and used in some compute, then resized back down to 0

For case (1) we need to emit a `resize_storage_bytes_` at the end of the graph, similar to how we emit `copy_()` for data mutations.

For case (2), we need to emit a `resize_storage_bytes_` in the graph, and we **also** need to emit a `copy_()` (the input had its storage resized up and filled in with data, which we need to reflect as an input mutation)

For case (3), the net effect is that the input had no data on entry and exit of the function, so we don't need to emit any mutable ops in the end of the graph.
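
A rough eager-level sketch of case (2) (the function and buffer names are assumptions based on the FSDP pattern above; functionalization should replay the mutations as a `resize_storage_bytes_` plus `copy_()` in the graph epilogue):

```
import torch

def fsdp_like_forward(param, all_gather_buffer):
    # case (2): param enters with zero-sized storage, gets resized up
    # and filled with data inside the graph (an input mutation)
    param.untyped_storage().resize_(all_gather_buffer.untyped_storage().nbytes())
    with torch.no_grad():
        param.copy_(all_gather_buffer)
    return param * 2

p = torch.nn.Parameter(torch.randn(4))
p.untyped_storage().resize_(0)  # simulate a freed FSDP parameter
out = fsdp_like_forward(p, torch.randn(4))
```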

The main thing to call out is that we need to write a functionalization rule for `resize_storage_bytes_` (`FunctionalTensorWrapper::storage_resize_()`), and this rule actually does very little. We would like to **not** emit any new ops in the graph (like, say, a functional resize op). Instead, we should expect / rely on the fact that any resize up will be immediately followed by a `copy_()`/`foreach_copy_`/`out=` op that will fill in the data of the tensor. So `FunctionalTensor` can temporarily live in a state where its data is invalid, until the `x.copy_(y)` "updates" its data with the new tensor.

So effectively, all that this rule does is:

(1) it stores metadata on the storage, indicating that the tensor was resized, as well as the updated storage size. We need this info in AOTAutograd, so it knows whether to emit a mutable resize_() op in the graph epilogue

(2) There is also a corner case: if we are resizing down to zero, but our tensor had **previously** had a zero size storage, then we update `value_` to point to the original value of the tensor. The reason this seems safe is because if we have a zero storage sized tensor `x`, and we resize it up, use it in some compute, resize it back down to zero, and use it somewhere, we would want the functional version of this code to use the original `x` after the second resize. For FSDP, this is important because we end up saving parameters (graph inputs) for backward, and we want to make sure that the thing we save (and the output to the forward graph) is the original, zero-storage-sized parameter, and not the "version 2" of the parameter after the first resize_()

I think a good order to look at changes in this PR would be:

(1) `test_aotdispatch.py` shows the 3 main cases I focused on as well as the expected functionalized graphs

(2) In `FunctionalStorageImpl.h/cpp`, I had to add a notion of "original base", and "original/curr_size". The first is so I can re-use the zero-size tensor after multiple resizes, and the second is so I can tell in AOTAutograd whether any resizes canceled each other out into a no-op

(3) FunctionalTensorWrapper.h/cpp has the new resize functionalization rule + some extra utils

(4) `_functorch/_aot_autograd`: the main changes in this folder were around adding the logic at trace-time to detect when we need to put a resize_() in the graph. I also have some assertions to check that any inputs that experience storage resizing will **always be in the graph** and not the opaque epilogue, and I also limited the resize_() mutation case so that you can only ever start with zero storage, or end with zero storage (you can't do e.g. `torch.ones(2).storage().resize_(3)`), and banned it on tensor subclasses

(5) `fake_tensor.py`/`meta_utils.py`: we now need to be able to fakeify tensors with zero storage, so I added a quick version of it in meta_utils.py. This also has ramifications for fake tensor caching that I need to fix (include the storage size on the cache key, maybe?)

------------------

This PR subsumes https://github.com/pytorch/pytorch/pull/120971.

This PR is enough to **almost** get a simple ppFSDP forward pass tracing with a functionalized resize_() properly. It also attempts to do the updated version from @jansel, where we don't have any notion of `resize_()` in the graph at all, post functionalization. It would probably be good to test it with @yf225 's FSDP changes, and see how many of the FX passes it allows us to remove. I think that in theory, it should allow us to remove all FX passes that affect the forward graph / partitioner, **except** the one that forces views to be recomputed in the backward (more details below).

There are a few things worth calling out:

(1) failed attempt at functionalizing `aten.copy_()`. I originally wanted to get a version that takes these operations:
```
param.storage().resize_(all_gather_size)
param.copy_(all_gather_buffer)
out = aten.matmul(param, param)
```
and functionalizes them into:
```
out = aten.matmul(all_gather_buffer, all_gather_buffer)
```

This would involve getting functionalization to turn `x.copy_(y)` into a giant no-op that just returns `y`. Unfortunately, we can't actually do this in a reasonable way within functionalization (instead, there's a functional `aten.copy` in the graph - see the test case graph expecttest for details). Why? In order for that transformation to be safe, `x` and `y` need to have the same metadata. However, it's possible for `x` and `y` to be subclasses of different types. This is not something we can easily tell from within functionalization, and would be a layering violation. So for now I'm leaving it to downstream code to optimize away the `aten.copy` (this is already the case today, so I think inductor can handle this)

(2) The forward doesn't **actually** run successfully in this PR (see the `assertRaisesRegex` in the test). Why?

The final forward graph looks like this:
```
def forward(self, primals_1, primals_2):
    _foreach_copy = torch.ops.aten._foreach_copy.default([primals_1], [primals_2]);  primals_2 = None
    getitem = _foreach_copy[0];  _foreach_copy = None
    mm = torch.ops.aten.mm.default(getitem, getitem);  getitem = None
    t_1 = torch.ops.aten.t.default(primals_1);  primals_1 = None
    return [mm, t_1]
```

Where `primals_1` starts out as a secretly-zero-storage-size parameter, and gets resized up and back down within the forward (these are functionalized away).

Importantly, the matmul happens on the result of the `foreach_copy`, **but** the activation that we save for backward (`t_1`) is the result of transposing the **original parameter** (the zero-storage-size param). This is exactly the optimization in fsdp that allows us to have good peak memory usage.

The problem is that the min-cut partitioner decides to save `t_1` for backward. Running this code in eager breaks, because the kernel for `aten.permute(x)` is not happy when `x` has secretly zero-sized storage.

The real problem here is that in eager mode the `permute` kernel runs during the backward, after backward hooks have properly resized the saved activation. Here, we are running the transpose in the forward.

One option would be to turn off the checks in our view kernels and allow them to work on zero-storage-sized tensors, which feels pretty bad. Another option is to tweak the partitioner (or use one of Will's FX passes) to force the partitioner to not save views for backward, and allow the views to be recomputed in the backward. This seems kind of silly, but is also probably harmless.

(3) The backward is still broken. To be fair, this issue is pretty separable from "functionalizing storage resize calls", and can be fixed later (either by a real fix to our tracing infra, or via another hacky FX pass). More description of this problem is described at issue (8) of my PR description in https://github.com/pytorch/pytorch/pull/120971

(4) I only added support for "full graph" resizing: basically, the limited case where a param starts with zero storage size, and gets resized up and back down. I think we can add support for the graph break case, but I think we can keep that add-on separate from this PR unless we need it immediately. I also added asserts so we should fail loudly when we hit this case

(5) I have a change to FakeTensor creation when inputs have zero storage size that.. is probably ok. But I also removed FakeTensor caching on view ops, which I probably need to fix before I can land this PR

(6) I added a notion of "original_base" to `FunctionalStorageImpl`. More details are in the comments, but my rationale for this was that we basically need it to ensure that autograd saves the **original**, zero-storage-sized param for backward, after resizing up and back down

(7) I had to update our eager kernels for `aten.copy` and `aten._foreach_copy`, to handle the case where the `self` argument has secretly-zero-storage. Inductor can probably generate correct code for this case, but we need these ops to work properly in this situation for the `aot_eager` backend to do the right thing

Pull Request resolved: https://github.com/pytorch/pytorch/pull/122434
Approved by: https://github.com/jansel
2024-05-10 18:09:10 +00:00
William Wen
ae20f15941 [dynamo] trace through nn parametrize (#125771)
Fix https://github.com/pytorch/pytorch/issues/120914

Example dynamo output graph (from test_nn_parametrize):
```
V0508 11:16:26.687000 140092517021504 torch/_dynamo/output_graph.py:1272] [0/0] [__graph_code] TRACED GRAPH
V0508 11:16:26.687000 140092517021504 torch/_dynamo/output_graph.py:1272] [0/0] [__graph_code]  ===== __compiled_fn_1 =====
V0508 11:16:26.687000 140092517021504 torch/_dynamo/output_graph.py:1272] [0/0] [__graph_code]  /data/users/williamwen/pytorch2/torch/fx/_lazy_graph_module.py class GraphModule(torch.nn.Module):
V0508 11:16:26.687000 140092517021504 torch/_dynamo/output_graph.py:1272] [0/0] [__graph_code]     def forward(self, L_x_: "f32[10, 10]"):
V0508 11:16:26.687000 140092517021504 torch/_dynamo/output_graph.py:1272] [0/0] [__graph_code]         l_x_ = L_x_
V0508 11:16:26.687000 140092517021504 torch/_dynamo/output_graph.py:1272] [0/0] [__graph_code]
V0508 11:16:26.687000 140092517021504 torch/_dynamo/output_graph.py:1272] [0/0] [__graph_code]         # File: /data/users/williamwen/pytorch2/torch/nn/utils/parametrize.py:275 in forward, code: x = self[0](self.original)
V0508 11:16:26.687000 140092517021504 torch/_dynamo/output_graph.py:1272] [0/0] [__graph_code]         l__self___parametrizations__param___original: "f32[10, 10]" = self.L__self___parametrizations__param___original
V0508 11:16:26.687000 140092517021504 torch/_dynamo/output_graph.py:1272] [0/0] [__graph_code]
V0508 11:16:26.687000 140092517021504 torch/_dynamo/output_graph.py:1272] [0/0] [__graph_code]         # File: /data/users/williamwen/pytorch2/test/dynamo/test_repros.py:4759 in forward, code: return torch.sin(x)
V0508 11:16:26.687000 140092517021504 torch/_dynamo/output_graph.py:1272] [0/0] [__graph_code]         x: "f32[10, 10]" = torch.sin(l__self___parametrizations__param___original);  l__self___parametrizations__param___original = None
V0508 11:16:26.687000 140092517021504 torch/_dynamo/output_graph.py:1272] [0/0] [__graph_code]
V0508 11:16:26.687000 140092517021504 torch/_dynamo/output_graph.py:1272] [0/0] [__graph_code]         # File: /data/users/williamwen/pytorch2/test/dynamo/test_repros.py:4755 in forward, code: return self.param @ x
V0508 11:16:26.687000 140092517021504 torch/_dynamo/output_graph.py:1272] [0/0] [__graph_code]         matmul: "f32[10, 10]" = x @ l_x_;  x = l_x_ = None
V0508 11:16:26.687000 140092517021504 torch/_dynamo/output_graph.py:1272] [0/0] [__graph_code]         return (matmul,)
V0508 11:16:26.687000 140092517021504 torch/_dynamo/output_graph.py:1272] [0/0] [__graph_code]
```
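
For reference, a hedged sketch of the kind of parametrized module that produces a graph like the one above (the exact test code is an assumption):

```
import torch
import torch.nn.utils.parametrize as parametrize

class Sin(torch.nn.Module):
    def forward(self, x):
        return torch.sin(x)

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.param = torch.nn.Parameter(torch.randn(10, 10))

    def forward(self, x):
        return self.param @ x

m = M()
parametrize.register_parametrization(m, "param", Sin())
out = torch.compile(m, fullgraph=True)(torch.randn(10, 10))
```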

Pull Request resolved: https://github.com/pytorch/pytorch/pull/125771
Approved by: https://github.com/jbschlosser
ghstack dependencies: #125710, #125724
2024-05-09 17:43:48 +00:00
William Wen
ff090c6937 [dynamo] support tracing nn.Module @property that accesses closure cells (#125724)
Fix https://github.com/pytorch/pytorch/issues/125702

Pull Request resolved: https://github.com/pytorch/pytorch/pull/125724
Approved by: https://github.com/jansel, https://github.com/jbschlosser
ghstack dependencies: #125710
2024-05-08 23:25:39 +00:00
William Wen
93f3d561f9 [dynamo] don't make nn parametrized Modules unspecialized (#125710)
Workaround for https://github.com/pytorch/pytorch/issues/125314 and https://github.com/pytorch/pytorch/issues/125478.

We no longer make parametrized nn.Modules unspecialized. Instead, when we are about to call a function from the `torch.nn.utils.parametrize` module, we skip the frame.

The script from https://github.com/pytorch/pytorch/issues/125314 now outputs
```
parametrize=True: 6587ms
parametrize=False: 1729ms
parametrize=True: 4497ms
parametrize=False: 1539ms
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/125710
Approved by: https://github.com/jansel, https://github.com/jbschlosser
2024-05-08 23:25:39 +00:00
Edward Z. Yang
ecd62746e3 Also pull size/stride info from example_value (#125505)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/125505
Approved by: https://github.com/jansel
2024-05-05 22:27:46 +00:00
Arun Pa
00c5859aeb [dynamo] Add support for DELETE_SUBSCR (#123526)
Fixes #123317

Co-authored-by: Jason Ansel <jansel@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123526
Approved by: https://github.com/jansel
2024-04-25 22:07:24 +00:00
Yanbo Liang
72a34eeb99 Dynamo x autograd.Function supports non-{Tensor, symnode, constant} inputs (#124360)
Fixes #118395

Pull Request resolved: https://github.com/pytorch/pytorch/pull/124360
Approved by: https://github.com/zou3519
2024-04-22 23:32:54 +00:00
Andrew M. James
64f42bfd52 [dynamo] Support list.reverse (#124210)
fixes #123974

Pull Request resolved: https://github.com/pytorch/pytorch/pull/124210
Approved by: https://github.com/peterbell10
2024-04-17 23:33:32 +00:00
Xuehai Pan
93e249969b [BE] enable ruff rule RSE and remove useless parentheses in raise statements (#124261)
Remove useless parentheses in `raise` statements if the exception type is raised with no argument.
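
For example:

```
def check(x):
    if x < 0:
        raise ValueError  # was: raise ValueError(); the parentheses add nothing
```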

Pull Request resolved: https://github.com/pytorch/pytorch/pull/124261
Approved by: https://github.com/albanD
2024-04-17 19:29:34 +00:00
William Wen
2b3594f90e [dynamo] fix call_finally issue in Python 3.8 (#124122)
Fix https://github.com/pytorch/pytorch/issues/97811 again...

Pull Request resolved: https://github.com/pytorch/pytorch/pull/124122
Approved by: https://github.com/jansel
2024-04-16 08:36:20 +00:00
Edward Z. Yang
e4efa311f1 Refactor test_tensor_set_data to be parametrized (#124105)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/124105
Approved by: https://github.com/albanD
2024-04-16 03:23:41 +00:00
Jason Ansel
f3fd280238 [dynamo] Relax strict_mode for autograd.Function forward inputs (#123910)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123910
Approved by: https://github.com/oulgen
2024-04-13 19:41:59 +00:00
Animesh Jain
58afcd7b61 [dynamo][dict] Add UnspecializedNNModuleVariable to dict keys (#122812)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/122812
Approved by: https://github.com/jansel
ghstack dependencies: #122943, #123877, #123878
2024-04-13 02:07:35 +00:00
Jason Ansel
e3935783f7 [dynamo] Fix @property on user-defined nn.Module (#123804)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123804
Approved by: https://github.com/anijain2305
ghstack dependencies: #123700, #123705, #123786, #123790, #123803
2024-04-12 19:03:13 +00:00
Brian Hirsh
f9f7ef33c4 AOTAutograd: add config to error when overlapping input checks would cause slow compile / runtimes (#123455)
We should eventually make the non-overlapping checks faster when dynamic shapes are enabled, but this is pretty difficult to do. So for now this PR adds a config that lets us fail fast when this situation happens, instead of causing compile times to secretly come to a crawl.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/123455
Approved by: https://github.com/ezyang
2024-04-12 13:25:33 +00:00
Brian Hirsh
96fe3c5d46 fix correctness for dynamo inlining RangeVariable __contains__ (#122751)
Fixes https://github.com/pytorch/pytorch/issues/122379

It looks like `iter_contains()` in dynamo expects to take in something like `iter_contains(List[VariableTracker], VariableTracker)`. Previously, when we called this function where the list in question was a `RangeVariable`, we would pass in `RangeVariable.items` as our list.

This is wrong, though, since `RangeVariable.items` just contains the underlying [start, stop, step]. It looks like `unpack_var_sequence` does the right thing of "materializing" the range into a list of `VariableTrackers`, so I used that instead.
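
A plain-Python illustration of the semantics being fixed (not dynamo internals):

```
r = range(0, 10, 2)
assert 4 in r and 10 not in r          # real range semantics
items = [r.start, r.stop, r.step]      # [0, 10, 2]: what .items held
assert 10 in items and 4 not in items  # checking against these is wrong
```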

Pull Request resolved: https://github.com/pytorch/pytorch/pull/122751
Approved by: https://github.com/anijain2305, https://github.com/jansel
ghstack dependencies: #122502
2024-04-12 01:12:23 +00:00
Brian Hirsh
2fe672b146 compile: ban mutations on non-compositional uses of as_strided (#122502)
Fixes https://github.com/pytorch/pytorch/issues/104505

I was originally going to ban all usages of as_strided + mutation in functionalization. But I'm pretty sure that as_strided + mutation is fine when we are calling as_strided on a base tensor.

So in this PR I added a slightly more conservative check: if we see an as_strided + mutation, where the input to an as_strided was **another** view op, then I error loudly in functionalization and link to the github issue above (in case anyone runs into this in the real world)
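
A hedged sketch of the now-banned pattern (whether this exact snippet errors, and with what message, is an assumption based on the description above):

```
import torch

@torch.compile(backend="aot_eager")
def f(x):
    v = x[1:]                     # a view...
    w = v.as_strided((2,), (1,))  # ...fed into as_strided
    w.mul_(2)                     # mutation: functionalization now errors here
    return w

# f(torch.randn(4))  # expected to raise and point to the linked issue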

Pull Request resolved: https://github.com/pytorch/pytorch/pull/122502
Approved by: https://github.com/ezyang, https://github.com/albanD
2024-04-12 01:12:23 +00:00
Jason Ansel
b3feb01910 [dynamo] Update co_names if needed in fix_vars (#123697)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123697
Approved by: https://github.com/williamwen42
2024-04-11 01:00:01 +00:00
Animesh Jain
7283c37c98 [dynamo] Keep guards on global function (#123423)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123423
Approved by: https://github.com/jansel
2024-04-09 04:23:11 +00:00
Jason Ansel
d8e0c26e64 [dynamo] Support warnings.catch_warnings (#123511)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123511
Approved by: https://github.com/anijain2305
2024-04-08 22:27:46 +00:00
Jason Ansel
212e460dce [dynamo] Support custom __setattr__ on UserDefinedObjectVariable (#123318)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123318
Approved by: https://github.com/anijain2305
2024-04-07 21:06:52 +00:00
Animesh Jain
5d0ac887b9 [dynamo][higher order ops] Make the subgraph sourceless (#123071)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123071
Approved by: https://github.com/jansel, https://github.com/zou3519
ghstack dependencies: #123046, #123058, #123059
2024-04-01 21:09:41 +00:00
Jason Ansel
781e8d2201 [dynamo] Support __next__ on UserDefinedObjectVariable (#122565)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/122565
Approved by: https://github.com/yanboliang
2024-03-31 19:00:03 +00:00
Edward Z. Yang
deeeaded1f Add metas for randint/rand factory functions out overload (#122375)
Fixes https://github.com/pytorch/pytorch/issues/121897

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/122375
Approved by: https://github.com/lezcano
2024-03-25 04:01:38 +00:00
Jason Ansel
3e4a4bea12 [dynamo] Graph break on SymNode control flow (#122546)
Fixes #111918

Pull Request resolved: https://github.com/pytorch/pytorch/pull/122546
Approved by: https://github.com/ezyang
2024-03-24 07:22:02 +00:00
Jason Ansel
5f7e71c411 [dynamo] Add HASATTR guard for UserDefinedObject attrs (#122555)
Fixes #111522

Pull Request resolved: https://github.com/pytorch/pytorch/pull/122555
Approved by: https://github.com/Skylion007
2024-03-24 03:41:58 +00:00
Edward Z. Yang
c2651a7f0e Make check_is_size clamp to sys.maxsize - 1, so sys.maxsize comparison returns False (#122372)
Partially fixes https://github.com/pytorch/pytorch/issues/113002
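
A sketch of the intended effect (assuming `capture_scalar_outputs` so the `.item()` call traces):

```
import sys
import torch

torch._dynamo.config.capture_scalar_outputs = True

@torch.compile(fullgraph=True)
def f(x):
    u0 = x.item()
    torch._check_is_size(u0)
    # with u0 clamped to at most sys.maxsize - 1, this comparison can be
    # statically evaluated to False rather than erroring on the unbacked symbol
    if u0 == sys.maxsize:
        return x * 0
    return x * 2

f(torch.tensor([4]))
```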

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/122372
Approved by: https://github.com/lezcano
ghstack dependencies: #122370
2024-03-21 17:14:42 +00:00
albanD
53d5276d69 Improve Dynamo support for torch function and class methods in general (#121365)
I was originally trying to solve https://github.com/pytorch/pytorch/issues/120799 but got sidetracked along the way.
This PR contains a couple fixes. Let me know if you want me to split them up!

- Properly handle invalid user code when "super()" is called from a non-method/classmethod. It will now properly raise the same error as CPython (see the sketch after this list)
- Fix base VariableTracker `__str__` method shadowing all `__repr__` methods defined in subclasses
- Fix accessing a classmethod on a user object to bind "cls" and not "self"
- Fix custom class handling of super() call to properly handle mixed regular/class/static methods
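
A hedged example of the first fix (the error text is CPython's, which dynamo should now match):

```
import torch

def free_function(x):
    return super().__len__()  # invalid: zero-arg super() outside a method

@torch.compile
def f(x):
    return free_function(x)

# f(torch.randn(3))  # now raises the same error as eager:
# RuntimeError: super(): __class__ cell not found
```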

Locally, test_repros.py -k test_batch_norm_act still fails, where the generated graph module is:
```
Call using an FX-traced Module, line 8 of the traced Module's generated forward function:
    x = self.forward(l_x_);  self = l_x_ = None
    x_1 = self.L__self___act(x);  x = None
```
note that "self" is being unset on the first line even though it is used on the second one.
For reference, this is the test c268ce4a6d/test/dynamo/test_repros.py (L1368-L1369)
I cannot figure out where the generated forward function comes from though, any hint would be welcome!

Pull Request resolved: https://github.com/pytorch/pytorch/pull/121365
Approved by: https://github.com/jansel
2024-03-08 20:03:49 +00:00
Tugsbayasgalan Manlaibaatar
f01a23d01b Don't aggressively rewrite asserts for symbolic expressions (#120564)
Fixes: https://github.com/pytorch/pytorch/issues/118417

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120564
Approved by: https://github.com/ezyang
2024-03-01 17:46:36 +00:00
Edward Z. Yang
2a08a51738 Add _assert_scalar and teach Inductor to codegen it (#114148)
Inductor codegen for `_assert_async` is currently disabled because we don't really understand how to codegen `scalar_to_tensor` on a Sympy expression. I initially tried to see if I could get this to work, but I got into some weird problem involving stride sorting, so I decided to fix it properly by not going through a tensor.

So we introduce an `_assert_scalar` which takes a scalar as an argument, avoiding needing to turn a SymBool into a tensor before asserting on it. I also add `_functional_assert_scalar` for good luck, although this doesn't do anything right now because https://github.com/pytorch/pytorch/pull/104203 still hasn't been landed.
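
An eager-level sketch of the before/after calling convention (in practice these calls are emitted by the compiler, not written by hand):

```
import torch

x = torch.randn(3)
cond = x.size(0) > 0  # a plain bool here; a SymBool under tracing

# before: asserting required materializing the condition as a tensor
torch.ops.aten._assert_async.msg(torch.scalar_tensor(cond), "expected nonempty")

# after: _assert_scalar takes the scalar directly, no tensor round-trip
torch.ops.aten._assert_scalar.default(cond, "expected nonempty")
```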

I need to customize the codegen for this operator, so I decide to directly implement it in Inductor, rather than trying to treat it as a generic ExternKernel. This leads to the new AssertScalar IR node. This is written carefully so that it doesn't get DCE'd by Inductor.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114148
Approved by: https://github.com/jansel
ghstack dependencies: #120800
2024-03-01 05:06:36 +00:00
Animesh Jain
e7039e3a0b [dynamo][easy] Dynamo test changes (#120927)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/120927
Approved by: https://github.com/yanboliang
ghstack dependencies: #120864, #120730
2024-02-29 22:05:41 +00:00
Brian Hirsh
cccacf6c8e add a test that non_overlapping checks dont generate too many guards (#120106)
Pre-emptive test in OSS to ensure that models relying on the "non-overlapping guards" checks do not suffer drastically w.r.t. guard slowness. Current plan is to follow up on this with a "real" fix, to generate a linear number of these guards.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120106
Approved by: https://github.com/mlazos
2024-02-20 18:38:59 +00:00
Brian Hirsh
6819452a08 fix multiple-fake-modes bug with compile + subclasses (#118191)
This should fix the "multiple fake modes" errors we've been seeing with both float8 tensor and DTensor.

Haven't added a test yet - will add one before landing.

I also have a separate PR that would have made the error significantly nicer (the bad error resulted from us returning a FakeTensor at runtime): https://github.com/pytorch/pytorch/pull/118644

Pull Request resolved: https://github.com/pytorch/pytorch/pull/118191
Approved by: https://github.com/drisspg
ghstack dependencies: #117667, #117666, #118209
2024-02-20 15:23:41 +00:00
Animesh Jain
80379ef0aa [dynamo-must-fix] Use ID_MATCH for UserDefinedClass (#119853)
Fixes https://github.com/pytorch/pytorch/issues/119715

Pull Request resolved: https://github.com/pytorch/pytorch/pull/119853
Approved by: https://github.com/jansel
2024-02-14 03:14:42 +00:00
Sergii Dymchenko
bd9db6a9c7 Update to TorchFix 0.4.0 (#119424)
`torch.library.Library` updated to `torch.library._scoped_library` in files with many tests where it seems obvious to do, otherwise `noqa: TOR901` added - see https://github.com/pytorch/pytorch/pull/118318 for more context.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/119424
Approved by: https://github.com/zou3519
2024-02-12 23:30:12 +00:00
Brian Hirsh
02b60e76c9 make flash_attn_bw impl correct w.r.t. meta when k and v have different strides (#119500)
`dv = at::empty_like(k)` and `dv = at::empty_like(v)` can be materially different, because `empty_like` tries to preserve the strides of the input when possible. So if `k` is contiguous but `v` is transposed, then before this PR, `dv` would be computed to be contiguous.

Alternatively, we could change the meta implementation of `aten._scaled_dot_product_flash_attention` to this:
```
    grad_q = torch.empty_like(query.transpose(1, 2)).transpose(1, 2)
    grad_k = torch.empty_like(key.transpose(1, 2)).transpose(1, 2)
    grad_v = torch.empty_like(key.transpose(1, 2)).transpose(1, 2)
    return grad_q, grad_k, grad_v
```

But (I think?) the logic in the sdpa backward impl was a typo.

I noticed this because changing the meta formula as above was enough to fix the issue with the `aot_eager` backend in this [link](https://github.com/pytorch/pytorch/issues/116935#issuecomment-1914310523).

A minimal repro that I made looks like this:
```
import torch

# in this repro, "grad_out" and "value" are transposed tensors,
# but "query" and "key" are contiguous
a = torch.randn(2, 513, 16, 64, dtype=torch.float16, device='cuda').transpose(1, 2)
b = torch.randn(2, 16, 513, 64, dtype=torch.float16, device='cuda')
c = torch.randn(2, 16, 513, 64, dtype=torch.float16, device='cuda')
d = torch.randn(2, 513, 16, 64, dtype=torch.float16, device='cuda').transpose(1, 2)
e = torch.randn(2, 16, 513, 64, dtype=torch.float16, device='cuda')
f = torch.randn(2, 16, 513, device='cuda')
g = None
h = None
i = 513
j = 513
k = 0.0
l = False
m = torch.tensor(1, dtype=torch.int64)
n = torch.tensor(1, dtype=torch.int64)

out1_ref, out2_ref, out3_ref = torch.ops.aten._scaled_dot_product_flash_attention_backward(a, b, c, d, e, f, g, h, i, j, k, l, m, n, scale=0.125)

from torch._meta_registrations import meta__scaled_dot_product_flash_backward
out1_test, out2_test, out3_test = meta__scaled_dot_product_flash_backward(a, b, c, d, e, f, g, h, i, j, k, l, m, n, scale=0.125)

# prints True True
print(out1_ref.is_contiguous())
print(out1_test.is_contiguous())

# prints True True
print(out2_ref.is_contiguous())
print(out2_test.is_contiguous())

# prints True False
print(out3_ref.is_contiguous())
print(out3_test.is_contiguous())
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/119500
Approved by: https://github.com/drisspg, https://github.com/ezyang, https://github.com/Skylion007
2024-02-12 22:12:29 +00:00
PyTorch MergeBot
34db6f1b13 Revert "make flash_attn_bw impl correct w.r.t. meta when k and v have different strides (#119500)"
This reverts commit 095f471307.

Reverted https://github.com/pytorch/pytorch/pull/119500 on behalf of https://github.com/DanilBaibak due to Broken trunk ([comment](https://github.com/pytorch/pytorch/pull/119500#issuecomment-1937003082))
2024-02-10 13:06:30 +00:00
Brian Hirsh
095f471307 make flash_attn_bw impl correct w.r.t. meta when k and v have different strides (#119500)
`dv = at::empty_like(k)` and `dv = at::empty_like(v)` can be materially different, because `empty_like` tries to preserve the strides of the input when possible. So if `k` is contiguous but `v` is transposed, then before this PR, `dv` would be computed to be contiguous.

Alternatively, we could change the meta implementation of `aten._scaled_dot_product_flash_attention` to this:
```
    grad_q = torch.empty_like(query.transpose(1, 2)).transpose(1, 2)
    grad_k = torch.empty_like(key.transpose(1, 2)).transpose(1, 2)
    grad_v = torch.empty_like(key.transpose(1, 2)).transpose(1, 2)
    return grad_q, grad_k, grad_v
```

But (I think?) the logic in the sdpa backward impl was a typo.

I noticed this because changing the meta formula as above was enough to fix the issue with the `aot_eager` backend in this [link](https://github.com/pytorch/pytorch/issues/116935#issuecomment-1914310523).

A minimal repro that I made looks like this:
```
import torch

# in this repro, "grad_out" and "value" are transposed tensors,
# but "query" and "key" are contiguous
a = torch.randn(2, 513, 16, 64, dtype=torch.float16, device='cuda').transpose(1, 2)
b = torch.randn(2, 16, 513, 64, dtype=torch.float16, device='cuda')
c = torch.randn(2, 16, 513, 64, dtype=torch.float16, device='cuda')
d = torch.randn(2, 513, 16, 64, dtype=torch.float16, device='cuda').transpose(1, 2)
e = torch.randn(2, 16, 513, 64, dtype=torch.float16, device='cuda')
f = torch.randn(2, 16, 513, device='cuda')
g = None
h = None
i = 513
j = 513
k = 0.0
l = False
m = torch.tensor(1, dtype=torch.int64)
n = torch.tensor(1, dtype=torch.int64)

out1_ref, out2_ref, out3_ref = torch.ops.aten._scaled_dot_product_flash_attention_backward(a, b, c, d, e, f, g, h, i, j, k, l, m, n, scale=0.125)

from torch._meta_registrations import meta__scaled_dot_product_flash_backward
out1_test, out2_test, out3_test = meta__scaled_dot_product_flash_backward(a, b, c, d, e, f, g, h, i, j, k, l, m, n, scale=0.125)

# prints True True
print(out1_ref.is_contiguous())
print(out1_test.is_contiguous())

# prints True True
print(out2_ref.is_contiguous())
print(out2_test.is_contiguous())

# prints True False
print(out3_ref.is_contiguous())
print(out3_test.is_contiguous())
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/119500
Approved by: https://github.com/drisspg, https://github.com/ezyang, https://github.com/Skylion007
2024-02-10 02:04:56 +00:00
Aaron Orenstein
4dc53f777b Fix dynamo failure w/ astype (#117952)
The torch "fake" ndarray had some mismatches vs numpy.ndarray which caused test_sparse_to_sparse_compressed to fail under dynamo.

This also fixes (because the test now hits it) a problem where unpacking a sequence with the incorrect number of args would assert in dynamo instead of graph breaking (because it would throw an exception). Added a unit test for this condition.

Fixed:
- torch._numpy._ndarray.astype() (actually used by the test)
- torch._numpy._ndarray.put() (drive-by discovery)
- torch._numpy._ndarray.view() (drive-by discovery)

(burndown item 7)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/117952
Approved by: https://github.com/yanboliang
ghstack dependencies: #117951
2024-02-03 08:10:15 +00:00
Alexander Grund
865945cc1f Convert requires_cuda to full decorator (#118281)
Don't require using it as `@requires_cuda()`; use `@requires_cuda` instead. There's no need for the partial function to be invoked many times.
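
A sketch of the pattern change (the exact helper body is an assumption):

```
import unittest
import torch

# before: a factory returning a decorator, forcing @requires_cuda()
def requires_cuda_factory():
    return unittest.skipUnless(torch.cuda.is_available(), "requires CUDA")

# after: build the decorator once, so @requires_cuda works directly
requires_cuda = unittest.skipUnless(torch.cuda.is_available(), "requires CUDA")

class MyTest(unittest.TestCase):
    @requires_cuda
    def test_gpu_thing(self):
        self.assertTrue(torch.cuda.is_available())
```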

Split out this change from the initial large refactoring in #117741 to hopefully get merged before conflicts arise

Pull Request resolved: https://github.com/pytorch/pytorch/pull/118281
Approved by: https://github.com/ezyang
2024-01-25 15:50:21 +00:00
voznesenskym
fed45aee54 Replace invoking self.value if there is a user defined init, avoiding arbitrary code execution (#117818)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/117818
Approved by: https://github.com/ezyang
2024-01-23 03:14:58 +00:00
PyTorch MergeBot
1174e82bde Revert "Add _assert_scalar and teach Inductor to codegen it (#114148)"
This reverts commit b6028acfa4.

Reverted https://github.com/pytorch/pytorch/pull/114148 on behalf of https://github.com/osalpekar due to Going to revert this given the broken torchrec PT2 tests internally: [D52648865](https://www.internalfb.com/diff/D52648865). Logs aren't too clear but @dstaay-fb can help debug as well ([comment](https://github.com/pytorch/pytorch/pull/114148#issuecomment-1886100368))
2024-01-11 02:30:22 +00:00
Edward Z. Yang
b6028acfa4 Add _assert_scalar and teach Inductor to codegen it (#114148)
Inductor codegen for `_assert_async` is currently disabled because we don't really understand how to codegen `scalar_to_tensor` on a Sympy expression. I initially tried to see if I could get this to work, but I got into some weird problem involving stride sorting, so I decided to fix it properly by not going through a tensor.

So we introduce an `_assert_scalar` which takes a scalar as an argument, avoiding needing to turn a SymBool into a tensor before asserting on it. I also add `_functional_assert_scalar` for good luck, although this doesn't do anything right now because https://github.com/pytorch/pytorch/pull/104203 still hasn't been landed.

I need to customize the codegen for this operator, so I decide to directly implement it in Inductor, rather than trying to treat it as a generic ExternKernel. This leads to the new AssertScalar IR node. This is written carefully so that it doesn't get DCE'd by Inductor.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114148
Approved by: https://github.com/jansel
2024-01-09 23:21:26 +00:00
voznesenskym
83e8a0721d Reland #111196 (take 4) "Support tensors as Dict keys" (#116934)
Fixes #ISSUE_NUMBER

See that PR

Pull Request resolved: https://github.com/pytorch/pytorch/pull/116934
Approved by: https://github.com/ezyang, https://github.com/huydhn
2024-01-07 01:37:26 +00:00
PyTorch MergeBot
2dca3e99eb Revert "Support tensors as Dict keys Re-PR of #111196 (#116785)"
This reverts commit 1badad9ce9.

Reverted https://github.com/pytorch/pytorch/pull/116785 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/116785#issuecomment-1879592261))
2024-01-06 08:22:33 +00:00
voznesenskym
1badad9ce9 Support tensors as Dict keys Re-PR of #111196 (#116785)
This prepares the PR where we implement sets in terms of dicts.
To do so, rather than storing internally a dictionary that maps literals
to VariableTrackers, it stores (pretty much) a dictionary from VTs to VTs.
To that end, keys are wrapped in an opaque internal class _Hashable.
The _Hashable class is opaque on purpose so that it fails hard
if it inadvertently leaks back into user code.
We also found and fixed a number of latent bugs and inconsistencies
in the way dynamo checked what can be a dict key. More generally, we
make it much clearer what needs to be modified to add a new supported
key type to Dicts.
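
A rough sketch of the wrapper-key idea (the names and internals here are assumptions, not dynamo's actual implementation):

```
class _Hashable:
    # opaque on purpose: not a tuple/str, so a leak into user code
    # fails loudly instead of silently comparing equal to something
    def __init__(self, vt):
        self.vt = vt  # the VariableTracker used as a dict key

    def __hash__(self):
        return hash(self.vt.as_key())  # hypothetical identity method

    def __eq__(self, other):
        return isinstance(other, _Hashable) and self.vt.as_key() == other.vt.as_key()
```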

Fixes [#107595](https://www.internalfb.com/tasks?t=107595)
Fixes [#111603](https://www.internalfb.com/tasks?t=111603)

Re-PR of https://github.com/pytorch/pytorch/pull/111196 sadly due to reverts, we could not reuse @lezcano's original PR.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/116785
Approved by: https://github.com/mlazos
2024-01-06 03:35:35 +00:00
Edward Z. Yang
53f8d17d1e Specialize SymNodeVariable when used as module index (#114377)
Fixes #114171

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114377
Approved by: https://github.com/Skylion007
2024-01-05 13:51:52 +00:00
Guilherme Leobas
5c9464fb51 add CALL_FINALLY opcode (#116159)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/116159
Approved by: https://github.com/yanboliang
2023-12-27 19:01:08 +00:00
David Berard
7b7f11f230 [dynamo] test number of guards when inputs are views (#115793)
After # 113734 landed (adding dynamic storage offsets), we found that compilation times increased significantly. The reason: tensors_definitely_do_not_overlap was doing comparisons on storage offsets which were adding guards

626b7dc847/torch/_functorch/_aot_autograd/input_output_analysis.py (L268-L276)

This guard is added on all pairs of tensors which are views of the same source tensor - i.e. the number of guards can be quadratic in the number of input tensors. This PR adds a test to prevent similar regressions.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115793
Approved by: https://github.com/yanboliang
2023-12-19 16:09:29 +00:00
Peter Bell
0e0dd8f985 [dynamo][BE] Move torchvision import inside of test_multi_import (#115677)
Currently this skip imports torchvision at collection time, so if your
torchvision install is broken then the entire file fails to collect.
With this change, only the test itself will fail.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115677
Approved by: https://github.com/lezcano
2023-12-13 14:16:31 +00:00
voznesenskym
76ced0df03 Consider storage_changed for assigning alias_of_input in aot_autograd when computing differentiable outputs that alias each other (#115315)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115315
Approved by: https://github.com/bdhirsh
2023-12-12 23:21:58 +00:00
Aaron Gokaslan
794545c11f [BE]: Enable RUF015 codebase wide (#115507)
Constant-time access of the first value in a collection. This is a constant-time operation, instead of converting the collection to a list to get the first item, which is linear. The rule is turned on, which automatically autofixes and enforces this.
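
For example:

```
d = {"a": 1, "b": 2}
first = list(d)[0]     # flagged: builds a full list, O(n)
first = next(iter(d))  # autofix: constant time
```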

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115507
Approved by: https://github.com/malfet
2023-12-11 15:51:01 +00:00
Jason Ansel
88642d44d9 [dynamo] Add RestrictedListSubclassVariable (#115057)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115057
Approved by: https://github.com/yanboliang
ghstack dependencies: #115095, #115046
2023-12-05 19:01:23 +00:00
Jason Ansel
aa70e31610 [dynamo] Fix MutableSideEffects returning alias (#115095)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115095
Approved by: https://github.com/yanboliang
2023-12-05 19:01:03 +00:00
Jason Ansel
3d0bbb24a1 [dynamo] Improve support for list subclasses (#115052)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115052
Approved by: https://github.com/oulgen, https://github.com/eellison
ghstack dependencies: #114830, #115047, #115048
2023-12-05 01:31:33 +00:00
Jason Ansel
a70c85ce90 [dynamo] Improve support for inspect.signature().parameters (#115047)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115047
Approved by: https://github.com/oulgen
ghstack dependencies: #114830
2023-12-04 19:08:36 +00:00
voznesenskym
4cfe997490 [dynamo] handle setting .data on a tensor (#113080)
**Dynamo**

We don't want setattr in the graph. Setting data has interesting implications on both aliasing and on the autograd engine.

The safe recipe is:

1) Disable grad
2) Call set_()
3) Manually lower the version counter on the object to hide it from the autograd engine

This is effectively the same exact thing as setting .data, and it composes properly with aot_autograd and inductor.
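
A hedged sketch of that recipe (the private version-counter helper and its signature are assumptions about an internal API):

```
import torch

def set_data_like(t, new):
    with torch.no_grad():  # 1) disable grad
        t.set_(new)        # 2) swap in the new data/metadata
    # 3) lower the version counter to hide the mutation from autograd
    #    (assumes set_() bumped it by exactly one)
    torch._C._autograd._unsafe_set_version_counter(t, t._version - 1)
```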

**aot_autograd**

For aot_autograd, there's another snag.

Specifically, when we invoke aot_autograd, we call `fake_mode.from_tensor()`, relying on memo to get the right tensor out. For .data mutations, this doesn't work, because the memoized fake_tensor is in the state it will be in at the end of the trace, not at the beginning. This means that the .data call is already applied, and the tensor shape (as in the case of these tests) mismatches. aot_autograd produces an invalid graph, with illegal calls like `torch.ops.aten.view.default(primals_2, [0])` where primals is actually sized `([6])` on input.

The new plan here is to:
1) Record tensor fakification policy in dynamo
2) provide a fresh fake mode to all backends
3) Invoke from_tensor with the stored policy to get fresh new fake tensors in aot_autograd

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113080
Approved by: https://github.com/bdhirsh
2023-12-02 00:35:44 +00:00
David Berard
3fc58a6bbe Revert "Make offsets dynamic by default (#113734)" (#114889)
This reverts commit 7c38b76efe.

If a graph has a lot of inputs which are views (with nonzero storage offset), then the check for overlapping tensor views will add a lot of guards (n^2?)

b35ca2cb94/torch/_functorch/_aot_autograd/input_output_analysis.py (L256-L260)

this was causing very slow compilations on an internal model.

Differential Revision: [D51733774](https://our.internmc.facebook.com/intern/diff/D51733774)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114889
Approved by: https://github.com/ckluk2, https://github.com/YuqingJ, https://github.com/aaronenyeshi
2023-12-01 16:49:42 +00:00
Jon Chuang
f66add9b85 [dynamo] graph break on np.ndarray.tobytes (#114208)
We can't model this accurately across np and tnp https://github.com/pytorch/pytorch/issues/114204#issuecomment-1820269949

So let's not even try. Just graph break.

Fixes: https://github.com/pytorch/pytorch/issues/114204

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114208
Approved by: https://github.com/lezcano
2023-11-21 18:19:37 +00:00
Edward Z. Yang
59ad51e10a Insert deferred runtime asserts into Dynamo FX graph (#113958)
During the course of fake tensor propagation (and, potentially, also Dynamo execution, although I do not believe it is possible to exercise this right now), we may generate deferred runtime asserts, which represent "guards" on unbacked symbols which cannot be immediately checked on entry to a code block; instead, they have to be checked at runtime. However, we currently accumulate these deferred runtime asserts into the ShapeEnv, and don't do anything with them.

This PR modifies Dynamo to automatically insert these runtime asserts into the FX graph, before passing it on to the backend compiler. The assert format coincides with the export assert format as practiced in `torch/_export/passes/add_runtime_assertions_for_constraints_pass.py`, but actually these passes are completely disjoint right now as I only handle deferred runtime asserts, while export only handles ranges (which I should probably also handle, but don't in this PR.)

The assertions must be inserted by Dynamo, because you could potentially then pass the asserts onto another backend like "eager" which no longer looks at the ShapeEnv before. Thanks to previous work in export, these asserts are preserved in AOTAutograd, but they are dropped by Inductor, which needs to be fixed in future work. This piece will be a bit awkward, as Inductor would have preferred to work with the Sympy expressions directly, ah well.

Here is what the Dynamo traced FX graph looks like for the test in question:

```
  <eval_with_key>.0 class GraphModule(torch.nn.Module):
     def forward(self, L_x_ : torch.Tensor):
         l_x_ = L_x_

         # File: /data/users/ezyang/c/pytorch/wu.py:8, code: y = x.item()
         item = l_x_.item()

         # No stacktrace found for following nodes
         ge_1 = item >= 0
         scalar_tensor_default = torch.ops.aten.scalar_tensor.default(ge_1);  ge_1 = None
         _assert_async_msg = torch.ops.aten._assert_async.msg(scalar_tensor_default, "Deferred runtime assert failed: i0 >= 0, where i0 was defined by 'item' (for more information, run with TORCH_LOGS=+dynamo,dynamic)");  scalar_tensor_default = None

         # File: /data/users/ezyang/c/pytorch/wu.py:9, code: torch._check_is_size

         _check_is_size = torch._check_is_size(item)

         # File: /data/users/ezyang/c/pytorch/wu.py:10, code: if y >= 0:
         ge = item >= 0;  item = None

         # File: /data/users/ezyang/c/pytorch/wu.py:11, code: return x * 2
         mul = l_x_ * 2;  l_x_ = None
         return (mul,)

```

Note that we actually keep the `_check_is_size` in the graph redundantly. However, assert_async is retained in the graph, whereas _check_is_size ends up getting DCE'ed.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113958
Approved by: https://github.com/aakhundov, https://github.com/tugsbayasgalan
ghstack dependencies: #113978
2023-11-20 21:25:11 +00:00
Jon Chuang
100b9952b1 [dynamo] Fix user defined object sourceless callable (#114066)
Fixes https://github.com/pytorch/pytorch/issues/114019
We do not need to guard on a callable user-defined object instantiated in the graph

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114066
Approved by: https://github.com/ezyang
2023-11-20 18:38:03 +00:00
Edward Z. Yang
caffa44b1c Correctly use real boolean operators, not bitwise in shape guard prints (#113927)
Fixes https://github.com/pytorch/pytorch/issues/113875

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113927
Approved by: https://github.com/voznesenskym
2023-11-18 04:24:45 +00:00
Peter Bell
9f47580ad7 [BE] Don't mutate torch.compile global config in tests (#113882)
We should uniformly use `config.patch` so the configuration changes don't affect
other tests.
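
For example:

```
import unittest
import torch._dynamo.config as dynamo_config

class ConfigTest(unittest.TestCase):
    # scoped: the config value is restored after the test, instead of
    # a bare assignment leaking into later tests
    @dynamo_config.patch(suppress_errors=True)
    def test_something(self):
        self.assertTrue(dynamo_config.suppress_errors)
```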

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113882
Approved by: https://github.com/lezcano
2023-11-17 16:49:48 +00:00
David Berard
7c38b76efe Make offsets dynamic by default (#113734)
Copied from @ezyang 's #113693.

The motivation for this change is that we'd like to guard on storage offset in inductor, to make assumptions about data alignment.

create_symbolic_sizes_strides_storage_offset() creates the sizes/strides/offset for fake tensors - they can either be integers or symints. This PR changes storage_offset to always be dynamic. In variables/builder.py, we remove a conditional so that all tensors get added to tracked_fakes. This is because the storage offset will be dynamic even if the other logic in builder.py suggests that it will be static; otherwise, we run into this issue:

1e260c851b/torch/fx/experimental/symbolic_shapes.py (L892-L895)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113734
Approved by: https://github.com/ezyang
2023-11-17 07:57:21 +00:00
Jon Chuang
277229d0c6 [dynamo] Fix incorrectly casting SymNode to int when input is bool (#113871)
Fixes https://github.com/pytorch/pytorch/issues/113393, https://github.com/pytorch/pytorch/pull/113848#issuecomment-1814624510

Incorrectly casting the SymNode type will cause it to take the wrong path in symbolic_shapes.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113871
Approved by: https://github.com/jansel
2023-11-16 23:24:57 +00:00
PyTorch MergeBot
98df3088c3 Revert "Make offsets dynamic by default (#113734)"
This reverts commit 9efbb4ea73.

Reverted https://github.com/pytorch/pytorch/pull/113734 on behalf of https://github.com/huydhn due to Sorry for reverting your change, but it is causing a memory leak in one of the test 9efbb4ea73 ([comment](https://github.com/pytorch/pytorch/pull/113734#issuecomment-1815297222))
2023-11-16 20:56:27 +00:00
David Berard
9efbb4ea73 Make offsets dynamic by default (#113734)
Copied from @ezyang 's #113693.

The motivation for this change is that we'd like to guard on storage offset in inductor, to make assumptions about data alignment.

create_symbolic_sizes_strides_storage_offset() creates the sizes/strides/offset for fake tensors - they can either be integers or symints. This PR changes storage_offset to always be dynamic. In variables/builder.py, we remove a conditional so that all tensors get added to tracked_fakes. This is because the storage offset will be dynamic even if the other logic in builder.py suggests that it will be static; otherwise, we run into this issue:

1e260c851b/torch/fx/experimental/symbolic_shapes.py (L892-L895)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113734
Approved by: https://github.com/ezyang
2023-11-16 06:49:09 +00:00
Brian Hirsh
cebad9867b graph break on intermediate leaves that require grad (#113277)
Fixes https://github.com/pytorch/pytorch/issues/90552. This is a simpler fix that just detects the situation where AOTAutograd can't create a proper backward graph, and graph breaks. This was technically a silent correctness issue before.

This PR tries to always graph break when we see a factory function that returns a tensor requiring grad. I check this by seeing if the op returned a `TensorVariable` in dynamo, and if one of the input arguments was a `requires_grad=True` kwarg. I think this is high-fidelity enough, and I'm also hoping that this is uncommon enough that a graph break is reasonable here.

The fix to avoid the graph break in user land is also pretty easy - just instantiate your tensor outside of the compiled region and plumb it in.
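
A hedged sketch of that user-land workaround:

```
import torch

# before: a factory op with requires_grad=True inside the compiled
# region now triggers a graph break (previously silently incorrect)
@torch.compile
def f(x):
    w = torch.ones(3, requires_grad=True)
    return (x * w).sum()

# after: instantiate the leaf outside and plumb it in
w = torch.ones(3, requires_grad=True)

@torch.compile
def g(x, w):
    return (x * w).sum()

out = g(torch.randn(3), w)
```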

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113277
Approved by: https://github.com/eellison
ghstack dependencies: #113267, #113416, #113584
2023-11-16 02:47:45 +00:00
PyTorch MergeBot
5d170fce29 Revert "Support tensors as Dict keys (#111196)"
This reverts commit b0805fa5d0.

Reverted https://github.com/pytorch/pytorch/pull/111196 on behalf of https://github.com/huydhn due to Sorry for reverting your change, but it is failing internally. I will provide the details there ([comment](https://github.com/pytorch/pytorch/pull/111196#issuecomment-1813410149))
2023-11-15 23:08:00 +00:00