Commit Graph

756 Commits

Author SHA1 Message Date
Brian Hirsh
7d710403b0 Reapply "Make functionalization ViewMeta serializable with pickle. (#143712)" (#163769)
### Summary:
NOTE: This is a re-export of https://github.com/pytorch/pytorch/pull/161994; the changes between these two PRs are exclusively to the buck/build files

(Summary from #161994 )
Attempted rebase of https://github.com/pytorch/pytorch/pull/143712.

This reverts commit 6c713ccb5e.

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx chenyang78 kadeng chauhang amjames Lucaskabela

imported-using-ghimport

Test Plan: Imported from OSS

Differential Revision: D81524507

Pulled By: Lucaskabela

Pull Request resolved: https://github.com/pytorch/pytorch/pull/163769
Approved by: https://github.com/dolpm

Co-authored-by: Brian Hirsh <hirsheybar@fb.com>
2025-09-25 10:27:37 +00:00
Laith Sakka
04ddea44fd Fix: ShapeEnv not propagated properly to inductor SizeVars (#162927)
Summary:
I am really skeptical about inductor sizevars creating an empty shape env when not provided with one;
I think we should fail there if the graph has dynamic shapes and no shape env is provided.

However, I wonder if there are actually use cases that depend on the shape env not being there?
Reasoning APIs depend on facts in the shape env and assume certain things exist for specific symbols.

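A hedged sketch of the concern (the class name is taken from torch/_inductor/sizevars.py and should be treated as an assumption, not a verified repro):

```
from torch._inductor.sizevars import SizeVarAllocator
from torch.fx.experimental.symbolic_shapes import ShapeEnv

shape_env = ShapeEnv()              # the graph's shape env, populated during tracing
good = SizeVarAllocator(shape_env)  # reasoning APIs see the recorded facts
bad = SizeVarAllocator()            # silently builds an empty ShapeEnv: no facts about existing symbols
```
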
Test Plan:
Fixes the bug reported in https://www.internalfb.com/diff/D82337184. Creating a simple e2e unit test is not trivial.

Rollback Plan:

Differential Revision: D82412384

Pull Request resolved: https://github.com/pytorch/pytorch/pull/162927
Approved by: https://github.com/ezyang, https://github.com/eellison, https://github.com/jansel
2025-09-18 00:56:22 +00:00
bobrenjc93
0661ecdb38 add support for hint_override in mark_unbacked (#162652)
Very similar to https://github.com/pytorch/pytorch/pull/161007 except now for mark_unbacked.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/162652
Approved by: https://github.com/laithsakka
2025-09-17 22:29:54 +00:00
Shen Zhang
3f8a2e62ea Fix rebind_unbacked in torch.fx.experimental.symbolic_shapes (#162788)
## Description
Fix float type handling in the `rebind_unbacked` function of `torch.fx.experimental.symbolic_shapes`. [#162480](https://github.com/pytorch/pytorch/issues/162480)

## Issue
When using AOTInductor to compile YOLOv10, I encountered the bug `'float' object has no attribute 'node'`.
[Torch AOTInductor Ahead-Of-Time Compilation Fail](https://github.com/opendatalab/DocLayout-YOLO/issues/177)

The problem is due to missing float type handling.
https://github.com/pytorch/pytorch/blob/main/torch/fx/experimental/symbolic_shapes.py#L597
```
            if isinstance(u1, int):
                log.info(
                    "rebind_unbacked: discard %s %s %s -> %s",
                    n.target,
                    raw_u0,
                    path,
                    u1,
                )
                continue
```

## Solution
Change the check `if isinstance(u1, int)` to `if isinstance(u1, (int, float))`.

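A minimal, self-contained sketch of the fix's intent (the helper name is illustrative, not the upstream function):

```
def should_discard(u1) -> bool:
    # Plain Python numbers have no `.node`, so they must be discarded before
    # rebinding. Before this PR only `int` was handled; a raw float fell
    # through and later hit `'float' object has no attribute 'node'`.
    return isinstance(u1, (int, float))

assert should_discard(3)    # previously handled
assert should_discard(0.5)  # newly handled by this PR
```
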
Pull Request resolved: https://github.com/pytorch/pytorch/pull/162788
Approved by: https://github.com/ezyang
2025-09-14 17:07:14 +00:00
Laith Sakka
f01bf0f64b Do not use // but use CleanDiv or FloorDiv instead (#162869)
Summary:
When rewriting sympy expressions in the compiler codebase, we want to generate
FloorDiv(a, b) / CleanDiv(a, b) directly and not a//b, since the latter becomes floor(a*pow(b, -1)).

For symnodes we automatically handle that conversion in the symnode op dispatch.
I will follow up with an issue to track all other usages of //.
This was blocking an internal model.

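A hedged illustration of the distinction (assumes `torch.utils._sympy.functions.FloorDiv` is importable, as on current main):

```
import sympy
from torch.utils._sympy.functions import FloorDiv

a, b = sympy.symbols("a b", integer=True, positive=True)

print(a // b)          # sympy rewrites this to floor(a/b), i.e. floor(a*pow(b, -1))
print(FloorDiv(a, b))  # stays a structured FloorDiv(a, b) node the compiler can reason about
```
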
Test Plan:
add test
run existing tests.
dakechen1993 testing on the model.

Rollback Plan:

Differential Revision: D82362241

Pull Request resolved: https://github.com/pytorch/pytorch/pull/162869
Approved by: https://github.com/ezyang
2025-09-14 01:30:33 +00:00
Avik Chaudhuri
fccddf02b6 repro 161902 (#162416)
Summary:
Sometimes `ShapeEnv.create_symbol` can return a `sympy.Integer`. This messes up our phantom symbol infra for derived dims.

Fixes #161902

Test Plan:
added test based on repro

Rollback Plan:

Differential Revision: D81960709

Pull Request resolved: https://github.com/pytorch/pytorch/pull/162416
Approved by: https://github.com/tugsbayasgalan
2025-09-11 16:35:23 +00:00
Avik Chaudhuri
3c45af079a kill allow_complex_guards_as_runtime_asserts (#161794)
Summary:
[reland]
Since `allow_complex_guards_as_runtime_asserts` is now sync'd with `prefer_deferred_runtime_asserts_over_guards`, we can kill the former (especially since it was an export-only concept).

Test Plan:
updated tests

Rollback Plan:

Differential Revision: D81334984

Pull Request resolved: https://github.com/pytorch/pytorch/pull/161794
Approved by: https://github.com/zhxchen17
2025-09-04 00:17:01 +00:00
Laith Sakka
3559c354ce stop suggesting using guard_size_oblivious on data dependent errors (#160510)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/160510
Approved by: https://github.com/ezyang
2025-09-03 18:07:59 +00:00
PyTorch MergeBot
47742081c9 Revert "kill allow_complex_guards_as_runtime_asserts (#160198)"
This reverts commit 69d91b94ba.

Reverted https://github.com/pytorch/pytorch/pull/160198 on behalf of https://github.com/jeffdaily due to let's revert again instead of waiting for forward fix, see earlier comments ([comment](https://github.com/pytorch/pytorch/pull/160198#issuecomment-3235165462))
2025-08-28 22:50:37 +00:00
Avik Chaudhuri
69d91b94ba kill allow_complex_guards_as_runtime_asserts (#160198)
Summary: Since `allow_complex_guards_as_runtime_asserts` is now sync'd with `prefer_deferred_runtime_asserts_over_guards`, we can kill the former (especially since it was an export-only concept).

Test Plan:
updated tests

Rollback Plan:

Differential Revision: D79903317

Pull Request resolved: https://github.com/pytorch/pytorch/pull/160198
Approved by: https://github.com/ezyang
2025-08-28 19:36:19 +00:00
PyTorch MergeBot
a8270dd124 Revert "kill allow_complex_guards_as_runtime_asserts (#160198)"
This reverts commit 196232bb93.

Reverted https://github.com/pytorch/pytorch/pull/160198 on behalf of https://github.com/atalman due to dynamo/test_activation_checkpointing.py::ActivationCheckpointingViaTagsTestsCUDA::test_compile_selective_checkpoint_triton_kernel_cuda [GH job link](https://github.com/pytorch/pytorch/actions/runs/17289619543/job/49074475338) [HUD commit link](196232bb93) ([comment](https://github.com/pytorch/pytorch/pull/160198#issuecomment-3234013520))
2025-08-28 15:40:37 +00:00
Avik Chaudhuri
196232bb93 kill allow_complex_guards_as_runtime_asserts (#160198)
Summary: Since `allow_complex_guards_as_runtime_asserts` is now sync'd with `prefer_deferred_runtime_asserts_over_guards`, we can kill the former (especially since it was an export-only concept).

Test Plan:
updated tests

Rollback Plan:

Differential Revision: D79903317

Pull Request resolved: https://github.com/pytorch/pytorch/pull/160198
Approved by: https://github.com/ezyang
2025-08-28 07:59:29 +00:00
bobrenjc93
9a41570199 [rfc] add hint_override kwarg to mark_dynamic (#161007)
The motivation for this change can be seen through the following example:

```
import torch

GPU_TYPE = "cuda"

@torch.compile
def no_override(x):
    return x.sum(dim=0)

@torch.compile
def override(x):
    return x.sum(dim=0)

x_small = torch.randn(4096, 512, device=GPU_TYPE)
no_override(x_small)
torch._dynamo.decorators.mark_dynamic(x_small, 0, hint_override=4096 * 1000)
override(x_small)
```

Previously, when reductions were split, codegen relied only on the first observed shape. With a small input, this resulted in a small split size:

```
def triton_red_fused_sum_0(in_ptr0, out_ptr0, ks0, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
    xnumel = 16384
    rnumel = r0_numel
```

With the new scheme, inductor honors hint_override during codegen, producing larger and more appropriate split sizes:

```
def triton_red_fused_sum_0(in_ptr0, out_ptr0, ks0, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
    xnumel = 1024000
    rnumel = r0_numel
```

This addresses a broader problem with dynamism: performance and numerics previously depended on whichever shape was seen first. For example:

```
f(s0) -> f(s2)
f(s1) -> f(s2)
```

could generate different kernels. With the new approach, an explicit override pins the chosen configuration:

```
f(s0, hint_override=s0) -> f(s2)
f(s1, hint_override=s0) -> f(s2)
```

ensuring consistent kernel generation regardless of input order.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/161007
Approved by: https://github.com/jansel
2025-08-21 02:22:52 +00:00
Tugsbayasgalan (Tugsuu) Manlaibaatar
dbef606631 Add support for tracing vmap in pre-dispatch export (#154650)
Summary: The ONNX team and a recent transformers upgrade ran into this error, and we also ran into it during our export benchmarking. This diff makes it possible to trace through the vmap implementation in pre-dispatch IR. Note that we don't support serializing functorch ops in pre-dispatch IR; in the future we should desugar them to post-grad ops.

The implementation strategy is:
1. We add python wrappers around the vmap APIs so that we can attach a custom torch function handler that is only active during non-strict export. The reason is we don't want to add this to the default torch_function handler because it would break BC.
2. Some dynamo changes to make sure it picks up the new python wrapper APIs. The reason is that when we do strict export, we need to re-materialize these APIs in pre-dispatch IR from torch IR. We could avoid this by special-casing in dynamo for export to proxy different API calls, but I feel that is too much chaos because you would need to be able to proxy 2 different variants of the same vmap API.

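A hedged usage sketch of the behavior this enables (assuming current torch.export APIs), tracing through vmap in non-strict export:

```
import torch

class M(torch.nn.Module):
    def forward(self, x):
        # vmap is traced through in pre-dispatch IR instead of erroring out
        return torch.vmap(torch.sin)(x)

ep = torch.export.export(M(), (torch.randn(3, 4),), strict=False)
print(ep.graph)
```
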
Test Plan: CI

Differential Revision: D75623875

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154650
Approved by: https://github.com/ezyang, https://github.com/zou3519
2025-08-20 19:31:07 +00:00
PyTorch MergeBot
90ea9ccefe Revert "[rfc] add hint_override kwarg to mark_dynamic (#161007)"
This reverts commit 0533ff2ccb.

Reverted https://github.com/pytorch/pytorch/pull/161007 on behalf of https://github.com/jeffdaily due to failing on both cuda and rocm ([comment](https://github.com/pytorch/pytorch/pull/161007#issuecomment-3206893756))
2025-08-20 15:31:33 +00:00
bobrenjc93
0533ff2ccb [rfc] add hint_override kwarg to mark_dynamic (#161007)
The motivation for this change can be seen through the following example:

```
import torch

GPU_TYPE = "cuda"

@torch.compile
def no_override(x):
    return x.sum(dim=0)

@torch.compile
def override(x):
    return x.sum(dim=0)

x_small = torch.randn(4096, 512, device=GPU_TYPE)
no_override(x_small)
torch._dynamo.decorators.mark_dynamic(x_small, 0, hint_override=4096 * 1000)
override(x_small)
```

Previously, when reductions were split, codegen relied only on the first observed shape. With a small input, this resulted in a small split size:

```
def triton_per_fused_sum_1(in_ptr0, out_ptr0, xnumel, r0_numel, XBLOCK : tl.constexpr):
    xnumel = 512
    r0_numel = 32
```

With the new scheme, inductor honors hint_override during codegen, producing larger and more appropriate split sizes:

```
def triton_red_fused_sum_0(in_ptr0, out_ptr0, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
    xnumel = 16384
    r0_numel = 128
```

This addresses a broader problem with dynamism: performance and numerics previously depended on whichever shape was seen first. For example:

```
f(s0) -> f(s2)
f(s1) -> f(s2)
```

could generate different kernels. With the new approach, an explicit override pins the chosen configuration:

```
f(s0, hint_override=s0) -> f(s2)
f(s1, hint_override=s0) -> f(s2)
```

ensuring consistent kernel generation regardless of input order.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/161007
Approved by: https://github.com/jansel
2025-08-20 07:51:09 +00:00
bobrenjc93
311f74089a remove print (#159917)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/159917
Approved by: https://github.com/laithsakka
2025-08-06 03:48:23 +00:00
Lucas Kabela
c137f9da0b [Dynamo][Better Engineering] Add type coverage to dynamo/compiled_autograd.py (#159518)
As part of the better engineering effort, we would like to improve our type support to improve the dev experience in dynamo

This PR adds strict typing support to `torch/_dynamo/compiled_autograd.py`

Running
```
mypy torch/_dynamo/compiled_autograd.py --linecount-report /tmp/coverage_log
```

| -------- | Lines Annotated | Lines Total | % lines covered | Funcs Annotated | Funcs Total | % funcs covered |
| -------- | ------- | -------- | ------- | ------- | ------- | ------- |
| Main  |  425 | 1553 | 27.37% | 17 | 62 | 27.42% |
| This PR | 1623 | 1623 | 100.00% | 62 | 62 | 100.00% |
| Delta    | +1198| +0 | +72.63% | +45 | 0 | +72.58% |

Pull Request resolved: https://github.com/pytorch/pytorch/pull/159518
Approved by: https://github.com/xmfan
2025-08-01 20:24:58 +00:00
Lucas Kabela
2b1ae29960 [Dynamo][Better Engineering] Add typing annotations to guard and source (#158397) (#159491)
Summary:
X-link: https://github.com/pytorch/executorch/pull/12986

As part of better engineering week, we would like to improve our type support to improve the dev experience in dynamo

This PR adds strict typing support to a critical set of files for dynamo, `source.py` and the base `_guards.py`

Running
```
mypy torch/_dynamo/source.py torch/_guards.py --linecount-report /tmp/coverage_log
```

| -------- | Lines Annotated | Lines Total | % lines covered | Funcs Annotated | Funcs Total | % funcs covered |
| -------- | ------- | -------- | ------- | ------- | ------- | ------- |
| Main  |  1227 | 2208 | 55.57% | 207 | 362 | 57.18% |
| This PR | 2217 | 2217 | 100.00% | 362 | 362 | 100.00% |
| Delta    | +990 | +9 | +44.43% | +155 | 0 | +42.82% |

cc jgong5 mingfeima XiaobingSuper sanchitintel ashokei jingxu10 jerryzh168 voznesenskym penguinwu EikanWang Guobing-Chen zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy chenyang78 kadeng muchulee8 amjames chauhang aakhundov coconutruben

Test Plan:
Imported from GitHub, without a `Test Plan:` line.

Rollback Plan:

Reviewed By: JacobSzwejbka, yangw-dev

Differential Revision: D79199389

Pulled By: Lucaskabela

Pull Request resolved: https://github.com/pytorch/pytorch/pull/159491
Approved by: https://github.com/anijain2305, https://github.com/yangw-dev
2025-07-30 22:57:50 +00:00
PyTorch MergeBot
d987a6f7f0 Revert "[Dynamo][Better Engineering] Add typing annotations to guard and source (#158397)"
This reverts commit abcb24f4de.

Reverted https://github.com/pytorch/pytorch/pull/158397 on behalf of https://github.com/yangw-dev due to Suggested to fix failing internal signals on D78911890 ([comment](https://github.com/pytorch/pytorch/pull/158397#issuecomment-3133823766))
2025-07-29 19:49:40 +00:00
Laith Sakka
0b2ef76e85 DDE-Free select with unbacked index. (#157605)
When select has a data-dependent index, we can't tell if the actual index shall be index+size or index.
To avoid throwing a DDE, we allocate a new unbacked symbol to represent the storage offset of the
output view, and we compute its value dynamically at runtime when inductor is lowered.

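A hedged repro sketch of the case this targets (illustrative; not asserting this exact snippet previously failed):

```
import torch

torch._dynamo.config.capture_scalar_outputs = True

@torch.compile(fullgraph=True, dynamic=True)
def f(x, idx):
    i = idx.item()               # unbacked SymInt, sign unknown at trace time
    torch._check(i >= -x.size(0))
    torch._check(i < x.size(0))
    # the storage offset is i or i + x.size(0) depending on the sign of i
    return x.select(0, i) * 2

f(torch.randn(8, 4), torch.tensor(-3))
```
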
Pull Request resolved: https://github.com/pytorch/pytorch/pull/157605
Approved by: https://github.com/ColinPeppler
2025-07-24 20:08:05 +00:00
Lucas Kabela
abcb24f4de [Dynamo][Better Engineering] Add typing annotations to guard and source (#158397)
As part of better engineering week, we would like to improve our type support to improve the dev experience in dynamo

This PR adds strict typing support to a critical set of files for dynamo, `source.py` and the base `_guards.py`

Running
```
mypy torch/_dynamo/source.py torch/_guards.py --linecount-report /tmp/coverage_log
```

| -------- | Lines Annotated | Lines Total | % lines covered | Funcs Annotated | Funcs Total | % funcs covered |
| -------- | ------- | -------- | ------- | ------- | ------- | ------- |
| Main  |  1227 | 2208 | 55.57% | 207 | 362 | 57.18% |
| This PR | 2217 | 2217 | 100.00% | 362 | 362 | 100.00% |
| Delta    | +990 | +9 | +44.43% | +155 | 0 | +42.82% |

Pull Request resolved: https://github.com/pytorch/pytorch/pull/158397
Approved by: https://github.com/anijain2305
2025-07-24 15:55:18 +00:00
Pian Pawakapan
1227ed6674 [dynamic shapes] fix _maybe_evaluate_static axioms bug (#158672)
Summary: couldn't get a minimal repro, but here is an xref for the size-change-during-dict-iteration error: https://fb.workplace.com/groups/1075192433118967/posts/1709439696360901

Test Plan:
-

Rollback Plan:

Differential Revision: D78047846

Pull Request resolved: https://github.com/pytorch/pytorch/pull/158672
Approved by: https://github.com/bobrenjc93
2025-07-21 23:14:19 +00:00
PyTorch MergeBot
23550ab735 Revert "DDE-Free select with unbacked index. (#157605)"
This reverts commit 79d7c754ab.

Reverted https://github.com/pytorch/pytorch/pull/157605 on behalf of https://github.com/laithsakka due to fail pr time benchmarks  ([comment](https://github.com/pytorch/pytorch/pull/157605#issuecomment-3084663020))
2025-07-17 16:20:02 +00:00
Laith Sakka
79d7c754ab DDE-Free select with unbacked index. (#157605)
When select has a data-dependent index, we can't tell if the actual index shall be index+size or index.
To avoid throwing a DDE, we allocate a new unbacked symbol to represent the storage offset of the
output view, and we compute its value dynamically at runtime when inductor is lowered.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/157605
Approved by: https://github.com/ColinPeppler
2025-07-17 05:08:11 +00:00
Xuehai Pan
11c07c848c [BE][14/16] fix typos in torch/ (torch/fx/) (#156604)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/156604
Approved by: https://github.com/jingsh
ghstack dependencies: #156318, #156320, #156602
2025-07-02 22:55:29 +00:00
Prajesh Praveen Anchalia
156bc243f0 Back out "Include c++ stack traces when we hit constraint violation (#155603)" (#157406)
Summary:
Original commit changeset: 4b3fdaa8f2c6

Original Phabricator Diff: D76434787

Meta:
https://fb.workplace.com/groups/1286739428954016/permalink/1535462614081695/

Test Plan:
Meta:
Revert D76434787 for S536719

Rollback Plan:

Differential Revision: D77626334

Pull Request resolved: https://github.com/pytorch/pytorch/pull/157406
Approved by: https://github.com/bobrenjc93
2025-07-02 16:51:07 +00:00
Laith Sakka
6cc490d40b simplify max(1,x) to x when x known >=1 (#157189)
Creating contiguous strides creates an expression max(1, x). Often we know that x >= 1, in which case we should simplify max(1, x) to x.

This appeared in two situations:
1) An internal user complained about statically_known_true(x == max(1, x)) failing (internal link: https://fb.workplace.com/groups/1028545332188949/permalink/1232958568414290).
https://github.com/pytorch/pytorch/pull/155938 won't be needed with this change.

2) Not simplifying the above could result in wrong ConstraintViolationErrors,
because we assume non-trivial single-arg guards shall evaporate; see the logic in the function
issue_guard in symbolic_shapes.py.

With this change we no longer throw ConstraintViolationErrors with the program below.
This was blocking [this PR](https://github.com/pytorch/pytorch/pull/155590) from landing
internally, due to internal export tests throwing ConstraintViolationErrors like
```
Constraints violated (width)!
  - Not all values of width = L['x'].size()[3] in the specified range 224 <= width <= 455 satisfy the generated guard max(1, 1 + (((-1) + L['x'].size()[3]) // 2)) == (1 + (((-1) + L['x'].size()[3]) // 2)).
```

```
x = torch.rand(10)
torch._dynamo.mark_dynamic(x, 0, max=20, min=5)

@torch.compile(fullgraph=True, dynamic=True)
def func(x):
    if max(1, (-1 + x.size()[0]//2)) == (-1+x.size()[0]//2):
        return x*400
    else:
        return (x*10)*100

func(x)

```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/157189
Approved by: https://github.com/pianpwk
2025-06-29 01:16:30 +00:00
Tuan Trieu
eeaefa1336 Fix UnbackedSymint rebinding - check unbacked before renaming (#156911)
Differential Revision: D77249427

Due to memoization and graph order updates, it can happen that a backed symbol is passed into compute_unbacked_bindings and leads to a failure. An example follows:

- There are 2 boolean indexing operators (e.g. op1 and op2) with the same mask.
- An unbacked symint is generated from op1, and then op2 reuses that unbacked symint thanks to nonzero_memo in nonzero's fake implementation, so no rebinding is needed for op2.
- Since op1 generated the unbacked symint, its meta has the "unbacked_bindings" field filled and op2's meta doesn't have it.
- Outputs from op1 and op2 are later concatenated with others that have backed symints, so that the unbacked symint can be replaced by a backed symint.
- In Inductor, during fake tensor prop, there is no memoization because a new fake tensor is always generated (for the same node). op1 generates an unbacked symint, and that unbacked symint can be rebound successfully to the backed symint. Since there is no memoization, op2 also generates a new unbacked symint, but no rebinding can happen because op2's meta doesn't have "unbacked_bindings", and "compute_unbacked_bindings/_rename_unbacked_to" fails its assertion that op2's old symbol is unbacked.

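A hedged sketch of the op pattern described above (illustrative of the shape of the problem, not a guaranteed repro of the assertion failure):

```
import torch

torch._dynamo.config.capture_dynamic_output_shape_ops = True

def f(x, y, mask, z):
    a = x[mask]  # op1: allocates an unbacked size; meta gets "unbacked_bindings"
    b = y[mask]  # op2: reuses op1's unbacked size via nonzero's memo; no "unbacked_bindings"
    return torch.cat([a, b, z])  # later concatenated with a backed-size tensor

torch.compile(f, fullgraph=True)(
    torch.randn(8), torch.randn(8), torch.rand(8) > 0.5, torch.randn(4)
)
```
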
From discussion with [@ezyang](https://www.internalfb.com/intern/profile/?id=503862770), there is no easy way to fix this issue.

- We can try to enable memoization for fake tensor prop in Inductor; however, we need to ensure that op1 is visited before op2 during Inductor fake tensor prop for this to work (op2's meta doesn't have "unbacked_bindings" so no rebinding can happen, and we need to do the rebinding from op1). But there are passes such as reorder_for_locality that can change the graph order, so this doesn't work.
- A simple hack is to just replace the unbacked symbol in op2 by the backed symbol.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/156911
Approved by: https://github.com/ezyang
2025-06-27 16:57:04 +00:00
Shangdi Yu
204db27a0c Consolidate stack trace in Tracer (#156257)
Summary:
- Consolidate the stack trace recording code in TracerBase and PythonKeyTracer
- Change `make_fx`'s arg name to be consistent with TracerBase member name `record_stack_traces`

We move the stack trace logic from `create_proxy` to `create_node` so that all classes inheriting from TracerBase re-use the same stack trace logic.

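A hedged usage sketch (the kwarg name follows this commit's rename; the exact `make_fx` signature is an assumption):

```
import torch
from torch.fx.experimental.proxy_tensor import make_fx

gm = make_fx(lambda x: x.sin() + 1, record_stack_traces=True)(torch.randn(3))
for node in gm.graph.nodes:
    # non-placeholder nodes should now carry a user stack trace in their meta
    print(node.op, node.meta.get("stack_trace", "")[:80])
```
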
Test Plan:
```
buck run caffe2/test:test_export -- -r  test_stack_trace
```

Rollback Plan:

Pull Request resolved: https://github.com/pytorch/pytorch/pull/156257
Approved by: https://github.com/angelayi, https://github.com/zou3519
2025-06-25 23:07:10 +00:00
bobrenjc93
348e2a76df s/defer_runtime_assert/guard_or_defer_runtime_assert (#156397)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/156397
Approved by: https://github.com/laithsakka
2025-06-19 10:18:28 +00:00
Xuehai Pan
2e0e08588e [BE][PYFMT] migrate PYFMT for torch/[e-n]*/ to ruff format (#144553)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144553
Approved by: https://github.com/ezyang
ghstack dependencies: #144551
2025-06-17 08:18:47 +00:00
PyTorch MergeBot
06408dae49 Revert "Add view_simple as meta function for view, and avoid calling reshape_view_helper. (#154757)"
This reverts commit 0029259bdf.

Reverted https://github.com/pytorch/pytorch/pull/154757 on behalf of https://github.com/laithsakka due to post land issue ([comment](https://github.com/pytorch/pytorch/pull/154757#issuecomment-2971385787))
2025-06-13 19:11:43 +00:00
Avik Chaudhuri
463fe36532 fix error message on specialization with Dim.DYNAMIC (#155738)
Previously specialization error messages would render sources that were pretty far from source-code names. E.g., given args named `x, y, zs`, the source for `y.size()[0]` would be rendered as `args[0][1].size()[0]`.

This is because we created artificial local names following `(args, kwargs)` structure instead of reusing signatures. This PR fixes that situation.

Basically we map prefixes of key paths that correspond to original arg names to root sources corresponding to those names; the rest of the key paths hang from these root sources.

Differential Revision: [D76461391](https://our.internmc.facebook.com/intern/diff/D76461391/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/155738
Approved by: https://github.com/bobrenjc93
2025-06-13 10:33:46 +00:00
Laith Sakka
f4376cac54 unify symbolic_shapes and sizevars dynamic shapes APIs naming 1 (#154774)
Inductor has a set of APIs that allow performing symbolic evaluations similar to those of symbolic shapes,
but it operates on sympy expressions instead of symnodes. Namings are not consistent; this stack makes them
consistent.

Step 1: unify the statically_known_true naming for a consistent experience.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154774
Approved by: https://github.com/drisspg, https://github.com/bobrenjc93, https://github.com/eellison
2025-06-12 16:11:55 +00:00
Zhengxu Chen
0fd711df19 [export] Allow user frame to be None when symbolic shape tries to get stacktrace. (#155744)
Summary: Fixing https://github.com/pytorch/pytorch/issues/155605

Test Plan:
CI

Rollback Plan:

Differential Revision: D76463358

Pull Request resolved: https://github.com/pytorch/pytorch/pull/155744
Approved by: https://github.com/angelayi
2025-06-12 14:36:29 +00:00
Laith Sakka
0029259bdf Add view_simple as meta function for view, and avoid calling reshape_view_helper. (#154757)
Addresses https://github.com/pytorch/pytorch/issues/153303

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154757
Approved by: https://github.com/bobrenjc93, https://github.com/leslie-fang-intel
2025-06-12 09:58:15 +00:00
Oguz Ulgen
d1947a8707 Migrate from lru_cache to cache (#155613)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/155613
Approved by: https://github.com/ezyang
ghstack dependencies: #155612
2025-06-11 19:44:18 +00:00
Pian Pawakapan
247f83e0a4 [dynamic shapes] guard individual terms in sym_and; user-code-friendly sym_and/sym_or (#154737)
Previously when processing `sym_and(a, b, c)`, symbolic shapes wouldn't individually process a, b, and c and store their implications. This would lead us to a data-dependent error on individual checks; e.g. we stored `u0 >= 0 & u0 <= 10`, but then couldn't figure out `u0 <= 10`.

This handles that, and also makes `sym_and/or` user-code friendly, for testing.

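A hedged sketch of the improvement (assuming `torch.sym_and` and `torch._check` as on current main):

```
import torch

torch._dynamo.config.capture_scalar_outputs = True

@torch.compile(fullgraph=True)
def f(x, t):
    u0 = t.item()  # unbacked SymInt
    torch._check(torch.sym_and(u0 >= 0, u0 <= 10))
    # with the individual conjuncts recorded, this check no longer raises a
    # data-dependent error
    if u0 <= 10:
        return x * 2
    return x * 3

f(torch.randn(4), torch.tensor(7))
```
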
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154737
Approved by: https://github.com/laithsakka
2025-06-11 18:08:06 +00:00
bobrenjc93
5b9db4335e Include c++ stack traces when we hit constraint violation (#155603)
Example new error message

```
torch.fx.experimental.symbolic_shapes.ConstraintViolationError: Constraints violated (L['x'].size()[0])! For more information, run with TORCH_LOGS="+dynamic".
  - You marked L['x'].size()[0] as dynamic but your code specialized it to be a constant (5). Either remove the mark_dynamic or use a less strict API such as maybe_mark_dynamic or Dim.AUTO.

Framework stack:
  File "??", line 0, in _start
  File "", line 0, in __libc_start_main_alias_2
  File "??", line 0, in __libc_start_call_main
  File "/usr/local/src/conda/python-3.10.16/Modules/main.c", line 1094, in Py_BytesMain
  File "/usr/local/src/conda/python-3.10.16/Modules/main.c", line 357, in pymain_run_file_obj
  File "/usr/local/src/conda/python-3.10.16/Python/pythonrun.c", line 90, in _PyRun_AnyFileObject
  File "/usr/local/src/conda/python-3.10.16/Python/pythonrun.c", line 456, in _PyRun_SimpleFileObject
  File "/usr/local/src/conda/python-3.10.16/Python/pythonrun.c", line 1208, in pyrun_file
  File "/usr/local/src/conda/python-3.10.16/Python/pythonrun.c", line 1312, in run_mod
  File "/usr/local/src/conda/python-3.10.16/Python/pythonrun.c", line 1291, in run_eval_code_obj
  File "/usr/local/src/conda/python-3.10.16/Python/ceval.c", line 1134, in PyEval_EvalCode
  File "/usr/local/src/conda/python-3.10.16/Include/internal/pycore_ceval.h", line 46, in _PyEval_EvalFrame
  File "/usr/local/src/conda/python-3.10.16/Include/cpython/abstract.h", line 114, in _PyObject_VectorcallTstate
  File "/usr/local/src/conda/python-3.10.16/Include/internal/pycore_ceval.h", line 46, in _PyEval_EvalFrame
  File "/home/bobren/local/a/pytorch/scratch/repro.py", line 9, in <module>
    foo(x)
  File "/usr/local/src/conda/python-3.10.16/Python/ceval.c", line 5945, in do_call_core
  File "/usr/local/src/conda/python-3.10.16/Include/internal/pycore_ceval.h", line 46, in _PyEval_EvalFrame
  File "/home/bobren/local/a/pytorch/torch/_dynamo/eval_frame.py", line 699, in compile_wrapper
    return fn(*args, **kwargs)
  File "offloadstuff.c", line 0, in dynamo__custom_eval_frame
  File "/usr/local/src/conda/python-3.10.16/Objects/call.c", line 305, in _PyObject_Call
  File "/usr/local/src/conda/python-3.10.16/Objects/typeobject.c", line 7494, in slot_tp_call
  File "/usr/local/src/conda/python-3.10.16/Objects/call.c", line 431, in _PyObject_Call_Prepend
  File "/usr/local/src/conda/python-3.10.16/Include/internal/pycore_ceval.h", line 46, in _PyEval_EvalFrame
  File "/home/bobren/local/a/pytorch/torch/_dynamo/convert_frame.py", line 1469, in __call__
    return self._torchdynamo_orig_callable(
  File "/usr/local/src/conda/python-3.10.16/Include/cpython/abstract.h", line 112, in _PyObject_VectorcallTstate
  File "/usr/local/src/conda/python-3.10.16/Objects/call.c", line 215, in _PyObject_MakeTpCall
  File "/usr/local/src/conda/python-3.10.16/Objects/typeobject.c", line 7494, in slot_tp_call
  File "/usr/local/src/conda/python-3.10.16/Objects/call.c", line 431, in _PyObject_Call_Prepend
  File "/usr/local/src/conda/python-3.10.16/Objects/call.c", line 153, in _PyObject_FastCallDictTstate
  File "/usr/local/src/conda/python-3.10.16/Include/internal/pycore_ceval.h", line 46, in _PyEval_EvalFrame
  File "/home/bobren/local/a/pytorch/torch/_dynamo/convert_frame.py", line 1248, in __call__
    result = self._inner_convert(
  File "/usr/local/src/conda/python-3.10.16/Include/cpython/abstract.h", line 112, in _PyObject_VectorcallTstate
  File "/usr/local/src/conda/python-3.10.16/Objects/call.c", line 215, in _PyObject_MakeTpCall
  File "/usr/local/src/conda/python-3.10.16/Objects/typeobject.c", line 7494, in slot_tp_call
  File "/usr/local/src/conda/python-3.10.16/Objects/call.c", line 431, in _PyObject_Call_Prepend
  File "/usr/local/src/conda/python-3.10.16/Objects/call.c", line 153, in _PyObject_FastCallDictTstate
  File "/usr/local/src/conda/python-3.10.16/Include/internal/pycore_ceval.h", line 46, in _PyEval_EvalFrame
  File "/home/bobren/local/a/pytorch/torch/_dynamo/convert_frame.py", line 625, in __call__
    return _compile(
  File "/usr/local/src/conda/python-3.10.16/Include/cpython/abstract.h", line 114, in _PyObject_VectorcallTstate
  File "/usr/local/src/conda/python-3.10.16/Include/internal/pycore_ceval.h", line 46, in _PyEval_EvalFrame
  File "/home/bobren/local/a/pytorch/torch/_dynamo/convert_frame.py", line 1092, in _compile
    guarded_code = compile_inner(code, one_graph, hooks, transform)
  File "/usr/local/src/conda/python-3.10.16/Include/cpython/abstract.h", line 114, in _PyObject_VectorcallTstate
  File "/usr/local/src/conda/python-3.10.16/Include/internal/pycore_ceval.h", line 46, in _PyEval_EvalFrame
  File "/home/bobren/local/a/pytorch/torch/_utils_internal.py", line 97, in wrapper_function
    return function(*args, **kwargs)
  File "/usr/local/src/conda/python-3.10.16/Python/ceval.c", line 5945, in do_call_core
  File "/usr/local/src/conda/python-3.10.16/Include/internal/pycore_ceval.h", line 46, in _PyEval_EvalFrame
  File "/home/bobren/local/a/pytorch/torch/_dynamo/convert_frame.py", line 779, in compile_inner
    return _compile_inner(code, one_graph, hooks, transform)
  File "/usr/local/src/conda/python-3.10.16/Include/cpython/abstract.h", line 114, in _PyObject_VectorcallTstate
  File "/usr/local/src/conda/python-3.10.16/Include/internal/pycore_ceval.h", line 46, in _PyEval_EvalFrame
  File "/home/bobren/local/a/pytorch/torch/_dynamo/convert_frame.py", line 818, in _compile_inner
    out_code = transform_code_object(code, transform)
  File "/usr/local/src/conda/python-3.10.16/Include/cpython/abstract.h", line 114, in _PyObject_VectorcallTstate
  File "/usr/local/src/conda/python-3.10.16/Include/internal/pycore_ceval.h", line 46, in _PyEval_EvalFrame
  File "/home/bobren/local/a/pytorch/torch/_dynamo/bytecode_transformation.py", line 1424, in transform_code_object
    transformations(instructions, code_options)
  File "/usr/local/src/conda/python-3.10.16/Include/cpython/abstract.h", line 114, in _PyObject_VectorcallTstate
  File "/usr/local/src/conda/python-3.10.16/Include/internal/pycore_ceval.h", line 46, in _PyEval_EvalFrame
  File "/home/bobren/local/a/pytorch/torch/_dynamo/convert_frame.py", line 265, in _fn
    return fn(*args, **kwargs)
  File "/usr/local/src/conda/python-3.10.16/Python/ceval.c", line 5945, in do_call_core
  File "/usr/local/src/conda/python-3.10.16/Include/internal/pycore_ceval.h", line 46, in _PyEval_EvalFrame
  File "/home/bobren/local/a/pytorch/torch/_dynamo/convert_frame.py", line 743, in transform
    tracer.run()
  File "/usr/local/src/conda/python-3.10.16/Include/cpython/abstract.h", line 114, in _PyObject_VectorcallTstate
  File "/usr/local/src/conda/python-3.10.16/Include/internal/pycore_ceval.h", line 46, in _PyEval_EvalFrame
  File "/home/bobren/local/a/pytorch/torch/_dynamo/symbolic_convert.py", line 3531, in run
    super().run()
  File "/usr/local/src/conda/python-3.10.16/Include/cpython/abstract.h", line 114, in _PyObject_VectorcallTstate
  File "/usr/local/src/conda/python-3.10.16/Include/internal/pycore_ceval.h", line 46, in _PyEval_EvalFrame
  File "/home/bobren/local/a/pytorch/torch/_dynamo/symbolic_convert.py", line 1359, in run
    while self.step():
  File "/usr/local/src/conda/python-3.10.16/Include/cpython/abstract.h", line 114, in _PyObject_VectorcallTstate
  File "/usr/local/src/conda/python-3.10.16/Include/internal/pycore_ceval.h", line 46, in _PyEval_EvalFrame
  File "/home/bobren/local/a/pytorch/torch/_dynamo/symbolic_convert.py", line 1263, in step
    self.dispatch_table[inst.opcode](self, inst)
  File "/usr/local/src/conda/python-3.10.16/Include/cpython/abstract.h", line 114, in _PyObject_VectorcallTstate
  File "/usr/local/src/conda/python-3.10.16/Include/internal/pycore_ceval.h", line 46, in _PyEval_EvalFrame
  File "/home/bobren/local/a/pytorch/torch/_dynamo/symbolic_convert.py", line 422, in impl
    self.push(fn_var.call_function(self, self.popn(nargs), {}))
  File "/usr/local/src/conda/python-3.10.16/Include/cpython/abstract.h", line 114, in _PyObject_VectorcallTstate
  File "/usr/local/src/conda/python-3.10.16/Include/internal/pycore_ceval.h", line 46, in _PyEval_EvalFrame
  File "/home/bobren/local/a/pytorch/torch/_dynamo/variables/builtin.py", line 1160, in call_function
    return handler(tx, args, kwargs)
  File "/usr/local/src/conda/python-3.10.16/Include/cpython/abstract.h", line 114, in _PyObject_VectorcallTstate
  File "/usr/local/src/conda/python-3.10.16/Include/internal/pycore_ceval.h", line 46, in _PyEval_EvalFrame
  File "/home/bobren/local/a/pytorch/torch/_dynamo/variables/builtin.py", line 792, in <lambda>
    return lambda tx, args, kwargs: obj.call_function(
  File "/usr/local/src/conda/python-3.10.16/Include/cpython/abstract.h", line 114, in _PyObject_VectorcallTstate
  File "/usr/local/src/conda/python-3.10.16/Include/internal/pycore_ceval.h", line 46, in _PyEval_EvalFrame
  File "/home/bobren/local/a/pytorch/torch/_dynamo/variables/builtin.py", line 1160, in call_function
    return handler(tx, args, kwargs)
  File "/usr/local/src/conda/python-3.10.16/Include/cpython/abstract.h", line 114, in _PyObject_VectorcallTstate
  File "/usr/local/src/conda/python-3.10.16/Include/internal/pycore_ceval.h", line 46, in _PyEval_EvalFrame
  File "/home/bobren/local/a/pytorch/torch/_dynamo/variables/builtin.py", line 1120, in _handle_insert_op_in_graph
    return wrap_fx_proxy(tx, proxy)
  File "/usr/local/src/conda/python-3.10.16/Include/cpython/abstract.h", line 114, in _PyObject_VectorcallTstate
  File "/usr/local/src/conda/python-3.10.16/Include/internal/pycore_ceval.h", line 46, in _PyEval_EvalFrame
  File "/home/bobren/local/a/pytorch/torch/_dynamo/variables/builder.py", line 2500, in wrap_fx_proxy
    return wrap_fx_proxy_cls(target_cls=TensorVariable, **kwargs)
  File "/usr/local/src/conda/python-3.10.16/Python/ceval.c", line 5945, in do_call_core
  File "/usr/local/src/conda/python-3.10.16/Objects/call.c", line 267, in PyVectorcall_Call
  File "/usr/local/src/conda/python-3.10.16/Include/internal/pycore_ceval.h", line 46, in _PyEval_EvalFrame
  File "/home/bobren/local/a/pytorch/torch/_dynamo/variables/builder.py", line 2566, in wrap_fx_proxy_cls
    return _wrap_fx_proxy(
  File "/usr/local/src/conda/python-3.10.16/Python/ceval.c", line 5945, in do_call_core
  File "/usr/local/src/conda/python-3.10.16/Include/internal/pycore_ceval.h", line 46, in _PyEval_EvalFrame
  File "/home/bobren/local/a/pytorch/torch/_dynamo/variables/builder.py", line 2664, in _wrap_fx_proxy
    example_value = get_fake_value(proxy.node, tx, allow_non_graph_fake=True)
  File "/usr/local/src/conda/python-3.10.16/Include/cpython/abstract.h", line 114, in _PyObject_VectorcallTstate
  File "/usr/local/src/conda/python-3.10.16/Include/internal/pycore_ceval.h", line 46, in _PyEval_EvalFrame
  File "/home/bobren/local/a/pytorch/torch/_dynamo/utils.py", line 3205, in get_fake_value
    ret_val = wrap_fake_exception(
  File "/usr/local/src/conda/python-3.10.16/Include/cpython/abstract.h", line 114, in _PyObject_VectorcallTstate
  File "/usr/local/src/conda/python-3.10.16/Include/internal/pycore_ceval.h", line 46, in _PyEval_EvalFrame
  File "/home/bobren/local/a/pytorch/torch/_dynamo/utils.py", line 2705, in wrap_fake_exception
    return fn()
  File "/usr/local/src/conda/python-3.10.16/Include/cpython/abstract.h", line 114, in _PyObject_VectorcallTstate
  File "/usr/local/src/conda/python-3.10.16/Include/internal/pycore_ceval.h", line 46, in _PyEval_EvalFrame
  File "/home/bobren/local/a/pytorch/torch/_dynamo/utils.py", line 3206, in <lambda>
    lambda: run_node(tx.output, node, args, kwargs, nnmodule)
  File "/usr/local/src/conda/python-3.10.16/Include/cpython/abstract.h", line 114, in _PyObject_VectorcallTstate
  File "/usr/local/src/conda/python-3.10.16/Include/internal/pycore_ceval.h", line 46, in _PyEval_EvalFrame
  File "/home/bobren/local/a/pytorch/torch/_dynamo/utils.py", line 3373, in run_node
    return node.target(*args, **kwargs)
  File "/usr/local/src/conda/python-3.10.16/Python/ceval.c", line 5917, in do_call_core
  File "/usr/local/src/conda/python-3.10.16/Objects/methodobject.c", line 430, in cfunction_vectorcall_FASTCALL
  File "/usr/local/src/conda/python-3.10.16/Objects/abstract.c", line 891, in binary_op1
  File "/usr/local/src/conda/python-3.10.16/Objects/typeobject.c", line 7284, in slot_nb_multiply
  File "/usr/local/src/conda/python-3.10.16/Include/cpython/abstract.h", line 114, in _PyObject_VectorcallTstate
  File "/usr/local/src/conda/python-3.10.16/Objects/descrobject.c", line 344, in method_vectorcall_VARARGS_KEYWORDS
  File "python_variable_methods.cpp", line 0, in _object* torch::autograd::TypeError_to_NotImplemented_<&torch::autograd::THPVariable_mul>(_object*, _object*, _object*)
  File "python_variable_methods.cpp", line 0, in torch::autograd::THPVariable_mul(_object*, _object*, _object*)
  File "??", line 0, in at::_ops::mul_Tensor::call(at::Tensor const&, at::Tensor const&)
  File "offloadstuff.c", line 0, in c10::impl::BoxedKernelWrapper<at::Tensor (at::Tensor const&, at::Tensor const&), void>::call(c10::BoxedKernel const&, c10::OperatorHandle const&, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&)
  File "PyInterpreter.cpp", line 0, in torch::detail::(anonymous namespace)::ConcretePyInterpreterVTable::python_dispatcher(c10::OperatorHandle const&, c10::DispatchKeySet, std::vector<c10::IValue, std::allocator<c10::IValue> >*) const
  File "offloadstuff.c", line 0, in c10::OperatorHandle::callBoxedForDispatchKey(c10::DispatchKey, std::vector<c10::IValue, std::allocator<c10::IValue> >&) const
  File "PythonFallbackKernel.cpp", line 0, in void c10::BoxedKernel::make_boxed_function<&(anonymous namespace)::pythonTLSSnapshotFallback>(c10::OperatorKernel*, c10::OperatorHandle const&, c10::DispatchKeySet, std::vector<c10::IValue, std::allocator<c10::IValue> >*)
  File "PyInterpreter.cpp", line 0, in torch::detail::(anonymous namespace)::ConcretePyInterpreterVTable::python_dispatcher(c10::OperatorHandle const&, c10::DispatchKeySet, std::vector<c10::IValue, std::allocator<c10::IValue> >*) const
  File "offloadstuff.c", line 0, in c10::OperatorHandle::callBoxedForDispatchKey(c10::DispatchKey, std::vector<c10::IValue, std::allocator<c10::IValue> >&) const
  File "VariableType_0.cpp", line 0, in c10::impl::make_boxed_from_unboxed_functor<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<at::Tensor (c10::DispatchKeySet, at::Tensor const&, at::Tensor const&), &torch::autograd::VariableType::(anonymous namespace)::mul_Tensor>, at::Tensor, c10::guts::typelist::typelist<c10::DispatchKeySet, at::Tensor const&, at::Tensor const&> >, false>::call(c10::OperatorKernel*, c10::OperatorHandle const&, c10::DispatchKeySet, std::vector<c10::IValue, std::allocator<c10::IValue> >*)
  File "VariableType_0.cpp", line 0, in torch::autograd::VariableType::(anonymous namespace)::mul_Tensor(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&)
  File "??", line 0, in at::_ops::mul_Tensor::redispatch(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&)
  File "offloadstuff.c", line 0, in c10::impl::BoxedKernelWrapper<at::Tensor (at::Tensor const&, at::Tensor const&), void>::call(c10::BoxedKernel const&, c10::OperatorHandle const&, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&)
  File "PyInterpreter.cpp", line 0, in torch::detail::(anonymous namespace)::ConcretePyInterpreterVTable::python_dispatcher(c10::OperatorHandle const&, c10::DispatchKeySet, std::vector<c10::IValue, std::allocator<c10::IValue> >*) const
  File "offloadstuff.c", line 0, in c10::OperatorHandle::callBoxedForDispatchKey(c10::DispatchKey, std::vector<c10::IValue, std::allocator<c10::IValue> >&) const
  File "PythonFallbackKernel.cpp", line 0, in (anonymous namespace)::pythonFallback(c10::OperatorHandle const&, c10::DispatchKeySet, std::vector<c10::IValue, std::allocator<c10::IValue> >*)
  File "PyInterpreter.cpp", line 0, in torch::detail::(anonymous namespace)::ConcretePyInterpreterVTable::dispatch(c10::OperatorHandle const&, std::vector<c10::IValue, std::allocator<c10::IValue> >*) const
  File "??", line 0, in torch::handle_torch_function_no_python_arg_parser(c10::ArrayRef<_object*>, _object*, _object*, char const*, _object*, char const*, torch::TorchFunctionName)
  File "/usr/local/src/conda/python-3.10.16/Objects/call.c", line 577, in PyObject_CallMethod
  File "/usr/local/src/conda/python-3.10.16/Include/cpython/abstract.h", line 114, in _PyObject_VectorcallTstate
  File "/usr/local/src/conda/python-3.10.16/Include/internal/pycore_ceval.h", line 46, in _PyEval_EvalFrame
  File "/home/bobren/local/a/pytorch/torch/utils/_stats.py", line 27, in wrapper
    return fn(*args, **kwargs)
  File "/usr/local/src/conda/python-3.10.16/Python/ceval.c", line 5945, in do_call_core
  File "/usr/local/src/conda/python-3.10.16/Include/internal/pycore_ceval.h", line 46, in _PyEval_EvalFrame
  File "/home/bobren/local/a/pytorch/torch/_subclasses/fake_tensor.py", line 1346, in __torch_dispatch__
    return self.dispatch(func, types, args, kwargs)
  File "/usr/local/src/conda/python-3.10.16/Include/cpython/abstract.h", line 114, in _PyObject_VectorcallTstate
  File "/usr/local/src/conda/python-3.10.16/Include/internal/pycore_ceval.h", line 46, in _PyEval_EvalFrame
  File "/home/bobren/local/a/pytorch/torch/_subclasses/fake_tensor.py", line 2029, in dispatch
    return self._cached_dispatch_impl(func, types, args, kwargs)
  File "/usr/local/src/conda/python-3.10.16/Include/cpython/abstract.h", line 114, in _PyObject_VectorcallTstate
  File "/usr/local/src/conda/python-3.10.16/Include/internal/pycore_ceval.h", line 46, in _PyEval_EvalFrame
  File "/home/bobren/local/a/pytorch/torch/_subclasses/fake_tensor.py", line 1442, in _cached_dispatch_impl
    return self._dispatch_impl(func, types, args, kwargs)
  File "/usr/local/src/conda/python-3.10.16/Include/cpython/abstract.h", line 114, in _PyObject_VectorcallTstate
  File "/usr/local/src/conda/python-3.10.16/Include/internal/pycore_ceval.h", line 46, in _PyEval_EvalFrame
  File "/home/bobren/local/a/pytorch/torch/_subclasses/fake_tensor.py", line 2552, in _dispatch_impl
    return maybe_propagate_real_tensors(fast_impl(self, *args, **kwargs))
  File "/usr/local/src/conda/python-3.10.16/Python/ceval.c", line 5945, in do_call_core
  File "/usr/local/src/conda/python-3.10.16/Include/internal/pycore_ceval.h", line 46, in _PyEval_EvalFrame
  File "/home/bobren/local/a/pytorch/torch/_subclasses/fake_impls.py", line 956, in fast_binary_impl
    final_shape = infer_size(final_shape, shape)
  File "/usr/local/src/conda/python-3.10.16/Include/cpython/abstract.h", line 114, in _PyObject_VectorcallTstate
  File "/usr/local/src/conda/python-3.10.16/Include/internal/pycore_ceval.h", line 46, in _PyEval_EvalFrame
  File "/home/bobren/local/a/pytorch/torch/_subclasses/fake_impls.py", line 916, in infer_size
    torch._check(
  File "/usr/local/src/conda/python-3.10.16/Include/cpython/abstract.h", line 114, in _PyObject_VectorcallTstate
  File "/usr/local/src/conda/python-3.10.16/Include/internal/pycore_ceval.h", line 46, in _PyEval_EvalFrame
  File "/home/bobren/local/a/pytorch/torch/__init__.py", line 1669, in _check
    _check_with(RuntimeError, cond, message)
  File "/usr/local/src/conda/python-3.10.16/Include/cpython/abstract.h", line 114, in _PyObject_VectorcallTstate
  File "/usr/local/src/conda/python-3.10.16/Include/internal/pycore_ceval.h", line 46, in _PyEval_EvalFrame
  File "/home/bobren/local/a/pytorch/torch/__init__.py", line 1632, in _check_with
    if expect_true(cond):
  File "/usr/local/src/conda/python-3.10.16/Include/cpython/abstract.h", line 114, in _PyObject_VectorcallTstate
  File "/usr/local/src/conda/python-3.10.16/Include/internal/pycore_ceval.h", line 46, in _PyEval_EvalFrame
  File "/home/bobren/local/a/pytorch/torch/fx/experimental/symbolic_shapes.py", line 1686, in expect_true
    return a.node.expect_true(
  File "/usr/local/src/conda/python-3.10.16/Include/cpython/abstract.h", line 114, in _PyObject_VectorcallTstate
  File "/usr/local/src/conda/python-3.10.16/Include/internal/pycore_ceval.h", line 46, in _PyEval_EvalFrame
  File "/home/bobren/local/a/pytorch/torch/fx/experimental/sym_node.py", line 552, in expect_true
    return self.guard_bool(file, line)
  File "/usr/local/src/conda/python-3.10.16/Include/cpython/abstract.h", line 114, in _PyObject_VectorcallTstate
  File "/usr/local/src/conda/python-3.10.16/Include/internal/pycore_ceval.h", line 46, in _PyEval_EvalFrame
  File "/home/bobren/local/a/pytorch/torch/fx/experimental/sym_node.py", line 536, in guard_bool
    r = self.evaluate()
  File "/usr/local/src/conda/python-3.10.16/Include/cpython/abstract.h", line 114, in _PyObject_VectorcallTstate
  File "/usr/local/src/conda/python-3.10.16/Include/internal/pycore_ceval.h", line 46, in _PyEval_EvalFrame
  File "/home/bobren/local/a/pytorch/torch/fx/experimental/sym_node.py", line 510, in evaluate
    return self.shape_env.evaluate_sym_node(self, size_oblivious)
  File "/usr/local/src/conda/python-3.10.16/Include/cpython/abstract.h", line 114, in _PyObject_VectorcallTstate
  File "/usr/local/src/conda/python-3.10.16/Include/internal/pycore_ceval.h", line 46, in _PyEval_EvalFrame
  File "/home/bobren/local/a/pytorch/torch/fx/experimental/symbolic_shapes.py", line 7113, in evaluate_sym_node
    return self.evaluate_expr(
  File "/usr/local/src/conda/python-3.10.16/Include/cpython/abstract.h", line 114, in _PyObject_VectorcallTstate
  File "/usr/local/src/conda/python-3.10.16/Include/cpython/abstract.h", line 112, in _PyObject_VectorcallTstate
  File "/usr/local/src/conda/python-3.10.16/Objects/call.c", line 215, in _PyObject_MakeTpCall
  File "/usr/local/src/conda/python-3.10.16/Modules/_functoolsmodule.c", line 1020, in bounded_lru_cache_wrapper
  File "/usr/local/src/conda/python-3.10.16/Objects/call.c", line 267, in PyVectorcall_Call
  File "/usr/local/src/conda/python-3.10.16/Include/internal/pycore_ceval.h", line 46, in _PyEval_EvalFrame
  File "/home/bobren/local/a/pytorch/torch/fx/experimental/recording.py", line 272, in wrapper
    return retlog(fn(*args, **kwargs))
  File "/usr/local/src/conda/python-3.10.16/Python/ceval.c", line 5945, in do_call_core
  File "/usr/local/src/conda/python-3.10.16/Objects/call.c", line 267, in PyVectorcall_Call
  File "/usr/local/src/conda/python-3.10.16/Include/internal/pycore_ceval.h", line 46, in _PyEval_EvalFrame
  File "/home/bobren/local/a/pytorch/torch/fx/experimental/symbolic_shapes.py", line 7215, in evaluate_expr
    return self._inner_evaluate_expr(
  File "/usr/local/src/conda/python-3.10.16/Include/cpython/abstract.h", line 112, in _PyObject_VectorcallTstate
  File "/usr/local/src/conda/python-3.10.16/Objects/call.c", line 215, in _PyObject_MakeTpCall
  File "/usr/local/src/conda/python-3.10.16/Modules/_functoolsmodule.c", line 1020, in bounded_lru_cache_wrapper
  File "/usr/local/src/conda/python-3.10.16/Include/internal/pycore_ceval.h", line 46, in _PyEval_EvalFrame
  File "/home/bobren/local/a/pytorch/torch/fx/experimental/recording.py", line 272, in wrapper
    return retlog(fn(*args, **kwargs))
  File "/usr/local/src/conda/python-3.10.16/Python/ceval.c", line 5945, in do_call_core
  File "/usr/local/src/conda/python-3.10.16/Include/internal/pycore_ceval.h", line 46, in _PyEval_EvalFrame
  File "/home/bobren/local/a/pytorch/torch/fx/experimental/symbolic_shapes.py", line 7238, in _inner_evaluate_expr
    return self._evaluate_expr(
  File "/usr/local/src/conda/python-3.10.16/Include/cpython/abstract.h", line 114, in _PyObject_VectorcallTstate
  File "/usr/local/src/conda/python-3.10.16/Include/internal/pycore_ceval.h", line 46, in _PyEval_EvalFrame
  File "/home/bobren/local/a/pytorch/torch/fx/experimental/symbolic_shapes.py", line 7505, in _evaluate_expr
    self._maybe_guard_rel(g)
  File "/usr/local/src/conda/python-3.10.16/Include/cpython/abstract.h", line 112, in _PyObject_VectorcallTstate
  File "/usr/local/src/conda/python-3.10.16/Objects/call.c", line 215, in _PyObject_MakeTpCall
  File "/usr/local/src/conda/python-3.10.16/Modules/_functoolsmodule.c", line 1020, in bounded_lru_cache_wrapper
  File "/usr/local/src/conda/python-3.10.16/Include/internal/pycore_ceval.h", line 46, in _PyEval_EvalFrame
  File "/home/bobren/local/a/pytorch/torch/fx/experimental/symbolic_shapes.py", line 6758, in _maybe_guard_rel
    self._refine_ranges(expr)
  File "/usr/local/src/conda/python-3.10.16/Include/cpython/abstract.h", line 114, in _PyObject_VectorcallTstate
  File "/usr/local/src/conda/python-3.10.16/Include/internal/pycore_ceval.h", line 46, in _PyEval_EvalFrame
  File "/home/bobren/local/a/pytorch/torch/fx/experimental/symbolic_shapes.py", line 7709, in _refine_ranges
    self._set_replacement(
  File "/usr/local/src/conda/python-3.10.16/Include/cpython/abstract.h", line 114, in _PyObject_VectorcallTstate
  File "/usr/local/src/conda/python-3.10.16/Include/internal/pycore_ceval.h", line 46, in _PyEval_EvalFrame
  File "/home/bobren/local/a/pytorch/torch/fx/experimental/symbolic_shapes.py", line 6667, in _set_replacement
    self.framework_specialization_stacks[source] = CapturedTraceback.extract(cpp=True)
  File "/usr/local/src/conda/python-3.10.16/Include/cpython/abstract.h", line 114, in _PyObject_VectorcallTstate
  File "/usr/local/src/conda/python-3.10.16/Include/internal/pycore_ceval.h", line 46, in _PyEval_EvalFrame
  File "/home/bobren/local/a/pytorch/torch/utils/_traceback.py", line 207, in extract
    torch._C._profiler.gather_traceback(python=True, script=script, cpp=cpp),
  File "/usr/local/src/conda/python-3.10.16/Include/cpython/abstract.h", line 112, in _PyObject_VectorcallTstate
  File "/usr/local/src/conda/python-3.10.16/Objects/call.c", line 215, in _PyObject_MakeTpCall
  File "/usr/local/src/conda/python-3.10.16/Objects/methodobject.c", line 543, in cfunction_call
  File "offloadstuff.c", line 0, in pybind11::cpp_function::dispatcher(_object*, _object*, _object*)
  File "offloadstuff.c", line 0, in pybind11::cpp_function::initialize<std::shared_ptr<torch::CapturedTraceback> (*&)(bool, bool, bool), std::shared_ptr<torch::CapturedTraceback>, bool, bool, bool, pybind11::name, pybind11::scope, pybind11::sibling, pybind11::arg_v, pybind11::arg_v, pybind11::arg_v>(std::shared_ptr<torch::CapturedTraceback> (*&)(bool, bool, bool), std::shared_ptr<torch::CapturedTraceback> (*)(bool, bool, bool), pybind11::name const&, pybind11::scope const&, pybind11::sibling const&, pybind11::arg_v const&, pybind11::arg_v const&, pybind11::arg_v const&)::{lambda(pybind11::detail::function_call&)#3}::_FUN(pybind11::detail::function_call&)
  File "??", line 0, in torch::CapturedTraceback::gather(bool, bool, bool)
  File "??", line 0, in torch::unwind::unwind()

User stack:
  File "/home/bobren/local/a/pytorch/scratch/repro.py", line 5, in foo
    return torch.randn(5) * x
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/155603
Approved by: https://github.com/zou3519, https://github.com/cyyever
ghstack dependencies: #155133
2025-06-11 05:00:36 +00:00
Aaron Orenstein
fc0135ca11 Re-enable FakeTensor caching for SymInts (#152662)
Summary:

This backs out D60320595 which itself turned off FakeTensor caching when a SymInt was present.

There have been a lot of dynamic shape fixes done this year and tests pass, so I'm assuming some of that work fixed what was breaking previously.

Test Plan: Reran the tests listed in T196779132 and they pass.

## Perf
### Instruction Counter Benchmark:
- 26% win on add_loop_eager_dynamic
- 13% win on add_loop_inductor_dynamic_gpu
### Perf Dashboard
Compilation latency wins across the board, and is especially strong on the dynamic tests (like cudagraphs_dynamic) - for example MobileBertForMaskedLM went from 66s -> 50s.

Differential Revision: [D75467694](https://our.internmc.facebook.com/intern/diff/D75467694)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/152662
Approved by: https://github.com/anijain2305
2025-05-30 17:23:36 +00:00
bobrenjc93
9c06dff1ce [multigraph] use specializations in compile_and_call_fx_graph (#153449)
The goal of this multigraph work is to enable a compiled region that has a single dynamo trace but multiple backend specializations. This work was inspired by vLLM which does this in a somewhat hacky way where they use a custom backend to capture a dynamo graph and then manually invoke compile_fx multiple times to get specialized graphs.

There's really two parts of this work:

**The frontend changes:**
1) we introduce an optional kwarg `specialize_on` to mark_{dynamic,unbacked} that takes in a list of specializations. I debated other methods including specifying specializations via decorators, but ultimately decided this approach was more harmonious. The big issue with decorators is the difficulty of composing well with the rest of the torch.compile ecosystem including graph breaks, lazy initialization of variable trackers and symbolic variables, etc.
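
A hedged sketch of the frontend API described in point 1 (the `specialize_on` kwarg; the exact predicate form is an assumption based on this description):

```
import torch

def f(x):
    return x @ x.T

x = torch.randn(64, 32)
# mark dim 0 as dynamic, and additionally request specialized backend compiles
# for the listed shape predicates; unmatched shapes fall back to the generic graph
torch._dynamo.decorators.mark_dynamic(
    x, 0, specialize_on=[lambda s: s == 64, lambda s: s % 128 == 0]
)
torch.compile(f)(x)
```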

**The backend changes (this PR):**
1) We capture the backend_specialization specified in the mark_{dynamic,unbacked} API into a SymbolicContext. See changes in `/_dynamo/variables/builder.py`
2) After we are done dynamo tracing, we will lazily (more on this later) invoke `call_user_compiler` up to N + 1 times for N specializations and 1 generic graph. Under the hood this will call compile_fx, which composes nicely with both Async Compile and AOTAutogradCache. We do this by using a context manager to patch in specialization specific axioms into the ShapeEnv before invoking the user compiler.
3) When we have specializations, we install a lazy specialized dispatch function that checks each specialization and dispatches to the first one that matches. Instead of doing all of the specialization compiles up front, we do the compiles lazily. The first time a specialization is invoked, we will do the compilation and save it in a cache so subsequent invocations are fast. If none of the specializations match, we dispatch to the generic graph. I decided to do this over returning N different GuardedCodes since 1) it doesn't pollute the dynamo cache (eg. if you have 8 specializations, you would hit the cache limit) 2) it naturally incorporates the hierarchical lattice structure of the guards since the specializations are always necessarily stricter than the generic region's guards.

I benchmarked this PR stack with #152596 and found around a 50% reduction when dispatching to the specialized regions:

![495269647_576053105510082_9189856138964956774_n](https://github.com/user-attachments/assets/66030fed-d62e-4d87-940f-aa13c99b1a73)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153449
Approved by: https://github.com/zou3519
ghstack dependencies: #153433
2025-05-30 03:19:49 +00:00
Laith Sakka
cd9ff41282 check fallback_value first. (#154493)
This is just a refactor, not a fix for any issue.
We check fallback_value first and early-exit, instead of checking that it is not set over and over.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154493
Approved by: https://github.com/bobrenjc93
2025-05-29 09:06:43 +00:00
bobrenjc93
d62a33c002 [ez] add docblock for _expandsums (#154397)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154397
Approved by: https://github.com/laithsakka
ghstack dependencies: #154400, #154398, #154396, #154399
2025-05-28 22:43:26 +00:00
bobrenjc93
0c00e32632 [ez] add docblock for _eval_is_non_overlapping_and_dense (#154399)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154399
Approved by: https://github.com/laithsakka
ghstack dependencies: #154400, #154398, #154396
2025-05-28 22:40:03 +00:00
bobrenjc93
ed348e7026 Add docblock for TrackedFake (#154396)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154396
Approved by: https://github.com/laithsakka
ghstack dependencies: #154400, #154398
2025-05-28 21:19:49 +00:00
bobrenjc93
d311b79c12 add docblock for _fast_expand (#154398)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154398
Approved by: https://github.com/laithsakka
ghstack dependencies: #154400
2025-05-28 21:16:47 +00:00
bobrenjc93
e7318b863d [ez] add docblock to cast_symbool_to_symint_guardless (#154400)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154400
Approved by: https://github.com/laithsakka
2025-05-28 21:11:53 +00:00
bobrenjc93
476e0a643a [ez] add docblock for ShapeGuardPythonPrinter (#154403)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154403
Approved by: https://github.com/jingsh
ghstack dependencies: #154374, #154375, #154376, #154386, #154401, #154404, #154405, #154377, #154378, #154379, #154380, #154381, #154383, #154384, #154385, #154402
2025-05-28 14:17:17 +00:00
bobrenjc93
473a93eb58 [ez] add docblock for _ShapeGuardPrinter (#154402)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154402
Approved by: https://github.com/jingsh
ghstack dependencies: #154374, #154375, #154376, #154386, #154401, #154404, #154405, #154377, #154378, #154379, #154380, #154381, #154383, #154384, #154385
2025-05-28 14:13:22 +00:00