Commit Graph

715 Commits

Author SHA1 Message Date
ankushwahaRH
ba3c2c80ab SDP Backend function fix (#161169)
The issue cannot be reproduced using the original repro code provided in the issue description.

However, the underlying issue mentioned by the maintainer (missing functions in `builder.py` and `trace_rules.py`) was never addressed and can still be reproduced with this test case:

```python
import torch
from torch.nn.attention import _cur_sdpa_kernel_backends

@torch.compile(fullgraph=True)
def test_function_that_triggers_error():
    return _cur_sdpa_kernel_backends()

print("Calling torch.compile function...")
try:
    result = test_function_that_triggers_error()
    print(f"Success: {result}")
except Exception as e:
    print(f"ERROR: {e}")
    print(f"Error type: {type(e)}")
```

The original repro likely no longer triggers the issue due to code path changes in the SDPA implementation, while the direct call to `_cur_sdpa_kernel_backends()` exposes the underlying problem where certain torch._C functions returning non-Tensor values aren't properly handled by dynamo tracing.

I have implemented the changes by adding the missing functions to both `builder.py` and `trace_rules.py` to properly handle these cases during compilation.

@guilhermeleobas

Pull Request resolved: https://github.com/pytorch/pytorch/pull/161169
Approved by: https://github.com/guilhermeleobas, https://github.com/StrongerXi
2025-09-19 20:19:59 +00:00
Pian Pawakapan
4c007073e6 [dynamic shapes] DynamicInts prototype (#162194)
Initial prototype for dynamic int inputs, allows users to run with `torch.compile(f)(DynamicInt(4))`, compiling dynamically and using the underlying hint at runtime.

Current behavior:
- Also works in eager (mostly by subclassing int), as scalar input to torch functions, or numpy/math/etc. For example, `x = DynamicInt(3); torch.randn(x); torch.add(y, z, alpha=x); np.arange(x)` all act as if x = 3.
- Behavior for arithmetic ops is to return new DynamicInts rather than static ints; `DynamicInt(3) * 2 = DynamicInt(6)`. This is via SymNode magic methods, but coverage might not be 100% - for example, I had to explicitly override floordiv to avoid int casting. This is not necessarily the case for non-magic method ops (e.g. `math.cos(x)`). The alternative here is to int cast on all operations, but I opted for this for dynamism propagation in non-compiled regions.
- Doesn't ban fullgraph=False; DynamicInt objects might be leaked back to the user, but I guess this is fine, because they can be casted to ints when needed?
- Dynamo only allocates one symbol per DynamicInt; specifying the same DynamicInt for multiple inputs leads to input deduplication, and a guard installed.
- We don't raise on int specialization (in allowlist/maybe_mark_dynamic style) - but an easy change if needed.
- DynamicInts as nn.Module attributes are handled.
- We don't guard on the DynamicInt id, e.g. users can do the following without recompiling (maybe we should guard?)
```python
x = DynamicInt(4)
f(x)
f(1)
f(DynamicInt(3))  # same as f(3)
```

Follow-up work:
- Specifying shape constraints, either at the int-level, e.g.
```python
DynamicInt(64, name="s0", constraints=["s0 % 32 == 0", "s0 <= 1024"]
```
or at the compilation level, e.g. something like
```python
s0 = DynamicInt(64, name="s0")
s1 = DynamicInt(128, name="s1")
with some_compiler_config.dynamic_int_constraints(["s1 == 2*s0", "s0 % 32 == 0"]):
    f(s0, s1)
```
This should subsume the need for specifying derived SymInts?
- SymFloat support - currently it seems backed floats are specialized by the tensorify float pass, and there's no handling in inductor.
- Propagating dynamism in tensor constructors, e.g. `x = DynamicInt(4); torch.randn(x)` could annotate `_dynamo_dynamic_indices`.

Differential Revision: D81698719

Pull Request resolved: https://github.com/pytorch/pytorch/pull/162194
Approved by: https://github.com/bobrenjc93
2025-09-18 23:26:28 +00:00
Simon Fan
821458d97a [dynamo][hop] Introduce Local Map HOP (#161458)
Can't actually deploy it because of: https://github.com/pytorch/pytorch/issues/161456

Pull Request resolved: https://github.com/pytorch/pytorch/pull/161458
Approved by: https://github.com/ydwu4
2025-09-17 09:32:38 +00:00
PyTorch MergeBot
e7c3f802ff Revert "[dynamo][hop] Introduce Local Map HOP (#161458)"
This reverts commit 505458db80.

Reverted https://github.com/pytorch/pytorch/pull/161458 on behalf of https://github.com/jeffdaily due to broke rocm tests ([comment](https://github.com/pytorch/pytorch/pull/161458#issuecomment-3299230458))
2025-09-16 15:14:36 +00:00
Simon Fan
505458db80 [dynamo][hop] Introduce Local Map HOP (#161458)
Can't actually deploy it because of: https://github.com/pytorch/pytorch/issues/161456

Pull Request resolved: https://github.com/pytorch/pytorch/pull/161458
Approved by: https://github.com/ydwu4
2025-09-16 00:37:40 +00:00
morrison-turnansky
86d34a43f5 NamedTuple: Allow side effects for dynamic attributes (#161645)
I confirmed that the tracing was correct i.e. NamedTupleVariable had the correct dynamic attribute added to it.

The problem was that NamedTupleVariable was always marked as immutable. This does not reflect the behavior of namedtuple.

Subclasses of namedtuple may be mutable, so when a NamedTupleVariable is derived from a subclass that is mutable, I made NamedTupleVariable mutable as well. Then side_effects correctly updates the returned object.

Fixes #161610

Pull Request resolved: https://github.com/pytorch/pytorch/pull/161645
Approved by: https://github.com/anijain2305, https://github.com/StrongerXi
2025-09-09 19:42:02 +00:00
Arsh Zahed
4c45090cf7 [DTensor] Check if tracing for sharding propagation to handle unhashable keys (#160798)
Fixes #159590

This is similar to the reverted commit #156868, except it resolves an issue with two caches becoming misaligned, leading to incorrect objects for stateful placements (i.e. `_MaskPartial`) as in issue #159601. This adds little to no overhead in eager ([see past benchmarks](https://github.com/pytorch/pytorch/pull/156868#issuecomment-3047831149)).

This also handles cases such as #159590  where dynamo is disabled during tracing by entering the Python Dispatcher ahead of the sharding propogation during compile. Tests are added/modified to handle these, and the list/tuple inputs with the cat op.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/160798
Approved by: https://github.com/bdhirsh
2025-09-09 03:52:05 +00:00
angelayi
5c67426d68 [dynamo] Add support for const prop on .item (#162204)
Fixes some of the errors in https://fb.workplace.com/groups/1028545332188949/permalink/1303030824740397/

Pull Request resolved: https://github.com/pytorch/pytorch/pull/162204
Approved by: https://github.com/williamwen42
2025-09-05 00:28:49 +00:00
William Wen
8678d831c4 [dynamo] rename set_fullgraph to error_on_graph_break (#161739)
Renaming `set_fullgraph` to `error_on_graph_break` for now. There are no semantic differences yet. In a followup PR, we will introduce a new `torch.compile` option `error_on_graph_break` that has lower priority than `fullgraph` so that `fullgraph` really returns 1 graph.

I could keep `set_fullgraph` as a deprecated alias for `error_on_graph_break` for now, but I'm hoping that won't be necessary since it's still private API (there are no internal callsites yet, and there are no significant OSS callsites yet).

 cc @albanD @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @chenyang78 @kadeng @chauhang @amjames @Lucaskabela @mlazos @guilhermeleobas @xmfan as primary users for `set_fullgraph`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/161739
Approved by: https://github.com/xmfan, https://github.com/Lucaskabela, https://github.com/anijain2305, https://github.com/mlazos
2025-09-04 01:15:06 +00:00
Animesh Jain
68fa882dad [dynamo] Correctly track mutation class source for MutableMappingVariable (#161568)
Fixes https://github.com/pytorch/pytorch/issues/161505

Pull Request resolved: https://github.com/pytorch/pytorch/pull/161568
Approved by: https://github.com/Lucaskabela, https://github.com/malfet
2025-08-27 21:47:17 +00:00
Yidi Wu
ba6ce66698 [dynamo] lift backed symint output of item() (#161198)
Before the change in this PR, we have an error for the following code
```python
import torch

torch._dynamo.config.capture_scalar_outputs = True

class M(torch.nn.Module):
    def forward(self, idx, x):
        u0 = idx.item()
        x0 = x.select(0, u0)
        def fn():
            return x0.sin()
        return torch.cond(x0.sum() > 0, fn, fn)

m = M()
out = torch.compile(m, fullgraph=True)(torch.tensor(0, dtype=torch.int64), torch.randn(3, 3))
```

The error is caused when speculate fn, and tries to lift symbol of x0.storage_offset() but found the symbols doesn't have a source associated with it.

What really happens is that, when input tensor is a scalar tensor of int type and resides on CPU, we have a short cut that creates a norm symint when .item() is called see https://github.com/pytorch/pytorch/pull/126245.

However, previously, we only track the unbacked symint output of an operation because we believe all the backed symint must have a source associated with it and has already bee lifted as input at the top-level. Now this invariant no longer holds, so we end up an error saying the symbol doesn't have source (because only input and symbols derided from inputs have source and result of .item() doesn't have a source).

In this PR, we start to also track the normal symint with the proxy that created it (i.e. in this case the proxy .item()).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/161198
Approved by: https://github.com/zou3519
2025-08-26 17:06:54 +00:00
Arsh Zahed
9e491f753e [dynamo] Remove extra if statement in builder _wrap (#161215)
Removes a redundant if statement. Does not impact logic so no test changes needed.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/161215
Approved by: https://github.com/StrongerXi
2025-08-22 08:56:06 +00:00
Simon Fan
8aad3a60ce [dynamo] propagate tensor metadata on Tensor.__setitem__(tensor) (#161036)
Fixes silent incorrectness for autograd function tracing, where we rely on FakeTensor metadata (requires_grad) to determine whether to HOP or not: 5ee464db5c/torch/_dynamo/variables/misc.py (L671)

Stared at this with @anijain2305 yesterday, `Tensor.__setitem__` can update tensor metadata, and we can just run the fake prop and extract the output metadata from the updated FakeTensor.

FIXES https://github.com/pytorch/pytorch/issues/160901

It should also be the root cause behind the issue in https://github.com/pytorch/torchtitan/pull/1604 @bdhirsh  @ruisizhang123

Pull Request resolved: https://github.com/pytorch/pytorch/pull/161036
Approved by: https://github.com/anijain2305
ghstack dependencies: #160805
2025-08-22 04:43:22 +00:00
James Wu
9668210302 Allow bypasses for Precompile when guards, etc. cannot be serialized (#160902)
This adds a new function `bypass_package` and `CompilePackage.bypass_current_entry()`. This allows us to safely bypass if there are models with unserializable or incompatible parts. When we encounter something incompatible, we'll raise a bypass and ignore that particular code in DynamoCodeEntry.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/160902
Approved by: https://github.com/zhxchen17
2025-08-21 18:20:42 +00:00
Tugsbayasgalan (Tugsuu) Manlaibaatar
dbef606631 Add support for tracing vmap in pre-dispatch export (#154650)
Summary: ONNX team and recent transformer upgrade ran into this error and we also ran into during our export benchmarking. This diff makes it possible to trace through vmap implementation in pre-dispatch IR. Note that we don't support serializing functorch ops in pre-dispatch IR and in the future, we should desugar them to post-grad ops.

The implementation strategy is:
1. We add python wrappers around vmap APIs so that we attach custom torch function handler that is only on during non-strict export. The reason is we don't want to add this to default torch_function handler because it will break BC.
2. Some dynamo changes to make sure it picks up new python wrapper APIs. The reason is when we do strict export, we need to re-materialize these APIs in pre-dispatch IR from torch IR. We can avoid this by special casing in dynamo for export to proxy different API calls but i feel that is too much chaos because you need to be able to proxy 2 different variants of same vmap API.

Test Plan: CI

Differential Revision: D75623875

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154650
Approved by: https://github.com/ezyang, https://github.com/zou3519
2025-08-20 19:31:07 +00:00
Guilherme Leobas
c6333f7dae Fixes for collections.NamedTuple (#159367)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/159367
Approved by: https://github.com/mlazos
ghstack dependencies: #159365, #159366, #159368, #159483, #159902, #159864, #159865
2025-08-18 17:32:59 +00:00
Pian Pawakapan
9eedd2a20b [PGO] no counterfactual suggestions for dynamic allowlist (#160231)
Being more conservative with whitelist suggestions as we roll out suggestions; now we only suggest sources that were dynamic in previous runs.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/160231
Approved by: https://github.com/bobrenjc93
2025-08-11 20:13:25 +00:00
Animesh Jain
3eb3da9b4b [dynamo][guards] Skip ID_MATCH guard on self.__class__.__closure__ (#159888)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/159888
Approved by: https://github.com/williamwen42
2025-08-06 00:36:43 +00:00
James Wu
90fd06be71 Various bugfixes for running NanoGPT training (#159166)
Fix various small bugs with running nanogpt on torchbenchmark in OSS under python 3.10. After these changes, the following now succeeds:

```
tlp python benchmarks/dynamo/torchbench.py --only nanogpt --performance  --training --backend inductor  --caching-precompile --warm-start-latency
```

Cold start: https://manifold.edge.x2p.facebook.net/v0/read/tree/logs/.tmp12LuZ5/index.html?bucketName=tlparse_reports&apiKey=tlparse_reports-key&withPayload=1&timeoutMsec=10000

Warm start (we are invesigating the recompile):
https://manifold.edge.x2p.facebook.net/v0/read/tree/logs/.tmpT5YTB2/index.html?bucketName=tlparse_reports&apiKey=tlparse_reports-key&withPayload=1&timeoutMsec=10000

Pull Request resolved: https://github.com/pytorch/pytorch/pull/159166
Approved by: https://github.com/zhxchen17
2025-07-30 16:30:22 +00:00
Guilherme Leobas
576253c476 [math] Trace float.fromhex (#156976)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/156976
Approved by: https://github.com/zou3519
ghstack dependencies: #156975, #156977
2025-07-23 16:12:08 +00:00
Ryan Guo
2c16eb9f3d [dynamo] Support more basic output types for nonstrict_trace (#157969)
Fixes #157397 and improves the user-facing error message for remaining
unsupported cases.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/157969
Approved by: https://github.com/zou3519
2025-07-19 00:59:54 +00:00
Simon Fan
07c4c2a792 [dynamo][be] hide warnings without invalidating warnings cache (#158520)
I feel uneasy about touching `__warningregistry__` since it is undocumented and private surface. The only public API hook that doesn't increment warnings version seems to be https://docs.python.org/3/library/warnings.html#warnings.showwarning.

So we could wack a mole all the warnings muters in compile to just not display warnings, and we wouldn't invalidate warnings cache. This PR adds it for torch/_dynamo, and I didn't find any warnings versioning mutation from torch/_inductor.

There is a behavior change if someone calls a compiled graph with simplefilter("error"):
```python
# e.g. test/dynamo_expected_failures/TestAutogradFallback.test_no_autograd_kernel_inplace_mode_nothing
with warnings.catch_warnings():
    warnings.simplefilter("error")  # turns all warnings into errors
    compiled_fn()  # will throw if any of the muted warnings fire
```

FIXES https://github.com/pytorch/pytorch/issues/128427

A note for the future: The warnings module doesn't offer a thread safe way of using it. Even regular filters have this problem, directly editing `__warningregistry__` would be very bad, and this PR would mute all threads. Someone will need to build a thread safe warnings interface.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/158520
Approved by: https://github.com/anijain2305, https://github.com/zou3519
2025-07-18 22:02:31 +00:00
Yidi Wu
651b4a68f2 [hop][dynamo] track run-ahead sym variables in side effects (#158273)
Before the PR, for code like this:
```
        class Example2(torch.nn.Module):
            def forward(self, x, trigger, target):
                return torch.cond(
                    trigger == 1,
                    lambda: x + target,
                    lambda: x * target,
                    (),
                )

        m = Example2()
        x = torch.randn(2)
        trigger = 0
        target = 2
        args = (x, trigger, target)
        ep = torch.export.export(
            m, args, dynamic_shapes=(None, Dim.DYNAMIC, Dim.DYNAMIC)
        )
```
dynamo will wrap "target" (i.e. a symInt) twice, once when we speculate the first lambda and find target is a symint and decides to wrap it up, creating a new SymNodeVariable and a placeholder input to the top-level graph.

The second time happens when we speculate the second lambda. Tensors are de-duplicated by checking tracked side effects to make sure object with the same id (though different sources) is mapped to the same TensorVaraible. For symints, two things are missing:
1. it's not in the _can_lift_attrs_to_input list (the change in builder.py)
2. it's not in the tracked by runahead_side_effects, so when speculate_subgraph finishes, they're discarded (the change in side_effects.py)

Note: the auto lifting mechanism for HOPs happens at proxy level when we trace the subgraph, which is after SymNodeVariable are created (they're created when realizing the args and bind them to subgraph). At that time, builder has created two unique SymNodeVariable for the same symint so the auto lifting in hops cannot de-dup them.

Differential Revision: [D78298163](https://our.internmc.facebook.com/intern/diff/D78298163)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/158273
Approved by: https://github.com/avikchaudhuri, https://github.com/zou3519
2025-07-15 23:48:20 +00:00
Xuehai Pan
7f14b42adf [BE][2/16] fix typos in torch/ (torch/_*/) (#156312)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/156312
Approved by: https://github.com/albanD
2025-07-12 05:47:06 +00:00
PyTorch MergeBot
e15f4248ad Revert "[BE][2/16] fix typos in torch/ (torch/_*/) (#156312)"
This reverts commit 7a92b51196.

Reverted https://github.com/pytorch/pytorch/pull/156312 on behalf of https://github.com/XuehaiPan due to landrace ([comment](https://github.com/pytorch/pytorch/pull/156312#issuecomment-3064672250))
2025-07-12 04:40:52 +00:00
Xuehai Pan
7a92b51196 [BE][2/16] fix typos in torch/ (torch/_*/) (#156312)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/156312
Approved by: https://github.com/albanD
2025-07-12 01:47:22 +00:00
bobrenjc93
80bcaa4195 have dynamic sources only apply to sizes and not strides (#157960)
@animesh pointed out using whitelist for strides can result in confusing graphs as follows

```
s60: "Sym(s60)", L_hidden_states_: "bf16[1, 4096, 3072][s60, 3072, 1]cuda:0"
```

We probably want to capture the relationship between sizes and strides anyways so let's make it so the whitelist only makes the sizes dynamic. That same graph now looks lik ethis

```
L_hidden_states_: "bf16[1, 4096, 64][262144, 64, 1]cuda:0"
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/157960
Approved by: https://github.com/pianpwk
2025-07-10 05:03:51 +00:00
Pian Pawakapan
752f202ef3 [PGO] include module int attributes in PGO state (#157518)
Dynamo specializes on int module attributes by default. This includes them in PGO state despite specialization, if they're involved in guards.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/157518
Approved by: https://github.com/bobrenjc93
2025-07-09 23:57:54 +00:00
Ryan Guo
f742b32a2f [dynamo] Avoid recompiling over unused objects (#156891)
Dynamo was aggressively specializing on lazy VTs over `set_name_hint` in
`STORE_FAST`, etc., and `isinstance` in `LOAD_FAST_CHECK`. This causes
regional `torch.compile` from optimizing ComfyUI GGUF + LoRA to either
(1). exceed the recompialtion limit of 8, which results in suboptimal
performance, and (2). even if recompilation limit is increased, the
compilation time gets unnecessarily high (180s v.s. 20s for Flux).

This patch fixes the recompilation issue.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/156891
Approved by: https://github.com/williamwen42, https://github.com/mlazos
2025-07-09 20:14:34 +00:00
Guilherme Leobas
0e7f02fe2e [Dynamo] [FrozensetSubclass] Add support for user defined frozensets (#154263)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154263
Approved by: https://github.com/williamwen42
ghstack dependencies: #153150, #152991, #154539, #153553, #154063, #154064, #154065, #154066
2025-07-04 00:46:05 +00:00
Guilherme Leobas
22abe6ded4 [Dynamo] [SetSubclass] Add support for user defined sets (#153553)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/153553
Approved by: https://github.com/williamwen42, https://github.com/zou3519
ghstack dependencies: #153150, #152991, #154539
2025-07-04 00:45:25 +00:00
Guilherme Leobas
e7167dbacf [Set] Support sets in VariableBuilder (#153150)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/153150
Approved by: https://github.com/zou3519
2025-07-04 00:45:03 +00:00
William Wen
dcb8982969 [dynamo] move error_on_graph_break out of config (#156762)
error_on_graph_break doesn't need to be in config, so we move it out. It should make the functorch_maml_omniglot regression less severe.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/156762
Approved by: https://github.com/jansel
ghstack dependencies: #154283, #154289, #154782
2025-06-26 21:40:38 +00:00
Xuehai Pan
1b2146fc6d [BE][4/16] fix typos in torch/ (torch/_dynamo/) (#156314)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/156314
Approved by: https://github.com/jingsh
ghstack dependencies: #156313
2025-06-23 02:57:19 +00:00
PyTorch MergeBot
5b427c92a8 Revert "[BE][4/16] fix typos in torch/ (torch/_dynamo/) (#156314)"
This reverts commit ead741c5fb.

Reverted https://github.com/pytorch/pytorch/pull/156314 on behalf of https://github.com/atalman due to export/test_torchbind.py::TestCompileTorchbind::test_compile_error_on_input_aliasing_contents_backend_aot_eager [GH job link](https://github.com/pytorch/pytorch/actions/runs/15804799771/job/44548489912) [HUD commit link](c95f7fa874) ([comment](https://github.com/pytorch/pytorch/pull/156313#issuecomment-2994171213))
2025-06-22 12:31:57 +00:00
Xuehai Pan
ead741c5fb [BE][4/16] fix typos in torch/ (torch/_dynamo/) (#156314)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/156314
Approved by: https://github.com/jingsh
ghstack dependencies: #156313
2025-06-22 08:43:18 +00:00
Animesh Jain
fab85fc5f9 [compile][hierarchical compilation] Release nested_compile_region API (#156449)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/156449
Approved by: https://github.com/zou3519, https://github.com/jansel
2025-06-21 15:14:59 +00:00
David Berard
132babe7e0 [user triton] dynamo support for new host-side TMA API (#155662)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/155662
Approved by: https://github.com/aakhundov
ghstack dependencies: #155510
2025-06-12 12:56:23 +00:00
Animesh Jain
a9d5157e25 [dynamo] Use BINARY_SUBSCR for pre-graph bytecode for regular dict accesses (#155727)
vLLM profiler sets with_stack=True that shows the dict_getitem on the profiler, both inflating the numbers and confusing compile users. This PR keeps BINARY_SUBSCR for regular dicts, while using `dict.__getitem__` only for dict subclasses.

Using binary_subscr is little bit faster, but not enough to make any major latency improvements.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/155727
Approved by: https://github.com/zou3519, https://github.com/StrongerXi, https://github.com/jansel
2025-06-12 04:02:29 +00:00
Oguz Ulgen
d1947a8707 Migrate from lru_cache to cache (#155613)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/155613
Approved by: https://github.com/ezyang
ghstack dependencies: #155612
2025-06-11 19:44:18 +00:00
Animesh Jain
13ea0f2c0a [dynamo][dynamic] Recompilation hint for nn module integer attributes (#154867)
For program like this

```
class Mod(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.c = 0

    def forward(self, x):
        self.c += 1
        return x * self.c
```

You can check the recompile reasons at https://manifold.edge.x2p.facebook.net/v0/read/tree/logs/.tmpzv9z6Q/index.html?bucketName=tlparse_reports&apiKey=tlparse_reports-key&withPayload=1&timeoutMsec=10000

![image](https://github.com/user-attachments/assets/856a95fd-0533-4abc-a213-1f73ae2cb766)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154867
Approved by: https://github.com/zou3519
2025-06-05 16:37:22 +00:00
Animesh Jain
c881f2ddf3 [reland][dynamo] Mark a vt unspecialized nn module variable source earlier (#155099)
Reland of https://github.com/pytorch/pytorch/pull/154780

Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/155099
Approved by: https://github.com/williamwen42
2025-06-04 23:05:36 +00:00
PyTorch MergeBot
a99a01a677 Revert "[dynamo] Mark a vt unspecialized nn module variable source earlier (#154780)"
This reverts commit cc96febb97.

Reverted https://github.com/pytorch/pytorch/pull/154780 on behalf of https://github.com/seemethere due to This fails internal testing see, https://fburl.com/diff/b0yuxk4w ([comment](https://github.com/pytorch/pytorch/pull/154780#issuecomment-2940381691))
2025-06-04 15:03:34 +00:00
PyTorch MergeBot
a0f2544502 Revert "[dynamo][dynamic] Recompilation hint for nn module integer attributes (#154867)"
This reverts commit 6c2f941e25.

Reverted https://github.com/pytorch/pytorch/pull/154867 on behalf of https://github.com/seemethere due to This fails internal testing see, https://fburl.com/diff/b0yuxk4w ([comment](https://github.com/pytorch/pytorch/pull/154780#issuecomment-2940381691))
2025-06-04 15:03:34 +00:00
Animesh Jain
6c2f941e25 [dynamo][dynamic] Recompilation hint for nn module integer attributes (#154867)
For program like this

```
class Mod(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.c = 0

    def forward(self, x):
        self.c += 1
        return x * self.c
```

You can check the recompile reasons at https://manifold.edge.x2p.facebook.net/v0/read/tree/logs/.tmpzv9z6Q/index.html?bucketName=tlparse_reports&apiKey=tlparse_reports-key&withPayload=1&timeoutMsec=10000

![image](https://github.com/user-attachments/assets/856a95fd-0533-4abc-a213-1f73ae2cb766)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154867
Approved by: https://github.com/zou3519
ghstack dependencies: #154780
2025-06-04 00:05:53 +00:00
Animesh Jain
cc96febb97 [dynamo] Mark a vt unspecialized nn module variable source earlier (#154780)
I am working on providing some skip guard helper functions to allow users to reduce guard overhead. This is a refactor to allow that.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154780
Approved by: https://github.com/StrongerXi, https://github.com/jansel
2025-06-03 19:19:47 +00:00
Ryan Guo
7183f52675 [dynamo] Support namedtuple subclass (#153982)
Fixes #133762. This involves
1. support tuple subclass constructed inside compile region.
2. handle the "fake" global scope associated with NamedTuple-generated
   `__new__`.
3. handle `namedtuple._tuplegetter` more faithfully.

Differential Revision: [D75488091](https://our.internmc.facebook.com/intern/diff/D75488091)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/153982
Approved by: https://github.com/jansel
ghstack dependencies: #154176
2025-05-30 16:14:37 +00:00
bobrenjc93
9c06dff1ce [multigraph] use specializations in compile_and_call_fx_graph (#153449)
The goal of this multigraph work is to enable a compiled region that has a single dynamo trace but multiple backend specializations. This work was inspired by vLLM which does this in a somewhat hacky way where they use a custom backend to capture a dynamo graph and then manually invoke compile_fx multiple times to get specialized graphs.

There's really two parts of this work:

**The frontend changes:**
1) we introduce an optional kwarg `specialize_on` to mark_{dynamic,unbacked} that takes in a list of specializations. I debated other methods including specifying specializations via decorators, but ultimately decided this approach was more harmonious. The big issue with decorators is the difficulty of composing well with the rest of the torch.compile ecosystem including graph breaks, lazy initialization of variable trackers and symbolic variables, etc.

**The backend changes (this PR):**
1) We capture the backend_specialization specified in the mark_{dynamic,unbacked} API into a SymbolicContext. See changes in `/_dynamo/variables/builder.py`
2) After we are done dynamo tracing, we will lazily (more on this later) invoke `call_user_compiler` up to N + 1 times for N specializations and 1 generic graph. Under the hood this will call compile_fx, which composes nicely with both Async Compile and AOTAutogradCache. We do this by using a context manager to patch in specialization specific axioms into the ShapeEnv before invoking the user compiler.
3) When we have specializations, we install a lazy specialized dispatch function that checks each specialization and dispatches to the first one that matches. Instead of doing all of the specialization compiles up front, we do the compiles lazily. The first time a specialization is invoked, we will do the compilation and save it in a cache so subsequent invocations are fast. If none of the specializations match, we dispatch to the generic graph. I decided to do this over returning N different GuardedCodes since 1) it doesn't pollute the dynamo cache (eg. if you have 8 specializations, you would hit the cache limit) 2) it naturally incorporates the hierarchical lattice structure of the guards since the specializations are always necessarily stricter than the generic region's guards.

I benchmarked this PR stack with #152596 and found around a 50% reduction when dispatching to the specialized regions:

![495269647_576053105510082_9189856138964956774_n](https://github.com/user-attachments/assets/66030fed-d62e-4d87-940f-aa13c99b1a73)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153449
Approved by: https://github.com/zou3519
ghstack dependencies: #153433
2025-05-30 03:19:49 +00:00
bobrenjc93
d865b784e4 Support unbacked whitelist (#154295)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154295
Approved by: https://github.com/angelayi
2025-05-28 23:01:22 +00:00
Pian Pawakapan
1d9b7dd2d1 [PGO] suggest dynamic whitelist for recompilations (#154189)
suggests `TORCH_COMPILE_DYNAMIC_SOURCES` based off tensor size changes in PGO code state, including parameters.

Closing #153442 which took the dynamo guards approach.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154189
Approved by: https://github.com/bobrenjc93
2025-05-28 07:11:43 +00:00