Commit Graph

4601 Commits

Author SHA1 Message Date
PyTorch MergeBot
35fc5c49b4 Revert "[internal] Expose additional metadata to compilation callbacks (#153596)"
This reverts commit f889dea97d.

Reverted https://github.com/pytorch/pytorch/pull/153596 on behalf of https://github.com/izaitsevfb due to introduces bunch of callback-related failures on rocm ([comment](https://github.com/pytorch/pytorch/pull/153596#issuecomment-2923139061))
2025-05-30 18:39:27 +00:00
Aaron Gokaslan
b6b9311f4f [BE][Ez]: Fix typo in dynamo utils #154639 (#154748)
Fixes a typo in #154639

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154748
Approved by: https://github.com/ngimel
2025-05-30 18:39:01 +00:00
Aaron Gokaslan
2120eeb8de [BE][Ez]: Improve dynamo utils typing with TypeIs and TypeGuard (#154639)
Adds some additional TypeIs and TypeGuard to some _dynamo utils for additional type narrowing

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154639
Approved by: https://github.com/jansel
2025-05-30 18:09:50 +00:00
Sidharth
3bdceab124 [dynamo] fix: added star operator for graph_break_hints (#154713)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154713
Approved by: https://github.com/zou3519, https://github.com/williamwen42
2025-05-30 17:31:03 +00:00
bobrenjc93
e7bf72c908 [multigraph] fix composabilty with aotautograd cache (#153526)
AOTAutogradCache uses FXGraphCache which uses the tracing context to get the ShapeEnv. Although the TracingContext global_context is cleared by the time we get around to reusing it, we don't actually need it. We just need the ShapeEnv in the TracingContext, which isn't cleared at the end of dynamo and does persist. This PR adds the tracing context manager around the specialized compile to ensure our caching infrastructure can get access to the ShapeEnv. A test was also added to prove correctness.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153526
Approved by: https://github.com/jamesjwu, https://github.com/zou3519
ghstack dependencies: #153433, #153449
2025-05-30 16:56:17 +00:00
Ryan Guo
7183f52675 [dynamo] Support namedtuple subclass (#153982)
Fixes #133762. This involves
1. support tuple subclass constructed inside compile region.
2. handle the "fake" global scope associated with NamedTuple-generated
   `__new__`.
3. handle `namedtuple._tuplegetter` more faithfully.

Differential Revision: [D75488091](https://our.internmc.facebook.com/intern/diff/D75488091)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/153982
Approved by: https://github.com/jansel
ghstack dependencies: #154176
2025-05-30 16:14:37 +00:00
Ryan Guo
8002d22ce3 [dynamo] Trace into descriptor with __set__ (#154176)
As title, this patch basically implements
https://github.com/python/cpython/blob/3.11/Objects/object.c#L1371-L1452,
and make the `__get__` handling more robust.

I ran into this while fixing #133762.

Differential Revision: [D75488090](https://our.internmc.facebook.com/intern/diff/D75488090)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154176
Approved by: https://github.com/jansel
2025-05-30 16:14:37 +00:00
Simon Fan
f889dea97d [internal] Expose additional metadata to compilation callbacks (#153596)
These hooks are used by internal stuck job detection to associate compilation events with the compile lease. Previously, we only had events for Dynamo and Inductor compilation. And recently, the callback handler was updated to ignore nested events. So the Inductor event was only really used by lazy backward.

Here, I remove the inductor event, and add an explicit lazy backward one. Additionally, I add other runtime compilation events: autotuning and cudagraphs. I also expose the CompileId as a string to avoid imports, this will let internal UIs track each graph's contribution to the timeout.

```python
class CallbackTrigger(enum.Enum):
    # most common case, dynamo attempts to trace a new frame
    DYNAMO = 1
    # backward compilation can be deferred to runtime
    LAZY_BACKWARD = 2
    # some backends autotune at runtime
    TRITON_AUTOTUNING = 3
    # cudagraphs record at runtime
    CUDAGRAPH_RECORDING = 4
```

Differential Revision: [D75092426](https://our.internmc.facebook.com/intern/diff/D75092426)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/153596
Approved by: https://github.com/masnesral
2025-05-30 08:07:04 +00:00
PyTorch MergeBot
ba51f4876d Revert "Enable C++ dynamic shape guards by default (#140756)"
This reverts commit dc0f09a478.

Reverted https://github.com/pytorch/pytorch/pull/140756 on behalf of https://github.com/izaitsevfb due to seem to break dynamo tests ([comment](https://github.com/pytorch/pytorch/pull/140756#issuecomment-2921151663))
2025-05-30 03:52:02 +00:00
bobrenjc93
9c06dff1ce [multigraph] use specializations in compile_and_call_fx_graph (#153449)
The goal of this multigraph work is to enable a compiled region that has a single dynamo trace but multiple backend specializations. This work was inspired by vLLM which does this in a somewhat hacky way where they use a custom backend to capture a dynamo graph and then manually invoke compile_fx multiple times to get specialized graphs.

There's really two parts of this work:

**The frontend changes:**
1) we introduce an optional kwarg `specialize_on` to mark_{dynamic,unbacked} that takes in a list of specializations. I debated other methods including specifying specializations via decorators, but ultimately decided this approach was more harmonious. The big issue with decorators is the difficulty of composing well with the rest of the torch.compile ecosystem including graph breaks, lazy initialization of variable trackers and symbolic variables, etc.

**The backend changes (this PR):**
1) We capture the backend_specialization specified in the mark_{dynamic,unbacked} API into a SymbolicContext. See changes in `/_dynamo/variables/builder.py`
2) After we are done dynamo tracing, we will lazily (more on this later) invoke `call_user_compiler` up to N + 1 times for N specializations and 1 generic graph. Under the hood this will call compile_fx, which composes nicely with both Async Compile and AOTAutogradCache. We do this by using a context manager to patch in specialization specific axioms into the ShapeEnv before invoking the user compiler.
3) When we have specializations, we install a lazy specialized dispatch function that checks each specialization and dispatches to the first one that matches. Instead of doing all of the specialization compiles up front, we do the compiles lazily. The first time a specialization is invoked, we will do the compilation and save it in a cache so subsequent invocations are fast. If none of the specializations match, we dispatch to the generic graph. I decided to do this over returning N different GuardedCodes since 1) it doesn't pollute the dynamo cache (eg. if you have 8 specializations, you would hit the cache limit) 2) it naturally incorporates the hierarchical lattice structure of the guards since the specializations are always necessarily stricter than the generic region's guards.

I benchmarked this PR stack with #152596 and found around a 50% reduction when dispatching to the specialized regions:

![495269647_576053105510082_9189856138964956774_n](https://github.com/user-attachments/assets/66030fed-d62e-4d87-940f-aa13c99b1a73)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153449
Approved by: https://github.com/zou3519
ghstack dependencies: #153433
2025-05-30 03:19:49 +00:00
bobrenjc93
172015fc11 [multigraph] add specialize_on kwarg to mark_{dynamic,unbacked} (#153433)
The goal of this multigraph work is to enable a compiled region that has a single dynamo trace but multiple backend specializations. This work was inspired by vLLM which does this in a somewhat hacky way where they use a custom backend to capture a dynamo graph and then manually invoke compile_fx multiple times to get specialized graphs.

There's really two parts of this work:

**The frontend changes (this PR):**
1) we introduce an optional kwarg `specialize_on` to mark_{dynamic,unbacked} that takes in a list of specializations. I debated other methods including specifying specializations via decorators, but ultimately decided this approach was more harmonious. The big issue with decorators is the difficulty of composing well with the rest of the torch.compile ecosystem including graph breaks, lazy initialization of variable trackers and symbolic variables, etc.

**The backend changes:**
1) We capture the backend_specialization specified in the mark_{dynamic,unbacked} API into a SymbolicContext. See changes in `/_dynamo/variables/builder.py`
2) After we are done dynamo tracing, we will lazily (more on this later) invoke `call_user_compiler` up to N + 1 times for N specializations and 1 generic graph. Under the hood this will call compile_fx, which composes nicely with both Async Compile and AOTAutogradCache. We do this by using a context manager to patch in specialization specific axioms into the ShapeEnv before invoking the user compiler.
3) When we have specializations, we install a lazy specialized dispatch function that checks each specialization and dispatches to the first one that matches. Instead of doing all of the specialization compiles up front, we do the compiles lazily. The first time a specialization is invoked, we will do the compilation and save it in a cache so subsequent invocations are fast. If none of the specializations match, we dispatch to the generic graph. I decided to do this over returning N different GuardedCodes since 1) it doesn't pollute the dynamo cache (eg. if you have 8 specializations, you would hit the cache limit) 2) it naturally incorporates the hierarchical lattice structure of the guards since the specializations are always necessarily stricter than the generic region's guards.

I benchmarked this PR stack with #152596 and found around a 50% reduction when dispatching to the specialized regions:

![495269647_576053105510082_9189856138964956774_n](https://github.com/user-attachments/assets/66030fed-d62e-4d87-940f-aa13c99b1a73)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153433
Approved by: https://github.com/zou3519
2025-05-30 01:08:15 +00:00
Isuru Fernando
dc0f09a478 Enable C++ dynamic shape guards by default (#140756)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/140756
Approved by: https://github.com/anijain2305, https://github.com/laithsakka
ghstack dependencies: #151225
2025-05-29 23:44:43 +00:00
William Wen
81b7c96697 [dynamo, nested graph breaks] add skip_frame debugging function (#153773)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/153773
Approved by: https://github.com/jansel
ghstack dependencies: #151056, #153510, #153772
2025-05-28 23:29:37 +00:00
William Wen
6cda280483 [dynamo, nested graph breaks] remove block stack graph break in output_graph (#153772)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/153772
Approved by: https://github.com/jansel
ghstack dependencies: #151056, #153510
2025-05-28 23:29:37 +00:00
William Wen
bbd45f1f1f [dynamo, nested graph breaks] refactor codegen to minimize NULL codegen'ing (#153510)
Stop codegening NULLs that we need to pop later. Some output_graph.py changes to prepare for nested graph break support.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153510
Approved by: https://github.com/jansel
ghstack dependencies: #151056
2025-05-28 23:29:37 +00:00
William Wen
0f0d5749a0 [dynamo, nested graph breaks] small fixes to resume function generation (#151056)
Old: ~pack resume function stack + locals into a list: we need to be able to pass frame stack+locals in lists to hand off to nested functions in the future, so we implement this part first.~

We are no longer doing this right now since GraphModule/guard variable naming gets messed up. Going forward, our approach will be to keep the top frame unpacked, but pack the rest of the contents of other frames in a list.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/151056
Approved by: https://github.com/jansel
2025-05-28 23:29:37 +00:00
bobrenjc93
d865b784e4 Support unbacked whitelist (#154295)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154295
Approved by: https://github.com/angelayi
2025-05-28 23:01:22 +00:00
mieshkiwrk
ef4d57329b [CAG] Support for call_module at copy paste aot bwd graph (#153827)
Support for `call_module` in `copy_paste_aot_backward_graph` added recently with PT2.7

Problem is being observed with HPU backend in example repro due to creating fused modules.

```
import torch

device = 'cpu' #'hpu'
backend = 'inductor' #'hpu_backend'

def fn(t1):
    t1 = t1 * 1
    t1_grad = torch.ones_like(t1, device=device)
    t1.backward(t1_grad, retain_graph=True)
    return t1

t1 = torch.ones(1, requires_grad=True, device=device) #.squeeze()
compiled_fn = torch.compile(fn, backend=backend)
result = compiled_fn(t1)

with torch._dynamo.compiled_autograd._enable(torch.compile(backend=backend)):
    result_grad = torch.ones_like(result, device=device)
    result.backward(result_grad)

print(f'{result_grad=}')
print(f'{t1.grad=}')
```

With this change I'm getting same results like on CPU, however I'm facing below problem when running with scalar (t1 tensor after squeeze):
`torch._dynamo.exc.TorchRuntimeError: Dynamo failed to run FX node with fake tensors: call_function <built-in function getitem>(*(FakeTensor(..., device='hpu:0', size=()), 0), **{}): got IndexError('invalid index of a 0-dim tensor. Use `tensor.item()` in Python or `tensor.item<T>()` in C++ to convert a 0-dim tensor to a number')`

While on CPU there's following warning and None returned:
`repro.py:23: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at pytorch/build/aten/src/ATen/core/TensorBody.h:489.)
  print(f'{t1.grad=}')
t1.grad=None`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153827
Approved by: https://github.com/xmfan
2025-05-28 22:52:40 +00:00
Zhengxu Chen
0f56318152 [precompile] Add Exception type PackageError for unsupported precompile features. (#154430)
Summary:
Today when guard serialization fails, dynamo will raise an internal error like:

```
torch._dynamo.exc.InternalTorchDynamoError: RuntimeError: CLOSURE_MATCH guard cannot be serialized.
```

Adding a dedicated PackageError type to surface the error more clearly.

Test Plan: CI

Differential Revision: D75452124

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154430
Approved by: https://github.com/jamesjwu, https://github.com/jansel
2025-05-28 22:34:51 +00:00
Sidharth
70539308ac [dynamo] updating gb_type names for uniqueness (#154452)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154452
Approved by: https://github.com/williamwen42
2025-05-28 16:54:10 +00:00
Zhengxu Chen
5bf74753f6 [precompile] Prune local scope variables for guard serialization. (#154431)
Summary: Prune unused local objects from serialized local scope if they are not used in guard reconstruction. This is helpful when a user program takes things like local callable functions or the function call is recursive.

Test Plan:
test/dynamo/test_guard_serialization.py -k test_function_locals

Before pruning locals:
```
state = GuardsState(output_graph=OutputGraphGuardsState(local_scope={'x': tensor([ 0.0461,  0.4024, -1.0115]), 'g': <function ...aints=None, _guards=<torch._guards.GuardsSet object at 0x7fbccc7e9fc0>, _aotautograd_guards=[]), shape_code_parts=None)

    def pickle_guards_state(state: GuardsState) -> bytes:
        buf = io.BytesIO()
        pickler = GuardsStatePickler(buf)
        try:
            pickler.dump(state)
        except AttributeError as e:
>           raise torch._dynamo.exc.PackageError(str(e)) from e
E           torch._dynamo.exc.PackageError: Can't pickle local object 'TestGuardSerialization.test_function_locals.<locals>.foo'
```
After the diff
```
Tests finished: Pass 1. Fail 0. Fatal 0. Skip 0. Build failure 0
```

Differential Revision: D75452123

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154431
Approved by: https://github.com/jansel
2025-05-28 16:03:02 +00:00
Joel Schlosser
9db7bcb3fe [Dynamo] Introduce hook receiving list of traced code objects (#153622)
This PR:
* Expands `Hooks` with a new, optional `frame_traced_fn` field. It should be a callable receiving the list of traced code objects
* Maintains a list of `traced_code` objects in the `TracingContext` of an `OutputGraph`
    *  Whenever an `inline_call()` is encountered, the corresponding code object is added to this set
    * `OutputGraph`'s associated `f_code` is added to the list just before the hook is called

I believe use of this hook should enable the source code hashing that vLLM does in a better way than monkey-patching `inline_call()`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153622
Approved by: https://github.com/jansel
2025-05-28 15:40:09 +00:00
PyTorch MergeBot
a75e3a02be Revert "[dynamo, nested graph breaks] small fixes to resume function generation (#151056)"
This reverts commit 28e7aa21c5.

Reverted https://github.com/pytorch/pytorch/pull/151056 on behalf of https://github.com/malfet due to Not sure which one, but it broke test_error_messages, see 203b0efd63/1 ([comment](https://github.com/pytorch/pytorch/pull/151056#issuecomment-2916437433))
2025-05-28 13:53:50 +00:00
PyTorch MergeBot
9603d6382d Revert "[dynamo, nested graph breaks] refactor codegen to minimize NULL codegen'ing (#153510)"
This reverts commit 1fe9842922.

Reverted https://github.com/pytorch/pytorch/pull/153510 on behalf of https://github.com/malfet due to Not sure which one, but it broke test_error_messages, see 203b0efd63/1 ([comment](https://github.com/pytorch/pytorch/pull/151056#issuecomment-2916437433))
2025-05-28 13:53:50 +00:00
PyTorch MergeBot
5fd7004dc9 Revert "[dynamo, nested graph breaks] remove block stack graph break in output_graph (#153772)"
This reverts commit 9a66c30bdc.

Reverted https://github.com/pytorch/pytorch/pull/153772 on behalf of https://github.com/malfet due to Not sure which one, but it broke test_error_messages, see 203b0efd63/1 ([comment](https://github.com/pytorch/pytorch/pull/151056#issuecomment-2916437433))
2025-05-28 13:53:50 +00:00
PyTorch MergeBot
e86439ed5b Revert "[dynamo, nested graph breaks] add skip_frame debugging function (#153773)"
This reverts commit aadf9eae63.

Reverted https://github.com/pytorch/pytorch/pull/153773 on behalf of https://github.com/malfet due to Not sure which one, but it broke test_error_messages, see 203b0efd63/1 ([comment](https://github.com/pytorch/pytorch/pull/151056#issuecomment-2916437433))
2025-05-28 13:53:50 +00:00
Laith Sakka
853958f82c Fix: Replacements can cause runtime assertions to disappear and can cause invalid inductor code. (#153661)
Lets explore firs a couple of problem related to replacements and runtime assertions.

#### example problem 1
if we have a runtime assertions that u0==s0, u0 is an input coming from mark_unbacked. A replacement u0=s0 will be added, the function f(u0, s0) will become f(s0, s0), this leads to the assert  not being inserted during insert_deferred_runtime_asserts.
The reason is that insert_deferred_runtime_asserts logic insert each assertion once all its inputs are seen,  but u0 will never be seen. Same thing can happen when we defer assertion on backed i.e: s0==s2 ..etc.

#### example problem 2
Consider u0==s0, where u0 is coming from a call to .item() Imagine later on that a specialization happens to s0 to become 2. In that case s0 as input wont be seen during insert_deferred_runtime_asserts and the assertion won't be inserted in the graph. Worse, Inductor will generate some code that refers to s0 in the cpp wrapper while it does not exist, causing a failure.
internal xref: https://fb.workplace.com/groups/1075192433118967/permalink/1669766396994898/

## The solution :
Runtime assertions insertion loops depend on detecting that the symbols that are used in the runtime assertions are seen, note that those symbols are either graph inputs or generated in the graph from data dependent ops like .item().

The issues above happen when symbols are graph inputs, in order to force the symbols to exist in the graph and to be seen by the runtime assertions we do not do replacements on placeholders expressions during codegen and during runtime assertions insertion.

This should not have performance overhead, since we already optimized the graph with replacements, the only effect is not mistakenly dropping graph inputs that are used in runtime assertions.
I added extended testing. A solo unrelated follow up that I noticed, is that we might want to rename unbacked symbols in runtime assertions when we do unbacked renaming, but that's a different issue.

Other approaches that did not work :
#### ban replacements on unbacked.
1. does not work when we defer runtime assertions on backed ex: s0==s1. we could also ban such replacements
but problem 2 becomes more problematic.
2. Problem two, it affects the quality of reasoning ! in a bad way.

#### Apply specialization on runtime assertions before codegen .
1. Can fix some issues, but may lead also to runtime assertions becoming NOPs.
2. Does not fix the issue if not inserting runtime assertions during insert_deferred_runtime_asserts due to input not being detected.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153661
Approved by: https://github.com/jansel
2025-05-28 09:08:05 +00:00
William Wen
aadf9eae63 [dynamo, nested graph breaks] add skip_frame debugging function (#153773)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/153773
Approved by: https://github.com/jansel
ghstack dependencies: #151056, #153510, #153772
2025-05-28 08:54:09 +00:00
William Wen
9a66c30bdc [dynamo, nested graph breaks] remove block stack graph break in output_graph (#153772)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/153772
Approved by: https://github.com/jansel
ghstack dependencies: #151056, #153510
2025-05-28 08:54:09 +00:00
William Wen
1fe9842922 [dynamo, nested graph breaks] refactor codegen to minimize NULL codegen'ing (#153510)
Stop codegening NULLs that we need to pop later. Some output_graph.py changes to prepare for nested graph break support.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153510
Approved by: https://github.com/jansel
ghstack dependencies: #151056
2025-05-28 08:54:09 +00:00
William Wen
28e7aa21c5 [dynamo, nested graph breaks] small fixes to resume function generation (#151056)
Old: ~pack resume function stack + locals into a list: we need to be able to pass frame stack+locals in lists to hand off to nested functions in the future, so we implement this part first.~

We are no longer doing this right now since GraphModule/guard variable naming gets messed up. Going forward, our approach will be to keep the top frame unpacked, but pack the rest of the contents of other frames in a list.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/151056
Approved by: https://github.com/jansel
2025-05-28 08:54:09 +00:00
Pian Pawakapan
1d9b7dd2d1 [PGO] suggest dynamic whitelist for recompilations (#154189)
suggests `TORCH_COMPILE_DYNAMIC_SOURCES` based off tensor size changes in PGO code state, including parameters.

Closing #153442 which took the dynamo guards approach.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154189
Approved by: https://github.com/bobrenjc93
2025-05-28 07:11:43 +00:00
Sidharth
54f1f29fed [dynamo] dynamic gb_type -> static gb_type (#154435)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154435
Approved by: https://github.com/williamwen42
2025-05-28 03:14:26 +00:00
Ryan Guo
75bbd4989c [dynamo] Support using symint from dispatcher-style tensor subclass (#154130)
Fixes #146932.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154130
Approved by: https://github.com/laithsakka
2025-05-27 19:05:46 +00:00
bobrenjc93
2560c1f3f0 add sticky cache pgo (#154418)
It's a reland of https://github.com/pytorch/pytorch/pull/154394 that hit some mergebot bug

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154418
Approved by: https://github.com/malfet
2025-05-27 16:40:18 +00:00
bobrenjc93
53ecb8159a Introduce statically_known_false (#154291)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154291
Approved by: https://github.com/mengluy0125
2025-05-24 14:23:55 +00:00
Zhengxu Chen
308beeeb56 [dynamo] Use UUID for compiled function variable names. (#154148)
Summary:
We previously assign each compiled function variable a name based on in-process global counter. This works fine within the same process but when we're trying to serialize the states with precompile, we need a way to load back these compiled functions without causing collision to the existing global scope.

Changing the counter to a true global uuid seems to resolve this issue.

For example, the new variable name will look like:
```
__compiled_fn_0_7ce7d872_4fe8_4174_b8fd_2496b09b8b43
```

Test Plan: CI

Differential Revision: D75244901

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154148
Approved by: https://github.com/jansel
2025-05-24 01:08:42 +00:00
James Wu
bb17f9c98b [AOTAutogradCache] Fix CHROMIUM_EVENT_LOG being none (#154258)
It turns out if you import something that's None at import time in python, and later update the value, the one you imported stays none:

```
import torch
from torch._dynamo.utils import CHROMIUM_EVENT_LOG
class Foo:
  pass
torch._dynamo.utils.CHROMIUM_EVENT_LOG =  Foo()

print(CHROMIUM_EVENT_LOG) # None
```

This fixes teh bug so we get AOTAUtogradCache instant events again

Differential Revision: [D75305770](https://our.internmc.facebook.com/intern/diff/D75305770/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154258
Approved by: https://github.com/oulgen
2025-05-23 21:53:31 +00:00
William Wen
5bb156a7fd [dynamo] raise observed exception for module attribute errors (#153659)
Fixes https://github.com/pytorch/pytorch/issues/153605

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153659
Approved by: https://github.com/StrongerXi
2025-05-23 03:56:26 +00:00
Ruisi Zhang
f74842d665 [DTensor] enable SimpleFSDP's composability with Tensor Parallel (#152286)
This PR adds support for SimpleFSDP's composability with Tensor Parallel + torch.compile.

`_StridedShard` is used in SimpleFSDP/FSDP2 to support correct distributed checkpointing when FSDP+TP is applied. Previously, `_StridedShard` is not guarded by torch.compile. This PR adds `_StridedShard` as an additional placement type to be guarded by torch.compile.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/152286
Approved by: https://github.com/bdhirsh
2025-05-23 01:40:38 +00:00
bobrenjc93
413664b3c5 catch CSE recursion depth errors (#154039)
Fixes #153777

CSE is an optimization and shouldn't block a compile if it hits recursion depth limits. Unfortunately we can't write this iteratively due to a dependency on `ast.unparse` which necessarily needs to do recursion. This PR catches opts out of CSE when we hit recursion depth errors.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154039
Approved by: https://github.com/Microve
2025-05-22 20:17:19 +00:00
Yidi Wu
fc859077a0 [export][cond] support merging constant ints as unbacked symint (#152742)
@pianpwk points out that this will be helpful to address several data dependent issues in huggingface [models](e23705e557/src/diffusers/schedulers/scheduling_euler_ancestral_discrete.py (L332)) with the following pattern:
```python
idx = return 0 if u0 else return 1
return  x[idx]
```
We could preserve the conditional with a cond.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/152742
Approved by: https://github.com/zou3519
2025-05-22 17:25:38 +00:00
IvanKobzarev
4439255148 [aotd] Support saved tensors hooks in aot_autograd (#150032)
https://github.com/pytorch/pytorch/issues/148222

Goal:

At the moment autograd saved tensors hooks are run in eager after compiled forward.
They are executed at the same time for all saved tensors.
Hooks can be used to reduce amout of memory used for saved tensors, doing quantization or offloading to cpu.
This is suboptimal for optimization of peak memory.
Better solution will be to put the hooks in the graph, as close as possible to the last usage of the tensor.

To get user specified autograd saved tensors hooks in the graph.

Logic:

UX:
If user specifies with torch.autograd.graph.saved_tensors_hooks(pack_gm, unpack_gm).
Where pack_gm and unpack_gm are torch.fx.GraphModule.
Then AotAutograd will retrace those graph modules, doing decompositions and functionalization in aot_autograd, inlining the result graphs in forward epilogue and backward prologue.

User may want to use control logic in the hooks, for example applying quantization only for specific dtypes and sizes.

This is also possible, user can put it into torch.fx.wrap function and use symbolic trace to make a GraphModule.

In that case AotAutograd cahing will work only in case when user explicitly set to the torch.fx.wrap call_function node "user_cache_hash" metadata.

If this metadata set - then aot_autograd cache can use saved cache artifact.
If metadata is not set - then cache is bypassed.

Dynamo:
Dynamo traces pack and unpack hooks and installs them as subgraph and explicitly adds to the output_graph. (As those subgraphs are not used and will not be copied in the result by default).

The complexity here is that at this moment we do not have example of inputs for the hooks.
We trace  pack_hook with some Tensor from the inputs.
The result subgraphs are added to the hashing of AotAutograd Cache.

In AotAutograd we retrace the graph with the true saved tensors coming from partitioner.

Backwards Compatibility:
As current hooks are executed in eager mode and not all of them will be traceable - we only try to put in the graph hooks, explicitly marked by user with annotation (@_inlineable_saved_tensors_hooks).
For other hooks or if compiled autograd is enabled - keep the same logic.

Recompilations:
Hooks are guarded with lambda guard matching function id to cause recompilation if user reruns compiled function.

Aot_autograd:
After partitioner prepared forward and backward module - we trace prepared at Dynamo graphs for pack and unpack hooks and inline them in epilogue of forward and prologue of backward. Forward outputs and backward inputs are changed, transparently for user.

We do not try to put it close the last usage etc., relying on inductor to do this optimization.

```
INFO: TRACED GRAPH
 ===== Forward graph pre saved_tensors_hooks inlining 3 =====
 /data/users/ivankobzarev/a/pytorch/torch/fx/_lazy_graph_module.py class GraphModule(torch.nn.Module):
    def forward(self, primals_1: "Sym(s0)", primals_2: "Sym(s1)", primals_3: "f32[s0, s1][s1, 1]cuda:0"):
         # File: /data/users/ivankobzarev/a/pytorch/test/functorch/test_aotdispatch.py:6660 in simple_fn, code: x = x + 1
        add: "f32[s0, s1][s1, 1]cuda:0" = torch.ops.aten.add.Tensor(primals_3, 1);  primals_3 = None

         # File: /data/users/ivankobzarev/a/pytorch/test/functorch/test_aotdispatch.py:6661 in simple_fn, code: x = SAF.apply(x)
        view: "f32[s0, s1][s1, 1]cuda:0" = torch.ops.aten.view.default(add, [primals_1, primals_2])
        return (view, add, primals_1, primals_2)

INFO: TRACED GRAPH
 ===== Backward graph pre saved_tensors_hooks inlining 3 =====
 /data/users/ivankobzarev/a/pytorch/torch/fx/_lazy_graph_module.py class GraphModule(torch.nn.Module):
    def forward(self, primals_1: "Sym(s0)", primals_2: "Sym(s1)", primals_3: "f32[s0, s1][s1, 1]cuda:0"):
         # File: /data/users/ivankobzarev/a/pytorch/test/functorch/test_aotdispatch.py:6660 in simple_fn, code: x = x + 1
        add: "f32[s0, s1][s1, 1]cuda:0" = torch.ops.aten.add.Tensor(primals_3, 1);  primals_3 = None

         # File: /data/users/ivankobzarev/a/pytorch/test/functorch/test_aotdispatch.py:6661 in simple_fn, code: x = SAF.apply(x)
        view: "f32[s0, s1][s1, 1]cuda:0" = torch.ops.aten.view.default(add, [primals_1, primals_2])
        return (view, add, primals_1, primals_2)

INFO: TRACED GRAPH
 ===== saved_tensors_pack_hook add 3 =====
 /data/users/ivankobzarev/a/pytorch/torch/fx/_lazy_graph_module.py class pack_float8(torch.nn.Module):
    def forward(self, x_1: "f32[s0, s1][s1, 1]cuda:0"):
        # No stacktrace found for following nodes
        _to_copy: "f8e4m3fn[s0, s1][s1, 1]cuda:0" = torch.ops.aten._to_copy.default(x_1, dtype = torch.float8_e4m3fn);  x_1 = None
        return (torch.float32, _to_copy)

INFO: TRACED GRAPH
 ===== saved_tensors_unpack_hook add 3 =====
 <eval_with_key>.22 from /data/users/ivankobzarev/a/pytorch/torch/fx/experimental/proxy_tensor.py:1225 in wrapped class pack_float8(torch.nn.Module):
    def forward(self, x_1: "f32[s0, s1][s1, 1]cuda:0"):
        # No stacktrace found for following nodes
        _to_copy: "f8e4m3fn[s0, s1][s1, 1]cuda:0" = torch.ops.aten._to_copy.default(x_1, dtype = torch.float8_e4m3fn);  x_1 = None
        return (torch.float32, _to_copy)

INFO: TRACED GRAPH
 ===== Forward graph 3 =====
 /data/users/ivankobzarev/a/pytorch/torch/fx/_lazy_graph_module.py class GraphModule(torch.nn.Module):
    def forward(self, primals_1: "Sym(s0)", primals_2: "Sym(s1)", primals_3: "f32[s0, s1][s1, 1]cuda:0"):
         # File: /data/users/ivankobzarev/a/pytorch/test/functorch/test_aotdispatch.py:6660 in simple_fn, code: x = x + 1
        add: "f32[s0, s1][s1, 1]cuda:0" = torch.ops.aten.add.Tensor(primals_3, 1);  primals_3 = None

        # No stacktrace found for following nodes
        _to_copy: "f8e4m3fn[s0, s1][s1, 1]cuda:0" = torch.ops.aten._to_copy.default(add, dtype = torch.float8_e4m3fn)

         # File: /data/users/ivankobzarev/a/pytorch/test/functorch/test_aotdispatch.py:6661 in simple_fn, code: x = SAF.apply(x)
        view: "f32[s0, s1][s1, 1]cuda:0" = torch.ops.aten.view.default(add, [primals_1, primals_2]);  add = None
        return (view, _to_copy, primals_1, primals_2)

INFO: TRACED GRAPH
 ===== Backward graph 3 =====
 <eval_with_key>.21 class GraphModule(torch.nn.Module):
    def forward(self, primals_1: "Sym(s0)", primals_2: "Sym(s1)", add_packed_2: "f8e4m3fn[s0, s1][s1, 1]cuda:0", tangents_1: "f32[s0, s1][s1, 1]cuda:0"):
        # No stacktrace found for following nodes
        _to_copy: "f32[s0, s1][s1, 1]cuda:0" = torch.ops.aten._to_copy.default(add_packed_2, dtype = torch.float32);  add_packed_2 = None

         # File: /data/users/ivankobzarev/a/pytorch/test/functorch/test_aotdispatch.py:6661 in simple_fn, code: x = SAF.apply(x)
        add_7: "f32[s0, s1][s1, 1]cuda:0" = torch.ops.aten.add.Tensor(tangents_1, _to_copy);  tangents_1 = _to_copy = None
        return (None, None, add_7)

```

Differential Revision: [D72187044](https://our.internmc.facebook.com/intern/diff/D72187044)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/150032
Approved by: https://github.com/bdhirsh
2025-05-22 14:09:38 +00:00
Sidharth
c1b7dbc52a [dynamo] unimplemented -> unimplemented_v2 in variables/dict.py (#154040)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154040
Approved by: https://github.com/williamwen42, https://github.com/StrongerXi
2025-05-22 06:46:10 +00:00
soulitzer
f2af30fee5 Add a HOP to bypass tracing of a wrapper function while tracing the wrapped function (#153487)
Usage:
```python
from torch._higher_order_ops.wrap import dynamo_bypassing_wrapper

# Your ordinary function wrapper
def my_hop_fn_impl(fn, *args, k=1, **kwargs):
    def wrapper(*args, **kwargs):
        out = fn(*args, **kwargs)
        if isinstance(out, tuple):
            return (out[0] + k,)
        return out + k

    return wrapper

# Calling `my_hop_fn` instead of the impl directly captures a HOP into the dynamo graph
def my_hop_fn(fn, *args, k=1, **kwargs):
    return dynamo_bypassing_wrapper(
        functools.partial(my_hop_fn_impl, k=k), fn, *args, **kwargs
    )
```

Notes:
- The dynamo captured graph now stashes arbitrary callable objects (the wrapper_fn) - this is equivalent to what SAC does today with policy_fn.
- The `wrapper_fn` passed to `dynamo_bypassing_wrapper ` should have signature `Callable -> Callable`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153487
Approved by: https://github.com/ydwu4
2025-05-22 04:24:38 +00:00
Sidharth
0b79a8c1a9 [dynamo] renamed _fn for more clarity and put a comment of user compiler user (#154026)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154026
Approved by: https://github.com/williamwen42, https://github.com/StrongerXi
2025-05-21 21:12:51 +00:00
Ryan Guo
4c6f0fe22f [dynamo] Properly handle torch.script.jit under @staticmethod (#153984)
Fixes #153607.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153984
Approved by: https://github.com/williamwen42
2025-05-21 19:45:06 +00:00
Tomasz Bohutyn
bb7e30c165 [MegaCache] Make MegaCache generic to allow external plugins registration (#152977)
Implements #152976

Pull Request resolved: https://github.com/pytorch/pytorch/pull/152977
Approved by: https://github.com/oulgen
2025-05-21 18:18:47 +00:00
Michael Lazos
d44074f01a [Dynamo] Fix einops regression (#153925)
Fixes https://github.com/pytorch/pytorch/issues/153476

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153925
Approved by: https://github.com/williamwen42
2025-05-20 09:52:42 +00:00
Sidharth
89ebd29fdc [Dynamo] added warning message for tracing lru_cache wrapped functions (#153744)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/153744
Approved by: https://github.com/williamwen42
2025-05-20 04:08:29 +00:00
PyTorch MergeBot
75eb2f3ff6 Revert "[Dynamo] added warning message for tracing lru_cache wrapped functions (#153744)"
This reverts commit aac30ef503.

Reverted https://github.com/pytorch/pytorch/pull/153744 on behalf of https://github.com/jeanschmidt due to Need to revert as it is breaking internal signals: [D74935585](https://www.internalfb.com/diff/D74935585) ([comment](https://github.com/pytorch/pytorch/pull/153744#issuecomment-2889187038))
2025-05-18 20:13:00 +00:00
Thomas Bohnstingl
68034198e5 [HOP] Mutation and alias rework (#146658)
This PR reworks the way the input mutations and various aliases are checked

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146658
Approved by: https://github.com/ydwu4
2025-05-18 08:05:22 +00:00
Sidharth
aac30ef503 [Dynamo] added warning message for tracing lru_cache wrapped functions (#153744)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/153744
Approved by: https://github.com/williamwen42
2025-05-17 00:43:18 +00:00
clr
a952f42bdb dynamo: Log if we're using dynamic shapes via set_feature_usage (#153490)
This makes it extremely clear if a specific model didn't use dynamic shapes and
should have (except it had a bad config option).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153490
Approved by: https://github.com/jansel
2025-05-16 23:59:00 +00:00
Ryan Guo
e4a636df80 [dynamo] Make OptimizedModule more robust in attribute reads and writes (#153637)
Fixes #138157.

Differential Revision: [D74834872](https://our.internmc.facebook.com/intern/diff/D74834872)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/153637
Approved by: https://github.com/williamwen42
2025-05-16 20:29:19 +00:00
PyTorch MergeBot
c2dda47bc5 Revert "[dynamo] Make OptimizedModule more robust in attribute reads and writes (#153637)"
This reverts commit 2ce0b66db8.

Reverted https://github.com/pytorch/pytorch/pull/153637 on behalf of https://github.com/malfet due to Looks like it broke slow tests, see cda572b053/1 ([comment](https://github.com/pytorch/pytorch/pull/153637#issuecomment-2887449037))
2025-05-16 18:49:57 +00:00
Ryan Guo
2ce0b66db8 [dynamo] Make OptimizedModule more robust in attribute reads and writes (#153637)
Fixes #138157.

Differential Revision: [D74834872](https://our.internmc.facebook.com/intern/diff/D74834872)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/153637
Approved by: https://github.com/williamwen42
2025-05-16 15:17:07 +00:00
Guilherme Leobas
f66a159db5 [Set] Raise TypeError if set is called with the wrong number of arguments (#152990)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/152990
Approved by: https://github.com/anijain2305
ghstack dependencies: #150792, #152987, #152988, #152904, #152901, #152902, #152903, #152905, #152906, #152989, #152907, #152908
2025-05-16 14:28:32 +00:00
Guilherme Leobas
5a0ca65555 [Set] Add correct set/frozenset __init__ behavior (#152908)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/152908
Approved by: https://github.com/anijain2305
ghstack dependencies: #150792, #152987, #152988, #152904, #152901, #152902, #152903, #152905, #152906, #152989, #152907
2025-05-16 14:28:32 +00:00
Guilherme Leobas
053025494f [Set] Raise KeyError on empty set.pop() (#152907)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/152907
Approved by: https://github.com/anijain2305
ghstack dependencies: #150792, #152987, #152988, #152904, #152901, #152902, #152903, #152905, #152906, #152989
2025-05-16 14:28:32 +00:00
Guilherme Leobas
5964cb5eb1 [Set] Update set.union and set.update to support *args (#152989)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/152989
Approved by: https://github.com/anijain2305
ghstack dependencies: #150792, #152987, #152988, #152904, #152901, #152902, #152903, #152905, #152906
2025-05-16 14:28:32 +00:00
Guilherme Leobas
4759922c5e [Set] Add set.intersection(_update) (#152906)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/152906
Approved by: https://github.com/anijain2305
ghstack dependencies: #150792, #152987, #152988, #152904, #152901, #152902, #152903, #152905
2025-05-16 14:28:32 +00:00
Guilherme Leobas
ca96d55322 [Set] Add set.difference(_update) (#152905)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/152905
Approved by: https://github.com/anijain2305
ghstack dependencies: #150792, #152987, #152988, #152904, #152901, #152902, #152903
2025-05-16 14:28:32 +00:00
Guilherme Leobas
5c6830ced0 [Set] Raise KeyError if elem not contained in the set (#152903)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/152903
Approved by: https://github.com/anijain2305
ghstack dependencies: #150792, #152987, #152988, #152904, #152901, #152902
2025-05-16 14:28:32 +00:00
Guilherme Leobas
574f4c507a [Set] Add set.issubset and set.issuperset (#152902)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/152902
Approved by: https://github.com/anijain2305
ghstack dependencies: #150792, #152987, #152988, #152904, #152901
2025-05-16 14:28:32 +00:00
Guilherme Leobas
5926b7a38f [Set] Add set.symmetric_difference(_update) (#152901)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/152901
Approved by: https://github.com/anijain2305
ghstack dependencies: #150792, #152987, #152988, #152904
2025-05-16 14:28:32 +00:00
Guilherme Leobas
fe51ce62ca [Set] Raise TypeError if number of arguments mismatch (#152904)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/152904
Approved by: https://github.com/anijain2305
ghstack dependencies: #150792, #152987, #152988
2025-05-16 14:28:32 +00:00
Guilherme Leobas
481c345f49 [Set] Raise TypeError if argument is unhashable (#152988)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/152988
Approved by: https://github.com/anijain2305
ghstack dependencies: #150792, #152987
2025-05-16 14:28:32 +00:00
Guilherme Leobas
cf7021a0ee [Set] Handle exception in ConstantVariable operation (#152987)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/152987
Approved by: https://github.com/williamwen42, https://github.com/anijain2305
ghstack dependencies: #150792
2025-05-16 14:28:32 +00:00
PyTorch MergeBot
3443627e07 Revert "[BE]: Enable RUFF TRY400 rule - log.exception (#153473)"
This reverts commit 4f4ecc583e.

Reverted https://github.com/pytorch/pytorch/pull/153473 on behalf of https://github.com/jeanschmidt due to seems to have broken internal signals, @albanD may I count on you to help the author merge his PR? D74837988 ([comment](https://github.com/pytorch/pytorch/pull/153473#issuecomment-2886017075))
2025-05-16 08:29:26 +00:00
angelayi
3fe42d4d5d [export] Dynamo symint support (#152677)
Basically adds native _IntWrapper support to dynamo. Here's my process of trying to make symint input support work on dynamo, and how I ended up with this approach [(doc)](https://docs.google.com/document/d/1GvNRQd8BnxlMay_hrEVgEta6VUeUW_hcFeRuB7q1nDY/edit?tab=t.0).

What I did was, before passing inputs to dynamo.export, I first wrap them with a class, `_IntWrapper`. When processing dynamic shapes, I will then add the corresponding dynamic shape specification to the `dynamism` field stored on the `_IntWrapper`. If there is no dynamism specified, then this will get unwrapped back to an integer. When dynamo tracing, when we encounter an `_IntWrapper`, we will convert this to a symint if the dynamism was specified as `Dim.DYNAMIC/AUTO`. Dynamo will then trace a graph that contains symint inputs, which will get passed to AOTAutograd and so on.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/152677
Approved by: https://github.com/pianpwk
2025-05-16 07:51:50 +00:00
Raymond Li
56e1c236bf [Dynamo] Catch unserialisable NN modules (#153503)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/153503
Approved by: https://github.com/c00w, https://github.com/jansel
2025-05-16 02:55:28 +00:00
Animesh Jain
fa8543454a [dynamo][torch-function] Prevent unnecessary __torch_function__ tracing (#153551)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/153551
Approved by: https://github.com/mlazos
2025-05-15 14:06:17 +00:00
Aaron Gokaslan
4f4ecc583e [BE]: Enable RUFF TRY400 rule - log.exception (#153473)
Change logging.error to logging.exception to log additional information when relevant.  A few places have slipped in logging.errors in try except since I last did a clean up here and the rule is stabilized so I am enabling it codebase wide. I have NOQA'd much of our custom exception stack trace handling for RPC calls and distributed and tried to a fix a few errors based on whether we immediately reraised it or if we didn't print any exception handling where it could be useful.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153473
Approved by: https://github.com/albanD, https://github.com/cyyever
2025-05-15 13:36:59 +00:00
Animesh Jain
9839ec1383 [dynamo][compile-time] Cache method on load builtin (#153524)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/153524
Approved by: https://github.com/StrongerXi, https://github.com/jansel
ghstack dependencies: #153522
2025-05-15 05:54:15 +00:00
Animesh Jain
b47be23461 [dynamo][compile-time] Faster inspect getattr_static for torch.Tensor (#153522)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/153522
Approved by: https://github.com/StrongerXi, https://github.com/jansel
2025-05-15 05:54:15 +00:00
Animesh Jain
03d01860fd [dynamo][compile-time] Compute logging related flags once (#153426)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/153426
Approved by: https://github.com/jansel
2025-05-14 21:19:06 +00:00
Ryan Guo
8bb67700a3 [dynamo] Support delattr on result of torch.compile(module) (#152741)
This is essentially a follow-up on #122098, where we added support of
`getattr` and `setattr` on result of `torch.compile(module)`, but didn't
add support for `delattr`.

Fixes #150711.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/152741
Approved by: https://github.com/anijain2305
ghstack dependencies: #152740
2025-05-14 17:03:59 +00:00
Ryan Guo
6765df052c [dynamo] Emit warning on global module hooks when calling using output of torch.compile(module) (#152740)
When we do `torch.compile(module)`, we eventually end up returning a new
`OptimizedModule` instance, whose `forward` method is the result of
`torch.compile(mod.__call__)`, meaning it already captures all the extra
logic (e.g., hook firing) for the compiled module.

`OptimizedModule` also inherits `nn.module.__call__`, and thus
has its own hook logic. This is useful for torchao, which injects module
forward hooks to run in eager for quantization purposes.

However, this might create unexpected behavior for global module hooks,
because `torch.compile(module)` causes the hook to fire one extra time
for `OptimizedModule`, when compared to eager.

To preserve BC, we simply emit a warning for this behavior, and let
users decide what to do. This is reasonable because the global module
hooks are documented to be used for debugging/profiling purposes only.

Fixes #149502

Differential Revision: [D74611716](https://our.internmc.facebook.com/intern/diff/D74611716)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/152740
Approved by: https://github.com/anijain2305, https://github.com/zou3519
2025-05-14 17:03:59 +00:00
Animesh Jain
8f3d7972ad [dynamo][compile-time] Cache the function signature to speedup inlining (#153396)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/153396
Approved by: https://github.com/jansel, https://github.com/StrongerXi
ghstack dependencies: #153333
2025-05-14 14:01:46 +00:00
Animesh Jain
864a5f4434 [dynamo][compile-time] Cache the cleaned insturctions while inlining (#153333)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/153333
Approved by: https://github.com/StrongerXi, https://github.com/jansel, https://github.com/williamwen42
2025-05-14 09:26:26 +00:00
Animesh Jain
11c64b7cf8 [dynamo][compile-time] Cache whether a function is inlineable (#153192)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/153192
Approved by: https://github.com/StrongerXi, https://github.com/jansel, https://github.com/williamwen42
ghstack dependencies: #153458
2025-05-14 05:40:25 +00:00
Animesh Jain
c797f1285c [dynamo][copmile-time] Handle builtins first in LOAD_GLOBAL (#153458)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/153458
Approved by: https://github.com/jansel
2025-05-14 03:04:38 +00:00
William Wen
8521a690f7 [dynamo] fix potential circular import error in decorators.py (#153217)
Differential Revision: [D74442043](https://our.internmc.facebook.com/intern/diff/D74442043)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/153217
Approved by: https://github.com/jansel
2025-05-14 01:01:57 +00:00
Sam Larsen
dde705864a Fix test broken by D73809989 (#153413)
Summary: I forgot to remove this unused field in D73809989.

Test Plan: `buck test 'fbcode//mode/opt' fbcode//caffe2/test:fbonly -- --exact 'caffe2/test:fbonly - test_compilation_metrics_logger_in_sync (caffe2.test.fb.test_fb.TestFBOnly)'`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153413
Approved by: https://github.com/c00w
2025-05-13 16:44:30 +00:00
Simon Fan
a80eb84a5f [ca] support higher order gradients (create_graph=True) (#153222)
Adds create_graph support if you don't compile or compile only with torch.compile(backend="eager").

Using a backend that uses AOTDispatch produces a post-dispatch AOT backward, where its double backward will be silently incorrect if the forward trace involved any ops that are not composite implicit.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153222
Approved by: https://github.com/jansel
ghstack dependencies: #153193
2025-05-13 16:42:09 +00:00
Simon Fan
37efaf4af9 [ca][api] config api shouldn't error with optimize_assert (#153193)
Toggling on `torch._dynamo.config.compiled_autograd = True` was erroring export (optimize_assert didn't have `rebuild_ctx` defined). Separately add a way to `rebuild_ctx` for `optimize_assert` since it is a public API.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153193
Approved by: https://github.com/jansel
2025-05-13 16:42:02 +00:00
Guilherme Leobas
a4459cd4e3 Remove property from python_type function (#152900)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/152900
Approved by: https://github.com/amjames, https://github.com/anijain2305
ghstack dependencies: #153070
2025-05-13 16:26:25 +00:00
Guilherme Leobas
f67eb6f8c5 Fix path matching in CPythonTestCase/setUpClass (#153070)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/153070
Approved by: https://github.com/amjames, https://github.com/anijain2305, https://github.com/Skylion007
2025-05-13 16:26:25 +00:00
Animesh Jain
7fdd754136 [compile-time traces] Profile large missing gaps in compile time (#151256)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/151256
Approved by: https://github.com/bdhirsh, https://github.com/masnesral, https://github.com/zou3519, https://github.com/jansel
2025-05-13 14:44:51 +00:00
Michael Lazos
ff039d39ec [Dynamo] Optimize dedupe region ancestor tracking (#152589)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/152589
Approved by: https://github.com/anijain2305
ghstack dependencies: #152389, #152505, #152410, #152506, #152570, #152572
2025-05-13 12:17:59 +00:00
Michael Lazos
d0faa9985d [Dynamo] Fix typing in graph_deduplication.py (#152572)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/152572
Approved by: https://github.com/Skylion007, https://github.com/anijain2305
ghstack dependencies: #152389, #152505, #152410, #152506, #152570
2025-05-13 12:17:59 +00:00
Michael Lazos
a415c9831f [Hierarchical Compile] Replace tracing alias and mutation check with dynamo impl (#152570)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/152570
Approved by: https://github.com/anijain2305
ghstack dependencies: #152389, #152505, #152410, #152506
2025-05-13 12:17:59 +00:00
Michael Lazos
57dafb90ef [Hierarchical Compile] Take into account mutation deps in cycle detection (#152506)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/152506
Approved by: https://github.com/anijain2305
ghstack dependencies: #152389, #152505, #152410
2025-05-13 12:17:59 +00:00
Michael Lazos
118192011e [Hierarchical Compile] Add mutation dependencies to topological sorting (#152410)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/152410
Approved by: https://github.com/anijain2305
ghstack dependencies: #152389, #152505
2025-05-13 12:17:59 +00:00
Michael Lazos
3592cb52d9 [Hierarchical Compilation] Use universal flatten APIs (#152505)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/152505
Approved by: https://github.com/anijain2305
ghstack dependencies: #152389
2025-05-13 12:17:59 +00:00
Michael Lazos
023a3dc69f [Hierarchical Compilation] Track node mutations (#152389)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/152389
Approved by: https://github.com/anijain2305
2025-05-13 12:17:59 +00:00
PyTorch MergeBot
641e4bee67 Revert "[export][cond] support merging constant ints as unbacked symint (#152742)"
This reverts commit a805911d15.

Reverted https://github.com/pytorch/pytorch/pull/152742 on behalf of https://github.com/ydwu4 due to breaking trunk ([comment](https://github.com/pytorch/pytorch/pull/152742#issuecomment-2874410372))
2025-05-12 23:06:33 +00:00
Yidi Wu
a805911d15 [export][cond] support merging constant ints as unbacked symint (#152742)
@pianpwk points out that this will be helpful to address several data dependent issues in huggingface [models](e23705e557/src/diffusers/schedulers/scheduling_euler_ancestral_discrete.py (L332)) with the following pattern:
```python
idx = if u0 return 0 else return 1
return  x[idx]
```
We could preserve the conditional with a cond.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/152742
Approved by: https://github.com/zou3519
2025-05-12 20:26:31 +00:00
Aaron Gokaslan
3555ebb63d [BE]: Update ruff to 0.11.8 (#153249)
Fixes a ton of false negatives throughout the codebase. RUFF also properly validates NOQA comments now and most of the changes are fixing typos there or removing filewide flake8 suppressions that were also silencing ruff issues.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153249
Approved by: https://github.com/cyyever, https://github.com/albanD, https://github.com/seemethere
2025-05-12 18:30:52 +00:00