AOTAutogradCache uses FXGraphCache, which uses the tracing context to get the ShapeEnv. Although the TracingContext global_context is cleared by the time we get around to reusing it, we don't actually need it: we just need the ShapeEnv in the TracingContext, which isn't cleared at the end of dynamo and does persist. This PR adds the tracing context manager around the specialized compile so that our caching infrastructure can get access to the ShapeEnv. A test was also added to prove correctness.
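A minimal sketch of the idea, using the `TracingContext`/`tracing` helpers from `torch._guards`; `compile_with_shape_env`, `backend_compile`, and the call site are hypothetical stand-ins, not the actual diff:
```python
from torch._guards import TracingContext, tracing

def compile_with_shape_env(gm, fake_mode, backend_compile):
    # Sketch only: wrap the specialized compile in a TracingContext so that
    # downstream caching (FXGraphCache via AOTAutogradCache) can recover the
    # ShapeEnv from fake_mode.shape_env, which persists past dynamo.
    ctx = TracingContext(fake_mode)
    with tracing(ctx):
        return backend_compile(gm)
```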
Pull Request resolved: https://github.com/pytorch/pytorch/pull/153526
Approved by: https://github.com/jamesjwu, https://github.com/zou3519
ghstack dependencies: #153433, #153449
These hooks are used by internal stuck-job detection to associate compilation events with the compile lease. Previously, we only had events for Dynamo and Inductor compilation, and recently the callback handler was updated to ignore nested events, so the Inductor event was only really used by lazy backward.
Here, I remove the Inductor event and add an explicit lazy backward one. Additionally, I add other runtime compilation events: autotuning and cudagraphs. I also expose the CompileId as a string to avoid imports; this will let internal UIs track each graph's contribution to the timeout.
```python
import enum

class CallbackTrigger(enum.Enum):
    # most common case, dynamo attempts to trace a new frame
    DYNAMO = 1
    # backward compilation can be deferred to runtime
    LAZY_BACKWARD = 2
    # some backends autotune at runtime
    TRITON_AUTOTUNING = 3
    # cudagraphs record at runtime
    CUDAGRAPH_RECORDING = 4
```
Differential Revision: [D75092426](https://our.internmc.facebook.com/intern/diff/D75092426)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/153596
Approved by: https://github.com/masnesral
The goal of this multigraph work is to enable a compiled region that has a single dynamo trace but multiple backend specializations. This work was inspired by vLLM, which does this in a somewhat hacky way: they use a custom backend to capture a dynamo graph and then manually invoke compile_fx multiple times to get specialized graphs.
There are really two parts to this work:
**The frontend changes:**
1) We introduce an optional kwarg `specialize_on` to mark_{dynamic,unbacked} that takes in a list of specializations. I debated other methods, including specifying specializations via decorators, but ultimately decided this approach was more harmonious. The big issue with decorators is the difficulty of composing well with the rest of the torch.compile ecosystem, including graph breaks, lazy initialization of variable trackers and symbolic variables, etc.
**The backend changes (this PR):**
1) We capture the backend_specialization specified in the mark_{dynamic,unbacked} API into a SymbolicContext. See changes in `/_dynamo/variables/builder.py`
2) After we are done dynamo tracing, we will lazily (more on this later) invoke `call_user_compiler` up to N + 1 times: once for each of the N specializations and once for the generic graph. Under the hood this calls compile_fx, which composes nicely with both Async Compile and AOTAutogradCache. We do this by using a context manager to patch specialization-specific axioms into the ShapeEnv before invoking the user compiler.
3) When we have specializations, we install a lazy specialized dispatch function that checks each specialization and dispatches to the first one that matches. Instead of doing all of the specialization compiles up front, we do them lazily: the first time a specialization is invoked, we compile it and save it in a cache so subsequent invocations are fast. If none of the specializations match, we dispatch to the generic graph. I decided to do this over returning N different GuardedCodes since 1) it doesn't pollute the dynamo cache (e.g., if you have 8 specializations, you would hit the cache limit), and 2) it naturally incorporates the hierarchical lattice structure of the guards, since the specializations are always necessarily stricter than the generic region's guards.
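A minimal sketch of this lazy dispatch; the names (`specializations`, `call_user_compiler`, `generic_compiled_fn`) are illustrative stand-ins, not the actual Dynamo internals:
```python
# Hypothetical sketch: compile each specialization on first hit, cache it,
# and fall back to the generic graph when nothing matches.
compiled_specializations = {}  # specialization -> compiled callable

def dispatch(*args):
    for spec in specializations:  # assumed list of specialization objects
        if spec.check(*args):
            if spec not in compiled_specializations:
                # First invocation of this specialization: compile lazily.
                compiled_specializations[spec] = call_user_compiler(spec)
            return compiled_specializations[spec](*args)
    # No specialization matched: dispatch to the generic graph.
    return generic_compiled_fn(*args)
```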
I benchmarked this PR stack with #152596 and found around a 50% reduction when dispatching to the specialized regions.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153449
Approved by: https://github.com/zou3519
ghstack dependencies: #153433
The goal of this multigraph work is to enable a compiled region that has a single dynamo trace but multiple backend specializations. This work was inspired by vLLM, which does this in a somewhat hacky way: they use a custom backend to capture a dynamo graph and then manually invoke compile_fx multiple times to get specialized graphs.
There are really two parts to this work:
**The frontend changes (this PR):**
1) We introduce an optional kwarg `specialize_on` to mark_{dynamic,unbacked} that takes in a list of specializations. I debated other methods, including specifying specializations via decorators, but ultimately decided this approach was more harmonious. The big issue with decorators is the difficulty of composing well with the rest of the torch.compile ecosystem, including graph breaks, lazy initialization of variable trackers and symbolic variables, etc.
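A minimal sketch of this frontend API; the predicate form of each entry in `specialize_on` is an assumption based on the PR text:
```python
import torch

def fn(x):
    return x * 2

x = torch.randn(16)
# Dim 0 stays dynamic in the generic graph, but the backend may additionally
# compile specialized graphs for these (hypothetical) cases.
torch._dynamo.mark_dynamic(x, 0, specialize_on=[lambda s: s == 8, lambda s: s % 16 == 0])
compiled = torch.compile(fn)
compiled(x)
```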
**The backend changes:**
1) We capture the backend_specialization specified in the mark_{dynamic,unbacked} API into a SymbolicContext. See changes in `/_dynamo/variables/builder.py`
2) After we are done dynamo tracing, we will lazily (more on this later) invoke `call_user_compiler` up to N + 1 times: once for each of the N specializations and once for the generic graph. Under the hood this calls compile_fx, which composes nicely with both Async Compile and AOTAutogradCache. We do this by using a context manager to patch specialization-specific axioms into the ShapeEnv before invoking the user compiler.
3) When we have specializations, we install a lazy specialized dispatch function that checks each specialization and dispatches to the first one that matches. Instead of doing all of the specialization compiles up front, we do them lazily: the first time a specialization is invoked, we compile it and save it in a cache so subsequent invocations are fast. If none of the specializations match, we dispatch to the generic graph. I decided to do this over returning N different GuardedCodes since 1) it doesn't pollute the dynamo cache (e.g., if you have 8 specializations, you would hit the cache limit), and 2) it naturally incorporates the hierarchical lattice structure of the guards, since the specializations are always necessarily stricter than the generic region's guards.
I benchmarked this PR stack with #152596 and found around a 50% reduction when dispatching to the specialized regions.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153433
Approved by: https://github.com/zou3519
Old: ~pack resume function stack + locals into a list: we need to be able to pass frame stack+locals in lists to hand off to nested functions in the future, so we implement this part first.~
We are no longer doing this right now since GraphModule/guard variable naming gets messed up. Going forward, our approach will be to keep the top frame unpacked, but pack the rest of the contents of other frames in a list.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/151056
Approved by: https://github.com/jansel
Adds support for `call_module` in `copy_paste_aot_backward_graph`, which was added recently with PT 2.7.
The problem is observed with the HPU backend in the example repro below, due to the creation of fused modules.
```python
import torch

device = 'cpu'  # 'hpu'
backend = 'inductor'  # 'hpu_backend'

def fn(t1):
    t1 = t1 * 1
    t1_grad = torch.ones_like(t1, device=device)
    t1.backward(t1_grad, retain_graph=True)
    return t1

t1 = torch.ones(1, requires_grad=True, device=device)  # .squeeze()
compiled_fn = torch.compile(fn, backend=backend)
result = compiled_fn(t1)

with torch._dynamo.compiled_autograd._enable(torch.compile(backend=backend)):
    result_grad = torch.ones_like(result, device=device)
    result.backward(result_grad)

print(f'{result_grad=}')
print(f'{t1.grad=}')
```
With this change I get the same results as on CPU; however, I'm facing the problem below when running with a scalar (the t1 tensor after squeeze):
`torch._dynamo.exc.TorchRuntimeError: Dynamo failed to run FX node with fake tensors: call_function <built-in function getitem>(*(FakeTensor(..., device='hpu:0', size=()), 0), **{}): got IndexError('invalid index of a 0-dim tensor. Use `tensor.item()` in Python or `tensor.item<T>()` in C++ to convert a 0-dim tensor to a number')`
While on CPU, the following warning is printed and None is returned:
```
repro.py:23: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at pytorch/build/aten/src/ATen/core/TensorBody.h:489.)
print(f'{t1.grad=}')
t1.grad=None
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/153827
Approved by: https://github.com/xmfan
Summary:
Today when guard serialization fails, dynamo will raise an internal error like:
```
torch._dynamo.exc.InternalTorchDynamoError: RuntimeError: CLOSURE_MATCH guard cannot be serialized.
```
Adding a dedicated PackageError type to surface the error more clearly.
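A sketch of the intended surfacing (illustrative, not the actual diff); `serialize_guard` is a hypothetical helper:
```python
import torch._dynamo.exc

try:
    serialize_guard(guard)  # hypothetical guard-serialization step
except RuntimeError as e:
    # Surface a user-facing PackageError instead of InternalTorchDynamoError.
    raise torch._dynamo.exc.PackageError(str(e)) from e
```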
Test Plan: CI
Differential Revision: D75452124
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154430
Approved by: https://github.com/jamesjwu, https://github.com/jansel
Summary: Prune unused local objects from the serialized local scope if they are not used in guard reconstruction. This is helpful when a user program takes things like local callable functions, or when the function call is recursive.
Test Plan:
test/dynamo/test_guard_serialization.py -k test_function_locals
Before pruning locals:
```
state = GuardsState(output_graph=OutputGraphGuardsState(local_scope={'x': tensor([ 0.0461, 0.4024, -1.0115]), 'g': <function ...aints=None, _guards=<torch._guards.GuardsSet object at 0x7fbccc7e9fc0>, _aotautograd_guards=[]), shape_code_parts=None)

def pickle_guards_state(state: GuardsState) -> bytes:
    buf = io.BytesIO()
    pickler = GuardsStatePickler(buf)
    try:
        pickler.dump(state)
    except AttributeError as e:
>       raise torch._dynamo.exc.PackageError(str(e)) from e
E       torch._dynamo.exc.PackageError: Can't pickle local object 'TestGuardSerialization.test_function_locals.<locals>.foo'
```
After the diff
```
Tests finished: Pass 1. Fail 0. Fatal 0. Skip 0. Build failure 0
```
Differential Revision: D75452123
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154431
Approved by: https://github.com/jansel
This PR:
* Expands `Hooks` with a new, optional `frame_traced_fn` field. It should be a callable receiving the list of traced code objects
* Maintains a list of `traced_code` objects in the `TracingContext` of an `OutputGraph`
* Whenever an `inline_call()` is encountered, the corresponding code object is added to this list
* `OutputGraph`'s associated `f_code` is added to the list just before the hook is called
I believe use of this hook should enable the source code hashing that vLLM does in a better way than monkey-patching `inline_call()`.
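A minimal sketch of a consumer of this hook; the callback signature follows the PR text, and the source hashing shown is just one illustrative use:
```python
import hashlib
import types

def frame_traced_fn(code_objects: list[types.CodeType]) -> None:
    # Receives every code object traced for the compiled frame, including the
    # frame's own f_code; e.g., hash the traced files for cache invalidation.
    digest = hashlib.sha256()
    for code in code_objects:
        digest.update(code.co_filename.encode())
    print(digest.hexdigest())
```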
Pull Request resolved: https://github.com/pytorch/pytorch/pull/153622
Approved by: https://github.com/jansel
Let's first explore a couple of problems related to replacements and runtime assertions.
#### example problem 1
If we have a runtime assertion that u0 == s0, where u0 is an input coming from mark_unbacked, a replacement u0 = s0 will be added and the function f(u0, s0) will become f(s0, s0). This leads to the assert not being inserted during insert_deferred_runtime_asserts.
The reason is that the insert_deferred_runtime_asserts logic inserts each assertion once all of its inputs are seen, but u0 will never be seen. The same thing can happen when we defer an assertion on backed symbols, e.g. s0 == s2, etc.
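A hedged repro sketch of example problem 1 (the exact flags and call pattern are assumptions):
```python
import torch

@torch.compile(dynamic=True, fullgraph=True)
def f(x, y):
    # deferred runtime assertion relating u0 (unbacked) and s0 (backed)
    torch._check(x.size(0) == y.size(0))
    return x + y

x = torch.randn(4)
torch._dynamo.decorators.mark_unbacked(x, 0)  # x.size(0) becomes u0
y = torch.randn(4)                            # y.size(0) becomes s0
f(x, y)  # the replacement u0 = s0 can hide u0 from assertion insertion
```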
#### example problem 2
Consider u0 == s0, where u0 comes from a call to .item(). Imagine that later on a specialization happens and s0 becomes 2. In that case s0 won't be seen as an input during insert_deferred_runtime_asserts and the assertion won't be inserted in the graph. Worse, Inductor will generate code in the cpp wrapper that refers to s0 while it does not exist, causing a failure.
internal xref: https://fb.workplace.com/groups/1075192433118967/permalink/1669766396994898/
## The solution
The runtime assertion insertion loop depends on detecting that the symbols used in the runtime assertions have been seen; note that those symbols are either graph inputs or generated in the graph by data-dependent ops like .item().
The issues above happen when the symbols are graph inputs. In order to force those symbols to exist in the graph and to be seen by runtime assertion insertion, we do not apply replacements on placeholder expressions during codegen or during runtime assertion insertion.
This should not add performance overhead, since we have already optimized the graph with replacements; the only effect is that we no longer mistakenly drop graph inputs that are used in runtime assertions.
I added extended testing. One unrelated follow-up I noticed: we might want to rename unbacked symbols in runtime assertions when we do unbacked renaming, but that's a different issue.
Other approaches that did not work:
#### Ban replacements on unbacked symbols
1. Does not work when we defer runtime assertions on backed symbols, e.g. s0 == s1. We could also ban such replacements, but then problem 2 becomes more problematic.
2. It affects the quality of reasoning, in a bad way.
#### Apply specializations to runtime assertions before codegen
1. Can fix some issues, but may also lead to runtime assertions becoming no-ops.
2. Does not fix the case where runtime assertions are never inserted during insert_deferred_runtime_asserts because an input is not detected.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/153661
Approved by: https://github.com/jansel
Summary:
We previously assigned each compiled function variable a name based on an in-process global counter. This works fine within the same process, but when we try to serialize the states with precompile, we need a way to load these compiled functions back without colliding with the existing global scope.
Changing the counter to a true global UUID seems to resolve this issue.
For example, the new variable name will look like:
```
__compiled_fn_0_7ce7d872_4fe8_4174_b8fd_2496b09b8b43
```
Test Plan: CI
Differential Revision: D75244901
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154148
Approved by: https://github.com/jansel
It turns out that if you import something that's None at import time in Python, and later update the value, the one you imported stays None:
```python
import torch
from torch._dynamo.utils import CHROMIUM_EVENT_LOG

class Foo:
    pass

torch._dynamo.utils.CHROMIUM_EVENT_LOG = Foo()
print(CHROMIUM_EVENT_LOG)  # None
```
This fixes the bug so we get AOTAutogradCache instant events again.
Differential Revision: [D75305770](https://our.internmc.facebook.com/intern/diff/D75305770/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154258
Approved by: https://github.com/oulgen
This PR adds support for SimpleFSDP's composability with Tensor Parallel + torch.compile.
`_StridedShard` is used in SimpleFSDP/FSDP2 to support correct distributed checkpointing when FSDP+TP is applied. Previously, `_StridedShard` was not guarded by torch.compile. This PR adds `_StridedShard` as an additional placement type to be guarded by torch.compile.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/152286
Approved by: https://github.com/bdhirsh
Fixes #153777
CSE is an optimization and shouldn't block a compile if it hits recursion depth limits. Unfortunately we can't write this iteratively due to a dependency on `ast.unparse`, which necessarily recurses. This PR catches recursion depth errors and opts out of CSE when they occur.
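A minimal sketch of the pattern (names are illustrative, not the actual Dynamo internals):
```python
import ast

def maybe_cse(source: str) -> str:
    try:
        tree = ast.parse(source)
        # ... run common-subexpression elimination over `tree` here ...
        # ast.unparse recurses, so deeply nested trees can exhaust the stack.
        return ast.unparse(tree)
    except RecursionError:
        # CSE is only an optimization; fall back to the original source.
        return source
```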
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154039
Approved by: https://github.com/Microve
https://github.com/pytorch/pytorch/issues/148222
Goal:
At the moment, autograd saved-tensors hooks run in eager mode after the compiled forward.
They are executed at the same time for all saved tensors.
Hooks can be used to reduce the amount of memory used for saved tensors, e.g. by quantization or offloading to CPU.
This is suboptimal for optimizing peak memory.
A better solution is to put the hooks in the graph, as close as possible to the last usage of each tensor.
The goal is to get user-specified autograd saved-tensors hooks into the graph.
Logic:
UX:
If the user specifies torch.autograd.graph.saved_tensors_hooks(pack_gm, unpack_gm), where pack_gm and unpack_gm are torch.fx.GraphModules, then AOTAutograd will retrace those graph modules, applying decompositions and functionalization in aot_autograd and inlining the resulting graphs into the forward epilogue and backward prologue.
The user may want to use control logic in the hooks, for example applying quantization only for specific dtypes and sizes.
This is also possible: the user can put it into a torch.fx.wrap function and use symbolic trace to make a GraphModule.
In that case AOTAutograd caching will only work when the user explicitly sets the "user_cache_hash" metadata on the torch.fx.wrap call_function node.
If this metadata is set, the aot_autograd cache can use the saved cache artifact.
If the metadata is not set, the cache is bypassed.
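A minimal sketch of this UX; the float8 pack/unpack bodies and `fn`/`inp` are illustrative stand-ins for user code:
```python
import torch

def pack_float8(x):
    # store the saved activation in float8 to reduce peak memory
    return x.to(torch.float8_e4m3fn)

def unpack_float8(x):
    return x.to(torch.float32)

pack_gm = torch.fx.symbolic_trace(pack_float8)
unpack_gm = torch.fx.symbolic_trace(unpack_float8)

with torch.autograd.graph.saved_tensors_hooks(pack_gm, unpack_gm):
    out = torch.compile(fn)(inp)  # fn/inp stand in for user code
```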
Dynamo:
Dynamo traces the pack and unpack hooks and installs them as subgraphs, explicitly adding them to the output_graph (as those subgraphs are not otherwise used and would not be copied into the result by default).
The complexity here is that at this moment we do not have example inputs for the hooks.
We trace pack_hook with some Tensor from the inputs.
The resulting subgraphs are added to the hashing of the AOTAutograd cache.
In AOTAutograd we retrace the graph with the true saved tensors coming from the partitioner.
Backwards Compatibility:
As current hooks are executed in eager mode and not all of them will be traceable, we only try to put into the graph those hooks explicitly marked by the user with the annotation (@_inlineable_saved_tensors_hooks).
For other hooks, or if compiled autograd is enabled, we keep the same logic.
Recompilations:
Hooks are guarded with a lambda guard matching the function id, causing a recompilation if the user reruns the compiled function with different hooks.
Aot_autograd:
After the partitioner has prepared the forward and backward modules, we retrace the graphs prepared by Dynamo for the pack and unpack hooks and inline them into the epilogue of the forward and the prologue of the backward. Forward outputs and backward inputs change, transparently to the user.
We do not try to place the hooks close to the last usage, etc.; we rely on Inductor to do this optimization.
```
INFO: TRACED GRAPH
===== Forward graph pre saved_tensors_hooks inlining 3 =====
/data/users/ivankobzarev/a/pytorch/torch/fx/_lazy_graph_module.py class GraphModule(torch.nn.Module):
    def forward(self, primals_1: "Sym(s0)", primals_2: "Sym(s1)", primals_3: "f32[s0, s1][s1, 1]cuda:0"):
        # File: /data/users/ivankobzarev/a/pytorch/test/functorch/test_aotdispatch.py:6660 in simple_fn, code: x = x + 1
        add: "f32[s0, s1][s1, 1]cuda:0" = torch.ops.aten.add.Tensor(primals_3, 1); primals_3 = None

        # File: /data/users/ivankobzarev/a/pytorch/test/functorch/test_aotdispatch.py:6661 in simple_fn, code: x = SAF.apply(x)
        view: "f32[s0, s1][s1, 1]cuda:0" = torch.ops.aten.view.default(add, [primals_1, primals_2])
        return (view, add, primals_1, primals_2)

INFO: TRACED GRAPH
===== Backward graph pre saved_tensors_hooks inlining 3 =====
/data/users/ivankobzarev/a/pytorch/torch/fx/_lazy_graph_module.py class GraphModule(torch.nn.Module):
    def forward(self, primals_1: "Sym(s0)", primals_2: "Sym(s1)", primals_3: "f32[s0, s1][s1, 1]cuda:0"):
        # File: /data/users/ivankobzarev/a/pytorch/test/functorch/test_aotdispatch.py:6660 in simple_fn, code: x = x + 1
        add: "f32[s0, s1][s1, 1]cuda:0" = torch.ops.aten.add.Tensor(primals_3, 1); primals_3 = None

        # File: /data/users/ivankobzarev/a/pytorch/test/functorch/test_aotdispatch.py:6661 in simple_fn, code: x = SAF.apply(x)
        view: "f32[s0, s1][s1, 1]cuda:0" = torch.ops.aten.view.default(add, [primals_1, primals_2])
        return (view, add, primals_1, primals_2)

INFO: TRACED GRAPH
===== saved_tensors_pack_hook add 3 =====
/data/users/ivankobzarev/a/pytorch/torch/fx/_lazy_graph_module.py class pack_float8(torch.nn.Module):
    def forward(self, x_1: "f32[s0, s1][s1, 1]cuda:0"):
        # No stacktrace found for following nodes
        _to_copy: "f8e4m3fn[s0, s1][s1, 1]cuda:0" = torch.ops.aten._to_copy.default(x_1, dtype = torch.float8_e4m3fn); x_1 = None
        return (torch.float32, _to_copy)

INFO: TRACED GRAPH
===== saved_tensors_unpack_hook add 3 =====
<eval_with_key>.22 from /data/users/ivankobzarev/a/pytorch/torch/fx/experimental/proxy_tensor.py:1225 in wrapped class pack_float8(torch.nn.Module):
    def forward(self, x_1: "f32[s0, s1][s1, 1]cuda:0"):
        # No stacktrace found for following nodes
        _to_copy: "f8e4m3fn[s0, s1][s1, 1]cuda:0" = torch.ops.aten._to_copy.default(x_1, dtype = torch.float8_e4m3fn); x_1 = None
        return (torch.float32, _to_copy)

INFO: TRACED GRAPH
===== Forward graph 3 =====
/data/users/ivankobzarev/a/pytorch/torch/fx/_lazy_graph_module.py class GraphModule(torch.nn.Module):
    def forward(self, primals_1: "Sym(s0)", primals_2: "Sym(s1)", primals_3: "f32[s0, s1][s1, 1]cuda:0"):
        # File: /data/users/ivankobzarev/a/pytorch/test/functorch/test_aotdispatch.py:6660 in simple_fn, code: x = x + 1
        add: "f32[s0, s1][s1, 1]cuda:0" = torch.ops.aten.add.Tensor(primals_3, 1); primals_3 = None

        # No stacktrace found for following nodes
        _to_copy: "f8e4m3fn[s0, s1][s1, 1]cuda:0" = torch.ops.aten._to_copy.default(add, dtype = torch.float8_e4m3fn)

        # File: /data/users/ivankobzarev/a/pytorch/test/functorch/test_aotdispatch.py:6661 in simple_fn, code: x = SAF.apply(x)
        view: "f32[s0, s1][s1, 1]cuda:0" = torch.ops.aten.view.default(add, [primals_1, primals_2]); add = None
        return (view, _to_copy, primals_1, primals_2)

INFO: TRACED GRAPH
===== Backward graph 3 =====
<eval_with_key>.21 class GraphModule(torch.nn.Module):
    def forward(self, primals_1: "Sym(s0)", primals_2: "Sym(s1)", add_packed_2: "f8e4m3fn[s0, s1][s1, 1]cuda:0", tangents_1: "f32[s0, s1][s1, 1]cuda:0"):
        # No stacktrace found for following nodes
        _to_copy: "f32[s0, s1][s1, 1]cuda:0" = torch.ops.aten._to_copy.default(add_packed_2, dtype = torch.float32); add_packed_2 = None

        # File: /data/users/ivankobzarev/a/pytorch/test/functorch/test_aotdispatch.py:6661 in simple_fn, code: x = SAF.apply(x)
        add_7: "f32[s0, s1][s1, 1]cuda:0" = torch.ops.aten.add.Tensor(tangents_1, _to_copy); tangents_1 = _to_copy = None
        return (None, None, add_7)
```
Differential Revision: [D72187044](https://our.internmc.facebook.com/intern/diff/D72187044)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/150032
Approved by: https://github.com/bdhirsh
Usage:
```python
import functools

from torch._higher_order_ops.wrap import dynamo_bypassing_wrapper

# Your ordinary function wrapper
def my_hop_fn_impl(fn, *args, k=1, **kwargs):
    def wrapper(*args, **kwargs):
        out = fn(*args, **kwargs)
        if isinstance(out, tuple):
            return (out[0] + k,)
        return out + k
    return wrapper

# Calling `my_hop_fn` instead of the impl directly captures a HOP into the dynamo graph
def my_hop_fn(fn, *args, k=1, **kwargs):
    return dynamo_bypassing_wrapper(
        functools.partial(my_hop_fn_impl, k=k), fn, *args, **kwargs
    )
```
Notes:
- The dynamo-captured graph now stashes arbitrary callable objects (the wrapper_fn); this is equivalent to what SAC does today with policy_fn.
- The `wrapper_fn` passed to `dynamo_bypassing_wrapper` should have signature `Callable -> Callable`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/153487
Approved by: https://github.com/ydwu4
Basically adds native _IntWrapper support to dynamo. Here's my process of trying to make symint input support work on dynamo, and how I ended up with this approach [(doc)](https://docs.google.com/document/d/1GvNRQd8BnxlMay_hrEVgEta6VUeUW_hcFeRuB7q1nDY/edit?tab=t.0).
What I did was: before passing inputs to dynamo.export, I first wrap them with a class, `_IntWrapper`. When processing dynamic shapes, I then add the corresponding dynamic shape specification to the `dynamism` field stored on the `_IntWrapper`. If there is no dynamism specified, the wrapper gets unwrapped back to an integer. During dynamo tracing, when we encounter an `_IntWrapper`, we convert it to a symint if the dynamism was specified as `Dim.DYNAMIC/AUTO`. Dynamo will then trace a graph that contains symint inputs, which gets passed to AOTAutograd and so on.
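A minimal sketch of the wrapper's shape (hypothetical; the real `_IntWrapper` lives in export internals):
```python
from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class _IntWrapper:
    val: int
    # Dynamic-shape spec (e.g. Dim.DYNAMIC / Dim.AUTO) attached while
    # processing dynamic_shapes; None means "unwrap back to a plain int".
    dynamism: Optional[Any] = None
```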
Pull Request resolved: https://github.com/pytorch/pytorch/pull/152677
Approved by: https://github.com/pianpwk
Change logging.error to logging.exception to log additional information when relevant. A few logging.error calls have slipped into try/except blocks since I last did a cleanup here, and the rule is now stabilized, so I am enabling it codebase-wide. I have NOQA'd much of our custom exception stack-trace handling for RPC calls and distributed, and tried to fix a few errors based on whether we immediately re-raised them or didn't print any exception information where it could be useful.
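A small example of the difference, for context (`risky()` is a placeholder):
```python
import logging

logger = logging.getLogger(__name__)

try:
    risky()  # placeholder for the guarded operation
except Exception:
    logger.error("operation failed")      # message only; the traceback is lost
    logger.exception("operation failed")  # message plus the full traceback
```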
Pull Request resolved: https://github.com/pytorch/pytorch/pull/153473
Approved by: https://github.com/albanD, https://github.com/cyyever
When we do `torch.compile(module)`, we eventually end up returning a new
`OptimizedModule` instance, whose `forward` method is the result of
`torch.compile(mod.__call__)`, meaning it already captures all the extra
logic (e.g., hook firing) for the compiled module.
`OptimizedModule` also inherits `nn.Module.__call__`, and thus
has its own hook logic. This is useful for torchao, which injects module
forward hooks to run in eager for quantization purposes.
However, this might create unexpected behavior for global module hooks,
because `torch.compile(module)` causes the hook to fire one extra time
for `OptimizedModule`, when compared to eager.
To preserve BC, we simply emit a warning for this behavior, and let
users decide what to do. This is reasonable because the global module
hooks are documented to be used for debugging/profiling purposes only.
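A small repro sketch of the behavior being warned about (illustrative):
```python
import torch
import torch.nn as nn

def global_hook(module, args, output):
    print("forward hook fired for", type(module).__name__)

handle = nn.modules.module.register_module_forward_hook(global_hook)
mod = torch.compile(nn.Linear(2, 2))
mod(torch.randn(1, 2))  # fires once for OptimizedModule and once for Linear
handle.remove()
```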
Fixes #149502
Differential Revision: [D74611716](https://our.internmc.facebook.com/intern/diff/D74611716)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/152740
Approved by: https://github.com/anijain2305, https://github.com/zou3519
Summary: I forgot to remove this unused field in D73809989.
Test Plan: `buck test 'fbcode//mode/opt' fbcode//caffe2/test:fbonly -- --exact 'caffe2/test:fbonly - test_compilation_metrics_logger_in_sync (caffe2.test.fb.test_fb.TestFBOnly)'`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/153413
Approved by: https://github.com/c00w
Adds create_graph support if you don't compile, or compile only with torch.compile(backend="eager").
Using a backend that uses AOTDispatch produces a post-dispatch AOT backward, where its double backward will be silently incorrect if the forward trace involved any ops that are not composite implicit.
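A minimal sketch of the supported case with the eager backend (illustrative):
```python
import torch

@torch.compile(backend="eager")
def f(x):
    return x.sin().sum()

x = torch.randn(3, requires_grad=True)
(g,) = torch.autograd.grad(f(x), x, create_graph=True)  # graph kept for double backward
g.sum().backward()
print(x.grad)
```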
Pull Request resolved: https://github.com/pytorch/pytorch/pull/153222
Approved by: https://github.com/jansel
ghstack dependencies: #153193
Toggling on `torch._dynamo.config.compiled_autograd = True` was erroring in export (optimize_assert didn't have `rebuild_ctx` defined). Separately, this adds a way to `rebuild_ctx` for `optimize_assert`, since it is a public API.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/153193
Approved by: https://github.com/jansel