pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Author	SHA1	Message	Date
IvanKobzarev	c5d92edd5a	[dynamo] WeakRefVar reconstruct (#148083 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/148083 Approved by: https://github.com/anijain2305	2025-03-05 19:34:17 +00:00
Marko Radmilac	c65ee728f0	Initial implementation of host memory stats (#147660 ) This is an initial attempt to provide some statistics for the pinned host memory allocations flowing through CachingHostAllocator. Many times in the past we have had inexplicable slowdowns that would be much easier to diagnose if we had some host memory characteristics. This change tries very hard not to disrupt the initial design of the allocator, and it uses the existing locking mechanism, whenever possible, to gather statistics "for free". The only deviation from that is on the "slow path", where we incur CUDA calls anyway, so taking a short lock is not going to hurt performance much, especially in the steady state where most allocations will come from the cache. As mentioned before, this is the first PR, meant to introduce the concept and to see if it fits the right paradigm. We can always add more later. Metrics that would require more involved changes to the code base and locks, like requested memory, have been punted for now. I also tried to reuse the Stat structure used in the CUDA caching allocator, in order to maintain symmetry. Pull Request resolved: https://github.com/pytorch/pytorch/pull/147660 Approved by: https://github.com/ngimel	2025-03-05 16:13:19 +00:00
William Wen	b28cbe5db3	[dynamo] remove internal stack trace for fullgraph=True graph breaks (#148205 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/148205 Approved by: https://github.com/zou3519	2025-03-05 01:16:53 +00:00
dan_the_3rd	d1abde11ec	[dynamo] Support passing arguments to `DeviceMesh.get_group` (#147741 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/147741 Approved by: https://github.com/StrongerXi	2025-03-04 21:19:47 +00:00
Thomas Bohnstingl	e4c558be1d	[scan] Corrections for scan (#146110 ) This PR resolves some minor issues with the scan HOP and unifies the handling of the additional_inputs in the same way as for associative_scan. Pull Request resolved: https://github.com/pytorch/pytorch/pull/146110 Approved by: https://github.com/ydwu4	2025-03-04 20:29:08 +00:00
PyTorch MergeBot	92beda54c8	Revert "[fx] Move map_aggregate to C++ (#148243 )" This reverts commit `edaff88f69`. Reverted https://github.com/pytorch/pytorch/pull/148243 on behalf of https://github.com/jovianjaison due to breaking internal builds [T216910920] ([comment](https://github.com/pytorch/pytorch/pull/148243#issuecomment-2698724058))	2025-03-04 19:40:21 +00:00
bobrenjc93	da2688f624	Introduce delayed compile via `eager_then_compile` stance (#147983 ) Recently I've been experimenting with introducing new APIs to delay compile as a way to reduce compile times while improving the ergonomics of using dynamic shapes. The high-level idea is to run the first invocation of compile in eager, save the example inputs, and on the second invocation derive the dynamism in the inputs so that we don't need to waste time doing a compile with static shapes (which is the status quo today with automatic dynamic). Another benefit is that most users no longer need to annotate their inputs with mark_dynamic and mark_unbacked calls, since we can derive the dynamism on the very first call. Additionally, we get dynamic ints out of the box in this new regime. This PR implements this idea through the set_stance APIs. In particular, it introduces a new `eager_then_compile` stance. Pull Request resolved: https://github.com/pytorch/pytorch/pull/147983 Approved by: https://github.com/williamwen42	2025-03-04 07:46:31 +00:00
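The delayed-compile idea in the entry above can be sketched in plain Python. This is only an illustration of the described flow (first call eager, second call derives which dimensions vary), not the PyTorch implementation: `EagerThenCompile` and `derive_dynamic_dims` are hypothetical names, and shape tuples stand in for tensors.

```python
import math


def derive_dynamic_dims(first_shape, second_shape):
    """Return the indices of dimensions whose size changed between two calls."""
    return [i for i, (a, b) in enumerate(zip(first_shape, second_shape)) if a != b]


class EagerThenCompile:
    """Hypothetical wrapper: remember the first call's shapes, then compare."""

    def __init__(self, fn):
        self.fn = fn
        self.first_shapes = None   # shapes seen on the first (eager) call
        self.dynamic_dims = None   # per-input dynamic dimensions, set on call 2

    def __call__(self, *shapes):
        if self.first_shapes is None:
            # First invocation: run eagerly and only record the example inputs.
            self.first_shapes = shapes
        elif self.dynamic_dims is None:
            # Second invocation: derive dynamism by comparing with the first call.
            # A real system would compile here with these dimensions marked dynamic.
            self.dynamic_dims = [
                derive_dynamic_dims(a, b) for a, b in zip(self.first_shapes, shapes)
            ]
        return self.fn(*shapes)


# Stand-in workload: total element count over all input shapes.
numel = EagerThenCompile(lambda *shapes: sum(math.prod(s) for s in shapes))
first = numel((4, 8), (8,))
second = numel((16, 8), (8,))
```

After the second call, `numel.dynamic_dims` records that only dimension 0 of the first input changed, which is the information a static-shape first compile would have lacked.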
William Wen	8f361c808b	[dynamo] run-only recursively on recompile limit exceeded (#148021 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/148021 Approved by: https://github.com/anijain2305	2025-03-03 21:01:08 +00:00
FFFrog	1bbe57336b	Replace unimplemented with unimplemented_v2 for dynamo (#148158 ) torch/_dynamo/variables/constant.py https://github.com/pytorch/pytorch/issues/147913 Pull Request resolved: https://github.com/pytorch/pytorch/pull/148158 Approved by: https://github.com/williamwen42, https://github.com/Skylion007	2025-03-03 21:00:17 +00:00
Sam Larsen	40c2505f16	[logging] Log individual Triton kernel compilation times to dynamo_compile (#147022 ) Summary: Gather the compilation time of individual triton kernels and log them to dynamo_compile: * Time compilation in `_worker_compile_triton`, pass the result back to the main process, and log it from `get_result()`. * Added a way to track the "top N" (or N most-expensive compiles) in the metrics_context. I did this because I doubt we really care to capture potentially thousands of kernel compile times. That would be problematic for scuba logging anyway, so let's limit the number we track from the beginning. Arbitrarily chose 25 for now. * Format the list of compile times as a json string before logging. Test Plan: `python benchmarks/dynamo/torchbench.py --performance --training --amp --backend inductor --device cuda --print-compilation-time --repeat 5 --cold-start-latency --only nanogpt` Scuba: https://fburl.com/scuba/dynamo_compile/sandbox/nc4dzm3r Pull Request resolved: https://github.com/pytorch/pytorch/pull/147022 Approved by: https://github.com/jamesjwu	2025-03-03 19:32:17 +00:00
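The "top N" tracking and JSON formatting described in the entry above might look roughly like the following. This is a minimal sketch using only the standard library; the function names and the record layout are assumptions for illustration, not the metrics_context API. Only the limit of 25 comes from the commit message.

```python
import heapq
import json

TOP_N = 25  # the commit message says 25 was chosen arbitrarily


def top_n_compile_times(times_by_kernel, n=TOP_N):
    """Return the n most expensive (kernel, seconds) pairs, slowest first."""
    return heapq.nlargest(n, times_by_kernel.items(), key=lambda kv: kv[1])


def format_for_logging(times_by_kernel, n=TOP_N):
    """Serialize the top-n compile times as a JSON string."""
    top = top_n_compile_times(times_by_kernel, n)
    return json.dumps([{"kernel": name, "seconds": secs} for name, secs in top])


# Hypothetical kernel names and timings.
sample = {"triton_a": 0.8, "triton_b": 2.5, "triton_c": 1.1}
logged = format_for_logging(sample, n=2)
```

Capping the list before serialization keeps the logged payload bounded regardless of how many kernels a model compiles.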
Jason Ansel	edaff88f69	[fx] Move map_aggregate to C++ (#148243 ) Microbenchmarking `fx.symbolic_trace(lambda x: functools.reduce(operator.add, [x, *range(100000)]))`, before: ``` 30603618 function calls (29403419 primitive calls) in 13.744 seconds ``` after: ``` 25203549 function calls (24403352 primitive calls) in 12.090 seconds ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/148243 Approved by: https://github.com/oulgen	2025-03-02 22:42:31 +00:00
PyTorch MergeBot	a983b2b11a	Revert "Initial implementation of host memory stats (#147660 )" This reverts commit `945e359fc1`. Reverted https://github.com/pytorch/pytorch/pull/147660 on behalf of https://github.com/mradmila due to There is an issue with ambiguous definition of Stat structure when different C++ tools are used. Backing out for now. ([comment](https://github.com/pytorch/pytorch/pull/147660#issuecomment-2692346379))	2025-03-01 18:05:45 +00:00
bobrenjc93	83ec7cdcd4	Fix recompile reason logging (#148200 ) for the following test case ``` @torch.compile(dynamic=False, backend=cnts) def fn(x, y, z): return x * y * z[0] fn(1, torch.randn(1), {0: torch.randn(1)}) fn(2, torch.randn(2), {0: torch.randn(2)}) fn(3, torch.randn(3), {0: torch.randn(3)}) fn(4, torch.randn(4), {0: torch.randn(4)}) fn(5, torch.randn(5), {0: torch.randn(5)}) ``` previously we would log ``` 0/0: L['x'] == 1 0/0: L['x'] == 1 0/0: L['x'] == 1 0/0: L['x'] == 1 ``` but after this change we now log ``` 0/0: L['x'] == 1 0/1: L['x'] == 2 0/2: L['x'] == 3 0/3: L['x'] == 4 ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/148200 Approved by: https://github.com/xmfan	2025-02-28 22:33:37 +00:00
William Wen	40b3e4a358	[dynamo] expose code execution strategy to python (#148020 ) @anijain2305 this can be used to mark a code object to be skipped/run-only (recursively) while tracing. Pull Request resolved: https://github.com/pytorch/pytorch/pull/148020 Approved by: https://github.com/jansel	2025-02-28 21:59:12 +00:00
clr	e0e516c554	Don't crash when we call __qualname__ on torch._C.ScriptFunction (#147894 ) We've root-caused this to ScriptFunction correctly throwing an attribute error when missing attributes are accessed. This PR will fix the crashes that are showing up. I'm going to stack a second PR to fix torch._C.ScriptFunction just being a very badly behaved Python object (which should also fix this). Pull Request resolved: https://github.com/pytorch/pytorch/pull/147894 Approved by: https://github.com/jansel	2025-02-28 20:15:38 +00:00
Marko Radmilac	945e359fc1	Initial implementation of host memory stats (#147660 ) This is an initial attempt to provide some statistics for the pinned host memory allocations flowing through CachingHostAllocator. Many times in the past we have had inexplicable slowdowns that would be much easier to diagnose if we had some host memory characteristics. This change tries very hard not to disrupt the initial design of the allocator, and it uses the existing locking mechanism, whenever possible, to gather statistics "for free". The only deviation from that is on the "slow path", where we incur CUDA calls anyway, so taking a short lock is not going to hurt performance much, especially in the steady state where most allocations will come from the cache. As mentioned before, this is the first PR, meant to introduce the concept and to see if it fits the right paradigm. We can always add more later. Metrics that would require more involved changes to the code base and locks, like requested memory, have been punted for now. I also tried to reuse the Stat structure used in the CUDA caching allocator, in order to maintain symmetry. Pull Request resolved: https://github.com/pytorch/pytorch/pull/147660 Approved by: https://github.com/ngimel	2025-02-28 18:36:44 +00:00
bobrenjc93	4708cfdbd9	Support whitelist of dynamic sources (#147979 ) This PR introduces the ability to whitelist sources as dynamic. This is particularly useful for large models with graph breaks, as you can keep the dynamism across graph breaks since source names stay consistent. Additionally, you can use this to mark ints as dynamic. NB: I intentionally didn't complicate the interface by supporting specification of per-dimension dynamism. There is virtue in keeping true to the standard way of representing sources (e.g. L['x']). If we find in practice that we need more fine-grained control, we can explore further affordances at that time. Pull Request resolved: https://github.com/pytorch/pytorch/pull/147979 Approved by: https://github.com/Mingming-Ding	2025-02-28 15:43:14 +00:00
Yuanhao Ji	0a948f705b	[Dynamo] Fix `AssertionError` when dynamo traces `torch.functional.xxx()` functions (#148075 ) Fixes #147840 Pull Request resolved: https://github.com/pytorch/pytorch/pull/148075 Approved by: https://github.com/yanboliang	2025-02-28 15:09:11 +00:00
William Wen	34d726011f	[dynamo] update data-dependent branching graph break messages (#147912 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/147912 Approved by: https://github.com/jansel, https://github.com/zou3519 ghstack dependencies: #147494, #147872	2025-02-28 12:30:06 +00:00
William Wen	baba7beed2	[dynamo] add context manager debug information to graph breaks (#147872 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/147872 Approved by: https://github.com/zou3519 ghstack dependencies: #147494	2025-02-28 06:23:28 +00:00
William Wen	4caeede799	[dynamo] more better error messages [3/N] (#147494 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/147494 Approved by: https://github.com/jansel, https://github.com/zou3519	2025-02-28 06:23:28 +00:00
Animesh Jain	eb9c127341	[dynamo][optimizers] Install ID_GUARDED tensors into the Fx graph (#147824 ) Earlier, with the inline flag, we were lifting id-guarded tensors to the inputs of the Fx graph. But this offers no benefit. The main idea behind lifting parameters as inputs was to reuse the compilation units across many instances of the nn-module. However, if we are guarding on the `id`, we are explicitly specializing the compiled artifact to the parameter. This PR installs the parameters back into the graph. The benefit is the removal of all pre-graph bytecode that extracts the id-guarded tensors from locals/globals. This increases the speedup from 1.67x to 1.75x for an internal model that has a large number of optimizer parameters. Pull Request resolved: https://github.com/pytorch/pytorch/pull/147824 Approved by: https://github.com/jansel Co-authored-by: Jason Ansel <jansel@meta.com>	2025-02-28 03:22:11 +00:00
Xuehai Pan	3ce352e389	[BE][PYFMT] migrate PYFMT for `torch._dynamo` to `ruff format` (#144549 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/144549 Approved by: https://github.com/jansel	2025-02-28 03:03:53 +00:00
Xuehai Pan	0edb2da4a4	[dynamo] add sourceless builder for `types.MethodType` (#147880 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/147880 Approved by: https://github.com/jansel	2025-02-28 02:30:04 +00:00
Raymond Li	c5bf9aaf1c	Log graph breaks (#146537 ) Graph breaks currently aren't logged to dynamo_compile and pt2_compile_events. We want to log them. Pull Request resolved: https://github.com/pytorch/pytorch/pull/146537 Approved by: https://github.com/c00w	2025-02-27 11:06:33 +00:00
PyTorch MergeBot	915eb012e1	Revert "[dynamo] add sourceless builder for `types.MethodType` (#147880 )" This reverts commit `08f4c1a233`. Reverted https://github.com/pytorch/pytorch/pull/147880 on behalf of https://github.com/wdvr due to failing trunk tests ([comment](https://github.com/pytorch/pytorch/pull/147880#issuecomment-2686436432))	2025-02-26 23:29:58 +00:00
Thomas Bohnstingl	7c71ab1d40	[scan] User-facing reverse flag handling (#147886 ) This PR removes the reverse flag from the backend implementation and resolves it via `torch.flip` in the frontend. Pull Request resolved: https://github.com/pytorch/pytorch/pull/147886 Approved by: https://github.com/ydwu4	2025-02-26 20:04:57 +00:00
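Resolving a reverse scan through flips in the frontend, as the entry above describes, can be illustrated without PyTorch. The sketch below is a simplification: it uses a plain cumulative scan over Python lists (list reversal stands in for `torch.flip`) and assumes the outputs are flipped back so they line up with the original input order. The helper names are hypothetical.

```python
import operator
from itertools import accumulate


def scan(combine, xs):
    """Forward-only inclusive scan: the only direction the 'backend' supports."""
    return list(accumulate(xs, combine))


def scan_reverse(combine, xs):
    """Reverse scan built in the 'frontend' from flips around the forward scan."""
    flipped = xs[::-1]                 # flip the inputs
    outputs = scan(combine, flipped)   # run the ordinary forward scan
    return outputs[::-1]               # flip the outputs back


forward = scan(operator.add, [1, 2, 3, 4])           # [1, 3, 6, 10]
backward = scan_reverse(operator.add, [1, 2, 3, 4])  # [10, 9, 7, 4]
```

With this arrangement the backend needs no reverse flag at all, which is the simplification the commit describes.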
Ryan Guo	eb08ada5d3	[dynamo] Support reads to global/captured tensors in `nonstrict_trace`-ed function (#147572 ) As title. Without this patch we get the following error: Tweaking the `allow_non_fake_inputs` flag on tensor mode doesn't quite work for AOTAutograd, which also needs to fake-tensor-propagate the `nonstrict_trace`-ed function, but that's _after_ Dynamo has handled the `nonstrict_trace` processing and put the `flat_apply(...)` node into the graph. So we can't easily enable the `allow_non_fake_inputs` flag temporarily on the current fake mode when AOTAutograd processes a `flat_apply` node from Dynamo's `nonstrict_trace` handling. After discussing with zou3519, I decided to add a global `FakeTensorTLS` that contains an `allow_non_fake_inputs_override` flag, and patch the `nonstrict_trace`-ed function to temporarily tweak this flag during its execution. Pull Request resolved: https://github.com/pytorch/pytorch/pull/147572 Approved by: https://github.com/zou3519 ghstack dependencies: #146714, #146367, #146950, #147571	2025-02-26 19:47:39 +00:00
Ryan Guo	73e963459e	[dynamo] Support `nonstrict_trace` on class method (#147571 ) As title, also see 1. new test `test_nonstrict_trace_on_method` for example. 2. newly added comments for why we need special treatment on methods. Pull Request resolved: https://github.com/pytorch/pytorch/pull/147571 Approved by: https://github.com/zou3519 ghstack dependencies: #146714, #146367, #146950	2025-02-26 19:47:39 +00:00
Ryan Guo	7e0ef2c844	[dynamo] Use the new `get_unique_name_wrt` helper when applicable (#146950 ) This patch removes some duplicated name generation logic in Dynamo. Pull Request resolved: https://github.com/pytorch/pytorch/pull/146950 Approved by: https://github.com/zou3519 ghstack dependencies: #146714, #146367	2025-02-26 19:47:39 +00:00
Ryan Guo	f46f0e465c	[dynamo] Initial support for `nonstrict_trace` (#146367 ) ## Context > Note: `mark_traceable` got renamed to `nonstrict_trace` after > offline discussion. The reasons are (1) it aligns with `torch.export`'s > `nonstrict` notion, and (2) it's more definitive in behavior suggestion. 1. [Overall Design](https://docs.google.com/document/d/1O-dR2ZQaJQVt_v67AVcDCw2yJLtqgkZFwoXK0buEWRg/edit?tab=t.0) 2. [Dynamo graph representation with `torch._higher_order_ops.flat_apply`](https://docs.google.com/document/d/1YHl5nPTJvYeCPE5TO9uA18DPWNgUYGE4gCn6bFvXcBM/edit?tab=t.0#heading=h.xtw3hhbro4gn) ## Summary This patch adds a `torch._dynamo.nonstrict_trace` decorator, which currently is an enhanced version of `torch._dynamo.allow_in_graph` (see docstring for their differences). Specifically, this patch focuses on the UI and functionality prototyping/plumbing. The main enhancement is supporting more input types, and the implementation challenge lies in reconstructing the input objects from Dynamo `VariableTracker` (while accounting for buffered side-effects and guards). This patch takes a middle-ground (simple implementation with a bit of user labor), by 1. asking the user to provide pytree registration for non-proxy-able input types, 2. letting Dynamo trace through `pytree_flatten` (which accounts for buffered side-effects and guards automatically), 3. and passing in the TreeSpec as a graph attribute constant into `torch._higher_order_ops.flat_apply` (which unflattens the inputs and invokes the underlying function). ## Next Steps In subsequent patches, we will try to support the following: - annotating on class method - reads to global tensors - inputs that contain `pytree.register_constant`-ed instances. - function as input - more output types (e.g., any pytree-registered type) - `torch.nn.Module` as inputs Pull Request resolved: https://github.com/pytorch/pytorch/pull/146367 Approved by: https://github.com/zou3519 ghstack dependencies: #146714	2025-02-26 19:47:39 +00:00
Simon Fan	0a2da008f8	[ca] trace saved variable unpacking (#147242 ) ## Before Previously, CA would always unpack all saved variables stored in the autograd graph before executing it. This meant that we couldn't capture unpack hooks as part of the CA graph, and they would fire out of order wrt other backward hooks. For memory saving APIs built on top of saved tensor hooks like non-reentrant checkpointing and offloading, we couldn't achieve any savings because all activations would be recomputed/loaded and active at the same time, resulting in a no-op. ## After We add unpack hooks into the CA graph so that they can be executed progressively. The python hook and hook input themselves are wrapped by non-traceable code, so CA polyfills the wrapping as: ```python # pseudocode class SavedVariable: def unpack(self): if self.hook: return self.hook(self.packed_data) else: return self.packed_data # This approach won't directly work when we add support for Forward AD or double-backward. ``` When directly executing the CA graph (without torch.compiling it) under checkpointing/offloading, the memory profile is expected to stay the same as when using the eager autograd engine. If AOT backward is in the autograd graph, the memory profile is expected to be better than with the eager autograd engine, since we can now delay saved activations unpacking into the AOT backward's execution. All tests pass when running the CA graph directly; the remaining issues are in Dynamo. Pull Request resolved: https://github.com/pytorch/pytorch/pull/147242 Approved by: https://github.com/jansel	2025-02-26 16:37:17 +00:00
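The difference between unpacking all saved variables up front and unpacking them progressively can be made concrete with a runnable version of the pseudocode in the entry above. This is only a toy model: `peak_live` is a hypothetical helper that counts how many unpacked values exist at once, whereas real memory behaviour depends on tensor sizes and the autograd engine.

```python
class SavedVariable:
    """Runnable form of the pseudocode: apply the unpack hook if one is set."""

    def __init__(self, packed_data, hook=None):
        self.packed_data = packed_data
        self.hook = hook

    def unpack(self):
        if self.hook:
            return self.hook(self.packed_data)
        return self.packed_data


def peak_live(saved, progressive):
    """Count the maximum number of unpacked values alive at the same time."""
    live = peak = 0
    if progressive:
        # Unpack each value right before it is used, then release it.
        for sv in saved:
            value = sv.unpack()
            live += 1
            peak = max(peak, live)
            del value
            live -= 1
    else:
        # Unpack everything before executing: all values are alive together.
        values = [sv.unpack() for sv in saved]
        live = len(values)
        peak = live
    return peak


# The hook stands in for recomputation or loading from offloaded storage.
saved = [SavedVariable(i, hook=lambda x: x * 2) for i in range(4)]
eager_peak = peak_live(saved, progressive=False)
progressive_peak = peak_live(saved, progressive=True)
```

In this model the up-front strategy holds all four values at once while the progressive one holds a single value, which mirrors why checkpointing and offloading saw no savings before the change.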
Xuehai Pan	08f4c1a233	[dynamo] add sourceless builder for `types.MethodType` (#147880 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/147880 Approved by: https://github.com/jansel	2025-02-26 15:43:47 +00:00
Luca Wehrstedt	60d94ea22b	Add option to limit number of SMs used by matmul kernels (#147966 ) Resubmission of #144974 which was reverted for unrelated reasons. Newer matmul kernels, e.g. those targeting Hopper GPUs, sometimes use a "persistent" schedule which consists of launching as many CUDA blocks as there are SMs on the GPU, with each such block then working on multiple output tiles in a row. This eliminates the overhead of starting and finishing each tile, effectively doing cross-tile pipelining. In previous generations these latencies could be hidden by having multiple CUDA blocks per SM but, with blocks becoming larger, only one can run at a time per SM and thus this needs to be taken care of in software. Persistent kernels become an issue when other kernels are running concurrently. The classical example is a NCCL communication kernel running in the background. In such cases the matmul expects to be able to use all the SMs but is prevented from doing so because some of them are busy. This can lead to its blocks being scheduled as two separate waves on the available SMs. This "wave quantization" can double the latency of the matmul kernels. While we wait for smarter solutions, such as automatic load balancing among the blocks, an easy way to unblock ourselves is to tell the matmuls to only use a subset of the GPU's SMs. For this, I am introducing a global `sm_carveout` flag which can be used to specify how many SMs should be left available for other kernels. For now I only change the cuBLAS kernels and the scaled-mm CUTLASS kernel. More kernels can be opted-in later. I tested this change manually, by using the Kineto profiler to look up the grid size of a scaled-mm kernel with different values of `sm_carveout`, and making sure it changed. Suggestions are welcome for a more automated test. Pull Request resolved: https://github.com/pytorch/pytorch/pull/147966 Approved by: https://github.com/danthe3rd	2025-02-26 12:01:12 +00:00
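The "wave quantization" effect described in the entry above comes down to a ceiling division. The sketch below is a simplified model, assuming one block per SM and equal-length blocks; the figure of 132 SMs (as on an H100 SXM) and the 8 busy SMs are illustrative numbers, and `num_waves` is a hypothetical helper.

```python
import math


def num_waves(num_blocks, free_sms):
    """Number of scheduling waves needed when only one block fits per SM."""
    return math.ceil(num_blocks / free_sms)


TOTAL_SMS = 132  # illustrative: SM count of an H100 SXM
BUSY_SMS = 8     # illustrative: SMs occupied by a background communication kernel
free_sms = TOTAL_SMS - BUSY_SMS

# A persistent kernel launches one block per SM on the GPU, but only 124 SMs are
# free, so the last 8 blocks have to wait for a second wave.
waves_default = num_waves(TOTAL_SMS, free_sms)

# Leaving 8 SMs free for other kernels means launching only 124 blocks,
# which fit in a single wave.
waves_with_carveout = num_waves(TOTAL_SMS - BUSY_SMS, free_sms)
```

Going from one wave to two is what can roughly double the matmul latency, even though only a small fraction of the SMs is unavailable.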
William Wen	cf6d1e6824	[dynamo] add generic graph break hints (#147429 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/147429 Approved by: https://github.com/jansel, https://github.com/zou3519 ghstack dependencies: #147385	2025-02-26 09:20:28 +00:00
William Wen	3fd68e4e2f	[dynamo] make some more graph break messages readable in English [2/N] (#147385 ) This is for "for some large number Z, make sure the error messages are readable English." - beginning to audit all `unimplemented` sites and making sure that all messages are at least English-readable. Hints may not necessarily be provided. Pull Request resolved: https://github.com/pytorch/pytorch/pull/147385 Approved by: https://github.com/jansel	2025-02-26 09:20:28 +00:00
PyTorch MergeBot	90e3a3d86d	Revert "[ca] trace saved variable unpacking (#147242 )" This reverts commit `68ddca9449`. Reverted https://github.com/pytorch/pytorch/pull/147242 on behalf of https://github.com/wdvr due to failing tests in the slow workflow, see below ([comment](https://github.com/pytorch/pytorch/pull/147242#issuecomment-2683604547))	2025-02-26 00:40:16 +00:00
PyTorch MergeBot	1e894d2635	Revert "Add option to limit number of SMs used by matmul kernels (#144974 )" This reverts commit `af2d63637e`. Reverted https://github.com/pytorch/pytorch/pull/144974 on behalf of https://github.com/wdvr due to reverting in order to revert #147548 that causes a merge conflict ([comment](https://github.com/pytorch/pytorch/pull/144974#issuecomment-2683461733))	2025-02-25 22:46:38 +00:00
Simon Fan	68ddca9449	[ca] trace saved variable unpacking (#147242 ) ## Before Previously, CA would always unpack all saved variables stored in the autograd graph before executing it. This meant that we couldn't capture unpack hooks as part of the CA graph, and they would fire out of order wrt other backward hooks. For memory saving APIs built on top of saved tensor hooks like non-reentrant checkpointing and offloading, we couldn't achieve any savings because all activations would be recomputed/loaded and active at the same time, resulting in a no-op. ## After We add unpack hooks into the CA graph so that they can be executed progressively. The python hook and hook input themselves are wrapped by non-traceable code, so CA polyfills the wrapping as: ```python # pseudocode class SavedVariable: def unpack(self): if self.hook: return self.hook(self.packed_data) else: return self.packed_data # This approach won't directly work when we add support for Forward AD or double-backward. ``` When directly executing the CA graph (without torch.compiling it) under checkpointing/offloading, the memory profile is expected to stay the same as when using the eager autograd engine. If AOT backward is in the autograd graph, the memory profile is expected to be better than with the eager autograd engine, since we can now delay saved activations unpacking into the AOT backward's execution. All tests pass when running the CA graph directly; the remaining issues are in Dynamo. Pull Request resolved: https://github.com/pytorch/pytorch/pull/147242 Approved by: https://github.com/jansel	2025-02-25 20:38:51 +00:00
Yidi Wu	824474cb35	[cond] support output sizes mismatch in front end (#147130 ) This PR finishes https://github.com/pytorch/pytorch/pull/137615 by addressing the TODOs and comments left there. Pull Request resolved: https://github.com/pytorch/pytorch/pull/147130 Approved by: https://github.com/zou3519	2025-02-25 20:28:41 +00:00
Luca Wehrstedt	af2d63637e	Add option to limit number of SMs used by matmul kernels (#144974 ) Newer matmul kernels, e.g. those targeting Hopper GPUs, sometimes use a "persistent" schedule which consists of launching as many CUDA blocks as there are SMs on the GPU, with each such block then working on multiple output tiles in a row. This eliminates the overhead of starting and finishing each tile, effectively doing cross-tile pipelining. In previous generations these latencies could be hidden by having multiple CUDA blocks per SM but, with blocks becoming larger, only one can run at a time per SM and thus this needs to be taken care of in software. Persistent kernels become an issue when other kernels are running concurrently. The classical example is a NCCL communication kernel running in the background. In such cases the matmul expects to be able to use all the SMs but is prevented from doing so because some of them are busy. This can lead to its blocks being scheduled as two separate waves on the available SMs. This "wave quantization" can double the latency of the matmul kernels. While we wait for smarter solutions, such as automatic load balancing among the blocks, an easy way to unblock ourselves is to tell the matmuls to only use a subset of the GPU's SMs. For this, I am introducing a global `sm_carveout` flag which can be used to specify how many SMs should be left available for other kernels. For now I only change the cuBLAS kernels and the scaled-mm CUTLASS kernel. More kernels can be opted-in later. I tested this change manually, by using the Kineto profiler to look up the grid size of a scaled-mm kernel with different values of `sm_carveout`, and making sure it changed. Suggestions are welcome for a more automated test. Pull Request resolved: https://github.com/pytorch/pytorch/pull/144974 Approved by: https://github.com/eqy, https://github.com/albanD	2025-02-25 10:19:19 +00:00
xinan.lin	dc9a03d30c	[Window] Fix invalid file path on windows. (#147708 ) This PR aims to fix the invalid path for Windows: `C:\\Users\\sdp\\AppData\\Local\\Temp\\tmp0wugz2qm\\dynamo\\code_state___main__.TestFxGraphCache.test_cache_hot_load_pgo:None:.pkl.lock` Windows does not allow the chars `\ / : * ? " < > \|` in a path. This PR also replaces `os.rename` with `os.replace` in torch/_dynamo/pgo.py, because `os.replace` allows the target file to already exist on Windows, whereas `os.rename` does not. \| Function \| `os.rename()` \| `os.replace()` \| \|--------------------------------\|----------------------------\|----------------------------\| \| Rename a file \| ✅ \| ✅ \| \| Move a file \| ✅ \| ✅ \| \| Overwrite an existing file \| ❌ (Error on Windows) \| ✅ (Will overwrite) \| \| Overwrite an existing directory \| ❌ (Error on Windows) \| ❌ (Error on Windows) \| \| Move across disks \| ❌ \| ❌ \| Pull Request resolved: https://github.com/pytorch/pytorch/pull/147708 Approved by: https://github.com/jansel	2025-02-24 08:31:11 +00:00
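Both points in the entry above, stripping characters that Windows rejects and using `os.replace` so that an existing target is overwritten, can be sketched with the standard library. The helper names `sanitize_filename` and `write_then_replace` are hypothetical and the underscore replacement is an assumption; this is not the code from torch/_dynamo/pgo.py.

```python
import os
import re
import tempfile

# Characters the commit message lists as not allowed by Windows: \ / : * ? " < > |
_WINDOWS_INVALID = re.compile(r'[\\/:*?"<>|]')


def sanitize_filename(name, replacement="_"):
    """Replace characters that Windows does not allow in a file name."""
    return _WINDOWS_INVALID.sub(replacement, name)


def write_then_replace(path, data):
    """Write to a temporary sibling file, then move it over the target.

    os.replace overwrites an existing target file on both POSIX and Windows,
    whereas os.rename raises an error on Windows if the target exists.
    """
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        f.write(data)
    os.replace(tmp, path)


safe = sanitize_filename(
    "code_state___main__.TestFxGraphCache.test_cache_hot_load_pgo:None:.pkl.lock"
)

with tempfile.TemporaryDirectory() as d:
    target = os.path.join(d, safe)
    write_then_replace(target, b"first")
    write_then_replace(target, b"second")  # target already exists here
    with open(target, "rb") as f:
        final = f.read()
```

Sanitizing only the file name, not the whole path, keeps the directory separators and the drive colon intact.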
lei,zhenyuan	7c52ef2424	Add XPU to is_compile_supported to support roi_align op in torchvision (#147541 ) Part of the required fix for https://github.com/intel/torch-xpu-ops/issues/1264. To support `roi_align`, torchvision uses `is_compile_supported` in `torch/_dynamo/utils.py` to compile a non-deterministic version of the op for backwards passes. This PR adds XPU device to the supported compile devices. The `is_compile_supported()` util function has extremely limited usage, only being used in `torchvision.ops.roi_align` and `torch.utils._content_store.has_storage()`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/147541 Approved by: https://github.com/guangyey, https://github.com/jansel Co-authored-by: lei,zhenyuan <zhenyuan.lei@intel.com>	2025-02-24 01:32:36 +00:00
Guilherme Leobas	d0adff761e	Propagate `AttributeError` to user code in user_defined.py (#146497 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/146497 Approved by: https://github.com/anijain2305, https://github.com/zou3519 ghstack dependencies: #146496	2025-02-23 01:18:28 +00:00
Guilherme Leobas	8c761ac7e3	Handle `is`/`is not` (#146496 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/146496 Approved by: https://github.com/anijain2305, https://github.com/zou3519	2025-02-23 01:18:28 +00:00
xinan.lin	2d433cf1ad	[Inductor UT][Windows][XPU] Enable Inductor UT on XPU Windows. (#147347 ) This PR removes the restrictions on general cases for XPU on Windows, allowing us to run Inductor UT on Windows. Additionally, this series of PRs has also fixed all XPU Inductor UT issues on Windows. However, due to resource constraints, we have not yet set up a Windows CI pipeline online. Pull Request resolved: https://github.com/pytorch/pytorch/pull/147347 Approved by: https://github.com/jansel, https://github.com/EikanWang	2025-02-22 02:53:16 +00:00
Thomas Bohnstingl	6eb795c9e8	[associative_scan] compile backend change to "eager" (#146973 ) This PR fixes some issues with torch export discussed here: https://github.com/pytorch/pytorch/pull/140043#discussion_r1941932960 However, this backend change still does not resolve the failure for specific shapes mentioned here: https://github.com/pytorch/pytorch/issues/137943#issuecomment-2649564994 Pull Request resolved: https://github.com/pytorch/pytorch/pull/146973 Approved by: https://github.com/ydwu4	2025-02-21 20:21:41 +00:00
Aaron Orenstein	db4ce78d46	PEP585: More UP006 fixes (#146392 ) This should be the final PR before we can enable RUFF UP006. Pull Request resolved: https://github.com/pytorch/pytorch/pull/146392 Approved by: https://github.com/justinchuby, https://github.com/albanD, https://github.com/Skylion007	2025-02-20 06:18:13 +00:00
Animesh Jain	76ad19a549	[dynamo][codegen] Implement CSE for pre-graph graph-arg bytecode reconstruction (#147425 ) This reduces fixed overhead seen in a few internal models. Pull Request resolved: https://github.com/pytorch/pytorch/pull/147425 Approved by: https://github.com/jansel, https://github.com/StrongerXi	2025-02-20 05:42:52 +00:00
Shangdi Yu	0b0da81021	Support static method of torchbind attributes in torch.compile with inductor backend (#146927 ) As title. Many changes adapted from https://github.com/pytorch/pytorch/pull/129537. Also this diff is only for static method of torchbind attributes. Some cases that are not supported/tested: - dynamic torchbind objects - torchbind objects as an input to the module. Note that in JIT Inductor, the attributes are lifted as inputs. So even if we just have torchbind objects as attributes, they will show up as inputs in the graph. Example generated python code in torch.compile with inductor backend for the test case in `inductor/test_torchbind.py` (P1730554370): ```python async_compile.wait(globals()) del async_compile def call(args): arg1_1, arg2_1, arg3_1 = args args.clear() assert_size_stride(arg1_1, (2, 3), (3, 1)) assert_size_stride(arg2_1, (2, 3), (3, 1)) buf2 = empty_strided_cpu((2, 3), (3, 1), torch.float32) cpp_fused_add_0(arg1_1, arg2_1, buf2) del arg1_1 del arg2_1 # Topologically Sorted Source Nodes: [x, takes_foo_tuple_return], Original ATen: [aten.add] buf3 = torch.ops._TorchScriptTesting.takes_foo_tuple_return.default(arg3_1, buf2) buf4 = buf3[0] assert_size_stride(buf4, (2, 3), (3, 1)) buf5 = buf3[1] assert_size_stride(buf5, (2, 3), (3, 1)) buf6 = buf4; del buf4 # reuse cpp_fused_add_1(buf6, buf5) del buf5 # Topologically Sorted Source Nodes: [y, b], Original ATen: [aten.add] buf7 = torch.ops._TorchScriptTesting.takes_foo.default(arg3_1, buf6) del buf3 del buf6 buf8 = buf7 assert_size_stride(buf8, (2, 3), (3, 1)) # Topologically Sorted Source Nodes: [c], Original ATen: [] buf9 = torch.ops.higher_order.call_torchbind(arg3_1, 'add_tensor', buf2) del arg3_1 del buf7 buf10 = buf9 assert_size_stride(buf10, (2, 3), (3, 1)) del buf9 buf11 = buf2; del buf2 # reuse cpp_fused_add_2(buf11, buf8, buf10) return (buf11, ) def benchmark_compiled_module(times=10, repeat=10): from torch._dynamo.testing import rand_strided from torch._inductor.utils import print_performance arg1_1 = rand_strided((2, 3), (3, 1), device='cpu', dtype=torch.float32) arg2_1 = rand_strided((2, 3), (3, 1), device='cpu', dtype=torch.float32) import pickle global arg3_1 arg3_1 = pickle.loads(b'\x80\x04\x95[\x00\x00\x00\x00\x00\x00\x00\x8c\x05torch\x94\x8c\x0cScriptObject\x94\x93\x94)\x81\x94]\x94(K\nK\x14e\x8c0__torch__.torch.classes._TorchScriptTesting._Foo\x94\x86\x94b.') fn = lambda: call([arg1_1, arg2_1, arg3_1]) return print_performance(fn, times=times, repeat=repeat) if __name__ == "__main__": from torch._inductor.wrapper_benchmark import compiled_module_main compiled_module_main('None', benchmark_compiled_module) ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/146927 Approved by: https://github.com/angelayi	2025-02-20 03:33:19 +00:00
rzou	fea718f062	[BaseHOP] change hop(subgraph, operands) to hop(subgraph, *operands) (#146730 ) Our three main users are OK with this, with two of them (foreach_map, invoke_quant) preferring it like this. I was originally worried about BC issues (this now means you cannot add any positional args) but I think that's not a concern -- one can always add kwonly args. Test Plan - tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/146730 Approved by: https://github.com/ydwu4, https://github.com/mlazos	2025-02-20 02:30:36 +00:00
William Wen	16e202a38e	[dynamo] improved graph break messages for some common graph break sites [1/N] (#146525 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/146525 Approved by: https://github.com/jansel	2025-02-20 00:08:13 +00:00
Simon Fan	ed83b0b70b	[ddp] decouple python reducer from compilation mode (#147123 ) Current implementation reads as: we will only actually use the "python_reducer" config if the DDP forward is compiled. Otherwise, we will silently fall back to the C++ reducer + no DDPOptimizer. I'm changing this behavior to always use the python reducer if the config is specified. Pull Request resolved: https://github.com/pytorch/pytorch/pull/147123 Approved by: https://github.com/fegin	2025-02-19 15:51:40 +00:00
bobrenjc93	525ca80f53	add unbacked strict mode (#147333 ) fixes #145775 This is the first step in introducing a "strict" mode where we don't silently specialize and don't silently graph break. At a high level, when we do mark_unbacked(... strict=True), any time we specialize an unbacked symint we will explicitly error and tell the user their unbacked dimension was specialized to a single value. Pull Request resolved: https://github.com/pytorch/pytorch/pull/147333 Approved by: https://github.com/laithsakka	2025-02-18 23:33:55 +00:00
bobrenjc93	5d547d82e6	Add no_data_dependent_graph_break mode (#147342 ) This adds a strict mode `TORCHDYNAMO_UNBACKED_STRICT` to prevent graph breaking when we guard on data dependent. This is a better UX for those who are actively trying to make their model more dynamic, but aren't close enough to full graph to use that flag directly. Pull Request resolved: https://github.com/pytorch/pytorch/pull/147342 Approved by: https://github.com/laithsakka	2025-02-18 23:33:47 +00:00
William Wen	63e8ad49b8	[dynamo] replace hardcoded eval frame control flags skip_code_recursive_flag/cache_limit_hit_flag (#146355 ) This PR and the previous: - Moves parts of `eval_frame.c` to C++. - Reduces code duplication in `dynamo__custom_eval_frame` and makes the control flow more clear. - Enables `convert_frame` to signal to `eval_frame.cpp` in a general manner how to evaluate this frame, recursive frames, and future frames with the same code object (default/compile, skip, run-only). e.g. this will allow us to change skipping/cache limit hit eval_frame behavior directly from convert_frame without requiring changes to C/C++. Pull Request resolved: https://github.com/pytorch/pytorch/pull/146355 Approved by: https://github.com/jansel ghstack dependencies: #145603	2025-02-18 21:37:12 +00:00
clr	166419b9c1	dynamo: Don't crash when encountering an object with no __name__ (#147246 ) This was triggering on ScriptFunctions. Note that other than badly implemented C functions, this seems to be almost impossible to trigger, so I wrote a smaller unit test, rather than a full repro. Let me know if people feel strongly and want a full reproduction. Pull Request resolved: https://github.com/pytorch/pytorch/pull/147246 Approved by: https://github.com/anijain2305, https://github.com/jansel, https://github.com/Skylion007	2025-02-18 20:35:49 +00:00
zeshengzong	c6b331f7d9	Deprecate `skip_code_recursive_on_cache_limit_hit` config flag (#136970 ) Fixes one of #136862 Make `skip_code_recursive_on_cache_limit_hit` flag deprecated. Affected logic is in here: `6931c1644a/torch/_dynamo/convert_frame.py (L866-L876)` Pull Request resolved: https://github.com/pytorch/pytorch/pull/136970 Approved by: https://github.com/williamwen42	2025-02-18 18:48:23 +00:00
Xuehai Pan	ee38a32c55	[Dynamo] support `isinstance(...)` check for type tuple (#146984 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/146984 Approved by: https://github.com/jansel	2025-02-16 10:41:49 +00:00
Animesh Jain	9dc702875d	[dynamo][mappingproxy][inspect] Support existing types.MappingProxyType (#147217 ) Fixes https://github.com/pytorch/pytorch/issues/147162 Pull Request resolved: https://github.com/pytorch/pytorch/pull/147217 Approved by: https://github.com/williamwen42	2025-02-15 07:59:33 +00:00
Yidi Wu	bf0c89a72f	[dynamo] fix error message when logging graph that contains hops (#147227 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/147227 Approved by: https://github.com/zou3519	2025-02-15 00:53:44 +00:00
Animesh Jain	76f57e184a	[dynamo] Make SliceVariable a subclass of VariableTracker (#147046 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/147046 Approved by: https://github.com/StrongerXi ghstack dependencies: #146819, #146995	2025-02-14 23:22:27 +00:00
Yidi Wu	1224765286	[cond] make cond call fake kernel in dynamo (#147045 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/147045 Approved by: https://github.com/zou3519 ghstack dependencies: #146954	2025-02-14 23:13:15 +00:00
Guilherme Leobas	cefd9805de	Add `RAISE_VARARGS 0` (#146493 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/146493 Approved by: https://github.com/zou3519 ghstack dependencies: #146498, #146492	2025-02-14 13:37:23 +00:00
Guilherme Leobas	134723ee1c	Add `WITH_EXCEPT_START` opcode (#146492 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/146492 Approved by: https://github.com/anijain2305, https://github.com/zou3519 ghstack dependencies: #146498	2025-02-14 13:37:23 +00:00
Guilherme Leobas	dbb86b78ad	Add `sys.exc_info` and `sys.exception` (#146498 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/146498 Approved by: https://github.com/anijain2305, https://github.com/zou3519	2025-02-14 13:37:14 +00:00
Animesh Jain	2d089a5697	[dynamo] Remove unintended lru_cache (#147147 ) I forgot to remove it while adding the frozenset __contains__ method in this PR - https://github.com/pytorch/pytorch/pull/146062 This is causing a memory leak. Pull Request resolved: https://github.com/pytorch/pytorch/pull/147147 Approved by: https://github.com/williamwen42	2025-02-14 03:55:39 +00:00
Aaron Gokaslan	6344ca1dd4	[BE][Ez]: Apply FURB188: use str remove(pre\|suf)fix (#146997 ) Since we are on 3.9, we can use this nice str builtin which is more readable and more efficient. Pull Request resolved: https://github.com/pytorch/pytorch/pull/146997 Approved by: https://github.com/XuehaiPan, https://github.com/cyyever, https://github.com/jansel	2025-02-14 03:38:07 +00:00
cyy	d473c212fd	Remove code for Python < 3.9 (#147097 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/147097 Approved by: https://github.com/albanD	2025-02-14 03:22:49 +00:00
Simon Fan	057bcd3a45	[ca] eliminate duplicate getitem graph nodes for shape inputs (#146875 ) should reuse existing proxies instead of creating new ones before: https://manifold.edge.x2p.facebook.net/v0/read/tree/logs/.tmpL7hmHe/0_-_-_0/compiled_autograd_graph_3.txt?bucketName=tlparse_reports&apiKey=tlparse_reports-key&withPayload=1&timeoutMsec=100 ```python class CompiledAutograd0(torch.nn.Module): def forward(self, inputs, sizes, scalars, hooks): # No stacktrace found for following nodes getitem = inputs[0] getitem_1 = inputs[1] getitem_2 = inputs[2]; inputs = None getitem_3 = sizes[0]; getitem_3 = None getitem_4 = sizes[1]; getitem_4 = None getitem_5 = sizes[2]; getitem_5 = None getitem_6 = sizes[3]; getitem_6 = None getitem_7 = sizes[4]; getitem_7 = None getitem_8 = sizes[5]; getitem_8 = None getitem_9 = sizes[6]; getitem_9 = None getitem_10 = sizes[7]; getitem_10 = None getitem_11 = sizes[8]; getitem_11 = None getitem_12 = sizes[9]; getitem_12 = None getitem_13 = sizes[10]; getitem_13 = None getitem_14 = sizes[11]; getitem_14 = None getitem_15 = sizes[12]; getitem_15 = None getitem_16 = sizes[13]; getitem_16 = None getitem_17 = sizes[14]; getitem_17 = None getitem_18 = sizes[15]; getitem_18 = None getitem_19 = sizes[0] getitem_20 = sizes[1] getitem_21 = sizes[2] getitem_22 = sizes[3] getitem_23 = sizes[4] getitem_24 = sizes[5] getitem_25 = sizes[6] getitem_26 = sizes[7] getitem_27 = sizes[8] getitem_28 = sizes[9] getitem_29 = sizes[10] getitem_30 = sizes[11] getitem_31 = sizes[12] getitem_32 = sizes[13] getitem_33 = sizes[14] getitem_34 = sizes[15]; sizes = None ``` after: https://manifold.edge.x2p.facebook.net/v0/read/tree/logs/.tmpCo5T6B/0_-_-_0/compiled_autograd_graph_1.txt?bucketName=tlparse_reports&apiKey=tlparse_reports-key&withPayload=1&timeoutMsec=100 ```python class CompiledAutograd0(torch.nn.Module): def forward(self, inputs, sizes, scalars, hooks): # No stacktrace found for following nodes getitem = inputs[0] getitem_1 = inputs[1] getitem_2 = 
inputs[2]; inputs = None getitem_3 = sizes[0] getitem_4 = sizes[1] getitem_5 = sizes[2] getitem_6 = sizes[3] getitem_7 = sizes[4] getitem_8 = sizes[5] getitem_9 = sizes[6] getitem_10 = sizes[7] getitem_11 = sizes[8] getitem_12 = sizes[9] getitem_13 = sizes[10] getitem_14 = sizes[11] getitem_15 = sizes[12] getitem_16 = sizes[13] getitem_17 = sizes[14] getitem_18 = sizes[15]; sizes = None ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/146875 Approved by: https://github.com/jansel ghstack dependencies: #146720, #146735	2025-02-13 21:41:33 +00:00
Simon Fan	76dacd5fc7	[ca] log graph before reordering passes (#146735 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/146735 Approved by: https://github.com/jansel ghstack dependencies: #146720	2025-02-13 21:41:33 +00:00
PyTorch MergeBot	9a883007a2	Revert "Implement cuda graphs implementation of torch.cond and torch.while_loop (#140979 )" This reverts commit `c7515da7b0`. Reverted https://github.com/pytorch/pytorch/pull/140979 on behalf of https://github.com/huydhn due to This change has been reported to break internal code ([comment](https://github.com/pytorch/pytorch/pull/140979#issuecomment-2657361940))	2025-02-13 18:04:26 +00:00
rzou	5dab0aeef0	[SkipFiles] Some more cleanup (#147013 ) This isn't a no-op but I think it's fine. It changes the case where a function f1 in a module in MOD_SKIPFILES calls a function f2 in one of the deleted modules. Previously f2 would have been skipped, now f2 gets inlined. Pull Request resolved: https://github.com/pytorch/pytorch/pull/147013 Approved by: https://github.com/yanboliang ghstack dependencies: #147016, #147012	2025-02-13 01:18:47 +00:00
rzou	fddaa2958b	[SkipFiles] Some more cleanup (#147012 ) I think these are all no-ops. Test Plan: - tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/147012 Approved by: https://github.com/yanboliang ghstack dependencies: #147016	2025-02-13 01:18:47 +00:00
rzou	87ebd77b34	Add some more docs to trace_rules.py (#147016 ) After discussing with Yanbo we wanted to record the behavior down so we don't need to rederive them in the future. Pull Request resolved: https://github.com/pytorch/pytorch/pull/147016 Approved by: https://github.com/yanboliang	2025-02-13 01:18:39 +00:00
Animesh Jain	b77a6eb184	[dynamo] Fix tensordict regression (#146995 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/146995 Approved by: https://github.com/StrongerXi ghstack dependencies: #146819	2025-02-13 00:59:59 +00:00
Raymond Li	21c2565f35	Document dynamo (#146736 ) Many files in dynamo are currently lacking file/module-level documentation, which makes it hard to know what they do at a glance and without digging into the code. This fixes that. Note: documentation was AI-generated and could be incorrect, please review carefully. Pull Request resolved: https://github.com/pytorch/pytorch/pull/146736 Approved by: https://github.com/jansel, https://github.com/StrongerXi, https://github.com/anijain2305, https://github.com/zou3519	2025-02-13 00:02:21 +00:00
Brian Hirsh	de964b9f8b	dont specialize symints when testing truthiness (#146731 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/146731 Approved by: https://github.com/bobrenjc93 ghstack dependencies: #146642, #146729	2025-02-12 20:57:10 +00:00
Animesh Jain	d6513f3246	[dynamo] Support list subclasses and fix dict subclasses mutation bugs (#146819 ) This PR adds support for list subclasses. Among other things, it includes: 1) Tracking the mutations on internal vts like `_dict_vt` and `_list_vt` using sources. This helps identify whether there was a mutation in the underlying data structures that requires us to reconstruct them. 2) `UserDefinedObjectVariable` now has a new method, `is_modified`, which the `side_effect` infra relies upon to check mutations in the underlying vts (like `_dict_vt`). 3) `reconstruction` logic ensures that we use the `dict.__getitem__` and `list.__getitem__` methods. This is super important because we don't want to call the overridden `__getitem__` methods. If this PR is hard to review, please let me know. I can break it into several small PRs. Pull Request resolved: https://github.com/pytorch/pytorch/pull/146819 Approved by: https://github.com/StrongerXi, https://github.com/jansel	2025-02-12 17:46:02 +00:00
Yuanhao Ji	b0042286d4	[Dynamo] Allow dynamo to handle `str.xxx()` (#146587 ) Fixes #146350 Pull Request resolved: https://github.com/pytorch/pytorch/pull/146587 Approved by: https://github.com/zou3519	2025-02-12 08:54:10 +00:00
Thomas Bohnstingl	3a29992ee6	[associative_scan] Lifted arguments (#140043 ) This PR implements lifted arguments for associative_scan Pull Request resolved: https://github.com/pytorch/pytorch/pull/140043 Approved by: https://github.com/ydwu4	2025-02-11 23:25:55 +00:00
Daniel Galvez	c7515da7b0	Implement cuda graphs implementation of torch.cond and torch.while_loop (#140979 ) This is a new PR for #130386 , which got stale and was closed. Since I force-pushed to that branch in order to rebase it on top of main, the PR can no longer be reopened, according to https://github.com/isaacs/github/issues/361 I fixed the possibly-not-warmed-up problem described here: https://github.com/pytorch/pytorch/pull/130386/files#r1690856534 Since starting this, torch.cond and torch.while_loop now apparently have support for backward passes. I will look into what it might take to support that. Pull Request resolved: https://github.com/pytorch/pytorch/pull/140979 Approved by: https://github.com/eqy, https://github.com/eellison	2025-02-11 18:16:15 +00:00
rzou	5235a18cd6	[SkipFiles] remove some more stuff from MOD_SKIPLIST (#146876 ) Test Plan: - tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/146876 Approved by: https://github.com/anijain2305 ghstack dependencies: #146854	2025-02-11 15:00:56 +00:00
Yanbo Liang	229fb0bc83	[Dynamo][autograd.Function] Relax backward speculation strict mode: support .requires_grad (#146742 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/146742 Approved by: https://github.com/zou3519 ghstack dependencies: #146571, #146741	2025-02-11 05:39:07 +00:00
Yanbo Liang	f2da810516	[Dynamo][autograd.Function] Relax backward speculation strict mode: support .data (#146741 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/146741 Approved by: https://github.com/zou3519 ghstack dependencies: #146571	2025-02-11 05:39:07 +00:00
Yanbo Liang	29523aa113	[Dynamo][autograd.Function] Relax backward speculation strict mode a bit (#146571 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/146571 Approved by: https://github.com/zou3519	2025-02-11 05:39:00 +00:00
rzou	a7fe384d0e	Remove torch._higher_order_ops from MOD_SKIPLIST (#146853 ) Test Plan: - tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/146853 Approved by: https://github.com/williamwen42	2025-02-11 04:38:26 +00:00
rzou	1d81ecfc54	Rename PrimHOPBase to BaseHOP + minor changes (#146727 ) This PR: - renames PrimHOPBase to BaseHOP - changes the backward pass to always return a tuple (to match the forward pass). Test Plan: - tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/146727 Approved by: https://github.com/ydwu4	2025-02-11 02:43:37 +00:00
rzou	275c034b16	[SkipFiles] remove some stuff from MOD_SKIPLIST (#146854 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/146854 Approved by: https://github.com/yanboliang, https://github.com/anijain2305	2025-02-11 01:34:46 +00:00
Animesh Jain	cbbb11d967	[dynamo][user-defined] Unify standard and non-standard __new__ codebase (#146737 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/146737 Approved by: https://github.com/jansel ghstack dependencies: #146677	2025-02-10 17:31:13 +00:00
Animesh Jain	ee8a06f1f6	[dynamo][user-defined] User class.__new__ instead of special casing (#146677 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/146677 Approved by: https://github.com/jansel	2025-02-10 17:31:13 +00:00
Guilherme Leobas	899066eedf	Fix round(...) with constants (#146495 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/146495 Approved by: https://github.com/anijain2305	2025-02-10 15:08:09 +00:00
Simon Fan	298226f358	[dynamo] check for incompatible configs (#146513 ) internal: https://fb.workplace.com/groups/1075192433118967/permalink/1599802033991335/ Assuming flags don't change during compilation, we shouldn't allow incompatible configs to be set at torch.compile wrap time. Not in this PR: For flags that need to change during compilation, we'd have to be strict about where they can be used in the compile lifecycle Pull Request resolved: https://github.com/pytorch/pytorch/pull/146513 Approved by: https://github.com/williamwen42 Co-authored-by: Gabriel Ferns <gabeferns@meta.com>	2025-02-10 00:44:23 +00:00
Guilherme Leobas	6a9a02acbe	Set `enable_faithful_generator_behavior` flag to True (#142513 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/142513 Approved by: https://github.com/zou3519 ghstack dependencies: #141055, #144421, #144422, #144423, #144424, #144420, #145223	2025-02-08 22:42:12 +00:00
Guilherme Leobas	580a305681	Raise MutationError if there are side effects when returning generator (#145223 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/145223 Approved by: https://github.com/zou3519 ghstack dependencies: #141055, #144421, #144422, #144423, #144424, #144420	2025-02-08 22:42:12 +00:00
Guilherme Leobas	68cfd36c11	Add `CLEANUP_THROW` bytecode (#144420 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/144420 Approved by: https://github.com/zou3519 ghstack dependencies: #141055, #144421, #144422, #144423, #144424	2025-02-08 22:42:12 +00:00
Guilherme Leobas	53ab82d8f5	Implement `generator.throw(exception)` (#144424 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/144424 Approved by: https://github.com/zou3519 ghstack dependencies: #141055, #144421, #144422, #144423	2025-02-08 22:42:12 +00:00
Guilherme Leobas	8ee095f7c1	Implement `generator.close()` (#144423 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/144423 Approved by: https://github.com/zou3519 ghstack dependencies: #141055, #144421, #144422	2025-02-08 22:42:12 +00:00
Guilherme Leobas	ca9b16e070	Implement `generator.send(..)` (#144422 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/144422 Approved by: https://github.com/zou3519 ghstack dependencies: #141055, #144421	2025-02-08 22:42:12 +00:00
Guilherme Leobas	d798831167	Implement `generator.__iter__()` (#144421 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/144421 Approved by: https://github.com/zou3519 ghstack dependencies: #141055	2025-02-08 22:42:12 +00:00
Guilherme Leobas	8603a1c870	Support generators (#141055 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/141055 Approved by: https://github.com/zou3519	2025-02-08 22:42:12 +00:00
eellison	92b7e610ab	[Inductor changes] Invoke Quant (#139102 ) Adds a `invoke_quant` higher order operator as proposed [here](https://docs.google.com/document/d/1s2PfJlq6Q1F8l11CkTIC69BW1rEnGEgs6YmBC7hu8rA/edit?tab=t.0). The primary motivations are - Unifying scattered reasoning for quant operators throughout the code base - Easy of pattern matching - see this very large pattern match expression [here](`949fdd2997/torch/_inductor/fx_passes/post_grad.py (L390-L426)`. Compared to the pattern I have in the tests: ``` @register_graph_pattern( CallFunction( torch.ops.aten.mm, CallFunction( torch.ops.higher_order.invoke_quant, Ignored(), Ignored(), Ignored(), scheme="nf4", ), Arg(), ), pass_dict=test_pass, ) ``` - Ability to specify inductor specific logic, like codegen'ing the operators in lower precision, or forcing fusion to a matmul. Example graph: ``` Python ===== AFTER POST GRAD ===== /data/users/eellison/pytorch/torch/fx/_lazy_graph_module.py class <lambda>(torch.nn.Module): def forward(self, arg0_1: "f32[8][1]cpu", arg1_1: "f32[8][1]cpu"): # File: /data/users/eellison/pytorch/torch/_higher_order_ops/invoke_quant.py:87 in __call__, code: return invoke_quant_tracer(args, kwargs, quant_options=self) # type: ignore[call-arg] repeated_subgraph0 = self.repeated_subgraph0 invoke_quant: "f32[8][1]cpu" = torch.ops.higher_order.invoke_quant(repeated_subgraph0, arg0_1, arg1_1, scheme = 'nf4'); repeated_subgraph0 = arg0_1 = arg1_1 = None return (invoke_quant,) class repeated_subgraph0(torch.nn.Module): def forward(self, arg0_1: "f32[8][1]cpu", arg1_1: "f32[8][1]cpu"): # File: /data/users/eellison/pytorch/torch/_higher_order_ops/invoke_quant.py:87 in __call__, code: return invoke_quant_tracer(args, *kwargs, quant_options=self) # type: ignore[call-arg] mul: "f32[8][1]cpu" = torch.ops.aten.mul.Tensor(arg0_1, arg1_1); arg0_1 = None add: "f32[8][1]cpu" = torch.ops.aten.add.Tensor(mul, arg1_1); mul = arg1_1 = None return add ``` The schema for `invoke_quant` is 
`torch.ops.higher_order.invoke_quant(subgraph, args, scheme=None)` where the scheme will not always be present. I wasn't sure exactly how the inductor specific configurations like `codgen_in_low_precision` should be passed through. I didn't want to stuff them all in as kwargs, and I didn't want to have them affect pattern matching. So they will be stored as meta of the node itself. And, following that, I wanted the invocation of the hop to match how it will show up in the graph. So I decided to have it be an object that is then invoked for the tracing. ``` invoke_quant = InvokeQuant(codegen_low_precision=True) invoke_quant(gn, (x, y), scheme="nf4") ``` Todo - not require the packing of args in a tuple, will do following https://github.com/pytorch/pytorch/pull/139162. Feedback welcome. Pull Request resolved: https://github.com/pytorch/pytorch/pull/139102 Approved by: https://github.com/Chillee	2025-02-08 19:30:19 +00:00
James Wu	76c8a2dc48	Fix get_top() to return the base level event of the stack, not the most recently started event (#146649 ) `get_top()` is really confusing when talking about a stack, because it can mean the most recently started event on the stack or the toplevel event in perfetto (which displays the stack upside down). Rename to `get_outermost` and fix the bug associated with it, so that it returns the correct value out of the stack. Running nanogpt now puts `guard_latency_us` correctly in the `dynamo` event: ``` tlp python benchmarks/dynamo/torchbench.py --backend inductor --device cuda --only nanogpt --amp --cold-start-latency --print-compilation-time --training --performance 2>&1 --dynamic-shapes \| tee out.log ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/146649 Approved by: https://github.com/masnesral, https://github.com/anijain2305	2025-02-07 18:04:50 +00:00
Animesh Jain	ee45ea599d	[dynamo] Actionable message on recompilations for fullgraph=True (#146550 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/146550 Approved by: https://github.com/zou3519, https://github.com/StrongerXi ghstack dependencies: #146553	2025-02-07 17:28:43 +00:00
Animesh Jain	fa0956951c	[dynamo] Remove the suggestion to use suppress_errors on compiler error (#146553 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/146553 Approved by: https://github.com/zou3519, https://github.com/jansel	2025-02-07 17:28:43 +00:00
Animesh Jain	99ddbb4802	[dynamo][fullgraph] Do not skip frame with fullgraph=True (#146527 ) Previously, if there were no ops in the graph, fullgraph=True would also fall back to eager. This hid issues in testing, where we silently fell back to eager and did not test the optimized bytecode. As can be seen in the PR, I had to fix several tests when I forced the use of the optimized bytecode in the absence of a graph. A few failing tests will be fixed in follow-up PRs. Pull Request resolved: https://github.com/pytorch/pytorch/pull/146527 Approved by: https://github.com/zou3519, https://github.com/StrongerXi	2025-02-06 18:56:07 +00:00
Animesh Jain	e2e265e27b	[dynamo] Use polyfill to implement comparison operators (#144485 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/144485 Approved by: https://github.com/jansel	2025-02-06 17:27:07 +00:00
Simon Fan	a14c780c4c	[dynamo] fix dynamo_compile logging on RecompileLimitExceeded (#146544 ) Logging branches based on whether RecompileLimitExceeded was raised. If we exceed the limit, we fall back to eager before even trying to analyze the frame. We handle RecompileLimitExceeded outside of the try/catch/finally that edits the metrics context: `72405b0c0f/torch/_dynamo/convert_frame.py (L908-L935)`. dynamo_config and recompile_reason are both known before we raise the RecompileLimitExceeded, so we can add them with the rest of the "common" metrics, which are logged on metric_context decorator exit, which is always called. Pull Request resolved: https://github.com/pytorch/pytorch/pull/146544 Approved by: https://github.com/masnesral	2025-02-06 16:20:42 +00:00
PyTorch MergeBot	1b79d47635	Revert "[dynamo] check for incompatible configs (#146513 )" This reverts commit `aab7925418`. Reverted https://github.com/pytorch/pytorch/pull/146513 on behalf of https://github.com/atalman due to inductor/test_fuzzer.py::TestConfigFuzzer::test_config_fuzzer_dynamo_bisect [GH job link](https://github.com/pytorch/pytorch/actions/runs/13174131431/job/36772837627) [HUD commit link](`4a545eb85d`) ([comment](https://github.com/pytorch/pytorch/pull/146513#issuecomment-2639860568))	2025-02-06 13:42:25 +00:00
Animesh Jain	340cfe4f28	[dynamo][fbcode] Turn on inline_inbuilt_nn_modules (#145407 ) As title. Some internal testing at https://fb.workplace.com/groups/241460628989036/permalink/411650015303429/ Pull Request resolved: https://github.com/pytorch/pytorch/pull/145407 Approved by: https://github.com/ezyang, https://github.com/jansel	2025-02-06 13:18:35 +00:00
Simon Fan	aab7925418	[dynamo] check for incompatible configs (#146513 ) internal: https://fb.workplace.com/groups/1075192433118967/permalink/1599802033991335/ Assuming flags don't change during compilation, we shouldn't allow incompatible configs to be set at torch.compile wrap time. Not in this PR: For flags that need to change during compilation, we'd have to be strict about where they can be used in the compile lifecycle Pull Request resolved: https://github.com/pytorch/pytorch/pull/146513 Approved by: https://github.com/williamwen42	2025-02-06 07:39:52 +00:00
bobrenjc93	389c5c0842	print out partial fx graph for all data-dependent errors (#146363 ) The previous implementation didn't catch the following type of errors ``` torch.fx.experimental.symbolic_shapes.GuardOnDataDependentSymNode: Could not extract specialized integer from data-dependent expression u2 (unhinted: u2). (Size-like symbols: none) ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/146363 Approved by: https://github.com/angelayi, https://github.com/bdhirsh ghstack dependencies: #146298, #146296	2025-02-06 04:21:34 +00:00
Simon Fan	72405b0c0f	[ca] refactor compile reasons and log to tlparse (#146386 ) This PR accumulates compile reasons inside each CacheNode, and logs them to tlparse on each CA compile. This defines a compile as an autograd structure change, and a recompile as a dynamic shape change. sample tlparse: https://manifold.edge.x2p.facebook.net/v0/read/tree/logs/.tmpdbo7gt/index.html?bucketName=tlparse_reports&apiKey=tlparse_reports-key&withPayload=1&timeoutMsec=100 for compiles: ```python [ "!0: Cache miss due to new autograd node: torch::autograd::GraphRoot (NodeCall 0) with key size 39, previous key sizes=[]" ] ``` for recompiles: ```python [ "!0: Cache miss due to new autograd node: torch::autograd::GraphRoot (NodeCall 0) with key size 39, previous key sizes=[]", "!1: Cache miss due to 7 changed tensor shapes (total of 7): sizes[0], sizes[1], sizes[2], sizes[3], sizes[4], sizes[5], sizes[6]" ] ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/146386 Approved by: https://github.com/jansel ghstack dependencies: #146229	2025-02-05 23:33:21 +00:00
Simon Fan	e20b0c82d1	[ca] no longer require is_traceable annotations for c++ autograd functions (#146229 ) This PR removes the CA compile-time error for C++ autograd functions, and supports them by having dynamo graph break on them (instead of allow_in_graph). The CppNode's collects are kept as is for now. Pull Request resolved: https://github.com/pytorch/pytorch/pull/146229 Approved by: https://github.com/jansel, https://github.com/zou3519	2025-02-05 08:49:17 +00:00
clr	93d98aca31	inductor: Don't throw an internal error when an nn.module is missing an attribute (#145122 ) If an nn.module getattr call throws, we should make sure that we don't crash with an internal error. Note that I couldn't figure out how to test this, so advice would be awesome. I have my best case attempt at https://github.com/pytorch/pytorch/pull/145799, but it doesn't seem to reproduce the crash. Pull Request resolved: https://github.com/pytorch/pytorch/pull/145122 Approved by: https://github.com/jansel	2025-02-05 05:49:32 +00:00
Michael Lazos	616ac94175	[Dynamo] Fix spammy optimizer warning (#146374 ) Fixes https://discuss.pytorch.org/t/torch-compile-optimizer-step-generates-excessive-warning-messages/216067/7 Pull Request resolved: https://github.com/pytorch/pytorch/pull/146374 Approved by: https://github.com/anijain2305	2025-02-05 01:03:49 +00:00
clr	4e194bbfd6	dynamo: fsdp throw unimplemented vs attribute error (#146188 ) Rather than throw a full exception for fsdp, instead just return unimplemented, and respect the user options (i.e. fullgraph, vs graph break). Pull Request resolved: https://github.com/pytorch/pytorch/pull/146188 Approved by: https://github.com/jansel	2025-02-04 21:45:55 +00:00
Yanbo Liang	07b9fe0690	[Trace PyDispatcher] Add CustomFunctionHigherOrderOperatorVariable (#146272 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/146272 Approved by: https://github.com/zou3519 ghstack dependencies: #146270, #146271	2025-02-04 20:55:51 +00:00
Aaron Gokaslan	7f65a20884	[BE]: Enable ruff SLOT checks (#146276 ) This enables a check that a class which only inherits from immutable classes like str, tuple, and NamedTuple also defines `__slots__`, so that its instances don't allocate memory unnecessarily. This also ensures contributors think about how they define classes that subclass NamedTuple and str, of which we have many in our codebase. Pull Request resolved: https://github.com/pytorch/pytorch/pull/146276 Approved by: https://github.com/aorenste	2025-02-04 19:18:23 +00:00
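A minimal sketch of what the SLOT checks above are guarding against, using only the standard library. The class names `Tag` and `SlimTag` are hypothetical and not taken from the PyTorch codebase: a subclass of an immutable built-in such as `str` gets a per-instance `__dict__` unless it declares `__slots__`.

```python
class Tag(str):
    """Hypothetical str subclass without __slots__; instances carry a __dict__."""


class SlimTag(str):
    """Same idea, but an empty __slots__ suppresses the per-instance __dict__."""

    __slots__ = ()


tag = Tag("dynamo")
slim = SlimTag("dynamo")

# Both still behave like ordinary strings.
assert tag == slim == "dynamo"

# Only the class without __slots__ allocates an instance dictionary.
has_dict_without_slots = hasattr(tag, "__dict__")
has_dict_with_slots = hasattr(slim, "__dict__")
print(has_dict_without_slots, has_dict_with_slots)
```

Because `SlimTag` has no instance dictionary, assigning an arbitrary attribute to one of its instances raises `AttributeError`, which is usually the desired behavior for a value-like type.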
bobrenjc93	c591ad0c03	dump partial fx graph to stderr when dynamo tracing fails with guard on data-dependent (#146296 ) As discussed with @avikchaudhuri and @bdhirsh last week, this can be quite useful when debugging. The following code produces a data dependent error ``` import torch from torch import nn # UserError: Could not guard on data-dependent expression Eq(507 - u0, 0) (unhinted: Eq(507 - u0, 0)). (Size-like symbols: u0) class Repro(nn.Module): def __init__(self): super().__init__() def forward(self, cache, update, pos): _, _, max_seq_len, _ = cache.shape _, _, seqlen, _ = update.shape pos_item = pos[0].item() # u0 torch._check(pos_item + seqlen <= max_seq_len) # u0 + 502 <= 507 torch._check(pos_item >= 0) before = cache.narrow(2, 0, pos_item) # FAIL # Laith: why can't we make unbacked expressions size-like? after = cache.narrow(2, (pos_item + seqlen), (max_seq_len - pos_item - seqlen)) # PASS end = torch.tensor(max_seq_len - pos_item - seqlen).item() after = cache.narrow(2, (pos_item + seqlen), end) return torch.cat([before, update, after], dim=2) repro = Repro() bsz = 1 n_heads = 4 max_seq_len = 512 head_dim = 64 seqlen = 5 pos_item = 1 cache = torch.zeros(bsz, n_heads, max_seq_len, head_dim) update = torch.ones(bsz, n_heads, seqlen, head_dim) pos = torch.tensor([pos_item]) example_inputs = (cache, update, pos) torch.export.export(repro, example_inputs) ``` This is what it now prints out ``` class GraphModule(torch.nn.Module): def forward(self, L_cache_: "f32[1, 4, 512, 64][131072, 32768, 64, 1]cpu", L_update_: "f32[1, 4, 5, 64][1280, 320, 64, 1]cpu", L_pos_: "i64[1][1]cpu"): l_cache_ = L_cache_ l_update_ = L_update_ l_pos_ = L_pos_ # File: /data/users/bobren/a/pytorch/r1.py:14 in forward, code: pos_item = pos[0].item() # u0 getitem: "i64[][]cpu" = l_pos_[0]; l_pos_ = None item: "Sym(u0)" = getitem.item(); getitem = None # File: /data/users/bobren/a/pytorch/r1.py:15 in forward, code: torch._check(pos_item + seqlen <= max_seq_len) # u0 + 502 <= 507 add: 
"Sym(u0 + 5)" = item + 5 le: "Sym(u0 + 5 <= 512)" = add <= 512; add = None _check = torch._check(le); le = _check = None # File: /data/users/bobren/a/pytorch/r1.py:16 in forward, code: torch._check(pos_item >= 0) ge: "Sym(u0 >= 0)" = item >= 0 _check_1 = torch._check(ge); ge = _check_1 = None # File: /data/users/bobren/a/pytorch/r1.py:17 in forward, code: before = cache.narrow(2, 0, pos_item) before: "f32[1, 4, u0, 64][131072, 32768, 64, 1]cpu" = l_cache_.narrow(2, 0, item); before = None # File: /data/users/bobren/a/pytorch/r1.py:21 in forward, code: after = cache.narrow(2, (pos_item + seqlen), (max_seq_len - pos_item - seqlen)) add_1: "Sym(u0 + 5)" = item + 5 sub: "Sym(512 - u0)" = 512 - item; item = None sub_1: "Sym(507 - u0)" = sub - 5; sub = None narrow_1 = l_cache_.narrow(2, add_1, sub_1); l_cache_ = add_1 = sub_1 = narrow_1 = None Traceback (most recent call last): File "/data/users/bobren/a/pytorch/torch/_dynamo/utils.py", line 3075, in run_node return getattr(args[0], node.target)(args[1:], kwargs) File "/data/users/bobren/a/pytorch/torch/utils/_stats.py", line 27, in wrapper return fn(args, *kwargs) File "/data/users/bobren/a/pytorch/torch/_subclasses/fake_tensor.py", line 1267, in __torch_dispatch__ return self.dispatch(func, types, args, kwargs) File "/data/users/bobren/a/pytorch/torch/_subclasses/fake_tensor.py", line 1808, in dispatch return self._cached_dispatch_impl(func, types, args, kwargs) File "/data/users/bobren/a/pytorch/torch/_subclasses/fake_tensor.py", line 1369, in _cached_dispatch_impl output = self._dispatch_impl(func, types, args, kwargs) File "/data/users/bobren/a/pytorch/torch/_subclasses/fake_tensor.py", line 2282, in _dispatch_impl decomposition_table[func](args, *kwargs) File "/data/users/bobren/a/pytorch/torch/_decomp/decompositions.py", line 759, in slice_forward return self.as_strided(sizes, strides, storage_offset) File "/data/users/bobren/a/pytorch/torch/utils/_stats.py", line 27, in wrapper return fn(args, *kwargs) File 
"/data/users/bobren/a/pytorch/torch/_subclasses/fake_tensor.py", line 1267, in __torch_dispatch__ return self.dispatch(func, types, args, kwargs) File "/data/users/bobren/a/pytorch/torch/_subclasses/fake_tensor.py", line 1808, in dispatch return self._cached_dispatch_impl(func, types, args, kwargs) File "/data/users/bobren/a/pytorch/torch/_subclasses/fake_tensor.py", line 1370, in _cached_dispatch_impl entry = self._make_cache_entry(state, key, func, args, kwargs, output) File "/data/users/bobren/a/pytorch/torch/_subclasses/fake_tensor.py", line 1640, in _make_cache_entry output_info = self._get_output_info_for_cache_entry( File "/data/users/bobren/a/pytorch/torch/_subclasses/fake_tensor.py", line 1583, in _get_output_info_for_cache_entry synth_output = self._output_from_cache_entry( File "/data/users/bobren/a/pytorch/torch/_subclasses/fake_tensor.py", line 1738, in _output_from_cache_entry return self._get_output_tensor_from_cache_entry( File "/data/users/bobren/a/pytorch/torch/_subclasses/fake_tensor.py", line 1709, in _get_output_tensor_from_cache_entry empty.set_(storage, storage_offset, shape, stride) File "/data/users/bobren/a/pytorch/torch/fx/experimental/sym_node.py", line 564, in guard_size_oblivious r = self.shape_env.evaluate_expr( File "/data/users/bobren/a/pytorch/torch/fx/experimental/recording.py", line 263, in wrapper return retlog(fn(args, **kwargs)) File "/data/users/bobren/a/pytorch/torch/fx/experimental/symbolic_shapes.py", line 6468, in evaluate_expr return self._evaluate_expr( File "/data/users/bobren/a/pytorch/torch/fx/experimental/symbolic_shapes.py", line 6658, in _evaluate_expr raise self._make_data_dependent_error( torch.fx.experimental.symbolic_shapes.GuardOnDataDependentSymNode: Could not guard on data-dependent expression Ne(507 - u0, 1) (unhinted: Ne(507 - u0, 1)). 
(Size-like symbols: u0) Caused by: after = cache.narrow(2, (pos_item + seqlen), (max_seq_len - pos_item - seqlen)) # r1.py:21 in forward (utils/_stats.py:27 in wrapper) For more information, run with TORCH_LOGS="dynamic" For extended logs when we create symbols, also add TORCHDYNAMO_EXTENDED_DEBUG_CREATE_SYMBOL="u0" If you suspect the guard was triggered from C++, add TORCHDYNAMO_EXTENDED_DEBUG_CPP=1 For more debugging help, see https://docs.google.com/document/d/1HSuTTVvYH1pTew89Rtpeu84Ht3nQEFTYhAX3Ypa_xJs/edit?usp=sharing``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/146296 Approved by: https://github.com/zou3519 ghstack dependencies: #146298	2025-02-04 19:12:39 +00:00
Aaron Gokaslan	292af3cc89	[BE][Ez]: ISC001 Auto concatenate implicit one line strings (#146408 ) Apply ruff rule about implicit string concatenation, this autofixes strings that are all the same type and on the same line. These lines are broken up likely as the result of autoformatters in the past. All fixes are automated using the autofixes in ISC001. Pull Request resolved: https://github.com/pytorch/pytorch/pull/146408 Approved by: https://github.com/justinchuby, https://github.com/janeyx99	2025-02-04 19:07:04 +00:00
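To illustrate the kind of rewrite the ISC001 entry above describes (the variable names and the message text are made up for this sketch): Python joins adjacent string literals at compile time, so merging two literals that sit on one line into a single literal leaves the value unchanged.

```python
# Two adjacent literals on one line, typically left behind when an
# autoformatter re-joined a string that used to span several lines.
implicit = "expected a tensor " "but got a list"

# The single-literal form that an autofix would be expected to produce.
merged = "expected a tensor but got a list"

same_value = implicit == merged
print(same_value)
```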
rzou	f38a2ea0d4	[Dynamo] Better unsupported message for Fake Tensor Exception (#146357 ) I cannot repro this. But this line shows up in internal logs, and I want to know what the exception is and the context inside it. All of the exceptions_allowed_to_be_fallback are dataclasses, so they should print nicely. Test Plan: - code reading Pull Request resolved: https://github.com/pytorch/pytorch/pull/146357 Approved by: https://github.com/williamwen42	2025-02-04 18:52:11 +00:00
Brian Hirsh	e68f5087d8	update _unsafe_set_version_counter to accept lists of tensors (#137921 ) See the comment [here](https://github.com/pytorch/pytorch/issues/132014#issuecomment-2379547400) (cc @H-Huang @awgu @kwen2501 @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @c-p-i-o @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang @aakhundov @XilunWu @rec) - this PR updates `_unsafe_set_version_counter` to accept a list of tensors, for overhead-sensitive users (e.g. distributed) who need to hide VC bumps from autograd on a large list of tensors without wanting to suffer the overhead of going from python->C++ separately for every tensor in the list. I left the binding in pybind, and used a `std::vector`. If we really need to optimize overhead even further, we could write a manual cpython binding. I use this updated API in the next PR to fix FSDP2, so that it properly hides the VC of all `all_gather_buffer` tensors in its call to `split_with_sizes_copy.out(all_gather_buffers)`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/137921 Approved by: https://github.com/awgu, https://github.com/albanD	2025-02-04 04:51:11 +00:00
Animesh Jain	487400f47f	[dynamo] Support functools.partial variables through inspect.signature (#146339 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/146339 Approved by: https://github.com/jansel ghstack dependencies: #146322, #146116	2025-02-04 04:39:39 +00:00
Animesh Jain	5f53889850	[dynamo][builtin-skipfiles-cleanup] Remove inspect (#146116 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/146116 Approved by: https://github.com/williamwen42, https://github.com/zou3519, https://github.com/jansel ghstack dependencies: #146322	2025-02-04 03:36:07 +00:00
Animesh Jain	0da07a6d1d	[dynamo][skip-function] Add missing unimplemented line (#146322 ) This is a missing line from the merged PR in the stack below. Let's try to get this in quickly. Pull Request resolved: https://github.com/pytorch/pytorch/pull/146322 Approved by: https://github.com/StrongerXi, https://github.com/jansel, https://github.com/mlazos	2025-02-03 22:11:55 +00:00
Yanbo Liang	15e12d5ec3	[Trace PyDispatcher] Support temporarily_pop_interpreter_stack ctx manager (#146271 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/146271 Approved by: https://github.com/zou3519 ghstack dependencies: #146270	2025-02-03 21:47:54 +00:00
Simon Fan	1d4adf4e1f	[dynamo] log recompile reason to dynamo_compile (#146117 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/146117 Approved by: https://github.com/bobrenjc93	2025-02-03 21:04:04 +00:00
Harmen Stoppels	01554c7b5a	fix incorrect literal strings / accidental tuples (#146037 ) * `expr,` is short for `(expr,)` * literal strings over multiple lines need to escape the newline `\` or use `(...)`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/146037 Approved by: https://github.com/Skylion007	2025-02-03 15:08:11 +00:00
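The two pitfalls named in the entry above can be reproduced in plain Python. This is only an illustrative sketch with hypothetical variable names, not code from the PR itself: a trailing comma silently builds a one-element tuple, and a string literal on its own line is a separate no-op statement rather than a continuation.

```python
# Pitfall 1: `expr,` is short for `(expr,)`.
timeout = 30,
timeout_is_tuple = timeout == (30,)

# Pitfall 2: the second literal is its own expression statement,
# so it is never appended to `broken`.
broken = "first part, "
"second part"

# Fix: wrap the literals in parentheses so they concatenate.
fixed = (
    "first part, "
    "second part"
)

print(timeout_is_tuple, repr(broken), repr(fixed))
```

Escaping the newline with a trailing backslash is the alternative the entry mentions; the parenthesized form is shown here because it does not depend on invisible end-of-line characters.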
Animesh Jain	fa48757180	[dynamo] misc fixes for inspect (#146283 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/146283 Approved by: https://github.com/jansel ghstack dependencies: #146075	2025-02-03 04:26:10 +00:00
Animesh Jain	c0ec2e0a0d	[dynamo][functions] Improve getattr on functions (#146075 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/146075 Approved by: https://github.com/jansel	2025-02-03 02:01:57 +00:00
Yanbo Liang	511d0dd558	[Dynamo][Trace PyDispatcher] Support calling id function over class (#146269 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/146269 Approved by: https://github.com/anijain2305	2025-02-02 22:29:30 +00:00
Animesh Jain	cef856faa9	[dynamo][enum] Trace through enum.py for enum construction (#146070 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/146070 Approved by: https://github.com/jansel ghstack dependencies: #146062, #146198, #146258, #146214	2025-02-02 03:12:36 +00:00
Animesh Jain	31fb691782	[dynamo] Graph break on tensor.retain_grad (#146214 ) Fixes https://github.com/pytorch/pytorch/issues/146212 Pull Request resolved: https://github.com/pytorch/pytorch/pull/146214 Approved by: https://github.com/jansel ghstack dependencies: #146062, #146198, #146258	2025-02-02 03:12:36 +00:00
Animesh Jain	529eb8d558	[dynamo] Add return to python_type (#146258 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/146258 Approved by: https://github.com/jansel ghstack dependencies: #146062, #146198	2025-02-02 03:12:36 +00:00
Nikita Shulga	e56dcf2772	[CPUInductor] Fix SVE256 detection (#146207 ) This PR removes `torch.cpu._is_arm_sve_supported()` and replaces it with the stable `torch.backends.cpu.get_cpu_capability()`. I should have reviewed https://github.com/pytorch/pytorch/pull/134672 more thoroughly, because it introduced a duplicate but slightly different API for detecting CPU architectures, which resulted in runtime crashes on systems that support SVE128 rather than SVE256. Fixes https://github.com/pytorch/pytorch/issues/145441 Pull Request resolved: https://github.com/pytorch/pytorch/pull/146207 Approved by: https://github.com/angelayi	2025-02-01 18:51:34 +00:00
Shangdi Yu	a97a906dd9	Add "//caffe2:libtorch" to minifier TARGET file (#146203 ) Summary: as title. To avoid errors like "undefined symbol: aoti_torch_device_type_cpu" when compiling minifier_launcher.py Test Plan: CI Differential Revision: D68978430 Pull Request resolved: https://github.com/pytorch/pytorch/pull/146203 Approved by: https://github.com/desertfire	2025-02-01 05:37:23 +00:00
Animesh Jain	1de41e6918	[dynamo][exceptions][3.10] Clean symbolic stack on exception handling (#146198 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/146198 Approved by: https://github.com/williamwen42 ghstack dependencies: #146062	2025-02-01 02:51:44 +00:00
Animesh Jain	f25f1163dc	[dynamo] Support frozenset({..}).__contains__ (#146062 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/146062 Approved by: https://github.com/Skylion007, https://github.com/jansel	2025-01-31 23:22:58 +00:00
Animesh Jain	781aceee9c	[dynamo] Revert abc change due to internal failures (#146177 ) xref - https://www.internalfb.com/tasks/?t=191383874 Pull Request resolved: https://github.com/pytorch/pytorch/pull/146177 Approved by: https://github.com/StrongerXi ghstack dependencies: #146141	2025-01-31 21:28:06 +00:00
William Wen	49df8de8be	[dynamo] disable eval_frame callback in _TorchDynamoContext __enter__/__exit__ (#145981 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/145981 Approved by: https://github.com/jansel	2025-01-31 20:40:59 +00:00
Animesh Jain	667b94d1c2	[hotfix][dynamo] Skip linecache due to a flaky issue (#146141 ) A large number of jit + dynamo wrapped tests fail in linecache tracing. We need further debugging. Skipping for now to stem the bleeding. https://github.com/pytorch/pytorch/issues/146076 Pull Request resolved: https://github.com/pytorch/pytorch/pull/146141 Approved by: https://github.com/StrongerXi	2025-01-31 17:45:06 +00:00
PyTorch MergeBot	f5a61ba0a3	Revert "inductor: Don't throw an internal error when a nn.module is missing a attribute (#145122 )" This reverts commit `d100e9ae74`. Reverted https://github.com/pytorch/pytorch/pull/145122 on behalf of https://github.com/ZainRizvi due to Sorry but this is failing internally. See D68924977 for details ([comment](https://github.com/pytorch/pytorch/pull/145122#issuecomment-2627880860))	2025-01-31 17:39:23 +00:00
Aaron Orenstein	57d8278ab9	pickler for GraphModule (#141659 ) Pickling GraphModule needs some special handling for wrapping things that normally can't be pickled - but async compile needs to pass them across a wire so we need to be able to serialize it - add some helpers to enable that. Differential Revision: [D68921318](https://our.internmc.facebook.com/intern/diff/D68921318) Pull Request resolved: https://github.com/pytorch/pytorch/pull/141659 Approved by: https://github.com/jamesjwu	2025-01-31 05:34:28 +00:00
Pian Pawakapan	ffb424eab6	[dynamo/export] call local_scalar_dense when full() value is scalar tensor (#144999 ) Fixes https://github.com/pytorch/pytorch/issues/144907 ``` class Foo(torch.nn.Module): def forward(self, val): return torch.full((80, 2), val, dtype=torch.float32) export(Foo(), args=(torch.tensor(1),)) ``` When we have a `torch.full` call like above, where the fill value is a scalar Tensor and not a scalar value, the FX graph from `_dynamo.export()` contains a single node: the full op. We run into a `PendingUnbackedSymbolNotFound` error, because the `item()` call is implicit; the UnbackedSymInt is extracted but goes directly into the data of the output tensor value, and we're then unable to locate it when we try to compute unbacked bindings. On the other hand, non-strict export doesn't face this, because an explicit `item()`, or `local_scalar_dense` node is inserted, and the unbacked binding is directly the example value of that node. This adds a dynamo handler to imitate what happens in non-strict. Pull Request resolved: https://github.com/pytorch/pytorch/pull/144999 Approved by: https://github.com/angelayi	2025-01-31 02:45:43 +00:00
Oguz Ulgen	ccd27e8129	Turn on fx graph cache and automatic dynamic pgo local caches in fbcode (#146065 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/146065 Approved by: https://github.com/jamesjwu	2025-01-31 01:11:48 +00:00
Animesh Jain	1e3d1738a4	[dynamo][polyfills]Support getrecursionlimit (#145989 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/145989 Approved by: https://github.com/StrongerXi, https://github.com/jansel ghstack dependencies: #145986, #145987, #145994	2025-01-31 00:47:31 +00:00
Animesh Jain	e7bb608d02	[dynamo][dicts] Support construction of types.MappingProxyType (#145994 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/145994 Approved by: https://github.com/StrongerXi, https://github.com/jansel ghstack dependencies: #145986, #145987	2025-01-31 00:47:31 +00:00
Animesh Jain	4665bc2cc0	[dynamo][functions] Support `id` on function (#145987 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/145987 Approved by: https://github.com/StrongerXi, https://github.com/jansel, https://github.com/mlazos ghstack dependencies: #145986	2025-01-31 00:47:23 +00:00
Animesh Jain	56307dc370	[dynamo][dicts] Raise exception on pop (#145986 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/145986 Approved by: https://github.com/Skylion007, https://github.com/williamwen42, https://github.com/StrongerXi, https://github.com/jansel	2025-01-31 00:47:13 +00:00

4261 Commits