pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Author	SHA1	Message	Date
Aaron Orenstein	fbb076cc45	Fix call to create_load_global (#145553 ) There is no version of create_load_global() that takes three parameters - any use of this function will fail. I think this is probably the correct fix. Pull Request resolved: https://github.com/pytorch/pytorch/pull/145553 Approved by: https://github.com/anijain2305 ghstack dependencies: #145551, #145552	2025-01-30 22:21:40 +00:00
Aaron Orenstein	ccbbc88bbb	Turn on mypy for _dynamo/variables/builtin.py (#145552 ) The fact that mypy errors were ignored was hiding several bugs in builtin.py (for example the previous diff's incorrect override and use of `call_getattr`) Pull Request resolved: https://github.com/pytorch/pytorch/pull/145552 Approved by: https://github.com/anijain2305, https://github.com/Skylion007 ghstack dependencies: #145551	2025-01-30 22:21:32 +00:00
Aaron Orenstein	f3120f6d26	Remove incorrect BuiltinVariable.call_hasattr() (#145551 ) BuiltinVariable.call_hasattr() overrides the base class - but actually behaves differently. The base is `obj.call_hasattr(tx, attr)` but BuiltinVariable's version is `<unused>.call_hasattr(tx, obj, attr)`. The BuiltinVariable version is used as a pattern from `call_self_handler()` for `BuiltinVariable(hasattr)`. I think the other version is just used for internal `hasattr(obj, name)` so I renamed that one to `call_obj_hasattr`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/145551 Approved by: https://github.com/anijain2305	2025-01-30 22:21:19 +00:00
clr	d100e9ae74	inductor: Don't throw an internal error when a nn.module is missing a attribute (#145122 ) If an nn.module getattr call throws, we should make sure that we don't crash with an internal error. Note that I couldn't figure out how to test this, so advice would be awesome. I have my best case attempt at https://github.com/pytorch/pytorch/pull/145799, but it doesn't seem to reproduce the crash. Pull Request resolved: https://github.com/pytorch/pytorch/pull/145122 Approved by: https://github.com/jansel	2025-01-30 21:55:29 +00:00
Yidi Wu	7e7341bddd	[hop] fix unbacked_bindings meta for while_loop (#143559 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/143559 Approved by: https://github.com/zou3519	2025-01-30 21:33:09 +00:00
Thomas Bohnstingl	9f9904172d	[scan] scan dim handling in user-facing scan() (#145179 ) This PR introduces the capability that the scan dim is handled in the user facing scan() call. Internally, the scan dim is always shifted to dim 0 and then the scan is performed over that dim. This is a follow-up PR from https://github.com/bohnstingl/pytorch/pull/3 Pull Request resolved: https://github.com/pytorch/pytorch/pull/145179 Approved by: https://github.com/ydwu4	2025-01-30 21:09:07 +00:00
Yidi Wu	a3698ebd5c	[while_loop] specialize when cond_fn return constants (#144515 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/144515 Approved by: https://github.com/zou3519	2025-01-30 19:02:34 +00:00
IvanKobzarev	894ef8c1e3	[torchbench] Inductor freezing bfloat16 conv folding needs high tolerance (#145623 ) Issue: https://github.com/pytorch/pytorch/issues/144888 Torchbench of the timm lcnet_050 model fails the accuracy check with `--freezing` `--inference` `--bfloat16`: `res_error==0.12`. With inductor convolution constant folding turned off: `res_error==0.016`. `float16 error ~ 0.00669` `float16 without conv folding ~ 0.0018` Convolution folding increases the error by almost an order of magnitude. I think we should revisit this and try to improve the accuracy of conv folding, for example by doing conv folding at compilation time with float64. At the moment I am adding counters to identify whether convolution folding happened, and in the case of bfloat16 with conv folding, increasing the multiplier to the max level (10) to pass the accuracy test. Pull Request resolved: https://github.com/pytorch/pytorch/pull/145623 Approved by: https://github.com/eellison	2025-01-30 12:46:35 +00:00
PyTorch MergeBot	1185b81c51	Revert "[dynamo] Use polyfill to implement comparison operators (#144485 )" This reverts commit `d1f82de2bf`. Reverted https://github.com/pytorch/pytorch/pull/144485 on behalf of https://github.com/huydhn due to This seems to break dynamo tests in trunk after landing ([comment](https://github.com/pytorch/pytorch/pull/144485#issuecomment-2622893294))	2025-01-29 21:30:42 +00:00
Animesh Jain	d1f82de2bf	[dynamo] Use polyfill to implement comparison operators (#144485 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/144485 Approved by: https://github.com/jansel	2025-01-29 17:37:40 +00:00
Animesh Jain	4499d60d56	[dynamo][builin-skipfiles-cleanup] Remove types (#145909 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/145909 Approved by: https://github.com/zou3519 ghstack dependencies: #145856, #145875, #145878, #145892	2025-01-29 16:47:02 +00:00
Animesh Jain	3f77002b96	[dynamo][builtin-skipfiles-cleanup] remove abc, enum, importlib (#145892 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/145892 Approved by: https://github.com/williamwen42, https://github.com/StrongerXi ghstack dependencies: #145856, #145875, #145878	2025-01-29 05:30:06 +00:00
Animesh Jain	236793684d	[dynamo][builtin-skipfiles-cleanup] Remove threading, _collections_abc, _weakrefset, threading (#145878 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/145878 Approved by: https://github.com/williamwen42, https://github.com/StrongerXi ghstack dependencies: #145856, #145875	2025-01-29 05:30:06 +00:00
Animesh Jain	a479656cd2	[dynamo][builtin-skipfiles-removal] Remove logging (#145875 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/145875 Approved by: https://github.com/williamwen42 ghstack dependencies: #145856	2025-01-29 05:29:58 +00:00
Animesh Jain	64ee57847b	[dynamo][builtin-skipfiles-cleanup] Remove some builtins (#145856 ) [dynamo][builtin-skipfiles-cleanup] Remove more builtins Pull Request resolved: https://github.com/pytorch/pytorch/pull/145856 Approved by: https://github.com/zou3519	2025-01-29 05:29:47 +00:00
Thomas Bohnstingl	82859f6185	[associative_scan] scan dim handling in user-facing associative_scan() (#139864 ) This PR implements the user-facing dim change, i.e., that the scan dim provided by the user is always moved to dim 0 and then the associative_scan operation always operates on dim 0. Pull Request resolved: https://github.com/pytorch/pytorch/pull/139864 Approved by: https://github.com/ydwu4	2025-01-28 23:58:10 +00:00
PyTorch MergeBot	3481c2aec4	Revert "[dynamo] save/restore system random state more carefully (#145750 )" This reverts commit `e3d3f2b22e`. Reverted https://github.com/pytorch/pytorch/pull/145750 on behalf of https://github.com/eellison due to bisected perf regression ([comment](https://github.com/pytorch/pytorch/pull/145750#issuecomment-2620028414))	2025-01-28 20:51:07 +00:00
Ryan Guo	eaff13275e	[dynamo] Properly branch on an unspecialized NN module (#145786 ) User-defined NN modules might have their own `__len__` or `__bool__` methods which Dynamo needs to trace through, so that side effects and/or reads to buffered writes are properly handled. This patch removes the special `UnspecializedNNModuleVariable` branch in Dynamo's branch handling, and lets these cases fall into the `UserDefinedObjectVariable` branch, which handles the aforementioned cases correctly. Fixes #145284. Pull Request resolved: https://github.com/pytorch/pytorch/pull/145786 Approved by: https://github.com/williamwen42	2025-01-28 19:45:17 +00:00
Ryan Guo	eaec97ab1f	[dynamo] Properly prune dead input cell object (#145781 ) This patch models input cell object as "newly created" rather than "pre-existing" python object (see added documentation for why this actually captures the semantics more accurately). This enables the `SideEffects.prune_dead_object_new` algorithm to prune away writes to input cell objects which are no longer relevant; this didn't happen prior to this patch because we modelled them as pre-existing objects, which forces us to codegen their attribute mutations. Fixes #145564. Pull Request resolved: https://github.com/pytorch/pytorch/pull/145781 Approved by: https://github.com/williamwen42, https://github.com/jansel	2025-01-28 18:28:13 +00:00
Animesh Jain	80a0412b76	[dynamo][builtin-skipfiles-cleanup] Remove posixpath (#145828 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/145828 Approved by: https://github.com/zou3519 ghstack dependencies: #145744, #145753, #145826	2025-01-28 16:14:34 +00:00
Animesh Jain	6824a4a75d	[dynamo][builtin-skipfiles-cleanup] Remove re (#145826 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/145826 Approved by: https://github.com/zou3519 ghstack dependencies: #145744, #145753	2025-01-28 16:14:34 +00:00
Animesh Jain	4307e6c008	[dynamo][builtin-skipfile-cleanup] Remove signal (#145753 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/145753 Approved by: https://github.com/zou3519 ghstack dependencies: #145744	2025-01-28 16:14:23 +00:00
Animesh Jain	5c5306e8bc	[dynamo][builtin-skiplist-cleanup] Remove weakref (#145744 ) WeakKeyDictionary already works very nicely with the UserDefinedObject Variable Tracker. Pull Request resolved: https://github.com/pytorch/pytorch/pull/145744 Approved by: https://github.com/jansel	2025-01-28 07:55:12 +00:00
Burak Turk	01a4d86b31	add pt2 callbacks for backward pass and prevent duplicate callbacks (#145732 ) Summary: This change adds callbacks for lazy backwards compilation while preventing duplicate callbacks from being fired. Differential Revision: D68577593 Pull Request resolved: https://github.com/pytorch/pytorch/pull/145732 Approved by: https://github.com/mlazos	2025-01-28 03:50:02 +00:00
William Wen	e3d3f2b22e	[dynamo] save/restore system random state more carefully (#145750 ) Reattempt of https://github.com/pytorch/pytorch/pull/145435 since the state of the linked internal diff appears to be messed up. Note: I have verified that the previously failing internal tests now pass internally. Differential Revision: [D68723334](https://our.internmc.facebook.com/intern/diff/D68723334) Pull Request resolved: https://github.com/pytorch/pytorch/pull/145750 Approved by: https://github.com/StrongerXi	2025-01-28 01:34:13 +00:00
Ryan Guo	5a4d959cdb	[dynamo] Properly model torch profiler context objects (#145537 ) Prior to this patch, Dynamo conveniently modelled torch profiler context objects (e.g., `torch.profiler.profile`) as `NullContextVariable` because `torch.compile` ignores the effect of these profiler contexts. However, the semantics of these profiler contexts diverge from `contextlib.nullcontext` in the `__enter__` function, where the former returns `self` and the latter returns `None`. This causes a subtle error as observed in #125021. This patch adds back a `ProfilerContextVariable`, which addresses the aforementioned semantic discrepancy. Fixes #125021. Pull Request resolved: https://github.com/pytorch/pytorch/pull/145537 Approved by: https://github.com/zou3519, https://github.com/williamwen42	2025-01-28 00:03:36 +00:00
Colin L. Rice	c1161957a4	inductor_config_logging: Don't drop keys (#144700 ) This bit me while I was trying to debug some trace issues. In general this config is already quite large when dumping, so adding more fields doesn't make it significantly worse. Also a number of the items we are type checking for (except the test configs), don't even show up. Primarily this will help us when debugging rocm, halide, and trace configs. Pull Request resolved: https://github.com/pytorch/pytorch/pull/144700 Approved by: https://github.com/ezyang	2025-01-27 23:47:25 +00:00
PyTorch MergeBot	2de53b3b65	Revert "pickler for GraphModule (#141659 )" This reverts commit `c6ad08357b`. Reverted https://github.com/pytorch/pytorch/pull/141659 on behalf of https://github.com/ZainRizvi due to Sorry but this is breaking internally, please take a look at D68694181 for more details. ([comment](https://github.com/pytorch/pytorch/pull/141659#issuecomment-2617045120))	2025-01-27 22:39:30 +00:00
Animesh Jain	993b229665	[dynamo][dicts] Fix dict.__new__ bug (#145723 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/145723 Approved by: https://github.com/jansel, https://github.com/StrongerXi ghstack dependencies: #145519, #145547, #145558	2025-01-27 21:42:43 +00:00
Animesh Jain	7e1c7253e9	[dynamo][builtin-skipfile-cleanup] Support tuple.__new__ (#145558 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/145558 Approved by: https://github.com/jansel, https://github.com/StrongerXi ghstack dependencies: #145519, #145547	2025-01-27 21:42:43 +00:00
Ryan Guo	bfaf76bfc6	[dynamo] clear out traced frames at the start of `test_log_traced_frames` (#145640 ) The test was being flaky in CI, and this patch fixes it. Fixes #137461. Pull Request resolved: https://github.com/pytorch/pytorch/pull/145640 Approved by: https://github.com/williamwen42	2025-01-27 20:49:59 +00:00
Randolf Scholz	835e770bad	Use `typing.IO[bytes]` instead of `io.BytesIO` in annotations (#144994 ) Fixes #144976 Using approach ① `IO[bytes]`, but could also try with a protocol. ## Notes: - moved `torch.serialization.FILE_LIKE` to `torch.types.FileLike` - Use `FileLike` annotation where it makes sense - made sure those functions also support `os.PathLike` - Replaced `isinstance(x, io.BytesIO)` with `isinstance(x, (io.IOBase, IO))` where appropriate. - Replaced `BinaryIO` with `IO[bytes]` (the two ABCs are almost identical, the only difference is that `BinaryIO` allows `bytearray` input to `write`, whereas `IO[bytes]` only `bytes`) - needed to make `torch.serialization._opener` generic to avoid LSP violations. - skipped `torch/onnx/verification` for now (functions use `BytesIO.getvalue` which is not part of the `IO[bytes]` ABC, but it kind of seems that this is redundant, as e.g. `onnx.load` supports `str | PathLike[str] | IO[bytes]` directly... Pull Request resolved: https://github.com/pytorch/pytorch/pull/144994 Approved by: https://github.com/ezyang, https://github.com/Skylion007	2025-01-27 18:08:07 +00:00
rzou	ea141d8134	functional compiled autograd (#144707 ) This PR squashes together the following commits: https://github.com/pytorch/pytorch/pull/144115 https://github.com/pytorch/pytorch/pull/143417 https://github.com/pytorch/pytorch/pull/143405 https://github.com/pytorch/pytorch/pull/143387 https://github.com/pytorch/pytorch/pull/143304 https://github.com/pytorch/pytorch/pull/143296 This is a refactor of compiled autograd to use "functional autograd". The end goal is that it gets compiled autograd's initial capture to stop specializing on Tensor metadata, therefore allowing compiled autograd to better handle Tensor subclasses. For more information, please read the commit messages for each PR. Pull Request resolved: https://github.com/pytorch/pytorch/pull/144707 Approved by: https://github.com/bdhirsh, https://github.com/xmfan, https://github.com/jansel	2025-01-27 05:20:56 +00:00
Aaron Orenstein	c6ad08357b	pickler for GraphModule (#141659 ) Pickling GraphModule needs some special handling for wrapping things that normally can't be pickled - but async compile needs to pass them across a wire so we need to be able to serialize it - add some helpers to enable that. Pull Request resolved: https://github.com/pytorch/pytorch/pull/141659 Approved by: https://github.com/jamesjwu	2025-01-26 19:29:13 +00:00
Edward Z. Yang	90448f0128	Output of nonzero is transposed, fix fake tensor (#144695 ) Needs this companion executorch PR: https://github.com/pytorch/executorch/pull/7657 Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/144695 Approved by: https://github.com/bobrenjc93, https://github.com/albanD	2025-01-26 01:07:22 +00:00
Xuehai Pan	0afdee4c39	[dynamo] raise IndexError when inserting into a full `deque` (#139379 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/139379 Approved by: https://github.com/jansel	2025-01-25 18:04:49 +00:00
Yuanhao Ji	cc1ecead07	[Dynamo] Allow `format()` to handle int (#144956 ) Fixes #144830 Pull Request resolved: https://github.com/pytorch/pytorch/pull/144956 Approved by: https://github.com/jansel	2025-01-25 04:12:45 +00:00
Animesh Jain	ef60de07a0	[dynamo] Log guard latency (#145132 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/145132 Approved by: https://github.com/ezyang ghstack dependencies: #145509	2025-01-25 03:01:18 +00:00
Shangdi Yu	4cc5e880f9	Add accuracy issue support in AOTI Minifier (#145539 ) Summary: Add three more repro levels for AOTI minifier (level 2 already exists). They are the same as the existing dynamo minifier repro levels. Now AOTI minifier can minify and repro programs that have numerical accuracy issues as well. 1: Dumps the original graph out to repro.py if compilation fails 2: Dumps a minifier_launcher.py if aoti fails. 3: Always dumps a minifier_launcher.py. Good for segfaults. 4: Dumps a minifier_launcher.py if the accuracy fails. Refactor AOTI minifier unit tests to be cleaner and better re-use the existing minifier testing code. We do not need to manually patch {"aot_inductor.dump_aoti_minifier": True} to each test now, this config is generated in the test code. Differential Revision: D68294638 Pull Request resolved: https://github.com/pytorch/pytorch/pull/145539 Approved by: https://github.com/desertfire	2025-01-24 23:07:19 +00:00
Aishwarya Sivaraman	457facf7e2	[caffe2] Use the manifold cache backend as the default (#144773 ) Test Plan: CI D68155591 Pull Request resolved: https://github.com/pytorch/pytorch/pull/144773 Approved by: https://github.com/izaitsevfb	2025-01-24 19:48:34 +00:00
Michael Lazos	8eea554332	[Dynamo] Fix names collisions with foreach decomps (#145479 ) Fixes https://github.com/pytorch/pytorch/issues/138698 Pull Request resolved: https://github.com/pytorch/pytorch/pull/145479 Approved by: https://github.com/yanboliang	2025-01-24 18:46:58 +00:00
Animesh Jain	74cfb4f364	[dynamo][refactor] Move collections.namedtuple out of SkipFunctionVariable (#145547 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/145547 Approved by: https://github.com/zou3519 ghstack dependencies: #145519	2025-01-24 17:39:33 +00:00
Animesh Jain	9132f4b7ce	[dynamo][guards] Log guard latency to tlparse (#145509 ) Example ![image](https://github.com/user-attachments/assets/1503ee59-ff35-46d9-9b61-16352a4a30e2) Pull Request resolved: https://github.com/pytorch/pytorch/pull/145509 Approved by: https://github.com/ezyang	2025-01-24 16:33:29 +00:00
Animesh Jain	53fc921ce2	[dynamo][trace-rules-cleanup] Remove functools from the Builtins skiplist (#145519 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/145519 Approved by: https://github.com/yanboliang, https://github.com/zou3519	2025-01-24 06:02:03 +00:00
PyTorch MergeBot	6f60c65a3a	Revert "[dynamo] Log guard latency (#145132 )" This reverts commit `0a310d7388`. Reverted https://github.com/pytorch/pytorch/pull/145132 on behalf of https://github.com/anijain2305 due to CI failures observed after PR was merged ([comment](https://github.com/pytorch/pytorch/pull/145132#issuecomment-2611268421))	2025-01-24 00:11:50 +00:00
PyTorch MergeBot	6dd8283381	Revert "[compiled autograd] Proxy opaque nodes for built-in autograd nodes (#143296 )" This reverts commit `5531fafffe`. Reverted https://github.com/pytorch/pytorch/pull/143296 on behalf of https://github.com/izaitsevfb due to breaking internal tests T213390054 ([comment](https://github.com/pytorch/pytorch/pull/143296#issuecomment-2611224926))	2025-01-23 23:34:13 +00:00
PyTorch MergeBot	9553301ade	Revert "[compiled autograd] Proxy nodes for user-defined C++ torch::autograd::Function (#143387 )" This reverts commit `784bb2127c`. Reverted https://github.com/pytorch/pytorch/pull/143387 on behalf of https://github.com/izaitsevfb due to breaking internal tests T213390054 ([comment](https://github.com/pytorch/pytorch/pull/143296#issuecomment-2611224926))	2025-01-23 23:34:13 +00:00
PyTorch MergeBot	16c4f8c395	Revert "[compiled autograd] Always proxy autograd.Function nodes; handle AOT backwards (#143405 )" This reverts commit `ec820fe57c`. Reverted https://github.com/pytorch/pytorch/pull/143405 on behalf of https://github.com/izaitsevfb due to breaking internal tests T213390054 ([comment](https://github.com/pytorch/pytorch/pull/143296#issuecomment-2611224926))	2025-01-23 23:34:13 +00:00
PyTorch MergeBot	3f6cfd0156	Revert "[compiled autograd] stop specializing on metadata during initial trace (#143417 )" This reverts commit `99dd1bf1b9`. Reverted https://github.com/pytorch/pytorch/pull/143417 on behalf of https://github.com/izaitsevfb due to breaking internal tests T213390054 ([comment](https://github.com/pytorch/pytorch/pull/143296#issuecomment-2611224926))	2025-01-23 23:34:12 +00:00
PyTorch MergeBot	ab082863a1	Revert "[compiled autograd] support Tensor Subclasses in AOTBackward (#144115 )" This reverts commit `082c28c3c6`. Reverted https://github.com/pytorch/pytorch/pull/144115 on behalf of https://github.com/izaitsevfb due to breaking internal tests T213390054 ([comment](https://github.com/pytorch/pytorch/pull/143296#issuecomment-2611224926))	2025-01-23 23:34:12 +00:00
Animesh Jain	0a310d7388	[dynamo] Log guard latency (#145132 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/145132 Approved by: https://github.com/ezyang ghstack dependencies: #145351, #145420	2025-01-23 23:30:07 +00:00
Nikhil Gupta	41b38f755c	Revert "Reverting the PR adding Kleidiai-based int4 kernels (#145392 )" (#145505 ) https://github.com/pytorch/pytorch/pull/134124 was reverted by https://github.com/pytorch/pytorch/pull/145392 due to KleidiAI clone issue. 1. This reverts commit `0940eb6d44` (https://github.com/pytorch/pytorch/pull/145392) and fixes the KleidiAI mirror issue. 2. KleidiAI is now cloned from the GitHub mirror instead of the Arm GitLab. Change-Id: I7d6eee7214cd117d3057d615936fcc3ee6052fa2 Fixes https://github.com/pytorch/pytorch/issues/145273 Pull Request resolved: https://github.com/pytorch/pytorch/pull/145505 Approved by: https://github.com/malfet	2025-01-23 18:50:59 +00:00
Animesh Jain	015c6d6fdb	[dynamo][guards] Turn on profiling of guard manager (#145420 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/145420 Approved by: https://github.com/ezyang ghstack dependencies: #145351	2025-01-23 18:17:43 +00:00
Animesh Jain	c58198184b	[dynamo][dicts] Insert LENTGH guard on an if condition on dict (#145432 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/145432 Approved by: https://github.com/williamwen42, https://github.com/jansel	2025-01-23 04:40:56 +00:00
Animesh Jain	5a18f1e1eb	[dynamo] Support fx map_aggregate (#145351 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/145351 Approved by: https://github.com/zou3519	2025-01-23 03:19:30 +00:00
Li Yu (ads)	e6a84be3d3	[PyTorch] Add backend aot_eager_decomp_partition_with_mode (#143250 ) Summary: ## Why To make it possible to run torch dispatch mode inside compiled modules. This is to enable running MemoryTrackerMode (in next diff) to collect memory usage of compiled modules. ## What Add a backend aot_eager_decomp_partition_with_mode. Add an enable_log to the backend to control the compilation logging (which can be very verbose and slow the run of mode) Test Plan: unittest E2e tested in the next diff which shows the memory read from the mode passed to this backend is very close to the actual job's memory snapshot. Differential Revision: D67227144 Pull Request resolved: https://github.com/pytorch/pytorch/pull/143250 Approved by: https://github.com/bdhirsh	2025-01-22 23:20:59 +00:00
PyTorch MergeBot	f0a210bf5d	Revert "Output of nonzero is transposed, fix fake tensor (#144695 )" This reverts commit `693d8c7e94`. Reverted https://github.com/pytorch/pytorch/pull/144695 on behalf of https://github.com/izaitsevfb due to breaking internal tests, see D68461259 ([comment](https://github.com/pytorch/pytorch/pull/144695#issuecomment-2608443589))	2025-01-22 23:04:50 +00:00
rzou	082c28c3c6	[compiled autograd] support Tensor Subclasses in AOTBackward (#144115 ) Compiled autograd's initial trace traces through the AOTBackward epilogue. The Tensor Subclass code is not traceable. This PR changes it so that when we see Tensor Subclass constructors, we proxy nodes for their construction into the graph. Test Plan: - New basic test with TwoTensor - Existing tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/144115 Approved by: https://github.com/jansel, https://github.com/xmfan, https://github.com/bdhirsh ghstack dependencies: #143296, #143304, #143387, #143405, #143417	2025-01-22 21:51:07 +00:00
rzou	99dd1bf1b9	[compiled autograd] stop specializing on metadata during initial trace (#143417 ) The previous PRs built up to this. We change compiled autograd's initial trace to stop baking in metadata. While tracing, we allocate some weirdly shaped tensors that we can put proxies on. The initial trace should not be accessing any metadata of these tensors (it will likely error out if it does because of how weird the shapes are). This involved fixing some various sites where we do specialize on the metadata, like: - we change CopySlices's apply_with_saved to proxy some calls into the graph (this change is fairly hard to split out by itself). - we stop calling InputBuffer::add - we delete the weird metadata from the graph so that no graph passes can make use of it. Test Plan: - tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/143417 Approved by: https://github.com/jansel, https://github.com/xmfan ghstack dependencies: #143296, #143304, #143387, #143405	2025-01-22 21:51:07 +00:00
rzou	ec820fe57c	[compiled autograd] Always proxy autograd.Function nodes; handle AOT backwards (#143405 ) We will always proxy autograd.Function nodes in compiled autograd's initial graph capture (previously there was an option to proxy vs trace into the autograd.Function) We have some requirements for the AOTBackward. Compiled Autograd runs accumulate grad reordering passes on the AOTBackward graph directly after the initial graph capture, so we can't just proxy a single node for it. Instead, we: - proxy the AOTBackward prologue function into the CA graph - copy-paste the AOTBackward graph into the CA graph - trace directly through the epilogue (the traced nodes go into the CA graph). Tracing through the epilogue is safe (assuming no Tensor subclasses) because the only thing the epilogue does is drop some outputs. The Tensor subclass situation was already broken so this doesn't regress anything but this PR sets it up to be fixed (in a followup, where we will proxy "make_subclass" calls into the graph from the epilogue). Test Plan: - existing tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/143405 Approved by: https://github.com/jansel, https://github.com/xmfan ghstack dependencies: #143296, #143304, #143387	2025-01-22 21:50:56 +00:00
rzou	784bb2127c	[compiled autograd] Proxy nodes for user-defined C++ torch::autograd::Function (#143387 ) We define a functional version of a C++ torch::autograd::Function. The functional version reconstructs the ctx object and then calls backward with it. Some more details: - we define how to pack/unpack ctx.saved_data into an IValue. It's a Dict[str, IValue], so it wasn't difficult. - every call to CppNode::apply_with_saved binds a new function to Python. This is because we're unable to reuse a previously bound function for reasons (the schema may change depending on what the user actually puts into their Dict[str, IValue]). Test Plan: - existing tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/143387 Approved by: https://github.com/jansel, https://github.com/xmfan ghstack dependencies: #143296, #143304	2025-01-22 21:50:47 +00:00
rzou	5531fafffe	[compiled autograd] Proxy opaque nodes for built-in autograd nodes (#143296 ) This PR is on the way to getting compiled autograd's initial capture to stop specializing on Tensor metadata. This PR changes compiled autograd's initial capture to proxy an opaque (w.r.t. Dynamo) function into the graph for all built-in codegen'ed autograd nodes and validate_outputs. We changed each codegen'ed apply_with_saved (e.g. MulBackward0::apply_with_saved) to call into Python to proxy a function (compiled_autograd.ops.MulBackward0) into the graph. Then, we use the node's InputMetadata to "guess" at the properties of the output Tensors to create some new FakeTensors. Some details: - MulBackward0::apply_with_saved lives in libtorch_cpu, but needs to call into Python via libtorch_python. There is an indirection (PyCompilerInterface) to do this. - MulBackward0::apply_with_saved passes a C++ function to Python. To make our lives easier, every codegen'ed apply_with_saved passes a C++ function with the same signature `(variable_list, ivalue_list) -> variable_list`. - We define how to pack arbitrary C++ types into IValue via a helper IValuePacker struct and codegen functional variants of each builtin C++ autograd node (e.g. MulBackward0_apply_functional_ivalue). MulBackward0 before this PR: https://gist.github.com/zou3519/a80381d5fa38e970e413fcd91b0530de MulBackward0 after this PR: https://gist.github.com/zou3519/0c2eee8b3d8d96232b51ef430b53c5b0 Test Plan: - existing tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/143296 Approved by: https://github.com/jansel	2025-01-22 21:50:29 +00:00
albanD	0940eb6d44	Reverting the PR adding Kleidiai-based int4 kernels (#145392 ) Mitigation for https://github.com/pytorch/pytorch/issues/145273 Reverting https://github.com/pytorch/pytorch/pull/134124 and https://github.com/pytorch/pytorch/pull/144074 Pull Request resolved: https://github.com/pytorch/pytorch/pull/145392 Approved by: https://github.com/ZainRizvi, https://github.com/malfet, https://github.com/atalman, https://github.com/digantdesai	2025-01-22 20:11:49 +00:00
Isuru Fernando	0efa843392	Dynamic shape guards in C++ (#139899 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/139899 Approved by: https://github.com/anijain2305, https://github.com/albanD, https://github.com/jansel ghstack dependencies: #143385, #143164	2025-01-22 14:58:35 +00:00
Isuru Fernando	fbaef0ac03	Add a language option for symbolic shape guards (#143164 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/143164 Approved by: https://github.com/ezyang ghstack dependencies: #143385	2025-01-22 14:58:35 +00:00
Aaron Orenstein	1ce533867f	Teach dynamo to handle GenericAlias without a graph break (#145240 ) Dynamo wasn't handling the new PEP585 type annotations: ``` x = list[Foo] ``` Although this worked in py3.9 this was causing an `unimplemented` (Unexpected type in sourceless builder) in py3.12. This fixes it to treat them as a BuiltinVariable. Fixes #145226 Pull Request resolved: https://github.com/pytorch/pytorch/pull/145240 Approved by: https://github.com/anijain2305	2025-01-22 01:55:51 +00:00
rzou	1e8d6d6f0e	[SkipFiles] New modules added to torch.* are inlined by default (#145279 ) This PR: - makes it so that new modules added to torch are inlined by default - adds a list of the previously "skipped by default" modules to avoid regressing anything. This is a new MOD_SKIPLIST list that is consulted in trace_rules.check_file. - Follow-up work will go through this list, one-by-one, and try to delete modules. I think we should be able to delete almost everything, except for torch._dynamo. Test Plan - existing tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/145279 Approved by: https://github.com/yanboliang	2025-01-21 23:24:12 +00:00
Edward Z. Yang	693d8c7e94	Output of nonzero is transposed, fix fake tensor (#144695 ) Needs this companion executorch PR: https://github.com/pytorch/executorch/pull/7657 Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/144695 Approved by: https://github.com/bobrenjc93, https://github.com/albanD	2025-01-21 20:50:09 +00:00
Animesh Jain	19584b28fd	[dynamo][dicts] Consolidate dict(..) construction (#144342 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/144342 Approved by: https://github.com/StrongerXi	2025-01-20 04:42:06 +00:00
Aaron Orenstein	a79100ab11	PEP585 update - torch/_dynamo (#145105 ) See #145101 for details. Pull Request resolved: https://github.com/pytorch/pytorch/pull/145105 Approved by: https://github.com/bobrenjc93	2025-01-18 20:47:11 +00:00
Yanbo Liang	43a00d73b3	[Trace Python Dispatcher] Support FuncTorchInterpreter (#144444 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/144444 Approved by: https://github.com/williamwen42, https://github.com/zou3519 ghstack dependencies: #144439	2025-01-17 02:26:37 +00:00
Yanbo Liang	5d02575aa1	[Trace Python dispatcher] Support torch.DispatchKey & torch.DispatchKeySet (#144439 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/144439 Approved by: https://github.com/zou3519	2025-01-17 02:26:36 +00:00
William Wen	3a50aba7d3	[dynamo] add option to not skip on empty graph (#144885 ) Temporary fix to https://github.com/pytorch/pytorch/issues/144360. Turning the config on globally will cause a bunch of tests to fail, which needs to be addressed in followups. I had a previous attempt at https://github.com/pytorch/pytorch/pull/144712, but this is a more complicated change and will likely be absorbed into work to refactor Dynamo's exception handling. Pull Request resolved: https://github.com/pytorch/pytorch/pull/144885 Approved by: https://github.com/jansel	2025-01-17 02:12:20 +00:00
Nikita Shulga	a61a65ff82	[MPSInductor] Add `Worker.current_device` method (#145023 ) That just returns 0, as multi-gpu is not currently supported by MPS Pull Request resolved: https://github.com/pytorch/pytorch/pull/145023 Approved by: https://github.com/dcci	2025-01-17 01:41:01 +00:00
PyTorch MergeBot	5e6e6200bf	Revert "[dynamo][dicts] Consolidate dict(..) construction (#144342 )" This reverts commit `a54a784b82`. Reverted https://github.com/pytorch/pytorch/pull/144342 on behalf of https://github.com/kit1980 due to breaking internal builds, see D68125388 ([comment](https://github.com/pytorch/pytorch/pull/144342#issuecomment-2597184167))	2025-01-17 00:32:09 +00:00
Laith Sakka	c3fcb3606d	Profile compile_inner instead of _compile_inner (#144930 ) Summary: title Test Plan: NA Reviewed By: jamesjwu Differential Revision: D67990492 Pull Request resolved: https://github.com/pytorch/pytorch/pull/144930 Approved by: https://github.com/jamesjwu	2025-01-16 23:59:27 +00:00
Colin L. Rice	95c363cc9b	dynamo: Don't crash with internal error if getattr on a tensor fails (#144817 ) This prevents crashes when getattr is called on a tensor for something which doesn't exist. Pull Request resolved: https://github.com/pytorch/pytorch/pull/144817 Approved by: https://github.com/williamwen42, https://github.com/jansel	2025-01-16 22:04:06 +00:00
Colin L. Rice	6492851125	symbolic_convert: Don't fail when we hit a undefined name (#144784 ) We're using a Python builtin NameError here instead of throwing an Unsupported exception. This causes the NameError to get wrapped in an InternalTorchDynamoError instead of just causing a graph break and letting the user code fail directly. Pull Request resolved: https://github.com/pytorch/pytorch/pull/144784 Approved by: https://github.com/williamwen42, https://github.com/jansel	2025-01-16 01:47:48 +00:00
Colin L. Rice	926f9056a9	speculation_log: Raise a unique error for divergence issues (#144785 ) This is primarily sent for discussion and to see what tests fail due to this. The idea is that rather than capturing this as a regex on the fail_reason, just give it a unique failure type Pull Request resolved: https://github.com/pytorch/pytorch/pull/144785 Approved by: https://github.com/ezyang	2025-01-16 00:49:43 +00:00
Colin L. Rice	b88dcb4835	dynamo: Don't crash when tracing a missing attr on a constant. (#144593 ) Without this change, tracing throws an InternalTorchDynamoError: AttributeError: 'NoneType' object has no attribute 'max' instead of just skipping the bad call and throwing a normal AttributeError. There are two questions that I would love reviewer comment on. 1) Is throwing unimplemented the right thing here, or should I throw something like ObservedAttributeError? 2) Do we need to worry about performance with this code? In particular, should we just catch the exception? Or maybe cache the lookup result? Pull Request resolved: https://github.com/pytorch/pytorch/pull/144593 Approved by: https://github.com/jansel	2025-01-15 20:23:43 +00:00
Simon Fan	898a90c6bb	[dynamo][hop] Introduce FlexAttentionBackwardHighOrderVariable (#144533 ) FIXES https://github.com/pytorch/pytorch/issues/143180 This PR adds a new variable mapping to SourcelessBuilder to represent the flex attention intermediates. The variable proxies a call to HOP, and carries over the graph state (subgraphs represented as UnspecializedNNModuleVariable) to the dynamo output graph. This is safe to do because the nn modules used in flex attention have either been speculated on before, or are outputs of make_fx of the forward. tlparse of `TestCompiledAutograd.test_flex_attention`: https://manifold.edge.x2p.facebook.net/v0/read/tree/logs/.tmpiWendk/index.html?bucketName=tlparse_reports&apiKey=tlparse_reports-key&withPayload=1&timeoutMsec=100 ```python class GraphModule(torch.nn.Module): def forward(self, L_inputs_ : list): ... # File: /data/users/xmfan/core/b/pytorch/torch/_dynamo/compiled_autograd.py:832 in set_node_origin, code: CompiledFunctionBackward0 (NodeCall 1) ... 
fw_graph0_0 = self.fw_graph0_0 joint_graph0_0 = self.joint_graph0_0 mask_graph0_0 = self.mask_graph0_0 flex_attention_backward = torch.ops.higher_order.flex_attention_backward(aot0_primals_1, aot0_primals_1, aot0_primals_1, aot0_detach_3, aot0_detach_5, aot0_expand_5, aot0_zeros_1, fw_graph0_0, joint_graph0_0, (1, 1, aot0_ones, aot0_zeros, None, None, aot0__to_copy_1, aot0__to_copy_2, None, None, 1073741824, 1073741824, mask_graph0_0), 0.125, {'PRESCALE_QK': False, 'ROWS_GUARANTEED_SAFE': False, 'BLOCKS_ARE_CONTIGUOUS': False, 'WRITE_DQ': True, 'OUTPUT_LOGSUMEXP': True}, (), ()); aot0_primals_1 = aot0_detach_3 = aot0_detach_5 = aot0_expand_5 = aot0_zeros_1 = fw_graph0_0 = joint_graph0_0 = aot0_ones = aot0_zeros = aot0__to_copy_1 = aot0__to_copy_2 = mask_graph0_0 = None aot0_getitem_4: "bf16[1, 1, s0, s1][s0s1, s0s1, s1, 1]cuda:0" = flex_attention_backward[0] aot0_getitem_5: "bf16[1, 1, s0, s1][s0s1, s0s1, s1, 1]cuda:0" = flex_attention_backward[1] aot0_getitem_6: "bf16[1, 1, s0, s1][s0s1, s0s1, s1, 1]cuda:0" = flex_attention_backward[2]; flex_attention_backward = None ... 
class fw_graph0_0(torch.nn.Module): def forward(self, arg0_1: "bf16[][]cuda:0", arg1_1: "i32[][]cuda:0", arg2_1: "i32[][]cuda:0", arg3_1: "i32[][]cuda:0", arg4_1: "i32[][]cuda:0"): return arg0_1 class joint_graph0_0(torch.nn.Module): def forward(self, arg0_1: "bf16[][]cuda:0", arg1_1: "i32[][]cuda:0", arg2_1: "i32[][]cuda:0", arg3_1: "i32[][]cuda:0", arg4_1: "i32[][]cuda:0", arg5_1: "bf16[][]cuda:0"): return [arg5_1, None, None, None, None] class mask_graph0_0(torch.nn.Module): def forward(self, arg0_1: "i32[][]cuda:0", arg1_1: "i32[][]cuda:0", arg2_1: "i32[][]cuda:0", arg3_1: "i32[][]cuda:0"): # File: /data/users/xmfan/core/b/pytorch/torch/_dynamo/compiled_autograd.py:832 in set_node_origin, code: CompiledFunctionBackward0 (NodeCall 1) new_ones: "b8[][]cuda:0" = torch.ops.aten.new_ones.default(arg0_1, [], dtype = torch.bool, device = device(type='cuda', index=0), pin_memory = False); arg0_1 = None return new_ones ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/144533 Approved by: https://github.com/zou3519	2025-01-15 18:40:57 +00:00
Edward Z. Yang	ee8f833d13	Undo leading underscore on ctx for breakpoint (#144864 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/144864 Approved by: https://github.com/Skylion007	2025-01-15 18:00:58 +00:00
Sujoy Saraswati	7e1c1e65eb	Graph freezing preparation for non-Inductor backends (#139902 ) Enable preparing module named parameters and buffers in tracing context for non-Inductor backends to implement graph freezing. Fixes #139272 Pull Request resolved: https://github.com/pytorch/pytorch/pull/139902 Approved by: https://github.com/eellison, https://github.com/masnesral, https://github.com/gujinghui	2025-01-15 11:25:04 +00:00
James Wu	7d71ddbe5d	Add non_c_binding torch functions to allowlist for AOTAutogradCache, confirm no special handlers for them (#144802 ) Differential Revision: [D68173093](https://our.internmc.facebook.com/intern/diff/D68173093/) This diff allows any function in torch_non_c_binding_in_graph_functions to be safe to cache. These functions should be safe to cache because they are part of the torch API, and do not save global state (or if they do, dynamo creates unique guards around the constants they return). A function that's allowed in a dynamo graph is safe to cache for AOTAutograd purposes as long as: - It's functional (i.e. does not access global state); - or its value is constant folded away (and guarded against by dynamo) The tricky cases are functions that dynamo uses special handlers to track. These special handlers can sometimes close over stuff that's safe for dynamo locally, but isn't encoded anywhere when cached across processes. An example of this is `DTensor.from_local`, where various DeviceMesh information doesn't change in the same dynamo process, but can change across multiple processes. The handler for `DTensor.from_local` closes over these and dynamo creates a proxy for the function call. This is not safe to cache. That said, most special handlers are in fact functional and safe. So I add a unit test to test_trace_rules.py that confirms that any function with special handlers in dynamo added to this list needs to be audited to be safe to cache. The list of safe handlers there either: - Don't access global state; - Guard on global state; or - Always returns a constant that never changes Pull Request resolved: https://github.com/pytorch/pytorch/pull/144802 Approved by: https://github.com/bdhirsh	2025-01-15 05:41:36 +00:00
Simon Fan	9cd6f46130	[ca] raise error message on AOT Autograd caching (#144595 ) FIXES https://github.com/pytorch/pytorch/issues/144175, bandaid Pull Request resolved: https://github.com/pytorch/pytorch/pull/144595 Approved by: https://github.com/bdhirsh	2025-01-15 05:09:42 +00:00
Nikita Shulga	18786c65e5	[BE] Extend `test_remove_no_ops` (#144795 ) ---- - Use `is_dtype_supported` to skip dtype promotions portion of the test on unsupported device - Extend it to use `torch.float16` so promotions could be checked there - Implement `CpuInterface.is_bfloat16_supported` that returns true (which looks like the case, even if it's supported via emulation) Pull Request resolved: https://github.com/pytorch/pytorch/pull/144795 Approved by: https://github.com/Skylion007 ghstack dependencies: #144509, #144798	2025-01-15 05:00:26 +00:00
Nikita Shulga	9157a748a6	[MPSInductor] Add dummy properties (#144509 ) For compute capability (which is an empty string, same as CPU). And for multicore count return 8, as this is the smallest number of GPU cores on Apple silicon Pull Request resolved: https://github.com/pytorch/pytorch/pull/144509 Approved by: https://github.com/jansel	2025-01-14 20:12:38 +00:00
James Wu	e58c823ab8	Implement increment and add_to_set for CompileEventLogger (#143427 ) This diff implements `increment` and `add_to_set`, which are features of MetricsContext, but not ChromiumEventLogger. This allows us to add a bunch of other metricscontext callsites to use CompileEventLogger instead. Differential Revision: [D67354867](https://our.internmc.facebook.com/intern/diff/D67354867/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/143427 Approved by: https://github.com/masnesral	2025-01-14 02:42:49 +00:00
Animesh Jain	a54a784b82	[dynamo][dicts] Consolidate dict(..) construction (#144342 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/144342 Approved by: https://github.com/StrongerXi	2025-01-13 22:24:56 +00:00
Ryan Guo	4ceca4d60f	[dynamo] Avoid graph break on updates to `obj.__dict__` (#144419 ) `obj.__dict__` is handled specially in Dynamo, and prior to this patch we only support read and membership check on that dictionary object. This patch adds support for writes and some documentation. Fixes #143756. Pull Request resolved: https://github.com/pytorch/pytorch/pull/144419 Approved by: https://github.com/jansel, https://github.com/anijain2305	2025-01-13 21:04:10 +00:00
Yanbo Liang	3355103233	[Dynamo] Supports autograd.Function forward returns constant (#144597 ) Fixes #144142 Pull Request resolved: https://github.com/pytorch/pytorch/pull/144597 Approved by: https://github.com/jansel	2025-01-12 03:53:10 +00:00
Simon Fan	8fa47c9455	[dynamo] log compiler collective duration to tlparse chromium trace (#144372 ) To show wall time in tlparse for the synchronous compiler collective. Can eliminate the leading hypothesis from https://fb.workplace.com/groups/1075192433118967/permalink/1578670289437843. <img width="1296" alt="image" src="https://github.com/user-attachments/assets/b17d4efb-8573-43e5-af58-c51af05acb54" /> sample: https://gist.github.com/xmfan/19eeaa80d55a4e7c168e150355ec7392 rank 0: https://manifold.edge.x2p.facebook.net/v0/read/tree/logs/.tmpr5WNMt/rank_0/index.html?bucketName=tlparse_reports&apiKey=tlparse_reports-key&withPayload=1&timeoutMsec=10 rank 1: https://manifold.edge.x2p.facebook.net/v0/read/tree/logs/.tmpr5WNMt/rank_1/index.html?bucketName=tlparse_reports&apiKey=tlparse_reports-key&withPayload=1&timeoutMsec=100 Pull Request resolved: https://github.com/pytorch/pytorch/pull/144372 Approved by: https://github.com/ezyang	2025-01-11 03:10:39 +00:00
Colin L. Rice	0cd9320c7f	easy: dynamo_config: sort keys and set values (#143317 ) This will create consistent ordering of keys when writing, as well as sorting sets before serializing Pull Request resolved: https://github.com/pytorch/pytorch/pull/143317 Approved by: https://github.com/masnesral ghstack dependencies: #143307	2025-01-11 03:08:04 +00:00
Sam Ginzburg	074aca3ed2	[user triton] add support for @triton.heuristics after @triton.autotune (#142208 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/142208 Approved by: https://github.com/zou3519	2025-01-11 02:18:26 +00:00
PyTorch MergeBot	473b745cb9	Revert "[dynamo] Avoid graph break on updates to `obj.__dict__` (#144419 )" This reverts commit `c8595ba7d0`. Reverted https://github.com/pytorch/pytorch/pull/144419 on behalf of https://github.com/clee2000 due to newly added test fails internally D68004708 ([comment](https://github.com/pytorch/pytorch/pull/144419#issuecomment-2583265412))	2025-01-10 16:59:38 +00:00
bobrenjc93	1fe3af2c68	Migrate from Tuple -> tuple in torch/_dynamo (#144261 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/144261 Approved by: https://github.com/aorenste, https://github.com/zou3519	2025-01-10 07:45:57 +00:00
Ryan Guo	c8595ba7d0	[dynamo] Avoid graph break on updates to `obj.__dict__` (#144419 ) `obj.__dict__` is handled specially in Dynamo, and prior to this patch we only support read and membership check on that dictionary object. This patch adds support for writes and some documentation. Fixes #143756. Pull Request resolved: https://github.com/pytorch/pytorch/pull/144419 Approved by: https://github.com/jansel, https://github.com/anijain2305	2025-01-10 05:22:04 +00:00
Guilherme Leobas	bf6dd955cd	Fix max(map(...)) (#142443 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/142443 Approved by: https://github.com/zou3519	2025-01-10 01:44:37 +00:00
Shangdi Yu	66ce13b497	Revert D67299312: Multisect successfully blamed "D67299312: [AoTI Minifier] UX Improvement" for one test failure (#144475 ) Summary: This diff partially reverts D67299312 D67299312: [AoTI Minifier] UX Improvement by yushangdi causes the following test failure: Differential Revision: D67963019 Pull Request resolved: https://github.com/pytorch/pytorch/pull/144475 Approved by: https://github.com/zhxchen17, https://github.com/angelayi	2025-01-09 23:27:55 +00:00
Colin L. Rice	73278e6a5d	easy: sort dictionary keys for inductor config when publishing (#143307 ) This means we should get consistent logging strings for the same config on different ranks Pull Request resolved: https://github.com/pytorch/pytorch/pull/143307 Approved by: https://github.com/xmfan	2025-01-09 18:01:20 +00:00
Xuehai Pan	dcc3cf7066	[BE] fix ruff rule E226: add missing whitespace around operator in f-strings (#144415 ) The fixes are generated by: ```bash ruff check --fix --preview --unsafe-fixes --select=E226 . lintrunner -a --take "RUFF,PYFMT" --all-files ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/144415 Approved by: https://github.com/huydhn, https://github.com/Skylion007	2025-01-08 21:55:00 +00:00
Aaron Gokaslan	373541fbf4	[BE]: Remove unnecessary copy of gradients in util (#144329 ) No need to copy gradients to CPU too Pull Request resolved: https://github.com/pytorch/pytorch/pull/144329 Approved by: https://github.com/awgu, https://github.com/cyyever	2025-01-08 16:52:15 +00:00
Nikita Shulga	708ce3c008	Add `is_dtype_supported` predicate to DeviceInterface (#144355 ) By default it returns true unless dtype is bf16. For the MPS device it returns false if dtype is double. Check that it works by refactoring `test_inf`, which should expect a TypeError to be raised if invoked with an unsupported dtype Pull Request resolved: https://github.com/pytorch/pytorch/pull/144355 Approved by: https://github.com/jansel, https://github.com/dcci	2025-01-08 13:59:46 +00:00
William Wen	f700035090	[3.13t] use sysconfig to check for Python nogil builds (#144361 ) `sys._is_gil_enabled()` wasn't working in certain cases, according to @atalman Pull Request resolved: https://github.com/pytorch/pytorch/pull/144361 Approved by: https://github.com/atalman	2025-01-08 13:00:32 +00:00
Animesh Jain	2ac41404a8	[dynamo][dicts] Guarding lazily on dict keys (#143997 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/143997 Approved by: https://github.com/jansel	2025-01-08 03:56:33 +00:00
Oguz Ulgen	9ee242213b	[RFC] Introduce cache hot loading APIs (a.k.a. "Mega-cache") (#143341 ) This PR essentially introduces two new APIs * torch.compiler.save_cache_artifacts * torch.compiler.load_cache_artifacts which aim to create a mega cache experience where the user can start collecting cache artifacts, and later call the save API to fetch them. In the next attempt, the user can "hot load" the cache artifacts via the load function. This bundling approach reduces the need to rely on porting individual files one by one, or relying on many network requests. Note that these APIs CANNOT log to structured logging as these functions will be called before and after compilation, as opposed to during compilation. Due to this limitation, the API returns a struct that the user can log with. Pull Request resolved: https://github.com/pytorch/pytorch/pull/143341 Approved by: https://github.com/jansel	2025-01-07 23:13:24 +00:00
Yanbo Liang	430d54ee20	[Dynamo] Add functorch C++ bindings as in graph functions (#144309 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/144309 Approved by: https://github.com/williamwen42 ghstack dependencies: #144306, #144307, #144308	2025-01-07 22:25:01 +00:00
Yanbo Liang	d146763f6f	[Dynamo] Inline functions in torch._ops (#144308 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/144308 Approved by: https://github.com/williamwen42 ghstack dependencies: #144306, #144307	2025-01-07 22:25:01 +00:00
Yanbo Liang	242a4a3f83	[Dynamo] Inline functions in torch._functorch.pyfunctorch (#144307 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/144307 Approved by: https://github.com/williamwen42 ghstack dependencies: #144306	2025-01-07 22:24:53 +00:00
Yanbo Liang	4417be65e5	[Dynamo] Inline functions in torch._functorch.autograd_function (#144306 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/144306 Approved by: https://github.com/williamwen42	2025-01-07 22:24:46 +00:00
Simon Fan	f4969c8235	fix torch.compile + ddp + non-reentrant AC pack hook firing count (#144271 ) FIXES https://github.com/pytorch/pytorch/issues/144035 In order to preserve hook firing semantics, we disabled pack/unpack hooks for torch.compile: https://github.com/pytorch/pytorch/pull/123196. In DDP under torch.compile, there's this other callsite that we need to disable hooks for Pull Request resolved: https://github.com/pytorch/pytorch/pull/144271 Approved by: https://github.com/bdhirsh, https://github.com/soulitzer	2025-01-07 21:08:52 +00:00
Simon Fan	d38af6e8bc	[ca] dedup node names when AOT bwd graph is reused multiple times (#144202 ) This error started popping up in HUD CA benchmarks: ```python File "/data/users/xmfan/core/b/pytorch/torch/_dynamo/compiled_autograd.py", line 371, in dce self.fx_tracer.graph.eliminate_dead_code(is_impure) File "/data/users/xmfan/core/b/pytorch/torch/fx/graph.py", line 1862, in eliminate_dead_code self.lint() File "/data/users/xmfan/core/b/pytorch/torch/fx/graph.py", line 1753, in lint raise RuntimeError(f"Node redefined name {node.name}!") RuntimeError: Node redefined name aot0_expand! ``` We added CA initial capture's renaming (https://github.com/pytorch/pytorch/pull/133148) to help debug issues with AOT backward, but it errors out when we have multiple instances of the same AOT backward. This likely only showed up now because of increased hierarchical graph reuse. I fix it by adding a postfix counter to the node name Pull Request resolved: https://github.com/pytorch/pytorch/pull/144202 Approved by: https://github.com/bdhirsh, https://github.com/jansel	2025-01-07 20:23:09 +00:00
Shangdi Yu	72e8f34715	[AoTI Minifier] UX Improvement (#143330 ) Summary: - When a user specifies the `TORCHINDUCTOR_MAX_AUTOTUNE=1` env variable, we add `config.max_autotune=True` to the generated minifier_launcher - We should do this to other inductor configs as well in a followup Diff Currently in dynamo and aoti minifier, if a config is overwritten by an env variable, the config will not show up in the config list in the minifier_launcher.py file. As a result, when running the minifier_launcher, users need to re-apply the same env variable. This is: 1) not convenient for the users 2) if they copy-paste the minifier_launcher.py to us without including the env variable, we could be confused and not able to reproduce the error. Underlying implementation change: - Add `env_default` parameter to `codegen_config()`. If set, configs overridden by the env are not considered default. Test Plan: ``` buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test:utils -- -r test_codegen_config ``` Differential Revision: D67299312 Pull Request resolved: https://github.com/pytorch/pytorch/pull/143330 Approved by: https://github.com/jansel, https://github.com/eellison	2025-01-07 20:04:19 +00:00
Guilherme Leobas	4c8d661348	Set `enable_trace_contextlib_contextmanager` flag to True (#140604 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/140604 Approved by: https://github.com/zou3519 ghstack dependencies: #136033	2025-01-06 16:56:22 +00:00
yijun-lee	d4609af1ca	Propagate callable parameter types using ParamSpec (#142306 ) (#144047 ) Fixes #142306 This PR includes typing improvements and refactoring for the following files: - __init__.py - decorators.py - _ops.py Pull Request resolved: https://github.com/pytorch/pytorch/pull/144047 Approved by: https://github.com/XuehaiPan, https://github.com/Skylion007 Co-authored-by: Aaron Gokaslan <aaronGokaslan@gmail.com> Co-authored-by: Xuehai Pan <XuehaiPan@pku.edu.cn>	2025-01-06 16:16:18 +00:00
Animesh Jain	f6488d85a0	[dynamo][user-defined] Remove __getattribute__ checks and add getsetdescriptor (#144173 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/144173 Approved by: https://github.com/jansel	2025-01-05 13:48:15 +00:00
PyTorch MergeBot	b01556bd8a	Revert "[dynamo][dicts] Guarding lazily on dict keys (#143997 )" This reverts commit `f5df082fab`. Reverted https://github.com/pytorch/pytorch/pull/143997 on behalf of https://github.com/jeanschmidt due to Seems to have introduced internal ci redness in some tests, D67828366 ([comment](https://github.com/pytorch/pytorch/pull/143997#issuecomment-2571587599))	2025-01-05 11:09:45 +00:00
James Wu	f2d6cfa677	Introduce CompileEventLogger, replace usages of metrics_context and chromium_event with it (#143420 ) Problem statement: I want to be able to centralize and simplify the process by which people add columns/data to existing spans. We have MetricsContext and ChromiumEventLogger, and there are various choices you can make to decide where and when to log different levels of observability for your events. To resolve this, I want a central API for "adding to events under dynamo_timed". CompileEventLogger is intended as a frontend for MetricsContext and ChromiumEventLogger so we can use the same class for handling everything. CompileEventLogger is intended to be used within a `dynamo_timed()` context. Its purpose is to 1. log to existing events that are in progress (i.e. within dynamo_timed), and 2. log instant events to chromium that are independent of any specific span. CompileEventLogger has three log levels: - CHROMIUM: Log only to chromium events, visible via tlparse. - PT2_COMPILE: Log to chromium_events + pt2_compile_events - COMPILATION_METRIC: Log to compilation metrics in addition to the toplevel chromium and pt2_compile_event. In addition, we have a function CompileEventLogger.add() that automagically chooses the correct log level. For now, it is conservative, and will never automagically choose to log CompilationMetrics (though I could imagine it figuring out the metadata are all keys in CompilationMetric and therefore loggable there). The goal here is to make one single interface to log stuff for observability reasons, and make it as easy as possible. Not included in this diff: - V1 of this diff will not have implementations of `increment` and `add_to_set` which MetricsContext has, so those usages are not replaced yet. But I'll add those in a followup. - We don't handle `RuntimeMetricsContext`. 
It's unclear if I want that to be part of this, because under RuntimeMetricsContext there might not be a toplevel event to log to, so chromium events doesn't make sense in that context. So I might leave that separate for now. Differential Revision: [D67346203](https://our.internmc.facebook.com/intern/diff/D67346203/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/143420 Approved by: https://github.com/aorenste	2025-01-04 22:40:34 +00:00
Animesh Jain	f5df082fab	[dynamo][dicts] Guarding lazily on dict keys (#143997 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/143997 Approved by: https://github.com/jansel ghstack dependencies: #144129, #144130, #144141, #144158, #144163, #144160	2025-01-04 18:13:00 +00:00
Animesh Jain	816328fa51	[dynamo][lazy] LazyVT utils to get original value/source and is_hashable (#144160 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/144160 Approved by: https://github.com/williamwen42, https://github.com/jansel ghstack dependencies: #144129, #144130, #144141, #144158, #144163	2025-01-04 06:23:05 +00:00
Sam Ginzburg	ec1f56fdcf	[user triton] add support for prune_configs_by in @triton.autotune (#142207 ) This PR adds support for prune_configs_by in the @triton.autotune decorator [docs](https://triton-lang.org/main/python-api/generated/triton.autotune.html#triton.autotune). Supporting this lets users reduce autotuning time by running user-supplied code (early_config_prune, perf_model) to prune the provided list of configs. We implement this by realizing args/kwargs in call_triton_kernel(...), and then calling kernel.prune_configs(...). Pull Request resolved: https://github.com/pytorch/pytorch/pull/142207 Approved by: https://github.com/zou3519, https://github.com/aakhundov	2025-01-04 03:50:28 +00:00
Animesh Jain	087c625261	[dynamo] Trace torch.typename (#144163 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/144163 Approved by: https://github.com/yanboliang, https://github.com/williamwen42, https://github.com/jansel ghstack dependencies: #144129, #144130, #144141, #144158	2025-01-04 02:52:58 +00:00
Animesh Jain	3292220c43	[dynamo][easy] Move symnode helpers to utils (#144158 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/144158 Approved by: https://github.com/williamwen42, https://github.com/jansel ghstack dependencies: #144129, #144130, #144141	2025-01-04 02:52:58 +00:00
Xiaodong Wang	0a94bb432e	[ROCm] CK Flash Attention Backend (#143695 ) Replace https://github.com/pytorch/pytorch/pull/138947 for re-import. Replaces https://github.com/ROCm/pytorch/pull/1592 This PR contains the initial implementation of SDPA with composable_kernel backend. The CK path can be forced by simply calling torch.backends.cuda.preferred_rocm_fa_library("ck"). Similarly, you can force the incumbent aotriton implementation by passing in "aotriton" or "default". As you'd expect, not setting this option results in aotriton being used as the backend. In the case of CK, if pytorch deems flash attention usable, then it will use the CK path in all the same places aotriton would have been used. This PR makes no changes to the heuristics which select which attention scheme to use (i.e. flash attention vs memory efficient attention vs math etc.). It only gets called when flash attention is both enabled (via USE_FLASH_ATTENTION) and is selected at runtime by the existing heuristics. Files located in pytorch/aten/src/ATen/native/transformers/hip/flash_attn/ck/mha* have been pulled from https://github.com/Dao-AILab/flash-attention courtesy of the hard work of @tridao, who is the co-author NOTE: In order to use this backend, the user MUST set USE_CK_FLASH_ATTENTION=1 in their environment when they build PyTorch. Pull Request resolved: https://github.com/pytorch/pytorch/pull/143695 Approved by: https://github.com/malfet Co-authored-by: Andy Lugo <Andy.LugoReyes@amd.com> Co-authored-by: Jithun Nair <jithun.nair@amd.com>	2025-01-03 22:01:36 +00:00
Yidi Wu	c36f94b373	[while_loop][dynamo] auto-unspecialize int input and output to unbacked symints (#143106 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/143106 Approved by: https://github.com/zou3519 ghstack dependencies: #143105, #143545	2025-01-03 19:01:07 +00:00
Yidi Wu	5660709856	[hop][BE] unify meta checking with check_meta_consistency (#143545 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/143545 Approved by: https://github.com/zou3519 ghstack dependencies: #143105	2025-01-03 19:01:07 +00:00
PyTorch MergeBot	8d63a4a409	Revert "Set `enable_trace_contextlib_contextmanager` flag to True (#140604 )" This reverts commit `1c817fe671`. Reverted https://github.com/pytorch/pytorch/pull/140604 on behalf of https://github.com/guilhermeleobas due to breaking one of the benchmarks (moco) ([comment](https://github.com/pytorch/pytorch/pull/140604#issuecomment-2569640837))	2025-01-03 18:23:53 +00:00
Animesh Jain	c5c897c3a1	[dynamo][easy] Miscellaneous fixes (#144141 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/144141 Approved by: https://github.com/williamwen42 ghstack dependencies: #144129, #144130	2025-01-03 18:22:56 +00:00
Xuehai Pan	d9507548d8	[dynamo][BE] move `zip_longest` polyfill to submodule `polyfills.itertools` (#144067 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/144067 Approved by: https://github.com/yanboliang ghstack dependencies: #144066	2025-01-03 08:08:31 +00:00
Xuehai Pan	fb1beb31d2	[dynamo][BE] move `dropwhile` polyfill to submodule `polyfills.itertools` (#144066 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/144066 Approved by: https://github.com/jansel	2025-01-03 08:08:31 +00:00
Animesh Jain	dec1a6d0f0	[dynamo] Separate out GetItemSource and DictGetItemSource (#143926 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/143926 Approved by: https://github.com/jansel	2025-01-01 02:39:41 +00:00
Vinayak Pandey	16a57e232c	removed dead code for dynamo flag dead_code_elimination (#140938 ) Fixes #136862 1. removed dead code from torch/_dynamo/convert_frame.py 2. ran `lintrunner -a` and all the tests passed. 3. ran the unit tests and everything seems to be in order. Pull Request resolved: https://github.com/pytorch/pytorch/pull/140938 Approved by: https://github.com/zou3519	2024-12-31 09:27:43 +00:00
Kasperi Apell	a7915c56f6	Propagate callable parameter types using ParamSpec (#142306 ) (#143797 ) The codebase has a few locations where callable parameter type information is lost when the unpacked `*args` and `**kwargs` are typed as `Any`. Refactor these instances to retain type information using typing_extensions.ParamSpec. Also, in these functions, enforce return type with TypeVar. Addresses #142306 Pull Request resolved: https://github.com/pytorch/pytorch/pull/143797 Approved by: https://github.com/Skylion007 Co-authored-by: Aaron Gokaslan <aaronGokaslan@gmail.com> Co-authored-by: Xuehai Pan <XuehaiPan@outlook.com>	2024-12-29 23:03:14 +00:00
Animesh Jain	01980cac38	[dynamo] Make ConstDictKeySource a subclass of ChainedSource (#143924 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/143924 Approved by: https://github.com/jansel	2024-12-28 05:59:45 +00:00
Animesh Jain	c3c27aef34	[dynamo] Remove HFPretrained config hack (#143698 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/143698 Approved by: https://github.com/williamwen42, https://github.com/jansel ghstack dependencies: #143888	2024-12-28 02:03:13 +00:00
Nikita Shulga	1e65dec2b9	[Dynamo] Add MPSDevice interface (#143891 ) That simply checks if device is available and whether or not it supports bf16 Pull Request resolved: https://github.com/pytorch/pytorch/pull/143891 Approved by: https://github.com/jansel	2024-12-27 20:31:44 +00:00
Animesh Jain	a87cd5283b	[dynamo] Trace through overridden __getattribute__ method (#143888 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/143888 Approved by: https://github.com/jansel	2024-12-27 18:10:00 +00:00
Animesh Jain	0f474a960b	[dynamo] Remove dead code after introducing UserDefinedDictVariable (#143699 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/143699 Approved by: https://github.com/williamwen42, https://github.com/yanboliang, https://github.com/jansel ghstack dependencies: #143722	2024-12-27 04:51:35 +00:00
Animesh Jain	e296bab614	[dynamo] Remove DICT_SUBCLASS_GUARD_MANAGER and use dict.keys (#143722 ) In hindsight, we never needed a DICT_SUBCLASS_GUARD_MANAGER, because Dynamo would inline through the overridden keys method. In this PR, we ensure that while creating guards and constructing variable trackers, we get the `d.keys()` value by using `dict.keys(d)`. This ensures that we do not call the overridden keys method. Therefore, the C++ guard can use `PyDict_Next` directly to check the guards. Pull Request resolved: https://github.com/pytorch/pytorch/pull/143722 Approved by: https://github.com/jansel	2024-12-27 04:51:35 +00:00
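The difference between `d.keys()` and `dict.keys(d)` that this commit relies on can be seen in plain Python. The `FilteredDict` class below is a hypothetical subclass written for illustration; it is not taken from PyTorch.

```python
class FilteredDict(dict):
    """Hypothetical dict subclass whose keys() hides one entry."""

    def keys(self):
        return [k for k in super().keys() if k != "hidden"]


d = FilteredDict(a=1, hidden=2)

# Normal attribute lookup dispatches to the subclass override.
overridden = list(d.keys())

# Calling the base-class method directly bypasses the override and
# reflects the underlying dict storage.
underlying = list(dict.keys(d))
```

Here `overridden` omits `"hidden"` while `underlying` contains both keys, which is the view of the dict that a C-level iteration over the storage would also see.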
PyTorch MergeBot	26364428f5	Revert "[dynamo] Remove DICT_SUBCLASS_GUARD_MANAGER and use dict.keys (#143722 )" This reverts commit `fe95cbe018`. Reverted https://github.com/pytorch/pytorch/pull/143722 on behalf of https://github.com/wdvr due to failing internal tests ([comment](https://github.com/pytorch/pytorch/pull/143722#issuecomment-2563127017))	2024-12-26 22:04:36 +00:00
PyTorch MergeBot	ee25daef5a	Revert "[dynamo] Remove dead code after introducing UserDefinedDictVariable (#143699 )" This reverts commit `7d1c666139`. Reverted https://github.com/pytorch/pytorch/pull/143699 on behalf of https://github.com/wdvr due to failing internal tests ([comment](https://github.com/pytorch/pytorch/pull/143722#issuecomment-2563127017))	2024-12-26 22:04:35 +00:00
Aaron Orenstein	3df12d38cf	dynamo tracing perf: cache cleaned_instructions: 33.7 -> 30.0 (#143070 ) See #143056 for overall docs. This PR: Cache the interesting/expensive bits of `cleaned_instructions()` Pull Request resolved: https://github.com/pytorch/pytorch/pull/143070 Approved by: https://github.com/jansel	2024-12-26 19:02:08 +00:00
Jason Ansel	9035fb5a7b	[dynamo] Add types to exc.py (#143626 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/143626 Approved by: https://github.com/yanboliang ghstack dependencies: #143552, #143610	2024-12-24 21:48:32 +00:00
Jason Ansel	9e5f3fdfc7	[dynamo] Shorten tracebacks for backend compiler errors (#143552 ) Fixes #143406 After this PR the error for missing Triton is: ```py Traceback (most recent call last): File "/home/jansel/pytorch/repro.py", line 51, in <module> fp32_compiled = optimized_model(low_input) ^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/jansel/pytorch/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl return self._call_impl(args, kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/jansel/pytorch/torch/nn/modules/module.py", line 1750, in _call_impl return forward_call(args, *kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/jansel/pytorch/torch/_dynamo/eval_frame.py", line 580, in _fn raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/jansel/pytorch/torch/_inductor/scheduler.py", line 3624, in create_backend raise TritonMissing(inspect.currentframe()) torch._dynamo.exc.TritonMissing: Cannot find a working triton installation. Either the package is not installed or it is too old. 
More information on installing Triton can be found at: https://github.com/triton-lang/triton Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information You can suppress this exception and fall back to eager by setting: import torch._dynamo torch._dynamo.config.suppress_errors = True ``` Setting `TORCHDYNAMO_VERBOSE=1` yields something like the old error: ```py Traceback (most recent call last): File "/home/jansel/pytorch/repro.py", line 51, in <module> fp32_compiled = optimized_model(low_input) ^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/jansel/pytorch/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl return self._call_impl(args, *kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/jansel/pytorch/torch/nn/modules/module.py", line 1750, in _call_impl return forward_call(args, *kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/jansel/pytorch/torch/_dynamo/eval_frame.py", line 580, in _fn raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/jansel/pytorch/torch/_dynamo/eval_frame.py", line 576, in _fn return fn(args, *kwargs) ^^^^^^^^^^^^^^^^^^^ File "/home/jansel/pytorch/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl return self._call_impl(args, *kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/jansel/pytorch/torch/nn/modules/module.py", line 1750, in _call_impl return forward_call(args, *kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/jansel/pytorch/torch/_dynamo/convert_frame.py", line 1383, in __call__ return self._torchdynamo_orig_callable( ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/jansel/pytorch/torch/_dynamo/convert_frame.py", line 1167, in __call__ result = self._inner_convert( ^^^^^^^^^^^^^^^^^^^^ File "/home/jansel/pytorch/torch/_dynamo/convert_frame.py", line 548, in __call__ return _compile( ^^^^^^^^^ File "/home/jansel/pytorch/torch/_dynamo/convert_frame.py", line 988, in _compile guarded_code = compile_inner(code, one_graph, hooks, transform) 
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/jansel/pytorch/torch/_dynamo/convert_frame.py", line 716, in compile_inner return _compile_inner(code, one_graph, hooks, transform) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/jansel/pytorch/torch/_utils_internal.py", line 95, in wrapper_function return function(args, *kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/jansel/pytorch/torch/_dynamo/convert_frame.py", line 751, in _compile_inner out_code = transform_code_object(code, transform) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/jansel/pytorch/torch/_dynamo/bytecode_transformation.py", line 1361, in transform_code_object transformations(instructions, code_options) File "/home/jansel/pytorch/torch/_dynamo/convert_frame.py", line 232, in _fn return fn(args, kwargs) ^^^^^^^^^^^^^^^^^^^ File "/home/jansel/pytorch/torch/_dynamo/convert_frame.py", line 663, in transform tracer.run() File "/home/jansel/pytorch/torch/_dynamo/symbolic_convert.py", line 2870, in run super().run() File "/home/jansel/pytorch/torch/_dynamo/symbolic_convert.py", line 1053, in run while self.step(): ^^^^^^^^^^^ File "/home/jansel/pytorch/torch/_dynamo/symbolic_convert.py", line 963, in step self.dispatch_table[inst.opcode](self, inst) File "/home/jansel/pytorch/torch/_dynamo/symbolic_convert.py", line 3050, in RETURN_VALUE self._return(inst) File "/home/jansel/pytorch/torch/_dynamo/symbolic_convert.py", line 3035, in _return self.output.compile_subgraph( File "/home/jansel/pytorch/torch/_dynamo/output_graph.py", line 1102, in compile_subgraph self.compile_and_call_fx_graph( File "/home/jansel/pytorch/torch/_dynamo/output_graph.py", line 1383, in compile_and_call_fx_graph compiled_fn = self.call_user_compiler(gm) ^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/jansel/pytorch/torch/_dynamo/output_graph.py", line 1433, in call_user_compiler return self._call_user_compiler(gm) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/jansel/pytorch/torch/_dynamo/output_graph.py", line 
1463, in _call_user_compiler compiled_fn = compiler_fn(gm, self.example_inputs()) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/jansel/pytorch/torch/_dynamo/repro/after_dynamo.py", line 130, in __call__ compiled_gm = compiler_fn(gm, example_inputs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/jansel/pytorch/torch/__init__.py", line 2314, in __call__ return compile_fx(model_, inputs_, config_patches=self.config) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/jansel/pytorch/torch/_inductor/compile_fx.py", line 1880, in compile_fx return aot_autograd( ^^^^^^^^^^^^^ File "/home/jansel/pytorch/torch/_dynamo/backends/common.py", line 83, in __call__ cg = aot_module_simplified(gm, example_inputs, self.kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/jansel/pytorch/torch/_functorch/aot_autograd.py", line 1145, in aot_module_simplified compiled_fn = AOTAutogradCache.load( ^^^^^^^^^^^^^^^^^^^^^^ File "/home/jansel/pytorch/torch/_functorch/_aot_autograd/autograd_cache.py", line 754, in load compiled_fn = dispatch_and_compile() ^^^^^^^^^^^^^^^^^^^^^^ File "/home/jansel/pytorch/torch/_functorch/aot_autograd.py", line 1131, in dispatch_and_compile compiled_fn, _ = create_aot_dispatcher_function( ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/jansel/pytorch/torch/_functorch/aot_autograd.py", line 580, in create_aot_dispatcher_function return _create_aot_dispatcher_function( ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/jansel/pytorch/torch/_functorch/aot_autograd.py", line 830, in _create_aot_dispatcher_function compiled_fn, fw_metadata = compiler_fn( ^^^^^^^^^^^^ File "/home/jansel/pytorch/torch/_functorch/_aot_autograd/jit_compile_runtime_wrappers.py", line 676, in aot_dispatch_autograd compiled_fw_func = aot_config.fw_compiler(fw_module, adjusted_flat_args) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/jansel/pytorch/torch/_functorch/aot_autograd.py", line 489, in __call__ return self.compiler_fn(gm, 
example_inputs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/jansel/pytorch/torch/_inductor/compile_fx.py", line 1758, in fw_compiler_base return inner_compile( ^^^^^^^^^^^^^^ File "/home/jansel/pytorch/torch/_inductor/compile_fx.py", line 572, in compile_fx_inner return wrap_compiler_debug(_compile_fx_inner, compiler_name="inductor")( ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/jansel/pytorch/torch/_dynamo/repro/after_aot.py", line 102, in debug_wrapper inner_compiled_fn = compiler_fn(gm, example_inputs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/jansel/pytorch/torch/_inductor/compile_fx.py", line 686, in _compile_fx_inner mb_compiled_graph = fx_codegen_and_compile( ^^^^^^^^^^^^^^^^^^^^^^^ File "/home/jansel/pytorch/torch/_inductor/compile_fx.py", line 1129, in fx_codegen_and_compile return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/jansel/pytorch/torch/_inductor/compile_fx.py", line 1044, in codegen_and_compile compiled_fn = graph.compile_to_module().call ^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/jansel/pytorch/torch/_inductor/graph.py", line 1975, in compile_to_module return self._compile_to_module() ^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/jansel/pytorch/torch/_inductor/graph.py", line 1981, in _compile_to_module self.codegen_with_cpp_wrapper() if self.cpp_wrapper else self.codegen() ^^^^^^^^^^^^^^ File "/home/jansel/pytorch/torch/_inductor/graph.py", line 1916, in codegen self.scheduler.codegen() File "/home/jansel/pytorch/torch/_inductor/scheduler.py", line 3667, in codegen return self._codegen() ^^^^^^^^^^^^^^^ File "/home/jansel/pytorch/torch/_inductor/scheduler.py", line 3761, in _codegen if device is not None and self.get_backend(device).ready_to_flush(): ^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/jansel/pytorch/torch/_inductor/scheduler.py", line 3631, in get_backend self.backends[device] = 
self.create_backend(device) ^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/jansel/pytorch/torch/_inductor/scheduler.py", line 3624, in create_backend raise TritonMissing(inspect.currentframe()) torch._dynamo.exc.TritonMissing: Cannot find a working triton installation. Either the package is not installed or it is too old. More information on installing Triton can be found at: https://github.com/triton-lang/triton You can suppress this exception and fall back to eager by setting: import torch._dynamo torch._dynamo.config.suppress_errors = True ``` This PR also strips dynamo stack frames from other types of backend compile errors. Pull Request resolved: https://github.com/pytorch/pytorch/pull/143552 Approved by: https://github.com/yanboliang	2024-12-24 21:48:23 +00:00
Animesh Jain	7d1c666139	[dynamo] Remove dead code after introducing UserDefinedDictVariable (#143699 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/143699 Approved by: https://github.com/williamwen42, https://github.com/yanboliang, https://github.com/jansel ghstack dependencies: #143722	2024-12-24 02:00:18 +00:00
Animesh Jain	fe95cbe018	[dynamo] Remove DICT_SUBCLASS_GUARD_MANAGER and use dict.keys (#143722 ) In hindsight, we never needed a DICT_SUBCLASS_GUARD_MANAGER, because Dynamo would inline through the overridden keys method. In this PR, we ensure that while creating guards and constructing variable trackers, we get the `d.keys()` value by using `dict.keys(d)`. This ensures that we do not call the overridden keys method. Therefore, the C++ guard can use `PyDict_Next` directly to check the guards. Pull Request resolved: https://github.com/pytorch/pytorch/pull/143722 Approved by: https://github.com/jansel	2024-12-24 02:00:18 +00:00
Sam Larsen	4271a95590	[logging] A few fixes/updates to record_compilation_metrics (#143332 ) Summary: Mostly cosmetic, but one bug fix: * Bug fix: Make sure compile_id is converted to a string in the compilation metrics so it's printed as, e.g., "0/1" instead of "[0, 1]" * Sort collections in `collection_to_str` * Print non-string elements as `"<unknown>"` instead of None (since we don't expect non-strings) * Move the population of the legacy metrics and any pre-processing to a new factory method in CompilationMetrics Test Plan: ``` python test/dynamo/test_structured_trace.py python test/dynamo/test_utils.py ``` Internal testing: https://fburl.com/scuba/dynamo_compile/sandbox/l0me8auf Pull Request resolved: https://github.com/pytorch/pytorch/pull/143332 Approved by: https://github.com/ppanchalia	2024-12-23 23:10:11 +00:00
Aaron Orenstein	06b4b96b34	dynamo tracing perf: no re in arg_ref: 33.9 -> 33.7 (#143069 ) See #143056 for overall docs. This PR: Avoid use of python re and move valid varname check in `GuardBuilder.arg_ref()` into C++ Pull Request resolved: https://github.com/pytorch/pytorch/pull/143069 Approved by: https://github.com/jansel	2024-12-23 05:32:09 +00:00
Oguz Ulgen	dc55704b48	Rename cache limit to recompile limit in configs (#143709 ) This PR renames every cache_limit to recompile_limit via sed. Old config options are maintained via Config(alias='xyz') Pull Request resolved: https://github.com/pytorch/pytorch/pull/143709 Approved by: https://github.com/jansel	2024-12-22 10:03:57 +00:00
Aaron Orenstein	9bf4b1c2e9	dynamo tracing perf: c++ strip_function_call: 49.12 -> 47.77 (#143063 ) See #143056 for overall docs. This PR: Convert `strip_function_call()` into C++ Pull Request resolved: https://github.com/pytorch/pytorch/pull/143063 Approved by: https://github.com/jansel ghstack dependencies: #143057, #143062	2024-12-22 06:38:46 +00:00
Aaron Orenstein	3ec04d30d5	dynamo tracing perf: kill import: 50.36 -> 49.12 (#143062 ) See #143056 for overall docs. This PR: Stop importing in the body of `BuiltinVariable.call_getattr()` Pull Request resolved: https://github.com/pytorch/pytorch/pull/143062 Approved by: https://github.com/jansel ghstack dependencies: #143057	2024-12-22 06:38:46 +00:00
Aaron Orenstein	f2b744b9ca	dynamo tracing perf: import_module: 59.92 -> 52.9 (#143057 ) See #143056 for overall docs. This PR: Using `importlib.import_module()` within the hot path of symbolic_convert is slow. Memoize it. Pull Request resolved: https://github.com/pytorch/pytorch/pull/143057 Approved by: https://github.com/jansel	2024-12-22 06:38:38 +00:00
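A minimal sketch of memoizing `importlib.import_module()`, assuming a simple `functools.lru_cache` wrapper. The name `cached_import_module` is hypothetical, and the actual change in the PR may be structured differently.

```python
import functools
import importlib
from types import ModuleType


@functools.lru_cache(maxsize=None)
def cached_import_module(name: str) -> ModuleType:
    """Hypothetical helper: resolve a module by name once, then reuse it."""
    # importlib.import_module already consults sys.modules, but it still does
    # name handling and locking on every call; the cache skips all of that.
    return importlib.import_module(name)


first = cached_import_module("json")
second = cached_import_module("json")  # served from the cache
```

The trade-off is that a cached entry keeps pointing at the original module object even if `sys.modules` is later modified, which is usually acceptable on a hot path.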
Tom Ritchford	f1cbf4b1b5	Enable ruff's unused variable checking everywhere in pytorch (#136965 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/136965 Approved by: https://github.com/cyyever, https://github.com/albanD	2024-12-22 02:33:11 +00:00
Simon Fan	a8953c36f5	[compiled autograd] log compilation time to perfetto (#140964 ) https://manifold.edge.x2p.facebook.net/v0/read/tree/logs/.tmprli4iy/index.html?bucketName=tlparse_reports&apiKey=tlparse_reports-key&withPayload=1&timeoutMsec=100 ``` [ { "args": { "compile_id": "0/-/-", "graph_id": 0 }, "cat": "dynamo_timed", "name": "compiled_autograd", "ph": "B", "pid": 0, "tid": 0, "ts": 1733886868992655.8 }, { "args": { "compile_id": "0/-/-", "graph_id": 0 }, "cat": "dynamo_timed", "name": "compiled_autograd", "ph": "E", "pid": 0, "tid": 0, "ts": 1733886869130681.0 }, { "args": { "compile_id": "0/0/0" }, "cat": "dynamo_timed", "name": "dynamo", "ph": "B", "pid": 0, "tid": 0, "ts": 1733886869134350.5 }, { ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/140964 Approved by: https://github.com/masnesral ghstack dependencies: #141907, #143175	2024-12-21 04:23:25 +00:00
Animesh Jain	0da004f3dd	[dynamo] Remove transformers ModelOutput hack (#143567 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/143567 Approved by: https://github.com/williamwen42, https://github.com/jansel ghstack dependencies: #143548	2024-12-21 01:46:14 +00:00
Animesh Jain	4627cfd1f9	[dynamo] Support user defined dicts (#143548 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/143548 Approved by: https://github.com/yanboliang, https://github.com/jansel, https://github.com/williamwen42	2024-12-21 01:46:14 +00:00
Simon Fan	ffd1b53f26	[aot] refactor dynamo source and cudagraphs static idx logic (#141748 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/141748 Approved by: https://github.com/ezyang	2024-12-21 01:20:53 +00:00
Simon Fan	d88ebbf822	cleanup chromium event log on dynamo exit rather than on entry (#143175 ) clearing at dynamo start is an issue because it throws away events from compiled autograd Pull Request resolved: https://github.com/pytorch/pytorch/pull/143175 Approved by: https://github.com/Skylion007, https://github.com/jamesjwu ghstack dependencies: #141907	2024-12-21 00:41:24 +00:00
Simon Fan	4ee166b82f	[ca] add compiled autograd to CompileId (#141907 ) tlparse PR: https://github.com/ezyang/tlparse/pull/83 Pull Request resolved: https://github.com/pytorch/pytorch/pull/141907 Approved by: https://github.com/ezyang	2024-12-21 00:41:24 +00:00
PyTorch MergeBot	ad7ab5ef84	Revert "[logging] A few fixes/updates to record_compilation_metrics (#143332 )" This reverts commit `a9c753bbc8`. Reverted https://github.com/pytorch/pytorch/pull/143332 on behalf of https://github.com/malfet due to Surprisingly failure is caused by this PR ([comment](https://github.com/pytorch/pytorch/pull/143332#issuecomment-2557899120))	2024-12-21 00:06:44 +00:00
Sam Larsen	a9c753bbc8	[logging] A few fixes/updates to record_compilation_metrics (#143332 ) Summary: Mostly cosmetic, but one bug fix: * Bug fix: Make sure compile_id is converted to a string in the compilation metrics so it's printed as, e.g., "0/1" instead of "[0, 1]" * Sort collections in `collection_to_str` * Print non-string elements as `"<unknown>"` instead of None (since we don't expect non-strings) * Move the population of the legacy metrics and any pre-processing to a new factory method in CompilationMetrics Test Plan: ``` python test/dynamo/test_structured_trace.py python test/dynamo/test_utils.py ``` Internal testing: https://fburl.com/scuba/dynamo_compile/sandbox/l0me8auf Pull Request resolved: https://github.com/pytorch/pytorch/pull/143332 Approved by: https://github.com/ppanchalia	2024-12-20 21:42:32 +00:00
Colin L. Rice	a94f259a69	pgo: Log feature use (#142819 ) This will cause dynamo_compile to populate the feature column if we have a hit for PGO. Pull Request resolved: https://github.com/pytorch/pytorch/pull/142819 Approved by: https://github.com/ezyang	2024-12-20 20:22:20 +00:00
Aaron Orenstein	8ce0bc282a	dynamo tracing perf: bytecode_transform improvements: 34.86 -> 33.9 (#143068 ) See #143056 for overall docs. This PR: Use slots on InstructionExnTabEntry and Instruction. Stop doing python version checks in the middle of `convert_instruction()` and `inst_has_op_bits()`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/143068 Approved by: https://github.com/jansel ghstack dependencies: #143065, #143067	2024-12-20 20:06:42 +00:00
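The effect of using slots can be illustrated with two small classes. Both `PlainInstruction` and `SlottedInstruction` are hypothetical stand-ins for this sketch; they are not the real `Instruction` class from `bytecode_transformation.py`.

```python
class PlainInstruction:
    """Hypothetical stand-in: attributes live in a per-instance __dict__."""

    def __init__(self, opcode: int, arg: int) -> None:
        self.opcode = opcode
        self.arg = arg


class SlottedInstruction:
    """Hypothetical stand-in: fixed attribute layout, no per-instance __dict__."""

    __slots__ = ("opcode", "arg")

    def __init__(self, opcode: int, arg: int) -> None:
        self.opcode = opcode
        self.arg = arg


plain = PlainInstruction(100, 1)
slotted = SlottedInstruction(100, 1)

has_dict_plain = hasattr(plain, "__dict__")
has_dict_slotted = hasattr(slotted, "__dict__")

# Slotted instances reject attributes that were not declared up front.
try:
    slotted.extra = 0
    rejected = False
except AttributeError:
    rejected = True
```

Dropping the per-instance `__dict__` saves memory and makes attribute access slightly cheaper, which matters when many instruction objects are created per traced frame.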
Aaron Orenstein	5feb2d7b41	dynamo tracing perf: don't call expensive _set_guard_export_info if it's a duplicate guard: 37.66 -> 34.86 (#143067 ) See #143056 for overall docs. This PR: Move the call to `_set_guard_export_info()` after the duplicate guard check in `GuardBuilder.DUPLICATE_INPUT()` Pull Request resolved: https://github.com/pytorch/pytorch/pull/143067 Approved by: https://github.com/jansel ghstack dependencies: #143065	2024-12-20 20:06:42 +00:00
Aaron Orenstein	7d4e7fbfc1	dynamo tracing perf: no import on hot path: 47.62 -> 47.26 (#143065 ) See #143056 for overall docs. This PR: Removed another `import` in the body of the hot path. Pull Request resolved: https://github.com/pytorch/pytorch/pull/143065 Approved by: https://github.com/jansel	2024-12-20 20:06:42 +00:00
Nikhil Gupta	94737e8a2a	[ARM][feat]: Add 4 bit dynamic quantization matmuls & KleidiAI Backend (#134124 ) Description: 1. Quantize Linear Layer Weights to 4-bits: Quantize the weights of the Linear layer to 4 bits, using symmetric quantization. Pack two 4-bit weights into one uint8 container. Choose a quantization scheme (channel-wise or group-wise), with the group size being a multiple of 32. 2. Prepare Quantized Weights, Scales, and Optional Bias: After quantizing, obtain the quantized_weights, scales, and groupsize. If the original Linear layer has a bias, prepare it as well. 3. Pack the Weights Efficiently: Use torch.ops.aten._dyn_quant_pack_4bit_weight to optimally pack the weights, scales, and optional bias. ```python packed_weights = torch.ops.aten._dyn_quant_pack_4bit_weight(weight, scales_and_zeros, bias, groupsize, in_features, out_features) ``` Input parameters should include: in_features and out_features (the same as the Linear layer’s corresponding parameters). 4. Perform Dynamic Quantized Matrix Multiplication: Use torch.ops.aten._dyn_quant_matmul_4bit to perform matrix multiplication with quantized weights. ```python output = torch.ops.aten._dyn_quant_matmul_4bit(input, packed_weights, groupsize, in_features, out_features) ``` Inputs required include: The input tensor, packed_weights , groupsize, and the in_features and out_features. 
API Usage: https://github.com/pytorch/pytorch/issues/143289 Model Perf : 7B Transformer model: Prefill : 340 t/s Decode : 40 t/s 2B Transformer model Prefill : 747 t/s Decode : 80 t/s Tests: python test/test_linalg.py -k test__dyn_quant_pack_4bit_weight Ran 1 test in 0.016s OK python test/test_linalg.py -k test__dyn_quant_matmul_4bit Ran 8 tests in 0.077s OK python test/test_linalg.py -k test_compile_dyn_quant_matmul_4bit Ran 8 tests in 11.454s Change-Id: Ia1672bad5e6ec94e64d8bb1971395d60f4b3a452 Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/134124 Approved by: https://github.com/digantdesai, https://github.com/malfet	2024-12-20 19:32:03 +00:00
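Step 1 of the description above (symmetric 4-bit quantization, two values per uint8) can be sketched in pure Python. This is an illustration under stated assumptions, not the KleidiAI or ATen packing layout: it uses a single scale for the whole list instead of channel-wise or group-wise scales, stores the low nibble first, and all function names are hypothetical.

```python
def quantize_symmetric_4bit(weights):
    """Map floats to integers in [-8, 7] using one shared scale (assumption)."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 7 if max_abs else 1.0
    quantized = [max(-8, min(7, round(w / scale))) for w in weights]
    return quantized, scale


def pack_nibbles(quantized):
    """Pack pairs of 4-bit values into bytes, low nibble first (assumption)."""
    if len(quantized) % 2:
        raise ValueError("need an even number of values to pack in pairs")
    packed = bytearray()
    for low, high in zip(quantized[0::2], quantized[1::2]):
        # Shift [-8, 7] to [0, 15] so each value fits in an unsigned nibble.
        packed.append(((high + 8) << 4) | (low + 8))
    return bytes(packed)


def unpack_nibbles(packed):
    """Inverse of pack_nibbles."""
    values = []
    for byte in packed:
        values.append((byte & 0x0F) - 8)
        values.append((byte >> 4) - 8)
    return values


weights = [0.7, -0.35, 0.0, 0.1]
quantized, scale = quantize_symmetric_4bit(weights)
packed = pack_nibbles(quantized)
restored = unpack_nibbles(packed)
```

Four weights occupy two bytes after packing, and multiplying the unpacked integers by `scale` gives the dequantized approximation of the original weights.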
bobrenjc93	4f8b7c4272	Revert "refactor tensorify restart logic to use sources (#141517 )" (#143623 ) This reverts commit `30d8b30db7`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/143623 Approved by: https://github.com/mlazos	2024-12-20 15:38:34 +00:00
Guilherme Leobas	1c817fe671	Set `enable_trace_contextlib_contextmanager` flag to True (#140604 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/140604 Approved by: https://github.com/zou3519 ghstack dependencies: #136033	2024-12-20 12:02:27 +00:00
Guilherme Leobas	673cc88fd6	Add support for `contextmanager` in Dynamo (#136033 ) Fixes #130559 * Intro This PR adds support for `@contextmanager` in Dynamo. We chose to limit the scope of this work to only `@contextmanager` and plan to handle generators fully in #141055 (still in draft). * Motivation Dynamo lacks support for generator functions. When it encounters one, it traces it as if it were a regular function. This is problematic because it can lead to incorrect behavior. To illustrate, consider the test case below: ```python import torch import contextlib @contextlib.contextmanager def set_default_dtype(dtype): old_dtype = torch.get_default_dtype() try: torch.set_default_dtype(dtype) yield finally: torch.set_default_dtype(old_dtype) @torch.compile(backend="eager", fullgraph=True) def fn(): with set_default_dtype(torch.float64): x = torch.tensor([3.0, 3.0 + 5.0j]) return x ``` Before this work, Dynamo would not stop at the `yield`, and the graph produced would contain both calls to `set_default_dtype` executed one after the other. This is incorrect because the context manager should execute code before and after the `yield`. * List of changes `YIELD_VALUE` now raises an exception (`YieldValueOp`) to signal that control flow must be suspended and returned to the caller. Additionally, `RETURN_VALUE` behaves differently in a generator function. Unlike regular functions, where `RETURN_VALUE` indicates the final result, in generators it signifies that the generator is exhausted and implicitly raises `StopIteration`. A new `VariableTracker` named `FunctionDecoratedByContextlibContextManagerVariable` was introduced to handle `@contextmanager`. This variable tracker acts not just as a wrapper for the original function but also maintains an internal `tx` (InstructionTranslator) object to suspend and return control flow to the parent tracer when a `yield` is encountered. * Corner cases Returning a context manager from a compiled function is not supported. 
This would require PyTorch to synchronize the generator state between Dynamo and the interpreter. Any attempt to return it will result in an `IncorrectUsage` exception. Graph breaks require special handling as well. In the event of a graph break, the frame associated with the context manager is skipped, and the context manager runs in eager mode. * This PR is breaking my code There is a configuration flag (`enable_trace_contextlib`) that can be set to `False` to disable tracing context managers. If this still causes crashes, please revert this PR. Pull Request resolved: https://github.com/pytorch/pytorch/pull/136033 Approved by: https://github.com/zou3519	2024-12-20 12:02:20 +00:00
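The ordering that the Motivation section depends on (setup before the `yield`, teardown after the `with` body) can be checked without PyTorch. The sketch below replaces the default-dtype calls with a hypothetical `state` dict and an `events` log; it only demonstrates the eager Python semantics that tracing has to preserve.

```python
import contextlib

events = []                     # hypothetical log of what ran, in order
state = {"dtype": "float32"}    # hypothetical stand-in for the default dtype


@contextlib.contextmanager
def set_dtype(dtype):
    old = state["dtype"]
    try:
        state["dtype"] = dtype
        events.append(("enter", state["dtype"]))
        yield  # control returns to the body of the with-statement here
    finally:
        state["dtype"] = old
        events.append(("exit", state["dtype"]))


with set_dtype("float64"):
    events.append(("body", state["dtype"]))
```

If the generator were traced as a regular function, both assignments to `state["dtype"]` would run back to back before the body, so the body would observe `"float32"` instead of `"float64"`.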
Michael Lazos	270ad513c8	[Dynamo] only import einops if version is lower than 0.7.0 (#142847 ) Fixes internal xref (https://fb.workplace.com/groups/257735836456307/posts/804793021750583/?comment_id=805229281706957&reply_comment_id=805232695039949) Pull Request resolved: https://github.com/pytorch/pytorch/pull/142847 Approved by: https://github.com/zou3519	2024-12-20 07:46:49 +00:00
Michael Lazos	fd23cf5848	[Dynamo] check node class first for graph dedup (#143609 ) as title Pull Request resolved: https://github.com/pytorch/pytorch/pull/143609 Approved by: https://github.com/williamwen42	2024-12-20 04:09:46 +00:00
Jane Xu	a0cff096bc	Improve cond error messaging (#143595 ) Discovered by @drisspg and me while trying out a simple toy example and being way too confused :') Pull Request resolved: https://github.com/pytorch/pytorch/pull/143595 Approved by: https://github.com/zou3519, https://github.com/ydwu4	2024-12-20 01:19:20 +00:00
PyTorch MergeBot	8136daff5a	Revert "[ARM][feat]: Add 4 bit dynamic quantization matmuls & KleidiAI Backend (#134124 )" This reverts commit `4b82251011`. Reverted https://github.com/pytorch/pytorch/pull/134124 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it breaks lots of internal build ([comment](https://github.com/pytorch/pytorch/pull/134124#issuecomment-2555953189))	2024-12-19 23:33:17 +00:00
PyTorch MergeBot	145fd5bad0	Revert "[Dynamo] only import einops if version is lower than 0.7.0 (#142847 )" This reverts commit `a96387a481`. Reverted https://github.com/pytorch/pytorch/pull/142847 on behalf of https://github.com/huydhn due to This has been reverted internally D67436053 ([comment](https://github.com/pytorch/pytorch/pull/142847#issuecomment-2555942351))	2024-12-19 23:22:44 +00:00
bobrenjc93	8850a7b62c	add some logging for tensorify (#143391 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/143391 Approved by: https://github.com/jamesjwu	2024-12-19 20:06:26 +00:00
Yanbo Liang	c46cfc245f	[Dynamo] Support dict_keys from nested dict object (#143557 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/143557 Approved by: https://github.com/williamwen42 ghstack dependencies: #143374, #143547	2024-12-19 19:02:55 +00:00
Yanbo Liang	5fa287aa82	[Dynamo] Rename Dict{View/Keys/Values} to Dict{View/Keys/Values}Variable (#143547 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/143547 Approved by: https://github.com/williamwen42 ghstack dependencies: #143374	2024-12-19 19:02:55 +00:00
Nikhil Gupta	4b82251011	[ARM][feat]: Add 4 bit dynamic quantization matmuls & KleidiAI Backend (#134124 ) Description: 1. Quantize Linear Layer Weights to 4-bits: Quantize the weights of the Linear layer to 4 bits, using symmetric quantization. Pack two 4-bit weights into one uint8 container. Choose a quantization scheme (channel-wise or group-wise), with the group size being a multiple of 32. 2. Prepare Quantized Weights, Scales, and Optional Bias: After quantizing, obtain the quantized_weights, scales, and groupsize. If the original Linear layer has a bias, prepare it as well. 3. Pack the Weights Efficiently: Use torch.ops.aten._dyn_quant_pack_4bit_weight to optimally pack the weights, scales, and optional bias. ```python packed_weights = torch.ops.aten._dyn_quant_pack_4bit_weight(weight, scales_and_zeros, bias, groupsize, in_features, out_features) ``` Input parameters should include: in_features and out_features (the same as the Linear layer’s corresponding parameters). 4. Perform Dynamic Quantized Matrix Multiplication: Use torch.ops.aten._dyn_quant_matmul_4bit to perform matrix multiplication with quantized weights. ```python output = torch.ops.aten._dyn_quant_matmul_4bit(input, packed_weights, groupsize, in_features, out_features) ``` Inputs required include: The input tensor, packed_weights , groupsize, and the in_features and out_features. 
API Usage: https://github.com/pytorch/pytorch/issues/143289 Model Perf : 7B Transformer model: Prefill : 340 t/s Decode : 40 t/s 2B Transformer model Prefill : 747 t/s Decode : 80 t/s Tests: python test/test_linalg.py -k test__dyn_quant_pack_4bit_weight Ran 1 test in 0.016s OK python test/test_linalg.py -k test__dyn_quant_matmul_4bit Ran 8 tests in 0.077s OK python test/test_linalg.py -k test_compile_dyn_quant_matmul_4bit Ran 8 tests in 11.454s Change-Id: Ia1672bad5e6ec94e64d8bb1971395d60f4b3a452 Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/134124 Approved by: https://github.com/digantdesai, https://github.com/malfet	2024-12-19 18:51:26 +00:00
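Step 1 of #134124 (packing two 4-bit weights into one uint8 container) can be illustrated with a minimal pure-Python sketch. The helper names `pack_int4_pairs` and `unpack_int4_pairs`, the +8 storage offset, and the low-nibble-first ordering are assumptions made for illustration only; the commit message does not state which layout `_dyn_quant_pack_4bit_weight` actually uses.

```python
def pack_int4_pairs(values):
    """Pack signed 4-bit integers (range -8..7) two per byte.

    Assumption for illustration: each value is stored with a +8 offset and
    the first value of every pair goes into the low nibble.
    """
    if len(values) % 2 != 0:
        raise ValueError("need an even number of 4-bit values")
    packed = bytearray()
    for lo, hi in zip(values[0::2], values[1::2]):
        for v in (lo, hi):
            if not -8 <= v <= 7:
                raise ValueError(f"{v} does not fit in a signed 4-bit value")
        packed.append(((hi + 8) << 4) | (lo + 8))
    return bytes(packed)


def unpack_int4_pairs(packed):
    """Inverse of pack_int4_pairs: recover the signed 4-bit values."""
    values = []
    for byte in packed:
        values.append((byte & 0x0F) - 8)  # low nibble holds the first value
        values.append((byte >> 4) - 8)    # high nibble holds the second value
    return values


weights = [-8, 7, 0, 3]
packed = pack_int4_pairs(weights)      # 4 weights fit in 2 bytes
restored = unpack_int4_pairs(packed)   # round-trips to the original list
```

The point of the sketch is only the storage saving: four quantized weights occupy two bytes, and unpacking recovers them exactly.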
William Wen	e1e83015d2	[dynamo, 3.13t] raise error if torch.compile is attempted in 3.13t (nogil) (#143404 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/143404 Approved by: https://github.com/colesbury, https://github.com/atalman	2024-12-19 18:10:01 +00:00
bobrenjc93	171e6a934f	Don't 1 specialize if stride is contiguous (#143365 ) Fixes: https://github.com/pytorch/pytorch/issues/142024 Pull Request resolved: https://github.com/pytorch/pytorch/pull/143365 Approved by: https://github.com/ezyang	2024-12-19 15:22:47 +00:00
Animesh Jain	465f282a24	[reland][dynamo][guards] Consider tensors as immutable for dict tag matches (#141085 ) Reland - https://github.com/pytorch/pytorch/pull/139560 As mentioned in https://github.com/pytorch/pytorch/pull/130341, using `static py::object` can lead to segfaults. I suspect this is the reason for the import system error seen internally (https://www.internalfb.com/sevmanager/view/469592). In this PR, I am removing the `static` part. This is fine and also the right thing to do because this will catch if user changes the flag in the same process for compiling two different functions. Unfortunately, there is no easy way to trigger this segfault, so I can't write a test. Pull Request resolved: https://github.com/pytorch/pytorch/pull/141085 Approved by: https://github.com/jansel Co-authored-by: William Wen <williamwen@meta.com>	2024-12-19 15:16:10 +00:00
Aaron Orenstein	da06d47bdb	dynamo tracing perf: slight improvement on __instancecheck__: 47.77 -> 47.62 (#143064 ) See #143056 for overall docs. This PR: Switch out an `isinstance()` for an `is` in the very hot `VariableTrackerMeta.__instancecheck__`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/143064 Approved by: https://github.com/ezyang, https://github.com/jansel	2024-12-19 09:19:35 +00:00
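The idea behind #143064 (replacing an `isinstance()` call with an `is` comparison inside a hot `__instancecheck__`) can be sketched as follows. The class names `LazyWrapper`, `TrackerMeta`, `Tracker`, and `ConstantTracker` are hypothetical stand-ins, not Dynamo's real classes, and the sketch assumes the wrapper class is never subclassed, which is what makes the exact-type identity check safe.

```python
class LazyWrapper:
    """Hypothetical wrapper that hides an already-built object."""

    def __init__(self, realized):
        self.realized = realized


class TrackerMeta(type):
    def __instancecheck__(cls, instance):
        # An identity comparison on the exact type avoids a second
        # isinstance() dispatch on every call of this hot method.
        if type(instance) is LazyWrapper:
            instance = instance.realized
        return type.__instancecheck__(cls, instance)


class Tracker(metaclass=TrackerMeta):
    pass


class ConstantTracker(Tracker):
    pass


wrapped = LazyWrapper(ConstantTracker())
wrapped_is_constant = isinstance(wrapped, ConstantTracker)
wrapped_base_is_constant = isinstance(LazyWrapper(Tracker()), ConstantTracker)
```

Here `wrapped_is_constant` is `True` because the check looks through the wrapper, while `wrapped_base_is_constant` is `False`.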
Yanbo Liang	2ffdcab04c	[Dynamo] Add DictKeySetVariable to capture dict_keys passed outside of compiled region (#143374 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/143374 Approved by: https://github.com/williamwen42, https://github.com/jansel	2024-12-19 06:39:27 +00:00
PyTorch MergeBot	14fe1f7190	Revert "[ARM][feat]: Add 4 bit dynamic quantization matmuls & KleidiAI Backend (#134124 )" This reverts commit `d3ff2d42c2`. Reverted https://github.com/pytorch/pytorch/pull/134124 on behalf of https://github.com/malfet due to This broke S390 builds, includes cpuinfo unconditionally ([comment](https://github.com/pytorch/pytorch/pull/134124#issuecomment-2552560208))	2024-12-19 01:05:11 +00:00
Michael Lazos	5c3996cab2	[Dynamo] topologically sort duplicated graph regions (#143523 ) Ensure regions are topologically sorted Pull Request resolved: https://github.com/pytorch/pytorch/pull/143523 Approved by: https://github.com/williamwen42	2024-12-19 00:43:48 +00:00
Michael Lazos	4eafbe5288	[Dynamo] Flatten slices during graph deduplication (#143522 ) I encountered this issue while debugging torchtune - overall we need to make sure to not miss nodes that are slice arguments. Pull Request resolved: https://github.com/pytorch/pytorch/pull/143522 Approved by: https://github.com/williamwen42	2024-12-18 23:12:34 +00:00
Ryan Guo	5380407af5	[dynamo] Properly model root frame globals during inlining (#143447 ) This patch updates `InliningInstructionTranslator.STORE_GLOBAL` to properly check whether `self.f_globals` is the same as root frame `f_globals`. See added comments for why this is important. Fixes #143425. Pull Request resolved: https://github.com/pytorch/pytorch/pull/143447 Approved by: https://github.com/zou3519	2024-12-18 23:04:02 +00:00
Nikhil Gupta	d3ff2d42c2	[ARM][feat]: Add 4 bit dynamic quantization matmuls & KleidiAI Backend (#134124 ) Description: 1. Quantize Linear Layer Weights to 4-bits: Quantize the weights of the Linear layer to 4 bits, using symmetric quantization. Pack two 4-bit weights into one uint8 container. Choose a quantization scheme (channel-wise or group-wise), with the group size being a multiple of 32. 2. Prepare Quantized Weights, Scales, and Optional Bias: After quantizing, obtain the quantized_weights, scales, and groupsize. If the original Linear layer has a bias, prepare it as well. 3. Pack the Weights Efficiently: Use torch.ops.aten._dyn_quant_pack_4bit_weight to optimally pack the weights, scales, and optional bias. ```python packed_weights = torch.ops.aten._dyn_quant_pack_4bit_weight(weight, scales_and_zeros, bias, groupsize, in_features, out_features) ``` Input parameters should include: in_features and out_features (the same as the Linear layer’s corresponding parameters). 4. Perform Dynamic Quantized Matrix Multiplication: Use torch.ops.aten._dyn_quant_matmul_4bit to perform matrix multiplication with quantized weights. ```python output = torch.ops.aten._dyn_quant_matmul_4bit(input, packed_weights, groupsize, in_features, out_features) ``` Inputs required include: The input tensor, packed_weights , groupsize, and the in_features and out_features. 
API Usage: https://github.com/pytorch/pytorch/issues/143289 Model Perf : 7B Transformer model: Prefill : 340 t/s Decode : 40 t/s 2B Transformer model Prefill : 747 t/s Decode : 80 t/s Tests: python test/test_linalg.py -k test__dyn_quant_pack_4bit_weight Ran 1 test in 0.016s OK python test/test_linalg.py -k test__dyn_quant_matmul_4bit Ran 8 tests in 0.077s OK python test/test_linalg.py -k test_compile_dyn_quant_matmul_4bit Ran 8 tests in 11.454s Change-Id: Ia1672bad5e6ec94e64d8bb1971395d60f4b3a452 Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/134124 Approved by: https://github.com/digantdesai, https://github.com/malfet	2024-12-18 22:30:07 +00:00
Yidi Wu	1e201422ed	[export] add is_exporting flag (#142425 ) We added an is_export flag under torch.compiler.is_exporting. This comes in handy when we need special logic at the user level or the system level (e.g. higher up in the stack). In increasing scope: - `_is_fx_tracing` is set to True when we are under symbolic_trace or make_fx. - `is_exporting` is set to True when we're doing strict or non-strict export, which internally has a step that calls make_fx and sets _is_fx_tracing to True. - `is_compiling` is set to True when we're doing strict export, non-strict export, or torch.compile. Pull Request resolved: https://github.com/pytorch/pytorch/pull/142425 Approved by: https://github.com/avikchaudhuri	2024-12-18 21:36:28 +00:00
qiurc	90cc43f270	Support garbage collection after pt2 compilation (#143364 ) Summary: Support garbage collection after pt2 compilation. Add a jk to control the global rollout / rollback of this functionality. Add an env var to control an individual job's rollout. Test Plan: Test the model training job with and without these changes. Reviewers: @yuxihu, @ezyang, @Yuzhen11 Pull Request resolved: https://github.com/pytorch/pytorch/pull/143364 Approved by: https://github.com/ezyang	2024-12-18 07:25:11 +00:00
Michael Lazos	a96387a481	[Dynamo] only import einops if version is lower than 0.7.0 (#142847 ) Fixes internal xref (https://fb.workplace.com/groups/257735836456307/posts/804793021750583/?comment_id=805229281706957&reply_comment_id=805232695039949) Pull Request resolved: https://github.com/pytorch/pytorch/pull/142847 Approved by: https://github.com/zou3519	2024-12-17 20:50:25 +00:00
William Wen	18261e9f39	[dynamo] implement framelocals mapping as c++ object (#140063 ) Implements https://github.com/pytorch/pytorch/issues/93753 - move frame local guard accessors to C++. Before, we used dict accessors on a Python dict representing the frame's fastlocals that we manually build. We move this accessor to C++ and additionally use the fastlocal index whenever possible. Some implementation notes: - `FrameLocalsMapping` is now initialized as a C++ vector of `PyObject`s. We do not just use the frame's localsplus/fastlocals buffer because we also unbox cells. - `FrameLocalsMapping` can still be converted into a Python dict representing the frame's fastlocals, but it is done lazily. - We update `LeafGuard`, `GuardAccessor`, and `GuardManager`'s `check_nopybind` methods to accept `FrameLocalsMapping`. By default, we convert the `FrameLocalsMapping` to a Python dict and run the original `check_nopybind` on it, but in some cases, conversion is not needed. - We add a new guard accessor `FrameLocalsGuardAccessor`, which is similar to `DictGetItemGuardAccessor` but has special handling for `FrameLocalsMapping`. We create a separate class to emphasize different use cases, but we could probably combine these two (can do in a follow up) dynamo_guard_eval.py microbenchmark update: - 713.2us -> 630.0us (3.10) - 598.8us -> 530.7us (3.12) Other followups: - Add `FrameLocalsMapping` version for `check_verbose_nopybind` in order to match behavior between `check_nopybind` and `check_verbose_nopybind`. This can prevent difficult debugging situations where guards fail (`check_nopybind` returns false) but no guard error message is generated (`check_verbose_nopybind` succeeds). 
- Rewrite the `SHAPE_ENV` guard into C++ - it is a fairly common guard that results in `FrameLocalsMapping` needing to convert to a dict Pull Request resolved: https://github.com/pytorch/pytorch/pull/140063 Approved by: https://github.com/jansel ghstack dependencies: #142117, #142430	2024-12-17 18:54:27 +00:00
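The lazy-conversion design described in #140063 (index-based access on a flat vector, with a dict built only on demand) can be sketched in Python. The real `FrameLocalsMapping` is a C++ object; the class `FrameLocalsSketch`, its method names, and the `dict_conversions` counter below are hypothetical and exist only to make the laziness visible.

```python
class FrameLocalsSketch:
    """Hypothetical Python model of a lazily converted frame-locals mapping."""

    def __init__(self, names, values):
        self._names = list(names)
        self._values = list(values)
        self._dict = None          # built only when a dict is really needed
        self.dict_conversions = 0  # instrumentation for this sketch only

    def get_by_index(self, index):
        # Fast path: positional access, no dict is created.
        return self._values[index]

    def to_dict(self):
        # Slow path: convert once, then reuse the cached dict.
        if self._dict is None:
            self.dict_conversions += 1
            self._dict = dict(zip(self._names, self._values))
        return self._dict


frame_locals = FrameLocalsSketch(["x", "y"], [10, 20])
fast_value = frame_locals.get_by_index(1)
conversions_after_fast_path = frame_locals.dict_conversions
slow_value = frame_locals.to_dict()["x"]
frame_locals.to_dict()  # second call reuses the cached dict
conversions_after_slow_path = frame_locals.dict_conversions
```

Guards that can use the index never pay for the dict; guards that need the dict pay for it once.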
PyTorch MergeBot	e3d754419f	Revert "[reland][dynamo][guards] Consider tensors as immutable for dict tag matches (#141085 )" This reverts commit `1bf983077f`. Reverted https://github.com/pytorch/pytorch/pull/141085 on behalf of https://github.com/huydhn due to The diff D66211131 has been commandeered internally and it is not part of the train anymore. If codev is needed, pls reland this accordingly ([comment](https://github.com/pytorch/pytorch/pull/141085#issuecomment-2549092225))	2024-12-17 17:21:14 +00:00
Guilherme Leobas	487343346e	Prevent users from seeing hardcoded print stmt when hypothesis is not installed (#142398 ) Fixes: #142357 Pull Request resolved: https://github.com/pytorch/pytorch/pull/142398 Approved by: https://github.com/zou3519	2024-12-17 16:59:05 +00:00
PyTorch MergeBot	969b07b96f	Revert "[ROCm] CK Flash Attention Backend (#138947 )" This reverts commit `500d02921b`. Reverted https://github.com/pytorch/pytorch/pull/138947 on behalf of https://github.com/atalman due to Breaks default windows checkout ([comment](https://github.com/pytorch/pytorch/pull/138947#issuecomment-2548998359))	2024-12-17 16:46:57 +00:00
drisspg	5160a725c8	[FlexAttention] Fix broken eager tracing (#143344 ) Fixes #143331 Pull Request resolved: https://github.com/pytorch/pytorch/pull/143344 Approved by: https://github.com/Chillee ghstack dependencies: #143299	2024-12-17 09:42:36 +00:00
Andy Lugo	500d02921b	[ROCm] CK Flash Attention Backend (#138947 ) Replaces https://github.com/ROCm/pytorch/pull/1592 This PR contains the initial implementation of SDPA with composable_kernel backend. The CK path can be forced by simply calling `torch.backends.cuda.preferred_rocm_fa_library("ck")`. Similarly, you can force the incumbent aotriton implementation by passing in "aotriton" or "default". As you'd expect, not setting this option will result in aotriton to be used as the backend. In the case of CK, if pytorch deems flash attention usable, then it will use the CK path in all the same places aotriton would have been used. This PR makes no changes to the heuristics which select which attention scheme to use (i.e. flash attention vs memory efficient attention vs math etc etc). It only gets called when flash attention is both enabled (via `USE_FLASH_ATTENTION`) and is selected at runtime by the existing heuristics. Files located in pytorch/aten/src/ATen/native/transformers/hip/flash_attn/ck/mha* have been pulled from https://github.com/Dao-AILab/flash-attention courtesy of @tridao's hard work who is the co-author NOTE: In order to use this backend, the user MUST set USE_CK_FLASH_ATTENTION=1 in their environment when they build PyTorch. Pull Request resolved: https://github.com/pytorch/pytorch/pull/138947 Approved by: https://github.com/pruthvistony, https://github.com/xw285cornell, https://github.com/leitian Co-authored-by: Xiaodong Wang <xw285@cornell.edu>	2024-12-17 02:18:07 +00:00
William Wen	1b6b86fad7	[dynamo] disable eval frame callback around most of _TorchDynamoContext wrapper function (#143211 ) Internal xref: https://fb.workplace.com/groups/1075192433118967/permalink/1559636954674510/ If the `_fn` returned by `_TorchDynamoContext.__call__` makes an external function call, dynamo is recursively invoked. This can cause issues if there are added calls that are not skipped by Dynamo. So we should disable the eval frame callback as much as possible. Differential Revision: [D67211749](https://our.internmc.facebook.com/intern/diff/D67211749) Pull Request resolved: https://github.com/pytorch/pytorch/pull/143211 Approved by: https://github.com/jansel	2024-12-16 18:38:58 +00:00
Animesh Jain	1bf983077f	[reland][dynamo][guards] Consider tensors as immutable for dict tag matches (#141085 ) Reland - https://github.com/pytorch/pytorch/pull/139560 As mentioned in https://github.com/pytorch/pytorch/pull/130341, using `static py::object` can lead to segfaults. I suspect this is the reason for the import system error seen internally (https://www.internalfb.com/sevmanager/view/469592). In this PR, I am removing the `static` part. This is fine and also the right thing to do because this will catch if user changes the flag in the same process for compiling two different functions. Unfortunately, there is no easy way to trigger this segfault, so I can't write a test. Pull Request resolved: https://github.com/pytorch/pytorch/pull/141085 Approved by: https://github.com/jansel Co-authored-by: William Wen <williamwen@meta.com>	2024-12-16 18:38:32 +00:00
Edward Z. Yang	24f24eebde	Get rid of _lazy_import hack (#143213 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/143213 Approved by: https://github.com/aorenste, https://github.com/albanD	2024-12-14 03:46:21 +00:00
Simon Fan	cdc03f99b7	[ca] add graph id (#141906 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/141906 Approved by: https://github.com/jansel ghstack dependencies: #141919	2024-12-14 03:02:06 +00:00
Brian Hirsh	e19f493f02	add private config to temporarily preserve old FSDP guard behavior (#142871 ) Summary: https://github.com/pytorch/pytorch/pull/138819 wobbled dynamo guards in a way that caused some performance regression, so this PR temporarily adds a config to get the old behavior back while we investigate. Test Plan: CI Differential Revision: D67096751 Pull Request resolved: https://github.com/pytorch/pytorch/pull/142871 Approved by: https://github.com/yf225	2024-12-13 22:06:48 +00:00
Sam Larsen	60c54467db	[logging] Log runtime autotuning timing to scuba (#141919 ) See test plan in internal diff [D66679369](https://our.internmc.facebook.com/intern/diff/D66679369) Pull Request resolved: https://github.com/pytorch/pytorch/pull/141919 Approved by: https://github.com/jamesjwu, https://github.com/ezyang	2024-12-13 21:22:13 +00:00
Aaron Orenstein	6178be822d	dynamo tracing perf: direct Guard: 52.58 -> 51.76 (#143059 ) See #143056 for overall docs. This PR: Remove explicit constant check from `VariableBuilder.install_guards()` the args calling convention. Also remove a lambda binding. Pull Request resolved: https://github.com/pytorch/pytorch/pull/143059 Approved by: https://github.com/williamwen42, https://github.com/jansel ghstack dependencies: #143066, #143056, #143058	2024-12-13 18:20:48 +00:00
Aaron Orenstein	6bcda3a21a	dynamo tracing perf: cache on import_source: 52.9 -> 52.58 (#143058 ) See #143056 for overall docs. This PR: add cache to `InstructionTranslatorBase.import_source()` Pull Request resolved: https://github.com/pytorch/pytorch/pull/143058 Approved by: https://github.com/jansel ghstack dependencies: #143066, #143056	2024-12-13 18:20:48 +00:00
Aaron Orenstein	b472d82c96	dynamo tracing perf: import in build: 60.48 -> 59.92 (#143056 ) A series of directed perf improvements to drive down the dynamo tracing cost of the given test. Before this PR stack the compile took about 60s, and after takes 30s. Individual improvements are listed below along with the approximate improvement of that change. Tested with this model:
```
@torch.compile(backend="eager")
def model_add(x, y):
    out = x
    for i in range(5000):
        out = torch.add(out, y)
    return out
```
This PR: Stop importing builder in the inner loop of `VariableTracker.build()` Pull Request resolved: https://github.com/pytorch/pytorch/pull/143056 Approved by: https://github.com/jansel ghstack dependencies: #143066	2024-12-13 18:20:48 +00:00
Aaron Orenstein	63e1f97f4b	dynamo tracing perf: don't unnecessarily call getframeinfo on the hot path: 47.26 -> 37.66 (#143066 ) See #143056 for overall docs. This PR: Stop using `getframeinfo()` when we only care about the function name and throw the rest away. Pull Request resolved: https://github.com/pytorch/pytorch/pull/143066 Approved by: https://github.com/jansel	2024-12-13 18:20:48 +00:00
Jeremy Hadidjojo	23b8ea3094	Allow disabling int specialization on nn.Modules (#142829 ) Resolves issue #140464 by adding an option to not specialize int from nn.Modules (False by default to maintain existing behavior). Test Plan: `buck2 test mode/opt caffe2/test/dynamo:test_dynamo -- test_modules.py::NNModuleTests::test_nn_module_unspec_int_attr` Differential Revision: D66837042 Pull Request resolved: https://github.com/pytorch/pytorch/pull/142829 Approved by: https://github.com/ezyang, https://github.com/yanboliang	2024-12-13 17:26:11 +00:00
Ryan Guo	b4f4c75e19	[dynamo] Support multiple inheritance for custom dict construction (#142416 ) This patch applies a local and practical workaround for custom dict construction when multiple inheritance is involved. Handling multiple inheritance in general could be a lot more involved, so I created #142414 to track that. Fixes #141118. Pull Request resolved: https://github.com/pytorch/pytorch/pull/142416 Approved by: https://github.com/jansel	2024-12-13 05:13:05 +00:00
PyTorch MergeBot	d48b16a725	Revert "[Dynamo] only import einops if version is lower than 0.7.0 (#142847 )" This reverts commit `357e261b1e`. Reverted https://github.com/pytorch/pytorch/pull/142847 on behalf of https://github.com/atalman due to Breaks binary builds, see the comment above ([comment](https://github.com/pytorch/pytorch/pull/142847#issuecomment-2539759580))	2024-12-12 18:44:35 +00:00
Xuehai Pan	d47a80246a	[dynamo][pytree][3/N] make CXX pytree traceable: `tree_map` / `tree_map_` (#137399 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/137399 Approved by: https://github.com/jansel ghstack dependencies: #137398	2024-12-12 18:05:25 +00:00
Xuehai Pan	7edeb1005a	[dynamo][pytree][2/N] make CXX pytree traceable: `tree_flatten` / `tree_unflatten` / `tree_structure` (#137398 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/137398 Approved by: https://github.com/jansel	2024-12-12 18:05:25 +00:00
Tom Ritchford	dc23f1944a	Remove unused Python variables in torch/[_-a]* (#133492 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/133492 Approved by: https://github.com/albanD	2024-12-12 17:39:14 +00:00
Sam Larsen	30b61e521c	[logging] Populate compile_time_autotune_time_us (#143104 ) See testing in attached diff Differential Revision: [D67128210](https://our.internmc.facebook.com/intern/diff/D67128210) Pull Request resolved: https://github.com/pytorch/pytorch/pull/143104 Approved by: https://github.com/ezyang	2024-12-12 17:08:43 +00:00
Michael Lazos	357e261b1e	[Dynamo] only import einops if version is lower than 0.7.0 (#142847 ) Fixes internal xref (https://fb.workplace.com/groups/257735836456307/posts/804793021750583/?comment_id=805229281706957&reply_comment_id=805232695039949) Pull Request resolved: https://github.com/pytorch/pytorch/pull/142847 Approved by: https://github.com/zou3519	2024-12-12 06:38:22 +00:00
Michael Lazos	9701c50bdc	[Dynamo] Add missing tensor builtins to allowed functions (#142841 ) Fixes https://github.com/pytorch/pytorch/issues/141232 Pull Request resolved: https://github.com/pytorch/pytorch/pull/142841 Approved by: https://github.com/yanboliang	2024-12-12 06:38:19 +00:00
Colin L. Rice	d68403df3b	filelock: Make waitcounter variant to use (#139816 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/139816 Approved by: https://github.com/ezyang	2024-12-12 01:18:34 +00:00
Michael Lazos	de313f1155	[foreach_map] Initial foreach map HOP impl for inference (#142098 ) This is the initial foreach map HOP for pointwise ops which will be extended in the future to support grouped GEMMs and other ops. This PR utilizes PrimHOPBase class to represent foreach_map as a HOP with a single subgraph. The way this is implemented is that the user API `foreach_map` provides a single pointwise torch op, and internally this function calls a polyfill which has the same semantics as a foreach op (ie iterates over lists of operands applying the op elementwise). The higher order op is passed through the stack down to inductor where a lowering in essence inlines the subgraph into the main graph. This is done by interpreting it with a pointwise subgraph lowering, grouping the outputs by device, and registering the output buffers as foreach groups as applicable. For testing I was able to reuse the existing foreach tests by creating a wrapper function which matches the foreach op interfaces for those tests and then run all of the existing foreach tests on foreach_map. TODO before landing: * Add tests for general functions * Test warning if unsupported op will block fusion Followups: * I need to add tests for backwards (this will be a followup PR because backwards will require other work as well) Pull Request resolved: https://github.com/pytorch/pytorch/pull/142098 Approved by: https://github.com/eellison	2024-12-11 21:32:11 +00:00
Yidi Wu	c632e29774	[hop][dynamo] support torch.SymInt inputs (#141524 ) Fixes https://github.com/pytorch/pytorch/issues/141305.
```python
class M(torch.nn.Module):
    def forward(self, x, y, z):
        a = y.shape[0]
        b = z.shape[0]

        def true_fn(x):
            return x + a

        def false_fn(x):
            return x + b * z

        # When exporting with non-strict: a and b are symints,
        # so torch.compile needs to wrap and trace symint inputs.
        return torch.cond(x.shape[0] > 5, true_fn, false_fn, (x,))
```
In non-strict export, when inputs are annotated with dynamic shapes, `a` and `b` in the example above are of type torch.SymInt, so true_fn and false_fn have closures of type torch.SymInt. The error was triggered because we didn't handle SymInt inputs in dynamo and ended up using a UserDefinedObjectVariable for them, which doesn't have a proxy. We added support by following how we previously handled SymBool inputs. Pull Request resolved: https://github.com/pytorch/pytorch/pull/141524 Approved by: https://github.com/zou3519 ghstack dependencies: #142185	2024-12-11 18:46:58 +00:00
PyTorch MergeBot	5c97ac9721	Revert "Remove unused Python variables in torch/[_-a]* (#133492 )" This reverts commit `fda975a7b3`. Reverted https://github.com/pytorch/pytorch/pull/133492 on behalf of https://github.com/clee2000 due to Sorry, I need to revert this in order to revert something else. The only thing you need to do is rebase and remerge ([comment](https://github.com/pytorch/pytorch/pull/133492#issuecomment-2536635516))	2024-12-11 17:29:12 +00:00
PyTorch MergeBot	2374d460d0	Revert "filelock: Make waitcounter variant to use (#139816 )" This reverts commit `237c4b559c`. Reverted https://github.com/pytorch/pytorch/pull/139816 on behalf of https://github.com/clee2000 due to Sorry, I need to revert this in order to revert something else. The only thing you need to do is rebase and remerge ([comment](https://github.com/pytorch/pytorch/pull/139816#issuecomment-2536616808))	2024-12-11 17:26:46 +00:00
rzou	00ac4237b2	[Dynamo] stop import third-party astunparse (#142503 ) PyTorch's minimum version is 3.9, so we can now use ast.unparse. Test Plan: - wait for tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/142503 Approved by: https://github.com/StrongerXi, https://github.com/yanboliang, https://github.com/mlazos ghstack dependencies: #142502	2024-12-11 17:00:23 +00:00
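For reference, the standard-library replacement mentioned in #142503 looks like this. `ast.unparse` is available from Python 3.9 onward; the sample source string is made up for illustration.

```python
import ast

# Parse a small function, then regenerate source text from the AST
# without relying on the third-party astunparse package.
source = "def add(x, y):\n    return x + y"
tree = ast.parse(source)
regenerated = ast.unparse(tree)

# Re-parsing the regenerated text yields an equivalent AST.
round_trips = ast.dump(ast.parse(regenerated)) == ast.dump(tree)
```

The regenerated text may differ from the input in formatting details, so comparing ASTs is the more reliable check.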
rzou	0268abd627	[Dynamo] Stop importing transformers (#142502 ) This import was free because transformers should already have been imported by this time. Test Plan: - CI Pull Request resolved: https://github.com/pytorch/pytorch/pull/142502 Approved by: https://github.com/StrongerXi, https://github.com/yanboliang, https://github.com/mlazos	2024-12-11 17:00:22 +00:00
PyTorch MergeBot	8fd4b26504	Revert "[dynamo] Support multiple inheritance for custom dict construction (#142416 )" This reverts commit `a45326b649`. Reverted https://github.com/pytorch/pytorch/pull/142416 on behalf of https://github.com/clee2000 due to The newly added test is failing internally D67056273 ([comment](https://github.com/pytorch/pytorch/pull/142416#issuecomment-2536537693))	2024-12-11 16:56:26 +00:00
Edward Z. Yang	86300965b6	Add automatic_dynamic_shapes_mark_as == "oblivious" (#141444 ) Fixes https://github.com/pytorch/pytorch/issues/137100 Should also add a mark_oblivious API for manual control. Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/141444 Approved by: https://github.com/bobrenjc93 ghstack dependencies: #141415	2024-12-11 14:39:13 +00:00
Edward Z. Yang	e53696bfdb	automatic_dynamic_shapes_mark_as (#141415 ) This adds an option to cause automatic dynamic shapes to trigger unbacked SymInts rather than backed SymInts. This can potentially help if you are still seeing recompilations from 0/1 specialization but it also might just cause your program to fail with GuardOnDataDependent errors. Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/141415 Approved by: https://github.com/bobrenjc93	2024-12-11 14:39:13 +00:00
Michael Lazos	539c46b6e8	[Dynamo] Add register_hook as in-graph tensor method (#142820 ) Fixes https://github.com/pytorch/pytorch/issues/141046 Pull Request resolved: https://github.com/pytorch/pytorch/pull/142820 Approved by: https://github.com/StrongerXi, https://github.com/yanboliang	2024-12-11 12:02:03 +00:00
Bob Ren	30d8b30db7	refactor tensorify restart logic to use sources (#141517 ) Differential Revision: [D67066706](https://our.internmc.facebook.com/intern/diff/D67066706) Pull Request resolved: https://github.com/pytorch/pytorch/pull/141517 Approved by: https://github.com/ezyang	2024-12-11 07:15:39 +00:00
Edward Z. Yang	256bfd1096	Rename 'cache limit' to 'recompile limit' (#141542 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/141542 Approved by: https://github.com/oulgen, https://github.com/jansel	2024-12-11 05:05:11 +00:00
Michael Lazos	082124a322	[Dynamo] Refactor to use install subgraph method in higher order ops (#141384 ) Replaced the function in HOP infra with a method on output graph to make it more general and accessible. Pull Request resolved: https://github.com/pytorch/pytorch/pull/141384 Approved by: https://github.com/zou3519 ghstack dependencies: #141381, #141382, #141383	2024-12-11 02:22:21 +00:00
Michael Lazos	c31543c7ae	[Dynamo] Initial deduplication pass impl (#141383 ) This PR implements the deduplication pass (blocked by config currently) for dynamo where identical regions from https://github.com/pytorch/pytorch/pull/141381 are replaced with a common subgraph. The two phases of deduplication are explained below. Subgraph creation: Subgraph creation works by taking one representative region from each region group and creating a subgraph from it, which will then be used to replace all regions in the group. This is implemented by first copying all nodes of the region to the new subgraph and then finding all inputs which are not within the region and creating placeholders for them. For the outputs, all regions in a region group need to be scanned to ensure the largest set of outputs is found, and then an output node is created which returns a tuple of all outputs. Graph replacement: To replace each region with the extracted subgraph, the node index in the region and argument index within the node's flattened args and kwargs are recorded once during subgraph creation. This allows us to determine which (external to the region) nodes and in which order these nodes are passed as inputs. For the outputs, getitem nodes are created for each output, and all nodes in the region with external outputs are replaced by the proper getitem node. Finally, all original nodes are erased (there should be no uses of these left in the graph). Pull Request resolved: https://github.com/pytorch/pytorch/pull/141383 Approved by: https://github.com/zou3519 ghstack dependencies: #141381, #141382	2024-12-11 02:22:21 +00:00
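The input-discovery part of the subgraph creation step in #141383 (finding all inputs that are not within the region and creating placeholders for them) can be sketched without torch.fx. Nodes are plain strings here, and `external_inputs` and `args_of` are hypothetical names; the real pass works on FX nodes with flattened args and kwargs.

```python
def external_inputs(region, args_of):
    """Return values used by `region` but produced outside it, in first-use order.

    `region` is an ordered list of node names; `args_of` maps each node name
    to the names of its arguments. Both shapes are assumptions for this sketch.
    """
    inside = set(region)
    seen = set()
    inputs = []
    for node in region:
        for arg in args_of.get(node, ()):
            if arg not in inside and arg not in seen:
                seen.add(arg)
                inputs.append(arg)  # would become a placeholder in the subgraph
    return inputs


# relu(add(mul(x, w), b)): x, w and b come from outside the region.
args_of = {"mul": ("x", "w"), "add": ("mul", "b"), "relu": ("add",)}
region = ["mul", "add", "relu"]
placeholders = external_inputs(region, args_of)
```

Recording the order in which external inputs are first used is what lets every region in a group be called with its own arguments in matching positions.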
Michael Lazos	49e4307686	[Dynamo] add debug logging for graph region expansion (#141382 ) This PR adds debug logging for the region expansion algorithm. Pull Request resolved: https://github.com/pytorch/pytorch/pull/141382 Approved by: https://github.com/williamwen42 ghstack dependencies: #141381	2024-12-11 02:22:21 +00:00
Michael Lazos	96c36a6947	[Dynamo] Implement graph region tracking for deduplication (#141381 ) This PR implements graph region tracking for later extraction into common subgraphs. The algorithm is as follows: `GraphRegionTracker` tracks each node added to the output graph and generates a key based on the source location, instruction pointer, input shapes, and global state at the time the node is inserted into the graph. Nodes with the same key are grouped together in a list of identical nodes. Once graph capture is complete, these nodes are organized into region groups. A region group looks like this: [[IdenticalNode1], [IdenticalNode2], [IdenticalNode3]] and each sublist is called a region. For each region group (starting at the topologically latest region group), the inner regions are gradually expanded one node at time from args and kwargs of the node in each region provided that for all regions in the group, the nodes being added are also identical (ie have the same key computed above). The `get_identical_regions` function is the main entry point which will be used by the graph replacement algorithm in #141383 Edge cases to add more testing for in future PRs (in progress): * ~~multiple nodes on the same line~~ (implemented) * ~~dynamic shapes checking (need to verify symbolic inputs are the same across subgraphs)~~ (implemented) * ensure we don't expand regions where it will create a cycle during subgraph replacement * ensure outputs are always tensors (or tuples of tensors iirc) * ~~out of order kwargs, unevenly nested kwargs~~ (implemented) * input aliasing - TBD, we may add support for this in `invoke_subgraph` or reuse the aliasing analysis here to not form regions with these properties * ~~all global state~~ (implemented) Other followups: * consolidate global state checking across all caching infra Pull Request resolved: https://github.com/pytorch/pytorch/pull/141381 Approved by: https://github.com/zou3519	2024-12-11 02:22:21 +00:00
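The grouping step described in #141381 (nodes with the same key collected into a list, then arranged as a region group of single-node regions) can be sketched in plain Python. The tuple layout of the nodes, the helper name `group_identical_nodes`, and the choice to drop keys that occur only once are assumptions for illustration; the real `GraphRegionTracker` also includes the instruction pointer and global state in its key.

```python
from collections import defaultdict


def group_identical_nodes(nodes, key_fn):
    """Bucket nodes by key and return region groups.

    Each region group has the form [[n1], [n2], ...]: one single-node region
    per identical node, ready to be expanded later.
    """
    buckets = defaultdict(list)
    for node in nodes:
        buckets[key_fn(node)].append(node)
    # Assumption: a key seen only once has nothing to deduplicate against.
    return [[[n] for n in group] for group in buckets.values() if len(group) > 1]


# Hypothetical nodes: (name, source location, input shape).
nodes = [
    ("add_0", "model.py:10", (4, 4)),
    ("add_1", "model.py:10", (4, 4)),
    ("mul_0", "model.py:12", (4, 4)),
    ("add_2", "model.py:10", (4, 4)),
]
region_groups = group_identical_nodes(nodes, key_fn=lambda n: n[1:])
region_names = [[[n[0] for n in region] for region in group] for group in region_groups]
```

In this example the three `add` nodes share a key and form one region group, while the lone `mul` node is left out.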
Colin L. Rice	237c4b559c	filelock: Make waitcounter variant to use (#139816 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/139816 Approved by: https://github.com/ezyang	2024-12-10 23:02:59 +00:00
Yidi Wu	7111cd6ee0	[hop][BE] add util diff_meta with prettier error message. (#142162 ) The error message changes from: ```python -torch._dynamo.exc.Unsupported: Expected branches to return tensors with same metadata. [(tensor_pair, difference)...]:[('pair0:', TensorMetadata(shape=torch.Size([4, 3]), dtype=torch.float32, requires_grad=False, stride=(3, 1), memory_format=None, is_quantized=False, qparams={}), TensorMetadata(shape=torch.Size([2, 3]), dtype=torch.float32, requires_grad=False, stride=(3, 1), memory_format=None, is_quantized=False, qparams={}))] ``` to ```python +torch._dynamo.exc.Unsupported: Expect branches to return tensors with same metadata but find pair[0] differ in 'shape', where lhs is TensorMetadata(shape=torch.Size([4, 3]), dtype=torch.float32, requires_grad=False, stride=(3, 1), memory_format=None, is_quantized=False, qparams={}) and rhs is TensorMetadata(shape=torch.Size([2, 3]), dtype=torch.float32, requires_grad=False, stride=(3, 1), memory_format=None, is_quantized=False, qparams={}) ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/142162 Approved by: https://github.com/zou3519	2024-12-10 21:54:28 +00:00
Yidi Wu	9ced54a51a	[hop] lift free symbols in slice (#142385 ) Before the change, we get an unfound proxy error when linting the subgraph. After the change, we have the following dynamo graph for dynamic_shape test. ```python V1209 11:11:06.187000 4091124 torch/_dynamo/output_graph.py:1346] [0/2] [__graph_code] /data/users/yidi/pytorch/torch/fx/_lazy_graph_module.py class GraphModule(torch.nn.Module): V1209 11:11:06.187000 4091124 torch/_dynamo/output_graph.py:1346] [0/2] [__graph_code] def forward(self, s0: "Sym(s0)", s1: "Sym(s1)", s2: "Sym(s2)", L_x_: "f32[s0, s1, s2][s1s2, s2, 1]cpu"): V1209 11:11:06.187000 4091124 torch/_dynamo/output_graph.py:1346] [0/2] [__graph_code] l_x_ = L_x_ V1209 11:11:06.187000 4091124 torch/_dynamo/output_graph.py:1346] [0/2] [__graph_code] V1209 11:11:06.187000 4091124 torch/_dynamo/output_graph.py:1346] [0/2] [__graph_code] # File: /data/users/yidi/pytorch/test/dynamo/test_higher_order_ops.py:307 in f, code: i = x.size(0) - 2 V1209 11:11:06.187000 4091124 torch/_dynamo/output_graph.py:1346] [0/2] [__graph_code] sub: "Sym(s0 - 2)" = s0 - 2 V1209 11:11:06.187000 4091124 torch/_dynamo/output_graph.py:1346] [0/2] [__graph_code] V1209 11:11:06.187000 4091124 torch/_dynamo/output_graph.py:1346] [0/2] [__graph_code] # File: /data/users/yidi/pytorch/test/dynamo/test_higher_order_ops.py:308 in f, code: j = x.size(1) - 3 V1209 11:11:06.187000 4091124 torch/_dynamo/output_graph.py:1346] [0/2] [__graph_code] sub_1: "Sym(s1 - 3)" = s1 - 3 V1209 11:11:06.187000 4091124 torch/_dynamo/output_graph.py:1346] [0/2] [__graph_code] V1209 11:11:06.187000 4091124 torch/_dynamo/output_graph.py:1346] [0/2] [__graph_code] # File: /data/users/yidi/pytorch/test/dynamo/test_higher_order_ops.py:310 in f, code: return wrap(lambda x: x[:i, :j, k:], x) V1209 11:11:06.187000 4091124 torch/_dynamo/output_graph.py:1346] [0/2] [__graph_code] wrap_body_0 = self.wrap_body_0 V1209 11:11:06.187000 4091124 torch/_dynamo/output_graph.py:1346] [0/2] [__graph_code] wrap = 
torch.ops.higher_order.wrap(wrap_body_0, s0, s1, s2, l_x_, sub, sub_1); wrap_body_0 = s0 = s1 = s2 = l_x_ = sub = sub_1 = None V1209 11:11:06.187000 4091124 torch/_dynamo/output_graph.py:1346] [0/2] [__graph_code] getitem: "f32[s0 - 2, s1 - 3, 0][s1s2, s2, 1]cpu" = wrap[0]; wrap = None V1209 11:11:06.187000 4091124 torch/_dynamo/output_graph.py:1346] [0/2] [__graph_code] return (getitem,) V1209 11:11:06.187000 4091124 torch/_dynamo/output_graph.py:1346] [0/2] [__graph_code] V1209 11:11:06.187000 4091124 torch/_dynamo/output_graph.py:1346] [0/2] [__graph_code] class wrap_body_0(torch.nn.Module): V1209 11:11:06.187000 4091124 torch/_dynamo/output_graph.py:1346] [0/2] [__graph_code] def forward(self, s0: "Sym(s0)", s1: "Sym(s1)", s2: "Sym(s2)", l_x_: "f32[s0, s1, s2][s1s2, s2, 1]cpu", sub: "Sym(s0 - 2)", sub_1: "Sym(s1 - 3)"): V1209 11:11:06.187000 4091124 torch/_dynamo/output_graph.py:1346] [0/2] [__graph_code] # File: /data/users/yidi/pytorch/test/dynamo/test_higher_order_ops.py:310 in <lambda>, code: return wrap(lambda x: x[:i, :j, k:], x) V1209 11:11:06.187000 4091124 torch/_dynamo/output_graph.py:1346] [0/2] [__graph_code] getitem: "f32[s0 - 2, s1 - 3, 0][s1s2, s2, 1]cpu" = l_x_[(slice(None, sub, None), slice(None, sub_1, None), slice(s2, None, None))]; l_x_ = sub = sub_1 = s2 = None V1209 11:11:06.187000 4091124 torch/_dynamo/output_graph.py:1346] [0/2] [__graph_code] return (getitem,) ``` We lift sub, sub_1 because they're compound expressions and are directly used in argument of the getitem node. We lift s0, s1 and s2 because they're basic symbols in the tensor input. Pull Request resolved: https://github.com/pytorch/pytorch/pull/142385 Approved by: https://github.com/zou3519	2024-12-10 21:52:30 +00:00
Tom Ritchford	fda975a7b3	Remove unused Python variables in torch/[_-a]* (#133492 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/133492 Approved by: https://github.com/albanD	2024-12-10 21:48:44 +00:00
Yuanjing Shi	117b6c3e2c	[Easy][Dynamo][TVM] remove unnecessary prints (#142445 ) This PR intends to remove the unnecessary prints in the auto-scheduler of dynamo's TVM backend. Pull Request resolved: https://github.com/pytorch/pytorch/pull/142445 Approved by: https://github.com/jansel	2024-12-10 19:52:02 +00:00
Ryan Guo	3c03bc2431	[dynamo] Expand support of enum attribute access (#142268 ) This patch changes `EnumVariable` to support access to all types of attributes, not just non-callable literals. Fixes #142050. Pull Request resolved: https://github.com/pytorch/pytorch/pull/142268 Approved by: https://github.com/jansel ghstack dependencies: #142267	2024-12-10 19:32:40 +00:00
Ryan Guo	b117945918	[dynamo] Remove dead code in `ConstantVariable.const_getattr` (#142267 ) This path is no longer reachable after #113390, which also updated `test_access_class_method_from_user_class` to reflect that. Pull Request resolved: https://github.com/pytorch/pytorch/pull/142267 Approved by: https://github.com/jansel	2024-12-10 19:32:40 +00:00
Ryan Guo	f74ba5d30d	[dynamo] Remove special graph break for self-referential list (#142438 ) We introduced a special graph break to avoid max-recursion-depth error in #100296. After #111415, the original `test_list_self_reference` no longer triggers the special graph break because we started modeling root frame free variables with `LazyVariableTracker`. After #117426, we no longer build the list items eagerly, and they'll hit `variable_tracker_cache` when they get lazily constructed later. As a result, this patch updates the `test_list_self_reference` test and removes the special graph break. Pull Request resolved: https://github.com/pytorch/pytorch/pull/142438 Approved by: https://github.com/jansel ghstack dependencies: #142437	2024-12-10 19:23:48 +00:00
Ryan Guo	4f75f1e80d	[dynamo] Use proper item source for `NamedTupleVariable` (#142437 ) Dynamo was generating `GetItemSource(tuple_source, index)` for items of `NamedTupleVariable`, but that stops working when a user-supplied named tuple has a custom `__getitem__` function with different semantics. This patch - fixes the aforementioned issue by using `AttrSource` instead. - handles named tuples outside `wrap_listlike`, by removing the special case of named tuples in `BaseListVariable.cls_for_instance`, since the semantics of named tuples are different enough. - makes sure all constructions of `NamedTupleVariable` have items with proper sources. Fixes #142399. Pull Request resolved: https://github.com/pytorch/pytorch/pull/142437 Approved by: https://github.com/jansel	2024-12-10 19:23:48 +00:00
Ryan Guo	a45326b649	[dynamo] Support multiple inheritance for custom dict construction (#142416 ) This patch applies a local and practical workaround for custom dict construction when multiple inheritance is involved. Handling multiple inheritance in general could be a lot more involved, so I created #142414 to track that. Fixes #141118. Pull Request resolved: https://github.com/pytorch/pytorch/pull/142416 Approved by: https://github.com/jansel	2024-12-10 19:22:15 +00:00
PyTorch MergeBot	9aefc59649	Revert "[hop][dynamo] support torch.SymInt inputs (#141524 )" This reverts commit `6713b457ae`. Reverted https://github.com/pytorch/pytorch/pull/141524 on behalf of https://github.com/huydhn due to Sorry for reverting your change, but I think it has a landrace in trunk ([comment](https://github.com/pytorch/pytorch/pull/142185#issuecomment-2532605728))	2024-12-10 18:50:17 +00:00
Yidi Wu	6713b457ae	[hop][dynamo] support torch.SymInt inputs (#141524 ) Fixes https://github.com/pytorch/pytorch/issues/141305. ```python class M(torch.nn.Module): def forward(self, x, y, z): a = y.shape[0] b = z.shape[0] def true_fn(x): return x + a def false_fn(x): return x + b * z # When exporting with non-strict: a and b are symints, # so torch.compile needs to wrap and trace symint inputs. return torch.cond(x.shape[0] > 5, true_fn, false_fn, (x,)) ``` In non-strict export, when inputs are annotated with dynamic shapes, `a` and `b` in the example above are of type torch.SymInt, so true_fn and false_fn have closures over torch.SymInt values. The error was triggered because we didn't handle SymInt inputs in dynamo and ended up using a UserDefinedObjectVariable for them, which doesn't have a proxy. We added support by following how we previously handled SymBool inputs. Pull Request resolved: https://github.com/pytorch/pytorch/pull/141524 Approved by: https://github.com/zou3519 ghstack dependencies: #141610, #142185	2024-12-10 17:33:57 +00:00
Yidi Wu	b838bdd4d4	[dynamo] remove unnecessary set_example_value for SymBool input. (#141610 ) These are automatically done in create_graph_input so we can remove them. Code refactoring only. Pull Request resolved: https://github.com/pytorch/pytorch/pull/141610 Approved by: https://github.com/zou3519	2024-12-10 17:33:48 +00:00
Sam Larsen	a751558467	[logging] Fix bug involving missing compilation_metrics fields in tlparse logs (#142423 ) Summary: The line of code that's compiling the set of compilation_metrics to include in the corresponding tlparse log is missing the "legacy" and "common" fields populated above. Fix is to make sure we consider all fields in the compilation_metrics object. Test Plan: Before: https://fburl.com/d6em8csg (e.g, https://fburl.com/c19s7ny0) After: https://fburl.com/5zr6kbvf (e.g, https://fburl.com/3hp14ht2) Pull Request resolved: https://github.com/pytorch/pytorch/pull/142423 Approved by: https://github.com/ezyang	2024-12-10 15:58:43 +00:00
Avik Chaudhuri	e3886fb13c	misc. fixes to unflatten (#142141 ) Combining several fixes to unflatten for bugs revealed by random graph testing. The fixes target two categories of bugs: 1. Some bugs show up as exponential blowups for largish systems of nn modules. These are fixed by converting lists to sets, using caching, or otherwise rewriting to reuse computation more efficiently. 2. Other bugs were due to missing intermediate modules created when attributes such as submodules and buffers are accessed through longish paths before calling the corresponding intermediate modules, or missing attributes such as buffers and constants in submodules corresponding to multiple calls. Differential Revision: [D66659795](https://our.internmc.facebook.com/intern/diff/D66659795/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/142141 Approved by: https://github.com/ydwu4	2024-12-10 03:45:13 +00:00
Oguz Ulgen	0f6bfc58a2	Introduce remote cache key prefix to break cache (#142148 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/142148 Approved by: https://github.com/jamesjwu, https://github.com/ezyang	2024-12-10 00:35:50 +00:00
Xuehai Pan	e1196dfe51	Deprecate `torch._utils.is_compiling()` (#127690 ) This PR is split from PR #126898. - #126898 ------ Pull Request resolved: https://github.com/pytorch/pytorch/pull/127690 Approved by: https://github.com/Skylion007, https://github.com/malfet	2024-12-08 22:55:36 +00:00
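
The grouping step described in the graph region tracking commit (#141381) above can be illustrated with a minimal sketch. This is not the `GraphRegionTracker` implementation: `Node`, `node_key`, and `group_identical` are hypothetical names, the key here uses only a source location and input shapes (the commit message also mentions the instruction pointer and global state), and the region expansion step is omitted.

```python
from collections import defaultdict
from typing import Hashable, NamedTuple


class Node(NamedTuple):
    """Hypothetical stand-in for a graph node; not a torch.fx.Node."""
    name: str
    source_location: str
    input_shapes: tuple


def node_key(node: Node) -> Hashable:
    # Assumption: a simplified key. The commit message describes a key that
    # also includes the instruction pointer and global state.
    return (node.source_location, node.input_shapes)


def group_identical(nodes):
    """Bucket nodes by key and return region groups.

    Each region group is a list of single-node regions, e.g.
    [[IdenticalNode1], [IdenticalNode2]]. Keys seen only once are dropped,
    since a lone node has nothing to be deduplicated against.
    """
    buckets = defaultdict(list)
    for node in nodes:
        buckets[node_key(node)].append(node)
    return [[[n] for n in group] for group in buckets.values() if len(group) > 1]
```

In this sketch, two nodes created at the same source line with the same input shapes land in one region group, while a node with a unique key produces no group at all.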


4210 Commits