Commit Graph

601 Commits

Author SHA1 Message Date
Ryan Guo
8002d22ce3 [dynamo] Trace into descriptor with __set__ (#154176)
As title, this patch basically implements
https://github.com/python/cpython/blob/3.11/Objects/object.c#L1371-L1452,
and makes the `__get__` handling more robust.

I ran into this while fixing #133762.
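
For reference, a rough Python rendering of the CPython setter path being emulated (names here are illustrative, not Dynamo internals):

```
# Sketch of CPython's generic setattr (object.c): a data descriptor's
# __set__ found on the type takes priority over the instance __dict__.
def generic_setattr(obj, name, value):
    for klass in type(obj).__mro__:
        if name in klass.__dict__:
            descr = klass.__dict__[name]
            setter = type(descr).__dict__.get("__set__")
            if setter is not None:  # data descriptor: route through __set__
                return setter(descr, obj, value)
            break  # attribute found but no __set__: plain write below
    obj.__dict__[name] = value

class UpperCase:
    def __set_name__(self, owner, name):
        self.name = name
    def __set__(self, obj, value):
        obj.__dict__[self.name] = value.upper()

class Config:
    mode = UpperCase()

c = Config()
generic_setattr(c, "mode", "fast")
print(c.mode)  # FAST -- assignment was routed through UpperCase.__set__
```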

Differential Revision: [D75488090](https://our.internmc.facebook.com/intern/diff/D75488090)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154176
Approved by: https://github.com/jansel
2025-05-30 16:14:37 +00:00
James Wu
bb17f9c98b [AOTAutogradCache] Fix CHROMIUM_EVENT_LOG being none (#154258)
It turns out that if you import a name whose value is None at import time in Python, and later update the value, the name you imported stays None:

```
import torch
from torch._dynamo.utils import CHROMIUM_EVENT_LOG
class Foo:
    pass
torch._dynamo.utils.CHROMIUM_EVENT_LOG = Foo()

print(CHROMIUM_EVENT_LOG) # None
```

This fixes the bug so we get AOTAutogradCache instant events again.
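
A minimal sketch of the usual fix: bind the module rather than the name, so the attribute is read at use time:

```
import torch._dynamo.utils as dynamo_utils

class Foo:
    pass

dynamo_utils.CHROMIUM_EVENT_LOG = Foo()
print(dynamo_utils.CHROMIUM_EVENT_LOG)  # the Foo instance, not None
```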

Differential Revision: [D75305770](https://our.internmc.facebook.com/intern/diff/D75305770/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154258
Approved by: https://github.com/oulgen
2025-05-23 21:53:31 +00:00
PyTorch MergeBot
3443627e07 Revert "[BE]: Enable RUFF TRY400 rule - log.exception (#153473)"
This reverts commit 4f4ecc583e.

Reverted https://github.com/pytorch/pytorch/pull/153473 on behalf of https://github.com/jeanschmidt due to seems to have broken internal signals, @albanD may I count on you to help the author merge his PR? D74837988 ([comment](https://github.com/pytorch/pytorch/pull/153473#issuecomment-2886017075))
2025-05-16 08:29:26 +00:00
Aaron Gokaslan
4f4ecc583e [BE]: Enable RUFF TRY400 rule - log.exception (#153473)
Change logging.error to logging.exception to log additional information when relevant. A few logging.error calls have slipped into try/except blocks since I last did a cleanup here, and the rule is now stabilized, so I am enabling it codebase-wide. I have NOQA'd much of our custom exception stack-trace handling for RPC calls and distributed, and tried to fix a few errors based on whether we immediately re-raised or failed to print exception information where it could be useful.
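
A minimal sketch of the pattern the TRY400 rule enforces:

```
import logging

log = logging.getLogger(__name__)

def parse(raw):
    return int(raw)

try:
    parse("not-a-number")
except ValueError:
    # Before: log.error("parse failed")  -- message only, traceback lost.
    # After (TRY400): log.exception logs the message plus the stack trace.
    log.exception("parse failed")
```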

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153473
Approved by: https://github.com/albanD, https://github.com/cyyever
2025-05-15 13:36:59 +00:00
Sam Larsen
dde705864a Fix test broken by D73809989 (#153413)
Summary: I forgot to remove this unused field in D73809989.

Test Plan: `buck test 'fbcode//mode/opt' fbcode//caffe2/test:fbonly -- --exact 'caffe2/test:fbonly - test_compilation_metrics_logger_in_sync (caffe2.test.fb.test_fb.TestFBOnly)'`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153413
Approved by: https://github.com/c00w
2025-05-13 16:44:30 +00:00
Michael Lazos
ff039d39ec [Dynamo] Optimize dedupe region ancestor tracking (#152589)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/152589
Approved by: https://github.com/anijain2305
ghstack dependencies: #152389, #152505, #152410, #152506, #152570, #152572
2025-05-13 12:17:59 +00:00
Michael Lazos
3592cb52d9 [Hierarchical Compilation] Use universal flatten APIs (#152505)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/152505
Approved by: https://github.com/anijain2305
ghstack dependencies: #152389
2025-05-13 12:17:59 +00:00
Michael Lazos
023a3dc69f [Hierarchical Compilation] Track node mutations (#152389)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/152389
Approved by: https://github.com/anijain2305
2025-05-13 12:17:59 +00:00
Aaron Gokaslan
3555ebb63d [BE]: Update ruff to 0.11.8 (#153249)
Fixes a ton of false negatives throughout the codebase. RUFF also properly validates NOQA comments now and most of the changes are fixing typos there or removing filewide flake8 suppressions that were also silencing ruff issues.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153249
Approved by: https://github.com/cyyever, https://github.com/albanD, https://github.com/seemethere
2025-05-12 18:30:52 +00:00
PyTorch MergeBot
5c3fddb9cc Revert "[Hierarchical Compilation] Track node mutations (#152389)"
This reverts commit c2936ebfd5.

Reverted https://github.com/pytorch/pytorch/pull/152389 on behalf of https://github.com/jeanschmidt due to Humm, interesting, there seems to be a bug in stack PRs, as it should be part of the stack and be reverted with the other ones ([comment](https://github.com/pytorch/pytorch/pull/152389#issuecomment-2873540451))
2025-05-12 18:18:44 +00:00
PyTorch MergeBot
78d752e96a Revert "[Hierarchical Compilation] Use universal flatten APIs (#152505)"
This reverts commit f9e3a9058e.

Reverted https://github.com/pytorch/pytorch/pull/152505 on behalf of https://github.com/jeanschmidt due to [TENTATIVE] reverting to check if reverting this stack partially caused the introduction of https://github.com/pytorch/pytorch/actions/runs/14966121510/job/42049638969#step:22:875 ([comment](https://github.com/pytorch/pytorch/pull/152505#issuecomment-2872869990))
2025-05-12 14:48:08 +00:00
PyTorch MergeBot
aa7fe6af41 Revert "[Dynamo] Optimize dedupe region ancestor tracking (#152589)"
This reverts commit b5f1345f72.

Reverted https://github.com/pytorch/pytorch/pull/152589 on behalf of https://github.com/jeanschmidt due to Breaking internal signal citadel-fbcode-test-mode-opt-for-pt2_stack_for_internal-linux-0 please see diff [D74531503](https://www.internalfb.com/diff/D74531503) for more details ([comment](https://github.com/pytorch/pytorch/pull/152410#issuecomment-2871168679))
2025-05-12 07:15:09 +00:00
PyTorch MergeBot
01bb249978 Revert "has_triton: Use the device interface for detecting Triton availability (#139171)"
This reverts commit 48bfe9afc7.

Reverted https://github.com/pytorch/pytorch/pull/139171 on behalf of https://github.com/masnesral due to Performance regression for huggingface ([comment](https://github.com/pytorch/pytorch/pull/139171#issuecomment-2868939790))
2025-05-10 14:46:23 +00:00
Michael Lazos
b5f1345f72 [Dynamo] Optimize dedupe region ancestor tracking (#152589)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/152589
Approved by: https://github.com/anijain2305
ghstack dependencies: #152389, #152505, #152410, #152506, #152570, #152572
2025-05-10 08:27:56 +00:00
Michael Lazos
f9e3a9058e [Hierarchical Compilation] Use universal flatten APIs (#152505)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/152505
Approved by: https://github.com/anijain2305
ghstack dependencies: #152389
2025-05-10 08:27:07 +00:00
Michael Lazos
c2936ebfd5 [Hierarchical Compilation] Track node mutations (#152389)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/152389
Approved by: https://github.com/anijain2305
2025-05-10 08:27:01 +00:00
Menglu Yu
2d25e4d478 [1/n][Optimus][Auto-AC] Support activation quantization without scaling (#148380)
Summary: We enable activation quantization in the forward pass, and users can customize the dtype they want to quantize to.

Test Plan:
# unit test

```
buck2 test 'fbcode//mode/dev-nosan' fbcode//caffe2/test/inductor:quantization -- test_activation_quantization_aten
```

Buck UI: https://www.internalfb.com/buck2/776d3911-bb86-4ac8-a527-540cf1510b9d
Test UI: https://www.internalfb.com/intern/testinfra/testrun/4785074873051017
Network: Up: 4.3MiB  Down: 42MiB  (reSessionID-fef7e727-68b1-4645-a519-5652854df38d)
Executing actions. Remaining     0/4                                                                                 6.7s exec time total
Command: test.     Finished 2 local
Time elapsed: 3:11.5s
Tests finished: Pass 2. Fail 0. Fatal 0. Skip 0. Build failure 0

# E2E

### How to enable (you can override the dtype; if nothing is given, the default is fp8)

```
post_grad_fusion_options={
    "activation_quantization_aten_pass": {"quant_type": "torch.float8_e5m2"}
},
```

Differential Revision: D70522237

Pull Request resolved: https://github.com/pytorch/pytorch/pull/148380
Approved by: https://github.com/Mingming-Ding, https://github.com/Hahu803
2025-05-08 04:44:15 +00:00
George White
48bfe9afc7 has_triton: Use the device interface for detecting Triton availability (#139171)
This PR replaces the `has_triton()` global method, which was previously used for this task.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/139171
Approved by: https://github.com/jansel, https://github.com/shink
2025-05-07 12:23:10 +00:00
William Wen
5b9df57b50 [dynamo] context manager/decorator for dynamo config patching during tracing (#150586)
Implement traceable config patching for Dynamo: this enables restricted patching of the Dynamo config, where the user can use a context manager/decorator to change tracing behavior for parts of the code.

The new `dont_skip_tracing` decorator/context manager for ignoring most trace rules is easily implemented with this more generic traceable config patching feature.

Implementation:
- Create a new specialized context manager class representing a wrapper around torch._dynamo.config.patch
- Dynamo doesn't trace into the context manager but updates config at compile time
- Correctness is based on our correctness for handling supported context managers
- Implementation is inspired by how `GradModeVariable` is implemented.

Previous attempts: https://github.com/pytorch/pytorch/pull/148736 (decorator-only global approach) and https://github.com/pytorch/pytorch/pull/149439 (decorator-only traceback approach)

See https://docs.google.com/document/d/1vWNwKL_jpg-PLopifcaSa338wks3GqSVF4GHRguybGg/edit?tab=t.0 for more details on implementation - including previous approaches.

NOTE: this PR fixes a bug where skipped code objects were not tracked by convert_frame.py, leading to cases where code objects would be automatically skipped even after `torch._dynamo.reset()`. This exposed some latent dynamo-wrapped test failures in CI, from tests that previously passed in CI but not locally.
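
A hypothetical usage sketch based on the description above (the decorator name follows this PR; treat the exact import path as an assumption):

```
import torch
import torch._dynamo

# Hypothetical: force Dynamo to trace a helper that a trace/skip rule
# would otherwise exclude from tracing.
@torch._dynamo.dont_skip_tracing
def helper(x):
    return x + 1

@torch.compile
def f(x):
    return helper(x)

print(f(torch.ones(4)))
```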

Pull Request resolved: https://github.com/pytorch/pytorch/pull/150586
Approved by: https://github.com/jansel, https://github.com/zou3519, https://github.com/anijain2305
2025-04-23 09:12:13 +00:00
Sam Larsen
529f698ad4 [logging] Put "everything" WaitCounters in dynamo_timed (#151757)
Summary: The main motivation is to capture the cudagraphs overhead in a WaitCounter. We'll combine that with Triton autotuning, and therefore rename to "compile_runtime_overheads". Since we have a couple of WaitCounters where we want to capture all runtime and compile overheads, let's put the accounting in dynamo_timed so we automatically capture any top-level timed regions added in the future. Also, dynamo_timed already has to figure out whether we're timing a runtime vs. compile-time event, so we can reuse some of that logic.

Test Plan:
Ran an internal model with `TORCHINDUCTOR_BENCHMARK_FUSION=1` (to get benchmarking at compile time in addition to runtime).

Overall compile time from various sources matches up:
* tlparse: https://fburl.com/9fgsstkr. Eyeballing, total time should be 32 ranks x 2175s = ~69.6k s
* ods: https://fburl.com/canvas/r4clhnb7. Right on.
* dynamo_compile: https://fburl.com/scuba/dynamo_compile/ax71aqox. Right on.
* pt2_compile_events: https://fburl.com/scuba/pt2_compile_events/shcjd9ql. Right on.

And the runtime overhead:
* ods: https://fburl.com/canvas/nvgjb282
* dynamo_compile: https://fburl.com/scuba/dynamo_compile/f2dtv0qh

If we compare that to a run of the same model without the changes in this stack, results can mismatch by a lot:
* tlparse: https://fburl.com/cchxwd1s. Eyeballing, total time should be 32 ranks x 2300s = ~73.6k s
* ods: https://fburl.com/canvas/x1i3wvf4. It's kinda close
* dynamo_compile: https://fburl.com/scuba/dynamo_compile/l7sgxdxd. Waaay too high.
* pt2_compile_events: https://fburl.com/scuba/pt2_compile_events/jb4s9z1u. This is the only one that's actually correct.

The discrepancy is even worse if we focus on the runtime events:
* ods: https://fburl.com/canvas/a4o9f7ou
* dynamo_compile: https://fburl.com/scuba/dynamo_compile/95izaes1

Pull Request resolved: https://github.com/pytorch/pytorch/pull/151757
Approved by: https://github.com/ppanchalia
ghstack dependencies: #151749
2025-04-22 03:29:13 +00:00
Sam Larsen
edba20b853 [logging] Fix duration logging for dynamo_compile (#151749)
Summary: There are a few issues I'm solving:
1. It's too hard to measure total pt2 overhead using the dynamo_compile table because users need to know the columns representing all the top-level events (dynamo_cumulative_compile_time_us, etc.). Instead, let's populate the existing duration_us field for all top-level events. The complication is that runtime events in particular (Triton autotuning, cudagraphify) can be collapsed into a single row, with gaps in between, so we can't simply use `end_time - start_time` in all cases. Instead, we'll sum durations for all outer events when updating the compile-time or runtime metrics context. Introduce a 'depth' counter in TLS to track the nesting of CompilationMetrics events.
2. The existing implementation relies on callers of dynamo_timed to specify whether the event is a runtime or compile-time event. That doesn't work because some methods can be called in both situations, e.g., `CachingAutotuner.benchmark_all_configs`: `TORCHINDUCTOR_BENCHMARK_FUSION=1`, for example, enables benchmarking during compile time. Instead, we can figure out automatically whether we're measuring a compile-time or runtime event and log accordingly.
3. If `log_compilation_events` were to throw an exception, we'd fail to clear the aggregated counters for runtime logs and they could be attributed to the wrong compile ID. I didn't actually find evidence of this in practice, but I added exception handling for extra safety.
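
A sketch of the depth-counter idea from item 1, assuming a thread-local counter (illustrative names, not the actual CompilationMetrics code):

```
import threading

_tls = threading.local()

def enter_timed_region():
    # Only the outermost ("toplevel") region should be summed into
    # duration_us; nested regions are already covered by their parent.
    depth = getattr(_tls, "depth", 0)
    _tls.depth = depth + 1
    return depth == 0  # True iff this is an outer event

def exit_timed_region(is_outer, elapsed_us, totals):
    _tls.depth -= 1
    if is_outer:
        totals["duration_us"] = totals.get("duration_us", 0) + elapsed_us
```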

Test Plan:
Ran internal models and compared dynamo_compile to pt2_compile_events:
`TORCHINDUCTOR_BENCHMARK_FUSION=0`
* tlparse: https://fburl.com/itciwnxc
* dynamo_compile: https://fburl.com/scuba/dynamo_compile/yvkif5vb
* pt2_compile_events: https://fburl.com/scuba/pt2_compile_events/segijet7

`TORCHINDUCTOR_BENCHMARK_FUSION=1`
* tlparse: https://fburl.com/jgurcvkw
* dynamo_compile: https://fburl.com/scuba/dynamo_compile/uum91ceb
* pt2_compile_events: https://fburl.com/scuba/pt2_compile_events/x4xnisez

Pull Request resolved: https://github.com/pytorch/pytorch/pull/151749
Approved by: https://github.com/Skylion007
2025-04-22 03:29:13 +00:00
Sam Larsen
585d03fa39 Record how many parameters we're parsing within dynamo (#148508)
This allows us to track how many parameters we have in compilations.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/148508
Approved by: https://github.com/jansel, https://github.com/anijain2305

Co-authored-by: Sam Larsen <slarsen@meta.com>
2025-04-16 06:15:11 +00:00
Ryan Guo
6a1499d209 [dynamo] handle tensor subclass with non-classmethod __torch_function__ (#151061)
As title, this patch fixes bugs in
1. emulating `has_torch_function`
2. emulating calling `__torch_function__`
3. building a callable VT for non-classmethod `__torch_function__`

Fixes #120799, #150265, #150848.
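
For illustration, the shape of subclass being handled: `__torch_function__` written as a plain method instead of the usual classmethod (a sketch, not code from this PR):

```
import torch

class MyTensor(torch.Tensor):
    # Plain method: the callable must be bound to the instance, unlike
    # the common classmethod form that binds to the class.
    def __torch_function__(self, func, types, args=(), kwargs=None):
        kwargs = kwargs or {}
        return super().__torch_function__(func, types, args, kwargs)

print(type(MyTensor.__dict__["__torch_function__"]))  # <class 'function'>, not classmethod
```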

Pull Request resolved: https://github.com/pytorch/pytorch/pull/151061
Approved by: https://github.com/anijain2305, https://github.com/mlazos
ghstack dependencies: #151060
2025-04-15 03:55:34 +00:00
Sam Larsen
2a1e2b88ed [logging] Add pgo remote get/put timings to dynamo_compile (#150322)
Test Plan: https://fburl.com/scuba/dynamo_compile/sandbox/xf950tw8

Pull Request resolved: https://github.com/pytorch/pytorch/pull/150322
Approved by: https://github.com/ppanchalia
2025-04-07 18:08:26 +00:00
Yuanhao Ji
98d06b401b [Dynamo] Fix dict.items() return type (#150112)
Fixes #150110

Pull Request resolved: https://github.com/pytorch/pytorch/pull/150112
Approved by: https://github.com/jansel, https://github.com/zou3519
2025-04-04 04:32:13 +00:00
James Wu
1979a409e9 Make CompileEventLogger more defensive w.r.t to AOTAutogradCache and FXGraphCache (#150423)
This PR makes it so that we don't crash due to logging if we invoke AOTAutogradCache/FXGraphCache without using dynamo. This is preparation for supporting certain vLLM use cases where they store graph modules and have special handling in conjunction with the caches.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/150423
Approved by: https://github.com/oulgen
2025-04-04 01:55:13 +00:00
Ryan Guo
bb98749230 [dynamo] Always trace into tensor subclass __torch_function__ (#149792)
This patch effectively ignores traceable_tensor_subclasses, allowing
Dynamo to always try tracing into the `__torch_function__` of tensor
subclasses. This helps us with 2 things:
1. allowing users to directly benefit from better compilation of tensor
   subclass, by just upgrading pytorch, without having to change legacy
   library code (see earlier patches in the stack for examples).
2. potentially exposing more issues in compiling tensor subclass, so we
   can get signals and improve them.
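
For example, a legacy-style subclass that previously had to be registered in `torch._dynamo.config.traceable_tensor_subclasses` now gets its `__torch_function__` traced automatically (a sketch illustrating point 1, not code from this PR):

```
import torch

class LoggingTensor(torch.Tensor):
    @classmethod
    def __torch_function__(cls, func, types, args=(), kwargs=None):
        kwargs = kwargs or {}
        # e.g. legacy library code that inspects every op before dispatch
        return super().__torch_function__(func, types, args, kwargs)
```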

As a consequence, it exposed and fixes 2 subtle bugs:
1. In `build_torch_function_fn`, we could get
   `torch._C._disabled_torch_function_impl` because we have a
   `Parameter` subclass without `__torch_function__` override or if we
   have a tensor subclass with `__torch_dispatch__` override. We graph
   break on this for now, and plan to add support -- the logic for
   simulating `torch._C._disabled_torch_function_impl` is already in
   `SuperVariable`, we just need to reuse it.
2. Sometimes we create `SyntheticLocalSource` and need to remove all the
   guards installed on it, but we only removed the ones whose source
   _is_ the created synthetic source `s`, but forgot about chained
   source like `s.foo`, this showed up as
   `SYNTHETIC_LOCAL['tmp_0'].__torch_function__.__func__`.

Differential Revision: [D71906141](https://our.internmc.facebook.com/intern/diff/D71906141)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/149792
Approved by: https://github.com/jansel, https://github.com/mlazos
ghstack dependencies: #149482, #149483, #149484
2025-04-02 20:57:00 +00:00
PyTorch MergeBot
e545567340 Revert "[dynamo] Always trace into tensor subclass __torch_function__ (#149792)"
This reverts commit 238109ad32.

Reverted https://github.com/pytorch/pytorch/pull/149792 on behalf of https://github.com/malfet due to Broke trunk, see b03c42109c/1 ([comment](https://github.com/pytorch/pytorch/pull/149482#issuecomment-2773650522))
2025-04-02 20:30:32 +00:00
Ryan Guo
238109ad32 [dynamo] Always trace into tensor subclass __torch_function__ (#149792)
This patch effectively ignores traceable_tensor_subclasses, allowing
Dynamo to always try tracing into the `__torch_function__` of tensor
subclasses. This helps us with 2 things:
1. allowing users to directly benefit from better compilation of tensor
   subclass, by just upgrading pytorch, without having to change legacy
   library code (see earlier patches in the stack for examples).
2. potentially exposing more issues in compiling tensor subclass, so we
   can get signals and improve them.

As a consequence, it exposed and fixes 2 subtle bugs:
1. In `build_torch_function_fn`, we could get
   `torch._C._disabled_torch_function_impl` because we have a
   `Parameter` subclass without `__torch_function__` override or if we
   have a tensor subclass with `__torch_dispatch__` override. We graph
   break on this for now, and plan to add support -- the logic for
   simulating `torch._C._disabled_torch_function_impl` is already in
   `SuperVariable`, we just need to reuse it.
2. Sometimes we create `SyntheticLocalSource` and need to remove all the
   guards installed on it, but we only removed the ones whose source
   _is_ the created synthetic source `s`, but forgot about chained
   source like `s.foo`, this showed up as
   `SYNTHETIC_LOCAL['tmp_0'].__torch_function__.__func__`.

Differential Revision: [D71906141](https://our.internmc.facebook.com/intern/diff/D71906141)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/149792
Approved by: https://github.com/jansel, https://github.com/mlazos
ghstack dependencies: #149482, #149483, #149484
2025-04-02 17:05:25 +00:00
William Wen
3ac5a499dd [dynamo] add dynamo disable reasons to codebase (#150440)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/150440
Approved by: https://github.com/jansel, https://github.com/zou3519
ghstack dependencies: #150341
2025-04-02 04:26:48 +00:00
Prajesh Praveen Anchalia
005c9b2f4f Fix _Waitcounter decorator and dd backward pass wait counter (#150235)
Summary:
This will log a wait counter for backward compile and fixes weirdness with nested context managers.

The old wait counters added through dynamo_timed were never created with the nesting issue. I am also changing the key nomenclature from `pytorch.dynamo_timed` to `pytorch.wait_counter`; we want to use the same nomenclature to make it easy to find keys.

Reviewed By: jamesjwu

Differential Revision: D72032055

Pull Request resolved: https://github.com/pytorch/pytorch/pull/150235
Approved by: https://github.com/jamesjwu, https://github.com/masnesral
2025-03-30 05:20:12 +00:00
Yuanhao Ji
d4da0e955e [Dynamo] Fix is_compile_supported() when device_type contains device index (#147837)
Fixes #147826

Pull Request resolved: https://github.com/pytorch/pytorch/pull/147837
Approved by: https://github.com/anijain2305
2025-03-28 07:16:29 +00:00
Ryan Guo
1c98dc3664 [dynamo] Fix handling of setattr with some tensor attributes (#149791)
We weren't handling `setattr(tensor_obj, "real", 42)` correctly, because
the attribute is a `GetSetDescriptorType` that has special setter logic.
See added test and comments for more explanations.

This patch makes it so that we graph break in those cases, rather than
resulting in silent incorrectness.
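
For illustration, the attribute kind in question can be inspected directly:

```
import types
import torch

# Tensor.real is a C-level getset descriptor with its own setter logic,
# so Dynamo can't model STORE_ATTR on it as a plain __dict__ write.
print(type(torch.Tensor.real))                                    # <class 'getset_descriptor'>
print(isinstance(torch.Tensor.real, types.GetSetDescriptorType))  # True
```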

Pull Request resolved: https://github.com/pytorch/pytorch/pull/149791
Approved by: https://github.com/mlazos
ghstack dependencies: #149481
2025-03-25 18:57:56 +00:00
Sam Larsen
1e30192b19 [logging] Add python version to dynamo_compile table (#149419)
Summary: This adds a version field like the following: `3.10.9+fb (3.10:1dd9be6, May  4 2022, 01:23:45) [Clang 15.0.7 (mononoke://mononoke.internal.tfbnw.net/fbsource 5d1601b0eed7426ac`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/149419
Approved by: https://github.com/c00w
2025-03-20 01:48:34 +00:00
Animesh Jain
a3c286677b [compile] Switch off inference mode during compilation (#149321)
This PR does the following:
* Turns `inference_mode` off and enables `no_grad` for `convert_frame` if inference_mode is on globally.
* Turns off inference_mode for fake tensor propagation. This ensures that converting a real inference tensor to a fake tensor removes the inference-ness.
* Graph breaks on `is_inference` and `is_inference_mode_enabled`.
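
A minimal repro shape for the scenario addressed above, i.e. compiling while inference_mode is globally on (a sketch, not a test from this PR):

```
import torch

@torch.compile
def f(x):
    return x * 2

with torch.inference_mode():
    # Compilation itself now runs with inference_mode switched off (under
    # no_grad), so fake-tensor propagation doesn't inherit inference-ness.
    y = f(torch.ones(4))
print(y)
```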

Pull Request resolved: https://github.com/pytorch/pytorch/pull/149321
Approved by: https://github.com/jansel, https://github.com/zou3519
2025-03-19 02:45:27 +00:00
Shunting Zhang
6c7d8419e3 fix two accuracy regression (#149172)
There are 2 accuracy regressions in the 3/12 nightly perf run. I cannot repro them locally, so there is no effective way to bisect. Raise the tolerance to make them pass the accuracy check.

- error log for HF MegatronBertForQuestionAnswering https://gist.github.com/shunting314/25322b66e15e98feed32e0d9a1e43316
- error log for TIMM gluon_inception_v3 https://gist.github.com/shunting314/df64ce22327df27a7057bbbd19ef5164

Pull Request resolved: https://github.com/pytorch/pytorch/pull/149172
Approved by: https://github.com/jansel, https://github.com/eellison
2025-03-17 19:34:00 +00:00
Sam Larsen
7cdbb913e7 [logging] Set compile_id in the CachingAutotuner during compilation so we have it for dynamo_timed logging (#148693)
Summary: This is a simpler alternative to https://github.com/pytorch/pytorch/pull/146455, where we can stick the compileId (and forward/backward bool) in the CachingAutotuner so that we have it for logging `benchmark_all_configs`. Recall that the first attempt put the compileId in the inductor_meta and that interfered with caching.

Test Plan:
`python benchmarks/dynamo/torchbench.py --performance --training --amp --backend inductor --device cuda --print-compilation-time --repeat 5 --cold-start-latency --only nanogpt`
* tlparse: https://fburl.com/e71yn6uc
* dynamo_compile: https://fburl.com/scuba/dynamo_compile/sandbox/4ageghhv
* pt2_compile_events: https://fburl.com/scuba/pt2_compile_events/4fgv1itq

Pull Request resolved: https://github.com/pytorch/pytorch/pull/148693
Approved by: https://github.com/eellison
2025-03-13 03:50:58 +00:00
Boyuan Feng
5b60749e9e [cudagraph] add log for skip reasons (#148797)
Summary: Add skip reasons to dynamo_compile so we can identify popular skip reasons for cudagraphs.

Test Plan: {F1975906635}

Differential Revision: D70820791

Pull Request resolved: https://github.com/pytorch/pytorch/pull/148797
Approved by: https://github.com/masnesral
2025-03-11 23:31:48 +00:00
PyTorch MergeBot
b54cf1a281 Revert "[logging] Set compile_id in the CachingAutotuner during compilation so we have it for dynamo_timed logging (#148693)"
This reverts commit 73c8068cf8.

Reverted https://github.com/pytorch/pytorch/pull/148693 on behalf of https://github.com/ZainRizvi due to This is breaking lint on trunk. Please rebase these changes before merging them back in. [GH job link](https://github.com/pytorch/pytorch/actions/runs/13796723235/job/38590020554) [HUD commit link](73c8068cf8) ([comment](https://github.com/pytorch/pytorch/pull/148693#issuecomment-2715671875))
2025-03-11 20:50:23 +00:00
Sam Larsen
73c8068cf8 [logging] Set compile_id in the CachingAutotuner during compilation so we have it for dynamo_timed logging (#148693)
Summary: This is a simpler alternative to https://github.com/pytorch/pytorch/pull/146455, where we can stick the compileId (and forward/backward bool) in the CachingAutotuner so that we have it for logging `benchmark_all_configs`. Recall that the first attempt put the compileId in the inductor_meta and that interfered with caching.

Test Plan:
`python benchmarks/dynamo/torchbench.py --performance --training --amp --backend inductor --device cuda --print-compilation-time --repeat 5 --cold-start-latency --only nanogpt`
* tlparse: https://fburl.com/e71yn6uc
* dynamo_compile: https://fburl.com/scuba/dynamo_compile/sandbox/4ageghhv
* pt2_compile_events: https://fburl.com/scuba/pt2_compile_events/4fgv1itq

Pull Request resolved: https://github.com/pytorch/pytorch/pull/148693
Approved by: https://github.com/eellison
2025-03-11 19:38:40 +00:00
PyTorch MergeBot
c916a8efc5 Revert "Use the device interface for detecting Triton availability (#139171)"
This reverts commit 940b60db97.

Reverted https://github.com/pytorch/pytorch/pull/139171 on behalf of https://github.com/ZainRizvi due to Sorry but this is breaking internally. @jansel can you please help get these changes working? See D70946254 for more details. To validate the fixes internally, you can follow the instructions here: https://fburl.com/fixing-ghfirst-reverts ([comment](https://github.com/pytorch/pytorch/pull/139171#issuecomment-2715392451))
2025-03-11 18:49:21 +00:00
George White
940b60db97 Use the device interface for detecting Triton availability (#139171)
This allows each device type to check current devices for Triton compatibility and ensure their Triton backend is present.

This PR replaces the `has_triton()` global method which was previously used for this task, and moves the initial check for each Inductor backend on to their associated `BaseScheduler` subclass. This means that other backends, such as Halide, can also implement their own availability checks.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/139171
Approved by: https://github.com/jansel
2025-03-11 03:56:11 +00:00
clr
6b0fd741d1 dynamo: Count number of opcodes processes (#147149)
This gives us a decent proxy for how big a graph we functionally had to parse.

Note that this is a cumulative counter. If people feel strongly, I can either write into the dynamo_timed datasets with metrics contexts, or clear the counters / write a counter per frame id as well.
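
For context, Dynamo's counters live in `torch._dynamo.utils.counters`; a sketch of reading them after a compile (the exact bucket/key for the new opcode count is not shown here):

```
import torch
from torch._dynamo.utils import counters

@torch.compile
def f(x):
    return x.sin() + x.cos()

f(torch.ones(4))
# counters is a defaultdict of collections.Counter, bucketed by category.
print({k: dict(v) for k, v in counters.items()})
```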

Pull Request resolved: https://github.com/pytorch/pytorch/pull/147149
Approved by: https://github.com/jansel
2025-03-10 19:20:09 +00:00
Sam Larsen
187d5c0eb1 [logging] Log cudagraphify timings to dynamo_timed (#143220)
Summary: This adds some new dynamo_timed calls in cudagraph_trees, primarily aiming to add cudagraph-related timing to scuba. Things to note:
* Uses the changes in https://github.com/pytorch/pytorch/pull/141919 to log "runtime" entries
* The logging for chromium/tlparse/scuba relies on us providing a compile_id since it's not available in the environment. A lot of the changes here are just passing around the compile_id
* I believe the spirit of the scuba logging is to capture the overheads of `torch.compile`. Therefore, I'm not adding _every_ dynamo_timed to scuba. For example, "run_eager" is the first real execution of the inductor graph -- it's not cudagraph overhead, per se. Watch out for the two instances of `dynamo_compile_runtime_column_us="runtime_cudagraphify_time_us"`. Those are the spots I believe are _extra_ overhead we'd contribute to torch.compile.

Test Plan:
`python benchmarks/dynamo/torchbench.py --performance --training --amp --backend inductor --device cuda --print-compilation-time --repeat 5 --cold-start-latency --only dcgan`:
* tlparse: https://fburl.com/21yrdn8h
* scuba: https://fburl.com/scuba/dynamo_compile/sandbox/wt90wnjz

`python benchmarks/dynamo/torchbench.py --performance --training --amp --backend inductor --device cuda --print-compilation-time --repeat 5 --cold-start-latency --only nanogpt`
* tlparse: https://fburl.com/r9mp7uiv
* scuba: https://fburl.com/scuba/dynamo_compile/sandbox/1nvx94re

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143220
Approved by: https://github.com/eellison
2025-03-07 23:07:13 +00:00
Ryan Guo
c8cd8f68bd [dynamo] Properly account for non-list instances in list comparison (#148470)
As title; this patch also removes an unused `list_compare` method.

Fixes #148179.
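
For reference, the eager-Python semantics the fix matches:

```
# Comparing a list with a non-list via == is simply False in Python,
# never an error; tracing should agree with eager here.
print([1, 2] == (1, 2))  # False
print([1, 2] == [1, 2])  # True
```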

Pull Request resolved: https://github.com/pytorch/pytorch/pull/148470
Approved by: https://github.com/anijain2305
2025-03-07 01:29:30 +00:00
Shunting Zhang
262411e48b [inductor] online softmax (#127011)
Softmax needs some preparation work that accesses the input tensor in two passes:
- compute the amax of each row
- compute (x - amax).exp().sum() for each row

When the row size is large, the cache cannot hold all the active data, and accessing the input in multiple passes increases execution time since the kernel is memory-bandwidth bound.

Online softmax uses a customized reduction to compute max and sum at the same time by accessing the data in one pass. Check this paper for more details (https://arxiv.org/abs/1805.02867).

Also here is an online softmax kernel generated by inductor as a reference: https://gist.github.com/shunting314/67ae4fffd45d4f2753c781780332fa54
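
A minimal single-pass reference implementation of the algorithm (a readability sketch, not the Inductor-generated kernel):

```
import torch

def online_softmax_row(row):
    # One pass: keep a running max m and a running sum d of exp(x - m),
    # rescaling d whenever m grows (Milakov & Gimelshein, 2018).
    m = torch.tensor(float("-inf"))
    d = torch.tensor(0.0)
    for x in row:
        m_new = torch.maximum(m, x)
        d = d * torch.exp(m - m_new) + torch.exp(x - m_new)
        m = m_new
    return torch.exp(row - m) / d

row = torch.randn(8)
print(torch.allclose(online_softmax_row(row), torch.softmax(row, dim=0)))  # True
```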

## Microbenchmark

- `TORCHINDUCTOR_COORDINATE_DESCENT_TUNING=1 TORCHINDUCTOR_ONLINE_SOFTMAX=0 DO_PERF_TEST=1 python test/inductor/test_online_softmax.py -k test_softmax` : without online softmax
  - eager_ms=6.671296119689941
  - opt_ms=8.06931209564209
- `TORCHINDUCTOR_COORDINATE_DESCENT_TUNING=1 TORCHINDUCTOR_ONLINE_SOFTMAX=1 DO_PERF_TEST=1 python test/inductor/test_online_softmax.py -k test_softmax`: with online softmax
  - eager_ms=6.634047985076904
  - opt_ms=6.230591773986816

Ideally, online softmax should save about 2ms here; we save about 1.84ms in practice.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127011
Approved by: https://github.com/jansel
2025-03-06 21:07:18 +00:00
Sam Larsen
40c2505f16 [logging] Log individual Triton kernel compilation times to dynamo_compile (#147022)
Summary: Gather the compilation times of individual Triton kernels and log them to dynamo_compile:
* Time compilation in `_worker_compile_triton`, pass the timing back to the main process, and log it from `get_result()`.
* Added a way to track the "top N" (or N most-expensive compiles) in the metrics_context. I did this because I doubt we really care to capture potentially thousands of kernel compile times. That would be problematic for scuba logging anyway, so let's limit the number we track from the beginning. Arbitrarily chose 25 for now.
* Format the list of compile times as a json string before logging.
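
A sketch of the "top N" bookkeeping from the second bullet, assuming a simple min-heap (illustrative, not the MetricsContext API):

```
import heapq
import json

class TopN:
    # Keep only the N most expensive (duration, kernel) pairs so we never
    # carry thousands of entries to the logger.
    def __init__(self, at_most=25):
        self.at_most = at_most
        self.heap = []  # min-heap: cheapest tracked compile at heap[0]

    def add(self, kernel, duration_s):
        if len(self.heap) < self.at_most:
            heapq.heappush(self.heap, (duration_s, kernel))
        elif duration_s > self.heap[0][0]:
            heapq.heapreplace(self.heap, (duration_s, kernel))

    def to_json(self):
        return json.dumps(sorted(self.heap, reverse=True))
```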

Test Plan:
`python benchmarks/dynamo/torchbench.py --performance --training --amp --backend inductor --device cuda --print-compilation-time --repeat 5 --cold-start-latency --only nanogpt`
Scuba: https://fburl.com/scuba/dynamo_compile/sandbox/nc4dzm3r

Pull Request resolved: https://github.com/pytorch/pytorch/pull/147022
Approved by: https://github.com/jamesjwu
2025-03-03 19:32:17 +00:00
William Wen
4caeede799 [dynamo] more better error messages [3/N] (#147494)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/147494
Approved by: https://github.com/jansel, https://github.com/zou3519
2025-02-28 06:23:28 +00:00
Xuehai Pan
3ce352e389 [BE][PYFMT] migrate PYFMT for torch._dynamo to ruff format (#144549)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144549
Approved by: https://github.com/jansel
2025-02-28 03:03:53 +00:00
Raymond Li
c5bf9aaf1c Log graph breaks (#146537)
Graph breaks currently aren't logged to dynamo_compile and pt2_compile_events. We want to log them.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146537
Approved by: https://github.com/c00w
2025-02-27 11:06:33 +00:00