Commit Graph

1105 Commits

Author SHA1 Message Date
Laith Sakka
cc09d3a5ba remove float args benchmark (#155674)
This benchmark very sensitive. removing it for now until we make it better .

<img width="755" alt="Screenshot 2025-06-11 at 12 01 25 AM" src="https://github.com/user-attachments/assets/01a45ae5-2028-42a2-b819-c30d4db3b5d4" />

Pull Request resolved: https://github.com/pytorch/pytorch/pull/155674
Approved by: https://github.com/bdhirsh, https://github.com/bobrenjc93
2025-06-11 20:34:58 +00:00
Oguz Ulgen
d1947a8707 Migrate from lru_cache to cache (#155613)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/155613
Approved by: https://github.com/ezyang
ghstack dependencies: #155612
2025-06-11 19:44:18 +00:00
Animesh Jain
c881f2ddf3 [reland][dynamo] Mark a vt unspecialized nn module variable source earlier (#155099)
Reland of https://github.com/pytorch/pytorch/pull/154780

Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/155099
Approved by: https://github.com/williamwen42
2025-06-04 23:05:36 +00:00
Jeff Daily
3ce5102927 [ROCm] fix CI failures from inductor periodic (#154896)
Similar idea as https://github.com/pytorch/pytorch/pull/154497, but for ROCm.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154896
Approved by: https://github.com/jeffdaily

Co-authored-by: Jeff Daily <jeff.daily@amd.com>
2025-06-04 15:28:43 +00:00
PyTorch MergeBot
a99a01a677 Revert "[dynamo] Mark a vt unspecialized nn module variable source earlier (#154780)"
This reverts commit cc96febb97.

Reverted https://github.com/pytorch/pytorch/pull/154780 on behalf of https://github.com/seemethere due to This fails internal testing see, https://fburl.com/diff/b0yuxk4w ([comment](https://github.com/pytorch/pytorch/pull/154780#issuecomment-2940381691))
2025-06-04 15:03:34 +00:00
eellison
40a8770154 Incorporate coalesce analysis in codegen (#153751)
This pr uses the coalescing information in generating a tiling. The previous tiling heuristic would have each dependency generate a tiling. Then, we sum up the score for each generated tiling, preferring any 2d tiling over the default. The new tiling heuristics scores each tiling by its global coalesced memory. This gives both a potentially better tiling (especially for more complicated, 3d patterns) as well as information we can use in generating block sizes.

In triton heuristics, for generating 3d tiled reductions, we take the same total block size that the 2d reduction would use, then distribute the block according to whichever block coalesces the most memory.

The motivating kernel is in https://github.com/pytorch/pytorch/issues/149982 which is a 32 element reduction. A smaller version of it is [here](https://gist.github.com/eellison/0fa9396f5479eb4dba09756e3bf6ff2a). We need to run this kernel once in the forward per linear layer on a contiguous tensor, and once in the backward on a transposed tensor.

While the contiguous kernel has coalesced accesses, and is performant on master, the transposed version accesses uncoalesced memory on main and is ~2.8x slower. See, this [full log](https://gist.github.com/eellison/fa644bfd9d0ae11dadb62e17a5d48a83) from the above repro. Now, with this PR, it is only ~1.15x slower. See the [updated log](https://gist.github.com/eellison/0b2b653309494d28cf7b48929a022075).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153751
Approved by: https://github.com/jansel
ghstack dependencies: #153723, #153730, #153748
2025-06-04 00:22:57 +00:00
Animesh Jain
cc96febb97 [dynamo] Mark a vt unspecialized nn module variable source earlier (#154780)
I am working on providing some skip guard helper functions to allow users to reduce guard overhead. This is a refactor to allow that.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154780
Approved by: https://github.com/StrongerXi, https://github.com/jansel
2025-06-03 19:19:47 +00:00
bobrenjc93
28f27886eb Vary batch size when running dynamic shapes benchmarks (#154805)
This better measures the actual runtime performance of dynamic shapes
where we aren't guaranteed to have similar shapes as the original hint.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154805
Approved by: https://github.com/Skylion007
ghstack dependencies: #154802, #154826, #154822, #154823
2025-06-02 18:56:18 +00:00
bobrenjc93
b90fc2ec27 [ez] delete code that died a long time ago (#154802)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154802
Approved by: https://github.com/Skylion007
2025-06-01 14:57:03 +00:00
Pearu Peterson
6a781619bf Temporarily disable sparse tensor validation when loading from external storage. (#154758)
As in the title per https://github.com/pytorch/pytorch/issues/153143#issuecomment-2917793067 .

The plan is to workout a solution that will allow (1) disabling pinned memory check to fix the original issue and (2) switching off the sparse tensor validation for maximal performance in loading sparse tensors.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154758
Approved by: https://github.com/amjames, https://github.com/ngimel
2025-05-31 19:45:44 +00:00
Aaron Orenstein
fc0135ca11 Re-enable FakeTensor caching for SymInts (#152662)
Summary:

This backs out D60320595 which itself turned off FakeTensor caching when a SymInt was present.

There has been a lot of dynamic shape fixes done this year and tests pass so I'm assuming some of that work fixed what was breaking previously.

Test Plan: Reran the tests listed in T196779132 and they pass.

## Perf
### Instruction Counter Benchmark:
- 26% win on add_loop_eager_dynamic
- 13% win on add_loop_inductor_dynamic_gpu
### Perf Dashboard
Compilation Latency wins across the board but especially strong on the dynamic tests (like cudagraphs_dynamic) - for example MobileBertForMaskedLM went from 66s -> 50s.

Differential Revision: [D75467694](https://our.internmc.facebook.com/intern/diff/D75467694)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/152662
Approved by: https://github.com/anijain2305
2025-05-30 17:23:36 +00:00
Isuru Fernando
9ba67e99bb [dynamo] keep C++ symbolic shape guards disabled for benchmarks (#151225)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/151225
Approved by: https://github.com/anijain2305
2025-05-29 23:29:39 +00:00
Laith Sakka
39df901b2a introduce definitely_contiguous and use it for reshape and tensor meta data computation. (#153432)
when a tensor has unbacked symbols it can be general enough to represent both contiguous and non contiguous tensors.
in that case we cant really evaluate is_contiguous. In many places in the code base, we check for is_contiguous to take a fast path. but the general path usually works for both contiguous and not contiguous in that case we probably want
to use definitely _contiguous API.

This is appleid for reshape in this PR and also to  tensor meta data computation, the meta data now will have an attribute that says that its contiguous when its always contiguous. We would store that only if definitely _contiguous is true  now.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153432
Approved by: https://github.com/bobrenjc93
2025-05-28 03:41:26 +00:00
Boyuan Feng
514409d032 update torchvision pin (#154255)
Fixes #153985

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154255
Approved by: https://github.com/desertfire
2025-05-27 16:15:25 +00:00
PyTorch MergeBot
3f64502c98 Revert "Re-enable FakeTensor caching for SymInts (#152662)"
This reverts commit 7d11c61c26.

Reverted https://github.com/pytorch/pytorch/pull/152662 on behalf of https://github.com/malfet due to Looks like it broke bunch of inductor tests, see 187d38185e/1 ([comment](https://github.com/pytorch/pytorch/pull/152662#issuecomment-2910293593))
2025-05-26 17:13:22 +00:00
Aaron Orenstein
7d11c61c26 Re-enable FakeTensor caching for SymInts (#152662)
Summary:

This backs out D60320595 which itself turned off FakeTensor caching when a SymInt was present.

There has been a lot of dynamic shape fixes done this year and tests pass so I'm assuming some of that work fixed what was breaking previously.

Test Plan: Reran the tests listed in T196779132 and they pass.

## Perf
### Instruction Counter Benchmark:
- 26% win on add_loop_eager_dynamic
- 13% win on add_loop_inductor_dynamic_gpu
### Perf Dashboard
Compilation Latency wins across the board but especially strong on the dynamic tests (like cudagraphs_dynamic) - for example MobileBertForMaskedLM went from 66s -> 50s.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/152662
Approved by: https://github.com/anijain2305
2025-05-26 04:17:56 +00:00
Eddie Yan
76ed9db468 [cuBLAS][cuBLASLt] Use cuBLAS default workspace size in Lt (#153556)
Also enables unified workspaces by default for non-FBCODE use cases.
Default Lt workspace size is also updated to match cuBLAS logic for default, including for Blackwell (SM 10.0) and GeForce Blackwell (SM 12.0).

Recommended defaults are documented here:
https://docs.nvidia.com/cuda/cublas/#cublassetworkspace

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153556
Approved by: https://github.com/Skylion007, https://github.com/ngimel
2025-05-24 03:43:35 +00:00
Laith Sakka
9e089bb5b6 change guard_or impl for better perf and simplicity (#153674)
PR time benchmarks has been showing regressions as we move to guard_or_false, reason is that prev implementation do not cache.
This new approach will propagate the fallback value to eval and return it. allowing eval to cache and reducing scamming logs and complexity.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153674
Approved by: https://github.com/bobrenjc93
2025-05-23 15:24:28 +00:00
Huy Do
7509b150af Don't upload compiler benchmark debug info to the benchmark database (#153769)
During our debug session, @wdvr and I found out that the benchmark database is growing much faster than we expect.  After taking a closer look, the majority of them coming from TorchInductor benchmark and the top 3 are all debug information not used by any dashboard atm.  In the period of 7 days, there are close to 6 millions records ([query](https://paste.sh/GUVCBa0v#UzszFCZaWQxh7oSVsZtfZdVE))

```
Benchmark,Metric,Count
"TorchInductor","user_stack","1926014"
"TorchInductor","reason","1926014"
"TorchInductor","model","1926014"
```

Let's skip uploading them to avoid bloating the database.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153769
Approved by: https://github.com/malfet
2025-05-23 01:18:26 +00:00
Benjamin Glass
768cb734ec cpp_wrapper: build non-performance-sensitive code at O1 (#148773)
Builds on #148212, applying the same improvements to `cpp_wrapper` mode.

Benchmark results:

* [A100 Benchmarks](https://hud.pytorch.org/benchmark/compilers?dashboard=torchinductor&startTime=Wed%2C%2014%20May%202025%2015%3A10%3A05%20GMT&stopTime=Wed%2C%2021%20May%202025%2015%3A10%3A05%20GMT&granularity=hour&mode=inference&dtype=bfloat16&deviceName=cuda%20(a100)&lBranch=gh/benjaminglass1/77/orig&lCommit=ca7d0a3f16e3c511534d2cd03d695be8524570d3&rBranch=main&rCommit=1075bb37d34e483763a09c7810790d5491441e13)
* [x86 Benchmarks](https://hud.pytorch.org/benchmark/compilers?dashboard=torchinductor&startTime=Wed%2C%2014%20May%202025%2015%3A10%3A05%20GMT&stopTime=Wed%2C%2021%20May%202025%2015%3A10%3A05%20GMT&granularity=hour&mode=inference&dtype=bfloat16&deviceName=cpu%20(x86)&lBranch=gh/benjaminglass1/77/orig&lCommit=ca7d0a3f16e3c511534d2cd03d695be8524570d3&rBranch=main&rCommit=1075bb37d34e483763a09c7810790d5491441e13)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/148773
Approved by: https://github.com/desertfire
2025-05-23 00:51:20 +00:00
Huy Do
6cd9d66b7f Allow higher fp16 tolerance for phlippe_resnet on CUDA 12.8 (#154109)
After https://github.com/pytorch/pytorch/pull/154004, one of the model `phlippe_resnet` needs higher tolerance for fp16 on CUDA 12.8.  I can reproduce it locally with:

```
python benchmarks/dynamo/torchbench.py --accuracy --timing --explain --print-compilation-time --inductor --device cuda --training --amp --only phlippe_resnet

E0522 02:47:12.392000 2130213 site-packages/torch/_dynamo/utils.py:2949] RMSE (res-fp64): 0.00144, (ref-fp64): 0.00036 and shape=torch.Size([]). res.dtype: torch.float32, multiplier: 3.000000, tol: 0.001000, use_larger_multiplier_for_smaller_tensor: 0
```

I'm not sure what exactly happens behind the scene, but this should help fix the CI failure.

Also remove some left over expected accuracy results for CUDA 12.4 which we are not using anymore on CI for benchmark jobs.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154109
Approved by: https://github.com/Skylion007, https://github.com/malfet
2025-05-22 14:25:12 +00:00
Gabriel Ferns
254293b777 Add flag _metrics_log_runtime to disable runtime metric logging by default (#153506)
https://github.com/pytorch/pytorch/pull/152708 expanded support of `get_estimated_runtime` to many more types of `SchedulerNodes`. This caused an increase in compile time because we're always calling `get_estimated_runtime` to populate the metrics table. This PR adds a flag for this logging, which reduces the instruction count by 8%. Long term, we should probably merge metrics.py with TORCH_LOGS/tlparse (suggestion from @xmfan).

Update: added support for TORCH_LOGS for the metrics logging.

Test Plan:
mm_loop.py and many existing tests cover.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153506
Approved by: https://github.com/eellison
2025-05-22 01:02:11 +00:00
PyTorch MergeBot
3443627e07 Revert "[BE]: Enable RUFF TRY400 rule - log.exception (#153473)"
This reverts commit 4f4ecc583e.

Reverted https://github.com/pytorch/pytorch/pull/153473 on behalf of https://github.com/jeanschmidt due to seems to have broken internal signals, @albanD may I count on you to help the author merge his PR? D74837988 ([comment](https://github.com/pytorch/pytorch/pull/153473#issuecomment-2886017075))
2025-05-16 08:29:26 +00:00
PyTorch MergeBot
4d073af58c Revert "[inductor][dynamo] Include operator name in size/stride/alignment assertion (#152353)"
This reverts commit 725bbb6b5f.

Reverted https://github.com/pytorch/pytorch/pull/152353 on behalf of https://github.com/jeanschmidt due to seems to have broken a few internal tests, @jansel may you help the author get his PR merged? ([comment](https://github.com/pytorch/pytorch/pull/152353#issuecomment-2885997862))
2025-05-16 08:20:39 +00:00
Nikita Shulga
754b758ea1 [BE] Extend empty_gpu_cache to mps (#153657)
And replace `if: elif:` with `getattr()`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153657
Approved by: https://github.com/atalman, https://github.com/wdvr, https://github.com/ZainRizvi
2025-05-16 01:08:54 +00:00
Aaron Gokaslan
4f4ecc583e [BE]: Enable RUFF TRY400 rule - log.exception (#153473)
Change logging.error to logging.exception to log additional information when relevant.  A few places have slipped in logging.errors in try except since I last did a clean up here and the rule is stabilized so I am enabling it codebase wide. I have NOQA'd much of our custom exception stack trace handling for RPC calls and distributed and tried to a fix a few errors based on whether we immediately reraised it or if we didn't print any exception handling where it could be useful.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153473
Approved by: https://github.com/albanD, https://github.com/cyyever
2025-05-15 13:36:59 +00:00
karthickai
725bbb6b5f [inductor][dynamo] Include operator name in size/stride/alignment assertion (#152353)
Fixes #151930

This PR updates the `assert_size_stride` and `assert_alignment` functions in [guards.cpp](https://github.com/pytorch/pytorch/blob/main/torch/csrc/dynamo/guards.cpp) to accept an optional `op_name` argument and includes it in the error messages.

The corresponding type stubs in [guards.pyi](https://github.com/pytorch/pytorch/blob/main/torch/_C/_dynamo/guards.pyi) are updated to match the new function arg.

In [inductor/ir.py](https://github.com/pytorch/pytorch/blob/main/torch/_inductor/ir.py) extracts the operator name from the FX graph and passes it into the `codegen_size_asserts` and `codegen_alignment_asserts` functions, so that generated assertions in Triton code include the op name for better debugging.

Added unit tests inside [test_torchinductor.py](https://github.com/pytorch/pytorch/blob/main/test/inductor/test_torchinductor.py).
- Verified both successful and failing assertion cases include the operator name.
- Verified that generated Triton code contains the op name inside the asserts.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/152353
Approved by: https://github.com/jansel
2025-05-15 02:33:57 +00:00
Animesh Jain
03d01860fd [dynamo][compile-time] Compute logging related flags once (#153426)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/153426
Approved by: https://github.com/jansel
2025-05-14 21:19:06 +00:00
Animesh Jain
8f3d7972ad [dynamo][compile-time] Cache the function signature to speedup inlining (#153396)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/153396
Approved by: https://github.com/jansel, https://github.com/StrongerXi
ghstack dependencies: #153333
2025-05-14 14:01:46 +00:00
Animesh Jain
864a5f4434 [dynamo][compile-time] Cache the cleaned insturctions while inlining (#153333)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/153333
Approved by: https://github.com/StrongerXi, https://github.com/jansel, https://github.com/williamwen42
2025-05-14 09:26:26 +00:00
Animesh Jain
11c64b7cf8 [dynamo][compile-time] Cache whether a function is inlineable (#153192)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/153192
Approved by: https://github.com/StrongerXi, https://github.com/jansel, https://github.com/williamwen42
ghstack dependencies: #153458
2025-05-14 05:40:25 +00:00
Benjamin Glass
e8596c291b Fix misleadingly high AOT Inductor dashboard performance (#153060)
Fixes misleadingly high AOTInductor performance benchmark numbers in scenarios where a model updates internal parameters during `torch.export.export`. Since `FakeTensorMode` is enabled during export, all such parameters become `FakeTensor`s, slowing down future eager-mode runs using that model substantively. This, in turn, causes misleading performance stats, where the slowness of eager-mode makes `AOTInductor` look _very_ good.

An [example benchmark](https://hud.pytorch.org/benchmark/timm_models/inductor_aot_inductor?dashboard=torchinductor&startTime=Wed%2C%2030%20Apr%202025%2015%3A54%3A04%20GMT&stopTime=Wed%2C%2007%20May%202025%2015%3A54%3A04%20GMT&granularity=hour&mode=inference&dtype=bfloat16&deviceName=cuda%20(h100)&lBranch=main&lCommit=1dd36ad2d440a4f3faf724b3a8e13925e3180c24&rBranch=main&rCommit=cc7346bf19c019255dcb4484694a75850ed74d5a&model=convit_base) with this issue. The equivalent `cpp_wrapper` benchmark run shows a 2x performance gain, not 20x.

Only two benchmarks we regularly run are affected by this, both in the TIMM set.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153060
Approved by: https://github.com/desertfire
2025-05-13 20:59:59 +00:00
Michael Lazos
ff039d39ec [Dynamo] Optimize dedupe region ancestor tracking (#152589)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/152589
Approved by: https://github.com/anijain2305
ghstack dependencies: #152389, #152505, #152410, #152506, #152570, #152572
2025-05-13 12:17:59 +00:00
Laith Sakka
c4fb0b6f33 refresh expected results (#150166)
@huydhn when do you think we will have the APIs to access results on oss storage available so we do not
have to worry about this racing again?
Also is there a way to accelerate unstability in this after we land it?

Pull Request resolved: https://github.com/pytorch/pytorch/pull/150166
Approved by: https://github.com/bobrenjc93, https://github.com/eellison, https://github.com/anijain2305
2025-05-13 04:04:42 +00:00
Aaron Gokaslan
3555ebb63d [BE]: Update ruff to 0.11.8 (#153249)
Fixes a ton of false negatives throughout the codebase. RUFF also properly validates NOQA comments now and most of the changes are fixing typos there or removing filewide flake8 suppressions that were also silencing ruff issues.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153249
Approved by: https://github.com/cyyever, https://github.com/albanD, https://github.com/seemethere
2025-05-12 18:30:52 +00:00
PyTorch MergeBot
aa7fe6af41 Revert "[Dynamo] Optimize dedupe region ancestor tracking (#152589)"
This reverts commit b5f1345f72.

Reverted https://github.com/pytorch/pytorch/pull/152589 on behalf of https://github.com/jeanschmidt due to Breaking internal signal citadel-fbcode-test-mode-opt-for-pt2_stack_for_internal-linux-0 please see diff [D74531503](https://www.internalfb.com/diff/D74531503) for more details ([comment](https://github.com/pytorch/pytorch/pull/152410#issuecomment-2871168679))
2025-05-12 07:15:09 +00:00
Michael Lazos
b5f1345f72 [Dynamo] Optimize dedupe region ancestor tracking (#152589)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/152589
Approved by: https://github.com/anijain2305
ghstack dependencies: #152389, #152505, #152410, #152506, #152570, #152572
2025-05-10 08:27:56 +00:00
Pian Pawakapan
d808a3e203 [dynamic shapes] guard_or_false for computeStorageNbytes (#150483)
removes fast path for computing storage, fixes some adjacent tests

Pull Request resolved: https://github.com/pytorch/pytorch/pull/150483
Approved by: https://github.com/laithsakka
2025-05-09 19:31:19 +00:00
Pian Pawakapan
8ea95d2e73 [inductor] dtype promotion error in cat decomp (#152995)
cloning single tensor wasn't following dtype promotion rules
for SAM model: https://github.com/pytorch/pytorch/issues/152606

Pull Request resolved: https://github.com/pytorch/pytorch/pull/152995
Approved by: https://github.com/yushangdi, https://github.com/eellison
2025-05-09 16:58:58 +00:00
Animesh Jain
ab829ec629 [dynamo][pr_time_benchmark] Add dynamo benchmark to stress test inlining (#153159)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/153159
Approved by: https://github.com/laithsakka
ghstack dependencies: #152883, #153105
2025-05-09 00:09:19 +00:00
Pian Pawakapan
4166373908 [dynamic shapes] guard_or_false for infer_size (#152146)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/152146
Approved by: https://github.com/laithsakka
2025-05-08 21:27:22 +00:00
Animesh Jain
ecd74c953f [dynamo] Recursively realize the stack_values (#152853)
Might also fix - https://github.com/pytorch/pytorch/issues/135696

Pull Request resolved: https://github.com/pytorch/pytorch/pull/152853
Approved by: https://github.com/Lucaskabela, https://github.com/mlazos, https://github.com/jansel
2025-05-07 02:36:44 +00:00
Aaron Gokaslan
07a29dbe81 [BE]: Update cutlass submodule to 3.9.2 (#152779)
A lot of last minute bugfixes for CUTLASS blackwell that we should upstream. It's a header only library and a minor release so this should strictly improve compiler support and fix some bugs. Needed to update some instruction numbers in torch compile baselines for the new kernels

Pull Request resolved: https://github.com/pytorch/pytorch/pull/152779
Approved by: https://github.com/henrylhtsang
2025-05-06 16:08:24 +00:00
PyTorch MergeBot
fcd5e49138 Revert "[dynamo] Recursively realize the stack_values (#152853)"
This reverts commit 460888f908.

Reverted https://github.com/pytorch/pytorch/pull/152853 on behalf of https://github.com/malfet due to Looks like it broke inductor tests ([comment](https://github.com/pytorch/pytorch/pull/152853#issuecomment-2854897485))
2025-05-06 15:02:57 +00:00
Animesh Jain
460888f908 [dynamo] Recursively realize the stack_values (#152853)
Might also fix - https://github.com/pytorch/pytorch/issues/135696

Pull Request resolved: https://github.com/pytorch/pytorch/pull/152853
Approved by: https://github.com/Lucaskabela, https://github.com/mlazos, https://github.com/jansel
2025-05-06 06:30:31 +00:00
rzou
2b37a726e0 Refactor layout constraint selection logic (#148104)
This PR:

- cleans up some existing comments that don't make sense anymore
- hooks up the "custom_op_default_layout_constraint" back (that seems to
have broken)
- cleans up the "lazy registration path" which seems to never get hit
anymore
- adds dislike_padding to nodes that require exact strides

Test Plan:
- tests + CI

disable padding

Pull Request resolved: https://github.com/pytorch/pytorch/pull/148104
Approved by: https://github.com/shunting314, https://github.com/eellison
2025-05-03 00:02:24 +00:00
rzou
64957db6c9 Fix some inductor periodic benchmarks (#152605)
Some were reporting "pass" consistently on https://hud.pytorch.org/
Those are fine to flip.

I filed a separate issue for the now-regressions for AOTI:
https://github.com/pytorch/pytorch/issues/152606. These should be looked
at.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/152605
Approved by: https://github.com/eellison, https://github.com/huydhn
2025-05-01 22:18:30 +00:00
Jason Ansel
15a3f58f91 Return ConstantVariable(None) from WithExitFunctionVariable.exit to prevent NoneType crash inside autocast exception path (#152503)
Copy of #152013 with PR time benchmarks updated (regressions seem unrelated)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/152503
Approved by: https://github.com/anijain2305, https://github.com/Skylion007

Co-authored-by: Witold Dziurdz <wdziurdz@habana.ai>
2025-05-01 04:01:24 +00:00
Huy Do
3f10091d3c Clean up conda usage in benchmark scripts (#152552)
Fixes https://github.com/pytorch/pytorch/issues/152123.

* Switch `benchmarks/dynamo/Makefile` to use uv.  Note that these scripts are only used locally, so it's kind of ok to keep conda here IMO.  But switching to uv is probably nicer to most folks.
* Delete some files that are outdated and not used anymore

Pull Request resolved: https://github.com/pytorch/pytorch/pull/152552
Approved by: https://github.com/atalman, https://github.com/albanD
2025-04-30 21:27:29 +00:00
Gabriel Ferns
ce00ec7ecf Enable max autotune for AOTInductor benchmark (#149309)
With this PR, AOTinductor can choose to run into max-autotune mode when benchmarking.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/149309
Approved by: https://github.com/desertfire

Co-authored-by: Gabriel Ferns <gabeferns@meta.com>
2025-04-28 06:54:26 +00:00