Commit Graph

48708 Commits

Author SHA1 Message Date
Animesh Jain
7368eeba5e [dynamo][guards] Prevent LENGTH guard on nn modules (#154763)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154763
Approved by: https://github.com/williamwen42
2025-05-31 05:32:31 +00:00
henrylhtsang
7a79de1c0f [inductor] Add kernel_hash_key to ChoiceCaller (#154470)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154470
Approved by: https://github.com/mlazos
2025-05-31 03:09:37 +00:00
PyTorch MergeBot
bd10ea4e6c Revert "Use 3.27 as the minimum CMake version (#153153)"
This reverts commit ad26ec6abe.

Reverted https://github.com/pytorch/pytorch/pull/153153 on behalf of https://github.com/cyyever due to It still breaks windows debug builds ([comment](https://github.com/pytorch/pytorch/pull/153153#issuecomment-2923997777))
2025-05-31 02:14:24 +00:00
Peter Y. Yeh
43390d8b13 ROCm Sparsity through HipSparseLT (#150578)
TLDR:

- This pull request introduces support for hipSPARSELt in ROCm, current usage would be semi-structure sparsity.
- Require **ROCm 6.4** && **gfx942/gfx950**.
- The average performance uplift (compare to dense operation) is ~ 20% in ROCm 6.4 but expect further performance lift along the way.

### Dense vs. Sparse Performance Comparison

#### **NT (Row-major)**
**Average Uplift**: `1.20`

| M     | N      | K      | hipsparselt-bench (us) | hipblaslt-bench get all (us) | Uplift |
|-------|--------|--------|-------------------------|-------------------------------|--------|
| 14336 | 8      | 4096   | 20.05                   | 25.3                          | 1.26   |
| 4096  | 8      | 14336  | 21.07                   | 25.28                         | 1.20   |
| 3072  | 3072   | 10240  | 299.05                  | 351.82                        | 1.18   |
| 3072  | 1536   | 768    | 18.56                   | 20.05                         | 1.08   |
| 3072  | 17664  | 768    | 163.13                  | 173.91                        | 1.07   |
| 3072  | 196608 | 768    | 1717.30                 | 1949.63                       | 1.14   |
| 3072  | 24576  | 768    | 206.84                  | 242.98                        | 1.17   |
| 3072  | 6144   | 768    | 53.90                   | 56.88                         | 1.06   |
| 3072  | 98304  | 768    | 833.77                  | 962.28                        | 1.15   |
| 768   | 1536   | 768    | 8.53                    | 19.65                         | 2.30   |
| 768   | 17664  | 768    | 46.02                   | 46.84                         | 1.02   |
| 768   | 196608 | 768    | 463.15                  | 540.46                        | 1.17   |
| 768   | 24576  | 768    | 54.32                   | 59.55                         | 1.10   |
| 768   | 6144   | 768    | 19.47                   | 20.15                         | 1.03   |
| 768   | 98304  | 768    | 231.88                  | 258.73                        | 1.12   |

---

#### **NN (Row-major)**
**Average Uplift**: `1.13`

| M   | N      | K     | hipsparselt-bench (us) | hipblaslt-bench get all (us) | Uplift |
|-----|--------|-------|-------------------------|-------------------------------|--------|
| 768 | 1536   | 3072  | 27.50                   | 28.78                         | 1.05   |
| 768 | 17664  | 3072  | 125.06                  | 158.94                        | 1.27   |
| 768 | 196608 | 3072  | 1568.38                 | 1767.12                       | 1.13   |
| 768 | 24576  | 3072  | 171.05                  | 203.49                        | 1.19   |
| 768 | 6144   | 3072  | 58.72                   | 60.39                         | 1.03   |
| 768 | 98304  | 3072  | 787.15                  | 887.60                        | 1.13   |

-------------------------

This pull request introduces support for hipSPARSELt in ROCm, alongside various updates and improvements to the codebase and test suite. The changes primarily involve adding configuration flags, updating conditional checks, and ensuring compatibility with hipSPARSELt.

### ROCm and hipSPARSELt Support:

* [`BUILD.bazel`](diffhunk://#diff-7fc57714ef13c3325ce2a1130202edced92fcccc0c6db34a72f7b57f60d552a3R292): Added `@AT_HIPSPARSELT_ENABLED@` substitution to enable hipSPARSELt support.
* [`aten/CMakeLists.txt`](diffhunk://#diff-0604597797bb21d7c39150f9429d6b2ace10b79ab308514ad03f76153ae8249bR104-R110): Introduced a conditional flag to enable hipSPARSELt support based on ROCm version.
* [`aten/src/ATen/CMakeLists.txt`](diffhunk://#diff-ce80f3115ab2f6be5142f0678a1fc92c6b2d7727766ce44f48726c99e720f777R37): Added `AT_HIPSPARSELT_ENABLED` configuration.
* [`aten/src/ATen/cuda/CUDAConfig.h.in`](diffhunk://#diff-8bb82da825ca87c28233abacffa1b0566c73a54990b7a77f3f5108d3718fea15R11): Defined `AT_HIPSPARSELT_ENABLED` macro.
* `caffe2/CMakeLists.txt`, `cmake/Dependencies.cmake`, `cmake/public/LoadHIP.cmake`: Included hipSPARSELt in the ROCm dependencies. [[1]](diffhunk://#diff-c5ee05f1e918772792ff6f2a3f579fc2f182e57b1709fd786ef6dc711fd68b27R1380) [[2]](diffhunk://#diff-12e8125164bbfc7556b1781a8ed516e333cc0bf058acb7197f7415be44606c72L1084-R1084) [[3]](diffhunk://#diff-b98e27b9a5f196a6965a99ee5a7bb15b3fc633d6375b767635b1b04ccb2fd3d5R153)

### Codebase Updates:

* [`aten/src/ATen/native/sparse/cuda/cuSPARSELtOps.cpp`](diffhunk://#diff-ae921dd1584ab98fdd9c25a3521047795de702223f5b65fdaa45a5bd92b4d1f3R1-R6): Added hipSPARSELt support checks and initialization functions. Updated various methods to conditionally handle hipSPARSELt. [[1]](diffhunk://#diff-ae921dd1584ab98fdd9c25a3521047795de702223f5b65fdaa45a5bd92b4d1f3R1-R6) [[2]](diffhunk://#diff-ae921dd1584ab98fdd9c25a3521047795de702223f5b65fdaa45a5bd92b4d1f3R22-R67) [[3]](diffhunk://#diff-ae921dd1584ab98fdd9c25a3521047795de702223f5b65fdaa45a5bd92b4d1f3R78-R85) [[4]](diffhunk://#diff-ae921dd1584ab98fdd9c25a3521047795de702223f5b65fdaa45a5bd92b4d1f3R97-R109) [[5]](diffhunk://#diff-ae921dd1584ab98fdd9c25a3521047795de702223f5b65fdaa45a5bd92b4d1f3R183-R188) [[6]](diffhunk://#diff-ae921dd1584ab98fdd9c25a3521047795de702223f5b65fdaa45a5bd92b4d1f3L134-R200) [[7]](diffhunk://#diff-ae921dd1584ab98fdd9c25a3521047795de702223f5b65fdaa45a5bd92b4d1f3R213-R222) [[8]](diffhunk://#diff-ae921dd1584ab98fdd9c25a3521047795de702223f5b65fdaa45a5bd92b4d1f3L217-R285)

### Test Suite Updates:

* [`test/test_sparse_semi_structured.py`](diffhunk://#diff-b7b57bc1e34145ef89c7929751d5d26aeecc8edfb37da9c60e9d3f0a1335133cR50-R65): Added checks for hipSPARSELt availability and updated test conditions to skip tests not supported on ROCm. [[1]](diffhunk://#diff-b7b57bc1e34145ef89c7929751d5d26aeecc8edfb37da9c60e9d3f0a1335133cR50-R65) [[2]](diffhunk://#diff-b7b57bc1e34145ef89c7929751d5d26aeecc8edfb37da9c60e9d3f0a1335133cR228) [[3]](diffhunk://#diff-b7b57bc1e34145ef89c7929751d5d26aeecc8edfb37da9c60e9d3f0a1335133cR239) [[4]](diffhunk://#diff-b7b57bc1e34145ef89c7929751d5d26aeecc8edfb37da9c60e9d3f0a1335133cR250) [[5]](diffhunk://#diff-b7b57bc1e34145ef89c7929751d5d26aeecc8edfb37da9c60e9d3f0a1335133cR579) [[6]](diffhunk://#diff-b7b57bc1e34145ef89c7929751d5d26aeecc8edfb37da9c60e9d3f0a1335133cR624) [[7]](diffhunk://#diff-b7b57bc1e34145ef89c7929751d5d26aeecc8edfb37da9c60e9d3f0a1335133cR661) [[8]](diffhunk://#diff-b7b57bc1e34145ef89c7929751d5d26aeecc8edfb37da9c60e9d3f0a1335133cR695) [[9]](diffhunk://#diff-b7b57bc1e34145ef89c7929751d5d26aeecc8edfb37da9c60e9d3f0a1335133cR730) [[10]](diffhunk://#diff-b7b57bc1e34145ef89c7929751d5d26aeecc8edfb37da9c60e9d3f0a1335133cR755) [[11]](diffhunk://#diff-b7b57bc1e34145ef89c7929751d5d26aeecc8edfb37da9c60e9d3f0a1335133cR771) [[12]](diffhunk://#diff-b7b57bc1e34145ef89c7929751d5d26aeecc8edfb37da9c60e9d3f0a1335133cR809) [[13]](diffhunk://#diff-b7b57bc1e34145ef89c7929751d5d26aeecc8edfb37da9c60e9d3f0a1335133cR844) [[14]](diffhunk://#diff-b7b57bc1e34145ef89c7929751d5d26aeecc8edfb37da9c60e9d3f0a1335133cL840-R854) [[15]](diffhunk://#diff-b7b57bc1e34145ef89c7929751d5d26aeecc8edfb37da9c60e9d3f0a1335133cR1005)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/150578
Approved by: https://github.com/jeffdaily
2025-05-31 02:03:40 +00:00
cyy
ad26ec6abe Use 3.27 as the minimum CMake version (#153153)
Update the minimum CMake version to 3.27 because of it provides more CUDA targets such as `CUDA::nvperf_host` so that it is possible to remove some of our forked CUDA modules. See https://github.com/pytorch/pytorch/pull/153783.
It's also possible to facilitate future third-party updates such as FBGEMM (its current shipped version requires 3.21).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153153
Approved by: https://github.com/malfet
2025-05-31 01:54:35 +00:00
dolpm
472773c7f9 [nativert] move OpKernelKind enum to torch (#154756)
Summary: att

Test Plan: ci

Differential Revision: D75703996

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154756
Approved by: https://github.com/zhxchen17, https://github.com/cyyever
2025-05-31 01:31:29 +00:00
Natalia Gimelshein
f01e628e3b Resubmit Remove MemPoolContext (#154042) (#154746)
Summary: Per title

Test Plan: Added tests + existing tests

Differential Revision: D75695030

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154746
Approved by: https://github.com/malfet
2025-05-31 01:21:54 +00:00
PyTorch MergeBot
108422ac26 Revert "Use 3.27 as the minimum CMake version (#153153)"
This reverts commit 78624679a8.

Reverted https://github.com/pytorch/pytorch/pull/153153 on behalf of https://github.com/cyyever due to It still breaks windows debug builds ([comment](https://github.com/pytorch/pytorch/pull/153153#issuecomment-2923785799))
2025-05-31 00:28:03 +00:00
PyTorch MergeBot
0fab32290a Revert "[draft export] avoid storing intermediate real tensors in proxies (#154630)"
This reverts commit 5acb8d5080.

Reverted https://github.com/pytorch/pytorch/pull/154630 on behalf of https://github.com/malfet due to This still ooms, at least occasionally see 78624679a8/1 ([comment](https://github.com/pytorch/pytorch/pull/154630#issuecomment-2923759745))
2025-05-31 00:07:56 +00:00
Yidi Wu
faf973da5e [refactor] move materialize_as_graph to _higher_order_ops/utils.py (#154070)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154070
Approved by: https://github.com/zou3519
2025-05-31 00:06:44 +00:00
cyy
78624679a8 Use 3.27 as the minimum CMake version (#153153)
Update the minimum CMake version to 3.27 because of it provides more CUDA targets such as `CUDA::nvperf_host` so that it is possible to remove some of our forked CUDA modules. See https://github.com/pytorch/pytorch/pull/153783.
It's also possible to facilitate future third-party updates such as FBGEMM (its current shipped version requires 3.21).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153153
Approved by: https://github.com/malfet
2025-05-31 00:01:52 +00:00
Pian Pawakapan
5f1c3c67b2 [pgo] log dynamic whitelist in PT2 Compile Events (#154747)
Summary: logs the whitelist to PT2 Compile Events

Test Plan: loggercli codegen GeneratedPt2CompileEventsLoggerConfig

Reviewed By: bobrenjc93

Differential Revision: D75617963

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154747
Approved by: https://github.com/angelayi
2025-05-30 23:54:24 +00:00
Aaron Gokaslan
bbda22e648 [BE][Ez]: Optimize unnecessary lambda with operator (#154722)
Automated edits performed by FURB118. Operator is implemented in C and way faster when passed to another C method like sorted, max etc as a `key=`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154722
Approved by: https://github.com/jansel
2025-05-30 23:47:10 +00:00
David Berard
eb93c0adb1 [inductor][AMD] support special kwargs in AMD triton configs (#154605)
**Context**:

AMD triton kernels can be launched with special kwargs, like `waves_per_eu`. Triton configs with these kwargs look like this:

```
triton.Config({
    "BLOCK_SIZE": 64,
    "waves_per_eu": 2,
})
```

in comparison, nvidia's special kwargs are explicit parameters on the config, e.g. num_warps:

```
triton.Config(
    {"BLOCK_SIZE": 64},
    num_warps=4,
)
```

**Problem**: this causes custom triton kernels w/ PT2 to error out, because there's a kwarg in the triton.Config that doesn't appear in the kernel signature.

**Solution**: When splicing in the constexpr values into the arg list, ignore any values in the config kwargs list if they don't appear in the function signature.

Differential Revision: [D75599629](https://our.internmc.facebook.com/intern/diff/D75599629/)

**NOTE FOR REVIEWERS**: This PR has internal Meta-specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D75599629/)!

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154605
Approved by: https://github.com/njriasan
2025-05-30 22:24:32 +00:00
PyTorch MergeBot
1193bf0855 Revert "convert inductor codecache to use getArtifactLogger (#153766)"
This reverts commit 5b6fd277f9.

Reverted https://github.com/pytorch/pytorch/pull/153766 on behalf of https://github.com/malfet due to I want to revert this change as I'm 90+% certain it somehow broke testing ([comment](https://github.com/pytorch/pytorch/pull/153766#issuecomment-2923620806))
2025-05-30 22:20:07 +00:00
Pian Pawakapan
5acb8d5080 [draft export] avoid storing intermediate real tensors in proxies (#154630)
Handles GC for non-strict draft export; GPU memory usage shouldn't be much more than eager mode + input tensors now.

While trying to do draft export CPU offloading, I found out GC is feasible, because in non-strict, there's 2 places holding references to a `.real_tensor` attribute:
1) the FakeTensors in fake tensor prop, but these are held by the actual variables in the model's forward call, and so the real tensor gets gc-ed along with the fake one when the variable goes out of scope.
2) A clone of the fake tensor in 1) stored in `proxy.node.meta["val"]`, which was added in https://github.com/pytorch/pytorch/pull/150948. But we didn't actually need to store them on intermediate values; the placeholders are enough for retracing/lowering.

Avoiding storing the intermediate values in 2), the values in 1) should be naturally GC-ed, and the real-tensor memory usage for non-strict should be pretty similar to eager computation?

Strict still OOMs; dynamo still holds these in variable tracking, and not sure how to GC those.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154630
Approved by: https://github.com/angelayi, https://github.com/yushangdi
2025-05-30 21:06:55 +00:00
Feny Patel
abc2264e8f remove another instance of mtia_workloadd from pytorch (#154739)
Summary: ^

Test Plan: CIs

Differential Revision: D75692171

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154739
Approved by: https://github.com/sraikund16
2025-05-30 20:50:46 +00:00
Paul Zhang
22a4cabd19 [Inductor] Add NaN assert to returned values from generated code (#154455)
Summary: It is possible to have `reinterpret_tensor` in the output of inductor codegen, e.g. `reinterpret_tensor(buf366, (1024, ), (1, ), 0)` in the return tuple. This adds assertions to all return values from inductor codegen to prevent nans from slipping through and being hard to trace.

Test Plan:
NaN asserts properly generated in example gemm script:

    vars = (buf1, primals_2, buf2, primals_1, )
    for var in vars:
        if isinstance(var, torch.Tensor):
            assert not var.isnan().any().item()
            assert not var.isinf().any().item()

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154455
Approved by: https://github.com/eellison
2025-05-30 20:32:56 +00:00
Alessandro Sangiorgi
f57754e815 [Inductor] Record Triton’s Base32 Cache Key in .best_config for Debugging (#154618)
This is a follow-up PR of the reverted one https://github.com/pytorch/pytorch/pull/148981 re-opening for visibility :

Modified TorchInductor’s autotuning flow so that each best_config JSON file also includes the Triton “base32” (or base64) cache key.

Motivation

Debugging & Analysis: With this change, we can quickly identify which compiled binary and IRs belongs to a given best config.
The impact is minimal since it is only an extra field in .best_config. It can help advanced performance tuning or kernel-level debugging.

Also, since Triton already stores cubin/hsaco in its cache, developers/researchers can avoid to set store_cubin = True since they can get the cubin/hsaco in the Triton cache and with the code provided in this PR, they can easily match the best_config with the right Triton cache directory for the "best" kernel.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154618
Approved by: https://github.com/jansel
2025-05-30 19:30:25 +00:00
Ryan Guo
967937872f [dynamo] Remove dead code path for torch.Tensor.view(*shape) (#154646)
This was introduced in early days of Dynamo, and looks like it's been
fixed since -- the regression test `test_transpose_for_scores` passes.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154646
Approved by: https://github.com/Skylion007, https://github.com/zou3519
ghstack dependencies: #154645
2025-05-30 18:50:58 +00:00
Ryan Guo
f9dc20c7a3 [dynamo] Fix syntax error in aot graph from kwarg-less torch.Tensor.[random_|uniform_] calls (#154645)
As title, fixes #151432, see more context in the issue discussion.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154645
Approved by: https://github.com/zou3519
2025-05-30 18:50:58 +00:00
PyTorch MergeBot
fb67fa9968 Revert "[Inductor] Add NaN assert to returned values from generated code (#154455)"
This reverts commit aec3ef1008.

Reverted https://github.com/pytorch/pytorch/pull/154455 on behalf of https://github.com/malfet due to Looks like it broke inductor/test_compile_subprocess.py::CpuTests::test_AllenaiLongformerBase, see 35fc5c49b4/1(default%2C%20&mergeEphemeralLF=true ([comment](https://github.com/pytorch/pytorch/pull/154455#issuecomment-2923154249))
2025-05-30 18:45:01 +00:00
PyTorch MergeBot
35fc5c49b4 Revert "[internal] Expose additional metadata to compilation callbacks (#153596)"
This reverts commit f889dea97d.

Reverted https://github.com/pytorch/pytorch/pull/153596 on behalf of https://github.com/izaitsevfb due to introduces bunch of callback-related failures on rocm ([comment](https://github.com/pytorch/pytorch/pull/153596#issuecomment-2923139061))
2025-05-30 18:39:27 +00:00
Aaron Gokaslan
b6b9311f4f [BE][Ez]: Fix typo in dynamo utils #154639 (#154748)
Fixes a typo in #154639

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154748
Approved by: https://github.com/ngimel
2025-05-30 18:39:01 +00:00
Aaron Gokaslan
2120eeb8de [BE][Ez]: Improve dynamo utils typing with TypeIs and TypeGuard (#154639)
Adds some additional TypeIs and TypeGuard to some _dynamo utils for additional type narrowing

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154639
Approved by: https://github.com/jansel
2025-05-30 18:09:50 +00:00
zeshengzong
1b569e5490 Fix load_state_dict description (#154599)
Fixes #141364

Fix missing description in `assign` param

## Test Result

### Before
![image](https://github.com/user-attachments/assets/5928c691-4e31-463b-aa0a-86eb8bb452e5)

### After
![image](https://github.com/user-attachments/assets/036631a2-0f20-4a71-95c3-2c0fd732293e)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154599
Approved by: https://github.com/colesbury, https://github.com/mikaylagawarecki
2025-05-30 18:08:59 +00:00
Shivam Raikundalia
30ac7f4d4e [EZ/Memory Snapshot] Remove Handle even if compile_context not set (#154664)
Summary: When setting the memory snapshot callback we register and unregister callbacks for performance reasons. For ease of use, it makes sense to just remove all callbacks regardless of which flags are enabled. The enable stays behind a feature flag, this just changes the disable to ignore the flag itself.

Test Plan: Ran without any flags and saw all callbacks removed.

Differential Revision: D75636035

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154664
Approved by: https://github.com/sanrise, https://github.com/aaronenyeshi
2025-05-30 18:08:37 +00:00
dolpm
65d8dba735 [nativert] move layout planner settings to torch (#154668)
Summary: att

Test Plan: ci

Differential Revision: D75633031

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154668
Approved by: https://github.com/zhxchen17
2025-05-30 17:33:27 +00:00
Sidharth
3bdceab124 [dynamo] fix: added star operator for graph_break_hints (#154713)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154713
Approved by: https://github.com/zou3519, https://github.com/williamwen42
2025-05-30 17:31:03 +00:00
Henry Hu
802ffd06c8 [Export] Add math module for deserialization (#154643)
Summary: As title

Test Plan: ci

Differential Revision: D75580646

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154643
Approved by: https://github.com/yushangdi
2025-05-30 17:29:25 +00:00
Aaron Orenstein
fc0135ca11 Re-enable FakeTensor caching for SymInts (#152662)
Summary:

This backs out D60320595 which itself turned off FakeTensor caching when a SymInt was present.

There has been a lot of dynamic shape fixes done this year and tests pass so I'm assuming some of that work fixed what was breaking previously.

Test Plan: Reran the tests listed in T196779132 and they pass.

## Perf
### Instruction Counter Benchmark:
- 26% win on add_loop_eager_dynamic
- 13% win on add_loop_inductor_dynamic_gpu
### Perf Dashboard
Compilation Latency wins across the board but especially strong on the dynamic tests (like cudagraphs_dynamic) - for example MobileBertForMaskedLM went from 66s -> 50s.

Differential Revision: [D75467694](https://our.internmc.facebook.com/intern/diff/D75467694)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/152662
Approved by: https://github.com/anijain2305
2025-05-30 17:23:36 +00:00
Pian Pawakapan
3027051590 [export] avoid float/bool specialization for scalar tensor construction (#154661)
Fixes #153411

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154661
Approved by: https://github.com/angelayi
2025-05-30 17:18:21 +00:00
bobrenjc93
e7bf72c908 [multigraph] fix composabilty with aotautograd cache (#153526)
AOTAutogradCache uses FXGraphCache which uses the tracing context to get the ShapeEnv. Although the TracingContext global_context is cleared by the time we get around to reusing it, we don't actually need it. We just need the ShapeEnv in the TracingContext, which isn't cleared at the end of dynamo and does persist. This PR adds the tracing context manager around the specialized compile to ensure our caching infrastructure can get access to the ShapeEnv. A test was also added to prove correctness.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153526
Approved by: https://github.com/jamesjwu, https://github.com/zou3519
ghstack dependencies: #153433, #153449
2025-05-30 16:56:17 +00:00
Ryan Guo
7183f52675 [dynamo] Support namedtuple subclass (#153982)
Fixes #133762. This involves
1. support tuple subclass constructed inside compile region.
2. handle the "fake" global scope associated with NamedTuple-generated
   `__new__`.
3. handle `namedtuple._tuplegetter` more faithfully.

Differential Revision: [D75488091](https://our.internmc.facebook.com/intern/diff/D75488091)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/153982
Approved by: https://github.com/jansel
ghstack dependencies: #154176
2025-05-30 16:14:37 +00:00
Ryan Guo
8002d22ce3 [dynamo] Trace into descriptor with __set__ (#154176)
As title, this patch basically implements
https://github.com/python/cpython/blob/3.11/Objects/object.c#L1371-L1452,
and make the `__get__` handling more robust.

I ran into this while fixing #133762.

Differential Revision: [D75488090](https://our.internmc.facebook.com/intern/diff/D75488090)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154176
Approved by: https://github.com/jansel
2025-05-30 16:14:37 +00:00
PyTorch MergeBot
31f95b5d2e Revert "inductor codecache: include private inductor configs in cache key (#153672)"
This reverts commit 2c1cb38d95.

Reverted https://github.com/pytorch/pytorch/pull/153672 on behalf of https://github.com/malfet due to Looks like it regressed pr_time_benchmarks, see ba3f91af97/1 ([comment](https://github.com/pytorch/pytorch/pull/153672#issuecomment-2922759739))
2025-05-30 15:54:14 +00:00
Randolf Scholz
ba3f91af97 Type hints for distributions/utils (#154712)
Fixes #144196
Part of #144219

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154712
Approved by: https://github.com/Skylion007
2025-05-30 15:50:31 +00:00
PyTorch MergeBot
7e8532077f Revert "Use 3.27 as the minimum CMake version (#153153)"
This reverts commit 1ece53b157.

Reverted https://github.com/pytorch/pytorch/pull/153153 on behalf of https://github.com/cyyever due to It still breaks windows debug builds ([comment](https://github.com/pytorch/pytorch/pull/153153#issuecomment-2922369830))
2025-05-30 13:16:33 +00:00
cyy
1ece53b157 Use 3.27 as the minimum CMake version (#153153)
Update the minimum CMake version to 3.27 because of it provides more CUDA targets such as `CUDA::nvperf_host` so that it is possible to remove some of our forked CUDA modules. See https://github.com/pytorch/pytorch/pull/153783.
It's also possible to facilitate future third-party updates such as FBGEMM (its current shipped version requires 3.21).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153153
Approved by: https://github.com/malfet
2025-05-30 11:25:30 +00:00
Laith Sakka
9d6f0d5991 avoid sym_max on nested int in is_contiguous. (#154633)
calling is_contiguous will fail due to sym_max not being supported for nested int, this address in a way consistent with
make_contiguous_strides_for
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154633
Approved by: https://github.com/bobrenjc93
2025-05-30 09:59:33 +00:00
Paul Zhang
aec3ef1008 [Inductor] Add NaN assert to returned values from generated code (#154455)
Summary: It is possible to have `reinterpret_tensor` in the output of inductor codegen, e.g. `reinterpret_tensor(buf366, (1024, ), (1, ), 0)` in the return tuple. This adds assertions to all return values from inductor codegen to prevent nans from slipping through and being hard to trace.

Test Plan:
NaN asserts properly generated in example gemm script:

    vars = (buf1, primals_2, buf2, primals_1, )
    for var in vars:
        if isinstance(var, torch.Tensor):
            assert not var.isnan().any().item()
            assert not var.isinf().any().item()

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154455
Approved by: https://github.com/eellison
2025-05-30 08:53:24 +00:00
Bob Ren
dc82e911e7 remove allow-untyped-defs from torch/utils/data/datapipes/iter/filelister.py (#154624)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154624
Approved by: https://github.com/Skylion007
2025-05-30 08:38:05 +00:00
PyTorch MergeBot
639f459cb6 Revert "[Inductor] Add NaN assert to returned values from generated code (#154455)"
This reverts commit c3de2c7c6b.

Reverted https://github.com/pytorch/pytorch/pull/154455 on behalf of https://github.com/huydhn due to Sorry for reverting your change, I am trying to see if it help fix the broken trunk below.  It it does not help, I will reland the PR ([comment](https://github.com/pytorch/pytorch/pull/154455#issuecomment-2921562089))
2025-05-30 08:11:22 +00:00
Simon Fan
f889dea97d [internal] Expose additional metadata to compilation callbacks (#153596)
These hooks are used by internal stuck job detection to associate compilation events with the compile lease. Previously, we only had events for Dynamo and Inductor compilation. And recently, the callback handler was updated to ignore nested events. So the Inductor event was only really used by lazy backward.

Here, I remove the inductor event, and add an explicit lazy backward one. Additionally, I add other runtime compilation events: autotuning and cudagraphs. I also expose the CompileId as a string to avoid imports, this will let internal UIs track each graph's contribution to the timeout.

```python
class CallbackTrigger(enum.Enum):
    # most common case, dynamo attempts to trace a new frame
    DYNAMO = 1
    # backward compilation can be deferred to runtime
    LAZY_BACKWARD = 2
    # some backends autotune at runtime
    TRITON_AUTOTUNING = 3
    # cudagraphs record at runtime
    CUDAGRAPH_RECORDING = 4
```

Differential Revision: [D75092426](https://our.internmc.facebook.com/intern/diff/D75092426)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/153596
Approved by: https://github.com/masnesral
2025-05-30 08:07:04 +00:00
drisspg
208965a9d6 Fix unbackend symint error (#154672)
## Summary

Me and @laithsakka  spoke offline about this one, TLDR is that we wanted this
![image](https://github.com/user-attachments/assets/2e537612-3261-4fbe-a6b9-f8ff92ba3c37)

to also be true for Inductor. In that vein we added two new apis to size-vars which is `guard_or_false`, or `guard_or_true`
with the semantics:

guard_or_false, guard_or_true:

Those APIs may add guards, but will never fail with data-dependent errors; They will try to evaluate the expression with the possibility of adding guards, if that fails due to data dependency, instead of hard failing. False or True are returned.

When to use this?

Performance optimizations that warrant a recompilation.

Take the general path and add a runtime check.
```
# Consider this branching.
if x==0:
    return 1
else
    return 10
# To make data dependent friendly, it can be written as the following:
if guard_or_false(x==0):
    return 1
else
  torch.check(x!=0) # runtime check
  return 10
```

However there is still 1 more api to add to make this example work which is the torch.check which works with expressions, I will leave that to the @laithsakka

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154672
Approved by: https://github.com/laithsakka
2025-05-30 07:45:01 +00:00
Bob Ren
5a7442b91f remove allow-untyped-defs from torch/distributed/checkpoint/resharding.py (#154626)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154626
Approved by: https://github.com/Skylion007
2025-05-30 07:43:04 +00:00
Bob Ren
d66a55def0 remove allow-untyped-defs from torch/distributed/elastic/utils/logging.py (#154625)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154625
Approved by: https://github.com/Skylion007
2025-05-30 07:37:56 +00:00
Bob Ren
382b38ed1b remove allow-untyped-defs from torch/nn/utils/_expanded_weights/conv_expanded_weights.py (#154623)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154623
Approved by: https://github.com/Skylion007
2025-05-30 07:32:57 +00:00
Bob Ren
0df96e3921 remove allow-untyped-defs from torch/ao/quantization/stubs.py (#154622)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154622
Approved by: https://github.com/Skylion007
2025-05-30 07:26:09 +00:00
Xuanteng Huang
30f7079c93 [FSDP2] allow different dtypes for no grad model params (#154103)
Fixes #154082

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154103
Approved by: https://github.com/weifengpy
2025-05-30 07:00:54 +00:00