pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Author	SHA1	Message	Date
Animesh Jain	7368eeba5e	[dynamo][guards] Prevent LENGTH guard on nn modules (#154763 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/154763 Approved by: https://github.com/williamwen42	2025-05-31 05:32:31 +00:00
henrylhtsang	7a79de1c0f	[inductor] Add kernel_hash_key to ChoiceCaller (#154470 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/154470 Approved by: https://github.com/mlazos	2025-05-31 03:09:37 +00:00
PyTorch MergeBot	bd10ea4e6c	Revert "Use 3.27 as the minimum CMake version (#153153 )" This reverts commit `ad26ec6abe`. Reverted https://github.com/pytorch/pytorch/pull/153153 on behalf of https://github.com/cyyever due to It still breaks windows debug builds ([comment](https://github.com/pytorch/pytorch/pull/153153#issuecomment-2923997777))	2025-05-31 02:14:24 +00:00
Peter Y. Yeh	43390d8b13	ROCm Sparsity through HipSparseLT (#150578 )

TLDR:
- This pull request introduces support for hipSPARSELt in ROCm; the current use case is semi-structured sparsity.
- Requires ROCm 6.4 and gfx942/gfx950.
- The average performance uplift (compared to the dense operation) is ~20% in ROCm 6.4, with further improvement expected along the way.

### Dense vs. Sparse Performance Comparison

#### NT (Row-major)

Average Uplift: `1.20`

| M | N | K | hipsparselt-bench (us) | hipblaslt-bench get all (us) | Uplift |
|-------|--------|--------|-------------------------|-------------------------------|--------|
| 14336 | 8 | 4096 | 20.05 | 25.3 | 1.26 |
| 4096 | 8 | 14336 | 21.07 | 25.28 | 1.20 |
| 3072 | 3072 | 10240 | 299.05 | 351.82 | 1.18 |
| 3072 | 1536 | 768 | 18.56 | 20.05 | 1.08 |
| 3072 | 17664 | 768 | 163.13 | 173.91 | 1.07 |
| 3072 | 196608 | 768 | 1717.30 | 1949.63 | 1.14 |
| 3072 | 24576 | 768 | 206.84 | 242.98 | 1.17 |
| 3072 | 6144 | 768 | 53.90 | 56.88 | 1.06 |
| 3072 | 98304 | 768 | 833.77 | 962.28 | 1.15 |
| 768 | 1536 | 768 | 8.53 | 19.65 | 2.30 |
| 768 | 17664 | 768 | 46.02 | 46.84 | 1.02 |
| 768 | 196608 | 768 | 463.15 | 540.46 | 1.17 |
| 768 | 24576 | 768 | 54.32 | 59.55 | 1.10 |
| 768 | 6144 | 768 | 19.47 | 20.15 | 1.03 |
| 768 | 98304 | 768 | 231.88 | 258.73 | 1.12 |

---

#### NN (Row-major)

Average Uplift: `1.13`

| M | N | K | hipsparselt-bench (us) | hipblaslt-bench get all (us) | Uplift |
|-----|--------|-------|-------------------------|-------------------------------|--------|
| 768 | 1536 | 3072 | 27.50 | 28.78 | 1.05 |
| 768 | 17664 | 3072 | 125.06 | 158.94 | 1.27 |
| 768 | 196608 | 3072 | 1568.38 | 1767.12 | 1.13 |
| 768 | 24576 | 3072 | 171.05 | 203.49 | 1.19 |
| 768 | 6144 | 3072 | 58.72 | 60.39 | 1.03 |
| 768 | 98304 | 3072 | 787.15 | 887.60 | 1.13 |
-------------------------

This pull request introduces support for hipSPARSELt in ROCm, alongside various updates and improvements to the codebase and test suite. The changes primarily involve adding configuration flags, updating conditional checks, and ensuring compatibility with hipSPARSELt.

### ROCm and hipSPARSELt Support:

* `BUILD.bazel`: Added `@AT_HIPSPARSELT_ENABLED@` substitution to enable hipSPARSELt support.
* `aten/CMakeLists.txt`: Introduced a conditional flag to enable hipSPARSELt support based on ROCm version.
* `aten/src/ATen/CMakeLists.txt`: Added `AT_HIPSPARSELT_ENABLED` configuration.
* `aten/src/ATen/cuda/CUDAConfig.h.in`: Defined `AT_HIPSPARSELT_ENABLED` macro.
* `caffe2/CMakeLists.txt`, `cmake/Dependencies.cmake`, `cmake/public/LoadHIP.cmake`: Included hipSPARSELt in the ROCm dependencies.

### Codebase Updates:

* `aten/src/ATen/native/sparse/cuda/cuSPARSELtOps.cpp`: Added hipSPARSELt support checks and initialization functions. Updated various methods to conditionally handle hipSPARSELt.

### Test Suite Updates:

* `test/test_sparse_semi_structured.py`: Added checks for hipSPARSELt availability and updated test conditions to skip tests not supported on ROCm.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/150578 Approved by: https://github.com/jeffdaily	2025-05-31 02:03:40 +00:00
cyy	ad26ec6abe	Use 3.27 as the minimum CMake version (#153153 ) Update the minimum CMake version to 3.27 because it provides more CUDA targets such as `CUDA::nvperf_host`, which makes it possible to remove some of our forked CUDA modules. See https://github.com/pytorch/pytorch/pull/153783. It also makes future third-party updates easier, such as FBGEMM (its current shipped version requires 3.21). Pull Request resolved: https://github.com/pytorch/pytorch/pull/153153 Approved by: https://github.com/malfet	2025-05-31 01:54:35 +00:00
dolpm	472773c7f9	[nativert] move OpKernelKind enum to torch (#154756 ) Summary: att Test Plan: ci Differential Revision: D75703996 Pull Request resolved: https://github.com/pytorch/pytorch/pull/154756 Approved by: https://github.com/zhxchen17, https://github.com/cyyever	2025-05-31 01:31:29 +00:00
Natalia Gimelshein	f01e628e3b	Resubmit Remove MemPoolContext (#154042 ) (#154746 ) Summary: Per title Test Plan: Added tests + existing tests Differential Revision: D75695030 Pull Request resolved: https://github.com/pytorch/pytorch/pull/154746 Approved by: https://github.com/malfet	2025-05-31 01:21:54 +00:00
PyTorch MergeBot	108422ac26	Revert "Use 3.27 as the minimum CMake version (#153153 )" This reverts commit `78624679a8`. Reverted https://github.com/pytorch/pytorch/pull/153153 on behalf of https://github.com/cyyever due to It still breaks windows debug builds ([comment](https://github.com/pytorch/pytorch/pull/153153#issuecomment-2923785799))	2025-05-31 00:28:03 +00:00
PyTorch MergeBot	0fab32290a	Revert "[draft export] avoid storing intermediate real tensors in proxies (#154630 )" This reverts commit `5acb8d5080`. Reverted https://github.com/pytorch/pytorch/pull/154630 on behalf of https://github.com/malfet due to This still ooms, at least occasionally see `78624679a8/1` ([comment](https://github.com/pytorch/pytorch/pull/154630#issuecomment-2923759745))	2025-05-31 00:07:56 +00:00
Yidi Wu	faf973da5e	[refactor] move materialize_as_graph to _higher_order_ops/utils.py (#154070 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/154070 Approved by: https://github.com/zou3519	2025-05-31 00:06:44 +00:00
cyy	78624679a8	Use 3.27 as the minimum CMake version (#153153 ) Update the minimum CMake version to 3.27 because it provides more CUDA targets such as `CUDA::nvperf_host`, which makes it possible to remove some of our forked CUDA modules. See https://github.com/pytorch/pytorch/pull/153783. It also makes future third-party updates easier, such as FBGEMM (its current shipped version requires 3.21). Pull Request resolved: https://github.com/pytorch/pytorch/pull/153153 Approved by: https://github.com/malfet	2025-05-31 00:01:52 +00:00
Pian Pawakapan	5f1c3c67b2	[pgo] log dynamic whitelist in PT2 Compile Events (#154747 ) Summary: logs the whitelist to PT2 Compile Events Test Plan: loggercli codegen GeneratedPt2CompileEventsLoggerConfig Reviewed By: bobrenjc93 Differential Revision: D75617963 Pull Request resolved: https://github.com/pytorch/pytorch/pull/154747 Approved by: https://github.com/angelayi	2025-05-30 23:54:24 +00:00
Aaron Gokaslan	bbda22e648	[BE][Ez]: Optimize unnecessary lambda with operator (#154722 ) Automated edits performed by FURB118. Operator is implemented in C and way faster when passed to another C method like sorted, max etc as a `key=` Pull Request resolved: https://github.com/pytorch/pytorch/pull/154722 Approved by: https://github.com/jansel	2025-05-30 23:47:10 +00:00
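A minimal sketch of the kind of rewrite FURB118 performs, shown on made-up data: a lambda passed as `key=` is replaced with the equivalent callable from the standard `operator` module. The variable names are illustrative only and are not taken from the PR.

```python
import operator

# Hypothetical sample data: (name, count) pairs.
pairs = [("b", 2), ("a", 3), ("c", 1)]

# Before: the key function is a Python-level lambda.
by_count_lambda = sorted(pairs, key=lambda p: p[1])

# After: operator.itemgetter(1) does the same lookup without a Python-level call.
by_count_operator = sorted(pairs, key=operator.itemgetter(1))

# The same replacement applies to other callers that accept key=, such as max.
largest = max(pairs, key=operator.itemgetter(1))
```

Both sorts produce the same ordering; only the key callable changes.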
David Berard	eb93c0adb1	[inductor][AMD] support special kwargs in AMD triton configs (#154605 )

Context: AMD triton kernels can be launched with special kwargs, like `waves_per_eu`. Triton configs with these kwargs look like this:

```
triton.Config({
    "BLOCK_SIZE": 64,
    "waves_per_eu": 2,
})
```

In comparison, nvidia's special kwargs are explicit parameters on the config, e.g. num_warps:

```
triton.Config(
    {"BLOCK_SIZE": 64},
    num_warps=4,
)
```

Problem: this causes custom triton kernels w/ PT2 to error out, because there's a kwarg in the triton.Config that doesn't appear in the kernel signature.

Solution: When splicing in the constexpr values into the arg list, ignore any values in the config kwargs list if they don't appear in the function signature.

Differential Revision: [D75599629](https://our.internmc.facebook.com/intern/diff/D75599629/) NOTE FOR REVIEWERS: This PR has internal Meta-specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D75599629/)! Pull Request resolved: https://github.com/pytorch/pytorch/pull/154605 Approved by: https://github.com/njriasan	2025-05-30 22:24:32 +00:00
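The filtering idea in the solution above can be sketched in plain Python, without Triton. This is not Inductor's implementation: `kernel` and `filter_config_kwargs` are hypothetical names, and `inspect.signature` is used here only as one possible way to find out which names the function accepts.

```python
import inspect


def kernel(x_ptr, n_elements, BLOCK_SIZE):
    """Hypothetical stand-in for a kernel whose signature has one constexpr arg."""
    return (x_ptr, n_elements, BLOCK_SIZE)


def filter_config_kwargs(fn, config_kwargs):
    """Keep only the config kwargs that appear as parameters of fn."""
    params = inspect.signature(fn).parameters
    return {k: v for k, v in config_kwargs.items() if k in params}


# "waves_per_eu" is a launch-time option, not a kernel parameter, so it is dropped.
config_kwargs = {"BLOCK_SIZE": 64, "waves_per_eu": 2}
constexpr_args = filter_config_kwargs(kernel, config_kwargs)
result = kernel("ptr", 128, **constexpr_args)
```

Without the filter, passing `waves_per_eu` to `kernel` would raise a `TypeError` for an unexpected keyword argument, which is the plain-Python analogue of the failure described in the commit.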
PyTorch MergeBot	1193bf0855	Revert "convert inductor codecache to use getArtifactLogger (#153766 )" This reverts commit `5b6fd277f9`. Reverted https://github.com/pytorch/pytorch/pull/153766 on behalf of https://github.com/malfet due to I want to revert this change as I'm 90+% certain it somehow broke testing ([comment](https://github.com/pytorch/pytorch/pull/153766#issuecomment-2923620806))	2025-05-30 22:20:07 +00:00
Pian Pawakapan	5acb8d5080	[draft export] avoid storing intermediate real tensors in proxies (#154630 ) Handles GC for non-strict draft export; GPU memory usage shouldn't be much more than eager mode + input tensors now. While trying to do draft export CPU offloading, I found out GC is feasible, because in non-strict, there's 2 places holding references to a `.real_tensor` attribute: 1) the FakeTensors in fake tensor prop, but these are held by the actual variables in the model's forward call, and so the real tensor gets gc-ed along with the fake one when the variable goes out of scope. 2) A clone of the fake tensor in 1) stored in `proxy.node.meta["val"]`, which was added in https://github.com/pytorch/pytorch/pull/150948. But we didn't actually need to store them on intermediate values; the placeholders are enough for retracing/lowering. Avoiding storing the intermediate values in 2), the values in 1) should be naturally GC-ed, and the real-tensor memory usage for non-strict should be pretty similar to eager computation? Strict still OOMs; dynamo still holds these in variable tracking, and not sure how to GC those. Pull Request resolved: https://github.com/pytorch/pytorch/pull/154630 Approved by: https://github.com/angelayi, https://github.com/yushangdi	2025-05-30 21:06:55 +00:00
Feny Patel	abc2264e8f	remove another instance of mtia_workloadd from pytorch (#154739 ) Summary: ^ Test Plan: CIs Differential Revision: D75692171 Pull Request resolved: https://github.com/pytorch/pytorch/pull/154739 Approved by: https://github.com/sraikund16	2025-05-30 20:50:46 +00:00
Paul Zhang	22a4cabd19	[Inductor] Add NaN assert to returned values from generated code (#154455 )

Summary: It is possible to have `reinterpret_tensor` in the output of inductor codegen, e.g. `reinterpret_tensor(buf366, (1024, ), (1, ), 0)` in the return tuple. This adds assertions to all return values from inductor codegen to prevent nans from slipping through and being hard to trace.

Test Plan: NaN asserts properly generated in example gemm script:

    vars = (buf1, primals_2, buf2, primals_1, )
    for var in vars:
        if isinstance(var, torch.Tensor):
            assert not var.isnan().any().item()
            assert not var.isinf().any().item()

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154455 Approved by: https://github.com/eellison	2025-05-30 20:32:56 +00:00
Alessandro Sangiorgi	f57754e815	[Inductor] Record Triton’s Base32 Cache Key in .best_config for Debugging (#154618 ) This is a follow-up to the reverted PR https://github.com/pytorch/pytorch/pull/148981, re-opened for visibility. It modifies TorchInductor’s autotuning flow so that each best_config JSON file also includes the Triton “base32” (or base64) cache key. Motivation: Debugging & Analysis. With this change, we can quickly identify which compiled binary and IRs belong to a given best config. The impact is minimal since it is only an extra field in .best_config. It can help advanced performance tuning or kernel-level debugging. Also, since Triton already stores cubin/hsaco in its cache, developers/researchers can avoid setting store_cubin = True: they can get the cubin/hsaco from the Triton cache and, with the code provided in this PR, easily match the best_config with the right Triton cache directory for the "best" kernel. Pull Request resolved: https://github.com/pytorch/pytorch/pull/154618 Approved by: https://github.com/jansel	2025-05-30 19:30:25 +00:00
Ryan Guo	967937872f	[dynamo] Remove dead code path for `torch.Tensor.view(*shape)` (#154646 ) This was introduced in early days of Dynamo, and looks like it's been fixed since -- the regression test `test_transpose_for_scores` passes. Pull Request resolved: https://github.com/pytorch/pytorch/pull/154646 Approved by: https://github.com/Skylion007, https://github.com/zou3519 ghstack dependencies: #154645	2025-05-30 18:50:58 +00:00
Ryan Guo	f9dc20c7a3	[dynamo] Fix syntax error in aot graph from kwarg-less `torch.Tensor.[random_\|uniform_]` calls (#154645 ) As title, fixes #151432, see more context in the issue discussion. Pull Request resolved: https://github.com/pytorch/pytorch/pull/154645 Approved by: https://github.com/zou3519	2025-05-30 18:50:58 +00:00
PyTorch MergeBot	fb67fa9968	Revert "[Inductor] Add NaN assert to returned values from generated code (#154455 )" This reverts commit `aec3ef1008`. Reverted https://github.com/pytorch/pytorch/pull/154455 on behalf of https://github.com/malfet due to Looks like it broke inductor/test_compile_subprocess.py::CpuTests::test_AllenaiLongformerBase, see `35fc5c49b4/1`(default%2C%20&mergeEphemeralLF=true ([comment](https://github.com/pytorch/pytorch/pull/154455#issuecomment-2923154249))	2025-05-30 18:45:01 +00:00
PyTorch MergeBot	35fc5c49b4	Revert "[internal] Expose additional metadata to compilation callbacks (#153596 )" This reverts commit `f889dea97d`. Reverted https://github.com/pytorch/pytorch/pull/153596 on behalf of https://github.com/izaitsevfb due to introduces bunch of callback-related failures on rocm ([comment](https://github.com/pytorch/pytorch/pull/153596#issuecomment-2923139061))	2025-05-30 18:39:27 +00:00
Aaron Gokaslan	b6b9311f4f	[BE][Ez]: Fix typo in dynamo utils #154639 (#154748 ) Fixes a typo in #154639 Pull Request resolved: https://github.com/pytorch/pytorch/pull/154748 Approved by: https://github.com/ngimel	2025-05-30 18:39:01 +00:00
Aaron Gokaslan	2120eeb8de	[BE][Ez]: Improve dynamo utils typing with TypeIs and TypeGuard (#154639 ) Adds some additional TypeIs and TypeGuard to some _dynamo utils for additional type narrowing Pull Request resolved: https://github.com/pytorch/pytorch/pull/154639 Approved by: https://github.com/jansel	2025-05-30 18:09:50 +00:00
zeshengzong	1b569e5490	Fix load_state_dict description (#154599 ) Fixes #141364 Fix missing description in `assign` param ## Test Result ### Before ![image](https://github.com/user-attachments/assets/5928c691-4e31-463b-aa0a-86eb8bb452e5) ### After ![image](https://github.com/user-attachments/assets/036631a2-0f20-4a71-95c3-2c0fd732293e) Pull Request resolved: https://github.com/pytorch/pytorch/pull/154599 Approved by: https://github.com/colesbury, https://github.com/mikaylagawarecki	2025-05-30 18:08:59 +00:00
Shivam Raikundalia	30ac7f4d4e	[EZ/Memory Snapshot] Remove Handle even if compile_context not set (#154664 ) Summary: When setting the memory snapshot callback we register and unregister callbacks for performance reasons. For ease of use, it makes sense to just remove all callbacks regardless of which flags are enabled. The enable stays behind a feature flag, this just changes the disable to ignore the flag itself. Test Plan: Ran without any flags and saw all callbacks removed. Differential Revision: D75636035 Pull Request resolved: https://github.com/pytorch/pytorch/pull/154664 Approved by: https://github.com/sanrise, https://github.com/aaronenyeshi	2025-05-30 18:08:37 +00:00
dolpm	65d8dba735	[nativert] move layout planner settings to torch (#154668 ) Summary: att Test Plan: ci Differential Revision: D75633031 Pull Request resolved: https://github.com/pytorch/pytorch/pull/154668 Approved by: https://github.com/zhxchen17	2025-05-30 17:33:27 +00:00
Sidharth	3bdceab124	[dynamo] fix: added star operator for graph_break_hints (#154713 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/154713 Approved by: https://github.com/zou3519, https://github.com/williamwen42	2025-05-30 17:31:03 +00:00
Henry Hu	802ffd06c8	[Export] Add math module for deserialization (#154643 ) Summary: As title Test Plan: ci Differential Revision: D75580646 Pull Request resolved: https://github.com/pytorch/pytorch/pull/154643 Approved by: https://github.com/yushangdi	2025-05-30 17:29:25 +00:00
Aaron Orenstein	fc0135ca11	Re-enable FakeTensor caching for SymInts (#152662 ) Summary: This backs out D60320595 which itself turned off FakeTensor caching when a SymInt was present. There has been a lot of dynamic shape fixes done this year and tests pass so I'm assuming some of that work fixed what was breaking previously. Test Plan: Reran the tests listed in T196779132 and they pass. ## Perf ### Instruction Counter Benchmark: - 26% win on add_loop_eager_dynamic - 13% win on add_loop_inductor_dynamic_gpu ### Perf Dashboard Compilation Latency wins across the board but especially strong on the dynamic tests (like cudagraphs_dynamic) - for example MobileBertForMaskedLM went from 66s -> 50s. Differential Revision: [D75467694](https://our.internmc.facebook.com/intern/diff/D75467694) Pull Request resolved: https://github.com/pytorch/pytorch/pull/152662 Approved by: https://github.com/anijain2305	2025-05-30 17:23:36 +00:00
Pian Pawakapan	3027051590	[export] avoid float/bool specialization for scalar tensor construction (#154661 ) Fixes #153411 Pull Request resolved: https://github.com/pytorch/pytorch/pull/154661 Approved by: https://github.com/angelayi	2025-05-30 17:18:21 +00:00
bobrenjc93	e7bf72c908	[multigraph] fix composabilty with aotautograd cache (#153526 ) AOTAutogradCache uses FXGraphCache which uses the tracing context to get the ShapeEnv. Although the TracingContext global_context is cleared by the time we get around to reusing it, we don't actually need it. We just need the ShapeEnv in the TracingContext, which isn't cleared at the end of dynamo and does persist. This PR adds the tracing context manager around the specialized compile to ensure our caching infrastructure can get access to the ShapeEnv. A test was also added to prove correctness. Pull Request resolved: https://github.com/pytorch/pytorch/pull/153526 Approved by: https://github.com/jamesjwu, https://github.com/zou3519 ghstack dependencies: #153433, #153449	2025-05-30 16:56:17 +00:00
Ryan Guo	7183f52675	[dynamo] Support namedtuple subclass (#153982 ) Fixes #133762. This involves 1. support tuple subclass constructed inside compile region. 2. handle the "fake" global scope associated with NamedTuple-generated `__new__`. 3. handle `namedtuple._tuplegetter` more faithfully. Differential Revision: [D75488091](https://our.internmc.facebook.com/intern/diff/D75488091) Pull Request resolved: https://github.com/pytorch/pytorch/pull/153982 Approved by: https://github.com/jansel ghstack dependencies: #154176	2025-05-30 16:14:37 +00:00
Ryan Guo	8002d22ce3	[dynamo] Trace into descriptor with `__set__` (#154176 ) As title, this patch basically implements https://github.com/python/cpython/blob/3.11/Objects/object.c#L1371-L1452, and make the `__get__` handling more robust. I ran into this while fixing #133762. Differential Revision: [D75488090](https://our.internmc.facebook.com/intern/diff/D75488090) Pull Request resolved: https://github.com/pytorch/pytorch/pull/154176 Approved by: https://github.com/jansel	2025-05-30 16:14:37 +00:00
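The attribute-store order that the linked CPython routine follows can be modelled in a few lines of Python. This is a simplified sketch, not Dynamo's code: `generic_setattr`, `Positive`, and `Box` are hypothetical names, and the model ignores `__slots__` and objects without an instance `__dict__`.

```python
class Positive:
    """Data descriptor: it defines __set__, so it intercepts attribute stores."""

    def __set_name__(self, owner, name):
        self.storage = "_" + name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return obj.__dict__[self.storage]

    def __set__(self, obj, value):
        if value <= 0:
            raise ValueError("must be positive")
        obj.__dict__[self.storage] = value


def generic_setattr(obj, name, value):
    """Simplified model of the lookup order for obj.name = value."""
    for klass in type(obj).__mro__:
        if name in vars(klass):
            descr = vars(klass)[name]
            setter = getattr(type(descr), "__set__", None)
            if setter is not None:
                # A descriptor with __set__ on the type takes priority.
                setter(descr, obj, value)
                return
            break  # Found on the type but has no __set__: use the instance dict.
    obj.__dict__[name] = value


class Box:
    width = Positive()
    label = "default"


box = Box()
generic_setattr(box, "width", 3)        # routed through Positive.__set__
generic_setattr(box, "label", "crate")  # stored in the instance __dict__
```

The class attribute `Box.label` is left untouched because a plain string has no `__set__`, so the store falls through to the instance dictionary.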
PyTorch MergeBot	31f95b5d2e	Revert "inductor codecache: include private inductor configs in cache key (#153672 )" This reverts commit `2c1cb38d95`. Reverted https://github.com/pytorch/pytorch/pull/153672 on behalf of https://github.com/malfet due to Looks like it regressed pr_time_benchmarks, see `ba3f91af97/1` ([comment](https://github.com/pytorch/pytorch/pull/153672#issuecomment-2922759739))	2025-05-30 15:54:14 +00:00
Randolf Scholz	ba3f91af97	Type hints for distributions/utils (#154712 ) Fixes #144196 Part of #144219 Pull Request resolved: https://github.com/pytorch/pytorch/pull/154712 Approved by: https://github.com/Skylion007	2025-05-30 15:50:31 +00:00
PyTorch MergeBot	7e8532077f	Revert "Use 3.27 as the minimum CMake version (#153153 )" This reverts commit `1ece53b157`. Reverted https://github.com/pytorch/pytorch/pull/153153 on behalf of https://github.com/cyyever due to It still breaks windows debug builds ([comment](https://github.com/pytorch/pytorch/pull/153153#issuecomment-2922369830))	2025-05-30 13:16:33 +00:00
cyy	1ece53b157	Use 3.27 as the minimum CMake version (#153153 ) Update the minimum CMake version to 3.27 because it provides more CUDA targets such as `CUDA::nvperf_host`, which makes it possible to remove some of our forked CUDA modules. See https://github.com/pytorch/pytorch/pull/153783. It also makes future third-party updates easier, such as FBGEMM (its current shipped version requires 3.21). Pull Request resolved: https://github.com/pytorch/pytorch/pull/153153 Approved by: https://github.com/malfet	2025-05-30 11:25:30 +00:00
Laith Sakka	9d6f0d5991	avoid sym_max on nested int in is_contiguous. (#154633 ) Calling is_contiguous fails because sym_max is not supported for nested ints; this addresses the problem in a way consistent with make_contiguous_strides_for. Pull Request resolved: https://github.com/pytorch/pytorch/pull/154633 Approved by: https://github.com/bobrenjc93	2025-05-30 09:59:33 +00:00
Paul Zhang	aec3ef1008	[Inductor] Add NaN assert to returned values from generated code (#154455 )

Summary: It is possible to have `reinterpret_tensor` in the output of inductor codegen, e.g. `reinterpret_tensor(buf366, (1024, ), (1, ), 0)` in the return tuple. This adds assertions to all return values from inductor codegen to prevent nans from slipping through and being hard to trace.

Test Plan: NaN asserts properly generated in example gemm script:

    vars = (buf1, primals_2, buf2, primals_1, )
    for var in vars:
        if isinstance(var, torch.Tensor):
            assert not var.isnan().any().item()
            assert not var.isinf().any().item()

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154455 Approved by: https://github.com/eellison	2025-05-30 08:53:24 +00:00
Bob Ren	dc82e911e7	remove allow-untyped-defs from torch/utils/data/datapipes/iter/filelister.py (#154624 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/154624 Approved by: https://github.com/Skylion007	2025-05-30 08:38:05 +00:00
PyTorch MergeBot	639f459cb6	Revert "[Inductor] Add NaN assert to returned values from generated code (#154455 )" This reverts commit `c3de2c7c6b`. Reverted https://github.com/pytorch/pytorch/pull/154455 on behalf of https://github.com/huydhn due to Sorry for reverting your change, I am trying to see if it helps fix the broken trunk below. If it does not help, I will reland the PR ([comment](https://github.com/pytorch/pytorch/pull/154455#issuecomment-2921562089))	2025-05-30 08:11:22 +00:00
Simon Fan	f889dea97d	[internal] Expose additional metadata to compilation callbacks (#153596 )

These hooks are used by internal stuck job detection to associate compilation events with the compile lease. Previously, we only had events for Dynamo and Inductor compilation. And recently, the callback handler was updated to ignore nested events. So the Inductor event was only really used by lazy backward.

Here, I remove the inductor event, and add an explicit lazy backward one. Additionally, I add other runtime compilation events: autotuning and cudagraphs. I also expose the CompileId as a string to avoid imports, this will let internal UIs track each graph's contribution to the timeout.

```python
import enum


class CallbackTrigger(enum.Enum):
    # most common case, dynamo attempts to trace a new frame
    DYNAMO = 1
    # backward compilation can be deferred to runtime
    LAZY_BACKWARD = 2
    # some backends autotune at runtime
    TRITON_AUTOTUNING = 3
    # cudagraphs record at runtime
    CUDAGRAPH_RECORDING = 4
```

Differential Revision: [D75092426](https://our.internmc.facebook.com/intern/diff/D75092426) Pull Request resolved: https://github.com/pytorch/pytorch/pull/153596 Approved by: https://github.com/masnesral	2025-05-30 08:07:04 +00:00
drisspg	208965a9d6	Fix unbackend symint error (#154672 )

## Summary

Me and @laithsakka spoke offline about this one. TLDR is that we wanted this ![image](https://github.com/user-attachments/assets/2e537612-3261-4fbe-a6b9-f8ff92ba3c37) to also be true for Inductor. In that vein we added two new APIs to size-vars, `guard_or_false` and `guard_or_true`, with the following semantics:

guard_or_false, guard_or_true: These APIs may add guards, but will never fail with data-dependent errors. They try to evaluate the expression, possibly adding guards; if that fails due to data dependency, they return False or True respectively instead of hard failing.

When to use this? Performance optimizations that warrant a recompilation. Take the general path and add a runtime check.

```
# Consider this branching.
if x == 0:
    return 1
else:
    return 10

# To make it data-dependent friendly, it can be written as follows:
if guard_or_false(x == 0):
    return 1
else:
    torch.check(x != 0)  # runtime check
    return 10
```

However, there is still one more API to add to make this example work: a `torch.check` that works with expressions. I will leave that to @laithsakka.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154672 Approved by: https://github.com/laithsakka	2025-05-30 07:45:01 +00:00
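The fallback behaviour described for `guard_or_false` and `guard_or_true` can be illustrated with a toy model in plain Python. Everything here is an assumption made for illustration: `DataDependentError`, `evaluate_eq`, the `hints` dictionary, and the function signatures are hypothetical and do not match the Inductor size-vars API, and the toy does not model guard installation or recompilation.

```python
class DataDependentError(Exception):
    """Hypothetical stand-in for a failure to evaluate a data-dependent expression."""


def evaluate_eq(symbol, value, hints):
    """Evaluate `symbol == value`; fail when the symbol's value is unknown."""
    if symbol not in hints:
        raise DataDependentError(symbol)
    return hints[symbol] == value


def guard_or_false(symbol, value, hints):
    """Return the result if it can be decided, otherwise fall back to False."""
    try:
        return evaluate_eq(symbol, value, hints)
    except DataDependentError:
        return False


def guard_or_true(symbol, value, hints):
    """Return the result if it can be decided, otherwise fall back to True."""
    try:
        return evaluate_eq(symbol, value, hints)
    except DataDependentError:
        return True


def branch(symbol, hints):
    if guard_or_false(symbol, 0, hints):
        return 1  # specialized fast path
    # General path; real code would also add a runtime check that symbol != 0.
    return 10


known_zero = branch("x", {"x": 0})
known_nonzero = branch("x", {"x": 5})
unknown = branch("u", {})
```

With an unknown symbol the toy takes the general path instead of raising, which mirrors the "never fail with data-dependent errors" property stated in the commit message.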
Bob Ren	5a7442b91f	remove allow-untyped-defs from torch/distributed/checkpoint/resharding.py (#154626 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/154626 Approved by: https://github.com/Skylion007	2025-05-30 07:43:04 +00:00
Bob Ren	d66a55def0	remove allow-untyped-defs from torch/distributed/elastic/utils/logging.py (#154625 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/154625 Approved by: https://github.com/Skylion007	2025-05-30 07:37:56 +00:00
Bob Ren	382b38ed1b	remove allow-untyped-defs from torch/nn/utils/_expanded_weights/conv_expanded_weights.py (#154623 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/154623 Approved by: https://github.com/Skylion007	2025-05-30 07:32:57 +00:00
Bob Ren	0df96e3921	remove allow-untyped-defs from torch/ao/quantization/stubs.py (#154622 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/154622 Approved by: https://github.com/Skylion007	2025-05-30 07:26:09 +00:00
Xuanteng Huang	30f7079c93	[FSDP2] allow different dtypes for no grad model params (#154103 ) Fixes #154082 Pull Request resolved: https://github.com/pytorch/pytorch/pull/154103 Approved by: https://github.com/weifengpy	2025-05-30 07:00:54 +00:00


48708 Commits