pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-06 12:20:52 +01:00

Author	SHA1	Message	Date
Edward Yang	6e8f17c580	[RELAND] Always build USE_DISTRIBUTED (#160449 ) and Make distributed modules importable even when backend not built (#159889 ) (#162594 ) Summary: Original: D81957844 and D81957923 Also, https://github.com/pytorch/pytorch/pull/162142 is patched in as well #buildall Test Plan: sandcastle and oss ci Rollback Plan: Reviewed By: H-Huang Pull Request resolved: https://github.com/pytorch/pytorch/pull/162594 Approved by: https://github.com/H-Huang, https://github.com/dcci	2025-09-12 03:56:18 +00:00
Klaus Zimmermann	31345fb4f7	Make functorch notebook symlinks PEP 517 valid (#157813 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/157813 Approved by: https://github.com/zou3519, https://github.com/atalman	2025-09-12 03:52:08 +00:00
Daniel Vega-Myhre	872ed60679	[mxfp8 torch._scaled_grouped_mm] fix meta registration for 3d tensor (#162765 ) Meta registration checks for torch._scaled_grouped_mm has a bug for 3d "B" tensors. Namely, the scale shape for such a tensor should be 2d with shape (G, blocked_K * blocked_N), but it currently enforces an expected 3d shape of (G, blocked_K, blocked_N). See Blas.cpp for correct validation logic [here](`8e217a9f6d/aten/src/ATen/native/cuda/Blas.cpp (L1622)`). Pull Request resolved: https://github.com/pytorch/pytorch/pull/162765 Approved by: https://github.com/ngimel	2025-09-12 03:51:52 +00:00
atalman	e8eeb06034	Move inductor jobs 3.9->3.10 (#162323 ) Related to: https://github.com/pytorch/pytorch/issues/161167 Pull Request resolved: https://github.com/pytorch/pytorch/pull/162323 Approved by: https://github.com/huydhn, https://github.com/Skylion007 Co-authored-by: Huy Do <huydhn@gmail.com>	2025-09-12 03:43:06 +00:00
Yang Wang	3cd734584d	bring back the old vllm's use_existing_torch.py (#162747 ) vllm's pr will override our dependencies for torch. quick fix to add the use_existing_torch.py. syncing with vllm now regarding the uv approach they have Pull Request resolved: https://github.com/pytorch/pytorch/pull/162747 Approved by: https://github.com/huydhn	2025-09-12 03:41:39 +00:00
PyTorch MergeBot	222ec8d28e	Revert "AMD CPU CI - Add freezing + fix label trigger (#162176 )" This reverts commit `9cac1b9259`. Reverted https://github.com/pytorch/pytorch/pull/162176 on behalf of https://github.com/huydhn due to Sorry for reverting this but hardcoding the input online 122 does not make sense ([comment](https://github.com/pytorch/pytorch/pull/162176#issuecomment-3283532452))	2025-09-12 03:39:13 +00:00
thenumberouscode	c140bf217f	[indexing] Prevent integer overflow from large step values in C++ (#161707 ) Fixes https://github.com/pytorch/pytorch/issues/160868 hmmm, I found an existing fix PR after I've finished this one. For reference, the old PR was https://github.com/pytorch/pytorch/pull/147433/files. Pull Request resolved: https://github.com/pytorch/pytorch/pull/161707 Approved by: https://github.com/leslie-fang-intel, https://github.com/CaoE, https://github.com/mlazos	2025-09-12 03:16:23 +00:00
Janani Sriram	7eb92b076f	[Inductor][FP8] Validate exhaustive autotuning for FP8 Inductor templates (#162678 ) Summary: Validate exhaustive autotuning for FP8 Inductor templates: scaled MM templates require `block_k >= 32`. Before, exhaustive autotuning defaulted to a limited set of autotuning configs, as limitations for exhaustively autotuning on FP8 shapes had not been tested. Test Plan: ``` CUDA_VISIBLE_DEVICES=0 TRITON_PRINT_AUTOTUNING=1 TRITON_ALWAYS_COMPILE=1 TORCH_LOGS=+inductor TORCHINDUCTOR_FORCE_DISABLE_CACHES=1 ENABLE_PERSISTENT_TMA_MATMUL=1 TORCHINDUCTOR_MAX_AUTOTUNE_GEMM=1 TORCHINDUCTOR_MAX_AUTOTUNE_GEMM_SEARCH_SPACE=DEFAULT buck2 run mode/{opt,inplace} pytorch/t ritonbench:run -- --op fp8_gemm --only torch_fp8_gemm,pt2_fp8_gemm --metrics tflops,accuracy --input-loader=/home/jananisriram/personal/exhaustive_autotune_rowwise_persistent_tma/json_fi les/rowwise_ptma_0.json --output="/home/jananisriram/personal/exhaustive_autotune_rowwise_persistent_tma/autotune/gpu0_bench.csv" --atol=1e-2 --rtol=0.5 2>&1 \| tee ~/personal/exhaustive_ autotune_rowwise_persistent_tma/autotune/gpu0.log ``` Rollback Plan: Differential Revision: D82174075 Pull Request resolved: https://github.com/pytorch/pytorch/pull/162678 Approved by: https://github.com/coconutruben	2025-09-12 02:12:33 +00:00
Shangdi Yu	ccb450b190	[pre_compile] Add check for cuda and hardware version (#162438 ) if we detect compiled model is using cuda in meaningful way, we should store information about cuda + hardware Example: `SystemInfo(python_version='3.12.9', torch_version='2.9.0a0+gite02b0e6', cuda_version='12.6', triton_version=(3, 4), gpu_name='NVIDIA PG509-210')` Pull Request resolved: https://github.com/pytorch/pytorch/pull/162438 Approved by: https://github.com/zhxchen17	2025-09-12 01:42:07 +00:00
Gabriel Ferns	ae97eb86f7	Reland "Fix conv exhaustive autotuning and expand Exhaustive test coverage" (#161957 ) reland https://github.com/pytorch/pytorch/pull/159387 Pull Request resolved: https://github.com/pytorch/pytorch/pull/161957 Approved by: https://github.com/coconutruben	2025-09-12 01:36:43 +00:00
mengph	7a9c4d794c	[BUG]Fixed handle cannot be hit in the cache in the IPC ExpandableSegment (#161885 ) Fixed the bug that handle cannot be hit in the ipcMemHandle_to_devptr cache in the IPC scenario of ExpandableSegment. Fixes #161884 Pull Request resolved: https://github.com/pytorch/pytorch/pull/161885 Approved by: https://github.com/albanD	2025-09-12 01:09:17 +00:00
Avik Chaudhuri	501e19137a	fix var args for shape guards (#162633 ) Summary: Fixes #162599 Test Plan: added test based on repro Rollback Plan: Differential Revision: D82144520 Pull Request resolved: https://github.com/pytorch/pytorch/pull/162633 Approved by: https://github.com/tugsbayasgalan	2025-09-12 00:33:35 +00:00
Aaryaman Vasishta	4a757e1e17	[ROCm] Support torch.cuda._compile_kernel (#162510 ) Supports `torch.cuda._compile_kernel` on ROCm. Related to https://github.com/pytorch/pytorch/pull/151484 Tested on Windows with gfx1201. Testing on Linux pending. Pull Request resolved: https://github.com/pytorch/pytorch/pull/162510 Approved by: https://github.com/mycpuorg, https://github.com/msaroufim	2025-09-12 00:18:47 +00:00
Yuxingwang-intel	563921619b	Fix the regression issue caused by non-arrch64 platforms not hitting the MKLDNN path. (#162168 ) This issue was introduced by the commit in issue #161065. Added an extra check to provide a proper path for other platforms. Pull Request resolved: https://github.com/pytorch/pytorch/pull/162168 Approved by: https://github.com/mingfeima, https://github.com/malfet Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>	2025-09-12 00:17:08 +00:00
Nikita Shulga	84d8ec73f1	[CD] Build Mac wheels using `setup-python` action (#162136 ) Biggest difference between both conda and homebrew CPython builds and one from python.org, is that later are universal binaries and they are always trying to build universal extension... Workaround lots of universal binary build attempts by explicitly specifying both `_PYTHON_PLATFORM` and `--plat-name` as well as `ARCH_FLAGS` Suppressed actionlint warning on use of `freethreaded` flag which is document in https://github.com/actions/setup-python/tree/v5 TODO: Remove lots of temporary workarounds when `3.14` is out in October 2025 Pull Request resolved: https://github.com/pytorch/pytorch/pull/162136 Approved by: https://github.com/atalman, https://github.com/huydhn ghstack dependencies: #162297, #162265	2025-09-12 00:16:31 +00:00
Ramya Ramineni	a956066b4e	[ROCm] Define uint32 t when ROCM_VERSION >= 70000 (#160587 ) This PR fixes the errors like below: ``` [rank3]: RuntimeError: The following operation failed in the TorchScript interpreter. [rank3]: Traceback of TorchScript (most recent call last): [rank3]: RuntimeError: /tmp/comgr-28f951/input/CompileSourceACC062:67:7: error: unknown type name 'uint32_t'; did you mean '__hip_internal::uint32_t'? [rank3]: 67 \| uint32_t int32; [rank3]: \| ^~~~~~~~ [rank3]: \| __hip_internal::uint32_t ``` Earlier uint32_t was defined in HIP headers in std namespace. Now it is moved to __hip_internal namespace in hip headers. This change is made in ROCm 7.0. Pull Request resolved: https://github.com/pytorch/pytorch/pull/160587 Approved by: https://github.com/jeffdaily	2025-09-12 00:13:26 +00:00
David Berard	ff6870d134	[BE][flex attention] compute RMSE in float64 (#162088 ) I saw a failure where the reference error was 0.0, and the compiled error was 0.035. Although the failure still occurs with or without this change, it was confusing to see RMSE of 0.0. Pull Request resolved: https://github.com/pytorch/pytorch/pull/162088 Approved by: https://github.com/drisspg	2025-09-11 23:53:31 +00:00
PyTorch MergeBot	92f9ed7ac3	Revert "[2/N]Port several test files under test/distributed to Intel GPU (#159473 )" This reverts commit `fa1d409e83`. Reverted https://github.com/pytorch/pytorch/pull/159473 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it seems to break an distributed tests ([comment](https://github.com/pytorch/pytorch/pull/159473#issuecomment-3282999084))	2025-09-11 23:51:21 +00:00
Zhengxu Chen	8e217a9f6d	[precompile] Fix issues with guard serialization on distributed types. (#162418 ) Summary: Add more support for torch internal distributed data structures. Test Plan: test_guard_serialization.py Rollback Plan: Differential Revision: D81927732 Pull Request resolved: https://github.com/pytorch/pytorch/pull/162418 Approved by: https://github.com/dolpm	2025-09-11 23:09:55 +00:00
hanchchch	429052f151	fix: raise value error on init ParametrizationList if original.device != new.device (#162717 ) raise value error on init `ParametrizationList`, if `original.device != new.device`. currently `_maybe_set` will throw below error in such situations, which I think it's not convenient to debug. ``` [rank1]: RuntimeError: Attempted to set the storage of a tensor on device "cuda:1" to a storage on different device "cpu". This is no longer allowed; the devices must match. ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/162717 Approved by: https://github.com/lezcano	2025-09-11 23:07:58 +00:00
Nakul Iyer	a3f01f6418	[MTIA Runtime] Add foreach_div ops to native_functions.yaml (#162732 ) Summary: Quick fix for runtime support on foreach_div, see D81274963. Fixed an issue that I created in that diff so that the CIs pass. Test Plan: CIs created in D81274963 and D81286593 pass. Added some logs in [aten_mtia_ops.py](https://www.internalfb.com/code/fbsource/[c56272ba042c43c65517dcac254364cf732fcfa9]/fbcode/mtia/host_runtime/torch_mtia/aten_mtia_ops.cpp?lines=3676) to all the foreach_div ops. We can see that the correct MTIA kernels are being invoked in the tests. https://www.internalfb.com/intern/testinfra/testrun/15481123829281588 Rollback Plan: Differential Revision: D82161434 Pull Request resolved: https://github.com/pytorch/pytorch/pull/162732 Approved by: https://github.com/danielhou0515	2025-09-11 22:47:03 +00:00
Aaryaman Vasishta	62843c14bb	[ROCm/Windows] Support aotriton for scaled_dot_product_attention on Windows. (#162330 ) Enables flash attention and/or memory efficient attention on Windows with scaled_dot_product_attention via. aotriton. Already tested to be working on Windows with TheRock. Steps to enable: simply set `USE_FLASH_ATTENTION=1` and `USE_MEM_EFF_ATTENTION=1` as usual. See https://github.com/ROCm/TheRock/blob/main/external-builds/pytorch/build_prod_wheels.py#L578-L604 Pull Request resolved: https://github.com/pytorch/pytorch/pull/162330 Approved by: https://github.com/xinyazhang, https://github.com/ScottTodd, https://github.com/jeffdaily Co-authored-by: Scott Todd <scott.todd0@gmail.com>	2025-09-11 22:35:09 +00:00
Nick Riasanovsky	082d3dd9d5	[Triton] [Inductor] Restrict subprocess autotuning to just Triton (#162688 ) Summary: Restricts subprocess benchmarking to only `TritonTemplateCaller`, which is expected by the underlying `target` method. THhis triggered a bug with large K shapes because the decompose k is `SubgraphChoiceCaller`. Test Plan: mm autotuning with a large k and `TORCHINDUCTOR_AUTOTUNE_IN_SUBPROC=1` Rollback Plan: Differential Revision: D82181924 Pull Request resolved: https://github.com/pytorch/pytorch/pull/162688 Approved by: https://github.com/PaulZhang12, https://github.com/eellison, https://github.com/mlazos	2025-09-11 22:17:57 +00:00
PyTorch MergeBot	468c1f9e9d	Revert "[nn] Assert parsed iterable arguments are an appropriate length (#162340 )" This reverts commit `b5e6e58050`. Reverted https://github.com/pytorch/pytorch/pull/162340 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it seems to break an MPS tests on ExecuTorch ([comment](https://github.com/pytorch/pytorch/pull/162340#issuecomment-3282676242))	2025-09-11 21:22:57 +00:00
Nick Riasanovsky	9614c2eb14	[Triton] [Inductor] Pruned failed compilations from Autotuning candidates (#162673 ) Summary: When exahaustively autotuning a new template you may hit situations that lead to compilation failures. This template will still attempt to autotune because nothing was marking this as failed and in my experiments lead to a crash/segfault if I didn't set `TORCHINDUCTOR_AUTOTUNE_IN_SUBPROC=1`. To help eliminate this issue this PR marks any template that fails to compile as "failed" and then removes all of the failed templates from the choice candidates. In the case where it would have just failed to compile twice, this should at least reduce compilation time. Test Plan: Tested locally when experminenting with the new blackwell templates and a Triton version that contains a bug related to `num_warps < 4`. Rollback Plan: Differential Revision: D82172207 Pull Request resolved: https://github.com/pytorch/pytorch/pull/162673 Approved by: https://github.com/PaulZhang12, https://github.com/mlazos	2025-09-11 21:22:36 +00:00
Janani Sriram	4c6a6c2db9	[Inductor][FP8] Add new scaled_mm and scaled_persistent_mm configs to Inductor FP8 Triton templates (#162699 ) Summary: Add new `scaled_mm` and `scaled_persistent_mm` configs to `template_heuristics.py` for Inductor FP8 Triton templates. These configs are a representative subset of the most performant configs generated from exhaustively autotuning FP8 Triton kernels with per-tensor and per-row scaling. See this [spreadsheet](https://docs.google.com/spreadsheets/d/1Fal1vhFUJIUcLpM2kJect6IkgeUFvCY-nUr3RTupM_4/edit?gid=1732602731#gid=1732602731) for benchmarks and performance metrics. Test Plan: Verify that configs do not error, i.e. ``` CUDA_VISIBLE_DEVICES=0 TRITON_PRINT_AUTOTUNING=1 TRITON_ALWAYS_COMPILE=1 TORCH_LOGS=+i nductor TORCHINDUCTOR_FORCE_DISABLE_CACHES=1 ENABLE_PERSISTENT_TMA_MATMUL=1 TORCHINDUCTOR_MAX_AUTOTUNE_GEMM=1 buck2 run mode/{opt,inplace} pytorch/tritonbench:run -- --op fp8_gemm --only pt2_fp8_gemm --metrics tflops,accuracy --input-loader={input_path} --output="{output_csv}" --atol=1e-2 --rtol=0.5 2>&1 \| tee {log_file} ``` Rollback Plan: Reviewed By: NikhilAPatel, PaulZhang12 Differential Revision: D81651226 Pull Request resolved: https://github.com/pytorch/pytorch/pull/162699 Approved by: https://github.com/PaulZhang12	2025-09-11 21:21:06 +00:00
Rohit Manav	3ad3bfe11d	added example for torch.is_storage (#162614 ) Fixes #162613 Pull Request resolved: https://github.com/pytorch/pytorch/pull/162614 Approved by: https://github.com/malfet Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>	2025-09-11 20:25:26 +00:00
PyTorch MergeBot	1c6dfbe557	Revert "[inductor] FlexibleLayout for ExternKernelChoice for mms (#161351 )" This reverts commit `f08487aa86`. Reverted https://github.com/pytorch/pytorch/pull/161351 on behalf of https://github.com/huydhn due to Check with @coconutruben and the internal failures look real ([comment](https://github.com/pytorch/pytorch/pull/161351#issuecomment-3282511692))	2025-09-11 20:24:15 +00:00
PyTorch MergeBot	934f878883	Revert "[inductor] leverage template stacking in V.choices.get_mm_configs (#161350 )" This reverts commit `623e623c82`. Reverted https://github.com/pytorch/pytorch/pull/161350 on behalf of https://github.com/huydhn due to Check with @coconutruben and the internal failures look real ([comment](https://github.com/pytorch/pytorch/pull/161351#issuecomment-3282511692))	2025-09-11 20:24:15 +00:00
PyTorch MergeBot	cef05b1202	Revert "[inductor][choices] rename get_mm_configs to get_template_configs (#162293 )" This reverts commit `30191fcf03`. Reverted https://github.com/pytorch/pytorch/pull/162293 on behalf of https://github.com/huydhn due to Check with @coconutruben and the internal failures look real ([comment](https://github.com/pytorch/pytorch/pull/161351#issuecomment-3282511692))	2025-09-11 20:24:15 +00:00
Boyuan Feng	b500c166ef	[FlexAttention][Easy] turn off TMA when cannot use it (#162569 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/162569 Approved by: https://github.com/drisspg	2025-09-11 19:51:19 +00:00
Jeff Daily	d65ffdef3d	[ROCm] fix miopen batchnorm changing output format (#162112 ) It was found that the integration of miopen batchnorm was causing the output to always be in default contig memory format even when the input was channels last. This also unskips a number of related unit tests. Pull Request resolved: https://github.com/pytorch/pytorch/pull/162112 Approved by: https://github.com/jeffdaily Co-authored-by: Jeff Daily <jeff.daily@amd.com> Co-authored-by: Dmitry Nikolaev <dmitry.nikolaev@amd.com> Co-authored-by: Jithun Nair <37884920+jithunnair-amd@users.noreply.github.com>	2025-09-11 19:37:48 +00:00
Pian Pawakapan	ac72f81c12	[dynamic shapes] unbacked-safe should_swap (#160473 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/160473 Approved by: https://github.com/laithsakka	2025-09-11 18:51:25 +00:00
Arijit Mukhopadhyay	9cac1b9259	AMD CPU CI - Add freezing + fix label trigger (#162176 ) Added the following changes: 1. Added freezing by default for AMD CPU based CI 2. Fixed issue with label based CI triggers Addresses code review comment in https://github.com/pytorch/pytorch/pull/161155 Pull Request resolved: https://github.com/pytorch/pytorch/pull/162176 Approved by: https://github.com/malfet, https://github.com/jeffdaily	2025-09-11 18:41:29 +00:00
Isalia20	9bc648235d	[MPS] mps sparse mul op implementation (#162349 ) Implements mps sparse mul operation as well as enables other operations such as: 1. copy_ 2. div 3. sum 4. floor 5. power 6. sub 7. floor_divide Pull Request resolved: https://github.com/pytorch/pytorch/pull/162349 Approved by: https://github.com/pearu, https://github.com/malfet Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>	2025-09-11 18:36:24 +00:00
David Berard	799471d92b	[triton] Update 3.5 pin (AMD compilation fix + warp spec) (#162733 ) Fixes #162390 Also adds warp spec (thanks @manman-ren!) Pull Request resolved: https://github.com/pytorch/pytorch/pull/162733 Approved by: https://github.com/atalman	2025-09-11 18:19:16 +00:00
justinchuby	43d9b5ecaa	[ONNX] Set fallback=False by default (#162726 ) This change addresses confusing error messages users encounter when using the ONNX exporter with default settings. Previously, `fallback=True` was the default, which would attempt to fall back to the TorchScript exporter when the dynamo path failed, leading to mixed error messages that obscured the actual issues. ## Problem When `fallback=True` by default: - Users get confusing error messages mixing dynamo and TorchScript export failures - Error messages tell users to provide the `f` argument unnecessarily - Dynamo error messages get flushed with TorchScript errors when both paths fail - Users expecting the dynamo path get unexpected fallback behavior ## Solution Changed the default from `fallback=True` to `fallback=False` in both: - `torch.onnx.export()` function - `torch.onnx._internal.exporter._compat.export_compat()` function ## Impact Before: ```python # Would fallback to TorchScript on dynamo failure, causing mixed error messages torch.onnx.export(model, args) ``` After: ```python # Clean dynamo-only errors by default torch.onnx.export(model, args) # Advanced users can still opt-in to fallback behavior torch.onnx.export(model, args, fallback=True) ``` Fixes #162697 Pull Request resolved: https://github.com/pytorch/pytorch/pull/162726 Approved by: https://github.com/titaiwangms, https://github.com/xadupre	2025-09-11 18:09:58 +00:00
Tugsbayasgalan Manlaibaatar	463fbc8ca0	Support vmap + custom autograd function/improve DTensor constructor inefficiency (#162240 ) This makes gemma3 exportable on transformers=4.55.4 In HF, there is a torch funciton mode called TransformGetItemToIndex which internally calls custom autograd function. When this custom autograd function is called under vmap, It triggers CustomFunctionHigherOrderOP which error-ed because there was no pre-dispatch proxy mode implementation. Since there are number of requests lately to add various operators in pre-dispatch IR, I introduce a decorator in export that works similar to `allow_in_graph`. Basically: 1) We intercept custom_autograd_function.apply at pre-dispatch mode when this decorator is applied 2) We apply `flat_apply` HOP to hide the pytree spec for this autograd function. Note that this adds restriction that this custom autograd function needs to take in fx-able types. 3) subclass constructor decorator is implemented similarly, so we just refactor it to use similar implementation as this new decorator. eventually we should delete the subclass constructor decorator. 4) Move some code in subclass constructor decorator to exit early in non-export environment which should shave off some inefficiency (around 1% according to @swolchok 's benchmark) Fixes: https://github.com/pytorch/pytorch/issues/161563#issuecomment-3246309758 Differential Revision: [D82141316](https://our.internmc.facebook.com/intern/diff/D82141316) Pull Request resolved: https://github.com/pytorch/pytorch/pull/162240 Approved by: https://github.com/ydwu4	2025-09-11 17:42:41 +00:00
Catherine Lee	2f53395943	[ez][CI] Fix docs push in nightly workflow (#162657 ) HUD metrics page says docs push hasn't happened in 21 days <img width="293" height="142" alt="image" src="https://github.com/user-attachments/assets/f930aab8-0503-4bf2-b962-8c375dec6b78" /> I guess main branch docs just haven't been updated? Did anyone notice? Do we care? Either way I think this should fix it Likely started after https://github.com/pytorch/pytorch/pull/161182 Pull Request resolved: https://github.com/pytorch/pytorch/pull/162657 Approved by: https://github.com/huydhn	2025-09-11 16:45:41 +00:00
Avik Chaudhuri	fccddf02b6	repro 161902 (#162416 ) Summary: Sometimes `ShapeEnv.create_symbol` can return a `sympy.Integer`. This messes up our phantom symbol infra for derived dims. Fixes #161902 Test Plan: added test based on repro Rollback Plan: Differential Revision: D81960709 Pull Request resolved: https://github.com/pytorch/pytorch/pull/162416 Approved by: https://github.com/tugsbayasgalan	2025-09-11 16:35:23 +00:00
Nikita Shulga	8be8b94793	Update SECURITY.md with reporting guidelines (#162608 ) Added clarification that all reports will be disclosed within 90 days Pull Request resolved: https://github.com/pytorch/pytorch/pull/162608 Approved by: https://github.com/seemethere, https://github.com/albanD	2025-09-11 16:30:29 +00:00
suo	fe8cc619b8	[torch][c10d] fix split_group in mixed backend case (#162424 ) Today we can initialize a mixed-backend process group (e.g. "cpu:gloo,cuda:nccl") but we can only pass one set of process group options. However, when we call `split_group`, we retrieve that set of options from the parent PG and pass it to the ProcessGroup::groupSplit C++ API, which then attempts to propagate that set of options to all backends. This leads to an assert on some user code, where ProcessGroupGloo::split is expecting gloo options but receives nccl options instead. Arguably the APIs as currently designed are just broken; we should not ever expect a single set of backend options to apply across multiple backends. However, fixing this would require changing quite a few public APIs. As a quick fix, since user-provided options really only exist for NCCL, just warn and fall-back to defaulted options for Gloo if non-gloo options are detected. Pull Request resolved: https://github.com/pytorch/pytorch/pull/162424 Approved by: https://github.com/d4l3k, https://github.com/fduwjj, https://github.com/H-Huang	2025-09-11 16:29:32 +00:00
atalman	2f5a24c2a2	Smoke tests don't run nvshmem on Windows (#162646 ) Only available for linux x86 and aarch64 : https://pypi.org/project/nvidia-nvshmem-cu13/#files nvshmem is available only on linux: `` "nvidia-nvshmem-cu12==3.3.24; platform_system == 'Linux' and platform_machine == 'x86_64' \| " `` https://github.com/pytorch/pytorch/blob/main/.github/scripts/generate_binary_build_matrix.py#L57 Pull Request resolved: https://github.com/pytorch/pytorch/pull/162646 Approved by: https://github.com/kwen2501	2025-09-11 16:09:20 +00:00
Nikita Shulga	24492cbab2	[BE] Cleanup stale comments/copy from `gemm` (#162001 ) Followup after https://github.com/pytorch/pytorch/pull/154012 Since the introduction of `gemm_no_downcast_stub` it's no longer necessary to allocate temporary array and then manually implement the `beta` logic in the codebase Pull Request resolved: https://github.com/pytorch/pytorch/pull/162001 Approved by: https://github.com/drisspg ghstack dependencies: #161999	2025-09-11 15:48:43 +00:00
Avik Chaudhuri	3f6d88f04c	paths to exclude shape guards (#162684 ) Summary: Easier to land than https://www.internalfb.com/diff/D82030581 Test Plan: everything blamed by https://www.internalfb.com/diff/D80713603 (except some old exir tests) Rollback Plan: Differential Revision: D82180349 Pull Request resolved: https://github.com/pytorch/pytorch/pull/162684 Approved by: https://github.com/tugsbayasgalan	2025-09-11 15:34:06 +00:00
PyTorch MergeBot	94db2ad51d	Revert "Move prioritized text linker optimization code from setup.py to cmake (#160078 )" This reverts commit `26b3ae5890`. Reverted https://github.com/pytorch/pytorch/pull/160078 on behalf of https://github.com/atalman due to Sorry reverting this broke linux aarch64 CUDA nightlies [pytorch/pytorch/actions/runs/17637486681/job/50146967503](https://github.com/pytorch/pytorch/actions/runs/17637486681/job/50146967503) ([comment](https://github.com/pytorch/pytorch/pull/160078#issuecomment-3281426631))	2025-09-11 15:29:29 +00:00
PyTorch MergeBot	9f783e172d	Revert "Build and Install Arm Compute Library in manylinux docker image (#159737 )" This reverts commit `582d278983`. Reverted https://github.com/pytorch/pytorch/pull/159737 on behalf of https://github.com/atalman due to Sorry reverting this broke linux aarch64 CUDA nightlies [pytorch/pytorch/actions/runs/17637486681/job/50146967503](https://github.com/pytorch/pytorch/actions/runs/17637486681/job/50146967503) ([comment](https://github.com/pytorch/pytorch/pull/159737#issuecomment-3281398272))	2025-09-11 15:25:24 +00:00
Animesh Jain	a8432bcaad	[dynamo][guards] Fail on an unknown framelocals to dict conversion (#162695 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/162695 Approved by: https://github.com/williamwen42 ghstack dependencies: #162694	2025-09-11 15:01:00 +00:00
Animesh Jain	a3a40cb741	[dynamo][guards] Do not consturct framelocals to dict on GlobalsGuardAccessor (#162694 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/162694 Approved by: https://github.com/williamwen42	2025-09-11 15:01:00 +00:00
Tugsbayasgalan Manlaibaatar	c924c675d0	Fix persistent buffer bug (#162190 ) For non-persistent buffers, we should properly register them. Pull Request resolved: https://github.com/pytorch/pytorch/pull/162190 Approved by: https://github.com/zhxchen17	2025-09-11 14:56:26 +00:00

1 2 3 4 5 ...

92926 Commits