pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-06 12:20:52 +01:00

Author	SHA1	Message	Date
rzou	70d36e047d	Making batching rule for F.embedding DTensor-aware (#162117 ) `vmap(F.embedding)(DTensor, DTensor)` was failing because F.embedding's batching rule generates a new tensor via at::arange, at::arange generates a regular tensor, and DTensor rightfully errors on mixed DTensor-regular Tensor operations. This PR fixes the problem by activating DTensor implicit replication on just the at::arange and the subsequent add operation. In order to accomplish this I move the DTensor implicit replication flag to C++ (most batching rules are in C++). Test Plan: - new test Pull Request resolved: https://github.com/pytorch/pytorch/pull/162117 Approved by: https://github.com/bdhirsh	2025-09-05 21:40:14 +00:00
Nikita Shulga	a00cdc1e41	[CD][BE] Get rid of SETUPTOOLS and PYYAML extra pins (#162266 ) As those weren't really a pins to begin with, and requirments.txt already has those Pull Request resolved: https://github.com/pytorch/pytorch/pull/162266 Approved by: https://github.com/clee2000, https://github.com/Skylion007, https://github.com/ZainRizvi ghstack dependencies: #162263, #162264	2025-09-05 21:32:52 +00:00
Shunzhi Wen	c10195e723	[C10d][Gloo] Enable complex datatype support in ProcessGroupGloo (#156633 ) - Enable communication of tensors with Complex datatype in ProcessGroupGloo, similar to how ProcessGroupNCCL handles it. - Move a function, which checks if Complex datatype is supported by a reduce operation, from ProcessGroupNCCL.cpp into a new file to be shared with ProcessGroupGloo. Fixes #156632 Pull Request resolved: https://github.com/pytorch/pytorch/pull/156633 Approved by: https://github.com/d4l3k	2025-09-05 21:24:36 +00:00
Boyuan Feng	771f369448	[Inductor] Improve RoPE (#161420 ) This PR fuses ROPE from 2 kernels into 1 kernel. Shape: ``` q: [B, Hq, S, D] k: [B, Hkv, S, D] ``` `Hq=32, Hkv=8, D=128` following Llama3 setting. <img width="980" height="624" alt="image" src="https://github.com/user-attachments/assets/652a8227-6f1d-465c-97fd-2b0af41f8ed9" /> Pull Request resolved: https://github.com/pytorch/pytorch/pull/161420 Approved by: https://github.com/shunting314	2025-09-05 20:55:20 +00:00
henrylhtsang	92a43025e0	[cutlass backend] Add FP8 tests for multiple linears (#160782 ) Adding a test that is closer to real use case. Thanks @mlazos for fixing a few issues so this test works for most cases. We still have to skip the AOTI and dynamic case due to accuracy issues. Pull Request resolved: https://github.com/pytorch/pytorch/pull/160782 Approved by: https://github.com/mlazos	2025-09-05 20:23:25 +00:00
Xuehai Pan	2fa0520a64	[BE][pytree] cleanup parameterized pytree tests (#160842 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/160842 Approved by: https://github.com/Skylion007	2025-09-05 20:15:29 +00:00
Edward Z. Yang	01edcd4df8	Make distributed modules importable even when backend not built (#159889 ) This PR is greatly simplified now that it stacked on top of a PR that builds with distributed always. We only need to stub functions that may not be defined due to a backend not being enabled. Signed-off-by: Edward Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/159889 Approved by: https://github.com/wconstab ghstack dependencies: #160449	2025-09-05 20:15:11 +00:00
Edward Yang	de893e96c7	Always build USE_DISTRIBUTED. (#160449 ) Signed-off-by: Edward Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/160449 Approved by: https://github.com/wconstab, https://github.com/albanD, https://github.com/dcci	2025-09-05 20:15:11 +00:00
Nikita Shulga	6087ef41e5	[BE] Cleanup stale comments/copy from `gemm` (#162001 ) Followup after https://github.com/pytorch/pytorch/pull/154012 Since the introduction of `gemm_no_downcast_stub` it's no longer necessary to allocate temporary array and then manually implement the `beta` logic in the codebase Pull Request resolved: https://github.com/pytorch/pytorch/pull/162001 Approved by: https://github.com/drisspg ghstack dependencies: #161999	2025-09-05 19:59:51 +00:00
Nikita Shulga	a3c7f77e50	[EZ][CD] Update MacOS deployment platform to 11.0 (#162264 ) Fixes following warning ``` MACOSX_DEPLOYMENT_TARGET is set to a lower value (10.15) than the version on which the Python interpreter was compiled (11.0) ``` Update deployment platform in `README.MD` as well Pull Request resolved: https://github.com/pytorch/pytorch/pull/162264 Approved by: https://github.com/clee2000, https://github.com/Skylion007, https://github.com/ZainRizvi ghstack dependencies: #162263	2025-09-05 19:58:04 +00:00
Justin Chu	3771380f83	[ONNX] Hide draft export under a flag (#162225 ) Use `TORCH_ONNX_ENABLE_DRAFT_EXPORT` to control whether draft_export should be used as a strategy in onnx export. Follow up of https://github.com/pytorch/pytorch/pull/161454 Pull Request resolved: https://github.com/pytorch/pytorch/pull/162225 Approved by: https://github.com/xadupre, https://github.com/titaiwangms	2025-09-05 19:54:50 +00:00
PyTorch MergeBot	adae7f66aa	Revert "Always build USE_DISTRIBUTED. (#160449 )" This reverts commit `c37103234a`. Reverted https://github.com/pytorch/pytorch/pull/160449 on behalf of https://github.com/jeanschmidt due to Breaking internal build rules, see D81756619 ([comment](https://github.com/pytorch/pytorch/pull/160449#issuecomment-3259430011))	2025-09-05 18:58:47 +00:00
PyTorch MergeBot	70f865ac9b	Revert "Make distributed modules importable even when backend not built (#159889 )" This reverts commit `ef3be6726f`. Reverted https://github.com/pytorch/pytorch/pull/159889 on behalf of https://github.com/jeanschmidt due to Breaking internal build rules, see D81756619 ([comment](https://github.com/pytorch/pytorch/pull/160449#issuecomment-3259430011))	2025-09-05 18:58:47 +00:00
Scott Wolchok	88d94d17e8	Add torch.Tensor._make_dtensor to accelerate DTensor.__new__ further (#161590 ) This seems to be a (very very roughly) ~8% improvement on DTensor benchmark very similar to the benchmark from #160580 (120ish usec -> 110ish usec) Differential Revision: [D81530105](https://our.internmc.facebook.com/intern/diff/D81530105) Pull Request resolved: https://github.com/pytorch/pytorch/pull/161590 Approved by: https://github.com/albanD ghstack dependencies: #161466, #161586	2025-09-05 18:43:41 +00:00
Ruben Rodriguez Buchillon	c321111499	[inductor][ez] V.choices.get_mm_configs returns list of ChoiceCallers (#161348 ) \# why - every callsite just executes the generator on the spot - previous pr adds the ability to add an override before expensive generators are executed, so we don't need this generator anymore \# what - rather than yielding the ChoiceCaller, just return the list of all valid ChoiceCallers \# testing ``` python3 -bb -m pytest test/inductor/test_max_autotune.py -v ``` Differential Revision: [D81520574](https://our.internmc.facebook.com/intern/diff/D81520574) Pull Request resolved: https://github.com/pytorch/pytorch/pull/161348 Approved by: https://github.com/eellison ghstack dependencies: #162075, #161340, #161341, #161342, #161343, #161344, #161345, #161346, #161347	2025-09-05 18:02:53 +00:00
Ruben Rodriguez Buchillon	9a8d454c46	[inductor] add kernel template choice (ktc) (#161347 ) # why - gather everything up to make choices, without running potentially expensive generators - enables overrides where we toss the entire list of configs from inductor, without having to enumrate it (expensive) # what - add a holding class that just gets all the components necessary to generate a ChoiceCaller - use that class to generate ChoiceCallers - this does not (yet) add the override function, but just prepares the scene ``` python3 -bb -m pytest test/inductor/test_max_autotune.py -v ``` Differential Revision: [D81520569](https://our.internmc.facebook.com/intern/diff/D81520569) Pull Request resolved: https://github.com/pytorch/pytorch/pull/161347 Approved by: https://github.com/eellison ghstack dependencies: #162075, #161340, #161341, #161342, #161343, #161344, #161345, #161346	2025-09-05 18:02:53 +00:00
Ruben Rodriguez Buchillon	e02e9edb55	[inductor] V.choice.get_mm_configs takes a stack of templates (#161346 ) # why - enables us to just gather relevant templates and get all choices at once - that in turns allows us to make op wide override decisions # what - V.choice.get_mm_configs takes a stack of templates - all callsites just provide a stack of size 1 right now but do not merge everything yet (other features pending) # testing ``` python3 -bb -m pytest test/inductor/test_max_autotune.py -v ``` Differential Revision: [D81520583](https://our.internmc.facebook.com/intern/diff/D81520583) Pull Request resolved: https://github.com/pytorch/pytorch/pull/161346 Approved by: https://github.com/eellison ghstack dependencies: #162075, #161340, #161341, #161342, #161343, #161344, #161345	2025-09-05 18:02:46 +00:00
Ruben Rodriguez Buchillon	d63ad53a99	[inductor][ez] return choicecallers directly (#161345 ) # why - remove repeat patterns - we have everything to make the choicecallers - templates - input_nodes - layouts - all the kwargs # what - yield a choicecaller directly from V.choices.get_mm_configs # testing ``` python3 -bb -m pytest test/inductor/test_max_autotune.py -v ``` Differential Revision: [D81520577](https://our.internmc.facebook.com/intern/diff/D81520577) Pull Request resolved: https://github.com/pytorch/pytorch/pull/161345 Approved by: https://github.com/jansel ghstack dependencies: #162075, #161340, #161341, #161342, #161343, #161344	2025-09-05 18:02:38 +00:00
Ruben Rodriguez Buchillon	031d79cb51	[inductor] move max-autotune logic inside V.choices.get_mm_configs (#161344 ) # why - heuristics providers know decide whether to (or which choices to add) in the max-autotune case - enables an eventual override point to gracefully fallback to the standard behavior # what - max-autotune is determined inside V.choices.get_mm_configs because it's mm only right now, we can just do `config.max_autotune or config.max_autotune_gemm` a TODO indicates that this can change in the future when this expands to more templates # testing ``` python3 -bb -m pytest test/inductor/test_max_autotune.py -v ``` Differential Revision: [D81520573](https://our.internmc.facebook.com/intern/diff/D81520573) Pull Request resolved: https://github.com/pytorch/pytorch/pull/161344 Approved by: https://github.com/jansel ghstack dependencies: #162075, #161340, #161341, #161342, #161343	2025-09-05 18:02:30 +00:00
Ruben Rodriguez Buchillon	a301dc3b60	[inductor][ez] pass template rather than template.uid (#161343 ) # why - simpler interface - enables future of extracting more things out of the template e.g. a hash # what V.choices.get_mm_configs now takes the whole template rather than just the template.uid # testing ``` python3 -bb -m pytest test/inductor/test_max_autotune.py -v ``` Differential Revision: [D81520576](https://our.internmc.facebook.com/intern/diff/D81520576) Pull Request resolved: https://github.com/pytorch/pytorch/pull/161343 Approved by: https://github.com/jansel ghstack dependencies: #162075, #161340, #161341, #161342	2025-09-05 18:02:22 +00:00
Ruben Rodriguez Buchillon	af590cb729	[inductor][aten] treat like a template in GEMMs (#161342 ) # why - central point to analyze and override all generated choices # what - add a pseudo heuristic for aten that just yields a single, empty kwargs - add a pseudo heuristic with the bias_addmm logic for it - add an addmm specific heuristic that yields a single choice, but also expands it with alpha and beta kwargs - replace all the aten.bind calls with V.choices.get_mm_configs using the now matching API for aten # testing ``` python3 -bb -m pytest test/inductor/test_max_autotune.py -v ``` Differential Revision: [D81520580](https://our.internmc.facebook.com/intern/diff/D81520580) Pull Request resolved: https://github.com/pytorch/pytorch/pull/161342 Approved by: https://github.com/jansel ghstack dependencies: #162075, #161340, #161341	2025-09-05 18:02:10 +00:00
Ruben Rodriguez Buchillon	4902c76c65	[inductor][ez] add template/externchoice uid (#161341 ) # why - to have a central registry of templates/externkernelchoice to match them to heuristics etc, they need unique names - mm is both the triton template name and the aten_mm name # what - add a uid() to KernelTemplate/ExternKernelChoice that returns name - override in ExternKernel to prepend "aten::" - override in TritonTemplate to prepend "triton::" This id is just use to find template heuristics, so it has no other impact # testing ``` python3 -bb -m pytest test/inductor/test_max_autotune.py -v ``` Differential Revision: [D81520579](https://our.internmc.facebook.com/intern/diff/D81520579) Pull Request resolved: https://github.com/pytorch/pytorch/pull/161341 Approved by: https://github.com/jansel, https://github.com/eellison ghstack dependencies: #162075, #161340	2025-09-05 18:01:58 +00:00
Ruben Rodriguez Buchillon	9602590b15	[inductor] move scaled_mm input nodes logic (#161340 ) # why - a step towards a unified interface for all choices, where any adjustment to nodes (e.g. unsqueezing) happens as part of choice specific preprocessing, behind a common point # what - move the unsqueeze logic for triton nodes for scaled_mm inside the new hookup for adjusting the kernel inputs for template heuristics # testing ``` python3 -bb -m pytest test/inductor/test_max_autotune.py -v -k "scale" ``` Differential Revision: [D81520582](https://our.internmc.facebook.com/intern/diff/D81520582) Pull Request resolved: https://github.com/pytorch/pytorch/pull/161340 Approved by: https://github.com/jansel, https://github.com/eellison ghstack dependencies: #162075	2025-09-05 18:01:44 +00:00
Ruben Rodriguez Buchillon	2ef665ae19	[inductor][contigous mm] mild refactor (#162075 ) # why - use the new heuristics logic better to handle kwargs # what - move all checks into the heuristics to yield a single choice or not choices if the decomposition should not be used - fix `hip` device type, which should be `cuda` - let heuristics handle the kwarg passing # testing in ci Differential Revision: [D81706776](https://our.internmc.facebook.com/intern/diff/D81706776) Pull Request resolved: https://github.com/pytorch/pytorch/pull/162075 Approved by: https://github.com/exclamaforte, https://github.com/jansel	2025-09-05 18:01:07 +00:00
Mikayla Gawarecki	b18bb6796f	Add const to stable amax (#162082 ) Fixes https://github.com/pytorch/pytorch/issues/161826 Pull Request resolved: https://github.com/pytorch/pytorch/pull/162082 Approved by: https://github.com/soulitzer	2025-09-05 17:37:49 +00:00
PyTorch MergeBot	d711f27845	Revert "[ROCm] [CK] Composable Kernel integration for inductor backend (#158747 )" This reverts commit `019fed39aa`. Reverted https://github.com/pytorch/pytorch/pull/158747 on behalf of https://github.com/jithunnair-amd due to Broke linux-binary-manywheel-rocm / manywheel-py3_9-rocm6_4-test: `019fed39aa/1` ... PR didn't have this job run successfully due to CI outage ([comment](https://github.com/pytorch/pytorch/pull/158747#issuecomment-3259212343))	2025-09-05 17:27:45 +00:00
Nikita Shulga	261a84a176	[CD][BE] Remove unnecessary checks for XCode version (#162263 ) None of them have worked for a while, PyTorch for Mac is build with XCode-15.4 Pull Request resolved: https://github.com/pytorch/pytorch/pull/162263 Approved by: https://github.com/clee2000, https://github.com/Skylion007, https://github.com/ZainRizvi	2025-09-05 17:02:36 +00:00
xinan.lin	98374612fc	[Intel GPU] Update Intel triton commit pin to Triton 3.5.x (#161777 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/161777 Approved by: https://github.com/EikanWang	2025-09-05 16:55:47 +00:00
Eddie Yan	c2a3024617	[cuBLASLt][FP8] `cuBLASLt` appears to support float8 rowwise-scaling on H100 (#161305 ) Following #157905 I think the macro around ``` TORCH_INTERNAL_ASSERT(use_rowwise == false, "rowwise scaled_gemm not supported with blaslt"); ``` was never updated and this would cause `float8` tests to fail. Also it appears the `Lt` accepts two inputs with `e4m3` and `e5m2` dtypes simultaneously, so removing that check here as well... CC @lw Pull Request resolved: https://github.com/pytorch/pytorch/pull/161305 Approved by: https://github.com/Skylion007, https://github.com/drisspg, https://github.com/jeffdaily Co-authored-by: Jeff Daily <jeff.daily@amd.com>	2025-09-05 16:55:09 +00:00
Xingyuan Li	b2c7b9ad2d	[Intel GPU][FlexAttention] Enable TMA path on Intel GPU (#162138 ) The existing `can_use_tma` has some conditions that are unnecessary for Intel GPUs. We have removed these useless conditions on the Intel GPU path. Pull Request resolved: https://github.com/pytorch/pytorch/pull/162138 Approved by: https://github.com/liangan1, https://github.com/EikanWang, https://github.com/jansel, https://github.com/etaf	2025-09-05 16:54:51 +00:00
PyTorch MergeBot	f3cebec39e	Revert "Rename propagate_tensor_meta to make private again (#161744 )" This reverts commit `734ce8eba9`. Reverted https://github.com/pytorch/pytorch/pull/161744 on behalf of https://github.com/jeanschmidt due to seems to break internal tests, see D81657000 for more details ([comment](https://github.com/pytorch/pytorch/pull/161744#issuecomment-3258934519))	2025-09-05 16:20:29 +00:00
Saurabh Mishra	06da7c0730	[DCP][Quantization] Fix for FP8 multiplication during dequantization (#162202 ) Summary: Weight vector needs to be upcasted since some FP8 formats (like Float8_e4m3fn) don't have CPU implementations in PyTorch. Reference: https://docs.pytorch.org/docs/stable/tensors.html#id13 We will use FP32 for the scale vector multiplication and convert to the target dtype. Upcasting helps with the following: 1. Full CPU support: `float32` has complete CPU kernel implementations for all operations 2. Numerical stability: `float32` provides more precision during intermediate calculations 3. Compatibility: Works across all devices (CPU/GPU) and PyTorch versions Test Plan: UTs Rollback Plan: Differential Revision: D81711093 Pull Request resolved: https://github.com/pytorch/pytorch/pull/162202 Approved by: https://github.com/wwwjn	2025-09-05 16:06:21 +00:00
Edward Yang	2dd529df00	A basic CLAUDE.md based on bad things I see claude code doing (#162163 ) Signed-off-by: Edward Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/162163 Approved by: https://github.com/albanD, https://github.com/Skylion007	2025-09-05 14:52:36 +00:00
Shunting Zhang	a714437093	[ez][inductor] add a few outer dimension reduction cases for LOAF (#162028 ) For the not able to fuse issue reported here: https://github.com/pytorch/pytorch/issues/93718 , LOAF can fuse the outer dimension softmax into a single kernel and brings 1.87x speedup for the example shape mentioned in the issue. Pull Request resolved: https://github.com/pytorch/pytorch/pull/162028 Approved by: https://github.com/jansel, https://github.com/eellison	2025-09-05 09:30:13 +00:00
atalman	bffc7dd1f3	[CD] Add cuda 13.0 libtorch builds, remove CUDA 12.9 builds (#161916 ) Related to https://github.com/pytorch/pytorch/issues/159779 Adding CUDA 13.0 libtorch builds, followup after https://github.com/pytorch/pytorch/pull/160956 Removing CUDA 12.9 builds, See https://github.com/pytorch/pytorch/issues/159980 Pull Request resolved: https://github.com/pytorch/pytorch/pull/161916 Approved by: https://github.com/jeanschmidt, https://github.com/Skylion007 Co-authored-by: Ting Lu <tingl@nvidia.com>	2025-09-05 07:47:54 +00:00
Zeng, Xiangdong	5c473e9f5e	[1/N] Port 5 _composable/fsdp distributed test cases to Intel GPU (#159118 ) For https://github.com/pytorch/pytorch/issues/114850, we will port distributed tests to Intel GPU. We could enable Intel GPU with following methods and try the best to keep the original code styles: - use "torch.accelerator.current_accelerator()" to determine the accelerator backend - enabled XPU for some test path - skip some test cases which Intel GPU does not support Pull Request resolved: https://github.com/pytorch/pytorch/pull/159118 Approved by: https://github.com/guangyey, https://github.com/d4l3k	2025-09-05 05:52:15 +00:00
Pian Pawakapan	5da573c42c	[PGO] handle PGO profile merges (#162097 ) Avoid merges from extra PGO key, if same source has different rank. Unlikely to happen (needs code hash match & source variable type to change), but being safe. Differential Revision: D81299840 Pull Request resolved: https://github.com/pytorch/pytorch/pull/162097 Approved by: https://github.com/bobrenjc93	2025-09-05 04:58:15 +00:00
PyTorch UpdateBot	494878a11b	[audio hash update] update the pinned audio hash (#162114 ) This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/main/.github/workflows/nightly.yml). Update the pinned audio hash. Pull Request resolved: https://github.com/pytorch/pytorch/pull/162114 Approved by: https://github.com/pytorchbot	2025-09-05 04:32:16 +00:00
PyTorch UpdateBot	3bbc2e3e4f	[vllm hash update] update the pinned vllm hash (#162226 ) This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/main/.github/workflows/nightly.yml). Update the pinned vllm hash. Pull Request resolved: https://github.com/pytorch/pytorch/pull/162226 Approved by: https://github.com/pytorchbot	2025-09-05 04:32:08 +00:00
Nick Riasanovsky	b67c410398	[BE] [Inductor] Add Kernel name to all coor-desc tuning (#161409 ) Summary: When running coordinate descent tuning the logging is difficult to parse if the results are parallelized at all. This includes the kernel name in each step so post-processing can unify the results, even if run in parallel. Test Plan: NFC. Just a logging change. Rollback Plan: Differential Revision: D80942794 Pull Request resolved: https://github.com/pytorch/pytorch/pull/161409 Approved by: https://github.com/PaulZhang12	2025-09-05 02:53:13 +00:00
Colin L Reliability Rice	be5b03dde9	Allow for using a dedicated binary for the torch subproc pool. (#162093 ) Summary: The binary torch is running inside of can be larger than needed and in certain situations, this can cause a loss of memory. Test Plan: We've manually run tests via ``` TORCHINDUCTOR_FORCE_DISABLE_CACHES=1 TORCHINDUCTOR_WORKER_SUPPRESS_LOGGING=0 make mc8-train-publish-cint-datafm-toy -C minimal_viable_ai/models/ifr_mtml/main_v1/ 2>&1 \| tee ~/run_out ``` and overriding the binary used to be the built fbpkg in /packages. We've also kicked off manual runs at ``` fire-feid-20250903-1051-ae8c6827 ``` Which do show the binary running - https://fburl.com/scuba/procprint/e6lwv32m Rollback Plan: steps: - jk.update: jk: pytorch/compiler:subproc_worker_binary constant_bool: null consistent_pass_rate: null fractional_host_rollout: null sampling_rate: null - manual.note: content: '' Differential Revision: D81616624 Pull Request resolved: https://github.com/pytorch/pytorch/pull/162093 Approved by: https://github.com/masnesral	2025-09-05 01:43:46 +00:00
Eddie Yan	73eb4511fb	[B200][NVFP4] Fix argument passing in `test_blockwise_mxfp8_nvfp4_mxfp4_numerics_` (#162185 ) to unblock https://github.com/pytorch/pytorch/pull/159494 Pull Request resolved: https://github.com/pytorch/pytorch/pull/162185 Approved by: https://github.com/Skylion007, https://github.com/drisspg	2025-09-05 01:24:59 +00:00
Jeffro	29280864d9	Add new parameter for gen_pyi.py to make it more configureable. (#161772 ) This is a reposting of PR #128519. This change is important to how we maintain PyTorch at Google. From the previous PR: " This will make the script more flexible for the directory where it is executed. ... We plan to use the deprecated_yaml from a blaze genrule that invokes pyi.py. As the input to the pyi.py, genrule requires the input file to be explicitly listed out. When we feed the value of tools/autograd/deprecated.yaml to genrule, it failed to resolve since tools/autograd is a package from blaze perspective. Any file under a blaze package will a proper blaze target to be access. " Pull Request resolved: https://github.com/pytorch/pytorch/pull/161772 Approved by: https://github.com/albanD Co-authored-by: Haifeng Jin <haifeng-jin@users.noreply.github.com>	2025-09-05 00:48:15 +00:00
angelayi	5c67426d68	[dynamo] Add support for const prop on .item (#162204 ) Fixes some of the errors in https://fb.workplace.com/groups/1028545332188949/permalink/1303030824740397/ Pull Request resolved: https://github.com/pytorch/pytorch/pull/162204 Approved by: https://github.com/williamwen42	2025-09-05 00:28:49 +00:00
Nikita Shulga	d2d4c8e9b2	[BLAS] Avoid downcasts for fp16fp16->fp32 BLAS (#161999 ) Followup after https://github.com/pytorch/pytorch/pull/154012 Fixes CPU part of https://github.com/pytorch/pytorch/issues/160841 Pull Request resolved: https://github.com/pytorch/pytorch/pull/161999 Approved by: https://github.com/drisspg	2025-09-04 23:35:27 +00:00
Eddie Yan	c7e41071a0	[B200][MXFP8] Fix regex in `test_blockwise_mxfp8_nvfp4_error_messages_recipe_mxfp8_cuda` (#162180 ) to unblock https://github.com/pytorch/pytorch/pull/159494 Pull Request resolved: https://github.com/pytorch/pytorch/pull/162180 Approved by: https://github.com/Skylion007, https://github.com/drisspg, https://github.com/nWEIdia	2025-09-04 23:29:10 +00:00
xinan.lin	9499c8761c	[Inductor][Intel GPU] Register triton template heuristic for addmm tma. (#162132 ) Fixes #162048 Pull Request resolved: https://github.com/pytorch/pytorch/pull/162132 Approved by: https://github.com/jansel	2025-09-04 23:01:57 +00:00
Nan Zhang	3a207816cc	Forward fix for user defined triton kernel grid calc (#162162 ) Summary: This change fixes the test: inductor:fxir_backend - test_custom_triton_autotune_dynamic which was broken by https://github.com/pytorch/pytorch/pull/160997 Test Plan: inductor:fxir_backend - test_custom_triton_autotune_dynamic Rollback Plan: Differential Revision: D81679217 Pull Request resolved: https://github.com/pytorch/pytorch/pull/162162 Approved by: https://github.com/eellison, https://github.com/jansel	2025-09-04 22:51:23 +00:00
Yiming Zhou	09be1890d7	[export] Fix torch.export.load with storage offset (#162172 ) Summary: As titled Test Plan: CI Rollback Plan: Differential Revision: D81687701 Pull Request resolved: https://github.com/pytorch/pytorch/pull/162172 Approved by: https://github.com/angelayi	2025-09-04 22:50:33 +00:00
Pian Pawakapan	0d84ff3b78	[PGO] log add_extra_remote PGO to tlparse (#161751 ) Summary: log when additional PGO profile is merged in, from added read key Test Plan: test_pgo Rollback Plan: Differential Revision: D81284190 Pull Request resolved: https://github.com/pytorch/pytorch/pull/161751 Approved by: https://github.com/bobrenjc93	2025-09-04 22:47:03 +00:00

1 2 3 4 5 ...

92627 Commits