Namely, error out rather than crash when the `out` dtype is of an unexpected type.
Resize the output tensor to the expected size in the `_out` operation, to prevent a crash when a tensor of an unexpected size is passed.
Preserve symbolic shapes whenever possible
Test plan: Run `python test_ops.py -v -k test_out_warning_fft_hfft_mps` for the MPS device; without this change it crashes with `Error: Invalid KernelDAG, equalShape for destination failed`. Run `python ../test/test_ops.py -v -k test_dtypes_stft_mps`; without this change it crashes with `A complex mlir::Type does not have a corresponding complex MPSDataType` when the input dtype is bfloat16.
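For reference, a minimal sketch of the intended `out=` behavior (shapes, dtypes, and the exact warning/error behavior are illustrative assumptions, not taken from the test suite):
```
import torch

# Sketch only: assumes an MPS-capable machine.
x = torch.randn(8, dtype=torch.cfloat, device="mps")

# Wrong-sized out tensor: previously this could crash the Metal kernel;
# with the fix the output is resized to the expected shape (with a warning).
out = torch.empty(3, dtype=torch.float32, device="mps")
torch.fft.hfft(x, out=out)          # out is resized to the expected 14 elements

# Unexpected out dtype: now raises an error instead of crashing.
bad_out = torch.empty(14, dtype=torch.int64, device="mps")
try:
    torch.fft.hfft(x, out=bad_out)
except RuntimeError as e:
    print("rejected:", e)
```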
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166272
Approved by: https://github.com/kulinseth
Fixes #163929
Fixes argmin/argmax operations to return correct logical indices instead of physical memory offsets when applied to transposed/permuted tensors. When `argmin()` or `argmax()` is called on a transposed tensor, Inductor was returning physical memory indices instead of logical row-major indices, producing incorrect results that did not match eager-mode behavior.
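A minimal sketch of the failure mode (tensor values chosen so that the logical index and the physical storage offset differ):
```
import torch

x = torch.tensor([[1., 9., 3.],
                  [4., 5., 6.]])
xt = x.T  # shape (3, 2); underlying storage is still [1, 9, 3, 4, 5, 6]

def f(t):
    return t.argmax()

# Eager: the max (9.) is at logical row-major position (1, 0) of xt -> index 2.
print(f(xt))                    # tensor(2)

# Before the fix, the compiled version could return the physical storage
# offset of 9. (index 1); with the fix it matches eager.
print(torch.compile(f)(xt))     # tensor(2)
```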
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165983
Approved by: https://github.com/shunting314
# why
- enable users to control which choices get used on which inputs
- reduce lowering time, and pin kernel selection, by selecting them for the inputs
# what
- a new InductorChoices subclass that implements a lookup table (see the sketch after this list)
- a README explaining the usage
- corresponding testing
- currently only supports templates that go through `V.choices.get_template_configs`
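A rough conceptual sketch of the lookup-table idea (class, method, and table names here are illustrative only; the actual subclass hooks into `V.choices.get_template_configs`, and the README added in this PR documents the real key and config format):
```
from torch._inductor.choices import InductorChoices

# Illustrative-only table keyed by (op, input shapes, dtype); the real schema
# is defined by the PR's README, not by this sketch.
_LOOKUP_TABLE = {
    ("mm", ((1024, 512), (512, 256)), "torch.float16"): {"BLOCK_M": 64, "BLOCK_N": 64},
}

class LookupTableChoices(InductorChoices):
    """Sketch: answer template-config queries from a static table when a key
    matches, otherwise fall back to the default heuristics in the parent."""

    def _lookup(self, key):
        return _LOOKUP_TABLE.get(key)
```
The custom choices object would be registered as Inductor's choices handler (`V.choices`); see the README in this PR for how table keys are derived from the inputs.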
# testing
```
python3 -bb -m pytest test/inductor/test_lookup_table.py -v
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164978
Approved by: https://github.com/PaulZhang12, https://github.com/eellison
Summary: Fallback kernels are created with flattened constant args and an `unflatten` utility to restore the original structure when needed. Apply it in `FXConverter` to preserve the original structure.
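The general flatten/unflatten pattern looks roughly like this (a generic pytree sketch, not the actual FXConverter code):
```
import torch.utils._pytree as pytree

# Nested constant args, as a fallback kernel might carry them.
constant_args = {"alpha": 2.0, "sizes": [3, (4, 5)], "flag": True}

# Flatten to a plain list of leaves plus a spec describing the structure...
flat_args, spec = pytree.tree_flatten(constant_args)
print(flat_args)   # [2.0, 3, 4, 5, True]

# ...and later unflatten to recover the original nested structure exactly.
restored = pytree.tree_unflatten(flat_args, spec)
assert restored == constant_args
```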
Test Plan: added new CI tests
Differential Revision: D85347589
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166144
Approved by: https://github.com/blaine-rister
- TheRock build system for ROCm builds OpenBLAS from source and uses a custom name for the library.
- Following existing conventions in `FindOpenBLAS.cmake` to support finding a custom-named version of OpenBLAS.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166333
Approved by: https://github.com/jeffdaily
**Summary:** When operations are performed on tensors with Partial placements, the sharding logic incorrectly determines whether the tensor should be redistributed to Replicate. By delaying that redistribution, we end up performing the operation first and the partial reduction afterwards, which produces incorrect results for max, min, gradient norm clipping, and more. We fix this by setting reduction_linear to False whenever a Partial placement is present, forcing the redistribution before completing the op.
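A small numeric illustration of why the order matters, with plain tensors standing in for the local shards of a Partial(sum) placement (not the DTensor API):
```
import torch

# Two local shards whose elementwise sum is the logical tensor of a
# Partial(sum) placement.
shard_a = torch.tensor([1., 5.])
shard_b = torch.tensor([4., 2.])
full = shard_a + shard_b                 # tensor([5., 7.]) -- the logical tensor

# Correct: redistribute first (sum the partials), then take the max.
print(full.max())                        # tensor(7.)

# Incorrect: take the max on each shard, then reduce the partial results.
print(shard_a.max() + shard_b.max())     # tensor(9.) -- wrong, max is not linear
```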
**Test Cases**
1. pytest test/distributed/tensor/test_math_ops.py -k test_partial_reduction_ops
2. pytest test/distributed/tensor/test_math_ops.py -k test_matching_partial_reduction_ops
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165962
Approved by: https://github.com/wconstab
**Summary:** First, I increased the world size to 8 because test_3d_with_tp_dp_pp wouldn't actually fully shard anything: with tp = 2 and pp = 2, dp was left at 1. Second, I refactored the tests that use single- and multi-stage schedules so that their logic is largely shared. This was accomplished by using the multi-stage-schedule logic from test_replicate_pp_grad to determine the start and end indices of a partial model, while setting virtual_stage to 1 when a single-stage schedule is used. Even if this approach isn't approved, the multi-stage schedule logic in test_3d_with_tp_dp_pp and test_replicate_pp should be changed, as the logic currently used is incorrect.
**Test Case**
1. pytest test/distributed/_composable/test_composability/test_pp_composability.py
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165701
Approved by: https://github.com/H-Huang
- Fixes `s/#pragma onces/#pragma once/` typo
All methods in the headers must be inline, otherwise one gets a barrage of warnings like the following:
```
/Users/malfet/git/pytorch/pytorch/c10/metal/utils.h:337:7: warning: unused function 'conj<half __attribute__((ext_vector_type(2)))>' [-Wunused-function]
half2 conj(half2 a) {
^
/Users/malfet/git/pytorch/pytorch/c10/metal/utils.h:342:8: warning: unused function 'conj<float __attribute__((ext_vector_type(2)))>' [-Wunused-function]
float2 conj(float2 a) {
^
2 warnings generated.
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166315
Approved by: https://github.com/seemethere, https://github.com/atalman
This diff moves export's run_decompositions to use aot_export_joint_with_descriptors instead of aot_export_module. In doing so, I ran into 2 main bugs:
1) aot_export_joint_with_descriptors doesn't correctly pass the record_nn_module_stack flag that is needed to populate nn_module_stack by switching the internal tracer.
2) When creating a symint from negative inputs, we need to pass positive=False. This didn't matter before because aot_autograd directly returns integer inputs instead of creating a symint.
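The second point can be illustrated with plain sympy, which the symbolic-shapes machinery builds on (a minimal sketch of the underlying issue, not the export code path itself):
```
import sympy

# A symbol created with positive=True lets sympy fold sign-dependent facts,
# which is unsound if the runtime value backing the symbol is negative (e.g. -3).
s_pos = sympy.Symbol("s0", integer=True, positive=True)
print(s_pos > 0)      # True    -- folded away, even if the real value is -3
print(abs(s_pos))     # s0      -- Abs dropped based on the positivity assumption

# Without the positivity assumption, the facts stay symbolic and can be
# checked against the actual (possibly negative) value later.
s_any = sympy.Symbol("s1", integer=True)
print(s_any > 0)      # s1 > 0
print(abs(s_any))     # Abs(s1)
```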
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165931
Approved by: https://github.com/zhxchen17
Summary:
Conversion from/to float16 was not getting covered by the conversion templates, because these used `float16_t` as the data type instead of the custom `at::Half`.
We are adding a shim that makes the conversion routines use autovec code for float16.
We observed the following performance improvements when compiling targeting `armv9-a+sve2+fp16`:
| Conversion | Before | After | Throughput gain |
| --- | --- | --- | --- |
| `float16_t->uint8->float16_t` | 657.489us | 181.216us | 263% higher |
| `float16_t->int8->float16_t` | 656.518us | 179.821us | 265% higher |
| `float16_t->int16->float16_t` | 668.998us | 183.417us | 265% higher |
| `float16_t->int64->float16_t` | 618.444us | 459.897us | 35% higher |
| `float16_t->double->float16_t` | 439.728us | 351.276us | 25% higher |
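A rough way to reproduce this kind of measurement from Python (a sketch using public torch APIs and an arbitrary tensor size; the numbers above come from the operator_benchmark suite):
```
import timeit
import torch

# Roundtrip conversions on a large half-precision tensor, analogous to the
# float16_t->X->float16_t measurements above.
x = torch.rand(1_000_000, dtype=torch.float16)

for dst in (torch.uint8, torch.int8, torch.int16, torch.int64, torch.float64):
    t = timeit.timeit(lambda: x.to(dst).to(torch.float16), number=100) / 100
    print(f"float16 -> {dst} -> float16: {t * 1e6:.1f}us")
```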
Test Plan:
Correctness:
buck2 test mode/opt //caffe2/test:test_ops
buck2 test mode/opt //caffe2/test:torch
Performance:
buck2 run mode/opt //caffe2/benchmarks/operator_benchmark/fb:operator_benchmark_test
Differential Revision: D85533271
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166306
Approved by: https://github.com/mcfi, https://github.com/ezyang
Differential Revision: D85446553
Internal builds failing after https://github.com/pytorch/pytorch/pull/161369
```
buck-headers/ATen/Context.h:22:10: fatal error: 'ATen/detail/XLAHooksInterface.h' file not found
22 | #include <ATen/detail/XLAHooksInterface.h>
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1 error generated.
```
Changes similar to that PR also need to update the build_variables file, which I've done here. I'm not sure why this wasn't caught by the Bazel build we have.
Sanity-checked that some of the previously failing builds pass after this change.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166179
Approved by: https://github.com/Camyll
A few workspace API changes:
1. Return the outer name when creating a workspace. Usually a use case does not care about the outer name, but for mix-order reduction (stacked PR) we need it to perform the next layer of reduction on the workspace tensor.
2. Allow overriding the workspace tensor dtype.
3. Allow delaying the deallocation of workspace tensors in TritonKernel.call_kernel, since they may be used after the call. The lifetime of the workspace tensors is only extended a little; they are deallocated once the next-layer reduction is done.
Tested with the stacked PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166204
Approved by: https://github.com/jansel
Summary:
Attempting to forward-fix failures from D85405167 (PR https://github.com/pytorch/pytorch/pull/166021).
This is Devmate's suggestion and seems to work, but I don't know whether it's a good idea. Devmate says the call is getting resolved to at::min, which is host-only; the reason it doesn't happen in OSS is likely that `AT_PER_OPERATOR_HEADERS` is defined in OSS but not internally.
```
In file included from .../ATen/native/hip/Normalization.hip:11:
.../ATen/native/hip/Normalization.cuh:302:37: error: no matching function for call to 'min'
302 | v_[u] = input[batch][plane][min(x+u*blockDim.x, input.size(2)-1)];
| ^~~
```
Differential Revision: D85463674
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166195
Approved by: https://github.com/Camyll, https://github.com/malfet, https://github.com/eqy
Summary:
The float32 data type has a vectorized routine that computes erf(). That routine currently calls std::exp() individually for each float in the vector being processed.
We now use sleef's vectorized routine to compute exp, improving the performance of erf.
AVX2/AVX512 also have a custom erf implementation, which uses sleef to compute exp.
We've observed a throughput increase of 25% when tested on tensors containing 1M elements:
Before:
f32 erf: 3175.977us
After:
f32 erf: 2539.446us
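A rough way to reproduce the measurement from Python (a sketch using public torch APIs; the numbers above come from the operator_benchmark suite):
```
import timeit
import torch

# 1M-element float32 tensor, matching the size quoted above.
x = torch.rand(1_000_000, dtype=torch.float32)

t = timeit.timeit(lambda: torch.erf(x), number=100) / 100
print(f"f32 erf: {t * 1e6:.3f}us")
```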
Test Plan:
Correctness:
buck2 test mode/opt //caffe2/test:test_ops
buck2 test mode/opt //caffe2/test:torch
Performance:
buck2 run mode/opt //caffe2/benchmarks/operator_benchmark/fb:operator_benchmark_test
Differential Revision: D85522651
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166262
Approved by: https://github.com/fadara01, https://github.com/jgong5, https://github.com/aditew01