pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-06 00:20:18 +01:00

Author	SHA1	Message	Date
soulitzer	b3861ac8e7	[reland] Warn if AccumulateGrad stream does not match producer node stream (#166136 ) ghstack-source-id: 59641aa32dc6fd027abf3276017432b693aa71f8 Pull-Request-resolved: https://github.com/pytorch/pytorch/pull/165065 Fixes #ISSUE_NUMBER Opening a new PR for codev Pull Request resolved: https://github.com/pytorch/pytorch/pull/166136 Approved by: https://github.com/ngimel	2025-11-01 12:33:48 +00:00
Shunting Zhang	4cc64d6234	[inductor] pre grad graph bisecting (#166344 ) A few things to note: 1. Customers like vllm use a custom backend (e.g. VllmBackend), split the graph, and call standalone_compile for each split. If we let the bisector override the backend, we won't bisect through the custom backend. `test_configs.bisect_keep_custom_backend_for_inductor` is used to keep the custom backend if we are bisecting for inductor. 2. pre_grad_graph bisecting and lowering bisecting so far do not compose well with each other, since an issue may be captured by whichever one we try first. `test_configs.bisect_pre_grad_graph` is used to enable the 'pre_grad_graph' bisecting. Pull Request resolved: https://github.com/pytorch/pytorch/pull/166344 Approved by: https://github.com/eellison	2025-11-01 09:22:21 +00:00
Laith Sakka	1aef88c72d	Avoid DDE in narrow with unbacked start (#166361 ) Slice knows how to handle an unbacked start, so we do not need to offset start before calling slice; we can leave that to slice. The only edge case is when start<0 and start+length==0: there slice and narrow would deviate, so in that case we pass dim_size instead of start+length. Pull Request resolved: https://github.com/pytorch/pytorch/pull/166361 Approved by: https://github.com/aorenste	2025-11-01 07:10:23 +00:00
Yuanyuan Chen	f0745ddb11	Replace c10::call_once with static initialization (#166381 ) This PR replaces c10::call_once calls with static initialization when possible. C++11 semantics guarantee that static initialization is atomic. Static initialization also has a lower cost than using c10::call_once. Pull Request resolved: https://github.com/pytorch/pytorch/pull/166381 Approved by: https://github.com/malfet	2025-11-01 07:09:40 +00:00
Nikita Shulga	4316df857c	[3.14] Fix torch.package.importer (#166767 ) That relies on the internal implementation of `pickle._getattribute`, which changed from taking an object and a string and returning a tuple (`9ab89c026a/Lib/pickle.py (L316)`) to taking an object and an iterable of strings and returning an object (`631ba3407e/Lib/pickle.py (L315)`). Test plan: ``` python -c "import torch; print(torch.package.sys_importer.get_name(torch.cuda.Stream))" ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/166767 Approved by: https://github.com/williamwen42	2025-11-01 05:05:47 +00:00
Yuanyuan Chen	9d6597b1e9	Correctly use test parameters (#166726 ) This PR uses unused arguments in some tests. Pull Request resolved: https://github.com/pytorch/pytorch/pull/166726 Approved by: https://github.com/rec, https://github.com/albanD, https://github.com/Skylion007	2025-11-01 04:43:31 +00:00
Xuehai Pan	e8fadba28c	[pytree] add `treespec_{leaf,tuple,dict}` functions for args_spec modification (#160843 ) The goal of this PR is to provide a standard way to create simple treespec instances and hide the implementation details of the `PyTreeSpec` class. Changes: 1. Add function `treespec_leaf()` to replace `LeafSpec()`. 2. Add functions `treespec_tuple(...)` and `treespec_dict(...)` to create treespecs for `tuple` / `dict`, which are used for `*args` / `**kwargs`. This avoids direct modification of `treespec` instances that relies on the implementation details of the `PyTreeSpec` class. 3. Change `len(spec.children_specs)` to `spec.num_children`. 4. Change `isinstance(spec, LeafSpec)` to `spec.is_leaf()`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/160843 Approved by: https://github.com/mlazos	2025-11-01 04:12:11 +00:00
PyTorch MergeBot	60333de85d	Revert "Remove setup-env instructions; it's confusing (#166749 )" This reverts commit `3dc92d69ed`. Reverted https://github.com/pytorch/pytorch/pull/166749 on behalf of https://github.com/pytorch-auto-revert due to Reverted automatically by pytorch's autorevert, to avoid this behaviour add the tag autorevert: disable ([comment](https://github.com/pytorch/pytorch/pull/166749#issuecomment-3475481831))	2025-11-01 02:55:56 +00:00
Edward Yang	3dc92d69ed	Remove setup-env instructions; it's confusing (#166749 ) Signed-off-by: Edward Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/166749 Approved by: https://github.com/mlazos	2025-11-01 01:48:15 +00:00
Yuanyuan Chen	f91899ca6c	[2/N] Add strict parameter to Python zip calls (#166257 ) This PR adds `strict=True/False` to zip calls in test utils. strict=True is passed when possible. Pull Request resolved: https://github.com/pytorch/pytorch/pull/166257 Approved by: https://github.com/janeyx99	2025-11-01 00:35:41 +00:00
Yuanyuan Chen	e2dc32f4ba	Replace decltype(auto) with auto (#166537 ) This PR replaces `decltype(auto)` with `auto` for C++ return type deduction and simplifies some templates. Pull Request resolved: https://github.com/pytorch/pytorch/pull/166537 Approved by: https://github.com/Skylion007	2025-11-01 00:30:23 +00:00
Zhengxu Chen	83cc38d9c1	[precompile] Preserve default arguments for dynamo capture (#166654 ) Summary: Handle the case where there's default arguments on function signature. Test Plan: pytest test/export/test_experimental.py -k test_dynamo_graph_capture_default_args Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/166654 Approved by: https://github.com/tugsbayasgalan	2025-11-01 00:12:10 +00:00
Sun, Jiayi	8d599045cf	add shape check for avg_pool2d (#161952 ) Fix https://github.com/pytorch/pytorch/issues/153312. Example: ```python import torch print(torch.__version__) tensor = torch.tensor([[ -7.8130e-88, -2.2092e-138, -1.8673e+03, -7.6272e-253, 3.9203e+110, 1.8380e-51, 2.8762e+268, 2.9094e+286, 5.1816e-228, -4.4916e+191, -7.4057e+80, -9.1955e-18, 5.6536e+225, 8.8364e-175, 1.5053e-226], [-3.0521e+239, -2.8307e+306, 1.3297e-03, -9.9969e-132, 2.8920e-286, 2.3964e+58, -6.8138e-281, 2.0321e-305, -3.5127e+74, -4.7560e-92, -8.9403e-99, -1.9739e-187, -2.5124e-173, 2.0458e+295, 4.4992e+52], [ 6.8752e+21, 1.9332e+189, -8.6940e-189, -6.6743e-15, 1.4691e+41, 1.0338e+63, -2.0779e-28, -7.6642e+104, 1.3390e+284, -8.0859e+194, 8.4600e+107, 4.9115e-44, 1.1665e+285, 5.1275e+203, 9.7580e+303]], dtype=torch.float64) try: res = torch.nn.functional.lp_pool1d( tensor, norm_type=-1.38119e+150, kernel_size=7879455037536781369, ceil_mode=True, ) print("CPU result:", res) except RuntimeError as e: print(f"CPU error: {e}") tensor_gpu = tensor.to("cuda:0") try: res = torch.nn.functional.lp_pool1d( tensor_gpu, norm_type=-1.38119e+150, kernel_size=7879455037536781369, ceil_mode=True, ) print("GPU result:", res) except RuntimeError as e: print(f"GPU error: {e}") ``` Output: - before ``` 2.9.0a0+git8703deb CPU result: tensor([[0.], [0.], [0.]], dtype=torch.float64) GPU error: integer out of range ``` - after ``` 2.9.0a0+git2e893df CPU error: integer out of range GPU error: integer out of range ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/161952 Approved by: https://github.com/mingfeima, https://github.com/malfet	2025-10-31 22:52:41 +00:00
Paul de Supinski	fd5da81fdd	[AI Codemod][DevmateFBSourceTestFailureBot] Fix for T243177299 ("Your diff, D85182174, broke some tests") (#166753 ) Summary: As per title, a bot created this diff because this test broke due to [a different PR.](https://github.com/pytorch/pytorch/pull/166026) <Erased bot summary in case anything we don't want to make external.> Test Plan: Bot ran the tests and they passed. <Erased bot test plan in case anything we don't want to make external.> Differential Revision: D85745809 Pull Request resolved: https://github.com/pytorch/pytorch/pull/166753 Approved by: https://github.com/d4l3k	2025-10-31 22:49:59 +00:00
Nikita Shulga	9261a1fb12	[MPS] Error out when BatchNorm is called for Complex (#166215 ) Or BatchNorm or LayerNorm for Long types Discovered while trying to enable `test_ops.py` for MPS Pull Request resolved: https://github.com/pytorch/pytorch/pull/166215 Approved by: https://github.com/dcci, https://github.com/kulinseth, https://github.com/Skylion007 ghstack dependencies: #166214, #166687	2025-10-31 22:44:29 +00:00
clr	d80ae738c9	compile_worker: Make a timer class (#166465 ) This subclass allows us to trigger an action after we haven't seen any activity for a certain amount of seconds. Pull Request resolved: https://github.com/pytorch/pytorch/pull/166465 Approved by: https://github.com/masnesral	2025-10-31 22:39:31 +00:00
drisspg	51667435f5	[FlexFlash] Wire up mask_mod + blockmask to flash impl (#166359 ) I have some local changes that I need to push to flash first https://github.com/Dao-AILab/flash-attention/pull/1970 Pull Request resolved: https://github.com/pytorch/pytorch/pull/166359 Approved by: https://github.com/v0i0	2025-10-31 22:07:40 +00:00
PyTorch MergeBot	2699f5410b	Revert "[xpu][feature] Integrate OneDNN SDPA training forward/backward into XPU OVERRIDEABLE Backend (#162454 )" This reverts commit `fd68d409ad`. Reverted https://github.com/pytorch/pytorch/pull/162454 on behalf of https://github.com/atalman due to internal build failure ([comment](https://github.com/pytorch/pytorch/pull/162454#issuecomment-3475009089))	2025-10-31 21:58:52 +00:00
Parshant Sharma	9970fb97ff	Fix Tril Triu SymInt (#166627 ) Fixes #165613 ### Summary: - This MR fixes an issue where `torch.tril` and `torch.triu` with dynamic diagonal values cause torch.export to incorrectly infer unnecessary constraints between dynamic dimensions. - Ensured proper SymInt type annotations for the diagonal parameter. - Updated the C++ implementation to correctly handle SymInt diagonal values. ### Impacts: module: dynamic shapes Pull Request resolved: https://github.com/pytorch/pytorch/pull/166627 Approved by: https://github.com/ezyang, https://github.com/Skylion007	2025-10-31 21:53:20 +00:00
Boyuan Feng	dfebdcab86	[GraphPartition] cache get_free_symbol_uses (#166338 ) Graph partition relies on `get_free_symbol_uses()` to collect symbol inputs. `ee7434be82/torch/_inductor/scheduler.py (L4869-L4885)` I empirically observed that `get_free_symbol_uses()` becomes slower for larger graphs. Specifically, I tried to aten fallback for torchtitan which results in 10k+ aten nodes. When processing the 600-th node, it takes seconds to `get_free_symbol_uses()` for 1 node. Why? Because `get_free_symbol_uses()` may recursively call another `get_free_symbol_uses()`, which could recursively run many times. `ee7434be82/torch/_inductor/ir.py (L4541-L4543)` This PR fixes the issue by caching the results of `get_free_symbol_uses()`. I validated on torchtitan that the issue is fixed. Pull Request resolved: https://github.com/pytorch/pytorch/pull/166338 Approved by: https://github.com/eellison	2025-10-31 21:24:05 +00:00
Wang, Chuanqi	b09fb481e0	[CD] Upgrade GCC version to 13 for XPU build (#162474 ) Follow #152426 Pull Request resolved: https://github.com/pytorch/pytorch/pull/162474 Approved by: https://github.com/zxiiro, https://github.com/atalman	2025-10-31 21:15:37 +00:00
Nikita Shulga	4e7232c5da	[MPS] Fix `smooth_l1_loss` backward for fp16 (#166687 ) And enable fp16 implementation for CPU, which simplifies OpInfo definitions for the op Pull Request resolved: https://github.com/pytorch/pytorch/pull/166687 Approved by: https://github.com/Skylion007 ghstack dependencies: #166214	2025-10-31 21:13:46 +00:00
PyTorch MergeBot	93a70c717a	Revert "Add CUDA MXFP4 scaled mm support via. FBGEMM (#166526 )" This reverts commit `e3ae0594d1`. Reverted https://github.com/pytorch/pytorch/pull/166526 on behalf of https://github.com/atalman due to Failing internal test ([comment](https://github.com/pytorch/pytorch/pull/166526#issuecomment-3474907536))	2025-10-31 21:10:28 +00:00
Yuanyuan Chen	d97144d31e	[5/N] Remove unused loop variables in tests (#166716 ) This PR removes unused loop variables in tests. Pull Request resolved: https://github.com/pytorch/pytorch/pull/166716 Approved by: https://github.com/Lucaskabela, https://github.com/Skylion007	2025-10-31 20:47:57 +00:00
William Wen	e4043884c7	[dynamo, 3.14] fix segfault due to improper create_call_function_ex (#166678 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/166678 Approved by: https://github.com/malfet	2025-10-31 20:44:53 +00:00
Lucas Kabela	4a7bc1d522	[BE][Typing][Dynamo] Type misc files in `torch/_dynamo/variables/` (#166569 ) Provides type coverage to ~3000 LOC and 200 methods in `torch/_dynamo/variables/` This is the first part of the final step to having 100% strict type coverage in dynamo - see previous comments in https://github.com/pytorch/pytorch/pull/166535 (combined into this one PR because ghstack was giving issues...) ### Coverage report: ``` mypy torch_dynamo/variables --linecount-report /tmp/coverage_log ``` Compare before to after - we go from 3826 to 7221 lines covered Pull Request resolved: https://github.com/pytorch/pytorch/pull/166569 Approved by: https://github.com/williamwen42, https://github.com/Skylion007	2025-10-31 20:42:27 +00:00
Nicolas De Carli	8209a0506b	[Pytorch] Enable aarch64 convert autovec only on clang (#166739 ) Summary: We've noted issues with modern GCC versions. Until further investigation is carried out, we'll leave the code enabled only on clang. Test Plan: CI Differential Revision: D85968395 Pull Request resolved: https://github.com/pytorch/pytorch/pull/166739 Approved by: https://github.com/mcfi, https://github.com/Skylion007, https://github.com/robert-hardwick	2025-10-31 20:22:33 +00:00
William Wen	70aeb49198	[dynamo] clarify graph break handling/logging in symbolic_convert (#166587 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/166587 Approved by: https://github.com/Lucaskabela ghstack dependencies: #166476, #166477, #166586	2025-10-31 20:13:16 +00:00
Nikita Shulga	cf9a834f39	[BE] Move GreenContext implementation details to cpp (#166462 ) - Remove all complex defines logic from the header - Make GreenContext constructor private, as it should only be created via the static method as singleton - Delete unused `getContext` and `getGreenContext` methods - Rename `CUDA_HAS_GREEN_CONTEXT` to `HAS_CUDA_GREEN_CONTEXT()`, which results in compilation error if one accidentally makes a typo - Suppress `-Wunused-private-field` if GreenContext is not available Pull Request resolved: https://github.com/pytorch/pytorch/pull/166462 Approved by: https://github.com/ngimel, https://github.com/eqy	2025-10-31 20:11:02 +00:00
Yuanyuan Chen	856a7a5298	Add missing device to namedtensor tests (#166717 ) This PR passes unused `device` argument to tests. Pull Request resolved: https://github.com/pytorch/pytorch/pull/166717 Approved by: https://github.com/Skylion007	2025-10-31 20:04:41 +00:00
Camyll Harajli	ef8d97efcf	fix broken nn_convolution test (#166666 ) Summary: Broken by oss diff during oncall by third party contributor Test Plan: buck test 'fbcode//mode/dev-nosan' fbcode//caffe2/test:nn_convolution -- --run-disabled Differential Revision: D85899891 Pull Request resolved: https://github.com/pytorch/pytorch/pull/166666 Approved by: https://github.com/atalman, https://github.com/seemethere, https://github.com/Skylion007	2025-10-31 19:59:50 +00:00
Fadi Arafeh	d2be06f673	[cpu][fix] Update ACL version to fix crashes with tensor sizes > 2^31-1 (#165904 ) ---- - Updates Arm Compute Library (ACL) to v52.6.0 - v52.6.0 contains https://github.com/ARM-software/ComputeLibrary/pull/1201 which fixes crashes with tensors of sizes > 2^31-1 fixes: #165654 Pull Request resolved: https://github.com/pytorch/pytorch/pull/165904 Approved by: https://github.com/malfet	2025-10-31 19:37:26 +00:00
James Wu	08f4535378	Refactor AOTAutogradCacheEntry into AOTAutogradResult (#166656 ) This PR refactors the name AOTAutogradCacheEntry into AOTAutogradResult, and BundledAOTAutogradCacheEntry into BundledAOTAutogradResult. It also moves all corresponding files to a new file, `aot_autograd_result`, which is analogous to `output_code.py` from Inductor. Having all these be called cache entries made sense when all we used them for was caching. But with AOT compile using BundledAOTAutogradCacheEntry, we want a more generalized naming structure. This is a no-op change, and all existing tests should pass. Pull Request resolved: https://github.com/pytorch/pytorch/pull/166656 Approved by: https://github.com/zhxchen17 ghstack dependencies: #166650	2025-10-31 18:54:09 +00:00
James Wu	30157d30f0	Add regional aot eager support to AOTAutogradCacheEntry (#166650 ) This PR does two things: - It genericizes `BundledAOTAutogradCacheEntry` to support any outputcode, not just CompiledFxGraphs - It adds a brand new OutputCode for the `aot_eager_regional_inductor` backend, i.e. a graph module that has regional inductor components in it. This allows BundledAOTAutogradCache to just integrate nicely with inductor out of the box, but more importantly, it allows the result of aot_autograd to be fully serializable when using `aot_eager_regional_inductor`. This will allow us to AOT precompile cases where we have an eager graph that has scooped up inductor bits. It's a bit unfortunate that the naming makes BundledAOTAutogradCacheEntry sound like its primary use is for caching, but really the more common use is going to be as an AOTAutogradOutput. It may be worth revisiting how to refactor/rename these in a later PR: - AOTAutogradCacheEntry -> AOTAutogradResult - BundledAOTAutogradCacheEntry -> BundledAOTAutogradResult Pull Request resolved: https://github.com/pytorch/pytorch/pull/166650 Approved by: https://github.com/zhxchen17	2025-10-31 18:54:09 +00:00
IvanKobzarev	b470e59c38	partitioner option to ignore partitioner_tag for abstract usage (#166725 ) Partitioner functionality is appealing to use in different scenarios (e.g. Autoparallel). We have special logic around "partitioner_tag" from meta that is only needed for the forward/backward split. This adds an optional argument to skip it and do only a generic split based on inputs/outputs. Potentially we want to make `_extract_graph_with_inputs_outputs` without underscore :) Pull Request resolved: https://github.com/pytorch/pytorch/pull/166725 Approved by: https://github.com/bdhirsh	2025-10-31 18:50:02 +00:00
PyTorch MergeBot	85b85f6c2c	Revert "[pytree] add `treespec_{leaf,tuple,dict}` functions for args_spec modification (#160843 )" This reverts commit `108bb224f7`. Reverted https://github.com/pytorch/pytorch/pull/160843 on behalf of https://github.com/atalman due to failing internal builds ([comment](https://github.com/pytorch/pytorch/pull/160843#issuecomment-3474354428))	2025-10-31 18:31:32 +00:00
Nicolas De Carli	b71966f67b	[PyTorch] Improve aarch64 performance of bfloat16 ops - retry (#166028 ) (#166641 ) Summary: PR allows compiler to better optimize some bfloat16-based operations, when run on NEON. Retrying to land the code, after noting that these expressions became available in recent compiler versions. Current CI benchmark binary_test.py will measure affected codepaths. Benchmarks show measurable improvements on clang-19, when targeting armv9-a+sve2: Before: bfloat16 add: 250.503us bfloat16 sub: 245.674us bfloat16 neg: 113.945us bfloat16 abs: 115.953us bfloat16 reciprocal: 262.602us After: bfloat16 add: 203.862us ---> 23% higher throughput bfloat16 sub: 201.526us ---> 22% higher throughput bfloat16 neg: 68.416us ---> 67% higher throughput bfloat16 abs: 71.003us ---> 63% higher throughput bfloat16 reciprocal: 177.834us ---> 48% higher throughput Test Plan: Correctness: buck2 test mode/opt //caffe2/test:test_ops buck2 test mode/opt //caffe2/test:torch Performance: buck2 run mode/opt //caffe2/benchmarks/operator_benchmark/fb:operator_benchmark_test Reviewed By: mcfi Differential Revision: D85809843 Pull Request resolved: https://github.com/pytorch/pytorch/pull/166641 Approved by: https://github.com/Skylion007, https://github.com/malfet	2025-10-31 18:21:04 +00:00
Scott Wolchok	0947765eb9	Cache even more work for return_and_correct_aliasing (#166365 ) Yet another pass found even more work we can move to be done only once. This seems to knock a few microseconds off the DTensor dispatch fast path. Pull Request resolved: https://github.com/pytorch/pytorch/pull/166365 Approved by: https://github.com/bdhirsh	2025-10-31 18:03:05 +00:00
Jeff Daily	239e7b541a	[ROCm][CI] upgrade nightly wheels to ROCm 7.1 (#166730 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/166730 Approved by: https://github.com/jeffdaily Co-authored-by: Jeff Daily <jeff.daily@amd.com>	2025-10-31 17:30:47 +00:00
Justin Chu	ffaa6578b7	Revise deprecation warning for ONNX exporter (#166692 ) Updated deprecation warning for ONNX export to reflect the current state. Pull Request resolved: https://github.com/pytorch/pytorch/pull/166692 Approved by: https://github.com/titaiwangms	2025-10-31 17:23:55 +00:00
Jane Xu	365ed62f61	Document LibTorch ABI more, add README to headeronly (#166661 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/166661 Approved by: https://github.com/mikaylagawarecki, https://github.com/albanD	2025-10-31 17:18:13 +00:00
PyTorch MergeBot	fcc1063566	Revert "[BE][Typing][Dynamo] Type misc files in `torch/_dynamo/variables/` (#166569 )" This reverts commit `aa9c96af04`. Reverted https://github.com/pytorch/pytorch/pull/166569 on behalf of https://github.com/Lucaskabela due to Lintrunner not fixed due to race condition at landing ([comment](https://github.com/pytorch/pytorch/pull/166569#issuecomment-3474012637))	2025-10-31 16:59:33 +00:00
Jazlyn Li	121235956b	update Node.is_impure check if subgraph contains impure ops (#166609 ) Summary: ## Context When `const_fold.split_const_subgraphs` sees a `call_module` node that is a GraphModule, the existing implementation can mark this node as const-foldable when it shouldn't. For example, a parent graph contains a `call_module` to a subgraph that has no inputs but contains impure ops inside. ``` parent graph(): %sub : [num_users=1] = call_module[target=sub](args = (), kwargs = {}) %getitem : [num_users=1] = call_function[target=operator.getitem](args = (%sub, slice(None, None, None)), kwargs = {}) return (getitem,) submodule graph(): %randn : [num_users=1] = call_function[target=torch.ops.aten.randn.default](args = ([5, 10],), kwargs = {device: cpu, pin_memory: False}) %add : [num_users=1] = call_function[target=torch.ops.aten.add.Tensor](args = (%randn, 1), kwargs = {}) return (add,) ``` When the `submodule` graph is fed to `const_fold.split_const_subgraphs`, it comes out unmodified since randn is impure. But if the `submodule` is called by a `parent` graph, when `parent` is fed to `const_fold.split_const_subgraphs`, it comes out folded. ``` parent after fold graph(): %_fx_const_folded_attrs : [num_users=1] = get_attr[target=_FX_CONST_FOLDED_ATTRS] return (_fx_const_folded_attrs,) ``` This is because the `node.is_impure()` check inside `const_fold.split_const_subgraphs` falls through, leading the call_module node to be marked as pure. 
## Fix We can update `fx.node.Node.is_impure` function to check for ops inside a call_module node with an additional `subgraph_has_impure_ops` check: - if a call_module node calls a GraphModule, - check any call_function nodes are impure ops - recursively check any call_module nodes that call GraphModule If the call_module subgraph has impure ops, return True to `is_impure` Test Plan: added tests to test_fx_const_fold.py Differential Revision: D85798483 Pull Request resolved: https://github.com/pytorch/pytorch/pull/166609 Approved by: https://github.com/blaine-rister	2025-10-31 16:58:18 +00:00
Lucas Kabela	aa9c96af04	[BE][Typing][Dynamo] Type misc files in `torch/_dynamo/variables/` (#166569 ) Provides type coverage to ~3000 LOC and 200 methods in `torch/_dynamo/variables/` This is the first part of the final step to having 100% strict type coverage in dynamo - see previous comments in https://github.com/pytorch/pytorch/pull/166535 (combined into this one PR because ghstack was giving issues...) ### Coverage report: ``` mypy torch_dynamo/variables --linecount-report /tmp/coverage_log ``` Compare before to after - we go from 3826 to 7221 lines covered Pull Request resolved: https://github.com/pytorch/pytorch/pull/166569 Approved by: https://github.com/williamwen42	2025-10-31 16:56:50 +00:00
Jeff Daily	c3b71d5499	[ROCm][CI] remove relaxed tolerance for tf32 tests (#166478 ) Instead of relaxing tolerances for certain unit tests that exercise TF32 on MI300, skip the tests until hipblaslt accuracy is improved. Pull Request resolved: https://github.com/pytorch/pytorch/pull/166478 Approved by: https://github.com/jeffdaily Co-authored-by: Jeff Daily <jeff.daily@amd.com> Co-authored-by: Jagadish Krishnamoorthy <jagadish.krishnamoorthy@amd.com>	2025-10-31 16:15:42 +00:00
Kurt Mohler	1e3600b528	[MPS] Move `logaddexp/logaddexp2` to Metal and support complex (#166670 ) NOTE: Complex inputs are only supported in `logaddexp`. Since `logaddexp2` does not support complex inputs for CPU, it is not enabled for MPS in this PR either. Pull Request resolved: https://github.com/pytorch/pytorch/pull/166670 Approved by: https://github.com/malfet	2025-10-31 16:15:02 +00:00
Xuan Zhang	fee7624bd6	[PT2] set choice handler in config (#166607 ) Summary: We were setting the custom inductor choice using `torch._inductor.virtualized.V.set_choices_handler(CustomInductorChoices())`. However, this leads to inconsistent behavior, even for jobs that are submitted back to back. In this diff, we pass in the choice handler via an inductor config and overwrite the default behavior when the config is provided. This solves the inconsistent behavior. Test Plan: see D85785892 (internal only) Differential Revision: D85785879 Pull Request resolved: https://github.com/pytorch/pytorch/pull/166607 Approved by: https://github.com/eellison	2025-10-31 15:40:05 +00:00
Jeff Daily	24e94e021a	[ROCm][CI] create ROCm 7.1 magma tarball (#166693 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/166693 Approved by: https://github.com/jeffdaily Co-authored-by: Jeff Daily <jeff.daily@amd.com>	2025-10-31 15:20:00 +00:00
Xuehai Pan	69be99ee51	Remove manually synced arch versions in `tools/nightly.py` (#166616 ) Discussed with @atalman offline. This reduces duplicate changes and the number of files to touch when updating arch versions. Pull Request resolved: https://github.com/pytorch/pytorch/pull/166616 Approved by: https://github.com/ezyang	2025-10-31 15:11:28 +00:00
Nikita Vedeneev	034e951b0c	[CUDA][cuBLASLt] addmm -- extend bias fusions to cases with (1 by n) shapes (#166307 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/166307 Approved by: https://github.com/eqy	2025-10-31 14:30:41 +00:00
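
The edge case noted in the `1aef88c72d` entry (start<0 and start+length==0) can be illustrated with plain Python list slicing, which follows the same negative-index convention. This is only a sketch under that assumption: `narrow_via_slice` is a hypothetical helper written for this note, not the PyTorch implementation, and it ignores symbolic (unbacked) sizes entirely.

```python
def narrow_via_slice(seq, start, length):
    """Hypothetical helper: emulate narrow(seq, start, length) using slicing."""
    end = start + length
    # With a negative start, an end of 0 would be read by slicing as
    # "stop at index 0" and yield an empty result, so pass the full
    # size instead (the "dim_size instead of start+length" rule).
    if start < 0 and end == 0:
        end = len(seq)
    return seq[start:end]


data = [10, 20, 30, 40, 50]

# Naive translation: start=-2, length=2 gives the slice [-2:0], which is empty.
naive = data[-2:-2 + 2]

# With the edge case handled, the last two elements are returned.
fixed = narrow_via_slice(data, -2, 2)

# Cases away from the edge are unaffected.
middle = narrow_via_slice(data, 1, 3)
negative_inner = narrow_via_slice(data, -3, 2)
```

For every other combination of start and length, the slice bounds already agree with narrow, which is why only this one case needs special handling.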

95310 Commits
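
The recursive purity check described in the `121235956b` entry can be sketched without PyTorch. Everything below is a toy model under stated assumptions: the `Node` and `Graph` dataclasses, the `IMPURE_OPS` set, and both functions are hypothetical stand-ins for illustration, not the `torch.fx` classes or the code from that PR.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical set of op names treated as impure (e.g. random number generation).
IMPURE_OPS = {"randn", "rand", "dropout"}


@dataclass
class Node:
    op: str                             # "call_function" or "call_module"
    target: str                         # op name or submodule name
    subgraph: Optional["Graph"] = None  # set when a call_module targets a graph


@dataclass
class Graph:
    nodes: List[Node] = field(default_factory=list)


def subgraph_has_impure_ops(graph: Graph) -> bool:
    """Return True if any op in the graph, or in nested subgraphs, is impure."""
    for node in graph.nodes:
        if node.op == "call_function" and node.target in IMPURE_OPS:
            return True
        if node.op == "call_module" and node.subgraph is not None:
            if subgraph_has_impure_ops(node.subgraph):
                return True
    return False


def is_impure(node: Node) -> bool:
    """A call_module node is impure when the graph it calls contains impure ops."""
    if node.op == "call_function":
        return node.target in IMPURE_OPS
    if node.op == "call_module" and node.subgraph is not None:
        return subgraph_has_impure_ops(node.subgraph)
    return False


# Mirrors the example in the entry: a submodule with no inputs that calls randn.
impure_sub = Graph([Node("call_function", "randn"), Node("call_function", "add")])
pure_sub = Graph([Node("call_function", "add")])
nested = Graph([Node("call_module", "inner", impure_sub)])

call_impure = Node("call_module", "sub", impure_sub)
call_pure = Node("call_module", "sub", pure_sub)
call_nested = Node("call_module", "outer", nested)
```

In this model a constant-folding pass that consults `is_impure` would leave `call_impure` and `call_nested` alone and would only consider `call_pure` for folding.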