Commit Graph

88238 Commits

Author SHA1 Message Date
Laith Sakka
39df901b2a introduce definitely_contiguous and use it for reshape and tensor meta data computation. (#153432)
when a tensor has unbacked symbols it can be general enough to represent both contiguous and non contiguous tensors.
in that case we cant really evaluate is_contiguous. In many places in the code base, we check for is_contiguous to take a fast path. but the general path usually works for both contiguous and not contiguous in that case we probably want
to use definitely _contiguous API.

This is appleid for reshape in this PR and also to  tensor meta data computation, the meta data now will have an attribute that says that its contiguous when its always contiguous. We would store that only if definitely _contiguous is true  now.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153432
Approved by: https://github.com/bobrenjc93
2025-05-28 03:41:26 +00:00
Sidharth
54f1f29fed [dynamo] dynamic gb_type -> static gb_type (#154435)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154435
Approved by: https://github.com/williamwen42
2025-05-28 03:14:26 +00:00
ZhiweiYan-96
f12ce4e36b [Intel GPU] convolution fusion at XPU backend (#154202)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154202
Approved by: https://github.com/EikanWang, https://github.com/guangyey, https://github.com/etaf
ghstack dependencies: #140365
2025-05-28 03:14:18 +00:00
FFFrog
c6fc11af76 Fix the Problems About Defining Static Variable in Inline Function (#147095)
Refer to https://github.com/pytorch/pytorch/issues/125465 for more informations

- Remove unused header files
- Move the inline function that defines the static variable to .cc

Pull Request resolved: https://github.com/pytorch/pytorch/pull/147095
Approved by: https://github.com/cyyever, https://github.com/albanD
2025-05-28 02:47:16 +00:00
bobrenjc93
855eff8e8e Don't CSE unbacked nodes (#154387)
* #154440
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154387
Approved by: https://github.com/TroyGarden
ghstack dependencies: #154440
2025-05-28 02:21:56 +00:00
bobrenjc93
919a1a17e3 [ez] Replace misleading implementations with NYI (#154440)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154440
Approved by: https://github.com/Skylion007, https://github.com/pianpwk
2025-05-28 02:21:56 +00:00
Bin Bao
a84d8c4a1c [AOTI] Support multi-arch when using package_cpp_only (#154414)
Summary: Add support of multi_arch_kernel_binary in the package_cpp_only mode. More specifically, generate specific cmake targets to compile .ptx to .fatbin and embed them in the final shared library or binary.

Differential Revision: [D75452096](https://our.internmc.facebook.com/intern/diff/D75452096)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154414
Approved by: https://github.com/angelayi
ghstack dependencies: #154412, #154413
2025-05-28 01:20:38 +00:00
Bin Bao
cde82d25b7 [AOTI] Add a multi_arch_kernel_binary option (#154413)
Summary: CUDA can support multi-arch with the fatbin format. Add this multi_arch_kernel_binary option, so the compiled model binary can run across different GPU archs.

Differential Revision: [D75452094](https://our.internmc.facebook.com/intern/diff/D75452094)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154413
Approved by: https://github.com/angelayi
ghstack dependencies: #154412
2025-05-28 01:20:38 +00:00
Bin Bao
4d8f3d537a [AOTI][refactor] Rename embed_cubin to embed_kernel_binary (#154412)
Summary: Rename as it is not CUDA specific.

Differential Revision: [D75452095](https://our.internmc.facebook.com/intern/diff/D75452095)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154412
Approved by: https://github.com/angelayi
2025-05-28 01:20:28 +00:00
bobrenjc93
e79790e14b [ez] add docblock for _sympy_from_args (#154376)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154376
Approved by: https://github.com/Skylion007
ghstack dependencies: #154374, #154375
2025-05-27 23:43:13 +00:00
atalman
fe082c5ffe Move inductor workflows focal (ubuntu 20.04) -> jammy (ubuntu 22.04) (#154153)
Trying to fix: https://github.com/pytorch/pytorch/issues/154157

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154153
Approved by: https://github.com/Skylion007, https://github.com/huydhn, https://github.com/nike4949, https://github.com/cyyever
2025-05-27 23:16:21 +00:00
iupaikov-amd
3f10c9d8af Fixed an issue with XPU skip so the test_decompose_mem_bound_mm.py suite can be ran correctly (#153245)
Fixes #153239

Replaced custom decorator with the common one. Although the better way to skip the whole suite would be to add it to skip list in run_test.py

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153245
Approved by: https://github.com/jeffdaily
2025-05-27 23:10:25 +00:00
atalman
4b39832412 [CI] Update torchbench pin (#154453)
Related to https://github.com/pytorch/pytorch/issues/154446
Pins torchbench repo to a https://github.com/pytorch/benchmark/pull/2620 which pins opacus to ``1.5.3`` version

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154453
Approved by: https://github.com/wdvr, https://github.com/malfet
2025-05-27 23:08:42 +00:00
atalman
247ea229ba Create issue template: Release highlight for proposed Feature (#154125)
Authors: @anitakat @atalman

This is related to: https://github.com/pytorch/pytorch/issues/152134 . Adding RFC template for feature submissions

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154125
Approved by: https://github.com/anitakat, https://github.com/ZainRizvi, https://github.com/albanD
2025-05-27 22:45:21 +00:00
anwang
53affa273b [MTIA Aten Backend][1.3/n] Migrate remaining view ops, which all need explicit register in native_functions.yaml (#154337)
See context in D75266206.

This diff/PR migrates all the remaining view ops, which all need changes in `native_functions.yaml` and thus need to be exported to PR.

Ops covered by this diff:
- _reshape_alias
- unfold

internal: Also delete the entire aten_mtia_view_ops.cpp file, and update corresponding build config.

Differential Revision: [D75385411](https://our.internmc.facebook.com/intern/diff/D75385411/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154337
Approved by: https://github.com/nautsimon
ghstack dependencies: #154336
2025-05-27 22:18:12 +00:00
Shangdi Yu
eaf355cb11 [BE] Clean up unused parameter input in AOTIModel (#154276)
Summary: As title

Test Plan: CI

Differential Revision: D74691763

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154276
Approved by: https://github.com/Skylion007
2025-05-27 22:17:32 +00:00
PyTorch MergeBot
241f8dc84d Revert "Remove outdated CUDA 11 conditions (#154313)"
This reverts commit 3936e6141c.

Reverted https://github.com/pytorch/pytorch/pull/154313 on behalf of https://github.com/izaitsevfb due to breaks internal builds ([comment](https://github.com/pytorch/pytorch/pull/154313#issuecomment-2914230005))
2025-05-27 21:54:41 +00:00
Jerry Mannil
6be829535f [ROCm] Improve vectorized elementwise kernel performance in MI300X (#153634)
* Use non-temporal loads to improve the vectorized elementwise kernel performance on MI300
* Use thread_work_size of 8 or 16 for vectorized elementwise kernel

Co-author: @amd-hhashemi

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153634
Approved by: https://github.com/jeffdaily
2025-05-27 20:49:32 +00:00
PyTorch MergeBot
555fc05868 Revert "[Inductor] Improve typing, and prepare for ABI-compatible AOTI C-shim dispatching (#154371)"
This reverts commit 6169ca0b65.

Reverted https://github.com/pytorch/pytorch/pull/154371 on behalf of https://github.com/benjaminglass1 due to Appears to have broken main ([comment](https://github.com/pytorch/pytorch/pull/154371#issuecomment-2913975736))
2025-05-27 20:39:09 +00:00
Guilherme Leobas
7359705232 Add CPython tests for unittest (#150788)
Tests:
* test_assertions.py

Pull Request resolved: https://github.com/pytorch/pytorch/pull/150788
Approved by: https://github.com/williamwen42
2025-05-27 20:26:17 +00:00
Guilherme Leobas
12fc06d267 Add CPython complex tests (#152015)
Tests:
* test_complex.py

Pull Request resolved: https://github.com/pytorch/pytorch/pull/152015
Approved by: https://github.com/williamwen42
2025-05-27 20:24:28 +00:00
Guilherme Leobas
3b218e56dc Add CPython tests for iter/sort (#150797)
Tests:
* test_iter.py
* test_sort.py

Pull Request resolved: https://github.com/pytorch/pytorch/pull/150797
Approved by: https://github.com/williamwen42
2025-05-27 20:22:34 +00:00
bobrenjc93
4fd8a54a41 [ez] add docblock for is_accessor_node (#154375)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154375
Approved by: https://github.com/Skylion007, https://github.com/pianpwk
ghstack dependencies: #154374
2025-05-27 19:47:32 +00:00
tvukovic-amd
b367e5f6a6 [ROCm][Windows] Fix building torch 2.8 wheel with ROCm (added hipblasLt and rocblas directories) (#153144)
Since rocblas.dll and hipblaslt.dll are copied to torch/lib, rocblas and hipblaslt directories are needed to be stored there too (otherwise we have an error after wheel installation while searching for files in rocblas/library and hipblaslt/library which doesn't exist). This PR fixes this issue.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153144
Approved by: https://github.com/jeffdaily

Co-authored-by: Jeff Daily <jeff.daily@amd.com>
2025-05-27 19:40:28 +00:00
PyTorch MergeBot
fa6ca59079 Revert "Move inductor workflows focal (ubuntu 20.04) -> jammy (ubuntu 22.04) (#154153)"
This reverts commit 2bd95f3a1f.

Reverted https://github.com/pytorch/pytorch/pull/154153 on behalf of https://github.com/malfet due to Broke inductor tests, see b8452e55bc/1 ([comment](https://github.com/pytorch/pytorch/pull/154153#issuecomment-2913738047))
2025-05-27 19:23:28 +00:00
Benjamin Glass
6169ca0b65 [Inductor] Improve typing, and prepare for ABI-compatible AOTI C-shim dispatching (#154371)
Prepares for the next PR in the stack by tightening up typing on a `cpp_wrapper` interface that's only used in one (well-typed) place, as well as downstream effects of that change. In particular, this enabled:

1. removing a number of now clearly unnecessary asserts
2. adding a few more targeted asserts to validate the code's current assumptions
3. removing some unneeded control flow in several functions

As far as I can tell, this PR should be functionally neutral. One argument was removed from a `cpp_wrapper` public API, but that argument was unused, and only had a single callsite.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154371
Approved by: https://github.com/desertfire
2025-05-27 19:17:41 +00:00
Ryan Guo
75bbd4989c [dynamo] Support using symint from dispatcher-style tensor subclass (#154130)
Fixes #146932.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154130
Approved by: https://github.com/laithsakka
2025-05-27 19:05:46 +00:00
PyTorch MergeBot
8c0f07f944 Revert "[ROCm] Improve vectorized elementwise kernel performance in MI300X (#153634)"
This reverts commit 0d4de7872a.

Reverted https://github.com/pytorch/pytorch/pull/153634 on behalf of https://github.com/malfet due to Broke inductor jobs, see b8452e55bc/1 ([comment](https://github.com/pytorch/pytorch/pull/153634#issuecomment-2913619071))
2025-05-27 19:02:59 +00:00
Zizeng Meng
b8452e55bc [Kineto x Insight] Update Kineto submodule (#154426)
Summary: We add a new ActivityType::MTIA_INSIGHT in 20f652846f

Test Plan: CI

Differential Revision: D75454945

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154426
Approved by: https://github.com/Skylion007
2025-05-27 18:29:29 +00:00
Nikita Shulga
5075df6fee Make torch importable if compiled without TensorPipe (#154382)
By delaying the import/hiding it behind `torch.distributed.rpc.is_tensorpipe_avaiable()` check
Fixes https://github.com/pytorch/pytorch/issues/154300

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154382
Approved by: https://github.com/Skylion007
ghstack dependencies: #154325
2025-05-27 18:13:38 +00:00
Nikita Shulga
f472ea63bb [BE] Fix typos in SyntaxError description (#154436)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154436
Approved by: https://github.com/seemethere, https://github.com/wdvr, https://github.com/ZainRizvi
2025-05-27 18:08:58 +00:00
Cyrus Daruwala
cfbd99fdfd [Pytorch] Add option to CPU Blas GEMM to avoid output downcast (#154012)
Summary:
Dot product for a single output element consists of 3 steps (both input vectors have elements of type scalar_t):
1. elementwise vector multiply (scalar_t x scalar_t -> opmath_t)
2. vector reduction to a scalar value (opmath_t -> opmath_t)
3. optional downcast if opmath_t != out_t

The current blas kernel performs steps 1 and 2 correctly, but for step 3, it will always downcast to scalar_t even when opmath_t == output_t (and then do an upcast back to output_t), which results in precision loss. This diff fixes the precision loss in the BlasKernel

Test Plan: Attention CI passes

Differential Revision: D75023858

topic: not user facing

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154012
Approved by: https://github.com/Valentine233, https://github.com/aditew01, https://github.com/CaoE, https://github.com/drisspg
2025-05-27 17:43:21 +00:00
bobrenjc93
1ca082d9a1 [ez] Rewrite comment to be more friendly to non haskellers (#151421)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/151421
Approved by: https://github.com/aorenste
2025-05-27 17:32:34 +00:00
bobrenjc93
70fbd5e08c [ez] Add docblock for resolve_unbacked_bindings (#154374)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154374
Approved by: https://github.com/Skylion007, https://github.com/pianpwk
2025-05-27 17:05:49 +00:00
bobrenjc93
2560c1f3f0 add sticky cache pgo (#154418)
It's a reland of https://github.com/pytorch/pytorch/pull/154394 that hit some mergebot bug

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154418
Approved by: https://github.com/malfet
2025-05-27 16:40:18 +00:00
Boyuan Feng
514409d032 update torchvision pin (#154255)
Fixes #153985

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154255
Approved by: https://github.com/desertfire
2025-05-27 16:15:25 +00:00
ZhiweiYan-96
0ddfd1ed43 [Intel GPU] Enable mkdnn._linear_pointwise at XPU backend (#140365)
# Motivation

This PR is intended to add post-op fusion support fo Linear. The liner-pointwise fusion is expected to be used in graph mode like torch.compile. The FusionUtils.cpp file defines a utilization APIs for generating primitive attribute. This APIs would also be used for conv-pointwise fusion, which is in #140372.

# Validation
```bash
   python test/xpu/test_fusion.py
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140365
Approved by: https://github.com/etaf, https://github.com/guangyey, https://github.com/EikanWang
2025-05-27 15:57:15 +00:00
Jerry Mannil
0d4de7872a [ROCm] Improve vectorized elementwise kernel performance in MI300X (#153634)
* Use non-temporal loads to improve the vectorized elementwise kernel performance on MI300
* Use thread_work_size of 8 or 16 for vectorized elementwise kernel

Co-author: @amd-hhashemi

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153634
Approved by: https://github.com/jeffdaily
2025-05-27 15:38:43 +00:00
Xuehai Pan
7ae204c3b6 [BE][CI][Easy] Run lintrunner on generated .pyi stub files (#150732)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/150732
Approved by: https://github.com/malfet, https://github.com/cyyever, https://github.com/aorenste
2025-05-27 14:58:02 +00:00
Yuanhao Ji
0a7eef140b Add torch.Tensor._make_wrapper_subclass to torch/_C/__init__.pyi (#154022)
Fixes #153790

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154022
Approved by: https://github.com/Skylion007
2025-05-27 14:10:00 +00:00
Nikita Shulga
d88699308f [CI][MacOS] Move more dependencies to pypi (#154309)
Hopefully last step before all Mac build/tests could be switched away from conda
- Update cmake version from 3.22 to 3.25 as 3.22 from pipy seems  to be unusable with python-3.12
- Add `--plat-name macosx_11_0_arm64` to setup.py command
- Remove `codesign` for cmake workaround (that was probably never really necessary
-  Install `libpng` and `jpeg-turbo` when building torchbench and build torchaudio without OpenMP (to be fixed)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154309
Approved by: https://github.com/Skylion007, https://github.com/cyyever
2025-05-27 13:49:40 +00:00
PyTorch MergeBot
11a51a11af Revert "introduce definitely_contiguous and use it for reshape and tensor meta data computation. (#153432)"
This reverts commit 5c6d7caaaa.

Reverted https://github.com/pytorch/pytorch/pull/153432 on behalf of https://github.com/malfet due to Looks like it broke flex attention tests, see https://hud.pytorch.org/hud/pytorch/pytorch/main/1?per_page=50&name_filter=g6.4xlarge&mergeEphemeralLF=true ([comment](https://github.com/pytorch/pytorch/pull/153432#issuecomment-2912562570))
2025-05-27 13:42:34 +00:00
Emmanuel Menage
c52a002a22 Add getDeviceProperties api to torch mtia device (#153577)
topic: not user facing

Test Plan: Internal benchmark.

Differential Revision: D74256550

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153577
Approved by: https://github.com/nautsimon
2025-05-27 11:55:58 +00:00
atalman
2bd95f3a1f Move inductor workflows focal (ubuntu 20.04) -> jammy (ubuntu 22.04) (#154153)
Trying to fix: https://github.com/pytorch/pytorch/issues/154157

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154153
Approved by: https://github.com/Skylion007, https://github.com/huydhn, https://github.com/nike4949, https://github.com/cyyever
2025-05-27 11:53:47 +00:00
Tom Ritchford
6f86c1ce1d Add pyrefly.toml (#154144)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154144
Approved by: https://github.com/Skylion007
2025-05-27 10:16:30 +00:00
Laith Sakka
5c6d7caaaa introduce definitely_contiguous and use it for reshape and tensor meta data computation. (#153432)
when a tensor has unbacked symbols it can be general enough to represent both contiguous and non contiguous tensors.
in that case we cant really evaluate is_contiguous. In many places in the code base, we check for is_contiguous to take a fast path. but the general path usually works for both contiguous and not contiguous in that case we probably want
to use definitely _contiguous API.

This is appleid for reshape in this PR and also to  tensor meta data computation, the meta data now will have an attribute that says that its contiguous when its always contiguous. We would store that only if definitely _contiguous is true  now.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153432
Approved by: https://github.com/bobrenjc93
2025-05-27 08:54:31 +00:00
anwang
dec5ab8d98 [MTIA Aten Backend][1.2/n] Migrate as_strided to in-tree, and add unit tests (#154336)
See context in PR https://github.com/pytorch/pytorch/pull/153670

This diff migrate as_strided to in-tree. I found it's not covered by `test_kernel_eager_ci` so also adding unit tests.

Differential Revision: [D75385404](https://our.internmc.facebook.com/intern/diff/D75385404/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154336
Approved by: https://github.com/nautsimon
2025-05-27 06:32:38 +00:00
PyTorch MergeBot
ef6306e1c6 Revert "[executorch hash update] update the pinned executorch hash (#153436)"
This reverts commit 8d6139b8d8.

Reverted https://github.com/pytorch/pytorch/pull/153436 on behalf of https://github.com/malfet due to Broke ET sanity ([comment](https://github.com/pytorch/pytorch/pull/153436#issuecomment-2911206795))
2025-05-27 06:02:14 +00:00
Yu, Guangye
870133b2a0 Use get_device_context in aoti runtime for XPU directly (#154360)
# Motivation
Reuse [c10::xpu::get_device_context](1bebe0424e/c10/xpu/XPUFunctions.h (L27)) directly to reduce overhead, as it returns a cached `sycl::context` managed by PyTorch.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154360
Approved by: https://github.com/EikanWang
2025-05-27 05:55:59 +00:00
Jing Xu
8d89cdceb6 fix a compilation issue when TORCH_XPU_ARCH_LIST is an empty string (#153604)
When `XPU_ARCH_FLAGS` is an empty string, compilation will fail on `C10_STRINGIZE(XPU_ARCH_FLAGS)` in file `torch/csrc/xpu/Module.cpp` on Windows.
This PR fixes this issue by setting `TORCH_XPU_ARCH_LIST` to `""` to avoid an empty string conversion in `C10_STRINGIZE()` when compiling without an AOT.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/153604
Approved by: https://github.com/guangyey, https://github.com/EikanWang

Co-authored-by: Aaron Gokaslan <aaronGokaslan@gmail.com>
Co-authored-by: Yu, Guangye <106960996+guangyey@users.noreply.github.com>
2025-05-27 05:26:46 +00:00