Laith Sakka
1aef88c72d
Avoid DDE in narrow with unbacked start ( #166361 )
...
`slice` knows how to handle an unbacked start, so we do not need to offset `start` before calling it; we can leave that to `slice`.
The only edge case is when `start < 0` and `start + length == 0`: there `slice` and `narrow` would deviate,
so for that case we pass `dim_size` instead of `start + length` as the end argument.
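The rule above can be sketched as follows; `narrow_end_for_slice` is a hypothetical helper for illustration, not the PR's code:

```cpp
#include <cassert>
#include <cstdint>

// narrow(dim, start, length) lowers to slice(dim, start, end). Normally
// end = start + length; in the one deviating case (start < 0 and
// start + length == 0), slice would read end == 0 as "empty", so dim_size is
// passed instead to mean "through the last element".
int64_t narrow_end_for_slice(int64_t start, int64_t length, int64_t dim_size) {
  if (start < 0 && start + length == 0) {
    return dim_size;
  }
  return start + length;
}
```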
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166361
Approved by: https://github.com/aorenste
2025-11-01 07:10:23 +00:00
Yuanyuan Chen
f0745ddb11
Replace c10::call_once with static initialization ( #166381 )
...
This PR replaces c10::call_once calls with static initialization when possible. C++11 guarantees that the initialization of a function-local static runs exactly once and is thread-safe. Static initialization also has lower overhead than using c10::call_once.
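A minimal sketch of the pattern (illustrative, not PyTorch code). The explicit once-flag:

```cpp
#include <cassert>
#include <string>

// Before: static std::once_flag flag; std::call_once(flag, [] { init(); });
// After: a function-local static, whose initialization C++11 runs exactly once
// and thread-safely ("magic statics"), typically cheaper than call_once's
// fast-path check.
std::string& global_name() {
  static std::string name = "initialized-once";  // constructed on first call only
  return name;
}
```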
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166381
Approved by: https://github.com/malfet
2025-11-01 07:09:40 +00:00
Yuanyuan Chen
e2dc32f4ba
Replace decltype(auto) with auto ( #166537 )
...
This PR replaces `decltype(auto)` with `auto` for C++ return type deduction and simplifies some templates.
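The deduction difference behind this change, in a minimal example (not PyTorch code): `auto` deduces by value and drops references, while `decltype(auto)` preserves the exact type of the return expression. Swapping `decltype(auto)` for `auto` is safe only where the value category does not matter.

```cpp
#include <cassert>
#include <type_traits>

int g = 0;
auto by_value() { return (g); }          // deduced int: returns a copy of g
decltype(auto) by_ref() { return (g); }  // deduced int&: (g) is an lvalue

static_assert(std::is_same_v<decltype(by_value()), int>);
static_assert(std::is_same_v<decltype(by_ref()), int&>);
```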
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166537
Approved by: https://github.com/Skylion007
2025-11-01 00:30:23 +00:00
Kurt Mohler
1e3600b528
[MPS] Move logaddexp/logaddexp2 to Metal and support complex ( #166670 )
...
NOTE: Complex inputs are only supported in `logaddexp`. Since `logaddexp2` does not support complex inputs for CPU, it is not enabled for MPS in this PR either.
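For reference, the real-valued formulation such kernels typically implement (a sketch; the Metal and complex kernels from this PR are not reproduced here):

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Factoring out the max keeps both exponentials in range:
// log(e^a + e^b) = max(a, b) + log1p(exp(-|a - b|)).
double logaddexp_ref(double a, double b) {
  const double m = std::max(a, b);
  return m + std::log1p(std::exp(-std::fabs(a - b)));
}
```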
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166670
Approved by: https://github.com/malfet
2025-10-31 16:15:02 +00:00
Yu, Guangye
0ec0549823
Introduce a new API torch.xpu.get_per_process_memory_fraction ( #165511 )
...
# Motivation
Aligned with other backends, this PR introduces a new API `torch.xpu.get_per_process_memory_fraction` to allow users to retrieve the allowed memory fraction for a single process.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165511
Approved by: https://github.com/EikanWang , https://github.com/ezyang
ghstack dependencies: #165508 , #165509 , #165510
2025-10-30 19:30:09 +00:00
linhaifeng
369f2d6951
[3/N] fix typo in other folders ( #166606 )
...
fix typo in other folders
#166374
#166126
_typos.toml
```toml
[files]
extend-exclude = ["tools/linter/dictionary.txt"]
[default.extend-words]
nd = "nd"
arange = "arange"
Nd = "Nd"
GLOBALs = "GLOBALs"
hte = "hte"
iy = "iy"
PN = "PN"
Dout = "Dout"
optin = "optin"
gam = "gam"
PTD = "PTD"
Sur = "Sur"
nin = "nin"
tme = "tme"
inpt = "inpt"
mis = "mis"
Raison = "Raison"
ouput = "ouput"
nto = "nto"
Onwer = "Onwer"
callibrate = "callibrate"
ser = "ser"
Metdata = "Metdata"
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166606
Approved by: https://github.com/ezyang
2025-10-30 10:30:40 +00:00
thenumberouscode
94eaeb9cb8
[Conv1d] Check overflow before we compute padding size. ( #162363 )
...
Fixes https://github.com/pytorch/pytorch/issues/161877
also fixes https://github.com/pytorch/pytorch/issues/161875
Pull Request resolved: https://github.com/pytorch/pytorch/pull/162363
Approved by: https://github.com/jbschlosser
2025-10-29 03:27:20 +00:00
Yu, Guangye
753d9bd806
Introduce a new API torch.xpu.set_per_process_memory_fraction ( #165510 )
...
# Motivation
Aligned with other backends, this PR introduces a new API `torch.xpu.set_per_process_memory_fraction` to allow users to customize the allowed memory for a single process.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165510
Approved by: https://github.com/EikanWang , https://github.com/ezyang
ghstack dependencies: #165508 , #165509
2025-10-29 03:24:52 +00:00
Nikita Shulga
d049ed2cb1
[BE] Fix metal compilation warnings ( #166315 )
...
- Fixes `s/#pragma onces/#pragma once/` typo
All methods in the headers must be inline; otherwise one gets a barrage of warnings like the following:
```
/Users/malfet/git/pytorch/pytorch/c10/metal/utils.h:337:7: warning: unused function 'conj<half __attribute__((ext_vector_type(2)))>' [-Wunused-function]
half2 conj(half2 a) {
^
/Users/malfet/git/pytorch/pytorch/c10/metal/utils.h:342:8: warning: unused function 'conj<float __attribute__((ext_vector_type(2)))>' [-Wunused-function]
float2 conj(float2 a) {
^
2 warnings generated.
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166315
Approved by: https://github.com/seemethere , https://github.com/atalman
2025-10-27 20:17:10 +00:00
Kurt Mohler
c9b49e506e
[MPS] Add linalg.householder_product for MPS ( #166090 )
...
Fixes #166089
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166090
Approved by: https://github.com/malfet
2025-10-24 21:13:56 +00:00
Eddie Yan
e64a814ae7
[CUDA] Add experimental green context support for SM carveout ( #159104 )
...
Low-level PyTorch APIs should be usable/stable enough at this point but we might move the underlying driver API usage a bit from here...
Built on top of @drisspg 's branch
Pull Request resolved: https://github.com/pytorch/pytorch/pull/159104
Approved by: https://github.com/ngimel , https://github.com/malfet , https://github.com/kwen2501
Co-authored-by: drisspg <drisspguessous@gmail.com>
Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
2025-10-22 21:38:52 +00:00
Pearu Peterson
d01f15152c
Move toUnderlying to headeronly ( #165694 )
...
As in the title. Required in upper PRs of this ghstack.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165694
Approved by: https://github.com/janeyx99
2025-10-22 05:31:16 +00:00
Pearu Peterson
4fae6968b1
Move toString(ScalarType) and ScalarType ostream operator to headeronly ( #164405 ) ( #166018 )
...
This PR is created to replace the reverted PR https://github.com/pytorch/pytorch/pull/164405
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166018
Approved by: https://github.com/janeyx99
2025-10-22 05:16:58 +00:00
Jeff Daily
2fde10d914
[ROCm] fix test_allocator_backend ( #166035 )
...
Fixes #165872 .
Forward fix for PR #165298. hipify was causing some symbols to be replaced.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166035
Approved by: https://github.com/jeffdaily
Co-authored-by: Jeff Daily <jeff.daily@amd.com>
2025-10-22 03:46:23 +00:00
Yu, Guangye
8904a5a7c9
Move allocation size config to AllocatorConfig for cross-allocator sharing ( #159553 )
...
# Motivation
Make CUDA and XPU share the same config and code. And allow the other backends to reuse them.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/159553
Approved by: https://github.com/albanD
ghstack dependencies: #160067
2025-10-22 01:48:56 +00:00
Yuanyuan Chen
35153d0846
Simplify c10::guts::apply ( #164566 )
...
There is only one call site of `c10::guts::apply`, and it can be replaced by `std::apply` except for ROCm. This PR therefore simplifies the implementation of `c10::guts::apply`.
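The replacement shape, sketched with a hypothetical call site (not the one in the PR): `c10::guts::apply` predates C++17, and `std::apply` now unpacks a tuple into a callable's argument list directly.

```cpp
#include <cassert>
#include <tuple>

int sum3(int a, int b, int c) { return a + b + c; }

int apply_sum(const std::tuple<int, int, int>& t) {
  return std::apply(sum3, t);  // expands t into the three arguments of sum3
}
```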
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164566
Approved by: https://github.com/Aidyn-A , https://github.com/albanD
2025-10-22 00:47:43 +00:00
KarhouTam
12aac12b8d
[Code Clean] Replace std::runtime_error with TORCH_CHECK ( #165209 )
...
Including:
1. `aten/src/ATen/core`
2. `c10/core`
Fixes part of #148114
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165209
Approved by: https://github.com/FFFrog , https://github.com/albanD
2025-10-22 00:05:22 +00:00
Gufan Yin
e6ba4d0725
Back out "Do not decompose in functionalization/proxy tensor if autograd wouldn't have decomposed ( #164939 )" ( #165910 )
...
Summary:
Original commit changeset: d6d62d0c96dd
Original Phabricator Diff: D84468451 and D84613184
D84468451 caused CUDA OutOfMemoryError in model.
Test Plan:
D84468451 was found through bisect. Also double checked on recent trunk 9866939225248c2adc307be7a804b26db0b9b555: f815887517
With this diff that backs out D84468451 and D84613184 : f816114560
Differential Revision: D85025378
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165910
Approved by: https://github.com/clee2000
2025-10-21 16:36:38 +00:00
Yu, Guangye
0bff65503c
Move hardware_destructive_interference_size to c10/core/alignment.h ( #160067 )
...
# Motivation
Move `hardware_destructive_interference_size` to `c10/core/alignment.h`, which gives a chance to reuse it across different accelerators.
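What the shared constant enables, in a minimal sketch (not PyTorch code): padding per-thread slots to the destructive-interference size keeps two writers from false-sharing one cache line. The value 64 below is a stand-in assumption, typical on x86/ARM; the real constant now lives in `c10/core/alignment.h`.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kInterferenceSize = 64;  // assumed stand-in value

struct alignas(kInterferenceSize) PaddedCounter {
  long value = 0;  // hot field; each instance occupies its own cache line
};
```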
Pull Request resolved: https://github.com/pytorch/pytorch/pull/160067
Approved by: https://github.com/Skylion007 , https://github.com/EikanWang
2025-10-21 14:39:46 +00:00
Yuanyuan Chen
99c8640b5d
[1/N] Change C-style casts to static_cast or reinterpret_cast ( #165750 )
...
This series of changes converts C-style casts into their C++ alternatives.
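The mechanical shape of the series, with hypothetical examples (not the touched call sites): `(double)n` becomes `static_cast<double>(n)`, and pointer punning like `(char*)&n` becomes a `reinterpret_cast`. The named casts state intent and make the risky reinterpret casts greppable.

```cpp
#include <cassert>
#include <cstdint>

// checked value conversion, formerly (double)n
double to_double(int n) { return static_cast<double>(n); }

// representation-level reinterpretation, formerly (uintptr_t)p
std::uintptr_t to_address(const int* p) {
  return reinterpret_cast<std::uintptr_t>(p);
}
```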
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165750
Approved by: https://github.com/Skylion007
2025-10-20 23:27:13 +00:00
PyTorch MergeBot
ca7360e996
Revert "Move toString(ScalarType) and ScalarType ostream operator to headeronly ( #164405 )"
...
This reverts commit ca8bd5dbed .
Reverted https://github.com/pytorch/pytorch/pull/164405 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/164354#issuecomment-3423132083 ))
2025-10-20 17:48:08 +00:00
PyTorch MergeBot
69a4bfe8bb
Revert "Refactor out headeronly ArrayRef ( #164991 )"
...
This reverts commit 3806e9767b .
Reverted https://github.com/pytorch/pytorch/pull/164991 on behalf of https://github.com/clee2000 due to breaking internal tests D84961075 ([comment](https://github.com/pytorch/pytorch/pull/164991#issuecomment-3423058017 ))
2025-10-20 17:26:42 +00:00
PyTorch MergeBot
ab82456c16
Revert "[1/N] Change C-style casts to static_cast or reinterpret_cast ( #165750 )"
...
This reverts commit e1e8491b31 .
Reverted https://github.com/pytorch/pytorch/pull/165750 on behalf of https://github.com/pytorch-auto-revert due to Reverted automatically by pytorch's autorevert, to avoid this behaviour add the tag autorevert: disable ([comment](https://github.com/pytorch/pytorch/pull/165750#issuecomment-3422413890 ))
2025-10-20 14:51:58 +00:00
Yuanyuan Chen
e1e8491b31
[1/N] Change C-style casts to static_cast or reinterpret_cast ( #165750 )
...
This series of changes converts C-style casts into their C++ alternatives.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165750
Approved by: https://github.com/Skylion007
2025-10-20 04:36:19 +00:00
Yu, Guangye
1b121d636e
Fix AllocatorConfig parse roundup division bug ( #165304 )
...
* #165288
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165304
Approved by: https://github.com/albanD
ghstack dependencies: #165288 , #165289 , #165291 , #165298
2025-10-19 15:34:44 +00:00
Yu, Guangye
1ba808dd97
Refine CUDA BackendStaticInitializer for allocator select ( #165298 )
...
* #165288
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165298
Approved by: https://github.com/albanD
ghstack dependencies: #165288 , #165289 , #165291
2025-10-19 15:34:44 +00:00
Yu, Guangye
a1114beed2
Deprecate overlapped functions in CUDAAllocatorConfig ( #165289 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165289
Approved by: https://github.com/albanD
ghstack dependencies: #165288
2025-10-19 15:34:26 +00:00
Yu, Guangye
4888ed440e
Refine Allocator Config error message friendly ( #165288 )
...
* __->__ #165288
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165288
Approved by: https://github.com/albanD
2025-10-19 15:34:17 +00:00
Isalia20
ad67170c8b
[MPS] sparse matmuls ( #165232 )
...
Implements matmuls for sparse tensors. With this commit most of the core sparse operations should be implemented. Fixes:
https://github.com/pytorch/pytorch/issues/156540
https://github.com/pytorch/pytorch/issues/129842
Should be merged after:
https://github.com/pytorch/pytorch/pull/165102
To compare MPS and CPU, you can use this script:
```python
import torch
import time
import matplotlib.pyplot as plt
B, I, J, K = 8, 20000, 20000, 20000
num_iterations = 500
nnz_values = [10, 50, 100, 200, 500, 1000, 2000, 5000, 10000, 20000, 100000]
speedups = []
for nnz in nnz_values:
    indices = torch.stack([
        torch.randint(0, B, (nnz,)),
        torch.randint(0, I, (nnz,)),
        torch.randint(0, J, (nnz,)),
    ])
    values = torch.rand(nnz)
    sparse = torch.sparse_coo_tensor(indices, values, size=(B, I, J), device="mps").coalesce()
    dense = torch.randn(B, J, 200, device="mps")
    t1 = time.time()
    for _ in range(num_iterations):
        result = torch.bmm(sparse, dense)
    torch.mps.synchronize()
    t2 = time.time()
    mps_time = (t2 - t1) / num_iterations
    sparse_cpu = sparse.cpu()
    dense_cpu = dense.cpu()
    t1 = time.time()
    for _ in range(num_iterations):
        result_cpu = torch.bmm(sparse_cpu, dense_cpu)
    t2 = time.time()
    cpu_time = (t2 - t1) / num_iterations
    speedup = cpu_time / mps_time
    speedups.append(speedup)
    print(f"nnz={nnz}: MPS={mps_time:.6f}s, CPU={cpu_time:.6f}s, Speedup={speedup:.2f}x")
plt.figure(figsize=(10, 6))
plt.plot(nnz_values, speedups, marker='o', linewidth=2, markersize=8)
plt.xlabel('Number of Non-Zero Elements (nnz)', fontsize=12)
plt.ylabel('Speedup (CPU time / MPS time)', fontsize=12)
plt.title('MPS vs CPU Speedup for Sparse-Dense BMM', fontsize=14)
plt.grid(True, alpha=0.3)
plt.axhline(y=1, color='r', linestyle='--', alpha=0.5)
plt.xscale('log')
plt.tight_layout()
plt.show()
```
## Tested on M1 Pro
<img width="1000" height="600" alt="Figure_1" src="https://github.com/user-attachments/assets/4a2402ec-3dc4-402d-8196-a0426906ca3d" />
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165232
Approved by: https://github.com/malfet
2025-10-18 09:04:42 +00:00
Yuanyuan Chen
0f0b4bf029
[1/N] Remove unused header inclusion ( #165763 )
...
This PR removes unused header inclusion in C++ files.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165763
Approved by: https://github.com/Skylion007
2025-10-18 05:23:11 +00:00
Shivam Raikundalia
a25a649e70
[Mem Snapshot] Add Metadata Field ( #165490 )
...
Summary:
The implementation adds the ability to:
- Set custom metadata strings that will be attached to all subsequent allocations
- Clear or change the metadata at any point
- View the metadata in memory snapshots via `_dump_snapshot()`
Test Plan: Added a test in test_cuda.py and checked manually in a snapshot that the metadata was added.
Differential Revision: D84654933
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165490
Approved by: https://github.com/yushangdi
2025-10-17 23:46:02 +00:00
Jane Xu
3806e9767b
Refactor out headeronly ArrayRef ( #164991 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164991
Approved by: https://github.com/swolchok
2025-10-17 18:32:39 +00:00
Yu, Guangye
b44fb14906
Remove unused parameter when query extension attribute ( #165623 )
...
# Motivation
This code is no longer needed since SYCL compiler 2025.0. We are now using compiler 2025.2 (two tool uplifts later), so it can be safely removed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165623
Approved by: https://github.com/EikanWang
ghstack dependencies: #165622
2025-10-17 08:16:13 +00:00
Yu, Guangye
51348c0219
Give a friendly message for older Intel GPU ( #165622 )
...
# Motivation
Notify the user if the GPU is older than officially supported. This provides a friendly warning that the GPU may work, but the experience could be unstable.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165622
Approved by: https://github.com/EikanWang
2025-10-17 08:16:13 +00:00
PyTorch MergeBot
11e2084308
Revert "[Mem Snapshot] Add Metadata Field ( #165490 )"
...
This reverts commit 5b3ea75895 .
Reverted https://github.com/pytorch/pytorch/pull/165490 on behalf of https://github.com/pytorch-auto-revert due to Reverted automatically by pytorch's autorevert, to avoid this behaviour add the tag autorevert: disable ([comment](https://github.com/pytorch/pytorch/pull/165490#issuecomment-3413491091 ))
2025-10-17 02:01:53 +00:00
Shivam Raikundalia
5b3ea75895
[Mem Snapshot] Add Metadata Field ( #165490 )
...
Summary:
The implementation adds the ability to:
- Set custom metadata strings that will be attached to all subsequent allocations
- Clear or change the metadata at any point
- View the metadata in memory snapshots via `_dump_snapshot()`
Test Plan: Added a test in test_cuda.py and checked manually in a snapshot that the metadata was added.
Differential Revision: D84654933
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165490
Approved by: https://github.com/yushangdi
2025-10-16 22:54:27 +00:00
Yu, Guangye
219fb6aafc
Refactor CUDAAllocatorConfig using ConfigTokenizer ( #165281 )
...
* #165129
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165281
Approved by: https://github.com/albanD
ghstack dependencies: #165129 , #165131 , #165135 , #165136
2025-10-16 15:26:50 +00:00
Yu, Guangye
515b5ff539
Remove unused code in CUDAAllocatorConfig ( #165136 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165136
Approved by: https://github.com/Skylion007
ghstack dependencies: #165129 , #165131 , #165135
2025-10-16 15:26:50 +00:00
Yu, Guangye
608a6d4a26
Reuse AcceleratorAllocatorConfig in CUDAAllocatorConfig ( #165135 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165135
Approved by: https://github.com/Skylion007
ghstack dependencies: #165129 , #165131
2025-10-16 15:26:40 +00:00
Yu, Guangye
03e5dbb26e
Register CUDAAllocatorConfig to AcceleratorAllocatorConfig ( #165131 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165131
Approved by: https://github.com/Skylion007
ghstack dependencies: #165129
2025-10-16 15:26:28 +00:00
Yu, Guangye
7ee45f7503
Restore AcceleratorAllocatorConfig to avoid potential regression ( #165129 )
...
# Motivation
This PR aims to restore `AcceleratorAllocatorConfig` to avoid the potential regression mentioned in https://github.com/pytorch/pytorch/pull/160666#issue-3323270375
This code change will be reverted in the follow-up PR https://github.com/pytorch/pytorch/pull/165304
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165129
Approved by: https://github.com/albanD
2025-10-16 15:26:17 +00:00
Yu, Guangye
d0c32971b4
Refine XPU allocator message when OOM ( #165509 )
...
# Motivation
Provide more information and align with other backends to enhance the user experience.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165509
Approved by: https://github.com/EikanWang
ghstack dependencies: #165508
2025-10-16 05:47:49 +00:00
Yu, Guangye
66b75693ae
Reuse kLargeBuffer in XPUCachingAllocator ( #165508 )
...
# Motivation
Reuse the shared code.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165508
Approved by: https://github.com/EikanWang
2025-10-16 04:12:52 +00:00
Pearu Peterson
ca8bd5dbed
Move toString(ScalarType) and ScalarType ostream operator to headeronly ( #164405 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164405
Approved by: https://github.com/Skylion007 , https://github.com/janeyx99
ghstack dependencies: #164350 , #164354
2025-10-16 00:55:43 +00:00
Pearu Peterson
48064acf37
Move AT_FORALL_... macros and ScalarTypeToCPPTypeT to headeronly ( #164350 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164350
Approved by: https://github.com/janeyx99
2025-10-16 00:55:42 +00:00
Yuanyuan Chen
36871622f1
[2/N] Mark unused parameters in C++ code ( #165121 )
...
This is a follow-up of #164912, marking unused C++ parameters to improve code readability.
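The two idioms such a change typically applies, sketched with hypothetical signatures: name only what is used, and either mark the rest `[[maybe_unused]]` or leave it unnamed, so `-Wunused-parameter` stays quiet and the intent is visible.

```cpp
#include <cassert>

int scaled(int value, [[maybe_unused]] void* context) {
  return value * 2;
}

int bumped(int value, void* /*context*/) {  // unnamed-parameter variant
  return value + 1;
}
```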
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165121
Approved by: https://github.com/Skylion007
2025-10-15 03:04:39 +00:00
Lakshay Garg
496adf9f9c
Replace insert with std::rotate_copy for RingBuffer ( #165348 )
...
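A minimal sketch of the technique (not the actual PyTorch RingBuffer): entries live in a fixed array with a head index pointing at the oldest element, and one `std::rotate_copy` call copies them out oldest-to-newest, replacing an element-by-element insert loop.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

std::vector<int> snapshot(const std::vector<int>& ring, std::size_t head) {
  std::vector<int> out(ring.size());
  // Copies [head, end) first, then [begin, head): oldest-to-newest order.
  std::rotate_copy(ring.begin(),
                   ring.begin() + static_cast<std::ptrdiff_t>(head),
                   ring.end(), out.begin());
  return out;
}
```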
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165348
Approved by: https://github.com/eqy , https://github.com/Skylion007
2025-10-14 05:11:28 +00:00
Colin Peppler
37d57ac9cb
Use sym_eq in _check_rms_norm_inputs_symint ( #165112 )
...
Summary:
### Problem
ArrayRef's `equals()` does elementwise equality using the `==` operator. This can cause a DDE for unbacked symints, since the `==` operator calls `guard_bool`.
```
// SymInt.h
bool operator==(const SymInt& o) const {
return sym_eq(o).guard_bool(__FILE__, __LINE__);
}
```
### Solution
Adds `sym_equals()` to do elementwise equality for `SymIntArrayRef`. Use this instead of `equals()` for `SymIntArrayRef`.
Reviewed By: guangy10, pianpwk, muchulee8
Differential Revision: D84168401
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165112
Approved by: https://github.com/Skylion007
2025-10-14 00:06:24 +00:00
PyTorch MergeBot
955cd7060b
Revert "Update round size with 1 division behavior ( #162203 )"
...
This reverts commit 12d2ef557f .
Reverted https://github.com/pytorch/pytorch/pull/162203 on behalf of https://github.com/izaitsevfb due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/162203#issuecomment-3398622898 ))
2025-10-13 18:32:37 +00:00
Ma, Jing1
59ad8f1ac6
[XPU] Enhance XPUGeneratorImpl functionality to support XPUGraph ( #163332 )
...
As described in this [XPUGraph RFC](https://github.com/pytorch/pytorch/issues/162143), this PR enhances `XPUGeneratorImpl` to support XPUGraph.
In this PR, we add `XPUGeneratorState` and `PhiloxXpuState`, which let XPUGraph update the Philox state correctly during graph capture and replay.
XPUGraph PR submission plan:
- [ ] 1. Enhance XPUGenerator functionality. Add XPUGeneratorState and philoxState
- [ ] 2. Implement XPUGraph capture_begin/capture_end/instantiate functionality
- [ ] 3. Implement XPUGraph replay/debug_dump/reset functionality
- [ ] 4. Python APIs: is_current_stream_capturing/graph_pool_handle/graph
- [ ] 5. Python APIs: make_graphed_callables
Pull Request resolved: https://github.com/pytorch/pytorch/pull/163332
Approved by: https://github.com/gujinghui , https://github.com/EikanWang , https://github.com/albanD
2025-10-13 02:10:41 +00:00