pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-06 00:20:18 +01:00

Author	SHA1	Message	Date
Laith Sakka	1aef88c72d	Avoid DDE in narrow with unbacked start (#166361 ) Slice already knows how to handle an unbacked start, so we do not need to offset start before calling slice. The only edge case is start < 0 with start + length == 0, where slice and narrow would deviate; in that case we pass dim_size instead of start + length. Pull Request resolved: https://github.com/pytorch/pytorch/pull/166361 Approved by: https://github.com/aorenste	2025-11-01 07:10:23 +00:00
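The edge case in this commit can be illustrated without PyTorch. What follows is a minimal sketch on plain Python lists, assuming that narrow accepts a negative start that wraps by the dimension size and that slice follows Python slicing semantics. `narrow_reference` and `narrow_via_slice` are hypothetical helpers for illustration, not PyTorch functions.

```python
def narrow_reference(seq, start, length):
    """Reference behaviour: wrap a negative start, then take `length` items."""
    size = len(seq)
    if start < 0:
        start += size
    return seq[start:start + length]


def narrow_via_slice(seq, start, length):
    """Forward `start` to slicing unchanged and only choose the end bound."""
    end = start + length
    if start < 0 and end == 0:
        # An end bound of 0 would mean "stop at the front" and produce an
        # empty result, so use the dimension size as the end bound instead.
        end = len(seq)
    return seq[start:end]


data = [10, 20, 30, 40, 50]

# Naive forwarding: the slice -2:0 is empty although narrow asks for 2 items.
naive = data[-2:-2 + 2]

# With the edge case handled, the last two items are returned.
fixed = narrow_via_slice(data, -2, 2)
```

In every other combination of start and length the two helpers agree, which is why only this one case needs special handling in the sketch.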
Yuanyuan Chen	f0745ddb11	Replace c10::call_once with static initialization (#166381 ) This PR replaces c10::call_once calls with static initialization when possible. C++11 semantics guarantee that static initialization is atomic. Static initialization also has a lower cost than using c10::call_once. Pull Request resolved: https://github.com/pytorch/pytorch/pull/166381 Approved by: https://github.com/malfet	2025-11-01 07:09:40 +00:00
Yuanyuan Chen	e2dc32f4ba	Replace decltype(auto) with auto (#166537 ) This PR replaces `decltype(auto)` with `auto` for C++ return type deduction and simplifies some templates. Pull Request resolved: https://github.com/pytorch/pytorch/pull/166537 Approved by: https://github.com/Skylion007	2025-11-01 00:30:23 +00:00
Kurt Mohler	1e3600b528	[MPS] Move `logaddexp/logaddexp2` to Metal and support complex (#166670 ) NOTE: Complex inputs are only supported in `logaddexp`. Since `logaddexp2` does not support complex inputs for CPU, it is not enabled for MPS in this PR either. Pull Request resolved: https://github.com/pytorch/pytorch/pull/166670 Approved by: https://github.com/malfet	2025-10-31 16:15:02 +00:00
Yu, Guangye	0ec0549823	Introduce a new API torch.xpu.get_per_process_memory_fraction (#165511 ) # Motivation Aligned with other backends, this PR introduces a new API torch.xpu.get_per_process_memory_fraction to let users retrieve the memory fraction allowed for a single process. Pull Request resolved: https://github.com/pytorch/pytorch/pull/165511 Approved by: https://github.com/EikanWang, https://github.com/ezyang ghstack dependencies: #165508, #165509, #165510	2025-10-30 19:30:09 +00:00
linhaifeng	369f2d6951	[3/N] fix typo in other folders (#166606 ) fix typo in other folders #166374 #166126 _typos.toml ```bash [files] extend-exclude = ["tools/linter/dictionary.txt"] [default.extend-words] nd = "nd" arange = "arange" Nd = "Nd" GLOBALs = "GLOBALs" hte = "hte" iy = "iy" PN = "PN" Dout = "Dout" optin = "optin" gam = "gam" PTD = "PTD" Sur = "Sur" nin = "nin" tme = "tme" inpt = "inpt" mis = "mis" Raison = "Raison" ouput = "ouput" nto = "nto" Onwer = "Onwer" callibrate = "callibrate" ser = "ser" Metdata = "Metdata" ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/166606 Approved by: https://github.com/ezyang	2025-10-30 10:30:40 +00:00
thenumberouscode	94eaeb9cb8	[Conv1d] Check overflow before we compute padding size. (#162363 ) Fixes https://github.com/pytorch/pytorch/issues/161877 also fixes https://github.com/pytorch/pytorch/issues/161875 Pull Request resolved: https://github.com/pytorch/pytorch/pull/162363 Approved by: https://github.com/jbschlosser	2025-10-29 03:27:20 +00:00
Yu, Guangye	753d9bd806	Introduce a new API torch.xpu.set_per_process_memory_fraction (#165510 ) # Motivation Aligned with other backends, this PR introduces a new API `torch.xpu.set_per_process_memory_fraction` to let users customize the memory allowed for a single process. Pull Request resolved: https://github.com/pytorch/pytorch/pull/165510 Approved by: https://github.com/EikanWang, https://github.com/ezyang ghstack dependencies: #165508, #165509	2025-10-29 03:24:52 +00:00
Nikita Shulga	d049ed2cb1	[BE] Fix metal compilation warnings (#166315 ) - Fixes `s/#pragma onces/#pragma once` typo. All methods in the headers must be inline, otherwise one gets a barrage of the following warnings ``` /Users/malfet/git/pytorch/pytorch/c10/metal/utils.h:337:7: warning: unused function 'conj<half __attribute__((ext_vector_type(2)))>' [-Wunused-function] half2 conj(half2 a) { ^ /Users/malfet/git/pytorch/pytorch/c10/metal/utils.h:342:8: warning: unused function 'conj<float __attribute__((ext_vector_type(2)))>' [-Wunused-function] float2 conj(float2 a) { ^ 2 warnings generated. ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/166315 Approved by: https://github.com/seemethere, https://github.com/atalman	2025-10-27 20:17:10 +00:00
Kurt Mohler	c9b49e506e	[MPS] Add `linalg.householder_product` for MPS (#166090 ) Fixes #166089 Pull Request resolved: https://github.com/pytorch/pytorch/pull/166090 Approved by: https://github.com/malfet	2025-10-24 21:13:56 +00:00
Eddie Yan	e64a814ae7	[CUDA] Add experimental green context support for SM carveout (#159104 ) Low-level PyTorch APIs should be usable/stable enough at this point but we might move the underlying driver API usage a bit from here... Built on top of @drisspg 's branch Pull Request resolved: https://github.com/pytorch/pytorch/pull/159104 Approved by: https://github.com/ngimel, https://github.com/malfet, https://github.com/kwen2501 Co-authored-by: drisspg <drisspguessous@gmail.com> Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>	2025-10-22 21:38:52 +00:00
Pearu Peterson	d01f15152c	Move toUnderlying to headeronly (#165694 ) As in the title. Required in upper PRs of this ghstack. Pull Request resolved: https://github.com/pytorch/pytorch/pull/165694 Approved by: https://github.com/janeyx99	2025-10-22 05:31:16 +00:00
Pearu Peterson	4fae6968b1	Move toString(ScalarType) and ScalarType ostream operator to headeronly (#164405 ) (#166018 ) This PR is created to replace the reverted PR https://github.com/pytorch/pytorch/pull/164405 Pull Request resolved: https://github.com/pytorch/pytorch/pull/166018 Approved by: https://github.com/janeyx99	2025-10-22 05:16:58 +00:00
Jeff Daily	2fde10d914	[ROCm] fix test_allocator_backend (#166035 ) Fixes #165872. Forward fix PR #165298. hipify was causing some symbols to be replaced. Pull Request resolved: https://github.com/pytorch/pytorch/pull/166035 Approved by: https://github.com/jeffdaily Co-authored-by: Jeff Daily <jeff.daily@amd.com>	2025-10-22 03:46:23 +00:00
Yu, Guangye	8904a5a7c9	Move allocation size config to AllocatorConfig for cross-allocator sharing (#159553 ) # Motivation Make CUDA and XPU share the same config and code. And allow the other backends to reuse them. Pull Request resolved: https://github.com/pytorch/pytorch/pull/159553 Approved by: https://github.com/albanD ghstack dependencies: #160067	2025-10-22 01:48:56 +00:00
Yuanyuan Chen	35153d0846	Simplify c10::guts::apply (#164566 ) There is only one call site of `c10::guts::apply` that can be replaced by `std::apply` except for ROCm. This PR therefore simplifies the implementation of `c10::guts::apply`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/164566 Approved by: https://github.com/Aidyn-A, https://github.com/albanD	2025-10-22 00:47:43 +00:00
KarhouTam	12aac12b8d	[Code Clean] Replace `std::runtime_error` with `TORCH_CHECK` (#165209 ) Including: 1. `aten/src/ATen/core` 2. `c10/core` Fixes part of #148114 Pull Request resolved: https://github.com/pytorch/pytorch/pull/165209 Approved by: https://github.com/FFFrog, https://github.com/albanD	2025-10-22 00:05:22 +00:00
Gufan Yin	e6ba4d0725	Back out "Do not decompose in functionalization/proxy tensor if autograd wouldn't have decomposed (#164939 )" (#165910 ) Summary: Original commit changeset: d6d62d0c96dd Original Phabricator Diff: D84468451 and D84613184 D84468451 caused CUDA OutOfMemoryError in model. Test Plan: D84468451 was found through bisect. Also double checked on recent trunk 9866939225248c2adc307be7a804b26db0b9b555: f815887517 With this diff that backs out D84468451 and D84613184 : f816114560 Differential Revision: D85025378 Pull Request resolved: https://github.com/pytorch/pytorch/pull/165910 Approved by: https://github.com/clee2000	2025-10-21 16:36:38 +00:00
Yu, Guangye	0bff65503c	Move hardware_destructive_interference_size to c10/core/alignment.h (#160067 ) # Motivation Move `hardware_destructive_interference_size` to `c10/core/alignment.h`, which gives a chance to reuse it across different accelerators. Pull Request resolved: https://github.com/pytorch/pytorch/pull/160067 Approved by: https://github.com/Skylion007, https://github.com/EikanWang	2025-10-21 14:39:46 +00:00
Yuanyuan Chen	99c8640b5d	[1/N] Change C-style casts to static_cast or reinterpret_cast (#165750 ) This series of changes tries to convert C-style casts into C++ alternatives. Pull Request resolved: https://github.com/pytorch/pytorch/pull/165750 Approved by: https://github.com/Skylion007	2025-10-20 23:27:13 +00:00
PyTorch MergeBot	ca7360e996	Revert "Move toString(ScalarType) and ScalarType ostream operator to headeronly (#164405 )" This reverts commit `ca8bd5dbed`. Reverted https://github.com/pytorch/pytorch/pull/164405 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/164354#issuecomment-3423132083))	2025-10-20 17:48:08 +00:00
PyTorch MergeBot	69a4bfe8bb	Revert "Refactor out headeronly ArrayRef (#164991 )" This reverts commit `3806e9767b`. Reverted https://github.com/pytorch/pytorch/pull/164991 on behalf of https://github.com/clee2000 due to breaking internal tests D84961075 ([comment](https://github.com/pytorch/pytorch/pull/164991#issuecomment-3423058017))	2025-10-20 17:26:42 +00:00
PyTorch MergeBot	ab82456c16	Revert "[1/N] Change C-style casts to static_cast or reinterpret_cast (#165750 )" This reverts commit `e1e8491b31`. Reverted https://github.com/pytorch/pytorch/pull/165750 on behalf of https://github.com/pytorch-auto-revert due to Reverted automatically by pytorch's autorevert, to avoid this behaviour add the tag autorevert: disable ([comment](https://github.com/pytorch/pytorch/pull/165750#issuecomment-3422413890))	2025-10-20 14:51:58 +00:00
Yuanyuan Chen	e1e8491b31	[1/N] Change C-style casts to static_cast or reinterpret_cast (#165750 ) This series of changes tries to convert C-style casts into C++ alternatives. Pull Request resolved: https://github.com/pytorch/pytorch/pull/165750 Approved by: https://github.com/Skylion007	2025-10-20 04:36:19 +00:00
Yu, Guangye	1b121d636e	Fix AllocatorConfig parse roundup division bug (#165304 ) * #165288 Pull Request resolved: https://github.com/pytorch/pytorch/pull/165304 Approved by: https://github.com/albanD ghstack dependencies: #165288, #165289, #165291, #165298	2025-10-19 15:34:44 +00:00
Yu, Guangye	1ba808dd97	Refine CUDA BackendStaticInitializer for allocator select (#165298 ) * #165288 Pull Request resolved: https://github.com/pytorch/pytorch/pull/165298 Approved by: https://github.com/albanD ghstack dependencies: #165288, #165289, #165291	2025-10-19 15:34:44 +00:00
Yu, Guangye	a1114beed2	Deprecate overlapped functions in CUDAAllocatorConfig (#165289 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/165289 Approved by: https://github.com/albanD ghstack dependencies: #165288	2025-10-19 15:34:26 +00:00
Yu, Guangye	4888ed440e	Refine Allocator Config error message friendly (#165288 ) * __->__ #165288 Pull Request resolved: https://github.com/pytorch/pytorch/pull/165288 Approved by: https://github.com/albanD	2025-10-19 15:34:17 +00:00
Isalia20	ad67170c8b	[MPS] sparse matmuls (#165232 ) Implements matmuls for sparse tensors. With this commit most of the core sparse operations should be implemented. Fixes: https://github.com/pytorch/pytorch/issues/156540 https://github.com/pytorch/pytorch/issues/129842 Should be merged after: https://github.com/pytorch/pytorch/pull/165102 To compare MPS and CPU, you can use this script:
```python
import torch
import time
import matplotlib.pyplot as plt

B, I, J, K = 8, 20000, 20000, 20000
num_iterations = 500
nnz_values = [10, 50, 100, 200, 500, 1000, 2000, 5000, 10000, 20000, 100000]
speedups = []

for nnz in nnz_values:
    indices = torch.stack([
        torch.randint(0, B, (nnz,)),
        torch.randint(0, I, (nnz,)),
        torch.randint(0, J, (nnz,)),
    ])
    values = torch.rand(nnz)

    sparse = torch.sparse_coo_tensor(indices, values, size=(B, I, J), device="mps").coalesce()
    dense = torch.randn(B, J, 200, device="mps")

    t1 = time.time()
    for _ in range(num_iterations):
        result = torch.bmm(sparse, dense)
    torch.mps.synchronize()
    t2 = time.time()
    mps_time = (t2 - t1) / num_iterations

    sparse_cpu = sparse.cpu()
    dense_cpu = dense.cpu()
    t1 = time.time()
    for _ in range(num_iterations):
        result_cpu = torch.bmm(sparse_cpu, dense_cpu)
    t2 = time.time()
    cpu_time = (t2 - t1) / num_iterations

    speedup = cpu_time / mps_time
    speedups.append(speedup)
    print(f"nnz={nnz}: MPS={mps_time:.6f}s, CPU={cpu_time:.6f}s, Speedup={speedup:.2f}x")

plt.figure(figsize=(10, 6))
plt.plot(nnz_values, speedups, marker='o', linewidth=2, markersize=8)
plt.xlabel('Number of Non-Zero Elements (nnz)', fontsize=12)
plt.ylabel('Speedup (CPU time / MPS time)', fontsize=12)
plt.title('MPS vs CPU Speedup for Sparse-Dense BMM', fontsize=14)
plt.grid(True, alpha=0.3)
plt.axhline(y=1, color='r', linestyle='--', alpha=0.5)
plt.xscale('log')
plt.tight_layout()
plt.show()
```
## Tested on M1 Pro <img width="1000" height="600" alt="Figure_1" src="https://github.com/user-attachments/assets/4a2402ec-3dc4-402d-8196-a0426906ca3d" /> Pull Request resolved: https://github.com/pytorch/pytorch/pull/165232 Approved by: https://github.com/malfet	2025-10-18 09:04:42 +00:00
Yuanyuan Chen	0f0b4bf029	[1/N] Remove unused header inclusion (#165763 ) This PR removes unused header inclusion in C++ files. Pull Request resolved: https://github.com/pytorch/pytorch/pull/165763 Approved by: https://github.com/Skylion007	2025-10-18 05:23:11 +00:00
Shivam Raikundalia	a25a649e70	[Mem Snapshot] Add Metadata Field (#165490 ) Summary: The implementation adds the ability to: set custom metadata strings that will be attached to all subsequent allocations; clear or change the metadata at any point; view the metadata in memory snapshots via _dump_snapshot(). Test Plan: Added a test in test_cuda.py and checked manually in the snapshot that the metadata was added. Differential Revision: D84654933 Pull Request resolved: https://github.com/pytorch/pytorch/pull/165490 Approved by: https://github.com/yushangdi	2025-10-17 23:46:02 +00:00
Jane Xu	3806e9767b	Refactor out headeronly ArrayRef (#164991 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/164991 Approved by: https://github.com/swolchok	2025-10-17 18:32:39 +00:00
Yu, Guangye	b44fb14906	Remove unused parameter when query extension attribute (#165623 ) # Motivation This code is no longer needed since SYCL compiler 2025.0. We are now using compiler 2025.2 (two tool uplifts later), so it can be safely removed. Pull Request resolved: https://github.com/pytorch/pytorch/pull/165623 Approved by: https://github.com/EikanWang ghstack dependencies: #165622	2025-10-17 08:16:13 +00:00
Yu, Guangye	51348c0219	Give a friendly message for older Intel GPU (#165622 ) # Motivation Notify the user if the GPU is older than officially supported. This provides a friendly warning that the GPU may work, but the experience could be unstable. Pull Request resolved: https://github.com/pytorch/pytorch/pull/165622 Approved by: https://github.com/EikanWang	2025-10-17 08:16:13 +00:00
PyTorch MergeBot	11e2084308	Revert "[Mem Snapshot] Add Metadata Field (#165490 )" This reverts commit `5b3ea75895`. Reverted https://github.com/pytorch/pytorch/pull/165490 on behalf of https://github.com/pytorch-auto-revert due to Reverted automatically by pytorch's autorevert, to avoid this behaviour add the tag autorevert: disable ([comment](https://github.com/pytorch/pytorch/pull/165490#issuecomment-3413491091))	2025-10-17 02:01:53 +00:00
Shivam Raikundalia	5b3ea75895	[Mem Snapshot] Add Metadata Field (#165490 ) Summary: The implementation adds the ability to: set custom metadata strings that will be attached to all subsequent allocations; clear or change the metadata at any point; view the metadata in memory snapshots via _dump_snapshot(). Test Plan: Added a test in test_cuda.py and checked manually in the snapshot that the metadata was added. Differential Revision: D84654933 Pull Request resolved: https://github.com/pytorch/pytorch/pull/165490 Approved by: https://github.com/yushangdi	2025-10-16 22:54:27 +00:00
Yu, Guangye	219fb6aafc	Refactor CUDAAllocatorConfig using ConfigTokenizer (#165281 ) * #165129 Pull Request resolved: https://github.com/pytorch/pytorch/pull/165281 Approved by: https://github.com/albanD ghstack dependencies: #165129, #165131, #165135, #165136	2025-10-16 15:26:50 +00:00
Yu, Guangye	515b5ff539	Remove unused code in CUDAAllocatorConfig (#165136 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/165136 Approved by: https://github.com/Skylion007 ghstack dependencies: #165129, #165131, #165135	2025-10-16 15:26:50 +00:00
Yu, Guangye	608a6d4a26	Reuse AcceleratorAllocatorConfig in CUDAAllocatorConfig (#165135 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/165135 Approved by: https://github.com/Skylion007 ghstack dependencies: #165129, #165131	2025-10-16 15:26:40 +00:00
Yu, Guangye	03e5dbb26e	Register CUDAAllocatorConfig to AcceleratorAllocatorConfig (#165131 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/165131 Approved by: https://github.com/Skylion007 ghstack dependencies: #165129	2025-10-16 15:26:28 +00:00
Yu, Guangye	7ee45f7503	Restore AcceleratorAllocatorConfig to avoid potential regression (#165129 ) # Motivation This PR aims to restore `AcceleratorAllocatorConfig` to avoid the potential regression mentioned in https://github.com/pytorch/pytorch/pull/160666#issue-3323270375 These code changes will be reverted in the follow-up PR https://github.com/pytorch/pytorch/pull/165304 Pull Request resolved: https://github.com/pytorch/pytorch/pull/165129 Approved by: https://github.com/albanD	2025-10-16 15:26:17 +00:00
Yu, Guangye	d0c32971b4	Refine XPU allocator message when OOM (#165509 ) # Motivation Provide more information and align with other backends to enhance the user experience. Pull Request resolved: https://github.com/pytorch/pytorch/pull/165509 Approved by: https://github.com/EikanWang ghstack dependencies: #165508	2025-10-16 05:47:49 +00:00
Yu, Guangye	66b75693ae	Reuse kLargeBuffer in XPUCachingAllocator (#165508 ) # Motivation Reuse the shared code. Pull Request resolved: https://github.com/pytorch/pytorch/pull/165508 Approved by: https://github.com/EikanWang	2025-10-16 04:12:52 +00:00
Pearu Peterson	ca8bd5dbed	Move toString(ScalarType) and ScalarType ostream operator to headeronly (#164405 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/164405 Approved by: https://github.com/Skylion007, https://github.com/janeyx99 ghstack dependencies: #164350, #164354	2025-10-16 00:55:43 +00:00
Pearu Peterson	48064acf37	Move AT_FORALL_... macros and ScalarTypeToCPPTypeT to headeronly (#164350 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/164350 Approved by: https://github.com/janeyx99	2025-10-16 00:55:42 +00:00
Yuanyuan Chen	36871622f1	[2/N] Mark unused parameters in C++ code (#165121 ) This is follow-up of #164912 to mark unused C++ parameters to improve code readability. Pull Request resolved: https://github.com/pytorch/pytorch/pull/165121 Approved by: https://github.com/Skylion007	2025-10-15 03:04:39 +00:00
Lakshay Garg	496adf9f9c	Replace insert with std::rotate_copy for RingBuffer (#165348 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/165348 Approved by: https://github.com/eqy, https://github.com/Skylion007	2025-10-14 05:11:28 +00:00
Colin Peppler	37d57ac9cb	Use sym_eq in _check_rms_norm_inputs_symint (#165112 ) Summary: ### Problem ArrayRef's `equals()` does elementwise equality using the `==` operator. This can cause a DDE for unbacked symints since the `==` operator calls `guard_bool`. ``` // SymInt.h bool operator==(const SymInt& o) const { return sym_eq(o).guard_bool(__FILE__, __LINE__); } ``` ### Solution Adds `sym_equals()` to do elementwise equality for `SymIntArrayRef`. Use this instead of `equals()` for `SymIntArrayRef`. Reviewed By: guangy10, pianpwk, muchulee8 Differential Revision: D84168401 Pull Request resolved: https://github.com/pytorch/pytorch/pull/165112 Approved by: https://github.com/Skylion007	2025-10-14 00:06:24 +00:00
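The difference between a guarding comparison and a deferred one can be sketched with a toy model. This is an illustration only, not PyTorch's implementation: `UnbackedSym`, `equals`, and `sym_equals` below are hypothetical Python stand-ins, and representing the deferred result as a string is an assumption made to keep the example small.

```python
class UnbackedSym:
    """Toy symbolic int with no concrete value (hypothetical stand-in)."""

    def __init__(self, name):
        self.name = name

    def sym_eq(self, other):
        # Build a deferred expression instead of deciding the answer now.
        rhs = other.name if isinstance(other, UnbackedSym) else repr(other)
        return f"Eq({self.name}, {rhs})"

    def __eq__(self, other):
        # Like the quoted operator==: forcing a bool needs a concrete value,
        # which an unbacked symbol does not have.
        raise RuntimeError(f"data-dependent guard on {self.sym_eq(other)}")


def sym_eq(a, b):
    """Compare two entries without ever forcing a symbolic one to a bool."""
    if isinstance(a, UnbackedSym):
        return a.sym_eq(b)
    if isinstance(b, UnbackedSym):
        return b.sym_eq(a)
    return a == b  # both plain ints, safe to decide immediately


def equals(xs, ys):
    """Guarding elementwise equality: fails on unbacked entries."""
    return len(xs) == len(ys) and all(x == y for x, y in zip(xs, ys))


def sym_equals(xs, ys):
    """Elementwise equality returning False, True, or a deferred condition."""
    if len(xs) != len(ys):
        return False
    terms = [sym_eq(x, y) for x, y in zip(xs, ys)]
    if any(t is False for t in terms):
        return False
    deferred = [t for t in terms if t is not True]
    return " & ".join(deferred) if deferred else True


u0 = UnbackedSym("u0")
deferred_result = sym_equals([u0, 4], [8, 4])
```

Calling `equals([u0, 4], [8, 4])` in this toy raises at the first symbolic comparison, while `sym_equals` returns a condition that can be checked later.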
PyTorch MergeBot	955cd7060b	Revert "Update round size with 1 division behavior (#162203 )" This reverts commit `12d2ef557f`. Reverted https://github.com/pytorch/pytorch/pull/162203 on behalf of https://github.com/izaitsevfb due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/162203#issuecomment-3398622898))	2025-10-13 18:32:37 +00:00
Ma, Jing1	59ad8f1ac6	[XPU] Enhance XPUGeneratorImpl functionality to support XPUGraph (#163332 ) As described in the [XPUGraph RFC](https://github.com/pytorch/pytorch/issues/162143), this PR enhances `XPUGeneratorImpl` to support XPUGraph. In this PR, we add `XPUGeneratorState` and `PhiloxXpuState`, which let XPUGraph update the philox state correctly during graph capture and replay. XPUGraph PR submission plan: - [ ] 1, Enhance XPUGenerator functionality. Add XPUGeneratorState and philoxState - [ ] 2, implement XPUGraph capture_begin/capture_end/instantiate functionality - [ ] 3, implement XPUGraph replay/debug_dump/reset functionality - [ ] 4, python APIs: is_current_stream_capturing/graph_pool_handle/graph - [ ] 5, python APIs: make_graphed_callables Pull Request resolved: https://github.com/pytorch/pytorch/pull/163332 Approved by: https://github.com/gujinghui, https://github.com/EikanWang, https://github.com/albanD	2025-10-13 02:10:41 +00:00

3174 Commits