pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-06 12:20:52 +01:00

Author	SHA1	Message	Date
Laith Sakka	39df901b2a	introduce definitely_contiguous and use it for reshape and tensor meta data computation. (#153432 ) When a tensor has unbacked symbols, it can be general enough to represent both contiguous and non-contiguous tensors; in that case we can't really evaluate is_contiguous. In many places in the code base we check is_contiguous to take a fast path, but the general path usually works for both contiguous and non-contiguous tensors, so in those cases we probably want to use the definitely_contiguous API. This is applied to reshape in this PR and also to tensor metadata computation: the metadata now has an attribute saying the tensor is contiguous only when it is always contiguous. We now store that attribute only if definitely_contiguous is true. Pull Request resolved: https://github.com/pytorch/pytorch/pull/153432 Approved by: https://github.com/bobrenjc93	2025-05-28 03:41:26 +00:00
Sidharth	54f1f29fed	[dynamo] dynamic gb_type -> static gb_type (#154435 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/154435 Approved by: https://github.com/williamwen42	2025-05-28 03:14:26 +00:00
ZhiweiYan-96	f12ce4e36b	[Intel GPU] convolution fusion at XPU backend (#154202 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/154202 Approved by: https://github.com/EikanWang, https://github.com/guangyey, https://github.com/etaf ghstack dependencies: #140365	2025-05-28 03:14:18 +00:00
FFFrog	c6fc11af76	Fix the Problems About Defining Static Variable in Inline Function (#147095 ) Refer to https://github.com/pytorch/pytorch/issues/125465 for more information - Remove unused header files - Move the inline function that defines the static variable to .cc Pull Request resolved: https://github.com/pytorch/pytorch/pull/147095 Approved by: https://github.com/cyyever, https://github.com/albanD	2025-05-28 02:47:16 +00:00
bobrenjc93	855eff8e8e	Don't CSE unbacked nodes (#154387 ) * #154440 Pull Request resolved: https://github.com/pytorch/pytorch/pull/154387 Approved by: https://github.com/TroyGarden ghstack dependencies: #154440	2025-05-28 02:21:56 +00:00
bobrenjc93	919a1a17e3	[ez] Replace misleading implementations with NYI (#154440 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/154440 Approved by: https://github.com/Skylion007, https://github.com/pianpwk	2025-05-28 02:21:56 +00:00
Bin Bao	a84d8c4a1c	[AOTI] Support multi-arch when using package_cpp_only (#154414 ) Summary: Add support of multi_arch_kernel_binary in the package_cpp_only mode. More specifically, generate specific cmake targets to compile .ptx to .fatbin and embed them in the final shared library or binary. Differential Revision: [D75452096](https://our.internmc.facebook.com/intern/diff/D75452096) Pull Request resolved: https://github.com/pytorch/pytorch/pull/154414 Approved by: https://github.com/angelayi ghstack dependencies: #154412, #154413	2025-05-28 01:20:38 +00:00
Bin Bao	cde82d25b7	[AOTI] Add a multi_arch_kernel_binary option (#154413 ) Summary: CUDA can support multi-arch with the fatbin format. Add this multi_arch_kernel_binary option, so the compiled model binary can run across different GPU archs. Differential Revision: [D75452094](https://our.internmc.facebook.com/intern/diff/D75452094) Pull Request resolved: https://github.com/pytorch/pytorch/pull/154413 Approved by: https://github.com/angelayi ghstack dependencies: #154412	2025-05-28 01:20:38 +00:00
Bin Bao	4d8f3d537a	[AOTI][refactor] Rename embed_cubin to embed_kernel_binary (#154412 ) Summary: Rename as it is not CUDA specific. Differential Revision: [D75452095](https://our.internmc.facebook.com/intern/diff/D75452095) Pull Request resolved: https://github.com/pytorch/pytorch/pull/154412 Approved by: https://github.com/angelayi	2025-05-28 01:20:28 +00:00
bobrenjc93	e79790e14b	[ez] add docblock for _sympy_from_args (#154376 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/154376 Approved by: https://github.com/Skylion007 ghstack dependencies: #154374, #154375	2025-05-27 23:43:13 +00:00
atalman	fe082c5ffe	Move inductor workflows focal (ubuntu 20.04) -> jammy (ubuntu 22.04) (#154153 ) Trying to fix: https://github.com/pytorch/pytorch/issues/154157 Pull Request resolved: https://github.com/pytorch/pytorch/pull/154153 Approved by: https://github.com/Skylion007, https://github.com/huydhn, https://github.com/nike4949, https://github.com/cyyever	2025-05-27 23:16:21 +00:00
iupaikov-amd	3f10c9d8af	Fixed an issue with XPU skip so the test_decompose_mem_bound_mm.py suite can be run correctly (#153245 ) Fixes #153239 Replaced the custom decorator with the common one, although a better way to skip the whole suite would be to add it to the skip list in run_test.py Pull Request resolved: https://github.com/pytorch/pytorch/pull/153245 Approved by: https://github.com/jeffdaily	2025-05-27 23:10:25 +00:00
atalman	4b39832412	[CI] Update torchbench pin (#154453 ) Related to https://github.com/pytorch/pytorch/issues/154446 Pins the torchbench repo to https://github.com/pytorch/benchmark/pull/2620 which pins opacus to version ``1.5.3`` Pull Request resolved: https://github.com/pytorch/pytorch/pull/154453 Approved by: https://github.com/wdvr, https://github.com/malfet	2025-05-27 23:08:42 +00:00
atalman	247ea229ba	Create issue template: Release highlight for proposed Feature (#154125 ) Authors: @anitakat @atalman This is related to: https://github.com/pytorch/pytorch/issues/152134 . Adding RFC template for feature submissions Pull Request resolved: https://github.com/pytorch/pytorch/pull/154125 Approved by: https://github.com/anitakat, https://github.com/ZainRizvi, https://github.com/albanD	2025-05-27 22:45:21 +00:00
anwang	53affa273b	[MTIA Aten Backend][1.3/n] Migrate remaining view ops, which all need explicit register in `native_functions.yaml` (#154337 ) See context in D75266206. This diff/PR migrates all the remaining view ops, which all need changes in `native_functions.yaml` and thus need to be exported to PR. Ops covered by this diff: - _reshape_alias - unfold internal: Also delete the entire aten_mtia_view_ops.cpp file, and update corresponding build config. Differential Revision: [D75385411](https://our.internmc.facebook.com/intern/diff/D75385411/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/154337 Approved by: https://github.com/nautsimon ghstack dependencies: #154336	2025-05-27 22:18:12 +00:00
Shangdi Yu	eaf355cb11	[BE] Clean up unused parameter input in AOTIModel (#154276 ) Summary: As title Test Plan: CI Differential Revision: D74691763 Pull Request resolved: https://github.com/pytorch/pytorch/pull/154276 Approved by: https://github.com/Skylion007	2025-05-27 22:17:32 +00:00
PyTorch MergeBot	241f8dc84d	Revert "Remove outdated CUDA 11 conditions (#154313 )" This reverts commit `3936e6141c`. Reverted https://github.com/pytorch/pytorch/pull/154313 on behalf of https://github.com/izaitsevfb due to breaks internal builds ([comment](https://github.com/pytorch/pytorch/pull/154313#issuecomment-2914230005))	2025-05-27 21:54:41 +00:00
Jerry Mannil	6be829535f	[ROCm] Improve vectorized elementwise kernel performance in MI300X (#153634 ) * Use non-temporal loads to improve the vectorized elementwise kernel performance on MI300 * Use thread_work_size of 8 or 16 for vectorized elementwise kernel Co-author: @amd-hhashemi Pull Request resolved: https://github.com/pytorch/pytorch/pull/153634 Approved by: https://github.com/jeffdaily	2025-05-27 20:49:32 +00:00
PyTorch MergeBot	555fc05868	Revert "[Inductor] Improve typing, and prepare for ABI-compatible AOTI C-shim dispatching (#154371 )" This reverts commit `6169ca0b65`. Reverted https://github.com/pytorch/pytorch/pull/154371 on behalf of https://github.com/benjaminglass1 due to Appears to have broken main ([comment](https://github.com/pytorch/pytorch/pull/154371#issuecomment-2913975736))	2025-05-27 20:39:09 +00:00
Guilherme Leobas	7359705232	Add CPython tests for unittest (#150788 ) Tests: * test_assertions.py Pull Request resolved: https://github.com/pytorch/pytorch/pull/150788 Approved by: https://github.com/williamwen42	2025-05-27 20:26:17 +00:00
Guilherme Leobas	12fc06d267	Add CPython complex tests (#152015 ) Tests: * test_complex.py Pull Request resolved: https://github.com/pytorch/pytorch/pull/152015 Approved by: https://github.com/williamwen42	2025-05-27 20:24:28 +00:00
Guilherme Leobas	3b218e56dc	Add CPython tests for iter/sort (#150797 ) Tests: * test_iter.py * test_sort.py Pull Request resolved: https://github.com/pytorch/pytorch/pull/150797 Approved by: https://github.com/williamwen42	2025-05-27 20:22:34 +00:00
bobrenjc93	4fd8a54a41	[ez] add docblock for is_accessor_node (#154375 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/154375 Approved by: https://github.com/Skylion007, https://github.com/pianpwk ghstack dependencies: #154374	2025-05-27 19:47:32 +00:00
tvukovic-amd	b367e5f6a6	[ROCm][Windows] Fix building torch 2.8 wheel with ROCm (added hipblasLt and rocblas directories) (#153144 ) Since rocblas.dll and hipblaslt.dll are copied to torch/lib, the rocblas and hipblaslt directories need to be stored there too (otherwise, after wheel installation, the search for files in rocblas/library and hipblaslt/library fails because those directories don't exist). This PR fixes this issue. Pull Request resolved: https://github.com/pytorch/pytorch/pull/153144 Approved by: https://github.com/jeffdaily Co-authored-by: Jeff Daily <jeff.daily@amd.com>	2025-05-27 19:40:28 +00:00
PyTorch MergeBot	fa6ca59079	Revert "Move inductor workflows focal (ubuntu 20.04) -> jammy (ubuntu 22.04) (#154153 )" This reverts commit `2bd95f3a1f`. Reverted https://github.com/pytorch/pytorch/pull/154153 on behalf of https://github.com/malfet due to Broke inductor tests, see `b8452e55bc/1` ([comment](https://github.com/pytorch/pytorch/pull/154153#issuecomment-2913738047))	2025-05-27 19:23:28 +00:00
Benjamin Glass	6169ca0b65	[Inductor] Improve typing, and prepare for ABI-compatible AOTI C-shim dispatching (#154371 ) Prepares for the next PR in the stack by tightening up typing on a `cpp_wrapper` interface that's only used in one (well-typed) place, as well as downstream effects of that change. In particular, this enabled: 1. removing a number of now clearly unnecessary asserts 2. adding a few more targeted asserts to validate the code's current assumptions 3. removing some unneeded control flow in several functions As far as I can tell, this PR should be functionally neutral. One argument was removed from a `cpp_wrapper` public API, but that argument was unused, and only had a single callsite. Pull Request resolved: https://github.com/pytorch/pytorch/pull/154371 Approved by: https://github.com/desertfire	2025-05-27 19:17:41 +00:00
Ryan Guo	75bbd4989c	[dynamo] Support using symint from dispatcher-style tensor subclass (#154130 ) Fixes #146932. Pull Request resolved: https://github.com/pytorch/pytorch/pull/154130 Approved by: https://github.com/laithsakka	2025-05-27 19:05:46 +00:00
PyTorch MergeBot	8c0f07f944	Revert "[ROCm] Improve vectorized elementwise kernel performance in MI300X (#153634 )" This reverts commit `0d4de7872a`. Reverted https://github.com/pytorch/pytorch/pull/153634 on behalf of https://github.com/malfet due to Broke inductor jobs, see `b8452e55bc/1` ([comment](https://github.com/pytorch/pytorch/pull/153634#issuecomment-2913619071))	2025-05-27 19:02:59 +00:00
Zizeng Meng	b8452e55bc	[Kineto x Insight] Update Kineto submodule (#154426 ) Summary: We add a new ActivityType::MTIA_INSIGHT in `20f652846f` Test Plan: CI Differential Revision: D75454945 Pull Request resolved: https://github.com/pytorch/pytorch/pull/154426 Approved by: https://github.com/Skylion007	2025-05-27 18:29:29 +00:00
Nikita Shulga	5075df6fee	Make torch importable if compiled without TensorPipe (#154382 ) By delaying the import/hiding it behind `torch.distributed.rpc.is_tensorpipe_avaiable()` check Fixes https://github.com/pytorch/pytorch/issues/154300 Pull Request resolved: https://github.com/pytorch/pytorch/pull/154382 Approved by: https://github.com/Skylion007 ghstack dependencies: #154325	2025-05-27 18:13:38 +00:00
Nikita Shulga	f472ea63bb	[BE] Fix typos in SyntaxError description (#154436 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/154436 Approved by: https://github.com/seemethere, https://github.com/wdvr, https://github.com/ZainRizvi	2025-05-27 18:08:58 +00:00
Cyrus Daruwala	cfbd99fdfd	[Pytorch] Add option to CPU Blas GEMM to avoid output downcast (#154012 ) Summary: Dot product for a single output element consists of 3 steps (both input vectors have elements of type scalar_t): 1. elementwise vector multiply (scalar_t x scalar_t -> opmath_t) 2. vector reduction to a scalar value (opmath_t -> opmath_t) 3. optional downcast if opmath_t != out_t The current blas kernel performs steps 1 and 2 correctly, but for step 3, it will always downcast to scalar_t even when opmath_t == out_t (and then do an upcast back to out_t), which results in precision loss. This diff fixes the precision loss in the BlasKernel Test Plan: Attention CI passes Differential Revision: D75023858 topic: not user facing Pull Request resolved: https://github.com/pytorch/pytorch/pull/154012 Approved by: https://github.com/Valentine233, https://github.com/aditew01, https://github.com/CaoE, https://github.com/drisspg	2025-05-27 17:43:21 +00:00
bobrenjc93	1ca082d9a1	[ez] Rewrite comment to be more friendly to non haskellers (#151421 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/151421 Approved by: https://github.com/aorenste	2025-05-27 17:32:34 +00:00
bobrenjc93	70fbd5e08c	[ez] Add docblock for resolve_unbacked_bindings (#154374 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/154374 Approved by: https://github.com/Skylion007, https://github.com/pianpwk	2025-05-27 17:05:49 +00:00
bobrenjc93	2560c1f3f0	add sticky cache pgo (#154418 ) It's a reland of https://github.com/pytorch/pytorch/pull/154394 that hit some mergebot bug Pull Request resolved: https://github.com/pytorch/pytorch/pull/154418 Approved by: https://github.com/malfet	2025-05-27 16:40:18 +00:00
Boyuan Feng	514409d032	update torchvision pin (#154255 ) Fixes #153985 Pull Request resolved: https://github.com/pytorch/pytorch/pull/154255 Approved by: https://github.com/desertfire	2025-05-27 16:15:25 +00:00
ZhiweiYan-96	0ddfd1ed43	[Intel GPU] Enable mkdnn._linear_pointwise at XPU backend (#140365 ) # Motivation This PR is intended to add post-op fusion support for Linear. The linear-pointwise fusion is expected to be used in graph mode, e.g. with torch.compile. The FusionUtils.cpp file defines utility APIs for generating primitive attributes. These APIs would also be used for conv-pointwise fusion, which is in #140372. # Validation ```bash python test/xpu/test_fusion.py ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/140365 Approved by: https://github.com/etaf, https://github.com/guangyey, https://github.com/EikanWang	2025-05-27 15:57:15 +00:00
Jerry Mannil	0d4de7872a	[ROCm] Improve vectorized elementwise kernel performance in MI300X (#153634 ) * Use non-temporal loads to improve the vectorized elementwise kernel performance on MI300 * Use thread_work_size of 8 or 16 for vectorized elementwise kernel Co-author: @amd-hhashemi Pull Request resolved: https://github.com/pytorch/pytorch/pull/153634 Approved by: https://github.com/jeffdaily	2025-05-27 15:38:43 +00:00
Xuehai Pan	7ae204c3b6	[BE][CI][Easy] Run `lintrunner` on generated `.pyi` stub files (#150732 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/150732 Approved by: https://github.com/malfet, https://github.com/cyyever, https://github.com/aorenste	2025-05-27 14:58:02 +00:00
Yuanhao Ji	0a7eef140b	Add `torch.Tensor._make_wrapper_subclass` to `torch/_C/__init__.pyi` (#154022 ) Fixes #153790 Pull Request resolved: https://github.com/pytorch/pytorch/pull/154022 Approved by: https://github.com/Skylion007	2025-05-27 14:10:00 +00:00
Nikita Shulga	d88699308f	[CI][MacOS] Move more dependencies to pypi (#154309 ) Hopefully last step before all Mac build/tests could be switched away from conda - Update cmake version from 3.22 to 3.25 as 3.22 from PyPI seems to be unusable with python-3.12 - Add `--plat-name macosx_11_0_arm64` to setup.py command - Remove `codesign` for cmake workaround (that was probably never really necessary) - Install `libpng` and `jpeg-turbo` when building torchbench and build torchaudio without OpenMP (to be fixed) Pull Request resolved: https://github.com/pytorch/pytorch/pull/154309 Approved by: https://github.com/Skylion007, https://github.com/cyyever	2025-05-27 13:49:40 +00:00
PyTorch MergeBot	11a51a11af	Revert "introduce definitely_contiguous and use it for reshape and tensor meta data computation. (#153432 )" This reverts commit `5c6d7caaaa`. Reverted https://github.com/pytorch/pytorch/pull/153432 on behalf of https://github.com/malfet due to Looks like it broke flex attention tests, see https://hud.pytorch.org/hud/pytorch/pytorch/main/1?per_page=50&name_filter=g6.4xlarge&mergeEphemeralLF=true ([comment](https://github.com/pytorch/pytorch/pull/153432#issuecomment-2912562570))	2025-05-27 13:42:34 +00:00
Emmanuel Menage	c52a002a22	Add getDeviceProperties api to torch mtia device (#153577 ) topic: not user facing Test Plan: Internal benchmark. Differential Revision: D74256550 Pull Request resolved: https://github.com/pytorch/pytorch/pull/153577 Approved by: https://github.com/nautsimon	2025-05-27 11:55:58 +00:00
atalman	2bd95f3a1f	Move inductor workflows focal (ubuntu 20.04) -> jammy (ubuntu 22.04) (#154153 ) Trying to fix: https://github.com/pytorch/pytorch/issues/154157 Pull Request resolved: https://github.com/pytorch/pytorch/pull/154153 Approved by: https://github.com/Skylion007, https://github.com/huydhn, https://github.com/nike4949, https://github.com/cyyever	2025-05-27 11:53:47 +00:00
Tom Ritchford	6f86c1ce1d	Add pyrefly.toml (#154144 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/154144 Approved by: https://github.com/Skylion007	2025-05-27 10:16:30 +00:00
Laith Sakka	5c6d7caaaa	introduce definitely_contiguous and use it for reshape and tensor meta data computation. (#153432 ) When a tensor has unbacked symbols, it can be general enough to represent both contiguous and non-contiguous tensors; in that case we can't really evaluate is_contiguous. In many places in the code base we check is_contiguous to take a fast path, but the general path usually works for both contiguous and non-contiguous tensors, so in those cases we probably want to use the definitely_contiguous API. This is applied to reshape in this PR and also to tensor metadata computation: the metadata now has an attribute saying the tensor is contiguous only when it is always contiguous. We now store that attribute only if definitely_contiguous is true. Pull Request resolved: https://github.com/pytorch/pytorch/pull/153432 Approved by: https://github.com/bobrenjc93	2025-05-27 08:54:31 +00:00
anwang	dec5ab8d98	[MTIA Aten Backend][1.2/n] Migrate as_strided to in-tree, and add unit tests (#154336 ) See context in PR https://github.com/pytorch/pytorch/pull/153670 This diff migrates as_strided to in-tree. I found it's not covered by `test_kernel_eager_ci`, so I am also adding unit tests. Differential Revision: [D75385404](https://our.internmc.facebook.com/intern/diff/D75385404/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/154336 Approved by: https://github.com/nautsimon	2025-05-27 06:32:38 +00:00
PyTorch MergeBot	ef6306e1c6	Revert "[executorch hash update] update the pinned executorch hash (#153436 )" This reverts commit `8d6139b8d8`. Reverted https://github.com/pytorch/pytorch/pull/153436 on behalf of https://github.com/malfet due to Broke ET sanity ([comment](https://github.com/pytorch/pytorch/pull/153436#issuecomment-2911206795))	2025-05-27 06:02:14 +00:00
Yu, Guangye	870133b2a0	Use get_device_context in aoti runtime for XPU directly (#154360 ) # Motivation Reuse [c10::xpu::get_device_context](`1bebe0424e/c10/xpu/XPUFunctions.h (L27)`) directly to reduce overhead, as it returns a cached `sycl::context` managed by PyTorch. Pull Request resolved: https://github.com/pytorch/pytorch/pull/154360 Approved by: https://github.com/EikanWang	2025-05-27 05:55:59 +00:00
Jing Xu	8d89cdceb6	fix a compilation issue when TORCH_XPU_ARCH_LIST is an empty string (#153604 ) When `XPU_ARCH_FLAGS` is an empty string, compilation will fail on `C10_STRINGIZE(XPU_ARCH_FLAGS)` in file `torch/csrc/xpu/Module.cpp` on Windows. This PR fixes this issue by setting `TORCH_XPU_ARCH_LIST` to `""` to avoid an empty string conversion in `C10_STRINGIZE()` when compiling without AOT. Pull Request resolved: https://github.com/pytorch/pytorch/pull/153604 Approved by: https://github.com/guangyey, https://github.com/EikanWang Co-authored-by: Aaron Gokaslan <aaronGokaslan@gmail.com> Co-authored-by: Yu, Guangye <106960996+guangyey@users.noreply.github.com>	2025-05-27 05:26:46 +00:00

88238 Commits