Yuanyuan Chen
fbe0d20a17
[2/N] More ruff SIM fixes ( #165031 )
...
This is a follow-up to #164695, applying ruff SIM rules to more files. Most changes simplify `dict.get(key, None)` to `dict.get(key)`, since `None` is already the default value.
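For illustration, the pattern in question (a hypothetical snippet, not taken from the PR diff; the redundant-default case is ruff rule SIM910):
```
config = {"device": "cuda"}

# Before: the explicit None default is redundant (flagged by SIM910)
device = config.get("device", None)

# After: dict.get already returns None for missing keys
device = config.get("device")
```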
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165031
Approved by: https://github.com/mlazos
2025-10-14 14:22:54 +00:00
can-gaa-hou
39161e73fc
[Fix] missing lambda in torch._check ( #165043 )
...
Fixes more missing lambdas in `torch._check` calls in the source code. Inspired by #164225.
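For context, `torch._check` takes a callable that lazily produces the error message; a minimal sketch of the fix pattern (illustrative, not from the PR):
```
import torch

def validate(x: torch.Tensor) -> None:
    # Before (buggy): a plain string is not a callable, so the message
    # argument is malformed:
    #   torch._check(x.dim() == 2, f"expected 2d tensor, got {x.dim()}d")

    # After: the lambda defers message construction until the check fails
    torch._check(x.dim() == 2, lambda: f"expected 2d tensor, got {x.dim()}d")
```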
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165043
Approved by: https://github.com/FFFrog , https://github.com/Skylion007
2025-10-10 17:11:55 +00:00
PyTorch MergeBot
b8be796a57
Revert "[2/N] More ruff SIM fixes ( #165031 )"
...
This reverts commit 38095fbd13 .
Reverted https://github.com/pytorch/pytorch/pull/165031 on behalf of https://github.com/albanD due to One of the changed lines started to fail on trunk ([comment](https://github.com/pytorch/pytorch/pull/165031#issuecomment-3390190870 ))
2025-10-10 13:42:14 +00:00
Yuanyuan Chen
38095fbd13
[2/N] More ruff SIM fixes ( #165031 )
...
This is a follow-up to #164695, applying ruff SIM rules to more files. Most changes simplify `dict.get(key, None)` to `dict.get(key)`, since `None` is already the default value.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165031
Approved by: https://github.com/mlazos
2025-10-10 05:37:46 +00:00
Laith Sakka
7158aa22e8
remove more ( #164753 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164753
Approved by: https://github.com/aorenste , https://github.com/mlazos
ghstack dependencies: #164664 , #164665 , #164667 , #164668
2025-10-08 14:23:38 +00:00
Colin Peppler
2855a045b3
Use sym_eq and sym_and on symbolic shapes in common_meta_baddbmm_bmm ( #164781 )
...
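A sketch of the pattern, assuming the `sym_eq`/`sym_and` helpers from `torch.fx.experimental.symbolic_shapes` (simplified, not the actual diff):
```
import torch
from torch.fx.experimental.symbolic_shapes import sym_and, sym_eq

def check_bmm_compatible(batch1: torch.Tensor, batch2: torch.Tensor) -> None:
    # Plain `==` / `and` would resolve each guard separately, which can
    # fail on unbacked symbolic sizes; sym_eq/sym_and instead build one
    # symbolic predicate that is checked as a whole.
    torch._check(
        sym_and(
            sym_eq(batch1.size(0), batch2.size(0)),  # batch dims match
            sym_eq(batch1.size(2), batch2.size(1)),  # contraction dims match
        ),
        lambda: "mismatched batch or contraction dimensions for bmm",
    )
```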
Differential Revision: D84005053
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164781
Approved by: https://github.com/Skylion007
2025-10-07 18:25:00 +00:00
PyTorch MergeBot
5d7360bb03
Revert "Enable all SIM rules except disabled ones ( #164645 )"
...
This reverts commit 321e602692 .
Reverted https://github.com/pytorch/pytorch/pull/164645 on behalf of https://github.com/izaitsevfb due to causes lint failures ([comment](https://github.com/pytorch/pytorch/pull/164645#issuecomment-3369274351 ))
2025-10-05 19:32:21 +00:00
Yuanyuan Chen
321e602692
Enable all SIM rules except disabled ones ( #164645 )
...
`SIM` rules simplify boolean expressions and enhance code readability.
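An example of the kind of rewrite the SIM family enforces (illustrative; this particular case is SIM103):
```
# Before (flagged by SIM103): needless branch to return a bool
def is_positive(x: int) -> bool:
    if x > 0:
        return True
    else:
        return False

# After: return the condition directly
def is_positive_simplified(x: int) -> bool:
    return x > 0
```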
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164645
Approved by: https://github.com/ezyang
2025-10-05 07:38:25 +00:00
Maggie Moss
1051c1de5c
Add pyrefly suppressions 2/n ( #164513 )
...
Adds suppressions so that pyrefly will typecheck clean: https://github.com/pytorch/pytorch/issues/163283
Test plan:
dmypy restart && python3 scripts/lintrunner.py -a
pyrefly check
---
step 1: uncomment lines in the `pyrefly.toml` file
before: https://gist.github.com/maggiemoss/911b4d0bc88bf8cf3ab91f67184e9d46
after:
```
INFO Checking project configured at `/Users/maggiemoss/python_projects/pytorch/pyrefly.toml`
INFO 0 errors (1,152 ignored)
```
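The suppressions themselves are inline comments of roughly this shape (a sketch; the specific error code shown here is an assumption):
```
def double(x: int) -> int:
    return x * 2

# pyrefly: ignore  # bad-argument-type
result = double("2")  # hypothetical error the suppression silences
```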
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164513
Approved by: https://github.com/oulgen
2025-10-03 02:46:13 +00:00
Yuanyuan Chen
315ffdc1e4
[4/N] Apply ruff UP035 rule to python code ( #164206 )
...
Follows #164104
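UP035 flags imports from deprecated locations; an illustrative fix:
```
# Before (flagged by UP035): importing Callable from typing is deprecated
# from typing import Callable

# After: import from the modern location
from collections.abc import Callable

def apply(fn: Callable[[int], int], x: int) -> int:
    return fn(x)
```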
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164206
Approved by: https://github.com/albanD
2025-10-01 19:05:53 +00:00
Yuanyuan Chen
cc8b14d09a
[2/N] Simplify "in" operation for containers of a single item ( #164323 )
...
These issues are detected by ruff [FURB171](https://docs.astral.sh/ruff/rules/single-item-membership-test/#single-item-membership-test-furb171 ).
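An illustrative example of the rewrite:
```
name = "bias"

# Before (flagged by FURB171): membership test against a single-item container
if name in ("bias",):
    print("found")

# After: plain equality is clearer
if name == "bias":
    print("found")
```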
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164323
Approved by: https://github.com/justinchuby , https://github.com/Skylion007
2025-10-01 05:39:11 +00:00
ankushwahaRH
7f29c47a4f
Fix cdist export compute mode validation ( #161724 )
...
Fixes #161089. Added '0' as an acceptable value for compute mode in _meta_registrations.py. Also added a test case in the test_export.py file.
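A sketch of the corrected validation (simplified; the real check lives in _meta_registrations.py):
```
import torch

def check_cdist_compute_mode(compute_mode) -> None:
    # compute_mode is an optional int on the aten op; 0 was previously
    # rejected even though it is a valid value.
    torch._check(
        compute_mode is None or compute_mode in (0, 1, 2),
        lambda: f"possible modes: None, 0, 1, 2, but was: {compute_mode}",
    )
```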
Pull Request resolved: https://github.com/pytorch/pytorch/pull/161724
Approved by: https://github.com/albanD , https://github.com/angelayi
2025-09-30 12:23:20 +00:00
Yavuz Yetim
7afcb030d8
Back out "Revert D81959389" ( #163905 )
...
Summary:
Original commit changeset: 06888d7ebff0
Original Phabricator Diff: D82932788
Restricted the test to SM90 for scaled_grouped_mm
Test Plan: TBD (will share the linux CI results)
Differential Revision: D83283991
Pull Request resolved: https://github.com/pytorch/pytorch/pull/163905
Approved by: https://github.com/angelayi
2025-09-30 07:05:13 +00:00
can-gaa-hou
eb4361a801
[Fix] Adding missing f prefixes to formatted strings [1/N] ( #164065 )
...
As stated in the title; a minimal example follows the stacked PR list below.
* #164068
* #164067
* #164066
* __->__ #164065
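A minimal example of the bug class (illustrative):
```
dim = 3

# Before: missing f prefix, so the braces are never interpolated
msg = "unsupported dim {dim}"    # -> "unsupported dim {dim}"

# After: the f prefix interpolates the local variable
msg = f"unsupported dim {dim}"   # -> "unsupported dim 3"
```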
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164065
Approved by: https://github.com/Skylion007
2025-09-29 04:53:00 +00:00
thenumberouscode
c106ee8515
[FakeTensor] Supplement the relevant logic for converting conv1d to conv2d in meta_conv ( #160408 )
...
## Fixes https://github.com/pytorch/pytorch/issues/159462 (also fixes #163569, #163604)
## summary
The issue is caused by the wrong stride of conv1d's result as computed by meta_conv:
4d5b3f2d5a/torch/_meta_registrations.py (L2453-L2471)
and that wrong stride is then used to codegen the size assert in Inductor:
4d5b3f2d5a/torch/_inductor/ir.py (L6152-L6163)
## reason
So why is the computed stride wrong in the meta_conv function? Because the corresponding backend converts conv1d to conv2d, changing the input tensor's size and memory_format (channels last), but meta_conv does not perform this transformation, so a mismatch happens.
4d5b3f2d5a/aten/src/ATen/native/Convolution.cpp (L1502-L1510)
The fix is to add the corresponding logic to meta_conv.
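A minimal illustration of the promotion that meta_conv must now mirror (not the actual code):
```
import torch

# Backends promote (N, C, L) to (N, C, 1, L) for conv1d, and the 4d
# intermediate may be channels_last, which changes the result's strides.
x = torch.randn(2, 8, 16)                     # (N, C, L)
x4 = x.unsqueeze(2)                           # (N, C, 1, L), as the backend sees it
x4 = x4.to(memory_format=torch.channels_last)
print(x4.squeeze(2).stride())                 # strides after the round-trip
```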
Pull Request resolved: https://github.com/pytorch/pytorch/pull/160408
Approved by: https://github.com/eellison , https://github.com/jansel , https://github.com/mlazos
2025-09-26 15:45:02 +00:00
Yidi Wu
21a41edd4f
Add fake_impl for _native_multi_head_attention ( #163700 )
...
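For context, fake/meta impls for aten ops follow this general pattern inside torch/_meta_registrations.py, where `register_meta` and `aten` are in scope (a simplified sketch, not the actual kernel; the output shapes here are assumptions):
```
# Sketch only: the real registration derives output shapes from the
# op's schema (attention output plus optional attention weights).
@register_meta(aten._native_multi_head_attention)
def meta_native_mha(query, key, value, embed_dim, num_head, *args, **kwargs):
    B, T, _ = query.shape
    out = query.new_empty(B, T, embed_dim)
    attn_weights = query.new_empty(B, num_head, T, T)
    return out, attn_weights
```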
Test Plan: See added test in test_export.py
Differential Revision: D83099187
Pull Request resolved: https://github.com/pytorch/pytorch/pull/163700
Approved by: https://github.com/angelayi
2025-09-25 19:01:27 +00:00
Jason Ansel
9c4d9f940b
[inductor] Support out_dtype arg to matmul ( #163393 )
...
Fixes #163275
Pull Request resolved: https://github.com/pytorch/pytorch/pull/163393
Approved by: https://github.com/eellison , https://github.com/coconutruben
ghstack dependencies: #163386 , #163398 , #163387 , #163414 , #163415 , #163419 , #163434
2025-09-23 15:37:38 +00:00
PyTorch MergeBot
aff76c046d
Revert "Add fake_impl for _native_multi_head_attention ( #163167 )"
...
This reverts commit 27164b6788 .
Reverted https://github.com/pytorch/pytorch/pull/163167 on behalf of https://github.com/malfet due to This broke in inductor-cpu-test, see 1a42656d6c/1 ([comment](https://github.com/pytorch/pytorch/pull/163167#issuecomment-3324302026 ))
2025-09-23 14:36:45 +00:00
Yidi Wu
27164b6788
Add fake_impl for _native_multi_head_attention ( #163167 )
...
Test Plan:
See added test in test_export.py
Rollback Plan:
Reviewed By: henryoier
Differential Revision: D77747446
Pull Request resolved: https://github.com/pytorch/pytorch/pull/163167
Approved by: https://github.com/angelayi
2025-09-23 04:02:20 +00:00
Aart J.C. Bik
9b5ec0ff7c
Use computed buffer sizes of torch for cusparseLt metadata ( #163125 )
...
Making sure buffer allocation matches what is computed by cusparseLt compression
Pull Request resolved: https://github.com/pytorch/pytorch/pull/163125
Approved by: https://github.com/jcaip
2025-09-19 22:12:40 +00:00
Eddie Yan
9b7a8c4d05
[cuDNN][SDPA][submodule] Roll-back cuDNN frontend upgrade, update Meta registration ( #163104 )
...
For https://github.com/pytorch/torchtitan/issues/1713
Also note that we will need to roll back the cuDNN frontend upgrade in 2.9, as it currently introduces a segmentation fault by assuming tensors have their strides and sizes populated at graph creation time: 1a7b4b78db/include/cudnn_frontend/node/sdpa_support_surface.h (L447)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/163104
Approved by: https://github.com/drisspg
2025-09-17 15:48:54 +00:00
Daniel Vega-Myhre
872ed60679
[mxfp8 torch._scaled_grouped_mm] fix meta registration for 3d tensor ( #162765 )
...
The meta registration checks for torch._scaled_grouped_mm have a bug for 3d "B" tensors: the scale for such a tensor should be 2d with shape (G, blocked_K * blocked_N), but the check currently enforces a 3d shape of (G, blocked_K, blocked_N).
See Blas.cpp for correct validation logic [here](8e217a9f6d/aten/src/ATen/native/cuda/Blas.cpp (L1622) ).
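A sketch of the corrected check (simplified; the variable names are assumptions):
```
import torch

def check_mxfp8_3d_b_scale(scale_b: torch.Tensor, G: int,
                           blocked_K: int, blocked_N: int) -> None:
    # Per the fix: for a 3d "B" tensor the blocked scale must be 2d with
    # shape (G, blocked_K * blocked_N), not 3d (G, blocked_K, blocked_N).
    torch._check(
        tuple(scale_b.shape) == (G, blocked_K * blocked_N),
        lambda: f"expected scale shape {(G, blocked_K * blocked_N)}, "
                f"got {tuple(scale_b.shape)}",
    )
```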
Pull Request resolved: https://github.com/pytorch/pytorch/pull/162765
Approved by: https://github.com/ngimel
2025-09-12 03:51:52 +00:00
Pian Pawakapan
ac72f81c12
[dynamic shapes] unbacked-safe should_swap ( #160473 )
...
Fixes #ISSUE_NUMBER
Pull Request resolved: https://github.com/pytorch/pytorch/pull/160473
Approved by: https://github.com/laithsakka
2025-09-11 18:51:25 +00:00
Daniel Vega-Myhre
b6d0a9ea90
MXFP8 grouped GEMM support for torch._scaled_grouped_mm + submodule bump ( #162209 )
...
## Summary
- We just landed 2d-2d support for mxfp8 grouped gemm in FBGEMM: https://github.com/pytorch/FBGEMM/pull/4816
- This is needed for backward pass of mxfp8 MoE training with grouped gemms
- Changes:
- Add dispatching + input validation for mxfp8 grouped gemm in `torch._scaled_grouped_mm`
- Add meta registration input validation for mxfp8 grouped gemm, for composability with compile
- Add unit tests exercising torch._scaled_grouped_mm with mxfp8 inputs
- Bump FBGEMM third party submodule to include:
- https://github.com/pytorch/FBGEMM/pull/4816
- https://github.com/pytorch/FBGEMM/pull/4820
- https://github.com/pytorch/FBGEMM/pull/4821
- https://github.com/pytorch/FBGEMM/pull/4823
#### How fbgemm dependency was bumped
Documenting this since I haven't found it documented elsewhere:
- `cd ~/pytorch/third_party/fbgemm`
- `git fetch`
- `git checkout <hash>`
- `cd ~/pytorch`
- `git add third_party/fbgemm`
## Test plan
#### Test build
```
USE_FBGEMM_GENAI=1 python -m pip install --no-build-isolation -v -e .
...
Successfully installed torch-2.9.0a0+gitf5070f3
```
[full build log](https://www.internalfb.com/phabricator/paste/view/P1933787581 )
#### Unit tests
```
pytest test/test_matmul_cuda.py -k test_mxfp8_scaled_grouped_mm_
...
test/test_matmul_cuda.py ......... [100%]
============================================================== 9 passed, 1668 deselected in 5.34s ===============================================================
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/162209
Approved by: https://github.com/ngimel
2025-09-06 15:25:30 +00:00
Laith Sakka
fbf3d2027d
use sym_or instead of any to avoid dde in calc_conv_nd_return_shape ( #162084 )
...
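A sketch of the pattern, assuming `sym_or` from `torch.fx.experimental.symbolic_shapes` (illustrative, not the actual diff):
```
from torch.fx.experimental.symbolic_shapes import sym_or

def any_dim_zero(shape) -> bool:
    # Before (hypothetical): any() forces each symbolic bool to a concrete
    # value, which can raise a data-dependent error (DDE) on unbacked sizes:
    #   return any(s == 0 for s in shape)
    # After: sym_or builds a single symbolic expression instead.
    return sym_or(*[s == 0 for s in shape])
```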
Pull Request resolved: https://github.com/pytorch/pytorch/pull/162084
Approved by: https://github.com/aorenste
Co-authored-by: Aaron Orenstein <aorenste@fb.com>
2025-09-04 01:20:22 +00:00
angelayi
e34b6a0103
Add meta for add.Scalar ( #161332 )
...
Fixes https://github.com/pytorch/pytorch/issues/161076
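The shape of such a registration, in the style of torch/_meta_registrations.py where `register_meta` and `aten` are in scope (a simplified sketch; the real impl also handles scalar type promotion):
```
@register_meta(aten.add.Scalar)
def meta_add_scalar(self, other, alpha=1):
    # Meta kernels only produce shape/dtype/strides; no arithmetic happens.
    # (Sketch: the actual registration accounts for type promotion.)
    return torch.empty_like(self)
```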
Pull Request resolved: https://github.com/pytorch/pytorch/pull/161332
Approved by: https://github.com/Skylion007
2025-08-26 02:26:51 +00:00
Isuru Fernando
e631557518
Fix meta function for aten.complex ( #160894 )
...
Closes https://github.com/pytorch/pytorch/issues/160882
Pull Request resolved: https://github.com/pytorch/pytorch/pull/160894
Approved by: https://github.com/mlazos
2025-08-20 16:30:04 +00:00
Isuru Fernando
781e9a7724
Fix meta for constant_pad_nd ( #159878 )
...
Fixes https://github.com/pytorch/pytorch/issues/144187
Pull Request resolved: https://github.com/pytorch/pytorch/pull/159878
Approved by: https://github.com/Skylion007 , https://github.com/ezyang
2025-08-14 14:47:47 +00:00
angelayi
74a754aae9
Add meta kernel for sdpa_math_for_mps ( #159695 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/159695
Approved by: https://github.com/malfet
ghstack dependencies: #159456
2025-08-05 22:27:06 +00:00
Shangdi Yu
bc4b04e058
DeviceCopy should have the same layout as input ( #159615 )
...
Summary: Fix https://github.com/pytorch/pytorch/issues/159612
- Fix the meta implementation of `nan_to_num`: it should preserve the strides of the input (see the sketch after this list)
- The DeviceCopy IR node should always preserve the input's layout, so we don't end up with a contiguous call during device copy
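A sketch of the stride-preserving fix for the meta impl (simplified; assumes the `register_meta`/`aten` names from torch/_meta_registrations.py):
```
@register_meta(aten.nan_to_num.default)
def meta_nan_to_num(self, nan=None, posinf=None, neginf=None):
    # preserve_format keeps the input's strides instead of producing a
    # contiguous output, which is what triggered the extra copy.
    return torch.empty_like(self, memory_format=torch.preserve_format)
```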
Test Plan:
```
buck2 run @mode/dev-nosan fbcode//caffe2/test/inductor:test_aot_inductor -- -r test_d2h_copy
```
Rollback Plan:
Differential Revision: D79411407
Pull Request resolved: https://github.com/pytorch/pytorch/pull/159615
Approved by: https://github.com/eellison
2025-08-04 23:56:58 +00:00
Natalia Gimelshein
a81ffbc5f5
improve shape checks for grouped_mm ( #159666 )
...
Check that the contraction dimension matches between tensors when it is known, and add device-side checks for correct offsets.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/159666
Approved by: https://github.com/danielvegamyhre , https://github.com/eqy
2025-08-02 00:12:25 +00:00
Chris Thi
c400c8e2e0
[ROCm] Add FP8 rowwise support to _scaled_grouped_mm + Submodule update ( #159075 )
...
Summary:
In this PR we integrate the [FBGEMM AMD FP8 rowwise scaling grouped GEMM kernel](https://github.com/pytorch/FBGEMM/tree/main/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped ) to add support for the `_scaled_grouped_mm` API on AMD. `_scaled_grouped_mm` is [currently supported on Nvidia](9faef3d17c/aten/src/ATen/native/cuda/Blas.cpp (L1614) ), this PR aims to bring parity to AMD. Related: [[RFC]: PyTorch Low-Precision GEMMs Public API](https://github.com/pytorch/pytorch/issues/157950#top ) #157950 .
The kernel is developed using the Composable Kernel framework. Only MI300X is currently supported. In the near future we plan to add support for MI350X as well. For data types we support FP8 e4m3.
The kernel support will be gated with the `USE_FBGEMM_GENAI` flag. We hope to enable this by default for relevant AMD builds.
Note we also update submodule `third_party/fbgemm` to 0adf62831 for the required updates from fbgemm.
Test Plan:
**Hipify & build**
```
python tools/amd_build/build_amd.py
USE_FBGEMM_GENAI=1 python setup.py develop
```
**Unit tests**
```
python test/test_matmul_cuda.py -- TestFP8MatmulCUDA
Ran 488 tests in 32.969s
OK (skipped=454)
```
**Performance Sample**
| G | M | N | K | Runtime (ms) | GB/s | TFLOPS |
| -- | -- | -- | -- | -- | -- | -- |
| 128 | 1 | 2048 | 5120 | 0.37| 3590 | 7.17 |
| 128 | 64 | 2048 | 5120 | 0.51| 2792 | 338.34 |
| 128 | 128 | 2048 | 5120 | 0.66| 2272 | 522.72 |
| 128 | 1 | 5120 | 1024 | 0.21| 3224 | 6.43 |
| 128 | 64 | 5120 | 1024 | 0.29| 2590 | 291.40 |
| 128 | 128 | 5120 | 1024 | 0.40| 2165 | 434.76 |
| 128 | 1 | 4096 | 4096 | 0.69| 3126 | 6.25 |
| 128 | 64 | 4096 | 4096 | 0.85| 2655 | 324.66 |
| 128 | 128 | 4096 | 4096 | 1.10| 2142 | 501.40 |
| 128 | 1 | 8192 | 8192 | 2.45| 3508 | 7.01 |
| 128 | 64 | 8192 | 8192 | 3.27| 2692 | 336.74 |
| 128 | 128 | 8192 | 8192 | 4.04| 2224 | 543.76 |
| 16 | 1 | 2048 | 5120 | 0.04| 3928 | 7.85 |
| 16 | 64 | 2048 | 5120 | 0.05| 3295 | 399.29 |
| 16 | 128 | 2048 | 5120 | 0.07| 2558 | 588.69 |
| 16 | 1 | 5120 | 1024 | 0.03| 3119 | 6.23 |
| 16 | 64 | 5120 | 1024 | 0.03| 2849 | 320.62 |
| 16 | 128 | 5120 | 1024 | 0.05| 2013 | 404.11 |
| 16 | 1 | 4096 | 4096 | 0.06| 4512 | 9.02 |
| 16 | 64 | 4096 | 4096 | 0.09| 3124 | 381.95 |
| 16 | 128 | 4096 | 4096 | 0.13| 2340 | 547.67 |
| 16 | 1 | 8192 | 8192 | 0.32| 3374 | 6.75 |
| 16 | 64 | 8192 | 8192 | 0.42| 2593 | 324.28 |
| 16 | 128 | 8192 | 8192 | 0.53| 2120 | 518.36 |
- Using ROCm 6.4.1
- Collected through `triton.testing.do_bench_cudagraph`
**Binary size with gfx942 arch**
Before: 116103856 Jul 23 14:12 build/lib/libtorch_hip.so
After: 118860960 Jul 23 14:29 build/lib/libtorch_hip.so
The difference is 2757104 bytes (~2.6 MiB).
Reviewers: @drisspg @ngimel @jwfromm @jeffdaily
Pull Request resolved: https://github.com/pytorch/pytorch/pull/159075
Approved by: https://github.com/drisspg
2025-07-30 23:53:58 +00:00
Laith Sakka
aaa384b2d4
move view_meta to fake impl ( #158406 )
...
The Python dispatcher is not always enabled for fake tensors and has to be enabled explicitly.
While it arguably should be on by default, that requires some work to get all tests passing.
I have run into several issues (e.g. XLA, Helom, etc.) where I had to add enable_python_dispatcher
to avoid problems related to this; for view specifically, I moved it to the fake tensor impl.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/158406
Approved by: https://github.com/bobrenjc93
2025-07-25 08:21:27 +00:00
Laith Sakka
0b2ef76e85
DDE-Free select with unbacked index. ( #157605 )
...
When select has a data-dependent index, we can't tell whether the actual index should be index + size or index.
To avoid throwing a DDE, we allocate a new unbacked symbol to represent the storage offset of the
output view and compute its value dynamically at runtime during Inductor lowering.
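The ambiguity, in plain Python (illustrative):
```
def select_storage_offset(index: int, size: int, stride: int) -> int:
    # For a negative index the offset is (index + size) * stride; for a
    # non-negative one it is index * stride. With an unbacked symbolic
    # index neither branch can be proven, hence the fresh unbacked symbol
    # bound at runtime instead of a guard.
    actual = index + size if index < 0 else index
    return actual * stride
```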
Pull Request resolved: https://github.com/pytorch/pytorch/pull/157605
Approved by: https://github.com/ColinPeppler
2025-07-24 20:08:05 +00:00
PyTorch MergeBot
23550ab735
Revert "DDE-Free select with unbacked index. ( #157605 )"
...
This reverts commit 79d7c754ab .
Reverted https://github.com/pytorch/pytorch/pull/157605 on behalf of https://github.com/laithsakka due to fail pr time benchmarks ([comment](https://github.com/pytorch/pytorch/pull/157605#issuecomment-3084663020 ))
2025-07-17 16:20:02 +00:00
Laith Sakka
79d7c754ab
DDE-Free select with unbacked index. ( #157605 )
...
When select has a data-dependent index, we can't tell whether the actual index should be index + size or index.
To avoid throwing a DDE, we allocate a new unbacked symbol to represent the storage offset of the
output view and compute its value dynamically at runtime during Inductor lowering.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/157605
Approved by: https://github.com/ColinPeppler
2025-07-17 05:08:11 +00:00
Aleksandar Samardžić
90618581e9
Fix grouped MM output strides when compiled but not max-autotuned ( #158143 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/158143
Approved by: https://github.com/ngimel
2025-07-15 11:53:13 +00:00
wengshiy
c8c221c0b3
[Inductor][Float8] Add float8_e4m3fn into assertion dtype list. ( #157684 )
...
Fix an assertion issue by adding float8_e4m3fn to the assertion dtype list.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/157684
Approved by: https://github.com/Xia-Weiwen , https://github.com/leslie-fang-intel , https://github.com/jansel
2025-07-15 06:02:01 +00:00
Valentine233
1f57e0e04d
[CPU] Support GQA for flash attention ( #157893 )
...
As many models require GQA, we support it in flash attention for the CPU path.
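For reference, GQA means the query heads outnumber the shared key/value heads; with the SDPA API this looks like:
```
import torch
import torch.nn.functional as F

q = torch.randn(1, 8, 128, 64)  # (B, Hq,  T, D): 8 query heads
k = torch.randn(1, 2, 128, 64)  # (B, Hkv, T, D): 2 shared KV heads
v = torch.randn(1, 2, 128, 64)

out = F.scaled_dot_product_attention(q, k, v, enable_gqa=True)
print(out.shape)  # torch.Size([1, 8, 128, 64])
```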
Pull Request resolved: https://github.com/pytorch/pytorch/pull/157893
Approved by: https://github.com/mingfeima , https://github.com/jansel
2025-07-13 09:49:02 +00:00
Xia, Weiwen
e1a20988f3
[Quant][CPU] Enable fp8 qconv ( #157076 )
...
**Summary**
Enable fp8 qconv on CPU. It's part of the plan to enable fp8 static quantization on CPU. This PR only adds FP8 support to the existing int8 qconv op. It does not add a new op, nor does it affect the frontend or quantization flow. The schema of the qconv op is not changed either.
So, FP8 qconv shares the same op as INT8 qconv; the difference is that the src/wei dtype is fp8 instead of int8. The output dtype can be fp8/float32/bfloat16. The implementation uses the oneDNN library.
Note:
oneDNN does not support quantized fp8 convolution until v3.9, but the version used in PyTorch is v3.7.2. So, the op goes to the reference kernel for now. We have also updated the oneDNN path so that it's compatible with the fp8 dtype. Once oneDNN is upgraded to v3.9 or newer, minimal changes will be needed to enable the oneDNN path. We have ensured that the behavior of the reference kernel is the same as the new oneDNN implementation.
- oneDNN version < 3.9 (now)
- Always go to the reference kernel
- oneDNN version >= 3.9 (future)
- Go to reference kernel on old platforms (without AMX)
- Use oneDNN on new platforms (with AMX)
**Test plan**
```
pytest test/quantization/core/test_quantized_op.py -k "qconv and fp8"
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/157076
Approved by: https://github.com/leslie-fang-intel , https://github.com/jerryzh168
2025-07-11 10:00:57 +00:00
Aleksandar Samardžić
a3ec6d64b2
Update test after CUTLASS upgrade ( #157903 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/157903
Approved by: https://github.com/ngimel
2025-07-10 20:10:20 +00:00
Xuehai Pan
4cc8b60d1b
[BE][1/16] fix typos in torch/ ( #156311 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/156311
Approved by: https://github.com/albanD
2025-07-09 11:02:22 +00:00
Laith Sakka
ed5d6d2a20
python definitely_contiguous-> is_contiguous_or_false ( #156515 )
...
We can probably avoid having these in Python as well and just depend on the C++ impl after we land https://github.com/pytorch/pytorch/pull/155590, but that is for a different PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/156515
Approved by: https://github.com/bobrenjc93
2025-06-30 17:31:51 +00:00
PyTorch MergeBot
75a7d9e868
Revert "python definitely_contiguous-> is_contiguous_or_false ( #156515 )"
...
This reverts commit 4c0091fda6 .
Reverted https://github.com/pytorch/pytorch/pull/156515 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it seems to cause some torch.export failures internally ([comment](https://github.com/pytorch/pytorch/pull/156515#issuecomment-3014104570 ))
2025-06-27 19:07:06 +00:00
Laith Sakka
cbcffce48a
address remaining straight forward gso in meta_registrations ( #156902 )
...
These are all straightforward generalizations of existing checks.
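"gso" here refers to guard_size_oblivious; the typical generalization looks like this (illustrative):
```
from torch.fx.experimental.symbolic_shapes import guard_size_oblivious

def maybe_squeeze_front(t):
    # Before: `t.size(0) == 1` guards on the symbolic size and can throw
    # on unbacked sizes. After: the size-oblivious guard assumes unbacked
    # sizes are not 1 and never raises a data-dependent error.
    if guard_size_oblivious(t.size(0) == 1):
        return t.squeeze(0)
    return t
```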
Pull Request resolved: https://github.com/pytorch/pytorch/pull/156902
Approved by: https://github.com/ColinPeppler
2025-06-27 06:19:54 +00:00
Laith Sakka
4c0091fda6
python definitely_contiguous-> is_contiguous_or_false ( #156515 )
...
We can probably avoid having these in Python as well and just depend on the C++ impl after we land https://github.com/pytorch/pytorch/pull/155590, but that is for a different PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/156515
Approved by: https://github.com/bobrenjc93
2025-06-26 00:47:14 +00:00
fengqing.lu
04178d347c
[Reland] [Intel GPU] Make SDPA output has the same stride as Query. ( #154340 )
...
Fixes [#153903 ](https://github.com/pytorch/pytorch/issues/153903 ).
Currently, the output tensor of SDPA on XPU is always allocated with contiguous strides, while the CPU/CUDA flash_attention and cudnn_attention allocate the output tensor with the same strides as Query.
This PR aligns XPU's behavior with CUDA/CPU to make XPU compatible with CPU/CUDA's modeling code.
The function `alloc_with_matching_layout` is copied from the cuDNN implementation: 8c16d0e404/aten/src/ATen/native/cudnn/MHA.cpp (L874)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154340
Approved by: https://github.com/guangyey , https://github.com/drisspg
2025-06-24 06:09:59 +00:00
Aleksandar Samardžić
6ed85bfe6a
Refine alignment check along dynamic dimension for grouped MMs ( #155466 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/155466
Approved by: https://github.com/ngimel
2025-06-20 19:42:57 +00:00
Cui, Yifeng
72c8751b61
Align meta deducing for fft_r2c with fft_r2c_mkl on XPU ( #156048 )
...
There is a memory-layout mismatch between the XPU `fft_r2c` implementation and Inductor's meta deduction.
Originally, Inductor's `fft_r2c` meta deduction for the XPU backend was aligned with the CPU (fallback) implementation. This PR corrects the Inductor meta deduction and updates the torch-xpu-ops commit to [intel/torch-xpu-ops@`3a9419c`](3a9419c8bb ).
The XPU implementation first performs the R2C transform on the last dimension, followed by iterative C2C transforms on the remaining dimensions.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/156048
Approved by: https://github.com/guangyey , https://github.com/etaf , https://github.com/jansel
2025-06-20 01:41:03 +00:00
PyTorch MergeBot
0b62465b99
Revert "Refine alignment check along dynamic dimension for grouped MMs ( #155466 )"
...
This reverts commit 830a335a7d .
Reverted https://github.com/pytorch/pytorch/pull/155466 on behalf of https://github.com/atalman due to breaks internal builds ([comment](https://github.com/pytorch/pytorch/pull/155466#issuecomment-2988285117 ))
2025-06-19 14:25:38 +00:00