80 Commits

Author SHA1 Message Date
Yuanyuan Chen
fdab48a7c1 Enable all PIE rules on ruff (#165814)
This PR enables all PIE rules on ruff. Some rules from this family are already enabled; the newly added rules are:
```
PIE796  Enum contains duplicate value: {value}
PIE808  Unnecessary start argument in range
```
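
As a rough illustration of what these two rules flag (a sketch written for this log, not code taken from the PR; the `Color` enum is hypothetical):

```python
from enum import Enum


class Color(Enum):
    RED = 1
    CRIMSON = 1  # PIE796: duplicate value, so CRIMSON silently becomes an alias of RED


# Iterating over the enum yields only canonical members, so the alias is not listed.
members = [member.name for member in Color]

# PIE808: the explicit start argument 0 is redundant.
with_start = list(range(0, 5))
without_start = list(range(5))
```

The duplicate value is rarely intentional, which is why the rule treats it as a likely bug rather than a style issue.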

Pull Request resolved: https://github.com/pytorch/pytorch/pull/165814
Approved by: https://github.com/ezyang
2025-10-18 07:36:18 +00:00
PyTorch MergeBot
24520b8386 Revert "Enable all PIE rules on ruff (#165814)"
This reverts commit c79dfdc655.

Reverted https://github.com/pytorch/pytorch/pull/165814 on behalf of https://github.com/cyyever due to Need to cover more files ([comment](https://github.com/pytorch/pytorch/pull/165814#issuecomment-3417931863))
2025-10-18 07:21:08 +00:00
Yuanyuan Chen
c79dfdc655 Enable all PIE rules on ruff (#165814)
This PR enables all PIE rules on ruff. Some rules from this family are already enabled; the newly added rules are:
```
PIE796  Enum contains duplicate value: {value}
PIE808  Unnecessary start argument in range
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/165814
Approved by: https://github.com/ezyang
2025-10-18 06:40:12 +00:00
vishalgoyal316
9c12651417 Improve error message for non-positive groups in convolution (#165669)
Prevents a segmentation fault when an invalid (non-positive) groups value is passed to convolution.
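
A minimal sketch of this kind of up-front validation, assuming a plain Python helper; `check_groups` and its error text are hypothetical and are not the check or message added by the PR:

```python
def check_groups(groups: int) -> int:
    """Hypothetical helper: reject non-positive group counts before any work is done."""
    if groups <= 0:
        raise ValueError(f"groups must be a positive integer, but got {groups}")
    return groups
```

Failing early with a readable error is preferable to letting the invalid value reach a kernel that divides by it or uses it to size a buffer.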

Fixes #142835

Pull Request resolved: https://github.com/pytorch/pytorch/pull/165669
Approved by: https://github.com/mikaylagawarecki
2025-10-17 19:06:05 +00:00
Yuanyuan Chen
8de85896e0 Enable ruff rule E721 (#165162)
`E721` checks for object type comparisons using == and other comparison operators. This is useful because it is recommended to use `is` for type comparisons.
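
For illustration, a small sketch of the pattern the rule flags and the usual alternatives (the `Base` and `Derived` classes are hypothetical, not taken from the PR):

```python
class Base:
    pass


class Derived(Base):
    pass


obj = Derived()

# Flagged by E721: comparing types with ==.
flagged = type(obj) == Derived  # noqa: E721

# Preferred for an exact type match: identity comparison.
exact_match = type(obj) is Derived

# Preferred when subclasses should also match.
subclass_match = isinstance(obj, Base)

# An exact check against the base class does not match the subclass instance.
exact_base_match = type(obj) is Base
```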

Pull Request resolved: https://github.com/pytorch/pytorch/pull/165162
Approved by: https://github.com/Skylion007
2025-10-13 01:48:55 +00:00
PyTorch MergeBot
816fb7f48d Revert "Enable ruff rule E721 (#165162)"
This reverts commit 9e7c19f72b.

Reverted https://github.com/pytorch/pytorch/pull/165162 on behalf of https://github.com/pytorch-auto-revert due to Reverted automatically by pytorch's autorevert, to avoid this behaviour add the tag autorevert: disable ([comment](https://github.com/pytorch/pytorch/pull/165162#issuecomment-3393328271))
2025-10-11 13:25:40 +00:00
Yuanyuan Chen
9e7c19f72b Enable ruff rule E721 (#165162)
`E721` checks for object type comparisons using == and other comparison operators. This is useful because it is recommended to use `is` for type comparisons.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/165162
Approved by: https://github.com/Skylion007
2025-10-11 06:43:53 +00:00
Jeff Daily
4a6abba0d9 [ROCm][CI] test_convolution.py uses miopen immediate mode (#164598)
This should help stabilize some flaky test behavior where miopen would pick different solutions for different parts of the same test and the test expects bitwise identical results.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164598
Approved by: https://github.com/jeffdaily

Co-authored-by: Jeff Daily <jeff.daily@amd.com>
2025-10-06 17:48:50 +00:00
Jeff Daily
6b7970192f [ROCm][CI] fix test_cudnn_convolution_relu_cuda (#164466)
Fixes #162816.
Test was comparing output of conv vs fused conv but inputs were different memory formats. Also fix test_cudnn_convolution_add_relu.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164466
Approved by: https://github.com/jeffdaily

Co-authored-by: Jeff Daily <jeff.daily@amd.com>
2025-10-02 20:36:54 +00:00
Nikita Shulga
95be302889 Skip test_conv3d_cudnn_broken on ROCM (#164138)
Follow-up to https://github.com/pytorch/pytorch/pull/163903
Fixes https://github.com/pytorch/pytorch/issues/164137

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164138
Approved by: https://github.com/Camyll
2025-09-29 16:56:51 +00:00
Eddie Yan
e2817ac204 [cuDNN][Convolution] Disable cuDNN for 3D convolutions with kernel size != 1 for cuDNN 9.8+ (#163581)
To work around #163539

Still confirming whether 9.10 is affected. The original test states that the convolution is "large," but note that the input size does not appear to require 64-bit indexing.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/163581
Approved by: https://github.com/ngimel, https://github.com/malfet

Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
2025-09-26 23:47:29 +00:00
eqy
0ea10f9912 [cuDNN][conv][64-bit] Disable cuDNN for 64-bit depthwise convs again (#163171)
The test is breaking; will check whether there is an older version on which this can stay enabled, to avoid dropping support completely.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/163171
Approved by: https://github.com/ngimel, https://github.com/malfet
2025-09-26 22:12:17 +00:00
mansiag05
d4e4f70768 Fix overflow in slow_conv3d when kernel size is too large. (#162718)
Also, adding check for padding to avoid segmentation fault caused by overflow.
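
One common way to guard such size computations is to check each multiplication against the integer limit before performing it. The sketch below assumes sizes are stored as signed 64-bit integers; `checked_product` is a hypothetical helper and not the code added by the PR:

```python
INT64_MAX = 2**63 - 1


def checked_product(values):
    """Multiply non-negative sizes, raising if the result would not fit in int64."""
    result = 1
    for value in values:
        if value < 0:
            raise ValueError(f"sizes must be non-negative, but got {value}")
        # result * value > INT64_MAX exactly when result > INT64_MAX // value.
        if value != 0 and result > INT64_MAX // value:
            raise OverflowError("size computation would overflow a signed 64-bit integer")
        result *= value
    return result
```

Python integers do not overflow, so the limit is simulated here; in C++ the same comparison has to happen before the multiplication, since signed overflow is undefined behavior.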

Fixes #141846

Pull Request resolved: https://github.com/pytorch/pytorch/pull/162718
Approved by: https://github.com/jgong5, https://github.com/Skylion007
2025-09-26 13:39:29 +00:00
Yuanyuan Chen
281bb56cc5 Enable half precision types on test_conv_cudnn_nhwc_support (#163444)
This PR adds float16 and bfloat16 cases to `test_conv_cudnn_nhwc_support` and removes outdated comments.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/163444
Approved by: https://github.com/Skylion007
2025-09-22 04:11:20 +00:00
Jeff Daily
0def79fdd9 [ROCm] fix conv relu fusion (#162856)
Fixes #162816.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/162856
Approved by: https://github.com/jeffdaily

Co-authored-by: Jeff Daily <jeff.daily@amd.com>
2025-09-15 22:49:32 +00:00
Jeff Daily
d65ffdef3d [ROCm] fix miopen batchnorm changing output format (#162112)
It was found that the integration of miopen batchnorm was causing the output to always be in the default contiguous memory format, even when the input was channels last. This also unskips a number of related unit tests.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/162112
Approved by: https://github.com/jeffdaily

Co-authored-by: Jeff Daily <jeff.daily@amd.com>
Co-authored-by: Dmitry Nikolaev <dmitry.nikolaev@amd.com>
Co-authored-by: Jithun Nair <37884920+jithunnair-amd@users.noreply.github.com>
2025-09-11 19:37:48 +00:00
eqy
5dbee5691c [cuDNN][Convolution][TF32][64bit] Add tf32_on_and_off decorator to conv3d 64bit test (#161004)
cuDNN has new generated kernels that can use TF32.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/161004
Approved by: https://github.com/janeyx99, https://github.com/Skylion007
2025-09-10 21:39:35 +00:00
Jeff Daily
99f356fa58 [ROCm] revamp miopen integration (#161687)
Update sources under ATen/miopen and ATen/native/miopen to align with best practices. Avoid reshape_ calls inside backward operations.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/161687
Approved by: https://github.com/jeffdaily

Co-authored-by: Jeff Daily <jeff.daily@amd.com>
2025-09-03 22:28:09 +00:00
Eddie Yan
f391afe9bf [cuDNN][convolution] remove redundant conv3d 64bit test (#161177)
Turns out it's the same as:
```
    @onlyCUDA
    @largeTensorTest("40GB")
    @largeTensorTest("24GB", "cpu")
    @tf32_on_and_off(0.005)
    def test_conv3d_64bit_indexing(self, device):
        x = torch.rand(1, 32, 512, 512, 256)
        m = torch.nn.Conv3d(32, 1, kernel_size=1, padding=0, stride=1, bias=False)
        yref = m(x)
        y = m.to(device=device)(x.to(device=device))
        self.assertEqual(yref, y)
```
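
The input shape in this test sits just past the 32-bit limit, which a quick calculation confirms (a sketch added for illustration; it assumes the default float32 dtype of `torch.rand`):

```python
import math

INT32_MAX = 2**31 - 1

input_shape = (1, 32, 512, 512, 256)
input_numel = math.prod(input_shape)

# float32 uses 4 bytes per element.
input_gib = input_numel * 4 / 2**30

# Element offsets no longer fit in a signed 32-bit integer.
needs_64bit_indexing = input_numel > INT32_MAX
```

The element count is exactly 2**31, one more than `INT32_MAX`, and the input alone occupies 8 GiB.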

Pull Request resolved: https://github.com/pytorch/pytorch/pull/161177
Approved by: https://github.com/Skylion007
2025-08-25 15:01:05 +00:00
eqy
9903ca4f70 [cuDNN][64-bit indexing] update conv depthwise 64bit indexing dispatch condition to match native kernel (#156140)
The native kernel doesn't support batch splitting so the previous check wasn't aggressive enough in dispatching to cuDNN

https://github.com/pytorch/pytorch/issues/155225

Pull Request resolved: https://github.com/pytorch/pytorch/pull/156140
Approved by: https://github.com/ngimel, https://github.com/atalman
2025-08-12 18:07:41 +00:00
Nikita Shulga
e06b110f73 [Testing] Add MPS to NATIVE_DEVICES (#153835)
This will eventually allow enabling more opinfo tests against the MPS device. It was supposed to be a very simple change, but it actually required minor adjustments to lots of test files, namely:
- Introduce `all_mps_types_and` that is very similar to `all_types_and`, but skips `float64`
- Decorate lots of tests with `@dtypesIfMPS(*all_mps_types())`
- Skip `test_from_dlpack_noncontinguous` as it currently crashes (needs to be fixed)
- Add lots of `expectedFailureIfMPS`
- Delete all `@onlyNativeDeviceTypesAnd("mps")`

<sarcasm> I love how well documented this variable is </sarcasm>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153835
Approved by: https://github.com/Skylion007
2025-08-05 18:57:35 +00:00
eqy
c89fa88acb [conv][cuDNN][64-bit indexing] reduce memory usage of depthwise conv 64-bit indexing test (#158981)
Use half instead for reduced memory usage

Pull Request resolved: https://github.com/pytorch/pytorch/pull/158981
Approved by: https://github.com/soulitzer, https://github.com/Skylion007
2025-07-25 23:58:45 +00:00
PyTorch MergeBot
317af4c87b Revert "[cuDNN][64-bit indexing] update conv depthwise 64bit indexing dispatch condition to match native kernel (#156140)"
This reverts commit a5f59cc2ea.

Reverted https://github.com/pytorch/pytorch/pull/156140 on behalf of https://github.com/atalman due to breaks internal builds ([comment](https://github.com/pytorch/pytorch/pull/156140#issuecomment-2988441548))
2025-06-19 15:09:29 +00:00
eqy
a5f59cc2ea [cuDNN][64-bit indexing] update conv depthwise 64bit indexing dispatch condition to match native kernel (#156140)
The native kernel doesn't support batch splitting so the previous check wasn't aggressive enough in dispatching to cuDNN

https://github.com/pytorch/pytorch/issues/155225

Pull Request resolved: https://github.com/pytorch/pytorch/pull/156140
Approved by: https://github.com/ngimel
2025-06-18 17:32:36 +00:00
eqy
bd3c32916c [cuDNN] Enabled dilation for deterministic convolutions in cuDNN (#154292)
Provides order-of-magnitude speedup over fallback impl.

https://github.com/pytorch/pytorch/issues/28777

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154292
Approved by: https://github.com/Skylion007

Co-authored-by: Aaron Gokaslan <aaronGokaslan@gmail.com>
2025-06-11 23:35:52 +00:00
Joona Havukainen
981bdb39ca Enable ConvTranspose3D for FP32 and Complex64 (#154696)
Fixes #154615

Enables using ConvTranspose3D since it seems support exists both on MacOS 14 and 15.

For the half dtypes, the discrepancy between the CPU and GPU implementations is too large to conclude whether there is a bug in the implementation without a more rigorous study of the bounds on the expected error. They are therefore left unsupported for now, and an assert is added to notify the user if the op is called with fp16 or bf16 inputs.

Tests for ConvTranspose3D were enabled for the supported data types.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154696
Approved by: https://github.com/malfet
2025-06-02 16:24:03 +00:00
Aaron Gokaslan
dbad6d71c7 [BE][Ez]: Unskip conv1d MPS test (#154795)
Fixes an issue I noticed where the conv1d test is skipped unconditionally for complex types.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154795
Approved by: https://github.com/jansel
2025-05-31 23:01:19 +00:00
eqy
823a35807c [CUDA][CUDNN] Dispatch to cuDNN for non-batch-splittable 64-bit NCHW convolutions (#153101)
For #152816

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153101
Approved by: https://github.com/Skylion007
2025-05-20 20:19:03 +00:00
PyTorch MergeBot
bf0fe4f828 Revert "[CUDA][CUDNN] Dispatch to cuDNN for non-batch-splittable 64-bit NCHW convolutions (#153101)"
This reverts commit ced90d23d3.

Reverted https://github.com/pytorch/pytorch/pull/153101 on behalf of https://github.com/jeanschmidt due to Seems to have introduced breakages on main, tentative revert: https://github.com/pytorch/pytorch/actions/runs/15024667248/job/42224521705 ([comment](https://github.com/pytorch/pytorch/pull/153101#issuecomment-2881208171))
2025-05-14 18:52:07 +00:00
eqy
ced90d23d3 [CUDA][CUDNN] Dispatch to cuDNN for non-batch-splittable 64-bit NCHW convolutions (#153101)
For #152816

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153101
Approved by: https://github.com/Skylion007
2025-05-14 15:22:47 +00:00
Eddie Yan
ec68d082a1 [CUDA][TF32] Account for TF32 in test_conv2d_same_padding (#152618)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/152618
Approved by: https://github.com/msaroufim, https://github.com/Skylion007
2025-05-02 20:19:00 +00:00
Jagadish Krishnamoorthy
0d99b4e9e2 ROCm: Enable tf32 testing on test_nn (#148945)
Add tf32 support for ROCm tests.
test command: python test/test_nn.py -v

Pull Request resolved: https://github.com/pytorch/pytorch/pull/148945
Approved by: https://github.com/jeffdaily

Co-authored-by: Jeff Daily <jeff.daily@amd.com>
2025-04-28 23:01:04 +00:00
Alvaro-Kothe
8ce3d4a541 test(Conv3d): use correct class for test_Conv3d_module_same_padding (#152187)
The test for the class `Conv3d` is calling `Conv2d`. This PR just ensures that we are testing the correct module.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/152187
Approved by: https://github.com/Skylion007
2025-04-28 16:59:12 +00:00
cyy
970fefcc53 Remove outdated skipCUDAIfCudnnVersionLessThan decoration (#148940)
Test conditions for CUDNN 7 and 8 were removed because we have moved to CUDNN 9.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/148940
Approved by: https://github.com/mikaylagawarecki
2025-03-13 18:02:50 +00:00
cyy
a5f6b24d87 Remove outdated skipIfRocmVersionLessThan decorations (#148941)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/148941
Approved by: https://github.com/jeffdaily
2025-03-11 18:37:40 +00:00
Jeff Daily
44248c44eb [ROCm] miopen benchmark behavior now better aligns with cudnn (#145294)
The default benchmark setting is now false. The new miopen behavior means when benchmarking is disabled, for any shape that doesn't have a find hit, then it will do a quick search (same behavior as the prior default), and use that result. Now when benchmark is enabled, it will perform an exhaustive search and update any DBs. miopen immediate mode is still available and is used when deterministic is true and benchmark is false.
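
The behavior described above can be paraphrased as a small decision function. This is a reading of the commit message only, not MIOpen or PyTorch code; the function name and return strings are hypothetical, and two cases the message does not spell out are marked as assumptions:

```python
def describe_miopen_search(benchmark: bool, deterministic: bool, has_find_hit: bool) -> str:
    """Hypothetical summary of the search behavior described in the commit message."""
    if benchmark:
        # Assumption: benchmark takes precedence even when deterministic is also set.
        return "exhaustive search, update DBs"
    if deterministic:
        return "immediate mode"
    if has_find_hit:
        # Assumption: an existing find result is reused as-is.
        return "use existing find result"
    return "quick search"
```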

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145294
Approved by: https://github.com/BrianHarrisonAMD, https://github.com/malfet
2025-02-05 17:19:53 +00:00
Benjamin Glass
5aa5a5763e [inductor triton] Disable incorrect TF32 usage on CUDA capability < 8 (#145684)
Triton 2.2 and greater have a bug where allowing TF32 generation for a GPU that does not support TF32 will cause code generation errors. Patch around this problem by:

1. Adding a function to `torch.cuda` that determines whether CUDA hardware is capable of using the TF32 format.
2. Using that function to explicitly disable TF32 generation when calling Triton, where needed.

To demonstrate that this fix works, try running `test/inductor/test_max_autotune.py` on a GPU with CUDA compute capability < 8 (e.g. any NVIDIA consumer GPU) without this fix.
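
A minimal sketch of such a capability check, assuming the caller already has the `(major, minor)` pair (for example from `torch.cuda.get_device_capability()`). The helper name is hypothetical and is not the function this PR adds to `torch.cuda`:

```python
def capability_supports_tf32(major: int, minor: int = 0) -> bool:
    """Hypothetical helper: TF32 needs compute capability 8.0 (Ampere) or newer."""
    return (major, minor) >= (8, 0)
```

Keeping the check as a pure function of the capability tuple makes it testable on machines without a GPU.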

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145684
Approved by: https://github.com/eqy
2025-01-28 22:01:08 +00:00
PyTorch MergeBot
6a4fb4b615 Revert "Align CPU behavior with CUDA for ConvTranspose when out_channels=0 (#142859)"
This reverts commit cb814c0b96.

Reverted https://github.com/pytorch/pytorch/pull/142859 on behalf of https://github.com/malfet due to It broke ROCM tests again, see 5cd2b34e82/1 ([comment](https://github.com/pytorch/pytorch/pull/142859#issuecomment-2614523822))
2025-01-26 17:49:05 +00:00
Wu, Chunyuan
cb814c0b96 Align CPU behavior with CUDA for ConvTranspose when out_channels=0 (#142859)
Fixes https://github.com/pytorch/pytorch/issues/142466.
Remove the `weight.numel() != 0` check to align the behavior with CUDA for `ConvTranspose` when `out_channels=0`. After removing this check, the existing code is already able to give an empty output in such case.

Test plan:
```
python -u test/nn/test_convolution.py -k test_ConvTranspose_output_channels_0_cpu_float32
python -u test/nn/test_convolution.py -k test_ConvTranspose_output_channels_0_cuda_float32
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/142859
Approved by: https://github.com/mingfeima, https://github.com/malfet
2025-01-26 01:56:40 +00:00
PyTorch MergeBot
d95a6babcc Revert "Align CPU behavior with CUDA for ConvTranspose when out_channels=0 (#142859)"
This reverts commit 0bff377880.

Reverted https://github.com/pytorch/pytorch/pull/142859 on behalf of https://github.com/huydhn due to Sorry for reverting your change but the XLA failures look legit ([comment](https://github.com/pytorch/pytorch/pull/142859#issuecomment-2608631019))
2025-01-23 01:10:31 +00:00
Wu, Chunyuan
0bff377880 Align CPU behavior with CUDA for ConvTranspose when out_channels=0 (#142859)
Fixes https://github.com/pytorch/pytorch/issues/142466.
Remove the `weight.numel() != 0` check to align the behavior with CUDA for `ConvTranspose` when `out_channels=0`. After removing this check, the existing code is already able to give an empty output in such case.

Test plan:
```
python -u test/nn/test_convolution.py -k test_ConvTranspose_output_channels_0_cpu_float32
python -u test/nn/test_convolution.py -k test_ConvTranspose_output_channels_0_cuda_float32
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/142859
Approved by: https://github.com/mingfeima, https://github.com/malfet
2025-01-22 17:52:53 +00:00
Tom Ritchford
eaef613688 Fix issue with test/nn/test_convolution:TestConvolutionNNDeviceTypeCUDA.test_conv_large_batch_1_cuda (#145067)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145067
Approved by: https://github.com/Skylion007, https://github.com/nWEIdia

Co-authored-by: Wei Wang <143543872+nWEIdia@users.noreply.github.com>
2025-01-17 20:31:25 +00:00
Tom Ritchford
c947a7d38e Fix unused Python variables in test/nn (#143396)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/143396
Approved by: https://github.com/mikaylagawarecki
2024-12-18 03:30:54 +00:00
Nikita Shulga
9c88b08ac9 [BE] Replace skipIfMPS with expectedFailureMPS (#139940)
Functionally the two decorators are very similar, but one should rely on expectedFailure as much as possible to get signal when something is fixed.
- Move `product_version` variable from `test_mps` to common_utils, but call it `MACOS_VERSION`
- Introduce `skipIfMPSOnMacOS13` to decorate the hard crashes that happen only on MacOS13 (which at this point will not get any fixes and will be deprecated soon)
- Add `device_type='mps'` to all `skipIfMPS` per https://github.com/pytorch/pytorch/issues/140560
Pull Request resolved: https://github.com/pytorch/pytorch/pull/139940
Approved by: https://github.com/janeyx99, https://github.com/huydhn
2024-11-15 03:48:37 +00:00
Eddie Yan
846b4e614b [TF32][cuDNN][Convolution] Add some missing TF32 decorators (#138768)
Newer cuDNN versions seem to be able to dispatch to cuDNN kernels

Pull Request resolved: https://github.com/pytorch/pytorch/pull/138768
Approved by: https://github.com/Skylion007
2024-10-25 19:03:42 +00:00
Siddharth Kotapati
e27c0048db Enable additional tests for MPS CI runs (#134356)
As part of the follow up for https://github.com/pytorch/pytorch/issues/133520, adapting existing unused tests for use in MPS CI runs. Focusing on nhwc & other memory formatting tests

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134356
Approved by: https://github.com/malfet, https://github.com/eqy, https://github.com/huydhn
2024-10-04 21:52:38 +00:00
Mikayla Gawarecki
d9576c9440 Fix failures when default is flipped for weights_only (#127627)
Tests on XLA shard not fixed yet but there is an issue here https://github.com/pytorch/xla/issues/7799

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127627
Approved by: https://github.com/albanD
ghstack dependencies: #132349
2024-08-16 00:22:43 +00:00
Xuehai Pan
fbe6f42dcf [BE][Easy][8/19] enforce style for empty lines in import segments in test/[k-p]*/ (#129759)
See https://github.com/pytorch/pytorch/pull/129751#issue-2380881501. Most changes are auto-generated by linter.

You can review these PRs via:

```bash
git diff --ignore-all-space --ignore-blank-lines HEAD~1
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129759
Approved by: https://github.com/justinchuby, https://github.com/ezyang
2024-07-31 02:09:20 +00:00
eellison
28f29e074b Dont mutate tensor stride in place in cudnn conv (#126786)
Fix for https://github.com/pytorch/pytorch/issues/126241.

Within the cudnn convolution, we were updating the strides of the tensor in place to disambiguate between contiguous and channels-last tensors that have size-1 dims. Instead of mutating the tensor's stride, just use a temporary. Inside cudnn it is then copied: d7ccb5b3c4/include/cudnn_frontend_Tensor.h (L201-L203).
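
The ambiguity comes from the fact that the stride of a size-1 dimension is never used to step through memory, so the same data can satisfy both layouts. A pure-Python sketch of that observation (the helper names are hypothetical and this is not the cudnn or PyTorch implementation):

```python
def contiguous_strides(shape):
    """Row-major (NCHW) strides for a shape."""
    strides = [1] * len(shape)
    for i in range(len(shape) - 2, -1, -1):
        strides[i] = strides[i + 1] * shape[i + 1]
    return tuple(strides)


def channels_last_strides(shape):
    """NHWC strides expressed in NCHW dimension order."""
    n, c, h, w = shape
    return (h * w * c, 1, w * c, c)


def same_layout(shape, a, b):
    """Strides only matter for dims of size > 1; size-1 dims are never stepped over."""
    return all(size == 1 or sa == sb for size, sa, sb in zip(shape, a, b))
```

With a single channel, shape `(2, 1, 4, 4)` gives `(16, 16, 4, 1)` and `(16, 1, 4, 1)`, which differ only in the size-1 dimension, so the tensor counts as both contiguous and channels last.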

Pull Request resolved: https://github.com/pytorch/pytorch/pull/126786
Approved by: https://github.com/ezyang, https://github.com/shunting314, https://github.com/eqy
2024-05-22 01:53:44 +00:00
eqy
973d724e21 [CUDA] Fix 64-bit indexing in vol2col in conv3d (#124650)
Similar to #118005, fixes illegal memory accesses (IMAs), sometimes silent, that can occur

CC @atalman @malfet

Pull Request resolved: https://github.com/pytorch/pytorch/pull/124650
Approved by: https://github.com/soulitzer
2024-04-25 23:21:43 +00:00