Yuanyuan Chen
281bb56cc5
Enable half precision types on test_conv_cudnn_nhwc_support ( #163444 )
...
This PR adds float16 and bfloat16 cases to `test_conv_cudnn_nhwc_support` and removes outdated comments.
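A minimal sketch of what the added half-precision cases might look like (shapes and helper names here are illustrative, not the actual test code):
```
import torch

# Hypothetical sketch: run a channels-last (NHWC) convolution through cuDNN in
# each half-precision dtype and check that the output stays channels last.
def check_conv_nhwc(dtype):
    x = torch.randn(2, 8, 16, 16, device="cuda", dtype=dtype)
    x = x.to(memory_format=torch.channels_last)
    conv = torch.nn.Conv2d(8, 4, kernel_size=3, padding=1).to(
        device="cuda", dtype=dtype, memory_format=torch.channels_last
    )
    out = conv(x)
    assert out.is_contiguous(memory_format=torch.channels_last)

for dtype in (torch.float16, torch.bfloat16):
    check_conv_nhwc(dtype)
```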
Pull Request resolved: https://github.com/pytorch/pytorch/pull/163444
Approved by: https://github.com/Skylion007
2025-09-22 04:11:20 +00:00
Jeff Daily
0def79fdd9
[ROCm] fix conv relu fusion ( #162856 )
...
Fixes #162816 .
Pull Request resolved: https://github.com/pytorch/pytorch/pull/162856
Approved by: https://github.com/jeffdaily
Co-authored-by: Jeff Daily <jeff.daily@amd.com>
2025-09-15 22:49:32 +00:00
Jeff Daily
d65ffdef3d
[ROCm] fix miopen batchnorm changing output format ( #162112 )
...
The MIOpen batchnorm integration was causing the output to always be in the default contiguous memory format, even when the input was channels last. This change also unskips a number of related unit tests.
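A hedged sketch of the behavior this guards against (shapes are illustrative; assumes a ROCm build where batch norm lowers to MIOpen):
```
import torch

# Illustrative check: a channels-last input to BatchNorm2d should produce a
# channels-last output rather than being converted back to the default
# contiguous format by the MIOpen integration.
x = torch.randn(4, 8, 16, 16, device="cuda")  # "cuda" maps to ROCm/HIP here
x = x.to(memory_format=torch.channels_last)
bn = torch.nn.BatchNorm2d(8).to("cuda")
y = bn(x)
assert y.is_contiguous(memory_format=torch.channels_last)
```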
Pull Request resolved: https://github.com/pytorch/pytorch/pull/162112
Approved by: https://github.com/jeffdaily
Co-authored-by: Jeff Daily <jeff.daily@amd.com>
Co-authored-by: Dmitry Nikolaev <dmitry.nikolaev@amd.com>
Co-authored-by: Jithun Nair <37884920+jithunnair-amd@users.noreply.github.com>
2025-09-11 19:37:48 +00:00
eqy
5dbee5691c
[cuDNN][Convolution][TF32][64bit] Add tf32_on_and_off decorator to conv3d 64bit test ( #161004 )
...
cuDNN has new generated kernels that can use TF32.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/161004
Approved by: https://github.com/janeyx99 , https://github.com/Skylion007
2025-09-10 21:39:35 +00:00
Jeff Daily
99f356fa58
[ROCm] revamp miopen integration ( #161687 )
...
Update sources under ATen/miopen and ATen/native/miopen to align with best practices. Avoid `reshape_` calls inside backward operations.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/161687
Approved by: https://github.com/jeffdaily
Co-authored-by: Jeff Daily <jeff.daily@amd.com>
2025-09-03 22:28:09 +00:00
Eddie Yan
f391afe9bf
[cuDNN][convolution] remove redundant conv3d 64bit test ( #161177 )
...
It turns out this test is the same as:
```
@onlyCUDA
@largeTensorTest("40GB")
@largeTensorTest("24GB", "cpu")
@tf32_on_and_off(0.005)
def test_conv3d_64bit_indexing(self, device):
x = torch.rand(1, 32, 512, 512, 256)
m = torch.nn.Conv3d(32, 1, kernel_size=1, padding=0, stride=1, bias=False)
yref = m(x)
y = m.to(device=device)(x.to(device=device))
self.assertEqual(yref, y)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/161177
Approved by: https://github.com/Skylion007
2025-08-25 15:01:05 +00:00
eqy
9903ca4f70
[cuDNN][64-bit indexing] update conv depthwise 64bit indexing dispatch condition to match native kernel ( #156140 )
...
The native kernel doesn't support batch splitting, so the previous check wasn't aggressive enough about dispatching to cuDNN.
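A rough illustration of the kind of shape involved (hypothetical sizes requiring a large-memory GPU; this is not the dispatch code itself):
```
import torch

# A depthwise convolution whose input has more than 2**31 - 1 elements: the
# native depthwise kernel cannot batch-split it (batch size is 1), so the
# dispatch condition must send it to cuDNN instead.
n, c, h, w = 1, 64, 8192, 8192  # 64 * 8192 * 8192 = 2**32 elements
x = torch.randn(n, c, h, w, device="cuda", dtype=torch.half)
conv = torch.nn.Conv2d(c, c, kernel_size=3, padding=1, groups=c).to(
    device="cuda", dtype=torch.half
)
y = conv(x)
```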
https://github.com/pytorch/pytorch/issues/155225
Pull Request resolved: https://github.com/pytorch/pytorch/pull/156140
Approved by: https://github.com/ngimel , https://github.com/atalman
2025-08-12 18:07:41 +00:00
Nikita Shulga
e06b110f73
[Testing] Add MPS to NATIVE_DEVICES ( #153835 )
...
This would eventually allow enabling more OpInfo tests against the MPS device. It was supposed to be a very simple change, but actually required minor adjustments to lots of test files, namely:
- Introduce `all_mps_types_and`, which is very similar to `all_types_and` but skips `float64` (a sketch follows this list)
- Decorate lots of tests with `@dtypesIfMPS(*all_mps_types())`
- Skip `test_from_dlpack_noncontinguous` as it currently crashes (needs to be fixed)
- Add lots of `expectedFailureIfMPS`
- Delete all `@onlyNativeDeviceTypesAnd("mps")`
<sarcasm> I love how well documented these variables are </sarcasm>
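A rough sketch of what the `all_mps_types_and` helper could look like (an assumption based on the description above, not the actual implementation):
```
import torch

# Hypothetical sketch: like all_types_and(), but with float64 dropped, since
# MPS has no double-precision support. The exact dtype list is an assumption.
_ALL_MPS_TYPES = (
    torch.float32, torch.float16, torch.bfloat16,
    torch.int64, torch.int32, torch.int16, torch.int8, torch.uint8,
)

def all_mps_types_and(*extra_dtypes):
    # Append caller-specified dtypes, mirroring the all_types_and() pattern.
    return _ALL_MPS_TYPES + tuple(extra_dtypes)
```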
Pull Request resolved: https://github.com/pytorch/pytorch/pull/153835
Approved by: https://github.com/Skylion007
2025-08-05 18:57:35 +00:00
eqy
c89fa88acb
[conv][cuDNN][64-bit indexing] reduce memory usage of depthwise conv 64-bit indexing test ( #158981 )
...
Use half precision instead to reduce memory usage.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/158981
Approved by: https://github.com/soulitzer , https://github.com/Skylion007
2025-07-25 23:58:45 +00:00
PyTorch MergeBot
317af4c87b
Revert "[cuDNN][64-bit indexing] update conv depthwise 64bit indexing dispatch condition to match native kernel ( #156140 )"
...
This reverts commit a5f59cc2ea .
Reverted https://github.com/pytorch/pytorch/pull/156140 on behalf of https://github.com/atalman due to breaks internal builds ([comment](https://github.com/pytorch/pytorch/pull/156140#issuecomment-2988441548 ))
2025-06-19 15:09:29 +00:00
eqy
a5f59cc2ea
[cuDNN][64-bit indexing] update conv depthwise 64bit indexing dispatch condition to match native kernel ( #156140 )
...
The native kernel doesn't support batch splitting, so the previous check wasn't aggressive enough about dispatching to cuDNN.
https://github.com/pytorch/pytorch/issues/155225
Pull Request resolved: https://github.com/pytorch/pytorch/pull/156140
Approved by: https://github.com/ngimel
2025-06-18 17:32:36 +00:00
eqy
bd3c32916c
[cuDNN] Enabled dilation for deterministic convolutions in cuDNN ( #154292 )
...
Provides an order-of-magnitude speedup over the fallback implementation.
https://github.com/pytorch/pytorch/issues/28777
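A brief usage sketch of the newly allowed path (illustrative shapes; assumes a cuDNN-backed CUDA device):
```
import torch

# With deterministic algorithms enabled, a dilated convolution can now use
# cuDNN's deterministic kernels instead of the much slower fallback.
torch.use_deterministic_algorithms(True)
x = torch.randn(2, 8, 32, 32, device="cuda")
conv = torch.nn.Conv2d(8, 8, kernel_size=3, dilation=2, padding=2).to("cuda")
y = conv(x)  # deterministic across runs for identical inputs
```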
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154292
Approved by: https://github.com/Skylion007
Co-authored-by: Aaron Gokaslan <aaronGokaslan@gmail.com>
2025-06-11 23:35:52 +00:00
Joona Havukainen
981bdb39ca
Enable ConvTranspose3D for FP32 and Complex64 ( #154696 )
...
Fixes #154615
Enables ConvTranspose3D, since support appears to exist on both macOS 14 and 15.
For the half dtypes, the discrepancy between the CPU and GPU implementations is too large to conclude whether there is a bug in the implementation without a more rigorous study of the bounds on the expected error. They are therefore left unsupported for now, and an assert is added to notify the user if the op is called with fp16 or bf16 inputs.
Tests for ConvTranspose3D were enabled for the supported data types.
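A minimal usage sketch under the new support matrix (assumes an MPS-capable macOS 14+ machine; shapes are illustrative):
```
import torch

# ConvTranspose3d now works on MPS for float32 (and complex64); calling it
# with fp16/bf16 inputs instead raises, per the assert described above.
x = torch.randn(1, 4, 8, 8, 8, device="mps", dtype=torch.float32)
deconv = torch.nn.ConvTranspose3d(4, 2, kernel_size=3).to("mps")
y = deconv(x)
```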
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154696
Approved by: https://github.com/malfet
2025-06-02 16:24:03 +00:00
Aaron Gokaslan
dbad6d71c7
[BE][Ez]: Unskip conv1d MPS test ( #154795 )
...
Fixes an issue I noticed where the conv1d test was unconditionally skipped for complex types.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154795
Approved by: https://github.com/jansel
2025-05-31 23:01:19 +00:00
eqy
823a35807c
[CUDA][CUDNN] Dispatch to cuDNN for non-batch-splittable 64-bit NCHW convolutions ( #153101 )
...
For #152816
Pull Request resolved: https://github.com/pytorch/pytorch/pull/153101
Approved by: https://github.com/Skylion007
2025-05-20 20:19:03 +00:00
PyTorch MergeBot
bf0fe4f828
Revert "[CUDA][CUDNN] Dispatch to cuDNN for non-batch-splittable 64-bit NCHW convolutions ( #153101 )"
...
This reverts commit ced90d23d3 .
Reverted https://github.com/pytorch/pytorch/pull/153101 on behalf of https://github.com/jeanschmidt due to Seems to have introduced breakages on main, tentative revert: https://github.com/pytorch/pytorch/actions/runs/15024667248/job/42224521705 ([comment](https://github.com/pytorch/pytorch/pull/153101#issuecomment-2881208171 ))
2025-05-14 18:52:07 +00:00
eqy
ced90d23d3
[CUDA][CUDNN] Dispatch to cuDNN for non-batch-splittable 64-bit NCHW convolutions ( #153101 )
...
For #152816
Pull Request resolved: https://github.com/pytorch/pytorch/pull/153101
Approved by: https://github.com/Skylion007
2025-05-14 15:22:47 +00:00
Eddie Yan
ec68d082a1
[CUDA][TF32] Account for TF32 in test_conv2d_same_padding ( #152618 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/152618
Approved by: https://github.com/msaroufim , https://github.com/Skylion007
2025-05-02 20:19:00 +00:00
Jagadish Krishnamoorthy
0d99b4e9e2
ROCm: Enable tf32 testing on test_nn ( #148945 )
...
Add tf32 support for ROCm tests.
test command: python test/test_nn.py -v
Pull Request resolved: https://github.com/pytorch/pytorch/pull/148945
Approved by: https://github.com/jeffdaily
Co-authored-by: Jeff Daily <jeff.daily@amd.com>
2025-04-28 23:01:04 +00:00
Alvaro-Kothe
8ce3d4a541
test(Conv3d): use correct class for test_Conv3d_module_same_padding ( #152187 )
...
The test for the `Conv3d` class was calling `Conv2d`. This PR just ensures that we test the correct module.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/152187
Approved by: https://github.com/Skylion007
2025-04-28 16:59:12 +00:00
cyy
970fefcc53
Remove outdated skipCUDAIfCudnnVersionLessThan decoration ( #148940 )
...
Test conditions for cuDNN 7 and 8 were removed because we have moved to cuDNN 9.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/148940
Approved by: https://github.com/mikaylagawarecki
2025-03-13 18:02:50 +00:00
cyy
a5f6b24d87
Remove outdated skipIfRocmVersionLessThan decorations ( #148941 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/148941
Approved by: https://github.com/jeffdaily
2025-03-11 18:37:40 +00:00
Jeff Daily
44248c44eb
[ROCm] miopen benchmark behavior now better aligns with cudnn ( #145294 )
...
The default benchmark setting is now false. With the new MIOpen behavior, when benchmarking is disabled, any shape that doesn't have a find hit triggers a quick search (the same behavior as the prior default), and that result is used. When benchmarking is enabled, MIOpen performs an exhaustive search and updates any DBs. MIOpen immediate mode is still available; it is used when deterministic is true and benchmark is false.
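A short sketch of how this surfaces to users (the flag below is shared between cuDNN and MIOpen builds):
```
import torch

# benchmark=False (the new default): a find miss triggers a quick search whose
# result is used for that shape. benchmark=True: an exhaustive search runs and
# the find DBs are updated. deterministic=True with benchmark=False uses
# MIOpen immediate mode.
torch.backends.cudnn.benchmark = True      # exhaustive search, updates DBs
torch.backends.cudnn.deterministic = False
```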
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145294
Approved by: https://github.com/BrianHarrisonAMD , https://github.com/malfet
2025-02-05 17:19:53 +00:00
Benjamin Glass
5aa5a5763e
[inductor triton] Disable incorrect TF32 usage on CUDA capability < 8 ( #145684 )
...
Triton 2.2 and greater have a bug where allowing TF32 generation for a GPU that does not support TF32 will cause code generation errors. Patch around this problem by:
1. Adding a function to `torch.cuda` that determines whether CUDA hardware is capable of using the TF32 format.
2. Using that function to explicitly disable TF32 generation when calling Triton, where needed.
To demonstrate that this fix works, try running `test/inductor/test_max_autotune.py` on a GPU with CUDA compute capability < 8 (e.g., a pre-Ampere NVIDIA consumer GPU) without this fix. A hypothetical equivalent of the capability check is sketched below.
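This is an assumed equivalent only; the actual helper added to `torch.cuda` may be named and implemented differently:
```
import torch

# TF32 requires NVIDIA hardware with CUDA compute capability >= 8.0 (Ampere),
# so a capability query is enough to gate TF32 code generation.
def tf32_capable() -> bool:
    if not torch.cuda.is_available():
        return False
    major, _minor = torch.cuda.get_device_capability()
    return major >= 8
```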
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145684
Approved by: https://github.com/eqy
2025-01-28 22:01:08 +00:00
PyTorch MergeBot
6a4fb4b615
Revert "Align CPU behavior with CUDA for ConvTranspose when out_channels=0 ( #142859 )"
...
This reverts commit cb814c0b96 .
Reverted https://github.com/pytorch/pytorch/pull/142859 on behalf of https://github.com/malfet due to It broke ROCM tests again, see 5cd2b34e82/1 ([comment](https://github.com/pytorch/pytorch/pull/142859#issuecomment-2614523822 ))
2025-01-26 17:49:05 +00:00
Wu, Chunyuan
cb814c0b96
Align CPU behavior with CUDA for ConvTranspose when out_channels=0 ( #142859 )
...
Fixes https://github.com/pytorch/pytorch/issues/142466 .
Remove the `weight.numel() != 0` check to align the behavior with CUDA for `ConvTranspose` when `out_channels=0`. After removing this check, the existing code is already able to give an empty output in such a case (see the sketch after the test plan).
Test plan:
```
python -u test/nn/test_convolution.py -k test_ConvTranspose_output_channels_0_cpu_float32
python -u test/nn/test_convolution.py -k test_ConvTranspose_output_channels_0_cuda_float32
```
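A hedged repro of the aligned behavior (the output shape follows the usual ConvTranspose2d formula):
```
import torch

# With out_channels=0 the weight is empty; after removing the numel() check the
# CPU path returns an empty output, matching CUDA, instead of erroring.
deconv = torch.nn.ConvTranspose2d(in_channels=4, out_channels=0, kernel_size=3)
y = deconv(torch.randn(1, 4, 8, 8))
assert y.shape == (1, 0, 10, 10)  # (8 - 1) * 1 + 3 = 10 per spatial dim
```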
Pull Request resolved: https://github.com/pytorch/pytorch/pull/142859
Approved by: https://github.com/mingfeima , https://github.com/malfet
2025-01-26 01:56:40 +00:00
PyTorch MergeBot
d95a6babcc
Revert "Align CPU behavior with CUDA for ConvTranspose when out_channels=0 ( #142859 )"
...
This reverts commit 0bff377880 .
Reverted https://github.com/pytorch/pytorch/pull/142859 on behalf of https://github.com/huydhn due to Sorry for reverting your change but the XLA failures look legit ([comment](https://github.com/pytorch/pytorch/pull/142859#issuecomment-2608631019 ))
2025-01-23 01:10:31 +00:00
Wu, Chunyuan
0bff377880
Align CPU behavior with CUDA for ConvTranspose when out_channels=0 ( #142859 )
...
Fixes https://github.com/pytorch/pytorch/issues/142466 .
Remove the `weight.numel() != 0` check to align the behavior with CUDA for `ConvTranspose` when `out_channels=0`. After removing this check, the existing code is already able to give an empty output in such a case.
Test plan:
```
python -u test/nn/test_convolution.py -k test_ConvTranspose_output_channels_0_cpu_float32
python -u test/nn/test_convolution.py -k test_ConvTranspose_output_channels_0_cuda_float32
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/142859
Approved by: https://github.com/mingfeima , https://github.com/malfet
2025-01-22 17:52:53 +00:00
Tom Ritchford
eaef613688
Fix issue with test/nn/test_convolution:TestConvolutionNNDeviceTypeCUDA.test_conv_large_batch_1_cuda ( #145067 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145067
Approved by: https://github.com/Skylion007 , https://github.com/nWEIdia
Co-authored-by: Wei Wang <143543872+nWEIdia@users.noreply.github.com>
2025-01-17 20:31:25 +00:00
Tom Ritchford
c947a7d38e
Fix unused Python variables in test/nn ( #143396 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/143396
Approved by: https://github.com/mikaylagawarecki
2024-12-18 03:30:54 +00:00
Nikita Shulga
9c88b08ac9
[BE] Replace skipIfMPS with expectedFailureMPS ( #139940 )
...
Functionally two decorators are very similar, but one should rely on expectedFailure as much as possible to get signal when something is fixed.
- Move `product_version` variable from `test_mps` to common_utils, but call it `MACOS_VERSION`
- Introduce `skipIfMPSOnMacOS13` to decorate the hard crashes that happens only on MacOS13 (which at this point will not get any fixes and will be deprecated soon)
- Add `device_type='mps'` to all `skipIfMPS` per https://github.com/pytorch/pytorch/issues/140560
Pull Request resolved: https://github.com/pytorch/pytorch/pull/139940
Approved by: https://github.com/janeyx99 , https://github.com/huydhn
2024-11-15 03:48:37 +00:00
Eddie Yan
846b4e614b
[TF32][cuDNN][Convolution] Add some missing TF32 decorators ( #138768 )
...
Newer cuDNN versions seem to be able to dispatch these convolutions to TF32 kernels.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/138768
Approved by: https://github.com/Skylion007
2024-10-25 19:03:42 +00:00
Siddharth Kotapati
e27c0048db
Enable additional tests for MPS CI runs ( #134356 )
...
As part of the follow-up for https://github.com/pytorch/pytorch/issues/133520 , this adapts existing unused tests for use in MPS CI runs, focusing on NHWC and other memory-format tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134356
Approved by: https://github.com/malfet , https://github.com/eqy , https://github.com/huydhn
2024-10-04 21:52:38 +00:00
Mikayla Gawarecki
d9576c9440
Fix failures when default is flipped for weights_only ( #127627 )
...
Tests on the XLA shard are not fixed yet, but there is an issue tracking this: https://github.com/pytorch/xla/issues/7799
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127627
Approved by: https://github.com/albanD
ghstack dependencies: #132349
2024-08-16 00:22:43 +00:00
Xuehai Pan
fbe6f42dcf
[BE][Easy][8/19] enforce style for empty lines in import segments in test/[k-p]*/ ( #129759 )
...
See https://github.com/pytorch/pytorch/pull/129751#issue-2380881501 . Most changes are auto-generated by the linter.
You can review these PRs via:
```bash
git diff --ignore-all-space --ignore-blank-lines HEAD~1
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129759
Approved by: https://github.com/justinchuby , https://github.com/ezyang
2024-07-31 02:09:20 +00:00
eellison
28f29e074b
Dont mutate tensor stride in place in cudnn conv ( #126786 )
...
Fix for https://github.com/pytorch/pytorch/issues/126241 .
Within the cuDNN convolution, we were updating the tensor's strides in place to disambiguate size-1 dims between contiguous and channels-last tensors. Instead of mutating the tensor's strides, just use a temporary; inside cuDNN it is then copied: d7ccb5b3c4/include/cudnn_frontend_Tensor.h (L201-L203).
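A minimal regression sketch (assumes a cuDNN-backed CUDA device; shapes are illustrative):
```
import torch

# The input has a size-1 channel dim, where contiguous and channels-last
# layouts are ambiguous. Its strides must be left untouched by the conv now
# that the disambiguation happens on a temporary rather than in place.
x = torch.randn(2, 1, 8, 8, device="cuda").to(memory_format=torch.channels_last)
strides_before = x.stride()
conv = torch.nn.Conv2d(1, 4, kernel_size=3, padding=1).to("cuda")
conv(x)
assert x.stride() == strides_before
```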
Pull Request resolved: https://github.com/pytorch/pytorch/pull/126786
Approved by: https://github.com/ezyang , https://github.com/shunting314 , https://github.com/eqy
2024-05-22 01:53:44 +00:00
eqy
973d724e21
[CUDA] Fix 64-bit indexing in vol2col in conv3d ( #124650 )
...
Similar to #118005 , fixes sometimes-silent illegal memory accesses (IMAs).
CC @atalman @malfet
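An illustrative trigger, reusing the shape from the 64-bit indexing test shown earlier in this log (a sketch only; it needs a large-memory GPU and assumes this conv3d takes the vol2col path):
```
import torch

# The flattened vol2col buffer for this conv3d exceeds 2**31 - 1 elements
# (32 * 512 * 512 * 256 = 2**31), which previously risked a silent illegal
# memory access under 32-bit indexing.
x = torch.randn(1, 32, 512, 512, 256, device="cuda", dtype=torch.half)
conv = torch.nn.Conv3d(32, 1, kernel_size=1).to(device="cuda", dtype=torch.half)
y = conv(x)
```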
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124650
Approved by: https://github.com/soulitzer
2024-04-25 23:21:43 +00:00
PyTorch MergeBot
24ed909934
Revert "[CUDA] Fix 64-bit indexing in vol2col in conv3d ( #124650 )"
...
This reverts commit 71d92bace2 .
Reverted https://github.com/pytorch/pytorch/pull/124650 on behalf of https://github.com/jeanschmidt due to Reverting to check if it introduced regressions for linux-focal-rocm6.0-py3.8 tests ([comment](https://github.com/pytorch/pytorch/pull/124650#issuecomment-2076786795 ))
2024-04-25 09:46:21 +00:00
Eddie Yan
71d92bace2
[CUDA] Fix 64-bit indexing in vol2col in conv3d ( #124650 )
...
Similar to #118005 , fixes sometimes-silent illegal memory accesses (IMAs).
CC @atalman @malfet
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124650
Approved by: https://github.com/soulitzer
2024-04-24 19:47:18 +00:00
Yuanhao Ji
a625705290
Enable UFMT on all of test/nn ( #123809 )
...
Part of: #123062
Ran lintrunner on:
- `test/nn`
with command:
```bash
lintrunner -a --take UFMT --all-files
```
Co-authored-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123809
Approved by: https://github.com/mikaylagawarecki
2024-04-12 18:32:25 +00:00
eqy
624e58f2c6
[CUDA] Update size_1 conv tests with TF32 thresholds ( #118022 )
...
Addresses numerical mismatches seen on A100.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118022
Approved by: https://github.com/atalman
2024-04-09 23:49:40 +00:00
Eddie Yan
3db618d656
[CUDA] Use 64-bit indexing in CUDA_KERNEL_LOOP in im2col ( #118005 )
...
#117736
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118005
Approved by: https://github.com/atalman
2024-04-09 21:04:20 +00:00
Xia Weiwen
d1510e01fa
Upgrade submodule onednn to v3.3.5 ( #120767 )
...
This upgrade contains fixes for the known issues introduced by oneDNN v3.3.2, including https://github.com/pytorch/pytorch/issues/115346 , https://github.com/pytorch/pytorch/issues/120211 and https://github.com/pytorch/pytorch/issues/120406 , and those listed in PR #112700 .
Issue https://github.com/pytorch/pytorch/issues/115346 (perf regression) was fixed by oneDNN v3.3.4. No new regression was found with v3.3.5. The detailed results of v3.3.4 are given below and compared with v3.1.1 (the oneDNN version in PyTorch before it was updated to v3.3.2).
1. A performance regression with 5.8% perf drop from `pytorch_stargan-train` (see https://github.com/pytorch/benchmark/issues/2076#issuecomment-1847545843 )
Validation results with this patch: Latency increased by 0.60%
```
Tested on an Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz instance (IceLake)
oneDNN v3.1.1
metrics-1484287.json
{
"name": "cpu",
"environ": {
"pytorch_git_version": "6c8c5ad5eaf47a62fafbb4a2747198cbffbf1ff0"
},
"metrics": {
"latency": 418.851717
}
}
oneDNN v3.3.4
{
"name": "cpu",
"environ": {
"pytorch_git_version": "6c8c5ad5eaf47a62fafbb4a2747198cbffbf1ff0"
},
"metrics": {
"latency": 421.381313
}
}
```
2. Performance regression of FP32 rexnet_100 with Inductor, dynamic shape, multi-threads (see https://github.com/pytorch/pytorch/issues/115346#issue-2030859592 )
Validation results with this patch: Latency reduced by 3.23%
```
Tested on an Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz instance (IceLake)
oneDNN v3.1.1
(inductor speedup over eager mode) 2.876x
dev,name,batch_size,speedup,abs_latency,compilation_latency,compression_ratio,eager_peak_mem,dynamo_peak_mem,calls_captured,unique_graphs,graph_breaks,unique_graph_breaks
cpu,rexnet_100,128,2.875904,113.314765,18.455283,0.990437,1302.636134,1315.212902,351,1,0,0
oneDNN v3.3.4
(inductor speedup over eager mode) 3.003x
dev,name,batch_size,speedup,abs_latency,compilation_latency,compression_ratio,eager_peak_mem,dynamo_peak_mem,calls_captured,unique_graphs,graph_breaks,unique_graph_breaks
cpu,rexnet_100,128,3.003012,109.653012,91.547260,0.990048,1302.532506,1315.625370,351,1,0,0
```
3. Performance regression of AMP hf_T5_generate and tinynet_a with Inductor, static shape, multi-threads (see https://github.com/pytorch/pytorch/issues/115346#issuecomment-1856029962 )
Validation results with this patch: Latency reduced by 0.85%
```
Tested on an AWS spr metal instance
oneDNN v3.1.1
(inductor speedup over eager mode) 1.120x
dev,name,batch_size,speedup,abs_latency,compilation_latency,compression_ratio,eager_peak_mem,dynamo_peak_mem,calls_captured,unique_graphs,graph_breaks,unique_graph_breaks
cpu,hf_T5_generate,1,1.120018,1197.807729,205.905466,0.442803,125.179904,282.698957,10550,48,8,4
oneDNN v3.3.4
(inductor speedup over eager mode) 1.134x
dev,name,batch_size,speedup,abs_latency,compilation_latency,compression_ratio,eager_peak_mem,dynamo_peak_mem,calls_captured,unique_graphs,graph_breaks,unique_graph_breaks
cpu,hf_T5_generate,1,1.133594,1187.701514,205.855527,0.422012,128.405094,304.268493,10550,48,8,4
```
The following issues about functionality are fixed by this upgrade. Test cases are also added for these issues.
- https://github.com/pytorch/pytorch/issues/120211
- https://github.com/pytorch/pytorch/issues/120406
- https://github.com/pytorch/pytorch/issues/120547
-----
Below are detailed data from the torchbench CPU userbenchmark test and the Inductor FP32/AMP inference tests. No regression in performance or functionality was found.
I. *torchbench CPU userbenchmark test*
Suite | Speedup
-- | --
eager_throughtput_bf16_infer | 1.001848
eager_throughtput_fp32_infer | 1.000257
eager_throughtput_fx_int8 | 1.003069
jit_llga_throughtput_amp_bf16 | 1.000682
jit_llga_throughtput_fp32 | 1.000313
eager_throughtput_bf16_train | 0.998222
eager_throughtput_fp32_train | 1.003384
II. *Inductor FP32/AMP inference tests*
i. FP32 static default
suite | name | thread | batch size | Ratio Speedup(New/old)
-- | -- | -- | -- | --
torchbench | timm_efficientnet | multiple | 64 | 1.09
timm_models | tinynet_a | multiple | 128 | 1.14
ii. FP32 dynamic default
suite | name | thread | batch size | Ratio Speedup(New/old)
-- | -- | -- | -- | --
torchbench | alexnet | multiple | 128 | 1.08
torchbench | basic_gnn_edgecnn | multiple | 1 | 0.98
torchbench | timm_efficientnet | multiple | 64 | 1.08
iii. AMP static default
suite | name | thread | batch size | Ratio Speedup(New/old)
-- | -- | -- | -- | --
torchbench | hf_distil_whisper | multiple | 1 | 1.18
torchbench | timm_efficientnet | multiple | 64 | 1.32
huggingface | BartForConditionalGeneration | multiple | 2 | 1.19
timm_models | eca_halonext26ts | multiple | 128 | 1.13
timm_models | nfnet_l0 | multiple | 128 | 1.13
timm_models | rexnet_100 | multiple | 128 | 1.45
timm_models | spnasnet_100 | multiple | 128 | 1.15
timm_models | tf_efficientnet_b0 | multiple | 128 | 1.22
timm_models | tinynet_a | multiple | 128 | 1.49
torchbench | hf_Bert_large | single | 1 | 1.16
huggingface | XLNetLMHeadModel | single | 1 | 1.07
iv. AMP dynamic default
suite | name | thread | batch size | Ratio Speedup(New/old)
-- | -- | -- | -- | --
torchbench | timm_efficientnet | multiple | 64 | 1.32
huggingface | PLBartForConditionalGeneration | multiple | 4 | 1.14
timm_models | nfnet_l0 | multiple | 128 | 1.15
timm_models | rexnet_100 | multiple | 128 | 1.45
timm_models | tinynet_a | multiple | 128 | 1.34
huggingface | XLNetLMHeadModel | single | 1 | 1.09
-----
Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/120767
Approved by: https://github.com/chuanqi129 , https://github.com/jgong5 , https://github.com/atalman
2024-03-11 12:56:59 +00:00
Eddie Yan
d790c1dca6
[CUDA][cuDNN][TF32] Misc TF32 updates ( #118781 )
...
Twiddle some thresholds that don't seem to play nice with sm90.
CC @tinglvv @nWEIdia @ptrblck
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118781
Approved by: https://github.com/ezyang
2024-02-01 15:32:50 +00:00
Damien
2d2016fdf8
WIP Add compatibility with channels_last_3d for conv3d ( #114790 )
...
Part of a multi-PR work to fix #59168
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114790
Approved by: https://github.com/albanD
2023-12-20 19:28:25 +00:00
PyTorch MergeBot
a7bfa04da6
Revert "More markDynamoStrictTest ( #115870 )"
...
This reverts commit 7f686c8fe1 .
Reverted https://github.com/pytorch/pytorch/pull/115870 on behalf of https://github.com/jeanschmidt due to Breaking internal tests and builds, please check diff ([comment](https://github.com/pytorch/pytorch/pull/115870#issuecomment-1862997125 ))
2023-12-19 15:40:57 +00:00
rzou
7f686c8fe1
More markDynamoStrictTest ( #115870 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115870
Approved by: https://github.com/voznesenskym
ghstack dependencies: #115845 , #115855 , #115856 , #115857 , #115858
2023-12-15 05:26:54 +00:00
Jithun Nair
2ea2421b44
Skip unit tests that fail on MI210 runners ( #114613 )
...
Taken from https://github.com/pytorch/pytorch/pull/105980
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114613
Approved by: https://github.com/malfet
2023-11-27 22:25:35 +00:00
rraminen
44367c59b2
Update skip reason for failing unit tests on ROCm 5.7 ( #113286 )
...
Follow-up to https://github.com/pytorch/pytorch/pull/110465 . Updated the skip reason for unit tests failing on ROCm 5.7.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113286
Approved by: https://github.com/malfet
2023-11-13 19:29:04 +00:00
rraminen
3a429423fc
Upgrade CI to ROCm5.7 ( #110465 )
...
This PR upgrades CI to ROCm 5.7.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110465
Approved by: https://github.com/pruthvistony , https://github.com/malfet
2023-11-08 06:11:10 +00:00