Eddie Yan
71d92bace2
[CUDA] Fix 64-bit indexing in vol2col in conv3d ( #124650 )
Similar to #118005 , fixes illegal memory accesses (IMAs), sometimes silent, that occur when index arithmetic overflows 32 bits
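A minimal sketch (with hypothetical shapes, not ones taken from the PR) of why vol2col needs 64-bit indexing: the per-sample column buffer for a moderately large conv3d already holds more elements than `INT32_MAX`, so 32-bit index arithmetic wraps around and reads/writes out of bounds.

```python
# Hypothetical conv3d shapes, chosen only to illustrate the overflow.
INT32_MAX = 2**31 - 1

def vol2col_elements(c, kd, kh, kw, od, oh, ow):
    # vol2col materializes a (C*kD*kH*kW) x (D_out*H_out*W_out) column
    # buffer per sample; a flat index must be able to cover this many
    # elements, which a 32-bit int cannot for shapes like these.
    return c * kd * kh * kw * od * oh * ow

n = vol2col_elements(c=64, kd=3, kh=3, kw=3, od=128, oh=128, ow=128)
print(n, n > INT32_MAX)  # 3623878656 True
```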
CC @atalman @malfet
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124650
Approved by: https://github.com/soulitzer
2024-04-24 19:47:18 +00:00
Yuanhao Ji
a625705290
Enable UFMT on all of test/nn ( #123809 )
Part of: #123062
Ran lintrunner on:
- `test/nn`
with command:
```bash
lintrunner -a --take UFMT --all-files
```
Co-authored-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123809
Approved by: https://github.com/mikaylagawarecki
2024-04-12 18:32:25 +00:00
eqy
624e58f2c6
[CUDA] Update size_1 conv tests with TF32 thresholds ( #118022 )
Seeing some numerical mismatches on A100
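For context, TF32 keeps FP32's 8 exponent bits but only 10 explicit mantissa bits, which is why test thresholds must be looser than FP32 defaults. A rough stdlib-only simulation of the precision loss (using truncation, whereas the hardware rounds to nearest, so this slightly overstates the error):

```python
import struct

def tf32_round(x: float) -> float:
    # FP32 has 23 mantissa bits; TF32 keeps only the top 10, so zero
    # out the low 13 mantissa bits of the float32 bit pattern.
    bits = struct.unpack('<I', struct.pack('<f', x))[0]
    bits &= ~((1 << 13) - 1)
    return struct.unpack('<f', struct.pack('<I', bits))[0]

x = 1.0001
rel_err = abs(tf32_round(x) - x) / x
print(rel_err < 2**-10)  # True: relative error bounded by ~2**-10 ≈ 1e-3
```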
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118022
Approved by: https://github.com/atalman
2024-04-09 23:49:40 +00:00
Eddie Yan
3db618d656
[CUDA] Use 64-bit indexing in CUDA_KERNEL_LOOP in im2col ( #118005 )
#117736
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118005
Approved by: https://github.com/atalman
2024-04-09 21:04:20 +00:00
Xia Weiwen
d1510e01fa
Upgrade submodule onednn to v3.3.5 ( #120767 )
This upgrade contains the fixes to the known issues brought by oneDNN v3.3.2, including issues https://github.com/pytorch/pytorch/issues/115346 , https://github.com/pytorch/pytorch/issues/120211 and https://github.com/pytorch/pytorch/issues/120406 and those listed in PR #112700 .
Issue https://github.com/pytorch/pytorch/issues/115346 (perf regression) was fixed by oneDNN v3.3.4. No new regression was found with v3.3.5. The detailed results of v3.3.4 are given below and compared with v3.1.1 (the oneDNN version in PyTorch before it was updated to v3.3.2).
1. A performance regression with 5.8% perf drop from `pytorch_stargan-train` (see https://github.com/pytorch/benchmark/issues/2076#issuecomment-1847545843 )
Validation results with this patch: Latency increased by 0.60%
```
Tested on an Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz instance (IceLake)
oneDNN v3.1.1
metrics-1484287.json
{
"name": "cpu",
"environ": {
"pytorch_git_version": "6c8c5ad5eaf47a62fafbb4a2747198cbffbf1ff0"
},
"metrics": {
"latency": 418.851717
}
}
oneDNN v3.3.4
{
"name": "cpu",
"environ": {
"pytorch_git_version": "6c8c5ad5eaf47a62fafbb4a2747198cbffbf1ff0"
},
"metrics": {
"latency": 421.381313
}
}
```
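As a sanity check, the stated 0.60% figure follows directly from the two latency values in the metrics JSON above:

```python
# Latencies taken from the metrics JSON above.
latency_v311 = 418.851717   # oneDNN v3.1.1
latency_v334 = 421.381313   # oneDNN v3.3.4

def pct_change(old, new):
    # Percentage change of `new` relative to `old`.
    return (new - old) / old * 100.0

print(round(pct_change(latency_v311, latency_v334), 2))  # 0.6 -> "increased by 0.60%"
```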
2. Performance regression of FP32 rexnet_100 with Inductor, dynamic shape, multi-threads (see https://github.com/pytorch/pytorch/issues/115346#issue-2030859592 )
Validation results with this patch: Latency reduced by 3.23%
```
Tested on an Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz instance (IceLake)
oneDNN v3.1.1
(inductor speedup over eager mode) 2.876x
dev,name,batch_size,speedup,abs_latency,compilation_latency,compression_ratio,eager_peak_mem,dynamo_peak_mem,calls_captured,unique_graphs,graph_breaks,unique_graph_breaks
cpu,rexnet_100,128,2.875904,113.314765,18.455283,0.990437,1302.636134,1315.212902,351,1,0,0
oneDNN v3.3.4
(inductor speedup over eager mode) 3.003x
dev,name,batch_size,speedup,abs_latency,compilation_latency,compression_ratio,eager_peak_mem,dynamo_peak_mem,calls_captured,unique_graphs,graph_breaks,unique_graph_breaks
cpu,rexnet_100,128,3.003012,109.653012,91.547260,0.990048,1302.532506,1315.625370,351,1,0,0
```
3. Performance regression of AMP hf_T5_generate and tinynet_a with Inductor, static shape, multi-threads (see https://github.com/pytorch/pytorch/issues/115346#issuecomment-1856029962 )
Validation results with this patch: Latency reduced by 0.85%
```
Tested on an AWS spr metal instance
oneDNN v3.1.1
(inductor speedup over eager mode) 1.120x
dev,name,batch_size,speedup,abs_latency,compilation_latency,compression_ratio,eager_peak_mem,dynamo_peak_mem,calls_captured,unique_graphs,graph_breaks,unique_graph_breaks
cpu,hf_T5_generate,1,1.120018,1197.807729,205.905466,0.442803,125.179904,282.698957,10550,48,8,4
oneDNN v3.3.4
(inductor speedup over eager mode) 1.134x
dev,name,batch_size,speedup,abs_latency,compilation_latency,compression_ratio,eager_peak_mem,dynamo_peak_mem,calls_captured,unique_graphs,graph_breaks,unique_graph_breaks
cpu,hf_T5_generate,1,1.133594,1187.701514,205.855527,0.422012,128.405094,304.268493,10550,48,8,4
```
The following functionality issues are fixed by this upgrade. Test cases are also added for them.
- https://github.com/pytorch/pytorch/issues/120211
- https://github.com/pytorch/pytorch/issues/120406
- https://github.com/pytorch/pytorch/issues/120547
-----
Below are detailed data from the torchbench CPU userbenchmark test and the Inductor FP32/AMP inference tests. No perf or functionality regression was found.
I. *torchbench CPU userbenchmark test*
Suite | Speedup
-- | --
eager_throughtput_bf16_infer | 1.001848
eager_throughtput_fp32_infer | 1.000257
eager_throughtput_fx_int8 | 1.003069
jit_llga_throughtput_amp_bf16 | 1.000682
jit_llga_throughtput_fp32 | 1.000313
eager_throughtput_bf16_train | 0.998222
eager_throughtput_fp32_train | 1.003384
II. *Inductor FP32/AMP inference tests*
i. FP32 static default
suite | name | thread | batch size | Ratio Speedup(New/old)
-- | -- | -- | -- | --
torchbench | timm_efficientnet | multiple | 64 | 1.09
timm_models | tinynet_a | multiple | 128 | 1.14
ii. FP32 dynamic default
suite | name | thread | batch size | Ratio Speedup(New/old)
-- | -- | -- | -- | --
torchbench | alexnet | multiple | 128 | 1.08
torchbench | basic_gnn_edgecnn | multiple | 1 | 0.98
torchbench | timm_efficientnet | multiple | 64 | 1.08
iii. AMP static default
suite | name | thread | batch size | Ratio Speedup(New/old)
-- | -- | -- | -- | --
torchbench | hf_distil_whisper | multiple | 1 | 1.18
torchbench | timm_efficientnet | multiple | 64 | 1.32
huggingface | BartForConditionalGeneration | multiple | 2 | 1.19
timm_models | eca_halonext26ts | multiple | 128 | 1.13
timm_models | nfnet_l0 | multiple | 128 | 1.13
timm_models | rexnet_100 | multiple | 128 | 1.45
timm_models | spnasnet_100 | multiple | 128 | 1.15
timm_models | tf_efficientnet_b0 | multiple | 128 | 1.22
timm_models | tinynet_a | multiple | 128 | 1.49
torchbench | hf_Bert_large | single | 1 | 1.16
huggingface | XLNetLMHeadModel | single | 1 | 1.07
iv. AMP dynamic default
suite | name | thread | batch size | Ratio Speedup(New/old)
-- | -- | -- | -- | --
torchbench | timm_efficientnet | multiple | 64 | 1.32
huggingface | PLBartForConditionalGeneration | multiple | 4 | 1.14
timm_models | nfnet_l0 | multiple | 128 | 1.15
timm_models | rexnet_100 | multiple | 128 | 1.45
timm_models | tinynet_a | multiple | 128 | 1.34
huggingface | XLNetLMHeadModel | single | 1 | 1.09
-----
Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/120767
Approved by: https://github.com/chuanqi129 , https://github.com/jgong5 , https://github.com/atalman
2024-03-11 12:56:59 +00:00
Eddie Yan
d790c1dca6
[CUDA][cuDNN][TF32] Misc TF32 updates ( #118781 )
Twiddle some thresholds that don't seem to play nice with sm90.
CC @tinglvv @nWEIdia @ptrblck
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118781
Approved by: https://github.com/ezyang
2024-02-01 15:32:50 +00:00
Damien
2d2016fdf8
WIP Add compatibility with channels_last_3d for conv3d ( #114790 )
Part of a multi-PR work to fix #59168
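For reference, `channels_last_3d` keeps the logical NCDHW shape but stores channels innermost (physical order N, D, H, W, C). A small sketch, assuming the standard contiguous-stride formula for this memory format:

```python
def channels_last_3d_strides(n, c, d, h, w):
    # Logical shape stays (N, C, D, H, W), but C has stride 1 and the
    # spatial strides each carry a factor of C.
    return (c * d * h * w, 1, h * w * c, w * c, c)

print(channels_last_3d_strides(2, 8, 4, 4, 4))  # (512, 1, 128, 32, 8)
```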
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114790
Approved by: https://github.com/albanD
2023-12-20 19:28:25 +00:00
PyTorch MergeBot
a7bfa04da6
Revert "More markDynamoStrictTest ( #115870 )"
This reverts commit 7f686c8fe1 .
Reverted https://github.com/pytorch/pytorch/pull/115870 on behalf of https://github.com/jeanschmidt due to Breaking internal tests and builds, please check diff ([comment](https://github.com/pytorch/pytorch/pull/115870#issuecomment-1862997125 ))
2023-12-19 15:40:57 +00:00
rzou
7f686c8fe1
More markDynamoStrictTest ( #115870 )
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115870
Approved by: https://github.com/voznesenskym
ghstack dependencies: #115845 , #115855 , #115856 , #115857 , #115858
2023-12-15 05:26:54 +00:00
Jithun Nair
2ea2421b44
Skip unit tests that fail on MI210 runners ( #114613 )
Taken from https://github.com/pytorch/pytorch/pull/105980
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114613
Approved by: https://github.com/malfet
2023-11-27 22:25:35 +00:00
rraminen
44367c59b2
Update skip reason for failing unit tests on ROCm 5.7 ( #113286 )
Follow-up to https://github.com/pytorch/pytorch/pull/110465 : updated the skip reason for unit tests failing on ROCm 5.7.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113286
Approved by: https://github.com/malfet
2023-11-13 19:29:04 +00:00
rraminen
3a429423fc
Upgrade CI to ROCm5.7 ( #110465 )
This PR upgrades CI to ROCm 5.7.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110465
Approved by: https://github.com/pruthvistony , https://github.com/malfet
2023-11-08 06:11:10 +00:00
Pruthvi Madugundu
9ce2e02fd6
Revert "[ROCm] Remove PYTORCH_MIOPEN_SUGGEST_NHWC flag ( #90725 )" ( #110319 )
This reverts commit 66bfcd32fd .
NHWC has a perf regression on MIOpen, so reverting until the performance issue is fixed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110319
Approved by: https://github.com/jeffdaily , https://github.com/jithunnair-amd , https://github.com/kit1980
2023-10-03 19:14:47 +00:00
CaoE
7c9052165a
add fp16 support for native conv and deconv on CPU ( #99497 )
### Testing
Native conv vs. mkldnn conv on SPR (with avx512_fp16 support)
Single core:
Input | Naïve impl / us | oneDNN / us | Speed up
-- | -- | -- | --
IC: 64, OC: 256, kernel: 1, stride: 1, N: 256, H: 56, W: 56, G: 1, pad: 0 | 34676789 | 524199.8 | 66.15185
IC: 128, OC: 512, kernel: 1, stride: 1, N: 256, H: 28, W: 28, G: 1, pad: 0 | 33454125 | 349844.4 | 95.62573
IC: 256, OC: 256, kernel: 3, stride: 1, N: 1, H: 16, W: 16, G: 1, pad: 0 | 317650.1 | 2317.677 | 137.0554
IC: 128, OC: 256, kernel: 3, stride: 1, N: 1, L: 64 | 15334.68 | 167.264 | 91.67952
56 cores:
Input | Naïve impl / us | oneDNN / us | Speed up
-- | -- | -- | --
IC: 64, OC: 256, kernel: 1, stride: 1, N: 256, H: 56, W: 56, G: 1, pad: 0 | 1032064 | 11073.58 | 93.20061
IC: 128, OC: 512, kernel: 1, stride: 1, N: 256, H: 28, W: 28, G: 1, pad: 0 | 1000097 | 16371.19 | 61.08883
IC: 256, OC: 1024, kernel: 1, stride: 1, N: 256, H: 14, W: 14, G: 1, pad: 0 | 981813.4 | 9008.908 | 108.9825
IC: 1024, OC: 256, kernel: 1, stride: 1, N: 256, H: 14, W: 14, G: 1, pad: 0 | 1082606 | 10150.47 | 106.6558
IC: 256, OC: 256, kernel: 3, stride: 1, N: 1, H: 16, W: 16, G: 1, pad: 0 | 319980.6 | 181.598 | 1762.027
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99497
Approved by: https://github.com/jgong5 , https://github.com/cpuhrsch
2023-09-25 01:31:26 +00:00
Justin Chu
79c5e33349
[BE] Enable ruff's UP rules and autoformat nn/ mps/ and torch/ ( #105436 )
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105436
Approved by: https://github.com/malfet , https://github.com/albanD
2023-07-21 07:38:46 +00:00
Fuzzkatt
6d570ccd59
tf32 context fixes for various tests ( #103137 )
Addresses tf32 context related failures from NVIDIA internal testing for following unit tests:
H100:
- functorch/test_vmap.py: test_op_has_batch_rule
A100:
- test_expanded_weights.py: test_cnn_model_sum
- nn/test_convolution.py: test_conv2d_same_padding_backward
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103137
Approved by: https://github.com/zou3519
2023-06-15 02:33:12 +00:00
Fuzzkatt
f8896b7b0e
update tf32 thresholds in nn/test_convolution.py ( #102015 )
updated tf32 thresholds for test_cudnn_convolution_relu, test_cudnn_convolution_add_relu
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102015
Approved by: https://github.com/ngimel
2023-05-24 22:42:25 +00:00
Fuzzkatt
47e9dba765
move tf32_on_and_off fix for test_convolution.py ( #102007 )
Move `tf32_on_and_off` after `@torch.backends.cudnn.flags(enabled=True, benchmark=False)`, because the `cudnn.flags` decorator overwrites `tf32_on_and_off`'s settings when it comes after it.
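The ordering matters because the decorator listed lower (nearer the function) wraps innermost, so its context is entered last and wins for any setting both decorators touch. A toy sketch with stand-in decorators (the names and global flag here are illustrative, not PyTorch's):

```python
STATE = {"allow_tf32": False}

def set_flag(value):
    # Decorator that sets STATE["allow_tf32"] for the duration of the call,
    # restoring the previous value afterwards (like a context-manager decorator).
    def deco(fn):
        def wrapper(*args, **kwargs):
            old = STATE["allow_tf32"]
            STATE["allow_tf32"] = value
            try:
                return fn(*args, **kwargs)
            finally:
                STATE["allow_tf32"] = old
        return wrapper
    return deco

@set_flag(False)   # analogous to torch.backends.cudnn.flags(...): listed first
@set_flag(True)    # analogous to tf32_on_and_off: listed last, runs innermost
def observe():
    return STATE["allow_tf32"]

print(observe())  # True: the lower decorator's setting is active inside the test
```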
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102007
Approved by: https://github.com/ngimel
2023-05-24 02:23:06 +00:00
kshitij12345
3b966a6ce3
[autograd] disable backward/grad for complex scalar output ( #92753 )
Fixes https://github.com/pytorch/pytorch/issues/92750
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92753
Approved by: https://github.com/ezyang
2023-02-23 11:38:27 +00:00
Jeff Daily
66bfcd32fd
[ROCm] Remove PYTORCH_MIOPEN_SUGGEST_NHWC flag ( #90725 )
Fixes #64427 . MIOpen supports ChannelsLast. No longer need to opt-in with env var.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90725
Approved by: https://github.com/malfet
2023-02-09 22:26:24 +00:00
mingfeima
26cba842ad
Optimize ConvTransposed2D with mkldnn float32 and bfloat16 on CPU ( #92530 )
This PR optimizes `ConvTranspose2d` with oneDNN and adds channels-last support for it. The fallback path `slow_conv_transpose2d` also gains channels-last support, so the memory-format propagation behavior stays the same with or without oneDNN.
Replaces https://github.com/pytorch/pytorch/pull/77060 , https://github.com/pytorch/pytorch/pull/70897 and https://github.com/pytorch/pytorch/pull/74023 , which enabled oneDNN for `ConvTranspose2d` and `ConvTranspose3d`.
The following results collects on Skylake Xeon 8180, dual sockets, 28 cores per socket.
### single core channels last
configs | forward before/ms | forward after/ms | ratio | backward before/ms | backward after/ms | ratio
-- | -- | -- | -- | -- | -- | --
input size: (32, 32, 100, 100), weight size: (32, 32, 3, 3) | 181.36 | 91.16 | 1.99 | 531.38 | 124.08 | 4.28
input size: (32, 16, 200, 200), weight size: (16, 16, 3, 3) | 324.35 | 153.50 | 2.11 | 973.16 | 185.97 | 5.23
input size: (32, 128, 100, 100), weight size: (128, 128, 3, 3) | 1086.82 | 671.52 | 1.62 | 3008.94 | 1453.33 | 2.07
### single core channels first
configs | forward before/ms | forward after/ms | ratio | backward before/ms | backward after/ms | ratio
-- | -- | -- | -- | -- | -- | --
input size: (32, 32, 100, 100), weight size: (32, 32, 3, 3) | 138.10 | 5.94 | 23.23 | 37.97 | 11.25 | 3.38
input size: (32, 16, 200, 200), weight size: (16, 16, 3, 3) | 236.43 | 8.75 | 27.03 | 87.77 | 18.58 | 4.72
input size: (32, 128, 100, 100), weight size: (128, 128, 3, 3) | 484.39 | 37.69 | 12.85 | 185.40 | 90.57 | 2.05
### single socket channels last
configs | forward before/ms | forward after/ms | ratio | backward before/ms | backward after/ms | ratio
-- | -- | -- | -- | -- | -- | --
input size: (32, 32, 100, 100), weight size: (32, 32, 3, 3) | 138.10 | 5.94 | 23.23 | 37.97 | 11.25 | 3.38
input size: (32, 16, 200, 200), weight size: (16, 16, 3, 3) | 236.43 | 8.75 | 27.03 | 87.77 | 18.58 | 4.72
input size: (32, 128, 100, 100), weight size: (128, 128, 3, 3) | 484.39 | 37.69 | 12.85 | 185.40 | 90.57 | 2.0
### single socket channels first
configs | forward before/ms | forward after/ms | ratio | backward before/ms | backward after/ms | ratio
-- | -- | -- | -- | -- | -- | --
input size: (32, 32, 100, 100), weight size: (32, 32, 3, 3) | 132.56 | 7.19 | 18.43 | 31.43 | 11.20 | 2.81
input size: (32, 16, 200, 200), weight size: (16, 16, 3, 3) | 227.94 | 13.33 | 17.11 | 63.00 | 23.41 | 2.69
input size: (32, 128, 100, 100), weight size: (128, 128, 3, 3) | 473.68 | 52.79 | 8.97 | 150.40 | 87.33 | 1.72
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92530
Approved by: https://github.com/jgong5 , https://github.com/ezyang
2023-02-06 10:11:25 +00:00
Jeff Daily
72502b94f3
correct use of torch.backends.cudnn.flags() ( #93182 )
Fixes #77467 .
Pull Request resolved: https://github.com/pytorch/pytorch/pull/93182
Approved by: https://github.com/ngimel
2023-01-28 06:50:06 +00:00
Eddie Yan
dabf515c18
[cuDNN][cuDNN V8 API] (re-re-re-open) cuDNN V8 API on by default ( #91117 )
Re-opening following #91025
CC @ptrblck @ngimel
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91117
Approved by: https://github.com/ngimel
2022-12-20 18:52:29 +00:00
PyTorch MergeBot
ba7aeac37b
Revert "[cuDNN][cuDNN V8 API] (re-re-open) cuDNN V8 API on by default ( #89022 )"
This reverts commit eecd621f06 .
Reverted https://github.com/pytorch/pytorch/pull/89022 on behalf of https://github.com/ngimel due to breaks some convolution configurations #91025
2022-12-16 23:06:35 +00:00
Eddie Yan
eecd621f06
[cuDNN][cuDNN V8 API] (re-re-open) cuDNN V8 API on by default ( #89022 )
Testing V8 on by default again after fixes have been merged for e.g., https://github.com/pytorch/torchdynamo/issues/1833
One new failure that seems to surface with V8 on appears in halonext + amp:
```
RuntimeError: Internal Triton PTX codegen error:
Segmentation fault (core dumped)
```
But I'm not sure whether this points to a V8 issue or a Triton issue. CC @ngimel @ptrblck
Current dynamo benchmarks on A100:
v7 vs. v8
|dev |name |batch_size|abs_latency_v7|abs_latency_v8|
|----|-------------------------------|----------|--------------|--------------|
|cuda|adv_inception_v3 |128 |166.0240 |165.5798 |
|cuda|beit_base_patch16_224 |64 |123.5912 |123.0797 |
|cuda|botnet26t_256 |128 |107.7343 |107.5948 |
|cuda|cait_m36_384 |4 |184.5038 |184.0271 |
|cuda|coat_lite_mini |128 |142.3061 |140.5814 |
|cuda|convit_base |64 |165.2499 |161.0743 |
|cuda|convmixer_768_32 |32 |325.6984 |325.7094 |
|cuda|convnext_base |64 |237.4632 |238.0142 |
|cuda|crossvit_9_240 |128 |72.2980 |72.4367 |
|cuda|cspdarknet53 |64 |96.6862 |96.8308 |
|cuda|deit_base_distilled_patch16_224|64 |117.6045 |117.9616 |
|cuda|dla102 |128 |182.3073 |182.2304 |
|cuda|dm_nfnet_f0 |128 |133.6011 |133.6298 |
|cuda|dpn107 |32 |148.5080 |148.5885 |
|cuda|eca_botnext26ts_256 |128 |113.8676 |113.1514 |
|cuda|eca_halonext26ts |128 |119.2242 |119.1845 |
|cuda|ese_vovnet19b_dw |128 |80.0217 |79.9438 |
|cuda|fbnetc_100 |128 |91.4548 |91.4009 |
|cuda|fbnetv3_b |128 |115.4496 |115.5058 |
|cuda|gernet_l |128 |114.8365 |114.7870 |
|cuda|ghostnet_100 |128 |58.5766 |58.5766 |
|cuda|gluon_inception_v3 |128 |165.5222 |165.7167 |
|cuda|gluon_xception65 |32 |165.8779 |165.7818 |
|cuda|gmixer_24_224 |128 |116.3611 |113.4925 |
|cuda|gmlp_s16_224 |128 |121.2607 |121.2534 |
|cuda|hrnet_w18 |128 |246.5706 |246.7599 |
|cuda|inception_v3 |128 |166.1096 |166.2034 |
|cuda|jx_nest_base |32 |93.6064 |93.4088 |
|cuda|lcnet_050 |128 |21.4156 |21.4207 |
|cuda|levit_128 |128 |27.2901 |27.2543 |
|cuda|mixer_b16_224 |128 |157.8992 |158.2878 |
|cuda|mixnet_l |128 |197.3443 |197.2125 |
|cuda|mnasnet_100 |128 |71.4604 |71.2997 |
|cuda|mobilenetv2_100 |128 |67.6080 |67.7515 |
|cuda|mobilenetv3_large_100 |128 |57.7224 |57.6591 |
|cuda|mobilevit_s |64 |93.0372 |93.0530 |
|cuda|nfnet_l0 |128 |113.1664 |113.2853 |
|cuda|pit_b_224 |64 |133.3333 |133.4153 |
|cuda|pnasnet5large |16 |238.9545 |238.8122 |
|cuda|poolformer_m36 |64 |144.2353 |144.2375 |
|cuda|regnety_002 |128 |32.8534 |32.9069 |
|cuda|repvgg_a2 |128 |102.4150 |102.3827 |
|cuda|res2net101_26w_4s |64 |120.8127 |120.8322 |
|cuda|res2net50_14w_8s |128 |149.7052 |149.8969 |
|cuda|res2next50 |128 |153.7439 |153.8215 |
|cuda|resmlp_12_224 |128 |89.1918 |86.9226 |
|cuda|resnest101e |64 |159.4706 |159.3133 |
|cuda|rexnet_100 |128 |88.0032 |88.0397 |
|cuda|sebotnet33ts_256 |64 |80.4635 |80.0120 |
|cuda|selecsls42b |128 |70.4430 |70.3663 |
|cuda|spnasnet_100 |128 |78.0537 |78.1991 |
|cuda|swin_base_patch4_window7_224 |64 |212.9073 |213.0824 |
|cuda|swsl_resnext101_32x16d |32 |193.0229 |193.0404 |
|cuda|tf_efficientnet_b0 |128 |97.1316 |97.0410 |
|cuda|tf_mixnet_l |128 |203.4956 |203.5340 |
|cuda|tinynet_a |128 |82.4038 |82.8733 |
|cuda|tnt_s_patch16_224 |128 |284.8576 |284.8867 |
|cuda|twins_pcpvt_base |64 |118.3893 |119.2329 |
|cuda|visformer_small |128 |126.0533 |126.0390 |
|cuda|vit_base_patch16_224 |64 |118.2873 |118.0573 |
|cuda|volo_d1_224 |64 |108.7764 |108.2063 |
|cuda|xcit_large_24_p8_224 |5 |100.4656 |100.5209 |
v7 vs. v8 amp
|dev |name |batch_size|abs_latency_v7|abs_latency_v8|
|----|-------------------------------|----------|--------------|--------------|
|cuda|adv_inception_v3 |128 |104.9729 |105.1237 |
|cuda|beit_base_patch16_224 |64 |75.4330 |75.2039 |
|cuda|botnet26t_256 |128 |74.5149 |74.8071 |
|cuda|cait_m36_384 |4 |110.9788 |111.5170 |
|cuda|coat_lite_mini |128 |62.3618 |64.4965 |
|cuda|convit_base |64 |116.4054 |117.9129 |
|cuda|convmixer_768_32 |32 |264.4401 |264.4491 |
|cuda|convnext_base |64 |182.9009 |179.2136 |
|cuda|crossvit_9_240 |128 |48.8586 |48.8359 |
|cuda|cspdarknet53 |64 |80.0245 |80.0160 |
|cuda|deit_base_distilled_patch16_224|64 |66.5921 |66.7448 |
|cuda|dla102 |128 |116.7780 |117.1683 |
|cuda|dm_nfnet_f0 |128 |78.9322 |79.1135 |
|cuda|dpn107 |32 |85.5206 |85.7514 |
|cuda|eca_botnext26ts_256 |128 |76.3672 |77.0050 |
|cuda|eca_halonext26ts |128 |86.2458 | |
|cuda|ese_vovnet19b_dw |128 |43.2943 |43.3379 |
|cuda|fbnetc_100 |128 |54.8479 |54.9251 |
|cuda|fbnetv3_b |128 |70.7504 |71.0188 |
|cuda|gernet_l |128 |66.1607 |66.0379 |
|cuda|ghostnet_100 |128 |43.8882 |43.9336 |
|cuda|gluon_inception_v3 |128 |104.9297 |105.0204 |
|cuda|gluon_xception65 |32 |85.7118 |85.8370 |
|cuda|gmixer_24_224 |128 |75.1214 |76.1170 |
|cuda|gmlp_s16_224 |128 |76.4207 |76.6641 |
|cuda|hrnet_w18 |128 |186.1326 |186.2435 |
|cuda|inception_v3 |128 |105.0561 |105.0783 |
|cuda|jx_nest_base |32 |65.3066 |65.3245 |
|cuda|lcnet_050 |128 |14.7991 |14.8687 |
|cuda|levit_128 |128 |19.2893 |19.4772 |
|cuda|mixer_b16_224 |128 |93.9826 |94.2056 |
|cuda|mixnet_l |128 |147.1245 |147.0435 |
|cuda|mnasnet_100 |128 |39.1781 |39.2565 |
|cuda|mobilenetv2_100 |128 |42.3704 |42.3114 |
|cuda|mobilenetv3_large_100 |128 |37.2946 |37.2816 |
|cuda|mobilevit_s |64 |55.8930 |55.8934 |
|cuda|nfnet_l0 |128 |64.0448 |64.4438 |
|cuda|pit_b_224 |64 |80.6342 |80.2933 |
|cuda|pnasnet5large |16 |154.9611 |154.8654 |
|cuda|poolformer_m36 |64 |101.7489 |101.8138 |
|cuda|regnety_002 |128 |27.0939 |27.0309 |
|cuda|repvgg_a2 |128 |60.9651 |61.2533 |
|cuda|res2net101_26w_4s |64 |77.3291 |77.4739 |
|cuda|res2net50_14w_8s |128 |93.6572 |93.7221 |
|cuda|res2next50 |128 |112.4975 |112.3248 |
|cuda|resmlp_12_224 |128 |59.5422 |60.7644 |
|cuda|resnest101e |64 |97.9894 |98.3358 |
|cuda|rexnet_100 |128 |55.2218 |55.0718 |
|cuda|sebotnet33ts_256 |64 |60.4880 |60.8113 |
|cuda|selecsls42b |128 |41.4294 |41.5341 |
|cuda|spnasnet_100 |128 |45.0037 |45.0304 |
|cuda|swin_base_patch4_window7_224 |64 |98.2561 |98.6925 |
|cuda|swsl_resnext101_32x16d |32 |100.6179 |100.9195 |
|cuda|tf_efficientnet_b0 |128 |56.5344 |56.4591 |
|cuda|tf_mixnet_l |128 |153.0318 |152.9367 |
|cuda|tinynet_a |128 |54.1307 |53.9298 |
|cuda|tnt_s_patch16_224 |128 |142.4801 |142.6589 |
|cuda|twins_pcpvt_base |64 |67.9027 |67.8325 |
|cuda|visformer_small |128 |72.5589 |72.9427 |
|cuda|vit_base_patch16_224 |64 |71.4885 |71.7342 |
|cuda|volo_d1_224 |64 |69.3539 |69.5910 |
|cuda|xcit_large_24_p8_224 |5 |59.9000 |59.9699 |
v7 vs. v8 float16
|dev |name |batch_size|abs_latency|abs_latency|
|----|-------------------------------|----------|-----------|-----------|
|cuda|adv_inception_v3 |128 |104.2544 |104.2677 |
|cuda|beit_base_patch16_224 |64 |85.3601 |85.3786 |
|cuda|botnet26t_256 |128 |72.1476 |71.8277 |
|cuda|cait_m36_384 |4 |108.3075 |108.5941 |
|cuda|coat_lite_mini |128 |61.2382 |61.6049 |
|cuda|convmixer_768_32 |32 |263.3818 |263.3598 |
|cuda|convnext_base |64 |172.6821 |173.8520 |
|cuda|crossvit_9_240 |128 |44.6321 |44.6340 |
|cuda|cspdarknet53 |64 |79.3165 |79.2964 |
|cuda|deit_base_distilled_patch16_224|64 |61.9816 |62.2109 |
|cuda|dla102 |128 |115.7403 |115.9928 |
|cuda|dm_nfnet_f0 |128 |77.5434 |77.7440 |
|cuda|dpn107 |32 |83.6489 |83.5605 |
|cuda|eca_botnext26ts_256 |128 |73.9953 |74.1031 |
|cuda|eca_halonext26ts |128 |81.7951 |81.7103 |
|cuda|ese_vovnet19b_dw |128 |42.9618 |42.8853 |
|cuda|fbnetc_100 |128 |54.3590 |54.3575 |
|cuda|fbnetv3_b |128 |69.7977 |70.1696 |
|cuda|gernet_l |128 |64.8684 |65.1726 |
|cuda|ghostnet_100 |128 |43.2054 |43.1319 |
|cuda|gluon_inception_v3 |128 |104.1988 |104.3030 |
|cuda|gluon_xception65 |32 |84.2245 |84.5085 |
|cuda|gmixer_24_224 |128 |82.0418 |82.7252 |
|cuda|gmlp_s16_224 |128 |75.4792 |75.8374 |
|cuda|hrnet_w18 |128 |184.1450 |184.1848 |
|cuda|inception_v3 |128 |104.1203 |104.2536 |
|cuda|jx_nest_base |32 |58.2386 |58.4901 |
|cuda|lcnet_050 |128 |14.6409 |14.5616 |
|cuda|levit_128 |128 |22.3875 |22.4680 |
|cuda|mixer_b16_224 |128 |98.9534 |98.4730 |
|cuda|mixnet_l |128 |146.1623 |146.1947 |
|cuda|mnasnet_100 |128 |38.9208 |39.3463 |
|cuda|mobilenetv2_100 |128 |41.8946 |41.9847 |
|cuda|mobilenetv3_large_100 |128 |36.7810 |36.8264 |
|cuda|mobilevit_s |64 |55.3211 |55.3186 |
|cuda|nfnet_l0 |128 |63.1302 |63.5544 |
|cuda|pit_b_224 |64 |73.8752 |73.4602 |
|cuda|pnasnet5large |16 |151.6806 |151.6111 |
|cuda|poolformer_m36 |64 |86.8341 |86.8021 |
|cuda|regnety_002 |128 |26.6798 |26.5295 |
|cuda|repvgg_a2 |128 |61.6652 |62.1482 |
|cuda|res2net101_26w_4s |64 |75.8037 |75.7739 |
|cuda|res2net50_14w_8s |128 |92.6362 |92.4338 |
|cuda|res2next50 |128 |111.5371 |111.5832 |
|cuda|resmlp_12_224 |128 |58.2349 |57.9807 |
|cuda|resnest101e |64 |96.1114 |96.2742 |
|cuda|rexnet_100 |128 |54.8138 |54.7643 |
|cuda|sebotnet33ts_256 |64 |53.1524 |53.3823 |
|cuda|selecsls42b |128 |40.6070 |40.7104 |
|cuda|spnasnet_100 |128 |44.5732 |44.4318 |
|cuda|swin_base_patch4_window7_224 |64 |98.6447 |98.8445 |
|cuda|swsl_resnext101_32x16d |32 |97.0195 |97.2968 |
|cuda|tf_efficientnet_b0 |128 |56.0640 |56.0278 |
|cuda|tf_mixnet_l |128 |152.0958 |152.0874 |
|cuda|tinynet_a |128 |53.3694 |53.3762 |
|cuda|tnt_s_patch16_224 |128 |130.2981 |130.3726 |
|cuda|twins_pcpvt_base |64 |62.5459 |62.6416 |
|cuda|visformer_small |128 |68.8502 |69.1756 |
|cuda|vit_base_patch16_224 |64 |65.8587 |66.0285 |
|cuda|volo_d1_224 |64 |64.5348 |64.6057 |
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89022
Approved by: https://github.com/ngimel
2022-12-15 03:24:44 +00:00
kshitij12345
6a964c16e5
[flaky] relax tolerance conv1d_vs_scipy ( #89193 )
Fixes https://github.com/pytorch/pytorch/issues/89087
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89193
Approved by: https://github.com/kit1980
2022-11-18 07:31:10 +00:00
PyTorch MergeBot
d98a884b33
Revert "[cuDNN] (re-open) Enable cuDNN Frontend v8 API by Default ( #87669 )"
This reverts commit 3c6bddc3f6 .
Reverted https://github.com/pytorch/pytorch/pull/87669 on behalf of https://github.com/eqy due to investigating convnext benchmark regressions
2022-11-08 19:04:25 +00:00
eqy
3c6bddc3f6
[cuDNN] (re-open) Enable cuDNN Frontend v8 API by Default ( #87669 )
#58414
Has a small tweak to a test that was breaking on A10 (CC @malfet).
CC @ptrblck @ngimel
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87669
Approved by: https://github.com/ngimel
2022-11-02 01:36:37 +00:00
Kshiteej K
6735bf21c7
[test_nn] split convolution tests from test_nn ( #87474 )
Ref #63085
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87474
Approved by: https://github.com/albanD
2022-10-31 04:42:45 +00:00