Mingzhe Li
ad95099f45
fix benchmark_all_test when running on gpu ( #29818 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29818
When some of the tests run on CUDA, there is a runtime error caused by a missing data transfer from CPU to CUDA. This diff fixes that issue.
Test Plan:
```
buck run mode/opt //caffe2/benchmarks/operator_benchmark:benchmark_all_test -- --iterations 1
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short
# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M64_N64_K64_cpu
# Input: M: 64, N: 64, K: 64, device: cpu
Forward Execution Time (us) : 165.241
# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M64_N64_K64_cuda
# Input: M: 64, N: 64, K: 64, device: cuda
Forward Execution Time (us) : 56.546
...
Reviewed By: hl475
Differential Revision: D18506269
fbshipit-source-id: 87942d7a52bd398600766c0f5363d791b74a6ca6
2019-11-14 10:10:48 -08:00
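The cpu-to-cuda fix above amounts to moving the benchmark inputs onto the test's device before running the op. A minimal sketch of the pattern (illustrative only; the helper name is not from the actual benchmark suite):

```python
import torch

def make_inputs(M, N, K, device):
    # Create operands and move them to the target device so CUDA tests
    # do not fail with a cpu/cuda device-mismatch error at runtime.
    a = torch.rand(M, N, K).to(device)
    b = torch.rand(M, N, K).to(device)
    return a, b

device = "cuda" if torch.cuda.is_available() else "cpu"
a, b = make_inputs(64, 64, 64, device)
result = a + b  # the benchmarked op now runs entirely on the chosen device
```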
Mingzhe Li
b70d571233
add embeddingbag operator to the benchmark suite ( #29784 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29784
Add the embeddingbag operator to the benchmark suite with different numbers of embeddings, dims, and input sizes.
Test Plan:
```
buck run //caffe2/benchmarks/operator_benchmark/pt:embeddingbag_test -- --iterations 1
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short
# Benchmarking PyTorch: embeddingbag
# Mode: Eager
# Name: embeddingbag_embeddingbags2300_dim64_modesum_input_size16_offset0_sparseTrue
# Input: embeddingbags: 2300, dim: 64, mode: sum, input_size: 16, offset: 0, sparse: True
Forward Execution Time (us) : 624.838
# Benchmarking PyTorch: embeddingbag
# Mode: Eager
# Name: embeddingbag_embeddingbags2300_dim64_modesum_input_size64_offset0_sparseTrue
# Input: embeddingbags: 2300, dim: 64, mode: sum, input_size: 64, offset: 0, sparse: True
Forward Execution Time (us) : 636.744
# Benchmarking PyTorch: embeddingbag
# Mode: Eager
# Name: embeddingbag_embeddingbags80_dim64_modesum_input_size8_offset0_sparseTrue
# Input: embeddingbags: 80, dim: 64, mode: sum, input_size: 8, offset: 0, sparse: True
Backward Execution Time (us) : 2325.291
# Benchmarking PyTorch: embeddingbag
# Mode: Eager
# Name: embeddingbag_embeddingbags80_dim64_modesum_input_size16_offset0_sparseTrue
# Input: embeddingbags: 80, dim: 64, mode: sum, input_size: 16, offset: 0, sparse: True
Backward Execution Time (us) : 2528.658
...
Reviewed By: bddppq
Differential Revision: D18496340
fbshipit-source-id: 157dcff2ea4ec13416fe161382fcefd47ce4cc01
2019-11-14 10:05:47 -08:00
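One configuration from the test plan above can be reproduced with `nn.EmbeddingBag` directly; this is an illustrative sketch of what the benchmark exercises, not the suite's own code:

```python
import torch
import torch.nn as nn

# Mirrors one benchmark entry: embeddingbags=80, dim=64, mode=sum,
# input_size=16, offset=0, sparse=True.
embedding_sum = nn.EmbeddingBag(num_embeddings=80, embedding_dim=64,
                                mode="sum", sparse=True)
input_ids = torch.randint(0, 80, (16,))  # input_size indices into the table
offsets = torch.tensor([0])              # offset 0 -> all indices in one bag
out = embedding_sum(input_ids, offsets)  # one pooled embedding per bag
```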
Mingzhe Li
e53b510773
add addmm op to the benchmark suite ( #29783 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29783
Add the addmm operator, which reuses the existing input shapes from the add operator.
Test Plan:
```
buck run //caffe2/benchmarks/operator_benchmark/pt:add_test -- --iterations 1
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short
# Benchmarking PyTorch: addmm
# Mode: Eager
# Name: addmm_M64_N64_K64_cpu
# Input: M: 64, N: 64, K: 64, device: cpu
Forward Execution Time (us) : 759.237
# Benchmarking PyTorch: addmm
# Mode: Eager
# Name: addmm_M64_N64_K128_cpu
# Input: M: 64, N: 64, K: 128, device: cpu
Forward Execution Time (us) : 922.764
# Benchmarking PyTorch: addmm
# Mode: Eager
# Name: addmm_M64_N64_K64_cpu_bwdall
# Input: M: 64, N: 64, K: 64, device: cpu
Backward Execution Time (us) : 4689.546
# Benchmarking PyTorch: addmm
# Mode: Eager
# Name: addmm_M64_N64_K64_cpu_bwd1
# Input: M: 64, N: 64, K: 64, device: cpu
Backward Execution Time (us) : 1700.093
# Benchmarking PyTorch: addmm
# Mode: Eager
# Name: addmm_M64_N64_K64_cpu_bwd2
# Input: M: 64, N: 64, K: 64, device: cpu
Backward Execution Time (us) : 2947.427
# Benchmarking PyTorch: addmm
# Mode: Eager
# Name: addmm_M64_N64_K64_cpu_bwd3
# Input: M: 64, N: 64, K: 64, device: cpu
Backward Execution Time (us) : 2518.043
# Benchmarking PyTorch: addmm
# Mode: Eager
# Name: addmm_M64_N64_K128_cpu_bwdall
# Input: M: 64, N: 64, K: 128, device: cpu
Backward Execution Time (us) : 5848.369
Reviewed By: bddppq
Differential Revision: D18496476
fbshipit-source-id: 4f1c116a2676a64106afa958e8c8a8e109f35a4a
2019-11-14 10:02:55 -08:00
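How the add benchmark's (M, N, K) shapes map onto addmm can be sketched as follows (illustrative, not the benchmark-suite configuration code):

```python
import torch

# addmm computes beta * inp + alpha * (mat1 @ mat2). Reusing the add
# benchmark's (M, N, K): inp is (M, N), mat1 is (M, K), mat2 is (K, N).
M, N, K = 64, 64, 64
inp = torch.rand(M, N)
mat1 = torch.rand(M, K)
mat2 = torch.rand(K, N)
out = torch.addmm(inp, mat1, mat2)  # defaults: beta=1, alpha=1
```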
Mingzhe Li
f3b15727c5
fix op benchmark OOM issue ( #29794 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29794
Before this diff, all tests of an operator were created at once before benchmarking; once an operator was benchmarked, the same process moved on to the next operator, and so on. The issue is that a single operator can have more than 100 tests, which can cause OOM issues. This diff avoids creating all of an operator's tests at once by using generators, which create and run tests one by one.
Test Plan:
```
buck run //caffe2/benchmarks/operator_benchmark:benchmark_all_quantized_test -- --iterations 1
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short
# Benchmarking PyTorch: relu
# Mode: Eager
# Name: relu_dims(3,4,5)_contigFalse_inplaceFalse_dtypetorch.quint8
# Input: dims: (3, 4, 5), contig: False, inplace: False, dtype: torch.quint8
Forward Execution Time (us) : 52.493
# Benchmarking PyTorch: relu
# Mode: Eager
# Name: relu_dims(3,4,5)_contigFalse_inplaceFalse_dtypetorch.qint8
# Input: dims: (3, 4, 5), contig: False, inplace: False, dtype: torch.qint8
Forward Execution Time (us) : 44.945
...
Reviewed By: hl475
Differential Revision: D18500103
fbshipit-source-id: 747c0ad0d302177da04da36e112c67f154115b6e
2019-11-13 22:22:58 -08:00
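The generator approach described in the OOM fix above can be sketched in plain Python (an illustrative pattern, not the actual benchmark-suite code): instead of materializing the full cross-product of test configurations up front, a generator yields one configuration at a time, so only the test currently running occupies memory.

```python
import itertools

def config_iter(attrs):
    """Lazily yield one test configuration (a dict) at a time instead of
    building the full cross-product list in memory."""
    names = list(attrs)
    for values in itertools.product(*attrs.values()):
        yield dict(zip(names, values))

attrs = {"M": [8, 64], "N": [2, 64], "K": [1, 8, 64]}
configs = config_iter(attrs)   # no test configs exist yet
first = next(configs)          # configs are created/consumed one by one
total = 1 + sum(1 for _ in configs)
```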
Zafar Takhirov
d2aa4c611f
observer benchmarking
...
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29508
Test Plan: Imported from OSS
Differential Revision: D18415171
Pulled By: z-a-f
fbshipit-source-id: 5ebedee8c17448e36853e0c1bf778bb128975678
2019-11-12 23:28:10 -08:00
Zafar Takhirov
29e509ff1d
Fix a missing comma in quantized benchmark
...
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29685
Test Plan: Imported from OSS
Differential Revision: D18463246
Pulled By: z-a-f
fbshipit-source-id: c21fd7892f3701afcc5faa8bc03f98b6f6550d0f
2019-11-12 16:50:46 -08:00
Zafar Takhirov
9bb0e2834d
Fixing data type in quantized pool benchmarking
...
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29663
Test Plan: Imported from OSS
Differential Revision: D18456671
Pulled By: z-a-f
fbshipit-source-id: b36fc56e4f29937e458308f4c13f7a5e37665269
2019-11-12 13:22:53 -08:00
Zafar Takhirov
3b43cfde80
Benchmarking per channel quantization
...
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29627
Test Plan: Imported from OSS
Differential Revision: D18443929
Pulled By: z-a-f
fbshipit-source-id: a0345cc5e259b4ce98589252719b8885326d43a3
2019-11-12 11:33:42 -08:00
Zafar Takhirov
5db361bd32
Quantized interpolation benchmarks
...
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29509
Test Plan: Imported from OSS
Differential Revision: D18415367
Pulled By: z-a-f
fbshipit-source-id: 84d0aaa81b131b49762edde6ade27e61acb99a42
2019-11-12 11:23:03 -08:00
Zafar Takhirov
f95e8ea1be
Benchmarking quantized methods ( #29625 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29625
This PR also adds a template for benchmarking methods that require no input.
Test Plan: Imported from OSS
Differential Revision: D18443485
Pulled By: z-a-f
fbshipit-source-id: 6f25c3a7cd94e396c112b5f7c33307b71f78ecd3
2019-11-12 11:08:55 -08:00
Zafar Takhirov
3b452ca428
quantized topk benchmarking
...
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29505
Test Plan: Imported from OSS
Differential Revision: D18414851
Pulled By: z-a-f
fbshipit-source-id: 23999ef95c2f087066c4da36b2bf35516ebc0421
2019-11-12 00:33:47 -08:00
Zafar Takhirov
a0d4d5062b
Quantized unary ops benchmarking (mostly template)
...
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29503
Test Plan: Imported from OSS
Differential Revision: D18414589
Pulled By: z-a-f
fbshipit-source-id: ab5af490359b3e0a51642a46aef86f7be720deff
2019-11-11 23:48:36 -08:00
Zafar Takhirov
fb07098e2b
Creating a base benchmarking class for activations.
...
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29182
Test Plan: Imported from OSS
Differential Revision: D18319456
Pulled By: z-a-f
fbshipit-source-id: d2314bb30a584551b5f1c8610b36c4c10c27ac85
2019-11-11 18:24:44 -08:00
Mingzhe Li
af3468a1c7
change op bench input shape to reduce execution time ( #29616 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29616
1. Reduce predefined_min_time, which is the minimum time each test needs to run. Based on the test results, the average time across epochs is pretty stable before exiting, so we can safely reduce the predefined time here.
2. Change the input shapes of several ops.
Test Plan:
```
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short
# Benchmarking PyTorch: add
200 256.044864655
400 165.850520134
800 163.579881191
1600 162.871927023
3200 160.3128016
# Mode: Eager
# Name: add_cpu_M64_K64_bwd1_N64
# Input: device: cpu, K: 64, M: 64, N: 64
Backward Execution Time (us) : 164.715
# Benchmarking PyTorch: add
200 170.650482178
400 168.895125389
800 169.867575169
1600 163.400024176
3200 168.658420444
# Mode: Eager
# Name: add_cpu_M64_K64_bwd2_N64
# Input: device: cpu, K: 64, M: 64, N: 64
Backward Execution Time (us) : 168.777
Reviewed By: hl475
Differential Revision: D18438540
fbshipit-source-id: 1fd27cf4bbc34e46e74393af912ee2fcb75c33b2
2019-11-11 16:58:27 -08:00
Mingzhe Li
7374dd0d52
remove SkipInputShape flag ( #29615 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29615
Remove that flag as it's not needed any more.
Test Plan: na
Reviewed By: hl475
Differential Revision: D18440271
fbshipit-source-id: 41b0659c72ef746a1cc268174fd1e7dc2beb1ae2
2019-11-11 16:56:40 -08:00
Mingzhe Li
b5a38fa98e
update op bench readme ( #29596 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29596
as title
Test Plan: na
Reviewed By: hl475
Differential Revision: D18437811
fbshipit-source-id: 7996d1689d8a46849b62b2b3875c67cf8dc5861c
2019-11-11 15:33:29 -08:00
Mingzhe Li
00c224f0f2
move quantized tests from benchmark_all_test to benchmark_all_quantized_test ( #29590 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29590
as title
Test Plan:
```
buck run //caffe2/benchmarks/operator_benchmark:benchmark_all_test -- --iteration 1
Parsing buck files: finished in 1.0 sec
Creating action graph: finished in 43.0 sec
Building: finished in 16.0 sec (100%) 10053/10053 jobs, 1 updated
Total time: 01:00.0 min
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short
# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M64_N64_K64_cpu
# Input: M: 64, N: 64, K: 64, device: cpu
Forward Execution Time (us) : 45419.667
...
buck run //caffe2/benchmarks/operator_benchmark:benchmark_all_quantized_test
Parsing buck files: finished in 1.0 sec
Building: finished in 6.0 sec (100%) 10053/10053 jobs, 1 updated
Total time: 7.0 sec
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short
# Benchmarking PyTorch: QReLU
# Mode: Eager
# Name: QReLU_dims(1,)_permute_dimsFalse_inplaceFalse_dtypetorch.quint8
# Input: dims: (1,), permute_dims: False, inplace: False, dtype: torch.quint8
Forward Execution Time (us) : 137.685
...
Reviewed By: hl475
Differential Revision: D18436727
fbshipit-source-id: 317ec0e4bd2a6e33c9a60830f01ed805ae412449
2019-11-11 14:59:29 -08:00
Mingzhe Li
137eea5938
change module_name in chunk_test ( #29589 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29589
as title
Test Plan:
```
buck run //caffe2/benchmarks/operator_benchmark/pt:chunk_test -- --iteration 1
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short
# Benchmarking PyTorch: chunk
# Mode: Eager
# Name: chunk_M256_N512_chunks2_cpu
# Input: M: 256, N: 512, chunks: 2, device: cpu
Forward Execution Time (us) : 148.345
# Benchmarking PyTorch: chunk
# Mode: Eager
# Name: chunk_M512_N512_chunks2_cpu
# Input: M: 512, N: 512, chunks: 2, device: cpu
Forward Execution Time (us) : 125.239
Reviewed By: hl475
Differential Revision: D18436532
fbshipit-source-id: e7100f4605471e27703b2e2e863b971a93229854
2019-11-11 14:59:24 -08:00
Mingzhe Li
6104f4e37c
reduce input shapes for matmul ( #29587 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29587
as title
Test Plan:
```
buck run //caffe2/benchmarks/operator_benchmark/pt:matmul_test -- --iteration 1
Reviewed By: hl475
Differential Revision: D18436317
fbshipit-source-id: 564143edc3d4400bcfafa0da11b7479562661b0c
2019-11-11 14:59:20 -08:00
Mingzhe Li
0e5299a441
fix list_ops and list_tests ( #29586 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29586
This diff is fixing the list_ops and list_tests issues caused by D18412342.
Test Plan:
```
buck run //caffe2/benchmarks/operator_benchmark:benchmark_all_test -- --iteration 1 --list_tests
Parsing buck files: finished in 0.9 sec
Creating action graph: finished in 37.2 sec
Building: finished in 15.9 sec (100%) 10053/10053 jobs, 1 updated
Total time: 54.0 sec
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short
# List of tests:
# add_M8_N2_K1_cpu
# add_M8_N2_K8_cpu
..
buck run //caffe2/benchmarks/operator_benchmark:benchmark_all_test -- --iteration 1 --list_ops
Parsing buck files: finished in 1.0 sec
Building: finished in 5.3 sec (100%) 10053/10053 jobs, 0 updated
Total time: 6.3 sec
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short
# List of Operators to run:
# add
# batchnorm
# cat
# chunks
# Conv1d
# ConvTranspose1d
...
Reviewed By: hl475
Differential Revision: D18435994
fbshipit-source-id: 89ecfd55339b6e7687cdf8d90433d4767252e09f
2019-11-11 14:59:16 -08:00
Mingzhe Li
85752df4a1
reduce conv_test input shapes ( #29580 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29580
The input shapes of the Conv benchmark generate too many tests, which could take 40+GB of memory. This diff reduces the input shapes to fix that issue.
Test Plan:
```
buck run //caffe2/benchmarks/operator_benchmark/pt:conv_test -- --iteration 1
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short
# Benchmarking PyTorch: Conv3d
# Mode: Eager
# Name: Conv3d_in_c64_out_c64_kernel3_stride1_N8_D4_H16_W16_cpu
# Input: in_c: 64, out_c: 64, kernel: 3, stride: 1, N: 8, D: 4, H: 16, W: 16, device: cpu
Forward Execution Time (us) : 383376.096
Reviewed By: hl475
Differential Revision: D18434627
fbshipit-source-id: a91a239394b034ff7b42e1b09e2f744a8ad671e9
2019-11-11 14:59:11 -08:00
Zafar Takhirov
6bfa7c0471
FakeQuantize benchmarking
...
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29507
Test Plan: Imported from OSS
Differential Revision: D18415084
Pulled By: z-a-f
fbshipit-source-id: f758e45d5178ee5f80157772ab701a69f074a78b
2019-11-11 14:41:58 -08:00
Zafar Takhirov
b3b8f522e8
Disabling 'contig' in quantized arithmetic test
...
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29576
Test Plan: Imported from OSS
Differential Revision: D18433052
Pulled By: z-a-f
fbshipit-source-id: 8082303faa368646ef6370b6cf348275526fd33b
2019-11-11 13:30:13 -08:00
Zafar Takhirov
5b43becfc5
per-tensor quantize/dequantize benchmarking
...
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29506
Test Plan: Imported from OSS
Differential Revision: D18415017
Pulled By: z-a-f
fbshipit-source-id: 92a50706aafabdcaa79dd1f226f7f4ac63606c74
2019-11-11 13:19:46 -08:00
Zafar Takhirov
9276cd449d
qadaptive_avgpool2d benchmarking
...
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29274
Test Plan: Imported from OSS
Differential Revision: D18343569
Pulled By: z-a-f
fbshipit-source-id: e5ab9c79965caa59a8e17069e70304c01be46104
2019-11-11 12:17:44 -08:00
Zafar Takhirov
5c9eae075f
qavgpool benchmarking
...
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29268
Test Plan: Imported from OSS
Differential Revision: D18342589
Pulled By: z-a-f
fbshipit-source-id: cc6f0153a927672e0831200b58f5413c7db7bdb1
2019-11-09 22:30:24 -08:00
Zafar Takhirov
958d0cd4df
Adding short tests
...
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29257
Test Plan: Imported from OSS
Differential Revision: D18340536
Pulled By: z-a-f
fbshipit-source-id: dce470fd0c7ef9c6f639de40f7e0713b335408d1
2019-11-09 21:33:41 -08:00
Zafar Takhirov
a47fe40729
qpool benchmarking
...
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29250
Test Plan: Imported from OSS
Differential Revision: D18339142
Pulled By: z-a-f
fbshipit-source-id: 1d2a3dda15ab300ffa63719158a4788b7fb17df5
2019-11-09 17:52:31 -08:00
Zafar Takhirov
4874120804
Added all binary arithmetic tests in QFunctional
...
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29424
Test Plan: Imported from OSS
Differential Revision: D18385689
Pulled By: z-a-f
fbshipit-source-id: 5947e0edfcbe2b6eba984dc9da187e9fce5cd40f
2019-11-09 14:49:57 -08:00
Zafar Takhirov
687ea7460a
quantized comparators benchmarking
...
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29437
Test Plan: Imported from OSS
Differential Revision: D18389909
Pulled By: z-a-f
fbshipit-source-id: e007b50fc3905747f0e0a70ab438b790e63b023e
2019-11-09 14:23:41 -08:00
Zafar Takhirov
fb2eb01955
qadd benchmarking
...
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29420
Test Plan: Imported from OSS
Differential Revision: D18383402
Pulled By: z-a-f
fbshipit-source-id: 8ea2f689b7df676ffb8adef0cbb058a7a2123938
2019-11-09 14:20:28 -08:00
Mingzhe Li
f31d6c70fe
reduce op bench binary size ( #29496 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29496
This diff reduces the binary size of op benchmark by avoiding creating all tests at once.
Test Plan:
```
buck run //caffe2/benchmarks/operator_benchmark:benchmark_all_test
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : long
# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M8_N2_K1_cpu
# Input: M: 8, N: 2, K: 1, device: cpu
Forward Execution Time (us) : 160.781
# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M8_N2_K8_cpu
# Input: M: 8, N: 2, K: 8, device: cpu
Forward Execution Time (us) : 158.941
Reviewed By: hl475
Differential Revision: D18412342
fbshipit-source-id: 5db647019ae8c2e4d6ab361b54b63cf88236b1ae
2019-11-08 22:15:12 -08:00
Zafar Takhirov
2e5fc034fb
Quantized concat benchmarking
...
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29431
Test Plan: Imported from OSS
Differential Revision: D18387765
Pulled By: z-a-f
fbshipit-source-id: a14f69d1ceb0f63ce5eddfda8af342f672dfec69
2019-11-08 12:48:55 -08:00
Mingzhe Li
6572d0d174
add a new flag to select machine for op benchmark ( #29349 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29349
This diff adds a new flag to pick CPU/GPU machines to run op benchmarks on. The default is None, which will try to run on all supported devices.
Test Plan:
```
buck run mode/opt caffe2/benchmarks/operator_benchmark/pt:add_test
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short
# Benchmarking PyTorch: add_
# Mode: Eager
# Name: add__M64_N64_K64_cpu
# Input: M: 64, N: 64, K: 64, device: cpu
Forward Execution Time (us) : 124.283
...
# Benchmarking PyTorch: add_
# Mode: Eager
# Name: add__M64_N64_K128_cuda_bwdall
# Input: M: 64, N: 64, K: 128, device: cuda
Backward Execution Time (us) : 176.592
buck run mode/opt caffe2/benchmarks/operator_benchmark/pt:add_test -- --device cpu
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short
# Benchmarking PyTorch: add_
# Mode: Eager
# Name: add__M64_N64_K64_cpu
# Input: M: 64, N: 64, K: 64, device: cpu
Forward Execution Time (us) : 121.884
buck run mode/opt caffe2/benchmarks/operator_benchmark/pt:add_test -- --device cuda
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short
# Benchmarking PyTorch: add_
# Mode: Eager
# Name: add__M64_N64_K64_cuda
# Input: M: 64, N: 64, K: 64, device: cuda
Forward Execution Time (us) : 26.002
Reviewed By: hl475
Differential Revision: D18363942
fbshipit-source-id: fccd1fd09bcd6d7725e6fa4063559a27d9cc3065
2019-11-06 20:13:25 -08:00
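The `--device` flag semantics described above (default None runs every supported device, an explicit value restricts the run) can be sketched with `argparse`; the function name and exact option handling here are illustrative, not the suite's own parser:

```python
import argparse

def resolve_devices(argv, cuda_available):
    """Default (None) means benchmark all supported devices; an explicit
    --device restricts the run to that one device."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--device", default=None, choices=["cpu", "cuda"])
    args = parser.parse_args(argv)
    if args.device is not None:
        return [args.device]
    return ["cpu", "cuda"] if cuda_available else ["cpu"]
```

For example, an empty argv on a GPU machine yields both devices, while `--device cpu` pins the run to CPU only.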
Mingzhe Li
7d01d5efd7
update op bench readme ( #29289 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29289
as title
Test Plan: na
Reviewed By: hl475
Differential Revision: D18350580
fbshipit-source-id: 80f41cbbfda9cbcd8988b451cdfb199f2b89e49b
2019-11-06 14:08:02 -08:00
Mingzhe Li
e4c4ff079c
group quantized op benchmarks into a new binary ( #29288 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29288
More quantized operators have been added to the benchmark suite. We want to split them from the unquantized ones for easier benchmarking.
Test Plan:
```
buck run mode/dev-nosan //caffe2/benchmarks/operator_benchmark:benchmark_all_quantized_test -- --iterations 1
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short
# Benchmarking PyTorch: QConv2d
# Mode: Eager
# Name: QConv2d_kernel3_G32_H56_OC512_N1_stride2_pad1_W56_IC512
# Input: kernel: 3, G: 32, H: 56, OC: 512, N: 1, stride: 2, pad: 1, W: 56, IC: 512
Forward Execution Time (us) : 5614.996
# Benchmarking PyTorch: QLinear
# Mode: Eager
# Name: QLinear_N6400_IN141_OUT15
# Input: N: 6400, IN: 141, OUT: 15
Forward Execution Time (us) : 2829.075
Reviewed By: hl475
Differential Revision: D18349850
fbshipit-source-id: 5b2fd9c1d5a25068592e5059909bb6c14095f397
2019-11-06 09:48:53 -08:00
Mingzhe Li
114e7382b6
skip cuda test if not on GPU machines
...
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29287
Test Plan:
```
buck run mode/dev-nosan //caffe2/benchmarks/operator_benchmark:benchmark_all_test -- --iterations 1
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short
# Benchmarking PyTorch: ConvTranspose2d
# Mode: Eager
# Name: ConvTranspose2d_kernel3_out_c256_H16_in_c256_N1_stride1_W16_cpu
# Input: kernel: 3, out_c: 256, H: 16, in_c: 256, N: 1, stride: 1, W: 16, device: cpu
Forward Execution Time (us) : 10434.151
Reviewed By: hl475
Differential Revision: D18344574
fbshipit-source-id: 881c857cf901c4539ee1a61171ab41df1c476db7
2019-11-06 09:37:04 -08:00
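Skipping CUDA tests on CPU-only machines, as this commit does, reduces to filtering the device list before test creation. A hypothetical sketch (the real suite's filtering lives elsewhere):

```python
def devices_to_test(requested, cuda_available):
    """Drop 'cuda' from the test matrix when no GPU is present, instead of
    letting CUDA tests fail on CPU-only machines."""
    return [d for d in requested if d != "cuda" or cuda_available]
```

In practice `cuda_available` would come from `torch.cuda.is_available()`.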
Mingzhe Li
e86450620d
add cuda to all op benchmark ( #29285 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29285
as title
Test Plan:
```
buck run mode/dev-nosan //caffe2/benchmarks/operator_benchmark:benchmark_all_test -- --iterations 1
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short
# Benchmarking PyTorch: ConvTranspose2d
# Mode: Eager
# Name: ConvTranspose2d_kernel3_out_c256_H16_in_c256_N1_stride1_W16_cpu
# Input: kernel: 3, out_c: 256, H: 16, in_c: 256, N: 1, stride: 1, W: 16, device: cpu
Forward Execution Time (us) : 10434.151
Reviewed By: hl475
Differential Revision: D18338258
fbshipit-source-id: 944e87d1ec70daadb205faaf2825d4a2202086c5
2019-11-06 09:37:00 -08:00
Mingzhe Li
27115612ab
add execution mode to the test name ( #29284 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29284
as title
Test Plan:
```
buck run //caffe2/benchmarks/operator_benchmark/pt:add_test -- iterations 1 --ai_pep_format true
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short
# Benchmarking PyTorch: add
PyTorchObserver {"type": "PyTorch_add_M64_N64_K64_cpu_Eager", "metric": "latency", "unit": "ms", "value": "26.64516019518487"}
Reviewed By: hl475
Differential Revision: D18336980
fbshipit-source-id: 1f9d5147a56afeb68cd526a57f7375c5ec39efa4
2019-11-06 09:32:54 -08:00
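The `PyTorchObserver` line in the test plan above shows the execution mode (`Eager`) appended to the test name in the emitted metric. A sketch of producing that output format (the helper is hypothetical; only the JSON shape follows the log):

```python
import json

def pep_observer_line(framework, test_name, mode, latency_ms):
    """Emit one metric line in the PyTorchObserver/PEP format, with the
    execution mode appended to the test name."""
    payload = {
        "type": f"{framework}_{test_name}_{mode}",
        "metric": "latency",
        "unit": "ms",
        "value": str(latency_ms),
    }
    return "PyTorchObserver " + json.dumps(payload)

line = pep_observer_line("PyTorch", "add_M64_N64_K64_cpu", "Eager", 26.645)
```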
Zafar Takhirov
d545e4f155
qrelu benchmarking
...
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29174
Test Plan: Imported from OSS
Differential Revision: D18319345
Pulled By: z-a-f
fbshipit-source-id: b64f0131296771ed201d85664930cceb7be185bd
2019-11-05 17:20:40 -08:00
Mingzhe Li
044ff91950
reduce predefined_min_secs for execution time ( #29142 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29142
as title
Test Plan:
```
Before this diff:
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short
# Benchmarking PyTorch: add_
# Mode: Eager
# Name: add__M64_N64_K64_cpu
# Input: M: 64, N: 64, K: 64, device: cpu
Forward Execution Time (us) : 122.965
# Benchmarking PyTorch: add_
# Mode: Eager
# Name: add__M64_N64_K128_cpu
# Input: M: 64, N: 64, K: 128, device: cpu
Forward Execution Time (us) : 229.735
# Benchmarking PyTorch: add_
# Mode: Eager
# Name: add__M64_N64_K64_cpu_bwdall
# Input: M: 64, N: 64, K: 64, device: cpu
Backward Execution Time (us) : 950.455
# Benchmarking PyTorch: add_
# Mode: Eager
# Name: add__M64_N64_K64_cpu_bwd1
# Input: M: 64, N: 64, K: 64, device: cpu
Backward Execution Time (us) : 826.893
After this diff:
buck run mode/opt //caffe2/benchmarks/operator_benchmark/pt:add_test;
Parsing buck files: finished in 0.7 sec
Building: finished in 02:35.7 min (100%) 7281/7281 jobs, 1 updated
Total time: 02:36.4 min
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short
# Benchmarking PyTorch: add_
# Mode: Eager
# Name: add__M64_N64_K64_cpu
# Input: M: 64, N: 64, K: 64, device: cpu
Forward Execution Time (us) : 125.021
# Benchmarking PyTorch: add_
# Mode: Eager
# Name: add__M64_N64_K128_cpu
# Input: M: 64, N: 64, K: 128, device: cpu
Forward Execution Time (us) : 244.076
# Benchmarking PyTorch: add_
# Mode: Eager
# Name: add__M64_N64_K64_cpu_bwdall
# Input: M: 64, N: 64, K: 64, device: cpu
Backward Execution Time (us) : 946.280
# Benchmarking PyTorch: add_
# Mode: Eager
# Name: add__M64_N64_K64_cpu_bwd1
# Input: M: 64, N: 64, K: 64, device: cpu
Backward Execution Time (us) : 863.835
Reviewed By: hl475
Differential Revision: D18305676
fbshipit-source-id: d382084e39b87c554084891f87701b87cd2d3800
2019-11-04 14:29:00 -08:00
Mingzhe Li
b693c5d6a0
replace add benchmark with add_ ( #29050 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29050
as title
Test Plan:
```
buck run //caffe2/benchmarks/operator_benchmark/pt:add_test
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short
# Benchmarking PyTorch: add_
# Mode: Eager
# Name: add__M64_N64_K64_cpu
# Input: M: 64, N: 64, K: 64, device: cpu
Forward Execution Time (us) : 31475.766
Reviewed By: hl475
Differential Revision: D18265767
fbshipit-source-id: 7aaa04f5fa5b2dd58bbc1aa045693314032e0ff0
2019-11-01 13:08:27 -07:00
Mingzhe Li
f63cbf3ae2
change op benchmark forward_only flag ( #28967 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28967
Change the forward_only flag to take True or False so it can be integrated with PEP.
Test Plan:
```
[mingzhe0908@devgpu203.prn2 ~/fbsource/fbcode] ~/fbsource/fbcode/buck-out/opt/gen/caffe2/benchmarks/operator_benchmark/pt/add_test.par --forward_only True --iterations 1
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short
# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M64_N64_K64_cpu
# Input: M: 64, N: 64, K: 64, device: cpu
Forward Execution Time (us) : 152.489
# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M64_N64_K128_cpu
# Input: M: 64, N: 64, K: 128, device: cpu
Forward Execution Time (us) : 236.608
[mingzhe0908@devgpu203.prn2 ~/fbsource/fbcode] ~/fbsource/fbcode/buck-out/opt/gen/caffe2/benchmarks/operator_benchmark/pt/add_test.par --forward_only False --iterations 1
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short
# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M64_N64_K64_cpu
# Input: M: 64, N: 64, K: 64, device: cpu
Forward Execution Time (us) : 147.174
# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M64_N64_K128_cpu
# Input: M: 64, N: 64, K: 128, device: cpu
Forward Execution Time (us) : 253.437
# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M64_N64_K64_cpu_bwdall
# Input: M: 64, N: 64, K: 64, device: cpu
Backward Execution Time (us) : 1044.082
Reviewed By: hl475
Differential Revision: D18247416
fbshipit-source-id: 1c6cff1ac98233d4f0ca298e0cb4a0d3466e5834
2019-10-31 13:28:58 -07:00
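Accepting an explicit `True`/`False` value for `--forward_only` (rather than a bare switch), as this commit describes, can be sketched with a small `argparse` type converter; names here are illustrative:

```python
import argparse

def parse_forward_only(argv):
    """Parse --forward_only as an explicit True/False string so the value
    round-trips cleanly through external configs like PEP."""
    def str2bool(v):
        if v not in ("True", "False"):
            raise argparse.ArgumentTypeError("expected True or False")
        return v == "True"
    parser = argparse.ArgumentParser()
    parser.add_argument("--forward_only", type=str2bool, default=False)
    return parser.parse_args(argv).forward_only
```

With this, `--forward_only False` genuinely disables the flag, which a `store_true` action cannot express.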
Mingzhe Li
fcd6a8252c
add shapes for fill benchmark ( #28966 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28966
as title
Test Plan:
```
buck run mode/opt //caffe2/benchmarks/operator_benchmark/pt:fill_test
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short
# Benchmarking PyTorch: fill_
# Mode: Eager
# Name: fill__N1024_cpu_dtypetorch.int32
# Input: N: 1024, device: cpu, dtype: torch.int32
Forward Execution Time (us) : 2.008
Reviewed By: hl475
Differential Revision: D18241521
fbshipit-source-id: 6eb6e1ab7e8a2f461c6fc537f5bb971d12f594c3
2019-10-31 13:28:49 -07:00
Mingzhe Li
9034762a7d
add more operators to benchmark_all_test ( #28968 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28968
Add fill and as_strided operators.
Test Plan:
```
buck run mode/opt //caffe2/benchmarks/operator_benchmark:benchmark_all_test -- --list_ops
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short
# List of Operators to run:
# round_
# exponential_
# QLinear
...
Reviewed By: hl475
Differential Revision: D18241522
fbshipit-source-id: aade1d68a68a660d19d8dfd980eb4d5d0891488b
2019-10-31 13:28:39 -07:00
Mingzhe Li
5e94e66c6f
unify unary ops benchmark ( #28913 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28913
as title
Test Plan:
```
buck run mode/opt //caffe2/benchmarks/operator_benchmark/pt:unary_test
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short
# Benchmarking PyTorch: abs
# Mode: Eager
# Name: abs_M512_N512_cpu
# Input: M: 512, N: 512, device: cpu
Forward Execution Time (us) : 90.233
...
Reviewed By: hl475
Differential Revision: D18231641
fbshipit-source-id: 3093db47d0356b927768f15dc63af6ad8aadd430
2019-10-30 17:46:13 -07:00
Mingzhe Li
2ffc4cca67
unify split benchmark ( #28912 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28912
as title
Test Plan:
```
buck run mode/opt //caffe2/benchmarks/operator_benchmark/pt:split_test
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short
# Benchmarking PyTorch: split
# Mode: Eager
# Name: split_M256_N512_parts2_cpu
# Input: M: 256, N: 512, parts: 2, device: cpu
Forward Execution Time (us) : 3.434
Reviewed By: hl475
Differential Revision: D18231542
fbshipit-source-id: 84898db55996aa3faf156d4fb14f32d6db780e7a
2019-10-30 17:46:09 -07:00
Mingzhe Li
94d2599d77
unify softmax benchmark ( #28911 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28911
as title
Test Plan:
```
buck run mode/opt //caffe2/benchmarks/operator_benchmark/pt:softmax_test
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short
# Benchmarking PyTorch: Softmax
# Mode: Eager
# Name: Softmax_N4_C3_H256_W256_cpu
# Input: N: 4, C: 3, H: 256, W: 256, device: cpu
Forward Execution Time (us) : 17929.381
...
Reviewed By: hl475
Differential Revision: D18231517
fbshipit-source-id: 61f35849e1f4cf44cf09e60a7b618f8e9fc67b9c
2019-10-30 17:46:05 -07:00
Mingzhe Li
ed4a978d79
unify pool benchmark ( #28898 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28898
as title
Test Plan:
```
buck run mode/opt //caffe2/benchmarks/operator_benchmark/pt:pool_test
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short
# Benchmarking PyTorch: MaxPool1d
# Mode: Eager
# Name: MaxPool1d_kernel3_stride1_N8_C256_L256_cpu
# Input: kernel: 3, stride: 1, N: 8, C: 256, L: 256, device: cpu
Forward Execution Time (us) : 7133.492
Reviewed By: hl475
Differential Revision: D18228351
fbshipit-source-id: 47af93d5dd3776384f89b1289fbbe01c572ba9fc
2019-10-30 17:46:01 -07:00
Mingzhe Li
f5e99b3249
unify matmul benchmark ( #28899 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28899
as title
Test Plan:
```
buck run mode/opt //caffe2/benchmarks/operator_benchmark/pt:matmul_test
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short
# Benchmarking PyTorch: matmul
# Mode: Eager
# Name: matmul_M128_N128_K128_trans_aTrue_trans_bFalse_cpu
# Input: M: 128, N: 128, K: 128, trans_a: True, trans_b: False, device: cpu
Forward Execution Time (us) : 39.535
Reviewed By: hl475
Differential Revision: D18228271
fbshipit-source-id: 681ed2745c25a122997346a23acdbc67e55e5ec4
2019-10-30 17:45:57 -07:00
Mingzhe Li
6e1c18303b
unify linear benchmark ( #28897 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28897
as title
Test Plan:
```
buck run mode/opt //caffe2/benchmarks/operator_benchmark/pt:linear_test
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short
# Benchmarking PyTorch: linear
# Mode: Eager
# Name: linear_N4_IN256_OUT128_cpu
# Input: N: 4, IN: 256, OUT: 128, device: cpu
Forward Execution Time (us) : 39.275
Reviewed By: hl475
Differential Revision: D18228070
fbshipit-source-id: 9c209eb74e574c6ef85ebcd78b824ef7d5e65dde
2019-10-30 16:25:48 -07:00
Mingzhe Li
a7b235f968
unify gather benchmark ( #28895 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28895
as title
Test Plan:
```
buck run mode/opt //caffe2/benchmarks/operator_benchmark/pt:conv_test
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short
# Benchmarking PyTorch: Conv1d
# Mode: Eager
# Name: Conv1d_in_c256_out_c256_kernel3_stride1_N1_L64_cpu
# Input: in_c: 256, out_c: 256, kernel: 3, stride: 1, N: 1, L: 64, device: cpu
Forward Execution Time (us) : 208.936
Reviewed By: hl475
Differential Revision: D18227757
fbshipit-source-id: 493dd81108848fe3d48fb5ad940eb6aef84b639c
2019-10-30 16:25:43 -07:00
Mingzhe Li
6e4147c72c
unify conv benchmark ( #28894 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28894
as title
Test Plan:
```
buck run mode/opt //caffe2/benchmarks/operator_benchmark/pt:conv_test
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short
# Benchmarking PyTorch: Conv1d
# Mode: Eager
# Name: Conv1d_in_c256_out_c256_kernel3_stride1_N1_L64_cpu
# Input: in_c: 256, out_c: 256, kernel: 3, stride: 1, N: 1, L: 64, device: cpu
Forward Execution Time (us) : 208.936
Reviewed By: hl475
Differential Revision: D18227626
fbshipit-source-id: 1ae768f529aa888415840ca10197323407e47d39
2019-10-30 16:25:39 -07:00
Mingzhe Li
dbf8f535fc
unify chunk benchmark ( #28892 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28892
as title
Test Plan:
```
buck run mode/opt //caffe2/benchmarks/operator_benchmark/pt:chunk_test
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short
# Benchmarking PyTorch: chunks
# Mode: Eager
# Name: chunks_M256_N512_chunks2_cpu
# Input: M: 256, N: 512, chunks: 2, device: cpu
Forward Execution Time (us) : 4.098
Reviewed By: hl475
Differential Revision: D18227499
fbshipit-source-id: 72268b7fe94a7d92d6e47f58f33902a33367c68b
2019-10-30 16:25:35 -07:00
Mingzhe Li
88b2bfd706
unify cat benchmark ( #28893 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28893
as title
Test Plan:
```
buck run mode/opt //caffe2/benchmarks/operator_benchmark/pt:cat_test
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short
# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_M256_N512_K1_dim0_cpu
# Input: M: 256, N: 512, K: 1, dim: 0, device: cpu
Forward Execution Time (us) : 78.607
Reviewed By: hl475
Differential Revision: D18227341
fbshipit-source-id: d383709a5aab600f99b37d07e4d4393645289101
2019-10-30 15:53:37 -07:00
Mingzhe Li
aa30b37d2e
unify batchnorm benchmark ( #28889 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28889
as title
Test Plan:
```
buck run mode/opt //caffe2/benchmarks/operator_benchmark/pt:batchnorm_test
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short
# Benchmarking PyTorch: batchnorm
# Mode: Eager
# Name: batchnorm_M1_N256_K3136_cpu
# Input: M: 1, N: 256, K: 3136, device: cpu
Forward Execution Time (us) : 276.192
Reviewed By: hl475
Differential Revision: D18227180
fbshipit-source-id: d8abe56237bb84903315332a5ecdaa1dff613110
2019-10-30 15:53:33 -07:00
Mingzhe Li
740474838f
unify as_strided benchmark ( #28890 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28890
as title
Test Plan:
```
buck run mode/opt //caffe2/benchmarks/operator_benchmark/pt:as_strided_test
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short
# Benchmarking PyTorch: as_strided
# Mode: Eager
# Name: as_strided_M256_N256_size(32,32)_stride(1,1)_storage_offset0_cpu
# Input: M: 256, N: 256, size: (32, 32), stride: (1, 1), storage_offset: 0, device: cpu
Forward Execution Time (us) : 2.792
...
Reviewed By: hl475
Differential Revision: D18227052
fbshipit-source-id: e17d9335ec89b47706a363bdb31451a01d4cbc5b
2019-10-30 15:53:29 -07:00
Mingzhe Li
db15c2ba20
unify add benchmark format ( #28891 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28891
as title
Test Plan:
```
buck run mode/opt //caffe2/benchmarks/operator_benchmark/pt:add_test
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short
# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M64_N64_K64_cpu
# Input: M: 64, N: 64, K: 64, device: cpu
Forward Execution Time (us) : 125.279
...
Reviewed By: hl475
Differential Revision: D18226789
fbshipit-source-id: 0cc51c6691533b02f662d4b6108916455f3a5b95
2019-10-30 15:53:25 -07:00
Mingzhe Li
607defa8a9
print per block avg time when running on AI-PEP machines ( #28838 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28838
as title
Test Plan:
```
buck run mode/opt //caffe2/benchmarks/operator_benchmark/pt:softmax_test -- --ai_pep_format true
Total time: 02:36.7 min
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short
# Benchmarking PyTorch: Softmax
/proc/self/fd/4/softmax_test.py:57: UserWarning: Implicit dimension choice for softmax has been deprecated. Change the call to include dim=X as an argument.
"""
PyTorchObserver {"type": "PyTorch_Softmax_N4_C3_H128_W128", "metric": "latency", "unit": "ms", "value": "4.83197245048359"}
PyTorchObserver {"type": "PyTorch_Softmax_N4_C3_H128_W128", "metric": "latency", "unit": "ms", "value": "4.839232977246866"}
PyTorchObserver {"type": "PyTorch_Softmax_N4_C3_H128_W128", "metric": "latency", "unit": "ms", "value": "4.7970924858236685"}
PyTorchObserver {"type": "PyTorch_Softmax_N4_C3_H128_W128", "metric": "latency", "unit": "ms", "value": "4.708389271399938"}
# Benchmarking PyTorch: Softmax
...
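Each PyTorchObserver line above carries a JSON payload after a fixed prefix; a minimal sketch (assuming this format, as shown in the output) of how downstream tooling such as AI-PEP could recover the reported latency:

```python
import json

# Split off the "PyTorchObserver " prefix, then parse the JSON payload.
line = ('PyTorchObserver {"type": "PyTorch_Softmax_N4_C3_H128_W128", '
        '"metric": "latency", "unit": "ms", "value": "4.83197245048359"}')
payload = json.loads(line.split("PyTorchObserver ", 1)[1])
latency_ms = float(payload["value"])  # value is serialized as a string
```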
Reviewed By: hl475
Differential Revision: D18202504
fbshipit-source-id: 4a332763432b3b5886f241bb2ce49d4df481a6f3
2019-10-29 12:08:33 -07:00
Mingzhe Li
0a68e8bab0
fix op bench runtime error when use_jit is enabled ( #28837 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28837
The JIT code used in the op bench is not compatible with the latest JIT code path. This diff resolves that issue.
Test Plan:
```buck run mode/opt //caffe2/benchmarks/operator_benchmark/pt:add_test -- --use_jit
Building: finished in 02:29.8 min (100%) 7055/7055 jobs, 1 updated
Total time: 02:30.3 min
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short
# Benchmarking PyTorch: add
# Mode: JIT
# Name: add_M64_N64_K64_cpu
# Input: M: 64, N: 64, K: 64, device: cpu
Forward Execution Time (us) : 118.052
Reviewed By: hl475
Differential Revision: D18197057
fbshipit-source-id: 92edae8a48abc4115a558a91ba46cc9c3edb2eb8
2019-10-29 12:08:28 -07:00
Mingzhe Li
4703854321
change softmax input shape ( #28836 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28836
as title
Test Plan:
```
buck run mode/opt //caffe2/benchmarks/operator_benchmark/pt:softmax_test
Invalidating internal cached state: Buck configuration options changed between invocations. This may cause slower builds.
Changed value project.buck_out='buck-out/opt' (was 'buck-out/dev')
... and 56 more. See logs for all changes
Parsing buck files: finished in 6.2 sec
Creating action graph: finished in 8.8 sec
Building: finished in 05:42.6 min (100%) 28336/28336 jobs, 23707 updated
Total time: 05:57.7 min
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short
# Benchmarking PyTorch: Softmax
/proc/self/fd/4/softmax_test.py:57: UserWarning: Implicit dimension choice for softmax has been deprecated. Change the call to include dim=X as an argument.
"""
# Mode: Eager
# Name: Softmax_N4_C3_H256_W256
# Input: N: 4, C: 3, H: 256, W: 256
Forward Execution Time (us) : 18422.487
Reviewed By: hl475
Differential Revision: D18202335
fbshipit-source-id: 0bb376cb465d998a49196e148d48d436126ae334
2019-10-29 12:05:25 -07:00
Mingzhe Li
9f44a04613
separate PT and C2 to reduce build time ( #28731 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28731
as title
Test Plan:
```
Before:
buck run mode/opt caffe2/benchmarks/operator_benchmark:benchmark_all_test -- --operator sigmoid
Invalidating internal cached state: Buck configuration options changed between invocations. This may cause slower builds.
Changed value project.buck_out='buck-out/opt' (was 'buck-out/dev')
... and 69 more. See logs for all changes
Parsing buck files: finished in 7.2 sec
Creating action graph: finished in 10.0 sec
Building: finished in 06:38.4 min (100%) 29890/29890 jobs, 29890 updated
Total time: 06:55.7 min
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short
# Benchmarking PyTorch: sigmoid
With this diff
buck run mode/opt caffe2/benchmarks/operator_benchmark:benchmark_all_test -- --operator sigmoid
Parsing buck files: finished in 6.4 sec
Creating action graph: finished in 9.8 sec
Building: finished in 06:35.9 min (100%) 29892/29892 jobs, 29892 updated
Total time: 06:52.1 min
Reviewed By: hl475
Differential Revision: D18152071
fbshipit-source-id: 80c29570581bbd2f0e78e2df32734c17a2b036ee
2019-10-28 11:10:47 -07:00
Mingzhe Li
e886450863
report p50 time instead of avg ( #28722 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28722
as title
Test Plan:
```buck run mode/opt caffe2/benchmarks/operator_benchmark:benchmark_all_test -- --operator sigmoid
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short
# Benchmarking PyTorch: sigmoid
iters: 200, 462.6029555220157
iters: 400, 441.04792759753764
iters: 800, 441.81562116136774
iters: 1600, 440.79964311094955
iters: 3200, 436.3108493271284
iters: 6400, 440.87966314691585
iters: 12800, 452.29464218209614
# Mode: Eager
# Name: sigmoid_M512_N512
# Input: M: 512, N: 512
Forward Execution Time (us) : 441.048
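The change can be sketched in a few lines: the reported time is now the p50 (median) of the per-block measurements rather than their average. Using the iteration values printed above, the median matches the reported 441.048 us:

```python
import statistics

# Per-block measurements taken from the iteration printout above.
samples = [462.603, 441.048, 441.816, 440.800, 436.311, 440.880, 452.295]
p50 = statistics.median(samples)       # robust to the outlier runs
mean = sum(samples) / len(samples)     # the previously reported statistic
```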
Reviewed By: hl475
Differential Revision: D18149525
fbshipit-source-id: 5fe70a35b790ee7ad3ff57c0cb0b1c29cb609b83
2019-10-25 17:22:27 -07:00
なるみ
d83389d327
Ignore F401 in all __init__.py without putting noqa ( #25823 )
...
Summary:
By adding `per-file-ignores = __init__.py: F401` to `.flake8` with `flake8>=3.7`, we can ignore F401 in all `__init__.py` files without adding `# noqa: F401` line by line.
http://flake8.pycqa.org/en/latest/user/options.html?highlight=per-file-ignores#cmdoption-flake8-per-file-ignores
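The setting described above, as a `.flake8` fragment:

```ini
# .flake8 (requires flake8 >= 3.7)
[flake8]
per-file-ignores =
    __init__.py: F401
```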
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25823
Differential Revision: D17252182
Pulled By: soumith
fbshipit-source-id: 87b174075b79e4078953a7521bd1a8f82405646b
2019-10-23 15:28:13 -07:00
Sebastian Messmer
243298668c
Remove confusing torch::jit::RegisterOperators for custom ops ( #28229 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28229
We have `torch::RegisterOperators` for custom ops. `torch::jit::RegisterOperators` had a dual state of being able to register custom ops if called one way and being able to register pure JIT ops if called another way.
This is confusing because you end up in different operator libraries depending on which API exactly you're using.
This PR removes the ability for torch::jit::RegisterOperators to register custom ops and forces people to use the new torch::RegisterOperators.
This was already deprecated before but we now remove it.
ghstack-source-id: 92137305
Test Plan: unit tests
Differential Revision: D17981895
fbshipit-source-id: 0af267dfdc3c6a2736740091cf841bac40deff40
2019-10-18 10:46:31 -07:00
Mingzhe Li
5c2bf8abe5
change linear benchmark shapes ( #28228 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28228
as title
Test Plan:
```
buck run //caffe2/benchmarks/operator_benchmark/pt:linear_test
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short
# Benchmarking PyTorch: linear
# Mode: Eager
# Name: linear_N32_IN1024_OUT256
# Input: N: 32, IN: 1024, OUT: 256
Forward Execution Time (us) : 1501.918
# Benchmarking PyTorch: linear
# Mode: Eager
# Name: linear_N64_IN256_OUT100
# Input: N: 64, IN: 256, OUT: 100
Forward Execution Time (us) : 1175.672
Reviewed By: hl475
Differential Revision: D17980463
fbshipit-source-id: c8aaf6fa4d847037accb1e5b9ee04900690fd6ae
2019-10-17 11:09:10 -07:00
Mingzhe Li
cbcb70f84c
print last 50 runs when using ai_pep_format ( #28128 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28128
as title
Test Plan:
```
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short
# Benchmarking PyTorch: add
PyTorchObserver {"type": "add_M64_N64_K64_cpu", "metric": "latency", "unit": "ms", "value": "29.169559478759766"}
PyTorchObserver {"type": "add_M64_N64_K64_cpu", "metric": "latency", "unit": "ms", "value": "29.206514358520508"}
PyTorchObserver {"type": "add_M64_N64_K64_cpu", "metric": "latency", "unit": "ms", "value": "29.4950008392334"}
PyTorchObserver {"type": "add_M64_N64_K64_cpu", "metric": "latency", "unit": "ms", "value": "29.172897338867188"}
PyTorchObserver {"type": "add_M64_N64_K64_cpu", "metric": "latency", "unit": "ms", "value": "29.27255630493164"}
PyTorchObserver {"type": "add_M64_N64_K64_cpu", "metric": "latency", "unit": "ms", "value": "29.549837112426758"}
PyTorchObserver {"type": "add_M64_N64_K64_cpu", "metric": "latency", "unit": "ms", "value": "29.63113784790039"}
...
Reviewed By: hl475
Differential Revision: D17957611
fbshipit-source-id: 4e70ba2070b97fbbca0d6d4295abbead2ac356d4
2019-10-16 15:22:23 -07:00
Mingzhe Li
182abb2580
accept -1 in iterations and warmup iterations ( #28014 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28014
as title
Test Plan:
```
buck run caffe2/benchmarks/operator_benchmark/pt:add_test -- --iterations -1 --warmup_iterations -1
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short
# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M64_N64_K64_cpu
# Input: M: 64, N: 64, K: 64, device: cpu
Forward Execution Time (us) : 30827.046
...
Reviewed By: hl475
Differential Revision: D17932071
fbshipit-source-id: e4d9d256a0a4958110f61af13afdde70fc0f746c
2019-10-15 11:55:37 -07:00
Mingzhe Li
382917bbd1
report per iteration execution time ( #27923 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27923
As title
Test Plan:
```
buck run caffe2/benchmarks/operator_benchmark/pt:add_test -- --iterations 3 --ai_pep_format true
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short
# Benchmarking PyTorch: add
PyTorchObserver {"type": "add_M64_N64_K64_cpu", "metric": "latency", "unit": "us", "value": "0.027768373489379883"}
PyTorchObserver {"type": "add_M64_N64_K64_cpu", "metric": "latency", "unit": "us", "value": "0.02661752700805664"}
PyTorchObserver {"type": "add_M64_N64_K64_cpu", "metric": "latency", "unit": "us", "value": "0.026746749877929688"}
...
Reviewed By: hl475
Differential Revision: D17911718
fbshipit-source-id: 6fe28f2ab9ce1e0feabb5b822f04ff32dac977a9
2019-10-14 15:44:42 -07:00
Mingzhe Li
38a3eabd3e
remove cuda from add_test
...
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27698
Test Plan:
```
buck run caffe2/benchmarks/operator_benchmark/pt:add_test -- --iterations 3
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short
# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M64_N64_K64_cpu
# Input: M: 64, N: 64, K: 64, device: cpu
Forward Execution Time (us) : 29691.940
# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M64_N64_K128_cpu
# Input: M: 64, N: 64, K: 128, device: cpu
Forward Execution Time (us) : 60820.813
Reviewed By: hl475
Differential Revision: D17855731
fbshipit-source-id: c64c530f4dbcb5b4132a88894b24e5658aa49d66
2019-10-10 08:32:04 -07:00
Mingzhe Li
aeae5d6020
add dim to the cat benchmark ( #27620 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27620
as title
Test Plan:
```
buck run caffe2/benchmarks/operator_benchmark/pt:cat_test -- --iterations 3
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short
# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_M256_N512_K1_dim0
# Input: M: 256, N: 512, K: 1, dim: 0
Forward Execution Time (us) : 775.348
# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_M256_N512_K1_dim1
# Input: M: 256, N: 512, K: 1, dim: 1
Forward Execution Time (us) : 3612.599
# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_M256_N512_K1_dim2
# Input: M: 256, N: 512, K: 1, dim: 2
Forward Execution Time (us) : 91416.224
...
```
Reviewed By: hl475
Differential Revision: D17835348
fbshipit-source-id: 94e02e328c4ea61b2e210d860ccdd377ef2b97f8
2019-10-09 16:03:07 -07:00
Mingzhe Li
abcd221f19
add as_strided operator to the benchmark ( #27632 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27632
Support as_strided operator in the benchmark suite.
Test Plan:
buck run caffe2/benchmarks/operator_benchmark/pt:as_strided_test -- --iterations 3
```
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short
# Benchmarking PyTorch: as_strided
# Mode: Eager
# Name: as_strided_M256_N256_size(32,32)_stride(1,1)_storage_offset0
# Input: M: 256, N: 256, size: (32, 32), stride: (1, 1), storage_offset: 0
Forward Execution Time (us) : 92.008
# Benchmarking PyTorch: as_strided
# Mode: Eager
# Name: as_strided_M256_N256_size(32,32)_stride(1,1)_storage_offset1
# Input: M: 256, N: 256, size: (32, 32), stride: (1, 1), storage_offset: 1
Forward Execution Time (us) : 91.029
...
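The index arithmetic behind as_strided can be sketched in pure Python (no torch; `as_strided_index` is an illustrative helper, not the benchmark's code): element (i, j) of a view reads flat storage index `storage_offset + i * stride[0] + j * stride[1]`.

```python
def as_strided_index(i, j, stride=(1, 1), storage_offset=0):
    # Flat storage index read by element (i, j) of the strided view.
    return storage_offset + i * stride[0] + j * stride[1]

# With size (32, 32), stride (1, 1), storage_offset 1 (the second case
# above), element (0, 0) reads storage index 1.
```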
Reviewed By: hl475
Differential Revision: D17840076
fbshipit-source-id: 6585feb51ebfaca40032ffa0a61d5f76c25a2599
2019-10-09 15:42:05 -07:00
Dylan Bespalko
7c472ec597
Vectorized complex unary and binary op support. ( #26500 )
...
Summary:
Added Complex support with AVX to unary ops and binary ops.
I need to add nan propagation to minimum() and maximum() in the future.
In-tree changes to pytorch to support complex numbers are being submitted here.
Out-of-tree support for complex numbers is here: pytorch-cpu-strided-complex extension
Preliminary Benchmarks are here.
I tried rrii and riri and found that riri is better in most situations.
Divide is very slow because you can't reduce 1/(x+y).
Sqrt is also very slow.
Reciprocal could be sped up after I add conj().
Everything else is typically within 20% of the real number performance.
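The cost of the complex reciprocal can be seen from its expansion (a quick Python check of the identity, not the AVX kernel): 1/(x + iy) = (x - iy) / (x^2 + y^2), i.e. a conjugate plus a real divide, which is why reciprocal could be sped up once conj() is available.

```python
# Verify the reciprocal expansion against Python's built-in complex divide.
z = complex(3.0, 4.0)
manual = z.conjugate() / (z.real ** 2 + z.imag ** 2)
```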
Questions:
Why does macOS not support MKL? #if AT_MKL_ENABLED() && !defined(__APPLE__) in vml.h. MKL does support some complex operations like Abs, so I was curious about trying it.
Is MKL just calling AVX?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26500
Differential Revision: D17835431
Pulled By: ezyang
fbshipit-source-id: 6746209168fbeb567af340c22bf34af28286bd54
2019-10-09 12:49:21 -07:00
Mingzhe Li
ab15584dce
add random sample function to generate list of inputs ( #23174 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23174
This diff introduces a new function that randomly generates inputs based on the weights.
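The idea can be sketched as weighted random sampling over candidate shapes (the names and weights below are illustrative, not the benchmark suite's actual API):

```python
import random

random.seed(0)  # deterministic for illustration
shapes = [(1, 5, 7), (1, 6, 8), (2, 6, 7)]  # candidate (M, N, K) inputs
weights = [0.5, 0.3, 0.2]                   # relative sampling weights
sampled = random.choices(shapes, weights=weights, k=3)
```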
Test Plan:
buck run mode/dev-nosan //caffe2/benchmarks/operator_benchmark/common/tests:random_sample_test -- --iterations 3
```
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short
# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M1_N5_K7
# Input: M: 1, N: 5, K: 7
Forward Execution Time (us) : 82.923
# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M1_N6_K8
# Input: M: 1, N: 6, K: 8
Forward Execution Time (us) : 79.535
# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M2_N6_K7
# Input: M: 2, N: 6, K: 7
Forward Execution Time (us) : 83.471
# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M1_N4_K7
# Input: M: 1, N: 4, K: 7
Forward Execution Time (us) : 84.410
# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M1_N6_K7
# Input: M: 1, N: 6, K: 7
Forward Execution Time (us) : 82.399
```
Reviewed By: zheng-xq
Differential Revision: D15791723
fbshipit-source-id: 730e34d455e962ddf594a491d7c81c3f99fafa86
2019-10-09 11:24:14 -07:00
Mingzhe Li
c1ed0150c5
canonical example of torch.add benchmark ( #23402 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23402
This diff makes torch.add a canonical example for op benchmarks. Once it lands, we will also modify all other op benchmarks to be consistent with this example. With that, when people add new ops, they can copy-paste any existing code.
Test Plan:
buck run mode/dev-nosan caffe2/benchmarks/operator_benchmark/pt:add_test -- --iterations 3
```
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short
# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M8_N16_K32_devicecpu
# Input: M: 8, N: 16, K: 32, device: cpu
Forward Execution Time (us) : 146.586
# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M8_N16_K32_devicecuda
# Input: M: 8, N: 16, K: 32, device: cuda
Forward Execution Time (us) : 92.151
# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M16_N16_K64_devicecpu
# Input: M: 16, N: 16, K: 64, device: cpu
Forward Execution Time (us) : 428.421
# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M16_N16_K64_devicecuda
# Input: M: 16, N: 16, K: 64, device: cuda
Forward Execution Time (us) : 89.811
# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M64_N64_K128_devicecpu
# Input: M: 64, N: 64, K: 128, device: cpu
Forward Execution Time (us) : 11857.012
# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M64_N64_K128_devicecuda
# Input: M: 64, N: 64, K: 128, device: cuda
Forward Execution Time (us) : 93.918
# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M8_N16_K32_devicecpu_bwdall
# Input: M: 8, N: 16, K: 32, device: cpu
Backward Execution Time (us) : 990.125
# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M8_N16_K32_devicecpu_bwd1
# Input: M: 8, N: 16, K: 32, device: cpu
Backward Execution Time (us) : 781.217
# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M8_N16_K32_devicecpu_bwd2
# Input: M: 8, N: 16, K: 32, device: cpu
Backward Execution Time (us) : 777.307
```
Reviewed By: zheng-xq
Differential Revision: D16501974
fbshipit-source-id: f1eec010eabf11ce4fcf6cfe6f85cd5241a7022d
2019-10-09 11:24:10 -07:00
Mingzhe Li
a750a1a2b4
modify config_list to support cross product of attributes ( #23399 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23399
This diff enables the config_list function to support cross products of inputs beyond the shapes.
The following is an example using the updated interface. The same input shapes can run on different devices and dtypes.
```
add_short_configs = op_bench.config_list(
attr_names=['M', 'N', 'K'],
attrs=[
[8, 16, 32],
[16, 16, 64],
[64, 64, 128],
],
cross_product_configs={
'device': ['cpu', 'cuda'],
'dtype': [torch.float, torch.float64],
},
tags=['short'],
)
```
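Roughly, the expansion works as follows — a pure-Python sketch of the cross product, not the actual op_bench implementation: each [M, N, K] row is paired with every (device, dtype) combination.

```python
import itertools

attrs = [[8, 16, 32], [16, 16, 64], [64, 64, 128]]
cross = {"device": ["cpu", "cuda"], "dtype": ["float32", "float64"]}

# Pair every shape row with every combination of the cross-product attrs.
configs = [
    dict(zip(["M", "N", "K"], row), **dict(zip(cross, combo)))
    for row in attrs
    for combo in itertools.product(*cross.values())
]
# 3 shape rows x 2 devices x 2 dtypes = 12 configs
```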
Test Plan:
buck run mode/dev-nosan caffe2/benchmarks/operator_benchmark/common/tests:pt_configs_list_test -- --iterations 3
```
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short
# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M8_N16_K32_devicecpu_dtypetorch.float32
# Input: M: 8, N: 16, K: 32, device: cpu, dtype: torch.float32
Forward Execution Time (us) : 164.489
# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M8_N16_K32_devicecpu_dtypetorch.float64
# Input: M: 8, N: 16, K: 32, device: cpu, dtype: torch.float64
Forward Execution Time (us) : 158.677
# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M8_N16_K32_devicecuda_dtypetorch.float32
# Input: M: 8, N: 16, K: 32, device: cuda, dtype: torch.float32
Forward Execution Time (us) : 103.866
# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M8_N16_K32_devicecuda_dtypetorch.float64
# Input: M: 8, N: 16, K: 32, device: cuda, dtype: torch.float64
Forward Execution Time (us) : 106.027
# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M16_N16_K64_devicecpu_dtypetorch.float32
# Input: M: 16, N: 16, K: 64, device: cpu, dtype: torch.float32
Forward Execution Time (us) : 451.016
...
```
buck test caffe2/benchmarks/operator_benchmark/fb:operator_benchmark_test
```
Building: finished in 2.4 sec (100%) 6882/6882 jobs, 2 updated
Total time: 2.8 sec
Trace available for this run at /tmp/testpilot.20190730-160519.3952794.log
TestPilot test runner for Facebook. See https://fburl.com/testpilot for details.
Testpilot build revision 203f0104fbfcec4128be2c482c64736309ae39c9 fbpkg a4b2a9897a0c45069bd07d83e5981052 at Sun Jul 28 01:22:13 2019 by twsvcscm from /data/fbprojects/packages/testinfra.testpilot/667/t.par
Discovering tests
Running 3 tests
Started new test run: https://our.intern.facebook.com/intern/testinfra/testrun/5910974514382830
✓ caffe2/benchmarks/operator_benchmark/fb:operator_benchmark_test - test_config_list_impl (operator_benchmark_test.TestConsumeOp) 0.011 1/3 (passed)
✓ caffe2/benchmarks/operator_benchmark/fb:operator_benchmark_test - test_list_of_ops (operator_benchmark_test.TestConsumeOp) 19.920 2/3 (passed)
✓ caffe2/benchmarks/operator_benchmark/fb:operator_benchmark_test - test_single_op (operator_benchmark_test.TestConsumeOp) 23.418 3/3 (passed)
✓ caffe2/benchmarks/operator_benchmark/fb:operator_benchmark_test - main 0.000 (passed)
Finished test run: https://our.intern.facebook.com/intern/testinfra/testrun/5910974514382830
Summary (total time 29.90s):
PASS: 4
FAIL: 0
SKIP: 0
FATAL: 0
TIMEOUT: 0
OMIT: 0
```
Reviewed By: zheng-xq
Differential Revision: D16501272
fbshipit-source-id: d92b5cf50b0f37d5b3a79d423acb521366b4e8db
2019-10-09 11:24:06 -07:00
Mingzhe Li
31a6ff46c1
change input shape to reduce variation ( #27548 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27548
as title
Test Plan: i_dont_want_it
Reviewed By: hl475
Differential Revision: D17811295
fbshipit-source-id: 3be957f6f3eaa464ebf4f5bd7c07d096ae4eae8c
2019-10-08 11:45:06 -07:00
Daya Khudia
bf7ebc5a53
Set number of threads for operator_benchmarks ( #27010 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27010
Setting OMP_NUM_THREADS programmatically doesn't take effect because initialization has already happened. We fix this by calling torch.set_num_threads explicitly.
Passing --omp_num_threads now works as expected.
In dir benchmarks/operator_benchmark/
python -m pt.qconv_test --tag_filter resnext101_32x4 --wipe_cache --test_name QConv2d_N1_IC64_OC128_H56_W56_G1_kernel1_stride1_pad0 --omp_num_threads 1
```
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : None
# Benchmarking PyTorch: QConv2d
# Mode: Eager
# Name: QConv2d_N1_IC64_OC128_H56_W56_G1_kernel1_stride1_pad0
# Input: N: 1, IC: 64, OC: 128, H: 56, W: 56, G: 1, kernel: 1, stride: 1, pad: 0
Forward Execution Time (us) : 509.965
# Benchmarking PyTorch: QConv2d
# Mode: Eager
# Name: QConv2d_N1_IC64_OC128_H56_W56_G1_kernel1_stride1_pad0
# Input: N: 1, IC: 64, OC: 128, H: 56, W: 56, G: 1, kernel: 1, stride: 1, pad: 0
Forward Execution Time (us) : 576.007
```
python -m pt.qconv_test --tag_filter resnext101_32x4 --wipe_cache --test_name QConv2d_N1_IC64_OC128_H56_W56_G1_kernel1_stride1_pad0 --omp_num_threads 4
```
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : None
# Benchmarking PyTorch: QConv2d
# Mode: Eager
# Name: QConv2d_N1_IC64_OC128_H56_W56_G1_kernel1_stride1_pad0
# Input: N: 1, IC: 64, OC: 128, H: 56, W: 56, G: 1, kernel: 1, stride: 1, pad: 0
Forward Execution Time (us) : 195.002
# Benchmarking PyTorch: QConv2d
# Mode: Eager
# Name: QConv2d_N1_IC64_OC128_H56_W56_G1_kernel1_stride1_pad0
# Input: N: 1, IC: 64, OC: 128, H: 56, W: 56, G: 1, kernel: 1, stride: 1, pad: 0
Forward Execution Time (us) : 189.788
```
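The fix can be sketched as follows. This is a minimal illustration, assuming only the documented torch.set_num_threads/torch.get_num_threads API; the helper name is hypothetical and not part of the benchmark suite:

```python
import torch

def apply_thread_setting(omp_num_threads):
    # Hypothetical helper: an OMP_NUM_THREADS value set via the
    # environment after startup is ignored because OpenMP is already
    # initialized, so the flag value is applied at runtime through
    # torch.set_num_threads instead.
    if omp_num_threads is not None:
        torch.set_num_threads(omp_num_threads)
    return torch.get_num_threads()
```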
ghstack-source-id: 91050434
Test Plan: See summary
Differential Revision: D17647391
fbshipit-source-id: e00de1151902291ed94fd34446995ea1f0199d14
2019-09-30 17:04:51 -07:00
Daya Khudia
fc926d9242
fix operator level benchmark to have NHWC layout ( #26577 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26577
Use the NHWC layout expected by the qconv kernel,
for resnext101-32x4d shapes.
Before:
```
Forward Execution Time (us) : 4787.046
Forward Execution Time (us) : 1320.065
Forward Execution Time (us) : 2611.631
Forward Execution Time (us) : 2562.389
Forward Execution Time (us) : 1072.342
Forward Execution Time (us) : 2330.658
Forward Execution Time (us) : 1894.549
Forward Execution Time (us) : 3446.532
Forward Execution Time (us) : 2381.251
Forward Execution Time (us) : 1157.339
Forward Execution Time (us) : 2712.621
Forward Execution Time (us) : 3789.905
Forward Execution Time (us) : 4057.886
Forward Execution Time (us) : 6104.570
Forward Execution Time (us) : 11328.552
Forward Execution Time (us) : 3707.519
Forward Execution Time (us) : 4681.272
Forward Execution Time (us) : 2459.266
Forward Execution Time (us) : 849.564
Forward Execution Time (us) : 3000.764
Forward Execution Time (us) : 3019.704
Forward Execution Time (us) : 5216.046
Forward Execution Time (us) : 3403.549
Forward Execution Time (us) : 1291.878
Forward Execution Time (us) : 2057.147
```
After:
```
Forward Execution Time (us) : 4398.649
Forward Execution Time (us) : 993.619
Forward Execution Time (us) : 2252.265
Forward Execution Time (us) : 2230.500
Forward Execution Time (us) : 977.389
Forward Execution Time (us) : 2233.356
Forward Execution Time (us) : 1223.085
Forward Execution Time (us) : 2758.765
Forward Execution Time (us) : 2208.028
Forward Execution Time (us) : 821.816
Forward Execution Time (us) : 2396.748
Forward Execution Time (us) : 2505.803
Forward Execution Time (us) : 2771.251
Forward Execution Time (us) : 4816.474
Forward Execution Time (us) : 10065.299
Forward Execution Time (us) : 2424.949
Forward Execution Time (us) : 3854.800
Forward Execution Time (us) : 2297.426
Forward Execution Time (us) : 682.403
Forward Execution Time (us) : 2297.541
Forward Execution Time (us) : 2317.828
Forward Execution Time (us) : 4517.372
Forward Execution Time (us) : 2716.691
Forward Execution Time (us) : 942.385
Forward Execution Time (us) : 1717.172
```
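The layout change can be sketched as follows. This is an illustrative assumption about the shape handling, not the benchmark's exact code: the activation keeps its logical NCHW shape but gets NHWC strides (channels innermost), which is what the qconv kernel expects.

```python
import torch

# Example NCHW activation (shapes here are illustrative).
x_nchw = torch.randn(1, 64, 56, 56)
# Permute to NHWC, make that layout contiguous in memory, then permute
# back so the logical shape is NCHW again while the strides are NHWC.
x_nhwc = x_nchw.permute(0, 2, 3, 1).contiguous().permute(0, 3, 1, 2)
```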
ghstack-source-id: 90536232
Test Plan: buck build mode/opt caffe2/benchmarks/operator_benchmark/pt:qconv_test --show-output
Differential Revision: D17512291
fbshipit-source-id: 7764b2ab38e0e8e0aab982006915176638004df6
2019-09-23 11:12:51 -07:00
Jerry Zhang
254122dd4e
quantize_linear -> quantize_per_tensor ( #26574 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26574
Since we also have `quantized::linear`, `quantize_linear` sounds
confusing, so we plan to rename it before the branch cut
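For reference, the renamed op is used like this (the scale and zero_point values here are illustrative):

```python
import torch

x = torch.randn(4)
# Per-tensor quantization: a single scale and zero_point for the whole
# tensor (this op was previously named torch.quantize_linear).
q = torch.quantize_per_tensor(x, scale=0.1, zero_point=0, dtype=torch.quint8)
x_back = q.dequantize()
```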
Test Plan:
ci
Imported from OSS
Differential Revision: D17514876
fbshipit-source-id: 01d9005e6ec8cb9950b9d8bba122109c389641d3
2019-09-20 21:58:48 -07:00
Dmytro Dzhulgakov
af64789cfa
Fold activation permutation inside quantized conv operator ( #26242 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26242
According to https://github.com/pytorch/pytorch/issues/19092 we always keep NCHW order and do handling inside the kernels. This PR fixes it for activations of the qconv by using MemoryLayout mechanism - activations stay logically as NCHW but strided as NHWC.
Note, that this version is more aggressive than eventual MemoryLayout mechanism - the QConv's output is always NHWC regardless of the input striding. I think it's ok as we don't have NCHW quantized kernels anyway - so the very first conv would magically switch the order, but I'm open to suggestions. Btw, it doesn't change behavior - same happens today in master because of the explicit permute() call.
Test Plan: Imported from OSS
Differential Revision: D17443218
Pulled By: dzhulgakov
fbshipit-source-id: cfd136ae0465acd8d8c26ffad87385dac9c88726
2019-09-19 13:39:26 -07:00
Huamin Li
2a917616a8
remove cosh_ op test ( #25893 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25893
as title
Test Plan: waitforsandcastle
Reviewed By: mingzhe09088
Differential Revision: D17278340
fbshipit-source-id: 81b7e8658d5919e865754ae4d834dc44494cb2e3
2019-09-09 20:34:35 -07:00
Huamin Li
1c81d9006a
increase input shape to reduce variance ( #25812 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25812
as title
Test Plan:
```
[huaminli@devvm2388.ftw3 ~/fbsource/fbcode] buck run mode/dev-nosan caffe2/benchmarks/operator_benchmark:benchmark_all_test -- --operators None --iterations 3
```
last few lines of the output P109238440
Reviewed By: mingzhe09088
Differential Revision: D17246792
fbshipit-source-id: d93ee5f404164d32210968997c6ea63b82058d2a
2019-09-07 06:25:26 -07:00
Huamin Li
d4226392bd
change shape for some ops to reduce variance ( #25686 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25686
From the new runs, we found some ops that we can increase the shape size to reduce the variance
Test Plan:
```
[huaminli@devvm2388.ftw3 ~/fbsource/fbcode] buck run mode/dev-nosan caffe2/benchmarks/operator_benchmark:benchmark_all_test -- --operators None --iterations 3
```
last few lines of the output P108624830
Reviewed By: mingzhe09088
Differential Revision: D17199623
fbshipit-source-id: a9277509f6d3e6503d3086b3b02f87eebd953239
2019-09-04 21:17:43 -07:00
Huamin Li
cd4a7cdaa6
change shape for some ops to reduce variance
...
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/25619
Test Plan:
```
[huaminli@devvm2388.ftw3 ~/fbsource/fbcode] buck run mode/dev-nosan caffe2/benchmarks/operator_benchmark:benchmark_all_test -- --operators None --iterations 3
```
last few lines of output P108286305
Reviewed By: mingzhe09088
Differential Revision: D17175802
fbshipit-source-id: 46b69fc1895444b15b6dfcec0625b6b9b006712a
2019-09-03 18:52:25 -07:00
Huamin Li
9d89c9a30f
change shape for conv and unary ops ( #25477 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25477
We want to increase `in_c, out_c` so that the metrics reported back are more stable
Test Plan:
```
[huaminli@devvm2388.ftw3 ~/fbsource/fbcode] buck run mode/dev-nosan caffe2/benchmarks/operator_benchmark:benchmark_all_test -- --operators None --iterations 3
```
runs fine on my devserver, last couple lines of output P107448746
Reviewed By: mingzhe09088
Differential Revision: D17133043
fbshipit-source-id: 0b989a530cbfe3d608471a30ae4bbda10e5216ea
2019-08-30 10:02:30 -07:00
Rohan Varma
4b77cae360
Add qconv_test to benchmarking tests ( #24913 )
...
Summary:
Adds the tests defined in `qconv_tests.py` to `benchmark_all_tests.py` so that they are run by `benchmark_all_tests`.
The next diff will create another `ai_benchmark_test` specifying the qconv operations similar to D16768680. Since AI-PEP integrates with benchmark_all_tests, this should add these qconv benchmarks to AI-PEP.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24913
Test Plan:
`buck run mode/opt caffe2/benchmarks/operator_benchmark:benchmark_all_test` (runs only tests whose `tag` is `short`)
`buck run mode/opt caffe2/benchmarks/operator_benchmark:benchmark_all_test -- --tag_filter resnext101_32x4d` (runs tests whose `tag` is `resnext101_32x4d`).
This runs the tests for all the imported modules in `benchmark_all_test.py` (i.e. add_test, batchnorm_test, qconv_test, etc.)
```
buck run mode/opt caffe2/benchmarks/operator_benchmark:benchmark_all_test -- --operators QConv2d,QLinear
```
tests the QConv and QLinear operators
Relevant output for `qconv_test.py` (for short tag):
```
# Benchmarking PyTorch: QConv2d
# Mode: Eager
# Name: QConv2d_N1_IC64_OC128_H56_W56_G1_kernel1_stride1_pad0
# Input: N: 1, IC: 64, OC: 128, H: 56, W: 56, G: 1, kernel: 1, stride: 1, pad: 0
Forward Execution Time (us) : 957.848
# Benchmarking PyTorch: QConv2d
# Mode: Eager
# Name: QConv2d_N1_IC256_OC256_H56_W56_G32_kernel3_stride1_pad1
# Input: N: 1, IC: 256, OC: 256, H: 56, W: 56, G: 32, kernel: 3, stride: 1, pad: 1
Forward Execution Time (us) : 3638.806
# Benchmarking PyTorch: QConv2d
# Mode: Eager
# Name: QConv2d_N1_IC256_OC256_H56_W56_G1_kernel1_stride1_pad0
# Input: N: 1, IC: 256, OC: 256, H: 56, W: 56, G: 1, kernel: 1, stride: 1, pad: 0
Forward Execution Time (us) : 3870.311
# Benchmarking PyTorch: QConv2d
# Mode: Eager
# Name: QConv2d_N1_IC512_OC512_H56_W56_G32_kernel3_stride2_pad1
# Input: N: 1, IC: 512, OC: 512, H: 56, W: 56, G: 32, kernel: 3, stride: 2, pad: 1
Forward Execution Time (us) : 10052.192
```
For resnext tag:
```
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : resnext101_32x4d
# Benchmarking PyTorch: QConv2d
# Mode: Eager
# Name: QConv2d_N1_IC512_OC512_H14_W14_G32_kernel3_stride1_pad1
# Input: N: 1, IC: 512, OC: 512, H: 14, W: 14, G: 32, kernel: 3, stride: 1, pad: 1
Forward Execution Time (us) : 543.171
# Benchmarking PyTorch: QConv2d
# Mode: Eager
# Name: QConv2d_N1_IC512_OC1024_H28_W28_G1_kernel1_stride2_pad0
# Input: N: 1, IC: 512, OC: 1024, H: 28, W: 28, G: 1, kernel: 1, stride: 2, pad: 0
Forward Execution Time (us) : 1914.301
# Benchmarking PyTorch: QConv2d
# Mode: Eager
# Name: QConv2d_N1_IC512_OC256_H28_W28_G1_kernel1_stride1_pad0
# Input: N: 1, IC: 512, OC: 256, H: 28, W: 28, G: 1, kernel: 1, stride: 1, pad: 0
Forward Execution Time (us) : 1809.069
# Benchmarking PyTorch: QConv2d
# Mode: Eager
# Name: QConv2d_N1_IC512_OC512_H28_W28_G1_kernel1_stride1_pad0
# Input: N: 1, IC: 512, OC: 512, H: 28, W: 28, G: 1, kernel: 1, stride: 1, pad: 0
Forward Execution Time (us) : 3100.579
# Benchmarking PyTorch: QConv2d
# Mode: Eager
# Name: QConv2d_N1_IC512_OC512_H28_W28_G32_kernel3_stride2_pad1
# Input: N: 1, IC: 512, OC: 512, H: 28, W: 28, G: 32, kernel: 3, stride: 2, pad: 1
Forward Execution Time (us) : 2247.540
# Benchmarking PyTorch: QConv2d
# Mode: Eager
# Name: QConv2d_N1_IC64_OC128_H56_W56_G1_kernel1_stride1_pad0
# Input: N: 1, IC: 64, OC: 128, H: 56, W: 56, G: 1, kernel: 1, stride: 1, pad: 0
Forward Execution Time (us) : 1001.731
# Benchmarking PyTorch: QConv2d
# Mode: Eager
# Name: QConv2d_N1_IC64_OC256_H56_W56_G1_kernel1_stride1_pad0
# Input: N: 1, IC: 64, OC: 256, H: 56, W: 56, G: 1, kernel: 1, stride: 1, pad: 0
Forward Execution Time (us) : 1571.620
```
Differential Revision: D16908445
Pulled By: rohan-varma
fbshipit-source-id: b711bc3591ce5dcd3ab2521134cff2b12188e3ac
2019-08-22 11:28:49 -07:00
Rohan Varma
60518e0035
Add resnext 32x4d shapes to benchmark ( #24503 )
...
Summary:
Adds resnext101-32x4d shapes to the qconv benchmarks. (Also ran the code formatter.)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24503
Test Plan:
Run tests on devserver:
```
buck run mode/opt caffe2/benchmarks/operator_benchmark/pt:qconv_test -- --omp_num_threads 1 --mkl_num_threads 1
```
Reviewed By: dskhudia
Differential Revision: D16845746
Pulled By: rohan-varma
fbshipit-source-id: d9f842e5f455fccecf547129c5faffa253a49e23
2019-08-19 12:04:48 -07:00
Huamin Li
5c57cedc16
change the location of wipe cache ( #24454 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24454
We want to change the place of wipe_cache. From what we observed, the original place does not help.
Reviewed By: mingzhe09088
Differential Revision: D16853205
fbshipit-source-id: 1f6224a52433cbe15c0d27000b4ac140fb9cd4c3
2019-08-15 20:55:47 -07:00
Huamin Li
1b38a6f602
add wipe cache
...
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/24390
Reviewed By: mingzhe09088
Differential Revision: D16808041
fbshipit-source-id: 1b19f47706e4e2f2e03356469315b55c6ff76d20
2019-08-14 23:48:52 -07:00
Huamin Li
f511abb701
increase default warmup iter and iter ( #24272 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24272
As title, plus some lint
Reviewed By: mingzhe09088
Differential Revision: D16792312
fbshipit-source-id: 1386c369c96da04a584d1f7127b708b29d4b47d2
2019-08-13 14:35:19 -07:00
Mingzhe Li
b453fd9916
separate input shapes to reduce default execution time ( #24136 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24136
This diff aims to reduce the execution time of benchmark_all_test, which runs all the supported operator benchmarks. In the default run, only one shape of each operator is benchmarked. The rest of the benchmarks can be triggered with the tag_filter flag.
Reviewed By: hl475
Differential Revision: D16736448
fbshipit-source-id: 33bd86f6fc2610f87f24240ad559fb11d3e35e89
2019-08-09 17:09:21 -07:00
Daya Khudia
aa02b1adcd
Fix qconv benchmark ( #24019 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24019
Permutes are done inside the module. We don't need them outside.
Setting of scale/zero_point has changed.
Reviewed By: jianyuh
Differential Revision: D16712437
fbshipit-source-id: e3cedf9d63347fbf8070d1a65a196e6d4b2833fc
2019-08-09 09:17:55 -07:00
Mingzhe Li
29e2b58b00
Back out "[op-bench][experiment] increase predefined_minimum_secs to reduce variation" ( #24065 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24065
Original commit changeset: d4c034f64b1d
Reviewed By: hl475
Differential Revision: D16726647
fbshipit-source-id: 6cd6cfdad804efb073062809bcbc4c0921a3d007
2019-08-08 18:36:22 -07:00
Daya Khudia
fb06c9e61f
qconv operator level benchmark ( #22895 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22895
Adding op level benchmarking for qconv operator
Reviewed By: mingzhe09088
Differential Revision: D16274273
fbshipit-source-id: 6674753e38f6692f5e6d0db0cac90c5fbf358147
2019-08-05 09:39:16 -07:00
Mingzhe Li
5cb41d35da
increase predefined_minimum_secs to reduce variation ( #23734 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23734
In the latest run on AI-PEP, 6 tests out of 342 have more than 7% variation, and around 20 tests have variation between 4% and 7%. The rest are within 4%. This diff tries to further reduce the variation to 4% for all tests.
Each test has to run for predefined_minimum_secs seconds before exiting, so increasing that value makes all tests run longer. Based on the experimental results, we will determine the right value to use.
Reviewed By: hl475
Differential Revision: D16622361
fbshipit-source-id: d4c034f64b1d64e1cffd67ffbced7d8cd4449d69
2019-08-02 10:33:48 -07:00
Mingzhe Li
3c986dff77
introduce auto_set to simplify benchmarking the backward path of operators ( #23276 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23276
This diff introduces a new feature to simplify benchmarking the backward path of ops. Here is an example:
```
...
self.input_one = torch.rand(M, N, K, requires_grad=self.auto_set())
self.input_two = torch.rand(M, N, K, requires_grad=self.auto_set())
...
```
In this way, the benchmark will generate three different test cases.
1. input_one requires grad
2. input_two requires grad
3. both inputs require grad
Here is a sample output:
```
# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M1_N8_K8_bwdall
# Input: M: 1, N: 8, K: 8
Backward Execution Time (us) : 863.744
# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M1_N8_K8_bwd1
# Input: M: 1, N: 8, K: 8
Backward Execution Time (us) : 727.915
# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M1_N8_K8_bwd2
# Input: M: 1, N: 8, K: 8
Backward Execution Time (us) : 687.626
```
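The case enumeration can be modeled as follows. This is a simplified sketch that assumes nothing about the op_bench internals: each non-empty combination of grad-requiring inputs becomes its own test.

```python
from itertools import product

def enumerate_grad_cases(num_inputs):
    # Model of auto_set(): every combination of inputs with
    # requires_grad=True, excluding the all-False case, yields a
    # separate backward benchmark (bwd1, bwd2, ..., bwdall).
    return [flags for flags in product([False, True], repeat=num_inputs)
            if any(flags)]
```

For two inputs this yields exactly the three cases listed above.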
Reviewed By: zheng-xq
Differential Revision: D16450355
fbshipit-source-id: 50ae0916e81c3ff9f0c482ed6d386319eb15b305
2019-07-29 15:58:41 -07:00
Abhinav Jauhri
ffef0e03b7
Enabling GPU device runs for operators ( #23461 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23461
Enabling GPU device runs for production operator shapes.
Reviewed By: xw285cornell, mingzhe09088
Differential Revision: D16526928
fbshipit-source-id: 46657963f4b0bc43d14205ccf1b63d588657e388
2019-07-26 18:53:40 -07:00
Mingzhe Li
f0ebf769de
allow accepting empty input to the benchmark ( #23462 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23462
as title
Reviewed By: hl475
Differential Revision: D16527176
fbshipit-source-id: 7a8ff4f3c6122ae7b3205e0b446fec06fd95eedc
2019-07-26 17:30:42 -07:00
Mingzhe Li
53182e53f0
fix observer name in the benchmark output ( #23443 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23443
as title
Reviewed By: hl475
Differential Revision: D16520962
fbshipit-source-id: 7a0ccbece487837c204f242d2a5c6f69b32cbc8c
2019-07-26 12:20:41 -07:00
Mingzhe Li
828c08b4c7
allow passing a list of operators to benchmark ( #23442 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23442
Replace the argument name from `operator` to `operators` which can take a list of operators to test.
Reviewed By: hl475
Differential Revision: D16520779
fbshipit-source-id: 94284a87c64471793e319f5bd3143f89b9a192bb
2019-07-26 12:20:36 -07:00
Mingzhe Li
7499fe72e9
remove c2 tests from benchmark_all_test ( #23437 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23437
as title
Reviewed By: hl475
Differential Revision: D16519770
fbshipit-source-id: 63fc269e18c264d399e25f44b03f81fc3ae01113
2019-07-26 11:12:53 -07:00
Mingzhe Li
3516f3c235
handle exit from init method ( #21211 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21211
There are cases where the `init` method used to create inputs can exit with an error. When this happens, that specific input should be skipped.
Reviewed By: zheng-xq
Differential Revision: D15466410
fbshipit-source-id: 55e86764b2ec56f7730349ff1df6e50efc0239d7
2019-07-25 21:41:06 -07:00
Abhinav Jauhri
bae10db522
Incorporating arguments to pull production operators and adding device type. ( #23197 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23197
Incorporating arguments to pull production operators and adding device type.
Reviewed By: mingzhe09088
Differential Revision: D16387263
fbshipit-source-id: e20ed82225eb1e4b7ab1756ec157967b055d85bf
2019-07-23 13:43:26 -07:00
Kimish Patel
82db5dceb6
Added running via throughput benchmark options. ( #23077 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23077
The difference between running from Python and this is small when the forward method's loop is long enough (like 1000 iterations in this case).
Reviewed By: mingzhe09088
Differential Revision: D16122343
fbshipit-source-id: 5c1d1b98ae82c996baf9d42bcd04995e2ba60c78
2019-07-22 11:27:55 -07:00
Kimish Patel
2ba516d5b6
Added add op framework overhead benchmark for C2 ( #23078 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23078
C2 benchmark.
Reviewed By: mingzhe09088
Differential Revision: D16122337
fbshipit-source-id: bf56e60c6e60eda2be2938d9f613708a4bc1669a
2019-07-22 11:27:50 -07:00
Kimish Patel
0621068cdc
Add simple add op based framework overhead benchmark. ( #23076 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23076
Both tracing-based and non-tracing-based variants are added.
Reviewed By: mingzhe09088
Differential Revision: D16097280
fbshipit-source-id: 3a137092f7ccc3dd2d29d95e10178ec89d3ce892
2019-07-22 11:27:45 -07:00
Jianyu Huang
f72d754877
qlinear operator level benchmark ( #22914 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22914
Adding op level benchmarking for qlinear operator
Reviewed By: mingzhe09088
Differential Revision: D16285204
fbshipit-source-id: 99b734ddfa0af6aada820cac7b2f38ef7a5868cb
2019-07-17 09:13:17 -07:00
Mingzhe Li
9b9546a498
replace ByteTensor with bool in fill_test ( #22913 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22913
as title
Reviewed By: hl475
Differential Revision: D16285248
fbshipit-source-id: 78b13d48d547760e59e0e5c8875ab09a3cd24828
2019-07-16 11:51:55 -07:00
Mingzhe Li
560d847da6
add benchmark for PT fill_ op ( #22867 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22867
as title
Reviewed By: hl475
Differential Revision: D16263458
fbshipit-source-id: 55b0e62023c117aaa0c2b9a4d65b234a388f086d
2019-07-16 09:50:41 -07:00
Mingzhe Li
94d99f2522
add num_runs flag to the benchmark ( #22892 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22892
Think of num_runs as manually running the binary <num_runs> times. Each run executes the operator for many iterations.
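The num_runs semantics can be sketched like this (a hypothetical helper for illustration, not the suite's API):

```python
import time

def benchmark(op, num_runs, iters):
    # Each "run" times the operator over many iterations and reports
    # its own average latency, as if the binary were launched
    # num_runs times by hand.
    per_run_us = []
    for _ in range(num_runs):
        start = time.perf_counter()
        for _ in range(iters):
            op()
        per_run_us.append((time.perf_counter() - start) / iters * 1e6)
    return per_run_us
```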
Reviewed By: hl475
Differential Revision: D16271597
fbshipit-source-id: b6f509ee0332c70f85bec0d447b84940c5c0cecd
2019-07-15 17:18:25 -07:00
Mingzhe Li
0cddd3e751
update README ( #21312 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21312
This diff updates the README of op-bench.
Reviewed By: zheng-xq
Differential Revision: D15612665
fbshipit-source-id: b33119fd4f9d086b03b5e28fbe8a4015b282b15c
2019-07-15 13:34:05 -07:00
Mingzhe Li
7eb0319339
add new tests to benchmark_all_test ( #22787 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22787
as title
Reviewed By: hl475
Differential Revision: D16219329
fbshipit-source-id: 097ee73e7644d5ca482ad044d0fd2c3e7dc2c10b
2019-07-11 22:50:55 -07:00
Mingzhe Li
1878800f47
make custom op work in OSS environment ( #22781 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22781
The custom op is required to make the op benchmark work with JIT. Run `python setup.py install` in the pt_extension directory to install it.
Reviewed By: hl475
Differential Revision: D16214430
fbshipit-source-id: c9221c532011f9cf0d5453ac8535a6cde65e8376
2019-07-11 21:17:17 -07:00
Mingzhe Li
3cf5f22f02
Enable C2 operators running with {cpu, gpu} * {forward, backward} ( #22664 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22664
This diff enables c2 operators to run the combination of {cpu, gpu} * {forward, backward}.
Reviewed By: hl475
Differential Revision: D15781789
fbshipit-source-id: e9843e3c46ea144042829860638d406f6a33792b
2019-07-09 16:41:53 -07:00
Mingzhe Li
95a5da175d
change c2 bench to use new tensor creation interface ( #22663 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22663
as title
Reviewed By: hl475
Differential Revision: D15744502
fbshipit-source-id: 441ab9fb7580ca87c3f2027d0a63ba18b8d35016
2019-07-09 16:41:49 -07:00
Mingzhe Li
45aad2e680
change unary, pool, max ops to use new interface ( #22661 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22661
as title
Reviewed By: hl475
Differential Revision: D16170825
fbshipit-source-id: d80944224b8717e7aa35980907ff48e587b85217
2019-07-09 16:41:32 -07:00
Mingzhe Li
2b2fe525b9
introduce a new interface to add a list of operators ( #21209 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21209
This diff introduces a new interface to add a list of operators. Here are the steps to add ops using this interface:
- create op_list:
```
unary_ops_list = op_bench.op_list(
    attr_names=["op_name", "op_function"],
    attrs=[
        ["abs", torch.abs],
        ["abs_", torch.abs_],
    ],
)
```
- create a bench class:
```
class UnaryOpBenchmark(op_bench.TorchBenchmarkBase):
    def init(self, M, N, op_function):
        self.input_one = torch.rand(M, N)
        self.op_func = op_function

    def forward(self):
        return self.op_func(self.input_one)
```
- register those ops:
```
op_bench.generate_pt_tests_from_list(unary_ops_list, unary_ops_configs, UnaryOpBenchmark)
```
Reviewed By: zheng-xq
Differential Revision: D15514188
fbshipit-source-id: f09b359cab8175eeb8d51b3ad7bbbcfbc9f6430f
2019-07-09 16:41:29 -07:00
Mingzhe Li
b93f29ded3
add JIT path to the benchmark ( #22309 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22309
This diff enables PT operators to run with JIT mode. Users can control eager and JIT mode using the `use_jit` flag.
In this diff, we are putting operators in a loop and passing it to JIT. One extra step which wraps the operator with the `_consume` op is introduced to avoid JIT's dead code elimination optimization. With that, the reported time includes the real operator execution time plus the `_consume` op (which directly returns its input; nothing else happens inside).
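The wrapping pattern can be sketched as follows. `_consume` here is a stand-in identity function defined for illustration, not the suite's actual registered op:

```python
import torch

@torch.jit.script
def _consume(x):
    # Identity: returning the input keeps the result live, so JIT
    # dead code elimination cannot drop the benchmarked op.
    return x

@torch.jit.script
def add_loop(a, b, iters: int):
    # Benchmark loop over torch.add; each result is routed through
    # _consume so the whole loop body survives JIT optimization.
    result = a
    for _ in range(iters):
        result = _consume(torch.add(a, b))
    return result
```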
Reviewed By: zheng-xq
Differential Revision: D16033082
fbshipit-source-id: e03be89fd5a505e44e81015dfc63db9cd76fb8a1
2019-07-03 17:18:03 -07:00
Mingzhe Li
325ec2327f
create tensor based on provided datatype ( #22468 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22468
as title
Reviewed By: ajauhri
Differential Revision: D15744503
fbshipit-source-id: 050b32dd7f135512385fc04f098c376c664211a9
2019-07-03 17:08:23 -07:00
Mingzhe Li
9c44f6c723
generate tests based on op metadata ( #21432 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21432
This diff introduces a new interface to generate tests based on the metadata of operators.
Reviewed By: ajauhri
Differential Revision: D15675542
fbshipit-source-id: ba60e803ea553d8b9eb6cb2bcdc6a0368ef62b1c
2019-07-03 16:48:41 -07:00
Mingzhe Li
402b9f9a6d
add PT chunk op to the benchmark ( #22409 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22409
as title
Reviewed By: hl475
Differential Revision: D16079031
fbshipit-source-id: 109060ffc953f2357b2783b13f9b9dc87bd3f98a
2019-07-01 16:37:05 -07:00
Mingzhe Li
8a726f5815
add PT split op to the benchmark ( #22410 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22410
as title
Reviewed By: hl475
Differential Revision: D16078705
fbshipit-source-id: 29e1cc19d0e93a561d07c47e5678a311e6de3e3b
2019-07-01 16:37:01 -07:00
Mingzhe Li
8281909e73
add PT cat operator to the benchmark ( #22404 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22404
as title
Reviewed By: hl475
Differential Revision: D16078395
fbshipit-source-id: 4ff5c558036af1dce6ac0001a1a1fc3a373a981f
2019-07-01 16:36:57 -07:00
Mingzhe Li
007fd01e9b
Enable PT operators running with {cpu, gpu} * {forward, backward} ( #22416 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22416
This diff tests the combination of cpu/gpu and forward/backward path for PT add operator.
Reviewed By: hl475
Differential Revision: D15770792
fbshipit-source-id: 38cc648361d2501d774db407f988c3cb5115b2ae
2019-07-01 16:30:58 -07:00
Mingzhe Li
3a198400f8
modify pool benchmarks
...
Summary: as title
Reviewed By: hl475
Differential Revision: D16058193
fbshipit-source-id: 8f4e04a0356960f6483d6ef58e64876740434849
2019-06-28 14:35:23 -07:00
Mingzhe Li
89c709d217
modify unary operators benchmark
...
Summary: as title
Reviewed By: hl475
Differential Revision: D16057665
fbshipit-source-id: 07e31a17450fbfd88b5bd330c31c729de5300eaa
2019-06-28 14:03:41 -07:00
Mingzhe Li
6cf4df5d06
add PT softmax ops to the benchmark suite ( #21208 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21208
The diff adds softmax, softmax2d, and logsoftmax to the benchmark suite.
Reviewed By: zheng-xq
Differential Revision: D15526265
fbshipit-source-id: b7ba63032dba7146765513c8cb1ac5a6a7bd1a68
2019-06-28 13:58:20 -07:00
Mingzhe Li
a4f281446b
introduce flags to set omp and mkl threads ( #21472 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21472
as title
Reviewed By: hl475
Differential Revision: D15695846
fbshipit-source-id: 44437f6b94a9c583275fcc711bb6ccf2b04f90fc
2019-06-26 09:33:05 -07:00
Sungmann Cho
f59581218f
Fix spelling errors ( #21665 )
...
Summary:
alloctor -> allocator
excutable -> executable
excution -> execution
foward -> forward
initiaize -> initialize
paralell -> parallel
preprocesor -> preprocessor
tranpose -> transpose
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21665
Differential Revision: D15806155
Pulled By: soumith
fbshipit-source-id: d92b21ec8650a2b32f05faf9af0b7d2b073e992c
2019-06-13 15:21:55 -07:00
Mingzhe Li
341a7e4bb5
Fix issue in backward path ( #21663 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21663
as title
Reviewed By: hl475
Differential Revision: D15770793
fbshipit-source-id: b3d0dd030237c4d62bddc388984a273153fac4a6
2019-06-11 21:09:25 -07:00
Mingzhe Li
f2623c74a9
add PT pointwise unary ops to the benchmark suite ( #21207 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21207
This diff adds 80 PT pointwise unary ops to the benchmark suite. Most of the ops are added using the generate_pt_tests_from_list interface. The rest are handled separately.
Reviewed By: zheng-xq
Differential Revision: D15471597
fbshipit-source-id: 8ea36e292a38b1dc50f064a48c8cd07dbf78ae56
2019-06-10 21:35:44 -07:00
Mingzhe Li
4e3c97a0be
add separate path for op with JIT ( #21210 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21210
This diff introduces a new path to run op with JIT. There are two steps involved here:
1. Users need to script the op. This should happen in the `init` method.
2. The generated graph from step1 is passed to `jit_forward` which will be executed by the benchmark backend
Reviewed By: zheng-xq
Differential Revision: D15460831
fbshipit-source-id: 48441d9cd4be5d0acebab901f45544616e6ed2ee
2019-06-10 19:53:58 -07:00
Mingzhe Li
512c9d8c76
add PT gather op to the benchmark suite ( #21614 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21614
as title
Reviewed By: kimishpatel
Differential Revision: D15525115
fbshipit-source-id: 6a17e1d791bdb432cc3d51e45c5e82b96268127d
2019-06-10 16:31:52 -07:00
Mingzhe Li
a5cf6d5100
reorganize op bench directory ( #21543 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21543
No code change in this diff.
Reviewed By: hl475
Differential Revision: D15721419
fbshipit-source-id: 06212cc882f5297064153417dc4d80bce9ec2667
2019-06-07 16:06:51 -07:00
Huamin Li
f433913996
add more info back to BenchResult ( #21502 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21502
In BenchResult, we keep name, avg_fwd, std_fwd, avg_bwd, and std_bwd, but no per-iteration numbers. In this diff, I am extending BenchResult to include the number reported from each iteration.
Reviewed By: wanchaol
Differential Revision: D15706306
fbshipit-source-id: 3f14be4ba91f1f6da473995783bd7af1d067938d
2019-06-06 18:43:51 -07:00
Mingzhe Li
12528990f8
change output of ai_pep_format ( #21440 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21440
This diff modifies the output format when ai_pep_format is enabled.
Reviewed By: hl475
Differential Revision: D15681042
fbshipit-source-id: df5f2dbb38d1bd866ca7f74ef4e63459d480be6e
2019-06-05 21:54:24 -07:00
Mingzhe Li
b869a3b4ac
add new ops to benchmark_all_test ( #21365 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21365
This diff adds new operators to benchmark_all_test so all the supported ops can be built as one binary
Reviewed By: hl475
Differential Revision: D15627328
fbshipit-source-id: b7ca550a279f485102a6a6bd47e4032c7beb9940
2019-06-04 13:54:26 -07:00
Mingzhe Li
3004b397f0
change test_name to be globally unique value across tests ( #21206 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21206
This diff changes the default test_name to be a globally unique value across tests. With that, users can list all the tests and choose to run a specific test.
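The benchmark output shown elsewhere in this log uses names like `add_M64_N64_K64_cpu`, suggesting the unique name is the op name joined with its input config. A hypothetical sketch of that scheme (the real naming logic lives in the benchmark core; `build_test_name` is an illustrative name, not the suite's API):

```python
def build_test_name(op_name, config):
    """Combine an op name and its input config into a globally unique
    test name that users can filter on when listing tests.
    Illustrative sketch only."""
    parts = [op_name] + ["%s%s" % (k, v) for k, v in config.items()]
    return "_".join(parts)

name = build_test_name("add", {"M": 64, "N": 64, "K": 64})
print(name)  # add_M64_N64_K64
```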
Reviewed By: zheng-xq
Differential Revision: D15543508
fbshipit-source-id: 0814ef6a60d41637fed5245e30c282497cf21bb8
2019-06-03 14:55:11 -07:00
Mingzhe Li
ca80ec7c97
introduce a new interface to add op [PT changes] ( #21149 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21149
The diff modifies the interface for PyTorch operators in the benchmark suite
Reviewed By: zheng-xq
Differential Revision: D15433897
fbshipit-source-id: e858183431eb37d90313356716c2de8709372b58
2019-06-03 14:55:08 -07:00
Mingzhe Li
516ea33f6a
add PT maxpool and avgpool ops to the benchmark suite ( #21200 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21200
This diff adds MaxPool1d/2d/3d and AvgPool1d/2d/3d to the benchmark suite.
Reviewed By: hl475
Differential Revision: D15541980
fbshipit-source-id: 394d136ee94a16ee24285939323ca5fe317e99d3
2019-05-31 19:35:29 -07:00
Mingzhe Li
dceea73460
add PT conv and convtranspose ops to the benchmark suite ( #21199 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21199
This diff adds Conv1d, ConvTranspose1d, Conv2d, ConvTranspose2d, Conv3d, and ConvTranspose3d operators to the benchmark suite.
Reviewed By: hl475
Differential Revision: D15520817
fbshipit-source-id: 5512afec2be8a1036fbcd170f70265c7e455fcde
2019-05-31 19:35:25 -07:00
Mingzhe Li
2d75d31398
add PT linear op to the benchmark suite ( #21204 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21204
as title
Reviewed By: hl475
Differential Revision: D15484743
fbshipit-source-id: 7094a983e370e1c3952021146b58b844874b7d5e
2019-05-31 19:35:22 -07:00
Mingzhe Li
00b3e69211
add PT batchnorm op to the benchmark suite ( #21201 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21201
as title
Reviewed By: hl475
Differential Revision: D15482581
fbshipit-source-id: d93713a35be41e76d077df419cb24585f69d72eb
2019-05-31 19:35:18 -07:00
Mingzhe Li
ed1078bde3
migrate matmul operator to the new interface ( #21198 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21198
as title
Reviewed By: hl475
Differential Revision: D15325768
fbshipit-source-id: a5d7c6837cd09445e75846660d12807dd26af6cc
2019-05-31 19:35:15 -07:00
Mingzhe Li
668dbcc41b
migrate intraop benchmarks to the new interface ( #21202 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21202
Migrate Ilia's op benchmarks to the new interface
Reviewed By: hl475
Differential Revision: D15322577
fbshipit-source-id: 8e75d51e7ddacbd56896c55f2996a9358491d83e
2019-05-31 16:19:04 -07:00
Mingzhe Li
c62d476206
migrate add operator to the new interface ( #21152 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21152
Migrate existing add benchmark to use the new op front-end
Reviewed By: zheng-xq
Differential Revision: D15325524
fbshipit-source-id: 34e969e1bd289913d881c476711bce9f8ac18a29
2019-05-31 16:19:00 -07:00
Mingzhe Li
0223d3744a
introduce a new interface to add op [C2 changes] ( #21148 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21148
The diff modifies the interface for Caffe2 operators in the benchmark suite
Reviewed By: zheng-xq
Differential Revision: D15433888
fbshipit-source-id: c264a95906422d7a26c10b1f9836ba8b35e36b53
2019-05-31 09:21:07 -07:00
Mingzhe Li
31089b02ce
introduce a new interface to add op [core changes] ( #21147 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21147
This diff introduces a new interface to add PT/C2 operators to the benchmark suite.
The following steps are needed to add a new operator:
1. Specify the input shapes, args to an operator in configs
2. Create a PT/C2 benchmark class which includes ```init``` (create tensors), ```forward``` (specify the operator to be tested), and ```backward``` (gradient of an op) methods
3. Call generate_pt_test/generate_c2_test to create test cases based on the configs
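The three steps above can be sketched without the library, as a minimal stand-in for what `generate_pt_test`/`generate_c2_test` do (expand configs into one test case per input combination). Class and function names here are illustrative, not the suite's actual API, and plain lists stand in for tensors:

```python
import itertools

# Step 1: specify input shapes/args in configs (cross product of attribute values)
def cross_product_configs(**attrs):
    keys = list(attrs)
    return [dict(zip(keys, vals)) for vals in itertools.product(*attrs.values())]

# Step 2: a benchmark class with init (create inputs) and forward (op under test)
class AddBenchmark:
    def init(self, M, N):
        self.a = [[1.0] * N for _ in range(M)]
        self.b = [[2.0] * N for _ in range(M)]

    def forward(self):
        return [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(self.a, self.b)]

# Step 3: generate one test case per config
def generate_tests(configs, bench_class):
    tests = []
    for cfg in configs:
        bench = bench_class()
        bench.init(**cfg)
        tests.append((cfg, bench))
    return tests

configs = cross_product_configs(M=[8, 64], N=[8, 64])
tests = generate_tests(configs, AddBenchmark)
print(len(tests))  # 4 test cases
```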
Reviewed By: zheng-xq
Differential Revision: D15250380
fbshipit-source-id: 1025a7cf60d2427baa0f3f716455946d3d3e6a27
2019-05-31 09:21:04 -07:00
Kimish Patel
cda9e995e2
Benchmark repeat op. ( #20016 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20016
PT's repeat op benchmark
Reviewed By: zheng-xq
Differential Revision: D15166941
fbshipit-source-id: b1ed7af790460456210b60bfb4e44a08657e9612
2019-05-20 07:34:54 -07:00
Ilia Cherniavskii
eecf52b444
Fix in benchmark_test_generator ( #20237 )
...
Summary:
Add missing import
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20237
Differential Revision: D15245957
Pulled By: ilia-cher
fbshipit-source-id: 0f71aa08eb9ecac32002a1644838d06ab9faa37c
2019-05-07 17:03:25 -07:00
Ilia Cherniavskii
19e6886576
Intra-op parallel microbenchmarks for PT ( #19997 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19997
ghimport-source-id: 420d4a68a1ef879beee2734adba8abb575e0b0ab
Differential Revision: D15231375
Pulled By: ilia-cher
fbshipit-source-id: ce7248ea2ebb54d25c9d831c6e3f23f3534557dd
2019-05-06 20:21:45 -07:00
Ilia Cherniavskii
8c97f0b19e
Initialize Caffe2 only when running Caffe2 benchmarks ( #19980 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19980
ghimport-source-id: ca31ca25b88a1c6219e4a32483f70738a8fdbf88
Differential Revision: D15229797
Pulled By: ilia-cher
fbshipit-source-id: 0b23dbdba0c0f60932a75d8b1900c54285f5a8e4
2019-05-06 19:17:23 -07:00
Ilia Cherniavskii
0c7e98b765
Support for non-contiguous tensors and arbitrary dtypes in PT benchmarks ( #19993 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19993
ghimport-source-id: 4cf51b61bb83b72883148ab0faa0c75c3cef7635
Differential Revision: D15230363
Pulled By: ilia-cher
fbshipit-source-id: a3ab591d6fd24e874958401e63eaec56bda19a5c
2019-05-06 19:12:09 -07:00
Natalia Gimelshein
3875e1ba45
try to make at::cat in mm_tree_reduction operate on contig tensors ( #18816 )
...
Summary:
Sometimes at::cat gets transposed inputs and goes on a slow path. Also, make jit_premul lstm benchmark add bias to the whole input tensor to avoid separate reduction kernels in the backward pass.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18816
Differential Revision: D15013576
Pulled By: wanchaol
fbshipit-source-id: bcfa1cf44180b11b05b0f55f034707012f66281a
2019-04-24 23:44:25 -07:00
Mingzhe Li
26f12af537
Fix op benchmarks error in OSS environment ( #19518 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19518
The previous design required running the op benchmarks from the PyTorch root directory, which could lead to `module not found` errors in the OSS environment. This diff fixes that issue by making the benchmarks launch from the `benchmarks` folder.
Reviewed By: ilia-cher
Differential Revision: D15020787
fbshipit-source-id: eb09814a33432a66cc857702bc86538cd17bea3b
2019-04-19 16:25:16 -07:00
Mingzhe Li
5da7b74d48
fix AI-PEP path error ( #19514 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19514
as title
Reviewed By: hl475
Differential Revision: D15018499
fbshipit-source-id: 9ce38e3a577432e0575a6743f5dcd2e907d3ab9d
2019-04-19 16:25:13 -07:00
Mingzhe Li
08f5c05d60
make separate operators as independent binaries ( #19450 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19450
We want to make each operator benchmark a separate binary. Previously, all operators were collected into a single binary to run the benchmark, which is unnecessary when we want to filter for a specific operator. This diff resolves that issue.
Reviewed By: ilia-cher
Differential Revision: D14808159
fbshipit-source-id: 43cd25b219c6e358d0cd2a61463b34596bf3bfac
2019-04-18 20:00:47 -07:00
Mingzhe Li
45d5b6be48
Enhance front-end to add op ( #19433 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19433
For the operator benchmark project, we need to cover a lot of operators, so the interface for adding operators needs to be very clean and simple. This diff implements a new interface for adding an op.
Here is the logic to add new operator to the benchmark:
```
long_config = {}
short_config = {}
map_func
add_test(
    [long_config, short_config],
    map_func,
    [caffe2 op],
    [pt op],
)
```
Reviewed By: zheng-xq
Differential Revision: D14791191
fbshipit-source-id: ac6738507cf1b9d6013dc8e546a2022a9b177f05
2019-04-18 17:07:02 -07:00
Xiaoqiang Zheng
5627940e9c
Add a fast path for batch-norm CPU inference. ( #19152 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19152
Adding a fast path for batch-norm CPU inference when all tensors are contiguous.
* Leverage vectorization through simple loops.
* Folding linear terms before computation.
* For resnext-101, this version gets 18.95 times faster.
* Add a microbenchmark:
* (buck build mode/opt -c python.package_style=inplace --show-output //caffe2/benchmarks/operator_benchmark:batchnorm_benchmark) && \
(OMP_NUM_THREADS=1 MKL_NUM_THREADS=1 buck-out/gen/caffe2/benchmarks/operator_benchmark/batchnorm_benchmark#binary.par)
* batch_norm: data shape: [1, 256, 3136], bandwidth: 22.26 GB/s
* batch_norm: data shape: [1, 65536, 1], bandwidth: 5.57 GB/s
* batch_norm: data shape: [128, 2048, 1], bandwidth: 18.21 GB/s
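The "folding linear terms" step can be illustrated in plain Python (illustrative only; the actual kernel is vectorized C++): the per-channel normalization y = (x - mean) / sqrt(var + eps) * gamma + beta collapses into a single multiply-add y = x * scale + bias, computed once per channel before the loop over the data.

```python
import math

def fold_bn_params(mean, var, gamma, beta, eps=1e-5):
    # Fold the normalization into one scale and one bias per channel,
    # so inference becomes a single fused multiply-add over the data.
    scale = gamma / math.sqrt(var + eps)
    bias = beta - mean * scale
    return scale, bias

mean, var, gamma, beta = 0.5, 4.0, 2.0, 1.0
scale, bias = fold_bn_params(mean, var, gamma, beta)

x = 3.0
folded = x * scale + bias
reference = (x - mean) / math.sqrt(var + 1e-5) * gamma + beta
assert abs(folded - reference) < 1e-9
```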
Reviewed By: soumith, BIT-silence
Differential Revision: D14889728
fbshipit-source-id: 20c9e567e38ff7dbb9097873b85160eca2b0a795
2019-04-16 19:27:54 -07:00
Mingzhe Li
3501576230
calculate execution time based on final iterations ( #19299 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19299
I saw larger than 5% performance variation with small operators; this diff aims to reduce the variation by avoiding Python overhead. Previously, the benchmark ran the main loop for 100 iterations and then looked at the time. If the result was not significant, it doubled the number of iterations and reran, repeating until the result became significant. Execution time was calculated as total_time / number of iterations. The issue is that this folds the Python trigger overhead of multiple runs into the result.
Now, I change the logic to calculate execution time based on the last run instead of all runs: time_in_last_run / number of iterations.
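The iteration-doubling logic described above can be sketched as follows (a simplified stand-in for the benchmark core; names and thresholds are illustrative):

```python
import time

def benchmark(op, min_time=0.2, start_iters=100):
    """Double the iteration count until a timed run is long enough to be
    significant, then report per-iteration time from the LAST run only,
    so the Python launch overhead of earlier runs is excluded."""
    iters = start_iters
    while True:
        t0 = time.perf_counter()
        for _ in range(iters):
            op()
        elapsed = time.perf_counter() - t0
        if elapsed >= min_time:
            return elapsed / iters  # time_in_last_run / number of iterations
        iters *= 2

# e.g. per-iteration time of a tiny workload, in microseconds
avg_us = benchmark(lambda: sum(range(1000))) * 1e6
```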
Reviewed By: hl475
Differential Revision: D14925287
fbshipit-source-id: cb646298c08a651e27b99a5547350da367ffff47
2019-04-16 08:57:17 -07:00
Wanchao Liang
07efee395c
add Fast-RNN to AI-PEP
...
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18885
Reviewed By: hl475
Differential Revision: D14728854
fbshipit-source-id: 7e7a2946929551963f7c938e3d82a260a9efdfbd
2019-04-04 17:04:21 -07:00
mingzhe0908
cb66759600
temp fix for flake8 error ( #18788 )
...
Summary:
Fix lint error
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18788
Reviewed By: houseroad
Differential Revision: D14741840
Pulled By: mingzhe09088
fbshipit-source-id: 1fa630e3c6e606e3d78fe8293e5b0e7ea1b78da3
2019-04-02 22:52:52 -07:00
Mingzhe Li
5f5a2aaab9
Operator-level performance microbenchmarks ( #18740 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18740
Test utilities for writing Caffe2/PyTorch performance microbenchmarks. Brief description of the file structure
* benchmark_core.py : core utilities for running microbenchmark tests
* benchmark_caffe2.py : Caffe2-specific benchmark utilities
* benchmark_pytorch.py: PyTorch specific benchmark utilities
* benchmark_runner.py : Main function. Currently it can run the microbenchmark tests in a stand-alone mode. The next step is to have this integrate with AI-PEP.
The utilities are located at https://github.com/pytorch/pytorch/tree/master/test to have access to both the Caffe2 and PyTorch Python frontends.
Include two operator microbenchmarks; support both Caffe2/PyTorch:
* MatMul
* Add
Reference: PyTorch benchmarks : https://github.com/pytorch/benchmark/tree/master/timing/python . In this work, we start with two example binary operators, MatMul and Add, but eventually we should cover unary operators like in the PyTorch benchmark repo.
Reviewed By: zheng-xq
Differential Revision: D13887111
fbshipit-source-id: b7a56b95448c9ec3e674b0de0ffb96af4439bfce
2019-04-02 17:06:19 -07:00
Edward Yang
173f224570
Turn on F401: Unused import warning. ( #18598 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18598
ghimport-source-id: c74597e5e7437e94a43c163cee0639b20d0d0c6a
Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18598 Turn on F401: Unused import warning.**
This was requested by someone at Facebook; this lint is turned
on for Facebook by default. "Sure, why not."
I had to noqa a number of imports in __init__. Hypothetically
we're supposed to use __all__ in this case, but I was too lazy
to fix it. Left for future work.
Be careful! flake8-2 and flake8-3 behave differently with
respect to import resolution for # type: comments. flake8-3 will
report an import unused; flake8-2 will not. For now, I just
noqa'd all these sites.
All the changes were done by hand.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Differential Revision: D14687478
fbshipit-source-id: 30d532381e914091aadfa0d2a5a89404819663e3
2019-03-30 09:01:17 -07:00
Junjie Bai
e22a2b9015
Minor fixes in fastrnns benchmarks
...
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18613
Reviewed By: wanchaol
Differential Revision: D14681838
fbshipit-source-id: 60bd5c9b09398c74335f003cd21ea32dd1c45876
2019-03-29 01:22:28 -07:00
Wanchao Liang
6684ef3f23
Move fast rnn benchmark to pytorch/pytorch
...
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18369
Differential Revision: D14652039
Pulled By: wanchaol
fbshipit-source-id: 1177b1f60d96672c3e2c9d527b56ee06ca7c0af1
2019-03-27 14:46:09 -07:00