Commit Graph

199 Commits

Author SHA1 Message Date
Wojciech Baranowski
b10a39bb32 Migrate _cat from TH to ATen (CUDA) (#33237)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/24520

Benchmarks:

Upstream:

```
$ python -m pt.cat_test --tag_filter all --device cuda  --omp_num_threads 1 --mkl_num_threads 1
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : all

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes(1,1,1)_N2_dim0_cuda
# Input: sizes: (1, 1, 1), N: 2, dim: 0, device: cuda
Forward Execution Time (us) : 17.355

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes(512,512,2)_N2_dim1_cuda
# Input: sizes: (512, 512, 2), N: 2, dim: 1, device: cuda
Forward Execution Time (us) : 30.718

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes(128,1024,2)_N2_dim1_cuda
# Input: sizes: (128, 1024, 2), N: 2, dim: 1, device: cuda
Forward Execution Time (us) : 17.329

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes(512,512,2)_N2_dim1_cuda
# Input: sizes: (512, 512, 2), N: 2, dim: 1, device: cuda
Forward Execution Time (us) : 30.176

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes(1024,1024,2)_N2_dim0_cuda
# Input: sizes: (1024, 1024, 2), N: 2, dim: 0, device: cuda
Forward Execution Time (us) : 74.417

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes(1025,1023,2)_N2_dim1_cuda
# Input: sizes: (1025, 1023, 2), N: 2, dim: 1, device: cuda
Forward Execution Time (us) : 75.728

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes(1024,1024,2)_N2_dim2_cuda
# Input: sizes: (1024, 1024, 2), N: 2, dim: 2, device: cuda
Forward Execution Time (us) : 190.165

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[<function<lambda>at0x7fa8876fcf28>,111,65]_N5_dim0_cuda
# Input: sizes: [<function <lambda> at 0x7fa8876fcf28>, 111, 65], N: 5, dim: 0, device: cuda
Forward Execution Time (us) : 57.711

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[96,<function<lambda>at0x7fa886237048>,64]_N5_dim1_cuda
# Input: sizes: [96, <function <lambda> at 0x7fa886237048>, 64], N: 5, dim: 1, device: cuda
Forward Execution Time (us) : 49.903

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[128,64,<function<lambda>at0x7fa7b57bb840>]_N5_dim2_cuda
# Input: sizes: [128, 64, <function <lambda> at 0x7fa7b57bb840>], N: 5, dim: 2, device: cuda
Forward Execution Time (us) : 84.181

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[<function<lambda>at0x7fa7b57bba60>,32,64]_N50_dim0_cuda
# Input: sizes: [<function <lambda> at 0x7fa7b57bba60>, 32, 64], N: 50, dim: 0, device: cuda
Forward Execution Time (us) : 82.339

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[32,<function<lambda>at0x7fa7b57bbae8>,64]_N50_dim1_cuda
# Input: sizes: [32, <function <lambda> at 0x7fa7b57bbae8>, 64], N: 50, dim: 1, device: cuda
Forward Execution Time (us) : 82.312

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[33,65,<function<lambda>at0x7fa7b57bbb70>]_N50_dim2_cuda
# Input: sizes: [33, 65, <function <lambda> at 0x7fa7b57bbb70>], N: 50, dim: 2, device: cuda
Forward Execution Time (us) : 90.715

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes(64,32,4,16,32)_N2_dim2_cuda
# Input: sizes: (64, 32, 4, 16, 32), N: 2, dim: 2, device: cuda
Forward Execution Time (us) : 129.021

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes(16,32,4,16,32)_N8_dim2_cuda
# Input: sizes: (16, 32, 4, 16, 32), N: 8, dim: 2, device: cuda
Forward Execution Time (us) : 142.966

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes(9,31,5,15,33)_N17_dim4_cuda
# Input: sizes: (9, 31, 5, 15, 33), N: 17, dim: 4, device: cuda
Forward Execution Time (us) : 387.023

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[<function<lambda>at0x7fa7b57bbbf8>]_N100_dim0_cuda
# Input: sizes: [<function <lambda> at 0x7fa7b57bbbf8>], N: 100, dim: 0, device: cuda
Forward Execution Time (us) : 36.647

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[<function<lambda>at0x7fa7b57bbc80>]_N1000_dim0_cuda
# Input: sizes: [<function <lambda> at 0x7fa7b57bbc80>], N: 1000, dim: 0, device: cuda
Forward Execution Time (us) : 278.890

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[<function<lambda>at0x7fa7b57bbd08>]_N2000_dim0_cuda
# Input: sizes: [<function <lambda> at 0x7fa7b57bbd08>], N: 2000, dim: 0, device: cuda
Forward Execution Time (us) : 557.752

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[<function<lambda>at0x7fa7b57bbd90>]_N3000_dim0_cuda
# Input: sizes: [<function <lambda> at 0x7fa7b57bbd90>], N: 3000, dim: 0, device: cuda
Forward Execution Time (us) : 842.512

```

New version:

```
$ python -m pt.cat_test --tag_filter all --device cuda  --omp_num_threads 1 --mkl_num_threads 1
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : all

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes(1,1,1)_N2_dim0_cuda
# Input: sizes: (1, 1, 1), N: 2, dim: 0, device: cuda
Forward Execution Time (us) : 24.419

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes(512,512,2)_N2_dim1_cuda
# Input: sizes: (512, 512, 2), N: 2, dim: 1, device: cuda
Forward Execution Time (us) : 25.025

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes(128,1024,2)_N2_dim1_cuda
# Input: sizes: (128, 1024, 2), N: 2, dim: 1, device: cuda
Forward Execution Time (us) : 24.247

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes(512,512,2)_N2_dim1_cuda
# Input: sizes: (512, 512, 2), N: 2, dim: 1, device: cuda
Forward Execution Time (us) : 25.098

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes(1024,1024,2)_N2_dim0_cuda
# Input: sizes: (1024, 1024, 2), N: 2, dim: 0, device: cuda
Forward Execution Time (us) : 74.441

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes(1025,1023,2)_N2_dim1_cuda
# Input: sizes: (1025, 1023, 2), N: 2, dim: 1, device: cuda
Forward Execution Time (us) : 74.866

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes(1024,1024,2)_N2_dim2_cuda
# Input: sizes: (1024, 1024, 2), N: 2, dim: 2, device: cuda
Forward Execution Time (us) : 189.280

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[<function<lambda>at0x7f1c9b056048>,111,65]_N5_dim0_cuda
# Input: sizes: [<function <lambda> at 0x7f1c9b056048>, 111, 65], N: 5, dim: 0, device: cuda
Forward Execution Time (us) : 57.629

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[96,<function<lambda>at0x7f1c9b0560d0>,64]_N5_dim1_cuda
# Input: sizes: [96, <function <lambda> at 0x7f1c9b0560d0>, 64], N: 5, dim: 1, device: cuda
Forward Execution Time (us) : 49.975

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[128,64,<function<lambda>at0x7f1bce8f38c8>]_N5_dim2_cuda
# Input: sizes: [128, 64, <function <lambda> at 0x7f1bce8f38c8>], N: 5, dim: 2, device: cuda
Forward Execution Time (us) : 83.643

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[<function<lambda>at0x7f1bce8f3ae8>,32,64]_N50_dim0_cuda
# Input: sizes: [<function <lambda> at 0x7f1bce8f3ae8>, 32, 64], N: 50, dim: 0, device: cuda
Forward Execution Time (us) : 82.307

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[32,<function<lambda>at0x7f1bce8f3b70>,64]_N50_dim1_cuda
# Input: sizes: [32, <function <lambda> at 0x7f1bce8f3b70>, 64], N: 50, dim: 1, device: cuda
Forward Execution Time (us) : 82.323

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[33,65,<function<lambda>at0x7f1bce8f3bf8>]_N50_dim2_cuda
# Input: sizes: [33, 65, <function <lambda> at 0x7f1bce8f3bf8>], N: 50, dim: 2, device: cuda
Forward Execution Time (us) : 90.549

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes(64,32,4,16,32)_N2_dim2_cuda
# Input: sizes: (64, 32, 4, 16, 32), N: 2, dim: 2, device: cuda
Forward Execution Time (us) : 129.022

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes(16,32,4,16,32)_N8_dim2_cuda
# Input: sizes: (16, 32, 4, 16, 32), N: 8, dim: 2, device: cuda
Forward Execution Time (us) : 142.969

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes(9,31,5,15,33)_N17_dim4_cuda
# Input: sizes: (9, 31, 5, 15, 33), N: 17, dim: 4, device: cuda
Forward Execution Time (us) : 386.973

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[<function<lambda>at0x7f1bce8f3c80>]_N100_dim0_cuda
# Input: sizes: [<function <lambda> at 0x7f1bce8f3c80>], N: 100, dim: 0, device: cuda
Forward Execution Time (us) : 43.800

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[<function<lambda>at0x7f1bce8f3d08>]_N1000_dim0_cuda
# Input: sizes: [<function <lambda> at 0x7f1bce8f3d08>], N: 1000, dim: 0, device: cuda
Forward Execution Time (us) : 279.023

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[<function<lambda>at0x7f1bce8f3d90>]_N2000_dim0_cuda
# Input: sizes: [<function <lambda> at 0x7f1bce8f3d90>], N: 2000, dim: 0, device: cuda
Forward Execution Time (us) : 565.790

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[<function<lambda>at0x7f1bce8f3e18>]_N3000_dim0_cuda
# Input: sizes: [<function <lambda> at 0x7f1bce8f3e18>], N: 3000, dim: 0, device: cuda
Forward Execution Time (us) : 845.153
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33237

Differential Revision: D20069181

Pulled By: ngimel

fbshipit-source-id: b392e1ffd72c0d8df0c5a2d3ac96f59b37c84e32
2020-02-24 17:41:16 -08:00
comet
9a2691f2fc Fix spelling errors
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/32673

Differential Revision: D19597118

Pulled By: pietern

fbshipit-source-id: f88c1da7548fcee141ed248f5f49d25c1d639955
2020-01-28 04:46:15 -08:00
Huamin Li
52f8f031ac add diag into pt operator microbenchmark (#32597)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32597

Currently, there is no benchmark test for the diag operator. This diff adds one to the suite.
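For reference, a minimal pure-Python sketch of what `torch.diag` computes on a 2-D input, which is what the new benchmark's `dim: 2` configs exercise; `diag2d` is a hypothetical illustration, not the benchmark harness or the ATen kernel:

```python
def diag2d(matrix, diagonal=0):
    """Extract the `diagonal`-th diagonal of a 2-D list-of-lists.

    diagonal=0 is the main diagonal, >0 is above it, <0 is below it
    (matching the sign convention of torch.diag's `diagonal` argument).
    """
    rows, cols = len(matrix), len(matrix[0])
    out = []
    for i in range(rows):
        j = i + diagonal
        if 0 <= j < cols:
            out.append(matrix[i][j])
    return out
```

For example, `diagonal=-10` on a 128x128 input (one of the configs below) starts at row 10 and yields 118 elements.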

Test Plan:
```
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: diag
# Mode: Eager
# Name: diag_dim1_M64_N64_diagonal0_outTrue_cpu
# Input: dim: 1, M: 64, N: 64, diagonal: 0, out: True, device: cpu
Forward Execution Time (us) : 28.496

# Benchmarking PyTorch: diag
# Mode: Eager
# Name: diag_dim2_M128_N128_diagonal-10_outFalse_cpu
# Input: dim: 2, M: 128, N: 128, diagonal: -10, out: False, device: cpu
Forward Execution Time (us) : 45.179

# Benchmarking PyTorch: diag
# Mode: Eager
# Name: diag_dim1_M256_N256_diagonal20_outTrue_cpu
# Input: dim: 1, M: 256, N: 256, diagonal: 20, out: True, device: cpu
Forward Execution Time (us) : 49.009
```

Reviewed By: mingzhe09088

Differential Revision: D19564024

fbshipit-source-id: 828a3e0e0e06810a77eb5ddb734efd30e4a63acf
2020-01-24 15:41:04 -08:00
Brian Wignall
f326045b37 Fix typos, via a Levenshtein-type corrector (#31523)
Summary:
Should be non-semantic.

Uses https://en.wikipedia.org/wiki/Wikipedia:Lists_of_common_misspellings/For_machines to find likely typos, with https://github.com/bwignall/typochecker to help automate the checking.

Uses an updated version of the tool used in https://github.com/pytorch/pytorch/pull/30606 .
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31523

Differential Revision: D19216749

Pulled By: mrshenli

fbshipit-source-id: 7fd489cb9a77cd7e4950c1046f925d57524960ea
2020-01-17 16:03:19 -08:00
Zafar Takhirov
0ae063d5d9 Fixed concatenation benchmark + added it to the microbenchmarking runs
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/31587

Test Plan: Imported from OSS

Differential Revision: D19221813

Pulled By: z-a-f

fbshipit-source-id: ee0eb60da7899b23fdc63326302d1e2fd4b540ee
2020-01-03 11:23:12 -08:00
olramde
d770fbc1d2 Some modifications to improve readability (#31352)
Summary:
When a format string is long, giving its fields names improves readability.

When creating a dict, a literal is more readable and faster than the dict() constructor.

I always appreciate your efforts in creating the world's best frameworks.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31352

Differential Revision: D19191967

Pulled By: ngimel

fbshipit-source-id: 21f063b163b67de8cf9761a4db5991f74318e991
2020-01-02 12:48:34 -08:00
Zafar Takhirov
e33dea6e4e dynamically quantized LSTM benchmarking
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30149

Test Plan: Imported from OSS

Differential Revision: D18613005

Pulled By: z-a-f

fbshipit-source-id: 966bfe2c862b1b4006b228bd9115c5c1cd3ad8cf
2019-12-17 16:52:04 -08:00
Mingzhe Li
f9010d7648 remove wipe cache from op bench (#31334)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31334

The wipe-cache logic was introduced in the hope of reducing variation in benchmark results. Based on our experimental results, it didn't actually help. In addition, several engineers ran into a missing cpuinfo.h, which the wipe-cache logic depended on. This diff therefore removes the feature to ensure smooth installation and running of the op bench.

Test Plan:
```
buck run caffe2/benchmarks/operator_benchmark/pt:add_test -- --iterations 1
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M1_N1_K1_cpu
# Input: M: 1, N: 1, K: 1, device: cpu
Forward Execution Time (us) : 111.192
```

A/B test also pass Benchmark Run #2476535015

Reviewed By: hl475

Differential Revision: D19126970

fbshipit-source-id: 9b1ab48c121838836ba6e0ae664a48fe2d18efdd
2019-12-16 16:34:14 -08:00
Mingzhe Li
c6a8f884d8 add copy_ operator to the op bench (#31327)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31327

Adds copy_ operator to the benchmark suite

Test Plan:
```
buck run caffe2/benchmarks/operator_benchmark/pt:binary_test -- --iterations 1 --operators copy_
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: copy_
# Mode: Eager
# Name: copy__M1_N1_K1_cpu_dtype_onetorch.int32_dtype_twotorch.int32
# Input: M: 1, N: 1, K: 1, device: cpu, dtype_one: torch.int32, dtype_two: torch.int32
Forward Execution Time (us) : 60.645
```

Reviewed By: hl475

Differential Revision: D19122910

fbshipit-source-id: e5f0b0e2612daae0201b1b4a87f52b971e0cc4a8
2019-12-16 13:45:12 -08:00
Mingzhe Li
d401ba1417 benchmark binary ops in binary_test (#31326)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31326

as title

Test Plan:
```
buck run caffe2/benchmarks/operator_benchmark/pt:binary_test -- --iterations 1

# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_in_one[64,1,64]_in_two[1,64,1]_cpu_dtypetorch.float32
# Input: in_one: [64, 1, 64], in_two: [1, 64, 1], device: cpu, dtype: torch.float32
Forward Execution Time (us) : 28080.802
```

Reviewed By: hl475

Differential Revision: D19120113

fbshipit-source-id: 1105de208f7609cc6d74f0b5bc6fe75f19146b28
2019-12-16 13:45:08 -08:00
Zafar Takhirov
efe683fb2a dynamically quantized linear benchmarking
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30148

Test Plan: Imported from OSS

Differential Revision: D18613006

Pulled By: z-a-f

fbshipit-source-id: 3851189a2822fd09a5dd97c9d54774727822d2bf
2019-12-11 18:39:57 -08:00
TH3CHARLie
5edfe9cb80 add torch.square (#30719)
Summary:
fixes https://github.com/pytorch/pytorch/issues/30524
This adds a new operator `torch.square` to PyTorch.

I think it is ready for a first review now, albanD
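For reference, `torch.square` computes an elementwise x*x; a minimal pure-Python sketch of the semantics (`square` here is a hypothetical stand-in operating on a flat list, not the ATen kernel):

```python
def square(values):
    """Elementwise square, the semantics torch.square provides on tensors."""
    return [v * v for v in values]
```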
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30719

Differential Revision: D18909268

Pulled By: albanD

fbshipit-source-id: 5626c445d8db20471a56fc1d7a3490e77812662b
2019-12-10 15:22:46 -08:00
Brian Wignall
e7fe64f6a6 Fix typos (#30606)
Summary:
Should be non-semantic.

Uses https://en.wikipedia.org/wiki/Wikipedia:Lists_of_common_misspellings/For_machines to find likely typos.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30606

Differential Revision: D18763028

Pulled By: mrshenli

fbshipit-source-id: 896515a2156d062653408852e6c04b429fc5955c
2019-12-02 20:17:42 -08:00
Mingzhe Li
b68d1fc316 add small input shapes to some ops (#30617)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30617

as title

Test Plan: buck run //caffe2/benchmarks/operator_benchmark:benchmark_all_test -- --iterations 1 --operator add,as_strided,cat,chunk,fill,linear,matmul,split

Reviewed By: hl475

Differential Revision: D18764248

fbshipit-source-id: 510cf83542822acfa1b7b5e475b0cc7432f7ac19
2019-12-02 10:46:43 -08:00
Mingzhe Li
1aa80471b8 minor fix to filter (#30200)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30200

as title

Test Plan:
```
buck run mode/opt //caffe2/benchmarks/operator_benchmark:benchmark_all_other_test -- --tag_filter all --iterations 1 --ai_pep_format True --operators None --iterations -1 --warmup_iterations -1 --wipe_cache --forward_only False --device cpu --tag_filter all --use_jit False --operator_range b-z
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : all

# Benchmarking PyTorch: batchnorm
PyTorchObserver {"type": "PyTorch_batchnorm_M1_N256_K3136_cpu_Eager", "metric": "latency", "unit": "ms", "value": "0.29026457108557224"}
PyTorchObserver {"type": "PyTorch_batchnorm_M1_N256_K3136_cpu_Eager", "metric": "latency", "unit": "ms", "value": "0.2813781425356865"}
PyTorchObserver {"type": "PyTorch_batchnorm_M1_N256_K3136_cpu_Eager", "metric": "latency", "unit": "ms", "value": "0.28009670320898294"}
...
```

Reviewed By: hl475

Differential Revision: D18627512

fbshipit-source-id: 23f622b96168f90a8d8648bfd9ff9a5116baafdf
2019-11-20 16:36:04 -08:00
Mingzhe Li
9cb8fb61c2 update operator_range description in op bench (#30170)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30170

as title

Test Plan:
```
buck-out/opt/gen/caffe2/benchmarks/operator_benchmark/benchmark_all_other_test.par --tag_filter all --iterations 1 --operator_range ef
...
ValueError: The correct format for operator_range is <start>-<end>, or <point>, <start>-<end>

buck-out/opt/gen/caffe2/benchmarks/operator_benchmark/benchmark_all_other_test.par --tag_filter all --iterations 1 --operator_range a-b
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : all

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M8_N32_K256_cpu
# Input: M: 8, N: 32, K: 256, device: cpu
Forward Execution Time (us) : 60.551

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M8_N32_K256_cuda
# Input: M: 8, N: 32, K: 256, device: cuda
Forward Execution Time (us) : 67.716
...

buck-out/opt/gen/caffe2/benchmarks/operator_benchmark/benchmark_all_other_test.par --tag_filter all --iterations 1 --operator_range b,d-f
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : all

# Benchmarking PyTorch: batchnorm
# Mode: Eager
# Name: batchnorm_M1_N256_K3136_cpu
# Input: M: 1, N: 256, K: 3136, device: cpu
Forward Execution Time (us) : 296.004
...
```

Reviewed By: hl475

Differential Revision: D18619975

fbshipit-source-id: 08f27ee2aeda47be431385f4b20ef7fbeb797516
2019-11-20 12:07:14 -08:00
Mingzhe Li
d11dfd1a84 only run embeddingbag op on cpu (#30163)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30163

as title

Test Plan:
```
buck run mode/opt //caffe2/benchmarks/operator_benchmark:benchmark_all_other_test -- --tag_filter all --iterations 1 --device cuda --operators embeddingbag
Parsing buck files: finished in 0.9 sec
Building: finished in 02:32.5 min (100%) 7358/7358 jobs, 1 updated
  Total time: 02:33.5 min
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : all

buck run mode/opt //caffe2/benchmarks/operator_benchmark:benchmark_all_other_test -- --tag_filter all --iterations 1 --operators embeddingbag
Parsing buck files: finished in 0.9 sec
Building: finished in 5.3 sec (100%) 5604/5604 jobs, 0 updated
  Total time: 6.3 sec
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : all

# Benchmarking PyTorch: embeddingbag
# Mode: Eager
# Name: embeddingbag_embeddingbags80_dim64_modesum_input_size8_offset0_sparseTrue_cpu
# Input: embeddingbags: 80, dim: 64, mode: sum, input_size: 8, offset: 0, sparse: True, device: cpu
Forward Execution Time (us) : 62.608
...
```

Reviewed By: hl475

Differential Revision: D18617540

fbshipit-source-id: 062dd73c455db8b67749078603745651b55254b2
2019-11-20 10:02:39 -08:00
Mingzhe Li
2b1466e665 allow operator_range to take multiple ranges (#30124)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30124

as title

Test Plan:
```
buck run mode/opt //caffe2/benchmarks/operator_benchmark:benchmark_all_other_test -- --tag_filter all --iterations 1 --device cuda --operator_range a,b-c
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : all

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M8_N32_K256_cuda
# Input: M: 8, N: 32, K: 256, device: cuda
Forward Execution Time (us) : 71.683

# Benchmarking PyTorch: batchnorm
# Mode: Eager
# Name: batchnorm_M1_N256_K3136_cuda
# Input: M: 1, N: 256, K: 3136, device: cuda
Forward Execution Time (us) : 118.840

# Benchmarking PyTorch: batchnorm
# Mode: Eager
# Name: batchnorm_M1_N8192_K1_cuda
# Input: M: 1, N: 8192, K: 1, device: cuda
Forward Execution Time (us) : 134.274

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_M128_N128_K1_dim1_cuda
# Input: M: 128, N: 128, K: 1, dim: 1, device: cuda
Forward Execution Time (us) : 109.172
...
```

Reviewed By: hl475

Differential Revision: D18605640

fbshipit-source-id: 4ae9b91a50c4cdf1b161b6c5c58f365ba514050c
2019-11-19 16:15:46 -08:00
Mingzhe Li
0ab03d3283 only run embeddingbag benchmark on cpu (#30106)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30106

as title

Test Plan:
```
buck run mode/opt //caffe2/benchmarks/operator_benchmark:benchmark_all_other_test -- --tag_filter all --iterations 1 --device cuda --operators embeddingbag
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : all
```

Reviewed By: hl475

Differential Revision: D18598198

fbshipit-source-id: 9b7d103410f1183fdf6776047ea2ef8dba4b7831
2019-11-19 12:07:34 -08:00
Mingzhe Li
23991e89cc change operator_range to work with lower and upper in op bench (#30096)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30096

as title

Test Plan:
```
buck run mode/opt caffe2/benchmarks/operator_benchmark:benchmark_all_quantized_test  -- --iterations 1 --operator_range a-a
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_N2_dtypetorch.quint8_contigTrue
# Input: N: 2, dtype: torch.quint8, contig: True
Forward Execution Time (us) : 22.251

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_N2_dtypetorch.qint8_contigTrue
# Input: N: 2, dtype: torch.qint8, contig: True
Forward Execution Time (us) : 17.247

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_N2_dtypetorch.qint32_contigTrue
# Input: N: 2, dtype: torch.qint32, contig: True
Forward Execution Time (us) : 29.653
...
```

Reviewed By: hl475

Differential Revision: D18596447

fbshipit-source-id: eac8d9d90db244aa9799293c22bb0d30cf3edf58
2019-11-19 11:01:02 -08:00
Mingzhe Li
1597f22982 fix device check in op bench (#30091)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30091

as title

Test Plan:
```
Before:
buck run mode/opt //caffe2/benchmarks/operator_benchmark/pt:unary_test -- --device cuda
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: abs
# Mode: Eager
# Name: abs_M512_N512_cpu
# Input: M: 512, N: 512, device: cpu
Forward Execution Time (us) : 91.190

# Benchmarking PyTorch: abs
# Mode: Eager
# Name: abs_M512_N512_cuda
# Input: M: 512, N: 512, device: cuda
Forward Execution Time (us) : 27.062

After:
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: abs
# Mode: Eager
# Name: abs_M512_N512_cuda
# Input: M: 512, N: 512, device: cuda
Forward Execution Time (us) : 28.154

# Benchmarking PyTorch: abs_
# Mode: Eager
# Name: abs__M512_N512_cuda
# Input: M: 512, N: 512, device: cuda
Forward Execution Time (us) : 15.959
...
```

Reviewed By: hl475

Differential Revision: D18595176

fbshipit-source-id: 048c5b7b2a5318c3687412e12e8d2d5f380a8139
2019-11-19 10:05:47 -08:00
Mingzhe Li
5b15f32697 rename benchmark_all_other_test (#30048)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30048

as title

(Note: this ignores all push blocking failures!)

Test Plan:
```
buck run mode/opt caffe2/benchmarks/operator_benchmark:benchmark_all_other_test
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M64_N64_K64_cpu
# Input: M: 64, N: 64, K: 64, device: cpu
Forward Execution Time (us) : 142.032
...
```

Reviewed By: hl475

Differential Revision: D18580754

fbshipit-source-id: 125482d2987cbdb1d019ccedf56a9da5a7cebaba
2019-11-18 21:39:31 -08:00
Mingzhe Li
8b9bac1fad add operator-range argument to the op bench (#30051)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30051

This argument takes a hyphen-delimited start and end character to filter operators. If an operator's first character falls within the range, it is tested; otherwise it is skipped.
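The filtering rule described above can be sketched in a few lines of Python. The function and argument names are illustrative, not the benchmark's actual code; the sketch also admits the comma-separated multi-range and single-point forms that later commits in this log describe, and it does no input validation:

```python
def in_operator_range(op_name, range_spec):
    """Return True if op_name passes the range filter.

    range_spec examples: "b-c", "a", "a,b-c", or None to match everything.
    """
    if range_spec is None or range_spec.lower() == "none":
        return True
    first = op_name[0].lower()
    for part in range_spec.split(","):
        if "-" in part:
            start, end = part.split("-")
            if start <= first <= end:   # first char inside this range
                return True
        elif part == first:             # single-point form, e.g. "a"
            return True
    return False
```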

(Note: this ignores all push blocking failures!)

Test Plan:
```
buck run mode/opt caffe2/benchmarks/operator_benchmark:benchmark_all_test  -- --iterations 1 --operator_range b-c
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: ceil
# Mode: Eager
# Name: ceil_M512_N512_cpu
# Input: M: 512, N: 512, device: cpu
Forward Execution Time (us) : 110.720

# Benchmarking PyTorch: ceil_
# Mode: Eager
# Name: ceil__M512_N512_cpu
# Input: M: 512, N: 512, device: cpu
Forward Execution Time (us) : 51.128
...

buck run mode/opt caffe2/benchmarks/operator_benchmark:benchmark_all_test  -- --iterations 1 --operator_range None
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: abs
# Mode: Eager
# Name: abs_M512_N512_cpu
# Input: M: 512, N: 512, device: cpu
Forward Execution Time (us) : 107.113

# Benchmarking PyTorch: abs_
# Mode: Eager
# Name: abs__M512_N512_cpu
# Input: M: 512, N: 512, device: cpu
Forward Execution Time (us) : 54.259
...
```

Reviewed By: hl475

Differential Revision: D18581910

fbshipit-source-id: b1a1a7ba76f4d6a61c8a1659f15e9c66097654d4
2019-11-18 20:34:43 -08:00
Mingzhe Li
64706e0a74 change conv, batchnorm input shapes (#30041)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30041

as title

(Note: this ignores all push blocking failures!)

Test Plan:
```
buck run mode/opt //caffe2/benchmarks/operator_benchmark/pt:conv_test
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : None

# Benchmarking PyTorch: ConvTranspose2d
# Mode: Eager
# Name: ConvTranspose2d_in_c512_out_c512_kernel3_stride2_N8_H64_W64_cpu
# Input: in_c: 512, out_c: 512, kernel: 3, stride: 2, N: 8, H: 64, W: 64, device: cpu
Forward Execution Time (us) : 751635.354
```

Reviewed By: hl475

Differential Revision: D18579767

fbshipit-source-id: 53bfac704828a836412434a66000c17f6ac1c727
2019-11-18 20:34:28 -08:00
Mingzhe Li
3250d5008f change the starting iters to reduce execution time (#30040)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30040

The benchmark runs each test in a loop of 200 iterations, then keeps doubling the iteration count until the measured time is significant. For operators with very large input shapes, the initial 200 iterations already take more time than necessary. This diff changes that 200 to 100.
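The auto-ranging loop described above can be sketched as follows; the function name and the 0.2 s significance threshold are illustrative assumptions, not the benchmark's actual constants:

```python
import time

def benchmark(op, start_iters=100, min_time_s=0.2):
    """Time `op` with an auto-ranging loop: double the iteration count
    until the measured wall time crosses the significance threshold."""
    iters = start_iters
    while True:
        t0 = time.perf_counter()
        for _ in range(iters):
            op()
        elapsed = time.perf_counter() - t0
        if elapsed >= min_time_s:
            return elapsed / iters  # seconds per iteration
        iters *= 2  # timing not yet significant: keep doubling
```

For an op taking ~730 ms per call (like the ConvTranspose2d config below), the very first pass alone costs start_iters * 0.73 s, which is why halving the starting count from 200 to 100 meaningfully shortens total runtime without hurting the final measurement.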

(Note: this ignores all push blocking failures!)

Test Plan:
```
Before
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : None

# Benchmarking PyTorch: ConvTranspose2d
# Mode: Eager
# Name: ConvTranspose2d_in_c512_out_c512_kernel3_stride2_N8_H64_W64_cpu
# Input: in_c: 512, out_c: 512, kernel: 3, stride: 2, N: 8, H: 64, W: 64, device: cpu
Forward Execution Time (us) : 729634.577

After
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : None

# Benchmarking PyTorch: ConvTranspose2d
# Mode: Eager
# Name: ConvTranspose2d_in_c512_out_c512_kernel3_stride2_N8_H64_W64_cpu
# Input: in_c: 512, out_c: 512, kernel: 3, stride: 2, N: 8, H: 64, W: 64, device: cpu
Forward Execution Time (us) : 718315.899
```

Reviewed By: hl475

Differential Revision: D18579588

fbshipit-source-id: ef52474cf77e7549bbab0a9ae7b1b0c04023d208
2019-11-18 20:34:16 -08:00
Mingzhe Li
189b24ebe9 reorganize test binaries of op bench (#30023)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30023

This diff doesn't change how users run the benchmarks. But under the hood, we group all the tests into three groups: unary tests, quantized tests, and the remaining ops (named "others" here).

Test Plan:
```
buck run //caffe2/benchmarks/operator_benchmark:benchmark_all_test -- --iterations 1
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: abs
# Mode: Eager
# Name: abs_M512_N512_cpu
# Input: M: 512, N: 512, device: cpu
Forward Execution Time (us) : 17914.301
...
# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M64_N64_K64_cpu_bwd2
# Input: M: 64, N: 64, K: 64, device: cpu
Backward Execution Time (us) : 66525.855
...
# Benchmarking PyTorch: mul
# Mode: Eager
# Name: mul_N2_dtypetorch.qint32_contigTrue
# Input: N: 2, dtype: torch.qint32, contig: True
Forward Execution Time (us) : 290.555
...
```

Reviewed By: hl475

Differential Revision: D18574719

fbshipit-source-id: f7ff1d952031129adde51ebf002e4891bd484680
2019-11-18 12:21:26 -08:00
Mingzhe Li
c543034531 add cuda sync when ops running on gpu (#29936)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29936

This diff adds synchronization after op execution to ensure all CUDA streams have completed.
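The reason this matters can be sketched without CUDA: kernel launches are asynchronous, so a timer must drain the device before reading the clock or it measures only launch overhead. Here `synchronize` stands in for `torch.cuda.synchronize`; the harness below is an illustration, not the benchmark's actual code:

```python
import time

def timed_run(op, iters, synchronize=None):
    """Run `op` `iters` times and return elapsed wall time, optionally
    waiting on a device-sync callable before stopping the clock."""
    t0 = time.perf_counter()
    for _ in range(iters):
        op()
    if synchronize is not None:
        synchronize()  # wait for all enqueued async work to finish
    return time.perf_counter() - t0
```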

Test Plan:
```
buck run mode/opt //caffe2/benchmarks/operator_benchmark:benchmark_all_test -- --iterations 1
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M64_N64_K64_cpu
# Input: M: 64, N: 64, K: 64, device: cpu
Forward Execution Time (us) : 154.412

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M64_N64_K64_cuda
# Input: M: 64, N: 64, K: 64, device: cuda
Forward Execution Time (us) : 101.115
...

Reviewed By: hl475

Differential Revision: D18542732

fbshipit-source-id: b979d26a174f488e971074dc1e16b00e17179c80
2019-11-15 18:02:48 -08:00
Mingzhe Li
3f5dc95b57 fix device check in op bench (#29918)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29918

Some of the tests don't specify `device` in their input configs, so filtering by device won't work for them. This diff fixes that issue.
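
The fix can be sketched as treating a missing `device` key as "runs on the requested device" rather than dropping the config (function and config names here are hypothetical, not the actual benchmark code):

```python
def filter_configs_by_device(configs, device):
    # A config without a "device" key should still be selected,
    # instead of being silently dropped by the device filter.
    return [c for c in configs if c.get("device", device) == device]

configs = [
    {"M": 64, "N": 64, "device": "cpu"},
    {"M": 64, "N": 64, "device": "cuda"},
    {"N": 4, "C": 3},  # e.g. a qpool config that declares no device
]
selected = filter_configs_by_device(configs, "cpu")
```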

Test Plan:
```
buck run mode/opt //caffe2/benchmarks/operator_benchmark/pt:qpool_test -- --iterations 1 --device cpu
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: QAdaptiveAvgPool2dBenchmark
# Mode: Eager
# Name: QAdaptiveAvgPool2dBenchmark_N4_C3_input_size(224,224)_output_size(112,112)_contigTrue_dtypetorch.qint32
# Input: N: 4, C: 3, input_size: (224, 224), output_size: (112, 112), contig: True, dtype: torch.qint32
Forward Execution Time (us) : 2891.172

Reviewed By: hl475

Differential Revision: D18535766

fbshipit-source-id: 09d89cf23b3caab6c0bc3b8a9ae55cc439b98e0f
2019-11-15 13:55:38 -08:00
Mingzhe Li
60a33cac2b reduce input shapes of long tag in op bench (#29865)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29865

For some operators, the number of tests (forward + backward) could easily go above 100. Many of them could be redundant, so this diff tries to reduce the number of shapes.

Test Plan:
```
buck run //caffe2/benchmarks/operator_benchmark:benchmark_all_test -- --iterations 1
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M64_N64_K64_cpu
# Input: M: 64, N: 64, K: 64, device: cpu
Forward Execution Time (us) : 28418.926
...

Reviewed By: hl475

Differential Revision: D18520946

fbshipit-source-id: 1056d6d5a9c46bc2d508ff133039aefeb9d11c27
2019-11-14 20:19:09 -08:00
Mingzhe Li
90e3bbf3ab support all with tag_filter to run all shapes (#29864)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29864

This diff makes `all` a reserved keyword for tag_filter. When `all` is passed by the user, all the supported shapes are run.
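
The reserved-keyword behavior amounts to a short-circuit in the tag filter, roughly like this (a sketch with hypothetical names, not the benchmark suite's actual code):

```python
def filter_by_tag(configs, tag_filter):
    # "all" is reserved: bypass tag matching and run every config.
    if tag_filter == "all":
        return list(configs)
    return [c for c in configs if c["tag"] == tag_filter]

configs = [
    {"M": 8,  "N": 32, "K": 256, "tag": "short"},
    {"M": 64, "N": 64, "K": 128, "tag": "long"},
]
everything = filter_by_tag(configs, "all")
short_only = filter_by_tag(configs, "short")
```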

Test Plan:
```
buck run //caffe2/benchmarks/operator_benchmark/pt:add_test -- --iterations 1 --tag_filter all
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : all

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M8_N32_K256_cpu
# Input: M: 8, N: 32, K: 256, device: cpu
Forward Execution Time (us) : 6798.688

...

Reviewed By: hl475

Differential Revision: D18520249

fbshipit-source-id: 4d55af9f46f89b2fe8842e1a00dfa8e5acaf4fa2
2019-11-14 20:19:05 -08:00
Mingzhe Li
5da2bf945e add embeddingbag to benchmark_all_test (#29830)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29830

as title

Test Plan: na

Reviewed By: hl475

Differential Revision: D18506023

fbshipit-source-id: 15693894c0aa736ab3e818bc740099f0d629cb84
2019-11-14 20:13:57 -08:00
Mingzhe Li
747233e3bd minor edit to fix benchmark_all_test cuda error (#29829)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29829

This diff replaces the explicit CUDA if-check with `to(device)`, which is a much cleaner interface.
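
The before/after pattern looks roughly like this (a minimal illustration; the variable names are hypothetical, not taken from the diff):

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# Before: a branch per backend.
#   t = torch.rand(64, 64)
#   if device == "cuda":
#       t = t.cuda()
# After: Tensor.to is a no-op when the tensor is already on `device`,
# so one line covers both CPU and CUDA.
t = torch.rand(64, 64).to(device)
```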

Test Plan:
```
buck run mode/opt //caffe2/benchmarks/operator_benchmark:benchmark_all_test -- --iterations 1
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M64_N64_K64_cpu
# Input: M: 64, N: 64, K: 64, device: cpu
Forward Execution Time (us) : 129.548

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M64_N64_K64_cuda
# Input: M: 64, N: 64, K: 64, device: cuda
Forward Execution Time (us) : 48.313
...

Reviewed By: bddppq

Differential Revision: D18507568

fbshipit-source-id: 32534e76b2e27d59a631a4d76a0d93700e975ea4
2019-11-14 11:13:36 -08:00
Mingzhe Li
ad95099f45 fix benchmark_all_test when running on gpu (#29818)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29818

When some of the tests run on CUDA, there is a runtime error because of a missing data transfer from CPU to CUDA. This diff fixes that issue.

Test Plan:
```
buck run mode/opt //caffe2/benchmarks/operator_benchmark:benchmark_all_test -- --iterations 1
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M64_N64_K64_cpu
# Input: M: 64, N: 64, K: 64, device: cpu
Forward Execution Time (us) : 165.241

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M64_N64_K64_cuda
# Input: M: 64, N: 64, K: 64, device: cuda
Forward Execution Time (us) : 56.546
...

Reviewed By: hl475

Differential Revision: D18506269

fbshipit-source-id: 87942d7a52bd398600766c0f5363d791b74a6ca6
2019-11-14 10:10:48 -08:00
Mingzhe Li
b70d571233 add embeddingbag operator to the benchmark suite (#29784)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29784

Add the embeddingbag operator to the benchmark suite, covering different numbers of embeddings, dims, and input sizes.
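
A minimal invocation of the op under benchmark, using one of the shape combinations from the test names below (the concrete sizes are taken from those names; the single-bag offsets are an assumption for brevity):

```python
import torch
import torch.nn as nn

num_embeddings, dim, input_size = 80, 64, 16
emb = nn.EmbeddingBag(num_embeddings, dim, mode="sum", sparse=True)

indices = torch.randint(0, num_embeddings, (input_size,))
offsets = torch.tensor([0])  # a single bag starting at position 0
out = emb(indices, offsets)  # shape: (num_bags, dim)
```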

Test Plan:
```
buck run //caffe2/benchmarks/operator_benchmark/pt:embeddingbag_test -- --iterations 1
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: embeddingbag
# Mode: Eager
# Name: embeddingbag_embeddingbags2300_dim64_modesum_input_size16_offset0_sparseTrue
# Input: embeddingbags: 2300, dim: 64, mode: sum, input_size: 16, offset: 0, sparse: True
Forward Execution Time (us) : 624.838

# Benchmarking PyTorch: embeddingbag
# Mode: Eager
# Name: embeddingbag_embeddingbags2300_dim64_modesum_input_size64_offset0_sparseTrue
# Input: embeddingbags: 2300, dim: 64, mode: sum, input_size: 64, offset: 0, sparse: True
Forward Execution Time (us) : 636.744

# Benchmarking PyTorch: embeddingbag
# Mode: Eager
# Name: embeddingbag_embeddingbags80_dim64_modesum_input_size8_offset0_sparseTrue
# Input: embeddingbags: 80, dim: 64, mode: sum, input_size: 8, offset: 0, sparse: True
Backward Execution Time (us) : 2325.291

# Benchmarking PyTorch: embeddingbag
# Mode: Eager
# Name: embeddingbag_embeddingbags80_dim64_modesum_input_size16_offset0_sparseTrue
# Input: embeddingbags: 80, dim: 64, mode: sum, input_size: 16, offset: 0, sparse: True
Backward Execution Time (us) : 2528.658
...

Reviewed By: bddppq

Differential Revision: D18496340

fbshipit-source-id: 157dcff2ea4ec13416fe161382fcefd47ce4cc01
2019-11-14 10:05:47 -08:00
Mingzhe Li
e53b510773 add addmm op to the benchmark suite (#29783)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29783

Add the addmm operator, reusing the existing input shapes from the add operator.
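
`addmm` fuses a bias add with a matrix multiply; with the M/N/K axes from the add configs, the benchmarked call is essentially:

```python
import torch

M, N, K = 64, 64, 128  # same shape axes as the add operator's configs
inp = torch.rand(M, N)
mat1 = torch.rand(M, K)
mat2 = torch.rand(K, N)
out = torch.addmm(inp, mat1, mat2)  # computes inp + mat1 @ mat2
```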

Test Plan:
```
buck run //caffe2/benchmarks/operator_benchmark/pt:add_test -- --iterations 1
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: addmm
# Mode: Eager
# Name: addmm_M64_N64_K64_cpu
# Input: M: 64, N: 64, K: 64, device: cpu
Forward Execution Time (us) : 759.237

# Benchmarking PyTorch: addmm
# Mode: Eager
# Name: addmm_M64_N64_K128_cpu
# Input: M: 64, N: 64, K: 128, device: cpu
Forward Execution Time (us) : 922.764

# Benchmarking PyTorch: addmm
# Mode: Eager
# Name: addmm_M64_N64_K64_cpu_bwdall
# Input: M: 64, N: 64, K: 64, device: cpu
Backward Execution Time (us) : 4689.546

# Benchmarking PyTorch: addmm
# Mode: Eager
# Name: addmm_M64_N64_K64_cpu_bwd1
# Input: M: 64, N: 64, K: 64, device: cpu
Backward Execution Time (us) : 1700.093

# Benchmarking PyTorch: addmm
# Mode: Eager
# Name: addmm_M64_N64_K64_cpu_bwd2
# Input: M: 64, N: 64, K: 64, device: cpu
Backward Execution Time (us) : 2947.427

# Benchmarking PyTorch: addmm
# Mode: Eager
# Name: addmm_M64_N64_K64_cpu_bwd3
# Input: M: 64, N: 64, K: 64, device: cpu
Backward Execution Time (us) : 2518.043

# Benchmarking PyTorch: addmm
# Mode: Eager
# Name: addmm_M64_N64_K128_cpu_bwdall
# Input: M: 64, N: 64, K: 128, device: cpu
Backward Execution Time (us) : 5848.369

Reviewed By: bddppq

Differential Revision: D18496476

fbshipit-source-id: 4f1c116a2676a64106afa958e8c8a8e109f35a4a
2019-11-14 10:02:55 -08:00
Mingzhe Li
f3b15727c5 fix op benchmark OOM issue (#29794)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29794

Before this diff, all tests for an operator were created at once before benchmarking. Once an operator was benchmarked, the same process moved on to the next operator, and so on. The issue is that a single operator can have more than 100 tests, which can cause OOM issues. This diff avoids creating all of an operator's tests at once by using generators, which create and run the tests one by one.
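
The lazy-creation idea can be sketched with a plain Python generator (hypothetical names, not the benchmark framework's actual API):

```python
from itertools import product

def generate_tests(ms, ns):
    # Yield one benchmark config at a time: only the test currently being
    # run holds live data, instead of all combinations existing at once.
    for m, n in product(ms, ns):
        yield {"M": m, "N": n}

gen = generate_tests([64, 128, 256], [64, 128])
first = next(gen)  # nothing beyond this config has been materialized yet
```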

Test Plan:
```
buck run //caffe2/benchmarks/operator_benchmark:benchmark_all_quantized_test -- --iterations 1
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: relu
# Mode: Eager
# Name: relu_dims(3,4,5)_contigFalse_inplaceFalse_dtypetorch.quint8
# Input: dims: (3, 4, 5), contig: False, inplace: False, dtype: torch.quint8
Forward Execution Time (us) : 52.493

# Benchmarking PyTorch: relu
# Mode: Eager
# Name: relu_dims(3,4,5)_contigFalse_inplaceFalse_dtypetorch.qint8
# Input: dims: (3, 4, 5), contig: False, inplace: False, dtype: torch.qint8
Forward Execution Time (us) : 44.945
...

Reviewed By: hl475

Differential Revision: D18500103

fbshipit-source-id: 747c0ad0d302177da04da36e112c67f154115b6e
2019-11-13 22:22:58 -08:00
Zafar Takhirov
d2aa4c611f observer benchmarking
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29508

Test Plan: Imported from OSS

Differential Revision: D18415171

Pulled By: z-a-f

fbshipit-source-id: 5ebedee8c17448e36853e0c1bf778bb128975678
2019-11-12 23:28:10 -08:00
Zafar Takhirov
29e509ff1d Fix a missing comma in quantized benchmark
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29685

Test Plan: Imported from OSS

Differential Revision: D18463246

Pulled By: z-a-f

fbshipit-source-id: c21fd7892f3701afcc5faa8bc03f98b6f6550d0f
2019-11-12 16:50:46 -08:00
Zafar Takhirov
9bb0e2834d Fixing data type in quantized pool benchmarking
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29663

Test Plan: Imported from OSS

Differential Revision: D18456671

Pulled By: z-a-f

fbshipit-source-id: b36fc56e4f29937e458308f4c13f7a5e37665269
2019-11-12 13:22:53 -08:00
Zafar Takhirov
3b43cfde80 Benchmarking per channel quantization
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29627

Test Plan: Imported from OSS

Differential Revision: D18443929

Pulled By: z-a-f

fbshipit-source-id: a0345cc5e259b4ce98589252719b8885326d43a3
2019-11-12 11:33:42 -08:00
Zafar Takhirov
5db361bd32 Quantized interpolation benchmarks
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29509

Test Plan: Imported from OSS

Differential Revision: D18415367

Pulled By: z-a-f

fbshipit-source-id: 84d0aaa81b131b49762edde6ade27e61acb99a42
2019-11-12 11:23:03 -08:00
Zafar Takhirov
f95e8ea1be Benchmarking quantized methods (#29625)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29625

This PR also adds a template for benchmarking methods that require no input.

Test Plan: Imported from OSS

Differential Revision: D18443485

Pulled By: z-a-f

fbshipit-source-id: 6f25c3a7cd94e396c112b5f7c33307b71f78ecd3
2019-11-12 11:08:55 -08:00
Zafar Takhirov
3b452ca428 quantized topk benchmarking
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29505

Test Plan: Imported from OSS

Differential Revision: D18414851

Pulled By: z-a-f

fbshipit-source-id: 23999ef95c2f087066c4da36b2bf35516ebc0421
2019-11-12 00:33:47 -08:00
Zafar Takhirov
a0d4d5062b Quantized unary ops benchmarking (mostly template)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29503

Test Plan: Imported from OSS

Differential Revision: D18414589

Pulled By: z-a-f

fbshipit-source-id: ab5af490359b3e0a51642a46aef86f7be720deff
2019-11-11 23:48:36 -08:00
Zafar Takhirov
fb07098e2b Creating a base benchmarking class for activations.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29182

Test Plan: Imported from OSS

Differential Revision: D18319456

Pulled By: z-a-f

fbshipit-source-id: d2314bb30a584551b5f1c8610b36c4c10c27ac85
2019-11-11 18:24:44 -08:00
Mingzhe Li
af3468a1c7 change op bench input shape to reduce execution time (#29616)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29616

1. Reduce the predefined_min_time, which is the minimum time each test needs to run. Based on the test results, the average time across different epochs is pretty stable before exiting, so we can safely reduce the predefined time here.
2. Change the input shapes of several ops.
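
The role of the minimum run time can be sketched as a run-until-stable loop (a hypothetical sketch; the real harness's names and doubling policy may differ). Since the per-epoch averages in the output below are already stable at small epochs, lowering the threshold ends the loop sooner:

```python
import time

def run_until_min_time(op, min_time_s, start_iters=200):
    # Double the iteration count each epoch until the accumulated wall
    # time crosses min_time_s, then report the mean time in microseconds.
    iters = start_iters
    while True:
        t0 = time.time()
        for _ in range(iters):
            op()
        elapsed = time.time() - t0
        if elapsed >= min_time_s:
            return elapsed / iters * 1e6
        iters *= 2
```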

Test Plan:
```
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: add
200 256.044864655
400 165.850520134
800 163.579881191
1600 162.871927023
3200 160.3128016
# Mode: Eager
# Name: add_cpu_M64_K64_bwd1_N64
# Input: device: cpu, K: 64, M: 64, N: 64
Backward Execution Time (us) : 164.715

# Benchmarking PyTorch: add
200 170.650482178
400 168.895125389
800 169.867575169
1600 163.400024176
3200 168.658420444
# Mode: Eager
# Name: add_cpu_M64_K64_bwd2_N64
# Input: device: cpu, K: 64, M: 64, N: 64
Backward Execution Time (us) : 168.777

Reviewed By: hl475

Differential Revision: D18438540

fbshipit-source-id: 1fd27cf4bbc34e46e74393af912ee2fcb75c33b2
2019-11-11 16:58:27 -08:00
Mingzhe Li
7374dd0d52 remove SkipInputShape flag (#29615)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29615

Remove that flag as it's not needed any more.

Test Plan: na

Reviewed By: hl475

Differential Revision: D18440271

fbshipit-source-id: 41b0659c72ef746a1cc268174fd1e7dc2beb1ae2
2019-11-11 16:56:40 -08:00
Mingzhe Li
b5a38fa98e update op bench readme (#29596)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29596

as title

Test Plan: na

Reviewed By: hl475

Differential Revision: D18437811

fbshipit-source-id: 7996d1689d8a46849b62b2b3875c67cf8dc5861c
2019-11-11 15:33:29 -08:00
Mingzhe Li
00c224f0f2 move quantized tests from benchmark_all_test to benchmark_all_quantized_test (#29590)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29590

as title

Test Plan:
```
buck run //caffe2/benchmarks/operator_benchmark:benchmark_all_test  -- --iteration 1
Parsing buck files: finished in 1.0 sec
Creating action graph: finished in 43.0 sec
Building: finished in 16.0 sec (100%) 10053/10053 jobs, 1 updated
  Total time: 01:00.0 min
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M64_N64_K64_cpu
# Input: M: 64, N: 64, K: 64, device: cpu
Forward Execution Time (us) : 45419.667
...

buck run //caffe2/benchmarks/operator_benchmark:benchmark_all_quantized_test
Parsing buck files: finished in 1.0 sec
Building: finished in 6.0 sec (100%) 10053/10053 jobs, 1 updated
  Total time: 7.0 sec
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: QReLU
# Mode: Eager
# Name: QReLU_dims(1,)_permute_dimsFalse_inplaceFalse_dtypetorch.quint8
# Input: dims: (1,), permute_dims: False, inplace: False, dtype: torch.quint8
Forward Execution Time (us) : 137.685
...

Reviewed By: hl475

Differential Revision: D18436727

fbshipit-source-id: 317ec0e4bd2a6e33c9a60830f01ed805ae412449
2019-11-11 14:59:29 -08:00
Mingzhe Li
137eea5938 change module_name in chunk_test (#29589)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29589

as title

Test Plan:
```
buck run //caffe2/benchmarks/operator_benchmark/pt:chunk_test  -- --iteration 1
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: chunk
# Mode: Eager
# Name: chunk_M256_N512_chunks2_cpu
# Input: M: 256, N: 512, chunks: 2, device: cpu
Forward Execution Time (us) : 148.345

# Benchmarking PyTorch: chunk
# Mode: Eager
# Name: chunk_M512_N512_chunks2_cpu
# Input: M: 512, N: 512, chunks: 2, device: cpu
Forward Execution Time (us) : 125.239

Reviewed By: hl475

Differential Revision: D18436532

fbshipit-source-id: e7100f4605471e27703b2e2e863b971a93229854
2019-11-11 14:59:24 -08:00