Summary:
In long format strings, it is better to give the placeholders names.
When creating a dict, a literal is more readable and faster than the `dict()` constructor.
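A minimal illustration of both points, using made-up values taken from the benchmark output format; the exact strings changed in this PR are not shown here.

```python
# Named placeholders make a long format string self-documenting:
# the reader sees which value goes where without counting positions.
template = "# Name: {name}\n# Input: M: {M}, N: {N}, K: {K}, device: {device}"

# A dict literal reads better than dict(...) and, in CPython, skips the
# global name lookup and function call that the constructor needs.
params = {"name": "add_M1_N1_K1_cpu", "M": 1, "N": 1, "K": 1, "device": "cpu"}

message = template.format(**params)
print(message)
```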
I always appreciate your efforts in creating the world's best frameworks.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31352
Differential Revision: D19191967
Pulled By: ngimel
fbshipit-source-id: 21f063b163b67de8cf9761a4db5991f74318e991
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31334
The wipe cache logic was introduced in the hope of reducing the variation in benchmark results. Based on our experimental results, it didn't actually help with that. In addition, several engineers ran into errors from a missing cpuinfo.h, which the wipe cache logic used. So this diff removes that feature to ensure smooth installation and running of the op bench.
Test Plan:
```
buck run caffe2/benchmarks/operator_benchmark/pt:add_test -- --iterations 1
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short
# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M1_N1_K1_cpu
# Input: M: 1, N: 1, K: 1, device: cpu
Forward Execution Time (us) : 111.192
```
The A/B test also passes (Benchmark Run #2476535015).
Reviewed By: hl475
Differential Revision: D19126970
fbshipit-source-id: 9b1ab48c121838836ba6e0ae664a48fe2d18efdd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30040
The benchmark runs each test in a loop of 200 iterations, then keeps doubling the number of iterations until the measured time is significant. For operators with very large input shapes, the initial 200 iterations take a long time and are not really necessary. This diff changes the initial count from 200 to 100.
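The doubling scheme described above can be sketched as follows. This is a hypothetical reconstruction, not the benchmark's code: `run_benchmark`, `min_time_s`, and the `max_iters` safety cap are made-up names, and the real time threshold is not stated here.

```python
import time

def run_benchmark(op, min_time_s=0.1, initial_iters=100, max_iters=1 << 20):
    """Return (iterations, average microseconds per call) for `op`.

    Starts with `initial_iters` calls and doubles the count until one timed
    epoch lasts at least `min_time_s` seconds or `max_iters` is reached.
    """
    iters = initial_iters
    while True:
        start = time.perf_counter()
        for _ in range(iters):
            op()
        elapsed = time.perf_counter() - start
        if elapsed >= min_time_s or iters >= max_iters:
            return iters, elapsed / iters * 1e6
        iters *= 2
```

For an operator where a single call takes about 0.72 s, like the ConvTranspose2d case below, the first epoch alone drops from roughly 146 s at 200 iterations to roughly 72 s at 100, and it already exceeds any sub-minute threshold, so the loop exits after that one epoch.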
(Note: this ignores all push blocking failures!)
Test Plan:
```
Before
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : None
# Benchmarking PyTorch: ConvTranspose2d
# Mode: Eager
# Name: ConvTranspose2d_in_c512_out_c512_kernel3_stride2_N8_H64_W64_cpu
# Input: in_c: 512, out_c: 512, kernel: 3, stride: 2, N: 8, H: 64, W: 64, device: cpu
Forward Execution Time (us) : 729634.577
After
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : None
# Benchmarking PyTorch: ConvTranspose2d
# Mode: Eager
# Name: ConvTranspose2d_in_c512_out_c512_kernel3_stride2_N8_H64_W64_cpu
# Input: in_c: 512, out_c: 512, kernel: 3, stride: 2, N: 8, H: 64, W: 64, device: cpu
Forward Execution Time (us) : 718315.899
```
Reviewed By: hl475
Differential Revision: D18579588
fbshipit-source-id: ef52474cf77e7549bbab0a9ae7b1b0c04023d208
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29865
For some operators, the number of tests (forward + backward) can easily exceed 100. Many of them may be redundant, so this diff reduces the number of shapes.
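To see how quickly the count grows, here is a back-of-the-envelope sketch using `itertools.product` over hypothetical shape lists (not the actual configs in this diff), assuming one backward test per forward test.

```python
from itertools import product

# Hypothetical shape lists, for illustration only.
Ms = [1, 8, 64, 256]
Ns = [1, 32, 64, 256]
Ks = [1, 64, 128, 256]
devices = ["cpu", "cuda"]

# Every combination becomes one forward test: 4 * 4 * 4 * 2 = 128.
num_forward = len(list(product(Ms, Ns, Ks, devices)))
# Assumes one backward test per forward test; some ops generate more.
num_total = 2 * num_forward

# Keeping two values per shape dimension shrinks the sweep by a factor of 8.
num_forward_reduced = len(list(product(Ms[:2], Ns[:2], Ks[:2], devices)))
print(num_forward, num_total, num_forward_reduced)
```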
Test Plan:
```
buck run //caffe2/benchmarks/operator_benchmark:benchmark_all_test -- --iterations 1
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short
# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M64_N64_K64_cpu
# Input: M: 64, N: 64, K: 64, device: cpu
Forward Execution Time (us) : 28418.926
...
```
Reviewed By: hl475
Differential Revision: D18520946
fbshipit-source-id: 1056d6d5a9c46bc2d508ff133039aefeb9d11c27
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29864
This diff makes `all` a reserved keyword for `tag_filter`. When the user passes `all`, the benchmark runs all the supported shapes.
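A minimal sketch of that filtering behavior, assuming tests are stored as (name, tag) pairs. `TESTS`, `filter_by_tag`, and the `long` tag are hypothetical, not the benchmark's actual code.

```python
# Hypothetical test registry: (test name, tag) pairs.
TESTS = [
    ("add_M1_N1_K1_cpu", "short"),
    ("add_M64_N64_K64_cpu", "short"),
    ("add_M8_N32_K256_cpu", "long"),
]

def filter_by_tag(tests, tag_filter):
    """Keep tests whose tag matches; the reserved value 'all' keeps every test."""
    if tag_filter == "all":
        return list(tests)
    return [test for test in tests if test[1] == tag_filter]
```

Because `all` is checked before any tag comparison, it cannot be used as an ordinary tag name.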
Test Plan:
```
buck run //caffe2/benchmarks/operator_benchmark/pt:add_test -- --iterations 1 --tag_filter all
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : all
# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M8_N32_K256_cpu
# Input: M: 8, N: 32, K: 256, device: cpu
Forward Execution Time (us) : 6798.688
...
```
Reviewed By: hl475
Differential Revision: D18520249
fbshipit-source-id: 4d55af9f46f89b2fe8842e1a00dfa8e5acaf4fa2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29830
as title
Test Plan: na
Reviewed By: hl475
Differential Revision: D18506023
fbshipit-source-id: 15693894c0aa736ab3e818bc740099f0d629cb84
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29818
When some of the tests run on CUDA, a runtime error occurs because the input data is not transferred from CPU to CUDA. This diff fixes that issue.
Test Plan:
```
buck run mode/opt //caffe2/benchmarks/operator_benchmark:benchmark_all_test -- --iterations 1
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short
# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M64_N64_K64_cpu
# Input: M: 64, N: 64, K: 64, device: cpu
Forward Execution Time (us) : 165.241
# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M64_N64_K64_cuda
# Input: M: 64, N: 64, K: 64, device: cuda
Forward Execution Time (us) : 56.546
...
```
Reviewed By: hl475
Differential Revision: D18506269
fbshipit-source-id: 87942d7a52bd398600766c0f5363d791b74a6ca6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29794
Before this diff, all tests of an operator were created at once before any of them ran. Once an operator was benchmarked, the process moved on to the next operator, and so on. The issue is that a single operator can have more than 100 tests, which can cause OOM issues. This diff avoids creating all the tests of an operator at once by using generators, which create and run the tests one by one.
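The generator approach can be sketched in plain Python. `make_test`, `generate_tests`, and `run_all` are hypothetical stand-ins; in the real benchmark, building a test would allocate its input tensors, which is what makes eager creation expensive.

```python
created = []  # records which configs have been built so far

def make_test(config):
    # Stand-in for building a benchmark test (e.g. allocating its inputs).
    created.append(config)
    return lambda: sum(config)

def generate_tests(configs):
    # A generator builds each test only when the runner asks for the next one,
    # instead of materializing every test of the operator up front.
    for config in configs:
        yield make_test(config)

def run_all(configs):
    results = []
    for test in generate_tests(configs):
        results.append(test())  # build, run, then move on to the next config
    return results
```

Creating the generator does no work; each test is built lazily on the corresponding `next()` call.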
Test Plan:
```
buck run //caffe2/benchmarks/operator_benchmark:benchmark_all_quantized_test -- --iterations 1
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short
# Benchmarking PyTorch: relu
# Mode: Eager
# Name: relu_dims(3,4,5)_contigFalse_inplaceFalse_dtypetorch.quint8
# Input: dims: (3, 4, 5), contig: False, inplace: False, dtype: torch.quint8
Forward Execution Time (us) : 52.493
# Benchmarking PyTorch: relu
# Mode: Eager
# Name: relu_dims(3,4,5)_contigFalse_inplaceFalse_dtypetorch.qint8
# Input: dims: (3, 4, 5), contig: False, inplace: False, dtype: torch.qint8
Forward Execution Time (us) : 44.945
...
```
Reviewed By: hl475
Differential Revision: D18500103
fbshipit-source-id: 747c0ad0d302177da04da36e112c67f154115b6e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29625
This PR also adds a template for benchmarking methods that require no input.
Test Plan: Imported from OSS
Differential Revision: D18443485
Pulled By: z-a-f
fbshipit-source-id: 6f25c3a7cd94e396c112b5f7c33307b71f78ecd3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29616
1. Reduce `predefined_min_time`, the minimum time each test needs to run. Based on the test results, the average time across epochs is fairly stable before exiting, so we can safely reduce the predefined time.
2. Change the input shapes of several ops.
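As a rough illustration of what "fairly stable" means, the sketch below computes the relative spread, (max - min) / mean, of the per-epoch averages printed for add_cpu_M64_K64_bwd1_N64 in the Test Plan. It is an illustrative check only, not the benchmark's exit criterion; `relative_spread` is a hypothetical helper.

```python
def relative_spread(avgs_us):
    """(max - min) / mean of per-epoch average times, as a fraction."""
    mean = sum(avgs_us) / len(avgs_us)
    return (max(avgs_us) - min(avgs_us)) / mean

# Iterations -> average time per iteration (us), copied from the Test Plan log.
epochs = {200: 256.044864655, 400: 165.850520134, 800: 163.579881191,
          1600: 162.871927023, 3200: 160.3128016}

# The 200-iteration epoch is noticeably higher, so compare only the later ones.
later = [avg for iters, avg in epochs.items() if iters >= 400]
print(f"{relative_spread(later):.3f}")
```

From 400 iterations onward, the averages agree to within about 3.4%.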
Test Plan:
```
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short
# Benchmarking PyTorch: add
200 256.044864655
400 165.850520134
800 163.579881191
1600 162.871927023
3200 160.3128016
# Mode: Eager
# Name: add_cpu_M64_K64_bwd1_N64
# Input: device: cpu, K: 64, M: 64, N: 64
Backward Execution Time (us) : 164.715
# Benchmarking PyTorch: add
200 170.650482178
400 168.895125389
800 169.867575169
1600 163.400024176
3200 168.658420444
# Mode: Eager
# Name: add_cpu_M64_K64_bwd2_N64
# Input: device: cpu, K: 64, M: 64, N: 64
Backward Execution Time (us) : 168.777
```
Reviewed By: hl475
Differential Revision: D18438540
fbshipit-source-id: 1fd27cf4bbc34e46e74393af912ee2fcb75c33b2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29615
Remove that flag as it's not needed any more.
Test Plan: na
Reviewed By: hl475
Differential Revision: D18440271
fbshipit-source-id: 41b0659c72ef746a1cc268174fd1e7dc2beb1ae2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29596
as title
Test Plan: na
Reviewed By: hl475
Differential Revision: D18437811
fbshipit-source-id: 7996d1689d8a46849b62b2b3875c67cf8dc5861c