Commit Graph

199 Commits

Author SHA1 Message Date
Wojciech Baranowski
b10a39bb32 Migrate _cat from TH to ATen (CUDA) (#33237)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/24520

Benchmarks:

Upstream:

```
$ python -m pt.cat_test --tag_filter all --device cuda  --omp_num_threads 1 --mkl_num_threads 1
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : all

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes(1,1,1)_N2_dim0_cuda
# Input: sizes: (1, 1, 1), N: 2, dim: 0, device: cuda
Forward Execution Time (us) : 17.355

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes(512,512,2)_N2_dim1_cuda
# Input: sizes: (512, 512, 2), N: 2, dim: 1, device: cuda
Forward Execution Time (us) : 30.718

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes(128,1024,2)_N2_dim1_cuda
# Input: sizes: (128, 1024, 2), N: 2, dim: 1, device: cuda
Forward Execution Time (us) : 17.329

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes(512,512,2)_N2_dim1_cuda
# Input: sizes: (512, 512, 2), N: 2, dim: 1, device: cuda
Forward Execution Time (us) : 30.176

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes(1024,1024,2)_N2_dim0_cuda
# Input: sizes: (1024, 1024, 2), N: 2, dim: 0, device: cuda
Forward Execution Time (us) : 74.417

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes(1025,1023,2)_N2_dim1_cuda
# Input: sizes: (1025, 1023, 2), N: 2, dim: 1, device: cuda
Forward Execution Time (us) : 75.728

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes(1024,1024,2)_N2_dim2_cuda
# Input: sizes: (1024, 1024, 2), N: 2, dim: 2, device: cuda
Forward Execution Time (us) : 190.165

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[<function<lambda>at0x7fa8876fcf28>,111,65]_N5_dim0_cuda
# Input: sizes: [<function <lambda> at 0x7fa8876fcf28>, 111, 65], N: 5, dim: 0, device: cuda
Forward Execution Time (us) : 57.711

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[96,<function<lambda>at0x7fa886237048>,64]_N5_dim1_cuda
# Input: sizes: [96, <function <lambda> at 0x7fa886237048>, 64], N: 5, dim: 1, device: cuda
Forward Execution Time (us) : 49.903

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[128,64,<function<lambda>at0x7fa7b57bb840>]_N5_dim2_cuda
# Input: sizes: [128, 64, <function <lambda> at 0x7fa7b57bb840>], N: 5, dim: 2, device: cuda
Forward Execution Time (us) : 84.181

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[<function<lambda>at0x7fa7b57bba60>,32,64]_N50_dim0_cuda
# Input: sizes: [<function <lambda> at 0x7fa7b57bba60>, 32, 64], N: 50, dim: 0, device: cuda
Forward Execution Time (us) : 82.339

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[32,<function<lambda>at0x7fa7b57bbae8>,64]_N50_dim1_cuda
# Input: sizes: [32, <function <lambda> at 0x7fa7b57bbae8>, 64], N: 50, dim: 1, device: cuda
Forward Execution Time (us) : 82.312

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[33,65,<function<lambda>at0x7fa7b57bbb70>]_N50_dim2_cuda
# Input: sizes: [33, 65, <function <lambda> at 0x7fa7b57bbb70>], N: 50, dim: 2, device: cuda
Forward Execution Time (us) : 90.715

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes(64,32,4,16,32)_N2_dim2_cuda
# Input: sizes: (64, 32, 4, 16, 32), N: 2, dim: 2, device: cuda
Forward Execution Time (us) : 129.021

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes(16,32,4,16,32)_N8_dim2_cuda
# Input: sizes: (16, 32, 4, 16, 32), N: 8, dim: 2, device: cuda
Forward Execution Time (us) : 142.966

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes(9,31,5,15,33)_N17_dim4_cuda
# Input: sizes: (9, 31, 5, 15, 33), N: 17, dim: 4, device: cuda
Forward Execution Time (us) : 387.023

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[<function<lambda>at0x7fa7b57bbbf8>]_N100_dim0_cuda
# Input: sizes: [<function <lambda> at 0x7fa7b57bbbf8>], N: 100, dim: 0, device: cuda
Forward Execution Time (us) : 36.647

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[<function<lambda>at0x7fa7b57bbc80>]_N1000_dim0_cuda
# Input: sizes: [<function <lambda> at 0x7fa7b57bbc80>], N: 1000, dim: 0, device: cuda
Forward Execution Time (us) : 278.890

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[<function<lambda>at0x7fa7b57bbd08>]_N2000_dim0_cuda
# Input: sizes: [<function <lambda> at 0x7fa7b57bbd08>], N: 2000, dim: 0, device: cuda
Forward Execution Time (us) : 557.752

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[<function<lambda>at0x7fa7b57bbd90>]_N3000_dim0_cuda
# Input: sizes: [<function <lambda> at 0x7fa7b57bbd90>], N: 3000, dim: 0, device: cuda
Forward Execution Time (us) : 842.512

```

New version:

```
$ python -m pt.cat_test --tag_filter all --device cuda  --omp_num_threads 1 --mkl_num_threads 1
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : all

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes(1,1,1)_N2_dim0_cuda
# Input: sizes: (1, 1, 1), N: 2, dim: 0, device: cuda
Forward Execution Time (us) : 24.419

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes(512,512,2)_N2_dim1_cuda
# Input: sizes: (512, 512, 2), N: 2, dim: 1, device: cuda
Forward Execution Time (us) : 25.025

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes(128,1024,2)_N2_dim1_cuda
# Input: sizes: (128, 1024, 2), N: 2, dim: 1, device: cuda
Forward Execution Time (us) : 24.247

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes(512,512,2)_N2_dim1_cuda
# Input: sizes: (512, 512, 2), N: 2, dim: 1, device: cuda
Forward Execution Time (us) : 25.098

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes(1024,1024,2)_N2_dim0_cuda
# Input: sizes: (1024, 1024, 2), N: 2, dim: 0, device: cuda
Forward Execution Time (us) : 74.441

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes(1025,1023,2)_N2_dim1_cuda
# Input: sizes: (1025, 1023, 2), N: 2, dim: 1, device: cuda
Forward Execution Time (us) : 74.866

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes(1024,1024,2)_N2_dim2_cuda
# Input: sizes: (1024, 1024, 2), N: 2, dim: 2, device: cuda
Forward Execution Time (us) : 189.280

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[<function<lambda>at0x7f1c9b056048>,111,65]_N5_dim0_cuda
# Input: sizes: [<function <lambda> at 0x7f1c9b056048>, 111, 65], N: 5, dim: 0, device: cuda
Forward Execution Time (us) : 57.629

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[96,<function<lambda>at0x7f1c9b0560d0>,64]_N5_dim1_cuda
# Input: sizes: [96, <function <lambda> at 0x7f1c9b0560d0>, 64], N: 5, dim: 1, device: cuda
Forward Execution Time (us) : 49.975

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[128,64,<function<lambda>at0x7f1bce8f38c8>]_N5_dim2_cuda
# Input: sizes: [128, 64, <function <lambda> at 0x7f1bce8f38c8>], N: 5, dim: 2, device: cuda
Forward Execution Time (us) : 83.643

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[<function<lambda>at0x7f1bce8f3ae8>,32,64]_N50_dim0_cuda
# Input: sizes: [<function <lambda> at 0x7f1bce8f3ae8>, 32, 64], N: 50, dim: 0, device: cuda
Forward Execution Time (us) : 82.307

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[32,<function<lambda>at0x7f1bce8f3b70>,64]_N50_dim1_cuda
# Input: sizes: [32, <function <lambda> at 0x7f1bce8f3b70>, 64], N: 50, dim: 1, device: cuda
Forward Execution Time (us) : 82.323

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[33,65,<function<lambda>at0x7f1bce8f3bf8>]_N50_dim2_cuda
# Input: sizes: [33, 65, <function <lambda> at 0x7f1bce8f3bf8>], N: 50, dim: 2, device: cuda
Forward Execution Time (us) : 90.549

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes(64,32,4,16,32)_N2_dim2_cuda
# Input: sizes: (64, 32, 4, 16, 32), N: 2, dim: 2, device: cuda
Forward Execution Time (us) : 129.022

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes(16,32,4,16,32)_N8_dim2_cuda
# Input: sizes: (16, 32, 4, 16, 32), N: 8, dim: 2, device: cuda
Forward Execution Time (us) : 142.969

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes(9,31,5,15,33)_N17_dim4_cuda
# Input: sizes: (9, 31, 5, 15, 33), N: 17, dim: 4, device: cuda
Forward Execution Time (us) : 386.973

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[<function<lambda>at0x7f1bce8f3c80>]_N100_dim0_cuda
# Input: sizes: [<function <lambda> at 0x7f1bce8f3c80>], N: 100, dim: 0, device: cuda
Forward Execution Time (us) : 43.800

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[<function<lambda>at0x7f1bce8f3d08>]_N1000_dim0_cuda
# Input: sizes: [<function <lambda> at 0x7f1bce8f3d08>], N: 1000, dim: 0, device: cuda
Forward Execution Time (us) : 279.023

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[<function<lambda>at0x7f1bce8f3d90>]_N2000_dim0_cuda
# Input: sizes: [<function <lambda> at 0x7f1bce8f3d90>], N: 2000, dim: 0, device: cuda
Forward Execution Time (us) : 565.790

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[<function<lambda>at0x7f1bce8f3e18>]_N3000_dim0_cuda
# Input: sizes: [<function <lambda> at 0x7f1bce8f3e18>], N: 3000, dim: 0, device: cuda
Forward Execution Time (us) : 845.153
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33237

Differential Revision: D20069181

Pulled By: ngimel

fbshipit-source-id: b392e1ffd72c0d8df0c5a2d3ac96f59b37c84e32
2020-02-24 17:41:16 -08:00
comet
9a2691f2fc Fix spelling errors
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/32673

Differential Revision: D19597118

Pulled By: pietern

fbshipit-source-id: f88c1da7548fcee141ed248f5f49d25c1d639955
2020-01-28 04:46:15 -08:00
Huamin Li
52f8f031ac add diag into pt operator microbenchmark (#32597)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32597

Currently, there is no benchmark test for the diag operator. This diff adds one to the suite.
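For reference, a minimal pure-Python sketch of what `torch.diag` computes on a 2-D input, which is what the new benchmark's `dim: 2` configs exercise; `diag2d` is a hypothetical illustration, not the benchmark harness or the ATen kernel:

```python
def diag2d(matrix, diagonal=0):
    """Extract the `diagonal`-th diagonal of a 2-D list-of-lists.

    diagonal=0 is the main diagonal, >0 is above it, <0 is below it
    (matching the sign convention of torch.diag's `diagonal` argument).
    """
    rows, cols = len(matrix), len(matrix[0])
    out = []
    for i in range(rows):
        j = i + diagonal
        if 0 <= j < cols:
            out.append(matrix[i][j])
    return out
```

For example, `diagonal=-10` on a 128x128 input (one of the configs below) starts at row 10 and yields 118 elements.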

Test Plan:
```
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: diag
# Mode: Eager
# Name: diag_dim1_M64_N64_diagonal0_outTrue_cpu
# Input: dim: 1, M: 64, N: 64, diagonal: 0, out: True, device: cpu
Forward Execution Time (us) : 28.496

# Benchmarking PyTorch: diag
# Mode: Eager
# Name: diag_dim2_M128_N128_diagonal-10_outFalse_cpu
# Input: dim: 2, M: 128, N: 128, diagonal: -10, out: False, device: cpu
Forward Execution Time (us) : 45.179

# Benchmarking PyTorch: diag
# Mode: Eager
# Name: diag_dim1_M256_N256_diagonal20_outTrue_cpu
# Input: dim: 1, M: 256, N: 256, diagonal: 20, out: True, device: cpu
Forward Execution Time (us) : 49.009
```

Reviewed By: mingzhe09088

Differential Revision: D19564024

fbshipit-source-id: 828a3e0e0e06810a77eb5ddb734efd30e4a63acf
2020-01-24 15:41:04 -08:00
Brian Wignall
f326045b37 Fix typos, via a Levenshtein-type corrector (#31523)
Summary:
Should be non-semantic.

Uses https://en.wikipedia.org/wiki/Wikipedia:Lists_of_common_misspellings/For_machines to find likely typos, with https://github.com/bwignall/typochecker to help automate the checking.

Uses an updated version of the tool used in https://github.com/pytorch/pytorch/pull/30606 .
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31523

Differential Revision: D19216749

Pulled By: mrshenli

fbshipit-source-id: 7fd489cb9a77cd7e4950c1046f925d57524960ea
2020-01-17 16:03:19 -08:00
Zafar Takhirov
0ae063d5d9 Fixed concatenation benchmark + added it to the microbenchmarking runs
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/31587

Test Plan: Imported from OSS

Differential Revision: D19221813

Pulled By: z-a-f

fbshipit-source-id: ee0eb60da7899b23fdc63326302d1e2fd4b540ee
2020-01-03 11:23:12 -08:00
olramde
d770fbc1d2 Some modifications to improve readability (#31352)
Summary:
When a format string is long, giving its fields names improves readability.

When creating a dict, a literal is more readable and faster than the dict() constructor.

I always appreciate your efforts in creating the world's best frameworks.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31352

Differential Revision: D19191967

Pulled By: ngimel

fbshipit-source-id: 21f063b163b67de8cf9761a4db5991f74318e991
2020-01-02 12:48:34 -08:00
Zafar Takhirov
e33dea6e4e dynamically quantized LSTM benchmarking
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30149

Test Plan: Imported from OSS

Differential Revision: D18613005

Pulled By: z-a-f

fbshipit-source-id: 966bfe2c862b1b4006b228bd9115c5c1cd3ad8cf
2019-12-17 16:52:04 -08:00
Mingzhe Li
f9010d7648 remove wipe cache from op bench (#31334)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31334

The wipe-cache logic was introduced in the hope of reducing variation in benchmark results. Based on our experimental results, it didn't actually help. In addition, several engineers ran into a missing cpuinfo.h, which the wipe-cache logic depended on. This diff therefore removes the feature to ensure smooth installation and running of the op bench.

Test Plan:
```
buck run caffe2/benchmarks/operator_benchmark/pt:add_test -- --iterations 1
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M1_N1_K1_cpu
# Input: M: 1, N: 1, K: 1, device: cpu
Forward Execution Time (us) : 111.192
```

A/B test also pass Benchmark Run #2476535015

Reviewed By: hl475

Differential Revision: D19126970

fbshipit-source-id: 9b1ab48c121838836ba6e0ae664a48fe2d18efdd
2019-12-16 16:34:14 -08:00
Mingzhe Li
c6a8f884d8 add copy_ operator to the op bench (#31327)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31327

Adds copy_ operator to the benchmark suite

Test Plan:
```
buck run caffe2/benchmarks/operator_benchmark/pt:binary_test -- --iterations 1 --operators copy_
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: copy_
# Mode: Eager
# Name: copy__M1_N1_K1_cpu_dtype_onetorch.int32_dtype_twotorch.int32
# Input: M: 1, N: 1, K: 1, device: cpu, dtype_one: torch.int32, dtype_two: torch.int32
Forward Execution Time (us) : 60.645
```

Reviewed By: hl475

Differential Revision: D19122910

fbshipit-source-id: e5f0b0e2612daae0201b1b4a87f52b971e0cc4a8
2019-12-16 13:45:12 -08:00
Mingzhe Li
d401ba1417 benchmark binary ops in binary_test (#31326)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31326

as title

Test Plan:
```
buck run caffe2/benchmarks/operator_benchmark/pt:binary_test -- --iterations 1

# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_in_one[64,1,64]_in_two[1,64,1]_cpu_dtypetorch.float32
# Input: in_one: [64, 1, 64], in_two: [1, 64, 1], device: cpu, dtype: torch.float32
Forward Execution Time (us) : 28080.802
```

Reviewed By: hl475

Differential Revision: D19120113

fbshipit-source-id: 1105de208f7609cc6d74f0b5bc6fe75f19146b28
2019-12-16 13:45:08 -08:00
Zafar Takhirov
efe683fb2a dynamically quantized linear benchmarking
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30148

Test Plan: Imported from OSS

Differential Revision: D18613006

Pulled By: z-a-f

fbshipit-source-id: 3851189a2822fd09a5dd97c9d54774727822d2bf
2019-12-11 18:39:57 -08:00
TH3CHARLie
5edfe9cb80 add torch.square (#30719)
Summary:
fixes https://github.com/pytorch/pytorch/issues/30524
This adds a new operator `torch.square` to PyTorch.

I think it is ready for a first review now, albanD
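For reference, `torch.square` computes an elementwise x*x; a minimal pure-Python sketch of the semantics (`square` here is a hypothetical stand-in operating on a flat list, not the ATen kernel):

```python
def square(values):
    """Elementwise square, the semantics torch.square provides on tensors."""
    return [v * v for v in values]
```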
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30719

Differential Revision: D18909268

Pulled By: albanD

fbshipit-source-id: 5626c445d8db20471a56fc1d7a3490e77812662b
2019-12-10 15:22:46 -08:00
Brian Wignall
e7fe64f6a6 Fix typos (#30606)
Summary:
Should be non-semantic.

Uses https://en.wikipedia.org/wiki/Wikipedia:Lists_of_common_misspellings/For_machines to find likely typos.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30606

Differential Revision: D18763028

Pulled By: mrshenli

fbshipit-source-id: 896515a2156d062653408852e6c04b429fc5955c
2019-12-02 20:17:42 -08:00
Mingzhe Li
b68d1fc316 add small input shapes to some ops (#30617)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30617

as title

Test Plan: buck run //caffe2/benchmarks/operator_benchmark:benchmark_all_test -- --iterations 1 --operator add,as_strided,cat,chunk,fill,linear,matmul,split

Reviewed By: hl475

Differential Revision: D18764248

fbshipit-source-id: 510cf83542822acfa1b7b5e475b0cc7432f7ac19
2019-12-02 10:46:43 -08:00
Mingzhe Li
1aa80471b8 minor fix to filter (#30200)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30200

as title

Test Plan:
```
buck run mode/opt //caffe2/benchmarks/operator_benchmark:benchmark_all_other_test -- --tag_filter all --iterations 1 --ai_pep_format True --operators None --iterations -1 --warmup_iterations -1 --wipe_cache --forward_only False --device cpu --tag_filter all --use_jit False --operator_range b-z
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : all

# Benchmarking PyTorch: batchnorm
PyTorchObserver {"type": "PyTorch_batchnorm_M1_N256_K3136_cpu_Eager", "metric": "latency", "unit": "ms", "value": "0.29026457108557224"}
PyTorchObserver {"type": "PyTorch_batchnorm_M1_N256_K3136_cpu_Eager", "metric": "latency", "unit": "ms", "value": "0.2813781425356865"}
PyTorchObserver {"type": "PyTorch_batchnorm_M1_N256_K3136_cpu_Eager", "metric": "latency", "unit": "ms", "value": "0.28009670320898294"}
...
```

Reviewed By: hl475

Differential Revision: D18627512

fbshipit-source-id: 23f622b96168f90a8d8648bfd9ff9a5116baafdf
2019-11-20 16:36:04 -08:00
Mingzhe Li
9cb8fb61c2 update operator_range description in op bench (#30170)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30170

as title

Test Plan:
```
buck-out/opt/gen/caffe2/benchmarks/operator_benchmark/benchmark_all_other_test.par --tag_filter all --iterations 1 --operator_range ef
...
ValueError: The correct format for operator_range is <start>-<end>, or <point>, <start>-<end>

buck-out/opt/gen/caffe2/benchmarks/operator_benchmark/benchmark_all_other_test.par --tag_filter all --iterations 1 --operator_range a-b
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : all

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M8_N32_K256_cpu
# Input: M: 8, N: 32, K: 256, device: cpu
Forward Execution Time (us) : 60.551

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M8_N32_K256_cuda
# Input: M: 8, N: 32, K: 256, device: cuda
Forward Execution Time (us) : 67.716
...

buck-out/opt/gen/caffe2/benchmarks/operator_benchmark/benchmark_all_other_test.par --tag_filter all --iterations 1 --operator_range b,d-f
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : all

# Benchmarking PyTorch: batchnorm
# Mode: Eager
# Name: batchnorm_M1_N256_K3136_cpu
# Input: M: 1, N: 256, K: 3136, device: cpu
Forward Execution Time (us) : 296.004
...
```

Reviewed By: hl475

Differential Revision: D18619975

fbshipit-source-id: 08f27ee2aeda47be431385f4b20ef7fbeb797516
2019-11-20 12:07:14 -08:00
Mingzhe Li
d11dfd1a84 only run embeddingbag op on cpu (#30163)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30163

as title

Test Plan:
```
buck run mode/opt //caffe2/benchmarks/operator_benchmark:benchmark_all_other_test -- --tag_filter all --iterations 1 --device cuda --operators embeddingbag
Parsing buck files: finished in 0.9 sec
Building: finished in 02:32.5 min (100%) 7358/7358 jobs, 1 updated
  Total time: 02:33.5 min
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : all

buck run mode/opt //caffe2/benchmarks/operator_benchmark:benchmark_all_other_test -- --tag_filter all --iterations 1 --operators embeddingbag
Parsing buck files: finished in 0.9 sec
Building: finished in 5.3 sec (100%) 5604/5604 jobs, 0 updated
  Total time: 6.3 sec
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : all

# Benchmarking PyTorch: embeddingbag
# Mode: Eager
# Name: embeddingbag_embeddingbags80_dim64_modesum_input_size8_offset0_sparseTrue_cpu
# Input: embeddingbags: 80, dim: 64, mode: sum, input_size: 8, offset: 0, sparse: True, device: cpu
Forward Execution Time (us) : 62.608
...
```

Reviewed By: hl475

Differential Revision: D18617540

fbshipit-source-id: 062dd73c455db8b67749078603745651b55254b2
2019-11-20 10:02:39 -08:00
Mingzhe Li
2b1466e665 allow operator_range to take multiple ranges (#30124)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30124

as title

Test Plan:
```
buck run mode/opt //caffe2/benchmarks/operator_benchmark:benchmark_all_other_test -- --tag_filter all --iterations 1 --device cuda --operator_range a,b-c
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : all

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M8_N32_K256_cuda
# Input: M: 8, N: 32, K: 256, device: cuda
Forward Execution Time (us) : 71.683

# Benchmarking PyTorch: batchnorm
# Mode: Eager
# Name: batchnorm_M1_N256_K3136_cuda
# Input: M: 1, N: 256, K: 3136, device: cuda
Forward Execution Time (us) : 118.840

# Benchmarking PyTorch: batchnorm
# Mode: Eager
# Name: batchnorm_M1_N8192_K1_cuda
# Input: M: 1, N: 8192, K: 1, device: cuda
Forward Execution Time (us) : 134.274

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_M128_N128_K1_dim1_cuda
# Input: M: 128, N: 128, K: 1, dim: 1, device: cuda
Forward Execution Time (us) : 109.172
...
```

Reviewed By: hl475

Differential Revision: D18605640

fbshipit-source-id: 4ae9b91a50c4cdf1b161b6c5c58f365ba514050c
2019-11-19 16:15:46 -08:00
Mingzhe Li
0ab03d3283 only run embeddingbag benchmark on cpu (#30106)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30106

as title

Test Plan:
```
buck run mode/opt //caffe2/benchmarks/operator_benchmark:benchmark_all_other_test -- --tag_filter all --iterations 1 --device cuda --operators embeddingbag
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : all
```

Reviewed By: hl475

Differential Revision: D18598198

fbshipit-source-id: 9b7d103410f1183fdf6776047ea2ef8dba4b7831
2019-11-19 12:07:34 -08:00
Mingzhe Li
23991e89cc change operator_range to work with lower and upper in op bench (#30096)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30096

as title

Test Plan:
```
buck run mode/opt caffe2/benchmarks/operator_benchmark:benchmark_all_quantized_test  -- --iterations 1 --operator_range a-a
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_N2_dtypetorch.quint8_contigTrue
# Input: N: 2, dtype: torch.quint8, contig: True
Forward Execution Time (us) : 22.251

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_N2_dtypetorch.qint8_contigTrue
# Input: N: 2, dtype: torch.qint8, contig: True
Forward Execution Time (us) : 17.247

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_N2_dtypetorch.qint32_contigTrue
# Input: N: 2, dtype: torch.qint32, contig: True
Forward Execution Time (us) : 29.653
...
```

Reviewed By: hl475

Differential Revision: D18596447

fbshipit-source-id: eac8d9d90db244aa9799293c22bb0d30cf3edf58
2019-11-19 11:01:02 -08:00
Mingzhe Li
1597f22982 fix device check in op bench (#30091)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30091

as title

Test Plan:
```
Before:
buck run mode/opt //caffe2/benchmarks/operator_benchmark/pt:unary_test -- --device cuda
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: abs
# Mode: Eager
# Name: abs_M512_N512_cpu
# Input: M: 512, N: 512, device: cpu
Forward Execution Time (us) : 91.190

# Benchmarking PyTorch: abs
# Mode: Eager
# Name: abs_M512_N512_cuda
# Input: M: 512, N: 512, device: cuda
Forward Execution Time (us) : 27.062

After:
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: abs
# Mode: Eager
# Name: abs_M512_N512_cuda
# Input: M: 512, N: 512, device: cuda
Forward Execution Time (us) : 28.154

# Benchmarking PyTorch: abs_
# Mode: Eager
# Name: abs__M512_N512_cuda
# Input: M: 512, N: 512, device: cuda
Forward Execution Time (us) : 15.959
...
```

Reviewed By: hl475

Differential Revision: D18595176

fbshipit-source-id: 048c5b7b2a5318c3687412e12e8d2d5f380a8139
2019-11-19 10:05:47 -08:00
Mingzhe Li
5b15f32697 rename benchmark_all_other_test (#30048)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30048

as title

(Note: this ignores all push blocking failures!)

Test Plan:
```
buck run mode/opt caffe2/benchmarks/operator_benchmark:benchmark_all_other_test
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M64_N64_K64_cpu
# Input: M: 64, N: 64, K: 64, device: cpu
Forward Execution Time (us) : 142.032
...
```

Reviewed By: hl475

Differential Revision: D18580754

fbshipit-source-id: 125482d2987cbdb1d019ccedf56a9da5a7cebaba
2019-11-18 21:39:31 -08:00
Mingzhe Li
8b9bac1fad add operator-range argument to the op bench (#30051)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30051

This argument takes a hyphen-delimited start and end character to filter operators. If an operator's first character falls within the range, it is tested; otherwise it is skipped.
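The filtering rule described above can be sketched in a few lines of Python. The function and argument names are illustrative, not the benchmark's actual code; the sketch also admits the comma-separated multi-range and single-point forms that later commits in this log describe, and it does no input validation:

```python
def in_operator_range(op_name, range_spec):
    """Return True if op_name passes the range filter.

    range_spec examples: "b-c", "a", "a,b-c", or None to match everything.
    """
    if range_spec is None or range_spec.lower() == "none":
        return True
    first = op_name[0].lower()
    for part in range_spec.split(","):
        if "-" in part:
            start, end = part.split("-")
            if start <= first <= end:   # first char inside this range
                return True
        elif part == first:             # single-point form, e.g. "a"
            return True
    return False
```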

(Note: this ignores all push blocking failures!)

Test Plan:
```
buck run mode/opt caffe2/benchmarks/operator_benchmark:benchmark_all_test  -- --iterations 1 --operator_range b-c
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: ceil
# Mode: Eager
# Name: ceil_M512_N512_cpu
# Input: M: 512, N: 512, device: cpu
Forward Execution Time (us) : 110.720

# Benchmarking PyTorch: ceil_
# Mode: Eager
# Name: ceil__M512_N512_cpu
# Input: M: 512, N: 512, device: cpu
Forward Execution Time (us) : 51.128
...

buck run mode/opt caffe2/benchmarks/operator_benchmark:benchmark_all_test  -- --iterations 1 --operator_range None
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: abs
# Mode: Eager
# Name: abs_M512_N512_cpu
# Input: M: 512, N: 512, device: cpu
Forward Execution Time (us) : 107.113

# Benchmarking PyTorch: abs_
# Mode: Eager
# Name: abs__M512_N512_cpu
# Input: M: 512, N: 512, device: cpu
Forward Execution Time (us) : 54.259
...
```

Reviewed By: hl475

Differential Revision: D18581910

fbshipit-source-id: b1a1a7ba76f4d6a61c8a1659f15e9c66097654d4
2019-11-18 20:34:43 -08:00
Mingzhe Li
64706e0a74 change conv, batchnorm input shapes (#30041)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30041

as title

(Note: this ignores all push blocking failures!)

Test Plan:
```
buck run mode/opt //caffe2/benchmarks/operator_benchmark/pt:conv_test
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : None

# Benchmarking PyTorch: ConvTranspose2d
# Mode: Eager
# Name: ConvTranspose2d_in_c512_out_c512_kernel3_stride2_N8_H64_W64_cpu
# Input: in_c: 512, out_c: 512, kernel: 3, stride: 2, N: 8, H: 64, W: 64, device: cpu
Forward Execution Time (us) : 751635.354
```

Reviewed By: hl475

Differential Revision: D18579767

fbshipit-source-id: 53bfac704828a836412434a66000c17f6ac1c727
2019-11-18 20:34:28 -08:00
Mingzhe Li
3250d5008f change the starting iters to reduce execution time (#30040)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30040

The benchmark runs each test in a loop of 200 iterations, then keeps doubling the iteration count until the measured time is significant. For operators with very large input shapes, the initial 200 iterations already take more time than necessary. This diff changes that 200 to 100.
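The auto-ranging loop described above can be sketched as follows; the function name and the 0.2 s significance threshold are illustrative assumptions, not the benchmark's actual constants:

```python
import time

def benchmark(op, start_iters=100, min_time_s=0.2):
    """Time `op` with an auto-ranging loop: double the iteration count
    until the measured wall time crosses the significance threshold."""
    iters = start_iters
    while True:
        t0 = time.perf_counter()
        for _ in range(iters):
            op()
        elapsed = time.perf_counter() - t0
        if elapsed >= min_time_s:
            return elapsed / iters  # seconds per iteration
        iters *= 2  # timing not yet significant: keep doubling
```

For an op taking ~730 ms per call (like the ConvTranspose2d config below), the very first pass alone costs start_iters * 0.73 s, which is why halving the starting count from 200 to 100 meaningfully shortens total runtime without hurting the final measurement.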

(Note: this ignores all push blocking failures!)

Test Plan:
```
Before
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : None

# Benchmarking PyTorch: ConvTranspose2d
# Mode: Eager
# Name: ConvTranspose2d_in_c512_out_c512_kernel3_stride2_N8_H64_W64_cpu
# Input: in_c: 512, out_c: 512, kernel: 3, stride: 2, N: 8, H: 64, W: 64, device: cpu
Forward Execution Time (us) : 729634.577

After
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : None

# Benchmarking PyTorch: ConvTranspose2d
# Mode: Eager
# Name: ConvTranspose2d_in_c512_out_c512_kernel3_stride2_N8_H64_W64_cpu
# Input: in_c: 512, out_c: 512, kernel: 3, stride: 2, N: 8, H: 64, W: 64, device: cpu
Forward Execution Time (us) : 718315.899
```

Reviewed By: hl475

Differential Revision: D18579588

fbshipit-source-id: ef52474cf77e7549bbab0a9ae7b1b0c04023d208
2019-11-18 20:34:16 -08:00
Mingzhe Li
189b24ebe9 reorganize test binaries of op bench (#30023)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30023

This diff doesn't change how users run the benchmarks. But under the hood, we group all the tests into three groups: unary tests, quantized tests, and the remaining ops (named "others" here).

Test Plan:
```
buck run //caffe2/benchmarks/operator_benchmark:benchmark_all_test -- --iterations 1
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: abs
# Mode: Eager
# Name: abs_M512_N512_cpu
# Input: M: 512, N: 512, device: cpu
Forward Execution Time (us) : 17914.301
...
# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M64_N64_K64_cpu_bwd2
# Input: M: 64, N: 64, K: 64, device: cpu
Backward Execution Time (us) : 66525.855
...
# Benchmarking PyTorch: mul
# Mode: Eager
# Name: mul_N2_dtypetorch.qint32_contigTrue
# Input: N: 2, dtype: torch.qint32, contig: True
Forward Execution Time (us) : 290.555
...
```

Reviewed By: hl475

Differential Revision: D18574719

fbshipit-source-id: f7ff1d952031129adde51ebf002e4891bd484680
2019-11-18 12:21:26 -08:00
Mingzhe Li
c543034531 add cuda sync when ops running on gpu (#29936)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29936

This diff adds synchronization after op execution to ensure all CUDA streams have completed.
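The reason this matters can be sketched without CUDA: kernel launches are asynchronous, so a timer must drain the device before reading the clock or it measures only launch overhead. Here `synchronize` stands in for `torch.cuda.synchronize`; the harness below is an illustration, not the benchmark's actual code:

```python
import time

def timed_run(op, iters, synchronize=None):
    """Run `op` `iters` times and return elapsed wall time, optionally
    waiting on a device-sync callable before stopping the clock."""
    t0 = time.perf_counter()
    for _ in range(iters):
        op()
    if synchronize is not None:
        synchronize()  # wait for all enqueued async work to finish
    return time.perf_counter() - t0
```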

Test Plan:
```
buck run mode/opt //caffe2/benchmarks/operator_benchmark:benchmark_all_test -- --iterations 1
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M64_N64_K64_cpu
# Input: M: 64, N: 64, K: 64, device: cpu
Forward Execution Time (us) : 154.412

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M64_N64_K64_cuda
# Input: M: 64, N: 64, K: 64, device: cuda
Forward Execution Time (us) : 101.115
...

Reviewed By: hl475

Differential Revision: D18542732

fbshipit-source-id: b979d26a174f488e971074dc1e16b00e17179c80
2019-11-15 18:02:48 -08:00
Mingzhe Li
3f5dc95b57 fix device check in op bench (#29918)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29918

Some of the tests don't specify `device` in their input configs, so filtering by device won't work for them. This diff fixes that issue.
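
The fix can be sketched as treating a missing `device` key as "runs on the requested device" rather than dropping the config (function and config names here are hypothetical, not the actual benchmark code):

```python
def filter_configs_by_device(configs, device):
    # A config without a "device" key should still be selected,
    # instead of being silently dropped by the device filter.
    return [c for c in configs if c.get("device", device) == device]

configs = [
    {"M": 64, "N": 64, "device": "cpu"},
    {"M": 64, "N": 64, "device": "cuda"},
    {"N": 4, "C": 3},  # e.g. a qpool config that declares no device
]
selected = filter_configs_by_device(configs, "cpu")
```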

Test Plan:
```
buck run mode/opt //caffe2/benchmarks/operator_benchmark/pt:qpool_test -- --iterations 1 --device cpu
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: QAdaptiveAvgPool2dBenchmark
# Mode: Eager
# Name: QAdaptiveAvgPool2dBenchmark_N4_C3_input_size(224,224)_output_size(112,112)_contigTrue_dtypetorch.qint32
# Input: N: 4, C: 3, input_size: (224, 224), output_size: (112, 112), contig: True, dtype: torch.qint32
Forward Execution Time (us) : 2891.172

Reviewed By: hl475

Differential Revision: D18535766

fbshipit-source-id: 09d89cf23b3caab6c0bc3b8a9ae55cc439b98e0f
2019-11-15 13:55:38 -08:00
Mingzhe Li
60a33cac2b reduce input shapes of long tag in op bench (#29865)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29865

For some operators, the number of tests (forward + backward) could easily go above 100. Many of them could be redundant, so this diff tries to reduce the number of shapes.

Test Plan:
```
buck run //caffe2/benchmarks/operator_benchmark:benchmark_all_test -- --iterations 1
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M64_N64_K64_cpu
# Input: M: 64, N: 64, K: 64, device: cpu
Forward Execution Time (us) : 28418.926
...

Reviewed By: hl475

Differential Revision: D18520946

fbshipit-source-id: 1056d6d5a9c46bc2d508ff133039aefeb9d11c27
2019-11-14 20:19:09 -08:00
Mingzhe Li
90e3bbf3ab support all with tag_filter to run all shapes (#29864)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29864

This diff makes `all` a reserved keyword for tag_filter. When `all` is passed by the user, all the supported shapes are run.
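
The reserved-keyword behavior amounts to a short-circuit in the tag filter, roughly like this (a sketch with hypothetical names, not the benchmark suite's actual code):

```python
def filter_by_tag(configs, tag_filter):
    # "all" is reserved: bypass tag matching and run every config.
    if tag_filter == "all":
        return list(configs)
    return [c for c in configs if c["tag"] == tag_filter]

configs = [
    {"M": 8,  "N": 32, "K": 256, "tag": "short"},
    {"M": 64, "N": 64, "K": 128, "tag": "long"},
]
everything = filter_by_tag(configs, "all")
short_only = filter_by_tag(configs, "short")
```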

Test Plan:
```
buck run //caffe2/benchmarks/operator_benchmark/pt:add_test -- --iterations 1 --tag_filter all
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : all

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M8_N32_K256_cpu
# Input: M: 8, N: 32, K: 256, device: cpu
Forward Execution Time (us) : 6798.688

...

Reviewed By: hl475

Differential Revision: D18520249

fbshipit-source-id: 4d55af9f46f89b2fe8842e1a00dfa8e5acaf4fa2
2019-11-14 20:19:05 -08:00
Mingzhe Li
5da2bf945e add embeddingbag to benchmark_all_test (#29830)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29830

as title

Test Plan: na

Reviewed By: hl475

Differential Revision: D18506023

fbshipit-source-id: 15693894c0aa736ab3e818bc740099f0d629cb84
2019-11-14 20:13:57 -08:00
Mingzhe Li
747233e3bd minor edit to fix benchmark_all_test cuda error (#29829)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29829

This diff replaces the explicit CUDA if-check with `to(device)`, which is a much cleaner interface.
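
The before/after pattern looks roughly like this (a minimal illustration; the variable names are hypothetical, not taken from the diff):

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# Before: a branch per backend.
#   t = torch.rand(64, 64)
#   if device == "cuda":
#       t = t.cuda()
# After: Tensor.to is a no-op when the tensor is already on `device`,
# so one line covers both CPU and CUDA.
t = torch.rand(64, 64).to(device)
```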

Test Plan:
```
buck run mode/opt //caffe2/benchmarks/operator_benchmark:benchmark_all_test -- --iterations 1
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M64_N64_K64_cpu
# Input: M: 64, N: 64, K: 64, device: cpu
Forward Execution Time (us) : 129.548

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M64_N64_K64_cuda
# Input: M: 64, N: 64, K: 64, device: cuda
Forward Execution Time (us) : 48.313
...

Reviewed By: bddppq

Differential Revision: D18507568

fbshipit-source-id: 32534e76b2e27d59a631a4d76a0d93700e975ea4
2019-11-14 11:13:36 -08:00
Mingzhe Li
ad95099f45 fix benchmark_all_test when running on gpu (#29818)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29818

When some of the tests run on CUDA, there is a runtime error because of a missing data transfer from CPU to CUDA. This diff fixes that issue.

Test Plan:
```
buck run mode/opt //caffe2/benchmarks/operator_benchmark:benchmark_all_test -- --iterations 1
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M64_N64_K64_cpu
# Input: M: 64, N: 64, K: 64, device: cpu
Forward Execution Time (us) : 165.241

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M64_N64_K64_cuda
# Input: M: 64, N: 64, K: 64, device: cuda
Forward Execution Time (us) : 56.546
...

Reviewed By: hl475

Differential Revision: D18506269

fbshipit-source-id: 87942d7a52bd398600766c0f5363d791b74a6ca6
2019-11-14 10:10:48 -08:00
Mingzhe Li
b70d571233 add embeddingbag operator to the benchmark suite (#29784)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29784

Add the embeddingbag operator to the benchmark suite, covering different numbers of embeddings, dims, and input sizes.
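
A minimal invocation of the op under benchmark, using one of the shape combinations from the test names below (the concrete sizes are taken from those names; the single-bag offsets are an assumption for brevity):

```python
import torch
import torch.nn as nn

num_embeddings, dim, input_size = 80, 64, 16
emb = nn.EmbeddingBag(num_embeddings, dim, mode="sum", sparse=True)

indices = torch.randint(0, num_embeddings, (input_size,))
offsets = torch.tensor([0])  # a single bag starting at position 0
out = emb(indices, offsets)  # shape: (num_bags, dim)
```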

Test Plan:
```
buck run //caffe2/benchmarks/operator_benchmark/pt:embeddingbag_test -- --iterations 1
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: embeddingbag
# Mode: Eager
# Name: embeddingbag_embeddingbags2300_dim64_modesum_input_size16_offset0_sparseTrue
# Input: embeddingbags: 2300, dim: 64, mode: sum, input_size: 16, offset: 0, sparse: True
Forward Execution Time (us) : 624.838

# Benchmarking PyTorch: embeddingbag
# Mode: Eager
# Name: embeddingbag_embeddingbags2300_dim64_modesum_input_size64_offset0_sparseTrue
# Input: embeddingbags: 2300, dim: 64, mode: sum, input_size: 64, offset: 0, sparse: True
Forward Execution Time (us) : 636.744

# Benchmarking PyTorch: embeddingbag
# Mode: Eager
# Name: embeddingbag_embeddingbags80_dim64_modesum_input_size8_offset0_sparseTrue
# Input: embeddingbags: 80, dim: 64, mode: sum, input_size: 8, offset: 0, sparse: True
Backward Execution Time (us) : 2325.291

# Benchmarking PyTorch: embeddingbag
# Mode: Eager
# Name: embeddingbag_embeddingbags80_dim64_modesum_input_size16_offset0_sparseTrue
# Input: embeddingbags: 80, dim: 64, mode: sum, input_size: 16, offset: 0, sparse: True
Backward Execution Time (us) : 2528.658
...

Reviewed By: bddppq

Differential Revision: D18496340

fbshipit-source-id: 157dcff2ea4ec13416fe161382fcefd47ce4cc01
2019-11-14 10:05:47 -08:00
Mingzhe Li
e53b510773 add addmm op to the benchmark suite (#29783)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29783

Add the addmm operator, reusing the existing input shapes from the add operator.
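
`addmm` fuses a bias add with a matrix multiply; with the M/N/K axes from the add configs, the benchmarked call is essentially:

```python
import torch

M, N, K = 64, 64, 128  # same shape axes as the add operator's configs
inp = torch.rand(M, N)
mat1 = torch.rand(M, K)
mat2 = torch.rand(K, N)
out = torch.addmm(inp, mat1, mat2)  # computes inp + mat1 @ mat2
```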

Test Plan:
```
buck run //caffe2/benchmarks/operator_benchmark/pt:add_test -- --iterations 1
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: addmm
# Mode: Eager
# Name: addmm_M64_N64_K64_cpu
# Input: M: 64, N: 64, K: 64, device: cpu
Forward Execution Time (us) : 759.237

# Benchmarking PyTorch: addmm
# Mode: Eager
# Name: addmm_M64_N64_K128_cpu
# Input: M: 64, N: 64, K: 128, device: cpu
Forward Execution Time (us) : 922.764

# Benchmarking PyTorch: addmm
# Mode: Eager
# Name: addmm_M64_N64_K64_cpu_bwdall
# Input: M: 64, N: 64, K: 64, device: cpu
Backward Execution Time (us) : 4689.546

# Benchmarking PyTorch: addmm
# Mode: Eager
# Name: addmm_M64_N64_K64_cpu_bwd1
# Input: M: 64, N: 64, K: 64, device: cpu
Backward Execution Time (us) : 1700.093

# Benchmarking PyTorch: addmm
# Mode: Eager
# Name: addmm_M64_N64_K64_cpu_bwd2
# Input: M: 64, N: 64, K: 64, device: cpu
Backward Execution Time (us) : 2947.427

# Benchmarking PyTorch: addmm
# Mode: Eager
# Name: addmm_M64_N64_K64_cpu_bwd3
# Input: M: 64, N: 64, K: 64, device: cpu
Backward Execution Time (us) : 2518.043

# Benchmarking PyTorch: addmm
# Mode: Eager
# Name: addmm_M64_N64_K128_cpu_bwdall
# Input: M: 64, N: 64, K: 128, device: cpu
Backward Execution Time (us) : 5848.369

Reviewed By: bddppq

Differential Revision: D18496476

fbshipit-source-id: 4f1c116a2676a64106afa958e8c8a8e109f35a4a
2019-11-14 10:02:55 -08:00
Mingzhe Li
f3b15727c5 fix op benchmark OOM issue (#29794)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29794

Before this diff, all tests for an operator were created at once before benchmarking. Once an operator was benchmarked, the same process moved on to the next operator, and so on. The issue is that a single operator can have more than 100 tests, which can cause OOM issues. This diff avoids creating all of an operator's tests at once by using generators, which create and run the tests one by one.
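
The lazy-creation idea can be sketched with a plain Python generator (hypothetical names, not the benchmark framework's actual API):

```python
from itertools import product

def generate_tests(ms, ns):
    # Yield one benchmark config at a time: only the test currently being
    # run holds live data, instead of all combinations existing at once.
    for m, n in product(ms, ns):
        yield {"M": m, "N": n}

gen = generate_tests([64, 128, 256], [64, 128])
first = next(gen)  # nothing beyond this config has been materialized yet
```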

Test Plan:
```
buck run //caffe2/benchmarks/operator_benchmark:benchmark_all_quantized_test -- --iterations 1
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: relu
# Mode: Eager
# Name: relu_dims(3,4,5)_contigFalse_inplaceFalse_dtypetorch.quint8
# Input: dims: (3, 4, 5), contig: False, inplace: False, dtype: torch.quint8
Forward Execution Time (us) : 52.493

# Benchmarking PyTorch: relu
# Mode: Eager
# Name: relu_dims(3,4,5)_contigFalse_inplaceFalse_dtypetorch.qint8
# Input: dims: (3, 4, 5), contig: False, inplace: False, dtype: torch.qint8
Forward Execution Time (us) : 44.945
...

Reviewed By: hl475

Differential Revision: D18500103

fbshipit-source-id: 747c0ad0d302177da04da36e112c67f154115b6e
2019-11-13 22:22:58 -08:00
Zafar Takhirov
d2aa4c611f observer benchmarking
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29508

Test Plan: Imported from OSS

Differential Revision: D18415171

Pulled By: z-a-f

fbshipit-source-id: 5ebedee8c17448e36853e0c1bf778bb128975678
2019-11-12 23:28:10 -08:00
Zafar Takhirov
29e509ff1d Fix a missing comma in quantized benchmark
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29685

Test Plan: Imported from OSS

Differential Revision: D18463246

Pulled By: z-a-f

fbshipit-source-id: c21fd7892f3701afcc5faa8bc03f98b6f6550d0f
2019-11-12 16:50:46 -08:00
Zafar Takhirov
9bb0e2834d Fixing data type in quantized pool benchmarking
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29663

Test Plan: Imported from OSS

Differential Revision: D18456671

Pulled By: z-a-f

fbshipit-source-id: b36fc56e4f29937e458308f4c13f7a5e37665269
2019-11-12 13:22:53 -08:00
Zafar Takhirov
3b43cfde80 Benchmarking per channel quantization
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29627

Test Plan: Imported from OSS

Differential Revision: D18443929

Pulled By: z-a-f

fbshipit-source-id: a0345cc5e259b4ce98589252719b8885326d43a3
2019-11-12 11:33:42 -08:00
Zafar Takhirov
5db361bd32 Quantized interpolation benchmarks
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29509

Test Plan: Imported from OSS

Differential Revision: D18415367

Pulled By: z-a-f

fbshipit-source-id: 84d0aaa81b131b49762edde6ade27e61acb99a42
2019-11-12 11:23:03 -08:00
Zafar Takhirov
f95e8ea1be Benchmarking quantized methods (#29625)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29625

This PR also adds a template for benchmarking methods that require no input.

Test Plan: Imported from OSS

Differential Revision: D18443485

Pulled By: z-a-f

fbshipit-source-id: 6f25c3a7cd94e396c112b5f7c33307b71f78ecd3
2019-11-12 11:08:55 -08:00
Zafar Takhirov
3b452ca428 quantized topk benchmarking
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29505

Test Plan: Imported from OSS

Differential Revision: D18414851

Pulled By: z-a-f

fbshipit-source-id: 23999ef95c2f087066c4da36b2bf35516ebc0421
2019-11-12 00:33:47 -08:00
Zafar Takhirov
a0d4d5062b Quantized unary ops benchmarking (mostly template)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29503

Test Plan: Imported from OSS

Differential Revision: D18414589

Pulled By: z-a-f

fbshipit-source-id: ab5af490359b3e0a51642a46aef86f7be720deff
2019-11-11 23:48:36 -08:00
Zafar Takhirov
fb07098e2b Creating a base benchmarking class for activations.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29182

Test Plan: Imported from OSS

Differential Revision: D18319456

Pulled By: z-a-f

fbshipit-source-id: d2314bb30a584551b5f1c8610b36c4c10c27ac85
2019-11-11 18:24:44 -08:00
Mingzhe Li
af3468a1c7 change op bench input shape to reduce execution time (#29616)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29616

1. Reduce the predefined_min_time, which is the minimum time each test needs to run. Based on the test results, the average time across different epochs is pretty stable before exiting, so we can safely reduce the predefined time here.
2. Change the input shapes of several ops.
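
The role of the minimum run time can be sketched as a run-until-stable loop (a hypothetical sketch; the real harness's names and doubling policy may differ). Since the per-epoch averages in the output below are already stable at small epochs, lowering the threshold ends the loop sooner:

```python
import time

def run_until_min_time(op, min_time_s, start_iters=200):
    # Double the iteration count each epoch until the accumulated wall
    # time crosses min_time_s, then report the mean time in microseconds.
    iters = start_iters
    while True:
        t0 = time.time()
        for _ in range(iters):
            op()
        elapsed = time.time() - t0
        if elapsed >= min_time_s:
            return elapsed / iters * 1e6
        iters *= 2
```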

Test Plan:
```
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: add
200 256.044864655
400 165.850520134
800 163.579881191
1600 162.871927023
3200 160.3128016
# Mode: Eager
# Name: add_cpu_M64_K64_bwd1_N64
# Input: device: cpu, K: 64, M: 64, N: 64
Backward Execution Time (us) : 164.715

# Benchmarking PyTorch: add
200 170.650482178
400 168.895125389
800 169.867575169
1600 163.400024176
3200 168.658420444
# Mode: Eager
# Name: add_cpu_M64_K64_bwd2_N64
# Input: device: cpu, K: 64, M: 64, N: 64
Backward Execution Time (us) : 168.777

Reviewed By: hl475

Differential Revision: D18438540

fbshipit-source-id: 1fd27cf4bbc34e46e74393af912ee2fcb75c33b2
2019-11-11 16:58:27 -08:00
Mingzhe Li
7374dd0d52 remove SkipInputShape flag (#29615)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29615

Remove that flag as it's not needed any more.

Test Plan: na

Reviewed By: hl475

Differential Revision: D18440271

fbshipit-source-id: 41b0659c72ef746a1cc268174fd1e7dc2beb1ae2
2019-11-11 16:56:40 -08:00
Mingzhe Li
b5a38fa98e update op bench readme (#29596)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29596

as title

Test Plan: na

Reviewed By: hl475

Differential Revision: D18437811

fbshipit-source-id: 7996d1689d8a46849b62b2b3875c67cf8dc5861c
2019-11-11 15:33:29 -08:00
Mingzhe Li
00c224f0f2 move quantized tests from benchmark_all_test to benchmark_all_quantized_test (#29590)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29590

as title

Test Plan:
```
buck run //caffe2/benchmarks/operator_benchmark:benchmark_all_test  -- --iteration 1
Parsing buck files: finished in 1.0 sec
Creating action graph: finished in 43.0 sec
Building: finished in 16.0 sec (100%) 10053/10053 jobs, 1 updated
  Total time: 01:00.0 min
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M64_N64_K64_cpu
# Input: M: 64, N: 64, K: 64, device: cpu
Forward Execution Time (us) : 45419.667
...

buck run //caffe2/benchmarks/operator_benchmark:benchmark_all_quantized_test
Parsing buck files: finished in 1.0 sec
Building: finished in 6.0 sec (100%) 10053/10053 jobs, 1 updated
  Total time: 7.0 sec
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: QReLU
# Mode: Eager
# Name: QReLU_dims(1,)_permute_dimsFalse_inplaceFalse_dtypetorch.quint8
# Input: dims: (1,), permute_dims: False, inplace: False, dtype: torch.quint8
Forward Execution Time (us) : 137.685
...

Reviewed By: hl475

Differential Revision: D18436727

fbshipit-source-id: 317ec0e4bd2a6e33c9a60830f01ed805ae412449
2019-11-11 14:59:29 -08:00
Mingzhe Li
137eea5938 change module_name in chunk_test (#29589)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29589

as title

Test Plan:
```
buck run //caffe2/benchmarks/operator_benchmark/pt:chunk_test  -- --iteration 1
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: chunk
# Mode: Eager
# Name: chunk_M256_N512_chunks2_cpu
# Input: M: 256, N: 512, chunks: 2, device: cpu
Forward Execution Time (us) : 148.345

# Benchmarking PyTorch: chunk
# Mode: Eager
# Name: chunk_M512_N512_chunks2_cpu
# Input: M: 512, N: 512, chunks: 2, device: cpu
Forward Execution Time (us) : 125.239

Reviewed By: hl475

Differential Revision: D18436532

fbshipit-source-id: e7100f4605471e27703b2e2e863b971a93229854
2019-11-11 14:59:24 -08:00