pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Author	SHA1	Message	Date
Mingzhe Li	6e1c18303b	unify linear benchmark (#28897 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28897 as title Test Plan: ``` buck run mode/opt //caffe2/benchmarks/operator_benchmark/pt:linear_test # ---------------------------------------- # PyTorch/Caffe2 Operator Micro-benchmarks # ---------------------------------------- # Tag : short # Benchmarking PyTorch: linear # Mode: Eager # Name: linear_N4_IN256_OUT128_cpu # Input: N: 4, IN: 256, OUT: 128, device: cpu Forward Execution Time (us) : 39.275 Reviewed By: hl475 Differential Revision: D18228070 fbshipit-source-id: 9c209eb74e574c6ef85ebcd78b824ef7d5e65dde	2019-10-30 16:25:48 -07:00
Mingzhe Li	a7b235f968	unify gather benchmark (#28895 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28895 as title Test Plan: ``` buck run mode/opt //caffe2/benchmarks/operator_benchmark/pt:conv_test # ---------------------------------------- # PyTorch/Caffe2 Operator Micro-benchmarks # ---------------------------------------- # Tag : short # Benchmarking PyTorch: Conv1d # Mode: Eager # Name: Conv1d_in_c256_out_c256_kernel3_stride1_N1_L64_cpu # Input: in_c: 256, out_c: 256, kernel: 3, stride: 1, N: 1, L: 64, device: cpu Forward Execution Time (us) : 208.936 Reviewed By: hl475 Differential Revision: D18227757 fbshipit-source-id: 493dd81108848fe3d48fb5ad940eb6aef84b639c	2019-10-30 16:25:43 -07:00
Mingzhe Li	6e4147c72c	unify conv benchmark (#28894 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28894 as title Test Plan: ``` buck run mode/opt //caffe2/benchmarks/operator_benchmark/pt:conv_test # ---------------------------------------- # PyTorch/Caffe2 Operator Micro-benchmarks # ---------------------------------------- # Tag : short # Benchmarking PyTorch: Conv1d # Mode: Eager # Name: Conv1d_in_c256_out_c256_kernel3_stride1_N1_L64_cpu # Input: in_c: 256, out_c: 256, kernel: 3, stride: 1, N: 1, L: 64, device: cpu Forward Execution Time (us) : 208.936 Reviewed By: hl475 Differential Revision: D18227626 fbshipit-source-id: 1ae768f529aa888415840ca10197323407e47d39	2019-10-30 16:25:39 -07:00
Mingzhe Li	dbf8f535fc	unify chunk benchmark (#28892 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28892 as title Test Plan: ``` buck run mode/opt //caffe2/benchmarks/operator_benchmark/pt:chunk_test # ---------------------------------------- # PyTorch/Caffe2 Operator Micro-benchmarks # ---------------------------------------- # Tag : short # Benchmarking PyTorch: chunks # Mode: Eager # Name: chunks_M256_N512_chunks2_cpu # Input: M: 256, N: 512, chunks: 2, device: cpu Forward Execution Time (us) : 4.098 Reviewed By: hl475 Differential Revision: D18227499 fbshipit-source-id: 72268b7fe94a7d92d6e47f58f33902a33367c68b	2019-10-30 16:25:35 -07:00
Mingzhe Li	88b2bfd706	unify cat benchmark (#28893 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28893 as title Test Plan: ``` buck run mode/opt //caffe2/benchmarks/operator_benchmark/pt:cat_test # ---------------------------------------- # PyTorch/Caffe2 Operator Micro-benchmarks # ---------------------------------------- # Tag : short # Benchmarking PyTorch: cat # Mode: Eager # Name: cat_M256_N512_K1_dim0_cpu # Input: M: 256, N: 512, K: 1, dim: 0, device: cpu Forward Execution Time (us) : 78.607 Reviewed By: hl475 Differential Revision: D18227341 fbshipit-source-id: d383709a5aab600f99b37d07e4d4393645289101	2019-10-30 15:53:37 -07:00
Mingzhe Li	aa30b37d2e	unify batchnorm benchmark (#28889 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28889 as title Test Plan: ``` buck run mode/opt //caffe2/benchmarks/operator_benchmark/pt:batchnorm_test # ---------------------------------------- # PyTorch/Caffe2 Operator Micro-benchmarks # ---------------------------------------- # Tag : short # Benchmarking PyTorch: batchnorm # Mode: Eager # Name: batchnorm_M1_N256_K3136_cpu # Input: M: 1, N: 256, K: 3136, device: cpu Forward Execution Time (us) : 276.192 Reviewed By: hl475 Differential Revision: D18227180 fbshipit-source-id: d8abe56237bb84903315332a5ecdaa1dff613110	2019-10-30 15:53:33 -07:00
Mingzhe Li	740474838f	unify as_strided benchmark (#28890 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28890 as title Test Plan: ``` buck run mode/opt //caffe2/benchmarks/operator_benchmark/pt:as_strided_test # ---------------------------------------- # PyTorch/Caffe2 Operator Micro-benchmarks # ---------------------------------------- # Tag : short # Benchmarking PyTorch: as_strided # Mode: Eager # Name: as_strided_M256_N256_size(32,32)_stride(1,1)_storage_offset0_cpu # Input: M: 256, N: 256, size: (32, 32), stride: (1, 1), storage_offset: 0, device: cpu Forward Execution Time (us) : 2.792 ... Reviewed By: hl475 Differential Revision: D18227052 fbshipit-source-id: e17d9335ec89b47706a363bdb31451a01d4cbc5b	2019-10-30 15:53:29 -07:00
Mingzhe Li	db15c2ba20	unify add benchmark format (#28891 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28891 as title Test Plan: ``` buck run mode/opt //caffe2/benchmarks/operator_benchmark/pt:add_test # ---------------------------------------- # PyTorch/Caffe2 Operator Micro-benchmarks # ---------------------------------------- # Tag : short # Benchmarking PyTorch: add # Mode: Eager # Name: add_M64_N64_K64_cpu # Input: M: 64, N: 64, K: 64, device: cpu Forward Execution Time (us) : 125.279 ... Reviewed By: hl475 Differential Revision: D18226789 fbshipit-source-id: 0cc51c6691533b02f662d4b6108916455f3a5b95	2019-10-30 15:53:25 -07:00
Mingzhe Li	607defa8a9	print per block avg time when running on AI-PEP machines (#28838 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28838 as title Test Plan: ``` buck run mode/opt //caffe2/benchmarks/operator_benchmark/pt:softmax_test -- --ai_pep_format true Total time: 02:36.7 min # ---------------------------------------- # PyTorch/Caffe2 Operator Micro-benchmarks # ---------------------------------------- # Tag : short # Benchmarking PyTorch: Softmax /proc/self/fd/4/softmax_test.py:57: UserWarning: Implicit dimension choice for softmax has been deprecated. Change the call to include dim=X as an argument. """ PyTorchObserver {"type": "PyTorch_Softmax_N4_C3_H128_W128", "metric": "latency", "unit": "ms", "value": "4.83197245048359"} PyTorchObserver {"type": "PyTorch_Softmax_N4_C3_H128_W128", "metric": "latency", "unit": "ms", "value": "4.839232977246866"} PyTorchObserver {"type": "PyTorch_Softmax_N4_C3_H128_W128", "metric": "latency", "unit": "ms", "value": "4.7970924858236685"} PyTorchObserver {"type": "PyTorch_Softmax_N4_C3_H128_W128", "metric": "latency", "unit": "ms", "value": "4.708389271399938"} # Benchmarking PyTorch: Softmax ... Reviewed By: hl475 Differential Revision: D18202504 fbshipit-source-id: 4a332763432b3b5886f241bb2ce49d4df481a6f3	2019-10-29 12:08:33 -07:00
Mingzhe Li	0a68e8bab0	fix op bench runtime error when use_jit is enabled (#28837 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28837 The JIT code used in op bench is not compatible with the latest JIT code path. This diff aims to resolve that issue. Test Plan: ```buck run mode/opt //caffe2/benchmarks/operator_benchmark/pt:add_test -- --use_jit Building: finished in 02:29.8 min (100%) 7055/7055 jobs, 1 updated Total time: 02:30.3 min # ---------------------------------------- # PyTorch/Caffe2 Operator Micro-benchmarks # ---------------------------------------- # Tag : short # Benchmarking PyTorch: add # Mode: JIT # Name: add_M64_N64_K64_cpu # Input: M: 64, N: 64, K: 64, device: cpu Forward Execution Time (us) : 118.052 Reviewed By: hl475 Differential Revision: D18197057 fbshipit-source-id: 92edae8a48abc4115a558a91ba46cc9c3edb2eb8	2019-10-29 12:08:28 -07:00
Mingzhe Li	4703854321	change softmax input shape (#28836 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28836 as title Test Plan: ``` buck run mode/opt //caffe2/benchmarks/operator_benchmark/pt:softmax_test Invalidating internal cached state: Buck configuration options changed between invocations. This may cause slower builds. Changed value project.buck_out='buck-out/opt' (was 'buck-out/dev') ... and 56 more. See logs for all changes Parsing buck files: finished in 6.2 sec Creating action graph: finished in 8.8 sec Building: finished in 05:42.6 min (100%) 28336/28336 jobs, 23707 updated Total time: 05:57.7 min # ---------------------------------------- # PyTorch/Caffe2 Operator Micro-benchmarks # ---------------------------------------- # Tag : short # Benchmarking PyTorch: Softmax /proc/self/fd/4/softmax_test.py:57: UserWarning: Implicit dimension choice for softmax has been deprecated. Change the call to include dim=X as an argument. """ # Mode: Eager # Name: Softmax_N4_C3_H256_W256 # Input: N: 4, C: 3, H: 256, W: 256 Forward Execution Time (us) : 18422.487 Reviewed By: hl475 Differential Revision: D18202335 fbshipit-source-id: 0bb376cb465d998a49196e148d48d436126ae334	2019-10-29 12:05:25 -07:00
Mingzhe Li	9f44a04613	separate PT and C2 to reduce build time (#28731 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28731 as title Test Plan: ``` Before: buck run mode/opt caffe2/benchmarks/operator_benchmark:benchmark_all_test -- --operator sigmoid Invalidating internal cached state: Buck configuration options changed between invocations. This may cause slower builds. Changed value project.buck_out='buck-out/opt' (was 'buck-out/dev') ... and 69 more. See logs for all changes Parsing buck files: finished in 7.2 sec Creating action graph: finished in 10.0 sec Building: finished in 06:38.4 min (100%) 29890/29890 jobs, 29890 updated Total time: 06:55.7 min # ---------------------------------------- # PyTorch/Caffe2 Operator Micro-benchmarks # ---------------------------------------- # Tag : short # Benchmarking PyTorch: sigmoid With this diff buck run mode/opt caffe2/benchmarks/operator_benchmark:benchmark_all_test -- --operator sigmoid Parsing buck files: finished in 6.4 sec Creating action graph: finished in 9.8 sec Building: finished in 06:35.9 min (100%) 29892/29892 jobs, 29892 updated Total time: 06:52.1 min Reviewed By: hl475 Differential Revision: D18152071 fbshipit-source-id: 80c29570581bbd2f0e78e2df32734c17a2b036ee	2019-10-28 11:10:47 -07:00
Mingzhe Li	e886450863	report p50 time instead of avg (#28722 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28722 as title Test Plan: ```buck run mode/opt caffe2/benchmarks/operator_benchmark:benchmark_all_test -- --operator sigmoid # ---------------------------------------- # PyTorch/Caffe2 Operator Micro-benchmarks # ---------------------------------------- # Tag : short # Benchmarking PyTorch: sigmoid iters: 200, 462.6029555220157 iters: 400, 441.04792759753764 iters: 800, 441.81562116136774 iters: 1600, 440.79964311094955 iters: 3200, 436.3108493271284 iters: 6400, 440.87966314691585 iters: 12800, 452.29464218209614 # Mode: Eager # Name: sigmoid_M512_N512 # Input: M: 512, N: 512 Forward Execution Time (us) : 441.048 Reviewed By: hl475 Differential Revision: D18149525 fbshipit-source-id: 5fe70a35b790ee7ad3ff57c0cb0b1c29cb609b83	2019-10-25 17:22:27 -07:00
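The difference between reporting an average and a p50 can be checked against the timings printed in the test plan above. This is a minimal sketch, assuming the reported figure is the median of the listed per-iteration-count timings; the helper name `summarize` is hypothetical and not part of the benchmark suite.

```python
import statistics

def summarize(times_us):
    """Hypothetical helper: return (mean, p50) for timings in microseconds."""
    return statistics.mean(times_us), statistics.median(times_us)

# Timings copied from the "iters: ..." lines in the test plan above.
runs = [
    462.6029555220157,
    441.04792759753764,
    441.81562116136774,
    440.79964311094955,
    436.3108493271284,
    440.87966314691585,
    452.29464218209614,
]

mean_us, p50_us = summarize(runs)
print(f"mean: {mean_us:.3f} us")
print(f"p50:  {p50_us:.3f} us")
```

The median of these seven values rounds to 441.048, the same figure the log reports as the forward execution time, while the mean is pulled upward by the two slower runs.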
なるみ	d83389d327	Ignore F401 in all __init__.py without putting noqa (#25823 ) Summary: By adding `per-file-ignores = __init__.py: F401` into `.flake8` with `flake8>=3.7`, we can ignore F401 in all `__init__.py` without putting `# noqa: F401` line by line. http://flake8.pycqa.org/en/latest/user/options.html?highlight=per-file-ignores#cmdoption-flake8-per-file-ignores Pull Request resolved: https://github.com/pytorch/pytorch/pull/25823 Differential Revision: D17252182 Pulled By: soumith fbshipit-source-id: 87b174075b79e4078953a7521bd1a8f82405646b	2019-10-23 15:28:13 -07:00
Sebastian Messmer	243298668c	Remove confusing torch::jit::RegisterOperators for custom ops (#28229 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28229 We have `torch::RegisterOperators` for custom ops. `torch::jit::RegisterOperators` had a dual state of being able to register custom ops if called one way and being able to register pure JIT ops if called another way. This is confusing because you end up in different operator libraries depending on which API exactly you're using. This PR removes the ability for torch::jit::RegisterOperators to register custom ops and forces people to use the new torch::RegisterOperators. This was already deprecated before but we now remove it. ghstack-source-id: 92137305 Test Plan: unit tests Differential Revision: D17981895 fbshipit-source-id: 0af267dfdc3c6a2736740091cf841bac40deff40	2019-10-18 10:46:31 -07:00
Mingzhe Li	5c2bf8abe5	change linear benchmark shapes (#28228 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28228 as title Test Plan: ``` buck run //caffe2/benchmarks/operator_benchmark/pt:linear_test # ---------------------------------------- # PyTorch/Caffe2 Operator Micro-benchmarks # ---------------------------------------- # Tag : short # Benchmarking PyTorch: linear # Mode: Eager # Name: linear_N32_IN1024_OUT256 # Input: N: 32, IN: 1024, OUT: 256 Forward Execution Time (us) : 1501.918 # Benchmarking PyTorch: linear # Mode: Eager # Name: linear_N64_IN256_OUT100 # Input: N: 64, IN: 256, OUT: 100 Forward Execution Time (us) : 1175.672 Reviewed By: hl475 Differential Revision: D17980463 fbshipit-source-id: c8aaf6fa4d847037accb1e5b9ee04900690fd6ae	2019-10-17 11:09:10 -07:00
Mingzhe Li	cbcb70f84c	print last 50 runs when using ai_pep_format (#28128 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28128 as title Test Plan: ``` # ---------------------------------------- # PyTorch/Caffe2 Operator Micro-benchmarks # ---------------------------------------- # Tag : short # Benchmarking PyTorch: add PyTorchObserver {"type": "add_M64_N64_K64_cpu", "metric": "latency", "unit": "ms", "value": "29.169559478759766"} PyTorchObserver {"type": "add_M64_N64_K64_cpu", "metric": "latency", "unit": "ms", "value": "29.206514358520508"} PyTorchObserver {"type": "add_M64_N64_K64_cpu", "metric": "latency", "unit": "ms", "value": "29.4950008392334"} PyTorchObserver {"type": "add_M64_N64_K64_cpu", "metric": "latency", "unit": "ms", "value": "29.172897338867188"} PyTorchObserver {"type": "add_M64_N64_K64_cpu", "metric": "latency", "unit": "ms", "value": "29.27255630493164"} PyTorchObserver {"type": "add_M64_N64_K64_cpu", "metric": "latency", "unit": "ms", "value": "29.549837112426758"} PyTorchObserver {"type": "add_M64_N64_K64_cpu", "metric": "latency", "unit": "ms", "value": "29.63113784790039"} ... Reviewed By: hl475 Differential Revision: D17957611 fbshipit-source-id: 4e70ba2070b97fbbca0d6d4295abbead2ac356d4	2019-10-16 15:22:23 -07:00
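Each `PyTorchObserver` line in the test plan above pairs a fixed prefix with a JSON object. Below is a minimal sketch of reading such lines back, assuming every record follows exactly the shape shown in the log; the helper name `parse_observer_lines` is hypothetical and not part of the benchmark suite.

```python
import json

PREFIX = "PyTorchObserver "

def parse_observer_lines(lines):
    """Hypothetical helper: collect latency values keyed by benchmark type."""
    results = {}
    for line in lines:
        line = line.strip()
        if not line.startswith(PREFIX):
            continue  # skip banner and comment lines
        record = json.loads(line[len(PREFIX):])
        # The log stores the value as a string, so convert it explicitly.
        results.setdefault(record["type"], []).append(float(record["value"]))
    return results

# Two records copied from the test plan above, plus one banner line.
log = [
    "# Benchmarking PyTorch: add",
    'PyTorchObserver {"type": "add_M64_N64_K64_cpu", "metric": "latency", "unit": "ms", "value": "29.169559478759766"}',
    'PyTorchObserver {"type": "add_M64_N64_K64_cpu", "metric": "latency", "unit": "ms", "value": "29.206514358520508"}',
]

latencies = parse_observer_lines(log)
print(latencies)
```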
Mingzhe Li	182abb2580	accept -1 in iterations and warmup iterations (#28014 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28014 as title Test Plan: ``` buck run caffe2/benchmarks/operator_benchmark/pt:add_test -- --iterations -1 --warmup_iterations -1 # ---------------------------------------- # PyTorch/Caffe2 Operator Micro-benchmarks # ---------------------------------------- # Tag : short # Benchmarking PyTorch: add # Mode: Eager # Name: add_M64_N64_K64_cpu # Input: M: 64, N: 64, K: 64, device: cpu Forward Execution Time (us) : 30827.046 ... Reviewed By: hl475 Differential Revision: D17932071 fbshipit-source-id: e4d9d256a0a4958110f61af13afdde70fc0f746c	2019-10-15 11:55:37 -07:00
Mingzhe Li	382917bbd1	report per iteration execution time (#27923 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27923 As title Test Plan: ``` buck run caffe2/benchmarks/operator_benchmark/pt:add_test -- --iterations 3 --ai_pep_format true # ---------------------------------------- # PyTorch/Caffe2 Operator Micro-benchmarks # ---------------------------------------- # Tag : short # Benchmarking PyTorch: add PyTorchObserver {"type": "add_M64_N64_K64_cpu", "metric": "latency", "unit": "us", "value": "0.027768373489379883"} PyTorchObserver {"type": "add_M64_N64_K64_cpu", "metric": "latency", "unit": "us", "value": "0.02661752700805664"} PyTorchObserver {"type": "add_M64_N64_K64_cpu", "metric": "latency", "unit": "us", "value": "0.026746749877929688"} ... Reviewed By: hl475 Differential Revision: D17911718 fbshipit-source-id: 6fe28f2ab9ce1e0feabb5b822f04ff32dac977a9	2019-10-14 15:44:42 -07:00
Mingzhe Li	38a3eabd3e	remove cuda from add_test Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27698 Test Plan: ``` buck run caffe2/benchmarks/operator_benchmark/pt:add_test -- --iterations 3 # ---------------------------------------- # PyTorch/Caffe2 Operator Micro-benchmarks # ---------------------------------------- # Tag : short # Benchmarking PyTorch: add # Mode: Eager # Name: add_M64_N64_K64_cpu # Input: M: 64, N: 64, K: 64, device: cpu Forward Execution Time (us) : 29691.940 # Benchmarking PyTorch: add # Mode: Eager # Name: add_M64_N64_K128_cpu # Input: M: 64, N: 64, K: 128, device: cpu Forward Execution Time (us) : 60820.813 Reviewed By: hl475 Differential Revision: D17855731 fbshipit-source-id: c64c530f4dbcb5b4132a88894b24e5658aa49d66	2019-10-10 08:32:04 -07:00
Mingzhe Li	aeae5d6020	add dim to the cat benchmark (#27620 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27620 as title Test Plan: ``` buck run caffe2/benchmarks/operator_benchmark/pt:cat_test -- --iterations 3 # ---------------------------------------- # PyTorch/Caffe2 Operator Micro-benchmarks # ---------------------------------------- # Tag : short # Benchmarking PyTorch: cat # Mode: Eager # Name: cat_M256_N512_K1_dim0 # Input: M: 256, N: 512, K: 1, dim: 0 Forward Execution Time (us) : 775.348 # Benchmarking PyTorch: cat # Mode: Eager # Name: cat_M256_N512_K1_dim1 # Input: M: 256, N: 512, K: 1, dim: 1 Forward Execution Time (us) : 3612.599 # Benchmarking PyTorch: cat # Mode: Eager # Name: cat_M256_N512_K1_dim2 # Input: M: 256, N: 512, K: 1, dim: 2 Forward Execution Time (us) : 91416.224 ... ``` Reviewed By: hl475 Differential Revision: D17835348 fbshipit-source-id: 94e02e328c4ea61b2e210d860ccdd377ef2b97f8	2019-10-09 16:03:07 -07:00
Mingzhe Li	abcd221f19	add as_strided operator to the benchmark (#27632 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27632 Support as_strided operator in the benchmark suite. Test Plan: buck run caffe2/benchmarks/operator_benchmark/pt:as_strided_test -- --iterations 3 ``` # ---------------------------------------- # PyTorch/Caffe2 Operator Micro-benchmarks # ---------------------------------------- # Tag : short # Benchmarking PyTorch: as_strided # Mode: Eager # Name: as_strided_M256_N256_size(32,32)_stride(1,1)_storage_offset0 # Input: M: 256, N: 256, size: (32, 32), stride: (1, 1), storage_offset: 0 Forward Execution Time (us) : 92.008 # Benchmarking PyTorch: as_strided # Mode: Eager # Name: as_strided_M256_N256_size(32,32)_stride(1,1)_storage_offset1 # Input: M: 256, N: 256, size: (32, 32), stride: (1, 1), storage_offset: 1 Forward Execution Time (us) : 91.029 ... Reviewed By: hl475 Differential Revision: D17840076 fbshipit-source-id: 6585feb51ebfaca40032ffa0a61d5f76c25a2599	2019-10-09 15:42:05 -07:00
Dylan Bespalko	7c472ec597	Vectorized complex unary and binary op support. (#26500 ) Summary: Added Complex support with AVX to unary ops and binary ops. I need to add nan propagation to minimum() and maximum() in the future. In-tree changes to pytorch to support complex numbers are being submitted here. Out-of-tree support for complex numbers is here: pytorch-cpu-strided-complex extension Preliminary Benchmarks are here. I tried rrii and riri and found that riri is better in most situations. Divide is very slow because you can't reduce 1/(x+y). Sqrt is also very slow. Reciprocal could be sped up after I add conj(). Everything else is typically within 20% of the real number performance. Questions: Why does macOS not support MKL? #if AT_MKL_ENABLED() && !defined(__APPLE__) in vml.h. MKL does support some complex operations like Abs, so I was curious about trying it. Is MKL just calling AVX? Pull Request resolved: https://github.com/pytorch/pytorch/pull/26500 Differential Revision: D17835431 Pulled By: ezyang fbshipit-source-id: 6746209168fbeb567af340c22bf34af28286bd54	2019-10-09 12:49:21 -07:00
Mingzhe Li	ab15584dce	add random sample function to generate list of inputs (#23174 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23174 This diff introduces a new function that randomly generates inputs based on the weights. Test Plan: buck run mode/dev-nosan //caffe2/benchmarks/operator_benchmark/common/tests:random_sample_test -- --iterations 3 ``` # ---------------------------------------- # PyTorch/Caffe2 Operator Micro-benchmarks # ---------------------------------------- # Tag : short # Benchmarking PyTorch: add # Mode: Eager # Name: add_M1_N5_K7 # Input: M: 1, N: 5, K: 7 Forward Execution Time (us) : 82.923 # Benchmarking PyTorch: add # Mode: Eager # Name: add_M1_N6_K8 # Input: M: 1, N: 6, K: 8 Forward Execution Time (us) : 79.535 # Benchmarking PyTorch: add # Mode: Eager # Name: add_M2_N6_K7 # Input: M: 2, N: 6, K: 7 Forward Execution Time (us) : 83.471 # Benchmarking PyTorch: add # Mode: Eager # Name: add_M1_N4_K7 # Input: M: 1, N: 4, K: 7 Forward Execution Time (us) : 84.410 # Benchmarking PyTorch: add # Mode: Eager # Name: add_M1_N6_K7 # Input: M: 1, N: 6, K: 7 Forward Execution Time (us) : 82.399 ``` Reviewed By: zheng-xq Differential Revision: D15791723 fbshipit-source-id: 730e34d455e962ddf594a491d7c81c3f99fafa86	2019-10-09 11:24:14 -07:00
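The commit above does not show the sampling function itself, so the following is only a sketch of the general idea of weighted random input generation. The helper name `sample_configs`, its parameters, and the candidate values and weights are all hypothetical; they were chosen to resemble the M/N/K values in the test plan and are not taken from the benchmark code.

```python
import random

def sample_configs(attr_values, attr_weights, total_samples, seed=0):
    """Hypothetical helper: draw each attribute independently, by weight."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    samples = []
    for _ in range(total_samples):
        samples.append({
            name: rng.choices(values, weights=attr_weights[name], k=1)[0]
            for name, values in attr_values.items()
        })
    return samples

# Made-up candidate values and weights, for illustration only.
attr_values = {"M": [1, 2], "N": [4, 5, 6], "K": [7, 8]}
attr_weights = {"M": [0.8, 0.2], "N": [0.2, 0.3, 0.5], "K": [0.7, 0.3]}

for config in sample_configs(attr_values, attr_weights, total_samples=5):
    print(config)
```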
Mingzhe Li	c1ed0150c5	canonical example of torch.add benchmark (#23402 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23402 This diff tries to make torch.add a canonical example for op benchmark. Once it lands, we will also modify all other op benchmarks to be uniform with this example. With that, when people are adding new ops, they can copy paste any existing code. Test Plan: buck run mode/dev-nosan caffe2/benchmarks/operator_benchmark/pt:add_test -- --iterations 3 ``` # ---------------------------------------- # PyTorch/Caffe2 Operator Micro-benchmarks # ---------------------------------------- # Tag : short # Benchmarking PyTorch: add # Mode: Eager # Name: add_M8_N16_K32_devicecpu # Input: M: 8, N: 16, K: 32, device: cpu Forward Execution Time (us) : 146.586 # Benchmarking PyTorch: add # Mode: Eager # Name: add_M8_N16_K32_devicecuda # Input: M: 8, N: 16, K: 32, device: cuda Forward Execution Time (us) : 92.151 # Benchmarking PyTorch: add # Mode: Eager # Name: add_M16_N16_K64_devicecpu # Input: M: 16, N: 16, K: 64, device: cpu Forward Execution Time (us) : 428.421 # Benchmarking PyTorch: add # Mode: Eager # Name: add_M16_N16_K64_devicecuda # Input: M: 16, N: 16, K: 64, device: cuda Forward Execution Time (us) : 89.811 # Benchmarking PyTorch: add # Mode: Eager # Name: add_M64_N64_K128_devicecpu # Input: M: 64, N: 64, K: 128, device: cpu Forward Execution Time (us) : 11857.012 # Benchmarking PyTorch: add # Mode: Eager # Name: add_M64_N64_K128_devicecuda # Input: M: 64, N: 64, K: 128, device: cuda Forward Execution Time (us) : 93.918 # Benchmarking PyTorch: add # Mode: Eager # Name: add_M8_N16_K32_devicecpu_bwdall # Input: M: 8, N: 16, K: 32, device: cpu Backward Execution Time (us) : 990.125 # Benchmarking PyTorch: add # Mode: Eager # Name: add_M8_N16_K32_devicecpu_bwd1 # Input: M: 8, N: 16, K: 32, device: cpu Backward Execution Time (us) : 781.217 # Benchmarking PyTorch: add # Mode: Eager # Name: add_M8_N16_K32_devicecpu_bwd2 # Input: M: 8, N: 16, K: 32, device: cpu Backward Execution Time (us) : 777.307 ``` Reviewed By: zheng-xq Differential Revision: D16501974 fbshipit-source-id: f1eec010eabf11ce4fcf6cfe6f85cd5241a7022d	2019-10-09 11:24:10 -07:00
Mingzhe Li	a750a1a2b4	modify config_list to support cross product of attributes (#23399 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23399 This diff enables config_list function to support cross product of inputs besides the shapes. The following is an example using the updated interface. The same input shapes can run on different devices and dtypes. ``` add_short_configs = op_bench.config_list( attr_names=['M', 'N', 'K'], attrs=[ [8, 16, 32], [16, 16, 64], [64, 64, 128], ], cross_product_configs={ 'device': ['cpu', 'cuda'], 'dtype': [torch.float, torch.float64], }, tags=['short'], ) ``` Test Plan: buck run mode/dev-nosan caffe2/benchmarks/operator_benchmark/common/tests:pt_configs_list_test -- --iterations 3 ``` # ---------------------------------------- # PyTorch/Caffe2 Operator Micro-benchmarks # ---------------------------------------- # Tag : short # Benchmarking PyTorch: add # Mode: Eager # Name: add_M8_N16_K32_devicecpu_dtypetorch.float32 # Input: M: 8, N: 16, K: 32, device: cpu, dtype: torch.float32 Forward Execution Time (us) : 164.489 # Benchmarking PyTorch: add # Mode: Eager # Name: add_M8_N16_K32_devicecpu_dtypetorch.float64 # Input: M: 8, N: 16, K: 32, device: cpu, dtype: torch.float64 Forward Execution Time (us) : 158.677 # Benchmarking PyTorch: add # Mode: Eager # Name: add_M8_N16_K32_devicecuda_dtypetorch.float32 # Input: M: 8, N: 16, K: 32, device: cuda, dtype: torch.float32 Forward Execution Time (us) : 103.866 # Benchmarking PyTorch: add # Mode: Eager # Name: add_M8_N16_K32_devicecuda_dtypetorch.float64 # Input: M: 8, N: 16, K: 32, device: cuda, dtype: torch.float64 Forward Execution Time (us) : 106.027 # Benchmarking PyTorch: add # Mode: Eager # Name: add_M16_N16_K64_devicecpu_dtypetorch.float32 # Input: M: 16, N: 16, K: 64, device: cpu, dtype: torch.float32 Forward Execution Time (us) : 451.016 ... ``` buck test caffe2/benchmarks/operator_benchmark/fb:operator_benchmark_test ``` Building: finished in 2.4 sec (100%) 6882/6882 jobs, 2 updated Total time: 2.8 sec Trace available for this run at /tmp/testpilot.20190730-160519.3952794.log TestPilot test runner for Facebook. See https://fburl.com/testpilot for details. Testpilot build revision 203f0104fbfcec4128be2c482c64736309ae39c9 fbpkg a4b2a9897a0c45069bd07d83e5981052 at Sun Jul 28 01:22:13 2019 by twsvcscm from /data/fbprojects/packages/testinfra.testpilot/667/t.par Discovering tests Running 3 tests Started new test run: https://our.intern.facebook.com/intern/testinfra/testrun/5910974514382830 ✓ caffe2/benchmarks/operator_benchmark/fb:operator_benchmark_test - test_config_list_impl (operator_benchmark_test.TestConsumeOp) 0.011 1/3 (passed) ✓ caffe2/benchmarks/operator_benchmark/fb:operator_benchmark_test - test_list_of_ops (operator_benchmark_test.TestConsumeOp) 19.920 2/3 (passed) ✓ caffe2/benchmarks/operator_benchmark/fb:operator_benchmark_test - test_single_op (operator_benchmark_test.TestConsumeOp) 23.418 3/3 (passed) ✓ caffe2/benchmarks/operator_benchmark/fb:operator_benchmark_test - main 0.000 (passed) Finished test run: https://our.intern.facebook.com/intern/testinfra/testrun/5910974514382830 Summary (total time 29.90s): PASS: 4 FAIL: 0 SKIP: 0 FATAL: 0 TIMEOUT: 0 OMIT: 0 ``` Reviewed By: zheng-xq Differential Revision: D16501272 fbshipit-source-id: d92b5cf50b0f37d5b3a79d423acb521366b4e8db	2019-10-09 11:24:06 -07:00
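The cross product described in the commit above can be illustrated with `itertools.product`. This is a sketch of the idea only, not the `op_bench.config_list` implementation: the helper name `expand_configs` is hypothetical, and plain strings stand in for the torch dtypes so the example runs without PyTorch.

```python
import itertools

def expand_configs(attr_names, attrs, cross_product_configs):
    """Hypothetical helper: pair every attrs row with every combination
    of the cross-product values."""
    keys = list(cross_product_configs)
    combos = list(itertools.product(*(cross_product_configs[k] for k in keys)))
    configs = []
    for row in attrs:
        base = dict(zip(attr_names, row))
        for combo in combos:
            configs.append({**base, **dict(zip(keys, combo))})
    return configs

configs = expand_configs(
    attr_names=["M", "N", "K"],
    attrs=[[8, 16, 32], [16, 16, 64], [64, 64, 128]],
    cross_product_configs={
        "device": ["cpu", "cuda"],
        # Strings stand in for torch.float and torch.float64 (assumption).
        "dtype": ["torch.float32", "torch.float64"],
    },
)

print(len(configs))
for config in configs[:5]:
    print(config)
```

Three shapes combined with two devices and two dtypes give twelve configurations. With this iteration order the first five entries follow the same sequence as the benchmark names in the test plan (cpu/float32, cpu/float64, cuda/float32, cuda/float64, then the next shape).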
Mingzhe Li	31a6ff46c1	change input shape to reduce variation (#27548 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27548 as title Test Plan: i_dont_want_it Reviewed By: hl475 Differential Revision: D17811295 fbshipit-source-id: 3be957f6f3eaa464ebf4f5bd7c07d096ae4eae8c	2019-10-08 11:45:06 -07:00
Daya Khudia	bf7ebc5a53	Set number of threads for operator_benchmarks (#27010 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27010 Setting OMP_NUM_THREADS programmatically doesn't do the right thing because initialization is already done. Fixing this by calling torch.set_num_threads explicitly. Passing --omp_num_threads works as expected now. In dir benchmarks/operator_benchmark/ python -m pt.qconv_test --tag_filter resnext101_32x4 --wipe_cache --test_name QConv2d_N1_IC64_OC128_H56_W56_G1_kernel1_stride1_pad0 --omp_num_threads 1 ``` # ---------------------------------------- # PyTorch/Caffe2 Operator Micro-benchmarks # ---------------------------------------- # Tag : None # Benchmarking PyTorch: QConv2d # Mode: Eager # Name: QConv2d_N1_IC64_OC128_H56_W56_G1_kernel1_stride1_pad0 # Input: N: 1, IC: 64, OC: 128, H: 56, W: 56, G: 1, kernel: 1, stride: 1, pad: 0 Forward Execution Time (us) : 509.965 # Benchmarking PyTorch: QConv2d # Mode: Eager # Name: QConv2d_N1_IC64_OC128_H56_W56_G1_kernel1_stride1_pad0 # Input: N: 1, IC: 64, OC: 128, H: 56, W: 56, G: 1, kernel: 1, stride: 1, pad: 0 Forward Execution Time (us) : 576.007 ``` python -m pt.qconv_test --tag_filter resnext101_32x4 --wipe_cache --test_name QConv2d_N1_IC64_OC128_H56_W56_G1_kernel1_stride1_pad0 --omp_num_threads 4 ``` # ---------------------------------------- # PyTorch/Caffe2 Operator Micro-benchmarks # ---------------------------------------- # Tag : None # Benchmarking PyTorch: QConv2d # Mode: Eager # Name: QConv2d_N1_IC64_OC128_H56_W56_G1_kernel1_stride1_pad0 # Input: N: 1, IC: 64, OC: 128, H: 56, W: 56, G: 1, kernel: 1, stride: 1, pad: 0 Forward Execution Time (us) : 195.002 # Benchmarking PyTorch: QConv2d # Mode: Eager # Name: QConv2d_N1_IC64_OC128_H56_W56_G1_kernel1_stride1_pad0 # Input: N: 1, IC: 64, OC: 128, H: 56, W: 56, G: 1, kernel: 1, stride: 1, pad: 0 Forward Execution Time (us) : 189.788 ``` ghstack-source-id: 91050434 Test Plan: See summary Differential Revision: D17647391 fbshipit-source-id: e00de1151902291ed94fd34446995ea1f0199d14	2019-09-30 17:04:51 -07:00
Daya Khudia	fc926d9242	fix operator level benchmark to have NHWC layout (#26577 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26577 Have the NHWC layout expected by qconv kernel. for resnext101-32x4d shapes Before : ``` Forward Execution Time (us) : 4787.046 Forward Execution Time (us) : 1320.065 Forward Execution Time (us) : 2611.631 Forward Execution Time (us) : 2562.389 Forward Execution Time (us) : 1072.342 Forward Execution Time (us) : 2330.658 Forward Execution Time (us) : 1894.549 Forward Execution Time (us) : 3446.532 Forward Execution Time (us) : 2381.251 Forward Execution Time (us) : 1157.339 Forward Execution Time (us) : 2712.621 Forward Execution Time (us) : 3789.905 Forward Execution Time (us) : 4057.886 Forward Execution Time (us) : 6104.570 Forward Execution Time (us) : 11328.552 Forward Execution Time (us) : 3707.519 Forward Execution Time (us) : 4681.272 Forward Execution Time (us) : 2459.266 Forward Execution Time (us) : 849.564 Forward Execution Time (us) : 3000.764 Forward Execution Time (us) : 3019.704 Forward Execution Time (us) : 5216.046 Forward Execution Time (us) : 3403.549 Forward Execution Time (us) : 1291.878 Forward Execution Time (us) : 2057.147 ``` After ``` Forward Execution Time (us) : 4398.649 Forward Execution Time (us) : 993.619 Forward Execution Time (us) : 2252.265 Forward Execution Time (us) : 2230.500 Forward Execution Time (us) : 977.389 Forward Execution Time (us) : 2233.356 Forward Execution Time (us) : 1223.085 Forward Execution Time (us) : 2758.765 Forward Execution Time (us) : 2208.028 Forward Execution Time (us) : 821.816 Forward Execution Time (us) : 2396.748 Forward Execution Time (us) : 2505.803 Forward Execution Time (us) : 2771.251 Forward Execution Time (us) : 4816.474 Forward Execution Time (us) : 10065.299 Forward Execution Time (us) : 2424.949 Forward Execution Time (us) : 3854.800 Forward Execution Time (us) : 2297.426 Forward Execution Time (us) : 682.403 Forward Execution Time (us) : 2297.541 Forward Execution Time (us) : 2317.828 Forward Execution Time (us) : 4517.372 Forward Execution Time (us) : 2716.691 Forward Execution Time (us) : 942.385 Forward Execution Time (us) : 1717.172 ``` ghstack-source-id: 90536232 Test Plan: buck build mode/opt caffe2/benchmarks/operator_benchmark/pt:qconv_test --show-output Differential Revision: D17512291 fbshipit-source-id: 7764b2ab38e0e8e0aab982006915176638004df6	2019-09-23 11:12:51 -07:00
Jerry Zhang	254122dd4e	quantize_linear -> quantize_per_tensor (#26574 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26574 Since we also have `quantized::linear`, `quantize_linear` sounds confusing, so we plan to rename it before the branch cut Test Plan: ci Imported from OSS Differential Revision: D17514876 fbshipit-source-id: 01d9005e6ec8cb9950b9d8bba122109c389641d3	2019-09-20 21:58:48 -07:00
Dmytro Dzhulgakov	af64789cfa	Fold activation permutation inside quantized conv operator (#26242 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26242 According to https://github.com/pytorch/pytorch/issues/19092 we always keep NCHW order and do handling inside the kernels. This PR fixes it for activations of the qconv by using MemoryLayout mechanism - activations stay logically as NCHW but strided as NHWC. Note, that this version is more aggressive than eventual MemoryLayout mechanism - the QConv's output is always NHWC regardless of the input striding. I think it's ok as we don't have NCHW quantized kernels anyway - so the very first conv would magically switch the order, but I'm open to suggestions. Btw, it doesn't change behavior - same happens today in master because of the explicit permute() call. Test Plan: Imported from OSS Differential Revision: D17443218 Pulled By: dzhulgakov fbshipit-source-id: cfd136ae0465acd8d8c26ffad87385dac9c88726	2019-09-19 13:39:26 -07:00
Huamin Li	2a917616a8	remove cosh_ op test (#25893 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/25893 as title Test Plan: waitforsandcastle Reviewed By: mingzhe09088 Differential Revision: D17278340 fbshipit-source-id: 81b7e8658d5919e865754ae4d834dc44494cb2e3	2019-09-09 20:34:35 -07:00
Huamin Li	1c81d9006a	increase input shape to reduce variance (#25812 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/25812 as title Test Plan: ``` [huaminli@devvm2388.ftw3 ~/fbsource/fbcode] buck run mode/dev-nosan caffe2/benchmarks/operator_benchmark:benchmark_all_test -- --operators None --iterations 3 ``` last few lines of the output P109238440 Reviewed By: mingzhe09088 Differential Revision: D17246792 fbshipit-source-id: d93ee5f404164d32210968997c6ea63b82058d2a	2019-09-07 06:25:26 -07:00
Huamin Li	d4226392bd	change shape for some ops to reduce variance (#25686 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/25686 From the new runs, we found some ops whose shape size we can increase to reduce the variance. Test Plan: ``` [huaminli@devvm2388.ftw3 ~/fbsource/fbcode] buck run mode/dev-nosan caffe2/benchmarks/operator_benchmark:benchmark_all_test -- --operators None --iterations 3 ``` last few lines of the output P108624830 Reviewed By: mingzhe09088 Differential Revision: D17199623 fbshipit-source-id: a9277509f6d3e6503d3086b3b02f87eebd953239	2019-09-04 21:17:43 -07:00
Huamin Li	cd4a7cdaa6	change shape for some ops to reduce variance Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/25619 Test Plan: ``` [huaminli@devvm2388.ftw3 ~/fbsource/fbcode] buck run mode/dev-nosan caffe2/benchmarks/operator_benchmark:benchmark_all_test -- --operators None --iterations 3 ``` last few lines of output P108286305 Reviewed By: mingzhe09088 Differential Revision: D17175802 fbshipit-source-id: 46b69fc1895444b15b6dfcec0625b6b9b006712a	2019-09-03 18:52:25 -07:00
Huamin Li	9d89c9a30f	change shape for conv and unary ops (#25477 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/25477 We want to increase `in_c, out_c` so that the metrics reported back are more stable Test Plan: ```[huaminli@devvm2388.ftw3 ~/fbsource/fbcode] buck run mode/dev-nosan caffe2/benchmarks/operator_benchmark:benchmark_all_test -- --operators None --iterations 3 ``` runs fine on my devserver, last couple lines of output P107448746 Reviewed By: mingzhe09088 Differential Revision: D17133043 fbshipit-source-id: 0b989a530cbfe3d608471a30ae4bbda10e5216ea	2019-08-30 10:02:30 -07:00
Rohan Varma	4b77cae360	Add qconv_test to benchmarking tests (#24913 ) Summary: Adds the tests defined in `qconv_tests.py` to `benchmark_all_tests.py` so that they are ran by `benchmark_all_tests`. The next diff will create another `ai_benchmark_test` specifying the qconv operations similar to D16768680. Since AI-PEP integrates with benchmark_all_tests, this should add these qconv benchmarks to AI-PEP. Pull Request resolved: https://github.com/pytorch/pytorch/pull/24913 Test Plan: `buck run mode/opt caffe2/benchmarks/operator_benchmark:benchmark_all_test` (runs only test who's `tag` is `short`) `buck run mode/opt caffe2/benchmarks/operator_benchmark:benchmark_all_test -- --tag_filter resnext101_32x4d` (runs test who's `tag` is `resxnet101_32x4d`). This runs the tests for all the imported modules in `benchmark_all_test.py` (i.e. add_test, batchnorm_test, qconv_test, etc) ``` buck run mode/opt caffe2/benchmarks/operator_benchmark:benchmark_all_test -- --operators QConv2d,QLinear ``` tests the QConv and QLinear operators Relevant output for `qconv_test.py` (for short tag): ``` # Benchmarking PyTorch: QConv2d # Mode: Eager # Name: QConv2d_N1_IC64_OC128_H56_W56_G1_kernel1_stride1_pad0 # Input: N: 1, IC: 64, OC: 128, H: 56, W: 56, G: 1, kernel: 1, stride: 1, pad: 0 Forward Execution Time (us) : 957.848 # Benchmarking PyTorch: QConv2d # Mode: Eager # Name: QConv2d_N1_IC256_OC256_H56_W56_G32_kernel3_stride1_pad1 # Input: N: 1, IC: 256, OC: 256, H: 56, W: 56, G: 32, kernel: 3, stride: 1, pad: 1 Forward Execution Time (us) : 3638.806 # Benchmarking PyTorch: QConv2d # Mode: Eager # Name: QConv2d_N1_IC256_OC256_H56_W56_G1_kernel1_stride1_pad0 # Input: N: 1, IC: 256, OC: 256, H: 56, W: 56, G: 1, kernel: 1, stride: 1, pad: 0 Forward Execution Time (us) : 3870.311 # Benchmarking PyTorch: QConv2d # Mode: Eager # Name: QConv2d_N1_IC512_OC512_H56_W56_G32_kernel3_stride2_pad1 # Input: N: 1, IC: 512, OC: 512, H: 56, W: 56, G: 32, kernel: 3, stride: 2, pad: 1 Forward Execution Time (us) 
: 10052.192 ``` For resnext tag: ``` # ---------------------------------------- # PyTorch/Caffe2 Operator Micro-benchmarks # ---------------------------------------- # Tag : resnext101_32x4d # Benchmarking PyTorch: QConv2d # Mode: Eager # Name: QConv2d_N1_IC512_OC512_H14_W14_G32_kernel3_stride1_pad1 # Input: N: 1, IC: 512, OC: 512, H: 14, W: 14, G: 32, kernel: 3, stride: 1, pad: 1 Forward Execution Time (us) : 543.171 # Benchmarking PyTorch: QConv2d # Mode: Eager # Name: QConv2d_N1_IC512_OC1024_H28_W28_G1_kernel1_stride2_pad0 # Input: N: 1, IC: 512, OC: 1024, H: 28, W: 28, G: 1, kernel: 1, stride: 2, pad: 0 Forward Execution Time (us) : 1914.301 # Benchmarking PyTorch: QConv2d # Mode: Eager # Name: QConv2d_N1_IC512_OC256_H28_W28_G1_kernel1_stride1_pad0 # Input: N: 1, IC: 512, OC: 256, H: 28, W: 28, G: 1, kernel: 1, stride: 1, pad: 0 Forward Execution Time (us) : 1809.069 # Benchmarking PyTorch: QConv2d # Mode: Eager # Name: QConv2d_N1_IC512_OC512_H28_W28_G1_kernel1_stride1_pad0 # Input: N: 1, IC: 512, OC: 512, H: 28, W: 28, G: 1, kernel: 1, stride: 1, pad: 0 Forward Execution Time (us) : 3100.579 # Benchmarking PyTorch: QConv2d # Mode: Eager # Name: QConv2d_N1_IC512_OC512_H28_W28_G32_kernel3_stride2_pad1 # Input: N: 1, IC: 512, OC: 512, H: 28, W: 28, G: 32, kernel: 3, stride: 2, pad: 1 Forward Execution Time (us) : 2247.540 # Benchmarking PyTorch: QConv2d # Mode: Eager # Name: QConv2d_N1_IC64_OC128_H56_W56_G1_kernel1_stride1_pad0 # Input: N: 1, IC: 64, OC: 128, H: 56, W: 56, G: 1, kernel: 1, stride: 1, pad: 0 Forward Execution Time (us) : 1001.731 # Benchmarking PyTorch: QConv2d # Mode: Eager # Name: QConv2d_N1_IC64_OC256_H56_W56_G1_kernel1_stride1_pad0 # Input: N: 1, IC: 64, OC: 256, H: 56, W: 56, G: 1, kernel: 1, stride: 1, pad: 0 Forward Execution Time (us) : 1571.620 ``` Differential Revision: D16908445 Pulled By: rohan-varma fbshipit-source-id: b711bc3591ce5dcd3ab2521134cff2b12188e3ac	2019-08-22 11:28:49 -07:00
Rohan Varma	60518e0035	Add resnext 32x4d shapes to benchmark (#24503 ) Summary: Adds resnext-101 32x4d shapes to the qconv benchmarks. (Also ran the code formatter) Pull Request resolved: https://github.com/pytorch/pytorch/pull/24503 Test Plan: Run tests on devserver: ```buck run mode/opt caffe2/benchmarks/operator_benchmark/pt:qconv_test -- --omp_num_threads 1 --mkl_num_threads 1``` Reviewed By: dskhudia Differential Revision: D16845746 Pulled By: rohan-varma fbshipit-source-id: d9f842e5f455fccecf547129c5faffa253a49e23	2019-08-19 12:04:48 -07:00
Huamin Li	5c57cedc16	change the location of wipe cache (#24454 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/24454 We want to change the place of wipe_cache. From what we observed, the original place does not help. Reviewed By: mingzhe09088 Differential Revision: D16853205 fbshipit-source-id: 1f6224a52433cbe15c0d27000b4ac140fb9cd4c3	2019-08-15 20:55:47 -07:00
Huamin Li	1b38a6f602	add wipe cache Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/24390 Reviewed By: mingzhe09088 Differential Revision: D16808041 fbshipit-source-id: 1b19f47706e4e2f2e03356469315b55c6ff76d20	2019-08-14 23:48:52 -07:00
Huamin Li	f511abb701	increase default warmup iter and iter (#24272 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/24272 As title, plus some lint Reviewed By: mingzhe09088 Differential Revision: D16792312 fbshipit-source-id: 1386c369c96da04a584d1f7127b708b29d4b47d2	2019-08-13 14:35:19 -07:00
Mingzhe Li	b453fd9916	separate input shapes to reduce default execution time (#24136 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/24136 This diff aims to reduce the execution time of benchmark_all_test, which runs all the supported operator benchmarks. In the default run, only one shape of each operator will be benchmarked. The rest of the benchmarks can be triggered with the tag_filter flag. Reviewed By: hl475 Differential Revision: D16736448 fbshipit-source-id: 33bd86f6fc2610f87f24240ad559fb11d3e35e89	2019-08-09 17:09:21 -07:00
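The tag-based selection described in b453fd9916 can be pictured with a small sketch. It assumes each test case carries a `tag` field and that the default run keeps only the `short` tag (the tag name appears in the benchmark output elsewhere in this log). The `filter_by_tag` helper and the test records are hypothetical and are not the op-bench implementation.

```python
# Hypothetical test records: one "short" shape per operator, the rest "long".
all_tests = [
    {"name": "add_M8_N8", "tag": "short"},
    {"name": "add_M64_N64", "tag": "long"},
    {"name": "matmul_M8_N8", "tag": "short"},
    {"name": "matmul_M128_N128", "tag": "long"},
]


def filter_by_tag(tests, tag_filter="short"):
    """Keep only the tests whose tag matches tag_filter (illustrative helper)."""
    return [t for t in tests if t["tag"] == tag_filter]


default_run = filter_by_tag(all_tests)            # default: one shape per op
long_run = filter_by_tag(all_tests, "long")       # opt in to the other shapes

print([t["name"] for t in default_run])
print([t["name"] for t in long_run])
```

Tagging shapes rather than removing them keeps the full set available while the default run stays short.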
Daya Khudia	aa02b1adcd	Fix qconv benchmark (#24019 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/24019 Permutes are done inside the module. We don't need them outside. Setting of scale/zero_point has changed. Reviewed By: jianyuh Differential Revision: D16712437 fbshipit-source-id: e3cedf9d63347fbf8070d1a65a196e6d4b2833fc	2019-08-09 09:17:55 -07:00
Mingzhe Li	29e2b58b00	Back out "[op-bench][experiment] increase predefined_minimum_secs to reduce variation" (#24065 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/24065 Original commit changeset: d4c034f64b1d Reviewed By: hl475 Differential Revision: D16726647 fbshipit-source-id: 6cd6cfdad804efb073062809bcbc4c0921a3d007	2019-08-08 18:36:22 -07:00
Daya Khudia	fb06c9e61f	qconv operator level benchmark (#22895 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22895 Adding op level benchmarking for qconv operator Reviewed By: mingzhe09088 Differential Revision: D16274273 fbshipit-source-id: 6674753e38f6692f5e6d0db0cac90c5fbf358147	2019-08-05 09:39:16 -07:00
Mingzhe Li	5cb41d35da	increase predefined_minimum_secs to reduce variation (#23734 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23734 In the latest run on AI-PEP, 6 tests out of 342 have more than 7% variation, around 20 tests have variation between 4% and 7%, and the rest are within 4%. This diff tries to further reduce the variation to 4% for all tests. Each test has to run for predefined_minimum_secs seconds before exiting. Increasing that value makes all tests run longer. Based on the experimental results, we will see what the right value to use is. Reviewed By: hl475 Differential Revision: D16622361 fbshipit-source-id: d4c034f64b1d64e1cffd67ffbced7d8cd4449d69	2019-08-02 10:33:48 -07:00
Mingzhe Li	3c986dff77	introduce auto_set to simplify benchmarking the backward path of operators (#23276 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23276 This diff introduces a new feature to simplify benchmarking the backward path of ops. Here is an example: ``` ... self.input_one = torch.rand(M, N, K, requires_grad=self.auto_set()) self.input_two = torch.rand(M, N, K, requires_grad=self.auto_set()) ... ``` In this way, the benchmark will generate three different test cases. 1. input_one requires grad 2. input_two requires grad 3. both inputs require grad Here is a sample output: ``` # Benchmarking PyTorch: add # Mode: Eager # Name: add_M1_N8_K8_bwdall # Input: M: 1, N: 8, K: 8 Backward Execution Time (us) : 863.744 # Benchmarking PyTorch: add # Mode: Eager # Name: add_M1_N8_K8_bwd1 # Input: M: 1, N: 8, K: 8 Backward Execution Time (us) : 727.915 # Benchmarking PyTorch: add # Mode: Eager # Name: add_M1_N8_K8_bwd2 # Input: M: 1, N: 8, K: 8 Backward Execution Time (us) : 687.626 ``` Reviewed By: zheng-xq Differential Revision: D16450355 fbshipit-source-id: 50ae0916e81c3ff9f0c482ed6d386319eb15b305	2019-07-29 15:58:41 -07:00
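As a rough illustration of the `auto_set` behavior described in 3c986dff77, the sketch below enumerates the backward test cases for a benchmark with several `auto_set()` inputs: one case per input, then one case where all inputs require grad. The `backward_cases` helper is hypothetical. Only the `bwd1`, `bwd2`, and `bwdall` suffixes come from the sample output in the commit message.

```python
def backward_cases(num_auto_set_inputs):
    """Return (suffix, requires_grad flags) pairs for the backward tests.

    Hypothetical helper: yields one case per input that requires grad on
    its own, followed by a final case where every input requires grad.
    """
    cases = []
    for i in range(num_auto_set_inputs):
        flags = [j == i for j in range(num_auto_set_inputs)]
        cases.append(("bwd%d" % (i + 1), flags))
    cases.append(("bwdall", [True] * num_auto_set_inputs))
    return cases


cases = backward_cases(2)
for suffix, flags in cases:
    # flags[0] would feed input_one's requires_grad, flags[1] input_two's
    print("add_M1_N8_K8_" + suffix, flags)
```

With two `auto_set()` inputs this gives three cases, matching the count stated in the commit message.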
Abhinav Jauhri	ffef0e03b7	Enabling GPU device runs for operators (#23461 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23461 Enabling GPU device runs for production operator shapes. Reviewed By: xw285cornell, mingzhe09088 Differential Revision: D16526928 fbshipit-source-id: 46657963f4b0bc43d14205ccf1b63d588657e388	2019-07-26 18:53:40 -07:00
Mingzhe Li	f0ebf769de	allow accepting empty input to the benchmark (#23462 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23462 as title Reviewed By: hl475 Differential Revision: D16527176 fbshipit-source-id: 7a8ff4f3c6122ae7b3205e0b446fec06fd95eedc	2019-07-26 17:30:42 -07:00
Mingzhe Li	53182e53f0	fix observer name in the benchmark output (#23443 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23443 as title Reviewed By: hl475 Differential Revision: D16520962 fbshipit-source-id: 7a0ccbece487837c204f242d2a5c6f69b32cbc8c	2019-07-26 12:20:41 -07:00
Mingzhe Li	828c08b4c7	allow passing a list of operators to benchmark (#23442 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23442 Replace the argument name from `operator` to `operators` which can take a list of operators to test. Reviewed By: hl475 Differential Revision: D16520779 fbshipit-source-id: 94284a87c64471793e319f5bd3143f89b9a192bb	2019-07-26 12:20:36 -07:00
Mingzhe Li	7499fe72e9	remove c2 tests from benchmark_all_test (#23437 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23437 as title Reviewed By: hl475 Differential Revision: D16519770 fbshipit-source-id: 63fc269e18c264d399e25f44b03f81fc3ae01113	2019-07-26 11:12:53 -07:00
Mingzhe Li	3516f3c235	handle exit from init method (#21211 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21211 There are cases where the `init` method used to create inputs can exit with error. When this happens, that specific input should be skipped. Reviewed By: zheng-xq Differential Revision: D15466410 fbshipit-source-id: 55e86764b2ec56f7730349ff1df6e50efc0239d7	2019-07-25 21:41:06 -07:00
Abhinav Jauhri	bae10db522	Incorporating arguments to pull production operators and adding device type. (#23197 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23197 Incorporating arguments to pull production operators and adding device type. Reviewed By: mingzhe09088 Differential Revision: D16387263 fbshipit-source-id: e20ed82225eb1e4b7ab1756ec157967b055d85bf	2019-07-23 13:43:26 -07:00
Kimish Patel	82db5dceb6	Added running via throughput benchmark options. (#23077 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23077 The difference between running from Python and running via the throughput benchmark is small if the forward method's loop is long enough (like 1000 in this case). Reviewed By: mingzhe09088 Differential Revision: D16122343 fbshipit-source-id: 5c1d1b98ae82c996baf9d42bcd04995e2ba60c78	2019-07-22 11:27:55 -07:00
Kimish Patel	2ba516d5b6	Added add op framework overhead benchmark for C2 (#23078 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23078 C2 benchmark. Reviewed By: mingzhe09088 Differential Revision: D16122337 fbshipit-source-id: bf56e60c6e60eda2be2938d9f613708a4bc1669a	2019-07-22 11:27:50 -07:00
Kimish Patel	0621068cdc	Add simple add op based framework overhead benchmark. (#23076 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23076 Tracing based and non tracing based added Reviewed By: mingzhe09088 Differential Revision: D16097280 fbshipit-source-id: 3a137092f7ccc3dd2d29d95e10178ec89d3ce892	2019-07-22 11:27:45 -07:00
Jianyu Huang	f72d754877	qlinear operator level benchmark (#22914 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22914 Adding op level benchmarking for qlinear operator Reviewed By: mingzhe09088 Differential Revision: D16285204 fbshipit-source-id: 99b734ddfa0af6aada820cac7b2f38ef7a5868cb	2019-07-17 09:13:17 -07:00
Mingzhe Li	9b9546a498	replace ByteTensor with bool in fill_test (#22913 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22913 as title Reviewed By: hl475 Differential Revision: D16285248 fbshipit-source-id: 78b13d48d547760e59e0e5c8875ab09a3cd24828	2019-07-16 11:51:55 -07:00
Mingzhe Li	560d847da6	add benchmark for PT fill_ op (#22867 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22867 as title Reviewed By: hl475 Differential Revision: D16263458 fbshipit-source-id: 55b0e62023c117aaa0c2b9a4d65b234a388f086d	2019-07-16 09:50:41 -07:00
Mingzhe Li	94d99f2522	add num_runs flag to the benchmark (#22892 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22892 Think of num_runs as manually running the binary <num_runs> times. Each run executes the operator for many iterations. Reviewed By: hl475 Differential Revision: D16271597 fbshipit-source-id: b6f509ee0332c70f85bec0d447b84940c5c0cecd	2019-07-15 17:18:25 -07:00
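The relationship between `num_runs` and iterations in 94d99f2522 can be sketched as two nested loops. This is an assumption-laden illustration in plain Python, not the benchmark's own code: `run_benchmark` and its parameters are hypothetical names, and the workload is a stand-in for an operator call.

```python
import time


def run_benchmark(fn, iterations, num_runs):
    """Time fn over `iterations` calls, repeated `num_runs` times.

    Returns one average latency in microseconds per run, as if the
    binary had been launched num_runs times by hand.
    """
    per_run_us = []
    for _ in range(num_runs):           # outer loop: independent runs
        start = time.perf_counter()
        for _ in range(iterations):     # inner loop: many iterations per run
            fn()
        elapsed = time.perf_counter() - start
        per_run_us.append(elapsed / iterations * 1e6)
    return per_run_us


timings = run_benchmark(lambda: sum(range(100)), iterations=1000, num_runs=3)
print(["%.3f us" % t for t in timings])
```

Reporting one number per run makes run-to-run variation visible, which a single averaged figure would hide.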
Mingzhe Li	0cddd3e751	update README (#21312 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21312 This diff updates the README of op-bench. Reviewed By: zheng-xq Differential Revision: D15612665 fbshipit-source-id: b33119fd4f9d086b03b5e28fbe8a4015b282b15c	2019-07-15 13:34:05 -07:00
Mingzhe Li	7eb0319339	add new tests to benchmark_all_test (#22787 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22787 as title Reviewed By: hl475 Differential Revision: D16219329 fbshipit-source-id: 097ee73e7644d5ca482ad044d0fd2c3e7dc2c10b	2019-07-11 22:50:55 -07:00
Mingzhe Li	1878800f47	make custom op work in OSS environment (#22781 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22781 The custom op is required to make the op benchmark work with JIT. Run `python setup.py install` in the pt_extension directory to install it. Reviewed By: hl475 Differential Revision: D16214430 fbshipit-source-id: c9221c532011f9cf0d5453ac8535a6cde65e8376	2019-07-11 21:17:17 -07:00
Mingzhe Li	3cf5f22f02	Enable C2 operators running with {cpu, gpu} * {forward, backward} (#22664 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22664 This diff enables c2 operators to run the combination of {cpu, gpu} * {forward, backward}. Reviewed By: hl475 Differential Revision: D15781789 fbshipit-source-id: e9843e3c46ea144042829860638d406f6a33792b	2019-07-09 16:41:53 -07:00
Mingzhe Li	95a5da175d	change c2 bench to use new tensor creation interface (#22663 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22663 as title Reviewed By: hl475 Differential Revision: D15744502 fbshipit-source-id: 441ab9fb7580ca87c3f2027d0a63ba18b8d35016	2019-07-09 16:41:49 -07:00
Mingzhe Li	45aad2e680	change unary, pool, max ops to use new interface (#22661 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22661 as title Reviewed By: hl475 Differential Revision: D16170825 fbshipit-source-id: d80944224b8717e7aa35980907ff48e587b85217	2019-07-09 16:41:32 -07:00
Mingzhe Li	2b2fe525b9	introduce a new interface to add a list of operators (#21209 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21209 This diff introduces a new interface to add a list of operators. Here are the steps to add ops using this interface: - create op_list: ```unary_ops_list = op_bench.op_list( attr_names=["op_name", "op_function"], attrs=[ ["abs", torch.abs], ["abs_", torch.abs_], ], ) ``` - create a bench class: ``` class UnaryOpBenchmark(op_bench.TorchBenchmarkBase): def init(self, M, N, op_function): self.input_one = torch.rand(M, N) self.op_func = op_function def forward(self): return self.op_func(self.input_one) ``` - register those ops: ``` op_bench.generate_pt_tests_from_list(unary_ops_list, unary_ops_configs, UnaryOpBenchmark) ``` Reviewed By: zheng-xq Differential Revision: D15514188 fbshipit-source-id: f09b359cab8175eeb8d51b3ad7bbbcfbc9f6430f	2019-07-09 16:41:29 -07:00
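To make the three steps in 2b2fe525b9 concrete without depending on PyTorch, here is a minimal stand-in sketch. The `op_list` and `generate_tests_from_list` functions below are hypothetical re-creations written for illustration and are not the `op_bench` implementations. The ops are plain Python callables rather than `torch` functions, and the single `N` config axis is an assumption.

```python
import math


def op_list(attr_names, attrs):
    """Turn rows of attributes into one dict per op (illustrative stand-in)."""
    return [dict(zip(attr_names, row)) for row in attrs]


class UnaryOpBenchmark:
    """Stand-in bench class: init creates inputs, forward applies the op."""

    def init(self, N, op_function):
        self.inputs = [float(i) - N / 2 for i in range(N)]
        self.op_func = op_function

    def forward(self):
        return [self.op_func(x) for x in self.inputs]


def generate_tests_from_list(ops, configs, bench_cls):
    """Register one test per (op, config) pair; returns (name, bench) tuples."""
    tests = []
    for op in ops:
        for cfg in configs:
            bench = bench_cls()
            bench.init(op_function=op["op_function"], **cfg)
            tests.append(("%s_N%d" % (op["op_name"], cfg["N"]), bench))
    return tests


unary_ops_list = op_list(
    attr_names=["op_name", "op_function"],
    attrs=[["abs", abs], ["floor", math.floor]],
)
unary_ops_configs = [{"N": 4}, {"N": 8}]  # hypothetical shape configs
tests = generate_tests_from_list(unary_ops_list, unary_ops_configs, UnaryOpBenchmark)

for name, bench in tests:
    print(name, bench.forward())
```

Pairing every op with every config is what lets a single bench class cover many operators that share the same input signature.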
Mingzhe Li	b93f29ded3	add JIT path to the benchmark (#22309 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22309 This diff enables PT operators to run in JIT mode. Users can switch between eager and JIT mode using the `use_jit` flag. In this diff, we put operators in a loop and pass it to JIT. One extra step, which wraps the operator with the `_consume` op, is introduced to avoid the dead code elimination optimization in JIT. With that, the reported time includes the real operator execution time plus the time of the `_consume` op (it directly returns its input; nothing else happens inside). Reviewed By: zheng-xq Differential Revision: D16033082 fbshipit-source-id: e03be89fd5a505e44e81015dfc63db9cd76fb8a1	2019-07-03 17:18:03 -07:00
Mingzhe Li	325ec2327f	create tensor based on provided datatype (#22468 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22468 as title Reviewed By: ajauhri Differential Revision: D15744503 fbshipit-source-id: 050b32dd7f135512385fc04f098c376c664211a9	2019-07-03 17:08:23 -07:00
Mingzhe Li	9c44f6c723	generate tests based on op metadata (#21432 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21432 This diff introduces a new interface to generate tests based on the metadata of operators. Reviewed By: ajauhri Differential Revision: D15675542 fbshipit-source-id: ba60e803ea553d8b9eb6cb2bcdc6a0368ef62b1c	2019-07-03 16:48:41 -07:00
Mingzhe Li	402b9f9a6d	add PT chunk op to the benchmark (#22409 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22409 as title Reviewed By: hl475 Differential Revision: D16079031 fbshipit-source-id: 109060ffc953f2357b2783b13f9b9dc87bd3f98a	2019-07-01 16:37:05 -07:00
Mingzhe Li	8a726f5815	add PT split op to the benchmark (#22410 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22410 as title Reviewed By: hl475 Differential Revision: D16078705 fbshipit-source-id: 29e1cc19d0e93a561d07c47e5678a311e6de3e3b	2019-07-01 16:37:01 -07:00
Mingzhe Li	8281909e73	add PT cat operator to the benchmark (#22404 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22404 as title Reviewed By: hl475 Differential Revision: D16078395 fbshipit-source-id: 4ff5c558036af1dce6ac0001a1a1fc3a373a981f	2019-07-01 16:36:57 -07:00
Mingzhe Li	007fd01e9b	Enable PT operators running with {cpu, gpu} * {forward, backward} (#22416 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22416 This diff tests the combination of cpu/gpu and forward/backward path for PT add operator. Reviewed By: hl475 Differential Revision: D15770792 fbshipit-source-id: 38cc648361d2501d774db407f988c3cb5115b2ae	2019-07-01 16:30:58 -07:00
Mingzhe Li	3a198400f8	modify pool benchmarks Summary: as title Reviewed By: hl475 Differential Revision: D16058193 fbshipit-source-id: 8f4e04a0356960f6483d6ef58e64876740434849	2019-06-28 14:35:23 -07:00
Mingzhe Li	89c709d217	modify unary operators benchmark Summary: as title Reviewed By: hl475 Differential Revision: D16057665 fbshipit-source-id: 07e31a17450fbfd88b5bd330c31c729de5300eaa	2019-06-28 14:03:41 -07:00
Mingzhe Li	6cf4df5d06	add PT softmax ops to the benchmark suite (#21208 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21208 The diff adds softmax, softmax2d, and logsoftmax to the benchmark suite. Reviewed By: zheng-xq Differential Revision: D15526265 fbshipit-source-id: b7ba63032dba7146765513c8cb1ac5a6a7bd1a68	2019-06-28 13:58:20 -07:00
Mingzhe Li	a4f281446b	introduce flags to set omp and mkl threads (#21472 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21472 as title Reviewed By: hl475 Differential Revision: D15695846 fbshipit-source-id: 44437f6b94a9c583275fcc711bb6ccf2b04f90fc	2019-06-26 09:33:05 -07:00
Sungmann Cho	f59581218f	Fix spelling errors (#21665 ) Summary: alloctor -> allocator excutable -> executable excution -> execution foward -> forward initiaize -> initialize paralell -> parallel preprocesor -> preprocessor tranpose -> transpose Pull Request resolved: https://github.com/pytorch/pytorch/pull/21665 Differential Revision: D15806155 Pulled By: soumith fbshipit-source-id: d92b21ec8650a2b32f05faf9af0b7d2b073e992c	2019-06-13 15:21:55 -07:00
Mingzhe Li	341a7e4bb5	Fix issue in backward path (#21663 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21663 as title Reviewed By: hl475 Differential Revision: D15770793 fbshipit-source-id: b3d0dd030237c4d62bddc388984a273153fac4a6	2019-06-11 21:09:25 -07:00
Mingzhe Li	f2623c74a9	add PT pointwise unary ops to the benchmark suite (#21207 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21207 This diff adds 80 PT pointwise unary ops to the benchmark suite. Most of the ops are added using the generate_pt_tests_from_list interface. The rest are handled separately. Reviewed By: zheng-xq Differential Revision: D15471597 fbshipit-source-id: 8ea36e292a38b1dc50f064a48c8cd07dbf78ae56	2019-06-10 21:35:44 -07:00
Mingzhe Li	4e3c97a0be	add separate path for op with JIT (#21210 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21210 This diff introduces a new path to run op with JIT. There are two steps involved here: 1. Users need to script the op. This should happen in the `init` method. 2. The generated graph from step1 is passed to `jit_forward` which will be executed by the benchmark backend Reviewed By: zheng-xq Differential Revision: D15460831 fbshipit-source-id: 48441d9cd4be5d0acebab901f45544616e6ed2ee	2019-06-10 19:53:58 -07:00
Mingzhe Li	512c9d8c76	add PT gather op to the benchmark suite (#21614 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21614 as title Reviewed By: kimishpatel Differential Revision: D15525115 fbshipit-source-id: 6a17e1d791bdb432cc3d51e45c5e82b96268127d	2019-06-10 16:31:52 -07:00
Mingzhe Li	a5cf6d5100	reorganize op bench directory (#21543 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21543 No code change in this diff. Reviewed By: hl475 Differential Revision: D15721419 fbshipit-source-id: 06212cc882f5297064153417dc4d80bce9ec2667	2019-06-07 16:06:51 -07:00
Huamin Li	f433913996	add more info back to BenchResult (#21502 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21502 In BenchResult, we keep name, avg_fwd, std_fwd, avg_bwd, and std_bwd. There is no information about the number of each iteration. In this diff, I am adding more info to BenchResult to include the number reported from each iteration. Reviewed By: wanchaol Differential Revision: D15706306 fbshipit-source-id: 3f14be4ba91f1f6da473995783bd7af1d067938d	2019-06-06 18:43:51 -07:00
Mingzhe Li	12528990f8	change output of ai_pep_format (#21440 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21440 This diff modifies the output format when ai_pep_format is enabled. Reviewed By: hl475 Differential Revision: D15681042 fbshipit-source-id: df5f2dbb38d1bd866ca7f74ef4e63459d480be6e	2019-06-05 21:54:24 -07:00
Mingzhe Li	b869a3b4ac	add new ops to benchmark_all_test (#21365 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21365 This diff adds new operators to benchmark_all_test so all the supported ops can be built as one binary Reviewed By: hl475 Differential Revision: D15627328 fbshipit-source-id: b7ca550a279f485102a6a6bd47e4032c7beb9940	2019-06-04 13:54:26 -07:00
Mingzhe Li	3004b397f0	change test_name to be globally unique value across tests (#21206 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21206 This diff changes the default test_name to be a globally unique value across tests. With that, users can list all the tests and choose to run a specific test. Reviewed By: zheng-xq Differential Revision: D15543508 fbshipit-source-id: 0814ef6a60d41637fed5245e30c282497cf21bb8	2019-06-03 14:55:11 -07:00
Mingzhe Li	ca80ec7c97	introduce a new interface to add op [PT changes] (#21149 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21149 The diff modifies the interface for PyTorch operators in the benchmark suite Reviewed By: zheng-xq Differential Revision: D15433897 fbshipit-source-id: e858183431eb37d90313356716c2de8709372b58	2019-06-03 14:55:08 -07:00
Mingzhe Li	516ea33f6a	add PT maxpool and avgpool ops to the benchmark suite (#21200 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21200 This diff adds MaxPool1d/2d/3d and AvgPool1d/2d/3d to the benchmark suite. Reviewed By: hl475 Differential Revision: D15541980 fbshipit-source-id: 394d136ee94a16ee24285939323ca5fe317e99d3	2019-05-31 19:35:29 -07:00
Mingzhe Li	dceea73460	add PT conv and convtranspose ops to the benchmark suite (#21199 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21199 This diff adds Conv1d, ConvTranspose1d, Conv2d, ConvTranspose2d, Conv3d, and ConvTranspose3d operators to the benchmark suite. Reviewed By: hl475 Differential Revision: D15520817 fbshipit-source-id: 5512afec2be8a1036fbcd170f70265c7e455fcde	2019-05-31 19:35:25 -07:00
Mingzhe Li	2d75d31398	add PT linear op to the benchmark suite (#21204 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21204 as title Reviewed By: hl475 Differential Revision: D15484743 fbshipit-source-id: 7094a983e370e1c3952021146b58b844874b7d5e	2019-05-31 19:35:22 -07:00
Mingzhe Li	00b3e69211	add PT batchnorm op to the benchmark suite (#21201 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21201 as title Reviewed By: hl475 Differential Revision: D15482581 fbshipit-source-id: d93713a35be41e76d077df419cb24585f69d72eb	2019-05-31 19:35:18 -07:00
Mingzhe Li	ed1078bde3	migrate matmul operator to the new interface (#21198 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21198 as title Reviewed By: hl475 Differential Revision: D15325768 fbshipit-source-id: a5d7c6837cd09445e75846660d12807dd26af6cc	2019-05-31 19:35:15 -07:00
Mingzhe Li	668dbcc41b	migrate intraop benchmarks to the new interface (#21202 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21202 Migrate Ilia's op benchmarks to the new interface Reviewed By: hl475 Differential Revision: D15322577 fbshipit-source-id: 8e75d51e7ddacbd56896c55f2996a9358491d83e	2019-05-31 16:19:04 -07:00
Mingzhe Li	c62d476206	migrate add operator to the new interface (#21152 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21152 Migrate existing add benchmark to use the new op front-end Reviewed By: zheng-xq Differential Revision: D15325524 fbshipit-source-id: 34e969e1bd289913d881c476711bce9f8ac18a29	2019-05-31 16:19:00 -07:00
Mingzhe Li	0223d3744a	introduce a new interface to add op [C2 changes] (#21148 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21148 The diff modifies the interface for Caffe2 operators in the benchmark suite Reviewed By: zheng-xq Differential Revision: D15433888 fbshipit-source-id: c264a95906422d7a26c10b1f9836ba8b35e36b53	2019-05-31 09:21:07 -07:00
Mingzhe Li	31089b02ce	introduce a new interface to add op [core changes] (#21147 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21147 This diff introduces a new interface to add PT/C2 operators to the benchmark suite. The following steps are needed to add a new operator: 1. Specify the input shapes, args to an operator in configs 2. Create a PT/C2 benchmark class which includes ```init``` (create tensors), ```forward``` (specify the operator to be tested.), and ```backward```(gradient of an op.) methods 3. call generate_pt_test/generate_c2_test to create test cases based on configs Reviewed By: zheng-xq Differential Revision: D15250380 fbshipit-source-id: 1025a7cf60d2427baa0f3f716455946d3d3e6a27	2019-05-31 09:21:04 -07:00
Kimish Patel	cda9e995e2	Benchmark repeat op. (#20016 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20016 PT's repeat op benchmark Reviewed By: zheng-xq Differential Revision: D15166941 fbshipit-source-id: b1ed7af790460456210b60bfb4e44a08657e9612	2019-05-20 07:34:54 -07:00
Ilia Cherniavskii	eecf52b444	Fix in benchmark_test_generator (#20237 ) Summary: Add missing import Pull Request resolved: https://github.com/pytorch/pytorch/pull/20237 Differential Revision: D15245957 Pulled By: ilia-cher fbshipit-source-id: 0f71aa08eb9ecac32002a1644838d06ab9faa37c	2019-05-07 17:03:25 -07:00
Ilia Cherniavskii	19e6886576	Intra-op parallel microbenchmarks for PT (#19997 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19997 ghimport-source-id: 420d4a68a1ef879beee2734adba8abb575e0b0ab Differential Revision: D15231375 Pulled By: ilia-cher fbshipit-source-id: ce7248ea2ebb54d25c9d831c6e3f23f3534557dd	2019-05-06 20:21:45 -07:00
Ilia Cherniavskii	8c97f0b19e	Initialize Caffe2 only when running Caffe2 benchmarks (#19980 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19980 ghimport-source-id: ca31ca25b88a1c6219e4a32483f70738a8fdbf88 Differential Revision: D15229797 Pulled By: ilia-cher fbshipit-source-id: 0b23dbdba0c0f60932a75d8b1900c54285f5a8e4	2019-05-06 19:17:23 -07:00
Ilia Cherniavskii	0c7e98b765	Support for non-contiguous tensors and arbitrary dtypes in PT benchmarks (#19993 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19993 ghimport-source-id: 4cf51b61bb83b72883148ab0faa0c75c3cef7635 Differential Revision: D15230363 Pulled By: ilia-cher fbshipit-source-id: a3ab591d6fd24e874958401e63eaec56bda19a5c	2019-05-06 19:12:09 -07:00
Natalia Gimelshein	3875e1ba45	try to make at::cat in mm_tree_reduction operate on contig tensors (#18816 ) Summary: Sometimes at::cat gets transposed inputs and goes on a slow path. Also, make jit_premul lstm benchmark add bias to the whole input tensor to avoid separate reduction kernels in the backward pass. Pull Request resolved: https://github.com/pytorch/pytorch/pull/18816 Differential Revision: D15013576 Pulled By: wanchaol fbshipit-source-id: bcfa1cf44180b11b05b0f55f034707012f66281a	2019-04-24 23:44:25 -07:00
Mingzhe Li	26f12af537	Fix op benchmarks error in OSS environment (#19518 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19518 Previous design needs to run the op benchmarks from PyTorch root directory which could lead to `module not found` error in OSS environment. This diff fixes that issue by making the benchmark to be launched in the `benchmarks` folder. Reviewed By: ilia-cher Differential Revision: D15020787 fbshipit-source-id: eb09814a33432a66cc857702bc86538cd17bea3b	2019-04-19 16:25:16 -07:00
Mingzhe Li	5da7b74d48	fix AI-PEP path error (#19514 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19514 as title Reviewed By: hl475 Differential Revision: D15018499 fbshipit-source-id: 9ce38e3a577432e0575a6743f5dcd2e907d3ab9d	2019-04-19 16:25:13 -07:00
Mingzhe Li	08f5c05d60	make separate operators as independent binaries (#19450 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19450 We want to make each operator benchmark as a separate binary. The previous way to run the benchmark is by collecting all operators into a single binary, it is unnecessary when we want to filter a specific operator. This diff aims to resolve that issue. Reviewed By: ilia-cher Differential Revision: D14808159 fbshipit-source-id: 43cd25b219c6e358d0cd2a61463b34596bf3bfac	2019-04-18 20:00:47 -07:00
Mingzhe Li	45d5b6be48	Enhance front-end to add op (#19433 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19433 For operator benchmark project, we need to cover a lot of operators, so the interface for adding operators needs to be very clean and simple. This diff is implementing a new interface to add op. Here is the logic to add new operator to the benchmark: ``` long_config = {} short_config = {} map_func add_test( [long_config, short_config], map_func, [caffe2 op] [pt op] ) ``` Reviewed By: zheng-xq Differential Revision: D14791191 fbshipit-source-id: ac6738507cf1b9d6013dc8e546a2022a9b177f05	2019-04-18 17:07:02 -07:00
Xiaoqiang Zheng	5627940e9c	Add a fast path for batch-norm CPU inference. (#19152 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19152 Adding a fast path for batch-norm CPU inference when all tensors are contiguous. * Leverage vectorization through simple loops. * Folding linear terms before computation. * For resnext-101, this version gets 18.95 times faster. * Add a microbenchmark: * (buck build mode/opt -c python.package_style=inplace --show-output //caffe2/benchmarks/operator_benchmark:batchnorm_benchmark) && \ (OMP_NUM_THREADS=1 MKL_NUM_THREADS=1 buck-out/gen/caffe2/benchmarks/operator_benchmark/batchnorm_benchmark#binary.par) * batch_norm: data shape: [1, 256, 3136], bandwidth: 22.26 GB/s * batch_norm: data shape: [1, 65536, 1], bandwidth: 5.57 GB/s * batch_norm: data shape: [128, 2048, 1], bandwidth: 18.21 GB/s Reviewed By: soumith, BIT-silence Differential Revision: D14889728 fbshipit-source-id: 20c9e567e38ff7dbb9097873b85160eca2b0a795	2019-04-16 19:27:54 -07:00
Mingzhe Li	3501576230	calculate execution time based on final iterations (#19299 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19299 I saw larger than 5% performance variation with small operators, this diff aims to reduce the variation by avoiding python overhead. Previously, in the benchmark, we run the main loop for 100 iterations then look at the time. If it's not significant, we will double the number of iterations to rerun and look at the result. We continue this process until it becomes significant. We calculate the time by total_time / number of iterations. The issue is that we are including multiple python trigger overhead. Now, I change the logic to calculate execution time based on the last run instead of all runs, the equation is time_in_last_run/number of iterations. Reviewed By: hl475 Differential Revision: D14925287 fbshipit-source-id: cb646298c08a651e27b99a5547350da367ffff47	2019-04-16 08:57:17 -07:00
Wanchao Liang	07efee395c	add Fast-RNN to AI-PEP Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18885 Reviewed By: hl475 Differential Revision: D14728854 fbshipit-source-id: 7e7a2946929551963f7c938e3d82a260a9efdfbd	2019-04-04 17:04:21 -07:00
mingzhe0908	cb66759600	temp fix for flake8 error (#18788 ) Summary: Fix lint error Pull Request resolved: https://github.com/pytorch/pytorch/pull/18788 Reviewed By: houseroad Differential Revision: D14741840 Pulled By: mingzhe09088 fbshipit-source-id: 1fa630e3c6e606e3d78fe8293e5b0e7ea1b78da3	2019-04-02 22:52:52 -07:00
Mingzhe Li	5f5a2aaab9	Operator-level performance microbenchmarks (#18740 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18740 Test utilities for writing Caffe2/PyTorch performance microbenchmarks. Brief description of the file structure * benchmark_core.py : core utilities for running microbenchmark tests * benchmark_caffe2.py : Caffe2 specific benchmark utilities * benchmark_pytorch.py: PyTorch specific benchmark utilities * benchmark_runner.py : Main function. Currently it can run the microbenchmark tests in a stand-alone mode. The next step is to have this integrate with AI-PEP. The utilities are located at https://github.com/pytorch/pytorch/tree/master/test to have access to both Caffe2/PyTorch Python's frontend. Include two operator microbenchmarks; support both Caffe2/PyTorch: * MatMul * Add Reference: PyTorch benchmarks : https://github.com/pytorch/benchmark/tree/master/timing/python. In this work, we start with two example binary operators MatMul and Add, but eventually we should cover unary operators like in the PyTorch benchmark repo. Reviewed By: zheng-xq Differential Revision: D13887111 fbshipit-source-id: b7a56b95448c9ec3e674b0de0ffb96af4439bfce	2019-04-02 17:06:19 -07:00
Edward Yang	173f224570	Turn on F401: Unused import warning. (#18598 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18598 ghimport-source-id: c74597e5e7437e94a43c163cee0639b20d0d0c6a Stack from [ghstack](https://github.com/ezyang/ghstack): * #18598 Turn on F401: Unused import warning. This was requested by someone at Facebook; this lint is turned on for Facebook by default. "Sure, why not." I had to noqa a number of imports in __init__. Hypothetically we're supposed to use __all__ in this case, but I was too lazy to fix it. Left for future work. Be careful! flake8-2 and flake8-3 behave differently with respect to import resolution for # type: comments. flake8-3 will report an import unused; flake8-2 will not. For now, I just noqa'd all these sites. All the changes were done by hand. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Differential Revision: D14687478 fbshipit-source-id: 30d532381e914091aadfa0d2a5a89404819663e3	2019-03-30 09:01:17 -07:00
Junjie Bai	e22a2b9015	Minor fixes in fastrnns benchmarks Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18613 Reviewed By: wanchaol Differential Revision: D14681838 fbshipit-source-id: 60bd5c9b09398c74335f003cd21ea32dd1c45876	2019-03-29 01:22:28 -07:00
Wanchao Liang	6684ef3f23	Move fast rnn benchmark to pytorch/pytorch Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18369 Differential Revision: D14652039 Pulled By: wanchaol fbshipit-source-id: 1177b1f60d96672c3e2c9d527b56ee06ca7c0af1	2019-03-27 14:46:09 -07:00
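
The timing change described in commit 3501576230 (run the loop, double the iteration count until the measurement is significant, then report the time of the last run divided by its iteration count) can be sketched roughly as below. This is an illustration only, not the benchmark suite's actual code: the function name `measure_last_run`, the parameters `min_time_s` and `max_iters`, and the use of a simple elapsed-time threshold as the "significance" test are all assumptions. Only the starting count of 100 iterations and the doubling come from the commit message.

```python
import time


def measure_last_run(fn, min_time_s=0.01, start_iters=100, max_iters=1_000_000):
    """Return (seconds per iteration, iterations), using only the final run.

    Hypothetical helper: "significant" is assumed to mean that the run
    took at least `min_time_s` seconds; the real suite may use another test.
    """
    iters = start_iters
    while True:
        start = time.perf_counter()
        for _ in range(iters):
            fn()
        elapsed = time.perf_counter() - start
        # Stop once the run is long enough to trust, or the cap is reached.
        if elapsed >= min_time_s or iters >= max_iters:
            # Earlier, shorter runs are discarded, so their per-run
            # Python launch overhead does not enter the result.
            return elapsed / iters, iters
        iters *= 2  # double and rerun, as the commit message describes


if __name__ == "__main__":
    per_iter, iters = measure_last_run(lambda: sum(range(100)))
    print(f"{per_iter * 1e6:.3f} us per iteration over {iters} iterations")
```

Dividing the accumulated time of all runs by the accumulated iteration count would instead fold the start-up cost of every attempt into the average, which is the overhead the commit says it wants to avoid.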

317 Commits