Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31334
The wipe-cache logic was introduced in the hope of reducing variation in benchmark results. Based on our experiments, it did not actually help with that. In addition, several engineers encountered a missing `cpuinfo.h` error caused by the wipe-cache logic. So this diff removes that feature to ensure smooth installation and running of the op bench.
Test Plan:
```
buck run caffe2/benchmarks/operator_benchmark/pt:add_test -- --iterations 1
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short
# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M1_N1_K1_cpu
# Input: M: 1, N: 1, K: 1, device: cpu
Forward Execution Time (us) : 111.192
```
The A/B test also passes: Benchmark Run #2476535015
Reviewed By: hl475
Differential Revision: D19126970
fbshipit-source-id: 9b1ab48c121838836ba6e0ae664a48fe2d18efdd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29864
This diff makes `all` a reserved keyword for `tag_filter`. When the user passes `all`, all supported shapes are run.
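As a rough sketch (the function name and config layout here are illustrative assumptions, not the suite's actual API), the reserved keyword can be handled as a special case before tag matching:

```python
# Hypothetical sketch of a reserved "all" tag_filter; the real suite's
# config representation may differ.
def select_configs(configs, tag_filter):
    if tag_filter == "all":  # reserved keyword: run every supported shape
        return list(configs)
    return [c for c in configs if c.get("tag") == tag_filter]

configs = [
    {"tag": "short", "M": 1, "N": 1, "K": 1},
    {"tag": "long", "M": 8, "N": 32, "K": 256},
]
print(len(select_configs(configs, "all")))    # 2
print(len(select_configs(configs, "short")))  # 1
```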
Test Plan:
```
buck run //caffe2/benchmarks/operator_benchmark/pt:add_test -- --iterations 1 --tag_filter all
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : all
# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M8_N32_K256_cpu
# Input: M: 8, N: 32, K: 256, device: cpu
Forward Execution Time (us) : 6798.688
...
```
Reviewed By: hl475
Differential Revision: D18520249
fbshipit-source-id: 4d55af9f46f89b2fe8842e1a00dfa8e5acaf4fa2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28731
as title
Test Plan:
```
Before:
buck run mode/opt caffe2/benchmarks/operator_benchmark:benchmark_all_test -- --operator sigmoid
Invalidating internal cached state: Buck configuration options changed between invocations. This may cause slower builds.
Changed value project.buck_out='buck-out/opt' (was 'buck-out/dev')
... and 69 more. See logs for all changes
Parsing buck files: finished in 7.2 sec
Creating action graph: finished in 10.0 sec
Building: finished in 06:38.4 min (100%) 29890/29890 jobs, 29890 updated
Total time: 06:55.7 min
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short
# Benchmarking PyTorch: sigmoid
With this diff
buck run mode/opt caffe2/benchmarks/operator_benchmark:benchmark_all_test -- --operator sigmoid
Parsing buck files: finished in 6.4 sec
Creating action graph: finished in 9.8 sec
Building: finished in 06:35.9 min (100%) 29892/29892 jobs, 29892 updated
Total time: 06:52.1 min
```
Reviewed By: hl475
Differential Revision: D18152071
fbshipit-source-id: 80c29570581bbd2f0e78e2df32734c17a2b036ee
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23442
Rename the argument from `operator` to `operators`, which can now take a list of operators to test.
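A minimal sketch of how such an argument might be parsed with argparse (the flag handling here is an assumption for illustration, not the suite's actual CLI code):

```python
import argparse

# Hypothetical sketch: parse a comma-separated --operators list
parser = argparse.ArgumentParser()
parser.add_argument(
    "--operators",
    type=lambda s: s.split(","),
    default=None,
    help="comma-separated list of operators to benchmark",
)
args = parser.parse_args(["--operators", "add,matmul"])
print(args.operators)  # ['add', 'matmul']
```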
Reviewed By: hl475
Differential Revision: D16520779
fbshipit-source-id: 94284a87c64471793e319f5bd3143f89b9a192bb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22892
Think of `num_runs` as manually running the binary `num_runs` times. Each run executes the operator for many iterations.
Reviewed By: hl475
Differential Revision: D16271597
fbshipit-source-id: b6f509ee0332c70f85bec0d447b84940c5c0cecd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22309
This diff enables PT operators to run with JIT mode. Users can control eager and JIT mode using the `use_jit` flag.
In this diff, we put the operator in a loop and pass that loop to JIT. One extra step wraps the operator with the `_consume` op to avoid JIT's dead-code-elimination optimization. As a result, the reported time includes the real operator execution time plus the `_consume` op (which directly returns its input; nothing else happens inside).
Reviewed By: zheng-xq
Differential Revision: D16033082
fbshipit-source-id: e03be89fd5a505e44e81015dfc63db9cd76fb8a1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21440
This diff modifies the output format when `ai_pep_format` is enabled.
Reviewed By: hl475
Differential Revision: D15681042
fbshipit-source-id: df5f2dbb38d1bd866ca7f74ef4e63459d480be6e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21206
This diff changes the default `test_name` to a globally unique value across tests. With that, users can list all the tests and choose to run a specific one.
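One plausible naming scheme that makes names unique is to derive them from the op name and its input configuration, matching the `add_M1_N1_K1_cpu` names seen in the test plans above (the helper name here is hypothetical):

```python
def make_test_name(op_name, **inputs):
    # Hypothetical helper: build a globally unique test name from the
    # operator name and its input configuration, e.g. add_M8_N32_K256.
    parts = [f"{key}{value}" for key, value in inputs.items()]
    return "_".join([op_name] + parts)

print(make_test_name("add", M=8, N=32, K=256))  # add_M8_N32_K256
```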
Reviewed By: zheng-xq
Differential Revision: D15543508
fbshipit-source-id: 0814ef6a60d41637fed5245e30c282497cf21bb8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21147
This diff introduces a new interface to add PT/C2 operators to the benchmark suite.
The following steps are needed to add a new operator:
1. Specify the input shapes and args for an operator in configs
2. Create a PT/C2 benchmark class that includes `init` (create tensors), `forward` (specify the operator to be tested), and `backward` (gradient of the op) methods
3. Call `generate_pt_test`/`generate_c2_test` to create test cases based on the configs
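The three steps can be sketched as a toy example (plain Python floats stand in for tensors; the class and function names follow the description above, but the bodies are assumptions):

```python
# Step 1: input configs
configs = [
    {"M": 1, "N": 1, "K": 1},
    {"M": 8, "N": 32, "K": 256},
]

# Step 2: a benchmark class with init/forward (backward omitted here)
class AddBenchmark:
    def init(self, M, N, K):
        # stand-in for tensor creation
        self.a, self.b = float(M), float(N)

    def forward(self):
        # the operator under test
        return self.a + self.b

# Step 3: generate one test case per config
def generate_pt_test(configs, benchmark_cls):
    tests = []
    for cfg in configs:
        bench = benchmark_cls()
        bench.init(**cfg)
        tests.append(bench)
    return tests

tests = generate_pt_test(configs, AddBenchmark)
print(len(tests), tests[1].forward())  # 2 40.0
```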
Reviewed By: zheng-xq
Differential Revision: D15250380
fbshipit-source-id: 1025a7cf60d2427baa0f3f716455946d3d3e6a27
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19518
The previous design required running the op benchmarks from the PyTorch root directory, which could lead to a `module not found` error in the OSS environment. This diff fixes that issue by making the benchmark launchable from the `benchmarks` folder.
Reviewed By: ilia-cher
Differential Revision: D15020787
fbshipit-source-id: eb09814a33432a66cc857702bc86538cd17bea3b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19450
We want to make each operator benchmark a separate binary. Previously, the benchmark collected all operators into a single binary, which is unnecessary when we want to filter for a specific operator. This diff resolves that issue.
Reviewed By: ilia-cher
Differential Revision: D14808159
fbshipit-source-id: 43cd25b219c6e358d0cd2a61463b34596bf3bfac
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19433
For the operator benchmark project, we need to cover a lot of operators, so the interface for adding operators needs to be very clean and simple. This diff implements a new interface to add an op.
Here is the logic to add a new operator to the benchmark:
```
long_config = {}
short_config = {}
map_func
add_test(
    [long_config, short_config],
    map_func,
    [caffe2 op],
    [pt op],
)
```
Reviewed By: zheng-xq
Differential Revision: D14791191
fbshipit-source-id: ac6738507cf1b9d6013dc8e546a2022a9b177f05
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18740
Test utilities for writing Caffe2/PyTorch performance microbenchmarks. Brief description of the file structure:
* benchmark_core.py : core utilities for running microbenchmark tests
* benchmark_caffe2.py : Caffe2-specific benchmark utilities
* benchmark_pytorch.py : PyTorch-specific benchmark utilities
* benchmark_runner.py : main entry point. Currently it can run the microbenchmark tests in stand-alone mode. The next step is to integrate this with AI-PEP.
The utilities are located at https://github.com/pytorch/pytorch/tree/master/test to have access to both the Caffe2 and PyTorch Python frontends.
Include two operator microbenchmarks; support both Caffe2/PyTorch:
* MatMul
* Add
Reference: PyTorch benchmarks: https://github.com/pytorch/benchmark/tree/master/timing/python. In this work, we start with two example binary operators, MatMul and Add, but eventually we should cover unary operators as in the PyTorch benchmark repo.
Reviewed By: zheng-xq
Differential Revision: D13887111
fbshipit-source-id: b7a56b95448c9ec3e674b0de0ffb96af4439bfce