pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-06 12:20:52 +01:00

Author	SHA1	Message	Date
Xuehai Pan	c0ed38e644	[BE][Easy][3/19] enforce style for empty lines in import segments in `benchmarks/` (#129754 ) See https://github.com/pytorch/pytorch/pull/129751#issue-2380881501. Most changes are auto-generated by linter. You can review these PRs via: ```bash git diff --ignore-all-space --ignore-blank-lines HEAD~1 ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/129754 Approved by: https://github.com/ezyang	2024-07-17 14:34:42 +00:00
Xuehai Pan	26f4f10ac8	[5/N][Easy] fix typo for `usort` config in `pyproject.toml` (`kown` -> `known`): sort torch (#127126 ) The `usort` config in `pyproject.toml` has no effect due to a typo. Fixing the typo make `usort` do more and generate the changes in the PR. Except `pyproject.toml`, all changes are generated by `lintrunner -a --take UFMT --all-files`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/127126 Approved by: https://github.com/kit1980	2024-05-27 14:49:57 +00:00
PyTorch MergeBot	55c0ab2887	Revert "[5/N][Easy] fix typo for `usort` config in `pyproject.toml` (`kown` -> `known`): sort torch (#127126 )" This reverts commit `7763c83af6`. Reverted https://github.com/pytorch/pytorch/pull/127126 on behalf of https://github.com/XuehaiPan due to Broken CI ([comment](https://github.com/pytorch/pytorch/pull/127126#issuecomment-2133044286))	2024-05-27 09:22:08 +00:00
Xuehai Pan	7763c83af6	[5/N][Easy] fix typo for `usort` config in `pyproject.toml` (`kown` -> `known`): sort torch (#127126 ) The `usort` config in `pyproject.toml` has no effect due to a typo. Fixing the typo make `usort` do more and generate the changes in the PR. Except `pyproject.toml`, all changes are generated by `lintrunner -a --take UFMT --all-files`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/127126 Approved by: https://github.com/kit1980 ghstack dependencies: #127122, #127123, #127124, #127125	2024-05-27 04:22:18 +00:00
Edward Z. Yang	dd3a77bc96	Apply UFMT to all files in benchmarks/ (#105928 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/105928 Approved by: https://github.com/albanD	2023-07-26 01:18:48 +00:00
Yang Wang	8ff0b6fef8	[OpBenchMobile] Enable operator_benchmark to run the benchmark on mobile through AiBench (#47767 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47767 This diff implements the functionality of running benchmark on mobile on top of operator_benchmark framework. It does so through a few steps: 1. create a scripted module from existing benchmark case. 2. run mobile specific optimization pass on the scripted module 3. run the scripted module on AiBench by calling its Python API A small change in the way of writing a benchmark case is introduced so that both local and mobile run can share the same interface. The change is about having inputs as arguments of the `forward` function, so that mobile optimization pass can be run successfully (otherwise everything will be optimized away by constant propagation). Test Plan: ## local op_bench run buck run caffe2/benchmarks/operator_benchmark:benchmark_all_test -- --iterations 1 --warmup_iterations 1 buck run caffe2/benchmarks/operator_benchmark:benchmark_all_test -- --iterations 1 --warmup_iterations 1 --use_jit Exceptions: `py_module` op in `FakeQuantizePerTensorBaseOpBenchmark` and `FakeQuantizePerChannelBaseOpBenchmark` under JIT mode. These tests also failed in the base version ``` RuntimeError: Module 'FakeQuantizePerChannelOpBenchmark' has no attribute 'op_func' (This function exists as an attribute on the Python module, but we failed to compile it to a TorchScript function. The error stack is reproduced here: Python builtin <built-in method apply of FunctionMeta object at 0x619000c652a0> is currently not supported in Torchscript: File "/data/users/wangyang19/fbsource/fbcode/buck-out/dev/gen/caffe2/benchmarks/operator_benchmark/pt/quantization_test#link-tree/quantization_test.py", line 260 quant_min: int, quant_max: int ): return _LearnableFakeQuantizePerChannelOp.apply(input, scale, zero_point, axis, quant_min, quant_max, 1.0) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE : File "/data/users/wangyang19/fbsource/fbcode/buck-out/dev/gen/caffe2/benchmarks/operator_benchmark/pt/quantization_test#link-tree/quantization_test.py", line 313 axis: int, quant_min: int, quant_max: int ): return self.op_func(input, scale, zero_point, axis, quant_min, quant_max) ~~~~~~~~~~~~ <--- HERE ``` `_consume_op` typing mismatch: chunk, split, qobserver, sort in qunary. These will be fixed in D24774105 ## OSS test python3 -m benchmark_all_test --iterations 1 --warmup_iterations 1 --use_jit python3 -m benchmark_all_test --iterations 1 --warmup_iterations 1 ## saved module graph ``` module __torch__.mobile_benchmark_utils.OpBenchmarkMobile { parameters { } attributes { training = True num_iters = 1 benchmark = <__torch__.pt.add_test.___torch_mangle_4.AddBenchmark object at 0x6070001b8b50> } methods { method forward { graph(%self : __torch__.mobile_benchmark_utils.OpBenchmarkMobile): %12 : None = prim::Constant() # /data/users/wangyang19/fbsource/fbcode/buck-out/dev/gen/caffe2/benchmarks/operator_benchmark/fb/pt/mobile/benchmark_all_test_fbcode#link-tree/mobile_benchmark_utils.py:9:4 %4 : bool = prim::Constant[value=1]() # /data/users/wangyang19/fbsource/fbcode/buck-out/dev/gen/caffe2/benchmarks/operator_benchmark/fb/pt/mobile/benchmark_all_test_fbcode#link-tree/mobile_benchmark_utils.py:10:8 %1 : int = prim::GetAttr[name="num_iters"](%self) = prim::Loop(%1, %4) # /data/users/wangyang19/fbsource/fbcode/buck-out/dev/gen/caffe2/benchmarks/operator_benchmark/fb/pt/mobile/benchmark_all_test_fbcode#link-tree/mobile_benchmark_utils.py:10:8 block0(%i : int): %6 : __torch__.pt.add_test.___torch_mangle_4.AddBenchmark = prim::GetAttr[name="benchmark"](%self) %7 : __torch__.pt.add_test.___torch_mangle_4.AddBenchmark = prim::GetAttr[name="benchmark"](%self) %self.inputs_tuple : (Float(1, 1, 1, strides=[1, 1, 1], requires_grad=0, device=cpu), Float(1, 1, 1, strides=[1, 1, 1], requires_grad=0, device=cpu)) = prim::Constant[value=({0.48884}, {0.809042})]() %9 : Tensor, %10 : Tensor = prim::TupleUnpack(%self.inputs_tuple) %23 : int = prim::Constant[value=1]() %24 : Tensor = aten::add(%9, %10, %23) # /data/users/wangyang19/fbsource/fbcode/buck-out/dev/gen/caffe2/benchmarks/operator_benchmark/fb/pt/mobile/benchmark_all_test_fbcode#link-tree/pt/add_test.py:39:15 -> (%4) return (%12) } } submodules { module __torch__.pt.add_test.___torch_mangle_4.AddBenchmark { parameters { } attributes { mobile_optimized = True } methods { method forward { graph(%self : __torch__.pt.add_test.___torch_mangle_4.AddBenchmark, %input_one.1 : Tensor, %input_two.1 : Tensor): %3 : int = prim::Constant[value=1]() %4 : Tensor = aten::add(%input_one.1, %input_two.1, %3) # /data/users/wangyang19/fbsource/fbcode/buck-out/dev/gen/caffe2/benchmarks/operator_benchmark/fb/pt/mobile/benchmark_all_test_fbcode#link-tree/pt/add_test.py:39:15 return (%4) } method get_inputs { graph(%self : __torch__.pt.add_test.___torch_mangle_4.AddBenchmark): %self.inputs_tuple : (Float(1, 1, 1, strides=[1, 1, 1], requires_grad=0, device=cpu), Float(1, 1, 1, strides=[1, 1, 1], requires_grad=0, device=cpu)) = prim::Constant[value=({0.48884}, {0.809042})]() return (%self.inputs_tuple) } } submodules { } } } } ``` Reviewed By: kimishpatel Differential Revision: D24322214 fbshipit-source-id: 335317eca4f40c4083883eb41dc47caf25cbdfd1	2020-11-12 17:15:05 -08:00
Xiang Gao	20ac736200	Remove py2 compatible future imports (#44735 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44735 Reviewed By: mruberry Differential Revision: D23731306 Pulled By: ezyang fbshipit-source-id: 0ba009a99e475ddbe22981be8ac636f8a1c8b02f	2020-09-16 12:55:57 -07:00
Wojciech Baranowski	43331609a4	Port addmm, addbmm, addr to ATen (CUDA) (#38421 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/24536, fixes https://github.com/pytorch/pytorch/issues/24534 and fixes https://github.com/pytorch/pytorch/issues/24533 Pull Request resolved: https://github.com/pytorch/pytorch/pull/38421 Differential Revision: D22138333 Pulled By: VitalyFedyunin fbshipit-source-id: f4411d0df0a001bbb95089eb55fdcac3aba86700	2020-06-22 13:02:33 -07:00
Mingzhe Li	b68d1fc316	add small input shapes to some ops (#30617 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30617 as title Test Plan: buck run //caffe2/benchmarks/operator_benchmark:benchmark_all_test -- --iterations 1 --operator add,as_strided,cat,chunk,fill,linear,matmul,split Reviewed By: hl475 Differential Revision: D18764248 fbshipit-source-id: 510cf83542822acfa1b7b5e475b0cc7432f7ac19	2019-12-02 10:46:43 -08:00
Mingzhe Li	60a33cac2b	reduce input shapes of long tag in op bench (#29865 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29865 For some operators, the number of tests (forward + backward) could easily go above 100. Many of them could be redundant so this diff tries to reduce the number of shapes. Test Plan: ``` buck run //caffe2/benchmarks/operator_benchmark:benchmark_all_test -- --iterations 1 # ---------------------------------------- # PyTorch/Caffe2 Operator Micro-benchmarks # ---------------------------------------- # Tag : short # Benchmarking PyTorch: add # Mode: Eager # Name: add_M64_N64_K64_cpu # Input: M: 64, N: 64, K: 64, device: cpu Forward Execution Time (us) : 28418.926 ... Reviewed By: hl475 Differential Revision: D18520946 fbshipit-source-id: 1056d6d5a9c46bc2d508ff133039aefeb9d11c27	2019-11-14 20:19:09 -08:00
Mingzhe Li	e53b510773	add addmm op to the benchmark suite (#29783 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29783 Add addmm operator which reuses existing input shapes for the add operator. Test Plan: ``` buck run //caffe2/benchmarks/operator_benchmark/pt:add_test -- --iterations 1 # ---------------------------------------- # PyTorch/Caffe2 Operator Micro-benchmarks # ---------------------------------------- # Tag : short # Benchmarking PyTorch: addmm # Mode: Eager # Name: addmm_M64_N64_K64_cpu # Input: M: 64, N: 64, K: 64, device: cpu Forward Execution Time (us) : 759.237 # Benchmarking PyTorch: addmm # Mode: Eager # Name: addmm_M64_N64_K128_cpu # Input: M: 64, N: 64, K: 128, device: cpu Forward Execution Time (us) : 922.764 # Benchmarking PyTorch: addmm # Mode: Eager # Name: addmm_M64_N64_K64_cpu_bwdall # Input: M: 64, N: 64, K: 64, device: cpu Backward Execution Time (us) : 4689.546 # Benchmarking PyTorch: addmm # Mode: Eager # Name: addmm_M64_N64_K64_cpu_bwd1 # Input: M: 64, N: 64, K: 64, device: cpu Backward Execution Time (us) : 1700.093 # Benchmarking PyTorch: addmm # Mode: Eager # Name: addmm_M64_N64_K64_cpu_bwd2 # Input: M: 64, N: 64, K: 64, device: cpu Backward Execution Time (us) : 2947.427 # Benchmarking PyTorch: addmm # Mode: Eager # Name: addmm_M64_N64_K64_cpu_bwd3 # Input: M: 64, N: 64, K: 64, device: cpu Backward Execution Time (us) : 2518.043 # Benchmarking PyTorch: addmm # Mode: Eager # Name: addmm_M64_N64_K128_cpu_bwdall # Input: M: 64, N: 64, K: 128, device: cpu Backward Execution Time (us) : 5848.369 Reviewed By: bddppq Differential Revision: D18496476 fbshipit-source-id: 4f1c116a2676a64106afa958e8c8a8e109f35a4a	2019-11-14 10:02:55 -08:00
Mingzhe Li	f31d6c70fe	reduce op bench binary size (#29496 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29496 This diff reduces the binary size of op benchmark by avoiding creating all tests at once. Test Plan: ``` buck run //caffe2/benchmarks/operator_benchmark:benchmark_all_test # ---------------------------------------- # PyTorch/Caffe2 Operator Micro-benchmarks # ---------------------------------------- # Tag : long # Benchmarking PyTorch: add # Mode: Eager # Name: add_M8_N2_K1_cpu # Input: M: 8, N: 2, K: 1, device: cpu Forward Execution Time (us) : 160.781 # Benchmarking PyTorch: add # Mode: Eager # Name: add_M8_N2_K8_cpu # Input: M: 8, N: 2, K: 8, device: cpu Forward Execution Time (us) : 158.941 Reviewed By: hl475 Differential Revision: D18412342 fbshipit-source-id: 5db647019ae8c2e4d6ab361b54b63cf88236b1ae	2019-11-08 22:15:12 -08:00
Mingzhe Li	e86450620d	add cuda to all op benchmark (#29285 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29285 as title Test Plan: ``` buck run mode/dev-nosan //caffe2/benchmarks/operator_benchmark:benchmark_all_test -- --iterations 1 # ---------------------------------------- # PyTorch/Caffe2 Operator Micro-benchmarks # ---------------------------------------- # Tag : short # Benchmarking PyTorch: ConvTranspose2d # Mode: Eager # Name: ConvTranspose2d_kernel3_out_c256_H16_in_c256_N1_stride1_W16_cpu # Input: kernel: 3, out_c: 256, H: 16, in_c: 256, N: 1, stride: 1, W: 16, device: cpu Forward Execution Time (us) : 10434.151 Reviewed By: hl475 Differential Revision: D18338258 fbshipit-source-id: 944e87d1ec70daadb205faaf2825d4a2202086c5	2019-11-06 09:37:00 -08:00
Mingzhe Li	b693c5d6a0	replace add benchmark with add_ (#29050 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29050 as title Test Plan: ``` buck run //caffe2/benchmarks/operator_benchmark/pt:add_test # ---------------------------------------- # PyTorch/Caffe2 Operator Micro-benchmarks # ---------------------------------------- # Tag : short # Benchmarking PyTorch: add_ # Mode: Eager # Name: add__M64_N64_K64_cpu # Input: M: 64, N: 64, K: 64, device: cpu Forward Execution Time (us) : 31475.766 Reviewed By: hl475 Differential Revision: D18265767 fbshipit-source-id: 7aaa04f5fa5b2dd58bbc1aa045693314032e0ff0	2019-11-01 13:08:27 -07:00
Mingzhe Li	db15c2ba20	unify add benchmark format (#28891 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28891 as title Test Plan: ``` buck run mode/opt //caffe2/benchmarks/operator_benchmark/pt:add_test # ---------------------------------------- # PyTorch/Caffe2 Operator Micro-benchmarks # ---------------------------------------- # Tag : short # Benchmarking PyTorch: add # Mode: Eager # Name: add_M64_N64_K64_cpu # Input: M: 64, N: 64, K: 64, device: cpu Forward Execution Time (us) : 125.279 ... Reviewed By: hl475 Differential Revision: D18226789 fbshipit-source-id: 0cc51c6691533b02f662d4b6108916455f3a5b95	2019-10-30 15:53:25 -07:00
Mingzhe Li	38a3eabd3e	remove cuda from add_test Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27698 Test Plan: ``` buck run caffe2/benchmarks/operator_benchmark/pt:add_test -- --iterations 3 # ---------------------------------------- # PyTorch/Caffe2 Operator Micro-benchmarks # ---------------------------------------- # Tag : short # Benchmarking PyTorch: add # Mode: Eager # Name: add_M64_N64_K64_cpu # Input: M: 64, N: 64, K: 64, device: cpu Forward Execution Time (us) : 29691.940 # Benchmarking PyTorch: add # Mode: Eager # Name: add_M64_N64_K128_cpu # Input: M: 64, N: 64, K: 128, device: cpu Forward Execution Time (us) : 60820.813 Reviewed By: hl475 Differential Revision: D17855731 fbshipit-source-id: c64c530f4dbcb5b4132a88894b24e5658aa49d66	2019-10-10 08:32:04 -07:00
Mingzhe Li	c1ed0150c5	canonical example of torch.add benchmark (#23402 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23402 This diff tries to make torch.add as a canonical example for op benchmark. Once it lands, we will also modify all other op benchmarks to be uniform with this example. With that, when people are adding new ops, they can copy paste any existing code. Test Plan: buck run mode/dev-nosan caffe2/benchmarks/operator_benchmark/pt:add_test -- --iterations 3 ``` # ---------------------------------------- # PyTorch/Caffe2 Operator Micro-benchmarks # ---------------------------------------- # Tag : short # Benchmarking PyTorch: add # Mode: Eager # Name: add_M8_N16_K32_devicecpu # Input: M: 8, N: 16, K: 32, device: cpu Forward Execution Time (us) : 146.586 # Benchmarking PyTorch: add # Mode: Eager # Name: add_M8_N16_K32_devicecuda # Input: M: 8, N: 16, K: 32, device: cuda Forward Execution Time (us) : 92.151 # Benchmarking PyTorch: add # Mode: Eager # Name: add_M16_N16_K64_devicecpu # Input: M: 16, N: 16, K: 64, device: cpu Forward Execution Time (us) : 428.421 # Benchmarking PyTorch: add # Mode: Eager # Name: add_M16_N16_K64_devicecuda # Input: M: 16, N: 16, K: 64, device: cuda Forward Execution Time (us) : 89.811 # Benchmarking PyTorch: add # Mode: Eager # Name: add_M64_N64_K128_devicecpu # Input: M: 64, N: 64, K: 128, device: cpu Forward Execution Time (us) : 11857.012 # Benchmarking PyTorch: add # Mode: Eager # Name: add_M64_N64_K128_devicecuda # Input: M: 64, N: 64, K: 128, device: cuda Forward Execution Time (us) : 93.918 # Benchmarking PyTorch: add # Mode: Eager # Name: add_M8_N16_K32_devicecpu_bwdall # Input: M: 8, N: 16, K: 32, device: cpu Backward Execution Time (us) : 990.125 # Benchmarking PyTorch: add # Mode: Eager # Name: add_M8_N16_K32_devicecpu_bwd1 # Input: M: 8, N: 16, K: 32, device: cpu Backward Execution Time (us) : 781.217 # Benchmarking PyTorch: add # Mode: Eager # Name: add_M8_N16_K32_devicecpu_bwd2 # Input: M: 8, N: 16, K: 32, device: cpu Backward Execution Time (us) : 777.307 ``` Reviewed By: zheng-xq Differential Revision: D16501974 fbshipit-source-id: f1eec010eabf11ce4fcf6cfe6f85cd5241a7022d	2019-10-09 11:24:10 -07:00
Mingzhe Li	31a6ff46c1	change input shape to reduce variation (#27548 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27548 as title Test Plan: i_dont_want_it Reviewed By: hl475 Differential Revision: D17811295 fbshipit-source-id: 3be957f6f3eaa464ebf4f5bd7c07d096ae4eae8c	2019-10-08 11:45:06 -07:00
Huamin Li	1c81d9006a	increase input shape to reduce variance (#25812 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/25812 as title Test Plan: ``` [huaminli@devvm2388.ftw3 ~/fbsource/fbcode] buck run mode/dev-nosan caffe2/benchmarks/operator_benchmark:benchmark_all_test -- --operators None --iterations 3 ``` last few lines of the output P109238440 Reviewed By: mingzhe09088 Differential Revision: D17246792 fbshipit-source-id: d93ee5f404164d32210968997c6ea63b82058d2a	2019-09-07 06:25:26 -07:00
Mingzhe Li	b453fd9916	separate input shapes to reduce default execution time (#24136 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/24136 This diff aims to reduce the execution of benchmark_all_test which runs all the supported operator benchmarks. In the default run, only one shape of each operator will be benchmarked. The rest of the benchmarks can be triggered with tag_filter flag. Reviewed By: hl475 Differential Revision: D16736448 fbshipit-source-id: 33bd86f6fc2610f87f24240ad559fb11d3e35e89	2019-08-09 17:09:21 -07:00
Mingzhe Li	a5cf6d5100	reorganize op bench directory (#21543 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21543 No code change in this diff. Reviewed By: hl475 Differential Revision: D15721419 fbshipit-source-id: 06212cc882f5297064153417dc4d80bce9ec2667	2019-06-07 16:06:51 -07:00

21 Commits