Commit Graph

21 Commits

Author SHA1 Message Date
Xuehai Pan
c0ed38e644 [BE][Easy][3/19] enforce style for empty lines in import segments in benchmarks/ (#129754)
See https://github.com/pytorch/pytorch/pull/129751#issue-2380881501. Most changes are auto-generated by linter.

You can review these PRs via:

```bash
git diff --ignore-all-space --ignore-blank-lines HEAD~1
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129754
Approved by: https://github.com/ezyang
2024-07-17 14:34:42 +00:00
Xuehai Pan
26f4f10ac8 [5/N][Easy] fix typo for usort config in pyproject.toml (kown -> known): sort torch (#127126)
The `usort` config in `pyproject.toml` has no effect due to a typo. Fixing the typo makes `usort` do more and generates the changes in this PR. Apart from `pyproject.toml`, all changes are generated by `lintrunner -a --take UFMT --all-files`.
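For reference, a corrected `usort` section in `pyproject.toml` might look like the sketch below. The exact first-party module list is illustrative; the point is that the section key must be spelled `known`, not `kown`, for `usort` to pick it up:

```toml
# "known" was previously misspelled "kown", so usort silently ignored
# this section and fell back to its default categorization.
[tool.usort.known]
first_party = ["caffe2", "torch", "torchgen", "functorch", "test"]
```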

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127126
Approved by: https://github.com/kit1980
2024-05-27 14:49:57 +00:00
PyTorch MergeBot
55c0ab2887 Revert "[5/N][Easy] fix typo for usort config in pyproject.toml (kown -> known): sort torch (#127126)"
This reverts commit 7763c83af6.

Reverted https://github.com/pytorch/pytorch/pull/127126 on behalf of https://github.com/XuehaiPan due to Broken CI ([comment](https://github.com/pytorch/pytorch/pull/127126#issuecomment-2133044286))
2024-05-27 09:22:08 +00:00
Xuehai Pan
7763c83af6 [5/N][Easy] fix typo for usort config in pyproject.toml (kown -> known): sort torch (#127126)
The `usort` config in `pyproject.toml` has no effect due to a typo. Fixing the typo makes `usort` do more and generates the changes in this PR. Apart from `pyproject.toml`, all changes are generated by `lintrunner -a --take UFMT --all-files`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127126
Approved by: https://github.com/kit1980
ghstack dependencies: #127122, #127123, #127124, #127125
2024-05-27 04:22:18 +00:00
Edward Z. Yang
dd3a77bc96 Apply UFMT to all files in benchmarks/ (#105928)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105928
Approved by: https://github.com/albanD
2023-07-26 01:18:48 +00:00
Yang Wang
8ff0b6fef8 [OpBenchMobile] Enable operator_benchmark to run the benchmark on mobile through AiBench (#47767)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47767

This diff implements the functionality of running benchmarks on mobile on top of the operator_benchmark framework. It does so in a few steps:

1. Create a scripted module from an existing benchmark case.
2. Run the mobile-specific optimization pass on the scripted module.
3. Run the scripted module on AiBench by calling its Python API.

A small change to the way a benchmark case is written is introduced so that local and mobile runs can share the same interface: inputs become arguments of the `forward` function, so that the mobile optimization pass can run successfully (otherwise everything would be optimized away by constant propagation).
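The interface change described above can be sketched in plain Python (class and method names here are illustrative, not the actual operator_benchmark API):

```python
# Before: inputs live on self and forward() reads them. Under a mobile
# optimization pass, constant attributes can be folded away, so the
# benchmarked op effectively disappears from the graph.
class AddBenchmarkOld:
    def __init__(self):
        self.input_one = 2.0
        self.input_two = 3.0

    def forward(self):
        return self.input_one + self.input_two


# After: inputs are arguments of forward(), so they remain graph inputs
# and constant propagation cannot eliminate the op being measured.
class AddBenchmarkNew:
    def get_inputs(self):
        return (2.0, 3.0)

    def forward(self, input_one, input_two):
        return input_one + input_two


# A runner can now drive both local and mobile runs through one interface.
def run_once(benchmark):
    return benchmark.forward(*benchmark.get_inputs())
```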

Test Plan:
## local op_bench run

buck run caffe2/benchmarks/operator_benchmark:benchmark_all_test --  --iterations 1 --warmup_iterations 1

buck run caffe2/benchmarks/operator_benchmark:benchmark_all_test --  --iterations 1 --warmup_iterations 1 --use_jit

Exceptions: the `py_module` op in `FakeQuantizePerTensorBaseOpBenchmark` and `FakeQuantizePerChannelBaseOpBenchmark` under JIT mode. These tests also failed in the base version:

```
RuntimeError:
Module 'FakeQuantizePerChannelOpBenchmark' has no attribute 'op_func' (This function exists as an attribute on the Python module, but we failed to compile it to a TorchScript function.
The error stack is reproduced here:

Python builtin <built-in method apply of FunctionMeta object at 0x619000c652a0> is currently not supported in Torchscript:
  File "/data/users/wangyang19/fbsource/fbcode/buck-out/dev/gen/caffe2/benchmarks/operator_benchmark/pt/quantization_test#link-tree/quantization_test.py", line 260
    quant_min: int, quant_max: int
):
    return _LearnableFakeQuantizePerChannelOp.apply(input, scale, zero_point, axis, quant_min, quant_max, 1.0)
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
:
  File "/data/users/wangyang19/fbsource/fbcode/buck-out/dev/gen/caffe2/benchmarks/operator_benchmark/pt/quantization_test#link-tree/quantization_test.py", line 313
        axis: int, quant_min: int, quant_max: int
    ):
        return self.op_func(input, scale, zero_point, axis, quant_min, quant_max)
               ~~~~~~~~~~~~ <--- HERE
```

The `_consume_op` typing mismatches (chunk, split, qobserver, and sort in qunary) will be fixed in D24774105.

## OSS test

python3 -m benchmark_all_test --iterations 1 --warmup_iterations 1 --use_jit
python3 -m benchmark_all_test --iterations 1 --warmup_iterations 1

## saved module graph
```
module __torch__.mobile_benchmark_utils.OpBenchmarkMobile {
  parameters {
  }
  attributes {
    training = True
    num_iters = 1
    benchmark = <__torch__.pt.add_test.___torch_mangle_4.AddBenchmark object at 0x6070001b8b50>
  }
  methods {
    method forward {
      graph(%self : __torch__.mobile_benchmark_utils.OpBenchmarkMobile):
        %12 : None = prim::Constant() # /data/users/wangyang19/fbsource/fbcode/buck-out/dev/gen/caffe2/benchmarks/operator_benchmark/fb/pt/mobile/benchmark_all_test_fbcode#link-tree/mobile_benchmark_utils.py:9:4
        %4 : bool = prim::Constant[value=1]() # /data/users/wangyang19/fbsource/fbcode/buck-out/dev/gen/caffe2/benchmarks/operator_benchmark/fb/pt/mobile/benchmark_all_test_fbcode#link-tree/mobile_benchmark_utils.py:10:8
        %1 : int = prim::GetAttr[name="num_iters"](%self)
         = prim::Loop(%1, %4) # /data/users/wangyang19/fbsource/fbcode/buck-out/dev/gen/caffe2/benchmarks/operator_benchmark/fb/pt/mobile/benchmark_all_test_fbcode#link-tree/mobile_benchmark_utils.py:10:8
          block0(%i : int):
            %6 : __torch__.pt.add_test.___torch_mangle_4.AddBenchmark = prim::GetAttr[name="benchmark"](%self)
            %7 : __torch__.pt.add_test.___torch_mangle_4.AddBenchmark = prim::GetAttr[name="benchmark"](%self)
            %self.inputs_tuple : (Float(1, 1, 1, strides=[1, 1, 1], requires_grad=0, device=cpu), Float(1, 1, 1, strides=[1, 1, 1], requires_grad=0, device=cpu)) = prim::Constant[value=({0.48884}, {0.809042})]()
            %9 : Tensor, %10 : Tensor = prim::TupleUnpack(%self.inputs_tuple)
            %23 : int = prim::Constant[value=1]()
            %24 : Tensor = aten::add(%9, %10, %23) # /data/users/wangyang19/fbsource/fbcode/buck-out/dev/gen/caffe2/benchmarks/operator_benchmark/fb/pt/mobile/benchmark_all_test_fbcode#link-tree/pt/add_test.py:39:15
            -> (%4)
        return (%12)

    }
  }
  submodules {
    module __torch__.pt.add_test.___torch_mangle_4.AddBenchmark {
      parameters {
      }
      attributes {
        mobile_optimized = True
      }
      methods {
        method forward {
          graph(%self : __torch__.pt.add_test.___torch_mangle_4.AddBenchmark,
                %input_one.1 : Tensor,
                %input_two.1 : Tensor):
            %3 : int = prim::Constant[value=1]()
            %4 : Tensor = aten::add(%input_one.1, %input_two.1, %3) # /data/users/wangyang19/fbsource/fbcode/buck-out/dev/gen/caffe2/benchmarks/operator_benchmark/fb/pt/mobile/benchmark_all_test_fbcode#link-tree/pt/add_test.py:39:15
            return (%4)

        }
        method get_inputs {
          graph(%self : __torch__.pt.add_test.___torch_mangle_4.AddBenchmark):
            %self.inputs_tuple : (Float(1, 1, 1, strides=[1, 1, 1], requires_grad=0, device=cpu), Float(1, 1, 1, strides=[1, 1, 1], requires_grad=0, device=cpu)) = prim::Constant[value=({0.48884}, {0.809042})]()
            return (%self.inputs_tuple)

        }
      }
      submodules {
      }
    }
  }
}

```

Reviewed By: kimishpatel

Differential Revision: D24322214

fbshipit-source-id: 335317eca4f40c4083883eb41dc47caf25cbdfd1
2020-11-12 17:15:05 -08:00
Xiang Gao
20ac736200 Remove py2 compatible future imports (#44735)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44735

Reviewed By: mruberry

Differential Revision: D23731306

Pulled By: ezyang

fbshipit-source-id: 0ba009a99e475ddbe22981be8ac636f8a1c8b02f
2020-09-16 12:55:57 -07:00
Wojciech Baranowski
43331609a4 Port addmm, addbmm, addr to ATen (CUDA) (#38421)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/24536, fixes https://github.com/pytorch/pytorch/issues/24534 and fixes https://github.com/pytorch/pytorch/issues/24533
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38421

Differential Revision: D22138333

Pulled By: VitalyFedyunin

fbshipit-source-id: f4411d0df0a001bbb95089eb55fdcac3aba86700
2020-06-22 13:02:33 -07:00
Mingzhe Li
b68d1fc316 add small input shapes to some ops (#30617)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30617

as title

Test Plan: buck run //caffe2/benchmarks/operator_benchmark:benchmark_all_test -- --iterations 1 --operator add,as_strided,cat,chunk,fill,linear,matmul,split

Reviewed By: hl475

Differential Revision: D18764248

fbshipit-source-id: 510cf83542822acfa1b7b5e475b0cc7432f7ac19
2019-12-02 10:46:43 -08:00
Mingzhe Li
60a33cac2b reduce input shapes of long tag in op bench (#29865)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29865

For some operators, the number of tests (forward + backward) can easily go above 100. Many of these are redundant, so this diff reduces the number of shapes.
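To see why the test count balloons, consider the cross product of shapes, devices, and forward/backward modes. A sketch with hypothetical numbers (the shape and mode lists are illustrative, not the suite's actual configs):

```python
import itertools

# Hypothetical config space for a single operator.
shapes = [(8, 2, 1), (8, 2, 8), (64, 64, 64), (64, 64, 128), (512, 512, 512)]
devices = ["cpu", "cuda"]
modes = ["fwd", "bwdall", "bwd1", "bwd2", "bwd3"]

# Every combination becomes one benchmark test.
tests = [
    f"add_M{m}_N{n}_K{k}_{dev}_{mode}"
    for (m, n, k), dev, mode in itertools.product(shapes, devices, modes)
]
print(len(tests))  # 5 shapes * 2 devices * 5 modes = 50 tests for one op
```

Trimming even a few shapes cuts the total multiplicatively, which is why reducing the shape list is the first lever pulled here.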

Test Plan:
```
buck run //caffe2/benchmarks/operator_benchmark:benchmark_all_test -- --iterations 1
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M64_N64_K64_cpu
# Input: M: 64, N: 64, K: 64, device: cpu
Forward Execution Time (us) : 28418.926
...
```

Reviewed By: hl475

Differential Revision: D18520946

fbshipit-source-id: 1056d6d5a9c46bc2d508ff133039aefeb9d11c27
2019-11-14 20:19:09 -08:00
Mingzhe Li
e53b510773 add addmm op to the benchmark suite (#29783)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29783

Add the addmm operator, reusing the existing input shapes from the add operator.

Test Plan:
```
buck run //caffe2/benchmarks/operator_benchmark/pt:add_test -- --iterations 1
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: addmm
# Mode: Eager
# Name: addmm_M64_N64_K64_cpu
# Input: M: 64, N: 64, K: 64, device: cpu
Forward Execution Time (us) : 759.237

# Benchmarking PyTorch: addmm
# Mode: Eager
# Name: addmm_M64_N64_K128_cpu
# Input: M: 64, N: 64, K: 128, device: cpu
Forward Execution Time (us) : 922.764

# Benchmarking PyTorch: addmm
# Mode: Eager
# Name: addmm_M64_N64_K64_cpu_bwdall
# Input: M: 64, N: 64, K: 64, device: cpu
Backward Execution Time (us) : 4689.546

# Benchmarking PyTorch: addmm
# Mode: Eager
# Name: addmm_M64_N64_K64_cpu_bwd1
# Input: M: 64, N: 64, K: 64, device: cpu
Backward Execution Time (us) : 1700.093

# Benchmarking PyTorch: addmm
# Mode: Eager
# Name: addmm_M64_N64_K64_cpu_bwd2
# Input: M: 64, N: 64, K: 64, device: cpu
Backward Execution Time (us) : 2947.427

# Benchmarking PyTorch: addmm
# Mode: Eager
# Name: addmm_M64_N64_K64_cpu_bwd3
# Input: M: 64, N: 64, K: 64, device: cpu
Backward Execution Time (us) : 2518.043

# Benchmarking PyTorch: addmm
# Mode: Eager
# Name: addmm_M64_N64_K128_cpu_bwdall
# Input: M: 64, N: 64, K: 128, device: cpu
Backward Execution Time (us) : 5848.369
```

Reviewed By: bddppq

Differential Revision: D18496476

fbshipit-source-id: 4f1c116a2676a64106afa958e8c8a8e109f35a4a
2019-11-14 10:02:55 -08:00
Mingzhe Li
f31d6c70fe reduce op bench binary size (#29496)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29496

This diff reduces the binary size of the op benchmark suite by avoiding creating all tests at once.

Test Plan:
```
buck run //caffe2/benchmarks/operator_benchmark:benchmark_all_test

# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : long

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M8_N2_K1_cpu
# Input: M: 8, N: 2, K: 1, device: cpu
Forward Execution Time (us) : 160.781

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M8_N2_K8_cpu
# Input: M: 8, N: 2, K: 8, device: cpu
Forward Execution Time (us) : 158.941
```

Reviewed By: hl475

Differential Revision: D18412342

fbshipit-source-id: 5db647019ae8c2e4d6ab361b54b63cf88236b1ae
2019-11-08 22:15:12 -08:00
Mingzhe Li
e86450620d add cuda to all op benchmark (#29285)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29285

as title

Test Plan:
```
buck run mode/dev-nosan //caffe2/benchmarks/operator_benchmark:benchmark_all_test -- --iterations 1

# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: ConvTranspose2d
# Mode: Eager
# Name: ConvTranspose2d_kernel3_out_c256_H16_in_c256_N1_stride1_W16_cpu
# Input: kernel: 3, out_c: 256, H: 16, in_c: 256, N: 1, stride: 1, W: 16, device: cpu
Forward Execution Time (us) : 10434.151
```

Reviewed By: hl475

Differential Revision: D18338258

fbshipit-source-id: 944e87d1ec70daadb205faaf2825d4a2202086c5
2019-11-06 09:37:00 -08:00
Mingzhe Li
b693c5d6a0 replace add benchmark with add_ (#29050)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29050

as title

Test Plan:
```
buck run //caffe2/benchmarks/operator_benchmark/pt:add_test
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: add_
# Mode: Eager
# Name: add__M64_N64_K64_cpu
# Input: M: 64, N: 64, K: 64, device: cpu
Forward Execution Time (us) : 31475.766
```

Reviewed By: hl475

Differential Revision: D18265767

fbshipit-source-id: 7aaa04f5fa5b2dd58bbc1aa045693314032e0ff0
2019-11-01 13:08:27 -07:00
Mingzhe Li
db15c2ba20 unify add benchmark format (#28891)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28891

as title

Test Plan:
```
buck run mode/opt //caffe2/benchmarks/operator_benchmark/pt:add_test
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M64_N64_K64_cpu
# Input: M: 64, N: 64, K: 64, device: cpu
Forward Execution Time (us) : 125.279
...
```

Reviewed By: hl475

Differential Revision: D18226789

fbshipit-source-id: 0cc51c6691533b02f662d4b6108916455f3a5b95
2019-10-30 15:53:25 -07:00
Mingzhe Li
38a3eabd3e remove cuda from add_test
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27698

Test Plan:
```
buck run caffe2/benchmarks/operator_benchmark/pt:add_test -- --iterations 3

# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M64_N64_K64_cpu
# Input: M: 64, N: 64, K: 64, device: cpu
Forward Execution Time (us) : 29691.940

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M64_N64_K128_cpu
# Input: M: 64, N: 64, K: 128, device: cpu
Forward Execution Time (us) : 60820.813
```

Reviewed By: hl475

Differential Revision: D17855731

fbshipit-source-id: c64c530f4dbcb5b4132a88894b24e5658aa49d66
2019-10-10 08:32:04 -07:00
Mingzhe Li
c1ed0150c5 canonical example of torch.add benchmark (#23402)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23402

This diff makes torch.add a canonical example for op benchmarks. Once it lands, we will also modify all other op benchmarks to be uniform with this example, so that when people add new ops, they can copy-paste any existing code.

Test Plan:
buck run mode/dev-nosan caffe2/benchmarks/operator_benchmark/pt:add_test -- --iterations 3

```
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M8_N16_K32_devicecpu
# Input: M: 8, N: 16, K: 32, device: cpu
Forward Execution Time (us) : 146.586

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M8_N16_K32_devicecuda
# Input: M: 8, N: 16, K: 32, device: cuda
Forward Execution Time (us) : 92.151

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M16_N16_K64_devicecpu
# Input: M: 16, N: 16, K: 64, device: cpu
Forward Execution Time (us) : 428.421

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M16_N16_K64_devicecuda
# Input: M: 16, N: 16, K: 64, device: cuda
Forward Execution Time (us) : 89.811

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M64_N64_K128_devicecpu
# Input: M: 64, N: 64, K: 128, device: cpu
Forward Execution Time (us) : 11857.012

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M64_N64_K128_devicecuda
# Input: M: 64, N: 64, K: 128, device: cuda
Forward Execution Time (us) : 93.918

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M8_N16_K32_devicecpu_bwdall
# Input: M: 8, N: 16, K: 32, device: cpu
Backward Execution Time (us) : 990.125

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M8_N16_K32_devicecpu_bwd1
# Input: M: 8, N: 16, K: 32, device: cpu
Backward Execution Time (us) : 781.217

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M8_N16_K32_devicecpu_bwd2
# Input: M: 8, N: 16, K: 32, device: cpu
Backward Execution Time (us) : 777.307
```

Reviewed By: zheng-xq

Differential Revision: D16501974

fbshipit-source-id: f1eec010eabf11ce4fcf6cfe6f85cd5241a7022d
2019-10-09 11:24:10 -07:00
Mingzhe Li
31a6ff46c1 change input shape to reduce variation (#27548)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27548

as title

Test Plan: i_dont_want_it

Reviewed By: hl475

Differential Revision: D17811295

fbshipit-source-id: 3be957f6f3eaa464ebf4f5bd7c07d096ae4eae8c
2019-10-08 11:45:06 -07:00
Huamin Li
1c81d9006a increase input shape to reduce variance (#25812)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25812

as title

Test Plan:
```
[huaminli@devvm2388.ftw3 ~/fbsource/fbcode] buck run mode/dev-nosan caffe2/benchmarks/operator_benchmark:benchmark_all_test -- --operators None --iterations 3
```
last few lines of the output P109238440

Reviewed By: mingzhe09088

Differential Revision: D17246792

fbshipit-source-id: d93ee5f404164d32210968997c6ea63b82058d2a
2019-09-07 06:25:26 -07:00
Mingzhe Li
b453fd9916 separate input shapes to reduce default execution time (#24136)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24136

This diff aims to reduce the execution time of benchmark_all_test, which runs all the supported operator benchmarks. In the default run, only one shape of each operator is benchmarked; the rest of the benchmarks can be triggered with the tag_filter flag.

Reviewed By: hl475

Differential Revision: D16736448

fbshipit-source-id: 33bd86f6fc2610f87f24240ad559fb11d3e35e89
2019-08-09 17:09:21 -07:00
Mingzhe Li
a5cf6d5100 reorganize op bench directory (#21543)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21543

No code change in this diff.

Reviewed By: hl475

Differential Revision: D15721419

fbshipit-source-id: 06212cc882f5297064153417dc4d80bce9ec2667
2019-06-07 16:06:51 -07:00