Commit Graph

7 Commits

Author SHA1 Message Date
Edward Z. Yang
dd3a77bc96 Apply UFMT to all files in benchmarks/ (#105928)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105928
Approved by: https://github.com/albanD
2023-07-26 01:18:48 +00:00
Elias Ellison
6694fdaccd Clean up profiling mode and profiling executor strategy (#73875)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73875

Previously we had a few settings:
- getExecutor - which toggled between Profiling Executor and Legacy
- getGraphOptimize - if true, overrides PE/Legacy to run with simple executor (no optimizations)
and then...
- getProfilingMode - which would set PE to 0 specializtions.

The last mode is redundant with getGraphOptimize, we should just remove it and use getGraphOptimize in these cases. It would lead to potentially invalid combinations of logic - what does mean if getProfilingMode is true but getExecutor is set to false ? This would lead to a bug in specialize_autograd_zero in this case, see: https://github.com/pytorch/pytorch/blob/master/torch%2Fcsrc%2Fjit%2Fpasses%2Fspecialize_autogradzero.cpp#L93.

The tests here are failing but get fixed with the PR above it, so i'll squash for landing.

Test Plan: Imported from OSS

Reviewed By: cpuhrsch

Differential Revision: D34938130

Pulled By: eellison

fbshipit-source-id: 1a9c0ae7f6d1cfddc2ed3499a5af611053ae5e1b
(cherry picked from commit cf69ce3d155ba7d334022c42fb2cee54bb088c23)
2022-03-29 18:38:51 +00:00
Nikolay Korovaiko
195ab5e864 remove non-default settings in fuser.py (#48862)
Summary:
I've noticed we are setting `_jit_set_num_profiled_runs` to 2 (which isn't our default) and sometimes we don't. We are also setting `_jit_set_bailout_depth` to 20 which **is** our default. I suggest we remove this logic altogether.
I did a quick run to see if there's any impact and thankfully, the numbers seem to be consistent, but we should try avoding testing configurations that aren't default or aren't  considered to become default.

 numactl -C 3 python -m fastrnns.bench --fuser=te --executor=profiling

non-defaults:

```
Namespace(cnns=None, cuda_pointwise_block_count=None, cuda_pointwise_block_size=None, cuda_pointwise_loop_level=None, device='cuda', executor='profiling', fuser='te', group=['cnns', 'rnns'], hiddenSize=512, inputSize=512, miniBatch=64, nloops=100, numLayers=1, print_json=None, rnns=None, sep=' ', seqLength=100, variable_lstms=False, warmup=10)
Benchmarking LSTMs...
            name          avg_fwd          std_fwd         info_fwd          avg_bwd          std_bwd         info_bwd
           cudnn            5.057          0.06287             None            7.322          0.07404             None
            aten            5.602          0.06303             None            13.64           0.4078             None
             jit            7.019          0.07995             None            13.77            0.554             None
      jit_premul            5.324          0.06203             None            12.01           0.2996             None
 jit_premul_bias            5.148          0.08061             None            11.62           0.4104             None
      jit_simple             6.69           0.2317             None            13.37           0.3791             None
  jit_multilayer            7.006            0.251             None            13.67           0.2239             None
              py            19.05           0.1119             None            28.28           0.6346             None

Benchmarking ResNets...
            name          avg_fwd          std_fwd         info_fwd          avg_bwd          std_bwd         info_bwd
        resnet18            8.712          0.01628             None            19.93          0.03512             None
    resnet18_jit            8.688          0.01374             None            19.79          0.07518             None
        resnet50            31.04          0.08049             None            66.44          0.08187             None
    resnet50_jit            31.11          0.07171             None            66.45          0.09157             None
```

defaults:
```
Namespace(cnns=None, cuda_pointwise_block_count=None, cuda_pointwise_block_size=None, cuda_pointwise_loop_level=None, device='cuda', executor='profiling', fuser='te', group=['cnns', 'rnns'], hiddenSize=512, inputSize=512, miniBatch=64, nloops=100, numLayers=1, print_json=None, rnns=None, sep=' ', seqLength=100, variable_lstms=False, warmup=10)
Benchmarking LSTMs...
            name          avg_fwd          std_fwd         info_fwd          avg_bwd          std_bwd         info_bwd
           cudnn            5.086            0.115             None            7.394           0.1743             None
            aten            5.611           0.2559             None            13.54            0.387             None
             jit            7.062           0.3358             None            13.24           0.3688             None
      jit_premul            5.379           0.2086             None            11.57           0.3987             None
 jit_premul_bias            5.202           0.2127             None            11.13          0.06748             None
      jit_simple            6.648          0.05794             None            12.84           0.3047             None
  jit_multilayer            6.964           0.1104             None            13.24           0.3283             None
              py            19.14          0.09959             None            28.17           0.4946             None

Benchmarking ResNets...
            name          avg_fwd          std_fwd         info_fwd          avg_bwd          std_bwd         info_bwd
        resnet18            8.713          0.01563             None            19.93          0.02759             None
    resnet18_jit            8.697          0.01792             None            19.78          0.06916             None
        resnet50            31.14          0.07431             None            66.57          0.07418             None
    resnet50_jit            31.21           0.0677             None            66.56          0.08655             None

```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48862

Reviewed By: bertmaher

Differential Revision: D25342097

Pulled By: Krovatkin

fbshipit-source-id: 8d2f72c2770793ec8cecee9dfab9aaaf2e1ad2b1
2020-12-05 20:58:39 -08:00
Mikhail Zolotukhin
bc5710f2f7 Benchmarks: tweak PE config settings. (#45349)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45349

Test Plan: Imported from OSS

Reviewed By: Krovatkin

Differential Revision: D23935518

Pulled By: ZolotukhinM

fbshipit-source-id: 5a7c508c6fc84eafbc23399f095d732b903510dc
2020-09-26 23:13:29 -07:00
Mikhail Zolotukhin
8cef7326f4 Benchmarks: add 'default' options for fuser and executor. (#45347)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45347

Test Plan: Imported from OSS

Reviewed By: Krovatkin

Differential Revision: D23935519

Pulled By: ZolotukhinM

fbshipit-source-id: 8323fafe7828683c4d29c12a1e5722adb6f945ff
2020-09-26 23:09:02 -07:00
Mikhail Zolotukhin
d11603de38 [TensorExpr] Benchmarks: set number of profiling runs to 2 for PE. (#44112)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44112

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D23500904

Pulled By: ZolotukhinM

fbshipit-source-id: d0dd54752b7ea5ae11f33e865c96d2d61e98d573
2020-09-03 11:29:35 -07:00
Bert Maher
33d51a9b32 Respect canFuseOn{CPU,GPU} in TE fuser (#43967)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43967

Test Plan: Imported from OSS

Reviewed By: asuhan

Differential Revision: D23469048

Pulled By: bertmaher

fbshipit-source-id: 1005a7ae08974059ff9d467492caa3a388070eeb
2020-09-02 18:00:25 -07:00