pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

..
tensorexpr	[nnc] Tweak log_nnc_sleef so vectorization kicks in (#51491)	2021-02-01 16:35:37 -08:00

Bert Maher a23e82df10 [nnc] Tweak log_nnc_sleef so vectorization kicks in (#51491)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51491

The vectorizer heuristic is pretty dumb and only kicks in if the
unroll factor is exactly 8 or 4.

It's still slower than the direct implementation, which isn't surprising.

ghstack-source-id: 120783426

Test Plan:
`buck run mode/opt //caffe2/benchmarks/cpp/tensorexpr:tensorexpr_bench`

Before:
```
---------------------------------------------------------------------------
Benchmark                    Time           CPU Iterations UserCounters...
---------------------------------------------------------------------------
log_nnc_sleef/64           438 ns        438 ns    1795511 log/s=146.259M/s
log_nnc_sleef/512         3196 ns       3195 ns     210032 log/s=160.235M/s
log_nnc_sleef/8192       77467 ns      77466 ns       8859 log/s=105.749M/s
log_nnc_sleef/32768     310206 ns     310202 ns       2170 log/s=105.634M/s
log_nnc_fast/64            100 ns        100 ns    7281074 log/s=637.144M/s
log_nnc_fast/512           546 ns        546 ns    1335816 log/s=938.361M/s
log_nnc_fast/8192         7360 ns       7359 ns      91971 log/s=1.11316G/s
log_nnc_fast/32768       30793 ns      30792 ns      22633 log/s=1064.17M/s
log_aten/64                427 ns        427 ns    1634897 log/s=150.021M/s
log_aten/512               796 ns        796 ns     877318 log/s=643.566M/s
log_aten/8192             6690 ns       6690 ns     102649 log/s=1.22452G/s
log_aten/32768           25357 ns      25350 ns      27808 log/s=1.29263G/s
```

After:
```
---------------------------------------------------------------------------
Benchmark                    Time           CPU Iterations UserCounters...
---------------------------------------------------------------------------
log_nnc_sleef/64           189 ns        188 ns    3872475 log/s=340.585M/s
log_nnc_sleef/512         1307 ns       1307 ns     557770 log/s=391.709M/s
log_nnc_sleef/8192       20259 ns      20257 ns      34240 log/s=404.404M/s
log_nnc_sleef/32768      81556 ns      81470 ns       8767 log/s=402.209M/s
log_nnc_fast/64            110 ns        110 ns    6564558 log/s=581.116M/s
log_nnc_fast/512           554 ns        554 ns    1279304 log/s=923.376M/s
log_nnc_fast/8192         7774 ns       7774 ns      91421 log/s=1053.75M/s
log_nnc_fast/32768       31008 ns      31006 ns      21279 log/s=1056.83M/s
```
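
For a quick read of the two tables, here is a small sketch that computes the log_nnc_sleef speedup and the remaining gap to log_nnc_fast from the wall-clock times above. The variable and function names are made up for illustration, and comparing against log_nnc_fast is an assumption about which rows count as the "direct implementation". On these numbers the speedup is roughly 2.3x at 64 elements and about 3.8x at 8192 and 32768, while log_nnc_sleef remains roughly 1.7x to 2.6x slower than log_nnc_fast.

```python
# Wall-clock times in ns, copied from the Before/After tables above.
before_sleef = {64: 438, 512: 3196, 8192: 77467, 32768: 310206}
after_sleef = {64: 189, 512: 1307, 8192: 20259, 32768: 81556}
after_fast = {64: 110, 512: 554, 8192: 7774, 32768: 31008}


def ratio(numerator, denominator):
    """Per-size ratio of two timing tables keyed by problem size."""
    return {n: numerator[n] / denominator[n] for n in numerator}


# How much faster log_nnc_sleef became after the change.
speedup = ratio(before_sleef, after_sleef)

# How much slower log_nnc_sleef still is than log_nnc_fast
# (assumed here to be the "direct implementation").
gap = ratio(after_sleef, after_fast)

for n in sorted(speedup):
    print(f"N={n:>6}: speedup {speedup[n]:.2f}x, "
          f"still {gap[n]:.2f}x slower than log_nnc_fast")
```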

Reviewed By: bwasti

Differential Revision: D26139067

fbshipit-source-id: db31897ee9922695ff9dff4ff46e3d3fbd61f4c2

2021-02-01 16:35:37 -08:00
