pytorch/benchmarks/cpp
Bert Maher fbeb8b4992 [nnc] Speed up batchnorm benchmark
Summary:
Use a better schedule: fuse and parallelize the N and C loops, and
fuse and vectorize the H and W loops.

```
-----------------------------------------------
 N/C/H/W               ATen               NNC
-----------------------------------------------
1/64/112/112          45449 ns         36672 ns
1/256/14/14           15555 ns          7116 ns
1/128/28/28           15737 ns          8560 ns
1/64/56/56            20766 ns         12153 ns
1/512/7/7             16985 ns          8182 ns

5/64/112/112        2532475 ns       2069668 ns
5/256/14/14           24507 ns         12228 ns
5/128/28/28           29352 ns         20146 ns
5/64/56/56            44786 ns         38784 ns
5/512/7/7             22307 ns         20505 ns
```
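
The loop structure named in the summary can be sketched in plain C++. This is a hand-written illustration, not the NNC schedule or the code NNC generates. The function name `batchnorm_fused`, the inference-mode formula, the contiguous NCHW layout, and the OpenMP pragma are all assumptions made for the sketch. The outer loop runs over the fused N*C index and is the one that gets parallelized; the inner loop runs over the fused H*W index with unit stride, which is the shape a compiler can vectorize.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Illustration only (hypothetical helper, not part of the benchmark):
// inference-mode batchnorm over a contiguous NCHW float tensor,
//   out = (in - mean[c]) / sqrt(var[c] + eps) * weight[c] + bias[c]
std::vector<float> batchnorm_fused(
    const std::vector<float>& in,
    std::size_t N, std::size_t C, std::size_t H, std::size_t W,
    const std::vector<float>& mean, const std::vector<float>& var,
    const std::vector<float>& weight, const std::vector<float>& bias,
    float eps) {
  std::vector<float> out(in.size());
  const std::ptrdiff_t nc = static_cast<std::ptrdiff_t>(N * C);
  const std::ptrdiff_t hw = static_cast<std::ptrdiff_t>(H * W);
  const std::ptrdiff_t channels = static_cast<std::ptrdiff_t>(C);

  // Fused N and C loops: one flat loop over N*C slices, split across
  // threads. The pragma is ignored when OpenMP is not enabled.
  #pragma omp parallel for
  for (std::ptrdiff_t i = 0; i < nc; ++i) {
    const std::ptrdiff_t c = i % channels;
    // Per-channel constants hoisted out of the inner loop
    // (a choice made for this sketch).
    const float scale = weight[c] / std::sqrt(var[c] + eps);
    const float shift = bias[c] - mean[c] * scale;
    const float* src = in.data() + i * hw;
    float* dst = out.data() + i * hw;

    // Fused H and W loops: one contiguous, unit-stride loop over H*W
    // elements that the compiler is free to vectorize.
    for (std::ptrdiff_t j = 0; j < hw; ++j) {
      dst[j] = src[j] * scale + shift;
    }
  }
  return out;
}
```

Fusing N with C gives the parallel loop N*C iterations rather than N, which matters for the batch-size-1 shapes in the table. Fusing H with W gives the vectorized loop H*W contiguous elements rather than W, which matters for small spatial sizes such as 7x7.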

Test Plan: benchmark results above

Reviewed By: navahgar

Differential Revision: D29288658

fbshipit-source-id: dd05efa4b7d26b6ad94f54a9ef6c8c47adb160b5
2021-06-22 22:57:43 -07:00
tensorexpr [nnc] Speed up batchnorm benchmark 2021-06-22 22:57:43 -07:00
CMakeLists.txt CPU Convolution benchmark harness for some popular models (#56455) 2021-04-22 22:14:36 -07:00
convolution.cpp [clang-tidy] Exclude cppcoreguidelines-avoid-magic-numbers (#57841) 2021-05-07 20:02:33 -07:00