Mirror of https://github.com/zebrajr/pytorch.git (synced 2025-12-07 12:21:27 +01:00)
Summary: Use better scheduling: fuse and parallelize NC, fuse and vectorize HW.

```
-----------------------------------------------
N/C/H/W          ATen           NNC
-----------------------------------------------
1/64/112/112     45449 ns       36672 ns
1/256/14/14      15555 ns       7116 ns
1/128/28/28      15737 ns       8560 ns
1/64/56/56       20766 ns       12153 ns
1/512/7/7        16985 ns       8182 ns
5/64/112/112     2532475 ns     2069668 ns
5/256/14/14      24507 ns       12228 ns
5/128/28/28      29352 ns       20146 ns
5/64/56/56       44786 ns       38784 ns
5/512/7/7        22307 ns       20505 ns
```

Test Plan: benchmark results above

Reviewed By: navahgar

Differential Revision: D29288658

fbshipit-source-id: dd05efa4b7d26b6ad94f54a9ef6c8c47adb160b5
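
The schedule named in the summary (fuse and parallelize NC, fuse and vectorize HW) can be pictured with a plain C++ loop nest. This is only a rough sketch and does not use the NNC API: the function `scale_shift_nchw` and the per-channel scale-and-shift operation are hypothetical stand-ins, since the commit message does not say which kernel was rescheduled. It assumes a contiguous NCHW layout, and the OpenMP pragmas take effect only when the compiler enables OpenMP.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical per-channel op on a contiguous NCHW tensor:
//   out[n][c][h][w] = in[n][c][h][w] * scale[c] + shift[c]
// The op itself is an assumption for illustration; the loop structure is the point.
void scale_shift_nchw(const std::vector<float>& in,
                      const std::vector<float>& scale,
                      const std::vector<float>& shift,
                      std::vector<float>& out,
                      std::int64_t N, std::int64_t C,
                      std::int64_t H, std::int64_t W) {
  const std::int64_t NC = N * C;  // fused outer loop: batch and channel
  const std::int64_t HW = H * W;  // fused inner loop: height and width
  assert(static_cast<std::int64_t>(in.size()) == NC * HW);
  assert(static_cast<std::int64_t>(scale.size()) == C);
  assert(static_cast<std::int64_t>(shift.size()) == C);
  out.resize(in.size());

  // Each (n, c) plane is independent, so the fused NC loop can run in parallel.
#ifdef _OPENMP
#pragma omp parallel for
#endif
  for (std::int64_t nc = 0; nc < NC; ++nc) {
    const std::int64_t c = nc % C;  // recover the channel index from the fused index
    const float a = scale[c];
    const float b = shift[c];
    const float* src = in.data() + nc * HW;
    float* dst = out.data() + nc * HW;

    // The fused HW loop walks contiguous memory, which suits vectorization.
#if defined(_OPENMP) && _OPENMP >= 201307
#pragma omp simd
#endif
    for (std::int64_t i = 0; i < HW; ++i) {
      dst[i] = src[i] * a + b;
    }
  }
}
```

Fusing N with C gives the parallel loop more iterations to distribute when the batch size is small, and fusing H with W gives the vector loop a longer contiguous run when the spatial dimensions are small (for example 7x7).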

| Name |
|---|
| tensorexpr |
| CMakeLists.txt |
| convolution.cpp |