mirror of
https://github.com/zebrajr/pytorch.git
synced 2025-12-06 00:20:18 +01:00
Summary: Implementing autovec template for type conversions on aarch64-NEON Generated code can be seen here: https://godbolt.org/z/1K6T1d9TE We've seen significant performance improvements for converting to and from bytes, compiling using clang with -march=armv9-a+sve2: Before float->uint8->float ===> 683.212us float->int8->float ===> 687.846us int32->uint8->int32 ===> 497.121us int32->int8->int32 ===> 481.889us After: float->uint8->float ===> 198.204us ----> 245% higher throughput float->int8->float ===> 200.241us ----> 244% higher throughput int32->uint8->int32 ===> 197.970us ----> 151% higher throughput int32->int8->int32 ===> 198.206us ----> 143% higher throughput Test Plan: buck2 test mode/opt //caffe2/test:test_ops buck2 test mode/opt //caffe2/test:torch Differential Revision: D85213420 Pull Request resolved: https://github.com/pytorch/pytorch/pull/166049 Approved by: https://github.com/ezyang, https://github.com/mcfi, https://github.com/aditew01 |
||
|---|---|---|
| .. | ||
| conda | ||
| src | ||
| tools | ||
| CMakeLists.txt | ||