pytorch/aten
Nicolas De Carli b31bad1b8f [Pytorch] Enable autovec on aarch64 for type conversion (#166049)
Summary:
Implement an auto-vectorization (autovec) template for type conversions on aarch64 NEON.

Generated code can be seen here: https://godbolt.org/z/1K6T1d9TE
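For readers unfamiliar with the approach, here is a minimal sketch of an auto-vectorizable conversion loop. It is not the code from this PR: the name `convert_autovec` and its signature are hypothetical, and whether the compiler actually emits NEON instructions depends on the compiler version and flags.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper for illustration only (not the ATen implementation).
// The loop has no cross-iteration dependency and the pointers are marked
// non-aliasing, which is the shape compilers can typically auto-vectorize.
// Note: float -> integer conversion is undefined behaviour when the
// truncated value does not fit in Dst, so callers must pass in-range values.
template <typename Src, typename Dst>
void convert_autovec(const Src* __restrict src,
                     Dst* __restrict dst,
                     std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) {
    dst[i] = static_cast<Dst>(src[i]);
  }
}
```

Calling `convert_autovec<float, std::uint8_t>(in, out, n)` truncates each float toward zero. The scalar loop leaves instruction selection to the compiler instead of relying on hand-written intrinsics.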

Converting to and from bytes is significantly faster when compiled with clang using -march=armv9-a+sve2:

Before:
float->uint8->float ===> 683.212us
float->int8->float ===> 687.846us
int32->uint8->int32 ===> 497.121us
int32->int8->int32 ===> 481.889us

After:
float->uint8->float ===> 198.204us ----> 245% higher throughput
float->int8->float ===> 200.241us ----> 244% higher throughput
int32->uint8->int32 ===> 197.970us ----> 151% higher throughput
int32->int8->int32 ===> 198.206us ----> 143% higher throughput
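The throughput figures above appear to follow from the ratio of the before and after timings. The sketch below shows that arithmetic, assuming throughput is inversely proportional to the measured time; the function name is hypothetical.

```cpp
#include <cmath>

// Hypothetical helper: percentage throughput gain derived from two latencies,
// assuming throughput is proportional to 1 / time.
//   gain = (before / after - 1) * 100, rounded to the nearest integer.
long rounded_gain_percent(double before_us, double after_us) {
  return std::lround((before_us / after_us - 1.0) * 100.0);
}
```

For example, `rounded_gain_percent(683.212, 198.204)` evaluates to 245, matching the float->uint8->float row.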

Test Plan:

buck2 test mode/opt //caffe2/test:test_ops
buck2 test mode/opt //caffe2/test:torch

Differential Revision: D85213420

Pull Request resolved: https://github.com/pytorch/pytorch/pull/166049
Approved by: https://github.com/ezyang, https://github.com/mcfi, https://github.com/aditew01
2025-10-25 02:55:50 +00:00
conda
src [Pytorch] Enable autovec on aarch64 for type conversion (#166049) 2025-10-25 02:55:50 +00:00
tools Adds Issue#153109 as a test for CUDAPluggableAllocator (#163575) 2025-10-01 09:07:48 +00:00
CMakeLists.txt Revert "Use official CUDAToolkit module in CMake (#154595)" 2025-06-23 21:15:31 +00:00