pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-06 00:20:18 +01:00


Nicolas De Carli  b31bad1b8f  2025-10-25 02:55:50 +00:00
[Pytorch] Enable autovec on aarch64 for type conversion (#166049)

Summary: Implements an autovec template for type conversions on aarch64 NEON. The generated code can be seen here: https://godbolt.org/z/1K6T1d9TE

We've seen significant performance improvements when converting to and from bytes, compiling with clang and -march=armv9-a+sve2:

Before:
  float->uint8->float    683.212us
  float->int8->float     687.846us
  int32->uint8->int32    497.121us
  int32->int8->int32     481.889us

After:
  float->uint8->float    198.204us  (245% higher throughput)
  float->int8->float     200.241us  (244% higher throughput)
  int32->uint8->int32    197.970us  (151% higher throughput)
  int32->int8->int32     198.206us  (143% higher throughput)

Test Plan:
  buck2 test mode/opt //caffe2/test:test_ops
  buck2 test mode/opt //caffe2/test:torch

Differential Revision: D85213420
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166049
Approved by: https://github.com/ezyang, https://github.com/mcfi, https://github.com/aditew01
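The commit message does not include the template itself, so the C++ sketch below is only an illustration under stated assumptions. It is not PyTorch's implementation. It assumes that a plain counted loop over contiguous, non-overlapping buffers is the kind of code that a compiler such as clang can usually auto-vectorize for NEON/SVE. It also checks the quoted throughput percentages by treating throughput as proportional to 1/latency. All names (convert_n, float_uint8_round_trip_is_exact, throughput_gain_percent, rounded_gain_percent) are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, not PyTorch's API: a plain element-wise conversion
// loop. A counted loop over contiguous, non-aliasing buffers is the shape
// that compilers such as clang can usually auto-vectorize (e.g. NEON/SVE on
// aarch64) at -O2/-O3. __restrict is a clang/gcc extension promising that
// src and dst do not overlap.
// Note: static_cast from float to an integer type is undefined behavior when
// the value is out of range, so callers must keep inputs in [0, 255] for
// uint8_t. This sketch does not saturate.
template <typename Src, typename Dst>
void convert_n(const Src* __restrict src, Dst* __restrict dst, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) {
    dst[i] = static_cast<Dst>(src[i]);
  }
}

// Round-trips float -> uint8_t -> float over the values 0..255, mirroring the
// "float->uint8->float" benchmark pattern, and reports whether it was lossless.
bool float_uint8_round_trip_is_exact() {
  std::vector<float> src(256);
  for (std::size_t i = 0; i < src.size(); ++i) {
    src[i] = static_cast<float>(i);
  }
  std::vector<std::uint8_t> bytes(src.size());
  std::vector<float> back(src.size());
  convert_n(src.data(), bytes.data(), src.size());
  convert_n(bytes.data(), back.data(), bytes.size());
  return back == src;
}

// Throughput is proportional to 1 / time, so the gain implied by two
// latencies is before / after - 1, expressed here as a percentage.
double throughput_gain_percent(double before_us, double after_us) {
  assert(after_us > 0.0);
  return (before_us / after_us - 1.0) * 100.0;
}

// Rounds the gain to a whole percent, as the commit message reports it.
long rounded_gain_percent(double before_us, double after_us) {
  return std::lround(throughput_gain_percent(before_us, after_us));
}
```

For float->uint8->float, 683.212 / 198.204 ≈ 3.45, which is about 245% higher throughput. The same formula reproduces the other three percentages. Whether a particular loop actually vectorizes depends on the compiler, flags, and element types. Inspecting the generated assembly, as the commit does with Compiler Explorer, is the reliable check.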
conda
src	[Pytorch] Enable autovec on aarch64 for type conversion (#166049)	2025-10-25 02:55:50 +00:00
tools	Adds Issue#153109 as a test for CUDAPluggableAllocator (#163575)	2025-10-01 09:07:48 +00:00
CMakeLists.txt	Revert "Use official CUDAToolkit module in CMake (#154595)"	2025-06-23 21:15:31 +00:00