Owen Anderson
0b78ae86c5
Cleanup byte swapping utilities to generate optimal code on the platforms we care about. (#11394)
...
Summary:
While using memcpy as part of the byte-swapping sequence looks funky, all major
compilers recognize this pattern and optimize it reliably, producing essentially
optimal code.
For example, decodeUInt32LE goes from this on iOS arm64:
> ldrb w8, [x0, #3]
> ldrb w9, [x0, #2]
> bfi w8, w9, #8, #8
> ldrb w9, [x0, #1]
> bfi w8, w9, #16, #8
> ldrb w9, [x0]
> bfi w8, w9, #24, #8
> mov x0, x8
> ret
To this:
> ldr w8, [x0]
> rev w0, w8
> ret
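A minimal sketch of the memcpy idiom, for illustration only. It assumes GCC or Clang (for `__builtin_bswap32` and the `__BYTE_ORDER__` macros), and the function names `load_u32_be`, `load_u32_le` and `load_u32_be_shifts` are hypothetical, not the actual helpers touched by this PR. The fixed-size `memcpy` avoids alignment and strict-aliasing problems and is typically lowered to a single load; the conditional byte swap then becomes one instruction (`rev` on arm64). Note that the "before" sequence above computes `byte[0] << 24 | byte[1] << 16 | byte[2] << 8 | byte[3]`, i.e. a most-significant-byte-first read, which is what `load_u32_be_shifts` spells out.

```cpp
#include <cstdint>
#include <cstring>

#if !defined(__BYTE_ORDER__)
#error "this sketch assumes the GCC/Clang __BYTE_ORDER__ macros"
#endif

// Hypothetical helper: read a 32-bit value stored most-significant byte first.
inline uint32_t load_u32_be(const uint8_t* data) {
  uint32_t value;
  std::memcpy(&value, data, sizeof(value));  // one (possibly unaligned) load
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
  value = __builtin_bswap32(value);  // e.g. `rev` on arm64, `bswap` on x86-64
#endif
  return value;
}

// Hypothetical helper: read a 32-bit value stored least-significant byte first.
inline uint32_t load_u32_le(const uint8_t* data) {
  uint32_t value;
  std::memcpy(&value, data, sizeof(value));
#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
  value = __builtin_bswap32(value);  // swap only when host order differs
#endif
  return value;
}

// Byte-at-a-time reference with the same semantics as load_u32_be,
// matching what the "before" ldrb/bfi sequence computes.
inline uint32_t load_u32_be_shifts(const uint8_t* d) {
  return (static_cast<uint32_t>(d[0]) << 24) |
         (static_cast<uint32_t>(d[1]) << 16) |
         (static_cast<uint32_t>(d[2]) << 8) |
         static_cast<uint32_t>(d[3]);
}
```

For the bytes `12 34 56 78`, `load_u32_be` and `load_u32_be_shifts` both return `0x12345678`, while `load_u32_le` returns `0x78563412`.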
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11394
Reviewed By: SsnL
Differential Revision: D9728659
Pulled By: resistor
fbshipit-source-id: 9afbd4adfad1d1fb7b01f1179e6707ee21fa726f
2018-09-10 15:40:24 -07:00