ARM64 clang from Android NDK doesn't define __ARM_NEON__, which results is perf regression on some models. I figured that some compilers define __ARM_NEON__ while others define __ARM_NEON. This patch changes all NEON-specific parts in Caffe2 to check both macros.