Mirror of https://github.com/zebrajr/postgres.git, synced 2025-12-06 00:20:01 +01:00
The previous implementation of CRC32C on x86 relied on the native
CRC32 instruction from the SSE 4.2 extension, which operates on
up to 8 bytes at a time. We can get a substantial speedup by using
carryless multiplication on SIMD registers, processing 64 bytes per
loop iteration. Shorter inputs fall back to ordinary CRC instructions.
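Both the SSE 4.2 path and the carryless-multiplication path must compute the same checksum, CRC-32C (Castagnoli). The minimal bitwise reference sketch below uses the reflected polynomial 0x82F63B78, which is the polynomial the SSE 4.2 `crc32` instruction implements. The function name `crc32c_bitwise` and its invert-on-entry/invert-on-exit convention are assumptions for illustration, not PostgreSQL's actual API. It processes one bit at a time, so it is only useful as ground truth for testing faster paths.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference CRC-32C: reflected polynomial 0x82F63B78,
 * processed one bit at a time. Slow, but easy to verify. */
static uint32_t
crc32c_bitwise(uint32_t crc, const void *data, size_t len)
{
    const unsigned char *p = data;

    crc = ~crc;                 /* undo the previous final inversion */
    while (len--)
    {
        crc ^= *p++;
        for (int bit = 0; bit < 8; bit++)
            /* shift right; XOR in the polynomial if the low bit was set */
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return ~crc;                /* final inversion */
}
```

Starting from 0, the standard check string `"123456789"` gives 0xE3069283. Because the inversions cancel between calls, a checksum can be computed incrementally: `crc32c_bitwise(crc32c_bitwise(0, "1234", 4), "56789", 5)` yields the same value.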

On Intel Tiger Lake hardware (2020), CRC computation is now 50% faster
for inputs between 64 and 112 bytes, and 3x faster for 256-byte inputs.

The VPCLMULQDQ instruction on 512-bit registers has been available
on Intel hardware since 2019 and AMD since 2022. There is an older
variant for 128-bit registers, but at least on Zen 2 it performs worse
than normal CRC instructions for short inputs.
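Carryless multiplication treats 64-bit operands as polynomials over GF(2). Partial products are combined with XOR instead of addition, so no carries propagate. The portable sketch below uses a hypothetical helper, `clmul64`, which is not an intrinsic. It emulates one 64x64 -> 128-bit product of the kind PCLMULQDQ computes; VPCLMULQDQ on a 512-bit register computes four such products at once, one per 128-bit lane. In the usual folding scheme, the running CRC state is multiplied by precomputed constants (powers of x reduced modulo the CRC polynomial) and XORed into data further along the buffer. That is how a wide loop can consume many bytes per iteration. The exact constants and lane layout used by the commit are not shown here.

```c
#include <stdint.h>

/* 128-bit result of a carryless multiply, split into two halves. */
typedef struct
{
    uint64_t lo;
    uint64_t hi;
} u128_clmul;

/* Hypothetical portable emulation of one 64x64 -> 128-bit carryless
 * (GF(2) polynomial) multiplication. */
static u128_clmul
clmul64(uint64_t a, uint64_t b)
{
    u128_clmul r = {0, 0};

    for (int i = 0; i < 64; i++)
    {
        if ((b >> i) & 1)
        {
            /* XOR (not add) a shifted copy of a: no carries propagate */
            r.lo ^= a << i;
            if (i > 0)
                r.hi ^= a >> (64 - i);  /* bits shifted out of the low half */
        }
    }
    return r;
}
```

For example, (x + 1)^2 = x^2 + 1 over GF(2), so `clmul64(3, 3)` yields 5, whereas ordinary multiplication gives 9.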

We must now do a runtime check, even for builds that target SSE
4.2. This doesn't matter in practice for WAL (arguably the most
critical case), because since commit
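A runtime check of this kind typically queries CPUID and also confirms that the operating system has enabled the AVX-512 register state. Without OS support, the instructions fault even if the CPU has them. The sketch below assumes GCC or Clang, for `<cpuid.h>` (`__get_cpuid`, `__get_cpuid_count`) and inline `xgetbv`. The function name `cpu_has_avx512_vpclmulqdq` is hypothetical, and the exact feature set a real implementation requires may differ. For example, it may also check other AVX-512 subsets. On non-x86 targets the sketch simply returns false.

```c
#include <stdbool.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* Hypothetical check: AVX-512F and VPCLMULQDQ present, with ZMM state
 * enabled by the OS. Returns false on non-x86 builds. */
static bool
cpu_has_avx512_vpclmulqdq(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;
    unsigned int xcr0_lo, xcr0_hi;

    /* Leaf 1: ECX bit 27 = OSXSAVE, required before executing xgetbv. */
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx) || !(ecx & (1u << 27)))
        return false;

    /* XCR0 bits 1, 2 (SSE, AVX) and 5, 6, 7 (opmask, ZMM) must be set. */
    __asm__ volatile("xgetbv" : "=a"(xcr0_lo), "=d"(xcr0_hi) : "c"(0));
    (void) xcr0_hi;
    if ((xcr0_lo & 0xE6u) != 0xE6u)
        return false;

    /* Leaf 7, subleaf 0: EBX bit 16 = AVX512F, ECX bit 10 = VPCLMULQDQ. */
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        return false;
    return (ebx & (1u << 16)) && (ecx & (1u << 10));
#else
    return false;
#endif
}
```

A caller would usually run such a check once and store a function pointer to either the SIMD kernel or the SSE 4.2 routine. Later calls then pay no detection cost.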
Directory listing:
- ax_pthread.m4
- c-compiler.m4
- c-library.m4
- check_decls.m4
- check_modules.pl
- config.guess
- config.sub
- general.m4
- install-sh
- llvm.m4
- Makefile
- meson.build
- missing
- perl.m4
- pkg.m4
- prep_buildtree
- programs.m4
- python.m4
- tcl.m4