postgres

mirror of https://github.com/zebrajr/postgres.git synced 2025-12-06 00:20:01 +01:00


John Naylor 3c6e8c1238 Compute CRC32C using AVX-512 instructions where available

The previous implementation of CRC32C on x86 relied on the native CRC32 instruction from the SSE 4.2 extension, which operates on up to 8 bytes at a time. We can get a substantial speedup by using carryless multiplication on SIMD registers, processing 64 bytes per loop iteration. Shorter inputs fall back to ordinary CRC instructions. On Intel Tiger Lake hardware (2020), CRC is now 50% faster for inputs between 64 and 112 bytes, and 3x faster for 256 bytes.

The VPCLMULQDQ instruction on 512-bit registers has been available on Intel hardware since 2019 and AMD since 2022. There is an older variant for 128-bit registers, but at least on Zen 2 it performs worse than normal CRC instructions for short inputs.

We must now do a runtime check, even for builds that target SSE 4.2. This doesn't matter in practice for WAL (arguably the most critical case), because since commit `e2809e3a1` the final computation with the 20-byte WAL header is inlined and unrolled when targeting that extension. Compared with two direct function calls, testing showed equal or slightly faster performance in performing an indirect function call on several dozen bytes followed by inlined instructions on constant input of 20 bytes.

The MIT-licensed implementation was generated with the "generate" program from https://github.com/corsix/fast-crc32/

Based on: "Fast CRC Computation for Generic Polynomials Using PCLMULQDQ Instruction" V. Gopal, E. Ozturk, et al., 2009

Co-authored-by: Raghuveer Devulapalli <raghuveer.devulapalli@intel.com>
Co-authored-by: Paul Amonson <paul.d.amonson@intel.com>
Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de> (earlier version)
Reviewed-by: Matthew Sterrett <matthewsterrett2@gmail.com> (earlier version)
Tested-by: Raghuveer Devulapalli <raghuveer.devulapalli@intel.com>
Tested-by: David Rowley <dgrowleyml@gmail.com> (earlier version)
Discussion: https://postgr.es/m/BL1PR11MB530401FA7E9B1CA432CF9DC3DC192@BL1PR11MB5304.namprd11.prod.outlook.com
Discussion: https://postgr.es/m/PH8PR11MB82869FF741DFA4E9A029FF13FBF72@PH8PR11MB8286.namprd11.prod.outlook.com

2025-04-06 14:04:30 +07:00
ax_pthread.m4	Update config/ax_pthread.m4 to latest upstream version.	2018-11-19 15:05:33 -05:00
c-compiler.m4	Compute CRC32C using AVX-512 instructions where available	2025-04-06 14:04:30 +07:00
c-library.m4	Simplify checking for xlocale.h	2024-10-01 07:23:45 -04:00
check_decls.m4	Fix configure's AC_CHECK_DECLS tests to work correctly with clang.	2018-11-19 12:01:47 -05:00
check_modules.pl	Update copyright for 2025	2025-01-01 11:21:55 -05:00
config.guess	Update config.guess and config.sub	2024-04-09 14:21:57 +02:00
config.sub	Update config.guess and config.sub	2024-04-09 14:21:57 +02:00
general.m4	Rename configure.in to configure.ac	2020-07-24 10:42:08 +02:00
install-sh	Fix install-strip on Mac OS X	2012-08-21 23:42:43 -04:00
llvm.m4	jit: Require at least LLVM 14, if enabled.	2024-10-01 04:49:11 -04:00
Makefile	Install our "missing" script where PGXS builds can find it.	2015-12-11 16:15:05 -05:00
meson.build	meson: Add initial version of meson based build system	2022-09-21 22:37:17 -07:00
missing	Remove cvs keywords from all files.	2010-09-20 22:08:53 +02:00
perl.m4	Remove MSVC scripts	2023-12-20 09:44:37 +09:00
pkg.m4	Fix collection of typos in the code and the documentation	2022-03-15 11:29:35 +09:00
prep_buildtree	Fix vpath build	2019-03-27 23:36:00 +01:00
programs.m4	oauth: Disallow synchronous DNS in libcurl	2025-03-19 16:56:19 +13:00
python.m4	Unify DLSUFFIX on Darwin	2022-07-06 07:41:33 +02:00
tcl.m4	configure: More use of AC_ARG_VAR	2019-01-18 08:38:34 +01:00