Mirror of the official PostgreSQL GIT repository. Note that this is just a *mirror* - we don't work with pull requests on github. To contribute, please see https://wiki.postgresql.org/wiki/Submitting_a_Patch
Go to file
Tom Lane 261f89a976 Track the maximum possible frequency of non-MCE array elements.
The lossy-counting algorithm that ANALYZE uses to identify most-common
array elements has a notion of cutoff frequency: elements with
frequency greater than that are guaranteed to be collected, elements
with smaller frequencies are not.  In cases where we find fewer MCEs
than the stats target would permit us to store, the cutoff frequency
provides valuable additional information, to wit that there are no
non-MCEs with frequency greater than that.  What the selectivity
estimation functions actually use the "minfreq" entry for is as a
ceiling on the possible frequency of non-MCEs, so using the cutoff
rather than the lowest stored MCE frequency provides a tighter bound
and more accurate estimates.

Therefore, instead of redundantly storing the minimum observed MCE
frequency, store the cutoff frequency when there are fewer tracked
values than we want.  (When there are more, then of course we cannot
assert that no non-stored elements are above the cutoff frequency,
since we're throwing away some that are; so we still use the
minimum stored frequency in that case.)

Notably, this works even when none of the values are common enough
to be called MCEs.  In such cases we previously stored nothing in
the STATISTIC_KIND_MCELEM pg_statistic slot, which resulted in the
selectivity functions falling back to default estimates.  So in that
case we want to construct a STATISTIC_KIND_MCELEM entry that contains
no "values" but does have "numbers", to wit the three extra numbers
that the MCELEM entry type defines.  A small obstacle is that
update_attstats() has traditionally stored a null, not an empty array,
when passed zero "values" for a slot.  That gives rise to an MCELEM
entry that get_attstatsslot() will spit up on.  The least risky
solution seems to be to adjust update_attstats() so that it will emit
a non-null (but possibly empty) array when the passed stavalues array
pointer isn't NULL, rather than conditioning that on numvalues > 0.
In other existing cases I don't believe that that changes anything.
For consistency, handle the stanumbers array the same way.

In passing, improve the comments in routines that use
STATISTIC_KIND_MCELEM data.  Particularly, explain why we use
minfreq / 2 not minfreq as the estimate for non-MCE values.

Thanks to Matt Long for the suggestion that we could apply this
idea even when there are more than zero MCEs.

Reported-by: Mark Frost <FROSTMAR@uk.ibm.com>
Reported-by: Matt Long <matt@mattlong.org>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/PH3PPF1C905D6E6F24A5C1A1A1D8345B593E16FA@PH3PPF1C905D6E6.namprd15.prod.outlook.com
2025-09-20 14:48:16 -04:00
.github Add CODE_OF_CONDUCT.md, CONTRIBUTING.md, and SECURITY.md. 2024-07-02 13:03:58 -05:00
config Remove traces of support for Sun Studio compiler 2025-09-12 07:39:05 +02:00
contrib Track the maximum possible frequency of non-MCE array elements. 2025-09-20 14:48:16 -04:00
doc Add optional pid parameter to pg_replication_origin_session_setup(). 2025-09-19 05:38:40 +00:00
src Track the maximum possible frequency of non-MCE array elements. 2025-09-20 14:48:16 -04:00
.cirrus.star ci: Simplify ci-os-only handling 2025-08-14 12:09:34 -04:00
.cirrus.tasks.yml ci: Explicitly enable Meson features 2025-09-03 07:54:24 -07:00
.cirrus.yml ci: Per-repo configuration for manually trigger tasks 2025-08-14 11:54:03 -04:00
.dir-locals.el Make Emacs perl-mode indent more like perltidy. 2019-01-13 11:32:31 -08:00
.editorconfig Add script to keep .editorconfig in sync with .gitattributes 2025-02-01 10:09:45 +01:00
.git-blame-ignore-revs Add commit 7e9c216b52 to .git-blame-ignore-revs. 2025-09-13 14:55:38 -05:00
.gitattributes Fix git whitespace warning 2025-08-15 10:32:35 +02:00
.gitignore Update top-level .gitignore. 2022-12-04 15:23:00 -05:00
.mailmap Add a Git .mailmap file 2024-11-05 13:56:02 +01:00
aclocal.m4 autoconf: Move export_dynamic determination to configure 2022-12-06 18:55:28 -08:00
configure Remove traces of support for Sun Studio compiler 2025-09-12 07:39:05 +02:00
configure.ac Remove traces of support for Sun Studio compiler 2025-09-12 07:39:05 +02:00
COPYRIGHT Align organization wording in copyright statement 2025-05-16 11:20:07 -04:00
GNUmakefile.in Allow selecting the git revision to be packaged by "make dist". 2024-05-03 11:08:50 -04:00
HISTORY Canonicalize some URLs 2020-02-10 20:47:50 +01:00
Makefile Remove AIX support 2024-02-28 15:17:23 +04:00
meson_options.txt Add support for basic NUMA awareness 2025-04-07 23:08:17 +02:00
meson.build Remove traces of support for Sun Studio compiler 2025-09-12 07:39:05 +02:00
README.md Revise the style of a paragraph in README.md. 2024-03-21 10:16:41 -05:00

PostgreSQL Database Management System

This directory contains the source code distribution of the PostgreSQL database management system.

PostgreSQL is an advanced object-relational database management system that supports an extended subset of the SQL standard, including transactions, foreign keys, subqueries, triggers, user-defined types and functions. This distribution also contains C language bindings.

Copyright and license information can be found in the file COPYRIGHT.

General documentation about this version of PostgreSQL can be found at https://www.postgresql.org/docs/devel/. In particular, information about building PostgreSQL from the source code can be found at https://www.postgresql.org/docs/devel/installation.html.

The latest version of this software, and related software, may be obtained at https://www.postgresql.org/download/. For more information look at our web site located at https://www.postgresql.org/.