Commit Graph

5021 Commits

Author SHA1 Message Date
Tom Lane
470273da0f Avoid resource leaks when a dblink connection fails.
If we hit out-of-memory between creating the PGconn and inserting
it into dblink's hashtable, we'd lose track of the PGconn, which
is quite bad since it represents a live connection to a remote DB.
Fix by rearranging things so that we create the hashtable entry
first.

Also reduce the number of states we have to deal with by getting rid
of the separately-allocated remoteConn object, instead allocating it
in-line in the hashtable entries.  (That incidentally removes a
session-lifespan memory leak observed in the regression tests.)

There is an apparently-irreducible remaining OOM hazard, which
is that if the connection fails at the libpq level (ie it's
CONNECTION_BAD) then we have to pstrdup the PGconn's error message
before we can release it, and theoretically that could fail.  However,
in such cases we're only leaking memory not a live remote connection,
so I'm not convinced that it's worth sweating over.

This is a pretty low-probability failure mode of course, but losing
a live connection seems bad enough to justify back-patching.

Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Matheus Alcantara <matheusssilv97@gmail.com>
Discussion: https://postgr.es/m/1346940.1748381911@sss.pgh.pa.us
Backpatch-through: 13
2025-05-29 10:39:55 -04:00
Fujii Masao
3c4d7557e0 Fix assertion failure in pg_prewarm() on objects without storage.
An assertion test added in commit 049ef33 could fail when pg_prewarm()
was called on objects without storage, such as partitioned tables.
This resulted in the following failure in assert-enabled builds:

    Failed Assert("RelFileNumberIsValid(rlocator.relNumber)")

Note that, in non-assert builds, pg_prewarm() just failed with an error
in that case, so there was no ill effect in practice.

This commit fixes the issue by having pg_prewarm() raise an error early
if the specified object has no storage. This approach is similar to
the fix in commit 4623d7144 for pg_freespacemap.

Back-patched to v17, where the issue was introduced.

Author: Masahiro Ikeda <ikedamsh@oss.nttdata.com>
Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com>
Reviewed-by: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/e082e6027610fd0a4091ae6d033aa117@oss.nttdata.com
Backpatch-through: 17
2025-05-29 17:50:32 +09:00
Michael Paquier
35a428f30b pg_stat_statements: Fix parameter number gaps in normalized queries
pg_stat_statements anticipates that certain constant locations may be
recorded multiple times and attempts to avoid calculating a length for
these locations in fill_in_constant_lengths().

However, during generate_normalized_query() where normalized query
strings are generated, these locations are not excluded from
consideration.  This could increment the parameter number counter for
every recorded occurrence at such a location, leading to an incorrect
normalization in certain cases with gaps in the numbers reported.

For example, take this query:
SELECT WHERE '1' IN ('2'::int, '3'::int::text)
Before this commit, it would be normalized like that, with gaps in the
parameter numbers:
SELECT WHERE $1 IN ($3::int, $4::int::text)
However the correct, less confusing one should be like that:
SELECT WHERE $1 IN ($2::int, $3::int::text)

This commit fixes the computation of the parameter numbers to track the
number of constants replaced with an $n by a separate counter instead of
the iterator used to loop through the list of locations.

The underlying query IDs are not changed, neither are the normalized
strings for existing PGSS hash entries.  New entries with fresh
normalized queries would automatically get reshaped based on the new
parameter numbering.

Issue discovered while discussing a separate problem for HEAD, but this
affects all the stable branches.

Author: Sami Imseih <samimseih@gmail.com>
Discussion: https://postgr.es/m/CAA5RZ0tzxvWXsacGyxrixdhy3tTTDfJQqxyFBRFh31nNHBQ5qA@mail.gmail.com
Backpatch-through: 13
2025-05-29 11:26:03 +09:00
Amit Langote
1722d5eb05 Revert "Don't lock partitions pruned by initial pruning"
As pointed out by Tom Lane, the patch introduced fragile and invasive
design around plan invalidation handling when locking of prunable
partitions was deferred from plancache.c to the executor. In
particular, it violated assumptions about CachedPlan immutability and
altered executor APIs in ways that are difficult to justify given the
added complexity and overhead.

This also removes the firstResultRels field added to PlannedStmt in
commit 28317de72, which was intended to support deferred locking of
certain ModifyTable result relations.

Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/605328.1747710381@sss.pgh.pa.us
2025-05-22 17:02:35 +09:00
Michael Paquier
06450c7b8c Fix regression with location calculation of nested statements
The statement location calculated for some nested query cases was wrong
when multiple queries are sent as a single string, these being separated
by semicolons.  As pointed by Sami Imseih, the location calculation was
incorrect when the last query of nested statement with multiple queries
does **NOT** finish with a semicolon for the last statement.  In this
case, the statement length tracked by RawStmt is 0, which is equivalent
to say that the string should be used until its end.  The code
previously discarded this case entirely, causing the location to remain
at 0, the same as pointing at the beginning of the string.  This caused
pg_stat_statements to store incorrect query strings.

This issue has been introduced in 499edb0974.  I have looked at the
diffs generated by pgaudit back then, and noticed the difference
generated for this nested query case, but I have missed the point that
it was an actual regression with an existing case.  A test case is added
in pg_stat_statements to provide some coverage, restoring the pre-17
behavior for the calculation of the query locations.  Special thanks to
David Steele, who, through an analysis of the test diffs generated by
pgaudit with the new v18 logic, has poked me about the fact that my
original analysis of the matter was wrong.

The test output of pg_overexplain is updated to reflect the new logic,
as the new locations refer to the beginning of the argument passed to
the function explain_filter().  When the module was introduced in
8d5ceb113e, which was after 499edb0974 (for the new calculation
method), the locations of the test were not actually right: the plan
generated for the query string given in input of the function pointed to
the top-level query, not the nested one.

Reported-by: David Steele <david@pgbackrest.org>
Author: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Anthonin Bonnefoy <anthonin.bonnefoy@datadoghq.com>
Reviewed-by: Jian He <jian.universality@gmail.com>
Reviewed-by: Sami Imseih <samimseih@gmail.com>
Reviewed-by: David Steele <david@pgbackrest.org>
Discussion: https://postgr.es/m/844a3b38-bbf1-4fb2-9fd6-f58c35c09917@pgbackrest.org
2025-05-21 10:22:12 +09:00
Michael Paquier
2c6469d4cd Fix incorrect year in some copyright notices
A couple of new files have been added in the tree with a copyright year
of 2024 while we were already in 2025.  These should be marked with
2025, so let's fix them.

Reported-by: Shaik Mohammad Mujeeb <mujeeb.sk.dev@gmail.com>
Discussion: https://postgr.es/m/CALa6HA4_Wu7-2PV0xv-Q84cT8eG7rTx6bdjUV0Pc=McAwkNMfQ@mail.gmail.com
2025-05-19 09:46:52 +09:00
Heikki Linnakangas
b28c59a6cd Use 'void *' for arbitrary buffers, 'uint8 *' for byte arrays
A 'void *' argument suggests that the caller might pass an arbitrary
struct, which is appropriate for functions like libc's read/write, or
pq_sendbytes(). 'uint8 *' is more appropriate for byte arrays that
have no structure, like the cancellation keys or SCRAM tokens. Some
places used 'char *', but 'uint8 *' is better because 'char *' is
commonly used for null-terminated strings. Change code around SCRAM,
MD5 authentication, and cancellation key handling to follow these
conventions.

Discussion: https://www.postgresql.org/message-id/61be9e31-7b7d-49d5-bc11-721800d89d64@eisentraut.org
2025-05-08 22:01:25 +03:00
Richard Guo
773db22269 Suppress unnecessary explicit sorting for EPQ mergejoin path
When building a ForeignPath for a joinrel, if there's a possibility
that EvalPlanQual will be executed, we must identify a suitable path
for EPQ checks.  If the outer or inner path of the chosen path is a
ForeignPath representing a pushed-down join, we replace it with its
fdw_outerpath to ensure that the EPQ check path consists entirely of
local joins.

If the chosen path is a MergePath, and its outer or inner path is a
ForeignPath that is not already well enough ordered, the MergePath
will have non-NIL outersortkeys or innersortkeys indicating the
desired ordering to be created by an explicit Sort node.  If we then
replace the outer or inner path with its corresponding fdw_outerpath,
and that path is already sufficiently ordered, we end up in an
inconsistent state: the MergePath has non-NIL outersortkeys or
innersortkeys, and its input path is already properly ordered.  This
inconsistency can result in an Assert failure or the addition of a
redundant Sort node.

To fix, check if the new outer or inner path of a MergePath is already
properly sorted, and set its outersortkeys or innersortkeys to NIL if
so.

Bug: #18902
Reported-by: Nikita Kalinin <n.kalinin@postgrespro.ru>
Author: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: Tender Wang <tndrwang@gmail.com>
Discussion: https://postgr.es/m/18902-71c1bed2b9f7c46f@postgresql.org
2025-05-08 18:20:18 +09:00
Peter Eisentraut
09a47c68e2 Fix whitespace 2025-05-07 07:01:03 +02:00
Jacob Champion
d2e7d2a09d oauth: Disallow OAuth connections via postgres_fdw/dblink
A subsequent commit will reclassify oauth_client_secret from dispchar=""
to dispchar="*", so that UIs will treat it like a secret. For our FDWs,
this change will move that option from SERVER to USER MAPPING, which we
need to avoid.

But upon further discussion, we don't really want our FDWs to use our
builtin Device Authorization flow at all, for several reasons:

- the URL and code would be printed to the server logs, not sent over
  the client connection
- tokens are not cached/refreshed, so every single connection has to be
  manually authorized by a user with a browser
- oauth_client_secret needs to belong to the foreign server, but options
  on SERVER are publicly accessible
- all non-superusers would need password_required=false, which is
  dangerous

Future OAuth work can use FDWs as a motivating use case. But for now,
disallow all oauth_* connection options for these two extensions.

Reviewed-by: Noah Misch <noah@leadboat.com>
Discussion: https://postgr.es/m/20250415191435.55.nmisch%40google.com
2025-04-29 13:08:24 -07:00
Melanie Plageman
f132815fd7 Add maintenance_io_concurrency flag to some read stream users
Index vacuuming and [auto]prewarm AIO concurrency should be governed by
maintenance_io_concurrency. As such, pass those read stream users the
READ_STREAM_MAINTENANCE flag which will calculate their read stream
distance with maintenance_io_concurrency instead of
effective_io_concurrency. This was an oversight in the original commits
making those operations use the read stream API.

Discussion: https://postgr.es/m/flat/CAAKRu_aopDxTo4b41Mt_7Zc-z0_ngocrY8SFCCY6Aph1HgwuNw%40mail.gmail.com
2025-04-28 14:19:45 -04:00
Amit Kapila
aaf9e95e87 Fix xmin advancement during fast_forward decoding.
During logical decoding, we advance catalog_xmin of logical too early in
fast_forward mode, resulting in required catalog data being removed by
vacuum. This mode is normally used to advance the slot without processing
the changes, but we still can't let the slot's xmin to advance to an
incorrect value.

Commit f49a80c481 fixed a similar issue where the logical slot's
catalog_xmin was getting advanced prematurely during non-fast-forward
mode. During xl_running_xacts processing, instead of directly advancing
the slot's xmin to the oldest running xid in the record, it allowed the
xmin to be held back for snapshots that can be used for
not-yet-replayed transactions, as those might consider older txns as
running too. However, it missed the fact that the same problem can happen
during fast_forward mode decoding, as we won't build a base snapshot in
that mode, and the future call to get_changes from the same slot can miss
seeing the required catalog changes leading to incorrect reslts.

This commit allows building the base snapshot even in fast_forward mode to
prevent the early advancement of xmin.

Reported-by: Amit Kapila <amit.kapila16@gmail.com>
Author: Zhijie Hou <houzj.fnst@fujitsu.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: shveta malik <shveta.malik@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Backpatch-through: 13
Discussion: https://postgr.es/m/CAA4eK1LqWncUOqKijiafe+Ypt1gQAQRjctKLMY953J79xDBgAg@mail.gmail.com
Discussion: https://postgr.es/m/OS0PR01MB57163087F86621D44D9A72BF94BB2@OS0PR01MB5716.jpnprd01.prod.outlook.com
2025-04-28 11:35:54 +05:30
Tom Lane
2311f193ea Remove circular #include's between plpython.h and plpy_util.h.
plpython.h included plpy_util.h, simply on the grounds that "it's
easier to just include it everywhere".  However, plpy_util.h must
include plpython.h, or it won't pass headerscheck.  While the
resulting circularity doesn't have any immediate bad effect,
it's poor design.  We have seen serious messes arise in the past
from overly-broad inclusion footprints created by such circularities,
so let's establish a project policy against it.

To fix, just replace *.c files' inclusions of plpython.h with
plpy_util.h.  They'll pull in plpython.h indirectly; indeed, almost
all have already done so via inclusions of other plpy_xxx.h headers.
(Any extensions using plpython.h can do likewise without breaking
the compatibility of their code with prior Postgres versions.)

Reported-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://postgr.es/m/aAxQ6fcY5QQV1lo3@ip-10-97-1-34.eu-west-3.compute.internal
2025-04-27 11:43:02 -04:00
Amit Kapila
50b8ad30f7 Fix typo in test file name added in commit 4909b38af0.
Author: Shlok Kyal <shlok.kyal.oss@gmail.com>
Backpatch-through: 13
Discussion: https://postgr.es/m/CANhcyEXsObdjkjxEnq10aJumDpa5J6aiPzgTh_w4KCWRYHLw6Q@mail.gmail.com
2025-04-25 12:46:02 +05:30
David Rowley
84fd3bc141 Fix a few duplicate words in comments
These are all new to v18

Author: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/CAApHDvrMcr8XD107H3NV=WHgyBcu=sx5+7=WArr-n_cWUqdFXQ@mail.gmail.com
2025-04-21 10:41:18 +12:00
Tom Lane
d05996340d Be more wary of corrupt data in pageinspect's heap_page_items().
The original intent in heap_page_items() was to return nulls, not
throw an error or crash, if an item was sufficiently corrupt that
we couldn't safely extract data from it.  However, commit d6061f83a
utterly missed that memo, and not only put in an un-length-checked
copy of the tuple's data section, but also managed to break the check
on sane nulls-bitmap length.  Either mistake could possibly lead to
a SIGSEGV crash if the tuple is corrupt.

Bug: #18896
Reported-by: Dmitry Kovalenko <d.kovalenko@postgrespro.ru>
Author: Dmitry Kovalenko <d.kovalenko@postgrespro.ru>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/18896-add267b8e06663e3@postgresql.org
Backpatch-through: 13
2025-04-19 16:37:42 -04:00
Michael Paquier
88e947136b Fix typos and grammar in the code
The large majority of these have been introduced by recent commits done
in the v18 development cycle.

Author: Alexander Lakhin <exclusion@gmail.com>
Discussion: https://postgr.es/m/9a7763ab-5252-429d-a943-b28941e0e28b@gmail.com
2025-04-19 19:17:42 +09:00
Peter Eisentraut
c55df7c6ea Fix incorrect format placeholders
BlockNumber is unsigned int.  Fix for commit 14ffaece0f.
2025-04-14 08:56:33 +02:00
Tom Lane
e708ffe79d Fix GIN's shimTriConsistentFn to not corrupt its input.
Commit 0f21db36d made an assumption that GIN triConsistentFns
would not modify their input entryRes[] arrays.  But in fact,
the "shim" triConsistentFn that we use for opclasses that don't
supply their own did exactly that, potentially leading to wrong
answers from a GIN index search.  Through bad luck, none of the
test cases that we have for such opclasses exposed the bug.

One response to this could be that the assumption of consistency check
functions not modifying entryRes[] arrays is a bad one, but it still
seems reasonable to me.  Notably, shimTriConsistentFn is itself
assuming that with respect to the underlying boolean consistentFn,
so it's sure being self-centered in supposing that it gets to do so.

Fortunately, it's quite simple to fix shimTriConsistentFn to restore
the entry-time state of entryRes[], so let's do that instead.

This issue doesn't affect any core GIN opclasses, since they all
supply their own triConsistentFns.  It does affect contrib modules
btree_gin, hstore, and intarray.

Along the way, I (tgl) noticed that shimTriConsistentFn failed to
pick up on a "recheck" flag returned by its first call to the boolean
consistentFn.  This may be only a latent problem, since it would be
unlikely for a consistentFn to set recheck for the all-false case
and not any other cases.  (Indeed, none of our contrib modules do
that.)  Nonetheless, it's formally wrong.

Reported-by: Vinod Sridharan <vsridh90@gmail.com>
Author: Vinod Sridharan <vsridh90@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAFMdLD7XzsXfi1+DpTqTgrD8XU0i2C99KuF=5VHLWjx4C1pkcg@mail.gmail.com
Backpatch-through: 13
2025-04-12 12:28:02 -04:00
Peter Geoghegan
a6cab6a78e Harmonize function parameter names for Postgres 18.
Make sure that function declarations use names that exactly match the
corresponding names from function definitions in a few places.  These
inconsistencies were all introduced during Postgres 18 development.

This commit was written with help from clang-tidy, by mechanically
applying the same rules as similar clean-up commits (the earliest such
commit was commit 035ce1fe).
2025-04-12 12:07:36 -04:00
David Rowley
928394b664 Improve various new-to-v18 appendStringInfo calls
Similar to 8461424fd, here we adjust a few new locations which were not
using the most suitable appendStringInfo* function for the intended
purpose.

Author: David Rowley <drowleyml@gmail.com
Discussion: https://postgr.es/m/CAApHDvqJnNjueb=Eoj8K+8n0g7nj_AcPWSiCj5RNV4fDejAfqA@mail.gmail.com
2025-04-11 10:07:22 +12:00
Amit Kapila
4909b38af0 Fix data loss in logical replication.
Data loss can happen when the DDLs like ALTER PUBLICATION ... ADD TABLE ...
or ALTER TYPE ...  that don't take a strong lock on table happens
concurrently to DMLs on the tables involved in the DDL. This happens
because logical decoding doesn't distribute invalidations to concurrent
transactions and those transactions use stale cache data to decode the
changes. The problem becomes bigger because we keep using the stale cache
even after those in-progress transactions are finished and skip the
changes required to be sent to the client.

This commit fixes the issue by distributing invalidation messages from
catalog-modifying transactions to all concurrent in-progress transactions.
This allows the necessary rebuild of the catalog cache when decoding new
changes after concurrent DDL.

We observed performance regression primarily during frequent execution of
*publication DDL* statements that modify the published tables. The
regression is minor or nearly nonexistent for DDLs that do not affect the
published tables or occur infrequently, making this a worthwhile cost to
resolve a longstanding data loss issue.

An alternative approach considered was to take a strong lock on each
affected table during publication modification. However, this would only
address issues related to publication DDLs (but not the ALTER TYPE ...)
and require locking every relation in the database for publications
created as FOR ALL TABLES, which is impractical.

The bug exists in all supported branches, but we are backpatching till 14.
The fix for 13 requires somewhat bigger changes than this fix, so the fix
for that branch is still under discussion.

Reported-by: hubert depesz lubaczewski <depesz@depesz.com>
Reported-by: Tomas Vondra <tomas.vondra@enterprisedb.com>
Author: Shlok Kyal <shlok.kyal.oss@gmail.com>
Author: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Zhijie Hou <houzj.fnst@fujitsu.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Tested-by: Benoit Lobréau <benoit.lobreau@dalibo.com>
Backpatch-through: 14
Discussion: https://postgr.es/m/de52b282-1166-1180-45a2-8d8917ca74c6@enterprisedb.com
Discussion: https://postgr.es/m/CAD21AoAenVqiMjpN-PvGHL1N9DWnHSq673bfgr6phmBUzx=kLQ@mail.gmail.com
2025-04-10 13:14:40 +05:30
Tomas Vondra
3887d0cfeb Cleanup of pg_numa.c
This moves/renames some of the functions defined in pg_numa.c:

* pg_numa_get_pagesize() is renamed to pg_get_shmem_pagesize(), and
  moved to src/backend/storage/ipc/shmem.c. The new name better reflects
  that the page size is not related to NUMA, and it's specifically about
  the page size used for the main shared memory segment.

* move pg_numa_available() to src/backend/storage/ipc/shmem.c, i.e. into
  the backend (which more appropriate for functions callable from SQL).
  While at it, improve the comment to explain what page size it returns.

* remove unnecessary includes from src/port/pg_numa.c, adding
  unnecessary dependencies (src/port should be suitable for frontent).
  These were either leftovers or unnecessary thanks to the other changes
  in this commit.

This eliminates unnecessary dependencies on backend symbols, which we
don't want in src/port.

Reported-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
https://postgr.es/m/CALdSSPi5fj0a7UG7Fmw2cUD1uWuckU_e8dJ+6x-bJEokcSXzqA@mail.gmail.com
2025-04-09 21:50:17 +02:00
Tom Lane
b1720fe63f Move contrib/spi testing from core regression tests to contrib/spi.
It's weird to have the core regression tests depending on contrib
code, and coverage testing shows that those test queries add nothing
to the core-code coverage of the core tests.  So pull those test bits
out and put them into ordinary test scripts inside contrib/spi/,
making that more like other contrib modules.

Aside from being structurally nicer, anything we can take out of the
core tests (which are executed multiple times per check-world run)
and put into tests executed only once should be a win.  It doesn't
look like this change will buy a whole lot of milliseconds, but a
cycle saved is a cycle earned.

Also, there is some discussion around possibly removing refint and/or
autoinc altogether.  I don't know if that will happen, but we'd
certainly need to decouple them from the core tests to do so.

The tests for autoinc were quite intertwined with the undocumented
"ttdummy" trigger in regress.c.  That made the tests very hard to
understand and contributed nothing to autoinc's testing either.
So I just deleted ttdummy and rewrote the autoinc tests without it.

I realized while doing this that the description of autoinc in
the SGML docs is not a great description of what the function
actually does, so the patch includes some updates to those docs.

Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://postgr.es/m/3872677.1744077559@sss.pgh.pa.us
2025-04-08 19:12:03 -04:00
Peter Eisentraut
8969194b73 Fix incorrect format placeholder
for commit 749a9e20c9
2025-04-08 19:12:03 +02:00
Tomas Vondra
91f1fe90c7 pg_buffercache: Change page_num type to bigint
The page_num was defined as integer, which should be sufficient for the
near future (with 4K pages it's 8TB). But it's virtually free to return
bigint, and get a wider range. This was agreed on the thread, but I
forgot to tweak this in ba2a3c2302.

While at it, make the data types in CREATE VIEW a bit more consistent.

Discussion: https://postgr.es/m/CAKZiRmxh6KWo0aqRqvmcoaX2jUxZYb4kGp3N%3Dq1w%2BDiH-696Xw%40mail.gmail.co
2025-04-08 12:38:42 +02:00
Andres Freund
dcf7e1697b Add pg_buffercache_evict_{relation,all} functions
In addition to the added functions, the pg_buffercache_evict() function now
shows whether the buffer was flushed.

pg_buffercache_evict_relation(): Evicts all shared buffers in a
relation at once.
pg_buffercache_evict_all(): Evicts all shared buffers at once.

Both functions provide mechanism to evict multiple shared buffers at
once. They are designed to address the inefficiency of repeatedly calling
pg_buffercache_evict() for each individual buffer, which can be time-consuming
when dealing with large shared buffer pools. (e.g., ~477ms vs. ~2576ms for
16GB of fully populated shared buffers).

These functions are intended for developer testing and debugging
purposes and are available to superusers only.

Minimal tests for the new functions are included. Also, there was no test for
pg_buffercache_evict(), test for this added too.

No new extension version is needed, as it was already increased this release
by ba2a3c2302.

Author: Nazir Bilal Yavuz <byavuz81@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Aidar Imamov <a.imamov@postgrespro.ru>
Reviewed-by: Joseph Koshakow <koshy44@gmail.com>
Discussion: https://postgr.es/m/CAN55FZ0h_YoSqqutxV6DES1RW8ig6wcA8CR9rJk358YRMxZFmw%40mail.gmail.com
2025-04-08 02:19:32 -04:00
David Rowley
d69d45a5a9 Speedup child EquivalenceMember lookup in planner
When planning queries to partitioned tables, we clone all
EquivalenceMembers belonging to the partitioned table into em_is_child
EquivalenceMembers for each non-pruned partition.  For partitioned tables
with large numbers of partitions, this meant the ec_members list could
become large and code searching that list would become slow.  Effectively,
the more partitions which were present, the more searches needed to be
performed for operations such as find_ec_member_matching_expr() during
create_plan() and the more partitions present, the longer these searches
would take, i.e., a quadratic slowdown.

To fix this, here we adjust how we store EquivalenceMembers for
em_is_child members.  Instead of storing these directly in ec_members,
these are now stored in a new array of Lists in the EquivalenceClass,
which is indexed by the relid.  When we want to find EquivalenceMembers
belonging to a certain child relation, we can narrow the search to the
array element for that relation.

To make EquivalenceMember lookup easier and to reduce the amount of code
change, this commit provides a pair of functions to allow iteration over
the EquivalenceMembers of an EC which also handles finding the child
members, if required.  Callers that never need to look at child members
can remain using the foreach loop over ec_members, which will now often
be faster due to only parent-level members being stored there.

The actual performance increases here are highly dependent on the number
of partitions and the query being planned.  Performance increases can be
visible with as few as 8 partitions, but the speedup is marginal for
such low numbers of partitions.  The speedups become much more visible
with a few dozen to hundreds of partitions.  With some tested queries
using 56 partitions, the planner was around 3x faster than before.  For
use cases with thousands of partitions, these are likely to become
significantly faster.  Some testing has shown planner speedups of 60x or
more with 8192 partitions.

Author: Yuya Watari <watari.yuya@gmail.com>
Co-authored-by: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Andrey Lepikhov <a.lepikhov@postgrespro.ru>
Reviewed-by: Alena Rybakina <lena.ribackina@yandex.ru>
Reviewed-by: Dmitry Dolgov <9erthalion6@gmail.com>
Reviewed-by: Amit Langote <amitlangote09@gmail.com>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Tested-by: Thom Brown <thom@linux.com>
Tested-by: newtglobal postgresql_contributors <postgresql_contributors@newtglobalcorp.com>
Discussion: https://postgr.es/m/CAJ2pMkZNCgoUKSE%2B_5LthD%2BKbXKvq6h2hQN8Esxpxd%2Bcxmgomg%40mail.gmail.com
2025-04-08 18:09:57 +12:00
Tomas Vondra
ba2a3c2302 Add pg_buffercache_numa view with NUMA node info
Introduces a new view pg_buffercache_numa, showing NUMA memory nodes
for individual buffers. For each buffer the view returns an entry for
each memory page, with the associated NUMA node.

The database blocks and OS memory pages may have different size - the
default block size is 8KB, while the memory page is 4K (on x86). But
other combinations are possible, depending on configure parameters,
platform, etc. This means buffers may overlap with multiple memory
pages, each associated with a different NUMA node.

To determine the NUMA node for a buffer, we first need to touch the
memory pages using pg_numa_touch_mem_if_required, otherwise we might get
status -2 (ENOENT = The page is not present), indicating the page is
either unmapped or unallocated.

The view may be relatively expensive, especially when accessed for the
first time in a backend, as it touches all memory pages to get reliable
information about the NUMA node. This may also force allocation of the
shared memory.

Author: Jakub Wartak <jakub.wartak@enterprisedb.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Reviewed-by: Tomas Vondra <tomas@vondra.me>
Discussion: https://postgr.es/m/CAKZiRmxh6KWo0aqRqvmcoaX2jUxZYb4kGp3N%3Dq1w%2BDiH-696Xw%40mail.gmail.com
2025-04-07 23:08:17 +02:00
Tom Lane
8cfbdf8f4d Fix some issues in contrib/spi/refint.c.
check_foreign_key incorrectly used a single cache entry for its saved
plans for a 'c' (cascade) trigger, although there are two different
queries to execute depending on whether it fires for an update or a
delete.  This caused the wrong things to be done if both types of
event occur in one session.  (This was indeed visible in the triggers
regression test, but apparently nobody ever questioned it.)  To fix,
add the operation type to the cache key.

Its debug log output failed to distinguish update from delete
events, too.

Also, change the intended trigger usage from BEFORE ROW to AFTER ROW,
and add checks insisting on that usage.  BEFORE is really rather
unsafe, since if there are other BEFORE triggers they might change or
cancel the operation we are trying to check.  AFTER triggers are the
standard way to propagate changes to other rows, so we should follow
that way here.

In passing, remove a useless duplicate lookup of the cache entry.

This code is mostly intended as a documentation example, so we
won't consider a back-patch.

Author: Dmitrii Bondar <d.bondar@postgrespro.ru>
Reviewed-by: Paul Jungwirth <pj@illuminatedcomputing.com>
Reviewed-by: Lilian Ontowhee <ontowhee@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/79755a2b18ed4fe5e29da6a87a1e00d1@postgrespro.ru
2025-04-07 15:54:16 -04:00
Tom Lane
969ab9d4f5 Follow-up fixes for SHA-2 patch (commit 749a9e20c).
This changes the check for valid characters in the salt string to
only allow plain ASCII letters and digits.  The previous coding was
locale-dependent which doesn't really seem like a great idea here;
moreover it could not work correctly in multibyte encodings.

This fixes a careless pointer-use-after-pfree, too.

Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Reported-by: Andres Freund <andres@anarazel.de>
Author: Bernd Helmle <mailings@oopsware.de>
Discussion: https://postgr.es/m/6fab35422df6b6b9727fdcc243c5fa1c667dd3b5.camel@oopsware.de
2025-04-07 14:14:28 -04:00
Tom Lane
b73e6d71a8 Fix erroneous construction of functions' dependencies on transforms.
The list of transform objects that a function should use is specified
in CREATE FUNCTION's TRANSFORM clause, and then represented indirectly
in pg_proc.protrftypes.  However, ProcedureCreate completely ignored
that for purposes of constructing pg_depend entries, and instead made
the function depend on any transforms that exist for its parameter or
return data types.  This is bad in both directions: the function could
be made dependent on a transform it does not actually use, or it
could try to use a transform that's since been dropped.  (The latter
scenario would require use of a transform that's not for any of the
parameter or return types, but that seems legit for cases where the
function performs SQL operations internally.)

To fix, pass in the list of transform objects that CreateFunction
identified, and build pg_depend entries from that not from the
parameter/return types.  This results in changes in the expected
test outputs in contrib/bool_plperl, which I guess are due to
different ordering of pg_depend entries -- that test case is
surely not exercising either of the problem scenarios.

This fix is not back-patchable as-is: changing the signature of
ProcedureCreate seems too risky in stable branches.  We could
do something like making ProcedureCreate a wrapper around
ProcedureCreateExt or so.  However, I'm more inclined to do
nothing in the back branches.  We had no field complaints up to
now, so the hazards don't seem to be a big issue in practice.
And we couldn't do anything about existing pg_depend entries,
so a back-patched fix would result in a mishmash of dependencies
created according to different rules.  That cure could be worse
than the disease, perhaps.

I bumped catversion just to lay down a marker that the expected
contents of pg_depend are a bit different than before.

Reported-by: Chapman Flack <jcflack@acm.org>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/3112950.1743984111@sss.pgh.pa.us
2025-04-07 13:31:37 -04:00
Tom Lane
8ab6ef2bb8 Fix memory leaks in px_crypt_shacrypt().
Per Coverity.  I don't think these are of any actual significance
since the function ought to be invoked in a short-lived context.
Still, if it's trying to be neat it should get it right.

Also const-ify a constant and fix up typedef formatting.
2025-04-06 11:57:22 -04:00
Álvaro Herrera
749a9e20c9
Add modern SHA-2 based password hashes to pgcrypto.
This adapts the publicly available reference implementation on
https://www.akkadia.org/drepper/SHA-crypt.txt and adds the new hash
algorithms sha256crypt and sha512crypt to crypt() and gen_salt()
respectively.

Author: Bernd Helmle <mailings@oopsware.de>
Reviewed-by: Japin Li <japinli@hotmail.com>
Discussion: https://postgr.es/m/c763235a2757e2f5f9e3e27268b9028349cef659.camel@oopsware.de
2025-04-05 19:17:13 +02:00
Melanie Plageman
d9c7911e1a Use streaming read I/O in autoprewarm
Make a read stream for each valid fork of each valid relation
represented in the autoprewarm dump file and prewarm those blocks
through the read stream API instead of by directly invoking
ReadBuffer().

Co-authored-by: Nazir Bilal Yavuz <byavuz81@gmail.com>
Co-authored-by: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Andrey M. Borodin <x4mmm@yandex-team.ru> (earlier versions)
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>  (earlier versions)
Reviewed-by: Matheus Alcantara <mths.dev@pm.me> (earlier versions)
Discussion: https://postgr.es/m/flat/CAN55FZ3n8Gd%2BhajbL%3D5UkGzu_aHGRqnn%2BxktXq2fuds%3D1AOR6Q%40mail.gmail.com
2025-04-04 15:28:54 -04:00
Melanie Plageman
6acab8bdbc Refactor autoprewarm_database_main() in preparation for read stream
Autoprewarm prewarms blocks from a dump file representing the contents
of shared buffers at the time it was dumped. It uses a sorted array of
BlockInfoRecords, each representing a block from one of the cluster's
databases and tables.

autoprewarm_database_main() prewarms all the blocks from a single
database. It is optimized to ensure we don't try to open the same
relation or fork over and over again if it has been dropped or is
invalid. The main loop handled this by carefully setting various local
variables to sentinel values when a run of blocks should be skipped.

This method won't work with the read stream API. The read stream
callback must be able to advance the current position in the
BlockInfoRecord array to allow for reading ahead additional blocks,
however a read stream maps 1-1 with a relation and fork combination. So,
the main loop in autoprewarm_database_main() must also advance the
position in the array of BlockInfoRecords to skip invalid relations and
forks. This split control doesn't fit well with the current flow control
in autoprewarm_database_main()

To make it compatible with the read stream API, change
autoprewarm_database_main() to explicitly fast-forward in the
BlockInfoRecords array past the blocks belonging to an invalid relation
or fork.

This commit only implements the new control flow -- it does not use the
read stream API.

Co-authored-by: Nazir Bilal Yavuz <byavuz81@gmail.com>
Co-authored-by: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://postgr.es/m/flat/CAN55FZ3n8Gd%2BhajbL%3D5UkGzu_aHGRqnn%2BxktXq2fuds%3D1AOR6Q%40mail.gmail.com
2025-04-04 15:28:49 -04:00
Melanie Plageman
7f848cb788 Remove superfluous autoprewarm check
autoprewarm_database_main() prewarms blocks from the same database. It
is passed an array of sorted BlockInfoRecords and a start and stop index
into the array. The range represented should include only blocks
belonging to global objects or blocks from a single database. Remove an
unnecessary check that the current block is from the same database and
add an assert to ensure this invariant remains. Doing so removes a
special case that makes future refactoring to accommodate read
streamifying autoprewarm easier.

Noticed off-list by Andres Freund
2025-04-04 15:28:39 -04:00
Melanie Plageman
64e7fa43a9 Fix autoprewarm neglect of tablespaces
While prewarming blocks from a dump file, autoprewarm_database_main()
mistakenly ignored tablespace when detecting the beginning of the next
relation to prewarm. Because RelFileNumbers are only unique within a
tablespace, autoprewarm could miss prewarming blocks from a
relation with the same RelFileNumber in a different tablespace.

Though this situation is likely rare in practice, it's best to make the
code correct. Do so by explicitly checking for the RelFileNumber when
detecting a new relation.

Reported-by: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://postgr.es/m/97c36982-603b-494a-95f4-aaf2a12ac27e%40iki.fi
2025-04-04 11:34:06 -04:00
Peter Eisentraut
8123e91f5a Convert PathKey to use CompareType
Change the PathKey struct to use CompareType to record the sort
direction instead of hardcoding btree strategy numbers.  The
CompareType is then converted to the index-type-specific strategy when
the plan is created.

This reduces the number of places btree strategy numbers are
hardcoded, and it's a self-contained subset of a larger effort to
allow non-btree indexes to behave like btrees.

Author: Mark Dilger <mark.dilger@enterprisedb.com>
Co-authored-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://www.postgresql.org/message-id/flat/E72EAA49-354D-4C2E-8EB9-255197F55330@enterprisedb.com
2025-04-04 11:22:20 +02:00
Melanie Plageman
54a3615f15 Remove misleading read stream asserts in a few users
Several read stream users asserted that the read stream was exhausted
after looping on that very condition. It was pointed out in an a
review of an as-of-yet uncommitted read stream user [1] that this was
confusing and could lead the reader to think there was a possibility of
some kind of race condition. Remove these asserts.

[1] https://postgr.es/m/F9ACE8D0-B807-4A17-B6BD-87EF0717983D%40yesql.se
2025-04-03 18:22:37 -04:00
Heikki Linnakangas
e4309f73f6 Add support for sorted gist index builds to btree_gist
This enables sortsupport in the btree_gist extension for faster builds
of gist indexes.

Sorted gist index build strategy is the new default now. Regression
tests are unchanged (except for one small change in the 'enum' test to
add coverage for enum values added later) and are using the sorted
build strategy instead.

One version of this was committed a long time ago already, in commit
9f984ba6d2, but it was quickly reverted because of buildfarm
failures. The failures were presumably caused by some small bugs, but
we never got around to debug and commit it again. This patch was
written from scratch, implementing the same idea, with some fragments
and ideas from the original patch.

Author: Bernd Helmle <mailings@oopsware.de>
Author: Andrey Borodin <x4mmm@yandex-team.ru>
Discussion: https://www.postgresql.org/message-id/64d324ce2a6d535d3f0f3baeeea7b25beff82ce4.camel@oopsware.de
2025-04-03 13:46:35 +03:00
Heikki Linnakangas
9370978da8 Fix boilerplate comments in btree_gist
A few of these were copy-pasted wrong, like the comment "Bytea ops" in
btree_numeric.c. Instead of fixing the incorrect ones, replace them
all with generic comment "GiST support functions".

Also tidy up the inconsistent newlines between various functions while
we're at it.
2025-04-03 13:39:33 +03:00
Andres Freund
ae3df4b341 read_stream: Introduce and use optional batchmode support
Submitting IO in larger batches can be more efficient than doing so
one-by-one, particularly for many small reads. It does, however, require
the ReadStreamBlockNumberCB callback to abide by the restrictions of AIO
batching (c.f. pgaio_enter_batchmode()). Basically, the callback may not:
a) block without first calling pgaio_submit_staged(), unless a
   to-be-waited-on lock cannot be part of a deadlock, e.g. because it is
   never held while waiting for IO.

b) directly or indirectly start another batch pgaio_enter_batchmode()

As this requires care and is nontrivial in some cases, batching is only
used with explicit opt-in.

This patch adds an explicit flag (READ_STREAM_USE_BATCHING) to read_stream and
uses it where appropriate.

There are two cases where batching would likely be beneficial, but where we
aren't using it yet:

1) bitmap heap scans, because the callback reads the VM

   This should soon be solved, because we are planning to remove the use of
   the VM, due to that not being sound.

2) The first phase of heap vacuum

   This could be made to support batchmode, but would require some care.

Reviewed-by: Noah Misch <noah@leadboat.com>
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://postgr.es/m/uvrtrknj4kdytuboidbhwclo4gxhswwcpgadptsjvjqcluzmah%40brqs62irg4dt
2025-03-30 18:36:41 -04:00
Tomas Vondra
49b82522f1 Remove incidental md5() function use from test
Replace md5() with sha256() in tests introduced in 14ffaece0f, to
allow test to pass in OpenSSL FIPS mode.

Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/3518736.1743307492@sss.pgh.pa.us
2025-03-30 13:22:39 +02:00
Tomas Vondra
68f97aeadb amcheck: Add a GIN index to the CREATE INDEX CONCURRENTLY tests
The existing CREATE INDEX CONCURRENTLY tests checking only B-Tree, but
can be cheaply extended to also check GIN. This helps increasing test
coverage for GIN amcheck, especially related to handling concurrent page
splits and posting list trees.

This already helped to identify several issues during development of the
GIN amcheck support.

Author: Mark Dilger <mark.dilger@enterprisedb.com>
Reviewed-By: Tomas Vondra <tomas.vondra@enterprisedb.com>
Reviewed-By: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://postgr.es/m/BC221A56-977C-418E-A1B8-9EFC881D80C5%40enterprisedb.com
2025-03-29 16:47:44 +01:00
Tomas Vondra
ca738bdc4c amcheck: Add a test with GIN index on JSONB data
Extend the existing test of GIN checks to also include an index on JSONB
data, using the jsonb_path_ops opclass. This is a common enough usage of
GIN that it makes sense to have better test coverage for it.

Author: Mark Dilger <mark.dilger@enterprisedb.com>
Reviewed-By: Tomas Vondra <tomas.vondra@enterprisedb.com>
Reviewed-By: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://postgr.es/m/BC221A56-977C-418E-A1B8-9EFC881D80C5%40enterprisedb.com
2025-03-29 16:47:44 +01:00
Tomas Vondra
ec4327d106 amcheck: Fix indentation in verify_gin.c
I forgot to reindent the code after a couple last-minute adjustments
just before committing 14ffaece0f.

Discussion: https://postgr.es/m/45AC9B0A-2B45-40EE-B08F-BDCF5739D1E1%40yandex-team.ru
2025-03-29 16:47:44 +01:00
Tomas Vondra
14ffaece0f amcheck: Add gin_index_check() to verify GIN index
Adds a new function, validating two kinds of invariants on a GIN index:

- parent-child consistency: Paths in a GIN graph have to contain
  consistent keys. Tuples on parent pages consistently include tuples
  from child pages; parent tuples do not require any adjustments.

- balanced-tree / graph: Each internal page has at least one downlink,
  and can reference either only leaf pages or only internal pages.

The GIN verification is based on work by Grigory Kryachko, reworked by
Heikki Linnakangas and with various improvements by Andrey Borodin.
Investigation and fixes for multiple bugs by Kirill Reshke.

Author: Grigory Kryachko <GSKryachko@gmail.com>
Author: Heikki Linnakangas <hlinnaka@iki.fi>
Author: Andrey Borodin <amborodin@acm.org>
Reviewed-By: José Villanova <jose.arthur@gmail.com>
Reviewed-By: Aleksander Alekseev <aleksander@timescale.com>
Reviewed-By: Nikolay Samokhvalov <samokhvalov@gmail.com>
Reviewed-By: Andres Freund <andres@anarazel.de>
Reviewed-By: Tomas Vondra <tomas.vondra@enterprisedb.com>
Reviewed-By: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-By: Mark Dilger <mark.dilger@enterprisedb.com>
Reviewed-By: Peter Geoghegan <pg@bowt.ie>
Discussion: https://postgr.es/m/45AC9B0A-2B45-40EE-B08F-BDCF5739D1E1%40yandex-team.ru
2025-03-29 15:44:29 +01:00
Tomas Vondra
d70b17636d amcheck: Move common routines into a separate module
Before performing checks on an index, we need to take some safety
measures that apply to all index AMs. This includes:

* verifying that the index can be checked - Only selected AMs are
supported by amcheck (right now only B-Tree). The index has to be
valid and not a temporary index from another session.

* changing (and then restoring) user's security context

* obtaining proper locks on the index (and table, if needed)

* discarding GUC changes from the index functions

Until now this was implemented in the B-Tree amcheck module, but it's
something every AM will have to do. So relocate the code into a new
module verify_common for reuse.

The shared steps are implemented by amcheck_lock_relation_and_check(),
receiving the AM-specific verification as a callback. Custom parameters
may be supplied using a pointer.

Author: Andrey Borodin <amborodin@acm.org>
Reviewed-By: José Villanova <jose.arthur@gmail.com>
Reviewed-By: Aleksander Alekseev <aleksander@timescale.com>
Reviewed-By: Nikolay Samokhvalov <samokhvalov@gmail.com>
Reviewed-By: Andres Freund <andres@anarazel.de>
Reviewed-By: Tomas Vondra <tomas@vondra.me>
Reviewed-By: Mark Dilger <mark.dilger@enterprisedb.com>
Reviewed-By: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://postgr.es/m/45AC9B0A-2B45-40EE-B08F-BDCF5739D1E1%40yandex-team.ru
2025-03-29 15:14:49 +01:00
Peter Eisentraut
a0ed19e0a9 Use PRI?64 instead of "ll?" in format strings (continued).
Continuation of work started in commit 15a79c73, after initial trial.

Author: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://postgr.es/m/b936d2fb-590d-49c3-a615-92c3a88c6c19%40eisentraut.org
2025-03-29 10:43:57 +01:00
Robert Haas
83ccc85859 pg_overexplain: Use PG_MODULE_MAGIC_EXT.
I committed this contrib module just after Tom committed
55527368bd07248e91e3d37a782bf66b76f06865; adjust it to match.

Author: Man Zeng <zengman@halodbtech.com>
Discussion: http://postgr.es/m/174313513707.60295.16516085012903412705.pgcf@coridan.postgresql.org
2025-03-28 09:16:29 -04:00
Robert Haas
9f0c36aea0 pg_overexplain: Call previous hooks as appropriate.
It makes no sense to remember the previous values of the hook variables
and then never bother calling those functions. Thanks to Andrei for
spotting my goof.

Author: Andrei Lepikhov <lepihov@gmail.com>
Discussion: http://postgr.es/m/41a344e3-ffb1-4296-8ba7-801f1e9642e5@gmail.com
2025-03-28 09:02:37 -04:00
Melanie Plageman
043799fa08 Use streaming read I/O in heap amcheck
Instead of directly invoking ReadBuffer() for each unskippable block in
the heap relation, verify_heapam() now uses the read stream API to
acquire the next buffer to check for corruption.

Author: Matheus Alcantara <matheusssilv97@gmail.com>
Co-authored-by: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Nazir Bilal Yavuz <byavuz81@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: jian he <jian.universality@gmail.com>
Discussion: https://postgr.es/m/flat/CAFY6G8eLyz7%2BsccegZYFj%3D5tAUR-GZ9uEq4Ch5gvwKqUwb_hCA%40mail.gmail.com
2025-03-27 14:04:14 -04:00
Tom Lane
4623d71443 Prevent assertion failure in contrib/pg_freespacemap.
Applying pg_freespacemap() to a relation lacking storage (such as a
view) caused an assertion failure, although there was no ill effect
in non-assert builds.  Add an error check for that case.

Bug: #18866
Reported-by: Robins Tharakan <tharakan@gmail.com>
Author: Tender Wang <tndrwang@gmail.com>
Reviewed-by: Euler Taveira <euler@eulerto.com>
Discussion: https://postgr.es/m/18866-d68926d0f1c72d44@postgresql.org
Backpatch-through: 13
2025-03-27 13:20:23 -04:00
Robert Haas
081ec08e6a pg_overexplain: Filter out actual row count from test result.
Per buildfarm, these are not stable. In particular, 1/8 is sometimes
0.12 and sometimes 0.13.
2025-03-27 09:00:46 -04:00
Álvaro Herrera
9fbd53dea5
Remove the query_id_squash_values GUC
Commit 62d712ecfd introduced the capability to calculate the same
queryId for queries with different lengths of constants in a list for an
IN clause.  This behavior was originally enabled with a GUC
query_id_squash_values.  After a discussion about the value of such a
GUC, it was decided to back out of the use of a GUC and make the
squashing behavior the only available option.

Author: Sami Imseih <samimseih@gmail.com>
Discussion: https://postgr.es/m/Z-LZyygkkNyA8-kR@msg.df7cb.de
Discussion: https://postgr.es/m/CA+q6zcVTK-3C-8NWV1oY2NZrvtnMCDqnyYYyk1T7WMUG65MeOQ@mail.gmail.com
2025-03-27 13:33:37 +01:00
David Rowley
f31aad9b07 Fix query jumbling to account for NULL nodes
Previously NULL nodes were ignored.  This could cause issues where the
computed query ID could match for queries where fields that are next to
each other in their Node struct where one field was NULL and the other
non-NULL.  For example, the Query struct had distinctClause and sortClause
next to each other.  If someone wrote;

SELECT DISTINCT c1 FROM t;

and then;

SELECT c1 FROM t ORDER BY c1;

these would produce the same query ID since, in the first query, we
ignored the NULL sortClause and appended the jumble bytes for the
distictClause.  In the latter query, since we did nothing for the NULL
distinctClause then jumble the non-NULL sortClause, and since the node
representation stored is the same in both cases, the query IDs were
identical.

Here we fix this by always accounting for NULL nodes by recording that
we saw a NULL in the jumble buffer.  This fixes the issue as the order that
the NULL is recorded isn't the same in the above two queries.

Author: Bykov Ivan <i.bykov@modernsys.ru>
Author: Michael Paquier <michael@paquier.xyz>
Author: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/aafce7966e234372b2ba876c0193f1e9%40localhost.localdomain
2025-03-27 18:23:00 +13:00
Robert Haas
47a1f076a7 pg_overexplain: SET jit=off when running tests.
Per buildfarm.
2025-03-26 15:43:25 -04:00
Robert Haas
de65c4dade Fix oversights in commit 8d5ceb113e
It added bogus whitespace at the end of a line in the documentation.
It should not have done that.

The pg_overexplain tests must SET debug_parallel_query = false,
not just RESET debug_parallel_query, or we get failures on test
machines that make debug_parallel_query = true the defualt.
2025-03-26 14:22:45 -04:00
Robert Haas
8d5ceb113e pg_overexplain: Additional EXPLAIN options for debugging.
There's a fair amount of information in the Plan and PlanState trees
that isn't printed by any existing EXPLAIN option. This means that,
when working on the planner, it's often necessary to rely on facilities
such as debug_print_plan, which produce excessively voluminous
output. Hence, use the new EXPLAIN extension facilities to implement
EXPLAIN (DEBUG) and EXPLAIN (RANGE_TABLE) as extensions to the core
EXPLAIN facility.

A great deal more could be done here, and the specific choices about
what to print and how are definitely arguable, but this is at least
a starting point for discussion and a jumping-off point for possible
future improvements.

Reviewed-by: Sami Imseih <samimseih@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviweed-by: Andrei Lepikhov <lepihov@gmail.com> (who didn't like it)
Discussion: http://postgr.es/m/CA+TgmoZfvQUBWQ2P8iO30jywhfEAKyNzMZSR+uc2xr9PZBw6eQ@mail.gmail.com
2025-03-26 13:52:21 -04:00
Tom Lane
55527368bd Use PG_MODULE_MAGIC_EXT in our installable shared libraries.
It seems potentially useful to label our shared libraries with version
information, now that a facility exists for retrieving that.  This
patch labels them with the PG_VERSION string.  There was some
discussion about using semantic versioning conventions, but that
doesn't seem terribly helpful for modules with no SQL-level presence;
and for those that do have SQL objects, we typically expect them
to support multiple revisions of the SQL definitions, so it'd still
not be very helpful.

I did not label any of src/test/modules/.  It seems unnecessary since
we don't install those, and besides there ought to be someplace that
still provides test coverage for the original PG_MODULE_MAGIC macro.

Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/dd4d1b59-d0fe-49d5-b28f-1e463b68fa32@gmail.com
2025-03-26 11:11:02 -04:00
Tom Lane
9324c8c580 Introduce PG_MODULE_MAGIC_EXT macro.
This macro allows dynamically loaded shared libraries (modules) to
provide a wired-in module name and version, and possibly other
compile-time-constant fields in future.  This information can be
retrieved with the new pg_get_loaded_modules() function.

This feature is expected to be particularly useful for modules
that do not have any exposed SQL functionality and thus are
not associated with a SQL-level extension object.  But even for
modules that do belong to extensions, being able to verify the
actual code version can be useful.

Author: Andrei Lepikhov <lepihov@gmail.com>
Reviewed-by: Yurii Rashkovskii <yrashk@omnigres.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/dd4d1b59-d0fe-49d5-b28f-1e463b68fa32@gmail.com
2025-03-26 11:06:12 -04:00
Daniel Gustafsson
e92c0632c1 Move GSSAPI includes into its own header
Due to a conflict in macro names on Windows between <wincrypt.h>
and <openssl/ssl.h> these headers need to be included using a
predictable pattern with an undef to handle that. The GSSAPI
header <gssapi.h> does include <wincrypt.h> which cause problems
with compiling PostgreSQL using MSVC when OpenSSL and GSSAPI are
both enabled in the tree. Rather than fixing piecemeal for each
file including gssapi headers, move the the includes and undef
to a new file which should be used to centralize the logic.

This patch is a reworked version of a patch by Imran Zaheer
proposed earlier in the thread. Once this has proven effective
in master we should look at backporting this as the problem
exist at least since v16.

Author: Daniel Gustafsson <daniel@yesql.se>
Co-authored-by: Imran Zaheer <imran.zhir@gmail.com>
Reported-by: Dave Page <dpage@pgadmin.org>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: vignesh C <vignesh21@gmail.com>
Discussion: https://postgr.es/m/20240708173204.3f3xjilglx5wuzx6@awork3.anarazel.de
2025-03-26 15:31:46 +01:00
Peter Eisentraut
3642df265d dblink: SCRAM authentication pass-through
This enables SCRAM authentication for dblink (using dblink_fdw) when
connecting to a foreign server without having to store a plain-text
password on user mapping options

This uses the same approach as it was implemented for postgres_fdw in
commit 761c79508e.  (It also contains the equivalent of the
subsequent fixes 76563f88cf and d2028e9bbc1.)

Author: Matheus Alcantara <mths.dev@pm.me>
Reviewed-by: Jacob Champion <jacob.champion@enterprisedb.com>
Discussion: https://www.postgresql.org/message-id/flat/CAFY6G8ercA1KES%3DE_0__R9QCTR805TTyYr1No8qF8ZxmMg8z2Q%40mail.gmail.com
2025-03-26 10:49:23 +01:00
Michael Paquier
787514b30b Use relation name instead of OID in query jumbling for RangeTblEntry
custom_query_jumble (introduced in 5ac462e2b7 as a node field
attribute) is now assigned to the expanded reference name "eref" of
RangeTblEntry, adding in the query jumble computation the non-qualified
aliased relation name, without the list of column names.  The relation
OID is removed from the query jumbling.

The effects of this change can be seen in the tests added by
3430215fe3, where pg_stat_statements (PGSS) entries are now grouped
using the relation name, ignoring the relation search_path may point at.
For example, these two relations are different, but are now grouped in a
single PGSS entry as they are assigned the same query ID:
CREATE TABLE foo1.tab (a int);
CREATE TABLE foo2.tab (b int);
SET search_path = 'foo1';
SELECT count(*) FROM tab;
SET search_path = 'foo2';
SELECT count(*) FROM tab;
SELECT count(*) FROM foo1.tab;
SELECT count(*) FROM foo2.tab;
SELECT query, calls FROM pg_stat_statements WHERE query ~ 'FROM tab';
          query           | calls
--------------------------+-------
 SELECT count(*) FROM tab |     4
(1 row)

It is still possible to use an alias in the FROM clause to split these.
This behavior is useful for relations re-created with the same name,
where queries based on such relations would be grouped in the same
PGSS entry.  For permanent schemas, it should not really matter in
practice.  The main benefit is for workloads that use a lot of temporary
relations, which are usually re-created with the same name continuously.
These can be a heavy source of bloat in PGSS depending on the workload.
Such entries can now be grouped together, improving the user experience.

The original idea from Christoph Berg used catalog lookups to find
temporary relations, something that the query jumble has never done, and
it could cause some performance regressions.  The idea to use
RangeTblEntry.eref and the relation name, applying the same rules for
all relations, temporary and not temporary, has been proposed by Tom
Lane.  The documentation additions have been suggested by Sami Imseih.

Author: Michael Paquier <michael@paquier.xyz>
Co-authored-by: Sami Imseih <samimseih@gmail.com>
Reviewed-by: Christoph Berg <myon@debian.org>
Reviewed-by: Lukas Fittl <lukas@fittl.com>
Reviewed-by: Sami Imseih <samimseih@gmail.com>
Discussion: https://postgr.es/m/Z9iWXKGwkm8RAC93@msg.df7cb.de
2025-03-26 15:21:05 +09:00
Peter Eisentraut
d2028e9bbc postgres_fdw: Fix tests on some Windows variants
The tests introduced by commit 76563f88cf only work when Unix-domain
sockets are available.  This is optional on Windows, and buildfarm
member drongo runs without them.  To fix, skip the test if Unix-domain
sockets are not enabled.
2025-03-26 07:00:00 +01:00
Michael Paquier
3430215fe3 pg_stat_statements: Add more tests with temp tables and namespaces
These tests provide coverage for RangeTblEntry and how query jumbling
works with search_path, as well as the case where relations are
re-created, generating a different query ID as the relation OID is used
in the computation.

A patch is under discussion to switch to a different approach based on
the relation name, and there was no test coverage for this area,
including how queries are currently grouped with search_path.  This is
useful to track how the situation changes between HEAD and any patches
proposed.

Christoph has proposed the test with ON COMMIT DROP temporary tables,
and I have written the second part.

Author: Christoph Berg <myon@debian.org>
Author: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/Z9iWXKGwkm8RAC93@msg.df7cb.de
2025-03-26 07:25:23 +09:00
Alexander Korotkov
62f36d6924 postgres_fdw: Remove redundant check in semijoin_target_ok()
If a var belongs to the innerrel of the joinrel, it's not possible that
it belongs to the outerrel.  This commit removes the redundant check from
the if-clause but keeps it as an assertion.

Discussion: https://postgr.es/m/flat/CAHewXN=8aW4hd_W71F7Ua4+_w0=bppuvvTEBFBF6G0NuSXLwUw@mail.gmail.com
Author: Tender Wang <tndrwang@gmail.com>
Reviewed-by: Alexander Pyhalov <a.yhalov@postgrespro.ru>
Backpatch-through: 17
2025-03-25 12:49:01 +02:00
Alexander Korotkov
023fb51275 postgres_fdw: Avoid pulling up restrict infos from subqueries
Semi-join joins below left/right join are deparsed as
subqueries.  Thus, we can't refer to subqueries vars from upper relations.
This commit avoids pulling conditions from them.

Reported-by: Robins Tharakan <tharakan@gmail.com>
Bug: #18852
Discussion: https://postgr.es/m/CAEP4nAzryLd3gwcUpFBAG9MWyDfMRX8ZjuyY2XXjyC_C6k%2B_Zw%40mail.gmail.com
Author: Alexander Pyhalov <a.pyhalov@postgrespro.ru>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
Backpatch-through: 17
2025-03-25 05:49:47 +02:00
Peter Eisentraut
76563f88cf postgres_fdw: improve security checks
SCRAM pass-through should not bypass the FDW security check as it was
implemented for postgres_fdw in commit 761c79508e.

This commit improves the security check by adding new SCRAM
pass-through checks to ensure that the required SCRAM connection
options are not overwritten by the user mapping or foreign server
options.  This is meant to match the security requirements for a
password-using connection.

Since libpq has no SCRAM-specific equivalent of
PQconnectionUsedPassword(), we enforce this instead by making the
use_scram_passthrough option of postgres_fdw imply
require_auth=scram-sha-256.  This means that if use_scram_passthrough
is set, some situations that might otherwise have worked are
preempted, for example GSSAPI with delegated credentials.  This could
be enhanced in the future if there is desire for more flexibility.

Reported-by: Jacob Champion <jacob.champion@enterprisedb.com>
Author: Matheus Alcantara <mths.dev@pm.me>
Co-authored-by: Jacob Champion <jacob.champion@enterprisedb.com>
Reviewed-by: Jacob Champion <jacob.champion@enterprisedb.com>
Discussion: https://www.postgresql.org/message-id/flat/CAFY6G8ercA1KES%3DE_0__R9QCTR805TTyYr1No8qF8ZxmMg8z2Q%40mail.gmail.com
2025-03-24 15:56:53 +01:00
Peter Eisentraut
618c64ffd3 Revert workarounds for -Wmissing-braces false positives on old GCC
We have collected several instances of a workaround for GCC bug 53119,
which caused false-positive compiler warnings.  This bug has long been
fixed, but was still seen on the buildfarm, most recently on lapwing
with gcc (Debian 4.7.2-5).  (The GCC bug tracker mentions that a fix
was backported to 4.7.4 and 4.8.3.)

That compiler no longer runs warning-free since commit 6fdd5d9563, so
we don't need to keep these workarounds.  And furthermore, the
consensus appears to be that we don't want to keep supporting that era
of platform anymore at all.

This reverts the following commits:

d937904cce
506428d091
b449afb582
6392f2a096
bad0763a4d
5e0c761d0a

and makes a few similar fixes to newer code.

Discussion: https://www.postgresql.org/message-id/flat/e170d61f-01ab-4cf9-ab68-91cd1fac62c5%40eisentraut.org
Discussion: https://www.postgresql.org/message-id/flat/CA%2BTgmoYEAm-KKZibAP3hSqbTFTjUd47XtVcf3xSFDpyecXX9uQ%40mail.gmail.com
2025-03-20 11:25:58 +01:00
Álvaro Herrera
62d712ecfd
Introduce squashing of constant lists in query jumbling
pg_stat_statements produces multiple entries for queries like
    SELECT something FROM table WHERE col IN (1, 2, 3, ...)

depending on the number of parameters, because every element of
ArrayExpr is individually jumbled.  Most of the time that's undesirable,
especially if the list becomes too large.

Fix this by introducing a new GUC query_id_squash_values which modifies
the node jumbling code to only consider the first and last element of a
list of constants, rather than each list element individually.  This
affects both the query_id generated by query jumbling, as well as
pg_stat_statements query normalization so that it suppresses printing of
the individual elements of such a list.

The default value is off, meaning the previous behavior is maintained.

Author: Dmitry Dolgov <9erthalion6@gmail.com>
Reviewed-by: Sergey Dudoladov (mysterious, off-list)
Reviewed-by: David Geier <geidav.pg@gmail.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Sami Imseih <samimseih@gmail.com>
Reviewed-by: Sutou Kouhei <kou@clear-code.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Marcos Pegoraro <marcos@f10.com.br>
Reviewed-by: Julien Rouhaud <rjuju123@gmail.com>
Reviewed-by: Zhihong Yu <zyu@yugabyte.com>
Tested-by: Yasuo Honda <yasuo.honda@gmail.com>
Tested-by: Sergei Kornilov <sk@zsrv.org>
Tested-by: Maciek Sakrejda <m.sakrejda@gmail.com>
Tested-by: Chengxi Sun <sunchengxi@highgo.com>
Tested-by: Jakub Wartak <jakub.wartak@enterprisedb.com>
Discussion: https://postgr.es/m/CA+q6zcWtUbT_Sxj0V6HY6EZ89uv5wuG5aefpe_9n0Jr3VwntFg@mail.gmail.com
2025-03-18 18:56:11 +01:00
Robert Haas
c65bc2e1d1 Make it possible for loadable modules to add EXPLAIN options.
Modules can use RegisterExtensionExplainOption to register new
EXPLAIN options, and GetExplainExtensionId, GetExplainExtensionState,
and SetExplainExtensionState to store related state inside the
ExplainState object.

Since this substantially increases the amount of code that needs
to handle ExplainState-related tasks, move a few bits of existing
code to a new file explain_state.c and add the rest of this
infrastructure there.

See the comments at the top of explain_state.c for further
explanation of how this mechanism works.

This does not yet provide a way for such such options to do anything
useful. The intention is that we'll add hooks for that purpose in a
separate commit.

Discussion: http://postgr.es/m/CA+TgmoYSzg58hPuBmei46o8D3SKX+SZoO4K_aGQGwiRzvRApLg@mail.gmail.com
Reviewed-by: Srinath Reddy <srinath2133@gmail.com>
Reviewed-by: Andrei Lepikhov <lepihov@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Sami Imseih <samimseih@gmail.com>
2025-03-18 08:41:12 -04:00
Michael Paquier
3943f5cff6 Fix inconsistent quoting for some options in TAP tests
This commit addresses some inconsistencies with how the options of some
routines from PostgreSQL/Test/ are written, mainly for init() and
init_from_backup() in Cluster.pm.  These are written as unquoted, except
in the locations updated here.

Changes extracted from a larger patch by the same author.

Author: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org>
Discussion: https://postgr.es/m/87jz8rzf3h.fsf@wibble.ilmari.org
2025-03-17 14:07:12 +09:00
Michael Paquier
19c6e92b13 Apply more consistent style for command options in TAP tests
This commit reshapes the grammar of some commands to apply a more
consistent style across the board, following rules similar to
ce1b0f9da03e:
- Elimination of some pointless used-once variables.
- Use of long options, to self-document better the options used.
- Use of fat commas to link option names and their assigned values,
including redirections, so as perltidy can be tricked to put them
together.

Author: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org>
Discussion: https://postgr.es/m/87jz8rzf3h.fsf@wibble.ilmari.org
2025-03-17 12:42:23 +09:00
Tom Lane
4489044239 contrib/isn: Make weak mode a GUC setting, and fix related functions.
isn's weak mode used to be a simple static variable, settable only
via the isn_weak(boolean) function.  This wasn't optimal, as this
means it doesn't respect transactions nor respond to RESET ALL.

This patch makes isn.weak a GUC parameter instead, so that
it acts like any other user-settable parameter.

The isn_weak() functions are retained for backwards compatibility.
But we must fix their volatility markings: they were marked IMMUTABLE
which is surely incorrect, and PARALLEL RESTRICTED which isn't right
for GUC-related functions either.  Mark isn_weak(boolean) as
VOLATILE and PARALLEL UNSAFE, matching set_config().  Mark isn_weak()
as STABLE and PARALLEL SAFE, matching current_setting().

Reported-by: Viktor Holmberg <v@viktorh.net>
Diagnosed-by: Daniel Gustafsson <daniel@yesql.se>
Author: Viktor Holmberg <v@viktorh.net>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/790bc1f9-74dc-4b50-94d2-8147315b1556@Spark
2025-03-16 13:45:48 -04:00
Peter Eisentraut
3691edfab9 pg_noreturn to replace pg_attribute_noreturn()
We want to support a "noreturn" decoration on more compilers besides
just GCC-compatible ones, but for that we need to move the decoration
in front of the function declaration instead of either behind it or
wherever, which is the current style afforded by GCC-style attributes.
Also rename the macro to "pg_noreturn" to be similar to the C11
standard "noreturn".

pg_noreturn is now supported on all compilers that support C11 (using
_Noreturn), as well as GCC-compatible ones (using __attribute__, as
before), as well as MSVC (using __declspec).  (When PostgreSQL
requires C11, the latter two variants can be dropped.)

Now, all supported compilers effectively support pg_noreturn, so the
extra code for !HAVE_PG_ATTRIBUTE_NORETURN can be dropped.

This also fixes a possible problem if third-party code includes
stdnoreturn.h, because then the current definition of

    #define pg_attribute_noreturn() __attribute__((noreturn))

would cause an error.

Note that the C standard does not support a noreturn attribute on
function pointer types.  So we have to drop these here.  There are
only two instances at this time, so it's not a big loss.  In one case,
we can make up for it by adding the pg_noreturn to a wrapper function
and adding a pg_unreachable(), in the other case, the latter was
already done before.

Reviewed-by: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://www.postgresql.org/message-id/flat/pxr5b3z7jmkpenssra5zroxi7qzzp6eswuggokw64axmdixpnk@zbwxuq7gbbcw
2025-03-13 12:37:26 +01:00
David Rowley
b955df4434 Fix indentation issue
Introduced recently by 9e088f7dd

Per buildfarm member koel
2025-03-13 12:41:44 +13:00
Masahiko Sawada
9e088f7dd8 Fix compiler warning in pg_logicalinspect.
Oversight in bd65cb3cd4.

Reported-by: David Rowley <dgrowleyml@gmail.com>
Reported-by: Nathan Bossart <nathandbossart@gmail.com>
Author: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/CAApHDvqrhFfnetbcwgGkJ=z63T8HfQ_OyP=vX8BYiXyxFKt67w@mail.gmail.com
2025-03-12 14:23:56 -07:00
Masahiko Sawada
bd65cb3cd4 pg_logicalinspect: Fix possible crash when passing a directory path.
Previously, pg_logicalinspect functions were too trusting of their
input and blindly passed it to SnapBuildRestoreSnapshot(). If the
input pointed to a directory, the server could a PANIC error while
attempting to fsync_fname() with isdir=false on a directory.

This commit adds validation checks for input filenames and passes the
LSN extracted from the filename to SnapBuildRestoreSnapshot() instead
of the filename itself. It also adds regression tests for various
input patterns and permission checks.

Bug: #18828
Reported-by: Robins Tharakan <tharakan@gmail.com>
Co-authored-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Co-authored-by: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/18828-0f4701c635064211@postgresql.org
2025-03-11 09:56:40 -07:00
Masahiko Sawada
a49927f04c pg_logicalinspect: Stabilize isolation tests.
The previous isolation tests did not account for the possibility that
the background writer or the checkpointer could write a RUNNING_XACTS
record, which could cause logical decoding to produce more logical
snapshots than expected.

This commit modifies the isolation tests to verify that at least one
logical snapshot contains the expected number of committed or ongoing
catalog-change transactions.

Per buildfarm member skink.

Reported-by: Andres Freund <andres@anarazel.de>
Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/5qbxud4pvnvmtuoi7weiizm5hmumxaeohx4vztfhrwlfhyz6rj@buh4435mllwo
2025-03-11 09:30:00 -07:00
Tom Lane
8b1b342544 Improve EXPLAIN's display of window functions.
Up to now we just punted on showing the window definitions used
in a plan, with window function calls represented as "OVER (?)".
To improve that, show the window definition implemented by each
WindowAgg plan node, and reference their window names in OVER.
For nameless window clauses generated by "OVER (...)", assign
unique names w1, w2, etc.

In passing, re-order the properties shown for a WindowAgg node
so that the Run Condition (if any) appears after the Window
property and before the Filter (if any).  This seems more
sensible since the Run Condition is associated with the Window
and acts before the Filter.

Thanks to David G. Johnston and Álvaro Herrera for design
suggestions.

Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/144530.1741469955@sss.pgh.pa.us
2025-03-11 11:19:54 -04:00
Peter Geoghegan
426ea61117 nbtree: Make BTMaxItemSize into object-like macro.
Make nbtree's "1/3 of a page limit" BTMaxItemSize function-like macro
(which accepts a "page" argument) into an object-like macro that can be
used from code that doesn't have convenient access to an nbtree page.

Preparation for an upcoming patch that adds skip scan to nbtree.
Parallel index scans that use skip scan will serialize datums (not just
SAOP array subscripts) when scheduling primitive scans.  BTMaxItemSize
will be used by btestimateparallelscan to determine how much DSM to
request.

Author: Peter Geoghegan <pg@bowt.ie>
Discussion: https://postgr.es/m/CAH2-Wz=H_RG5weNGeUG_TkK87tRBnH9mGCQj6WpM4V4FNWKv2g@mail.gmail.com
2025-03-11 10:35:56 -04:00
Peter Geoghegan
0fbceae841 Show index search count in EXPLAIN ANALYZE, take 2.
Expose the count of index searches/index descents in EXPLAIN ANALYZE's
output for index scan/index-only scan/bitmap index scan nodes.  This
information is particularly useful with scans that use ScalarArrayOp
quals, where the number of index searches can be unpredictable due to
implementation details that interact with physical index characteristics
(at least with nbtree SAOP scans, since Postgres 17 commit 5bf748b8).
The information shown also provides useful context when EXPLAIN ANALYZE
runs a plan with an index scan node that successfully applied the skip
scan optimization (set to be added to nbtree by an upcoming patch).

The instrumentation works by teaching all index AMs to increment a new
nsearches counter whenever a new index search begins.  The counter is
incremented at exactly the same point that index AMs already increment
the pg_stat_*_indexes.idx_scan counter (we're counting the same event,
but at the scan level rather than the relation level).  Parallel queries
have workers copy their local counter struct into shared memory when an
index scan node ends -- even when it isn't a parallel aware scan node.
An earlier version of this patch that only worked with parallel aware
scans became commit 5ead85fb (though that was quickly reverted by commit
d00107cd following "debug_parallel_query=regress" buildfarm failures).

Our approach doesn't match the approach used when tracking other index
scan related costs (e.g., "Rows Removed by Filter:").  It is comparable
to the approach used in similar cases involving costs that are only
readily accessible inside an access method, not from the executor proper
(e.g., "Heap Blocks:" output for a Bitmap Heap Scan, which was recently
enhanced to show per-worker costs by commit 5a1e6df3, using essentially
the same scheme as the one used here).  It is necessary for index AMs to
have direct responsibility for maintaining the new counter, since the
counter might need to be incremented multiple times per amgettuple call
(or per amgetbitmap call).  But it is also necessary for the executor
proper to manage the shared memory now used to transfer each worker's
counter struct to the leader.

Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Robert Haas <robertmhaas@gmail.com>
Reviewed-By: Tomas Vondra <tomas@vondra.me>
Reviewed-By: Masahiro Ikeda <ikedamsh@oss.nttdata.com>
Reviewed-By: Matthias van de Meent <boekewurm+postgres@gmail.com>
Discussion: https://postgr.es/m/CAH2-WzkRqvaqR2CTNqTZP0z6FuL4-3ED6eQB0yx38XBNj1v-4Q@mail.gmail.com
Discussion: https://postgr.es/m/CAH2-Wz=PKR6rB7qbx+Vnd7eqeB5VTcrW=iJvAsTsKbdG+kW_UA@mail.gmail.com
2025-03-11 09:20:50 -04:00
Peter Eisentraut
af4002b381 Rename amcancrosscompare
After more discussion about commit ce62f2f2a0, rename the index AM
property amcancrosscompare to two separate properties
amconsistentequality and amconsistentordering.  Also improve the
documentation and update some comments that were previously missed.

Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/flat/E1tngY6-0000UL-2n%40gemulon.postgresql.org
2025-03-07 11:46:33 +01:00
Peter Geoghegan
d00107cd63 Revert "Show index search count in EXPLAIN ANALYZE."
This reverts commit 5ead85fbc8.

This commit shows test failures with debug_parallel_query=regress.  The
underlying issue needs to be debugged, so revert for now.
2025-03-05 10:27:31 -05:00
Peter Geoghegan
5ead85fbc8 Show index search count in EXPLAIN ANALYZE.
Expose the count of index searches/index descents in EXPLAIN ANALYZE's
output for index scan nodes.  This information is particularly useful
with scans that use ScalarArrayOp quals, where the number of index scans
isn't predictable in advance (at least not with optimizations like the
one added to nbtree by Postgres 17 commit 5bf748b8).  It will also be
useful when EXPLAIN ANALYZE shows details of an nbtree index scan that
uses skip scan optimizations set to be introduced by an upcoming patch.

The instrumentation works by teaching index AMs to increment a new
nsearches counter whenever a new index search begins.  The counter is
incremented at exactly the same point that index AMs must already
increment the index's pg_stat_*_indexes.idx_scan counter (we're counting
the same event, but at the scan level rather than the relation level).
The new counter is stored in the scan descriptor (IndexScanDescData),
which explain.c reaches by going through the scan node's PlanState.

This approach doesn't match the approach used when tracking other index
scan specific costs (e.g., "Rows Removed by Filter:").  It is similar to
the approach used in other cases where we must track costs that are only
readily accessible inside an access method, and not from the executor
(e.g., "Heap Blocks:" output for a Bitmap Heap Scan).  It is inherently
necessary to maintain a counter that can be incremented multiple times
during a single amgettuple call (or amgetbitmap call), and directly
exposing PlanState.instrument to index access methods seems unappealing.

Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Tomas Vondra <tomas@vondra.me>
Reviewed-By: Robert Haas <robertmhaas@gmail.com>
Reviewed-By: Masahiro Ikeda <ikedamsh@oss.nttdata.com>
Reviewed-By: Matthias van de Meent <boekewurm+postgres@gmail.com>
Discussion: https://postgr.es/m/CAH2-Wz=PKR6rB7qbx+Vnd7eqeB5VTcrW=iJvAsTsKbdG+kW_UA@mail.gmail.com
Discussion: https://postgr.es/m/CAH2-WzkRqvaqR2CTNqTZP0z6FuL4-3ED6eQB0yx38XBNj1v-4Q@mail.gmail.com
2025-03-05 09:36:48 -05:00
Fujii Masao
fe186bda78 postgres_fdw: Extend postgres_fdw_get_connections to return remote backend PID.
This commit adds a new "remote_backend_pid" output column to
the postgres_fdw_get_connections function. It returns the process ID of
the remote backend, on the foreign server, handling the connection.

This enhancement is useful for troubleshooting, monitoring, and reporting.
For example, if a connection is unexpectedly closed by the foreign server,
the remote backend's PID can help diagnose the cause.

No extension version bump is needed, as commit c297a47c5f already
handled it for v18~.

Author: Sagar Dilip Shedge <sagar.shedge92@gmail.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/CAPhYifF25q5xUQWXETfKwhc0YVa_6+tfG9Kw4bCvCjpCWxYs2A@mail.gmail.com
2025-03-03 08:51:30 +09:00
Nathan Bossart
e636da9200 Adjust auto_explain's GUC descriptions.
This commit adjusts auto_explain's GUC descriptions to follow the
style guidelines established by commit 977d865c36.  Specifically,
it ensures the accepted special values are listed in a consistent
manner.

Author: Ilia Evdokimov <ilya.evdokimov@tantorlabs.com>
Reviewed-by: Peter Smith <smithpb2250@gmail.com>
Discussion: https://postgr.es/m/e82d4647-ce7f-45c7-9b01-fb900a050767%40tantorlabs.com
2025-02-28 16:05:51 -06:00
Robert Haas
9173e8b604 Create explain_format.c and move relevant code there.
explain.c has grown rather large, so move various functions that
are principally concerned with output generation to a new source
file, explain_format.c, instead of lumping them in with everything
else that is part of explain.c

Reviewed-by: Peter Geoghegan <pg@bowt.ie>
Discussion: http://postgr.es/m/CA+TgmoYutMw1Jgo8BWUmB3TqnOhsEAJiYO=rOQufF4gPLWmkLQ@mail.gmail.com
2025-02-27 12:37:10 -05:00
Robert Haas
95dbd827f2 EXPLAIN: Always use two fractional digits for row counts.
Commit ddb17e387a attempted to avoid
confusing users by displaying digits after the decimal point only when
nloops > 1, since it's impossible to have a fraction row count after a
single iteration. However, this made the regression tests unstable since
parallal queries will have nloops>1 for all nodes below the Gather or
Gather Merge in normal cases, but if the workers don't start in time and
the leader finishes all the work, they will suddenly have nloops==1,
making it unpredictable whether the digits after the decimal point would
be displayed or not. Although 44cbba9a7f
seemed to fix the immediate failures, it may still be the case that there
are lower-probability failures elsewhere in the regression tests.

Various fixes are possible here. For example, it has previously been
proposed that we should try to display the digits after the decimal
point only if rows/nloops is an integer, but currently rows is storead
as a float so it's not theoretically an exact quantity -- precision
could be lost in extreme cases. It has also been proposed that we
should try to display the digits after the decimal point only if we're
under some sort of construct that could potentially cause looping
regardless of whether it actually does. While such ideas are not
without merit, this patch adopts the much simpler solution of always
display two decimal digits. If that approach stands up to scrutiny
from the buildfarm and human users, it spares us the trouble of doing
anything more complex; if not, we can reassess.

This commit incidentally reverts 44cbba9a7f,
which should no longer be needed.

Author: Robert Haas <robertmhaas@gmail.com>
Author: Ilia Evdokimov <ilya.evdokimov@tantorlabs.com>
Discussion: http://postgr.es/m/CA+TgmoazzVHn8sFOMFAEwoqBTDxKT45D7mvkyeHgqtoD2cn58Q@mail.gmail.com
2025-02-27 11:27:16 -05:00
Peter Eisentraut
ce62f2f2a0 Generalize hash and ordering support in amapi
Stop comparing access method OID values against HASH_AM_OID and
BTREE_AM_OID, and instead check the IndexAmRoutine for an index to see
if it advertises its ability to perform the necessary ordering,
hashing, or cross-type comparing functionality.  A field amcanorder
already existed, this uses it more widely.  Fields amcanhash and
amcancrosscompare are added for the other purposes.

Author: Mark Dilger <mark.dilger@enterprisedb.com>
Discussion: https://www.postgresql.org/message-id/flat/E72EAA49-354D-4C2E-8EB9-255197F55330@enterprisedb.com
2025-02-27 17:03:31 +01:00
Masahiko Sawada
dfd8e6c73e Fix an issue with index scan using pg_trgm due to char signedness on different architectures.
GIN and GiST indexes utilizing pg_trgm's opclasses store sorted
trigrams within index tuples. When comparing and sorting each trigram,
pg_trgm treats each character as a 'char[3]' type in C. However, the
char type in C can be interpreted as either signed char or unsigned
char, depending on the platform, if the signedness is not explicitly
specified. Consequently, during replication between different CPU
architectures, there was an issue where index scans on standby servers
could not locate matching index tuples due to the differing treatment
of character signedness.

This change introduces comparison functions for trgm that explicitly
handle signed char and unsigned char. The appropriate comparison
function will be dynamically selected based on the character
signedness stored in the control file. Therefore, upgraded clusters
can utilize the indexes without rebuilding, provided the cluster
upgrade occurs on platforms with the same character signedness as the
original cluster initialization.

The default char signedness information was introduced in 44fe30fdab,
so no backpatch.

Reviewed-by: Noah Misch <noah@leadboat.com>
Discussion: https://postgr.es/m/CB11ADBC-0C3F-4FE0-A678-666EE80CBB07%40amazon.com
2025-02-21 10:27:39 -08:00
Peter Eisentraut
7d6d2c4bbd Drop opcintype from index AM strategy translation API
The type argument wasn't actually really necessary.  It was a remnant
of converting the API of the gist strategy translation from using
opclass to using opfamily+opcintype (commits c09e5a6a01,
622f678c10).  For looking up the gist translation function, we used
the convention "amproclefttype = amprocrighttype = opclass's
opcintype" (see pg_amproc.h).  But each operator family should only
have one translation function, and getting the right type for the
lookup is sometimes cumbersome and fragile, so this is all
unnecessarily complicated.

To simplify this, change the gist stategy support procedure to take
"any", "any" as argument.  (This is arbitrary but seems intuitive.
The alternative of using InvalidOid as argument(s) upsets various DDL
commands, so it's not practical.)  Then we don't need opcintype for
the lookup, and we can remove it from all the API layers introduced by
commit c09e5a6a01.

This also adds some more documentation about the correct signature of
the gist support function and adds more checks in gistvalidate().
This was previously underspecified.  (It relied implicitly on
convention mentioned above.)

Discussion: https://www.postgresql.org/message-id/flat/E72EAA49-354D-4C2E-8EB9-255197F55330@enterprisedb.com
2025-02-21 09:07:16 +01:00
Peter Eisentraut
3e4d868615 Remove various unnecessary (char *) casts
Remove a number of (char *) casts that are unnecessary.  Or in some
cases, rewrite the code to make the purpose of the cast clearer.

Reviewed-by: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org>
Discussion: https://www.postgresql.org/message-id/flat/fd1fcedb-3492-4fc8-9e3e-74b97f2db6c7%40eisentraut.org
2025-02-20 19:49:27 +01:00
Amit Langote
525392d572 Don't lock partitions pruned by initial pruning
Before executing a cached generic plan, AcquireExecutorLocks() in
plancache.c locks all relations in a plan's range table to ensure the
plan is safe for execution. However, this locks runtime-prunable
relations that will later be pruned during "initial" runtime pruning,
introducing unnecessary overhead.

This commit defers locking for such relations to executor startup and
ensures that if the CachedPlan is invalidated due to concurrent DDL
during this window, replanning is triggered. Deferring these locks
avoids unnecessary locking overhead for pruned partitions, resulting
in significant speedup, particularly when many partitions are pruned
during initial runtime pruning.

* Changes to locking when executing generic plans:

AcquireExecutorLocks() now locks only unprunable relations, that is,
those found in PlannedStmt.unprunableRelids (introduced in commit
cbc127917e), to avoid locking runtime-prunable partitions
unnecessarily.  The remaining locks are taken by
ExecDoInitialPruning(), which acquires them only for partitions that
survive pruning.

This deferral does not affect the locks required for permission
checking in InitPlan(), which takes place before initial pruning.
ExecCheckPermissions() now includes an Assert to verify that all
relations undergoing permission checks, none of which can be in the
set of runtime-prunable relations, are properly locked.

* Plan invalidation handling:

Deferring locks introduces a window where prunable relations may be
altered by concurrent DDL, invalidating the plan. A new function,
ExecutorStartCachedPlan(), wraps ExecutorStart() to detect and handle
invalidation caused by deferred locking. If invalidation occurs,
ExecutorStartCachedPlan() updates CachedPlan using the new
UpdateCachedPlan() function and retries execution with the updated
plan. To ensure all code paths that may be affected by this handle
invalidation properly, all callers of ExecutorStart that may execute a
PlannedStmt from a CachedPlan have been updated to use
ExecutorStartCachedPlan() instead.

UpdateCachedPlan() replaces stale plans in CachedPlan.stmt_list. A new
CachedPlan.stmt_context, created as a child of CachedPlan.context,
allows freeing old PlannedStmts while preserving the CachedPlan
structure and its statement list. This ensures that loops over
statements in upstream callers of ExecutorStartCachedPlan() remain
intact.

ExecutorStart() and ExecutorStart_hook implementations now return a
boolean value indicating whether plan initialization succeeded with a
valid PlanState tree in QueryDesc.planstate, or false otherwise, in
which case QueryDesc.planstate is NULL. Hook implementations are
required to call standard_ExecutorStart() at the beginning, and if it
returns false, they should do the same without proceeding.

* Testing:

To verify these changes, the delay_execution module tests scenarios
where cached plans become invalid due to changes in prunable relations
after deferred locks.

* Note to extension authors:

ExecutorStart_hook implementations must verify plan validity after
calling standard_ExecutorStart(), as explained earlier. For example:

    if (prev_ExecutorStart)
        plan_valid = prev_ExecutorStart(queryDesc, eflags);
    else
        plan_valid = standard_ExecutorStart(queryDesc, eflags);

    if (!plan_valid)
        return false;

    <extension-code>

    return true;

Extensions accessing child relations, especially prunable partitions,
via ExecGetRangeTableRelation() must now ensure their RT indexes are
present in es_unpruned_relids (introduced in commit cbc127917e), or
they will encounter an error. This is a strict requirement after this
change, as only relations in that set are locked.

The idea of deferring some locks to executor startup, allowing locks
for prunable partitions to be skipped, was first proposed by Tom Lane.

Reviewed-by: Robert Haas <robertmhaas@gmail.com> (earlier versions)
Reviewed-by: David Rowley <dgrowleyml@gmail.com> (earlier versions)
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> (earlier versions)
Reviewed-by: Tomas Vondra <tomas@vondra.me>
Reviewed-by: Junwang Zhao <zhjwpku@gmail.com>
Discussion: https://postgr.es/m/CA+HiwqFGkMSge6TgC9KQzde0ohpAycLQuV7ooitEEpbKB0O_mg@mail.gmail.com
2025-02-20 17:09:48 +09:00
John Naylor
53d3daa491 Specialize intarray sorting
There is at least one report in the field of storing millions of
integers in arrays, so it seems like a good time to specialize
intarray's qsort function. In doing so, streamline the comparators:
Previously there were three, two for each direction for sorting
and one passed to qunique_arg. To preserve the early exit in the
case of descending input, pass the direction as an argument to
the comparator. This requires giving up duplicate detection, which
previously allowed skipping the qunique_arg() call. Testing showed
no regressions this way.

In passing, get rid of nearby checks that the input has at least
two elements, since preserving them would make some macros less
readable. These are not necessary for correctness, and seem like
premature optimizations.

Author: Andrey M. Borodin <x4mmm@yandex-team.ru>
Discussion: https://postgr.es/m/098A3E67-E4A6-4086-9C66-B1EAEB1DFE1C@yandex-team.ru
2025-02-18 11:04:55 +07:00
Michael Paquier
ce5bcc4a9f pg_stat_statements: Add wal_buffers_full
wal_buffers_full tracks the number of times WAL buffers become full,
giving hints to be able to tune the GUC wal_buffers.

Up to now, this information was only available in pg_stat_wal.  With
this field available in WalUsage since eaf502747b, exposing it in
pg_stat_statements is straight-forward, and it offers more granularity
at query level.

pg_stat_statements does not need a version bump as one has been done in
commit cf54a2c002 for this development cycle.

Author: Bertrand Drouvot
Reviewed-by: Ilia Evdokimov
Discussion: https://postgr.es/m/Z6SOha5YFFgvpwQY@ip-10-97-1-34.eu-west-3.compute.internal
2025-02-17 13:55:17 +09:00
Daniel Gustafsson
9ad1b3d01f pgcrypto: Add support for CFB mode in AES encryption
Cipher Feedback Mode, CFB, is a self-synchronizing stream cipher which
is very similar to CBC performed in reverse. Since OpenSSL supports it,
we can easily plug it into the existing cipher selection code without
any need for infrastructure changes.

This patch was simultaneously submitted by Umar Hayat and Vladyslav
Nebozhyn, the latter whom suggested the feauture. The committed patch
is Umar's version.

Author: Umar Hayat <postgresql.wizard@gmail.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://postgr.es/m/CAPBGcbxo9ASzq14VTpQp3mnUJ5omdgTWUJOvWV0L6nNigWE5jw@mail.gmail.com
2025-02-14 21:18:37 +01:00
Peter Eisentraut
ed5e5f0710 Remove unnecessary (char *) casts [xlog]
Remove (char *) casts no longer needed after XLogRegisterData() and
XLogRegisterBufData() argument type change.

Reviewed-by: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org>
Discussion: https://www.postgresql.org/message-id/flat/fd1fcedb-3492-4fc8-9e3e-74b97f2db6c7%40eisentraut.org
2025-02-13 10:57:07 +01:00
Masahiko Sawada
072ee847ad Skip logical decoding of already-aborted transactions.
Previously, transaction aborts were detected concurrently only during
system catalog scans while replaying a transaction in streaming mode.

This commit adds an additional CLOG lookup to check the transaction
status, allowing the logical decoding to skip changes also when it
doesn't touch system catalogs, if the transaction is already
aborted. This optimization enhances logical decoding performance,
especially for large transactions that have already been rolled back,
as it avoids unnecessary disk or network I/O.

To avoid potential slowdowns caused by frequent CLOG lookups for small
transactions (most of which commit), the CLOG lookup is performed only
for large transactions before eviction. The performance benchmark
results showed there is not noticeable performance regression due to
CLOG lookups.

Reviewed-by: Amit Kapila, Peter Smith, Vignesh C, Ajin Cherian
Reviewed-by: Dilip Kumar, Andres Freund
Discussion: https://postgr.es/m/CAD21AoDht9Pz_DFv_R2LqBTBbO4eGrpa9Vojmt5z5sEx3XwD7A@mail.gmail.com
2025-02-12 16:31:34 -08:00
Peter Eisentraut
1b5841d461 Remove unnecessary (char *) casts [checksum]
Remove some (char *) casts related to uses of the pg_checksum_page()
function.  These casts are useless, because everything involved
already has the right type.  Moreover, these casts actually silently
discarded a const qualifier.  The declaration of a higher-level
function needs to be adjusted to fix that.

Reviewed-by: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org>
Discussion: https://www.postgresql.org/message-id/flat/fd1fcedb-3492-4fc8-9e3e-74b97f2db6c7%40eisentraut.org
2025-02-12 08:59:48 +01:00
Peter Eisentraut
827b4060a8 Remove unnecessary (char *) casts [mem]
Remove (char *) casts around memory functions such as memcmp(),
memcpy(), or memset() where the cast is useless.  Since these
functions don't take char * arguments anyway, these casts are at best
complicated casts to (void *), about which see commit 7f798aca1d.

Reviewed-by: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org>
Discussion: https://www.postgresql.org/message-id/flat/fd1fcedb-3492-4fc8-9e3e-74b97f2db6c7%40eisentraut.org
2025-02-12 08:50:13 +01:00
Peter Eisentraut
506183bce7 Remove unnecessary (char *) casts [string]
Remove (char *) casts around string functions where the arguments or
result already have the right type and the cast is useless (or worse,
potentially casts away a qualifier, but this doesn't appear to be the
case here).

Reviewed-by: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org>
Discussion: https://www.postgresql.org/message-id/flat/fd1fcedb-3492-4fc8-9e3e-74b97f2db6c7%40eisentraut.org
2025-02-12 08:49:18 +01:00
Nathan Bossart
e5b0b0ce15 Add is_analyze parameter to vacuum_delay_point().
This function is used in both vacuum and analyze code paths, and a
follow-up commit will require distinguishing between the two.  This
commit forces callers to specify whether they are in a vacuum or
analyze path, but it does not use that information for anything
yet.

Author: Nathan Bossart <nathandbossart@gmail.com>
Co-authored-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://postgr.es/m/ZmaXmWDL829fzAVX%40ip-10-97-1-34.eu-west-3.compute.internal
2025-02-11 16:38:14 -06:00
Peter Eisentraut
83ea6c5402 Virtual generated columns
This adds a new variant of generated columns that are computed on read
(like a view, unlike the existing stored generated columns, which are
computed on write, like a materialized view).

The syntax for the column definition is

    ... GENERATED ALWAYS AS (...) VIRTUAL

and VIRTUAL is also optional.  VIRTUAL is the default rather than
STORED to match various other SQL products.  (The SQL standard makes
no specification about this, but it also doesn't know about VIRTUAL or
STORED.)  (Also, virtual views are the default, rather than
materialized views.)

Virtual generated columns are stored in tuples as null values.  (A
very early version of this patch had the ambition to not store them at
all.  But so much stuff breaks or gets confused if you have tuples
where a column in the middle is completely missing.  This is a
compromise, and it still saves space over being forced to use stored
generated columns.  If we ever find a way to improve this, a bit of
pg_upgrade cleverness could allow for upgrades to a newer scheme.)

The capabilities and restrictions of virtual generated columns are
mostly the same as for stored generated columns.  In some cases, this
patch keeps virtual generated columns more restricted than they might
technically need to be, to keep the two kinds consistent.  Some of
that could maybe be relaxed later after separate careful
considerations.

Some functionality that is currently not supported, but could possibly
be added as incremental features, some easier than others:

- index on or using a virtual column
- hence also no unique constraints on virtual columns
- extended statistics on virtual columns
- foreign-key constraints on virtual columns
- not-null constraints on virtual columns (check constraints are supported)
- ALTER TABLE / DROP EXPRESSION
- virtual column cannot have domain type
- virtual columns are not supported in logical replication

The tests in generated_virtual.sql have been copied over from
generated_stored.sql with the keyword replaced.  This way we can make
sure the behavior is mostly aligned, and the differences can be
visible.  Some tests for currently not supported features are
currently commented out.

Reviewed-by: Jian He <jian.universality@gmail.com>
Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com>
Tested-by: Shlok Kyal <shlok.kyal.oss@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/a368248e-69e4-40be-9c07-6c3b5880b0a6@eisentraut.org
2025-02-07 09:46:59 +01:00
Daniel Gustafsson
44ec095751 Remove support for linking with libeay32 and ssleay32
The OpenSSL project stopped using the eay names back in 2016
on platforms other than Microsoft Windows, and version 1.1.0
removed the names from Windows as well. Since we now require
OpenSSL 1.1.1 we can remove support for using the eay names
from our tree as well.

Author: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/3C445F8E-D43E-4970-9CD9-A54882197714@yesql.se
Discussion: https://postgr.es/m/CAHrt6656W9OnFomQTHBGYDcM5CKZ7hcgzFt8L+N0ezBZfcN3zA@mail.gmail.com
2025-02-06 20:26:46 +01:00
Daniel Gustafsson
affd38e55a pgcrypto: Remove static storage class from variables
Variables p, sp and ep were labeled with static storage class
but are all assigned before use so they cannot carry any data
across calls.  Fix by removing the static label.

Also while in there, make the magic variable const as it will
never change.

Author: Japin Li <japinli@hotmail.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/ME0P300MB0445096B67ACE8CE25772F00B6F72@ME0P300MB0445.AUSP300.PROD.OUTLOOK.COM
2025-02-06 15:13:40 +01:00
Peter Eisentraut
cc2c9fa696 sepgsql: update TAP test to use fat comma style
Adopt the style introduced by commit ce1b0f9da0 to this new test
file.

Author: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org>
Discussion: https://www.postgresql.org/message-id/87y0yv2har.fsf@wibble.ilmari.org
2025-02-04 15:51:42 +01:00
Peter Eisentraut
c09e5a6a01 Convert strategies to and from compare types
For each Index AM, provide a mapping between operator strategies and
the system-wide generic concept of a comparison type.  For example,
for btree, BTLessStrategyNumber maps to and from COMPARE_LT.  Numerous
places in the planner and executor think directly in terms of btree
strategy numbers (and a few in terms of hash strategy numbers.)  These
should be converted over subsequent commits to think in terms of
CompareType instead.  (This commit doesn't make any use of this API
yet.)

Author: Mark Dilger <mark.dilger@enterprisedb.com>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://www.postgresql.org/message-id/flat/E72EAA49-354D-4C2E-8EB9-255197F55330@enterprisedb.com
2025-02-02 10:26:04 +01:00
Peter Eisentraut
119fc30dd5 Move CompareType to separate header file
We'll want to make use of it in more places, and we'd prefer to not
have to include all of primnodes.h everywhere.

Author: Mark Dilger <mark.dilger@enterprisedb.com>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://www.postgresql.org/message-id/flat/E72EAA49-354D-4C2E-8EB9-255197F55330@enterprisedb.com
2025-02-02 08:11:57 +01:00
Peter Eisentraut
43493cceda Add get_opfamily_name() function
This refactors and simplifies various existing code to make use of the
new function.

Reviewed-by: Mark Dilger <mark.dilger@enterprisedb.com>
Discussion: https://www.postgresql.org/message-id/flat/E72EAA49-354D-4C2E-8EB9-255197F55330@enterprisedb.com
2025-02-01 10:42:58 +01:00
Tom Lane
0da39aa766 Handle default NULL insertion a little better.
If a column is omitted in an INSERT, and there's no column default,
the code in preptlist.c generates a NULL Const to be inserted.
Furthermore, if the column is of a domain type, we wrap the Const
in CoerceToDomain, so as to throw a run-time error if the domain
has a NOT NULL constraint.  That's fine as far as it goes, but
there are two problems:

1. We're being sloppy about the type/typmod that the Const is
labeled with.  It really should have the domain's base type/typmod,
since it's the input to CoerceToDomain not the output.  This can
result in coerce_to_domain inserting a useless length-coercion
function (useless because it's being applied to a null).  The
coercion would typically get const-folded away later, but it'd
be better not to create it in the first place.

2. We're not applying expression preprocessing (specifically,
eval_const_expressions) to the resulting expression tree.
The planner's primary expression-preprocessing pass already happened,
so that means the length coercion step and CoerceToDomain node miss
preprocessing altogether.

This is at the least inefficient, since it means the length coercion
and CoerceToDomain will actually be executed for each inserted row,
though they could be const-folded away in most cases.  Worse, it
seems possible that missing preprocessing for the length coercion
could result in an invalid plan (for example, due to failing to
perform default-function-argument insertion).  I'm not aware of
any live bug of that sort with core datatypes, and it might be
unreachable for extension types as well because of restrictions of
CREATE CAST, but I'm not entirely convinced that it's unreachable.
Hence, it seems worth back-patching the fix (although I only went
back to v14, as the patch doesn't apply cleanly at all in v13).

There are several places in the rewriter that are building null
domain constants the same way as preptlist.c.  While those are
before the planner and hence don't have any reachable bug, they're
still applying a length coercion that will be const-folded away
later, uselessly wasting cycles.  Hence, make a utility routine
that all of these places can call to do it right.

Making this code more careful about the typmod assigned to the
generated NULL constant has visible but cosmetic effects on some
of the plans shown in contrib/postgres_fdw's regression tests.

Discussion: https://postgr.es/m/1865579.1738113656@sss.pgh.pa.us
Backpatch-through: 14
2025-01-29 15:31:55 -05:00
John Naylor
128897b101 Fix grammatical typos around possessive "its"
Some places spelled it "it's", which is short for "it is".
In passing, fix a couple other nearby grammatical errors.

Author: Jacob Brazeal <jacob.brazeal@gmail.com>
Discussion: https://postgr.es/m/CA+COZaAO8g1KJCV0T48=CkJMjAnnfTGLWOATz+2aCh40c2Nm+g@mail.gmail.com
2025-01-29 14:39:14 +07:00
Noah Misch
81772a495e Merge copies of converting an XID to a FullTransactionId.
Assume twophase.c is the performance-sensitive caller, and preserve its
choice of unlikely() branch hint.  Add some retrospective rationale for
that choice.  Back-patch to v17, for the next commit to use it.

Reviewed (in earlier versions) by Michael Paquier.

Discussion: https://postgr.es/m/17821-dd8c334263399284@postgresql.org
Discussion: https://postgr.es/m/20250116010051.f3.nmisch@google.com
2025-01-25 11:28:14 -08:00
Peter Eisentraut
13a255c195 Fix copy-and-paste typo 2025-01-24 17:45:55 +01:00
Daniel Gustafsson
035f99cbeb pgcrypto: Make it possible to disable built-in crypto
When using OpenSSL and/or the underlying operating system in FIPS
mode no non-FIPS certified crypto implementations should be used.
While that is already possible by just not invoking the built-in
crypto in pgcrypto, this adds a GUC which prohibit the code from
being called.  This doesn't change the FIPS status of PostgreSQL
but can make it easier for sites which target FIPS compliance to
ensure that violations cannot occur.

Author: Daniel Gustafsson <daniel@yesql.se>
Author: Joe Conway <mail@joeconway.com>
Reviewed-by: Joe Conway <mail@joeconway.com>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Discussion: https://postgr.es/m/16b4a157-9ea1-44d0-b7b3-4c85df5de97b@joeconway.com
2025-01-24 14:25:08 +01:00
Daniel Gustafsson
924d89a354 pgcrypto: Add function to check FIPS mode
This adds a SQL callable function for reading and returning the status
of FIPS configuration of OpenSSL.  If OpenSSL is operating with FIPS
enabled it will return true, otherwise false.  As this adds a function
to the SQL file, bump the extension version to 1.4.

Author: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Joe Conway <mail@joeconway.com>
Discussion: https://postgr.es/m/8f979145-e206-475a-a31b-73c977a4134c@joeconway.com
2025-01-24 14:18:40 +01:00
Peter Eisentraut
aeb8ea361a Convert sepgsql tests to TAP
Add a TAP test for sepgsql.  This automates the previously required
manual setup before the test.  The actual tests are still run by
pg_regress, as before, but now called from within the TAP Perl script.

The previous manual test script (test_sepgsql) is left in place, since
its purpose is (also) to test whether a running instance was properly
initialized for sepgsql.  But it has been changed to call pg_regress
directly and no longer require make.

Reviewed-by: Andreas Karlsson <andreas@proxel.se>
Discussion: https://www.postgresql.org/message-id/flat/651a5baf-5c45-4a5a-a202-0c8453a4ebf8@eisentraut.org
2025-01-24 12:39:47 +01:00
Peter Eisentraut
02ed3c2bdc meson: Fix sepgsql installation
The sepgsql.sql file should be installed under share/contrib/, not
share/extension/, since it is not an extension.  This makes it match
what make install does.

Discussion: https://www.postgresql.org/message-id/flat/651a5baf-5c45-4a5a-a202-0c8453a4ebf8@eisentraut.org
2025-01-24 10:26:12 +01:00
Peter Eisentraut
34694ec888 Convert macros to static inline functions (htup_details.h, itup.h)
Discussion: https://www.postgresql.org/message-id/flat/5b558da8-99fb-0a99-83dd-f72f05388517@enterprisedb.com
2025-01-23 12:12:08 +01:00
Tom Lane
ea68ea6320 Repair incorrect handling of AfterTriggerSharedData.ats_modifiedcols.
This patch fixes two distinct errors that both ultimately trace
to commit 71d60e2aa, which added the ats_modifiedcols field.

The more severe error is that ats_modifiedcols wasn't accounted for
in afterTriggerAddEvent's scanning loop that looks for a pre-existing
duplicate AfterTriggerSharedData.  Thus, a new event could be
incorrectly matched to an AfterTriggerSharedData that has a different
value of ats_modifiedcols, resulting in the wrong tg_updatedcols
bitmap getting passed to the trigger whenever it finally gets fired.
We'd not noticed because (a) few triggers consult tg_updatedcols,
and (b) we had no tests exercising a case where such a trigger was
called as an AFTER trigger.  In the test case added by this commit,
contrib/lo's trigger fails to remove a large object when expected
because (without this fix) it thinks the LO OID column hasn't changed.

The other problem was introduced by commit ce5aaea8c, which copied the
modified-columns bitmap into trigger-related storage.  It made a copy
for every trigger event, whereas what we really want is to make a new
copy only when we make a new AfterTriggerSharedData entry.  (We could
imagine adding extra logic to reduce the number of bitmap copies still
more, but it doesn't look worthwhile at the moment.)  In a simple test
of an UPDATE of 10000000 rows with a single AFTER trigger, this thinko
roughly tripled the amount of memory consumed by the pending-triggers
data structures, from 160446744 to 480443440 bytes.

Fixing the first problem requires introducing a bms_equal() call into
afterTriggerAddEvent's scanning loop, which is slightly annoying from
a speed perspective.  However, getting rid of the excessive bms_copy()
calls from the second problem balances that out; overall speed of
trigger operations is the same or slightly better, in my tests.

Discussion: https://postgr.es/m/3496294.1737501591@sss.pgh.pa.us
Backpatch-through: 13
2025-01-22 11:58:20 -05:00
Michael Paquier
ce1b0f9da0 Improve grammar of options for command arrays in TAP tests
This commit rewrites a good chunk of the command arrays in TAP tests
with a grammar based on the following rules:
- Fat commas are used between option names and their values, making it
clear to both humans and perltidy that values and names are bound
together.  This is particularly useful for the readability of multi-line
command arrays, and there are plenty of them in the TAP tests.  Most of
the test code is updated to use this style.  Some commands used
parenthesis to show the link, or attached values and options in a single
string.  These are updated to use fat commas instead.
- Option names are switched to use their long names, making them more
self-documented.  Based on a suggestion by Andrew Dunstan.
- Add some trailing commas after the last item in multi-line arrays,
which is a common perl style.

Not all the places are taken care of, but this covers a very good chunk
of them.

Author: Dagfinn Ilmari Mannsåker
Reviewed-by: Michael Paquier, Peter Smith, Euler Taveira
Discussion: https://postgr.es/m/87jzc46d8u.fsf@wibble.ilmari.org
2025-01-22 14:47:13 +09:00
Dean Rasheed
80feb727c8 Add OLD/NEW support to RETURNING in DML queries.
This allows the RETURNING list of INSERT/UPDATE/DELETE/MERGE queries
to explicitly return old and new values by using the special aliases
"old" and "new", which are automatically added to the query (if not
already defined) while parsing its RETURNING list, allowing things
like:

  RETURNING old.colname, new.colname, ...

  RETURNING old.*, new.*

Additionally, a new syntax is supported, allowing the names "old" and
"new" to be changed to user-supplied alias names, e.g.:

  RETURNING WITH (OLD AS o, NEW AS n) o.colname, n.colname, ...

This is useful when the names "old" and "new" are already defined,
such as inside trigger functions, allowing backwards compatibility to
be maintained -- the interpretation of any existing queries that
happen to already refer to relations called "old" or "new", or use
those as aliases for other relations, is not changed.

For an INSERT, old values will generally be NULL, and for a DELETE,
new values will generally be NULL, but that may change for an INSERT
with an ON CONFLICT ... DO UPDATE clause, or if a query rewrite rule
changes the command type. Therefore, we put no restrictions on the use
of old and new in any DML queries.

Dean Rasheed, reviewed by Jian He and Jeff Davis.

Discussion: https://postgr.es/m/CAEZATCWx0J0-v=Qjc6gXzR=KtsdvAE7Ow=D=mu50AgOe+pvisQ@mail.gmail.com
2025-01-16 14:57:35 +00:00
Peter Eisentraut
ff030ebe25 Check return of pg_b64_encode() for error
Forgotten in commit 761c79508e.

Author: Ranier Vilela <ranier.vf@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/CAEudQAq-3yHsSdWoOOaw%2BgAQYgPMpMGuB5pt2yCXgv-YuxG2Hg%40mail.gmail.com
2025-01-16 08:35:57 +01:00
Peter Eisentraut
761c79508e postgres_fdw: SCRAM authentication pass-through
This enables SCRAM authentication for postgres_fdw when connecting to
a foreign server without having to store a plain-text password on user
mapping options.

This is done by saving the SCRAM ClientKey and ServeryKey from the
client authentication and using those instead of the plain-text
password for the server-side SCRAM exchange.  The new foreign-server
or user-mapping option "use_scram_passthrough" enables this.

Co-authored-by: Matheus Alcantara <mths.dev@pm.me>
Co-authored-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://www.postgresql.org/message-id/flat/27b29a35-9b96-46a9-bc1a-914140869dac@gmail.com
2025-01-15 17:58:05 +01:00
Peter Eisentraut
630f9a43ce Change gist stratnum function to use CompareType
This changes commit 7406ab623f in that the gist strategy number
mapping support function is changed to use the CompareType enum as
input, instead of the "well-known" RT*StrategyNumber strategy numbers.

This is a bit cleaner, since you are not dealing with two sets of
strategy numbers.  Also, this will enable us to subsume this system
into a more general system of using CompareType to define operator
semantics across index methods.

Discussion: https://www.postgresql.org/message-id/flat/E72EAA49-354D-4C2E-8EB9-255197F55330@enterprisedb.com
2025-01-15 11:34:04 +01:00
Tom Lane
bebe904038 Use @extschema:name@ notation in contrib transform modules.
Harden hstore_plperl, hstore_plpython, and ltree_plpython
against search-path-based attacks by using @extschema:name@
notation to refer to the underlying hstore or ltree data type.

This allows removal of the previous documentation warning
suggesting that they must be installed in the same schema as
the underlying data type.  In passing, also improve a para in
extend.sgml to suggest using @extschema:name@ for such purposes.

Discussion: https://postgr.es/m/692480.1736021695@sss.pgh.pa.us
2025-01-09 15:16:56 -05:00
Michael Paquier
e0c3d5122e pg_freespacemap: Fix declaration of pg_freespace(regclass)
This function called generate_series() without enforcing its input
argument types, making possible for an attacker to catch this call, by
defining for example a generate_series(int,bigint).

The internals of pg_freespace(regclass) are changed to force the use of
bigint for the inputs of generate_series().  A more consistent style is
applied for all its hardcoded values, while on it.

Issue introduced in 3f323eba89.

Reported-by: Noah Misch
Reviewed-by: Noah Misch
Discussion: https://postgr.es/m/20250106190428.ec.nmisch@google.com
2025-01-08 13:16:43 +09:00
Nathan Bossart
f7e1b3828a Add passwordcheck.min_password_length.
This new parameter can be used to change the minimum allowed
password length (in bytes).  Note that it has no effect if a user
supplies a pre-encrypted password.

Author: Emanuele Musella, Maurizio Boriani
Reviewed-by: Tomas Vondra, Bertrand Drouvot, Japin Li
Discussion: https://postgr.es/m/CA%2BugDNyYtHOtWCqVD3YkSVYDWD_1fO8Jm_ahsDGA5dXhbDPwrQ%40mail.gmail.com
2025-01-07 15:06:40 -06:00
Bruce Momjian
50e6eb731d Update copyright for 2025
Backpatch-through: 13
2025-01-01 11:21:55 -05:00
Tom Lane
68ff25eef1 contrib/pageinspect: Use SQL-standard function bodies.
In the same spirit as 969bbd0fa, 13e3796c9, 3f323eba8.

Tom Lane and Ronan Dunklau

Discussion: https://postgr.es/m/3316564.aeNJFYEL58@aivenlaptop
2024-12-29 14:58:05 -05:00
Tom Lane
667368fd26 contrib/xml2: Use SQL-standard function bodies.
In the same spirit as 969bbd0fa, 13e3796c9, 3f323eba8.

Tom Lane and Ronan Dunklau

Discussion: https://postgr.es/m/3316564.aeNJFYEL58@aivenlaptop
2024-12-29 13:53:00 -05:00
Tom Lane
97a5a16849 contrib/citext: Use SQL-standard function bodies.
In the same spirit as 969bbd0fa, 13e3796c9, 3f323eba8.

Tom Lane and Ronan Dunklau

Discussion: https://postgr.es/m/3316564.aeNJFYEL58@aivenlaptop
2024-12-29 13:37:35 -05:00
Peter Eisentraut
301de6a6f6 Partial pgindent of .l and .y files
Trying to clean up the code a bit while we're working on these files
for the reentrant scanner/pure parser patches.  This cleanup only
touches the code sections after the second '%%' in each file, via a
manually-supervised and locally hacked up pgindent.
2024-12-25 17:55:42 +01:00
Tom Lane
c431986de1 postgres_fdw: re-issue cancel requests a few times if necessary.
Despite the best efforts of commit 0e5c82380, we're still seeing
occasional failures of postgres_fdw's query_cancel test in the
buildfarm.  Investigation suggests that its 100ms timeout is
still not enough to reliably ensure that the remote side starts
the query before receiving the cancel request --- and if it
hasn't, it will just discard the request because it's idle.

We discussed allowing a cancel request to kill the next-received
query, but that would have wide and perhaps unpleasant side-effects.
What seems safer is to make postgres_fdw do what a human user would
likely do, which is issue another cancel request if the first one
didn't seem to do anything.  We'll keep the same overall 30 second
grace period before concluding things are broken, but issue additional
cancel requests after 1 second, then 2 more seconds, then 4, then 8.
(The next one in series is 16 seconds, but we'll hit the 30 second
timeout before that.)

Having done that, revert the timeout in query_cancel.sql to 10 ms.
That will still be enough on most machines, most of the time, for
the remote query to start; but now we're intentionally risking the
race condition occurring sometimes in the buildfarm, so that the
repeat-cancel code path will get some testing.

As before, back-patch to v17.  We might eventually contemplate
back-patching this further, and/or adding similar logic to dblink.
But given the lack of field complaints to date, this feels like
mostly an exercise in test case stabilization, so v17 is enough.

Discussion: https://postgr.es/m/colnv3lzzmc53iu5qoawynr6qq7etn47lmggqr65ddtpjliq5d@glkveb4m6nop
2024-12-23 15:14:30 -05:00
Michael Paquier
7f97b4734f Fix some comments related to library unloading
Library unloading has never been supported with its code removed in
ab02d702ef, and there were some comments still mentioning that it was
a possible operation.

ChangAo has noticed the incorrect references in dfmgr.c, while I have
noticed the other ones while scanning the rest of the tree for similar
mistakes.

Author: ChangAo Chen, Michael Paquier
Reviewed-by: Tom Lane
Discussion: https://postgr.es/m/tencent_1D09840A1632D406A610C8C4E2491D74DB0A@qq.com
2024-12-23 14:46:49 +09:00
David Rowley
db448ce5ad Optimize alignment calculations in tuple form/deform
Here we convert CompactAttribute.attalign from a char, which is directly
derived from pg_attribute.attalign into a uint8, which stores the number
of bytes to align the column's value by in the tuple.

This allows tuple deformation and tuple size calculations to move away
from using the inefficient att_align_nominal() macro, which manually
checks each TYPALIGN_* char to translate that into the alignment bytes
for the given type.  Effectively, this commit changes those to TYPEALIGN
calls, which are branchless and only perform some simple arithmetic with
some bit-twiddling.

The removed branches were often mispredicted by CPUs, especially so in
real-world tables which often contain a mishmash of different types
with different alignment requirements.

Author: David Rowley
Reviewed-by: Andres Freund, Victor Yegorov
Discussion: https://postgr.es/m/CAApHDvrBztXP3yx=NKNmo3xwFAFhEdyPnvrDg3=M0RhDs+4vYw@mail.gmail.com
2024-12-21 09:43:26 +13:00
Thomas Munro
38c579b089 Fix corruption when relation truncation fails.
RelationTruncate() does three things, while holding an
AccessExclusiveLock and preventing checkpoints:

1. Logs the truncation.
2. Drops buffers, even if they're dirty.
3. Truncates some number of files.

Step 2 could previously be canceled if it had to wait for I/O, and step
3 could and still can fail in file APIs.  All orderings of these
operations have data corruption hazards if interrupted, so we can't give
up until the whole operation is done.  When dirty pages were discarded
but the corresponding blocks were left on disk due to ERROR, old page
versions could come back from disk, reviving deleted data (see
pgsql-bugs #18146 and several like it).  When primary and standby were
allowed to disagree on relation size, standbys could panic (see
pgsql-bugs #18426) or revive data unknown to visibility management on
the primary (theorized).

Changes:

 * WAL is now unconditionally flushed first
 * smgrtruncate() is now called in a critical section, preventing
   interrupts and causing PANIC on file API failure
 * smgrtruncate() has a new parameter for existing fork sizes,
   because it can't call smgrnblocks() itself inside a critical section

The changes apply to RelationTruncate(), smgr_redo() and
pg_truncate_visibility_map().  That last is also brought up to date with
other evolutions of the truncation protocol.

The VACUUM FileTruncate() failure mode had been discussed in older
reports than the ones referenced below, with independent analysis from
many people, but earlier theories on how to fix it were too complicated
to back-patch.  The more recently invented cancellation bug was
diagnosed by Alexander Lakhin.  Other corruption scenarios were spotted
by me while iterating on this patch and earlier commit 75818b3a.

Back-patch to all supported releases.

Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reported-by: rootcause000@gmail.com
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Discussion: https://postgr.es/m/18146-04e908c662113ad5%40postgresql.org
Discussion: https://postgr.es/m/18426-2d18da6586f152d6%40postgresql.org
2024-12-20 23:57:02 +13:00
David Rowley
5983a4cffc Introduce CompactAttribute array in TupleDesc, take 2
The new compact_attrs array stores a few select fields from
FormData_pg_attribute in a more compact way, using only 16 bytes per
column instead of the 104 bytes that FormData_pg_attribute uses.  Using
CompactAttribute allows performance-critical operations such as tuple
deformation to be performed without looking at the FormData_pg_attribute
element in TupleDesc which means fewer cacheline accesses.

For some workloads, tuple deformation can be the most CPU intensive part
of processing the query.  Some testing with 16 columns on a table
where the first column is variable length showed around a 10% increase in
transactions per second for an OLAP type query performing aggregation on
the 16th column.  However, in certain cases, the increases were much
higher, up to ~25% on one AMD Zen4 machine.

This also makes pg_attribute.attcacheoff redundant.  A follow-on commit
will remove it, thus shrinking the FormData_pg_attribute struct by 4
bytes.

Author: David Rowley
Reviewed-by: Andres Freund, Victor Yegorov
Discussion: https://postgr.es/m/CAApHDvrBztXP3yx=NKNmo3xwFAFhEdyPnvrDg3=M0RhDs+4vYw@mail.gmail.com
2024-12-20 22:31:26 +13:00
Peter Eisentraut
382092a0cd Prevent redeclaration of typedef yyscan_t
Fix for 1f0de66ea2a: We need to prevent redeclaration of typedef
yyscan_t.  (This will work with C11 but not currently with C99.)  The
generated scanner files provide their own typedef, but we also need to
provide one for the interfaces that we expose.  So we need to add some
preprocessor guards to avoid a redefinition.  (This is how the
generated scanner files do it internally as well.)  This way
everything now works independent of the order in which things are
included.

Discussion: https://www.postgresql.org/message-id/flat/eb6faeac-2a8a-4b69-9189-c33c520e5b7b@eisentraut.org
2024-12-19 11:24:43 +01:00
Peter Eisentraut
1f0de66ea2 seg: pure parser and reentrant scanner
Use the flex %option reentrant and the bison option %pure-parser to
make the generated scanner and parser pure, reentrant, and
thread-safe.

Make the generated scanner use palloc() etc. instead of malloc() etc.
Previously, we only used palloc() for the buffer, but flex would still
use malloc() for its internal structures.  As a result, there could be
some small memory leaks in case of uncaught errors.  (We do catch
normal syntax errors as soft errors.)  Now, all the memory is under
palloc() control, so there are no more such issues.

Simplify flex scan buffer management: Instead of constructing the
buffer from pieces and then using yy_scan_buffer(), we can just use
yy_scan_string(), which does the same thing internally.

The previous code was necessary because we allocated the buffer with
palloc() and the rest of the state was handled by malloc().  But this
is no longer the case; everything is under palloc() now.

(We could even get rid of the yylex_destroy() call and just let the
memory context cleanup handle everything.  But for now, we preserve
the existing behavior.)

Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-by: Andreas Karlsson <andreas@proxel.se>
Discussion: https://www.postgresql.org/message-id/flat/eb6faeac-2a8a-4b69-9189-c33c520e5b7b@eisentraut.org
2024-12-18 08:47:53 +01:00
Peter Eisentraut
802fe923e3 cube: pure parser and reentrant scanner
Use the flex %option reentrant and the bison option %pure-parser to
make the generated scanner and parser pure, reentrant, and
thread-safe.

Make the generated scanner use palloc() etc. instead of malloc() etc.
Previously, we only used palloc() for the buffer, but flex would still
use malloc() for its internal structures.  As a result, there could be
some small memory leaks in case of uncaught errors.  (We do catch
normal syntax errors as soft errors.)  Now, all the memory is under
palloc() control, so there are no more such issues.

Simplify flex scan buffer management: Instead of constructing the
buffer from pieces and then using yy_scan_buffer(), we can just use
yy_scan_string(), which does the same thing internally.  (Actually, we
use yy_scan_bytes() here because we already have the length.)

The previous code was necessary because we allocated the buffer with
palloc() and the rest of the state was handled by malloc().  But this
is no longer the case; everything is under palloc() now.

(We could even get rid of the yylex_destroy() call and just let the
memory context cleanup handle everything.  But for now, we preserve
the existing behavior.)

Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-by: Andreas Karlsson <andreas@proxel.se>
Discussion: https://www.postgresql.org/message-id/flat/eb6faeac-2a8a-4b69-9189-c33c520e5b7b@eisentraut.org
2024-12-18 08:47:34 +01:00
Tomas Vondra
957ba9ff14 Detect version mismatch in brin_page_items
Commit dae761a87e modified brin_page_items() to return the new "empty"
flag for each BRIN range. But the new output parameter was added in the
middle, which may cause crashes when using the new binary with old
function definition.

The ideal solution would be to introduce API versioning similar to what
pg_stat_statements does, but it's too late for that as PG17 was already
released (so we can't introduce a new extension version). We could do
something similar in brin_page_items() by checking the number of output
columns (and ignoring the new flag), but it doesn't seem very nice.

Instead, simply error out and suggest updating the extension to the
latest version. pageinspect is a superuser-only extension, and there's
not much reason to run an older version. Moreover, there's a precedent
for this approach in 691e8b2e18.

Reported by Ľuboslav Špilák, investigation and patch by me. Backpatch to
17, same as dae761a87e.

Reported-by: Ľuboslav Špilák
Reviewed-by: Michael Paquier, Hayato Kuroda, Peter Geoghegan
Backpatch-through: 17
Discussion: https://postgr.es/m/VI1PR02MB63331C3D90E2104FD12399D38A5D2@VI1PR02MB6333.eurprd02.prod.outlook.com
Discussion: https://postgr.es/m/flat/3385a58f-5484-49d0-b790-9a198a0bf236@vondra.me
2024-12-17 17:48:55 +01:00
Peter Eisentraut
fb1a18810f Remove ts_locale.c's lowerstr()
lowerstr() and lowerstr_with_len() in ts_locale.c do the same thing as
str_tolower() that the rest of the system uses, except that the former
don't use the common locale provider framework but instead use the
global libc locale settings.

This patch replaces uses of lowerstr*() with str_tolower(...,
DEFAULT_COLLATION_OID).  For instances that use a libc locale
globally, this will result in exactly the same behavior.  For
instances that use other locale providers, you now get consistent
behavior and are no longer dependent on the libc locale settings (for
this case; there are others).

Most uses of these functions are for processing dictionary and
configuration files.  In those cases, using the default collation
seems appropriate.  At least we don't have a more specific collation
available.  But the code in contrib/pg_trgm should really depend on
the collation of the columns being processed.  This is not done here,
this can be done in a separate patch.

(You can probably construct some edge cases where this change would
create some locale-related upgrade incompatibility, for example if
before you used a combination of ICU and a differently-behaving libc
locale.  We can document this in the release notes, but I don't think
there is anything more we can do about this.)

Reviewed-by: Jeff Davis <pgsql@j-davis.com>
Discussion: https://www.postgresql.org/message-id/flat/653f3b84-fc87-45a7-9a0c-bfb4fcab3e7d%40eisentraut.org
2024-12-17 14:04:55 +01:00
Peter Eisentraut
d3aad4ac57 Remove ts_locale.c's t_isdigit(), t_isspace(), t_isprint()
These do the same thing as the standard isdigit(), isspace(), and
isprint() but with multibyte and encoding support.  But all the
callers are only interested in analyzing single-byte ASCII characters.
So this extra layer is overkill and we can replace the uses with the
standard functions.

All the t_is*() functions in ts_locale.c are under scrutiny because
they don't use the common locale provider framework but instead use
the global libc locale settings.  For the functions being touched by
this patch, we don't need all that anyway, as mentioned above, so the
simplest solution is to just remove them.  The few remaining t_is*()
functions will need a different treatment in a separate patch.

pg_trgm has some compile-time options with macros such as
KEEPONLYALNUM.  These are not documented, and the non-default variant
is not supported by any test cases.  As part of this undertaking, I'm
removing the non-default variant, as it is in the way of cleanup.  So
in this case, the not-KEEPONLYALNUM code path is gone.

Reviewed-by: Jeff Davis <pgsql@j-davis.com>
Discussion: https://www.postgresql.org/message-id/flat/653f3b84-fc87-45a7-9a0c-bfb4fcab3e7d%40eisentraut.org
2024-12-17 12:52:29 +01:00
Tom Lane
969bbd0faf contrib/earthdistance: Use SQL-standard function bodies.
The @extschema:name@ feature added by 72a5b1fc8 allows us to
make earthdistance's references to the cube extension fully
search-path-secure, so long as all those references are
resolved at extension installation time not runtime.
To do that, we must convert earthdistance's SQL functions to
the new SQL-standard style; but we wanted to do that anyway.

The functions can be updated in our customary style by running
CREATE OR REPLACE FUNCTION in an extension update script.
However, there's still problems in the "CREATE DOMAIN earth"
command: its references to cube functions could be captured
by hostile objects in earthdistance's installation schema,
if that's not where the cube extension is.  Worse, the reference
to the cube type itself as the domain's base could be captured,
and that's not something we could fix after-the-fact in the
update script.

What I've done about that is to change the "CREATE DOMAIN earth"
command in the base script earthdistance--1.1.sql.  Ordinarily,
changing a released extension script is forbidden; but I think
it's okay here since the results of successful (non-trojaned)
script execution will be identical to before.

A good deal of care is still needed to make the extension's scripts
proof against search-path-based attacks.  We have to make sure that
all the function and operator invocations have exact argument-type
matches, to forestall attacks based on supplying a better match.
Fortunately earthdistance isn't very big, so I've just gone through
it and inspected each call to be sure of that.  The only actual code
changes needed were to spell all floating-point constants in the style
'-1'::float8, rather than depending on runtime type conversions and/or
negations.  (I'm not sure that the shortcuts previously used were
attackable, but removing run-time effort is a good thing anyway.)

I believe that this fixes earthdistance enough that we could
mark it trusted and remove the warnings about it that were
added by 7eeb1d986; but I've not done that here.

The primary reason for dealing with this now is that we've
received reports of pg_upgrade failing for databases that use
earthdistance functions in contexts like generated columns.
That's a consequence of 2af07e2f7 having restricted the search_path
used while evaluating such expressions.  The only way to fix that
is to make the earthdistance functions independent of run-time
search_path.  This patch is very much nicer than the alternative of
attaching "SET search_path" clauses to earthdistance's functions:
it is more secure and doesn't create a run-time penalty.  Therefore,
I've chosen to back-patch this to v16 where @extschema:name@
was added.  It won't help unless users update to 16.7 and issue
"ALTER EXTENSION earthdistance UPDATE" before upgrading to 17,
but at least there's now a way to deal with the problem without
manual intervention in the dump/restore process.

Tom Lane and Ronan Dunklau

Discussion: https://postgr.es/m/3316564.aeNJFYEL58@aivenlaptop
Discussion: https://postgr.es/m/6a6439f1-8039-44e2-8fb9-59028f7f2014@mailbox.org
2024-12-14 16:07:18 -05:00
David Rowley
89988ac589 Fix further fallout from EXPLAIN ANALYZE BUFFERS change
c2a4078eb adjusted EXPLAIN ANALYZE to default the BUFFERS to ON.  This
(hopefully) fixes the last remaining issue with regression test failures
with -D RELCACHE_FORCE_RELEASE -D CATCACHE_FORCE_RELEASE builds, where
the planner accesses more buffers due to the cold caches.

Discussion: https://postgr.es/m/CAApHDvqLdzgz77JsE-yTki3w9UiKQ-uTMLRctazcu+99-ips3g@mail.gmail.com
2024-12-12 09:50:00 +13:00
David Rowley
c2a4078eba Enable BUFFERS with EXPLAIN ANALYZE by default
The topic of turning EXPLAIN's BUFFERS option on with the ANALYZE option
has come up a few times over the past few years.  In many ways, doing this
seems like a good idea as it may be more obvious to users why a given
query is running more slowly than they might expect.  Also, from my own
(David's) personal experience, I've seen users posting to the mailing
lists with two identical plans, one slow and one fast asking why their
query is sometimes slow.  In many cases, this is due to additional reads.
Having BUFFERS on by default may help reduce some of these questions, and
if not, make it more obvious to the user before they post, or save a
round-trip to the mailing list when additional I/O effort is the cause of
the slowness.

The general consensus is that we want BUFFERS on by default with
ANALYZE.  However, there were more than zero concerns raised with doing
so.  The primary reason against is the additional verbosity, making it
harder to read large plans.  Another concern was that buffer information
isn't always useful so may not make sense to have it on by default.

It's currently December, so let's commit this to see if anyone comes
forward with a strong objection against making this change.  We have over
half a year remaining in the v18 cycle where we could still easily consider
reverting this if someone were to come forward with a convincing enough
reason as to why doing this is a bad idea.

There were two patches independently submitted to achieve this goal, one
by me and the other by Guillaume.  This commit is a mix of both of these
patches with some additional work done by me to adjust various
additional places in the documentation which include EXPLAIN ANALYZE
output.

Author: Guillaume Lelarge, David Rowley
Reviewed-by: Robert Haas, Greg Sabino Mullane, Michael Christofides
Discussion: https://postgr.es/m/CANNMO++W7MM8T0KyXN3ZheXXt-uLVM3aEtZd+WNfZ=obxffUiA@mail.gmail.com
2024-12-11 22:35:11 +13:00
Tom Lane
3eea7a0c97 Simplify executor's determination of whether to use parallelism.
Our parallel-mode code only works when we are executing a query
in full, so ExecutePlan must disable parallel mode when it is
asked to do partial execution.  The previous logic for this
involved passing down a flag (variously named execute_once or
run_once) from callers of ExecutorRun or PortalRun.  This is
overcomplicated, and unsurprisingly some of the callers didn't
get it right, since it requires keeping state that not all of
them have handy; not to mention that the requirements for it were
undocumented.  That led to assertion failures in some corner
cases.  The only state we really need for this is the existing
QueryDesc.already_executed flag, so let's just put all the
responsibility in ExecutePlan.  (It could have been done in
ExecutorRun too, leading to a slightly shorter patch -- but if
there's ever more than one caller of ExecutePlan, it seems better
to have this logic in the subroutine than the callers.)

This makes those ExecutorRun/PortalRun parameters unnecessary.
In master it seems okay to just remove them, returning the
API for those functions to what it was before parallelism.
Such an API break is clearly not okay in stable branches,
but for them we can just leave the parameters in place after
documenting that they do nothing.

Per report from Yugo Nagata, who also reviewed and tested
this patch.  Back-patch to all supported branches.

Discussion: https://postgr.es/m/20241206062549.710dc01cf91224809dd6c0e1@sraoss.co.jp
2024-12-09 14:38:19 -05:00