Compare commits

...

965 Commits

Author SHA1 Message Date
Etsuro Fujita
e5a3c9d9b5 postgres_fdw: Inherit the local transaction's access/deferrable modes.
Previously, postgres_fdw always:

1) opened a remote transaction in READ WRITE mode even when the local
transaction was READ ONLY, allowing a READ ONLY transaction that
references a foreign table mapped to a remote view executing a volatile
function to write on the remote side, and

2) opened the remote transaction in NOT DEFERRABLE mode even when the
local transaction was DEFERRABLE, causing a SERIALIZABLE READ ONLY
DEFERRABLE transaction to abort due to a serialization failure on the
remote side.

To avoid these, modify postgres_fdw to open a remote transaction in the
same access/deferrable modes as the local transaction.  This commit also
modifies it to open a remote subtransaction in the same access mode as
the local subtransaction.

Although these issues have existed since the introduction of
postgres_fdw, there have been no reports from the field, so it seems
fine to fix them in master only.
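
As a minimal sketch of the first scenario (foreign_tab is a
hypothetical foreign table mapped to such a remote view):

    -- with this change, the remote transaction inherits READ ONLY,
    -- so the volatile function behind the remote view can no longer
    -- write on the remote side
    BEGIN TRANSACTION READ ONLY;
    SELECT * FROM foreign_tab;
    COMMIT;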

Author: Etsuro Fujita <etsuro.fujita@gmail.com>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAPmGK16n_hcUUWuOdmeUS%2Bw4Q6dZvTEDHb%3DOP%3D5JBzo-M3QmpQ%40mail.gmail.com
2025-06-01 17:30:00 +09:00
Dean Rasheed
b006bcd531 Fix MERGE into a plain inheritance parent table.
When a MERGE's target table is the parent of an inheritance tree, any
INSERT actions insert into the parent table using ModifyTableState's
rootResultRelInfo.  However, there are two bugs in the way it is
initialized:

1. ExecInitMerge() incorrectly uses a different ResultRelInfo entry
from ModifyTableState's resultRelInfo array to build the insert
projection, which may not be compatible with rootResultRelInfo.

2. ExecInitModifyTable() does not fully initialize rootResultRelInfo.
Specifically, ri_WithCheckOptions, ri_WithCheckOptionExprs,
ri_returningList, and ri_projectReturning are not initialized.

This can lead to crashes, or incorrect query results due to failing to
check WCOs or process the RETURNING list for INSERT actions.

Fix both these bugs in ExecInitMerge(), noting that it is only
necessary to fully initialize rootResultRelInfo if the MERGE has
INSERT actions and the target table is a plain inheritance parent.
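
A hedged sketch of the affected shape (names hypothetical; the
RETURNING clause requires v17 or later):

    -- the INSERT action routes tuples through the inheritance parent,
    -- exercising the rootResultRelInfo initialization fixed here
    CREATE TABLE parent (a int, b text);
    CREATE TABLE child () INHERITS (parent);
    MERGE INTO parent p
      USING (VALUES (1, 'x')) AS s(a, b) ON p.a = s.a
      WHEN NOT MATCHED THEN INSERT (a, b) VALUES (s.a, s.b)
      RETURNING p.a;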

Backpatch to v15, where MERGE was introduced.

Reported-by: Andres Freund <andres@anarazel.de>
Author: Dean Rasheed <dean.a.rasheed@gmail.com>
Reviewed-by: Jian He <jian.universality@gmail.com>
Reviewed-by: Tender Wang <tndrwang@gmail.com>
Discussion: https://postgr.es/m/4rlmjfniiyffp6b3kv4pfy4jw3pciy6mq72rdgnedsnbsx7qe5@j5hlpiwdguvc
Backpatch-through: 15
2025-05-31 12:12:58 +01:00
Michael Paquier
e050af2868 Change internal plan ID type from uint64 to int64
uint64 was chosen to be consistent with the type used by the query ID,
but the conclusion of a recent discussion about the query ID is that
int64 is a better fit, since the signed form is what is shown to the
user in pg_stat_statements (PGSS) or EXPLAIN outputs.

This commit changes the plan ID to use int64, following c3eda50b06
that has done the same for the query ID.

The plan ID is new to v18, introduced in 2a0cd38da5.

Author: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Sami Imseih <samimseih@gmail.com>
Discussion: https://postgr.es/m/aCvzJNwetyEI3Sgo@paquier.xyz
2025-05-31 09:40:45 +09:00
Nathan Bossart
706054b11b Ensure we have a snapshot when updating various system catalogs.
A few places that access system catalogs don't set up an active
snapshot before potentially accessing their TOAST tables.  To fix,
push an active snapshot just before each section of code that might
require accessing one of these TOAST tables, and pop it shortly
afterwards.  While at it, this commit adds some rather strict
assertions in an attempt to prevent such issues in the future.

Commit 16bf24e0e4 recently removed pg_replication_origin's TOAST
table in order to fix the same problem for that catalog.  On the
back-branches, those bugs are left in place.  We cannot easily
remove a catalog's TOAST table on released major versions, and only
replication origins with extremely long names are affected.  Given
the low severity of the issue, fixing older versions doesn't seem
worth the trouble of significantly modifying the patch.

Also, on v13 and v14, the aforementioned strict assertions have
been omitted because commit 2776922201, which added
HaveRegisteredOrActiveSnapshot(), was not back-patched.  While we
could probably back-patch it now, I've opted against it because it
seems unlikely that new TOAST snapshot issues will be introduced in
the oldest supported versions.

Reported-by: Alexander Lakhin <exclusion@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/18127-fe54b6a667f29658%40postgresql.org
Discussion: https://postgr.es/m/18309-c0bf914950c46692%40postgresql.org
Discussion: https://postgr.es/m/ZvMSUPOqUU-VNADN%40nathan
Backpatch-through: 13
2025-05-30 15:17:28 -05:00
Tom Lane
232d8caeaa Fix memory leakage in postgres_fdw's DirectModify code path.
postgres_fdw tries to use PG_TRY blocks to ensure that it will
eventually free the PGresult created by the remote modify command.
However, it's fundamentally impossible for this scheme to work
reliably when there's RETURNING data, because the query could fail
in between invocations of postgres_fdw's DirectModify methods.
There is at least one instance of exactly this situation in the
regression tests, and the ensuing session-lifespan leak is visible
under Valgrind.

We can improve matters by using a memory context reset callback
attached to the ExecutorState context.  That ensures that the
PGresult will be freed when the ExecutorState context is torn
down, even if control never reaches postgresEndDirectModify.

I have little faith that there aren't other potential PGresult
leakages in the backend modules that use libpq.  So I think it'd
be a good idea to apply this concept universally by creating
infrastructure that attaches a reset callback to every PGresult
generated in the backend.  However, that seems too invasive for
v18 at this point, let alone the back branches.  So for the
moment, apply this narrow fix that just makes DirectModify safe.
I have a patch in the queue for the more general idea, but it
will have to wait for v19.

Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Matheus Alcantara <matheusssilv97@gmail.com>
Discussion: https://postgr.es/m/2976982.1748049023@sss.pgh.pa.us
Backpatch-through: 13
2025-05-30 13:45:41 -04:00
Tom Lane
d98cefe114 Allow larger packets during GSSAPI authentication exchange.
Our GSSAPI code only allows packet sizes up to 16kB.  However it
emerges that during authentication, larger packets might be needed;
various authorities suggest 48kB or 64kB as the maximum packet size.
This limitation caused login failure for AD users who belong to many
AD groups.  To add insult to injury, we gave an unintelligible error
message, typically "GSSAPI context establishment error: The routine
must be called again to complete its function: Unknown error".

As noted in code comments, the 16kB packet limit is effectively a
protocol constant once we are doing normal data transmission: the
GSSAPI code splits the data stream at those points, and if we change
the limit then we will have cross-version compatibility problems
due to the receiver's buffer being too small in some combinations.
However, during the authentication exchange the packet sizes are
not determined by us, but by the underlying GSSAPI library.  So we
might as well just try to send what the library tells us to.
An unpatched recipient will fail on a packet larger than 16kB,
but that's not worse than the sender failing without even trying.
So this doesn't introduce any meaningful compatibility problem.

We still need a buffer size limit, but we can easily make it be
64kB rather than 16kB until transport negotiation is complete.
(Larger values were discussed, but don't seem likely to add
anything.)

Reported-by: Chris Gooch <cgooch@bamfunds.com>
Fix-suggested-by: Jacob Champion <jacob.champion@enterprisedb.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Jacob Champion <jacob.champion@enterprisedb.com>
Discussion: https://postgr.es/m/DS0PR22MB5971A9C8A3F44BCC6293C4DABE99A@DS0PR22MB5971.namprd22.prod.outlook.com
Backpatch-through: 13
2025-05-30 12:55:15 -04:00
Fujii Masao
961553daf5 Make XactLockTableWait() and ConditionalXactLockTableWait() more interruptible.
Previously, XactLockTableWait() and ConditionalXactLockTableWait() could enter
a non-interruptible loop when they successfully acquired a lock on a transaction
but the transaction still appeared to be running. Since this loop continued
until the transaction completed, it could result in long, uninterruptible waits.

Although this scenario is generally unlikely, since XactLockTableWait()
and ConditionalXactLockTableWait() can normally acquire a transaction
lock only when the transaction is not running, it can occur on a hot
standby.
In such cases, the transaction may still appear active due to
the KnownAssignedXids list, even while no lock on the transaction exists.
For example, this situation can happen when creating a logical replication
slot on a standby.

The cause of the non-interruptible loop was the absence of CHECK_FOR_INTERRUPTS()
within it. This commit adds CHECK_FOR_INTERRUPTS() to the loop in both functions,
ensuring they can be interrupted safely.

Back-patch to all supported branches.

Author: Kevin K Biju <kevinkbiju@gmail.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/CAM45KeELdjhS-rGuvN=ZLJ_asvZACucZ9LZWVzH7bGcD12DDwg@mail.gmail.com
Backpatch-through: 13
2025-05-31 00:08:40 +09:00
David Rowley
c3eda50b06 Change internal queryid type from uint64 to int64
uint64 was perhaps chosen in cff440d36 as the type was uint32 prior to
that widening work.

Having this as uint64 doesn't make much sense and just adds the overhead of
having to remember that we always output this in its signed form.  Let's
remove that overhead.

The signed form output is seemingly required since we have no way to
represent the full range of uint64 in an SQL type.  We use BIGINT in places
like pg_stat_statements, which maps directly to int64.

The release notes "Source Code" section may want to mention this
adjustment as some extensions may wish to adjust their code.

Author: David Rowley <dgrowleyml@gmail.com>
Suggested-by: Peter Eisentraut <peter@eisentraut.org>
Reviewed-by: Sami Imseih <samimseih@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/50cb0c8b-994b-48f9-a1c4-13039eb3536b@eisentraut.org
2025-05-30 22:59:39 +12:00
Bruce Momjian
03c53a7314 doc PG 18 relnotes: modify async I/O item for other improvements
Add "etc." to indicate other actions will also be improved by
asynchronous I/O.

Reported-by: Melanie Plageman

Discussion: https://postgr.es/m/CAAKRu_bqjgSYA+OdemL-X91Yv53OwsVARZy+-tRyj8YQ=kcj0A@mail.gmail.com
2025-05-29 12:37:05 -04:00
Tom Lane
470273da0f Avoid resource leaks when a dblink connection fails.
If we hit out-of-memory between creating the PGconn and inserting
it into dblink's hashtable, we'd lose track of the PGconn, which
is quite bad since it represents a live connection to a remote DB.
Fix by rearranging things so that we create the hashtable entry
first.

Also reduce the number of states we have to deal with by getting rid
of the separately-allocated remoteConn object, instead allocating it
in-line in the hashtable entries.  (That incidentally removes a
session-lifespan memory leak observed in the regression tests.)

There is an apparently-irreducible remaining OOM hazard, which
is that if the connection fails at the libpq level (ie it's
CONNECTION_BAD) then we have to pstrdup the PGconn's error message
before we can release it, and theoretically that could fail.  However,
in such cases we're only leaking memory not a live remote connection,
so I'm not convinced that it's worth sweating over.

This is a pretty low-probability failure mode of course, but losing
a live connection seems bad enough to justify back-patching.

Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Matheus Alcantara <matheusssilv97@gmail.com>
Discussion: https://postgr.es/m/1346940.1748381911@sss.pgh.pa.us
Backpatch-through: 13
2025-05-29 10:39:55 -04:00
Fujii Masao
3c4d7557e0 Fix assertion failure in pg_prewarm() on objects without storage.
An assertion test added in commit 049ef33 could fail when pg_prewarm()
was called on objects without storage, such as partitioned tables.
This resulted in the following failure in assert-enabled builds:

    Failed Assert("RelFileNumberIsValid(rlocator.relNumber)")

Note that, in non-assert builds, pg_prewarm() just failed with an error
in that case, so there was no ill effect in practice.

This commit fixes the issue by having pg_prewarm() raise an error early
if the specified object has no storage. This approach is similar to
the fix in commit 4623d7144 for pg_freespacemap.
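
A small sketch of the failing case (requires the pg_prewarm extension;
the wording of the error is an assumption):

    -- partitioned tables have no storage of their own
    CREATE TABLE pt (id int) PARTITION BY RANGE (id);
    SELECT pg_prewarm('pt');  -- now fails cleanly with an error rather
                              -- than an assertion failure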

Back-patched to v17, where the issue was introduced.

Author: Masahiro Ikeda <ikedamsh@oss.nttdata.com>
Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com>
Reviewed-by: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/e082e6027610fd0a4091ae6d033aa117@oss.nttdata.com
Backpatch-through: 17
2025-05-29 17:50:32 +09:00
Michael Paquier
c3623703f3 Add AioUringCompletion in wait_event_names.txt
Oversight in c325a7633f, where the LWLock tranche AioUringCompletion
has been added.

Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://postgr.es/m/aDT5sBOxJTdulXnE@paquier.xyz
2025-05-29 13:25:05 +09:00
Bruce Momjian
a1de1b0833 doc PG 18 relnotes: split apart log_connections item
Also add details to asynchronous I/O item.

Reported-by: Melanie Plageman

Discussion: https://postgr.es/m/CAAKRu_YsVvyantS0X0Y_-vp_97=yGaoYJMXXyCEkR7pumAH3Jg@mail.gmail.com
2025-05-28 22:43:36 -04:00
Michael Paquier
35a428f30b pg_stat_statements: Fix parameter number gaps in normalized queries
pg_stat_statements anticipates that certain constant locations may be
recorded multiple times and attempts to avoid calculating a length for
these locations in fill_in_constant_lengths().

However, during generate_normalized_query() where normalized query
strings are generated, these locations are not excluded from
consideration.  This could increment the parameter number counter for
every recorded occurrence at such a location, leading to an incorrect
normalization in certain cases with gaps in the numbers reported.

For example, take this query:

    SELECT WHERE '1' IN ('2'::int, '3'::int::text)

Before this commit, it would be normalized like this, with gaps in the
parameter numbers:

    SELECT WHERE $1 IN ($3::int, $4::int::text)

The correct, less confusing normalization is:

    SELECT WHERE $1 IN ($2::int, $3::int::text)

This commit fixes the computation of the parameter numbers by tracking
the number of constants replaced with a $n in a separate counter,
instead of reusing the iterator that loops through the list of
locations.

The underlying query IDs are not changed, neither are the normalized
strings for existing PGSS hash entries.  New entries with fresh
normalized queries would automatically get reshaped based on the new
parameter numbering.

Issue discovered while discussing a separate problem for HEAD, but this
affects all the stable branches.

Author: Sami Imseih <samimseih@gmail.com>
Discussion: https://postgr.es/m/CAA5RZ0tzxvWXsacGyxrixdhy3tTTDfJQqxyFBRFh31nNHBQ5qA@mail.gmail.com
Backpatch-through: 13
2025-05-29 11:26:03 +09:00
Bruce Momjian
089f27cf8a doc: clarify log_connections new "setup_durations" output
2025-05-28 21:42:34 -04:00
Bruce Momjian
bf6034d00d doc PG 18 relnotes: move ANALYZE item, split ANALYZE/EXPLAIN item
Reported-by: Yugo Nagata

Author: Yugo Nagata

Discussion: https://postgr.es/m/20250528232503.7db770f651c2c821c0e3c1df@sraoss.co.jp
2025-05-28 18:43:31 -04:00
Tom Lane
e5d64fd654 Tighten parsing of datetime input.
ParseFraction only expects to deal with fields that contain a decimal
point and digit(s).  However it's possible in some edge cases for it
to be passed input that doesn't look like that.  In particular the
input could look like a valid floating-point number, such as ".123e6".
strtod() will happily eat that, possibly producing a result that is
not within the expected range 0..1, which can result in integer
overflow in the callers.  That doesn't have any security consequences,
but it's still not very desirable.  Fix by checking that the input
has the expected form.

Similarly, DecodeNumberField only expects to deal with fields that
contain a decimal point and digit(s), but it's sometimes abused to
parse strings that might not look like that.  This could result in
failure to reject bogus input, yielding silly results.  Again, fix
by rejecting input that doesn't look as-expected.  That decision
also means that we can affirmatively answer the very old comment
questioning whether we couldn't save some duplicative code by
using ParseFractionalSecond here.

While these changes should only reject input that nobody would
consider valid, it still doesn't seem like a change to make in
stable branches.  Apply to HEAD only.
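
A hypothetical illustration of the class of input concerned; the exact
literal is an assumption, not a case taken from the report:

    -- a fractional-second field must be a decimal point plus digits;
    -- exponent forms that strtod() would happily eat are now rejected
    SELECT '12:34:56.123e6'::time;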

Reported-by: Evgeniy Gorbanev <gorbanev.es@gmail.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/1328335.1748371099@sss.pgh.pa.us
2025-05-28 15:10:48 -04:00
Tom Lane
be86ca103a Fix memory leakage when function compilation fails.
In pl_comp.c, initially create the plpgsql function's cache context
under the assumed-short-lived caller's context, and reparent it under
CacheMemoryContext only upon success.  This avoids a process-lifespan
leak of 8kB or more if the function contains syntax errors.  (This
leakage has existed for a long time without many complaints, but as
we move towards a possibly multi-threaded future, getting rid of
process-lifespan leaks grows more important.)

In funccache.c, arrange to reclaim the CachedFunction struct in case
the language-specific compile callback function throws an error;
previously, that resulted in an independent process-lifespan leak.
This is arguably a new bug in v18, since the leakage now occurred
for SQL-language functions as well as plpgsql.

Also, don't fill fn_xmin/fn_tid/dcallback until after successful
completion of the compile callback.  This avoids a scenario where a
partially-built function cache might appear already valid upon later
inspection, and another scenario where dcallback might fail upon being
presented with an incomplete cache entry.  We would have to reach such
a faulty cache entry via a pre-existing fn_extra pointer, so I'm not
sure these scenarios correspond to any live bug.  (The predecessor
code in pl_comp.c never took any care about this, and we've heard no
complaints about that.)  Still, it's better to be careful.

Given the lack of field complaints, I'm not very excited about
back-patching any of this; but it seems still in-scope for v18.

Discussion: https://postgr.es/m/999171.1748300004@sss.pgh.pa.us
2025-05-28 13:29:45 -04:00
Bruce Momjian
c861092b0e doc PG 18 relnotes: clarify multiplication item
Reported-by: Dean Rasheed

Author: Dean Rasheed

Discussion: https://postgr.es/m/CAEZATCXZGU3LLMZHobYys1MLpyNMAus7+UUpWeeFYwSaPNC2CA@mail.gmail.com
2025-05-28 12:34:11 -04:00
Michael Paquier
4fbb46f612 Adjust regex for test with opening parenthesis in character classes
As written, the test was throwing an error because of an unbalanced
parenthesis.  The regex used in the test is adjusted to not fail and to
test the case of an opening parenthesis in a character class after some
nested square brackets.

Oversight in d46911e584.

Discussion: https://postgr.es/m/16ab039d1af455652bdf4173402ddda145f2c73b.camel@cybertec.at
2025-05-28 09:43:31 +09:00
Michael Paquier
d46911e584 Fix conversion of SIMILAR TO regexes for character classes
The code that translates SIMILAR TO pattern matching expressions to
POSIX-style regular expressions did not consider that square brackets
can be nested.  For example, in an expression like [[:alpha:]%_], the
logic replaced the placeholders '_' and '%' but it should not.

This commit fixes the conversion logic by tracking the nesting level of
square brackets marking character class areas, while considering that
in expressions like []] or [^]] the first closing square bracket is a
regular character.  Multiple tests are added to show how the conversions
should or should not be applied while in a character class area, with
specific cases added for all the characters converted outside character
classes like an opening parenthesis '(', dollar sign '$', etc.
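
A sketch using the pattern shape from the message:

    -- inside a bracket expression, '%' and '_' are literal characters
    -- and must not be translated into POSIX wildcards
    SELECT 'a' SIMILAR TO '[[:alpha:]%_]';  -- true: 'a' is alphabetic
    SELECT '%' SIMILAR TO '[[:alpha:]%_]';  -- true: '%' matches literally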

Author: Laurenz Albe <laurenz.albe@cybertec.at>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/16ab039d1af455652bdf4173402ddda145f2c73b.camel@cybertec.at
Backpatch-through: 13
2025-05-28 08:58:40 +09:00
Bruce Momjian
3e782ca322 doc PG 18 relnotes: add removal details to MD5 item
Reported-by: Nathan Bossart

Author: Nathan Bossart

Discussion: https://postgr.es/m/aDXLoTcBYjfyqeTA@nathan
2025-05-27 17:50:52 -04:00
Bruce Momjian
08b8aa1748 doc PG 18 relnotes: fix markup
Reported-by: Peter Smith

Discussion: https://postgr.es/m/CAHut+PswZ7wFtpNgv3bdtYK5D0eGMpvz4CcnAxvj7gR_acazGQ@mail.gmail.com
2025-05-27 17:34:45 -04:00
Jeff Davis
34eb2a80d5 Change pg_dump default for statistics export.
Set the default behavior of pg_dump and pg_dumpall to be
--no-statistics.

Leave the default for pg_restore and pg_upgrade to be
--with-statistics.

Discussion: https://postgr.es/m/CA+TgmoZ9=RnWcCOZiKYYjZs_AW1P4QXCw--h4dOLLHuf1Omung@mail.gmail.com
Reviewed-by: Greg Sabino Mullane <htamfids@gmail.com>
Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
2025-05-27 13:54:38 -07:00
Masahiko Sawada
4c08ecd161 Fix assertion when decrementing eager scanning success and failure counters.
Previously, we asserted that the eager scan's success and failure
counters were positive before decrementing them. However, this
assumption was incorrect, as it's possible that some blocks have
already been eagerly scanned by the time eager scanning is disabled.

This commit replaces the assertions with guards to handle this
scenario gracefully.

With this change, we continue to allow read-ahead operations by the
read stream that exceed the success and failure caps. While there is a
possibility that overruns will trigger eager scans of additional
pages, this does not pose a practical concern as the overruns will not
be substantial and remain within an acceptable range.

Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/CAD21AoConf6tkVCv-=JhQJj56kYsDwo4jG5+WqgT+ukSkYomSQ@mail.gmail.com
2025-05-27 11:42:36 -07:00
Peter Eisentraut
c53f3b9cc8 Improve file_copy_method entry in postgresql.conf.sample
Improve the wording of the comment a bit, fix whitespace.  Also move
the entry so that the section order is consistent with config.sgml.
2025-05-26 14:52:00 +02:00
Daniel Gustafsson
1f62dbf5f0 doc: Fix wording in JIT README
Remove superfluous 'is' from sentence.

Author: Yugo Nagata <nagata@sraoss.co.jp>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://postgr.es/m/20250526154412.5f77dfead87af9afc089cc48@sraoss.co.jp
2025-05-26 13:30:01 +02:00
Michael Paquier
52a1df85f2 Fix race condition in subscription TAP test 021_twophase
The test did not wait for all the subscriptions to have caught up when
dropping the subscription "tab_copy".  In a slow environment, it could
be possible for the replay of the COMMIT PREPARED transaction "mygid"
to not be confirmed yet, causing one prepared transaction to be left
around before moving to the next steps of the test.

One failure noticed is a transaction found in pg_prepared_xacts for the
cases where copy_data = false and two_phase = true, but there should be
none after dropping the subscription.

As an extra safety measure, a check is added before dropping the
subscription, scanning pg_prepared_xacts to make sure that no prepared
transactions are left once both subscriptions have caught up.
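
In SQL terms, the added safety check amounts to something like:

    -- should report zero once both subscriptions have caught up
    SELECT count(*) FROM pg_prepared_xacts;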

Issue introduced by a8fd13cab0, fixing a problem similar to
eaf5321c35.

Per buildfarm member kestrel.

Author: Vignesh C <vignesh21@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/CALDaNm329QaZ+bwU--bW6GjbNSZ8-38cDE8QWofafub7NV67oA@mail.gmail.com
Backpatch-through: 15
2025-05-26 17:28:37 +09:00
Amit Kapila
3bcb554fd2 Doc: Make logical replication examples executable in bulk.
To improve the usability of logical replication examples, we need to
enable bulk copy-pasting of DML/DDL series.

Currently, output command tags and prompts disrupt this workflow. While
prompts are typically removed, converting them to comments is acceptable
here, given the multi-server context.

Additionally, ensure all examples containing operators like < and > are
wrapped in CDATA blocks to guarantee correct rendering and consistency
with other places.

Author: David G. Johnston <david.g.johnston@gmail.com>
Reviewed-by: Peter Smith <smithpb2250@gmail.com>
Discussion: https://postgr.es/m/CAKFQuwbhbL1uaDTuo9shmo1rA-fX6XGotR7qZQ7rd-ia5ZDoQA@mail.gmail.com
2025-05-26 11:05:05 +05:30
Fujii Masao
47d90b741d doc: Fix documentation for snapshot export in logical decoding.
The documentation for exported snapshots in logical decoding previously
stated that snapshot creation may fail on a hot standby. This is no longer
accurate, as snapshot exporting on standbys has been supported since
PostgreSQL 10. This commit removes the outdated description.

Additionally, the docs referred to the NOEXPORT_SNAPSHOT option to
suppress snapshot exporting in CREATE_REPLICATION_SLOT. However,
since PostgreSQL 15, NOEXPORT_SNAPSHOT is considered legacy syntax
and retained only for backward compatibility. This commit updates
the documentation for v15 and later to use the modern equivalent:
SNAPSHOT 'nothing'. The older syntax is preserved in documentation for
v14 and earlier.

Back-patched to all supported branches.

Reported-by: Kevin K Biju <kevinkbiju@gmail.com>
Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Kevin K Biju <kevinkbiju@gmail.com>
Discussion: https://postgr.es/m/174791480466.798.17122832105389395178@wrigleys.postgresql.org
Backpatch-through: 13
2025-05-26 12:47:33 +09:00
Bruce Momjian
44ce4e1593 doc PG 18 relnotes: clarify btree skip-scan item
Reported-by: Peter Geoghegan

Discussion: https://postgr.es/m/CAH2-Wzko57+sT=FcxHHo7jnPLhh35up_5aAvogLtj_D9bATsgQ@mail.gmail.com
2025-05-23 17:02:33 -04:00
Jacob Champion
a8f093234d oauth: Correct missing comma in Requires.private
I added libcurl to the Requires.private section of libpq.pc in commit
b0635bfda, but I missed that the Autoconf side needs commas added
explicitly. Configurations which used both --with-libcurl and
--with-openssl ended up with the following entry:

    Requires.private: libssl, libcrypto libcurl

The pkg-config parser appears to be fairly lenient in this case, and
accepts the whitespace as an equivalent separator, but let's not rely on
that. Add an add_to_list macro (inspired by Makefile.global's
add_to_path) to build up the PKG_CONFIG_REQUIRES_PRIVATE list correctly.
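
With the fix, the generated entry should read:

    Requires.private: libssl, libcrypto, libcurl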

Reported-by: Wolfgang Walther <walther@technowledgy.de>
Reviewed-by: Fabrízio de Royes Mello <fabriziomello@gmail.com>
Discussion: https://postgr.es/m/CAOYmi+k2z7Rqj5xiWLUT0+bSXLvdE7TYgS5gCOSqSyXyTSSXiQ@mail.gmail.com
2025-05-23 13:05:38 -07:00
Jacob Champion
cbc8fd0c9a oauth: Limit JSON parsing depth in the client
Check the ctx->nested level as we go, to prevent a server from running
the client out of stack space.

The limit we choose when communicating with authorization servers can't
be overly strict, since those servers will continue to add extensions in
their JSON documents which we need to correctly ignore. For the SASL
communication, we can be more conservative, since there are no defined
extensions (and the peer is probably more Postgres code).

Reviewed-by: Aleksander Alekseev <aleksander@timescale.com>
Discussion: https://postgr.es/m/CAOYmi%2Bm71aRUEi0oQE9ciBnBS8xVtMn3CifaPu2kmJzUfhOZgA%40mail.gmail.com
2025-05-23 13:05:33 -07:00
Bruce Momjian
1ca583f6c0 doc PG 18 relnotes: update to current
Includes runtime injection point item by Michael Paquier.

Reported-by: Michael Paquier

Author: Michael Paquier

Discussion: https://postgr.es/m/aDAS0_eWzeGl4sok@paquier.xyz
2025-05-23 16:01:07 -04:00
Tom Lane
02502c1bca Fix per-relation memory leakage in autovacuum.
PgStat_StatTabEntry and AutoVacOpts structs were leaked until
the end of the autovacuum worker's run, which is bad news if
there are a lot of relations in the database.

Note: pfree'ing the PgStat_StatTabEntry structs here seems a bit
risky, because pgstat_fetch_stat_tabentry_ext does not guarantee
anything about whether its result is long-lived.  It appears okay
so long as autovacuum forces PGSTAT_FETCH_CONSISTENCY_NONE, but
I think that API could use a re-think.

Also ensure that the VacuumRelation structure passed to
vacuum() is in recoverable storage.

Back-patch to v15 where we started to manage table statistics
this way.  (The AutoVacOpts leakage is probably older, but
I'm not excited enough to worry about just that part.)

Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/285483.1746756246@sss.pgh.pa.us
Backpatch-through: 15
2025-05-23 14:43:43 -04:00
Tom Lane
6aa33afe6d Fix AlignedAllocRealloc to cope sanely with OOM.
If the inner allocation call returns NULL, we should restore the
previous state and return NULL.  Previously this code pfree'd
the old chunk anyway, which is surely wrong.

Also, make it call MemoryContextAllocationFailure rather than
summarily returning NULL.  The fact that we got control back from the
inner call proves that MCXT_ALLOC_NO_OOM was passed, so this change
is just cosmetic, but someday it might be less so.

This is just a latent bug at present: AFAICT no in-core callers use
this function at all, let alone call it with MCXT_ALLOC_NO_OOM.
Still, it's the kind of bug that might bite back-patched code pretty
hard someday, so let's back-patch to v17 where the bug was introduced
(by commit 743112a2e).

Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/285483.1746756246@sss.pgh.pa.us
Backpatch-through: 17
2025-05-23 11:47:33 -04:00
Daniel Gustafsson
fb844b9f06 Revert function to get memory context stats for processes
Due to concerns raised about the approach, and memory leaks found in
sensitive contexts, the functionality is reverted.  This reverts
commits 45e7e8ca9, f8c115a6c, d2a1ed172, 55ef7abf8 and 042a66291
for v18, with an intent to revisit this patch for v19.

Discussion: https://postgr.es/m/594293.1747708165@sss.pgh.pa.us
2025-05-23 15:44:54 +02:00
Peter Eisentraut
70a13c528b Move oauth_validator_libraries in postgresql.conf.sample
Move oauth_validator_libraries in postgresql.conf.sample to be grouped
with the other CONN_AUTH_AUTH settings, rather than making up a new
ad-hoc category.  This matches the internal categorization and also
how it is listed in the documentation.
2025-05-23 09:03:09 +02:00
Bruce Momjian
883339c170 doc PG 18 relnotes: adjust CREATE SUBSCRIPTION attribution
Reported-by: vignesh C

Discussion: https://postgr.es/m/CALDaNm0Wy-vJ6dE+e=y=yuq31i2KvGf-Rs-u6QOG4K7TpU_6Tw@mail.gmail.com
2025-05-22 23:02:11 -04:00
Bruce Momjian
7ddfac79f2 doc PG 18 relnotes: clarify btree skip scan item
Reported-by: Peter Geoghegan

Discussion: https://postgr.es/m/CAH2-Wz=2CWXgO1+uyR-VfN3ALMtFnfTtXK-VtkoQQ89ogm=4sg@mail.gmail.com
2025-05-22 22:24:18 -04:00
Bruce Momjian
3b7140d27e doc PG 18 relnotes: remove duplicate commit entry
Item related to btree skip scans.
2025-05-22 21:41:38 -04:00
Tom Lane
b7ab88ddb1 Fix assorted new memory leaks in libpq.
Valgrind'ing the postgres_fdw tests showed me that libpq was leaking
PGconn.be_cancel_key.  It looks like freePGconn is expecting
pqDropServerData to release it ... but in a cancel connection
object, that doesn't happen.

Looking a little closer, I was dismayed to find that freePGconn
also missed freeing the pgservice, min_protocol_version,
max_protocol_version, sslkeylogfile, scram_client_key_binary,
and scram_server_key_binary strings.  There's much less excuse
for those oversights.  Worse, that's from five different commits
(a460251f0, 4b99fed75, 285613c60, 2da74d8d6, 761c79508),
some of them by extremely senior hackers.

Fortunately, all of these are new in v18, so we haven't
shipped any leaky versions of libpq.

While at it, reorder the operations in freePGconn to match the
order of the fields in struct PGconn.  Some of those free's seem
to have been inserted with the aid of a dartboard.
2025-05-22 20:35:32 -04:00
Melanie Plageman
cb1456423d Replace deprecated log_connections values in docs and tests
9219093cab modularized log_connections output to allow more
granular control over which aspects of connection establishment are
logged. It converted the boolean log_connections GUC into a list of strings
and deprecated previously supported boolean-like values on, off, true,
false, 1, 0, yes, and no. Those values still work, but they are
supported mainly for backwards compatibility. As such, documented
examples of log_connections should not use these deprecated values.

Update references in the docs to deprecated log_connections values. Many
of the tests use log_connections. This commit also updates the tests to
use the new values of log_connections. In some of the tests, the updated
log_connections value covers a narrower set of aspects (e.g. the
'authentication' aspect in the tests in src/test/authentication and the
'receipt' aspect in src/test/postmaster). In other cases, the new value
for log_connections is a superset of the previous included aspects (e.g.
'all' in src/test/kerberos/t/001_auth.pl).
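
A sketch of the preferred spelling, using the granular aspect values:

    -- instead of the deprecated boolean-like 'on'
    ALTER SYSTEM SET log_connections = 'receipt,authentication,authorization';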

Reported-by: Peter Eisentraut <peter@eisentraut.org>
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Reviewed-by: Jacob Champion <jacob.champion@enterprisedb.com>
Discussion: https://postgr.es/m/e1586594-3b69-4aea-87ce-73a7488cdc97%40eisentraut.org
2025-05-22 17:14:54 -04:00
Tom Lane
d376ab570e In ExecInitModifyTable, don't scribble on the source plan.
The code carelessly modified mtstate->ps.plan->targetlist,
which it's not supposed to do.  Fortunately, there's not
really any need to do that because the planner already
set up a perfectly acceptable targetlist for the plan node.
We just need to remove the erroneous assignments and update some
relevant comments.

As it happens, the erroneous assignments caused the targetlist to
point to a different part of the source plan tree, so that there
isn't really a risk of the pointer becoming dangling after executor
termination.  The only visible effect of this change we can find is
that EXPLAIN will show upper references to the ModifyTable's output
expressions using different variables.  Formerly it showed Vars from
the first target relation that survived executor-startup pruning.
Now it always shows such references using the first relation appearing
in the planner output, independently of what happens during executor
pruning.  On the whole that seems like a good thing.

Also make a small tweak in ExplainPreScanNode to ensure that the first
relation will receive a refname assignment in set_rtable_names, even
if it got pruned at startup.  Previously the Vars might be shown
without any table qualification, which is confusing in a multi-table
query.

I considered back-patching this, but since the bug doesn't seem to
have any really terrible consequences in existing branches, it
seems better to not change their EXPLAIN output.  It's not too late
for v18 though, especially since v18 already made other changes in
the EXPLAIN output for these cases.

Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Author: Andres Freund <andres@anarazel.de>
Co-authored-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/213261.1747611093@sss.pgh.pa.us
2025-05-22 14:28:51 -04:00
Tom Lane
f24605e2dc Fix memory leak in XMLSERIALIZE(... INDENT).
xmltotext_with_options sometimes tries to replace the existing
root node of a libxml2 document.  In that case xmlDocSetRootElement
will unlink and return the old root node; if we fail to free it,
it's leaked for the remainder of the session.  The amount of memory
at stake is not large, a couple hundred bytes per occurrence, but
that could still become annoying in heavy usage.

Our only other xmlDocSetRootElement call is not at risk because
it's working on a just-created document, but let's modify that
code too to make it clear that it's dependent on that.
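
The call shape that exercises the root-replacement path, as a sketch:

    -- INDENT makes xmltotext_with_options reformat the document, which
    -- may replace the root node of the libxml2 document
    SELECT xmlserialize(DOCUMENT '<a><b/></a>'::xml AS text INDENT);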

Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Jim Jones <jim.jones@uni-muenster.de>
Discussion: https://postgr.es/m/1358967.1747858817@sss.pgh.pa.us
Backpatch-through: 16
2025-05-22 13:52:46 -04:00
Nathan Bossart
5d6eac80cd pg_dump: Adjust reltuples from 0 to -1 for dumps of older versions.
Before v14, a reltuples value of 0 was ambiguous: it could either
mean the relation is empty, or it could mean that it hadn't yet
been vacuumed or analyzed.  (Commit 3d351d916b taught v14 and newer
to use -1 for the latter case.)  This ambiguity allegedly can cause
the planner to choose inefficient plans after restoring to v18 or
newer.  To fix, let's just dump reltuples as -1 in that case.  This
will cause some truly empty tables to be seen as not-yet-processed,
but that seems unlikely to cause too much trouble in practice.

Note that we could alternatively teach pg_restore_relation_stats()
to translate reltuples based on the version argument, but since
that function doesn't exist until v18, there's no particular
advantage to that approach.  That is, there's no chance of
restoring stats dumped from a pre-v14 server to another pre-v14
server.  Per discussion, the current policy is to fix pre-v18
behavior differences during export and everything else during
import.

Commit 9879105024 fixed a similar problem for vacuumdb by removing
the check for reltuples != 0.  Presumably we could reinstate that
check now, but I've chosen to leave it in place in case reltuples
isn't accurate.  As before, processing some empty tables seems
relatively harmless.

Author: Hari Krishna Sunder <hari.db.pg@gmail.com>
Reviewed-by: Jeff Davis <pgsql@j-davis.com>
Reviewed-by: Corey Huinker <corey.huinker@gmail.com>
Discussion: https://postgr.es/m/CAAeiqZ0o2p4SX5_xPcuAbbsmXjg6MJLNuPYSLUjC%3DWh-VeW64A%40mail.gmail.com
2025-05-22 10:23:26 -05:00
Amit Langote
1722d5eb05 Revert "Don't lock partitions pruned by initial pruning"
As pointed out by Tom Lane, the patch introduced fragile and invasive
design around plan invalidation handling when locking of prunable
partitions was deferred from plancache.c to the executor. In
particular, it violated assumptions about CachedPlan immutability and
altered executor APIs in ways that are difficult to justify given the
added complexity and overhead.

This also removes the firstResultRels field added to PlannedStmt in
commit 28317de72, which was intended to support deferred locking of
certain ModifyTable result relations.

Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/605328.1747710381@sss.pgh.pa.us
2025-05-22 17:02:35 +09:00
Peter Eisentraut
f3622b6476 doc: Move documentation of md5_password_warnings to a better place
Commit db6a4a985b categorized md5_password_warnings as an
authentication setting, and the placement in postgresql.conf.sample
matches that, but in the documentation it ended up under logging
settings, which isn't unreasonable but inconsistent.  This moves the
documentation chunk to authentication settings as well.
2025-05-21 16:29:05 +02:00
Michael Paquier
3d0c3a418f Adjust operation names of pg_aios to match the documentation
pg_aios used the terms "read" and "write" for vectored I/O read and
write operations, respectively.  The documentation refers to them as
"readv" and "writev", and the code internally uses the terms
PGAIO_OP_READV and PGAIO_OP_WRITEV, the "V" standing for "vectored".

This commit adjusts these operation names to match the code and the
documentation.
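
A quick way to observe the adjusted names (a sketch; assumes the view's
operation column as described in the documentation):

    SELECT operation, count(*) FROM pg_aios GROUP BY operation;
    -- now reports 'readv'/'writev' rather than 'read'/'write'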

Oversight in 8e293e689b.

Author: Atsushi Torikoshi <torikoshia@oss.nttdata.com>
Discussion: https://postgr.es/m/6df1e949d1d759ad2767c18e5845963e@oss.nttdata.com
2025-05-21 15:58:03 +09:00
Fujii Masao
0bd762e81f Fix incorrect WAL description for PREPARE TRANSACTION record.
Since commit 8b1dccd37c, the PREPARE TRANSACTION WAL record includes
information about dropped statistics entries. However, the WAL resource
manager description function for PREPARE TRANSACTION record failed to
parse this information correctly and always assumed there were
no such entries.

As a result, for example, pg_waldump could not display the dropped
statistics entries stored in PREPARE TRANSACTION records.

The root cause was that ParsePrepareRecord() did not set the number of
statistics entries to drop on commit or abort. These values remained
zero-initialized and were never updated from the parsed record.

This commit fixes the issue by properly setting those values during parsing.
With this fix, pg_waldump can now correctly report dropped statistics
entries in PREPARE TRANSACTION records.

Back-patch to v15, where commit 8b1dccd37c was introduced.

Author: Daniil Davydov <3danissimo@gmail.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/CAJDiXgh-6Epb2XiJe4uL0zF-cf0_s_7Lw1TfEHDMLzYjEmfGOw@mail.gmail.com
Backpatch-through: 15
2025-05-21 11:55:14 +09:00
Michael Paquier
06450c7b8c Fix regression with location calculation of nested statements
The statement location calculated for some nested query cases was wrong
when multiple queries separated by semicolons are sent as a single
string.  As pointed out by Sami Imseih, the location calculation was
incorrect when the last query of such a nested statement does **NOT**
finish with a semicolon.  In this case, the statement length tracked by
RawStmt is 0, which is equivalent to saying that the string should be
used until its end.  The code previously discarded this case entirely,
causing the location to remain at 0, the same as pointing at the
beginning of the string.  This caused pg_stat_statements to store
incorrect query strings.
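
A minimal sketch of the mishandled shape:

    -- two statements in one string; the last has no trailing semicolon,
    -- so its RawStmt length is 0 ("use the string until its end")
    SELECT 1; SELECT 2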

This issue was introduced in 499edb0974.  I had looked at the diffs
generated by pgaudit back then and noticed the difference generated for
this nested query case, but missed the point that it was an actual
regression in an existing case.  A test case is added in
pg_stat_statements to provide some coverage, restoring the pre-17
behavior for the calculation of the query locations.  Special thanks to
David Steele, who, through an analysis of the test diffs generated by
pgaudit with the new v18 logic, pointed out that my original analysis
of the matter was wrong.

The test output of pg_overexplain is updated to reflect the new logic,
as the new locations refer to the beginning of the argument passed to
the function explain_filter().  When the module was introduced in
8d5ceb113e, which was after 499edb0974 (for the new calculation
method), the locations of the test were not actually right: the plan
generated for the query string given in input of the function pointed to
the top-level query, not the nested one.

Reported-by: David Steele <david@pgbackrest.org>
Author: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Anthonin Bonnefoy <anthonin.bonnefoy@datadoghq.com>
Reviewed-by: Jian He <jian.universality@gmail.com>
Reviewed-by: Sami Imseih <samimseih@gmail.com>
Reviewed-by: David Steele <david@pgbackrest.org>
Discussion: https://postgr.es/m/844a3b38-bbf1-4fb2-9fd6-f58c35c09917@pgbackrest.org
2025-05-21 10:22:12 +09:00
Nathan Bossart
a6060f1cbe pg_dump: Fix array literals in fetchAttributeStats().
Presently, fetchAttributeStats() builds array literals by treating
the elements as SQL identifiers.  This is incorrect for a couple of
reasons:

* Array literal content must match the external text representation
  of the array, i.e., what array_out() would return.  One notable
  problem is that double quotes are escaped with "" in identifiers
  but with \" in array literals.  To fix, build the array content
  using the pre-existing appendPGArray() function.

* Array literals must be written as string constants.  A notable
  problem here is that single quotes are escaped via '' in strings
  but are not escaped in the text representation of an array.  To
  fix, append the aforementioned array literal content to the query
  with appendStringLiteralAH().

While at it, modify a test case to use an identifier that would
cause the test to fail without this change.
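
A sketch of the escaping difference at the heart of the fix:

    -- in an array literal, an embedded double quote is escaped as \"
    SELECT '{"he said \"hi\""}'::text[];
    -- an identifier would instead double it: "he said ""hi"""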

Oversight in commit 9c02e3a986.

Reported-by: Philippe Beaudoin <pbh.emaj@free.fr>
Author: Jian He <jian.universality@gmail.com>
Co-authored-by: Nathan Bossart <nathandbossart@gmail.com>
Co-authored-by: Stepan Neretin <slpmcf@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Bug: #18923
Discussion: https://postgr.es/m/18923-e79273f87c6bed69%40postgresql.org
2025-05-20 16:31:00 -05:00
Heikki Linnakangas
cbf53e2b8a Fix cross-version upgrade test failure
Commit 29f7ce6fe7 added another view that needs adjustment in the
cross-version upgrade test. This should fix the XversionUpgrade
failures in the buildfarm.

Backpatch-through: 16
Discussion: https://www.postgresql.org/message-id/18929-077d6b7093b176e2@postgresql.org
2025-05-20 10:39:14 +03:00
Michael Paquier
54675d8986 doc: Clarify use of _ccnew and _ccold in REINDEX CONCURRENTLY
Invalid indexes are suffixed with "_ccnew" or "_ccold".  The
documentation failed to mention the initial underscore.
ChooseRelationName() may also append an extra number if indexes with a
similar name already exist; let's add a note about that too.
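
One way to spot such leftovers, as a sketch:

    -- invalid indexes left behind by a failed REINDEX CONCURRENTLY
    SELECT indexrelid::regclass FROM pg_index WHERE NOT indisvalid;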

Author: Alec Cozens <acozens@pixelpower.com>
Discussion: https://postgr.es/m/174733277404.1455388.11471370288789479593@wrigleys.postgresql.org
Backpatch-through: 13
2025-05-20 14:39:06 +09:00
Andres Freund
acad909321 aio: Fix possible state confusions due to interrupt processing
elog()/ereport() process interrupts, iff the log message is < ERROR and
the log message will be emitted.  AIO's debug messages are emitted via
ereport(), but in some places the code is not ready for interrupts to
be processed.

Fix the issue using a few different methods:

1) handle interrupts arriving concurrently - in some places it's easy to
   detect that by fetching the handle's generation a bit earlier
2) Check if interrupts made the work needing to be done obsolete
3) Disallow interrupts, as there's no sane way to make interrupt processing
   safe

To prevent some similar issues from being re-introduced, assert that
interrupts are held in pgaio_io_update_state().

This commit also fixes the contents of a debug message I added in 039bfc457e.

Reported-by: Alexander Lakhin <exclusion@gmail.com>
Reviewed-by: Noah Misch <noah@leadboat.com>
Discussion: https://postgr.es/m/mvpm7ga3dfgz7bvum22hmuz26cariylmcppb3irayftc7bwk3r@l7gb6gr7azhc
2025-05-19 21:07:06 -04:00
Heikki Linnakangas
29f7ce6fe7 Fix deparsing FETCH FIRST <expr> ROWS WITH TIES
In the grammar, <expr> is a c_expr, which accepts only a limited set
of integer literals and simple expressions without parens. The
deparsing logic didn't quite match the grammar rule, and failed to use
parens e.g. for "5::bigint".

To fix, always surround the expression with parens. Would be nice to
omit the parens in simple cases, but unfortunately it's non-trivial to
detect such simple cases. Even if the expression is a simple literal
123 in the original query, after parse analysis it becomes a FuncExpr
with COERCE_IMPLICIT_CAST rather than a simple Const.
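
A sketch of a view whose stored definition needed the parens (names
hypothetical):

    CREATE VIEW v AS
      SELECT g FROM generate_series(1, 10) AS g
      ORDER BY g FETCH FIRST 5 ROWS WITH TIES;
    -- the definition now deparses as FETCH FIRST (5) ROWS WITH TIES,
    -- which round-trips through the grammar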

Reported-by: yonghao lee
Backpatch-through: 13
Discussion: https://www.postgresql.org/message-id/18929-077d6b7093b176e2@postgresql.org
2025-05-19 18:50:26 +03:00
Amit Kapila
ad5eaf390c Don't retreat slot's confirmed_flush LSN.
Prevent moving the confirmed_flush backwards, as this could lead to data
duplication issues caused by replicating already replicated changes.

This can happen when a client acknowledges an LSN it doesn't have to do
anything for, and thus didn't store persistently. After a restart, the
client can send the prior LSN that it stored persistently as an
acknowledgement, but we need to ignore such an LSN to avoid retreating
the confirmed_flush LSN.

Diagnosed-by: Zhijie Hou <houzj.fnst@fujitsu.com>
Author: shveta malik <shveta.malik@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com>
Tested-by: Nisha Moond <nisha.moond412@gmail.com>
Backpatch-through: 13
Discussion: https://postgr.es/m/CAJpy0uDZ29P=BYB1JDWMCh-6wXaNqMwG1u1mB4=10Ly0x7HhwQ@mail.gmail.com
Discussion: https://postgr.es/m/OS0PR01MB57164AB5716AF2E477D53F6F9489A@OS0PR01MB5716.jpnprd01.prod.outlook.com
2025-05-19 12:13:06 +05:30
Tom Lane
f8db5c7a3f Doc: add pre-branch task to run src/tools/copyright.pl.
It's common for some files with last year's copyright date
to sneak into the tree between early January (when we normally run
copyright.pl) and feature freeze.  Immediately before branching
the new release is an ideal time to fix the stragglers, so add a
note about it to the RELEASE_CHANGES checklist.

Discussion: https://postgr.es/m/CALa6HA4_Wu7-2PV0xv-Q84cT8eG7rTx6bdjUV0Pc=McAwkNMfQ@mail.gmail.com
2025-05-18 23:31:44 -04:00
Michael Paquier
2c6469d4cd Fix incorrect year in some copyright notices
A couple of new files have been added in the tree with a copyright year
of 2024 while we were already in 2025.  These should be marked with
2025, so let's fix them.

Reported-by: Shaik Mohammad Mujeeb <mujeeb.sk.dev@gmail.com>
Discussion: https://postgr.es/m/CALa6HA4_Wu7-2PV0xv-Q84cT8eG7rTx6bdjUV0Pc=McAwkNMfQ@mail.gmail.com
2025-05-19 09:46:52 +09:00
Michael Paquier
11b2dc3709 ecpg: Add missing newline in meson.build
Noticed while performing a routine sanity check of the files in the
tree.  Issue introduced by 28f04984f0.

Discussion: https://postgr.es/m/CALa6HA4_Wu7-2PV0xv-Q84cT8eG7rTx6bdjUV0Pc=McAwkNMfQ@mail.gmail.com
2025-05-19 09:44:17 +09:00
Alexander Korotkov
3d3a81fc24 Fix tuple_fraction calculation in generate_orderedappend_paths()
6b94e7a6da adjusted generate_orderedappend_paths() to consider fractional
paths.  However, it didn't manage to interpret the tuple_fraction value
correctly.  According to the header comment of grouping_planner(), the
tuple_fraction >= 1 specifies the absolute number of expected tuples.  That
number must be divided by the expected total number of tuples to get the
actual fraction.
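
For example, a tuple_fraction of 100 with an expected total of 10000
tuples corresponds to an effective fraction of 100 / 10000 = 0.01.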

Even though this is a bug fix, we don't backpatch it.  The risks of the side
effects of plan changes on stable branches are too high.

Reported-by: Andrei Lepikhov <lepihov@gmail.com>
Discussion: https://postgr.es/m/3ca271fa-ca5c-458c-8934-eb148622b270%40gmail.com
Author: Andrei Lepikhov <lepihov@gmail.com>
Reviewed-by: Junwang Zhao <zhjwpku@gmail.com>
Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
2025-05-18 23:49:50 +03:00
Tom Lane
12eee85e51 Make our usage of memset_s() conform strictly to the C11 standard.
Per the letter of the C11 standard, one must #define
__STDC_WANT_LIB_EXT1__ as 1 before including <string.h> in order to
have access to memset_s().  It appears that many platforms are lenient
about this, because we weren't doing it and yet the code appeared to
work anyway.  But we now find that with -std=c11, macOS is strict and
doesn't declare memset_s, leading to compile failures since we try to
use it anyway.  (Given the lack of prior reports, perhaps this is new
behavior in the latest SDK?  No matter, we're clearly in the wrong.)

In addition to the immediate problem, which could be fixed merely by
adding the needed #define to explicit_bzero.c, it seems possible that
our configure-time probe for memset_s() could fail in case a platform
implements the function in some odd way due to this spec requirement.
This concern can be fixed in largely the same way that we dealt with
strchrnul() in 6da2ba1d8: switch to using a declaration-based
configure probe instead of a does-it-link probe.

Back-patch to v13 where we started using memset_s().

Reported-by: Lakshmi Narayana Velayudam <dev.narayana.v@gmail.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAA4pTnLcKGG78xeOjiBr5yS7ZeE-Rh=FaFQQGOO=nPzA1L8yEA@mail.gmail.com
Backpatch-through: 13
2025-05-18 12:45:55 -04:00
Daniel Gustafsson
0d4dad200d Fix function name reference in comment
Ensure that we refer to the function being used, rather than the
name of the resulting function in question.

Author: Paul A Jungwirth <pj@illuminatedcomputing.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://postgr.es/m/CA+renyVZNiHEv5ceKDjA4j5xC6NT6mRuW33BDERBQMi_90_t6A@mail.gmail.com
2025-05-18 10:05:38 +02:00
Daniel Gustafsson
5987553fde Align organization wording in copyright statement
This aligns the copyright and legal notice wording with commit
a233a603ba and pgweb commit 2d764dbc083ab8.  Backpatch down
to all supported versions.

Author: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Dave Page <dpage@pgadmin.org>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/744E414E-3F52-404C-97FB-ED9B3AA37DC8@yesql.se
Backpatch-through: 13
2025-05-16 11:20:07 -04:00
Richard Guo
fe29b2a1da Fix Assert failure in XMLTABLE parser
In an XMLTABLE expression, columns can be marked NOT NULL, and the
parser internally fabricates an option named "is_not_null" to
represent this.  However, the parser also allows users to specify
arbitrary option names.  This creates a conflict: a user can
explicitly use "is_not_null" as an option name and assign it a
non-Boolean value, which violates internal assumptions and triggers an
assertion failure.

To fix, this patch checks whether a user-supplied name collides with
the internally reserved option name and raises an error if so.
Additionally, the internal name is renamed to "__pg__is_not_null" to
further reduce the risk of collision with user-defined names.
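
A hypothetical minimal trigger (the grammar accepts arbitrary column
option names):

    -- supplying a non-Boolean value under the formerly-internal option
    -- name used to hit an assertion; it now raises a plain error
    SELECT * FROM XMLTABLE('/r' PASSING '<r/>'::xml
                           COLUMNS c text PATH 'c' is_not_null 1);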

Reported-by: Евгений Горбанев <gorbanyoves@basealt.ru>
Author: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: Alvaro Herrera <alvherre@kurilemu.de>
Discussion: https://postgr.es/m/6bac9886-65bf-4cec-96bd-e304159f28db@basealt.ru
Backpatch-through: 15
2025-05-15 17:09:04 +09:00
Richard Guo
2c0ed86d39 Add explicit initialization for all PlannerGlobal fields
When creating a new PlannerGlobal node in standard_planner(), most
fields are explicitly initialized, but a few are not.  This doesn't
cause any functional issues, as makeNode() zeroes all fields by
default.  However, the inconsistency is undesirable from a clarity and
maintenance perspective.

This patch explicitly initializes the remaining fields to improve
consistency and readability.

Author: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/CAMbWs4-TgQHNOiouqGcuHoBqbJjWyx4UxGKxUY3FrF4trGbcPA@mail.gmail.com
2025-05-14 09:59:31 +09:00
Daniel Gustafsson
6e289f2d5d Fix order of parameters in POD documentation
The documentation for log_check() had the parameters in the wrong
order.  Also while there, rename %parameters to %params to better match
the documentation of similar functions which use %params.  Backpatch
down to v14, where this was introduced.

Author: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/9F503B5-32F2-45D7-A0AE-952879AD65F1@yesql.se
Backpatch-through: 14
2025-05-13 07:29:14 -04:00
Amit Kapila
8ede692de5 Fix the race condition in the test added by 7c99dc587.
After executing ALTER SUBSCRIPTION tap_sub SET PUBLICATION, we did not
wait for the new walsender process to restart. As a result, an INSERT
executed immediately after the ALTER could be decoded and skipped,
considering it is not part of any subscribed publication. And, the old
apply worker could also confirm the LSN of such an INSERT. This could
cause the replication to resume from a point after the INSERT. In such
cases, we miss the expected warning about the missing publication.

To fix this, ensure the walsender has restarted before continuing after
ALTER SUBSCRIPTION.

Reported-by: Tom Lane as per CI
Author: vignesh C <vignesh21@gmail.com>
Reviewed-by: Xuneng Zhou <xunengzhou@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/1230066.1745992333@sss.pgh.pa.us
2025-05-13 09:54:29 +05:30
Álvaro Herrera
dbf42b84ac Add tab-complete for ALTER DOMAIN ADD [CONSTRAINT]
Add tab-completion of "CHECK (" and "NOT NULL" after ALTER DOMAIN
ADD [CONSTRAINT].

ALTER DOMAIN dom ADD -> CHECK (
ALTER DOMAIN dom ADD -> NOT NULL
ALTER DOMAIN dom ADD -> CONSTRAINT
ALTER DOMAIN dom ADD CONSTRAINT nm -> CHECK (
ALTER DOMAIN dom ADD CONSTRAINT nm -> NOT NULL
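
For reference, the completed forms correspond to statements like the
following (the domain and constraint names here are hypothetical):

SELECT 1; -- see sketch below
ALTER DOMAIN dom ADD CHECK (VALUE > 0);
ALTER DOMAIN dom ADD NOT NULL;
ALTER DOMAIN dom ADD CONSTRAINT dom_positive CHECK (VALUE > 0);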

Author: jian he <jian.universality@gmail.com>
Author: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org>
Discussion: https://postgr.es/m/CACJufxG_f6LzAT_McC-kKmQWpuWnOYKyNBw8Kv3xzTjPqmeHcA@mail.gmail.com
2025-05-11 10:16:45 -04:00
Álvaro Herrera
0588656366
Fix comment of tsquerysend()
The comment describes the order in which fields are sent, and it had one
of the fields in the wrong place.

This has been wrong since e6dbcb72fa (2008), so backpatch all the way
back.

Author: Emre Hasegeli <emre@hasegeli.com>
Discussion: https://postgr.es/m/CAE2gYzzf38bR_R=izhpMxAmqHXKeM5ajkmukh4mNs_oXfxcMCA@mail.gmail.com
2025-05-11 09:47:10 -04:00
Álvaro Herrera
dc9a2d54fd
relcache: Avoid memory leak on tables with no CHECK constraints
As complained about by Valgrind, in commit a379061a22 I failed to
realize that I was causing rd_att->constr->check to become allocated
when no CHECK constraints exist; previously it'd remain NULL.  (This was
my bug, not the mentioned commit author's).  Fix by making the
allocation conditional, and set ->check to NULL if unallocated.

Reported-by: Yasir <yasir.hussain.shah@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/202505082025.57ijx3qrbx7u@alvherre.pgsql
2025-05-11 09:22:12 -04:00
Álvaro Herrera
7b2ad43426
Sort includes in alphabetical order
Added by commit 042a66291b, no backpatch needed.
2025-05-11 09:15:05 -04:00
Tom Lane
d4a7e4e179 Fix incorrect "return NULL" in BumpAllocLarge().
This must be "return MemoryContextAllocationFailure(context, size, flags)"
instead.  The effect of this oversight is that if we got a malloc
failure right here, the code would act as though MCXT_ALLOC_NO_OOM
had been specified, whether it was or not.  That would likely lead
to a null-pointer-dereference crash at the unsuspecting call site.

Noted while messing with a patch to improve our Valgrind leak
detection support.  Back-patch to v17 where this code came in.
2025-05-10 20:22:39 -04:00
Noah Misch
4a4ee0c2c1 Remove GLOBALTABLESPACE_OID assert for locked buffers.
Commit f4ece891fc added the assertion in
an attempt to catch some defects even after VACUUM FULL or REINDEX.
However, IsCatalogTextUniqueIndexOid(tag.relNumber) always returns false
after a relfilenode change, provoking unintended assertion failures.

Reported-by: Adam Guo <adamguo@amazon.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Bug: #18912
Discussion: https://postgr.es/m/18912-a41c9bd0e0ad19b1@postgresql.org
2025-05-10 07:36:27 -07:00
Bruce Momjian
99ddf8615c doc PG 18 relnotes: mv. hash joins and GROUP BY item to General
Reported-by: David Rowley

Discussion: https://postgr.es/m/CAApHDvqJz+Zf7a6abisqoTGottDSRD+YPx=aQSgCsCKD476vGA@mail.gmail.com
2025-05-09 23:40:02 -04:00
Michael Paquier
c259ba881c aio: Use runtime arguments with injections points in tests
This cleans up the code related to the testing infrastructure of AIO
that used injection points, switching the test code to use the new
facility for injection points added by 371f2db8b0 rather than ad-hoc
tweaks to pass and reset the arguments given to the callbacks.

This removes all the dependencies to USE_INJECTION_POINTS in the AIO
code.  pgaio_io_call_inj(), pgaio_inj_io_get() and pgaio_inj_cur_handle
are now gone.

Reviewed-by: Greg Burd <greg@burd.me>
Discussion: https://postgr.es/m/Z_y9TtnXubvYAApS@paquier.xyz
2025-05-10 12:36:57 +09:00
Michael Paquier
36e5fda632 injection_points: Add support and tests for runtime arguments
This commit provides some test coverage for the runtime arguments of
injection points, for both INJECTION_POINT_CACHED() and
INJECTION_POINT(), as extended in 371f2db8b0.

The SQL functions injection_points_cached() and injection_points_run()
are extended so that it is possible to pass an optional string value
to them.
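
As a hedged sketch of the extended interface (the point name and the
'notice' action here are made up for illustration):

-- attach a callback, then run the point with a runtime argument
SELECT injection_points_attach('my-point', 'notice');
SELECT injection_points_run('my-point', 'runtime-arg');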

Reviewed-by: Greg Burd <greg@burd.me>
Discussion: https://postgr.es/m/Z_y9TtnXubvYAApS@paquier.xyz
2025-05-10 07:40:25 +09:00
Michael Paquier
371f2db8b0 Add support for runtime arguments in injection points
The macros INJECTION_POINT() and INJECTION_POINT_CACHED() are extended
with an optional argument that can be passed down to the callback
attached when an injection point is run, giving to callbacks the
possibility to manipulate a stack state given by the caller.  The
existing callbacks in modules injection_points and test_aio have their
declarations adjusted based on that.

da7226993f (core AIO infrastructure) and 93bc3d75d8 (test_aio) had
been relying on a set of workarounds where a static variable called
pgaio_inj_cur_handle is used as the runtime argument in the injection point
callbacks used by the AIO tests, in combination with a TRY/CATCH block
to reset the argument value.  The infrastructure introduced in this
commit will be reused for the AIO tests, simplifying them.

Reviewed-by: Greg Burd <greg@burd.me>
Discussion: https://postgr.es/m/Z_y9TtnXubvYAApS@paquier.xyz
2025-05-10 06:56:26 +09:00
Bruce Momjian
89372d0aaa doc PG 18 relnotes: fix missing parens for crc32c()
Reported-by: Steven Niu

Discussion: https://postgr.es/m/CABBtG=ejqK58cFWpw3etVZfQfhjC-qOqV+9GQWRnLO+p9wYMbw@mail.gmail.com
2025-05-09 14:16:17 -04:00
Tom Lane
95129709fd Skip RSA-PSS ssl test when using LibreSSL.
Presently, LibreSSL does not have working support for RSA-PSS,
so disable that test.  Per discussion at
https://marc.info/?l=libressl&m=174664225002441&w=2
they do intend to fix this, but it's a ways off yet.

Reported-by: Thomas Munro <thomas.munro@gmail.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://postgr.es/m/CA+hUKG+fLqyweHqFSBcErueUVT0vDuSNWui-ySz3+d_APmq7dw@mail.gmail.com
Backpatch-through: 15
2025-05-09 12:29:01 -04:00
Tom Lane
75d73331d0 Hack one ssl test case to pass with current LibreSSL.
With LibreSSL, our test of error logging for cert chain depths > 0
reports the wrong certificate.  This is almost certainly their bug,
not ours, so just tweak the test to accept their answer.

No back-patch needed, since this test case wasn't enabled before
e0f373ee4.

Reported-by: Thomas Munro <thomas.munro@gmail.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://postgr.es/m/CA+hUKG+fLqyweHqFSBcErueUVT0vDuSNWui-ySz3+d_APmq7dw@mail.gmail.com
2025-05-09 11:53:51 -04:00
Tom Lane
0aaf69965d Centralize ssl tests' check for whether we're using LibreSSL.
Right now there's only one caller, so that this is merely
an exercise in shoving code from one module to another,
but there will shortly be another one.  It seems better to
avoid having two copies of this highly-subject-to-change test.

Back-patch to v15, where we first introduced some tests that
don't work with LibreSSL.

Reported-by: Thomas Munro <thomas.munro@gmail.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://postgr.es/m/CA+hUKG+fLqyweHqFSBcErueUVT0vDuSNWui-ySz3+d_APmq7dw@mail.gmail.com
Backpatch-through: 15
2025-05-09 11:50:33 -04:00
Peter Eisentraut
bc35adee8d doc: Put new options in consistent order on man pages 2025-05-09 09:03:41 +02:00
Heikki Linnakangas
b28c59a6cd Use 'void *' for arbitrary buffers, 'uint8 *' for byte arrays
A 'void *' argument suggests that the caller might pass an arbitrary
struct, which is appropriate for functions like libc's read/write, or
pq_sendbytes(). 'uint8 *' is more appropriate for byte arrays that
have no structure, like the cancellation keys or SCRAM tokens. Some
places used 'char *', but 'uint8 *' is better because 'char *' is
commonly used for null-terminated strings. Change code around SCRAM,
MD5 authentication, and cancellation key handling to follow these
conventions.

Discussion: https://www.postgresql.org/message-id/61be9e31-7b7d-49d5-bc11-721800d89d64@eisentraut.org
2025-05-08 22:01:25 +03:00
Heikki Linnakangas
965213d9c5 Use more mundane 'int' type for cancel key lengths in libpq
The documented max length of a cancel key is 256 bytes, which doesn't
quite fit in uint8. Either way, it seems weird to not just use 'int',
like in commit 0f1433f053 for the backend.

Discussion: https://www.postgresql.org/message-id/61be9e31-7b7d-49d5-bc11-721800d89d64%40eisentraut.org
2025-05-08 22:01:20 +03:00
Bruce Momjian
9d710a1ac0 PG 18 relnotes: adjust RETURNING new/old item
Reported-by: jian he

Discussion: https://postgr.es/m/CACJufxFM1avdwu=OrTx_uMAjTDbFOj1Gp7mnNHOofTVj9QtmRw@mail.gmail.com
2025-05-08 11:11:08 -04:00
Daniel Gustafsson
8fcc648780 doc: Fix title markup for AT TIME ZONE and AT LOCAL
The title for AT TIME ZONE and AT LOCAL was accidentally wrapping the
"and" in the <literal> tag.  Backpatch to v17 where it was introduced
in 97957fdbaa.

Author: Noboru Saito <noborusai@gmail.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Tatsuo Ishii <ishii@postgresql.org>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/CAAM3qn+7QUWW9R6_YwPKXmky0xGE4n63U3EsxZeWE_QtogeU8g@mail.gmail.com
Backpatch-through: 17
2025-05-08 13:53:16 +02:00
Richard Guo
c06e909c26 Track the number of presorted outer pathkeys in MergePath
When creating an explicit Sort node for the outer path of a mergejoin,
we need to determine the number of presorted keys of the outer path to
decide whether explicit incremental sort can be applied.  Currently,
this is done by repeatedly calling pathkeys_count_contained_in.

This patch caches the number of presorted outer pathkeys in MergePath,
allowing us to save several calls to pathkeys_count_contained_in.  It
can be considered a complement to the changes in commit 828e94c9d.

Reported-by: David Rowley <dgrowleyml@gmail.com>
Author: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: Tender Wang <tndrwang@gmail.com>
Discussion: https://postgr.es/m/CAApHDvqvBireB_w6x8BN5txdvBEHxVgZBt=rUnpf5ww5P_E_ww@mail.gmail.com
2025-05-08 18:21:32 +09:00
Richard Guo
773db22269 Suppress unnecessary explicit sorting for EPQ mergejoin path
When building a ForeignPath for a joinrel, if there's a possibility
that EvalPlanQual will be executed, we must identify a suitable path
for EPQ checks.  If the outer or inner path of the chosen path is a
ForeignPath representing a pushed-down join, we replace it with its
fdw_outerpath to ensure that the EPQ check path consists entirely of
local joins.

If the chosen path is a MergePath, and its outer or inner path is a
ForeignPath that is not already well enough ordered, the MergePath
will have non-NIL outersortkeys or innersortkeys indicating the
desired ordering to be created by an explicit Sort node.  If we then
replace the outer or inner path with its corresponding fdw_outerpath,
and that path is already sufficiently ordered, we end up in an
inconsistent state: the MergePath has non-NIL outersortkeys or
innersortkeys, and its input path is already properly ordered.  This
inconsistency can result in an Assert failure or the addition of a
redundant Sort node.

To fix, check if the new outer or inner path of a MergePath is already
properly sorted, and set its outersortkeys or innersortkeys to NIL if
so.

Bug: #18902
Reported-by: Nikita Kalinin <n.kalinin@postgrespro.ru>
Author: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: Tender Wang <tndrwang@gmail.com>
Discussion: https://postgr.es/m/18902-71c1bed2b9f7c46f@postgresql.org
2025-05-08 18:20:18 +09:00
Bruce Momjian
9fef27a83b doc PG 18 relnotes: adjust pg_log_backend_memory_contexts()
Reported-by: David Rowley

Discussion: https://postgr.es/m/CAApHDvrGLBqs_Vm9COMY7uBDvUDMKds7RwC20YjEPf+XRTY9XQ@mail.gmail.com
2025-05-07 21:11:16 -04:00
Bruce Momjian
f8d49aa130 doc PG 18 relnotes: add pg_log_backend_memory_contexts() mention
Now zero-based.

Reported-by: David Rowley

Discussion: https://postgr.es/m/CAApHDvqMfTBdfwc0Z-tHXLnBMKJLYEZDApgUzA7x_PUDZsY3GA@mail.gmail.com
2025-05-07 20:36:21 -04:00
Bruce Momjian
69aca072eb doc PG 18 relnotes: adjust pgbench per-script reporting item
Also run src/tools/add_commit_links.pl for a previous commit.

Reported-by: Yugo Nagata

Discussion: https://postgr.es/m/20250507195941.c6e1b48c73f062b727f686a8@sraoss.co.jp
2025-05-07 16:56:26 -04:00
Bruce Momjian
3bd5271729 doc PG 18 relnotes: mention GROUP SET fixes
Reported-by: Richard Guo

Discussion: https://postgr.es/m/CAMbWs4_asKPqTCt0h9pp=zHc9vmPcnczbHeF6Xkxn1LhLapcTQ@mail.gmail.com
2025-05-07 16:39:49 -04:00
Nathan Bossart
16bf24e0e4 Remove pg_replication_origin's TOAST table.
A few places that access this catalog don't set up an active
snapshot before potentially accessing its TOAST table.  However,
roname (the replication origin name) is the only varlena column, so
this is only a problem if the name requires out-of-line storage.
This commit removes its TOAST table to avoid needing to set up a
snapshot.  It also places a limit on replication origin names so
that attempts to set long names will fail with a more user-friendly
error.  The chosen limit of 512 bytes should be sufficient to
avoid "row is too big" errors independent of BLCKSZ, but it should
also be lenient enough for all reasonable use-cases.
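
A minimal sketch of the new user-facing behavior (the exact error text
is assumed, not quoted from the patch):

-- names beyond the 512-byte limit are now rejected up front
SELECT pg_replication_origin_create(repeat('x', 600));
-- ERROR:  replication origin name is too long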

Bumps catversion.

Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Euler Taveira <euler@eulerto.com>
Reviewed-by: Nisha Moond <nisha.moond412@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/ZvMSUPOqUU-VNADN%40nathan
2025-05-07 14:47:36 -05:00
Peter Geoghegan
5f4d98d4f3 Prevent premature nbtree array advancement.
nbtree array index scans could fail to return matching tuples in rare
cases where the missed tuples cover key space that the scan's arrays
incorrectly indicate has already been read.  These cases involved nearby
tuples with NULL values that were evaluated using a skip array key while
in pstate.forcenonrequired mode.

To fix, prevent forcenonrequired mode from prematurely advancing the
scan's array keys beyond key space that the scan has yet to read tuples
from: reset the scan's array keys (to the first elements in the current
scan direction) before the _bt_checkkeys call for pstate.finaltup.  That
way _bt_checkkeys starts from a clean slate, which ensures that it will
call _bt_advance_array_keys (while passing it sktrig_required=true).
This reliably restores the invariant that the scan's arrays always
accurately track its progress through the index's key space (at least
when the scan is "between pages").

Oversight in commit 8a510275, which optimized nbtree search scan key
comparisons.

Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Mark Dilger <mark.dilger@enterprisedb.com>
Discussion: https://postgr.es/m/CAH2-WzmodSE+gpTd1CRGU9ez8ytyyDS+Kns2r9NzgUp1s56kpw@mail.gmail.com
2025-05-07 15:20:42 -04:00
Peter Geoghegan
7e25c9363a nbtree: tighten up array recheck rules.
Be more conservative when performing a scheduled recheck of an nbtree
scan's array keys once on the next page, having set so->scanBehind: back
out of reading the page (perform another primitive scan instead) when
the next page's high key/finaltup has an untruncated prefix of matching
values and truncated suffix attributes associated with lower-order keys.
In other words, stop assuming that the lower-order keys have been
satisfied by the truncated suffix attributes in this context (only do so
when considering scheduling a recheck within _bt_advance_array_keys).

The new behavior is more logical: if the next page read after setting
so->scanBehind can only contain tuples that are themselves "behind the
scan", that's reason enough to cut our losses.  In general, when we set
so->scanBehind, we only expect to perform one recheck on the next page
to make a final decision about whether or not to continue the current
primitive index scan.  It seems unprincipled for the recheck to allow a
_bt_readpage to continue unless the scan's arrays will advance/unless
the page might actually contain relevant tuples.

In practice it is highly unlikely that things will line up like this
(the untruncated prefix of attribute values from the next page's high
key is seldom an exact match for their corresponding array's current
element following array advancement on the original/previous page).
That gives us all the more reason to keep things simple and consistent.

This was arguably an oversight in commit 9a2e2a285a, which improved
nbtree array primitive scan scheduling.

Author: Peter Geoghegan <pg@bowt.ie>
Discussion: https://postgr.es/m/CAH2-WzkXzJajgyW-pCQ7vaDPhaT3huU+Zw_j448rpCBEsu2YOQ@mail.gmail.com
2025-05-07 15:17:40 -04:00
Nathan Bossart
acea3fc49f pg_dumpall: Add --sequence-data.
I recently added this option to pg_dump, but I forgot to add it to
pg_dumpall, too.  There's probably little use for it at the moment,
but we will need it if/when we teach pg_upgrade to use pg_dumpall
to dump the database schemas.

Oversight in commit 9c49f0e8cd.
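
A hypothetical invocation, mirroring the existing pg_dump option:

pg_dumpall --sequence-data -f all.sql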

Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/aBE8rHFo922xQUwh%40nathan
2025-05-07 13:36:51 -05:00
Alexander Korotkov
ab42d643c1 Refactor ChangeVarNodesExtended() using the custom callback
fc069a3a63 implemented Self-Join Elimination (SJE) and put the related
logic into ChangeVarNodes_walker().  This commit refactors that code to
remove the SJE-related logic from ChangeVarNodes_walker() and instead
adds a custom callback to ChangeVarNodesExtended(), which gets a chance
to process a node before ChangeVarNodes_walker().  Passing this callback
to ChangeVarNodesExtended() allows the SJE-related node handling to be
kept within analyzejoins.c.

Reported-by: Richard Guo <guofenglinux@gmail.com>
Discussion: https://postgr.es/m/CAMbWs49PE3CvnV8vrQ0Dr%3DHqgZZmX0tdNbzVNJxqc8yg-8kDQQ%40mail.gmail.com
Author: Andrei Lepikhov <lepihov@gmail.com>
Author: Alexander Korotkov <aekorotkov@gmail.com>
2025-05-07 11:10:16 +03:00
Peter Eisentraut
2448c7a9e0 doc: Put some psql documentation pieces back into alphabetical order 2025-05-07 08:23:44 +02:00
Peter Eisentraut
c0cf282551 Remove some tabs in C string literals 2025-05-07 08:23:44 +02:00
Peter Eisentraut
c11bd5f500 doc: Add link to table
Formal tables should generally have an xref in the text that points to
them.  Add them here.
2025-05-07 08:23:44 +02:00
Peter Eisentraut
a2c6d84acd doc: Fix up spacing around verbatim DocBook elements 2025-05-07 08:23:44 +02:00
Michael Paquier
c4c236ab5c Fix some comments related to IO workers
IO workers are treated as auxiliary processes.  The comments fixed in
this commit stated that there could be only one auxiliary process of
each BackendType at the same time.  This is not true for IO workers, as
up to MAX_IO_WORKERS of them can co-exist at the same time.

Author: Cédric Villemain <Cedric.Villemain@data-bene.io>
Co-authored-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/e4a3ac45-abce-4b58-a043-b4a31cd11113@Data-Bene.io
2025-05-07 14:55:57 +09:00
Peter Eisentraut
09a47c68e2 Fix whitespace 2025-05-07 07:01:03 +02:00
Bruce Momjian
b560ce7884 doc PG 18 relnotes: adjust partition planning item
Reported-by: David Rowley

Discussion: https://postgr.es/m/CAApHDvqgK7uqPZAwxsfBiFhvBHHB0txaUxhUrdwG4d5Mik_RnA@mail.gmail.com
2025-05-06 21:15:44 -04:00
Bruce Momjian
ada78f9bef doc PG 18 relnotes: small adjustments regarding options
Reported-by: jian he

Discussion: https://postgr.es/m/CACJufxH1jo=hv77AK0HUJYBBMuPmr6+JT+8g-yovuJmHUPGOZQ@mail.gmail.com
2025-05-06 17:17:46 -04:00
Bruce Momjian
575f6003ed doc PG 18 relnotes: move partition locking item to General Perf
Reported-by: Amit Langote

Discussion: https://postgr.es/m/CA+HiwqE+8Pui_NCCC7zgacnet0Cf3tc_vU+P=nhLDES-8xuCUw@mail.gmail.com
2025-05-06 16:03:56 -04:00
Bruce Momjian
45750c6cfe doc PG 18 relnotes: adjust partition items
Reported-by: David Rowley

Discussion: https://postgr.es/m/CAApHDvo+BrVTXMBPjNXBTnAovJWN9+-dYc0kN7rSDqdNvpggZQ@mail.gmail.com
2025-05-06 15:45:03 -04:00
Tom Lane
caa76b91a6 Stamp 18beta1. 2025-05-05 16:25:46 -04:00
Bruce Momjian
c0e6aace02 doc PG 18 relnotes: reword OAuth item
Reported-by: Jacob Champion

Discussion: https://postgr.es/m/CAOYmi+mEQOqBSJas5V5t__b+6h_MLxyy3JFrVJEq638fnNxi0A@mail.gmail.com
2025-05-05 15:42:03 -04:00
Bruce Momjian
0de2e1c8b5 doc PG 18 relnotes: add mention of pg_stat_reset_backend_stats()
This is for WAL statistics.

Reported-by: Bertrand Drouvot

Discussion: https://postgr.es/m/aBjGlj+Yi++fVRQt@ip-10-97-1-34.eu-west-3.compute.internal
2025-05-05 14:56:58 -04:00
Bruce Momjian
092e72a930 doc PG 18 relnotes: adjust hash item
Reported-by: David Rowley

Discussion: https://postgr.es/m/CAApHDvrNmGncNgZMh2oBG5K-+4d1LGJgzrz7180OcHRT1VFojw@mail.gmail.com
2025-05-05 12:30:35 -04:00
Bruce Momjian
cf847d6340 doc PG 18 relnotes: split partition optimizer item into two
Reported-by: David Rowley

Discussion: https://postgr.es/m/CAApHDvohfoJ0D9eiUuVyHU_kq2Y7A_jAjWVsUt0Fm7Gw1Q=1cQ@mail.gmail.com
2025-05-05 11:59:56 -04:00
Noah Misch
627acc3caa With GB18030, prevent SIGSEGV from reading past end of allocation.
With GB18030 as source encoding, applications could crash the server via
SQL functions convert() or convert_from().  Applications themselves
could crash after passing unterminated GB18030 input to libpq functions
PQescapeLiteral(), PQescapeIdentifier(), PQescapeStringConn(), or
PQescapeString().  Extension code could crash by passing unterminated
GB18030 input to jsonapi.h functions.  All those functions have been
intended to handle untrusted, unterminated input safely.

A crash required allocating the input such that the last byte of the
allocation was the last byte of a virtual memory page.  Some malloc()
implementations take measures against that, making the SIGSEGV hard to
reach.  Back-patch to v13 (all supported versions).

Author: Noah Misch <noah@leadboat.com>
Author: Andres Freund <andres@anarazel.de>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Backpatch-through: 13
Security: CVE-2025-4207
2025-05-05 04:52:04 -07:00
Noah Misch
5be213caaa Refactor test_escape.c for additional ways of testing.
Start the file with static functions not specific to pe_test_vectors
tests.  This way, new tests can use them without disrupting the file's
layout.  Change report_result() PQExpBuffer arguments to plain strings.
Back-patch to v13 (all supported versions), for the next commit.

Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Backpatch-through: 13
Security: CVE-2025-4207
2025-05-05 04:52:04 -07:00
Peter Eisentraut
18c4fff640 Translation updates
Source-Git-URL: https://git.postgresql.org/git/pgtranslation/messages.git
Source-Git-Hash: f90ee4803c30491e5c49996b973b8a30de47bfb2
2025-05-05 12:04:49 +02:00
Bruce Momjian
b3754dcc9f doc PG 18 relnotes: adjust COPY and REJECT_LIMIT items
Reported-by: Atsushi Torikoshi

Discussion: https://postgr.es/m/CAM6-o=CEF6tKAjtGMEOd45YySwNRXPu8d_zyYq=fhnia9hOU6Q@mail.gmail.com
2025-05-04 22:37:20 -04:00
Bruce Momjian
d83981c24b doc PG 18 relnotes: move and clarify constraint items
Reported-by: Álvaro Herrera

Discussion: https://postgr.es/m/202505041135.cpo7zgdcya2u@alvherre.pgsql
2025-05-04 22:08:20 -04:00
Bruce Momjian
8c9eec540d doc PG 18 relnotes: add commit for cancel key and protocol neg.
Reported-by: Jelte Fennema-Nio

Discussion: https://postgr.es/m/CAGECzQQehQrhkNNXvLiBgE3odBbTPG=9PzV8F4Oqq3kOorK0Sw@mail.gmail.com
2025-05-04 21:44:39 -04:00
Bruce Momjian
a675149e87 doc PG 18 relnotes: fix libpq wording
Reported-by: Jelte Fennema-Nio

Discussion: https://postgr.es/m/CAGECzQT4804OLOP+nDBxDpMw3Soq=g+fKOE7NryBHggy4GgEcg@mail.gmail.com
2025-05-03 18:50:03 -04:00
Alexander Korotkov
2782f3b845 Revert "Refactor ChangeVarNodesExtended() using the custom callback"
This reverts commit 250a718aad.
It shouldn't have been pushed during the release freeze.

Reported-by: Tom Lane
Discussion: https://postgr.es/m/E1uBIbY-000owH-0O%40gemulon.postgresql.org
2025-05-03 22:42:05 +03:00
Alexander Korotkov
250a718aad Refactor ChangeVarNodesExtended() using the custom callback
fc069a3a63 implemented Self-Join Elimination (SJE) and put the related
logic into ChangeVarNodes_walker().  This commit refactors that code to
remove the SJE-related logic from ChangeVarNodes_walker() and instead
adds a custom callback to ChangeVarNodesExtended(), which gets a chance
to process a node before ChangeVarNodes_walker().  Passing this callback
to ChangeVarNodesExtended() allows the SJE-related node handling to be
kept within analyzejoins.c.

Reported-by: Richard Guo <guofenglinux@gmail.com>
Discussion: https://postgr.es/m/CAMbWs49PE3CvnV8vrQ0Dr%3DHqgZZmX0tdNbzVNJxqc8yg-8kDQQ%40mail.gmail.com
Author: Andrei Lepikhov <lepihov@gmail.com>
Author: Alexander Korotkov <aekorotkov@gmail.com>
2025-05-03 22:30:52 +03:00
Bruce Momjian
fb21ed6c38 doc: update guidelines on non-ASCII characters in docs 2025-05-03 14:45:26 -04:00
Bruce Momjian
24987c6f06 doc PG 18 relnotes: add GROUP BY column elimination item
With a nod to PG 9.6.

Reported-by: jian he

Discussion: https://postgr.es/m/CACJufxEqs=EXZETwtaOooTFhZrtxvSWg8M2uPfzjNtS3wQ6Dzw@mail.gmail.com
2025-05-03 12:57:18 -04:00
Bruce Momjian
04b269da56 doc PG 18 relnotes: move protocol version item to "server"
Reported-by: Jelte Fennema-Nio

Discussion: https://postgr.es/m/CAGECzQSTBgTsDJPxOHWKo7106-YnnYQGzpzNJdis+xTKGUhu2g@mail.gmail.com
2025-05-03 12:19:54 -04:00
Etsuro Fujita
5201bba266 Fix memory allocation/copy mistakes.
The previous code was allocating more memory and copying more data than
necessary because it specified the wrong PgStat_KindInfo member as the
size argument for MemoryContextAlloc and memcpy, respectively.

Although these issues exist since 5891c7a8e, there have been no reports
from the field.  So for now, it seems sufficient to fix them in master.

Author: Etsuro Fujita <etsuro.fujita@gmail.com>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-by: Gurjeet Singh <gurjeet@singh.im>
Discussion: https://postgr.es/m/CAPmGK15eTRCZTnfgQ4EuBNo%3DQLYGFEbXS_7m2dXqtkcT7L8qrQ%40mail.gmail.com
2025-05-03 20:00:00 +09:00
Etsuro Fujita
6e91b9c16f Fix typos in comments.
Also adjust the phrasing in the comments.

Author: Etsuro Fujita <etsuro.fujita@gmail.com>
Author: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-by: Tender Wang <tndrwang@gmail.com>
Reviewed-by: Gurjeet Singh <gurjeet@singh.im>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/CAPmGK17%3DPHSDZ%2B0G6jcj12buyyE1bQQc3sbp1Wxri7tODT-SDw%40mail.gmail.com
Backpatch-through: 15
2025-05-03 19:10:00 +09:00
Bruce Momjian
9fd989ff99 doc PG 18 relnotes: update chapter tags for recent commit 2025-05-02 20:10:10 -04:00
Bruce Momjian
9f8fcadb20 doc PG 18 relnotes: adjust libpq trace & protocol version items
Reported-by: Jelte Fennema-Nio

Discussion: https://postgr.es/m/CAGECzQQj0r_JX38fa-_kepp9UaMzCcujRAYaJG2+fPks1b8MVg@mail.gmail.com
2025-05-02 20:09:12 -04:00
Bruce Momjian
aa82ebdc29 doc PG 18 relnotes: reword and reorder items
Also move ssl_groups to a more appropriate section.

Reported-by: Jacob Champion (ssl_groups item)

Discussion: https://postgr.es/m/CAOYmi+k_zpGaDOrwV46_j-O-a_hSWxcXM6h8vccq45Y28deP-g@mail.gmail.com
2025-05-02 19:59:17 -04:00
Peter Geoghegan
0f08df4068 Avoid treating nonrequired nbtree keys as required.
Consistently prevent nbtree array advancement from treating a scankey as
required when operating in pstate.forcenonrequired mode.  Otherwise, we
risk a NULL pointer dereference.  This was possible in the path where
_bt_check_compare is called to recheck a tuple that advanced all of the
scan's arrays to matching values: its continuescan=false handling
expects _bt_advance_array_keys to have been called with a valid pstate,
but it'll always be NULL during sktrig_required=false calls (which is
how _bt_advance_array_keys must be called when pstate.forcenonrequired
is set).

Oversight in commit 8a510275, which optimized nbtree search scan key
comparisons.

Author: Peter Geoghegan <pg@bowt.ie>
Reported-By: Mark Dilger <mark.dilger@enterprisedb.com>
Discussion: https://postgr.es/m/CAHgHdKsn2W=gPBmj7p6MjQFvxB+zZDBkwTSg0o3f5Hh8rkRrsA@mail.gmail.com
Discussion: https://postgr.es/m/CAH2-WzmodSE+gpTd1CRGU9ez8ytyyDS+Kns2r9NzgUp1s56kpw@mail.gmail.com
2025-05-02 17:50:58 -04:00
Tomas Vondra
1681a70df3 Fix memory leak in _gin_parallel_merge
To insert the merged GIN entries in _gin_parallel_merge, the leader
calls ginEntryInsert(). This may allocate memory, e.g. for a new leaf
tuple. This was allocated in the PortalContext, and kept until the end
of the index build. For most GIN indexes the amount of leaked memory is
negligible, but for custom opclasses with large keys it may cause OOMs.

Fixed by calling ginEntryInsert() in a temporary memory context, reset
after each insert. Other ginEntryInsert() callers do this too, except
that the context is reset after batches of inserts. More frequent resets
don't seem to hurt performance; they may even help a bit.

Report and fix by Vinod Sridharan.

Author: Vinod Sridharan <vsridh90@gmail.com>
Reviewed-by: Tomas Vondra <tomas@vondra.me>
Discussion: https://postgr.es/m/CAFMdLD4p0VBd8JG=Nbi=BKv6rzFAiGJ_sXSFrw-2tNmNZFO5Kg@mail.gmail.com
2025-05-02 23:05:18 +02:00
Tom Lane
e83a8ae447 Don't use a tuplestore if we don't have to for SQL-language functions.
We only need a tuplestore if we're actually going to accumulate
multiple result tuples.  Obviously then we don't need one for non-set-
returning functions; but even a SRF doesn't need one if we decide to
use "lazyEval" (one row at a time) mode.  In these cases, it's
sufficient to use the junkfilter's result slot to hold the single row
that's due to be returned.  We just need to "materialize" that slot
to ensure it holds onto the data past shutdown of the sub-executor.

The original intent of this patch was partially to save a few cycles
(by not putting tuples into a tuplestore only to pull them back out
immediately), but mostly to ensure that we don't use a tuplestore
in non-set-returning functions.  That's because I had concerns
about whether a tuplestore is safe to keep across queries,
which was possible for functions invoked via long-lived FmgrInfos
such as those kept in the typcache.  There are no cases where SRFs
are called that way, so getting rid of the tuplestore in non-SRFs
should make things safer.

However, it emerges that running fmgr_sql in a short-lived context
(as 595d1efed made it do) makes the existing coding unsafe anyway:
we can end up with a long-lived TupleTableSlot holding a freeable
reference to a short-lived tuple, resulting in a double-free crash.
Not trying to pull tuples out of the tuplestore using that slot
dodges the problem, so I'm going to commit this now rather than
invent a band-aid solution for v18.

Reported-by: Alexander Lakhin <exclusion@gmail.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/2443532.1744919968@sss.pgh.pa.us
Discussion: https://postgr.es/m/9f975803-1a1c-4f21-b987-f572e110e860@gmail.com
2025-05-02 16:16:20 -04:00
Álvaro Herrera
c83a38758d
Handle self-referencing FKs correctly in partitioned tables
For self-referencing foreign keys in partitioned tables, we weren't
handling creation of pg_constraint rows during CREATE TABLE PARTITION AS
as well as ALTER TABLE ATTACH PARTITION.  This is an old bug -- mostly,
we broke this in 614a406b4f while trying to fix it (so 12.13, 13.9,
14.6 and 15.0 and up all behave incorrectly).  This commit reverts part
of that with additional fixes for full correctness, and installs more
tests to verify the parts we broke, not just the catalog contents but
also the user-visible behavior.

Backpatch to all live branches.  In branches 13 and 14, commit
46a8c27a7226 changed the behavior during DETACH to drop a FK
constraint rather than trying to repair it, because the complete fix of
repairing catalog constraints was problematic due to lack of previous
fixes.  For this reason, the test behavior in those branches is a bit
different.  However, as best as I can tell, the fix works correctly
there.

In release notes we have to recommend that all self-referencing foreign
keys on partitioned tables be recreated if partitions have been created
or attached after the FK was created, keeping in mind that violating
rows might already be present on the referencing side.
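
A hedged sketch of that recommended repair (all table, column, and
constraint names are hypothetical, and it assumes the referencing rows
have been checked first):

ALTER TABLE part_tab DROP CONSTRAINT part_tab_parent_id_fkey;
ALTER TABLE part_tab ADD CONSTRAINT part_tab_parent_id_fkey
  FOREIGN KEY (parent_id) REFERENCES part_tab (id);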

Reported-by: Guillaume Lelarge <guillaume@lelarge.info>
Reported-by: Matthew Gabeler-Lee <fastcat@gmail.com>
Reported-by: Luca Vallisa <luca.vallisa@gmail.com>
Discussion: https://postgr.es/m/CAECtzeWHCA+6tTcm2Oh2+g7fURUJpLZb-=pRXgeWJ-Pi+VU=_w@mail.gmail.com
Discussion: https://postgr.es/m/18156-a44bc7096f0683e6@postgresql.org
Discussion: https://postgr.es/m/CAAT=myvsiF-Attja5DcWoUWh21R12R-sfXECY2-3ynt8kaOqjw@mail.gmail.com
2025-05-02 21:25:50 +02:00
Tom Lane
ac557793d4 Doc: correct spelling of meson switch.
It's --auto-features not --auto_features.

Reported-by: Egor Chindyaskin <kyzevan23@mail.ru>
Discussion: https://postgr.es/m/172465652540.862882.17808523044292761256@wrigleys.postgresql.org
Discussion: https://postgr.es/m/1979661.1746212726@sss.pgh.pa.us
Backpatch-through: 16
2025-05-02 15:12:49 -04:00
Jacob Champion
3db68212a3 oauth: Correct SSL dependency for libpq-oauth.a
libpq-oauth.a includes libpq-int.h, which includes OpenSSL headers. The
Autoconf side picks up the necessary include directories via CPPFLAGS,
but Meson needs the dependency to be made explicit.

Reported-by: Nathan Bossart <nathandbossart@gmail.com>
Tested-by: Nathan Bossart <nathandbossart@gmail.com>
Tested-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/aBTgjDfrdOZmaPgv%40nathan
2025-05-02 10:45:12 -07:00
Peter Eisentraut
81eaaa2c41 Make "directory" setting work with extension_control_path
The extension_control_path setting (commit 4f7f7b0375) did not
support extensions that set a custom "directory" setting in their
control file.  Very few extensions use that, and during the discussion
of the previous commit it was suggested to maybe remove that
functionality.  But a fix was easier than initially thought, so this
just adds that support.  The fix is to use control->control_dir as the
share dir when returning the path of the extension script files.

To make this work more sensibly overall, the directory suffix
"extension" is no longer to be included in the extension_control_path
value.  To quote the patch, it would be

-extension_control_path = '/usr/local/share/postgresql/extension:/home/my_project/share/extension:$system'
+extension_control_path = '/usr/local/share/postgresql:/home/my_project/share:$system'

During review of the initial patch, there was some discussion on which
of these two approaches would be better, and the committed patch was a
50/50 decision.  But the support for the "directory" setting pushed it
the other way, and it also seems that many people didn't like the
previous behavior much.
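
For context, a hypothetical control file using the "directory" setting
that this commit now supports (all names invented for illustration):

# my_ext.control
comment = 'example extension'
default_version = '1.0'
directory = 'my_ext'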

Author: Matheus Alcantara <mths.dev@pm.me>
Reviewed-by: Christoph Berg <myon@debian.org>
Reviewed-by: David E. Wheeler <david@justatheory.com>
Discussion: https://www.postgresql.org/message-id/flat/aAi1VACxhjMhjFnb%40msg.df7cb.de#0cdf7b7d727cc593b029650daa3c4fbc
2025-05-02 16:35:48 +02:00
Bruce Momjian
a724c7889f doc: first draft of the PG 18 release notes 2025-05-01 22:36:58 -04:00
Noah Misch
c6a26e4ccd Doc: stop implying recommendation of insecure search_path value.
SQL "SET search_path = 'pg_catalog, pg_temp'" is silently equivalent to
"SET search_path = pg_temp, pg_catalog, "pg_catalog, pg_temp"" instead
of the intended "SET search_path = pg_catalog, pg_temp".  (The intent
was a two-element search path.  With the single quotes, it instead
specifies one element with a comma and a space in the middle of the
element.)  In addition to the SET statement, this affects SET clauses of
CREATE FUNCTION, ALTER ROLE, and ALTER DATABASE.  It does not affect the
set_config() SQL function.
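
A minimal illustration of the pitfall described above:

-- intended: a two-element search path
SET search_path = pg_catalog, pg_temp;

-- unsafe: the quotes make this a single element named
-- "pg_catalog, pg_temp", which resolves as
-- pg_temp, pg_catalog, "pg_catalog, pg_temp"
SET search_path = 'pg_catalog, pg_temp';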

Though the documentation did not show an insecure command, remove single
quotes that could entice a reader to write an insecure command.
Back-patch to v13 (all supported versions).

Reported-by: Sven Klemm <sven@timescale.com>
Author: Sven Klemm <sven@timescale.com>
Backpatch-through: 13
2025-05-01 16:51:59 -07:00
Peter Eisentraut
0064020680 doc: Flesh out extension docs for the "prefix" make variable
The variable is a bit magical in how it requires "postgresql" or
"pgsql" to be part of the path, and files end up in its "share" and
"lib" subdirectories.  So mention all that and show an example of
setting "extension_control_path" and "dynamic_library_path" to use
those locations.

Author: David E. Wheeler <david@justatheory.com>
Reviewed-by: Matheus Alcantara <matheusssilv97@gmail.com>
Reviewed-by: Christoph Berg <myon@debian.org>
Discussion: https://www.postgresql.org/message-id/6B5BF07B-8A21-48E3-858C-1DC22F3A28B4@justatheory.com
2025-05-01 22:23:52 +02:00
Jacob Champion
4ea1254f35 oauth: Fix Autoconf build on macOS
Oversight in b0635bfda. -lintl is necessary for gettext on Mac, which
libpq-oauth depends on via pgport/pgcommon. (I'd incorrectly removed
this change from an earlier version of the patch, where it was suggested
by Peter Eisentraut.)

Per buildfarm member indri.
2025-05-01 12:35:52 -07:00
Jacob Champion
b0635bfda0 oauth: Move the builtin flow into a separate module
The additional packaging footprint of the OAuth Curl dependency, as well
as the existence of libcurl in the address space even if OAuth isn't
ever used by a client, has raised some concerns. Split off this
dependency into a separate loadable module called libpq-oauth.

When configured using --with-libcurl, libpq.so searches for this new
module via dlopen(). End users may choose not to install the libpq-oauth
module, in which case the default flow is disabled.

For static applications using libpq.a, the libpq-oauth staticlib is a
mandatory link-time dependency for --with-libcurl builds. libpq.pc has
been updated accordingly.

The default flow relies on some libpq internals. Some of these can be
safely duplicated (such as the SIGPIPE handlers), but others need to be
shared between libpq and libpq-oauth for thread-safety. To avoid
exporting these internals to all libpq clients forever, these
dependencies are instead injected from the libpq side via an
initialization function. This also lets libpq communicate the offsets of
PGconn struct members to libpq-oauth, so that we can function without
crashing if the module on the search path came from a different build of
Postgres. (A minor-version upgrade could swap the libpq-oauth module out
from under a long-running libpq client before it does its first load of
the OAuth flow.)

This ABI is considered "private". The module has no SONAME or version
symlinks, and it's named libpq-oauth-<major>.so to avoid mixing and
matching across Postgres versions. (Future improvements may promote this
"OAuth flow plugin" to a first-class concept, at which point we would
need a public API to replace this anyway.)

Additionally, NLS support for error messages in b3f0be788a was
incomplete, because the new error macros weren't being scanned by
xgettext. Fix that now.

Per request from Tom Lane and Bruce Momjian. Based on an initial patch
by Daniel Gustafsson, who also contributed docs changes. The "bare"
dlopen() concept came from Thomas Munro. Many people reviewed the design
and implementation; thank you!

Co-authored-by: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Christoph Berg <myon@debian.org>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Jelte Fennema-Nio <postgres@jeltef.nl>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Reviewed-by: Wolfgang Walther <walther@technowledgy.de>
Discussion: https://postgr.es/m/641687.1742360249%40sss.pgh.pa.us
2025-05-01 09:14:30 -07:00
Nathan Bossart
a3ef0b570c Remove extra "not" in pg_upgrade documentation.
Oversight in commit cb45dc3afb.

Reported-by: Erik Rijkers <er@xs4all.nl>
Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com>
Discussion: https://postgr.es/m/7b856277-62ad-80f0-36e1-a134ec3c9cab%40xs4all.nl
2025-05-01 09:31:36 -05:00
Dean Rasheed
d73d4cfdfc doc: Warn that ts_headline() output is not HTML-safe.
Add a documentation warning to ts_headline() pointing out that, when
working with untrusted input documents, the output is not guaranteed
to be safe for direct inclusion in web pages. This is because, while
it does remove some XML tags from the input, it doesn't remove all
HTML markup, and so the result may be unsafe (e.g., it might permit
XSS attacks).

To guard against that, all HTML markup should be removed from the
input, making it plain text, or the output should be passed through an
HTML sanitizer.

In addition, document precisely what the default text search parser
recognises as valid XML tags, since that's what determines which XML
tags ts_headline() will remove.
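
A hedged sketch of the hazard (the table and column are hypothetical):

-- ts_headline() strips only what the default parser recognizes as XML
-- tags, so other markup in untrusted input may survive into the output;
-- sanitize the input or the result before placing it in a web page
SELECT ts_headline('english', body_text,
                   to_tsquery('english', 'postgres'))
FROM documents;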

Reported-by: Richard Neill <richard.neill@telos.digital>
Author: Dean Rasheed <dean.a.rasheed@gmail.com>
Reviewed-by: Noah Misch <noah@leadboat.com>
Backpatch-through: 13
2025-05-01 11:03:43 +01:00
Peter Eisentraut
06c4f3ae80 doc: Improve explanations when a table rewrite is needed
Further improvement for commit 11bd831860.  That commit confused
identity and generated columns; fix that.  Also, virtual generated
columns have since been added; add more details about that.  Also some
small rewordings and reformattings to further improve clarity.

Reviewed-by: Robert Treat <rob@xzilla.net>
Discussion: https://postgr.es/m/00e6eb5f5c793b8ef722252c7a519c9a@oss.nttdata.com
2025-05-01 08:57:48 +02:00
Peter Geoghegan
9d924dbb37 Adjust overstrong nbtree skip array assertion.
Make an nbtree array preprocessing assertion account for scans that add
fewer skip arrays than initially expected due to preprocessing finding
an unsatisfiable array qual.

Oversight in commit 92fe23d9.

Author: Peter Geoghegan <pg@bowt.ie>
Reported-By: Mark Dilger <mark.dilger@enterprisedb.com>
Discussion: https://postgr.es/m/CAHgHdKtQMhHy5qcB3KqCcGiW-Rp8P7KzUFRa9ZMKUiv6zen7LQ@mail.gmail.com
2025-04-30 23:15:51 -04:00
Michael Paquier
92ee8a4df5 doc: Mention cost-based delays for total_[auto]{vacuum,analyze}_time
30a6ed0ce4 added four attributes to pg_stat_all_tables to track the
cumulative time spent in [auto]vacuum and [auto]analyze.  It was not
mentioned that the vacuum cost-based delays are included in these
numbers, which could be confusing now that the delays are included in
the vacuum progress view (bb8dff9995).

This commit adds an extra note about this matter.
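
A small sketch of reading these columns (names expanded from the
pattern in the commit subject):

SELECT relname,
       total_vacuum_time, total_autovacuum_time,
       total_analyze_time, total_autoanalyze_time
FROM pg_stat_all_tables
ORDER BY total_autovacuum_time DESC;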

Reported-by: Magnus Hagander <magnus@hagander.net>
Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://postgr.es/m/CABUevEz9v1ZNToPyD98JnWDGZgG=SmPZKkSNzU9hXQ-nGTQF0g@mail.gmail.com
2025-05-01 08:52:19 +09:00
Daniel Gustafsson
45e7e8ca9e Convert strncpy to strlcpy
We try to avoid using strncpy() due to the ease with which it can
be misused.  Convert this callsite to use strlcpy() instead to
match similar codepaths in this file.

Suggested-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/2a796830-de2d-4030-b480-d673f6cc5d94@eisentraut.org
2025-04-30 23:00:47 +02:00
Nathan Bossart
2d6745a66b doc: Add missing reference to track_cost_delay_timing.
Oversight in commit bb8dff9995.
2025-04-30 14:45:54 -05:00
Nathan Bossart
9879105024 vacuumdb: Don't skip empty relations in --missing-stats-only mode.
Presently, --missing-stats-only skips relations with reltuples set
to 0 because empty relations don't get optimizer statistics.
However, before v14, a reltuples value of 0 was ambiguous: it could
either mean the relation is empty, or it could mean that it hadn't
yet been vacuumed or analyzed.  (Commit 3d351d916b taught v14 and
newer to use -1 for the latter case.)  This ambiguity can cause
--missing-stats-only to inadvertently skip relations that need
optimizer statistics after upgrades to v18 and newer (since
reltuples is now transferred from the old cluster).

To fix, simply remove the check for reltuples != 0.  This will
cause --missing-stats-only to analyze some empty tables, but that
doesn't seem too terrible a trade-off.

Reported-by: Christoph Berg <myon@debian.org>
Reviewed-by: Christoph Berg <myon@debian.org>
Discussion: https://postgr.es/m/aAjyvW5_fRGNr7yF%40msg.df7cb.de
2025-04-30 14:12:59 -05:00
Nathan Bossart
d5f1b6a75b Further adjust guidance for running vacuumdb after pg_upgrade.
Since pg_upgrade does not transfer the cumulative statistics used
to trigger autovacuum and autoanalyze, the server may take much
longer than expected to process them post-upgrade.  Currently, we
recommend analyzing only relations for which optimizer statistics
were not transferred by using the --analyze-in-stages and
--missing-stats-only options.  This commit appends another
recommendation to analyze all relations to update the relevant
cumulative statistics by using the --analyze-only option.  This is
similar to the recommendation for pg_stat_reset().
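
A hedged sketch of the combined post-upgrade recipe described above:

# first, rebuild only the missing optimizer statistics
vacuumdb --all --analyze-in-stages --missing-stats-only
# then, analyze everything to update the relevant cumulative statistics
vacuumdb --all --analyze-only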

Reported-by: Christoph Berg <myon@debian.org>
Reviewed-by: Christoph Berg <myon@debian.org>
Discussion: https://postgr.es/m/aAfxfKC82B9NvJDj%40msg.df7cb.de
2025-04-30 14:12:59 -05:00
Nathan Bossart
f60420cff6 doc: Alphabetize long options for pg_dump[all].
The current ordering strategy for these pages is to list the short
options in alphabetical order followed by the long options in
alphabetical order.  If an option has both a short variant and a
long variant, the short variant takes precedence.  This commit
moves a few recently added options to match this style.  We should
probably adjust all pages and --help output to list the long and
short options in one combined alphabetical list (with the long
variants taking precedence), but that is a much larger change, so
it is left as a future exercise.

Oversights in commits a5cf808be5, 1fd1bd8710, and bde2fb797a.

Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/aBFBtsltgu3-IU1d%40nathan
2025-04-30 13:07:51 -05:00
Tom Lane
368c3fbf9d Update time zone data files to tzdata release 2025b.
DST law changes in Chile: there is a new time zone America/Coyhaique
for Chile's Aysén Region, to account for it changing to UTC-03
year-round and thus diverging from America/Santiago.

Historical corrections for Iran.

Backpatch-through: 13
2025-04-30 11:13:49 -04:00
Daniel Gustafsson
f8c115a6cb Typo and doc fixups for memory context reporting
This fixes comment and docs typos and makes a small documentation
change for clarity.  Found via post-commit review.

Author: Rahila Syed <rahilasyed90@gmail.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://postgr.es/m/CAH2L28vt16C9xTuK+K7QZvtA3kCNWXOEiT=gEekUw3Xxp9LVQw@mail.gmail.com
2025-04-30 11:10:27 +02:00
Daniel Gustafsson
d2a1ed1727 Add missing string terminator
When copying the string, strncpy() won't add nul termination since
the string length is equal to the length specified.  Explicitly
set a nul terminator after copying to properly terminate it. Found
via post-commit review.

Author: Rahila Syed <rahilasyed90@gmail.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://postgr.es/m/CAH2L28vt16C9xTuK+K7QZvtA3kCNWXOEiT=gEekUw3Xxp9LVQw@mail.gmail.com
2025-04-30 10:34:08 +02:00
David Rowley
991407ae86 Add 918e7287e to .git-blame-ignore-revs 2025-04-30 19:27:56 +12:00
David Rowley
918e7287ed Fix broken indentation
I forgot to run pgindent in d8555e522.

Reported-by: Fujii Masao <masao.fujii@oss.nttdata.com>
Discussion: https://postgr.es/m/156083c9-eac0-418d-9667-92dec4d6d6cd@oss.nttdata.com
2025-04-30 19:18:30 +12:00
David Rowley
d8555e522e Fix a couple of comment typos
Author: Junwang Zhao <zhjwpku@gmail.com>
Discussion: https://postgr.es/m/CAEG8a3+MRwDKc4YSFKKPKq7Y+vMufVC5u94wM5KZPB2CbgCxnQ@mail.gmail.com
2025-04-30 13:40:46 +12:00
Tom Lane
810a8b1c80 Give up on running with NetBSD/OpenBSD's default semaphore settings.
This reverts commit 38da053463, which
attempted to preserve our ability to start with only 60 semaphores.

Subsequent changes (particularly 55b454d0e) have put that idea pretty
much permanently out of reach: people wishing to use Postgres v18 on
OpenBSD or NetBSD will have no choice but to increase those platforms'
default values of SEMMNI and SEMMNS.

Hence, revert 38da05346's changes in SEMAS_PER_SET and the minimum
tested value of max_connections.  Adjust a comment from the subsequent
patch 6d0154196, and tweak the wording in runtime.sgml to make it
clear that changing SEMMNI/SEMMNS is no longer even a little bit
optional on these platforms.
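
For illustration, raising the limits on OpenBSD might look like this
(the sysctl names follow its seminfo interface; the values are
placeholders that depend on max_connections):

# /etc/sysctl.conf
kern.seminfo.semmni=100
kern.seminfo.semmns=2048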

Although 38da05346 was later back-patched into v17, leave that branch
alone: it's still capable of starting with 60 semaphores, and there's
no reason to break that.

Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Discussion: https://postgr.es/m/E1tuZNv-0037Gs-34@gemulon.postgresql.org
Discussion: https://postgr.es/m/1052019.1745947915@sss.pgh.pa.us
2025-04-29 17:27:52 -04:00
Jacob Champion
e974f1c216 oauth: Classify oauth_client_secret as a password
Tell UIs to hide the value of oauth_client_secret, like the other
passwords. Due to the previous commit, this does not affect postgres_fdw
and dblink, but add a comment to try to warn others of the hazard in the
future.

Reported-by: Noah Misch <noah@leadboat.com>
Reviewed-by: Noah Misch <noah@leadboat.com>
Discussion: https://postgr.es/m/20250415191435.55.nmisch%40google.com
2025-04-29 13:08:55 -07:00
Jacob Champion
d2e7d2a09d oauth: Disallow OAuth connections via postgres_fdw/dblink
A subsequent commit will reclassify oauth_client_secret from dispchar=""
to dispchar="*", so that UIs will treat it like a secret. For our FDWs,
this change will move that option from SERVER to USER MAPPING, which we
need to avoid.

But upon further discussion, we don't really want our FDWs to use our
builtin Device Authorization flow at all, for several reasons:

- the URL and code would be printed to the server logs, not sent over
  the client connection
- tokens are not cached/refreshed, so every single connection has to be
  manually authorized by a user with a browser
- oauth_client_secret needs to belong to the foreign server, but options
  on SERVER are publicly accessible
- all non-superusers would need password_required=false, which is
  dangerous

Future OAuth work can use FDWs as a motivating use case. But for now,
disallow all oauth_* connection options for these two extensions.

Reviewed-by: Noah Misch <noah@leadboat.com>
Discussion: https://postgr.es/m/20250415191435.55.nmisch%40google.com
2025-04-29 13:08:24 -07:00
Jacob Champion
45363fca63 Bump the minimum supported Python version to 3.6.8
Python 3.2 is no longer tested by the buildfarm, and there are only a
handful of buildfarm animals running versions older than 3.6, which
itself went end-of-life in 2021. Python 3.6.8 is the default version
shipped in RHEL8, so that seems like a reasonable baseline for PG18.

Now that we use the Python Limited API as of 0793ab810, older versions
of Python should continue functioning for users of PL/Python in
particular, so soften the language from "required" to "supported".

Wording by Tom Lane. Separate from the review of the patch itself,
several people provided input on the choice of cutoff: Christoph Berg,
Devrim Gündüz, Florents Tselai, Jelte Fennema-Nio, and Renan Alves
Fonseca. Thank you!

Suggested-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/16098.1745079444%40sss.pgh.pa.us
2025-04-29 13:04:19 -07:00
Peter Eisentraut
eec34099c3 Fix whitespace typo in string 2025-04-29 19:16:11 +02:00
Nathan Bossart
2b49492eda initdb: Do not report default autovacuum_worker_slots.
Commit 6d01541960 taught initdb to lower the default value of
autovacuum_worker_slots for systems with very few semaphores.  It
also added a "fake" report for the chosen value, i.e., initdb
prints a message about selecting the default, but the value was
already selected in a previous test.  Per discussion, this is not a
precedent we want to set, and it seems unnecessary to report
everything derived from max_connections, so let's remove the "fake"
report.

Reported-by: Peter Eisentraut <peter@eisentraut.org>
Suggested-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/de722583-4ba4-4063-bc41-e20684978116%40eisentraut.org
2025-04-29 11:41:42 -05:00
Bruce Momjian
faced8e6a4 doc: adjust max_files_per_process again
Reported-by: Andres Freund

Discussion: https://postgr.es/m/5yqochswkulckuzzrwgv2nqdrfh4k4coc4uwq4lvgzkfwnbjbd@46igbiwjabn2
2025-04-29 10:30:08 -04:00
Bruce Momjian
9a9e60fed3 doc: clarify new behavior of max_files_per_process 2025-04-29 09:45:41 -04:00
Peter Eisentraut
913c60b067 doc: Small example improvement
Add a comment character before a line annotation, so that the query
can be used as presented.

Reported-by: Yaroslav Saburov <y.saburov@gmail.com>
Author: Euler Taveira <euler@eulerto.com>
Reviewed-by: Robert Treat <rob@xzilla.net>
Discussion: https://www.postgresql.org/message-id/flat/174393459040.678.17810152410419444783%40wrigleys.postgresql.org
2025-04-29 14:43:35 +02:00
Alexander Korotkov
2260c7f6d9 Fixes for ChangeVarNodes_walker()
This commit fixes two bugs in the ChangeVarNodes_walker() function.

 * When considering a RestrictInfo, walk down to its clauses based on the
   presence of the relid to be deleted not just in clause_relids but also
   in required_relids.

 * Incrementally adjust num_base_rels based on the change of clause_relids
   instead of recalculating it using clause_relids, which could contain
   outer-join relids.

Reported-by: Richard Guo <guofenglinux@gmail.com>
Discussion: https://postgr.es/m/CAMbWs49PE3CvnV8vrQ0Dr%3DHqgZZmX0tdNbzVNJxqc8yg-8kDQQ%40mail.gmail.com
Author: Andrei Lepikhov <lepihov@gmail.com>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
2025-04-29 14:34:44 +03:00
Peter Eisentraut
15b1b4dd3f pg_restore: Improve --help synopsis
The --help synopsis should only be one line.  This rephrases the first
line a bit to reflect the new functionality of restoring multiple
databases from pg_dumpall output.  Additional explanations are better
kept in the man page.
2025-04-29 11:32:49 +02:00
Peter Eisentraut
dadc58f50a pg_restore: Put new option in consistent order in --help output
Also make the description a bit more consistent with similar options.
2025-04-29 10:59:05 +02:00
Amit Kapila
3ff2a1f0c9 Fix assertion failure during decoding from synced slots.
The slot synchronization skips updating the confirmed_flush LSN of the
local slot if the local slot has a newer catalog_xmin or restart_lsn, but
still allows updating the two_phase and two_phase_at fields of the slot.
This opens up a window for prepared transactions between the old
confirmed_flush LSN and two_phase_at to unexpectedly get decoded and
sent to the downstream after promotion. Then, while decoding the
COMMIT PREPARED, an assert will fail that expects the prepare not to
have been sent to the downstream yet.

The fix is to skip updating the other slot fields when we skip updating
the confirmed_flush LSN of the slot.

We didn't backpatch this commit as two_phase_at was not synced in back
branches, which means prepared transactions won't be unexpectedly sent to
downstream.

We discovered this problem while analyzing BF failure reported in the
discussion link.

Reliably reproducing this issue without a debugger is difficult. Given
its rarity, adding specific injection point to test it doesn't seem
worthwhile, so we won't be adding a dedicated test case.

Author: Zhijie Hou <houzj.fnst@fujitsu.com>
Reviewed-by: shveta malik <shveta.malik@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/OS0PR01MB5716B44052000EB91EFAE60E94BC2@OS0PR01MB5716.jpnprd01.prod.outlook.com
2025-04-29 12:52:05 +05:30
Peter Eisentraut
ef1811ac9a pg_verifybackup: Message style improvements 2025-04-29 09:19:15 +02:00
Peter Eisentraut
c893245ec3 test_slru: Fix incorrect format placeholders
Before commit a0ed19e0a9 there was a cast around these; the cast
inadvertently changed the signedness, but that happened to make the
format placeholders correct.  Commit a0ed19e0a9 removed the casts, so
now the format placeholders have the wrong signedness.
2025-04-29 09:09:00 +02:00
Amit Kapila
9807617a92 Doc: Specify the interaction of publish_generated_columns with column list.
Author: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: David G. Johnston <david.g.johnston@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/CAHut+PtnjLiNFFh-3f9cXH0wnwqjdkTjQNbVmZdZ1y+zKt_PPg@mail.gmail.com
2025-04-29 09:01:43 +05:30
Melanie Plageman
f132815fd7 Add maintenance_io_concurrency flag to some read stream users
Index vacuuming and [auto]prewarm AIO concurrency should be governed by
maintenance_io_concurrency. As such, pass those read stream users the
READ_STREAM_MAINTENANCE flag which will calculate their read stream
distance with maintenance_io_concurrency instead of
effective_io_concurrency. This was an oversight in the original commits
making those operations use the read stream API.

Discussion: https://postgr.es/m/flat/CAAKRu_aopDxTo4b41Mt_7Zc-z0_ngocrY8SFCCY6Aph1HgwuNw%40mail.gmail.com
2025-04-28 14:19:45 -04:00
Peter Geoghegan
ce72e7e02e Fix obsolete nbtree array advancement comment.
Checking whether another primitive scan is required after all, once the
next leaf page is read, was moved from _bt_checkkeys to its _bt_readpage
caller by commit 9a2e2a28.  Update a comment that incorrectly described the
recheck mechanism as something that takes place in _bt_checkkeys.

Also fix an older typo in related code comments.
2025-04-28 12:49:17 -04:00
Peter Geoghegan
b75fedcab7 Make NULL tuple values always advance skip arrays.
_bt_check_compare neglected to handle a case that can arise when the
scan's keys are temporarily treated as nonrequired, as an optimization:
whenever a NULL tuple value was encountered that had a skip array whose
current element wasn't already NULL, _bt_check_compare failed to advance
the array to the NULL element.  This allowed _bt_check_compare to fail
to return matching tuples containing a NULL value (though only with an
array column that came before a skip array column with NULLs, and only
during _bt_readpage calls that set pstate.forcenonrequired=true on a
page where the higher-order column also had to advance).

To fix, teach _bt_check_compare to handle this case just like any other
case where a skip array key is unsatisfied and must be advanced directly
(due to the key being considered a nonrequired key).

Oversight in commit 8a510275, which optimized nbtree search scan key
comparisons with skip arrays.

Author: Peter Geoghegan <pg@bowt.ie>
Reported-By: Mark Dilger <mark.dilger@enterprisedb.com>
Discussion: https://postgr.es/m/CAHgHdKtLFWZcjr87hMH0hYDHgcifu4Tj7iHz-xh8qsJREt5cqA@mail.gmail.com
2025-04-28 12:11:08 -04:00
Álvaro Herrera
0e13b13d26 Fix pg_dump for inherited validated not-null constraints
When a child constraint is validated and the parent constraint it
derives from isn't, pg_dump must be coerced into printing the child
constraint; failing to do so would result in a dump that restores the
constraint as not valid, which would be incorrect.

Co-authored-by: jian he <jian.universality@gmail.com>
Co-authored-by: Álvaro Herrera <alvherre@kurilemu.de>
Reported-by: jian he <jian.universality@gmail.com>
Message-id: https://postgr.es/m/CACJufxGHNNMc0E2JphUqJMzD3=bwRSuAEVBF5ekgkG8uY0Q3hg@mail.gmail.com
2025-04-28 16:25:06 +02:00
Peter Eisentraut
c061000311 pg_combinebackup: Message style improvements 2025-04-28 14:26:49 +02:00
Alexander Korotkov
73e7361376 Restore comments in ChangeVarNodesExtended()
This commit restores comments in ChangeVarNodesExtended(), which were
accidentally removed by fc069a3a63.

Reported-by: Richard Guo <guofenglinux@gmail.com>
Discussion: https://postgr.es/m/CAMbWs49PE3CvnV8vrQ0Dr%3DHqgZZmX0tdNbzVNJxqc8yg-8kDQQ%40mail.gmail.com
2025-04-28 11:20:22 +03:00
Amit Kapila
aaf9e95e87 Fix xmin advancement during fast_forward decoding.
During logical decoding, we advance the catalog_xmin of the logical slot
too early in fast_forward mode, resulting in required catalog data being
removed by vacuum. This mode is normally used to advance the slot without
processing the changes, but we still can't let the slot's xmin advance to
an incorrect value.

Commit f49a80c481 fixed a similar issue where the logical slot's
catalog_xmin was getting advanced prematurely during non-fast-forward
mode. During xl_running_xacts processing, instead of directly advancing
the slot's xmin to the oldest running xid in the record, it allowed the
xmin to be held back for snapshots that can be used for
not-yet-replayed transactions, as those might consider older txns as
running too. However, it missed the fact that the same problem can happen
during fast_forward mode decoding, as we won't build a base snapshot in
that mode, and a future call to get_changes from the same slot can miss
seeing the required catalog changes, leading to incorrect results.

This commit allows building the base snapshot even in fast_forward mode to
prevent the early advancement of xmin.

Reported-by: Amit Kapila <amit.kapila16@gmail.com>
Author: Zhijie Hou <houzj.fnst@fujitsu.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: shveta malik <shveta.malik@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Backpatch-through: 13
Discussion: https://postgr.es/m/CAA4eK1LqWncUOqKijiafe+Ypt1gQAQRjctKLMY953J79xDBgAg@mail.gmail.com
Discussion: https://postgr.es/m/OS0PR01MB57163087F86621D44D9A72BF94BB2@OS0PR01MB5716.jpnprd01.prod.outlook.com
2025-04-28 11:35:54 +05:30
Michael Paquier
b225c5e76e Remove circular #include's between wait_event.h and wait_event_types.h
wait_event_types.h is generated by the code, and included wait_event.h.
wait_event.h did the opposite, including wait_event_types.h, causing a
circular dependency between the two.

wait_event_types.h only needs to know about the wait event classes, so
this information is moved into its own file, and wait_event_types.h uses
this new header so that it no longer depends on wait_event.h.

Note that such errors can be found with clang-tidy, with commands like
this one:
clang-tidy source_file.c --checks=misc-header-include-cycle -- \
  -I/install/path/include/ -I/install/path/include/server/

Issue introduced by fa88928470.

Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://postgr.es/m/350192.1745768770@sss.pgh.pa.us
2025-04-28 09:08:15 +09:00
Alexander Korotkov
1aa7cf9eb8 Disallow removing placeholders during Self-Join Elimination.
fc069a3a63 implements Self-Join Elimination (SJE), which can remove base
relations when appropriate.  However, the regression tests for SJE only
cover the case when placeholder variables (PHVs) are evaluated and needed
only in a single base rel.  If this baserel is removed due to SJE, its
clauses, including PHVs, will be transferred to the kept relation.
Removing these PHVs may trigger an error on plan creation -- thanks to
b3ff6c742f for detecting that.

This commit skips the removal of PHVs during SJE.  It might happen that
we thereby keep some PHVs that could have been removed.  However, the
overhead of extra PHVs is small compared to the complexity of the
analysis needed to remove them.

Reported-by: Alexander Lakhin <exclusion@gmail.com>
Author: Alena Rybakina <a.rybakina@postgrespro.ru>
Author: Andrei Lepikhov <lepihov@gmail.com>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
Reviewed-by: Richard Guo <guofenglinux@gmail.com>
2025-04-28 01:40:42 +03:00
Tom Lane
2f5b056203 Remove inappropriate inclusions of c.h and postgres_fe.h.
Per our usual policy, Postgres header files should not include these;
the decision as to which one to use is to be made in the calling .c
file instead.

These errors aren't particularly new, but I'm not feeling a need
to back-patch these changes; it's mostly just neatnik-ism.
2025-04-27 16:58:57 -04:00
Tom Lane
94b84a6072 Don't use double-quotes in #include's of system headers, redux.
This cleans up some loose ends left by commit e8ca9ed1d.  I hadn't
looked closely enough at these places before, but now I have.

The use of double-quoted #includes for Perl headers in plperl_system.h
seems to be simply a mistake introduced in 6c944bf3c and faithfully
copied forward since then.  (I had thought possibly it was required
by some weird Windows build setup, but there's no evidence of that in
our history.)

The occurrences in SectionMemoryManager.h and SectionMemoryManager.cpp
evidently stem from those files' origin as LLVM code.  It's
understandable that LLVM would treat their own files as needing
double-quoted #includes; but they're still system headers to us.

I also applied the same check to *.c files, and found a few other
random incorrect usages in both directions.

Our ECPG headers and test files routinely use angle brackets to refer
to ECPG headers.  I left those usages alone, since it seems reasonable
for an ECPG user to regard those headers as system headers.
2025-04-27 13:23:19 -04:00
Tom Lane
2311f193ea Remove circular #include's between plpython.h and plpy_util.h.
plpython.h included plpy_util.h, simply on the grounds that "it's
easier to just include it everywhere".  However, plpy_util.h must
include plpython.h, or it won't pass headerscheck.  While the
resulting circularity doesn't have any immediate bad effect,
it's poor design.  We have seen serious messes arise in the past
from overly-broad inclusion footprints created by such circularities,
so let's establish a project policy against it.

To fix, just replace *.c files' inclusions of plpython.h with
plpy_util.h.  They'll pull in plpython.h indirectly; indeed, almost
all have already done so via inclusions of other plpy_xxx.h headers.
(Any extensions using plpython.h can do likewise without breaking
the compatibility of their code with prior Postgres versions.)

Reported-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://postgr.es/m/aAxQ6fcY5QQV1lo3@ip-10-97-1-34.eu-west-3.compute.internal
2025-04-27 11:43:02 -04:00
Tom Lane
e8ca9ed1d2 Don't use double-quotes in #include's of system headers.
While few if any C compilers will complain about this, it's
inconsistent with our other #include's of the same headers.

There are some other questionable usages in
src/include/jit/SectionMemoryManager.h and
src/pl/plperl/plperl_system.h, but perhaps those have a
reason to be like that.  I can't see that these do.

Noticed while fooling around with a script to do analysis
of our header cross-inclusions.
2025-04-26 20:30:27 -04:00
David Rowley
936457419d Eliminate divide in new fast-path locking code
c4d5cb71d2 adjusted the fast-path locking code to allow some
configuration of the number of fast-path locking slots via the
max_locks_per_transaction GUC.  In that commit the FAST_PATH_REL_GROUP()
macro used integer division to determine the fast-path locking group slot
to use for the lock.

The divisor in this case is always a power-of-two value.  Here we swap
out the divide for a bitwise-AND, which is a significantly faster
operation to perform.
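
For illustration, a minimal C sketch of the transformation; the variable
names are hypothetical, not the committed macro:

    unsigned int group;

    /* this equivalence only holds when the divisor is a power of two */
    group = hashcode % groups_per_backend;          /* before: divide */
    group = hashcode & (groups_per_backend - 1);    /* after: AND */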

In passing, adjust the code that's setting FastPathLockGroupsPerBackend
so that it's clearer that the value being set is a power of two.

Also, adjust some comments in the area which contained some magic
numbers.  It seems better to justify the 1024 upper limit in the
location where the #define is made instead of where it is used.

Author: David Rowley <drowleyml@gmail.com>
Reviewed-by: Tomas Vondra <tomas@vondra.me>
Discussion: https://postgr.es/m/CAApHDvodr3bcnpxcs7+k-3cFwYR0tP-BYhyd2PpDhe-bCx9i=g@mail.gmail.com
2025-04-27 11:53:40 +12:00
John Naylor
27757677ca Match parameter in new function to earlier equivalents
Oversight in commit 3c6e8c123.
2025-04-27 03:03:52 +07:00
Bruce Momjian
10e8176950 doc: improve wording of vacuum_max_eager_freeze_failure_rate 2025-04-26 11:41:23 -04:00
Andres Freund
039bfc457e aio: Improve debug logging around waiting for IOs
Trying to investigate a bug report by Alexander Lakhin made it apparent that
the debug logging around waiting for IO completion is insufficient. Fix that.

Discussion: https://postgr.es/m/h4in2db37vepagmi2oz5vvqymjasc5gyb4lpqkunj4eusu274i@37jpd3c2spd3
2025-04-25 13:31:25 -04:00
Andres Freund
500b61769f Fix bug allowing io_combine_limit > io_max_combine_limit
10f6646847 intended to limit the value of io_combine_limit to the minimum of
io_combine_limit and io_max_combine_limit. To avoid issues with interdependent
GUCs, it introduced io_combine_limit_guc and set io_combine_limit in assign
hooks. That plan was thwarted by guc_tables.c accidentally still referencing
io_combine_limit, instead of io_combine_limit_guc.  That led to the GUC
machinery overriding the work done in the assign hooks, potentially leaving
io_combine_limit with too high a value.
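
As an illustration only, a hedged sketch of the intended assign-hook
pattern, using the standard GUC assign-hook signature; the hook name and
body here are assumptions, not the committed code:

    /* io_combine_limit_guc is the user-settable GUC; io_combine_limit
     * is the effective value, capped by io_max_combine_limit. */
    static void
    assign_io_combine_limit(int newval, void *extra)
    {
        io_combine_limit = Min(newval, io_max_combine_limit);
    }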

The consequence of this bug was that when running with io_combine_limit >
io_max_combine_limit, the AIO machinery would not have reserved large enough
iovec and IO data arrays, with one IO's arrays overlapping with another IO's,
leading to total confusion.

To make such a problem easier to detect in the future, add assertions to
pgaio_io_set_handle_data_* checking the length is smaller than
io_max_combine_limit (not just PG_IOV_MAX).

It'd be nice to have a few tests for this, but it's not entirely obvious how
to do so portably.

As remarked upon by Tom, the GUC assignment hooks really shouldn't set the
underlying variable, that's the job of the GUC machinery. Change that as well.

Discussion: https://postgr.es/m/c5jyqnuwrpigd35qe7xdypxsisdjrdba5iw63mhcse4mzjogxo@qdjpv22z763f
2025-04-25 13:31:24 -04:00
Andres Freund
0d9114b704 aio: Fix crash potential for pg_aios views due to late state update
pgaio_io_reclaim() reset the fields in PgAioHandle before updating the state
to IDLE or incrementing the generation. For most things that's OK, but for
pg_get_aios() it is not - if it copied the PgAioHandle while fields were being
reset, we wouldn't detect that and could call
pgaio_io_get_target_description() with ioh->target == PGAIO_TID_INVALID,
leading to a crash.

Fix this issue by incrementing the generation and state earlier, before
resetting.

Also add an assertion to pgaio_io_get_target_description() for the target to
be valid - that'd have made this case a bit easier to debug. While at it,
add/update a few related assertions.

Author: Alexander Lakhin <exclusion@gmail.com>
Discussion: https://postgr.es/m/062daca9-dfad-4750-9da8-b13388301ad9@gmail.com
2025-04-25 13:31:13 -04:00
Peter Eisentraut
76d52e7165 Fix incorrect format placeholders
Before commit a0ed19e0a9 there was a cast around these; the cast
inadvertently changed the signedness, which happened to make the format
placeholders correct.  Commit a0ed19e0a9 removed the casts, so the
format placeholders then had the wrong signedness.
2025-04-25 16:49:30 +02:00
Peter Eisentraut
385959bdea Fix terminology in comment and message
Should be "bracket" not "brace" for [].
2025-04-25 16:26:28 +02:00
Peter Eisentraut
0787646e1d Small code consistency improvement
Adjust the way the increment operators are placed to be consistent
throughout the function.  Fixup for commit c1da728106.
2025-04-25 13:01:31 +02:00
Amit Kapila
50b8ad30f7 Fix typo in test file name added in commit 4909b38af0.
Author: Shlok Kyal <shlok.kyal.oss@gmail.com>
Backpatch-through: 13
Discussion: https://postgr.es/m/CANhcyEXsObdjkjxEnq10aJumDpa5J6aiPzgTh_w4KCWRYHLw6Q@mail.gmail.com
2025-04-25 12:46:02 +05:30
Fujii Masao
632f62dcec doc: remove unnecessary secondary index terms for replication settings.
Previously, config.sgml included secondary index terms for
max_replication_slots and max_active_replication_origins. These are
no longer necessary, as each parameter now has a single distinct index entry.

The secondary index terms were originally useful because
max_active_replication_origins was part of max_replication_slots,
and separate index entries helped users locate each setting. However,
commit 04ff636cbc split them into independent parameters,
making the secondary terms redundant.

This commit removes the unnecessary secondary index entries to
simplify the documentation.

Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Euler Taveira <euler@eulerto.com>
Reviewed-by: Robert Treat <rob@xzilla.net>
Discussion: https://postgr.es/m/e825e7a7-4877-441d-93c1-25377db36c31@oss.nttdata.com
2025-04-25 14:58:14 +09:00
Bruce Momjian
6389db2320 doc: simplify new EXPLAIN ANALYZE BUFFERS description 2025-04-24 22:02:35 -04:00
Michael Paquier
3631612eae psql: Fix assertion failures with pipeline mode
A correct cocktail of COPY FROM, SELECT and/or DML queries and
\syncpipeline was able to break the logic in charge of discarding
results of a pipeline, done in discardAbortedPipelineResults().  Such a
sequence makes the backend generate a FATAL error, due to a protocol
synchronization loss.

This problem comes down to the fact that we did not consider the case of
libpq returning a PGRES_FATAL_ERROR when discarding the results of an
aborted pipeline.  The discarding code is changed so that this result
status is handled as a special case, with the caller of
discardAbortedPipelineResults() being responsible for consuming the
result.

A couple of tests are added to cover the problems reported, bringing an
interesting gain in coverage as there were no tests in the tree covering
the case of protocol synchronization loss.

Issue introduced by 41625ab8ea.

Reported-by: Alexander Kozhemyakin <a.kozhemyakin@postgrespro.ru>
Author: Anthonin Bonnefoy <anthonin.bonnefoy@datadoghq.com>
Co-authored-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/ebf6ce77-b180-4d6b-8eab-71f641499ddf@postgrespro.ru
2025-04-24 12:22:53 +09:00
Michael Paquier
923ae50cf5 Add sanity check for dshash entries when reading pgstats file
Not having this check would produce a core dump at startup when running
pgstat_read_statsfile(), in the case where the information of a stats
kind for an entry in the dshash could not be found.  The same check
already happens for fixed-numbered stats and entries that are stored
with their names.  This issue can be seen with custom stats kinds.
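
To illustrate, a hedged sketch of the kind of check added, assuming the
existing pgstat_get_kind_info() lookup; the surrounding code is an
assumption, not the committed change:

    const PgStat_KindInfo *info = pgstat_get_kind_info(kind);

    if (info == NULL)
    {
        elog(WARNING, "invalid stats kind %u in stats file", kind);
        goto error;     /* hypothetical error path */
    }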

Note that this problem can be reproduced with what is in the core code:
- Tweak the test module injection_points to not load the fixed-numbered
stats part, leaving only the variable-numbered stats.
- Create an instance with injection_points defined in
shared_preload_libraries.
- Create a pgstats entry by attaching and running a point.
- Restart the server without shared_preload_libraries.  The startup
process detects that something is wrong and reports a WARNING.

Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://postgr.es/m/aAieZAvM+K1d89R2@ip-10-97-1-34.eu-west-3.compute.internal
2025-04-24 09:20:01 +09:00
Tom Lane
bc19f63f80 Avoid possibly-theoretical OOM crash hazard in hash_create().
One place in hash_create() used DynaHashAlloc() as a convenient
shorthand for MemoryContextAlloc().  That was fine when it was
written, but it stopped being fine when 9c911ec06 changed
DynaHashAlloc() to use MCXT_ALLOC_NO_OOM (mea culpa).  Change
the code to call plain MemoryContextAlloc() as intended.

I think that this bug may be unreachable in practice, since we now
always create AllocSets with some space already allocated, so that
an OOM failure here for a non-shared hash table should be impossible
(with a hash table name of reasonable length anyway).  And there
aren't enough shared hash tables to make a crash for one of those
probable.  Nonetheless it's clearly not operating as designed, so
back-patch to v16 where 9c911ec06 came in.
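
For illustration, a hedged C sketch of why the no-OOM variant is
hazardous here; this is pseudo-context, not the hash_create() code:

    /* MCXT_ALLOC_NO_OOM returns NULL instead of raising an error... */
    char *ptr = MemoryContextAllocExtended(ctx, size, MCXT_ALLOC_NO_OOM);

    strcpy(ptr, name);      /* ...so this can crash if ptr is NULL */

    /* plain MemoryContextAlloc() raises "out of memory" instead */
    char *safe = MemoryContextAlloc(ctx, size);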

Reported-by: Maksim Korotkov <m.korotkov@postgrespro.ru>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/219bdccd460510efaccf90b57e5e5ef2@postgrespro.ru
Backpatch-through: 16
2025-04-23 16:04:55 -04:00
Jacob Champion
005ccae0f2 oauth: Support Python 3.6 in tests
RHEL8 ships a patched 3.6.8 as its base Python version, and I
accidentally let some newer Python-isms creep into oauth_server.py
during development.

Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Jelte Fennema-Nio <postgres@jeltef.nl>
Tested-by: Renan Alves Fonseca <renanfonseca@gmail.com>
Tested-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/16098.1745079444%40sss.pgh.pa.us
2025-04-23 11:16:45 -07:00
Alexander Korotkov
bb78e42678 Maintain RelIdToTypeIdCacheHash in TypeCacheOpcCallback()
b85a9d046e introduced a new RelIdToTypeIdCacheHash, whose entries should
exist for typecache entries with TCFLAGS_HAVE_PG_TYPE_DATA flag set or any
of TCFLAGS_OPERATOR_FLAGS set or tupDesc set.  However, TypeCacheOpcCallback(),
which resets TCFLAGS_OPERATOR_FLAGS, neglected to update
RelIdToTypeIdCacheHash.

This commit adds a delete_rel_type_cache_if_needed() call to the
TypeCacheOpcCallback() function to maintain RelIdToTypeIdCacheHash after
resetting TCFLAGS_OPERATOR_FLAGS.
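
For illustration, a hedged sketch of the shape of the fix; only the
function and flag names come from this commit's description, the
surrounding code is an assumption:

    /* in TypeCacheOpcCallback(), after clearing the operator flags */
    typentry->flags &= ~TCFLAGS_OPERATOR_FLAGS;
    delete_rel_type_cache_if_needed(typentry);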

Also, this commit fixes the name of the delete_rel_type_cache_if_needed()
function in its mentions in the comments.

Reported-by: Noah Misch
Discussion: https://postgr.es/m/20250411203241.e9.nmisch%40google.com
2025-04-23 20:26:52 +03:00
Alexander Korotkov
9f404d7922 Properly prepare varinfos in estimate_multivariate_bucketsize()
To estimate with extended statistics, we need to clear the varnullingrels
field in the expression, and duplicates are not allowed in the GroupVarInfo
list.  We might re-use add_unique_group_var(), but we don't do so for two
reasons.

  1) We must keep the origin_rinfos list ordered exactly the same way as
     varinfos.
  2) add_unique_group_var() is designed for estimate_num_groups(), where a
     larger number of groups is worse.  While estimating the number of hash
     buckets, we have the opposite: a smaller number of groups is worse.
     Therefore, we don't have to remove "known equal" vars: the removed var
     may valuably contribute to the multivariate statistics, growing the
     number of groups.

This commit adds custom code to estimate_multivariate_bucketsize() to
initialize varinfos properly.

Reported-by: Robins Tharakan <tharakan@gmail.com>
Discussion: https://postgr.es/m/18885-da51324078588253%40postgresql.org
Author: Andrei Lepikhov <lepihov@gmail.com>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Tomas Vondra <tomas@vondra.me>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
2025-04-23 20:25:21 +03:00
Tom Lane
3db61db48e Change the names generated for child foreign key constraints.
When a foreign key constraint is placed on a partitioned table, we
actually make two pg_constraint entries associated with that table.
(I have my doubts about the wisdom of that, but it's been like that
since v12 and post-feature-freeze is no time to be messing with such
entrenched decisions.)  The second "child" entry always had a name
generated according to the default rule, "table_column(s)_fkey[nnn]",
even if the primary entry had an unrelated user-specified name.  The
trouble with doing that is that the default name could collide with
the user-specified name of some other constraint on the same table.
While we were willing to adjust the generated name to avoid
collisions, that only helps if it's made second; if it's made first
then creation of the other constraint would fail, potentially causing
dump/reload or pg_upgrade failures.

The core of the problem here is that we're infringing on user
namespace, so I doubt that there's any 100% solution other than to
find a way to not need the "child" entry.  In the meantime, it seems
like it'd be an improvement to make the child's name be the name of
the parent constraint with an underscore and digit(s) appended as
necessary to make it unique.  This rule can in theory fail in the same
way, but it seems much less probable; for one thing, this rule is
guaranteed not to match primary entries having auto-generated names.
(While an auto-generated primary name isn't user-specified to begin
with, it acts like that during dump/reload, so collisions against such
names are definitely possible.)

An additional bonus, visible in some of the regression test cases
that change here, arises from the fact that some error messages
cite the child constraint's name not the parent's.  In the
previous approach the two names could be completely unrelated,
leading to user confusion --- the more so since psql's \d command
hides child constraints.  With this approach it's hopefully much
clearer which constraint-the-user-knows-about is failing.

However, that does mean that there's user-visible behavior change
occurring here, making it seem like not something to back-patch.
I feel it's not too late for v18, though.

Reported-by: Kirill Reshke <reshkekirill@gmail.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Alvaro Herrera <alvherre@kurilemu.de>
Discussion: https://postgr.es/m/CALdSSPhGitjpTfzEMJN-Y2x+Q-5QChSxAsmSJ1-E8mQJLkHOqQ@mail.gmail.com
2025-04-23 12:03:02 -04:00
Daniel Gustafsson
994a100b37 Allocate JsonLexContexts on the heap to avoid warnings
The stack-allocated JsonLexContexts, in combination with codepaths
using goto, were causing warnings when compiling with LTO enabled,
as the optimizer is unable to figure out that this is safe.  Rather than
contort the code with workarounds for this, simply heap-allocate the
structs instead, as these are not in any performance-critical paths.

Author: Daniel Gustafsson <daniel@yesql.se>
Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Jacob Champion <jacob.champion@enterprisedb.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/2074634.1744839761@sss.pgh.pa.us
2025-04-23 11:02:05 +02:00
Michael Paquier
0ff95e0a5b psql: Rework TAP routine psql_fails_like() to define WAL sender context
The routine was coded so that a WAL sender was always used, state required
only for one failure test related to START_REPLICATION.  This test is
changed so that a WAL sender is used by passing a replication option to
psql_fails_like(), instead of forcing the use of a WAL sender for all
the tests.

This has come up as useful in the context of a separate bug fix where
we are looking at extending tests for some failure scenarios.  These
tests need to happen in the context of a normal backend, and not a WAL
sender where the extended query protocol cannot be used.

Discussion: https://postgr.es/m/aAXkJIOildLUA7vQ@paquier.xyz
2025-04-23 15:33:07 +09:00
Amit Kapila
0e091ce409 Fix an oversight in 3f28b2fcac.
Commit 3f28b2fcac tried to ensure that the replication origin shouldn't be
advanced in case of an ERROR in the apply worker, so that it can request
the same data again after restart. However, it is possible that an ERROR
was caught and handled by a (say, PL/pgSQL) function, and the apply worker
continues to apply further changes, in which case we shouldn't reset the
replication origin.

Ensure to reset the origin only when the apply worker exits after an
ERROR.

Commit 3f28b2fcac added a new function, geterrlevel, which we removed in
HEAD as part of this commit, but kept in the back branches to avoid
breaking any applications. A separate case can be made to have such a
function even for
HEAD.

Reported-by: Shawn McCoy <shawn.the.mccoy@gmail.com>
Author: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: vignesh C <vignesh21@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Backpatch-through: 16, where it was introduced
Discussion: https://postgr.es/m/CALsgZNCGARa2mcYNVTSj9uoPcJo-tPuWUGECReKpNgTpo31_Pw@mail.gmail.com
2025-04-23 11:08:24 +05:30
Michael Paquier
1f7878c33c Remove assertion based on pending_since in pgstat_report_stat()
This assertion, based on pending_since (a timestamp used to prevent stats
reports from being too frequent, or after a partial flush), is reached
when no data can be flushed even though a previous call of
pgstat_report_stat() determined that some stats data was in need of a
flush.  pending_since is set when some stats data is pending (in
non-force mode) or if report attempts are too frequent, and is reset to
0 once all stats have been flushed.

Since 5cbbe70a9cc6, WAL senders have begun to report their stats on a
periodic basis for IO stats in v16~ and backend stats on HEAD, creating
some friction with the concurrent pgstat_report_stat() calls that can
happen in the context of a WAL sender (shutdown callback doing a final
report or backend-related code paths).  This problem is the cause of
spurious failures in the TAP tests.

In theory, this assertion can also be reached in v15, even if that's
very unlikely.  For example, a process, say a background worker, could
do periodic and direct stats flushes with concurrent calls of
pgstat_report_stat() that could cause conflicting values of
pending_since.  This can be done with WAL or SLRU stats flushes using
pgstat_flush_wal() or pgstat_slru_flush().  HEAD makes this situation
easier to happen with custom cumulative stats.

This commit removes the assertion altogether, per discussion, as it is
more useful to keep the state of things as they are for the WAL sender.
The assertion could use a special state based on, for example,
am_walsender, but I doubt that this would be meaningful in the long run
based on the other arguments raised while discussing this issue.

Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Reported-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/1489124.1744685908@sss.pgh.pa.us
Discussion: https://postgr.es/m/dwrkeszz6czvtkxzr5mqlciy652zau5qqnm3cp5f3p2po74ppk@omg4g3cc6dgq
Backpatch-through: 15
2025-04-23 13:53:29 +09:00
Tom Lane
e0f373ee42 Re-enable SSL connect_fails tests, and fix related race conditions.
Cluster.pm's connect_fails routine has long had the ability to
sniff the postmaster log file for expected messages after a
connection failure.  However, that's always had a race condition:
on some platforms it's possible for psql to exit and the test
script to slurp up the postmaster log before the backend process
has been able to write out its final log messages.  Back in
commit 55828a6b6 we disabled a bunch of tests after discovering
that, and the aim of this patch is to re-enable them.

(The sibling function connect_ok doesn't seem to have a similar
problem, mainly because the messages we look for come out during
the authentication handshake, so that if psql reports successful
connection they should certainly have been emitted already.)

The solution used here is borrowed from 002_connection_limits.pl's
connect_fails_wait routine: set the server's log_min_messages setting
to DEBUG2 so that the postmaster will log child-process exit, and then
wait till we see that log entry before checking for the messages we
are actually interested in.

If a TAP test uses connect_fails' log_like or log_unlike options, and
forgets to set log_min_messages, those connect_fails calls will now
hang until timeout.  Fixing up the existing callers shows that we had
several other TAP tests that were in theory vulnerable to the same
problem.  It's unclear whether the lack of failures is just luck, or
lack of buildfarm coverage, or perhaps there is some obscure timing
effect that only manifests in SSL connections.  In any case, this
change should in principle make those other call sites more robust.
I'm not inclined to back-patch though, unless sometime we observe
an actual failure in one of them.

Reported-by: Andrew Dunstan <andrew@dunslane.net>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/984fca80-85a8-4c6f-a5cc-bb860950b435@dunslane.net
2025-04-22 15:10:50 -04:00
Tom Lane
da83b1ea10 Avoid depending on post-UPDATE row order in float4/float8 tests.
While heapam reproduces the insertion order of rows well, updates
can move rows to varying places depending on autovacuum activity.
In most regression tests we've guarded against getting variable
results due to that, but float4.sql and float8.sql had escaped
notice so far because they update tables that are too small for
autovacuum to pay attention to.

With increasing interest in non-heap table AMs, it seems worth
allowing for update behaviors that are not like heapam's.  Hence,
add ORDER BY to stabilize the results in case the updates put
the rows in a different order.  (We'll continue to assume that a
seqscan will reproduce original insertion order, though.  Removing
that assumption would require vastly-more-invasive test changes.)

Author: Pavel Borisov <pashkin.elfe@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CALT9ZEExHAnBoBVQzQuWPMKUbapF5-FBO3fdeYG3s2tuWQz1NQ@mail.gmail.com
2025-04-22 14:24:21 -04:00
Tom Lane
eaf582806c gen_node_support.pl: improve error message for unclosed struct.
This error message was 'runaway "struct_name"', which isn't all
that clear; I think 'could not find closing brace for "struct_name"'
is better.  Also, provide the location of the struct start using the
script's usual '$file:$lineno' style.

Bug: #18901
Reported-by: Clemens Ruck <clemens.ruck@t-online.de>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/18901-424272abe01357e6@postgresql.org
2025-04-22 13:56:31 -04:00
Michael Paquier
e29df428a1 doc: Mention naming convention used by injection points
All the injection points used in the tree have relied on an implied
rule: their names should be made of lower-case characters, with dashes
between words.

This commit adds a brief mention of that in the docs, encouraging the
practice.

Author: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Aleksander Alekseev <aleksander@timescale.com>
Discussion: https://postgr.es/m/OSCPR01MB14966E14C1378DEE51FB7B7C5F5B32@OSCPR01MB14966.jpnprd01.prod.outlook.com
Backpatch-through: 17
2025-04-22 12:41:29 +09:00
David Rowley
0b06459f3c Doc: reword text explaining the --maintenance-db option
The previous text was a little clumsy.  Here we improve that.

Author: David Rowley <dgrowleyml@gmail.com>
Reported-by: Noboru Saito <noborusai@gmail.com>
Reviewed-by: David G. Johnston <david.g.johnston@gmail.com>
Discussion: https://postgr.es/m/CAAM3qnJtv5YbjpwDfVOYN2gZ9zGSLFM1UGJgptSXmwfifOZJFQ@mail.gmail.com
Backpatch-through: 13
2025-04-22 14:54:22 +12:00
Michael Paquier
02c63f9438 Rename injection point for invalidation messages at end of transaction
This injection point was named "AtEOXact_Inval-with-transInvalInfo", not
respecting the implied naming convention that injection points should
use lower-case characters, with terms separated by dashes.  All the
other points defined in the tree follow this style, so let's be more
consistent.

Author: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Aleksander Alekseev <aleksander@timescale.com>
Discussion: https://postgr.es/m/OSCPR01MB14966E14C1378DEE51FB7B7C5F5B32@OSCPR01MB14966.jpnprd01.prod.outlook.com
Backpatch-through: 17
2025-04-22 10:01:38 +09:00
David Rowley
5e6f9a9c4e Doc: various fixups
* Use <symbol> tags for CONNECTION_* #defines

We were using an inconsistent mix of <literal> and sometimes <function>
tags.

* Use <application> tag for libpq

There was a mix of <literal> and <productname>.

Also fix a whitespace issue.

None of these seem critical enough mistakes to backpatch.

Author: Noboru Saito <noborusai@gmail.com>
Discussion: https://postgr.es/m/CAAM3qnJtv5YbjpwDfVOYN2gZ9zGSLFM1UGJgptSXmwfifOZJFQ@mail.gmail.com
2025-04-22 11:10:08 +12:00
David Rowley
d010cc6cca Doc: fix incorrect punctuation
Author: Noboru Saito <noborusai@gmail.com>
Discussion: https://postgr.es/m/CAAM3qnJtv5YbjpwDfVOYN2gZ9zGSLFM1UGJgptSXmwfifOZJFQ@mail.gmail.com
Backpatch-through: 17
2025-04-22 11:04:04 +12:00
Jeff Davis
90260e2ec6 Fix INITCAP() word boundaries for PG_UNICODE_FAST.
Word boundaries are based on whether a character is alphanumeric or
not. For the PG_UNICODE_FAST collation, alphanumeric includes
non-ASCII digits; whereas for the PG_C_UTF8 collation, it only
includes digits 0-9. Pass down the right information from the
pg_locale_t into initcap_wbnext to differentiate the behavior.

Reported-by: Noah Misch <noah@leadboat.com>
Reviewed-by: Noah Misch <noah@leadboat.com>
Discussion: https://postgr.es/m/20250417135841.33.nmisch@google.com
2025-04-21 12:34:58 -07:00
Tom Lane
80b727eb9d Use the same cmd_context throughout a walsender's lifetime.
exec_replication_command created a cmd_context to work in and
then deleted it on exit.  This is pretty dangerous because
some replication commands start/finish transactions.  In the
wake of commit 1afe31f03, that could lead to re-selecting a
CurrentMemoryContext that's already been deleted, leading to
hilarity such as a memory context that is its own parent.

To fix, let's make the cmd_context persist across
exec_replication_command calls; instead of deleting it, we'll just
reset it each time.  In this way it retains the same identity and
there's no problem if transaction abort restores it as the working
context.  It probably even saves a few microseconds to do this.
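
For illustration, a hedged sketch of the pattern; the context name and
parent are assumptions, not the committed code:

    static MemoryContext cmd_context = NULL;

    if (cmd_context == NULL)
        cmd_context = AllocSetContextCreate(TopMemoryContext,
                                            "Replication command context",
                                            ALLOCSET_DEFAULT_SIZES);
    else
        MemoryContextReset(cmd_context);    /* reuse instead of delete */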

This fix also ensures that exec_replication_command returns to the
caller (PostgresMain) with the same context active that had been
when it was called (probably MessageContext).  The previous
coding could get that wrong too.

Reported-by: Anthonin Bonnefoy <anthonin.bonnefoy@datadoghq.com>
Author: Anthonin Bonnefoy <anthonin.bonnefoy@datadoghq.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAO6_XqoJA7-_G6t7Uqe5nWF3nj+QBGn4F6Ptp=rUGDr0zo+KvA@mail.gmail.com
2025-04-21 12:09:36 -04:00
Tom Lane
5ec8b01c30 MemoryContextCreate: assert parent is valid and different from node.
The case of "node == parent" might seem impossible, since we just
allocated the new node.  But it's possible if parent is a dangling
reference to a recently-deleted context.  In fact, given aset.c's
habit of recycling contexts, it's actually rather likely if that's so.
If we'd had this assertion before, it would have simplified debugging
a recently-identified walsender issue.
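
To illustrate, a hedged sketch of the assertions being added; their
exact form and placement inside MemoryContextCreate() are assumptions:

    Assert(parent == NULL || MemoryContextIsValid(parent));
    Assert(node != parent);     /* parent may be a dangling reference */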

Reported-by: Anthonin Bonnefoy <anthonin.bonnefoy@datadoghq.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAO6_XqoJA7-_G6t7Uqe5nWF3nj+QBGn4F6Ptp=rUGDr0zo+KvA@mail.gmail.com
2025-04-21 11:34:36 -04:00
Fujii Masao
706cbed351 doc: Fix memory context level in pg_log_backend_memory_contexts() example.
Commit d9e03864b6 changed the memory context level numbers shown by
pg_log_backend_memory_contexts() to be 1-based. However, the example in
the documentation was not updated and still used 0-based numbering.

This commit updates the example to match the current 1-based output.

Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: David Rowley <drowleyml@gmail.com>
Discussion: https://postgr.es/m/1ad6d388-1b43-400d-bec9-36d52f755f74@oss.nttdata.com
2025-04-21 14:53:25 +09:00
David Rowley
78eda9e264 Fix a few more duplicate words in comments
Similar to 84fd3bc14 but these ones were found using a regex that can span
multiple lines.

Author: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/CAApHDvrMcr8XD107H3NV=WHgyBcu=sx5+7=WArr-n_cWUqdFXQ@mail.gmail.com
2025-04-21 13:50:50 +12:00
David Rowley
84fd3bc141 Fix a few duplicate words in comments
These are all new to v18.

Author: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/CAApHDvrMcr8XD107H3NV=WHgyBcu=sx5+7=WArr-n_cWUqdFXQ@mail.gmail.com
2025-04-21 10:41:18 +12:00
Noah Misch
8180136652 Comment on need to MarkBufferDirty() if omitting DELAY_CHKPT_START.
Blocking checkpoint phase 2 requires MarkBufferDirty() and
BUFFER_LOCK_EXCLUSIVE; neither suffices by itself.  transam/README documents
this, citing SyncOneBuffer().  Update the DELAY_CHKPT_START documentation to
say this.  Expand the heap_inplace_update_and_unlock() comment that cites
XLogSaveBufferForHint() as precedent, since heap_inplace_update_and_unlock()
could have opted not to use DELAY_CHKPT_START.

Commit 8e7e672cda added DELAY_CHKPT_START to
heap_inplace_update_and_unlock().  Since commit
bc6bad88572501aecaa2ac5d4bc900ac0fd457d5 reverted it in non-master branches,
no back-patch.

Discussion: https://postgr.es/m/20250406180054.26.nmisch@google.com
2025-04-20 12:00:17 -07:00
Noah Misch
714bd9e3a7 Test restartpoints in archive recovery.
v14 commit 1f95181b44c843729caaa688f74babe9403b5850 and its v13
equivalent caused timing-dependent failures in archive recovery, at
restartpoints.  The symptom was "invalid magic number 0000 in log
segment X, offset 0", "unexpected pageaddr X in log segment Y, offset 0"
[X < Y], or an assertion failure.  Commit
3635a0a35aafd3bfa80b7a809bc6e91ccd36606a and predecessors back-patched
v15 changes to fix that.  This test reproduces the problem
probabilistically, typically in less than 1000 iterations of the test.
Hence, buildfarm and CI runs would have surfaced enough failures to get
attention within a day.

Reported-by: Arun Thirupathi <arunth@google.com>
Discussion: https://postgr.es/m/20250306193013.36.nmisch@google.com
Backpatch-through: 13
2025-04-20 08:28:48 -07:00
Noah Misch
2d5350cfbd Avoid ERROR at ON COMMIT DELETE ROWS after relhassubclass=f.
Commit 7102070329 fixed a similar bug, but
it missed the case of database-wide ANALYZE ("use_own_xacts" mode).
Commit a07e03fd8f changed consequences
from silent discard of a pg_class stats (relpages et al.) update to
ERROR "tuple to be updated was already modified".  Losing a relpages
update of an ON COMMIT DELETE ROWS table was negligible, but a
COMMIT-time error isn't negligible.  Back-patch to v13 (all supported
versions).

Reported-by: Richard Guo <guofenglinux@gmail.com>
Reported-by: Robins Tharakan <tharakan@gmail.com>
Discussion: https://postgr.es/m/CAMbWs4-XwMKMKJ_GT=p3_-_=j9rQSEs1FbDFUnW9zHuKPsPNEQ@mail.gmail.com
Backpatch-through: 13
2025-04-20 08:28:48 -07:00
David Rowley
d47f922246 Fix issue with ORDER BY / DISTINCT aggregates and FILTER
1349d2790 added support so that aggregate functions with an ORDER BY or
DISTINCT clause could make use of presorted inputs to avoid an implicit
sort within nodeAgg.c.  That commit failed to consider that a FILTER
clause may exist that filters rows before the aggregate function
arguments are evaluated.  That can be problematic if an aggregate
argument contains an expression which could error out during evaluation.
It's perfectly valid to want to have a FILTER clause which eliminates
such values, and with the pre-sorted path added in 1349d2790, it was
possible that the planner would produce a plan with a Sort node above
the Aggregate to perform the sort on the aggregate's arguments long before
the Aggregate node would filter out the non-matching values.

Here we fix this by inspecting ORDER BY / DISTINCT aggregate functions
which have a FILTER clause to see if the aggregate's arguments are
anything more complex than a Var or a Const.  Evaluating these isn't
going to cause an error.  If we find any non-Var, non-Const parameters
then the planner will now opt to perform the sort in the Aggregate node
for these aggregates, i.e. disable the presorted aggregate optimization.
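
For illustration, a hedged sketch of the kind of check described; the
surrounding planner code is an assumption, while the Aggref fields are
the real ones:

    /* With a FILTER clause, only presort when every argument is a plain
     * Var or Const, whose evaluation cannot error out. */
    if (aggref->aggfilter != NULL)
    {
        ListCell   *lc;

        foreach(lc, aggref->args)
        {
            TargetEntry *tle = (TargetEntry *) lfirst(lc);

            if (!IsA(tle->expr, Var) && !IsA(tle->expr, Const))
                return false;   /* sort inside the Aggregate node */
        }
    }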

An alternative fix would have been to completely disallow the presorted
optimization for Aggrefs with any FILTER clause, but that wasn't done as
that could cause large performance regressions for queries that see
significant gains from 1349d2790 due to presorted results coming in from
an Index Scan.

Backpatch to 16, where 1349d2790 was introduced.

Author: David Rowley <dgrowleyml@gmail.com>
Reported-by: Kaimeh <kkaimeh@gmail.com>
Diagnosed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAK-%2BJz9J%3DQ06-M7cDJoPNeYbz5EZDqkjQbJnmRyQyzkbRGsYkA%40mail.gmail.com
Backpatch-through: 16
2025-04-20 22:12:07 +12:00
Michael Paquier
78231baaf9 psql: Split extended query protocol meta-commands in --help=commands
Compared to v17 with only \bind able to do extended query protocol work,
v18 now has a total of 11 meta-commands related to the extended query
protocol.  These were all listed under the "General" section of the
--help=commands output and are specialized, bloating the output
generated.

All these meta-commands are moved into a new section called "Extended
Query Protocol", listed at the end of --help=commands.

This split has been suggested by Noah Misch.

Discussion: https://postgr.es/m/20250415213450.1f.nmisch@google.com
2025-04-20 08:34:38 +09:00
Michael Paquier
5743d122fc psql: Improve descriptions of \flush[request] in --help
Noah has reported that the current wording was confusing compared to the
description of the underlying libpq routine.  The new wording is from
me.

Reported-by: Noah Misch <noah@leadboat.com>
Discussion: https://postgr.es/m/20250415213450.1f.nmisch@google.com
2025-04-20 08:16:57 +09:00
Michael Paquier
5ee7bd944e psql: Fix incorrect status code returned by \getresults
When an invalid number of results is requested for \getresults, the
status code returned by exec_command_getresults() was PSQL_CMD_SKIP_LINE
and not PSQL_CMD_ERROR.

This led to incorrect behaviors, with ON_ERROR_STOP for example.
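
To illustrate, a hedged sketch of the fix's shape; the surrounding
parsing code is an assumption, while the status codes are psql's own:

    if (num_results <= 0)
    {
        pg_log_error("\\getresults: invalid number of requested results");
        return PSQL_CMD_ERROR;      /* previously PSQL_CMD_SKIP_LINE */
    }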

Reported-by: Noah Misch <noah@leadboat.com>
Discussion: https://postgr.es/m/20250415213450.1f.nmisch@google.com
2025-04-20 08:15:39 +09:00
Tom Lane
d05996340d Be more wary of corrupt data in pageinspect's heap_page_items().
The original intent in heap_page_items() was to return nulls, not
throw an error or crash, if an item was sufficiently corrupt that
we couldn't safely extract data from it.  However, commit d6061f83a
utterly missed that memo, and not only put in an un-length-checked
copy of the tuple's data section, but also managed to break the check
on sane nulls-bitmap length.  Either mistake could possibly lead to
a SIGSEGV crash if the tuple is corrupt.

Bug: #18896
Reported-by: Dmitry Kovalenko <d.kovalenko@postgrespro.ru>
Author: Dmitry Kovalenko <d.kovalenko@postgrespro.ru>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/18896-add267b8e06663e3@postgresql.org
Backpatch-through: 13
2025-04-19 16:37:42 -04:00
Michael Paquier
88e947136b Fix typos and grammar in the code
The large majority of these have been introduced by recent commits done
in the v18 development cycle.

Author: Alexander Lakhin <exclusion@gmail.com>
Discussion: https://postgr.es/m/9a7763ab-5252-429d-a943-b28941e0e28b@gmail.com
2025-04-19 19:17:42 +09:00
Michael Paquier
114f7fa81c Rename injection points used in AIO tests
The format of the injection point names used by the AIO code does not
match the existing naming convention used everywhere else in the code,
so let's be consistent.  These points are used in test_aio.

Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Discussion: https://postgr.es/m/Z_yTB80bdu1sYDqJ@paquier.xyz
2025-04-19 18:53:35 +09:00
Fujii Masao
3aad76a0a9 Make pg_upgrade log message with control file path translatable.
Commit 173c97812f replaced the hardcoded "global/pg_control" in a pg_upgrade
log message with a string literal concatenation of the XLOG_CONTROL_FILE
macro.
However, this change made the message untranslatable.

This commit fixes the issue by using %s with XLOG_CONTROL_FILE instead of
that literal concatenation, allowing the message to be translated properly.
It also wraps the file path in double quotes for consistency with similar
log messages.

Author: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/20250407.155546.2129693791769531891.horikyota.ntt@gmail.com
2025-04-18 18:35:40 +09:00
Tatsuo Ishii
05883bd6e5 Doc: fix missing comma at the end of a line.
Backpatch to 17, where the line was added.

Reported by Noboru Saito while he was working on translating the file
into Japanese.

Discussion: https://postgr.es/m/20250417.203047.1321297410457834775.ishii%40postgresql.org
Reported-by: Noboru Saito <noborusai@gmail.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Backpatch-through: 17
2025-04-18 09:38:46 +09:00
David Rowley
1bd08f6ba5 Fixup various older misuses of appendPQExpBuffer
Use appendPQExpBufferStr when there are no parameters and
appendPQExpBufferChar when the string length is 1.
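
For illustration, a hedged example of the substitutions (the buffer and
string values are made up):

    appendPQExpBuffer(&buf, "WHERE ");      /* before: no format args */
    appendPQExpBufferStr(&buf, "WHERE ");   /* after */

    appendPQExpBuffer(&buf, ";");           /* before: single character */
    appendPQExpBufferChar(&buf, ';');       /* after */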

Unlike 3fae25cbb, which fixed this issue for code that was new to v18,
this one fixes up instances which exist in the backbranches.  We've
historically tried to maintain this standard and if we're going to
continue doing that, then we won't be doing that selectively based on
when the code was introduced.  Now seems like a good time to flush out the
existing misuses.  Waiting until v19 just prolongs their existence in
terms of released versions that the misuses exist in.

Author: David Rowley <drowleyml@gmail.com>
Discussion: https://postgr.es/m/CAApHDvoARMvPeXTTC0HnpARBHn-WgVstc8XFCyMGOzvgu_1HvQ@mail.gmail.com
2025-04-18 12:15:08 +12:00
David Rowley
d9e03864b6 Make levels 1-based in pg_log_backend_memory_contexts()
Both pg_get_process_memory_contexts() and pg_backend_memory_contexts
have 1-based levels, whereas pg_log_backend_memory_contexts() was using
0-based levels.  Align these.

This results in slightly saner behavior from MemoryContextStatsDetail()
with regard to the max_level.  Previously it would stop one level before
the maximum requested level rather than at that level.

Reported-by: Atsushi Torikoshi <torikoshia@oss.nttdata.com>
Author: Atsushi Torikoshi <torikoshia@oss.nttdata.com>
Author: David Rowley <drowleyml@gmail.com>
Reviewed-by: Melih Mutlu <m.melihmutlu@gmail.com>
Reviewed-by: Rahila Syed <rahilasyed90@gmail.com>
Discussion: https://postgr.es/m/395ea5d4fe190480efa95bf533485c70@oss.nttdata.com
2025-04-18 09:04:28 +12:00
Tom Lane
fc5e966f73 Suppress "may be used uninitialized" warnings from older compilers.
The "children" list won't be used until "got_children" has been set
true, but older compilers don't get that; about half a dozen
buildfarm animals are warning about this.  Issue added by 11ff192b5.
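
To illustrate, a hedged sketch of the pattern and fix; the variable
names come from this commit's description, the use site is hypothetical:

    bool    got_children = false;
    List   *children = NIL;     /* initialize to silence the warning */

    /* ... */
    if (got_children)
        process_children(children);     /* hypothetical use */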

While here, improve slightly-shaky grammar in comment.

Discussion: https://postgr.es/m/2057835.1744833309@sss.pgh.pa.us
2025-04-17 16:47:04 -04:00
Tom Lane
4aad2cb770 Portability fix: isdigit() must be passed an unsigned char.
Oversight in commit 40b9c2701, per buildfarm member mamba.
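
For illustration, the classic portable pattern (the variable and helper
are made up):

    #include <ctype.h>

    /* plain char may be signed; passing a negative value to isdigit()
     * is undefined behavior, hence the cast */
    if (isdigit((unsigned char) *s))
        process_digit(*s);      /* hypothetical */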
2025-04-17 16:33:21 -04:00
Tom Lane
0400ae4a68 Cache typlens of a SQL function's input arguments.
This gets rid of repetitive get_typlen calls in postquel_sub_params,
which show up as costing a few percent of the runtime in simple test
cases (more with more parameters).

In combination with the preceding patches, this gets us most of the
way back down to the amount of per-call overhead that functions.c
had before commit 0dca5d68d.  There are some more things that could
be done, but this seems like an okay place to stop for v18.
2025-04-17 12:56:40 -04:00
Tom Lane
0313c5dc62 Make SQLFunctionCache long-lived again.
At this point, the only data structures we allocate directly in
fcontext are the SQLFunctionCache struct itself, the ParamListInfo
struct, and the execution_state array, all of which are small and
perfectly capable of being re-used across executions of the same
FmgrInfo.  Hence, let's give them the same lifespan as the FmgrInfo.
This step gets rid of the separate SQLFunctionLink struct and makes
fn_extra point to SQLFunctionCache again.  We also get rid of the
separate fcontext memory context and allocate these items directly
in fn_mcxt.

For notational simplicity, SQLFunctionCache still has an fcontext
field, but it's just a copy of fn_mcxt.

The motivation for this is to allow these structures to live as
long as the FmgrInfo and be re-used across calls, restoring the
original design without its propensity for memory leaks.  This
gets rid of some per-call overhead that we added in 0dca5d68d.

We also make an effort to re-use the JunkFilter and result slot.
Those might need to change if the function definition changes,
so we compromise by rebuilding them if the cached plan changes.

This also moves the tuplestore into fn_mcxt so that it can be
re-used across calls, again undoing a change made in 0dca5d68d.
2025-04-17 12:56:31 -04:00
Tom Lane
f45a5444ee Split some storage out to separate subcontexts of fcontext.
Put the JunkFilter and its result slot (and thence also
some subsidiary data such as the result tupledesc) into a
separate subcontext "jfcontext".  This doesn't accomplish
a lot at this point, because we make a new JunkFilter each
time through the SQL function.  However, the plan is to make
the fcontext long-lived, and that raises the possibility
that we'll need a new JunkFilter because the plan for the
result-generating query changes.  A separate context makes
it easy to free the obsoleted data when that happens.

Also, instead of always running the sub-executor in fcontext,
make a separate context for it if we're doing lazy eval of
a SRF, and otherwise just run it inside CurrentMemoryContext.
2025-04-17 12:56:21 -04:00
Tom Lane
595d1efeda Make functions.c mostly run in a short-lived memory context.
Previously, much of this code ran with CurrentMemoryContext set
to be the function's fcontext, so that we tended to leak a lot of
stuff there.  Commit 0dca5d68d dealt with that by releasing the
fcontext at the completion of each SQL function call, but we'd
like to go back to the previous approach of allowing the fcontext
to be query-lifespan.  To control the leakage problem, rearrange
the code so that we mostly run in the memory context that fmgr_sql
is called in (which we expect to be short-lived).  Notably, this
means that parsing/planning is all done in the short-lived context
and doesn't leak cruft into fcontext.

This patch also fixes the allocation of execution_state records
so that we don't leak them across executions.  I set that up
with a re-usable array that contains at least as many
execution_state structs as we need for the current querytree.
The chain structure is still there, but it's not really doing
much for us, and maybe somebody will be motivated to get rid
of it.  I'm not though.

This incidentally also moves the call of BlessTupleDesc to be
with the code that creates the JunkFilter.  That doesn't make
much difference now, but a later patch will reduce the number
of times the JunkFilter gets made, and we needn't bless the
results any more often than that.

We still leak a fair amount in fcontext, particularly when
executing utility statements, but that's material for a
separate patch step; the point here is only to get rid of
unintentional allocations in fcontext.
2025-04-17 12:56:08 -04:00
Tom Lane
09b07c2953 Minor performance improvement for SQL-language functions.
Late in the development of commit 0dca5d68d, I added a step to copy
the result tlist we extract from the cached final query, because
I was afraid that that might not last as long as the JunkFilter that
we're passing it off to.  However, that turns out to cost a noticeable
number of cycles, and it's really quite unnecessary because the
JunkFilter will not examine that tlist after it's been created.
(ExecFindJunkAttribute would use it, but we don't use that function
on this JunkFilter.)  Hence, remove the copy step.  For safety,
reset the might-become-dangling jf_targetList pointer to NIL.

In passing, remove DR_sqlfunction.cxt, which we don't use anymore;
it's confusing because it's not entirely clear which context it
ought to point at.
2025-04-17 12:55:58 -04:00
Noah Misch
f4ece891fc Assert lack of hazardous buffer locks before possible catalog read.
Commit 0bada39c83 fixed a bug of this kind,
which existed in all branches for six days before detection.  While the
probability of reaching the trouble was low, the disruption was extreme.  No
new backends could start, and service restoration needed an immediate
shutdown.  Hence, add this to catch the next bug like it.

The new check in RelationIdGetRelation() suffices to make autovacuum detect
the bug in commit 243e9b40f1 that led to commit
0bada39.  This commit also adds checks in a number of similar places, replacing each
Assert(IsTransactionState()) that pertained to a conditional catalog read.

No back-patch for now, but a back-patch of commit 243e9b4 should back-patch
this, too.  A back-patch could omit the src/test/regress changes, since back
branches won't gain new index columns.

Reported-by: Alexander Lakhin <exclusion@gmail.com>
Discussion: https://postgr.es/m/20250410191830.0e.nmisch@google.com
Discussion: https://postgr.es/m/10ec0bc3-5933-1189-6bb8-5dec4114558e@gmail.com
2025-04-17 05:00:30 -07:00
Daniel Gustafsson
b669293e34 pg_dump: Set private_data pointer to NULL in callback
The end callback for ZStandard compression frees the private_data
but didn't set the pointer to NULL after freeing.  This is not a
bug as the code stands right now, since nothing dereferences the
pointer upon returning from the callback, but it is good practice
to do so.
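
For illustration, the generic C pattern (the struct name is
hypothetical, not the pg_dump code itself):

    free(ctx->private_data);
    ctx->private_data = NULL;   /* avoid leaving a dangling pointer */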

Author: Alexander Kuznetsov <kuznetsovam@altlinux.org>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/efaee52b-9550-44ca-8633-ea86076b3283@altlinux.org
2025-04-17 12:58:00 +02:00
Fujii Masao
e4b0f86e1f pg_dump: Fix incorrect archive format shown in error message.
In pg_dump and pg_restore, _allocAH() calls _discoverArchiveFormat() to
determine the archive format when the input format is unknown.
If the input or discovered format is unrecognized, it reports an error
including the archive format number.

If the discovered format is unrecognized, its number should be shown in
the error message.  But previously the error message mistakenly showed
the originally requested format number (i.e., the unknown one) instead
of the discovered one, due to referencing the wrong variable in the
error message.

This commit corrects the issue by using the appropriate variable in
the error message.

This fix has no practical impact since _discoverArchiveFormat() never
returns an unrecognized format and that error message is actually
never output. Therefore, while the issue exists in back branches,
it's not worth the trouble and buildfarm cycles to back-patch.
So this fix is applied only to the master branch.
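
A minimal sketch of the corrected reporting, with control flow
simplified and variable names assumed:

    ArchiveFormat fmt = _discoverArchiveFormat(AH);

    /* report the discovered format, not the originally requested one */
    if (fmt != archCustom && fmt != archDirectory && fmt != archTar)
        pg_fatal("unrecognized file format \"%d\"", fmt);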

Author: Mahendra Singh Thalor <mahi6run@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/CAKYtNAqu+N-Ab2Fq6wzNSOm_-0N-BMneanYNV1+6kFDXjva1Eg@mail.gmail.com
2025-04-17 09:52:47 +09:00
Jeff Davis
2e5353be25 Another unintentional behavior change in commit e9931bfb75.
Reported-by: Noah Misch <noah@leadboat.com>
Reviewed-by: Noah Misch <noah@leadboat.com>
Discussion: https://postgr.es/m/20250412123430.8c.nmisch@google.com
2025-04-16 16:49:42 -07:00
Jeff Davis
b107744ce7 Improve comment in regc_pg_locale.c.
Reported-by: Noah Misch <noah@leadboat.com>
Reviewed-by: Noah Misch <noah@leadboat.com>
Discussion: https://postgr.es/m/20250412123430.8c.nmisch@google.com
2025-04-16 16:49:35 -07:00
David Rowley
3fae25cbb3 Fixup various new-to-v18 usages of appendPQExpBuffer
Use appendPQExpBufferStr when there are no parameters and
appendPQExpBufferChar when the string length is 1.
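
The substitutions follow this pattern; buf here is a hypothetical
PQExpBufferData:

    appendPQExpBuffer(&buf, "WHERE ");      /* before: format call, no args */
    appendPQExpBufferStr(&buf, "WHERE ");   /* after: plain string append */

    appendPQExpBuffer(&buf, "(");           /* before: one-character string */
    appendPQExpBufferChar(&buf, '(');       /* after: single-char append */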

Author: David Rowley <drowleyml@gmail.com>
Discussion: https://postgr.es/m/CAApHDvoARMvPeXTTC0HnpARBHn-WgVstc8XFCyMGOzvgu_1HvQ@mail.gmail.com
2025-04-17 11:37:55 +12:00
David Rowley
f3281f9f93 Improve comments for estimate_multivariate_ndistinct()
estimate_multivariate_ndistinct() is coded to assume the caller handles
passing it a list of GroupVarInfos with unique 'var' fields over the
entire list.  6bb6a62f3 added code which didn't ensure this and that
could result in estimate_multivariate_ndistinct() erroring out with:

ERROR:  corrupt MVNDistinct entry

This occurred because estimate_multivariate_ndistinct() first searches
for a set of stats that match at least two of the given GroupVarInfos
and then later assumes that the MVNDistinctItem.items array of the
best matching stats will have an entry for those two columns.  If the
GroupVarInfos List contained a duplicate entry then the same column could
be matched twice, which could trick the code into thinking we have
>= 2 columns matched in cases where only a single distinct column has been
matched.  This could result in a failure to find the correct
MVNDistinctItem in the stats, as the array containing those never
contains an item for single columns.

Here we make it more clear that the function needs a distinct set of
GroupVarInfos and also tidy up a few other comments to make things a bit
easier to follow.
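
A minimal sketch of how a caller keeps the list distinct, following the
existing add_unique_group_var() idiom; the surrounding variable names
are assumed:

    foreach(lc, varinfos)
    {
        GroupVarInfo *varinfo = (GroupVarInfo *) lfirst(lc);

        /* don't add a duplicate: the same Var may only appear once */
        if (equal(var, varinfo->var) && varinfo->rel == rel)
            return varinfos;
    }
    varinfos = lappend(varinfos, new_varinfo);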

Author: David Rowley <drowleyml@gmail.com>
Discussion: https://postgr.es/m/CAApHDvocZCUhM9W9mJ39d6oQz7ePKoqFnao_347mvC-A7QatcQ@mail.gmail.com
2025-04-17 11:03:24 +12:00
Tom Lane
ab3d8afc7f Sync declarations and definitions of two new tablecmds.c functions.
Buildfarm member drongo complained because the definitions of these
functions used "const Oid foo" where the forward declarations just
had "Oid foo".  (I'm a bit surprised that drongo seems to be the only
complainant.)  I chose to fix this by removing the "consts" because
(a) I'm generally not a fan of using const that way, and (b) it was
a minority usage even within these two functions, let alone compared
to the rest of our code base.

Oversight in commit eec0040c4, so no need for back-patch.
2025-04-16 17:59:08 -04:00
Álvaro Herrera
11ff192b5b
Elide not-null constraint checks on child tables during PK creation
We were unnecessarily acquiring AccessExclusiveLock on all child tables
when "ALTER TABLE ONLY sometab ADD PRIMARY KEY" was run on their parent
table, an oversight in commit 14e87ffa5c.  This caused deadlocks
during pg_restore of partitioned tables.

The reason to acquire the AEL was that we need to verify that child
tables have the involved columns already marked as not-null; but if the
parent table has an inheritable not-null constraint, then all children
must necessarily be in the correct state already, so we can skip the
check, which avoids acquiring the lock.  Reorder the code so that it
works that way.  This doesn't change things in the case where the
constraint doesn't exist, but that case is of lesser importance because
it doesn't occur during parallel pg_restore.

While at it, reword some errmsg() and add errhint() to similar cases in
related but not adjacent code.

Diagnosed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Tender Wang <tndrwang@gmail.com>
Discussion: https://postgr.es/m/67469c1c-38bc-7d94-918a-67033f5dd731@gmx.net
Discussion: https://postgr.es/m/2045026.1743801143@sss.pgh.pa.us
Discussion: https://postgr.es/m/1280408.1744650810@sss.pgh.pa.us
2025-04-16 21:51:23 +02:00
Daniel Gustafsson
1fd3566ebc Update pg_config.h.in with libnuma changes
Add macros from autoheader which were accidentally omitted in
commit 65c298f61f. There is no function change by this as no
code is currently using the missing macro.

Author: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Jacob Champion <jacob.champion@enterprisedb.com>
Discussion: https://postgr.es/m/CF6D7D7F-E1C4-45BE-9019-0F4B4BC7C135@yesql.se
2025-04-16 20:16:57 +02:00
Tom Lane
1fc3403626 Fix pg_dump --clean with partitioned indexes.
We'd try to drop the partitions of a partitioned index separately,
which is disallowed by the backend, leading to an error during
restore.  While the error is harmless, it causes problems if you
try to use --single-transaction mode.

Fortunately, there seems no need to do a DROP at all, since the
partition will go away silently when we drop either the parent index
or the partition's table.  So just make the DROP conditional on not
being a partition.
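
A minimal sketch of the conditional DROP; using IndxInfo's parentidx
field as the "is a partition" test is an assumption about the exact fix:

    /*
     * An index partition cannot be dropped directly; it goes away when
     * the parent index or its table is dropped, so emit DROP commands
     * only for indexes that are not partitions.
     */
    if (indxinfo->parentidx == InvalidOid)
        appendPQExpBuffer(delq, "DROP INDEX %s;\n", qualifiedIndexName);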

Reported-by: jian he <jian.universality@gmail.com>
Author: jian he <jian.universality@gmail.com>
Reviewed-by: Pavel Stehule <pavel.stehule@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CACJufxF0QSdkjFKF4di-JGWN6CSdQYEAhGPmQJJCdkSZtd=oLg@mail.gmail.com
Backpatch-through: 13
2025-04-16 13:31:59 -04:00
Andrew Dunstan
40b9c27014 pg_restore cleanups
. remove unnecessary oid_string list stuff
. use pg_get_line_buf() instead of open-coding it
. cleaner parsing of map.dat lines

Reverts 2b69afbe50 "add new list type simple_oid_string_list to fe-utils/simple_list"

Author: Álvaro Herrera <alvherre@kurilemu.de>
Author: Andrew Dunstan <andrew@dunslane.net>

Discussion: https://postgr.es/m/202504141220.343fmoxfsbj4@alvherre.pgsql
2025-04-16 12:04:34 -04:00
Richard Guo
3b35f9a4c5 Fix an incorrect check in get_memoize_path
Memoize typically marks cache entries as complete after fully scanning
the inner side of a join.  However, in the case of unique joins, we
skip to the next outer tuple as soon as the first matching inner tuple
is found, leaving no opportunity to scan the inner side to completion.
To work around that, we mark cache entries as complete after fetching
the first matching inner tuple in unique joins.

This approach is only safe when all of the join's restriction clauses
are parameterized; otherwise, there is no guarantee that reading just
one tuple from the inner side is sufficient.

Currently, we check for this by verifying that the number of clauses
in ppi_clauses is no less than the number of the join's restriction
clauses.  However, this check isn't entirely reliable, as ppi_clauses
includes join clauses available from all outer rels, not just the
current outer rel.  This means the check could pass even if a
restriction clause isn't parameterized, as long as another join
clause, which doesn't belong to the current join, is included in
ppi_clauses.

To fix this, we explicitly check whether each restriction clause of
the current join is present in ppi_clauses.

While we're here, remove the XXX comment from the modified code, as
it's not justified; in certain cases, it's not possible to move a join
clause to the inner side.

This is arguably a bugfix, but no backpatch given the lack of field
reports.
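
A minimal sketch of the explicit per-clause check, with the surrounding
variable names assumed:

    /*
     * Memoize on a unique join is only safe if every restriction clause
     * of the join is parameterized, so require each one to appear in
     * ppi_clauses instead of merely comparing list lengths.
     */
    foreach(lc, extra->restrictlist)
    {
        RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);

        if (!list_member_ptr(inner_path->param_info->ppi_clauses, rinfo))
            return NULL;        /* give up on creating a Memoize path */
    }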

Author: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: wenhui qiu <qiuwenhuifx@gmail.com>
Reviewed-by: Andrei Lepikhov <lepihov@gmail.com>
Discussion: https://postgr.es/m/CAMbWs4-8JPouj=wBDj4DhK-WO4+Xdx=A2jbjvvyyTBQneJ1=BQ@mail.gmail.com
2025-04-16 10:55:44 +09:00
Daniel Gustafsson
5ee476294c doc: Fix typos in documentation
This fixes a set of typos introduced during the v18 development
cycle.

Author: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Laurenz Albe <laurenz.albe@cybertec.at>
Discussion: https://postgr.es/m/7038B4C5-2742-42B1-A8F0-0FFEAECF02A7@yesql.se
2025-04-15 21:32:18 +02:00
Tom Lane
7c87284940 Fix failure for generated column with a not-null domain constraint.
If a GENERATED column is declared to have a domain data type where
the domain's constraints disallow null values, INSERT commands failed
because we built a targetlist that included coercing a null constant
to the domain's type.  The failure occurred even when the generated
value would have been perfectly OK.  This is adjacent to the issues
fixed in 0da39aa76, but we didn't notice for lack of testing a domain
with such a constraint.

We aren't going to use the result of the targetlist entry for the
generated column --- ExecComputeStoredGenerated will overwrite it.
So it's not really necessary that it have the exact datatype of
the generated column.  This patch fixes the problem by changing
the targetlist entry to be a null Const of the domain's base type,
which should be sufficiently legal.  (We do have to tweak
ExecCheckPlanOutput to accept the situation, though.)
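
A minimal sketch of the targetlist change; the attribute and expression
variable names are assumed:

    /*
     * Build the placeholder as a null Const of the domain's base type,
     * so the domain's NOT NULL check isn't applied to a value that
     * ExecComputeStoredGenerated will overwrite anyway.
     */
    Oid     baseTypeId = getBaseType(attr->atttypid);

    new_expr = (Node *) makeNullConst(baseTypeId, -1,
                                      get_typcollation(baseTypeId));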

This has been broken since we implemented generated columns.
However, this patch only applies easily as far back as v14, partly
because I (tgl) only carried 0da39aa76 back that far, but mostly
because v14 significantly refactored the handling of INSERT/UPDATE
targetlists.  Given the lack of field complaints and the short
remaining support lifetime of v13, I judge the cost-benefit ratio
not good for devising a version that would work in v13.

Reported-by: jian he <jian.universality@gmail.com>
Author: jian he <jian.universality@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CACJufxG59tip2+9h=rEv-ykOFjt0cbsPVchhi0RTij8bABBA0Q@mail.gmail.com
Backpatch-through: 14
2025-04-15 12:08:34 -04:00
Fujii Masao
f840f8ee30 doc: Fix missing whitespace in pg_restore documentation.
Previously, a space was missing between "<option>--exclude-schema</option>"
and "for" in the pg_restore documentation. This commit fixes the typo by
adding the missing whitespace.

Back-patch to v17 where the typo was added.

Author: Lele Gaifax <lele@metapensiero.it>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/87lds3ysm0.fsf@metapensiero.it
Backpatch-through: 17
2025-04-15 23:15:06 +09:00
Daniel Gustafsson
7ae13170ba pg_combinebackup: Fix incorrect code documentation
The code comment for parse_oid accidentally used the wrong parameter
when referring to the location of the last backup. Also, while there,
improve sentence wording by removing a superfluous word.

Backpatch to v17 where pg_combinebackup was added.

Author: Amul Sul <sulamul@gmail.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Discussion: https://postgr.es/m/CAAJ_b95ecWgzcS4K3Dx0E_Yp-SLwK5JBasFgioKMSjhQLw9xvg@mail.gmail.com
Backpatch-through: 17
2025-04-15 15:27:08 +02:00
Peter Eisentraut
c55df7c6ea Fix incorrect format placeholders
BlockNumber is unsigned int.  Fix for commit 14ffaece0f.
2025-04-14 08:56:33 +02:00
Peter Eisentraut
7cd171a5d2 Add more source files to pg_verifybackup/nls.mk
also related to commit 8dfd312902
2025-04-14 08:32:46 +02:00
David Rowley
b51f86e49a Doc: use "an SQL" consistently rather than "a SQL"
Per the precedent set by 04539e73f, adjust article prefixes for "SQL" to
use "an" consistently rather than "a", i.e., "an es-que-ell" rather than
"a sequel".

Both of these are new to v18. Also see b1b13d2b5, d866f0374 and
7bdd489d3.
2025-04-14 11:55:18 +12:00
Daniel Gustafsson
2970c75dd9 Mark sslkeylogfile as Debug option
Mark the sslkeylogfile option as "D" (debug), as this truly is a debug
option, and it will allow postgres_fdw et al. to filter it out as
well.  Also update the display length to match that of an ssl key,
as they are both filename-based inputs.

Author: Daniel Gustafsson <daniel@yesql.se>
Reported-by: Jacob Champion <jacob.champion@enterprisedb.com>
Discussion: https://postgr.es/m/CAOYmi+=5GyBKpu7bU4D_xkAnYJTj=rMzGaUvHO99-DpNG_YKcw@mail.gmail.com
2025-04-13 21:53:03 +02:00
Andrew Dunstan
64e193f5dd Make AIO error test more portable
Alpine Linux's C library (musl) spells one error message differently.

Reported-by: Wolfgang Walther
2025-04-13 14:39:45 -04:00
Andrew Dunstan
f09088a01d Free memory properly in pg_restore.c
Thinko in commit 39729ec01d. Mea maxima culpa.

Per Mahendra Singh Thalor <mahi6run@gmail.com>
2025-04-12 14:54:48 -04:00
Tom Lane
78637a8be2 Doc: do a little copy-editing on Index Storage Parameters list.
Add a paragraph break per suggestion from David G. Johnston.
Use a consistent voice for all the different parameter
descriptions, and fix a couple of grammatical issues.

Reported-by: Igor Korot <ikorot01@gmail.com>
Co-authored-by: "David G. Johnston" <david.g.johnston@gmail.com>
Co-authored-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CA+FnnTz=EW1VQRpWB9J+G-NSchrPFcw4nR7d0JqzEK9jWKB35A@mail.gmail.com
2025-04-12 13:42:31 -04:00
Tom Lane
e708ffe79d Fix GIN's shimTriConsistentFn to not corrupt its input.
Commit 0f21db36d made an assumption that GIN triConsistentFns
would not modify their input entryRes[] arrays.  But in fact,
the "shim" triConsistentFn that we use for opclasses that don't
supply their own did exactly that, potentially leading to wrong
answers from a GIN index search.  Through bad luck, none of the
test cases that we have for such opclasses exposed the bug.

One response to this could be that the assumption of consistency check
functions not modifying entryRes[] arrays is a bad one, but it still
seems reasonable to me.  Notably, shimTriConsistentFn is itself
assuming that with respect to the underlying boolean consistentFn,
so it's sure being self-centered in supposing that it gets to do so.

Fortunately, it's quite simple to fix shimTriConsistentFn to restore
the entry-time state of entryRes[], so let's do that instead.
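
A minimal sketch of the save/restore added to shimTriConsistentFn; the
array bound and loop variables are assumed:

    GinTernaryValue saved[MAX_ENTRIES];     /* hypothetical bound */
    int         i;

    /* remember the caller's ternary inputs before forcing them */
    for (i = 0; i < key->nentries; i++)
        saved[i] = key->entryRes[i];

    /* ... evaluate the boolean consistentFn over forced TRUE/FALSE ... */

    /* restore the caller's entryRes[] before returning */
    for (i = 0; i < key->nentries; i++)
        key->entryRes[i] = saved[i];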

This issue doesn't affect any core GIN opclasses, since they all
supply their own triConsistentFns.  It does affect contrib modules
btree_gin, hstore, and intarray.

Along the way, I (tgl) noticed that shimTriConsistentFn failed to
pick up on a "recheck" flag returned by its first call to the boolean
consistentFn.  This may be only a latent problem, since it would be
unlikely for a consistentFn to set recheck for the all-false case
and not any other cases.  (Indeed, none of our contrib modules do
that.)  Nonetheless, it's formally wrong.

Reported-by: Vinod Sridharan <vsridh90@gmail.com>
Author: Vinod Sridharan <vsridh90@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAFMdLD7XzsXfi1+DpTqTgrD8XU0i2C99KuF=5VHLWjx4C1pkcg@mail.gmail.com
Backpatch-through: 13
2025-04-12 12:28:02 -04:00
Peter Geoghegan
a6cab6a78e Harmonize function parameter names for Postgres 18.
Make sure that function declarations use names that exactly match the
corresponding names from function definitions in a few places.  These
inconsistencies were all introduced during Postgres 18 development.

This commit was written with help from clang-tidy, by mechanically
applying the same rules as similar clean-up commits (the earliest such
commit was commit 035ce1fe).
2025-04-12 12:07:36 -04:00
Michael Paquier
fdb69dd582 Fix instability with WAL fsync test in stats.sql
A backend using wal_sync_method set to "open_sync" or "open_datasync"
would fail the test checking the WAL sync data in pg_stat_io.  These
modes guarantee that a sync is done when WAL is written to disk, and the
data checked by the test is not incremented in this case, since
issue_xlog_fsync() does nothing.

Oversight in commit a051e71e28.

Author: Sami Imseih <samimseih@gmail.com>
Discussion: https://postgr.es/m/CAA5RZ0uxwg3xAi4nvdBMJ-zJQEeyg+RotuU+ebM2F6CKmnvaYA@mail.gmail.com
2025-04-12 13:09:48 +09:00
Daniel Gustafsson
847bbb21f8 Fix recently introduced typos
This fixes typos in docs and comments introduced during the v18
development cycle, to keep them from ending up in backbranches.

Author: Jacob Brazeal <jacob.brazeal@gmail.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://postgr.es/m/CA+COZaCgGua25f2hSrjrDLJcJJAHkwoKgTTqUy-wyL1=64JNjw@mail.gmail.com
2025-04-11 22:17:12 +02:00
Nathan Bossart
5822bf21d5 Add missing space in pg_restore documentation.
Oversight in commit 1495eff7bd.
2025-04-11 10:05:32 -05:00
Peter Eisentraut
914ea1c93c Add missing source file to pg_verifybackup/nls.mk
added by commit 8dfd312902
2025-04-11 10:53:36 +02:00
Peter Eisentraut
b63cbacb86 Add missing source file to pg_dump/nls.mk
added by commit c1da728106
2025-04-11 10:28:59 +02:00
Peter Eisentraut
9e0e1cfc3e Add missing source file to pg_upgrade/nls.mk
added by commit 40e2e5e92b
2025-04-11 10:26:51 +02:00
Peter Eisentraut
7d430a5728 Add missing PGDLLIMPORT markings
Discussion: https://www.postgresql.org/message-id/flat/25095db5-b595-4b85-9100-d358907c25b5%40eisentraut.org
2025-04-11 08:59:52 +02:00
Michael Paquier
2e57790836 Fix race with synchronous_standby_names at startup
synchronous_standby_names cannot be reloaded safely by backends, and the
checkpointer is in charge of updating a state in shared memory if the
GUC is enabled in WalSndCtl, to let the backends know if they should
wait or not for a given LSN.  This provides a strict control on the
timing of the waiting queues if the GUC is enabled or disabled, then
reloaded.  The checkpointer is also in charge of waking up the backends
that could be waiting for a LSN when the GUC is disabled.

This logic had a race condition at startup, where it would be possible
for backends to not wait for a LSN even if synchronous_standby_names is
enabled.  This would cause visibility issues with transactions that we
should have been waiting for but were not.  The problem lasts until the
checkpointer does its initial update of the shared memory state when it
loads synchronous_standby_names.

In order to take care of this problem, the shared memory state in
WalSndCtl is extended to detect if it has been initialized by the
checkpointer, and not only check if synchronous_standby_names is
defined.  In WalSndCtlData, sync_standbys_defined is renamed to
sync_standbys_status, a bits8 able to know about two states:
- If the shared memory state has been initialized.  This flag is set by
the checkpointer at startup once, and never removed.
- If synchronous_standby_names is known as defined in the shared memory
state.  This is the same as the previous sync_standbys_defined in
WalSndCtl.

This method gives a way for backends to decide what they should do until
the shared memory area is initialized, and they now ultimately fall back
to a check on the GUC value in this case, which is the best thing that
can be done.
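
A minimal sketch of the two-flag logic described above; the flag names
follow the commit message but their exact spellings are assumed:

    #define SYNC_STANDBY_INIT     0x01  /* shmem state set up by checkpointer */
    #define SYNC_STANDBY_DEFINED  0x02  /* synchronous_standby_names is set */

    if (!(WalSndCtl->sync_standbys_status & SYNC_STANDBY_INIT))
    {
        /* checkpointer hasn't initialized shmem yet: check the GUC itself */
        if (SyncRepStandbyNames == NULL || SyncRepStandbyNames[0] == '\0')
            return;             /* no sync standbys configured; don't wait */
    }
    else if (!(WalSndCtl->sync_standbys_status & SYNC_STANDBY_DEFINED))
        return;                 /* feature known to be disabled */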

Fortunately, SyncRepUpdateSyncStandbysDefined() is called immediately by
the checkpointer when this process starts, so the window is very narrow.
It is possible to enlarge the problematic window by making the
checkpointer wait at the beginning of SyncRepUpdateSyncStandbysDefined()
with a hardcoded sleep for example, and doing so has shown that a 2PC
visibility test indeed fails.  On machines slow enough, this bug
would cause spurious failures.

In 17~, we have looked at the possibility of adding an injection point
to have a reproducible test, but as the problematic window happens at
early startup, we would need to invent a way to make an injection point
optionally persistent across restarts when attached, something that
would be fine for this case as it would involve the checkpointer.  This
issue is quite old, and can be reproduced on all the stable branches.

Author: Melnikov Maksim <m.melnikov@postgrespro.ru>
Co-authored-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/163fcbec-900b-4b07-beaa-d2ead8634bec@postgrespro.ru
Backpatch-through: 13
2025-04-11 10:00:21 +09:00
David Rowley
530050d8d2 Add code comment explaining ins_since_vacuum and aborted inserts
Sami complained that there's a discrepancy between n_mod_since_analyze
and n_ins_since_vacuum, as the former only accounts for committed changes
and the latter tracks committed and aborted inserts.  Nobody seemed
overly concerned that this would cause any real issues.  The
repercussions, from what I can tell, are limited to causing an
autovacuum to trigger for inserts sooner than it otherwise might. For
typical ratios of commits to aborts, it's unlikely to ever be noticed.

Fixing things to make it so n_ins_since_vacuum only displays committed
inserts would require an additional field in PgStat_TableCounts, which
does not quite seem worthwhile at this stage.  This commit just adds a
comment with some details to mention that we know about it, which will
hopefully prevent repeat discussions.

Reported-by: Sami Imseih <samimseih@gmail.com>
Author: David Rowley <drowleyml@gmail.com>
Reviewed-by: Sami Imseih <samimseih@gmail.com>
Discussion: https://postgr.es/m/CAApHDvpgV3a-R2EGmPOh0L-x3pHbZpM3y4dySWfy+UqUazwDQA@mail.gmail.com
2025-04-11 11:36:21 +12:00
Andrew Dunstan
39729ec01d Fix fat fingering in 22cb6d2895
Per Rainier Vilela
2025-04-10 19:08:04 -04:00
David Rowley
928394b664 Improve various new-to-v18 appendStringInfo calls
Similar to 8461424fd, here we adjust a few new locations which were not
using the most suitable appendStringInfo* function for the intended
purpose.

Author: David Rowley <drowleyml@gmail.com>
Discussion: https://postgr.es/m/CAApHDvqJnNjueb=Eoj8K+8n0g7nj_AcPWSiCj5RNV4fDejAfqA@mail.gmail.com
2025-04-11 10:07:22 +12:00
Daniel Gustafsson
55ef7abf88 Rename global variable backing DSA area
The global variable backing the DSA area for Memory Context stats
reporting had a too generic name, rename to be more descriptive.
Independently reported by Peter and Laurenz.

Author: Daniel Gustafsson <daniel@yesql.se>
Reported-by: Peter Eisentraut <peter@eisentraut.org>
Reported-by: Laurenz Albe <laurenz.albe@cybertec.at>
Discussion: https://postgr.es/m/d51172bd4e7f4b07a18a0288ca1b1c28a71a5f6a.camel@cybertec.at
Discussion: https://postgr.es/m/25095db5-b595-4b85-9100-d358907c25b5@eisentraut.org
2025-04-10 22:40:27 +02:00
Andrew Dunstan
22cb6d2895 Fix memory leak in pg_restore.c
Oversight in 1495eff7bd

Author: Ranier Vilela <ranier.vf@gmail.com>
2025-04-10 14:57:02 -04:00
Tom Lane
d89335eea6 Doc: remove long-obsolete advice about generated constraint names.
It's been twenty years since we generated constraint names that
look like "$N".  So this advice about double-quoting such names
is well past its sell-by date, and now it merely seems confusing.

Reported-by: Yaroslav Saburov <y.saburov@gmail.com>
Author: "David G. Johnston" <david.g.johnston@gmail.com>
Discussion: https://postgr.es/m/174393459040.678.17810152410419444783@wrigleys.postgresql.org
Backpatch-through: 13
2025-04-10 14:49:10 -04:00
Tom Lane
f27eb0325b Remove useless check for negative result of ip_addrsize().
By inspection, ip_addrsize() can't return a negative result.
(If it could, we'd have way bigger problems elsewhere.)
So delete useless check in network_send().  Most C compilers
are probably perfectly capable of removing this code by
themselves, but it's confusing/misleading.

Bug: #18889
Reported-by: Daniel Elishakov <dan-eli@mail.ru>
Discussion: https://postgr.es/m/18889-73d4f19e953a629e@postgresql.org
2025-04-10 14:18:07 -04:00
Andrew Dunstan
4170298b6e Further cleanup for directory creation on pg_dump/pg_dumpall
Instead of two separate (and different) implementations, refactor to use
a single common routine.

Along the way, remove use of a hardcoded file permissions constant in
favor of the common project setting for directory creation.

Author: Mahendra Singh Thalor <mahi6run@gmail.com>

Discussion: https://postgr.es/m/CAKYtNApihL8X1h7XO-zOjznc8Ca66Aevgvhc9zOTh6DBh2iaeA@mail.gmail.com
2025-04-10 12:11:36 -04:00
Amit Kapila
4909b38af0 Fix data loss in logical replication.
Data loss can happen when the DDLs like ALTER PUBLICATION ... ADD TABLE ...
or ALTER TYPE ...  that don't take a strong lock on table happens
concurrently to DMLs on the tables involved in the DDL. This happens
because logical decoding doesn't distribute invalidations to concurrent
transactions and those transactions use stale cache data to decode the
changes. The problem becomes bigger because we keep using the stale cache
even after those in-progress transactions are finished and skip the
changes required to be sent to the client.

This commit fixes the issue by distributing invalidation messages from
catalog-modifying transactions to all concurrent in-progress transactions.
This allows the necessary rebuild of the catalog cache when decoding new
changes after concurrent DDL.

We observed performance regression primarily during frequent execution of
*publication DDL* statements that modify the published tables. The
regression is minor or nearly nonexistent for DDLs that do not affect the
published tables or occur infrequently, making this a worthwhile cost to
resolve a longstanding data loss issue.

An alternative approach considered was to take a strong lock on each
affected table during publication modification. However, this would only
address issues related to publication DDLs (but not the ALTER TYPE ...)
and require locking every relation in the database for publications
created as FOR ALL TABLES, which is impractical.

The bug exists in all supported branches, but we are backpatching till 14.
The fix for 13 requires somewhat bigger changes than this fix, so the fix
for that branch is still under discussion.

Reported-by: hubert depesz lubaczewski <depesz@depesz.com>
Reported-by: Tomas Vondra <tomas.vondra@enterprisedb.com>
Author: Shlok Kyal <shlok.kyal.oss@gmail.com>
Author: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Zhijie Hou <houzj.fnst@fujitsu.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Tested-by: Benoit Lobréau <benoit.lobreau@dalibo.com>
Backpatch-through: 14
Discussion: https://postgr.es/m/de52b282-1166-1180-45a2-8d8917ca74c6@enterprisedb.com
Discussion: https://postgr.es/m/CAD21AoAenVqiMjpN-PvGHL1N9DWnHSq673bfgr6phmBUzx=kLQ@mail.gmail.com
2025-04-10 13:14:40 +05:30
Peter Eisentraut
9ad19295e9 Fix incorrect format placeholders
for commits 8f427187db, 6ee3b91bad
2025-04-10 08:04:35 +02:00
David Rowley
d7c04db27a Update wording in optimizer/README for EquivalenceClasses
d69d45a5a changed how em_is_child members are stored in
EquivalenceClasses.  Children are no longer stored in the ec_members
list.  optimizer/README mentioned that most operations "should ignore
child members", but that felt a little untrue now since child members
are now stored in a separate place, they simply won't be found by the
normal means of looking (a foreach loop over ec_members), and if you don't
find them, there's technically no need to "ignore" them.

Here we tweak the wording slightly to reflect the new storage location
for child members.

Reported-by: Amit Langote <amitlangote09@gmail.com>
Author: Amit Langote <amitlangote09@gmail.com>
Author: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/CA+HiwqE8v=EuAP_3F_A2xn8zWx+nG_etW_Fe_DvKO-Fkx=+DdQ@mail.gmail.com
2025-04-10 17:33:58 +12:00
Amit Kapila
d438515c29 Cosmetic fixes for pg_createsubscriber's -all option.
Author: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/CAHut+PsmSCQ-ENSDQ0YOUcsgzT=GG-E9jyXBvxd51A_dMXH5XA@mail.gmail.com
2025-04-10 10:30:05 +05:30
Tomas Vondra
d15acc915d ci: Check for missing dependencies in meson builds
Extends the Linux and Windows meson builds with a check for missing
dependencies by running

    ninja -t missingdeps

after the build. This highlights unintended dependencies.

Reviewed-by: Andres Freund <andres@anarazel.de>
https://postgr.es/m/CALdSSPi5fj0a7UG7Fmw2cUD1uWuckU_e8dJ+6x-bJEokcSXzqA@mail.gmail.com
2025-04-09 22:01:58 +02:00
Tomas Vondra
3887d0cfeb Cleanup of pg_numa.c
This moves/renames some of the functions defined in pg_numa.c:

* pg_numa_get_pagesize() is renamed to pg_get_shmem_pagesize(), and
  moved to src/backend/storage/ipc/shmem.c. The new name better reflects
  that the page size is not related to NUMA, and it's specifically about
  the page size used for the main shared memory segment.

* move pg_numa_available() to src/backend/storage/ipc/shmem.c, i.e. into
  the backend (which is more appropriate for functions callable from SQL).
  While at it, improve the comment to explain what page size it returns.

* remove unnecessary includes from src/port/pg_numa.c that added
  unnecessary dependencies (src/port should be suitable for frontend code).
  These were either leftovers or became unnecessary thanks to the other changes
  in this commit.

This eliminates unnecessary dependencies on backend symbols, which we
don't want in src/port.

Reported-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
https://postgr.es/m/CALdSSPi5fj0a7UG7Fmw2cUD1uWuckU_e8dJ+6x-bJEokcSXzqA@mail.gmail.com
2025-04-09 21:50:17 +02:00
Nathan Bossart
e2665efd0f pg_upgrade: Mention that we preserve database OIDs in a comment.
Oversight in commit aa01051418.

Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/4055696.1744134682%40sss.pgh.pa.us
2025-04-09 14:27:08 -05:00
Tom Lane
837cc73af2 Fix performance issue in deadlock-parallel isolation test.
With debug_discard_caches = 1, the runtime of this test script
increased by about a factor of 10 after commit 0dca5d68d.  That's
causing some of our buildfarm animals to fail with a timeout.

The reason for the increased time is that now we are re-planning
some intentionally-non-inlineable SQL functions on every execution,
where the previous coding held onto the original plans throughout
the outer query.  The previous behavior was arguably quite buggy,
so I don't think 0dca5d68d deserves blame here.  But we would
like this test script to not take so long.

To fix, instead of forcing a "parallel safe" label via a
non-inlineable SQL function, apply it directly to the advisory-lock
functions by making internal-language aliases for them.  A small
problem is that the advisory-lock functions return void but this
test would really like them to return integer 1.  I cheated here by
declaring the aliases as returning "int".  That's perhaps undue
familiarity with the implementation of PG_RETURN_VOID(), but that
hasn't changed in twenty years and is unlikely to do so in the next
twenty.  That gets us an integer 0 result, and then an inline-able
wrapper to convert that to an integer 1 allows the rest of the script
to remain unchanged.

For me, this reduces the runtime with debug_discard_caches = 1
by about 100x, making the test comfortably faster than before
instead of slower.

Discussion: https://postgr.es/m/136163.1744179562@sss.pgh.pa.us
2025-04-09 12:28:34 -04:00
Noah Misch
5bbc596391 Fix test races between syscache-update-pruned.spec and autovacuum.
This spec fails ~3% of my Valgrind runs, and the spec has failed on Valgrind
buildfarm member skink at a similar rate.  Two problems contributed to that:

- A competing buffer pin triggered VACUUM's lazy_scan_noprune() path, causing
  "tuples missed: 1 dead from 1 pages not removed due to cleanup lock
  contention".  FREEZE fixes that.

- The spec ran lazy VACUUM immediately after VACUUM FULL.  The spec implicitly
  assumed lazy VACUUM prunes the one tuple that VACUUM FULL made dead.  First
  wait for old snapshots, making that assumption reliable.

This also adds two forms of defense in depth:

- Wait for snapshots using shared catalog pruning rules (VISHORIZON_SHARED).
  This avoids the removable cutoff moving backward when an XID-bearing
  autoanalyze process runs in another database.  That may never happen in this
  test, but it's cheap insurance.

- Use lazy VACUUM option DISABLE_PAGE_SKIPPING.  Commit
  c2dc1a7976 did this for a related requirement
  in other tests, but I suspect FREEZE is necessary and sufficient in all
  these tests.

Back-patch to v17, where the test first appeared.

Reported-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/sv3taq4e6ea4qckimien3nxp3sz4b6cw6sfcy4nhwl52zpur4g@h6i6tohxmizu
Backpatch-through: 17
2025-04-09 07:23:39 -07:00
Peter Eisentraut
306dd6e727 Update config.guess and config.sub 2025-04-09 12:41:54 +02:00
Heikki Linnakangas
0f1433f053 Fix a few oversights in the longer cancel keys patch
Change MyCancelKeyLength's type from uint8 to int. While it always
fits in a uint8, plain int is less surprising, as there's no
particular reason for it to be uint8.

Fix one ProcSignalInit caller that passed 'false' instead of NULL for
the pointer argument.

Author: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://www.postgresql.org/message-id/61be9e31-7b7d-49d5-bc11-721800d89d64@eisentraut.org
2025-04-09 13:11:42 +03:00
Daniel Gustafsson
ef366b7d7e Perform missed catversion bump
Commit c57971034e renamed an argument for a function but missed
to bump the catversion to reflect this.

Reported-by: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/CAApHDvqOega=dPtu3h2C5fJWJEuaGCMDib_sVfhKQqgUNJVmFA@mail.gmail.com
2025-04-09 09:29:12 +02:00
Tom Lane
dd496eedea Doc: note that two examples in optimizer/README are oversimplified.
These examples fail to account for join clauses generated by
EquivalenceClasses, but since we haven't mentioned EquivalenceClasses
yet it seems like it'd just add confusion to make them fully accurate.
Instead, parenthetically note that they're oversimplified.

Reported-by: Zeyuan Hu <ferrishu3886@gmail.com>
Co-authored-by: David Rowley <dgrowleyml@gmail.com>
Co-authored-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CACvHWmYFo+60yMqKJajDDvKN5EM41YHrCT3oxukwXmGAqpWvyw@mail.gmail.com
2025-04-08 23:03:33 -04:00
Tom Lane
b65b9da568 Adjust AdjustUpgrade.pm for commit b1720fe63.
Need to delete the functions we no longer have available from
the dumps to be reloaded from old versions.

Per buildfarm.
2025-04-08 20:21:03 -04:00
Tom Lane
b1720fe63f Move contrib/spi testing from core regression tests to contrib/spi.
It's weird to have the core regression tests depending on contrib
code, and coverage testing shows that those test queries add nothing
to the core-code coverage of the core tests.  So pull those test bits
out and put them into ordinary test scripts inside contrib/spi/,
making that more like other contrib modules.

Aside from being structurally nicer, anything we can take out of the
core tests (which are executed multiple times per check-world run)
and put into tests executed only once should be a win.  It doesn't
look like this change will buy a whole lot of milliseconds, but a
cycle saved is a cycle earned.

Also, there is some discussion around possibly removing refint and/or
autoinc altogether.  I don't know if that will happen, but we'd
certainly need to decouple them from the core tests to do so.

The tests for autoinc were quite intertwined with the undocumented
"ttdummy" trigger in regress.c.  That made the tests very hard to
understand and contributed nothing to autoinc's testing either.
So I just deleted ttdummy and rewrote the autoinc tests without it.

I realized while doing this that the description of autoinc in
the SGML docs is not a great description of what the function
actually does, so the patch includes some updates to those docs.

Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://postgr.es/m/3872677.1744077559@sss.pgh.pa.us
2025-04-08 19:12:03 -04:00
Daniel Gustafsson
c57971034e Rename argument in pg_get_process_memory_contexts().
During development the third argument to pg_get_process_memory_contexts
was a retry count, but it was changed to a timeout instead.  The param
name was accidentally left in pg_proc.dat though.  Fix by renaming to
the correct parameter name.

Author: Fujii Masao <masao.fujii@oss.nttdata.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://postgr.es/m/3eb40b3e-45c7-426a-b7f8-81f7d05a9b53@oss.nttdata.com
2025-04-08 23:09:13 +02:00
Peter Eisentraut
8969194b73 Fix incorrect format placeholder
for commit 749a9e20c9
2025-04-08 19:12:03 +02:00
Nathan Bossart
b0a4c3e88b Prevent 006_transfer_modes.pl from leaving files behind.
This test was leaving files like delete_old_cluster.{sh,bat} in the
source directory for VPATH and meson builds.  To fix, change the
directory to tmp_check before running the test, as was done in
commits 15b6d21553, 8af917be6b, and c462b054ba.

Oversight in commit af0d4901c1.

Reported-by: Andrew Dunstan <andrew@dunslane.net> (on Discord)
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Andrew Dunstan <andrew@dunslane.net>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/Z_RHkG770w3SE0yU%40nathan
2025-04-08 10:57:31 -05:00
Daniel Gustafsson
88edd661c8 ci: Add MBUILD_TARGET for NetBSD and OpenBSD
Commit b2bdb972c0 added MBUILD_TARGET to ensure that meson builds
the tests before running them, this adds MBUILD_TARGET to OpenBSD
and NetBSD builds as well where it was missing.

No backpatching since OpenBSD and NetBSD support does not exist
in the backbranch CI.

Author: Nazir Bilal Yavuz <byavuz81@gmail.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://postgr.es/m/CAN55FZ2LNnRrtL+cpSdEg44fQcLPq_GjJjfNa0vz+xqEdq=ZHw@mail.gmail.com
2025-04-08 15:28:29 +02:00
Tomas Vondra
91f1fe90c7 pg_buffercache: Change page_num type to bigint
The page_num was defined as integer, which should be sufficient for the
near future (with 4K pages it's 8TB). But it's virtually free to return
bigint, and get a wider range. This was agreed on the thread, but I
forgot to tweak this in ba2a3c2302.

While at it, make the data types in CREATE VIEW a bit more consistent.

Discussion: https://postgr.es/m/CAKZiRmxh6KWo0aqRqvmcoaX2jUxZYb4kGp3N%3Dq1w%2BDiH-696Xw%40mail.gmail.co
2025-04-08 12:38:42 +02:00
Tomas Vondra
b8a6078ca8 doc: Correct pg_shmem_allocations_numa.size data type
The code in pg_get_shmem_allocations_numa() returned 'size' as int64,
but the docs said int32.

Report and fix by Noriyoshi Shinoda.

Reported-by: Noriyoshi Shinoda <noriyoshi.shinoda@hpe.com>
Discussion: https://postgr.es/m/DM4PR84MB1734308EB741A6ECFF040C27EEAA2@DM4PR84MB1734.NAMPRD84.PROD.OUTLOOK.COM
2025-04-08 12:36:36 +02:00
Amit Kapila
12eece5fd5 Fix uninitialized index information access during apply.
The issue happens when building conflict information during apply of
INSERT or UPDATE operations that violate unique constraints on leaf
partitions.

The problem was introduced in commit 9ff68679b5, which removed the
redundant calls to ExecOpenIndices/ExecCloseIndices. The previous code was
relying on the redundant ExecOpenIndices call in
apply_handle_tuple_routing() to build the index information required for
unique key conflict detection.

The fix is to delay building the index information until a conflict is
detected instead of relying on ExecOpenIndices to do the same. The
additional benefit of this approach is that it avoids building index
information when there is no conflict.

Author: Hou Zhijie <houzj.fnst@fujitsu.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/TYAPR01MB57244ADA33DDA57119B9D26494A62@TYAPR01MB5724.jpnprd01.prod.outlook.com
2025-04-08 15:35:42 +05:30
Thomas Munro
7ea21f4ee2 Fix typo in docs.
Typo in previous commit.
2025-04-08 22:02:45 +12:00
Thomas Munro
f78ca6f3eb Introduce file_copy_method setting.
It can be set to either COPY (the default) or CLONE if the system
supports it.  CLONE causes callers of copydir(), currently CREATE
DATABASE ... STRATEGY=FILE_COPY and ALTER DATABASE ... SET TABLESPACE =
..., to use copy_file_range (Linux, FreeBSD) or copyfile (macOS) to copy
files instead of a read-write loop over the contents.

CLONE gives the kernel the opportunity to share block ranges on
copy-on-write file systems and push copying down to storage on others,
depending on configuration.  On some systems CLONE can be used to clone
large databases quickly with CREATE DATABASE ... TEMPLATE=source
STRATEGY=FILE_COPY.

Other operating systems could be supported; patches welcome.

Co-authored-by: Nazir Bilal Yavuz <byavuz81@gmail.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Ranier Vilela <ranier.vf@gmail.com>
Discussion: https://postgr.es/m/CA%2BhUKGLM%2Bt%2BSwBU-cHeMUXJCOgBxSHLGZutV5zCwY4qrCcE02w%40mail.gmail.com
2025-04-08 21:35:38 +12:00
Daniel Gustafsson
042a66291b Add function to get memory context stats for processes
This adds a function for retrieving memory context statistics
and information from backends as well as auxiliary processes.
The intended usecase is cluster debugging when under memory
pressure or unanticipated memory usage characteristics.

When calling the function it sends a signal to the specified
process to submit statistics regarding its memory contexts
into dynamic shared memory.  Each memory context is returned
in detail, followed by a cumulative total in case the number
of contexts exceeds the maximum allocated amount of shared memory.
Each process is limited to using at most 1MB of memory for this.

A summary can also be explicitly requested by the user, this
will return the TopMemoryContext and a cumulative total of
all lower contexts.

In order to not block on busy processes the caller specifies
the number of seconds during which to retry before timing out.
In the case where no statistics are published within the set
timeout, the last known statistics are returned, or NULL if
no previously published statistics exist.  This allows
dashboard-type queries to keep returning results even if the
target process is temporarily congested.  Context records contain a
timestamp to indicate when they were submitted.

Author: Rahila Syed <rahilasyed90@gmail.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Tomas Vondra <tomas@vondra.me>
Reviewed-by: Atsushi Torikoshi <torikoshia@oss.nttdata.com>
Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
Discussion: https://postgr.es/m/CAH2L28v8mc9HDt8QoSJ8TRmKau_8FM_HKS41NeO9-6ZAkuZKXw@mail.gmail.com
2025-04-08 11:06:56 +02:00
Andres Freund
15f0cb26b5 Increase BAS_BULKREAD based on effective_io_concurrency
Before, BAS_BULKREAD was always of size 256kB. With the default
io_combine_limit of 16, that only allowed 1-2 IOs to be in flight -
insufficient even on very low latency storage.

We don't just want to increase the size to a much larger hardcoded value, as
very large rings (tens of MBs of buffers) appear to have negative
performance effects when reading in data that the OS has cached (but not when
actually needing to do IO).

To address this, increase the size of BAS_BULKREAD to allow for
io_combine_limit * effective_io_concurrency buffers getting read in. To
prevent the ring being much larger than useful, limit the increased size with
GetPinLimit().
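
A minimal sketch of the sizing rule just described; GetPinLimit() is
named in the message above, the rest of the arithmetic is an assumption:

    /* allow io_combine_limit * effective_io_concurrency buffers in the
     * ring, but keep the old 256kB floor and respect the pin limit */
    ring_buffers = io_combine_limit * effective_io_concurrency;
    ring_buffers = Max(ring_buffers, 256 * 1024 / BLCKSZ);
    ring_buffers = Min(ring_buffers, GetPinLimit());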

The formula outlined above keeps the ring size to sizes for which we have not
observed performance regressions, unless very large effective_io_concurrency
values are used together with large shared_buffers setting.

Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://postgr.es/m/lqwghabtu2ak4wknzycufqjm5ijnxhb4k73vzphlt2a3wsemcd@gtftg44kdim6
Discussion: https://postgr.es/m/uvrtrknj4kdytuboidbhwclo4gxhswwcpgadptsjvjqcluzmah@brqs62irg4dt
2025-04-08 02:41:03 -04:00
Andres Freund
dcf7e1697b Add pg_buffercache_evict_{relation,all} functions
In addition to the added functions, the pg_buffercache_evict() function now
shows whether the buffer was flushed.

pg_buffercache_evict_relation(): Evicts all shared buffers in a
relation at once.
pg_buffercache_evict_all(): Evicts all shared buffers at once.

Both functions provide a mechanism to evict multiple shared buffers at
once. They are designed to address the inefficiency of repeatedly calling
pg_buffercache_evict() for each individual buffer, which can be time-consuming
when dealing with large shared buffer pools. (e.g., ~477ms vs. ~2576ms for
16GB of fully populated shared buffers).

These functions are intended for developer testing and debugging
purposes and are available to superusers only.

Minimal tests for the new functions are included. Also, there was no test for
pg_buffercache_evict(), so a test for it is added too.

No new extension version is needed, as it was already increased this release
by ba2a3c2302.

Author: Nazir Bilal Yavuz <byavuz81@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Aidar Imamov <a.imamov@postgrespro.ru>
Reviewed-by: Joseph Koshakow <koshy44@gmail.com>
Discussion: https://postgr.es/m/CAN55FZ0h_YoSqqutxV6DES1RW8ig6wcA8CR9rJk358YRMxZFmw%40mail.gmail.com
2025-04-08 02:19:32 -04:00
David Rowley
d69d45a5a9 Speedup child EquivalenceMember lookup in planner
When planning queries to partitioned tables, we clone all
EquivalenceMembers belonging to the partitioned table into em_is_child
EquivalenceMembers for each non-pruned partition.  For partitioned tables
with large numbers of partitions, this meant the ec_members list could
become large and code searching that list would become slow.  Effectively,
the more partitions which were present, the more searches needed to be
performed for operations such as find_ec_member_matching_expr() during
create_plan() and the more partitions present, the longer these searches
would take, i.e., a quadratic slowdown.

To fix this, here we adjust how we store EquivalenceMembers for
em_is_child members.  Instead of storing these directly in ec_members,
these are now stored in a new array of Lists in the EquivalenceClass,
which is indexed by the relid.  When we want to find EquivalenceMembers
belonging to a certain child relation, we can narrow the search to the
array element for that relation.

To make EquivalenceMember lookup easier and to reduce the amount of code
change, this commit provides a pair of functions to allow iteration over
the EquivalenceMembers of an EC which also handles finding the child
members, if required.  Callers that never need to look at child members
can remain using the foreach loop over ec_members, which will now often
be faster due to only parent-level members being stored there.
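
A minimal sketch of the iterator pair in use; the function and type
names are assumed from the commit message:

    EquivalenceMemberIterator it;
    EquivalenceMember *em;

    /* visit parent members plus only the child members for these relids */
    setup_eclass_member_iterator(&it, ec, child_relids);
    while ((em = eclass_member_iterator_next(&it)) != NULL)
    {
        /* ... same inspection the old foreach over ec_members did ... */
    }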

The actual performance increases here are highly dependent on the number
of partitions and the query being planned.  Performance increases can be
visible with as few as 8 partitions, but the speedup is marginal for
such low numbers of partitions.  The speedups become much more visible
with a few dozen to hundreds of partitions.  With some tested queries
using 56 partitions, the planner was around 3x faster than before.  For
use cases with thousands of partitions, these are likely to become
significantly faster.  Some testing has shown planner speedups of 60x or
more with 8192 partitions.

Author: Yuya Watari <watari.yuya@gmail.com>
Co-authored-by: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Andrey Lepikhov <a.lepikhov@postgrespro.ru>
Reviewed-by: Alena Rybakina <lena.ribackina@yandex.ru>
Reviewed-by: Dmitry Dolgov <9erthalion6@gmail.com>
Reviewed-by: Amit Langote <amitlangote09@gmail.com>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Tested-by: Thom Brown <thom@linux.com>
Tested-by: newtglobal postgresql_contributors <postgresql_contributors@newtglobalcorp.com>
Discussion: https://postgr.es/m/CAJ2pMkZNCgoUKSE%2B_5LthD%2BKbXKvq6h2hQN8Esxpxd%2Bcxmgomg%40mail.gmail.com
2025-04-08 18:09:57 +12:00
Amit Kapila
105b2cb336 Stabilize 035_standby_logical_decoding.pl.
Some tests try to invalidate logical slots on the standby server by
running VACUUM on the primary. The problem is that xl_running_xacts was
getting generated and replayed before the VACUUM command, leading to the
advancement of the active slot's catalog_xmin. Due to this, active slots
were not getting invalidated, leading to test failures.

We fix it by skipping the generation of xl_running_xacts for the required
tests with the help of injection points. As the required interface for
injection points was not present in back branches, we fixed the failing
tests in them by disallowing the slot to become active for the required
cases (where rows_removed conflict could be generated).

Author: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Backpatch-through: 16, where it was introduced
Discussion: https://postgr.es/m/Z6oQXc8LmiTLfwLA@ip-10-97-1-34.eu-west-3.compute.internal
2025-04-08 09:38:02 +05:30
Bruce Momjian
46b4ba533c Fix PG 17 [NOT] NULL optimization bug for domains
A PG 17 optimization allowed columns with NOT NULL constraints to skip
table scans for IS NULL queries, and to skip IS NOT NULL checks for IS
NOT NULL queries.  This didn't work for domain types, since domain types
don't follow the IS NULL/IS NOT NULL constraint logic.  To fix, disable
this optimization for domains for PG 17+.

Reported-by: Jan Behrens

Diagnosed-by: Tom Lane

Discussion: https://postgr.es/m/Z37p0paENWWUarj-@momjian.us

Backpatch-through: 17
2025-04-07 21:33:42 -04:00
Michael Paquier
039549d70f Flush the IO statistics of active WAL senders more frequently
WAL senders do not flush their statistics until they exit, limiting the
monitoring possible for live processes.  This is penalizing when WAL
senders are running for a long time, like in streaming or logical
replication setups, because it is not possible to know the amount of IO
they generate while running.

This commit makes WAL senders more aggressive with their statistics
flush, using an interval of 1 second, with the flush timing calculated
based on the existing GetCurrentTimestamp() call made before the sleeps
that wait for activity.  Note that the sleep done for logical and
physical WAL senders happens in two different code paths, so the stats
flushes need to happen in these two places.

One test is added for the physical WAL sender case, and one for the
logical WAL sender case.  This can be done in a stable fashion by
relying on the WAL generated by the TAP tests in combination with a
stats reset while a server is running, but only on HEAD as WAL data has
been added to pg_stat_io in a051e71e28.

This issue exists since a9c70b46db and the introduction of pg_stat_io,
so backpatch down to v16.

Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Reviewed-by: vignesh C <vignesh21@gmail.com>
Reviewed-by: Xuneng Zhou <xunengzhou@gmail.com>
Discussion: https://postgr.es/m/Z73IsKBceoVd4t55@ip-10-97-1-34.eu-west-3.compute.internal
Backpatch-through: 16
2025-04-08 07:57:19 +09:00
Tomas Vondra
ba2a3c2302 Add pg_buffercache_numa view with NUMA node info
Introduces a new view pg_buffercache_numa, showing NUMA memory nodes
for individual buffers. For each buffer the view returns an entry for
each memory page, with the associated NUMA node.

The database blocks and OS memory pages may have different sizes - the
default block size is 8KB, while the memory page is 4K (on x86). But
other combinations are possible, depending on configure parameters,
platform, etc. This means buffers may overlap with multiple memory
pages, each associated with a different NUMA node.

To determine the NUMA node for a buffer, we first need to touch the
memory pages using pg_numa_touch_mem_if_required, otherwise we might get
status -2 (ENOENT = The page is not present), indicating the page is
either unmapped or unallocated.

The view may be relatively expensive, especially when accessed for the
first time in a backend, as it touches all memory pages to get reliable
information about the NUMA node. This may also force allocation of the
shared memory.

Author: Jakub Wartak <jakub.wartak@enterprisedb.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Reviewed-by: Tomas Vondra <tomas@vondra.me>
Discussion: https://postgr.es/m/CAKZiRmxh6KWo0aqRqvmcoaX2jUxZYb4kGp3N%3Dq1w%2BDiH-696Xw%40mail.gmail.com
2025-04-07 23:08:17 +02:00
Tomas Vondra
8cc139bec3 Introduce pg_shmem_allocations_numa view
Introduce a new pg_shmem_allocations_numa view with information about how
shared memory is distributed across NUMA nodes. For each shared memory
segment, the view returns one row for each NUMA node backing it, with
the total amount of memory allocated from that node.

The view may be relatively expensive, especially when executed for the
first time in a backend, as it has to touch all memory pages to get
reliable information about the NUMA node. This may also force allocation
of the shared memory.

Unlike pg_shmem_allocations, the view does not show anonymous shared
memory allocations. It also does not show memory allocated using the
dynamic shared memory infrastructure.

Author: Jakub Wartak <jakub.wartak@enterprisedb.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Reviewed-by: Tomas Vondra <tomas@vondra.me>
Discussion: https://postgr.es/m/CAKZiRmxh6KWo0aqRqvmcoaX2jUxZYb4kGp3N%3Dq1w%2BDiH-696Xw%40mail.gmail.com
2025-04-07 23:08:17 +02:00
Tomas Vondra
65c298f61f Add support for basic NUMA awareness
Add basic NUMA awareness routines, using a minimal src/port/pg_numa.c
portability wrapper and an optional build dependency, enabled by
--with-libnuma configure option. For now this is Linux-only, other
platforms may be supported later.

A built-in SQL function pg_numa_available() allows checking NUMA
support, i.e. that the server was built/linked with the NUMA library.

The main function introduced is pg_numa_query_pages(), which allows
determining the NUMA node for individual memory pages. Internally the
function uses move_pages(2) syscall, as it allows batching, and is more
efficient than get_mempolicy(2).
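
A minimal sketch of the Linux wrapper; numa_move_pages() with a NULL
nodes array only queries page placement:

    #include <numaif.h>

    int
    pg_numa_query_pages(int pid, unsigned long count, void **pages, int *status)
    {
        /* nodes == NULL: report the NUMA node of each page in status[] */
        return numa_move_pages(pid, count, pages, NULL, status, 0);
    }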

Author: Jakub Wartak <jakub.wartak@enterprisedb.com>
Co-authored-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Tomas Vondra <tomas@vondra.me>
Discussion: https://postgr.es/m/CAKZiRmxh6KWo0aqRqvmcoaX2jUxZYb4kGp3N%3Dq1w%2BDiH-696Xw%40mail.gmail.com
2025-04-07 23:08:17 +02:00
Álvaro Herrera
17bcf4f545
Use specific collation where needed in new test
Oversight in commit a379061a22.

Per Czech buildfarm members jay and hippopotamus.
2025-04-07 21:58:06 +02:00
Tom Lane
8cfbdf8f4d Fix some issues in contrib/spi/refint.c.
check_foreign_key incorrectly used a single cache entry for its saved
plans for a 'c' (cascade) trigger, although there are two different
queries to execute depending on whether it fires for an update or a
delete.  This caused the wrong things to be done if both types of
event occur in one session.  (This was indeed visible in the triggers
regression test, but apparently nobody ever questioned it.)  To fix,
add the operation type to the cache key.

Its debug log output failed to distinguish update from delete
events, too.

Also, change the intended trigger usage from BEFORE ROW to AFTER ROW,
and add checks insisting on that usage.  BEFORE is really rather
unsafe, since if there are other BEFORE triggers they might change or
cancel the operation we are trying to check.  AFTER triggers are the
standard way to propagate changes to other rows, so we should follow
that way here.

In passing, remove a useless duplicate lookup of the cache entry.

This code is mostly intended as a documentation example, so we
won't consider a back-patch.

Author: Dmitrii Bondar <d.bondar@postgrespro.ru>
Reviewed-by: Paul Jungwirth <pj@illuminatedcomputing.com>
Reviewed-by: Lilian Ontowhee <ontowhee@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/79755a2b18ed4fe5e29da6a87a1e00d1@postgrespro.ru
2025-04-07 15:54:16 -04:00
Andres Freund
8e293e689b aio: Make AIO more compatible with valgrind
In some edge cases valgrind flags issues with the memory referenced by
IOs. All of the cases addressed in this change are false positives.

Most of the false positives are caused by UnpinBuffer[NoOwner] marking buffer
data as inaccessible. This happens even though the AIO subsystem still holds a
pin. That's good: there shouldn't be accesses to the buffer outside of
AIO-related code until it is pinned by "user" code again. But it requires some
explicit work - if the buffer is not pinned by the current backend, we need to
explicitly mark the buffer data accessible/inaccessible while executing
completion callbacks.

That however causes a cascading issue in IO workers: After the completion
callbacks for a buffer are executed, the page is marked as inaccessible. If
subsequently the same worker is executing IO targeting the same buffer, we
would get an error, as the memory is still marked inaccessible. To avoid that,
we need to explicitly mark the memory as accessible in IO workers.

Another issue is that IO executed in workers or via io_uring will not mark
memory as DEFINED. In the case of workers that is because valgrind does not
track memory definedness across processes. For io_uring that is because
valgrind does not understand io_uring, and therefore its IOs never mark memory
as defined, whether the completions are processed in the defining process or
in another context.  It's not entirely clear how to best solve that. The
current user of AIO is not affected, as it explicitly marks buffers as DEFINED
& NOACCESS anyway.  Defer solving this issue until we have a user with
different needs.

Per buildfarm animal skink.

Reviewed-by: Noah Misch <noah@leadboat.com>
Co-authored-by: Noah Misch <noah@leadboat.com>
Discussion: https://postgr.es/m/3pd4322mogfmdd5nln3zphdwhtmq3rzdldqjwb2sfqzcgs22lf@ok2gletdaoe6
2025-04-07 15:20:30 -04:00
Andres Freund
8ab4241b9f localbuf: Add Valgrind buffer access instrumentation
This mirrors 1e0dfd166b (+ 46ef520b95), for temporary table buffers. This
is mainly interesting right now because the AIO work currently triggers
spurious valgrind errors, and the fix for that is cleaner if temp buffers
behave the same as shared buffers.

This requires one change beyond the annotations themselves, namely to pin
local buffers while writing them out in FlushRelationBuffers().

Reviewed-by: Noah Misch <noah@leadboat.com>
Co-authored-by: Noah Misch <noah@leadboat.com>
Discussion: https://postgr.es/m/3pd4322mogfmdd5nln3zphdwhtmq3rzdldqjwb2sfqzcgs22lf@ok2gletdaoe6
2025-04-07 15:20:30 -04:00
Masahiko Sawada
a13d49014d doc: Fix a typo in pg_recvlogical documentation.
Oversight in cf2655a902.

Author: Zhijie Hou <houzj.fnst@fujitsu.com>
Discussion: https://postgr.es/m/OS3PR01MB5718DD1466E2B9043448AE5094AA2@OS3PR01MB5718.jpnprd01.prod.outlook.com
2025-04-07 12:13:08 -07:00
Tom Lane
969ab9d4f5 Follow-up fixes for SHA-2 patch (commit 749a9e20c).
This changes the check for valid characters in the salt string to
only allow plain ASCII letters and digits.  The previous coding was
locale-dependent, which doesn't really seem like a great idea here;
moreover it could not work correctly in multibyte encodings.

This fixes a careless pointer-use-after-pfree, too.

Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Reported-by: Andres Freund <andres@anarazel.de>
Author: Bernd Helmle <mailings@oopsware.de>
Discussion: https://postgr.es/m/6fab35422df6b6b9727fdcc243c5fa1c667dd3b5.camel@oopsware.de
2025-04-07 14:14:28 -04:00
Tom Lane
b73e6d71a8 Fix erroneous construction of functions' dependencies on transforms.
The list of transform objects that a function should use is specified
in CREATE FUNCTION's TRANSFORM clause, and then represented indirectly
in pg_proc.protrftypes.  However, ProcedureCreate completely ignored
that for purposes of constructing pg_depend entries, and instead made
the function depend on any transforms that exist for its parameter or
return data types.  This is bad in both directions: the function could
be made dependent on a transform it does not actually use, or it
could try to use a transform that's since been dropped.  (The latter
scenario would require use of a transform that's not for any of the
parameter or return types, but that seems legit for cases where the
function performs SQL operations internally.)

To fix, pass in the list of transform objects that CreateFunction
identified, and build pg_depend entries from that not from the
parameter/return types.  This results in changes in the expected
test outputs in contrib/bool_plperl, which I guess are due to
different ordering of pg_depend entries -- that test case is
surely not exercising either of the problem scenarios.

This fix is not back-patchable as-is: changing the signature of
ProcedureCreate seems too risky in stable branches.  We could
do something like making ProcedureCreate a wrapper around
ProcedureCreateExt or so.  However, I'm more inclined to do
nothing in the back branches.  We had no field complaints up to
now, so the hazards don't seem to be a big issue in practice.
And we couldn't do anything about existing pg_depend entries,
so a back-patched fix would result in a mishmash of dependencies
created according to different rules.  That cure could be worse
than the disease, perhaps.

I bumped catversion just to lay down a marker that the expected
contents of pg_depend are a bit different than before.

Reported-by: Chapman Flack <jcflack@acm.org>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/3112950.1743984111@sss.pgh.pa.us
2025-04-07 13:31:37 -04:00
Álvaro Herrera
a379061a22
Allow NOT NULL constraints to be added as NOT VALID
This allows them to be added without scanning the table, and validated
afterwards without holding an access exclusive lock on the table, once
any violating rows have been deleted or fixed.

Doing ALTER TABLE ... SET NOT NULL for a column that has an invalid
not-null constraint validates that constraint.  ALTER TABLE .. VALIDATE
CONSTRAINT is also supported.  There are various checks on whether an
invalid constraint is allowed in a child table when the parent table has
a valid constraint; this should match what we do for enforced/not
enforced constraints.
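
For example (editor's sketch with a hypothetical table t and column a):

    ALTER TABLE t ADD CONSTRAINT t_a_not_null NOT NULL a NOT VALID;
    -- fix or delete any violating rows, then:
    ALTER TABLE t VALIDATE CONSTRAINT t_a_not_null;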

pg_attribute.attnotnull is now only an indicator for whether a not-null
constraint exists for the column; whether it's valid or invalid must be
queried in pg_constraint.  Applications can continue to query
pg_attribute.attnotnull as before, but now it's possible that NULL rows
are present in the column even when that's set to true.

For backend internal purposes, we cache the nullability status in
CompactAttribute->attnullability that each tuple descriptor carries
(replacing CompactAttribute.attnotnull, which was a mirror of
Form_pg_attribute.attnotnull).  During the initial tuple descriptor
creation, based on the pg_attribute scan, we set this to UNRESTRICTED if
pg_attribute.attnotnull is false, or to UNKNOWN if it's true; then we
update the latter to VALID or INVALID depending on the pg_constraint
scan.  This flag is also copied when tupledescs are copied.

Comparing tuple descs for equality must also compare the
CompactAttribute.attnullability flag and return false in case of a
mismatch.

pg_dump deals with these constraints by storing the OIDs of invalid
not-null constraints in a separate array, and running a query to obtain
their properties.  The regular table creation SQL omits them entirely.
They are then dealt with in the same way as "separate" CHECK
constraints, and dumped after the data has been loaded.  Because no
additional pg_dump infrastructure was required, we don't bump its
version number.

I decided not to bump catversion either, because the old catalog state
works perfectly in the new world.  (Trying to run with new catalog state
and the old server version would likely run into issues, however.)

System catalogs do not support invalid not-null constraints (because
commit 14e87ffa5c didn't allow them to have pg_constraint rows
anyway).

Author: Rushabh Lathia <rushabh.lathia@gmail.com>
Author: Jian He <jian.universality@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Tested-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Discussion: https://postgr.es/m/CAGPqQf0KitkNack4F5CFkFi-9Dqvp29Ro=EpcWt=4_hs-Rt+bQ@mail.gmail.com
2025-04-07 19:19:50 +02:00
Andrew Dunstan
b52a4a5f28 Clean up error messages from 1495eff7bd
Quote file names, and mostly avoid hard coded file names. Along the way
make a few other minor improvements.

Discussion: https://postgr.es/m/20250407.152721.1397761902317499205.horikyota.ntt@gmail.com
2025-04-07 12:22:41 -04:00
Tom Lane
3516ea768c Add local-address escape "%L" to log_line_prefix.
This escape shows the numeric server IP address that the client
has connected to.  Unix-socket connections will show "[local]".
Non-client processes (e.g. background processes) will show "[none]".

We expect that this option will be of interest to only a fairly
small number of users.  Therefore the implementation is optimized
for the case where it's not used (that is, we don't do the string
conversion until we have to), and we've not added the field to
csvlog or jsonlog formats.
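
For illustration (editor's example, not part of the commit message):

    ALTER SYSTEM SET log_line_prefix = '%m [%p] %L ';
    SELECT pg_reload_conf();
    -- subsequent log lines include the local address the client connected to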

Author: Greg Sabino Mullane <htamfids@gmail.com>
Reviewed-by: Cary Huang <cary.huang@highgo.ca>
Reviewed-by: David Steele <david@pgmasters.net>
Reviewed-by: Jim Jones <jim.jones@uni-muenster.de>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAKAnmmK-U+UicE-qbNU23K--Q5XTLdM6bj+gbkZBZkjyjrd3Ow@mail.gmail.com
2025-04-07 11:06:05 -04:00
Andrew Dunstan
8f5e419484 Revert "Use workaround of __builtin_setjmp only on MINGW on MSVCRT"
This reverts commit c313fa4602.

This is found to cause issues on x86_64 Windows even when using UCRT.

Discussion: https://postgr.es/m/3312149.1744001936@sss.pgh.pa.us
2025-04-07 11:01:15 -04:00
Andres Freund
8ce79483dc read_stream: Fix overflow hazard with large shared buffers
If the limit returned by GetAdditionalPinLimit() is large, the buffer_limit
variable in read_stream_start_pending_read() can overflow. While the code is
careful to limit buffer_limit to PG_INT16_MAX, we subsequently add the number of
forwarded buffers.

The overflow can lead to assertion failures, crashes or wrong query results
when using large shared buffers.

It seems easier to avoid this if we make the buffer_limit variable an int,
instead of an int16.  Do so, and clamp buffer_limit after adding the number of
forwarded buffers.

It's possible we might want to address this and related issues more widely by
changing to int instead of int16 more widely, but since the consequences of
this bug can be confusing, it seems better to fix it now.

This bug was introduced in ed0b87caac.

Discussion: https://postgr.es/m/ewvz3cbtlhrwqk7h6ca6cctiqh7r64ol3pzb3iyjycn2r5nxk5@tnhw3a5zatlr
2025-04-07 09:45:00 -04:00
Alexander Korotkov
717d0e8dd9 Remove GUC_NOT_IN_SAMPLE from enable_self_join_elimination
fc069a3a63 implements Self-Join Elimination (SJE) and provides a new GUC
variable: enable_self_join_elimination.  This new GUC variable was marked
as GUC_NOT_IN_SAMPLE.  However, enable_self_join_elimination is documented
and is not different from any other enable_* GUCs.  Thus, remove
GUC_NOT_IN_SAMPLE from it and add it to the postgresql.conf.sample.

Discussion: https://postgr.es/m/CAPpHfdsqMTEsmxk3aQwt6xPz%2BKpUELO%3D6fzmER9ZRGrbs4uMfA%40mail.gmail.com
Author: Tender Wang <tndrwang@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
2025-04-07 16:28:54 +03:00
Daniel Gustafsson
ae60947643 psql: Clarify help message for WATCH_INTERVAL
The help message for WATCH_INTERVAL was hard to interpret and didn't
follow the style of other messages; this updates it to make it fit in
better and be easier to interpret.

Author: Daniel Gustafsson <daniel@yesql.se>
Reported-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Reviewed-by: David G. Johnston <david.g.johnston@gmail.com>
Discussion: https://postgr.es/m/20250326.120732.1167093737847500721.horikyota.ntt@gmail.com
2025-04-07 13:44:58 +02:00
Michael Paquier
d6f118444d Fix grammar in log message of pg_restore.c
Introduced by 1495eff7bd.

Author: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Discussion: https://postgr.es/m/20250407.151359.72428746612514925.horikyota.ntt@gmail.com
2025-04-07 15:37:34 +09:00
Michael Paquier
2c7bd2ba50 libpq: Fix some issues in TAP tests for service files
The valid service file was not correctly shaped, as append_to_file() was
called with an array as input.  This is changed so that the parameter and
value pairs from the valid connection string are appended to the valid
service file one by one.

Even with the first issue fixed, the tests should have failed.  However, they
have been passing because all the connection attempts relied on the
default values given to PGPORT and PGHOST from the node when using
Cluster.pm's connect_ok() and connect_fails(), rather than the data in
the service file.  The test is updated to use an interesting trick: a
dummy node is initialized but not started, and all the connection
attempts are done through it.  This ensures that the data inside the
service file is used for all the connection tests.  Note that breaking
the contents of the valid service file on purpose makes all the tests
that rely on it fail.

Issues introduced by 72c2f36d57.

Author: Andrew Jackson <andrewjackson947@gmail.com>
Discussion: https://postgr.es/m/CAKK5BkG_6_YSaebM6gG=8EuKaY7_VX1RFgYeySuwFPh8FZY73g@mail.gmail.com
2025-04-07 12:55:09 +09:00
Michael Paquier
c36eda2591 Clarify comment for worst-case allocation in quote_literal_cstr()
palloc() is invoked with a specific formula for its allocation size in
quote_literal_cstr().  This wastes some memory, but the size is large
enough to cover even the worst-case scenarios.

No explanations were given about the reasons behind these numbers.  This
commit adds more documentation about all that.

Author: Steve Chavez <steve@supabase.io>
Discussion: https://postgr.es/m/CAGRrpzZ9bToRWS+fAnjxDJrxwZN1QcJ-y1Pn2yg=Hst6rydLtw@mail.gmail.com
2025-04-07 10:02:12 +09:00
Michael Paquier
3191a593d6 Fix use-after-free in pgstat_fetch_stat_backend_by_pid()
stats_fetch_consistency set to "snapshot" causes the backend entry
"beentry" retrieved by pgstat_get_beentry_by_proc_number() to be reset
at the beginning of pgstat_fetch_stat_backend() when fetching the
backend pgstats entry.  As coded, "beentry" was being accessed after
being freed.  This commit moves all the accesses to "beentry" to happen
before calling pgstat_fetch_stat_backend(), fixing the problem.

This problem could be reached by calling the SQL functions
pg_stat_get_backend_io() or pg_stat_get_backend_wal().
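
A minimal reproducer-style sketch (editor's example; it assumes the function
takes a backend PID):

    SET stats_fetch_consistency = 'snapshot';
    SELECT * FROM pg_stat_get_backend_io(pg_backend_pid());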

Issue caught by valgrind.

Reported-by: Alexander Lakhin <exclusion@gmail.com>
Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://postgr.es/m/f1788cc0-253a-4a3a-aee0-1b8ab9538736@gmail.com
2025-04-07 09:51:40 +09:00
Fujii Masao
173c97812f Use XLOG_CONTROL_FILE macro consistently for control file name.
The XLOG_CONTROL_FILE macro (defined in access/xlog_internal.h)
represents the control file name. While some parts of the codebase already
use this macro, others previously hardcoded the file name as a string.

This commit replaces those hardcoded strings with the macro,
ensuring consistent usage throughout the code. This makes future
maintenance easier and improves searchability, for example when
grepping for control file usage.

Author: Anton A. Melnikov <a.melnikov@postgrespro.ru>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/0841ec77-47e5-452a-adb4-c6fa55d605fc@postgrespro.ru
2025-04-07 09:27:33 +09:00
Daniel Gustafsson
a233a603ba doc: Clarify project naming
Clarify the project naming in the history section of the docs
to match the recent license preamble changes.

Backpatch to all supported versions.

Author: Dave Page <dpage@pgadmin.org>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://postgr.es/m/CA+OCxozLzK2+Jc14XZyWXSp6L9Ot+3efwXUE35FJG=fsbib2EA@mail.gmail.com
Backpatch-through: 13
2025-04-07 00:03:18 +02:00
Andrew Dunstan
643a1a6198 Clean up checking for pg_dumpall output directory
Coverity objected to the original code, and in any case this is much
cleaner, using the existing routine pg_check_dir() instead of rolling
its own test.

Per suggestion from Tom Lane.
2025-04-06 17:04:58 -04:00
Tom Lane
218ab68275 Doc: fix PDF "contents ... exceed the available area" warnings.
Tweak column widths in a new table, similarly to some previous
fixes such as b62381d9a.

Per buildfarm.
2025-04-06 16:27:39 -04:00
Nathan Bossart
de48056ec7 pg_upgrade: Fix memory leak in check_for_unicode_update().
This function was initializing the "task" variable before a couple
of early returns.  To fix, postpone the initialization until just
before it's needed.

Per Coverity.

Discussion: https://postgr.es/m/Z_KMsUH2-FEbiNjC%40nathan
2025-04-06 15:11:41 -05:00
Andres Freund
57dec20fd4 aio: Avoid spurious coverity warning
PgAioResult.result is never accessed in the relevant path, but Coverity
complains about an uninitialized access anyway. So just zero-initialize the
whole thing.  While at it, reduce the scope of the variable.

Reported-by: Ranier Vilela <ranier.vf@gmail.com>
Reviewed-by: Noah Misch <noah@leadboat.com>
Discussion: https://postgr.es/m/CAEudQApsKqd-s+fsUQ0OmxJAMHmBSXxrAz3dCs+uvqb3iRtjSw@mail.gmail.com
2025-04-06 12:07:02 -04:00
Tom Lane
8ab6ef2bb8 Fix memory leaks in px_crypt_shacrypt().
Per Coverity.  I don't think these are of any actual significance
since the function ought to be invoked in a short-lived context.
Still, if it's trying to be neat it should get it right.

Also const-ify a constant and fix up typedef formatting.
2025-04-06 11:57:22 -04:00
Tom Lane
2e4ccf1b45 Use "(void)" to mark pgstat_lock_entry(..., false) calls.
This should silence Coverity's complaints about the result being
sometimes ignored.

I'm inclined to think that these routines are simply misdesigned,
because sometimes it's okay to ignore the result and sometimes it
isn't, and we have no way to enforce the latter.  But for now
I just added a comment.
2025-04-06 11:37:09 -04:00
Andrew Dunstan
5e19154390 Avoid unnecessary copying of a string in pg_restore.c
Coverity complained about a possible overrun in the copy, but there is
no actual need to copy the string at all.
2025-04-06 09:21:09 -04:00
Andrew Dunstan
6d5417e634 Fix a couple of memory leaks in pg_restore.c
Per complaint from Coverity.
2025-04-06 09:09:25 -04:00
Peter Eisentraut
a8025f5448 Relax ordering-related hardcoded btree requirements in planning
There were several places in ordering-related planning where a
requirement for btree was hardcoded but an amcanorder index could
suffice.  This fixes that.  We just need to do the necessary mapping
between strategy numbers and compare types and adjust some related
APIs so that this works independent of btree strategy numbers.  For
instance, non-btree amcanorder indexes can now be used to support
sorting and merge joins.  Also, predtest.c works independent of btree
strategy numbers now.

To avoid performance regressions, some details on btree and other
built-in index types are still hardcoded as shortcuts, but other index
types now have access to the same features by providing the required
flags and callbacks.

Author: Mark Dilger <mark.dilger@enterprisedb.com>
Co-authored-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://www.postgresql.org/message-id/flat/E72EAA49-354D-4C2E-8EB9-255197F55330@enterprisedb.com
2025-04-06 14:43:51 +02:00
Alexander Korotkov
3a1a7c5a70 Revert "Put enable_self_join_elimination into postgresql.conf.sample"
This reverts commit c2d329260c.

Reported-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://postgr.es/m/D292EB44-806E-439A-82A4-491A1BA59E7A%40yesql.se
2025-04-06 14:30:20 +03:00
Alexander Korotkov
c2d329260c Put enable_self_join_elimination into postgresql.conf.sample
fc069a3a63 implements Self-Join Elimination (SJE) and provides a new
GUC variable: enable_self_join_elimination.  This commit adds
enable_self_join_elimination to the postgresql.conf.sample, as it was
forgotten in the original commit.

Discussion: https://postgr.es/m/CAHewXN%3D%2Bghd6O6im46q7j2u6c3H6vkXtXmF%3D_v4CfGSnjje8PA%40mail.gmail.com
Author: Tender Wang <tndrwang@gmail.com>
2025-04-06 13:24:16 +03:00
John Naylor
3c6e8c1238 Compute CRC32C using AVX-512 instructions where available
The previous implementation of CRC32C on x86 relied on the native
CRC32 instruction from the SSE 4.2 extension, which operates on
up to 8 bytes at a time. We can get a substantial speedup by using
carryless multiplication on SIMD registers, processing 64 bytes per
loop iteration. Shorter inputs fall back to ordinary CRC instructions.
On Intel Tiger Lake hardware (2020), CRC is now 50% faster for inputs
between 64 and 112 bytes, and 3x faster for 256 bytes.

The VPCLMULQDQ instruction on 512-bit registers has been available
on Intel hardware since 2019 and AMD since 2022. There is an older
variant for 128-bit registers, but at least on Zen 2 it performs worse
than normal CRC instructions for short inputs.

We must now do a runtime check, even for builds that target SSE
4.2. This doesn't matter in practice for WAL (arguably the most
critical case), because since commit e2809e3a1 the final computation
with the 20-byte WAL header is inlined and unrolled when targeting
that extension. Compared with two direct function calls, testing
showed equal or slightly faster performance in performing an indirect
function call on several dozen bytes followed by inlined instructions
on constant input of 20 bytes.

The MIT-licensed implementation was generated with the "generate"
program from

https://github.com/corsix/fast-crc32/

Based on: "Fast CRC Computation for Generic Polynomials Using PCLMULQDQ
Instruction" V. Gopal, E. Ozturk, et al., 2009

Co-authored-by: Raghuveer Devulapalli <raghuveer.devulapalli@intel.com>
Co-authored-by: Paul Amonson <paul.d.amonson@intel.com>
Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de> (earlier version)
Reviewed-by: Matthew Sterrett <matthewsterrett2@gmail.com> (earlier version)
Tested-by: Raghuveer Devulapalli <raghuveer.devulapalli@intel.com>
Tested-by: David Rowley <dgrowleyml@gmail.com> (earlier version)
Discussion: https://postgr.es/m/BL1PR11MB530401FA7E9B1CA432CF9DC3DC192@BL1PR11MB5304.namprd11.prod.outlook.com
Discussion: https://postgr.es/m/PH8PR11MB82869FF741DFA4E9A029FF13FBF72@PH8PR11MB8286.namprd11.prod.outlook.com
2025-04-06 14:04:30 +07:00
Daniel Gustafsson
683df3f4de Quote filename in error message
Project standard is to quote filenames in error and log messages, which
commit 2da74d8d64 missed in two error messages.

Author: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Reported-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://postgr.es/m/20250404.120328.103562371975971823.horikyota.ntt@gmail.com
2025-04-05 22:10:28 +02:00
Tom Lane
691836405f Fix parse_cte.c's failure to examine sub-WITHs in DML statements.
makeDependencyGraphWalker thought that only SelectStmt nodes could
contain a WithClause.  Which was true in our original implementation
of WITH, but astonishingly we missed updating this code when we added
the ability to attach WITH to INSERT/UPDATE/DELETE (and later MERGE).
Moreover, since it was coded to deliberately block recursion to a
WithClause, even updating raw_expression_tree_walker didn't save it.

The upshot of this was that we didn't see references to outer CTE
names appearing within an inner WITH, and would neither complain about
disallowed recursion nor account for such references when sorting CTEs
into a usable order.  The lack of complaints about this is perhaps not
so surprising, because typical usage of WITH wouldn't hit either case.
Still, it's pretty broken; failing to detect recursion here leads to
assert failures or worse later on.

Fix by factoring out the processing of sub-WITHs into a new function
WalkInnerWith, and invoking that for all the statement types that
can have WITH.
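
An example of the previously-mishandled nesting (editor's sketch; the table
name is hypothetical):

    WITH outer_cte AS (SELECT 1 AS x)
    INSERT INTO t
        WITH inner_cte AS (SELECT x FROM outer_cte)  -- outer CTE referenced
        SELECT x FROM inner_cte;                     -- inside an inner WITH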

Bug: #18878
Reported-by: Yu Liang <luy70@psu.edu>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/18878-a26fa5ab6be2f2cf@postgresql.org
Backpatch-through: 13
2025-04-05 15:01:48 -04:00
Álvaro Herrera
749a9e20c9
Add modern SHA-2 based password hashes to pgcrypto.
This adapts the publicly available reference implementation on
https://www.akkadia.org/drepper/SHA-crypt.txt and adds the new hash
algorithms sha256crypt and sha512crypt to crypt() and gen_salt()
respectively.
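
For illustration (editor's example; the salt type names are those introduced
by this commit):

    SELECT crypt('my secret', gen_salt('sha512crypt'));
    -- also available: gen_salt('sha256crypt')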

Author: Bernd Helmle <mailings@oopsware.de>
Reviewed-by: Japin Li <japinli@hotmail.com>
Discussion: https://postgr.es/m/c763235a2757e2f5f9e3e27268b9028349cef659.camel@oopsware.de
2025-04-05 19:17:13 +02:00
Tom Lane
e33f2335a9 Avoid double transformation of json_array()'s subquery.
transformJsonArrayQueryConstructor() applied transformStmt() to
the same subquery tree twice.  While this causes no issue in many
cases, there are some where it causes a coredump, thanks to the
parser's habit of scribbling on its input.

Fix by making a copy before the first transformation (compare
0f43083d1).  This is quite brute-force, but then so is the
whole business of transforming the input twice.  Per discussion
in the bug thread, this implementation of json_array() parsing
should be replaced completely.  But that will take some work
and will surely not be back-patchable, so for the moment let's
take the easy way out.
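
The affected constructor form looks roughly like this (editor's sketch):

    SELECT json_array(SELECT x FROM (VALUES (1), (2)) AS v(x));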

Oversight in 7081ac46a.  Back-patch to v16 where that came in.

Bug: #18877
Reported-by: Yu Liang <luy70@psu.edu>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/18877-c3c3ad75845833bb@postgresql.org
Backpatch-through: 16
2025-04-05 12:13:35 -04:00
Andrew Dunstan
5db3bf7391 Clean up from commit 1495eff7bd
Fix some comments, and remove the hacky way of quoting database names in
favor of appendStringLiteralConn.
2025-04-05 08:00:24 -04:00
Álvaro Herrera
64fba9c617
Set log_statement=none in t/002_pg_upgrade.pl
This should make the test a wee bit faster on high-load machines (e.g.,
when running under valgrind).

Per complaint from Andres Freund.

Discussion: https://postgr.es/m/cwbcyjp2ts7o7xgy5y5gwtcd4zltvncsj67el7xgci7xbwrhlu@k363vk5tce4g
2025-04-05 11:41:01 +02:00
Álvaro Herrera
4be6a74cfb
pg_dump: Tiny header cleanup
In commits 9c02e3a986 and 8ec0aaeae0, Nathan added a duplicate
TocEntry typedef forward declaration (plus assorted #ifdef hackery to
avoid C99 preprocessor issues) to deal with some very old untidiness
regarding the DefnDumperPtr function prototype being located in pg_backup.h.
But there's no reason to have the DefnDumperPtr typedef (and the
accompanying DataDumperPtr typedef) in that file at all; they are better
placed in pg_backup_archiver.h, the internal header, because they are
only used internally.  That also requires zero #ifdef hackery, so move
them there.

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/202504042140.qo66ggw6wzsz@alvherre.pgsql
2025-04-05 11:22:40 +02:00
Nathan Bossart
f0d0083f52 pg_dump: Fix query for gathering attribute stats on older versions.
Commit 9c02e3a986 taught pg_dump to retrieve attribute statistics
for 64 relations at a time.  pg_dump supports dumping from v9.2 and
newer versions, but our query for retrieving statistics for
multiple relations uses WITH ORDINALITY and multi-argument
UNNEST(), both of which were introduced in v9.4.  To fix, we resort
to gathering statistics for a single relation at a time on versions
older than v9.4.
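
For reference, the two v9.4 features in question (editor's example):

    SELECT * FROM unnest(ARRAY[10, 20], ARRAY['a', 'b'])
        WITH ORDINALITY AS u(val1, val2, ord);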

Per buildfarm member crake.

Author: Corey Huinker <corey.huinker@gmail.com>
Discussion: https://postgr.es/m/Z_BcWVMvlUIJ_iuZ%40nathan
2025-04-04 21:05:30 -05:00
Tom Lane
43b8e6c4ab Repair misbehavior with duplicate entries in FK SET column lists.
Since v15 we've had an option to apply a foreign key constraint's
ON DELETE SET DEFAULT or SET NULL action to just some of the
referencing columns.  There was not a check for duplicate entries in
the list of columns-to-set, though.  That caused a potential memory
stomp in CreateConstraintEntry(), which incautiously assumed that
the list of columns-to-set couldn't be longer than the number of key
columns.  Even after fixing that, the case doesn't work because you
get an error like "multiple assignments to same column" from the SQL
command that is generated to do the update.

We could either raise an error for duplicate columns or silently
suppress the dups, and after a bit of thought I chose to do the
latter.  This is motivated by the fact that duplicates in the FK
column list are legal, so it's not real clear why duplicates
in the columns-to-set list shouldn't be.  Of course there's no
need to actually set the column more than once.

I left in the fix in CreateConstraintEntry() too, just because
it didn't seem like such low-level code ought to be making
assumptions about what it's handed.
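
A sketch of the affected syntax (editor's example; the tables are
hypothetical):

    CREATE TABLE pk (a int, b int, PRIMARY KEY (a, b));
    CREATE TABLE fk (a int, b int,
        FOREIGN KEY (a, b) REFERENCES pk
            ON DELETE SET NULL (a, a));  -- duplicate "a" now de-duplicated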

Bug: #18879
Reported-by: Yu Liang <luy70@psu.edu>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/18879-259fc59d072bd4d7@postgresql.org
Backpatch-through: 15
2025-04-04 20:11:48 -04:00
Tom Lane
0f43083d16 functions.c: copy trees from source_list before parse analysis etc.
This is yet another bit of fallout from the fact that backend/parser
(like other code) feels free to scribble on the parse tree it's
handed.  In this case that resulted in modifying the
relatively-short-lived copy in the cached function's source_list.
That would be fine since we only need each source_list tree once
... except that if the parser fails after making some changes,
the function cache entry remains as-is and will still be there
if the user tries to execute the function again.  Then we have
problems because we're feeding a non-pristine tree to the parser.

The most expedient fix is a quick copyObject().  I considered
other answers like somehow marking the cache entry invalid
temporarily, but that would add complexity and I'm not sure
it's worth it.  In typical scenarios we'd only do this once
per function query per session.

Reported-by: Alexander Lakhin <exclusion@gmail.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/6d442183-102c-498a-81d1-eeeb086cdc5a@gmail.com
2025-04-04 18:26:51 -04:00
Andrew Dunstan
2ef5790806 Fix a couple of error messages and tests for them
Oversights in 1495eff7bd and 289f74d0cb. Mea culpa.
2025-04-04 17:07:45 -04:00
Nathan Bossart
8ec0aaeae0 Prevent redeclaration of typedef TocEntry.
Commit 9c02e3a986 added a forward declaration for this typedef that
caused redeclarations, which is not valid in C99.  To fix, add some
preprocessor guards to avoid a redefinition, as is done elsewhere
(e.g., commit 382092a0cd).

Per buildfarm.
2025-04-04 15:56:23 -05:00
Andrew Dunstan
289f74d0cb Add more TAP tests for pg_dumpall
Author: Matheus Alcantara <matheusssilv97@gmail.com>
Author: Mahendra Singh Thalor <mahi6run@gmail.com>
2025-04-04 16:07:46 -04:00
Andrew Dunstan
1495eff7bd Non text modes for pg_dumpall, correspondingly change pg_restore
pg_dumpall acquires a new -F/--format option, with the same meanings as
pg_dump. The default is p, meaning plain text. For any other value, a
directory is created containing two files, globals.data and map.dat. The
first contains SQL for restoring the global data, and the second
contains a map from oids to database names. It will also contain a
subdirectory called databases, inside which it will create archives in
the specified format, named using the database oids.

In these cases the -f argument is required.

If pg_restore encounters a directory containing globals.dat, and no
toc.dat, it restores the global settings and then restores each
database.

pg_restore acquires two new options: -g/--globals-only which suppresses
restoration of any databases, and --exclude-database which inhibits
restoration of particular database(s), in the same way the option
works in pg_dumpall.

Author: Mahendra Singh Thalor <mahi6run@gmail.com>
Co-authored-by:  Andrew Dunstan <andrew@dunslane.net>
Reviewed-by: jian he <jian.universality@gmail.com>
Reviewed-by: Srinath Reddy <srinath2133@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org>

Discussion: https://postgr.es/m/cb103623-8ee6-4ba5-a2c9-f32e3a4933fa@dunslane.net
2025-04-04 16:01:22 -04:00
Andrew Dunstan
2b69afbe50 add new list type simple_oid_string_list to fe-utils/simple_list
This type contains both an oid and a string.

This will be used in forthcoming changes to pg_restore.

Author: Andrew Dunstan <andrew@dunslane.net>
2025-04-04 16:01:22 -04:00
Andrew Dunstan
c1da728106 Move common pg_dump code related to connections to a new file
ConnectDatabase is used by pg_dumpall, pg_restore and pg_dump so move
common code to a new file.

new file name: connectdb.c

Author: Mahendra Singh Thalor <mahi6run@gmail.com>
2025-04-04 16:01:22 -04:00
Nathan Bossart
ff3a7f0b68 Remove unused function parameters in pg_backup_archiver.c.
Thanks to commit 9c02e3a986, which modified some of the changes
from commit a0a4601765, we can remove the now-unused ArchiveHandle
parameter from _tocEntryRestorePass() and move_to_ready_heap().

Reviewed-by: Jeff Davis <pgsql@j-davis.com>
Discussion: https://postgr.es/m/Z-3x2AnPCP331JA3%40nathan
2025-04-04 14:55:04 -05:00
Nathan Bossart
9c02e3a986 pg_dump: Retrieve attribute statistics in batches.
Currently, pg_dump gathers attribute statistics with a query per
relation, which can cause pg_dump to take significantly longer,
especially when there are many relations.  This commit addresses
this by teaching pg_dump to gather attribute statistics for 64
relations at a time.  Some simple tests showed this was the optimal
batch size, but performance may vary depending on the workload.

Our lookahead code determines the next batch of relations by
searching the TOC sequentially for relevant entries.  This approach
assumes that we will dump all such entries in TOC order, which
unfortunately isn't true for dump formats that use
RestoreArchive().  RestoreArchive() does multiple passes through
the TOC and selectively dumps certain groups of entries each time.
This is particularly problematic for index stats and a subset of
matview stats; both are in SECTION_POST_DATA, but matview stats
that depend on matview data are dumped in RESTORE_PASS_POST_ACL,
while all other stats are dumped in RESTORE_PASS_MAIN.  To handle
this, this commit moves all statistics data entries in
SECTION_POST_DATA to RESTORE_PASS_POST_ACL, which ensures that we
always dump them in TOC order.  A convenient side effect of this
change is that we can revert a decent chunk of commit a0a4601765,
but that is left for a follow-up commit.

Author: Corey Huinker <corey.huinker@gmail.com>
Co-authored-by: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Jeff Davis <pgsql@j-davis.com>
Discussion: https://postgr.es/m/CADkLM%3Dc%2Br05srPy9w%2B-%2BnbmLEo15dKXYQ03Q_xyK%2BriJerigLQ%40mail.gmail.com
2025-04-04 14:51:08 -05:00
Nathan Bossart
7d5c83b4e9 pg_dump: Reduce memory usage of dumps with statistics.
Right now, pg_dump stores all generated commands for statistics in
memory.  These commands can be quite large and therefore can
significantly increase pg_dump's memory footprint.  To fix, wait
until we are about to write out the commands before generating
them, and be sure to free the commands after writing.  This is
implemented via a new defnDumper callback that works much like the
dataDumper one but is specifically designed for TOC entries.

Custom dumps that include data might write the TOC twice (to update
data offset information), which would ordinarily cause pg_dump to
run the attribute statistics queries twice.  However, as a hack, we
save the length of the written-out entry in the first pass and skip
over it in the second.  While there is no known technical issue
with executing the queries multiple times and rewriting the
results, it's expensive and feels risky, so let's avoid it.

As an exception, we _do_ execute the queries twice for the tar
format.  This format does a second pass through the TOC to generate
the restore.sql file.  pg_restore doesn't use this file, so even if
the second round of queries returns different results than the
first, it won't corrupt the output; the archive and restore.sql
file will just have different content.  A follow-up commit will
teach pg_dump to gather attribute statistics in batches, which our
testing indicates more than makes up for the added expense of
running the queries twice.

Author: Corey Huinker <corey.huinker@gmail.com>
Co-authored-by: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Jeff Davis <pgsql@j-davis.com>
Discussion: https://postgr.es/m/CADkLM%3Dc%2Br05srPy9w%2B-%2BnbmLEo15dKXYQ03Q_xyK%2BriJerigLQ%40mail.gmail.com
2025-04-04 14:51:08 -05:00
Nathan Bossart
e3cc039a7d Skip second WriteToc() call for custom-format dumps without data.
Presently, "pg_dump --format=custom" calls WriteToc() twice.  The
second call updates the data offset information, which allegedly
makes parallel pg_restore significantly faster.  However, if we're
not dumping any data, there are no data offsets to update, so we
can skip this step.

Reviewed-by: Jeff Davis <pgsql@j-davis.com>
Discussion: https://postgr.es/m/Z9c1rbzZegYQTOQE%40nathan
2025-04-04 14:51:08 -05:00
Melanie Plageman
d9c7911e1a Use streaming read I/O in autoprewarm
Make a read stream for each valid fork of each valid relation
represented in the autoprewarm dump file and prewarm those blocks
through the read stream API instead of by directly invoking
ReadBuffer().

Co-authored-by: Nazir Bilal Yavuz <byavuz81@gmail.com>
Co-authored-by: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Andrey M. Borodin <x4mmm@yandex-team.ru> (earlier versions)
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>  (earlier versions)
Reviewed-by: Matheus Alcantara <mths.dev@pm.me> (earlier versions)
Discussion: https://postgr.es/m/flat/CAN55FZ3n8Gd%2BhajbL%3D5UkGzu_aHGRqnn%2BxktXq2fuds%3D1AOR6Q%40mail.gmail.com
2025-04-04 15:28:54 -04:00
Melanie Plageman
6acab8bdbc Refactor autoprewarm_database_main() in preparation for read stream
Autoprewarm prewarms blocks from a dump file representing the contents
of shared buffers at the time it was dumped. It uses a sorted array of
BlockInfoRecords, each representing a block from one of the cluster's
databases and tables.

autoprewarm_database_main() prewarms all the blocks from a single
database. It is optimized to ensure we don't try to open the same
relation or fork over and over again if it has been dropped or is
invalid. The main loop handled this by carefully setting various local
variables to sentinel values when a run of blocks should be skipped.

This method won't work with the read stream API. The read stream
callback must be able to advance the current position in the
BlockInfoRecord array to allow for reading ahead additional blocks;
however, a read stream maps 1-1 with a relation and fork combination. So,
the main loop in autoprewarm_database_main() must also advance the
position in the array of BlockInfoRecords to skip invalid relations and
forks. This split control doesn't fit well with the current flow control
in autoprewarm_database_main().

To make it compatible with the read stream API, change
autoprewarm_database_main() to explicitly fast-forward in the
BlockInfoRecords array past the blocks belonging to an invalid relation
or fork.

This commit only implements the new control flow -- it does not use the
read stream API.

Co-authored-by: Nazir Bilal Yavuz <byavuz81@gmail.com>
Co-authored-by: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://postgr.es/m/flat/CAN55FZ3n8Gd%2BhajbL%3D5UkGzu_aHGRqnn%2BxktXq2fuds%3D1AOR6Q%40mail.gmail.com
2025-04-04 15:28:49 -04:00
Melanie Plageman
7f848cb788 Remove superfluous autoprewarm check
autoprewarm_database_main() prewarms blocks from the same database. It
is passed an array of sorted BlockInfoRecords and a start and stop index
into the array. The range represented should include only blocks
belonging to global objects or blocks from a single database. Remove an
unnecessary check that the current block is from the same database and
add an assert to ensure this invariant remains. Doing so removes a
special case that makes future refactoring to accommodate read
streamifying autoprewarm easier.

Noticed off-list by Andres Freund
2025-04-04 15:28:39 -04:00
Peter Geoghegan
b3f1a13f22 Avoid extra index searches through preprocessing.
Transform low_compare and high_compare nbtree skip array inequalities
(with opclasses that offer skip support) in such a way as to allow
_bt_first to consistently apply later keys when it descends the tree.
This can lower the number of index searches for multi-column scans that
use a ">" key on one of the index's prefix columns (or use a "<" key,
when scanning backwards) when it precedes some later lower-order key.

For example, an index qual "WHERE a > 5 AND b = 2" will now be converted
to "WHERE a >= 6 AND b = 2" by a new preprocessing step that takes place
after low_compare and high_compare have been finalized.  That way, the
initial call to _bt_first can use "WHERE a >= 6 AND b = 2" to find an
initial position, rather than just using "WHERE a > 5" -- "b = 2" can be
applied during every _bt_first call.  There's a decent chance that this
will allow such a scan to avoid the extra search that might otherwise be
needed to determine the lowest "a" value still satisfying "WHERE a > 5".

The transformation process can only lower the total number of index
pages read when the use of a more restrictive set of initial positioning
keys in _bt_first actually allows the scan to land on some later leaf
page directly, relative to the unoptimized case (or on an earlier leaf
page directly, when scanning backwards).  But the savings can really add
up in cases where an affected skip array comes after some other array.
For example, a scan indexqual "WHERE x IN (1, 2, 3) AND y > 5 AND z = 2"
can save as many as 3 _bt_first calls by applying the new transformation
to its "y" array (up to 1 extra search can be avoided per "x" element).

Follow-up to commit 92fe23d9, which added nbtree skip scan.

Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Matthias van de Meent <boekewurm+postgres@gmail.com>
Discussion: https://postgr.es/m/CAH2-Wz=FJ78K3WsF3iWNxWnUCY9f=Jdg3QPxaXE=uYUbmuRz5Q@mail.gmail.com
2025-04-04 14:14:08 -04:00
Peter Geoghegan
21a152b37f Improve nbtree skip scan primitive scan scheduling.
Don't allow nbtree scans with skip arrays to end any primitive scan on
its first leaf page without giving some consideration to how many times
the scan's arrays advanced while changing at least one skip array
(though continue not caring about the number of array advancements that
only affected SAOP arrays, even during skip scans with SAOP arrays).
Now when a scan performs more than 3 such array advancements in the
course of reading a single leaf page, it is taken as a signal that the
next page is unlikely to be skippable.  We'll therefore continue the
ongoing primitive index scan, at least until we can perform a recheck
against the next page's finaltup.

Testing has shown that this new heuristic occasionally makes all the
difference with skip scans that were expected to rely on the "passed
first page" heuristic added by commit 9a2e2a28.  Without it, there is a
remaining risk that certain kinds of skip scans will never quite manage
to clear the initial hurdle of performing a primitive scan that lasts
beyond its first leaf page (or that such a skip scan will only clear
that initial hurdle when it has already wasted noticeably-many cycles
due to inefficient primitive scan scheduling).

Follow-up to commits 92fe23d9 and 9a2e2a28.

Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Matthias van de Meent <boekewurm+postgres@gmail.com>
Discussion: https://postgr.es/m/CAH2-Wz=RVdG3zWytFWBsyW7fWH7zveFvTHed5JKEsuTT0RCO_A@mail.gmail.com
2025-04-04 13:58:05 -04:00
Masahiko Sawada
cf2655a902 pg_recvlogical: Add --failover option.
This new option instructs pg_recvlogical to create the logical
replication slot with the failover option enabled. It can be used in
conjunction with the --create-slot option.

Author: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Michael Banck <mbanck@gmx.net>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/OSCPR01MB14966C54097FC83AF19F3516BF5AC2@OSCPR01MB14966.jpnprd01.prod.outlook.com
2025-04-04 10:39:57 -07:00
Jeff Davis
3556c89321 Oversight in commit b81ffa13e3.
Should warn if a materialized view may be affected, as well.
2025-04-04 10:28:52 -07:00
Peter Geoghegan
8a510275dd Further optimize nbtree search scan key comparisons.
Postgres 17 commit e0b1ee17 added two complementary optimizations to
nbtree: the "prechecked" and "firstmatch" optimizations.  _bt_readpage
was made to avoid needlessly evaluating keys that are guaranteed to be
satisfied by applying page-level context.  "prechecked" did this for
keys required in the current scan direction, while "firstmatch" did it
for keys required in the opposite-to-scan direction only.

The "prechecked" design had a number of notable issues.  It didn't
account for the fact that an = array scan key's sk_argument field might
need to advance at the point of the page precheck (it didn't check the
precheck tuple against the key's array, only the key's sk_argument,
which needlessly made it ineffective in cases involving stepping to a
page having advanced the scan's arrays using a truncated high key).
"prechecked" was also completely ineffective when only one scan key
wasn't guaranteed to be satisfied by every tuple (it didn't recognize
that it was still safe to avoid evaluating other, earlier keys).

The "firstmatch" optimization had similar limitations.  It could only be
applied after _bt_readpage found its first matching tuple, regardless of
why any earlier tuples failed to satisfy the scan's index quals.  This
allowed unsatisfied non-required scan keys to impede the optimization.

Replace both optimizations with a new optimization, without any of these
limitations: the "startikey" optimization.  Affected _bt_readpage calls
generate a page-level key offset ("startikey"), that their _bt_checkkeys
calls can then start at.  This is an offset to the first key that isn't
known to be satisfied by every tuple on the page.

Although this is independently useful work, its main goal is to avoid
performance regressions with index scans that use skip arrays, but still
never manage to skip over irrelevant leaf pages.  We must avoid wasting
CPU cycles on overly granular skip array maintenance in these cases.
The new "startikey" optimization helps with this by selectively
disabling array maintenance for the duration of a _bt_readpage call.
This has no lasting consequences for the scan's array keys (they'll
still reliably track the scan's progress through the index's key space
whenever the scan is "between pages").

Skip scan adds skip arrays during preprocessing using simple, static
rules, and decides how best to navigate/apply the scan's skip arrays
dynamically, at runtime.  The "startikey" optimization enables this
approach.  As a result of all this, the planner doesn't need to generate
distinct, competing index paths (one path for skip scan, another for an
equivalent traditional full index scan).  The overall effect is to make
scan runtime close to optimal, even when the planner works off an
incorrect cardinality estimate.  Scans will also perform well given a
skipped column with data skew: individual groups of pages with many
distinct values (in respect of a skipped column) can be read about as
efficiently as before -- without the scan being forced to give up on
skipping over other groups of pages that are provably irrelevant.

Many scans that cannot possibly skip will still benefit from the use of
skip arrays, since they'll allow the "startikey" optimization to be as
effective as possible (by allowing preprocessing to mark all the scan's
keys as required).  A scan that uses a skip array on "a" for a qual
"WHERE a BETWEEN 0 AND 1_000_000 AND b = 42" is often much faster now,
even when every tuple read by the scan has its own distinct "a" value.
However, there are still some remaining regressions, affecting certain
trickier cases.

Scans whose index quals have several range skip arrays, each on some
high cardinality column, can still be slower than they were before the
introduction of skip scan -- even with the new "startikey" optimization.
There are also known regressions affecting very selective index scans
that use a skip array.  The underlying issue with such selective scans
is that they never get as far as reading a second leaf page, and so will
never get a chance to consider applying the "startikey" optimization.
In principle, all regressions could be avoided by teaching preprocessing
to not add skip arrays whenever they aren't expected to help, but it
seems best to err on the side of robust performance.

Follow-up to commit 92fe23d9, which added nbtree skip scan.

Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Reviewed-By: Masahiro Ikeda <ikedamsh@oss.nttdata.com>
Reviewed-By: Matthias van de Meent <boekewurm+postgres@gmail.com>
Discussion: https://postgr.es/m/CAH2-Wz=Y93jf5WjoOsN=xvqpMjRy-bxCE037bVFi-EasrpeUJA@mail.gmail.com
Discussion: https://postgr.es/m/CAH2-WznWDK45JfNPNvDxh6RQy-TaCwULaM5u5ALMXbjLBMcugQ@mail.gmail.com
2025-04-04 12:27:52 -04:00
Peter Geoghegan
92fe23d93a Add nbtree skip scan optimization.
Teach nbtree multi-column index scans to opportunistically skip over
irrelevant sections of the index given a query with no "=" conditions on
one or more prefix index columns.  When nbtree is passed input scan keys
derived from a predicate "WHERE b = 5", new nbtree preprocessing steps
output "WHERE a = ANY(<every possible 'a' value>) AND b = 5" scan keys.
That is, preprocessing generates a "skip array" (and an output scan key)
for the omitted prefix column "a", which makes it safe to mark the scan
key on "b" as required to continue the scan.  The scan is therefore able
to repeatedly reposition itself by applying both the "a" and "b" keys.
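
For example (editor's sketch; the table and index are hypothetical):

    CREATE INDEX t_a_b_idx ON t (a, b);
    -- no "=" condition on the prefix column "a"; the index can now be
    -- used, skipping over irrelevant sections keyed on "a"
    EXPLAIN SELECT * FROM t WHERE b = 5;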

A skip array has "elements" that are generated procedurally and on
demand, but otherwise works just like a regular ScalarArrayOp array.
Preprocessing can freely add a skip array before or after any input
ScalarArrayOp arrays.  Index scans with a skip array decide when and
where to reposition the scan using the same approach as any other scan
with array keys.  This design builds on the design for array advancement
and primitive scan scheduling added to Postgres 17 by commit 5bf748b8.

Testing has shown that skip scans of an index with a low cardinality
skipped prefix column can be multiple orders of magnitude faster than an
equivalent full index scan (or sequential scan).  In general, the
cardinality of the scan's skipped column(s) limits the number of leaf
pages that can be skipped over.

The core B-Tree operator classes on most discrete types generate their
array elements with the help of their own custom skip support routine.
This infrastructure gives nbtree a way to generate the next required
array element by incrementing (or decrementing) the current array value.
It can reduce the number of index descents in cases where the next
possible indexable value frequently turns out to be the next value
stored in the index.  Opclasses that lack a skip support routine fall
back on having nbtree "increment" (or "decrement") a skip array's
current element by setting the NEXT (or PRIOR) scan key flag, without
directly changing the scan key's sk_argument.  These sentinel values
behave just like any other value from an array -- though they can never
locate equal index tuples (they can only locate the next group of index
tuples containing the next set of non-sentinel values that the scan's
arrays need to advance to).

A skip array's range is constrained by "contradictory" inequality keys.
For example, a skip array on "x" will only generate the values 1 and 2
given a qual such as "WHERE x BETWEEN 1 AND 2 AND y = 66".  Such a skip
array qual usually has near-identical performance characteristics to a
comparable SAOP qual "WHERE x = ANY('{1, 2}') AND y = 66".  However,
improved performance isn't guaranteed.  Much depends on physical index
characteristics.

B-Tree preprocessing is optimistic about skipping working out: it
applies static, generic rules when determining where to generate skip
arrays, which assumes that the runtime overhead of maintaining skip
arrays will pay for itself -- or lead to only a modest performance loss.
As things stand, these assumptions are much too optimistic: skip array
maintenance will lead to unacceptable regressions with unsympathetic
queries (queries whose scan can't skip over many irrelevant leaf pages).
An upcoming commit will address the problems in this area by enhancing
_bt_readpage's approach to saving cycles on scan key evaluation, making
it work in a way that directly considers the needs of = array keys
(particularly = skip array keys).

Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Masahiro Ikeda <masahiro.ikeda@nttdata.com>
Reviewed-By: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Reviewed-By: Matthias van de Meent <boekewurm+postgres@gmail.com>
Reviewed-By: Tomas Vondra <tomas@vondra.me>
Reviewed-By: Aleksander Alekseev <aleksander@timescale.com>
Reviewed-By: Alena Rybakina <a.rybakina@postgrespro.ru>
Discussion: https://postgr.es/m/CAH2-Wzmn1YsLzOGgjAQZdn1STSG_y8qP__vggTaPAYXJP+G4bw@mail.gmail.com
2025-04-04 12:27:04 -04:00
Tom Lane
3ba2cdaa45 Stabilize regression test from c0962a113.
Per buildfarm.

Co-authored-by: Alena Rybakina <a.rybakina@postgrespro.ru>
Co-authored-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/srnuqlttuimzmvoulhsrbgvj4vnul6b65osswvua7sfkqsvmuy@yg7apybpxp34
2025-04-04 11:57:26 -04:00
Melanie Plageman
64e7fa43a9 Fix autoprewarm neglect of tablespaces
While prewarming blocks from a dump file, autoprewarm_database_main()
mistakenly ignored tablespace when detecting the beginning of the next
relation to prewarm. Because RelFileNumbers are only unique within a
tablespace, autoprewarm could miss prewarming blocks from a
relation with the same RelFileNumber in a different tablespace.

Though this situation is likely rare in practice, it's best to make the
code correct. Do so by explicitly checking the tablespace along with the
RelFileNumber when detecting a new relation.

Reported-by: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://postgr.es/m/97c36982-603b-494a-95f4-aaf2a12ac27e%40iki.fi
2025-04-04 11:34:06 -04:00
Nathan Bossart
742317a80f Add commit e1a8b1ad58 to .git-blame-ignore-revs. 2025-04-04 09:41:59 -05:00
Nathan Bossart
e1a8b1ad58 Re-pgindent pg_largeobject.c after commit 0d6c477664. 2025-04-04 09:38:22 -05:00
Alexander Korotkov
c0962a113d Convert 'x IN (VALUES ...)' to 'x = ANY ...' when appropriate
This commit implements the automatic conversion of 'x IN (VALUES ...)' into
ScalarArrayOpExpr.  That simplifies the query tree, eliminating the appearance
of an unnecessary join.

Since VALUES describes a relational table, and the value of such a list is
a table row, the optimizer will likely face an underestimation problem due to
the inability to estimate cardinality through MCV statistics.  The cardinality
evaluation mechanism can work with the array inclusion check operation.
If the array is small enough (< 100 elements), it will perform a statistical
evaluation element by element.

We perform the transformation in convert_ANY_sublink_to_join() if the VALUES
RTE is suitable and the transformation is applicable.  The conversion is only
possible for operations on scalar values, not rows.  Also, we currently
support the transformation only when it ends up with a constant array.
Otherwise, the evaluation of non-hashed SAOP might be slower than the
corresponding Hash Join with VALUES.
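
For illustration (editor's example; the table is hypothetical):

    -- now planned as if written: WHERE x = ANY (ARRAY[1, 2, 3])
    EXPLAIN SELECT * FROM t WHERE x IN (VALUES (1), (2), (3));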

Discussion: https://postgr.es/m/0184212d-1248-4f1f-a42d-f5cb1c1976d2%40tantorlabs.com
Author: Alena Rybakina <a.rybakina@postgrespro.ru>
Author: Andrei Lepikhov <lepihov@gmail.com>
Reviewed-by: Ivan Kush <ivan.kush@tantorlabs.com>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
2025-04-04 16:01:50 +03:00
Alexander Korotkov
d48d2e2dc8 Extract make_SAOP_expr() function from match_orclause_to_indexcol()
This commit extracts the code to generate ScalarArrayOpExpr on top of the list
of expressions from match_orclause_to_indexcol() into a separate function
make_SAOP_expr().  This function was extracted to be used in the optimization
that converts 'x IN (VALUES ...)' to 'x = ANY ...'.  make_SAOP_expr() is
placed in the clauses.c file, as only two additional headers were needed
there compared with other candidate locations.

Discussion: https://postgr.es/m/0184212d-1248-4f1f-a42d-f5cb1c1976d2%40tantorlabs.com
Author: Alena Rybakina <a.rybakina@postgrespro.ru>
Author: Andrei Lepikhov <lepihov@gmail.com>
Reviewed-by: Ivan Kush <ivan.kush@tantorlabs.com>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
2025-04-04 16:01:28 +03:00
Peter Eisentraut
ee1ae8b99f Fix crash/valgrind error
Fix for commit 9ef1851685b: We have to skip indexes where sortopfamily
is NULL.  This takes the place of the previous btree check.  Detected
by valgrind on the buildfarm.
2025-04-04 14:45:53 +02:00
Heikki Linnakangas
b4f453f6ab docs: Clarify that NULL arg to set_config() means reset to default
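
A minimal illustration of the clarified behavior (parameter chosen
arbitrarily):

    SELECT set_config('work_mem', NULL, false);  -- same effect as RESET work_mem
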
Author: David G. Johnston <david.g.johnston@gmail.com>
Reviewed-by: Zhang Mingli <zmlpostgres@gmail.com>
Discussion: https://www.postgresql.org/message-id/CAKFQuwY0SK6JdCci1VJX6xsztRXgGeVEY-grkENZx%2B3CZpyPcQ@mail.gmail.com
2025-04-04 15:17:17 +03:00
Heikki Linnakangas
7afca7edef Relax assertion in finding correct GiST parent
Commit 28d3c2ddcf introduced an assertion that if the memorized
downlink location in the insertion stack isn't valid, the parent's
LSN should've changed too. Turns out that was too strict. In
gistFindCorrectParent(), if we walk right, we update the parent's
block number and clear its memorized 'downlinkoffnum'. That triggered
the assertion on next call to gistFindCorrectParent(), if the parent
needed to be split too. Relax the assertion, so that it's OK if
downlinkOffnum is InvalidOffsetNumber.

Backpatch to v13-, all supported versions. The assertion was added in
commit 28d3c2ddcf in v12.

Reported-by: Alexander Lakhin <exclusion@gmail.com>
Reviewed-by: Tender Wang <tndrwang@gmail.com>
Discussion: https://www.postgresql.org/message-id/18396-03cac9beb2f7aac3@postgresql.org
2025-04-04 13:49:00 +03:00
Fujii Masao
534874fac0 Allow "COPY table TO" command to copy rows from materialized views.
Previously, "COPY table TO" command worked only with plain tables and
did not support materialized views, even when they were populated and
had physical storage. To copy rows from materialized views,
"COPY (query) TO" command had to be used, instead.

This commit extends "COPY table TO" to support populated materialized
views directly, improving usability and performance, as "COPY table TO"
is generally faster than "COPY (query) TO". Note that copying from
unpopulated materialized views will still result in an error.
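
A short sketch of the new behavior (names invented):

    CREATE MATERIALIZED VIEW mv AS SELECT 1 AS x;
    COPY mv TO STDOUT;                         -- now supported
    REFRESH MATERIALIZED VIEW mv WITH NO DATA;
    COPY mv TO STDOUT;                         -- still an error: unpopulated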

Author: jian he <jian.universality@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: David G. Johnston <david.g.johnston@gmail.com>
Reviewed-by: Vignesh C <vignesh21@gmail.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/CACJufxHVxnyRYy67hiPePNCPwVBMzhTQ6FaL9_Te5On9udG=yg@mail.gmail.com
2025-04-04 19:32:00 +09:00
Peter Eisentraut
9ef1851685 Support non-btree indexes in get_actual_variable_range()
This was previously not supported because the btree strategy numbers
were hardcoded.  Now we can support this for any index that has the
required strategy mapping support and the required operators.

If an index scan used for get_actual_variable_range() requires
recheck, we now just ignore it instead of erroring out.  With btree we
knew this couldn't happen, but now it might.

Author: Mark Dilger <mark.dilger@enterprisedb.com>
Co-authored-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://www.postgresql.org/message-id/flat/E72EAA49-354D-4C2E-8EB9-255197F55330@enterprisedb.com
2025-04-04 12:21:34 +02:00
Fujii Masao
0d6c477664 Extend ALTER DEFAULT PRIVILEGES to define default privileges for large objects.
Previously, ALTER DEFAULT PRIVILEGES did not support large objects.
This meant that to grant privileges to users other than the owner,
permissions had to be manually assigned each time a large object
was created, which was inconvenient.

This commit extends ALTER DEFAULT PRIVILEGES to allow defining default
access privileges for large objects. With this change, specified privileges
will automatically apply to newly created large objects, making privilege
management more efficient.

As a side effect, this commit introduces the new keyword OBJECTS
since it's used in the syntax of ALTER DEFAULT PRIVILEGES.
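
For example (role names invented), large objects subsequently created by
alice become readable by bob without per-object GRANTs:

    ALTER DEFAULT PRIVILEGES FOR ROLE alice
        GRANT SELECT ON LARGE OBJECTS TO bob;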

Original patch by Haruka Takatsuka, with some fixes and tests by Yugo Nagata,
and rebased by Laurenz Albe.

Author: Takatsuka Haruka <harukat@sraoss.co.jp>
Co-authored-by: Yugo Nagata <nagata@sraoss.co.jp>
Co-authored-by: Laurenz Albe <laurenz.albe@cybertec.at>
Reviewed-by: Masao Fujii <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/20240424115242.236b499b2bed5b7a27f7a418@sraoss.co.jp
2025-04-04 19:02:17 +09:00
Heikki Linnakangas
6e9c81836e Use standard die() signal handler in walreceiver
This gets rid of the bespoke ProcessWalRcvInterrupts() function,
which lets walreceiver terminate at any CHECK_FOR_INTERRUPTS() call.
And it's less code anyway.

We can now use the standard libpqsrv_connect_params() libpq wrapper
from libpq-be-fe-helpers.h, removing more code. We attempted to do
that earlier already in commit 728f86fec6, but that was reverted
because it didn't call ProcessWalRcvInterrupts() and therefore didn't
react to shutdown requests. Now that ProcessWalRcvInterrupts() is
gone, it works. As stated in that commit, this also leads to
libpqwalreceiver reserving file descriptors for libpq connections,
which is nice.

Author: Andres Freund <andres@anarazel.de> (the earlier commit)
Author: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Yura Sokolov <y.sokolov@postgrespro.ru>
2025-04-04 12:38:32 +03:00
Peter Eisentraut
8123e91f5a Convert PathKey to use CompareType
Change the PathKey struct to use CompareType to record the sort
direction instead of hardcoding btree strategy numbers.  The
CompareType is then converted to the index-type-specific strategy when
the plan is created.

This reduces the number of places btree strategy numbers are
hardcoded, and it's a self-contained subset of a larger effort to
allow non-btree indexes to behave like btrees.

Author: Mark Dilger <mark.dilger@enterprisedb.com>
Co-authored-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://www.postgresql.org/message-id/flat/E72EAA49-354D-4C2E-8EB9-255197F55330@enterprisedb.com
2025-04-04 11:22:20 +02:00
Daniel Gustafsson
daa16893fa doc: Clarify the system value for sslrootcert
The documentation for the special value "system" for sslrootcert could
be misinterpreted to mean the default operating system CA store, which
it may be, but it's defined to be the default CA store of the SSL
library in use.

Backpatch down to v16 where support for the system value was added.

Author: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: George MacKerron <george@mackerron.co.uk>
Discussion: https://postgr.es/m/B3CBBAA3-6EA3-4AB7-8619-4BBFAB93DDB4@yesql.se
Backpatch-through: 16
2025-04-04 09:47:36 +02:00
Amit Kapila
898c131b58 pg_createsubscriber: Improve error messages.
Use the option name consistently in the error messages where
applicable. Also, change the code to use pg_fatal() instead of a
combination of pg_log_error() and exit().

Author: vignesh C <vignesh21@gmail.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/CALDaNm0HxF1RH27LP7VisLzNsSJbssy8a64M5p6UduDaBq6-ag@mail.gmail.com
2025-04-04 10:58:59 +05:30
Fujii Masao
d5d85f1881 Fix logical decoding test to correctly check slot removal on standby.
The regression test for logical decoding verifies whether a logical slot
is correctly dropped on a standby when its associated database is dropped.
However, the test mistakenly retrieved slot information from the primary
instead of the standby, causing incorrect behavior.

This commit fixes the issue by ensuring the test correctly checks the slot
on the standby.

Back-patch to all supported versions.

Author: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/1fdfd020-a509-403c-bd8f-a04664aba148@oss.nttdata.com
Backpatch-through: 13
2025-04-04 13:32:46 +09:00
Fujii Masao
c754bdd8a2 Fix logical decoding regression tests to correctly check slot existence.
The regression tests for logical decoding verify whether a logical slot
exists or has been dropped. Previously, these tests attempted to
retrieve "slot_name" from the result of slot(), but since "slot_name" was
not included in the result, slot()->{'slot_name'} always returned undef,
leading to incorrect behavior.

This commit fixes the issue by checking the "plugin" field in the result
of slot() instead, ensuring the tests properly verify slot existence.

Back-patch to all supported versions.

Author: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/OSCPR01MB149667EC4E738769CA80B7EA5F5AE2@OSCPR01MB14966.jpnprd01.prod.outlook.com
Backpatch-through: 13
2025-04-04 13:09:06 +09:00
Tomas Vondra
1aff1dc8df Revert "Improve accounting for memory used by shared hash tables"
This reverts commit f5930f9a98.

This broke the expansion of private hash tables, which reallocates the
directory. But that's impossible when it's allocated together with the
other fields, and dir_realloc() failed with BogusFree. Clearly, this
needs rethinking.

Discussion: https://postgr.es/m/CAApHDvriCiNkm=v521AP6PKPfyWkJ++jqZ9eqX4cXnhxLv8w-A@mail.gmail.com
2025-04-04 04:43:50 +02:00
Amit Langote
88f55bc976 Make derived clause lookup in EquivalenceClass more efficient
Derived clauses are stored in ec_derives, a List of RestrictInfos.
These clauses are later looked up by matching the left and right
EquivalenceMembers along with the clause's parent EC.

This linear search becomes expensive in queries with many joins or
partitions, where ec_derives may contain thousands of entries. In
particular, create_join_clause() can spend significant time scanning
this list.

To improve performance, introduce a hash table (ec_derives_hash) that
is built when the list reaches 32 entries -- the same threshold used
for join_rel_hash. The original list is retained alongside the hash
table to support EC merging and serialization
(_outEquivalenceClass()).

Each clause is stored in the hash table using a canonicalized key: the
EquivalenceMember with the lower memory address is placed in the key
before the one with the higher memory address. This avoids storing or
searching for both permutations of the same clause. For clauses
involving a constant EM, the key places NULL in the first slot and the
non-constant EM in the second.

The hash table is initialized using list_length(ec_derives_list) as
the size hint. simplehash internally adjusts this to the next power of
two after dividing by the fillfactor, so this typically results in at
least 64 buckets near the threshold -- avoiding immediate resizing
while adapting to the actual number of entries.

The lookup logic for derived clauses is now centralized in
ec_search_derived_clause_for_ems(), which consults the hash table when
available and falls back to the list otherwise.

The new ec_clear_derived_clauses() always frees ec_derives_list, even
though some of the original code paths that cleared the old
ec_derives field did not. This ensures consistent cleanup and avoids
leaking memory when large lists are discarded.

An assertion originally placed in find_derived_clause_for_ec_member()
is moved into ec_search_derived_clause_for_ems() so that it is
enforced consistently, regardless of whether the hash table or list is
used for lookup.

This design incorporates suggestions by David Rowley, who proposed
both the key canonicalization and the initial sizing approach to
balance memory usage and CPU efficiency.

Author: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Amit Langote <amitlangote09@gmail.com>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Tested-by: Dmitry Dolgov <9erthalion6@gmail.com>
Tested-by: Alvaro Herrera <alvherre@alvh.no-ip.org>
Tested-by: Amit Langote <amitlangote09@gmail.com>
Tested-by: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/CAExHW5vZiQtWU6moszLP5iZ8gLX_ZAUbgEX0DxGLx9PGWCtqUg@mail.gmail.com
2025-04-04 10:45:05 +09:00
Amit Langote
887160d1be Add assertion to verify derived clause has constant RHS
find_derived_clause_for_ec_member() searches for a previously-derived
clause that equates a non-constant EquivalenceMember to a constant.
It is only called for EquivalenceClasses with ec_has_const set, and
with a non-constant member the EquivalenceMember to search for.

The matched clause is expected to have the non-constant member on the
left-hand side and the constant EquivalenceMember on the right.

Assert that the RHS is indeed a constant, to catch violations of this
structure and enforce assumptions made by
generate_base_implied_equalities_const().

Author: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Amit Langote <amitlangote09@gmail.com>
Discussion: https://postgr.es/m/CAExHW5scMxyFRqOFE6ODmBiW2rnVBEmeEcA-p4W_CyuEikURdA@mail.gmail.com
2025-04-04 10:45:05 +09:00
Melanie Plageman
67be093562 Use AIO batchmode for bitmap heap scans
Previously bitmap heap scan was not AIO batchmode safe because of the
visibility map reads potentially done for the "skip fetch" optimization
(which skipped fetching tuples from the heap if the pages were all
visible and none of the columns were used in the query).

The skip fetch optimization implementation was found to have bugs and
was removed in 459e7bf8e2, so we can safely enable batchmode for
bitmap heap scans.
2025-04-03 18:23:02 -04:00
Melanie Plageman
54a3615f15 Remove misleading read stream asserts in a few users
Several read stream users asserted that the read stream was exhausted
after looping on that very condition. It was pointed out in a
review of an as-yet-uncommitted read stream user [1] that this was
confusing and could lead the reader to think there was a possibility of
some kind of race condition. Remove these asserts.

[1] https://postgr.es/m/F9ACE8D0-B807-4A17-B6BD-87EF0717983D%40yesql.se
2025-04-03 18:22:37 -04:00
Tom Lane
dbd437e670 Fix oversight in commit 0dca5d68d.
As coded, fmgr_sql() would get an assertion failure for a SQL function
that has an empty body and is declared to return some type other than
VOID.  Typically you'd never get that far because fmgr_sql_validator()
would reject such a definition (I suspect that's how come I managed to
miss the bug).  But if check_function_bodies is off or the function is
polymorphic, the validation check wouldn't get made.
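
A sketch of a reproducer (only reachable with body validation disabled):

    SET check_function_bodies = off;
    CREATE FUNCTION f() RETURNS int LANGUAGE sql AS '';
    SELECT f();  -- previously an assertion failure in assert-enabled builds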

Reported-by: Alexander Lakhin <exclusion@gmail.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/0fde377a-3870-4d18-946a-ce008ee5bb88@gmail.com
2025-04-03 16:03:12 -04:00
Daniel Gustafsson
46c4c7cbc6 oauth: Remove timeout from t/002_client when not needed
The connect_timeout=1 setting for the --hang-forever test was left in
place and used by later tests, causing unexpected timeouts on slower
buildfarm animals. Remove it when no longer needed.

Per buildfarm member skink, reported by Andres on Discord.

Author: Jacob Champion <jacob.champion@enterprisedb.com>
Reported-by: Andres Freund <andres@anarazel.de>
2025-04-03 20:41:09 +02:00
Daniel Gustafsson
8ae0a37932 oauth: Fix build on platforms without epoll/kqueue
register_socket() missed a variable declaration if neither
HAVE_SYS_EPOLL_H nor HAVE_SYS_EVENT_H was defined.

While we're fixing that, adjust the tests to check pg_config.h for one
of the multiplexer implementations, rather than assuming that Windows is
the only platform without support. (Christoph reported this on
hurd-amd64, an experimental Debian.)

Author: Jacob Champion <jacob.champion@enterprisedb.com>
Reported-by: Christoph Berg <myon@debian.org>
Discussion: https://postgr.es/m/Z-sPFl27Y0ZC-VBl%40msg.df7cb.de
2025-04-03 20:37:52 +02:00
Jeff Davis
945126234b Fix unintentional 'NULL' string literal in pg_upgrade.
Introduced in 2a083ab807.

Discussion: https://postgr.es/m/e852442da35b4f31acc600ed98bbee0f12e65e0c.camel@j-davis.com
Reviewed-by: Michael Paquier <michael@paquier.xyz>
2025-04-03 11:04:37 -07:00
Jeff Davis
b81ffa13e3 pg_upgrade check for Unicode-dependent relations.
This check will not cause an upgrade failure, only a warning.

Discussion: https://postgr.es/m/ef03d678b39a64392f4b12e0f59d1495c740969e.camel%40j-davis.com
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
2025-04-03 10:45:38 -07:00
Masahiko Sawada
fd09c1316b Restrict copying of invalidated replication slots.
Previously, invalidated logical and physical replication slots could
be copied using the pg_copy_logical_replication_slot and
pg_copy_physical_replication_slot functions. Replication slots that
were invalidated for reasons other than WAL removal retained their
restart_lsn. This meant that a new slot copied from an invalidated
slot could have a restart_lsn pointing to a WAL segment that might
have already been removed.

This commit restricts the copying of invalidated replication slots.
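
For illustration (slot names invented; assume old_slot has been
invalidated):

    SELECT pg_copy_logical_replication_slot('old_slot', 'new_slot');
    -- now rejected with an error rather than creating a slot whose
    -- restart_lsn may point to already-removed WAL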

Backpatch to v16, where slots could retain their restart_lsn when
invalidated for reasons other than WAL removal.

For v15 and earlier, this check is not required since slots can only
be invalidated due to WAL removal, and existing checks already handle
this issue.

Author: Shlok Kyal <shlok.kyal.oss@gmail.com>
Reviewed-by: vignesh C <vignesh21@gmail.com>
Reviewed-by: Zhijie Hou <houzj.fnst@fujitsu.com>
Reviewed-by: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/CANhcyEU65aH0VYnLiu%3DOhNNxhnhNhwcXBeT-jvRe1OiJTo_Ayg%40mail.gmail.com
Backpatch-through: 16
2025-04-03 10:30:00 -07:00
Álvaro Herrera
f104192e52
Remove duplicate set of print_notnull
I inserted the second one by mistake in commit 14e87ffa5c.

Reported-by: jian he <jian.universality@gmail.com>
Confirmed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Discussion: https://postgr.es/m/CACJufxFqckBFxPfCixHHbOr0zMLksviTj2m3o12-tErfx_PvTg@mail.gmail.com
2025-04-03 17:34:25 +02:00
Daniel Gustafsson
b82e7eddb0 Add missing declarations to pg_config.h.in
Add missing pg_config.h.in declarations from 09be391126
where the corresponding autoconf/meson declarations were
added.

Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://postgr.es/m/70145721-6949-4ABF-BB54-63F866488DF8@yesql.se
2025-04-03 13:57:27 +02:00
Daniel Gustafsson
2da74d8d64 libpq: Add support for dumping SSL key material to file
This adds a new connection parameter which instructs libpq to
write out keymaterial clientside into a file in order to make
connection debugging with Wireshark and similar tools possible.
The file format used is the standardized NSS format.

Author: Abhishek Chanda <abhishek.becs@gmail.com>
Co-authored-by: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Jacob Champion <jacob.champion@enterprisedb.com>
Discussion: https://postgr.es/m/CAKiP-K85C8uQbzXKWf5wHQPkuygGUGcufke713iHmYWOe9q2dA@mail.gmail.com
2025-04-03 13:16:43 +02:00
Heikki Linnakangas
e4309f73f6 Add support for sorted gist index builds to btree_gist
This enables sortsupport in the btree_gist extension for faster builds
of gist indexes.

The sorted gist index build strategy is now the default. Regression
test output is unchanged (except for one small change in the 'enum' test
to add coverage for enum values added later), and the tests now exercise
the sorted build strategy.

One version of this was committed a long time ago already, in commit
9f984ba6d2, but it was quickly reverted because of buildfarm
failures. The failures were presumably caused by some small bugs, but
we never got around to debug and commit it again. This patch was
written from scratch, implementing the same idea, with some fragments
and ideas from the original patch.

Author: Bernd Helmle <mailings@oopsware.de>
Author: Andrey Borodin <x4mmm@yandex-team.ru>
Discussion: https://www.postgresql.org/message-id/64d324ce2a6d535d3f0f3baeeea7b25beff82ce4.camel@oopsware.de
2025-04-03 13:46:35 +03:00
Heikki Linnakangas
9370978da8 Fix boilerplate comments in btree_gist
A few of these were copy-pasted wrong, like the comment "Bytea ops" in
btree_numeric.c. Instead of fixing the incorrect ones, replace them
all with generic comment "GiST support functions".

Also tidy up the inconsistent newlines between various functions while
we're at it.
2025-04-03 13:39:33 +03:00
Peter Eisentraut
82a46cca99 Update Unicode data to Unicode 16.0.0
Reviewed-by: Jeff Davis <pgsql@j-davis.com>
Discussion: https://www.postgresql.org/message-id/flat/146349e4-4687-4321-91af-f235572490a8@eisentraut.org
2025-04-03 12:00:09 +02:00
Peter Eisentraut
231064aa0f plpython: Add test for returning Python set from SETOF function
This is claimed in the documentation, but there was no test case for
it.

Reported-by: Bogdan Grigorenko <gri.bogdan.2020@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/173543330569.680.6706329879058172623%40wrigleys.postgresql.org
2025-04-03 11:09:50 +02:00
Amit Kapila
d1d83827ba Doc: Improve -R option added in e5aeed4b80.
Author: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: vignesh C <vignesh21@gmail.com>
Discussion: https://postgr.es/m/CAHut+PvJPnaL=70SbBe3fYg2nq74Z=Yv4X=zRpUWYfOi-q6=2w@mail.gmail.com
2025-04-03 14:27:13 +05:30
Álvaro Herrera
8806e4e8de
002_pg_upgrade.pl: Move pg_dump test code for better stability
The alleged "statistics pg_dump bug" that prevented us from enabling
stats dumping in commit 172259afb5 wasn't a pg_dump bug after all: it
was just a side effect of not running pg_dump at the right time (namely,
before giving autovacuum some time to do its thing and then disabling it
to stabilize things).  Move the code around to fix this problem and
enable statistics dumping.

Author: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Diagnosed-by: Jeff Davis <pgsql@j-davis.com>
Discussion: https://postgr.es/m/5f3703fd7f27da62a8f3615218f937507f522347.camel@j-davis.com
Discussion: https://postgr.es/m/CAExHW5sDm+aGb7A4EXK=X9rkrmSPDgc03EdADt=wWkdMO=XPSA@mail.gmail.com
2025-04-03 10:16:24 +02:00
Álvaro Herrera
abe56227b2
002_pg_upgrade.pl: rename some variables for clarity
This renames %node_params to %old_node_params, @initdb_params to
@old_initdb_params, and adds separate @new_initdb_params and
%new_node_params rather than reusing the former in confusing ways.

Extracted from a larger patch from the same author.

Author: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Discussion: https://postgr.es/m/CAExHW5sDm+aGb7A4EXK=X9rkrmSPDgc03EdADt=wWkdMO=XPSA@mail.gmail.com
2025-04-03 09:56:58 +02:00
Richard Guo
ea5d3f5233 Remove duplicated comment in get_relation_constraints
The check for non-inheritable constraints is performed later, and the
same comment is included at that point.

While we're here, remove one extraneous blank line.

Author: jian he <jian.universality@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Richard Guo <guofenglinux@gmail.com>
Discussion: https://postgr.es/m/CACJufxETi6x86S8EkH8mRfOcm2AenoE9t1pyCFVMpU34gVhF3w@mail.gmail.com
2025-04-03 16:43:53 +09:00
Peter Eisentraut
84fea854c9 Update Unicode data to CLDR 47
No actual changes result.
2025-04-03 09:20:25 +02:00
Peter Eisentraut
bbf24fe2f1 Update code comment
Commit 4e7f62bc38 added a new input file to a script but didn't
update the comment listing the input files.
2025-04-03 09:20:25 +02:00
Peter Eisentraut
34f04aa653 Fix update-unicode make target
The addition of SpecialCasing.txt by commit 286a365b9c was not added
to the make target dependencies, so the invoked script would fail
because the required file wasn't downloaded first.  (The meson version
appears to work correctly.)
2025-04-03 09:20:25 +02:00
Amit Kapila
4868c96bc8 Fix slot synchronization for two_phase enabled slots.
The issue is that the transactions prepared before two-phase decoding is
enabled can fail to replicate to the subscriber after being committed on a
promoted standby following a failover. This is because the two_phase_at
field of a slot, which tracks the LSN from which two-phase decoding
starts, is not synchronized to standby servers. Without two_phase_at, the
logical decoding might incorrectly identify prepared transactions as
already replicated to the subscriber after promotion of the standby
server, causing them to be skipped.

To address the issue on HEAD, the two_phase_at field of the slot is
exposed by the pg_replication_slots view and allows the slot
synchronization to copy this value to the corresponding synced slot on the
standby server.
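
The new field can be inspected like any other slot attribute:

    SELECT slot_name, two_phase, two_phase_at FROM pg_replication_slots;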

This bug is likely to occur if the user toggles the two_phase option to
true after initial slot creation. Given that altering the two_phase option
of a replication slot is not allowed in PostgreSQL 17, this bug is less
likely to occur. We can't change the view/function definition in a
backbranch, so we can't push the same fix there, but we are brainstorming
an appropriate solution for PG17.

Author: Zhijie Hou <houzj.fnst@fujitsu.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/TYAPR01MB5724CC7C288535BBCEEE65DA94A72@TYAPR01MB5724.jpnprd01.prod.outlook.com
2025-04-03 12:26:54 +05:30
Tom Lane
a7187c3723 Remove unnecessary type violation in tsvectorrecv().
compareentry() is declared to work on WordEntryIN structs, but
tsvectorrecv() is using it in two places to work on WordEntry
structs.  This is almost okay, since WordEntry is the first
field of WordEntryIN.  But on machines with 8-byte pointers,
WordEntryIN will have a larger alignment spec than WordEntry,
and it's at least theoretically possible that the compiler
could generate code that depends on the larger alignment.

Given the lack of field reports, this may be just a hypothetical bug
that upsets nothing except sanitizer tools.  Or it may be real on
certain hardware but nobody's tried to use tsvectorrecv() on such
hardware.  In any case we should fix it, and the fix is trivial:
just change compareentry() so that it works on WordEntry without any
mention of WordEntryIN.  We can also get rid of the quite-useless
intermediate function WordEntryCMP.

Bug: #18875
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/18875-07a29c49c825a608@postgresql.org
Backpatch-through: 13
2025-04-02 16:17:43 -04:00
Andres Freund
24da5b239a Add test for HeapBitmapScan's broken skip_fetch optimization
In the previous commit HeapBitmapScan's skip_fetch optimization was removed,
due to being broken in not easily fixable ways. Add a test that verifies we
don't re-introduce this bug if somebody tries to re-add the feature.

Only add the test to master for now; it's possible it's not entirely
stable. That seems sufficient, as we're not going to re-introduce the feature
on the backbranches. I did verify that the test passes on all branches. If the
test turns out to be unproblematic, we can backpatch it later, should we feel
a need to do so.

Discussion: https://postgr.es/m/CAEze2Wg3gXXZTr6_rwC+s4-o2ZVFB5F985uUSgJTsECx6AmGcQ@mail.gmail.com
2025-04-02 14:58:39 -04:00
Andres Freund
459e7bf8e2 Remove HeapBitmapScan's skip_fetch optimization
The optimization does not take the removal of TIDs by a concurrent vacuum into
account. The concurrent vacuum can remove dead TIDs and make pages ALL_VISIBLE
while those dead TIDs are referenced in the bitmap. This can lead to a
skip_fetch scan returning too many tuples.

It likely would be possible to implement this optimization safely, but we
don't have the necessary infrastructure in place. Nor is it clear that it's
worth building that infrastructure, given how limited the skip_fetch
optimization is.

In the backbranches we just disable the optimization by always passing
need_tuples=true to table_beginscan_bm(). We can't perform API/ABI changes in
the backbranches and we want to make the change as minimal as possible.

Author: Matthias van de Meent <boekewurm+postgres@gmail.com>
Reported-By: Konstantin Knizhnik <knizhnik@garret.ru>
Discussion: https://postgr.es/m/CAEze2Wg3gXXZTr6_rwC+s4-o2ZVFB5F985uUSgJTsECx6AmGcQ@mail.gmail.com
Backpatch-through: 13
2025-04-02 14:54:20 -04:00
Tom Lane
0dca5d68d7 Change SQL-language functions to use the plan cache.
In the historical implementation of SQL functions (if they don't get
inlined), we built plans for all the contained queries at first call
within an outer query, and then re-used those plans for the duration
of the outer query, and then forgot everything.  This was not ideal,
not least because the plans could not be customized to specific values
of the function's parameters.  Our plancache infrastructure seems
mature enough to be used here.  That will solve both the problem with
not being able to build custom plans and the problem with not being
able to share work across successive outer queries.

Aside from those performance concerns, this change fixes a
longstanding bugaboo with SQL functions: you could not write DDL that
would affect later statements in the same function.  That's mostly
still true with new-style SQL functions, since the results of parse
analysis are baked into the stored query trees (and protected by
dependency records).  But for old-style SQL functions, it will now
work much as it does with PL/pgSQL functions, because we delay parse
analysis and planning of each query until we're ready to run it.
Some edge cases that require replanning are now handled better too;
see for example the new rowsecurity test, where we now detect an RLS
context change that was previously missed.

One other edge-case change that might be worthy of a release note
is that we now insist that a SQL function's result be generated
by the physically-last query within it.  Previously, if the last
original query was deleted by a DO INSTEAD NOTHING rule, we'd be
willing to take the result from the preceding query instead.
This behavior was undocumented except in source-code comments,
and it seems hard to believe that anyone's relying on it.
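
A sketch of the now-working DDL pattern in an old-style SQL function
(names invented; check_function_bodies must be off so the reference to
the not-yet-created table passes CREATE FUNCTION):

    SET check_function_bodies = off;
    CREATE FUNCTION make_and_count() RETURNS int LANGUAGE sql AS $$
        CREATE TABLE scratch (x int);
        INSERT INTO scratch VALUES (1), (2);
        SELECT count(*)::int FROM scratch;
    $$;
    SELECT make_and_count();  -- previously failed at first call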

Along the way to this feature, we needed a few infrastructure changes:

* The plancache can now take either a raw parse tree or an
analyzed-but-not-rewritten Query as the starting point for a
CachedPlanSource.  If given a Query, it is caller's responsibility
that nothing will happen to invalidate that form of the query.
We use this for new-style SQL functions, where what's in pg_proc is
serialized Query(s) and we trust the dependency mechanism to disallow
DDL that would break those.

* The plancache now offers a way to invoke a post-rewrite callback
to examine/modify the rewritten parse tree when it is rebuilding
the parse trees after a cache invalidation.  We need this because
SQL functions sometimes adjust the parse tree to make its output
exactly match the declared result type; if the plan gets rebuilt,
that has to be re-done.

* There is a new backend module utils/cache/funccache.c that
abstracts the idea of caching data about a specific function
usage (a particular function and set of input data types).
The code in it is moved almost verbatim from PL/pgSQL, which
has done that for a long time.  We use that logic now for
SQL-language functions too, and maybe other PLs will have use
for it in the future.

Author: Alexander Pyhalov <a.pyhalov@postgrespro.ru>
Co-authored-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Pavel Stehule <pavel.stehule@gmail.com>
Discussion: https://postgr.es/m/8216639.NyiUUSuA9g@aivenlaptop
2025-04-02 14:06:02 -04:00
Heikki Linnakangas
e9e7b66044 Add GiST and btree sortsupport routines for range types
For GiST, having a sortsupport function allows building the index
using the "sorted build" method, which is much faster.

For b-tree, the sortsupport routine doesn't give any new
functionality, but speeds up sorting a tiny bit. The difference is not
very significant, about 2% in cursory testing on my laptop, because
the range type comparison function has quite a lot of overhead from
detoasting. In any case, since we have the function for GiST anyway,
we might as well register it for the btree opfamily too.

Author: Bernd Helmle <mailings@oopsware.de>
Discussion: https://www.postgresql.org/message-id/64d324ce2a6d535d3f0f3baeeea7b25beff82ce4.camel@oopsware.de
2025-04-02 19:51:28 +03:00
Heikki Linnakangas
ea3f9b6da3 docs: Fix column count attribute in table
Nothing seems to actually depend on the attribute, as the docs built
successfully, but let's be tidy.

Reported offlist by Matthias van de Meent
2025-04-02 18:21:07 +03:00
Tomas Vondra
46df9487d9 Improve accounting for PredXactList, RWConflictPool and PGPROC
Various places allocated shared memory by first allocating a small chunk
using ShmemInitStruct(), followed by ShmemAlloc() calls to allocate more
memory. Unfortunately, ShmemAlloc() does not update ShmemIndex, so this
affected pg_shmem_allocations - it only showed the initial chunk.

This commit modifies the following allocations, to allocate everything
as a single chunk, and then split it internally.

- PredXactList
- RWConflictPool
- PGPROC structures
- Fast-Path Lock Array

The fast-path lock array is allocated separately, not as a part of the
PGPROC structures allocation.

Author: Rahila Syed <rahilasyed90@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Nazir Bilal Yavuz <byavuz81@gmail.com>
Reviewed-by: Tomas Vondra <tomas@vondra.me>
Discussion: https://postgr.es/m/CAH2L28vHzRankszhqz7deXURxKncxfirnuW68zD7+hVAqaS5GQ@mail.gmail.com
2025-04-02 17:14:28 +02:00
Tomas Vondra
f5930f9a98 Improve accounting for memory used by shared hash tables
pg_shmem_allocations tracks the memory allocated by ShmemInitStruct(),
but for shared hash tables that covered only the header and hash
directory.  The remaining parts (segments and buckets) were allocated
later using ShmemAlloc(), which does not update the shmem accounting.
Thus, these allocations were not shown in pg_shmem_allocations.

This commit improves the situation by allocating all the hash table
parts at once, using a single ShmemInitStruct() call. This way the
ShmemIndex entries (and thus pg_shmem_allocations) better reflect the
proper size of the hash table.

This affects allocations for private (non-shared) hash tables too, as
the hash_create() code is shared. For non-shared tables this however
makes no practical difference.

This changes the alignment a bit. ShmemAlloc() aligns the chunks using
CACHELINEALIGN(), which means some parts (header, directory, segments)
were aligned this way. Allocating all parts as a single chunk removes
this (implicit) alignment. We've considered adding explicit alignment,
but we've decided not to - it seems to be merely a coincidence due to
using the ShmemAlloc() API, not due to necessity.

Author: Rahila Syed <rahilasyed90@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Nazir Bilal Yavuz <byavuz81@gmail.com>
Reviewed-by: Tomas Vondra <tomas@vondra.me>
Discussion: https://postgr.es/m/CAH2L28vHzRankszhqz7deXURxKncxfirnuW68zD7+hVAqaS5GQ@mail.gmail.com
2025-04-02 17:14:28 +02:00
Tom Lane
bd178960c6 Need to do CommandCounterIncrement after StoreAttrMissingVal.
Without this, an additional change to the same pg_attribute row
within the same command will fail.  This is possible at least with
ALTER TABLE ADD COLUMN on a multiple-inheritance-pathway structure.
(Another potential hazard is that immediately-following operations
might not see the missingval.)

Introduced by 95f650674, which split the former coding that
used a single pg_attribute update to change both atthasdef and
atthasmissing/attmissingval into two updates, but missed that
this should entail two CommandCounterIncrements as well.  Like
that fix, back-patch through v13.

Reported-by: Alexander Lakhin <exclusion@gmail.com>
Author: Tender Wang <tndrwang@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/025a3ffa-5eff-4a88-97fb-8f583b015965@gmail.com
Backpatch-through: 13
2025-04-02 11:13:01 -04:00
Heikki Linnakangas
b05751220b docs: Add a new section and a table listing protocol versions
Move the discussion on protocol versions and version negotiation to a
new "Protocol versions" section. Add a table listing all the different
protocol versions, starting from the obsolete protocol version 2, and
the PostgreSQL versions that support each.

Discussion: https://www.postgresql.org/message-id/69f53970-1d55-4165-9151-6fb524e36af9@iki.fi
2025-04-02 16:41:51 +03:00
Heikki Linnakangas
a460251f0a Make cancel request keys longer
Currently, the cancel request key is a 32-bit token, which isn't very
much entropy. If you want to cancel another session's query, you can
brute-force it. In most environments, an unauthorized cancellation of
a query isn't very serious, but it nevertheless would be nice to have
more protection from it. Hence make the key longer, to make it harder
to guess.

The longer cancellation keys are generated when using the new protocol
version 3.2. For connections using version 3.0, short 4-bytes keys are
still used.

The new longer key length is not hardcoded in the protocol anymore,
the client is expected to deal with variable length keys, up to 256
bytes. This flexibility allows e.g. a connection pooler to add more
information to the cancel key, which might be useful for finding the
connection.

Reviewed-by: Jelte Fennema-Nio <postgres@jeltef.nl>
Reviewed-by: Robert Haas <robertmhaas@gmail.com> (earlier versions)
Discussion: https://www.postgresql.org/message-id/508d0505-8b7a-4864-a681-e7e5edfe32aa@iki.fi
2025-04-02 16:41:48 +03:00
Heikki Linnakangas
285613c60a libpq: Add min/max_protocol_version connection options
All supported version of the PostgreSQL server send the
NegotiateProtocolVersion message when an unsupported minor protocol
version is requested by a client. But many other applications that
implement the PostgreSQL protocol (connection poolers, or other
databases) do not, and the same is true for PostgreSQL server versions
older than 9.3. Connecting to such other applications thus fails if a
client requests a protocol version different than 3.0.

This patch adds a max_protocol_version connection option to libpq that
specifies the protocol version that libpq should request from the
server. Currently only 3.0 is supported, but that will change in a
future commit that bumps the protocol version. Even after that version
bump the default will likely stay 3.0 for the time being. Once more of
the ecosystem supports the NegotiateProtocolVersion message we might
want to change the default to the latest minor version.

This also adds the similar min_protocol_version connection option, to
allow the client to specify that connecting should fail if a lower
protocol version is attempted by the server. This can be used to
ensure that certain protocol features are used, which can be
particularly useful if those features impact security.

Author: Jelte Fennema-Nio <postgres@jeltef.nl>
Reviewed-by: Robert Haas <robertmhaas@gmail.com> (earlier versions)
Discussion: https://www.postgresql.org/message-id/CAGECzQTfc_O%2BHXqAo5_-xG4r3EFVsTefUeQzSvhEyyLDba-O9w@mail.gmail.com
Discussion: https://www.postgresql.org/message-id/CAGECzQRbAGqJnnJJxTdKewTsNOovUt4bsx3NFfofz3m2j-t7tA@mail.gmail.com
2025-04-02 16:41:45 +03:00
Heikki Linnakangas
5070349102 libpq: Handle NegotiateProtocolVersion message differently
Previously libpq would always error out if the server sends a
NegotiateProtocolVersion message. This was fine because libpq only
supported a single protocol version and did not support any protocol
parameters. But in the upcoming commits, we will introduce a new
protocol version and the NegotiateProtocolVersion message starts to
actually be used.

This patch modifies the client side checks to allow a range of
supported protocol versions, instead of only allowing the exact
version that was requested. Currently this "range" only contains the
3.0 version, but in a future commit we'll change this.

Also clarify the error messages, making them suitable for the world
where libpq will support multiple protocol versions and protocol
extensions.

Note that until the later commits that introduce new protocol version,
this change does not have any behavioural effect, because libpq will
only request version 3.0 and will never send protocol parameters, and
therefore will never receive a NegotiateProtocolVersion message from
the server.

Author: Jelte Fennema-Nio <postgres@jeltef.nl>
Reviewed-by: Robert Haas <robertmhaas@gmail.com> (earlier versions)
Discussion: https://www.postgresql.org/message-id/CAGECzQTfc_O%2BHXqAo5_-xG4r3EFVsTefUeQzSvhEyyLDba-O9w@mail.gmail.com
Discussion: https://www.postgresql.org/message-id/CAGECzQRbAGqJnnJJxTdKewTsNOovUt4bsx3NFfofz3m2j-t7tA@mail.gmail.com
2025-04-02 16:41:42 +03:00
Peter Eisentraut
748e98d05b Fix code comment
The changes made in commit d2b4b4c225 contained incorrect comments:
They said that certain forward declarations were necessary to "avoid
including pathnodes.h here", but the file is itself pathnodes.h!  So
change the comment to just say it's a forward declaration in one case,
and in the other case we don't need the declaration at all because it
already appeared earlier in the file.
2025-04-02 14:46:47 +02:00
Heikki Linnakangas
09be391126 Add timingsafe_bcmp(), for constant-time memory comparison
timingsafe_bcmp() should be used instead of memcmp() or a naive
for-loop, when comparing passwords or secret tokens, to avoid leaking
information about the secret token by timing. This commit just
introduces the function but does not change any existing code to use
it yet.

Co-authored-by: Jelte Fennema-Nio <github-tech@jeltef.nl>
Discussion: https://www.postgresql.org/message-id/7b86da3b-9356-4e50-aa1b-56570825e234@iki.fi
2025-04-02 15:32:40 +03:00
Heikki Linnakangas
85d799ba8a docs: Update phrase on message lengths in the protocol
The reasoning for why all the message formats are parseable without
the explicit message length field is anachronistic; the real reason is
that protocol version 2 did not have a message length field. There's
nothing wrong with relying on the message length, like we do in the
CopyData messags, even though it often still makes sense to have
length fields for individual parts in messages.

Discussion: https://www.postgresql.org/message-id/02a4eed2-98f0-4796-9d4f-12128ff44fe0@iki.fi
2025-04-02 15:32:33 +03:00
Andres Freund
a6285b150a tests: Fix incompatibility of test_aio with *_FORCE_RELEASE
The test added in 93bc3d75d8 failed in a build with RELCACHE_FORCE_RELEASE
and CATCACHE_FORCE_RELEASE defined. The test intentionally forgets to exit
batchmode - normally that would trigger an error at the end of the
transaction, which the test verifies.  However, with RELCACHE_FORCE_RELEASE
and CATCACHE_FORCE_RELEASE defined, we get other code (output function lookup)
entering batchmode and erroring out because batchmode isn't allowed to be
entered recursively.

Fix that by changing the queries in question to not output any rows. That's
not exactly pretty, but seems to avoid the problem reliably.

Eventually we might want to make RELCACHE_FORCE_RELEASE and
CATCACHE_FORCE_RELEASE GUCs, so we can disable them where necessary - this
isn't the first test having difficulty with those debug options. But that's
for later.

Per buildfarm member prion.

Discussion: https://postgr.es/m/uc62i6vi5gd4bi6wtjj5poadqxolgy55e7ihkmf3mthjegb6zl@zqo7xez7sc2r
2025-04-02 07:57:11 -04:00
Andres Freund
43dca8a116 tests: Cope with WARNINGs during failed CREATE DB on windows
The test added in 93bc3d75d8 sometimes fails on windows, due to warnings like
WARNING:  some useless files may be left behind in old database directory "base/16514"

The reason for that is createdb_failure_callback() does not ensure that there
are no open file descriptors for files in the partially created,
to-be-dropped, database. We do take care in dropdb(), but that involves
waiting for checkpoints and a ProcSignalBarrier, which we probably don't want
to do in an error callback.  This should probably be fixed one day, but for
now 001_aio.pl needs to cope.

Per buildfarm animals fairywren and drongo.

Discussion: https://postgr.es/m/uc62i6vi5gd4bi6wtjj5poadqxolgy55e7ihkmf3mthjegb6zl@zqo7xez7sc2r
2025-04-02 07:51:48 -04:00
Peter Eisentraut
eec0040c4b Add support for NOT ENFORCED in foreign key constraints
This expands the NOT ENFORCED constraint flag, previously only
supported for CHECK constraints (commit ca87c415e2), to foreign key
constraints.

Normally, when a foreign key constraint is created on a table, action
and check triggers are added to maintain data integrity.  With this
patch, if a constraint is marked as NOT ENFORCED, integrity checks are
no longer required, making these triggers unnecessary.  Consequently,
when creating a NOT ENFORCED foreign key constraint, triggers will not
be created, and the constraint will be marked as NOT VALID.
Similarly, if an existing foreign key constraint is changed to NOT
ENFORCED, the associated triggers will be dropped, and the constraint
will also be marked as NOT VALID.  Conversely, if a NOT ENFORCED
foreign key constraint is changed to ENFORCED, the necessary triggers
will be created, and the constraint will be changed to VALID by
performing the necessary validation.
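
A sketch of the intended usage (names invented):

    ALTER TABLE orders
        ADD CONSTRAINT orders_customer_fk
        FOREIGN KEY (customer_id) REFERENCES customers (id)
        NOT ENFORCED;  -- no triggers are created; constraint is NOT VALID

    ALTER TABLE orders
        ALTER CONSTRAINT orders_customer_fk ENFORCED;
        -- triggers are created and existing rows are validated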

Since not-enforced foreign key constraints have no triggers, the
shortcut used for example in psql and pg_dump to skip looking for
foreign keys if the relation is known not to have triggers no longer
applies.  (It already didn't work for partitioned tables.)

Author: Amul Sul <sulamul@gmail.com>
Reviewed-by: Joel Jacobson <joel@compiler.org>
Reviewed-by: Andrew Dunstan <andrew@dunslane.net>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Reviewed-by: jian he <jian.universality@gmail.com>
Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Isaac Morland <isaac.morland@gmail.com>
Reviewed-by: Alexandra Wang <alexandra.wang.oss@gmail.com>
Tested-by: Triveni N <triveni.n@enterprisedb.com>
Discussion: https://www.postgresql.org/message-id/flat/CAAJ_b962c5AcYW9KUt_R_ER5qs3fUGbe4az-SP-vuwPS-w-AGA@mail.gmail.com
2025-04-02 13:36:44 +02:00
Andres Freund
327d987df1 tests: Cope with io_method in TEMP_CONFIG in test_aio
If io_method is set in TEMP_CONFIG the test added in 93bc3d75d8 fails,
because it assumes the io_method specified at initdb is actually used.

Fix that by appending the io_method again, after initdb (and thus after
TEMP_CONFIG has been added by Cluster.pm).

Per buildfarm animal bumblebee

Discussion: https://postgr.es/m/zh5u22wbpcyfw2ddl3lsvmsxf4yvsrvgxqwwmfjddc4c2khsgp@gfysyjsaelr5
2025-04-02 07:00:40 -04:00
Alexander Korotkov
bc22dc0e0d Get rid of WALBufMappingLock
Allow multiple backends to initialize WAL buffers concurrently.  This way
`MemSet((char *) NewPage, 0, XLOG_BLCKSZ);` can run in parallel without
taking a single LWLock in exclusive mode.

The new algorithm works as follows:
 * reserve a page for initialization using XLogCtl->InitializeReserved,
 * ensure the page is written out,
 * once the page is initialized, try to advance XLogCtl->InitializedUpTo and
   signal to waiters using XLogCtl->InitializedUpToCondVar condition
   variable,
 * repeat previous steps until we reserve initialization up to the target
   WAL position,
 * wait until concurrent initialization finishes using a
   XLogCtl->InitializedUpToCondVar.

Now, multiple backends can, in parallel, concurrently reserve pages,
initialize them, and advance XLogCtl->InitializedUpTo to point to the latest
initialized page.

Author: Yura Sokolov <y.sokolov@postgrespro.ru>
Co-authored-by: Alexander Korotkov <aekorotkov@gmail.com>
Reviewed-by: Pavel Borisov <pashkin.elfe@gmail.com>
Reviewed-by: Tomas Vondra <tomas@vondra.me>
Tested-by: Michael Paquier <michael@paquier.xyz>
2025-04-02 12:44:24 +03:00
Fujii Masao
b53b88109f Improve error message when standby does not accept connections.
Even after reaching the minimum recovery point, if there are long-lived
write transactions with 64 subtransactions on the primary, the recovery
snapshot may not yet be ready for hot standby, delaying read-only
connections on the standby. Previously, when read-only connections were
not accepted due to this condition, the following error message was logged:

    FATAL:  the database system is not yet accepting connections
    DETAIL:  Consistent recovery state has not been yet reached.

This DETAIL message was misleading because the following message was
already logged in this case:

    LOG:  consistent recovery state reached

This contradiction, i.e., indicating that the recovery state was consistent
while also stating it wasn’t, caused confusion.

This commit improves the error message to better reflect the actual state:

    FATAL: the database system is not yet accepting connections
    DETAIL: Recovery snapshot is not yet ready for hot standby.
    HINT: To enable hot standby, close write transactions with more than 64 subtransactions on the primary server.

To implement this, the commit introduces a new postmaster signal,
PMSIGNAL_RECOVERY_CONSISTENT. When the startup process reaches
a consistent recovery state, it sends this signal to the postmaster,
allowing it to correctly recognize that state.

Since this is not a clear bug, the change is applied only to the master
branch and is not back-patched.

Author: Atsushi Torikoshi <torikoshia@oss.nttdata.com>
Co-authored-by: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Yugo Nagata <nagata@sraoss.co.jp>
Discussion: https://postgr.es/m/02db8cd8e1f527a8b999b94a4bee3165@oss.nttdata.com
2025-04-02 15:13:01 +09:00
David Rowley
121d774cae Doc: add information about partition locking
The documentation around locking of partitions for the executor startup
phase of run-time partition pruning wasn't clear about which partitions
were being locked.  Fix that.

Reviewed-by: Tender Wang <tndrwang@gmail.com>
Discussion: https://postgr.es/m/CAApHDvp738G75HfkKcfXaf3a8s%3D6mmtOLh46tMD0D2hAo1UCzA%40mail.gmail.com
Backpatch-through: 13
2025-04-02 14:02:44 +13:00
Melanie Plageman
b3219c69fc aio: Add errcontext for processing I/Os for another backend
Push an ErrorContextCallback adding additional detail about the process
performing the I/O and the owner of the I/O when those are not the same.

For io_method worker, this adds context specifying which process owns
the I/O that the I/O worker is processing.

For io_method io_uring, this adds context only when a backend is
*completing* I/O for another backend. It specifies the pid of the owning
process.

Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/rdml3fpukrqnas7qc5uimtl2fyytrnu6ymc2vjf2zuflbsjuul%40hyizyjsexwmm
2025-04-01 19:53:07 -04:00
David Rowley
b136db07c6 Fix planner's failure to identify multiple hashable ScalarArrayOpExprs
50e17ad28 (v14) and 29f45e299 (v15) made it so the planner could identify
IN and NOT IN clauses which have Const lists as right-hand arguments and
when an appropriate hash function is available for the data types, mark
the ScalarArrayOpExpr as hashable so the executor could execute it more
optimally by building and probing a hash table during expression
evaluation.

These commits both worked correctly when there was only a single
ScalarArrayOpExpr in the given expression being processed by the
planner, but when there were multiple, only the first was checked and any
subsequent ones were not identified, which resulted in less optimal
expression evaluation during query execution for all but the first found
ScalarArrayOpExpr.
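
A sketch of an affected query shape (table invented; both IN lists are
long enough for hashing to be considered):

    SELECT * FROM t
    WHERE a IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
      AND b NOT IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
    -- previously only the first clause could be marked hashable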

Backpatch to 14, where 50e17ad28 was introduced.

Author: David Geier <geidav.pg@gmail.com>
Discussion: https://postgr.es/m/29a76f51-97b0-4c07-87b7-ec8e3b5345c9@gmail.com
Backpatch-through: 14
2025-04-02 11:56:29 +13:00
Tom Lane
6c12ae09f5 Introduce a SQL-callable function array_sort(anyarray).
Create a function that will sort the elements of an array
according to the element type's sort order.  If the array
has more than one dimension, the sub-arrays of the first
dimension are sorted per normal array-comparison rules,
leaving their contents alone.

In support of this, add pg_type.typarray to the set of fields
cached by the typcache.

Author: Junwang Zhao <zhjwpku@gmail.com>
Co-authored-by: Jian He <jian.universality@gmail.com>
Reviewed-by: Aleksander Alekseev <aleksander@timescale.com>
Discussion: https://postgr.es/m/CAEG8a3J41a4dpw_-F94fF-JPRXYxw-GfsgoGotKcjs9LVfEEvw@mail.gmail.com
2025-04-01 18:03:55 -04:00
Tom Lane
6da2ba1d8a Fix detection and handling of strchrnul() for macOS 15.4.
As of 15.4, macOS has strchrnul(), but access to it is blocked behind
a check for MACOSX_DEPLOYMENT_TARGET >= 15.4.  But our does-it-link
configure check finds it, so we try to use it, and fail with the
present default deployment target (namely 15.0).  This accounts for
today's buildfarm failures on indri and sifaka.

This is the identical problem that we faced some years ago when Apple
introduced preadv and pwritev in the same way.  We solved that in
commit f014b1b9b by using AC_CHECK_DECLS instead of AC_CHECK_FUNCS
to check the functions' availability.  So do the same now for
strchrnul().  Interestingly, we already had a workaround for
"the link check doesn't agree with <string.h>" cases with glibc,
which we no longer need since only the header declaration is being
checked.

Testing this revealed that the meson version of this check has never
worked, because it failed to use "-Werror=unguarded-availability-new".
(Apparently nobody's tried to build with meson on macOS versions that
lack preadv/pwritev as standard.)  Adjust that while at it.  Also,
we had never put support for "-Werror=unguarded-availability-new"
into v13, but we need that now.

Co-authored-by: Tom Lane <tgl@sss.pgh.pa.us>
Co-authored-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/385134.1743523038@sss.pgh.pa.us
Backpatch-through: 13
2025-04-01 16:50:09 -04:00
Andrew Dunstan
c313fa4602 Use workaround of __builtin_setjmp only on MINGW on MSVCRT
MSVCRT is not present on Windows/ARM64, and the workaround is not
necessary on any UCRT-based toolchain.

Author: Lars Kanis <lars@greiz-reinsdorf.de>

Discussion: https://postgr.es/m/CAHXCYb2OjNHtoGVKyXtXmw4B3bUXwJX6M-Lcp1KcMCRUMLOocA@mail.gmail.com
2025-04-01 16:24:59 -04:00
Andres Freund
e19dc74491 aio: Minor comment improvements
Reviewed-by: Noah Misch <noah@leadboat.com>
Discussion: https://postgr.es/m/usbwzckj7q3jhfx3ann3nrfnukmupbs35axvq5zfyeo6nvrzrm@onjhxs2du4st
2025-04-01 16:06:48 -04:00
Andres Freund
fdd146a8ef aio: Add README.md explaining higher level design
Reviewed-by: Noah Misch <noah@leadboat.com>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://postgr.es/m/uvrtrknj4kdytuboidbhwclo4gxhswwcpgadptsjvjqcluzmah%40brqs62irg4dt
Discussion: https://postgr.es/m/20210223100344.llw5an2aklengrmn@alap3.anarazel.de
Discussion: https://postgr.es/m/stj36ea6yyhoxtqkhpieia2z4krnam7qyetc57rfezgk4zgapf@gcnactj4z56m
2025-04-01 16:06:48 -04:00
Nathan Bossart
5aec7e07fb doc: Adjust some notes about pg_upgrade's file transfer modes.
--copy-file-range and --swap were not mentioned in a few places
that discuss the available file transfer modes.  This entire page
would likely benefit from an overhaul, but that's v19 material at
this point.

Oversights in commits d93627bcbe and 626d7236b6.
2025-04-01 14:37:47 -05:00
Andres Freund
00066aa173 md: Add comment & assert to buffer-zeroing path in md[start]readv()
mdreadv() has a codepath to zero out buffers when a read returns zero bytes,
guarded by a check for zero_damaged_pages || InRecovery.

The InRecovery codepath to zero out buffers in mdreadv() appears to be
unreachable. The only known paths to reach mdreadv()/mdstartreadv() in
recovery are XLogReadBufferExtended(), vm_readbuf(), and fsm_readbuf(), each
of which takes care to extend the relation if necessary. This appears either
to have been the case for a long time, or the code was never reachable.

The zero_damaged_pages path is incomplete, as missing segments are not
created.

Putting blocks into the buffer-pool that do not exist on disk is rather
problematic, as such blocks will, at least initially, not be found by scans
that rely on smgrnblocks(), as they are beyond EOF. It also can cause weird
problems with relation extension, as relation extension does not expect blocks
beyond EOF to exist.

Therefore we would like to remove that path.

mdstartreadv(), which I added in e5fe570b51c, does not implement this zeroing
logic. I had started a discussion about that a while ago (linked below), but
forgot to act on the conclusion of the discussion, namely to disable the
in-memory-zeroing behavior.

We could certainly implement equivalent zeroing logic in mdstartreadv(), but
it would have to be more complicated due to potential differences in the
zero_damaged_pages setting between the definer and completor of IO.  Given
that we want to remove the logic, implementing it there does not seem
worthwhile.

For now, put an Assert(false) and comments documenting this choice into
mdreadv(), along with comments documenting the deprecation of the path there
and its non-implementation in mdstartreadv().  If we discover during testing
that we do need the path, we can implement it at that time.
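
As a standalone illustration of that pattern (hypothetical names, not the
committed code; build with assertions enabled to trip the check):

  #include <assert.h>
  #include <string.h>

  /* Keep the believed-unreachable zero-fill branch, but assert so that
   * testing reveals whether it is ever reached. */
  static int read_block(int nread, char *buf, int len)
  {
      if (nread == 0)
      {
          assert(false);        /* believed unreachable */
          memset(buf, 0, len);  /* zero-fill fallback in non-assert builds */
          return len;
      }
      return nread;
  }

  int main(void)
  {
      char buf[8];
      return read_block(8, buf, 8) == 8 ? 0 : 1;
  }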

Reviewed-by: Noah Misch <noah@leadboat.com>
Discussion: https://postgr.es/m/20250330024513.ac.nmisch@google.com
Discussion: https://postgr.es/m/3qxxsnciyffyf3wyguiz4besdp5t5uxvv3utg75cbcszojlz7p@uibfzmnukkbd
2025-04-01 13:50:39 -04:00
Andres Freund
93bc3d75d8 aio: Add test_aio module
To make the tests possible, a few functions from bufmgr.c/localbuf.c had to be
exported, via buf_internals.h.

Reviewed-by: Noah Misch <noah@leadboat.com>
Co-authored-by: Andres Freund <andres@anarazel.de>
Co-authored-by: Nazir Bilal Yavuz <byavuz81@gmail.com>
Discussion: https://postgr.es/m/uvrtrknj4kdytuboidbhwclo4gxhswwcpgadptsjvjqcluzmah%40brqs62irg4dt
2025-04-01 13:47:46 -04:00
Andres Freund
60f566b4f2 aio: Add pg_aios view
The new view lists all IO handles that are currently in use and is mainly
useful for PG developers, but may also be useful when tuning PG.

Bumps catversion.

Reviewed-by: Noah Misch <noah@leadboat.com>
Discussion: https://postgr.es/m/uvrtrknj4kdytuboidbhwclo4gxhswwcpgadptsjvjqcluzmah%40brqs62irg4dt
2025-04-01 13:30:33 -04:00
Andres Freund
46250cdcb0 docs: Add acronym and glossary entries for I/O and AIO
These are fairly basic, but better than nothing.  While there are several
opportunities to link to these entries, this patch does not add any. They will
however be referenced by future patches.

Reviewed-by: Noah Misch <noah@leadboat.com>
Discussion: https://postgr.es/m/20250326183102.92.nmisch@google.com
2025-04-01 13:30:33 -04:00
Álvaro Herrera
172259afb5
Verify roundtrip dump/restore of regression database
Add a test to pg_upgrade's test suite that verifies that
dump-restore-dump of the regression database produces output equivalent to
dumping it directly.  This was already being tested by running
pg_upgrade itself, but non-binary-upgrade mode was not being covered.

The regression database has accrued, over time, a sufficient collection
of interesting objects to ensure good coverage, but there hasn't been a
concerted effort to be completely exhaustive, so more could likely
still be added.

This'd belong more naturally in the pg_dump test suite, but we chose to
put it in src/bin/pg_upgrade/t/002_pg_upgrade.pl because we need a run
of the regression tests, which is already done here, so this has less
total test runtime impact.  Also, experiments have shown that using
parallel dump/restore is slightly faster, so we use --format=directory -j2.

This test has already reported pg_dump bugs, as fixed in fd41ba93e4,
74563f6b90, d611f8b158, 4694aedf63.

Author: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://www.postgresql.org/message-id/CAExHW5uF5V=Cjecx3_Z=7xfh4rg2Wf61PT+hfquzjBqouRzQJQ@mail.gmail.com
2025-04-01 18:50:40 +02:00
Peter Eisentraut
764d501d24 Remove a stray "pgrminclude" annotation
We don't use those anymore.  Fix for commit 8492feb98f.
2025-04-01 15:28:22 +02:00
Peter Eisentraut
113ecf1f8c Fix minor C type confusion
Returning false instead of NULL gets a compiler error under gcc-14
-std=gnu23, and it appears to have been unintentional.  Fix for commit
8492feb98f.
2025-04-01 15:28:22 +02:00
Heikki Linnakangas
2904324a88 heapam: Only set tuple's block once per page in pagemode
Due to splitting the block id into two 16 bit integers, BlockIdSet()
is more expensive than one might think.  Doing it once per returned
tuple shows up as a small but reliably reproducible cost.  It's simple
enough to set the block number just once per block in pagemode, so do
so.
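
To see why BlockIdSet() is not free, here is a minimal standalone sketch of
the two-halves layout (field names mirror PostgreSQL's BlockIdData, but this
is illustrative code, not the committed change):

  #include <assert.h>
  #include <stdint.h>

  typedef struct { uint16_t bi_hi; uint16_t bi_lo; } block_id;

  /* Splitting the 32-bit block number costs two shifts/stores per call. */
  static void block_id_set(block_id *id, uint32_t blkno)
  {
      id->bi_hi = (uint16_t) (blkno >> 16);
      id->bi_lo = (uint16_t) (blkno & 0xFFFF);
  }

  int main(void)
  {
      block_id id;
      block_id_set(&id, 0x0001FFFF);
      assert(id.bi_hi == 1 && id.bi_lo == 0xFFFF);
      return 0;
  }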

Author: Andres Freund <andres@anarazel.de>
Discussion: https://www.postgresql.org/message-id/lxzj26ga6ippdeunz6kuncectr5gfuugmm2ry22qu6hcx6oid6@lzx3sjsqhmt6
2025-04-01 13:24:27 +03:00
John Naylor
af0c248557 Use function attributes for SSE 4.2 even when targeting that extension
On Red Hat 9 systems (or similar), the packaged gcc targets x86-64-v2,
but clang does not. This has caused build failures in the wake of
commit e2809e3a1 when building --with-llvm.

The most expedient fix is to use the same function attributes for
the inlined function as we do for the global function.

Reported-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com> (plus members skimmer and bumblebee)
Diagnosed-by: Tom Lane <tgl@sss.pgh.pa.us>
Tested-by: Todd Cook <cookt@blackduck.com>
Discussion: https://postgr.es/m/CANWCAZZSxs3a1YRKehkgk2OHKbrVn+xZ+AWW8Co2R_f70NqqmA@mail.gmail.com
2025-04-01 12:01:58 +07:00
David Rowley
3dbdf86c63 Fix failing regression test on x86-32 machines
95d6e9af0 added code to display the tuplestore storage type for
WindowAgg nodes and added a test to ensure the "Disk" storage method was
working correctly by setting work_mem to 64 and running a test which
caused the WindowAgg to go to disk.  Seemingly, the number of rows
chosen there wasn't quite enough for that to happen on 32-bit x86.

Fix this by increasing the number of rows slightly.

I suspect the buildfarm didn't catch this because MEMORY_CONTEXT_CHECKING
builds use a bit more memory for MemoryChunks, to store the
requested_size, and also because of the additional space to store the
chunk's sentinel byte.

Reported-by: Christoph Berg <myon@debian.org>
Discussion: https://postgr.es/m/Z-q3ZAM4OhE-4UiI@msg.df7cb.de
2025-04-01 10:52:25 +13:00
Tom Lane
2fd3e2fa5c Fix accidentally-harmless thinko in psqlscan_test_variable().
This code was passing literal strings to psqlscan_emit,
which is quite contrary to that function's specification:
"If you pass it something that is not part of the yytext
string, you are making a mistake".  It accidentally worked
anyway, even in non-safe_encoding mode.  psqlscan_emit
would compute a garbage "reference" pointer, but would
never dereference that since the passed string is all-ASCII.
So there's no live bug today, but that is a happenstance
outcome of psqlscan_emit's current implementation.

Let's make psqlscan_test_variable do what it's supposed to,
namely append directly to the output buffer.  This is just
future-proofing against possible changes in psqlscan_emit,
so I don't feel a need to back-patch.
2025-03-31 12:16:32 -04:00
Peter Eisentraut
0fcf02ad45 doc: Mention clock synchronization recommendation for hot_standby_feedback
hot_standby_feedback mechanics assume that clocks are synchronized,
but this was not clear from the documentation.

Author: Jakub Wartak <jakub.wartak@enterprisedb.com>
Reviewed-by: Euler Taveira <euler@eulerto.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: vignesh C <vignesh21@gmail.com>
Discussion: https://postgr.es/m/CAKZiRmwBcALLrDgCyEhHP1enUxtPMjyNM_d1A2Lng3_6Rf4Qfw%40mail.gmail.com
2025-03-31 16:54:50 +02:00
John Naylor
e2809e3a10 Inline CRC computation for small fixed-length input on x86
pg_crc32c.h now has a simplified copy of the loop in pg_crc32c_sse42.c
suitable for inlining where possible.

This may slightly reduce contention for the WAL insertion lock,
but that hasn't been tested. The motivation for this change is to avoid
regressing in a future commit that will use a function pointer for
non-constant input in all x86 builds.

While it's technically possible to make a similar change for Arm and
LoongArch, there are some questions about how inlining should work
since those platforms prefer stricter alignment. There are also no
immediate plans to add additional implementations for them.
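
A rough standalone sketch of the kind of SSE 4.2 loop that becomes
inlinable (on x86-64, compile with -msse4.2; illustrative, not the
committed code):

  #include <nmmintrin.h>
  #include <stddef.h>
  #include <stdint.h>
  #include <string.h>

  static uint32_t crc32c_sse42(uint32_t crc, const void *data, size_t len)
  {
      const unsigned char *p = data;

      /* Consume 8 bytes at a time, then finish byte-wise. */
      for (; len >= 8; p += 8, len -= 8)
      {
          uint64_t chunk;

          memcpy(&chunk, p, 8);
          crc = (uint32_t) _mm_crc32_u64(crc, chunk);
      }
      while (len-- > 0)
          crc = _mm_crc32_u8(crc, *p++);
      return crc;
  }

  int main(void)
  {
      return crc32c_sse42(0xFFFFFFFF, "123456789", 9) ? 0 : 1;
  }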

Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Raghuveer Devulapalli <raghuveer.devulapalli@intel.com>
Discussion: https://postgr.es/m/CANWCAZZEiTzhZcuwTiJ2=opiNpAUn1vuDRu1N02z61AthwRZLA@mail.gmail.com
Discussion: https://postgr.es/m/CANWCAZYRhLHArpyfV4uRK-Rw9N5oV5HMkkKtBehcuTjNOMwCZg@mail.gmail.com
2025-03-31 13:17:21 +07:00
Jeff Davis
4694aedf63 Add relallfrozen to pg_dump statistics.
Author: Corey Huinker <corey.huinker@gmail.com>
Discussion: https://postgr.es/m/CADkLM=desCuf3dVHasADvdUVRmb-5gO0mhMO5u9nzgv6i7U86Q@mail.gmail.com
2025-03-30 22:14:06 -07:00
Andres Freund
2a5e709e72 Enable IO concurrency on all systems
Previously effective_io_concurrency and maintenance_io_concurrency could not
be set above 0 on machines without fadvise support. AIO enables IO concurrency
without such support, via io_method=worker.

Currently only subsystems using the read stream API will take advantage of
this. Other users of maintenance_io_concurrency (like recovery prefetching)
which leverage OS advice directly will not benefit from this change. In those
cases, maintenance_io_concurrency will have no effect on I/O behavior.

Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Noah Misch <noah@leadboat.com>
Discussion: https://postgr.es/m/CAAKRu_atGgZePo=_g6T3cNtfMf0QxpvoUh5OUqa_cnPdhLd=gw@mail.gmail.com
2025-03-30 19:16:47 -04:00
Andres Freund
ae3df4b341 read_stream: Introduce and use optional batchmode support
Submitting IO in larger batches can be more efficient than doing so
one-by-one, particularly for many small reads. It does, however, require
the ReadStreamBlockNumberCB callback to abide by the restrictions of AIO
batching (c.f. pgaio_enter_batchmode()). Basically, the callback may not:
a) block without first calling pgaio_submit_staged(), unless a
   to-be-waited-on lock cannot be part of a deadlock, e.g. because it is
   never held while waiting for IO.

b) directly or indirectly start another batch via pgaio_enter_batchmode()

As this requires care and is nontrivial in some cases, batching is only
used with explicit opt-in.

This patch adds an explicit flag (READ_STREAM_USE_BATCHING) to read_stream and
uses it where appropriate.

There are two cases where batching would likely be beneficial, but where we
aren't using it yet:

1) bitmap heap scans, because the callback reads the VM

   This should soon be solved, because we are planning to remove the use of
   the VM there, as that is not sound.

2) The first phase of heap vacuum

   This could be made to support batchmode, but would require some care.

Reviewed-by: Noah Misch <noah@leadboat.com>
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://postgr.es/m/uvrtrknj4kdytuboidbhwclo4gxhswwcpgadptsjvjqcluzmah%40brqs62irg4dt
2025-03-30 18:36:41 -04:00
Andres Freund
f4d0730bbc aio: Basic read_stream adjustments for real AIO
Adapt the read stream logic for real AIO:
- If AIO is enabled, we shouldn't issue advice, but if it isn't, we should
  continue issuing advice
- AIO benefits from reading ahead with direct IO
- If effective_io_concurrency=0, pass READ_BUFFERS_SYNCHRONOUSLY to
  StartReadBuffers() to ensure synchronous IO execution

There are further improvements we should consider:

- While in read_stream_look_ahead(), we can use AIO batch submission mode for
  increased efficiency. That however requires care to avoid deadlocks and thus
  is done separately.
- It can be beneficial to defer starting new IOs until we can issue multiple
  IOs at once. That however requires non-trivial heuristics to decide when to
  do so.

Reviewed-by: Noah Misch <noah@leadboat.com>
Co-authored-by: Andres Freund <andres@anarazel.de>
Co-authored-by: Thomas Munro <thomas.munro@gmail.com>
2025-03-30 18:26:44 -04:00
Andres Freund
b27f8637ea docs: Reframe track_io_timing related docs as wait time
With AIO it does not make sense anymore to track the time for each individual
IO, as multiple IOs can be in-flight at the same time. Instead we now track
the time spent *waiting* for IOs.

This should be reflected in the docs. While, so far, we only do a subset of
reads, and no other operations, via AIO, describing the GUC and view columns
as measuring IO waits is accurate for synchronous and asynchronous IO.

Reviewed-by: Noah Misch <noah@leadboat.com>
Discussion: https://postgr.es/m/5dzyoduxlvfg55oqtjyjehez5uoq6hnwgzor4kkybkfdgkj7ag@rbi4gsmzaczk
2025-03-30 18:04:40 -04:00
Andres Freund
12ce89fd07 bufmgr: Use AIO in StartReadBuffers()
This finally introduces the first actual use of AIO. StartReadBuffers() now
uses the AIO routines to issue IO.

As the implementation of StartReadBuffers() is also used by the functions for
reading individual blocks (StartReadBuffer() and through that
ReadBufferExtended()) this means all buffered read IO passes through the AIO
paths.  However, as those are synchronous reads, actually performing the IO
asynchronously would be rarely beneficial. Instead such IOs are flagged to
always be executed synchronously. This way we don't have to duplicate a fair
bit of code.

When io_method=sync is used, the IO patterns generated after this change are
the same as before, i.e. actual reads are only issued in WaitReadBuffers() and
StartReadBuffers() may issue prefetch requests.  This allows bypassing most of
the actual asynchronicity, which is important to make a change as big as this
less risky.

One thing worth calling out is that, if IO is actually executed
asynchronously, the precise meaning of what track_io_timing is measuring has
changed. Previously it tracked the time for each IO, but that does not make
sense when multiple IOs are executed concurrently. Now it only measures the
time actually spent waiting for IO. A subsequent commit will adjust the docs
for this.

While AIO is now actually used, the logic in read_stream.c will often prevent
using sufficiently many concurrent IOs. That will be addressed in the next
commit.

Reviewed-by: Noah Misch <noah@leadboat.com>
Reviewed-by: Nazir Bilal Yavuz <byavuz81@gmail.com>
Co-authored-by: Andres Freund <andres@anarazel.de>
Co-authored-by: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://postgr.es/m/uvrtrknj4kdytuboidbhwclo4gxhswwcpgadptsjvjqcluzmah%40brqs62irg4dt
Discussion: https://postgr.es/m/20210223100344.llw5an2aklengrmn@alap3.anarazel.de
Discussion: https://postgr.es/m/stj36ea6yyhoxtqkhpieia2z4krnam7qyetc57rfezgk4zgapf@gcnactj4z56m
2025-03-30 18:02:23 -04:00
Andres Freund
047cba7fa0 bufmgr: Implement AIO read support
This commit implements the infrastructure to perform asynchronous reads into
the buffer pool.

To do so, it:

- Adds readv AIO callbacks for shared and local buffers

  It may be worth calling out that shared buffer completions may be run in a
  different backend than where the IO started.

- Adds an AIO wait reference to BufferDesc, to allow backends to wait for
  in-progress asynchronous IOs

- Adapts StartBufferIO(), WaitIO(), TerminateBufferIO(), and their localbuf.c
  equivalents, to be able to deal with AIO

- Moves the code to handle BM_PIN_COUNT_WAITER into a helper function, as it
  now also needs to be called on IO completion

As of this commit, nothing issues AIO on shared/local buffers. A future commit
will update StartReadBuffers() to do so.

Buffer reads executed through this infrastructure will report invalid page /
checksum errors / warnings differently than before:

In the error case the error message will cover all the blocks that were
included in the read, rather than just reporting the first invalid
block. If more than one block is invalid, the error will include information
about the range of the read, the first invalid block and the number of invalid
pages, with a HINT towards the server log for per-block details.

For the warning case (i.e. zero_damaged_pages) we would previously emit one
warning message for each buffer in a multi-block read. Now there is only a
single warning message for the entire read, again referring to the server log
for more details in case of multiple checksum failures within a single larger
read.

Reviewed-by: Noah Misch <noah@leadboat.com>
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Nazir Bilal Yavuz <byavuz81@gmail.com>
Discussion: https://postgr.es/m/uvrtrknj4kdytuboidbhwclo4gxhswwcpgadptsjvjqcluzmah%40brqs62irg4dt
Discussion: https://postgr.es/m/20210223100344.llw5an2aklengrmn@alap3.anarazel.de
Discussion: https://postgr.es/m/stj36ea6yyhoxtqkhpieia2z4krnam7qyetc57rfezgk4zgapf@gcnactj4z56m
2025-03-30 17:28:03 -04:00
Andres Freund
ef64fe26ba aio: Add WARNING result status
If an IO succeeds, but issues a warning, e.g. due to a page verification
failure with zero_damaged_pages, we want to issue that warning in the context
of the issuer of the IO, not the process that executes the completion (always
the case for worker).

It's already possible for a completion callback to report a custom error
message, we just didn't have a result status that allowed a user of AIO to
know that a warning should be emitted even though the IO request succeeded.

All that's needed for that is a dedicated PGAIO_RS_ value.

Previously there were not enough bits in PgAioResult.id for the new
value, so increase it.  While at it, add defines for the number of bits and
static asserts to check that the widths are appropriate.

Reviewed-by: Noah Misch <noah@leadboat.com>
Discussion: https://postgr.es/m/20250329212929.a6.nmisch@google.com
2025-03-30 16:27:10 -04:00
Andres Freund
d445990adc Let caller of PageIsVerified() control ignore_checksum_failure
For AIO the completion of a read into shared buffers (i.e. verifying the page
including the checksum, updating the BufferDesc to reflect the IO) can happen
in a different backend than the backend that started the IO. As
ignore_checksum_failure can differ between backends, we need to allow the
caller of PageIsVerified() to control whether to ignore checksum failures.

The commit leaves a gap in the PIV_* values, as an upcoming commit, which
depends on this commit, will add PIV_LOG_LOG, which better fits just after
PIV_LOG_WARNING.

Reviewed-by: Noah Misch <noah@leadboat.com>
Discussion: https://postgr.es/m/20250329212929.a6.nmisch@google.com
2025-03-30 16:27:10 -04:00
Andres Freund
b96d3c3897 pgstat: Allow checksum errors to be reported in critical sections
For AIO we execute completion callbacks in critical sections (to ensure that
AIO can in the future be used for WAL, which in turn requires that we can call
completion callbacks in critical sections, to get the resources for WAL
io). To report checksum errors a backend now has to call
pgstat_prepare_report_checksum_failure(), before entering a critical section,
which guarantees the relevant pgstats entry is in shared memory, the relevant
DSM segment is mapped into the backend's memory and the address is known via a
PgStat_EntryRef.

Reviewed-by: Noah Misch <noah@leadboat.com>
Discussion: https://postgr.es/m/wkjj4p2rmkevutkwc6tewoovdqznj6c6nvjmvii4oo5wmbh5sr@retq7d6uqs4j
2025-03-30 16:12:04 -04:00
Andres Freund
4244cf6876 Add errhint_internal()
We have errmsg_internal(), errdetail_internal(), but not errhint_internal().

Sometimes it is useful to output a hint with an already translated format
string (e.g. because there are different messages depending on the
condition).  For message/detail we do that with the _internal() variants, but
we can't do that with hint today.  It's possible to work around that by
using something like
  str = psprintf(translated_format, args);
  ereport(...
          errhint("%s", str));
but that's not exactly pretty and makes it harder to avoid memory leaks.
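
The idea can be sketched standalone with plain C varargs (only an
illustration of the concept, not the backend implementation):

  #include <stdarg.h>
  #include <stdio.h>

  /* Accept an already-formatted/translated format string directly,
   * instead of routing it through translation again. */
  static void errhint_internal_sketch(const char *fmt, ...)
  {
      va_list ap;

      fputs("HINT:  ", stdout);
      va_start(ap, fmt);
      vprintf(fmt, ap);
      va_end(ap);
      putchar('\n');
  }

  int main(void)
  {
      const char *translated = "retry with a value below %d";
      errhint_internal_sketch(translated, 42);
      return 0;
  }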

Reviewed-by: Noah Misch <noah@leadboat.com>
Discussion: https://postgr.es/m/ym3dqpa4xcvoeknewcw63x77vnqdosbqcetjinb2zfoh65k55m@m4ozmwhr6lk6
2025-03-30 16:10:51 -04:00
Tomas Vondra
49b82522f1 Remove incidental md5() function use from test
Replace md5() with sha256() in tests introduced in 14ffaece0f, to
allow the test to pass in OpenSSL FIPS mode.

Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/3518736.1743307492@sss.pgh.pa.us
2025-03-30 13:22:39 +02:00
Andres Freund
d6d8054dc7 localbuf: Track pincount in BufferDesc as well
For AIO on temporary table buffers the AIO subsystem needs to be able to
ensure a pin on a buffer while AIO is going on, even if the IO-issuing query
errors out. Tracking the buffer in LocalRefCount does not work, as it would
cause CheckForLocalBufferLeaks() to assert out.

Instead, also track the refcount in BufferDesc.state, not just
LocalRefCount. This also makes local buffers behave a bit more akin to shared
buffers.

Note that we still don't need locking: AIO completion callbacks for local
buffers are executed in the issuing session (i.e. nobody else has access to
the BufferDesc).

Reviewed-by: Noah Misch <noah@leadboat.com>
Discussion: https://postgr.es/m/uvrtrknj4kdytuboidbhwclo4gxhswwcpgadptsjvjqcluzmah%40brqs62irg4dt
2025-03-29 16:36:51 -04:00
Andres Freund
08ccd56ac7 aio, bufmgr: Comment fixes/improvements
Some of these comments have been wrong for a while (12f3867f55), some I
recently introduced (da7226993f, 55b454d0e1). This includes an update to a
comment in FlushBuffer(), which will be copied in a future commit.

These changes seem big enough to be worth doing in separate commits.

Suggested-by: Noah Misch <noah@leadboat.com>
Discussion: https://postgr.es/m/20250319212530.80.nmisch@google.com
2025-03-29 14:45:42 -04:00
Andres Freund
50cb7505b3 aio: Implement support for reads in smgr/md/fd
This implements the following:

1) An smgr AIO target, for AIO on smgr files. This should be usable not just
   for md.c but also other SMGR implementations if we ever get them.
2) readv support in fd.c, which requires a small bit of infrastructure work in
   fd.c
3) smgr.c and md.c support for readv

There still is nothing performing AIO, but as of this commit it would be
possible.

As part of this change FileGetRawDesc() actually ensures that the file is
opened - previously it was basically not usable. It's used to reopen a file in
IO workers.
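
For reference, a minimal sketch of the vectored-read primitive that such
readv support builds on (Linux/BSD preadv(); the file path and buffer sizes
are arbitrary, not from this commit):

  #include <fcntl.h>
  #include <stdio.h>
  #include <sys/uio.h>
  #include <unistd.h>

  int main(void)
  {
      char a[4], b[4];
      struct iovec iov[2] = {{a, sizeof(a)}, {b, sizeof(b)}};
      int fd = open("/etc/hosts", O_RDONLY);

      if (fd < 0)
          return 1;
      /* One syscall fills both buffers, starting at offset 0. */
      printf("read %zd bytes\n", preadv(fd, iov, 2, 0));
      close(fd);
      return 0;
  }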

Reviewed-by: Noah Misch <noah@leadboat.com>
Discussion: https://postgr.es/m/uvrtrknj4kdytuboidbhwclo4gxhswwcpgadptsjvjqcluzmah%40brqs62irg4dt
Discussion: https://postgr.es/m/20210223100344.llw5an2aklengrmn@alap3.anarazel.de
Discussion: https://postgr.es/m/stj36ea6yyhoxtqkhpieia2z4krnam7qyetc57rfezgk4zgapf@gcnactj4z56m
2025-03-29 13:38:35 -04:00
Andres Freund
dee8002468 Fix mis-attribution of checksum failure stats to the wrong database
Checksum failure stats could be attributed to the wrong database in two cases:

- when a read of a shared relation encountered a checksum error, it would be
  attributed to the current database, instead of the "database" representing
  shared relations

- when using CREATE DATABASE ... STRATEGY WAL_LOG checksum errors in the
  source database would be attributed to the current database

The checksum stats reporting via PageIsVerifiedExtended(PIV_REPORT_STAT) does
not have access to the information about what database a page belongs to.

This fixes the issue by removing PIV_REPORT_STAT and delegating the
responsibility to report stats to the caller, which can now learn about the
number of checksum failures via a new optional argument.

As this changes the signature of PageIsVerifiedExtended() and all callers
should adapt to the new signature, use the occasion to rename the function to
PageIsVerified() and remove the compatibility macro.

We could instead have fixed this by adding information about the database to
the args of PageIsVerified(), but there are soon-to-be-applied patches that
need to separate the stats reporting from the PageIsVerified() call
anyway. Those patches also include testing for the failure paths, something we
inexplicably have not had.

As there is no caller of pgstat_report_checksum_failure() left, remove it.

It'd be possible, but awkward, to fix this in the back branches.  We
considered the work not quite worth it, as mis-attributed stats should still
elicit concern.  The emitted error messages do allow attributing the errors
correctly.

Discussion: https://postgr.es/m/5tyic6epvdlmd6eddgelv47syg2b5cpwffjam54axp25xyq2ga@ptwkinxqo3az
Discussion: https://postgr.es/m/mglpvvbhighzuwudjxzu4br65qqcxsnyvio3nl4fbog3qknwhg@e4gt7npsohuz
2025-03-29 13:38:35 -04:00
Tomas Vondra
68f97aeadb amcheck: Add a GIN index to the CREATE INDEX CONCURRENTLY tests
The existing CREATE INDEX CONCURRENTLY tests check only B-Tree, but
can be cheaply extended to also check GIN.  This helps increase test
coverage for GIN amcheck, especially related to handling concurrent page
splits and posting list trees.

This already helped to identify several issues during development of the
GIN amcheck support.

Author: Mark Dilger <mark.dilger@enterprisedb.com>
Reviewed-By: Tomas Vondra <tomas.vondra@enterprisedb.com>
Reviewed-By: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://postgr.es/m/BC221A56-977C-418E-A1B8-9EFC881D80C5%40enterprisedb.com
2025-03-29 16:47:44 +01:00
Tomas Vondra
ca738bdc4c amcheck: Add a test with GIN index on JSONB data
Extend the existing test of GIN checks to also include an index on JSONB
data, using the jsonb_path_ops opclass. This is a common enough usage of
GIN that it makes sense to have better test coverage for it.

Author: Mark Dilger <mark.dilger@enterprisedb.com>
Reviewed-By: Tomas Vondra <tomas.vondra@enterprisedb.com>
Reviewed-By: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://postgr.es/m/BC221A56-977C-418E-A1B8-9EFC881D80C5%40enterprisedb.com
2025-03-29 16:47:44 +01:00
Tomas Vondra
ec4327d106 amcheck: Fix indentation in verify_gin.c
I forgot to reindent the code after a couple last-minute adjustments
just before committing 14ffaece0f.

Discussion: https://postgr.es/m/45AC9B0A-2B45-40EE-B08F-BDCF5739D1E1%40yandex-team.ru
2025-03-29 16:47:44 +01:00
Andres Freund
116e851db5 Fix "‘static’ is not at beginning of declaration" warning
b98be8a2a2 used "const static" instead of "static const". We normally use the
latter form.

Discussion: https://postgr.es/m/z4mc2hzecahyq3paupfsouhuupmzmgum45md3k5my6bmo7gvn7@z5j26doqamqy
2025-03-29 10:48:59 -04:00
Tomas Vondra
14ffaece0f amcheck: Add gin_index_check() to verify GIN index
Adds a new function, validating two kinds of invariants on a GIN index:

- parent-child consistency: Paths in a GIN graph have to contain
  consistent keys. Tuples on parent pages consistently include tuples
  from child pages; parent tuples do not require any adjustments.

- balanced-tree / graph: Each internal page has at least one downlink,
  and can reference either only leaf pages or only internal pages.

The GIN verification is based on work by Grigory Kryachko, reworked by
Heikki Linnakangas and with various improvements by Andrey Borodin.
Investigation and fixes for multiple bugs by Kirill Reshke.

Author: Grigory Kryachko <GSKryachko@gmail.com>
Author: Heikki Linnakangas <hlinnaka@iki.fi>
Author: Andrey Borodin <amborodin@acm.org>
Reviewed-By: José Villanova <jose.arthur@gmail.com>
Reviewed-By: Aleksander Alekseev <aleksander@timescale.com>
Reviewed-By: Nikolay Samokhvalov <samokhvalov@gmail.com>
Reviewed-By: Andres Freund <andres@anarazel.de>
Reviewed-By: Tomas Vondra <tomas.vondra@enterprisedb.com>
Reviewed-By: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-By: Mark Dilger <mark.dilger@enterprisedb.com>
Reviewed-By: Peter Geoghegan <pg@bowt.ie>
Discussion: https://postgr.es/m/45AC9B0A-2B45-40EE-B08F-BDCF5739D1E1%40yandex-team.ru
2025-03-29 15:44:29 +01:00
Peter Eisentraut
53a2a1564a pgbench: Make set_random_seed() 64-bit everywhere.
Delete an intermediate variable, a redundant cast, a use of long and a
use of long long.  scanf() the seed directly into a uint64, now that we
can do that with SCNu64 from <inttypes.h>.

The previous coding was from pre-C99 times when %lld might not have been
there, so it read into an unsigned long.  Therefore behavior varied
by OS, and --random-seed would accept either 32- or 64-bit seeds.  Now
it's the same everywhere.
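
A standalone sketch of the parsing change (the seed value here is
arbitrary, for illustration only):

  #include <inttypes.h>
  #include <stdio.h>

  int main(void)
  {
      uint64_t seed;
      const char *arg = "12345678901234567890";

      /* Read the seed directly into a uint64 via SCNu64. */
      if (sscanf(arg, "%" SCNu64, &seed) != 1)
          return 1;
      printf("seed = %" PRIu64 "\n", seed);
      return 0;
  }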

Author: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://postgr.es/m/b936d2fb-590d-49c3-a615-92c3a88c6c19%40eisentraut.org
2025-03-29 15:24:42 +01:00
Tomas Vondra
d70b17636d amcheck: Move common routines into a separate module
Before performing checks on an index, we need to take some safety
measures that apply to all index AMs. This includes:

* verifying that the index can be checked - Only selected AMs are
supported by amcheck (right now only B-Tree). The index has to be
valid and not a temporary index from another session.

* changing (and then restoring) user's security context

* obtaining proper locks on the index (and table, if needed)

* discarding GUC changes from the index functions

Until now this was implemented in the B-Tree amcheck module, but it's
something every AM will have to do. So relocate the code into a new
module verify_common for reuse.

The shared steps are implemented by amcheck_lock_relation_and_check(),
receiving the AM-specific verification as a callback. Custom parameters
may be supplied using a pointer.

Author: Andrey Borodin <amborodin@acm.org>
Reviewed-By: José Villanova <jose.arthur@gmail.com>
Reviewed-By: Aleksander Alekseev <aleksander@timescale.com>
Reviewed-By: Nikolay Samokhvalov <samokhvalov@gmail.com>
Reviewed-By: Andres Freund <andres@anarazel.de>
Reviewed-By: Tomas Vondra <tomas@vondra.me>
Reviewed-By: Mark Dilger <mark.dilger@enterprisedb.com>
Reviewed-By: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://postgr.es/m/45AC9B0A-2B45-40EE-B08F-BDCF5739D1E1%40yandex-team.ru
2025-03-29 15:14:49 +01:00
Tomas Vondra
fb9dff7663 Fix grammar in GIN README
Author: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://postgr.es/m/CALdSSPgu9uAhVYojQ0yjG%3Dq5MaqmiSLUJPhz%2B-u7cA6K6Mc9UA%40mail.gmail.com
2025-03-29 15:14:25 +01:00
Dean Rasheed
8b6a0e2392 Fix MERGE with DO NOTHING actions into a partitioned table.
ExecInitPartitionInfo() duplicates much of the logic in
ExecInitMerge(), except that it failed to handle DO NOTHING
actions. This would cause an "unknown action in MERGE WHEN clause"
error if a MERGE with any DO NOTHING actions attempted to insert into
a partition not already initialised by ExecInitModifyTable().

Bug: #18871
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Author: Tender Wang <tndrwang@gmail.com>
Reviewed-by: Gurjeet Singh <gurjeet@singh.im>
Discussion: https://postgr.es/m/18871-b44e3c96de3bd2e8%40postgresql.org
Backpatch-through: 15
2025-03-29 09:58:40 +00:00
Peter Eisentraut
a0ed19e0a9 Use PRI?64 instead of "ll?" in format strings (continued).
Continuation of work started in commit 15a79c73, after initial trial.

Author: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://postgr.es/m/b936d2fb-590d-49c3-a615-92c3a88c6c19%40eisentraut.org
2025-03-29 10:43:57 +01:00
Jeff Davis
a0a4601765 Matview statistics depend on matview data.
REFRESH MATERIALIZED VIEW replaces the storage, which resets
statistics, so statistics must be restored afterward.

If both statistics and data are being dumped for a materialized view,
add a dependency from the former to the latter. Defer the statistics
to SECTION_POST_DATA, and use RESTORE_PASS_POST_ACL.

Reported-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Discussion: https://postgr.es/m/CAExHW5s47kmubpbbRJzSM-Zfe0Tj2O3GBagB7YAyE8rQ-V24Uw@mail.gmail.com
2025-03-28 16:12:55 -07:00
Alexander Korotkov
775a06d44c Make group_similar_or_args() reorder clause list as little as possible
Currently, group_similar_or_args() permutes the original positions of clauses
regardless of whether it manages to find any groups of similar clauses.
While we are not providing any strict guarantees on preserving the original
order of OR-clauses, it is preferred that the original order be modified as
little as possible.

This commit changes the reordering algorithm of group_similar_or_args() in
the following way.  We reorder each group of similar clauses so that the
first item of the group stays in place, but all the other items are moved
after it.  So, if there are no similar clauses, the order of clauses stays
the same.  When there are some groups, only the required reordering happens
the rest of the clauses remain in their places.

Reported-by: Andrei Lepikhov <lepihov@gmail.com>
Discussion: https://postgr.es/m/3ac7c436-81e1-4191-9caf-b0dd70b51511%40gmail.com
Reviewed-by: Pavel Borisov <pashkin.elfe@gmail.com>
Reviewed-by: Andrei Lepikhov <lepihov@gmail.com>
Reviewed-by: Alena Rybakina <a.rybakina@postgrespro.ru>
2025-03-28 23:37:49 +02:00
Nathan Bossart
519338ace4 Optimize popcount functions with ARM SVE intrinsics.
This commit introduces SVE implementations of pg_popcount{32,64}.
Unlike the Neon versions, we need an additional configure-time
check to determine if the compiler supports SVE intrinsics, and we
need a runtime check to determine if the current CPU supports SVE
instructions.  Our testing showed that the SVE implementations are
much faster for larger inputs and are comparable to the status
quo for smaller inputs.

Author: "Devanga.Susmitha@fujitsu.com" <Devanga.Susmitha@fujitsu.com>
Co-authored-by: "Chiranmoy.Bhattacharya@fujitsu.com" <Chiranmoy.Bhattacharya@fujitsu.com>
Co-authored-by: "Malladi, Rama" <ramamalladi@hotmail.com>
Reviewed-by: John Naylor <johncnaylorls@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://postgr.es/m/010101936e4aaa70-b474ab9e-b9ce-474d-a3ba-a3dc223d295c-000000%40us-west-2.amazonses.com
Discussion: https://postgr.es/m/OSZPR01MB84990A9A02A3515C6E85A65B8B2A2%40OSZPR01MB8499.jpnprd01.prod.outlook.com
2025-03-28 16:20:20 -05:00
Peter Eisentraut
3c8e463b0d Revert "Tidy up locale thread safety in ECPG library."
This reverts commit 8e993bff53.

It causes various build failures on the buildfarm, to be investigated.

Discussion: https://postgr.es/m/CWZBBRR6YA8D.8EHMDRGLCKCD%40neon.tech
2025-03-28 21:27:37 +01:00
Nathan Bossart
6be53c2767 Optimize popcount functions with ARM Neon intrinsics.
This commit introduces Neon implementations of pg_popcount{32,64},
pg_popcount(), and pg_popcount_masked().  As in simd.h, we assume
that all available AArch64 hardware supports Neon, so we don't need
any new configure-time or runtime checks.  Some compilers already
emit Neon instructions for these functions, but our hand-rolled
implementations for pg_popcount() and pg_popcount_masked()
performed better in testing, likely due to better instruction-level
parallelism.
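
For context, the portable bit-twiddling fallback that such SIMD versions
compete against looks roughly like this (a standard SWAR popcount, shown
for illustration; not the committed Neon code):

  #include <stdint.h>
  #include <stdio.h>

  static int popcount64_slow(uint64_t x)
  {
      /* Classic SWAR reduction: pairwise, nibble, then byte sums. */
      x = x - ((x >> 1) & UINT64_C(0x5555555555555555));
      x = (x & UINT64_C(0x3333333333333333)) +
          ((x >> 2) & UINT64_C(0x3333333333333333));
      x = (x + (x >> 4)) & UINT64_C(0x0F0F0F0F0F0F0F0F);
      return (int) ((x * UINT64_C(0x0101010101010101)) >> 56);
  }

  int main(void)
  {
      printf("%d\n", popcount64_slow(0xFF));   /* prints 8 */
      return 0;
  }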

Author: "Chiranmoy.Bhattacharya@fujitsu.com" <Chiranmoy.Bhattacharya@fujitsu.com>
Reviewed-by: John Naylor <johncnaylorls@gmail.com>
Discussion: https://postgr.es/m/010101936e4aaa70-b474ab9e-b9ce-474d-a3ba-a3dc223d295c-000000%40us-west-2.amazonses.com
2025-03-28 14:49:35 -05:00
Heikki Linnakangas
51a0382e8d Fix crash if LockErrorCleanup() is called twice
The refactoring in commit 3c0fd64fec removed the clearing of
awaitedLock from LockErrorCleanup(). It's still needed, otherwise
LockErrorCleanup() during abort processing will try to update the
LOCALLOCK struct even after the lock has already been released. Put it
back.

Reported-by: Richard Guo <guofenglinux@gmail.com>
Reported-by: Robins Tharakan <tharakan@gmail.com>
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/CAMbWs4_dNX1SzBmvFdoY-LxJh_4W_BjtVd5i008ihfU-wFF=eg@mail.gmail.com
Discussion: https://www.postgresql.org/message-id/18832-38e5575b1bbd7277@postgresql.org
Discussion: https://www.postgresql.org/message-id/e11a30e5-c0d8-491d-8546-3a1b50c10ad4@gmail.com
2025-03-28 20:19:17 +02:00
Nathan Bossart
9ac6f7e7ce Rename TRY_POPCNT_FAST to TRY_POPCNT_X86_64.
This macro protects x86_64-specific code, and a subsequent commit
will introduce AArch64-specific versions of that code.  To prevent
confusion, let's rename it to clearly indicate that it's for
x86_64.  We should likely move this code to its own file (perhaps
merging it with the AVX-512 popcount code), but that is left as a
future exercise.

Reviewed-by: "Chiranmoy.Bhattacharya@fujitsu.com" <Chiranmoy.Bhattacharya@fujitsu.com>
Reviewed-by: John Naylor <johncnaylorls@gmail.com>
Discussion: https://postgr.es/m/010101936e4aaa70-b474ab9e-b9ce-474d-a3ba-a3dc223d295c-000000%40us-west-2.amazonses.com
2025-03-28 12:27:47 -05:00
Masahiko Sawada
a5419bc72e Fix timestamp overflow in UUIDv7 implementation.
The uuidv7_interval() function previously converted a shifted
microsecond-precision timestamp (64-bit integer) to another 64-bit
integer representing a timestamp with nanosecond precision. This
conversion caused overflow for dates beyond the year 2262. The
millisecond and sub-millisecond parts were then extracted from this
nanosecond-precision timestamp and stored in UUIDv7 values.

With this commit, the millisecond and sub-millisecond parts are stored
directly into the UUIDv7 value without being converted back to a
nanosecond precision timestamp. Following RFC 9562, the timestamp is
stored as an unsigned integer, enabling support for dates up to the
year 10889.
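
A standalone sketch of storing the 48-bit unsigned millisecond timestamp
directly into the UUID bytes, per RFC 9562 (illustrative, not the committed
code):

  #include <assert.h>
  #include <stdint.h>

  /* Place unix_ts_ms into the first 6 bytes, big-endian. */
  static void uuidv7_set_ts(unsigned char uuid[16], uint64_t unix_ts_ms)
  {
      for (int i = 0; i < 6; i++)
          uuid[i] = (unsigned char) (unix_ts_ms >> (8 * (5 - i)));
  }

  int main(void)
  {
      unsigned char uuid[16] = {0};

      uuidv7_set_ts(uuid, (UINT64_C(1) << 48) - 1);   /* max 48-bit value */
      assert(uuid[0] == 0xFF && uuid[5] == 0xFF);
      return 0;
  }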

Reported and fixed by Andrey Borodin, with cosmetic changes and
regression tests by me.

Reported-by: Andrey Borodin <x4mmm@yandex-team.ru>
Author: Andrey Borodin <x4mmm@yandex-team.ru>
Discussion: https://postgr.es/m/96DEC2D9-659A-40E8-B7BA-AF5D162A9E21@yandex-team.ru
2025-03-28 09:39:11 -07:00
Peter Eisentraut
8e993bff53 Tidy up locale thread safety in ECPG library.
Remove setlocale() and _configthreadlocal() as fallback strategy on
systems that don't have uselocale(), where ECPG tries to control
LC_NUMERIC formatting on input and output of floating point numbers.  It
was probably broken on some systems (NetBSD), and the code was also
quite messy and complicated, with obsolete configure tests (Windows).
It was also arguably broken, or at least had unstated environmental
requirements, if pgtypeslib code was called directly.

Instead, introduce PG_C_LOCALE to refer to the "C" locale as a locale_t
value.  It maps to the special constant LC_C_LOCALE when defined by libc
(macOS, NetBSD), or otherwise uses a process-lifetime locale_t that is
allocated on first use, just as ECPG previously did itself.  The new
replacement might be more widely useful.  Then change the float parsing
and printing code to pass that to _l() functions where appropriate.

Unfortunately the portability of those functions is a bit complicated.
First, many obvious and useful _l() functions are missing from POSIX,
though most standard libraries define some of them anyway.  Second,
although the thread-safe save/restore technique can be used to replace
the missing ones, Windows and NetBSD refused to implement standard
uselocale().  They might have a point: "wide scope" uselocale() is hard
to combine with other code and error-prone, especially in library code.
Luckily they have the _l() functions we want so far anyway.  So we have
to be prepared for both ways of doing things:

1.  In ECPG, use strtod_l() for parsing, and supply a port.h replacement
using uselocale() over a limited scope if missing.

2.  Inside our own snprintf.c, use three different approaches to format
floats.  For frontend code, call libc's snprintf_l(), or wrap libc's
snprintf() in uselocale() if it's missing.  For backend code, snprintf.c
can keep assuming that the global locale's LC_NUMERIC is "C" and call
libc's snprintf() without change, for now.

(It might eventually be possible to call our in-tree Ryū routines to
display floats in snprintf.c, given the C-locale-always remit of our
in-tree snprintf(), but this patch doesn't risk changing anything that
complicated.)

Author: Thomas Munro <thomas.munro@gmail.com>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Reviewed-by: Tristan Partin <tristan@partin.io>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://postgr.es/m/CWZBBRR6YA8D.8EHMDRGLCKCD%40neon.tech
2025-03-28 16:18:36 +01:00
Peter Eisentraut
2247281c47 Cast result of i64abs() back to int64
Without the cast, the return type could be long or long long,
depending on what int64 is underneath.  This doesn't affect code
correctness, but it could result in format-mismatch warnings when
attempting to printf such values using PRId64.
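
A standalone illustration of the mismatch and the fix (the i64abs macro
here is a sketch, not the actual definition):

  #include <inttypes.h>
  #include <stdio.h>
  #include <stdlib.h>

  /* Without the cast, llabs() returns long long even where int64_t is
   * long, which can trip printf format checking with PRId64. */
  #define i64abs(x) ((int64_t) llabs(x))

  int main(void)
  {
      int64_t v = -42;

      printf("%" PRId64 "\n", i64abs(v));
      return 0;
  }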

Reported-by: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/CA+hUKGJc4s+Wyb3EFOQNN9VVK+Qv40r2LK41o9PkS9ThxviTvQ@mail.gmail.com
2025-03-28 14:34:57 +01:00
Robert Haas
83ccc85859 pg_overexplain: Use PG_MODULE_MAGIC_EXT.
I committed this contrib module just after Tom committed
55527368bd07248e91e3d37a782bf66b76f06865; adjust it to match.

Author: Man Zeng <zengman@halodbtech.com>
Discussion: http://postgr.es/m/174313513707.60295.16516085012903412705.pgcf@coridan.postgresql.org
2025-03-28 09:16:29 -04:00
Robert Haas
9f0c36aea0 pg_overexplain: Call previous hooks as appropriate.
It makes no sense to remember the previous values of the hook variables
and then never bother calling those functions. Thanks to Andrei for
spotting my goof.

Author: Andrei Lepikhov <lepihov@gmail.com>
Discussion: http://postgr.es/m/41a344e3-ffb1-4296-8ba7-801f1e9642e5@gmail.com
2025-03-28 09:02:37 -04:00
Peter Eisentraut
cdc168ad4b Add support for not-null constraints on virtual generated columns
This was left out of the original patch for virtual generated columns
(commit 83ea6c5402).

This just involves a bit of extra work in the executor to expand the
generation expressions and run an "IS NOT NULL" test against them.

There is also a bit of work to make sure that not-null constraints are
checked during a table rewrite.

Author: jian he <jian.universality@gmail.com>
Reviewed-by: Xuneng Zhou <xunengzhou@gmail.com>
Reviewed-by: Navneet Kumar <thanit3111@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://postgr.es/m/CACJufxHArQysbDkWFmvK+D1TPHQWWTxWN15cMuUaTYX3xhQXgg@mail.gmail.com
2025-03-28 13:53:37 +01:00
Peter Eisentraut
747ddd38cb Modernize some code a bit
Modernize code in ExecRelCheck() and ExecConstraints() a bit,
preparing the way for some new code.

Co-authored-by: jian he <jian.universality@gmail.com>
Reviewed-by: Xuneng Zhou <xunengzhou@gmail.com>
Reviewed-by: Navneet Kumar <thanit3111@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://postgr.es/m/CACJufxHArQysbDkWFmvK+D1TPHQWWTxWN15cMuUaTYX3xhQXgg@mail.gmail.com
2025-03-28 10:49:15 +01:00
Peter Eisentraut
9a9ead1105 Rename a node field for clarity
Rename ResultRelInfo.ri_ConstraintExprs to ri_CheckConstraintExprs.
This reflects its specific purpose better and avoids confusion with
adjacent fields with similar but distinct purposes.

Discussion: https://postgr.es/m/CACJufxHArQysbDkWFmvK+D1TPHQWWTxWN15cMuUaTYX3xhQXgg@mail.gmail.com
2025-03-28 09:50:01 +01:00
Amit Kapila
fb2ea12f42 pg_createsubscriber: Add '--all' option.
The '--all' option indicates that the tool queries the source server
(publisher) for all databases and creates subscriptions on the target
server (subscriber) for databases with matching names.  Without this, the
user needs to explicitly specify every database by using a -d option for
each one.

This simplifies converting a physical standby to a logical subscriber,
particularly during upgrades.

The options '--database', '--publication', '--subscription', and
'--replication-slot' cannot be used when '--all' is specified.

Author: Shubham Khanna <khannashubham1197@gmail.com>
Reviewed-by: vignesh C <vignesh21@gmail.com>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Euler Taveira <euler@eulerto.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: Shlok Kyal <shlok.kyal.oss@gmail.com>
Discussion: https://postgr.es/m/CAHv8RjKhA=_h5vAbozzJ1Opnv=KXYQHQ-fJyaMfqfRqPpnC2bA@mail.gmail.com
2025-03-28 12:26:39 +05:30
Peter Eisentraut
890fc826c9 Use thread-safe strftime_l() instead of strftime().
This removes some setlocale() calls and a lot of commentary about how
dangerous that is.  strftime_l() is from POSIX 2008, and on Windows we
use _wcsftime_l().

While here, adjust error message for strftime_l() failure: it does not
in practice set errno (even though POSIX says it could), so no %m.
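
A standalone sketch of the POSIX 2008 interface in use (format string and
locale chosen arbitrarily, for illustration):

  #include <locale.h>
  #include <stdio.h>
  #include <time.h>

  int main(void)
  {
      locale_t loc = newlocale(LC_ALL_MASK, "C", (locale_t) 0);
      struct tm tmres;
      time_t now = time(NULL);
      char buf[64];

      if (loc == (locale_t) 0)
          return 1;
      localtime_r(&now, &tmres);
      /* Format using an explicit locale; no process-wide setlocale(). */
      if (strftime_l(buf, sizeof(buf), "%A %d %B %Y", &tmres, loc) > 0)
          puts(buf);
      freelocale(loc);
      return 0;
  }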

Author: Thomas Munro <thomas.munro@gmail.com>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/CA%2BhUKGJqVe0%2BPv9dvC9dSums_PXxGo9SWcxYAMBguWJUGbWz-A%40mail.gmail.com
2025-03-28 07:13:43 +01:00
Amit Kapila
474d7a1fd8 Stabilize tests added in 3abe9dc188.
The problem is that after the ALTER SUBSCRIPTION tap_sub SET PUBLICATION
command, we didn't wait for the new walsender to start on the publisher.
Immediately after the ALTER, we performed an Insert and expected it to replicate.
However, the replication could start from a point after the INSERT location,
and as the subscription isn't copying initial data, we could miss such an
Insert.

The fix is to wait for the connection to be established between publisher and
subscriber before starting DML operations that are expected to replicate.

As per CI.

Reported-by: Andres Freund <andres@anarazel.de>
Author: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Discussion: https://postgr.es/m/CALDaNm2ms1deM5EYNLFEfESv_Kw=Y4AiTB0LP=qGS-UpFwGbPg@mail.gmail.com
2025-03-28 11:03:05 +05:30
Daniel Gustafsson
058b5152f0 Fix guc_malloc calls for consistency and OOM checks
check_createrole_self_grant and check_synchronized_standby_slots
were allocating memory on a LOG elevel without checking if the
allocation succeeded or not, which would have led to a segfault
on allocation failure.

On top of that, a number of callsites were using the ERROR level,
relying on erroring out rather than returning false to let the
GUC machinery handle it gracefully.  Other callsites used WARNING
instead of LOG.  While neither of those is wrong as such, this changes all
check_ functions to do it consistently with LOG.

init_custom_variable gets a promoted elevel to FATAL to keep
the guc_malloc error handling in line with the rest of the
error handling in that function, which already calls FATAL.  If
we encounter an OOM in this callsite there is no graceful
handling to be had, better to error out hard.
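
A standalone sketch of the check-hook pattern described above (names and
the stderr stand-in for elog are illustrative, not PostgreSQL code):

  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>

  /* Allocate, logging (not erroring) on failure, like guc_malloc(LOG). */
  static void *guc_malloc_log(size_t size)
  {
      void *p = malloc(size);

      if (p == NULL)
          fprintf(stderr, "LOG:  out of memory\n");
      return p;
  }

  /* A check hook returns false on OOM so the GUC machinery can cope. */
  static int check_some_guc(const char *newval)
  {
      char *copy = guc_malloc_log(strlen(newval) + 1);

      if (copy == NULL)
          return 0;
      strcpy(copy, newval);
      /* ... validate the copy here ... */
      free(copy);
      return 1;
  }

  int main(void)
  {
      return check_some_guc("on") ? 0 : 1;
  }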

Backpatch the fix to check_createrole_self_grant down to v16
and the fix to check_synchronized_standby_slots down to v17
where they were introduced.

Author: Daniel Gustafsson <daniel@yesql.se>
Reported-by: Nikita <pm91.arapov@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Bug: #18845
Discussion: https://postgr.es/m/18845-582c6e10247377ec@postgresql.org
Backpatch-through: 16
2025-03-27 22:57:34 +01:00
Melanie Plageman
043799fa08 Use streaming read I/O in heap amcheck
Instead of directly invoking ReadBuffer() for each unskippable block in
the heap relation, verify_heapam() now uses the read stream API to
acquire the next buffer to check for corruption.

Author: Matheus Alcantara <matheusssilv97@gmail.com>
Co-authored-by: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Nazir Bilal Yavuz <byavuz81@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: jian he <jian.universality@gmail.com>
Discussion: https://postgr.es/m/flat/CAFY6G8eLyz7%2BsccegZYFj%3D5tAUR-GZ9uEq4Ch5gvwKqUwb_hCA%40mail.gmail.com
2025-03-27 14:04:14 -04:00
Tom Lane
4623d71443 Prevent assertion failure in contrib/pg_freespacemap.
Applying pg_freespacemap() to a relation lacking storage (such as a
view) caused an assertion failure, although there was no ill effect
in non-assert builds.  Add an error check for that case.

Bug: #18866
Reported-by: Robins Tharakan <tharakan@gmail.com>
Author: Tender Wang <tndrwang@gmail.com>
Reviewed-by: Euler Taveira <euler@eulerto.com>
Discussion: https://postgr.es/m/18866-d68926d0f1c72d44@postgresql.org
Backpatch-through: 13
2025-03-27 13:20:23 -04:00
Tom Lane
d66997dfe8 Avoid mixing designated and non-designated field initializers.
As revised by commit 9324c8c58, PG_MODULE_MAGIC constructed a
struct initializer containing both designated fields and a
non-designated "0".  That's okay in C, but not in C++, with
the result that extensions written in C++ failed to compile.
Change it to use only designated field initializers.

Author: Yurii Rashkovskii <yrashk@omnigres.com>
Discussion: https://postgr.es/m/CAG=VW14mctsR543gpzLCuJ9JgJqwa=ptmBfGvxEjs+k8Jf7-Bg@mail.gmail.com
2025-03-27 11:06:30 -04:00
Daniel Gustafsson
0f3604a518 psql: Fix incorrect equality comparison
Commit 1a759c8327 contained an incorrect equality comparison
which was discovered by Coverity.

Reported-by: Ranier Vilela <ranier.vf@gmail.com>
Discussion: https://postgr.es/m/CAEudQApfAWzLo+oSuy2byXktdr7R8KJC_ACT5VV8fontrL35Pw@mail.gmail.com
2025-03-27 14:09:25 +01:00
Robert Haas
081ec08e6a pg_overexplain: Filter out actual row count from test result.
Per buildfarm, these are not stable. In particular, 1/8 is sometimes
0.12 and sometimes 0.13.
2025-03-27 09:00:46 -04:00
Álvaro Herrera
9fbd53dea5
Remove the query_id_squash_values GUC
Commit 62d712ecfd introduced the capability to calculate the same
queryId for queries whose lists of constants in an IN clause differ only
in length.  This behavior was originally enabled with a GUC
query_id_squash_values.  After a discussion about the value of such a
GUC, it was decided to back out of the use of a GUC and make the
squashing behavior the only available option.

Author: Sami Imseih <samimseih@gmail.com>
Discussion: https://postgr.es/m/Z-LZyygkkNyA8-kR@msg.df7cb.de
Discussion: https://postgr.es/m/CA+q6zcVTK-3C-8NWV1oY2NZrvtnMCDqnyYYyk1T7WMUG65MeOQ@mail.gmail.com
2025-03-27 13:33:37 +01:00
Peter Eisentraut
5d5f415816 Expand test a bit
Make the pg_constraint output in the inherit test show the convalidated column
as well.  This shows the interaction between convalidated and
conenforced.

This is extracted from a larger patch so that this reformatting isn't
distracting there.

Author: Amul Sul <amul.sul@enterprisedb.com>
Discussion: https://www.postgresql.org/message-id/flat/CAAJ_b962c5AcYW9KUt_R_ER5qs3fUGbe4az-SP-vuwPS-w-AGA@mail.gmail.com
2025-03-27 12:11:15 +01:00
Peter Eisentraut
b98be8a2a2 Provide thread-safe pg_localeconv_r().
This involves four different implementation strategies:

1.  For Windows, we now require _configthreadlocale() to be available
and work (commit f1da075d9a), and the documentation says that the
object returned by localeconv() is in thread-local memory.

2.  For glibc, we translate to nl_langinfo_l() calls, because it
offers the same information that way as an extension, and that API is
thread-safe.

3.  For macOS/*BSD, use localeconv_l(), which is thread-safe.

4.  For everything else, use uselocale() to set the locale for the
thread, and use a big ugly lock to defend against the returned object
being concurrently clobbered.  In practice this currently means only
Solaris.

The new call is used in pg_locale.c, replacing calls to setlocale() and
localeconv().
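
A standalone sketch of strategy 2, querying numeric formatting through the
thread-safe nl_langinfo_l() (the locale name is chosen for illustration):

  #include <langinfo.h>
  #include <locale.h>
  #include <stdio.h>

  int main(void)
  {
      locale_t loc = newlocale(LC_NUMERIC_MASK, "C", (locale_t) 0);

      if (loc == (locale_t) 0)
          return 1;
      /* Thread-safe equivalents of localeconv()'s decimal_point and
       * thousands_sep. */
      printf("decimal point: '%s'\n", nl_langinfo_l(RADIXCHAR, loc));
      printf("thousands sep: '%s'\n", nl_langinfo_l(THOUSEP, loc));
      freelocale(loc);
      return 0;
  }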

Author: Thomas Munro <thomas.munro@gmail.com>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/CA%2BhUKGJqVe0%2BPv9dvC9dSums_PXxGo9SWcxYAMBguWJUGbWz-A%40mail.gmail.com
2025-03-27 10:54:28 +01:00
Álvaro Herrera
4a02af8b1a
Simplify syntax for ALTER TABLE ALTER CONSTRAINT NO INHERIT
Commit d45597f72f introduced the ability to change a not-null
constraint from NO INHERIT to INHERIT and vice versa, but we included
the SET noise word in the syntax for it.  The SET turns out not to be
necessary and goes against what the SQL standard says for other ALTER
TABLE subcommands, so remove it.

This changes the way this command is processed for constraint types
other than not-null, so there are some error message changes.

Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Reviewed-by: Suraj Kharage <suraj.kharage@enterprisedb.com>
Discussion: https://postgr.es/m/202503251602.vsxaehsyaoac@alvherre.pgsql
2025-03-27 09:24:52 +01:00
Michael Paquier
72c2f36d57 libpq: Add TAP tests for service files and names
This commit adds a set of regression tests that checks various patterns
with service names and service files, with:
- Service file with no contents, used as default for PGSERVICEFILE to
prevent any lookups at the HOME directory of an environment where the
test is run.
- Service file with valid service name and its section.
- Service file at the root of PGSYSCONFDIR, named pg_service.conf.
- Missing service file.
- Service name defined as a connection parameter or as PGSERVICE.

Note that PGSYSCONFDIR is set to always point at a temporary directory
created by the test, so that we never try to look at SYSCONFDIR.

This set of tests has come up as a useful independent addition while
discussing a patch that adds an equivalent of PGSERVICEFILE as a
connection parameter, since there have never been any tests for service
files and service names.  Torsten Foertsch and Ryo Kanbayashi have
provided a basic implementation, which I have expanded to what is
introduced in this commit.

Author: Torsten Foertsch <tfoertsch123@gmail.com>
Author: Ryo Kanbayashi <kanbayashi.dev@gmail.com>
Author: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/CAKkG4_nCjx3a_F3gyXHSPWxD8Sd8URaM89wey7fG_9g7KBkOCQ@mail.gmail.com
2025-03-27 16:01:38 +09:00
David Rowley
ad9a23bc4f Optimize Query jumble
f31aad9b0 adjusted query jumbling so it no longer ignores NULL nodes
during the jumble.  This added some overhead.  Here we tune a few
things to make jumbling faster again.  This makes jumbling perform
similarly to, or even slightly faster than, prior to that change.

Author: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/CAApHDvreP04nhTKuYsPw0F-YN+4nr4f=L72SPeFb81jfv+2c7w@mail.gmail.com
2025-03-27 18:34:34 +13:00
David Rowley
f31aad9b07 Fix query jumbling to account for NULL nodes
Previously NULL nodes were ignored.  This could cause the computed query ID
to match for queries where, of two fields next to each other in their Node
struct, one was NULL and the other non-NULL.  For example, the Query struct
had distinctClause and sortClause next to each other.  If someone wrote:

SELECT DISTINCT c1 FROM t;

and then:

SELECT c1 FROM t ORDER BY c1;

these would produce the same query ID since, in the first query, we
ignored the NULL sortClause and appended the jumble bytes for the
distinctClause.  In the latter query, we did nothing for the NULL
distinctClause and then jumbled the non-NULL sortClause, and since the node
representation stored is the same in both cases, the query IDs were
identical.

Here we fix this by always accounting for NULL nodes by recording that
we saw a NULL in the jumble buffer.  This fixes the issue as the order in
which the NULL is recorded differs between the above two queries.
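
A standalone sketch of the difference (a toy jumble over two child
pointers; names and buffer handling are illustrative, not the jumbling
code itself):

  #include <stdio.h>
  #include <string.h>

  static void jumble(unsigned char *buf, size_t *off, const void *p, size_t n)
  {
      memcpy(buf + *off, p, n);
      *off += n;
  }

  /* Record a marker byte for NULL children instead of skipping them. */
  static void jumble_node(unsigned char *buf, size_t *off, const int *node)
  {
      const unsigned char null_marker = 0;

      if (node == NULL)
          jumble(buf, off, &null_marker, 1);
      else
          jumble(buf, off, node, sizeof(*node));
  }

  int main(void)
  {
      unsigned char a[16] = {0}, b[16] = {0};
      size_t oa = 0, ob = 0;
      int x = 7;

      jumble_node(a, &oa, NULL); jumble_node(a, &oa, &x);   /* (NULL, x) */
      jumble_node(b, &ob, &x);   jumble_node(b, &ob, NULL); /* (x, NULL) */
      puts(oa == ob && memcmp(a, b, oa) == 0 ? "same" : "different");
      return 0;
  }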

Author: Bykov Ivan <i.bykov@modernsys.ru>
Author: Michael Paquier <michael@paquier.xyz>
Author: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/aafce7966e234372b2ba876c0193f1e9%40localhost.localdomain
2025-03-27 18:23:00 +13:00
Michael Paquier
44fe6ceb51 doc: Correct description of values used in FSM for indexes
The implementation of the FSM for indexes is simpler than for heap, where 0
is used to track if a page is in use and (BLCKSZ - 1) if a page is free.
One comment in indexfsm.c and one description in the documentation of
pg_freespacemap were incorrect about that.

Author: Alex Friedman <alexf01@gmail.com>
Discussion: https://postgr.es/m/71eef655-c192-453f-ac45-2772fec2cb04@gmail.com
Backpatch-through: 13
2025-03-27 10:20:41 +09:00
Andres Freund
c325a7633f aio: Add io_method=io_uring
Performing AIO using io_uring can be considerably faster than
io_method=worker, particularly when lots of small IOs are issued, as
a) the context-switch overhead for worker-based AIO becomes more significant
b) the number of IO workers can become limiting

io_uring, however, is linux specific and requires an additional compile-time
dependency (liburing).

This implementation is fairly simple and there are substantial optimization
opportunities.

The description of the existing AIO_IO_COMPLETION wait event is updated to
make the difference between it and the new AIO_IO_URING_EXECUTION clearer.
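
For readers unfamiliar with io_uring, a minimal liburing read sketch
(build with -luring; the file path is arbitrary; not PostgreSQL code):

  #include <fcntl.h>
  #include <liburing.h>
  #include <stdio.h>
  #include <unistd.h>

  int main(void)
  {
      struct io_uring ring;
      struct io_uring_sqe *sqe;
      struct io_uring_cqe *cqe;
      char buf[128];
      int fd = open("/etc/hosts", O_RDONLY);

      if (fd < 0 || io_uring_queue_init(8, &ring, 0) < 0)
          return 1;
      /* Queue one read, submit it, then wait for its completion. */
      sqe = io_uring_get_sqe(&ring);
      io_uring_prep_read(sqe, fd, buf, sizeof(buf), 0);
      io_uring_submit(&ring);
      if (io_uring_wait_cqe(&ring, &cqe) == 0)
      {
          printf("read %d bytes\n", cqe->res);
          io_uring_cqe_seen(&ring, cqe);
      }
      io_uring_queue_exit(&ring);
      close(fd);
      return 0;
  }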

Reviewed-by: Noah Misch <noah@leadboat.com>
Reviewed-by: Jakub Wartak <jakub.wartak@enterprisedb.com>
Discussion: https://postgr.es/m/uvrtrknj4kdytuboidbhwclo4gxhswwcpgadptsjvjqcluzmah%40brqs62irg4dt
Discussion: https://postgr.es/m/20210223100344.llw5an2aklengrmn@alap3.anarazel.de
Discussion: https://postgr.es/m/stj36ea6yyhoxtqkhpieia2z4krnam7qyetc57rfezgk4zgapf@gcnactj4z56m
2025-03-26 19:49:13 -04:00
Andres Freund
8eadd5c73c aio: Add liburing dependency
Will be used in a subsequent commit, to implement io_method=io_uring. Kept
separate for easier review.

Reviewed-by: Noah Misch <noah@leadboat.com>
Discussion: https://postgr.es/m/uvrtrknj4kdytuboidbhwclo4gxhswwcpgadptsjvjqcluzmah%40brqs62irg4dt
2025-03-26 19:45:32 -04:00
Michael Paquier
f056f75daf doc: Mention possible ephemeral discrepancies in pg_stat_activity
Ephemeral inconsistencies across multiple attributes of pg_stat_activity
can exist as the system is designed to be efficient with a low overhead.
This question is raised by users from time to time based on the data
read in the view, so let's add a note in the docs about this
possibility.

Author: Alex Friedman <alexf01@gmail.com>
Reviewed-by: Sami Imseih <samimseih@gmail.com>
Discussion: https://postgr.es/m/8a275154-a654-44b0-ab37-197802f04c7b@gmail.com
2025-03-27 08:07:54 +09:00
Andres Freund
9469d7fdd2 aio: Rename pgaio_io_prep_* to pgaio_io_start_*
The old naming pattern (mirroring liburing's naming) was inconsistent with
the (not yet introduced) callers. It seems better to get rid of the
inconsistency now than to grow more users of the odd naming.

Reported-by: Noah Misch <noah@leadboat.com>
Discussion: https://postgr.es/m/20250326001915.bc.nmisch@google.com
2025-03-26 16:10:29 -04:00
Andres Freund
f321ec237a aio: Pass result of local callbacks to ->report_return
Otherwise the results of e.g. temp table buffer verification errors will not
reach bufmgr.c. Obviously that's not right. Found while expanding the tests
for invalid buffer contents.

Reviewed-by: Noah Misch <noah@leadboat.com>
Discussion: https://postgr.es/m/20250326001915.bc.nmisch@google.com
2025-03-26 16:06:54 -04:00
Andres Freund
96da9050a5 aio: Be more paranoid about interrupts
As reported by Noah, it's possible, although practically very unlikely, that
interrupts could be processed in between pgaio_io_reopen() and
pgaio_io_perform_synchronously(). Prevent that by explicitly holding
interrupts.

It also seems good to add an assertion to pgaio_io_before_prep() to ensure
that interrupts are held, as otherwise FDs referenced by the IO could be
closed during interrupt processing.  All code in the AIO series currently
runs with interrupts held, but it seems better to be paranoid.

Reviewed-by: Noah Misch <noah@leadboat.com>
Reported-by: Noah Misch <noah@leadboat.com>
Discussion: https://postgr.es/m/20250324002939.5c.nmisch@google.com
2025-03-26 16:06:54 -04:00
Robert Haas
47a1f076a7 pg_overexplain: SET jit=off when running tests.
Per buildfarm.
2025-03-26 15:43:25 -04:00
Robert Haas
de65c4dade Fix oversights in commit 8d5ceb113e
It added bogus whitespace at the end of a line in the documentation.
It should not have done that.

The pg_overexplain tests must SET debug_parallel_query = false,
not just RESET debug_parallel_query, or we get failures on test
machines that make debug_parallel_query = true the default.
2025-03-26 14:22:45 -04:00
Robert Haas
8d5ceb113e pg_overexplain: Additional EXPLAIN options for debugging.
There's a fair amount of information in the Plan and PlanState trees
that isn't printed by any existing EXPLAIN option. This means that,
when working on the planner, it's often necessary to rely on facilities
such as debug_print_plan, which produce excessively voluminous
output. Hence, use the new EXPLAIN extension facilities to implement
EXPLAIN (DEBUG) and EXPLAIN (RANGE_TABLE) as extensions to the core
EXPLAIN facility.
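
Usage is a sketch along these lines (the module must be loaded first,
and the exact output is plan-dependent):

LOAD 'pg_overexplain';
EXPLAIN (DEBUG) SELECT 1;
EXPLAIN (RANGE_TABLE) SELECT * FROM pg_class;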

A great deal more could be done here, and the specific choices about
what to print and how are definitely arguable, but this is at least
a starting point for discussion and a jumping-off point for possible
future improvements.

Reviewed-by: Sami Imseih <samimseih@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Andrei Lepikhov <lepihov@gmail.com> (who didn't like it)
Discussion: http://postgr.es/m/CA+TgmoZfvQUBWQ2P8iO30jywhfEAKyNzMZSR+uc2xr9PZBw6eQ@mail.gmail.com
2025-03-26 13:52:21 -04:00
Tomas Vondra
818245506c Keep the decompressed filter in brin_bloom_union
The brin_bloom_union() function combines two BRIN summaries by merging
one filter into the other. With bloom, we have to decompress the filters
first, but the function failed to update the summary to store the merged
filter. As a consequence, the index may be missing some of the data, and
return false negatives.

This issue exists since BRIN bloom indexes were introduced in Postgres
14, but at that point the union function was called only when two
sessions happened to summarize a range concurrently, which is rare. It
got much easier to hit in 17, as parallel builds use the union function
to merge summaries built by workers.

Fixed by storing a pointer to the decompressed filter, and freeing the
original one. Free the second filter too, if it was decompressed. The
freeing is not strictly necessary, because the union is called in
short-lived contexts, but it's tidy.

Backpatch to 14, where BRIN bloom indexes were introduced.

Reported by Arseniy Mukhin, investigation and fix by me.

Reported-by: Arseniy Mukhin
Discussion: https://postgr.es/m/18855-1cf1c8bcc22150e6%40postgresql.org
Backpatch-through: 14
2025-03-26 17:01:41 +01:00
Tom Lane
55527368bd Use PG_MODULE_MAGIC_EXT in our installable shared libraries.
It seems potentially useful to label our shared libraries with version
information, now that a facility exists for retrieving that.  This
patch labels them with the PG_VERSION string.  There was some
discussion about using semantic versioning conventions, but that
doesn't seem terribly helpful for modules with no SQL-level presence;
and for those that do have SQL objects, we typically expect them
to support multiple revisions of the SQL definitions, so it'd still
not be very helpful.

I did not label any of src/test/modules/.  It seems unnecessary since
we don't install those, and besides there ought to be someplace that
still provides test coverage for the original PG_MODULE_MAGIC macro.

Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/dd4d1b59-d0fe-49d5-b28f-1e463b68fa32@gmail.com
2025-03-26 11:11:02 -04:00
Tom Lane
9324c8c580 Introduce PG_MODULE_MAGIC_EXT macro.
This macro allows dynamically loaded shared libraries (modules) to
provide a wired-in module name and version, and possibly other
compile-time-constant fields in future.  This information can be
retrieved with the new pg_get_loaded_modules() function.

This feature is expected to be particularly useful for modules
that do not have any exposed SQL functionality and thus are
not associated with a SQL-level extension object.  But even for
modules that do belong to extensions, being able to verify the
actual code version can be useful.
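
For example (a sketch; the exact output columns are not shown here):

LOAD 'auto_explain';
SELECT * FROM pg_get_loaded_modules();
-- lists each loaded module with whatever name and version its
-- author compiled in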

Author: Andrei Lepikhov <lepihov@gmail.com>
Reviewed-by: Yurii Rashkovskii <yrashk@omnigres.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/dd4d1b59-d0fe-49d5-b28f-1e463b68fa32@gmail.com
2025-03-26 11:06:12 -04:00
Daniel Gustafsson
e92c0632c1 Move GSSAPI includes into its own header
Due to a conflict in macro names on Windows between <wincrypt.h>
and <openssl/ssl.h>, these headers need to be included using a
predictable pattern with an undef to handle that.  The GSSAPI
header <gssapi.h> does include <wincrypt.h>, which causes problems
with compiling PostgreSQL using MSVC when OpenSSL and GSSAPI are
both enabled in the tree.  Rather than fixing this piecemeal for each
file including GSSAPI headers, move the includes and undef
to a new file which should be used to centralize the logic.

This patch is a reworked version of a patch by Imran Zaheer
proposed earlier in the thread. Once this has proven effective
in master we should look at backporting this, as the problem has
existed since at least v16.

Author: Daniel Gustafsson <daniel@yesql.se>
Co-authored-by: Imran Zaheer <imran.zhir@gmail.com>
Reported-by: Dave Page <dpage@pgadmin.org>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: vignesh C <vignesh21@gmail.com>
Discussion: https://postgr.es/m/20240708173204.3f3xjilglx5wuzx6@awork3.anarazel.de
2025-03-26 15:31:46 +01:00
Daniel Gustafsson
1eb399366e psql: Make test robust against locale variations
The test committed in 1a759c8327 was prone to failing when using
locales with a different decimal separator.  Since the test value
isn't the important part, change to using an integer instead.

Author: Daniel Gustafsson <daniel@yesql.se>
Reported-by: Pavel Stehule <pavel.stehule@gmail.com>
Reviewed-by: Pavel Stehule <pavel.stehule@gmail.com>
Discussion: https://postgr.es/m/CAFj8pRDE=7uW7QP4rg-OQLE2i-puYsUUt+eHE-L6_b_J9w=eWg@mail.gmail.com
2025-03-26 13:20:56 +01:00
Peter Eisentraut
3642df265d dblink: SCRAM authentication pass-through
This enables SCRAM authentication for dblink (using dblink_fdw) when
connecting to a foreign server without having to store a plain-text
password in user mapping options.
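
A minimal configuration sketch, assuming dblink accepts the same
use_scram_passthrough option spelling as postgres_fdw:

CREATE SERVER remote_srv FOREIGN DATA WRAPPER dblink_fdw
    OPTIONS (host 'remote.example.com', dbname 'postgres',
             use_scram_passthrough 'true');
CREATE USER MAPPING FOR CURRENT_USER SERVER remote_srv;
-- note: no password option in the user mapping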

This uses the same approach as was implemented for postgres_fdw in
commit 761c79508e.  (It also contains the equivalent of the
subsequent fixes 76563f88cf and d2028e9bbc1.)

Author: Matheus Alcantara <mths.dev@pm.me>
Reviewed-by: Jacob Champion <jacob.champion@enterprisedb.com>
Discussion: https://www.postgresql.org/message-id/flat/CAFY6G8ercA1KES%3DE_0__R9QCTR805TTyYr1No8qF8ZxmMg8z2Q%40mail.gmail.com
2025-03-26 10:49:23 +01:00
Dean Rasheed
a3b6dfd410 Add support for gamma() and lgamma() functions.
These are useful general-purpose math functions which are included in
POSIX and C99, and are commonly included in other math libraries, so
expose them as SQL-callable functions.
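
For example (illustrative; for a positive integer n, gamma(n) is
(n - 1) factorial):

SELECT gamma(6);      -- 120, i.e. 5!
SELECT lgamma(1000);  -- log of gamma(1000), which would itself
                      -- overflow double precision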

Author: Dean Rasheed <dean.a.rasheed@gmail.com>
Reviewed-by: Stepan Neretin <sncfmgg@gmail.com>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Dmitry Koval <d.koval@postgrespro.ru>
Reviewed-by: Alexandra Wang <alexandra.wang.oss@gmail.com>
Discussion: https://postgr.es/m/CAEZATCXpGyfjXCirFk9au+FvM0y2Ah+2-0WSJx7MO368ysNUPA@mail.gmail.com
2025-03-26 09:35:53 +00:00
Richard Guo
7c82b4f711 Fix integer-overflow problem in scram_SaltedPassword()
Setting the iteration count for SCRAM secret generation to INT_MAX
will cause an infinite loop in scram_SaltedPassword() due to integer
overflow, as the loop uses the "i <= iterations" comparison.  To fix,
use "i < iterations" instead.

Back-patch to v16 where the user-settable GUC scram_iterations has
been added.

Author: Kevin K Biju <kevinkbiju@gmail.com>
Reviewed-by: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/CAM45KeEMm8hnxdTOxA98qhfZ9CzGDdgy3mxgJmy0c+2WwjA6Zg@mail.gmail.com
2025-03-26 17:46:51 +09:00
Michael Paquier
787514b30b Use relation name instead of OID in query jumbling for RangeTblEntry
custom_query_jumble (introduced in 5ac462e2b7 as a node field
attribute) is now assigned to the expanded reference name "eref" of
RangeTblEntry, adding the non-qualified aliased relation name, without
the list of column names, to the query jumble computation.  The relation
OID is removed from the query jumbling.

The effects of this change can be seen in the tests added by
3430215fe3, where pg_stat_statements (PGSS) entries are now grouped
using the relation name, ignoring the relation that search_path may point at.
For example, these two relations are different, but are now grouped in a
single PGSS entry as they are assigned the same query ID:
CREATE TABLE foo1.tab (a int);
CREATE TABLE foo2.tab (b int);
SET search_path = 'foo1';
SELECT count(*) FROM tab;
SET search_path = 'foo2';
SELECT count(*) FROM tab;
SELECT count(*) FROM foo1.tab;
SELECT count(*) FROM foo2.tab;
SELECT query, calls FROM pg_stat_statements WHERE query ~ 'FROM tab';
          query           | calls
--------------------------+-------
 SELECT count(*) FROM tab |     4
(1 row)

It is still possible to use an alias in the FROM clause to split these,
as sketched below.
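
(A hypothetical continuation of the example above:)

SELECT count(*) FROM foo1.tab AS t1;
SELECT count(*) FROM foo2.tab AS t2;
-- distinct query IDs, since the alias is what gets jumbled
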
The grouping behavior is useful for relations re-created with the same name,
where queries based on such relations would be grouped in the same
PGSS entry.  For permanent schemas, it should not really matter in
practice.  The main benefit is for workloads that use a lot of temporary
relations, which are usually re-created with the same name continuously.
These can be a heavy source of bloat in PGSS depending on the workload.
Such entries can now be grouped together, improving the user experience.

The original idea from Christoph Berg used catalog lookups to find
temporary relations, something that the query jumble has never done, and
it could cause some performance regressions.  The idea to use
RangeTblEntry.eref and the relation name, applying the same rules for
all relations, temporary and not temporary, has been proposed by Tom
Lane.  The documentation additions have been suggested by Sami Imseih.

Author: Michael Paquier <michael@paquier.xyz>
Co-authored-by: Sami Imseih <samimseih@gmail.com>
Reviewed-by: Christoph Berg <myon@debian.org>
Reviewed-by: Lukas Fittl <lukas@fittl.com>
Reviewed-by: Sami Imseih <samimseih@gmail.com>
Discussion: https://postgr.es/m/Z9iWXKGwkm8RAC93@msg.df7cb.de
2025-03-26 15:21:05 +09:00
Peter Eisentraut
d2028e9bbc postgres_fdw: Fix tests on some Windows variants
The tests introduced by commit 76563f88cf only work when Unix-domain
sockets are available.  This is optional on Windows, and buildfarm
member drongo runs without them.  To fix, skip the test if Unix-domain
sockets are not enabled.
2025-03-26 07:00:00 +01:00
Jeff Davis
bde2fb797a Add pg_dump --with-{schema|data|statistics} options.
By adding the positive variants of options, in addition to the
negative variants that already exist, users can be explicit about what
pg_dump should produce.

Discussion: https://postgr.es/m/bd0513e4b1ea2b2f2d06f02720c6579711cb62a6.camel@j-davis.com
Reviewed-by: Corey Huinker <corey.huinker@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
2025-03-25 17:36:38 -07:00
Michael Paquier
27ee6ede6b Fix two issues with custom_query_jumble in gen_node_support.pl
A node field marked with both custom_query_jumble and query_jumble_ignore
would generate some code for a custom routine.  The script is changed so
that custom_query_jumble behaves like the other options in this case:
query_jumble_ignore takes priority, and no code is generated.

A comment related to the code generated for node types was misplaced.

Thinkos introduced in 5ac462e2b7.

Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/1324036.1742945060@sss.pgh.pa.us
2025-03-26 09:06:36 +09:00
Tom Lane
cb36f8ec21 Fix order of -I switches for building pg_regress.o.
We need the -I switch for libpq_srcdir to come before any -I switches
injected by configure.  Otherwise there is a risk of pulling in a
mismatched version of libpq_fe.h from someplace like
/usr/local/include, if the platform has another Postgres version
installed there.  This evidently accounts for today's buildfarm
failures on "anaconda".

In principle the -I switch for src/port/ is at similar hazard, and has
been for a very long time.  But the only .h files we keep there are
pg_config_paths.h and pthread-win32.h, neither of which get installed
on Unix-ish systems, so the odds of picking up a conflicting header
seem pretty small.  That doubtless accounts for the lack of prior
reports.

Back-patch to v17 where pg_regress acquired a build dependency on
libpq_fe.h.  We could go back further to fix the hazard for src/port/
in older branches, but it seems unlikely to be worth troubling over.

Reported-by: Nathan Bossart <nathandbossart@gmail.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/Z-MhRzoc7t-nPUQG@nathan
Backpatch-through: 17
2025-03-25 20:03:56 -04:00
Michael Paquier
3430215fe3 pg_stat_statements: Add more tests with temp tables and namespaces
These tests provide coverage for RangeTblEntry and how query jumbling
works with search_path, as well as the case where relations are
re-created, generating a different query ID as the relation OID is used
in the computation.

A patch is under discussion to switch to a different approach based on
the relation name, and there was no test coverage for this area,
including how queries are currently grouped with search_path.  This is
useful to track how the situation changes between HEAD and any patches
proposed.

Christoph has proposed the test with ON COMMIT DROP temporary tables,
and I have written the second part.

Author: Christoph Berg <myon@debian.org>
Author: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/Z9iWXKGwkm8RAC93@msg.df7cb.de
2025-03-26 07:25:23 +09:00
Nathan Bossart
626d7236b6 pg_upgrade: Add --swap for faster file transfer.
This new option instructs pg_upgrade to move the data directories
from the old cluster to the new cluster and then to replace the
catalog files with those generated for the new cluster.  This mode
can outperform --link, --clone, --copy, and --copy-file-range,
especially on clusters with many relations.

However, this mode creates many garbage files in the old cluster,
which can prolong the file synchronization step if
--sync-method=syncfs is used.  To handle that, we recommend using
--sync-method=fsync with this mode, and pg_upgrade internally uses
"initdb --sync-only --no-sync-data-files" for file synchronization.
pg_upgrade will synchronize the catalog files as they are
transferred.  We assume that the database files transferred from
the old cluster were synchronized prior to upgrade.

This mode also complicates reverting to the old cluster, so we
recommend restoring from backup upon failure during or after file
transfer.  We did consider teaching pg_upgrade how to generate a
revert script for such failures, but we decided against it due to
the rarity of failing during file transfer, the complexity of
generating the script, and the potential for misusing the script.

The new mode is limited to clusters located in the same file
system.  With some effort, we could probably support upgrades
between different file systems, but this mode is unlikely to offer
much benefit if we have to copy the files across file system
boundaries.

It is also limited to upgrades from version 10 or newer.  There are
a few known obstacles for using swap mode to upgrade from older
versions.  For example, the visibility map format changed in v9.6,
and the sequence tuple format changed in v10.  In fact, swap mode
omits the --sequence-data option in its uses of pg_dump and instead
reuses the old cluster's sequence data files.  While teaching swap
mode to deal with these kinds of changes is surely possible (and we
may have to deal with similar problems in the future, anyway), it
doesn't seem worth the effort to support upgrades from
long-unsupported versions.

Reviewed-by: Greg Sabino Mullane <htamfids@gmail.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Discussion: https://postgr.es/m/Zyvop-LxLXBLrZil%40nathan
2025-03-25 16:02:35 -05:00
Nathan Bossart
9c49f0e8cd pg_dump: Add --sequence-data.
This new option instructs pg_dump to dump sequence data when the
--no-data, --schema-only, or --statistics-only option is specified.
This was originally considered for commit a7e5457db8, but it was
left out at that time because there was no known use-case.  A
follow-up commit will use this to optimize pg_upgrade's file
transfer step.

Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Discussion: https://postgr.es/m/Zyvop-LxLXBLrZil%40nathan
2025-03-25 16:02:35 -05:00
Nathan Bossart
cf131fa942 initdb: Add --no-sync-data-files.
This new option instructs initdb to skip synchronizing any files
in database directories, the database directories themselves, and
the tablespace directories, i.e., everything in the base/
subdirectory and any other tablespace directories.  Other files,
such as those in pg_wal/ and pg_xact/, will still be synchronized
unless --no-sync is also specified.  --no-sync-data-files is
primarily intended for internal use by tools that separately ensure
the skipped files are synchronized to disk.  A follow-up commit
will use this to help optimize pg_upgrade's file transfer step.

The --sync-method=fsync implementation of this option makes use of
a new exclude_dir parameter for walkdir().  When not NULL,
exclude_dir specifies a directory to skip processing.  The
--sync-method=syncfs implementation of this option just skips
synchronizing the non-default tablespace directories.  This means
that initdb will still synchronize some or all of the database
files, but there's not much we can do about that.

Discussion: https://postgr.es/m/Zyvop-LxLXBLrZil%40nathan
2025-03-25 16:02:35 -05:00
Jeff Davis
650ab8aaf1 Stats: use schemaname/relname instead of regclass.
For import and export, use schemaname/relname rather than
regclass.

This is more natural during export, fits with the other arguments
better, and it gives better control over error handling in case we
need to downgrade more errors to warnings.

Also, use text for the argument types for schemaname, relname, and
attname so that casts to "name" are not required.

Author: Corey Huinker <corey.huinker@gmail.com>
Discussion: https://postgr.es/m/CADkLM=ceOSsx_=oe73QQ-BxUFR2Cwqum7-UP_fPe22DBY0NerA@mail.gmail.com
2025-03-25 11:16:06 -07:00
Jeff Davis
2a420f7995 Minor doc update for commit 99f8f3fbbc.
Author: Corey Huinker <corey.huinker@gmail.com>
2025-03-25 11:15:52 -07:00
Daniel Gustafsson
1a759c8327 psql: Make default \watch interval configurable
The default interval for \watch to wait between executing queries,
when executed without a specified interval, was hardcoded to two
seconds.  This adds the new variable WATCH_INTERVAL which is used
to set the default interval, making it configurable for the user.
This makes \watch the first command which has a user configurable
default setting.

Author: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Masahiro Ikeda <ikedamsh@oss.nttdata.com>
Reviewed-by: Laurenz Albe <laurenz.albe@cybertec.at>
Reviewed-by: Greg Sabino Mullane <htamfids@gmail.com>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Discussion: https://postgr.es/m/B2FD26B4-8F64-4552-A603-5CC3DF1C7103@yesql.se
2025-03-25 17:53:33 +01:00
Daniel Gustafsson
a19db08274 pg_basebackup: Add missing PQclear in error path
This adds a missing PQclear in the error path of StreamLogicalLog, a
fix in the same vein as e889422d98 with an equivalent low impact.

Author: Steven Niu <niushiji@gmail.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://postgr.es/m/c4b1c627-a3e4-4347-a670-1e28a43ce0eb@gmail.com
2025-03-25 17:24:23 +01:00
Peter Eisentraut
ef7a5af77d refactor: Pass relation OID instead of Relation to createForeignKeyCheckTriggers()
Currently, createForeignKeyCheckTriggers() takes a Relation type as
its first argument, but it doesn't use that argument directly.
Instead, it fetches the relation OID by calling RelationGetRelid().
Therefore, it would be more consistent with other functions (e.g.,
createForeignKeyActionTriggers()) to pass the relation OID directly
instead of the whole Relation.

Author: Amul Sul <amul.sul@enterprisedb.com>
Discussion: https://www.postgresql.org/message-id/flat/CAAJ_b962c5AcYW9KUt_R_ER5qs3fUGbe4az-SP-vuwPS-w-AGA@mail.gmail.com
2025-03-25 17:04:12 +01:00
Peter Eisentraut
639238b978 refactor: Split ATExecAlterConstraintInternal()
Split ATExecAlterConstraintInternal() into two functions:
ATExecAlterConstrDeferrability() and
ATExecAlterConstrInheritability().  This simplifies the code and
avoids unnecessary confusion caused by recursive code, which isn't
needed for ATExecAlterConstrInheritability().

(This also takes over the changes in commit 64224a834c, as the new
AlterConstrDeferrabilityRecurse() is essentially the old
ATExecAlterChildConstr().)

Author: Amul Sul <amul.sul@enterprisedb.com>
Discussion: https://www.postgresql.org/message-id/flat/CAAJ_b962c5AcYW9KUt_R_ER5qs3fUGbe4az-SP-vuwPS-w-AGA@mail.gmail.com
2025-03-25 16:18:00 +01:00
Peter Eisentraut
a3280e2a49 refactor: Move some code that updates pg_constraint to a separate function
This extracts common/duplicate code for different ALTER CONSTRAINT
variants into a common function.  We plan to add more variants that
would use the same code.

Author: Amul Sul <amul.sul@enterprisedb.com>
Discussion: https://www.postgresql.org/message-id/flat/CAAJ_b962c5AcYW9KUt_R_ER5qs3fUGbe4az-SP-vuwPS-w-AGA@mail.gmail.com
2025-03-25 14:37:22 +01:00
Peter Eisentraut
f4b2a62ae3 Small fixes for Add ALTER TABLE ... ALTER CONSTRAINT ... SET [NO] INHERIT
Small fixes for commit f4e53e10b6c: Add missing calls to
InvokeObjectPostAlterHook() and also CacheInvalidateRelcache().  The
former change could have a user-visible effect.  The latter omission
might have caused other bugs, but it is not clear whether one actually
existed.  With these changes, the code is now more consistent with
similar ALTER CONSTRAINT variants, especially the ones that set the
deferrability.

Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://postgr.es/m/CAF1DzPVfOW6Kk=7SSh7LbneQDJWh=PbJrEC_Wkzc24tHOyQWGg@mail.gmail.com
2025-03-25 13:40:24 +01:00
Alexander Korotkov
62f36d6924 postgres_fdw: Remove redundant check in semijoin_target_ok()
If a var belongs to the innerrel of the joinrel, it's not possible that
it belongs to the outerrel.  This commit removes the redundant check from
the if-clause but keeps it as an assertion.

Discussion: https://postgr.es/m/flat/CAHewXN=8aW4hd_W71F7Ua4+_w0=bppuvvTEBFBF6G0NuSXLwUw@mail.gmail.com
Author: Tender Wang <tndrwang@gmail.com>
Reviewed-by: Alexander Pyhalov <a.yhalov@postgrespro.ru>
Backpatch-through: 17
2025-03-25 12:49:01 +02:00
Thomas Munro
3c86223c99 libpq: Deprecate pg_int64.
Previously we used pg_int64 in three function prototypes in libpq.  It
was added by commit 461ef73f to expose the platform-dependent type used
for int64 in the C89 era.  As of commit 962da900 it is defined as
standard int64_t, and the dust seems to have settled.

Let's just use int64_t directly in these three client-facing functions
instead of (yet) another name.  We've required C99 and thus <stdint.h>
since PostgreSQL 12, C89 and C++98 compilers are long gone, and client
applications very likely use standard types for their own 64-bit needs.
This also cleans up the obscure placement of a new #include <stdint.h>
directive in postgres_ext.h, required for the new definition.  The
typedef was hiding in there for historical reasons, but it doesn't fit
postgres_ext.h's own description of its purpose and there is no evidence
of client applications including postgres_ext.h directly to see it.

Keep a typedef marked deprecated for backward compatibility, but move it
into libpq-fe.h where it was used.

Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/CA%2BhUKGKn_EkNNGMY5RzMcKP%2Ba6urT4JF%3DCPhw_zHtQwjvX6P2g%40mail.gmail.com
2025-03-25 21:40:00 +13:00
Peter Eisentraut
be1cc9aaf5 Generalize index support in network support function
The network (inet) support functions previously supported only a
hardcoded btree operator family.  With the generalized compare type
facility, we can generalize this to support any operator family from
any index type that supports the required operators.

Author: Mark Dilger <mark.dilger@enterprisedb.com>
Co-authored-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://www.postgresql.org/message-id/flat/E72EAA49-354D-4C2E-8EB9-255197F55330@enterprisedb.com
2025-03-25 07:11:56 +01:00
Michael Paquier
5ac462e2b7 Add support for custom_query_jumble as a node field attribute
This option gives the possibility for query jumble to define a custom
routine for the field of a Node, extending support for
custom_query_jumble as a node field attribute.  When dealing with
complex node structures, this can be simpler than having to enforce a
custom function across a full node.

Custom functions need to be defined in queryjumblefuncs.c, named
_jumble${node}_${field}(), and take as input the JumbleState, the node
and its field.  The field is not really required if we have the Node,
but it makes custom implementations somewhat easier to think about.  The
code generated by gen_node_support.pl uses a macro called
JUMBLE_CUSTOM(), hiding the internals of the logic inside
queryjumblefuncs.c.

This will be used by an upcoming patch adding a custom routine to a
field of RangeTblEntry, but this facility can become
useful in more cases.

Reviewed-by: Christoph Berg <myon@debian.org>
Discussion: https://postgr.es/m/Z9y43-dRvb4EtxQ0@paquier.xyz
2025-03-25 14:18:00 +09:00
Jeff Davis
626df47ad9 Remove 'additional' pointer from TupleHashEntryData.
Reduces memory required for hash aggregation by avoiding an allocation
and a pointer in the TupleHashEntryData structure. That structure is
used for all buckets, whether occupied or not, so the savings is
substantial.

Discussion: https://postgr.es/m/AApHDvpN4v3t_sdz4dvrv1Fx_ZPw=twSnxuTEytRYP7LFz5K9A@mail.gmail.com
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
2025-03-24 22:06:02 -07:00
Jeff Davis
a0942f441e Add ExecCopySlotMinimalTupleExtra().
Allows an "extra" argument that allocates extra memory at the end of
the MinimalTuple. This is important for callers that need to store
additional data, but do not want to perform an additional allocation.

Suggested-by: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/CAApHDvppeqw2pNM-+ahBOJwq2QmC0hOAGsmCpC89QVmEoOvsdg@mail.gmail.com
2025-03-24 22:05:53 -07:00
Jeff Davis
4d143509cb Create accessor functions for TupleHashEntry.
Refactor for upcoming optimizations.

Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/1cc3b400a0e8eead18ff967436fa9e42c0c14cfb.camel@j-davis.com
2025-03-24 22:05:41 -07:00
Jeff Davis
cc721c459d HashAgg: use Bump allocator for hash TupleHashTable entries.
The entries aren't freed until the entire hash table is destroyed, so
use the Bump allocator to improve allocation speed, avoid wasting
space on the chunk header, and avoid wasting space due to the
power-of-two allocations.

Discussion: https://postgr.es/m/CAApHDvqv1aNB4cM36FzRwivXrEvBO_LsG_eQ3nqDXTjECaatOQ@mail.gmail.com
Reviewed-by: David Rowley
2025-03-24 22:05:33 -07:00
Amit Kapila
cc4331605a Fix the typo in the test case added in 73eba5004a.
Author: vignesh C <vignesh21@gmail.com>
Discussion: https://postgr.es/m/CALDaNm2ms1deM5EYNLFEfESv_Kw=Y4AiTB0LP=qGS-UpFwGbPg@mail.gmail.com
Discussion: https://postgr.es/m/CABdArM7FW-_dnthGkg2s0fy1HhUB8C3ELA0gZX1kkbs1ZZoV3Q@mail.gmail.com
2025-03-25 09:39:53 +05:30
Amit Kapila
b87ced747d Fix an oversight in 3abe9dc188.
Forgot to update the comment atop one of the functions.

Author: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Discussion: https://postgr.es/m/OSCPR01MB1496623BE1125B44614494E7AF5A72@OSCPR01MB14966.jpnprd01.prod.outlook.com
2025-03-25 09:26:23 +05:30
Alexander Korotkov
023fb51275 postgres_fdw: Avoid pulling up restrict infos from subqueries
Semi-joins below a left/right join are deparsed as
subqueries.  Thus, we can't refer to subquery vars from upper relations.
This commit avoids pulling conditions out of them.

Reported-by: Robins Tharakan <tharakan@gmail.com>
Bug: #18852
Discussion: https://postgr.es/m/CAEP4nAzryLd3gwcUpFBAG9MWyDfMRX8ZjuyY2XXjyC_C6k%2B_Zw%40mail.gmail.com
Author: Alexander Pyhalov <a.pyhalov@postgrespro.ru>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
Backpatch-through: 17
2025-03-25 05:49:47 +02:00
Andres Freund
adb5f85fa5 Redefine max_files_per_process to control additionally opened files
Until now max_files_per_process=N limited each backend to opening N files in
total (minus a safety factor), even if there were already more files opened in
postmaster and inherited by backends.  Change max_files_per_process to control
how many additional files each process is allowed to open.

The main motivation for this is the patch to add io_method=io_uring, which
needs to open one file for each backend.  Without this patch, even if
RLIMIT_NOFILE is high enough, postmaster will fail in set_max_safe_fds() if
started with a high max_connections.  The cause of the failure is that, until
now, set_max_safe_fds() subtracted the already open files from
max_files_per_process.

Reviewed-by: Noah Misch <noah@leadboat.com>
Discussion: https://postgr.es/m/w6uiicyou7hzq47mbyejubtcyb2rngkkf45fk4q7inue5kfbeo@bbfad3qyubvs
Discussion: https://postgr.es/m/CAGECzQQh6VSy3KG4pN1d=h9J=D1rStFCMR+t7yh_Kwj-g87aLQ@mail.gmail.com
2025-03-24 18:20:18 -04:00
Nathan Bossart
7d559c8580 Expand comment for isset_offset.
This field was added in commit 0164a0f9ee to provide a way to
determine whether a storage parameter was explicitly set for the
relation or if it just picked up the default value.  In most cases,
this can be accomplished by giving the storage parameter a special
out-of-range default value (e.g., the
autovacuum_vacuum_insert_threshold storage parameter defaults to
-2), but this approach doesn't work in all cases.  For example, a
Boolean storage parameter cannot be given an out-of-range default,
so we need another way to discover the source of its value.

Reported-by: "David G. Johnston" <david.g.johnston@gmail.com>
Reviewed-by: "David G. Johnston" <david.g.johnston@gmail.com>
Discussion: https://postgr.es/m/CAKFQuwYKtEUYKS%2B18gRs-xPhn0qOJgM2KGyyWVCODHuVn9F-XQ%40mail.gmail.com
2025-03-24 15:47:02 -05:00
Melanie Plageman
aea916fe55 Fix bitmapheapscan incorrect recheck of NULL tuples
The bitmap heap scan skip fetch optimization skips fetching the heap
block when a page is set all-visible in the visibility map and no
columns from the table are needed to satisfy the query.

2b73a8cd33 and c3953226a0 changed the control flow of bitmap heap scan
to use the read stream API. The read stream API returns buffers
containing blocks to the user. To make this work with the skip fetch
optimization, we keep a count of the empty tuples we need to emit for
all the blocks skipped and only emit the empty tuples after processing
the next block fetched from the heap or at the end of the scan.

It's incorrect to recheck NULL tuples, so we must set `recheck` to false
before yielding control back to BitmapHeapNext(). This was done before
emitting any remaining empty tuples at the end of the scan but not for
empty tuples emitted during the scan. This meant that if a page fetched
from the heap did require recheck and set `recheck` to true and then we
emitted empty tuples for subsequent blocks, we would get wrong results.

Fix this by always setting `recheck` to false before emitting empty
tuples.

Reported-by: Alexander Lakhin <exclusion@gmail.com>
Tested-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/496f7acd-881c-4df3-9bd3-8f8534dfec26%40gmail.com
2025-03-24 16:40:59 -04:00
Álvaro Herrera
0e3e0ec06b Fix typo
2025-03-24 17:36:44 +01:00
Fujii Masao
c68100aa43 Allow pg_recvlogical --drop-slot to work without --dbname.
When pg_recvlogical was introduced in 9.4, the --dbname option was not
required for --drop-slot. Without it, pg_recvlogical --drop-slot connected
using a replication connection (not tied to a specific database) and
was able to drop both physical and logical replication slots, similar to
pg_receivewal --drop-slot.

However, commit 0c013e08cf unintentionally changed this behavior in 9.5,
making pg_recvlogical always check whether it's connected to a specific
database and fail if it's not. This change was expected for --create-slot
and --start, which handle logical replication slots and require a database
connection, but it was unnecessary for --drop-slot, which should work with
any replication connection. As a result, --dbname became a required option
for --drop-slot.

This commit removes that restriction, restoring the original behavior and
allowing pg_recvlogical --drop-slot to work without specifying --dbname.

Although this issue originated from an unintended change, it has existed
for a long time without complaints or bug reports, and the documentation
never explicitly stated that --drop-slot should work without --dbname.
Therefore, the change is not treated as a bug fix and is applied only to
master.

Author: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/b15ecf4f-e5af-4fbb-82c2-a425f453e0b2@oss.nttdata.com
2025-03-25 00:18:27 +09:00
Fujii Masao
dfc13428a9 doc: Clarify required options for each action in pg_recvlogical.
Each pg_recvlogical action requires specific options. For example,
--slot, --dbname, and --file must be specified with the --start action.
Previously, the documentation did not clearly outline these requirements.

This commit updates the documentation to explicitly state
the necessary options for each action.

Author: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Co-authored-by: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Vignesh C <vignesh21@gmail.com>
Reviewed-by: David G. Johnston <david.g.johnston@gmail.com>
Discussion: https://postgr.es/m/OSCPR01MB14966930B4357BAE8C9D68A8AF5C72@OSCPR01MB14966.jpnprd01.prod.outlook.com
2025-03-25 00:14:38 +09:00
Peter Eisentraut
76563f88cf postgres_fdw: improve security checks
SCRAM pass-through should not bypass the FDW security check as it was
implemented for postgres_fdw in commit 761c79508e.

This commit improves the security check by adding new SCRAM
pass-through checks to ensure that the required SCRAM connection
options are not overwritten by the user mapping or foreign server
options.  This is meant to match the security requirements for a
password-using connection.

Since libpq has no SCRAM-specific equivalent of
PQconnectionUsedPassword(), we enforce this instead by making the
use_scram_passthrough option of postgres_fdw imply
require_auth=scram-sha-256.  This means that if use_scram_passthrough
is set, some situations that might otherwise have worked are
preempted, for example GSSAPI with delegated credentials.  This could
be enhanced in the future if there is desire for more flexibility.

Reported-by: Jacob Champion <jacob.champion@enterprisedb.com>
Author: Matheus Alcantara <mths.dev@pm.me>
Co-authored-by: Jacob Champion <jacob.champion@enterprisedb.com>
Reviewed-by: Jacob Champion <jacob.champion@enterprisedb.com>
Discussion: https://www.postgresql.org/message-id/flat/CAFY6G8ercA1KES%3DE_0__R9QCTR805TTyYr1No8qF8ZxmMg8z2Q%40mail.gmail.com
2025-03-24 15:56:53 +01:00
Magnus Hagander
a8eeb22f17 psql: use consistent alias for pg_description
Author: Jelte Fennema-Nio <github-tech@jeltef.nl>
Suggested-By: Michael Banck <mbanck@gmx.net>
Discussion: https://www.postgresql.org/message-id/67813520.170a0220.183245.7bf0%40mx.google.com
2025-03-24 14:31:28 +01:00
Magnus Hagander
d696406a9b psql: show default extension version in \dx output
Reviewed-By: Julien Rouhaud <rjuju123@gmail.com>
Reviewed-By: Michael Banck <mbanck@gmx.net>
Reviewed-By: Yugo Nagata <nagata@sraoss.co.jp>
Reviewed-By: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-By: Jelte Fennema-Nio <postgres@jeltef.nl>
Discussion: https://postgr.es/m/CABUevEyTMyXC6OvCWkj+rPnHrfi8_Rw_+DD_jzgFFNPqgf+Oig@mail.gmail.com
2025-03-24 14:25:05 +01:00
Heikki Linnakangas
19c6eb06c5 Add test case for when subscriber table is missing a column
We haven't had bugs in this area, but there's some not-entirely
trivial code to detect that case, so it seems good to have test
coverage for it.

Author: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: vignesh C <vignesh21@gmail.com>
Reviewed-by: Tomas Vondra <tomas@vondra.me>
Discussion: https://www.postgresql.org/message-id/CAHut%2BPtX8P0EGhsk9p%3DhQGUHrzxeCSzANXSMKOvYiLX-EjdyNw@mail.gmail.com
2025-03-24 12:13:32 +02:00
Amit Kapila
73eba5004a Detect and Log multiple_unique_conflicts type conflict.
Introduce a new conflict type, multiple_unique_conflicts, to handle cases
where an incoming row during logical replication violates multiple UNIQUE
constraints.

Previously, the apply worker detected and reported only the first
encountered key conflict (insert_exists/update_exists), causing repeated
failures as each constraint violation needs to be handled one by one
making the process slow and error-prone.

With this patch, the apply worker checks all unique constraints upfront
once the first key conflict is detected and reports
multiple_unique_conflicts if multiple violations exist. This allows users
to resolve all conflicts at once by deleting all conflicting tuples rather
than dealing with them individually or skipping the transaction.
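
As an illustration (hypothetical subscriber-side table):

CREATE TABLE tab (a int UNIQUE, b int UNIQUE);
INSERT INTO tab VALUES (1, 1);
-- an incoming replicated row (1, 1) violates both unique constraints;
-- the apply worker now reports one multiple_unique_conflicts conflict
-- instead of surfacing the violations one at a time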

In the future, this will also allow us to specify different resolution
handlers for such a conflict type.

Add the stats for this conflict type in pg_stat_subscription_stats.

Author: Nisha Moond <nisha.moond412@gmail.com>
Author: Zhijie Hou <houzj.fnst@fujitsu.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com>
Discussion: https://postgr.es/m/CABdArM7FW-_dnthGkg2s0fy1HhUB8C3ELA0gZX1kkbs1ZZoV3Q@mail.gmail.com
2025-03-24 12:30:44 +05:30
David Rowley
35a92b7c25 Add tests for POSITION(bytea, bytea)
Previously there was no coverage for this function.
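
For reference (illustrative values; byte positions are 1-based and 0
means not found):

SELECT position('\x6c6c'::bytea IN '\x68656c6c6f'::bytea);  -- 3, 'll' in 'hello'
SELECT position('\x7a'::bytea IN '\x68656c6c6f'::bytea);    -- 0, 'z' not found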

Author: Aleksander Alekseev <aleksander@timescale.com>
Reviewed-by: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: Rustam ALLAKOV <rustamallakov@gmail.com>
Discussion: https://postgr.es/m/CAJ7c6TMT6XCooMVKnCd_tR2oBdGcnjefSeCDCv8jzKy9VkWA5w@mail.gmail.com
2025-03-24 19:32:02 +13:00
Michael Paquier
2a0cd38da5 Allow plugins to set a 64-bit plan identifier in PlannedStmt
This field can be optionally set in a PlannedStmt through the planner
hook, giving extensions the possibility to assign an identifier related
to a computed plan.  The backend is changed to report it in the backend
entry of a process running (including the extended query protocol), with
semantics and APIs to set or get it similar to what is used for the
existing query ID (introduced in the backend via 4f0b0966c8).  The plan
ID is reset at the same timing as the query ID.  Currently, this
information is not added to the system view pg_stat_activity; extensions
can access it through PgBackendStatus.

Some patches have been proposed to provide some features in the planning
area, where a plan identifier is used as a key to know the plan involved
(for statistics, plan storage and manipulations, etc.), and the point of
this commit is to provide an anchor in the backend that extensions can
rely on for future work.   The reset of the plan identifier is
controlled by core and follows the same pattern as the query identifier
added in 4f0b0966c8.

The contents of this commit are extracted from a larger set proposed
originally by Lukas Fittl, that Sami Imseih has proposed as an
independent change, with a few tweaks sprinkled by me.

Author: Lukas Fittl <lukas@fittl.com>
Author: Sami Imseih <samimseih@gmail.com>
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/CAP53Pkyow59ajFMHGpmb1BK9WHDypaWtUsS_5DoYUEfsa_Hktg@mail.gmail.com
Discussion: https://postgr.es/m/CAA5RZ0vyWd4r35uUBUmhngv8XqeiJUkJDDKkLf5LCoWxv-t_pw@mail.gmail.com
2025-03-24 13:23:42 +09:00
Tom Lane
8a3e4011f0 psql: Add tab completion for VACUUM and ANALYZE ... ONLY option.
Improve psql's tab completion for VACUUM and ANALYZE by supporting
the ONLY option introduced in 62ddf7ee9.
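
The completed syntax looks like this (a sketch; table names are
placeholders):

VACUUM ONLY inh_parent;
ANALYZE ONLY parted_table;
-- ONLY limits processing to the named table, skipping its
-- children or partitions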

In passing, simplify some of the VACUUM patterns by making use
of MatchAnyN.

Author: Umar Hayat <postgresql.wizard@gmail.com>
Reviewed-by: Vignesh C <vignesh21@gmail.com>
Reviewed-by: Ilia Evdokimov <ilya.evdokimov@tantorlabs.com>
Discussion: https://postgr.es/m/CAD68Dp3L6yW_nWs+MWBs6s8tKLRzXaQdQgVRm4byZe0L-hRD8g@mail.gmail.com
2025-03-23 17:16:08 -04:00
Heikki Linnakangas
2817525f0d Fix rare assertion failure in standby, if primary is restarted
During hot standby, ExpireAllKnownAssignedTransactionIds() and
ExpireOldKnownAssignedTransactionIds() functions mark old transactions
as no-longer running, but they failed to update xactCompletionCount
and latestCompletedXid. AFAICS it would not lead to incorrect query
results, because those functions effectively turn in-progress
transactions into aborted transactions and an MVCC snapshot considers
both as "not visible". But it could surprise GetSnapshotDataReuse()
and trigger the "TransactionIdPrecedesOrEquals(TransactionXmin,
RecentXmin))" assertion in it, if the apparent xmin in a backend would
move backwards. We saw this happen when GetCatalogSnapshot() would
reuse an older catalog snapshot, when GetTransactionSnapshot() had
already advanced TransactionXmin.

The bug goes back all the way to commit 623a9ba79b in v14 that
introduced the snapshot reuse mechanism, but it started to happen more
frequently with commit 952365cded which removed a
GetTransactionSnapshot() call from backend startup. That made it more
likely for ExpireOldKnownAssignedTransactionIds() to be called between
GetCatalogSnapshot() and the first GetTransactionSnapshot() in a
backend.

Andres Freund first spotted this assertion failure on buildfarm member
'skink'. Reproduction and analysis by Tomas Vondra.

Backpatch-through: 14
Discussion: https://www.postgresql.org/message-id/oey246mcw43cy4qw2hqjmurbd62lfdpcuxyqiu7botx3typpax%40h7o7mfg5zmdj
2025-03-23 20:41:16 +02:00
Noah Misch
f0446384ea Fix "make clean" for new TAP suite.
Commit 28f04984f0 missed this.
2025-03-23 06:12:02 -07:00
Andres Freund
ca3067cc57 aio: Change prefix of PgAioResultStatus values to PGAIO_RS_
The previous prefix wasn't consistent with the naming of other AIO related
enum values. It seems best to rename it before the users are introduced.

Reported-by: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/CAAKRu_Yb+JzQpNsgUxCB0gBi+sE-mi_HmcJF6ALnmO4W+UgwpA@mail.gmail.com
2025-03-22 17:30:44 -04:00
Tom Lane
58fdca2204 plpgsql: make WHEN OTHERS distinct from WHEN SQLSTATE '00000'.
The catchall exception condition OTHERS was represented as
sqlerrstate == 0, which was a poor choice because that comes
out the same as SQLSTATE '00000'.  While we don't issue that
as an error code ourselves, there isn't anything particularly
stopping users from doing so.  Use -1 instead, which can't
match any allowed SQLSTATE string.
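
A sketch of the kind of usage that made the old representation
ambiguous (raising ERRCODE '00000' is unusual but possible):

DO $$
BEGIN
  RAISE EXCEPTION 'not really an error' USING ERRCODE = '00000';
EXCEPTION
  WHEN OTHERS THEN RAISE NOTICE 'caught SQLSTATE %', SQLSTATE;
END $$;

Previously a handler condition compiled from SQLSTATE '00000' was
internally identical to OTHERS; with -1 for OTHERS they can no
longer collide.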

While at it, invent a macro PLPGSQL_OTHERS to use instead of
a hard-coded magic number.

While this seems like a bug fix, I'm inclined not to back-patch.
It seems barely possible that someone has written code like this
and would be annoyed by changing the behavior in a minor release.

Reported-by: David Fiedler <david.fido.fiedler@gmail.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAHjN70-=H5EpTOuZVbC8mPvRS5EfZ4MY2=OUdVDWoyGvKhb+Rw@mail.gmail.com
2025-03-22 14:17:00 -04:00
Peter Geoghegan
9a2e2a285a Improve nbtree array primitive scan scheduling.
Add a new scheduling heuristic: don't end the ongoing primitive index
scan immediately (at the point where _bt_advance_array_keys notices that
the next set of matching tuples must be on a later page) if the primscan
already managed to step right/left from its first leaf page.  Schedule a
recheck against the next sibling leaf page's finaltup instead.

The new heuristic tends to avoid scenarios where the top-level scan
repeatedly starts and ends primitive index scans that each read only one
leaf page from a group of neighboring leaf pages.  Affected top-level
scans will now tend to step forward (or backward) through the index
instead, without wasting cycles on descending the index anew.

The recheck mechanism isn't exactly new.  But up until now it has only
been used to deal with edge cases involving high key finaltups with one
or more truncated -inf attributes that _bt_advance_array_keys deemed
"provisionally satisfied" (satisfied for the purposes of allowing the
scan to step onto the next page, subject to recheck once on that page).
The mechanism was added by commit 5bf748b8, which invented the general
concept of primitive scan scheduling.  It was later enhanced by commit
79fa7b3b, which taught it about cases involving -inf attributes that
satisfy inequality scan keys required in the opposite-to-scan direction
only (arguably, they should have been covered by the earliest version).
Now the recheck mechanism can be applied based on scan-level heuristics,
which have nothing to do with truncated high keys.  Now rechecks might
be performed by _bt_readpage when scanning in _either_ scan direction.

The theory behind the new heuristic is that any primitive scan that
makes it past its first leaf page is one that is already likely to have
arrays whose key values match index tuples that are closely clustered
together in the index.  The rules that determine whether we ever get
past the first page are still conservative (that'll still only happen
when pstate.finaltup strongly suggests that it's the right thing to do).
Surviving past the first leaf page is a strong signal in itself.

Preparation for an upcoming patch that will add skip scan optimizations
to nbtree.  That'll work by adding skip arrays, which behave similarly
to SAOP arrays, but generate their elements procedurally and on-demand.

Note that this commit isn't specifically concerned with skip arrays; the
scheduling logic doesn't (and won't) condition anything on whether the
scan uses skip arrays, SAOP arrays, or some combination of the two
(which seems like a good general principle for _bt_advance_array_keys).
While the problems that this commit ameliorates are more likely with
skip arrays (at least in practice), SAOP arrays (or those with very
dense, contiguous array elements) are also affected.

Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Matthias van de Meent <boekewurm+postgres@gmail.com>
Discussion: https://postgr.es/m/CAH2-Wzkz0wPe6+02kr+hC+JJNKfGtjGTzpG3CFVTQmKwWNrXNw@mail.gmail.com
2025-03-22 13:02:18 -04:00
Melanie Plageman
e215166c9c Use streaming read I/O in SP-GiST vacuuming
Like 69273b818b did for GiST vacuuming, make SP-GiST vacuum use the
read stream API for vacuuming physically contiguous index pages.

Concurrent insertions may cause SP-GiST index tuples to be redirected.
While vacuuming, these are added to a pending list which is later
processed to ensure no dead tuples are left behind. Pages containing
such tuples are still read by directly calling ReadBuffer() and do not
use the read stream API.

Author: Andrey M. Borodin <x4mmm@yandex-team.ru>
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/37432403-8657-403B-9CDF-5A642BECDD81%40yandex-team.ru
2025-03-21 17:51:22 -04:00
Thomas Munro
e51ca405ed Fix ps display for IO workers.
This code must have missed a memo about the backend type description
being supplied automatically these days, and was duplicating that
information.

Before: "io worker io worker: N"
After:  "io worker N"
2025-03-22 10:13:23 +13:00
Tom Lane
16a3ae504e Revert inappropriate weakening of an Assert in plpgsql.
Commit 682ce911f modified exec_save_simple_expr to accept a Param
in the tlist of a Gather node, rather than the normal case of a Var
referencing the Gather's input.  It turns out that this was a kluge
to work around the bug later fixed in 0f7ec8d9c, namely that setrefs.c
was failing to replace Params in upper plan nodes with Var references
to the same Params appearing in the child tlists.  With that fixed,
there seems no reason to continue to allow a Param here.  (Moreover,
even if we did expect a Param here, the semantically correct thing
to do would be to take the Param as the expression being sought.
Whatever it may represent, it is *not* a reference to the child.)
Hence, revert that part of 682ce911f.

That all happened a long time ago.  However, since the net effect
here is just to tighten an Assert condition, I'm content to change
it only in master.

Discussion: https://postgr.es/m/1565347.1742572349@sss.pgh.pa.us
2025-03-21 15:55:06 -04:00
Masahiko Sawada
04ff636cbc Add GUC option to control maximum active replication origins.
This commit introduces a new GUC option max_active_replication_origins
to control the maximum number of active replication
origins. Previously, this was controlled by
'max_replication_slots'. Having a separate GUC option provides better
flexibility for setting up subscribers, as they may not require
replication slots (for cascading replication) but always require
replication origins.
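
For example (illustrative values; both settings presumably require a
server restart, like max_replication_slots):

ALTER SYSTEM SET max_active_replication_origins = 50;
ALTER SYSTEM SET max_replication_slots = 0;  -- no slots needed on a
                                             -- plain subscriber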

Author: Euler Taveira <euler@eulerto.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Reviewed-by: vignesh C <vignesh21@gmail.com>
Discussion: https://postgr.es/m/b81db436-8262-4575-b7c4-bc0c1551000b@app.fastmail.com
2025-03-21 12:20:15 -07:00
Tom Lane
0e032a2240 Place "extern" declaration in the right part of pg_class.h.
errdetail_relkind_not_supported() was declared within
EXPOSE_TO_CLIENT_CODE, which is mistaken since that function
isn't available client-side.  While relatively harmless,
this isn't good precedent.

Discussion: https://postgr.es/m/1134562.1742507765@sss.pgh.pa.us
2025-03-21 15:14:15 -04:00
Tom Lane
cd72c1b76e Label the contents of pg_*_d.h files a little better.
Make genbki.pl emit some boilerplate comments identifying the
sections of the pg_*_d.h files that it generates.  This is in
hopes of making them slightly more readable, in case people
look at those files and not the pg_*.h/pg_*.dat originals.

Discussion: https://postgr.es/m/1134562.1742507765@sss.pgh.pa.us
2025-03-21 15:09:46 -04:00
Melanie Plageman
69273b818b Use streaming read I/O in GiST vacuuming
Like c5c239e26e did for btree vacuuming, make GiST vacuum use the
read stream API for sequentially processed pages.

Because it is possible for concurrent insertions to relocate unprocessed
index entries to already vacuumed pages, GiST vacuum must backtrack and
reprocess those pages. These pages are still read with explicit
ReadBuffer() calls.

Author: Andrey M. Borodin <x4mmm@yandex-team.ru>
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/EFEBED92-18D1-4C0F-A4EB-CD47072EF071%40yandex-team.ru
2025-03-21 14:06:45 -04:00
Melanie Plageman
3f850c3fc5 Assorted trivial cleanup of c5c239e26e
c5c239e26e made btree vacuum use the read stream API. Though it used
functions declared in read_stream.h, it relied on transitively including
it. Explicitly include that file. Also remove an extraneous newline and
decrease the scope of one of the local variables in btvacuumscan().
2025-03-21 14:06:40 -04:00
Tom Lane
7fe312f609 Fix plpgsql's handling of simple expressions in scrollable cursors.
exec_save_simple_expr did not account for the possibility that
standard_planner would stick a Materialize node atop the plan
of even a simple Result, if CURSOR_OPT_SCROLL is set.  This led
to an "unexpected plan node type" error.

This is a very old bug, but it'd only be reached by declaring a
cursor for a "SELECT simple-expression" query and explicitly
marking it scrollable, which is an odd thing to do.  So the lack
of prior reports isn't too surprising.

Bug: #18859
Reported-by: Olleg Samoylov <splarv@ya.ru>
Author: Andrei Lepikhov <lepihov@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/18859-0d5f28ac99a37059@postgresql.org
Backpatch-through: 13
2025-03-21 11:30:42 -04:00
Melanie Plageman
c5c239e26e Use streaming read I/O in btree vacuuming
Btree vacuum processes all index pages in physical order. Now it uses
the read stream API to get the next buffer instead of explicitly
invoking ReadBuffer().

It is possible for concurrent insertions to cause page splits during
index vacuuming. This can lead to index entries that have yet to be
vacuumed being moved to pages that have already been vacuumed. Btree
vacuum code handles this by backtracking to reprocess those pages. So,
while sequentially encountered pages are now read through the
read stream API, backtracked pages are still read with explicit
ReadBuffer() calls.

Author: Andrey Borodin <x4mmm@yandex-team.ru>
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Junwang Zhao <zhjwpku@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://postgr.es/m/flat/CAAKRu_bW1UOyup%3DjdFw%2BkOF9bCaAm%3D9UpiyZtbPMn8n_vnP%2Big%40mail.gmail.com#3b3a84132fc683b3ee5b40bc4c2ea2a5
2025-03-21 09:09:39 -04:00
Álvaro Herrera
1d617a2028 Change one loop in ATRewriteTable to use 1-based attnums
All TupleDescAttr() calls in tablecmds.c that aren't in loops across all
attributes use AttrNumber-style indexes (1-based); there was only one
place in ATRewriteTable that was stashing 0-based indexes in a list for
later processing.  Switch that to use attnums for consistency.

Author: jian he <jian.universality@gmail.com>
Discussion: https://postgr.es/m/CACJufxEoYA5ScUr2=CmA1xcpaS_1ixneDbEkVU77X1ctGxY2mA@mail.gmail.com
2025-03-21 10:55:06 +01:00
Thomas Munro
ce1a75c4fe Support buffer forwarding in StartReadBuffers().
StartReadBuffers() reports a short read when it finds a cached block
that ends a range needing I/O by updating the caller's *nblocks.  It
doesn't want to have to unpin the trailing hit that it knows the caller
wants, so the v17 version used sleight of hand in the name of
simplicity: it included it in *nblocks as if it were part of the I/O,
but internally tracked the shorter real I/O size in io_buffers_len (now
removed).

This API change "forwards" the delimiting buffer to the next call.  It's
still pinned, and still stored in the caller's array, but *nblocks no
longer includes stray buffers that are not really part of the operation.
The expectation is that the caller still wants the rest of the blocks
and will call again starting from that point, and now it can pass the
already pinned buffer back in (or choose not to and release it).

The change is needed for the coming asynchronous I/O version's larger
version of the problem: by definition it must move BM_IO_IN_PROGRESS
negotiation from WaitReadBuffers() to StartReadBuffers(), but it might
already have many buffers pinned before it discovers a need to split an
I/O.  (The current synchronous I/O version hides that detail from
callers by looping over smaller reads if required to make all covered
buffers valid in WaitReadBuffers(), so it looks like one operation but
it might occasionally be several under the covers.)

Aside from avoiding unnecessary pin traffic, this will also be important
for later work on out-of-order streams: you can't prioritize data that
is already available right now if that fact is hidden from you.

The new API is natural for read_stream.c (see ed0b87ca).  After a short
read it leaves forwarded buffers where they fell in its circular queue
for the continuing call to pick up.

Single-block StartReadBuffer() and traditional ReadBuffer() share code
but are not affected by the change.  They don't do multi-block I/O.

Reviewed-by: Andres Freund <andres@anarazel.de> (earlier versions)
Discussion: https://postgr.es/m/CA%2BhUKGK_%3D4CVmMHvsHjOVrK6t4F%3DLBpFzsrr3R%2BaJYN8kcTfWg%40mail.gmail.com
2025-03-21 20:43:59 +13:00
Thomas Munro
ed0b87caac Support buffer forwarding in read_stream.c.
In preparation for a follow-up change to the buffer manager, teach
read_stream.c to manage buffers "forwarded" from one StartReadBuffers()
call to the next after a short read.  This involves a small amount of
extra book-keeping, and opens the way for lower levels to split I/O
operations without having to drop pins, as required for efficient
handling of various edge cases.

Concretely, the "buffers" argument will change from an out parameter to
an in/out parameter.  Buffer queue elements must be initialized on first
use and cleared after they're consumed, but forwarded buffers are left
where they fall ahead of the current pending read in the queue, ready
for use by the operation that continues where a short read left off.
The stream also needs to count them for pin limit management and release
them on reset/early end.

Tested-by: Andres Freund <andres@anarazel.de> (earlier versions)
Discussion: https://postgr.es/m/CA%2BhUKGK_%3D4CVmMHvsHjOVrK6t4F%3DLBpFzsrr3R%2BaJYN8kcTfWg%40mail.gmail.com
2025-03-21 18:44:47 +13:00
Fujii Masao
14413d0ef5 doc: Remove incorrect description about dropping replication slots.
pg_drop_replication_slot() can drop replication slots created on
a different database than the one where it is executed. This behavior
has been in place since PostgreSQL 9.4, when pg_drop_replication_slot()
was introduced.

However, commit ff539d mistakenly added the following incorrect
description in the documentation:

     For logical slots, this must be called when connected to
     the same database the slot was created on.

This commit removes that incorrect statement. A similar mistake was
also present in the documentation for the DROP_REPLICATION_SLOT
command, which has now been corrected as well.

Back-patch to all supported versions.

Author: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/OSCPR01MB14966C6BE304B5BB2E58D4009F5DE2@OSCPR01MB14966.jpnprd01.prod.outlook.com
Backpatch-through: 13
2025-03-21 12:56:39 +09:00
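
A short illustration of the documented behavior (the slot name is a
placeholder): dropping a logical slot does not require being connected
to the database it was created on.

    -- run from any database in the cluster
    SELECT pg_drop_replication_slot('regress_slot');
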
David Rowley
00b52c3db6 Simplify EXPLAIN code for Memoize
This removes a needless special case for Memoize's FORMAT TEXT EXPLAIN
output.

ExplainPropertyText() outputs the same thing in text mode as the
special-case code was doing, so removing the special-case code results in
the same EXPLAIN output, just with less code.

It seems like a good idea to fix this to help prevent future changes in
this area from copying the same pattern.

Author: Ilia Evdokimov <ilya.evdokimov@tantorlabs.com>
Reported-by: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/88a71bcd-0b5c-4d0b-8107-757e96f402d5@tantorlabs.com
2025-03-21 13:40:05 +13:00
Andres Freund
202b12774d bufmgr: Improve stats when a buffer is read in concurrently
Previously we would have the following inaccuracies when a backend tried to
read in a buffer, but that buffer was read in concurrently by another backend:
- the read IO was double-counted in the global buffer access stats (pgBufferUsage)
- the buffer hit was not accounted for in:
  - global buffer access statistics
  - pg_stat_io
  - relation level IO stats
  - vacuum cost balancing

While trying to read in a buffer that is concurrently read in by another
backend is not a common occurrence, it's also not that rare, e.g. due to
concurrent sequential scans on the same relation.  This scenario has become
more likely in PG 17, due to the introduction of read streams, which can pin
multiple buffers before calling StartBufferIO() for all the buffers.

This behaviour grew historically, but there doesn't seem to be any reason
to continue with the wrong accounting.

Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/CAAKRu_Zk-B08AzPsO-6680LUHLOCGaNJYofaxTFseLa=OepV1g@mail.gmail.com
2025-03-20 19:58:22 -04:00
Andrew Dunstan
12604593e9 Show plperl version in the meson setup summary.
Also, use perl 'version' instead of 'api_versionstring' to sync with
the configure script.

Author: Roman Zharkov <r.zharkov@postgrespro.ru>

Discussion: https://postgr.es/m/93e7f77bf4e1ef4640e4ee733f9e2a78@postgrespro.ru
2025-03-20 18:55:29 -04:00
Andres Freund
fc51a60dd4 smgr: Hold interrupts in most smgr functions
We need to hold interrupts across most of the smgr.c/md.c functions, as
otherwise interrupt processing, e.g. due to a < ERROR elog/ereport, can
trigger procsignal processing, which in turn can trigger smgrreleaseall(). As
the relevant code is not reentrant, we quickly end up in a bad situation.

The only reason we haven't noticed this before is that there is only one
non-error ereport called in affected routines, in register_dirty_segments(),
and that one is extremely rarely reached. If one enables fd.c's FDDEBUG it's
easy to reproduce crashes.

It seems better to put the HOLD_INTERRUPTS()/RESUME_INTERRUPTS() in smgr.c,
instead of trying to push them down to md.c where possible: For one, every
smgr implementation would be vulnerable, for another, a good bit of smgr.c
code itself is affected too.

Eventually we might want a more targeted solution, allowing e.g. a networked
smgr implementation to be interrupted, but many other, more complicated,
problems would need to be fixed for that to be viable (e.g. smgr.c is often
called with interrupts already held).

One could argue this should be backpatched, but the existing < ERROR
elog/ereports that can be reached with unmodified sources are unlikely to be
reached. On balance the risk of backpatching seems higher than the gain - at
least for now.

Reviewed-by: Noah Misch <noah@leadboat.com>
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://postgr.es/m/3vae7l5ozvqtxmd7rr7zaeq3qkuipz365u3rtim5t5wdkr6f4g@vkgf2fogjirl
2025-03-20 17:33:57 -04:00
Tom Lane
fdb5dd6331 Be more paranoid in configure's checks for CRC and POPCNT intrinsics.
In these tests, we need to verify not only that the compiler has heard
of these intrinsics, but that lower-level tools cope with them too.
(For example, the assembler must also know the instructions, and on
some platforms there might be library support involved.)  The hazard
is that the compiler might optimize away the calls altogether,
allowing the configure check to succeed only to have the build fail
later if lower-level support is missing.  The existing code tried to
prevent that by ensuring that the result of the intrinsic is used
for something, but that's really insufficient because we were feeding
constant input to it.  So the compiler would be perfectly entitled to
optimize away the calls anyway.  Fix by making the inputs into global
variables.  (Hypothetically, LTO optimization could still remove the
code --- but that's well past where we'd be likely to hit trouble.)

It is not known that any current compiler would actually optimize
away these calls, and even if that happened it would be unlikely
that any problem would manifest.  Our concern for this stems from
largely-bygone days when it was common to install gcc on platforms
with some other native compiler, so that a compiler-vs-library
support discrepancy was more probable.  Still, there's little
point in defending against such cases in a way that is visibly
incomplete.

I'm content to fix this in master for now; we can back-patch if
any indication appears that it's a live problem for someone.

Discussion: https://postgr.es/m/3368102.1741993462@sss.pgh.pa.us
2025-03-20 16:23:09 -04:00
Robert Haas
50ba65e733 Add an additional hook for EXPLAIN option validation.
Commit c65bc2e1d1 made it possible for
loadable modules to add EXPLAIN options. Normally, any necessary
validation can be performed by the hook function passed to
RegisterExtensionExplainOption, but if a loadable module wants to sanity
check options against each other, that needs to be done after the entire
options list has been processed. So, add an additional hook for that
purpose.

Author: Sami Imseih <samimseih@gmail.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Andrei Lepikhov <lepihov@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: http://postgr.es/m/CAA5RZ0vOcJF91O2e5AQN+V6guMNLMhJx83dxALf-iUZ-hLGO_Q@mail.gmail.com
2025-03-20 13:47:55 -04:00
Nathan Bossart
af0d4901c1 Add test for pg_upgrade file transfer modes.
This new test checks all of pg_upgrade's file transfer modes.  For
each mode, we verify that pg_upgrade either succeeds (and some test
objects successfully reach the new version) or fails with an error
that indicates the mode is not supported on the current platform.
For cross-version tests, we also check that pg_upgrade transfers
non-default tablespaces.  (Tablespaces can't be tested on same
version upgrades because of the version-specific subdirectory
conflict, but we might be able to enable such tests once we teach
pg_upgrade how to handle in-place tablespaces.)

Suggested-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/Zyvop-LxLXBLrZil%40nathan
2025-03-20 11:08:42 -05:00
Nathan Bossart
0164a0f9ee Add vacuum_truncate configuration parameter.
This new parameter works just like the storage parameter of the
same name: if set to true (which is the default), autovacuum and
VACUUM attempt to truncate any empty pages at the end of the table.
It is primarily intended to help users avoid locking issues on hot
standbys.  The setting can be overridden with the storage parameter
or VACUUM's TRUNCATE option.

Since there's presently no way to determine whether a Boolean
storage parameter is explicitly set or has just picked up the
default value, this commit also introduces an isset_offset member
to relopt_parse_elt.

Suggested-by: Will Storey <will@summercat.com>
Author: Nathan Bossart <nathandbossart@gmail.com>
Co-authored-by: Gurjeet Singh <gurjeet@singh.im>
Reviewed-by: Laurenz Albe <laurenz.albe@cybertec.at>
Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com>
Reviewed-by: Robert Treat <rob@xzilla.net>
Discussion: https://postgr.es/m/Z2DE4lDX4tHqNGZt%40dev.null
2025-03-20 10:16:50 -05:00
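
A usage sketch (the table name is a placeholder); the storage parameter
and VACUUM's TRUNCATE option override the new GUC as described above:

    ALTER SYSTEM SET vacuum_truncate = off;
    SELECT pg_reload_conf();
    -- re-enable truncation for one table, or for a single run:
    ALTER TABLE measurements SET (vacuum_truncate = on);
    VACUUM (TRUNCATE) measurements;
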
Peter Eisentraut
618c64ffd3 Revert workarounds for -Wmissing-braces false positives on old GCC
We have collected several instances of a workaround for GCC bug 53119,
which caused false-positive compiler warnings.  This bug has long been
fixed, but was still seen on the buildfarm, most recently on lapwing
with gcc (Debian 4.7.2-5).  (The GCC bug tracker mentions that a fix
was backported to 4.7.4 and 4.8.3.)

That compiler no longer runs warning-free since commit 6fdd5d9563, so
we don't need to keep these workarounds.  And furthermore, the
consensus appears to be that we don't want to keep supporting that era
of platform anymore at all.

This reverts the following commits:

d937904cce
506428d091
b449afb582
6392f2a096
bad0763a4d
5e0c761d0a

and makes a few similar fixes to newer code.

Discussion: https://www.postgresql.org/message-id/flat/e170d61f-01ab-4cf9-ab68-91cd1fac62c5%40eisentraut.org
Discussion: https://www.postgresql.org/message-id/flat/CA%2BTgmoYEAm-KKZibAP3hSqbTFTjUd47XtVcf3xSFDpyecXX9uQ%40mail.gmail.com
2025-03-20 11:25:58 +01:00
Peter Eisentraut
b7076c1e7f Fix extension control path tests
Change the expected extension to be installed from amcheck to plpgsql,
since not all buildfarm animals have the contrib module installed.

Author: Matheus Alcantara <mths.dev@pm.me>
Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/flat/E7C7BFFB-8857-48D4-A71F-88B359FADCFD@justatheory.com
2025-03-20 10:53:59 +01:00
Peter Eisentraut
47929324c5 Fix typo in comment 2025-03-20 10:44:12 +01:00
Amit Kapila
e5aeed4b80 pg_createsubscriber: Add -R publications option.
This patch introduces a new '-R'/'--remove' option in the
'pg_createsubscriber' utility to specify the object types to be removed
from the subscriber. Currently, we add support to specify 'publications'
as an object type. In the future, other object types like failover-slots
could be added.

This feature optionally allows removing publications on the subscriber
that were replicated from the primary server (before running this tool)
during physical replication. Users may want to retain these publications
in case they want some pre-existing subscribers to point to the newly
created subscriber.

Author: Shubham Khanna <khannashubham1197@gmail.com>
Reviewed-by: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: David G. Johnston <david.g.johnston@gmail.com>
Reviewed-by: Euler Taveira <euler@eulerto.com>
Reviewed-by: Zhijie Hou <houzj.fnst@fujitsu.com>
Reviewed-by: vignesh C <vignesh21@gmail.com>
Reviewed-by: Nisha Moond <nisha.moond412@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/CAHv8RjL4OvoYafofTb_U_JD5HuyoNowBoGpMfnEbhDSENA74Kg@mail.gmail.com
2025-03-20 12:21:54 +05:30
Andres Freund
5941946d09 meson: Flush stdout in testwrap
Otherwise the progress won't reliably be displayed during a test.

Reviewed-by: Noah Misch <noah@leadboat.com>
Discussion: https://postgr.es/m/kx6xu7suexal5vwsxpy7ybgkcznx6hgywbuhkr6qabcwxjqax2@i4pcpk75jvaa
Backpatch-through: 16
2025-03-19 09:04:09 -04:00
Peter Eisentraut
190dc27998 Update a code comment
The comment explained that ALTER TABLE ADD CONSTRAINT USING INDEX is
only supported with a btree index.  (This is not being changed.)  The
reason is to keep upgrades robust, as explained there.  The other part
of the comment, that btree is the only unique index kind anyway, is
somewhat less true as we're trying to enable unique indexes other than
btree, and it's irrelevant to this check.  There is a check for
indisunique earlier already.  So just remove this part of the comment.

Author: Mark Dilger <mark.dilger@enterprisedb.com>
Discussion: https://www.postgresql.org/message-id/flat/E72EAA49-354D-4C2E-8EB9-255197F55330@enterprisedb.com
2025-03-19 10:39:06 +01:00
Peter Eisentraut
4f7f7b0375 extension_control_path
The new GUC extension_control_path specifies a path to look for
extension control files.  The default value is $system, which looks in
the compiled-in location, as before.

The path search uses the same code and works in the same way as
dynamic_library_path.

Some use cases of this are: (1) testing extensions during package
builds, (2) installing extensions outside security-restricted
containers like Python.app (on macOS), (3) adding extensions to
PostgreSQL running in a Kubernetes environment using operators such as
CloudNativePG without having to rebuild the base image for each new
extension.

There is also a tweak in Makefile.global so that it is possible to
install extensions using PGXS into a different directory than the
default, using 'make install prefix=/else/where'.  This previously
only worked when specifying the subdirectories, like 'make install
datadir=/else/where/share pkglibdir=/else/where/lib', for purely
implementation reasons.  (Of course, without the path feature,
installing elsewhere was rarely useful.)

Author: Peter Eisentraut <peter@eisentraut.org>
Co-authored-by: Matheus Alcantara <matheusssilv97@gmail.com>
Reviewed-by: David E. Wheeler <david@justatheory.com>
Reviewed-by: Gabriele Bartolini <gabriele.bartolini@enterprisedb.com>
Reviewed-by: Marco Nenciarini <marco.nenciarini@enterprisedb.com>
Reviewed-by: Niccolò Fei <niccolo.fei@enterprisedb.com>
Discussion: https://www.postgresql.org/message-id/flat/E7C7BFFB-8857-48D4-A71F-88B359FADCFD@justatheory.com
2025-03-19 07:03:20 +01:00
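
A configuration sketch, assuming the value format mirrors
dynamic_library_path as described above ($system is the compiled-in
location; the extra directory is a placeholder):

    ALTER SYSTEM SET extension_control_path =
      '$system:/opt/extras/share/postgresql/extension';
    SELECT pg_reload_conf();
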
Michael Paquier
2cce0fe440 psql: Allow queries terminated by semicolons while in pipeline mode
Currently, the only way to pipe queries in an ongoing pipeline (in a
\startpipeline block) is to leverage the meta-commands able to create
extended queries such as \bind, \parse or \bind_named.

While this is good enough for testing the backend with pipelines, it has
been mentioned that it can also be very useful to allow queries
terminated by semicolons to be appended to a pipeline.  For example, it
would be possible to migrate existing psql scripts to use pipelines by
just adding a set of \startpipeline and \endpipeline meta-commands,
making such scripts more efficient.

Such a change proves to be simple in psql: when pipeline mode is active,
queries terminated by semicolons can be executed through
PQsendQueryParams() without any parameters set, instead of the default
PQsendQuery(), as pgbench already does.  \watch is still forbidden
while in a pipeline, as it expects its results to be processed
synchronously.

The large portion of this commit consists in providing more test
coverage, with mixes of extended queries appended in a pipeline by \bind
and friends, and queries terminated by semicolons.

This improvement has been suggested by Daniel Vérité.

Author: Anthonin Bonnefoy <anthonin.bonnefoy@datadoghq.com>
Discussion: https://postgr.es/m/d67b9c19-d009-4a50-8020-1a0ea92366a1@manitou-mail.org
2025-03-19 13:34:59 +09:00
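
A sketch of what this enables in a psql session: plain semicolon-
terminated queries appended to an ongoing pipeline, with results fetched
at \endpipeline (or earlier via \getresults):

    \startpipeline
    SELECT 1;
    SELECT 2;
    \endpipeline
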
Thomas Munro
0b53c08677 Fix compiler warning for commit 434dbf69.
Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
2025-03-19 17:26:16 +13:00
Thomas Munro
1cf4c56480 oauth: Simplify copy of PGoauthBearerRequest
Follow-up to 03366b61d. Since there are no more const members in the
PGoauthBearerRequest struct, the previous memcpy() can be replaced with
simple assignment.

Author: Jacob Champion <jacob.champion@enterprisedb.com>
Discussion: https://postgr.es/m/p4bd7mn6dxr2zdak74abocyltpfdxif4pxqzixqpxpetjwt34h%40qc6jgfmoddvq
2025-03-19 16:59:25 +13:00
Thomas Munro
873c0fd678 oauth: Improve validator docs on interruptibility
Andres pointed out that EINTR handling is inadequate for real-world use
cases. Direct module writers to our wait APIs instead.

Author: Jacob Champion <jacob.champion@enterprisedb.com>
Discussion: https://postgr.es/m/p4bd7mn6dxr2zdak74abocyltpfdxif4pxqzixqpxpetjwt34h%40qc6jgfmoddvq
2025-03-19 16:58:06 +13:00
Thomas Munro
d7e40845f9 oauth: Disallow synchronous DNS in libcurl
There is concern that a blocking DNS lookup in libpq could stall a
backend process (say, via FDW). Since there's currently no strong
evidence that synchronous DNS is a popular option, disallow it entirely
rather than warning at configure time. We can revisit if anyone
complains.

Per query from Andres Freund.

Author: Jacob Champion <jacob.champion@enterprisedb.com>
Discussion: https://postgr.es/m/p4bd7mn6dxr2zdak74abocyltpfdxif4pxqzixqpxpetjwt34h%40qc6jgfmoddvq
2025-03-19 16:56:19 +13:00
Thomas Munro
434dbf6907 oauth: Fix postcondition for set_timer on macOS
On macOS, re-adding an EVFILT_TIMER to a kqueue does not appear to clear
out previously queued timer events, so checks for timer expiration do
not work correctly during token retrieval. Switching to IPv4-only
communication exposes the problem, because libcurl is no longer clearing
out other timeouts related to Happy Eyeballs dual-stack handling.

Fully remove and re-register the kqueue timer events during each call to
set_timer(), to clear out any stale expirations.

Author: Jacob Champion <jacob.champion@enterprisedb.com>
Discussion: https://postgr.es/m/CAOYmi%2Bn4EDOOUL27_OqYT2-F2rS6S%2B3mK-ppWb2Ec92UEoUbYA%40mail.gmail.com
2025-03-19 16:45:01 +13:00
Thomas Munro
8d9d5843b5 oauth: Use IPv4-only issuer in oauth_validator tests
The test authorization server implemented in oauth_server.py does not
listen on IPv6. Most of the time, libcurl happily falls back to IPv4
after failing its initial connection, but on NetBSD, something is
consistently showing up on the unreserved IPv6 port and causing a test
failure.

Rather than deal with dual-stack details across all test platforms,
change the issuer to enforce the use of IPv4 only. (This elicits more
punishing timeout behavior from libcurl, so it's a useful change from
the testing perspective as well.)

Author: Jacob Champion <jacob.champion@enterprisedb.com>
Reported-by: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://postgr.es/m/CAOYmi%2Bn4EDOOUL27_OqYT2-F2rS6S%2B3mK-ppWb2Ec92UEoUbYA%40mail.gmail.com
2025-03-19 16:45:01 +13:00
Amit Langote
28317de723 Ensure first ModifyTable rel initialized if all are pruned
Commit cbc127917e introduced tracking of unpruned relids to avoid
processing pruned relations, and changed ExecInitModifyTable() to
initialize only unpruned result relations. As a result, MERGE
statements that prune all target partitions can now lead to crashes
or incorrect behavior during execution.

The crash occurs because some executor code paths rely on
ModifyTableState.resultRelInfo[0] being present and initialized,
even when no result relations remain after pruning. For example,
ExecMerge() and ExecMergeNotMatched() use the first resultRelInfo
to determine the appropriate action. Similarly,
ExecInitPartitionInfo() assumes that at least one result relation
exists.

To preserve these assumptions, ExecInitModifyTable() now includes the
first result relation in the initialized result relation list if all
result relations for that ModifyTable were pruned. To enable that,
ExecDoInitialPruning() ensures the first relation is locked if it was
pruned and locking is necessary.

To support this exception to the pruning logic, PlannedStmt now
includes a list of RT indexes identifying the first result relation
of each ModifyTable node in the plan. This allows
ExecDoInitialPruning() to check whether each such relation was
pruned and, if so, lock it if necessary.

Bug: #18830
Reported-by: Robins Tharakan <tharakan@gmail.com>
Diagnosed-by: Tender Wang <tndrwang@gmail.com>
Diagnosed-by: Dean Rasheed <dean.a.rasheed@gmail.com>
Co-authored-by: Dean Rasheed <dean.a.rasheed@gmail.com>
Reviewed-by: Tender Wang <tndrwang@gmail.com>
Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com>
Discussion: https://postgr.es/m/18830-1f31ea1dc930d444%40postgresql.org
2025-03-19 12:14:24 +09:00
Thomas Munro
06fb5612c9 Increase io_combine_limit range to 1MB.
The default of 128kB is unchanged, but the upper limit is changed from
32 blocks to 128 blocks, unless the operating system's IOV_MAX is too
low.  Some other RDBMSes seem to cap their multi-block buffer pool I/O
around this number, and it seems useful to allow experimentation.

The concrete change is to our definition of PG_IOV_MAX, which provides
the maximum for io_combine_limit and io_max_combine_limit.  It also
affects a couple of other places that work with arrays of struct iovec
or smaller objects on the stack, so we still don't want to use the
system IOV_MAX directly without a clamp: it is not under our control and
likely to be 1024.  128 seems acceptable for our current usage.

For Windows, we can't use real scatter/gather yet, so we continue to
define our own IOV_MAX value of 16 and emulate preadv()/pwritev() with
loops.  Someone would need to research the trade-offs of raising that
number.

NB if trying to see this working: you might temporarily need to hack
BAS_BULKREAD to be bigger, since otherwise the obvious way of "a very
big SELECT" is limited by that for now.

Suggested-by: Tomas Vondra <tomas@vondra.me>
Discussion: https://postgr.es/m/CA%2BhUKG%2B2T9p-%2BzM6Eeou-RAJjTML6eit1qn26f9twznX59qtCA%40mail.gmail.com
2025-03-19 15:40:35 +13:00
Thomas Munro
10f6646847 Introduce io_max_combine_limit.
The existing io_combine_limit can be changed by users.  The new
io_max_combine_limit is fixed at server startup time, and functions as a
silent clamp on the user setting.  That in itself is probably quite
useful, but the primary motivation is:

aio_init.c allocates shared memory for all asynchronous IOs including
some per-block data, and we didn't want to waste memory you'd never use
by assuming they could be up to PG_IOV_MAX.  This commit already halves
the size of 'AioHandleIov' and 'AioHandleData'.  A follow-up commit can
now expand PG_IOV_MAX without affecting that.

Since our GUC system doesn't support dependencies or cross-checks
between GUCs, the user-settable one now assigns a "raw" value to
io_combine_limit_guc, and the lower of io_combine_limit_guc and
io_max_combine_limit is maintained in io_combine_limit.

Reviewed-by: Andres Freund <andres@anarazel.de> (earlier version)
Discussion: https://postgr.es/m/CA%2BhUKG%2B2T9p-%2BzM6Eeou-RAJjTML6eit1qn26f9twznX59qtCA%40mail.gmail.com
2025-03-19 15:23:54 +13:00
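
A brief illustration of the clamping described above: the effective
combine limit is the lower of the two settings.

    SHOW io_max_combine_limit;       -- fixed at server start
    SET io_combine_limit = '1MB';    -- silently clamped to the value above
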
Michael Paquier
17d8bba6da Fix copy-paste error related to the autovacuum launcher in pgstat_io.c
Autovacuum launchers perform no WAL IO reads, but pgstat_tracks_io_op()
was tracking them as an allowed combination for the "init" and "normal"
contexts.

This caused the "read", "read_bytes" and "read_time" attributes of
pg_stat_io to show zeros for the autovacuum launcher rather than NULL.
NULL means that a combination of IO object, IO context and IO operation
has no meaning for a backend type.  Zero is the same as telling that a
combination is relevant, and that WAL reads are possible in an
autovacuum launcher, but it is not relevant.

Copy-pasto introduced in a051e71e28.

Author: Ranier Vilela <ranier.vf@gmail.com>
Reviewed-by: Nazir Bilal Yavuz <byavuz81@gmail.com>
Discussion: https://postgr.es/m/CAEudQAopEMAPiUqE7BvDV+x2fUPmKmb9RrsaoDR+hhQzLKg4PQ@mail.gmail.com
2025-03-19 08:52:10 +09:00
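
A query sketch to observe the fix (column and value names follow the
description above; after the fix these columns are NULL, not zero, for
the launcher):

    SELECT reads, read_bytes, read_time
      FROM pg_stat_io
     WHERE backend_type = 'autovacuum launcher'
       AND object = 'wal';
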
Masahiko Sawada
f4290f20dd Fix assertion failure in parallel vacuum with minimal maintenance_work_mem setting.
bbf668d66f lowered the minimum value of maintenance_work_mem to
64kB. However, in parallel vacuum cases, since the initial underlying
DSA size is 256kB, it attempts to perform a cycle of index vacuuming
and table vacuuming with an empty TID store, resulting in an assertion
failure.

This commit ensures that at least one page is processed before index
vacuuming and table vacuuming begins.

Backpatch to 17, where the minimum maintenance_work_mem value was
lowered.

Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/CAD21AoCEAmbkkXSKbj4dB+5pJDRL4ZHxrCiLBgES_g_g8mVi1Q@mail.gmail.com
Backpatch-through: 17
2025-03-18 16:37:02 -07:00
Michael Paquier
6d3ea48ff1 Optimize check for pending backend IO stats
This commit changes the backend stats code so that we rely on a single
boolean rather than a repeated check based on pg_memory_is_all_zeros()
in the code, making it cheaper should PgStat_PendingIO get bigger in
size.

The frequency of backend stats reports is not a bottleneck, but there is
no reason to not make that cheaper, and the logic is simple as the only
entry points updating backend IO stats are pgstat_count_backend_io_op()
and pgstat_count_backend_io_op_time().

Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Reviewed-by: Xuneng Zhou <xunengzhou@gmail.com>
Discussion: https://postgr.es/m/Z8WYf1jyy4MwOveQ@ip-10-97-1-34.eu-west-3.compute.internal
2025-03-19 08:03:06 +09:00
Nathan Bossart
7fb418f020 Add commit 796bdda484 to .git-blame-ignore-revs. 2025-03-18 17:00:23 -05:00
Nathan Bossart
c9d502eb68 Update guidance for running vacuumdb after pg_upgrade.
Now that pg_upgrade can carry over most optimizer statistics, we
should recommend using vacuumdb's new --missing-stats-only option
to only analyze relations that are missing statistics.

Reviewed-by: John Naylor <johncnaylorls@gmail.com>
Discussion: https://postgr.es/m/Z5O1bpcwDrMgyrYy%40nathan
2025-03-18 16:32:56 -05:00
Nathan Bossart
edba754f05 vacuumdb: Add option for analyzing only relations missing stats.
This commit adds a new --missing-stats-only option that can be used
with --analyze-only or --analyze-in-stages.  When this option is
specified, vacuumdb will analyze a relation if it lacks any
statistics for a column, expression index, or extended statistics
object.  This new option is primarily intended for use after
pg_upgrade (since it can now retain most optimizer statistics), but
it might be useful in other situations, too.

Author: Corey Huinker <corey.huinker@gmail.com>
Co-authored-by: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: John Naylor <johncnaylorls@gmail.com>
Discussion: https://postgr.es/m/Z5O1bpcwDrMgyrYy%40nathan
2025-03-18 16:32:56 -05:00
Nathan Bossart
9c03c8d187 vacuumdb: Teach vacuum_one_database() to reuse query results.
Presently, each call to vacuum_one_database() queries the catalogs
to retrieve the list of tables to process.  A follow-up commit will
add a "missing stats only" feature to --analyze-in-stages, which
requires saving the catalog query results (since tables without
statistics will have them after the first stage).  This commit adds
a new parameter to vacuum_one_database() that specifies either a
previously-retrieved list or a place to return the catalog query
results.  Note that nothing uses this new parameter yet.

Author: Corey Huinker <corey.huinker@gmail.com>
Co-authored-by: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: John Naylor <johncnaylorls@gmail.com>
Discussion: https://postgr.es/m/Z5O1bpcwDrMgyrYy%40nathan
2025-03-18 16:32:55 -05:00
Tom Lane
a6524105d2 Doc: manually break lines in wide UUID examples.
Buildfarm member crake has been complaining "WARNING: The contents of
fo:inline line 1 exceed the available area in the inline-progression
direction by 20500 millipoints. (See position 23808:106)" since
ba57dcfdc went in.  The other doc-building animals are not showing
this warning, and I don't see it on my RHEL8 workstation either, but
I was able to reproduce it on a Fedora 41 box.  So apparently this
is due to a recent-ish change in DocBook's line-breaking heuristics,
which caused it to cope less well with the UUIDs in these examples.
Put in some zero-width spaces to encourage the PDF toolchain to
break these lines in a better place.  (Only one of these examples
actually needs this today, but I marked up all three to ensure that
they get wrapped in a consistent way.)
2025-03-18 15:35:13 -04:00
Andres Freund
499faf9063 smgr: Make SMgrRelation initialization safer against errors
In case the smgr_open callback failed, the ->pincount field would not be
initialized and the relation would not be put onto the unpinned_relns list.

This buglet was introduced in 21d9c3ee4e, in 17.

Discussion: https://postgr.es/m/3vae7l5ozvqtxmd7rr7zaeq3qkuipz365u3rtim5t5wdkr6f4g@vkgf2fogjirl
Backpatch-through: 17
2025-03-18 14:04:44 -04:00
Álvaro Herrera
62d712ecfd
Introduce squashing of constant lists in query jumbling
pg_stat_statements produces multiple entries for queries like
    SELECT something FROM table WHERE col IN (1, 2, 3, ...)

depending on the number of parameters, because every element of
ArrayExpr is individually jumbled.  Most of the time that's undesirable,
especially if the list becomes too large.

Fix this by introducing a new GUC query_id_squash_values which modifies
the node jumbling code to only consider the first and last element of a
list of constants, rather than each list element individually.  This
affects both the query_id generated by query jumbling, as well as
pg_stat_statements query normalization so that it suppresses printing of
the individual elements of such a list.

The default value is off, meaning the previous behavior is maintained.

Author: Dmitry Dolgov <9erthalion6@gmail.com>
Reviewed-by: Sergey Dudoladov (mysterious, off-list)
Reviewed-by: David Geier <geidav.pg@gmail.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Sami Imseih <samimseih@gmail.com>
Reviewed-by: Sutou Kouhei <kou@clear-code.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Marcos Pegoraro <marcos@f10.com.br>
Reviewed-by: Julien Rouhaud <rjuju123@gmail.com>
Reviewed-by: Zhihong Yu <zyu@yugabyte.com>
Tested-by: Yasuo Honda <yasuo.honda@gmail.com>
Tested-by: Sergei Kornilov <sk@zsrv.org>
Tested-by: Maciek Sakrejda <m.sakrejda@gmail.com>
Tested-by: Chengxi Sun <sunchengxi@highgo.com>
Tested-by: Jakub Wartak <jakub.wartak@enterprisedb.com>
Discussion: https://postgr.es/m/CA+q6zcWtUbT_Sxj0V6HY6EZ89uv5wuG5aefpe_9n0Jr3VwntFg@mail.gmail.com
2025-03-18 18:56:11 +01:00
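
A sketch of the effect (the table name is a placeholder): with the GUC
enabled, both queries below should normalize to a single
pg_stat_statements entry with the IN list collapsed.

    SET query_id_squash_values = on;
    SELECT * FROM orders WHERE id IN (1, 2, 3);
    SELECT * FROM orders WHERE id IN (4, 5, 6, 7, 8);
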
Andres Freund
247ce06b88 aio: Add io_method=worker
The previous commit introduced the infrastructure to start io_workers. This
commit actually makes the workers execute IOs.

IO workers consume IOs from a shared memory submission queue, run traditional
synchronous system calls, and perform the shared completion handling
immediately.  Client code submits most requests by pushing IOs into the
submission queue, and waits (if necessary) using condition variables.  Some
IOs cannot be performed in another process due to lack of infrastructure for
reopening the file, and must be processed synchronously by the client code when
submitted.

For now the default io_method is changed to "worker". We should re-evaluate
that around beta1; we might want to be careful and set the default to "sync"
for 18.

Reviewed-by: Noah Misch <noah@leadboat.com>
Co-authored-by: Thomas Munro <thomas.munro@gmail.com>
Co-authored-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/uvrtrknj4kdytuboidbhwclo4gxhswwcpgadptsjvjqcluzmah%40brqs62irg4dt
Discussion: https://postgr.es/m/20210223100344.llw5an2aklengrmn@alap3.anarazel.de
Discussion: https://postgr.es/m/stj36ea6yyhoxtqkhpieia2z4krnam7qyetc57rfezgk4zgapf@gcnactj4z56m
2025-03-18 11:54:01 -04:00
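
A configuration sketch (assuming the worker-count GUC is named
io_workers): io_method requires a restart, while the number of workers
can be adjusted with a reload, per the PGC_SIGHUP note in the
infrastructure commit below.

    ALTER SYSTEM SET io_method = 'worker';  -- takes effect at restart
    ALTER SYSTEM SET io_workers = 8;        -- reload is enough
    SELECT pg_reload_conf();
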
Andres Freund
55b454d0e1 aio: Infrastructure for io_method=worker
This commit contains the basic, system-wide, infrastructure for
io_method=worker. It does not yet actually execute IO, this commit just
provides the infrastructure for running IO workers, kept separate for easier
review.

The number of IO workers can be adjusted with a PGC_SIGHUP GUC. Eventually
we'd like to make the number of workers dynamically scale up/down based on the
current "IO load".

To allow the number of IO workers to be increased without a restart, we need
to reserve PGPROC entries for the workers unconditionally. This has been
judged to be worth the cost. If it turns out to be problematic, we can
introduce a PGC_POSTMASTER GUC to control the maximum number.

As io workers might be needed during shutdown, e.g. for AIO during the
shutdown checkpoint, a new PMState phase is added. IO workers are shut down
after the shutdown checkpoint has been performed and walsender/archiver have
shut down, but before the checkpointer itself shuts down. See also
87a6690cc6.

Updates PGSTAT_FILE_FORMAT_ID due to the addition of a new BackendType.

Reviewed-by: Noah Misch <noah@leadboat.com>
Co-authored-by: Thomas Munro <thomas.munro@gmail.com>
Co-authored-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/uvrtrknj4kdytuboidbhwclo4gxhswwcpgadptsjvjqcluzmah%40brqs62irg4dt
Discussion: https://postgr.es/m/20210223100344.llw5an2aklengrmn@alap3.anarazel.de
Discussion: https://postgr.es/m/stj36ea6yyhoxtqkhpieia2z4krnam7qyetc57rfezgk4zgapf@gcnactj4z56m
2025-03-18 11:54:01 -04:00
Jeff Davis
549ea06e42 Fix headerscheck warning.
Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/93731.1742310701@sss.pgh.pa.us
2025-03-18 08:37:07 -07:00
Tom Lane
4078da6c47 Silence compiler warning.
Assorted buildfarm members are complaining about "'process_list' may
be used uninitialized in this function" since f76892c9f, presumably
because they don't trust that the switch case labels are exhaustive.
We can silence that by initializing the variable to NULL.  Should
a switch fall-through actually happen, we'll get SIGSEGV at the
first use, which is as good as an Assert.
2025-03-18 10:54:10 -04:00
Daniel Gustafsson
daa02c6bd9 Add X25519 to the default set of curves
Since many clients default to the X25519 curve in the TLS handshake,
the fact that the server doesn't support it by default causes an extra
roundtrip for each TLS connection.  By adding multiple curves, which
is supported since 3d1ef3a15c, we can reduce the risk of extra
roundtrips.

Author: Daniel Gustafsson <daniel@yesql.se>
Co-authored-by: Jacob Champion <jacob.champion@enterprisedb.com>
Reported-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Jacob Champion <jacob.champion@enterprisedb.com>
Discussion: https://postgr.es/m/20240616234612.6cslu7nqexquvwj7@awork3.anarazel.de
2025-03-18 15:26:27 +01:00
Robert Haas
4fd02bf7cf Add some new hooks so extensions can add details to EXPLAIN.
Specifically, add a per-node hook that is called after the per-node
information has been displayed but before we display children, and a
per-query hook that is called after existing query-level information
is printed. This assumes that extension-added information should
always go at the end rather than the beginning or the middle, but
that seems like an acceptable limitation for simplicity. It also
assumes that extensions will only want to add information, not remove
or reformat existing details; those also seem like acceptable
restrictions, at least for now.

If multiple EXPLAIN extensions are used, the order in which any
additional details are printed is likely to depend on the order in
which the modules are loaded. That seems OK, since the user may
have opinions about the order in which output should appear, and the
extension author can't really know whether their stuff is more or
less important to a particular user than some other extension.

Discussion: http://postgr.es/m/CA+TgmoYSzg58hPuBmei46o8D3SKX+SZoO4K_aGQGwiRzvRApLg@mail.gmail.com
Reviewed-by: Srinath Reddy <srinath2133@gmail.com>
Reviewed-by: Andrei Lepikhov <lepihov@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Sami Imseih <samimseih@gmail.com>
2025-03-18 09:28:01 -04:00
Álvaro Herrera
f76892c9ff
Simplify reindexdb coding
get_parallel_object_list() was trying to serve two masters, and it was
doing a bad job at both.  In particular, it treated the given user_list
as an output argument, but only sometimes.  This was confusing, and the
two paths through it didn't really have all that much in common, so the
complexity wasn't buying us much.  Split it in two:
get_parallel_tables_list() handles the straightforward cases for
schemas, databases and tables, takes one list as argument and returns
another list.

A new function get_parallel_tabidx_list() handles the case for indexes.
This takes a list as argument and outputs two lists, just like
get_parallel_object_list used to do, but now the API is clearer (IMO
anyway).  Another difference is that accompanying the list of indexes
now we have a list of tables as an OID list rather than a
fully-qualified table name list.  This makes some comparisons easier,
and we don't really need the names of the tables, just their OIDs.
(This requires atooid, which requires <stdlib.h>).

Author: Ranier Vilela <ranier.vf@gmail.com>
Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://postgr.es/m/CAEudQArfqr0-s0VVPSEh=0kgOgBJvFNdGW=xSL5rBcr0WDMQYQ@mail.gmail.com
2025-03-18 14:21:26 +01:00
Melanie Plageman
cc6be07ebd Increase default maintenance_io_concurrency to 16
Since its introduction in fc34b0d9de, the default
maintenance_io_concurrency has been larger than the default
effective_io_concurrency. maintenance_io_concurrency primarily
controlled prefetching done on behalf of the whole system, for
operations like recovery. Therefore it makes sense for it to have a
value equal to or greater than effective_io_concurrency, which controls
I/O concurrency for reading a relation in a bitmap heap scan.

ff79b5b2ab increased effective_io_concurrency to 16, so we'll increase
maintenance_io_concurrency as well. For now, though, we'll keep the
defaults of effective_io_concurrency and maintenance_io_concurrency
equal to one another (16).

On fast, high IOPs systems, significantly higher values of
maintenance_io_concurrency are observably beneficial [1]. However, such
values would flood low IOPs systems and increase overall system I/O
latency.

It is worth mentioning that since 9256822608 and c3e775e608,
maintenance_io_concurrency also controls the I/O concurrency of each
vacuum worker. Since many autovacuum workers may be simultaneously
issuing I/Os, we want to keep maintenance_io_concurrency appropriately
conservative.

[1] https://postgr.es/m/c5d52837-6256-0556-ac8c-d6d3d558820a%40enterprisedb.com

Suggested-by: Jakub Wartak <jakub.wartak@enterprisedb.com>
Discussion: https://postgr.es/m/CAKZiRmxdHQaU%2B2Zpe6d%3Dx%3D0vigJ1sfWwwVYLJAf%3Dud_wQ_VcUw%40mail.gmail.com
2025-03-18 09:08:10 -04:00
Robert Haas
796bdda484 Fix indentation again.
Because somehow I manage to keep forgetting this.
2025-03-18 09:02:36 -04:00
Robert Haas
c65bc2e1d1 Make it possible for loadable modules to add EXPLAIN options.
Modules can use RegisterExtensionExplainOption to register new
EXPLAIN options, and GetExplainExtensionId, GetExplainExtensionState,
and SetExplainExtensionState to store related state inside the
ExplainState object.

Since this substantially increases the amount of code that needs
to handle ExplainState-related tasks, move a few bits of existing
code to a new file explain_state.c and add the rest of this
infrastructure there.

See the comments at the top of explain_state.c for further
explanation of how this mechanism works.

This does not yet provide a way for such options to do anything
useful. The intention is that we'll add hooks for that purpose in a
separate commit.

Discussion: http://postgr.es/m/CA+TgmoYSzg58hPuBmei46o8D3SKX+SZoO4K_aGQGwiRzvRApLg@mail.gmail.com
Reviewed-by: Srinath Reddy <srinath2133@gmail.com>
Reviewed-by: Andrei Lepikhov <lepihov@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Sami Imseih <samimseih@gmail.com>
2025-03-18 08:41:12 -04:00
Peter Eisentraut
9d6db8bec1 Allow non-btree unique indexes for matviews
We were rejecting non-btree indexes in some cases owing to the
inability to determine the equality operators for other index AMs;
that problem no longer exists, because we can look up the equality
operator using COMPARE_EQ.

Stop rejecting these indexes; instead rely on the fact that all unique
indexes must have equality operators.

Author: Mark Dilger <mark.dilger@enterprisedb.com>
Discussion: https://www.postgresql.org/message-id/flat/E72EAA49-354D-4C2E-8EB9-255197F55330@enterprisedb.com
2025-03-18 11:29:15 +01:00
Peter Eisentraut
f278e1fe30 Allow non-btree unique indexes for partition keys
We were rejecting non-btree indexes in some cases owing to the
inability to determine the equality operators for other index AMs;
that problem no longer exists, because we can look up the equality
operator using COMPARE_EQ.  The problem of not knowing the strategy
number for equality in other index AMs is already resolved.

Stop rejecting the indexes upfront, and instead reject any for which
the equality operator lookup fails.

Author: Mark Dilger <mark.dilger@enterprisedb.com>
Discussion: https://www.postgresql.org/message-id/flat/E72EAA49-354D-4C2E-8EB9-255197F55330@enterprisedb.com
2025-03-18 11:25:36 +01:00
Peter Eisentraut
7317e64126 Add some opfamily support functions to lsyscache.c
Add get_opfamily_method() and get_opfamily_member_for_cmptype() in
lsyscache.c.  No callers yet, but we'll add some soon.  This is part
of generalizing some parts of the code away from having btree
hardcoded and use CompareType instead.

Author: Mark Dilger <mark.dilger@enterprisedb.com>
Discussion: https://www.postgresql.org/message-id/flat/E72EAA49-354D-4C2E-8EB9-255197F55330@enterprisedb.com
2025-03-18 11:17:43 +01:00
Amit Kapila
122a9af5de Fix typo.
Author: vignesh C <vignesh21@gmail.com>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Discussion: https://postgr.es/m/CALDaNm1KqJ0VFfDJRPbfYi9Shz6LHFEE-Ckn+eqsePfKhebv9w@mail.gmail.com
2025-03-18 14:18:09 +05:30
Amit Kapila
01e27aab05 Use correct variable name in publicationcmds.c.
subid was used in a few places for publicationid in publicationcmds.c/.h.

Author: vignesh C <vignesh21@gmail.com>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Discussion: https://postgr.es/m/CALDaNm1KqJ0VFfDJRPbfYi9Shz6LHFEE-Ckn+eqsePfKhebv9w@mail.gmail.com
2025-03-18 14:06:51 +05:30
Masahiko Sawada
c462b054ba Fix the test 005_char_signedness.
pg_upgrade test 005_char_signedness was leaving files like
delete_old_cluster.sh in the source directory for VPATH and meson
builds. The fix is to change the directory to tmp_check before running
the test.

Reported-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Discussion: http://postgr.es/m/CA+TgmoYg5e4oznn0XGoJ3+mceG1qe_JJt34rF2JLwvGS5T1hgQ@mail.gmail.com
2025-03-17 21:34:10 -07:00
Michael Paquier
17caf66445 psql: Add \sendpipeline to send query buffers while in a pipeline
In the initial pipeline support for psql added in 41625ab8ea, \g was
used as the way to push extended query into an ongoing pipeline.  \gx
was blocked.

These two meta-commands have format-related options that can be applied
when fetching a query result (expanded, etc.).  As the results of a
pipeline are fetched asynchronously, not at the moment of the
meta-command execution but at the moment of a \getresults or a
\endpipeline, authorizing \g while blocking \gx leads to a confusing
implementation, making one think that psql should be smart enough to
remember the output format options defined from the time when \g or \gx
were executed.  Doing so would lead to more code complications when
retrieving a batch of results.  There is an extra argument other than
simplicity here: the output format options defined at the point of a
\getresults or a \endpipeline execution should be what affects the output
format for a batch of results.

To avoid any confusion, we have settled to the introduction of a new
meta-command called \sendpipeline, replacing \g when within a pipeline.
An advantage of this design is that it is possible to add new options
specific to pipelines when sending a query buffer, independent of \g
and \gx, should it prove to be necessary.

Most of the changes of this commit happen in the regression tests, where
\g is replaced by \sendpipeline.  More tests are added to check that \g
is not allowed.

Per discussion between the author, Daniel Vérité and me.

Author: Anthonin Bonnefoy <anthonin.bonnefoy@datadoghq.com>
Discussion: https://postgr.es/m/ad4b9f1a-f7fe-4ab8-8546-90754726d0be@manitou-mail.org
2025-03-18 09:41:21 +09:00
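
A usage sketch modeled on the regression tests (the parameter value is a
placeholder):

    \startpipeline
    SELECT $1 \bind 'val1' \sendpipeline
    \endpipeline
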
Andres Freund
da7226993f aio: Add core asynchronous I/O infrastructure
The main motivations to use AIO in PostgreSQL are:

a) Reduce the time spent waiting for IO by issuing IO sufficiently early.

   In a few places we have approximated this using posix_fadvise() based
   prefetching, but that is fairly limited (no completion feedback, double the
   syscalls, only works with buffered IO, only works on some OSs).

b) Allow to use Direct-I/O (DIO).

   DIO can offload most of the work for IO to hardware and thus increase
   throughput / decrease CPU utilization, as well as reduce latency.  While we
   have gained the ability to configure DIO in d4e71df6, it is not yet usable
   for real world workloads, as every IO is executed synchronously.

For portability, the new AIO infrastructure allows AIO to be implemented
using different methods. The choice of the AIO method is controlled by the new
io_method GUC. As of this commit, the only implemented method is "sync",
i.e. AIO is not actually executed asynchronously. The "sync" method exists to
allow bypassing most of the new code initially.

Subsequent commits will introduce additional IO methods, including a
cross-platform method implemented using worker processes and a linux specific
method using io_uring.

To allow different parts of postgres to use AIO, the core AIO infrastructure
does not need to know what kind of files it is operating on. The necessary
behavioral differences for different files are abstracted as "AIO
Targets". One example target would be smgr. For boring portability reasons,
all targets currently need to be added to an array in aio_target.c.  This
commit does not implement any AIO targets, just the infrastructure for
them. The smgr target will be added in a later commit.

Completion (and other events) of IOs for one type of file (i.e. one AIO
target) need to be reacted to differently, based on the IO operation and the
callsite. This is made possible by callbacks that can be registered on
IOs. E.g. an smgr read into a local buffer does not need to update the
corresponding BufferDesc (as there is none), but a read into shared buffers
does.  This commit does not contain any callbacks, they will be added in
subsequent commits.

For now the AIO infrastructure only understands READV and WRITEV operations,
but it is expected that more operations will be added. E.g. fsync/fdatasync,
flush_range and network operations like send/recv.

As of this commit, nothing uses the AIO infrastructure. Later commits will add
an smgr target, md.c and bufmgr.c callbacks and then finally use AIO for
read_stream.c IO, which, in one fell swoop, will convert all read stream users
to AIO.

The goal is to use AIO in many more places. There are patches to use AIO for
checkpointer and bgwriter that are reasonably close to being ready. There also
are prototypes to use it for WAL, relation extension, backend writes and many
more. Those prototypes were important to ensure the design of the AIO
subsystem is not too limiting (e.g. WAL writes need to happen in critical
sections, which influenced a lot of the design).

A future commit will add an AIO README explaining the AIO architecture and how
to use the AIO subsystem. The README is added later, as it references details
only added in later commits.

Many many more people than the folks named below have contributed with
feedback, work on semi-independent patches etc. E.g. various folks have
contributed patches to use the read stream infrastructure (added by Thomas in
b5a9b18cd0) in more places. Similarly, a *lot* of folks have contributed to
the CI infrastructure, which I had started to work on to make adding AIO
feasible.

Some of the work by contributors has gone into the "v1" prototype of AIO,
which heavily influenced the current design of the AIO subsystem. None of the
code from that directly survives, but without the prototype, the current
version of the AIO infrastructure would not exist.

Similarly, the reviewers below have not necessarily looked at the current
design or the whole infrastructure, but have provided very valuable input. I
am to blame for problems, not they.

Author: Andres Freund <andres@anarazel.de>
Co-authored-by: Thomas Munro <thomas.munro@gmail.com>
Co-authored-by: Nazir Bilal Yavuz <byavuz81@gmail.com>
Co-authored-by: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-by: Noah Misch <noah@leadboat.com>
Reviewed-by: Jakub Wartak <jakub.wartak@enterprisedb.com>
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Dmitry Dolgov <9erthalion6@gmail.com>
Reviewed-by: Antonin Houska <ah@cybertec.at>
Discussion: https://postgr.es/m/uvrtrknj4kdytuboidbhwclo4gxhswwcpgadptsjvjqcluzmah%40brqs62irg4dt
Discussion: https://postgr.es/m/20210223100344.llw5an2aklengrmn@alap3.anarazel.de
Discussion: https://postgr.es/m/stj36ea6yyhoxtqkhpieia2z4krnam7qyetc57rfezgk4zgapf@gcnactj4z56m
2025-03-17 18:51:33 -04:00
Andres Freund
02844012b3 aio: Basic subsystem initialization
This commit just does the minimal wiring up of the AIO subsystem, added in the
next commit, to the rest of the system. The next commit contains more details
about motivation and architecture.

This commit is kept separate to make it easier to review, separating the
changes across the tree, from the implementation of the new subsystem.

We discussed squashing this commit with the main commit before merging AIO,
but there has been a mild preference for keeping it separate.

Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-by: Noah Misch <noah@leadboat.com>
Discussion: https://postgr.es/m/uvrtrknj4kdytuboidbhwclo4gxhswwcpgadptsjvjqcluzmah%40brqs62irg4dt
2025-03-17 18:51:33 -04:00
Nathan Bossart
65db3963ae Add commit 203c1b4cc4 to .git-blame-ignore-revs. 2025-03-17 15:58:02 -05:00
Robert Haas
203c1b4cc4 Fix indentation.
Commit 99aeb84703 wasn't fully
reindented prior to commit.
2025-03-17 16:06:17 -04:00
Nathan Bossart
7e05df430b pg_upgrade: Remove some dead code.
Since commit e469f0aaf3, tablespace_suffix can't be empty.

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/Z9hc3mkYFKR56Xof%40nathan
2025-03-17 13:18:14 -05:00
Andres Freund
1a22a8a0f1 tests: Expand temp table tests to some pin related matters
Added tests:
- recovery from running out of unpinned local buffers
- that we don't run out of unpinned buffers due to read stream (only recently
  fixed, in 92fc6856cb)
- temp tables can't be dropped while in use by cursors

Discussion: weskknhckugbdm2yt7sa2uq53xlsax67gcdkac34sanb7qpd3p@hcc2wadao5wy
Discussion: https://postgr.es/m/ge6nsuddurhpmll3xj22vucvqwp4agqz6ndtcf2mhyeydzarst@l75dman5x53p
2025-03-17 14:12:44 -04:00
Robert Haas
99aeb84703 pg_combinebackup: Add -k, --link option.
This is similar to pg_upgrade's --link option, except that here we won't
typically be able to use it for every input file: sometimes we will need
to reconstruct a complete backup from blocks stored in different files.
However, when a whole file does need to be copied, we can use an
optimized copying strategy: see the existing --clone and
--copy-file-range options and the code to use CopyFile() on Windows.
This commit adds a new strategy: add a hard link to an existing file.
Making a hard link doesn't actually copy anything, but it makes sense
for the code to treat it as doing so.

This is useful when the input directories are merely staging directories
that will be removed once the restore is complete. In such cases, there
is no need to actually copy the data, and making a bunch of new hard
links can be very quick. However, it would be quite dangerous to use it
if the input directories might later be reused for any other purpose,
since starting postgres on the output directory would destructively
modify the input directories. For that reason, using this new option
causes pg_combinebackup to emit a warning about the danger involved.

Author: Israel Barth Rubio <barthisrael@gmail.com>
Co-authored-by: Robert Haas <robertmhaas@gmail.com> (cosmetic changes)
Reviewed-by: Vignesh C <vignesh21@gmail.com>
Discussion: http://postgr.es/m/CA+TgmoaEFsYHsMefNaNkU=2SnMRufKE3eVJxvAaX=OWgcnPmPg@mail.gmail.com
2025-03-17 14:03:14 -04:00
Tom Lane
ed762e9425 Unify wording of user-facing "row security" messages.
Row-level security is mostly referred to as "row security" in
user-facing messages.  Commit cd3c45125 introduced one inconsistent
use of "row level security"; make that one match the rest.

Author: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Discussion: https://postgr.es/m/20250317.135305.573764276033358827.horikyota.ntt@gmail.com
2025-03-17 12:53:50 -04:00
Michael Paquier
3943f5cff6 Fix inconsistent quoting for some options in TAP tests
This commit addresses some inconsistencies with how the options of some
routines from PostgreSQL/Test/ are written, mainly for init() and
init_from_backup() in Cluster.pm.  These are written unquoted, except
in the locations updated here.

Changes extracted from a larger patch by the same author.

Author: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org>
Discussion: https://postgr.es/m/87jz8rzf3h.fsf@wibble.ilmari.org
2025-03-17 14:07:12 +09:00
Michael Paquier
19c6e92b13 Apply more consistent style for command options in TAP tests
This commit reshapes the grammar of some commands to apply a more
consistent style across the board, following rules similar to
ce1b0f9da03e:
- Elimination of some pointless used-once variables.
- Use of long options, to self-document better the options used.
- Use of fat commas to link option names and their assigned values,
including redirections, so that perltidy can be tricked into putting
them together.

Author: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org>
Discussion: https://postgr.es/m/87jz8rzf3h.fsf@wibble.ilmari.org
2025-03-17 12:42:23 +09:00
Michael Paquier
5721e5453e Revert "Add redo LSN to pgstats files"
This reverts commit b860848232, which was added as a prerequisite for
the support of pgstats data flush across checkpoints, linking a pgstats
file to a specific checkpoint redo LSN.

As reported, this is proving to be problematic when going through
pg_upgrade, which directly manipulates the control file in the new
cluster.  The LSN stored in the pgstats file is not yet able to cope
with changes done to the control file by pg_upgrade, causing the
pgstats file to be discarded when starting the new cluster after its
redo LSN has been overridden (one such case is `pg_resetwal -l`, where
the new cluster's start LSN is bumped by a hardcoded value of 8
segments, see copy_xact_xlog_xid).

The least painful path going forward is likely a refactor of the
pgstats code so that it is possible to read and write some of its data
with routines in src/common/, allowing pg_upgrade or pg_resetwal to
update that data.  The main point is that we are going to need an
LSN in the stats file should we make it written at checkpoint time and
not only as part of a shutdown sequence.  It is too late to dive into
these details for v18, so let's revert the change, and let's try to
figure out all the details in the next release cycle.  The pgstats file
is currently only written as part of a shutdown sequence, and its
contents are still lost on crash, same as older releases.

Bump PGSTAT_FILE_FORMAT_ID.

Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/2563883.1741826489@sss.pgh.pa.us
2025-03-17 08:35:12 +09:00
Tom Lane
cd3c45125d pg_dump, pg_dumpall, pg_restore: Add --no-policies option.
Add --no-policies option to control row level security policy handling
in dump and restore operations. When this option is used, both CREATE
POLICY commands and ALTER TABLE ... ENABLE ROW LEVEL SECURITY commands
are excluded from dumps and skipped during restores.

This is useful in scenarios where policies need to be redefined in the
target system or when moving data between environments with different
security requirements.

Author: Nikolay Samokhvalov <nik@postgres.ai>
Reviewed-by: Greg Sabino Mullane <htamfids@gmail.com>
Reviewed-by: Jim Jones <jim.jones@uni-muenster.de>
Reviewed-by: newtglobal postgresql_contributors <postgresql_contributors@newtglobalcorp.com>
Discussion: https://postgr.es/m/CAM527d8kG2qPKvbfJ=OYJkT7iRNd623Bk+m-a4ngm+nyHYsHog@mail.gmail.com
2025-03-16 18:08:15 -04:00
Tom Lane
4489044239 contrib/isn: Make weak mode a GUC setting, and fix related functions.
isn's weak mode used to be a simple static variable, settable only
via the isn_weak(boolean) function.  This wasn't optimal, as it
meant the setting didn't respect transactions or respond to RESET ALL.

This patch makes isn.weak a GUC parameter instead, so that
it acts like any other user-settable parameter.

The isn_weak() functions are retained for backwards compatibility.
But we must fix their volatility markings: they were marked IMMUTABLE
which is surely incorrect, and PARALLEL RESTRICTED which isn't right
for GUC-related functions either.  Mark isn_weak(boolean) as
VOLATILE and PARALLEL UNSAFE, matching set_config().  Mark isn_weak()
as STABLE and PARALLEL SAFE, matching current_setting().
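
For illustration, a minimal sketch of the new GUC-style usage (assumes
the isn extension is installed; exact scoping follows normal GUC rules):

    SET isn.weak = on;                    -- transactional, honors RESET
    SELECT current_setting('isn.weak');   -- 'on'
    SELECT isn_weak(false);               -- legacy setter, now VOLATILE, PARALLEL UNSAFE
    SELECT isn_weak();                    -- legacy getter, now STABLE, PARALLEL SAFE
    RESET ALL;                            -- now also resets isn.weak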

Reported-by: Viktor Holmberg <v@viktorh.net>
Diagnosed-by: Daniel Gustafsson <daniel@yesql.se>
Author: Viktor Holmberg <v@viktorh.net>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/790bc1f9-74dc-4b50-94d2-8147315b1556@Spark
2025-03-16 13:45:48 -04:00
Alexander Korotkov
682c5be25c reindexdb: Fix the index-level REINDEX with multiple jobs
47f99a407d introduced a parallel index-level REINDEX.  The code was written
assuming that running run_reindex_command() with 'async == true' can schedule
a number of queries for a connection.  That's not true, and the second query
sent using run_reindex_command() will wait for the completion of the previous
one.

This commit fixes that by putting REINDEX commands for the same table into a
single query.

Also, this commit removes the 'async' argument from run_reindex_command(),
as its only caller always passes 'async == true'.

Reported-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://postgr.es/m/202503071820.j25zn3lo4hvn%40alvherre.pgsql
Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Backpatch-through: 17
2025-03-16 13:29:15 +02:00
Michael Paquier
83e5763d4d pg_createsubscriber: Remove some code bloat in the atexit() callback
This commit adjusts some code added by e117cfb2f6 in the atexit()
callback of pg_createsubscriber.c, in charge of performing post-failure
cleanup actions.  The code loops over all the databases specified, and
it is changed here to rely on a single LogicalRepInfo for each database
rather than always using LogicalRepInfos, simplifying its logic.

Author: Peter Smith <smithpb2250@gmail.com>
Discussion: https://postgr.es/m/CAHut+PtdBSVi4iH7BObDVwDNVwOpn+H3fezOBdSTtENx+rhNMw@mail.gmail.com
2025-03-16 19:20:49 +09:00
Andres Freund
771ba90298 localbuf: Introduce StartLocalBufferIO()
To initiate IO on a shared buffer we have StartBufferIO(). For temporary table
buffers no similar function exists - likely because the code for that
currently is very simple due to the lack of concurrency.

However, the upcoming AIO support will make it possible to re-encounter a
local buffer, while the buffer already is the target of IO. In that case we
need to wait for already in-progress IO to complete. This commit makes it
easier to add the necessary code, by introducing StartLocalBufferIO().

Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/CAAKRu_b9anbWzEs5AAF9WCvcEVmgz-1AkHSQ-CLLy-p7WHzvFw@mail.gmail.com
2025-03-15 22:07:48 -04:00
Andres Freund
4b4d33b9ea localbuf: Introduce FlushLocalBuffer()
Previously we had two code paths for writing out temporary table
buffers. For shared buffers, the logic for that is centralized in
FlushBuffer(). Introduce FlushLocalBuffer() to do the same for local buffers.

Besides being a nice cleanup on its own, it also makes an upcoming change
slightly easier.

Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/CAAKRu_b9anbWzEs5AAF9WCvcEVmgz-1AkHSQ-CLLy-p7WHzvFw@mail.gmail.com
2025-03-15 22:07:48 -04:00
Andres Freund
dd6f2618f6 localbuf: Introduce TerminateLocalBufferIO()
Previously TerminateLocalBufferIO() was open-coded in multiple places, which
doesn't seem like a great idea. While TerminateLocalBufferIO() currently is
rather simple, an upcoming patch requires additional code to be added to
TerminateLocalBufferIO(), making this modification particularly worthwhile.

For some reason FlushRelationBuffers() previously cleared BM_JUST_DIRTIED,
even though that's never set for temporary buffers. This is not carried over
as part of this change.

Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/CAAKRu_b9anbWzEs5AAF9WCvcEVmgz-1AkHSQ-CLLy-p7WHzvFw@mail.gmail.com
2025-03-15 22:07:48 -04:00
Andres Freund
0762a151b0 localbuf: Introduce InvalidateLocalBuffer()
Previously, there were three copies of this code, two of them
identical. There's no good reason for that.

This change is nice on its own, but the main motivation is the AIO patchset,
which needs to add extra checks to the deduplicated code, which is of course
easier if there is only one version.

Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/CAAKRu_b9anbWzEs5AAF9WCvcEVmgz-1AkHSQ-CLLy-p7WHzvFw@mail.gmail.com
2025-03-15 22:07:48 -04:00
Andres Freund
fa6af9b25e localbuf: Fix dangerous coding pattern in GetLocalVictimBuffer()
If PinLocalBuffer() were to modify the buf_state, the buf_state in
GetLocalVictimBuffer() would be out of date. Currently that does not happen,
as PinLocalBuffer() only modifies the buf_state if adjust_usagecount=true and
GetLocalVictimBuffer() passes false.

However, it's easy to make this not the case anymore - it cost me a few hours
to debug the consequences.

The minimal fix would be to just refetch the buf_state after calling
PinLocalBuffer(), but the same danger exists in later parts of the
function. Instead, declare buf_state in the narrower scopes and re-read the
state in conditional branches.  Besides being safer, it also fits well with
an upcoming set of cleanup patches that move the contents of the conditional
branches in GetLocalVictimBuffer() into helper functions.

I "broke" this in 794f259447.

Arguably this should be backpatched, but as the relevant functions are not
exported and there is no actual misbehaviour, I chose to not backpatch, at
least for now.

Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/CAAKRu_b9anbWzEs5AAF9WCvcEVmgz-1AkHSQ-CLLy-p7WHzvFw@mail.gmail.com
2025-03-15 22:07:48 -04:00
Andrew Dunstan
5eabd91a83 Silence perl critic
Commit 27bdec0684 uses a loop variable that is not strictly local to
the loop. Perlcritic disapproves, and there's really no reason for it,
as the variable is not used outside the loop.

Per buildfarm animals koel and crake.
2025-03-15 17:41:54 -04:00
Jeff Davis
27bdec0684 Optimization for lower(), upper(), casefold() functions.
Improve performance and reduce table sizes for case mapping.

The main case mapping table stores only 16-bit offsets, which can be
used to look up the mapped code point in any of the case tables (fold,
lower, upper, or title case). Simple case pairs point to the same
offsets.

Generate a function in generate-unicode_case_table.pl that consists of
nested branches to test for specific codepoint ranges that determine
the offset in the main table.

Other approaches were considered, such as representing these ranges as
another structure (rather than branches in a generated function), or a
different approach such as a radix tree, or perfect hashing. The
author implemented and tested these alternatives and settled on the
generated branches.

Author: Alexander Borisov <lex.borisov@gmail.com>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://postgr.es/m/7cac7e66-9a3b-4e3f-a997-42aa0c401f80%40gmail.com
2025-03-15 13:00:50 -07:00
Melanie Plageman
c3953226a0 Remove table AM callback scan_bitmap_next_block
After pushing the bitmap iterator into table-AM specific code (as part
of making bitmap heap scan use the read stream API in 2b73a8cd33),
scan_bitmap_next_block() no longer returns the current block number.
Since scan_bitmap_next_block() isn't returning any relevant information
to bitmap table scan code, it makes more sense to get rid of it.

Now, bitmap table scan code only calls table_scan_bitmap_next_tuple(),
and the heap AM implementation of scan_bitmap_next_block() is a local
helper in heapam_handler.c.

Reviewed-by: Tomas Vondra <tomas@vondra.me>
Discussion: https://postgr.es/m/flat/CAAKRu_ZwCwWFeL_H3ia26bP2e7HiKLWt0ZmGXPVwPO6uXq0vaA%40mail.gmail.com
2025-03-15 10:37:46 -04:00
Melanie Plageman
2b73a8cd33 BitmapHeapScan uses the read stream API
Make Bitmap Heap Scan use the read stream API instead of invoking
ReadBuffer() for each block indicated by the bitmap.

The read stream API handles prefetching, so remove all of the explicit
prefetching from bitmap heap scan code.

Now, heap table AM implements a read stream callback which uses the
bitmap iterator to return the next required block to the read stream
code.

Tomas Vondra conducted extensive regression testing of this feature.
Andres Freund, Thomas Munro, and I analyzed regressions and Thomas Munro
patched the read stream API.

Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Tomas Vondra <tomas@vondra.me>
Tested-by: Tomas Vondra <tomas@vondra.me>
Tested-by: Andres Freund <andres@anarazel.de>
Tested-by: Thomas Munro <thomas.munro@gmail.com>
Tested-by: Nazir Bilal Yavuz <byavuz81@gmail.com>
Discussion: https://postgr.es/m/flat/CAAKRu_ZwCwWFeL_H3ia26bP2e7HiKLWt0ZmGXPVwPO6uXq0vaA%40mail.gmail.com
2025-03-15 10:34:42 -04:00
Melanie Plageman
944e81bf99 Separate TBM[Shared|Private]Iterator and TBMIterateResult
Remove the TBMIterateResult member from the TBMPrivateIterator and
TBMSharedIterator and make tbm_[shared|private_]iterate() take a
TBMIterateResult as a parameter.

This allows tidbitmap API users to manage multiple TBMIterateResults per
scan. This is required for bitmap heap scan to use the read stream API,
with which there may be multiple I/Os in flight at once, each one with a
TBMIterateResult.

Reviewed-by: Tomas Vondra <tomas@vondra.me>
Discussion: https://postgr.es/m/d4bb26c9-fe07-439e-ac53-c0e244387e01%40vondra.me
2025-03-15 10:11:19 -04:00
Thomas Munro
799959dc7c Simplify distance heuristics in read_stream.c.
Make the distance control heuristics simpler and more aggressive in
preparation for asynchronous I/O.

The v17 version of read_stream.c made a conservative choice to limit the
look-ahead distance when streaming sequential blocks, because it
couldn't benefit very much from looking ahead further yet.  It had a
three-behavior model where only random I/O would rapidly increase the
look-ahead distance, to support read-ahead advice.  Sequential I/O would
move it towards the io_combine_limit setting, just enough to build one
full-sized synchronous I/O at a time, and then expect kernel read-ahead
to avoid I/O stalls.

That already left I/O performance on the table with advice-based I/O
concurrency, since sequential blocks could be followed by random jumps,
eg with the proposed streaming Bitmap Heap Scan patch.

It is time to delete the cautious middle option and adjust the distance
based on recent I/O needs only, since asynchronous reads will need to be
started ahead of time whether random or sequential.  It is still limited
by io_combine_limit, *_io_concurrency, buffer availability and
strategy ring size, as before.

Reviewed-by: Andres Freund <andres@anarazel.de> (earlier version)
Tested-by: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/CA%2BhUKGK_%3D4CVmMHvsHjOVrK6t4F%3DLBpFzsrr3R%2BaJYN8kcTfWg%40mail.gmail.com
2025-03-16 03:05:07 +13:00
Thomas Munro
7ea8cd1566 Improve read_stream.c advice for dense streams.
read_stream.c tries not to issue read-ahead advice when it thinks the
kernel's own read-ahead should be active, ie when using buffered I/O and
reading sequential blocks.  It previously gave up too easily, and issued
advice only for the first read of up to io_combine_limit blocks in a
larger range of sequential blocks after a random jump.  The following read
could suffer an avoidable I/O stall.

Fix, by continuing to issue advice until the corresponding preadv()
calls catch up with the start of the region we're currently issuing
advice for, if ever.  That's when the kernel actually sees the
sequential pattern.  Advice is now disabled only when the stream is
entirely sequential as far as we can see in the look-ahead window, or
in other words, when a sequential region is larger than we can cover
with the current io_concurrency and io_combine_limit settings.

While refactoring the advice control logic, also get rid of the
"suppress_advice" argument that was passed around between functions to
skip useless posix_fadvise() calls immediately followed by preadv().
read_stream_start_pending_read() can figure that out, so let's
concentrate knowledge of advice heuristics in fewer places (our goal
being to make advice-based I/O concurrency a legacy mode soon).

The problem cases were revealed by Tomas Vondra's extensive regression
testing with many different disk access patterns using Melanie
Plageman's streaming Bitmap Heap Scan patch, in a battle against the
venerable always-issue-advice-and-always-one-block-at-a-time code.

Reviewed-by: Andres Freund <andres@anarazel.de> (earlier version)
Reported-by: Melanie Plageman <melanieplageman@gmail.com>
Reported-by: Tomas Vondra <tomas@vondra.me>
Reported-by: Andres Freund <andres@anarazel.de>
Tested-by: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/CA%2BhUKGK_%3D4CVmMHvsHjOVrK6t4F%3DLBpFzsrr3R%2BaJYN8kcTfWg%40mail.gmail.com
Discussion: https://postgr.es/m/CA%2BhUKGJ3HSWciQCz8ekP1Zn7N213RfA4nbuotQawfpq23%2Bw-5Q%40mail.gmail.com
2025-03-15 19:04:54 +13:00
Álvaro Herrera
11bd831860
doc: Explain more thoroughly when a table rewrite is needed
Author: Masahiro Ikeda <ikedamsh@oss.nttdata.com>
Reviewed-by: Robert Treat <rob@xzilla.net>
Discussion: https://postgr.es/m/00e6eb5f5c793b8ef722252c7a519c9a@oss.nttdata.com
2025-03-14 20:44:59 +01:00
Tom Lane
1c9242b2cd Doc: remove obsolete comment.
This para should have been removed by 2f9661311, which made it
both false and irrelevant.  Noted while looking at SQL function
plancache patch.
2025-03-14 14:08:47 -04:00
Fujii Masao
6d376c3b0d Add GUC option to log lock acquisition failures.
This commit introduces a new GUC, log_lock_failure, which controls whether
a detailed log message is produced when a lock acquisition fails. Currently,
it only supports logging lock failures caused by SELECT ... NOWAIT.

The log message includes information about all processes holding or
waiting for the lock that couldn't be acquired, helping users analyze and
diagnose the causes of lock failures.

Currently, this option does not log failures from SELECT ... SKIP LOCKED,
as that could generate excessive log messages if many locks are skipped,
causing unnecessary noise.

This mechanism can be extended in the future to support logging
lock failures from other commands, such as LOCK TABLE ... NOWAIT.
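
A minimal usage sketch (table name hypothetical; setting the GUC may
require appropriate privileges):

    SET log_lock_failure = on;
    BEGIN;
    SELECT * FROM accounts WHERE id = 1 FOR UPDATE NOWAIT;
    -- if the row lock cannot be acquired, the server log now details
    -- the processes holding or waiting for the lock
    ROLLBACK;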

Author: Yuki Seino <seinoyu@oss.nttdata.com>
Co-authored-by: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Jelte Fennema-Nio <postgres@jeltef.nl>
Discussion: https://postgr.es/m/411280a186cc26ef7034e0f2dfe54131@oss.nttdata.com
2025-03-14 23:14:12 +09:00
Fujii Masao
e80171d57c Optimize iteration over PGPROC for fast-path lock searches.
This commit improves efficiency in FastPathTransferRelationLocks()
and GetLockConflicts(), which iterate over PGPROCs to search for
fast-path locks.

Previously, these functions recalculated the fast-path group during
every loop iteration, even though it remained constant. This update
optimizes the process by calculating the group once and reusing it
throughout the loop.

The functions also now skip empty fast-path groups, avoiding
unnecessary scans of their slots. Additionally, groups belonging to
inactive backends (with pid=0) are always empty, so checking
the group is sufficient to bypass these backends, further enhancing
performance.

Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Discussion: https://postgr.es/m/07d5fd6a-71f1-4ce8-8602-4cc6883f4bd1@oss.nttdata.com
2025-03-14 22:49:29 +09:00
Peter Eisentraut
a359d37019 Simplify and generalize PrepareSortSupportFromIndexRel()
PrepareSortSupportFromIndexRel() was accepting btree strategy numbers
purely for the purpose of comparing them later against btree strategies
to determine if the sort direction was forward or reverse.  Change
that.  Instead, pass a bool directly, to indicate the same without an
unfortunate assumption that a strategy number refers specifically to a
btree strategy.  (This is similar in spirit to commits 0d2aa4d493 and
c594f1ad2ba.)

(This could arguably be simplified further by having the callers fill
in ssup_reverse directly.  But this way, it preserves consistency by
having all PrepareSortSupport*() variants be responsible for filling
in ssup_reverse.)

Moreover, remove the hardcoded check against BTREE_AM_OID, and check
against amcanorder instead, which is the actual requirement.

Co-authored-by: Mark Dilger <mark.dilger@enterprisedb.com>
Discussion: https://www.postgresql.org/message-id/flat/E72EAA49-354D-4C2E-8EB9-255197F55330@enterprisedb.com
2025-03-14 10:34:08 +01:00
Álvaro Herrera
1548c3a304
Remove direct handling of reloptions for toast tables
It doesn't actually work, even with allow_system_table_mods turned on:
the ALTER TABLE operation is rejected by ATSimplePermissions(), so even
the error message we're adding in this commit is unreachable.

Add a test case for it.

Author: Nikolay Shaplov <dhyan@nataraj.su>
Discussion: https://postgr.es/m/1913854.tdWV9SEqCh@thinkpad-pgpro
2025-03-14 09:28:51 +01:00
Thomas Munro
92fc6856cb Respect changing pin limits in read_stream.c.
To avoid pinning too much of the buffer pool at once, read_stream.c
previously used LimitAdditionalPins().  The coding was naive, and only
considered the available buffers at stream construction time.

This commit checks before each StartReadBuffers() call with
GetAdditionalPinLimit().  The result might change over time due to pins
acquired outside this stream by the same backend.  No extra CPU cycles
are added to the all-buffered fast-path code, but the I/O-starting path
now considers the up-to-date remaining buffer limit.

In practice it was quite difficult to exceed limits and cause any real
problems in v17, so no back-patch for now, but proposed changes will
make it easier.

Per code review from Andres, in the course of testing his AIO patches.

Reviewed-by: Andres Freund <andres@anarazel.de> (earlier versions)
Discussion: https://postgr.es/m/CA%2BhUKGK_%3D4CVmMHvsHjOVrK6t4F%3DLBpFzsrr3R%2BaJYN8kcTfWg%40mail.gmail.com
2025-03-14 21:21:09 +13:00
Peter Eisentraut
0793ab8100 Activate Python "Limited API" in PL/Python
This allows building PL/Python against any Python 3.x version and
using another Python 3.x version at run time.  This is useful for
installers that want to run against a separately downloaded Python, so
that they don't have to bundle it themselves.

This builds on the earlier patch to only use APIs supported by the
Limited API.

At the moment, this is not activated on MSVC because that leads to
build failures that no one could explain or cared enough to address.
This could be done later.

Reviewed-by: Jakob Egger <jakob@eggerapps.at>
Discussion: https://www.postgresql.org/message-id/flat/ee410de1-1e0b-4770-b125-eeefd4726a24@eisentraut.org
2025-03-14 08:57:02 +01:00
Peter Eisentraut
05cbd6cb22 Swap order of extern/static and pg_nodiscard
When pg_nodiscard was first added, the C standard draft had it as a
function specifier, and so the code comment about placement was
written with that in mind.  The final C23 standard has it as an
attribute and the placement rules are a bit different for that.
Specifically, it needs to be before extern or static.  (Or at least
both current clang and gcc require that.)  So just swap these.  (To be
clear: The current implementation with gcc attributes doesn't care.
This change is just for maximum forward compatibility for non-gcc
compilers.)  This also keeps the order consistent with the previously
introduced pg_noreturn.  Also update the code comment to reflect the
mentioned developments since its introduction.

Reviewed-by: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://www.postgresql.org/message-id/flat/pxr5b3z7jmkpenssra5zroxi7qzzp6eswuggokw64axmdixpnk@zbwxuq7gbbcw
2025-03-14 07:18:07 +01:00
Thomas Munro
01261fb078 Improve buffer manager API for backend pin limits.
Previously the support functions assumed that the caller needed one pin
to make progress, and could optionally use some more, allowing enough
for every connection to do the same.  Add a couple more functions for
callers that want to know:

* what the maximum possible number could be, irrespective of currently
  held pins, for space planning purposes

* how many additional pins they could acquire right now, without the
  special case allowing one pin, for callers that already hold pins and
  could already make progress even if no extra pins are available

The pin limit logic began in commit 31966b15.  This refactoring is
better suited to read_stream.c, which will be adjusted to respect the
remaining limit as it changes over time in a follow-up commit.  It also
computes MaxProportionalPins up front, to avoid performing divisions
whenever a caller needs to check the balance.

Reviewed-by: Andres Freund <andres@anarazel.de> (earlier versions)
Discussion: https://postgr.es/m/CA%2BhUKGK_%3D4CVmMHvsHjOVrK6t4F%3DLBpFzsrr3R%2BaJYN8kcTfWg%40mail.gmail.com
2025-03-14 17:13:09 +13:00
Amit Kapila
7c99dc587a Fix ALTER SUBSCRIPTION ... SET PUBLICATION ... command.
The problem is that ALTER SUBSCRIPTION ... SET PUBLICATION ... will lead
to restarting of apply worker and after the restart, the apply worker will
use the existing slot and replication origin corresponding to the
subscription. Now, it is possible that before the restart, the origin has
not been updated, and the WAL start location points to a position before
the publication specified by SET PUBLICATION was created, so that
publication doesn't exist yet.  That can lead to an error like:
"ERROR:  publication "pub1" does not exist".
Once this error occurs, apply worker will never be able to proceed and
will always return the same error.

We decided to skip loading the publication if the publication does not
exist. The publication is loaded later and updates the relation entry when
the publication gets created.
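
The failure shape, sketched (subscription and publication names
hypothetical):

    ALTER SUBSCRIPTION sub1 SET PUBLICATION pub_new;
    -- before: the restarted apply worker could fail repeatedly with
    --   ERROR:  publication "pub_new" does not exist
    -- after: the missing publication is skipped and loaded once created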

We decided not to backpatch this as this is a behaviour change, and we don't
see field reports. This problem has been found by intermittent buildfarm
failures.

Author: vignesh C <vignesh21@gmail.com>
Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/flat/CALDaNm0-n8FGAorM%2BbTxkzn%2BAOUyx5%3DL_XmnvOP6T24%2B-NcBKg%40mail.gmail.com
Discussion: https://postgr.es/m/CAA4eK1+T-ETXeRM4DHWzGxBpKafLCp__5bPA_QZfFQp7-0wj4Q@mail.gmail.com
2025-03-14 08:57:40 +05:30
Tom Lane
4618045bee Fix ARRAY_SUBLINK and ARRAY[] for int2vector and oidvector input.
If the given input_type yields valid results from both
get_element_type and get_array_type, initArrayResultAny believed the
former and treated the input as an array type.  However this is
inconsistent with what get_promoted_array_type does, leading to
situations where the output of an ARRAY() subquery is labeled with
the wrong type: it's labeled as oidvector[] but is really a 2-D
array of OID.  That at least results in strange output, and can
result in crashes if further processing such as unnest() is applied.
AFAIK this is only possible with the int2vector and oidvector
types, which are special-cased to be treated mostly as true arrays
even though they aren't quite.

Fix by switching the logic to match get_promoted_array_type by
testing get_array_type not get_element_type, and remove an Assert
thereby made pointless.  (We need not introduce a symmetrical
check for get_element_type in the other if-branch, because
initArrayResultArr will check it.)  This restores the behavior
that existed before bac27394a introduced initArrayResultAny:
the output really is int2vector[] or oidvector[].

Comparable confusion exists when an input of an ARRAY[] construct
is int2vector or oidvector: transformArrayExpr decides it's dealing
with a multidimensional array constructor, and we end up with
something that's a multidimensional OID array but is alleged to be
of type oidvector.  I have not found a crashing case here, but it's
easy to demonstrate totally-wrong results.  Adjust that code so
that what you get is an oidvector[] instead, for consistency with
ARRAY() subqueries.  (This change also makes these types work like
domains-over-arrays in this context, which seems correct.)
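
A quick sketch of the corrected labeling (behavior as described above):

    SELECT pg_typeof(ARRAY(SELECT '1 2'::int2vector));  -- int2vector[] again
    SELECT pg_typeof(ARRAY['1 2'::oidvector]);          -- now oidvector[]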

Bug: #18840
Reported-by: yang lei <ylshiyu@126.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/18840-fbc9505f066e50d6@postgresql.org
Backpatch-through: 13
2025-03-13 16:07:55 -04:00
Álvaro Herrera
c7fc8808a9
ATExecSetRelOptions: Reduce scope of 'isnull' variable
Author: Nikolay Shaplov <dhyan@nataraj.su>
Reviewed-by: Timur Magomedov <t.magomedov@postgrespro.ru>
Discussion: https://postgr.es/m/1913854.tdWV9SEqCh@thinkpad-pgpro
2025-03-13 18:15:59 +01:00
Álvaro Herrera
da0f0582e8
Make lwlocknames.h generated file less ugly
We can make the output look a bit better by aligning each lock's
definition, so add some padding space to achieve that.  This change
makes no practical difference, but casual onlookers will be less
distracted by (lack of) whitespace.

Author: Gurjeet Singh <gurjeet@singh.im>
Discussion: https://postgr.es/m/CABwTF4VxfwDtRV-H22_XK4XeDogaV-Vaobu+af5U=8ZAZn9ZZQ@mail.gmail.com
2025-03-13 17:38:21 +01:00
Nathan Bossart
0697b23906 Add reverse(bytea).
This commit introduces a function for reversing the order of the
bytes in binary strings.

Bumps catversion.
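
A quick example of the new function:

    SELECT reverse('\xdeadbeef'::bytea);  -- \xefbeadde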

Author: Aleksander Alekseev <aleksander@timescale.com>
Discussion: https://postgr.es/m/CAJ7c6TMe0QVRuNssUArbMi0bJJK32%2BzNA3at5m3osrBQ25MHuw%40mail.gmail.com
2025-03-13 11:20:53 -05:00
Peter Eisentraut
bb25276205 Fix copy-and-paste mistake in error message
Introduced in commit a68159ff2b.
2025-03-13 15:17:08 +01:00
Peter Eisentraut
3691edfab9 pg_noreturn to replace pg_attribute_noreturn()
We want to support a "noreturn" decoration on more compilers besides
just GCC-compatible ones, but for that we need to move the decoration
in front of the function declaration instead of either behind it or
wherever, which is the current style afforded by GCC-style attributes.
Also rename the macro to "pg_noreturn" to be similar to the C11
standard "noreturn".

pg_noreturn is now supported on all compilers that support C11 (using
_Noreturn), as well as GCC-compatible ones (using __attribute__, as
before), as well as MSVC (using __declspec).  (When PostgreSQL
requires C11, the latter two variants can be dropped.)

Now, all supported compilers effectively support pg_noreturn, so the
extra code for !HAVE_PG_ATTRIBUTE_NORETURN can be dropped.

This also fixes a possible problem if third-party code includes
stdnoreturn.h, because then the current definition of

    #define pg_attribute_noreturn() __attribute__((noreturn))

would cause an error.

Note that the C standard does not support a noreturn attribute on
function pointer types.  So we have to drop these here.  There are
only two instances at this time, so it's not a big loss.  In one case,
we can make up for it by adding the pg_noreturn to a wrapper function
and adding a pg_unreachable(), in the other case, the latter was
already done before.

Reviewed-by: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://www.postgresql.org/message-id/flat/pxr5b3z7jmkpenssra5zroxi7qzzp6eswuggokw64axmdixpnk@zbwxuq7gbbcw
2025-03-13 12:37:26 +01:00
Richard Guo
cc5d98525d Fix incorrect handling of subquery pullup
When pulling up a subquery, if the subquery's target list items are
used in grouping set columns, we need to wrap them in PlaceHolderVars.
This ensures that expressions retain their separate identity so that
they will match grouping set columns when appropriate.

In 90947674f, we decided to wrap subquery outputs that are non-var
expressions in PlaceHolderVars.  This prevents const-simplification
from merging them into the surrounding expressions after subquery
pullup, which could otherwise lead to failing to match those
subexpressions to grouping set columns, with the effect that they'd
not go to null when expected.

However, that left some loose ends.  If the subquery's target list
contains two or more identical Var expressions, we can still fail to
match the Var expression to the expected grouping set expression.
This is not related to const-simplification, but rather to how we
match expressions to lower target items in setrefs.c.

For sort/group expressions, we use ressortgroupref matching, which
works well.  For other expressions, we primarily rely on comparing the
expressions to determine if they are the same.  Therefore, we need a
way to prevent setrefs.c from matching the expression to some other
identical ones.

To fix, wrap all subquery outputs in PlaceHolderVars if the parent
query uses grouping sets, ensuring that they preserve their separate
identity throughout the whole planning process.

Reported-by: Dean Rasheed <dean.a.rasheed@gmail.com>
Author: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com>
Discussion: https://postgr.es/m/CAMbWs4-meSahaanKskpBn0KKxdHAXC1_EJCVWHxEodqirrGJnw@mail.gmail.com
2025-03-13 16:36:03 +09:00
Richard Guo
4c49611715 Remove code setting wrap_non_vars to true for UNION ALL subqueries
In pull_up_simple_subquery and pull_up_constant_function, there is
code that sets wrap_non_vars to true when dealing with an appendrel
member.  The goal is to wrap subquery outputs that are not simple Vars
in PlaceHolderVars, ensuring that what we pull up doesn't get merged
into a surrounding expression during later processing, which could
cause it to fail to match the expression actually available from the
appendrel.

However, this is unnecessary.  When pulling up an appendrel child
subquery, the only part of the upper query that could reference the
appendrel child yet is the translated_vars list of the associated
AppendRelInfo that we just made for this child.  Furthermore, we do
not want to force use of PHVs in the AppendRelInfo, as there is no
outer join in between.  In fact, perform_pullup_replace_vars always sets
wrap_non_vars to false before performing pullup_replace_vars on the
AppendRelInfo.

This patch simply removes the code that sets wrap_non_vars to true for
UNION ALL subqueries.

Author: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com>
Discussion: https://postgr.es/m/CAMbWs4-VXDEi1v+hZYLxpOv0riJxHsCkCH1f46tLnhonEAyGCQ@mail.gmail.com
2025-03-13 16:34:28 +09:00
Jeff Davis
d3b2e5e1ab Refactor convert_case() to prepare for optimizations.
Upcoming optimizations will add complexity to convert_case(). This
patch reorganizes slightly so that the complexity can be contained
within the logic to convert the case of a single character, rather
than mixing it in with logic to iterate through the string.

Reviewed-by: Alexander Borisov <lex.borisov@gmail.com>
Discussion: https://postgr.es/m/44005c3d-88f4-4a26-981f-fd82dfa8e313@gmail.com
2025-03-12 21:51:52 -07:00
Amit Kapila
3abe9dc188 Avoid invalidating all RelationSyncCache entries on publication rename.
On Publication rename, we need to only invalidate the RelationSyncCache
entries corresponding to relations that are part of the publication being
renamed.

As part of this patch, we introduce a new invalidation message to
invalidate the cache maintained by the logical decoding output plugin. We
can't use existing relcache invalidation for this purpose, as that would
unnecessarily cause relcache invalidations in other backends.

This will improve performance by building fewer relation cache entries
during logical replication.
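
Sketch of the affected operation (publication name hypothetical):

    ALTER PUBLICATION pub_orders RENAME TO pub_sales;
    -- only RelationSyncCache entries for relations in the renamed
    -- publication are invalidated, not the whole cache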

Author: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Author: Shlok Kyal <shlok.kyal.oss@gmail.com>
Reviewed-by: Hou Zhijie <houzj.fnst@fujitsu.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/OSCPR01MB14966C09AA201EFFA706576A7F5C92@OSCPR01MB14966.jpnprd01.prod.outlook.com
2025-03-13 09:16:33 +05:30
Thomas Munro
75da2bece6 Fix read_stream.c for changing io_combine_limit.
In a couple of places, read_stream.c assumed that io_combine_limit would
be stable during the lifetime of a stream.  That is not true in at least
one unusual case: streams held by CURSORs where you could change the GUC
between FETCH commands, with unpredictable results.

Fix, by storing stream->io_combine_limit and referring only to that
after construction.  This mirrors the treatment of the other important
setting {effective,maintenance}_io_concurrency, which is stored in
stream->max_ios.
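
The unusual case described above, sketched (cursor and table names
hypothetical):

    BEGIN;
    DECLARE c CURSOR FOR SELECT * FROM big_table;
    FETCH 1000 FROM c;
    SET io_combine_limit = 32;  -- changed between FETCH commands...
    FETCH 1000 FROM c;          -- ...the stream keeps its construction-time value
    COMMIT;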

One of the cases was the queue overflow space, which was sized for
io_combine_limit and could be overrun if the GUC was increased.  Since
that coding was a little hard to follow, also introduce a variable for
better readability instead of open-coding the arithmetic.  Doing so
revealed an off-by-one thinko while clamping max_pinned_buffers to
INT16_MAX, though that wasn't a live bug due to the current limits on
GUC values.

Back-patch to 17.

Discussion: https://postgr.es/m/CA%2BhUKG%2B2T9p-%2BzM6Eeou-RAJjTML6eit1qn26f9twznX59qtCA%40mail.gmail.com
2025-03-13 15:43:34 +13:00
Amit Langote
d4f79865d4 Fix copy-paste error in datum_to_jsonb_internal()
Commit 3c152a27b0 mistakenly repeated JSONTYPE_JSON in a condition,
omitting JSONTYPE_CAST. As a result, datum_to_jsonb_internal() failed
to reject inputs that were casts (e.g., from an enum to json as in the
example below) when used as keys in JSON constructors.

This led to a crash in cases like:

  SELECT JSON_OBJECT('happy'::mood: '123'::jsonb);

where 'happy'::mood is implicitly cast to json. The missing check
meant such cast values weren't properly rejected as invalid
(non-scalar) JSON keys.

Reported-by: Maciek Sakrejda <maciek@pganalyze.com>
Reviewed-by: Tender Wang <tndrwang@gmail.com>
Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Maciek Sakrejda <maciek@pganalyze.com>
Discussion: https://postgr.es/m/CADXhmgTJtJZK9A3Na_ry+Xrq-ghjcejBRhcRMzWZvbd__QdgJA@mail.gmail.com
Backpatch-through: 17
2025-03-13 09:56:36 +09:00
Masahiko Sawada
4ecdd4110d pg_rewind: Add dbname to primary_conninfo when using --write-recovery-conf.
This commit enhances pg_rewind's --write-recovery-conf option to
include the dbname in the generated primary_conninfo value when
specified in the --source-server option. With this modification, the
rewound server can connect to the primary server without manual
configuration file modifications when sync_replication_slots is
enabled.

Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Peter Smith <smithpb2250@gmail.com>
Discussion: https://postgr.es/m/CAD21AoAkW=Ht0k9dVoBTCcqLiiZ2MXhVr+d=j2T_EZMerGrLWQ@mail.gmail.com
2025-03-12 16:56:04 -07:00
David Rowley
cdc1471cc7 Add b955df443 to .git-blame-ignore-revs 2025-03-13 12:44:26 +13:00
David Rowley
b955df4434 Fix indentation issue
Introduced recently by 9e088f7dd

Per buildfarm member koel
2025-03-13 12:41:44 +13:00
Masahiko Sawada
9e088f7dd8 Fix compiler warning in pg_logicalinspect.
Oversight in bd65cb3cd4.

Reported-by: David Rowley <dgrowleyml@gmail.com>
Reported-by: Nathan Bossart <nathandbossart@gmail.com>
Author: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/CAApHDvqrhFfnetbcwgGkJ=z63T8HfQ_OyP=vX8BYiXyxFKt67w@mail.gmail.com
2025-03-12 14:23:56 -07:00
Heikki Linnakangas
ac4494646d Rename alloc/free functions in reorderbuffer.c
There used to be bespoke pools for these structs to reduce the
palloc/pfree overhead, but that was ripped out a long time ago and
replaced with the generic, cheaper generational memory allocator
(commit a4ccc1cef5). The Get/Return terminology made sense with the
pools, as you "got" an object from the pool and "returned" it later,
but now it just looks weird. Rename to Alloc/Free.

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/c9e43d2d-8e83-444f-b111-430377368989@iki.fi
2025-03-12 22:03:39 +02:00
Nathan Bossart
025e7e1eb4 Remove count_one_bits() in acl.c.
The only caller, select_best_grantor(), can instead use
pg_popcount64().  This isn't performance-critical code, but we
might as well use the centralized implementation.  While at it, add
some test coverage for this part of select_best_grantor().

Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://postgr.es/m/Z9GtL7Nm6hsYyJnF%40nathan
2025-03-12 15:01:52 -05:00
Melanie Plageman
ff79b5b2ab Increase default effective_io_concurrency to 16
The default effective_io_concurrency has been 1 since it was introduced
in b7b8f0b609. Referencing the associated discussion [1], it
seems 1 was chosen as a conservative value that seemed unlikely to cause
regressions.

Experimentation on high latency cloud storage as well as fast, local
nvme storage (see Discussion link) shows that even slightly higher
values improve query timings substantially. 1 actually performs worse
than 0 [2]. With effective_io_concurrency 1, we are not prefetching
enough to avoid I/O stalls, but we are issuing extra syscalls.

The new default is 16, which should be more appropriate for common
hardware while still avoiding flooding low IOPs devices with I/O
requests.
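
A quick sketch of inspecting and tuning the setting:

    SHOW effective_io_concurrency;                   -- now 16 by default
    ALTER SYSTEM SET effective_io_concurrency = 32;  -- e.g. for fast NVMe
    SELECT pg_reload_conf();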

[1] https://www.postgresql.org/message-id/flat/FDDBA24E-FF4D-4654-BA75-692B3BA71B97%40enterprisedb.com
[2] https://www.postgresql.org/message-id/CAAKRu_Zv08Cic%3DqdCfzrQabpEXGrd9Z9UOW5svEVkCM6%3DFXA9g%40mail.gmail.com

Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CAAKRu_Z%2BJa-mwXebOoOERMMUMvJeRhzTjad4dSThxG0JLXESxw%40mail.gmail.com
2025-03-12 15:57:44 -04:00
Heikki Linnakangas
af717317a0 Handle interrupts while waiting on Append's async subplans
We did not wake up on interrupts while waiting on async events on an
async-capable append node. For example, if you tried to cancel the
query, nothing would happen until one of the async subplans becomes
readable. To fix, add WL_LATCH_SET to the WaitEventSet.

Backpatch down to v14 where async Append execution was introduced.

Discussion: https://www.postgresql.org/message-id/37a40570-f558-40d3-b5ea-5c2079b3b30b@iki.fi
2025-03-12 20:53:09 +02:00
Tom Lane
f4e7756ef9 Build whole-row Vars the same way during parsing and planning.
makeWholeRowVar() has different rules for constructing a
whole-row Var depending on the kind of RTE it's representing.
This turns out to be problematic because the rewriter and planner
can convert view RTEs and set-returning-function RTEs into
subquery RTEs; so a whole-row Var made during planning might
look different from one made by the parser.  In isolation this
doesn't cause any problem, but if a query contains Vars made
both ways for the same varno, there are cross-checks in the
executor that will complain.  This manifests for UPDATE, DELETE,
and MERGE queries that use whole-row table references.

To fix, we need makeWholeRowVar() to produce the same result
from an inlined RTE as it would have for the original.  For
an inlined view, we can use RangeTblEntry.relid to detect
that this had been a view RTE.  For inlined SRFs, make a
data structure definition change akin to commit 47bb9db75,
and say that we won't clear RangeTblEntry.functions until
the end of planning.  That allows makeWholeRowVar() to
repeat what it would have done with the unmodified RTE.

Reported-by: Duncan Sands <duncan.sands@deepbluecap.com>
Reported-by: Dean Rasheed <dean.a.rasheed@gmail.com>
Diagnosed-by: Tender Wang <tndrwang@gmail.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com>
Discussion: https://postgr.es/m/3518c50a-ab18-482f-b916-a37263622501@deepbluecap.com
Backpatch-through: 13
2025-03-12 11:47:38 -04:00
Melanie Plageman
18cd15e706 Add connection establishment duration logging
Add log_connections option 'setup_durations' which logs durations of
several key parts of connection establishment and backend setup.

For an incoming connection, starting from when the postmaster gets a
socket from accept() and ending when the forked child backend is first
ready for query, there are multiple steps that could each take longer
than expected due to external factors. This logging provides visibility
into authentication and fork duration as well as the end-to-end
connection establishment and backend initialization time.

To make this portable, the timings captured in the postmaster (socket
creation time, fork initiation time) are passed through the
BackendStartupData.
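
A minimal enabling sketch (assumes superuser; the logged line format is
illustrative only):

    ALTER SYSTEM SET log_connections = 'setup_durations';
    SELECT pg_reload_conf();
    -- subsequent connections log fork, authentication, and total
    -- establishment/setup durations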

Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Jacob Champion <jacob.champion@enterprisedb.com>
Reviewed-by: Jelte Fennema-Nio <postgres@jeltef.nl>
Reviewed-by: Guillaume Lelarge <guillaume.lelarge@dalibo.com>
Discussion: https://postgr.es/m/flat/CAAKRu_b_smAHK0ZjrnL5GRxnAVWujEXQWpLXYzGbmpcZd3nLYw%40mail.gmail.com
2025-03-12 11:35:27 -04:00
Melanie Plageman
9219093cab Modularize log_connections output
Convert the boolean log_connections GUC into a list GUC comprised of the
connection aspects to log.

This gives users more control over the volume and kind of connection
logging.

The current log_connections options are 'receipt', 'authentication', and
'authorization'. The empty string disables all connection logging. 'all'
enables all available connection logging.

For backwards compatibility, the most common values for the
log_connections boolean are still supported (on, off, 1, 0, true, false,
yes, no). Note that previously supported substrings of on, off, true,
false, yes, and no are no longer supported.
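
A sketch of the new list syntax (values as named above):

    ALTER SYSTEM SET log_connections = 'receipt,authentication,authorization';
    ALTER SYSTEM SET log_connections = 'all';  -- everything
    ALTER SYSTEM SET log_connections = '';     -- disable connection logging
    SELECT pg_reload_conf();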

Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://postgr.es/m/flat/CAAKRu_b_smAHK0ZjrnL5GRxnAVWujEXQWpLXYzGbmpcZd3nLYw%40mail.gmail.com
2025-03-12 11:35:21 -04:00
Michael Paquier
f554a95379 Remove initialization from PendingBackendStats
9a8dd2c5a6 has added an initialization to PendingBackendStats, which
has been causing compilation warnings in the buildfarm.  This code does
not strictly require it as PendingBackendStats is always initialized
with memset(0), so let's remove it.

Per report from multiple buildfarm members, like ayu and batfish, via
Tom Lane.

Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://postgr.es/m/1870853.1741749264@sss.pgh.pa.us
2025-03-12 20:37:43 +09:00
Peter Eisentraut
72a3d0462b Prepare for Python "Limited API" in PL/Python
Using the Python Limited API would allow building PL/Python against
any Python 3.x version and using another Python 3.x version at run
time.  This commit does not activate that, but it prepares the code to
only use APIs supported by the Limited API.

Implementation details:

- Convert static types to heap types
  (https://docs.python.org/3/howto/isolating-extensions.html#heap-types).

- Replace PyRun_String() with component functions.

- Replace PyList_SET_ITEM() with PyList_SetItem().

This was previously committed as c47e8df815 and then reverted because
it wasn't working under Python older than 3.8.  That has been fixed in
this version.  There was a Python API change/bugfix between 3.7 and
3.8 that directly affects this patch.  The relevant commit is
<https://github.com/python/cpython/commit/364f0b0f19c>.  The
workarounds described there have been applied in this patch, and it
has been confirmed to work with Python 3.6 and 3.7.

Reviewed-by: Jakob Egger <jakob@eggerapps.at>
Discussion: https://www.postgresql.org/message-id/flat/ee410de1-1e0b-4770-b125-eeefd4726a24@eisentraut.org
2025-03-12 08:53:54 +01:00
Tom Lane
c872516d8f Doc: silence A4 PDF build warnings.
Commit 0fbceae84 put a "&zwsp;" in almost but not quite the correct
place to avoid "The contents of fo:block line 1 exceed the available
area" warnings.  Per buildfarm.
2025-03-11 23:35:39 -04:00
Heikki Linnakangas
043745c3a0 Improve snapmgr.c comment
Add more details on the different kinds of snapshots, how to use them,
and how the active snapshot stack works.

Discussion: https://www.postgresql.org/message-id/7c56f180-b9e1-481e-8c1d-efa63de3ecbb@iki.fi
2025-03-11 23:28:38 +02:00
Heikki Linnakangas
8076c00592 Assert that a snapshot is active or registered before it's used
The comment in GetTransactionSnapshot() said that you "should call
RegisterSnapshot or PushActiveSnapshot on the returned snap if it is
to be used very long". That felt too unclear to me. Make the comment
more strongly worded.

To enforce that rule and to catch potential bugs where a snapshot
might get invalidated while it's still in use, add an assertion to
HeapTupleSatisfiesMVCC() to check that the snapshot is registered or
pushed to active stack. No new bugs were found by this, but it seems
like good future-proofing. It's not a great place for the check;
HeapTupleSatisfiesMVCC() is in fact safe to call with an unregistered
snapshot, and the assertion won't catch other unsafe uses. But it goes
a long way in practice.

Fix a few cases that were playing fast and loose with that and just
assumed that the snapshot cannot be invalidated during a scan. Those
assumptions were not wrong, but they're not performance critical, so
let's drop the excuses and just register the snapshot. These were
false positives found by the new assertion.

Discussion: https://www.postgresql.org/message-id/7c56f180-b9e1-481e-8c1d-efa63de3ecbb@iki.fi
2025-03-11 23:20:34 +02:00
Masahiko Sawada
bd65cb3cd4 pg_logicalinspect: Fix possible crash when passing a directory path.
Previously, pg_logicalinspect functions were too trusting of their
input and blindly passed it to SnapBuildRestoreSnapshot(). If the
input pointed to a directory, the server could raise a PANIC error
while attempting to fsync_fname() with isdir=false on a directory.

This commit adds validation checks for input filenames and passes the
LSN extracted from the filename to SnapBuildRestoreSnapshot() instead
of the filename itself. It also adds regression tests for various
input patterns and permission checks.
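
A hedged sketch (function name per the pg_logicalinspect extension;
snapshot file name hypothetical):

    SELECT * FROM pg_get_logical_snapshot_meta('0-40796E18.snap');
    -- passing a directory path is now rejected with an error rather
    -- than risking a PANIC in fsync_fname()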

Bug: #18828
Reported-by: Robins Tharakan <tharakan@gmail.com>
Co-authored-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Co-authored-by: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/18828-0f4701c635064211@postgresql.org
2025-03-11 09:56:40 -07:00
Masahiko Sawada
a49927f04c pg_logicalinspect: Stabilize isolation tests.
The previous isolation tests did not account for the possibility that
the background writer or the checkpointer could write a RUNNING_XACTS
record, which could cause logical decoding to produce more logical
snapshots than expected.

This commit modifies the isolation tests to verify that at least one
logical snapshot contains the expected number of committed or ongoing
catalog-change transactions.

Per buildfarm member skink.

Reported-by: Andres Freund <andres@anarazel.de>
Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/5qbxud4pvnvmtuoi7weiizm5hmumxaeohx4vztfhrwlfhyz6rj@buh4435mllwo
2025-03-11 09:30:00 -07:00
Tom Lane
8b1b342544 Improve EXPLAIN's display of window functions.
Up to now we just punted on showing the window definitions used
in a plan, with window function calls represented as "OVER (?)".
To improve that, show the window definition implemented by each
WindowAgg plan node, and reference their window names in OVER.
For nameless window clauses generated by "OVER (...)", assign
unique names w1, w2, etc.
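
A sketch of the improved output (table and column names hypothetical):

    EXPLAIN (COSTS OFF)
    SELECT sum(x) OVER w, rank() OVER ()
    FROM t WINDOW w AS (PARTITION BY y ORDER BY x);
    -- WindowAgg nodes now show the definition, e.g.
    --   Window: w AS (PARTITION BY y ORDER BY x)
    -- and the nameless OVER () gets a generated name such as w1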

In passing, re-order the properties shown for a WindowAgg node
so that the Run Condition (if any) appears after the Window
property and before the Filter (if any).  This seems more
sensible since the Run Condition is associated with the Window
and acts before the Filter.

Thanks to David G. Johnston and Álvaro Herrera for design
suggestions.

Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/144530.1741469955@sss.pgh.pa.us
2025-03-11 11:19:54 -04:00
Peter Geoghegan
426ea61117 nbtree: Make BTMaxItemSize into object-like macro.
Make nbtree's "1/3 of a page limit" BTMaxItemSize function-like macro
(which accepts a "page" argument) into an object-like macro that can be
used from code that doesn't have convenient access to an nbtree page.

Preparation for an upcoming patch that adds skip scan to nbtree.
Parallel index scans that use skip scan will serialize datums (not just
SAOP array subscripts) when scheduling primitive scans.  BTMaxItemSize
will be used by btestimateparallelscan to determine how much DSM to
request.

Author: Peter Geoghegan <pg@bowt.ie>
Discussion: https://postgr.es/m/CAH2-Wz=H_RG5weNGeUG_TkK87tRBnH9mGCQj6WpM4V4FNWKv2g@mail.gmail.com
2025-03-11 10:35:56 -04:00
Peter Geoghegan
0fbceae841 Show index search count in EXPLAIN ANALYZE, take 2.
Expose the count of index searches/index descents in EXPLAIN ANALYZE's
output for index scan/index-only scan/bitmap index scan nodes.  This
information is particularly useful with scans that use ScalarArrayOp
quals, where the number of index searches can be unpredictable due to
implementation details that interact with physical index characteristics
(at least with nbtree SAOP scans, since Postgres 17 commit 5bf748b8).
The information shown also provides useful context when EXPLAIN ANALYZE
runs a plan with an index scan node that successfully applied the skip
scan optimization (set to be added to nbtree by an upcoming patch).

The instrumentation works by teaching all index AMs to increment a new
nsearches counter whenever a new index search begins.  The counter is
incremented at exactly the same point that index AMs already increment
the pg_stat_*_indexes.idx_scan counter (we're counting the same event,
but at the scan level rather than the relation level).  Parallel queries
have workers copy their local counter struct into shared memory when an
index scan node ends -- even when it isn't a parallel aware scan node.
An earlier version of this patch that only worked with parallel aware
scans became commit 5ead85fb (though that was quickly reverted by commit
d00107cd following "debug_parallel_query=regress" buildfarm failures).

Our approach doesn't match the approach used when tracking other index
scan related costs (e.g., "Rows Removed by Filter:").  It is comparable
to the approach used in similar cases involving costs that are only
readily accessible inside an access method, not from the executor proper
(e.g., "Heap Blocks:" output for a Bitmap Heap Scan, which was recently
enhanced to show per-worker costs by commit 5a1e6df3, using essentially
the same scheme as the one used here).  It is necessary for index AMs to
have direct responsibility for maintaining the new counter, since the
counter might need to be incremented multiple times per amgettuple call
(or per amgetbitmap call).  But it is also necessary for the executor
proper to manage the shared memory now used to transfer each worker's
counter struct to the leader.

Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Robert Haas <robertmhaas@gmail.com>
Reviewed-By: Tomas Vondra <tomas@vondra.me>
Reviewed-By: Masahiro Ikeda <ikedamsh@oss.nttdata.com>
Reviewed-By: Matthias van de Meent <boekewurm+postgres@gmail.com>
Discussion: https://postgr.es/m/CAH2-WzkRqvaqR2CTNqTZP0z6FuL4-3ED6eQB0yx38XBNj1v-4Q@mail.gmail.com
Discussion: https://postgr.es/m/CAH2-Wz=PKR6rB7qbx+Vnd7eqeB5VTcrW=iJvAsTsKbdG+kW_UA@mail.gmail.com
2025-03-11 09:20:50 -04:00
Peter Eisentraut
12c5f797ea Update nls.mk for newly added file
Commit f18231e817 moved some code to a new file, but the new file
wasn't added to nls.mk.
2025-03-11 13:48:14 +01:00
Álvaro Herrera
17ce344f86
BRIN: be more strict about required support procs
With improperly defined operator classes, it's possible to get a
Postgres crash because we'd try to invoke a procedure that doesn't
exist.  This is because the code is being a bit too trusting that the
opclass is correctly defined.  Add some ereport(ERROR)s for cases where
mandatory support procedures are not defined, transforming the crashes
into errors.

The particular case that was reported is an incomplete opclass in
PostGIS.

Backpatch all the way down to 13.

Reported-by: Tobias Wendorff <tobias.wendorff@tu-dortmund.de>
Diagnosed-by: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Tomas Vondra <tomas@vondra.me>
Discussion: https://postgr.es/m/fb6d9a35-6c8e-4869-af80-0a4944a793a4@tu-dortmund.de
2025-03-11 12:50:35 +01:00
Daniel Gustafsson
d35d32d711 Add special case fast-paths for strict functions
Many STRICT function calls will have one or two arguments, in which
case we can speed up checking for NULL input by avoiding setting up
a loop over the arguments. This adds EEOP_FUNCEXPR_STRICT_1 and the
corresponding EEOP_FUNCEXPR_STRICT_2 for functions with one and two
arguments respectively.

Author: Andres Freund <andres@anarazel.de>
Co-authored-by: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Andreas Karlsson <andreas@proxel.se>
Discussion: https://postgr.es/m/415721CE-7D2E-4B74-B5D9-1950083BA03E@yesql.se
Discussion: https://postgr.es/m/20191023163849.sosqbfs5yenocez3@alap3.anarazel.de
2025-03-11 12:02:42 +01:00
Daniel Gustafsson
8dd7c7cd0a Replace EEOP_DONE with special steps for return/no return
Knowing when the side-effects of an expression are the intended result
of the execution, rather than the return value, is important for being
able to generate more efficient JITed code. This replaces EEOP_DONE with
two new steps: EEOP_DONE_RETURN and EEOP_DONE_NO_RETURN.  Expressions
which return a value should use the former step; expressions used for
their side-effects which don't return a value should use the latter.

Author: Andres Freund <andres@anarazel.de>
Co-authored-by: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Andreas Karlsson <andreas@proxel.se>
Discussion: https://postgr.es/m/415721CE-7D2E-4B74-B5D9-1950083BA03E@yesql.se
Discussion: https://postgr.es/m/20191023163849.sosqbfs5yenocez3@alap3.anarazel.de
2025-03-11 12:02:38 +01:00
Peter Eisentraut
dabccf4513 Move RemoveInheritedConstraint() call slightly earlier
This change is harmless and does not affect the existing intended
operation.  It is necessary for a subsequent patch (NOT ENFORCED
foreign keys), where we may need to change the child
constraint to enforced.  In this case, we would create the necessary
triggers and queue the constraint for validation, so it is important
to remove any unnecessary constraints before proceeding.

This is a small change that could have been included in the previous
"split tryAttachPartitionForeignKey" refactoring patch (commit
1d26c2d2c4), but was kept separate to highlight the changes.

Author: Amul Sul <amul.sul@enterprisedb.com>
Reviewed-by: Alexandra Wang <alexandra.wang.oss@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/CAAJ_b962c5AcYW9KUt_R_ER5qs3fUGbe4az-SP-vuwPS-w-AGA%40mail.gmail.com
2025-03-11 10:43:48 +01:00
Peter Eisentraut
1d26c2d2c4 refactor: Split tryAttachPartitionForeignKey()
Split tryAttachPartitionForeignKey() into three functions:
AttachPartitionForeignKey(), RemoveInheritedConstraint(), and
DropForeignKeyConstraintTriggers(), so they can be reused in some
subsequent patches for the NOT ENFORCED feature.

Author: Amul Sul <amul.sul@enterprisedb.com>
Reviewed-by: Alexandra Wang <alexandra.wang.oss@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/CAAJ_b962c5AcYW9KUt_R_ER5qs3fUGbe4az-SP-vuwPS-w-AGA%40mail.gmail.com
2025-03-11 09:35:24 +01:00
Peter Eisentraut
64224a834c refactor: re-add ATExecAlterChildConstr()
ATExecAlterChildConstr() was removed in commit 80d7f99049, but it is
needed in some subsequent patches for the NOT ENFORCED feature, to
recurse over child constraints.  This adds it back in slightly altered
form.

Author: Amul Sul <amul.sul@enterprisedb.com>
Reviewed-by: Alexandra Wang <alexandra.wang.oss@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/CAAJ_b962c5AcYW9KUt_R_ER5qs3fUGbe4az-SP-vuwPS-w-AGA%40mail.gmail.com
2025-03-11 08:43:35 +01:00
Michael Paquier
76def4cdd7 Add WAL data to backend statistics
This commit adds per-backend WAL statistics, providing the same
information as pg_stat_wal, except that it is now possible to know how
much WAL activity is happening in each backend rather than an overall
aggregate of all the activity.  Like pg_stat_wal, the implementation
relies on pgWalUsage, tracking the difference of activity between two
reports to pgstats.

This data can be retrieved with a new system function called
pg_stat_get_backend_wal(), which returns one tuple based on the PID
provided as input.  Like pg_stat_get_backend_io(), this is useful when
joined with pg_stat_activity to get a live picture of the WAL generated
for each running backend, showing how the activity is [un]balanced.

pgstat_flush_backend() gains a new flag value, able to control the flush
of the WAL stats.

This commit relies mostly on the infrastructure provided by
9aea73fc61, that has introduced backend statistics.

Bump catalog version.  A bump of PGSTAT_FILE_FORMAT_ID is not required,
as backend stats do not persist on disk.
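
For example, a sketch of such a join (treating the output columns of
pg_stat_get_backend_wal() as mirroring pg_stat_wal, which is an
assumption here):

    SELECT a.pid, a.backend_type, w.wal_records, w.wal_fpi, w.wal_bytes
    FROM pg_stat_activity AS a,
         LATERAL pg_stat_get_backend_wal(a.pid) AS w
    ORDER BY w.wal_bytes DESC;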

Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Nazir Bilal Yavuz <byavuz81@gmail.com>
Reviewed-by: Xuneng Zhou <xunengzhou@gmail.com>
Discussion: https://postgr.es/m/Z3zqc4o09dM/Ezyz@ip-10-97-1-34.eu-west-3.compute.internal
2025-03-11 09:04:11 +09:00
Andres Freund
59a1592e39 tests: Make postmaster/002_connection_limits deal with verbose logs
When log_error_verbosity=verbose is configured, the test would hang (and
then fail) because the sqlstate is added between the log level and the
message.  Make the regex cope.

Reported-by: Andrew Dunstan <andrew@dunslane.net>
Discussion: https://postgr.es/m/c7ba6bd0-3701-43d1-9087-017777fe9cd2%40dunslane.net
2025-03-10 19:32:26 -04:00
Tom Lane
29d6808ede CREATE INDEX: do update index stats if autovacuum=off.
This fixes a thinko from commit d611f8b15.  The intent was to prevent
updating the stats of the pre-existing heap if autovacuum is off,
but it also disabled updating the stats of the just-created index.
There is AFAICS no good reason to do the latter, since there could not
be any pre-existing stats to refrain from overwriting, and the zeroed
stats that are there to begin with are very unlikely to be useful.
Moreover, the change broke our cross-version upgrade tests again.

Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/1116282.1741374848@sss.pgh.pa.us
2025-03-10 17:49:27 -04:00
Heikki Linnakangas
f7c566a1a2 Fix a few more redundant calls of GetLatestSnapshot()
Commit 2367503177 fixed this in RelationFindReplTupleByIndex(), but I
missed two other similar cases.

Per report from Ranier Vilela.

Discussion: https://www.postgresql.org/message-id/CAEudQArUT1dE45WN87F-Gb7XMy_hW6x1DFd3sqdhhxP-RMDa0Q@mail.gmail.com
Backpatch-through: 13
2025-03-10 18:58:10 +02:00
Heikki Linnakangas
2367503177 Fix snapshot used in logical replication index lookup
The function calls GetLatestSnapshot() to acquire a fresh snapshot,
makes it active, and was meant to pass it to table_tuple_lock(), but
instead called GetLatestSnapshot() again to acquire yet another
snapshot. It was harmless because the heap AM and all other known
table AMs ignore the 'snapshot' argument anyway, but let's be tidy.

In the long run, this perhaps should be redesigned so that snapshot
was not needed in the first place. The table AM API uses TID +
snapshot as the unique identifier for the row version, which is
questionable when the row came from an index scan with a Dirty
snapshot. You might lock a different row version when you use a
different snapshot in the table_tuple_lock() call (a fresh MVCC
snapshot) than in the index scan (DirtySnapshot). However, in the heap
AM and other AMs where the TID alone identifies the row version, it
doesn't matter. So for now, just fix the obvious albeit harmless bug.

This has been wrong ever since the table AM API was introduced in
commit 5db6df0c01, so backpatch to all supported versions.

Discussion: https://www.postgresql.org/message-id/83d243d6-ad8d-4307-8b51-2ee5844f6230@iki.fi
Backpatch-through: 13
2025-03-10 17:07:38 +02:00
Tom Lane
9f87e2593f Doc: improve description of window function processing.
The previous wording talked about a "single pass over the data",
which can be read as promising more than intended (to wit, that only
one WindowAgg plan node will be used).  What we promise is only what
the SQL spec requires, namely that the data not get re-sorted between
window functions with compatible PARTITION BY/ORDER BY clauses.
Adjust the wording in hopes of making this clearer.

Reported-by: Christopher Inokuchi <cinokuchi@gmail.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: David G. Johnston <david.g.johnston@gmail.com>
Discussion: https://postgr.es/m/CABde6B5va2wMsnM79u_x=n9KUgfKQje_pbLROEBmA9Ru5XWidw@mail.gmail.com
Backpatch-through: 13
2025-03-10 10:22:08 -04:00
Alexander Korotkov
6bb6a62f3c Use extended stats for precise estimation of bucket size in hash join
Recognizing the real-life complexity where columns in a table often have
functional dependencies, PostgreSQL's estimate of the number of distinct
values over a set of columns can be too low (or, much more rarely, too
high) when dealing with a multi-clause JOIN.  In the case of hash join,
this can end up predicting a small number of hash buckets and, as a
result, picking a non-optimal merge join.

To improve the situation, we introduce one additional stage of bucket
size estimation: when there are two or more join clauses, the estimator
looks up extended statistics and uses them for multicolumn estimation.
Clauses are grouped into lists, each containing expressions referencing
the same relation.  The result of the multicolumn estimation made over
such a list is combined with the others according to the caller's logic.
Clauses that are not estimated are returned to the caller for further
estimation.
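
A hypothetical setup where this extra estimation stage can kick in
(table and statistics names invented):

    -- ndistinct extended statistics capture the dependency between a
    -- and b, improving the bucket-size estimate for the two-clause
    -- hash join below.
    CREATE STATISTICS t1_a_b (ndistinct) ON a, b FROM t1;
    ANALYZE t1;
    EXPLAIN SELECT * FROM t1 JOIN t2 ON t1.a = t2.a AND t1.b = t2.b;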

Discussion: https://postgr.es/m/52257607-57f6-850d-399a-ec33a654457b%40postgrespro.ru
Author: Andrei Lepikhov <lepihov@gmail.com>
Reviewed-by: Andy Fan <zhihui.fan1213@gmail.com>
Reviewed-by: Tomas Vondra <tomas.vondra@enterprisedb.com>
Reviewed-by: Alena Rybakina <lena.ribackina@yandex.ru>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
2025-03-10 13:42:01 +02:00
Alexander Korotkov
fae535da0a Teach Append to consider tuple_fraction when accumulating subpaths.
This change enables more active use of IndexScan and parameterized
NestLoop paths in partitioned cases under an Append node, as already
happens with plain tables.  As the newly added regression tests
demonstrate, it should make the partitionwise technique smarter.

With an indication of how many tuples are needed, it may be more
meaningful to use the 'fractional branch' subpaths of the Append path
list, which are more optimal for that specific number of tuples.
Planning at a higher level, if the optimizer needs all the tuples, it
will choose non-fractional paths.  If, during execution, Append needs to
return fewer tuples than declared by tuple_fraction, using the
'intermediate' variant of paths does no harm.  However, it can yield a
considerable benefit if a sensible set of tuples is selected.

The change to an existing regression test demonstrates the positive
outcome of this feature: instead of scanning the whole table, the
optimizer prefers a parameterized scan, being aware that the join has to
produce only a single tuple to perform the query.
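
A sketch of a query shape that can now benefit (names invented):

    -- The LIMIT implies a small tuple_fraction, so Append may pick
    -- 'fractional' subpaths such as parameterized index scans on each
    -- partition instead of scanning every partition in full.
    SELECT * FROM parted p JOIN other o ON p.id = o.id LIMIT 1;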

Discussion: https://www.postgresql.org/message-id/flat/CAN-LCVPxnWB39CUBTgOQ9O7Dd8DrA_tpT1EY3LNVnUuvAX1NjA%40mail.gmail.com
Author: Nikita Malakhov <hukutoc@gmail.com>
Author: Andrei Lepikhov <lepihov@gmail.com>
Reviewed-by: Andy Fan <zhihuifan1213@163.com>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
2025-03-10 13:38:39 +02:00
Peter Eisentraut
b83e8a2ca2 Remove support for temporal RESTRICT foreign keys
It isn't clear how these should behave, so let's wait to implement them
until we are sure how to do it.

This feature was initially added by commit 89f908a6d0, so it hasn't
been released yet.

Author: Paul A. Jungwirth <pj@illuminatedcomputing.com>
Discussion: https://postgr.es/m/e773bc11-4ac1-40de-bb91-814e02f05b6d%40eisentraut.org
2025-03-10 11:31:01 +01:00
David Rowley
e033696596 Fix incorrect #endif comment
Noticed while reading code in this area.
2025-03-10 13:36:04 +13:00
Heikki Linnakangas
03f8e9a7fe Fix incorrect assertion in libpqwalreceiver
Was supposed to check the length of the array, but was checking its
size in bytes.

Author: Jacob Brazeal <jacob.brazeal@gmail.com>
Discussion: https://www.postgresql.org/message-id/CA%2BCOZaA_9afJxj9ZuO73U5P7WXP%2BZM9NGnZvTDCmBFz0FGP%2BwA@mail.gmail.com
2025-03-09 20:40:45 +02:00
Heikki Linnakangas
2a943afcff Fix test name and username used in failed connection attempts
The first failed connection tests the "regular" connections limit, not
the reserved limit.

In the second failed connection, the username doesn't really matter,
but since the previous successful connections used "regress_reserved",
it seems weird to switch back to "regress_regular" for the
expected-to-fail attempt.

Discussion: https://www.postgresql.org/message-id/fd5e9523-78d3-4270-86b2-fd1b1eeb4fc9@iki.fi
2025-03-09 19:47:55 +02:00
Tom Lane
fedfcf6650 Don't try to parallelize array_agg() on an anonymous record type.
This doesn't work because record_recv requires the typmod that
identifies the specific record type (in our session) and
array_agg_deserialize has no convenient way to get that information.
The result is an "input of anonymous composite types is not
implemented" error.

We could probably make this work if we had to, but it does not seem
worth the trouble, given that it took this long to get a field report.
Just shut off parallelization, as though record_recv didn't exist.
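
A sketch of the affected query shape (names invented):

    -- ROW(a, b) has an anonymous record type, so this aggregate is no
    -- longer parallelized; a parallel plan could previously fail with
    -- "input of anonymous composite types is not implemented".
    SELECT array_agg(ROW(a, b)) FROM t;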

Oversight in commit 16fd03e95.  Back-patch to v16 where that
came in.

Reported-by: Kirill Zdornyy <kirill@dineserve.com>
Diagnosed-by: Richard Guo <guofenglinux@gmail.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/atLI5Kce2ie1zcYjU0w_kjtVaxiYbYGTihrkLDmGZQnRDD4pnXukIATaABbnIj9pUnelC4ESvCXMm4HAyHg-v61XABaKpERj0A2IXzJZM7g=@dineserve.com
Backpatch-through: 16
2025-03-09 13:11:20 -04:00
Nathan Bossart
3c472a1829 doc: Adjust note about pg_upgrade's --jobs option.
Presently, this section lists a couple of parallelized parts of
pg_upgrade and suggests a starting point for setting the --jobs
option.  The list of parallelized tasks is not particularly
actionable, and the phrasing for the --jobs recommendation is
confusing to some readers.

This commit attempts to improve this section by eliminating the
list of parallelized tasks and instead highlighting that --jobs is
most useful for clusters with multiple databases or tablespaces.
Additionally, the recommendation for setting --jobs is simplified
to suggest starting with the number of CPU cores.

Reported-by: Magnus Hagander <magnus@hagander.net>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Magnus Hagander <magnus@hagander.net>
Discussion: https://postgr.es/m/Z8dBn_5iGLNuYiPo%40nathan
2025-03-08 14:28:16 -06:00
Jeff Davis
1852aea3f5 Don't convert to and from floats in pg_dump.
Commit 8f427187db improved performance by remembering relation stats
as native types rather than issuing a new query for each relation.

Using native types is fine for integers like relpages; but reltuples
is floating point. The commit controlled for that complexity by using
setlocale(LC_NUMERIC, "C"). After that, Alexander Lakhin found a
problem in pg_strtof(), fixed in 00d61a08c5.

While we aren't aware of any more problems with that approach, it
seems wise to just use a string the whole way for floating point
values, as Corey's original patch did, and get rid of the
setlocale(). Integers are still converted to native types to avoid
wasting memory.

Co-authored-by: Corey Huinker <corey.huinker@gmail.com>
Discussion: https://postgr.es/m/3049348.1740855411@sss.pgh.pa.us
Discussion: https://postgr.es/m/560cca3781740bd69881bb07e26eb8f65b09792c.camel%40j-davis.com
2025-03-08 11:25:36 -08:00
Tom Lane
7fb8801021 Clear errno before calling strtol() in spell.c.
Per POSIX, a caller of strtol() that wishes to check for errors must
set errno to 0 beforehand.  Several places in spell.c neglected that,
so that they risked delivering a false overflow error in case errno
had been ERANGE already.  Given the lack of field reports, this case
may be unreachable at present --- but it's surely trouble waiting to
happen, so fix it.

Author: Jacob Brazeal <jacob.brazeal@gmail.com>
Discussion: https://postgr.es/m/CA+COZaBhsq6EromFm+knMJfzK6nTpG23zJ+K2=nfUQQXcj_xcQ@mail.gmail.com
Backpatch-through: 13
2025-03-08 11:24:25 -05:00
Peter Geoghegan
67fc4c9fd7 Make parallel nbtree index scans use an LWLock.
Teach parallel nbtree index scans to use an LWLock (not a spinlock) to
protect the scan's shared descriptor state.

Preparation for an upcoming patch that will add skip scan optimizations
to nbtree.  That patch will create the need to occasionally allocate
memory while the scan descriptor is locked, while copying datums that
were serialized by another backend.

Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Matthias van de Meent <boekewurm+postgres@gmail.com>
Discussion: https://postgr.es/m/CAH2-Wz=PKR6rB7qbx+Vnd7eqeB5VTcrW=iJvAsTsKbdG+kW_UA@mail.gmail.com
2025-03-08 11:10:14 -05:00
Peter Eisentraut
8021c77769 Make amcanorder independent of amconsistentordering
Follow-up to commit af4002b381d: Make amconsistentordering not depend
on amcanorder.  Although they are related, they are independent
properties.

Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/flat/E1tngY6-0000UL-2n%40gemulon.postgresql.org
2025-03-08 09:37:06 +01:00
Peter Eisentraut
661781f3a3 Fix typo
A duplicate assignment in commit af4002b381 should have been to a
different field.  (But it didn't affect the outcome.)

Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/flat/E1tngY6-0000UL-2n%40gemulon.postgresql.org
2025-03-08 08:06:30 +01:00
Michael Paquier
21f653cc00 Use stricter ordering in regression test query for pg_stat_io
The query introduced in 8b532771a0 is proving to have ordering issues
under at least the locale cs_CZ.  This commit updates the query to use a
stricter ordering.

Per reports from buildfarm members hippopotamus and jay.
2025-03-08 13:39:57 +09:00
Michael Paquier
8b532771a0 Add regression test listing all the possible tuples in pg_stat_io
pg_stat_io returns a set of tuples based on a combination of three
properties (BackendType, IOObject and IOContext) and
pgstat_tracks_io_object() to decide if a BackendType should return a
tuple based on a pair made of an IOObject and an IOContext.

This commit adds a regression test to track all the combinations
supported.  This is useful for knowing which tuples are relevant when
adding a new BackendType to the set or when touching
pgstat_tracks_io_object(); while playing with this area I have noticed
that it is not hard to break things without the regression tests
noticing a difference in some cases.

Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://postgr.es/m/Z8exfAehbVbEKXW5@paquier.xyz
2025-03-08 12:22:41 +09:00
Michael Paquier
9a8dd2c5a6 Improve check for detection of pending data in backend statistics
The callback pgstat_backend_have_pending_cb() is used as a way for
pg_stat_report() to detect if there is any pending data for backend
statistics.

It did not include a check based on pgstat_tracks_backend_bktype(), which
discards processes whose backend types do not support backend
statistics.  The logic is not a problem on HEAD, as processes that do
not support backend statistics cannot touch PendingBackendStats, so the
callback would always report that there is no pending data in this case.
However, we would run into trouble once backend statistics include
portions of pending stats that are not always zeroed, like pgWalUsage.

There is no reason for pgstat_backend_have_pending_cb() to not check
for pgstat_tracks_backend_bktype(), anyway, and this pattern is safer in
the long run, so let's update the code to do so.

While on it, this commit adds a proper initialization to
PendingBackendStats.

Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Co-authored-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/Z8l6EMM4ImVoWRkg@ip-10-97-1-34.eu-west-3.compute.internal
2025-03-08 10:56:30 +09:00
Peter Geoghegan
8e167e6188 nbtree: refine _bt_readnextpage contract comments.
Another minor follow-up commit for commit 1bd4bc85, which changed the
_bt_readnextpage contract.
2025-03-07 18:35:13 -05:00
Nathan Bossart
088f8e2d56 Assert that wrapper_handler()'s argument is within expected range.
pqsignal() already does a similar check, but strange Valgrind
reports have us wondering if wrapper_handler() is somehow getting
called with an invalid signal number.

Reported-by: Tomas Vondra <tomas@vondra.me>
Suggested-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/ace01111-f9ac-4f61-b1b1-8e9379415444%40vondra.me
Backpatch-through: 17
2025-03-07 15:23:09 -06:00
Tom Lane
34c3c5ce1c Include column name in build_attrmap_by_position's error reports.
Formerly we only provided the column number, but it's frequently
more useful to mention the column name.  The input tupdesc often
doesn't have useful column names, but the output tupdesc usually
contains user-supplied names, so report that one.

Author: Marcos Pegoraro <marcos@f10.com.br>
Co-authored-by: jian he <jian.universality@gmail.com>
Co-authored-by: Tom Lane <tgl@sss.pgh.pa.us>
Co-authored-by: Erik Wienhold <ewie@ewie.name>
Reviewed-by: Vladlen Popolitov <v.popolitov@postgrespro.ru>
Discussion: https://postgr.es/m/CAB-JLwanky28gjAMdnMh1CjyO1b2zLdr6UOA1-oY9G7PVL9KKQ@mail.gmail.com
2025-03-07 13:24:20 -05:00
Andres Freund
b48832cddb tests: Don't fail due to high default timeout in postmaster/003_start_stop
Some BF animals use very high timeouts due to their slowness. Unfortunately
postmaster/003_start_stop fails if a high timeout is configured, due to
authentication_timeout having a fairly low max.

As this test is reasonably fast, the easiest fix seems to be to cap the
timeout to 600.

Per buildfarm animal skink.

Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://postgr.es/m/ggflhkciwdyotpoie323chu2c2idpjk5qimrn462encwx2io7s@thmcxl7i6dpw
2025-03-07 13:09:16 -05:00
Andres Freund
71d1ed6fe1 tests: Fix race condition in postmaster/002_connection_limits
The test occasionally failed due to unexpected connection limit errors being
encountered after having waited for FATAL errors on another connection. These
spurious failures were caused by the backend reporting FATAL errors to the
client before detaching from the PGPROC entry. Adding a sleep(1) before
proc_exit() makes it easy to reproduce that problem.

To fix the issue, add a helper function that waits for postmaster to notice
the process having exited. For now this is implemented by waiting for the
DEBUG2 message that postmaster logs in that case. That's not the prettiest
fix, but simple. If we notice this problem elsewhere, it might be worthwhile
to make this more general, e.g. by adding an injection point.

Reported-by: Tomas Vondra <tomas@vondra.me>
Diagnosed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Tested-by: Tomas Vondra <tomas@vondra.me>
Discussion: https://postgr.es/m/ggflhkciwdyotpoie323chu2c2idpjk5qimrn462encwx2io7s@thmcxl7i6dpw
2025-03-07 13:09:16 -05:00
Robert Haas
d3fc7a5120 doc: Add missing decimal places to example rowcount.
Commit 95dbd827f2 updated a bunch
of similar cases in the documentation, but missed this one.

Author: Ilia Evdokimov <ilya.evdokimov@tantorlabs.com>
Reviewed-by: Fabrízio de Royes Mello <fabriziomello@gmail.com>
2025-03-07 09:00:53 -05:00
Peter Eisentraut
7f24c02743 Improve possible performance regression
Commit ce62f2f2a0 introduced calls to GetIndexAmRoutineByAmId() in
lsyscache.c functions.  This call is a bit more expensive than a
simple syscache lookup.  So rearrange the nesting so that we call that
one last and do the cheaper checks first.

Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/flat/E1tngY6-0000UL-2n%40gemulon.postgresql.org
2025-03-07 11:46:33 +01:00
Peter Eisentraut
af4002b381 Rename amcancrosscompare
After more discussion about commit ce62f2f2a0, rename the index AM
property amcancrosscompare to two separate properties
amconsistentequality and amconsistentordering.  Also improve the
documentation and update some comments that were previously missed.

Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/flat/E1tngY6-0000UL-2n%40gemulon.postgresql.org
2025-03-07 11:46:33 +01:00
Dean Rasheed
6da469bada Allow casting between bytea and integer types.
This allows smallint, integer, and bigint values to be cast to and
from bytea. The bytea value is the two's complement representation of
the integer, with the most significant byte first. For example:

  1234::bytea -> \x000004d2
  (-1234)::bytea -> \xfffffb2e
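
The casts also work in the opposite direction, e.g. (results inferred
from the definition above):

  '\x000004d2'::bytea::integer -> 1234
  '\xfffffb2e'::bytea::integer -> -1234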

Author: Aleksander Alekseev <aleksander@timescale.com>
Reviewed-by: Joel Jacobson <joel@compiler.org>
Reviewed-by: Yugo Nagata <nagata@sraoss.co.jp>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com>
Discussion: https://postgr.es/m/CAJ7c6TPtOp6%2BkFX5QX3fH1SVr7v65uHr-7yEJ%3DGMGQi5uhGtcA%40mail.gmail.com
2025-03-07 09:31:18 +00:00
Jeff Davis
d611f8b158 CREATE INDEX: don't update table stats if autovacuum=off.
We previously fixed this for binary upgrade in 71b66171d0, but a
similar problem remained when dumping statistics without data.

Fix by not opportunistically updating table stats during CREATE INDEX
when autovacuum is disabled. For stats to be stable at all, the server
needs to be aware that it should not take every opportunity to update
stats. Per discussion, autovacuum=off is a signal that the user
expects stats to be stable; though if necessary, we could create
a more specific mode in the future.

Reported-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Discussion: https://postgr.es/m/CAExHW5vf9D+8-a5_BEX3y=2y_xY9hiCxV1=C+FnxDvfprWvkng@mail.gmail.com
Discussion: https://postgr.es/m/ca81cbf6e6ea2af838df972801ad4da52640a503.camel%40j-davis.com
2025-03-06 19:39:14 -08:00
John Naylor
19e57f4f78 Revert "vacuumdb: Add option for analyzing only relations missing stats."
This reverts commit 5f8eb25706, which ended up in
my branch by mistake.
2025-03-07 10:35:21 +07:00
John Naylor
fcabc3adf8 Doc: correct aggressive vacuum threshold for multixact members storage
The threshold is two billion members, which was interpreted as 2GB
in the documentation. Fix to reflect that each member takes up five
bytes, which translates to about 10GB. This is not exact, because of
page boundaries. While at it, mention the maximum size of 20GB.

This has been wrong since commit c552e171d1, so backpatch to
version 14.

Author: Alex Friedman <alexf01@gmail.com>
Reviewed-by: Sami Imseih <samimseih@gmail.com>
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://postgr.es/m/CACbFw60UOk6fCC02KsyT3OfU9Dnuq5roYxdw2aFisiN_p1L0bg@mail.gmail.com
Backpatch-through: 14
2025-03-07 10:22:56 +07:00
Nathan Bossart
5f8eb25706 vacuumdb: Add option for analyzing only relations missing stats.
This commit adds a new --missing-only option that can be used in
conjunction with --analyze-only and --analyze-in-stages.  When this
option is specified, vacuumdb will generate ANALYZE commands for a
relation if it is missing any statistics it should ordinarily have.
For example, if a table has statistics for one column but not
another, we will analyze the whole table.  A similar principle
applies to extended statistics, expression indexes, and table
inheritance.

Co-authored-by: Corey Huinker <corey.huinker@gmail.com>
Reviewed-by: TODO
Discussion: https://postgr.es/m/Z5O1bpcwDrMgyrYy%40nathan
2025-03-07 10:17:35 +07:00
Michael Paquier
e2080261cc Fix race condition in TAP test 007_pre_auth
The authentication test added in c76db55c90 expects a backend to start
and wait at the injection point "init-pre-auth".  A query is used to
retrieve the PID of the backend waiting at authentication, but its WHERE
clause was too soft, checking only for a backend in a "starting" state.

As proved by the CI, this WHERE clause is not enough.  There is a small
window between the moment when the backend is reported as "starting" in
its backend entry and the moment when it waits in its injection point,
and it was possible for the test to return the PID of a backend process
not yet waiting in the injection point, causing spurious failures.  This
issue is fixed by tweaking the query retrieving the PID of the backend
waiting before authentication so that we check for "init-pre-auth" in its
wait_event.  An extra check based on the backend_type is added, based on
a suggestion by Jacob, to be more cautious.

Error spotted by the CI on Windows, but it could happen anywhere, as
long as the authentication path is slow enough compared to the TAP test.

Reported-by: Andres Freund <andres@anarazel.de>
Author: Jacob Champion <jacob.champion@enterprisedb.com>
Co-authored-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/soexrl7oeyku24bj3czupxmv27ow35u6edymp5y3oyoysbe2kb@r3tgoos2xp2x
2025-03-07 08:12:45 +09:00
Álvaro Herrera
24503fa95c
reindexdb: move PQfinish() calls to the right place
get_parallel_object_list() has no business closing a connection it did
not create.  Make things more sensible by closing the connection at the
level where it is created, in reindex_one_database().

Extracted from a larger patch by the same author.  However, the patch as
submitted not only was not described as containing this change, but in
addition it contained a fatal flaw whereby reindexdb would crash and
fail across all of its TAP tests, which is why I list myself as
co-author.

Author: Ranier Vilela <ranier.vf@gmail.com>
Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://postgr.es/m/CAEudQArfqr0-s0VVPSEh=0kgOgBJvFNdGW=xSL5rBcr0WDMQYQ@mail.gmail.com
2025-03-06 19:40:06 +01:00
Tom Lane
0f21db36d6 Fix some performance issues in GIN query startup.
If a GIN index search had a lot of search keys (for example,
"jsonbcol ?| array[]" with tens of thousands of array elements),
both ginFillScanKey() and startScanKey() took O(N^2) time.
Worse, those loops were uncancelable for lack of CHECK_FOR_INTERRUPTS.

The problem in ginFillScanKey() is the brute-force search key
de-duplication done in ginFillScanEntry().  The most expedient
solution seems to be to just stop trying to de-duplicate once
there are "too many" search keys.  We could imagine working harder,
say by using a sort-and-unique algorithm instead of brute force
compare-all-the-keys.  But it seems unlikely to be worth the trouble.
There is no correctness issue here, since the code already allowed
duplicate keys if any extra_data is present.

The problem in startScanKey() is the loop that attempts to identify
the first non-required search key.  In the submitted test case, that
vainly tests all the key positions, and each iteration takes O(N)
time.  One part of that is that it's reinitializing the entryRes[]
array from scratch each time, which is entirely unnecessary given
that the triConsistentFn isn't supposed to scribble on its input.
We can easily adjust the array contents incrementally instead.
The other part of it is that the triConsistentFn may itself take
O(N) time (and does in this test case).  This is all extremely
brute force: in simple cases with AND or OR semantics, we could
know without any looping whatever that all or none of the keys
are required.  But GIN opclasses don't have any API for exposing
that knowledge, so at least in the short run there is little to
be done about that.  Put in a CHECK_FOR_INTERRUPTS so that at
least the loop is cancelable.

These two changes together resolve the primary complaint that
the test query doesn't respond promptly to cancel interrupts.
Also, while they don't completely eliminate the O(N^2) behavior,
they do provide quite a nice speedup for mid-sized examples.
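
A sketch of the kind of query that triggered the complaint (names
invented):

    -- Tens of thousands of search keys made query startup take O(N^2)
    -- time, and it could not be cancelled.
    SELECT count(*) FROM tab
    WHERE jsonbcol ?| (SELECT array_agg(i::text)
                       FROM generate_series(1, 50000) AS i);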

Bug: #18831
Reported-by: Niek <niek.brasa@hitachienergy.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/18831-e845ac44ebc5dd36@postgresql.org
Backpatch-through: 13
2025-03-06 11:54:31 -05:00
Andrew Dunstan
e33969abc1 Further fix for json_strip_nulls documentation
Oversight in commit 4603903d29.

Author: Shinoda, Noriyoshi (SXD Japan FSI) <noriyoshi.shinoda@hpe.com>
2025-03-06 10:24:03 -05:00
Andrew Dunstan
0e76f253f4 Remove extraneous commas in json{b}_strip_nulls documentation
Oversight in commit 4603903d29.

Author: Ian Lawrence Barwick <barwick@gmail.com>
2025-03-06 08:46:15 -05:00
Amit Kapila
588acf6d0e Avoid invalidating all RelationSyncCache entries on publication change.
On a change of publication via ALTER PUBLICATION ... SET/ADD/DROP
commands, we were invalidating all the relations present in the relation
sync cache maintained by pgoutput.  We need to invalidate only the
relation entries that are changed as part of the publication DDL.

We have ensured that publication DDL execution generates the
invalidations required to invalidate the impacted relation sync entries
in RelationSyncCache.

This improves performance by avoiding the rebuilding of cache entries in
cases where a publication has many tables but only one of them is
dropped.
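
A sketch of the case this helps (names invented):

    -- Previously this invalidated every entry in pgoutput's relation
    -- sync cache; now only the entry for t3 is invalidated.
    ALTER PUBLICATION pub1 DROP TABLE t3;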

Author: Shlok Kyal <shlok.kyal.oss@gmail.com>
Author: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Hou Zhijie <houzj.fnst@fujitsu.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/OSCPR01MB14966C09AA201EFFA706576A7F5C92@OSCPR01MB14966.jpnprd01.prod.outlook.com
2025-03-06 14:19:38 +05:30
Jeff Davis
1d33de9d68 Organize and deduplicate statistics import tests.
Author: Corey Huinker <corey.huinker@gmail.com>
Reported-by: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/CAAKRu_bWEqUfxhODfJ-XbZC75vq=P6DYOKK6biyey=yM1Ah3Hg@mail.gmail.com
Discussion: https://postgr.es/m/CADkLM=f1n2_Vomq0gKab7xdxDHmJGgn=DE48P8fzQOp3Mrs1Qg@mail.gmail.com
2025-03-06 00:19:22 -08:00
Jeff Davis
f9f4b43b8d Address stats export review comments.
Per discussion, did not use Jian He's patch exactly.

Reported-by: jian he <jian.universality@gmail.com>
Reviewed-by: Corey Huinker <corey.huinker@gmail.com>
Discussion: https://postgr.es/m/CACJufxFVq=tq9u1zrHWYSbMi1T07gS9Ff0LJScMco4HZmtZ1xw@mail.gmail.com
Discussion: https://postgr.es/m/CADkLM=f1n2_Vomq0gKab7xdxDHmJGgn=DE48P8fzQOp3Mrs1Qg@mail.gmail.com
2025-03-06 00:11:12 -08:00
Jeff Davis
298944e8d8 Address stats import review comments.
Reported-by: jian he <jian.universality@gmail.com>
Discussion: https://postgr.es/m/CACJufxHG9MBQozbJQ4JRBcRbUO+t+sx4qLZX092rS_9b4SR_EA@mail.gmail.com
2025-03-05 23:07:25 -08:00
Heikki Linnakangas
39de4f157d Fix compiler warnings about typedef redefinitions
Clang with -Wtypedef-redefinition produced warnings:

    src/include/storage/latch.h:122:3: error: redefinition of typedef 'Latch' is a C11 feature [-Werror,-Wtypedef-redefinition]

Per buildfarm
2025-03-06 03:10:22 +02:00
Michael Paquier
7f7f324eb5 Add more monitoring data for WAL writes in the WAL receiver
This commit adds two improvements related to the monitoring of WAL
writes for the WAL receiver.

First, write counts and timings are now counted in pg_stat_io for the
WAL receiver.  These have been discarded from pg_stat_wal in
ff99918c62 due to performance concerns, related to the fact that we
still relied on an on-disk file for the stats back then, even with
track_wal_io_timing available to avoid the overhead of the timestamp
calculations.
This implementation is simpler than the original proposal as it is
possible to rely on the APIs of pgstat_io.c to do the job.  Like the
fsync and read data, track_wal_io_timing needs to be enabled to track
the timings.

Second, a wait event is added around the pg_pwrite() call in charge of
the writes, using the existing WAIT_EVENT_WAL_WRITE.  This is useful as
the WAL receiver data is tracked in pg_stat_activity.

Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://postgr.es/m/Z8gFnH4o3jBm5BRz@ip-10-97-1-34.eu-west-3.compute.internal
2025-03-06 09:41:37 +09:00
Heikki Linnakangas
393e0d2314 Split WaitEventSet functions to separate source file
latch.c now only contains the Latch related functions, which build on
the WaitEventSet abstraction. Most of the platform-dependent stuff is
now in waiteventset.c.

Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://www.postgresql.org/message-id/8a507fb6-df28-49d3-81a5-ede180d7f0fb@iki.fi
2025-03-06 01:26:16 +02:00
Heikki Linnakangas
84e5b2f07a Use ModifyWaitEvent to update exit_on_postmaster_death
This is in preparation for splitting WaitEventSet related functions to
a separate source file. That will hide the details of WaitEventSet
from WaitLatch, so it must use an exposed function instead of
modifying WaitEventSet->exit_on_postmaster_death directly.

Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://www.postgresql.org/message-id/8a507fb6-df28-49d3-81a5-ede180d7f0fb@iki.fi
2025-03-06 01:26:12 +02:00
Fujii Masao
9f25b9f739 ecpg: Fix compiler warning in ecpg build with Meson.
Previously, Meson could produce a warning about the use of 'deps' in ecpg:

    WARNING: Project targets '>=0.54' but uses a feature introduced in '0.60.0': list.<plus>. The right-hand operand was not a list.

The right-hand operand of 'deps' should be a list. This commit fixes
the warning by wrapping it with square brackets.

This issue was introduced in commit 28f04984f0.

Author: Jacob Champion <jacob.champion@enterprisedb.com>
Discussion: https://postgr.es/m/CAOYmi+ks8wO06Ymxduw2h_eQJ_D4_jHGeyMK0P=p5Q3psnEdMA@mail.gmail.com
2025-03-06 08:22:30 +09:00
Heikki Linnakangas
a98e4dee63 Remove unused ShutdownLatchSupport() function
The only caller was removed in commit 80a8f95b3b. I don't foresee
needing it any time soon, and I'm working on some big changes in this
area, so let's remove it out of the way.

Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://www.postgresql.org/message-id/8a507fb6-df28-49d3-81a5-ede180d7f0fb@iki.fi
2025-03-05 23:52:04 +02:00
Daniel Gustafsson
153836b99a ci: Remove installation of libcurl
The CI images come with libcurl pre-installed since commit a119426
in the pg-vm-images repository, so remove the installation commands
from the Cirrus tasks.  Installation of libcurl packages was added
in the OAuth patchset which introduced the dependency, a backpatch
is thus not applicable.

Author: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/8745B9D8-D897-4302-BD4C-FC18F291ECB7@yesql.se
2025-03-05 22:12:20 +01:00
Andres Freund
d4a6c847ca ci: Document what makes certain tasks special
To increase coverage without drastically increasing CI resource usage, we have
different CI tasks test different things (e.g. the linux tasks use
sanitizers).  Unfortunately that can create confusing situations where CI
fails on some OS, but not others, without the problem appearing to be platform
dependent.

To partially address that, add a comment, prefixed with SPECIAL, to each
task that we use to test in some non-default way.

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/321570.1741195755@sss.pgh.pa.us
2025-03-05 13:19:28 -05:00
Andres Freund
0a2f5df881 ci: freebsd: Specify debug_parallel_query=regress
A lot of buildfarm animals run with debug_parallel_query=regress, while CI
didn't test that. That led to the annoying situation of only noticing related
test instabilities after merging changes upstream.

FreeBSD was chosen because it's a relatively fast task. It also tests
debug_write_read_parse_plan_trees etc., which are probably exercised a bit more
heavily with debug_parallel_query=regress.

Discussion: https://postgr.es/m/zbuk4mlov22yfoktf5ub3lwjw2b7ezwphwolbplthepda42int@h6wpvq7orc44
2025-03-05 13:19:28 -05:00
Andres Freund
ad40644eb8 ci: Upgrade FreeBSD image
Upgrade to the current stable version. To avoid needing commits like this in
the future, the CI image name no longer contains the OS version number.

Backpatch to all versions with CI support, we don't want to generate CI images
for multiple FreeBSD versions.

Author: Nazir Bilal Yavuz <byavuz81@gmail.com>
Discussion: https://postgr.es/m/CAN55FZ3_P4JJ6tWZafjf-_XbHgG6DQGXhH-y6Yp78_bwBJjcww@mail.gmail.com
Backpatch-through: 15
2025-03-05 10:33:47 -05:00
Peter Geoghegan
d00107cd63 Revert "Show index search count in EXPLAIN ANALYZE."
This reverts commit 5ead85fbc8.

This commit shows test failures with debug_parallel_query=regress.  The
underlying issue needs to be debugged, so revert for now.
2025-03-05 10:27:31 -05:00
Andrew Dunstan
4603903d29 Allow json{b}_strip_nulls to remove null array elements
An additional parameter ("strip_in_arrays") is added to these functions.
It defaults to false. If true, then null array elements are removed as
well as null-valued object fields. JSON that consists of just a single
null is not affected.
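
For example (exact output spacing is an assumption):

    SELECT json_strip_nulls('[1,null,{"a":null}]', true);
     -> [1,{}]
    SELECT json_strip_nulls('[1,null,{"a":null}]', false);
     -> [1,null,{}]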

Author: Florents Tselai <florents.tselai@gmail.com>

Discussion: https://postgr.es/m/4BCECCD5-4F40-4313-9E98-9E16BEB0B01D@gmail.com
2025-03-05 10:04:02 -05:00
Peter Geoghegan
5ead85fbc8 Show index search count in EXPLAIN ANALYZE.
Expose the count of index searches/index descents in EXPLAIN ANALYZE's
output for index scan nodes.  This information is particularly useful
with scans that use ScalarArrayOp quals, where the number of index scans
isn't predictable in advance (at least not with optimizations like the
one added to nbtree by Postgres 17 commit 5bf748b8).  It will also be
useful when EXPLAIN ANALYZE shows details of an nbtree index scan that
uses skip scan optimizations set to be introduced by an upcoming patch.

The instrumentation works by teaching index AMs to increment a new
nsearches counter whenever a new index search begins.  The counter is
incremented at exactly the same point that index AMs must already
increment the index's pg_stat_*_indexes.idx_scan counter (we're counting
the same event, but at the scan level rather than the relation level).
The new counter is stored in the scan descriptor (IndexScanDescData),
which explain.c reaches by going through the scan node's PlanState.

This approach doesn't match the approach used when tracking other index
scan specific costs (e.g., "Rows Removed by Filter:").  It is similar to
the approach used in other cases where we must track costs that are only
readily accessible inside an access method, and not from the executor
(e.g., "Heap Blocks:" output for a Bitmap Heap Scan).  It is inherently
necessary to maintain a counter that can be incremented multiple times
during a single amgettuple call (or amgetbitmap call), and directly
exposing PlanState.instrument to index access methods seems unappealing.

Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Tomas Vondra <tomas@vondra.me>
Reviewed-By: Robert Haas <robertmhaas@gmail.com>
Reviewed-By: Masahiro Ikeda <ikedamsh@oss.nttdata.com>
Reviewed-By: Matthias van de Meent <boekewurm+postgres@gmail.com>
Discussion: https://postgr.es/m/CAH2-Wz=PKR6rB7qbx+Vnd7eqeB5VTcrW=iJvAsTsKbdG+kW_UA@mail.gmail.com
Discussion: https://postgr.es/m/CAH2-WzkRqvaqR2CTNqTZP0z6FuL4-3ED6eQB0yx38XBNj1v-4Q@mail.gmail.com
2025-03-05 09:36:48 -05:00
Heikki Linnakangas
635f580120 Rename some signal and interrupt handling functions for consistency
The usual pattern for handling a signal is that the signal handler
sets a flag and calls SetLatch(MyLatch), and CHECK_FOR_INTERRUPTS() or
other code that is part of a wait loop calls another function to deal
with it. The naming of the functions involved was a bit inconsistent,
however. CHECK_FOR_INTERRUPTS() calls ProcessInterrupts() to do the
heavy-lifting, but the analogous functions in aux processes were
called HandleMainLoopInterrupts(), HandleStartupProcInterrupts(),
etc. Similarly, most subroutines of ProcessInterrupts() were called
Process*(), but some were called Handle*().

To make things less confusing, rename all the functions that are part
of the overall signal/interrupt handling system but are not executed
in a signal handler to e.g. ProcessSomething(), rather than
HandleSomething(). The "Process" prefix is now consistently used in
the non-signal-handler functions, and the "Handle" prefix in functions
that are part of signal handlers, except for some completely unrelated
functions that clearly have nothing to do with signal or interrupt
handling.

Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Discussion: https://www.postgresql.org/message-id/8a384b26-1499-41f6-be33-64b801fb98b8@iki.fi
2025-03-05 16:22:26 +02:00
Álvaro Herrera
f4e53e10b6
Add ALTER TABLE ... ALTER CONSTRAINT ... SET [NO] INHERIT
This allows redefining an existing non-inheritable constraint as
inheritable, which makes it possible to straighten out situations with
NO INHERIT constraints so that they can become normal constraints
without having to re-verify existing data.  For existing inheritance
children this may require creating additional constraints, if they don't
exist already.

It also allows doing the opposite, if only for symmetry.
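
A sketch of the new syntax (names invented):

    -- Make an existing NO INHERIT constraint inheritable:
    ALTER TABLE parent ALTER CONSTRAINT parent_a_check SET INHERIT;
    -- The opposite direction is also allowed, for symmetry:
    ALTER TABLE parent ALTER CONSTRAINT parent_a_check SET NO INHERIT;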

Author: Suraj Kharage <suraj.kharage@enterprisedb.com>
Reviewed-by: jian he <jian.universality@gmail.com>
Discussion: https://postgr.es/m/CAF1DzPVfOW6Kk=7SSh7LbneQDJWh=PbJrEC_Wkzc24tHOyQWGg@mail.gmail.com
2025-03-05 13:50:22 +01:00
Michael Paquier
f4694e0f35 Fix some gaps in pg_stat_io with WAL receiver and WAL summarizer
The WAL receiver and WAL summarizer processes each gain a call to
pgstat_report_wal(), to make sure that they report their WAL statistics
to pgstats, gathering data for pg_stat_io.

In the WAL receiver, the stats reports are timed with status updates sent
to the primary, which depend on wal_receiver_status_interval and
wal_receiver_timeout.  This is a conservative choice, but perhaps we
could be more aggressive with the frequency of the stats reports.  An
interesting historical fact is that the WAL receiver does writes and
syncs of WAL, but it has never reported its statistics to pgstats in
pg_stat_wal.

In the WAL summarizer, the stats reports are done each time the process
waits for WAL.

While at it, pg_stat_io is adjusted so that these two processes do not
report any rows when IOObject is not WAL, making the view easier to use
with fewer rows.

Two tests are added in TAP, checking statistics for the WAL summarizer
and the WAL receiver.  Status updates in the WAL receiver are currently
possible in the recovery test 001_stream_rep.pl.
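
For instance, a query like the following (the backend_type spellings
are assumed here) should now show rows for these two processes:

    SELECT backend_type, object, context, writes, write_time, fsyncs
    FROM pg_stat_io
    WHERE object = 'wal'
      AND backend_type IN ('walreceiver', 'walsummarizer');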

Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://postgr.es/m/Z8UKZyVSHUUQJHNb@paquier.xyz
2025-03-05 10:17:39 +09:00
Michael Paquier
54d23601b9 psql: Fix memory leak with \gx used within a pipeline
While inside a pipeline, \gx is currently forbidden and will make
exec_command_g() exit early.  There was a memory leak in this code path,
so let's fix it.

Author: Anthonin Bonnefoy <anthonin.bonnefoy@datadoghq.com>
Discussion: https://postgr.es/m/CAO6_XqqFVQjLjZQiL7xdwLpzZEy1ghO_JWvCFPM_OmwF9s7XdA@mail.gmail.com
2025-03-05 07:56:03 +09:00
Tomas Vondra
b229c10164 Enforce memory limit during parallel GIN builds
Index builds are expected to respect maintenance_work_mem, just like
other maintenance operations. For serial builds this is done simply by
flushing the buffer in ginBuildCallback() into the index. But with
parallel builds it's more complicated, because there are multiple places
that can allocate memory.

ginBuildCallbackParallel() does the same thing as ginBuildCallback(),
except that the accumulated items are written into tuplesort. Then the
entries with the same key get merged - first in the worker, then in the
leader - and the TID lists may get (arbitrarily) long. It's unlikely it
would exceed the memory limit, but it's possible. We address this by
evicting some of the data if the list gets too long.

We can't simply dump the whole in-memory TID list. The GIN index bulk
insert code expects to see TIDs in monotonic order; it may fail if the
TIDs go backwards. If the TID lists overlap, evicting the whole current
TID list would break this (a later entry might add "old" TID values into
the already-written part).

In the workers this is not an issue, because the lists never overlap.
But the leader may see overlapping lists produced by the workers.

We can however derive a safe "horizon" TID - the entries (for a given
key) are sorted by (key, first TID), which means no future list can add
values before the last "first TID" we've seen. This patch tracks the
"frozen" part of the TID list, which we know can't change by merging
additional TID lists. If needed, we can evict this part of the list.

We don't want to do this too often - the smaller lists we evict, the
more expensive it'll be to merge them in the next step (especially in
the leader). Therefore we only trim the list if we have at least 1024
frozen items, and if the whole list is at least 64kB large.

These thresholds are somewhat arbitrary and conservative. We might
calculate the values from maintenance_work_mem, but tests show that does
not really improve anything (time, compression ratio, ...). So we stick
to these conservative values to release memory faster.

Author: Tomas Vondra
Reviewed-by: Matthias van de Meent, Andy Fan, Kirill Reshke
Discussion: https://postgr.es/m/6ab4003f-a8b8-4d75-a67f-f25ad98582dc%40enterprisedb.com
2025-03-04 20:41:13 +01:00
Masahiko Sawada
f52345995d pg_upgrade: Check for the expected error message in TAP tests.
Since pg_upgrade prints its error messages on stdout, we can't use
command_fails_like() to check if it fails for the right reason. This
commit uses command_checks_all() in pg_upgrade TAP tests to check the
exit status and stdout, enabling proper verification of error
reasons.

Author: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org>
Discussion: https://postgr.es/m/87tt8h1vb7.fsf@wibble.ilmari.org
2025-03-04 11:16:12 -08:00
Álvaro Herrera
7bbc46213d
Fix ALTER TABLE error message
This bogus error message was introduced in 2013 by commit f177cbfe67,
because of misunderstanding the processCASbits() API; at the time, no
test cases were added that would be affected by this change.  Only in
ca87c415e2 was one added (along with a couple of typos), with an XXX
note that the error message was bogus.  Fix the whole thing, and add
some test cases.

Backpatch all the way back.

Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Discussion: https://postgr.es/m/202503041822.aobpqke3igvb@alvherre.pgsql
2025-03-04 20:07:30 +01:00
Masahiko Sawada
bacbc4863b Refactor Copy{From|To}GetRoutine() to use pass-by-reference argument.
The change improves efficiency by eliminating unnecessary copying of
CopyFormatOptions.

Coverity also complained about inefficiencies caused by
pass-by-value.

Oversight in 7717f6300 and 2e4127b6d.

Reported-by: Junwang Zhao <zhjwpku@gmail.com>
Reported-by: Tom Lane <tgl@sss.pgh.pa.us> (per reports from coverity)
Author: Sutou Kouhei <kou@clear-code.com>
Reviewed-by: Junwang Zhao <zhjwpku@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/CAEG8a3L6YCpPksTQMzjD_CvwDEhW3D_t=5md9BvvdOs5k+TA=Q@mail.gmail.com
2025-03-04 10:38:41 -08:00
Tomas Vondra
0b2a45a5d1 Compress TID lists when writing GIN tuples to disk
When serializing GIN tuples to tuplesorts during parallel index builds,
we can significantly reduce the amount of data by compressing the TID
lists. The GIN opclasses may produce a lot of data (depending on how
many keys are extracted from each row), and the TID compression is very
efficient and effective.

If the number of distinct keys is high, the first worker pass (reading
data from the table and writing them into a private tuplesort) may not
benefit from the compression very much. It is likely to spill data to
disk before the TID lists get long enough for the compression to help.
The second pass (writing the merged data into the shared tuplesort) is
more likely to benefit from compression.

The compression can be seen as a way to reduce the amount of disk space
needed by the parallel builds, because the data is written twice. First
into the per-worker tuplesorts, then into the shared tuplesort.

Author: Tomas Vondra
Reviewed-by: Matthias van de Meent, Andy Fan, Kirill Reshke
Discussion: https://postgr.es/m/6ab4003f-a8b8-4d75-a67f-f25ad98582dc%40enterprisedb.com
2025-03-04 19:02:05 +01:00
Tom Lane
9b4bdf876a Add .gitignore entry for ecpg test detritus.
Oversight in commit 28f04984f.
2025-03-04 12:58:07 -05:00
Tomas Vondra
c878de1db4 Make FP_LOCK_SLOTS_PER_BACKEND look like a function
The FP_LOCK_SLOTS_PER_BACKEND macro looks like a constant, but it
depends on the max_locks_per_transaction GUC, and thus can change. This
is non-obvious and confusing, so make it look more like a function by
renaming it to FastPathLockSlotsPerBackend().

While at it, use the macro when initializing fast-path shared memory,
instead of using the formula.

Reported-by: Andres Freund
Discussion: https://postgr.es/m/ffiwtzc6vedo6wb4gbwelon5nefqg675t5c7an2ta7pcz646cg%40qwmkdb3l4ett
2025-03-04 18:33:12 +01:00
Fujii Masao
91ecb5e0bc Add regression tests for pg_stat_progress_copy.tuples_skipped.
This commit adds tests to verify that tuples_skipped in pg_stat_progress_copy
works as expected. While existing tests checked other fields, tuples_skipped
was previously untested.

This improves test coverage and ensures accurate tracking of skipped tuples.
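
A sketch of the scenario being tested (path and table name invented):

    -- Session 1: a COPY that skips malformed rows instead of failing.
    COPY tab FROM '/tmp/data.csv' WITH (FORMAT csv, ON_ERROR ignore);
    -- Session 2: monitor progress, including the skipped-row count.
    SELECT relid::regclass, tuples_processed, tuples_skipped
    FROM pg_stat_progress_copy;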

Author: Jian He <jian.universality@gmail.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Josef Šimánek <josef.simanek@gmail.com>
Discussion: https://postgr.es/m/CACJufxFazq-bfyhiO0KBojR=yOr84E25Rqf6mHB0Ow0KPidkKw@mail.gmail.com
2025-03-04 23:56:49 +09:00
Heikki Linnakangas
d2e7068392 Fix outdated comment
Commit bc971f4025 replaced the latch-setting mechanism that the
comment talked about with a condition variable. And before that,
commit 2258e76f90 moved the code so that the comment got detached from
the loop that it talked about, so move the comment closer to the loop.
2025-03-04 15:33:19 +02:00
Daniel Gustafsson
ad13490be0 doc: Expand version compatibility for pg_basebackup features
This updates the paragraph on backward compatibility for server
features to include --incremental, which only works on servers with
v17 or newer.  Backpatch down to v17 where incremental backup was
added.

Author: David G. Johnston <David.G.Johnston@Gmail.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://postgr.es/m/CAKFQuwZYfZyeTkS3g2Ovw84TsxHa796xnf-u5kfgn_auyxZk0Q@mail.gmail.com
Backpatch-through: 17
2025-03-04 12:08:27 +01:00
Peter Eisentraut
3abbd8dbeb Fix accidental use of = instead of ==
Fix for commit 630f9a43ce.  It used = instead of ==.  The result
would be an incorrect error message.

Author: Jacob Brazeal <jacob.brazeal@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://www.postgresql.org/message-id/flat/CA%2BCOZaC-JMbhQ4O0Q8V1Bxa0R%2BNex_RN9D6UyuLPiEx_CK4Heg%40mail.gmail.com
2025-03-04 09:45:01 +01:00
Peter Eisentraut
f011acdd61 Fix ALTER TABLE ADD VIRTUAL GENERATED COLUMN when table rewrite
demo:

    CREATE TABLE gtest20a (a int PRIMARY KEY,
                           b int GENERATED ALWAYS AS (a * 2) VIRTUAL);
    ALTER TABLE gtest20a ADD COLUMN c float8 DEFAULT RANDOM() CHECK (b < 60);
    ERROR:  no generation expression found for column number 2 of table "pg_temp_17306"

In ATRewriteTable, the pg_attrdef default expression entry corresponding
to the variable OIDNewHeap (if valid) was not populated.  So OIDNewHeap
cannot be used to call expand_generated_columns_in_expr or
build_generation_expression.  Therefore, in ATRewriteTable we can only
use the existing relation to expand the generated expression.

Author: jian he <jian.universality@gmail.com>
Reviewed-by: Srinath Reddy <srinath2133@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/CACJufxEJ%3DFoajabWXjszo_yrQeKSxdZ87KJqBW373rSbajKGAA%40mail.gmail.com
2025-03-04 09:18:32 +01:00
Richard Guo
716a051aac Avoid NullTest deduction for clone clauses
In commit b262ad440, we introduced an optimization that reduces an IS
NOT NULL qual on a column defined as NOT NULL to constant true, and an
IS NULL qual on a NOT NULL column to constant false, provided we can
prove that the input expression of the NullTest is not nullable by any
outer join.  This deduction happens after we have generated multiple
clones of the same qual condition to cope with commuted-left-join
cases.

However, performing the NullTest deduction for clone clauses can be
unsafe, because we don't have a reliable way to determine if the input
expression of a NullTest is non-nullable: nullingrel bits in clone
clauses may not reflect reality, so we dare not draw conclusions from
clones about whether Vars are guaranteed not-null.

To fix, we check whether the given RestrictInfo is a clone clause in
restriction_is_always_true and restriction_is_always_false, and avoid
performing any reduction if it is.

There are several ensuing plan changes in predicate.out, and we have
to modify the tests to ensure that they continue to test what they are
intended to.  Additionally, this fix causes the test case added in
f00ab1fd1 to no longer trigger the bug that commit fixed, so we also
remove that test case.

Back-patch to v17 where this bug crept in.

Reported-by: Ronald Cruz <cruz@rentec.com>
Diagnosed-by: Tom Lane <tgl@sss.pgh.pa.us>
Author: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/f5320d3d-77af-4ce8-b9c3-4715ff33f213@rentec.com
Backpatch-through: 17
2025-03-04 16:11:03 +09:00
Fujii Masao
28f04984f0 ecpg: Add TAP test for the ecpg command.
This commit adds a TAP test to verify that the ecpg command correctly
detects unsupported or disallowed statements in input files and reports
the appropriate error or warning messages.

This test helps catch bugs like the one introduced in commit 3d009e45bd,
which broke ecpg's handling of unsupported COPY FROM STDIN statements,
later fixed by commit 94b914f601.

Author: Ryo Kanbayashi <kanbayashi.dev@gmail.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/CANOn0EzoMyxA1m-quDS1UeQUq6FNki6+GGiGucgr9tm2R78rKw@mail.gmail.com
2025-03-04 14:58:46 +09:00
Michael Paquier
c76db55c90 Split pgstat_bestart() into three different routines
pgstat_bestart(), used post-authentication to set up a backend entry
in the PgBackendStatus array so that its data becomes visible in
pg_stat_activity and related catalogs, has its logic divided into three
routines by this commit, called in order at different steps of
backend initialization:
* pgstat_bestart_initial() sets up the backend entry with a minimal
amount of information, reporting it with a new BackendState called
STATE_STARTING while waiting for backend initialization and client
authentication to complete.  The main benefit that this offers is
observability, making it possible to monitor backend activity
during authentication.  This step happens earlier than in the logic
prior to this commit.  pgstat_beinit() happens earlier as well, before
authentication.
* pgstat_bestart_security() reports the SSL/GSS status of the
connection, once authentication completes.  Auxiliary processes, for
example, do not need to call this step, hence it is optional.  This
step is called after performing authentication, same as previously.
* pgstat_bestart_final() reports the user and database IDs, takes the
entry out of STATE_STARTING, and reports its application_name.  This is
called as the last step of the three, once authentication completes.

An injection point is added, with a test checking that the "starting"
phase of a backend entry is visible in pg_stat_activity.  Some follow-up
patches are planned to take advantage of this refactoring with more
information provided in backend entries during authentication (LDAP
hanging was a problem for the author, initially).

Author: Jacob Champion <jacob.champion@enterprisedb.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CAOYmi+=60deN20WDyCoHCiecgivJxr=98s7s7-C8SkXwrCfHXg@mail.gmail.com
2025-03-04 14:09:44 +09:00
Michael Paquier
40d3f82744 Add more assertions in palloc0() and palloc_extended()
palloc() includes an assertion checking that an alloc() implementation
never returns NULL for all MemoryContextMethods.

This commit adds a similar assertion in palloc0().  In palloc_extended(),
a different assertion is added, checking that MCXT_ALLOC_NO_OOM is set
when an alloc() routine returns NULL.  These additions can be useful to
catch errors when implementing a new set of MemoryContextMethods
routines.

Author: Andreas Karlsson <andreas@proxel.se>
Discussion: https://postgr.es/m/507e8eba-2035-4a12-a777-98199a66beb8@proxel.se
2025-03-04 10:53:10 +09:00
Masahiko Sawada
ba57dcfdcd doc: Convert UUID functions list to table format.
Convert the list of UUID functions into a table for better
readability. This commit also adds references to the UUID type section
and includes descriptions of different UUID generation algorithm
versions.

Author: Andy Alsup <bluesbreaker@gmail.com>
Reviewed-by: Laurenz Albe <laurenz.albe@cybertec.at>
Discussion: https://postgr.es/m/CADOZ7s7OHag+r6w+BzKw2xgb3fVtAD-pU=_N9-9pSe5W1TB+xQ@mail.gmail.com
2025-03-03 15:44:01 -08:00
Tom Lane
246dedc5d0 Allow => syntax for named cursor arguments in plpgsql.
We've traditionally accepted "name := value" syntax for
cursor arguments in plpgsql.  But it turns out that the
equivalent statements in Oracle use "name => value".
Since we accept both forms of punctuation for function
arguments, it makes sense to do the same here.
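
For illustration, a hedged sketch of the two equivalent punctuations
(cursor and parameter names invented):

DO $$
DECLARE
  c CURSOR (n int) FOR SELECT generate_series(1, n);
BEGIN
  OPEN c (n := 3);   -- traditional plpgsql punctuation
  CLOSE c;
  OPEN c (n => 3);   -- now also accepted, matching function-call syntax
  CLOSE c;
END $$;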

Author: Pavel Stehule <pavel.stehule@gmail.com>
Reviewed-by: Gilles Darold <gilles@darold.net>
Discussion: https://postgr.es/m/CAFj8pRA3d0ARQEMbABa1n6q25AUdNmyO8aGs56XNf9pD4sRMjQ@mail.gmail.com
2025-03-03 18:00:13 -05:00
Thomas Munro
b6904afae4 ci: Use a RAM disk for NetBSD and OpenBSD.
Put the RAM disk setup for all three *BSD CI tasks into a common script,
replacing the old FreeBSD-specific one from commit 0265e5c1.  This makes
the NetBSD and OpenBSD tasks run 3 times and a bit over 2 times faster,
respectively.

NetBSD and FreeBSD now share the same one-liner to mount tmpfs.  OpenBSD
needs a GCP-image specific recipe that knows where to steal an unused
disk partition needed to reserve swap space for an mfs RAM disk, because
its tmpfs is deprecated and currently broken.  The configured size is
enough for our current tests but could potentially need future
expansion.  Thanks to Bilal for the disklabel incantation.

Reviewed-by: Nazir Bilal Yavuz <byavuz81@gmail.com>
Discussion: https://postgr.es/m/CA%2BhUKGJJ-XrPhN%2BQA4ZUfYAAXcwOSDty9t0vE9Z8__AdacKnQg%40mail.gmail.com
2025-03-04 11:29:21 +13:00
Melanie Plageman
06eae9e621 Trigger more frequent autovacuums with relallfrozen
Calculate the insert threshold for triggering an autovacuum of a
relation based on the number of unfrozen pages.

By only considering the unfrozen portion of the table when calculating
how many tuples to add to the insert threshold, we can trigger more
frequent vacuums of insert-heavy tables. This increases the chances of
vacuuming those pages while they still reside in shared buffers.

This also increases the number of autovacuums triggered by tuples
inserted and not by wraparound risk. We prefer to freeze these pages
during insert-triggered autovacuums, as anti-wraparound vacuums are not
automatically canceled by conflicting lock requests.

We calculate the unfrozen percentage of the table using the recently
added (99f8f3fbbc) relallfrozen column of pg_class.
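
A rough sketch of the adjusted threshold, assuming the scale factor is
applied to the unfrozen share of the table (names follow the autovacuum
GUCs and pg_class columns; the exact computation is per this commit):

insert_threshold = autovacuum_vacuum_insert_threshold
                 + autovacuum_vacuum_insert_scale_factor
                   * reltuples * (relpages - relallfrozen) / relpages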

Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Greg Sabino Mullane <htamfids@gmail.com>
Reviewed-by: Robert Treat <rob@xzilla.net>
Reviewed-by: wenhui qiu <qiuwenhuifx@gmail.com>
Discussion: https://postgr.es/m/flat/CAAKRu_aj-P7YyBz_cPNwztz6ohP%2BvWis%3Diz3YcomkB3NpYA--w%40mail.gmail.com
2025-03-03 14:42:00 -05:00
Tom Lane
35c8dd9e11 Simplify some logic around setting pg_attribute.atthasdef.
DefineRelation was of the opinion that it could usefully pre-fill
atthasdef flags to eliminate work for StoreAttrDefault.  This is not
the case, however: the tupledesc that it's filling is not the one that
InsertPgAttributeTuples will work from.  The tupledesc used there is
made by RelationBuildLocalRelation, which deliberately doesn't copy
atthasdef.  Moreover, if this did happen as the code thinks, it would
be wrong for the case of plain "DEFAULT NULL" clauses, since we detect
and ignore simple-null-Const defaults later on.  Hence, remove the
useless code.

It also emerges that it's not really worth a special-case path in
StoreAttrDefault() for atthasdef already being set, because as far as
we can see that never happens: cases where an existing default gets
updated always do RemoveAttrDefault first, so as to clean up
possibly-no-longer-correct dependency entries.  Even if that did happen,
the code would still work anyway.

Also remove a nearby comment made moot by 5eaa0e92e.

Author: jian he <jian.universality@gmail.com>
Discussion: https://postgr.es/m/CACJufxHFssPvkP1we7WMhPD_1kwgbG52o=kQgL+TnVoX5LOyCQ@mail.gmail.com
2025-03-03 13:35:48 -05:00
Tom Lane
4528768d98 Remove now-dead code in StoreAttrDefault().
StoreAttrDefault() is no longer responsible for filling
attmissingval, so remove the code for that.

Get rid of RawColumnDefault.missingMode, too, as we no longer
need that to pass information around.

While here, clean up some sloppy coding in StoreAttrDefault(),
such as failure to use XXXGetDatum macros.  These aren't bugs
but they're not good code either.

Reported-by: jian he <jian.universality@gmail.com>
Author: jian he <jian.universality@gmail.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CACJufxHFssPvkP1we7WMhPD_1kwgbG52o=kQgL+TnVoX5LOyCQ@mail.gmail.com
2025-03-03 13:09:20 -05:00
Tom Lane
95f650674d Fix broken handling of domains in atthasmissing logic.
If a domain type has a default, adding a column of that type (without
any explicit DEFAULT clause) failed to install the domain's default
value in existing rows, instead leaving the new column null.  This
is unexpected, and it used to work correctly before v11.  The cause
is confusion in the atthasmissing mechanism about which default value
to install: we'd only consider installing an explicitly-specified
default, and then we'd decide that no table rewrite is needed.
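
A minimal illustration of the previously-broken case (object names
invented):

CREATE DOMAIN idom AS int DEFAULT 42;
CREATE TABLE t (a int);
INSERT INTO t VALUES (1);
ALTER TABLE t ADD COLUMN b idom;  -- no explicit DEFAULT clause
SELECT b FROM t;                  -- returned NULL before this fix; now 42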

To fix, take the responsibility for filling attmissingval out of
StoreAttrDefault, and instead put it into ATExecAddColumn's existing
logic that derives the correct value to fill the new column with.
Also, centralize the logic that determines the need for
default-related table rewriting there, instead of spreading it over
four or five places.

In the back branches, we'll leave the attmissingval-filling code
in StoreAttrDefault even though it's now dead, for fear that some
extension may be depending on that functionality to exist there.
A separate HEAD-only patch will clean up the now-useless code.

Reported-by: jian he <jian.universality@gmail.com>
Author: jian he <jian.universality@gmail.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CACJufxHFssPvkP1we7WMhPD_1kwgbG52o=kQgL+TnVoX5LOyCQ@mail.gmail.com
Backpatch-through: 13
2025-03-03 12:43:44 -05:00
Melanie Plageman
99f8f3fbbc Add relallfrozen to pg_class
Add relallfrozen, an estimate of the number of pages marked all-frozen
in the visibility map.

pg_class already has relallvisible, an estimate of the number of pages
in the relation marked all-visible in the visibility map. This is used
primarily for planning.

relallfrozen, together with relallvisible, is useful for estimating the
outstanding number of all-visible but not all-frozen pages in the
relation for the purposes of scheduling manual VACUUMs and tuning vacuum
freeze parameters.
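
For example, a hedged query sketch over the new column (both columns
are estimates and may be stale):

SELECT relname, relallvisible, relallfrozen,
       relallvisible - relallfrozen AS visible_but_not_frozen
FROM pg_class
WHERE relkind = 'r'
ORDER BY visible_but_not_frozen DESC;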

A future commit will use relallfrozen to trigger more frequent vacuums
on insert-focused workloads with significant volume of frozen data.

Bump catalog version

Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Robert Treat <rob@xzilla.net>
Reviewed-by: Corey Huinker <corey.huinker@gmail.com>
Reviewed-by: Greg Sabino Mullane <htamfids@gmail.com>
Discussion: https://postgr.es/m/flat/CAAKRu_aj-P7YyBz_cPNwztz6ohP%2BvWis%3Diz3YcomkB3NpYA--w%40mail.gmail.com
2025-03-03 11:18:05 -05:00
Tomas Vondra
8492feb98f Allow parallel CREATE INDEX for GIN indexes
Allow using parallel workers to build a GIN index, similarly to BTREE
and BRIN. For large tables this may result in significant speedup when
the build is CPU-bound.
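
For illustration (object names invented; worker use is subject to the
usual maintenance-worker settings):

SET max_parallel_maintenance_workers = 4;
CREATE INDEX docs_body_gin ON docs USING gin (to_tsvector('english', body));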

The work is divided so that each worker builds index entries on a subset
of the table, determined by the regular parallel scan used to read the
data. Each worker uses a local tuplesort to sort and merge the entries
for the same key. The TID lists do not overlap (for a given key), which
means the merge sort simply concatenates the two lists. The merged
entries are written into a shared tuplesort for the leader.

The leader needs to merge the sorted entries again, before writing them
into the index. But this way a significant part of the work happens in
the workers, and the leader is left with merging fewer large entries,
which is more efficient.

Most of the parallelism infrastructure is a simplified copy of the code
used by BTREE indexes, omitting the parts irrelevant for GIN indexes
(e.g. uniqueness checks).

Original patch by me, with reviews and substantial improvements by
Matthias van de Meent, certainly enough to make him a co-author.

Author: Tomas Vondra, Matthias van de Meent
Reviewed-by: Matthias van de Meent, Andy Fan, Kirill Reshke
Discussion: https://postgr.es/m/6ab4003f-a8b8-4d75-a67f-f25ad98582dc%40enterprisedb.com
2025-03-03 16:53:06 +01:00
Michael Paquier
3f1db99bfa Handle auxiliary processes in SQL functions of backend statistics
This commit impacts the following SQL functions, authorizing the access
to the PGPROC entries of auxiliary processes when attempting to fetch or
reset backend-level pgstats entries:
- pg_stat_reset_backend_stats()
- pg_stat_get_backend_io()

This is relevant since a051e71e28, which changed the backend statistics
to authorize at least the WAL summarizer, WAL receiver and WAL writer
processes, following the addition of WAL I/O statistics in pg_stat_io
and backend statistics.  Written this way, the code is more flexible
with future changes, adapting automatically to any updates done in
pgstat_tracks_backend_bktype().
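
An illustrative call now accepted for one of these auxiliary processes
(a sketch, with the PID looked up via pg_stat_activity):

SELECT * FROM pg_stat_get_backend_io(
  (SELECT pid FROM pg_stat_activity WHERE backend_type = 'walwriter'));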

While on it, pgstat_report_wal() gains a call to pgstat_flush_backend(),
making sure that backend I/O statistics are updated when calling this
routine.  This makes the statistics report correctly for the WAL writer.
WAL receiver and WAL summarizer do not call pgstat_report_wal() yet
(spoiler: both should).  It should be possible to lift some of the
existing restrictions for other auxiliary processes, as well, but this
is left as future work.

Reported-by: Rahila Syed <rahilasyed90@gmail.com>
Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://postgr.es/m/CAH2L28v9BwN8_y0k6FQ591=0g2Hj_esHLGj3bP38c9nmVykoiA@mail.gmail.com
2025-03-03 09:57:48 +09:00
Fujii Masao
fe186bda78 postgres_fdw: Extend postgres_fdw_get_connections to return remote backend PID.
This commit adds a new "remote_backend_pid" output column to
the postgres_fdw_get_connections function. It returns the process ID of
the remote backend on the foreign server that is handling the connection.

This enhancement is useful for troubleshooting, monitoring, and reporting.
For example, if a connection is unexpectedly closed by the foreign server,
the remote backend's PID can help diagnose the cause.
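
Illustrative usage (other output columns omitted):

SELECT server_name, remote_backend_pid
FROM postgres_fdw_get_connections();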

No extension version bump is needed, as commit c297a47c5f already
handled it for v18~.

Author: Sagar Dilip Shedge <sagar.shedge92@gmail.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/CAPhYifF25q5xUQWXETfKwhc0YVa_6+tfG9Kw4bCvCjpCWxYs2A@mail.gmail.com
2025-03-03 08:51:30 +09:00
Peter Eisentraut
15a79c7311 Use PRI*64 instead of "ll*" in format strings (minimal trial)
Old: errmsg("hello %llu", (unsigned long long) x)
New: errmsg("hello %" PRIu64, x)

And likewise for everything printf-like.

In the past we had to use long long so localized format strings remained
architecture independent in message catalogs.  Although long long is
expected to be 64 bit everywhere, if we hadn't also cast the int64
values, we'd have generated compiler warnings on systems where int64 was
long.

Now that int64 is int64_t, C99 understands how to format them using
<inttypes.h> macros, the casts are not necessary, and the gettext()
tools recognize the macros and defer expansion until load time.  (And if
we ever manage to get -Wformat-signedness to work for us, that'd help
with these too, but not the type-system-clobbering casts.)

This particular patch converts only pg_checksums.c to the new system,
to allow testing of the translation toolchain for everyone.  If this
works okay, a later patch will convert most of the rest.

Author: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://postgr.es/m/b936d2fb-590d-49c3-a615-92c3a88c6c19%40eisentraut.org
2025-03-02 13:53:03 +01:00
Tom Lane
00d61a08c5 Fix pg_strtof() to not crash on NULL endptr.
We had managed not to notice this simple oversight because none
of our calls exercised the case --- until commit 8f427187d.
That led to pg_dump crashing on any platform that uses this code
(currently Cygwin and Mingw).

Even though there's no immediate bug in the back branches, backpatch,
because a non-POSIX-compliant strtof() substitute is trouble waiting
to happen for extensions or future back-patches.

Diagnosed-by: Alexander Lakhin <exclusion@gmail.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/339b3902-4e98-4e31-a744-94e43b7b9292@gmail.com
Backpatch-through: 13
2025-03-01 14:22:56 -05:00
Peter Eisentraut
56ba0463d3 Set amcancrosscompare to true for hash
This was missed in the refactoring in patch ce62f2f2a0, which thus
created a regression.

Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Mark Dilger <mark.dilger@enterprisedb.com>
Discussion: https://www.postgresql.org/message-id/flat/E1tngY6-0000UL-2n%40gemulon.postgresql.org
2025-03-01 09:15:27 +01:00
Thomas Munro
c301a0a74a Work around OAuth/EVFILT_TIMER quirk on NetBSD.
NetBSD's EVFILT_TIMER doesn't like zero timeouts, as introduced by
commit b3f0be788.  Steal the workaround from the same problem on Linux
from a few lines up: round zero up to one.  Do this only for NetBSD, as
the other systems with the kevent() API accept zero and shouldn't have
to insert a small bogus wait.

Future improvement ideas:
 * when NetBSD < 10 falls out of support, we could try NOTE_ABSTIME for
   the "fire now" meaning if timeout == 0
 * when libcurl tells us to start a 0ms timer and call it back, we could
   figure out how to handle that more directly without involving the
   kernel (the current architecture doesn't make that straightforward)

Failures with EINVAL errors could be seen on the new optional NetBSD CI
task that we're trying to keep green as a candidate for inclusion as a
default-enabled CI task.  The NetBSD build farm animals aren't testing
OAuth yet, so no breakage there.

Reviewed-by: Jacob Champion <jacob.champion@enterprisedb.com>
Discussion: https://postgr.es/m/CA%2BhUKGJ%2BWyJ26QGvO_nkgvbxgw%2B03U4EQ4Hxw%2BQBft6Np%2BXW7w%40mail.gmail.com
2025-03-01 14:41:02 +13:00
Masahiko Sawada
8a1012b35d Re-export NextCopyFromRawFields() to copy.h.
Commit 7717f63006 removed NextCopyFromRawFields() from copy.h. While
it was hoped that NextCopyFrom() could serve as an alternative,
certain use cases still require NextCopyFromRawFields(). For instance,
extensions like file_text_array_fdw, which process source data with an
unknown number of columns, rely on this function.

Per buildfarm member crake.

Reported-by: Andrew Dunstan <andrew@dunslane.net>
Reviewed-by: Andrew Dunstan <andrew@dunslane.net>
Reviewed-by: Sutou Kouhei <kou@clear-code.com>
Discussion: https://postgr.es/m/5c7e1ac8-5083-4c08-af19-cb9ade2f16ce@dunslane.net
2025-02-28 15:11:41 -08:00
Nathan Bossart
e636da9200 Adjust auto_explain's GUC descriptions.
This commit adjusts auto_explain's GUC descriptions to follow the
style guidelines established by commit 977d865c36.  Specifically,
it ensures the accepted special values are listed in a consistent
manner.

Author: Ilia Evdokimov <ilya.evdokimov@tantorlabs.com>
Reviewed-by: Peter Smith <smithpb2250@gmail.com>
Discussion: https://postgr.es/m/e82d4647-ce7f-45c7-9b01-fb900a050767%40tantorlabs.com
2025-02-28 16:05:51 -06:00
Tom Lane
8b49392b27 Tweak regex to avoid a bug in Perl 5.16.3.
For some reason, 5.16.3 (and perhaps slightly earlier/later versions)
go into an infinite loop with the version-replacement regex installed
by commit fc0d0ce97.  We can work around that by using an explicit
"\n" instead of the line-start metacharacter "^".

Reported-by: Sami Imseih <samimseih@gmail.com>
Discussion: https://postgr.es/m/CAA5RZ0u9dV3CdKqkqdusA_RdvBkwWe0c0rxcFWj++VYoutFYSw@mail.gmail.com
2025-02-28 15:20:24 -05:00
Masahiko Sawada
7717f63006 Refactor COPY FROM to use format callback functions.
This commit introduces a new CopyFromRoutine struct, which is a set of
callback routines to read tuples in a specific format. It also makes
COPY FROM with the existing formats (text, CSV, and binary) utilize
these format callbacks.

This change is a preliminary step towards making the COPY FROM command
extensible in terms of input formats.

Similar to 2e4127b6d2, this refactoring contributes to a performance
improvement by reducing the number of "if" branches that need to be
checked on a per-row basis when sending field representations in text
or CSV mode. The performance benchmark results showed ~5% performance
gain in text or CSV mode.

Author: Sutou Kouhei <kou@clear-code.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Tomas Vondra <tomas.vondra@enterprisedb.com>
Reviewed-by: Junwang Zhao <zhjwpku@gmail.com>
Discussion: https://postgr.es/m/20231204.153548.2126325458835528809.kou@clear-code.com
2025-02-28 10:29:36 -08:00
Robert Haas
77cb08be51 Avoid including explain.h in explain_format.h and explain_dr.h
As per a suggestion from Tom Lane, we do this by declaring "struct
ExplainState" here and referring to that rather than "ExplainState".

Also per Tom, CreateExplainSerializeDestReceiver was still defined
in explain.h in addition to explain_dr.h. Remove leftover prototype.

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: http://postgr.es/m/CA+TgmoYtaad3i21V0jqua-fbr+CR0ix6uBvEX8_s6BG96abd=g@mail.gmail.com
2025-02-28 13:17:29 -05:00
Robert Haas
51d3e279c3 Fix missing space in EXPLAIN ANALYZE output.
Commit ddb17e387a introduced this
regression. Ideally, the regression tests would have caught this
mistake, but apparently they don't test with timing enabled,
presumably because that would make the output vary.

Author: Thom Brown <thom@linux.com>
Reviewed-by: Fabrízio de Royes Mello <fabriziomello@gmail.com>
Discussion: http://postgr.es/m/CAA-aLv6nq=UeiyvM7_Mxgo9TVBzs2oh46b9vfyLzuyVEz3j1-g@mail.gmail.com
2025-02-28 13:04:12 -05:00
Jeff Davis
424ededc58 Adjust pg_dump tag for relation stats.
Do not use fmtId(), just use dobj->name directly, like for table data.
2025-02-27 20:42:12 -08:00
Michael Paquier
c2a50ac678 Invent pgstat_fetch_stat_backend_by_pid()
This code is extracted from pg_stat_get_backend_io() in pgstatfuncs.c,
so that it can be shared with other areas that need backend pgstats
entries while having the benefits of the various sanity checks
refactored here.  As per its name, this retrieves backend statistics
based on a PID, with the option of retrieving a BackendType if given in
input.

Currently, this is used for the backend-level IO statistics.  The next
move would be to reuse that for the backend-level WAL statistics.

Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://postgr.es/m/Z3zqc4o09dM/Ezyz@ip-10-97-1-34.eu-west-3.compute.internal
2025-02-28 11:20:31 +09:00
Michael Paquier
2a083ab807 pg_upgrade: Fix inconsistency in memory freeing
The function in charge of freeing the memory from a result created by
PQescapeIdentifier() has to be PQfreemem(), to ensure that both
allocation and free come from libpq.

One spot in pg_upgrade was not respecting that for pg_database's
datlocale (daticulocale in v16) when the collation provider is libc (aka
datlocale/daticulocale is NULL), with the allocation done using
pg_strdup() and the free done with PQfreemem().  The code is changed to
always use PQescapeLiteral() when processing the input.

Oversight in 9637badd9f.  This commit is similar to 48e4ae9a07 and
5b94e27534.

Author: Michael Paquier <michael@paquier.xyz>
Co-authored-by: Ranier Vilela <ranier.vf@gmail.com>
Discussion: https://postgr.es/m/Z601RQxTmIUohdkV@paquier.xyz
Backpatch-through: 16
2025-02-28 10:15:29 +09:00
Masahiko Sawada
2e4127b6d2 Refactor COPY TO to use format callback functions.
This commit introduces a new CopyToRoutine struct, which is a set of
callback routines to copy tuples in a specific format. It also makes
the existing formats (text, CSV, and binary) utilize these format
callbacks.

This change is a preliminary step towards making the COPY TO command
extensible in terms of output formats.

Additionally, this refactoring contributes to a performance
improvement by reducing the number of "if" branches that need to be
checked on a per-row basis when sending field representations in text
or CSV mode. The performance benchmark results showed ~5% performance
gain in text or CSV mode.

Author: Sutou Kouhei <kou@clear-code.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Tomas Vondra <tomas.vondra@enterprisedb.com>
Reviewed-by: Junwang Zhao <zhjwpku@gmail.com>
Discussion: https://postgr.es/m/20231204.153548.2126325458835528809.kou@clear-code.com
2025-02-27 15:03:52 -08:00
Robert Haas
555960a0fb Create explain_dr.c and move DestReceiver-related code there.
explain.c has grown rather large, and the code that deals with the
DestReceiver that supports the SERIALIZE option is pretty easily severable
from the rest of explain.c; hence, move it to a separate file.

Reviewed-by: Peter Geoghegan <pg@bowt.ie>
Discussion: http://postgr.es/m/CA+TgmoYutMw1Jgo8BWUmB3TqnOhsEAJiYO=rOQufF4gPLWmkLQ@mail.gmail.com
2025-02-27 13:14:16 -05:00
Robert Haas
9173e8b604 Create explain_format.c and move relevant code there.
explain.c has grown rather large, so move various functions that
are principally concerned with output generation to a new source
file, explain_format.c, instead of lumping them in with everything
else that is part of explain.c

Reviewed-by: Peter Geoghegan <pg@bowt.ie>
Discussion: http://postgr.es/m/CA+TgmoYutMw1Jgo8BWUmB3TqnOhsEAJiYO=rOQufF4gPLWmkLQ@mail.gmail.com
2025-02-27 12:37:10 -05:00
Robert Haas
95dbd827f2 EXPLAIN: Always use two fractional digits for row counts.
Commit ddb17e387a attempted to avoid
confusing users by displaying digits after the decimal point only when
nloops > 1, since it's impossible to have a fractional row count after a
single iteration. However, this made the regression tests unstable since
parallel queries will have nloops>1 for all nodes below the Gather or
Gather Merge in normal cases, but if the workers don't start in time and
the leader finishes all the work, they will suddenly have nloops==1,
making it unpredictable whether the digits after the decimal point would
be displayed or not. Although 44cbba9a7f
seemed to fix the immediate failures, it may still be the case that there
are lower-probability failures elsewhere in the regression tests.

Various fixes are possible here. For example, it has previously been
proposed that we should try to display the digits after the decimal
point only if rows/nloops is an integer, but currently rows is stored
as a float so it's not theoretically an exact quantity -- precision
could be lost in extreme cases. It has also been proposed that we
should try to display the digits after the decimal point only if we're
under some sort of construct that could potentially cause looping
regardless of whether it actually does. While such ideas are not
without merit, this patch adopts the much simpler solution of always
displaying two decimal digits. If that approach stands up to scrutiny
from the buildfarm and human users, it spares us the trouble of doing
anything more complex; if not, we can reassess.
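
Illustrative output shape (query, plan and numbers invented):

EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF) SELECT * FROM t;
-- e.g.  Seq Scan on t (actual rows=100.00 loops=1)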

This commit incidentally reverts 44cbba9a7f,
which should no longer be needed.

Author: Robert Haas <robertmhaas@gmail.com>
Author: Ilia Evdokimov <ilya.evdokimov@tantorlabs.com>
Discussion: http://postgr.es/m/CA+TgmoazzVHn8sFOMFAEwoqBTDxKT45D7mvkyeHgqtoD2cn58Q@mail.gmail.com
2025-02-27 11:27:16 -05:00
Peter Eisentraut
ce62f2f2a0 Generalize hash and ordering support in amapi
Stop comparing access method OID values against HASH_AM_OID and
BTREE_AM_OID, and instead check the IndexAmRoutine for an index to see
if it advertises its ability to perform the necessary ordering,
hashing, or cross-type comparing functionality.  A field amcanorder
already existed, this uses it more widely.  Fields amcanhash and
amcancrosscompare are added for the other purposes.

Author: Mark Dilger <mark.dilger@enterprisedb.com>
Discussion: https://www.postgresql.org/message-id/flat/E72EAA49-354D-4C2E-8EB9-255197F55330@enterprisedb.com
2025-02-27 17:03:31 +01:00
Tom Lane
6eb8a1a4f9 Avoid unnecessary computation of pgbench's script line number.
ParseScript only needs the lineno for meta-commands, so let's not
bother computing it otherwise.  While this doesn't save much given
the previous patch, there's no point in doing unnecessary work.
While we're at it, avoid calling psql_scan_get_location() twice for
a meta-command.

One reason for making this change is that the line number computed
in ParseScript's main loop was actually wrong in most cases: it
would point just past the semicolon of the previous SQL command,
not at what the user thinks the current command's line number is.
We could add some code to skip whitespace before capturing the line
number, but it would be pretty pointless at present.  Just move the
call to avoid the temptation to rely on that value.  (Once we've
lexed the backslash, the computed line number will be right.)

This change also means that pgbench never inquires about the
location before it's lexed something, so that the care taken in
the previous patch to behave sanely in that case is unnecessary.
It seems best to keep that logic, though, as future callers
might depend on it.

Author: Daniel Vérité <daniel@manitou-mail.org>
Discussion: https://postgr.es/m/84a8a89e-adb8-47a9-9d34-c13f7150ee45@manitou-mail.org
2025-02-27 10:57:55 -05:00
Tom Lane
c8c74ad7e1 Get rid of O(N^2) script-parsing overhead in pgbench.
pgbench wants to record the starting line number of each command
in its scripts.  It was computing that by scanning from the script
start and counting newlines, so that O(N^2) work had to be done
for an N-command script.  In a script with 50K lines, this adds
up to about 10 seconds on my machine.

To add insult to injury, the results were subtly wrong, because
expr_scanner_offset() scanned to find the NUL that flex inserts
at the end of the current token --- and before the first yylex
call, no such NUL has been inserted.  So we ended by computing the
script's last line number not its first one.  This was visible only
in case of \gset at the start of a script, which perhaps accounts
for the lack of complaints.

To fix, steal an idea from plpgsql and track the current lexer
ending position and line count as we advance through the script.
(It's a bit simpler than plpgsql since we never need to back up.)
Also adjust a couple of other places that were invoking scans
from script start when they didn't really need to.  I made a new
psqlscan function psql_scan_get_location() that replaces both
expr_scanner_offset() and expr_scanner_get_lineno(), since in
practice expr_scanner_get_lineno() was only being invoked to find
the line number of the current lexer end position.

Reported-by: Daniel Vérité <daniel@manitou-mail.org>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/84a8a89e-adb8-47a9-9d34-c13f7150ee45@manitou-mail.org
2025-02-27 10:53:38 -05:00
Alexander Korotkov
e167191dc1 Get rid of ojrelid local variable in remove_rel_from_query()
As spotted by Coverity, the calculation of ojrelid mixes signed and unsigned
types, causing possible overflow and undefined behavior.  Instead of trying to
fix the expression, this commit eliminates the ojrelid local variable.
Explicit branching is used to replace the -1 value.  That, in turn, requires
changing the signature of the remove_rel_from_eclass() function.

Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/914330.1740330169%40sss.pgh.pa.us
Reviewed-by: Andrei Lepikhov <lepihov@gmail.com>
2025-02-27 11:22:01 +02:00
Thomas Munro
55918f798b Remove arbitrary cap on read_stream.c buffer queue.
Previously the internal queue of buffers was capped at max_ios * 4,
though not less than io_combine_limit, at allocation time.  That was
done in the first version based on conservative theories about resource
usage and heuristics pending later work.  The configured I/O depth could
not always be reached with dense random streams generated by ANALYZE,
VACUUM, the proposed Bitmap Heap Scan patch, and also sequential streams
with the proposed AIO subsystem to name some examples.

The new formula is (max_ios + 1) * io_combine_limit, enough buffers for
the full configured I/O concurrency level using the full configured I/O
combine size, plus the buffers from one finished but not yet consumed
full-sized I/O.  Significantly more memory would be needed for high GUC
values if the client code requests a large per-buffer data size, but
that is discouraged (existing and proposed stream users try to keep it
under a few words, if not zero).

With this new formula, an intermediate variable could have overflowed
under maximum GUC values, so its data type is adjusted to cope.

Discussion: https://postgr.es/m/CA%2BhUKGK_%3D4CVmMHvsHjOVrK6t4F%3DLBpFzsrr3R%2BaJYN8kcTfWg%40mail.gmail.com
2025-02-27 20:49:48 +13:00
Michael Paquier
48e4ae9a07 pg_amcheck: Fix inconsistency in memory freeing
The function in charge of freeing the memory from a result created by
PQescapeIdentifier() has to be PQfreemem(), to ensure that both
allocation and free come from libpq, but one spot in pg_amcheck was
missing that.

Oversight in b859d94c63.

Author: Ranier Vilela <ranier.vf@gmail.com>
Reviewed-by: vignesh C <vignesh21@gmail.com>
Discussion: https://postgr.es/m/CAEudQArD_nKSnYCNUZiPPsJ2tNXgRmLbXGSOrH1vpOF_XtP0Vg@mail.gmail.com
Discussion: https://postgr.es/m/CAEudQArbTWVSbxq608GRmXJjnNSQ0B6R7CSffNnj2hPWMUsRNg@mail.gmail.com
Backpatch-through: 14
2025-02-27 14:05:51 +09:00
Amit Kapila
8709dccc79 Fix the race condition in ReplicationSlotAcquire().
After commit f41d8468dd, a process could acquire and use a replication
slot that had just been invalidated, leading to failures while accessing
WAL.

To ensure that we don't accidentally start using invalid slots, we must
perform the invalidation check after acquiring the slot or under the
spinlock where we associate the slot with a particular process. We choose
the earlier method to keep the code simple.

Reported-by: Hou Zhijie <houzj.fnst@fujitsu.com>
Author: Nisha Moond <nisha.moond412@gmail.com>
Reviewed-by: Hou Zhijie <houzj.fnst@fujitsu.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/CABdArM7J-LbGoMPGUPiFiLOyB_TZ5+YaZb=HMES0mQqzVTn8Gg@mail.gmail.com
2025-02-27 09:47:04 +05:30
Amit Kapila
845511a72a Doc: Additional clarification for -d option of pg_createsubscriber.
Author: vignesh C <vignesh21@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/CALDaNm0zsFUYpe-tLha+-sp3K8KmBXu0o=LUN=8FFtxMLYikPA@mail.gmail.com
2025-02-27 08:50:03 +05:30
Michael Paquier
495864a4cf Refactor code of pg_stat_get_wal() building result tuple
This commit adds to pgstatfuncs.c a new routine called
pg_stat_wal_build_tuple(), helper routine for pg_stat_get_wal().  This
is in charge of filling one tuple based on the contents of
PgStat_WalStats retrieved from pgstats.

This refactoring will be used by an upcoming patch introducing
backend-level WAL statistics, simplifying the main patch.  Note that
it is not possible for stats_reset to be NULL in pg_stat_wal; backend
statistics, however, will need to be able to handle that case.

Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://postgr.es/m/Z3zqc4o09dM/Ezyz@ip-10-97-1-34.eu-west-3.compute.internal
2025-02-27 11:54:36 +09:00
Michael Paquier
62ec3e1f67 Fix possible double-release of spinlock in procsignal.c
9d9b9d46f3 has added spinlocks to protect the fields in ProcSignal
flags, introducing a code path in ProcSignalInit() where a spinlock
could be released twice if the pss_pid field of a ProcSignalSlot is
found to be already set.  Multiple spinlock releases have no effect with
most spinlock implementations, but this could cause the code to run into
issues when the spinlock is acquired concurrently by a different
process.

This sanity check on pss_pid generates a LOG that can be delayed until
after the spinlock is released as, like older versions up to v17, the
code expects the initialization of the ProcSignalSlot to happen even if
pss_pid is found to be incorrect.  The code is changed so that the old pss_pid
is read while holding the slot's spinlock, with the LOG from the sanity
check generated after releasing the spinlock, preventing the double
release.

Author: Maksim Melnikov <m.melnikov@postgrespro.ru>
Co-authored-by: Maxim Orlov <orlovmg@gmail.com>
Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru>
Discussion: https://postgr.es/m/dca47527-2d8b-4e3b-b5a0-e2deb73371a4@postgrespro.ru
2025-02-27 09:43:06 +09:00
Jeff Davis
15df9d7b51 Remove stray diff introduced by a5cbdeb98a.
Reported-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/Z77IkjmmfbFfNh3f@paquier.xyz
2025-02-26 13:37:14 -08:00
Tom Lane
40e27d04b4 Use attnum to identify index columns in pg_restore_attribute_stats().
Previously we used attname for both table and index columns, but
that is problematic for indexes because their attnames are assigned
by internal rules that don't guarantee to preserve the names across
dump and reload.  (This is what's causing the remaining buildfarm
failures in cross-version-upgrade tests.)  Fortunately we can use
attnum instead, since there's no such thing as adding or dropping
columns in an existing index.  We met this same problem previously
with ALTER INDEX ... SET STATISTICS, and solved it the same way,
cf commit 5b6d13eec.

In pg_restore_attribute_stats() itself, we accept either attnum or
attname, but the policy used by pg_dump is to always use attname
for tables and attnum for indexes.

Author: Tom Lane <tgl@sss.pgh.pa.us>
Author: Corey Huinker <corey.huinker@gmail.com>
Discussion: https://postgr.es/m/1457469.1740419458@sss.pgh.pa.us
2025-02-26 16:36:20 -05:00
Peter Eisentraut
f734c9fc3a Revert "Prepare for Python "Limited API" in PL/Python"
This reverts commit c47e8df815.

That commit makes the plpython tests crash with Python 3.6.* and
3.7.*.  It will need further investigation and testing, so revert for
now.
2025-02-26 21:58:38 +01:00
Masahiko Sawada
945a9e3832 Fix a typo in 005_char_signedness.pl test.
The test in 005_char_signedness.pl was missing a dash in the
--set-char-signedness option. Although the test didn't fail since it
doesn't check the error message, it resulted in an unexpected error
message instead of the intended one.

Oversight in 1aab680591.

Author: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org>
Discussion: https://postgr.es/m/87tt8h1vb7.fsf@wibble.ilmari.org
2025-02-26 11:10:03 -08:00
Peter Eisentraut
c47e8df815 Prepare for Python "Limited API" in PL/Python
Using the Python Limited API would allow building PL/Python against
any Python 3.x version and using another Python 3.x version at run
time.  This commit does not activate that, but it prepares the code to
only use APIs supported by the Limited API.

Implementation details:

- Convert static types to heap types
  (https://docs.python.org/3/howto/isolating-extensions.html#heap-types).

- Replace PyRun_String() with component functions.

- Replace PyList_SET_ITEM() with PyList_SetItem().

Reviewed-by: Jakob Egger <jakob@eggerapps.at>
Discussion: https://www.postgresql.org/message-id/flat/ee410de1-1e0b-4770-b125-eeefd4726a24@eisentraut.org
2025-02-26 16:14:39 +01:00
Michael Paquier
0e42d31b0b Adding new PgStat_WalCounters structure in pgstat.h
This new structure contains the counters and the data related to the WAL
activity statistics gathered from WalUsage, separated into its own
structure so that it can be shared across more than one Stats structure in
pgstat.h.

This refactoring will be used by an upcoming patch introducing
backend-level WAL statistics.

Bump PGSTAT_FILE_FORMAT_ID.

Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://postgr.es/m/Z3zqc4o09dM/Ezyz@ip-10-97-1-34.eu-west-3.compute.internal
2025-02-26 16:48:54 +09:00
Michael Paquier
d7cbeaf261 Remove pgstat_flush_wal()
All the processes that generate WAL should call pgstat_report_wal() to
report all their statistics related to WAL, and this is already what
happens in the tree.  Keeping pgstat_flush_wal() is confusing while the
other routine is encouraged.

This routine is not required since fc415edf8c, where it was last
used in pgstat_report_stat() before an equivalent callback existed.

Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://postgr.es/m/Z71oPkJJICrRB5Ws@paquier.xyz
2025-02-26 15:37:28 +09:00
Amit Kapila
e117cfb2f6 Add two-phase option in pg_createsubscriber.
This patch introduces the '--enable-two-phase' option to the
'pg_createsubscriber' utility, allowing users to enable two-phase commit
for all subscriptions during their creation.

Note that even without this option users can enable the two_phase option
for the subscriptions created by pg_createsubscriber. However, it requires
the subscription to be disabled first which could be inconvenient for
users.
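
A hedged sketch of the manual steps the new option avoids (subscription
name invented):

ALTER SUBSCRIPTION mysub DISABLE;
ALTER SUBSCRIPTION mysub SET (two_phase = on);
ALTER SUBSCRIPTION mysub ENABLE;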

When two-phase commit is enabled, prepared transactions are sent to the
subscriber at the time of 'PREPARE TRANSACTION', and they are processed as
two-phase transactions on the subscriber as well. If disabled, prepared
transactions are sent only when committed and are processed immediately by
the subscriber.

Author: Shubham Khanna <khannashubham1197@gmail.com>
Reviewed-by: vignesh C <vignesh21@gmail.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: Ajin Cherian <itsajin@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/CAHv8RjLPdFP=kA5LNSmWZ=+GMXmO+LczvV6p9HJjsXxZz10KGA@mail.gmail.com
2025-02-26 11:12:50 +05:30
Michael Paquier
adc6032fa8 Improve FATAL message for invalid TLI history at recovery
The original message did not mention where the checkpoint record LSN was
found, a control file or a backup_label file.  A couple of LOG messages
are generated before this FATAL check is reached, providing more details
about the way recovery is set up.  However, knowing this information in
this specific message is useful for debugging.  This is also useful for
instances where log_min_messages is set to FATAL or more, where LOG
messages do not show up.

Author: Benoit Lobréau <benoit.lobreau@dalibo.com>
Reviewed-by: David Steele <david@pgbackrest.org>
Discussion: https://postgr.es/m/4ed10bc8-5513-4d8e-8643-8abcaa08336d@dalibo.com
2025-02-26 14:26:16 +09:00
Jeff Davis
6ee3b91bad pg_dump: prepare attribute stats query.
Follow precedent in pg_dump for preparing queries to improve
performance. Also, simplify the query by removing unnecessary joins.

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reported-by: Andres Freund <andres@anarazel.de>
Co-authored-by: Corey Huinker <corey.huinker@gmail.com>
Co-authored-by: Jeff Davis <pgsql@j-davis.com>
Discussion: https://postgr.es/m/CADkLM=dRMC6t8gp9GVf6y6E_r5EChQjMAAh_vPyih_zMiq0zvA@mail.gmail.com
2025-02-25 19:52:11 -08:00
Jeff Davis
8f427187db Avoid unnecessary relation stats query in pg_dump.
The few fields we need can be easily collected in getTables() and
getIndexes() and stored in RelStatsInfo.

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reported-by: Andres Freund <andres@anarazel.de>
Co-authored-by: Corey Huinker <corey.huinker@gmail.com>
Co-authored-by: Jeff Davis <pgsql@j-davis.com>
Discussion: https://postgr.es/m/CADkLM=f0a43aTd88xW4xCFayEF25g-7hTrHX_WhV40HyocsUGg@mail.gmail.com
2025-02-25 19:51:45 -08:00
Michael Paquier
6c349d83b6 Re-add GUC track_wal_io_timing
This commit is a rework of 2421e9a51d, about which Andres Freund has
raised some concerns as it is valuable to have both track_io_timing and
track_wal_io_timing in some cases, as the WAL write and fsync paths can
be a major bottleneck for some workloads.  Hence, it can be relevant to
not calculate the WAL timings in environments where pg_test_timing
performs poorly while capturing some IO data under track_io_timing for
the non-WAL IO paths.  The opposite can also be true: it should be
possible to disable the non-WAL timings and enable the WAL timings (the
previous GUC setups allowed this possibility).

track_wal_io_timing is added back in this commit, controlling whether WAL
timings should be calculated in pg_stat_io for the read, fsync and write
paths, as done previously with pg_stat_wal.  pg_stat_wal previously
tracked only the sync and write parts (now removed); read stats are new
data tracked in pg_stat_io, and all three are aggregated if
track_wal_io_timing is enabled.  The read part matters during recovery
or if an XLogReader is used.
Extra note: more control over if the types of timings calculated in
pg_stat_io could be done with a GUC that lists pairs of (IOObject,IOOp).

Reported-by: Andres Freund <andres@anarazel.de>
Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Co-authored-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/3opf2wh2oljco6ldyqf7ukabw3jijnnhno6fjb4mlu6civ5h24@fcwmhsgmlmzu
2025-02-26 09:49:59 +09:00
Jeff Davis
a5cbdeb98a Remove redundant pg_set_*_stats() variants.
After commit f3dae2ae58, the primary purpose of separating the
pg_set_*_stats() from the pg_restore_*_stats() variants was
eliminated.

Leave pg_restore_relation_stats() and pg_restore_attribute_stats(),
which satisfy both purposes, and remove pg_set_relation_stats() and
pg_set_attribute_stats().

Reviewed-by: Corey Huinker <corey.huinker@gmail.com>
Discussion: https://postgr.es/m/1457469.1740419458@sss.pgh.pa.us
2025-02-25 16:15:47 -08:00
Andres Freund
ecbff4378b Change _mdfd_segpath() to return paths by value
This basically mirrors the changes done in the predecessor commit. While there
isn't currently a need to get these paths in critical sections, it seems a
shame to unnecessarily allocate memory in these paths now that relpath()
doesn't allocate anymore.

Discussion: https://postgr.es/m/xeri5mla4b5syjd5a25nok5iez2kr3bm26j2qn4u7okzof2bmf@kwdh2vf7npra
2025-02-25 09:02:07 -05:00
Andres Freund
37c87e63f9 Change relpath() et al to return path by value
For AIO, and also some other recent patches, we need the ability to call
relpath() in a critical section. Until now that was not feasible, as it
allocated memory.

The fact that relpath() allocated memory also made it awkward to use in log
messages because we had to take care to free the memory afterwards, which we
e.g. didn't do when zeroing out an invalid buffer.

We discussed other solutions, e.g. filling a pre-allocated buffer that's
passed to relpath(), but they all came with plenty of downsides or were larger
projects. The easiest fix seems to be to make relpath() return the path by
value.

To be able to return the path by value we need to determine the maximum length
of a relation path. This patch adds a long #define that computes the exact
maximum, which is verified to be correct in a regression test.

As this changes the signature of relpath(), extensions using it will need to
adapt their code. We discussed leaving a backward-compat shim in place, but
decided it's not worth it given the use of relpath() doesn't seem widespread.

Discussion: https://postgr.es/m/xeri5mla4b5syjd5a25nok5iez2kr3bm26j2qn4u7okzof2bmf@kwdh2vf7npra
2025-02-25 09:02:07 -05:00
Peter Eisentraut
32c393f9f1 Remove obsolete Python version check
The checked version is already the current minimum supported version
(3.2).

Discussion: https://www.postgresql.org/message-id/flat/ee410de1-1e0b-4770-b125-eeefd4726a24@eisentraut.org
2025-02-25 14:11:38 +01:00
Richard Guo
363a6e8c6f Eliminate code duplication in replace_rte_variables callbacks
The callback functions ReplaceVarsFromTargetList_callback and
pullup_replace_vars_callback are both used to replace Vars in an
expression tree that reference a particular RTE with items from a
targetlist, and they both need to expand whole-tuple references and
deal with OLD/NEW RETURNING list Vars.  As a result, currently there
is significant code duplication between these two functions.

This patch introduces a new function, ReplaceVarFromTargetList, to
perform the replacement and calls it from both callback functions,
thereby eliminating code duplication.

Author: Dean Rasheed <dean.a.rasheed@gmail.com>
Author: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: Jian He <jian.universality@gmail.com>
Discussion: https://postgr.es/m/CAEZATCWhr=FM4X5kCPvVs-g2XEk+ceLsNtBK_zZMkqFn9vUjsw@mail.gmail.com
2025-02-25 16:11:34 +09:00
Richard Guo
1e4351af32 Expand virtual generated columns in the planner
Commit 83ea6c540 added support for virtual generated columns that are
computed on read.  All Var nodes in the query that reference virtual
generated columns must be replaced with the corresponding generation
expressions.  Currently, this replacement occurs in the rewriter.
However, this approach has several issues.  If a Var referencing a
virtual generated column has any varnullingrels, those varnullingrels
need to be propagated into the generation expression.  Failing to do
so can lead to "wrong varnullingrels" errors and improper outer-join
removal.

Additionally, if such a Var comes from the nullable side of an outer
join, we may need to wrap the generation expression in a
PlaceHolderVar to ensure that it is evaluated at the right place and
hence is forced to null when the outer join should do so.  In certain
cases, such as when the query uses grouping sets, we also need a
PlaceHolderVar for anything that is not a simple Var to isolate
subexpressions.  Failure to do so can result in incorrect results.

To fix these issues, this patch expands the virtual generated columns
in the planner rather than in the rewriter, and leverages the
pullup_replace_vars architecture to avoid code duplication.  The
generation expressions will be correctly marked with nullingrel bits
and wrapped in PlaceHolderVars when needed by the pullup_replace_vars
callback function.  This requires handling the OLD/NEW RETURNING list
Vars in pullup_replace_vars_callback, as it may now deal with Vars
referencing the result relation instead of a subquery.

The "wrong varnullingrels" error was reported by Alexander Lakhin.
The incorrect result issue and the improper outer-join removal issue
were reported by Richard Guo.

Author: Richard Guo <guofenglinux@gmail.com>
Author: Dean Rasheed <dean.a.rasheed@gmail.com>
Reviewed-by: Jian He <jian.universality@gmail.com>
Discussion: https://postgr.es/m/75eb1a6f-d59f-42e6-8a78-124ee808cda7@gmail.com
2025-02-25 16:10:25 +09:00
Michael Paquier
560a842d63 Fix untranslatable string concatenation in pg_upgrade
Oversight in 1aab680591.

Author: Kyotaro Horiguchi
Discussion: https://postgr.es/m/20250225.140953.1271748916018759840.horikyota.ntt@gmail.com
2025-02-25 15:53:32 +09:00
Amit Kapila
5b8f2ccc0a Doc: Fix pg_copy_logical_replication_slot description.
This commit documents that the failover option is not copied when using
the pg_copy_logical_replication_slot function.

In passing, we modify the comments in the function clarifying the reason
for this behavior.
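
Illustrative call (slot names invented; the copy's failover flag is left
at its default rather than inherited from the source):

SELECT pg_copy_logical_replication_slot('src_slot', 'dst_slot');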

Reported-by: <duffieldzane@gmail.com>
Author: Hou Zhijie <houzj.fnst@fujitsu.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Backpatch-through: 17, where it was introduced
Discussion: https://postgr.es/m/173976850802.682632.11315364077431550250@wrigleys.postgresql.org
2025-02-25 09:42:07 +05:30
Jeff Davis
15601fa21a Missing doc update for f3dae2ae58. 2025-02-24 17:27:32 -08:00
Jeff Davis
f3dae2ae58 Do not use in-place updates for statistics import.
The use of in-place updates was originally there to follow the
precedent of ANALYZE and to reduce the potential for bloat on
pg_class. Per discussion, it's not worth the risks.

Reported-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/cpdanvzykcb5o64rmapkx6n5gjypoce3y52hff7ocxupgpbxu4@53jmlyvukijo
2025-02-24 17:10:59 -08:00
Michael Paquier
3ce357584e psql: Add pipeline status to prompt and some state variables
This commit adds %P to psql prompts, which reports the status of a
pipeline depending on PQpipelineStatus(): on, off or abort.

The following variables are added to report the state of an ongoing
pipeline:
- PIPELINE_SYNC_COUNT: reports the number of piped syncs.
- PIPELINE_COMMAND_COUNT: reports the number of piped commands, a
command being either \bind, \bind_named, \close or \parse.
- PIPELINE_RESULT_COUNT: reports the results available to read with
\getresults.

These variables can be used with \echo or in a prompt, using "%:name:"
in PROMPT1, PROMPT2 or PROMPT3.  Some basic regression tests are added
for these.  The suggestion to use variables to show the details about
the status counters comes from me.  The patch as originally proposed was
less extensible, hardcoding the output in the prompt.
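
An illustrative psql snippet using the new escape and variables (prompt
string invented):

\set PROMPT1 '%/ [%P] => '
\echo :PIPELINE_SYNC_COUNT :PIPELINE_COMMAND_COUNT :PIPELINE_RESULT_COUNT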

Author: Anthonin Bonnefoy <anthonin.bonnefoy@datadoghq.com>
Discussion: https://postgr.es/m/CAO6_XqroE7JuMEm1sWz55rp9fAYX2JwmcP_3m_v51vnOFdsLiQ@mail.gmail.com
2025-02-25 10:07:24 +09:00
Amit Langote
cbb9086c9e Fix bug in cbc127917 to handle nested Append correctly
A non-leaf partition with a subplan that is an Append node was
omitted from PlannedStmt.unprunableRelids because it was mistakenly
included in PlannerGlobal.prunableRelids due to the way
PartitionedRelPruneInfo.leafpart_rti_map[] is constructed. This
happened when a non-leaf partition used an unflattened Append or
MergeAppend.  As a result, ExecGetRangeTableRelation() reported an
error when called from CreatePartitionPruneState() to process the
partition's own PartitionPruneInfo, since it was treated as prunable
when it should not have been.

Reported-by: Alexander Lakhin <exclusion@gmail.com> (via sqlsmith)
Diagnosed-by: Tender Wang <tndrwang@gmail.com>
Reviewed-by: Tender Wang <tndrwang@gmail.com>
Discussion: https://postgr.es/m/74839af6-aadc-4f60-ae77-ae65f94bf607@gmail.com
2025-02-25 09:24:42 +09:00
Masahiko Sawada
48796a98d5 Fix assertion when decoding XLOG_PARAMETER_CHANGE on promoted primary.
When a standby replays an XLOG_PARAMETER_CHANGE record that lowers
wal_level below logical, we invalidate all logical slots in hot
standby mode. However, if this record was replayed while not in hot
standby mode, logical slots could remain valid even after promotion,
potentially causing an assertion failure during WAL record decoding.

To fix this issue, this commit adds a check for hot_standby status
when restoring a logical replication slot on standbys. This check
ensures that logical slots are invalidated when they become
incompatible due to insufficient wal_level during recovery.

Backpatch to v16 where logical decoding on standby was introduced.

Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://postgr.es/m/CAD21AoABoFwGY_Rh2aeE6tEq3HkJxf0c6UeOXn4VV9v6BAQPSw%40mail.gmail.com
Backpatch-through: 16
2025-02-24 14:03:04 -08:00
Daniel Gustafsson
d1146dc2a7 oauth: Rename macro to avoid collisions on Windows
Our json parsing defined the macros OPTIONAL and REQUIRED to decorate the
structs with for increased readability. This however collides with macros
in the <windef.h> header on Windows.

../src/interfaces/libpq/fe-auth-oauth-curl.c:398:9: warning: "OPTIONAL" redefined
  398 | #define OPTIONAL false
      |         ^~~~~~~~
In file included from D:/a/_temp/msys64/ucrt64/include/windef.h:9,
                 from D:/a/_temp/msys64/ucrt64/include/windows.h:69,
                 from D:/a/_temp/msys64/ucrt64/include/winsock2.h:23,
                 from ../src/include/port/win32_port.h:60,
                 from ../src/include/port.h:24,
                 from ../src/include/c.h:1331,
                 from ../src/include/postgres_fe.h:28,
                 from ../src/interfaces/libpq/fe-auth-oauth-curl.c:16:
include/minwindef.h:65:9: note: this is the location of the previous definition
   65 | #define OPTIONAL
      |         ^~~~~~~~

Rename to avoid compilation errors in anticipation of implementing
support for Windows.

Reported-by: Dave Cramer (on PostgreSQL Hacking Discord)
2025-02-24 22:20:37 +01:00
Daniel Gustafsson
03366b61df oauth: Fix incorrect const markers in struct
Two members in PGoauthBearerRequest were incorrectly marked as const.
While in there, align the name of the struct with the typedef as per
project style.

Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/912516.1740329361@sss.pgh.pa.us
2025-02-24 22:20:29 +01:00
Melanie Plageman
bfe56cdf9a Delay extraction of TIDBitmap per page offsets
Pages from the bitmap created by the TIDBitmap API can be exact or
lossy. The TIDBitmap API extracts the tuple offsets from exact pages
into an array for the convenience of the caller.

This was done in tbm_private|shared_iterate() right after advancing the
iterator. However, as long as tbm_private|shared_iterate() sets a
reference to the PagetableEntry in the TBMIterateResult, the offset
extraction can be done later.

Waiting to extract the tuple offsets has a few benefits. For the shared
iterator case, it allows us to extract the offsets after dropping the
shared iterator state lock, reducing time spent holding a contended
lock.

Separating the iteration step and extracting the offsets later also
allows us to avoid extracting the offsets for prefetched blocks. Those
offsets were never used, so the overhead of extracting and storing them
was wasted.

The real motivation for this change, however, is that future commits
will make bitmap heap scan use the read stream API. This requires a
TBMIterateResult per issued block. By removing the array of tuple
offsets from the TBMIterateResult and only extracting the offsets when
they are used, we reduce the memory required for per buffer data
substantially.

Suggested-by: Thomas Munro <thomas.munro@gmail.com>
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://postgr.es/m/CA%2BhUKGLHbKP3jwJ6_%2BhnGi37Pw3BD5j2amjV3oSk7j-KyCnY7Q%40mail.gmail.com
2025-02-24 16:10:19 -05:00
Melanie Plageman
b8778c4cd8 Add lossy indicator to TBMIterateResult
TBMIterateResult->ntuples is -1 when the page in the bitmap is lossy.
Add an explicit lossy indicator so that we can move ntuples out of the
TBMIterateResult in a future commit.

Discussion: https://postgr.es/m/CA%2BhUKGLHbKP3jwJ6_%2BhnGi37Pw3BD5j2amjV3oSk7j-KyCnY7Q%40mail.gmail.com
2025-02-24 16:10:13 -05:00
Nathan Bossart
c56e8af75e Fix comment for MAX_BACKENDS.
This comment mentions that we check that the configured number of
backends does not exceed MAX_BACKENDS in RegisterBackgroundWorker()
and relevant GUC check hooks, neither of which has those checks
anymore.  To fix, adjust this comment to say that we do the check
in InitializeMaxBackends().

Oversights in commits 6bc8ef0b7f and 0b1fe1413e.

Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/Z7zOEzz8lNjaU9yf%40nathan
2025-02-24 15:02:09 -06:00
Robert Haas
e87c14b19e libpq: Trace all NegotiateProtocolVersion fields
Previously, the names of the unsupported protocol options were not
traced. Since NegotiateProtocolVersion has not really been used yet,
that has not mattered much, but we hope to use it eventually, so let's
fix this.

Author: Jelte Fennema-Nio <postgres@jeltef.nl>
Discussion: https://postgr.es/m/CAGECzQTfc_O+HXqAo5_-xG4r3EFVsTefUeQzSvhEyyLDba-O9w@mail.gmail.com
2025-02-24 12:06:21 -05:00
Robert Haas
c9d94ea215 libpq: Add PQfullProtocolVersion to exports.txt
This is necessary to be able to actually use the function on Windows;
bug introduced in commit cdb6b0fdb0.

Author: Jelte Fennema-Nio <postgres@jeltef.nl>
Discussion: https://postgr.es/m/CAGECzQTfc_O+HXqAo5_-xG4r3EFVsTefUeQzSvhEyyLDba-O9w@mail.gmail.com
2025-02-24 11:47:31 -05:00
Tom Lane
9de2cc455e Fix confusion about data type of pg_class.relpages and relallvisible.
Although they're exposed as int4 in pg_class, relpages and
relallvisible are really of type BlockNumber, that is uint32.
Correct type puns in relation_statistics_update() and remove
inappropriate range-checks.  The type puns are only cosmetic
issues, but the range checks would cause failures with huge
relations.

Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Author: Corey Huinker <corey.huinker@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/614341.1740269035@sss.pgh.pa.us
2025-02-24 11:16:04 -05:00
Daniel Gustafsson
e889422d98 pg_amcheck: PQclear query results
While the potential memory leak is small, make sure to PQclear the query
results before disconnecting.

Author: Jiao Shuntian <312199339@qq.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://postgr.es/m/tencent_F34922C91C41E76C734773E767C9FBDB9906@qq.com
2025-02-24 16:03:19 +01:00
Andres Freund
5ee75e32fa Add static asserts for MAX_BACKENDS limiting factors
So far the various dependencies were documented in the comment above
MAX_BACKENDS, but not checked.

Discussion: https://postgr.es/m/CA+COZaBO_s3LfALq=b+HcBHFSOEGiApVjrRacCe4VP9m7CJsNQ@mail.gmail.com
2025-02-24 06:23:41 -05:00
Andres Freund
418451bfe1 bufmgr: Make it easier to change number of buffer state bits
In an upcoming commit I'd like to change the number of bits for the usage
count (the current max is 5, fitting in three bits, but we reserve four
bits). Until now that required adjusting a bunch of magic constants; now the
constants are defined based on the number of bits reserved.

Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://postgr.es/m/lxzj26ga6ippdeunz6kuncectr5gfuugmm2ry22qu6hcx6oid6@lzx3sjsqhmt6
Discussion: https://postgr.es/m/riivolmg6uzfvpzfn6wjo3ghwt42rcec43ok6mv4oenfg654y7@x7dbposbskwd
2025-02-24 06:23:41 -05:00
Andres Freund
cd3ccf88aa Base LWLock limits directly on MAX_BACKENDS
Jacob reported that comments for LW_SHARED_MASK referenced a MAX_BACKENDS
limit of 2^23-1, but that MAX_BACKENDS is actually limited to 2^18-1. The
limit was lowered in 48354581a4, but the comment in lwlock.c wasn't updated.

Instead of just fixing the comment, it seems better to directly base the
lwlock defines on MAX_BACKENDS and add static assertions to ensure that there
is enough space. That way there's no comment that can go out of sync in the
future.

As part of that change I noticed that for some reason the high bit wasn't used
for flags, which seems somewhat odd. Redefine the flag values to start at the
highest bit.

Reported-by: Jacob Brazeal <jacob.brazeal@gmail.com>
Reviewed-by: Jacob Brazeal <jacob.brazeal@gmail.com>
Discussion: https://postgr.es/m/CA+COZaBO_s3LfALq=b+HcBHFSOEGiApVjrRacCe4VP9m7CJsNQ@mail.gmail.com
2025-02-24 06:23:41 -05:00
Andres Freund
6394a3a61c Move MAX_BACKENDS to procnumber.h
MAX_BACKENDS influences many things besides the postmaster.  For example, I
noticed that we don't have static assertions ensuring BUF_REFCOUNT_MASK is
big enough for MAX_BACKENDS; adding them would require including
postmaster.h in buf_internals.h, which doesn't seem right.

While at it, add MAX_BACKENDS_BITS, as that's useful in various places for
static assertions (to be added in subsequent commits).

Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://postgr.es/m/wptizm4qt6yikgm2pt52xzyv6ycmqiutloyvypvmagn7xvqkce@d4xuv3mylpg4
2025-02-24 06:23:41 -05:00
John Naylor
0600d276d4 Silence warning in older versions of Valgrind
Due to a misunderstanding on my part, commit 235328ee4 did not go far
enough to silence older versions of Valgrind. For those, it was the bit
scan that was problematic, not the subsequent bit-masking operation. To
fix, use the unaligned path for the trailing bytes. Since we don't have
a bit scan here anymore, also remove some comments and endian-specific
coding around that.

Reported-by: Anton A. Melnikov <a.melnikov@postgrespro.ru>
Discussion: https://postgr.es/m/f3aa2d45-3b28-41c5-9499-a1bc30e0f8ec@postgrespro.ru
Backpatch-through: 17
2025-02-24 18:03:29 +07:00
Michael Paquier
2421e9a51d Remove read/sync fields from pg_stat_wal and GUC track_wal_io_timing
The four following attributes are removed from pg_stat_wal:
* wal_write
* wal_sync
* wal_write_time
* wal_sync_time

a051e71e28 has added an equivalent of this information in pg_stat_io with
more granularity, as the data is now split across backend types, IO
contexts and IO objects.  So, keeping the same information in pg_stat_wal
has little benefit.

Another benefit of this commit is the removal of PendingWalStats,
simplifying an upcoming patch to add per-backend WAL statistics, which
already support IO statistics and which have access to the write/sync
stats data of WAL.

The GUC track_wal_io_timing, which was used to enable or disable the
aggregation of the write and sync timings for WAL, is also removed.
pgstat_prepare_io_time() is simplified.

Bump catalog version.
Bump PGSTAT_FILE_FORMAT_ID, due to the update of PgStat_WalStats.
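
The equivalent information can now be queried from pg_stat_io, for example
(timing columns are populated when track_io_timing is enabled):

    SELECT backend_type, context, writes, write_time, fsyncs, fsync_time
      FROM pg_stat_io
     WHERE object = 'wal';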

Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://postgr.es/m/Z7RkQ0EfYaqqjgz/@ip-10-97-1-34.eu-west-3.compute.internal
2025-02-24 09:51:56 +09:00
Tom Lane
fc0d0ce978 Ignore hash's relallvisible when checking pg_upgrade from pre-v10.
Our cross-version upgrade tests have been failing for some pre-v10
source versions since commit 1fd1bd871.  This turns out to be
because relallvisible may change for tables that have hash indexes,
because the upgrade process forcibly reindexes such indexes to
deal with the changes made in v10.

Fortunately, the set of tables that have such indexes is small
and won't change anymore in those branches.  So just hack up
AdjustUpgrade.pm to not compare the relallvisible values of
those specific tables.

While here, also tighten the regex that suppresses comparison
of version fields.

Discussion: https://postgr.es/m/812817.1740277228@sss.pgh.pa.us
2025-02-23 14:16:26 -05:00
Peter Eisentraut
454c182f85 backend libpq void * argument for binary data
Change some backend libpq functions to take void * for binary data
instead of char *.  This removes the need for numerous casts.

Reviewed-by: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org>
Discussion: https://www.postgresql.org/message-id/flat/fd1fcedb-3492-4fc8-9e3e-74b97f2db6c7%40eisentraut.org
2025-02-23 14:27:02 +01:00
Peter Eisentraut
ebdccead16 SnapBuildRestoreContents() void * argument for binary data
Change internal snapbuild API function to take void * for binary data
instead of char *.  This removes the need for numerous casts.

Reviewed-by: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org>
Discussion: https://www.postgresql.org/message-id/flat/fd1fcedb-3492-4fc8-9e3e-74b97f2db6c7%40eisentraut.org
2025-02-23 12:38:21 +01:00
Michael Paquier
a4e986ef5a Add more tests for utility commands in pipelines
This commit checks interactions with pipelines and implicit transaction
blocks for the following commands that have their own behaviors when
used in pipelines depending on their order in a pipeline and sync
requests:
- SET LOCAL
- REINDEX CONCURRENTLY
- VACUUM
- Subtransactions (SAVEPOINT, ROLLBACK TO SAVEPOINT)

These scenarios could be tested only with pgbench previously.  The
meta-commands of psql controlling pipelines make these easier to
implement and debug, and they can be run in a SQL script.

Author: Anthonin Bonnefoy <anthonin.bonnefoy@datadoghq.com>
Discussion: https://postgr.es/m/CAO6_XqroE7JuMEm1sWz55rp9fAYX2JwmcP_3m_v51vnOFdsLiQ@mail.gmail.com
2025-02-23 16:43:07 +09:00
Peter Eisentraut
f98765f0ce jsonb internal API void * argument for binary data
Change some internal jsonb API functions to take void * for binary
data instead of char *.  This removes the need for numerous casts.

Reviewed-by: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org>
Discussion: https://www.postgresql.org/message-id/flat/fd1fcedb-3492-4fc8-9e3e-74b97f2db6c7%40eisentraut.org
2025-02-23 08:34:55 +01:00
Jeff Davis
cb45dc3afb Documentation fixups for dumping statistics.
Reported-by: Hayato Kuroda (Fujitsu) <kuroda.hayato@fujitsu.com>
Reported-by: Andrew Dunstan <andrew@dunslane.net>
Discussion: https://postgr.es/m/OSCPR01MB149665630030E7F54FDA8B27BF5C72@OSCPR01MB14966.jpnprd01.prod.outlook.com
Discussion: https://postgr.es/m/25d26774-25fa-46f2-9888-c6a707d1fef7@dunslane.net
2025-02-22 10:03:11 -08:00
Álvaro Herrera
bba2fbc623
Change \conninfo to use tabular format
(Initially the proposal was to keep \conninfo alone and add this feature
as \conninfo+, but we decided against keeping the original.)

Also display more fields than before, though not as many as were
suggested during the discussion.  In particular, we don't show 'role'
nor 'session authorization', for both which a case can probably be made.
These can be added as followup commits, if we agree to it.

Some (most?) reviewers actually reviewed rather different versions of
the patch and do not necessarily endorse the current one.

Co-authored-by: Maiquel Grassi <grassi@hotmail.com.br>
Co-authored-by: Hunaid Sohail <hunaidpgml@gmail.com>
Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Reviewed-by: Sami Imseih <simseih@amazon.com>
Reviewed-by: David G. Johnston <david.g.johnston@gmail.com>
Reviewed-by: Jim Jones <jim.jones@uni-muenster.de>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Pavel Luzanov <p.luzanov@postgrespro.ru>
Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com>
Reviewed-by: Erik Wienhold <ewie@ewie.name>
Discussion: https://postgr.es/m/CP8P284MB24965CB63DAC00FC0EA4A475EC462@CP8P284MB2496.BRAP284.PROD.OUTLOOK.COM
2025-02-22 10:05:26 +01:00
Amit Langote
4f1b6e5bb4 Remove unstable test suite added by 525392d57
The 'cached-plan-inval' test suite, introduced in 525392d57 under
src/test/modules/delay_execution, aimed to verify that cached plan
invalidation triggers replanning after deferred locks are taken.
However, its ExecutorStart_hook-based approach relies on lock timing
assumptions that, in retrospect, are fragile. This instability was
exposed by failures on BF animal trilobite, which builds with
CLOBBER_CACHE_ALWAYS.

One option was to dynamically disable the cache behavior that causes
the test suite to fail by setting "debug_discard_caches = 0", but it
seems better to remove the suite. The risk of future failures due to
other cache flush hazards outweighs the benefit of catching real
breakage in the backend behavior it tests.

Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/2990641.1740117879@sss.pgh.pa.us
2025-02-22 15:19:23 +09:00
Andres Freund
f8d7f29b3e Allow lwlocks to be disowned
To implement AIO writes, the backend initiating writes needs to transfer the
lock ownership to the AIO subsystem, so the lock held during the write can be
released in another backend.

Other backends need to be able to "complete" an asynchronously started IO to
avoid deadlocks (consider e.g. one backend starting IO for a buffer and then
waiting for a heavyweight lock held by another backend followed by the
current holder of the heavyweight lock waiting for the IO to complete).

To that end, this commit adds LWLockDisown() and LWLockReleaseDisowned(). If
code uses LWLockDisown() it's the code's responsibility to ensure that the
lock is released in case of errors.

Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://postgr.es/m/1f6b50a7-38ef-4d87-8246-786d39f46ab9@iki.fi
2025-02-21 20:55:23 -05:00
Robert Haas
44cbba9a7f Adjust EXPLAIN test case to filter out "Actual Rows" values.
Per the buildfarm, these tests appear to be unstable in the wake of
commit ddb17e387a. I'm not sure that
just hiding this output is the right way forward, because I think
there may be other test cases that will fail with lower probability
even after this fix. However, it's hard to tell right now, because
this is failing on a number of buildfarm animals. So let's try this
for now to either get a clearer picture of what else is broken, or
as a stopgap until we decide what the permanent fix should be, or
perhaps this will be the permanent fix after all.
2025-02-21 19:20:41 -05:00
Tom Lane
98fc31d649 Avoid race condition between "GRANT role" and "DROP ROLE".
Concurrently dropping either the granted role or the grantee
does not stop GRANT from completing, instead resulting in a
dangling role reference in pg_auth_members.  That's relatively
harmless in the short run, but inconsistent catalog entries
are not a good thing.

This patch solves the problem by adding the granted and grantee
roles as explicit shared dependencies of the pg_auth_members entry.
That's a bit indirect, but it works because the pg_shdepend code
applies the necessary locking and rechecking.

Commit 6566133c5 previously established similar handling for
the grantor column of pg_auth_members; it's not clear why it
didn't cover the other two role OID columns.

A side-effect of this approach is that DROP OWNED BY will now drop
pg_auth_members entries that mention the target role as either the
granted or grantee role.  That's clearly appropriate for the
grantee, since we'll drop its other privileges too.  It doesn't
seem too far out of line for the granted role, since we're
presumably about to drop it and besides we're removing all reasons
why it'd matter to be a member of it.  (One could argue that this
makes DropRole's code to auto-drop pg_auth_members entries
unnecessary, but I chose to leave it in place since perhaps some
people's workflows expect that to work without a DROP OWNED BY.)
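
A sketch of the resulting DROP OWNED BY behavior (role names invented):

    CREATE ROLE alice;
    CREATE ROLE bob;
    GRANT alice TO bob;
    DROP OWNED BY alice;  -- now also removes the membership granted above,
                          -- with alice as either granted role or grantee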

Note to patch readers: CreateRole's first CommandCounterIncrement
call is now unconditional, because this change creates another
case in which it's needed, and it seemed to be more trouble than
it's worth to preserve that micro-optimization.

Arguably this is a bug fix, but the fact that it changes the
expected contents of pg_shdepend seems like not a great thing
to do in the stable branches, and perhaps we don't want the
change in DROP OWNED BY semantics there either.  On the other
hand, I opted not to force a catversion bump in HEAD, because
the presence or absence of these entries doesn't matter for
most purposes.

Reported-by: Virender Singla <virender.cse@gmail.com>
Reviewed-by: Laurenz Albe <laurenz.albe@cybertec.at>
Discussion: https://postgr.es/m/CAM6Zo8woa62ZFHtMKox6a4jb8qQ=w87R2L0K8347iE-juQL2EA@mail.gmail.com
2025-02-21 17:07:01 -05:00
Robert Haas
ddb17e387a Allow EXPLAIN to indicate fractional rows.
When nloops > 1, we now display two digits after the decimal point,
rather than none. This is important because what we print is actually
planstate->instrument->ntuples / nloops, and sometimes what you want
to know is planstate->instrument->ntuples. You can estimate that by
multiplying the displayed row count by the displayed nloops value, but
the fact that the displayed value is rounded makes that inexact. It's
still inexact even if we show these two extra decimal places, but less
so. Perhaps we will agree on a way to further improve this output later,
but for now this seems better than doing nothing.
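
For illustration, a hypothetical fragment of EXPLAIN (ANALYZE, TIMING OFF)
output (node and numbers invented):

    ->  Index Only Scan using t_pkey on t (actual rows=100.33 loops=3)

Previously this would have been shown as "rows=100"; with the extra decimal
places, 100.33 * 3 recovers a total of roughly 301 tuples with much less
rounding error.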

Author: Ibrar Ahmed <ibrar.ahmad@gmail.com>
Author: Ilia Evdokimov <ilya.evdokimov@tantorlabs.com>
Reviewed-by: David G. Johnston <david.g.johnston@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Vignesh C <vignesh21@gmail.com>
Reviewed-by: Greg Stark <stark@mit.edu>
Reviewed-by: Naeem Akhter <akhternaeem@gmail.com>
Reviewed-by: Hamid Akhtar <hamid.akhtar@percona.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Andrei Lepikhov <a.lepikhov@postgrespro.ru>
Reviewed-by: Guillaume Lelarge <guillaume@lelarge.info>
Reviewed-by: Matheus Alcantara <matheusssilv97@gmail.com>
Reviewed-by: Alena Rybakina <a.rybakina@postgrespro.ru>
Discussion: http://postgr.es/m/603c8f070905281830g2e5419c4xad2946d149e21f9d%40mail.gmail.com
2025-02-21 16:14:13 -05:00
Masahiko Sawada
78d3f48895 Add test 005_char_signedness.pl to meson.build.
Oversight in a8238f87f9 where the test has been added.

Discussion: https://postgr.es/m/CB11ADBC-0C3F-4FE0-A678-666EE80CBB07%40amazon.com
2025-02-21 12:31:16 -08:00
Tom Lane
29d75b25b5 Fix pg_dumpall to cope with dangling OIDs in pg_auth_members.
There is a race condition between "GRANT role" and "DROP ROLE",
which allows GRANT to install pg_auth_members entries that refer to
dropped roles.  (Commit 6566133c5 prevented that for the grantor
field, but not for the granted or grantee roles.)  We'll soon fix
that, at least in HEAD, but pg_dumpall needs to cope with the
situation in case of pre-existing inconsistency.  As pg_dumpall
stands, it will emit invalid commands like 'GRANT foo TO ""',
which causes pg_upgrade to fail.  Fix it to emit warnings and skip
those GRANTs, instead.

There was some discussion of removing the problem by changing
dumpRoleMembership's query to use JOIN not LEFT JOIN, but that
would result in silently ignoring such entries.  It seems better
to produce a warning.

Pre-v16 branches already coped with dangling grantor OIDs by simply
omitting the GRANTED BY clause.  I left that behavior as-is, although
it's somewhat inconsistent with the behavior of later branches.

Reported-by: Virender Singla <virender.cse@gmail.com>
Discussion: https://postgr.es/m/CAM6Zo8woa62ZFHtMKox6a4jb8qQ=w87R2L0K8347iE-juQL2EA@mail.gmail.com
Backpatch-through: 13
2025-02-21 13:37:15 -05:00
Masahiko Sawada
dfd8e6c73e Fix an issue with index scan using pg_trgm due to char signedness on different architectures.
GIN and GiST indexes utilizing pg_trgm's opclasses store sorted
trigrams within index tuples. When comparing and sorting each trigram,
pg_trgm treats each trigram as a 'char[3]' array in C. However, the
char type in C can be interpreted as either signed char or unsigned
char, depending on the platform, if the signedness is not explicitly
specified. Consequently, during replication between different CPU
architectures, there was an issue where index scans on standby servers
could not locate matching index tuples due to the differing treatment
of character signedness.

This change introduces comparison functions for trgm that explicitly
handle signed char and unsigned char. The appropriate comparison
function will be dynamically selected based on the character
signedness stored in the control file. Therefore, upgraded clusters
can utilize the indexes without rebuilding, provided the cluster
upgrade occurs on platforms with the same character signedness as the
original cluster initialization.

The default char signedness information was introduced in 44fe30fdab,
so no backpatch.

Reviewed-by: Noah Misch <noah@leadboat.com>
Discussion: https://postgr.es/m/CB11ADBC-0C3F-4FE0-A678-666EE80CBB07%40amazon.com
2025-02-21 10:27:39 -08:00
Masahiko Sawada
1aab680591 pg_upgrade: Add --set-char-signedness to set the default char signedness of new cluster.
This change adds a new option --set-char-signedness to pg_upgrade. It
enables users to set the char signedness explicitly during pg_upgrade. This
helps cases where a user who knows that the v17 source cluster was copied
from x86 (signedness=true) to ARM (signedness=false) can run pg_upgrade
properly without first having to acquire an x86 VM.

Reviewed-by: Noah Misch <noah@leadboat.com>
Discussion: https://postgr.es/m/CB11ADBC-0C3F-4FE0-A678-666EE80CBB07%40amazon.com
2025-02-21 10:23:39 -08:00
Masahiko Sawada
a8238f87f9 pg_upgrade: Preserve default char signedness value from old cluster.
Commit 44fe30fdab introduced the 'default_char_signedness' field in
controlfile. Newly created database clusters always set this field to
'signed'.

This change ensures that pg_upgrade updates the
'default_char_signedness' to 'unsigned' if the source database cluster
has signedness=false. For source clusters from v17 or earlier, which
lack the 'default_char_signedness' information, pg_upgrade assumes the
source cluster was initialized on the same platform where pg_upgrade
is running. It then sets the 'default_char_signedness' value according
to the current platform's default character signedness.

Reviewed-by: Noah Misch <noah@leadboat.com>
Discussion: https://postgr.es/m/CB11ADBC-0C3F-4FE0-A678-666EE80CBB07%40amazon.com
2025-02-21 10:19:40 -08:00
Masahiko Sawada
30666d1857 pg_resetwal: Add --char-signedness option to change the default char signedness.
With the newly added option --char-signedness, pg_resetwal updates the
default char signedness flag in the controlfile. This option is
primarily intended for an upcoming patch that pg_upgrade supports
preserving the default char signedness during upgrades, and is not
meant for manual operation.

Reviewed-by: Noah Misch <noah@leadboat.com>
Discussion: https://postgr.es/m/CB11ADBC-0C3F-4FE0-A678-666EE80CBB07%40amazon.com
2025-02-21 10:14:36 -08:00
Masahiko Sawada
44fe30fdab Add default_char_signedness field to ControlFileData.
The signedness of the 'char' type in C is
implementation-dependent. For instance, 'signed char' is used by
default on x86 CPUs, while 'unsigned char' is used on AArch64
CPUs. Previously, we accidentally let C implementation signedness
affect persistent data. This led to inconsistent results when
comparing char data across different platforms.

This commit introduces a new 'default_char_signedness' field in
ControlFileData to store the signedness of the 'char' type. While this
change does not encourage the use of 'char' without explicitly
specifying its signedness, this field can be used as a hint to ensure
consistent behavior for pre-v18 data files that store data sorted by
the 'char' type on disk (e.g., GIN and GiST indexes), especially in
cross-platform replication scenarios.

Newly created database clusters unconditionally set the default char
signedness to true. pg_upgrade (with an upcoming commit) changes this
flag for clusters if the source database cluster has
signedness=false. As a result, signedness=false setting will become
rare over time. If we had known about the problem during the last
development cycle that forced initdb (v8.3), we would have made all
clusters signed or all clusters unsigned. Making pg_upgrade the only
source of signedness=false will cause the population of database
clusters to converge toward that retrospective ideal.

Bump catalog version (for the catalog changes) and PG_CONTROL_VERSION
(for the additions in ControlFileData).

Reviewed-by: Noah Misch <noah@leadboat.com>
Discussion: https://postgr.es/m/CB11ADBC-0C3F-4FE0-A678-666EE80CBB07%40amazon.com
2025-02-21 10:12:08 -08:00
Bruce Momjian
901a1cf8b4 doc: clarify default checksum behavior in non-master branches
Also simplify and correct data checksum wording in master now that it is
the default.  PG 13 did not have the awkward wording.

Reported-by: Felix <afripowered@gmail.com>

Reviewed-by: Laurenz Albe

Discussion: https://postgr.es/m/173928241056.707.3989867022954178032@wrigleys.postgresql.org

Backpatch-through: 14
2025-02-21 13:03:29 -05:00
Bruce Momjian
6ea0734e41 doc: remove non-breaking space in SGML files, causes make error 2025-02-21 12:15:53 -05:00
Andres Freund
32ce58e9e9 Make test portlock logic work with meson
Previously the portlock logic, added in 9b4eafcaf4, didn't actually work
properly when the tests were run via meson. 9b4eafcaf4 used the
MESON_BUILD_ROOT environment variable to determine the port lock
directory, but that's never set when running the tests.  That meant that
each test used its own portlock dir, unless the PG_TEST_PORT_DIR environment
variable was set.

Fix the problem by setting top_builddir for the environment. That's also used
for the autoconf/make build.

Backpatch back to 16, where meson support was added.

Reported-by: Zharkov Roman <r.zharkov@postgrespro.ru>
Reviewed-by: Andrew Dunstan <andrew@dunslane.net>
Backpatch-through: 16
2025-02-21 11:25:05 -05:00
Michael Paquier
665cafe8a4 Fix cross-version upgrades with XMLSERIALIZE(NO INDENT)
Dumps from versions older than v16 do not know about NO INDENT in a
XMLSERIALIZE() clause.  This commit adjusts AdjustUpgrade.pm so as NO
INDENT is discarded in the contents of the new dump adjusted for
comparison when the old version is v15 or older.  This should be enough
to make the cross-version upgrade tests pass.

Per report from buildfarm member crake.  Oversight in 984410b923.

Reviewed-by: Andrew Dunstan <andrew@dunslane.net>
Discussion: https://postgr.es/m/88b183f1-ebf9-4f51-9144-3704380ccae7@dunslane.net
Backpatch-through: 16
2025-02-21 20:37:31 +09:00
Peter Eisentraut
329304c901 Support text position search functions with nondeterministic collations
This allows using text position search functions with nondeterministic
collations.  These functions are

- position, strpos
- replace
- split_part
- string_to_array
- string_to_table

which all use common internal infrastructure.

There was previously no internal implementation of this, so it was met
with a not-supported error.  This adds the internal implementation and
removes the error.

Unlike with deterministic collations, the search cannot use any
byte-by-byte optimized techniques but has to go substring by
substring.  We also need to consider that the found match could have a
different length than the needle and that there could be substrings of
different length matching at a position.  In most cases, we need to
find the longest such substring (greedy semantics), but this can be
configured by each caller.
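
A minimal sketch of what now works (assuming an ICU build; the collation
name is invented):

    CREATE COLLATION ignore_case
      (provider = icu, locale = 'und-u-ks-level2', deterministic = false);
    SELECT strpos('HIGH HOPES' COLLATE ignore_case, 'ho');  -- 6
    SELECT replace('Hello hello' COLLATE ignore_case, 'HELLO', 'bye');

Before this commit, both SELECTs raised a not-supported error.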

Reviewed-by: Euler Taveira <euler@eulerto.com>
Discussion: https://www.postgresql.org/message-id/flat/582b2613-0900-48ca-8b0d-340c06f4d400@eisentraut.org
2025-02-21 12:21:17 +01:00
Daniel Gustafsson
41336bf085 doc: Add links to olsen93 and ong90 in bibliography
The bibliography entries for olsen93 and ong90 lacked links to
online copies.  While ong90 is available in digital form, the
olsen93 thesis is only available as a physical copy in the UCB
library.  To save people from searching for it, we still link
to it via the UCB library page.

Reported-by: jian he <jian.universality@gmail.com>
Discussion: https://postgr.es/m/CACJufxFcJYdRvzgt59N26XjFp2tFFUXu+VN+x8Uo0NbDUCMCbw@mail.gmail.com
2025-02-21 11:28:42 +01:00
Amit Kapila
b4e0d0c53f Fix a WARNING for data origin discrepancies.
Previously, a WARNING was issued at the time of defining a subscription
with origin=NONE only when the publisher subscribed to the same table from
other publishers, indicating potential data origination from different
origins. However, the publisher can subscribe to the partition ancestors
or partition children of the table from other publishers, which could also
result in mixed-origin data inclusion. So, give a WARNING in those cases
as well.
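
A sketch of a case that now warns (names and connection string invented;
warning text approximate):

    CREATE SUBSCRIPTION sub1
      CONNECTION 'dbname=src'
      PUBLICATION pub_parent
      WITH (origin = none, copy_data = true);
    -- WARNING: subscription "sub1" requested copy_data with origin=NONE
    -- but might copy data that had a different origin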

Reported-by: Sergey Tatarintsev <s.tatarintsev@postgrespro.ru>
Author: Hou Zhijie <houzj.fnst@fujitsu.com>
Author: Shlok Kyal <shlok.kyal.oss@gmail.com>
Reviewed-by: Vignesh C <vignesh21@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Backpatch-through: 16, where it was introduced
Discussion: https://postgr.es/m/5eda6a9c-63cf-404d-8a49-8dcb116a29f3@postgrespro.ru
2025-02-21 14:34:40 +05:30
Michael Paquier
984410b923 Add missing deparsing of [NO] INDENT to XMLSERIALIZE()
NO INDENT is the default, and is added if no explicit indentation
flag was provided with XMLSERIALIZE().
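
A sketch of the effect on deparsed output (view name invented; formatting
approximate):

    CREATE VIEW v AS SELECT xmlserialize(DOCUMENT '<a/>'::xml AS text);
    SELECT pg_get_viewdef('v'::regclass);
    -- now includes: XMLSERIALIZE(DOCUMENT '<a/>'::xml AS text NO INDENT)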

Oversight in 483bdb2afe.

Author: Jim Jones <jim.jones@uni-muenster.de>
Discussion: https://postgr.es/m/bebd457e-5b43-46b3-8fc6-f6a6509483ba@uni-muenster.de
Backpatch-through: 16
2025-02-21 17:30:56 +09:00
Peter Eisentraut
7d6d2c4bbd Drop opcintype from index AM strategy translation API
The type argument wasn't actually necessary.  It was a remnant
of converting the API of the gist strategy translation from using
opclass to using opfamily+opcintype (commits c09e5a6a01,
622f678c10).  For looking up the gist translation function, we used
the convention "amproclefttype = amprocrighttype = opclass's
opcintype" (see pg_amproc.h).  But each operator family should only
have one translation function, and getting the right type for the
lookup is sometimes cumbersome and fragile, so this is all
unnecessarily complicated.

To simplify this, change the gist strategy support procedure to take
"any", "any" as argument.  (This is arbitrary but seems intuitive.
The alternative of using InvalidOid as argument(s) upsets various DDL
commands, so it's not practical.)  Then we don't need opcintype for
the lookup, and we can remove it from all the API layers introduced by
commit c09e5a6a01.

This also adds some more documentation about the correct signature of
the gist support function and adds more checks in gistvalidate().
This was previously underspecified.  (It relied implicitly on
the convention mentioned above.)

Discussion: https://www.postgresql.org/message-id/flat/E72EAA49-354D-4C2E-8EB9-255197F55330@enterprisedb.com
2025-02-21 09:07:16 +01:00
Peter Eisentraut
7202d72787 backend launchers void * arguments for binary data
Change backend launcher functions to take void * for binary data
instead of char *.  This removes the need for numerous casts.

Reviewed-by: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org>
Discussion: https://www.postgresql.org/message-id/flat/fd1fcedb-3492-4fc8-9e3e-74b97f2db6c7%40eisentraut.org
2025-02-21 08:03:33 +01:00
Jeff Davis
b50a554cc8 Fix for pg_restore_attribute_stats().
Use RelationGetIndexExpressions() rather than rd_indexprs directly.

Author: Corey Huinker <corey.huinker@gmail.com>
2025-02-20 22:31:22 -08:00
Michael Paquier
41625ab8ea psql: Add support for pipelines
With \bind, \parse, \bind_named and \close, it is possible to issue
queries from psql using the extended protocol.  However, it was not
possible to send these queries using libpq's pipeline mode.  This
feature has two advantages:
- Testing.  Pipeline tests were only possible with pgbench, using TAP
tests.  It now becomes possible to have more SQL tests that are able to
stress the backend with pipelines and extended queries.  More tests,
discussed on some other threads, will be added in a follow-up commit.
Some external projects in the community had to implement their
own facility to work around this limitation.
- Emulation of custom workloads, with more control over the actions
taken by a client with libpq APIs.  It is possible to emulate more
workload patterns to bottleneck the backend with the extended query
protocol.

This patch adds six new meta-commands to be able to control pipelines:
* \startpipeline starts a new pipeline.  All extended queries are queued
until the end of the pipeline is reached or a sync request is sent and
processed.
* \endpipeline ends an existing pipeline.  All queued commands are sent
to the server and all responses are processed by psql.
* \syncpipeline queues a synchronisation request, without flushing the
commands to the server, equivalent of PQsendPipelineSync().
* \flush, equivalent of PQflush().
* \flushrequest, equivalent of PQsendFlushRequest().
* \getresults reads the server's results for the queries in a pipeline.
Unsent data is automatically pushed when \getresults is called.  It is
possible to control the number of results read in a single meta-command
execution with an optional parameter; 0 means that all the results
should be read.
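
Putting these together, a minimal sketch of a pipelined psql script
(prompts and result output elided; the parameter value is invented):

    \startpipeline
    SELECT $1 \bind 42 \g
    \flushrequest
    \getresults
    \endpipeline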

Author: Anthonin Bonnefoy <anthonin.bonnefoy@datadoghq.com>
Reviewed-by: Jelte Fennema-Nio <postgres@jeltef.nl>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://postgr.es/m/CAO6_XqroE7JuMEm1sWz55rp9fAYX2JwmcP_3m_v51vnOFdsLiQ@mail.gmail.com
2025-02-21 11:19:59 +09:00
Michael Paquier
40af897eb7 Add braces for if block with large comment in psql's common.c
A patch touching this area of the code is under review, and this format
makes the code slightly harder to read.

Extracted from a larger patch by the same author.

Author: Anthonin Bonnefoy <anthonin.bonnefoy@datadoghq.com>
Discussion: https://postgr.es/m/CAO6_XqroE7JuMEm1sWz55rp9fAYX2JwmcP_3m_v51vnOFdsLiQ@mail.gmail.com
2025-02-21 09:18:49 +09:00
Daniel Gustafsson
2c53dec7f4 Add missing entry to oauth_validator test .gitignore
Commit b3f0be788 accidentally missed adding the oauth client test
binary to the relevant .gitignore.

Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/2839306.1740082041@sss.pgh.pa.us
2025-02-20 21:29:21 +01:00
Peter Eisentraut
3e4d868615 Remove various unnecessary (char *) casts
Remove a number of (char *) casts that are unnecessary.  Or in some
cases, rewrite the code to make the purpose of the cast clearer.

Reviewed-by: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org>
Discussion: https://www.postgresql.org/message-id/flat/fd1fcedb-3492-4fc8-9e3e-74b97f2db6c7%40eisentraut.org
2025-02-20 19:49:27 +01:00
Jeff Davis
ab84d0ff80 Trial fix for old cross-version upgrades.
Per buildfarm and reports, it seems that 9.X to 18 upgrades were
failing after commit 1fd1bd8710 due to an incorrect regex. Loosen the
regex to accommodate older versions.

Reported-by: vignesh C <vignesh21@gmail.com>
Reported-by: Andrew Dunstan <andrew@dunslane.net>
Discussion: https://postgr.es/m/CALDaNm3GUs+U8Nt4S=V5zmb+K8-RfAc03vRENS0teeoq0Lc6Tw@mail.gmail.com
Discussion: https://postgr.es/m/ea4cbbc1-c5a5-43d1-9618-8ff3f2155bfe@dunslane.net
2025-02-20 10:21:24 -08:00
Andrew Dunstan
8e4d72573c Ignore blank lines in pgindent exclude files
Currently a blank line matches everything, which is almost never what
someone would want. If they really want that they can use a wildcard
regex to do it.

Author: Zsolt Parragi <zsolt.parragi@percona.com>

Discussion: https://postgr.es/m/CAN4CZFNka+2q3=-Dithr4w65RJfwPaV92T62spEzLn+T4MgcMg@mail.gmail.com
2025-02-20 11:36:07 -05:00
Daniel Gustafsson
9d9a71002a cirrus: Temporarily fix libcurl link error
On FreeBSD the ftp/curl port appears to be missing a minimum
version dependency on libssh2, so the following starts showing
up after upgrading to curl 8.11.1_1:

  libcurl.so.4: Undefined symbol "libssh2_session_callback_set2"

While awaiting an upgrade of the FreeBSD CI images to version 14, work
around the issue.

Author: Jacob Champion <jacob.champion@enterprisedb.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://postgr.es/m/CAOYmi+kZAka0sdxCOBxsQc2ozEZGZKHWU_9nrPXg3sG1NJ-zJw@mail.gmail.com
2025-02-20 16:25:47 +01:00
Daniel Gustafsson
b3f0be788a Add support for OAUTHBEARER SASL mechanism
This commit implements OAUTHBEARER, RFC 7628, and OAuth 2.0 Device
Authorization Grants, RFC 8628.  In order to use this there is a
new pg_hba auth method called oauth.  When speaking to an OAuth-
enabled server, it looks a bit like this:

  $ psql 'host=example.org oauth_issuer=... oauth_client_id=...'
  Visit https://oauth.example.org/login and enter the code: FPQ2-M4BG

Device authorization is currently the only supported flow so the
OAuth issuer must support that in order for users to authenticate.
Third-party clients may however extend this and provide their own
flows.  The built-in device authorization flow is currently not
supported on Windows.

In order for validation to happen server side a new framework for
plugging in OAuth validation modules is added.  As validation is
implementation specific, with no default specified in the standard,
PostgreSQL does not ship with one built-in.  Each pg_hba entry can
specify a specific validator or be left blank for the validator
installed as default.

This adds a requirement on libcurl for the client side support,
which is optional to build, but the server side has no additional
build requirements.  In order to run the tests, Python is required
as this adds an https server written in Python.  Tests are gated
behind PG_TEST_EXTRA as they open ports.

This patch has been a multi-year project with many contributors
involved with reviews and in-depth discussions:  Michael Paquier,
Heikki Linnakangas, Zhihong Yu, Mahendrakar Srinivasarao, Andrey
Chudnovsky and Stephen Frost to name a few.  While Jacob Champion
is the main author there have been some levels of hacking by others.
Daniel Gustafsson contributed the validation module and various bits
and pieces; Thomas Munro wrote the client side support for kqueue.

Author: Jacob Champion <jacob.champion@enterprisedb.com>
Co-authored-by: Daniel Gustafsson <daniel@yesql.se>
Co-authored-by: Thomas Munro <thomas.munro@gmail.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Reviewed-by: Antonin Houska <ah@cybertec.at>
Reviewed-by: Kashif Zeeshan <kashi.zeeshan@gmail.com>
Discussion: https://postgr.es/m/d1b467a78e0e36ed85a09adf979d04cf124a9d4b.camel@vmware.com
2025-02-20 16:25:17 +01:00
Jeff Davis
1fd1bd8710 Transfer statistics during pg_upgrade.
Add support to pg_dump for dumping stats, and use that during
pg_upgrade so that statistics are transferred during upgrade. In most
cases this removes the need for a costly re-analyze after upgrade.

Some statistics are not transferred, such as extended statistics or
statistics with a custom stakind.

Now pg_dump accepts the options --schema-only, --no-schema,
--data-only, --no-data, --statistics-only, and --no-statistics; which
allow all combinations of schema, data, and/or stats. The options are
named this way to preserve compatibility with the previous
--schema-only and --data-only options.

Statistics are in SECTION_DATA, unless the object itself is in
SECTION_POST_DATA.

The stats are represented as calls to pg_restore_relation_stats() and
pg_restore_attribute_stats().
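
A hypothetical sketch of such a call (the exact keyword/argument set is
documented with the functions; values invented):

    SELECT pg_restore_relation_stats(
      'relation', 'public.mytable'::regclass,
      'relpages', 173::integer,
      'reltuples', 10000::real,
      'relallvisible', 83::integer);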

Author: Corey Huinker, Jeff Davis
Reviewed-by: Jian He
Discussion: https://postgr.es/m/CADkLM=fzX7QX6r78fShWDjNN3Vcr4PVAnvXxQ4DiGy6V=0bCUA@mail.gmail.com
Discussion: https://postgr.es/m/CADkLM%3DcB0rF3p_FuWRTMSV0983ihTRpsH%2BOCpNyiqE7Wk0vUWA%40mail.gmail.com
2025-02-20 01:29:06 -08:00
Amit Kapila
7da344b9f8 Improve errdetail message added by ac0e33136a.
Make it consistent with other similar messages.

Author: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Discussion: https://postgr.es/m/20250220.140839.1444694904721968348.horikyota.ntt@gmail.com
2025-02-20 14:02:29 +05:30
Amit Langote
525392d572 Don't lock partitions pruned by initial pruning
Before executing a cached generic plan, AcquireExecutorLocks() in
plancache.c locks all relations in a plan's range table to ensure the
plan is safe for execution. However, this locks runtime-prunable
relations that will later be pruned during "initial" runtime pruning,
introducing unnecessary overhead.

This commit defers locking for such relations to executor startup and
ensures that if the CachedPlan is invalidated due to concurrent DDL
during this window, replanning is triggered. Deferring these locks
avoids unnecessary locking overhead for pruned partitions, resulting
in significant speedup, particularly when many partitions are pruned
during initial runtime pruning.

* Changes to locking when executing generic plans:

AcquireExecutorLocks() now locks only unprunable relations, that is,
those found in PlannedStmt.unprunableRelids (introduced in commit
cbc127917e), to avoid locking runtime-prunable partitions
unnecessarily.  The remaining locks are taken by
ExecDoInitialPruning(), which acquires them only for partitions that
survive pruning.

This deferral does not affect the locks required for permission
checking in InitPlan(), which takes place before initial pruning.
ExecCheckPermissions() now includes an Assert to verify that all
relations undergoing permission checks, none of which can be in the
set of runtime-prunable relations, are properly locked.

* Plan invalidation handling:

Deferring locks introduces a window where prunable relations may be
altered by concurrent DDL, invalidating the plan. A new function,
ExecutorStartCachedPlan(), wraps ExecutorStart() to detect and handle
invalidation caused by deferred locking. If invalidation occurs,
ExecutorStartCachedPlan() updates CachedPlan using the new
UpdateCachedPlan() function and retries execution with the updated
plan. To ensure all code paths that may be affected by this handle
invalidation properly, all callers of ExecutorStart that may execute a
PlannedStmt from a CachedPlan have been updated to use
ExecutorStartCachedPlan() instead.

UpdateCachedPlan() replaces stale plans in CachedPlan.stmt_list. A new
CachedPlan.stmt_context, created as a child of CachedPlan.context,
allows freeing old PlannedStmts while preserving the CachedPlan
structure and its statement list. This ensures that loops over
statements in upstream callers of ExecutorStartCachedPlan() remain
intact.

ExecutorStart() and ExecutorStart_hook implementations now return a
boolean value indicating whether plan initialization succeeded with a
valid PlanState tree in QueryDesc.planstate, or false otherwise, in
which case QueryDesc.planstate is NULL. Hook implementations are
required to call standard_ExecutorStart() at the beginning, and if it
returns false, they should do the same without proceeding.

* Testing:

To verify these changes, the delay_execution module tests scenarios
where cached plans become invalid due to changes in prunable relations
after deferred locks.

* Note to extension authors:

ExecutorStart_hook implementations must verify plan validity after
calling standard_ExecutorStart(), as explained earlier. For example:

    if (prev_ExecutorStart)
        plan_valid = prev_ExecutorStart(queryDesc, eflags);
    else
        plan_valid = standard_ExecutorStart(queryDesc, eflags);

    if (!plan_valid)
        return false;

    <extension-code>

    return true;

Extensions accessing child relations, especially prunable partitions,
via ExecGetRangeTableRelation() must now ensure their RT indexes are
present in es_unpruned_relids (introduced in commit cbc127917e), or
they will encounter an error. This is a strict requirement after this
change, as only relations in that set are locked.

The idea of deferring some locks to executor startup, allowing locks
for prunable partitions to be skipped, was first proposed by Tom Lane.
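
A sketch of a case that benefits ("parted" stands for a table with many
partitions; names invented):

    SET plan_cache_mode = force_generic_plan;
    PREPARE q(int) AS SELECT * FROM parted WHERE id = $1;
    EXECUTE q(42);  -- previously locked every partition up front; now only
                    -- unprunable relations plus the partitions surviving
                    -- initial pruning are locked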

Reviewed-by: Robert Haas <robertmhaas@gmail.com> (earlier versions)
Reviewed-by: David Rowley <dgrowleyml@gmail.com> (earlier versions)
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> (earlier versions)
Reviewed-by: Tomas Vondra <tomas@vondra.me>
Reviewed-by: Junwang Zhao <zhjwpku@gmail.com>
Discussion: https://postgr.es/m/CA+HiwqFGkMSge6TgC9KQzde0ohpAycLQuV7ooitEEpbKB0O_mg@mail.gmail.com
2025-02-20 17:09:48 +09:00
Amit Kapila
4aa6fa3cd0 Include schema/table publications even with exclude options in dump.
The current implementation inconsistently includes the public schema but
not information_schema when those are specified in FOR TABLES IN SCHEMA ...
Apart from that, the current behavior for publications w.r.t. the exclude
table and schema options (--exclude-table, --exclude-schema) differs from
what we do at other places. We try to avoid including publications for
the corresponding tables or schemas when an exclude-table or exclude-schema
option is given, unlike what we do for views using functions defined in a
particular schema or a subscription pointing to publications with their
corresponding exclude options.

I decided not to backpatch this as it leads to a behavior change and we
don't see any field reports about the current behavior.

Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Author: Vignesh C <vignesh21@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/1270733.1734134272@sss.pgh.pa.us
2025-02-20 11:25:29 +05:30
Michael Paquier
f11674f8df doc: Fix typo in section "WAL configuration"
pg_stat_io has an attribute named fsync_time, not sync_time.

Oversight in 2f70871c2b.

Discussion: https://postgr.es/m/Z7RkQ0EfYaqqjgz/@ip-10-97-1-34.eu-west-3.compute.internal
2025-02-20 14:22:00 +09:00
Michael Paquier
4538bd3f1d doc: Add details about object "wal" in pg_stat_io
This commit adds a short description of what kind of activity is tracked
in pg_stat_io for the object "wal", with a link pointing to the section
"WAL configuration" that has a lot of details on the matter.

This should perhaps have been added in a051e71e28, but things are what
they are.

Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://postgr.es/m/Z7RkQ0EfYaqqjgz/@ip-10-97-1-34.eu-west-3.compute.internal
2025-02-20 14:16:23 +09:00
Michael Paquier
2f70871c2b doc: Recommend pg_stat_io rather than pg_stat_wal in WAL configuration
Since a051e71e28, pg_stat_io is able to track statistics for the WAL
activity, providing an equivalent of pg_stat_wal with more granularity
for the fsyncs/writes counts and timings, as the data is split across
backend types.

This commit now recommends pg_stat_io rather than pg_stat_wal in the
section "WAL configuration", some of the latter's attributes being
candidates for removal in a follow-up commit.

Extracted from a larger patch by the same author.

Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://postgr.es/m/Z7RkQ0EfYaqqjgz/@ip-10-97-1-34.eu-west-3.compute.internal
2025-02-20 13:55:00 +09:00
Michael Paquier
71f17823ba Fix FATAL message for invalid recovery timeline at beginning of recovery
If the requested recovery timeline is not reachable, the logged
checkpoint and timeline should be the values read from the
backup_label when it is defined.  The message generated used the values
from the control file in this case, which is fine when recovering from
the control file without a backup_label, but not if there is a
backup_label.

Issue introduced in ee994272ca.  v15 has introduced xlogrecovery.c and
more simplifications in this area (4a92a1c3d1, a27048cbcb), making
this change a bit simpler to think about, so backpatch only down to this
version.

Author: David Steele <david@pgbackrest.org>
Reviewed-by: Andrey M. Borodin <x4mmm@yandex-team.ru>
Reviewed-by: Benoit Lobréau <benoit.lobreau@dalibo.com>
Discussion: https://postgr.es/m/c3d617d4-1696-4aa7-8a4d-5a7d19cc5618@pgbackrest.org
Backpatch-through: 15
2025-02-20 10:42:20 +09:00
Andres Freund
d38bab5edd pgbench: Increase RLIMIT_NOFILE if necessary
pgbench already had code to check if the soft rlimit is too low for the
specified number of connections. If too low, it errored out, telling the user
to increase the limit.

However, we can do better: if the hard limit allows, increase the soft limit
so that it is sufficient for the number of connections.

It is common for the soft limit to be considerably lower than the hard limit,
due to the danger of soft limits > 1024 breaking programs that use
select(2), as explained in [1].

[1]: https://0pointer.net/blog/file-descriptor-limits.html

Author: Jelte Fennema-Nio <postgres@jeltef.nl>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAGECzQQh6VSy3KG4pN1d%3Dh9J%3DD1rStFCMR%2Bt7yh_Kwj-g87aLQ%40mail.gmail.com
2025-02-19 19:35:09 -05:00
Michael Paquier
9b1cb58c5f test_escape: Fix output of --help
The short option name -f was not listed, only its long option name
--force-unsupported.

Author: Japin Li
Discussion: https://postgr.es/m/ME0P300MB04452BD1FB1B277D4C1C20B9B6C52@ME0P300MB0445.AUSP300.PROD.OUTLOOK.COM
Backpatch-through: 13
2025-02-20 09:30:54 +09:00
Tomas Vondra
9ba7bcc894 Correct relation size estimate with low fillfactor
Since commit 29cf61ade3, table_block_relation_estimate_size() considers
fillfactor when estimating the number of rows in a relation before the first
ANALYZE. The formula however did not consider that tuples may be larger than
the space available under the fillfactor, ending up with density 0. This
ultimately means the relation was estimated to contain a single row.

The executor however places at least one tuple per page, even with very
low fillfactor values, so the density should be at least 1. Fixed by
clamping the density estimate using clamp_row_est().
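
A sketch of the kind of setup that was affected (illustrative only; with an
8kB page, fillfactor = 10 leaves well under 1kB per page for new tuples):

    CREATE TABLE wide (a int, b text) WITH (fillfactor = 10);
    -- rows wider than the fillfactor-limited free space previously drove
    -- the pre-ANALYZE density to 0, i.e. a one-row estimate for the whole
    -- relation; the estimate is now clamped to at least one row per page
    EXPLAIN SELECT * FROM wide;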

Reported by Heikki Linnakangas. Fix by me, with regression test inspired
by example provided by Heikki.

Backpatch to 17, where the issue was introduced.

Reported-by: Heikki Linnakangas
Backpatch-through: 17
Discussion: https://postgr.es/m/2bf9d973-7789-4937-a7ca-0af9fb49c71e@iki.fi
2025-02-19 23:53:37 +01:00
Tom Lane
e596e077bb Assert that ExecOpenIndices and ExecCloseIndices are not repeated.
These functions should be called at most once per ResultRelInfo;
it's wasteful to do otherwise, and certainly the pattern of
opening twice and then closing twice is a bad idea.  Moreover,
aminsertcleanup functions might not be prepared to be called twice,
as the just-hardened code in BRIN demonstrates.

This amounts to an API change, since such coding patterns were
safe even if wasteful before v17.  Hence, apply to HEAD only.
(Extension code violating this new rule faces some risk in v17,
but we just fixed brininsertcleanup and there are probably few
other aminsertcleanup functions as yet.  So the odds of breaking
usable code seem higher than the odds of doing something useful
with a back-patch.)

Bug: #18815
Reported-by: Sergey Belyashov <sergey.belyashov@gmail.com>
Discussion: https://postgr.es/m/18815-2a0407cc7f40b327@postgresql.org
2025-02-19 16:45:12 -05:00
Tom Lane
9ff68679b5 Fix crash in brininsertcleanup during logical replication.
Logical replication crashes if the subscriber's partitioned table
has a BRIN index.  There are two independently blamable causes,
and this patch fixes both:

1. brininsertcleanup fails if called twice for the same IndexInfo,
because it half-destroys its BrinInsertState but leaves it still
linked from ii_AmCache.  brininsert would also fail in that state,
so it's pretty hard to see any advantage to this coding.  Fully
remove the BrinInsertState, instead, so that a new brininsert
call would create a new cache.

2. A logical replication subscriber sometimes does ExecOpenIndices
twice on the same ResultRelInfo, followed by doing ExecCloseIndices
twice; the second call reaches the brininsertcleanup bug.  Quite
aside from tickling unexpected cases in aminsertcleanup methods,
this seems very wasteful, because the IndexInfos built in the
first ExecOpenIndices call are just lost during the second call,
and have to be rebuilt at possibly-nontrivial cost.  We should
establish a coding rule that you don't do that.

The problematic coding is that when the target table is partitioned,
apply_handle_tuple_routing calls ExecFindPartition which does
ExecOpenIndices (and expects that ExecCleanupTupleRouting will
close the indexes again).  Using the ResultRelInfo made by
ExecFindPartition, it calls apply_handle_delete_internal or
apply_handle_insert_internal, both of which think they need to do
ExecOpenIndices/ExecCloseIndices for themselves.  They do in the main
non-partitioned code paths, but not here.  The simplest fix is to pull
their ExecOpenIndices/ExecCloseIndices calls out and put them in the
call sites for the non-partitioned cases.  (We could have refactored
apply_handle_update_internal similarly, but I did not do so today
because there's no bug there: the partitioned code path doesn't
call it.)

Also, remove the always-duplicative open/close calls within
apply_handle_tuple_routing itself.
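
A sketch of the crashing setup on the subscriber side (names invented):

    CREATE TABLE t (a int) PARTITION BY RANGE (a);
    CREATE TABLE t1 PARTITION OF t FOR VALUES FROM (0) TO (1000);
    CREATE INDEX ON t USING brin (a);
    CREATE SUBSCRIPTION s CONNECTION 'dbname=src' PUBLICATION p;
    -- replicated inserts routed into t1 previously crashed in
    -- brininsertcleanup via the duplicate ExecOpenIndices/ExecCloseIndices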

Since brininsertcleanup and indeed the whole aminsertcleanup mechanism
are new in v17, there's no observable bug in older branches.  A case
could be made for trying to avoid these duplicative open/close calls
in the older branches, but for now it seems not worth the trouble and
risk of new bugs.

Bug: #18815
Reported-by: Sergey Belyashov <sergey.belyashov@gmail.com>
Discussion: https://postgr.es/m/18815-2a0407cc7f40b327@postgresql.org
Backpatch-through: 17
2025-02-19 16:35:15 -05:00
Tomas Vondra
a1b4f289be Consider BufFiles when adjusting hashjoin parameters
Until now ExecChooseHashTableSize() considered only the size of the
in-memory hash table, and ignored the memory needed for the batch files.
Which can be a significant amount, because each batch needs two BufFiles
(each with a BLCKSZ buffer). The same issue applies to increasing the
number of batches during execution.

It's also possible to trigger a "batch explosion", e.g. due to duplicate
values or skew. We've seen reports of joins with hundreds of thousands
(or even millions) of batches, consuming gigabytes of memory, triggering
OOM errors. These cases may be fairly rare, but it's clearly possible to
hit them.

These issues can't be prevented during planning. Even if we improve
that, it does not help with execution-time batch explosion. We can
however reduce the impact and use as little memory as possible.

This patch improves the behavior by adjusting how the memory is divided
between the hash table and batch files. It may be better to use fewer
batch files, even if it means the hash table will exceed the limit.

The capacity of the hash node may be increased either by doubling the
number of batches, or doubling the size of the in-memory hash table. The
outcome is the same, but the memory usage may be very different. For low
nbatch values it's better to add batches, for high nbatch values it's
better to allow a larger hash table.

The patch considers both options, both during the initial sizing and
then during execution, to minimize how much the limit gets exceeded.
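
For illustration (BLCKSZ = 8kB; the numbers are only a sketch): with a 4MB
hash table limit and nbatch = 1024, the batch files alone already need two
BufFile buffers per batch, i.e. 2 * 1024 * 8kB = 16MB.  At that point,
doubling the number of batches adds another 16MB of file buffers, while
doubling the hash table costs only 4MB more, so allowing a larger hash
table is the cheaper way to add capacity.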

It might seem this patch is relaxing the memory limit by allowing it to
be exceeded. But that's not really the case. It has always been like
that, except the memory used by batches was ignored.

Allowing the hash table to grow may also prevent the batch explosion.
If there's a large batch that can't be split (due to hash collisions or
duplicate values), at some point the memory limit will increase enough
for the batch to fit into the hash table.
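
As a rough illustration of the tradeoff (a hedged sketch with hypothetical
helper names, not the actual ExecChooseHashTableSize() code), each batch
costs two BufFile buffers of BLCKSZ bytes each, so the two ways of doubling
capacity can be compared directly:

    #include <stdbool.h>
    #include <stddef.h>

    #define BLCKSZ 8192

    /* in-memory hash table plus two BufFile buffers per batch */
    static size_t
    total_hash_memory(size_t hash_table_bytes, int nbatch)
    {
        return hash_table_bytes + (size_t) 2 * nbatch * BLCKSZ;
    }

    /*
     * Prefer whichever doubling keeps the combined memory use smaller:
     * for low nbatch the BufFile overhead is small, so doubling batches
     * wins; for high nbatch it is cheaper to grow the hash table.
     */
    static bool
    prefer_more_batches(size_t hash_table_bytes, int nbatch)
    {
        return total_hash_memory(hash_table_bytes, 2 * nbatch) <=
               total_hash_memory(2 * hash_table_bytes, nbatch);
    }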

This patch was in the works for a long time. The early versions were
posted in 2019, and revived every year or two when we happened to get
the next report of OOM due to a hashjoin batch explosion. Each of those
patch versions was reviewed by a couple of people. I'm mentioning only
Melanie Plageman and Robert Haas, because they reviewed the last
version, and the older patches are very different.

Reviewed-by: Melanie Plageman, Robert Haas
Discussion: https://postgr.es/m/7bed6c08-72a0-4ab9-a79c-e01fcdd0940f@vondra.me
Discussion: https://postgr.es/m/20190504003414.bulcbnge3rhwhcsh%40development
Discussion: https://postgr.es/m/20190428141901.5dsbge2ka3rxmpk6%40development
2025-02-19 21:08:20 +01:00
Andres Freund
8b886a4e34 tests: BackgroundPsql: Fix potential for lost errors on windows
This addresses various corner cases in BackgroundPsql:

- On windows stdout and stderr may arrive out of order, leading to errors not
  being reported, or attributed to the wrong statement.

  To fix, emit the "query-separation banner" on both stdout and stderr and
  wait for both.

- Very occasionally the "query-separation banner" would not get removed,
  because we waited only until the banner arrived, but then tried to replace
  the banner plus a trailing newline that might not have arrived yet.

  To fix, wait for banner and newline.

- For interactive psql, replacing $banner\n is not sufficient, because
  interactive psql outputs \r\n.

- For interactive psql, where commands are echoed to stdout, the \echo
  command, rather than its output, would be matched.

  This would sometimes lead to output from the prior query, or wait_connect(),
  being returned in the next command.

  This also affected wait_connect(), leading to sometimes sending queries to
  psql before the connection actually was established.

While debugging these issues I also found that it's hard to know whether a
query separation banner was attributed to the right query. Make that easier by
counting the queries each BackgroundPsql instance has emitted and including
the number in the banner.

Also emit psql stdout/stderr in query() and wait_connect() as Test::More
notes; without that it's rather hard to debug some issues in CI and on the
buildfarm.

As this can cause issues not just in to-be-added tests, but also in existing ones,
backpatch the fix to all supported versions.

Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Noah Misch <noah@leadboat.com>
Discussion: https://postgr.es/m/wmovm6xcbwh7twdtymxuboaoarbvwj2haasd3sikzlb3dkgz76@n45rzycluzft
Backpatch-through: 13
2025-02-19 10:45:48 -05:00
Álvaro Herrera
80d7f99049
Add ATAlterConstraint struct for ALTER .. CONSTRAINT
Replace the use of Constraint with a new ATAlterConstraint struct, which
allows us to pass additional information.  No functionality is added by
this commit.  This is necessary for future work that allows altering
constraints in other ways.

I (Álvaro) took the liberty of restructuring the code for ALTER
CONSTRAINT beyond what Amul did.  The original coding before Amul's
patch was unnecessarily baroque, and this change makes things simpler
by removing one level of subroutine.  Also, partly remove the assumption
that only partitioned tables are relevant (by passing sensible 'recurse'
arguments) and no longer ignore whether ONLY was specified.  I say
'partly' because the current coding only walks down via the 'conparentid'
relationship, which is only used for partitioned tables; but future
patches could handle ONLY or not for other types of constraint changes
for legacy inheritance trees too.

Author: Amul Sul <sulamul@gmail.com>
Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://postgr.es/m/CAAJ_b94bfgPV-8Mw_HwSBeheVwaK9=5s+7+KbBj_NpwXQFgDGg@mail.gmail.com
2025-02-19 13:06:13 +01:00
Alexander Korotkov
e983ee9380 Improve statistics estimation for single-column GROUP BY in sub-queries
This commit follows the idea of commit 4767bc8ff2.  If a sub-query has only
one GROUP BY column, we can consider its output variable as being unique (for
example, in SELECT ... FROM (SELECT x FROM t GROUP BY x) s, the column s.x
cannot contain duplicates). We can employ this fact in the statistics to make
more precise estimations in the upper query block.

Author: Andrei Lepikhov <lepihov@gmail.com>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
2025-02-19 11:59:30 +02:00
Amit Kapila
8a695d7998 Add a test for commit ac0e33136a using the injection point.
This test uses an injection point to bypass the time overhead caused by
the idle_replication_slot_timeout GUC, which has a minimum value of one
minute.

Author: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Author: Nisha Moond <nisha.moond412@gmail.com>
Reviewed-by: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: Vignesh C <vignesh21@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/CALj2ACW4aUe-_uFQOjdWCEN-xXoLGhmvRFnL8SNw_TZ5nJe+aw@mail.gmail.com
2025-02-19 15:02:22 +05:30
Michael Paquier
302cf15759 Add support for LIKE in CREATE FOREIGN TABLE
LIKE enables the creation of foreign tables based on the column
definitions, constraints and objects of the defined source relation(s).

This feature mirrors the behavior of CREATE TABLE LIKE, but ignores
the INCLUDING sub-options that do not make sense for foreign tables:
INDEXES, COMPRESSION, IDENTITY and STORAGE.  The supported sub-options
are COMMENTS, CONSTRAINTS, DEFAULTS, GENERATED and STATISTICS, mapping
with the clauses already supported by the command.

Note that the restriction with LIKE in CREATE FOREIGN TABLE was added in
a0c6dfeecf.

Author: Zhang Mingli
Reviewed-by: Álvaro Herrera, Sami Imseih, Michael Paquier
Discussion: https://postgr.es/m/42d3f855-2275-4361-a42a-826172ca2dc4@Spark
2025-02-19 15:50:37 +09:00
Amit Langote
e7563e3c75 doc: Fix some issues with JSON_TABLE() examples
1. Remove an unused PASSING variable.

2. Adjust formatting of JSON data used in an example to be valid
   under strict mode.

Reported-by: Miłosz Chmura <mieszko4@gmail.com>
Author: Robert Treat <rob@xzilla.net>
Discussion: https://postgr.es/m/173859550337.1071.4748984213168572913@wrigleys.postgresql.org
2025-02-19 15:08:17 +09:00
Amit Kapila
ac0e33136a Invalidate inactive replication slots.
This commit introduces the idle_replication_slot_timeout GUC, which allows
inactive slots to be invalidated at checkpoint time. Because checkpoints
happen at checkpoint_timeout intervals, there can be some lag between when
idle_replication_slot_timeout is exceeded and when the slot invalidation is
triggered at the next checkpoint. To avoid such lags, users can force a
checkpoint (for example, with the CHECKPOINT command) to promptly invalidate
inactive slots.

Note that the idle timeout invalidation mechanism is not applicable for
slots that do not reserve WAL or for slots on the standby server that are
synced from the primary server (i.e., standby slots having 'synced' field
'true'). Synced slots are always considered to be inactive because they
don't perform logical decoding to produce changes.

The slots can become inactive for a long period if a subscriber is down
due to a system error or is inaccessible because of network issues. If such a
situation persists, it might be more practical to recreate the subscriber
rather than attempt to recover the node and wait for it to catch up which
could be time-consuming.

In addition, external tools could create replication slots (e.g., for
migrations or upgrades) and then fail to remove them if an error occurs,
leaving behind unused slots that take up space and resources. Manually cleaning
them up can be tedious and error-prone, and without intervention, these
lingering slots can cause unnecessary WAL retention and system bloat.

As idle_replication_slot_timeout is measured in minutes, any test using it
would be time-consuming. We are planning to commit a follow-up patch that
adds tests using the injection point framework.

Author: Nisha Moond <nisha.moond412@gmail.com>
Author: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Reviewed-by: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Vignesh C <vignesh21@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Hou Zhijie <houzj.fnst@fujitsu.com>
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://postgr.es/m/CALj2ACW4aUe-_uFQOjdWCEN-xXoLGhmvRFnL8SNw_TZ5nJe+aw@mail.gmail.com
Discussion: https://postgr.es/m/OS0PR01MB5716C131A7D80DAE8CB9E88794FC2@OS0PR01MB5716.jpnprd01.prod.outlook.com
2025-02-19 09:29:50 +05:30
Tom Lane
b464e51ab3 Update to latest Snowball sources.
It's been some time since we did this, partly because the upstream
snowball project hasn't formally tagged a new release since 2021.
The main motivation for doing it now is to absorb a bug fix
(their commit e322673a841d9abd69994ae8cd20e191090b6ef4), which
prevents a null pointer dereference crash if SN_create_env() gets
a malloc failure at just the wrong point.  We'll patch the back
branches with only that change, but we might as well do the full
sync dance on HEAD.

Aside from a bunch of mostly-minor tweaks to existing stemmers, this
update adds a new stemmer for Estonian.  It also removes the existing
stemmer for Romanian using ISO-8859-2 encoding.  Upstream apparently
concluded that ISO-8859-2 doesn't provide an adequate representation
of some Romanian characters, and the UTF-8 implementation should be
used instead.

While at it, update the README's instructions for doing a sync,
which were not adjusted when the meson tooling was added.

Thanks to Maksim Korotkov for discovering the null-pointer
bug and submitting the fix to upstream snowball.

Reported-by: Maksim Korotkov <m.korotkov@postgrespro.ru>
Discussion: https://postgr.es/m/1d1a46-67ab1000-21-80c451@83151435
2025-02-18 21:13:54 -05:00
Richard Guo
71d02dc478 Fix unsafe access to BufferDescriptors
When considering a local buffer, the GetBufferDescriptor() call in
BufferGetLSNAtomic() would be retrieving a shared buffer descriptor with a
bad buffer ID.  Since the code checks whether the buffer is shared before
using the retrieved BufferDesc, this issue did not lead to any
malfunction.  Nonetheless this seems like trouble waiting to happen,
so fix it by ensuring that GetBufferDescriptor() is only called when
we know the buffer is shared.
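
The fixed ordering, sketched with the usual bufmgr helpers (simplified from
the real BufferGetLSNAtomic()):

    /* handle local buffers before touching the shared descriptor array */
    if (BufferIsLocal(buffer))
        return PageGetLSN(page);

    /* safe now: the buffer is known to be shared */
    bufHdr = GetBufferDescriptor(buffer - 1);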

Author: Tender Wang <tndrwang@gmail.com>
Reviewed-by: Xuneng Zhou <xunengzhou@gmail.com>
Reviewed-by: Richard Guo <guofenglinux@gmail.com>
Discussion: https://postgr.es/m/CAHewXNku-o46-9cmUgyv6LkSZ25doDrWq32p=oz9kfD8ovVJMg@mail.gmail.com
Backpatch-through: 13
2025-02-19 11:05:35 +09:00
Richard Guo
c39392ebae Fix freeing a child join's SpecialJoinInfo
In try_partitionwise_join, we try to break down the join between two
partitioned relations into joins between matching partitions.  To
achieve this, we iterate through each pair of partitions from the two
joining relations and create child join relations for them.  To reduce
memory accumulation during each iteration, one step we take is freeing
the SpecialJoinInfos created for the child joins.

A child join's SpecialJoinInfo is a copy of the parent join's
SpecialJoinInfo, with some members being translated copies of their
counterparts in the parent.  However, when freeing the bitmapset
members in a child join's SpecialJoinInfo, we failed to check whether
they were translated copies.  As a result, we inadvertently freed the
members that were still in use by the parent SpecialJoinInfo, leading
to crashes when those freed members were accessed.

To fix, check if each member of the child join's SpecialJoinInfo is a
translated copy and free it only if that's the case.  This requires
passing the parent join's SpecialJoinInfo as a parameter to
free_child_join_sjinfo.
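
As a hedged illustration (the bitmapset field names are from
SpecialJoinInfo, but the actual free_child_join_sjinfo() differs in
detail), the rule amounts to freeing a member only when the child's
pointer differs from the parent's:

    static void
    free_child_sjinfo_members(SpecialJoinInfo *child, SpecialJoinInfo *parent)
    {
        /* free only bitmapsets that were translated (copied) for the child */
        if (child->min_lefthand != parent->min_lefthand)
            bms_free(child->min_lefthand);
        if (child->min_righthand != parent->min_righthand)
            bms_free(child->min_righthand);
        if (child->syn_lefthand != parent->syn_lefthand)
            bms_free(child->syn_lefthand);
        if (child->syn_righthand != parent->syn_righthand)
            bms_free(child->syn_righthand);
    }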

Back-patch to v17 where this bug crept in.

Bug: #18806
Reported-by: 孟令彬 <m_lingbin@126.com>
Diagnosed-by: Tender Wang <tndrwang@gmail.com>
Author: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: Amit Langote <amitlangote09@gmail.com>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Discussion: https://postgr.es/m/18806-d70b0c9fdf63dcbf@postgresql.org
Backpatch-through: 17
2025-02-19 10:02:32 +09:00
Michael Paquier
aef6f907f6 test_escape: Fix handling of short options in getopt_long()
This addresses two errors in the module, based on the set of options
supported:
- '-c', for --conninfo, was not listed.
- '-f', for --force-unsupported, was not listed.

While at it, these are now listed in alphabetical order.

Author: Japin Li
Discussion: https://postgr.es/m/ME0P300MB04451FB20CE0346A59C25CADB6FA2@ME0P300MB0445.AUSP300.PROD.OUTLOOK.COM
Backpatch-through: 13
2025-02-19 09:45:42 +09:00
Michael Paquier
f2e4c2b203 Make the description of some GUCs more consistent
This commit improves the description of a couple of GUCs, to be more
consistent with the style of their surroundings:
* array_nulls
* enable_self_join_elimination
* optimize_bounded_sort
* row_security
* synchronize_seqscans

Author: Kyotaro Horiguchi
Discussion: https://postgr.es/m/20250218.103240.1422205966404509831.horikyota.ntt@gmail.com
2025-02-19 08:42:35 +09:00
Bruce Momjian
06dc1ffd24 doc: add example of sign mismatch with POSIX/ISO-8601 time zones
Author: Laurenz Albe

Discussion: https://postgr.es/m/eb4d1e15c6822c1937be1491118500dd9201492f.camel@cybertec.at
2025-02-18 15:51:31 -05:00
Jeff Davis
a1f7f80bfe Update outdated comments in nodeAgg.c.
Author: Zhang Mingli
Reviewed-by: Richard Guo
Discussion: https://postgr.es/m/198a8d1e-0792-4e7f-828e-902aa342f36e@Spark
2025-02-18 10:37:50 -08:00
Melanie Plageman
c623e8593e Reduce scope of heap vacuum per_buffer_data
Move lazy_scan_heap()'s per_buffer_data variable into a tighter scope.
In lazy_scan_heap()'s phase I heap vacuuming, the read stream API
returns a pointer to the next block number to vacuum. As long as
read_stream_next_buffer() returns a valid buffer, per_buffer_data should
always be valid.

Move per_buffer_data into a tighter scope and make sure it is reset to
NULL on each iteration so that we get a core dump instead of bogus data
from a previous block if something goes wrong in the read stream API.
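
A simplified sketch of the scoping pattern (not the exact lazy_scan_heap()
loop):

    while (true)
    {
        void   *per_buffer_data = NULL; /* reset on every iteration */
        Buffer  buf;

        buf = read_stream_next_buffer(stream, &per_buffer_data);
        if (!BufferIsValid(buf))
            break;

        /* per_buffer_data is valid here, and only for this block */
    }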

Suggested-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/626104.1739729538%40sss.pgh.pa.us
2025-02-18 09:29:10 -05:00
Daniel Gustafsson
95ef3d9029 Add PGErrorVerbosity to typedefs.list
PGErrorVerbosity was missing, which resulted in incorrect whitespace
alignment going back all the way to e3860ffa4d.  No backpatch for this,
though, since we don't pgindent backbranches.

Author: Jelte Fennema-Nio <postgres@jeltef.nl>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://postgr.es/m/CAGECzQTVi8n-HW4Q27je-b9ckQk7zf6bS_it42gNvQu+DX0NCQ@mail.gmail.com
2025-02-18 13:23:13 +01:00
David Rowley
593509202f Fix poorly written regression test
bd10ec529 added code to allow redundant functionally dependent GROUP BY
columns to be removed using unique indexes and NOT NULL constraints as
proofs of functional dependency.  In that commit, I (David) added a test
to ensure that when there are multiple indexes available to remove columns
that we pick the index that allows us to remove the most columns.  This
test was faulty as it assumed the t3 table's primary key index was valid
to use as functional dependency proof, but that's not the case since
that's defined as deferrable.

Here we adjust the tests added by that commit to use the t2 table instead.
That's defined with a non-deferrable primary key.

Author: songjinzhou <tsinghualucky912@foxmail.com>
Author: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Japin Li <japinli@hotmail.com>
Discussion: https://postgr.es/m/tencent_CD414C79D39668455DF80D35143B87634C08@qq.com
2025-02-19 00:42:22 +13:00
Amit Kapila
217919dd09 Raise a WARNING for max_slot_wal_keep_size in pg_createsubscriber.
During the pg_createsubscriber execution, it is possible that the required
WAL is removed from the primary/publisher node due to
'max_slot_wal_keep_size'.

This patch raises a WARNING during the '--dry-run' mode if the
'max_slot_wal_keep_size' is set to a non-default value on the
primary/publisher node.

Author: Shubham Khanna <khannashubham1197@gmail.com>
Reviewed-by: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Vignesh C <vignesh21@gmail.com>
Discussion: https://postgr.es/m/CAHv8Rj+deqsQXOMa7Tck8CBQUbsua=+4AuMVQ2=MPM0f-ZHbjA@mail.gmail.com
2025-02-18 12:15:43 +05:30
John Naylor
53d3daa491 Specialize intarray sorting
There is at least one report in the field of storing millions of
integers in arrays, so it seems like a good time to specialize
intarray's qsort function. In doing so, streamline the comparators:
previously there were three: two for sorting (one per direction)
and one passed to qunique_arg. To preserve the early exit in the
case of descending input, pass the direction as an argument to
the comparator. This requires giving up duplicate detection, which
previously allowed skipping the qunique_arg() call. Testing showed
no regressions this way.
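
A hedged sketch of a direction-aware comparator (hypothetical name, using
the qsort_arg-style callback signature; the committed code is specialized
differently):

    static int
    isort_cmp(const void *a, const void *b, void *arg)
    {
        int32   av = *(const int32 *) a;
        int32   bv = *(const int32 *) b;
        bool    ascending = *(bool *) arg;

        if (av == bv)
            return 0;
        /* flip the result for descending input, keeping one comparator */
        if (ascending)
            return (av < bv) ? -1 : 1;
        return (av < bv) ? 1 : -1;
    }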

In passing, get rid of nearby checks that the input has at least
two elements, since preserving them would make some macros less
readable. These are not necessary for correctness, and seem like
premature optimizations.

Author: Andrey M. Borodin <x4mmm@yandex-team.ru>
Discussion: https://postgr.es/m/098A3E67-E4A6-4086-9C66-B1EAEB1DFE1C@yandex-team.ru
2025-02-18 11:04:55 +07:00
Amit Kapila
164bac92f0 Doc: Improve pg_replication_slots.inactive_since description.
Author: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/CAHut+PssvVMTWVtUPto6HbPO8pgVsvtzndt_FdBomA_Oq4zf3w@mail.gmail.com
2025-02-18 09:23:43 +05:30
Thomas Munro
2509b857cc Fix typo in 2a8a0067.
Builds configured with Valgrind but without assertions would fail due to
a typo in the recent change.  This should be included when back-patching
2a8a0067 into v17.
2025-02-18 14:44:59 +13:00
Daniel Gustafsson
9cdc21b533 Fix translator notes in comments
The translator comments detailing what a %s inclusion refers to were
accidentally including too many address types.  In practice this is
not a problem since it's not a translated string, but to minimize any
risk of confusion let's fix them anyway.  Even though this exists in
backbranches there is little use for backpatch as the translation work
has already happened there, so let's avoid the churn.

Author: Japin Li <japinli@hotmail.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://postgr.es/m/ME0P300MB04458DE627480614ABE639D2B6FB2@ME0P300MB0445.AUSP300.PROD.OUTLOOK.COM
2025-02-17 20:23:34 +01:00
1651 changed files with 248187 additions and 139844 deletions

View File

@ -2,6 +2,12 @@
#
# For instructions on how to enable the CI integration in a repository and
# further details, see src/tools/ci/README
#
#
# NB: Different tasks intentionally test with different, non-default,
# configurations, to increase the chance of catching problems. Each task with
# non-obvious non-default documents their oddity at the top of the task,
# prefixed by "SPECIAL:".
env:
@ -23,7 +29,7 @@ env:
MTEST_ARGS: --print-errorlogs --no-rebuild -C build
PGCTLTIMEOUT: 120 # avoids spurious failures during parallel tests
TEMP_CONFIG: ${CIRRUS_WORKING_DIR}/src/tools/ci/pg_ci_base.conf
PG_TEST_EXTRA: kerberos ldap ssl libpq_encryption load_balance
PG_TEST_EXTRA: kerberos ldap ssl libpq_encryption load_balance oauth
# What files to preserve in case tests fail
@ -55,6 +61,10 @@ on_failure_meson: &on_failure_meson
# To avoid unnecessarily spinning up a lot of VMs / containers for entirely
# broken commits, have a minimal task that all others depend on.
#
# SPECIAL:
# - Builds with --auto-features=disabled and thus almost no enabled
# dependencies
task:
name: SanityCheck
@ -125,21 +135,33 @@ task:
src/tools/ci/cores_backtrace.sh linux /tmp/cores
# SPECIAL:
# - Uses postgres specific CPPFLAGS that increase test coverage
# - Specifies configuration options that test reading/writing/copying of node trees
# - Specifies debug_parallel_query=regress, to catch related issues during CI
# - Also runs tests against a running postgres instance, see test_running_script
task:
name: FreeBSD - 13 - Meson
name: FreeBSD - Meson
env:
CPUS: 4
BUILD_JOBS: 4
TEST_JOBS: 8
IMAGE_FAMILY: pg-ci-freebsd-13
IMAGE_FAMILY: pg-ci-freebsd
DISK_SIZE: 50
CCACHE_DIR: /tmp/ccache_dir
CPPFLAGS: -DRELCACHE_FORCE_RELEASE -DENFORCE_REGRESSION_TEST_NAME_RESTRICTIONS
CFLAGS: -Og -ggdb
PG_TEST_INITDB_EXTRA_OPTS: -c debug_copy_parse_plan_trees=on -c debug_write_read_parse_plan_trees=on -c debug_raw_expression_coverage_test=on
# Several buildfarm animals enable these options. Without testing them
# during CI, it would be easy to cause breakage on the buildfarm with CI
# passing.
PG_TEST_INITDB_EXTRA_OPTS: >-
-c debug_copy_parse_plan_trees=on
-c debug_write_read_parse_plan_trees=on
-c debug_raw_expression_coverage_test=on
-c debug_parallel_query=regress
PG_TEST_PG_UPGRADE_MODE: --link
<<: *freebsd_task_template
@ -155,8 +177,7 @@ task:
ccache_cache:
folder: $CCACHE_DIR
# Work around performance issues due to 32KB block size
repartition_script: src/tools/ci/gcp_freebsd_repartition.sh
setup_ram_disk_script: src/tools/ci/gcp_ram_disk.sh
create_user_script: |
pw useradd postgres
chown -R postgres:postgres .
@ -275,7 +296,7 @@ task:
ccache_cache:
folder: $CCACHE_DIR
setup_ram_disk_script: src/tools/ci/gcp_ram_disk.sh
create_user_script: |
useradd postgres
chown -R postgres:users /home/postgres
@ -301,7 +322,7 @@ task:
build
EOF
build_script: su postgres -c 'ninja -C build -j${BUILD_JOBS}'
build_script: su postgres -c 'ninja -C build -j${BUILD_JOBS} ${MBUILD_TARGET}'
upload_caches: ccache
test_world_script: |
@ -329,6 +350,7 @@ LINUX_CONFIGURE_FEATURES: &LINUX_CONFIGURE_FEATURES >-
--with-gssapi
--with-icu
--with-ldap
--with-libcurl
--with-libxml
--with-libxslt
--with-llvm
@ -348,6 +370,7 @@ LINUX_MESON_FEATURES: &LINUX_MESON_FEATURES >-
-Duuid=e2fs
# Check SPECIAL in the matrix: below
task:
env:
CPUS: 4
@ -426,6 +449,10 @@ task:
#DEBIAN_FRONTEND=noninteractive apt-get -y install ...
matrix:
# SPECIAL:
# - Uses address sanitizer, sanitizer failures are typically printed in
# the server log
# - Configures postgres with a small segment size
- name: Linux - Debian Bookworm - Autoconf
env:
@ -444,6 +471,8 @@ task:
--enable-cassert --enable-injection-points --enable-debug \
--enable-tap-tests --enable-nls \
--with-segsize-blocks=6 \
--with-libnuma \
--with-liburing \
\
${LINUX_CONFIGURE_FEATURES} \
\
@ -461,11 +490,18 @@ task:
on_failure:
<<: *on_failure_ac
# SPECIAL:
# - Uses undefined behaviour and alignment sanitizers, sanitizer failures
# are typically printed in the server log
# - Test both 64bit and 32 bit builds
# - uses io_method=io_uring
- name: Linux - Debian Bookworm - Meson
env:
CCACHE_MAXSIZE: "400M" # tests two different builds
SANITIZER_FLAGS: -fsanitize=alignment,undefined
PG_TEST_INITDB_EXTRA_OPTS: >-
-c io_method=io_uring
configure_script: |
su postgres <<-EOF
@ -488,11 +524,21 @@ task:
-Dllvm=disabled \
--pkg-config-path /usr/lib/i386-linux-gnu/pkgconfig/ \
-DPERL=perl5.36-i386-linux-gnu \
-Dlibnuma=disabled \
build-32
EOF
build_script: su postgres -c 'ninja -C build -j${BUILD_JOBS} ${MBUILD_TARGET}'
build_32_script: su postgres -c 'ninja -C build-32 -j${BUILD_JOBS} ${MBUILD_TARGET}'
build_script: |
su postgres <<-EOF
ninja -C build -j${BUILD_JOBS} ${MBUILD_TARGET}
ninja -C build -t missingdeps
EOF
build_32_script: |
su postgres <<-EOF
ninja -C build-32 -j${BUILD_JOBS} ${MBUILD_TARGET}
ninja -C build -t missingdeps
EOF
upload_caches: ccache
@ -521,6 +567,11 @@ task:
cores_script: src/tools/ci/cores_backtrace.sh linux /tmp/cores
# NB: macOS is by far the most expensive OS to run CI for, therefore no
# expensive additional checks should be added.
#
# SPECIAL:
# - Enables --clone for pg_upgrade and pg_combinebackup
task:
name: macOS - Sonoma - Meson
@ -687,6 +738,7 @@ task:
build_script: |
vcvarsall x64
ninja -C build %MBUILD_TARGET%
ninja -C build -t missingdeps
check_world_script: |
vcvarsall x64

View File

@ -14,6 +14,21 @@
#
# $ git log --pretty=format:"%H # %cd%n# %s" $PGINDENTGITHASH -1 --date=iso
918e7287ed20eb1fe280ab6c4056ccf94dcd53a8 # 2025-04-30 19:18:30 +1200
# Fix broken indentation
e1a8b1ad587112e67fdc5aa7b388631dde4dbdda # 2025-04-04 09:38:22 -0500
# Re-pgindent pg_largeobject.c after commit 0d6c477664.
796bdda484c838313959f65e2b700f14ac7c0e66 # 2025-03-18 09:02:36 -0400
# Fix indentation again.
203c1b4cc49455364b6bcab8034900d1c016b9cd # 2025-03-17 16:06:17 -0400
# Fix indentation.
b955df443405e056fd9047ef819a1465654f9d79 # 2025-03-13 12:41:44 +1300
# Fix indentation issue
76aa615943049c04efd36ab4765c06eda89cdfea # 2025-01-31 16:44:24 +0900
# Fix bad indentation introduced in commit d47cbf474

View File

@ -1,5 +1,5 @@
PostgreSQL Database Management System
(formerly known as Postgres, then as Postgres95)
(also known as Postgres, formerly known as Postgres95)
Portions Copyright (c) 1996-2025, PostgreSQL Global Development Group

View File

@ -553,16 +553,20 @@ fi])# PGAC_HAVE_GCC__ATOMIC_INT64_CAS
# the other ones are, on x86-64 platforms)
#
# If the intrinsics are supported, sets pgac_sse42_crc32_intrinsics.
#
# To detect the case where the compiler knows the function but library support
# is missing, we must link not just compile, and store the results in global
# variables so the compiler doesn't optimize away the call.
AC_DEFUN([PGAC_SSE42_CRC32_INTRINSICS],
[define([Ac_cachevar], [AS_TR_SH([pgac_cv_sse42_crc32_intrinsics])])dnl
AC_CACHE_CHECK([for _mm_crc32_u8 and _mm_crc32_u32], [Ac_cachevar],
[AC_LINK_IFELSE([AC_LANG_PROGRAM([#include <nmmintrin.h>
unsigned int crc;
#if defined(__has_attribute) && __has_attribute (target)
__attribute__((target("sse4.2")))
#endif
static int crc32_sse42_test(void)
{
unsigned int crc = 0;
crc = _mm_crc32_u8(crc, 0);
crc = _mm_crc32_u32(crc, 0);
/* return computed value, to prevent the above being optimized away */
@ -577,6 +581,43 @@ fi
undefine([Ac_cachevar])dnl
])# PGAC_SSE42_CRC32_INTRINSICS
# PGAC_AVX512_PCLMUL_INTRINSICS
# ---------------------------
# Check if the compiler supports AVX-512 carryless multiplication
# and three-way exclusive-or instructions used for computing CRC.
# AVX-512F is assumed to be supported if the above are.
#
# If the intrinsics are supported, sets pgac_avx512_pclmul_intrinsics.
AC_DEFUN([PGAC_AVX512_PCLMUL_INTRINSICS],
[define([Ac_cachevar], [AS_TR_SH([pgac_cv_avx512_pclmul_intrinsics])])dnl
AC_CACHE_CHECK([for _mm512_clmulepi64_epi128], [Ac_cachevar],
[AC_LINK_IFELSE([AC_LANG_PROGRAM([#include <immintrin.h>
__m512i x;
__m512i y;
#if defined(__has_attribute) && __has_attribute (target)
__attribute__((target("vpclmulqdq,avx512vl")))
#endif
static int avx512_pclmul_test(void)
{
__m128i z;
y = _mm512_clmulepi64_epi128(x, y, 0);
z = _mm_ternarylogic_epi64(
_mm512_castsi512_si128(y),
_mm512_extracti32x4_epi32(y, 1),
_mm512_extracti32x4_epi32(y, 2),
0x96);
return _mm_crc32_u64(0, _mm_extract_epi64(z, 0));
}],
[return avx512_pclmul_test();])],
[Ac_cachevar=yes],
[Ac_cachevar=no])])
if test x"$Ac_cachevar" = x"yes"; then
pgac_avx512_pclmul_intrinsics=yes
fi
undefine([Ac_cachevar])dnl
])# PGAC_AVX512_PCLMUL_INTRINSICS
# PGAC_ARMV8_CRC32C_INTRINSICS
# ----------------------------
@ -593,9 +634,9 @@ AC_DEFUN([PGAC_ARMV8_CRC32C_INTRINSICS],
AC_CACHE_CHECK([for __crc32cb, __crc32ch, __crc32cw, and __crc32cd with CFLAGS=$1], [Ac_cachevar],
[pgac_save_CFLAGS=$CFLAGS
CFLAGS="$pgac_save_CFLAGS $1"
AC_LINK_IFELSE([AC_LANG_PROGRAM([#include <arm_acle.h>],
[unsigned int crc = 0;
crc = __crc32cb(crc, 0);
AC_LINK_IFELSE([AC_LANG_PROGRAM([#include <arm_acle.h>
unsigned int crc;],
[crc = __crc32cb(crc, 0);
crc = __crc32ch(crc, 0);
crc = __crc32cw(crc, 0);
crc = __crc32cd(crc, 0);
@ -628,9 +669,8 @@ AC_DEFUN([PGAC_LOONGARCH_CRC32C_INTRINSICS],
AC_CACHE_CHECK(
[for __builtin_loongarch_crcc_w_b_w, __builtin_loongarch_crcc_w_h_w, __builtin_loongarch_crcc_w_w_w and __builtin_loongarch_crcc_w_d_w],
[Ac_cachevar],
[AC_LINK_IFELSE([AC_LANG_PROGRAM([],
[unsigned int crc = 0;
crc = __builtin_loongarch_crcc_w_b_w(0, crc);
[AC_LINK_IFELSE([AC_LANG_PROGRAM([unsigned int crc;],
[crc = __builtin_loongarch_crcc_w_b_w(0, crc);
crc = __builtin_loongarch_crcc_w_h_w(0, crc);
crc = __builtin_loongarch_crcc_w_w_w(0, crc);
crc = __builtin_loongarch_crcc_w_d_w(0, crc);
@ -680,22 +720,23 @@ undefine([Ac_cachevar])dnl
AC_DEFUN([PGAC_AVX512_POPCNT_INTRINSICS],
[define([Ac_cachevar], [AS_TR_SH([pgac_cv_avx512_popcnt_intrinsics])])dnl
AC_CACHE_CHECK([for _mm512_popcnt_epi64], [Ac_cachevar],
[AC_LINK_IFELSE([AC_LANG_PROGRAM([#include <immintrin.h>
[AC_LINK_IFELSE([AC_LANG_PROGRAM([[#include <immintrin.h>
#include <stdint.h>
char buf[sizeof(__m512i)];
#if defined(__has_attribute) && __has_attribute (target)
__attribute__((target("avx512vpopcntdq,avx512bw")))
#endif
static int popcount_test(void)
{
const char buf@<:@sizeof(__m512i)@:>@;
int64_t popcnt = 0;
__m512i accum = _mm512_setzero_si512();
const __m512i val = _mm512_maskz_loadu_epi8((__mmask64) 0xf0f0f0f0f0f0f0f0, (const __m512i *) buf);
const __m512i cnt = _mm512_popcnt_epi64(val);
__m512i val = _mm512_maskz_loadu_epi8((__mmask64) 0xf0f0f0f0f0f0f0f0, (const __m512i *) buf);
__m512i cnt = _mm512_popcnt_epi64(val);
accum = _mm512_add_epi64(accum, cnt);
popcnt = _mm512_reduce_add_epi64(accum);
return (int) popcnt;
}],
}]],
[return popcount_test();])],
[Ac_cachevar=yes],
[Ac_cachevar=no])])
@ -704,3 +745,55 @@ if test x"$Ac_cachevar" = x"yes"; then
fi
undefine([Ac_cachevar])dnl
])# PGAC_AVX512_POPCNT_INTRINSICS
# PGAC_SVE_POPCNT_INTRINSICS
# --------------------------
# Check if the compiler supports the SVE popcount instructions using the
# svptrue_b64, svdup_u64, svcntb, svld1_u64, svld1_u8, svadd_u64_x,
# svcnt_u64_x, svcnt_u8_x, svaddv_u64, svaddv_u8, svwhilelt_b8_s32,
# svand_n_u64_x, and svand_n_u8_x intrinsic functions.
#
# If the intrinsics are supported, sets pgac_sve_popcnt_intrinsics.
AC_DEFUN([PGAC_SVE_POPCNT_INTRINSICS],
[define([Ac_cachevar], [AS_TR_SH([pgac_cv_sve_popcnt_intrinsics])])dnl
AC_CACHE_CHECK([for svcnt_x], [Ac_cachevar],
[AC_LINK_IFELSE([AC_LANG_PROGRAM([[#include <arm_sve.h>
char buf[128];
#if defined(__has_attribute) && __has_attribute (target)
__attribute__((target("arch=armv8-a+sve")))
#endif
static int popcount_test(void)
{
svbool_t pred = svptrue_b64();
svuint8_t vec8;
svuint64_t accum1 = svdup_u64(0),
accum2 = svdup_u64(0),
vec64;
char *p = buf;
uint64_t popcnt,
mask = 0x5555555555555555;
vec64 = svand_n_u64_x(pred, svld1_u64(pred, (const uint64_t *) p), mask);
accum1 = svadd_u64_x(pred, accum1, svcnt_u64_x(pred, vec64));
p += svcntb();
vec64 = svand_n_u64_x(pred, svld1_u64(pred, (const uint64_t *) p), mask);
accum2 = svadd_u64_x(pred, accum2, svcnt_u64_x(pred, vec64));
p += svcntb();
popcnt = svaddv_u64(pred, svadd_u64_x(pred, accum1, accum2));
pred = svwhilelt_b8_s32(0, sizeof(buf));
vec8 = svand_n_u8_x(pred, svld1_u8(pred, (const uint8_t *) p), 0x55);
return (int) (popcnt + svaddv_u8(pred, svcnt_u8_x(pred, vec8)));
}]],
[return popcount_test();])],
[Ac_cachevar=yes],
[Ac_cachevar=no])])
if test x"$Ac_cachevar" = x"yes"; then
pgac_sve_popcnt_intrinsics=yes
fi
undefine([Ac_cachevar])dnl
])# PGAC_SVE_POPCNT_INTRINSICS

11
config/config.guess vendored
View File

@ -4,7 +4,7 @@
# shellcheck disable=SC2006,SC2268 # see below for rationale
timestamp='2024-01-01'
timestamp='2024-07-27'
# This file is free software; you can redistribute it and/or modify it
# under the terms of the GNU General Public License as published by
@ -123,7 +123,7 @@ set_cc_for_build() {
dummy=$tmp/dummy
case ${CC_FOR_BUILD-},${HOST_CC-},${CC-} in
,,) echo "int x;" > "$dummy.c"
for driver in cc gcc c89 c99 ; do
for driver in cc gcc c17 c99 c89 ; do
if ($driver -c -o "$dummy.o" "$dummy.c") >/dev/null 2>&1 ; then
CC_FOR_BUILD=$driver
break
@ -634,7 +634,8 @@ EOF
sed 's/^ //' << EOF > "$dummy.c"
#include <sys/systemcfg.h>
main()
int
main ()
{
if (!__power_pc())
exit(1);
@ -718,7 +719,8 @@ EOF
#include <stdlib.h>
#include <unistd.h>
int main ()
int
main ()
{
#if defined(_SC_KERNEL_BITS)
long bits = sysconf(_SC_KERNEL_BITS);
@ -1621,6 +1623,7 @@ cat > "$dummy.c" <<EOF
#endif
#endif
#endif
int
main ()
{
#if defined (sony)

729
config/config.sub vendored
View File

@ -2,9 +2,9 @@
# Configuration validation subroutine script.
# Copyright 1992-2024 Free Software Foundation, Inc.
# shellcheck disable=SC2006,SC2268 # see below for rationale
# shellcheck disable=SC2006,SC2268,SC2162 # see below for rationale
timestamp='2024-01-01'
timestamp='2024-05-27'
# This file is free software; you can redistribute it and/or modify it
# under the terms of the GNU General Public License as published by
@ -120,7 +120,6 @@ case $# in
esac
# Split fields of configuration type
# shellcheck disable=SC2162
saved_IFS=$IFS
IFS="-" read field1 field2 field3 field4 <<EOF
$1
@ -142,10 +141,20 @@ case $1 in
# parts
maybe_os=$field2-$field3
case $maybe_os in
nto-qnx* | linux-* | uclinux-uclibc* \
| uclinux-gnu* | kfreebsd*-gnu* | knetbsd*-gnu* | netbsd*-gnu* \
| netbsd*-eabi* | kopensolaris*-gnu* | cloudabi*-eabi* \
| storm-chaos* | os2-emx* | rtmk-nova* | managarm-* \
cloudabi*-eabi* \
| kfreebsd*-gnu* \
| knetbsd*-gnu* \
| kopensolaris*-gnu* \
| linux-* \
| managarm-* \
| netbsd*-eabi* \
| netbsd*-gnu* \
| nto-qnx* \
| os2-emx* \
| rtmk-nova* \
| storm-chaos* \
| uclinux-gnu* \
| uclinux-uclibc* \
| windows-* )
basic_machine=$field1
basic_os=$maybe_os
@ -161,8 +170,12 @@ case $1 in
esac
;;
*-*)
# A lone config we happen to match not fitting any pattern
case $field1-$field2 in
# Shorthands that happen to contain a single dash
convex-c[12] | convex-c3[248])
basic_machine=$field2-convex
basic_os=
;;
decstation-3100)
basic_machine=mips-dec
basic_os=
@ -170,28 +183,88 @@ case $1 in
*-*)
# Second component is usually, but not always the OS
case $field2 in
# Prevent following clause from handling this valid os
# Do not treat sunos as a manufacturer
sun*os*)
basic_machine=$field1
basic_os=$field2
;;
# Manufacturers
3100* \
| 32* \
| 3300* \
| 3600* \
| 7300* \
| acorn \
| altos* \
| apollo \
| apple \
| atari \
| att* \
| axis \
| be \
| bull \
| cbm \
| ccur \
| cisco \
| commodore \
| convergent* \
| convex* \
| cray \
| crds \
| dec* \
| delta* \
| dg \
| digital \
| dolphin \
| encore* \
| gould \
| harris \
| highlevel \
| hitachi* \
| hp \
| ibm* \
| intergraph \
| isi* \
| knuth \
| masscomp \
| microblaze* \
| mips* \
| motorola* \
| ncr* \
| news \
| next \
| ns \
| oki \
| omron* \
| pc533* \
| rebel \
| rom68k \
| rombug \
| semi \
| sequent* \
| siemens \
| sgi* \
| siemens \
| sim \
| sni \
| sony* \
| stratus \
| sun \
| sun[234]* \
| tektronix \
| tti* \
| ultra \
| unicom* \
| wec \
| winbond \
| wrs)
basic_machine=$field1-$field2
basic_os=
;;
zephyr*)
basic_machine=$field1-unknown
basic_os=$field2
;;
# Manufacturers
dec* | mips* | sequent* | encore* | pc533* | sgi* | sony* \
| att* | 7300* | 3300* | delta* | motorola* | sun[234]* \
| unicom* | ibm* | next | hp | isi* | apollo | altos* \
| convergent* | ncr* | news | 32* | 3600* | 3100* \
| hitachi* | c[123]* | convex* | sun | crds | omron* | dg \
| ultra | tti* | harris | dolphin | highlevel | gould \
| cbm | ns | masscomp | apple | axis | knuth | cray \
| microblaze* | sim | cisco \
| oki | wec | wrs | winbond)
basic_machine=$field1-$field2
basic_os=
;;
*)
basic_machine=$field1
basic_os=$field2
@ -272,26 +345,6 @@ case $1 in
basic_machine=arm-unknown
basic_os=cegcc
;;
convex-c1)
basic_machine=c1-convex
basic_os=bsd
;;
convex-c2)
basic_machine=c2-convex
basic_os=bsd
;;
convex-c32)
basic_machine=c32-convex
basic_os=bsd
;;
convex-c34)
basic_machine=c34-convex
basic_os=bsd
;;
convex-c38)
basic_machine=c38-convex
basic_os=bsd
;;
cray)
basic_machine=j90-cray
basic_os=unicos
@ -714,15 +767,26 @@ case $basic_machine in
vendor=dec
basic_os=tops20
;;
delta | 3300 | motorola-3300 | motorola-delta \
| 3300-motorola | delta-motorola)
delta | 3300 | delta-motorola | 3300-motorola | motorola-delta | motorola-3300)
cpu=m68k
vendor=motorola
;;
dpx2*)
# This used to be dpx2*, but that gets the RS6000-based
# DPX/20 and the x86-based DPX/2-100 wrong. See
# https://oldskool.silicium.org/stations/bull_dpx20.htm
# https://www.feb-patrimoine.com/english/bull_dpx2.htm
# https://www.feb-patrimoine.com/english/unix_and_bull.htm
dpx2 | dpx2[23]00 | dpx2[23]xx)
cpu=m68k
vendor=bull
basic_os=sysv3
;;
dpx2100 | dpx21xx)
cpu=i386
vendor=bull
;;
dpx20)
cpu=rs6000
vendor=bull
;;
encore | umax | mmax)
cpu=ns32k
@ -837,18 +901,6 @@ case $basic_machine in
next | m*-next)
cpu=m68k
vendor=next
case $basic_os in
openstep*)
;;
nextstep*)
;;
ns2*)
basic_os=nextstep2
;;
*)
basic_os=nextstep3
;;
esac
;;
np1)
cpu=np1
@ -937,7 +989,6 @@ case $basic_machine in
;;
*-*)
# shellcheck disable=SC2162
saved_IFS=$IFS
IFS="-" read cpu vendor <<EOF
$basic_machine
@ -972,15 +1023,19 @@ unset -v basic_machine
# Decode basic machines in the full and proper CPU-Company form.
case $cpu-$vendor in
# Here we handle the default manufacturer of certain CPU types in canonical form. It is in
# some cases the only manufacturer, in others, it is the most popular.
# Here we handle the default manufacturer of certain CPU types in canonical form.
# It is in some cases the only manufacturer, in others, it is the most popular.
c[12]-convex | c[12]-unknown | c3[248]-convex | c3[248]-unknown)
vendor=convex
basic_os=${basic_os:-bsd}
;;
craynv-unknown)
vendor=cray
basic_os=${basic_os:-unicosmp}
;;
c90-unknown | c90-cray)
vendor=cray
basic_os=${Basic_os:-unicos}
basic_os=${basic_os:-unicos}
;;
fx80-unknown)
vendor=alliant
@ -1026,11 +1081,29 @@ case $cpu-$vendor in
vendor=alt
basic_os=${basic_os:-linux-gnueabihf}
;;
dpx20-unknown | dpx20-bull)
cpu=rs6000
vendor=bull
# Normalized CPU+vendor pairs that imply an OS, if not otherwise specified
m68k-isi)
basic_os=${basic_os:-sysv}
;;
m68k-sony)
basic_os=${basic_os:-newsos}
;;
m68k-tektronix)
basic_os=${basic_os:-bsd}
;;
m88k-harris)
basic_os=${basic_os:-sysv3}
;;
i386-bull | m68k-bull)
basic_os=${basic_os:-sysv3}
;;
rs6000-bull)
basic_os=${basic_os:-bosx}
;;
mips-sni)
basic_os=${basic_os:-sysv4}
;;
# Here we normalize CPU types irrespective of the vendor
amd64-*)
@ -1038,7 +1111,7 @@ case $cpu-$vendor in
;;
blackfin-*)
cpu=bfin
basic_os=linux
basic_os=${basic_os:-linux}
;;
c54x-*)
cpu=tic54x
@ -1061,7 +1134,7 @@ case $cpu-$vendor in
;;
m68knommu-*)
cpu=m68k
basic_os=linux
basic_os=${basic_os:-linux}
;;
m9s12z-* | m68hcs12z-* | hcs12z-* | s12z-*)
cpu=s12z
@ -1071,7 +1144,7 @@ case $cpu-$vendor in
;;
parisc-*)
cpu=hppa
basic_os=linux
basic_os=${basic_os:-linux}
;;
pentium-* | p5-* | k5-* | k6-* | nexgen-* | viac3-*)
cpu=i586
@ -1085,9 +1158,6 @@ case $cpu-$vendor in
pentium4-*)
cpu=i786
;;
pc98-*)
cpu=i386
;;
ppc-* | ppcbe-*)
cpu=powerpc
;;
@ -1121,9 +1191,6 @@ case $cpu-$vendor in
tx39el-*)
cpu=mipstx39el
;;
x64-*)
cpu=x86_64
;;
xscale-* | xscalee[bl]-*)
cpu=`echo "$cpu" | sed 's/^xscale/arm/'`
;;
@ -1179,90 +1246,227 @@ case $cpu-$vendor in
# Recognize the canonical CPU types that are allowed with any
# company name.
case $cpu in
1750a | 580 \
1750a \
| 580 \
| [cjt]90 \
| a29k \
| aarch64 | aarch64_be | aarch64c | arm64ec \
| aarch64 \
| aarch64_be \
| aarch64c \
| abacus \
| alpha | alphaev[4-8] | alphaev56 | alphaev6[78] \
| alpha64 | alpha64ev[4-8] | alpha64ev56 | alpha64ev6[78] \
| alphapca5[67] | alpha64pca5[67] \
| alpha \
| alpha64 \
| alpha64ev56 \
| alpha64ev6[78] \
| alpha64ev[4-8] \
| alpha64pca5[67] \
| alphaev56 \
| alphaev6[78] \
| alphaev[4-8] \
| alphapca5[67] \
| am33_2.0 \
| amdgcn \
| arc | arceb | arc32 | arc64 \
| arm | arm[lb]e | arme[lb] | armv* \
| avr | avr32 \
| arc \
| arc32 \
| arc64 \
| arceb \
| arm \
| arm64e \
| arm64ec \
| arm[lb]e \
| arme[lb] \
| armv* \
| asmjs \
| avr \
| avr32 \
| ba \
| be32 | be64 \
| bfin | bpf | bs2000 \
| c[123]* | c30 | [cjt]90 | c4x \
| c8051 | clipper | craynv | csky | cydra \
| d10v | d30v | dlx | dsp16xx \
| e2k | elxsi | epiphany \
| f30[01] | f700 | fido | fr30 | frv | ft32 | fx80 \
| javascript \
| h8300 | h8500 \
| hppa | hppa1.[01] | hppa2.0 | hppa2.0[nw] | hppa64 \
| be32 \
| be64 \
| bfin \
| bpf \
| bs2000 \
| c30 \
| c4x \
| c8051 \
| c[123]* \
| clipper \
| craynv \
| csky \
| cydra \
| d10v \
| d30v \
| dlx \
| dsp16xx \
| e2k \
| elxsi \
| epiphany \
| f30[01] \
| f700 \
| fido \
| fr30 \
| frv \
| ft32 \
| fx80 \
| h8300 \
| h8500 \
| hexagon \
| i370 | i*86 | i860 | i960 | ia16 | ia64 \
| ip2k | iq2000 \
| hppa \
| hppa1.[01] \
| hppa2.0 \
| hppa2.0[nw] \
| hppa64 \
| i*86 \
| i370 \
| i860 \
| i960 \
| ia16 \
| ia64 \
| ip2k \
| iq2000 \
| javascript \
| k1om \
| kvx \
| le32 | le64 \
| le32 \
| le64 \
| lm32 \
| loongarch32 | loongarch64 \
| m32c | m32r | m32rle \
| m5200 | m68000 | m680[012346]0 | m68360 | m683?2 | m68k \
| m6811 | m68hc11 | m6812 | m68hc12 | m68hcs12x \
| m88110 | m88k | maxq | mb | mcore | mep | metag \
| microblaze | microblazeel \
| loongarch32 \
| loongarch64 \
| m32c \
| m32r \
| m32rle \
| m5200 \
| m68000 \
| m680[012346]0 \
| m6811 \
| m6812 \
| m68360 \
| m683?2 \
| m68hc11 \
| m68hc12 \
| m68hcs12x \
| m68k \
| m88110 \
| m88k \
| maxq \
| mb \
| mcore \
| mep \
| metag \
| microblaze \
| microblazeel \
| mips* \
| mmix \
| mn10200 | mn10300 \
| mn10200 \
| mn10300 \
| moxie \
| mt \
| msp430 \
| mt \
| nanomips* \
| nds32 | nds32le | nds32be \
| nds32 \
| nds32be \
| nds32le \
| nfp \
| nios | nios2 | nios2eb | nios2el \
| none | np1 | ns16k | ns32k | nvptx \
| nios \
| nios2 \
| nios2eb \
| nios2el \
| none \
| np1 \
| ns16k \
| ns32k \
| nvptx \
| open8 \
| or1k* \
| or32 \
| orion \
| pdp10 \
| pdp11 \
| picochip \
| pdp10 | pdp11 | pj | pjl | pn | power \
| powerpc | powerpc64 | powerpc64le | powerpcle | powerpcspe \
| pj \
| pjl \
| pn \
| power \
| powerpc \
| powerpc64 \
| powerpc64le \
| powerpcle \
| powerpcspe \
| pru \
| pyramid \
| riscv | riscv32 | riscv32be | riscv64 | riscv64be \
| rl78 | romp | rs6000 | rx \
| s390 | s390x \
| riscv \
| riscv32 \
| riscv32be \
| riscv64 \
| riscv64be \
| rl78 \
| romp \
| rs6000 \
| rx \
| s390 \
| s390x \
| score \
| sh | shl \
| sh[1234] | sh[24]a | sh[24]ae[lb] | sh[23]e | she[lb] | sh[lb]e \
| sh[1234]e[lb] | sh[12345][lb]e | sh[23]ele | sh64 | sh64le \
| sparc | sparc64 | sparc64b | sparc64v | sparc86x | sparclet \
| sh \
| sh64 \
| sh64le \
| sh[12345][lb]e \
| sh[1234] \
| sh[1234]e[lb] \
| sh[23]e \
| sh[23]ele \
| sh[24]a \
| sh[24]ae[lb] \
| sh[lb]e \
| she[lb] \
| shl \
| sparc \
| sparc64 \
| sparc64b \
| sparc64v \
| sparc86x \
| sparclet \
| sparclite \
| sparcv8 | sparcv9 | sparcv9b | sparcv9v | sv1 | sx* \
| sparcv8 \
| sparcv9 \
| sparcv9b \
| sparcv9v \
| spu \
| sv1 \
| sx* \
| tahoe \
| thumbv7* \
| tic30 | tic4x | tic54x | tic55x | tic6x | tic80 \
| tic30 \
| tic4x \
| tic54x \
| tic55x \
| tic6x \
| tic80 \
| tron \
| ubicom32 \
| v70 | v850 | v850e | v850e1 | v850es | v850e2 | v850e2v3 \
| v70 \
| v810 \
| v850 \
| v850e \
| v850e1 \
| v850e2 \
| v850e2v3 \
| v850es \
| vax \
| vc4 \
| visium \
| w65 \
| wasm32 | wasm64 \
| wasm32 \
| wasm64 \
| we32k \
| x86 | x86_64 | xc16x | xgate | xps100 \
| xstormy16 | xtensa* \
| x86 \
| x86_64 \
| xc16x \
| xgate \
| xps100 \
| xstormy16 \
| xtensa* \
| ymp \
| z8k | z80)
| z80 \
| z8k)
;;
*)
@ -1307,7 +1511,6 @@ case $basic_os in
os=`echo "$basic_os" | sed -e 's|nto-qnx|qnx|'`
;;
*-*)
# shellcheck disable=SC2162
saved_IFS=$IFS
IFS="-" read kernel os <<EOF
$basic_os
@ -1354,6 +1557,23 @@ case $os in
unixware*)
os=sysv4.2uw
;;
# The marketing names for NeXT's operating systems were
# NeXTSTEP, NeXTSTEP 2, OpenSTEP 3, OpenSTEP 4. 'openstep' is
# mapped to 'openstep3', but 'openstep1' and 'openstep2' are
# mapped to 'nextstep' and 'nextstep2', consistent with the
# treatment of SunOS/Solaris.
ns | ns1 | nextstep | nextstep1 | openstep1)
os=nextstep
;;
ns2 | nextstep2 | openstep2)
os=nextstep2
;;
ns3 | nextstep3 | openstep | openstep3)
os=openstep3
;;
ns4 | nextstep4 | openstep4)
os=openstep4
;;
# es1800 is here to avoid being matched by es* (a different OS)
es1800*)
os=ose
@ -1424,6 +1644,7 @@ case $os in
;;
utek*)
os=bsd
vendor=`echo "$vendor" | sed -e 's|^unknown$|tektronix|'`
;;
dynix*)
os=bsd
@ -1440,21 +1661,25 @@ case $os in
386bsd)
os=bsd
;;
ctix* | uts*)
ctix*)
os=sysv
vendor=`echo "$vendor" | sed -e 's|^unknown$|convergent|'`
;;
uts*)
os=sysv
;;
nova*)
os=rtmk-nova
;;
ns2)
os=nextstep2
kernel=rtmk
os=nova
;;
# Preserve the version number of sinix5.
sinix5.*)
os=`echo "$os" | sed -e 's|sinix|sysv|'`
vendor=`echo "$vendor" | sed -e 's|^unknown$|sni|'`
;;
sinix*)
os=sysv4
vendor=`echo "$vendor" | sed -e 's|^unknown$|sni|'`
;;
tpf*)
os=tpf
@ -1595,6 +1820,14 @@ case $cpu-$vendor in
os=
obj=elf
;;
# The -sgi and -siemens entries must be before the mips- entry
# or we get the wrong os.
*-sgi)
os=irix
;;
*-siemens)
os=sysv4
;;
mips*-cisco)
os=
obj=elf
@ -1607,7 +1840,8 @@ case $cpu-$vendor in
os=
obj=coff
;;
*-tti) # must be before sparc entry or we get the wrong os.
# This must be before the sparc-* entry or we get the wrong os.
*-tti)
os=sysv3
;;
sparc-* | *-sun)
@ -1639,7 +1873,7 @@ case $cpu-$vendor in
os=hpux
;;
*-hitachi)
os=hiux
os=hiuxwe2
;;
i860-* | *-att | *-ncr | *-altos | *-motorola | *-convergent)
os=sysv
@ -1683,12 +1917,6 @@ case $cpu-$vendor in
*-encore)
os=bsd
;;
*-sgi)
os=irix
;;
*-siemens)
os=sysv4
;;
*-masscomp)
os=rtu
;;
@ -1735,40 +1963,193 @@ case $os in
ghcjs)
;;
# Now accept the basic system types.
# The portable systems comes first.
# Each alternative MUST end in a * to match a version number.
gnu* | android* | bsd* | mach* | minix* | genix* | ultrix* | irix* \
| *vms* | esix* | aix* | cnk* | sunos | sunos[34]* \
| hpux* | unos* | osf* | luna* | dgux* | auroraux* | solaris* \
| sym* | plan9* | psp* | sim* | xray* | os68k* | v88r* \
| hiux* | abug | nacl* | netware* | windows* \
| os9* | macos* | osx* | ios* | tvos* | watchos* \
| mpw* | magic* | mmixware* | mon960* | lnews* \
| amigaos* | amigados* | msdos* | newsos* | unicos* | aof* \
| aos* | aros* | cloudabi* | sortix* | twizzler* \
| nindy* | vxsim* | vxworks* | ebmon* | hms* | mvs* \
| clix* | riscos* | uniplus* | iris* | isc* | rtu* | xenix* \
| mirbsd* | netbsd* | dicos* | openedition* | ose* \
| bitrig* | openbsd* | secbsd* | solidbsd* | libertybsd* | os108* \
| ekkobsd* | freebsd* | riscix* | lynxos* | os400* \
| bosx* | nextstep* | cxux* | oabi* \
| ptx* | ecoff* | winnt* | domain* | vsta* \
| udi* | lites* | ieee* | go32* | aux* | hcos* \
| chorusrdb* | cegcc* | glidix* | serenity* \
| cygwin* | msys* | moss* | proelf* | rtems* \
| midipix* | mingw32* | mingw64* | mint* \
| uxpv* | beos* | mpeix* | udk* | moxiebox* \
| interix* | uwin* | mks* | rhapsody* | darwin* \
| openstep* | oskit* | conix* | pw32* | nonstopux* \
| storm-chaos* | tops10* | tenex* | tops20* | its* \
| os2* | vos* | palmos* | uclinux* | nucleus* | morphos* \
| scout* | superux* | sysv* | rtmk* | tpf* | windiss* \
| powermax* | dnix* | nx6 | nx7 | sei* | dragonfly* \
| skyos* | haiku* | rdos* | toppers* | drops* | es* \
| onefs* | tirtos* | phoenix* | fuchsia* | redox* | bme* \
| midnightbsd* | amdhsa* | unleashed* | emscripten* | wasi* \
| nsk* | powerunix* | genode* | zvmoe* | qnx* | emx* | zephyr* \
| fiwix* | mlibc* | cos* | mbr* | ironclad* )
abug \
| aix* \
| amdhsa* \
| amigados* \
| amigaos* \
| android* \
| aof* \
| aos* \
| aros* \
| atheos* \
| auroraux* \
| aux* \
| beos* \
| bitrig* \
| bme* \
| bosx* \
| bsd* \
| cegcc* \
| chorusos* \
| chorusrdb* \
| clix* \
| cloudabi* \
| cnk* \
| conix* \
| cos* \
| cxux* \
| cygwin* \
| darwin* \
| dgux* \
| dicos* \
| dnix* \
| domain* \
| dragonfly* \
| drops* \
| ebmon* \
| ecoff* \
| ekkobsd* \
| emscripten* \
| emx* \
| es* \
| fiwix* \
| freebsd* \
| fuchsia* \
| genix* \
| genode* \
| glidix* \
| gnu* \
| go32* \
| haiku* \
| hcos* \
| hiux* \
| hms* \
| hpux* \
| ieee* \
| interix* \
| ios* \
| iris* \
| irix* \
| ironclad* \
| isc* \
| its* \
| l4re* \
| libertybsd* \
| lites* \
| lnews* \
| luna* \
| lynxos* \
| mach* \
| macos* \
| magic* \
| mbr* \
| midipix* \
| midnightbsd* \
| mingw32* \
| mingw64* \
| minix* \
| mint* \
| mirbsd* \
| mks* \
| mlibc* \
| mmixware* \
| mon960* \
| morphos* \
| moss* \
| moxiebox* \
| mpeix* \
| mpw* \
| msdos* \
| msys* \
| mvs* \
| nacl* \
| netbsd* \
| netware* \
| newsos* \
| nextstep* \
| nindy* \
| nonstopux* \
| nova* \
| nsk* \
| nucleus* \
| nx6 \
| nx7 \
| oabi* \
| ohos* \
| onefs* \
| openbsd* \
| openedition* \
| openstep* \
| os108* \
| os2* \
| os400* \
| os68k* \
| os9* \
| ose* \
| osf* \
| oskit* \
| osx* \
| palmos* \
| phoenix* \
| plan9* \
| powermax* \
| powerunix* \
| proelf* \
| psos* \
| psp* \
| ptx* \
| pw32* \
| qnx* \
| rdos* \
| redox* \
| rhapsody* \
| riscix* \
| riscos* \
| rtems* \
| rtmk* \
| rtu* \
| scout* \
| secbsd* \
| sei* \
| serenity* \
| sim* \
| skyos* \
| solaris* \
| solidbsd* \
| sortix* \
| storm-chaos* \
| sunos \
| sunos[34]* \
| superux* \
| syllable* \
| sym* \
| sysv* \
| tenex* \
| tirtos* \
| toppers* \
| tops10* \
| tops20* \
| tpf* \
| tvos* \
| twizzler* \
| uclinux* \
| udi* \
| udk* \
| ultrix* \
| unicos* \
| uniplus* \
| unleashed* \
| unos* \
| uwin* \
| uxpv* \
| v88r* \
|*vms* \
| vos* \
| vsta* \
| vxsim* \
| vxworks* \
| wasi* \
| watchos* \
| wince* \
| windiss* \
| windows* \
| winnt* \
| xenix* \
| xray* \
| zephyr* \
| zvmoe* )
;;
# This one is extra strict with allowed versions
sco3.2v2 | sco3.2v[4-9]* | sco5v6*)
@ -1829,9 +2210,9 @@ esac
case $kernel-$os-$obj in
linux-gnu*- | linux-android*- | linux-dietlibc*- | linux-llvm*- \
| linux-mlibc*- | linux-musl*- | linux-newlib*- \
| linux-relibc*- | linux-uclibc*- )
| linux-relibc*- | linux-uclibc*- | linux-ohos*- )
;;
uclinux-uclibc*- )
uclinux-uclibc*- | uclinux-gnu*- )
;;
managarm-mlibc*- | managarm-kernel*- )
;;
@ -1856,7 +2237,7 @@ case $kernel-$os-$obj in
echo "Invalid configuration '$1': '$os' needs 'windows'." 1>&2
exit 1
;;
kfreebsd*-gnu*- | kopensolaris*-gnu*-)
kfreebsd*-gnu*- | knetbsd*-gnu*- | netbsd*-gnu*- | kopensolaris*-gnu*-)
;;
vxworks-simlinux- | vxworks-simwindows- | vxworks-spe-)
;;
@ -1864,6 +2245,8 @@ case $kernel-$os-$obj in
;;
os2-emx-)
;;
rtmk-nova-)
;;
*-eabi*- | *-gnueabi*-)
;;
none--*)
@ -1890,7 +2273,7 @@ case $vendor in
*-riscix*)
vendor=acorn
;;
*-sunos*)
*-sunos* | *-solaris*)
vendor=sun
;;
*-cnk* | *-aix*)

View File

@ -274,3 +274,83 @@ AC_DEFUN([PGAC_CHECK_STRIP],
AC_SUBST(STRIP_STATIC_LIB)
AC_SUBST(STRIP_SHARED_LIB)
])# PGAC_CHECK_STRIP
# PGAC_CHECK_LIBCURL
# ------------------
# Check for required libraries and headers, and test to see whether the current
# installation of libcurl is thread-safe.
AC_DEFUN([PGAC_CHECK_LIBCURL],
[
AC_CHECK_HEADER(curl/curl.h, [],
[AC_MSG_ERROR([header file <curl/curl.h> is required for --with-libcurl])])
AC_CHECK_LIB(curl, curl_multi_init, [
AC_DEFINE([HAVE_LIBCURL], [1], [Define to 1 if you have the `curl' library (-lcurl).])
AC_SUBST(LIBCURL_LDLIBS, -lcurl)
],
[AC_MSG_ERROR([library 'curl' does not provide curl_multi_init])])
pgac_save_CPPFLAGS=$CPPFLAGS
pgac_save_LDFLAGS=$LDFLAGS
pgac_save_LIBS=$LIBS
CPPFLAGS="$LIBCURL_CPPFLAGS $CPPFLAGS"
LDFLAGS="$LIBCURL_LDFLAGS $LDFLAGS"
LIBS="$LIBCURL_LDLIBS $LIBS"
# Check to see whether the current platform supports threadsafe Curl
# initialization.
AC_CACHE_CHECK([for curl_global_init thread safety], [pgac_cv__libcurl_threadsafe_init],
[AC_RUN_IFELSE([AC_LANG_PROGRAM([
#include <curl/curl.h>
],[
curl_version_info_data *info;
if (curl_global_init(CURL_GLOBAL_ALL))
return -1;
info = curl_version_info(CURLVERSION_NOW);
#ifdef CURL_VERSION_THREADSAFE
if (info->features & CURL_VERSION_THREADSAFE)
return 0;
#endif
return 1;
])],
[pgac_cv__libcurl_threadsafe_init=yes],
[pgac_cv__libcurl_threadsafe_init=no],
[pgac_cv__libcurl_threadsafe_init=unknown])])
if test x"$pgac_cv__libcurl_threadsafe_init" = xyes ; then
AC_DEFINE(HAVE_THREADSAFE_CURL_GLOBAL_INIT, 1,
[Define to 1 if curl_global_init() is guaranteed to be thread-safe.])
fi
# Fail if a thread-friendly DNS resolver isn't built.
AC_CACHE_CHECK([for curl support for asynchronous DNS], [pgac_cv__libcurl_async_dns],
[AC_RUN_IFELSE([AC_LANG_PROGRAM([
#include <curl/curl.h>
],[
curl_version_info_data *info;
if (curl_global_init(CURL_GLOBAL_ALL))
return -1;
info = curl_version_info(CURLVERSION_NOW);
return (info->features & CURL_VERSION_ASYNCHDNS) ? 0 : 1;
])],
[pgac_cv__libcurl_async_dns=yes],
[pgac_cv__libcurl_async_dns=no],
[pgac_cv__libcurl_async_dns=unknown])])
if test x"$pgac_cv__libcurl_async_dns" = xno ; then
AC_MSG_ERROR([
*** The installed version of libcurl does not support asynchronous DNS
*** lookups. Rebuild libcurl with the AsynchDNS feature enabled in order
*** to use it with libpq.])
fi
CPPFLAGS=$pgac_save_CPPFLAGS
LDFLAGS=$pgac_save_LDFLAGS
LIBS=$pgac_save_LIBS
])# PGAC_CHECK_LIBCURL

940
configure vendored

File diff suppressed because it is too large.

View File

@ -17,7 +17,7 @@ dnl Read the Autoconf manual for details.
dnl
m4_pattern_forbid(^PGAC_)dnl to catch undefined macros
AC_INIT([PostgreSQL], [18devel], [pgsql-bugs@lists.postgresql.org], [], [https://www.postgresql.org/])
AC_INIT([PostgreSQL], [18beta1], [pgsql-bugs@lists.postgresql.org], [], [https://www.postgresql.org/])
m4_if(m4_defn([m4_PACKAGE_VERSION]), [2.69], [], [m4_fatal([Autoconf version 2.69 is required.
Untested combinations of 'autoconf' and PostgreSQL versions are not
@ -975,6 +975,18 @@ AC_SUBST(with_readline)
PGAC_ARG_BOOL(with, libedit-preferred, no,
[prefer BSD Libedit over GNU Readline])
#
# liburing
#
AC_MSG_CHECKING([whether to build with liburing support])
PGAC_ARG_BOOL(with, liburing, no, [build with io_uring support, for asynchronous I/O],
[AC_DEFINE([USE_LIBURING], 1, [Define to build with io_uring support. (--with-liburing)])])
AC_MSG_RESULT([$with_liburing])
AC_SUBST(with_liburing)
if test "$with_liburing" = yes; then
PKG_CHECK_MODULES(LIBURING, liburing)
fi
#
# UUID library
@ -1007,6 +1019,62 @@ fi
AC_SUBST(with_uuid)
#
# libcurl
#
AC_MSG_CHECKING([whether to build with libcurl support])
PGAC_ARG_BOOL(with, libcurl, no, [build with libcurl support],
[AC_DEFINE([USE_LIBCURL], 1, [Define to 1 to build with libcurl support. (--with-libcurl)])])
AC_MSG_RESULT([$with_libcurl])
AC_SUBST(with_libcurl)
if test "$with_libcurl" = yes ; then
# Check for libcurl 7.61.0 or higher (corresponding to RHEL8 and the ability
# to explicitly set TLS 1.3 ciphersuites).
PKG_CHECK_MODULES(LIBCURL, [libcurl >= 7.61.0])
# Curl's flags are kept separate from the standard CPPFLAGS/LDFLAGS. We use
# them only for libpq-oauth.
LIBCURL_CPPFLAGS=
LIBCURL_LDFLAGS=
# We only care about -I, -D, and -L switches. Note that -lcurl will be added
# to LIBCURL_LDLIBS by PGAC_CHECK_LIBCURL, below.
for pgac_option in $LIBCURL_CFLAGS; do
case $pgac_option in
-I*|-D*) LIBCURL_CPPFLAGS="$LIBCURL_CPPFLAGS $pgac_option";;
esac
done
for pgac_option in $LIBCURL_LIBS; do
case $pgac_option in
-L*) LIBCURL_LDFLAGS="$LIBCURL_LDFLAGS $pgac_option";;
esac
done
AC_SUBST(LIBCURL_CPPFLAGS)
AC_SUBST(LIBCURL_LDFLAGS)
# OAuth requires python for testing
if test "$with_python" != yes; then
AC_MSG_WARN([*** OAuth support tests require --with-python to run])
fi
fi
#
# libnuma
#
AC_MSG_CHECKING([whether to build with libnuma support])
PGAC_ARG_BOOL(with, libnuma, no, [build with libnuma support],
[AC_DEFINE([USE_LIBNUMA], 1, [Define to build with NUMA support. (--with-libnuma)])])
AC_MSG_RESULT([$with_libnuma])
AC_SUBST(with_libnuma)
if test "$with_libnuma" = yes ; then
AC_CHECK_LIB(numa, numa_available, [], [AC_MSG_ERROR([library 'libnuma' is required for NUMA support])])
PKG_CHECK_MODULES(LIBNUMA, numa)
fi
#
# XML
#
@ -1294,6 +1362,10 @@ failure. It is possible the compiler isn't looking in the proper directory.
Use --without-zlib to disable zlib support.])])
fi
if test "$with_libcurl" = yes ; then
PGAC_CHECK_LIBCURL
fi
if test "$with_gssapi" = yes ; then
if test "$PORTNAME" != "win32"; then
AC_SEARCH_LIBS(gss_store_cred_into, [gssapi_krb5 gss 'gssapi -lkrb5 -lcrypto'], [],
@ -1329,7 +1401,7 @@ if test "$with_ssl" = openssl ; then
# Function introduced in OpenSSL 1.0.2, not in LibreSSL.
AC_CHECK_FUNCS([SSL_CTX_set_cert_cb])
# Function introduced in OpenSSL 1.1.1, not in LibreSSL.
AC_CHECK_FUNCS([X509_get_signature_info SSL_CTX_set_num_tickets])
AC_CHECK_FUNCS([X509_get_signature_info SSL_CTX_set_num_tickets SSL_CTX_set_keylog_callback])
AC_DEFINE([USE_OPENSSL], 1, [Define to 1 to build with OpenSSL support. (--with-ssl=openssl)])
elif test "$with_ssl" != no ; then
AC_MSG_ERROR([--with-ssl must specify openssl])
@ -1587,6 +1659,13 @@ if test "$PORTNAME" = "win32" ; then
AC_CHECK_HEADERS(crtdefs.h)
fi
if test "$with_libcurl" = yes ; then
# Error out early if this platform can't support libpq-oauth.
if test "$ac_cv_header_sys_event_h" != yes -a "$ac_cv_header_sys_epoll_h" != yes; then
AC_MSG_ERROR([client-side OAuth is not supported on this platform])
fi
fi
##
## Types, structures, compiler characteristics
##
@ -1711,14 +1790,13 @@ AC_CHECK_FUNCS(m4_normalize([
getpeerucred
inet_pton
kqueue
localeconv_l
mbstowcs_l
memset_s
posix_fallocate
ppoll
pthread_is_threaded_np
setproctitle
setproctitle_fast
strchrnul
strsignal
syncfs
sync_file_range
@@ -1752,12 +1830,15 @@ AC_CHECK_DECLS(posix_fadvise, [], [], [#include <fcntl.h>])
]) # fi
AC_CHECK_DECLS(fdatasync, [], [], [#include <unistd.h>])
AC_CHECK_DECLS([strlcat, strlcpy, strnlen, strsep])
AC_CHECK_DECLS([strlcat, strlcpy, strnlen, strsep, timingsafe_bcmp])
# We can't use AC_CHECK_FUNCS to detect these functions, because it
# won't handle deployment target restrictions on macOS
AC_CHECK_DECLS([preadv], [], [], [#include <sys/uio.h>])
AC_CHECK_DECLS([pwritev], [], [], [#include <sys/uio.h>])
AC_CHECK_DECLS([strchrnul], [], [], [#include <string.h>])
AC_CHECK_DECLS([memset_s], [], [], [#define __STDC_WANT_LIB_EXT1__ 1
#include <string.h>])
# This is probably only present on macOS, but may as well check always
AC_CHECK_DECLS(F_FULLFSYNC, [], [], [#include <fcntl.h>])
@@ -1772,6 +1853,7 @@ AC_REPLACE_FUNCS(m4_normalize([
strlcpy
strnlen
strsep
timingsafe_bcmp
]))
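For context: AC_REPLACE_FUNCS compiles a replacement object when the C library lacks a listed function. Below is a minimal sketch of what a timingsafe_bcmp replacement typically looks like (illustrative only, not the tree's actual replacement file): it compares in constant time by accumulating differences rather than returning at the first mismatch.

#include <stddef.h>

int
timingsafe_bcmp(const void *b1, const void *b2, size_t n)
{
	const unsigned char *p1 = b1;
	const unsigned char *p2 = b2;
	int			ret = 0;

	/* Accumulate differences; never branch on the data being compared. */
	for (size_t i = 0; i < n; i++)
		ret |= p1[i] ^ p2[i];
	return ret != 0;
}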
AC_REPLACE_FUNCS(pthread_barrier_wait)
@@ -2016,6 +2098,15 @@ if test x"$host_cpu" = x"x86_64"; then
fi
fi
# Check for SVE popcount intrinsics
#
if test x"$host_cpu" = x"aarch64"; then
PGAC_SVE_POPCNT_INTRINSICS()
if test x"$pgac_sve_popcnt_intrinsics" = x"yes"; then
AC_DEFINE(USE_SVE_POPCNT_WITH_RUNTIME_CHECK, 1, [Define to 1 to use SVE popcount instructions with a runtime check.])
fi
fi
# Check for Intel SSE 4.2 intrinsics to do CRC calculations.
#
PGAC_SSE42_CRC32_INTRINSICS()
@@ -2052,17 +2143,21 @@ AC_SUBST(CFLAGS_CRC)
# Select CRC-32C implementation.
#
# If we are targeting a processor that has Intel SSE 4.2 instructions, we can
# use the special CRC instructions for calculating CRC-32C. If we're not
# targeting such a processor, but we can nevertheless produce code that uses
# the SSE intrinsics, compile both implementations and select which one to use
# at runtime, depending on whether SSE 4.2 is supported by the processor we're
# running on.
# There are three methods of calculating CRC, in order of increasing
# performance:
#
# Similarly, if we are targeting an ARM processor that has the CRC
# instructions that are part of the ARMv8 CRC Extension, use them. And if
# we're not targeting such a processor, but can nevertheless produce code that
# uses the CRC instructions, compile both, and select at runtime.
# 1. The fallback using a lookup table, called slicing-by-8
# 2. CRC-32C instructions (found in e.g. Intel SSE 4.2 and ARMv8 CRC Extension)
# 3. Algorithms using carryless multiplication instructions
# (e.g. Intel PCLMUL and Arm PMULL)
#
# If we can produce code (via function attributes or additional compiler
# flags) that uses #2 (and possibly #3), we compile all implementations
# and select which one to use at runtime, depending on what is supported
# by the processor we're running on.
#
# If we are targeting a processor that has #2, we can use that without
# runtime selection.
#
# Note that we do not use __attribute__((target("..."))) for the ARM CRC
# instructions because until clang 16, using the ARM intrinsics still requires
@@ -2110,7 +2205,7 @@ fi
AC_MSG_CHECKING([which CRC-32C implementation to use])
if test x"$USE_SSE42_CRC32C" = x"1"; then
AC_DEFINE(USE_SSE42_CRC32C, 1, [Define to 1 to use Intel SSE 4.2 CRC instructions.])
PG_CRC32C_OBJS="pg_crc32c_sse42.o"
PG_CRC32C_OBJS="pg_crc32c_sse42.o pg_crc32c_sse42_choose.o"
AC_MSG_RESULT(SSE 4.2)
else
if test x"$USE_SSE42_CRC32C_WITH_RUNTIME_CHECK" = x"1"; then
@@ -2143,6 +2238,19 @@ else
fi
AC_SUBST(PG_CRC32C_OBJS)
# Check for carryless multiplication intrinsics to do vectorized CRC calculations.
#
if test x"$host_cpu" = x"x86_64"; then
PGAC_AVX512_PCLMUL_INTRINSICS()
fi
AC_MSG_CHECKING([for vectorized CRC-32C])
if test x"$pgac_avx512_pclmul_intrinsics" = x"yes"; then
AC_DEFINE(USE_AVX512_CRC32C_WITH_RUNTIME_CHECK, 1, [Define to 1 to use AVX-512 CRC algorithms with a runtime check.])
AC_MSG_RESULT(AVX-512 with runtime check)
else
AC_MSG_RESULT(none)
fi
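The runtime-selection scheme described above generally boils down to a function pointer that resolves itself on first call. A minimal C sketch of that pattern follows; the names and stub bodies are illustrative, not the tree's actual pg_crc32c code.

#include <stddef.h>
#include <stdint.h>

typedef uint32_t (*crc32c_fn) (uint32_t crc, const void *data, size_t len);

static uint32_t
crc32c_sb8(uint32_t crc, const void *data, size_t len)
{
	(void) data;				/* stand-in for the slicing-by-8 fallback */
	(void) len;
	return crc;
}

static uint32_t
crc32c_hw(uint32_t crc, const void *data, size_t len)
{
	(void) data;				/* stand-in for a hardware implementation */
	(void) len;
	return crc;
}

static int
hw_crc_available(void)
{
	return 0;					/* real code would query CPUID or getauxval() */
}

static uint32_t crc32c_choose(uint32_t crc, const void *data, size_t len);

/* Starts out pointing at the chooser; the first call resolves it. */
static crc32c_fn comp_crc32c = crc32c_choose;

static uint32_t
crc32c_choose(uint32_t crc, const void *data, size_t len)
{
	comp_crc32c = hw_crc_available() ? crc32c_hw : crc32c_sb8;
	return comp_crc32c(crc, data, len);
}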
# Select semaphore implementation type.
if test "$PORTNAME" != "win32"; then


@@ -33,6 +33,7 @@ SUBDIRS = \
pg_buffercache \
pg_freespacemap \
pg_logicalinspect \
pg_overexplain \
pg_prewarm \
pg_stat_statements \
pg_surgery \


@@ -3,14 +3,17 @@
MODULE_big = amcheck
OBJS = \
$(WIN32RES) \
verify_common.o \
verify_gin.o \
verify_heapam.o \
verify_nbtree.o
EXTENSION = amcheck
DATA = amcheck--1.3--1.4.sql amcheck--1.2--1.3.sql amcheck--1.1--1.2.sql amcheck--1.0--1.1.sql amcheck--1.0.sql
DATA = amcheck--1.2--1.3.sql amcheck--1.1--1.2.sql amcheck--1.0--1.1.sql amcheck--1.0.sql \
amcheck--1.3--1.4.sql amcheck--1.4--1.5.sql
PGFILEDESC = "amcheck - function for verifying relation integrity"
REGRESS = check check_btree check_heap
REGRESS = check check_btree check_gin check_heap
EXTRA_INSTALL = contrib/pg_walinspect
TAP_TESTS = 1


@@ -0,0 +1,14 @@
/* contrib/amcheck/amcheck--1.4--1.5.sql */
-- complain if script is sourced in psql, rather than via CREATE EXTENSION
\echo Use "ALTER EXTENSION amcheck UPDATE TO '1.5'" to load this file. \quit
-- gin_index_check()
--
CREATE FUNCTION gin_index_check(index regclass)
RETURNS VOID
AS 'MODULE_PATHNAME', 'gin_index_check'
LANGUAGE C STRICT;
REVOKE ALL ON FUNCTION gin_index_check(regclass) FROM PUBLIC;


@@ -1,5 +1,5 @@
# amcheck extension
comment = 'functions for verifying relation integrity'
default_version = '1.4'
default_version = '1.5'
module_pathname = '$libdir/amcheck'
relocatable = true


@@ -57,8 +57,8 @@ ERROR: could not open relation with OID 17
BEGIN;
CREATE INDEX bttest_a_brin_idx ON bttest_a USING brin(id);
SELECT bt_index_parent_check('bttest_a_brin_idx');
ERROR: only B-Tree indexes are supported as targets for verification
DETAIL: Relation "bttest_a_brin_idx" is not a B-Tree index.
ERROR: expected "btree" index as target for verification
DETAIL: Relation "bttest_a_brin_idx" is a brin index.
ROLLBACK;
-- normal check outside of xact
SELECT bt_index_check('bttest_a_idx');


@@ -0,0 +1,78 @@
-- Test of index bulk load
SELECT setseed(1);
setseed
---------
(1 row)
CREATE TABLE "gin_check"("Column1" int[]);
-- posting trees (frequently used entries)
INSERT INTO gin_check select array_agg(round(random()*255) ) from generate_series(1, 100000) as i group by i % 10000;
-- posting leaves (sparse entries)
INSERT INTO gin_check select array_agg(255 + round(random()*100)) from generate_series(1, 100) as i group by i % 100;
CREATE INDEX gin_check_idx on "gin_check" USING GIN("Column1");
SELECT gin_index_check('gin_check_idx');
gin_index_check
-----------------
(1 row)
-- cleanup
DROP TABLE gin_check;
-- Test index inserts
SELECT setseed(1);
setseed
---------
(1 row)
CREATE TABLE "gin_check"("Column1" int[]);
CREATE INDEX gin_check_idx on "gin_check" USING GIN("Column1");
ALTER INDEX gin_check_idx SET (fastupdate = false);
-- posting trees
INSERT INTO gin_check select array_agg(round(random()*255) ) from generate_series(1, 100000) as i group by i % 10000;
-- posting leaves
INSERT INTO gin_check select array_agg(100 + round(random()*255)) from generate_series(1, 100) as i group by i % 100;
SELECT gin_index_check('gin_check_idx');
gin_index_check
-----------------
(1 row)
-- cleanup
DROP TABLE gin_check;
-- Test GIN over text array
SELECT setseed(1);
setseed
---------
(1 row)
CREATE TABLE "gin_check_text_array"("Column1" text[]);
-- posting trees
INSERT INTO gin_check_text_array select array_agg(sha256(round(random()*300)::text::bytea)::text) from generate_series(1, 100000) as i group by i % 10000;
-- posting leaves
INSERT INTO gin_check_text_array select array_agg(sha256(round(random()*300 + 300)::text::bytea)::text) from generate_series(1, 10000) as i group by i % 100;
CREATE INDEX gin_check_text_array_idx on "gin_check_text_array" USING GIN("Column1");
SELECT gin_index_check('gin_check_text_array_idx');
gin_index_check
-----------------
(1 row)
-- cleanup
DROP TABLE gin_check_text_array;
-- Test GIN over jsonb
CREATE TABLE "gin_check_jsonb"("j" jsonb);
INSERT INTO gin_check_jsonb values ('{"a":[["b",{"x":1}],["b",{"x":2}]],"c":3}');
INSERT INTO gin_check_jsonb values ('[[14,2,3]]');
INSERT INTO gin_check_jsonb values ('[1,[14,2,3]]');
CREATE INDEX "gin_check_jsonb_idx" on gin_check_jsonb USING GIN("j" jsonb_path_ops);
SELECT gin_index_check('gin_check_jsonb_idx');
gin_index_check
-----------------
(1 row)
-- cleanup
DROP TABLE gin_check_jsonb;


@@ -1,6 +1,8 @@
# Copyright (c) 2022-2025, PostgreSQL Global Development Group
amcheck_sources = files(
'verify_common.c',
'verify_gin.c',
'verify_heapam.c',
'verify_nbtree.c',
)
@@ -24,6 +26,7 @@ install_data(
'amcheck--1.1--1.2.sql',
'amcheck--1.2--1.3.sql',
'amcheck--1.3--1.4.sql',
'amcheck--1.4--1.5.sql',
kwargs: contrib_data_args,
)
@@ -35,6 +38,7 @@ tests += {
'sql': [
'check',
'check_btree',
'check_gin',
'check_heap',
],
},


@@ -0,0 +1,52 @@
-- Test of index bulk load
SELECT setseed(1);
CREATE TABLE "gin_check"("Column1" int[]);
-- posting trees (frequently used entries)
INSERT INTO gin_check select array_agg(round(random()*255) ) from generate_series(1, 100000) as i group by i % 10000;
-- posting leaves (sparse entries)
INSERT INTO gin_check select array_agg(255 + round(random()*100)) from generate_series(1, 100) as i group by i % 100;
CREATE INDEX gin_check_idx on "gin_check" USING GIN("Column1");
SELECT gin_index_check('gin_check_idx');
-- cleanup
DROP TABLE gin_check;
-- Test index inserts
SELECT setseed(1);
CREATE TABLE "gin_check"("Column1" int[]);
CREATE INDEX gin_check_idx on "gin_check" USING GIN("Column1");
ALTER INDEX gin_check_idx SET (fastupdate = false);
-- posting trees
INSERT INTO gin_check select array_agg(round(random()*255) ) from generate_series(1, 100000) as i group by i % 10000;
-- posting leaves
INSERT INTO gin_check select array_agg(100 + round(random()*255)) from generate_series(1, 100) as i group by i % 100;
SELECT gin_index_check('gin_check_idx');
-- cleanup
DROP TABLE gin_check;
-- Test GIN over text array
SELECT setseed(1);
CREATE TABLE "gin_check_text_array"("Column1" text[]);
-- posting trees
INSERT INTO gin_check_text_array select array_agg(sha256(round(random()*300)::text::bytea)::text) from generate_series(1, 100000) as i group by i % 10000;
-- posting leaves
INSERT INTO gin_check_text_array select array_agg(sha256(round(random()*300 + 300)::text::bytea)::text) from generate_series(1, 10000) as i group by i % 100;
CREATE INDEX gin_check_text_array_idx on "gin_check_text_array" USING GIN("Column1");
SELECT gin_index_check('gin_check_text_array_idx');
-- cleanup
DROP TABLE gin_check_text_array;
-- Test GIN over jsonb
CREATE TABLE "gin_check_jsonb"("j" jsonb);
INSERT INTO gin_check_jsonb values ('{"a":[["b",{"x":1}],["b",{"x":2}]],"c":3}');
INSERT INTO gin_check_jsonb values ('[[14,2,3]]');
INSERT INTO gin_check_jsonb values ('[1,[14,2,3]]');
CREATE INDEX "gin_check_jsonb_idx" on gin_check_jsonb USING GIN("j" jsonb_path_ops);
SELECT gin_index_check('gin_check_jsonb_idx');
-- cleanup
DROP TABLE gin_check_jsonb;


@@ -21,8 +21,9 @@ $node->append_conf('postgresql.conf',
'lock_timeout = ' . (1000 * $PostgreSQL::Test::Utils::timeout_default));
$node->start;
$node->safe_psql('postgres', q(CREATE EXTENSION amcheck));
$node->safe_psql('postgres', q(CREATE TABLE tbl(i int)));
$node->safe_psql('postgres', q(CREATE TABLE tbl(i int, j jsonb)));
$node->safe_psql('postgres', q(CREATE INDEX idx ON tbl(i)));
$node->safe_psql('postgres', q(CREATE INDEX ginidx ON tbl USING gin(j)));
#
# Stress CIC with pgbench.
@@ -40,13 +41,13 @@ $node->pgbench(
{
'002_pgbench_concurrent_transaction' => q(
BEGIN;
INSERT INTO tbl VALUES(0);
INSERT INTO tbl VALUES(0, '{"a":[["b",{"x":1}],["b",{"x":2}]],"c":3}');
COMMIT;
),
'002_pgbench_concurrent_transaction_savepoints' => q(
BEGIN;
SAVEPOINT s1;
INSERT INTO tbl VALUES(0);
INSERT INTO tbl VALUES(0, '[[14,2,3]]');
COMMIT;
),
'002_pgbench_concurrent_cic' => q(
@@ -54,7 +55,10 @@ $node->pgbench(
\if :gotlock
DROP INDEX CONCURRENTLY idx;
CREATE INDEX CONCURRENTLY idx ON tbl(i);
DROP INDEX CONCURRENTLY ginidx;
CREATE INDEX CONCURRENTLY ginidx ON tbl USING gin(j);
SELECT bt_index_check('idx',true);
SELECT gin_index_check('ginidx');
SELECT pg_advisory_unlock(42);
\endif
)


@@ -25,7 +25,7 @@ $node->append_conf('postgresql.conf',
'lock_timeout = ' . (1000 * $PostgreSQL::Test::Utils::timeout_default));
$node->start;
$node->safe_psql('postgres', q(CREATE EXTENSION amcheck));
$node->safe_psql('postgres', q(CREATE TABLE tbl(i int)));
$node->safe_psql('postgres', q(CREATE TABLE tbl(i int, j jsonb)));
#
@@ -41,7 +41,7 @@ my $main_h = $node->background_psql('postgres');
$main_h->query_safe(
q(
BEGIN;
INSERT INTO tbl VALUES(0);
INSERT INTO tbl VALUES(0, '[[14,2,3]]');
));
my $cic_h = $node->background_psql('postgres');
@@ -50,6 +50,7 @@ $cic_h->query_until(
qr/start/, q(
\echo start
CREATE INDEX CONCURRENTLY idx ON tbl(i);
CREATE INDEX CONCURRENTLY ginidx ON tbl USING gin(j);
));
$main_h->query_safe(
@@ -60,7 +61,7 @@ PREPARE TRANSACTION 'a';
$main_h->query_safe(
q(
BEGIN;
INSERT INTO tbl VALUES(0);
INSERT INTO tbl VALUES(0, '[[14,2,3]]');
));
$node->safe_psql('postgres', q(COMMIT PREPARED 'a';));
@@ -69,7 +70,7 @@ $main_h->query_safe(
q(
PREPARE TRANSACTION 'b';
BEGIN;
INSERT INTO tbl VALUES(0);
INSERT INTO tbl VALUES(0, '"mary had a little lamb"');
));
$node->safe_psql('postgres', q(COMMIT PREPARED 'b';));
@@ -86,6 +87,9 @@ $cic_h->quit;
$result = $node->psql('postgres', q(SELECT bt_index_check('idx',true)));
is($result, '0', 'bt_index_check after overlapping 2PC');
$result = $node->psql('postgres', q(SELECT gin_index_check('ginidx')));
is($result, '0', 'gin_index_check after overlapping 2PC');
#
# Server restart shall not change whether prepared xact blocks CIC
@@ -94,7 +98,7 @@ is($result, '0', 'bt_index_check after overlapping 2PC');
$node->safe_psql(
'postgres', q(
BEGIN;
INSERT INTO tbl VALUES(0);
INSERT INTO tbl VALUES(0, '{"a":[["b",{"x":1}],["b",{"x":2}]],"c":3}');
PREPARE TRANSACTION 'spans_restart';
BEGIN;
CREATE TABLE unused ();
@@ -108,12 +112,16 @@ $reindex_h->query_until(
\echo start
DROP INDEX CONCURRENTLY idx;
CREATE INDEX CONCURRENTLY idx ON tbl(i);
DROP INDEX CONCURRENTLY ginidx;
CREATE INDEX CONCURRENTLY ginidx ON tbl USING gin(j);
));
$node->safe_psql('postgres', "COMMIT PREPARED 'spans_restart'");
$reindex_h->quit;
$result = $node->psql('postgres', q(SELECT bt_index_check('idx',true)));
is($result, '0', 'bt_index_check after 2PC and restart');
$result = $node->psql('postgres', q(SELECT gin_index_check('ginidx')));
is($result, '0', 'gin_index_check after 2PC and restart');
#
@@ -136,14 +144,14 @@ $node->pgbench(
{
'003_pgbench_concurrent_2pc' => q(
BEGIN;
INSERT INTO tbl VALUES(0);
INSERT INTO tbl VALUES(0,'null');
PREPARE TRANSACTION 'c:client_id';
COMMIT PREPARED 'c:client_id';
),
'003_pgbench_concurrent_2pc_savepoint' => q(
BEGIN;
SAVEPOINT s1;
INSERT INTO tbl VALUES(0);
INSERT INTO tbl VALUES(0,'[false, "jnvaba", -76, 7, {"_": [1]}, 9]');
PREPARE TRANSACTION 'c:client_id';
COMMIT PREPARED 'c:client_id';
),
@@ -163,7 +171,25 @@ $node->pgbench(
SELECT bt_index_check('idx',true);
SELECT pg_advisory_unlock(42);
\endif
),
'005_pgbench_concurrent_cic' => q(
SELECT pg_try_advisory_lock(42)::integer AS gotginlock \gset
\if :gotginlock
DROP INDEX CONCURRENTLY ginidx;
CREATE INDEX CONCURRENTLY ginidx ON tbl USING gin(j);
SELECT gin_index_check('ginidx');
SELECT pg_advisory_unlock(42);
\endif
),
'006_pgbench_concurrent_ric' => q(
SELECT pg_try_advisory_lock(42)::integer AS gotginlock \gset
\if :gotginlock
REINDEX INDEX CONCURRENTLY ginidx;
SELECT gin_index_check('ginidx');
SELECT pg_advisory_unlock(42);
\endif
)
});
$node->stop;


@@ -0,0 +1,191 @@
/*-------------------------------------------------------------------------
*
* verify_common.c
* Utility functions common to all access methods.
*
* Copyright (c) 2016-2025, PostgreSQL Global Development Group
*
* IDENTIFICATION
* contrib/amcheck/verify_common.c
*
*-------------------------------------------------------------------------
*/
#include "postgres.h"
#include "access/genam.h"
#include "access/table.h"
#include "access/tableam.h"
#include "verify_common.h"
#include "catalog/index.h"
#include "catalog/pg_am.h"
#include "commands/tablecmds.h"
#include "utils/guc.h"
#include "utils/syscache.h"
static bool amcheck_index_mainfork_expected(Relation rel);
/*
* Check if index relation should have a file for its main relation fork.
* Verification uses this to skip unlogged indexes when in hot standby mode,
* where there is simply nothing to verify.
*
* NB: Caller should call index_checkable() before calling here.
*/
static bool
amcheck_index_mainfork_expected(Relation rel)
{
if (rel->rd_rel->relpersistence != RELPERSISTENCE_UNLOGGED ||
!RecoveryInProgress())
return true;
ereport(NOTICE,
(errcode(ERRCODE_READ_ONLY_SQL_TRANSACTION),
errmsg("cannot verify unlogged index \"%s\" during recovery, skipping",
RelationGetRelationName(rel))));
return false;
}
/*
* Amcheck main workhorse.
* Given an index relation OID, lock the relation.
* Next, take a number of standard actions:
* 1) make sure the index can be checked,
* 2) switch to the table owner's user context,
* 3) keep track of GUCs modified via index functions,
* 4) execute the callback function to verify integrity.
*/
void
amcheck_lock_relation_and_check(Oid indrelid,
Oid am_id,
IndexDoCheckCallback check,
LOCKMODE lockmode,
void *state)
{
Oid heapid;
Relation indrel;
Relation heaprel;
Oid save_userid;
int save_sec_context;
int save_nestlevel;
/*
* We must lock table before index to avoid deadlocks. However, if the
* passed indrelid isn't an index then IndexGetRelation() will fail.
* Rather than emitting a not-very-helpful error message, postpone
* complaining, expecting that the is-it-an-index test below will fail.
*
* In hot standby mode this will raise an error when lockmode is ShareLock.
*/
heapid = IndexGetRelation(indrelid, true);
if (OidIsValid(heapid))
{
heaprel = table_open(heapid, lockmode);
/*
* Switch to the table owner's userid, so that any index functions are
* run as that user. Also lock down security-restricted operations
* and arrange to make GUC variable changes local to this command.
*/
GetUserIdAndSecContext(&save_userid, &save_sec_context);
SetUserIdAndSecContext(heaprel->rd_rel->relowner,
save_sec_context | SECURITY_RESTRICTED_OPERATION);
save_nestlevel = NewGUCNestLevel();
}
else
{
heaprel = NULL;
/* Set these just to suppress "uninitialized variable" warnings */
save_userid = InvalidOid;
save_sec_context = -1;
save_nestlevel = -1;
}
/*
* Open the target index relations separately (like relation_openrv(), but
* with heap relation locked first to prevent deadlocking). In hot
* standby mode this will raise an error when lockmode is ShareLock.
*
* There is no need for the usual indcheckxmin usability horizon test
* here, even in the heapallindexed case, because index undergoing
* verification only needs to have entries for a new transaction snapshot.
* (If this is a parentcheck verification, there is no question about
* committed or recently dead heap tuples lacking index entries due to
* concurrent activity.)
*/
indrel = index_open(indrelid, lockmode);
/*
* Since we did the IndexGetRelation call above without any lock, it's
* barely possible that a race against an index drop/recreation could have
* netted us the wrong table.
*/
if (heaprel == NULL || heapid != IndexGetRelation(indrelid, false))
ereport(ERROR,
(errcode(ERRCODE_UNDEFINED_TABLE),
errmsg("could not open parent table of index \"%s\"",
RelationGetRelationName(indrel))));
/* Check that the relation is suitable for checking */
if (index_checkable(indrel, am_id))
check(indrel, heaprel, state, lockmode == ShareLock);
/* Roll back any GUC changes executed by index functions */
AtEOXact_GUC(false, save_nestlevel);
/* Restore userid and security context */
SetUserIdAndSecContext(save_userid, save_sec_context);
/*
* Release locks early. That's ok here because nothing in the called
* routines will trigger shared cache invalidations to be sent, so we can
* relax the usual pattern of only releasing locks after commit.
*/
index_close(indrel, lockmode);
if (heaprel)
table_close(heaprel, lockmode);
}
/*
* Basic checks about the suitability of a relation for checking as an index.
*
* NB: Intentionally not checking permissions, the function is normally not
* callable by non-superusers. If granted, it's useful to be able to check a
* whole cluster.
*/
bool
index_checkable(Relation rel, Oid am_id)
{
if (rel->rd_rel->relkind != RELKIND_INDEX ||
rel->rd_rel->relam != am_id)
{
HeapTuple amtup;
HeapTuple amtuprel;
amtup = SearchSysCache1(AMOID, ObjectIdGetDatum(am_id));
amtuprel = SearchSysCache1(AMOID, ObjectIdGetDatum(rel->rd_rel->relam));
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("expected \"%s\" index as targets for verification", NameStr(((Form_pg_am) GETSTRUCT(amtup))->amname)),
errdetail("Relation \"%s\" is a %s index.",
RelationGetRelationName(rel), NameStr(((Form_pg_am) GETSTRUCT(amtuprel))->amname))));
}
if (RELATION_IS_OTHER_TEMP(rel))
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("cannot access temporary tables of other sessions"),
errdetail("Index \"%s\" is associated with temporary relation.",
RelationGetRelationName(rel))));
if (!rel->rd_index->indisvalid)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("cannot check index \"%s\"",
RelationGetRelationName(rel)),
errdetail("Index is not valid.")));
return amcheck_index_mainfork_expected(rel);
}


@@ -0,0 +1,31 @@
/*-------------------------------------------------------------------------
*
* verify_common.h
* Shared routines for amcheck verifications.
*
* Copyright (c) 2016-2025, PostgreSQL Global Development Group
*
* IDENTIFICATION
* contrib/amcheck/verify_common.h
*
*-------------------------------------------------------------------------
*/
#include "storage/bufpage.h"
#include "storage/lmgr.h"
#include "storage/lockdefs.h"
#include "utils/relcache.h"
#include "miscadmin.h"
/* Typedefs for callback functions for amcheck_lock_relation_and_check */
typedef void (*IndexCheckableCallback) (Relation index);
typedef void (*IndexDoCheckCallback) (Relation rel,
Relation heaprel,
void *state,
bool readonly);
extern void amcheck_lock_relation_and_check(Oid indrelid,
Oid am_id,
IndexDoCheckCallback check,
LOCKMODE lockmode, void *state);
extern bool index_checkable(Relation rel, Oid am_id);
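To illustrate the contract this header defines, here is a hedged sketch of how an AM-specific entry point would plug into the shared machinery; hash_index_check and its callback are invented for illustration and are not part of this patch:

#include "postgres.h"

#include "catalog/pg_am.h"
#include "fmgr.h"
#include "verify_common.h"

PG_FUNCTION_INFO_V1(hash_index_check);

static void
hash_check_callback(Relation rel, Relation heaprel, void *state, bool readonly)
{
	/* AM-specific invariant checks would go here. */
}

Datum
hash_index_check(PG_FUNCTION_ARGS)
{
	Oid			indrelid = PG_GETARG_OID(0);

	/* verify_common handles locking, security context, and GUC nesting. */
	amcheck_lock_relation_and_check(indrelid,
									HASH_AM_OID,
									hash_check_callback,
									AccessShareLock,
									NULL);
	PG_RETURN_VOID();
}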


@@ -0,0 +1,799 @@
/*-------------------------------------------------------------------------
*
* verify_gin.c
* Verifies the integrity of GIN indexes based on invariants.
*
*
* GIN index verification checks a number of invariants:
*
* - consistency: Paths in the GIN graph have to contain consistent keys: tuples
* on parent pages consistently include tuples from child pages.
*
* - graph invariants: Each internal page must have at least one downlink, and
* can reference either only leaf pages or only internal pages.
*
*
* Copyright (c) 2016-2025, PostgreSQL Global Development Group
*
* IDENTIFICATION
* contrib/amcheck/verify_gin.c
*
*-------------------------------------------------------------------------
*/
#include "postgres.h"
#include "access/gin_private.h"
#include "access/nbtree.h"
#include "catalog/pg_am.h"
#include "utils/memutils.h"
#include "utils/rel.h"
#include "verify_common.h"
#include "string.h"
/*
* GinScanItem represents one item of depth-first scan of the index.
*/
typedef struct GinScanItem
{
int depth;
IndexTuple parenttup;
BlockNumber parentblk;
XLogRecPtr parentlsn;
BlockNumber blkno;
struct GinScanItem *next;
} GinScanItem;
/*
* GinPostingTreeScanItem represents one item of a depth-first posting tree scan.
*/
typedef struct GinPostingTreeScanItem
{
int depth;
ItemPointerData parentkey;
BlockNumber parentblk;
BlockNumber blkno;
struct GinPostingTreeScanItem *next;
} GinPostingTreeScanItem;
PG_FUNCTION_INFO_V1(gin_index_check);
static void gin_check_parent_keys_consistency(Relation rel,
Relation heaprel,
void *callback_state, bool readonly);
static void check_index_page(Relation rel, Buffer buffer, BlockNumber blockNo);
static IndexTuple gin_refind_parent(Relation rel,
BlockNumber parentblkno,
BlockNumber childblkno,
BufferAccessStrategy strategy);
static ItemId PageGetItemIdCareful(Relation rel, BlockNumber block, Page page,
OffsetNumber offset);
/*
* gin_index_check(index regclass)
*
* Verify integrity of GIN index.
*
* Acquires AccessShareLock on heap & index relations.
*/
Datum
gin_index_check(PG_FUNCTION_ARGS)
{
Oid indrelid = PG_GETARG_OID(0);
amcheck_lock_relation_and_check(indrelid,
GIN_AM_OID,
gin_check_parent_keys_consistency,
AccessShareLock,
NULL);
PG_RETURN_VOID();
}
/*
* Read item pointers from leaf entry tuple.
*
* Returns a palloc'd array of ItemPointers. The number of items is returned
* in *nitems.
*/
static ItemPointer
ginReadTupleWithoutState(IndexTuple itup, int *nitems)
{
Pointer ptr = GinGetPosting(itup);
int nipd = GinGetNPosting(itup);
ItemPointer ipd;
int ndecoded;
if (GinItupIsCompressed(itup))
{
if (nipd > 0)
{
ipd = ginPostingListDecode((GinPostingList *) ptr, &ndecoded);
if (nipd != ndecoded)
elog(ERROR, "number of items mismatch in GIN entry tuple, %d in tuple header, %d decoded",
nipd, ndecoded);
}
else
ipd = palloc(0);
}
else
{
ipd = (ItemPointer) palloc(sizeof(ItemPointerData) * nipd);
memcpy(ipd, ptr, sizeof(ItemPointerData) * nipd);
}
*nitems = nipd;
return ipd;
}
/*
* Scans through a posting tree (given by its root) and verifies that the keys
* on child pages are consistent with the parent.
*
* Allocates a separate memory context and scans through the posting tree graph.
*/
static void
gin_check_posting_tree_parent_keys_consistency(Relation rel, BlockNumber posting_tree_root)
{
BufferAccessStrategy strategy = GetAccessStrategy(BAS_BULKREAD);
GinPostingTreeScanItem *stack;
MemoryContext mctx;
MemoryContext oldcontext;
int leafdepth;
mctx = AllocSetContextCreate(CurrentMemoryContext,
"posting tree check context",
ALLOCSET_DEFAULT_SIZES);
oldcontext = MemoryContextSwitchTo(mctx);
/*
* We don't know the height of the tree yet, but as soon as we encounter a
* leaf page, we will set 'leafdepth' to its depth.
*/
leafdepth = -1;
/* Start the scan at the root page */
stack = (GinPostingTreeScanItem *) palloc0(sizeof(GinPostingTreeScanItem));
stack->depth = 0;
ItemPointerSetInvalid(&stack->parentkey);
stack->parentblk = InvalidBlockNumber;
stack->blkno = posting_tree_root;
elog(DEBUG3, "processing posting tree at blk %u", posting_tree_root);
while (stack)
{
GinPostingTreeScanItem *stack_next;
Buffer buffer;
Page page;
OffsetNumber i,
maxoff;
BlockNumber rightlink;
CHECK_FOR_INTERRUPTS();
buffer = ReadBufferExtended(rel, MAIN_FORKNUM, stack->blkno,
RBM_NORMAL, strategy);
LockBuffer(buffer, GIN_SHARE);
page = (Page) BufferGetPage(buffer);
Assert(GinPageIsData(page));
/* Check that the tree has the same height in all branches */
if (GinPageIsLeaf(page))
{
ItemPointerData minItem;
int nlist;
ItemPointerData *list;
char tidrange_buf[MAXPGPATH];
ItemPointerSetMin(&minItem);
elog(DEBUG1, "page blk: %u, type leaf", stack->blkno);
if (leafdepth == -1)
leafdepth = stack->depth;
else if (stack->depth != leafdepth)
ereport(ERROR,
(errcode(ERRCODE_INDEX_CORRUPTED),
errmsg("index \"%s\": internal pages traversal encountered leaf page unexpectedly on block %u",
RelationGetRelationName(rel), stack->blkno)));
list = GinDataLeafPageGetItems(page, &nlist, minItem);
if (nlist > 0)
snprintf(tidrange_buf, sizeof(tidrange_buf),
"%d tids (%u, %u) - (%u, %u)",
nlist,
ItemPointerGetBlockNumberNoCheck(&list[0]),
ItemPointerGetOffsetNumberNoCheck(&list[0]),
ItemPointerGetBlockNumberNoCheck(&list[nlist - 1]),
ItemPointerGetOffsetNumberNoCheck(&list[nlist - 1]));
else
snprintf(tidrange_buf, sizeof(tidrange_buf), "0 tids");
if (stack->parentblk != InvalidBlockNumber)
elog(DEBUG3, "blk %u: parent %u highkey (%u, %u), %s",
stack->blkno,
stack->parentblk,
ItemPointerGetBlockNumberNoCheck(&stack->parentkey),
ItemPointerGetOffsetNumberNoCheck(&stack->parentkey),
tidrange_buf);
else
elog(DEBUG3, "blk %u: root leaf, %s",
stack->blkno,
tidrange_buf);
if (stack->parentblk != InvalidBlockNumber &&
ItemPointerGetOffsetNumberNoCheck(&stack->parentkey) != InvalidOffsetNumber &&
nlist > 0 && ItemPointerCompare(&stack->parentkey, &list[nlist - 1]) < 0)
ereport(ERROR,
(errcode(ERRCODE_INDEX_CORRUPTED),
errmsg("index \"%s\": tid exceeds parent's high key in postingTree leaf on block %u",
RelationGetRelationName(rel), stack->blkno)));
}
else
{
LocationIndex pd_lower;
ItemPointerData bound;
int lowersize;
/*
* Check that tuples in each page are properly ordered and
* consistent with parent high key
*/
maxoff = GinPageGetOpaque(page)->maxoff;
rightlink = GinPageGetOpaque(page)->rightlink;
elog(DEBUG1, "page blk: %u, type data, maxoff %d", stack->blkno, maxoff);
if (stack->parentblk != InvalidBlockNumber)
elog(DEBUG3, "blk %u: internal posting tree page with %u items, parent %u highkey (%u, %u)",
stack->blkno, maxoff, stack->parentblk,
ItemPointerGetBlockNumberNoCheck(&stack->parentkey),
ItemPointerGetOffsetNumberNoCheck(&stack->parentkey));
else
elog(DEBUG3, "blk %u: root internal posting tree page with %u items",
stack->blkno, maxoff);
/*
* A GIN posting tree internal page stores PostingItems in the
* 'lower' part of the page. The 'upper' part is unused. The
* number of elements is stored in the opaque area (maxoff). Make
* sure the size of the 'lower' part agrees with 'maxoff'
*
* We didn't set pd_lower until PostgreSQL version 9.4, so if this
* check fails, it could also be because the index was
* binary-upgraded from an earlier version. That was a long time
* ago, though, so let's warn if it doesn't match.
*/
pd_lower = ((PageHeader) page)->pd_lower;
lowersize = pd_lower - MAXALIGN(SizeOfPageHeaderData);
if ((lowersize - MAXALIGN(sizeof(ItemPointerData))) / sizeof(PostingItem) != maxoff)
ereport(ERROR,
(errcode(ERRCODE_INDEX_CORRUPTED),
errmsg("index \"%s\" has unexpected pd_lower %u in posting tree block %u with maxoff %u)",
RelationGetRelationName(rel), pd_lower, stack->blkno, maxoff)));
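/*
 * Worked example of the check above (illustrative numbers, assuming
 * MAXALIGN is 8): SizeOfPageHeaderData MAXALIGNs to 24 bytes, the
 * right-bound ItemPointerData to 8 bytes, and sizeof(PostingItem) is 10,
 * so a page holding maxoff = 20 posting items must have
 * pd_lower = 24 + 8 + 20 * 10 = 232.
 */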
/*
* Before the PostingItems, there's one ItemPointerData in the
* 'lower' part that stores the page's high key.
*/
bound = *GinDataPageGetRightBound(page);
/*
* A GIN page's right bound has a sane value only when the page is not
* the rightmost one at its level: the rightmost page does not store
* its high key explicitly, so the value there is implicitly infinity.
*/
if (ItemPointerIsValid(&stack->parentkey) &&
rightlink != InvalidBlockNumber &&
!ItemPointerEquals(&stack->parentkey, &bound))
ereport(ERROR,
(errcode(ERRCODE_INDEX_CORRUPTED),
errmsg("index \"%s\": posting tree page's high key (%u, %u) doesn't match the downlink on block %u (parent blk %u, key (%u, %u))",
RelationGetRelationName(rel),
ItemPointerGetBlockNumberNoCheck(&bound),
ItemPointerGetOffsetNumberNoCheck(&bound),
stack->blkno, stack->parentblk,
ItemPointerGetBlockNumberNoCheck(&stack->parentkey),
ItemPointerGetOffsetNumberNoCheck(&stack->parentkey))));
for (i = FirstOffsetNumber; i <= maxoff; i = OffsetNumberNext(i))
{
GinPostingTreeScanItem *ptr;
PostingItem *posting_item = GinDataPageGetPostingItem(page, i);
/* ItemPointerGetOffsetNumber expects a valid pointer */
if (!(i == maxoff &&
rightlink == InvalidBlockNumber))
elog(DEBUG3, "key (%u, %u) -> %u",
ItemPointerGetBlockNumber(&posting_item->key),
ItemPointerGetOffsetNumber(&posting_item->key),
BlockIdGetBlockNumber(&posting_item->child_blkno));
else
elog(DEBUG3, "key (%u, %u) -> %u",
0, 0, BlockIdGetBlockNumber(&posting_item->child_blkno));
if (i == maxoff && rightlink == InvalidBlockNumber)
{
/*
* The rightmost item in the tree level has (0, 0) as the
* key
*/
if (ItemPointerGetBlockNumberNoCheck(&posting_item->key) != 0 ||
ItemPointerGetOffsetNumberNoCheck(&posting_item->key) != 0)
ereport(ERROR,
(errcode(ERRCODE_INDEX_CORRUPTED),
errmsg("index \"%s\": rightmost posting tree page (blk %u) has unexpected last key (%u, %u)",
RelationGetRelationName(rel),
stack->blkno,
ItemPointerGetBlockNumberNoCheck(&posting_item->key),
ItemPointerGetOffsetNumberNoCheck(&posting_item->key))));
}
else if (i != FirstOffsetNumber)
{
PostingItem *previous_posting_item = GinDataPageGetPostingItem(page, i - 1);
if (ItemPointerCompare(&posting_item->key, &previous_posting_item->key) < 0)
ereport(ERROR,
(errcode(ERRCODE_INDEX_CORRUPTED),
errmsg("index \"%s\" has wrong tuple order in posting tree, block %u, offset %u",
RelationGetRelationName(rel), stack->blkno, i)));
}
/*
* Check if this tuple is consistent with the downlink in the
* parent.
*/
if (stack->parentblk != InvalidBlockNumber && i == maxoff &&
ItemPointerCompare(&stack->parentkey, &posting_item->key) < 0)
ereport(ERROR,
(errcode(ERRCODE_INDEX_CORRUPTED),
errmsg("index \"%s\": posting item exceeds parent's high key in postingTree internal page on block %u offset %u",
RelationGetRelationName(rel),
stack->blkno, i)));
/* This is an internal page, recurse into the child. */
ptr = (GinPostingTreeScanItem *) palloc(sizeof(GinPostingTreeScanItem));
ptr->depth = stack->depth + 1;
/*
* Set rightmost parent key to invalid item pointer. Its value
* is 'Infinity' and not explicitly stored.
*/
if (rightlink == InvalidBlockNumber)
ItemPointerSetInvalid(&ptr->parentkey);
else
ptr->parentkey = posting_item->key;
ptr->parentblk = stack->blkno;
ptr->blkno = BlockIdGetBlockNumber(&posting_item->child_blkno);
ptr->next = stack->next;
stack->next = ptr;
}
}
LockBuffer(buffer, GIN_UNLOCK);
ReleaseBuffer(buffer);
/* Step to next item in the queue */
stack_next = stack->next;
pfree(stack);
stack = stack_next;
}
MemoryContextSwitchTo(oldcontext);
MemoryContextDelete(mctx);
}
/*
* Main entry point for GIN checks.
*
* Allocates memory context and scans through the whole GIN graph.
*/
static void
gin_check_parent_keys_consistency(Relation rel,
Relation heaprel,
void *callback_state,
bool readonly)
{
BufferAccessStrategy strategy = GetAccessStrategy(BAS_BULKREAD);
GinScanItem *stack;
MemoryContext mctx;
MemoryContext oldcontext;
GinState state;
int leafdepth;
mctx = AllocSetContextCreate(CurrentMemoryContext,
"amcheck consistency check context",
ALLOCSET_DEFAULT_SIZES);
oldcontext = MemoryContextSwitchTo(mctx);
initGinState(&state, rel);
/*
* We don't know the height of the tree yet, but as soon as we encounter a
* leaf page, we will set 'leafdepth' to its depth.
*/
leafdepth = -1;
/* Start the scan at the root page */
stack = (GinScanItem *) palloc0(sizeof(GinScanItem));
stack->depth = 0;
stack->parenttup = NULL;
stack->parentblk = InvalidBlockNumber;
stack->parentlsn = InvalidXLogRecPtr;
stack->blkno = GIN_ROOT_BLKNO;
while (stack)
{
GinScanItem *stack_next;
Buffer buffer;
Page page;
OffsetNumber i,
maxoff,
prev_attnum;
XLogRecPtr lsn;
IndexTuple prev_tuple;
BlockNumber rightlink;
CHECK_FOR_INTERRUPTS();
buffer = ReadBufferExtended(rel, MAIN_FORKNUM, stack->blkno,
RBM_NORMAL, strategy);
LockBuffer(buffer, GIN_SHARE);
page = (Page) BufferGetPage(buffer);
lsn = BufferGetLSNAtomic(buffer);
maxoff = PageGetMaxOffsetNumber(page);
rightlink = GinPageGetOpaque(page)->rightlink;
/* Do basic sanity checks on the page headers */
check_index_page(rel, buffer, stack->blkno);
elog(DEBUG3, "processing entry tree page at blk %u, maxoff: %u", stack->blkno, maxoff);
/*
* It's possible that the page was split since we looked at the
* parent, so that we missed the downlink of the right sibling
* when we scanned the parent. If so, add the right sibling to the
* stack now.
*/
if (stack->parenttup != NULL)
{
GinNullCategory parent_key_category;
Datum parent_key = gintuple_get_key(&state,
stack->parenttup,
&parent_key_category);
ItemId iid = PageGetItemIdCareful(rel, stack->blkno,
page, maxoff);
IndexTuple idxtuple = (IndexTuple) PageGetItem(page, iid);
OffsetNumber attnum = gintuple_get_attrnum(&state, idxtuple);
GinNullCategory page_max_key_category;
Datum page_max_key = gintuple_get_key(&state, idxtuple, &page_max_key_category);
if (rightlink != InvalidBlockNumber &&
ginCompareEntries(&state, attnum, page_max_key,
page_max_key_category, parent_key,
parent_key_category) > 0)
{
/* split page detected, install right link to the stack */
GinScanItem *ptr;
elog(DEBUG3, "split detected for blk: %u, parent blk: %u", stack->blkno, stack->parentblk);
ptr = (GinScanItem *) palloc(sizeof(GinScanItem));
ptr->depth = stack->depth;
ptr->parenttup = CopyIndexTuple(stack->parenttup);
ptr->parentblk = stack->parentblk;
ptr->parentlsn = stack->parentlsn;
ptr->blkno = rightlink;
ptr->next = stack->next;
stack->next = ptr;
}
}
/* Check that the tree has the same height in all branches */
if (GinPageIsLeaf(page))
{
if (leafdepth == -1)
leafdepth = stack->depth;
else if (stack->depth != leafdepth)
ereport(ERROR,
(errcode(ERRCODE_INDEX_CORRUPTED),
errmsg("index \"%s\": internal pages traversal encountered leaf page unexpectedly on block %u",
RelationGetRelationName(rel), stack->blkno)));
}
/*
* Check that tuples in each page are properly ordered and consistent
* with parent high key
*/
prev_tuple = NULL;
prev_attnum = InvalidAttrNumber;
for (i = FirstOffsetNumber; i <= maxoff; i = OffsetNumberNext(i))
{
ItemId iid = PageGetItemIdCareful(rel, stack->blkno, page, i);
IndexTuple idxtuple = (IndexTuple) PageGetItem(page, iid);
OffsetNumber attnum = gintuple_get_attrnum(&state, idxtuple);
GinNullCategory prev_key_category;
Datum prev_key;
GinNullCategory current_key_category;
Datum current_key;
if (MAXALIGN(ItemIdGetLength(iid)) != MAXALIGN(IndexTupleSize(idxtuple)))
ereport(ERROR,
(errcode(ERRCODE_INDEX_CORRUPTED),
errmsg("index \"%s\" has inconsistent tuple sizes, block %u, offset %u",
RelationGetRelationName(rel), stack->blkno, i)));
current_key = gintuple_get_key(&state, idxtuple, &current_key_category);
/*
* First block is metadata, skip order check. Also, never check
* for high key on rightmost page, as this key is not really
* stored explicitly.
*
* Also make sure to not compare entries for different attnums,
* which may be stored on the same page.
*/
if (i != FirstOffsetNumber && attnum == prev_attnum && stack->blkno != GIN_ROOT_BLKNO &&
!(i == maxoff && rightlink == InvalidBlockNumber))
{
prev_key = gintuple_get_key(&state, prev_tuple, &prev_key_category);
if (ginCompareEntries(&state, attnum, prev_key,
prev_key_category, current_key,
current_key_category) >= 0)
ereport(ERROR,
(errcode(ERRCODE_INDEX_CORRUPTED),
errmsg("index \"%s\" has wrong tuple order on entry tree page, block %u, offset %u, rightlink %u",
RelationGetRelationName(rel), stack->blkno, i, rightlink)));
}
/*
* Check if this tuple is consistent with the downlink in the
* parent.
*/
if (stack->parenttup &&
i == maxoff)
{
GinNullCategory parent_key_category;
Datum parent_key = gintuple_get_key(&state,
stack->parenttup,
&parent_key_category);
if (ginCompareEntries(&state, attnum, current_key,
current_key_category, parent_key,
parent_key_category) > 0)
{
/*
* There was a discrepancy between parent and child
* tuples. We need to verify that it is not the result of a
* concurrent page split. So, lock the parent and try to find
* the downlink for the current page. It may be missing due to
* a concurrent page split; this is OK.
*/
pfree(stack->parenttup);
stack->parenttup = gin_refind_parent(rel, stack->parentblk,
stack->blkno, strategy);
/* If the parent tuple is gone, assume a concurrent split; otherwise recheck */
if (!stack->parenttup)
elog(NOTICE, "Unable to find parent tuple for block %u on block %u due to concurrent split",
stack->blkno, stack->parentblk);
else
{
parent_key = gintuple_get_key(&state,
stack->parenttup,
&parent_key_category);
/*
* Check if it is properly adjusted. If so,
* proceed to the next key.
*/
if (ginCompareEntries(&state, attnum, current_key,
current_key_category, parent_key,
parent_key_category) > 0)
ereport(ERROR,
(errcode(ERRCODE_INDEX_CORRUPTED),
errmsg("index \"%s\" has inconsistent records on page %u offset %u",
RelationGetRelationName(rel), stack->blkno, i)));
}
}
}
/* If this is an internal page, recurse into the child */
if (!GinPageIsLeaf(page))
{
GinScanItem *ptr;
ptr = (GinScanItem *) palloc(sizeof(GinScanItem));
ptr->depth = stack->depth + 1;
/* last tuple in layer has no high key */
if (i != maxoff && !GinPageGetOpaque(page)->rightlink)
ptr->parenttup = CopyIndexTuple(idxtuple);
else
ptr->parenttup = NULL;
ptr->parentblk = stack->blkno;
ptr->blkno = GinGetDownlink(idxtuple);
ptr->parentlsn = lsn;
ptr->next = stack->next;
stack->next = ptr;
}
/* If this item is a pointer to a posting tree, recurse into it */
else if (GinIsPostingTree(idxtuple))
{
BlockNumber rootPostingTree = GinGetPostingTree(idxtuple);
gin_check_posting_tree_parent_keys_consistency(rel, rootPostingTree);
}
else
{
ItemPointer ipd;
int nipd;
ipd = ginReadTupleWithoutState(idxtuple, &nipd);
for (int j = 0; j < nipd; j++)
{
if (!OffsetNumberIsValid(ItemPointerGetOffsetNumber(&ipd[j])))
ereport(ERROR,
(errcode(ERRCODE_INDEX_CORRUPTED),
errmsg("index \"%s\": posting list contains invalid heap pointer on block %u",
RelationGetRelationName(rel), stack->blkno)));
}
pfree(ipd);
}
prev_tuple = CopyIndexTuple(idxtuple);
prev_attnum = attnum;
}
LockBuffer(buffer, GIN_UNLOCK);
ReleaseBuffer(buffer);
/* Step to next item in the queue */
stack_next = stack->next;
if (stack->parenttup)
pfree(stack->parenttup);
pfree(stack);
stack = stack_next;
}
MemoryContextSwitchTo(oldcontext);
MemoryContextDelete(mctx);
}
/*
* Verify that a freshly-read page looks sane.
*/
static void
check_index_page(Relation rel, Buffer buffer, BlockNumber blockNo)
{
Page page = BufferGetPage(buffer);
/*
* ReadBuffer verifies that every newly-read page passes
* PageHeaderIsValid, which means it either contains a reasonably sane
* page header or is all-zero. We have to defend against the all-zero
* case, however.
*/
if (PageIsNew(page))
ereport(ERROR,
(errcode(ERRCODE_INDEX_CORRUPTED),
errmsg("index \"%s\" contains unexpected zero page at block %u",
RelationGetRelationName(rel),
BufferGetBlockNumber(buffer)),
errhint("Please REINDEX it.")));
/*
* Additionally check that the special area looks sane.
*/
if (PageGetSpecialSize(page) != MAXALIGN(sizeof(GinPageOpaqueData)))
ereport(ERROR,
(errcode(ERRCODE_INDEX_CORRUPTED),
errmsg("index \"%s\" contains corrupted page at block %u",
RelationGetRelationName(rel),
BufferGetBlockNumber(buffer)),
errhint("Please REINDEX it.")));
if (GinPageIsDeleted(page))
{
if (!GinPageIsLeaf(page))
ereport(ERROR,
(errcode(ERRCODE_INDEX_CORRUPTED),
errmsg("index \"%s\" has deleted internal page %u",
RelationGetRelationName(rel), blockNo)));
if (PageGetMaxOffsetNumber(page) > InvalidOffsetNumber)
ereport(ERROR,
(errcode(ERRCODE_INDEX_CORRUPTED),
errmsg("index \"%s\" has deleted page %u with tuples",
RelationGetRelationName(rel), blockNo)));
}
else if (PageGetMaxOffsetNumber(page) > MaxIndexTuplesPerPage)
ereport(ERROR,
(errcode(ERRCODE_INDEX_CORRUPTED),
errmsg("index \"%s\" has page %u with exceeding count of tuples",
RelationGetRelationName(rel), blockNo)));
}
/*
* Try to re-find downlink pointing to 'blkno', in 'parentblkno'.
*
* If found, returns a palloc'd copy of the downlink tuple. Otherwise,
* returns NULL.
*/
static IndexTuple
gin_refind_parent(Relation rel, BlockNumber parentblkno,
BlockNumber childblkno, BufferAccessStrategy strategy)
{
Buffer parentbuf;
Page parentpage;
OffsetNumber o,
parent_maxoff;
IndexTuple result = NULL;
parentbuf = ReadBufferExtended(rel, MAIN_FORKNUM, parentblkno, RBM_NORMAL,
strategy);
LockBuffer(parentbuf, GIN_SHARE);
parentpage = BufferGetPage(parentbuf);
if (GinPageIsLeaf(parentpage))
{
UnlockReleaseBuffer(parentbuf);
return result;
}
parent_maxoff = PageGetMaxOffsetNumber(parentpage);
for (o = FirstOffsetNumber; o <= parent_maxoff; o = OffsetNumberNext(o))
{
ItemId p_iid = PageGetItemIdCareful(rel, parentblkno, parentpage, o);
IndexTuple itup = (IndexTuple) PageGetItem(parentpage, p_iid);
if (ItemPointerGetBlockNumber(&(itup->t_tid)) == childblkno)
{
/* Found it! Make copy and return it */
result = CopyIndexTuple(itup);
break;
}
}
UnlockReleaseBuffer(parentbuf);
return result;
}
static ItemId
PageGetItemIdCareful(Relation rel, BlockNumber block, Page page,
OffsetNumber offset)
{
ItemId itemid = PageGetItemId(page, offset);
if (ItemIdGetOffset(itemid) + ItemIdGetLength(itemid) >
BLCKSZ - MAXALIGN(sizeof(GinPageOpaqueData)))
ereport(ERROR,
(errcode(ERRCODE_INDEX_CORRUPTED),
errmsg("line pointer points past end of tuple space in index \"%s\"",
RelationGetRelationName(rel)),
errdetail_internal("Index tid=(%u,%u) lp_off=%u, lp_len=%u lp_flags=%u.",
block, offset, ItemIdGetOffset(itemid),
ItemIdGetLength(itemid),
ItemIdGetFlags(itemid))));
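/*
 * For example, with the default 8 kB block size GinPageOpaqueData
 * MAXALIGNs to 8 bytes, so lp_off + lp_len may not exceed
 * 8192 - 8 = 8184 (illustrative numbers; BLCKSZ is configurable).
 */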
/*
* Verify that the line pointer isn't LP_REDIRECT, LP_UNUSED, or LP_DEAD,
* since GIN never uses any of those. Verify that the line pointer has storage,
* too.
*/
if (ItemIdIsRedirected(itemid) || !ItemIdIsUsed(itemid) ||
ItemIdIsDead(itemid) || ItemIdGetLength(itemid) == 0)
ereport(ERROR,
(errcode(ERRCODE_INDEX_CORRUPTED),
errmsg("invalid line pointer storage in index \"%s\"",
RelationGetRelationName(rel)),
errdetail_internal("Index tid=(%u,%u) lp_off=%u, lp_len=%u lp_flags=%u.",
block, offset, ItemIdGetOffset(itemid),
ItemIdGetLength(itemid),
ItemIdGetFlags(itemid))));
return itemid;
}


@@ -25,6 +25,7 @@
#include "miscadmin.h"
#include "storage/bufmgr.h"
#include "storage/procarray.h"
#include "storage/read_stream.h"
#include "utils/builtins.h"
#include "utils/fmgroids.h"
#include "utils/rel.h"
@@ -118,7 +119,10 @@ typedef struct HeapCheckContext
Relation valid_toast_index;
int num_toast_indexes;
/* Values for iterating over pages in the relation */
/*
* Values for iterating over pages in the relation. `blkno` is the block
* number of the most recent buffer yielded by the read stream API.
*/
BlockNumber blkno;
BufferAccessStrategy bstrategy;
Buffer buffer;
@@ -153,7 +157,32 @@ typedef struct HeapCheckContext
Tuplestorestate *tupstore;
} HeapCheckContext;
/*
* The per-relation data provided to the read stream API for heap amcheck to
* use in its callback for the SKIP_PAGES_ALL_FROZEN and
* SKIP_PAGES_ALL_VISIBLE options.
*/
typedef struct HeapCheckReadStreamData
{
/*
* `range` is used by all SkipPages options. SKIP_PAGES_NONE uses the
* default read stream callback, block_range_read_stream_cb(), which takes
* a BlockRangeReadStreamPrivate as its callback_private_data. `range`
* keeps track of the current block number across
* read_stream_next_buffer() invocations.
*/
BlockRangeReadStreamPrivate range;
SkipPages skip_option;
Relation rel;
Buffer *vmbuffer;
} HeapCheckReadStreamData;
/* Internal implementation */
static BlockNumber heapcheck_read_stream_next_unskippable(ReadStream *stream,
void *callback_private_data,
void *per_buffer_data);
static void check_tuple(HeapCheckContext *ctx,
bool *xmin_commit_status_ok,
XidCommitStatus *xmin_commit_status);
@@ -231,6 +260,11 @@ verify_heapam(PG_FUNCTION_ARGS)
BlockNumber last_block;
BlockNumber nblocks;
const char *skip;
ReadStream *stream;
int stream_flags;
ReadStreamBlockNumberCB stream_cb;
void *stream_data;
HeapCheckReadStreamData stream_skip_data;
/* Check supplied arguments */
if (PG_ARGISNULL(0))
@ -404,7 +438,46 @@ verify_heapam(PG_FUNCTION_ARGS)
if (TransactionIdIsNormal(ctx.relfrozenxid))
ctx.oldest_xid = ctx.relfrozenxid;
for (ctx.blkno = first_block; ctx.blkno <= last_block; ctx.blkno++)
/* Now that `ctx` is set up, set up the read stream */
stream_skip_data.range.current_blocknum = first_block;
stream_skip_data.range.last_exclusive = last_block + 1;
stream_skip_data.skip_option = skip_option;
stream_skip_data.rel = ctx.rel;
stream_skip_data.vmbuffer = &vmbuffer;
if (skip_option == SKIP_PAGES_NONE)
{
/*
* It is safe to use batchmode as block_range_read_stream_cb takes no
* locks.
*/
stream_cb = block_range_read_stream_cb;
stream_flags = READ_STREAM_SEQUENTIAL |
READ_STREAM_FULL |
READ_STREAM_USE_BATCHING;
stream_data = &stream_skip_data.range;
}
else
{
/*
* It would not be safe to naively use batchmode, as
* heapcheck_read_stream_next_unskippable takes locks. It shouldn't be
* too hard to convert though.
*/
stream_cb = heapcheck_read_stream_next_unskippable;
stream_flags = READ_STREAM_DEFAULT;
stream_data = &stream_skip_data;
}
stream = read_stream_begin_relation(stream_flags,
ctx.bstrategy,
ctx.rel,
MAIN_FORKNUM,
stream_cb,
stream_data,
0);
while ((ctx.buffer = read_stream_next_buffer(stream, NULL)) != InvalidBuffer)
{
OffsetNumber maxoff;
OffsetNumber predecessor[MaxOffsetNumber];
@@ -417,30 +490,11 @@ verify_heapam(PG_FUNCTION_ARGS)
memset(predecessor, 0, sizeof(OffsetNumber) * MaxOffsetNumber);
/* Optionally skip over all-frozen or all-visible blocks */
if (skip_option != SKIP_PAGES_NONE)
{
int32 mapbits;
mapbits = (int32) visibilitymap_get_status(ctx.rel, ctx.blkno,
&vmbuffer);
if (skip_option == SKIP_PAGES_ALL_FROZEN)
{
if ((mapbits & VISIBILITYMAP_ALL_FROZEN) != 0)
continue;
}
if (skip_option == SKIP_PAGES_ALL_VISIBLE)
{
if ((mapbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
continue;
}
}
/* Read and lock the next page. */
ctx.buffer = ReadBufferExtended(ctx.rel, MAIN_FORKNUM, ctx.blkno,
RBM_NORMAL, ctx.bstrategy);
/* Lock the next page. */
Assert(BufferIsValid(ctx.buffer));
LockBuffer(ctx.buffer, BUFFER_LOCK_SHARE);
ctx.blkno = BufferGetBlockNumber(ctx.buffer);
ctx.page = BufferGetPage(ctx.buffer);
/* Perform tuple checks */
@@ -799,6 +853,8 @@ verify_heapam(PG_FUNCTION_ARGS)
break;
}
read_stream_end(stream);
if (vmbuffer != InvalidBuffer)
ReleaseBuffer(vmbuffer);
@@ -815,6 +871,42 @@ verify_heapam(PG_FUNCTION_ARGS)
PG_RETURN_NULL();
}
/*
* Heap amcheck's read stream callback for getting the next unskippable block.
* This callback is only used when 'all-visible' or 'all-frozen' is provided
* as the skip option to verify_heapam(). With the default 'none',
* block_range_read_stream_cb() is used instead.
*/
static BlockNumber
heapcheck_read_stream_next_unskippable(ReadStream *stream,
void *callback_private_data,
void *per_buffer_data)
{
HeapCheckReadStreamData *p = callback_private_data;
/* Loops over [current_blocknum, last_exclusive) blocks */
for (BlockNumber i; (i = p->range.current_blocknum++) < p->range.last_exclusive;)
{
uint8 mapbits = visibilitymap_get_status(p->rel, i, p->vmbuffer);
if (p->skip_option == SKIP_PAGES_ALL_FROZEN)
{
if ((mapbits & VISIBILITYMAP_ALL_FROZEN) != 0)
continue;
}
if (p->skip_option == SKIP_PAGES_ALL_VISIBLE)
{
if ((mapbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
continue;
}
return i;
}
return InvalidBlockNumber;
}
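For comparison, here is the distilled shape of the read stream consumption loop used above, as a minimal sketch; demo_scan_relation is invented for illustration and assumes compilation against this PostgreSQL tree:

#include "postgres.h"

#include "storage/bufmgr.h"
#include "storage/read_stream.h"
#include "utils/rel.h"

static void
demo_scan_relation(Relation rel, BufferAccessStrategy strategy)
{
	BlockRangeReadStreamPrivate range;
	ReadStream *stream;
	Buffer		buf;

	range.current_blocknum = 0;
	range.last_exclusive = RelationGetNumberOfBlocks(rel);

	/* block_range_read_stream_cb walks [current_blocknum, last_exclusive) */
	stream = read_stream_begin_relation(READ_STREAM_SEQUENTIAL |
										READ_STREAM_FULL,
										strategy, rel, MAIN_FORKNUM,
										block_range_read_stream_cb,
										&range, 0);

	while ((buf = read_stream_next_buffer(stream, NULL)) != InvalidBuffer)
	{
		LockBuffer(buf, BUFFER_LOCK_SHARE);
		/* ... examine BufferGetPage(buf) here ... */
		UnlockReleaseBuffer(buf);
	}
	read_stream_end(stream);
}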
/*
* Shared internal implementation for report_corruption and
* report_toast_corruption.


@@ -30,6 +30,7 @@
#include "access/tableam.h"
#include "access/transam.h"
#include "access/xact.h"
#include "verify_common.h"
#include "catalog/index.h"
#include "catalog/pg_am.h"
#include "catalog/pg_opfamily_d.h"
@@ -42,7 +43,10 @@
#include "utils/snapmgr.h"
PG_MODULE_MAGIC;
PG_MODULE_MAGIC_EXT(
.name = "amcheck",
.version = PG_VERSION
);
/*
* A B-Tree cannot possibly have this many levels, since there must be one
@@ -156,14 +160,22 @@ typedef struct BtreeLastVisibleEntry
ItemPointer tid; /* Heap tid */
} BtreeLastVisibleEntry;
/*
* arguments for the bt_index_check_callback callback
*/
typedef struct BTCallbackState
{
bool parentcheck;
bool heapallindexed;
bool rootdescend;
bool checkunique;
} BTCallbackState;
PG_FUNCTION_INFO_V1(bt_index_check);
PG_FUNCTION_INFO_V1(bt_index_parent_check);
static void bt_index_check_internal(Oid indrelid, bool parentcheck,
bool heapallindexed, bool rootdescend,
bool checkunique);
static inline void btree_index_checkable(Relation rel);
static inline bool btree_index_mainfork_expected(Relation rel);
static void bt_index_check_callback(Relation indrel, Relation heaprel,
void *state, bool readonly);
static void bt_check_every_level(Relation rel, Relation heaprel,
bool heapkeyspace, bool readonly, bool heapallindexed,
bool rootdescend, bool checkunique);
@@ -238,15 +250,21 @@ Datum
bt_index_check(PG_FUNCTION_ARGS)
{
Oid indrelid = PG_GETARG_OID(0);
bool heapallindexed = false;
bool checkunique = false;
BTCallbackState args;
args.heapallindexed = false;
args.rootdescend = false;
args.parentcheck = false;
args.checkunique = false;
if (PG_NARGS() >= 2)
heapallindexed = PG_GETARG_BOOL(1);
if (PG_NARGS() == 3)
checkunique = PG_GETARG_BOOL(2);
args.heapallindexed = PG_GETARG_BOOL(1);
if (PG_NARGS() >= 3)
args.checkunique = PG_GETARG_BOOL(2);
bt_index_check_internal(indrelid, false, heapallindexed, false, checkunique);
amcheck_lock_relation_and_check(indrelid, BTREE_AM_OID,
bt_index_check_callback,
AccessShareLock, &args);
PG_RETURN_VOID();
}
@@ -264,18 +282,23 @@ Datum
bt_index_parent_check(PG_FUNCTION_ARGS)
{
Oid indrelid = PG_GETARG_OID(0);
bool heapallindexed = false;
bool rootdescend = false;
bool checkunique = false;
BTCallbackState args;
args.heapallindexed = false;
args.rootdescend = false;
args.parentcheck = true;
args.checkunique = false;
if (PG_NARGS() >= 2)
heapallindexed = PG_GETARG_BOOL(1);
args.heapallindexed = PG_GETARG_BOOL(1);
if (PG_NARGS() >= 3)
rootdescend = PG_GETARG_BOOL(2);
if (PG_NARGS() == 4)
checkunique = PG_GETARG_BOOL(3);
args.rootdescend = PG_GETARG_BOOL(2);
if (PG_NARGS() >= 4)
args.checkunique = PG_GETARG_BOOL(3);
bt_index_check_internal(indrelid, true, heapallindexed, rootdescend, checkunique);
amcheck_lock_relation_and_check(indrelid, BTREE_AM_OID,
bt_index_check_callback,
ShareLock, &args);
PG_RETURN_VOID();
}
@@ -284,193 +307,46 @@ bt_index_parent_check(PG_FUNCTION_ARGS)
* Helper for bt_index_[parent_]check, coordinating the bulk of the work.
*/
static void
bt_index_check_internal(Oid indrelid, bool parentcheck, bool heapallindexed,
						bool rootdescend, bool checkunique)
{
	Oid			heapid;
	Relation	indrel;
	Relation	heaprel;
	LOCKMODE	lockmode;
	Oid			save_userid;
	int			save_sec_context;
	int			save_nestlevel;

	if (parentcheck)
		lockmode = ShareLock;
	else
		lockmode = AccessShareLock;

	/*
	 * We must lock table before index to avoid deadlocks.  However, if the
	 * passed indrelid isn't an index then IndexGetRelation() will fail.
	 * Rather than emitting a not-very-helpful error message, postpone
	 * complaining, expecting that the is-it-an-index test below will fail.
	 *
	 * In hot standby mode this will raise an error when parentcheck is true.
	 */
	heapid = IndexGetRelation(indrelid, true);
	if (OidIsValid(heapid))
	{
		heaprel = table_open(heapid, lockmode);

		/*
		 * Switch to the table owner's userid, so that any index functions
		 * are run as that user.  Also lock down security-restricted
		 * operations and arrange to make GUC variable changes local to this
		 * command.
		 */
		GetUserIdAndSecContext(&save_userid, &save_sec_context);
		SetUserIdAndSecContext(heaprel->rd_rel->relowner,
							   save_sec_context | SECURITY_RESTRICTED_OPERATION);
		save_nestlevel = NewGUCNestLevel();
		RestrictSearchPath();
	}
	else
	{
		heaprel = NULL;
		/* Set these just to suppress "uninitialized variable" warnings */
		save_userid = InvalidOid;
		save_sec_context = -1;
		save_nestlevel = -1;
	}

	/*
	 * Open the target index relations separately (like relation_openrv(),
	 * but with heap relation locked first to prevent deadlocking).  In hot
	 * standby mode this will raise an error when parentcheck is true.
	 *
	 * There is no need for the usual indcheckxmin usability horizon test
	 * here, even in the heapallindexed case, because index undergoing
	 * verification only needs to have entries for a new transaction snapshot.
	 * (If this is a parentcheck verification, there is no question about
	 * committed or recently dead heap tuples lacking index entries due to
	 * concurrent activity.)
	 */
	indrel = index_open(indrelid, lockmode);

	/*
	 * Since we did the IndexGetRelation call above without any lock, it's
	 * barely possible that a race against an index drop/recreation could
	 * have netted us the wrong table.
	 */
	if (heaprel == NULL || heapid != IndexGetRelation(indrelid, false))
		ereport(ERROR,
				(errcode(ERRCODE_UNDEFINED_TABLE),
				 errmsg("could not open parent table of index \"%s\"",
						RelationGetRelationName(indrel))));

	/* Relation suitable for checking as B-Tree? */
	btree_index_checkable(indrel);

	if (btree_index_mainfork_expected(indrel))
	{
		bool		heapkeyspace,
					allequalimage;

		if (!smgrexists(RelationGetSmgr(indrel), MAIN_FORKNUM))
			ereport(ERROR,
					(errcode(ERRCODE_INDEX_CORRUPTED),
					 errmsg("index \"%s\" lacks a main relation fork",
							RelationGetRelationName(indrel))));

		/* Extract metadata from metapage, and sanitize it in passing */
		_bt_metaversion(indrel, &heapkeyspace, &allequalimage);
		if (allequalimage && !heapkeyspace)
			ereport(ERROR,
					(errcode(ERRCODE_INDEX_CORRUPTED),
					 errmsg("index \"%s\" metapage has equalimage field set on unsupported nbtree version",
							RelationGetRelationName(indrel))));
		if (allequalimage && !_bt_allequalimage(indrel, false))
		{
			bool		has_interval_ops = false;

			for (int i = 0; i < IndexRelationGetNumberOfKeyAttributes(indrel); i++)
				if (indrel->rd_opfamily[i] == INTERVAL_BTREE_FAM_OID)
					has_interval_ops = true;
			ereport(ERROR,
					(errcode(ERRCODE_INDEX_CORRUPTED),
					 errmsg("index \"%s\" metapage incorrectly indicates that deduplication is safe",
							RelationGetRelationName(indrel)),
					 has_interval_ops
					 ? errhint("This is known of \"interval\" indexes last built on a version predating 2023-11.")
					 : 0));
		}

		/* Check index, possibly against table it is an index on */
		bt_check_every_level(indrel, heaprel, heapkeyspace, parentcheck,
							 heapallindexed, rootdescend, checkunique);
	}

	/* Roll back any GUC changes executed by index functions */
	AtEOXact_GUC(false, save_nestlevel);

	/* Restore userid and security context */
	SetUserIdAndSecContext(save_userid, save_sec_context);

	/*
	 * Release locks early. That's ok here because nothing in the called
	 * routines will trigger shared cache invalidations to be sent, so we can
	 * relax the usual pattern of only releasing locks after commit.
	 */
	index_close(indrel, lockmode);
	if (heaprel)
		table_close(heaprel, lockmode);
}

static void
bt_index_check_callback(Relation indrel, Relation heaprel, void *state, bool readonly)
{
	BTCallbackState *args = (BTCallbackState *) state;
	bool		heapkeyspace,
				allequalimage;

	if (!smgrexists(RelationGetSmgr(indrel), MAIN_FORKNUM))
		ereport(ERROR,
				(errcode(ERRCODE_INDEX_CORRUPTED),
				 errmsg("index \"%s\" lacks a main relation fork",
						RelationGetRelationName(indrel))));

	/* Extract metadata from metapage, and sanitize it in passing */
	_bt_metaversion(indrel, &heapkeyspace, &allequalimage);
	if (allequalimage && !heapkeyspace)
		ereport(ERROR,
				(errcode(ERRCODE_INDEX_CORRUPTED),
				 errmsg("index \"%s\" metapage has equalimage field set on unsupported nbtree version",
						RelationGetRelationName(indrel))));
	if (allequalimage && !_bt_allequalimage(indrel, false))
	{
		bool		has_interval_ops = false;

		for (int i = 0; i < IndexRelationGetNumberOfKeyAttributes(indrel); i++)
			if (indrel->rd_opfamily[i] == INTERVAL_BTREE_FAM_OID)
			{
				has_interval_ops = true;
				ereport(ERROR,
						(errcode(ERRCODE_INDEX_CORRUPTED),
						 errmsg("index \"%s\" metapage incorrectly indicates that deduplication is safe",
								RelationGetRelationName(indrel)),
						 has_interval_ops
						 ? errhint("This is known of \"interval\" indexes last built on a version predating 2023-11.")
						 : 0));
			}
	}

	/* Check index, possibly against table it is an index on */
	bt_check_every_level(indrel, heaprel, heapkeyspace, readonly,
						 args->heapallindexed, args->rootdescend,
						 args->checkunique);
}
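
For orientation, since this hunk shows only the callback side: the companion change evidently moves the lock/security dance above into a shared amcheck helper to which the SQL-callable functions hand this callback. A minimal sketch of that wiring follows; the helper name amcheck_lock_relation_and_check and its exact signature are assumptions (they do not appear in this hunk), and the BTCallbackState fields are inferred from the args-> references in the callback.

typedef struct BTCallbackState
{
	bool		heapallindexed;
	bool		rootdescend;
	bool		checkunique;
} BTCallbackState;

static void
bt_index_check_example(Oid indrelid, bool heapallindexed, bool checkunique)
{
	BTCallbackState args;

	args.heapallindexed = heapallindexed;
	args.rootdescend = false;
	args.checkunique = checkunique;

	/*
	 * Hypothetical shared helper: it locks the heap before the index,
	 * switches to the table owner's userid, and finally invokes the
	 * callback, passing readonly = false for AccessShareLock.
	 */
	amcheck_lock_relation_and_check(indrelid, BTREE_AM_OID,
									bt_index_check_callback,
									AccessShareLock, &args);
}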
/*
* Basic checks about the suitability of a relation for checking as a B-Tree
* index.
*
* NB: Intentionally not checking permissions, the function is normally not
* callable by non-superusers. If granted, it's useful to be able to check a
* whole cluster.
*/
static inline void
btree_index_checkable(Relation rel)
{
if (rel->rd_rel->relkind != RELKIND_INDEX ||
rel->rd_rel->relam != BTREE_AM_OID)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("only B-Tree indexes are supported as targets for verification"),
errdetail("Relation \"%s\" is not a B-Tree index.",
RelationGetRelationName(rel))));
if (RELATION_IS_OTHER_TEMP(rel))
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("cannot access temporary tables of other sessions"),
errdetail("Index \"%s\" is associated with temporary relation.",
RelationGetRelationName(rel))));
if (!rel->rd_index->indisvalid)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("cannot check index \"%s\"",
RelationGetRelationName(rel)),
errdetail("Index is not valid.")));
}
/*
* Check if B-Tree index relation should have a file for its main relation
* fork. Verification uses this to skip unlogged indexes when in hot standby
* mode, where there is simply nothing to verify. We behave as if the
* relation is empty.
*
* NB: Caller should call btree_index_checkable() before calling here.
*/
static inline bool
btree_index_mainfork_expected(Relation rel)
{
if (rel->rd_rel->relpersistence != RELPERSISTENCE_UNLOGGED ||
!RecoveryInProgress())
return true;
ereport(DEBUG1,
(errcode(ERRCODE_READ_ONLY_SQL_TRANSACTION),
errmsg("cannot verify unlogged index \"%s\" during recovery, skipping",
RelationGetRelationName(rel))));
return false;
}
@ -1597,8 +1473,7 @@ bt_target_page_check(BtreeCheckState *state)
*/
lowersizelimit = skey->heapkeyspace &&
(P_ISLEAF(topaque) || BTreeTupleGetHeapTID(itup) == NULL);
if (tupsize > (lowersizelimit ? BTMaxItemSize(state->target) :
BTMaxItemSizeNoHeapTid(state->target)))
if (tupsize > (lowersizelimit ? BTMaxItemSize : BTMaxItemSizeNoHeapTid))
{
ItemPointer tid = BTreeTupleGetPointsToTID(itup);
char *itid,

View File

@ -16,7 +16,10 @@
#include "libpq/auth.h"
#include "utils/guc.h"
PG_MODULE_MAGIC;
PG_MODULE_MAGIC_EXT(
.name = "auth_delay",
.version = PG_VERSION
);
/* GUC Variables */
static int auth_delay_milliseconds = 0;
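
The same two-line swap recurs across the contrib modules below: the bare PG_MODULE_MAGIC block is replaced by PG_MODULE_MAGIC_EXT, which records per-module metadata that pg_get_loaded_modules() can report (see the auto_explain test further down). A minimal sketch of a module using the new macro; the module name here is made up.

#include "postgres.h"
#include "fmgr.h"

/* extended magic block: carries a module name and version */
PG_MODULE_MAGIC_EXT(
	.name = "demo_module",			/* hypothetical name */
	.version = PG_VERSION
);

void
_PG_init(void)
{
	/* GUC definitions, hooks, etc. would be installed here */
}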

View File

@ -16,11 +16,16 @@
#include "access/parallel.h"
#include "commands/explain.h"
#include "commands/explain_format.h"
#include "commands/explain_state.h"
#include "common/pg_prng.h"
#include "executor/instrument.h"
#include "utils/guc.h"
PG_MODULE_MAGIC;
PG_MODULE_MAGIC_EXT(
.name = "auto_explain",
.version = PG_VERSION
);
/* GUC variables */
static int auto_explain_log_min_duration = -1; /* msec or -1 */
@ -93,7 +98,7 @@ _PG_init(void)
/* Define custom GUC variables. */
DefineCustomIntVariable("auto_explain.log_min_duration",
"Sets the minimum execution time above which plans will be logged.",
"Zero prints all plans. -1 turns this feature off.",
"-1 disables logging plans. 0 means log all plans.",
&auto_explain_log_min_duration,
-1,
-1, INT_MAX,
@ -104,8 +109,8 @@ _PG_init(void)
NULL);
DefineCustomIntVariable("auto_explain.log_parameter_max_length",
"Sets the maximum length of query parameters to log.",
"Zero logs no query parameters, -1 logs them in full.",
"Sets the maximum length of query parameter values to log.",
"-1 means log values in full.",
&auto_explain_log_parameter_max_length,
-1,
-1, INT_MAX,

View File

@ -28,7 +28,7 @@ sub query_log
}
my $node = PostgreSQL::Test::Cluster->new('main');
$node->init('auth_extra' => [ '--create-role', 'regress_user1' ]);
$node->init(auth_extra => [ '--create-role' => 'regress_user1' ]);
$node->append_conf('postgresql.conf',
"session_preload_libraries = 'auto_explain'");
$node->append_conf('postgresql.conf', "auto_explain.log_min_duration = 0");
@ -212,4 +212,17 @@ REVOKE SET ON PARAMETER auto_explain.log_format FROM regress_user1;
DROP USER regress_user1;
});
# Test pg_get_loaded_modules() function. This function is particularly
# useful for modules with no SQL presence, such as auto_explain.
my $res = $node->safe_psql(
"postgres", q{
SELECT module_name,
version = current_setting('server_version') as version_ok,
regexp_replace(file_name, '\..*', '') as file_name_stripped
FROM pg_get_loaded_modules()
WHERE module_name = 'auto_explain';
});
like($res, qr/^auto_explain\|t\|auto_explain$/, "pg_get_loaded_modules() ok");
done_testing();

View File

@ -18,7 +18,10 @@
#include "utils/acl.h"
#include "utils/guc.h"
PG_MODULE_MAGIC;
PG_MODULE_MAGIC_EXT(
.name = "basebackup_to_shell",
.version = PG_VERSION
);
typedef struct bbsink_shell
{

View File

@ -24,8 +24,8 @@ my $node = PostgreSQL::Test::Cluster->new('primary');
# Make sure pg_hba.conf is set up to allow connections from backupuser.
# This is only needed on Windows machines that don't use UNIX sockets.
$node->init(
'allows_streaming' => 1,
'auth_extra' => [ '--create-role' => 'backupuser' ]);
allows_streaming => 1,
auth_extra => [ '--create-role' => 'backupuser' ]);
$node->append_conf('postgresql.conf',
"shared_preload_libraries = 'basebackup_to_shell'");
@ -131,8 +131,10 @@ sub verify_backup
# Untar.
my $extract_path = PostgreSQL::Test::Utils::tempdir;
system_or_bail($tar, 'xf', $backup_dir . '/' . $prefix . 'base.tar',
'-C', $extract_path);
system_or_bail(
$tar,
'xf' => $backup_dir . '/' . $prefix . 'base.tar',
'-C' => $extract_path);
# Verify.
$node->command_ok(

View File

@ -37,7 +37,10 @@
#include "storage/fd.h"
#include "utils/guc.h"
PG_MODULE_MAGIC;
PG_MODULE_MAGIC_EXT(
.name = "basic_archive",
.version = PG_VERSION
);
static char *archive_directory = NULL;

View File

@ -22,7 +22,10 @@
#include "utils/memutils.h"
#include "utils/rel.h"
PG_MODULE_MAGIC;
PG_MODULE_MAGIC_EXT(
.name = "bloom",
.version = PG_VERSION
);
/*
* State of bloom index build. We accumulate one page data here before

View File

@ -116,6 +116,8 @@ blgetbitmap(IndexScanDesc scan, TIDBitmap *tbm)
bas = GetAccessStrategy(BAS_BULKREAD);
npages = RelationGetNumberOfBlocks(scan->indexRelation);
pgstat_count_index_scan(scan->indexRelation);
if (scan->instrument)
scan->instrument->nsearches++;
for (blkno = BLOOM_HEAD_BLKNO; blkno < npages; blkno++)
{

View File

@ -109,6 +109,9 @@ blhandler(PG_FUNCTION_ARGS)
amroutine->amoptsprocnum = BLOOM_OPTIONS_PROC;
amroutine->amcanorder = false;
amroutine->amcanorderbyop = false;
amroutine->amcanhash = false;
amroutine->amconsistentequality = false;
amroutine->amconsistentordering = false;
amroutine->amcanbackward = false;
amroutine->amcanunique = false;
amroutine->amcanmulticol = true;

View File

@ -4,7 +4,10 @@
#include "plperl.h"
PG_MODULE_MAGIC;
PG_MODULE_MAGIC_EXT(
.name = "bool_plperl",
.version = PG_VERSION
);
PG_FUNCTION_INFO_V1(bool_to_plperl);

View File

@ -104,9 +104,9 @@ SELECT spi_test();
DROP EXTENSION plperl CASCADE;
NOTICE: drop cascades to 6 other objects
DETAIL: drop cascades to function spi_test()
drop cascades to extension bool_plperl
DETAIL: drop cascades to extension bool_plperl
drop cascades to function perl2int(integer)
drop cascades to function perl2text(text)
drop cascades to function perl2undef()
drop cascades to function bool2perl(boolean,boolean,boolean)
drop cascades to function spi_test()

View File

@ -104,9 +104,9 @@ SELECT spi_test();
DROP EXTENSION plperlu CASCADE;
NOTICE: drop cascades to 6 other objects
DETAIL: drop cascades to function spi_test()
drop cascades to extension bool_plperlu
DETAIL: drop cascades to extension bool_plperlu
drop cascades to function perl2int(integer)
drop cascades to function perl2text(text)
drop cascades to function perl2undef()
drop cascades to function bool2perl(boolean,boolean,boolean)
drop cascades to function spi_test()

View File

@ -14,7 +14,10 @@
#include "utils/timestamp.h"
#include "utils/uuid.h"
PG_MODULE_MAGIC;
PG_MODULE_MAGIC_EXT(
.name = "btree_gin",
.version = PG_VERSION
);
typedef struct QueryInfo
{

View File

@ -34,7 +34,7 @@ DATA = btree_gist--1.0--1.1.sql \
btree_gist--1.1--1.2.sql btree_gist--1.2.sql btree_gist--1.2--1.3.sql \
btree_gist--1.3--1.4.sql btree_gist--1.4--1.5.sql \
btree_gist--1.5--1.6.sql btree_gist--1.6--1.7.sql \
btree_gist--1.7--1.8.sql
btree_gist--1.7--1.8.sql btree_gist--1.8--1.9.sql
PGFILEDESC = "btree_gist - B-tree equivalent GiST operator classes"
REGRESS = init int2 int4 int8 float4 float8 cash oid timestamp timestamptz \

View File

@ -6,18 +6,18 @@
#include "btree_gist.h"
#include "btree_utils_var.h"
#include "utils/fmgrprotos.h"
#include "utils/sortsupport.h"
#include "utils/varbit.h"
/*
** Bit ops
*/
/* GiST support functions */
PG_FUNCTION_INFO_V1(gbt_bit_compress);
PG_FUNCTION_INFO_V1(gbt_bit_union);
PG_FUNCTION_INFO_V1(gbt_bit_picksplit);
PG_FUNCTION_INFO_V1(gbt_bit_consistent);
PG_FUNCTION_INFO_V1(gbt_bit_penalty);
PG_FUNCTION_INFO_V1(gbt_bit_same);
PG_FUNCTION_INFO_V1(gbt_bit_sortsupport);
PG_FUNCTION_INFO_V1(gbt_varbit_sortsupport);
/* define for comparison */
@ -121,7 +121,7 @@ static const gbtree_vinfo tinfo =
/**************************************************
* Bit ops
* GiST support functions
**************************************************/
Datum
@ -161,8 +161,6 @@ gbt_bit_consistent(PG_FUNCTION_ARGS)
PG_RETURN_BOOL(retval);
}
Datum
gbt_bit_union(PG_FUNCTION_ARGS)
{
@ -173,7 +171,6 @@ gbt_bit_union(PG_FUNCTION_ARGS)
&tinfo, fcinfo->flinfo));
}
Datum
gbt_bit_picksplit(PG_FUNCTION_ARGS)
{
@ -196,7 +193,6 @@ gbt_bit_same(PG_FUNCTION_ARGS)
PG_RETURN_POINTER(result);
}
Datum
gbt_bit_penalty(PG_FUNCTION_ARGS)
{
@ -207,3 +203,46 @@ gbt_bit_penalty(PG_FUNCTION_ARGS)
PG_RETURN_POINTER(gbt_var_penalty(result, o, n, PG_GET_COLLATION(),
&tinfo, fcinfo->flinfo));
}
static int
gbt_bit_ssup_cmp(Datum x, Datum y, SortSupport ssup)
{
GBT_VARKEY *key1 = PG_DETOAST_DATUM(x);
GBT_VARKEY *key2 = PG_DETOAST_DATUM(y);
GBT_VARKEY_R arg1 = gbt_var_key_readable(key1);
GBT_VARKEY_R arg2 = gbt_var_key_readable(key2);
Datum result;
/* for leaf items we expect lower == upper, so only compare lower */
result = DirectFunctionCall2(byteacmp,
PointerGetDatum(arg1.lower),
PointerGetDatum(arg2.lower));
GBT_FREE_IF_COPY(key1, x);
GBT_FREE_IF_COPY(key2, y);
return DatumGetInt32(result);
}
Datum
gbt_bit_sortsupport(PG_FUNCTION_ARGS)
{
SortSupport ssup = (SortSupport) PG_GETARG_POINTER(0);
ssup->comparator = gbt_bit_ssup_cmp;
ssup->ssup_extra = NULL;
PG_RETURN_VOID();
}
Datum
gbt_varbit_sortsupport(PG_FUNCTION_ARGS)
{
SortSupport ssup = (SortSupport) PG_GETARG_POINTER(0);
ssup->comparator = gbt_bit_ssup_cmp;
ssup->ssup_extra = NULL;
PG_RETURN_VOID();
}
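
All of the gbt_*_sortsupport functions added by this commit follow the shape above: install a comparator that looks only at the lower bound, since leaf keys store lower == upper. They exist so that sorted GiST builds can feed leaf keys through tuplesort. A sketch of how such a comparator is consumed, assuming the stock SortSupport machinery; compare_leaf_keys is illustrative, not from the patch.

#include "postgres.h"
#include "utils/sortsupport.h"

/*
 * Illustrative caller: build code compares two non-NULL leaf datums via
 * the standard ApplySortComparator() wrapper, which ends up in
 * gbt_bit_ssup_cmp() once gbt_bit_sortsupport() has filled in ssup.
 */
static int
compare_leaf_keys(Datum a, Datum b, SortSupport ssup)
{
	return ApplySortComparator(a, false, b, false, ssup);
}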

View File

@ -5,6 +5,7 @@
#include "btree_gist.h"
#include "btree_utils_num.h"
#include "utils/sortsupport.h"
typedef struct boolkey
{
@ -12,9 +13,7 @@ typedef struct boolkey
bool upper;
} boolKEY;
/*
** bool ops
*/
/* GiST support functions */
PG_FUNCTION_INFO_V1(gbt_bool_compress);
PG_FUNCTION_INFO_V1(gbt_bool_fetch);
PG_FUNCTION_INFO_V1(gbt_bool_union);
@ -22,6 +21,7 @@ PG_FUNCTION_INFO_V1(gbt_bool_picksplit);
PG_FUNCTION_INFO_V1(gbt_bool_consistent);
PG_FUNCTION_INFO_V1(gbt_bool_penalty);
PG_FUNCTION_INFO_V1(gbt_bool_same);
PG_FUNCTION_INFO_V1(gbt_bool_sortsupport);
static bool
gbt_boolgt(const void *a, const void *b, FmgrInfo *flinfo)
@ -82,10 +82,9 @@ static const gbtree_ninfo tinfo =
/**************************************************
* bool ops
* GiST support functions
**************************************************/
Datum
gbt_bool_compress(PG_FUNCTION_ARGS)
{
@ -124,7 +123,6 @@ gbt_bool_consistent(PG_FUNCTION_ARGS)
GIST_LEAF(entry), &tinfo, fcinfo->flinfo));
}
Datum
gbt_bool_union(PG_FUNCTION_ARGS)
{
@ -135,7 +133,6 @@ gbt_bool_union(PG_FUNCTION_ARGS)
PG_RETURN_POINTER(gbt_num_union(out, entryvec, &tinfo, fcinfo->flinfo));
}
Datum
gbt_bool_penalty(PG_FUNCTION_ARGS)
{
@ -166,3 +163,24 @@ gbt_bool_same(PG_FUNCTION_ARGS)
*result = gbt_num_same((void *) b1, (void *) b2, &tinfo, fcinfo->flinfo);
PG_RETURN_POINTER(result);
}
static int
gbt_bool_ssup_cmp(Datum x, Datum y, SortSupport ssup)
{
boolKEY *arg1 = (boolKEY *) DatumGetPointer(x);
boolKEY *arg2 = (boolKEY *) DatumGetPointer(y);
/* for leaf items we expect lower == upper, so only compare lower */
return (int32) arg1->lower - (int32) arg2->lower;
}
Datum
gbt_bool_sortsupport(PG_FUNCTION_ARGS)
{
SortSupport ssup = (SortSupport) PG_GETARG_POINTER(0);
ssup->comparator = gbt_bool_ssup_cmp;
ssup->ssup_extra = NULL;
PG_RETURN_VOID();
}

View File

@ -6,17 +6,16 @@
#include "btree_gist.h"
#include "btree_utils_var.h"
#include "utils/fmgrprotos.h"
#include "utils/sortsupport.h"
/*
** Bytea ops
*/
/* GiST support functions */
PG_FUNCTION_INFO_V1(gbt_bytea_compress);
PG_FUNCTION_INFO_V1(gbt_bytea_union);
PG_FUNCTION_INFO_V1(gbt_bytea_picksplit);
PG_FUNCTION_INFO_V1(gbt_bytea_consistent);
PG_FUNCTION_INFO_V1(gbt_bytea_penalty);
PG_FUNCTION_INFO_V1(gbt_bytea_same);
PG_FUNCTION_INFO_V1(gbt_bytea_sortsupport);
/* define for comparison */
@ -69,7 +68,6 @@ gbt_byteacmp(const void *a, const void *b, Oid collation, FmgrInfo *flinfo)
PointerGetDatum(b)));
}
static const gbtree_vinfo tinfo =
{
gbt_t_bytea,
@ -86,10 +84,9 @@ static const gbtree_vinfo tinfo =
/**************************************************
* Text ops
* GiST support functions
**************************************************/
Datum
gbt_bytea_compress(PG_FUNCTION_ARGS)
{
@ -98,8 +95,6 @@ gbt_bytea_compress(PG_FUNCTION_ARGS)
PG_RETURN_POINTER(gbt_var_compress(entry, &tinfo));
}
Datum
gbt_bytea_consistent(PG_FUNCTION_ARGS)
{
@ -121,8 +116,6 @@ gbt_bytea_consistent(PG_FUNCTION_ARGS)
PG_RETURN_BOOL(retval);
}
Datum
gbt_bytea_union(PG_FUNCTION_ARGS)
{
@ -133,7 +126,6 @@ gbt_bytea_union(PG_FUNCTION_ARGS)
&tinfo, fcinfo->flinfo));
}
Datum
gbt_bytea_picksplit(PG_FUNCTION_ARGS)
{
@ -156,7 +148,6 @@ gbt_bytea_same(PG_FUNCTION_ARGS)
PG_RETURN_POINTER(result);
}
Datum
gbt_bytea_penalty(PG_FUNCTION_ARGS)
{
@ -167,3 +158,35 @@ gbt_bytea_penalty(PG_FUNCTION_ARGS)
PG_RETURN_POINTER(gbt_var_penalty(result, o, n, PG_GET_COLLATION(),
&tinfo, fcinfo->flinfo));
}
static int
gbt_bytea_ssup_cmp(Datum x, Datum y, SortSupport ssup)
{
GBT_VARKEY *key1 = PG_DETOAST_DATUM(x);
GBT_VARKEY *key2 = PG_DETOAST_DATUM(y);
GBT_VARKEY_R xkey = gbt_var_key_readable(key1);
GBT_VARKEY_R ykey = gbt_var_key_readable(key2);
Datum result;
/* for leaf items we expect lower == upper, so only compare lower */
result = DirectFunctionCall2(byteacmp,
PointerGetDatum(xkey.lower),
PointerGetDatum(ykey.lower));
GBT_FREE_IF_COPY(key1, x);
GBT_FREE_IF_COPY(key2, y);
return DatumGetInt32(result);
}
Datum
gbt_bytea_sortsupport(PG_FUNCTION_ARGS)
{
SortSupport ssup = (SortSupport) PG_GETARG_POINTER(0);
ssup->comparator = gbt_bytea_ssup_cmp;
ssup->ssup_extra = NULL;
PG_RETURN_VOID();
}

View File

@ -7,6 +7,7 @@
#include "btree_utils_num.h"
#include "common/int.h"
#include "utils/cash.h"
#include "utils/sortsupport.h"
typedef struct
{
@ -14,9 +15,7 @@ typedef struct
Cash upper;
} cashKEY;
/*
** Cash ops
*/
/* GiST support functions */
PG_FUNCTION_INFO_V1(gbt_cash_compress);
PG_FUNCTION_INFO_V1(gbt_cash_fetch);
PG_FUNCTION_INFO_V1(gbt_cash_union);
@ -25,6 +24,7 @@ PG_FUNCTION_INFO_V1(gbt_cash_consistent);
PG_FUNCTION_INFO_V1(gbt_cash_distance);
PG_FUNCTION_INFO_V1(gbt_cash_penalty);
PG_FUNCTION_INFO_V1(gbt_cash_same);
PG_FUNCTION_INFO_V1(gbt_cash_sortsupport);
static bool
gbt_cashgt(const void *a, const void *b, FmgrInfo *flinfo)
@ -111,10 +111,10 @@ cash_dist(PG_FUNCTION_ARGS)
PG_RETURN_CASH(ra);
}
/**************************************************
* Cash ops
**************************************************/
/**************************************************
* GiST support functions
**************************************************/
Datum
gbt_cash_compress(PG_FUNCTION_ARGS)
@ -155,7 +155,6 @@ gbt_cash_consistent(PG_FUNCTION_ARGS)
fcinfo->flinfo));
}
Datum
gbt_cash_distance(PG_FUNCTION_ARGS)
{
@ -173,7 +172,6 @@ gbt_cash_distance(PG_FUNCTION_ARGS)
&tinfo, fcinfo->flinfo));
}
Datum
gbt_cash_union(PG_FUNCTION_ARGS)
{
@ -184,7 +182,6 @@ gbt_cash_union(PG_FUNCTION_ARGS)
PG_RETURN_POINTER(gbt_num_union(out, entryvec, &tinfo, fcinfo->flinfo));
}
Datum
gbt_cash_penalty(PG_FUNCTION_ARGS)
{
@ -215,3 +212,29 @@ gbt_cash_same(PG_FUNCTION_ARGS)
*result = gbt_num_same((void *) b1, (void *) b2, &tinfo, fcinfo->flinfo);
PG_RETURN_POINTER(result);
}
static int
gbt_cash_ssup_cmp(Datum x, Datum y, SortSupport ssup)
{
cashKEY *arg1 = (cashKEY *) DatumGetPointer(x);
cashKEY *arg2 = (cashKEY *) DatumGetPointer(y);
/* for leaf items we expect lower == upper, so only compare lower */
if (arg1->lower > arg2->lower)
return 1;
else if (arg1->lower < arg2->lower)
return -1;
else
return 0;
}
Datum
gbt_cash_sortsupport(PG_FUNCTION_ARGS)
{
SortSupport ssup = (SortSupport) PG_GETARG_POINTER(0);
ssup->comparator = gbt_cash_ssup_cmp;
ssup->ssup_extra = NULL;
PG_RETURN_VOID();
}
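
Note the branching three-way compare rather than returning a subtraction: for 64-bit Cash values the difference can overflow and flip sign, so only narrow operands (as in the bool comparator earlier) can safely subtract. An overflow-proof idiom, offered as an editorial aside rather than code from the patch:

/*
 * Editorial sketch: (a > b) - (a < b) always yields the correct sign,
 * with no intermediate value that can overflow.
 */
static int
cmp_int64_safe(int64 a, int64 b)
{
	return (a > b) - (a < b);
}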

View File

@ -7,6 +7,7 @@
#include "btree_utils_num.h"
#include "utils/fmgrprotos.h"
#include "utils/date.h"
#include "utils/sortsupport.h"
typedef struct
{
@ -14,9 +15,7 @@ typedef struct
DateADT upper;
} dateKEY;
/*
** date ops
*/
/* GiST support functions */
PG_FUNCTION_INFO_V1(gbt_date_compress);
PG_FUNCTION_INFO_V1(gbt_date_fetch);
PG_FUNCTION_INFO_V1(gbt_date_union);
@ -25,6 +24,7 @@ PG_FUNCTION_INFO_V1(gbt_date_consistent);
PG_FUNCTION_INFO_V1(gbt_date_distance);
PG_FUNCTION_INFO_V1(gbt_date_penalty);
PG_FUNCTION_INFO_V1(gbt_date_same);
PG_FUNCTION_INFO_V1(gbt_date_sortsupport);
static bool
gbt_dategt(const void *a, const void *b, FmgrInfo *flinfo)
@ -128,11 +128,9 @@ date_dist(PG_FUNCTION_ARGS)
/**************************************************
* date ops
* GiST support functions
**************************************************/
Datum
gbt_date_compress(PG_FUNCTION_ARGS)
{
@ -172,7 +170,6 @@ gbt_date_consistent(PG_FUNCTION_ARGS)
fcinfo->flinfo));
}
Datum
gbt_date_distance(PG_FUNCTION_ARGS)
{
@ -190,7 +187,6 @@ gbt_date_distance(PG_FUNCTION_ARGS)
&tinfo, fcinfo->flinfo));
}
Datum
gbt_date_union(PG_FUNCTION_ARGS)
{
@ -201,7 +197,6 @@ gbt_date_union(PG_FUNCTION_ARGS)
PG_RETURN_POINTER(gbt_num_union(out, entryvec, &tinfo, fcinfo->flinfo));
}
Datum
gbt_date_penalty(PG_FUNCTION_ARGS)
{
@ -238,7 +233,6 @@ gbt_date_penalty(PG_FUNCTION_ARGS)
PG_RETURN_POINTER(result);
}
Datum
gbt_date_picksplit(PG_FUNCTION_ARGS)
{
@ -257,3 +251,26 @@ gbt_date_same(PG_FUNCTION_ARGS)
*result = gbt_num_same((void *) b1, (void *) b2, &tinfo, fcinfo->flinfo);
PG_RETURN_POINTER(result);
}
static int
gbt_date_ssup_cmp(Datum x, Datum y, SortSupport ssup)
{
dateKEY *akey = (dateKEY *) DatumGetPointer(x);
dateKEY *bkey = (dateKEY *) DatumGetPointer(y);
/* for leaf items we expect lower == upper, so only compare lower */
return DatumGetInt32(DirectFunctionCall2(date_cmp,
DateADTGetDatum(akey->lower),
DateADTGetDatum(bkey->lower)));
}
Datum
gbt_date_sortsupport(PG_FUNCTION_ARGS)
{
SortSupport ssup = (SortSupport) PG_GETARG_POINTER(0);
ssup->comparator = gbt_date_ssup_cmp;
ssup->ssup_extra = NULL;
PG_RETURN_VOID();
}

View File

@ -7,6 +7,8 @@
#include "btree_utils_num.h"
#include "fmgr.h"
#include "utils/fmgrprotos.h"
#include "utils/fmgroids.h"
#include "utils/sortsupport.h"
/* enums are really Oids, so we just use the same structure */
@ -16,9 +18,7 @@ typedef struct
Oid upper;
} oidKEY;
/*
** enum ops
*/
/* GiST support functions */
PG_FUNCTION_INFO_V1(gbt_enum_compress);
PG_FUNCTION_INFO_V1(gbt_enum_fetch);
PG_FUNCTION_INFO_V1(gbt_enum_union);
@ -26,6 +26,7 @@ PG_FUNCTION_INFO_V1(gbt_enum_picksplit);
PG_FUNCTION_INFO_V1(gbt_enum_consistent);
PG_FUNCTION_INFO_V1(gbt_enum_penalty);
PG_FUNCTION_INFO_V1(gbt_enum_same);
PG_FUNCTION_INFO_V1(gbt_enum_sortsupport);
static bool
@ -99,10 +100,9 @@ static const gbtree_ninfo tinfo =
/**************************************************
* Enum ops
* GiST support functions
**************************************************/
Datum
gbt_enum_compress(PG_FUNCTION_ARGS)
{
@ -152,7 +152,6 @@ gbt_enum_union(PG_FUNCTION_ARGS)
PG_RETURN_POINTER(gbt_num_union(out, entryvec, &tinfo, fcinfo->flinfo));
}
Datum
gbt_enum_penalty(PG_FUNCTION_ARGS)
{
@ -183,3 +182,39 @@ gbt_enum_same(PG_FUNCTION_ARGS)
*result = gbt_num_same((void *) b1, (void *) b2, &tinfo, fcinfo->flinfo);
PG_RETURN_POINTER(result);
}
static int
gbt_enum_ssup_cmp(Datum x, Datum y, SortSupport ssup)
{
oidKEY *arg1 = (oidKEY *) DatumGetPointer(x);
oidKEY *arg2 = (oidKEY *) DatumGetPointer(y);
/* for leaf items we expect lower == upper, so only compare lower */
return DatumGetInt32(CallerFInfoFunctionCall2(enum_cmp,
ssup->ssup_extra,
InvalidOid,
arg1->lower,
arg2->lower));
}
Datum
gbt_enum_sortsupport(PG_FUNCTION_ARGS)
{
SortSupport ssup = (SortSupport) PG_GETARG_POINTER(0);
FmgrInfo *flinfo;
ssup->comparator = gbt_enum_ssup_cmp;
/*
* Since gbt_enum_ssup_cmp() uses enum_cmp() like the rest of the
* comparison functions, it also needs to pass flinfo when calling it. The
* caller to a SortSupport comparison function doesn't provide an FmgrInfo
* struct, so look it up now, save it in ssup_extra and use it in
* gbt_enum_ssup_cmp() later.
*/
flinfo = MemoryContextAlloc(ssup->ssup_cxt, sizeof(FmgrInfo));
fmgr_info_cxt(F_ENUM_CMP, flinfo, ssup->ssup_cxt);
ssup->ssup_extra = flinfo;
PG_RETURN_VOID();
}
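
This is the one comparator in the set that needs runtime state: enum_cmp() wants an FmgrInfo, so it is allocated once in the sort's memory context and stashed in ssup_extra. The same pattern, reduced to a skeleton for any comparator needing per-sort state; the my_* names are illustrative, not from the patch.

#include "postgres.h"
#include "fmgr.h"
#include "utils/fmgroids.h"
#include "utils/sortsupport.h"

static int
my_ssup_cmp(Datum x, Datum y, SortSupport ssup)
{
	FmgrInfo   *flinfo = (FmgrInfo *) ssup->ssup_extra;	/* cached below */

	return DatumGetInt32(FunctionCall2(flinfo, x, y));
}

Datum
my_sortsupport(PG_FUNCTION_ARGS)
{
	SortSupport ssup = (SortSupport) PG_GETARG_POINTER(0);
	FmgrInfo   *flinfo;

	/* allocate in ssup_cxt so the cache lives exactly as long as the sort */
	flinfo = MemoryContextAlloc(ssup->ssup_cxt, sizeof(FmgrInfo));
	fmgr_info_cxt(F_ENUM_CMP, flinfo, ssup->ssup_cxt);
	ssup->ssup_extra = flinfo;
	ssup->comparator = my_ssup_cmp;
	PG_RETURN_VOID();
}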

View File

@ -6,6 +6,7 @@
#include "btree_gist.h"
#include "btree_utils_num.h"
#include "utils/float.h"
#include "utils/sortsupport.h"
typedef struct float4key
{
@ -13,9 +14,7 @@ typedef struct float4key
float4 upper;
} float4KEY;
/*
** float4 ops
*/
/* GiST support functions */
PG_FUNCTION_INFO_V1(gbt_float4_compress);
PG_FUNCTION_INFO_V1(gbt_float4_fetch);
PG_FUNCTION_INFO_V1(gbt_float4_union);
@ -24,6 +23,7 @@ PG_FUNCTION_INFO_V1(gbt_float4_consistent);
PG_FUNCTION_INFO_V1(gbt_float4_distance);
PG_FUNCTION_INFO_V1(gbt_float4_penalty);
PG_FUNCTION_INFO_V1(gbt_float4_same);
PG_FUNCTION_INFO_V1(gbt_float4_sortsupport);
static bool
gbt_float4gt(const void *a, const void *b, FmgrInfo *flinfo)
@ -107,10 +107,9 @@ float4_dist(PG_FUNCTION_ARGS)
/**************************************************
* float4 ops
* GiST support functions
**************************************************/
Datum
gbt_float4_compress(PG_FUNCTION_ARGS)
{
@ -150,7 +149,6 @@ gbt_float4_consistent(PG_FUNCTION_ARGS)
fcinfo->flinfo));
}
Datum
gbt_float4_distance(PG_FUNCTION_ARGS)
{
@ -168,7 +166,6 @@ gbt_float4_distance(PG_FUNCTION_ARGS)
&tinfo, fcinfo->flinfo));
}
Datum
gbt_float4_union(PG_FUNCTION_ARGS)
{
@ -179,7 +176,6 @@ gbt_float4_union(PG_FUNCTION_ARGS)
PG_RETURN_POINTER(gbt_num_union(out, entryvec, &tinfo, fcinfo->flinfo));
}
Datum
gbt_float4_penalty(PG_FUNCTION_ARGS)
{
@ -210,3 +206,24 @@ gbt_float4_same(PG_FUNCTION_ARGS)
*result = gbt_num_same((void *) b1, (void *) b2, &tinfo, fcinfo->flinfo);
PG_RETURN_POINTER(result);
}
static int
gbt_float4_ssup_cmp(Datum x, Datum y, SortSupport ssup)
{
float4KEY *arg1 = (float4KEY *) DatumGetPointer(x);
float4KEY *arg2 = (float4KEY *) DatumGetPointer(y);
/* for leaf items we expect lower == upper, so only compare lower */
return float4_cmp_internal(arg1->lower, arg2->lower);
}
Datum
gbt_float4_sortsupport(PG_FUNCTION_ARGS)
{
SortSupport ssup = (SortSupport) PG_GETARG_POINTER(0);
ssup->comparator = gbt_float4_ssup_cmp;
ssup->ssup_extra = NULL;
PG_RETURN_VOID();
}

View File

@ -6,6 +6,7 @@
#include "btree_gist.h"
#include "btree_utils_num.h"
#include "utils/float.h"
#include "utils/sortsupport.h"
typedef struct float8key
{
@ -13,9 +14,7 @@ typedef struct float8key
float8 upper;
} float8KEY;
/*
** float8 ops
*/
/* GiST support functions */
PG_FUNCTION_INFO_V1(gbt_float8_compress);
PG_FUNCTION_INFO_V1(gbt_float8_fetch);
PG_FUNCTION_INFO_V1(gbt_float8_union);
@ -24,6 +23,7 @@ PG_FUNCTION_INFO_V1(gbt_float8_consistent);
PG_FUNCTION_INFO_V1(gbt_float8_distance);
PG_FUNCTION_INFO_V1(gbt_float8_penalty);
PG_FUNCTION_INFO_V1(gbt_float8_same);
PG_FUNCTION_INFO_V1(gbt_float8_sortsupport);
static bool
@ -113,10 +113,10 @@ float8_dist(PG_FUNCTION_ARGS)
PG_RETURN_FLOAT8(fabs(r));
}
/**************************************************
* float8 ops
**************************************************/
/**************************************************
* GiST support functions
**************************************************/
Datum
gbt_float8_compress(PG_FUNCTION_ARGS)
@ -157,7 +157,6 @@ gbt_float8_consistent(PG_FUNCTION_ARGS)
fcinfo->flinfo));
}
Datum
gbt_float8_distance(PG_FUNCTION_ARGS)
{
@ -175,7 +174,6 @@ gbt_float8_distance(PG_FUNCTION_ARGS)
&tinfo, fcinfo->flinfo));
}
Datum
gbt_float8_union(PG_FUNCTION_ARGS)
{
@ -186,7 +184,6 @@ gbt_float8_union(PG_FUNCTION_ARGS)
PG_RETURN_POINTER(gbt_num_union(out, entryvec, &tinfo, fcinfo->flinfo));
}
Datum
gbt_float8_penalty(PG_FUNCTION_ARGS)
{
@ -217,3 +214,24 @@ gbt_float8_same(PG_FUNCTION_ARGS)
*result = gbt_num_same((void *) b1, (void *) b2, &tinfo, fcinfo->flinfo);
PG_RETURN_POINTER(result);
}
static int
gbt_float8_ssup_cmp(Datum x, Datum y, SortSupport ssup)
{
float8KEY *arg1 = (float8KEY *) DatumGetPointer(x);
float8KEY *arg2 = (float8KEY *) DatumGetPointer(y);
/* for leaf items we expect lower == upper, so only compare lower */
return float8_cmp_internal(arg1->lower, arg2->lower);
}
Datum
gbt_float8_sortsupport(PG_FUNCTION_ARGS)
{
SortSupport ssup = (SortSupport) PG_GETARG_POINTER(0);
ssup->comparator = gbt_float8_ssup_cmp;
ssup->ssup_extra = NULL;
PG_RETURN_VOID();
}

View File

@ -9,79 +9,79 @@ AS 'MODULE_PATHNAME'
LANGUAGE C IMMUTABLE PARALLEL SAFE STRICT;
ALTER OPERATOR FAMILY gist_oid_ops USING gist ADD
FUNCTION 12 (oid, oid) gist_stratnum_btree (int) ;
FUNCTION 12 ("any", "any") gist_stratnum_btree (int) ;
ALTER OPERATOR FAMILY gist_int2_ops USING gist ADD
FUNCTION 12 (int2, int2) gist_stratnum_btree (int) ;
FUNCTION 12 ("any", "any") gist_stratnum_btree (int) ;
ALTER OPERATOR FAMILY gist_int4_ops USING gist ADD
FUNCTION 12 (int4, int4) gist_stratnum_btree (int) ;
FUNCTION 12 ("any", "any") gist_stratnum_btree (int) ;
ALTER OPERATOR FAMILY gist_int8_ops USING gist ADD
FUNCTION 12 (int8, int8) gist_stratnum_btree (int) ;
FUNCTION 12 ("any", "any") gist_stratnum_btree (int) ;
ALTER OPERATOR FAMILY gist_float4_ops USING gist ADD
FUNCTION 12 (float4, float4) gist_stratnum_btree (int) ;
FUNCTION 12 ("any", "any") gist_stratnum_btree (int) ;
ALTER OPERATOR FAMILY gist_float8_ops USING gist ADD
FUNCTION 12 (float8, float8) gist_stratnum_btree (int) ;
FUNCTION 12 ("any", "any") gist_stratnum_btree (int) ;
ALTER OPERATOR FAMILY gist_timestamp_ops USING gist ADD
FUNCTION 12 (timestamp, timestamp) gist_stratnum_btree (int) ;
FUNCTION 12 ("any", "any") gist_stratnum_btree (int) ;
ALTER OPERATOR FAMILY gist_timestamptz_ops USING gist ADD
FUNCTION 12 (timestamptz, timestamptz) gist_stratnum_btree (int) ;
FUNCTION 12 ("any", "any") gist_stratnum_btree (int) ;
ALTER OPERATOR FAMILY gist_time_ops USING gist ADD
FUNCTION 12 (time, time) gist_stratnum_btree (int) ;
FUNCTION 12 ("any", "any") gist_stratnum_btree (int) ;
ALTER OPERATOR FAMILY gist_date_ops USING gist ADD
FUNCTION 12 (date, date) gist_stratnum_btree (int) ;
FUNCTION 12 ("any", "any") gist_stratnum_btree (int) ;
ALTER OPERATOR FAMILY gist_interval_ops USING gist ADD
FUNCTION 12 (interval, interval) gist_stratnum_btree (int) ;
FUNCTION 12 ("any", "any") gist_stratnum_btree (int) ;
ALTER OPERATOR FAMILY gist_cash_ops USING gist ADD
FUNCTION 12 (money, money) gist_stratnum_btree (int) ;
FUNCTION 12 ("any", "any") gist_stratnum_btree (int) ;
ALTER OPERATOR FAMILY gist_macaddr_ops USING gist ADD
FUNCTION 12 (macaddr, macaddr) gist_stratnum_btree (int) ;
FUNCTION 12 ("any", "any") gist_stratnum_btree (int) ;
ALTER OPERATOR FAMILY gist_text_ops USING gist ADD
FUNCTION 12 (text, text) gist_stratnum_btree (int) ;
FUNCTION 12 ("any", "any") gist_stratnum_btree (int) ;
ALTER OPERATOR FAMILY gist_bpchar_ops USING gist ADD
FUNCTION 12 (bpchar, bpchar) gist_stratnum_btree (int) ;
FUNCTION 12 ("any", "any") gist_stratnum_btree (int) ;
ALTER OPERATOR FAMILY gist_bytea_ops USING gist ADD
FUNCTION 12 (bytea, bytea) gist_stratnum_btree (int) ;
FUNCTION 12 ("any", "any") gist_stratnum_btree (int) ;
ALTER OPERATOR FAMILY gist_numeric_ops USING gist ADD
FUNCTION 12 (numeric, numeric) gist_stratnum_btree (int) ;
FUNCTION 12 ("any", "any") gist_stratnum_btree (int) ;
ALTER OPERATOR FAMILY gist_bit_ops USING gist ADD
FUNCTION 12 (bit, bit) gist_stratnum_btree (int) ;
FUNCTION 12 ("any", "any") gist_stratnum_btree (int) ;
ALTER OPERATOR FAMILY gist_vbit_ops USING gist ADD
FUNCTION 12 (varbit, varbit) gist_stratnum_btree (int) ;
FUNCTION 12 ("any", "any") gist_stratnum_btree (int) ;
ALTER OPERATOR FAMILY gist_inet_ops USING gist ADD
FUNCTION 12 (inet, inet) gist_stratnum_btree (int) ;
FUNCTION 12 ("any", "any") gist_stratnum_btree (int) ;
ALTER OPERATOR FAMILY gist_cidr_ops USING gist ADD
FUNCTION 12 (cidr, cidr) gist_stratnum_btree (int) ;
FUNCTION 12 ("any", "any") gist_stratnum_btree (int) ;
ALTER OPERATOR FAMILY gist_timetz_ops USING gist ADD
FUNCTION 12 (timetz, timetz) gist_stratnum_btree (int) ;
FUNCTION 12 ("any", "any") gist_stratnum_btree (int) ;
ALTER OPERATOR FAMILY gist_uuid_ops USING gist ADD
FUNCTION 12 (uuid, uuid) gist_stratnum_btree (int) ;
FUNCTION 12 ("any", "any") gist_stratnum_btree (int) ;
ALTER OPERATOR FAMILY gist_macaddr8_ops USING gist ADD
FUNCTION 12 (macaddr8, macaddr8) gist_stratnum_btree (int) ;
FUNCTION 12 ("any", "any") gist_stratnum_btree (int) ;
ALTER OPERATOR FAMILY gist_enum_ops USING gist ADD
FUNCTION 12 (anyenum, anyenum) gist_stratnum_btree (int) ;
FUNCTION 12 ("any", "any") gist_stratnum_btree (int) ;
ALTER OPERATOR FAMILY gist_bool_ops USING gist ADD
FUNCTION 12 (bool, bool) gist_stratnum_btree (int) ;
FUNCTION 12 ("any", "any") gist_stratnum_btree (int) ;

View File

@ -0,0 +1,197 @@
/* contrib/btree_gist/btree_gist--1.8--1.9.sql */
-- complain if script is sourced in psql, rather than via CREATE EXTENSION
\echo Use "ALTER EXTENSION btree_gist UPDATE TO '1.9'" to load this file. \quit
CREATE FUNCTION gbt_bit_sortsupport(internal)
RETURNS void
AS 'MODULE_PATHNAME'
LANGUAGE C IMMUTABLE PARALLEL SAFE STRICT;
CREATE FUNCTION gbt_varbit_sortsupport(internal)
RETURNS void
AS 'MODULE_PATHNAME'
LANGUAGE C IMMUTABLE PARALLEL SAFE STRICT;
CREATE FUNCTION gbt_bool_sortsupport(internal)
RETURNS void
AS 'MODULE_PATHNAME'
LANGUAGE C IMMUTABLE PARALLEL SAFE STRICT;
CREATE FUNCTION gbt_bytea_sortsupport(internal)
RETURNS void
AS 'MODULE_PATHNAME'
LANGUAGE C IMMUTABLE PARALLEL SAFE STRICT;
CREATE FUNCTION gbt_cash_sortsupport(internal)
RETURNS void
AS 'MODULE_PATHNAME'
LANGUAGE C IMMUTABLE PARALLEL SAFE STRICT;
CREATE FUNCTION gbt_date_sortsupport(internal)
RETURNS void
AS 'MODULE_PATHNAME'
LANGUAGE C IMMUTABLE PARALLEL SAFE STRICT;
CREATE FUNCTION gbt_enum_sortsupport(internal)
RETURNS void
AS 'MODULE_PATHNAME'
LANGUAGE C IMMUTABLE PARALLEL SAFE STRICT;
CREATE FUNCTION gbt_float4_sortsupport(internal)
RETURNS void
AS 'MODULE_PATHNAME'
LANGUAGE C IMMUTABLE PARALLEL SAFE STRICT;
CREATE FUNCTION gbt_float8_sortsupport(internal)
RETURNS void
AS 'MODULE_PATHNAME'
LANGUAGE C IMMUTABLE PARALLEL SAFE STRICT;
CREATE FUNCTION gbt_inet_sortsupport(internal)
RETURNS void
AS 'MODULE_PATHNAME'
LANGUAGE C IMMUTABLE PARALLEL SAFE STRICT;
CREATE FUNCTION gbt_int2_sortsupport(internal)
RETURNS void
AS 'MODULE_PATHNAME'
LANGUAGE C IMMUTABLE PARALLEL SAFE STRICT;
CREATE FUNCTION gbt_int4_sortsupport(internal)
RETURNS void
AS 'MODULE_PATHNAME'
LANGUAGE C IMMUTABLE PARALLEL SAFE STRICT;
CREATE FUNCTION gbt_int8_sortsupport(internal)
RETURNS void
AS 'MODULE_PATHNAME'
LANGUAGE C IMMUTABLE PARALLEL SAFE STRICT;
CREATE FUNCTION gbt_intv_sortsupport(internal)
RETURNS void
AS 'MODULE_PATHNAME'
LANGUAGE C IMMUTABLE PARALLEL SAFE STRICT;
CREATE FUNCTION gbt_macaddr_sortsupport(internal)
RETURNS void
AS 'MODULE_PATHNAME'
LANGUAGE C IMMUTABLE PARALLEL SAFE STRICT;
CREATE FUNCTION gbt_macad8_sortsupport(internal)
RETURNS void
AS 'MODULE_PATHNAME'
LANGUAGE C IMMUTABLE PARALLEL SAFE STRICT;
CREATE FUNCTION gbt_numeric_sortsupport(internal)
RETURNS void
AS 'MODULE_PATHNAME'
LANGUAGE C IMMUTABLE PARALLEL SAFE STRICT;
CREATE FUNCTION gbt_oid_sortsupport(internal)
RETURNS void
AS 'MODULE_PATHNAME'
LANGUAGE C IMMUTABLE PARALLEL SAFE STRICT;
CREATE FUNCTION gbt_text_sortsupport(internal)
RETURNS void
AS 'MODULE_PATHNAME'
LANGUAGE C IMMUTABLE PARALLEL SAFE STRICT;
CREATE FUNCTION gbt_bpchar_sortsupport(internal)
RETURNS void
AS 'MODULE_PATHNAME'
LANGUAGE C IMMUTABLE PARALLEL SAFE STRICT;
CREATE FUNCTION gbt_time_sortsupport(internal)
RETURNS void
AS 'MODULE_PATHNAME'
LANGUAGE C IMMUTABLE PARALLEL SAFE STRICT;
CREATE FUNCTION gbt_ts_sortsupport(internal)
RETURNS void
AS 'MODULE_PATHNAME'
LANGUAGE C IMMUTABLE PARALLEL SAFE STRICT;
CREATE FUNCTION gbt_uuid_sortsupport(internal)
RETURNS void
AS 'MODULE_PATHNAME'
LANGUAGE C IMMUTABLE PARALLEL SAFE STRICT;
ALTER OPERATOR FAMILY gist_bit_ops USING gist ADD
FUNCTION 11 (bit, bit) gbt_bit_sortsupport (internal) ;
ALTER OPERATOR FAMILY gist_vbit_ops USING gist ADD
FUNCTION 11 (varbit, varbit) gbt_varbit_sortsupport (internal) ;
ALTER OPERATOR FAMILY gist_bool_ops USING gist ADD
FUNCTION 11 (bool, bool) gbt_bool_sortsupport (internal) ;
ALTER OPERATOR FAMILY gist_bytea_ops USING gist ADD
FUNCTION 11 (bytea, bytea) gbt_bytea_sortsupport (internal) ;
ALTER OPERATOR FAMILY gist_cash_ops USING gist ADD
FUNCTION 11 (money, money) gbt_cash_sortsupport (internal) ;
ALTER OPERATOR FAMILY gist_date_ops USING gist ADD
FUNCTION 11 (date, date) gbt_date_sortsupport (internal) ;
ALTER OPERATOR FAMILY gist_enum_ops USING gist ADD
FUNCTION 11 (anyenum, anyenum) gbt_enum_sortsupport (internal) ;
ALTER OPERATOR FAMILY gist_float4_ops USING gist ADD
FUNCTION 11 (float4, float4) gbt_float4_sortsupport (internal) ;
ALTER OPERATOR FAMILY gist_float8_ops USING gist ADD
FUNCTION 11 (float8, float8) gbt_float8_sortsupport (internal) ;
ALTER OPERATOR FAMILY gist_inet_ops USING gist ADD
FUNCTION 11 (inet, inet) gbt_inet_sortsupport (internal) ;
ALTER OPERATOR FAMILY gist_cidr_ops USING gist ADD
FUNCTION 11 (cidr, cidr) gbt_inet_sortsupport (internal) ;
ALTER OPERATOR FAMILY gist_int2_ops USING gist ADD
FUNCTION 11 (int2, int2) gbt_int2_sortsupport (internal) ;
ALTER OPERATOR FAMILY gist_int4_ops USING gist ADD
FUNCTION 11 (int4, int4) gbt_int4_sortsupport (internal) ;
ALTER OPERATOR FAMILY gist_int8_ops USING gist ADD
FUNCTION 11 (int8, int8) gbt_int8_sortsupport (internal) ;
ALTER OPERATOR FAMILY gist_interval_ops USING gist ADD
FUNCTION 11 (interval, interval) gbt_intv_sortsupport (internal) ;
ALTER OPERATOR FAMILY gist_macaddr_ops USING gist ADD
FUNCTION 11 (macaddr, macaddr) gbt_macaddr_sortsupport (internal) ;
ALTER OPERATOR FAMILY gist_macaddr8_ops USING gist ADD
FUNCTION 11 (macaddr8, macaddr8) gbt_macad8_sortsupport (internal) ;
ALTER OPERATOR FAMILY gist_numeric_ops USING gist ADD
FUNCTION 11 (numeric, numeric) gbt_numeric_sortsupport (internal) ;
ALTER OPERATOR FAMILY gist_oid_ops USING gist ADD
FUNCTION 11 (oid, oid) gbt_oid_sortsupport (internal) ;
ALTER OPERATOR FAMILY gist_text_ops USING gist ADD
FUNCTION 11 (text, text) gbt_text_sortsupport (internal) ;
ALTER OPERATOR FAMILY gist_bpchar_ops USING gist ADD
FUNCTION 11 (bpchar, bpchar) gbt_bpchar_sortsupport (internal) ;
ALTER OPERATOR FAMILY gist_time_ops USING gist ADD
FUNCTION 11 (time, time) gbt_time_sortsupport (internal) ;
ALTER OPERATOR FAMILY gist_timetz_ops USING gist ADD
FUNCTION 11 (timetz, timetz) gbt_time_sortsupport (internal) ;
ALTER OPERATOR FAMILY gist_timestamp_ops USING gist ADD
FUNCTION 11 (timestamp, timestamp) gbt_ts_sortsupport (internal) ;
ALTER OPERATOR FAMILY gist_timestamptz_ops USING gist ADD
FUNCTION 11 (timestamptz, timestamptz) gbt_ts_sortsupport (internal) ;
ALTER OPERATOR FAMILY gist_uuid_ops USING gist ADD
FUNCTION 11 (uuid, uuid) gbt_uuid_sortsupport (internal) ;

View File

@ -7,7 +7,10 @@
#include "access/stratnum.h"
#include "utils/builtins.h"
PG_MODULE_MAGIC;
PG_MODULE_MAGIC_EXT(
.name = "btree_gist",
.version = PG_VERSION
);
PG_FUNCTION_INFO_V1(gbt_decompress);
PG_FUNCTION_INFO_V1(gbtreekey_in);

View File

@ -1,6 +1,6 @@
# btree_gist extension
comment = 'support for indexing common datatypes in GiST'
default_version = '1.8'
default_version = '1.9'
module_pathname = '$libdir/btree_gist'
relocatable = true
trusted = true

View File

@ -7,6 +7,7 @@
#include "btree_utils_num.h"
#include "catalog/pg_type.h"
#include "utils/builtins.h"
#include "utils/sortsupport.h"
typedef struct inetkey
{
@ -14,15 +15,14 @@ typedef struct inetkey
double upper;
} inetKEY;
/*
** inet ops
*/
/* GiST support functions */
PG_FUNCTION_INFO_V1(gbt_inet_compress);
PG_FUNCTION_INFO_V1(gbt_inet_union);
PG_FUNCTION_INFO_V1(gbt_inet_picksplit);
PG_FUNCTION_INFO_V1(gbt_inet_consistent);
PG_FUNCTION_INFO_V1(gbt_inet_penalty);
PG_FUNCTION_INFO_V1(gbt_inet_same);
PG_FUNCTION_INFO_V1(gbt_inet_sortsupport);
static bool
@ -85,10 +85,9 @@ static const gbtree_ninfo tinfo =
/**************************************************
* inet ops
* GiST support functions
**************************************************/
Datum
gbt_inet_compress(PG_FUNCTION_ARGS)
{
@ -114,7 +113,6 @@ gbt_inet_compress(PG_FUNCTION_ARGS)
PG_RETURN_POINTER(retval);
}
Datum
gbt_inet_consistent(PG_FUNCTION_ARGS)
{
@ -142,7 +140,6 @@ gbt_inet_consistent(PG_FUNCTION_ARGS)
&strategy, GIST_LEAF(entry), &tinfo, fcinfo->flinfo));
}
Datum
gbt_inet_union(PG_FUNCTION_ARGS)
{
@ -153,7 +150,6 @@ gbt_inet_union(PG_FUNCTION_ARGS)
PG_RETURN_POINTER(gbt_num_union(out, entryvec, &tinfo, fcinfo->flinfo));
}
Datum
gbt_inet_penalty(PG_FUNCTION_ARGS)
{
@ -184,3 +180,29 @@ gbt_inet_same(PG_FUNCTION_ARGS)
*result = gbt_num_same((void *) b1, (void *) b2, &tinfo, fcinfo->flinfo);
PG_RETURN_POINTER(result);
}
static int
gbt_inet_ssup_cmp(Datum x, Datum y, SortSupport ssup)
{
inetKEY *arg1 = (inetKEY *) DatumGetPointer(x);
inetKEY *arg2 = (inetKEY *) DatumGetPointer(y);
/* for leaf items we expect lower == upper, so only compare lower */
if (arg1->lower < arg2->lower)
return -1;
else if (arg1->lower > arg2->lower)
return 1;
else
return 0;
}
Datum
gbt_inet_sortsupport(PG_FUNCTION_ARGS)
{
SortSupport ssup = (SortSupport) PG_GETARG_POINTER(0);
ssup->comparator = gbt_inet_ssup_cmp;
ssup->ssup_extra = NULL;
PG_RETURN_VOID();
}

View File

@ -6,6 +6,7 @@
#include "btree_gist.h"
#include "btree_utils_num.h"
#include "common/int.h"
#include "utils/sortsupport.h"
typedef struct int16key
{
@ -13,9 +14,7 @@ typedef struct int16key
int16 upper;
} int16KEY;
/*
** int16 ops
*/
/* GiST support functions */
PG_FUNCTION_INFO_V1(gbt_int2_compress);
PG_FUNCTION_INFO_V1(gbt_int2_fetch);
PG_FUNCTION_INFO_V1(gbt_int2_union);
@ -24,6 +23,8 @@ PG_FUNCTION_INFO_V1(gbt_int2_consistent);
PG_FUNCTION_INFO_V1(gbt_int2_distance);
PG_FUNCTION_INFO_V1(gbt_int2_penalty);
PG_FUNCTION_INFO_V1(gbt_int2_same);
PG_FUNCTION_INFO_V1(gbt_int2_sortsupport);
static bool
gbt_int2gt(const void *a, const void *b, FmgrInfo *flinfo)
@ -112,10 +113,9 @@ int2_dist(PG_FUNCTION_ARGS)
/**************************************************
* int16 ops
* GiST support functions
**************************************************/
Datum
gbt_int2_compress(PG_FUNCTION_ARGS)
{
@ -154,7 +154,6 @@ gbt_int2_consistent(PG_FUNCTION_ARGS)
GIST_LEAF(entry), &tinfo, fcinfo->flinfo));
}
Datum
gbt_int2_distance(PG_FUNCTION_ARGS)
{
@ -172,7 +171,6 @@ gbt_int2_distance(PG_FUNCTION_ARGS)
&tinfo, fcinfo->flinfo));
}
Datum
gbt_int2_union(PG_FUNCTION_ARGS)
{
@ -183,7 +181,6 @@ gbt_int2_union(PG_FUNCTION_ARGS)
PG_RETURN_POINTER(gbt_num_union(out, entryvec, &tinfo, fcinfo->flinfo));
}
Datum
gbt_int2_penalty(PG_FUNCTION_ARGS)
{
@ -214,3 +211,27 @@ gbt_int2_same(PG_FUNCTION_ARGS)
*result = gbt_num_same((void *) b1, (void *) b2, &tinfo, fcinfo->flinfo);
PG_RETURN_POINTER(result);
}
static int
gbt_int2_ssup_cmp(Datum x, Datum y, SortSupport ssup)
{
int16KEY *arg1 = (int16KEY *) DatumGetPointer(x);
int16KEY *arg2 = (int16KEY *) DatumGetPointer(y);
/* for leaf items we expect lower == upper, so only compare lower */
if (arg1->lower < arg2->lower)
return -1;
else if (arg1->lower > arg2->lower)
return 1;
else
return 0;
}
Datum
gbt_int2_sortsupport(PG_FUNCTION_ARGS)
{
SortSupport ssup = (SortSupport) PG_GETARG_POINTER(0);
ssup->comparator = gbt_int2_ssup_cmp;
PG_RETURN_VOID();
}

View File

@ -2,10 +2,10 @@
* contrib/btree_gist/btree_int4.c
*/
#include "postgres.h"
#include "btree_gist.h"
#include "btree_utils_num.h"
#include "common/int.h"
#include "utils/sortsupport.h"
typedef struct int32key
{
@ -13,9 +13,7 @@ typedef struct int32key
int32 upper;
} int32KEY;
/*
** int32 ops
*/
/* GiST support functions */
PG_FUNCTION_INFO_V1(gbt_int4_compress);
PG_FUNCTION_INFO_V1(gbt_int4_fetch);
PG_FUNCTION_INFO_V1(gbt_int4_union);
@ -24,7 +22,7 @@ PG_FUNCTION_INFO_V1(gbt_int4_consistent);
PG_FUNCTION_INFO_V1(gbt_int4_distance);
PG_FUNCTION_INFO_V1(gbt_int4_penalty);
PG_FUNCTION_INFO_V1(gbt_int4_same);
PG_FUNCTION_INFO_V1(gbt_int4_sortsupport);
static bool
gbt_int4gt(const void *a, const void *b, FmgrInfo *flinfo)
@ -113,10 +111,9 @@ int4_dist(PG_FUNCTION_ARGS)
/**************************************************
* int32 ops
* GiST support functions
**************************************************/
Datum
gbt_int4_compress(PG_FUNCTION_ARGS)
{
@ -155,7 +152,6 @@ gbt_int4_consistent(PG_FUNCTION_ARGS)
GIST_LEAF(entry), &tinfo, fcinfo->flinfo));
}
Datum
gbt_int4_distance(PG_FUNCTION_ARGS)
{
@ -173,7 +169,6 @@ gbt_int4_distance(PG_FUNCTION_ARGS)
&tinfo, fcinfo->flinfo));
}
Datum
gbt_int4_union(PG_FUNCTION_ARGS)
{
@ -184,7 +179,6 @@ gbt_int4_union(PG_FUNCTION_ARGS)
PG_RETURN_POINTER(gbt_num_union(out, entryvec, &tinfo, fcinfo->flinfo));
}
Datum
gbt_int4_penalty(PG_FUNCTION_ARGS)
{
@ -215,3 +209,27 @@ gbt_int4_same(PG_FUNCTION_ARGS)
*result = gbt_num_same((void *) b1, (void *) b2, &tinfo, fcinfo->flinfo);
PG_RETURN_POINTER(result);
}
static int
gbt_int4_ssup_cmp(Datum a, Datum b, SortSupport ssup)
{
int32KEY *ia = (int32KEY *) DatumGetPointer(a);
int32KEY *ib = (int32KEY *) DatumGetPointer(b);
/* for leaf items we expect lower == upper, so only compare lower */
if (ia->lower < ib->lower)
return -1;
else if (ia->lower > ib->lower)
return 1;
else
return 0;
}
Datum
gbt_int4_sortsupport(PG_FUNCTION_ARGS)
{
SortSupport ssup = (SortSupport) PG_GETARG_POINTER(0);
ssup->comparator = gbt_int4_ssup_cmp;
PG_RETURN_VOID();
}

View File

@ -6,6 +6,7 @@
#include "btree_gist.h"
#include "btree_utils_num.h"
#include "common/int.h"
#include "utils/sortsupport.h"
typedef struct int64key
{
@ -13,9 +14,7 @@ typedef struct int64key
int64 upper;
} int64KEY;
/*
** int64 ops
*/
/* GiST support functions */
PG_FUNCTION_INFO_V1(gbt_int8_compress);
PG_FUNCTION_INFO_V1(gbt_int8_fetch);
PG_FUNCTION_INFO_V1(gbt_int8_union);
@ -24,6 +23,7 @@ PG_FUNCTION_INFO_V1(gbt_int8_consistent);
PG_FUNCTION_INFO_V1(gbt_int8_distance);
PG_FUNCTION_INFO_V1(gbt_int8_penalty);
PG_FUNCTION_INFO_V1(gbt_int8_same);
PG_FUNCTION_INFO_V1(gbt_int8_sortsupport);
static bool
@ -113,10 +113,9 @@ int8_dist(PG_FUNCTION_ARGS)
/**************************************************
* int64 ops
* GiST support functions
**************************************************/
Datum
gbt_int8_compress(PG_FUNCTION_ARGS)
{
@ -155,7 +154,6 @@ gbt_int8_consistent(PG_FUNCTION_ARGS)
GIST_LEAF(entry), &tinfo, fcinfo->flinfo));
}
Datum
gbt_int8_distance(PG_FUNCTION_ARGS)
{
@ -173,7 +171,6 @@ gbt_int8_distance(PG_FUNCTION_ARGS)
&tinfo, fcinfo->flinfo));
}
Datum
gbt_int8_union(PG_FUNCTION_ARGS)
{
@ -184,7 +181,6 @@ gbt_int8_union(PG_FUNCTION_ARGS)
PG_RETURN_POINTER(gbt_num_union(out, entryvec, &tinfo, fcinfo->flinfo));
}
Datum
gbt_int8_penalty(PG_FUNCTION_ARGS)
{
@ -215,3 +211,28 @@ gbt_int8_same(PG_FUNCTION_ARGS)
*result = gbt_num_same((void *) b1, (void *) b2, &tinfo, fcinfo->flinfo);
PG_RETURN_POINTER(result);
}
static int
gbt_int8_ssup_cmp(Datum x, Datum y, SortSupport ssup)
{
int64KEY *arg1 = (int64KEY *) DatumGetPointer(x);
int64KEY *arg2 = (int64KEY *) DatumGetPointer(y);
/* for leaf items we expect lower == upper, so only compare lower */
if (arg1->lower < arg2->lower)
return -1;
else if (arg1->lower > arg2->lower)
return 1;
else
return 0;
}
Datum
gbt_int8_sortsupport(PG_FUNCTION_ARGS)
{
SortSupport ssup = (SortSupport) PG_GETARG_POINTER(0);
ssup->comparator = gbt_int8_ssup_cmp;
PG_RETURN_VOID();
}

View File

@ -6,6 +6,7 @@
#include "btree_gist.h"
#include "btree_utils_num.h"
#include "utils/fmgrprotos.h"
#include "utils/sortsupport.h"
#include "utils/timestamp.h"
typedef struct
@ -14,10 +15,7 @@ typedef struct
upper;
} intvKEY;
/*
** Interval ops
*/
/* GiST support functions */
PG_FUNCTION_INFO_V1(gbt_intv_compress);
PG_FUNCTION_INFO_V1(gbt_intv_fetch);
PG_FUNCTION_INFO_V1(gbt_intv_decompress);
@ -27,6 +25,7 @@ PG_FUNCTION_INFO_V1(gbt_intv_consistent);
PG_FUNCTION_INFO_V1(gbt_intv_distance);
PG_FUNCTION_INFO_V1(gbt_intv_penalty);
PG_FUNCTION_INFO_V1(gbt_intv_same);
PG_FUNCTION_INFO_V1(gbt_intv_sortsupport);
static bool
@ -137,10 +136,9 @@ interval_dist(PG_FUNCTION_ARGS)
/**************************************************
* interval ops
* GiST support functions
**************************************************/
Datum
gbt_intv_compress(PG_FUNCTION_ARGS)
{
@ -295,3 +293,26 @@ gbt_intv_same(PG_FUNCTION_ARGS)
*result = gbt_num_same((void *) b1, (void *) b2, &tinfo, fcinfo->flinfo);
PG_RETURN_POINTER(result);
}
static int
gbt_intv_ssup_cmp(Datum x, Datum y, SortSupport ssup)
{
intvKEY *arg1 = (intvKEY *) DatumGetPointer(x);
intvKEY *arg2 = (intvKEY *) DatumGetPointer(y);
/* for leaf items we expect lower == upper, so only compare lower */
return DatumGetInt32(DirectFunctionCall2(interval_cmp,
IntervalPGetDatum(&arg1->lower),
IntervalPGetDatum(&arg2->lower)));
}
Datum
gbt_intv_sortsupport(PG_FUNCTION_ARGS)
{
SortSupport ssup = (SortSupport) PG_GETARG_POINTER(0);
ssup->comparator = gbt_intv_ssup_cmp;
ssup->ssup_extra = NULL;
PG_RETURN_VOID();
}

View File

@ -7,6 +7,7 @@
#include "btree_utils_num.h"
#include "utils/fmgrprotos.h"
#include "utils/inet.h"
#include "utils/sortsupport.h"
typedef struct
{
@ -15,9 +16,7 @@ typedef struct
char pad[4]; /* make struct size = sizeof(gbtreekey16) */
} macKEY;
/*
** OID ops
*/
/* GiST support functions */
PG_FUNCTION_INFO_V1(gbt_macad_compress);
PG_FUNCTION_INFO_V1(gbt_macad_fetch);
PG_FUNCTION_INFO_V1(gbt_macad_union);
@ -25,6 +24,7 @@ PG_FUNCTION_INFO_V1(gbt_macad_picksplit);
PG_FUNCTION_INFO_V1(gbt_macad_consistent);
PG_FUNCTION_INFO_V1(gbt_macad_penalty);
PG_FUNCTION_INFO_V1(gbt_macad_same);
PG_FUNCTION_INFO_V1(gbt_macaddr_sortsupport);
static bool
@ -88,11 +88,9 @@ static const gbtree_ninfo tinfo =
/**************************************************
* macaddr ops
* GiST support functions
**************************************************/
static uint64
mac_2_uint64(macaddr *m)
{
@ -105,8 +103,6 @@ mac_2_uint64(macaddr *m)
return res;
}
Datum
gbt_macad_compress(PG_FUNCTION_ARGS)
{
@ -194,3 +190,26 @@ gbt_macad_same(PG_FUNCTION_ARGS)
*result = gbt_num_same((void *) b1, (void *) b2, &tinfo, fcinfo->flinfo);
PG_RETURN_POINTER(result);
}
static int
gbt_macaddr_ssup_cmp(Datum x, Datum y, SortSupport ssup)
{
macKEY *arg1 = (macKEY *) DatumGetPointer(x);
macKEY *arg2 = (macKEY *) DatumGetPointer(y);
/* for leaf items we expect lower == upper, so only compare lower */
return DatumGetInt32(DirectFunctionCall2(macaddr_cmp,
MacaddrPGetDatum(&arg1->lower),
MacaddrPGetDatum(&arg2->lower)));
}
Datum
gbt_macaddr_sortsupport(PG_FUNCTION_ARGS)
{
SortSupport ssup = (SortSupport) PG_GETARG_POINTER(0);
ssup->comparator = gbt_macaddr_ssup_cmp;
ssup->ssup_extra = NULL;
PG_RETURN_VOID();
}

View File

@ -7,6 +7,7 @@
#include "btree_utils_num.h"
#include "utils/fmgrprotos.h"
#include "utils/inet.h"
#include "utils/sortsupport.h"
typedef struct
{
@ -15,9 +16,7 @@ typedef struct
/* make struct size = sizeof(gbtreekey16) */
} mac8KEY;
/*
** OID ops
*/
/* GiST support functions */
PG_FUNCTION_INFO_V1(gbt_macad8_compress);
PG_FUNCTION_INFO_V1(gbt_macad8_fetch);
PG_FUNCTION_INFO_V1(gbt_macad8_union);
@ -25,7 +24,7 @@ PG_FUNCTION_INFO_V1(gbt_macad8_picksplit);
PG_FUNCTION_INFO_V1(gbt_macad8_consistent);
PG_FUNCTION_INFO_V1(gbt_macad8_penalty);
PG_FUNCTION_INFO_V1(gbt_macad8_same);
PG_FUNCTION_INFO_V1(gbt_macad8_sortsupport);
static bool
gbt_macad8gt(const void *a, const void *b, FmgrInfo *flinfo)
@ -88,11 +87,9 @@ static const gbtree_ninfo tinfo =
/**************************************************
* macaddr ops
* GiST support functions
**************************************************/
static uint64
mac8_2_uint64(macaddr8 *m)
{
@ -105,8 +102,6 @@ mac8_2_uint64(macaddr8 *m)
return res;
}
Datum
gbt_macad8_compress(PG_FUNCTION_ARGS)
{
@ -145,7 +140,6 @@ gbt_macad8_consistent(PG_FUNCTION_ARGS)
GIST_LEAF(entry), &tinfo, fcinfo->flinfo));
}
Datum
gbt_macad8_union(PG_FUNCTION_ARGS)
{
@ -156,7 +150,6 @@ gbt_macad8_union(PG_FUNCTION_ARGS)
PG_RETURN_POINTER(gbt_num_union(out, entryvec, &tinfo, fcinfo->flinfo));
}
Datum
gbt_macad8_penalty(PG_FUNCTION_ARGS)
{
@ -194,3 +187,26 @@ gbt_macad8_same(PG_FUNCTION_ARGS)
*result = gbt_num_same((void *) b1, (void *) b2, &tinfo, fcinfo->flinfo);
PG_RETURN_POINTER(result);
}
static int
gbt_macaddr8_ssup_cmp(Datum x, Datum y, SortSupport ssup)
{
mac8KEY *arg1 = (mac8KEY *) DatumGetPointer(x);
mac8KEY *arg2 = (mac8KEY *) DatumGetPointer(y);
/* for leaf items we expect lower == upper, so only compare lower */
return DatumGetInt32(DirectFunctionCall2(macaddr8_cmp,
Macaddr8PGetDatum(&arg1->lower),
Macaddr8PGetDatum(&arg2->lower)));
}
Datum
gbt_macad8_sortsupport(PG_FUNCTION_ARGS)
{
SortSupport ssup = (SortSupport) PG_GETARG_POINTER(0);
ssup->comparator = gbt_macaddr8_ssup_cmp;
ssup->ssup_extra = NULL;
PG_RETURN_VOID();
}

View File

@ -11,16 +11,16 @@
#include "utils/builtins.h"
#include "utils/numeric.h"
#include "utils/rel.h"
#include "utils/sortsupport.h"
/*
** Bytea ops
*/
/* GiST support functions */
PG_FUNCTION_INFO_V1(gbt_numeric_compress);
PG_FUNCTION_INFO_V1(gbt_numeric_union);
PG_FUNCTION_INFO_V1(gbt_numeric_picksplit);
PG_FUNCTION_INFO_V1(gbt_numeric_consistent);
PG_FUNCTION_INFO_V1(gbt_numeric_penalty);
PG_FUNCTION_INFO_V1(gbt_numeric_same);
PG_FUNCTION_INFO_V1(gbt_numeric_sortsupport);
/* define for comparison */
@ -90,10 +90,9 @@ static const gbtree_vinfo tinfo =
/**************************************************
* Text ops
* GiST support functions
**************************************************/
Datum
gbt_numeric_compress(PG_FUNCTION_ARGS)
{
@ -102,8 +101,6 @@ gbt_numeric_compress(PG_FUNCTION_ARGS)
PG_RETURN_POINTER(gbt_var_compress(entry, &tinfo));
}
Datum
gbt_numeric_consistent(PG_FUNCTION_ARGS)
{
@ -125,8 +122,6 @@ gbt_numeric_consistent(PG_FUNCTION_ARGS)
PG_RETURN_BOOL(retval);
}
Datum
gbt_numeric_union(PG_FUNCTION_ARGS)
{
@ -137,7 +132,6 @@ gbt_numeric_union(PG_FUNCTION_ARGS)
&tinfo, fcinfo->flinfo));
}
Datum
gbt_numeric_same(PG_FUNCTION_ARGS)
{
@ -149,7 +143,6 @@ gbt_numeric_same(PG_FUNCTION_ARGS)
PG_RETURN_POINTER(result);
}
Datum
gbt_numeric_penalty(PG_FUNCTION_ARGS)
{
@ -215,8 +208,6 @@ gbt_numeric_penalty(PG_FUNCTION_ARGS)
PG_RETURN_POINTER(result);
}
Datum
gbt_numeric_picksplit(PG_FUNCTION_ARGS)
{
@ -227,3 +218,35 @@ gbt_numeric_picksplit(PG_FUNCTION_ARGS)
&tinfo, fcinfo->flinfo);
PG_RETURN_POINTER(v);
}
static int
gbt_numeric_ssup_cmp(Datum x, Datum y, SortSupport ssup)
{
GBT_VARKEY *key1 = PG_DETOAST_DATUM(x);
GBT_VARKEY *key2 = PG_DETOAST_DATUM(y);
GBT_VARKEY_R arg1 = gbt_var_key_readable(key1);
GBT_VARKEY_R arg2 = gbt_var_key_readable(key2);
Datum result;
/* for leaf items we expect lower == upper, so only compare lower */
result = DirectFunctionCall2(numeric_cmp,
PointerGetDatum(arg1.lower),
PointerGetDatum(arg2.lower));
GBT_FREE_IF_COPY(key1, x);
GBT_FREE_IF_COPY(key2, y);
return DatumGetInt32(result);
}
Datum
gbt_numeric_sortsupport(PG_FUNCTION_ARGS)
{
SortSupport ssup = (SortSupport) PG_GETARG_POINTER(0);
ssup->comparator = gbt_numeric_ssup_cmp;
ssup->ssup_extra = NULL;
PG_RETURN_VOID();
}

View File

@ -5,6 +5,7 @@
#include "btree_gist.h"
#include "btree_utils_num.h"
#include "utils/sortsupport.h"
typedef struct
{
@ -12,9 +13,7 @@ typedef struct
Oid upper;
} oidKEY;
/*
** OID ops
*/
/* GiST support functions */
PG_FUNCTION_INFO_V1(gbt_oid_compress);
PG_FUNCTION_INFO_V1(gbt_oid_fetch);
PG_FUNCTION_INFO_V1(gbt_oid_union);
@ -23,6 +22,7 @@ PG_FUNCTION_INFO_V1(gbt_oid_consistent);
PG_FUNCTION_INFO_V1(gbt_oid_distance);
PG_FUNCTION_INFO_V1(gbt_oid_penalty);
PG_FUNCTION_INFO_V1(gbt_oid_same);
PG_FUNCTION_INFO_V1(gbt_oid_sortsupport);
static bool
@ -113,10 +113,9 @@ oid_dist(PG_FUNCTION_ARGS)
/**************************************************
* Oid ops
* GiST support functions
**************************************************/
Datum
gbt_oid_compress(PG_FUNCTION_ARGS)
{
@ -155,7 +154,6 @@ gbt_oid_consistent(PG_FUNCTION_ARGS)
GIST_LEAF(entry), &tinfo, fcinfo->flinfo));
}
Datum
gbt_oid_distance(PG_FUNCTION_ARGS)
{
@ -173,7 +171,6 @@ gbt_oid_distance(PG_FUNCTION_ARGS)
&tinfo, fcinfo->flinfo));
}
Datum
gbt_oid_union(PG_FUNCTION_ARGS)
{
@ -184,7 +181,6 @@ gbt_oid_union(PG_FUNCTION_ARGS)
PG_RETURN_POINTER(gbt_num_union(out, entryvec, &tinfo, fcinfo->flinfo));
}
Datum
gbt_oid_penalty(PG_FUNCTION_ARGS)
{
@ -215,3 +211,29 @@ gbt_oid_same(PG_FUNCTION_ARGS)
*result = gbt_num_same((void *) b1, (void *) b2, &tinfo, fcinfo->flinfo);
PG_RETURN_POINTER(result);
}
static int
gbt_oid_ssup_cmp(Datum x, Datum y, SortSupport ssup)
{
oidKEY *arg1 = (oidKEY *) DatumGetPointer(x);
oidKEY *arg2 = (oidKEY *) DatumGetPointer(y);
/* for leaf items we expect lower == upper, so only compare lower */
if (arg1->lower > arg2->lower)
return 1;
else if (arg1->lower < arg2->lower)
return -1;
else
return 0;
}
Datum
gbt_oid_sortsupport(PG_FUNCTION_ARGS)
{
SortSupport ssup = (SortSupport) PG_GETARG_POINTER(0);
ssup->comparator = gbt_oid_ssup_cmp;
ssup->ssup_extra = NULL;
PG_RETURN_VOID();
}

View File

@ -7,10 +7,9 @@
#include "btree_utils_var.h"
#include "mb/pg_wchar.h"
#include "utils/fmgrprotos.h"
#include "utils/sortsupport.h"
/*
** Text ops
*/
/* GiST support functions */
PG_FUNCTION_INFO_V1(gbt_text_compress);
PG_FUNCTION_INFO_V1(gbt_bpchar_compress);
PG_FUNCTION_INFO_V1(gbt_text_union);
@ -19,6 +18,8 @@ PG_FUNCTION_INFO_V1(gbt_text_consistent);
PG_FUNCTION_INFO_V1(gbt_bpchar_consistent);
PG_FUNCTION_INFO_V1(gbt_text_penalty);
PG_FUNCTION_INFO_V1(gbt_text_same);
PG_FUNCTION_INFO_V1(gbt_text_sortsupport);
PG_FUNCTION_INFO_V1(gbt_bpchar_sortsupport);
/* define for comparison */
@ -163,10 +164,9 @@ static gbtree_vinfo bptinfo =
/**************************************************
* Text ops
* GiST support functions
**************************************************/
Datum
gbt_text_compress(PG_FUNCTION_ARGS)
{
@ -187,8 +187,6 @@ gbt_bpchar_compress(PG_FUNCTION_ARGS)
return gbt_text_compress(fcinfo);
}
Datum
gbt_text_consistent(PG_FUNCTION_ARGS)
{
@ -216,7 +214,6 @@ gbt_text_consistent(PG_FUNCTION_ARGS)
PG_RETURN_BOOL(retval);
}
Datum
gbt_bpchar_consistent(PG_FUNCTION_ARGS)
{
@ -243,7 +240,6 @@ gbt_bpchar_consistent(PG_FUNCTION_ARGS)
PG_RETURN_BOOL(retval);
}
Datum
gbt_text_union(PG_FUNCTION_ARGS)
{
@ -254,7 +250,6 @@ gbt_text_union(PG_FUNCTION_ARGS)
&tinfo, fcinfo->flinfo));
}
Datum
gbt_text_picksplit(PG_FUNCTION_ARGS)
{
@ -277,7 +272,6 @@ gbt_text_same(PG_FUNCTION_ARGS)
PG_RETURN_POINTER(result);
}
Datum
gbt_text_penalty(PG_FUNCTION_ARGS)
{
@ -288,3 +282,69 @@ gbt_text_penalty(PG_FUNCTION_ARGS)
PG_RETURN_POINTER(gbt_var_penalty(result, o, n, PG_GET_COLLATION(),
&tinfo, fcinfo->flinfo));
}
static int
gbt_text_ssup_cmp(Datum x, Datum y, SortSupport ssup)
{
GBT_VARKEY *key1 = PG_DETOAST_DATUM(x);
GBT_VARKEY *key2 = PG_DETOAST_DATUM(y);
GBT_VARKEY_R arg1 = gbt_var_key_readable(key1);
GBT_VARKEY_R arg2 = gbt_var_key_readable(key2);
Datum result;
/* for leaf items we expect lower == upper, so only compare lower */
result = DirectFunctionCall2Coll(bttextcmp,
ssup->ssup_collation,
PointerGetDatum(arg1.lower),
PointerGetDatum(arg2.lower));
GBT_FREE_IF_COPY(key1, x);
GBT_FREE_IF_COPY(key2, y);
return DatumGetInt32(result);
}
Datum
gbt_text_sortsupport(PG_FUNCTION_ARGS)
{
SortSupport ssup = (SortSupport) PG_GETARG_POINTER(0);
ssup->comparator = gbt_text_ssup_cmp;
ssup->ssup_extra = NULL;
PG_RETURN_VOID();
}
static int
gbt_bpchar_ssup_cmp(Datum x, Datum y, SortSupport ssup)
{
GBT_VARKEY *key1 = PG_DETOAST_DATUM(x);
GBT_VARKEY *key2 = PG_DETOAST_DATUM(y);
GBT_VARKEY_R arg1 = gbt_var_key_readable(key1);
GBT_VARKEY_R arg2 = gbt_var_key_readable(key2);
Datum result;
/* for leaf items we expect lower == upper, so only compare lower */
result = DirectFunctionCall2Coll(bpcharcmp,
ssup->ssup_collation,
PointerGetDatum(arg1.lower),
PointerGetDatum(arg2.lower));
GBT_FREE_IF_COPY(key1, x);
GBT_FREE_IF_COPY(key2, y);
return DatumGetInt32(result);
}
Datum
gbt_bpchar_sortsupport(PG_FUNCTION_ARGS)
{
SortSupport ssup = (SortSupport) PG_GETARG_POINTER(0);
ssup->comparator = gbt_bpchar_ssup_cmp;
ssup->ssup_extra = NULL;
PG_RETURN_VOID();
}

View File

@ -7,6 +7,7 @@
#include "btree_utils_num.h"
#include "utils/fmgrprotos.h"
#include "utils/date.h"
#include "utils/sortsupport.h"
#include "utils/timestamp.h"
typedef struct
@ -15,9 +16,7 @@ typedef struct
TimeADT upper;
} timeKEY;
/*
** time ops
*/
/* GiST support functions */
PG_FUNCTION_INFO_V1(gbt_time_compress);
PG_FUNCTION_INFO_V1(gbt_timetz_compress);
PG_FUNCTION_INFO_V1(gbt_time_fetch);
@ -28,6 +27,8 @@ PG_FUNCTION_INFO_V1(gbt_time_distance);
PG_FUNCTION_INFO_V1(gbt_timetz_consistent);
PG_FUNCTION_INFO_V1(gbt_time_penalty);
PG_FUNCTION_INFO_V1(gbt_time_same);
PG_FUNCTION_INFO_V1(gbt_time_sortsupport);
PG_FUNCTION_INFO_V1(gbt_timetz_sortsupport);
#ifdef USE_FLOAT8_BYVAL
@ -92,8 +93,6 @@ gbt_timelt(const void *a, const void *b, FmgrInfo *flinfo)
TimeADTGetDatumFast(*bb)));
}
static int
gbt_timekey_cmp(const void *a, const void *b, FmgrInfo *flinfo)
{
@ -150,11 +149,9 @@ time_dist(PG_FUNCTION_ARGS)
/**************************************************
* time ops
* GiST support functions
**************************************************/
Datum
gbt_time_compress(PG_FUNCTION_ARGS)
{
@ -163,7 +160,6 @@ gbt_time_compress(PG_FUNCTION_ARGS)
PG_RETURN_POINTER(gbt_num_compress(entry, &tinfo));
}
Datum
gbt_timetz_compress(PG_FUNCTION_ARGS)
{
@ -262,7 +258,6 @@ gbt_timetz_consistent(PG_FUNCTION_ARGS)
GIST_LEAF(entry), &tinfo, fcinfo->flinfo));
}
Datum
gbt_time_union(PG_FUNCTION_ARGS)
{
@ -273,7 +268,6 @@ gbt_time_union(PG_FUNCTION_ARGS)
PG_RETURN_POINTER(gbt_num_union(out, entryvec, &tinfo, fcinfo->flinfo));
}
Datum
gbt_time_penalty(PG_FUNCTION_ARGS)
{
@ -313,7 +307,6 @@ gbt_time_penalty(PG_FUNCTION_ARGS)
PG_RETURN_POINTER(result);
}
Datum
gbt_time_picksplit(PG_FUNCTION_ARGS)
{
@ -332,3 +325,26 @@ gbt_time_same(PG_FUNCTION_ARGS)
*result = gbt_num_same((void *) b1, (void *) b2, &tinfo, fcinfo->flinfo);
PG_RETURN_POINTER(result);
}
static int
gbt_timekey_ssup_cmp(Datum x, Datum y, SortSupport ssup)
{
timeKEY *arg1 = (timeKEY *) DatumGetPointer(x);
timeKEY *arg2 = (timeKEY *) DatumGetPointer(y);
/* for leaf items we expect lower == upper, so only compare lower */
return DatumGetInt32(DirectFunctionCall2(time_cmp,
TimeADTGetDatumFast(arg1->lower),
TimeADTGetDatumFast(arg2->lower)));
}
Datum
gbt_time_sortsupport(PG_FUNCTION_ARGS)
{
SortSupport ssup = (SortSupport) PG_GETARG_POINTER(0);
ssup->comparator = gbt_timekey_ssup_cmp;
ssup->ssup_extra = NULL;
PG_RETURN_VOID();
}

View File

@ -10,6 +10,7 @@
#include "utils/fmgrprotos.h"
#include "utils/timestamp.h"
#include "utils/float.h"
#include "utils/sortsupport.h"
typedef struct
{
@ -17,9 +18,7 @@ typedef struct
Timestamp upper;
} tsKEY;
/*
** timestamp ops
*/
/* GiST support functions */
PG_FUNCTION_INFO_V1(gbt_ts_compress);
PG_FUNCTION_INFO_V1(gbt_tstz_compress);
PG_FUNCTION_INFO_V1(gbt_ts_fetch);
@ -31,6 +30,7 @@ PG_FUNCTION_INFO_V1(gbt_tstz_consistent);
PG_FUNCTION_INFO_V1(gbt_tstz_distance);
PG_FUNCTION_INFO_V1(gbt_ts_penalty);
PG_FUNCTION_INFO_V1(gbt_ts_same);
PG_FUNCTION_INFO_V1(gbt_ts_sortsupport);
#ifdef USE_FLOAT8_BYVAL
@ -40,6 +40,8 @@ PG_FUNCTION_INFO_V1(gbt_ts_same);
#endif
/* define for comparison */
static bool
gbt_tsgt(const void *a, const void *b, FmgrInfo *flinfo)
{
@ -95,7 +97,6 @@ gbt_tslt(const void *a, const void *b, FmgrInfo *flinfo)
TimestampGetDatumFast(*bb)));
}
static int
gbt_tskey_cmp(const void *a, const void *b, FmgrInfo *flinfo)
{
@ -126,7 +127,6 @@ gbt_ts_dist(const void *a, const void *b, FmgrInfo *flinfo)
return fabs(INTERVAL_TO_SEC(i));
}
static const gbtree_ninfo tinfo =
{
gbt_t_ts,
@ -190,12 +190,10 @@ tstz_dist(PG_FUNCTION_ARGS)
PG_RETURN_INTERVAL_P(abs_interval(r));
}
/**************************************************
* timestamp ops
* GiST support functions
**************************************************/
static inline Timestamp
tstz_to_ts_gmt(TimestampTz ts)
{
@ -212,7 +210,6 @@ gbt_ts_compress(PG_FUNCTION_ARGS)
PG_RETURN_POINTER(gbt_num_compress(entry, &tinfo));
}
Datum
gbt_tstz_compress(PG_FUNCTION_ARGS)
{
@ -398,3 +395,26 @@ gbt_ts_same(PG_FUNCTION_ARGS)
*result = gbt_num_same((void *) b1, (void *) b2, &tinfo, fcinfo->flinfo);
PG_RETURN_POINTER(result);
}
static int
gbt_ts_ssup_cmp(Datum x, Datum y, SortSupport ssup)
{
tsKEY *arg1 = (tsKEY *) DatumGetPointer(x);
tsKEY *arg2 = (tsKEY *) DatumGetPointer(y);
/* for leaf items we expect lower == upper, so only compare lower */
return DatumGetInt32(DirectFunctionCall2(timestamp_cmp,
TimestampGetDatumFast(arg1->lower),
TimestampGetDatumFast(arg2->lower)));
}
Datum
gbt_ts_sortsupport(PG_FUNCTION_ARGS)
{
SortSupport ssup = (SortSupport) PG_GETARG_POINTER(0);
ssup->comparator = gbt_ts_ssup_cmp;
ssup->ssup_extra = NULL;
PG_RETURN_VOID();
}

View File

@ -41,7 +41,17 @@ typedef struct
GBT_VARKEY *(*f_l2n) (GBT_VARKEY *, FmgrInfo *flinfo); /* convert leaf to node */
} gbtree_vinfo;
/*
* Free ptr1 in case it's a copy of ptr2.
*
* This is adapted from varlena's PG_FREE_IF_COPY, though doesn't require
* fcinfo access.
*/
#define GBT_FREE_IF_COPY(ptr1, ptr2) \
do { \
if ((Pointer) (ptr1) != DatumGetPointer(ptr2)) \
pfree(ptr1); \
} while (0)
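Unlike varlena's PG_FREE_IF_COPY, which digs the original Datum out of fcinfo, this variant receives the original Datum directly, so it is usable from sortsupport comparators that have no fcinfo at hand. Usage mirrors the comparators above:

	key1 = (GBT_VARKEY *) PG_DETOAST_DATUM(x);
	/* ... use key1 ... */
	GBT_FREE_IF_COPY(key1, x);	/* pfree only if detoasting made a copy */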
extern GBT_VARKEY_R gbt_var_key_readable(const GBT_VARKEY *k);

View File

@ -6,6 +6,7 @@
#include "btree_gist.h"
#include "btree_utils_num.h"
#include "port/pg_bswap.h"
#include "utils/sortsupport.h"
#include "utils/uuid.h"
typedef struct
@ -15,9 +16,7 @@ typedef struct
} uuidKEY;
/*
* UUID ops
*/
/* GiST support functions */
PG_FUNCTION_INFO_V1(gbt_uuid_compress);
PG_FUNCTION_INFO_V1(gbt_uuid_fetch);
PG_FUNCTION_INFO_V1(gbt_uuid_union);
@ -25,6 +24,7 @@ PG_FUNCTION_INFO_V1(gbt_uuid_picksplit);
PG_FUNCTION_INFO_V1(gbt_uuid_consistent);
PG_FUNCTION_INFO_V1(gbt_uuid_penalty);
PG_FUNCTION_INFO_V1(gbt_uuid_same);
PG_FUNCTION_INFO_V1(gbt_uuid_sortsupport);
static int
@ -93,10 +93,9 @@ static const gbtree_ninfo tinfo =
/**************************************************
* uuid ops
* GiST support functions
**************************************************/
Datum
gbt_uuid_compress(PG_FUNCTION_ARGS)
{
@ -233,3 +232,24 @@ gbt_uuid_same(PG_FUNCTION_ARGS)
*result = gbt_num_same((void *) b1, (void *) b2, &tinfo, fcinfo->flinfo);
PG_RETURN_POINTER(result);
}
static int
gbt_uuid_ssup_cmp(Datum x, Datum y, SortSupport ssup)
{
uuidKEY *arg1 = (uuidKEY *) DatumGetPointer(x);
uuidKEY *arg2 = (uuidKEY *) DatumGetPointer(y);
/* for leaf items we expect lower == upper, so only compare lower */
return uuid_internal_cmp(&arg1->lower, &arg2->lower);
}
Datum
gbt_uuid_sortsupport(PG_FUNCTION_ARGS)
{
SortSupport ssup = (SortSupport) PG_GETARG_POINTER(0);
ssup->comparator = gbt_uuid_ssup_cmp;
ssup->ssup_extra = NULL;
PG_RETURN_VOID();
}

View File

@ -1,5 +1,8 @@
-- enum check
create type rainbow as enum ('r','o','y','g','b','i','v');
create type rainbow as enum ('r','o','g','b','i','v');
-- enum values added later take some different codepaths internally,
-- so make sure we have coverage for those too
alter type rainbow add value 'y' before 'g';
CREATE TABLE enumtmp (a rainbow);
\copy enumtmp from 'data/enum.data'
SET enable_seqscan=on;

View File

@ -51,6 +51,7 @@ install_data(
'btree_gist--1.5--1.6.sql',
'btree_gist--1.6--1.7.sql',
'btree_gist--1.7--1.8.sql',
'btree_gist--1.8--1.9.sql',
kwargs: contrib_data_args,
)

View File

@ -1,6 +1,10 @@
-- enum check
create type rainbow as enum ('r','o','y','g','b','i','v');
create type rainbow as enum ('r','o','g','b','i','v');
-- enum values added later take some different codepaths internally,
-- so make sure we have coverage for those too
alter type rainbow add value 'y' before 'g';
CREATE TABLE enumtmp (a rainbow);

View File

@ -10,7 +10,10 @@
#include "utils/varlena.h"
#include "varatt.h"
PG_MODULE_MAGIC;
PG_MODULE_MAGIC_EXT(
.name = "citext",
.version = PG_VERSION
);
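This same PG_MODULE_MAGIC to PG_MODULE_MAGIC_EXT substitution repeats across the contrib modules in the rest of this diff; it embeds a module name and version in the module's magic block. Assuming the pg_get_loaded_modules() function added alongside this macro in the same release, the metadata becomes visible at runtime, along the lines of:

	SELECT module_name, version FROM pg_get_loaded_modules();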
/*
* ====================

View File

@ -17,7 +17,10 @@
#include "utils/array.h"
#include "utils/float.h"
PG_MODULE_MAGIC;
PG_MODULE_MAGIC_EXT(
.name = "cube",
.version = PG_VERSION
);
/*
* Taken from the intarray contrib header

View File

@ -13,6 +13,7 @@ PGFILEDESC = "dblink - connect to other PostgreSQL databases"
REGRESS = dblink
REGRESS_OPTS = --dlpath=$(top_builddir)/src/test/regress
TAP_TESTS = 1
ifdef USE_PGXS
PG_CONFIG = pg_config

View File

@ -43,6 +43,8 @@
#include "catalog/pg_foreign_server.h"
#include "catalog/pg_type.h"
#include "catalog/pg_user_mapping.h"
#include "commands/defrem.h"
#include "common/base64.h"
#include "executor/spi.h"
#include "foreign/foreign.h"
#include "funcapi.h"
@ -63,7 +65,10 @@
#include "utils/varlena.h"
#include "utils/wait_event.h"
PG_MODULE_MAGIC;
PG_MODULE_MAGIC_EXT(
.name = "dblink",
.version = PG_VERSION
);
typedef struct remoteConn
{
@ -100,7 +105,7 @@ static PGresult *storeQueryResult(volatile storeInfo *sinfo, PGconn *conn, const
static void storeRow(volatile storeInfo *sinfo, PGresult *res, bool first);
static remoteConn *getConnectionByName(const char *name);
static HTAB *createConnHash(void);
static void createNewConnection(const char *name, remoteConn *rconn);
static remoteConn *createNewConnection(const char *name);
static void deleteConnection(const char *name);
static char **get_pkey_attnames(Relation rel, int16 *indnkeyatts);
static char **get_text_array_contents(ArrayType *array, int *numitems);
@ -114,7 +119,8 @@ static Relation get_rel_from_relname(text *relname_text, LOCKMODE lockmode, AclM
static char *generate_relation_name(Relation rel);
static void dblink_connstr_check(const char *connstr);
static bool dblink_connstr_has_pw(const char *connstr);
static void dblink_security_check(PGconn *conn, remoteConn *rconn, const char *connstr);
static void dblink_security_check(PGconn *conn, const char *connname,
const char *connstr);
static void dblink_res_error(PGconn *conn, const char *conname, PGresult *res,
bool fail, const char *fmt,...) pg_attribute_printf(5, 6);
static char *get_connect_string(const char *servername);
@ -126,6 +132,11 @@ static bool is_valid_dblink_option(const PQconninfoOption *options,
const char *option, Oid context);
static int applyRemoteGucs(PGconn *conn);
static void restoreLocalGucs(int nestlevel);
static bool UseScramPassthrough(ForeignServer *foreign_server, UserMapping *user);
static void appendSCRAMKeysInfo(StringInfo buf);
static bool is_valid_dblink_fdw_option(const PQconninfoOption *options, const char *option,
Oid context);
static bool dblink_connstr_has_required_scram_options(const char *connstr);
/* Global */
static remoteConn *pconn = NULL;
@ -137,16 +148,22 @@ static uint32 dblink_we_get_conn = 0;
static uint32 dblink_we_get_result = 0;
/*
* Following is list that holds multiple remote connections.
* Following is hash that holds multiple remote connections.
* Calling convention of each dblink function changes to accept
* connection name as the first parameter. The connection list is
* connection name as the first parameter. The connection hash is
* much like ecpg e.g. a mapping between a name and a PGconn object.
*
* To avoid potentially leaking a PGconn object in case of out-of-memory
* errors, we first create the hash entry, then open the PGconn.
* Hence, a hash entry whose rconn.conn pointer is NULL must be
* understood as a leftover from a failed create; it should be ignored
* by lookup operations, and silently replaced by create operations.
*/
typedef struct remoteConnHashEnt
{
char name[NAMEDATALEN];
remoteConn *rconn;
remoteConn rconn;
} remoteConnHashEnt;
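The invariant above dictates a fixed ordering at the call sites, which the fragments below apply piecemeal; condensed into one place (an illustration assembled from this diff, not a verbatim excerpt):

	/* 1. Reserve the hash entry first; if this errors out, nothing leaks. */
	rconn = createNewConnection(connname);
	Assert(rconn->conn == NULL);

	/* 2. Only then open the libpq connection. */
	conn = libpqsrv_connect(connstr, dblink_we_connect);
	if (PQstatus(conn) == CONNECTION_BAD)
	{
		libpqsrv_disconnect(conn);
		deleteConnection(connname);	/* discard the placeholder entry */
		ereport(ERROR,
				(errcode(ERRCODE_SQLCLIENT_UNABLE_TO_ESTABLISH_SQLCONNECTION),
				 errmsg("could not establish connection")));
	}

	/* 3. Publish the PGconn; only now does lookup treat the entry as open. */
	rconn->conn = conn;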
/* initial number of connection hashes */
@ -160,8 +177,7 @@ xpstrdup(const char *in)
return pstrdup(in);
}
static void
pg_attribute_noreturn()
pg_noreturn static void
dblink_res_internalerror(PGconn *conn, PGresult *res, const char *p2)
{
char *msg = pchomp(PQerrorMessage(conn));
@ -170,8 +186,7 @@ dblink_res_internalerror(PGconn *conn, PGresult *res, const char *p2)
elog(ERROR, "%s: %s", p2, msg);
}
static void
pg_attribute_noreturn()
pg_noreturn static void
dblink_conn_not_avail(const char *conname)
{
if (conname)
@ -225,7 +240,7 @@ dblink_get_conn(char *conname_or_str,
errmsg("could not establish connection"),
errdetail_internal("%s", msg)));
}
dblink_security_check(conn, rconn, connstr);
dblink_security_check(conn, NULL, connstr);
if (PQclientEncoding(conn) != GetDatabaseEncoding())
PQsetClientEncoding(conn, GetDatabaseEncodingName());
freeconn = true;
@ -288,15 +303,6 @@ dblink_connect(PG_FUNCTION_ARGS)
else if (PG_NARGS() == 1)
conname_or_str = text_to_cstring(PG_GETARG_TEXT_PP(0));
if (connname)
{
rconn = (remoteConn *) MemoryContextAlloc(TopMemoryContext,
sizeof(remoteConn));
rconn->conn = NULL;
rconn->openCursorCount = 0;
rconn->newXactForCursor = false;
}
/* first check for valid foreign data server */
connstr = get_connect_string(conname_or_str);
if (connstr == NULL)
@ -309,6 +315,13 @@ dblink_connect(PG_FUNCTION_ARGS)
if (dblink_we_connect == 0)
dblink_we_connect = WaitEventExtensionNew("DblinkConnect");
/* if we need a hashtable entry, make that first, since it might fail */
if (connname)
{
rconn = createNewConnection(connname);
Assert(rconn->conn == NULL);
}
/* OK to make connection */
conn = libpqsrv_connect(connstr, dblink_we_connect);
@ -316,8 +329,8 @@ dblink_connect(PG_FUNCTION_ARGS)
{
msg = pchomp(PQerrorMessage(conn));
libpqsrv_disconnect(conn);
if (rconn)
pfree(rconn);
if (connname)
deleteConnection(connname);
ereport(ERROR,
(errcode(ERRCODE_SQLCLIENT_UNABLE_TO_ESTABLISH_SQLCONNECTION),
@ -326,16 +339,16 @@ dblink_connect(PG_FUNCTION_ARGS)
}
/* check password actually used if not superuser */
dblink_security_check(conn, rconn, connstr);
dblink_security_check(conn, connname, connstr);
/* attempt to set client encoding to match server encoding, if needed */
if (PQclientEncoding(conn) != GetDatabaseEncoding())
PQsetClientEncoding(conn, GetDatabaseEncodingName());
/* all OK, save away the conn */
if (connname)
{
rconn->conn = conn;
createNewConnection(connname, rconn);
}
else
{
@ -375,10 +388,7 @@ dblink_disconnect(PG_FUNCTION_ARGS)
libpqsrv_disconnect(conn);
if (rconn)
{
deleteConnection(conname);
pfree(rconn);
}
else
pconn->conn = NULL;
@ -1296,6 +1306,9 @@ dblink_get_connections(PG_FUNCTION_ARGS)
hash_seq_init(&status, remoteConnHash);
while ((hentry = (remoteConnHashEnt *) hash_seq_search(&status)) != NULL)
{
/* ignore it if it's not an open connection */
if (hentry->rconn.conn == NULL)
continue;
/* stash away current value */
astate = accumArrayResult(astate,
CStringGetTextDatum(hentry->name),
@ -1966,7 +1979,7 @@ dblink_fdw_validator(PG_FUNCTION_ARGS)
{
DefElem *def = (DefElem *) lfirst(cell);
if (!is_valid_dblink_option(options, def->defname, context))
if (!is_valid_dblink_fdw_option(options, def->defname, context))
{
/*
* Unknown option, or invalid option for the context specified, so
@ -2531,8 +2544,8 @@ getConnectionByName(const char *name)
hentry = (remoteConnHashEnt *) hash_search(remoteConnHash,
key, HASH_FIND, NULL);
if (hentry)
return hentry->rconn;
if (hentry && hentry->rconn.conn != NULL)
return &hentry->rconn;
return NULL;
}
@ -2549,8 +2562,8 @@ createConnHash(void)
HASH_ELEM | HASH_STRINGS);
}
static void
createNewConnection(const char *name, remoteConn *rconn)
static remoteConn *
createNewConnection(const char *name)
{
remoteConnHashEnt *hentry;
bool found;
@ -2564,17 +2577,15 @@ createNewConnection(const char *name, remoteConn *rconn)
hentry = (remoteConnHashEnt *) hash_search(remoteConnHash, key,
HASH_ENTER, &found);
if (found)
{
libpqsrv_disconnect(rconn->conn);
pfree(rconn);
if (found && hentry->rconn.conn != NULL)
ereport(ERROR,
(errcode(ERRCODE_DUPLICATE_OBJECT),
errmsg("duplicate connection name")));
}
hentry->rconn = rconn;
/* New, or reusable, so initialize the rconn struct to zeroes */
memset(&hentry->rconn, 0, sizeof(remoteConn));
return &hentry->rconn;
}
static void
@ -2598,13 +2609,77 @@ deleteConnection(const char *name)
errmsg("undefined connection name")));
}
/*
* Ensure that require_auth and SCRAM keys are correctly set on connstr.
* The SCRAM keys used for pass-through come from the client's initial
* connection to this server.
*
* All required SCRAM options are set by dblink, so we just need to ensure
* that these options are not overwritten by the user.
*
* See appendSCRAMKeysInfo and its usage for more.
*/
static bool
dblink_connstr_has_required_scram_options(const char *connstr)
{
PQconninfoOption *options;
bool has_scram_server_key = false;
bool has_scram_client_key = false;
bool has_require_auth = false;
bool has_scram_keys = false;
options = PQconninfoParse(connstr, NULL);
if (options)
{
/*
* Continue iterating even if we found the keys that we need to
* validate to make sure that there is no other declaration of these
* keys that can overwrite the first.
*/
for (PQconninfoOption *option = options; option->keyword != NULL; option++)
{
if (strcmp(option->keyword, "require_auth") == 0)
{
if (option->val != NULL && strcmp(option->val, "scram-sha-256") == 0)
has_require_auth = true;
else
has_require_auth = false;
}
if (strcmp(option->keyword, "scram_client_key") == 0)
{
if (option->val != NULL && option->val[0] != '\0')
has_scram_client_key = true;
else
has_scram_client_key = false;
}
if (strcmp(option->keyword, "scram_server_key") == 0)
{
if (option->val != NULL && option->val[0] != '\0')
has_scram_server_key = true;
else
has_scram_server_key = false;
}
}
PQconninfoFree(options);
}
has_scram_keys = has_scram_client_key && has_scram_server_key && MyProcPort->has_scram_keys;
return (has_scram_keys && has_require_auth);
}
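For illustration, a connection string that satisfies this check carries all three options with the expected values (key material abbreviated; the real values are base64-encoded SCRAM keys appended by appendSCRAMKeysInfo):

	host=remote dbname=db1 require_auth='scram-sha-256'
	    scram_client_key='<base64>' scram_server_key='<base64>'

Independently of the string itself, MyProcPort->has_scram_keys must also be true, i.e. the local client itself authenticated with SCRAM.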
/*
* Make sure the connection used credentials provided by the user: check
* which credentials were used to connect and verify that they came from
* the user.
*
* On failure, we close "conn" and also delete the hashtable entry
* identified by "connname" (if that's not NULL).
*/
static void
dblink_security_check(PGconn *conn, remoteConn *rconn, const char *connstr)
dblink_security_check(PGconn *conn, const char *connname, const char *connstr)
{
/* Superuser bypasses security check */
if (superuser())
@ -2614,6 +2689,18 @@ dblink_security_check(PGconn *conn, remoteConn *rconn, const char *connstr)
if (PQconnectionUsedPassword(conn) && dblink_connstr_has_pw(connstr))
return;
/*
* Password was not used to connect, check if SCRAM pass-through is in
* use.
*
* If dblink_connstr_has_required_scram_options is true we assume that
* UseScramPassthrough is also true because the required SCRAM keys are
* only added if UseScramPassthrough is set, and the user is not allowed
* to add the SCRAM keys on fdw and user mapping options.
*/
if (MyProcPort->has_scram_keys && dblink_connstr_has_required_scram_options(connstr))
return;
#ifdef ENABLE_GSS
/* If GSSAPI creds used to connect, make sure it was one delegated */
if (PQconnectionUsedGSSAPI(conn) && be_gssapi_get_delegation(MyProcPort))
@ -2622,8 +2709,8 @@ dblink_security_check(PGconn *conn, remoteConn *rconn, const char *connstr)
/* Otherwise, fail out */
libpqsrv_disconnect(conn);
if (rconn)
pfree(rconn);
if (connname)
deleteConnection(connname);
ereport(ERROR,
(errcode(ERRCODE_S_R_E_PROHIBITED_SQL_STATEMENT_ATTEMPTED),
@ -2666,12 +2753,14 @@ dblink_connstr_has_pw(const char *connstr)
}
/*
* For non-superusers, insist that the connstr specify a password, except
* if GSSAPI credentials have been delegated (and we check that they are used
* for the connection in dblink_security_check later). This prevents a
* password or GSSAPI credentials from being picked up from .pgpass, a
* service file, the environment, etc. We don't want the postgres user's
* passwords or Kerberos credentials to be accessible to non-superusers.
* For non-superusers, insist that the connstr specify a password, except if
* GSSAPI credentials have been delegated (and we check that they are used for
* the connection in dblink_security_check later) or if SCRAM pass-through is
* being used. This prevents a password or GSSAPI credentials from being
* picked up from .pgpass, a service file, the environment, etc. We don't want
* the postgres user's passwords or Kerberos credentials to be accessible to
* non-superusers. In case of SCRAM pass-through insist that the connstr
* has the required SCRAM pass-through options.
*/
static void
dblink_connstr_check(const char *connstr)
@ -2682,6 +2771,9 @@ dblink_connstr_check(const char *connstr)
if (dblink_connstr_has_pw(connstr))
return;
if (MyProcPort->has_scram_keys && dblink_connstr_has_required_scram_options(connstr))
return;
#ifdef ENABLE_GSS
if (be_gssapi_get_delegation(MyProcPort))
return;
@ -2834,6 +2926,14 @@ get_connect_string(const char *servername)
if (aclresult != ACLCHECK_OK)
aclcheck_error(aclresult, OBJECT_FOREIGN_SERVER, foreign_server->servername);
/*
* First append the hardcoded options needed for SCRAM pass-through, so
* that if the user overrides these options we can raise an error in
* dblink_connstr_check and dblink_security_check.
*/
if (MyProcPort->has_scram_keys && UseScramPassthrough(foreign_server, user_mapping))
appendSCRAMKeysInfo(&buf);
foreach(cell, fdw->options)
{
DefElem *def = lfirst(cell);
@ -3000,6 +3100,13 @@ is_valid_dblink_option(const PQconninfoOption *options, const char *option,
if (strcmp(opt->keyword, "client_encoding") == 0)
return false;
/*
* Disallow OAuth options for now, since the builtin flow communicates on
* stderr by default and can't cache tokens yet.
*/
if (strncmp(opt->keyword, "oauth_", strlen("oauth_")) == 0)
return false;
/*
* If the option is "user" or marked secure, it should be specified only
* in USER MAPPING. Others should be specified only in SERVER.
@ -3018,6 +3125,20 @@ is_valid_dblink_option(const PQconninfoOption *options, const char *option,
return true;
}
/*
* Same as is_valid_dblink_option, but additionally accepts options that
* are specific to dblink_fdw.
*/
static bool
is_valid_dblink_fdw_option(const PQconninfoOption *options, const char *option,
Oid context)
{
if (strcmp(option, "use_scram_passthrough") == 0)
return true;
return is_valid_dblink_option(options, option, context);
}
/*
* Copy the remote session's values of GUCs that affect datatype I/O
* and apply them locally in a new GUC nesting level. Returns the new
@ -3087,3 +3208,66 @@ restoreLocalGucs(int nestlevel)
if (nestlevel > 0)
AtEOXact_GUC(true, nestlevel);
}
/*
* Append SCRAM client key and server key information from the global
* MyProcPort into the given StringInfo buffer.
*/
static void
appendSCRAMKeysInfo(StringInfo buf)
{
int len;
int encoded_len;
char *client_key;
char *server_key;
len = pg_b64_enc_len(sizeof(MyProcPort->scram_ClientKey));
/* don't forget the zero-terminator */
client_key = palloc0(len + 1);
encoded_len = pg_b64_encode(MyProcPort->scram_ClientKey,
sizeof(MyProcPort->scram_ClientKey),
client_key, len);
if (encoded_len < 0)
elog(ERROR, "could not encode SCRAM client key");
len = pg_b64_enc_len(sizeof(MyProcPort->scram_ServerKey));
/* don't forget the zero-terminator */
server_key = palloc0(len + 1);
encoded_len = pg_b64_encode(MyProcPort->scram_ServerKey,
sizeof(MyProcPort->scram_ServerKey),
server_key, len);
if (encoded_len < 0)
elog(ERROR, "could not encode SCRAM server key");
appendStringInfo(buf, "scram_client_key='%s' ", client_key);
appendStringInfo(buf, "scram_server_key='%s' ", server_key);
appendStringInfoString(buf, "require_auth='scram-sha-256' ");
pfree(client_key);
pfree(server_key);
}
static bool
UseScramPassthrough(ForeignServer *foreign_server, UserMapping *user)
{
ListCell *cell;
foreach(cell, foreign_server->options)
{
DefElem *def = lfirst(cell);
if (strcmp(def->defname, "use_scram_passthrough") == 0)
return defGetBoolean(def);
}
foreach(cell, user->options)
{
DefElem *def = (DefElem *) lfirst(cell);
if (strcmp(def->defname, "use_scram_passthrough") == 0)
return defGetBoolean(def);
}
return false;
}
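This option is what the new TAP test below toggles; a minimal setup that enables pass-through at the server level looks like (names hypothetical, mirroring the test):

	CREATE SERVER loopback FOREIGN DATA WRAPPER dblink_fdw
	    OPTIONS (host '/tmp', port '5432', dbname 'db1',
	             use_scram_passthrough 'true');
	CREATE USER MAPPING FOR user01 SERVER loopback OPTIONS (user 'user01');

Note the lookup order in the function above: the server's options are consulted first and win if the option is present; the user mapping is only checked otherwise.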

View File

@ -898,6 +898,17 @@ CREATE USER MAPPING FOR public SERVER fdtest
OPTIONS (server 'localhost'); -- fail, can't specify server here
ERROR: invalid option "server"
CREATE USER MAPPING FOR public SERVER fdtest OPTIONS (user :'USER');
-- OAuth options are not allowed in either context
ALTER SERVER fdtest OPTIONS (ADD oauth_issuer 'https://example.com');
ERROR: invalid option "oauth_issuer"
ALTER SERVER fdtest OPTIONS (ADD oauth_client_id 'myID');
ERROR: invalid option "oauth_client_id"
ALTER USER MAPPING FOR public SERVER fdtest
OPTIONS (ADD oauth_issuer 'https://example.com');
ERROR: invalid option "oauth_issuer"
ALTER USER MAPPING FOR public SERVER fdtest
OPTIONS (ADD oauth_client_id 'myID');
ERROR: invalid option "oauth_client_id"
GRANT USAGE ON FOREIGN SERVER fdtest TO regress_dblink_user;
GRANT EXECUTE ON FUNCTION dblink_connect_u(text, text) TO regress_dblink_user;
SET SESSION AUTHORIZATION regress_dblink_user;

View File

@ -36,4 +36,9 @@ tests += {
],
'regress_args': ['--dlpath', meson.build_root() / 'src/test/regress'],
},
'tap': {
'tests': [
't/001_auth_scram.pl',
],
},
}

View File

@ -469,6 +469,14 @@ CREATE USER MAPPING FOR public SERVER fdtest
OPTIONS (server 'localhost'); -- fail, can't specify server here
CREATE USER MAPPING FOR public SERVER fdtest OPTIONS (user :'USER');
-- OAuth options are not allowed in either context
ALTER SERVER fdtest OPTIONS (ADD oauth_issuer 'https://example.com');
ALTER SERVER fdtest OPTIONS (ADD oauth_client_id 'myID');
ALTER USER MAPPING FOR public SERVER fdtest
OPTIONS (ADD oauth_issuer 'https://example.com');
ALTER USER MAPPING FOR public SERVER fdtest
OPTIONS (ADD oauth_client_id 'myID');
GRANT USAGE ON FOREIGN SERVER fdtest TO regress_dblink_user;
GRANT EXECUTE ON FUNCTION dblink_connect_u(text, text) TO regress_dblink_user;

View File

@ -0,0 +1,253 @@
# Copyright (c) 2024-2025, PostgreSQL Global Development Group
# Test SCRAM authentication when opening a new connection with a foreign
# server.
#
# The test exercises SCRAM authentication over a loopback connection to the
# same server and against a different server.
use strict;
use warnings FATAL => 'all';
use PostgreSQL::Test::Utils;
use PostgreSQL::Test::Cluster;
use Test::More;
if (!$use_unix_sockets)
{
plan skip_all => "test requires Unix-domain sockets";
}
my $user = "user01";
my $db0 = "db0"; # For node1
my $db1 = "db1"; # For node1
my $db2 = "db2"; # For node2
my $fdw_server = "db1_fdw";
my $fdw_server2 = "db2_fdw";
my $fdw_invalid_server = "db2_fdw_invalid"; # For invalid fdw options
my $fdw_invalid_server2 =
"db2_fdw_invalid2"; # For invalid scram keys fdw options
my $node1 = PostgreSQL::Test::Cluster->new('node1');
my $node2 = PostgreSQL::Test::Cluster->new('node2');
$node1->init;
$node2->init;
$node1->start;
$node2->start;
# Test setup
$node1->safe_psql('postgres', qq'CREATE USER $user WITH password \'pass\'');
$node2->safe_psql('postgres', qq'CREATE USER $user WITH password \'pass\'');
$ENV{PGPASSWORD} = "pass";
$node1->safe_psql('postgres', qq'CREATE DATABASE $db0');
$node1->safe_psql('postgres', qq'CREATE DATABASE $db1');
$node2->safe_psql('postgres', qq'CREATE DATABASE $db2');
setup_table($node1, $db1, "t");
setup_table($node2, $db2, "t2");
$node1->safe_psql($db0, 'CREATE EXTENSION IF NOT EXISTS dblink');
setup_fdw_server($node1, $db0, $fdw_server, $node1, $db1);
setup_fdw_server($node1, $db0, $fdw_server2, $node2, $db2);
setup_invalid_fdw_server($node1, $db0, $fdw_invalid_server, $node2, $db2);
setup_fdw_server($node1, $db0, $fdw_invalid_server2, $node2, $db2);
setup_user_mapping($node1, $db0, $fdw_server);
setup_user_mapping($node1, $db0, $fdw_server2);
setup_user_mapping($node1, $db0, $fdw_invalid_server);
# Give the user the same SCRAM verifier on both servers, forcing the same
# iteration count and salt.
my $rolpassword = $node1->safe_psql('postgres',
qq"SELECT rolpassword FROM pg_authid WHERE rolname = '$user';");
$node2->safe_psql('postgres', qq"ALTER ROLE $user PASSWORD '$rolpassword'");
unlink($node1->data_dir . '/pg_hba.conf');
unlink($node2->data_dir . '/pg_hba.conf');
$node1->append_conf(
'pg_hba.conf', qq{
local db0 all scram-sha-256
local db1 all scram-sha-256
}
);
$node2->append_conf(
'pg_hba.conf', qq{
local db2 all scram-sha-256
}
);
$node1->restart;
$node2->restart;
# End of test setup
test_scram_keys_is_not_overwritten($node1, $db0, $fdw_invalid_server2);
test_fdw_auth($node1, $db0, "t", $fdw_server,
"SCRAM auth on the same database cluster must succeed");
test_fdw_auth($node1, $db0, "t2", $fdw_server2,
"SCRAM auth on a different database cluster must succeed");
test_fdw_auth_with_invalid_overwritten_require_auth($fdw_invalid_server);
# Ensure that trust connections fail without superuser opt-in.
unlink($node1->data_dir . '/pg_hba.conf');
unlink($node2->data_dir . '/pg_hba.conf');
$node1->append_conf(
'pg_hba.conf', qq{
local db0 all scram-sha-256
local db1 all trust
}
);
$node2->append_conf(
'pg_hba.conf', qq{
local all all password
}
);
$node1->restart;
$node2->restart;
my ($ret, $stdout, $stderr) = $node1->psql(
$db0,
"SELECT * FROM dblink('$fdw_server', 'SELECT * FROM t') AS t(a int, b int)",
connstr => $node1->connstr($db0) . " user=$user");
is($ret, 3, 'loopback trust fails on the same cluster');
like(
$stderr,
qr/failed: authentication method requirement "scram-sha-256" failed: server did not complete authentication/,
'expected error from loopback trust (same cluster)');
($ret, $stdout, $stderr) = $node1->psql(
$db0,
"SELECT * FROM dblink('$fdw_server2', 'SELECT * FROM t2') AS t2(a int, b int)",
connstr => $node1->connstr($db0) . " user=$user");
is($ret, 3, 'loopback password fails on a different cluster');
like(
$stderr,
qr/authentication method requirement "scram-sha-256" failed: server requested a cleartext password/,
'expected error from loopback password (different cluster)');
# Helper functions
sub test_fdw_auth
{
local $Test::Builder::Level = $Test::Builder::Level + 1;
my ($node, $db, $tbl, $fdw, $testname) = @_;
my $connstr = $node->connstr($db) . qq' user=$user';
my $ret = $node->safe_psql(
$db,
qq"SELECT count(1) FROM dblink('$fdw', 'SELECT * FROM $tbl') AS $tbl(a int, b int)",
connstr => $connstr);
is($ret, '10', $testname);
}
sub test_fdw_auth_with_invalid_overwritten_require_auth
{
local $Test::Builder::Level = $Test::Builder::Level + 1;
my ($fdw) = @_;
my ($ret, $stdout, $stderr) = $node1->psql(
$db0,
"select * from dblink('$fdw', 'select * from t') as t(a int, b int)",
connstr => $node1->connstr($db0) . " user=$user");
is($ret, 3, 'loopback trust fails when overwriting require_auth');
like(
$stderr,
qr/password or GSSAPI delegated credentials required/,
'expected error when connecting to a fdw overwriting the require_auth'
);
}
sub test_scram_keys_is_not_overwritten
{
local $Test::Builder::Level = $Test::Builder::Level + 1;
my ($node, $db, $fdw) = @_;
my ($ret, $stdout, $stderr) = $node->psql(
$db,
qq'CREATE USER MAPPING FOR $user SERVER $fdw OPTIONS (user \'$user\', scram_client_key \'key\');',
connstr => $node->connstr($db) . " user=$user");
is($ret, 3, 'user mapping creation fails when using scram_client_key');
like(
$stderr,
qr/ERROR: invalid option "scram_client_key"/,
'user mapping creation fails when using scram_client_key');
($ret, $stdout, $stderr) = $node->psql(
$db,
qq'CREATE USER MAPPING FOR $user SERVER $fdw OPTIONS (user \'$user\', scram_server_key \'key\');',
connstr => $node->connstr($db) . " user=$user");
is($ret, 3, 'user mapping creation fails when using scram_server_key');
like(
$stderr,
qr/ERROR: invalid option "scram_server_key"/,
'user mapping creation fails when using scram_server_key');
}
sub setup_user_mapping
{
my ($node, $db, $fdw) = @_;
$node->safe_psql($db,
qq'CREATE USER MAPPING FOR $user SERVER $fdw OPTIONS (user \'$user\');'
);
}
sub setup_fdw_server
{
my ($node, $db, $fdw, $fdw_node, $dbname) = @_;
my $host = $fdw_node->host;
my $port = $fdw_node->port;
$node->safe_psql(
$db, qq'CREATE SERVER $fdw FOREIGN DATA WRAPPER dblink_fdw options (
host \'$host\', port \'$port\', dbname \'$dbname\', use_scram_passthrough \'true\') '
);
$node->safe_psql($db, qq'GRANT USAGE ON FOREIGN SERVER $fdw TO $user;');
$node->safe_psql($db, qq'GRANT ALL ON SCHEMA public TO $user');
}
sub setup_invalid_fdw_server
{
my ($node, $db, $fdw, $fdw_node, $dbname) = @_;
my $host = $fdw_node->host;
my $port = $fdw_node->port;
$node->safe_psql(
$db, qq'CREATE SERVER $fdw FOREIGN DATA WRAPPER dblink_fdw options (
host \'$host\', port \'$port\', dbname \'$dbname\', use_scram_passthrough \'true\', require_auth \'none\') '
);
$node->safe_psql($db, qq'GRANT USAGE ON FOREIGN SERVER $fdw TO $user;');
$node->safe_psql($db, qq'GRANT ALL ON SCHEMA public TO $user');
}
sub setup_table
{
my ($node, $db, $tbl) = @_;
$node->safe_psql($db,
qq'CREATE TABLE $tbl AS SELECT g as a, g + 1 as b FROM generate_series(1,10) g(g)'
);
$node->safe_psql($db, qq'GRANT USAGE ON SCHEMA public TO $user');
$node->safe_psql($db, qq'GRANT SELECT ON $tbl TO $user');
}
done_testing();

View File

@ -15,7 +15,10 @@
#include "commands/defrem.h"
#include "tsearch/ts_public.h"
PG_MODULE_MAGIC;
PG_MODULE_MAGIC_EXT(
.name = "dict_int",
.version = PG_VERSION
);
typedef struct
{

View File

@ -20,7 +20,10 @@
#include "tsearch/ts_public.h"
#include "utils/formatting.h"
PG_MODULE_MAGIC;
PG_MODULE_MAGIC_EXT(
.name = "dict_xsyn",
.version = PG_VERSION
);
typedef struct
{

View File

@ -11,7 +11,10 @@
#define M_PI 3.14159265358979323846
#endif
PG_MODULE_MAGIC;
PG_MODULE_MAGIC_EXT(
.name = "earthdistance",
.version = PG_VERSION
);
/* Earth's radius is in statute miles. */
static const double EARTH_RADIUS = 3958.747716;

View File

@ -24,8 +24,10 @@
#include "commands/copy.h"
#include "commands/copyfrom_internal.h"
#include "commands/defrem.h"
#include "commands/explain.h"
#include "commands/explain_format.h"
#include "commands/explain_state.h"
#include "commands/vacuum.h"
#include "executor/executor.h"
#include "foreign/fdwapi.h"
#include "foreign/foreign.h"
#include "miscadmin.h"
@ -40,7 +42,10 @@
#include "utils/sampling.h"
#include "utils/varlena.h"
PG_MODULE_MAGIC;
PG_MODULE_MAGIC_EXT(
.name = "file_fdw",
.version = PG_VERSION
);
/*
* Describes the valid options for objects that use this wrapper.
@ -793,8 +798,8 @@ retry:
cstate->num_errors > cstate->opts.reject_limit)
ereport(ERROR,
(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
errmsg("skipped more than REJECT_LIMIT (%lld) rows due to data type incompatibility",
(long long) cstate->opts.reject_limit)));
errmsg("skipped more than REJECT_LIMIT (%" PRId64 ") rows due to data type incompatibility",
cstate->opts.reject_limit)));
/* Repeat NextCopyFrom() until no soft error occurs */
goto retry;
@ -850,10 +855,10 @@ fileEndForeignScan(ForeignScanState *node)
festate->cstate->num_errors > 0 &&
festate->cstate->opts.log_verbosity >= COPY_LOG_VERBOSITY_DEFAULT)
ereport(NOTICE,
errmsg_plural("%llu row was skipped due to data type incompatibility",
"%llu rows were skipped due to data type incompatibility",
(unsigned long long) festate->cstate->num_errors,
(unsigned long long) festate->cstate->num_errors));
errmsg_plural("%" PRIu64 " row was skipped due to data type incompatibility",
"%" PRIu64 " rows were skipped due to data type incompatibility",
festate->cstate->num_errors,
festate->cstate->num_errors));
EndCopyFrom(festate->cstate);
}
@ -1314,10 +1319,10 @@ file_acquire_sample_rows(Relation onerel, int elevel,
cstate->num_errors > 0 &&
cstate->opts.log_verbosity >= COPY_LOG_VERBOSITY_DEFAULT)
ereport(NOTICE,
errmsg_plural("%llu row was skipped due to data type incompatibility",
"%llu rows were skipped due to data type incompatibility",
(unsigned long long) cstate->num_errors,
(unsigned long long) cstate->num_errors));
errmsg_plural("%" PRIu64 " row was skipped due to data type incompatibility",
"%" PRIu64 " rows were skipped due to data type incompatibility",
cstate->num_errors,
cstate->num_errors));
EndCopyFrom(cstate);

View File

@ -44,7 +44,10 @@
#include "utils/varlena.h"
#include "varatt.h"
PG_MODULE_MAGIC;
PG_MODULE_MAGIC_EXT(
.name = "fuzzystrmatch",
.version = PG_VERSION
);
/*
* Soundex

View File

@ -21,7 +21,10 @@
#include "utils/memutils.h"
#include "utils/typcache.h"
PG_MODULE_MAGIC;
PG_MODULE_MAGIC_EXT(
.name = "hstore",
.version = PG_VERSION
);
/* old names for C functions */
HSTORE_POLLUTE(hstore_from_text, tconvert);

View File

@ -4,7 +4,10 @@
#include "hstore/hstore.h"
#include "plperl.h"
PG_MODULE_MAGIC;
PG_MODULE_MAGIC_EXT(
.name = "hstore_plperl",
.version = PG_VERSION
);
/* Linkage to functions in hstore module */
typedef HStore *(*hstoreUpgrade_t) (Datum orig);

View File

@ -3,9 +3,12 @@
#include "fmgr.h"
#include "hstore/hstore.h"
#include "plpy_typeio.h"
#include "plpython.h"
#include "plpy_util.h"
PG_MODULE_MAGIC;
PG_MODULE_MAGIC_EXT(
.name = "hstore_plpython",
.version = PG_VERSION
);
/* Linkage to functions in plpython module */
typedef char *(*PLyObject_AsString_t) (PyObject *plrv);

View File

@ -41,17 +41,17 @@ typedef struct
#define SORT(x) \
do { \
int _nelems_ = ARRNELEMS(x); \
if (_nelems_ > 1) \
isort(ARRPTR(x), _nelems_); \
bool _ascending = true; \
isort(ARRPTR(x), _nelems_, &_ascending); \
} while(0)
/* sort the elements of the array and remove duplicates */
#define PREPAREARR(x) \
do { \
int _nelems_ = ARRNELEMS(x); \
if (_nelems_ > 1) \
if (isort(ARRPTR(x), _nelems_)) \
(x) = _int_unique(x); \
bool _ascending = true; \
isort(ARRPTR(x), _nelems_, &_ascending); \
(x) = _int_unique(x); \
} while(0)
/* "wish" function */
@ -109,7 +109,7 @@ typedef struct
/*
* useful functions
*/
bool isort(int32 *a, int len);
void isort(int32 *a, size_t len, void *arg);
ArrayType *new_intArrayType(int num);
ArrayType *copy_intArrayType(ArrayType *a);
ArrayType *resize_intArrayType(ArrayType *a, int num);
@ -176,16 +176,12 @@ bool execconsistent(QUERYTYPE *query, ArrayType *array, bool calcnot);
bool gin_bool_consistent(QUERYTYPE *query, bool *check);
bool query_has_required_values(QUERYTYPE *query);
int compASC(const void *a, const void *b);
int compDESC(const void *a, const void *b);
/* sort, either ascending or descending */
#define QSORT(a, direction) \
do { \
int _nelems_ = ARRNELEMS(a); \
if (_nelems_ > 1) \
qsort((void*) ARRPTR(a), _nelems_, sizeof(int32), \
(direction) ? compASC : compDESC ); \
bool _ascending = (direction) ? true : false; \
isort(ARRPTR(a), _nelems_, &_ascending); \
} while(0)
#endif /* ___INT_H__ */

View File

@ -5,7 +5,10 @@
#include "_int.h"
PG_MODULE_MAGIC;
PG_MODULE_MAGIC_EXT(
.name = "intarray",
.version = PG_VERSION
);
PG_FUNCTION_INFO_V1(_int_different);
PG_FUNCTION_INFO_V1(_int_same);

View File

@ -186,36 +186,38 @@ rt__int_size(ArrayType *a, float *size)
*size = (float) ARRNELEMS(a);
}
/* qsort_arg comparison function for isort() */
static int
/* comparison function for isort() and _int_unique() */
static inline int
isort_cmp(const void *a, const void *b, void *arg)
{
int32 aval = *((const int32 *) a);
int32 bval = *((const int32 *) b);
if (aval < bval)
return -1;
if (aval > bval)
return 1;
/*
* Report if we have any duplicates. If there are equal keys, qsort must
* compare them at some point, else it wouldn't know whether one should go
* before or after the other.
*/
*((bool *) arg) = true;
if (*((bool *) arg))
{
/* compare for ascending order */
if (aval < bval)
return -1;
if (aval > bval)
return 1;
}
else
{
if (aval > bval)
return -1;
if (aval < bval)
return 1;
}
return 0;
}
/* Sort the given data (len >= 2). Return true if any duplicates found */
bool
isort(int32 *a, int len)
{
bool r = false;
qsort_arg(a, len, sizeof(int32), isort_cmp, &r);
return r;
}
#define ST_SORT isort
#define ST_ELEMENT_TYPE int32
#define ST_COMPARE(a, b, ascending) isort_cmp(a, b, ascending)
#define ST_COMPARE_ARG_TYPE void
#define ST_SCOPE
#define ST_DEFINE
#include "lib/sort_template.h"
/* Create a new int array with room for "num" elements */
ArrayType *
@ -311,10 +313,10 @@ ArrayType *
_int_unique(ArrayType *r)
{
int num = ARRNELEMS(r);
bool duplicates_found; /* not used */
bool ascending = true;
num = qunique_arg(ARRPTR(r), num, sizeof(int), isort_cmp,
&duplicates_found);
&ascending);
return resize_intArrayType(r, num);
}
@ -393,15 +395,3 @@ int_to_intset(int32 elem)
aa[0] = elem;
return result;
}
int
compASC(const void *a, const void *b)
{
return pg_cmp_s32(*(const int32 *) a, *(const int32 *) b);
}
int
compDESC(const void *a, const void *b)
{
return pg_cmp_s32(*(const int32 *) b, *(const int32 *) a);
}

View File

@ -492,6 +492,12 @@ SELECT count(*) from test__int WHERE a @@ '!20 & !21';
6344
(1 row)
SELECT count(*) from test__int WHERE a @@ '!2733 & (2738 | 254)';
count
-------
12
(1 row)
SET enable_seqscan = off; -- not all of these would use index by default
CREATE INDEX text_idx on test__int using gist ( a gist__int_ops );
SELECT count(*) from test__int WHERE a && '{23,50}';
@ -566,6 +572,12 @@ SELECT count(*) from test__int WHERE a @@ '!20 & !21';
6344
(1 row)
SELECT count(*) from test__int WHERE a @@ '!2733 & (2738 | 254)';
count
-------
12
(1 row)
INSERT INTO test__int SELECT array(SELECT x FROM generate_series(1, 1001) x); -- should fail
ERROR: input array is too big (199 maximum allowed, 1001 current), use gist__intbig_ops opclass instead
DROP INDEX text_idx;
@ -648,6 +660,12 @@ SELECT count(*) from test__int WHERE a @@ '!20 & !21';
6344
(1 row)
SELECT count(*) from test__int WHERE a @@ '!2733 & (2738 | 254)';
count
-------
12
(1 row)
DROP INDEX text_idx;
CREATE INDEX text_idx on test__int using gist (a gist__intbig_ops(siglen = 0));
ERROR: value 0 out of bounds for option "siglen"
@ -728,6 +746,12 @@ SELECT count(*) from test__int WHERE a @@ '!20 & !21';
6344
(1 row)
SELECT count(*) from test__int WHERE a @@ '!2733 & (2738 | 254)';
count
-------
12
(1 row)
DROP INDEX text_idx;
CREATE INDEX text_idx on test__int using gist ( a gist__intbig_ops );
SELECT count(*) from test__int WHERE a && '{23,50}';
@ -802,6 +826,12 @@ SELECT count(*) from test__int WHERE a @@ '!20 & !21';
6344
(1 row)
SELECT count(*) from test__int WHERE a @@ '!2733 & (2738 | 254)';
count
-------
12
(1 row)
DROP INDEX text_idx;
CREATE INDEX text_idx on test__int using gin ( a gin__int_ops );
SELECT count(*) from test__int WHERE a && '{23,50}';
@ -876,6 +906,12 @@ SELECT count(*) from test__int WHERE a @@ '!20 & !21';
6344
(1 row)
SELECT count(*) from test__int WHERE a @@ '!2733 & (2738 | 254)';
count
-------
12
(1 row)
DROP INDEX text_idx;
-- Repeat the same queries with an extended data set. The data set is the
-- same that we used before, except that each element in the array is
@ -968,4 +1004,10 @@ SELECT count(*) from more__int WHERE a @@ '!20 & !21';
6344
(1 row)
SELECT count(*) from test__int WHERE a @@ '!2733 & (2738 | 254)';
count
-------
12
(1 row)
RESET enable_seqscan;

View File

@ -107,6 +107,7 @@ SELECT count(*) from test__int WHERE a @> '{20,23}' or a @> '{50,68}';
SELECT count(*) from test__int WHERE a @@ '(20&23)|(50&68)';
SELECT count(*) from test__int WHERE a @@ '20 | !21';
SELECT count(*) from test__int WHERE a @@ '!20 & !21';
SELECT count(*) from test__int WHERE a @@ '!2733 & (2738 | 254)';
SET enable_seqscan = off; -- not all of these would use index by default
@ -124,6 +125,7 @@ SELECT count(*) from test__int WHERE a @> '{20,23}' or a @> '{50,68}';
SELECT count(*) from test__int WHERE a @@ '(20&23)|(50&68)';
SELECT count(*) from test__int WHERE a @@ '20 | !21';
SELECT count(*) from test__int WHERE a @@ '!20 & !21';
SELECT count(*) from test__int WHERE a @@ '!2733 & (2738 | 254)';
INSERT INTO test__int SELECT array(SELECT x FROM generate_series(1, 1001) x); -- should fail
@ -144,6 +146,7 @@ SELECT count(*) from test__int WHERE a @> '{20,23}' or a @> '{50,68}';
SELECT count(*) from test__int WHERE a @@ '(20&23)|(50&68)';
SELECT count(*) from test__int WHERE a @@ '20 | !21';
SELECT count(*) from test__int WHERE a @@ '!20 & !21';
SELECT count(*) from test__int WHERE a @@ '!2733 & (2738 | 254)';
DROP INDEX text_idx;
CREATE INDEX text_idx on test__int using gist (a gist__intbig_ops(siglen = 0));
@ -162,6 +165,7 @@ SELECT count(*) from test__int WHERE a @> '{20,23}' or a @> '{50,68}';
SELECT count(*) from test__int WHERE a @@ '(20&23)|(50&68)';
SELECT count(*) from test__int WHERE a @@ '20 | !21';
SELECT count(*) from test__int WHERE a @@ '!20 & !21';
SELECT count(*) from test__int WHERE a @@ '!2733 & (2738 | 254)';
DROP INDEX text_idx;
CREATE INDEX text_idx on test__int using gist ( a gist__intbig_ops );
@ -178,6 +182,7 @@ SELECT count(*) from test__int WHERE a @> '{20,23}' or a @> '{50,68}';
SELECT count(*) from test__int WHERE a @@ '(20&23)|(50&68)';
SELECT count(*) from test__int WHERE a @@ '20 | !21';
SELECT count(*) from test__int WHERE a @@ '!20 & !21';
SELECT count(*) from test__int WHERE a @@ '!2733 & (2738 | 254)';
DROP INDEX text_idx;
CREATE INDEX text_idx on test__int using gin ( a gin__int_ops );
@ -194,6 +199,7 @@ SELECT count(*) from test__int WHERE a @> '{20,23}' or a @> '{50,68}';
SELECT count(*) from test__int WHERE a @@ '(20&23)|(50&68)';
SELECT count(*) from test__int WHERE a @@ '20 | !21';
SELECT count(*) from test__int WHERE a @@ '!20 & !21';
SELECT count(*) from test__int WHERE a @@ '!2733 & (2738 | 254)';
DROP INDEX text_idx;
@ -229,6 +235,7 @@ SELECT count(*) from more__int WHERE a @> '{20,23}' or a @> '{50,68}';
SELECT count(*) from more__int WHERE a @@ '(20&23)|(50&68)';
SELECT count(*) from more__int WHERE a @@ '20 | !21';
SELECT count(*) from more__int WHERE a @@ '!20 & !21';
SELECT count(*) from test__int WHERE a @@ '!2733 & (2738 | 254)';
RESET enable_seqscan;

View File

@ -3,8 +3,8 @@
MODULES = isn
EXTENSION = isn
DATA = isn--1.1.sql isn--1.1--1.2.sql \
isn--1.0--1.1.sql
DATA = isn--1.0--1.1.sql isn--1.1.sql \
isn--1.1--1.2.sql isn--1.2--1.3.sql
PGFILEDESC = "isn - data types for international product numbering standards"
# the other .h files are data tables, we don't install those

View File

@ -279,6 +279,50 @@ FROM (VALUES ('9780123456786', 'UPC'),
9771234567003 | ISSN | t | | | |
(3 rows)
--
-- test weak mode
--
SELECT '2222222222221'::ean13; -- fail
ERROR: invalid check digit for EAN13 number: "2222222222221", should be 2
LINE 1: SELECT '2222222222221'::ean13;
^
SET isn.weak TO TRUE;
SELECT '2222222222221'::ean13;
ean13
------------------
222-222222222-2!
(1 row)
SELECT is_valid('2222222222221'::ean13);
is_valid
----------
f
(1 row)
SELECT make_valid('2222222222221'::ean13);
make_valid
-----------------
222-222222222-2
(1 row)
SELECT isn_weak(); -- backwards-compatibility wrappers for accessing the GUC
isn_weak
----------
t
(1 row)
SELECT isn_weak(false);
isn_weak
----------
f
(1 row)
SHOW isn.weak;
isn.weak
----------
off
(1 row)
--
-- cleanup
--

View File

@ -0,0 +1,7 @@
/* contrib/isn/isn--1.2--1.3.sql */
-- complain if script is sourced in psql, rather than via ALTER EXTENSION
\echo Use "ALTER EXTENSION isn UPDATE TO '1.3'" to load this file. \quit
ALTER FUNCTION isn_weak(boolean) VOLATILE PARALLEL UNSAFE;
ALTER FUNCTION isn_weak() STABLE PARALLEL SAFE;

View File

@ -21,8 +21,12 @@
#include "UPC.h"
#include "fmgr.h"
#include "isn.h"
#include "utils/guc.h"
PG_MODULE_MAGIC;
PG_MODULE_MAGIC_EXT(
.name = "isn",
.version = PG_VERSION
);
#ifdef USE_ASSERT_CHECKING
#define ISN_DEBUG 1
@ -39,6 +43,7 @@ enum isn_type
static const char *const isn_names[] = {"EAN13/UPC/ISxN", "EAN13/UPC/ISxN", "EAN13", "ISBN", "ISMN", "ISSN", "UPC"};
/* GUC value */
static bool g_weak = false;
@ -929,6 +934,20 @@ _PG_init(void)
if (!check_table(UPC_range, UPC_index))
elog(ERROR, "UPC failed check");
}
/* Define a GUC variable for weak mode. */
DefineCustomBoolVariable("isn.weak",
"Accept input with invalid ISN check digits.",
NULL,
&g_weak,
false,
PGC_USERSET,
0,
NULL,
NULL,
NULL);
MarkGUCPrefixReserved("isn");
}
/* isn_out
@ -1109,17 +1128,16 @@ make_valid(PG_FUNCTION_ARGS)
/* this function temporarily sets weak input flag
* (to lose the strictness of check digit acceptance)
* It's a helper function, not intended to be used!!
*/
PG_FUNCTION_INFO_V1(accept_weak_input);
Datum
accept_weak_input(PG_FUNCTION_ARGS)
{
#ifdef ISN_WEAK_MODE
g_weak = PG_GETARG_BOOL(0);
#else
/* function has no effect */
#endif /* ISN_WEAK_MODE */
bool newvalue = PG_GETARG_BOOL(0);
(void) set_config_option("isn.weak", newvalue ? "on" : "off",
PGC_USERSET, PGC_S_SESSION,
GUC_ACTION_SET, true, 0, false);
PG_RETURN_BOOL(g_weak);
}

View File

@ -1,6 +1,6 @@
# isn extension
comment = 'data types for international product numbering standards'
default_version = '1.2'
default_version = '1.3'
module_pathname = '$libdir/isn'
relocatable = true
trusted = true

View File

@ -18,7 +18,6 @@
#include "fmgr.h"
#undef ISN_DEBUG
#define ISN_WEAK_MODE
/*
* uint64 is the internal storage format for ISNs.

View File

@ -19,8 +19,9 @@ contrib_targets += isn
install_data(
'isn.control',
'isn--1.0--1.1.sql',
'isn--1.1--1.2.sql',
'isn--1.1.sql',
'isn--1.1--1.2.sql',
'isn--1.2--1.3.sql',
kwargs: contrib_data_args,
)

View File

@ -120,6 +120,19 @@ FROM (VALUES ('9780123456786', 'UPC'),
AS a(str,typ),
LATERAL pg_input_error_info(a.str, a.typ) as errinfo;
--
-- test weak mode
--
SELECT '2222222222221'::ean13; -- fail
SET isn.weak TO TRUE;
SELECT '2222222222221'::ean13;
SELECT is_valid('2222222222221'::ean13);
SELECT make_valid('2222222222221'::ean13);
SELECT isn_weak(); -- backwards-compatibility wrappers for accessing the GUC
SELECT isn_weak(false);
SHOW isn.weak;
--
-- cleanup
--

View File

@ -7,7 +7,10 @@
#include "utils/fmgrprotos.h"
#include "utils/jsonb.h"
PG_MODULE_MAGIC;
PG_MODULE_MAGIC_EXT(
.name = "jsonb_plperl",
.version = PG_VERSION
);
static SV *Jsonb_to_SV(JsonbContainer *jsonb);
static JsonbValue *SV_to_JsonbValue(SV *obj, JsonbParseState **ps, bool is_elem);

View File

@ -2,12 +2,15 @@
#include "plpy_elog.h"
#include "plpy_typeio.h"
#include "plpython.h"
#include "plpy_util.h"
#include "utils/fmgrprotos.h"
#include "utils/jsonb.h"
#include "utils/numeric.h"
PG_MODULE_MAGIC;
PG_MODULE_MAGIC_EXT(
.name = "jsonb_plpython",
.version = PG_VERSION
);
/* for PLyObject_AsString in plpy_typeio.c */
typedef char *(*PLyObject_AsString_t) (PyObject *plrv);

View File

@ -12,7 +12,10 @@
#include "utils/fmgrprotos.h"
#include "utils/rel.h"
PG_MODULE_MAGIC;
PG_MODULE_MAGIC_EXT(
.name = "lo",
.version = PG_VERSION
);
/*

View File

@ -13,7 +13,10 @@
#include "utils/selfuncs.h"
#include "varatt.h"
PG_MODULE_MAGIC;
PG_MODULE_MAGIC_EXT(
.name = "ltree",
.version = PG_VERSION
);
/* compare functions */
PG_FUNCTION_INFO_V1(ltree_cmp);

Some files were not shown because too many files have changed in this diff