Commit Graph

291 Commits

Author SHA1 Message Date
Melanie Plageman
add323da40 Eliminate XLOG_HEAP2_VISIBLE from vacuum phase III
Instead of emitting a separate XLOG_HEAP2_VISIBLE WAL record for each
page that becomes all-visible in vacuum's third phase, specify the VM
changes in the already emitted XLOG_HEAP2_PRUNE_VACUUM_CLEANUP record.

Visibility checks are now performed before marking dead items unused.
This is safe because the heap page is held under exclusive lock for the
entire operation.

This reduces the number of WAL records generated by VACUUM phase III by
up to 50%.

Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
2025-10-13 18:01:06 -04:00
Álvaro Herrera
81fc3e28e3
Update some more forward declarations to use typedef
As commit d4d1fc527b.

Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/202509191025.22agk3fvpilc@alvherre.pgsql
2025-09-25 14:33:19 +02:00
Peter Eisentraut
a5b35fcedb Remove PointerIsValid()
This doesn't provide any value over the standard style of checking the
pointer directly or comparing against NULL.

Also remove related:
- AllocPointerIsValid() [unused]
- IndexScanIsValid() [had one user]
- HeapScanIsValid() [unused]
- InvalidRelation [unused]

Leaving HeapTupleIsValid(), ItemIdIsValid(), PortalIsValid(),
RelationIsValid for now, to reduce code churn.

Reviewed-by: Jacob Champion <jacob.champion@enterprisedb.com>
Discussion: https://www.postgresql.org/message-id/flat/ad50ab6b-6f74-4603-b099-1cd6382fb13d%40eisentraut.org
Discussion: https://www.postgresql.org/message-id/CA+hUKG+NFKnr=K4oybwDvT35dW=VAjAAfiuLxp+5JeZSOV3nBg@mail.gmail.com
Discussion: https://www.postgresql.org/message-id/bccf2803-5252-47c2-9ff0-340502d5bd1c@iki.fi
2025-09-24 15:17:20 +02:00
Michael Paquier
2252fcd427 Rationalize handling of VacuumParams
This commit refactors the vacuum routines that rely on VacuumParams,
adding const markers where necessary to force a new policy in the code.
This structure should not use a pointer as it may be used across
multiple relations, and its contents should never be updated.
vacuum_rel() stands as an exception as it touches the "index_cleanup"
and "truncate" options.

VacuumParams has been introduced in 0d83138974, and 661643deda has
fixed a bug impacting VACUUM operating on multiple relations.  The
changes done in tableam.h break ABI compatibility, so this commit can
only happen on HEAD.

Author: Shihao Zhong <zhong950419@gmail.com>
Co-authored-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Junwang Zhao <zhjwpku@gmail.com>
Discussion: https://postgr.es/m/CAGRkXqTo+aK=GTy5pSc-9cy8H2F2TJvcrZ-zXEiNJj93np1UUw@mail.gmail.com
2025-06-30 15:42:50 +09:00
Tom Lane
b27644bade Sync typedefs.list with the buildfarm.
Our maintenance of typedefs.list has been a little haphazard
(and apparently we can't alphabetize worth a darn).  Replace
the file with the authoritative list from our buildfarm, and
run pgindent using that.

I also updated the additions/exclusions lists in pgindent where
necessary to keep pgindent from messing things up significantly.
Notably, now that regex_t and some related names are macros not real
typedefs, we have to whitelist them explicitly.  The exclusions list
has also drifted noticeably, presumably due to changes of system
headers on the buildfarm animals that contribute to the list.

Unlike in prior years, I've not manually added typedef names that
are missing from the buildfarm's list because they are not used to
declare any variables or fields.  So there are a few places where
the typedef declaration itself is formatted worse than before,
e.g. typedef enum IoMethod.  I could preserve the names that were
manually added to the list previously, but I'd really prefer to find
a less manual way of dealing with these cases.  A quick grep finds
about 75 such symbols, most of which have never gotten any special
treatment.

Per discussion among pgsql-release, doing this now seems appropriate
even though we're still a week or two away from making the v18 branch.
2025-06-15 13:04:24 -04:00
Andres Freund
459e7bf8e2 Remove HeapBitmapScan's skip_fetch optimization
The optimization does not take the removal of TIDs by a concurrent vacuum into
account. The concurrent vacuum can remove dead TIDs and make pages ALL_VISIBLE
while those dead TIDs are referenced in the bitmap. This can lead to a
skip_fetch scan returning too many tuples.

It likely would be possible to implement this optimization safely, but we
don't have the necessary infrastructure in place. Nor is it clear that it's
worth building that infrastructure, given how limited the skip_fetch
optimization is.

In the backbranches we just disable the optimization by always passing
need_tuples=true to table_beginscan_bm(). We can't perform API/ABI changes in
the backbranches and we want to make the change as minimal as possible.

Author: Matthias van de Meent <boekewurm+postgres@gmail.com>
Reported-By: Konstantin Knizhnik <knizhnik@garret.ru>
Discussion: https://postgr.es/m/CAEze2Wg3gXXZTr6_rwC+s4-o2ZVFB5F985uUSgJTsECx6AmGcQ@mail.gmail.com
Backpatch-through: 13
2025-04-02 14:54:20 -04:00
Melanie Plageman
f7a8fc10cc Add and use BitmapHeapScanDescData struct
Move the several members of HeapScanDescData which are specific to
Bitmap Heap Scans into a new struct, BitmapHeapScanDescData, which
inherits from HeapScanDescData.

This reduces the size of the HeapScanDescData for other types of scans
and will allow us to add additional bitmap heap scan-specific members in
the future without fear of bloating the HeapScanDescData.

Reviewed-by: Tomas Vondra
Discussion: https://postgr.es/m/c736f6aa-8b35-4e20-9621-62c7c82e2168%40vondra.me
2025-01-16 18:42:39 -05:00
Bruce Momjian
50e6eb731d Update copyright for 2025
Backpatch-through: 13
2025-01-01 11:21:55 -05:00
Melanie Plageman
68d9662be1 Make rs_cindex and rs_ntuples unsigned
HeapScanDescData.rs_cindex and rs_ntuples can't be less than 0. All scan
types using the heap scan descriptor expect these values to be >= 0.
Make that expectation clear by making rs_cindex and rs_ntuples unsigned.

Also remove the test in heapam_scan_bitmap_next_tuple() that checks if
rs_cindex < 0. This was never true, but now that rs_cindex is unsigned,
it makes even less sense.

While we are at it, initialize both rs_cindex and rs_ntuples to 0 in
initscan().

Author: Melanie Plageman
Reviewed-by: Dilip Kumar
Discussion: https://postgr.es/m/CAAKRu_ZxF8cDCM_BFi_L-t%3DRjdCZYP1usd1Gd45mjHfZxm0nZw%40mail.gmail.com
2024-12-18 11:47:38 -05:00
Noah Misch
30d47ec8c6 Unpin buffer before inplace update waits for an XID to end.
Commit a07e03fd8f changed inplace updates
to wait for heap_update() commands like GRANT TABLE and GRANT DATABASE.
By keeping the pin during that wait, a sequence of autovacuum workers
and an uncommitted GRANT starved one foreground LockBufferForCleanup()
for six minutes, on buildfarm member sarus.  Prevent, at the cost of a
bit of complexity.  Back-patch to v12, like the earlier commit.  That
commit and heap_inplace_lock() have not yet appeared in any release.

Discussion: https://postgr.es/m/20241026184936.ae.nmisch@google.com
2024-10-29 09:39:55 -07:00
Noah Misch
a07e03fd8f Fix data loss at inplace update after heap_update().
As previously-added tests demonstrated, heap_inplace_update() could
instead update an unrelated tuple of the same catalog.  It could lose
the update.  Losing relhasindex=t was a source of index corruption.
Inplace-updating commands like VACUUM will now wait for heap_update()
commands like GRANT TABLE and GRANT DATABASE.  That isn't ideal, but a
long-running GRANT already hurts VACUUM progress more just by keeping an
XID running.  The VACUUM will behave like a DELETE or UPDATE waiting for
the uncommitted change.

For implementation details, start at the systable_inplace_update_begin()
header comment and README.tuplock.  Back-patch to v12 (all supported
versions).  In back branches, retain a deprecated heap_inplace_update(),
for extensions.

Reported by Smolkin Grigory.  Reviewed by Nitin Motiani, (in earlier
versions) Heikki Linnakangas, and (in earlier versions) Alexander
Lakhin.

Discussion: https://postgr.es/m/CAMp+ueZQz3yDk7qg42hk6-9gxniYbp-=bG2mgqecErqR5gGGOA@mail.gmail.com
2024-09-24 15:25:18 -07:00
Michael Paquier
00c76cf21c Move logic related to WAL replay of Heap/Heap2 into its own file
This brings more clarity to heapam.c, by cleanly separating all the
logic related to WAL replay and the rest of Heap and Heap2, similarly
to other RMGRs like hash, btree, etc.

The header reorganization is also nice in heapam.c, cutting half of the
headers required.

Author: Li Yong
Reviewed-by: Sutou Kouhei, Michael Paquier
Discussion: https://postgr.es/m/EFE55E65-D7BD-4C6A-B630-91F43FD0771B@ebay.com
2024-09-12 13:32:05 +09:00
Peter Geoghegan
6207f08f70 Harmonize function parameter names for Postgres 17.
Make sure that function declarations use names that exactly match the
corresponding names from function definitions in a few places.  These
inconsistencies were all introduced during Postgres 17 development.

pg_bsd_indent still has a couple of similar inconsistencies, which I
(pgeoghegan) have left untouched for now.

This commit was written with help from clang-tidy, by mechanically
applying the same rules as similar clean-up commits (the earliest such
commit was commit 035ce1fe).
2024-06-12 17:01:51 -04:00
Alexander Korotkov
6377e12a5a revert: Generalize relation analyze in table AM interface
This commit reverts 27bc1772fc and dd1f6b0c17.  Per review by Andres Freund.

Discussion: https://postgr.es/m/20240415201057.khoyxbwwxfgzomeo%40awork3.anarazel.de
2024-04-16 13:14:20 +03:00
Alexander Korotkov
193e6d18e5 Revert: Allow locking updated tuples in tuple_update() and tuple_delete()
This commit reverts 87985cc925 and 818861eb57 per review by Andres Freund.

Discussion: https://postgr.es/m/20240410165236.rwyrny7ihi4ddxw4%40awork3.anarazel.de
2024-04-11 16:01:34 +03:00
Alexander Korotkov
da841aa4dc Revert: Let table AM insertion methods control index insertion
This commit reverts b1484a3f19 per review by Andres Freund.

Discussion: https://postgr.es/m/20240410165236.rwyrny7ihi4ddxw4%40awork3.anarazel.de
2024-04-11 16:01:30 +03:00
Thomas Munro
041b96802e Use streaming I/O in ANALYZE.
The ANALYZE command prefetches and reads sample blocks chosen by a
BlockSampler algorithm. Instead of calling [Prefetch|Read]Buffer() for
each block, ANALYZE now uses the streaming API introduced in b5a9b18cd0.

Author: Nazir Bilal Yavuz <byavuz81@gmail.com>
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Jakub Wartak <jakub.wartak@enterprisedb.com>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://postgr.es/m/flat/CAN55FZ0UhXqk9v3y-zW_fp4-WCp43V8y0A72xPmLkOM%2B6M%2BmJg%40mail.gmail.com
2024-04-08 13:16:28 +12:00
Thomas Munro
b7b0f3f272 Use streaming I/O in sequential scans.
Instead of calling ReadBuffer() for each block, heap sequential scans
and TID range scans now use the streaming API introduced in b5a9b18cd0.

Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://postgr.es/m/flat/CAAKRu_YtXJiYKQvb5JsA2SkwrsizYLugs4sSOZh3EAjKUg%3DgEQ%40mail.gmail.com
2024-04-08 01:53:57 +12:00
Tomas Vondra
04e72ed617 BitmapHeapScan: Push skip_fetch optimization into table AM
Commit 7c70996ebf introduced an optimization to allow bitmap
scans to operate like index-only scans by not fetching a block from the
heap if none of the underlying data is needed and the block is marked
all visible in the visibility map.

With the introduction of table AMs, a FIXME was added to this code
indicating that the skip_fetch logic should be pushed into the table
AM-specific code, as not all table AMs may use a visibility map in the
same way.

This commit resolves this FIXME for the current block. The layering
violation is still present in BitmapHeapScans's prefetching code, which
uses the visibility map to decide whether or not to prefetch a block.
However, this can be addressed independently.

Author: Melanie Plageman
Reviewed-by: Andres Freund, Heikki Linnakangas, Tomas Vondra, Mark Dilger
Discussion: https://postgr.es/m/CAAKRu_ZwCwWFeL_H3ia26bP2e7HiKLWt0ZmGXPVwPO6uXq0vaA%40mail.gmail.com
2024-04-07 00:24:14 +02:00
David Rowley
44086b0975 Preliminary refactor of heap scanning functions
To allow the use of the read stream API added in b5a9b18cd for
sequential scans on heap tables, here we make some adjustments to make
that change less invasive and perhaps make the code easier to follow in
the process.

Here heapgetpage() gets broken into two functions:

1) The part which reads the block has now been moved into a function
   named heapfetchbuf().
2) The part which performed pruning and populated the scan's
   rs_vistuples[] array is now moved into a new function named
   heap_prepare_pagescan().

The functionality provided by heap_prepare_pagescan() was only ever
required by SO_ALLOW_PAGEMODE scans, so the branching that was
previously done in heapgetpage() is no longer needed as we simply just
don't call heap_prepare_pagescan() from heapgettup() in the refactored
code.

Author: Melanie Plageman
Discussion: https://postgr.es/m/CAAKRu_YtXJiYKQvb5JsA2SkwrsizYLugs4sSOZh3EAjKUg=gEQ@mail.gmail.com
2024-04-04 16:41:13 +13:00
Heikki Linnakangas
6dbb490261 Combine freezing and pruning steps in VACUUM
Execute both freezing and pruning of tuples in the same
heap_page_prune() function, now called heap_page_prune_and_freeze(),
and emit a single WAL record containing all changes. That reduces the
overall amount of WAL generated.

This moves the freezing logic from vacuumlazy.c to the
heap_page_prune_and_freeze() function. The main difference in the
coding is that in vacuumlazy.c, we looked at the tuples after the
pruning had already happened, but in heap_page_prune_and_freeze() we
operate on the tuples before pruning. The heap_prepare_freeze_tuple()
function is now invoked after we have determined that a tuple is not
going to be pruned away.

VACUUM no longer needs to loop through the items on the page after
pruning. heap_page_prune_and_freeze() does all the work. It now
returns the list of dead offsets, including existing LP_DEAD items, to
the caller. Similarly it's now responsible for tracking 'all_visible',
'all_frozen', and 'hastup' on the caller's behalf.

Author: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://www.postgresql.org/message-id/20240330055710.kqg6ii2cdojsxgje@liskov
2024-04-03 19:32:28 +03:00
Heikki Linnakangas
3d0f730bf1 Introduce 'options' argument to heap_page_prune()
Currently there is only one option, HEAP_PAGE_PRUNE_MARK_UNUSED_NOW
which replaces the old boolean argument, but upcoming patches will
introduce at least one more. Having a lot of boolean arguments makes
it hard to see at the call sites what the arguments mean, so prefer a
bitmask of options with human-readable names.

Author: Melanie Plageman <melanieplageman@gmail.com>
Author: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Discussion: https://www.postgresql.org/message-id/20240401172219.fngjosaqdgqqvg4e@liskov
2024-04-02 00:56:05 +03:00
Alexander Korotkov
b1484a3f19 Let table AM insertion methods control index insertion
Previously, the executor did index insert unconditionally after calling
table AM interface methods tuple_insert() and multi_insert().  This commit
introduces the new parameter insert_indexes for these two methods.  Setting
'*insert_indexes' to true saves the current logic.  Setting it to false
indicates that table AM cares about index inserts itself and doesn't want the
caller to do that.

Discussion: https://postgr.es/m/CAPpHfdurb9ycV8udYqM%3Do0sPS66PJ4RCBM1g-bBpvzUfogY0EA%40mail.gmail.com
Reviewed-by: Pavel Borisov, Matthias van de Meent, Mark Dilger
2024-03-30 22:53:56 +02:00
Alexander Korotkov
27bc1772fc Generalize relation analyze in table AM interface
Currently, there is just one algorithm for sampling tuples from a table written
in acquire_sample_rows().  Custom table AM can just redefine the way to get the
next block/tuple by implementing scan_analyze_next_block() and
scan_analyze_next_tuple() API functions.

This approach doesn't seem general enough.  For instance, it's unclear how to
sample this way index-organized tables.  This commit allows table AM to
encapsulate the whole sampling algorithm (currently implemented in
acquire_sample_rows()) into the relation_analyze() API function.

Discussion: https://postgr.es/m/CAPpHfdurb9ycV8udYqM%3Do0sPS66PJ4RCBM1g-bBpvzUfogY0EA%40mail.gmail.com
Reviewed-by: Pavel Borisov, Matthias van de Meent
2024-03-30 22:34:04 +02:00
Alexander Korotkov
87985cc925 Allow locking updated tuples in tuple_update() and tuple_delete()
Currently, in read committed transaction isolation mode (default), we have the
following sequence of actions when tuple_update()/tuple_delete() finds
the tuple updated by the concurrent transaction.

1. Attempt to update/delete tuple with tuple_update()/tuple_delete(), which
   returns TM_Updated.
2. Lock tuple with tuple_lock().
3. Re-evaluate plan qual (recheck if we still need to update/delete and
   calculate the new tuple for update).
4. Second attempt to update/delete tuple with tuple_update()/tuple_delete().
   This attempt should be successful, since the tuple was previously locked.

This commit eliminates step 2 by taking the lock during the first
tuple_update()/tuple_delete() call.  The heap table access method saves some
effort by checking the updated tuple once instead of twice.  Future
undo-based table access methods, which will start from the latest row version,
can immediately place a lock there.

Also, this commit makes tuple_update()/tuple_delete() optionally save the old
tuple into the dedicated slot.  That saves efforts on re-fetching tuples in
certain cases.

The code in nodeModifyTable.c is simplified by removing the nested switch/case.

Discussion: https://postgr.es/m/CAPpHfdua-YFw3XTprfutzGp28xXLigFtzNbuFY8yPhqeq6X5kg%40mail.gmail.com
Reviewed-by: Aleksander Alekseev, Pavel Borisov, Vignesh C, Mason Sharp
Reviewed-by: Andres Freund, Chris Travers
2024-03-26 01:27:56 +02:00
Heikki Linnakangas
f83d709760 Merge prune, freeze and vacuum WAL record formats
The new combined WAL record is now used for pruning, freezing and 2nd
pass of vacuum. This is in preparation for changing VACUUM to write a
combined prune+freeze record per page, instead of separate two
records. The new WAL record format now supports that, but the code
still always writes separate records for pruning and freezing.

This reserves separate XLOG_HEAP2_* info codes for when the pruning
record is emitted for on-access pruning or VACUUM, per Peter
Geoghegan's suggestion. The record format is identical, but having
separate info codes makes it easier analyze pruning and vacuuming with
pg_waldump.

The function to emit the new WAL record, log_heap_prune_and_freeze(),
is in pruneheap.c. The existing heap_log_freeze_plan() and its
subroutines are moved to pruneheap.c without changes, to keep them
together with log_heap_prune_and_freeze().

Author: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://www.postgresql.org/message-id/CAAKRu_azf-zH%3DDgVbquZ3tFWjMY1w5pO8m-TXJaMdri8z3933g@mail.gmail.com
Discussion: https://www.postgresql.org/message-id/CAAKRu_b2oE4GL%3Dq4g9mcByS9yT7wTQvEH9OLpabj28e%2BWKFi2A@mail.gmail.com
2024-03-25 14:59:58 +02:00
Robert Haas
c120550edb Optimize vacuuming of relations with no indexes.
If there are no indexes on a relation, items can be marked LP_UNUSED
instead of LP_DEAD when pruning. This significantly reduces WAL
volume, since we no longer need to emit one WAL record for pruning
and a second to change the LP_DEAD line pointers thus created to
LP_UNUSED.

Melanie Plageman, reviewed by Andres Freund, Peter Geoghegan, and me

Discussion: https://postgr.es/m/CAAKRu_bgvb_k0gKOXWzNKWHt560R0smrGe3E8zewKPs8fiMKkw%40mail.gmail.com
2024-01-18 10:03:42 -05:00
Bruce Momjian
29275b1d17 Update copyright for 2024
Reported-by: Michael Paquier

Discussion: https://postgr.es/m/ZZKTDPxBBMt3C0J9@paquier.xyz

Backpatch-through: 12
2024-01-03 20:49:05 -05:00
Peter Eisentraut
611806cd72 Add trailing commas to enum definitions
Since C99, there can be a trailing comma after the last value in an
enum definition.  A lot of new code has been introducing this style on
the fly.  Some new patches are now taking an inconsistent approach to
this.  Some add the last comma on the fly if they add a new last
value, some are trying to preserve the existing style in each place,
some are even dropping the last comma if there was one.  We could
nudge this all in a consistent direction if we just add the trailing
commas everywhere once.

I omitted a few places where there was a fixed "last" value that will
always stay last.  I also skipped the header files of libpq and ecpg,
in case people want to use those with older compilers.  There were
also a small number of cases where the enum type wasn't used anywhere
(but the enum values were), which ended up confusing pgindent a bit,
so I left those alone.

Discussion: https://www.postgresql.org/message-id/flat/386f8c45-c8ac-4681-8add-e3b0852c1620%40eisentraut.org
2023-10-26 09:20:54 +02:00
Robert Haas
1ccc1e05ae Remove retry loop in heap_page_prune().
The retry loop is needed because heap_page_prune() calls
HeapTupleSatisfiesVacuum() and then lazy_scan_prune() does the same
thing again, and they might get different answers due to concurrent
clog updates.  But this patch makes heap_page_prune() return the
HeapTupleSatisfiesVacuum() results that it computed back to the
caller, which allows lazy_scan_prune() to avoid needing to recompute
those values in the first place. That's nice both because it eliminates
the need for a retry loop and also because it's cheaper.

Melanie Plageman, reviewed by David Geier, Andres Freund, and me.

Discussion: https://postgr.es/m/CAAKRu_br124qsGJieuYA0nGjywEukhK1dKBfRdby_4yY3E9SXA%40mail.gmail.com
2023-10-02 11:40:07 -04:00
Robert Haas
4e9fc3a976 Return data from heap_page_prune via a struct.
Previously, one of the values in the struct was returned as the return
value, and another was returned via an output parameter. In
preparation for returning more stuff, consolidate both values into a
struct returned via an output parameter.

Melanie Plageman, reviewed by Andres Freund and by me.

Discussion: https://postgr.es/m/CAAKRu_br124qsGJieuYA0nGjywEukhK1dKBfRdby_4yY3E9SXA%40mail.gmail.com
2023-09-28 10:36:34 -04:00
Thomas Munro
f691f5b80a Remove the "snapshot too old" feature.
Remove the old_snapshot_threshold setting and mechanism for producing
the error "snapshot too old", originally added by commit 848ef42b.
Unfortunately it had a number of known problems in terms of correctness
and performance, mostly reported by Andres in the course of his work on
snapshot scalability.  We agreed to remove it, after a long period
without an active plan to fix it.

This is certainly a desirable feature, and someone might propose a new
or improved implementation in the future.

Reported-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CACG%3DezYV%2BEvO135fLRdVn-ZusfVsTY6cH1OZqWtezuEYH6ciQA%40mail.gmail.com
Discussion: https://postgr.es/m/20200401064008.qob7bfnnbu4w5cw4%40alap3.anarazel.de
Discussion: https://postgr.es/m/CA%2BTgmoY%3Daqf0zjTD%2B3dUWYkgMiNDegDLFjo%2B6ze%3DWtpik%2B3XqA%40mail.gmail.com
2023-09-05 19:53:43 +12:00
Tomas Vondra
19d8e2308b Ignore BRIN indexes when checking for HOT updates
When determining whether an index update may be skipped by using HOT, we
can ignore attributes indexed by block summarizing indexes without
references to individual tuples that need to be cleaned up.

A new type TU_UpdateIndexes provides a signal to the executor to
determine which indexes to update - no indexes, all indexes, or only the
summarizing indexes.

This also removes rd_indexattr list, and replaces it with rd_attrsvalid
flag. The list was not used anywhere, and a simple flag is sufficient.

This was originally committed as 5753d4ee32, but then got reverted by
e3fcca0d0d because of correctness issues.

Original patch by Josef Simanek, various fixes and improvements by Tomas
Vondra and me.

Authors: Matthias van de Meent, Josef Simanek, Tomas Vondra
Reviewed-by: Tomas Vondra, Alvaro Herrera
Discussion: https://postgr.es/m/05ebcb44-f383-86e3-4f31-0a97a55634cf@enterprisedb.com
Discussion: https://postgr.es/m/CAFp7QwpMRGcDAQumN7onN9HjrJ3u4X3ZRXdGFT0K5G2JWvnbWg%40mail.gmail.com
2023-03-20 11:02:42 +01:00
David Rowley
9ed50ab349 Remove stray duplicated comment in heapam.h
This is just the same as what's written under the rs_numblocks field.

Reported-by: Melanie Plageman
Discussion: https://postgr.es/m/20230207204127.7vs6krqjqn5farr7@liskov
2023-02-08 16:03:26 +13:00
David Rowley
f9bc34fcb6 Further refactor of heapgettup and heapgettup_pagemode
Backward and forward scans share much of the same page acquisition code.
Here we consolidate that code to reduce some duplication.

Additionally, add a new rs_coffset field to HeapScanDescData to track the
offset of the current tuple.  The new field fits nicely into the padding
between a bool and BlockNumber field and saves having to look at the last
returned tuple to figure out which offset we should be looking at for the
current tuple.

Author: Melanie Plageman
Reviewed-by: David Rowley
Discussion: https://postgr.es/m/CAAKRu_bvkhka0CZQun28KTqhuUh5ZqY=_T8QEqZqOL02rpi2bw@mail.gmail.com
2023-02-03 11:48:39 +13:00
Peter Geoghegan
79d4bf4eff Delay commit status checks until freezing executes.
pg_xact lookups are relatively expensive.  Move the xmin/xmax commit
status checks from the point that freeze plans are prepared to the point
that they're actually executed.  Otherwise we'll repeat many commit
status checks whenever multiple successive VACUUM operations scan the
same pages and decide against freezing each time, which is a waste of
cycles.

Oversight in commit 1de58df4, which added page-level freezing.

Author: Peter Geoghegan <pg@bowt.ie>
Discussion: https://postgr.es/m/CAH2-WzkZpe4K6qMfEt8H4qYJCKc2R7TPvKsBva7jc9w7iGXQSw@mail.gmail.com
2023-01-03 11:22:36 -08:00
Peter Geoghegan
b37a083239 Refine the definition of page-level freezing.
Improve comments added by commit 1de58df4 which describe the
lazy_scan_prune "freeze the page" path.  These newly revised comments
are based on suggestions from Jeff Davis.

In passing, remove nearby visibility_cutoff_xid comments left over from
commit 6daeeb1f.

Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Jeff Davis <pgsql@j-davis.com>
Discussion: https://postgr.es/m/ebc857107fe3edd422ef8a65191ca4a8da568b9b.camel@j-davis.com
2023-01-03 10:08:55 -08:00
Bruce Momjian
c8e1ba736b Update copyright for 2023
Backpatch-through: 11
2023-01-02 15:00:37 -05:00
Peter Geoghegan
1de58df4fe Add page-level freezing to VACUUM.
Teach VACUUM to decide on whether or not to trigger freezing at the
level of whole heap pages.  Individual XIDs and MXIDs fields from tuple
headers now trigger freezing of whole pages, rather than independently
triggering freezing of each individual tuple header field.

Managing the cost of freezing over time now significantly influences
when and how VACUUM freezes.  The overall amount of WAL written is the
single most important freezing related cost, in general.  Freezing each
page's tuples together in batch allows VACUUM to take full advantage of
the freeze plan WAL deduplication optimization added by commit 9e540599.

Also teach VACUUM to trigger page-level freezing whenever it detects
that heap pruning generated an FPI.  We'll have already written a large
amount of WAL just to do that much, so it's very likely a good idea to
get freezing out of the way for the page early.  This only happens in
cases where it will directly lead to marking the page all-frozen in the
visibility map.

In most cases "freezing a page" removes all XIDs < OldestXmin, and all
MXIDs < OldestMxact.  It doesn't quite work that way in certain rare
cases involving MultiXacts, though.  It is convenient to define "freeze
the page" in a way that gives FreezeMultiXactId the leeway to put off
the work of processing an individual tuple's xmax whenever it happens to
be a MultiXactId that would require an expensive second pass to process
aggressively (allocating a new multi is especially worth avoiding here).
FreezeMultiXactId is eager when processing is cheap (as it usually is),
and lazy in the event of an individual multi that happens to require
expensive second pass processing.  This avoids regressions related to
processing of multis that page-level freezing might otherwise cause.

Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Jeff Davis <pgsql@j-davis.com>
Reviewed-By: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CAH2-WzkFok_6EAHuK39GaW4FjEFQsY=3J0AAd6FXk93u-Xq3Fg@mail.gmail.com
2022-12-28 08:50:47 -08:00
Peter Geoghegan
4ce3afb82e Refactor how VACUUM passes around its XID cutoffs.
Use a dedicated struct for the XID/MXID cutoffs used by VACUUM, such as
FreezeLimit and OldestXmin.  This state is initialized in vacuum.c, and
then passed around by code from vacuumlazy.c to heapam.c freezing
related routines.  The new convention is that everybody works off of the
same cutoff state, which is passed around via pointers to const.

Also simplify some of the logic for dealing with frozen xmin in
heap_prepare_freeze_tuple: add dedicated "xmin_already_frozen" state to
clearly distinguish xmin XIDs that we're going to freeze from those that
were already frozen from before.  That way the routine's xmin handling
code is symmetrical with the existing xmax handling code.  This is
preparation for an upcoming commit that will add page level freezing.

Also refactor the control flow within FreezeMultiXactId(), while adding
stricter sanity checks.  We now test OldestXmin directly, instead of
using FreezeLimit as an inexact proxy for OldestXmin.  This is further
preparation for the page level freezing work, which will make the
function's caller cede control of page level freezing to the function
where appropriate (where heap_prepare_freeze_tuple sees a tuple that
happens to contain a MultiXactId in its xmax).

Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Jeff Davis <pgsql@j-davis.com>
Discussion: https://postgr.es/m/CAH2-WznS9TxXmz2_=SY+SyJyDFbiOftKofM9=aDo68BbXNBUMA@mail.gmail.com
2022-12-22 09:37:59 -08:00
Peter Eisentraut
8e1db29cdb Variable renaming in preparation for refactoring
Rename page -> block and dp -> page where appropriate.  The old naming
mixed up block and page in confusing ways.

Author: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/CAAKRu_YSOnhKsDyFcqJsKtBSrd32DP-jjXmv7hL0BPD-z0TGXQ@mail.gmail.com
2022-11-16 16:40:34 +01:00
Peter Geoghegan
9e5405993c Deduplicate freeze plans in freeze WAL records.
Make heapam WAL records that describe freezing performed by VACUUM more
space efficient by storing each distinct "freeze plan" once, alongside
an array of associated page offset numbers (one per freeze plan).  The
freeze plans required for most heap pages tend to naturally have a great
deal of redundancy, so this technique is very effective in practice.  It
often leads to freeze WAL records that are less than 20% of the size of
equivalent WAL records generated using the previous approach.

The freeze plan concept was introduced by commit 3b97e6823b, which fixed
bugs in VACUUM's handling of MultiXacts.  We retain the concept of
freeze plans, but go back to using page offset number arrays.  There is
no loss of generality here because deduplication is an additive process
that gets applied mechanically when FREEZE_PAGE WAL records are built.

More than anything else, freeze plan deduplication is an optimization
that reduces the marginal cost of freezing additional tuples on pages
that will need to have at least one or two tuples frozen in any case.
Ongoing work that adds page-level freezing to VACUUM will take full
advantage of the improved cost profile through batching.

Also refactor some of the details surrounding recovery conflicts needed
to REDO freeze records in passing: make original execution responsible
for generating a standard latestRemovedXid cutoff, rather than working
backwards to get the same cutoff in the REDO routine.  Bugfix commit
66fbcb0d2e did it the other way around, which is equivalent but obscures
what's going on.

Also rename the cutoff field from the WAL record/struct (rename the
field cutoff_xid to latestRemovedXid to match similar WAL records).
Processing of conflicts by REDO routines is already completely uniform,
so tools like pg_waldump should present the information driving the
process uniformly.  There are two remaining WAL record types that still
don't quite follow this convention (heapam's VISIBLE record type and
SP-GiST's VACUUM_REDIRECT record type).  They can be brought into line
by later work that totally standardizes how the cutoffs are presented.

Bump XLOG_PAGE_MAGIC.

Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-By: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-By: Justin Pryzby <pryzby@telsasoft.com>
Discussion: https://postgr.es/m/CAH2-Wz=XytErMnb8FAyFd+OQEbiipB0Q2FmFdXrggPL4VBnRYQ@mail.gmail.com
2022-11-15 07:48:41 -08:00
Alvaro Herrera
ff0d8f27f4
Remove redundant declaration for XidInMVCCSnapshot
This was added for no good reason by c91560defc, after b7eda3e0e3
had just moved the prototype from utils/tqual.h to utils/snapmgr.h.

Author: Japin Li <japinli@hotmail.com>
Discussion: https://postgr.es/m/MEYP282MB16693A409F3282A9DB287BADB63E9@MEYP282MB1669.AUSP282.PROD.OUTLOOK.COM
2022-11-09 18:30:09 +01:00
Peter Geoghegan
4bac9600f0 Harmonize heapam and tableam parameter names.
Make sure that function declarations use names that exactly match the
corresponding names from function definitions.  Having parameter names
that are reliably consistent in this way will make it easier to reason
about groups of related C functions from the same translation unit as a
module.  It will also make certain refactoring tasks easier.

Like other recent commits that cleaned up function parameter names, this
commit was written with help from clang-tidy.  Later commits will do the
same for other parts of the codebase.

Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/CAH2-WznJt9CMM9KJTMjJh_zbL5hD9oX44qdJ4aqZtjFi-zA3Tg@mail.gmail.com
2022-09-19 16:46:23 -07:00
Tom Lane
23e7b38bfe Pre-beta mechanical code beautification.
Run pgindent, pgperltidy, and reformat-dat-files.
I manually fixed a couple of comments that pgindent uglified.
2022-05-12 15:17:30 -04:00
Tom Lane
7b7ed046cb Prevent access to no-longer-pinned buffer in heapam_tuple_lock().
heap_fetch() used to have a "keep_buf" parameter that told it to return
ownership of the buffer pin to the caller after finding that the
requested tuple TID exists but is invisible to the specified snapshot.
This was thoughtlessly removed in commit 5db6df0c0, which broke
heapam_tuple_lock() (formerly EvalPlanQualFetch) because that function
needs to do more accesses to the tuple even if it's invisible.  The net
effect is that we would continue to touch the page for a microsecond or
two after releasing pin on the buffer.  Usually no harm would result;
but if a different session decided to defragment the page concurrently,
we could see garbage data and mistakenly conclude that there's no newer
tuple version to chain up to.  (It's hard to say whether this has
happened in the field.  The bug was actually found thanks to a later
change that allowed valgrind to detect accesses to non-pinned buffers.)

The most reasonable way to fix this is to reintroduce keep_buf,
although I made it behave slightly differently: buffer ownership
is passed back only if there is a valid tuple at the requested TID.
In HEAD, we can just add the parameter back to heap_fetch().
To avoid an API break in the back branches, introduce an additional
function heap_fetch_extended() in those branches.

In HEAD there is an additional, less obvious API change: tuple->t_data
will be set to NULL in all cases where buffer ownership is not returned,
in particular when the tuple exists but fails the time qual (and
!keep_buf).  This is to defend against any other callers attempting to
access non-pinned buffers.  We concluded that making that change in back
branches would be more likely to introduce problems than cure any.

In passing, remove a comment about heap_fetch that was obsoleted by
9a8ee1dc6.

Per bug #17462 from Daniil Anisimov.  Back-patch to v12 where the bug
was introduced.

Discussion: https://postgr.es/m/17462-9c98a0f00df9bd36@postgresql.org
2022-04-13 13:35:07 -04:00
Peter Geoghegan
0b018fabaa Set relfrozenxid to oldest extant XID seen by VACUUM.
When VACUUM set relfrozenxid before now, it set it to whatever value was
used to determine which tuples to freeze -- the FreezeLimit cutoff.
This approach was very naive.  The relfrozenxid invariant only requires
that new relfrozenxid values be <= the oldest extant XID remaining in
the table (at the point that the VACUUM operation ends), which in
general might be much more recent than FreezeLimit.

VACUUM now carefully tracks the oldest remaining XID/MultiXactId as it
goes (the oldest remaining values _after_ lazy_scan_prune processing).
The final values are set as the table's new relfrozenxid and new
relminmxid in pg_class at the end of each VACUUM.  The oldest XID might
come from a tuple's xmin, xmax, or xvac fields.  It might even come from
one of the table's remaining MultiXacts.

Final relfrozenxid values must still be >= FreezeLimit in an aggressive
VACUUM (FreezeLimit still acts as a lower bound on the final value that
aggressive VACUUM can set relfrozenxid to).  Since standard VACUUMs
still make no guarantees about advancing relfrozenxid, they might as
well set relfrozenxid to a value from well before FreezeLimit when the
opportunity presents itself.  In general standard VACUUMs may now set
relfrozenxid to any value > the original relfrozenxid and <= OldestXmin.

Credit for the general idea of using the oldest extant XID to set
pg_class.relfrozenxid at the end of VACUUM goes to Andres Freund.

Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Andres Freund <andres@anarazel.de>
Reviewed-By: Robert Haas <robertmhaas@gmail.com>
Discussion: https://postgr.es/m/CAH2-WzkymFbz6D_vL+jmqSn_5q1wsFvFrE+37yLgL_Rkfd6Gzg@mail.gmail.com
2022-04-03 09:57:21 -07:00
Peter Geoghegan
cf879d3069 Remove unnecessary heap_tuple_needs_freeze argument.
The buffer argument hasn't been used since the function was first added
by commit bbb6e559c4.  The sibling heap_prepare_freeze_tuple function
doesn't have such an argument either.  Remove it.
2022-02-24 18:31:07 -08:00
Bruce Momjian
27b77ecf9f Update copyright for 2022
Backpatch-through: 10
2022-01-07 19:04:57 -05:00
Amit Kapila
8e1fae1938 Move parallel vacuum code to vacuumparallel.c.
This commit moves parallel vacuum related code to a new file
commands/vacuumparallel.c so that any table AM supporting indexes can
utilize parallel vacuum in order to call index AM callbacks (ambulkdelete
and amvacuumcleanup) with parallel workers.

Another reason for this refactoring is that the parallel vacuum isn't
specific to heap so it doesn't make sense to keep this code in
heap/vacuumlazy.c.

Author: Masahiko Sawada, based on suggestion from Andres Freund
Reviewed-by: Hou Zhijie, Amit Kapila, Haiying Tang
Discussion: https://www.postgresql.org/message-id/20211030212101.ae3qcouatwmy7tbr%40alap3.anarazel.de
2021-12-23 11:42:52 +05:30