Commit dc7420c2 authored by Andres Freund

snapshot scalability: Don't compute global horizons while building snapshots.

To make GetSnapshotData() more scalable, it cannot look at each proc's
xmin: While snapshot contents do not need to change whenever a read-only
transaction commits or a snapshot is released, a proc's xmin is modified in
those cases. The frequency of xmin modifications leads to, particularly on
higher core count systems, many cache misses inside GetSnapshotData(), despite
the data underlying a snapshot not changing. That is the most
significant source of GetSnapshotData() scaling poorly on larger systems.

Without accessing xmins, GetSnapshotData() cannot calculate accurate horizons /
thresholds as it has so far. But we don't really have to: The horizons don't
actually change that much between GetSnapshotData() calls. Nor are the horizons
actually used every time a snapshot is built.

The trick this commit introduces is to delay the computation of accurate
horizons until they are actually needed, and to use horizon boundaries to
determine whether accurate horizons need to be computed.

The use of RecentGlobal[Data]Xmin to decide whether a row version could be
removed has been replaced with new GlobalVisTest* functions.  These use two
thresholds to determine whether a row can be pruned:
1) definitely_needed, indicating that rows deleted by XIDs >= definitely_needed
   are definitely still visible.
2) maybe_needed, indicating that rows deleted by XIDs < maybe_needed can
   definitely be removed.
GetSnapshotData() updates definitely_needed to be the xmin of the computed
snapshot.

When testing whether a row can be removed (with GlobalVisTestIsRemovableXid())
and the tested XID falls in between the two (i.e. XID >= maybe_needed && XID <
definitely_needed) the boundaries can be recomputed to be more accurate. As it
is not cheap to compute accurate boundaries, we limit the number of times that
happens in short succession.  As the boundaries used by
GlobalVisTestIsRemovableXid() are never reset (with maybe_needed updated by
GetSnapshotData()), it is likely that further tests can benefit from an earlier
computation of accurate horizons.
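The two-threshold decision procedure described above can be sketched in plain C. Everything below is illustrative: `SketchVisState`, `sketch_is_removable()`, and the caller-supplied accurate horizon are hypothetical stand-ins, not PostgreSQL's actual GlobalVisState API, which works on FullTransactionIds, computes horizons from shared ProcArray state, and rate-limits recomputation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical, simplified model of the two boundaries: plain uint32_t
 * comparisons stand in for the wraparound-safe XID arithmetic of the real
 * implementation.
 */
typedef struct SketchVisState
{
	uint32_t	maybe_needed;		/* deleters < this are removable */
	uint32_t	definitely_needed;	/* deleters >= this are still needed */
	int			recomputations;		/* times we paid for an accurate horizon */
} SketchVisState;

static void
sketch_update_horizons(SketchVisState *state, uint32_t accurate_horizon)
{
	/* stand-in for the expensive accurate-horizon computation */
	state->maybe_needed = accurate_horizon;
	state->recomputations++;
}

/*
 * Mirrors the shape of the test: answer cheaply when the XID falls outside
 * [maybe_needed, definitely_needed); only an XID between the two boundaries
 * forces an accurate horizon computation, after which we retest.
 */
static bool
sketch_is_removable(SketchVisState *state, uint32_t deleter_xid,
					uint32_t accurate_horizon)
{
	if (deleter_xid >= state->definitely_needed)
		return false;			/* definitely still visible to someone */
	if (deleter_xid < state->maybe_needed)
		return true;			/* definitely safe to remove */

	/* in between: recompute, then retest against the tightened boundary */
	sketch_update_horizons(state, accurate_horizon);
	return deleter_xid < state->maybe_needed;
}
```

Note how recomputation also tightens maybe_needed for all later calls, which is why tests following an accurate computation tend to be answered cheaply.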

To avoid regressing performance when old_snapshot_threshold is set (as that
requires an accurate horizon to be computed), heap_page_prune_opt() doesn't
unconditionally call TransactionIdLimitedForOldSnapshots() anymore. Both the
computation of the limited horizon and the triggering of errors (with
SetOldSnapshotThresholdTimestamp()) are now only done when necessary to
remove tuples.

This commit just removes the accesses to PGXACT->xmin from
GetSnapshotData(), but other members of PGXACT residing in the same
cache line are accessed. Therefore this in itself does not result in a
significant improvement. Subsequent commits will take advantage of the
fact that GetSnapshotData() now does not need to access xmins anymore.

Note: This contains a workaround in heap_page_prune_opt() to keep the
snapshot_too_old tests working. While that workaround is ugly, the tests
currently are not meaningful, and it seems best to address them separately.

Author: Andres Freund <andres@anarazel.de>
Reviewed-By: Robert Haas <robertmhaas@gmail.com>
Reviewed-By: Thomas Munro <thomas.munro@gmail.com>
Reviewed-By: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/20200301083601.ews6hz5dduc3w2se@alap3.anarazel.de
parent 1f42d35a
@@ -434,10 +434,10 @@ bt_check_every_level(Relation rel, Relation heaprel, bool heapkeyspace,
 				RelationGetRelationName(rel));

	/*
-	 * RecentGlobalXmin assertion matches index_getnext_tid(). See note on
-	 * RecentGlobalXmin/B-Tree page deletion.
+	 * This assertion matches the one in index_getnext_tid(). See page
+	 * recycling/"visible to everyone" notes in nbtree README.
	 */
-	Assert(TransactionIdIsValid(RecentGlobalXmin));
+	Assert(TransactionIdIsValid(RecentXmin));

	/*
	 * Initialize state for entire verification operation
@@ -1581,7 +1581,7 @@ bt_right_page_check_scankey(BtreeCheckState *state)
	 * does not occur until no possible index scan could land on the page.
	 * Index scans can follow links with nothing more than their snapshot as
	 * an interlock and be sure of at least that much. (See page
-	 * recycling/RecentGlobalXmin notes in nbtree README.)
+	 * recycling/"visible to everyone" notes in nbtree README.)
	 *
	 * Furthermore, it's okay if we follow a rightlink and find a half-dead or
	 * dead (ignorable) page one or more times. There will either be a
...
@@ -563,17 +563,14 @@ collect_corrupt_items(Oid relid, bool all_visible, bool all_frozen)
	BufferAccessStrategy bstrategy = GetAccessStrategy(BAS_BULKREAD);
	TransactionId OldestXmin = InvalidTransactionId;

-	if (all_visible)
-	{
-		/* Don't pass rel; that will fail in recovery. */
-		OldestXmin = GetOldestXmin(NULL, PROCARRAY_FLAGS_VACUUM);
-	}
-
	rel = relation_open(relid, AccessShareLock);

	/* Only some relkinds have a visibility map */
	check_relation_relkind(rel);

+	if (all_visible)
+		OldestXmin = GetOldestNonRemovableTransactionId(rel);
+
	nblocks = RelationGetNumberOfBlocks(rel);

	/*
@@ -679,11 +676,12 @@ collect_corrupt_items(Oid relid, bool all_visible, bool all_frozen)
					 * From a concurrency point of view, it sort of sucks to
					 * retake ProcArrayLock here while we're holding the buffer
					 * exclusively locked, but it should be safe against
-					 * deadlocks, because surely GetOldestXmin() should never take
-					 * a buffer lock. And this shouldn't happen often, so it's
-					 * worth being careful so as to avoid false positives.
+					 * deadlocks, because surely
+					 * GetOldestNonRemovableTransactionId() should never take a
+					 * buffer lock. And this shouldn't happen often, so it's worth
+					 * being careful so as to avoid false positives.
					 */
-					RecomputedOldestXmin = GetOldestXmin(NULL, PROCARRAY_FLAGS_VACUUM);
+					RecomputedOldestXmin = GetOldestNonRemovableTransactionId(rel);

					if (!TransactionIdPrecedes(OldestXmin, RecomputedOldestXmin))
						record_corrupt_item(items, &tuple.t_self);
...
@@ -71,7 +71,7 @@ statapprox_heap(Relation rel, output_type *stat)
	BufferAccessStrategy bstrategy;
	TransactionId OldestXmin;

-	OldestXmin = GetOldestXmin(rel, PROCARRAY_FLAGS_VACUUM);
+	OldestXmin = GetOldestNonRemovableTransactionId(rel);
	bstrategy = GetAccessStrategy(BAS_BULKREAD);

	nblocks = RelationGetNumberOfBlocks(rel);
...
@@ -793,3 +793,29 @@ ginvacuumcleanup(IndexVacuumInfo *info, IndexBulkDeleteResult *stats)

	return stats;
 }
+
+/*
+ * Return whether Page can safely be recycled.
+ */
+bool
+GinPageIsRecyclable(Page page)
+{
+	TransactionId delete_xid;
+
+	if (PageIsNew(page))
+		return true;
+
+	if (!GinPageIsDeleted(page))
+		return false;
+
+	delete_xid = GinPageGetDeleteXid(page);
+
+	if (!TransactionIdIsValid(delete_xid))
+		return true;
+
+	/*
+	 * If no backend still could view delete_xid as in running, all scans
+	 * concurrent with ginDeletePage() must have finished.
+	 */
+	return GlobalVisCheckRemovableXid(NULL, delete_xid);
+}
@@ -891,15 +891,13 @@ gistPageRecyclable(Page page)
		 * As long as that can happen, we must keep the deleted page around as
		 * a tombstone.
		 *
-		 * Compare the deletion XID with RecentGlobalXmin. If deleteXid <
-		 * RecentGlobalXmin, then no scan that's still in progress could have
+		 * For that check if the deletion XID could still be visible to
+		 * anyone. If not, then no scan that's still in progress could have
		 * seen its downlink, and we can recycle it.
		 */
		FullTransactionId deletexid_full = GistPageGetDeleteXid(page);
-		FullTransactionId recentxmin_full = GetFullRecentGlobalXmin();

-		if (FullTransactionIdPrecedes(deletexid_full, recentxmin_full))
-			return true;
+		return GlobalVisIsRemovableFullXid(NULL, deletexid_full);
	}

	return false;
 }
...
@@ -387,11 +387,11 @@ gistRedoPageReuse(XLogReaderState *record)
	 * PAGE_REUSE records exist to provide a conflict point when we reuse
	 * pages in the index via the FSM. That's all they do though.
	 *
-	 * latestRemovedXid was the page's deleteXid. The deleteXid <
-	 * RecentGlobalXmin test in gistPageRecyclable() conceptually mirrors the
-	 * pgxact->xmin > limitXmin test in GetConflictingVirtualXIDs().
-	 * Consequently, one XID value achieves the same exclusion effect on
-	 * primary and standby.
+	 * latestRemovedXid was the page's deleteXid. The
+	 * GlobalVisIsRemovableFullXid(deleteXid) test in gistPageRecyclable()
+	 * conceptually mirrors the pgxact->xmin > limitXmin test in
+	 * GetConflictingVirtualXIDs(). Consequently, one XID value achieves the
+	 * same exclusion effect on primary and standby.
	 */
	if (InHotStandby)
	{
...
@@ -1517,6 +1517,7 @@ heap_hot_search_buffer(ItemPointer tid, Relation relation, Buffer buffer,
	bool		at_chain_start;
	bool		valid;
	bool		skip;
+	GlobalVisState *vistest = NULL;

	/* If this is not the first call, previous call returned a (live!) tuple */
	if (all_dead)
@@ -1527,7 +1528,8 @@ heap_hot_search_buffer(ItemPointer tid, Relation relation, Buffer buffer,
	at_chain_start = first_call;
	skip = !first_call;

-	Assert(TransactionIdIsValid(RecentGlobalXmin));
+	/* XXX: we should assert that a snapshot is pushed or registered */
+	Assert(TransactionIdIsValid(RecentXmin));
	Assert(BufferGetBlockNumber(buffer) == blkno);

	/* Scan through possible multiple members of HOT-chain */
@@ -1616,9 +1618,14 @@ heap_hot_search_buffer(ItemPointer tid, Relation relation, Buffer buffer,
		 * Note: if you change the criterion here for what is "dead", fix the
		 * planner's get_actual_variable_range() function to match.
		 */
-		if (all_dead && *all_dead &&
-			!HeapTupleIsSurelyDead(heapTuple, RecentGlobalXmin))
-			*all_dead = false;
+		if (all_dead && *all_dead)
+		{
+			if (!vistest)
+				vistest = GlobalVisTestFor(relation);
+
+			if (!HeapTupleIsSurelyDead(heapTuple, vistest))
+				*all_dead = false;
+		}

		/*
		 * Check to see if HOT chain continues past this tuple; if so fetch
...
@@ -1203,7 +1203,7 @@ heapam_index_build_range_scan(Relation heapRelation,

		/* okay to ignore lazy VACUUMs here */
		if (!IsBootstrapProcessingMode() && !indexInfo->ii_Concurrent)
-			OldestXmin = GetOldestXmin(heapRelation, PROCARRAY_FLAGS_VACUUM);
+			OldestXmin = GetOldestNonRemovableTransactionId(heapRelation);

	if (!scan)
	{
@@ -1244,6 +1244,17 @@ heapam_index_build_range_scan(Relation heapRelation,

	hscan = (HeapScanDesc) scan;

+	/*
+	 * Must have called GetOldestNonRemovableTransactionId() if using
+	 * SnapshotAny. Shouldn't have for an MVCC snapshot. (It's especially
+	 * worth checking this for parallel builds, since ambuild routines that
+	 * support parallel builds must work these details out for themselves.)
+	 */
+	Assert(snapshot == SnapshotAny || IsMVCCSnapshot(snapshot));
+	Assert(snapshot == SnapshotAny ? TransactionIdIsValid(OldestXmin) :
+		   !TransactionIdIsValid(OldestXmin));
+	Assert(snapshot == SnapshotAny || !anyvisible);
+
	/* Publish number of blocks to scan */
	if (progress)
	{
@@ -1263,17 +1274,6 @@ heapam_index_build_range_scan(Relation heapRelation,
							 nblocks);
	}

-	/*
-	 * Must call GetOldestXmin() with SnapshotAny. Should never call
-	 * GetOldestXmin() with MVCC snapshot. (It's especially worth checking
-	 * this for parallel builds, since ambuild routines that support parallel
-	 * builds must work these details out for themselves.)
-	 */
-	Assert(snapshot == SnapshotAny || IsMVCCSnapshot(snapshot));
-	Assert(snapshot == SnapshotAny ? TransactionIdIsValid(OldestXmin) :
-		   !TransactionIdIsValid(OldestXmin));
-	Assert(snapshot == SnapshotAny || !anyvisible);
-
	/* set our scan endpoints */
	if (!allow_sync)
		heap_setscanlimits(scan, start_blockno, numblocks);
...
@@ -1154,19 +1154,56 @@ HeapTupleSatisfiesMVCC(HeapTuple htup, Snapshot snapshot,
 * we mainly want to know is if a tuple is potentially visible to *any*
 * running transaction. If so, it can't be removed yet by VACUUM.
 *
- * OldestXmin is a cutoff XID (obtained from GetOldestXmin()). Tuples
- * deleted by XIDs >= OldestXmin are deemed "recently dead"; they might
- * still be visible to some open transaction, so we can't remove them,
- * even if we see that the deleting transaction has committed.
+ * OldestXmin is a cutoff XID (obtained from
+ * GetOldestNonRemovableTransactionId()). Tuples deleted by XIDs >=
+ * OldestXmin are deemed "recently dead"; they might still be visible to some
+ * open transaction, so we can't remove them, even if we see that the deleting
+ * transaction has committed.
 */
 HTSV_Result
 HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
						 Buffer buffer)
+{
+	TransactionId dead_after = InvalidTransactionId;
+	HTSV_Result res;
+
+	res = HeapTupleSatisfiesVacuumHorizon(htup, buffer, &dead_after);
+
+	if (res == HEAPTUPLE_RECENTLY_DEAD)
+	{
+		Assert(TransactionIdIsValid(dead_after));
+
+		if (TransactionIdPrecedes(dead_after, OldestXmin))
+			res = HEAPTUPLE_DEAD;
+	}
+	else
+		Assert(!TransactionIdIsValid(dead_after));
+
+	return res;
+}
+
+/*
+ * Work horse for HeapTupleSatisfiesVacuum and similar routines.
+ *
+ * In contrast to HeapTupleSatisfiesVacuum this routine, when encountering a
+ * tuple that could still be visible to some backend, stores the xid that
+ * needs to be compared with the horizon in *dead_after, and returns
+ * HEAPTUPLE_RECENTLY_DEAD. The caller then can perform the comparison with
+ * the horizon. This is e.g. useful when comparing with different horizons.
+ *
+ * Note: HEAPTUPLE_DEAD can still be returned here, e.g. if the inserting
+ * transaction aborted.
+ */
+HTSV_Result
+HeapTupleSatisfiesVacuumHorizon(HeapTuple htup, Buffer buffer, TransactionId *dead_after)
 {
	HeapTupleHeader tuple = htup->t_data;

	Assert(ItemPointerIsValid(&htup->t_self));
	Assert(htup->t_tableOid != InvalidOid);
+	Assert(dead_after != NULL);
+
+	*dead_after = InvalidTransactionId;

	/*
	 * Has inserting transaction committed?
@@ -1323,17 +1360,15 @@ HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
		else if (TransactionIdDidCommit(xmax))
		{
			/*
-			 * The multixact might still be running due to lockers. If the
-			 * updater is below the xid horizon, we have to return DEAD
-			 * regardless -- otherwise we could end up with a tuple where the
-			 * updater has to be removed due to the horizon, but is not pruned
-			 * away. It's not a problem to prune that tuple, because any
-			 * remaining lockers will also be present in newer tuple versions.
+			 * The multixact might still be running due to lockers. Need to
+			 * allow for pruning if below the xid horizon regardless --
+			 * otherwise we could end up with a tuple where the updater has to
+			 * be removed due to the horizon, but is not pruned away. It's
+			 * not a problem to prune that tuple, because any remaining
+			 * lockers will also be present in newer tuple versions.
			 */
-			if (!TransactionIdPrecedes(xmax, OldestXmin))
-				return HEAPTUPLE_RECENTLY_DEAD;
-			return HEAPTUPLE_DEAD;
+			*dead_after = xmax;
+			return HEAPTUPLE_RECENTLY_DEAD;
		}
		else if (!MultiXactIdIsRunning(HeapTupleHeaderGetRawXmax(tuple), false))
		{
@@ -1372,14 +1407,11 @@ HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
	}

	/*
-	 * Deleter committed, but perhaps it was recent enough that some open
-	 * transactions could still see the tuple.
+	 * Deleter committed, allow caller to check if it was recent enough that
+	 * some open transactions could still see the tuple.
	 */
-	if (!TransactionIdPrecedes(HeapTupleHeaderGetRawXmax(tuple), OldestXmin))
-		return HEAPTUPLE_RECENTLY_DEAD;
-
-	/* Otherwise, it's dead and removable */
-	return HEAPTUPLE_DEAD;
+	*dead_after = HeapTupleHeaderGetRawXmax(tuple);
+	return HEAPTUPLE_RECENTLY_DEAD;
 }
@@ -1393,14 +1425,28 @@ HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
 *
 * This is an interface to HeapTupleSatisfiesVacuum that's callable via
 * HeapTupleSatisfiesSnapshot, so it can be used through a Snapshot.
- * snapshot->xmin must have been set up with the xmin horizon to use.
+ * snapshot->vistest must have been set up with the horizon to use.
 */
 static bool
 HeapTupleSatisfiesNonVacuumable(HeapTuple htup, Snapshot snapshot,
								Buffer buffer)
 {
-	return HeapTupleSatisfiesVacuum(htup, snapshot->xmin, buffer)
-		!= HEAPTUPLE_DEAD;
+	TransactionId dead_after = InvalidTransactionId;
+	HTSV_Result res;
+
+	res = HeapTupleSatisfiesVacuumHorizon(htup, buffer, &dead_after);
+
+	if (res == HEAPTUPLE_RECENTLY_DEAD)
+	{
+		Assert(TransactionIdIsValid(dead_after));
+
+		if (GlobalVisTestIsRemovableXid(snapshot->vistest, dead_after))
+			res = HEAPTUPLE_DEAD;
+	}
+	else
+		Assert(!TransactionIdIsValid(dead_after));
+
+	return res != HEAPTUPLE_DEAD;
 }
@@ -1418,7 +1464,7 @@ HeapTupleSatisfiesNonVacuumable(HeapTuple htup, Snapshot snapshot,
 * if the tuple is removable.
 */
 bool
-HeapTupleIsSurelyDead(HeapTuple htup, TransactionId OldestXmin)
+HeapTupleIsSurelyDead(HeapTuple htup, GlobalVisState *vistest)
 {
	HeapTupleHeader tuple = htup->t_data;
@@ -1459,7 +1505,8 @@ HeapTupleIsSurelyDead(HeapTuple htup, TransactionId OldestXmin)
		return false;

	/* Deleter committed, so tuple is dead if the XID is old enough. */
-	return TransactionIdPrecedes(HeapTupleHeaderGetRawXmax(tuple), OldestXmin);
+	return GlobalVisTestIsRemovableXid(vistest,
+									   HeapTupleHeaderGetRawXmax(tuple));
 }

 /*
...
@@ -788,6 +788,7 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
		PROGRESS_VACUUM_MAX_DEAD_TUPLES
	};
	int64		initprog_val[3];
+	GlobalVisState *vistest;

	pg_rusage_init(&ru0);
@@ -816,6 +817,8 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
	vacrelstats->nonempty_pages = 0;
	vacrelstats->latestRemovedXid = InvalidTransactionId;

+	vistest = GlobalVisTestFor(onerel);
+
	/*
	 * Initialize state for a parallel vacuum. As of now, only one worker can
	 * be used for an index, so we invoke parallelism only if there are at
@@ -1239,7 +1242,8 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
		 *
		 * We count tuples removed by the pruning step as removed by VACUUM.
		 */
-		tups_vacuumed += heap_page_prune(onerel, buf, OldestXmin, false,
+		tups_vacuumed += heap_page_prune(onerel, buf, vistest, false,
+										 InvalidTransactionId, 0,
										 &vacrelstats->latestRemovedXid);

		/*
@@ -1596,14 +1600,16 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
			}

			/*
-			 * It's possible for the value returned by GetOldestXmin() to move
-			 * backwards, so it's not wrong for us to see tuples that appear to
-			 * not be visible to everyone yet, while PD_ALL_VISIBLE is already
-			 * set. The real safe xmin value never moves backwards, but
-			 * GetOldestXmin() is conservative and sometimes returns a value
-			 * that's unnecessarily small, so if we see that contradiction it just
-			 * means that the tuples that we think are not visible to everyone yet
-			 * actually are, and the PD_ALL_VISIBLE flag is correct.
+			 * It's possible for the value returned by
+			 * GetOldestNonRemovableTransactionId() to move backwards, so it's
+			 * not wrong for us to see tuples that appear to not be visible to
+			 * everyone yet, while PD_ALL_VISIBLE is already set. The real safe
+			 * xmin value never moves backwards, but
+			 * GetOldestNonRemovableTransactionId() is conservative and
+			 * sometimes returns a value that's unnecessarily small, so if we
+			 * see that contradiction it just means that the tuples that we
+			 * think are not visible to everyone yet actually are, and the
+			 * PD_ALL_VISIBLE flag is correct.
			 *
			 * There should never be dead tuples on a page with PD_ALL_VISIBLE
			 * set, however.
...
@@ -519,7 +519,8 @@ index_getnext_tid(IndexScanDesc scan, ScanDirection direction)
	SCAN_CHECKS;
	CHECK_SCAN_PROCEDURE(amgettuple);

-	Assert(TransactionIdIsValid(RecentGlobalXmin));
+	/* XXX: we should assert that a snapshot is pushed or registered */
+	Assert(TransactionIdIsValid(RecentXmin));

	/*
	 * The AM's amgettuple proc finds the next index entry matching the scan
...
@@ -342,9 +342,9 @@ snapshots and registered snapshots as of the deletion are gone; which is
 overly strong, but is simple to implement within Postgres. When marked
 dead, a deleted page is labeled with the next-transaction counter value.
 VACUUM can reclaim the page for re-use when this transaction number is
-older than RecentGlobalXmin. As collateral damage, this implementation
-also waits for running XIDs with no snapshots and for snapshots taken
-until the next transaction to allocate an XID commits.
+guaranteed to be "visible to everyone". As collateral damage, this
+implementation also waits for running XIDs with no snapshots and for
+snapshots taken until the next transaction to allocate an XID commits.

 Reclaiming a page doesn't actually change its state on disk --- we simply
 record it in the shared-memory free space map, from which it will be
@@ -411,8 +411,8 @@ page and also the correct place to hold the current value. We can avoid
 the cost of walking down the tree in such common cases.

 The optimization works on the assumption that there can only be one
-non-ignorable leaf rightmost page, and so even a RecentGlobalXmin style
-interlock isn't required. We cannot fail to detect that our hint was
+non-ignorable leaf rightmost page, and so not even a visible-to-everyone
+style interlock is required. We cannot fail to detect that our hint was
 invalidated, because there can only be one such page in the B-Tree at
 any time. It's possible that the page will be deleted and recycled
 without a backend's cached page also being detected as invalidated, but
...
@@ -1097,7 +1097,7 @@ _bt_page_recyclable(Page page)
	 */
	opaque = (BTPageOpaque) PageGetSpecialPointer(page);
	if (P_ISDELETED(opaque) &&
-		TransactionIdPrecedes(opaque->btpo.xact, RecentGlobalXmin))
+		GlobalVisCheckRemovableXid(NULL, opaque->btpo.xact))
		return true;
	return false;
 }
@@ -2318,7 +2318,7 @@ _bt_unlink_halfdead_page(Relation rel, Buffer leafbuf, BlockNumber scanblkno,
	 * updated links to the target, ReadNewTransactionId() suffices as an
	 * upper bound. Any scan having retained a now-stale link is advertising
	 * in its PGXACT an xmin less than or equal to the value we read here. It
-	 * will continue to do so, holding back RecentGlobalXmin, for the duration
+	 * will continue to do so, holding back the xmin horizon, for the duration
	 * of that scan.
	 */
	page = BufferGetPage(buf);
...
@@ -808,6 +808,12 @@ _bt_vacuum_needs_cleanup(IndexVacuumInfo *info)
	metapg = BufferGetPage(metabuf);
	metad = BTPageGetMeta(metapg);

+	/*
+	 * XXX: If IndexVacuumInfo contained the heap relation, we could be more
+	 * aggressive about vacuuming non catalog relations by passing the table
+	 * to GlobalVisCheckRemovableXid().
+	 */
+
	if (metad->btm_version < BTREE_NOVAC_VERSION)
	{
		/*
@@ -817,13 +823,12 @@ _bt_vacuum_needs_cleanup(IndexVacuumInfo *info)
		result = true;
	}
	else if (TransactionIdIsValid(metad->btm_oldest_btpo_xact) &&
-			 TransactionIdPrecedes(metad->btm_oldest_btpo_xact,
-								   RecentGlobalXmin))
+			 GlobalVisCheckRemovableXid(NULL, metad->btm_oldest_btpo_xact))
	{
		/*
		 * If any oldest btpo.xact from a previously deleted page in the index
-		 * is older than RecentGlobalXmin, then at least one deleted page can
-		 * be recycled -- don't skip cleanup.
+		 * is visible to everyone, then at least one deleted page can be
+		 * recycled -- don't skip cleanup.
		 */
		result = true;
	}
@@ -1276,14 +1281,13 @@ backtrack:
				 * own conflict now.)
				 *
				 * Backends with snapshots acquired after a VACUUM starts but
-				 * before it finishes could have a RecentGlobalXmin with a
-				 * later xid than the VACUUM's OldestXmin cutoff. These
-				 * backends might happen to opportunistically mark some index
-				 * tuples LP_DEAD before we reach them, even though they may
-				 * be after our cutoff. We don't try to kill these "extra"
-				 * index tuples in _bt_delitems_vacuum(). This keep things
-				 * simple, and allows us to always avoid generating our own
-				 * conflicts.
+				 * before it finishes could have visibility cutoff with a
+				 * later xid than VACUUM's OldestXmin cutoff. These backends
+				 * might happen to opportunistically mark some index tuples
+				 * LP_DEAD before we reach them, even though they may be after
+				 * our cutoff. We don't try to kill these "extra" index
+				 * tuples in _bt_delitems_vacuum(). This keep things simple,
+				 * and allows us to always avoid generating our own conflicts.
				 */
				Assert(!BTreeTupleIsPivot(itup));
				if (!BTreeTupleIsPosting(itup))
...
@@ -948,11 +948,11 @@ btree_xlog_reuse_page(XLogReaderState *record)
	 * Btree reuse_page records exist to provide a conflict point when we
	 * reuse pages in the index via the FSM. That's all they do though.
	 *
-	 * latestRemovedXid was the page's btpo.xact. The btpo.xact <
-	 * RecentGlobalXmin test in _bt_page_recyclable() conceptually mirrors the
-	 * pgxact->xmin > limitXmin test in GetConflictingVirtualXIDs().
-	 * Consequently, one XID value achieves the same exclusion effect on
-	 * primary and standby.
+	 * latestRemovedXid was the page's btpo.xact. The
+	 * GlobalVisCheckRemovableXid test in _bt_page_recyclable() conceptually
+	 * mirrors the pgxact->xmin > limitXmin test in
+	 * GetConflictingVirtualXIDs(). Consequently, one XID value achieves the
+	 * same exclusion effect on primary and standby.
	 */
	if (InHotStandby)
	{
...
@@ -501,10 +501,14 @@ vacuumRedirectAndPlaceholder(Relation index, Buffer buffer)
 	OffsetNumber itemToPlaceholder[MaxIndexTuplesPerPage];
 	OffsetNumber itemnos[MaxIndexTuplesPerPage];
 	spgxlogVacuumRedirect xlrec;
+	GlobalVisState *vistest;
 
 	xlrec.nToPlaceholder = 0;
 	xlrec.newestRedirectXid = InvalidTransactionId;
 
+	/* XXX: providing heap relation would allow more pruning */
+	vistest = GlobalVisTestFor(NULL);
+
 	START_CRIT_SECTION();
 
 	/*
@@ -521,7 +525,7 @@ vacuumRedirectAndPlaceholder(Relation index, Buffer buffer)
 		dt = (SpGistDeadTuple) PageGetItem(page, PageGetItemId(page, i));
 
 		if (dt->tupstate == SPGIST_REDIRECT &&
-			TransactionIdPrecedes(dt->xid, RecentGlobalXmin))
+			GlobalVisTestIsRemovableXid(vistest, dt->xid))
 		{
 			dt->tupstate = SPGIST_PLACEHOLDER;
 			Assert(opaque->nRedirection > 0);
...
@@ -281,7 +281,7 @@
 present or the overflow flag is set.)  If a backend released XidGenLock
 before storing its XID into MyPgXact, then it would be possible for another
 backend to allocate and commit a later XID, causing latestCompletedXid to
 pass the first backend's XID, before that value became visible in the
-ProcArray.  That would break GetOldestXmin, as discussed below.
+ProcArray.  That would break ComputeXidHorizons, as discussed below.
 
 We allow GetNewTransactionId to store the XID into MyPgXact->xid (or the
 subxid array) without taking ProcArrayLock.  This was once necessary to
@@ -293,42 +293,50 @@
 once, rather than assume they can read it multiple times and get the same
 answer each time.  (Use volatile-qualified pointers when doing this, to
 ensure that the C compiler does exactly what you tell it to.)
 
-Another important activity that uses the shared ProcArray is GetOldestXmin,
-which must determine a lower bound for the oldest xmin of any active MVCC
-snapshot, system-wide.  Each individual backend advertises the smallest
-xmin of its own snapshots in MyPgXact->xmin, or zero if it currently has no
-live snapshots (eg, if it's between transactions or hasn't yet set a
-snapshot for a new transaction).  GetOldestXmin takes the MIN() of the
-valid xmin fields.  It does this with only shared lock on ProcArrayLock,
-which means there is a potential race condition against other backends
-doing GetSnapshotData concurrently: we must be certain that a concurrent
-backend that is about to set its xmin does not compute an xmin less than
-what GetOldestXmin returns.  We ensure that by including all the active
-XIDs into the MIN() calculation, along with the valid xmins.  The rule that
-transactions can't exit without taking exclusive ProcArrayLock ensures that
-concurrent holders of shared ProcArrayLock will compute the same minimum of
-currently-active XIDs: no xact, in particular not the oldest, can exit
-while we hold shared ProcArrayLock.  So GetOldestXmin's view of the minimum
-active XID will be the same as that of any concurrent GetSnapshotData, and
-so it can't produce an overestimate.  If there is no active transaction at
-all, GetOldestXmin returns latestCompletedXid + 1, which is a lower bound
-for the xmin that might be computed by concurrent or later GetSnapshotData
-calls.  (We know that no XID less than this could be about to appear in
-the ProcArray, because of the XidGenLock interlock discussed above.)
-
-GetSnapshotData also performs an oldest-xmin calculation (which had better
-match GetOldestXmin's) and stores that into RecentGlobalXmin, which is used
-for some tuple age cutoff checks where a fresh call of GetOldestXmin seems
-too expensive.  Note that while it is certain that two concurrent
-executions of GetSnapshotData will compute the same xmin for their own
-snapshots, as argued above, it is not certain that they will arrive at the
-same estimate of RecentGlobalXmin.  This is because we allow XID-less
-transactions to clear their MyPgXact->xmin asynchronously (without taking
-ProcArrayLock), so one execution might see what had been the oldest xmin,
-and another not.  This is OK since RecentGlobalXmin need only be a valid
-lower bound.  As noted above, we are already assuming that fetch/store
-of the xid fields is atomic, so assuming it for xmin as well is no extra
-risk.
+Another important activity that uses the shared ProcArray is
+ComputeXidHorizons, which must determine a lower bound for the oldest xmin
+of any active MVCC snapshot, system-wide.  Each individual backend
+advertises the smallest xmin of its own snapshots in MyPgXact->xmin, or zero
+if it currently has no live snapshots (eg, if it's between transactions or
+hasn't yet set a snapshot for a new transaction).  ComputeXidHorizons takes
+the MIN() of the valid xmin fields.  It does this with only shared lock on
+ProcArrayLock, which means there is a potential race condition against other
+backends doing GetSnapshotData concurrently: we must be certain that a
+concurrent backend that is about to set its xmin does not compute an xmin
+less than what ComputeXidHorizons determines.  We ensure that by including
+all the active XIDs into the MIN() calculation, along with the valid xmins.
+The rule that transactions can't exit without taking exclusive ProcArrayLock
+ensures that concurrent holders of shared ProcArrayLock will compute the
+same minimum of currently-active XIDs: no xact, in particular not the
+oldest, can exit while we hold shared ProcArrayLock.  So
+ComputeXidHorizons's view of the minimum active XID will be the same as that
+of any concurrent GetSnapshotData, and so it can't produce an overestimate.
+If there is no active transaction at all, ComputeXidHorizons uses
+latestCompletedXid + 1, which is a lower bound for the xmin that might
+be computed by concurrent or later GetSnapshotData calls.  (We know that no
+XID less than this could be about to appear in the ProcArray, because of the
+XidGenLock interlock discussed above.)
+
+As GetSnapshotData is performance critical, it does not perform an accurate
+oldest-xmin calculation (it used to, until v13).  The contents of a snapshot
+only depend on the xids of other backends, not their xmin.  As a backend's
+xmin changes much more often than its xid, having GetSnapshotData look at
+xmins can lead to a lot of unnecessary cacheline ping-pong.  Instead
+GetSnapshotData updates approximate thresholds (one that guarantees that all
+deleted rows older than it can be removed, another determining that deleted
+rows newer than it can not be removed).  GlobalVisTest* uses those
+thresholds to make invisibility decisions, falling back to
+ComputeXidHorizons if necessary.
+
+Note that while it is certain that two concurrent executions of
+GetSnapshotData will compute the same xmin for their own snapshots, there is
+no such guarantee for the horizons computed by ComputeXidHorizons.  This is
+because we allow XID-less transactions to clear their MyPgXact->xmin
+asynchronously (without taking ProcArrayLock), so one execution might see
+what had been the oldest xmin, and another not.  This is OK since the
+thresholds need only be a valid lower bound.  As noted above, we are already
+assuming that fetch/store of the xid fields is atomic, so assuming it for
+xmin as well is no extra risk.
 
 pg_xact and pg_subtrans
...
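The README's two-threshold scheme can be sketched as a standalone C model. This is an illustration only: MiniVisState, mini_is_removable and accurate_horizon are invented stand-ins (the real code uses GlobalVisState, GlobalVisTestIsRemovableXid and ComputeXidHorizons, handles 32-bit XID wraparound, sets definitely_needed from the snapshot's xmin rather than collapsing both boundaries, and rate-limits recomputation):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Miniature stand-in for GlobalVisState.  Plain 64-bit "xids", so no
 * wraparound handling is needed in this sketch.
 */
typedef struct MiniVisState
{
    uint64_t definitely_needed; /* rows deleted by xid >= this are kept */
    uint64_t maybe_needed;      /* rows deleted by xid < this are removable */
} MiniVisState;

/*
 * Stand-in for ComputeXidHorizons(): PostgreSQL scans the proc array for
 * the accurate horizon; here the caller just sets this variable.
 */
static uint64_t accurate_horizon;

static void
mini_update_horizons(MiniVisState *vs)
{
    /* collapse both boundaries to the accurate value (a simplification) */
    vs->maybe_needed = accurate_horizon;
    vs->definitely_needed = accurate_horizon;
}

/*
 * Can a row version deleted by "xid" be removed?  Only when xid falls
 * between the two cached boundaries do we pay for a recomputation of the
 * accurate horizon.
 */
static bool
mini_is_removable(MiniVisState *vs, uint64_t xid)
{
    if (xid < vs->maybe_needed)
        return true;            /* removable without recomputation */
    if (xid >= vs->definitely_needed)
        return false;           /* definitely still needed */

    /* in between: refresh the boundaries, then decide accurately */
    mini_update_horizons(vs);
    return xid < vs->maybe_needed;
}
```

The cheap comparisons handle the common cases; only XIDs inside the uncertainty window force the expensive recomputation.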
@@ -9096,7 +9096,7 @@ CreateCheckPoint(int flags)
 	 * StartupSUBTRANS hasn't been called yet.
 	 */
 	if (!RecoveryInProgress())
-		TruncateSUBTRANS(GetOldestXmin(NULL, PROCARRAY_FLAGS_DEFAULT));
+		TruncateSUBTRANS(GetOldestTransactionIdConsideredRunning());
 
 	/* Real work is done, but log and update stats before releasing lock. */
 	LogCheckpointEnd(false);
@@ -9456,7 +9456,7 @@ CreateRestartPoint(int flags)
 	 * this because StartupSUBTRANS hasn't been called yet.
 	 */
 	if (EnableHotStandby)
-		TruncateSUBTRANS(GetOldestXmin(NULL, PROCARRAY_FLAGS_DEFAULT));
+		TruncateSUBTRANS(GetOldestTransactionIdConsideredRunning());
 
 	/* Real work is done, but log and update before releasing lock. */
 	LogCheckpointEnd(true);
...
@@ -1045,7 +1045,7 @@ acquire_sample_rows(Relation onerel, int elevel,
 	totalblocks = RelationGetNumberOfBlocks(onerel);
 
 	/* Need a cutoff xmin for HeapTupleSatisfiesVacuum */
-	OldestXmin = GetOldestXmin(onerel, PROCARRAY_FLAGS_VACUUM);
+	OldestXmin = GetOldestNonRemovableTransactionId(onerel);
 
 	/* Prepare for sampling block numbers */
 	nblocks = BlockSampler_Init(&bs, totalblocks, targrows, random());
...
@@ -955,8 +955,25 @@ vacuum_set_xid_limits(Relation rel,
 	 * working on a particular table at any time, and that each vacuum is
 	 * always an independent transaction.
 	 */
-	*oldestXmin =
-		TransactionIdLimitedForOldSnapshots(GetOldestXmin(rel, PROCARRAY_FLAGS_VACUUM), rel);
+	*oldestXmin = GetOldestNonRemovableTransactionId(rel);
+
+	if (OldSnapshotThresholdActive())
+	{
+		TransactionId limit_xmin;
+		TimestampTz limit_ts;
+
+		if (TransactionIdLimitedForOldSnapshots(*oldestXmin, rel, &limit_xmin, &limit_ts))
+		{
+			/*
+			 * TODO: We should only set the threshold if we are pruning on the
+			 * basis of the increased limits.  Not as crucial here as it is
+			 * for opportunistic pruning (which often happens at a much higher
+			 * frequency), but would still be a significant improvement.
+			 */
+			SetOldSnapshotThresholdTimestamp(limit_ts, limit_xmin);
+			*oldestXmin = limit_xmin;
+		}
+	}
 
 	Assert(TransactionIdIsNormal(*oldestXmin));
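The restructured vacuum_set_xid_limits() logic amounts to: start from the conservative horizon and, only when old_snapshot_threshold is active and actually limits anything, advance the cutoff (recording the threshold timestamp as a side effect). A toy model of that control flow, with invented names (compute_vacuum_cutoff is not a PostgreSQL function) and plain 64-bit xids:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Toy model of the new vacuum_set_xid_limits() flow: the cutoff starts at
 * the non-removable horizon; old_snapshot_threshold may advance it to
 * limit_xmin, allowing more aggressive cleanup at the cost of "snapshot
 * too old" errors for sufficiently old snapshots.
 */
static uint64_t
compute_vacuum_cutoff(uint64_t oldest_nonremovable,
                      bool threshold_active,
                      bool limited,
                      uint64_t limit_xmin)
{
    uint64_t cutoff = oldest_nonremovable;

    if (threshold_active && limited)
    {
        /* the real code also calls SetOldSnapshotThresholdTimestamp() here */
        cutoff = limit_xmin;
    }

    return cutoff;
}
```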
@@ -1345,12 +1362,13 @@ vac_update_datfrozenxid(void)
 	bool		dirty = false;
 
 	/*
-	 * Initialize the "min" calculation with GetOldestXmin, which is a
-	 * reasonable approximation to the minimum relfrozenxid for not-yet-
-	 * committed pg_class entries for new tables; see AddNewRelationTuple().
-	 * So we cannot produce a wrong minimum by starting with this.
+	 * Initialize the "min" calculation with
+	 * GetOldestNonRemovableTransactionId(), which is a reasonable
+	 * approximation to the minimum relfrozenxid for not-yet-committed
+	 * pg_class entries for new tables; see AddNewRelationTuple().  So we
+	 * cannot produce a wrong minimum by starting with this.
 	 */
-	newFrozenXid = GetOldestXmin(NULL, PROCARRAY_FLAGS_VACUUM);
+	newFrozenXid = GetOldestNonRemovableTransactionId(NULL);
 
 	/*
 	 * Similarly, initialize the MultiXact "min" with the value that would be
@@ -1681,8 +1699,9 @@ vacuum_rel(Oid relid, RangeVar *relation, VacuumParams *params)
 	StartTransactionCommand();
 
 	/*
-	 * Functions in indexes may want a snapshot set.  Also, setting a snapshot
-	 * ensures that RecentGlobalXmin is kept truly recent.
+	 * Need to acquire a snapshot to prevent pg_subtrans from being truncated,
+	 * cutoff xids in local memory wrapping around, and to have updated xmin
+	 * horizons.
 	 */
 	PushActiveSnapshot(GetTransactionSnapshot());
@@ -1705,8 +1724,8 @@ vacuum_rel(Oid relid, RangeVar *relation, VacuumParams *params)
 	 *
 	 * Note: these flags remain set until CommitTransaction or
 	 * AbortTransaction.  We don't want to clear them until we reset
-	 * MyPgXact->xid/xmin, else OldestXmin might appear to go backwards,
-	 * which is probably Not Good.
+	 * MyPgXact->xid/xmin, otherwise GetOldestNonRemovableTransactionId()
+	 * might appear to go backwards, which is probably Not Good.
 	 */
 	LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
 	MyPgXact->vacuumFlags |= PROC_IN_VACUUM;
...
@@ -1877,6 +1877,10 @@ get_database_list(void)
 	 * the secondary effect that it sets RecentGlobalXmin.  (This is critical
 	 * for anything that reads heap pages, because HOT may decide to prune
 	 * them even if the process doesn't attempt to modify any tuples.)
+	 *
+	 * FIXME: This comment is inaccurate / the code buggy.  A snapshot that is
+	 * not pushed/active does not reliably prevent HOT pruning (->xmin could
+	 * e.g. be cleared when cache invalidations are processed).
 	 */
 	StartTransactionCommand();
 	(void) GetTransactionSnapshot();
...
@@ -122,6 +122,10 @@ get_subscription_list(void)
 	 * the secondary effect that it sets RecentGlobalXmin.  (This is critical
 	 * for anything that reads heap pages, because HOT may decide to prune
 	 * them even if the process doesn't attempt to modify any tuples.)
+	 *
+	 * FIXME: This comment is inaccurate / the code buggy.  A snapshot that is
+	 * not pushed/active does not reliably prevent HOT pruning (->xmin could
+	 * e.g. be cleared when cache invalidations are processed).
 	 */
 	StartTransactionCommand();
 	(void) GetTransactionSnapshot();
...
@@ -1181,22 +1181,7 @@ XLogWalRcvSendHSFeedback(bool immed)
 	 */
 	if (hot_standby_feedback)
 	{
-		TransactionId slot_xmin;
-
-		/*
-		 * Usually GetOldestXmin() would include both global replication slot
-		 * xmin and catalog_xmin in its calculations, but we want to derive
-		 * separate values for each of those. So we ask for an xmin that
-		 * excludes the catalog_xmin.
-		 */
-		xmin = GetOldestXmin(NULL,
-							 PROCARRAY_FLAGS_DEFAULT | PROCARRAY_SLOTS_XMIN);
-
-		ProcArrayGetReplicationSlotXmin(&slot_xmin, &catalog_xmin);
-
-		if (TransactionIdIsValid(slot_xmin) &&
-			TransactionIdPrecedes(slot_xmin, xmin))
-			xmin = slot_xmin;
+		GetReplicationHorizons(&xmin, &catalog_xmin);
 	}
 	else
 	{
...
@@ -2113,9 +2113,10 @@ ProcessStandbyHSFeedbackMessage(void)
 	/*
 	 * Set the WalSender's xmin equal to the standby's requested xmin, so that
-	 * the xmin will be taken into account by GetOldestXmin.  This will hold
-	 * back the removal of dead rows and thereby prevent the generation of
-	 * cleanup conflicts on the standby server.
+	 * the xmin will be taken into account by GetSnapshotData() /
+	 * ComputeXidHorizons().  This will hold back the removal of dead rows and
+	 * thereby prevent the generation of cleanup conflicts on the standby
+	 * server.
 	 *
 	 * There is a small window for a race condition here: although we just
 	 * checked that feedbackXmin precedes nextXid, the nextXid could have
@@ -2128,10 +2129,10 @@ ProcessStandbyHSFeedbackMessage(void)
 	 * own xmin would prevent nextXid from advancing so far.
 	 *
 	 * We don't bother taking the ProcArrayLock here.  Setting the xmin field
-	 * is assumed atomic, and there's no real need to prevent a concurrent
-	 * GetOldestXmin.  (If we're moving our xmin forward, this is obviously
-	 * safe, and if we're moving it backwards, well, the data is at risk
-	 * already since a VACUUM could have just finished calling GetOldestXmin.)
+	 * is assumed atomic, and there's no real need to prevent concurrent
+	 * horizon determinations.  (If we're moving our xmin forward, this is
+	 * obviously safe, and if we're moving it backwards, well, the data is at
+	 * risk already since a VACUUM could already have determined the horizon.)
 	 *
 	 * If we're using a replication slot we reserve the xmin via that,
 	 * otherwise via the walsender's PGXACT entry.  We can only track the
...
@@ -5786,14 +5786,15 @@ get_actual_variable_endpoint(Relation heapRel,
 	 * recent); that case motivates not using SnapshotAny here.
 	 *
 	 * A crucial point here is that SnapshotNonVacuumable, with
-	 * RecentGlobalXmin as horizon, yields the inverse of the condition that
-	 * the indexscan will use to decide that index entries are killable (see
-	 * heap_hot_search_buffer()).  Therefore, if the snapshot rejects a tuple
-	 * (or more precisely, all tuples of a HOT chain) and we have to continue
-	 * scanning past it, we know that the indexscan will mark that index entry
-	 * killed.  That means that the next get_actual_variable_endpoint() call
-	 * will not have to re-consider that index entry.  In this way we avoid
-	 * repetitive work when this function is used a lot during planning.
+	 * GlobalVisTestFor(heapRel) as horizon, yields the inverse of the
+	 * condition that the indexscan will use to decide that index entries are
+	 * killable (see heap_hot_search_buffer()).  Therefore, if the snapshot
+	 * rejects a tuple (or more precisely, all tuples of a HOT chain) and we
+	 * have to continue scanning past it, we know that the indexscan will mark
+	 * that index entry killed.  That means that the next
+	 * get_actual_variable_endpoint() call will not have to re-consider that
+	 * index entry.  In this way we avoid repetitive work when this function
+	 * is used a lot during planning.
 	 *
 	 * But using SnapshotNonVacuumable creates a hazard of its own.  In a
 	 * recently-created index, some index entries may point at "broken" HOT
@@ -5805,7 +5806,8 @@ get_actual_variable_endpoint(Relation heapRel,
 	 * or could even be NULL.  We avoid this hazard because we take the data
 	 * from the index entry not the heap.
 	 */
-	InitNonVacuumableSnapshot(SnapshotNonVacuumable, RecentGlobalXmin);
+	InitNonVacuumableSnapshot(SnapshotNonVacuumable,
+							  GlobalVisTestFor(heapRel));
 
 	index_scan = index_beginscan(heapRel, indexRel,
 								 &SnapshotNonVacuumable,
...
@@ -722,6 +722,10 @@ InitPostgres(const char *in_dbname, Oid dboid, const char *username,
 	 * is critical for anything that reads heap pages, because HOT may decide
 	 * to prune them even if the process doesn't attempt to modify any
 	 * tuples.)
+	 *
+	 * FIXME: This comment is inaccurate / the code buggy.  A snapshot that is
+	 * not pushed/active does not reliably prevent HOT pruning (->xmin could
+	 * e.g. be cleared when cache invalidations are processed).
 	 */
 	if (!bootstrap)
 	{
...
@@ -12,6 +12,7 @@
 
 #include "access/transam.h"
 #include "storage/block.h"
+#include "storage/bufpage.h"
 #include "storage/itemptr.h"
 #include "storage/off.h"
 
@@ -134,8 +135,7 @@ typedef struct GinMetaPageData
 */
 #define GinPageGetDeleteXid(page) ( ((PageHeader) (page))->pd_prune_xid )
 #define GinPageSetDeleteXid(page, xid) ( ((PageHeader) (page))->pd_prune_xid = xid)
-#define GinPageIsRecyclable(page) ( PageIsNew(page) || (GinPageIsDeleted(page) \
-	&& TransactionIdPrecedes(GinPageGetDeleteXid(page), RecentGlobalXmin)))
+extern bool GinPageIsRecyclable(Page page);
 
 /*
 * We use our own ItemPointerGet(BlockNumber|OffsetNumber)
...
@@ -172,9 +172,12 @@ extern TransactionId heap_compute_xid_horizon_for_tuples(Relation rel,
 												  int nitems);
 
 /* in heap/pruneheap.c */
+struct GlobalVisState;
 extern void heap_page_prune_opt(Relation relation, Buffer buffer);
 extern int	heap_page_prune(Relation relation, Buffer buffer,
-							TransactionId OldestXmin,
+							struct GlobalVisState *vistest,
+							TransactionId limited_oldest_xmin,
+							TimestampTz limited_oldest_ts,
 							bool report_stats, TransactionId *latestRemovedXid);
 extern void heap_page_prune_execute(Buffer buffer,
 									OffsetNumber *redirected, int nredirected,
@@ -195,11 +198,14 @@ extern TM_Result HeapTupleSatisfiesUpdate(HeapTuple stup, CommandId curcid,
 										  Buffer buffer);
 extern HTSV_Result HeapTupleSatisfiesVacuum(HeapTuple stup, TransactionId OldestXmin,
 											Buffer buffer);
+extern HTSV_Result HeapTupleSatisfiesVacuumHorizon(HeapTuple stup, Buffer buffer,
+												   TransactionId *dead_after);
 extern void HeapTupleSetHintBits(HeapTupleHeader tuple, Buffer buffer,
 								 uint16 infomask, TransactionId xid);
 extern bool HeapTupleHeaderIsOnlyLocked(HeapTupleHeader tuple);
 extern bool XidInMVCCSnapshot(TransactionId xid, Snapshot snapshot);
-extern bool HeapTupleIsSurelyDead(HeapTuple htup, TransactionId OldestXmin);
+extern bool HeapTupleIsSurelyDead(HeapTuple htup,
+								  struct GlobalVisState *vistest);
 
 /*
 * To avoid leaking too much knowledge about reorderbuffer implementation
...
@@ -95,15 +95,6 @@ FullTransactionIdFromU64(uint64 value)
 		(dest) = FirstNormalTransactionId; \
 	} while(0)
 
-/* advance a FullTransactionId variable, stepping over special XIDs */
-static inline void
-FullTransactionIdAdvance(FullTransactionId *dest)
-{
-	dest->value++;
-	while (XidFromFullTransactionId(*dest) < FirstNormalTransactionId)
-		dest->value++;
-}
-
 /*
 * Retreat a FullTransactionId variable, stepping over xids that would appear
 * to be special only when viewed as 32bit XIDs.
@@ -129,6 +120,23 @@ FullTransactionIdRetreat(FullTransactionId *dest)
 	dest->value--;
 }
 
+/*
+ * Advance a FullTransactionId variable, stepping over xids that would appear
+ * to be special only when viewed as 32bit XIDs.
+ */
+static inline void
+FullTransactionIdAdvance(FullTransactionId *dest)
+{
+	dest->value++;
+
+	/* see FullTransactionIdRetreat() */
+	if (FullTransactionIdPrecedes(*dest, FirstNormalFullTransactionId))
+		return;
+
+	while (XidFromFullTransactionId(*dest) < FirstNormalTransactionId)
+		dest->value++;
+}
+
 /* back up a transaction ID variable, handling wraparound correctly */
 #define TransactionIdRetreat(dest) \
 	do { \
@@ -293,6 +301,59 @@ ReadNewTransactionId(void)
 	return XidFromFullTransactionId(ReadNextFullTransactionId());
 }
 
+/* return transaction ID backed up by amount, handling wraparound correctly */
+static inline TransactionId
+TransactionIdRetreatedBy(TransactionId xid, uint32 amount)
+{
+	xid -= amount;
+
+	while (xid < FirstNormalTransactionId)
+		xid--;
+
+	return xid;
+}
+
+/* return the older of the two IDs */
+static inline TransactionId
+TransactionIdOlder(TransactionId a, TransactionId b)
+{
+	if (!TransactionIdIsValid(a))
+		return b;
+
+	if (!TransactionIdIsValid(b))
+		return a;
+
+	if (TransactionIdPrecedes(a, b))
+		return a;
+	return b;
+}
+
+/* return the older of the two IDs, assuming they're both normal */
+static inline TransactionId
+NormalTransactionIdOlder(TransactionId a, TransactionId b)
+{
+	Assert(TransactionIdIsNormal(a));
+	Assert(TransactionIdIsNormal(b));
+
+	if (NormalTransactionIdPrecedes(a, b))
+		return a;
+	return b;
+}
+
+/* return the newer of the two IDs */
+static inline FullTransactionId
+FullTransactionIdNewer(FullTransactionId a, FullTransactionId b)
+{
+	if (!FullTransactionIdIsValid(a))
+		return b;
+
+	if (!FullTransactionIdIsValid(b))
+		return a;
+
+	if (FullTransactionIdFollows(a, b))
+		return a;
+	return b;
+}
+
 #endif							/* FRONTEND */
 
 #endif							/* TRANSAM_H */
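The new transam.h helpers lean on unsigned 32-bit wraparound to step over the three special XIDs (0 = invalid, 1 = bootstrap, 2 = frozen). A standalone copy of two of them, with TransactionId spelled as uint32_t so the block compiles outside PostgreSQL, and with the modulo-2^32 circular comparison standing in for TransactionIdPrecedes (valid only for normal XIDs):

```c
#include <assert.h>
#include <stdint.h>

#define FIRST_NORMAL_XID ((uint32_t) 3)

/* Standalone version of TransactionIdRetreatedBy() from the diff above. */
static uint32_t
xid_retreated_by(uint32_t xid, uint32_t amount)
{
    xid -= amount;              /* unsigned arithmetic wraps modulo 2^32 */

    /* step over the special XIDs 0, 1 and 2, wrapping around if needed */
    while (xid < FIRST_NORMAL_XID)
        xid--;

    return xid;
}

/* Standalone version of TransactionIdOlder(): the invalid XID (0) loses. */
static uint32_t
xid_older(uint32_t a, uint32_t b)
{
    if (a == 0)
        return b;
    if (b == 0)
        return a;

    /* circular comparison, as in TransactionIdPrecedes() for normal XIDs */
    if ((int32_t) (a - b) < 0)
        return a;
    return b;
}
```

For instance, retreating XID 3 by 1 skips 2, 1 and 0 and lands on 4294967295, which is why callers like the horizon code can subtract safely near the wraparound point.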
@@ -389,12 +389,6 @@ PageValidateSpecialPointer(Page page)
 #define PageClearAllVisible(page) \
 	(((PageHeader) (page))->pd_flags &= ~PD_ALL_VISIBLE)
 
-#define PageIsPrunable(page, oldestxmin) \
-( \
-	AssertMacro(TransactionIdIsNormal(oldestxmin)), \
-	TransactionIdIsValid(((PageHeader) (page))->pd_prune_xid) && \
-	TransactionIdPrecedes(((PageHeader) (page))->pd_prune_xid, oldestxmin) \
-)
 #define PageSetPrunable(page, xid) \
 	do { \
 		Assert(TransactionIdIsNormal(xid)); \
...
@@ -42,20 +42,12 @@ struct XidCache
 
 /*
 * Flags for PGXACT->vacuumFlags
- *
- * Note: If you modify these flags, you need to modify PROCARRAY_XXX flags
- * in src/include/storage/procarray.h.
- *
- * PROC_RESERVED may later be assigned for use in vacuumFlags, but its value is
- * used for PROCARRAY_SLOTS_XMIN in procarray.h, so GetOldestXmin won't be able
- * to match and ignore processes with this flag set.
 */
 #define		PROC_IS_AUTOVACUUM	0x01	/* is it an autovac worker? */
 #define		PROC_IN_VACUUM		0x02	/* currently running lazy vacuum */
 #define		PROC_VACUUM_FOR_WRAPAROUND	0x08	/* set by autovac only */
 #define		PROC_IN_LOGICAL_DECODING	0x10	/* currently doing logical
 												 * decoding outside xact */
-#define		PROC_RESERVED				0x20	/* reserved for procarray */
 
 /* flags reset at EOXact */
 #define		PROC_VACUUM_STATE_MASK \
...
@@ -20,34 +20,6 @@
 
 #include "utils/snapshot.h"
 
-/*
- * These are to implement PROCARRAY_FLAGS_XXX
- *
- * Note: These flags are cloned from PROC_XXX flags in src/include/storage/proc.h
- * to avoid forcing to include proc.h when including procarray.h.  So if you modify
- * PROC_XXX flags, you need to modify these flags.
- */
-#define		PROCARRAY_VACUUM_FLAG			0x02	/* currently running lazy
-													 * vacuum */
-#define		PROCARRAY_LOGICAL_DECODING_FLAG	0x10	/* currently doing logical
-													 * decoding outside xact */
-#define		PROCARRAY_SLOTS_XMIN			0x20	/* replication slot xmin,
-													 * catalog_xmin */
-
-/*
- * Only flags in PROCARRAY_PROC_FLAGS_MASK are considered when matching
- * PGXACT->vacuumFlags.  Other flags are used for different purposes and
- * have no corresponding PROC flag equivalent.
- */
-#define		PROCARRAY_PROC_FLAGS_MASK	(PROCARRAY_VACUUM_FLAG | \
-										 PROCARRAY_LOGICAL_DECODING_FLAG)
-
-/* Use the following flags as an input "flags" to GetOldestXmin function */
-
-/* Consider all backends except for logical decoding ones which manage xmin separately */
-#define		PROCARRAY_FLAGS_DEFAULT		PROCARRAY_LOGICAL_DECODING_FLAG
-/* Ignore vacuum backends */
-#define		PROCARRAY_FLAGS_VACUUM		PROCARRAY_FLAGS_DEFAULT | PROCARRAY_VACUUM_FLAG
-
 extern Size ProcArrayShmemSize(void);
 extern void CreateSharedProcArray(void);
 extern void ProcArrayAdd(PGPROC *proc);
@@ -81,9 +53,11 @@
 extern RunningTransactions GetRunningTransactionData(void);
 extern bool TransactionIdIsInProgress(TransactionId xid);
 extern bool TransactionIdIsActive(TransactionId xid);
-extern TransactionId GetOldestXmin(Relation rel, int flags);
+extern TransactionId GetOldestNonRemovableTransactionId(Relation rel);
+extern TransactionId GetOldestTransactionIdConsideredRunning(void);
 extern TransactionId GetOldestActiveTransactionId(void);
 extern TransactionId GetOldestSafeDecodingTransactionId(bool catalogOnly);
+extern void GetReplicationHorizons(TransactionId *slot_xmin, TransactionId *catalog_xmin);
 
 extern VirtualTransactionId *GetVirtualXIDsDelayingChkpt(int *nvxids);
 extern bool HaveVirtualXIDsDelayingChkpt(VirtualTransactionId *vxids, int nvxids);
...
@@ -52,13 +52,12 @@ extern Size SnapMgrShmemSize(void);
 extern void SnapMgrInit(void);
 extern TimestampTz GetSnapshotCurrentTimestamp(void);
 extern TimestampTz GetOldSnapshotThresholdTimestamp(void);
+extern void SnapshotTooOldMagicForTest(void);

 extern bool FirstSnapshotSet;

 extern PGDLLIMPORT TransactionId TransactionXmin;
 extern PGDLLIMPORT TransactionId RecentXmin;
-extern PGDLLIMPORT TransactionId RecentGlobalXmin;
-extern PGDLLIMPORT TransactionId RecentGlobalDataXmin;

 /* Variables representing various special snapshot semantics */
 extern PGDLLIMPORT SnapshotData SnapshotSelfData;
...
@@ -78,11 +77,12 @@ extern PGDLLIMPORT SnapshotData CatalogSnapshotData;
 /*
  * Similarly, some initialization is required for a NonVacuumable snapshot.
- * The caller must supply the xmin horizon to use (e.g., RecentGlobalXmin).
+ * The caller must supply the visibility cutoff state to use (c.f.
+ * GlobalVisTestFor()).
  */
-#define InitNonVacuumableSnapshot(snapshotdata, xmin_horizon)  \
+#define InitNonVacuumableSnapshot(snapshotdata, vistestp)  \
 	((snapshotdata).snapshot_type = SNAPSHOT_NON_VACUUMABLE, \
-	 (snapshotdata).xmin = (xmin_horizon))
+	 (snapshotdata).vistest = (vistestp))
 /*
  * Similarly, some initialization is required for SnapshotToast. We need
...
@@ -98,6 +98,11 @@ extern PGDLLIMPORT SnapshotData CatalogSnapshotData;
 	((snapshot)->snapshot_type == SNAPSHOT_MVCC || \
 	 (snapshot)->snapshot_type == SNAPSHOT_HISTORIC_MVCC)

+static inline bool
+OldSnapshotThresholdActive(void)
+{
+	return old_snapshot_threshold >= 0;
+}

 extern Snapshot GetTransactionSnapshot(void);
 extern Snapshot GetLatestSnapshot(void);
...
@@ -121,8 +126,6 @@ extern void UnregisterSnapshot(Snapshot snapshot);
 extern Snapshot RegisterSnapshotOnOwner(Snapshot snapshot, ResourceOwner owner);
 extern void UnregisterSnapshotFromOwner(Snapshot snapshot, ResourceOwner owner);

-extern FullTransactionId GetFullRecentGlobalXmin(void);
-
 extern void AtSubCommit_Snapshot(int level);
 extern void AtSubAbort_Snapshot(int level);
 extern void AtEOXact_Snapshot(bool isCommit, bool resetXmin);
...
@@ -131,13 +134,29 @@ extern void ImportSnapshot(const char *idstr);
 extern bool XactHasExportedSnapshots(void);
 extern void DeleteAllExportedSnapshotFiles(void);
 extern bool ThereAreNoPriorRegisteredSnapshots(void);
-extern TransactionId TransactionIdLimitedForOldSnapshots(TransactionId recentXmin,
-														 Relation relation);
+extern bool TransactionIdLimitedForOldSnapshots(TransactionId recentXmin,
+												Relation relation,
+												TransactionId *limit_xid,
+												TimestampTz *limit_ts);
+extern void SetOldSnapshotThresholdTimestamp(TimestampTz ts, TransactionId xlimit);
 extern void MaintainOldSnapshotTimeMapping(TimestampTz whenTaken,
 										   TransactionId xmin);
 extern char *ExportSnapshot(Snapshot snapshot);

+/*
+ * These live in procarray.c because they're intimately linked to the
+ * procarray contents, but thematically they better fit into snapmgr.h.
+ */
+typedef struct GlobalVisState GlobalVisState;
+extern GlobalVisState *GlobalVisTestFor(Relation rel);
+extern bool GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid);
+extern bool GlobalVisTestIsRemovableFullXid(GlobalVisState *state, FullTransactionId fxid);
+extern FullTransactionId GlobalVisTestNonRemovableFullHorizon(GlobalVisState *state);
+extern TransactionId GlobalVisTestNonRemovableHorizon(GlobalVisState *state);
+extern bool GlobalVisCheckRemovableXid(Relation rel, TransactionId xid);
+extern bool GlobalVisIsRemovableFullXid(Relation rel, FullTransactionId fxid);
 /*
  * Utility functions for implementing visibility routines in table AMs.
  */
...
@@ -192,6 +192,12 @@ typedef struct SnapshotData
 	 */
 	uint32		speculativeToken;

+	/*
+	 * For SNAPSHOT_NON_VACUUMABLE (and hopefully more in the future) this is
+	 * used to determine whether row could be vacuumed.
+	 */
+	struct GlobalVisState *vistest;

 	/*
 	 * Book-keeping information, used by the snapshot manager
 	 */
...
@@ -395,6 +395,7 @@ CompositeTypeStmt
 CompoundAffixFlag
 CompressionAlgorithm
 CompressorState
+ComputeXidHorizonsResult
 ConditionVariable
 ConditionalStack
 ConfigData
...
@@ -930,6 +931,7 @@ GistSplitVector
 GistTsVectorOptions
 GistVacState
 GlobalTransaction
+GlobalVisState
 GrantRoleStmt
 GrantStmt
 GrantTargetType
...