Commit ffaa44cb authored by Tom Lane's avatar Tom Lane

Account for catalog snapshot in PGXACT->xmin updates.

The CatalogSnapshot was not plugged into SnapshotResetXmin()'s accounting
for whether MyPgXact->xmin could be cleared or advanced.  In normal
transactions this was masked by the fact that the transaction snapshot
would be older, but during backend startup and certain utility commands
it was possible to re-use the CatalogSnapshot after MyPgXact->xmin had
been cleared, meaning that recently-deleted rows could be pruned even
though this snapshot could still see them, causing unexpected catalog
lookup failures.  This effect appears to be the explanation for a recent
failure on buildfarm member piculet.

To fix, add the CatalogSnapshot to the RegisteredSnapshots heap whenever
it is valid.

In the previous logic, it was possible for the CatalogSnapshot to remain
valid across waits for client input, but with this change that would mean
it delays advance of global xmin in cases where it did not before.  To
avoid possibly causing new table-bloat problems with clients that sit idle
for long intervals, add code to invalidate the CatalogSnapshot before
waiting for client input.  (When the backend is busy, it's unlikely that
the CatalogSnapshot would be the oldest snap for very long, so we don't
worry about forcing early invalidation of it otherwise.)

In passing, remove the CatalogSnapshotStale flag in favor of using
"CatalogSnapshot != NULL" to represent validity, as we do for the other
special snapshots in snapmgr.c.  And improve some obsolete comments.

No regression test because I don't know a deterministic way to cause this
failure.  But the stress test shown in the original discussion provokes
"cache lookup failed for relation 1255" within a few dozen seconds for me.

Back-patch to 9.4 where MVCC catalog scans were introduced.  (Note: it's
quite easy to produce similar failures with the same test case in branches
before 9.4.  But MVCC catalog scans were supposed to fix that.)

Discussion: <16447.1478818294@sss.pgh.pa.us>
parent fc19c180
...@@ -3946,6 +3946,12 @@ PostgresMain(int argc, char *argv[], ...@@ -3946,6 +3946,12 @@ PostgresMain(int argc, char *argv[],
initStringInfo(&input_message); initStringInfo(&input_message);
/*
* Also consider releasing our catalog snapshot if any, so that it's
* not preventing advance of global xmin while we wait for the client.
*/
InvalidateCatalogSnapshotConditionally();
/* /*
* (1) If we've reached idle state, tell the frontend we're ready for * (1) If we've reached idle state, tell the frontend we're ready for
* a new query. * a new query.
...@@ -4422,7 +4428,7 @@ ShowUsage(const char *title) ...@@ -4422,7 +4428,7 @@ ShowUsage(const char *title)
appendStringInfoString(&str, "! system usage stats:\n"); appendStringInfoString(&str, "! system usage stats:\n");
appendStringInfo(&str, appendStringInfo(&str,
"!\t%ld.%06ld s user, %ld.%06ld s system, %ld.%06ld s elapsed\n", "!\t%ld.%06ld s user, %ld.%06ld s system, %ld.%06ld s elapsed\n",
(long) (r.ru_utime.tv_sec - Save_r.ru_utime.tv_sec), (long) (r.ru_utime.tv_sec - Save_r.ru_utime.tv_sec),
(long) (r.ru_utime.tv_usec - Save_r.ru_utime.tv_usec), (long) (r.ru_utime.tv_usec - Save_r.ru_utime.tv_usec),
(long) (r.ru_stime.tv_sec - Save_r.ru_stime.tv_sec), (long) (r.ru_stime.tv_sec - Save_r.ru_stime.tv_sec),
......
/*------------------------------------------------------------------------- /*-------------------------------------------------------------------------
*
* snapmgr.c * snapmgr.c
* PostgreSQL snapshot manager * PostgreSQL snapshot manager
* *
...@@ -8,7 +9,7 @@ ...@@ -8,7 +9,7 @@
* (tracked by separate refcounts on each snapshot), its memory can be freed. * (tracked by separate refcounts on each snapshot), its memory can be freed.
* *
* The FirstXactSnapshot, if any, is treated a bit specially: we increment its * The FirstXactSnapshot, if any, is treated a bit specially: we increment its
* regd_count and count it in RegisteredSnapshots, but this reference is not * regd_count and list it in RegisteredSnapshots, but this reference is not
* tracked by a resource owner. We used to use the TopTransactionResourceOwner * tracked by a resource owner. We used to use the TopTransactionResourceOwner
* to track this snapshot reference, but that introduces logical circularity * to track this snapshot reference, but that introduces logical circularity
* and thus makes it impossible to clean up in a sane fashion. It's better to * and thus makes it impossible to clean up in a sane fashion. It's better to
...@@ -16,19 +17,22 @@ ...@@ -16,19 +17,22 @@
* module is entirely lower-level than ResourceOwners. * module is entirely lower-level than ResourceOwners.
* *
* Likewise, any snapshots that have been exported by pg_export_snapshot * Likewise, any snapshots that have been exported by pg_export_snapshot
* have regd_count = 1 and are counted in RegisteredSnapshots, but are not * have regd_count = 1 and are listed in RegisteredSnapshots, but are not
* tracked by any resource owner. * tracked by any resource owner.
* *
* Likewise, the CatalogSnapshot is listed in RegisteredSnapshots when it
* is valid, but is not tracked by any resource owner.
*
* The same is true for historic snapshots used during logical decoding, * The same is true for historic snapshots used during logical decoding,
* their lifetime is managed separately (as they live longer as one xact.c * their lifetime is managed separately (as they live longer than one xact.c
* transaction). * transaction).
* *
* These arrangements let us reset MyPgXact->xmin when there are no snapshots * These arrangements let us reset MyPgXact->xmin when there are no snapshots
* referenced by this transaction. (One possible improvement would be to be * referenced by this transaction, and advance it when the one with oldest
* able to advance Xmin when the snapshot with the earliest Xmin is no longer * Xmin is no longer referenced. For simplicity however, only registered
* referenced. That's a bit harder though, it requires more locking, and * snapshots not active snapshots participate in tracking which one is oldest;
* anyway it should be rather uncommon to keep temporary snapshots referenced * we don't try to change MyPgXact->xmin except when the active-snapshot
* for too long.) * stack is empty.
* *
* *
* Portions Copyright (c) 1996-2016, PostgreSQL Global Development Group * Portions Copyright (c) 1996-2016, PostgreSQL Global Development Group
...@@ -131,7 +135,7 @@ static volatile OldSnapshotControlData *oldSnapshotControl; ...@@ -131,7 +135,7 @@ static volatile OldSnapshotControlData *oldSnapshotControl;
* SecondarySnapshot is a snapshot that's always up-to-date as of the current * SecondarySnapshot is a snapshot that's always up-to-date as of the current
* instant, even in transaction-snapshot mode. It should only be used for * instant, even in transaction-snapshot mode. It should only be used for
* special-purpose code (say, RI checking.) CatalogSnapshot points to an * special-purpose code (say, RI checking.) CatalogSnapshot points to an
* MVCC snapshot intended to be used for catalog scans; we must refresh it * MVCC snapshot intended to be used for catalog scans; we must invalidate it
* whenever a system catalog change occurs. * whenever a system catalog change occurs.
* *
* These SnapshotData structs are static to simplify memory allocation * These SnapshotData structs are static to simplify memory allocation
...@@ -147,11 +151,6 @@ static Snapshot SecondarySnapshot = NULL; ...@@ -147,11 +151,6 @@ static Snapshot SecondarySnapshot = NULL;
static Snapshot CatalogSnapshot = NULL; static Snapshot CatalogSnapshot = NULL;
static Snapshot HistoricSnapshot = NULL; static Snapshot HistoricSnapshot = NULL;
/*
* Staleness detection for CatalogSnapshot.
*/
static bool CatalogSnapshotStale = true;
/* /*
* These are updated by GetSnapshotData. We initialize them this way * These are updated by GetSnapshotData. We initialize them this way
* for the convenience of TransactionIdIsInProgress: even in bootstrap * for the convenience of TransactionIdIsInProgress: even in bootstrap
...@@ -193,8 +192,7 @@ static ActiveSnapshotElt *OldestActiveSnapshot = NULL; ...@@ -193,8 +192,7 @@ static ActiveSnapshotElt *OldestActiveSnapshot = NULL;
/* /*
* Currently registered Snapshots. Ordered in a heap by xmin, so that we can * Currently registered Snapshots. Ordered in a heap by xmin, so that we can
* quickly find the one with lowest xmin, to advance our MyPgXat->xmin. * quickly find the one with lowest xmin, to advance our MyPgXact->xmin.
* resowner.c also tracks these.
*/ */
static int xmin_cmp(const pairingheap_node *a, const pairingheap_node *b, static int xmin_cmp(const pairingheap_node *a, const pairingheap_node *b,
void *arg); void *arg);
...@@ -316,6 +314,12 @@ GetTransactionSnapshot(void) ...@@ -316,6 +314,12 @@ GetTransactionSnapshot(void)
/* First call in transaction? */ /* First call in transaction? */
if (!FirstSnapshotSet) if (!FirstSnapshotSet)
{ {
/*
* Don't allow catalog snapshot to be older than xact snapshot. Must
* do this first to allow the empty-heap Assert to succeed.
*/
InvalidateCatalogSnapshot();
Assert(pairingheap_is_empty(&RegisteredSnapshots)); Assert(pairingheap_is_empty(&RegisteredSnapshots));
Assert(FirstXactSnapshot == NULL); Assert(FirstXactSnapshot == NULL);
...@@ -347,9 +351,6 @@ GetTransactionSnapshot(void) ...@@ -347,9 +351,6 @@ GetTransactionSnapshot(void)
else else
CurrentSnapshot = GetSnapshotData(&CurrentSnapshotData); CurrentSnapshot = GetSnapshotData(&CurrentSnapshotData);
/* Don't allow catalog snapshot to be older than xact snapshot. */
CatalogSnapshotStale = true;
FirstSnapshotSet = true; FirstSnapshotSet = true;
return CurrentSnapshot; return CurrentSnapshot;
} }
...@@ -358,7 +359,7 @@ GetTransactionSnapshot(void) ...@@ -358,7 +359,7 @@ GetTransactionSnapshot(void)
return CurrentSnapshot; return CurrentSnapshot;
/* Don't allow catalog snapshot to be older than xact snapshot. */ /* Don't allow catalog snapshot to be older than xact snapshot. */
CatalogSnapshotStale = true; InvalidateCatalogSnapshot();
CurrentSnapshot = GetSnapshotData(&CurrentSnapshotData); CurrentSnapshot = GetSnapshotData(&CurrentSnapshotData);
...@@ -463,36 +464,72 @@ GetNonHistoricCatalogSnapshot(Oid relid) ...@@ -463,36 +464,72 @@ GetNonHistoricCatalogSnapshot(Oid relid)
* scan a relation for which neither catcache nor snapshot invalidations * scan a relation for which neither catcache nor snapshot invalidations
* are sent, we must refresh the snapshot every time. * are sent, we must refresh the snapshot every time.
*/ */
if (!CatalogSnapshotStale && !RelationInvalidatesSnapshotsOnly(relid) && if (CatalogSnapshot &&
!RelationInvalidatesSnapshotsOnly(relid) &&
!RelationHasSysCache(relid)) !RelationHasSysCache(relid))
CatalogSnapshotStale = true; InvalidateCatalogSnapshot();
if (CatalogSnapshotStale) if (CatalogSnapshot == NULL)
{ {
/* Get new snapshot. */ /* Get new snapshot. */
CatalogSnapshot = GetSnapshotData(&CatalogSnapshotData); CatalogSnapshot = GetSnapshotData(&CatalogSnapshotData);
/* /*
* Mark new snapshost as valid. We must do this last, in case an * Make sure the catalog snapshot will be accounted for in decisions
* ERROR occurs inside GetSnapshotData(). * about advancing PGXACT->xmin. We could apply RegisterSnapshot, but
* that would result in making a physical copy, which is overkill; and
* it would also create a dependency on some resource owner, which we
* do not want for reasons explained at the head of this file. Instead
* just shove the CatalogSnapshot into the pairing heap manually. This
* has to be reversed in InvalidateCatalogSnapshot, of course.
*
* NB: it had better be impossible for this to throw error, since the
* CatalogSnapshot pointer is already valid.
*/ */
CatalogSnapshotStale = false; pairingheap_add(&RegisteredSnapshots, &CatalogSnapshot->ph_node);
} }
return CatalogSnapshot; return CatalogSnapshot;
} }
/* /*
* Mark the current catalog snapshot as invalid. We could change this API * InvalidateCatalogSnapshot
* to allow the caller to provide more fine-grained invalidation details, so * Mark the current catalog snapshot, if any, as invalid
* that a change to relation A wouldn't prevent us from using our cached *
* snapshot to scan relation B, but so far there's no evidence that the CPU * We could change this API to allow the caller to provide more fine-grained
* cycles we spent tracking such fine details would be well-spent. * invalidation details, so that a change to relation A wouldn't prevent us
* from using our cached snapshot to scan relation B, but so far there's no
* evidence that the CPU cycles we spent tracking such fine details would be
* well-spent.
*/ */
void void
InvalidateCatalogSnapshot(void) InvalidateCatalogSnapshot(void)
{ {
CatalogSnapshotStale = true; if (CatalogSnapshot)
{
pairingheap_remove(&RegisteredSnapshots, &CatalogSnapshot->ph_node);
CatalogSnapshot = NULL;
SnapshotResetXmin();
}
}
/*
* InvalidateCatalogSnapshotConditionally
* Drop catalog snapshot if it's the only one we have
*
* This is called when we are about to wait for client input, so we don't
* want to continue holding the catalog snapshot if it might mean that the
* global xmin horizon can't advance. However, if there are other snapshots
* still active or registered, the catalog snapshot isn't likely to be the
* oldest one, so we might as well keep it.
*/
void
InvalidateCatalogSnapshotConditionally(void)
{
if (CatalogSnapshot &&
ActiveSnapshot == NULL &&
pairingheap_is_singular(&RegisteredSnapshots))
InvalidateCatalogSnapshot();
} }
/* /*
...@@ -509,6 +546,7 @@ SnapshotSetCommandId(CommandId curcid) ...@@ -509,6 +546,7 @@ SnapshotSetCommandId(CommandId curcid)
CurrentSnapshot->curcid = curcid; CurrentSnapshot->curcid = curcid;
if (SecondarySnapshot) if (SecondarySnapshot)
SecondarySnapshot->curcid = curcid; SecondarySnapshot->curcid = curcid;
/* Should we do the same with CatalogSnapshot? */
} }
/* /*
...@@ -526,6 +564,9 @@ SetTransactionSnapshot(Snapshot sourcesnap, TransactionId sourcexid, ...@@ -526,6 +564,9 @@ SetTransactionSnapshot(Snapshot sourcesnap, TransactionId sourcexid,
/* Caller should have checked this already */ /* Caller should have checked this already */
Assert(!FirstSnapshotSet); Assert(!FirstSnapshotSet);
/* Better do this to ensure following Assert succeeds. */
InvalidateCatalogSnapshot();
Assert(pairingheap_is_empty(&RegisteredSnapshots)); Assert(pairingheap_is_empty(&RegisteredSnapshots));
Assert(FirstXactSnapshot == NULL); Assert(FirstXactSnapshot == NULL);
Assert(!HistoricSnapshotActive()); Assert(!HistoricSnapshotActive());
...@@ -918,7 +959,15 @@ xmin_cmp(const pairingheap_node *a, const pairingheap_node *b, void *arg) ...@@ -918,7 +959,15 @@ xmin_cmp(const pairingheap_node *a, const pairingheap_node *b, void *arg)
* Even if there are some remaining snapshots, we may be able to advance our * Even if there are some remaining snapshots, we may be able to advance our
* PGXACT->xmin to some degree. This typically happens when a portal is * PGXACT->xmin to some degree. This typically happens when a portal is
* dropped. For efficiency, we only consider recomputing PGXACT->xmin when * dropped. For efficiency, we only consider recomputing PGXACT->xmin when
* the active snapshot stack is empty. * the active snapshot stack is empty; this allows us not to need to track
* which active snapshot is oldest.
*
* Note: it's tempting to use GetOldestSnapshot() here so that we can include
* active snapshots in the calculation. However, that compares by LSN not
* xmin so it's not entirely clear that it's the same thing. Also, we'd be
* critically dependent on the assumption that the bottommost active snapshot
* stack entry has the oldest xmin. (Current uses of GetOldestSnapshot() are
* not actually critical, but this would be.)
*/ */
static void static void
SnapshotResetXmin(void) SnapshotResetXmin(void)
...@@ -1006,7 +1055,7 @@ AtEOXact_Snapshot(bool isCommit) ...@@ -1006,7 +1055,7 @@ AtEOXact_Snapshot(bool isCommit)
{ {
/* /*
* In transaction-snapshot mode we must release our privately-managed * In transaction-snapshot mode we must release our privately-managed
* reference to the transaction snapshot. We must decrement * reference to the transaction snapshot. We must remove it from
* RegisteredSnapshots to keep the check below happy. But we don't bother * RegisteredSnapshots to keep the check below happy. But we don't bother
* to do FreeSnapshot, for two reasons: the memory will go away with * to do FreeSnapshot, for two reasons: the memory will go away with
* TopTransactionContext anyway, and if someone has left the snapshot * TopTransactionContext anyway, and if someone has left the snapshot
...@@ -1046,7 +1095,7 @@ AtEOXact_Snapshot(bool isCommit) ...@@ -1046,7 +1095,7 @@ AtEOXact_Snapshot(bool isCommit)
/* /*
* As with the FirstXactSnapshot, we needn't spend any effort on * As with the FirstXactSnapshot, we needn't spend any effort on
* cleaning up the per-snapshot data structures, but we do need to * cleaning up the per-snapshot data structures, but we do need to
* unlink them from RegisteredSnapshots to prevent a warning below. * remove them from RegisteredSnapshots to prevent a warning below.
*/ */
foreach(lc, exportedSnapshots) foreach(lc, exportedSnapshots)
{ {
...@@ -1058,6 +1107,9 @@ AtEOXact_Snapshot(bool isCommit) ...@@ -1058,6 +1107,9 @@ AtEOXact_Snapshot(bool isCommit)
exportedSnapshots = NIL; exportedSnapshots = NIL;
} }
/* Drop catalog snapshot if any */
InvalidateCatalogSnapshot();
/* On commit, complain about leftover snapshots */ /* On commit, complain about leftover snapshots */
if (isCommit) if (isCommit)
{ {
......
...@@ -69,6 +69,7 @@ extern Snapshot GetOldestSnapshot(void); ...@@ -69,6 +69,7 @@ extern Snapshot GetOldestSnapshot(void);
extern Snapshot GetCatalogSnapshot(Oid relid); extern Snapshot GetCatalogSnapshot(Oid relid);
extern Snapshot GetNonHistoricCatalogSnapshot(Oid relid); extern Snapshot GetNonHistoricCatalogSnapshot(Oid relid);
extern void InvalidateCatalogSnapshot(void); extern void InvalidateCatalogSnapshot(void);
extern void InvalidateCatalogSnapshotConditionally(void);
extern void PushActiveSnapshot(Snapshot snapshot); extern void PushActiveSnapshot(Snapshot snapshot);
extern void PushCopiedSnapshot(Snapshot snapshot); extern void PushCopiedSnapshot(Snapshot snapshot);
......
Markdown is supported
0% or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment