Commit 566372b3 authored by Noah Misch

Prevent concurrent SimpleLruTruncate() for any given SLRU.

The SimpleLruTruncate() header comment states the new coding rule.  To
achieve this, add locktype "frozenid" and two LWLocks.  This closes a
rare opportunity for data loss, which manifested as "apparent
wraparound" or "could not access status of transaction" errors.  Data
loss is more likely in pg_multixact, due to released branches' thin
margin between multiStopLimit and multiWrapLimit.  If a user's physical
replication primary logged ":  apparent wraparound" messages, the user
should rebuild standbys of that primary regardless of symptoms.  At less
risk is a cluster having emitted "not accepting commands" errors or
"must be vacuumed" warnings at some point.  One can test a cluster for
this data loss by running VACUUM FREEZE in every database.  Back-patch
to 9.5 (all supported versions).

Discussion: https://postgr.es/m/20190218073103.GA1434723@rfd.leadboat.com
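
The coding rule described in the message above (mutual exclusion begins before the cutoff is computed and ends only after the limit update) can be sketched outside PostgreSQL with plain pthread mutexes. This is a minimal illustration under stated assumptions, not the committed code: `tail_lock`, `queue_lock`, `queue_tail`, `oldest_segment`, `toy_truncate()` and `advance_tail()` are all hypothetical stand-ins that merely play the roles of NotifyQueueTailLock, NotifyQueueLock, QUEUE_TAIL and SimpleLruTruncate().

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-ins; none of these names exist in PostgreSQL. */
static pthread_mutex_t tail_lock = PTHREAD_MUTEX_INITIALIZER;  /* role of NotifyQueueTailLock */
static pthread_mutex_t queue_lock = PTHREAD_MUTEX_INITIALIZER; /* role of NotifyQueueLock */
static int queue_tail = 0;     /* advertised limit that writers would consult */
static int oldest_segment = 0; /* first segment that still exists */

/* Toy truncation: drop every segment before cutoff. */
static void
toy_truncate(int cutoff)
{
    if (cutoff > oldest_segment)
        oldest_segment = cutoff;
}

static void
advance_tail(int oldest_reader_pos)
{
    int new_tail;

    /* Mutual exclusion begins before the cutoff is computed. */
    pthread_mutex_lock(&tail_lock);

    /* Compute the new tail; it never moves backward. */
    pthread_mutex_lock(&queue_lock);
    new_tail = (oldest_reader_pos > queue_tail) ? oldest_reader_pos : queue_tail;
    assert(new_tail >= queue_tail);
    pthread_mutex_unlock(&queue_lock);

    /* Truncate while the old limit is still the advertised one. */
    toy_truncate(new_tail);

    /* Advertise the new limit only after truncation has finished. */
    pthread_mutex_lock(&queue_lock);
    queue_tail = new_tail;
    pthread_mutex_unlock(&queue_lock);

    /* Mutual exclusion ends after the limit update. */
    pthread_mutex_unlock(&tail_lock);
}
```

Because the outer lock spans all three steps, a second caller cannot compute a stale cutoff and truncate a segment that the first caller's limit update has already opened to fresh writes. The lock order (outer lock first, then the queue lock) mirrors the ordering the diff documents for the NOTIFY queue.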
parent d4d443b3
@@ -10226,7 +10226,8 @@ SCRAM-SHA-256$<replaceable>&lt;iteration count&gt;</replaceable>:<replaceable>&l
 and general database objects (identified by class OID and object OID,
 in the same way as in <structname>pg_description</structname> or
 <structname>pg_depend</structname>). Also, the right to extend a
-relation is represented as a separate lockable object.
+relation is represented as a separate lockable object, as is the right to
+update <structname>pg_database</structname>.<structfield>datfrozenxid</structfield>.
 Also, <quote>advisory</quote> locks can be taken on numbers that have
 user-defined meanings.
 </para>
@@ -10254,6 +10255,7 @@ SCRAM-SHA-256$<replaceable>&lt;iteration count&gt;</replaceable>:<replaceable>&l
 Type of the lockable object:
 <literal>relation</literal>,
 <literal>extend</literal>,
+<literal>frozenid</literal>,
 <literal>page</literal>,
 <literal>tuple</literal>,
 <literal>transactionid</literal>,
......
@@ -1742,6 +1742,12 @@ postgres 27093 0.0 0.0 30096 2752 ? Ss 11:34 0:00 postgres: ser
 <entry><literal>extend</literal></entry>
 <entry>Waiting to extend a relation.</entry>
 </row>
+<row>
+<entry><literal>frozenid</literal></entry>
+<entry>Waiting to
+update <structname>pg_database</structname>.<structfield>datfrozenxid</structfield>
+and <structname>pg_database</structname>.<structfield>datminmxid</structfield>.</entry>
+</row>
 <row>
 <entry><literal>object</literal></entry>
 <entry>Waiting to acquire a lock on a non-relation database object.</entry>
@@ -1910,6 +1916,11 @@ postgres 27093 0.0 0.0 30096 2752 ? Ss 11:34 0:00 postgres: ser
 <entry><literal>NotifyQueue</literal></entry>
 <entry>Waiting to read or update <command>NOTIFY</command> messages.</entry>
 </row>
+<row>
+<entry><literal>NotifyQueueTail</literal></entry>
+<entry>Waiting to update limit on <command>NOTIFY</command> message
+storage.</entry>
+</row>
 <row>
 <entry><literal>NotifySLRU</literal></entry>
 <entry>Waiting to access the <command>NOTIFY</command> message SLRU
@@ -2086,6 +2097,11 @@ postgres 27093 0.0 0.0 30096 2752 ? Ss 11:34 0:00 postgres: ser
 <entry><literal>WALWrite</literal></entry>
 <entry>Waiting for WAL buffers to be written to disk.</entry>
 </row>
+<row>
+<entry><literal>WrapLimitsVacuum</literal></entry>
+<entry>Waiting to update limits on transaction id and multixact
+consumption.</entry>
+</row>
 <row>
 <entry><literal>XactBuffer</literal></entry>
 <entry>Waiting for I/O on a transaction status SLRU buffer.</entry>
......
@@ -1191,6 +1191,14 @@ SimpleLruFlush(SlruCtl ctl, bool allow_redirtied)
 /*
  * Remove all segments before the one holding the passed page number
+ *
+ * All SLRUs prevent concurrent calls to this function, either with an LWLock
+ * or by calling it only as part of a checkpoint. Mutual exclusion must begin
+ * before computing cutoffPage. Mutual exclusion must end after any limit
+ * update that would permit other backends to write fresh data into the
+ * segment immediately preceding the one containing cutoffPage. Otherwise,
+ * when the SLRU is quite full, SimpleLruTruncate() might delete that segment
+ * after it has accrued freshly-written data.
  */
 void
 SimpleLruTruncate(SlruCtl ctl, int cutoffPage)
......
@@ -349,8 +349,8 @@ ExtendSUBTRANS(TransactionId newestXact)
 /*
  * Remove all SUBTRANS segments before the one holding the passed transaction ID
  *
- * This is normally called during checkpoint, with oldestXact being the
- * oldest TransactionXmin of any running transaction.
+ * oldestXact is the oldest TransactionXmin of any running transaction. This
+ * is called only during checkpoint.
  */
 void
 TruncateSUBTRANS(TransactionId oldestXact)
......
@@ -244,19 +244,22 @@ typedef struct QueueBackendStatus
 /*
  * Shared memory state for LISTEN/NOTIFY (excluding its SLRU stuff)
  *
- * The AsyncQueueControl structure is protected by the NotifyQueueLock.
+ * The AsyncQueueControl structure is protected by the NotifyQueueLock and
+ * NotifyQueueTailLock.
  *
- * When holding the lock in SHARED mode, backends may only inspect their own
- * entries as well as the head and tail pointers. Consequently we can allow a
- * backend to update its own record while holding only SHARED lock (since no
- * other backend will inspect it).
+ * When holding NotifyQueueLock in SHARED mode, backends may only inspect
+ * their own entries as well as the head and tail pointers. Consequently we
+ * can allow a backend to update its own record while holding only SHARED lock
+ * (since no other backend will inspect it).
  *
- * When holding the lock in EXCLUSIVE mode, backends can inspect the entries
- * of other backends and also change the head and tail pointers.
+ * When holding NotifyQueueLock in EXCLUSIVE mode, backends can inspect the
+ * entries of other backends and also change the head pointer. When holding
+ * both NotifyQueueLock and NotifyQueueTailLock in EXCLUSIVE mode, backends
+ * can change the tail pointer.
  *
  * NotifySLRULock is used as the control lock for the pg_notify SLRU buffers.
- * In order to avoid deadlocks, whenever we need both locks, we always first
- * get NotifyQueueLock and then NotifySLRULock.
+ * In order to avoid deadlocks, whenever we need multiple locks, we first get
+ * NotifyQueueTailLock, then NotifyQueueLock, and lastly NotifySLRULock.
  *
  * Each backend uses the backend[] array entry with index equal to its
  * BackendId (which can range from 1 to MaxBackends). We rely on this to make
@@ -2177,6 +2180,10 @@ asyncQueueAdvanceTail(void)
     int newtailpage;
     int boundary;
+    /* Restrict task to one backend per cluster; see SimpleLruTruncate(). */
+    LWLockAcquire(NotifyQueueTailLock, LW_EXCLUSIVE);
+    /* Compute the new tail. */
     LWLockAcquire(NotifyQueueLock, LW_EXCLUSIVE);
     min = QUEUE_HEAD;
     for (BackendId i = QUEUE_FIRST_LISTENER; i > 0; i = QUEUE_NEXT_LISTENER(i))
@@ -2185,7 +2192,6 @@ asyncQueueAdvanceTail(void)
         min = QUEUE_POS_MIN(min, QUEUE_BACKEND_POS(i));
     }
     oldtailpage = QUEUE_POS_PAGE(QUEUE_TAIL);
-    QUEUE_TAIL = min;
     LWLockRelease(NotifyQueueLock);
     /*
@@ -2205,6 +2211,17 @@ asyncQueueAdvanceTail(void)
          */
         SimpleLruTruncate(NotifyCtl, newtailpage);
     }
+    /*
+     * Advertise the new tail. This changes asyncQueueIsFull()'s verdict for
+     * the segment immediately prior to the new tail, allowing fresh data into
+     * that segment.
+     */
+    LWLockAcquire(NotifyQueueLock, LW_EXCLUSIVE);
+    QUEUE_TAIL = min;
+    LWLockRelease(NotifyQueueLock);
+    LWLockRelease(NotifyQueueTailLock);
 }
 /*
......
@@ -1361,6 +1361,14 @@ vac_update_datfrozenxid(void)
     bool bogus = false;
     bool dirty = false;
+    /*
+     * Restrict this task to one backend per database. This avoids race
+     * conditions that would move datfrozenxid or datminmxid backward. It
+     * avoids calling vac_truncate_clog() with a datfrozenxid preceding a
+     * datfrozenxid passed to an earlier vac_truncate_clog() call.
+     */
+    LockDatabaseFrozenIds(ExclusiveLock);
     /*
      * Initialize the "min" calculation with
      * GetOldestNonRemovableTransactionId(), which is a reasonable
@@ -1551,6 +1559,9 @@ vac_truncate_clog(TransactionId frozenXID,
     bool bogus = false;
     bool frozenAlreadyWrapped = false;
+    /* Restrict task to one backend per cluster; see SimpleLruTruncate(). */
+    LWLockAcquire(WrapLimitsVacuumLock, LW_EXCLUSIVE);
     /* init oldest datoids to sync with my frozenXID/minMulti values */
     oldestxid_datoid = MyDatabaseId;
     minmulti_datoid = MyDatabaseId;
@@ -1660,6 +1671,8 @@ vac_truncate_clog(TransactionId frozenXID,
      */
     SetTransactionIdLimit(frozenXID, oldestxid_datoid);
     SetMultiXactIdLimit(minMulti, minmulti_datoid, false);
+    LWLockRelease(WrapLimitsVacuumLock);
 }
......
@@ -460,6 +460,21 @@ UnlockRelationForExtension(Relation relation, LOCKMODE lockmode)
     LockRelease(&tag, lockmode, false);
 }
+/*
+ * LockDatabaseFrozenIds
+ *
+ * This allows one backend per database to execute vac_update_datfrozenxid().
+ */
+void
+LockDatabaseFrozenIds(LOCKMODE lockmode)
+{
+    LOCKTAG tag;
+    SET_LOCKTAG_DATABASE_FROZEN_IDS(tag, MyDatabaseId);
+    (void) LockAcquire(&tag, lockmode, false, false);
+}
 /*
  * LockPage
  *
@@ -1098,6 +1113,11 @@ DescribeLockTag(StringInfo buf, const LOCKTAG *tag)
                 tag->locktag_field2,
                 tag->locktag_field1);
             break;
+        case LOCKTAG_DATABASE_FROZEN_IDS:
+            appendStringInfo(buf,
+                _("pg_database.datfrozenxid of database %u"),
+                tag->locktag_field1);
+            break;
         case LOCKTAG_PAGE:
             appendStringInfo(buf,
                 _("page %u of relation %u of database %u"),
......
@@ -50,3 +50,6 @@ MultiXactTruncationLock 41
 OldSnapshotTimeMapLock 42
 LogicalRepWorkerLock 43
 XactTruncationLock 44
+# 45 was XactTruncationLock until removal of BackendRandomLock
+WrapLimitsVacuumLock 46
+NotifyQueueTailLock 47
@@ -29,6 +29,7 @@
 const char *const LockTagTypeNames[] = {
     "relation",
     "extend",
+    "frozenid",
     "page",
     "tuple",
     "transactionid",
@@ -254,6 +255,17 @@ pg_lock_status(PG_FUNCTION_ARGS)
             nulls[8] = true;
             nulls[9] = true;
             break;
+        case LOCKTAG_DATABASE_FROZEN_IDS:
+            values[1] = ObjectIdGetDatum(instance->locktag.locktag_field1);
+            nulls[2] = true;
+            nulls[3] = true;
+            nulls[4] = true;
+            nulls[5] = true;
+            nulls[6] = true;
+            nulls[7] = true;
+            nulls[8] = true;
+            nulls[9] = true;
+            break;
         case LOCKTAG_PAGE:
             values[1] = ObjectIdGetDatum(instance->locktag.locktag_field1);
             values[2] = ObjectIdGetDatum(instance->locktag.locktag_field2);
......
@@ -59,6 +59,9 @@ extern bool ConditionalLockRelationForExtension(Relation relation,
     LOCKMODE lockmode);
 extern int RelationExtensionLockWaiterCount(Relation relation);
+/* Lock to recompute pg_database.datfrozenxid in the current database */
+extern void LockDatabaseFrozenIds(LOCKMODE lockmode);
 /* Lock a page (currently only used within indexes) */
 extern void LockPage(Relation relation, BlockNumber blkno, LOCKMODE lockmode);
 extern bool ConditionalLockPage(Relation relation, BlockNumber blkno, LOCKMODE lockmode);
......
@@ -138,6 +138,7 @@ typedef enum LockTagType
 {
     LOCKTAG_RELATION, /* whole relation */
     LOCKTAG_RELATION_EXTEND, /* the right to extend a relation */
+    LOCKTAG_DATABASE_FROZEN_IDS, /* pg_database.datfrozenxid */
     LOCKTAG_PAGE, /* one page of a relation */
     LOCKTAG_TUPLE, /* one physical tuple */
     LOCKTAG_TRANSACTION, /* transaction (for waiting for xact done) */
@@ -194,6 +195,15 @@ typedef struct LOCKTAG
     (locktag).locktag_type = LOCKTAG_RELATION_EXTEND, \
     (locktag).locktag_lockmethodid = DEFAULT_LOCKMETHOD)
+/* ID info for frozen IDs is DB OID */
+#define SET_LOCKTAG_DATABASE_FROZEN_IDS(locktag,dboid) \
+    ((locktag).locktag_field1 = (dboid), \
+    (locktag).locktag_field2 = 0, \
+    (locktag).locktag_field3 = 0, \
+    (locktag).locktag_field4 = 0, \
+    (locktag).locktag_type = LOCKTAG_DATABASE_FROZEN_IDS, \
+    (locktag).locktag_lockmethodid = DEFAULT_LOCKMETHOD)
 /* ID info for a page is RELATION info + BlockNumber */
 #define SET_LOCKTAG_PAGE(locktag,dboid,reloid,blocknum) \
     ((locktag).locktag_field1 = (dboid), \
......