Commit 3cd1ba14 authored by Noah Misch

Fix comments about WAL rule "write xlog before data" versus pg_multixact.

Recovery does not achieve its goal of zeroing all pg_multixact entries
whose accompanying WAL records never reached disk.  Remove that claim
and justify its expendability.  Detail the need for TrimMultiXact(),
which has little in common with the TrimCLOG() rationale.  Merge two
tightly-related comments.  Stop presenting pg_multixact as specific to
heap_lock_tuple(); PostgreSQL 9.3 extended its use to heap_update().

Noticed while investigating a report from Andres Freund.
parent 253de19b
@@ -24,17 +24,21 @@
  * since it would get completely confused if someone inquired about a bogus
  * MultiXactId that pointed to an intermediate slot containing an XID.)
  *
- * XLOG interactions: this module generates an XLOG record whenever a new
- * OFFSETs or MEMBERs page is initialized to zeroes, as well as an XLOG record
- * whenever a new MultiXactId is defined. This allows us to completely
- * rebuild the data entered since the last checkpoint during XLOG replay.
- * Because this is possible, we need not follow the normal rule of
- * "write WAL before data"; the only correctness guarantee needed is that
- * we flush and sync all dirty OFFSETs and MEMBERs pages to disk before a
- * checkpoint is considered complete. If a page does make it to disk ahead
- * of corresponding WAL records, it will be forcibly zeroed before use anyway.
- * Therefore, we don't need to mark our pages with LSN information; we have
- * enough synchronization already.
+ * XLOG interactions: this module generates a record whenever a new OFFSETs or
+ * MEMBERs page is initialized to zeroes, as well as an
+ * XLOG_MULTIXACT_CREATE_ID record whenever a new MultiXactId is defined.
+ * This module ignores the WAL rule "write xlog before data," because it
+ * suffices that actions recording a MultiXactId in a heap xmax do follow that
+ * rule. The only way for the MXID to be referenced from any data page is for
+ * heap_lock_tuple() or heap_update() to have put it there, and each generates
+ * an XLOG record that must follow ours. The normal LSN interlock between the
+ * data page and that XLOG record will ensure that our XLOG record reaches
+ * disk first. If the SLRU members/offsets data reaches disk sooner than the
+ * XLOG records, we do not care; after recovery, no xmax will refer to it. On
+ * the flip side, to ensure that all referenced entries _do_ reach disk, this
+ * module's XLOG records completely rebuild the data entered since the last
+ * checkpoint. We flush and sync all dirty OFFSETs and MEMBERs pages to disk
+ * before each checkpoint is considered complete.
  *
  * Like clog.c, and unlike subtrans.c, we have to preserve state across
  * crashes and ensure that MXID and offset numbering increases monotonically
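The LSN interlock argument in the new header comment can be illustrated with a toy model. This is a minimal sketch, not PostgreSQL code: every type and function name below is hypothetical, and it assumes only that WAL is flushed sequentially and that a heap page is written after the log has been flushed through that page's LSN.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model only; none of these names exist in PostgreSQL. */
typedef uint64_t ToyLsn;

typedef struct ToyWal
{
	ToyLsn		insert_lsn;		/* LSN of the most recently inserted record */
	ToyLsn		flushed_lsn;	/* every record with LSN <= this is on disk */
} ToyWal;

typedef struct ToyHeapPage
{
	ToyLsn		page_lsn;		/* LSN of the last record that touched the page */
	bool		on_disk;
} ToyHeapPage;

/* Append a record and return its LSN; nothing is flushed yet. */
static ToyLsn
toy_wal_insert(ToyWal *wal)
{
	return ++wal->insert_lsn;
}

/* Flush the log sequentially, up to and including "upto". */
static void
toy_wal_flush(ToyWal *wal, ToyLsn upto)
{
	assert(upto <= wal->insert_lsn);
	if (upto > wal->flushed_lsn)
		wal->flushed_lsn = upto;
}

static bool
toy_record_is_durable(const ToyWal *wal, ToyLsn lsn)
{
	return lsn <= wal->flushed_lsn;
}

/* "Write xlog before data": flush WAL through the page LSN, then the page. */
static void
toy_write_heap_page(ToyWal *wal, ToyHeapPage *page)
{
	toy_wal_flush(wal, page->page_lsn);
	page->on_disk = true;
}

/*
 * Create a MultiXactId (record A, never flushed explicitly), then store it
 * in a tuple's xmax (record B, which stamps the heap page).  Returns true
 * if A is durable by the time the heap page is on disk.
 */
static bool
toy_mxid_durable_before_xmax(void)
{
	ToyWal		wal = {0, 0};
	ToyHeapPage page = {0, false};
	ToyLsn		create_lsn = toy_wal_insert(&wal);	/* MXID creation record */
	ToyLsn		heap_lsn = toy_wal_insert(&wal);	/* heap lock/update record */

	page.page_lsn = heap_lsn;
	if (toy_record_is_durable(&wal, create_lsn))
		return false;			/* nothing has been flushed at this point */

	toy_write_heap_page(&wal, &page);
	return page.on_disk && create_lsn < heap_lsn &&
		toy_record_is_durable(&wal, create_lsn);
}
```

Because the creation record always has the smaller LSN, flushing through the heap record's LSN covers it too, which is why the module itself never needs to force a flush in this model.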
@@ -795,19 +799,7 @@ MultiXactIdCreateFromMembers(int nmembers, MultiXactMember *members)
 	 */
 	multi = GetNewMultiXactId(nmembers, &offset);
 
-	/*
-	 * Make an XLOG entry describing the new MXID.
-	 *
-	 * Note: we need not flush this XLOG entry to disk before proceeding. The
-	 * only way for the MXID to be referenced from any data page is for
-	 * heap_lock_tuple() to have put it there, and heap_lock_tuple() generates
-	 * an XLOG record that must follow ours. The normal LSN interlock between
-	 * the data page and that XLOG record will ensure that our XLOG record
-	 * reaches disk first. If the SLRU members/offsets data reaches disk
-	 * sooner than the XLOG record, we do not care because we'll overwrite it
-	 * with zeroes unless the XLOG record is there too; see notes at top of
-	 * this file.
-	 */
+	/* Make an XLOG entry describing the new MXID. */
 	xlrec.mid = multi;
 	xlrec.moff = offset;
 	xlrec.nmembers = nmembers;
@@ -2037,7 +2029,11 @@ TrimMultiXact(void)
 
 	/*
 	 * Zero out the remainder of the current offsets page. See notes in
-	 * TrimCLOG() for motivation.
+	 * TrimCLOG() for background. Unlike CLOG, some WAL record covers every
+	 * pg_multixact SLRU mutation. Since, also unlike CLOG, we ignore the WAL
+	 * rule "write xlog before data," nextMXact successors may carry obsolete,
+	 * nonzero offset values. Zero those so case 2 of GetMultiXactIdMembers()
+	 * operates normally.
 	 */
 	entryno = MultiXactIdToOffsetEntry(nextMXact);
 	if (entryno != 0)
......
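The trimming step described in the TrimMultiXact() comment can be sketched as follows. This is a toy illustration, not the actual function body (which is elided above): the page size, the type names, and the helper names are all hypothetical, and it assumes that a zero offset is how a not-yet-written successor entry is recognized.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_OFFSETS_PER_PAGE 8	/* hypothetical; real pages hold far more */

typedef uint32_t ToyMultiXactId;
typedef uint32_t ToyOffset;

/* Position of an MXID's offset entry within its page (toy mapping). */
static size_t
toy_offset_entry(ToyMultiXactId mxid)
{
	return mxid % TOY_OFFSETS_PER_PAGE;
}

/*
 * Zero the entries from next_mxact's slot through the end of its page, so
 * that stale offsets written ahead of their WAL records cannot be mistaken
 * for valid ones.  Returns the number of entries zeroed.
 */
static size_t
toy_trim_offsets_page(ToyOffset page[TOY_OFFSETS_PER_PAGE],
					  ToyMultiXactId next_mxact)
{
	size_t		entryno = toy_offset_entry(next_mxact);
	size_t		nzeroed;

	if (entryno == 0)
		return 0;				/* next_mxact starts a page; nothing to trim */

	nzeroed = TOY_OFFSETS_PER_PAGE - entryno;
	memset(&page[entryno], 0, nzeroed * sizeof(ToyOffset));
	return nzeroed;
}
```

Entries below next_mxact's slot are left untouched; only the slot itself and its successors on the same page are cleared.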