Commit 6dced63b authored by Michael Paquier's avatar Michael Paquier

Fix control file update done in restartpoints still running after promotion

If a cluster is promoted (aka the control file shows a state different
than DB_IN_ARCHIVE_RECOVERY) while CreateRestartPoint() is still
processing, this function could miss an update of the control file for
"checkPoint" and "checkPointCopy" but still do the recycling and/or
removal of the past WAL segments, assuming that the to-be-updated LSN
values should be used as reference points for the cleanup.  This causes
a follow-up restart attempting crash recovery to fail with a PANIC on a
missing checkpoint record if the end-of-recovery checkpoint triggered by
the promotion did not complete while the cluster abruptly stopped or
crashed before the completion of this checkpoint.  The PANIC would be
caused by the redo LSN referred in the control file as located in a
segment already gone, recycled by the previous restartpoint with
"checkPoint" out-of-sync in the control file.

This commit fixes the update of the control file during restartpoints so
as "checkPoint" and "checkPointCopy" are updated even if the cluster has
been promoted while a restartpoint is running, to be on par with the set
of WAL segments actually recycled in the end of CreateRestartPoint().

7863ee4 has fixed this problem already on master, but the release timing
of the latest point versions did not let me enough time to study and fix
that on all the stable branches.

Reported-by: Fujii Masao, Rui Zhao
Author: Kyotaro Horiguchi
Reviewed-by: Nathan Bossart, Michael Paquier
Discussion: https://postgr.es/m/20220316.102444.2193181487576617583.horikyota.ntt@gmail.com
Backpatch-through: 10
parent ac51c9fb
...@@ -9692,21 +9692,26 @@ CreateRestartPoint(int flags) ...@@ -9692,21 +9692,26 @@ CreateRestartPoint(int flags)
PriorRedoPtr = ControlFile->checkPointCopy.redo; PriorRedoPtr = ControlFile->checkPointCopy.redo;
/* /*
* Update pg_control, using current time. Check that it still shows * Update pg_control, using current time. Check that it still shows an
* DB_IN_ARCHIVE_RECOVERY state and an older checkpoint, else do nothing; * older checkpoint, else do nothing; this is a quick hack to make sure
* this is a quick hack to make sure nothing really bad happens if somehow * nothing really bad happens if somehow we get here after the
* we get here after the end-of-recovery checkpoint. * end-of-recovery checkpoint.
*/ */
LWLockAcquire(ControlFileLock, LW_EXCLUSIVE); LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
if (ControlFile->state == DB_IN_ARCHIVE_RECOVERY && if (ControlFile->checkPointCopy.redo < lastCheckPoint.redo)
ControlFile->checkPointCopy.redo < lastCheckPoint.redo)
{ {
/*
* Update the checkpoint information. We do this even if the cluster
* does not show DB_IN_ARCHIVE_RECOVERY to match with the set of WAL
* segments recycled below.
*/
ControlFile->checkPoint = lastCheckPointRecPtr; ControlFile->checkPoint = lastCheckPointRecPtr;
ControlFile->checkPointCopy = lastCheckPoint; ControlFile->checkPointCopy = lastCheckPoint;
ControlFile->time = (pg_time_t) time(NULL); ControlFile->time = (pg_time_t) time(NULL);
/* /*
* Ensure minRecoveryPoint is past the checkpoint record. Normally, * Ensure minRecoveryPoint is past the checkpoint record and update it
* if the control file still shows DB_IN_ARCHIVE_RECOVERY. Normally,
* this will have happened already while writing out dirty buffers, * this will have happened already while writing out dirty buffers,
* but not necessarily - e.g. because no buffers were dirtied. We do * but not necessarily - e.g. because no buffers were dirtied. We do
* this because a non-exclusive base backup uses minRecoveryPoint to * this because a non-exclusive base backup uses minRecoveryPoint to
...@@ -9715,7 +9720,15 @@ CreateRestartPoint(int flags) ...@@ -9715,7 +9720,15 @@ CreateRestartPoint(int flags)
* at a minimum. Note that for an ordinary restart of recovery there's * at a minimum. Note that for an ordinary restart of recovery there's
* no value in having the minimum recovery point any earlier than this * no value in having the minimum recovery point any earlier than this
* anyway, because redo will begin just after the checkpoint record. * anyway, because redo will begin just after the checkpoint record.
* this because a non-exclusive base backup uses minRecoveryPoint to
* determine which WAL files must be included in the backup, and the
* file (or files) containing the checkpoint record must be included,
* at a minimum. Note that for an ordinary restart of recovery there's
* no value in having the minimum recovery point any earlier than this
* anyway, because redo will begin just after the checkpoint record.
*/ */
if (ControlFile->state == DB_IN_ARCHIVE_RECOVERY)
{
if (ControlFile->minRecoveryPoint < lastCheckPointEndPtr) if (ControlFile->minRecoveryPoint < lastCheckPointEndPtr)
{ {
ControlFile->minRecoveryPoint = lastCheckPointEndPtr; ControlFile->minRecoveryPoint = lastCheckPointEndPtr;
...@@ -9727,6 +9740,7 @@ CreateRestartPoint(int flags) ...@@ -9727,6 +9740,7 @@ CreateRestartPoint(int flags)
} }
if (flags & CHECKPOINT_IS_SHUTDOWN) if (flags & CHECKPOINT_IS_SHUTDOWN)
ControlFile->state = DB_SHUTDOWNED_IN_RECOVERY; ControlFile->state = DB_SHUTDOWNED_IN_RECOVERY;
}
UpdateControlFile(); UpdateControlFile();
} }
LWLockRelease(ControlFileLock); LWLockRelease(ControlFileLock);
......
Markdown is supported
0% or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment