Cleanup and code review for the patch that made bgwriter active during

archive recovery. Invent a separate state variable and inquiry function for XLogInsertAllowed() to clarify some tests and make the management of writing the end-of-recovery checkpoint less klugy. Fix several places that were incorrectly testing InRecovery when they should be looking at RecoveryInProgress or XLogInsertAllowed (because they will now be executed in the bgwriter not startup process). Clarify handling of bad LSNs passed to XLogFlush during recovery. Use a spinlock for setting/testing SharedRecoveryInProgress. Improve quite a lot of comments. Heikki and Tom

Cleanup and code review for the patch that made bgwriter active during
archive recovery. Invent a separate state variable and inquiry function for XLogInsertAllowed() to clarify some tests and make the management of writing the end-of-recovery checkpoint less klugy. Fix several places that were incorrectly testing InRecovery when they should be looking at RecoveryInProgress or XLogInsertAllowed (because they will now be executed in the bgwriter not startup process). Clarify handling of bad LSNs passed to XLogFlush during recovery. Use a spinlock for setting/testing SharedRecoveryInProgress. Improve quite a lot of comments. Heikki and Tom
2de48a83 · Tom Lane · a6667d96 · 2de48a83 · 2de48a83 · 2de48a83
Commit 2de48a83 authored Jun 26, 2009 by Tom Lane
6 changed files
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -42,7 +42,7 @@
 * Portions Copyright (c) 1996-2009, PostgreSQL Global Development Group
 * Portions Copyright (c) 1994, Regents of the University of California
 *
- * $PostgreSQL: pgsql/src/backend/access/transam/multixact.c,v 1.30 2009/01/20 18:59:37 heikki Exp $
+ * $PostgreSQL: pgsql/src/backend/access/transam/multixact.c,v 1.31 2009/06/26 20:29:04 tgl Exp $
 *
 *-------------------------------------------------------------------------
 */
@@ -1543,7 +1543,7 @@ CheckPointMultiXact(void)
 	 * SimpleLruTruncate would get confused.  It seems best not to risk
 	 * removing any data during recovery anyway, so don't truncate.
 	 */
-	if (!InRecovery)
+	if (!RecoveryInProgress())
 		TruncateMultiXact();
 	TRACE_POSTGRESQL_MULTIXACT_CHECKPOINT_DONE(true);

--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
--- a/src/backend/postmaster/bgwriter.c
+++ b/src/backend/postmaster/bgwriter.c
@@ -19,7 +19,8 @@
 * condition.)
 *
 * The bgwriter is started by the postmaster as soon as the startup subprocess
- * finishes.  It remains alive until the postmaster commands it to terminate.
+ * finishes, or as soon as recovery begins if we are doing archive recovery.
+ * It remains alive until the postmaster commands it to terminate.
 * Normal termination is by SIGUSR2, which instructs the bgwriter to execute
 * a shutdown checkpoint and then exit(0).	(All backends must be stopped
 * before SIGUSR2 is issued!)  Emergency termination is by SIGQUIT; like any
@@ -37,7 +38,7 @@
 *
 *
 * IDENTIFICATION
- *	  $PostgreSQL: pgsql/src/backend/postmaster/bgwriter.c,v 1.61 2009/06/25 21:36:00 heikki Exp $
+ *	  $PostgreSQL: pgsql/src/backend/postmaster/bgwriter.c,v 1.62 2009/06/26 20:29:04 tgl Exp $
 *
 *-------------------------------------------------------------------------
 */
@@ -902,11 +903,11 @@ BgWriterShmemInit(void)
 *
 * flags is a bitwise OR of the following:
 *	CHECKPOINT_IS_SHUTDOWN: checkpoint is for database shutdown.
- *	CHECKPOINT_END_OF_RECOVERY: checkpoint is to finish WAL recovery.
+ *	CHECKPOINT_END_OF_RECOVERY: checkpoint is for end of WAL recovery.
 *	CHECKPOINT_IMMEDIATE: finish the checkpoint ASAP,
 *		ignoring checkpoint_completion_target parameter.
 *	CHECKPOINT_FORCE: force a checkpoint even if no XLOG activity has occured
- *		since the last one (implied by CHECKPOINT_IS_SHUTDOWN and
+ *		since the last one (implied by CHECKPOINT_IS_SHUTDOWN or
 *		CHECKPOINT_END_OF_RECOVERY).
 *	CHECKPOINT_WAIT: wait for completion before returning (otherwise,
 *		just signal bgwriter to do it, and return).

--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -37,7 +37,7 @@
 *
 *
 * IDENTIFICATION
- *	  $PostgreSQL: pgsql/src/backend/postmaster/postmaster.c,v 1.582 2009/06/11 14:49:01 momjian Exp $
+ *	  $PostgreSQL: pgsql/src/backend/postmaster/postmaster.c,v 1.583 2009/06/26 20:29:04 tgl Exp $
 *
 * NOTES
 *
@@ -227,21 +227,22 @@ static bool RecoveryError = false;		/* T if WAL recovery failed */
 *
 * After doing all the postmaster initialization work, we enter PM_STARTUP
 * state and the startup process is launched. The startup process begins by
- * reading the control file and other preliminary initialization steps. When
+ * reading the control file and other preliminary initialization steps.
- * it's ready to start WAL redo, it signals postmaster, and we switch to
+ * In a normal startup, or after crash recovery, the startup process exits
- * PM_RECOVERY phase. The background writer is launched, while the startup
+ * with exit code 0 and we switch to PM_RUN state.  However, archive recovery
- * process continues applying WAL.
+ * is handled specially since it takes much longer and we would like to support
+ * hot standby during archive recovery.
 *
+ * When the startup process is ready to start archive recovery, it signals the
+ * postmaster, and we switch to PM_RECOVERY state. The background writer is
+ * launched, while the startup process continues applying WAL.
 * After reaching a consistent point in WAL redo, startup process signals
- * us again, and we switch to PM_RECOVERY_CONSISTENT phase. There's currently
+ * us again, and we switch to PM_RECOVERY_CONSISTENT state. There's currently
 * no difference between PM_RECOVERY and PM_RECOVERY_CONSISTENT, but we
 * could start accepting connections to perform read-only queries at this
 * point, if we had the infrastructure to do that.
- *
+ * When archive recovery is finished, the startup process exits with exit
- * When WAL redo is finished, the startup process exits with exit code 0
+ * code 0 and we switch to PM_RUN state.
- * and we switch to PM_RUN state. Startup process can also skip the
- * recovery and consistent recovery phases altogether, as it will during
- * normal startup when there's no recovery to be done, for example.
 *
 * Normal child backends can only be launched when we are in PM_RUN state.
 * (We also allow it in PM_WAIT_BACKUP state, but only for superusers.)
@@ -269,7 +270,7 @@ typedef enum
 {
 	PM_INIT,					/* postmaster starting */
 	PM_STARTUP,					/* waiting for startup subprocess */
-	PM_RECOVERY,				/* in recovery mode */
+	PM_RECOVERY,				/* in archive recovery mode */
 	PM_RECOVERY_CONSISTENT,		/* consistent recovery mode */
 	PM_RUN,						/* normal "database is alive" state */
 	PM_WAIT_BACKUP,				/* waiting for online backup mode to end */
@@ -2195,8 +2196,8 @@ reaper(SIGNAL_ARGS)
 			/*
 			 * Unexpected exit of startup process (including FATAL exit)
-			 * during PM_STARTUP is treated as catastrophic. There is no other
+			 * during PM_STARTUP is treated as catastrophic. There are no
-			 * processes running yet, so we can just exit.
+			 * other processes running yet, so we can just exit.
 			 */
 			if (pmState == PM_STARTUP && !EXIT_STATUS_0(exitstatus))
 			{
@@ -2247,7 +2248,7 @@ reaper(SIGNAL_ARGS)
 			/*
 			 * Crank up the background writer, if we didn't do that already
-			 * when we entered consistent recovery phase.  It doesn't matter
+			 * when we entered consistent recovery state.  It doesn't matter
 			 * if this fails, we'll just try again later.
 			 */
 			if (BgWriterPID == 0)
@@ -4008,7 +4009,7 @@ sigusr1_handler(SIGNAL_ARGS)
 		/*
 		 * Load the flat authorization file into postmaster's cache. The
 		 * startup process won't have recomputed this from the database yet,
-		 * so we it may change following recovery.
+		 * so it may change following recovery.
 		 */
 		load_role();

--- a/src/backend/storage/smgr/md.c
+++ b/src/backend/storage/smgr/md.c
@@ -8,7 +8,7 @@
 *
 *
 * IDENTIFICATION
- *	  $PostgreSQL: pgsql/src/backend/storage/smgr/md.c,v 1.147 2009/06/25 21:36:00 heikki Exp $
+ *	  $PostgreSQL: pgsql/src/backend/storage/smgr/md.c,v 1.148 2009/06/26 20:29:04 tgl Exp $
 *
 *-------------------------------------------------------------------------
 */
@@ -204,10 +204,10 @@ mdinit(void)
 }
 /*
- * In archive recovery, we rely on bgwriter to do fsyncs(), but we don't
+ * In archive recovery, we rely on bgwriter to do fsyncs, but we will have
- * know that we do archive recovery at process startup when pendingOpsTable
+ * already created the pendingOpsTable during initialization of the startup
- * has already been created. Calling this function drops pendingOpsTable
+ * process.  Calling this function drops the local pendingOpsTable so that
- * and causes any subsequent requests to be forwarded to bgwriter.
+ * subsequent requests will be forwarded to bgwriter.
 */
 void
 SetForwardFsyncRequests(void)

--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -6,7 +6,7 @@
 * Portions Copyright (c) 1996-2009, PostgreSQL Global Development Group
 * Portions Copyright (c) 1994, Regents of the University of California
 *
- * $PostgreSQL: pgsql/src/include/access/xlog.h,v 1.92 2009/06/25 21:36:00 heikki Exp $
+ * $PostgreSQL: pgsql/src/include/access/xlog.h,v 1.93 2009/06/26 20:29:04 tgl Exp $
 */
 #ifndef XLOG_H
 #define XLOG_H
@@ -159,15 +159,15 @@ extern bool XLOG_DEBUG;
 /* These directly affect the behavior of CreateCheckPoint and subsidiaries */
 #define CHECKPOINT_IS_SHUTDOWN	0x0001	/* Checkpoint is for shutdown */
-#define CHECKPOINT_IMMEDIATE	0x0002	/* Do it without delays */
+#define CHECKPOINT_END_OF_RECOVERY	0x0002	/* Like shutdown checkpoint, but
-#define CHECKPOINT_FORCE		0x0004	/* Force even if no activity */
+											 * issued at end of WAL recovery */
+#define CHECKPOINT_IMMEDIATE	0x0004	/* Do it without delays */
+#define CHECKPOINT_FORCE		0x0008	/* Force even if no activity */
 /* These are important to RequestCheckpoint */
-#define CHECKPOINT_WAIT			0x0008	/* Wait for completion */
+#define CHECKPOINT_WAIT			0x0010	/* Wait for completion */
 /* These indicate the cause of a checkpoint request */
-#define CHECKPOINT_CAUSE_XLOG	0x0010	/* XLOG consumption */
+#define CHECKPOINT_CAUSE_XLOG	0x0020	/* XLOG consumption */
-#define CHECKPOINT_CAUSE_TIME	0x0020	/* Elapsed time */
+#define CHECKPOINT_CAUSE_TIME	0x0040	/* Elapsed time */
-#define CHECKPOINT_END_OF_RECOVERY	0x0040	/* Like shutdown checkpoint, but
-											 * issued at end of WAL recovery */
 /* Checkpoint statistics */
 typedef struct CheckpointStatsData
@@ -202,6 +202,7 @@ extern void xlog_redo(XLogRecPtr lsn, XLogRecord *record);
 extern void xlog_desc(StringInfo buf, uint8 xl_info, char *rec);
 extern bool RecoveryInProgress(void);
+extern bool XLogInsertAllowed(void);
 extern void UpdateControlFile(void);
 extern Size XLOGShmemSize(void);