Commit 1e614803 authored by Alvaro Herrera

Allow walreceiver configuration to change on reload

The parameters primary_conninfo, primary_slot_name and
wal_receiver_create_temp_slot can now be changed with a simple "reload"
signal, no longer requiring a server restart.  This is achieved by
signalling the walreceiver process to terminate and having it start
again with the new values.

Thanks to Andres Freund, Kyotaro Horiguchi, Fujii Masao for discussion.

Author: Sergei Kornilov <sk@zsrv.org>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://postgr.es/m/19513901543181143@sas1-19a94364928d.qloud-c.yandex.net
parent 092c6936
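
The restart decision described in the commit message can be illustrated with a small standalone sketch. The `RecvSettings` struct and the `needs_walreceiver_restart` function are hypothetical names used only for illustration; the patch itself compares the GUC variables before and after `ProcessConfigFile(PGC_SIGHUP)` inside the startup process. The sketch mirrors the rule that a change to wal_receiver_create_temp_slot only matters when no permanent slot is configured.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical snapshot of the three reloadable walreceiver settings. */
typedef struct RecvSettings
{
    const char *conninfo;          /* models primary_conninfo */
    const char *slot_name;         /* models primary_slot_name */
    bool        create_temp_slot;  /* models wal_receiver_create_temp_slot */
} RecvSettings;

/*
 * Return true if moving from "before" to "after" should trigger a
 * walreceiver restart. This is an illustrative model, not PostgreSQL code.
 */
bool
needs_walreceiver_restart(const RecvSettings *before, const RecvSettings *after)
{
    bool conninfo_changed = strcmp(before->conninfo, after->conninfo) != 0;
    bool slot_changed = strcmp(before->slot_name, after->slot_name) != 0;
    bool temp_slot_changed = false;

    /*
     * The temporary-slot setting is only used when no permanent slot is
     * configured, so a change is ignored when it has no effect.
     */
    if (!slot_changed && strcmp(after->slot_name, "") == 0)
        temp_slot_changed = before->create_temp_slot != after->create_temp_slot;

    return conninfo_changed || slot_changed || temp_slot_changed;
}
```

Under this model, flipping only the temporary-slot flag while a named slot is configured returns false, whereas changing the connection string always returns true.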
@@ -4028,7 +4028,12 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
  <varname>primary_conninfo</varname> string.
  </para>
  <para>
- This parameter can only be set at server start.
+ This parameter can only be set in the <filename>postgresql.conf</filename>
+ file or on the server command line.
+ If this parameter is changed while the WAL receiver process is
+ running, that process is signalled to shut down and expected to
+ restart with the new setting (except if <varname>primary_conninfo</varname>
+ is an empty string).
  This setting has no effect if the server is not in standby mode.
  </para>
  </listitem>
@@ -4045,9 +4050,13 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
  connecting to the sending server via streaming replication to control
  resource removal on the upstream node
  (see <xref linkend="streaming-replication-slots"/>).
- This parameter can only be set at server start.
+ This parameter can only be set in the <filename>postgresql.conf</filename>
+ file or on the server command line.
+ If this parameter is changed while the WAL receiver process is running,
+ that process is signalled to shut down and expected to restart with the
+ new setting.
  This setting has no effect if <varname>primary_conninfo</varname> is not
- set.
+ set or the server is not in standby mode.
  </para>
  </listitem>
  </varlistentry>
@@ -4160,10 +4169,14 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
  </term>
  <listitem>
  <para>
- Specifies whether a WAL receiver should create a temporary replication
+ Specifies whether the WAL receiver process should create a temporary replication
  slot on the remote instance when no permanent replication slot to use
  has been configured (using <xref linkend="guc-primary-slot-name"/>).
- The default is off. This parameter can only be set at server start.
+ The default is off. This parameter can only be set in the
+ <filename>postgresql.conf</filename> file or on the server command line.
+ If this parameter is changed while the WAL receiver process is running,
+ that process is signalled to shut down and expected to restart with
+ the new setting.
  </para>
  </listitem>
  </varlistentry>
......
@@ -816,8 +816,8 @@ archive_cleanup_command = 'pg_archivecleanup /path/to/archive %r'
  When the standby is started and <varname>primary_conninfo</varname> is set
  correctly, the standby will connect to the primary after replaying all
  WAL files available in the archive. If the connection is established
- successfully, you will see a walreceiver process in the standby, and
- a corresponding walsender process in the primary.
+ successfully, you will see a <literal>walreceiver</literal> in the standby, and
+ a corresponding <literal>walsender</literal> process in the primary.
  </para>
  <sect3 id="streaming-replication-authentication">
......
@@ -816,9 +816,13 @@ static XLogSource readSource = XLOG_FROM_ANY;
  * currently have a WAL file open. If lastSourceFailed is set, our last
  * attempt to read from currentSource failed, and we should try another source
  * next.
+ *
+ * pendingWalRcvRestart is set when a config change occurs that requires a
+ * walreceiver restart. This is only valid in XLOG_FROM_STREAM state.
  */
  static XLogSource currentSource = XLOG_FROM_ANY;
  static bool lastSourceFailed = false;
+ static bool pendingWalRcvRestart = false;
  typedef struct XLogPageReadPrivate
  {
@@ -11905,6 +11909,7 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
  for (;;)
  {
  XLogSource oldSource = currentSource;
+ bool startWalReceiver = false;
  /*
  * First check if we failed to read from the current source, and
@@ -11939,54 +11944,11 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
  return false;
  /*
- * If primary_conninfo is set, launch walreceiver to try
- * to stream the missing WAL.
- *
- * If fetching_ckpt is true, RecPtr points to the initial
- * checkpoint location. In that case, we use RedoStartLSN
- * as the streaming start position instead of RecPtr, so
- * that when we later jump backwards to start redo at
- * RedoStartLSN, we will have the logs streamed already.
- */
- if (PrimaryConnInfo && strcmp(PrimaryConnInfo, "") != 0)
- {
- XLogRecPtr ptr;
- TimeLineID tli;
- if (fetching_ckpt)
- {
- ptr = RedoStartLSN;
- tli = ControlFile->checkPointCopy.ThisTimeLineID;
- }
- else
- {
- ptr = RecPtr;
- /*
- * Use the record begin position to determine the
- * TLI, rather than the position we're reading.
- */
- tli = tliOfPointInHistory(tliRecPtr, expectedTLEs);
- if (curFileTLI > 0 && tli < curFileTLI)
- elog(ERROR, "according to history file, WAL location %X/%X belongs to timeline %u, but previous recovered WAL file came from timeline %u",
- (uint32) (tliRecPtr >> 32),
- (uint32) tliRecPtr,
- tli, curFileTLI);
- }
- curFileTLI = tli;
- RequestXLogStreaming(tli, ptr, PrimaryConnInfo,
- PrimarySlotName,
- wal_receiver_create_temp_slot);
- receivedUpto = 0;
- }
- /*
- * Move to XLOG_FROM_STREAM state in either case. We'll
- * get immediate failure if we didn't launch walreceiver,
- * and move on to the next state.
+ * Move to XLOG_FROM_STREAM state, and set to start a
+ * walreceiver if necessary.
  */
  currentSource = XLOG_FROM_STREAM;
+ startWalReceiver = true;
  break;
  case XLOG_FROM_STREAM:
@@ -12138,7 +12100,71 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
  Assert(StandbyMode);
  /*
- * Check if WAL receiver is still active.
+ * First, shutdown walreceiver if its restart has been
+ * requested -- but no point if we're already slated for
+ * starting it.
+ */
+ if (pendingWalRcvRestart && !startWalReceiver)
+ {
+ ShutdownWalRcv();
+ /*
+ * Re-scan for possible new timelines if we were
+ * requested to recover to the latest timeline.
+ */
+ if (recoveryTargetTimeLineGoal ==
+ RECOVERY_TARGET_TIMELINE_LATEST)
+ rescanLatestTimeLine();
+ startWalReceiver = true;
+ }
+ pendingWalRcvRestart = false;
+ /*
+ * Launch walreceiver if needed.
+ *
+ * If fetching_ckpt is true, RecPtr points to the initial
+ * checkpoint location. In that case, we use RedoStartLSN
+ * as the streaming start position instead of RecPtr, so
+ * that when we later jump backwards to start redo at
+ * RedoStartLSN, we will have the logs streamed already.
+ */
+ if (startWalReceiver &&
+ PrimaryConnInfo && strcmp(PrimaryConnInfo, "") != 0)
+ {
+ XLogRecPtr ptr;
+ TimeLineID tli;
+ if (fetching_ckpt)
+ {
+ ptr = RedoStartLSN;
+ tli = ControlFile->checkPointCopy.ThisTimeLineID;
+ }
+ else
+ {
+ ptr = RecPtr;
+ /*
+ * Use the record begin position to determine the
+ * TLI, rather than the position we're reading.
+ */
+ tli = tliOfPointInHistory(tliRecPtr, expectedTLEs);
+ if (curFileTLI > 0 && tli < curFileTLI)
+ elog(ERROR, "according to history file, WAL location %X/%X belongs to timeline %u, but previous recovered WAL file came from timeline %u",
+ (uint32) (tliRecPtr >> 32),
+ (uint32) tliRecPtr,
+ tli, curFileTLI);
+ }
+ curFileTLI = tli;
+ RequestXLogStreaming(tli, ptr, PrimaryConnInfo,
+ PrimarySlotName,
+ wal_receiver_create_temp_slot);
+ receivedUpto = 0;
+ }
+ /*
+ * Check if WAL receiver is active or wait to start up.
  */
  if (!WalRcvStreaming())
  {
@@ -12266,6 +12292,22 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
  return false; /* not reached */
  }
+ /*
+ * Set flag to signal the walreceiver to restart. (The startup process calls
+ * this on noticing a relevant configuration change.)
+ */
+ void
+ StartupRequestWalReceiverRestart(void)
+ {
+ if (currentSource == XLOG_FROM_STREAM && WalRcvRunning())
+ {
+ ereport(LOG,
+ (errmsg("wal receiver process shutdown requested")));
+ pendingWalRcvRestart = true;
+ }
+ }
  /*
  * Determine what log level should be used to report a corrupt WAL record
  * in the current WAL page, previously read by XLogPageRead().
......
@@ -585,9 +585,9 @@ ReadPageInternal(XLogReaderState *state, XLogRecPtr pageptr, int reqLen)
  /*
  * Data is not in our buffer.
  *
- * Every time we actually read the page, even if we looked at parts of it
- * before, we need to do verification as the read_page callback might now
- * be rereading data from a different source.
+ * Every time we actually read the segment, even if we looked at parts of
+ * it before, we need to do verification as the read_page callback might
+ * now be rereading data from a different source.
  *
  * Whenever switching to a new WAL segment, we read the first page of the
  * file and validate its header, even if that's not where the target
......
@@ -96,17 +96,51 @@ StartupProcShutdownHandler(SIGNAL_ARGS)
  errno = save_errno;
  }
+ /*
+ * Re-read the config file.
+ *
+ * If one of the critical walreceiver options has changed, flag xlog.c
+ * to restart it.
+ */
+ static void
+ StartupRereadConfig(void)
+ {
+ char *conninfo = pstrdup(PrimaryConnInfo);
+ char *slotname = pstrdup(PrimarySlotName);
+ bool tempSlot = wal_receiver_create_temp_slot;
+ bool conninfoChanged;
+ bool slotnameChanged;
+ bool tempSlotChanged = false;
+ ProcessConfigFile(PGC_SIGHUP);
+ conninfoChanged = strcmp(conninfo, PrimaryConnInfo) != 0;
+ slotnameChanged = strcmp(slotname, PrimarySlotName) != 0;
+ /*
+ * wal_receiver_create_temp_slot is used only when we have no slot
+ * configured. We do not need to track this change if it has no effect.
+ */
+ if (!slotnameChanged && strcmp(PrimarySlotName, "") == 0)
+ tempSlotChanged = tempSlot != wal_receiver_create_temp_slot;
+ pfree(conninfo);
+ pfree(slotname);
+ if (conninfoChanged || slotnameChanged || tempSlotChanged)
+ StartupRequestWalReceiverRestart();
+ }
  /* Handle various signals that might be sent to the startup process */
  void
  HandleStartupProcInterrupts(void)
  {
  /*
- * Check if we were requested to re-read config file.
+ * Process any requests or signals received recently.
  */
  if (got_SIGHUP)
  {
  got_SIGHUP = false;
- ProcessConfigFile(PGC_SIGHUP);
+ StartupRereadConfig();
  }
  /*
......
@@ -679,7 +679,11 @@ WalRcvWaitForStartPosition(XLogRecPtr *startpoint, TimeLineID *startpointTLI)
  walrcv->walRcvState == WALRCV_STOPPING);
  if (walrcv->walRcvState == WALRCV_RESTARTING)
  {
- /* we don't expect primary_conninfo to change */
+ /*
+ * No need to handle changes in primary_conninfo or
+ * primary_slotname here. Startup process will signal us to
+ * terminate in case those change.
+ */
  *startpoint = walrcv->receiveStart;
  *startpointTLI = walrcv->receiveStartTLI;
  walrcv->walRcvState = WALRCV_STREAMING;
......
@@ -2050,7 +2050,7 @@ static struct config_bool ConfigureNamesBool[] =
  },
  {
- {"wal_receiver_create_temp_slot", PGC_POSTMASTER, REPLICATION_STANDBY,
+ {"wal_receiver_create_temp_slot", PGC_SIGHUP, REPLICATION_STANDBY,
  gettext_noop("Sets whether a WAL receiver should create a temporary replication slot if no permanent slot is configured."),
  },
  &wal_receiver_create_temp_slot,
@@ -3717,7 +3717,7 @@ static struct config_string ConfigureNamesString[] =
  },
  {
- {"primary_conninfo", PGC_POSTMASTER, REPLICATION_STANDBY,
+ {"primary_conninfo", PGC_SIGHUP, REPLICATION_STANDBY,
  gettext_noop("Sets the connection string to be used to connect to the sending server."),
  NULL,
  GUC_SUPERUSER_ONLY
@@ -3728,7 +3728,7 @@ static struct config_string ConfigureNamesString[] =
  },
  {
- {"primary_slot_name", PGC_POSTMASTER, REPLICATION_STANDBY,
+ {"primary_slot_name", PGC_SIGHUP, REPLICATION_STANDBY,
  gettext_noop("Sets the name of the replication slot to use on the sending server."),
  NULL
  },
......
@@ -309,9 +309,7 @@
  # These settings are ignored on a master server.
  #primary_conninfo = '' # connection string to sending server
- # (change requires restart)
  #primary_slot_name = '' # replication slot on sending server
- # (change requires restart)
  #promote_trigger_file = '' # file name whose presence ends recovery
  #hot_standby = on # "off" disallows queries during recovery
  # (change requires restart)
@@ -323,7 +321,6 @@
  # -1 allows indefinite delay
  #wal_receiver_create_temp_slot = off # Create temp slot if primary_slot_name
  # is not set.
- # (change requires restart)
  #wal_receiver_status_interval = 10s # send replies at least this often
  # 0 disables
  #hot_standby_feedback = off # send info from standby to prevent
......
@@ -319,6 +319,7 @@ extern bool CheckPromoteSignal(void);
  extern void WakeupRecovery(void);
  extern void SetWalWriterSleeping(bool sleeping);
+ extern void StartupRequestWalReceiverRestart(void);
  extern void XLogRequestWalReceiverReply(void);
  extern void assign_max_wal_size(int newval, void *extra);
......
@@ -3,7 +3,7 @@ use strict;
  use warnings;
  use PostgresNode;
  use TestLib;
- use Test::More tests => 34;
+ use Test::More tests => 35;
  # Initialize master node
  my $node_master = get_new_node('master');
@@ -208,7 +208,9 @@ $node_standby_2->append_conf('postgresql.conf',
  "primary_slot_name = $slotname_2");
  $node_standby_2->append_conf('postgresql.conf',
  "wal_receiver_status_interval = 1");
- $node_standby_2->restart;
+ # should be able change primary_slot_name without restart
+ # will wait effect in get_slot_xmins above
+ $node_standby_2->reload;
  # Fetch xmin columns from slot's pg_replication_slots row, after waiting for
  # given boolean condition to be true to ensure we've reached a quiescent state
@@ -345,6 +347,24 @@ is($xmin, '', 'xmin of cascaded slot null with hs feedback reset');
  is($catalog_xmin, '',
  'catalog xmin of cascaded slot still null with hs_feedback reset');
+ note "check change primary_conninfo without restart";
+ $node_standby_2->append_conf('postgresql.conf',
+ "primary_slot_name = ''");
+ $node_standby_2->enable_streaming($node_master);
+ $node_standby_2->reload;
+ # be sure do not streaming from cascade
+ $node_standby_1->stop;
+ my $newval = $node_master->safe_psql('postgres',
+ 'INSERT INTO replayed(val) SELECT coalesce(max(val),0) + 1 AS newval FROM replayed RETURNING val'
+ );
+ $node_master->wait_for_catchup($node_standby_2, 'replay',
+ $node_master->lsn('insert'));
+ my $is_replayed = $node_standby_2->safe_psql('postgres',
+ qq[SELECT 1 FROM replayed WHERE val = $newval]);
+ is($is_replayed, qq(1), "standby_2 didn't replay master value $newval");
  # Test physical slot advancing and its durability. Create a new slot on
  # the primary, not used by any of the standbys. This reserves WAL at creation.
  my $phys_slot = 'phys_slot';
......
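
As a rough model of how the pending-restart flag is consumed in the streaming state, the following sketch may help. `StreamState` and `stream_step` are hypothetical names; shutting down and launching the receiver are reduced to counters, and the timeline rescan is omitted. It only illustrates the ordering in the patch: stop the receiver if a restart is pending, clear the flag, then launch only when a non-empty connection string is configured.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for the startup process state touched by the patch. */
typedef struct StreamState
{
    bool pending_restart;   /* models pendingWalRcvRestart */
    bool receiver_running;  /* models a live walreceiver */
    int  shutdowns;         /* number of simulated ShutdownWalRcv() calls */
    int  launches;          /* number of simulated RequestXLogStreaming() calls */
} StreamState;

/*
 * One pass through the streaming state. start_requested models the local
 * startWalReceiver variable; conninfo models primary_conninfo.
 */
void
stream_step(StreamState *st, bool start_requested, const char *conninfo)
{
    /* Stop the receiver first, unless a start is already scheduled. */
    if (st->pending_restart && !start_requested)
    {
        st->receiver_running = false;
        st->shutdowns++;
        start_requested = true;
    }
    st->pending_restart = false;

    /* Launch only when a non-empty connection string is configured. */
    if (start_requested && conninfo != NULL && strcmp(conninfo, "") != 0)
    {
        st->receiver_running = true;
        st->launches++;
    }
}
```

With an empty connection string the receiver is stopped but not relaunched, which matches the documented exception for an empty primary_conninfo.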