Commit 3901fd70 authored by Fujii Masao

Support quorum-based synchronous replication.

This feature is also known as "quorum commit", especially in discussions
on pgsql-hackers.

This commit adds the following new syntaxes to the synchronous_standby_names
GUC. With the FIRST and ANY keywords, users can specify the method used to
choose synchronous standbys from the listed servers.

  FIRST num_sync (standby_name [, ...])
  ANY num_sync (standby_name [, ...])

The keyword FIRST specifies priority-based synchronous replication,
which was also available in 9.6 and earlier. This method makes transaction
commits wait until their WAL records are replicated to num_sync
synchronous standbys chosen based on their priorities.

The keyword ANY specifies quorum-based synchronous replication
and makes transaction commits wait until their WAL records are
replicated to *at least* num_sync listed standbys. In this method,
the values of pg_stat_replication.sync_state for the listed standbys
are reported as "quorum". A priority is still assigned to each standby,
but it is not used in this method.
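
The difference between the two methods can be sketched in C as follows.
This is only an illustration of the semantics described above, not the
server's code: the Standby struct, confirmed_lsn() and the numeric WAL
positions are hypothetical names and values chosen for the example. It
assumes every entry in the array is a connected, streaming standby.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for one connected standby (not a server struct). */
typedef struct
{
	int			priority;	/* 1 = first in the list; 0 = not listed (async) */
	uint64_t	flush_lsn;	/* WAL position this standby has confirmed */
} Standby;

/* qsort comparator: ascending priority number (1 sorts first) */
static int
by_priority(const void *a, const void *b)
{
	const Standby *x = a;
	const Standby *y = b;

	return (x->priority > y->priority) - (x->priority < y->priority);
}

/* qsort comparator: descending flush_lsn (most advanced standby first) */
static int
by_lsn_desc(const void *a, const void *b)
{
	const Standby *x = a;
	const Standby *y = b;

	return (x->flush_lsn < y->flush_lsn) - (x->flush_lsn > y->flush_lsn);
}

/*
 * Return the WAL position up to which waiting commits may be released,
 * or 0 if fewer than num_sync listed standbys are present.
 * quorum != 0 models ANY; quorum == 0 models FIRST.
 * The array is reordered in place and unlisted entries are dropped.
 */
static uint64_t
confirmed_lsn(Standby *s, size_t n, int num_sync, int quorum)
{
	size_t		listed = 0;
	uint64_t	lowest;

	assert(num_sync > 0);

	/* Keep only listed standbys; priority 0 means asynchronous. */
	for (size_t i = 0; i < n; i++)
		if (s[i].priority > 0)
			s[listed++] = s[i];

	if (listed < (size_t) num_sync)
		return 0;

	if (quorum)
	{
		/* ANY: the num_sync-th most advanced position is already held
		 * by at least num_sync standbys, whichever ones they are. */
		qsort(s, listed, sizeof(Standby), by_lsn_desc);
		return s[num_sync - 1].flush_lsn;
	}

	/* FIRST: wait for the slowest of the num_sync highest-priority ones. */
	qsort(s, listed, sizeof(Standby), by_priority);
	lowest = s[0].flush_lsn;
	for (int i = 1; i < num_sync; i++)
		if (s[i].flush_lsn < lowest)
			lowest = s[i].flush_lsn;
	return lowest;
}
```

With standbys s1, s2, s3 at positions 100, 300 and 200, "FIRST 2" is held
back to 100 by s1, while "ANY 2" can already release commits up to 200
because s2 and s3 have both confirmed that much.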

The existing syntaxes, which have neither the FIRST nor the ANY keyword,
are still supported. They are the same as the new syntax with the FIRST
keyword, i.e., priority-based synchronous replication.
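
As a rough illustration of how the accepted forms map to a method and a
num_sync value, here is a small hand-written classifier in C. It is a
sketch under stated assumptions, not the bison/flex parser changed by this
commit: SyncSpec, Method, eat_keyword() and classify() are hypothetical
names, and the standby list itself is not validated.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical result types for this sketch only. */
typedef enum { METHOD_PRIORITY = 0, METHOD_QUORUM = 1 } Method;

typedef struct
{
	Method		method;
	int			num_sync;
} SyncSpec;

static const char *
skip_space(const char *p)
{
	while (isspace((unsigned char) *p))
		p++;
	return p;
}

/*
 * Case-insensitively match keyword kw at *p. The keyword must not be
 * followed by an identifier character, so "anything" is not ANY.
 * Advances *p past the keyword on success.
 */
static int
eat_keyword(const char **p, const char *kw)
{
	size_t		i;

	for (i = 0; kw[i] != '\0'; i++)
		if (tolower((unsigned char) (*p)[i]) != tolower((unsigned char) kw[i]))
			return 0;
	if (isalnum((unsigned char) (*p)[i]) || (*p)[i] == '_')
		return 0;
	*p += i;
	return 1;
}

/*
 * Classify the head of a setting string. Returns 0 on success and -1 if
 * a keyword is not followed by "num_sync (". A bare list, including one
 * that starts with a double-quoted name, means FIRST with num_sync = 1.
 */
static int
classify(const char *s, SyncSpec *out)
{
	const char *p = skip_space(s);
	int			explicit_kw = 0;

	out->method = METHOD_PRIORITY;
	out->num_sync = 1;

	if (eat_keyword(&p, "ANY"))
	{
		out->method = METHOD_QUORUM;
		explicit_kw = 1;
	}
	else if (eat_keyword(&p, "FIRST"))
		explicit_kw = 1;

	p = skip_space(p);
	if (isdigit((unsigned char) *p))
	{
		char	   *end;
		long		n = strtol(p, &end, 10);

		if (*skip_space(end) == '(')
		{
			out->num_sync = (int) n;
			return 0;
		}
	}
	return explicit_kw ? -1 : 0;
}
```

Under these assumptions, "2 (s1, s2)" and "FIRST 2 (s1, s2)" classify
identically, "s1, s2" yields the priority method with num_sync = 1, and a
double-quoted "any" at the start of a bare list is treated as a name
rather than as the keyword.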

Author: Masahiko Sawada
Reviewed-By: Michael Paquier, Amit Kapila and me
Discussion: <CAD21AoAACi9NeC_ecm+Vahm+MMA6nYh=Kqs3KB3np+MBOS_gZg@mail.gmail.com>

Many thanks to the various individuals who were involved in
discussing and developing this feature.
parent 10238fad
......@@ -3054,41 +3054,71 @@ include_dir 'conf.d'
transactions waiting for commit will be allowed to proceed after
these standby servers confirm receipt of their data.
The synchronous standbys will be those whose names appear
earlier in this list, and
in this list, and
that are both currently connected and streaming data in real-time
(as shown by a state of <literal>streaming</literal> in the
<link linkend="monitoring-stats-views-table">
<literal>pg_stat_replication</></link> view).
Other standby servers appearing later in this list represent potential
synchronous standbys. If any of the current synchronous
standbys disconnects for whatever reason,
it will be replaced immediately with the next-highest-priority standby.
Specifying more than one standby name can allow very high availability.
Specifying more than one standby names can allow very high availability.
</para>
<para>
This parameter specifies a list of standby servers using
either of the following syntaxes:
<synopsis>
<replaceable class="parameter">num_sync</replaceable> ( <replaceable class="parameter">standby_name</replaceable> [, ...] )
[FIRST] <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="parameter">standby_name</replaceable> [, ...] )
ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="parameter">standby_name</replaceable> [, ...] )
<replaceable class="parameter">standby_name</replaceable> [, ...]
</synopsis>
where <replaceable class="parameter">num_sync</replaceable> is
the number of synchronous standbys that transactions need to
wait for replies from,
and <replaceable class="parameter">standby_name</replaceable>
is the name of a standby server. For example, a setting of
<literal>3 (s1, s2, s3, s4)</> makes transaction commits wait
until their WAL records are received by three higher-priority standbys
chosen from standby servers <literal>s1</>, <literal>s2</>,
<literal>s3</> and <literal>s4</>.
</para>
<para>
The second syntax was used before <productname>PostgreSQL</>
is the name of a standby server.
<literal>FIRST</> and <literal>ANY</> specify the method to choose
synchronous standbys from the listed servers.
</para>
<para>
The keyword <literal>FIRST</>, coupled with
<replaceable class="parameter">num_sync</replaceable>, specifies a
priority-based synchronous replication and makes transaction commits
wait until their WAL records are replicated to
<replaceable class="parameter">num_sync</replaceable> synchronous
standbys chosen based on their priorities. For example, a setting of
<literal>FIRST 3 (s1, s2, s3, s4)</> will cause each commit to wait for
replies from three higher-priority standbys chosen from standby servers
<literal>s1</>, <literal>s2</>, <literal>s3</> and <literal>s4</>.
The standbys whose names appear earlier in the list are given higher
priority and will be considered as synchronous. Other standby servers
appearing later in this list represent potential synchronous standbys.
If any of the current synchronous standbys disconnects for whatever
reason, it will be replaced immediately with the next-highest-priority
standby. The keyword <literal>FIRST</> is optional.
</para>
<para>
The keyword <literal>ANY</>, coupled with
<replaceable class="parameter">num_sync</replaceable>, specifies a
quorum-based synchronous replication and makes transaction commits
wait until their WAL records are replicated to <emphasis>at least</>
<replaceable class="parameter">num_sync</replaceable> listed standbys.
For example, a setting of <literal>ANY 3 (s1, s2, s3, s4)</> will cause
each commit to proceed as soon as at least any three standbys of
<literal>s1</>, <literal>s2</>, <literal>s3</> and <literal>s4</>
reply.
</para>
<para>
<literal>FIRST</> and <literal>ANY</> are case-insensitive. If these
keywords are used as the name of a standby server,
its <replaceable class="parameter">standby_name</replaceable> must
be double-quoted.
</para>
<para>
The third syntax was used before <productname>PostgreSQL</>
version 9.6 and is still supported. It's the same as the first syntax
with <replaceable class="parameter">num_sync</replaceable> equal to 1.
For example, <literal>1 (s1, s2)</> and
<literal>s1, s2</> have the same meaning: either <literal>s1</>
or <literal>s2</> is chosen as a synchronous standby.
with <literal>FIRST</> and
<replaceable class="parameter">num_sync</replaceable> equal to 1.
For example, <literal>FIRST 1 (s1, s2)</> and <literal>s1, s2</> have
the same meaning: either <literal>s1</> or <literal>s2</> is chosen
as a synchronous standby.
</para>
<para>
The name of a standby server for this purpose is the
......
......@@ -1138,19 +1138,25 @@ primary_slot_name = 'node_a_slot'
as synchronous confirm receipt of their data. The number of synchronous
standbys that transactions must wait for replies from is specified in
<varname>synchronous_standby_names</>. This parameter also specifies
a list of standby names, which determines the priority of each standby
for being chosen as a synchronous standby. The standbys whose names
appear earlier in the list are given higher priority and will be considered
as synchronous. Other standby servers appearing later in this list
represent potential synchronous standbys. If any of the current
synchronous standbys disconnects for whatever reason, it will be replaced
immediately with the next-highest-priority standby.
a list of standby names and the method (<literal>FIRST</> and
<literal>ANY</>) to choose synchronous standbys from the listed ones.
</para>
<para>
An example of <varname>synchronous_standby_names</> for multiple
synchronous standbys is:
The method <literal>FIRST</> specifies a priority-based synchronous
replication and makes transaction commits wait until their WAL records are
replicated to the requested number of synchronous standbys chosen based on
their priorities. The standbys whose names appear earlier in the list are
given higher priority and will be considered as synchronous. Other standby
servers appearing later in this list represent potential synchronous
standbys. If any of the current synchronous standbys disconnects for
whatever reason, it will be replaced immediately with the
next-highest-priority standby.
</para>
<para>
An example of <varname>synchronous_standby_names</> for
a priority-based multiple synchronous standbys is:
<programlisting>
synchronous_standby_names = '2 (s1, s2, s3)'
synchronous_standby_names = 'FIRST 2 (s1, s2, s3)'
</programlisting>
In this example, if four standby servers <literal>s1</>, <literal>s2</>,
<literal>s3</> and <literal>s4</> are running, the two standbys
......@@ -1161,6 +1167,24 @@ synchronous_standby_names = '2 (s1, s2, s3)'
<literal>s2</> fails. <literal>s4</> is an asynchronous standby since
its name is not in the list.
</para>
<para>
The method <literal>ANY</> specifies a quorum-based synchronous
replication and makes transaction commits wait until their WAL records
are replicated to <emphasis>at least</> the requested number of
synchronous standbys in the list.
</para>
<para>
An example of <varname>synchronous_standby_names</> for
a quorum-based multiple synchronous standbys is:
<programlisting>
synchronous_standby_names = 'ANY 2 (s1, s2, s3)'
</programlisting>
In this example, if four standby servers <literal>s1</>, <literal>s2</>,
<literal>s3</> and <literal>s4</> are running, transaction commits will
wait for replies from at least any two standbys of <literal>s1</>,
<literal>s2</> and <literal>s3</>. <literal>s4</> is an asynchronous
standby since its name is not in the list.
</para>
<para>
The synchronous states of standby servers can be viewed using
the <structname>pg_stat_replication</structname> view.
......
......@@ -1404,7 +1404,8 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
<entry><structfield>sync_priority</></entry>
<entry><type>integer</></entry>
<entry>Priority of this standby server for being chosen as the
synchronous standby</entry>
synchronous standby in a priority-based synchronous replication.
This has no effect in a quorum-based synchronous replication.</entry>
</row>
<row>
<entry><structfield>sync_state</></entry>
......@@ -1429,6 +1430,12 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
<literal>sync</>: This standby server is synchronous.
</para>
</listitem>
<listitem>
<para>
<literal>quorum</>: This standby server is considered as a candidate
for quorum standbys.
</para>
</listitem>
</itemizedlist>
</entry>
</row>
......
......@@ -26,7 +26,7 @@ repl_gram.o: repl_scanner.c
# syncrep_scanner is compiled as part of syncrep_gram
syncrep_gram.o: syncrep_scanner.c
syncrep_scanner.c: FLEXFLAGS = -CF -p
syncrep_scanner.c: FLEXFLAGS = -CF -p -i
syncrep_scanner.c: FLEX_NO_BACKUP=yes
# repl_gram.c, repl_scanner.c, syncrep_gram.c and syncrep_scanner.c
......
......@@ -21,7 +21,7 @@ SyncRepConfigData *syncrep_parse_result;
char *syncrep_parse_error_msg;
static SyncRepConfigData *create_syncrep_config(const char *num_sync,
List *members);
List *members, uint8 syncrep_method);
/*
* Bison doesn't allocate anything that needs to live across parser calls,
......@@ -46,7 +46,7 @@ static SyncRepConfigData *create_syncrep_config(const char *num_sync,
SyncRepConfigData *config;
}
%token <str> NAME NUM JUNK
%token <str> NAME NUM JUNK ANY FIRST
%type <config> result standby_config
%type <list> standby_list
......@@ -60,8 +60,10 @@ result:
;
standby_config:
standby_list { $$ = create_syncrep_config("1", $1); }
| NUM '(' standby_list ')' { $$ = create_syncrep_config($1, $3); }
standby_list { $$ = create_syncrep_config("1", $1, SYNC_REP_PRIORITY); }
| NUM '(' standby_list ')' { $$ = create_syncrep_config($1, $3, SYNC_REP_PRIORITY); }
| ANY NUM '(' standby_list ')' { $$ = create_syncrep_config($2, $4, SYNC_REP_QUORUM); }
| FIRST NUM '(' standby_list ')' { $$ = create_syncrep_config($2, $4, SYNC_REP_PRIORITY); }
;
standby_list:
......@@ -75,9 +77,8 @@ standby_name:
;
%%
static SyncRepConfigData *
create_syncrep_config(const char *num_sync, List *members)
create_syncrep_config(const char *num_sync, List *members, uint8 syncrep_method)
{
SyncRepConfigData *config;
int size;
......@@ -98,6 +99,7 @@ create_syncrep_config(const char *num_sync, List *members)
config->config_size = size;
config->num_sync = atoi(num_sync);
config->syncrep_method = syncrep_method;
config->nmembers = list_length(members);
ptr = config->member_names;
foreach(lc, members)
......
......@@ -64,6 +64,9 @@ xdinside [^"]+
%%
{space}+ { /* ignore */ }
ANY { return ANY; }
FIRST { return FIRST; }
{xdstart} {
initStringInfo(&xdbuf);
BEGIN(xd);
......
......@@ -2868,12 +2868,20 @@ pg_stat_get_wal_senders(PG_FUNCTION_ARGS)
/*
* More easily understood version of standby state. This is purely
* informational, not different from priority.
* informational.
*
* In quorum-based sync replication, the role of each standby
* listed in synchronous_standby_names can be changing very
* frequently. Any standbys considered as "sync" at one moment can
* be switched to "potential" ones at the next moment. So, it's
* basically useless to report "sync" or "potential" as their sync
* states. We report just "quorum" for them.
*/
if (priority == 0)
values[7] = CStringGetTextDatum("async");
else if (list_member_int(sync_standbys, i))
values[7] = CStringGetTextDatum("sync");
values[7] = SyncRepConfig->syncrep_method == SYNC_REP_PRIORITY ?
CStringGetTextDatum("sync") : CStringGetTextDatum("quorum");
else
values[7] = CStringGetTextDatum("potential");
}
......
......@@ -245,7 +245,8 @@
# These settings are ignored on a standby server.
#synchronous_standby_names = '' # standby servers that provide sync rep
# number of sync standbys and comma-separated list of application_name
# method to choose sync standbys, number of sync standbys
# and comma-separated list of application_name
# from standby(s); '*' = all
#vacuum_defer_cleanup_age = 0 # number of xacts by which cleanup is delayed
......
......@@ -32,6 +32,10 @@
#define SYNC_REP_WAITING 1
#define SYNC_REP_WAIT_COMPLETE 2
/* syncrep_method of SyncRepConfigData */
#define SYNC_REP_PRIORITY 0
#define SYNC_REP_QUORUM 1
/*
* Struct for the configuration of synchronous replication.
*
......@@ -44,11 +48,14 @@ typedef struct SyncRepConfigData
int config_size; /* total size of this struct, in bytes */
int num_sync; /* number of sync standbys that we need to
* wait for */
uint8 syncrep_method; /* method to choose sync standbys */
int nmembers; /* number of members in the following list */
/* member_names contains nmembers consecutive nul-terminated C strings */
char member_names[FLEXIBLE_ARRAY_MEMBER];
} SyncRepConfigData;
extern SyncRepConfigData *SyncRepConfig;
/* communication variables for parsing synchronous_standby_names GUC */
extern SyncRepConfigData *syncrep_parse_result;
extern char *syncrep_parse_error_msg;
......
......@@ -3,7 +3,7 @@ use strict;
use warnings;
use PostgresNode;
use TestLib;
use Test::More tests => 8;
use Test::More tests => 11;
# Query checking sync_priority and sync_state of each standby
my $check_sql =
......@@ -172,3 +172,34 @@ test_sync_state(
standby2|1|sync
standby4|1|potential),
'potential standby found earlier in array is promoted to sync');
# Check that standby1 and standby2 are chosen as sync standbys
# based on their priorities.
test_sync_state(
$node_master, qq(standby1|1|sync
standby2|2|sync
standby4|0|async),
'priority-based sync replication specified by FIRST keyword',
'FIRST 2(standby1, standby2)');
# Check that all the listed standbys are considered as candidates
# for sync standbys in a quorum-based sync replication.
test_sync_state(
$node_master, qq(standby1|1|quorum
standby2|2|quorum
standby4|0|async),
'2 quorum and 1 async',
'ANY 2(standby1, standby2)');
# Start Standby3 which will be considered in 'quorum' state.
$node_standby_3->start;
# Check that the setting of 'ANY 2(*)' chooses all standbys as
# candidates for quorum sync standbys.
test_sync_state(
$node_master, qq(standby1|1|quorum
standby2|1|quorum
standby3|1|quorum
standby4|1|quorum),
'all standbys are considered as candidates for quorum sync standbys',
'ANY 2(*)');