Commit 0cc59cc1 authored by Tom Lane

Add current WAL end (as seen by walsender, ie, GetWriteRecPtr() result)
and current server clock time to SR data messages.  These are not currently
used on the slave side but seem likely to be useful in future, and it'd be
better not to change the SR protocol after release.  Per discussion.
Also do some minor code review and cleanup on walsender.c, and improve the
protocol documentation.
parent 572ec5a2
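The message above notes that the new walEnd and sendTime fields are not yet used by the standby. As a rough sketch of how a receiver *might* use them (this is not code from the commit), the C fragment below checks whether the sender reports more WAL than has been received and estimates transit delay. It assumes a WAL position made of two 32-bit halves (log file number and byte offset), mirrored by a hypothetical `RecPtr` type, and assumes the sender was built with integer datetimes, so that `sendTime` is an int64 count of microseconds since 2000-01-01. It also assumes the two machines' clocks agree.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mirror of a two-part WAL position (log file number, offset). */
typedef struct
{
    uint32_t xlogid;    /* log file number */
    uint32_t xrecoff;   /* byte offset within that log file */
} RecPtr;

/* Seconds from the Unix epoch (1970-01-01) to the PostgreSQL epoch (2000-01-01). */
#define PG_EPOCH_OFFSET_SECS INT64_C(946684800)

/* Order two positions: log file number first, then byte offset. */
static int
recptr_cmp(RecPtr a, RecPtr b)
{
    if (a.xlogid != b.xlogid)
        return a.xlogid < b.xlogid ? -1 : 1;
    if (a.xrecoff != b.xrecoff)
        return a.xrecoff < b.xrecoff ? -1 : 1;
    return 0;
}

/* True if the sender's reported walEnd lies beyond what we have received. */
static bool
sender_is_ahead(RecPtr received_up_to, RecPtr wal_end)
{
    return recptr_cmp(received_up_to, wal_end) < 0;
}

/*
 * Apparent transit delay in microseconds. ASSUMES send_time is an int64
 * microsecond count since 2000-01-01 and that both clocks are in sync.
 */
static int64_t
transit_delay_usec(int64_t send_time, int64_t now_unix_usec)
{
    int64_t now_pg = now_unix_usec - PG_EPOCH_OFFSET_SECS * INT64_C(1000000);

    return now_pg - send_time;
}
```

Because the commit message leaves these fields unused, any such policy (for example, reporting lag once the delay crosses a threshold) is a design choice for a future receiver, not something this patch defines.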
-<!-- $PostgreSQL: pgsql/doc/src/sgml/protocol.sgml,v 1.87 2010/04/03 07:22:55 petere Exp $ -->
+<!-- $PostgreSQL: pgsql/doc/src/sgml/protocol.sgml,v 1.88 2010/06/03 22:17:32 tgl Exp $ -->
<chapter id="protocol">
<title>Frontend/Backend Protocol</title>
@@ -1284,6 +1284,173 @@
</sect2>
</sect1>
<sect1 id="protocol-replication">
<title>Streaming Replication Protocol</title>
<para>
To initiate streaming replication, the frontend sends the
<literal>replication</> parameter in the startup message. This tells the
backend to go into walsender mode, wherein a small set of replication commands
can be issued instead of SQL statements. Only the simple query protocol can be
used in walsender mode.
The commands accepted in walsender mode are:
<variablelist>
<varlistentry>
<term>IDENTIFY_SYSTEM</term>
<listitem>
<para>
Requests the server to identify itself. Server replies with a result
set of a single row, containing two fields:
</para>
<para>
<variablelist>
<varlistentry>
<term>
systemid
</term>
<listitem>
<para>
The unique system identifier identifying the cluster. This
can be used to check that the base backup used to initialize the
slave came from the same cluster.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term>
timeline
</term>
<listitem>
<para>
Current TimelineID. Also useful to check that the slave is
consistent with the master.
</para>
</listitem>
</varlistentry>
</variablelist>
</para>
</listitem>
</varlistentry>
<varlistentry>
<term>START_REPLICATION <replaceable>XXX</>/<replaceable>XXX</></term>
<listitem>
<para>
Instructs server to start streaming WAL, starting at
WAL position <replaceable>XXX</>/<replaceable>XXX</>.
The server can reply with an error, e.g. if the requested section of WAL
has already been recycled. On success, server responds with a
CopyOutResponse message, and then starts to stream WAL to the frontend.
WAL will continue to be streamed until the connection is broken;
no further commands will be accepted.
</para>
<para>
WAL data is sent as a series of CopyData messages. (This allows
other information to be intermixed; in particular the server can send
an ErrorResponse message if it encounters a failure after beginning
to stream.) The payload in each CopyData message follows this format:
</para>
<para>
<variablelist>
<varlistentry>
<term>
XLogData (B)
</term>
<listitem>
<para>
<variablelist>
<varlistentry>
<term>
Byte1('w')
</term>
<listitem>
<para>
Identifies the message as WAL data.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term>
Byte8
</term>
<listitem>
<para>
The starting point of the WAL data in this message, given in
XLogRecPtr format.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term>
Byte8
</term>
<listitem>
<para>
The current end of WAL on the server, given in
XLogRecPtr format.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term>
Byte8
</term>
<listitem>
<para>
The server's system clock at the time of transmission,
given in TimestampTz format.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term>
Byte<replaceable>n</replaceable>
</term>
<listitem>
<para>
A section of the WAL data stream.
</para>
</listitem>
</varlistentry>
</variablelist>
</para>
</listitem>
</varlistentry>
</variablelist>
</para>
<para>
A single WAL record is never split across two CopyData messages.
When a WAL record crosses a WAL page boundary, and is therefore
already split using continuation records, it can be split at the page
boundary. In other words, the first main WAL record and its
continuation records can be sent in different CopyData messages.
</para>
<para>
Note that all fields within the WAL data and the above-described header
will be in the sending server's native format. Endianness, and the
format for the timestamp, are unpredictable unless the receiver has
verified that the sender's system identifier matches its own
<filename>pg_control</> contents.
</para>
<para>
If the WAL sender process is terminated normally (during postmaster
shutdown), it will send a CommandComplete message before exiting.
This might not happen during an abnormal shutdown, of course.
</para>
</listitem>
</varlistentry>
</variablelist>
</para>
</sect1>
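The note on native byte order means a receiver should confirm, via IDENTIFY_SYSTEM, that it is talking to the same cluster before it trusts any header field. A minimal sketch, assuming the two fields arrive as decimal text (result values in the simple query protocol are text) and that the standby already holds its own system identifier and timeline from `pg_control`. The helpers `parse_u64` and `identify_system_matches` are hypothetical, and treating any timeline difference as a mismatch is a simplification.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Parse a non-empty unsigned decimal string; reject junk and overflow. */
static bool
parse_u64(const char *s, uint64_t *out)
{
    char *end;
    unsigned long long v;

    if (s == NULL || *s < '0' || *s > '9')
        return false;
    errno = 0;
    v = strtoull(s, &end, 10);
    if (errno != 0 || *end != '\0')
        return false;
    *out = (uint64_t) v;
    return true;
}

/*
 * Hypothetical check of the single IDENTIFY_SYSTEM row (systemid, timeline)
 * against the standby's own pg_control values.
 */
static bool
identify_system_matches(const char *systemid, const char *timeline,
                        uint64_t local_sysid, uint32_t local_tli)
{
    uint64_t remote_sysid;
    uint64_t remote_tli;

    if (!parse_u64(systemid, &remote_sysid) || !parse_u64(timeline, &remote_tli))
        return false;
    return remote_sysid == local_sysid && remote_tli == local_tli;
}
```

Only after this check passes is it reasonable to read the XLogData header fields with the receiver's own byte order and timestamp format.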
<sect1 id="protocol-message-types">
<title>Message Data Types</title>
@@ -4137,120 +4304,6 @@ not line breaks.
</sect1>
<sect1 id="protocol-replication">
<title>Streaming Replication Protocol</title>
<para>
To initiate streaming replication, the frontend sends the "replication"
parameter in the startup message. This tells the backend to go into
walsender mode, where a small set of replication commands can be issued
instead of SQL statements. Only the simple query protocol can be used in
walsender mode.
The commands accepted in walsender mode are:
<variablelist>
<varlistentry>
<term>IDENTIFY_SYSTEM</term>
<listitem>
<para>
Requests the server to identify itself. Server replies with a result
set of a single row, and two fields:
systemid: The unique system identifier identifying the cluster. This
can be used to check that the base backup used to initialize the
slave came from the same cluster.
timeline: Current TimelineID. Also used to check that the slave is
consistent with the master.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term>START_REPLICATION XXX/XXX</term>
<listitem>
<para>
Instructs backend to start streaming WAL, starting at point XXX/XXX.
Server can reply with an error e.g if the requested piece of WAL has
already been recycled. On success, server responds with a
CopyOutResponse message, and backend starts to stream WAL as CopyData
messages.
The payload in CopyData message consists of the following format.
</para>
<para>
<variablelist>
<varlistentry>
<term>
XLogData (B)
</term>
<listitem>
<para>
<variablelist>
<varlistentry>
<term>
Byte1('w')
</term>
<listitem>
<para>
Identifies the message as WAL data.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term>
Int32
</term>
<listitem>
<para>
The log file number of the LSN, indicating the starting point of
the WAL in the message.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term>
Int32
</term>
<listitem>
<para>
The byte offset of the LSN, indicating the starting point of
the WAL in the message.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term>
Byte<replaceable>n</replaceable>
</term>
<listitem>
<para>
Data that forms part of WAL data stream.
</para>
</listitem>
</varlistentry>
</variablelist>
</para>
</listitem>
</varlistentry>
</variablelist>
</para>
<para>
A single WAL record is never split across two CopyData messages. When
a WAL record crosses a WAL page boundary, however, and is therefore
already split using continuation records, it can be split at the page
boundary. In other words, the first main WAL record and its
continuation records can be split across different CopyData messages.
</para>
</listitem>
</varlistentry>
</variablelist>
</para>
</sect1>
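The `XXX/XXX` argument of START_REPLICATION is a WAL position written as two halves, the same pair that the removed header described as a log file number and a byte offset. The sketch below formats and parses that text form. `RecPtr`, `format_start_replication` and `parse_start_replication` are hypothetical names, and the hexadecimal `%X/%X` spelling is an assumption based on how PostgreSQL conventionally prints WAL locations.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical mirror of the two-part WAL position. */
typedef struct
{
    uint32_t xlogid;    /* log file number */
    uint32_t xrecoff;   /* byte offset within the log file */
} RecPtr;

/* Build "START_REPLICATION X/X" in buf; false if it does not fit. */
static bool
format_start_replication(char *buf, size_t buflen, RecPtr start)
{
    int n = snprintf(buf, buflen, "START_REPLICATION %X/%X",
                     (unsigned int) start.xlogid,
                     (unsigned int) start.xrecoff);

    return n >= 0 && (size_t) n < buflen;
}

/* Parse such a command back; rejects a missing half and trailing text. */
static bool
parse_start_replication(const char *cmd, RecPtr *out)
{
    unsigned int hi;
    unsigned int lo;
    char extra;

    if (sscanf(cmd, "START_REPLICATION %X/%X%c", &hi, &lo, &extra) != 2)
        return false;
    out->xlogid = hi;
    out->xrecoff = lo;
    return true;
}
```

For example, a start position with `xlogid` 0 and `xrecoff` 0x3000020 formats as `START_REPLICATION 0/3000020`.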
<sect1 id="protocol-changes">
<title>Summary of Changes since Protocol 2.0</title>
@@ -29,7 +29,7 @@
  *
  *
  * IDENTIFICATION
- * $PostgreSQL: pgsql/src/backend/replication/walreceiver.c,v 1.10 2010/04/20 22:55:03 tgl Exp $
+ * $PostgreSQL: pgsql/src/backend/replication/walreceiver.c,v 1.11 2010/06/03 22:17:32 tgl Exp $
  *
  *-------------------------------------------------------------------------
  */
@@ -41,6 +41,7 @@
 #include "access/xlog_internal.h"
 #include "libpq/pqsignal.h"
 #include "miscadmin.h"
+#include "replication/walprotocol.h"
 #include "replication/walreceiver.h"
 #include "storage/ipc.h"
 #include "storage/pmsignal.h"
@@ -393,18 +394,18 @@ XLogWalRcvProcessMsg(unsigned char type, char *buf, Size len)
     {
         case 'w':               /* WAL records */
             {
-                XLogRecPtr  recptr;
+                WalDataMessageHeader msghdr;
-                if (len < sizeof(XLogRecPtr))
+                if (len < sizeof(WalDataMessageHeader))
                     ereport(ERROR,
                             (errcode(ERRCODE_PROTOCOL_VIOLATION),
                              errmsg_internal("invalid WAL message received from primary")));
-                memcpy(&recptr, buf, sizeof(XLogRecPtr));
-                buf += sizeof(XLogRecPtr);
-                len -= sizeof(XLogRecPtr);
-                XLogWalRcvWrite(buf, len, recptr);
+                /* memcpy is required here for alignment reasons */
+                memcpy(&msghdr, buf, sizeof(WalDataMessageHeader));
+                buf += sizeof(WalDataMessageHeader);
+                len -= sizeof(WalDataMessageHeader);
+                XLogWalRcvWrite(buf, len, msghdr.dataStart);
                 break;
             }
         default:
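The `memcpy` in this hunk matters because the receive buffer gives no alignment guarantee for 8-byte fields; in the sketch below the header starts one byte into the payload, right after the `'w'` type byte, so casting the pointer to a struct pointer could fault on strict-alignment hardware. This standalone version of the same length-check-then-copy pattern uses a hypothetical `WalDataHeader` in place of `WalDataMessageHeader` and assumes an int64 timestamp.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical two-part WAL position. */
typedef struct
{
    uint32_t xlogid;
    uint32_t xrecoff;
} RecPtr;

/* Hypothetical stand-in for WalDataMessageHeader (integer timestamps assumed). */
typedef struct
{
    RecPtr  dataStart;   /* WAL start location of the payload */
    RecPtr  walEnd;      /* sender's current end of WAL */
    int64_t sendTime;    /* sender's clock at transmission */
} WalDataHeader;

/*
 * Split the body of a 'w' message (type byte already consumed) into its
 * header and WAL payload. buf may have any alignment, so the header is
 * copied out with memcpy rather than read through a cast pointer.
 */
static bool
split_wal_message(const char *buf, size_t len, WalDataHeader *hdr,
                  const char **data, size_t *datalen)
{
    if (len < sizeof(WalDataHeader))
        return false;   /* walreceiver raises a protocol violation here */
    memcpy(hdr, buf, sizeof(WalDataHeader));
    *data = buf + sizeof(WalDataHeader);
    *datalen = len - sizeof(WalDataHeader);
    return true;
}
```

As in the header comment, the payload length is not carried in the header; it is simply whatever remains of the message.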
/*-------------------------------------------------------------------------
*
* walprotocol.h
* Definitions relevant to the streaming WAL transmission protocol.
*
* Portions Copyright (c) 2010-2010, PostgreSQL Global Development Group
*
* $PostgreSQL: pgsql/src/include/replication/walprotocol.h,v 1.1 2010/06/03 22:17:32 tgl Exp $
*
*-------------------------------------------------------------------------
*/
#ifndef _WALPROTOCOL_H
#define _WALPROTOCOL_H
#include "access/xlogdefs.h"
#include "utils/timestamp.h"
/*
* Header for a WAL data message (message type 'w'). This is wrapped within
* a CopyData message at the FE/BE protocol level.
*
* The header is followed by actual WAL data. Note that the data length is
* not specified in the header --- it's just whatever remains in the message.
*
* walEnd and sendTime are not essential data, but are provided in case
* the receiver wants to adjust its behavior depending on how far behind
* it is.
*/
typedef struct
{
/* WAL start location of the data included in this message */
XLogRecPtr dataStart;
/* Current end of WAL on the sender */
XLogRecPtr walEnd;
/* Sender's system clock at the time of transmission */
TimestampTz sendTime;
} WalDataMessageHeader;
/*
* Maximum data payload in a WAL data message. Must be >= XLOG_BLCKSZ.
*
* We don't have a good idea of what a good value would be; there's some
* overhead per message in both walsender and walreceiver, but on the other
* hand sending large batches makes walsender less responsive to signals
* because signals are checked only between messages. 128kB (with
* default 8k blocks) seems like a reasonable guess for now.
*/
#define MAX_SEND_SIZE (XLOG_BLCKSZ * 16)
#endif /* _WALPROTOCOL_H */
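The comment on MAX_SEND_SIZE, together with the protocol rule that a WAL record is split between messages only at a page boundary, suggests one way for a sender to size each message: send everything if it fits, otherwise stop at MAX_SEND_SIZE bytes rounded down to a page boundary. The sketch below illustrates that rule and is not walsender's code. It assumes the default 8 kB XLOG_BLCKSZ, models WAL positions as flat 64-bit byte counts instead of the two-part XLogRecPtr, and uses hypothetical `SKETCH_` names.

```c
#include <stdint.h>

/* Assumed default build values; XLOG_BLCKSZ is a build-time setting. */
#define SKETCH_XLOG_BLCKSZ   UINT64_C(8192)
#define SKETCH_MAX_SEND_SIZE (SKETCH_XLOG_BLCKSZ * 16)   /* 131072 bytes */

/*
 * Given the next byte to send and the end of available WAL (flat byte
 * positions, a simplification), return where the next message should stop:
 * at end if the rest fits, otherwise at start + MAX_SEND_SIZE rounded down
 * to a page boundary, which is always a safe place to cut.
 */
static uint64_t
next_send_end(uint64_t start, uint64_t end)
{
    uint64_t limit;

    if (end - start <= SKETCH_MAX_SEND_SIZE)
        return end;
    limit = start + SKETCH_MAX_SEND_SIZE;
    limit -= limit % SKETCH_XLOG_BLCKSZ;
    return limit;
}
```

The requirement that MAX_SEND_SIZE be at least XLOG_BLCKSZ is what keeps the rounded-down cut strictly past `start`, so each message makes progress.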
@@ -5,7 +5,7 @@
  *
  * Portions Copyright (c) 2010-2010, PostgreSQL Global Development Group
  *
- * $PostgreSQL: pgsql/src/include/replication/walreceiver.h,v 1.8 2010/02/26 02:01:27 momjian Exp $
+ * $PostgreSQL: pgsql/src/include/replication/walreceiver.h,v 1.9 2010/06/03 22:17:32 tgl Exp $
  *
  *-------------------------------------------------------------------------
  */
@@ -14,6 +14,7 @@
 #include "access/xlogdefs.h"
 #include "storage/spin.h"
+#include "pgtime.h"
 extern bool am_walreceiver;