Commit 78bad97f authored by Michael Paquier

Improve various aspects of pg_rewind documentation

The pg_rewind docs currently assert that the state of the target's
data directory after rewind is equivalent to the source's data
directory.  This clarifies the documentation to describe that the base
state is further back in time and that the target's data directory will
include the current state from the source of any copied blocks since the
point of divergence.

This commit also improves the section "How It Works":
- Describe the update of the pg_control file.
- Reorganize the list of files and directories ignored during the
  rewind.

Author: James Coleman
Discussion: https://postgr.es/m/CAAaqYe-sgqCos7MXF4XiY8rUPy3CEmaCY9EvfhX-DhPhPBF5_A@mail.gmail.com
parent d9c501da
@@ -48,14 +48,16 @@ PostgreSQL documentation
 </para>
 <para>
-The result is equivalent to replacing the target data directory with the
-source one. Only changed blocks from relation files are copied;
-all other files are copied in full, including configuration files. The
-advantage of <application>pg_rewind</application> over taking a new base backup, or
-tools like <application>rsync</application>, is that <application>pg_rewind</application> does
-not require reading through unchanged blocks in the cluster. This makes
-it a lot faster when the database is large and only a small
-fraction of blocks differ between the clusters.
+After a successful rewind, the state of the target data directory is
+analogous to a base backup of the source data directory. Unlike taking
+a new base backup or using a tool like <application>rsync</application>,
+<application>pg_rewind</application> does not require comparing or copying
+unchanged relation blocks in the cluster. Only changed blocks from existing
+relation files are copied; all other files, including new relation files,
+configuration files, and WAL segments, are copied in full. As such the
+rewind operation is significantly faster than other approaches when the
+database is large and only a small fraction of blocks differ between the
+clusters.
 </para>
 <para>
@@ -77,16 +79,18 @@ PostgreSQL documentation
 </para>
 <para>
-When the target server is started for the first time after running
-<application>pg_rewind</application>, it will go into recovery mode and replay all
-WAL generated in the source server after the point of divergence.
-If some of the WAL was no longer available in the source server when
-<application>pg_rewind</application> was run, and therefore could not be copied by the
-<application>pg_rewind</application> session, it must be made available when the
-target server is started. This can be done by creating a
-<filename>recovery.signal</filename> file in the target data directory
-and configuring suitable <xref linkend="guc-restore-command"/>
-in <filename>postgresql.conf</filename>.
+After running <application>pg_rewind</application>, WAL replay needs to
+complete for the data directory to be in a consistent state. When the
+target server is started again it will enter archive recovery and replay
+all WAL generated in the source server from the last checkpoint before
+the point of divergence. If some of the WAL was no longer available in the
+source server when <application>pg_rewind</application> was run, and
+therefore could not be copied by the <application>pg_rewind</application>
+session, it must be made available when the target server is started.
+This can be done by creating a <filename>recovery.signal</filename> file
+in the target data directory and by configuring a suitable
+<xref linkend="guc-restore-command"/> in
+<filename>postgresql.conf</filename>.
 </para>
 <para>
@@ -105,6 +109,15 @@ PostgreSQL documentation
 recovered. In such a case, taking a new fresh backup is recommended.
 </para>
+<para>
+As <application>pg_rewind</application> copies configuration files
+entirely from the source, it may be required to correct the configuration
+used for recovery before restarting the target server, especially if
+the target is reintroduced as a standby of the source. If you restart
+the server after the rewind operation has finished but without configuring
+recovery, the target may again diverge from the primary.
+</para>
 <para>
 <application>pg_rewind</application> will fail immediately if it finds
 files it cannot write directly to. This can happen for example when
@@ -342,34 +355,45 @@ GRANT EXECUTE ON function pg_catalog.pg_read_binary_file(text, bigint, bigint, b
 Copy all those changed blocks from the source cluster to
 the target cluster, either using direct file system access
 (<option>--source-pgdata</option>) or SQL (<option>--source-server</option>).
+Relation files are now in a state equivalent to the moment of the last
+completed checkpoint prior to the point at which the WAL timelines of the
+source and target diverged plus the current state on the source of any
+blocks changed on the target after that divergence.
 </para>
 </step>
 <step>
 <para>
-Copy all other files such as <filename>pg_xact</filename> and
-configuration files from the source cluster to the target cluster
-(everything except the relation files). Similarly to base backups,
-the contents of the directories <filename>pg_dynshmem/</filename>,
+Copy all other files, including new relation files, WAL segments,
+<filename>pg_xact</filename>, and configuration files from the source
+cluster to the target cluster. Similarly to base backups, the contents
+of the directories <filename>pg_dynshmem/</filename>,
 <filename>pg_notify/</filename>, <filename>pg_replslot/</filename>,
 <filename>pg_serial/</filename>, <filename>pg_snapshots/</filename>,
-<filename>pg_stat_tmp/</filename>, and
-<filename>pg_subtrans/</filename> are omitted from the data copied
-from the source cluster. Any file or directory beginning with
-<filename>pgsql_tmp</filename> is omitted, as well as are
+<filename>pg_stat_tmp/</filename>, and <filename>pg_subtrans/</filename>
+are omitted from the data copied from the source cluster. The files
 <filename>backup_label</filename>,
 <filename>tablespace_map</filename>,
 <filename>pg_internal.init</filename>,
-<filename>postmaster.opts</filename> and
-<filename>postmaster.pid</filename>.
+<filename>postmaster.opts</filename>, and
+<filename>postmaster.pid</filename>, as well as any file or directory
+beginning with <filename>pgsql_tmp</filename>, are omitted.
+</para>
+</step>
+<step>
+<para>
+Create a <filename>backup_label</filename> file to begin WAL replay at
+the checkpoint created at failover and configure the
+<filename>pg_control</filename> file with a minimum consistency LSN
+defined as the result of <literal>pg_current_wal_insert_lsn()</literal>
+when rewinding from a live source or the last checkpoint LSN when
+rewinding from a stopped source.
 </para>
 </step>
 <step>
 <para>
-Apply the WAL from the source cluster, starting from the checkpoint
-created at failover. (Strictly speaking, <application>pg_rewind</application>
-doesn't apply the WAL, it just creates a backup label file that
-makes <productname>PostgreSQL</productname> start by replaying all WAL from
-that checkpoint forward.)
+When starting the target, <productname>PostgreSQL</productname> replays
+all the required WAL, resulting in a data directory in a consistent
+state.
 </para>
 </step>
 </procedure>
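
The omission rules listed in the "copy all other files" step of the last hunk can be sketched as a small predicate over paths relative to the data directory. This is only an illustration of the documented list, not how pg_rewind itself is implemented: the function name `is_omitted` and the constant names are hypothetical, and matching the listed file names by base name at any depth is a simplifying assumption that the documentation text does not state.

```python
from pathlib import PurePosixPath

# Directories whose contents are omitted (names taken from the documentation text).
OMITTED_DIR_CONTENTS = {
    "pg_dynshmem", "pg_notify", "pg_replslot", "pg_serial",
    "pg_snapshots", "pg_stat_tmp", "pg_subtrans",
}

# Individual files that are omitted (names taken from the documentation text).
OMITTED_FILES = {
    "backup_label", "tablespace_map", "pg_internal.init",
    "postmaster.opts", "postmaster.pid",
}

# Any file or directory beginning with this prefix is omitted.
TEMP_PREFIX = "pgsql_tmp"


def is_omitted(relative_path: str) -> bool:
    """Return True if a path relative to the data directory would be skipped.

    Hypothetical helper: it mirrors the documented list only. Assumption:
    file names are matched by base name at any depth.
    """
    parts = PurePosixPath(relative_path).parts
    if not parts:
        return False
    # A pgsql_tmp* component anywhere in the path covers the entry itself
    # and everything below it.
    if any(part.startswith(TEMP_PREFIX) for part in parts):
        return True
    # Only the contents of the listed directories are omitted,
    # not the directory entries themselves.
    if len(parts) > 1 and parts[0] in OMITTED_DIR_CONTENTS:
        return True
    return parts[-1] in OMITTED_FILES
```

Under these assumptions, `is_omitted("pg_replslot/slot1/state")` and `is_omitted("postmaster.pid")` are true, while `is_omitted("pg_xact/0000")` and `is_omitted("postgresql.conf")` are false, matching the statement that `pg_xact` and configuration files are copied in full.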