Commit e3fb6170 authored by Alvaro Herrera

Avoid creating archive status ".ready" files too early

WAL records may span multiple segments, but XLogWrite() does not
wait for the entire record to be written out to disk before
creating archive status files.  Instead, as soon as the last WAL page of
the segment is written, the archive status file is created, and the
archiver may process it.  If PostgreSQL crashes before it is able to
write and flush the rest of the record (in the next WAL segment), the
wrong version of the first segment file lingers in the archive, which
causes operations such as point-in-time restores to fail.
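To make the failure mode concrete, here is a minimal C sketch (not PostgreSQL code) of the segment arithmetic involved. It assumes a WAL position is a plain byte offset (`XLogRecPtr`) and that a position's segment number is that offset divided by the segment size, which is what the `XLByteToSeg` macro computes. The helper names are hypothetical. The naive rule considers a segment ready once the writer has reached its end, so it reports the first segment as ready while a record that began in it is still unfinished.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;    /* byte position in the WAL stream */
typedef uint64_t XLogSegNo;     /* WAL segment sequence number */

/* Segment that contains the byte at position ptr. */
static XLogSegNo
seg_of(XLogRecPtr ptr, uint64_t seg_size)
{
    return ptr / seg_size;
}

/*
 * A (non-empty) record occupying [start, end) crosses a segment boundary
 * when its first and last bytes fall into different segments.
 */
static bool
record_crosses_boundary(XLogRecPtr start, XLogRecPtr end, uint64_t seg_size)
{
    return seg_of(start, seg_size) != seg_of(end - 1, seg_size);
}

/*
 * Naive readiness test (the behavior this commit fixes): a segment counts as
 * ready as soon as everything up to its last byte has been written, even if
 * a record that starts in it continues into the next segment.
 */
static bool
naive_segment_ready(XLogSegNo seg, XLogRecPtr written_upto, uint64_t seg_size)
{
    return written_upto >= (seg + 1) * seg_size;
}
```

With 100-byte segments, a record spanning [90, 130) crosses the boundary, yet `naive_segment_ready(0, 100, 100)` is already true when only 100 bytes are on disk. A crash at that moment leaves a copy of segment 0 in the archive that ends in a record which never completed.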

To fix this, keep track of records that span across segments and ensure
that segments are only marked ready-for-archival once such records have
been completely written to disk.
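One way to picture the fix is the C sketch below. It is a simplified model, not the data structures this commit adds: the `ArchiveTracker` type, its fixed-size boundary table, and the functions `register_record()` and `notify_ready()` are all hypothetical. Each record that crosses a segment boundary is registered with the segment it starts in and its end position. A segment is reported ready only when it is completely flushed and no still-unflushed cross-segment record starts in it or in an earlier segment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;
typedef uint64_t XLogSegNo;

#define MAX_BOUNDARIES 8        /* arbitrary capacity for this sketch */

/* A record that starts in 'seg' and ends (exclusive) at 'endptr' in a later segment. */
typedef struct
{
    XLogSegNo   seg;
    XLogRecPtr  endptr;
} SegBoundary;

typedef struct
{
    uint64_t    seg_size;
    SegBoundary bounds[MAX_BOUNDARIES];
    int         nbounds;
    XLogSegNo   next_seg;       /* first segment not yet marked ready */
} ArchiveTracker;

/*
 * Called after inserting the non-empty record [start, end).  Only records
 * that cross a segment boundary are remembered.  Returns false if full.
 */
static bool
register_record(ArchiveTracker *t, XLogRecPtr start, XLogRecPtr end)
{
    XLogSegNo   start_seg = start / t->seg_size;

    if (start_seg == (end - 1) / t->seg_size)
        return true;            /* fits in one segment: nothing to track */
    if (t->nbounds == MAX_BOUNDARIES)
        return false;
    t->bounds[t->nbounds].seg = start_seg;
    t->bounds[t->nbounds].endptr = end;
    t->nbounds++;
    return true;
}

/*
 * Mark every segment that is safe to archive, given that WAL is durably
 * flushed up to 'flush'.  Returns how many segments were newly marked.
 */
static int
notify_ready(ArchiveTracker *t, XLogRecPtr flush)
{
    XLogSegNo   limit = flush / t->seg_size;    /* segments below are complete */
    int         kept = 0;
    int         marked = 0;

    for (int i = 0; i < t->nbounds; i++)
    {
        if (t->bounds[i].endptr <= flush)
            continue;           /* record fully flushed: forget it */
        if (t->bounds[i].seg < limit)
            limit = t->bounds[i].seg;   /* hold back its starting segment */
        t->bounds[kept++] = t->bounds[i];
    }
    t->nbounds = kept;

    while (t->next_seg < limit)
    {
        /* A real implementation would create the segment's .ready file here. */
        t->next_seg++;
        marked++;
    }
    return marked;
}
```

Checking only the starting segment is enough in this model: a completely flushed segment lies entirely before `flush`, so an unflushed record that starts in it (or earlier) necessarily runs past its end. For example, with 100-byte segments and a record [90, 130), flushing to 100 marks nothing, and flushing to 130 marks segment 0.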

This has always been wrong, so backpatch all the way back.

Author: Nathan Bossart <bossartn@amazon.com>
Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Reviewed-by: Ryo Matsumura <matsumura.ryo@fujitsu.com>
Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru>
Discussion: https://postgr.es/m/CBDDFA01-6E40-46BB-9F98-9340F4379505@amazon.com
parent 65b649fe
(One file's diff is collapsed and not shown here.)
@@ -248,6 +248,13 @@ WalWriterMain(void)
 		/* Process any signals received recently */
 		HandleWalWriterInterrupts();
 
+		/*
+		 * Notify the archiver of any WAL segments that are ready. We do this
+		 * here to handle a race condition where WAL is flushed to disk prior
+		 * to registering the segment boundary.
+		 */
+		NotifySegmentsReadyForArchive(GetFlushRecPtr());
+
 		/*
 		 * Do what we're here for; then, if XLogBackgroundFlush() found useful
 		 * work to do, reset hibernation counter.
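The comment in this hunk describes a race in which WAL is flushed before the segment boundary is registered, so a notification attempted at flush time may not be able to mark the affected segments. Calling `NotifySegmentsReadyForArchive(GetFlushRecPtr())` on every WAL writer cycle gives those segments another chance even if no further flush happens. The toy C model below illustrates only that reasoning. It assumes a conservative rule in which segments are marked only up to the position where boundary registration is known to be complete; `ToyArchiveState`, `registered_upto`, and `toy_notify()` are hypothetical and are not part of the actual patch.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;
typedef uint64_t XLogSegNo;

/*
 * Toy model: segments are marked ready only up to the smaller of the flush
 * position and the position up to which boundary registration is known to
 * be complete.  Both positions are hypothetical bookkeeping.
 */
typedef struct
{
    uint64_t    seg_size;
    XLogRecPtr  registered_upto;    /* boundaries before this are registered */
    XLogSegNo   next_seg;           /* first segment not yet marked ready */
} ToyArchiveState;

/* Returns how many segments were newly marked ready; safe to call repeatedly. */
static int
toy_notify(ToyArchiveState *s, XLogRecPtr flush)
{
    XLogRecPtr  safe = flush < s->registered_upto ? flush : s->registered_upto;
    XLogSegNo   limit = safe / s->seg_size;     /* segments below are complete */
    int         marked = 0;

    while (s->next_seg < limit)
    {
        s->next_seg++;          /* stand-in for creating a .ready file */
        marked++;
    }
    return marked;
}
```

In this model, WAL flushed through position 250 before registration caught up yields no marked segments. Once `registered_upto` reaches 250, a later periodic call with the same, unchanged flush pointer marks segments 0 and 1. Without such a periodic caller, nothing would revisit them until more WAL is flushed. Because the call is idempotent, repeating it when nothing has changed marks nothing.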
@@ -349,6 +349,7 @@ extern XLogRecPtr GetInsertRecPtr(void);
 extern XLogRecPtr GetFlushRecPtr(void);
 extern XLogRecPtr GetLastImportantRecPtr(void);
 extern void RemovePromoteSignalFiles(void);
+extern void NotifySegmentsReadyForArchive(XLogRecPtr flushRecPtr);
 
 extern bool PromoteIsTriggered(void);
 extern bool CheckPromoteSignal(void);
@@ -46,6 +46,7 @@ typedef uint64 XLogRecPtr;
  * XLogSegNo - physical log file sequence number.
  */
 typedef uint64 XLogSegNo;
+#define MaxXLogSegNo ((XLogSegNo) 0xFFFFFFFFFFFFFFFF)
 
 /*
  * TimeLineID (TLI) - identifies different database histories to prevent
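`MaxXLogSegNo` is the largest value an `XLogSegNo` can hold. A common use for such a constant is as an "unset" sentinel, so that taking a minimum over a possibly empty set of segment numbers needs no separate flag. The C sketch below shows that pattern. How this commit itself uses the constant is not visible in this excerpt, and `earliest_segment()` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t XLogSegNo;
#define MaxXLogSegNo ((XLogSegNo) 0xFFFFFFFFFFFFFFFF)

/*
 * Return the earliest segment number in segs[0..n), or MaxXLogSegNo when
 * n == 0.  Starting from the maximum value lets the empty and non-empty
 * cases share the same min() loop.
 */
static XLogSegNo
earliest_segment(const XLogSegNo *segs, size_t n)
{
    XLogSegNo   earliest = MaxXLogSegNo;

    for (size_t i = 0; i < n; i++)
    {
        if (segs[i] < earliest)
            earliest = segs[i];
    }
    return earliest;
}
```

The sentinel cannot collide with a real segment number: even with the smallest supported WAL segment size of 1 MB, a 64-bit WAL position yields segment numbers below 2^44.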