Commit 7cbee7c0 authored by Heikki Linnakangas

At promotion, don't leave behind a partial segment on the old timeline.

With commit de768844, a copy of the partial segment was archived with the
.partial suffix, but the original file was still left in pg_xlog, so it
didn't actually solve the problems with archiving the partial segment that
it was supposed to solve. With this patch, the partial segment is renamed
rather than copied, so we only archive it with the .partial suffix.
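
To make the copy-versus-rename distinction concrete, here is a minimal standalone sketch, not the PostgreSQL code itself. The helper name `rename_to_partial`, the fixed buffer sizes, and the flat directory layout are assumptions made for illustration; the actual patch uses XLogFilePath() and reports failures through ereport().

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: rename dir/segname to dir/segname.partial.
 * Returns 0 on success, -1 on failure.
 */
int
rename_to_partial(const char *dir, const char *segname)
{
    char origpath[1024];
    char partialpath[1024];
    int  n;

    assert(dir != NULL && segname != NULL);

    n = snprintf(origpath, sizeof(origpath), "%s/%s", dir, segname);
    if (n < 0 || (size_t) n >= sizeof(origpath))
        return -1;              /* path would not fit in the buffer */

    n = snprintf(partialpath, sizeof(partialpath), "%s.partial", origpath);
    if (n < 0 || (size_t) n >= sizeof(partialpath))
        return -1;

    /* rename() moves the file, so nothing remains under the old name */
    return rename(origpath, partialpath);
}
```

Because the file is moved rather than copied, only the .partial name is left for the archiver to pick up.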

Also be more robust in detecting if the last segment is already being
archived. Previously I used XLogArchiveIsBusy() for that, but that's not
quite right. With archive_mode='always', there might be a .ready file for
it, and we don't want to rename it to .partial in that case.
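
The check described above can be sketched in isolation as follows. This is a simplified illustration, assuming status markers are plain files named `<segment>.ready` and `<segment>.done` in one directory; the names `status_file_exists` and `segment_is_ready_or_done` are hypothetical and are not the functions added by the patch.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical helper: does status_dir/segname<suffix> exist? */
static int
status_file_exists(const char *status_dir, const char *segname,
                   const char *suffix)
{
    char        path[1024];
    struct stat st;
    int         n;

    n = snprintf(path, sizeof(path), "%s/%s%s", status_dir, segname, suffix);
    if (n < 0 || (size_t) n >= sizeof(path))
        return 0;               /* treat an over-long path as "not found" */
    return stat(path, &st) == 0;
}

/*
 * Returns 1 if the segment already has a .done or .ready marker, meaning
 * it should be left alone and archived under its normal name; 0 otherwise.
 */
int
segment_is_ready_or_done(const char *status_dir, const char *segname)
{
    assert(status_dir != NULL && segname != NULL);
    return status_file_exists(status_dir, segname, ".done") ||
           status_file_exists(status_dir, segname, ".ready");
}
```

A caller would rename the segment to .partial only when this returns 0.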

The old segment is needed until we're fully committed to the new timeline,
i.e. until we've written the end-of-recovery WAL record and updated the
min recovery point and timeline in the control file. So move the renaming
later in the startup sequence, after all that's been done.
parent c5dd8ead
@@ -5224,31 +5224,6 @@ exitArchiveRecovery(TimeLineID endTLI, XLogRecPtr endOfLog)
* happens in the middle of a segment, copy data from the last WAL segment
* of the old timeline up to the switch point, to the starting WAL segment
* on the new timeline.
*
* What to do with the partial segment on the old timeline? If we don't
* archive it, and the server that created the WAL never archives it
* either (e.g. because it was hit by a meteor), it will never make it to
* the archive. That's OK from our point of view, because the new segment
* that we created with the new TLI contains all the WAL from the old
* timeline up to the switch point. But if you later try to do PITR to the
* "missing" WAL on the old timeline, recovery won't find it in the
* archive. It's physically present in the new file with new TLI, but
* recovery won't look there when it's recovering to the older timeline.
* On the other hand, if we archive the partial segment, and the original
* server on that timeline is still running and archives the completed
* version of the same segment later, it will fail. (We used to do that in
* 9.4 and below, and it caused such problems).
*
* As a compromise, we archive the last segment with the .partial suffix.
* Archive recovery will never try to read .partial segments, so they will
* normally go unused. But in the odd PITR case, the administrator can
* copy them manually to the pg_xlog directory (removing the suffix). They
* can be useful in debugging, too.
*
* If a .done file already exists for the old timeline, however, there is
* already a complete copy of the file in the archive, and there is no
* need to archive the partial one. (In particular, if it was restored
* from the archive to begin with, it's expected to have .done file).
*/
if (endLogSegNo == startLogSegNo)
{
@@ -5266,31 +5241,6 @@ exitArchiveRecovery(TimeLineID endTLI, XLogRecPtr endOfLog)
tmpfname = XLogFileCopy(NULL, xlogfname, endOfLog % XLOG_SEG_SIZE);
if (!InstallXLogFileSegment(&endLogSegNo, tmpfname, false, 0, false))
elog(ERROR, "InstallXLogFileSegment should not have failed");
/*
* Make a .partial copy for the archive (unless the original file was
* already archived)
*/
if (XLogArchivingActive() && XLogArchiveIsBusy(xlogfname))
{
char partialfname[MAXFNAMELEN];
snprintf(partialfname, MAXFNAMELEN, "%s.partial", xlogfname);
/* Make sure there's no .done or .ready file for it. */
XLogArchiveCleanup(partialfname);
/*
* We copy the whole segment, not just up to the switch point.
* The portion after the switch point might be garbage, but it
* might also be valid WAL, if we stopped recovery at user's
* request before reaching the end. Better to preserve the
* file as it is, garbage and all, than lose the evidence if
* something goes wrong.
*/
(void) XLogFileCopy(partialfname, xlogfname, XLOG_SEG_SIZE);
XLogArchiveNotify(partialfname);
}
}
else
{
@@ -5942,6 +5892,7 @@ StartupXLOG(void)
XLogRecPtr RecPtr,
checkPointLoc,
EndOfLog;
TimeLineID EndOfLogTLI;
TimeLineID PrevTimeLineID;
XLogRecord *record;
TransactionId oldestActiveXID;
@@ -7032,6 +6983,15 @@ StartupXLOG(void)
record = ReadRecord(xlogreader, LastRec, PANIC, false);
EndOfLog = EndRecPtr;
/*
* EndOfLogTLI is the TLI in the filename of the XLOG segment containing
* the end-of-log. It could be different from the timeline that EndOfLog
* nominally belongs to, if there was a timeline switch in that segment,
* and we were reading the old WAL from a segment belonging to a higher
* timeline.
*/
EndOfLogTLI = xlogreader->readPageTLI;
/*
* Complain if we did not roll forward far enough to render the backup
* dump consistent. Note: it is indeed okay to look at the local variable
@@ -7131,7 +7091,7 @@ StartupXLOG(void)
* we will use that below.)
*/
if (ArchiveRecoveryRequested)
-exitArchiveRecovery(xlogreader->readPageTLI, EndOfLog);
+exitArchiveRecovery(EndOfLogTLI, EndOfLog);
/*
* Prepare to write WAL starting at EndOfLog position, and init xlog
@@ -7262,12 +7222,82 @@ StartupXLOG(void)
true);
}
/*
* Clean up any (possibly bogus) future WAL segments on the old timeline.
*/
if (ArchiveRecoveryRequested)
{
/*
* We switched to a new timeline. Clean up segments on the old
* timeline.
*
* If there are any higher-numbered segments on the old timeline,
* remove them. They might contain valid WAL, but they might also be
* pre-allocated files containing garbage. In any case, they are not
* part of the new timeline's history so we don't need them.
*/
RemoveNonParentXlogFiles(EndOfLog, ThisTimeLineID);
/*
* If the switch happened in the middle of a segment, what to do with
* the last, partial segment on the old timeline? If we don't archive
* it, and the server that created the WAL never archives it either
* (e.g. because it was hit by a meteor), it will never make it to the
* archive. That's OK from our point of view, because the new segment
* that we created with the new TLI contains all the WAL from the old
* timeline up to the switch point. But if you later try to do PITR to
* the "missing" WAL on the old timeline, recovery won't find it in
* the archive. It's physically present in the new file with new TLI,
* but recovery won't look there when it's recovering to the older
* timeline. On the other hand, if we archive the partial segment, and
* the original server on that timeline is still running and archives
* the completed version of the same segment later, it will fail. (We
* used to do that in 9.4 and below, and it caused such problems).
*
* As a compromise, we rename the last segment with the .partial
* suffix, and archive it. Archive recovery will never try to read
* .partial segments, so they will normally go unused. But in the odd
* PITR case, the administrator can copy them manually to the pg_xlog
* directory (removing the suffix). They can be useful in debugging,
* too.
*
* If a .done or .ready file already exists for the old timeline,
* however, we had already determined that the segment is complete,
* so we can let it be archived normally. (In particular, if it was
* restored from the archive to begin with, it's expected to have a
* .done file).
*/
if (EndOfLog % XLOG_SEG_SIZE != 0 && XLogArchivingActive())
{
char origfname[MAXFNAMELEN];
XLogSegNo endLogSegNo;
XLByteToPrevSeg(EndOfLog, endLogSegNo);
XLogFileName(origfname, EndOfLogTLI, endLogSegNo);
if (!XLogArchiveIsReadyOrDone(origfname))
{
char origpath[MAXPGPATH];
char partialfname[MAXFNAMELEN];
char partialpath[MAXPGPATH];
XLogFilePath(origpath, EndOfLogTLI, endLogSegNo);
snprintf(partialfname, MAXFNAMELEN, "%s.partial", origfname);
snprintf(partialpath, MAXPGPATH, "%s.partial", origpath);
/*
* Make sure there's no .done or .ready file for the .partial
* file.
*/
XLogArchiveCleanup(partialfname);
if (rename(origpath, partialpath) != 0)
ereport(ERROR,
(errcode_for_file_access(),
errmsg("could not rename file \"%s\" to \"%s\": %m",
origpath, partialpath)));
XLogArchiveNotify(partialfname);
}
}
}
/*
* Preallocate additional log files, if wanted.
*/
...
@@ -697,6 +697,41 @@ XLogArchiveIsBusy(const char *xlog)
return true;
}
/*
* XLogArchiveIsReadyOrDone
*
* Check to see if an XLOG segment file has a .ready or .done file.
* This is similar to XLogArchiveIsBusy(), but returns true if the file
* is already archived or is about to be archived.
*
* This is currently only used at recovery. During normal operation this
* would be racy: the file might get removed or marked with .ready as we're
* checking it, or immediately after we return.
*/
bool
XLogArchiveIsReadyOrDone(const char *xlog)
{
char archiveStatusPath[MAXPGPATH];
struct stat stat_buf;
/* First check for .done --- this means archiver is done with it */
StatusFilePath(archiveStatusPath, xlog, ".done");
if (stat(archiveStatusPath, &stat_buf) == 0)
return true;
/* check for .ready --- this means archiver is still busy with it */
StatusFilePath(archiveStatusPath, xlog, ".ready");
if (stat(archiveStatusPath, &stat_buf) == 0)
return true;
/* Race condition --- maybe archiver just finished, so recheck */
StatusFilePath(archiveStatusPath, xlog, ".done");
if (stat(archiveStatusPath, &stat_buf) == 0)
return true;
return false;
}
/*
* XLogArchiveIsReady
*
...
@@ -305,6 +305,7 @@ extern void XLogArchiveForceDone(const char *xlog);
extern bool XLogArchiveCheckDone(const char *xlog);
extern bool XLogArchiveIsBusy(const char *xlog);
extern bool XLogArchiveIsReady(const char *xlog);
extern bool XLogArchiveIsReadyOrDone(const char *xlog);
extern void XLogArchiveCleanup(const char *xlog);
#endif /* XLOG_INTERNAL_H */