    Prevent WAL corruption after a standby promotion. · 0e54a5e2
    Robert Haas authored
    When a PostgreSQL instance performing archive recovery but not using
    standby mode is promoted, and the last WAL segment that it attempted
    to read ended in a partial record, the previous code would create
    invalid WAL on the new timeline. The WAL from the previous timeline
    would be copied to the new timeline up until the end of the last valid
    record, but instead of beginning to write WAL immediately
    afterwards, the promoted server would write an overwrite contrecord at
    the beginning of the next segment. The end of the previous segment
    would be left as all-zeroes, resulting in failures if anything tried
    to read WAL from that file.
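
    As a rough illustration of the layout described above, the following
    sketch works through the LSN arithmetic. It assumes the default 16 MB
    WAL segment size, and next_segment_start() and unwritten_gap() are
    hypothetical helpers written for this example, not functions in xlog.c.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;

/* Assumption: the default WAL segment size of 16 MB. */
#define WAL_SEGMENT_SIZE (UINT64_C(16) * 1024 * 1024)

/* Hypothetical helper: first byte of the segment following the one that
 * contains lsn. */
XLogRecPtr
next_segment_start(XLogRecPtr lsn, uint64_t seg_size)
{
    assert(seg_size > 0);
    return (lsn / seg_size + 1) * seg_size;
}

/* Hypothetical helper: number of bytes left as zeroes when writing
 * resumes at the next segment boundary instead of at end_of_valid. */
uint64_t
unwritten_gap(XLogRecPtr end_of_valid, uint64_t seg_size)
{
    return next_segment_start(end_of_valid, seg_size) - end_of_valid;
}
```

    For example, if the last valid record ends at 0x0300A000, new WAL
    should start there; starting at 0x04000000 instead leaves 0x00FF6000
    bytes of the copied segment as zeroes.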
    
    The root of the issue is that ReadRecord() decides whether to set
    abortedRecPtr and missingContrecPtr based on the value of StandbyMode,
    but ReadRecord() switches to a new timeline based on the value of
    ArchiveRecoveryRequested. We shouldn't try to write an overwrite
    contrecord if we're switching to a new timeline, so change the test in
    ReadRecord() to check ArchiveRecoveryRequested instead.
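
    The mismatch can be sketched as follows. This is a simplified model,
    not the actual xlog.c code: RecoveryState and note_partial_record()
    are hypothetical names, and the use_fixed_test flag exists only so the
    old and new conditions can be compared side by side.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;
#define InvalidXLogRecPtr ((XLogRecPtr) 0)

/* Hypothetical stand-in for the relevant recovery state. */
typedef struct RecoveryState
{
    bool        StandbyMode;
    bool        ArchiveRecoveryRequested;
    XLogRecPtr  abortedRecPtr;      /* start of the partial record */
    XLogRecPtr  missingContrecPtr;  /* where the continuation was expected */
} RecoveryState;

/*
 * Called when the last record read turned out to be partial.  If the
 * pointers are set, an overwrite contrecord will be written later.
 */
void
note_partial_record(RecoveryState *st, bool use_fixed_test,
                    XLogRecPtr rec_start, XLogRecPtr missing_at)
{
    bool        skip;

    assert(st != NULL);

    /* Old test keyed on StandbyMode; fixed test keys on the same flag
     * that decides whether a new timeline is chosen. */
    skip = use_fixed_test ? st->ArchiveRecoveryRequested : st->StandbyMode;

    if (!skip)
    {
        st->abortedRecPtr = rec_start;
        st->missingContrecPtr = missing_at;
    }
}
```

    With StandbyMode false and ArchiveRecoveryRequested true, the old
    test records the pointers even though a timeline switch follows,
    while the fixed test leaves them invalid.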
    
    Code fix by Dilip Kumar. Comments by me incorporating suggested
    language from Álvaro Herrera. Further review from Kyotaro Horiguchi
    and Sami Imseih.
    
    Discussion: http://postgr.es/m/CAFiTN-t7umki=PK8dT1tcPV=mOUe2vNhHML6b3T7W7qqvvajjg@mail.gmail.com
    Discussion: http://postgr.es/m/FB0DEA0B-E14E-43A0-811F-C1AE93D00FF3%40amazon.com
xlog.c 411 KB