    Revamp the WAL record format. · 2c03216d
    Heikki Linnakangas authored
    Each WAL record now carries information about the modified relation and
    block(s) in a standardized format. That makes it easier to write tools that
    need that information, such as pg_rewind or a tool that prefetches blocks to
    speed up recovery.
    
    There's a whole new API for building WAL records, replacing the XLogRecData
    chains used previously. The new API consists of XLogRegister* functions,
    which are called for each buffer and chunk of data that is added to the
    record. The new API also gives more control over when a full-page image is
    written, by passing flags to the XLogRegisterBuffer function.
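    
    As a rough illustration of the register-then-insert pattern described above,
    here is a self-contained toy model. It is not PostgreSQL code: every name
    prefixed with toy_ or TOY_ is hypothetical, the signatures do not mirror the
    real XLogRegister* functions, and the single flag shown only stands in for
    the idea of per-buffer flags.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_MAX_BLOCKS 4
#define TOY_MAX_MAIN_DATA 256

/* Hypothetical per-buffer flag: ask that no full-page image be taken. */
#define TOY_REGBUF_NO_IMAGE 0x01

/* One registered block reference: which relation and block were modified. */
typedef struct ToyBlockRef
{
    uint32_t    rel;
    uint32_t    blkno;
    uint8_t     flags;
    int         in_use;
} ToyBlockRef;

/* Working state that accumulates everything registered for one record. */
typedef struct ToyRecordBuilder
{
    ToyBlockRef blocks[TOY_MAX_BLOCKS];
    char        main_data[TOY_MAX_MAIN_DATA];
    size_t      main_len;
} ToyRecordBuilder;

/* Start a new record: forget anything registered earlier. */
void
toy_begin_insert(ToyRecordBuilder *b)
{
    memset(b, 0, sizeof(*b));
}

/* Register one modified buffer under a caller-chosen block ID. */
int
toy_register_buffer(ToyRecordBuilder *b, int block_id,
                    uint32_t rel, uint32_t blkno, uint8_t flags)
{
    if (block_id < 0 || block_id >= TOY_MAX_BLOCKS)
        return -1;
    b->blocks[block_id].rel = rel;
    b->blocks[block_id].blkno = blkno;
    b->blocks[block_id].flags = flags;
    b->blocks[block_id].in_use = 1;
    return 0;
}

/* Append one chunk to the record's "main data" portion. */
int
toy_register_data(ToyRecordBuilder *b, const void *data, size_t len)
{
    if (len > TOY_MAX_MAIN_DATA - b->main_len)
        return -1;
    memcpy(b->main_data + b->main_len, data, len);
    b->main_len += len;
    return 0;
}

/* Count how many block references have been registered so far. */
int
toy_count_blocks(const ToyRecordBuilder *b)
{
    int         n = 0;

    for (int i = 0; i < TOY_MAX_BLOCKS; i++)
        n += b->blocks[i].in_use;
    return n;
}
```

    The point of the sketch is only the shape of the calls: each buffer and each
    chunk of data is registered separately, and the block references end up in
    one uniform structure that a tool could inspect without knowing the record
    type.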
    
    This also simplifies the XLogReadBufferForRedo() calls. The function can
    extract the relation and block number from the WAL record, so they no longer need to
    be passed as arguments.
    
    For the convenience of redo routines, XLogReader now dissects each WAL record
    after reading it, copying the main data part and the per-block data into
    MAXALIGNed buffers. The data chunks are not aligned within the WAL record,
    but the redo routines can assume that the pointers returned by XLogRecGet*
    functions are. Redo routines are now passed the XLogReaderState, which
    contains the record in the already-dissected format, instead of the plain
    XLogRecord.
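    
    The reason for copying can be shown with a small standalone sketch. It
    assumes an 8-byte maximum alignment and uses hypothetical toy_/TOY_ names
    throughout; it only demonstrates the general technique of copying an
    unaligned chunk into an aligned buffer, not how XLogReader is implemented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Assumption: 8-byte maximum alignment, as on most 64-bit platforms. */
#define TOY_MAXIMUM_ALIGNOF 8

/* Round a length up to the next multiple of TOY_MAXIMUM_ALIGNOF. */
#define TOY_MAXALIGN(len) \
    (((uintptr_t) (len) + (TOY_MAXIMUM_ALIGNOF - 1)) & \
     ~((uintptr_t) (TOY_MAXIMUM_ALIGNOF - 1)))

/* Hypothetical payload that a redo routine would like to read as a struct. */
typedef struct ToyPayload
{
    uint64_t    block;
    uint32_t    offset;
} ToyPayload;

/*
 * Copy 'len' bytes from an arbitrary, possibly unaligned position inside a
 * raw record into a freshly allocated buffer. The caller may then cast the
 * result to a struct pointer and dereference it safely, and must free() it.
 */
void *
toy_copy_chunk_aligned(const char *raw, size_t len)
{
    void       *buf = malloc(TOY_MAXALIGN(len));

    if (buf == NULL)
        return NULL;
    memcpy(buf, raw, len);
    /* malloc() memory is aligned for any fundamental type; verify anyway. */
    assert(((uintptr_t) buf % TOY_MAXIMUM_ALIGNOF) == 0);
    return buf;
}
```

    Reading the struct directly at its unaligned position in the raw record
    would be undefined behavior in C and can fault on some architectures, which
    is why handing redo routines an aligned copy is convenient.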
    
    The new record format also makes the fixed-size XLogRecord header smaller,
    by removing the xl_len field. The length of the "main data" portion is now
    stored at the end of the WAL record, and there's a separate header after
    XLogRecord for it. The alignment padding at the end of XLogRecord is also
    removed. This compensates for the fact that the new format would otherwise
    be more bulky than the old format.
    
    Reviewed by Andres Freund, Amit Kapila, Michael Paquier, Alvaro Herrera,
    Fujii Masao.
nbtpage.c 56.5 KB