    Close some holes in BRIN page assignment · ccc4c074
    Alvaro Herrera authored
    In some corner cases, brin_getinsertbuffer can extend the BRIN index
    relation, but its callers then do not immediately use the new page for
    anything; when this happens, the page is
    initialized and the FSM is updated (by brin_getinsertbuffer) with the
    info about that page, but these actions are not WAL-logged.  A later
    index insert/update can use the page, but since the page is already
    initialized, the initialization itself is not WAL-logged at that point either.
    Replay of this sequence of events causes recovery to fail altogether.
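    To make the failing sequence concrete, here is a minimal C sketch under
    stated assumptions: a relation is a toy array of pages, the WAL is an
    ordered list of records replayed onto an empty relation, and every name
    (ToyRelation, toy_extend_and_init, toy_replay and so on) is hypothetical,
    not PostgreSQL's buffer manager or WAL API.  When the page initialization
    is not logged, replay meets an insert into a page it has never seen
    initialized and gives up.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical toy model, not PostgreSQL code: a relation is an array of
 * pages and the WAL is an ordered list of records. */
#define TOY_MAX_PAGES 16
#define TOY_MAX_WAL 64

typedef enum { TOY_WAL_INIT_PAGE, TOY_WAL_INSERT_TUPLE } ToyWalKind;

typedef struct {
    ToyWalKind kind;
    int blk;
} ToyWalRecord;

typedef struct {
    ToyWalRecord recs[TOY_MAX_WAL];
    int n;
} ToyWal;

typedef struct {
    bool initialized[TOY_MAX_PAGES];
    int ntuples[TOY_MAX_PAGES];
    int npages;
} ToyRelation;

static void toy_wal_append(ToyWal *wal, ToyWalKind kind, int blk)
{
    assert(wal->n < TOY_MAX_WAL);
    wal->recs[wal->n].kind = kind;
    wal->recs[wal->n].blk = blk;
    wal->n++;
}

/* Extend the relation and initialize the new page; log_init controls
 * whether the initialization reaches the WAL (false models the bug). */
static int toy_extend_and_init(ToyRelation *rel, ToyWal *wal, bool log_init)
{
    assert(rel->npages < TOY_MAX_PAGES);
    int blk = rel->npages++;
    rel->initialized[blk] = true;
    rel->ntuples[blk] = 0;
    if (log_init)
        toy_wal_append(wal, TOY_WAL_INIT_PAGE, blk);
    return blk;
}

/* A later insert finds the page already initialized, so it logs only
 * the tuple insertion itself. */
static void toy_insert(ToyRelation *rel, ToyWal *wal, int blk)
{
    assert(rel->initialized[blk]);
    rel->ntuples[blk]++;
    toy_wal_append(wal, TOY_WAL_INSERT_TUPLE, blk);
}

/* Replay the WAL onto an empty relation.  Returns false when an insert
 * targets a page whose initialization was never logged. */
static bool toy_replay(const ToyWal *wal, ToyRelation *out)
{
    memset(out, 0, sizeof *out);
    for (int i = 0; i < wal->n; i++) {
        int blk = wal->recs[i].blk;
        if (blk >= out->npages)
            out->npages = blk + 1;
        if (wal->recs[i].kind == TOY_WAL_INIT_PAGE) {
            out->initialized[blk] = true;
            out->ntuples[blk] = 0;
        } else if (!out->initialized[blk]) {
            return false;               /* recovery fails here */
        } else {
            out->ntuples[blk]++;
        }
    }
    return true;
}
```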
    
    There is a related corner case within brin_getinsertbuffer itself, in
    which we extend the relation to put a new index tuple there, but later
    find out that we cannot do so, and do not return the buffer; the page
    obtained from extension is not even initialized.  The resulting page is
    lost forever.
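    The page leak can be sketched the same way.  In this hypothetical toy
    model (not PostgreSQL code; ToyRelation, toy_get_page and
    toy_count_lost_pages are illustrative names), an array stands in for the
    free space map, and a request for space consults it before extending the
    relation.  A page that is obtained by extension and then abandoned is
    neither initialized nor recorded in the FSM, so no later request finds it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model, not PostgreSQL code. */
#define TOY_MAX_PAGES 16

typedef struct {
    bool initialized[TOY_MAX_PAGES];
    int fsm_free[TOY_MAX_PAGES];    /* 0: not recorded as having free space */
    int npages;
} ToyRelation;

/* Return a page with free space according to the FSM, extending the
 * relation when none is recorded.  A freshly extended page is left
 * uninitialized and absent from the FSM. */
static int toy_get_page(ToyRelation *rel, bool *extended)
{
    for (int blk = 0; blk < rel->npages; blk++) {
        if (rel->fsm_free[blk] > 0) {
            *extended = false;
            return blk;
        }
    }
    assert(rel->npages < TOY_MAX_PAGES);
    *extended = true;
    return rel->npages++;
}

/* Pages that no future request can reach: never initialized and never
 * entered in the FSM. */
static int toy_count_lost_pages(const ToyRelation *rel)
{
    int lost = 0;
    for (int blk = 0; blk < rel->npages; blk++)
        if (!rel->initialized[blk] && rel->fsm_free[blk] == 0)
            lost++;
    return lost;
}
```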
    
    To fix, shuffle the code so that, in normal cases, initialization is no
    longer the responsibility of brin_getinsertbuffer; instead, its callers
    (brin_doinsert and brin_doupdate) initialize the page once they're
    certain that it is going to be used.
    When either of those functions determines that the new page cannot be
    used, before bailing out it initializes the page as an empty regular
    page, enters it in the FSM, and WAL-logs all of this.  This way, the page
    is usable for future index insertions, and WAL replay never tries to
    insert tuples into pages whose initialization didn't make it to the WAL.
    The same strategy is used in brin_getinsertbuffer when it cannot return
    the new page.
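    A minimal sketch of the reshuffled responsibilities, again on a
    hypothetical toy model rather than PostgreSQL's buffer, FSM and WAL APIs
    (toy_get_insert_block and toy_doinsert are illustrative stand-ins, not
    the real brin_getinsertbuffer and brin_doinsert).  The page returned by
    extension is raw; the caller initializes and WAL-logs it either just
    before using it or, when bailing out, as an empty page that it also
    enters in the FSM.  In both paths the initialization record precedes any
    tuple inserted into the page.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model, not PostgreSQL code. */
#define TOY_MAX_PAGES 16
#define TOY_MAX_WAL 64
#define TOY_PAGE_CAPACITY 4

typedef enum { TOY_WAL_INIT_PAGE, TOY_WAL_INSERT_TUPLE } ToyWalKind;
typedef struct { ToyWalKind kind; int blk; } ToyWalRecord;
typedef struct { ToyWalRecord recs[TOY_MAX_WAL]; int n; } ToyWal;

typedef struct {
    bool initialized[TOY_MAX_PAGES];
    int ntuples[TOY_MAX_PAGES];
    int fsm_free[TOY_MAX_PAGES];    /* 0: not recorded as having free space */
    int npages;
} ToyRelation;

static void toy_wal_append(ToyWal *wal, ToyWalKind kind, int blk)
{
    assert(wal->n < TOY_MAX_WAL);
    wal->recs[wal->n++] = (ToyWalRecord){ kind, blk };
}

/* Returns a page with room per the FSM, or extends the relation.  It does
 * not initialize the new page; that is left to the caller. */
static int toy_get_insert_block(ToyRelation *rel, bool *extended)
{
    for (int blk = 0; blk < rel->npages; blk++)
        if (rel->fsm_free[blk] > 0) {
            *extended = false;
            return blk;
        }
    assert(rel->npages < TOY_MAX_PAGES);
    *extended = true;
    return rel->npages++;
}

/* Initialize a page as empty and WAL-log the initialization. */
static void toy_init_page_logged(ToyRelation *rel, ToyWal *wal, int blk)
{
    rel->initialized[blk] = true;
    rel->ntuples[blk] = 0;
    toy_wal_append(wal, TOY_WAL_INIT_PAGE, blk);
}

/* Caller side.  can_use == false models the corner case where the caller
 * discovers it cannot use the page after all.  Returns the block that
 * received the tuple, or -1 when bailing out. */
static int toy_doinsert(ToyRelation *rel, ToyWal *wal, bool can_use)
{
    bool extended;
    int blk = toy_get_insert_block(rel, &extended);

    if (!can_use) {
        if (extended) {
            /* Leave the new page usable: empty, in the FSM, and logged. */
            toy_init_page_logged(rel, wal, blk);
            rel->fsm_free[blk] = TOY_PAGE_CAPACITY;
        }
        return -1;
    }
    if (extended)
        toy_init_page_logged(rel, wal, blk);  /* now certain it is used */
    rel->ntuples[blk]++;
    rel->fsm_free[blk] = TOY_PAGE_CAPACITY - rel->ntuples[blk];
    toy_wal_append(wal, TOY_WAL_INSERT_TUPLE, blk);
    return blk;
}
```

    In the sketch, a page's initialization and its WAL record are always
    emitted together, so replay never meets a tuple insert into a page it
    has not seen initialized, and a bailed-out page is found again through
    the FSM instead of forcing another extension.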
    
    Additionally, add a new step to vacuuming so that all pages of the index
    are scanned; whenever an uninitialized page is found, it is initialized
    as empty and WAL-logged.  This closes the hole where the relation is
    extended but the system crashes before anything about it is WAL-logged.
    We also take this opportunity to update the FSM, in case it has gotten
    out of date.
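    The new vacuum step amounts to a linear pass over every block.  The
    sketch below uses the same kind of hypothetical toy model (not
    PostgreSQL's API; toy_vacuum_scan is an illustrative name): uninitialized
    pages, such as one left behind by a crash right after extension, are
    initialized as empty and WAL-logged, and every page's free space is
    re-recorded in the FSM.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model, not PostgreSQL code. */
#define TOY_MAX_PAGES 16
#define TOY_MAX_WAL 64
#define TOY_PAGE_CAPACITY 4

typedef enum { TOY_WAL_INIT_PAGE } ToyWalKind;
typedef struct { ToyWalKind kind; int blk; } ToyWalRecord;
typedef struct { ToyWalRecord recs[TOY_MAX_WAL]; int n; } ToyWal;

typedef struct {
    bool initialized[TOY_MAX_PAGES];
    int ntuples[TOY_MAX_PAGES];
    int fsm_free[TOY_MAX_PAGES];    /* may be stale before the scan */
    int npages;
} ToyRelation;

/* Scan every block; initialize and WAL-log any uninitialized page, then
 * refresh the FSM entry from the page's actual contents.  Returns the
 * number of pages that had to be initialized. */
static int toy_vacuum_scan(ToyRelation *rel, ToyWal *wal)
{
    int initialized_now = 0;

    for (int blk = 0; blk < rel->npages; blk++) {
        if (!rel->initialized[blk]) {
            rel->initialized[blk] = true;
            rel->ntuples[blk] = 0;
            assert(wal->n < TOY_MAX_WAL);
            wal->recs[wal->n++] = (ToyWalRecord){ TOY_WAL_INIT_PAGE, blk };
            initialized_now++;
        }
        rel->fsm_free[blk] = TOY_PAGE_CAPACITY - rel->ntuples[blk];
    }
    return initialized_now;
}
```

    Re-running the scan is harmless in this model: once every page is
    initialized, it only refreshes the FSM.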
    
    Thanks to Heikki Linnakangas for finding the problem that kicked off some
    additional analysis of the BRIN page assignment code.
    
    Backpatch to 9.5, where BRIN was introduced.
    
    Discussion: https://www.postgresql.org/message-id/20150723204810.GY5596@postgresql.org