    Prevent possibility of panics during shutdown checkpoint. · c6c33343
    Andres Freund authored
    When the checkpointer writes the shutdown checkpoint, it checks
    afterwards whether any WAL has been written since it started and
    throws a PANIC if so.  At that point, only walsenders are still
    active, so one might think this could not happen, but walsenders can
    also generate WAL, for instance in BASE_BACKUP and logical decoding
    related commands (e.g. via hint bits).  So they can trigger this panic
    if such a command is run while the shutdown checkpoint is being
    written.
    
    To fix this, divide the walsender shutdown into two phases.  First,
    the checkpointer, itself triggered by the postmaster, sends a
    PROCSIG_WALSND_INIT_STOPPING signal to all walsenders.  If the backend
    is idle or running an SQL query, this causes the backend to shut down;
    if logical replication is in progress, all existing WAL records are
    processed, followed by a shutdown.  Otherwise this causes the walsender
    to switch to the "stopping" state.  In this state, the walsender will
    reject any further replication commands.  The checkpointer begins the
    shutdown checkpoint once all walsenders are confirmed as stopping.
    When the shutdown checkpoint finishes, the postmaster sends the
    walsenders SIGUSR2.  This instructs each walsender to send any
    outstanding WAL, including the shutdown checkpoint record, wait for it
    to be replicated to the standby, and then exit.
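
    The two phases can be illustrated with a toy model.  This is only a
    sketch of the ordering described above, not the PostgreSQL code: the
    State enum, the WalSender class, its method names, and the integer WAL
    positions are all hypothetical; only PROCSIG_WALSND_INIT_STOPPING and
    SIGUSR2 come from the description.

```python
from enum import Enum, auto


class State(Enum):
    # Hypothetical state names, chosen for this illustration only.
    IDLE = auto()       # connected, no command running
    SQL_QUERY = auto()  # running a plain SQL query
    LOGICAL = auto()    # logical replication in progress
    STREAMING = auto()  # any other replication activity
    STOPPING = auto()   # rejects further replication commands
    EXITED = auto()


class WalSender:
    def __init__(self, state):
        self.state = state
        self.sent_upto = 0  # toy WAL position last sent to the standby

    def on_init_stopping(self, wal_end):
        """Phase 1: react to PROCSIG_WALSND_INIT_STOPPING."""
        if self.state in (State.IDLE, State.SQL_QUERY):
            self.state = State.EXITED
        elif self.state is State.LOGICAL:
            self.sent_upto = wal_end  # process all existing WAL, then exit
            self.state = State.EXITED
        else:
            self.state = State.STOPPING

    def replication_command(self, name):
        """Run a replication command unless the walsender is shutting down."""
        if self.state in (State.STOPPING, State.EXITED):
            raise RuntimeError(f"cannot execute {name}: walsender is stopping")
        return name

    def on_sigusr2(self, wal_end):
        """Phase 2: send outstanding WAL, including the checkpoint, then exit."""
        if self.state is State.STOPPING:
            self.sent_upto = wal_end
            self.state = State.EXITED


def shutdown(walsenders, wal_end):
    """Drive both phases; returns the toy WAL position after the checkpoint."""
    for ws in walsenders:
        ws.on_init_stopping(wal_end)
    # The checkpoint may start only once no walsender can generate more WAL.
    assert all(ws.state in (State.STOPPING, State.EXITED) for ws in walsenders)
    checkpoint_end = wal_end + 1  # toy stand-in for the checkpoint record
    for ws in walsenders:
        ws.on_sigusr2(checkpoint_end)
    return checkpoint_end
```

    In this model a walsender that was streaming when phase one began ends
    up having sent the shutdown checkpoint record, while one that was idle
    exits immediately and never sees it.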
    
    Author: Andres Freund, based on an earlier patch by Michael Paquier
    Reported-By: Fujii Masao, Andres Freund
    Reviewed-By: Michael Paquier
    Discussion: https://postgr.es/m/20170602002912.tqlwn4gymzlxpvs2@alap3.anarazel.de
    Backpatch: 9.4, where logical decoding was introduced