-
Peter Eisentraut authored
When the checkpointer writes the shutdown checkpoint, it checks afterwards whether any WAL has been written since it started and throws a PANIC if so. At that point, only walsenders are still active, so one might think this could not happen, but walsenders can also generate WAL, for instance in BASE_BACKUP and certain variants of CREATE_REPLICATION_SLOT. So they can trigger this panic if such a command is run while the shutdown checkpoint is being written. To fix this, divide the walsender shutdown into two phases. First, the postmaster sends a SIGUSR2 signal to all walsenders. The walsenders then put themselves into the "stopping" state. In this state, they reject any new commands. (For simplicity, we reject all new commands, so that in the future we do not have to track meticulously which commands might generate WAL.) The checkpointer waits for all walsenders to reach this state before proceeding with the shutdown checkpoint. After the shutdown checkpoint is done, the postmaster sends SIGINT (previously unused) to the walsenders. This triggers the existing shutdown behavior of sending out the shutdown checkpoint record and then terminating. Author: Michael Paquier <michael.paquier@gmail.com> Reported-by: Fujii Masao <masao.fujii@gmail.com>
086221cf