    Harden pmsignal.c against clobbered shared memory. · b10546ec
    Tom Lane authored
    The postmaster is not supposed to do anything that depends
    fundamentally on shared memory contents, because that creates
    the risk that a backend crash that trashes shared memory will
    take the postmaster down with it, preventing automatic recovery.
    In commit 969d7cd4 I lost sight of this principle and coded
    AssignPostmasterChildSlot() in such a way that it could fail
    or even crash if the shared PMSignalState structure became
    corrupted.  Remarkably, we've not seen field reports of such
    crashes; but I managed to induce one while testing the recent
    changes around palloc chunk headers.
    
    To fix, make a semi-duplicative state array inside the postmaster
    so that we need consult only local state while choosing a "child
    slot" for a new backend.  Ensure that other postmaster-executed
    routines in pmsignal.c don't have critical dependencies on the
    shared state, either.  Corruption of PMSignalState might now
    lead ReleasePostmasterChildSlot() to conclude that backend X
    failed, when actually backend Y was the one that trashed things.
    But that doesn't matter, because we'll force a cluster-wide reset
    regardless.
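
    The approach described above can be sketched roughly as follows.  This
    is a simplified illustration, not the actual pmsignal.c code: the names
    NUM_SLOTS, SharedSignalState, local_slot_in_use, assign_child_slot()
    and release_child_slot() are hypothetical, and a static struct stands
    in for real shared memory.  The point is that slot selection reads only
    the postmaster-private array, while the shared flags are written but
    never trusted.

```c
#include <stdbool.h>

#define NUM_SLOTS     8   /* hypothetical fixed number of child slots */
#define SLOT_UNUSED   0
#define SLOT_ASSIGNED 1

/* Stand-in for the shared structure; a crashing backend may scribble on it. */
typedef struct
{
    volatile int flags[NUM_SLOTS];
} SharedSignalState;

static SharedSignalState shared_state;      /* pretend this is shared memory */

/* Postmaster-private duplicate: the only state slot selection depends on. */
static bool local_slot_in_use[NUM_SLOTS];
static int  next_slot = 0;

/* Pick a free slot using local state only; returns -1 if none is free. */
int
assign_child_slot(void)
{
    for (int n = 0; n < NUM_SLOTS; n++)
    {
        int slot = (next_slot + n) % NUM_SLOTS;

        if (!local_slot_in_use[slot])
        {
            local_slot_in_use[slot] = true;
            /* Write the shared flag for the child's benefit; never read it here. */
            shared_state.flags[slot] = SLOT_ASSIGNED;
            next_slot = (slot + 1) % NUM_SLOTS;
            return slot;
        }
    }
    return -1;
}

/*
 * Release a slot.  The shared flag is consulted only to produce the return
 * value; a corrupted flag can make the answer wrong, but cannot make this
 * function fail or leave the local bookkeeping inconsistent.
 */
bool
release_child_slot(int slot)
{
    bool was_still_assigned;

    if (slot < 0 || slot >= NUM_SLOTS)
        return false;           /* reject bogus slot numbers outright */

    was_still_assigned = (shared_state.flags[slot] == SLOT_ASSIGNED);
    shared_state.flags[slot] = SLOT_UNUSED;
    local_slot_in_use[slot] = false;
    return was_still_assigned;
}
```

    In this sketch, overwriting shared_state.flags[] with garbage changes
    only what release_child_slot() reports; assign_child_slot() keeps
    handing out slots according to its private array.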
    
    Back-patch to all supported branches, since this is an old bug.
    
    Discussion: https://postgr.es/m/3436789.1665187055@sss.pgh.pa.us