    Fix incorrect order of lock file removal and failure to close() sockets. · d73d14c2
    Tom Lane authored
    Commit c9b0cbe9 accidentally broke the
    order of operations during postmaster shutdown: it resulted in removing
    the per-socket lockfiles after, not before, postmaster.pid.  This creates
    a race-condition hazard for a new postmaster that's started immediately
    after observing that postmaster.pid has disappeared; if it sees the
    socket lockfile still present, it will quite properly refuse to start.
    This error appears to be the explanation for at least some of the
    intermittent buildfarm failures we've seen in the pg_upgrade test.
    
    Another problem, which has been there all along, is that the postmaster
    has never bothered to close() its listen sockets, but has just allowed them
    to close at process death.  This creates a different race condition for an
    incoming postmaster: it might be unable to bind to the desired listen
    address because the old postmaster is still incumbent.  This might explain
    some odd failures we've seen in the past, too.  (Note: this is not related
    to the fact that individual backends don't close their client communication
    sockets.  That behavior is intentional and is not changed by this patch.)
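
The bind hazard described above can be reproduced in miniature. The following is only a sketch under stated assumptions, not the postmaster's code: `open_listen_socket` and `bound_port` are hypothetical helper names, and a TCP socket on the loopback address stands in for the postmaster's listen sockets. While the old descriptor stays open, a second bind to the same address fails; after an explicit close() it succeeds.

```c
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: open a TCP listen socket on 127.0.0.1:port.
 * Passing port 0 lets the kernel pick a free port.
 * Returns the socket descriptor, or -1 on failure.
 */
static int
open_listen_socket(unsigned short port)
{
    struct sockaddr_in addr;
    int         fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);

    /* bind() fails while another socket is still listening on the address */
    if (bind(fd, (struct sockaddr *) &addr, sizeof(addr)) < 0 ||
        listen(fd, 8) < 0)
    {
        close(fd);
        return -1;
    }
    return fd;
}

/*
 * Hypothetical helper: report the port a socket is bound to,
 * or 0 if it cannot be determined.
 */
static unsigned short
bound_port(int fd)
{
    struct sockaddr_in addr;
    socklen_t   len = sizeof(addr);

    if (getsockname(fd, (struct sockaddr *) &addr, &len) < 0)
        return 0;
    return ntohs(addr.sin_port);
}
```

A caller would open a socket with port 0, read the chosen port with `bound_port`, and observe that a second `open_listen_socket` on that port returns -1 until the first descriptor is passed to close().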
    
    Fix by adding an on_proc_exit function that closes the postmaster's ports
    explicitly, and (in 9.3 and up) reshuffling the responsibility for where
    to unlink the Unix socket files.  Lock file unlinking can stay where it
    is, but teach it to unlink the lock files in reverse order of creation.
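
The reverse-order unlinking can be sketched as below. This is an illustration only, not the patch itself: the registry, `register_lock_file`, `unlink_lock_files`, and the fixed-size array are all hypothetical names and choices, and the C library's atexit() stands in for on_proc_exit. The sketch assumes the postmaster.pid equivalent is registered first, so unlinking in reverse order of creation removes it last.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define MAX_LOCK_FILES 8
#define MAX_PATH_LEN   256

/* Hypothetical registry of lock files, in order of creation. */
static char lock_files[MAX_LOCK_FILES][MAX_PATH_LEN];
static int  n_lock_files = 0;

/*
 * Create an empty lock file and remember it.
 * Returns 0 on success, -1 on failure.
 */
static int
register_lock_file(const char *path)
{
    FILE       *fp;

    if (n_lock_files >= MAX_LOCK_FILES || strlen(path) >= MAX_PATH_LEN)
        return -1;

    fp = fopen(path, "w");
    if (fp == NULL)
        return -1;
    fclose(fp);

    strcpy(lock_files[n_lock_files++], path);
    return 0;
}

/*
 * Unlink all registered lock files, newest first.  If "removed" is not
 * NULL, the paths are recorded there in the order they were unlinked.
 * Returns the number of entries processed.
 */
static int
unlink_lock_files(const char **removed)
{
    int         count = 0;

    while (n_lock_files > 0)
    {
        const char *path = lock_files[--n_lock_files];

        unlink(path);
        if (removed != NULL)
            removed[count] = path;
        count++;
    }
    return count;
}

/* Exit callback with the signature atexit() expects. */
static void
unlink_lock_files_at_exit(void)
{
    unlink_lock_files(NULL);
}
```

Registering `unlink_lock_files_at_exit` with atexit() once at startup would run the cleanup at normal process exit. Because the first-created file is removed last, a successor that waits for it to vanish never finds a later-created lock file still in place.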
libpq.h 3.18 KB