• Tom Lane's avatar
    Rearrange pgstat_bestart() to avoid failures within its critical section. · 85ccb689
    Tom Lane authored
    We long ago decided to design the shared PgBackendStatus data structure to
    minimize the cost of writing status updates, which means that writers just
    have to increment the st_changecount field twice.  That isn't hooked into
    any sort of resource management mechanism, which means that if something
    were to throw error between the two increments, the st_changecount field
    would be left odd indefinitely.  That would cause readers to lock up.
    Now, since it's also a bad idea to leave the field odd for longer than
    absolutely necessary (because readers will spin while we have it set),
    the expectation was that we'd treat these segments like spinlock critical
    sections, with only short, more or less straight-line, code in them.
    
    That was fine as originally designed, but commit 9029f4b3 broke it
    by inserting a significant amount of non-straight-line code into
    pgstat_bestart(), code that is very capable of throwing errors, not to
    mention taking a significant amount of time during which readers will spin.
    We have a report from Neeraj Kumar of readers actually locking up, which
    I suspect was due to an encoding conversion error in X509_NAME_to_cstring,
    though conceivably it was just a garden-variety OOM failure.
    
    Subsequent commits have loaded even more dubious code into pgstat_bestart's
    critical section (and commit fc70a4b0 deserves some kind of booby prize
    for managing to miss the critical section entirely, although the negative
    consequences seem minimal given that the PgBackendStatus entry should be
    seen by readers as inactive at that point).
    
    The right way to fix this mess seems to be to compute all these values
    into a local copy of the process' PgBackendStatus struct, and then just
    copy the data back within the critical section proper.  This plan can't
    be implemented completely cleanly because of the struct's heavy reliance
    on out-of-line strings, which we must initialize separately within the
    critical section.  But still, the critical section is far smaller and
    safer than it was before.
    
    In hopes of forestalling future errors of the same ilk, rename the
    macros for st_changecount management to make it more apparent that
    the writer-side macros create a critical section.  And to prevent
    the worst consequences if we nonetheless manage to mess it up anyway,
    adjust those macros so that they really are a critical section, ie
    they now bump CritSectionCount.  That doesn't add much overhead, and
    it guarantees that if we do somehow throw an error while the counter
    is odd, it will lead to PANIC and a database restart to reset shared
    memory.
    
    Back-patch to 9.5 where the problem was introduced.
    
    In HEAD, also fix an oversight in commit b0b39f72: it failed to teach
    pgstat_read_current_status to copy st_gssstatus data from shared memory to
    local memory.  Hence, subsequent use of that data within the transaction
    would potentially see changing data that it shouldn't see.
    
    Discussion: https://postgr.es/m/CAPR3Wj5Z17=+eeyrn_ZDG3NQGYgMEOY6JV6Y-WRRhGgwc16U3Q@mail.gmail.com
    85ccb689
pgstat.c 174 KB