• Tom Lane's avatar
    Fix race condition in predicate-lock init code in EXEC_BACKEND builds. · e2c8100e
    Tom Lane authored
    Trading a little too heavily on letting the code path be the same whether
    we were creating shared data structures or only attaching to them,
    InitPredicateLocks() inserted the "scratch" PredicateLockTargetHash entry
    unconditionally.  This is just wrong if we're in a postmaster child,
    which would only reach this code in EXEC_BACKEND builds.  Most of the
    time, the hash_search(HASH_ENTER) call would simply report that the
    entry already existed, causing no visible effect since the code did not
    bother to check for that possibility.  However, if this happened while
    some other backend had transiently removed the "scratch" entry, then
    that other backend's eventual RestoreScratchTarget would suffer an
    assert failure; this appears to be the explanation for a recent failure
    on buildfarm member culicidae.  In non-assert builds, there would be
    no visible consequences there either.  But nonetheless this is a pretty
    bad bug for EXEC_BACKEND builds, for two reasons:
    
    1. Each new backend would perform the hash_search(HASH_ENTER) call
    without holding any lock that would prevent concurrent access to the
    PredicateLockTargetHash hash table.  This creates a low but certainly
    nonzero risk of corruption of that hash table.
    
    2. In the event that the race condition occurred, by reinserting the
    scratch entry too soon, we were defeating the entire purpose of the
    scratch entry, namely to guarantee that transaction commit could move
    hash table entries around with no risk of out-of-memory failure.
    The odds of an actual OOM failure are quite low, but not zero, and if
    it did happen it would again result in corruption of the hash table.
    
    The user-visible symptoms of such corruption are a little hard to predict,
    but would presumably amount to misbehavior of SERIALIZABLE transactions
    that'd require a crash or postmaster restart to fix.
    
    To fix, just skip the hash insertion if IsUnderPostmaster.  I also
    inserted a bunch of assertions that the expected things happen
    depending on whether IsUnderPostmaster is true.  That might be overkill,
    since most comparable code in other functions isn't quite that paranoid,
    but once burnt twice shy.
    
    In passing, also move a couple of lines to places where they seemed
    to make more sense.
    
    Diagnosis of problem by Thomas Munro, patch by me.  Back-patch to
    all supported branches.
    
    Discussion: https://postgr.es/m/10593.1500670709@sss.pgh.pa.us
    e2c8100e
predicate.c 158 KB