• Tom Lane's avatar
    Fix longstanding recursion hazard in sinval message processing. · f868a814
    Tom Lane authored
    LockRelationOid and sibling routines supposed that, if our session already
    holds the lock they were asked to acquire, they could skip calling
    AcceptInvalidationMessages on the grounds that we must have already read
    any remote sinval messages issued against the relation being locked.
    This is normally true, but there's a critical special case where it's not:
    processing inside AcceptInvalidationMessages might attempt to access system
    relations, resulting in a recursive call to acquire a relation lock.
    
    Hence, if the outer call had acquired that same system catalog lock, we'd
    fall through, despite the possibility that there's an as-yet-unread sinval
    message for that system catalog.  This could, for example, result in
    failure to access a system catalog or index that had just been processed
    by VACUUM FULL.  This is the explanation for buildfarm failures we've been
    seeing intermittently for the past three months.  The bug is far older
    than that, but commits a54e1f15 et al added a new recursion case within
    AcceptInvalidationMessages that is apparently easier to hit than any
    previous case.
    
    To fix this, we must not skip calling AcceptInvalidationMessages until
    we have *finished* a call to it since acquiring a relation lock, not
    merely acquired the lock.  (There's already adequate logic inside
    AcceptInvalidationMessages to deal with being called recursively.)
    Fortunately, we can implement that at trivial cost, by adding a flag
    to LOCALLOCK hashtable entries that tracks whether we know we have
    completed such a call.
    
    There is an API hazard added by this patch for external callers of
    LockAcquire: if anything is testing for LOCKACQUIRE_ALREADY_HELD,
    it might be fooled by the new return code LOCKACQUIRE_ALREADY_CLEAR
    into thinking the lock wasn't already held.  This should be a fail-soft
    condition, though, unless something very bizarre is being done in
    response to the test.
    
    Also, I added an additional output argument to LockAcquireExtended,
    assuming that that probably isn't called by any outside code given
    the very limited usefulness of its additional functionality.
    
    Back-patch to all supported branches.
    
    Discussion: https://postgr.es/m/12259.1532117714@sss.pgh.pa.us
    f868a814
lock.c 133 KB