• Tom Lane's avatar
    Avoid improbable PANIC during heap_update. · 34f581c3
    Tom Lane authored
    heap_update needs to clear any existing "all visible" flag on
    the old tuple's page (and on the new page too, if different).
    Per coding rules, to do this it must acquire pin on the appropriate
    visibility-map page while not holding exclusive buffer lock;
    which creates a race condition since someone else could set the
    flag whenever we're not holding the buffer lock.  The code is
    supposed to handle that by re-checking the flag after acquiring
    buffer lock and retrying if it became set.  However, one code
    path through heap_update itself, as well as one in its subroutine
    RelationGetBufferForTuple, failed to do this.  The end result,
    in the unlikely event that a concurrent VACUUM did set the flag
    while we're transiently not holding lock, is a non-recurring
    "PANIC: wrong buffer passed to visibilitymap_clear" failure.
    
    This has been seen a few times in the buildfarm since recent VACUUM
    changes that added code paths that could set the all-visible flag
    while holding only exclusive buffer lock.  Previously, the flag
    was (usually?) set only after doing LockBufferForCleanup, which
    would insist on buffer pin count zero, thus preventing the flag
    from becoming set partway through heap_update.  However, it's
    clear that it's heap_update not VACUUM that's at fault here.
    
    What's less clear is whether there is any hazard from these bugs
    in released branches.  heap_update is certainly violating API
    expectations, but if there is no code path that can set all-visible
    without a cleanup lock then it's only a latent bug.  That's not
    100% certain though, besides which we should worry about extensions
    or future back-patch fixes that could introduce such code paths.
    
    I chose to back-patch to v12.  Fixing RelationGetBufferForTuple
    before that would require also back-patching portions of older
    fixes (notably 0d1fe9f7), which is more code churn than seems
    prudent to fix a hypothetical issue.
    
    Discussion: https://postgr.es/m/2247102.1618008027@sss.pgh.pa.us
    34f581c3
heapam.c 300 KB