    Recycle nbtree pages deleted during same VACUUM. · 9dd963ae
    Peter Geoghegan authored
    Maintain a simple array of metadata about pages that were deleted during
    nbtree VACUUM's current btvacuumscan() call.  Use this metadata at the
    end of btvacuumscan() to attempt to place newly deleted pages in the FSM
    without further delay.  It might not yet be safe to place any of the
    pages in the FSM by then (they may not be deemed recyclable), but we
    have little to lose and plenty to gain by trying.  In practice there is
    a very good chance that this will work out when vacuuming larger
    indexes, where scanning the index naturally takes quite a while.
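
The mechanism described above can be sketched in C under simplifying assumptions. All names here (`PendingPages`, `pending_add()`, `pending_finalize()`, `FullXid`, the `horizon` parameter) are hypothetical and are not the nbtree API. XIDs are modeled as plain 64-bit counters, and a page counts as recyclable once its stamped XID precedes `horizon`. Where the real code would place pages in the FSM, the sketch simply collects their block numbers in an output array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint32_t BlockNumber;   /* stand-in for PostgreSQL's BlockNumber */
typedef uint64_t FullXid;       /* hypothetical stand-in for a 64-bit XID */

/* One entry per page deleted during the current scan (hypothetical layout). */
typedef struct PendingPage
{
    BlockNumber target;   /* block number of the deleted page */
    FullXid     safexid;  /* XID stamped on the page when it was deleted */
} PendingPage;

typedef struct PendingPages
{
    PendingPage *items;
    size_t       n;
    size_t       cap;
} PendingPages;

/*
 * Remember a newly deleted page. Returns -1 on allocation failure. Callers
 * can ignore that, because the array is only an optimization: this sketch
 * assumes a later VACUUM scan would still find the deleted page.
 */
static int
pending_add(PendingPages *pp, BlockNumber target, FullXid safexid)
{
    if (pp->n == pp->cap)
    {
        size_t       newcap = pp->cap ? pp->cap * 2 : 16;
        PendingPage *grown = realloc(pp->items, newcap * sizeof(*grown));

        if (grown == NULL)
            return -1;
        pp->items = grown;
        pp->cap = newcap;
    }
    pp->items[pp->n].target = target;
    pp->items[pp->n].safexid = safexid;
    pp->n++;
    return 0;
}

/*
 * At the end of the scan, copy the block number of every page whose safexid
 * precedes horizon into out, which must have room for pp->n entries, and
 * return how many were copied. Entries were appended in deletion order, so
 * safexid never decreases. The loop can therefore stop at the first page
 * that is not yet recyclable.
 */
static size_t
pending_finalize(const PendingPages *pp, FullXid horizon, BlockNumber *out)
{
    size_t      nfree = 0;

    assert(pp != NULL && (pp->n == 0 || out != NULL));
    for (size_t i = 0; i < pp->n; i++)
    {
        if (pp->items[i].safexid >= horizon)
            break;
        out[nfree++] = pp->items[i].target;
    }
    return nfree;
}

/* Release the array once the scan is finished. */
static void
pending_free(PendingPages *pp)
{
    free(pp->items);
    pp->items = NULL;
    pp->n = pp->cap = 0;
}
```

Appending to a growable array keeps the cost of each deletion trivial. Stopping at the first page that is not yet safe relies on this sketch's assumption that stamps never decrease in deletion order.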
    
    This commit doesn't change the page recycling invariants; it merely
    improves the efficiency of page recycling within the confines of the
    existing design.  Recycle safety is a part of nbtree's implementation of
    what Lanin & Shasha call "the drain technique".  The design happens to
    use transaction IDs (they're stored in deleted pages), but that in
    itself doesn't align the cutoff for recycle safety to any of the
    XID-based cutoffs used by VACUUM (e.g., OldestXmin).  All that matters
    is whether or not _other_ backends might be able to observe various
    inconsistencies in the tree structure (that they cannot just detect and
    recover from by moving right).  Recycle safety is purely a question of
    maintaining the consistency (or the apparent consistency) of a physical
    data structure.
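
The following is a minimal C model of that rule, offered as an illustration rather than nbtree's actual check. It makes two hypothetical assumptions. First, deletion stamps the page with the next XID that would be assigned. Second, the relevant horizon is the oldest xmin held by any other backend, recomputed at the moment of the check instead of reusing a cutoff fixed when VACUUM started. `FullXid`, `recycle_horizon()`, and `page_is_recyclable()` are illustrative names only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t FullXid;   /* hypothetical: XIDs modeled as a 64-bit counter */

/*
 * Oldest xmin still held by any other backend, or next_xid when none is.
 * The value is recomputed on every call, so a check made at the end of a
 * long scan may see a later horizon than one made when VACUUM began.
 */
static FullXid
recycle_horizon(const FullXid *xmins, size_t nbackends, FullXid next_xid)
{
    FullXid     horizon = next_xid;

    assert(nbackends == 0 || xmins != NULL);
    for (size_t i = 0; i < nbackends; i++)
    {
        if (xmins[i] < horizon)
            horizon = xmins[i];
    }
    return horizon;
}

/*
 * A deleted page stamped with safexid is recyclable once safexid precedes
 * the horizon. In this model, every backend still running then took its
 * snapshot after the deletion, so none of them can still reach the page
 * through a stale link.
 */
static bool
page_is_recyclable(FullXid safexid, const FullXid *xmins, size_t nbackends,
                   FullXid next_xid)
{
    return safexid < recycle_horizon(xmins, nbackends, next_xid);
}
```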
    
    Note that running a simple serial test case involving a large range
    DELETE followed by a VACUUM VERBOSE will probably show that any newly
    deleted nbtree pages are not yet reusable/recyclable.  This is expected
    in the absence of even one concurrent XID assignment.  It is an old
    implementation restriction.  In practice it's unlikely to be the thing
    that makes recycling remain unsafe, at least with larger indexes, where
    recycling newly deleted pages during the same VACUUM actually matters.
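
The serial case can be reproduced with a toy XID counter. This is again a hypothetical model, not PostgreSQL code. Suppose deletion stamps the page with the next XID to be assigned and no other backend holds an xmin. The horizon is then that same next XID, so `safexid < horizon` stays false until at least one XID is assigned after the deletion. `XidCounter`, `assign_xid()`, `delete_page_stamp()`, and `serial_recyclable()` are illustrative names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy model of XID assignment; not PostgreSQL code. */
typedef struct XidCounter
{
    uint64_t    next_xid;   /* next XID that would be handed out */
} XidCounter;

/* Assign an XID to some transaction, advancing the counter. */
static uint64_t
assign_xid(XidCounter *c)
{
    return c->next_xid++;
}

/* Deleting a page stamps it with the next XID without consuming it. */
static uint64_t
delete_page_stamp(const XidCounter *c)
{
    return c->next_xid;
}

/*
 * Serial case: no other backend holds an xmin, so the horizon is just
 * next_xid. The page is recyclable only once safexid < horizon.
 */
static bool
serial_recyclable(const XidCounter *c, uint64_t safexid)
{
    uint64_t    horizon = c->next_xid;

    assert(safexid <= horizon);   /* a stamp can never run ahead of the counter */
    return safexid < horizon;
}
```

With no assignment between deletion and the check, the stamp equals the horizon and the page stays unrecyclable. A single concurrent XID assignment is enough to flip the result.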
    
    An important high-level goal of this commit (as well as related recent
    commits e5d8a999 and 9f3665fb) is to make expensive deferred cleanup
    operations in index AMs rare in general.  If index vacuuming frequently
    depends on the next VACUUM operation finishing off work that the current
    operation started, then the general behavior of index vacuuming is hard
    to predict.  This is relevant to ongoing work that adds a vacuumlazy.c
    mechanism to skip index vacuuming in certain cases.  Anything that makes
    the real-world behavior of index vacuuming simpler and more linear will
    also make top-down modeling in vacuumlazy.c more robust.
    
    Author: Peter Geoghegan <pg@bowt.ie>
    Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com>
    Discussion: https://postgr.es/m/CAH2-Wzk76_P=67iUscb1UN44-gyZL-KgpsXbSxq_bdcMa7Q+wQ@mail.gmail.com