    Teach VACUUM to bypass unnecessary index vacuuming. · 5100010e
    Peter Geoghegan authored
    VACUUM has never needed to call ambulkdelete() for each index in cases
    where there are precisely zero TIDs in its dead_tuples array by the end
    of its first pass over the heap (also its only pass over the heap in
    this scenario).  Index vacuuming is simply not required when this
    happens.  Index cleanup will still go ahead, but in practice most calls
    to amvacuumcleanup() are no-ops when there were zero preceding
    ambulkdelete() calls.  In short, VACUUM has generally managed to avoid
    index scans when there were clearly no index tuples to delete from
    indexes.  But cases with _close to_ no index tuples to delete were
    another matter -- a round of ambulkdelete() calls took place (one per
    index), each of which performed a full index scan.
    
    VACUUM now behaves just as if there were zero index tuples to delete in
    cases where there are in fact "virtually zero" such tuples.  That is, it
    can now bypass index vacuuming and heap vacuuming as an optimization
    (though not index cleanup).  Whether or not VACUUM bypasses indexes is
    determined dynamically, based on the just-observed number of heap pages
    in the table that have one or more LP_DEAD items (LP_DEAD items in heap
    pages have a 1:1 correspondence with index tuples that still need to be
    deleted from each index in the worst case).
    
    We only skip index vacuuming when 2% or less of the table's pages have
    one or more LP_DEAD items -- bypassing index vacuuming as an
    optimization must not noticeably impede setting bits in the visibility
    map.  As a further condition, the dead_tuples array (i.e. VACUUM's array
    of LP_DEAD item TIDs) must not exceed 32MB at the point that the first
    pass over the heap finishes, which is also when the decision to bypass
    is made.  (VACUUM must also have been able to fit all TIDs in its
    maintenance_work_mem-bound dead_tuples space, though with the default
    maintenance_work_mem setting this condition can't matter.)
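
The conditions above can be sketched as a small standalone predicate.
This is only an illustration of the heuristic as described in this
message, not the code in vacuumlazy.c: the function name, the macro
names, and the parameter names are hypothetical, the 6-byte TID size is
an assumption used to convert an item count into an array size, and the
inclusive comparisons follow the wording "2% or less" and "must not
exceed 32MB".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Thresholds taken from the commit message; all names are hypothetical. */
#define SKETCH_BYPASS_PAGE_FRACTION 0.02                  /* 2% of heap pages */
#define SKETCH_BYPASS_MAX_TID_BYTES (32L * 1024L * 1024L) /* 32MB */
#define SKETCH_TID_BYTES 6          /* assumed size of one heap TID */

/*
 * Decide, at the end of the first heap pass, whether index vacuuming
 * (and heap vacuuming) may be bypassed.  Index cleanup is not covered
 * here; per the message it still goes ahead either way.
 *
 * rel_pages          total heap pages in the table
 * lpdead_item_pages  heap pages with one or more LP_DEAD items
 * lpdead_items       total LP_DEAD items (TIDs) collected
 * all_tids_fit       true if every TID fit in the dead_tuples space
 */
static bool
sketch_should_bypass_index_vacuum(uint64_t rel_pages,
                                  uint64_t lpdead_item_pages,
                                  uint64_t lpdead_items,
                                  bool all_tids_fit)
{
    double      page_threshold;
    uint64_t    tid_bytes;

    /* A table cannot have more affected pages than pages. */
    assert(lpdead_item_pages <= rel_pages);

    /* If the TID array overflowed, fall back to conventional vacuuming. */
    if (!all_tids_fit)
        return false;

    page_threshold = (double) rel_pages * SKETCH_BYPASS_PAGE_FRACTION;
    tid_bytes = lpdead_items * SKETCH_TID_BYTES;

    /* Both conditions must hold for the bypass to apply. */
    return (double) lpdead_item_pages <= page_threshold &&
        tid_bytes <= (uint64_t) SKETCH_BYPASS_MAX_TID_BYTES;
}
```

For example, under these assumptions a 10000-page table with LP_DEAD
items on 100 pages would qualify for the bypass, while the same table
with LP_DEAD items on 300 pages would get conventional index vacuuming.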
    
    This avoids surprising jumps in the duration and overhead of routine
    vacuuming with workloads where successive VACUUM operations consistently
    have almost zero dead index tuples.  The number of LP_DEAD items may
    well accumulate over multiple VACUUM operations, before finally the
    threshold is crossed and VACUUM performs conventional index vacuuming.
    Even then, the optimization will have avoided a great deal of largely
    unnecessary index vacuuming.
    
    In the future we may teach VACUUM to skip index vacuuming on a per-index
    basis, using a much more sophisticated approach.  For now we only
    consider the extreme cases, where simple heuristics let us be quite
    confident that index vacuuming just isn't worth it.
    
    Also log information about how many heap pages have one or more LP_DEAD
    items when autovacuum logging is enabled.
    
    Author: Masahiko Sawada <sawada.mshk@gmail.com>
    Author: Peter Geoghegan <pg@bowt.ie>
    Discussion: https://postgr.es/m/CAD21AoD0SkE11fMw4jD4RENAwBMcw1wasVnwpJVw3tVqPOQgAw@mail.gmail.com
    Discussion: https://postgr.es/m/CAH2-WzmkebqPd4MVGuPTOS9bMFvp9MDs5cRTCOsv1rQJ3jCbXw@mail.gmail.com