• Peter Geoghegan's avatar
    Enhance nbtree index tuple deletion. · d168b666
    Peter Geoghegan authored
    Teach nbtree and heapam to cooperate in order to eagerly remove
    duplicate tuples representing dead MVCC versions.  This is "bottom-up
    deletion".  Each bottom-up deletion pass is triggered lazily in response
    to a flood of versions on an nbtree leaf page.  This usually involves a
    "logically unchanged index" hint (these are produced by the executor
    mechanism added by commit 9dc718bd).
    
    The immediate goal of bottom-up index deletion is to avoid "unnecessary"
    page splits caused entirely by version duplicates.  It naturally has an
    even more useful effect, though: it acts as a backstop against
    accumulating an excessive number of index tuple versions for any given
    _logical row_.  Bottom-up index deletion complements what we might now
    call "top-down index deletion": index vacuuming performed by VACUUM.
    Bottom-up index deletion responds to the immediate local needs of
    queries, while leaving it up to autovacuum to perform infrequent clean
    sweeps of the index.  The overall effect is to avoid certain
    pathological performance issues related to "version churn" from UPDATEs.
    
    The previous tableam interface used by index AMs to perform tuple
    deletion (the table_compute_xid_horizon_for_tuples() function) has been
    replaced with a new interface that supports certain new requirements.
    Many (perhaps all) of the capabilities added to nbtree by this commit
    could also be extended to other index AMs.  That is left as work for a
    later commit.
    
    Extend deletion of LP_DEAD-marked index tuples in nbtree by adding logic
    to consider extra index tuples (that are not LP_DEAD-marked) for
    deletion in passing.  This increases the number of index tuples deleted
    significantly in many cases.  The LP_DEAD deletion process (which is now
    called "simple deletion" to clearly distinguish it from bottom-up
    deletion) won't usually need to visit any extra table blocks to check
    these extra tuples.  We have to visit the same table blocks anyway to
    generate a latestRemovedXid value (at least in the common case where the
    index deletion operation's WAL record needs such a value).
    
    Testing has shown that the "extra tuples" simple deletion enhancement
    increases the number of index tuples deleted with almost any workload
    that has LP_DEAD bits set in leaf pages.  That is, it almost never fails
    to delete at least a few extra index tuples.  It helps most of all in
    cases that happen to naturally have a lot of delete-safe tuples.  It's
    not uncommon for an individual deletion operation to end up deleting an
    order of magnitude more index tuples compared to the old naive approach
    (e.g., custom instrumentation of the patch shows that this happens
    fairly often when the regression tests are run).
    
    Add a further enhancement that augments simple deletion and bottom-up
    deletion in indexes that make use of deduplication: Teach nbtree's
    _bt_delitems_delete() function to support granular TID deletion in
    posting list tuples.  It is now possible to delete individual TIDs from
    posting list tuples provided the TIDs have a tableam block number of a
    table block that gets visited as part of the deletion process (visiting
    the table block can be triggered directly or indirectly).  Setting the
    LP_DEAD bit of a posting list tuple is still an all-or-nothing thing,
    but that matters much less now that deletion only needs to start out
    with the right _general_ idea about which index tuples are deletable.
    
    Bump XLOG_PAGE_MAGIC because xl_btree_delete changed.
    
    No bump in BTREE_VERSION, since there are no changes to the on-disk
    representation of nbtree indexes.  Indexes built on PostgreSQL 12 or
    PostgreSQL 13 will automatically benefit from bottom-up index deletion
    (i.e. no reindexing required) following a pg_upgrade.  The enhancement
    to simple deletion is available with all B-Tree indexes following a
    pg_upgrade, no matter what PostgreSQL version the user upgrades from.
    
    Author: Peter Geoghegan <pg@bowt.ie>
    Reviewed-By: default avatarHeikki Linnakangas <hlinnaka@iki.fi>
    Reviewed-By: default avatarVictor Yegorov <vyegorov@gmail.com>
    Discussion: https://postgr.es/m/CAH2-Wzm+maE3apHB8NOtmM=p-DO65j2V5GzAWCOEEuy3JZgb2g@mail.gmail.com
    d168b666
README 59 KB