• Peter Geoghegan's avatar
    Make heap TID a tiebreaker nbtree index column. · dd299df8
    Peter Geoghegan authored
    Make nbtree treat all index tuples as having a heap TID attribute.
    Index searches can distinguish duplicates by heap TID, since heap TID is
    always guaranteed to be unique.  This general approach has numerous
    benefits for performance, and is prerequisite to teaching VACUUM to
    perform "retail index tuple deletion".
    
    Naively adding a new attribute to every pivot tuple has unacceptable
    overhead (it bloats internal pages), so suffix truncation of pivot
    tuples is added.  This will usually truncate away the "extra" heap TID
    attribute from pivot tuples during a leaf page split, and may also
    truncate away additional user attributes.  This can increase fan-out,
    especially in a multi-column index.  Truncation can only occur at the
    attribute granularity, which isn't particularly effective, but works
    well enough for now.  A future patch may add support for truncating
    "within" text attributes by generating truncated key values using new
    opclass infrastructure.
    
    Only new indexes (BTREE_VERSION 4 indexes) will have insertions that
    treat heap TID as a tiebreaker attribute, or will have pivot tuples
    undergo suffix truncation during a leaf page split (on-disk
    compatibility with versions 2 and 3 is preserved).  Upgrades to version
    4 cannot be performed on-the-fly, unlike upgrades from version 2 to
    version 3.  contrib/amcheck continues to work with version 2 and 3
    indexes, while also enforcing stricter invariants when verifying version
    4 indexes.  These stricter invariants are the same invariants described
    by "3.1.12 Sequencing" from the Lehman and Yao paper.
    
    A later patch will enhance the logic used by nbtree to pick a split
    point.  This patch is likely to negatively impact performance without
    smarter choices around the precise point to split leaf pages at.  Making
    these two mostly-distinct sets of enhancements into distinct commits
    seems like it might clarify their design, even though neither commit is
    particularly useful on its own.
    
    The maximum allowed size of new tuples is reduced by an amount equal to
    the space required to store an extra MAXALIGN()'d TID in a new high key
    during leaf page splits.  The user-facing definition of the "1/3 of a
    page" restriction is already imprecise, and so does not need to be
    revised.  However, there should be a compatibility note in the v12
    release notes.
    
    Author: Peter Geoghegan
    Reviewed-By: Heikki Linnakangas, Alexander Korotkov
    Discussion: https://postgr.es/m/CAH2-WzkVb0Kom=R+88fDFb=JSxZMFvbHVC6Mn9LJ2n=X=kS-Uw@mail.gmail.com
    dd299df8
tuplesort.c 137 KB