    Consider outliers in split interval calculation. (commit 1542e16f)
    Author: Peter Geoghegan
    Commit 0d861bbb, which introduced deduplication to nbtree, added some
    logic to take large posting list tuples into account when choosing a
    split point.  We subtract firstright posting list overhead from the
    projected new high key size when calculating leftfree/rightfree values
    for an affected candidate split point.  Posting list tuples aren't
    special to nbtsplitloc.c, but taking them into account like this makes a
    huge difference in practice.  Posting list tuples are frequently tuple
    size outliers.
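
    A minimal standalone sketch of that deduction (not the actual
    nbtsplitloc.c code; the struct, field, and function names here are
    hypothetical): the posting list bytes in the would-be firstright
    tuple are subtracted from the projected new high key size before
    the leftfree/rightfree values are computed.

        #include <stddef.h>

        typedef struct CandidateSplit
        {
            size_t firstright_size;   /* size of first tuple on new right page */
            size_t posting_overhead;  /* posting list bytes within that tuple */
            size_t leftbytes;         /* data bytes kept on the left page */
            size_t rightbytes;        /* data bytes moved to the right page */
        } CandidateSplit;

        /*
         * Free space remaining on each half, given pagesize usable bytes
         * per page.  The left page must also fit a new high key, modelled
         * here as the firstright tuple minus its posting list overhead,
         * since a posting list never carries over into the high key.
         */
        static void
        split_free_space(const CandidateSplit *split, size_t pagesize,
                         size_t *leftfree, size_t *rightfree)
        {
            size_t highkey = split->firstright_size - split->posting_overhead;

            *leftfree = pagesize - split->leftbytes - highkey;
            *rightfree = pagesize - split->rightbytes;
        }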
    
    However, commit 0d861bbb missed a closely related issue: split interval
    itself is calculated based on the assumption that tuples on the page
    being split are roughly equisized.  That assumption was acceptable back
    when commit fab25024 taught the logic for choosing a split point about
    suffix truncation, but it's pretty questionable now that very large
    tuple sizes are common.  This oversight led to unbalanced page splits
    in low cardinality multi-column indexes when deduplication was used:
    splits that gave insufficient weight to how unbalanced the result was
    when the interval happened to include some large posting list tuples
    (and most other tuples on the page were not nearly so large).
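
    A hypothetical worked example of why the equisized assumption breaks
    down (the tuple sizes below are made up, not taken from any real
    page): on a page of mostly 20-byte tuples with two 700-byte posting
    list outliers, stepping from one legal split point to the next can
    shift hundreds of bytes between the halves, so an interval measured
    as a count of split points no longer bounds byte-level imbalance.

        #include <stdio.h>

        int
        main(void)
        {
            /* mostly small tuples, with two large posting list outliers */
            int sizes[] = {20, 20, 20, 20, 700, 20, 20, 700, 20, 20};
            int n = sizeof(sizes) / sizeof(sizes[0]);
            int total = 0;
            int left = 0;

            for (int i = 0; i < n; i++)
                total += sizes[i];

            /* report left/right byte totals at each candidate split point */
            for (int i = 0; i < n - 1; i++)
            {
                left += sizes[i];
                printf("split after tuple %d: left=%d right=%d\n",
                       i + 1, left, total - left);
            }

            return 0;
        }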
    
    Nail this down by calculating an initial split interval in a way that's
    attuned to the actual cost that we want to keep under control (not a
    fuzzy proxy for the cost): apply a leftfree + rightfree evenness test to
    each candidate split point that actually gets included in the split
    interval (for the default strategy).  This replaces logic that used a
    percentage of all legal split points for the page as the basis of the
    initial split interval.
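
    A sketch of that idea, again with hypothetical names, and under the
    assumption that the candidate array is sorted with the most even
    splits first: grow the initial interval one candidate at a time,
    stopping as soon as a candidate's leftfree/rightfree imbalance
    exceeds the budget.

        #include <stddef.h>

        typedef struct CandidateSplit
        {
            size_t leftfree;   /* projected free space on left page */
            size_t rightfree;  /* projected free space on right page */
        } CandidateSplit;

        /*
         * Admit candidates into the interval only while the split they
         * produce stays acceptably even, measured directly in free-space
         * terms rather than as a percentage of all legal split points.
         */
        static int
        initial_split_interval(const CandidateSplit *splits, int nsplits,
                               size_t imbalance_budget)
        {
            int interval = 0;

            for (int i = 0; i < nsplits; i++)
            {
                size_t delta = splits[i].leftfree > splits[i].rightfree
                    ? splits[i].leftfree - splits[i].rightfree
                    : splits[i].rightfree - splits[i].leftfree;

                if (delta > imbalance_budget)
                    break;
                interval++;
            }

            /* always consider at least one candidate split point */
            return interval > 0 ? interval : 1;
        }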
    
    Discussion: https://postgr.es/m/CAH2-WznJt5aT2uUB2Bs+JBLdwe0XTX67+xeLFcaNvCKxO=QBVQ@mail.gmail.com