Peter Geoghegan authored
Commit 0d861bbb, which introduced deduplication to nbtree, added some logic to take large posting list tuples into account when choosing a split point. We subtract firstright posting list overhead from the projected new high key size when calculating leftfree/rightfree values for an affected candidate split point. Posting list tuples aren't special to nbtsplitloc.c, but taking them into account like this makes a huge difference in practice. Posting list tuples are frequently tuple size outliers.

However, commit 0d861bbb missed a closely related issue: the split interval itself is calculated based on the assumption that tuples on the page being split are roughly equisized. That assumption was acceptable back when commit fab25024 taught the logic for choosing a split point about suffix truncation, but it's pretty questionable now that very large tuple sizes are common. This oversight led to unbalanced page splits in low cardinality multi-column indexes when deduplication was used: page splits that don't give sufficient weight to how unbalanced the split is when the interval happens to include some large posting list tuples (and when most other tuples on the page are not so large).

Nail this down by calculating an initial split interval in a way that's attuned to the actual cost that we want to keep under control (not a fuzzy proxy for the cost): apply a leftfree + rightfree evenness test to each candidate split point that actually gets included in the split interval (for the default strategy). This replaces logic that used a percentage of all legal split points for the page as the basis of the initial split interval.

Discussion: https://postgr.es/m/CAH2-WznJt5aT2uUB2Bs+JBLdwe0XTX67+xeLFcaNvCKxO=QBVQ@mail.gmail.com
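To make the new approach concrete, here is a minimal, hypothetical C sketch of a leftfree + rightfree evenness test used to filter candidates into the split interval. This is not the actual nbtsplitloc.c code: the SplitPoint struct layout, the split_is_even_enough() and build_split_interval() helpers, and the 1/16 imbalance threshold are all illustrative assumptions; only the idea of admitting a candidate based on how even its leftfree/rightfree values are (rather than taking a fixed percentage of all legal split points) comes from the commit message.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical candidate split point (not the real nbtsplitloc.c struct) */
typedef struct SplitPoint
{
    int     leftfree;   /* free space left on left page after split */
    int     rightfree;  /* free space left on right page after split */
} SplitPoint;

/*
 * Evenness test: a candidate qualifies for the initial split interval
 * only when the imbalance between leftfree and rightfree stays within a
 * fixed share of the total free space.  The 1/16 threshold is an
 * arbitrary stand-in chosen for this sketch.
 */
static bool
split_is_even_enough(const SplitPoint *split)
{
    int     totalfree = split->leftfree + split->rightfree;
    int     delta = abs(split->leftfree - split->rightfree);

    return delta <= totalfree / 16;
}

/*
 * Build the initial split interval by applying the evenness test to each
 * candidate, instead of taking a percentage of all legal split points.
 * Returns the number of admitted candidates; their indexes are written
 * to the caller-supplied interval[] array.
 */
static int
build_split_interval(const SplitPoint *candidates, int ncandidates,
                     int *interval)
{
    int     n = 0;

    for (int i = 0; i < ncandidates; i++)
    {
        if (split_is_even_enough(&candidates[i]))
            interval[n++] = i;
    }

    return n;
}

int
main(void)
{
    /* Mostly small tuples, plus one large posting list outlier */
    SplitPoint  candidates[] = {
        {2000, 2080},   /* nearly even: admitted */
        {2020, 2060},   /* nearly even: admitted */
        {400, 3680},    /* outlier makes split lopsided: rejected */
    };
    int         interval[3];
    int         n = build_split_interval(candidates, 3, interval);

    printf("%d of 3 candidates admitted to split interval\n", n);
    return 0;
}
```

The point of filtering this way is that a candidate near a large posting list tuple is judged by the imbalance it actually produces, so outlier-sized tuples can no longer drag an otherwise even-looking interval toward an unbalanced split.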
1542e16f