    Add deduplication to nbtree. (commit 0d861bbb)
    Peter Geoghegan authored
    Deduplication reduces the storage overhead of duplicates in indexes that
    use the standard nbtree index access method.  The deduplication process
    is applied lazily, after the point where opportunistic deletion of
    LP_DEAD-marked index tuples occurs.  Deduplication is only applied at
    the point where a leaf page split would otherwise be required.  New
    posting list tuples are formed by merging together existing duplicate
    tuples.  The physical representation of the items on an nbtree leaf page
    is made more space efficient by deduplication, but the logical contents
    of the page are not changed.  Even unique indexes make use of
    deduplication as a way of controlling bloat from duplicates whose TIDs
    point to different versions of the same logical table row.
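    The merge step can be sketched conceptually as follows. This is a minimal
    Python model, not the actual C implementation: index tuples are simplified
    to (key, heap TID) pairs, and a posting list tuple is modeled as a key
    paired with a list of TIDs. Since leaf-page items are kept in key order,
    merging adjacent equal keys is enough.

```python
# Conceptual sketch of nbtree deduplication: leaf-page tuples with
# equal keys are merged into a single "posting list" tuple that stores
# the key once plus a list of heap TIDs.  Simplified Python model, not
# PostgreSQL's actual C code.

def deduplicate_items(items):
    """items: list of (key, tid) pairs in leaf-page (key) order.
    Returns posting-list tuples as (key, [tids])."""
    posting_lists = []
    for key, tid in items:
        if posting_lists and posting_lists[-1][0] == key:
            posting_lists[-1][1].append(tid)  # merge into current posting list
        else:
            posting_lists.append((key, [tid]))  # start a new entry
    return posting_lists

# Four duplicates of key 7 collapse into one posting list tuple:
page = [(5, (1, 1)), (7, (1, 2)), (7, (1, 3)), (7, (2, 1)), (7, (2, 4)), (9, (3, 2))]
print(deduplicate_items(page))
# → [(5, [(1, 1)]), (7, [(1, 2), (1, 3), (2, 1), (2, 4)]), (9, [(3, 2)])]
```

    Note that the logical contents are unchanged: every (key, TID) pair from
    the input can be recovered from the output, which is why no BTREE_VERSION
    bump is needed.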
    
    The lazy approach taken by nbtree has significant advantages over a GIN
    style eager approach.  Most individual inserts of index tuples have
    exactly the same overhead as before.  The extra overhead of
    deduplication is amortized across insertions, just like the overhead of
    page splits.  The key space of indexes works in the same way as it has
    since commit dd299df8 (the commit that made heap TID a tiebreaker
    column).
    
    Testing has shown that nbtree deduplication can generally make indexes
    with roughly 10 to 15 tuples per distinct key value about 2.5X - 4X
    smaller, even with single column integer indexes (e.g., an index on a
    referencing column that accompanies a foreign key).  The final size of
    single column nbtree indexes comes close to the final size of a similar
    contrib/btree_gin index, at least in cases where GIN's posting list
    compression isn't very effective.  This can significantly improve
    transaction throughput, and significantly reduce the cost of vacuuming
    indexes.
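
    The 2.5X - 4X figure is roughly consistent with a back-of-envelope model
    of the on-disk overheads, using PostgreSQL's standard sizes (4-byte line
    pointer, 8-byte index tuple header, 6-byte heap TID, 8-byte MAXALIGN).
    The model below ignores page headers and posting-list bookkeeping, so
    treat it as an approximation only:

```python
# Rough estimate of deduplication savings for a single-column int4
# index.  Assumed sizes: 4-byte line pointer, 8-byte index tuple
# header, 6-byte heap TID, MAXALIGN of 8.  Page headers and posting
# list bookkeeping are ignored, so this is only an approximation.

def maxalign(n, align=8):
    return -(-n // align) * align  # round up to alignment boundary

def plain_size(ndups, key_size=4):
    # One line pointer plus one aligned tuple (header + key) per duplicate.
    return ndups * (4 + maxalign(8 + key_size))

def posting_size(ndups, key_size=4):
    # One line pointer plus one aligned tuple storing the key once and
    # a heap TID for every duplicate.
    return 4 + maxalign(8 + key_size + 6 * ndups)

for ndups in (10, 15):
    ratio = plain_size(ndups) / posting_size(ndups)
    print(f"{ndups} duplicates per key: ~{ratio:.1f}x smaller")
# → 10 duplicates per key: ~2.6x smaller
# → 15 duplicates per key: ~2.8x smaller
```

    Larger observed ratios are plausible once per-page effects (fuller pages,
    fewer page splits) are taken into account, which this model leaves out.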
    
    A new index storage parameter (deduplicate_items) controls the use of
    deduplication.  The default setting is 'on', so all new B-Tree indexes
    automatically use deduplication where possible.  This decision will be
    reviewed at the end of the Postgres 13 beta period.
    
    There is a regression of approximately 2% of transaction throughput with
    synthetic workloads that consist of append-only inserts into a table
    with several non-unique indexes, where all indexes have few or no
    repeated values.  The underlying issue is that cycles are wasted on
    unsuccessful attempts at deduplicating items in non-unique indexes.
    There doesn't seem to be a way around it short of disabling
    deduplication entirely.  Note that deduplication of items in unique
    indexes is fairly well targeted in general, which avoids the problem
    there (we can use a special heuristic to trigger deduplication passes in
    unique indexes, since we're specifically targeting "version bloat").
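
    The decision rule described above can be sketched as follows. This is a
    speculative simplification in Python; the real criterion lives in
    nbtree's C code and may differ in its details. The key observation is
    that a unique index can only accumulate duplicates when multiple
    versions of the same logical row are present, so a duplicate incoming
    key is a reliable signal that a deduplication pass will pay off:

```python
# Hedged sketch of the trigger logic described above.  Deduplication is
# lazy: it is only considered when a leaf page split would otherwise be
# needed.  In unique indexes a pass is attempted only when the incoming
# key collides with an existing entry ("version bloat" signal).
# Simplified illustration, not PostgreSQL's actual C logic.

def should_try_dedup(is_unique_index, page_would_split, incoming_key_is_duplicate):
    if not page_would_split:
        return False  # lazy: never deduplicate ahead of need
    if is_unique_index:
        # Duplicates in a unique index imply version churn, so a pass
        # is well targeted; otherwise it would be wasted cycles.
        return incoming_key_is_duplicate
    return True  # non-unique indexes always try before splitting
```

    Under this rule, unique indexes with no version churn skip the pass
    entirely, which is why the ~2% regression only shows up with
    non-unique indexes that happen to contain few or no duplicates.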
    
    Bump XLOG_PAGE_MAGIC because xl_btree_vacuum changed.
    
    No bump in BTREE_VERSION, since the representation of posting list
    tuples works in a way that's backwards compatible with version 4 indexes
    (i.e. indexes built on PostgreSQL 12).  However, users must still
    REINDEX a pg_upgrade'd index to use deduplication, regardless of the
    Postgres version they've upgraded from.  This is the only way to set the
    new nbtree metapage flag indicating that deduplication is generally
    safe.
    
    Author: Anastasia Lubennikova, Peter Geoghegan
    Reviewed-By: Peter Geoghegan, Heikki Linnakangas
    Discussion:
        https://postgr.es/m/55E4051B.7020209@postgrespro.ru
        https://postgr.es/m/4ab6e2db-bcee-f4cf-0916-3a06e6ccbb55@postgrespro.ru