• Peter Geoghegan's avatar
    Use full 64-bit XIDs in deleted nbtree pages. · e5d8a999
    Peter Geoghegan authored
    Otherwise we risk "leaking" deleted pages by making them non-recyclable
    indefinitely.  Commit 6655a729 did the same thing for deleted pages in
    GiST indexes.  That work was used as a starting point here.
    
    Stop storing an XID indicating the oldest bpto.xact across all deleted
    though unrecycled pages in nbtree metapages.  There is no longer any
    reason to care about that condition/the oldest XID.  It only ever made
    sense when wraparound was something _bt_vacuum_needs_cleanup() had to
    consider.
    
    The btm_oldest_btpo_xact metapage field has been repurposed and renamed.
    It is now btm_last_cleanup_num_delpages, which is used to remember how
    many non-recycled deleted pages remain from the last VACUUM (in practice
    its value is usually the precise number of pages that were _newly
    deleted_ during the specific VACUUM operation that last set the field).
    
    The general idea behind storing btm_last_cleanup_num_delpages is to use
    it to give _some_ consideration to non-recycled deleted pages inside
    _bt_vacuum_needs_cleanup() -- though never too much.  We only really
    need to avoid leaving a truly excessive number of deleted pages in an
    unrecycled state forever.  We only do this to cover certain narrow cases
    where no other factor makes VACUUM do a full scan, and yet the index
    continues to grow (and so actually misses out on recycling existing
    deleted pages).
    
    These metapage changes result in a clear user-visible benefit: We no
    longer trigger full index scans during VACUUM operations solely due to
    the presence of only 1 or 2 known deleted (though unrecycled) blocks
    from a very large index.  All that matters now is keeping the costs and
    benefits in balance over time.
    
    Fix an issue that has been around since commit 857f9c36, which added the
    "skip full scan of index" mechanism (i.e. the _bt_vacuum_needs_cleanup()
    logic).  The accuracy of btm_last_cleanup_num_heap_tuples accidentally
    hinged upon _when_ the source value gets stored.  We now always store
    btm_last_cleanup_num_heap_tuples in btvacuumcleanup().  This fixes the
    issue because IndexVacuumInfo.num_heap_tuples (the source field) is
    expected to accurately indicate the state of the table _after_ the
    VACUUM completes inside btvacuumcleanup().
    
    A backpatchable fix cannot easily be extracted from this commit.  A
    targeted fix for the issue will follow in a later commit, though that
    won't happen today.
    
    I (pgeoghegan) have chosen to remove any mention of deleted pages in the
    documentation of the vacuum_cleanup_index_scale_factor GUC/param, since
    the presence of deleted (though unrecycled) pages is no longer of much
    concern to users.  The vacuum_cleanup_index_scale_factor description in
    the docs now seems rather unclear in any case, and it should probably be
    rewritten in the near future.  Perhaps some passing mention of page
    deletion will be added back at the same time.
    
    Bump XLOG_PAGE_MAGIC due to nbtree WAL records using full XIDs now.
    
    Author: Peter Geoghegan <pg@bowt.ie>
    Reviewed-By: default avatarMasahiko Sawada <sawada.mshk@gmail.com>
    Discussion: https://postgr.es/m/CAH2-WznpdHvujGUwYZ8sihX=d5u-tRYhi-F4wnV2uN2zHpMUXw@mail.gmail.com
    e5d8a999
standby.c 45 KB