src/backend/access/nbtree/README · 9dd963ae2534e9614f0abeccaafbd39f1b93ff8a · Abuhujair Javed / Postgres FD Implementation

Recycle nbtree pages deleted during same VACUUM. · 9dd963ae

Peter Geoghegan authored Mar 21, 2021

Maintain a simple array of metadata about pages that were deleted during
nbtree VACUUM's current btvacuumscan() call. Use this metadata at the
end of btvacuumscan() to attempt to place newly deleted pages in the FSM
without further delay. It might not yet be safe to place any of the
pages in the FSM by then (they may not be deemed recyclable), but we
have little to lose and plenty to gain by trying. In practice there is
a very good chance that this will work out when vacuuming larger
indexes, where scanning the index naturally takes quite a while.

This commit doesn't change the page recycling invariants; it merely
improves the efficiency of page recycling within the confines of the
existing design. Recycle safety is a part of nbtree's implementation of
what Lanin & Shasha call "the drain technique". The design happens to
use transaction IDs (they're stored in deleted pages), but that in
itself doesn't align the cutoff for recycle safety to any of the
XID-based cutoffs used by VACUUM (e.g., OldestXmin). All that matters
is whether or not _other_ backends might be able to observe various
inconsistencies in the tree structure (that they cannot just detect and
recover from by moving right). Recycle safety is purely a question of
maintaining the consistency (or the apparent consistency) of a physical
data structure.

Note that running a simple serial test case involving a large range
DELETE followed by a VACUUM VERBOSE will probably show that any newly
deleted nbtree pages are not yet reusable/recyclable. This is expected
in the absence of even one concurrent XID assignment. It is an old
implementation restriction. In practice it's unlikely to be the thing
that makes recycling remain unsafe, at least with larger indexes, where
recycling newly deleted pages during the same VACUUM actually matters.

An important high-level goal of this commit (as well as related recent
commits e5d8a999 and 9f3665fb) is to make expensive deferred cleanup
operations in index AMs rare in general. If index vacuuming frequently
depends on the next VACUUM operation finishing off work that the current
operation started, then the general behavior of index vacuuming is hard
to predict. This is relevant to ongoing work that adds a vacuumlazy.c
mechanism to skip index vacuuming in certain cases. Anything that makes
the real world behavior of index vacuuming simpler and more linear will
also make top-down modeling in vacuumlazy.c more robust.

Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/CAH2-Wzk76_P=67iUscb1UN44-gyZL-KgpsXbSxq_bdcMa7Q+wQ@mail.gmail.com

9dd963ae

README 61.4 KB

Replace README