Commit b150a767 authored by Peter Geoghegan

Fix nbtree deduplication README commentary.

Descriptions of some aspects of how deduplication works were unclear in
a couple of places.
parent 112b006f
@@ -780,20 +780,21 @@ order. Delaying deduplication minimizes page level fragmentation.
 Deduplication in unique indexes
 -------------------------------
 
-Very often, the range of values that can be placed on a given leaf page in
-a unique index is fixed and permanent. For example, a primary key on an
-identity column will usually only have page splits caused by the insertion
-of new logical rows within the rightmost leaf page. If there is a split
-of a non-rightmost leaf page, then the split must have been triggered by
-inserts associated with an UPDATE of an existing logical row. Splitting a
-leaf page purely to store multiple versions should be considered
-pathological, since it permanently degrades the index structure in order
-to absorb a temporary burst of duplicates. Deduplication in unique
-indexes helps to prevent these pathological page splits. Storing
-duplicates in a space efficient manner is not the goal, since in the long
-run there won't be any duplicates anyway. Rather, we're buying time for
-standard garbage collection mechanisms to run before a page split is
-needed.
+Very often, the number of distinct values that can ever be placed on
+almost any given leaf page in a unique index is fixed and permanent. For
+example, a primary key on an identity column will usually only have leaf
+page splits caused by the insertion of new logical rows within the
+rightmost leaf page. If there is a split of a non-rightmost leaf page,
+then the split must have been triggered by inserts associated with UPDATEs
+of existing logical rows. Splitting a leaf page purely to store multiple
+versions is a false economy. In effect, we're permanently degrading the
+index structure just to absorb a temporary burst of duplicates.
+
+Deduplication in unique indexes helps to prevent these pathological page
+splits. Storing duplicates in a space efficient manner is not the goal,
+since in the long run there won't be any duplicates anyway. Rather, we're
+buying time for standard garbage collection mechanisms to run before a
+page split is needed.
 
 Unique index leaf pages only get a deduplication pass when an insertion
 (that might have to split the page) observed an existing duplicate on the
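
The gate described in this hunk can be illustrated with a short C sketch. This is a minimal model of the decision only, not PostgreSQL's actual code: LeafPageState, should_attempt_dedup, and all of their fields are invented for illustration, and the real implementation tracks this state quite differently.

/*
 * Hypothetical sketch; names and types are invented and do not appear
 * in PostgreSQL.  Models when a leaf page that is about to split
 * should first get a deduplication pass.
 */
#include <stdbool.h>

typedef struct LeafPageState
{
    bool    is_unique_index;    /* does the index enforce uniqueness? */
    bool    saw_duplicate;      /* did this insertion observe an existing
                                 * duplicate of the incoming key? */
    int     free_space;         /* usable bytes left on the page */
} LeafPageState;

static bool
should_attempt_dedup(const LeafPageState *page, int new_item_size)
{
    /* Page isn't full: insert normally, delay deduplication */
    if (new_item_size <= page->free_space)
        return false;

    /*
     * In a unique index, duplicates can only be extra versions of the
     * same logical row.  If the insertion saw no duplicate, there is
     * nothing to merge, so go straight to a page split.
     */
    if (page->is_unique_index && !page->saw_duplicate)
        return false;

    /* Try to buy time for garbage collection before splitting */
    return true;
}

The unique-index special case is the second test: without an observed duplicate there can be no extra row versions to merge, so a deduplication pass could not free any space.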
@@ -838,13 +839,15 @@ list splits.
 
 Only a few isolated extra steps are required to preserve the illusion that
 the new item never overlapped with an existing posting list in the first
-place: the heap TID of the incoming tuple is swapped with the rightmost/max
-heap TID from the existing/originally overlapping posting list. Also, the
-posting-split-with-page-split case must generate a new high key based on
-an imaginary version of the original page that has both the final new item
-and the after-list-split posting tuple (page splits usually just operate
-against an imaginary version that contains the new item/item that won't
-fit).
+place: the heap TID of the incoming tuple has its TID replaced with the
+rightmost/max heap TID from the existing/originally overlapping posting
+list. Similarly, the original incoming item's TID is relocated to the
+appropriate offset in the posting list (we usually shift TIDs out of the
+way to make a hole for it). Finally, the posting-split-with-page-split
+case must generate a new high key based on an imaginary version of the
+original page that has both the final new item and the after-list-split
+posting tuple (page splits usually just operate against an imaginary
+version that contains the new item/item that won't fit).
 
 This approach avoids inventing an "eager" atomic posting split operation
 that splits the posting list without simultaneously finishing the insert
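
The TID shuffle in the changed paragraph can be sketched in a few lines of C. The sketch below is a simplified model under stated assumptions, not the real implementation (in PostgreSQL the logic operates on whole IndexTuples; see _bt_swap_posting() in nbtdedup.c): the posting list is modeled as a plain sorted array of bare TIDs, the incoming tuple takes over the list's rightmost/max TID, and the original incoming TID is shifted into its ordered slot.

/*
 * Simplified illustration; the Tid struct and both functions are
 * invented for this sketch and are not PostgreSQL's types.
 */
#include <string.h>

typedef struct Tid
{
    unsigned int    block;      /* heap block number */
    unsigned short  offset;     /* item offset within block */
} Tid;

static int
tid_cmp(Tid a, Tid b)
{
    if (a.block != b.block)
        return a.block < b.block ? -1 : 1;
    if (a.offset != b.offset)
        return a.offset < b.offset ? -1 : 1;
    return 0;
}

/*
 * Precondition: posting[] holds ntids TIDs in ascending order, and
 * incoming falls strictly below posting[ntids - 1], i.e. the new item
 * overlaps the posting list's TID range.
 *
 * Returns the posting list's rightmost/max TID (which becomes the
 * final new item's heap TID), shifts the larger TIDs right to make a
 * hole, and relocates the original incoming TID to its ordered offset.
 */
static Tid
posting_split_swap(Tid *posting, int ntids, Tid incoming)
{
    Tid max_tid = posting[ntids - 1];
    int pos = 0;

    while (pos < ntids - 1 && tid_cmp(posting[pos], incoming) < 0)
        pos++;

    /* shift TIDs out of the way to make a hole for the incoming TID */
    memmove(&posting[pos + 1], &posting[pos],
            (size_t) (ntids - 1 - pos) * sizeof(Tid));
    posting[pos] = incoming;

    return max_tid;
}

After the swap the posting list holds the same number of TIDs, still in ascending order, and the "new" item carries a TID greater than all of them, as if it had never overlapped in the first place.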