Commit 624686ab authored by Peter Geoghegan

Adjust "root of to-be-deleted subtree" function.

Restructure the function that locates the root of the to-be-deleted
subtree during nbtree page deletion.  Handle the conditions that make
page deletion unsafe in a slightly more uniform way, and acknowledge the
fact that the behavior with incomplete splits on internal pages is
different (as pointed out in the nbtree README as of commit 35bc0ec7).
Also invent new terminology that avoids ambiguity around which pages are
about to be deleted.  Consistently use the term "to-be-deleted subtree",
not the ambiguous term "branch".

We were calling the subtree parent page the "top parent page", but that
was quite misleading.  The top parent page usually refers to a page
unlinked from its siblings and marked deleted (during the second stage
of page deletion).  There was one kind of top parent page that we merely
removed a downlink from, and another kind of top parent page that we
actually marked deleted.  Eliminate the ambiguity by inventing a new
term ("subtree parent page") that refers to the former kind of page
only.
parent a8be5364
@@ -223,9 +223,10 @@ right to make this happen --- a scan moving in the opposite direction
 might miss the items if so.) Also, we *never* delete the rightmost page
 on a tree level (this restriction simplifies the traversal algorithms, as
 explained below). Page deletion always begins from an empty leaf page. An
-internal page can only be deleted as part of a branch leading to a leaf
-page, where each internal page has only one child and that child is also to
-be deleted.
+internal page can only be deleted as part of deleting an entire subtree.
+This is always a "skinny" subtree consisting of a "chain" of internal pages
+plus a single leaf page. There is one page on each level of the subtree,
+and each level/page covers the same key space.
 
 Deleting a leaf page is a two-stage process. In the first stage, the page
 is unlinked from its parent, and marked as half-dead. The parent page must
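Editor's sketch (not part of the commit): the "skinny" subtree described in the hunk above can be pictured with a toy in-memory model. Everything here (`ToyPage`, `toy_find_subtree_top`) is hypothetical and is not the nbtree API. It models only two rules stated in the text: the rightmost page on a level is never deleted, and an internal page joins the to-be-deleted chain only when the page below it is its sole child. Other safety conditions, such as incomplete splits, are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy in-memory page; NOT the real nbtree page layout. */
typedef struct ToyPage
{
    struct ToyPage *parent;     /* NULL for the root */
    int             nchildren;  /* number of downlinks (0 for a leaf) */
    bool            rightmost;  /* rightmost page on its level? */
} ToyPage;

/*
 * Starting from an empty leaf, ascend while the parent has exactly one
 * child (so it would become empty too).  Returns the topmost page of the
 * to-be-deleted subtree, or NULL if this toy model considers deletion
 * unsafe.  *subtree_parent receives the page that survives and merely
 * loses one downlink.
 */
static ToyPage *
toy_find_subtree_top(ToyPage *leaf, ToyPage **subtree_parent)
{
    ToyPage *top = leaf;

    assert(leaf != NULL && subtree_parent != NULL);
    *subtree_parent = NULL;

    for (;;)
    {
        ToyPage *parent = top->parent;

        /* Never delete the rightmost page on a level (or the root). */
        if (top->rightmost || parent == NULL)
            return NULL;

        if (parent->nchildren > 1)
        {
            /* Parent keeps other children: it is the subtree parent. */
            *subtree_parent = parent;
            return top;
        }

        /* Parent's only child is being deleted: extend the chain upward. */
        top = parent;
    }
}
```

In this model the returned page and every page below it down to the leaf form the one-page-per-level chain, while the page stored in `*subtree_parent` corresponds to what the commit message calls the "subtree parent page".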
@@ -243,7 +244,12 @@ it, but it's still linked to its siblings.
 
 (Note: Lanin and Shasha prefer to make the key space move left, but their
 argument for doing so hinges on not having left-links, which we have
-anyway. So we simplify the algorithm by moving the key space right.)
+anyway. So we simplify the algorithm by moving the key space right. This
+is only possible because we don't match on a separator key when ascending
+the tree during a page split, unlike Lehman and Yao/Lanin and Shasha -- it
+doesn't matter if the downlink is re-found in a pivot tuple whose separator
+key does not match the one encountered when inserter initially descended
+the tree.)
 
 To preserve consistency on the parent level, we cannot merge the key space
 of a page into its right sibling unless the right sibling is a child of
......
@@ -2278,7 +2278,8 @@ _bt_finish_split(Relation rel, Buffer lbuf, BTStack stack)
  * stack. For example, the checkingunique _bt_doinsert() case may
  * have to step right when there are many physical duplicates, and its
  * scantid forces an insertion to the right of the "first page the
- * value could be on".
+ * value could be on". (This is also relied on by all of our callers
+ * when dealing with !heapkeyspace indexes.)
  *
  * Returns write-locked parent page buffer, or InvalidBuffer if pivot
  * tuple not found (should not happen). Adjusts bts_blkno &
......
@@ -704,7 +704,7 @@ btree_xlog_mark_page_halfdead(uint8 info, XLogReaderState *record)
  * target page or not (since it's surely empty).
  */
 
-/* parent page */
+/* to-be-deleted subtree's parent page */
 if (XLogReadBufferForRedo(record, 1, &buffer) == BLK_NEEDS_REDO)
 {
     OffsetNumber poffset;
@@ -749,8 +749,8 @@ btree_xlog_mark_page_halfdead(uint8 info, XLogReaderState *record)
 pageop->btpo_cycleid = 0;
 
 /*
- * Construct a dummy hikey item that points to the next parent to be
- * deleted (if any).
+ * Construct a dummy high key item that points to top parent page (value
+ * is InvalidBlockNumber when the top parent page is the leaf page itself)
  */
 MemSet(&trunctuple, 0, sizeof(IndexTupleData));
 trunctuple.t_info = sizeof(IndexTupleData);
@@ -837,7 +837,7 @@ btree_xlog_unlink_page(uint8 info, XLogReaderState *record)
 /*
  * If we deleted a parent of the targeted leaf page, instead of the leaf
  * itself, update the leaf to point to the next remaining child in the
- * branch.
+ * to-be-deleted subtree
  */
 if (XLogRecHasBlockRef(record, 3))
 {
......
@@ -250,7 +250,7 @@ typedef struct xl_btree_vacuum
 #define SizeOfBtreeVacuum (offsetof(xl_btree_vacuum, nupdated) + sizeof(uint16))
 
 /*
- * This is what we need to know about marking an empty branch for deletion.
+ * This is what we need to know about marking an empty subtree for deletion.
  * The target identifies the tuple removed from the parent page (note that we
  * remove this tuple's downlink and the *following* tuple's key). Note that
  * the leaf page is empty, so we don't need to store its content --- it is
@@ -267,7 +267,7 @@ typedef struct xl_btree_mark_page_halfdead
     BlockNumber leafblk; /* leaf block ultimately being deleted */
     BlockNumber leftblk; /* leaf block's left sibling, if any */
     BlockNumber rightblk; /* leaf block's right sibling */
-    BlockNumber topparent; /* topmost internal page in the branch */
+    BlockNumber topparent; /* topmost internal page in the subtree */
 } xl_btree_mark_page_halfdead;
 
 #define SizeOfBtreeMarkPageHalfDead (offsetof(xl_btree_mark_page_halfdead, topparent) + sizeof(BlockNumber))
@@ -294,7 +294,7 @@ typedef struct xl_btree_unlink_page
      */
     BlockNumber leafleftsib;
     BlockNumber leafrightsib;
-    BlockNumber topparent; /* next child down in the branch */
+    BlockNumber topparent; /* next child down in the subtree */
     TransactionId btpo_xact; /* value of btpo.xact for use in recovery */
 
     /* xl_btree_metadata FOLLOWS IF XLOG_BTREE_UNLINK_PAGE_META */
......