Commit 40dae7ec authored by Heikki Linnakangas

Make the handling of interrupted B-tree page splits more robust.

Splitting a page consists of two separate steps: splitting the child page,
and inserting the downlink for the new right page into the parent. Previously,
we handled a crash between those steps with a cleanup routine that ran after
WAL recovery had finished and completed the incomplete split. However, that
doesn't help if the page split is interrupted but the database doesn't crash,
so that no WAL recovery is performed. That can happen, for example, if you
run out of disk space.

Remove the end-of-recovery cleanup step. Instead, when a page is split, the
left page is marked with a new INCOMPLETE_SPLIT flag, and when the downlink
is inserted to the parent, the flag is cleared again. If an insertion sees
a page with the flag set, it knows that the split was interrupted for some
reason, and inserts the missing downlink before proceeding.
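
In code terms, the repair during an insertion descent is roughly the
following. (A simplified sketch, not the patch verbatim -- the real logic
is in the _bt_moveright changes below. It reuses only identifiers that
appear in this patch, and omits locking and error-handling details.)

    page = BufferGetPage(buf);
    opaque = (BTPageOpaque) PageGetSpecialPointer(page);

    if (P_INCOMPLETE_SPLIT(opaque))
    {
        BlockNumber blkno = BufferGetBlockNumber(buf);

        /* insert the missing downlink into the parent first ... */
        _bt_finish_split(rel, buf, stack);  /* releases the buffer */

        /* ... then re-read the page and continue the descent */
        buf = _bt_getbuf(rel, blkno, BT_WRITE);
    }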

I used the same approach earlier to fix the GIN and GiST split algorithms.
This was the last WAL cleanup routine, so we could now get rid of that
whole machinery, but I'll leave that for a separate patch.

Reviewed by Peter Geoghegan.
parent b6ec7c92
@@ -404,12 +404,41 @@ an additional insertion above that, etc).

For a root split, the followon WAL entry is a "new root" entry rather than
an "insertion" entry, but details are otherwise much the same.

-Because insertion involves multiple atomic actions, the WAL replay logic
-has to detect the case where a page split isn't followed by a matching
-insertion on the parent level, and then do that insertion on its own (and
-recursively for any subsequent parent insertion, of course). This is
-feasible because the WAL entry for the split contains enough info to know
-what must be inserted in the parent level.
+Because splitting involves multiple atomic actions, it's possible that the
+system crashes between splitting a page and inserting the downlink for the
+new half to the parent. After recovery, the downlink for the new page will
+be missing. The search algorithm works correctly, as the page will be found
+by following the right-link from its left sibling, although if a lot of
+downlinks in the tree are missing, performance will suffer. A more serious
+consequence is that if the page without a downlink gets split again, the
+insertion algorithm will fail to find the location in the parent level to
+insert the downlink.
+
+Our approach is to create any missing downlinks on-the-fly, when searching
+the tree for a new insertion. It could be done during searches, too, but
+it seems best not to put any extra updates in what would otherwise be a
+read-only operation (updating is not possible in hot standby mode anyway).
+It would seem natural to add the missing downlinks in VACUUM, but since
+inserting a downlink might require splitting a page, it might fail if you
+run out of disk space. That would be bad during VACUUM - the reason for
+running VACUUM in the first place might be that you run out of disk space,
+and now VACUUM won't finish because you're out of disk space. In contrast,
+an insertion can require enlarging the physical file anyway.
+
+To identify missing downlinks, when a page is split, the left page is
+flagged to indicate that the split is not yet complete (INCOMPLETE_SPLIT).
+When the downlink is inserted to the parent, the flag is cleared atomically
+with the insertion. The child page is kept locked until the insertion in
+the parent is finished and the flag in the child cleared, but can be
+released immediately after that, before recursing up the tree if the parent
+also needs to be split. This ensures that incompletely split pages should
+not be seen under normal circumstances; only if insertion to the parent
+has failed for some reason.
+
+We flag the left page, even though it's the right page that's missing the
+downlink, because it's more convenient to know already when following the
+right-link from the left page to the right page that it will need to have
+its downlink inserted to the parent.

When splitting a non-root page that is alone on its level, the required
metapage update (of the "fast root" link) is performed and logged as part

@@ -419,8 +448,16 @@ metapage update is handled as part of the "new root" action.

The steps in page deletion are logged as separate WAL entries: marking the
leaf as half-dead and removing the downlink is one record, and unlinking a
page is a second record. If vacuum is interrupted for some reason, or the
-system crashes, the tree is consistent for searches and insertions. The next
-VACUUM will find the half-dead leaf page and continue the deletion.
+system crashes, the tree is consistent for searches and insertions. The
+next VACUUM will find the half-dead leaf page and continue the deletion.
+
+Before 9.4, we used to keep track of incomplete splits and page deletions
+during recovery and finish them immediately at end of recovery, instead of
+doing it lazily at the next insertion or vacuum. However, that made the
+recovery much more complicated, and only fixed the problem when crash
+recovery was performed. An incomplete split can also occur if an otherwise
+recoverable error, like out-of-memory or out-of-disk-space, happens while
+inserting the downlink to the parent.

Scans during Recovery
---------------------
...
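
To make the ordering concrete, the flag handshake described above looks
schematically like this (illustrative pseudo-C, not actual patch code; the
real set and clear sites are in nbtinsert.c, presumably the collapsed diff
below, and insert_downlink() is a placeholder):

    /* Step 1: split the child page and set the flag on the left half,
     * as one atomic, WAL-logged action.  The left page stays locked. */
    lopaque->btpo_flags |= BTP_INCOMPLETE_SPLIT;

    /* Step 2: insert the downlink into the parent and clear the flag,
     * again as one atomic, WAL-logged action.  If we fail in between
     * (crash, out of disk space), the flag simply stays set, and a
     * later insertion completes step 2 via _bt_finish_split(). */
    insert_downlink(pbuf, rightsib);
    lopaque->btpo_flags &= ~BTP_INCOMPLETE_SPLIT;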
This diff is collapsed.
@@ -992,6 +992,7 @@ _bt_lock_branch_parent(Relation rel, BlockNumber child, BTStack stack,
     Buffer      pbuf;
     Page        page;
     BTPageOpaque opaque;
+    BlockNumber leftsib;

     /* Locate the parent's downlink (updating the stack entry if needed) */
     ItemPointerSet(&(stack->bts_btentry.t_tid), child, P_HIKEY);

@@ -1020,7 +1021,8 @@ _bt_lock_branch_parent(Relation rel, BlockNumber child, BTStack stack,
          * We have to check the parent itself, and then recurse to test
          * the conditions at the parent's parent.
          */
-        if (P_RIGHTMOST(opaque) || P_ISROOT(opaque))
+        if (P_RIGHTMOST(opaque) || P_ISROOT(opaque) ||
+            P_INCOMPLETE_SPLIT(opaque))
         {
             _bt_relbuf(rel, pbuf);
             return false;

@@ -1028,8 +1030,41 @@ _bt_lock_branch_parent(Relation rel, BlockNumber child, BTStack stack,
         *target = parent;
         *rightsib = opaque->btpo_next;
+        leftsib = opaque->btpo_prev;

         _bt_relbuf(rel, pbuf);
+
+        /*
+         * Like in _bt_pagedel, check that the left sibling is not marked
+         * with INCOMPLETE_SPLIT flag. That would mean that there is no
+         * downlink to the page to be deleted, and the page deletion
+         * algorithm isn't prepared to handle that.
+         */
+        if (leftsib != P_NONE)
+        {
+            Buffer      lbuf;
+            Page        lpage;
+            BTPageOpaque lopaque;
+
+            lbuf = _bt_getbuf(rel, leftsib, BT_READ);
+            lpage = BufferGetPage(lbuf);
+            lopaque = (BTPageOpaque) PageGetSpecialPointer(lpage);
+
+            /*
+             * If the left sibling was concurrently split, so that its
+             * next-pointer doesn't point to the current page anymore,
+             * the split that created the current page must be completed.
+             * (We don't allow splitting an incompletely split page again
+             * until the previous split has been completed)
+             */
+            if (lopaque->btpo_next == parent &&
+                P_INCOMPLETE_SPLIT(lopaque))
+            {
+                _bt_relbuf(rel, lbuf);
+                return false;
+            }
+            _bt_relbuf(rel, lbuf);
+        }

        return _bt_lock_branch_parent(rel, parent, stack->bts_parent,
                                      topparent, topoff, target, rightsib);
    }

@@ -1081,6 +1116,10 @@ _bt_pagedel(Relation rel, Buffer buf)
     * "stack" is a search stack leading (approximately) to the target page.
     * It is initially NULL, but when iterating, we keep it to avoid
     * duplicated search effort.
+    *
+    * Also, when "stack" is not NULL, we have already checked that the
+    * current page is not the right half of an incomplete split, i.e. the
+    * left sibling does not have its INCOMPLETE_SPLIT flag set.
     */
    BTStack stack = NULL;

@@ -1117,11 +1156,25 @@ _bt_pagedel(Relation rel, Buffer buf)
    }

    /*
-    * We can never delete rightmost pages nor root pages. While at it,
-    * check that page is not already deleted and is empty.
+    * We can never delete rightmost pages nor root pages. While at
+    * it, check that page is not already deleted and is empty.
+    *
+    * To keep the algorithm simple, we also never delete an incompletely
+    * split page (they should be rare enough that this doesn't make any
+    * meaningful difference to disk usage):
+    *
+    * The INCOMPLETE_SPLIT flag on the page tells us if the page is the
+    * left half of an incomplete split, but ensuring that it's not the
+    * right half is more complicated. For that, we have to check that
+    * the left sibling doesn't have its INCOMPLETE_SPLIT flag set. On
+    * the first iteration, we temporarily release the lock on the
+    * current page, and check the left sibling and also construct a
+    * search stack to this page. On subsequent iterations, we know we
+    * stepped right from a page that passed these tests, so it's OK.
     */
    if (P_RIGHTMOST(opaque) || P_ISROOT(opaque) || P_ISDELETED(opaque) ||
-       P_FIRSTDATAKEY(opaque) <= PageGetMaxOffsetNumber(page))
+       P_FIRSTDATAKEY(opaque) <= PageGetMaxOffsetNumber(page) ||
+       P_INCOMPLETE_SPLIT(opaque))
    {
        /* Should never fail to delete a half-dead page */
        Assert(!P_ISHALFDEAD(opaque));

@@ -1142,6 +1195,9 @@ _bt_pagedel(Relation rel, Buffer buf)
         * use the standard search mechanism to search for the page's high
         * key; this will give us a link to either the current parent or
         * someplace to its left (if there are multiple equal high keys).
+        *
+        * Also check if this is the right-half of an incomplete split
+        * (see comment above).
         */
        if (!stack)
        {

@@ -1149,16 +1205,43 @@ _bt_pagedel(Relation rel, Buffer buf)
            ItemId      itemid;
            IndexTuple  targetkey;
            Buffer      lbuf;
+           BlockNumber leftsib;

            itemid = PageGetItemId(page, P_HIKEY);
            targetkey = CopyIndexTuple((IndexTuple) PageGetItem(page, itemid));
+           leftsib = opaque->btpo_prev;

            /*
             * To avoid deadlocks, we'd better drop the leaf page lock
             * before going further.
             */
            LockBuffer(buf, BUFFER_LOCK_UNLOCK);
+
+           /*
+            * Fetch the left sibling, to check that it's not marked
+            * with INCOMPLETE_SPLIT flag. That would mean that the
+            * page to-be-deleted doesn't have a downlink, and the page
+            * deletion algorithm isn't prepared to handle that.
+            */
+           if (!P_LEFTMOST(opaque))
+           {
+               BTPageOpaque lopaque;
+               Page        lpage;
+
+               lbuf = _bt_getbuf(rel, leftsib, BT_READ);
+               lpage = BufferGetPage(lbuf);
+               lopaque = (BTPageOpaque) PageGetSpecialPointer(lpage);
+
+               if (lopaque->btpo_next == BufferGetBlockNumber(buf) &&
+                   P_INCOMPLETE_SPLIT(lopaque))
+               {
+                   ReleaseBuffer(buf);
+                   _bt_relbuf(rel, lbuf);
+                   return ndeleted;
+               }
+               _bt_relbuf(rel, lbuf);
+           }

            /* we need an insertion scan key for the search, so build one */
            itup_scankey = _bt_mkscankey(rel, targetkey);
            /* find the leftmost leaf page containing this key */
...

@@ -51,7 +51,8 @@ static bool _bt_endpoint(IndexScanDesc scan, ScanDirection dir);
 * NOTE that the returned buffer is read-locked regardless of the access
 * parameter. However, access = BT_WRITE will allow an empty root page
 * to be created and returned. When access = BT_READ, an empty index
- * will result in *bufP being set to InvalidBuffer.
+ * will result in *bufP being set to InvalidBuffer. Also, in BT_WRITE mode,
+ * any incomplete splits encountered during the search will be finished.
 */
BTStack
_bt_search(Relation rel, int keysz, ScanKey scankey, bool nextkey,

@@ -82,8 +83,17 @@ _bt_search(Relation rel, int keysz, ScanKey scankey, bool nextkey,
         * Race -- the page we just grabbed may have split since we read its
         * pointer in the parent (or metapage). If it has, we may need to
         * move right to its new sibling. Do that.
+        *
+        * In write-mode, allow _bt_moveright to finish any incomplete splits
+        * along the way. Strictly speaking, we'd only need to finish an
+        * incomplete split on the leaf page we're about to insert to, not on
+        * any of the upper levels (they are taken care of in _bt_getstackbuf,
+        * if the leaf page is split and we insert to the parent page). But
+        * this is a good opportunity to finish splits of internal pages too.
         */
-       *bufP = _bt_moveright(rel, *bufP, keysz, scankey, nextkey, BT_READ);
+       *bufP = _bt_moveright(rel, *bufP, keysz, scankey, nextkey,
+                             (access == BT_WRITE), stack_in,
+                             BT_READ);

        /* if this is a leaf page, we're done */
        page = BufferGetPage(*bufP);

@@ -148,6 +158,11 @@ _bt_search(Relation rel, int keysz, ScanKey scankey, bool nextkey,
 * item >= scankey. When nextkey is true, we are looking for the first
 * item strictly greater than scankey.
 *
+ * If forupdate is true, we will attempt to finish any incomplete splits
+ * that we encounter. This is required when locking a target page for an
+ * insertion, because we don't allow inserting on a page before the split
+ * is completed. 'stack' is only used if forupdate is true.
+ *
 * On entry, we have the buffer pinned and a lock of the type specified by
 * 'access'. If we move right, we release the buffer and lock and acquire
 * the same on the right sibling. Return value is the buffer we stop at.

@@ -158,15 +173,14 @@ _bt_moveright(Relation rel,
              int keysz,
              ScanKey scankey,
              bool nextkey,
+             bool forupdate,
+             BTStack stack,
              int access)
{
    Page        page;
    BTPageOpaque opaque;
    int32       cmpval;

-   page = BufferGetPage(buf);
-   opaque = (BTPageOpaque) PageGetSpecialPointer(page);
-
    /*
     * When nextkey = false (normal case): if the scan key that brought us to
     * this page is > the high key stored on the page, then the page has split

@@ -184,16 +198,46 @@ _bt_moveright(Relation rel,
     */
    cmpval = nextkey ? 0 : 1;

-   while (!P_RIGHTMOST(opaque) &&
-          (P_IGNORE(opaque) ||
-           _bt_compare(rel, keysz, scankey, page, P_HIKEY) >= cmpval))
+   for (;;)
    {
-       /* step right one page */
-       BlockNumber rblkno = opaque->btpo_next;
-
-       buf = _bt_relandgetbuf(rel, buf, rblkno, access);
        page = BufferGetPage(buf);
        opaque = (BTPageOpaque) PageGetSpecialPointer(page);
+
+       if (P_RIGHTMOST(opaque))
+           break;
+
+       /*
+        * Finish any incomplete splits we encounter along the way.
+        */
+       if (forupdate && P_INCOMPLETE_SPLIT(opaque))
+       {
+           BlockNumber blkno = BufferGetBlockNumber(buf);
+
+           /* upgrade our lock if necessary */
+           if (access == BT_READ)
+           {
+               LockBuffer(buf, BUFFER_LOCK_UNLOCK);
+               LockBuffer(buf, BT_WRITE);
+           }
+
+           if (P_INCOMPLETE_SPLIT(opaque))
+               _bt_finish_split(rel, buf, stack);
+           else
+               _bt_relbuf(rel, buf);
+
+           /* re-acquire the lock in the right mode, and re-check */
+           buf = _bt_getbuf(rel, blkno, access);
+           continue;
+       }
+
+       if (P_IGNORE(opaque) ||
+           _bt_compare(rel, keysz, scankey, page, P_HIKEY) >= cmpval)
+       {
+           /* step right one page */
+           buf = _bt_relandgetbuf(rel, buf, opaque->btpo_next, access);
+           continue;
+       }
+       else
+           break;
    }

    if (P_IGNORE(opaque))
...
This diff is collapsed.

@@ -73,6 +73,7 @@ typedef BTPageOpaqueData *BTPageOpaque;
#define BTP_HALF_DEAD   (1 << 4)    /* empty, but still in tree */
#define BTP_SPLIT_END   (1 << 5)    /* rightmost page of split group */
#define BTP_HAS_GARBAGE (1 << 6)    /* page has LP_DEAD tuples */
+#define BTP_INCOMPLETE_SPLIT (1 << 7)   /* right sibling's downlink is missing */

/*
 * The max allowed value of a cycle ID is a bit less than 64K. This is

@@ -178,6 +179,7 @@ typedef struct BTMetaPageData
#define P_ISHALFDEAD(opaque)    ((opaque)->btpo_flags & BTP_HALF_DEAD)
#define P_IGNORE(opaque)    ((opaque)->btpo_flags & (BTP_DELETED|BTP_HALF_DEAD))
#define P_HAS_GARBAGE(opaque)   ((opaque)->btpo_flags & BTP_HAS_GARBAGE)
+#define P_INCOMPLETE_SPLIT(opaque)  ((opaque)->btpo_flags & BTP_INCOMPLETE_SPLIT)

/*
 * Lehman and Yao's algorithm requires a ``high key'' on every non-rightmost

@@ -253,7 +255,7 @@ typedef struct xl_btree_metadata
typedef struct xl_btree_insert
{
    xl_btreetid target;     /* inserted tuple id */
-   /* BlockNumber downlink field FOLLOWS IF NOT XLOG_BTREE_INSERT_LEAF */
+   /* BlockNumber finishes_split field FOLLOWS IF NOT XLOG_BTREE_INSERT_LEAF */
    /* xl_btree_metadata FOLLOWS IF XLOG_BTREE_INSERT_META */
    /* INDEX TUPLE FOLLOWS AT END OF STRUCT */
} xl_btree_insert;

@@ -286,19 +288,18 @@ typedef struct xl_btree_split
    OffsetNumber firstright;    /* first item moved to right page */

    /*
-    * If level > 0, BlockIdData downlink follows. (We use BlockIdData rather
-    * than BlockNumber for alignment reasons: SizeOfBtreeSplit is only 16-bit
-    * aligned.)
+    * In the _L variants, next are OffsetNumber newitemoff and the new item.
+    * (In the _R variants, the new item is one of the right page's tuples.)
+    * The new item, but not newitemoff, is suppressed if XLogInsert chooses
+    * to store the left page's whole page image.
     *
     * If level > 0, an IndexTuple representing the HIKEY of the left page
     * follows. We don't need this on leaf pages, because it's the same as
     * the leftmost key in the new right page. Also, it's suppressed if
     * XLogInsert chooses to store the left page's whole page image.
     *
-    * In the _L variants, next are OffsetNumber newitemoff and the new item.
-    * (In the _R variants, the new item is one of the right page's tuples.)
-    * The new item, but not newitemoff, is suppressed if XLogInsert chooses
-    * to store the left page's whole page image.
+    * If level > 0, BlockNumber of the page whose incomplete-split flag
+    * this insertion clears. (not aligned)
     *
     * Last are the right page's tuples in the form used by _bt_restore_page.
     */

@@ -642,8 +643,7 @@ extern Datum btoptions(PG_FUNCTION_ARGS);
extern bool _bt_doinsert(Relation rel, IndexTuple itup,
             IndexUniqueCheck checkUnique, Relation heapRel);
extern Buffer _bt_getstackbuf(Relation rel, BTStack stack, int access);
-extern void _bt_insert_parent(Relation rel, Buffer buf, Buffer rbuf,
-                 BTStack stack, bool is_root, bool is_only);
+extern void _bt_finish_split(Relation rel, Buffer bbuf, BTStack stack);

/*
 * prototypes for functions in nbtpage.c

@@ -673,7 +673,8 @@ extern BTStack _bt_search(Relation rel,
             int keysz, ScanKey scankey, bool nextkey,
             Buffer *bufP, int access);
extern Buffer _bt_moveright(Relation rel, Buffer buf, int keysz,
-            ScanKey scankey, bool nextkey, int access);
+            ScanKey scankey, bool nextkey, bool forupdate, BTStack stack,
+            int access);
extern OffsetNumber _bt_binsrch(Relation rel, Buffer buf, int keysz,
             ScanKey scankey, bool nextkey);
extern int32 _bt_compare(Relation rel, int keysz, ScanKey scankey,

@@ -722,8 +723,5 @@ extern void _bt_leafbuild(BTSpool *btspool, BTSpool *spool2);
 */
extern void btree_redo(XLogRecPtr lsn, XLogRecord *record);
extern void btree_desc(StringInfo buf, uint8 xl_info, char *rec);
-extern void btree_xlog_startup(void);
-extern void btree_xlog_cleanup(void);
-extern bool btree_safe_restartpoint(void);

#endif   /* NBTREE_H */
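
For reference, the reordered xl_btree_split layout described above means an
XLOG_BTREE_SPLIT_L record for an internal page (level > 0) is laid out
roughly as follows. This listing is an illustration derived from the
comment, not text from the patch:

    /*
     * xl_btree_split   fixed-size header
     * OffsetNumber     newitemoff
     * IndexTuple       new item          (omitted if the left page's
     *                                     full-page image is logged)
     * IndexTuple       left page's new high key (likewise omitted with
     *                                     a full-page image)
     * BlockNumber      page whose INCOMPLETE_SPLIT flag this insertion
     *                  clears (unaligned)
     * IndexTuple[]     right page's tuples, in the format used by
     *                  _bt_restore_page
     */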

@@ -36,7 +36,7 @@ PG_RMGR(RM_RELMAP_ID, "RelMap", relmap_redo, relmap_desc, NULL, NULL, NULL)
PG_RMGR(RM_STANDBY_ID, "Standby", standby_redo, standby_desc, NULL, NULL, NULL)
PG_RMGR(RM_HEAP2_ID, "Heap2", heap2_redo, heap2_desc, NULL, NULL, NULL)
PG_RMGR(RM_HEAP_ID, "Heap", heap_redo, heap_desc, NULL, NULL, NULL)
-PG_RMGR(RM_BTREE_ID, "Btree", btree_redo, btree_desc, btree_xlog_startup, btree_xlog_cleanup, btree_safe_restartpoint)
+PG_RMGR(RM_BTREE_ID, "Btree", btree_redo, btree_desc, NULL, NULL, NULL)
PG_RMGR(RM_HASH_ID, "Hash", hash_redo, hash_desc, NULL, NULL, NULL)
PG_RMGR(RM_GIN_ID, "Gin", gin_redo, gin_desc, gin_xlog_startup, gin_xlog_cleanup, NULL)
PG_RMGR(RM_GIST_ID, "Gist", gist_redo, gist_desc, gist_xlog_startup, gist_xlog_cleanup, NULL)
...

@@ -55,7 +55,7 @@ typedef struct BkpBlock
/*
 * Each page of XLOG file has a header like this:
 */
-#define XLOG_PAGE_MAGIC 0xD07C  /* can be used as WAL version indicator */
+#define XLOG_PAGE_MAGIC 0xD07D  /* can be used as WAL version indicator */

typedef struct XLogPageHeaderData
{
...