Commit bde361fe authored by Tom Lane

Fix memory leak and other bugs in ginPlaceToPage() & subroutines.

Commit 36a35c55 turned the interface between ginPlaceToPage and
its subroutines in gindatapage.c and ginentrypage.c into a royal mess:
page-update critical sections were started in one place and finished in
another place not even in the same file, and the very same subroutine
might return having started a critical section or not.  Subsequent patches
band-aided over some of the problems with this design by making things
even messier.

One user-visible resulting problem is memory leaks caused by the need for
the subroutines to allocate storage that would survive until ginPlaceToPage
calls XLogInsert (as reported by Julien Rouhaud).  This would not typically
be noticeable during retail index updates.  It could be visible in a GIN
index build, in the form of memory consumption swelling to several times
the commanded maintenance_work_mem.

Another rather nasty problem is that in the internal-page-splitting code
path, we would clear the child page's GIN_INCOMPLETE_SPLIT flag well before
entering the critical section that it's supposed to be cleared in; a
failure in between would leave the index in a corrupt state.  There were
also assorted coding-rule violations with little immediate consequence but
possible long-term hazards, such as beginning an XLogInsert sequence before
entering a critical section, or calling elog(DEBUG) inside a critical
section.
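
The ordering hazard described above can be shown with a toy model.  This is
only a sketch under stated assumptions: ToyPage, ToyParent, toy_prepare and
the toy_finish_split_* functions are hypothetical stand-ins invented for
illustration, not PostgreSQL code.  The "unsafe" variant clears the flag
before a step that can fail; the "safe" variant does all fallible work
first and only then performs the paired updates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these are PostgreSQL definitions. */
#define TOY_INCOMPLETE_SPLIT 0x01u

typedef struct ToyPage
{
    unsigned    flags;      /* TOY_INCOMPLETE_SPLIT until parent has downlink */
} ToyPage;

typedef struct ToyParent
{
    int         ndownlinks; /* downlinks inserted for the split child */
} ToyParent;

/* A fallible preparation step (think: allocation); fails on request. */
bool
toy_prepare(bool simulate_failure)
{
    return !simulate_failure;
}

/* Old ordering: the flag is cleared before the fallible step runs. */
bool
toy_finish_split_unsafe(ToyPage *child, ToyParent *parent, bool fail)
{
    assert(child != NULL && parent != NULL);
    child->flags &= ~TOY_INCOMPLETE_SPLIT;
    if (!toy_prepare(fail))
        return false;       /* flag already cleared, but no downlink exists */
    parent->ndownlinks++;
    return true;
}

/* New ordering: fallible work first, then both updates together. */
bool
toy_finish_split_safe(ToyPage *child, ToyParent *parent, bool fail)
{
    assert(child != NULL && parent != NULL);
    if (!toy_prepare(fail))
        return false;       /* nothing has been modified yet */
    /* a real implementation would be inside its critical section here */
    parent->ndownlinks++;
    child->flags &= ~TOY_INCOMPLETE_SPLIT;
    return true;
}

/* Invariant: the flag is set exactly when the parent lacks the downlink. */
bool
toy_consistent(const ToyPage *child, const ToyParent *parent)
{
    bool        flagged = (child->flags & TOY_INCOMPLETE_SPLIT) != 0;

    return flagged == (parent->ndownlinks == 0);
}
```

With a simulated failure, the unsafe variant leaves the toy child and parent
violating toy_consistent(), while the safe variant leaves both untouched.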

To fix, redefine the API between ginPlaceToPage() and its subroutines
by splitting the subroutines into two parts.  The "beginPlaceToPage"
subroutine does what can be done outside a critical section, including
full computation of the result pages into temporary storage when we're
going to split the target page.  The "execPlaceToPage" subroutine is called
within a critical section established by ginPlaceToPage(), and it handles
the actual page update in the non-split code path.  The critical section,
as well as the XLOG insertion call sequence, are both now always started
and finished in ginPlaceToPage().  Also, make ginPlaceToPage() create and
work in a short-lived memory context to eliminate the leakage problem.
(Since a short-lived memory context had been getting created in the most
common code path in the subroutines, this shouldn't cause any noticeable
performance penalty; we're just moving the overhead up one call level.)
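
The shape of the redefined API can be sketched in miniature.  This is a toy
model, not the GIN code: ToyPage, ToyArena, toy_begin_place, toy_exec_place
and toy_place_to_page are hypothetical names, and the malloc-backed arena is
only an assumed stand-in for a short-lived memory context.  The "begin" step
does everything that can fail and never touches the page; the "exec" step
applies a precomputed change; the caller owns the scratch storage and
releases it in one place.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define TOY_PAGE_CAP 4

typedef enum { TOY_NO_WORK, TOY_INSERT, TOY_SPLIT } ToyRC;
typedef struct ToyPage { int nitems; int items[TOY_PAGE_CAP]; } ToyPage;

/* Tiny arena standing in for a short-lived memory context (assumption). */
typedef struct ToyArena { void *blocks[8]; int nblocks; } ToyArena;

static void *
toy_alloc(ToyArena *a, size_t n)
{
    void       *p = (a->nblocks < 8) ? malloc(n) : NULL;

    if (p == NULL)
        abort();            /* failure happens before any page is touched */
    a->blocks[a->nblocks++] = p;
    return p;
}

static void
toy_arena_reset(ToyArena *a)
{
    while (a->nblocks > 0)
        free(a->blocks[--a->nblocks]);
}

/* "begin" phase: decide what to do; never modifies *page. */
ToyRC
toy_begin_place(ToyArena *a, const ToyPage *page, int item,
                void **workspace, ToyPage **newl, ToyPage **newr)
{
    int         all[TOY_PAGE_CAP + 1];
    int         i, total, nleft;

    for (i = 0; i < page->nitems; i++)
        if (page->items[i] == item)
            return TOY_NO_WORK;         /* duplicate, nothing to do */

    if (page->nitems < TOY_PAGE_CAP)
    {
        int        *w = toy_alloc(a, sizeof(int));

        *w = item;
        *workspace = w;                 /* handed to the exec phase */
        return TOY_INSERT;
    }

    /* Won't fit: compute both halves into temporary page images. */
    memcpy(all, page->items, sizeof(int) * page->nitems);
    all[page->nitems] = item;
    total = page->nitems + 1;
    nleft = total / 2;
    *newl = toy_alloc(a, sizeof(ToyPage));
    *newr = toy_alloc(a, sizeof(ToyPage));
    memset(*newl, 0, sizeof(ToyPage));
    memset(*newr, 0, sizeof(ToyPage));
    (*newl)->nitems = nleft;
    memcpy((*newl)->items, all, sizeof(int) * nleft);
    (*newr)->nitems = total - nleft;
    memcpy((*newr)->items, all + nleft, sizeof(int) * (total - nleft));
    return TOY_SPLIT;
}

/* "exec" phase: infallible page update using the prepared workspace. */
void
toy_exec_place(ToyPage *page, void *workspace)
{
    assert(page->nitems < TOY_PAGE_CAP);
    page->items[page->nitems++] = *(int *) workspace;
}

/* Caller: owns the scratch storage and the (notional) critical section. */
bool
toy_place_to_page(ToyPage *page, ToyPage *right, int item)
{
    ToyArena    arena = {{0}, 0};
    void       *ws = NULL;
    ToyPage    *newl = NULL, *newr = NULL;
    bool        result;

    switch (toy_begin_place(&arena, page, item, &ws, &newl, &newr))
    {
        case TOY_NO_WORK:
            result = true;
            break;
        case TOY_INSERT:
            toy_exec_place(page, ws);   /* critical section would wrap this */
            result = true;
            break;
        default:                        /* TOY_SPLIT */
            memcpy(page, newl, sizeof(ToyPage));
            memcpy(right, newr, sizeof(ToyPage));
            result = false;             /* parent still needs a downlink */
            break;
    }
    toy_arena_reset(&arena);            /* one cleanup point, no leaks */
    return result;
}
```

In this sketch a false result means the page was split and the caller still
has to update the parent, mirroring the return convention documented for
ginPlaceToPage() in the diff below.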

In passing, fix a bunch of comments that had gone unmaintained throughout
all this klugery.

Report: <571276DD.5050303@dalibo.com>
parent a343e223
@@ -17,6 +17,7 @@
 #include "access/gin_private.h"
 #include "access/xloginsert.h"
 #include "miscadmin.h"
+#include "utils/memutils.h"
 #include "utils/rel.h"
 
 static void ginFindParents(GinBtree btree, GinBtreeStack *stack);
@@ -312,15 +313,16 @@ ginFindParents(GinBtree btree, GinBtreeStack *stack)
  * Insert a new item to a page.
  *
  * Returns true if the insertion was finished. On false, the page was split and
- * the parent needs to be updated. (a root split returns true as it doesn't
- * need any further action by the caller to complete)
+ * the parent needs to be updated. (A root split returns true as it doesn't
+ * need any further action by the caller to complete.)
  *
  * When inserting a downlink to an internal page, 'childbuf' contains the
  * child page that was split. Its GIN_INCOMPLETE_SPLIT flag will be cleared
- * atomically with the insert. Also, the existing item at the given location
- * is updated to point to 'updateblkno'.
+ * atomically with the insert. Also, the existing item at offset stack->off
+ * in the target page is updated to point to updateblkno.
  *
  * stack->buffer is locked on entry, and is kept locked.
+ * Likewise for childbuf, if given.
  */
 static bool
 ginPlaceToPage(GinBtree btree, GinBtreeStack *stack,
@@ -328,11 +330,28 @@ ginPlaceToPage(GinBtree btree, GinBtreeStack *stack,
 			   Buffer childbuf, GinStatsData *buildStats)
 {
 	Page page = BufferGetPage(stack->buffer);
+	bool result;
 	GinPlaceToPageRC rc;
 	uint16 xlflags = 0;
 	Page childpage = NULL;
 	Page newlpage = NULL,
 		newrpage = NULL;
+	void *ptp_workspace = NULL;
+	MemoryContext tmpCxt;
+	MemoryContext oldCxt;
+
+	/*
+	 * We do all the work of this function and its subfunctions in a temporary
+	 * memory context. This avoids leakages and simplifies APIs, since some
+	 * subfunctions allocate storage that has to survive until we've finished
+	 * the WAL insertion.
+	 */
+	tmpCxt = AllocSetContextCreate(CurrentMemoryContext,
+								   "ginPlaceToPage temporary context",
+								   ALLOCSET_DEFAULT_MINSIZE,
+								   ALLOCSET_DEFAULT_INITSIZE,
+								   ALLOCSET_DEFAULT_MAXSIZE);
+	oldCxt = MemoryContextSwitchTo(tmpCxt);
 
 	if (GinPageIsData(page))
 		xlflags |= GIN_INSERT_ISDATA;
...@@ -350,40 +369,42 @@ ginPlaceToPage(GinBtree btree, GinBtreeStack *stack, ...@@ -350,40 +369,42 @@ ginPlaceToPage(GinBtree btree, GinBtreeStack *stack,
} }
/* /*
* Try to put the incoming tuple on the page. placeToPage will decide if * See if the incoming tuple will fit on the page. beginPlaceToPage will
* the page needs to be split. * decide if the page needs to be split, and will compute the split
* * contents if so. See comments for beginPlaceToPage and execPlaceToPage
* WAL-logging this operation is a bit funny: * functions for more details of the API here.
*
* We're responsible for calling XLogBeginInsert() and XLogInsert().
* XLogBeginInsert() must be called before placeToPage, because
* placeToPage can register some data to the WAL record.
*
* If placeToPage returns INSERTED, placeToPage has already called
* START_CRIT_SECTION() and XLogBeginInsert(), and registered any data
* required to replay the operation, in block index 0. We're responsible
* for filling in the main data portion of the WAL record, calling
* XLogInsert(), and END_CRIT_SECTION.
*
* If placeToPage returns SPLIT, we're wholly responsible for WAL logging.
* Splits happen infrequently, so we just make a full-page image of all
* the pages involved.
*/ */
rc = btree->placeToPage(btree, stack->buffer, stack, rc = btree->beginPlaceToPage(btree, stack->buffer, stack,
insertdata, updateblkno, insertdata, updateblkno,
&ptp_workspace,
&newlpage, &newrpage); &newlpage, &newrpage);
if (rc == UNMODIFIED)
if (rc == GPTP_NO_WORK)
{ {
XLogResetInsertion(); /* Nothing to do */
return true; result = true;
} }
else if (rc == INSERTED) else if (rc == GPTP_INSERT)
{
/* It will fit, perform the insertion */
START_CRIT_SECTION();
if (RelationNeedsWAL(btree->index))
{ {
/* placeToPage did START_CRIT_SECTION() */ XLogBeginInsert();
XLogRegisterBuffer(0, stack->buffer, REGBUF_STANDARD);
if (BufferIsValid(childbuf))
XLogRegisterBuffer(1, childbuf, REGBUF_STANDARD);
}
/* Perform the page update, and register any extra WAL data */
btree->execPlaceToPage(btree, stack->buffer, stack,
insertdata, updateblkno, ptp_workspace);
MarkBufferDirty(stack->buffer); MarkBufferDirty(stack->buffer);
/* An insert to an internal page finishes the split of the child. */ /* An insert to an internal page finishes the split of the child. */
if (childbuf != InvalidBuffer) if (BufferIsValid(childbuf))
{ {
GinPageGetOpaque(childpage)->flags &= ~GIN_INCOMPLETE_SPLIT; GinPageGetOpaque(childpage)->flags &= ~GIN_INCOMPLETE_SPLIT;
MarkBufferDirty(childbuf); MarkBufferDirty(childbuf);
...@@ -395,21 +416,15 @@ ginPlaceToPage(GinBtree btree, GinBtreeStack *stack, ...@@ -395,21 +416,15 @@ ginPlaceToPage(GinBtree btree, GinBtreeStack *stack,
ginxlogInsert xlrec; ginxlogInsert xlrec;
BlockIdData childblknos[2]; BlockIdData childblknos[2];
/*
* placetopage already registered stack->buffer as block 0.
*/
xlrec.flags = xlflags; xlrec.flags = xlflags;
if (childbuf != InvalidBuffer)
XLogRegisterBuffer(1, childbuf, REGBUF_STANDARD);
XLogRegisterData((char *) &xlrec, sizeof(ginxlogInsert)); XLogRegisterData((char *) &xlrec, sizeof(ginxlogInsert));
/* /*
* Log information about child if this was an insertion of a * Log information about child if this was an insertion of a
* downlink. * downlink.
*/ */
if (childbuf != InvalidBuffer) if (BufferIsValid(childbuf))
{ {
BlockIdSet(&childblknos[0], BufferGetBlockNumber(childbuf)); BlockIdSet(&childblknos[0], BufferGetBlockNumber(childbuf));
BlockIdSet(&childblknos[1], GinPageGetOpaque(childpage)->rightlink); BlockIdSet(&childblknos[1], GinPageGetOpaque(childpage)->rightlink);
...@@ -419,23 +434,29 @@ ginPlaceToPage(GinBtree btree, GinBtreeStack *stack, ...@@ -419,23 +434,29 @@ ginPlaceToPage(GinBtree btree, GinBtreeStack *stack,
recptr = XLogInsert(RM_GIN_ID, XLOG_GIN_INSERT); recptr = XLogInsert(RM_GIN_ID, XLOG_GIN_INSERT);
PageSetLSN(page, recptr); PageSetLSN(page, recptr);
if (childbuf != InvalidBuffer) if (BufferIsValid(childbuf))
PageSetLSN(childpage, recptr); PageSetLSN(childpage, recptr);
} }
END_CRIT_SECTION(); END_CRIT_SECTION();
return true; /* Insertion is complete. */
result = true;
} }
else if (rc == SPLIT) else if (rc == GPTP_SPLIT)
{ {
/* Didn't fit, had to split */ /*
* Didn't fit, need to split. The split has been computed in newlpage
* and newrpage, which are pointers to palloc'd pages, not associated
* with buffers. stack->buffer is not touched yet.
*/
Buffer rbuffer; Buffer rbuffer;
BlockNumber savedRightLink; BlockNumber savedRightLink;
ginxlogSplit data; ginxlogSplit data;
Buffer lbuffer = InvalidBuffer; Buffer lbuffer = InvalidBuffer;
Page newrootpg = NULL; Page newrootpg = NULL;
/* Get a new index page to become the right page */
rbuffer = GinNewBuffer(btree->index); rbuffer = GinNewBuffer(btree->index);
/* During index build, count the new page */ /* During index build, count the new page */
...@@ -449,19 +470,11 @@ ginPlaceToPage(GinBtree btree, GinBtreeStack *stack, ...@@ -449,19 +470,11 @@ ginPlaceToPage(GinBtree btree, GinBtreeStack *stack,
savedRightLink = GinPageGetOpaque(page)->rightlink; savedRightLink = GinPageGetOpaque(page)->rightlink;
/* /* Begin setting up WAL record */
* newlpage and newrpage are pointers to memory pages, not associated
* with buffers. stack->buffer is not touched yet.
*/
data.node = btree->index->rd_node; data.node = btree->index->rd_node;
data.flags = xlflags; data.flags = xlflags;
if (childbuf != InvalidBuffer) if (BufferIsValid(childbuf))
{ {
Page childpage = BufferGetPage(childbuf);
GinPageGetOpaque(childpage)->flags &= ~GIN_INCOMPLETE_SPLIT;
data.leftChildBlkno = BufferGetBlockNumber(childbuf); data.leftChildBlkno = BufferGetBlockNumber(childbuf);
data.rightChildBlkno = GinPageGetOpaque(childpage)->rightlink; data.rightChildBlkno = GinPageGetOpaque(childpage)->rightlink;
} }
...@@ -471,12 +484,12 @@ ginPlaceToPage(GinBtree btree, GinBtreeStack *stack, ...@@ -471,12 +484,12 @@ ginPlaceToPage(GinBtree btree, GinBtreeStack *stack,
if (stack->parent == NULL) if (stack->parent == NULL)
{ {
/* /*
* split root, so we need to allocate new left page and place * splitting the root, so we need to allocate new left page and
* pointer on root to left and right page * place pointers to left and right page on root page.
*/ */
lbuffer = GinNewBuffer(btree->index); lbuffer = GinNewBuffer(btree->index);
/* During index build, count the newly-added root page */ /* During index build, count the new left page */
if (buildStats) if (buildStats)
{ {
if (btree->isData) if (btree->isData)
...@@ -493,9 +506,9 @@ ginPlaceToPage(GinBtree btree, GinBtreeStack *stack, ...@@ -493,9 +506,9 @@ ginPlaceToPage(GinBtree btree, GinBtreeStack *stack,
/* /*
* Construct a new root page containing downlinks to the new left * Construct a new root page containing downlinks to the new left
* and right pages. (do this in a temporary copy first rather than * and right pages. (Do this in a temporary copy rather than
* overwriting the original page directly, so that we can still * overwriting the original page directly, since we're not in the
* abort gracefully if this fails.) * critical section yet.)
*/ */
newrootpg = PageGetTempPage(newrpage); newrootpg = PageGetTempPage(newrpage);
GinInitPage(newrootpg, GinPageGetOpaque(newlpage)->flags & ~(GIN_LEAF | GIN_COMPRESSED), BLCKSZ); GinInitPage(newrootpg, GinPageGetOpaque(newlpage)->flags & ~(GIN_LEAF | GIN_COMPRESSED), BLCKSZ);
...@@ -506,7 +519,7 @@ ginPlaceToPage(GinBtree btree, GinBtreeStack *stack, ...@@ -506,7 +519,7 @@ ginPlaceToPage(GinBtree btree, GinBtreeStack *stack,
} }
else else
{ {
/* split non-root page */ /* splitting a non-root page */
data.rrlink = savedRightLink; data.rrlink = savedRightLink;
GinPageGetOpaque(newrpage)->rightlink = savedRightLink; GinPageGetOpaque(newrpage)->rightlink = savedRightLink;
...@@ -515,41 +528,44 @@ ginPlaceToPage(GinBtree btree, GinBtreeStack *stack, ...@@ -515,41 +528,44 @@ ginPlaceToPage(GinBtree btree, GinBtreeStack *stack,
} }
/* /*
* Ok, we have the new contents of the left page in a temporary copy * OK, we have the new contents of the left page in a temporary copy
* now (newlpage), and the newly-allocated right block has been filled * now (newlpage), and likewise for the new contents of the
* in. The original page is still unchanged. * newly-allocated right block. The original page is still unchanged.
* *
* If this is a root split, we also have a temporary page containing * If this is a root split, we also have a temporary page containing
* the new contents of the root. Copy the new left page to a * the new contents of the root.
* newly-allocated block, and initialize the (original) root page the
* new copy. Otherwise, copy over the temporary copy of the new left
* page over the old left page.
*/ */
START_CRIT_SECTION(); START_CRIT_SECTION();
MarkBufferDirty(rbuffer); MarkBufferDirty(rbuffer);
MarkBufferDirty(stack->buffer); MarkBufferDirty(stack->buffer);
if (BufferIsValid(childbuf))
MarkBufferDirty(childbuf);
/* /*
* Restore the temporary copies over the real buffers. But don't free * Restore the temporary copies over the real buffers.
* the temporary copies yet, WAL record data points to them.
*/ */
if (stack->parent == NULL) if (stack->parent == NULL)
{ {
/* Splitting the root, three pages to update */
MarkBufferDirty(lbuffer); MarkBufferDirty(lbuffer);
memcpy(BufferGetPage(stack->buffer), newrootpg, BLCKSZ); memcpy(page, newrootpg, BLCKSZ);
memcpy(BufferGetPage(lbuffer), newlpage, BLCKSZ); memcpy(BufferGetPage(lbuffer), newlpage, BLCKSZ);
memcpy(BufferGetPage(rbuffer), newrpage, BLCKSZ); memcpy(BufferGetPage(rbuffer), newrpage, BLCKSZ);
} }
else else
{ {
memcpy(BufferGetPage(stack->buffer), newlpage, BLCKSZ); /* Normal split, only two pages to update */
memcpy(page, newlpage, BLCKSZ);
memcpy(BufferGetPage(rbuffer), newrpage, BLCKSZ); memcpy(BufferGetPage(rbuffer), newrpage, BLCKSZ);
} }
/* We also clear childbuf's INCOMPLETE_SPLIT flag, if passed */
if (BufferIsValid(childbuf))
{
GinPageGetOpaque(childpage)->flags &= ~GIN_INCOMPLETE_SPLIT;
MarkBufferDirty(childbuf);
}
/* write WAL record */ /* write WAL record */
if (RelationNeedsWAL(btree->index)) if (RelationNeedsWAL(btree->index))
{ {
...@@ -574,12 +590,13 @@ ginPlaceToPage(GinBtree btree, GinBtreeStack *stack, ...@@ -574,12 +590,13 @@ ginPlaceToPage(GinBtree btree, GinBtreeStack *stack,
XLogRegisterBuffer(1, rbuffer, REGBUF_FORCE_IMAGE | REGBUF_STANDARD); XLogRegisterBuffer(1, rbuffer, REGBUF_FORCE_IMAGE | REGBUF_STANDARD);
} }
if (BufferIsValid(childbuf)) if (BufferIsValid(childbuf))
XLogRegisterBuffer(3, childbuf, 0); XLogRegisterBuffer(3, childbuf, REGBUF_STANDARD);
XLogRegisterData((char *) &data, sizeof(ginxlogSplit)); XLogRegisterData((char *) &data, sizeof(ginxlogSplit));
recptr = XLogInsert(RM_GIN_ID, XLOG_GIN_SPLIT); recptr = XLogInsert(RM_GIN_ID, XLOG_GIN_SPLIT);
PageSetLSN(BufferGetPage(stack->buffer), recptr);
PageSetLSN(page, recptr);
PageSetLSN(BufferGetPage(rbuffer), recptr); PageSetLSN(BufferGetPage(rbuffer), recptr);
if (stack->parent == NULL) if (stack->parent == NULL)
PageSetLSN(BufferGetPage(lbuffer), recptr); PageSetLSN(BufferGetPage(lbuffer), recptr);
@@ -589,33 +606,31 @@ ginPlaceToPage(GinBtree btree, GinBtreeStack *stack,
 		END_CRIT_SECTION();
 
 		/*
-		 * We can release the lock on the right page now, but keep the
-		 * original buffer locked.
+		 * We can release the locks/pins on the new pages now, but keep
+		 * stack->buffer locked. childbuf doesn't get unlocked either.
 		 */
 		UnlockReleaseBuffer(rbuffer);
 		if (stack->parent == NULL)
 			UnlockReleaseBuffer(lbuffer);
 
-		pfree(newlpage);
-		pfree(newrpage);
-		if (newrootpg)
-			pfree(newrootpg);
-
 		/*
 		 * If we split the root, we're done. Otherwise the split is not
 		 * complete until the downlink for the new page has been inserted to
 		 * the parent.
 		 */
-		if (stack->parent == NULL)
-			return true;
-		else
-			return false;
+		result = (stack->parent == NULL);
 	}
 	else
 	{
-		elog(ERROR, "unknown return code from GIN placeToPage method: %d", rc);
-		return false; /* keep compiler quiet */
+		elog(ERROR, "invalid return code from GIN placeToPage method: %d", rc);
+		result = false; /* keep compiler quiet */
 	}
+
+	/* Clean up temp context */
+	MemoryContextSwitchTo(oldCxt);
+	MemoryContextDelete(tmpCxt);
+
+	return result;
 }
 
 /*
......
@@ -18,7 +18,6 @@
 #include "access/xloginsert.h"
 #include "lib/ilist.h"
 #include "miscadmin.h"
-#include "utils/memutils.h"
 #include "utils/rel.h"
 
 /*
@@ -57,6 +56,13 @@ typedef struct
 	int rsize; /* total size on right page */
 
 	bool oldformat; /* page is in pre-9.4 format on disk */
+
+	/*
+	 * If we need WAL data representing the reconstructed leaf page, it's
+	 * stored here by computeLeafRecompressWALData.
+	 */
+	char *walinfo; /* buffer start */
+	int walinfolen; /* and length */
 } disassembledLeaf;
 
 typedef struct
@@ -105,10 +111,9 @@ static bool leafRepackItems(disassembledLeaf *leaf, ItemPointer remaining);
 static bool addItemsToLeaf(disassembledLeaf *leaf, ItemPointer newItems,
 			   int nNewItems);
 
-static void registerLeafRecompressWALData(Buffer buf, disassembledLeaf *leaf);
+static void computeLeafRecompressWALData(disassembledLeaf *leaf);
 static void dataPlaceToPageLeafRecompress(Buffer buf, disassembledLeaf *leaf);
-static void dataPlaceToPageLeafSplit(Buffer buf,
-						 disassembledLeaf *leaf,
+static void dataPlaceToPageLeafSplit(disassembledLeaf *leaf,
 						 ItemPointerData lbound, ItemPointerData rbound,
 						 Page lpage, Page rpage);
 
...@@ -423,11 +428,22 @@ GinPageDeletePostingItem(Page page, OffsetNumber offset) ...@@ -423,11 +428,22 @@ GinPageDeletePostingItem(Page page, OffsetNumber offset)
} }
/* /*
* Places keys to leaf data page and fills WAL record. * Prepare to insert data on a leaf data page.
*
* If it will fit, return GPTP_INSERT after doing whatever setup is needed
* before we enter the insertion critical section. *ptp_workspace can be
* set to pass information along to the execPlaceToPage function.
*
* If it won't fit, perform a page split and return two temporary page
* images into *newlpage and *newrpage, with result GPTP_SPLIT.
*
* In neither case should the given page buffer be modified here.
*/ */
static GinPlaceToPageRC static GinPlaceToPageRC
dataPlaceToPageLeaf(GinBtree btree, Buffer buf, GinBtreeStack *stack, dataBeginPlaceToPageLeaf(GinBtree btree, Buffer buf, GinBtreeStack *stack,
void *insertdata, Page *newlpage, Page *newrpage) void *insertdata,
void **ptp_workspace,
Page *newlpage, Page *newrpage)
{ {
GinBtreeDataLeafInsertData *items = insertdata; GinBtreeDataLeafInsertData *items = insertdata;
ItemPointer newItems = &items->items[items->curitem]; ItemPointer newItems = &items->items[items->curitem];
...@@ -440,15 +456,11 @@ dataPlaceToPageLeaf(GinBtree btree, Buffer buf, GinBtreeStack *stack, ...@@ -440,15 +456,11 @@ dataPlaceToPageLeaf(GinBtree btree, Buffer buf, GinBtreeStack *stack,
bool append; bool append;
int segsize; int segsize;
Size freespace; Size freespace;
MemoryContext tmpCxt;
MemoryContext oldCxt;
disassembledLeaf *leaf; disassembledLeaf *leaf;
leafSegmentInfo *lastleftinfo; leafSegmentInfo *lastleftinfo;
ItemPointerData maxOldItem; ItemPointerData maxOldItem;
ItemPointerData remaining; ItemPointerData remaining;
Assert(GinPageIsData(page));
rbound = *GinDataPageGetRightBound(page); rbound = *GinDataPageGetRightBound(page);
/* /*
...@@ -472,18 +484,7 @@ dataPlaceToPageLeaf(GinBtree btree, Buffer buf, GinBtreeStack *stack, ...@@ -472,18 +484,7 @@ dataPlaceToPageLeaf(GinBtree btree, Buffer buf, GinBtreeStack *stack,
maxitems = i; maxitems = i;
} }
/* /* Disassemble the data on the page */
* The following operations do quite a lot of small memory allocations,
* create a temporary memory context so that we don't need to keep track
* of them individually.
*/
tmpCxt = AllocSetContextCreate(CurrentMemoryContext,
"Gin split temporary context",
ALLOCSET_DEFAULT_MINSIZE,
ALLOCSET_DEFAULT_INITSIZE,
ALLOCSET_DEFAULT_MAXSIZE);
oldCxt = MemoryContextSwitchTo(tmpCxt);
leaf = disassembleLeaf(page); leaf = disassembleLeaf(page);
/* /*
...@@ -548,16 +549,13 @@ dataPlaceToPageLeaf(GinBtree btree, Buffer buf, GinBtreeStack *stack, ...@@ -548,16 +549,13 @@ dataPlaceToPageLeaf(GinBtree btree, Buffer buf, GinBtreeStack *stack,
maxitems = Min(maxitems, nnewsegments * MinTuplesPerSegment); maxitems = Min(maxitems, nnewsegments * MinTuplesPerSegment);
} }
/* Add the new items to the segments */ /* Add the new items to the segment list */
if (!addItemsToLeaf(leaf, newItems, maxitems)) if (!addItemsToLeaf(leaf, newItems, maxitems))
{ {
/* all items were duplicates, we have nothing to do */ /* all items were duplicates, we have nothing to do */
items->curitem += maxitems; items->curitem += maxitems;
MemoryContextSwitchTo(oldCxt); return GPTP_NO_WORK;
MemoryContextDelete(tmpCxt);
return UNMODIFIED;
} }
/* /*
...@@ -590,22 +588,17 @@ dataPlaceToPageLeaf(GinBtree btree, Buffer buf, GinBtreeStack *stack, ...@@ -590,22 +588,17 @@ dataPlaceToPageLeaf(GinBtree btree, Buffer buf, GinBtreeStack *stack,
if (!needsplit) if (!needsplit)
{ {
/* /*
* Great, all the items fit on a single page. Construct a WAL record * Great, all the items fit on a single page. If needed, prepare data
* describing the changes we made, and write the segments back to the * for a WAL record describing the changes we'll make.
* page.
*
* Once we start modifying the page, there's no turning back. The
* caller is responsible for calling END_CRIT_SECTION() after writing
* the WAL record.
*/ */
MemoryContextSwitchTo(oldCxt);
if (RelationNeedsWAL(btree->index)) if (RelationNeedsWAL(btree->index))
{ computeLeafRecompressWALData(leaf);
XLogBeginInsert();
registerLeafRecompressWALData(buf, leaf); /*
} * We're ready to enter the critical section, but
START_CRIT_SECTION(); * dataExecPlaceToPageLeaf will need access to the "leaf" data.
dataPlaceToPageLeafRecompress(buf, leaf); */
*ptp_workspace = leaf;
if (append) if (append)
elog(DEBUG2, "appended %d new items to block %u; %d bytes (%d to go)", elog(DEBUG2, "appended %d new items to block %u; %d bytes (%d to go)",
...@@ -619,7 +612,7 @@ dataPlaceToPageLeaf(GinBtree btree, Buffer buf, GinBtreeStack *stack, ...@@ -619,7 +612,7 @@ dataPlaceToPageLeaf(GinBtree btree, Buffer buf, GinBtreeStack *stack,
else else
{ {
/* /*
* Had to split. * Have to split.
* *
* leafRepackItems already divided the segments between the left and * leafRepackItems already divided the segments between the left and
* the right page. It filled the left page as full as possible, and * the right page. It filled the left page as full as possible, and
...@@ -631,7 +624,7 @@ dataPlaceToPageLeaf(GinBtree btree, Buffer buf, GinBtreeStack *stack, ...@@ -631,7 +624,7 @@ dataPlaceToPageLeaf(GinBtree btree, Buffer buf, GinBtreeStack *stack,
* until they're balanced. * until they're balanced.
* *
* As a further heuristic, when appending items to the end of the * As a further heuristic, when appending items to the end of the
* page, try make the left page 75% full, one the assumption that * page, try to make the left page 75% full, on the assumption that
* subsequent insertions will probably also go to the end. This packs * subsequent insertions will probably also go to the end. This packs
* the index somewhat tighter when appending to a table, which is very * the index somewhat tighter when appending to a table, which is very
* common. * common.
...@@ -680,10 +673,13 @@ dataPlaceToPageLeaf(GinBtree btree, Buffer buf, GinBtreeStack *stack, ...@@ -680,10 +673,13 @@ dataPlaceToPageLeaf(GinBtree btree, Buffer buf, GinBtreeStack *stack,
&lastleftinfo->nitems); &lastleftinfo->nitems);
lbound = lastleftinfo->items[lastleftinfo->nitems - 1]; lbound = lastleftinfo->items[lastleftinfo->nitems - 1];
*newlpage = MemoryContextAlloc(oldCxt, BLCKSZ); /*
*newrpage = MemoryContextAlloc(oldCxt, BLCKSZ); * Now allocate a couple of temporary page images, and fill them.
*/
*newlpage = palloc(BLCKSZ);
*newrpage = palloc(BLCKSZ);
dataPlaceToPageLeafSplit(buf, leaf, lbound, rbound, dataPlaceToPageLeafSplit(leaf, lbound, rbound,
*newlpage, *newrpage); *newlpage, *newrpage);
Assert(GinPageRightMost(page) || Assert(GinPageRightMost(page) ||
@@ -700,12 +696,31 @@ dataPlaceToPageLeaf(GinBtree btree, Buffer buf, GinBtreeStack *stack,
 				 items->nitem - items->curitem - maxitems);
 	}
 
-	MemoryContextSwitchTo(oldCxt);
-	MemoryContextDelete(tmpCxt);
-
 	items->curitem += maxitems;
 
-	return needsplit ? SPLIT : INSERTED;
+	return needsplit ? GPTP_SPLIT : GPTP_INSERT;
+}
+
+/*
+ * Perform data insertion after beginPlaceToPage has decided it will fit.
+ *
+ * This is invoked within a critical section, and XLOG record creation (if
+ * needed) is already started. The target buffer is registered in slot 0.
+ */
+static void
+dataExecPlaceToPageLeaf(GinBtree btree, Buffer buf, GinBtreeStack *stack,
+						void *insertdata, void *ptp_workspace)
+{
+	disassembledLeaf *leaf = (disassembledLeaf *) ptp_workspace;
+
+	/* Apply changes to page */
+	dataPlaceToPageLeafRecompress(buf, leaf);
+
+	/* If needed, register WAL data built by computeLeafRecompressWALData */
+	if (RelationNeedsWAL(btree->index))
+	{
+		XLogRegisterBufData(0, leaf->walinfo, leaf->walinfolen);
+	}
 }
 
 /*
...@@ -816,11 +831,11 @@ ginVacuumPostingTreeLeaf(Relation indexrel, Buffer buffer, GinVacuumState *gvs) ...@@ -816,11 +831,11 @@ ginVacuumPostingTreeLeaf(Relation indexrel, Buffer buffer, GinVacuumState *gvs)
} }
if (RelationNeedsWAL(indexrel)) if (RelationNeedsWAL(indexrel))
{ computeLeafRecompressWALData(leaf);
XLogBeginInsert();
registerLeafRecompressWALData(buffer, leaf); /* Apply changes to page */
}
START_CRIT_SECTION(); START_CRIT_SECTION();
dataPlaceToPageLeafRecompress(buffer, leaf); dataPlaceToPageLeafRecompress(buffer, leaf);
MarkBufferDirty(buffer); MarkBufferDirty(buffer);
...@@ -829,6 +844,9 @@ ginVacuumPostingTreeLeaf(Relation indexrel, Buffer buffer, GinVacuumState *gvs) ...@@ -829,6 +844,9 @@ ginVacuumPostingTreeLeaf(Relation indexrel, Buffer buffer, GinVacuumState *gvs)
{ {
XLogRecPtr recptr; XLogRecPtr recptr;
XLogBeginInsert();
XLogRegisterBuffer(0, buffer, REGBUF_STANDARD);
XLogRegisterBufData(0, leaf->walinfo, leaf->walinfolen);
recptr = XLogInsert(RM_GIN_ID, XLOG_GIN_VACUUM_DATA_LEAF_PAGE); recptr = XLogInsert(RM_GIN_ID, XLOG_GIN_VACUUM_DATA_LEAF_PAGE);
PageSetLSN(page, recptr); PageSetLSN(page, recptr);
} }
@@ -839,10 +857,11 @@ ginVacuumPostingTreeLeaf(Relation indexrel, Buffer buffer, GinVacuumState *gvs)
 
 /*
  * Construct a ginxlogRecompressDataLeaf record representing the changes
- * in *leaf.
+ * in *leaf. (Because this requires a palloc, we have to do it before
+ * we enter the critical section that actually updates the page.)
  */
 static void
-registerLeafRecompressWALData(Buffer buf, disassembledLeaf *leaf)
+computeLeafRecompressWALData(disassembledLeaf *leaf)
 {
 	int nmodified = 0;
 	char *walbufbegin;
...@@ -933,18 +952,15 @@ registerLeafRecompressWALData(Buffer buf, disassembledLeaf *leaf) ...@@ -933,18 +952,15 @@ registerLeafRecompressWALData(Buffer buf, disassembledLeaf *leaf)
segno++; segno++;
} }
/* Pass back the constructed info via *leaf */
XLogRegisterBuffer(0, buf, REGBUF_STANDARD); leaf->walinfo = walbufbegin;
XLogRegisterBufData(0, walbufbegin, walbufend - walbufbegin); leaf->walinfolen = walbufend - walbufbegin;
} }
/* /*
* Assemble a disassembled posting tree leaf page back to a buffer. * Assemble a disassembled posting tree leaf page back to a buffer.
* *
* *prdata is filled with WAL information about this operation. The caller * This just updates the target buffer; WAL stuff is caller's responsibility.
* is responsible for inserting to the WAL, along with any other information
* about the operation that triggered this recompression.
* *
* NOTE: The segment pointers must not point directly to the same buffer, * NOTE: The segment pointers must not point directly to the same buffer,
* except for segments that have not been modified and whose preceding * except for segments that have not been modified and whose preceding
...@@ -1003,11 +1019,11 @@ dataPlaceToPageLeafRecompress(Buffer buf, disassembledLeaf *leaf) ...@@ -1003,11 +1019,11 @@ dataPlaceToPageLeafRecompress(Buffer buf, disassembledLeaf *leaf)
* segments to two pages instead of one. * segments to two pages instead of one.
* *
* This is different from the non-split cases in that this does not modify * This is different from the non-split cases in that this does not modify
* the original page directly, but to temporary in-memory copies of the new * the original page directly, but writes to temporary in-memory copies of
* left and right pages. * the new left and right pages.
*/ */
static void static void
dataPlaceToPageLeafSplit(Buffer buf, disassembledLeaf *leaf, dataPlaceToPageLeafSplit(disassembledLeaf *leaf,
ItemPointerData lbound, ItemPointerData rbound, ItemPointerData lbound, ItemPointerData rbound,
Page lpage, Page rpage) Page lpage, Page rpage)
{ {
...@@ -1076,39 +1092,55 @@ dataPlaceToPageLeafSplit(Buffer buf, disassembledLeaf *leaf, ...@@ -1076,39 +1092,55 @@ dataPlaceToPageLeafSplit(Buffer buf, disassembledLeaf *leaf,
} }
/* /*
* Place a PostingItem to page, and fill a WAL record. * Prepare to insert data on an internal data page.
*
* If it will fit, return GPTP_INSERT after doing whatever setup is needed
* before we enter the insertion critical section. *ptp_workspace can be
* set to pass information along to the execPlaceToPage function.
* *
* If the item doesn't fit, returns false without modifying the page. * If it won't fit, perform a page split and return two temporary page
* images into *newlpage and *newrpage, with result GPTP_SPLIT.
* *
* In addition to inserting the given item, the downlink of the existing item * In neither case should the given page buffer be modified here.
* at 'off' is updated to point to 'updateblkno'.
* *
* On INSERTED, registers the buffer as buffer ID 0, with data. * Note: on insertion to an internal node, in addition to inserting the given
* On SPLIT, returns rdata that represents the split pages in *prdata. * item, the downlink of the existing item at stack->off will be updated to
* point to updateblkno.
*/ */
static GinPlaceToPageRC static GinPlaceToPageRC
dataPlaceToPageInternal(GinBtree btree, Buffer buf, GinBtreeStack *stack, dataBeginPlaceToPageInternal(GinBtree btree, Buffer buf, GinBtreeStack *stack,
void *insertdata, BlockNumber updateblkno, void *insertdata, BlockNumber updateblkno,
void **ptp_workspace,
Page *newlpage, Page *newrpage) Page *newlpage, Page *newrpage)
{ {
Page page = BufferGetPage(buf); Page page = BufferGetPage(buf);
OffsetNumber off = stack->off;
PostingItem *pitem;
/* this must be static so it can be returned to caller */ /* If it doesn't fit, deal with split case */
static ginxlogInsertDataInternal data;
/* split if we have to */
if (GinNonLeafDataPageGetFreeSpace(page) < sizeof(PostingItem)) if (GinNonLeafDataPageGetFreeSpace(page) < sizeof(PostingItem))
{ {
dataSplitPageInternal(btree, buf, stack, insertdata, updateblkno, dataSplitPageInternal(btree, buf, stack, insertdata, updateblkno,
newlpage, newrpage); newlpage, newrpage);
return SPLIT; return GPTP_SPLIT;
} }
Assert(GinPageIsData(page)); /* Else, we're ready to proceed with insertion */
return GPTP_INSERT;
}
START_CRIT_SECTION(); /*
* Perform data insertion after beginPlaceToPage has decided it will fit.
*
* This is invoked within a critical section, and XLOG record creation (if
* needed) is already started. The target buffer is registered in slot 0.
*/
static void
dataExecPlaceToPageInternal(GinBtree btree, Buffer buf, GinBtreeStack *stack,
void *insertdata, BlockNumber updateblkno,
void *ptp_workspace)
{
Page page = BufferGetPage(buf);
OffsetNumber off = stack->off;
PostingItem *pitem;
/* Update existing downlink to point to next page (on internal page) */ /* Update existing downlink to point to next page (on internal page) */
pitem = GinDataPageGetPostingItem(page, off); pitem = GinDataPageGetPostingItem(page, off);
...@@ -1120,25 +1152,44 @@ dataPlaceToPageInternal(GinBtree btree, Buffer buf, GinBtreeStack *stack, ...@@ -1120,25 +1152,44 @@ dataPlaceToPageInternal(GinBtree btree, Buffer buf, GinBtreeStack *stack,
if (RelationNeedsWAL(btree->index)) if (RelationNeedsWAL(btree->index))
{ {
/*
* This must be static, because it has to survive until XLogInsert,
* and we can't palloc here. Ugly, but the XLogInsert infrastructure
* isn't reentrant anyway.
*/
static ginxlogInsertDataInternal data;
data.offset = off; data.offset = off;
data.newitem = *pitem; data.newitem = *pitem;
XLogBeginInsert();
XLogRegisterBuffer(0, buf, REGBUF_STANDARD);
XLogRegisterBufData(0, (char *) &data, XLogRegisterBufData(0, (char *) &data,
sizeof(ginxlogInsertDataInternal)); sizeof(ginxlogInsertDataInternal));
} }
return INSERTED;
} }
/* /*
* Places an item (or items) to a posting tree. Calls relevant function of * Prepare to insert data on a posting-tree data page.
* internal of leaf page because they are handled very differently. *
* If it will fit, return GPTP_INSERT after doing whatever setup is needed
* before we enter the insertion critical section. *ptp_workspace can be
* set to pass information along to the execPlaceToPage function.
*
* If it won't fit, perform a page split and return two temporary page
* images into *newlpage and *newrpage, with result GPTP_SPLIT.
*
* In neither case should the given page buffer be modified here.
*
* Note: on insertion to an internal node, in addition to inserting the given
* item, the downlink of the existing item at stack->off will be updated to
* point to updateblkno.
*
* Calls relevant function for internal or leaf page because they are handled
* very differently.
*/ */
static GinPlaceToPageRC static GinPlaceToPageRC
dataPlaceToPage(GinBtree btree, Buffer buf, GinBtreeStack *stack, dataBeginPlaceToPage(GinBtree btree, Buffer buf, GinBtreeStack *stack,
void *insertdata, BlockNumber updateblkno, void *insertdata, BlockNumber updateblkno,
void **ptp_workspace,
Page *newlpage, Page *newrpage) Page *newlpage, Page *newrpage)
{ {
Page page = BufferGetPage(buf); Page page = BufferGetPage(buf);
...@@ -1146,17 +1197,45 @@ dataPlaceToPage(GinBtree btree, Buffer buf, GinBtreeStack *stack, ...@@ -1146,17 +1197,45 @@ dataPlaceToPage(GinBtree btree, Buffer buf, GinBtreeStack *stack,
Assert(GinPageIsData(page)); Assert(GinPageIsData(page));
if (GinPageIsLeaf(page)) if (GinPageIsLeaf(page))
return dataPlaceToPageLeaf(btree, buf, stack, insertdata, return dataBeginPlaceToPageLeaf(btree, buf, stack, insertdata,
ptp_workspace,
newlpage, newrpage); newlpage, newrpage);
else else
return dataPlaceToPageInternal(btree, buf, stack, return dataBeginPlaceToPageInternal(btree, buf, stack,
insertdata, updateblkno, insertdata, updateblkno,
ptp_workspace,
newlpage, newrpage); newlpage, newrpage);
} }
/* /*
* Split page and fill WAL record. Returns a new temp buffer filled with data * Perform data insertion after beginPlaceToPage has decided it will fit.
* that should go to the left page. The original buffer is left untouched. *
* This is invoked within a critical section, and XLOG record creation (if
* needed) is already started. The target buffer is registered in slot 0.
*
* Calls relevant function for internal or leaf page because they are handled
* very differently.
*/
static void
dataExecPlaceToPage(GinBtree btree, Buffer buf, GinBtreeStack *stack,
void *insertdata, BlockNumber updateblkno,
void *ptp_workspace)
{
Page page = BufferGetPage(buf);
if (GinPageIsLeaf(page))
dataExecPlaceToPageLeaf(btree, buf, stack, insertdata,
ptp_workspace);
else
dataExecPlaceToPageInternal(btree, buf, stack, insertdata,
updateblkno, ptp_workspace);
}
/*
* Split internal page and insert new data.
*
* Returns new temp pages to *newlpage and *newrpage.
* The original buffer is left untouched.
*/ */
static void static void
dataSplitPageInternal(GinBtree btree, Buffer origbuf, dataSplitPageInternal(GinBtree btree, Buffer origbuf,
...@@ -1231,6 +1310,7 @@ dataSplitPageInternal(GinBtree btree, Buffer origbuf, ...@@ -1231,6 +1310,7 @@ dataSplitPageInternal(GinBtree btree, Buffer origbuf,
/* set up right bound for right page */ /* set up right bound for right page */
*GinDataPageGetRightBound(rpage) = oldbound; *GinDataPageGetRightBound(rpage) = oldbound;
/* return temp pages to caller */
*newlpage = lpage; *newlpage = lpage;
*newrpage = rpage; *newrpage = rpage;
} }
...@@ -1789,7 +1869,8 @@ ginPrepareDataScan(GinBtree btree, Relation index, BlockNumber rootBlkno) ...@@ -1789,7 +1869,8 @@ ginPrepareDataScan(GinBtree btree, Relation index, BlockNumber rootBlkno)
btree->isMoveRight = dataIsMoveRight; btree->isMoveRight = dataIsMoveRight;
btree->findItem = NULL; btree->findItem = NULL;
btree->findChildPtr = dataFindChildPtr; btree->findChildPtr = dataFindChildPtr;
btree->placeToPage = dataPlaceToPage; btree->beginPlaceToPage = dataBeginPlaceToPage;
btree->execPlaceToPage = dataExecPlaceToPage;
btree->fillRoot = ginDataFillRoot; btree->fillRoot = ginDataFillRoot;
btree->prepareDownlink = dataPrepareDownlink; btree->prepareDownlink = dataPrepareDownlink;
......
...@@ -21,7 +21,7 @@ ...@@ -21,7 +21,7 @@
static void entrySplitPage(GinBtree btree, Buffer origbuf, static void entrySplitPage(GinBtree btree, Buffer origbuf,
GinBtreeStack *stack, GinBtreeStack *stack,
void *insertPayload, GinBtreeEntryInsertData *insertData,
BlockNumber updateblkno, BlockNumber updateblkno,
Page *newlpage, Page *newrpage); Page *newlpage, Page *newrpage);
...@@ -508,39 +508,57 @@ entryPreparePage(GinBtree btree, Page page, OffsetNumber off, ...@@ -508,39 +508,57 @@ entryPreparePage(GinBtree btree, Page page, OffsetNumber off,
} }
/* /*
* Place tuple on page and fills WAL record * Prepare to insert data on an entry page.
* *
* If the tuple doesn't fit, returns false without modifying the page. * If it will fit, return GPTP_INSERT after doing whatever setup is needed
* before we enter the insertion critical section. *ptp_workspace can be
* set to pass information along to the execPlaceToPage function.
* *
* On insertion to an internal node, in addition to inserting the given item, * If it won't fit, perform a page split and return two temporary page
* the downlink of the existing item at 'off' is updated to point to * images into *newlpage and *newrpage, with result GPTP_SPLIT.
* 'updateblkno'.
* *
* On INSERTED, registers the buffer as buffer ID 0, with data. * In neither case should the given page buffer be modified here.
* On SPLIT, returns rdata that represents the split pages in *prdata. *
* Note: on insertion to an internal node, in addition to inserting the given
* item, the downlink of the existing item at stack->off will be updated to
* point to updateblkno.
*/ */
static GinPlaceToPageRC static GinPlaceToPageRC
entryPlaceToPage(GinBtree btree, Buffer buf, GinBtreeStack *stack, entryBeginPlaceToPage(GinBtree btree, Buffer buf, GinBtreeStack *stack,
void *insertPayload, BlockNumber updateblkno, void *insertPayload, BlockNumber updateblkno,
void **ptp_workspace,
Page *newlpage, Page *newrpage) Page *newlpage, Page *newrpage)
{ {
GinBtreeEntryInsertData *insertData = insertPayload; GinBtreeEntryInsertData *insertData = insertPayload;
Page page = BufferGetPage(buf);
OffsetNumber off = stack->off; OffsetNumber off = stack->off;
OffsetNumber placed;
/* this must be static so it can be returned to caller. */ /* If it doesn't fit, deal with split case */
static ginxlogInsertEntry data;
/* quick exit if it doesn't fit */
if (!entryIsEnoughSpace(btree, buf, off, insertData)) if (!entryIsEnoughSpace(btree, buf, off, insertData))
{ {
entrySplitPage(btree, buf, stack, insertPayload, updateblkno, entrySplitPage(btree, buf, stack, insertData, updateblkno,
newlpage, newrpage); newlpage, newrpage);
return SPLIT; return GPTP_SPLIT;
} }
START_CRIT_SECTION(); /* Else, we're ready to proceed with insertion */
return GPTP_INSERT;
}
/*
* Perform data insertion after beginPlaceToPage has decided it will fit.
*
* This is invoked within a critical section, and XLOG record creation (if
* needed) is already started. The target buffer is registered in slot 0.
*/
static void
entryExecPlaceToPage(GinBtree btree, Buffer buf, GinBtreeStack *stack,
void *insertPayload, BlockNumber updateblkno,
void *ptp_workspace)
{
GinBtreeEntryInsertData *insertData = insertPayload;
Page page = BufferGetPage(buf);
OffsetNumber off = stack->off;
OffsetNumber placed;
entryPreparePage(btree, page, off, insertData, updateblkno); entryPreparePage(btree, page, off, insertData, updateblkno);
...@@ -554,34 +572,36 @@ entryPlaceToPage(GinBtree btree, Buffer buf, GinBtreeStack *stack, ...@@ -554,34 +572,36 @@ entryPlaceToPage(GinBtree btree, Buffer buf, GinBtreeStack *stack,
if (RelationNeedsWAL(btree->index)) if (RelationNeedsWAL(btree->index))
{ {
/*
* This must be static, because it has to survive until XLogInsert,
* and we can't palloc here. Ugly, but the XLogInsert infrastructure
* isn't reentrant anyway.
*/
static ginxlogInsertEntry data;
data.isDelete = insertData->isDelete; data.isDelete = insertData->isDelete;
data.offset = off; data.offset = off;
XLogBeginInsert();
XLogRegisterBuffer(0, buf, REGBUF_STANDARD);
XLogRegisterBufData(0, (char *) &data, XLogRegisterBufData(0, (char *) &data,
offsetof(ginxlogInsertEntry, tuple)); offsetof(ginxlogInsertEntry, tuple));
XLogRegisterBufData(0, (char *) insertData->entry, XLogRegisterBufData(0, (char *) insertData->entry,
IndexTupleSize(insertData->entry)); IndexTupleSize(insertData->entry));
} }
return INSERTED;
} }
/* /*
* Place tuple and split page, original buffer(lbuf) leaves untouched, * Split entry page and insert new data.
* returns shadow pages filled with new data. *
* Tuples are distributed between pages by equal size on its, not * Returns new temp pages to *newlpage and *newrpage.
* an equal number! * The original buffer is left untouched.
*/ */
static void static void
entrySplitPage(GinBtree btree, Buffer origbuf, entrySplitPage(GinBtree btree, Buffer origbuf,
GinBtreeStack *stack, GinBtreeStack *stack,
void *insertPayload, GinBtreeEntryInsertData *insertData,
BlockNumber updateblkno, BlockNumber updateblkno,
Page *newlpage, Page *newrpage) Page *newlpage, Page *newrpage)
{ {
GinBtreeEntryInsertData *insertData = insertPayload;
OffsetNumber off = stack->off; OffsetNumber off = stack->off;
OffsetNumber i, OffsetNumber i,
maxoff, maxoff,
...@@ -646,6 +666,10 @@ entrySplitPage(GinBtree btree, Buffer origbuf, ...@@ -646,6 +666,10 @@ entrySplitPage(GinBtree btree, Buffer origbuf,
{ {
itup = (IndexTuple) ptr; itup = (IndexTuple) ptr;
/*
* Decide where to split. We try to equalize the pages' total data
* size, not number of tuples.
*/
if (lsize > totalsize / 2) if (lsize > totalsize / 2)
{ {
if (separator == InvalidOffsetNumber) if (separator == InvalidOffsetNumber)
...@@ -663,6 +687,7 @@ entrySplitPage(GinBtree btree, Buffer origbuf, ...@@ -663,6 +687,7 @@ entrySplitPage(GinBtree btree, Buffer origbuf,
ptr += MAXALIGN(IndexTupleSize(itup)); ptr += MAXALIGN(IndexTupleSize(itup));
} }
/* return temp pages to caller */
*newlpage = lpage; *newlpage = lpage;
*newrpage = rpage; *newrpage = rpage;
} }
...@@ -731,7 +756,8 @@ ginPrepareEntryScan(GinBtree btree, OffsetNumber attnum, ...@@ -731,7 +756,8 @@ ginPrepareEntryScan(GinBtree btree, OffsetNumber attnum,
btree->isMoveRight = entryIsMoveRight; btree->isMoveRight = entryIsMoveRight;
btree->findItem = entryLocateLeafEntry; btree->findItem = entryLocateLeafEntry;
btree->findChildPtr = entryFindChildPtr; btree->findChildPtr = entryFindChildPtr;
btree->placeToPage = entryPlaceToPage; btree->beginPlaceToPage = entryBeginPlaceToPage;
btree->execPlaceToPage = entryExecPlaceToPage;
btree->fillRoot = ginEntryFillRoot; btree->fillRoot = ginEntryFillRoot;
btree->prepareDownlink = entryPrepareDownlink; btree->prepareDownlink = entryPrepareDownlink;
......
...@@ -420,14 +420,14 @@ typedef struct ginxlogCreatePostingTree ...@@ -420,14 +420,14 @@ typedef struct ginxlogCreatePostingTree
typedef struct typedef struct
{ {
uint16 flags; /* GIN_SPLIT_ISLEAF and/or GIN_SPLIT_ISDATA */ uint16 flags; /* GIN_INSERT_ISLEAF and/or GIN_INSERT_ISDATA */
/* /*
* FOLLOWS: * FOLLOWS:
* *
* 1. if not leaf page, block numbers of the left and right child pages * 1. if not leaf page, block numbers of the left and right child pages
* whose split this insertion finishes. As BlockIdData[2] (beware of * whose split this insertion finishes, as BlockIdData[2] (beware of
* adding fields before this that would make them not 16-bit aligned) * adding fields in this struct that would make them not 16-bit aligned)
* *
* 2. a ginxlogInsertEntry or ginxlogRecompressDataLeaf struct, depending * 2. a ginxlogInsertEntry or ginxlogRecompressDataLeaf struct, depending
* on tree type. * on tree type.
...@@ -499,21 +499,19 @@ typedef struct ginxlogSplit ...@@ -499,21 +499,19 @@ typedef struct ginxlogSplit
* split */ * split */
BlockNumber leftChildBlkno; /* valid on a non-leaf split */ BlockNumber leftChildBlkno; /* valid on a non-leaf split */
BlockNumber rightChildBlkno; BlockNumber rightChildBlkno;
uint16 flags; uint16 flags; /* see below */
/* follows: one of the following structs */
} ginxlogSplit; } ginxlogSplit;
/* /*
* Flags used in ginxlogInsert and ginxlogSplit records * Flags used in ginxlogInsert and ginxlogSplit records
*/ */
#define GIN_INSERT_ISDATA 0x01 /* for both insert and split records */ #define GIN_INSERT_ISDATA 0x01 /* for both insert and split records */
#define GIN_INSERT_ISLEAF 0x02 /* .. */ #define GIN_INSERT_ISLEAF 0x02 /* ditto */
#define GIN_SPLIT_ROOT 0x04 /* only for split records */ #define GIN_SPLIT_ROOT 0x04 /* only for split records */
/* /*
* Vacuum simply WAL-logs the whole page, when anything is modified. This * Vacuum simply WAL-logs the whole page, when anything is modified. This
* functionally identical heap_newpage records, but is kept separate for * is functionally identical to heap_newpage records, but is kept separate for
* debugging purposes. (When inspecting the WAL stream, it's easier to see * debugging purposes. (When inspecting the WAL stream, it's easier to see
* what's going on when GIN vacuum records are marked as such, not as heap * what's going on when GIN vacuum records are marked as such, not as heap
* records.) This is currently only used for entry tree leaf pages. * records.) This is currently only used for entry tree leaf pages.
...@@ -641,12 +639,12 @@ typedef struct GinBtreeStack ...@@ -641,12 +639,12 @@ typedef struct GinBtreeStack
typedef struct GinBtreeData *GinBtree; typedef struct GinBtreeData *GinBtree;
/* Return codes for GinBtreeData.placeToPage method */ /* Return codes for GinBtreeData.beginPlaceToPage method */
typedef enum typedef enum
{ {
UNMODIFIED, GPTP_NO_WORK,
INSERTED, GPTP_INSERT,
SPLIT GPTP_SPLIT
} GinPlaceToPageRC; } GinPlaceToPageRC;
typedef struct GinBtreeData typedef struct GinBtreeData
...@@ -659,7 +657,8 @@ typedef struct GinBtreeData ...@@ -659,7 +657,8 @@ typedef struct GinBtreeData
/* insert methods */ /* insert methods */
OffsetNumber (*findChildPtr) (GinBtree, Page, BlockNumber, OffsetNumber); OffsetNumber (*findChildPtr) (GinBtree, Page, BlockNumber, OffsetNumber);
GinPlaceToPageRC (*placeToPage) (GinBtree, Buffer, GinBtreeStack *, void *, BlockNumber, Page *, Page *); GinPlaceToPageRC (*beginPlaceToPage) (GinBtree, Buffer, GinBtreeStack *, void *, BlockNumber, void **, Page *, Page *);
void (*execPlaceToPage) (GinBtree, Buffer, GinBtreeStack *, void *, BlockNumber, void *);
void *(*prepareDownlink) (GinBtree, Buffer); void *(*prepareDownlink) (GinBtree, Buffer);
void (*fillRoot) (GinBtree, Page, BlockNumber, Page, BlockNumber, Page); void (*fillRoot) (GinBtree, Page, BlockNumber, Page, BlockNumber, Page);
......
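The two-phase contract described in the new comments above (a "begin" step that decides between insert and split without touching the target page, and an "exec" step that runs only inside a critical section set up by the caller) can be illustrated with a toy model. This is only a sketch under stated assumptions: every name below (ToyPage, toy_begin_place, toy_exec_place, toy_place_to_page, toy_crit_depth) is hypothetical and is not part of PostgreSQL. The critical section is modeled as a plain counter, and there is no WAL logging.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PAGE_CAPACITY 4

/* Hypothetical analogue of GinPlaceToPageRC; not the real enum. */
typedef enum { TOY_NO_WORK, TOY_INSERT, TOY_SPLIT } ToyPlaceRC;

/* Hypothetical fixed-capacity page holding plain ints. */
typedef struct
{
    int items[TOY_PAGE_CAPACITY];
    int nitems;
} ToyPage;

/* Stand-in for a critical-section nesting counter (assumption). */
static int toy_crit_depth = 0;

/*
 * Phase 1: decide what to do.  Never modifies *page.  If the item does
 * not fit, build the two split images in caller-supplied temp storage.
 */
static ToyPlaceRC
toy_begin_place(const ToyPage *page, int item, ToyPage *newl, ToyPage *newr)
{
    int all[TOY_PAGE_CAPACITY + 1];
    int total = TOY_PAGE_CAPACITY + 1;
    int half = total / 2;
    int i;

    if (page->nitems < TOY_PAGE_CAPACITY)
        return TOY_INSERT;

    /* Gather the existing items plus the new one, then divide them. */
    for (i = 0; i < TOY_PAGE_CAPACITY; i++)
        all[i] = page->items[i];
    all[TOY_PAGE_CAPACITY] = item;

    newl->nitems = 0;
    newr->nitems = 0;
    for (i = 0; i < total; i++)
    {
        if (i < half)
            newl->items[newl->nitems++] = all[i];
        else
            newr->items[newr->nitems++] = all[i];
    }
    return TOY_SPLIT;
}

/* Phase 2: perform the update.  Must run inside the critical section. */
static void
toy_exec_place(ToyPage *page, int item)
{
    assert(toy_crit_depth > 0);
    page->items[page->nitems++] = item;
}

/*
 * Caller: the only place where the critical section starts and ends.
 * On a split, this toy version just hands the temp images back.
 */
static ToyPlaceRC
toy_place_to_page(ToyPage *page, int item, ToyPage *newl, ToyPage *newr)
{
    ToyPlaceRC rc = toy_begin_place(page, item, newl, newr);

    if (rc == TOY_INSERT)
    {
        toy_crit_depth++;           /* enter critical section */
        toy_exec_place(page, item);
        toy_crit_depth--;           /* leave critical section */
    }
    return rc;
}
```

The point of the sketch is the ownership rule: the subroutines never start or finish the critical section themselves, so the caller can see both ends of it in one function.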