Commit bde361fe authored by Tom Lane

Fix memory leak and other bugs in ginPlaceToPage() & subroutines.

Commit 36a35c55 turned the interface between ginPlaceToPage and
its subroutines in gindatapage.c and ginentrypage.c into a royal mess:
page-update critical sections were started in one place and finished in
another place not even in the same file, and the very same subroutine
might return having started a critical section or not.  Subsequent patches
band-aided over some of the problems with this design by making things
even messier.

One user-visible resulting problem is memory leaks caused by the need for
the subroutines to allocate storage that would survive until ginPlaceToPage
calls XLogInsert (as reported by Julien Rouhaud).  This would not typically
be noticeable during retail index updates.  It could be visible in a GIN
index build, in the form of memory consumption swelling to several times
the commanded maintenance_work_mem.

Another rather nasty problem is that in the internal-page-splitting code
path, we would clear the child page's GIN_INCOMPLETE_SPLIT flag well before
entering the critical section that it's supposed to be cleared in; a
failure in between would leave the index in a corrupt state.  There were
also assorted coding-rule violations with little immediate consequence but
possible long-term hazards, such as beginning an XLogInsert sequence before
entering a critical section, or calling elog(DEBUG) inside a critical
section.

To fix, redefine the API between ginPlaceToPage() and its subroutines
by splitting the subroutines into two parts.  The "beginPlaceToPage"
subroutine does what can be done outside a critical section, including
full computation of the result pages into temporary storage when we're
going to split the target page.  The "execPlaceToPage" subroutine is called
within a critical section established by ginPlaceToPage(), and it handles
the actual page update in the non-split code path.  The critical section,
as well as the XLOG insertion call sequence, are both now always started
and finished in ginPlaceToPage().  Also, make ginPlaceToPage() create and
work in a short-lived memory context to eliminate the leakage problem.
(Since a short-lived memory context had been getting created in the most
common code path in the subroutines, this shouldn't cause any noticeable
performance penalty; we're just moving the overhead up one call level.)

In passing, fix a bunch of comments that had gone unmaintained throughout
all this klugery.

Report: <571276DD.5050303@dalibo.com>
parent a343e223
@@ -21,7 +21,7 @@
 static void entrySplitPage(GinBtree btree, Buffer origbuf,
                GinBtreeStack *stack,
-               void *insertPayload,
+               GinBtreeEntryInsertData *insertData,
                BlockNumber updateblkno,
                Page *newlpage, Page *newrpage);
@@ -508,39 +508,57 @@ entryPreparePage(GinBtree btree, Page page, OffsetNumber off,
 }
 /*
- * Place tuple on page and fills WAL record
+ * Prepare to insert data on an entry page.
  *
- * If the tuple doesn't fit, returns false without modifying the page.
+ * If it will fit, return GPTP_INSERT after doing whatever setup is needed
+ * before we enter the insertion critical section. *ptp_workspace can be
+ * set to pass information along to the execPlaceToPage function.
  *
- * On insertion to an internal node, in addition to inserting the given item,
- * the downlink of the existing item at 'off' is updated to point to
- * 'updateblkno'.
+ * If it won't fit, perform a page split and return two temporary page
+ * images into *newlpage and *newrpage, with result GPTP_SPLIT.
  *
- * On INSERTED, registers the buffer as buffer ID 0, with data.
- * On SPLIT, returns rdata that represents the split pages in *prdata.
+ * In neither case should the given page buffer be modified here.
+ *
+ * Note: on insertion to an internal node, in addition to inserting the given
+ * item, the downlink of the existing item at stack->off will be updated to
+ * point to updateblkno.
  */
 static GinPlaceToPageRC
-entryPlaceToPage(GinBtree btree, Buffer buf, GinBtreeStack *stack,
+entryBeginPlaceToPage(GinBtree btree, Buffer buf, GinBtreeStack *stack,
                void *insertPayload, BlockNumber updateblkno,
-               Page *newlpage, Page *newrpage)
+               void **ptp_workspace,
+               Page *newlpage, Page *newrpage)
 {
     GinBtreeEntryInsertData *insertData = insertPayload;
-    Page page = BufferGetPage(buf);
     OffsetNumber off = stack->off;
-    OffsetNumber placed;
-    /* this must be static so it can be returned to caller. */
-    static ginxlogInsertEntry data;
-    /* quick exit if it doesn't fit */
+    /* If it doesn't fit, deal with split case */
     if (!entryIsEnoughSpace(btree, buf, off, insertData))
     {
-        entrySplitPage(btree, buf, stack, insertPayload, updateblkno,
+        entrySplitPage(btree, buf, stack, insertData, updateblkno,
                        newlpage, newrpage);
-        return SPLIT;
+        return GPTP_SPLIT;
     }
-    START_CRIT_SECTION();
+    /* Else, we're ready to proceed with insertion */
+    return GPTP_INSERT;
+}
+/*
+ * Perform data insertion after beginPlaceToPage has decided it will fit.
+ *
+ * This is invoked within a critical section, and XLOG record creation (if
+ * needed) is already started. The target buffer is registered in slot 0.
+ */
+static void
+entryExecPlaceToPage(GinBtree btree, Buffer buf, GinBtreeStack *stack,
+               void *insertPayload, BlockNumber updateblkno,
+               void *ptp_workspace)
+{
+    GinBtreeEntryInsertData *insertData = insertPayload;
+    Page page = BufferGetPage(buf);
+    OffsetNumber off = stack->off;
+    OffsetNumber placed;
     entryPreparePage(btree, page, off, insertData, updateblkno);
@@ -554,34 +572,36 @@ entryPlaceToPage(GinBtree btree, Buffer buf, GinBtreeStack *stack,
     if (RelationNeedsWAL(btree->index))
     {
+        /*
+         * This must be static, because it has to survive until XLogInsert,
+         * and we can't palloc here. Ugly, but the XLogInsert infrastructure
+         * isn't reentrant anyway.
+         */
+        static ginxlogInsertEntry data;
         data.isDelete = insertData->isDelete;
         data.offset = off;
-        XLogBeginInsert();
-        XLogRegisterBuffer(0, buf, REGBUF_STANDARD);
         XLogRegisterBufData(0, (char *) &data,
                             offsetof(ginxlogInsertEntry, tuple));
         XLogRegisterBufData(0, (char *) insertData->entry,
                             IndexTupleSize(insertData->entry));
     }
-    return INSERTED;
 }
 /*
- * Place tuple and split page, original buffer(lbuf) leaves untouched,
- * returns shadow pages filled with new data.
- * Tuples are distributed between pages by equal size on its, not
- * an equal number!
+ * Split entry page and insert new data.
+ *
+ * Returns new temp pages to *newlpage and *newrpage.
+ * The original buffer is left untouched.
  */
 static void
 entrySplitPage(GinBtree btree, Buffer origbuf,
                GinBtreeStack *stack,
-               void *insertPayload,
+               GinBtreeEntryInsertData *insertData,
                BlockNumber updateblkno,
                Page *newlpage, Page *newrpage)
 {
-    GinBtreeEntryInsertData *insertData = insertPayload;
     OffsetNumber off = stack->off;
     OffsetNumber i,
                 maxoff,
@@ -646,6 +666,10 @@ entrySplitPage(GinBtree btree, Buffer origbuf,
     {
         itup = (IndexTuple) ptr;
+        /*
+         * Decide where to split. We try to equalize the pages' total data
+         * size, not number of tuples.
+         */
         if (lsize > totalsize / 2)
         {
             if (separator == InvalidOffsetNumber)
@@ -663,6 +687,7 @@ entrySplitPage(GinBtree btree, Buffer origbuf,
         ptr += MAXALIGN(IndexTupleSize(itup));
     }
+    /* return temp pages to caller */
     *newlpage = lpage;
     *newrpage = rpage;
 }
@@ -731,7 +756,8 @@ ginPrepareEntryScan(GinBtree btree, OffsetNumber attnum,
     btree->isMoveRight = entryIsMoveRight;
     btree->findItem = entryLocateLeafEntry;
     btree->findChildPtr = entryFindChildPtr;
-    btree->placeToPage = entryPlaceToPage;
+    btree->beginPlaceToPage = entryBeginPlaceToPage;
+    btree->execPlaceToPage = entryExecPlaceToPage;
     btree->fillRoot = ginEntryFillRoot;
     btree->prepareDownlink = entryPrepareDownlink;
......
@@ -420,14 +420,14 @@ typedef struct ginxlogCreatePostingTree
 typedef struct
 {
-    uint16 flags; /* GIN_SPLIT_ISLEAF and/or GIN_SPLIT_ISDATA */
+    uint16 flags; /* GIN_INSERT_ISLEAF and/or GIN_INSERT_ISDATA */
     /*
      * FOLLOWS:
      *
      * 1. if not leaf page, block numbers of the left and right child pages
-     * whose split this insertion finishes. As BlockIdData[2] (beware of
-     * adding fields before this that would make them not 16-bit aligned)
+     * whose split this insertion finishes, as BlockIdData[2] (beware of
+     * adding fields in this struct that would make them not 16-bit aligned)
      *
      * 2. a ginxlogInsertEntry or ginxlogRecompressDataLeaf struct, depending
      * on tree type.
@@ -499,21 +499,19 @@ typedef struct ginxlogSplit
                                  * split */
     BlockNumber leftChildBlkno; /* valid on a non-leaf split */
     BlockNumber rightChildBlkno;
-    uint16 flags;
-    /* follows: one of the following structs */
+    uint16 flags; /* see below */
 } ginxlogSplit;
 /*
  * Flags used in ginxlogInsert and ginxlogSplit records
  */
 #define GIN_INSERT_ISDATA 0x01 /* for both insert and split records */
-#define GIN_INSERT_ISLEAF 0x02 /* .. */
+#define GIN_INSERT_ISLEAF 0x02 /* ditto */
 #define GIN_SPLIT_ROOT 0x04 /* only for split records */
 /*
  * Vacuum simply WAL-logs the whole page, when anything is modified. This
- * functionally identical heap_newpage records, but is kept separate for
+ * is functionally identical to heap_newpage records, but is kept separate for
  * debugging purposes. (When inspecting the WAL stream, it's easier to see
  * what's going on when GIN vacuum records are marked as such, not as heap
  * records.) This is currently only used for entry tree leaf pages.
@@ -641,12 +639,12 @@ typedef struct GinBtreeStack
 typedef struct GinBtreeData *GinBtree;
-/* Return codes for GinBtreeData.placeToPage method */
+/* Return codes for GinBtreeData.beginPlaceToPage method */
 typedef enum
 {
-    UNMODIFIED,
-    INSERTED,
-    SPLIT
+    GPTP_NO_WORK,
+    GPTP_INSERT,
+    GPTP_SPLIT
 } GinPlaceToPageRC;
 typedef struct GinBtreeData
@@ -659,7 +657,8 @@ typedef struct GinBtreeData
     /* insert methods */
     OffsetNumber (*findChildPtr) (GinBtree, Page, BlockNumber, OffsetNumber);
-    GinPlaceToPageRC (*placeToPage) (GinBtree, Buffer, GinBtreeStack *, void *, BlockNumber, Page *, Page *);
+    GinPlaceToPageRC (*beginPlaceToPage) (GinBtree, Buffer, GinBtreeStack *, void *, BlockNumber, void **, Page *, Page *);
+    void (*execPlaceToPage) (GinBtree, Buffer, GinBtreeStack *, void *, BlockNumber, void *);
     void *(*prepareDownlink) (GinBtree, Buffer);
     void (*fillRoot) (GinBtree, Page, BlockNumber, Page, BlockNumber, Page);
......