Commit d22a09dc authored by Tom Lane

Support GiST index support functions that want to cache data across calls.

pg_trgm was already doing this unofficially, but the implementation hadn't
been thought through very well and leaked memory.  Restructure the core
GiST code so that it actually works, and document it.  Ordinarily this
would have required an extra memory context creation/destruction for each
GiST index search, but I was able to avoid that in the normal case of a
non-rescanned search by finessing the handling of the RBTree.  It used to
have its own context always, but now shares a context with the
scan-lifespan data structures, unless there is more than one rescan call.
This should make the added overhead unnoticeable in typical cases.
parent 79edb2b1
@@ -86,11 +86,6 @@
reuse, and a clean interface.
</para>
</sect1>
<sect1 id="gist-implementation">
<title>Implementation</title>
<para>
There are seven methods that an index operator class for
<acronym>GiST</acronym> must provide, and an eighth that is optional.
@@ -642,35 +637,54 @@ my_distance(PG_FUNCTION_ARGS)
</variablelist>
<para>
All the GiST support methods are normally called in short-lived memory
contexts; that is, <varname>CurrentMemoryContext</> will get reset after
each tuple is processed. It is therefore not very important to worry about
pfree'ing everything you palloc. However, in some cases it's useful for a
support method to cache data across repeated calls. To do that, allocate
the longer-lived data in <literal>fcinfo-&gt;flinfo-&gt;fn_mcxt</>, and
keep a pointer to it in <literal>fcinfo-&gt;flinfo-&gt;fn_extra</>. Such
data will survive for the life of the index operation (e.g., a single GiST
index scan, index build, or index tuple insertion). Be careful to pfree
the previous value when replacing a <literal>fn_extra</> value, or the leak
will accumulate for the duration of the operation.
</para>
</sect1>
<sect1 id="gist-implementation">
<title>Implementation</title>
<sect2 id="gist-buffering-build">
<title>GiST buffering build</title>
<para>
Building large GiST indexes by simply inserting all the tuples tends to be
slow, because if the index tuples are scattered across the index and the
index is large enough to not fit in cache, the insertions need to perform
a lot of random I/O. PostgreSQL from version 9.2 supports a more efficient
method to build GiST indexes based on buffering, which can dramatically
reduce number of random I/O needed for non-ordered data sets. For
well-ordered datasets the benefit is smaller or non-existent, because
only a small number of pages receive new tuples at a time, and those pages
fit in cache even if the index as whole does not.
a lot of random I/O. Beginning in version 9.2, PostgreSQL supports a more
efficient method to build GiST indexes based on buffering, which can
dramatically reduce the number of random I/Os needed for non-ordered data
sets. For well-ordered datasets the benefit is smaller or non-existent,
because only a small number of pages receive new tuples at a time, and
those pages fit in cache even if the index as whole does not.
</para>
<para>
However, buffering index build needs to call the <function>penalty</>
function more often, which consumes some extra CPU resources. Also, the
buffers used in the buffering build need temporary disk space, up to
the size of the resulting index. Buffering can also infuence the quality
of the produced index, in both positive and negative directions. That
the size of the resulting index. Buffering can also influence the quality
of the resulting index, in both positive and negative directions. That
influence depends on various factors, like the distribution of the input
data and operator class implementation.
data and the operator class implementation.
</para>
<para>
By default, the index build switches to the buffering method when the
By default, a GiST index build switches to the buffering method when the
index size reaches <xref linkend="guc-effective-cache-size">. It can
be manually turned on or off by the <literal>BUFFERING</literal> parameter
to the CREATE INDEX clause. The default behavior is good for most cases,
to the CREATE INDEX command. The default behavior is good for most cases,
but turning buffering off might speed up the build somewhat if the input
data is ordered.
</para>
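The documentation paragraph above on caching data across calls can be illustrated with a small, self-contained model. This is a sketch under stated assumptions, not PostgreSQL code: `MockContext`, `mock_alloc`, `mock_free`, `MockFmgrInfo`, `QueryCache` and `my_consistent` are hypothetical stand-ins. In a real support function the allocation would go through `MemoryContextAlloc(fcinfo->flinfo->fn_mcxt, ...)`, the release through `pfree`, and the pointer would be kept in `fcinfo->flinfo->fn_extra`.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the long-lived context behind fn_mcxt.
 * It only counts live allocations so that leaks are easy to observe. */
typedef struct MockContext
{
    int live_allocs;
} MockContext;

static void *mock_alloc(MockContext *cxt, size_t size)
{
    cxt->live_allocs++;
    return calloc(1, size);
}

static void mock_free(MockContext *cxt, void *ptr)
{
    assert(cxt->live_allocs > 0);
    cxt->live_allocs--;
    free(ptr);
}

/* Hypothetical stand-in for the two FmgrInfo fields the docs mention. */
typedef struct MockFmgrInfo
{
    void        *fn_extra;   /* cache pointer that survives across calls */
    MockContext *fn_mcxt;    /* context that lives as long as the operation */
} MockFmgrInfo;

/* Hypothetical per-query cache, e.g. a preprocessed query value. */
typedef struct QueryCache
{
    int  query_key;   /* which query this cache was built for */
    long prepared;    /* stand-in for expensive preprocessed data */
} QueryCache;

static int cache_builds = 0;   /* how many times the expensive step ran */

/* Sketch of a consistent-style support function that caches in fn_extra. */
static int my_consistent(MockFmgrInfo *flinfo, int query_key, int index_value)
{
    QueryCache *cache = (QueryCache *) flinfo->fn_extra;

    if (cache == NULL || cache->query_key != query_key)
    {
        /* Free the previous value first, or it leaks until the operation ends. */
        if (cache != NULL)
            mock_free(flinfo->fn_mcxt, cache);

        cache = mock_alloc(flinfo->fn_mcxt, sizeof(QueryCache));
        cache->query_key = query_key;
        cache->prepared = (long) query_key * 2;   /* pretend this is costly */
        cache_builds++;
        flinfo->fn_extra = cache;
    }
    return index_value == cache->prepared;
}
```

Calling `my_consistent` repeatedly with the same query key builds the cache once. Switching to a new key frees the old cache before building the new one, which is the pfree-before-replace rule the paragraph warns about.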
@@ -94,25 +94,29 @@ gistinsert(PG_FUNCTION_ARGS)
IndexUniqueCheck checkUnique = (IndexUniqueCheck) PG_GETARG_INT32(5);
#endif
IndexTuple itup;
GISTSTATE giststate;
MemoryContext oldCtx;
MemoryContext insertCtx;
GISTSTATE *giststate;
MemoryContext oldCxt;
insertCtx = createTempGistContext();
oldCtx = MemoryContextSwitchTo(insertCtx);
giststate = initGISTstate(r);
initGISTstate(&giststate, r);
/*
* We use the giststate's scan context as temp context too. This means
* that any memory leaked by the support functions is not reclaimed until
* end of insert. In most cases, we aren't going to call the support
* functions very many times before finishing the insert, so this seems
* cheaper than resetting a temp context for each function call.
*/
oldCxt = MemoryContextSwitchTo(giststate->tempCxt);
itup = gistFormTuple(&giststate, r,
itup = gistFormTuple(giststate, r,
values, isnull, true /* size is currently bogus */ );
itup->t_tid = *ht_ctid;
gistdoinsert(r, itup, 0, &giststate);
gistdoinsert(r, itup, 0, giststate);
/* cleanup */
freeGISTstate(&giststate);
MemoryContextSwitchTo(oldCtx);
MemoryContextDelete(insertCtx);
MemoryContextSwitchTo(oldCxt);
freeGISTstate(giststate);
PG_RETURN_BOOL(false);
}
@@ -1213,47 +1217,64 @@ gistSplit(Relation r,
}
/*
* Fill a GISTSTATE with information about the index
* Create a GISTSTATE and fill it with information about the index
*/
void
initGISTstate(GISTSTATE *giststate, Relation index)
GISTSTATE *
initGISTstate(Relation index)
{
GISTSTATE *giststate;
MemoryContext scanCxt;
MemoryContext oldCxt;
int i;
/* safety check to protect fixed-size arrays in GISTSTATE */
if (index->rd_att->natts > INDEX_MAX_KEYS)
elog(ERROR, "numberOfAttributes %d > %d",
index->rd_att->natts, INDEX_MAX_KEYS);
/* Create the memory context that will hold the GISTSTATE */
scanCxt = AllocSetContextCreate(CurrentMemoryContext,
"GiST scan context",
ALLOCSET_DEFAULT_MINSIZE,
ALLOCSET_DEFAULT_INITSIZE,
ALLOCSET_DEFAULT_MAXSIZE);
oldCxt = MemoryContextSwitchTo(scanCxt);
/* Create and fill in the GISTSTATE */
giststate = (GISTSTATE *) palloc(sizeof(GISTSTATE));
giststate->scanCxt = scanCxt;
giststate->tempCxt = scanCxt; /* caller must change this if needed */
giststate->tupdesc = index->rd_att;
for (i = 0; i < index->rd_att->natts; i++)
{
fmgr_info_copy(&(giststate->consistentFn[i]),
index_getprocinfo(index, i + 1, GIST_CONSISTENT_PROC),
CurrentMemoryContext);
scanCxt);
fmgr_info_copy(&(giststate->unionFn[i]),
index_getprocinfo(index, i + 1, GIST_UNION_PROC),
CurrentMemoryContext);
scanCxt);
fmgr_info_copy(&(giststate->compressFn[i]),
index_getprocinfo(index, i + 1, GIST_COMPRESS_PROC),
CurrentMemoryContext);
scanCxt);
fmgr_info_copy(&(giststate->decompressFn[i]),
index_getprocinfo(index, i + 1, GIST_DECOMPRESS_PROC),
CurrentMemoryContext);
scanCxt);
fmgr_info_copy(&(giststate->penaltyFn[i]),
index_getprocinfo(index, i + 1, GIST_PENALTY_PROC),
CurrentMemoryContext);
scanCxt);
fmgr_info_copy(&(giststate->picksplitFn[i]),
index_getprocinfo(index, i + 1, GIST_PICKSPLIT_PROC),
CurrentMemoryContext);
scanCxt);
fmgr_info_copy(&(giststate->equalFn[i]),
index_getprocinfo(index, i + 1, GIST_EQUAL_PROC),
CurrentMemoryContext);
scanCxt);
/* opclasses are not required to provide a Distance method */
if (OidIsValid(index_getprocid(index, i + 1, GIST_DISTANCE_PROC)))
fmgr_info_copy(&(giststate->distanceFn[i]),
index_getprocinfo(index, i + 1, GIST_DISTANCE_PROC),
CurrentMemoryContext);
scanCxt);
else
giststate->distanceFn[i].fn_oid = InvalidOid;
@@ -1273,10 +1294,15 @@ initGISTstate(GISTSTATE *giststate, Relation index)
else
giststate->supportCollation[i] = DEFAULT_COLLATION_OID;
}
MemoryContextSwitchTo(oldCxt);
return giststate;
}
void
freeGISTstate(GISTSTATE *giststate)
{
/* no work */
/* It's sufficient to delete the scanCxt */
MemoryContextDelete(giststate->scanCxt);
}
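The reworked `initGISTstate` above allocates the `GISTSTATE` inside its own `scanCxt`, sets `tempCxt = scanCxt` by default, and lets `freeGISTstate` release everything by deleting that single context. The toy model below sketches that ownership pattern. `ToyContext`, `toy_alloc`, `toy_reset` and the other `toy_*` names are hypothetical and far simpler than PostgreSQL's AllocSet contexts: there is no context hierarchy, and deleting a context is modeled as releasing all of its chunks.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, much-simplified memory context: a list of chunks that can
 * all be released at once. Not PostgreSQL's AllocSet implementation. */
typedef struct Chunk
{
    struct Chunk *next;
} Chunk;

typedef struct ToyContext
{
    Chunk *chunks;
    int    nchunks;
} ToyContext;

static void *toy_alloc(ToyContext *cxt, size_t size)
{
    Chunk *c = malloc(sizeof(Chunk) + size);

    assert(c != NULL);
    c->next = cxt->chunks;
    cxt->chunks = c;
    cxt->nchunks++;
    return c + 1;   /* usable memory starts right after the header */
}

/* Release every chunk at once, roughly what a context reset achieves. */
static void toy_reset(ToyContext *cxt)
{
    while (cxt->chunks != NULL)
    {
        Chunk *next = cxt->chunks->next;

        free(cxt->chunks);
        cxt->chunks = next;
    }
    cxt->nchunks = 0;
}

/* Hypothetical mirror of the two context pointers in the new GISTSTATE. */
typedef struct ToyGistState
{
    ToyContext *scanCxt;   /* holds the state itself plus scan-lifespan data */
    ToyContext *tempCxt;   /* where per-tuple work happens */
} ToyGistState;

static ToyGistState *toy_init_state(ToyContext *scanCxt)
{
    ToyGistState *gs = toy_alloc(scanCxt, sizeof(ToyGistState));

    gs->scanCxt = scanCxt;
    gs->tempCxt = scanCxt;   /* as in initGISTstate: caller may change it */
    return gs;
}

/* Per-tuple work: scratch data goes into tempCxt, which is reset afterwards. */
static void toy_process_tuple(ToyGistState *gs, int value)
{
    int *scratch = toy_alloc(gs->tempCxt, sizeof(int));

    *scratch = value;          /* stand-in for gistFormTuple's work */
    toy_reset(gs->tempCxt);
}

/* Releasing scanCxt frees the state itself as well. */
static void toy_free_state(ToyGistState *gs)
{
    toy_reset(gs->scanCxt);
}
```

A caller that wants per-tuple cleanup, as `gistbuild` does, points `tempCxt` at a separate context and resets it after each tuple. Anything kept in `scanCxt` survives until the state is freed.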
@@ -54,7 +54,7 @@ typedef enum
typedef struct
{
Relation indexrel;
GISTSTATE giststate;
GISTSTATE *giststate;
GISTBuildBuffers *gfbb;
int64 indtuples; /* number of tuples indexed */
@@ -63,7 +63,6 @@ typedef struct
Size freespace; /* amount of free space to leave on pages */
GistBufferingMode bufferingMode;
MemoryContext tmpCtx;
} GISTBuildState;
static void gistInitBuffering(GISTBuildState *buildstate);
@@ -146,7 +145,14 @@ gistbuild(PG_FUNCTION_ARGS)
RelationGetRelationName(index));
/* no locking is needed */
initGISTstate(&buildstate.giststate, index);
buildstate.giststate = initGISTstate(index);
/*
* Create a temporary memory context that is reset once for each tuple
* processed. (Note: we don't bother to make this a child of the
* giststate's scanCxt, so we have to delete it separately at the end.)
*/
buildstate.giststate->tempCxt = createTempGistContext();
/* initialize the root page */
buffer = gistNewBuffer(index);
@@ -184,12 +190,6 @@ gistbuild(PG_FUNCTION_ARGS)
buildstate.indtuples = 0;
buildstate.indtuplesSize = 0;
/*
* create a temporary memory context that is reset once for each tuple
* processed.
*/
buildstate.tmpCtx = createTempGistContext();
/*
* Do the heap scan.
*/
@@ -208,9 +208,9 @@ gistbuild(PG_FUNCTION_ARGS)
/* okay, all heap tuples are indexed */
MemoryContextSwitchTo(oldcxt);
MemoryContextDelete(buildstate.tmpCtx);
MemoryContextDelete(buildstate.giststate->tempCxt);
freeGISTstate(&buildstate.giststate);
freeGISTstate(buildstate.giststate);
/*
* Return statistics
@@ -440,10 +440,10 @@ gistBuildCallback(Relation index,
IndexTuple itup;
MemoryContext oldCtx;
oldCtx = MemoryContextSwitchTo(buildstate->tmpCtx);
oldCtx = MemoryContextSwitchTo(buildstate->giststate->tempCxt);
/* form an index tuple and point it at the heap tuple */
itup = gistFormTuple(&buildstate->giststate, index, values, isnull, true);
itup = gistFormTuple(buildstate->giststate, index, values, isnull, true);
itup->t_tid = htup->t_self;
if (buildstate->bufferingMode == GIST_BUFFERING_ACTIVE)
@@ -458,7 +458,7 @@ gistBuildCallback(Relation index,
* locked, we call gistdoinsert directly.
*/
gistdoinsert(index, itup, buildstate->freespace,
&buildstate->giststate);
buildstate->giststate);
}
/* Update tuple count and total size. */
@@ -466,7 +466,7 @@ gistBuildCallback(Relation index,
buildstate->indtuplesSize += IndexTupleSize(itup);
MemoryContextSwitchTo(oldCtx);
MemoryContextReset(buildstate->tmpCtx);
MemoryContextReset(buildstate->giststate->tempCxt);
if (buildstate->bufferingMode == GIST_BUFFERING_ACTIVE &&
buildstate->indtuples % BUFFERING_MODE_TUPLE_SIZE_STATS_TARGET == 0)
@@ -520,7 +520,7 @@ static bool
gistProcessItup(GISTBuildState *buildstate, IndexTuple itup,
GISTBufferingInsertStack *startparent)
{
GISTSTATE *giststate = &buildstate->giststate;
GISTSTATE *giststate = buildstate->giststate;
GISTBuildBuffers *gfbb = buildstate->gfbb;
Relation indexrel = buildstate->indexrel;
GISTBufferingInsertStack *path;
@@ -652,7 +652,7 @@ gistbufferinginserttuples(GISTBuildState *buildstate, Buffer buffer,
is_split = gistplacetopage(buildstate->indexrel,
buildstate->freespace,
&buildstate->giststate,
buildstate->giststate,
buffer,
itup, ntup, oldoffnum,
InvalidBuffer,
@@ -720,7 +720,7 @@ gistbufferinginserttuples(GISTBuildState *buildstate, Buffer buffer,
* buffers that will eventually be inserted to them.
*/
gistRelocateBuildBuffersOnSplit(gfbb,
&buildstate->giststate,
buildstate->giststate,
buildstate->indexrel,
path, buffer, splitinfo);
@@ -919,7 +919,7 @@ gistProcessEmptyingQueue(GISTBuildState *buildstate)
}
/* Free all the memory allocated during index tuple processing */
MemoryContextReset(CurrentMemoryContext);
MemoryContextReset(buildstate->giststate->tempCxt);
}
}
}
@@ -938,7 +938,7 @@ gistEmptyAllBuffers(GISTBuildState *buildstate)
MemoryContext oldCtx;
int i;
oldCtx = MemoryContextSwitchTo(buildstate->tmpCtx);
oldCtx = MemoryContextSwitchTo(buildstate->giststate->tempCxt);
/*
* Iterate through the levels from top to bottom.
@@ -970,7 +970,7 @@ gistEmptyAllBuffers(GISTBuildState *buildstate)
nodeBuffer->queuedForEmptying = true;
gfbb->bufferEmptyingQueue =
lcons(nodeBuffer, gfbb->bufferEmptyingQueue);
MemoryContextSwitchTo(buildstate->tmpCtx);
MemoryContextSwitchTo(buildstate->giststate->tempCxt);
}
gistProcessEmptyingQueue(buildstate);
}
@@ -307,12 +307,12 @@ gistScanPage(IndexScanDesc scan, GISTSearchItem *pageItem, double *myDistances,
* Must call gistindex_keytest in tempCxt, and clean up any leftover
* junk afterward.
*/
oldcxt = MemoryContextSwitchTo(so->tempCxt);
oldcxt = MemoryContextSwitchTo(so->giststate->tempCxt);
match = gistindex_keytest(scan, it, page, i, &recheck);
MemoryContextSwitchTo(oldcxt);
MemoryContextReset(so->tempCxt);
MemoryContextReset(so->giststate->tempCxt);
/* Ignore tuple if it doesn't match */
if (!match)
@@ -104,20 +104,28 @@ gistbeginscan(PG_FUNCTION_ARGS)
int nkeys = PG_GETARG_INT32(1);
int norderbys = PG_GETARG_INT32(2);
IndexScanDesc scan;
GISTSTATE *giststate;
GISTScanOpaque so;
MemoryContext oldCxt;
scan = RelationGetIndexScan(r, nkeys, norderbys);
/* First, set up a GISTSTATE with a scan-lifespan memory context */
giststate = initGISTstate(scan->indexRelation);
/*
* Everything made below is in the scanCxt, or is a child of the scanCxt,
* so it'll all go away automatically in gistendscan.
*/
oldCxt = MemoryContextSwitchTo(giststate->scanCxt);
/* initialize opaque data */
so = (GISTScanOpaque) palloc0(sizeof(GISTScanOpaqueData));
so->queueCxt = AllocSetContextCreate(CurrentMemoryContext,
"GiST queue context",
ALLOCSET_DEFAULT_MINSIZE,
ALLOCSET_DEFAULT_INITSIZE,
ALLOCSET_DEFAULT_MAXSIZE);
so->tempCxt = createTempGistContext();
so->giststate = (GISTSTATE *) palloc(sizeof(GISTSTATE));
initGISTstate(so->giststate, scan->indexRelation);
so->giststate = giststate;
giststate->tempCxt = createTempGistContext();
so->queue = NULL;
so->queueCxt = giststate->scanCxt; /* see gistrescan */
/* workspaces with size dependent on numberOfOrderBys: */
so->tmpTreeItem = palloc(GSTIHDRSZ + sizeof(double) * scan->numberOfOrderBys);
so->distances = palloc(sizeof(double) * scan->numberOfOrderBys);
@@ -125,6 +133,8 @@ gistbeginscan(PG_FUNCTION_ARGS)
scan->opaque = so;
MemoryContextSwitchTo(oldCxt);
PG_RETURN_POINTER(scan);
}
@@ -137,12 +147,44 @@ gistrescan(PG_FUNCTION_ARGS)
/* nkeys and norderbys arguments are ignored */
GISTScanOpaque so = (GISTScanOpaque) scan->opaque;
bool first_time;
int i;
MemoryContext oldCxt;
/* rescan an existing indexscan --- reset state */
MemoryContextReset(so->queueCxt);
so->curTreeItem = NULL;
/*
* The first time through, we create the search queue in the scanCxt.
* Subsequent times through, we create the queue in a separate queueCxt,
* which is created on the second call and reset on later calls. Thus, in
* the common case where a scan is only rescan'd once, we just put the
* queue in scanCxt and don't pay the overhead of making a second memory
* context. If we do rescan more than once, the first RBTree is just left
* for dead until end of scan; this small wastage seems worth the savings
* in the common case.
*/
if (so->queue == NULL)
{
/* first time through */
Assert(so->queueCxt == so->giststate->scanCxt);
first_time = true;
}
else if (so->queueCxt == so->giststate->scanCxt)
{
/* second time through */
so->queueCxt = AllocSetContextCreate(so->giststate->scanCxt,
"GiST queue context",
ALLOCSET_DEFAULT_MINSIZE,
ALLOCSET_DEFAULT_INITSIZE,
ALLOCSET_DEFAULT_MAXSIZE);
first_time = false;
}
else
{
/* third or later time through */
MemoryContextReset(so->queueCxt);
first_time = false;
}
/* create new, empty RBTree for search queue */
oldCxt = MemoryContextSwitchTo(so->queueCxt);
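The comment in `gistrescan` above describes a three-way strategy for where the search queue lives: first call in `scanCxt`, second call in a newly created queue context, later calls in that same context after a reset. The following sketch models only that control flow. `ToyCxt`, `ToyScan`, `toy_begin_scan` and `toy_rescan` are hypothetical names, and contexts are reduced to reset counters, so no RBTree or real memory is involved.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a memory context: only resets are counted. */
typedef struct ToyCxt
{
    int resets;
} ToyCxt;

typedef struct ToyScan
{
    ToyCxt  scanCxt;            /* scan-lifespan context */
    ToyCxt  ownQueueCxt;        /* storage for the separate queue context */
    ToyCxt *queueCxt;           /* where the next queue is built */
    bool    have_queue;         /* stand-in for so->queue != NULL */
    int     contexts_created;   /* how many extra contexts were made */
} ToyScan;

static void toy_begin_scan(ToyScan *so)
{
    so->scanCxt.resets = 0;
    so->queueCxt = &so->scanCxt;   /* as in gistbeginscan */
    so->have_queue = false;
    so->contexts_created = 0;
}

/* Returns true on the first call, mirroring first_time in gistrescan. */
static bool toy_rescan(ToyScan *so)
{
    bool first_time;

    if (!so->have_queue)
    {
        /* first call: build the queue directly in the scan context */
        assert(so->queueCxt == &so->scanCxt);
        first_time = true;
    }
    else if (so->queueCxt == &so->scanCxt)
    {
        /* second call: create the separate queue context exactly once;
         * the first queue is left in scanCxt until the scan ends */
        so->ownQueueCxt.resets = 0;
        so->queueCxt = &so->ownQueueCxt;
        so->contexts_created++;
        first_time = false;
    }
    else
    {
        /* third or later call: just reset the queue context */
        so->queueCxt->resets++;
        first_time = false;
    }
    so->have_queue = true;   /* the new queue would be built in queueCxt */
    return first_time;
}
```

With a single rescan the queue stays in the scan context and no extra context is created. From the second rescan on, one dedicated queue context exists and is only reset, which is the trade-off the comment accepts in exchange for cheaper common-case scans.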
@@ -154,11 +196,28 @@ gistrescan(PG_FUNCTION_ARGS)
scan);
MemoryContextSwitchTo(oldCxt);
so->curTreeItem = NULL;
so->firstCall = true;
/* Update scan key, if a new one is given */
if (key && scan->numberOfKeys > 0)
{
/*
* If this isn't the first time through, preserve the fn_extra
* pointers, so that if the consistentFns are using them to cache
* data, that data is not leaked across a rescan.
*/
if (!first_time)
{
for (i = 0; i < scan->numberOfKeys; i++)
{
ScanKey skey = scan->keyData + i;
so->giststate->consistentFn[skey->sk_attno - 1].fn_extra =
skey->sk_func.fn_extra;
}
}
memmove(scan->keyData, key,
scan->numberOfKeys * sizeof(ScanKeyData));
@@ -172,6 +231,10 @@ gistrescan(PG_FUNCTION_ARGS)
* Next, if any of keys is a NULL and that key is not marked with
* SK_SEARCHNULL/SK_SEARCHNOTNULL then nothing can be found (ie, we
* assume all indexable operators are strict).
*
* Note: we intentionally memcpy the FmgrInfo to sk_func rather than
* using fmgr_info_copy. This is so that the fn_extra field gets
* preserved across multiple rescans.
*/
so->qual_ok = true;
@@ -192,6 +255,18 @@ gistrescan(PG_FUNCTION_ARGS)
/* Update order-by key, if a new one is given */
if (orderbys && scan->numberOfOrderBys > 0)
{
/* As above, preserve fn_extra if not first time through */
if (!first_time)
{
for (i = 0; i < scan->numberOfOrderBys; i++)
{
ScanKey skey = scan->orderByData + i;
so->giststate->distanceFn[skey->sk_attno - 1].fn_extra =
skey->sk_func.fn_extra;
}
}
memmove(scan->orderByData, orderbys,
scan->numberOfOrderBys * sizeof(ScanKeyData));
@@ -201,6 +276,8 @@ gistrescan(PG_FUNCTION_ARGS)
* function in the form of its strategy number, which is available
* from the sk_strategy field, and its subtype from the sk_subtype
* field.
*
* See above comment about why we don't use fmgr_info_copy here.
*/
for (i = 0; i < scan->numberOfOrderBys; i++)
{
@@ -239,13 +316,11 @@ gistendscan(PG_FUNCTION_ARGS)
IndexScanDesc scan = (IndexScanDesc) PG_GETARG_POINTER(0);
GISTScanOpaque so = (GISTScanOpaque) scan->opaque;
/*
* freeGISTstate is enough to clean up everything made by gistbeginscan,
* as well as the queueCxt if there is a separate context for it.
*/
freeGISTstate(so->giststate);
pfree(so->giststate);
MemoryContextDelete(so->queueCxt);
MemoryContextDelete(so->tempCxt);
pfree(so->tmpTreeItem);
pfree(so->distances);
pfree(so);
PG_RETURN_VOID();
}
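The `fn_extra` handling in the `gistrescan` hunks above relies on two steps. Before new scan keys overwrite the old ones, each old key's `sk_func.fn_extra` is copied back into the per-column `FmgrInfo` held in the `GISTSTATE`; the keys are then filled by a plain struct copy rather than `fmgr_info_copy`, so the cached pointer travels into the new keys. The sketch below models only that data flow with hypothetical, cut-down types (`ToyFmgrInfo`, `ToyScanKey`, `ToyScanState`, `toy_rescan_keys`); it is not the real `FmgrInfo` or `ScanKeyData` layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_MAX_KEYS 4

/* Hypothetical, cut-down versions of FmgrInfo and ScanKeyData. */
typedef struct ToyFmgrInfo
{
    int   fn_oid;
    void *fn_extra;   /* cache pointer owned by the support function */
} ToyFmgrInfo;

typedef struct ToyScanKey
{
    int         sk_attno;   /* 1-based index column number */
    ToyFmgrInfo sk_func;
} ToyScanKey;

typedef struct ToyScanState
{
    ToyFmgrInfo consistentFn[TOY_MAX_KEYS];   /* per-column templates */
    ToyScanKey  keyData[TOY_MAX_KEYS];
    int         numberOfKeys;
} ToyScanState;

static void toy_rescan_keys(ToyScanState *st, const ToyScanKey *keys,
                            bool first_time)
{
    int i;

    /* Step 1: save cache pointers the support functions left in the old keys. */
    if (!first_time)
    {
        for (i = 0; i < st->numberOfKeys; i++)
        {
            ToyScanKey *skey = &st->keyData[i];

            st->consistentFn[skey->sk_attno - 1].fn_extra =
                skey->sk_func.fn_extra;
        }
    }

    memmove(st->keyData, keys, (size_t) st->numberOfKeys * sizeof(ToyScanKey));

    /* Step 2: plain struct copy, so fn_extra travels into the new keys. */
    for (i = 0; i < st->numberOfKeys; i++)
    {
        ToyScanKey *skey = &st->keyData[i];

        skey->sk_func = st->consistentFn[skey->sk_attno - 1];
    }
}
```

If step 2 instead used a copy that cleared `fn_extra`, each rescan would start with an empty cache, and whatever the previous cache held would stay allocated in the scan-lifespan context until the scan ended.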
@@ -48,9 +48,21 @@ typedef struct
*
* This struct retains call info for the index's opclass-specific support
* functions (per index column), plus the index's tuple descriptor.
*
* scanCxt holds the GISTSTATE itself as well as any data that lives for the
* lifetime of the index operation. We pass this to the support functions
* via fn_mcxt, so that they can store scan-lifespan data in it. The
* functions are invoked in tempCxt, which is typically short-lifespan
* (that is, it's reset after each tuple). However, tempCxt can be the same
* as scanCxt if we're not bothering with per-tuple context resets.
*/
typedef struct GISTSTATE
{
MemoryContext scanCxt; /* context for scan-lifespan data */
MemoryContext tempCxt; /* short-term context for calling functions */
TupleDesc tupdesc; /* index's tuple descriptor */
FmgrInfo consistentFn[INDEX_MAX_KEYS];
FmgrInfo unionFn[INDEX_MAX_KEYS];
FmgrInfo compressFn[INDEX_MAX_KEYS];
@@ -62,8 +74,6 @@ typedef struct GISTSTATE
/* Collations to pass to the support functions */
Oid supportCollation[INDEX_MAX_KEYS];
TupleDesc tupdesc;
} GISTSTATE;
@@ -132,7 +142,6 @@ typedef struct GISTScanOpaqueData
GISTSTATE *giststate; /* index information, see above */
RBTree *queue; /* queue of unvisited items */
MemoryContext queueCxt; /* context holding the queue */
MemoryContext tempCxt; /* workspace context for calling functions */
bool qual_ok; /* false if qual can never be satisfied */
bool firstCall; /* true until first gistgettuple call */
@@ -422,7 +431,7 @@ typedef struct GiSTOptions
extern Datum gistbuildempty(PG_FUNCTION_ARGS);
extern Datum gistinsert(PG_FUNCTION_ARGS);
extern MemoryContext createTempGistContext(void);
extern void initGISTstate(GISTSTATE *giststate, Relation index);
extern GISTSTATE *initGISTstate(Relation index);
extern void freeGISTstate(GISTSTATE *giststate);
extern void gistdoinsert(Relation r,
IndexTuple itup,