Commit 98edd617 authored by Heikki Linnakangas's avatar Heikki Linnakangas

Fix datatype confusion with the new lossy GiST distance functions.

We can only support a lossy distance function when the distance function's
datatype is comparable with the original ordering operator's datatype.
The distance function always returns a float8, so we are limited to float8,
and float4 (by a hard-coded cast of the float8 to float4).

In light of this limitation, it seems like a good idea to have a separate
'recheck' flag for the ORDER BY expressions, so that if you have a non-lossy
distance function, it still works with lossy quals. There are cases like
that with the build-in or contrib opclasses, but it's plausible.

There was a hidden assumption that the ORDER BY values returned by GiST
match the original ordering operator's return type, but there are plenty
of examples where that's not true, e.g. in btree_gist and pg_trgm. As long
as the distance function is not lossy, we can tolerate that and just not
return the distance to the executor (or rather, always return NULL). The
executor doesn't need the distances if there are no lossy results.

There was another little bug: the recheck variable was not initialized
before calling the distance function. That revealed the bigger issue,
as the executor tried to reorder tuples that didn't need reordering, and
that failed because of the datatype mismatch.
parent a868931f
...@@ -818,10 +818,15 @@ my_distance(PG_FUNCTION_ARGS) ...@@ -818,10 +818,15 @@ my_distance(PG_FUNCTION_ARGS)
</para> </para>
<para> <para>
The result value can be any finite <type>float8</> value. (Infinity and If the distance function returns *recheck=true for a leaf node, the
minus infinity are used internally to handle cases such as nulls, so it original ordering operator's return type must be float8 or float4, and
is not recommended that <function>distance</> functions return these the distance function's return value must be comparable with the actual
values.) distance operator. Otherwise, the distance function's return type can
be any finit <type>float8</> value, as long as the relative order of
the returned values matches the order returned by the ordering operator.
(Infinity and minus infinity are used internally to handle cases such as
nulls, so it is not recommended that <function>distance</> functions
return these values.)
</para> </para>
</listitem> </listitem>
......
...@@ -16,6 +16,7 @@ ...@@ -16,6 +16,7 @@
#include "access/gist_private.h" #include "access/gist_private.h"
#include "access/relscan.h" #include "access/relscan.h"
#include "catalog/pg_type.h"
#include "miscadmin.h" #include "miscadmin.h"
#include "pgstat.h" #include "pgstat.h"
#include "lib/pairingheap.h" #include "lib/pairingheap.h"
...@@ -31,9 +32,11 @@ ...@@ -31,9 +32,11 @@
* depending on whether the containing page is a leaf page or not. * depending on whether the containing page is a leaf page or not.
* *
* On success return for a heap tuple, *recheck_p is set to indicate whether * On success return for a heap tuple, *recheck_p is set to indicate whether
* recheck is needed. We recheck if any of the consistent() or distance() * the quals need to be rechecked. We recheck if any of the consistent()
* functions request it. recheck is not interesting when examining a non-leaf * functions request it. recheck is not interesting when examining a non-leaf
* entry, since we must visit the lower index page if there's any doubt. * entry, since we must visit the lower index page if there's any doubt.
* Similarly, *recheck_distances_p is set to indicate whether the distances
* need to be rechecked, and it is also ignored for non-leaf entries.
* *
* If we are doing an ordered scan, so->distances[] is filled with distance * If we are doing an ordered scan, so->distances[] is filled with distance
* data from the distance() functions before returning success. * data from the distance() functions before returning success.
...@@ -50,7 +53,8 @@ gistindex_keytest(IndexScanDesc scan, ...@@ -50,7 +53,8 @@ gistindex_keytest(IndexScanDesc scan,
IndexTuple tuple, IndexTuple tuple,
Page page, Page page,
OffsetNumber offset, OffsetNumber offset,
bool *recheck_p) bool *recheck_p,
bool *recheck_distances_p)
{ {
GISTScanOpaque so = (GISTScanOpaque) scan->opaque; GISTScanOpaque so = (GISTScanOpaque) scan->opaque;
GISTSTATE *giststate = so->giststate; GISTSTATE *giststate = so->giststate;
...@@ -60,6 +64,7 @@ gistindex_keytest(IndexScanDesc scan, ...@@ -60,6 +64,7 @@ gistindex_keytest(IndexScanDesc scan,
Relation r = scan->indexRelation; Relation r = scan->indexRelation;
*recheck_p = false; *recheck_p = false;
*recheck_distances_p = false;
/* /*
* If it's a leftover invalid tuple from pre-9.1, treat it as a match with * If it's a leftover invalid tuple from pre-9.1, treat it as a match with
...@@ -194,11 +199,14 @@ gistindex_keytest(IndexScanDesc scan, ...@@ -194,11 +199,14 @@ gistindex_keytest(IndexScanDesc scan,
* use.) * use.)
* *
* Distance functions get a recheck argument as well. In this * Distance functions get a recheck argument as well. In this
* case the returned distance is the lower bound of distance * case the returned distance is the lower bound of distance and
* and needs to be rechecked. We return single recheck flag * needs to be rechecked. We return single recheck flag which
* which means that both quals and distances are to be * means that both quals and distances are to be rechecked. We
* rechecked. * initialize the flag to 'false'. The flag was added in version
* 9.5 and the distance operators written before that won't know
* about the flag, and are never lossy.
*/ */
recheck = false;
dist = FunctionCall5Coll(&key->sk_func, dist = FunctionCall5Coll(&key->sk_func,
key->sk_collation, key->sk_collation,
PointerGetDatum(&de), PointerGetDatum(&de),
...@@ -206,9 +214,7 @@ gistindex_keytest(IndexScanDesc scan, ...@@ -206,9 +214,7 @@ gistindex_keytest(IndexScanDesc scan,
Int32GetDatum(key->sk_strategy), Int32GetDatum(key->sk_strategy),
ObjectIdGetDatum(key->sk_subtype), ObjectIdGetDatum(key->sk_subtype),
PointerGetDatum(&recheck)); PointerGetDatum(&recheck));
*recheck_distances_p |= recheck;
*recheck_p |= recheck;
*distance_p = DatumGetFloat8(dist); *distance_p = DatumGetFloat8(dist);
} }
...@@ -310,6 +316,7 @@ gistScanPage(IndexScanDesc scan, GISTSearchItem *pageItem, double *myDistances, ...@@ -310,6 +316,7 @@ gistScanPage(IndexScanDesc scan, GISTSearchItem *pageItem, double *myDistances,
IndexTuple it = (IndexTuple) PageGetItem(page, PageGetItemId(page, i)); IndexTuple it = (IndexTuple) PageGetItem(page, PageGetItemId(page, i));
bool match; bool match;
bool recheck; bool recheck;
bool recheck_distances;
/* /*
* Must call gistindex_keytest in tempCxt, and clean up any leftover * Must call gistindex_keytest in tempCxt, and clean up any leftover
...@@ -317,7 +324,8 @@ gistScanPage(IndexScanDesc scan, GISTSearchItem *pageItem, double *myDistances, ...@@ -317,7 +324,8 @@ gistScanPage(IndexScanDesc scan, GISTSearchItem *pageItem, double *myDistances,
*/ */
oldcxt = MemoryContextSwitchTo(so->giststate->tempCxt); oldcxt = MemoryContextSwitchTo(so->giststate->tempCxt);
match = gistindex_keytest(scan, it, page, i, &recheck); match = gistindex_keytest(scan, it, page, i,
&recheck, &recheck_distances);
MemoryContextSwitchTo(oldcxt); MemoryContextSwitchTo(oldcxt);
MemoryContextReset(so->giststate->tempCxt); MemoryContextReset(so->giststate->tempCxt);
...@@ -375,6 +383,7 @@ gistScanPage(IndexScanDesc scan, GISTSearchItem *pageItem, double *myDistances, ...@@ -375,6 +383,7 @@ gistScanPage(IndexScanDesc scan, GISTSearchItem *pageItem, double *myDistances,
item->blkno = InvalidBlockNumber; item->blkno = InvalidBlockNumber;
item->data.heap.heapPtr = it->t_tid; item->data.heap.heapPtr = it->t_tid;
item->data.heap.recheck = recheck; item->data.heap.recheck = recheck;
item->data.heap.recheckDistances = recheck_distances;
/* /*
* In an index-only scan, also fetch the data from the tuple. * In an index-only scan, also fetch the data from the tuple.
...@@ -461,10 +470,33 @@ getNextNearest(IndexScanDesc scan) ...@@ -461,10 +470,33 @@ getNextNearest(IndexScanDesc scan)
/* found a heap item at currently minimal distance */ /* found a heap item at currently minimal distance */
scan->xs_ctup.t_self = item->data.heap.heapPtr; scan->xs_ctup.t_self = item->data.heap.heapPtr;
scan->xs_recheck = item->data.heap.recheck; scan->xs_recheck = item->data.heap.recheck;
scan->xs_recheckorderby = item->data.heap.recheckDistances;
for (i = 0; i < scan->numberOfOrderBys; i++) for (i = 0; i < scan->numberOfOrderBys; i++)
{ {
scan->xs_orderbyvals[i] = Float8GetDatum(item->distances[i]); if (so->orderByTypes[i] == FLOAT8OID)
scan->xs_orderbynulls[i] = false; {
scan->xs_orderbyvals[i] = Float8GetDatum(item->distances[i]);
scan->xs_orderbynulls[i] = false;
}
else if (so->orderByTypes[i] == FLOAT4OID)
{
scan->xs_orderbyvals[i] = Float4GetDatum((float4) item->distances[i]);
scan->xs_orderbynulls[i] = false;
}
else
{
/*
* If the ordering operator's return value is anything
* else, we don't know how to convert the float8 bound
* calculated by the distance function to that. The
* executor won't actually need the order by values we
* return here, if there are no lossy results, so only
* insist on the datatype if the *recheck is set.
*/
if (scan->xs_recheckorderby)
elog(ERROR, "GiST operator family's FOR ORDER BY operator must return float8 or float4 if the distance function is lossy");
scan->xs_orderbynulls[i] = true;
}
} }
/* in an index-only scan, also return the reconstructed tuple. */ /* in an index-only scan, also return the reconstructed tuple. */
......
...@@ -17,6 +17,7 @@ ...@@ -17,6 +17,7 @@
#include "access/gist_private.h" #include "access/gist_private.h"
#include "access/gistscan.h" #include "access/gistscan.h"
#include "access/relscan.h" #include "access/relscan.h"
#include "utils/lsyscache.h"
#include "utils/memutils.h" #include "utils/memutils.h"
#include "utils/rel.h" #include "utils/rel.h"
...@@ -263,6 +264,8 @@ gistrescan(PG_FUNCTION_ARGS) ...@@ -263,6 +264,8 @@ gistrescan(PG_FUNCTION_ARGS)
memmove(scan->orderByData, orderbys, memmove(scan->orderByData, orderbys,
scan->numberOfOrderBys * sizeof(ScanKeyData)); scan->numberOfOrderBys * sizeof(ScanKeyData));
so->orderByTypes = (Oid *) palloc(scan->numberOfOrderBys * sizeof(Oid));
/* /*
* Modify the order-by key so that the Distance method is called for * Modify the order-by key so that the Distance method is called for
* all comparisons. The original operator is passed to the Distance * all comparisons. The original operator is passed to the Distance
...@@ -281,6 +284,19 @@ gistrescan(PG_FUNCTION_ARGS) ...@@ -281,6 +284,19 @@ gistrescan(PG_FUNCTION_ARGS)
GIST_DISTANCE_PROC, skey->sk_attno, GIST_DISTANCE_PROC, skey->sk_attno,
RelationGetRelationName(scan->indexRelation)); RelationGetRelationName(scan->indexRelation));
/*
* Look up the datatype returned by the original ordering operator.
* GiST always uses a float8 for the distance function, but the
* ordering operator could be anything else.
*
* XXX: The distance function is only allowed to be lossy if the
* ordering operator's result type is float4 or float8. Otherwise
* we don't know how to return the distance to the executor. But
* we cannot check that here, as we won't know if the distance
* function is lossy until it returns *recheck = true for the
* first time.
*/
so->orderByTypes[i] = get_func_rettype(skey->sk_func.fn_oid);
fmgr_info_copy(&(skey->sk_func), finfo, so->giststate->scanCxt); fmgr_info_copy(&(skey->sk_func), finfo, so->giststate->scanCxt);
/* Restore prior fn_extra pointers, if not first time */ /* Restore prior fn_extra pointers, if not first time */
......
...@@ -236,7 +236,12 @@ next_indextuple: ...@@ -236,7 +236,12 @@ next_indextuple:
InstrCountFiltered2(node, 1); InstrCountFiltered2(node, 1);
goto next_indextuple; goto next_indextuple;
} }
}
if (scandesc->xs_recheckorderby)
{
econtext->ecxt_scantuple = slot;
ResetExprContext(econtext);
EvalOrderByExpressions(node, econtext); EvalOrderByExpressions(node, econtext);
/* /*
......
...@@ -121,6 +121,7 @@ typedef struct GISTSearchHeapItem ...@@ -121,6 +121,7 @@ typedef struct GISTSearchHeapItem
{ {
ItemPointerData heapPtr; ItemPointerData heapPtr;
bool recheck; /* T if quals must be rechecked */ bool recheck; /* T if quals must be rechecked */
bool recheckDistances; /* T if distances must be rechecked */
IndexTuple ftup; /* data fetched back from the index, used in IndexTuple ftup; /* data fetched back from the index, used in
* index-only scans */ * index-only scans */
} GISTSearchHeapItem; } GISTSearchHeapItem;
...@@ -150,6 +151,8 @@ typedef struct GISTSearchItem ...@@ -150,6 +151,8 @@ typedef struct GISTSearchItem
typedef struct GISTScanOpaqueData typedef struct GISTScanOpaqueData
{ {
GISTSTATE *giststate; /* index information, see above */ GISTSTATE *giststate; /* index information, see above */
Oid *orderByTypes; /* datatypes of ORDER BY expressions */
pairingheap *queue; /* queue of unvisited items */ pairingheap *queue; /* queue of unvisited items */
MemoryContext queueCxt; /* context holding the queue */ MemoryContext queueCxt; /* context holding the queue */
bool qual_ok; /* false if qual can never be satisfied */ bool qual_ok; /* false if qual can never be satisfied */
......
...@@ -99,6 +99,7 @@ typedef struct IndexScanDescData ...@@ -99,6 +99,7 @@ typedef struct IndexScanDescData
*/ */
Datum *xs_orderbyvals; Datum *xs_orderbyvals;
bool *xs_orderbynulls; bool *xs_orderbynulls;
bool xs_recheckorderby; /* T means ORDER BY exprs must be rechecked */
/* state data for traversing HOT chains in index_getnext */ /* state data for traversing HOT chains in index_getnext */
bool xs_continue_hot; /* T if must keep walking HOT chain */ bool xs_continue_hot; /* T if must keep walking HOT chain */
......
Markdown is supported
0% or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment