• Tom Lane's avatar
    Rewrite the key-combination logic in GIN's keyGetItem() and scanGetItem() · 2ab57e08
    Tom Lane authored
    routines to make them behave better in the presence of "lossy" index pointers.
    The previous coding was outright incorrect for some cases, as recently
    reported by Artur Dabrowski: scanGetItem would fail to return index entries in
    cases where one index key had multiple exact pointers on the same page as
    another key had a lossy pointer.  Also, keyGetItem was extremely inefficient
    for cases where a single index key generates multiple "entry" streams, such as
    an @@ operator with a multiple-clause tsquery.  The presence of a lossy page
    pointer in any one stream defeated its ability to use the opclass
    consistentFn, resulting in probing many heap pages that didn't really need to
    be visited.  In Artur's example case, a query like
    	WHERE tsvector @@ to_tsquery('a & b')
    was about 50X slower than the theoretically equivalent
    	WHERE tsvector @@ to_tsquery('a') AND tsvector @@ to_tsquery('b')
    The way that I chose to fix this was to have GIN call the consistentFn
    twice with both TRUE and FALSE values for the in-doubt entry stream,
    returning a hit if either call produces TRUE, but not if they both return
    FALSE.  The code handles this for the case of a single in-doubt entry stream,
    but punts (falling back to the stupid behavior) if there's more than one lossy
    reference to the same page.  The idea could be scaled up to deal with multiple
    lossy references, but I think that would probably be wasted complexity.  At
    least to judge by Artur's example, such cases don't occur often enough to be
    worth trying to optimize.
    
    Back-patch to 8.4.  8.3 did not have lossy GIN index pointers, so not
    subject to these problems.
    2ab57e08
gin.h 17.5 KB