    Improve contrib/pg_trgm's heuristics for regexp index searches. · 80a5cf64
    Tom Lane authored
    When extracting trigrams from a regular expression for search of a GIN or
    GiST trigram index, it's useful to penalize (preferentially discard)
    trigrams that contain whitespace, since those are typically far more common
    in the index than trigrams not containing whitespace.  Of course, this
    should be only a preference, not a hard rule, since we might otherwise end
    up with no trigrams to search for.  The previous coding tended to produce
    fairly inefficient trigram search sets for anchored regexp patterns, as
    reported by Erik Rijkers.  This patch penalizes whitespace-containing
    trigrams, and also reduces the target number of extracted trigrams, since
    experience suggests that the original coding tended to select too many
    trigrams to search for.
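
The preference described above can be illustrated with a minimal sketch. It is not the code in trgm_regexp.c, which works on a more elaborate internal representation; the function names, the penalty (a count of whitespace characters), and the `target` parameter are all assumptions made for illustration. The point it shows is that whitespace-containing trigrams are discarded first, but only as a preference: if every candidate contains whitespace, some are still kept.

```python
from typing import List


def whitespace_penalty(trigram: str) -> int:
    """Hypothetical penalty: the number of whitespace characters in the trigram."""
    return sum(1 for ch in trigram if ch.isspace())


def select_trigrams(trigrams: List[str], target: int) -> List[str]:
    """Keep at most `target` trigrams, discarding the most-penalized ones first.

    This is a soft preference, not a hard rule: whitespace-containing
    trigrams are retained whenever there is room for them, so a non-empty
    input never yields an empty search set.
    """
    if target < 1:
        raise ValueError("target must be at least 1")
    # Drop duplicates while preserving first-seen order.
    unique = list(dict.fromkeys(trigrams))
    # sorted() is stable, so trigrams with equal penalty keep their order.
    ranked = sorted(unique, key=whitespace_penalty)
    return ranked[:target]


# Illustrative candidates; in pg_trgm, word boundaries are padded with blanks.
candidates = ["  f", " fo", "foo", "oo ", "bar", "ar "]
print(select_trigrams(candidates, 3))  # ['foo', 'bar', ' fo']
```

Lowering `target` in this sketch corresponds to the second change in the patch, reducing the number of trigrams selected for the index search.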
    
    Alexander Korotkov, reviewed by Tom Lane
trgm_regexp.c 64.7 KB