• Alexander Korotkov's avatar
    Avoid full scan of GIN indexes when possible · 4b754d6c
    Alexander Korotkov authored
    The strategy of GIN index scan is driven by opclass-specific extract_query
    method.  This method that needed search mode is GIN_SEARCH_MODE_ALL.  This
    mode means that matching tuple may contain none of extracted entries.  Simple
    example is '!term' tsquery, which doesn't need any term to exist in matching
    tsvector.
    
    In order to handle such scan key GIN calculates virtual entry, which contains
    all TIDs of all entries of attribute.  In fact this is full scan of index
    attribute.  And typically this is very slow, but allows to handle some queries
    correctly in GIN.  However, current algorithm calculate such virtual entry for
    each GIN_SEARCH_MODE_ALL scan key even if they are multiple for the same
    attribute.  This is clearly not optimal.
    
    This commit improves the situation by introduction of "exclude only" scan keys.
    Such scan keys are not capable to return set of matching TIDs.  Instead, they
    are capable only to filter TIDs produced by normal scan keys.  Therefore,
    each attribute should contain at least one normal scan key, while rest of them
    may be "exclude only" if search mode is GIN_SEARCH_MODE_ALL.
    
    The same optimization might be applied to the whole scan, not per-attribute.
    But that leads to NULL values elimination problem.  There is trade-off between
    multiple possible ways to do this.  We probably want to do this later using
    some cost-based decision algorithm.
    
    Discussion: https://postgr.es/m/CAOBaU_YGP5-BEt5Cc0%3DzMve92vocPzD%2BXiZgiZs1kjY0cj%3DXBg%40mail.gmail.com
    Author: Nikita Glukhov, Alexander Korotkov, Tom Lane, Julien Rouhaud
    Reviewed-by: Julien Rouhaud, Tomas Vondra, Tom Lane
    4b754d6c
ginget.c 53.6 KB