    Improve ineq_histogram_selectivity's behavior for non-default orderings. · 0c882e52
    Tom Lane authored
    ineq_histogram_selectivity() can be invoked in situations where the
    ordering we care about is not that of the column's histogram.  We could
    be considering some other collation, or even more drastically, the
    query operator might not agree at all with what was used to construct
    the histogram.  (We'll get here for anything using scalarineqsel-based
    estimators, so that's quite likely to happen for extension operators.)
    
    Up to now we just ignored this issue and assumed we were dealing with
    an operator/collation whose sort order exactly matches the histogram,
    possibly resulting in junk estimates if the binary search gets confused.
    It's past time to improve that, since the use of non-default collations
    is increasing.  What we can do is verify that the given operator and
    collation match what's recorded in pg_statistic, and use the existing
    code only if so.  When they don't match, instead execute the operator
    against each histogram entry, and take the fraction of successes as our
    selectivity estimate.  This gives an estimate that is probably good to
    about 1/histogram_size, with no assumptions about ordering.  (The quality
    of the estimate is likely to degrade near the ends of the value range,
    since the two orderings probably don't agree on what is an extremal value;
    but this is surely going to be more reliable than what we did before.)
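
The fallback described above can be sketched roughly as follows. This is a minimal standalone illustration, not the code from this commit: the names histogram_fraction, ineq_op, and abs_less_than are hypothetical, and plain ints stand in for the Datum values and operator function calls that the real planner code works with.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a query operator: is "value OP constant" true? */
typedef bool (*ineq_op) (int value, int constant);

/*
 * Apply the operator to every histogram entry and return the fraction of
 * entries for which it succeeds.  Nothing here assumes that hist[] is
 * sorted according to the operator's notion of ordering.
 */
static double
histogram_fraction(const int *hist, size_t nvalues, ineq_op op, int constant)
{
    size_t      nmatch = 0;

    assert(hist != NULL && nvalues > 0);

    for (size_t i = 0; i < nvalues; i++)
    {
        if (op(hist[i], constant))
            nmatch++;
    }
    return (double) nmatch / (double) nvalues;
}

/*
 * Example operator (an assumption for illustration) whose ordering disagrees
 * with plain integer order: it compares by absolute value.
 */
static bool
abs_less_than(int value, int constant)
{
    int         a = (value < 0) ? -value : value;
    int         b = (constant < 0) ? -constant : constant;

    return a < b;
}
```

With a histogram built in plain integer order, say {-9, -4, -1, 2, 6, 8, 10, 15}, calling histogram_fraction with abs_less_than and the constant 5 counts the entries -4, -1, and 2 and returns 3/8. A binary search over the same array would assume the matching entries form a prefix, which they do not here.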
    
    At some point we might further improve matters by storing more than one
    histogram calculated according to different orderings.  But this code
    would still be good fallback logic when no matches exist, so that
    prospect is no reason to skip this change.
    
    While here, also improve get_variable_range() to deal more honestly
    with non-default collations.
    
    This isn't back-patchable, because it requires adding another argument
    to ineq_histogram_selectivity, and because it might have significant
    impact on the estimation results for extension operators relying on
    scalarineqsel --- mostly for the better, one hopes, but in any case
    destabilizing plan choices in back branches is best avoided.
    
    Per investigation of a report from James Lucas.
    
    Discussion: https://postgr.es/m/CAAFmbbOvfi=wMM=3qRsPunBSLb8BFREno2oOzSBS=mzfLPKABw@mail.gmail.com