    Use query collation, not column's collation, while examining statistics. · 044c99bc
    Tom Lane authored
    Commit 5e092800 changed the planner so that, instead of blindly using
    DEFAULT_COLLATION_OID when invoking operators for selectivity estimation,
    it would use the collation of the column whose statistics we're
    considering.  This was recognized as still being not quite the right
    thing, but it seemed like a good incremental improvement.  However,
    shortly thereafter we introduced nondeterministic collations, and that
    creates cases where operators can fail if they're passed the wrong
    collation.  We don't want planning to fail in cases where the query itself
    would work, so this means that we *must* use the query's collation when
    invoking operators for estimation purposes.
    
    The only real problem this creates is in ineq_histogram_selectivity, where
    the binary search might produce a garbage answer if we perform comparisons
    using a different collation than the column's histogram is ordered with.
    However, when the query's collation is significantly different from the
    column's default collation, the estimate we previously generated would be
    pretty irrelevant anyway; so it's not clear that this will result in
    noticeably worse estimates in practice.  (A follow-on patch will improve
    this situation in HEAD, but it seems too invasive for back-patch.)
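
    The binary-search hazard can be illustrated with a small standalone
    sketch.  This is not code from selfuncs.c: the function names, the
    string "histogram", and the case-insensitive comparator standing in for
    a differing query collation are all assumptions made for illustration.
    It compares a binary search against a plain linear count over bounds
    that were sorted bytewise.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Comparator type: returns <0, 0, >0 like strcmp. */
typedef int (*cmp_fn)(const char *, const char *);

/* Stand-in for the ordering the histogram was built with (bytewise). */
int cmp_bytewise(const char *a, const char *b)
{
    return strcmp(a, b);
}

/* Stand-in for a different query collation (case-insensitive ASCII). */
int cmp_case_insensitive(const char *a, const char *b)
{
    for (;; a++, b++)
    {
        int ca = tolower((unsigned char) *a);
        int cb = tolower((unsigned char) *b);

        if (ca != cb)
            return ca < cb ? -1 : 1;
        if (ca == '\0')
            return 0;
    }
}

/*
 * Binary search for the number of bounds that sort before "probe".
 * Only valid if "bounds" is sorted according to "cmp".
 */
size_t bounds_below_bsearch(const char *const *bounds, size_t n,
                            const char *probe, cmp_fn cmp)
{
    size_t lo = 0, hi = n;

    assert(bounds != NULL && probe != NULL);
    while (lo < hi)
    {
        size_t mid = lo + (hi - lo) / 2;

        if (cmp(bounds[mid], probe) < 0)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

/* Linear count of bounds below "probe"; makes no sortedness assumption. */
size_t bounds_below_scan(const char *const *bounds, size_t n,
                         const char *probe, cmp_fn cmp)
{
    size_t count = 0;

    assert(bounds != NULL && probe != NULL);
    for (size_t i = 0; i < n; i++)
        if (cmp(bounds[i], probe) < 0)
            count++;
    return count;
}
```

    With the bytewise-sorted bounds {"B", "D", "a", "c", "e"} and the probe
    "b", both functions agree under cmp_bytewise.  Under
    cmp_case_insensitive the binary search still reports 3 while the linear
    count is 1, because the array is not ordered the way the comparator
    expects.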
    
    The patch requires changing the signatures of mcv_selectivity and allied
    functions, which are exported and very possibly are used by extensions.
    In HEAD, I just did that, but an API/ABI break of this sort isn't
    acceptable in stable branches.  Therefore, in v12 the patch introduces
    "mcv_selectivity_ext" and so on, with signatures matching HEAD, and makes
    the old functions into wrappers that assume DEFAULT_COLLATION_OID should
    be used.  That does not match the prior behavior, but it should avoid risk
    of failure in most cases.  (In practice, I think most extension datatypes
    aren't collation-aware, so the change probably doesn't matter to them.)
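
    The wrapper arrangement described above might look roughly like the
    following standalone sketch.  The signatures are deliberately simplified
    and hypothetical (the real functions take planner structures that are
    not reproduced here); the demo_ names, the collation constants, and the
    string-based MCV list are assumptions for illustration only.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

typedef unsigned int Oid;

/* Illustrative stand-ins; not taken from the PostgreSQL headers. */
#define DEMO_DEFAULT_COLLATION ((Oid) 100)
#define DEMO_NOCASE_COLLATION  ((Oid) 900001)  /* hypothetical collation */

/* Equality under the given collation (ASCII case folding for NOCASE). */
static int demo_equal(Oid collation, const char *a, const char *b)
{
    if (collation != DEMO_NOCASE_COLLATION)
        return strcmp(a, b) == 0;
    for (; *a && *b; a++, b++)
        if (tolower((unsigned char) *a) != tolower((unsigned char) *b))
            return 0;
    return *a == *b;            /* equal only if both strings ended */
}

/*
 * New-style entry point: the caller passes the collation to use,
 * which would normally be the query's collation.
 */
double demo_mcv_selectivity_ext(const char *const *values,
                                const double *freqs, size_t n,
                                Oid collation, const char *constval)
{
    double sel = 0.0;

    assert(values != NULL && freqs != NULL && constval != NULL);
    for (size_t i = 0; i < n; i++)
        if (demo_equal(collation, values[i], constval))
            sel += freqs[i];
    return sel;
}

/*
 * Old signature kept for existing callers: a thin wrapper that
 * assumes the default collation.
 */
double demo_mcv_selectivity(const char *const *values,
                            const double *freqs, size_t n,
                            const char *constval)
{
    return demo_mcv_selectivity_ext(values, freqs, n,
                                    DEMO_DEFAULT_COLLATION, constval);
}
```

    Callers compiled against the old signature keep working unchanged,
    while callers that know the query's collation can pass it through the
    _ext variant.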
    
    Per report from James Lucas.  Back-patch to v12 where the problem was
    introduced.
    
    Discussion: https://postgr.es/m/CAAFmbbOvfi=wMM=3qRsPunBSLb8BFREno2oOzSBS=mzfLPKABw@mail.gmail.com
selfuncs.c 225 KB