    Make pg_statistic and related code account more honestly for collations. · 5e092800
    Tom Lane authored
    When we first put in collations support, we basically punted on teaching
    pg_statistic, ANALYZE, and the planner selectivity functions about that.
    They've just used DEFAULT_COLLATION_OID independently of the actual
    collation of the data.  It's time to improve that, so:
    
    * Add columns to pg_statistic that record the specific collation associated
    with each statistics slot.
    
    * Teach ANALYZE to use the column's actual collation when comparing values
    for statistical purposes, and record this in the appropriate slot.  (Note
    that type-specific typanalyze functions are now expected to fill
    stats->stacoll with the appropriate collation, too.)
    
    * Teach assorted selectivity functions to use the actual collation of
    the stats they are looking at, instead of just assuming it's
    DEFAULT_COLLATION_OID.
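
To illustrate why the collation recorded with a statistics slot matters, here is a minimal, self-contained sketch. It is not PostgreSQL code: `DemoCollation`, `DemoStatsSlot`, `demo_compare`, and `demo_lt_selectivity` are hypothetical names invented for this example, and the two "collations" are simple stand-ins (bytewise `strcmp` ordering versus a case-insensitive ordering). The point is only that the estimated selectivity of `col < constant` over the same sample changes with the ordering used, so the estimator should compare with the collation stored alongside the stats.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical collation identifiers; these are NOT real PostgreSQL OIDs. */
typedef enum
{
    COLL_BYTEWISE = 1,          /* stand-in for a "C"-like collation */
    COLL_CASE_INSENSITIVE = 2   /* stand-in for a nondefault collation */
} DemoCollation;

/*
 * Hypothetical stand-in for one statistics slot: a sample of column values
 * plus the collation that was used when the stats were gathered.
 */
typedef struct
{
    DemoCollation stacoll;
    const char  **values;
    size_t        nvalues;
} DemoStatsSlot;

/* Case-insensitive comparison, used as the "nondefault" ordering. */
static int
cmp_case_insensitive(const char *a, const char *b)
{
    for (;; a++, b++)
    {
        int ca = tolower((unsigned char) *a);
        int cb = tolower((unsigned char) *b);

        if (ca != cb)
            return (ca < cb) ? -1 : 1;
        if (ca == 0)
            return 0;
    }
}

/* Compare two strings under the given (demo) collation. */
static int
demo_compare(DemoCollation coll, const char *a, const char *b)
{
    if (coll == COLL_CASE_INSENSITIVE)
        return cmp_case_insensitive(a, b);
    return strcmp(a, b);
}

/*
 * Estimate the selectivity of "col < constant" as the fraction of sampled
 * values that sort before the constant, using the slot's own collation
 * rather than a fixed default.
 */
static double
demo_lt_selectivity(const DemoStatsSlot *slot, const char *constant)
{
    size_t below = 0;

    assert(slot->nvalues > 0);
    for (size_t i = 0; i < slot->nvalues; i++)
    {
        if (demo_compare(slot->stacoll, slot->values[i], constant) < 0)
            below++;
    }
    return (double) below / (double) slot->nvalues;
}
```

With the sample `{"Apple", "banana", "Cherry", "date"}` and the constant `"c"`, this sketch counts three of four values below the constant under the bytewise ordering (on an ASCII system) but only two of four under the case-insensitive ordering, which is the kind of discrepancy that always assuming one default collation can introduce.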
    
    This should give noticeably better results in selectivity estimates for
    columns with nondefault collations, at least for query clauses that use
    that same collation (which would be the default behavior in most cases).
    It's still true that comparisons with explicit COLLATE clauses different
    from the stored data's collation won't be well-estimated, but that's no
    worse than before.  Also, this patch takes a first step toward doing
    better there: it is now theoretically possible to collect stats for a
    collation other than the column's own collation.
    
    Patch by me; thanks to Peter Eisentraut for review.
    
    Discussion: https://postgr.es/m/14706.1544630227@sss.pgh.pa.us
pg_statistic.h 11.6 KB