    Implement multivariate n-distinct coefficients · 7b504eb2
    Alvaro Herrera authored
    Add support for explicitly declared statistic objects (CREATE
    STATISTICS), allowing collection of statistics on more complex
    combinations than individual table columns.  Companion commands DROP
    STATISTICS and ALTER STATISTICS ... OWNER TO / SET SCHEMA / RENAME are
    added too.  All this DDL has been designed so that more statistic types
    can be added later on, such as multivariate most-common-values and
    multivariate histograms between columns of a single table, leaving room
    for permitting columns on multiple tables, too, as well as expressions.
    
    This commit only adds support for collection of n-distinct coefficients
    on user-specified sets of columns in a single table.  This is useful to
    estimate the number of distinct groups in GROUP BY and DISTINCT clauses;
    estimation errors there can cause over-allocation of memory in hashed
    aggregates, for instance, so it's a worthwhile problem to solve.  A new
    special pseudo-type pg_ndistinct is used.
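
    To illustrate why a per-column estimate can go wrong, here is a minimal
    sketch, not the planner's actual code.  It assumes a simplified model in
    which the number of groups is estimated by multiplying per-column
    distinct counts and clamping to the row count.  The function names and
    the sample data are hypothetical.

```python
def naive_group_estimate(rows, cols):
    """Multiply per-column distinct counts, clamped to the row count.

    Simplified stand-in for an estimate that treats columns as independent.
    """
    estimate = 1
    for c in cols:
        estimate *= len({row[c] for row in rows})
    return min(estimate, len(rows))


def actual_groups(rows, cols):
    """Count the distinct column combinations, i.e. the true group count."""
    return len({tuple(row[c] for c in cols) for row in rows})


# Hypothetical table: column 1 is fully determined by column 0.
rows = [(i % 100, (i % 100) + 1000) for i in range(10_000)]

naive = naive_group_estimate(rows, (0, 1))
actual = actual_groups(rows, (0, 1))
print(naive, actual)
```

    With these functionally dependent columns the naive product reaches the
    row count, while the true number of groups equals the distinct count of
    a single column.  A coefficient collected on the column set as a whole
    records the second figure directly.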
    
    (num-distinct estimation was deemed sufficiently useful by itself that
    this is worthwhile even if no further statistic types are added
    immediately; so much so that another version of essentially the same
    functionality was submitted by Kyotaro Horiguchi:
    https://postgr.es/m/20150828.173334.114731693.horiguchi.kyotaro@lab.ntt.co.jp
    though this commit does not use that code.)
    
    Author: Tomas Vondra.  Some code rework by Álvaro.
    Reviewed-by: Dean Rasheed, David Rowley, Kyotaro Horiguchi, Jeff Janes,
        Ideriha Takeshi
    Discussion: https://postgr.es/m/543AFA15.4080608@fuzzy.cz
        https://postgr.es/m/20170320190220.ixlaueanxegqd5gr@alvherre.pgsql
drop_statistics.sgml 2.05 KB