    Apply multiple multivariate MCV lists when possible · eae056c1
    Tomas Vondra authored
    Until now we've only used a single multivariate MCV list per relation,
    covering the largest number of clauses. So for example given a query
    
        SELECT * FROM t WHERE a = 1 AND b = 1 AND c = 1 AND d = 1
    
    and extended statistics on (a,b) and (c,d), we'd only pick and use one
    of them. This commit improves this by repeatedly picking and applying
    the best statistics (matching the largest number of remaining clauses)
    until no further statistics are applicable.
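
    The selection loop described above can be sketched roughly as follows.
    This is an illustrative Python model, not the actual C code: the name
    greedy_pick, the representation of statistics as tuples of column
    names, and the min_matched threshold (assumed here to be 2, since a
    multivariate list covering a single clause adds nothing) are all
    assumptions made for this example.

```python
def greedy_pick(stats, clause_cols, min_matched=2):
    """Repeatedly pick the statistics covering the most remaining clauses.

    stats        -- list of tuples of column names (hypothetical model)
    clause_cols  -- columns referenced by the WHERE clauses, one per clause
    min_matched  -- assumed minimum number of clauses a pick must cover
    """
    remaining = set(clause_cols)
    picked = []
    while True:
        best, best_cover = None, set()
        for cols in stats:
            if cols in picked:
                continue
            cover = remaining & set(cols)
            # Strict comparison: on a tie, the earlier statistics wins.
            if len(cover) > len(best_cover):
                best, best_cover = cols, cover
        if best is None or len(best_cover) < min_matched:
            break  # nothing applicable is left
        picked.append(best)
        remaining -= best_cover
    return picked, remaining


# The query from the example: clauses on a, b, c and d.
picked, unestimated = greedy_pick([("a", "b"), ("c", "d")],
                                  ["a", "b", "c", "d"])
```

    With statistics on (a,b) and (c,d), both are picked in turn and no
    clause is left for the per-column estimates.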
    
    This greedy algorithm is simple, but may not be optimal. A different
    choice of statistics may leave fewer clauses unestimated and/or give
    better estimates for some other reason.
    
    However, this can happen only when there are overlapping statistics, and
    selecting one makes it impossible to use the other. E.g. with statistics
    on (a,b), (c,d) and (b,c,d), we may pick either (a,b) and (c,d), or
    (b,c,d) alone, and it's not clear which option is the best one.
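
    To make the overlapping case concrete, here is a small Python sketch
    that compares a greedy pick with an exhaustive search over sets of
    non-overlapping statistics. Everything in it is hypothetical: the
    function names, the tuple model of statistics, and the MIN_MATCHED
    threshold are assumptions for illustration. It also scores choices
    only by the number of clauses left unestimated, which is just one of
    the possible criteria and says nothing about estimate quality.

```python
from itertools import combinations

MIN_MATCHED = 2  # assumption: a pick must cover at least two clauses


def greedy(stats, clauses):
    """Pick statistics matching the most remaining clauses, repeatedly."""
    remaining, picked = set(clauses), []
    while True:
        # max() returns the first of several equally good candidates.
        best = max(stats, key=lambda s: len(remaining & set(s)))
        if len(remaining & set(best)) < MIN_MATCHED:
            return picked, remaining
        picked.append(best)
        remaining -= set(best)


def exhaustive(stats, clauses):
    """Try every set of non-overlapping statistics and keep the one
    leaving the fewest clauses (ignores MIN_MATCHED for simplicity)."""
    best_pick, best_left = [], set(clauses)
    for n in range(1, len(stats) + 1):
        for combo in combinations(stats, n):
            cols = [c for s in combo for c in s]
            if len(cols) != len(set(cols)):
                continue  # statistics in this combination overlap
            left = set(clauses) - set(cols)
            if len(left) < len(best_left):
                best_pick, best_left = list(combo), left
    return best_pick, best_left


stats = [("a", "b"), ("c", "d"), ("b", "c", "d")]
clauses = ["a", "b", "c", "d"]
greedy_result = greedy(stats, clauses)
exhaustive_result = exhaustive(stats, clauses)
```

    Under these assumptions the greedy pass takes (b,c,d) first and leaves
    the clause on a without a multivariate estimate, while the exhaustive
    search finds (a,b) plus (c,d), which covers all four clauses.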
    
    We assume such cases are rare, however, and the easiest solution is to
    define statistics covering the whole group of correlated columns. In
    the future we might support overlapping stats, using some of the clauses
    as conditions (in the conditional probability sense).
    
    Author: Tomas Vondra
    Reviewed-by: Mark Dilger, Kyotaro Horiguchi
    Discussion: https://postgr.es/m/20191028152048.jc6pqv5hb7j77ocp@development
stats_ext.out 31.4 KB