    Improve estimate of distinct values in estimate_num_groups(). · 84f9a35e
    Dean Rasheed authored
    When adjusting the estimate for the number of distinct values from a
    rel in a grouped query to take into account the selectivity of the
    rel's restrictions, use a formula that is less likely to produce
    under-estimates.
    
    The old formula simply multiplied the number of distinct values in the
    rel by the restriction selectivity, which would be correct if the
    restrictions were fully correlated with the grouping expressions, but
    can produce significant under-estimates in cases where they are not
    well correlated.
    
    The new formula is based on the random selection probability, and so
    assumes that the restrictions are not correlated with the grouping
    expressions. This is guaranteed to produce larger estimates, and of
    course risks over-estimating in cases where the restrictions are
    correlated, but that has less severe consequences than
    under-estimating, which might lead to a HashAgg that consumes an
    excessive amount of memory.
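    The following is a minimal sketch that contrasts the two approaches,
    not the code from this commit. It assumes the random-selection formula
    takes the common approximate form n * (1 - ((N - p) / N) ** (N / n)).
    Here N is the number of table rows, p is the number of rows that
    survive the restrictions, and n is the number of distinct grouping
    values, assumed uniformly distributed. The function names are
    hypothetical. Because N / n >= 1, (1 - s) ** (N / n) <= 1 - s for any
    selectivity s = p / N, so the new estimate is never smaller than the
    old one.

```python
def distinct_old(ndistinct: float, tuples: float, rows: float) -> float:
    """Old formula (sketch): scale the distinct count by the selectivity.

    Exact only if the restrictions are fully correlated with the
    grouping expressions.
    """
    return ndistinct * (rows / tuples)


def distinct_new(ndistinct: float, tuples: float, rows: float) -> float:
    """Random-selection formula (assumed form, for illustration only):

        n * (1 - ((N - p) / N) ** (N / n))

    Approximates the expected number of distinct values when p of N rows
    are picked at random from a table holding n uniformly distributed
    values, i.e. restrictions are treated as uncorrelated with grouping.
    """
    if ndistinct <= 0 or rows >= tuples:
        return ndistinct  # nothing to scale down
    keep_none = ((tuples - rows) / tuples) ** (tuples / ndistinct)
    return ndistinct * (1.0 - keep_none)


# Hypothetical table: 1,000,000 rows, 10,000 distinct grouping values,
# and restrictions that keep 1% of the rows.
N, n, p = 1_000_000, 10_000, 10_000
print(f"old estimate: {distinct_old(n, N, p):.0f}")
print(f"new estimate: {distinct_new(n, N, p):.0f}")
```

    With these made-up numbers the old formula predicts 100 groups. The
    random-selection formula predicts about 6,340, which is what you would
    expect if the 1% of rows were picked independently of the grouping
    column.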
    
    This could possibly be improved upon in the future by identifying
    correlated restrictions and using a hybrid of the old and new
    formulae.
    
    Author: Tomas Vondra, with some hacking by me
    Reviewed-by: Mark Dilger, Alexander Korotkov, Dean Rasheed and Tom Lane
    Discussion: http://www.postgresql.org/message-id/flat/56CD0381.5060502@2ndquadrant.com