• Dean Rasheed's avatar
    Prevent functional dependency estimates from exceeding column estimates. · 87779aa4
    Dean Rasheed authored
    Formerly we applied a functional dependency "a => b with dependency
    degree f" using the formula
    
      P(a,b) = P(a) * [f + (1-f)*P(b)]
    
    This leads to the possibility that the combined selectivity P(a,b)
    could exceed P(b), which is not ideal. The addition of support for IN
    and OR clauses (commits 8f321bd1 and ccaa3569) would seem to make
    this more likely, since the user-supplied values in such clauses are
    not necessarily compatible with the functional dependency.
    
    Mitigate this by using the formula
    
      P(a,b) = f * Min(P(a), P(b)) + (1-f) * P(a) * P(b)
    
    instead, which guarantees that the combined selectivity is less than
    each column's individual selectivity. Logically, this is modifies the
    part of the formula that accounts for dependent rows to handle cases
    where P(a) > P(b), whilst not changing the second term which accounts
    for independent rows.
    
    Additionally, this refactors the way that functional dependencies are
    applied, so now dependencies_clauselist_selectivity() estimates both
    the implying clauses and the implied clauses for each functional
    dependency (formerly only the implied clauses were estimated), and now
    all clauses for each attribute are taken into account (formerly only
    one clause for each implied attribute was estimated). This removes the
    previously built-in assumption that only equality clauses will be
    seen, which is no longer true, and opens up the possibility of
    applying functional dependencies to more general clauses.
    
    Patch by me, reviewed by Tomas Vondra.
    
    Discussion: https://postgr.es/m/CAEZATCXaNFZyOhR4XXAfkvj1tibRBEjje6ZbXwqWUB_tqbH%3Drw%40mail.gmail.com
    Discussion: https://postgr.es/m/20200318002946.6dvblukm3cfmgir2%40development
    87779aa4
stats_ext.out 51.9 KB