Commit 5ab15591 authored by Tom Lane

eqjoinsel's logic for the case where MCV lists are not present should
account for NULLs; in hindsight this is obvious, since the code for
the MCV-lists case would reduce to this when there are zero entries
in both lists.  Per example from Alec Mitchell.
parent 49c3cf5f
@@ -15,7 +15,7 @@
  *
  *
  * IDENTIFICATION
- *    $Header: /cvsroot/pgsql/src/backend/utils/adt/selfuncs.c,v 1.134 2003/03/23 05:14:36 tgl Exp $
+ *    $Header: /cvsroot/pgsql/src/backend/utils/adt/selfuncs.c,v 1.135 2003/04/15 05:18:12 tgl Exp $
  *
  *-------------------------------------------------------------------------
  */
@@ -1591,27 +1591,33 @@ eqjoinsel(PG_FUNCTION_ARGS)
 {
     /*
      * We do not have MCV lists for both sides. Estimate the join
-     * selectivity as MIN(1/nd1, 1/nd2). This is plausible if we
-     * assume that the values are about equally distributed: a
-     * given tuple of rel1 will join to either 0 or N2/nd2 rows of
-     * rel2, so total join rows are at most N1*N2/nd2 giving a
-     * join selectivity of not more than 1/nd2. By the same logic
-     * it is not more than 1/nd1, so MIN(1/nd1, 1/nd2) is an upper
-     * bound. Using the MIN() means we estimate from the point of
-     * view of the relation with smaller nd (since the larger nd
-     * is determining the MIN). It is reasonable to assume that
-     * most tuples in this rel will have join partners, so the
-     * bound is probably reasonably tight and should be taken
-     * as-is.
+     * selectivity as MIN(1/nd1,1/nd2)*(1-nullfrac1)*(1-nullfrac2).
+     * This is plausible if we assume that the join operator is
+     * strict and the non-null values are about equally distributed:
+     * a given non-null tuple of rel1 will join to either zero or
+     * N2*(1-nullfrac2)/nd2 rows of rel2, so total join rows are at
+     * most N1*(1-nullfrac1)*N2*(1-nullfrac2)/nd2 giving a join
+     * selectivity of not more than (1-nullfrac1)*(1-nullfrac2)/nd2.
+     * By the same logic it is not more than
+     * (1-nullfrac1)*(1-nullfrac2)/nd1, so the expression with MIN()
+     * is an upper bound. Using the MIN() means we estimate from the
+     * point of view of the relation with smaller nd (since the larger
+     * nd is determining the MIN). It is reasonable to assume that
+     * most tuples in this rel will have join partners, so the bound
+     * is probably reasonably tight and should be taken as-is.
      *
      * XXX Can we be smarter if we have an MCV list for just one
      * side? It seems that if we assume equal distribution for the
      * other side, we end up with the same answer anyway.
      */
+    double nullfrac1 = stats1->stanullfrac;
+    double nullfrac2 = stats2->stanullfrac;
+    selec = (1.0 - nullfrac1) * (1.0 - nullfrac2);
     if (nd1 > nd2)
-        selec = 1.0 / nd1;
+        selec /= nd1;
     else
-        selec = 1.0 / nd2;
+        selec /= nd2;
 }
 if (have_mcvs1)
......
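The estimate described in the new comment can be tried outside the planner. The following is a minimal sketch, not the code in selfuncs.c: the function name `eqjoinsel_no_mcv` and its plain `double` parameters are assumptions made for illustration, standing in for the `nd1`, `nd2`, and `stanullfrac` values that the real function reads from the statistics tuples.

```c
#include <assert.h>

/*
 * Hypothetical standalone helper (not a PostgreSQL function): estimate
 * equijoin selectivity when neither side has an MCV list.
 *
 * nd1, nd2             - number of distinct values in each join column
 * nullfrac1, nullfrac2 - fraction of NULLs in each join column
 */
double
eqjoinsel_no_mcv(double nd1, double nd2, double nullfrac1, double nullfrac2)
{
    double selec;

    /* Sanity checks on the illustrative inputs. */
    assert(nd1 > 0.0 && nd2 > 0.0);
    assert(nullfrac1 >= 0.0 && nullfrac1 <= 1.0);
    assert(nullfrac2 >= 0.0 && nullfrac2 <= 1.0);

    /* A strict join operator never matches NULLs, so only non-null rows count. */
    selec = (1.0 - nullfrac1) * (1.0 - nullfrac2);

    /* Dividing by the larger nd is the same as multiplying by MIN(1/nd1, 1/nd2). */
    if (nd1 > nd2)
        selec /= nd1;
    else
        selec /= nd2;

    return selec;
}
```

For example, with nd1 = 4, nd2 = 2, and half of rel1's join column NULL (nullfrac1 = 0.5, nullfrac2 = 0), the sketch gives 0.5 / 4 = 0.125, whereas the pre-patch formula MIN(1/nd1, 1/nd2) gives 0.25 regardless of the NULL fraction.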