eqjoinsel's logic for the case where MCV lists are not present should account for NULLs; in hindsight this is obvious, since the code for the MCV-lists case would reduce to this when there are zero entries in both lists. Per example from Alec Mitchell.

5ab15591 · Tom Lane · 49c3cf5f · 5ab15591
Commit 5ab15591 authored Apr 15, 2003 by Tom Lane
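
The estimate this commit describes can be sketched as a small standalone C function. This is only an illustration of the formula MIN(1/nd1, 1/nd2) * (1 - nullfrac1) * (1 - nullfrac2). The function name eqjoin_sel_no_mcv and its parameter list are hypothetical and do not appear in selfuncs.c, where the values come from the planner's statistics instead.

```c
/*
 * Hypothetical helper, not part of selfuncs.c: estimate equality-join
 * selectivity when neither side has an MCV list.
 *
 * nd1, nd2              number of distinct values on each side (assumed > 0)
 * nullfrac1, nullfrac2  fraction of NULLs in each join column (0.0 .. 1.0)
 */
double
eqjoin_sel_no_mcv(double nd1, double nd2,
				  double nullfrac1, double nullfrac2)
{
	/* A strict join operator never matches NULLs, so discount them first. */
	double		selec = (1.0 - nullfrac1) * (1.0 - nullfrac2);

	/* Dividing by the larger nd equals multiplying by MIN(1/nd1, 1/nd2). */
	if (nd1 > nd2)
		selec /= nd1;
	else
		selec /= nd2;

	return selec;
}
```

For example, with nd1 = 100, nd2 = 50, nullfrac1 = 0.2 and nullfrac2 = 0, the sketch yields 0.8/100 = 0.008, whereas the pre-patch formula MIN(1/nd1, 1/nd2) would have given 0.01.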

Showing with 21 additions and 15 deletions

src/backend/utils/adt/selfuncs.c +21 -15

--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -15,7 +15,7 @@
 *
 *
 * IDENTIFICATION
- *	  $Header: /cvsroot/pgsql/src/backend/utils/adt/selfuncs.c,v 1.134 2003/03/23 05:14:36 tgl Exp $
+ *	  $Header: /cvsroot/pgsql/src/backend/utils/adt/selfuncs.c,v 1.135 2003/04/15 05:18:12 tgl Exp $
 *
 *-------------------------------------------------------------------------
 */
@@ -1591,27 +1591,33 @@ eqjoinsel(PG_FUNCTION_ARGS)
 		{
 			/*
 			 * We do not have MCV lists for both sides.  Estimate the join
-			 * selectivity as MIN(1/nd1, 1/nd2).  This is plausible if we
-			 * assume that the values are about equally distributed: a
-			 * given tuple of rel1 will join to either 0 or N2/nd2 rows of
-			 * rel2, so total join rows are at most N1*N2/nd2 giving a
-			 * join selectivity of not more than 1/nd2.  By the same logic
-			 * it is not more than 1/nd1, so MIN(1/nd1, 1/nd2) is an upper
-			 * bound.  Using the MIN() means we estimate from the point of
-			 * view of the relation with smaller nd (since the larger nd
-			 * is determining the MIN).  It is reasonable to assume that
-			 * most tuples in this rel will have join partners, so the
-			 * bound is probably reasonably tight and should be taken
-			 * as-is.
+			 * selectivity as MIN(1/nd1,1/nd2)*(1-nullfrac1)*(1-nullfrac2).
+			 * This is plausible if we assume that the join operator is
+			 * strict and the non-null values are about equally distributed:
+			 * a given non-null tuple of rel1 will join to either zero or
+			 * N2*(1-nullfrac2)/nd2 rows of rel2, so total join rows are at
+			 * most N1*(1-nullfrac1)*N2*(1-nullfrac2)/nd2 giving a join
+			 * selectivity of not more than (1-nullfrac1)*(1-nullfrac2)/nd2.
+			 * By the same logic it is not more than
+			 * (1-nullfrac1)*(1-nullfrac2)/nd1, so the expression with MIN()
+			 * is an upper bound.  Using the MIN() means we estimate from the
+			 * point of view of the relation with smaller nd (since the larger
+			 * nd is determining the MIN).  It is reasonable to assume that
+			 * most tuples in this rel will have join partners, so the bound
+			 * is probably reasonably tight and should be taken as-is.
 			 *
 			 * XXX Can we be smarter if we have an MCV list for just one
 			 * side? It seems that if we assume equal distribution for the
 			 * other side, we end up with the same answer anyway.
 			 */
+			double		nullfrac1 = stats1->stanullfrac;
+			double		nullfrac2 = stats2->stanullfrac;
+			selec = (1.0 - nullfrac1) * (1.0 - nullfrac2);
 			if (nd1 > nd2)
-				selec = 1.0 / nd1;
+				selec /= nd1;
 			else
-				selec = 1.0 / nd2;
+				selec /= nd2;
 		}
 		if (have_mcvs1)