Commit 3f5d2fe3 authored by Tom Lane

Be more wary of missing statistics in eqjoinsel_semi().

In particular, if we don't have real ndistinct estimates for both sides,
fall back to assuming that half of the left-hand rows have join partners.
This is what was done in 8.2 and 8.3 (cf nulltestsel() in those versions).
It's pretty stupid, but it won't lead us to think that an antijoin produces
no rows out, as seen in a recent example from Uwe Schroeder.
parent 921b9936
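
The fallback described in the commit message can be sketched outside the planner as a small standalone function. This is only an illustration of the arithmetic for the branch where MCV lists are available, not the PostgreSQL source: the function name, the macro name, and the helper clamp_probability() are hypothetical, and the value 200 is assumed to mirror PostgreSQL's DEFAULT_NUM_DISTINCT.

```c
#include <assert.h>

/* Hypothetical stand-in for DEFAULT_NUM_DISTINCT (assumed to be 200). */
#define SKETCH_DEFAULT_NUM_DISTINCT 200.0

/* Clamp p into [0, 1]; plays the role of CLAMP_PROBABILITY in this sketch. */
double
clamp_probability(double p)
{
    if (p < 0.0)
        return 0.0;
    if (p > 1.0)
        return 1.0;
    return p;
}

/*
 * Semi-join selectivity when MCV lists were compared.
 *   nd1, nd2    ndistinct estimates for the left and right sides
 *   nmatches    number of MCVs found on both sides
 *   matchfreq1  fraction of left rows known to have a join partner
 *   nullfrac1   fraction of left rows that are NULL
 */
double
semi_selec_with_mcvs(double nd1, double nd2, int nmatches,
                     double matchfreq1, double nullfrac1)
{
    double uncertainfrac;
    double uncertain;

    assert(matchfreq1 >= 0.0 && matchfreq1 <= 1.0);
    assert(nullfrac1 >= 0.0 && nullfrac1 <= 1.0);

    if (nd1 != SKETCH_DEFAULT_NUM_DISTINCT &&
        nd2 != SKETCH_DEFAULT_NUM_DISTINCT)
    {
        /* Both estimates look real: discount the matched MCVs first. */
        nd1 -= nmatches;
        nd2 -= nmatches;
        if (nd1 <= nd2 || nd2 <= 0)
            uncertainfrac = 1.0;
        else
            uncertainfrac = nd2 / nd1;
    }
    else
        uncertainfrac = 0.5;    /* no real statistics: assume half match */

    /* Rows that are neither known-matched nor NULL. */
    uncertain = clamp_probability(1.0 - matchfreq1 - nullfrac1);
    return matchfreq1 + uncertainfrac * uncertain;
}
```

With matchfreq1 = 0.25 and nullfrac1 = 0.25, half of the left-hand rows are uncertain. If either ndistinct value is the default, the sketch returns 0.25 + 0.5 * 0.5 = 0.5, so the estimate stays strictly below 1 and an antijoin is not predicted to be empty.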
@@ -2342,7 +2342,9 @@ eqjoinsel_semi(Oid operator,
         bool       *hasmatch1;
         bool       *hasmatch2;
         double      nullfrac1 = stats1->stanullfrac;
-        double      matchfreq1;
+        double      matchfreq1,
+                    uncertainfrac,
+                    uncertain;
         int         i,
                     nmatches;
@@ -2396,18 +2398,26 @@ eqjoinsel_semi(Oid operator,
          * the uncertain rows that a fraction nd2/nd1 have join partners. We
          * can discount the known-matched MCVs from the distinct-values counts
          * before doing the division.
+         *
+         * Crude as the above is, it's completely useless if we don't have
+         * reliable ndistinct values for both sides. Hence, if either nd1
+         * or nd2 is default, punt and assume half of the uncertain rows
+         * have join partners.
          */
+        if (nd1 != DEFAULT_NUM_DISTINCT && nd2 != DEFAULT_NUM_DISTINCT)
+        {
             nd1 -= nmatches;
             nd2 -= nmatches;
             if (nd1 <= nd2 || nd2 <= 0)
-                selec = Max(matchfreq1, 1.0 - nullfrac1);
+                uncertainfrac = 1.0;
             else
-            {
-                double      uncertain = 1.0 - matchfreq1 - nullfrac1;
-                CLAMP_PROBABILITY(uncertain);
-                selec = matchfreq1 + (nd2 / nd1) * uncertain;
+                uncertainfrac = nd2 / nd1;
         }
+        else
+            uncertainfrac = 0.5;
+        uncertain = 1.0 - matchfreq1 - nullfrac1;
+        CLAMP_PROBABILITY(uncertain);
+        selec = matchfreq1 + uncertainfrac * uncertain;
     }
     else
     {
@@ -2417,6 +2427,8 @@ eqjoinsel_semi(Oid operator,
          */
         double      nullfrac1 = stats1 ? stats1->stanullfrac : 0.0;
+        if (nd1 != DEFAULT_NUM_DISTINCT && nd2 != DEFAULT_NUM_DISTINCT)
+        {
             if (vardata1->rel)
                 nd1 = Min(nd1, vardata1->rel->rows);
             if (vardata2->rel)
@@ -2427,6 +2439,9 @@ eqjoinsel_semi(Oid operator,
             else
                 selec = (nd2 / nd1) * (1.0 - nullfrac1);
         }
+        else
+            selec = 0.5 * (1.0 - nullfrac1);
+    }
     if (have_mcvs1)
         free_attstatsslot(vardata1->atttype, values1, nvalues1,
......
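
The branch without MCV lists follows the same pattern and can be sketched the same way. This is a rough illustration under stated assumptions, not the PostgreSQL source: the function and macro names are hypothetical, a negative row count is an invented convention standing in for a missing vardata->rel, and the test `nd1 <= nd2 || nd2 <= 0` is assumed from the MCV branch because those lines fall between the hunks shown above.

```c
#include <assert.h>

/* Hypothetical stand-in for DEFAULT_NUM_DISTINCT (assumed to be 200). */
#define NOMCV_DEFAULT_NUM_DISTINCT 200.0

/*
 * Semi-join selectivity when no MCV lists are available.
 *   nd1, nd2      ndistinct estimates for the left and right sides
 *   rows1, rows2  relation row estimates; a negative value means "unknown"
 *                 (an assumption of this sketch, standing in for a NULL rel)
 *   nullfrac1     fraction of left rows that are NULL
 */
double
semi_selec_no_mcvs(double nd1, double nd2,
                   double rows1, double rows2, double nullfrac1)
{
    double selec;

    assert(nullfrac1 >= 0.0 && nullfrac1 <= 1.0);

    if (nd1 != NOMCV_DEFAULT_NUM_DISTINCT &&
        nd2 != NOMCV_DEFAULT_NUM_DISTINCT)
    {
        /* Never assume more distinct values than rows. */
        if (rows1 >= 0.0 && rows1 < nd1)
            nd1 = rows1;
        if (rows2 >= 0.0 && rows2 < nd2)
            nd2 = rows2;

        /* Assumed condition; the original lines are elided in the diff. */
        if (nd1 <= nd2 || nd2 <= 0)
            selec = 1.0 - nullfrac1;
        else
            selec = (nd2 / nd1) * (1.0 - nullfrac1);
    }
    else
        selec = 0.5 * (1.0 - nullfrac1);    /* no real statistics */

    return selec;
}
```

For example, with nd1 left at the default, nd2 = 50 and nullfrac1 = 0.5, the sketch returns 0.5 * 0.5 = 0.25 regardless of the row counts, because the ndistinct ratio is not trusted when either side is a default.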