Commit 391159e0 authored by Tom Lane

Partially revert commit 3d3bf62f.

On reflection, the pre-existing logic in ANALYZE is specifically meant to
compare the frequency of a candidate MCV against the estimated frequency of
a random distinct value across the whole table.  The change to compare it
against the average frequency of values actually seen in the sample doesn't
seem very principled, and if anything it would make us less likely, not more
likely, to consider a value an MCV.  So revert that, but keep the aspect of
considering only nonnull values, which definitely is correct.
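
The effect of the two denominators on the threshold can be sketched as below. This is an illustrative helper, not code from the commit: the name mcv_min_count is hypothetical, and the clamp to 2 is an assumption, since the diff cuts off right after "if (mincount < 2)".

```c
#include <assert.h>

/*
 * Hypothetical helper (not part of the commit): minimum number of sample
 * occurrences a value needs before it is kept as an MCV, given the number
 * of nonnull sample rows and an estimate of the number of distinct nonnull
 * values.
 */
double
mcv_min_count(double nonnull_cnt, double ndistinct)
{
	double		avgcount;
	double		mincount;

	assert(ndistinct > 0);

	/* estimated # occurrences in sample of a typical nonnull value */
	avgcount = nonnull_cnt / ndistinct;
	/* threshold is 25% above that average, as in the diff */
	mincount = avgcount * 1.25;
	/* ASSUMPTION: the truncated "if (mincount < 2)" branch clamps to 2 */
	if (mincount < 2)
		mincount = 2;
	return mincount;
}
```

For example, take 30000 nonnull sample rows. Dividing by 1000 distinct values seen in the sample gives a threshold of 37.5 occurrences, while dividing by an estimated 50000 distinct values in the whole table gives 0.75, which the assumed clamp raises to 2. Since the sample typically contains no more distinct values than the table-wide estimate, the sample-based denominator yields the higher threshold, which matches the "less likely" remark above.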

In passing, rename the local variables in these stanzas to
"ndistinct_table", to avoid confusion with the "ndistinct" that appears at
an outer scope in compute_scalar_stats.
parent c9ff752a
@@ -2133,13 +2133,15 @@ compute_distinct_stats(VacAttrStatsP stats,
 		}
 		else
 		{
-			/* d here is the same as d in the Haas-Stokes formula */
-			int			d = nonnull_cnt - summultiple + nmultiple;
+			double		ndistinct_table = stats->stadistinct;
 			double		avgcount,
 						mincount;
 
+			/* Re-extract estimate of # distinct nonnull values in table */
+			if (ndistinct_table < 0)
+				ndistinct_table = -ndistinct_table * totalrows;
 			/* estimate # occurrences in sample of a typical nonnull value */
-			avgcount = (double) nonnull_cnt / (double) d;
+			avgcount = (double) nonnull_cnt / ndistinct_table;
 			/* set minimum threshold count to store a value */
 			mincount = avgcount * 1.25;
 			if (mincount < 2)
@@ -2493,14 +2495,16 @@ compute_scalar_stats(VacAttrStatsP stats,
 		}
 		else
 		{
-			/* d here is the same as d in the Haas-Stokes formula */
-			int			d = ndistinct + toowide_cnt;
+			double		ndistinct_table = stats->stadistinct;
 			double		avgcount,
 						mincount,
 						maxmincount;
 
+			/* Re-extract estimate of # distinct nonnull values in table */
+			if (ndistinct_table < 0)
+				ndistinct_table = -ndistinct_table * totalrows;
 			/* estimate # occurrences in sample of a typical nonnull value */
-			avgcount = (double) values_cnt / (double) d;
+			avgcount = (double) nonnull_cnt / ndistinct_table;
 			/* set minimum threshold count to store a value */
 			mincount = avgcount * 1.25;
 			if (mincount < 2)
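
Both hunks re-extract the table-wide distinct count from stats->stadistinct in the same way. A minimal standalone sketch of that step follows; the function name ndistinct_in_table is hypothetical and does not appear in the commit. It relies on the sign convention visible in the diff: a negative stadistinct is a negated fraction of the table's row count, while a positive value is an absolute count.

```c
/*
 * Hypothetical helper (not part of the commit): convert a stadistinct
 * value into an absolute estimate of the number of distinct nonnull
 * values in the table.
 */
double
ndistinct_in_table(double stadistinct, double totalrows)
{
	/* negative means "minus a fraction of the total row count" */
	if (stadistinct < 0)
		return -stadistinct * totalrows;
	/* otherwise it is already an absolute count */
	return stadistinct;
}
```

With totalrows = 1000000, a stadistinct of -0.5 becomes 500000 distinct values, while a stadistinct of 200 is returned unchanged.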