Fix ndistinct estimates with system attributes

When estimating the number of groups using extended statistics, the code
was discarding information about system attributes. This led to the
strange situation that

    SELECT 1 FROM t GROUP BY ctid;

could produce a higher estimate (equal to pg_class.reltuples) than

    SELECT 1 FROM t GROUP BY a, b, ctid;

with extended statistics on (a,b). Fixed by retaining information about
the system attribute.

Backpatch all the way to 10, where extended statistics were introduced.

Author: Tomas Vondra
Backpatch-through: 10
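
The change to the column filter can be sketched outside the planner. This is a simplified illustration, not the PostgreSQL code: `AttrMask`, `is_user_attr`, `is_matched`, `filter_old` and `filter_new` are hypothetical stand-ins, a plain bitmask replaces the `Bitmapset`, and an `int` array replaces the list of `varinfo` entries. It assumes user attributes have positive numbers and system attributes such as `ctid` have negative ones.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the Bitmapset of attributes covered by the
 * extended statistic; bit N is set when user attribute N is matched. */
typedef unsigned int AttrMask;

/* Assumption: user-defined attributes are numbered from 1 upwards,
 * system attributes (ctid and friends) are negative. */
static bool is_user_attr(int attnum)
{
    return attnum > 0;
}

/* Only meaningful for user attributes; the sketch supports 1..31. */
static bool is_matched(AttrMask matched, int attnum)
{
    assert(attnum > 0 && attnum < 32);
    return (matched & (1u << attnum)) != 0;
}

/* Old behaviour: system attributes are skipped outright, so they are
 * missing from the list of columns that still need an estimate. */
static int filter_old(const int *attnums, int n, AttrMask matched, int *out)
{
    int kept = 0;

    for (int i = 0; i < n; i++)
    {
        if (!is_user_attr(attnums[i]))
            continue;
        if (!is_matched(matched, attnums[i]))
            out[kept++] = attnums[i];
    }
    return kept;
}

/* New behaviour: only user attributes already covered by the statistic
 * are dropped; everything else, including system attributes, is kept. */
static int filter_new(const int *attnums, int n, AttrMask matched, int *out)
{
    int kept = 0;

    for (int i = 0; i < n; i++)
    {
        if (is_user_attr(attnums[i]) && is_matched(matched, attnums[i]))
            continue;
        out[kept++] = attnums[i];
    }
    return kept;
}
```

For `GROUP BY a, b, ctid` with a statistic on (a,b), modelled here as attribute numbers `{1, 2, -1}` and a mask with bits 1 and 2 set, `filter_old` keeps nothing while `filter_new` keeps the `ctid` entry, so that column can still contribute to the group estimate.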

Commit 33e52ad9 authored Mar 26, 2021 by Tomas Vondra

Showing 2 changed files with 4 additions and 4 deletions

src/backend/utils/adt/selfuncs.c +3 -3

src/test/regress/expected/stats_ext.out +1 -1

--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -3987,10 +3987,10 @@ estimate_multivariate_ndistinct(PlannerInfo *root, RelOptInfo *rel,

 			attnum = ((Var *) varinfo->var)->varattno;

-			if (!AttrNumberIsForUserDefinedAttr(attnum))
+			if (AttrNumberIsForUserDefinedAttr(attnum) &&
+				bms_is_member(attnum, matched))
 				continue;

-			if (!bms_is_member(attnum, matched))
-				newlist = lappend(newlist, varinfo);
+			newlist = lappend(newlist, varinfo);
 		}


--- a/src/test/regress/expected/stats_ext.out
+++ b/src/test/regress/expected/stats_ext.out
@@ -260,7 +260,7 @@ SELECT s.stxkind, d.stxdndistinct
 SELECT * FROM check_estimated_rows('SELECT COUNT(*) FROM ndistinct GROUP BY ctid, a, b');
 estimated | actual 
 -----------+--------
-        11 |   1000
+      1000 |   1000
 (1 row)

 -- Hash Aggregate, thanks to estimates improved by the statistic