Tweak index costing for problems with partial indexes.

btcostestimate() makes an estimate of the number of index tuples that will be visited based on knowledge of which index clauses can actually bound the scan within nbtree. However, it forgot to account for partial indexes in this calculation, with the result that the cost of the index scan could be significantly overestimated for a partial index. Fix that by merging the predicate with the abbreviated indexclause list, in the same way as we do with the full list to estimate how many heap tuples will be visited. Also, slightly increase the "fudge factor" that's meant to give preference to smaller indexes over larger ones. While this is applied to all indexes, it's most important for partial indexes since it can be the only factor that makes a partial index look cheaper than a similar full index. Experimentation shows that the existing value is so small as to easily get swamped by noise such as page-boundary-roundoff behavior. I'm tempted to kick it up more than this, but will refrain for now. Per report from Ruben Blanco. These are long-standing issues, but given the lack of prior complaints I'm not going to risk changing planner behavior in back branches by back-patching.

Tweak index costing for problems with partial indexes.
btcostestimate() makes an estimate of the number of index tuples that will be visited based on knowledge of which index clauses can actually bound the scan within nbtree. However, it forgot to account for partial indexes in this calculation, with the result that the cost of the index scan could be significantly overestimated for a partial index. Fix that by merging the predicate with the abbreviated indexclause list, in the same way as we do with the full list to estimate how many heap tuples will be visited. Also, slightly increase the "fudge factor" that's meant to give preference to smaller indexes over larger ones. While this is applied to all indexes, it's most important for partial indexes since it can be the only factor that makes a partial index look cheaper than a similar full index. Experimentation shows that the existing value is so small as to easily get swamped by noise such as page-boundary-roundoff behavior. I'm tempted to kick it up more than this, but will refrain for now. Per report from Ruben Blanco. These are long-standing issues, but given the lack of prior complaints I'm not going to risk changing planner behavior in back branches by back-patching.
21a39de5 · Tom Lane · b28ffd0f · 21a39de5
Commit 21a39de5 authored Jan 29, 2012 by Tom Lane
Show whitespace changes
Inline Side-by-side

Showing with 58 additions and 39 deletions

src/backend/utils/adt/selfuncs.c src/backend/utils/adt/selfuncs.c +58 -39

No files found.
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -5956,6 +5956,50 @@ string_to_bytea_const(const char *str, size_t str_len)
 *
 * Index cost estimation functions
 *
+ *-------------------------------------------------------------------------
+ */
+
+/*
+ * If the index is partial, add its predicate to the given qual list.
+ *
+ * ANDing the index predicate with the explicitly given indexquals produces
+ * a more accurate idea of the index's selectivity.  However, we need to be
+ * careful not to insert redundant clauses, because clauselist_selectivity()
+ * is easily fooled into computing a too-low selectivity estimate.  Our
+ * approach is to add only the predicate clause(s) that cannot be proven to
+ * be implied by the given indexquals.  This successfully handles cases such
+ * as a qual "x = 42" used with a partial index "WHERE x >= 40 AND x < 50".
+ * There are many other cases where we won't detect redundancy, leading to a
+ * too-low selectivity estimate, which will bias the system in favor of using
+ * partial indexes where possible.  That is not necessarily bad though.
+ *
+ * Note that indexQuals contains RestrictInfo nodes while the indpred
+ * does not, so the output list will be mixed.  This is OK for both
+ * predicate_implied_by() and clauselist_selectivity(), but might be
+ * problematic if the result were passed to other things.
+ */
+static List *
+add_predicate_to_quals(IndexOptInfo *index, List *indexQuals)
+{
+	List	   *predExtraQuals = NIL;
+	ListCell   *lc;
+
+	if (index->indpred == NIL)
+		return indexQuals;
+
+	foreach(lc, index->indpred)
+	{
+		Node	   *predQual = (Node *) lfirst(lc);
+		List	   *oneQual = list_make1(predQual);
+
+		if (!predicate_implied_by(oneQual, indexQuals))
+			predExtraQuals = list_concat(predExtraQuals, oneQual);
+	}
+	/* list_concat avoids modifying the passed-in indexQuals list */
+	return list_concat(predExtraQuals, indexQuals);
+}
+
+/*
 * genericcostestimate is a general-purpose estimator for use when we
 * don't have any better idea about how to estimate.  Index-type-specific
 * knowledge can be incorporated in the type-specific routines.
@@ -5964,10 +6008,7 @@ string_to_bytea_const(const char *str, size_t str_len)
 * in genericcostestimate is the estimate of the number of index tuples
 * visited.  If numIndexTuples is not 0 then it is used as the estimate,
 * otherwise we compute a generic estimate.
- *
- *-------------------------------------------------------------------------
 */
-
 static void
 genericcostestimate(PlannerInfo *root,
 					IndexPath *path,
@@ -5992,42 +6033,12 @@ genericcostestimate(PlannerInfo *root,
 	List	   *selectivityQuals;
 	ListCell   *l;

-	/*----------
+	/*
 	 * If the index is partial, AND the index predicate with the explicitly
 	 * given indexquals to produce a more accurate idea of the index
-	 * selectivity.  However, we need to be careful not to insert redundant
-	 * clauses, because clauselist_selectivity() is easily fooled into
-	 * computing a too-low selectivity estimate.  Our approach is to add
-	 * only the index predicate clause(s) that cannot be proven to be implied
-	 * by the given indexquals.  This successfully handles cases such as a
-	 * qual "x = 42" used with a partial index "WHERE x >= 40 AND x < 50".
-	 * There are many other cases where we won't detect redundancy, leading
-	 * to a too-low selectivity estimate, which will bias the system in favor
-	 * of using partial indexes where possible.  That is not necessarily bad
-	 * though.
-	 *
-	 * Note that indexQuals contains RestrictInfo nodes while the indpred
-	 * does not.  This is OK for both predicate_implied_by() and
-	 * clauselist_selectivity().
-	 *----------
+	 * selectivity.
 	 */
-	if (index->indpred != NIL)
-	{
-		List	   *predExtraQuals = NIL;
-
-		foreach(l, index->indpred)
-		{
-			Node	   *predQual = (Node *) lfirst(l);
-			List	   *oneQual = list_make1(predQual);
-
-			if (!predicate_implied_by(oneQual, indexQuals))
-				predExtraQuals = list_concat(predExtraQuals, oneQual);
-		}
-		/* list_concat avoids modifying the passed-in indexQuals list */
-		selectivityQuals = list_concat(predExtraQuals, indexQuals);
-	}
-	else
-		selectivityQuals = indexQuals;
+	selectivityQuals = add_predicate_to_quals(index, indexQuals);

 	/*
 	 * Check for ScalarArrayOpExpr index quals, and estimate the number of
@@ -6164,11 +6175,11 @@ genericcostestimate(PlannerInfo *root,
 	 *
 	 * We can deal with this by adding a very small "fudge factor" that
 	 * depends on the index size.  The fudge factor used here is one
-	 * spc_random_page_cost per 100000 index pages, which should be small
+	 * spc_random_page_cost per 10000 index pages, which should be small
 	 * enough to not alter index-vs-seqscan decisions, but will prevent
 	 * indexes of different sizes from looking exactly equally attractive.
 	 */
-	*indexTotalCost += index->pages * spc_random_page_cost / 100000.0;
+	*indexTotalCost += index->pages * spc_random_page_cost / 10000.0;

 	/*
 	 * CPU cost: any complex expressions in the indexquals will need to be
@@ -6386,9 +6397,17 @@ btcostestimate(PG_FUNCTION_ARGS)
 		numIndexTuples = 1.0;
 	else
 	{
+		List	   *selectivityQuals;
 		Selectivity btreeSelectivity;

-		btreeSelectivity = clauselist_selectivity(root, indexBoundQuals,
+		/*
+		 * If the index is partial, AND the index predicate with the
+		 * index-bound quals to produce a more accurate idea of the number
+		 * of rows covered by the bound conditions.
+		 */
+		selectivityQuals = add_predicate_to_quals(index, indexBoundQuals);
+
+		btreeSelectivity = clauselist_selectivity(root, selectivityQuals,
 												  index->rel->relid,
 												  JOIN_INNER,
 												  NULL);