Improve planner's handling of duplicated index column expressions.

It's potentially useful for an index to repeat the same indexable column or expression in multiple index columns, if the columns have different opclasses. (If they share opclasses too, the duplicate column is pretty useless, but nonetheless we've allowed such cases since 9.0.) However, the planner failed to cope with this, because createplan.c was relying on simple equal() matching to figure out which index column each index qual is intended for. We do have that information available upstream in indxpath.c, though, so the fix is to not flatten the multi-level indexquals list when putting it into an IndexPath. Then we can rely on the sublist structure to identify target index columns in createplan.c. There's a similar issue for index ORDER BYs (the KNNGIST feature), so introduce a multi-level-list representation for that too. This adds a bit more representational overhead, but we might more or less buy that back by not having to search for matching index columns anymore in createplan.c; likewise btcostestimate saves some cycles. Per bug #6351 from Christian Rudolph. Likely symptoms include the "btree index keys must be ordered by attribute" failure shown there, as well as "operator MMMM is not a member of opfamily NNNN". Although this is a pre-existing problem that can be demonstrated in 9.0 and 9.1, I'm not going to back-patch it, because the API changes in the planner seem likely to break things such as index plugins. The corner cases where this matters seem too narrow to justify possibly breaking things in a minor release.

Improve planner's handling of duplicated index column expressions.
It's potentially useful for an index to repeat the same indexable column or expression in multiple index columns, if the columns have different opclasses. (If they share opclasses too, the duplicate column is pretty useless, but nonetheless we've allowed such cases since 9.0.) However, the planner failed to cope with this, because createplan.c was relying on simple equal() matching to figure out which index column each index qual is intended for. We do have that information available upstream in indxpath.c, though, so the fix is to not flatten the multi-level indexquals list when putting it into an IndexPath. Then we can rely on the sublist structure to identify target index columns in createplan.c. There's a similar issue for index ORDER BYs (the KNNGIST feature), so introduce a multi-level-list representation for that too. This adds a bit more representational overhead, but we might more or less buy that back by not having to search for matching index columns anymore in createplan.c; likewise btcostestimate saves some cycles. Per bug #6351 from Christian Rudolph. Likely symptoms include the "btree index keys must be ordered by attribute" failure shown there, as well as "operator MMMM is not a member of opfamily NNNN". Although this is a pre-existing problem that can be demonstrated in 9.0 and 9.1, I'm not going to back-patch it, because the API changes in the planner seem likely to break things such as index plugins. The corner cases where this matters seem too narrow to justify possibly breaking things in a minor release.
e2c2c2e8 · Tom Lane · d5448c7d · e2c2c2e8 · e2c2c2e8 · e2c2c2e8
Commit e2c2c2e8 authored Dec 23, 2011 by Tom Lane
12 changed files
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -209,8 +209,10 @@ cost_seqscan(Path *path, PlannerInfo *root,
 *	  Determines and returns the cost of scanning a relation using an index.
 *
 * 'index' is the index to be used
- * 'indexQuals' is the list of applicable qual clauses (implicit AND semantics)
+ * 'indexQuals' is a list of lists of applicable qual clauses (implicit AND
- * 'indexOrderBys' is the list of ORDER BY operators for amcanorderbyop indexes
+ *		semantics, one sub-list per index column)
+ * 'indexOrderBys' is a list of lists of lists of ORDER BY expressions for
+ *		amcanorderbyop indexes (lists per pathkey and index column)
 * 'indexonly' is true if it's an index-only scan
 * 'outer_rel' is the outer relation when we are considering using the index
 *		scan as the inside of a nestloop join (hence, some of the indexQuals
@@ -221,8 +223,8 @@ cost_seqscan(Path *path, PlannerInfo *root,
 * additional fields of the IndexPath besides startup_cost and total_cost.
 * These fields are needed if the IndexPath is used in a BitmapIndexScan.
 *
- * indexQuals is a list of RestrictInfo nodes, but indexOrderBys is a list of
+ * indexQuals is a list of lists of RestrictInfo nodes, but indexOrderBys
- * bare expressions.
+ * is a list of lists of lists of bare expressions.
 *
 * NOTE: 'indexQuals' must contain only clauses usable as index restrictions.
 * Any additional quals evaluated as qpquals may reduce the number of returned

--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -412,8 +412,8 @@ create_seqscan_path(PlannerInfo *root, RelOptInfo *rel)
 * 'index' is a usable index.
 * 'clause_groups' is a list of lists of RestrictInfo nodes
 *			to be used as index qual conditions in the scan.
- * 'indexorderbys' is a list of bare expressions (no RestrictInfos)
+ * 'indexorderbys' is a list of lists of lists of bare expressions (not
- *			to be used as index ordering operators in the scan.
+ *			RestrictInfos) to be used as index ordering operators.
 * 'pathkeys' describes the ordering of the path.
 * 'indexscandir' is ForwardScanDirection or BackwardScanDirection
 *			for an ordered index, or NoMovementScanDirection for

--- a/src/backend/optimizer/util/restrictinfo.c
+++ b/src/backend/optimizer/util/restrictinfo.c
@@ -630,7 +630,7 @@ extract_actual_join_clauses(List *restrictinfo_list,
 * being used in an inner indexscan need not be checked again at the join.
 *
 * "Redundant" means either equal() or derived from the same EquivalenceClass.
- * We have to check the latter because indxqual.c may select different derived
+ * We have to check the latter because indxpath.c may select different derived
 * clauses than were selected by generate_join_implied_equalities().
 *
 * Note that we are *not* checking for local redundancies within the given

--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -5991,6 +5991,14 @@ genericcostestimate(PlannerInfo *root,
 	List	   *selectivityQuals;
 	ListCell   *l;
+	/*
+	 * For our purposes here, it doesn't matter which index columns the
+	 * individual quals and order-by expressions go with, so flatten the
+	 * lists for convenience.
+	 */
+	indexQuals = flatten_clausegroups_list(indexQuals);
+	indexOrderBys = flatten_indexorderbys_list(indexOrderBys);
 	/*----------
 	 * If the index is partial, AND the index predicate with the explicitly
 	 * given indexquals to produce a more accurate idea of the index
@@ -6022,7 +6030,7 @@ genericcostestimate(PlannerInfo *root,
 			if (!predicate_implied_by(oneQual, indexQuals))
 				predExtraQuals = list_concat(predExtraQuals, oneQual);
 		}
-		/* list_concat avoids modifying the passed-in indexQuals list */
+		/* list_concat avoids modifying the indexQuals list */
 		selectivityQuals = list_concat(predExtraQuals, indexQuals);
 	}
 	else
@@ -6250,7 +6258,7 @@ btcostestimate(PG_FUNCTION_ARGS)
 	bool		found_saop;
 	bool		found_is_null_op;
 	double		num_sa_scans;
-	ListCell   *l;
+	ListCell   *lc1;
 	/*
 	 * For a btree scan, only leading '=' quals plus inequality quals for the
@@ -6259,8 +6267,7 @@ btcostestimate(PG_FUNCTION_ARGS)
 	 * the index scan).  Additional quals can suppress visits to the heap, so
 	 * it's OK to count them in indexSelectivity, but they should not count
 	 * for estimating numIndexTuples.  So we must examine the given indexQuals
-	 * to find out which ones count as boundary quals.	We rely on the
+	 * to find out which ones count as boundary quals.
-	 * knowledge that they are given in index column order.
 	 *
 	 * For a RowCompareExpr, we consider only the first column, just as
 	 * rowcomparesel() does.
@@ -6270,14 +6277,25 @@ btcostestimate(PG_FUNCTION_ARGS)
 	 * considered to act the same as it normally does.
 	 */
 	indexBoundQuals = NIL;
-	indexcol = 0;
 	eqQualHere = false;
 	found_saop = false;
 	found_is_null_op = false;
 	num_sa_scans = 1;
-	foreach(l, indexQuals)
+	/* clausegroups must correspond to index columns */
+	Assert(list_length(indexQuals) <= index->ncolumns);
+	indexcol = 0;
+	foreach(lc1, indexQuals)
 	{
-		RestrictInfo *rinfo = (RestrictInfo *) lfirst(l);
+		List	   *clausegroup = (List *) lfirst(lc1);
+		ListCell   *lc2;
+		eqQualHere = false;
+		foreach(lc2, clausegroup)
+		{
+			RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc2);
 			Expr	   *clause;
 			Node	   *leftop,
 					   *rightop;
@@ -6329,37 +6347,18 @@ btcostestimate(PG_FUNCTION_ARGS)
 					 (int) nodeTag(clause));
 				continue;			/* keep compiler quiet */
 			}
 			if (match_index_to_operand(leftop, indexcol, index))
 			{
 				/* clause_op is correct */
 			}
-		else if (match_index_to_operand(rightop, indexcol, index))
-		{
-			/* Must flip operator to get the opfamily member */
-			clause_op = get_commutator(clause_op);
-		}
 			else
 			{
-			/* Must be past the end of quals for indexcol, try next */
+				Assert(match_index_to_operand(rightop, indexcol, index));
-			if (!eqQualHere)
-				break;			/* done if no '=' qual for indexcol */
-			indexcol++;
-			eqQualHere = false;
-			if (match_index_to_operand(leftop, indexcol, index))
-			{
-				/* clause_op is correct */
-			}
-			else if (match_index_to_operand(rightop, indexcol, index))
-			{
 				/* Must flip operator to get the opfamily member */
 				clause_op = get_commutator(clause_op);
 			}
-			else
-			{
-				/* No quals for new indexcol, so we are done */
-				break;
-			}
-		}
 			/* check for equality operator */
 			if (OidIsValid(clause_op))
 			{
@@ -6371,7 +6370,7 @@ btcostestimate(PG_FUNCTION_ARGS)
 			}
 			else if (is_null_op)
 			{
-			/* IS NULL is like = for purposes of selectivity determination */
+				/* IS NULL is like = for selectivity determination */
 				eqQualHere = true;
 			}
 			/* count up number of SA scans induced by indexBoundQuals only */
@@ -6386,6 +6385,13 @@ btcostestimate(PG_FUNCTION_ARGS)
 			indexBoundQuals = lappend(indexBoundQuals, rinfo);
 		}
+		/* Done with this indexcol, continue to next only if it had = qual */
+		if (!eqQualHere)
+			break;
+		indexcol++;
+	}
 	/*
 	 * If index is unique and we found an '=' clause for each column, we can
 	 * just assume numIndexTuples = 1 and skip the expensive
@@ -6393,7 +6399,7 @@ btcostestimate(PG_FUNCTION_ARGS)
 	 * NullTest invalidates that theory, even though it sets eqQualHere.
 	 */
 	if (index->unique &&
-		indexcol == index->ncolumns - 1 &&
+		indexcol == index->ncolumns &&
 		eqQualHere &&
 		!found_saop &&
 		!found_is_null_op)
@@ -6924,6 +6930,14 @@ gincostestimate(PG_FUNCTION_ARGS)
 	Relation	indexRel;
 	GinStatsData ginStats;
+	/*
+	 * For our purposes here, it doesn't matter which index columns the
+	 * individual quals and order-by expressions go with, so flatten the
+	 * lists for convenience.
+	 */
+	indexQuals = flatten_clausegroups_list(indexQuals);
+	indexOrderBys = flatten_indexorderbys_list(indexOrderBys);
 	/*
 	 * Obtain statistic information from the meta page
 	 */
@@ -6980,7 +6994,7 @@ gincostestimate(PG_FUNCTION_ARGS)
 			if (!predicate_implied_by(oneQual, indexQuals))
 				predExtraQuals = list_concat(predExtraQuals, oneQual);
 		}
-		/* list_concat avoids modifying the passed-in indexQuals list */
+		/* list_concat avoids modifying the indexQuals list */
 		selectivityQuals = list_concat(predExtraQuals, indexQuals);
 	}
 	else

--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -659,18 +659,25 @@ typedef struct Path
 * AND semantics across the list.  Each clause is a RestrictInfo node from
 * the query's WHERE or JOIN conditions.
 *
- * 'indexquals' has the same structure as 'indexclauses', but it contains
+ * 'indexquals' is a list of sub-lists of the actual index qual conditions
- * the actual indexqual conditions that can be used with the index.
+ * that can be used with the index.  There is one possibly-empty sub-list
- * In simple cases this is identical to 'indexclauses', but when special
+ * for each index column (but empty sub-lists for trailing columns can be
- * indexable operators appear in 'indexclauses', they are replaced by the
+ * omitted).  The qual conditions are RestrictInfos, and in simple cases
- * derived indexscannable conditions in 'indexquals'.
+ * are the same RestrictInfos that appear in the flat indexclauses list.
- *
+ * But when special indexable operators appear in 'indexclauses', they are
- * 'indexorderbys', if not NIL, is a list of ORDER BY expressions that have
+ * replaced by their derived indexscannable conditions in 'indexquals'.
- * been found to be usable as ordering operators for an amcanorderbyop index.
+ * Note that an entirely empty indexquals list denotes a full-index scan.
- * Note that these are not RestrictInfos, just bare expressions, since they
+ *
- * generally won't yield booleans.  The list will match the path's pathkeys.
+ * 'indexorderbys', if not NIL, is a list of lists of lists of ORDER BY
- * Also, unlike the case for quals, it's guaranteed that each expression has
+ * expressions that have been found to be usable as ordering operators for an
- * the index key on the left side of the operator.
+ * amcanorderbyop index.  These are not RestrictInfos, just bare expressions,
+ * since they generally won't yield booleans.  Also, unlike the case for
+ * quals, it's guaranteed that each expression has the index key on the left
+ * side of the operator.  The top list has one entry per pathkey in the
+ * path's pathkeys, and the sub-lists have one sub-sublist per index column.
+ * This representation is a bit of overkill, since there will be only one
+ * actual expression per pathkey, but it's convenient because each sub-list
+ * has the same structure as the indexquals list.
 *
 * 'isjoininner' is TRUE if the path is a nestloop inner scan (that is,
 * some of the index conditions are join rather than restriction clauses).

--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -61,6 +61,12 @@ extern List *expand_indexqual_conditions(IndexOptInfo *index,
 							List *clausegroups);
 extern void check_partial_indexes(PlannerInfo *root, RelOptInfo *rel);
 extern List *flatten_clausegroups_list(List *clausegroups);
+extern List *flatten_indexorderbys_list(List *indexorderbys);
+extern Expr *adjust_rowcompare_for_index(RowCompareExpr *clause,
+							IndexOptInfo *index,
+							int indexcol,
+							List **indexcolnos,
+							bool *var_on_left_p);
 /*
 * orindxpath.c

--- a/src/test/regress/expected/create_index.out
+++ b/src/test/regress/expected/create_index.out
@@ -2459,3 +2459,27 @@ RESET enable_seqscan;
 RESET enable_indexscan;
 RESET enable_bitmapscan;
 DROP TABLE onek_with_null;
+--
+-- Check behavior with duplicate index column contents
+--
+CREATE TABLE dupindexcols AS
+  SELECT unique1 as id, stringu2::text as f1 FROM tenk1;
+CREATE INDEX dupindexcols_i ON dupindexcols (f1, id, f1 text_pattern_ops);
+VACUUM ANALYZE dupindexcols;
+EXPLAIN (COSTS OFF)
+  SELECT count(*) FROM dupindexcols
+    WHERE f1 > 'LX' and id < 1000 and f1 ~<~ 'YX';
+                                   QUERY PLAN                                    
+---------------------------------------------------------------------------------
+ Aggregate
+   ->  Index Only Scan using dupindexcols_i on dupindexcols
+         Index Cond: ((f1 > 'LX'::text) AND (id < 1000) AND (f1 ~<~ 'YX'::text))
+(3 rows)
+SELECT count(*) FROM dupindexcols
+  WHERE f1 > 'LX' and id < 1000 and f1 ~<~ 'YX';
+ count 
+-------
+   500
+(1 row)
--- a/src/test/regress/expected/sanity_check.out
+++ b/src/test/regress/expected/sanity_check.out
@@ -39,6 +39,7 @@ SELECT relname, relhasindex
 default_tbl             | f
 defaultexpr_tbl         | f
 dept                    | f
+ dupindexcols            | t
 e_star                  | f
 emp                     | f
 equipment_r             | f
@@ -164,7 +165,7 @@ SELECT relname, relhasindex
 timetz_tbl              | f
 tinterval_tbl           | f
 varchar_tbl             | f
-(153 rows)
+(154 rows)
 --
 -- another sanity check: every system catalog that has OIDs should have

--- a/src/test/regress/output/misc.source
+++ b/src/test/regress/output/misc.source
@@ -610,6 +610,7 @@ SELECT user_relns() AS user_relns
 default_tbl
 defaultexpr_tbl
 dept
+ dupindexcols
 e_star
 emp
 equipment_r
@@ -685,7 +686,7 @@ SELECT user_relns() AS user_relns
 toyemp
 varchar_tbl
 xacttest
-(107 rows)
+(108 rows)
 SELECT name(equipment(hobby_construct(text 'skywalking', text 'mer')));
 name 

--- a/src/test/regress/sql/create_index.sql
+++ b/src/test/regress/sql/create_index.sql
@@ -804,3 +804,18 @@ RESET enable_indexscan;
 RESET enable_bitmapscan;
 DROP TABLE onek_with_null;
+--
+-- Check behavior with duplicate index column contents
+--
+CREATE TABLE dupindexcols AS
+  SELECT unique1 as id, stringu2::text as f1 FROM tenk1;
+CREATE INDEX dupindexcols_i ON dupindexcols (f1, id, f1 text_pattern_ops);
+VACUUM ANALYZE dupindexcols;
+EXPLAIN (COSTS OFF)
+  SELECT count(*) FROM dupindexcols
+    WHERE f1 > 'LX' and id < 1000 and f1 ~<~ 'YX';
+SELECT count(*) FROM dupindexcols
+  WHERE f1 > 'LX' and id < 1000 and f1 ~<~ 'YX';