Commit e2c2c2e8 authored by Tom Lane's avatar Tom Lane

Improve planner's handling of duplicated index column expressions.

It's potentially useful for an index to repeat the same indexable column
or expression in multiple index columns, if the columns have different
opclasses.  (If they share opclasses too, the duplicate column is pretty
useless, but nonetheless we've allowed such cases since 9.0.)  However,
the planner failed to cope with this, because createplan.c was relying on
simple equal() matching to figure out which index column each index qual
is intended for.  We do have that information available upstream in
indxpath.c, though, so the fix is to not flatten the multi-level indexquals
list when putting it into an IndexPath.  Then we can rely on the sublist
structure to identify target index columns in createplan.c.  There's a
similar issue for index ORDER BYs (the KNNGIST feature), so introduce a
multi-level-list representation for that too.  This adds a bit more
representational overhead, but we might more or less buy that back by not
having to search for matching index columns anymore in createplan.c;
likewise btcostestimate saves some cycles.

Per bug #6351 from Christian Rudolph.  Likely symptoms include the "btree
index keys must be ordered by attribute" failure shown there, as well as
"operator MMMM is not a member of opfamily NNNN".

Although this is a pre-existing problem that can be demonstrated in 9.0 and
9.1, I'm not going to back-patch it, because the API changes in the planner
seem likely to break things such as index plugins.  The corner cases where
this matters seem too narrow to justify possibly breaking things in a minor
release.
parent d5448c7d
...@@ -209,8 +209,10 @@ cost_seqscan(Path *path, PlannerInfo *root, ...@@ -209,8 +209,10 @@ cost_seqscan(Path *path, PlannerInfo *root,
* Determines and returns the cost of scanning a relation using an index. * Determines and returns the cost of scanning a relation using an index.
* *
* 'index' is the index to be used * 'index' is the index to be used
* 'indexQuals' is the list of applicable qual clauses (implicit AND semantics) * 'indexQuals' is a list of lists of applicable qual clauses (implicit AND
* 'indexOrderBys' is the list of ORDER BY operators for amcanorderbyop indexes * semantics, one sub-list per index column)
* 'indexOrderBys' is a list of lists of lists of ORDER BY expressions for
* amcanorderbyop indexes (lists per pathkey and index column)
* 'indexonly' is true if it's an index-only scan * 'indexonly' is true if it's an index-only scan
* 'outer_rel' is the outer relation when we are considering using the index * 'outer_rel' is the outer relation when we are considering using the index
* scan as the inside of a nestloop join (hence, some of the indexQuals * scan as the inside of a nestloop join (hence, some of the indexQuals
...@@ -221,8 +223,8 @@ cost_seqscan(Path *path, PlannerInfo *root, ...@@ -221,8 +223,8 @@ cost_seqscan(Path *path, PlannerInfo *root,
* additional fields of the IndexPath besides startup_cost and total_cost. * additional fields of the IndexPath besides startup_cost and total_cost.
* These fields are needed if the IndexPath is used in a BitmapIndexScan. * These fields are needed if the IndexPath is used in a BitmapIndexScan.
* *
* indexQuals is a list of RestrictInfo nodes, but indexOrderBys is a list of * indexQuals is a list of lists of RestrictInfo nodes, but indexOrderBys
* bare expressions. * is a list of lists of lists of bare expressions.
* *
* NOTE: 'indexQuals' must contain only clauses usable as index restrictions. * NOTE: 'indexQuals' must contain only clauses usable as index restrictions.
* Any additional quals evaluated as qpquals may reduce the number of returned * Any additional quals evaluated as qpquals may reduce the number of returned
......
This diff is collapsed.
This diff is collapsed.
...@@ -412,8 +412,8 @@ create_seqscan_path(PlannerInfo *root, RelOptInfo *rel) ...@@ -412,8 +412,8 @@ create_seqscan_path(PlannerInfo *root, RelOptInfo *rel)
* 'index' is a usable index. * 'index' is a usable index.
* 'clause_groups' is a list of lists of RestrictInfo nodes * 'clause_groups' is a list of lists of RestrictInfo nodes
* to be used as index qual conditions in the scan. * to be used as index qual conditions in the scan.
* 'indexorderbys' is a list of bare expressions (no RestrictInfos) * 'indexorderbys' is a list of lists of lists of bare expressions (not
* to be used as index ordering operators in the scan. * RestrictInfos) to be used as index ordering operators.
* 'pathkeys' describes the ordering of the path. * 'pathkeys' describes the ordering of the path.
* 'indexscandir' is ForwardScanDirection or BackwardScanDirection * 'indexscandir' is ForwardScanDirection or BackwardScanDirection
* for an ordered index, or NoMovementScanDirection for * for an ordered index, or NoMovementScanDirection for
......
...@@ -630,7 +630,7 @@ extract_actual_join_clauses(List *restrictinfo_list, ...@@ -630,7 +630,7 @@ extract_actual_join_clauses(List *restrictinfo_list,
* being used in an inner indexscan need not be checked again at the join. * being used in an inner indexscan need not be checked again at the join.
* *
* "Redundant" means either equal() or derived from the same EquivalenceClass. * "Redundant" means either equal() or derived from the same EquivalenceClass.
* We have to check the latter because indxqual.c may select different derived * We have to check the latter because indxpath.c may select different derived
* clauses than were selected by generate_join_implied_equalities(). * clauses than were selected by generate_join_implied_equalities().
* *
* Note that we are *not* checking for local redundancies within the given * Note that we are *not* checking for local redundancies within the given
......
...@@ -5991,6 +5991,14 @@ genericcostestimate(PlannerInfo *root, ...@@ -5991,6 +5991,14 @@ genericcostestimate(PlannerInfo *root,
List *selectivityQuals; List *selectivityQuals;
ListCell *l; ListCell *l;
/*
* For our purposes here, it doesn't matter which index columns the
* individual quals and order-by expressions go with, so flatten the
* lists for convenience.
*/
indexQuals = flatten_clausegroups_list(indexQuals);
indexOrderBys = flatten_indexorderbys_list(indexOrderBys);
/*---------- /*----------
* If the index is partial, AND the index predicate with the explicitly * If the index is partial, AND the index predicate with the explicitly
* given indexquals to produce a more accurate idea of the index * given indexquals to produce a more accurate idea of the index
...@@ -6022,7 +6030,7 @@ genericcostestimate(PlannerInfo *root, ...@@ -6022,7 +6030,7 @@ genericcostestimate(PlannerInfo *root,
if (!predicate_implied_by(oneQual, indexQuals)) if (!predicate_implied_by(oneQual, indexQuals))
predExtraQuals = list_concat(predExtraQuals, oneQual); predExtraQuals = list_concat(predExtraQuals, oneQual);
} }
/* list_concat avoids modifying the passed-in indexQuals list */ /* list_concat avoids modifying the indexQuals list */
selectivityQuals = list_concat(predExtraQuals, indexQuals); selectivityQuals = list_concat(predExtraQuals, indexQuals);
} }
else else
...@@ -6250,7 +6258,7 @@ btcostestimate(PG_FUNCTION_ARGS) ...@@ -6250,7 +6258,7 @@ btcostestimate(PG_FUNCTION_ARGS)
bool found_saop; bool found_saop;
bool found_is_null_op; bool found_is_null_op;
double num_sa_scans; double num_sa_scans;
ListCell *l; ListCell *lc1;
/* /*
* For a btree scan, only leading '=' quals plus inequality quals for the * For a btree scan, only leading '=' quals plus inequality quals for the
...@@ -6259,8 +6267,7 @@ btcostestimate(PG_FUNCTION_ARGS) ...@@ -6259,8 +6267,7 @@ btcostestimate(PG_FUNCTION_ARGS)
* the index scan). Additional quals can suppress visits to the heap, so * the index scan). Additional quals can suppress visits to the heap, so
* it's OK to count them in indexSelectivity, but they should not count * it's OK to count them in indexSelectivity, but they should not count
* for estimating numIndexTuples. So we must examine the given indexQuals * for estimating numIndexTuples. So we must examine the given indexQuals
* to find out which ones count as boundary quals. We rely on the * to find out which ones count as boundary quals.
* knowledge that they are given in index column order.
* *
* For a RowCompareExpr, we consider only the first column, just as * For a RowCompareExpr, we consider only the first column, just as
* rowcomparesel() does. * rowcomparesel() does.
...@@ -6270,14 +6277,25 @@ btcostestimate(PG_FUNCTION_ARGS) ...@@ -6270,14 +6277,25 @@ btcostestimate(PG_FUNCTION_ARGS)
* considered to act the same as it normally does. * considered to act the same as it normally does.
*/ */
indexBoundQuals = NIL; indexBoundQuals = NIL;
indexcol = 0;
eqQualHere = false; eqQualHere = false;
found_saop = false; found_saop = false;
found_is_null_op = false; found_is_null_op = false;
num_sa_scans = 1; num_sa_scans = 1;
foreach(l, indexQuals)
/* clausegroups must correspond to index columns */
Assert(list_length(indexQuals) <= index->ncolumns);
indexcol = 0;
foreach(lc1, indexQuals)
{ {
RestrictInfo *rinfo = (RestrictInfo *) lfirst(l); List *clausegroup = (List *) lfirst(lc1);
ListCell *lc2;
eqQualHere = false;
foreach(lc2, clausegroup)
{
RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc2);
Expr *clause; Expr *clause;
Node *leftop, Node *leftop,
*rightop; *rightop;
...@@ -6329,37 +6347,18 @@ btcostestimate(PG_FUNCTION_ARGS) ...@@ -6329,37 +6347,18 @@ btcostestimate(PG_FUNCTION_ARGS)
(int) nodeTag(clause)); (int) nodeTag(clause));
continue; /* keep compiler quiet */ continue; /* keep compiler quiet */
} }
if (match_index_to_operand(leftop, indexcol, index)) if (match_index_to_operand(leftop, indexcol, index))
{ {
/* clause_op is correct */ /* clause_op is correct */
} }
else if (match_index_to_operand(rightop, indexcol, index))
{
/* Must flip operator to get the opfamily member */
clause_op = get_commutator(clause_op);
}
else else
{ {
/* Must be past the end of quals for indexcol, try next */ Assert(match_index_to_operand(rightop, indexcol, index));
if (!eqQualHere)
break; /* done if no '=' qual for indexcol */
indexcol++;
eqQualHere = false;
if (match_index_to_operand(leftop, indexcol, index))
{
/* clause_op is correct */
}
else if (match_index_to_operand(rightop, indexcol, index))
{
/* Must flip operator to get the opfamily member */ /* Must flip operator to get the opfamily member */
clause_op = get_commutator(clause_op); clause_op = get_commutator(clause_op);
} }
else
{
/* No quals for new indexcol, so we are done */
break;
}
}
/* check for equality operator */ /* check for equality operator */
if (OidIsValid(clause_op)) if (OidIsValid(clause_op))
{ {
...@@ -6371,7 +6370,7 @@ btcostestimate(PG_FUNCTION_ARGS) ...@@ -6371,7 +6370,7 @@ btcostestimate(PG_FUNCTION_ARGS)
} }
else if (is_null_op) else if (is_null_op)
{ {
/* IS NULL is like = for purposes of selectivity determination */ /* IS NULL is like = for selectivity determination */
eqQualHere = true; eqQualHere = true;
} }
/* count up number of SA scans induced by indexBoundQuals only */ /* count up number of SA scans induced by indexBoundQuals only */
...@@ -6386,6 +6385,13 @@ btcostestimate(PG_FUNCTION_ARGS) ...@@ -6386,6 +6385,13 @@ btcostestimate(PG_FUNCTION_ARGS)
indexBoundQuals = lappend(indexBoundQuals, rinfo); indexBoundQuals = lappend(indexBoundQuals, rinfo);
} }
/* Done with this indexcol, continue to next only if it had = qual */
if (!eqQualHere)
break;
indexcol++;
}
/* /*
* If index is unique and we found an '=' clause for each column, we can * If index is unique and we found an '=' clause for each column, we can
* just assume numIndexTuples = 1 and skip the expensive * just assume numIndexTuples = 1 and skip the expensive
...@@ -6393,7 +6399,7 @@ btcostestimate(PG_FUNCTION_ARGS) ...@@ -6393,7 +6399,7 @@ btcostestimate(PG_FUNCTION_ARGS)
* NullTest invalidates that theory, even though it sets eqQualHere. * NullTest invalidates that theory, even though it sets eqQualHere.
*/ */
if (index->unique && if (index->unique &&
indexcol == index->ncolumns - 1 && indexcol == index->ncolumns &&
eqQualHere && eqQualHere &&
!found_saop && !found_saop &&
!found_is_null_op) !found_is_null_op)
...@@ -6924,6 +6930,14 @@ gincostestimate(PG_FUNCTION_ARGS) ...@@ -6924,6 +6930,14 @@ gincostestimate(PG_FUNCTION_ARGS)
Relation indexRel; Relation indexRel;
GinStatsData ginStats; GinStatsData ginStats;
/*
* For our purposes here, it doesn't matter which index columns the
* individual quals and order-by expressions go with, so flatten the
* lists for convenience.
*/
indexQuals = flatten_clausegroups_list(indexQuals);
indexOrderBys = flatten_indexorderbys_list(indexOrderBys);
/* /*
* Obtain statistic information from the meta page * Obtain statistic information from the meta page
*/ */
...@@ -6980,7 +6994,7 @@ gincostestimate(PG_FUNCTION_ARGS) ...@@ -6980,7 +6994,7 @@ gincostestimate(PG_FUNCTION_ARGS)
if (!predicate_implied_by(oneQual, indexQuals)) if (!predicate_implied_by(oneQual, indexQuals))
predExtraQuals = list_concat(predExtraQuals, oneQual); predExtraQuals = list_concat(predExtraQuals, oneQual);
} }
/* list_concat avoids modifying the passed-in indexQuals list */ /* list_concat avoids modifying the indexQuals list */
selectivityQuals = list_concat(predExtraQuals, indexQuals); selectivityQuals = list_concat(predExtraQuals, indexQuals);
} }
else else
......
...@@ -659,18 +659,25 @@ typedef struct Path ...@@ -659,18 +659,25 @@ typedef struct Path
* AND semantics across the list. Each clause is a RestrictInfo node from * AND semantics across the list. Each clause is a RestrictInfo node from
* the query's WHERE or JOIN conditions. * the query's WHERE or JOIN conditions.
* *
* 'indexquals' has the same structure as 'indexclauses', but it contains * 'indexquals' is a list of sub-lists of the actual index qual conditions
* the actual indexqual conditions that can be used with the index. * that can be used with the index. There is one possibly-empty sub-list
* In simple cases this is identical to 'indexclauses', but when special * for each index column (but empty sub-lists for trailing columns can be
* indexable operators appear in 'indexclauses', they are replaced by the * omitted). The qual conditions are RestrictInfos, and in simple cases
* derived indexscannable conditions in 'indexquals'. * are the same RestrictInfos that appear in the flat indexclauses list.
* * But when special indexable operators appear in 'indexclauses', they are
* 'indexorderbys', if not NIL, is a list of ORDER BY expressions that have * replaced by their derived indexscannable conditions in 'indexquals'.
* been found to be usable as ordering operators for an amcanorderbyop index. * Note that an entirely empty indexquals list denotes a full-index scan.
* Note that these are not RestrictInfos, just bare expressions, since they *
* generally won't yield booleans. The list will match the path's pathkeys. * 'indexorderbys', if not NIL, is a list of lists of lists of ORDER BY
* Also, unlike the case for quals, it's guaranteed that each expression has * expressions that have been found to be usable as ordering operators for an
* the index key on the left side of the operator. * amcanorderbyop index. These are not RestrictInfos, just bare expressions,
* since they generally won't yield booleans. Also, unlike the case for
* quals, it's guaranteed that each expression has the index key on the left
* side of the operator. The top list has one entry per pathkey in the
* path's pathkeys, and the sub-lists have one sub-sublist per index column.
* This representation is a bit of overkill, since there will be only one
* actual expression per pathkey, but it's convenient because each sub-list
* has the same structure as the indexquals list.
* *
* 'isjoininner' is TRUE if the path is a nestloop inner scan (that is, * 'isjoininner' is TRUE if the path is a nestloop inner scan (that is,
* some of the index conditions are join rather than restriction clauses). * some of the index conditions are join rather than restriction clauses).
......
...@@ -61,6 +61,12 @@ extern List *expand_indexqual_conditions(IndexOptInfo *index, ...@@ -61,6 +61,12 @@ extern List *expand_indexqual_conditions(IndexOptInfo *index,
List *clausegroups); List *clausegroups);
extern void check_partial_indexes(PlannerInfo *root, RelOptInfo *rel); extern void check_partial_indexes(PlannerInfo *root, RelOptInfo *rel);
extern List *flatten_clausegroups_list(List *clausegroups); extern List *flatten_clausegroups_list(List *clausegroups);
extern List *flatten_indexorderbys_list(List *indexorderbys);
extern Expr *adjust_rowcompare_for_index(RowCompareExpr *clause,
IndexOptInfo *index,
int indexcol,
List **indexcolnos,
bool *var_on_left_p);
/* /*
* orindxpath.c * orindxpath.c
......
...@@ -2459,3 +2459,27 @@ RESET enable_seqscan; ...@@ -2459,3 +2459,27 @@ RESET enable_seqscan;
RESET enable_indexscan; RESET enable_indexscan;
RESET enable_bitmapscan; RESET enable_bitmapscan;
DROP TABLE onek_with_null; DROP TABLE onek_with_null;
--
-- Check behavior with duplicate index column contents
--
CREATE TABLE dupindexcols AS
SELECT unique1 as id, stringu2::text as f1 FROM tenk1;
CREATE INDEX dupindexcols_i ON dupindexcols (f1, id, f1 text_pattern_ops);
VACUUM ANALYZE dupindexcols;
EXPLAIN (COSTS OFF)
SELECT count(*) FROM dupindexcols
WHERE f1 > 'LX' and id < 1000 and f1 ~<~ 'YX';
QUERY PLAN
---------------------------------------------------------------------------------
Aggregate
-> Index Only Scan using dupindexcols_i on dupindexcols
Index Cond: ((f1 > 'LX'::text) AND (id < 1000) AND (f1 ~<~ 'YX'::text))
(3 rows)
SELECT count(*) FROM dupindexcols
WHERE f1 > 'LX' and id < 1000 and f1 ~<~ 'YX';
count
-------
500
(1 row)
...@@ -39,6 +39,7 @@ SELECT relname, relhasindex ...@@ -39,6 +39,7 @@ SELECT relname, relhasindex
default_tbl | f default_tbl | f
defaultexpr_tbl | f defaultexpr_tbl | f
dept | f dept | f
dupindexcols | t
e_star | f e_star | f
emp | f emp | f
equipment_r | f equipment_r | f
...@@ -164,7 +165,7 @@ SELECT relname, relhasindex ...@@ -164,7 +165,7 @@ SELECT relname, relhasindex
timetz_tbl | f timetz_tbl | f
tinterval_tbl | f tinterval_tbl | f
varchar_tbl | f varchar_tbl | f
(153 rows) (154 rows)
-- --
-- another sanity check: every system catalog that has OIDs should have -- another sanity check: every system catalog that has OIDs should have
......
...@@ -610,6 +610,7 @@ SELECT user_relns() AS user_relns ...@@ -610,6 +610,7 @@ SELECT user_relns() AS user_relns
default_tbl default_tbl
defaultexpr_tbl defaultexpr_tbl
dept dept
dupindexcols
e_star e_star
emp emp
equipment_r equipment_r
...@@ -685,7 +686,7 @@ SELECT user_relns() AS user_relns ...@@ -685,7 +686,7 @@ SELECT user_relns() AS user_relns
toyemp toyemp
varchar_tbl varchar_tbl
xacttest xacttest
(107 rows) (108 rows)
SELECT name(equipment(hobby_construct(text 'skywalking', text 'mer'))); SELECT name(equipment(hobby_construct(text 'skywalking', text 'mer')));
name name
......
...@@ -804,3 +804,18 @@ RESET enable_indexscan; ...@@ -804,3 +804,18 @@ RESET enable_indexscan;
RESET enable_bitmapscan; RESET enable_bitmapscan;
DROP TABLE onek_with_null; DROP TABLE onek_with_null;
--
-- Check behavior with duplicate index column contents
--
CREATE TABLE dupindexcols AS
SELECT unique1 as id, stringu2::text as f1 FROM tenk1;
CREATE INDEX dupindexcols_i ON dupindexcols (f1, id, f1 text_pattern_ops);
VACUUM ANALYZE dupindexcols;
EXPLAIN (COSTS OFF)
SELECT count(*) FROM dupindexcols
WHERE f1 > 'LX' and id < 1000 and f1 ~<~ 'YX';
SELECT count(*) FROM dupindexcols
WHERE f1 > 'LX' and id < 1000 and f1 ~<~ 'YX';
Markdown is supported
0% or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment