Commit dd4134ea authored by Tom Lane

Revisit handling of UNION ALL subqueries with non-Var output columns.

In commit 57664ed2 I tried to fix a bug
reported by Teodor Sigaev by making non-simple-Var output columns distinct
(by wrapping their expressions with dummy PlaceHolderVar nodes).  This did
not work too well.  Commit b28ffd0f fixed
some ensuing problems with matching to child indexes, but per a recent
report from Claus Stadler, constraint exclusion of UNION ALL subqueries was
still broken, because constant-simplification didn't handle the injected
PlaceHolderVars well either.  On reflection, the original patch was quite
misguided: there is no reason to expect that EquivalenceClass child members
will be distinct.  So instead of trying to make them so, we should ensure
that we can cope with the situation when they're not.

Accordingly, this patch reverts the code changes in the above-mentioned
commits (though the regression test cases they added stay).  Instead, I've
added assorted defenses to make sure that duplicate EC child members don't
cause any problems.  Teodor's original problem ("MergeAppend child's
targetlist doesn't match MergeAppend") is addressed more directly by
revising prepare_sort_from_pathkeys to let the parent MergeAppend's sort
list guide creation of each child's sort list.

In passing, get rid of add_sort_column; as far as I can tell, testing for
duplicate sort keys at this stage is dead code.  Certainly it doesn't
trigger often enough to be worth expending cycles on in ordinary queries.
And keeping the test would've greatly complicated the new logic in
prepare_sort_from_pathkeys, because comparing pathkey list entries against
a previous output array requires that we not skip any entries in the list.

Back-patch to 9.1, like the previous patches.  The only known issue in
this area that wasn't caused by the ill-advised previous patches was the
MergeAppend planning failure, which of course is not relevant before 9.1.
It's possible that we need some of the new defenses against duplicate child
EC entries in older branches, but until there's some clear evidence of that
I'm going to refrain from back-patching further.
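
The MergeAppend fix described in the message can be pictured with a toy model. This is only a sketch, not the planner's code: `ToyTlist`, `find_col_by_expr`, and `match_required_col` are hypothetical names, and expressions are reduced to plain strings. It illustrates why, when a child's target list contains duplicate expressions, searching the child's list by expression may settle on a different column than the parent chose, while checking the column position the parent already picked cannot.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a target list: one expression label per column. */
typedef struct ToyTlist
{
    const char **exprs;         /* exprs[i] is the expression of column i+1 */
    int          ncols;
} ToyTlist;

/*
 * Naive approach: return the first column (1-based) whose expression
 * matches, or 0 if none does.  With duplicate expressions this always
 * returns the earliest duplicate.
 */
static int
find_col_by_expr(const ToyTlist *tlist, const char *expr)
{
    assert(tlist != NULL && expr != NULL);
    for (int i = 0; i < tlist->ncols; i++)
    {
        if (strcmp(tlist->exprs[i], expr) == 0)
            return i + 1;
    }
    return 0;
}

/*
 * Parent-guided approach: only the column position the parent already
 * chose (reqcol) is considered.  It is accepted if its expression equals
 * one of the acceptable sort-key expressions; otherwise return 0.
 */
static int
match_required_col(const ToyTlist *tlist, int reqcol,
                   const char **members, int nmembers)
{
    assert(tlist != NULL && members != NULL);
    if (reqcol < 1 || reqcol > tlist->ncols)
        return 0;
    for (int i = 0; i < nmembers; i++)
    {
        if (strcmp(tlist->exprs[reqcol - 1], members[i]) == 0)
            return reqcol;
    }
    return 0;
}
```

For a child whose two output columns are both `b.unique2`, the naive search reports column 1 for the second sort key as well, whereas the parent-guided check confirms column 2, so parent and child sort lists stay aligned.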
parent aef5fe7e
......@@ -496,6 +496,14 @@ it's possible that it belongs to more than one. We keep track of all the
families to ensure that we can make use of an index belonging to any one of
the families for mergejoin purposes.)
+ An EquivalenceClass can contain "em_is_child" members, which are copies
+ of members that contain appendrel parent relation Vars, transposed to
+ contain the equivalent child-relation variables or expressions. These
+ members are *not* full-fledged members of the EquivalenceClass and do not
+ affect the class's overall properties at all. They are kept only to
+ simplify matching of child-relation expressions to EquivalenceClasses.
+ Most operations on EquivalenceClasses should ignore child members.
PathKeys
--------
......
......@@ -491,6 +491,15 @@ add_eq_member(EquivalenceClass *ec, Expr *expr, Relids relids,
* sortref is the SortGroupRef of the originating SortGroupClause, if any,
* or zero if not. (It should never be zero if the expression is volatile!)
*
+ * If rel is not NULL, it identifies a specific relation we're considering
+ * a path for, and indicates that child EC members for that relation can be
+ * considered. Otherwise child members are ignored. (Note: since child EC
+ * members aren't guaranteed unique, a non-NULL value means that there could
+ * be more than one EC that matches the expression; if so it's order-dependent
+ * which one you get. This is annoying but it only happens in corner cases,
+ * so for now we live with just reporting the first match. See also
+ * generate_implied_equalities_for_indexcol and match_pathkeys_to_index.)
+ *
* If create_it is TRUE, we'll build a new EquivalenceClass when there is no
* match. If create_it is FALSE, we just return NULL when no match.
*
......@@ -511,6 +520,7 @@ get_eclass_for_sort_expr(PlannerInfo *root,
Oid opcintype,
Oid collation,
Index sortref,
+ Relids rel,
bool create_it)
{
EquivalenceClass *newec;
......@@ -548,6 +558,13 @@ get_eclass_for_sort_expr(PlannerInfo *root,
{
EquivalenceMember *cur_em = (EquivalenceMember *) lfirst(lc2);
+ /*
+ * Ignore child members unless they match the request.
+ */
+ if (cur_em->em_is_child &&
+ !bms_equal(cur_em->em_relids, rel))
+ continue;
/*
* If below an outer join, don't match constants: they're not as
* constant as they look.
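
A minimal sketch of the child-member filter and of the "first match wins" behavior that the get_eclass_for_sort_expr comment warns about. Everything here is a toy model: `ToyRelids` (a plain bitmask standing in for Relids), `ToyMember`, `ToyEC`, and `toy_find_ec` are hypothetical, expressions are strings, and `rel == 0` plays the role of `rel == NULL`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef unsigned int ToyRelids;     /* bit i set = relation i (toy Relids) */

typedef struct ToyMember
{
    const char *expr;               /* expression, reduced to a label */
    ToyRelids   relids;             /* relations the expression references */
    bool        is_child;           /* analogous to em_is_child */
} ToyMember;

typedef struct ToyEC
{
    const ToyMember *members;
    int              nmembers;
} ToyEC;

/*
 * Return the index of the first EC having a usable member equal to expr,
 * or -1 if there is none.  Child members are usable only when their
 * relids equal the requested rel; rel == 0 means "ignore all children".
 */
static int
toy_find_ec(const ToyEC *ecs, int necs, const char *expr, ToyRelids rel)
{
    assert(ecs != NULL && expr != NULL);
    for (int i = 0; i < necs; i++)
    {
        for (int j = 0; j < ecs[i].nmembers; j++)
        {
            const ToyMember *em = &ecs[i].members[j];

            /* Ignore child members unless they match the request */
            if (em->is_child && em->relids != rel)
                continue;
            if (strcmp(em->expr, expr) == 0)
                return i;
        }
    }
    return -1;
}
```

If the same child expression appears as a child member of two classes, a lookup with the child's relids returns whichever class comes first in the list, and a lookup without a rel finds neither.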
......@@ -1505,6 +1522,7 @@ reconsider_outer_join_clause(PlannerInfo *root, RestrictInfo *rinfo,
{
EquivalenceMember *cur_em = (EquivalenceMember *) lfirst(lc2);
+ Assert(!cur_em->em_is_child); /* no children yet */
if (equal(outervar, cur_em->em_expr))
{
match = true;
......@@ -1626,6 +1644,7 @@ reconsider_full_join_clause(PlannerInfo *root, RestrictInfo *rinfo)
foreach(lc2, cur_ec->ec_members)
{
coal_em = (EquivalenceMember *) lfirst(lc2);
+ Assert(!coal_em->em_is_child); /* no children yet */
if (IsA(coal_em->em_expr, CoalesceExpr))
{
CoalesceExpr *cexpr = (CoalesceExpr *) coal_em->em_expr;
......@@ -1747,6 +1766,8 @@ exprs_known_equal(PlannerInfo *root, Node *item1, Node *item2)
{
EquivalenceMember *em = (EquivalenceMember *) lfirst(lc2);
+ if (em->em_is_child)
+ continue; /* ignore children here */
if (equal(item1, em->em_expr))
item1member = true;
else if (equal(item2, em->em_expr))
......@@ -1800,6 +1821,9 @@ add_child_rel_equivalences(PlannerInfo *root,
{
EquivalenceMember *cur_em = (EquivalenceMember *) lfirst(lc2);
+ if (cur_em->em_is_child)
+ continue; /* ignore children here */
/* Does it reference (only) parent_rel? */
if (bms_equal(cur_em->em_relids, parent_rel->relids))
{
......@@ -1908,7 +1932,16 @@ generate_implied_equalities_for_indexcol(PlannerInfo *root,
!bms_is_subset(rel->relids, cur_ec->ec_relids))
continue;
- /* Scan members, looking for a match to the indexable column */
+ /*
+ * Scan members, looking for a match to the indexable column. Note
+ * that child EC members are considered, but only when they belong to
+ * the target relation. (Unlike regular members, the same expression
+ * could be a child member of more than one EC. Therefore, it's
+ * potentially order-dependent which EC a child relation's index
+ * column gets matched to. This is annoying but it only happens in
+ * corner cases, so for now we live with just reporting the first
+ * match. See also get_eclass_for_sort_expr.)
+ */
cur_em = NULL;
foreach(lc2, cur_ec->ec_members)
{
......@@ -1933,6 +1966,9 @@ generate_implied_equalities_for_indexcol(PlannerInfo *root,
Oid eq_op;
RestrictInfo *rinfo;
+ if (other_em->em_is_child)
+ continue; /* ignore children here */
/* Make sure it'll be a join to a different rel */
if (other_em == cur_em ||
bms_overlap(other_em->em_relids, rel->relids))
......@@ -2187,8 +2223,10 @@ eclass_useful_for_merging(EquivalenceClass *eclass,
{
EquivalenceMember *cur_em = (EquivalenceMember *) lfirst(lc);
- if (!cur_em->em_is_child &&
- !bms_overlap(cur_em->em_relids, rel->relids))
+ if (cur_em->em_is_child)
+ continue; /* ignore children here */
+ if (!bms_overlap(cur_em->em_relids, rel->relids))
return true;
}
......
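
The eclass_useful_for_merging hunk replaces a combined condition with an explicit skip of child members. As a rough check that this restructuring does not change behavior, here is a toy sketch of both forms. `MergeRelids` (a bitmask standing in for Relids), `MergeMember`, and both function names are hypothetical, and bitwise AND stands in for bms_overlap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned int MergeRelids;   /* toy Relids: one bit per relation */

typedef struct MergeMember
{
    MergeRelids relids;
    bool        is_child;           /* analogous to em_is_child */
} MergeMember;

/* Old shape: child test folded into the overlap test. */
static bool
useful_old_style(const MergeMember *members, int n, MergeRelids rel)
{
    assert(members != NULL || n == 0);
    for (int i = 0; i < n; i++)
    {
        if (!members[i].is_child &&
            (members[i].relids & rel) == 0)
            return true;
    }
    return false;
}

/* New shape: skip children first, then test overlap. */
static bool
useful_new_style(const MergeMember *members, int n, MergeRelids rel)
{
    assert(members != NULL || n == 0);
    for (int i = 0; i < n; i++)
    {
        if (members[i].is_child)
            continue;               /* ignore children here */
        if ((members[i].relids & rel) == 0)
            return true;
    }
    return false;
}
```

In this model the two functions agree on every input; the new shape simply matches the "ignore children here" idiom used in the other loops of this patch.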
......@@ -2157,7 +2157,14 @@ match_pathkeys_to_index(IndexOptInfo *index, List *pathkeys,
if (pathkey->pk_eclass->ec_has_volatile)
return;
- /* Try to match eclass member expression(s) to index */
+ /*
+ * Try to match eclass member expression(s) to index. Note that child
+ * EC members are considered, but only when they belong to the target
+ * relation. (Unlike regular members, the same expression could be a
+ * child member of more than one EC. Therefore, the same index could
+ * be considered to match more than one pathkey list, which is OK
+ * here. See also get_eclass_for_sort_expr.)
+ */
foreach(lc2, pathkey->pk_eclass->ec_members)
{
EquivalenceMember *member = (EquivalenceMember *) lfirst(lc2);
......@@ -2580,15 +2587,6 @@ match_index_to_operand(Node *operand,
{
int indkey;
- /*
- * Ignore any PlaceHolderVar nodes above the operand. This is needed so
- * that we can successfully use expression-index constraints pushed down
- * through appendrels (UNION ALL). It's safe because a PlaceHolderVar
- * appearing in a relation-scan-level expression is certainly a no-op.
- */
- while (operand && IsA(operand, PlaceHolderVar))
- operand = (Node *) ((PlaceHolderVar *) operand)->phexpr;
/*
* Ignore any RelabelType node above the operand. This is needed to be
* able to apply indexscanning in binary-compatible-operator cases. Note:
......
......@@ -221,6 +221,11 @@ canonicalize_pathkeys(PlannerInfo *root, List *pathkeys)
* If the PathKey is being generated from a SortGroupClause, sortref should be
* the SortGroupClause's SortGroupRef; otherwise zero.
*
+ * If rel is not NULL, it identifies a specific relation we're considering
+ * a path for, and indicates that child EC members for that relation can be
+ * considered. Otherwise child members are ignored. (See the comments for
+ * get_eclass_for_sort_expr.)
+ *
* create_it is TRUE if we should create any missing EquivalenceClass
* needed to represent the sort key. If it's FALSE, we return NULL if the
* sort key isn't already present in any EquivalenceClass.
......@@ -237,6 +242,7 @@ make_pathkey_from_sortinfo(PlannerInfo *root,
bool reverse_sort,
bool nulls_first,
Index sortref,
+ Relids rel,
bool create_it,
bool canonicalize)
{
......@@ -268,7 +274,7 @@ make_pathkey_from_sortinfo(PlannerInfo *root,
/* Now find or (optionally) create a matching EquivalenceClass */
eclass = get_eclass_for_sort_expr(root, expr, opfamilies,
opcintype, collation,
- sortref, create_it);
+ sortref, rel, create_it);
/* Fail if no EC and !create_it */
if (!eclass)
......@@ -320,6 +326,7 @@ make_pathkey_from_sortop(PlannerInfo *root,
(strategy == BTGreaterStrategyNumber),
nulls_first,
sortref,
+ NULL,
create_it,
canonicalize);
}
......@@ -546,6 +553,7 @@ build_index_pathkeys(PlannerInfo *root,
reverse_sort,
nulls_first,
0,
+ index->rel->relids,
false,
true);
......@@ -636,6 +644,7 @@ convert_subquery_pathkeys(PlannerInfo *root, RelOptInfo *rel,
sub_member->em_datatype,
sub_eclass->ec_collation,
0,
+ rel->relids,
false);
/*
......@@ -680,6 +689,9 @@ convert_subquery_pathkeys(PlannerInfo *root, RelOptInfo *rel,
Oid sub_expr_coll = sub_eclass->ec_collation;
ListCell *k;
+ if (sub_member->em_is_child)
+ continue; /* ignore children here */
foreach(k, sub_tlist)
{
TargetEntry *tle = (TargetEntry *) lfirst(k);
......@@ -719,6 +731,7 @@ convert_subquery_pathkeys(PlannerInfo *root, RelOptInfo *rel,
sub_expr_type,
sub_expr_coll,
0,
+ rel->relids,
false);
/*
......@@ -910,6 +923,7 @@ initialize_mergeclause_eclasses(PlannerInfo *root, RestrictInfo *restrictinfo)
lefttype,
((OpExpr *) clause)->inputcollid,
0,
+ NULL,
true);
restrictinfo->right_ec =
get_eclass_for_sort_expr(root,
......@@ -918,6 +932,7 @@ initialize_mergeclause_eclasses(PlannerInfo *root, RestrictInfo *restrictinfo)
righttype,
((OpExpr *) clause)->inputcollid,
0,
+ NULL,
true);
}
......
This diff is collapsed.
......@@ -99,9 +99,10 @@ preprocess_minmax_aggregates(PlannerInfo *root, List *tlist)
* We also restrict the query to reference exactly one table, since join
* conditions can't be handled reasonably. (We could perhaps handle a
* query containing cartesian-product joins, but it hardly seems worth the
- * trouble.) However, the single real table could be buried in several
- * levels of FromExpr due to subqueries. Note the single table could be
- * an inheritance parent, too.
+ * trouble.) However, the single table could be buried in several levels
+ * of FromExpr due to subqueries. Note the "single" table could be an
+ * inheritance parent, too, including the case of a UNION ALL subquery
+ * that's been flattened to an appendrel.
*/
jtnode = parse->jointree;
while (IsA(jtnode, FromExpr))
......@@ -114,7 +115,11 @@ preprocess_minmax_aggregates(PlannerInfo *root, List *tlist)
return;
rtr = (RangeTblRef *) jtnode;
rte = planner_rt_fetch(rtr->rtindex, root);
- if (rte->rtekind != RTE_RELATION)
+ if (rte->rtekind == RTE_RELATION)
+ /* ordinary relation, ok */ ;
+ else if (rte->rtekind == RTE_SUBQUERY && rte->inh)
+ /* flattened UNION ALL subquery, ok */ ;
+ else
return;
/*
......
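
The relaxed range-table check in preprocess_minmax_aggregates can be read as a small predicate. The sketch below is a toy restatement under assumed types: `ToyRTEKind`, `ToyRTE`, and `toy_minmax_rte_ok` are hypothetical and carry only the two fields the check looks at.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced versions of the range-table entry kinds. */
typedef enum ToyRTEKind
{
    TOY_RTE_RELATION,
    TOY_RTE_SUBQUERY,
    TOY_RTE_FUNCTION
} ToyRTEKind;

typedef struct ToyRTE
{
    ToyRTEKind rtekind;
    bool       inh;     /* set when the entry is an appendrel parent */
} ToyRTE;

/* Return true if the min/max optimization may proceed for this entry. */
static bool
toy_minmax_rte_ok(const ToyRTE *rte)
{
    assert(rte != NULL);
    if (rte->rtekind == TOY_RTE_RELATION)
        return true;            /* ordinary relation, ok */
    if (rte->rtekind == TOY_RTE_SUBQUERY && rte->inh)
        return true;            /* flattened UNION ALL subquery, ok */
    return false;               /* anything else: give up */
}
```

Under this reading, a subquery entry qualifies only when it is marked as an appendrel parent; a plain subquery still causes the optimization to be skipped.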
......@@ -176,7 +176,7 @@ query_planner(PlannerInfo *root, List *tlist,
*/
build_base_rel_tlists(root, tlist);
- find_placeholders_in_query(root);
+ find_placeholders_in_jointree(root);
joinlist = deconstruct_jointree(root);
......
......@@ -866,15 +866,22 @@ pull_up_simple_subquery(PlannerInfo *root, Node *jtnode, RangeTblEntry *rte,
parse->havingQual = pullup_replace_vars(parse->havingQual, &rvcontext);
/*
- * Replace references in the translated_vars lists of appendrels, too.
- * We do it this way because we must preserve the AppendRelInfo structs.
+ * Replace references in the translated_vars lists of appendrels. When
+ * pulling up an appendrel member, we do not need PHVs in the list of the
+ * parent appendrel --- there isn't any outer join between. Elsewhere, use
+ * PHVs for safety. (This analysis could be made tighter but it seems
+ * unlikely to be worth much trouble.)
*/
foreach(lc, root->append_rel_list)
{
AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(lc);
+ bool save_need_phvs = rvcontext.need_phvs;
+ if (appinfo == containing_appendrel)
+ rvcontext.need_phvs = false;
appinfo->translated_vars = (List *)
pullup_replace_vars((Node *) appinfo->translated_vars, &rvcontext);
+ rvcontext.need_phvs = save_need_phvs;
}
/*
......@@ -1482,31 +1489,14 @@ pullup_replace_vars_callback(Var *var,
if (newnode && IsA(newnode, Var) &&
((Var *) newnode)->varlevelsup == 0)
{
- /*
- * Simple Vars normally escape being wrapped. However, in
- * wrap_non_vars mode (ie, we are dealing with an appendrel
- * member), we must ensure that each tlist entry expands to a
- * distinct expression, else we may have problems with
- * improperly placing identical entries into different
- * EquivalenceClasses. Therefore, we wrap a Var in a
- * PlaceHolderVar if it duplicates any earlier entry in the
- * tlist (ie, we've got "SELECT x, x, ..."). Since each PHV
- * is distinct, this fixes the ambiguity. We can use
- * tlist_member to detect whether there's an earlier
- * duplicate.
- */
- wrap = (rcon->wrap_non_vars &&
- tlist_member(newnode, rcon->targetlist) != tle);
+ /* Simple Vars always escape being wrapped */
+ wrap = false;
}
else if (newnode && IsA(newnode, PlaceHolderVar) &&
((PlaceHolderVar *) newnode)->phlevelsup == 0)
{
- /*
- * No need to directly wrap a PlaceHolderVar with another one,
- * either, unless we need to prevent duplication.
- */
- wrap = (rcon->wrap_non_vars &&
- tlist_member(newnode, rcon->targetlist) != tle);
+ /* No need to wrap a PlaceHolderVar with another one, either */
+ wrap = false;
}
else if (rcon->wrap_non_vars)
{
......
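
The pull_up_simple_subquery hunk uses a save/override/restore pattern around need_phvs so that wrapping is switched off only for the containing appendrel. A toy sketch of that pattern follows. `ToyReplaceContext`, `ToyAppendRel`, `toy_replace_vars`, and `toy_fix_translated_vars` are hypothetical; the stand-in replacement routine merely records whether wrapping was in effect when it ran.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct ToyReplaceContext
{
    bool need_phvs;     /* should replaced expressions be wrapped? */
} ToyReplaceContext;

typedef struct ToyAppendRel
{
    int  id;
    bool wrapped;       /* set by the toy replacement routine */
} ToyAppendRel;

/* Stand-in for the real replacement pass: just records the flag it saw. */
static void
toy_replace_vars(ToyAppendRel *appinfo, const ToyReplaceContext *ctx)
{
    appinfo->wrapped = ctx->need_phvs;
}

/*
 * Process every appendrel.  For the one we are pulling up into
 * (containing), temporarily disable wrapping; restore the caller's
 * setting after each iteration so later entries are unaffected.
 */
static void
toy_fix_translated_vars(ToyAppendRel *rels, int nrels,
                        const ToyAppendRel *containing,
                        ToyReplaceContext *ctx)
{
    assert(ctx != NULL);
    assert(rels != NULL || nrels == 0);
    for (int i = 0; i < nrels; i++)
    {
        bool save_need_phvs = ctx->need_phvs;

        if (&rels[i] == containing)
            ctx->need_phvs = false;
        toy_replace_vars(&rels[i], ctx);
        ctx->need_phvs = save_need_phvs;
    }
}
```

Restoring the flag inside the loop, rather than once afterwards, keeps the override scoped to the single matching entry regardless of where it sits in the list.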
......@@ -104,41 +104,28 @@ find_placeholder_info(PlannerInfo *root, PlaceHolderVar *phv,
}
/*
- * find_placeholders_in_query
- * Search the query for PlaceHolderVars, and build PlaceHolderInfos
+ * find_placeholders_in_jointree
+ * Search the jointree for PlaceHolderVars, and build PlaceHolderInfos
*
- * We need to examine the jointree, but not the targetlist, because
- * build_base_rel_tlists() will already have made entries for any PHVs
- * in the targetlist.
- *
- * We also need to search for PHVs in AppendRelInfo translated_vars
- * lists. In most cases, translated_vars entries aren't directly referenced
- * elsewhere, but we need to create PlaceHolderInfo entries for them to
- * support set_rel_width() calculations for the appendrel child relations.
+ * We don't need to look at the targetlist because build_base_rel_tlists()
+ * will already have made entries for any PHVs in the tlist.
*/
void
- find_placeholders_in_query(PlannerInfo *root)
+ find_placeholders_in_jointree(PlannerInfo *root)
{
/* We need do nothing if the query contains no PlaceHolderVars */
if (root->glob->lastPHId != 0)
{
- /* Recursively search the jointree */
+ /* Start recursion at top of jointree */
Assert(root->parse->jointree != NULL &&
IsA(root->parse->jointree, FromExpr));
(void) find_placeholders_recurse(root, (Node *) root->parse->jointree);
- /*
- * Also search the append_rel_list for translated vars that are PHVs.
- * Barring finding them elsewhere in the query, they do not need any
- * ph_may_need bits, only to be present in the PlaceHolderInfo list.
- */
- mark_placeholders_in_expr(root, (Node *) root->append_rel_list, NULL);
}
}
/*
* find_placeholders_recurse
- * One recursion level of jointree search for find_placeholders_in_query.
+ * One recursion level of find_placeholders_in_jointree.
*
* jtnode is the current jointree node to examine.
*
......
......@@ -572,12 +572,18 @@ typedef struct EquivalenceClass
* EquivalenceMember - one member expression of an EquivalenceClass
*
* em_is_child signifies that this element was built by transposing a member
- * for an inheritance parent relation to represent the corresponding expression
- * on an inheritance child. These elements are used for constructing
- * inner-indexscan paths for the child relation (other types of join are
- * driven from transposed joininfo-list entries) and for constructing
- * MergeAppend paths for the whole inheritance tree. Note that the EC's
- * ec_relids field does NOT include the child relation.
+ * for an appendrel parent relation to represent the corresponding expression
+ * for an appendrel child. These members are used for determining the
+ * pathkeys of scans on the child relation and for explicitly sorting the
+ * child when necessary to build a MergeAppend path for the whole appendrel
+ * tree. An em_is_child member has no impact on the properties of the EC as a
+ * whole; in particular the EC's ec_relids field does NOT include the child
+ * relation. An em_is_child member should never be marked em_is_const nor
+ * cause ec_has_const or ec_has_volatile to be set, either. Thus, em_is_child
+ * members are not really full-fledged members of the EC, but just reflections
+ * or doppelgangers of real members. Most operations on EquivalenceClasses
+ * should ignore em_is_child members, and those that don't should test
+ * em_relids to make sure they only consider relevant members.
*
* em_datatype is usually the same as exprType(em_expr), but can be
* different when dealing with a binary-compatible opfamily; in particular
......
......@@ -110,6 +110,7 @@ extern EquivalenceClass *get_eclass_for_sort_expr(PlannerInfo *root,
Oid opcintype,
Oid collation,
Index sortref,
+ Relids rel,
bool create_it);
extern void generate_base_implied_equalities(PlannerInfo *root);
extern List *generate_join_implied_equalities(PlannerInfo *root,
......
......@@ -21,7 +21,7 @@ extern PlaceHolderVar *make_placeholder_expr(PlannerInfo *root, Expr *expr,
Relids phrels);
extern PlaceHolderInfo *find_placeholder_info(PlannerInfo *root,
PlaceHolderVar *phv, bool create_new_ph);
- extern void find_placeholders_in_query(PlannerInfo *root);
+ extern void find_placeholders_in_jointree(PlannerInfo *root);
extern void mark_placeholder_maybe_needed(PlannerInfo *root,
PlaceHolderInfo *phinfo, Relids relids);
extern void update_placeholder_eval_levels(PlannerInfo *root,
......
......@@ -1067,11 +1067,11 @@ drop cascades to table matest2
drop cascades to table matest3
--
-- Test merge-append for UNION ALL append relations
- -- Check handling of duplicated, constant, or volatile targetlist items
--
set enable_seqscan = off;
set enable_indexscan = on;
set enable_bitmapscan = off;
+ -- Check handling of duplicated, constant, or volatile targetlist items
explain (costs off)
SELECT thousand, tenthous FROM tenk1
UNION ALL
......@@ -1120,6 +1120,61 @@ ORDER BY thousand, tenthous;
-> Index Only Scan using tenk1_thous_tenthous on tenk1
(7 rows)
+ -- Check min/max aggregate optimization
+ explain (costs off)
+ SELECT min(x) FROM
+ (SELECT unique1 AS x FROM tenk1 a
+ UNION ALL
+ SELECT unique2 AS x FROM tenk1 b) s;
+ QUERY PLAN
+ --------------------------------------------------------------------
+ Result
+ InitPlan 1 (returns $0)
+ -> Limit
+ -> Merge Append
+ Sort Key: a.unique1
+ -> Index Only Scan using tenk1_unique1 on tenk1 a
+ Index Cond: (unique1 IS NOT NULL)
+ -> Index Only Scan using tenk1_unique2 on tenk1 b
+ Index Cond: (unique2 IS NOT NULL)
+ (9 rows)
+ explain (costs off)
+ SELECT min(y) FROM
+ (SELECT unique1 AS x, unique1 AS y FROM tenk1 a
+ UNION ALL
+ SELECT unique2 AS x, unique2 AS y FROM tenk1 b) s;
+ QUERY PLAN
+ --------------------------------------------------------------------
+ Result
+ InitPlan 1 (returns $0)
+ -> Limit
+ -> Merge Append
+ Sort Key: a.unique1
+ -> Index Only Scan using tenk1_unique1 on tenk1 a
+ Index Cond: (unique1 IS NOT NULL)
+ -> Index Only Scan using tenk1_unique2 on tenk1 b
+ Index Cond: (unique2 IS NOT NULL)
+ (9 rows)
+ -- XXX planner doesn't recognize that index on unique2 is sufficiently sorted
+ explain (costs off)
+ SELECT x, y FROM
+ (SELECT thousand AS x, tenthous AS y FROM tenk1 a
+ UNION ALL
+ SELECT unique2 AS x, unique2 AS y FROM tenk1 b) s
+ ORDER BY x, y;
+ QUERY PLAN
+ -------------------------------------------------------------------
+ Result
+ -> Merge Append
+ Sort Key: a.thousand, a.tenthous
+ -> Index Only Scan using tenk1_thous_tenthous on tenk1 a
+ -> Sort
+ Sort Key: b.unique2, b.unique2
+ -> Index Only Scan using tenk1_unique2 on tenk1 b
+ (7 rows)
reset enable_seqscan;
reset enable_indexscan;
reset enable_bitmapscan;
......@@ -503,3 +503,17 @@ explain (costs off)
reset enable_seqscan;
reset enable_indexscan;
reset enable_bitmapscan;
+ -- Test constraint exclusion of UNION ALL subqueries
+ explain (costs off)
+ SELECT * FROM
+ (SELECT 1 AS t, * FROM tenk1 a
+ UNION ALL
+ SELECT 2 AS t, * FROM tenk1 b) c
+ WHERE t = 2;
+ QUERY PLAN
+ ---------------------------------
+ Result
+ -> Append
+ -> Seq Scan on tenk1 b
+ (3 rows)
......@@ -326,13 +326,13 @@ drop table matest0 cascade;
--
-- Test merge-append for UNION ALL append relations
- -- Check handling of duplicated, constant, or volatile targetlist items
--
set enable_seqscan = off;
set enable_indexscan = on;
set enable_bitmapscan = off;
+ -- Check handling of duplicated, constant, or volatile targetlist items
explain (costs off)
SELECT thousand, tenthous FROM tenk1
UNION ALL
......@@ -351,6 +351,27 @@ UNION ALL
SELECT thousand, random()::integer FROM tenk1
ORDER BY thousand, tenthous;
+ -- Check min/max aggregate optimization
+ explain (costs off)
+ SELECT min(x) FROM
+ (SELECT unique1 AS x FROM tenk1 a
+ UNION ALL
+ SELECT unique2 AS x FROM tenk1 b) s;
+ explain (costs off)
+ SELECT min(y) FROM
+ (SELECT unique1 AS x, unique1 AS y FROM tenk1 a
+ UNION ALL
+ SELECT unique2 AS x, unique2 AS y FROM tenk1 b) s;
+ -- XXX planner doesn't recognize that index on unique2 is sufficiently sorted
+ explain (costs off)
+ SELECT x, y FROM
+ (SELECT thousand AS x, tenthous AS y FROM tenk1 a
+ UNION ALL
+ SELECT unique2 AS x, unique2 AS y FROM tenk1 b) s
+ ORDER BY x, y;
reset enable_seqscan;
reset enable_indexscan;
reset enable_bitmapscan;
......@@ -199,3 +199,11 @@ explain (costs off)
reset enable_seqscan;
reset enable_indexscan;
reset enable_bitmapscan;
+ -- Test constraint exclusion of UNION ALL subqueries
+ explain (costs off)
+ SELECT * FROM
+ (SELECT 1 AS t, * FROM tenk1 a
+ UNION ALL
+ SELECT 2 AS t, * FROM tenk1 b) c
+ WHERE t = 2;