Commit e83bb10d authored by Tom Lane

Adjust definition of cheapest_total_path to work better with LATERAL.

In the initial cut at LATERAL, I kept the rule that cheapest_total_path
was always unparameterized, which meant it had to be NULL if the relation
has no unparameterized paths.  It turns out to work much more nicely if
we always have *some* path nominated as cheapest-total for each relation.
In particular, let's still say it's the cheapest unparameterized path if
there is one; if not, take the cheapest-total-cost path among those of
the minimum available parameterization.  (The first rule is actually
a special case of the second.)
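The selection rule above can be illustrated with a minimal C sketch. It is not PostgreSQL's set_cheapest(). The SketchPath type, the bitmask encoding of relid sets (0 meaning unparameterized), and the helper names are hypothetical simplifications, and "minimum parameterization" is read as set inclusion. Because the empty set is a subset of every set, an unparameterized path always wins whenever one exists, which is why the first rule is a special case of the second.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical, simplified stand-in for a planner Path: only the fields
 * this rule looks at.  req_outer is a bitmask of the outer relations the
 * path needs parameters from; 0 means "unparameterized".
 */
typedef struct SketchPath
{
    double   total_cost;
    uint64_t req_outer;
} SketchPath;

typedef enum { SET_EQUAL, SET_SUBSET1, SET_SUBSET2, SET_DIFFERENT } SetCompare;

/* Compare two relid sets by inclusion, in the spirit of bms_subset_compare(). */
static SetCompare
set_subset_compare(uint64_t a, uint64_t b)
{
    if (a == b)
        return SET_EQUAL;
    if ((a & ~b) == 0)
        return SET_SUBSET1;     /* a is a proper subset of b */
    if ((b & ~a) == 0)
        return SET_SUBSET2;     /* b is a proper subset of a */
    return SET_DIFFERENT;
}

/*
 * Pick the cheapest-total path: the cheapest path among those with the
 * least parameterization.  An unparameterized path (empty set) beats any
 * parameterized one, so the "cheapest unparameterized path" rule falls out.
 */
static const SketchPath *
choose_cheapest_total(const SketchPath *paths, size_t npaths)
{
    const SketchPath *best = NULL;

    assert(npaths > 0);
    for (size_t i = 0; i < npaths; i++)
    {
        const SketchPath *p = &paths[i];

        if (best == NULL)
        {
            best = p;
            continue;
        }
        switch (set_subset_compare(p->req_outer, best->req_outer))
        {
            case SET_EQUAL:
                if (p->total_cost < best->total_cost)
                    best = p;   /* same parameterization, cheaper */
                break;
            case SET_SUBSET1:
                best = p;       /* strictly less parameterized */
                break;
            case SET_SUBSET2:
            case SET_DIFFERENT:
                break;          /* keep the current choice */
        }
    }
    return best;
}
```

When two candidates have incomparable parameterizations (neither set contains the other), this sketch simply keeps the earlier one; that tie-break is an assumption of the sketch, not something the commit specifies.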

This allows reversion of some temporary lobotomizations I'd put in place.
In particular, the planner can now consider hash and merge joins for
joins below a parameter-supplying nestloop, even if there aren't any
unparameterized paths available.  This should bring planning of
LATERAL-containing queries to the same level as queries not using that
feature.

Along the way, simplify management of parameterized paths in add_path()
and friends.  In the original coding for parameterized paths in 9.2,
I tried to minimize the logic changes in add_path(), so it just treated
parameterization as yet another dimension of comparison for paths.
We later made it ignore pathkeys (sort ordering) of parameterized paths,
on the grounds that ordering isn't a useful property for the path on the
inside of a nestloop, so we might as well get rid of useless parameterized
paths as quickly as possible.  But we didn't take that reasoning as far as
we should have.  Startup cost isn't a useful property inside a nestloop
either, so add_path() ought to discount startup cost of parameterized paths
as well.  Having done that, the secondary sorting I'd implemented (in
add_parameterized_path) is no longer needed --- any parameterized path that
survives add_path() at all is worth considering at higher levels.  So this
should be a bit faster as well as simpler.
parent 9fe6da5c
@@ -789,7 +789,7 @@ a nestloop that provides parameters to the lower join's inputs). While we
do not ignore merge joins entirely, joinpath.c does not fully explore the
space of potential merge joins with parameterized inputs. Also, add_path
treats parameterized paths as having no pathkeys, so that they compete
- only on cost and rowcount; they don't get preference for producing a
+ only on total cost and rowcount; they don't get preference for producing a
special sort order. This creates additional bias against merge joins,
since we might discard a path that could have been useful for performing
a merge without an explicit sort step. Since a parameterized path must
@@ -799,4 +799,19 @@ output order of a query --- they only make it harder to use a merge join
at a lower level. The savings in planning work justifies that.
+ LATERAL subqueries
+ ------------------
+ As of 9.3 we support SQL-standard LATERAL references from subqueries in
+ FROM (and also functions in FROM). The planner implements these by
+ generating parameterized paths for any RTE that contains lateral
+ references. In such cases, *all* paths for that relation will be
+ parameterized by at least the set of relations used in its lateral
+ references. (And in turn, join relations including such a subquery might
+ not have any unparameterized paths.) All the other comments made above for
+ parameterized paths still apply, though; in particular, each such path is
+ still expected to enforce any join clauses that can be pushed down to it,
+ so that all paths of the same parameterization have the same rowcount.
-- bjm & tgl
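To see why a join containing a LATERAL subquery can lack unparameterized paths, the sketch below computes the parameterization a join inherits from its inputs: whatever either side requires, minus the relations the join itself now contains. It is a hypothetical simplification (relid sets as uint64_t bitmasks, no ParamPathInfo, no distinction between join methods), not the planner's own code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: a set of relations as a bitmask, bit i = rangetable index i. */
typedef uint64_t RelSet;

/*
 * Parameterization a join path inherits from its two inputs: everything
 * either input requires, minus the relations the join itself now covers.
 * A requirement coming from a LATERAL reference therefore survives every
 * join until the referenced relation is joined in.
 */
static RelSet
join_required_outer(RelSet outer_relids, RelSet outer_req,
                    RelSet inner_relids, RelSet inner_req)
{
    RelSet join_relids = outer_relids | inner_relids;

    /* The outer side may never take parameters from the inner side. */
    assert((outer_req & inner_relids) == 0);
    return (outer_req | inner_req) & ~join_relids;
}
```

For example, if subquery ss laterally references a, then every ss path requires {a}; a join of x and ss still requires {a} and so has no unparameterized paths, while a join that also contains a has an empty requirement.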
@@ -102,18 +102,12 @@ geqo_eval(PlannerInfo *root, Gene *tour, int num_gene)
joinrel = gimme_tree(root, tour, num_gene);
best_path = joinrel->cheapest_total_path;
- /*
- * If no unparameterized path, use the cheapest parameterized path for
- * costing purposes. XXX revisit this after LATERAL dust settles
- */
- if (!best_path)
- best_path = linitial(joinrel->cheapest_parameterized_paths);
/*
* compute fitness
*
* XXX geqo does not currently support optimization for partial result
- * retrieval --- how to fix?
+ * retrieval, nor do we take any cognizance of possible use of
+ * parameterized paths --- how to fix?
*/
fitness = best_path->total_cost;
......
@@ -722,7 +722,7 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
* the unparameterized Append path we are constructing for the parent.
* If not, there's no workable unparameterized path.
*/
- if (childrel->cheapest_total_path)
+ if (childrel->cheapest_total_path->param_info == NULL)
subpaths = accumulate_append_subpath(subpaths,
childrel->cheapest_total_path);
else
@@ -932,7 +932,6 @@ generate_mergeappend_paths(PlannerInfo *root, RelOptInfo *rel,
cheapest_startup = cheapest_total =
childrel->cheapest_total_path;
/* Assert we do have an unparameterized path for this child */
- Assert(cheapest_total != NULL);
Assert(cheapest_total->param_info == NULL);
}
......
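The set_append_rel_pathlist() hunk above now tests the child's cheapest-total path for parameterization instead of testing it for NULL. A minimal sketch of that check, assuming each child is summarized only by the required-outer bitmask of its cheapest-total path (0 meaning unparameterized); the function name is hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical check: can the parent get an unparameterized Append path?
 * child_req_outer[i] is the required-outer bitmask of child i's
 * cheapest-total path (0 = unparameterized).  Since cheapest_total_path
 * is now always set, the question is whether it is parameterized, not
 * whether it exists.
 */
static bool
can_build_unparameterized_append(const uint64_t *child_req_outer,
                                 size_t nchildren)
{
    assert(nchildren == 0 || child_req_outer != NULL);
    for (size_t i = 0; i < nchildren; i++)
    {
        /* One parameterized child rules out an unparameterized Append. */
        if (child_req_outer[i] != 0)
            return false;
    }
    return true;
}
```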
@@ -22,6 +22,9 @@
#include "optimizer/paths.h"
+ #define PATH_PARAM_BY_REL(path, rel) \
+ ((path)->param_info && bms_overlap(PATH_REQ_OUTER(path), (rel)->relids))
static void sort_inner_and_outer(PlannerInfo *root, RelOptInfo *joinrel,
RelOptInfo *outerrel, RelOptInfo *innerrel,
List *restrictlist, List *mergeclause_list,
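PATH_PARAM_BY_REL(path, rel) asks whether path needs parameters from any relation in rel. Below is a hypothetical bitmask analogue: a bitwise AND stands in for bms_overlap(), and the stand-in structs only loosely mirror Path, ParamPathInfo and RelOptInfo.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; relid sets are bitmasks. */
typedef struct SketchParamInfo { uint64_t ppi_req_outer; } SketchParamInfo;
typedef struct SketchPath { const SketchParamInfo *param_info; } SketchPath;
typedef struct SketchRel { uint64_t relids; } SketchRel;

/*
 * Bitmask analogue of PATH_PARAM_BY_REL(path, rel): true if the path needs
 * parameters from any relation in rel.  The NULL test comes first, so an
 * unparameterized path is never "parameterized by" anything.
 */
static bool
sketch_path_param_by_rel(const SketchPath *path, const SketchRel *rel)
{
    assert(path != NULL && rel != NULL);
    return path->param_info != NULL &&
           (path->param_info->ppi_req_outer & rel->relids) != 0;
}
```

Testing param_info first is what lets sort_inner_and_outer() and hash_inner_and_outer() go ahead when both cheapest-total inputs are unparameterized, and punt only when one input actually depends on the other.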
@@ -503,18 +506,24 @@ sort_inner_and_outer(PlannerInfo *root,
* cheapest-startup-cost input paths later, and only if they don't need a
* sort.
*
- * This function intentionally does not consider parameterized input paths
- * (implicit in the fact that it only looks at cheapest_total_path, which
- * is always unparameterized). If we did so, we'd have a combinatorial
- * explosion of mergejoin paths of dubious value. This interacts with
- * decisions elsewhere that also discriminate against mergejoins with
- * parameterized inputs; see comments in src/backend/optimizer/README.
+ * This function intentionally does not consider parameterized input
+ * paths, except when the cheapest-total is parameterized. If we did so,
+ * we'd have a combinatorial explosion of mergejoin paths of dubious
+ * value. This interacts with decisions elsewhere that also discriminate
+ * against mergejoins with parameterized inputs; see comments in
+ * src/backend/optimizer/README.
*/
outer_path = outerrel->cheapest_total_path;
inner_path = innerrel->cheapest_total_path;
- /* Punt if either rel has only parameterized paths */
- if (!outer_path || !inner_path)
+ /*
+ * If either cheapest-total path is parameterized by the other rel, we
+ * can't use a mergejoin. (There's no use looking for alternative input
+ * paths, since these should already be the least-parameterized available
+ * paths.)
+ */
+ if (PATH_PARAM_BY_REL(outer_path, innerrel) ||
+ PATH_PARAM_BY_REL(inner_path, outerrel))
return;
/*
@@ -714,16 +723,23 @@ match_unsorted_outer(PlannerInfo *root,
break;
}
+ /*
+ * If inner_cheapest_total is parameterized by the outer rel, ignore it;
+ * we will consider it below as a member of cheapest_parameterized_paths,
+ * but the other possibilities considered in this routine aren't usable.
+ */
+ if (PATH_PARAM_BY_REL(inner_cheapest_total, outerrel))
+ inner_cheapest_total = NULL;
/*
* If we need to unique-ify the inner path, we will consider only the
* cheapest-total inner.
*/
if (save_jointype == JOIN_UNIQUE_INNER)
{
- /* XXX for the moment, don't crash on LATERAL --- rethink this */
+ /* No way to do this with an inner path parameterized by outer rel */
if (inner_cheapest_total == NULL)
return;
inner_cheapest_total = (Path *)
create_unique_path(root, innerrel, inner_cheapest_total, sjinfo);
Assert(inner_cheapest_total);
@@ -756,15 +772,13 @@ match_unsorted_outer(PlannerInfo *root,
/*
* We cannot use an outer path that is parameterized by the inner rel.
*/
- if (bms_overlap(PATH_REQ_OUTER(outerpath), innerrel->relids))
+ if (PATH_PARAM_BY_REL(outerpath, innerrel))
continue;
/*
* If we need to unique-ify the outer path, it's pointless to consider
* any but the cheapest outer. (XXX we don't consider parameterized
* outers, nor inners, for unique-ified cases. Should we?)
- *
- * XXX does nothing for LATERAL, rethink
*/
if (save_jointype == JOIN_UNIQUE_OUTER)
{
@@ -844,8 +858,8 @@ match_unsorted_outer(PlannerInfo *root,
if (save_jointype == JOIN_UNIQUE_OUTER)
continue;
- /* Can't do anything else if inner has no unparameterized paths */
- if (!inner_cheapest_total)
+ /* Can't do anything else if inner rel is parameterized by outer */
+ if (inner_cheapest_total == NULL)
continue;
/* Look for useful mergeclauses (if any) */
@@ -1126,10 +1140,14 @@ hash_inner_and_outer(PlannerInfo *root,
Path *cheapest_total_outer = outerrel->cheapest_total_path;
Path *cheapest_total_inner = innerrel->cheapest_total_path;
- /* Punt if either rel has only parameterized paths */
- if (!cheapest_startup_outer ||
- !cheapest_total_outer ||
- !cheapest_total_inner)
+ /*
+ * If either cheapest-total path is parameterized by the other rel, we
+ * can't use a hashjoin. (There's no use looking for alternative
+ * input paths, since these should already be the least-parameterized
+ * available paths.)
+ */
+ if (PATH_PARAM_BY_REL(cheapest_total_outer, innerrel) ||
+ PATH_PARAM_BY_REL(cheapest_total_inner, outerrel))
return;
/* Unique-ify if need be; we ignore parameterized possibilities */
@@ -1169,7 +1187,8 @@ hash_inner_and_outer(PlannerInfo *root,
cheapest_total_inner,
restrictlist,
hashclauses);
- if (cheapest_startup_outer != cheapest_total_outer)
+ if (cheapest_startup_outer != NULL &&
+ cheapest_startup_outer != cheapest_total_outer)
try_hashjoin_path(root,
joinrel,
jointype,
@@ -1193,16 +1212,17 @@ hash_inner_and_outer(PlannerInfo *root,
ListCell *lc1;
ListCell *lc2;
- try_hashjoin_path(root,
- joinrel,
- jointype,
- sjinfo,
- semifactors,
- param_source_rels,
- cheapest_startup_outer,
- cheapest_total_inner,
- restrictlist,
- hashclauses);
+ if (cheapest_startup_outer != NULL)
+ try_hashjoin_path(root,
+ joinrel,
+ jointype,
+ sjinfo,
+ semifactors,
+ param_source_rels,
+ cheapest_startup_outer,
+ cheapest_total_inner,
+ restrictlist,
+ hashclauses);
foreach(lc1, outerrel->cheapest_parameterized_paths)
{
@@ -1212,7 +1232,7 @@ hash_inner_and_outer(PlannerInfo *root,
* We cannot use an outer path that is parameterized by the
* inner rel.
*/
- if (bms_overlap(PATH_REQ_OUTER(outerpath), innerrel->relids))
+ if (PATH_PARAM_BY_REL(outerpath, innerrel))
continue;
foreach(lc2, innerrel->cheapest_parameterized_paths)
@@ -1223,8 +1243,7 @@ hash_inner_and_outer(PlannerInfo *root,
* We cannot use an inner path that is parameterized by
* the outer rel, either.
*/
- if (bms_overlap(PATH_REQ_OUTER(innerpath),
- outerrel->relids))
+ if (PATH_PARAM_BY_REL(innerpath, outerrel))
continue;
if (outerpath == cheapest_startup_outer &&
......
@@ -267,7 +267,8 @@ query_planner(PlannerInfo *root, List *tlist,
*/
final_rel = make_one_rel(root, joinlist);
- if (!final_rel || !final_rel->cheapest_total_path)
+ if (!final_rel || !final_rel->cheapest_total_path ||
+ final_rel->cheapest_total_path->param_info != NULL)
elog(ERROR, "failed to construct the join relation");
/*
......
This diff is collapsed.
@@ -309,18 +309,16 @@ typedef struct PlannerInfo
* method of generating the relation
* ppilist - ParamPathInfo nodes for parameterized Paths, if any
* cheapest_startup_path - the pathlist member with lowest startup cost
- * (regardless of its ordering; but must be
- * unparameterized; hence will be NULL for
- * a LATERAL subquery)
+ * (regardless of ordering) among the unparameterized paths;
+ * or NULL if there is no unparameterized path
* cheapest_total_path - the pathlist member with lowest total cost
- * (regardless of its ordering; but must be
- * unparameterized; hence will be NULL for
- * a LATERAL subquery)
+ * (regardless of ordering) among the unparameterized paths;
+ * or if there is no unparameterized path, the path with lowest
+ * total cost among the paths with minimum parameterization
* cheapest_unique_path - for caching cheapest path to produce unique
- * (no duplicates) output from relation
- * cheapest_parameterized_paths - paths with cheapest total costs for
- * their parameterizations; always includes
- * cheapest_total_path, if that exists
+ * (no duplicates) output from relation; NULL if not yet requested
+ * cheapest_parameterized_paths - best paths for their parameterizations;
+ * always includes cheapest_total_path, even if that's unparameterized
*
* If the relation is a base relation it will have these fields set:
*
......
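The revised comment says cheapest_parameterized_paths always includes cheapest_total_path, even when it is unparameterized. One way to satisfy that is sketched below, assuming the path list has already been pruned by add_path() (so, per the commit message, every surviving parameterized path is worth considering): list the cheapest-total path first when it is unparameterized, then every parameterized path. Types and names are hypothetical; this is not the planner's set_cheapest().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified path record. */
typedef struct SketchPath
{
    double   total_cost;
    uint64_t req_outer;     /* 0 = unparameterized */
} SketchPath;

/*
 * Fill out[] (room for npaths entries) with the paths to expose as
 * cheapest_parameterized_paths.  cheapest_total must point into pathlist.
 * If it is unparameterized it goes first; if it is parameterized it is
 * already picked up by the loop, so it is never listed twice.
 * Returns the number of entries written.
 */
static size_t
build_parameterized_list(const SketchPath *pathlist, size_t npaths,
                         const SketchPath *cheapest_total,
                         const SketchPath **out)
{
    size_t n = 0;

    assert(cheapest_total != NULL);
    if (cheapest_total->req_outer == 0)
        out[n++] = cheapest_total;
    for (size_t i = 0; i < npaths; i++)
    {
        if (pathlist[i].req_outer != 0)
            out[n++] = &pathlist[i];
    }
    return n;
}
```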
@@ -3167,15 +3167,16 @@ explain (costs off)
select * from int8_tbl a,
int8_tbl x left join lateral (select a.q1 from int4_tbl y) ss(z)
on x.q2 = ss.z;
-             QUERY PLAN
-------------------------------------
+                QUERY PLAN
+------------------------------------------
  Nested Loop
    ->  Seq Scan on int8_tbl a
-   ->  Nested Loop Left Join
-         Join Filter: (x.q2 = ($0))
+   ->  Hash Left Join
+         Hash Cond: (x.q2 = ($0))
          ->  Seq Scan on int8_tbl x
-         ->  Seq Scan on int4_tbl y
-(6 rows)
+         ->  Hash
+               ->  Seq Scan on int4_tbl y
+(7 rows)
select * from int8_tbl a,
int8_tbl x left join lateral (select a.q1 from int4_tbl y) ss(z)
......