Commit b514a746 authored by Tom Lane's avatar Tom Lane

Fix planning of star-schema-style queries.

Part of the intent of the parameterized-path mechanism was to handle
star-schema queries efficiently, but some overly-restrictive search
limiting logic added in commit e2fa76d8
prevented such cases from working as desired.  Fix that and add a
regression test about it.  Per gripe from Marc Cousin.

This is arguably a bug rather than a new feature, so back-patch to 9.2
where parameterized paths were introduced.
parent c4f4c7ca
...@@ -702,17 +702,37 @@ intermediate layers of joins, for example: ...@@ -702,17 +702,37 @@ intermediate layers of joins, for example:
-> Index Scan using C_Z_IDX on C -> Index Scan using C_Z_IDX on C
Index Condition: C.Z = A.X Index Condition: C.Z = A.X
If all joins are plain inner joins then this is unnecessary, because If all joins are plain inner joins then this is usually unnecessary,
it's always possible to reorder the joins so that a parameter is used because it's possible to reorder the joins so that a parameter is used
immediately below the nestloop node that provides it. But in the immediately below the nestloop node that provides it. But in the
presence of outer joins, join reordering may not be possible, and then presence of outer joins, such join reordering may not be possible.
this option can be critical. Before version 9.2, Postgres used ad-hoc
methods for planning and executing such queries, and those methods could Also, the bottom-level scan might require parameters from more than one
not handle passing parameters down through multiple join levels. other relation. In principle we could join the other relations first
so that all the parameters are supplied from a single nestloop level.
But if those other relations have no join clause in common (which is
common in star-schema queries for instance), the planner won't consider
joining them directly to each other. In such a case we need to be able
to create a plan like
NestLoop
-> Seq Scan on SmallTable1 A
NestLoop
-> Seq Scan on SmallTable2 B
NestLoop
-> Index Scan using XYIndex on LargeTable C
Index Condition: C.X = A.AID and C.Y = B.BID
so we should be willing to pass down A.AID through a join even though
there is no join order constraint forcing the plan to look like this.
Before version 9.2, Postgres used ad-hoc methods for planning and
executing nestloop queries of this kind, and those methods could not
handle passing parameters down through multiple join levels.
To plan such queries, we now use a notion of a "parameterized path", To plan such queries, we now use a notion of a "parameterized path",
which is a path that makes use of a join clause to a relation that's not which is a path that makes use of a join clause to a relation that's not
scanned by the path. In the example just above, we would construct a scanned by the path. In the example two above, we would construct a
path representing the possibility of doing this: path representing the possibility of doing this:
-> Index Scan using C_Z_IDX on C -> Index Scan using C_Z_IDX on C
......
...@@ -117,13 +117,14 @@ add_paths_to_joinrel(PlannerInfo *root, ...@@ -117,13 +117,14 @@ add_paths_to_joinrel(PlannerInfo *root,
/* /*
* Decide whether it's sensible to generate parameterized paths for this * Decide whether it's sensible to generate parameterized paths for this
* joinrel, and if so, which relations such paths should require. There * joinrel, and if so, which relations such paths should require. There
* is no need to create a parameterized result path unless there is a join * is usually no need to create a parameterized result path unless there
* order restriction that prevents joining one of our input rels directly * is a join order restriction that prevents joining one of our input rels
* to the parameter source rel instead of joining to the other input rel. * directly to the parameter source rel instead of joining to the other
* This restriction reduces the number of parameterized paths we have to * input rel. (But see exception in try_nestloop_path.) This restriction
* deal with at higher join levels, without compromising the quality of * reduces the number of parameterized paths we have to deal with at
* the resulting plan. We express the restriction as a Relids set that * higher join levels, without compromising the quality of the resulting
* must overlap the parameterization of any proposed join path. * plan. We express the restriction as a Relids set that must overlap the
* parameterization of any proposed join path.
*/ */
foreach(lc, root->join_info_list) foreach(lc, root->join_info_list)
{ {
...@@ -290,11 +291,31 @@ try_nestloop_path(PlannerInfo *root, ...@@ -290,11 +291,31 @@ try_nestloop_path(PlannerInfo *root,
inner_path); inner_path);
if (required_outer && if (required_outer &&
!bms_overlap(required_outer, param_source_rels)) !bms_overlap(required_outer, param_source_rels))
{
/*
* We override the param_source_rels heuristic to accept nestloop
* paths in which the outer rel satisfies some but not all of the
* inner path's parameterization. This is necessary to get good plans
* for star-schema scenarios, in which a parameterized path for a
* large table may require parameters from multiple small tables that
* will not get joined directly to each other. We can handle that by
* stacking nestloops that have the small tables on the outside; but
* this breaks the rule the param_source_rels heuristic is based on,
* namely that parameters should not be passed down across joins
* unless there's a join-order-constraint-based reason to do so. So
* ignore the param_source_rels restriction when this case applies.
*/
Relids outerrelids = outer_path->parent->relids;
Relids innerparams = PATH_REQ_OUTER(inner_path);
if (!(bms_overlap(innerparams, outerrelids) &&
bms_nonempty_difference(innerparams, outerrelids)))
{ {
/* Waste no memory when we reject a path here */ /* Waste no memory when we reject a path here */
bms_free(required_outer); bms_free(required_outer);
return; return;
} }
}
/* /*
* Independently of that, add parameterization needed for any * Independently of that, add parameterization needed for any
......
...@@ -2780,6 +2780,25 @@ where thousand = (q1 + q2); ...@@ -2780,6 +2780,25 @@ where thousand = (q1 + q2);
-> Seq Scan on int4_tbl -> Seq Scan on int4_tbl
(12 rows) (12 rows)
--
-- test ability to generate a suitable plan for a star-schema query
--
explain (costs off)
select * from
tenk1, int8_tbl a, int8_tbl b
where thousand = a.q1 and tenthous = b.q1 and a.q2 = 1 and b.q2 = 2;
QUERY PLAN
---------------------------------------------------------------------
Nested Loop
-> Seq Scan on int8_tbl b
Filter: (q2 = 2)
-> Nested Loop
-> Seq Scan on int8_tbl a
Filter: (q2 = 1)
-> Index Scan using tenk1_thous_tenthous on tenk1
Index Cond: ((thousand = a.q1) AND (tenthous = b.q1))
(8 rows)
-- --
-- test extraction of restriction OR clauses from join OR clause -- test extraction of restriction OR clauses from join OR clause
-- (we used to only do this for indexable clauses) -- (we used to only do this for indexable clauses)
......
...@@ -763,6 +763,15 @@ select * from ...@@ -763,6 +763,15 @@ select * from
int4(sin(0)) q2 int4(sin(0)) q2
where thousand = (q1 + q2); where thousand = (q1 + q2);
--
-- test ability to generate a suitable plan for a star-schema query
--
explain (costs off)
select * from
tenk1, int8_tbl a, int8_tbl b
where thousand = a.q1 and tenthous = b.q1 and a.q2 = 1 and b.q2 = 2;
-- --
-- test extraction of restriction OR clauses from join OR clause -- test extraction of restriction OR clauses from join OR clause
-- (we used to only do this for indexable clauses) -- (we used to only do this for indexable clauses)
......
Markdown is supported
0% or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment