Commit 9f5f2124 authored by Tom Lane

Allow the planner to collapse explicit inner JOINs together, rather than
necessarily following the JOIN syntax to develop the query plan.  The old
behavior is still available by setting GUC variable JOIN_COLLAPSE_LIMIT
to 1.  Also create a GUC variable FROM_COLLAPSE_LIMIT to control the
similar decision about when to collapse sub-SELECT lists into their parent
lists.  (This behavior existed already, but the limit was always
GEQO_THRESHOLD/2; now it's separately adjustable.)
parent 15ab7a87
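To make the threshold change concrete, here is a minimal C sketch of the sub-SELECT merge test before and after this commit. It assumes the defaults that appear in the diff (old `DEFAULT_GEQO_RELS` of 11, new `from_collapse_limit` of 8). The function names `merge_before` and `merge_after` are hypothetical; they model the condition in `preprocess_jointree()` and are not PostgreSQL functions.

```c
#include <assert.h>
#include <stdbool.h>

/* Defaults taken from paths.h (old) and guc.c (new) in this diff. */
#define OLD_GEQO_RELS_DEFAULT       11
#define FROM_COLLAPSE_LIMIT_DEFAULT 8

/*
 * childlen: number of items in the child FROM list being considered.
 * otherlen: number of other items in the parent's FROM list.
 * A one-item child is always merged, since that does not lengthen the list.
 */

/* Before this commit: the merged list had to fit in geqo_rels / 2
 * (integer division, so 11 / 2 == 5). */
bool merge_before(int childlen, int otherlen, int geqo_rels)
{
    assert(childlen >= 0 && otherlen >= 0);
    return childlen <= 1 || (childlen + otherlen) <= geqo_rels / 2;
}

/* After this commit: the merged list has to fit in from_collapse_limit. */
bool merge_after(int childlen, int otherlen, int from_collapse_limit)
{
    assert(childlen >= 0 && otherlen >= 0);
    return childlen <= 1 || (childlen + otherlen) <= from_collapse_limit;
}
```

With the defaults, a three-item sub-SELECT next to three other parent items (six items after merging) was kept separate under the old rule (6 > 5) but is merged under the new one (6 <= 8).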
<!--
$Header: /cvsroot/pgsql/doc/src/sgml/perform.sgml,v 1.23 2003/01/12 18:42:59 tgl Exp $
$Header: /cvsroot/pgsql/doc/src/sgml/perform.sgml,v 1.24 2003/01/25 23:10:27 tgl Exp $
-->
<chapter id="performance-tips">
......@@ -591,53 +591,93 @@ SELECT * FROM a LEFT JOIN (b JOIN c ON (b.ref = c.id)) ON (a.id = b.id);
</para>
<para>
The <productname>PostgreSQL</productname> query planner treats all
explicit <literal>JOIN</> syntaxes as constraining the join order, even though
it is not logically necessary to make such a constraint for inner
joins. Therefore, although all of these queries give the same result:
Explicit inner join syntax (<literal>INNER JOIN</>, <literal>CROSS
JOIN</>, or unadorned <literal>JOIN</>) is semantically the same as
listing the input relations in <literal>FROM</>, so it does not need to
constrain the join order. But it is possible to instruct the
<productname>PostgreSQL</productname> query planner to treat
explicit inner <literal>JOIN</>s as constraining the join order anyway.
For example, these three queries are logically equivalent:
<programlisting>
SELECT * FROM a, b, c WHERE a.id = b.id AND b.ref = c.id;
SELECT * FROM a CROSS JOIN b CROSS JOIN c WHERE a.id = b.id AND b.ref = c.id;
SELECT * FROM a JOIN (b JOIN c ON (b.ref = c.id)) ON (a.id = b.id);
</programlisting>
But if we tell the planner to honor the <literal>JOIN</> order,
the second and third take less time to plan than the first. This effect
is not worth worrying about for only three tables, but it can be a
lifesaver with many tables.
</para>
<para>
To force the planner to follow the <literal>JOIN</> order for inner joins,
set the <varname>JOIN_COLLAPSE_LIMIT</> run-time parameter to 1.
(Other possible values are discussed below.)
</para>
<para>
You do not need to constrain the join order completely in order to
cut search time, because it's OK to use <literal>JOIN</> operators in a plain
<literal>FROM</> list. For example,
cut search time, because it's OK to use <literal>JOIN</> operators
within items of a plain <literal>FROM</> list. For example, consider
<programlisting>
SELECT * FROM a CROSS JOIN b, c, d, e WHERE ...;
</programlisting>
With <varname>JOIN_COLLAPSE_LIMIT</> = 1, this
forces the planner to join A to B before joining them to other tables,
but doesn't constrain its choices otherwise. In this example, the
number of possible join orders is reduced by a factor of 5.
</para>
<para>
If you have a mix of outer and inner joins in a complex query, you
might not want to constrain the planner's search for a good ordering
of inner joins inside an outer join. You can't do that directly in the
<literal>JOIN</> syntax, but you can get around the syntactic limitation by using
subselects. For example,
<programlisting>
SELECT * FROM d LEFT JOIN
(SELECT * FROM a, b, c WHERE ...) AS ss
ON (...);
</programlisting>
Here, joining to D must be the last step in the query plan, but the
planner is free to consider various join orders for A, B, and C.
</para>
<para>
Constraining the planner's search in this way is a useful technique
both for reducing planning time and for directing the planner to a
good query plan. If the planner chooses a bad join order by default,
you can force it to choose a better order via <literal>JOIN</> syntax --- assuming
that you know of a better order, that is. Experimentation is recommended.
you can force it to choose a better order via <literal>JOIN</> syntax
--- assuming that you know of a better order, that is. Experimentation
is recommended.
</para>
<para>
A closely related issue that affects planning time is collapsing of
sub-SELECTs into their parent query. For example, consider
<programlisting>
SELECT *
FROM x, y,
(SELECT * FROM a, b, c WHERE something) AS ss
WHERE somethingelse
</programlisting>
This situation might arise from use of a view that contains a join;
the view's SELECT rule will be inserted in place of the view reference,
yielding a query much like the above. Normally, the planner will try
to collapse the sub-query into the parent, yielding
<programlisting>
SELECT * FROM x, y, a, b, c WHERE something AND somethingelse
</programlisting>
This usually results in a better plan than planning the sub-query
separately. (For example, the outer WHERE conditions might be such that
joining X to A first eliminates many rows of A, thus avoiding the need to
form the full logical output of the sub-select.) But at the same time,
we have increased the planning time; here, we have a five-way join
problem replacing two separate three-way join problems. Because of the
exponential growth of the number of possibilities, this makes a big
difference. The planner tries to avoid getting stuck in huge join search
problems by not collapsing a sub-query if more than
<varname>FROM_COLLAPSE_LIMIT</> FROM-items would result in the parent
query. You can trade off planning time against quality of plan by
adjusting this run-time parameter up or down.
</para>
<para>
<varname>FROM_COLLAPSE_LIMIT</> and <varname>JOIN_COLLAPSE_LIMIT</>
are similarly named because they do almost the same thing: one controls
when the planner will <quote>flatten out</> sub-SELECTs, and the
other controls when it will flatten out explicit inner JOINs. Typically
you would either set <varname>JOIN_COLLAPSE_LIMIT</> equal to
<varname>FROM_COLLAPSE_LIMIT</> (so that explicit JOINs and sub-SELECTs
act similarly) or set <varname>JOIN_COLLAPSE_LIMIT</> to 1 (if you want
to control join order with explicit JOINs). But you might set them
differently if you are trying to fine-tune the tradeoff between planning
time and run time.
</para>
</sect1>
......
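The perform.sgml text claims a factor-of-5 reduction for `a CROSS JOIN b, c, d, e` and compares one five-way join problem with two three-way problems. Under the simplifying assumption that the search space is the number of orderings of the FROM items (n!), with bushy plans, join direction and GEQO all ignored, both figures can be checked with a small C sketch. The helper `join_orderings` is hypothetical and is not the planner's real count.

```c
#include <assert.h>

/* Number of orderings of n FROM items, i.e. n!.  This is a simplified
 * model of the join search space, not the planner's exact count. */
unsigned long long join_orderings(int n)
{
    unsigned long long result = 1;

    assert(n >= 0 && n <= 20);  /* 21! no longer fits in 64 bits */
    for (int i = 2; i <= n; i++)
        result *= (unsigned long long) i;
    return result;
}
```

Forcing A to join B first turns five free items into four, so `join_orderings(5) / join_orderings(4)` is 120 / 24 = 5. Collapsing the sub-SELECT example gives one five-way problem (120 orderings) in place of two three-way problems (2 * 6 = 12 orderings).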
<!--
$Header: /cvsroot/pgsql/doc/src/sgml/release.sgml,v 1.180 2003/01/23 23:38:51 petere Exp $
$Header: /cvsroot/pgsql/doc/src/sgml/release.sgml,v 1.181 2003/01/25 23:10:27 tgl Exp $
-->
<appendix id="release">
......@@ -24,6 +24,7 @@ CDATA means the content is "SGML-free", so you can write without
worries about funny characters.
-->
<literallayout><![CDATA[
Explicit JOINs no longer constrain query plan, unless JOIN_COLLAPSE_LIMIT = 1
Performance of "foo IN (SELECT ...)" queries has been considerably improved
FETCH 0 now re-fetches cursor's current row, per SQL spec
Revised executor state representation; plan trees are read-only to executor now
......
<!--
$Header: /cvsroot/pgsql/doc/src/sgml/runtime.sgml,v 1.166 2003/01/11 05:04:14 momjian Exp $
$Header: /cvsroot/pgsql/doc/src/sgml/runtime.sgml,v 1.167 2003/01/25 23:10:27 tgl Exp $
-->
<Chapter Id="runtime">
......@@ -773,6 +773,19 @@ env PGOPTIONS='-c geqo=off' psql
</listitem>
</varlistentry>
<varlistentry>
<term><varname>FROM_COLLAPSE_LIMIT</varname> (<type>integer</type>)</term>
<listitem>
<para>
The planner will merge sub-queries into upper queries if the resulting
FROM list would have no more than this many items. Smaller values
reduce planning time but may yield inferior query plans.
The default is 8. It is usually wise to keep this less than
<literal>GEQO_THRESHOLD</>.
</para>
</listitem>
</varlistentry>
<varlistentry>
<indexterm>
<primary>genetic query optimization</primary>
......@@ -826,12 +839,27 @@ env PGOPTIONS='-c geqo=off' psql
<listitem>
<para>
Use genetic query optimization to plan queries with at least
this many <literal>FROM</> items involved. (Note that a
this many <literal>FROM</> items involved. (Note that an outer
<literal>JOIN</> construct counts as only one <literal>FROM</>
item.) The default is 11. For simpler queries it is usually best
to use the deterministic, exhaustive planner. This parameter
also controls how hard the optimizer will try to merge subquery
<literal>FROM</literal> clauses into the upper query.
to use the deterministic, exhaustive planner, but for queries with
many tables the deterministic planner takes too long.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><varname>JOIN_COLLAPSE_LIMIT</varname> (<type>integer</type>)</term>
<listitem>
<para>
The planner will flatten explicit inner <literal>JOIN</> constructs
into lists of <literal>FROM</> items whenever a list of no more than
this many items would result. Usually this is set the same as
<literal>FROM_COLLAPSE_LIMIT</>. Setting it to 1 prevents any
flattening of inner <literal>JOIN</>s, allowing explicit
<literal>JOIN</> syntax to be used to control the join order.
Intermediate values might be useful to trade off planning time
against quality of plan.
</para>
</listitem>
</varlistentry>
......@@ -1842,8 +1870,8 @@ dynamic_library_path = '/usr/local/lib/postgresql:/home/my_project/lib:$libdir'
server. The default is 64. Each buffer is typically 8192
bytes. This must be greater than 16, as well as at least twice
the value of <varname>MAX_CONNECTIONS</varname>; however, a
higher value can often improve performance on modern
machines. Values of at least a few thousand are recommended
higher value can often improve performance.
Values of a few thousand are recommended
for production installations. This option can only be set at
server start.
</para>
......@@ -1878,15 +1906,17 @@ dynamic_library_path = '/usr/local/lib/postgresql:/home/my_project/lib:$libdir'
<listitem>
<para>
Specifies the amount of memory to be used by internal sorts and
hashes before switching to temporary disk files. The value is
hash tables before switching to temporary disk files. The value is
specified in kilobytes, and defaults to 1024 kilobytes (1 MB).
Note that for a complex query, several sorts might be running in
parallel, and each one will be allowed to use as much memory as
this value specifies before it starts to put data into temporary
Note that for a complex query, several sorts or hashes might be
running in parallel; each one will be allowed to use as much memory
as this value specifies before it starts to put data into temporary
files. Also, each running backend could be doing one or more
sorts simultaneously, so the total memory used could be many
times the value of <varname>SORT_MEM</varname>. Sorts are used
by <literal>ORDER BY</>, merge joins, and <command>CREATE INDEX</>.
Hash tables are used in hash joins, hash-based aggregation, and
hash-based processing of <literal>IN</> sub-selects.
</para>
</listitem>
</varlistentry>
......
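The runtime.sgml hunks state that each shared buffer is typically 8192 bytes, that the buffer count must be greater than 16 and at least twice `MAX_CONNECTIONS`, and that `SORT_MEM` (in kilobytes) applies to each sort or hash separately. Total use can therefore be a multiple of `SORT_MEM`. The back-of-the-envelope C sketch below assumes 8192-byte buffers. The number of concurrent sorts or hashes per backend is a hypothetical input that the caller has to estimate, and the helper names are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>

#define BUFFER_SIZE_BYTES 8192L   /* "typically 8192 bytes" per the docs */

/* Bytes of shared memory taken by nbuffers shared buffers. */
long shared_buffer_bytes(long nbuffers)
{
    assert(nbuffers >= 0);
    return nbuffers * BUFFER_SIZE_BYTES;
}

/* Documented constraint: more than 16 buffers, and at least
 * twice max_connections. */
bool shared_buffers_valid(long nbuffers, long max_connections)
{
    return nbuffers > 16 && nbuffers >= 2 * max_connections;
}

/* Worst case in kilobytes: every backend runs ops_per_backend sorts or
 * hashes at once, and each one is allowed up to sort_mem_kb. */
long worst_case_sort_kb(long sort_mem_kb, long ops_per_backend, long backends)
{
    assert(sort_mem_kb >= 0 && ops_per_backend >= 0 && backends >= 0);
    return sort_mem_kb * ops_per_backend * backends;
}
```

With the documented defaults (64 buffers, `SORT_MEM` = 1024 kB), the buffers occupy 524288 bytes (512 kB). Ten backends that each run three sorts or hashes at once could use up to 30720 kB (30 MB) of sort memory.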
......@@ -8,7 +8,7 @@
*
*
* IDENTIFICATION
* $Header: /cvsroot/pgsql/src/backend/optimizer/path/allpaths.c,v 1.94 2003/01/20 18:54:49 tgl Exp $
* $Header: /cvsroot/pgsql/src/backend/optimizer/path/allpaths.c,v 1.95 2003/01/25 23:10:27 tgl Exp $
*
*-------------------------------------------------------------------------
*/
......@@ -30,8 +30,9 @@
#include "rewrite/rewriteManip.h"
bool enable_geqo = true;
int geqo_rels = DEFAULT_GEQO_RELS;
/* These parameters are set by GUC */
bool enable_geqo = false; /* just in case GUC doesn't set it */
int geqo_threshold;
static void set_base_rel_pathlists(Query *root);
......@@ -422,7 +423,7 @@ make_fromexpr_rel(Query *root, FromExpr *from)
* Consider the different orders in which we could join the rels,
* using either GEQO or regular optimizer.
*/
if (enable_geqo && levels_needed >= geqo_rels)
if (enable_geqo && levels_needed >= geqo_threshold)
return geqo(root, levels_needed, initial_rels);
else
return make_one_rel_by_joins(root, levels_needed, initial_rels);
......
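The allpaths.c hunk only renames `geqo_rels` to `geqo_threshold`; the test itself is unchanged. Read alongside the new merge rule in prepjointree.c, it also explains the runtime.sgml advice to keep `FROM_COLLAPSE_LIMIT` below `GEQO_THRESHOLD`. A merge that lengthens a FROM list never produces more than `from_collapse_limit` items, so with 8 < 11 a sub-query merge on its own cannot push that list into GEQO territory. The sketch below uses hypothetical names (`choose_join_search`, `JoinSearch`) to mirror the condition.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { PLAN_EXHAUSTIVE, PLAN_GEQO } JoinSearch;

/* Mirrors the test in make_fromexpr_rel(): use GEQO only when it is
 * enabled and the number of items to join reaches the threshold. */
JoinSearch choose_join_search(bool enable_geqo, int levels_needed,
                              int geqo_threshold)
{
    assert(geqo_threshold >= 2);    /* minimum allowed by guc.c */
    if (enable_geqo && levels_needed >= geqo_threshold)
        return PLAN_GEQO;
    return PLAN_EXHAUSTIVE;
}
```

With the defaults, any list of 1 to 8 items (the most a merge can produce) is planned exhaustively, and GEQO only starts at 11 items.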
......@@ -8,7 +8,7 @@
*
*
* IDENTIFICATION
* $Header: /cvsroot/pgsql/src/backend/optimizer/plan/planner.c,v 1.141 2003/01/20 18:54:52 tgl Exp $
* $Header: /cvsroot/pgsql/src/backend/optimizer/plan/planner.c,v 1.142 2003/01/25 23:10:27 tgl Exp $
*
*-------------------------------------------------------------------------
*/
......@@ -166,12 +166,6 @@ subquery_planner(Query *parse, double tuple_fraction)
parse->jointree = (FromExpr *)
pull_up_subqueries(parse, (Node *) parse->jointree, false);
/*
* If so, we may have created opportunities to simplify the jointree.
*/
parse->jointree = (FromExpr *)
preprocess_jointree(parse, (Node *) parse->jointree);
/*
* Detect whether any rangetable entries are RTE_JOIN kind; if not,
* we can avoid the expense of doing flatten_join_alias_vars().
......@@ -246,6 +240,16 @@ subquery_planner(Query *parse, double tuple_fraction)
}
parse->havingQual = (Node *) newHaving;
/*
* See if we can simplify the jointree; opportunities for this may come
* from having pulled up subqueries, or from flattening explicit JOIN
* syntax. We must do this after flattening JOIN alias variables, since
* eliminating explicit JOIN nodes from the jointree will cause
* get_relids_for_join() to fail.
*/
parse->jointree = (FromExpr *)
preprocess_jointree(parse, (Node *) parse->jointree);
/*
* Do the main planning. If we have an inherited target relation,
* that needs special processing, else go straight to
......
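The new planner.c comment says `preprocess_jointree()` must run after join alias variables are flattened, because removing explicit JOIN nodes makes `get_relids_for_join()` fail. The toy C model below shows that failure mode. Its types and names (`ToyNode`, `toy_find_join`) are hypothetical and far simpler than PostgreSQL's Node trees. A lookup by join RT index succeeds while the join node exists and returns nothing once the join has been flattened into a plain list of base relations.

```c
#include <assert.h>
#include <stddef.h>

/* Toy jointree node: a base relation, or an explicit JOIN that has its
 * own range-table index and two children. */
typedef struct ToyNode
{
    int             is_join;    /* 1 for a JOIN node, 0 for a base rel */
    int             rtindex;    /* range-table index of the rel or join */
    struct ToyNode *larg;       /* children; NULL for base rels */
    struct ToyNode *rarg;
} ToyNode;

/* Return the JOIN node whose RT index is joinrelid, or NULL if the
 * tree no longer contains such a node. */
const ToyNode *toy_find_join(const ToyNode *n, int joinrelid)
{
    const ToyNode *found;

    assert(joinrelid > 0);      /* RT indexes are 1-based */
    if (n == NULL || !n->is_join)
        return NULL;
    if (n->rtindex == joinrelid)
        return n;
    found = toy_find_join(n->larg, joinrelid);
    return found != NULL ? found : toy_find_join(n->rarg, joinrelid);
}
```

Before flattening, searching the tree for the join's index finds it. After flattening, only the base relations remain, so the same search returns NULL. That is why the simplification pass was moved after alias-variable flattening.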
......@@ -9,14 +9,13 @@
*
*
* IDENTIFICATION
* $Header: /cvsroot/pgsql/src/backend/optimizer/prep/prepjointree.c,v 1.1 2003/01/20 18:54:54 tgl Exp $
* $Header: /cvsroot/pgsql/src/backend/optimizer/prep/prepjointree.c,v 1.2 2003/01/25 23:10:27 tgl Exp $
*
*-------------------------------------------------------------------------
*/
#include "postgres.h"
#include "optimizer/clauses.h"
#include "optimizer/paths.h"
#include "optimizer/prep.h"
#include "optimizer/subselect.h"
#include "optimizer/var.h"
......@@ -24,6 +23,11 @@
#include "rewrite/rewriteManip.h"
/* These parameters are set by GUC */
int from_collapse_limit;
int join_collapse_limit;
static bool is_simple_subquery(Query *subquery);
static bool has_nullable_targetlist(Query *subquery);
static void resolvenew_in_jointree(Node *jtnode, int varno, List *subtlist);
......@@ -467,7 +471,16 @@ resolvenew_in_jointree(Node *jtnode, int varno, List *subtlist)
* case we can consider collapsing the two FromExprs into one. This is
* an optional conversion, since the planner will work correctly either
* way. But we may find a better plan (at the cost of more planning time)
* if we merge the two nodes.
* if we merge the two nodes, creating a single join search space out of
* two. To allow the user to trade off planning time against plan quality,
* we provide a control parameter from_collapse_limit that limits the size
* of the join search space that can be created this way.
*
* We also consider flattening explicit inner JOINs into FromExprs (which
* will in turn allow them to be merged into parent FromExprs). The tradeoffs
* here are the same as for flattening FromExprs, but we use a different
* control parameter so that the user can use explicit JOINs to control the
* join order even when they are inner JOINs.
*
* NOTE: don't try to do this in the same jointree scan that does subquery
* pullup! Since we're changing the jointree structure here, that wouldn't
......@@ -492,7 +505,7 @@ preprocess_jointree(Query *parse, Node *jtnode)
{
Node *child = (Node *) lfirst(l);
/* Recursively simplify the child... */
/* Recursively simplify this child... */
child = preprocess_jointree(parse, child);
/* Now, is it a FromExpr? */
if (child && IsA(child, FromExpr))
......@@ -500,21 +513,25 @@ preprocess_jointree(Query *parse, Node *jtnode)
/*
* Yes, so do we want to merge it into parent? Always do
* so if child has just one element (since that doesn't
* make the parent's list any longer). Otherwise we have
* to be careful about the increase in planning time
* caused by combining the two join search spaces into
* one. Our heuristic is to merge if the merge will
* produce a join list no longer than GEQO_RELS/2.
* (Perhaps need an additional user parameter?)
* make the parent's list any longer). Otherwise merge if
* the resulting join list would be no longer than
* from_collapse_limit.
*/
FromExpr *subf = (FromExpr *) child;
int childlen = length(subf->fromlist);
int myothers = length(newlist) + length(lnext(l));
if (childlen <= 1 || (childlen + myothers) <= geqo_rels / 2)
if (childlen <= 1 ||
(childlen + myothers) <= from_collapse_limit)
{
newlist = nconc(newlist, subf->fromlist);
f->quals = make_and_qual(subf->quals, f->quals);
/*
* By now, the quals have been converted to implicit-AND
* lists, so we just need to join the lists. NOTE: we
* put the pulled-up quals first.
*/
f->quals = (Node *) nconc((List *) subf->quals,
(List *) f->quals);
}
else
newlist = lappend(newlist, child);
......@@ -528,9 +545,64 @@ preprocess_jointree(Query *parse, Node *jtnode)
{
JoinExpr *j = (JoinExpr *) jtnode;
/* Can't usefully change the JoinExpr, but recurse on children */
/* Recursively simplify the children... */
j->larg = preprocess_jointree(parse, j->larg);
j->rarg = preprocess_jointree(parse, j->rarg);
/*
* If it is an outer join, we must not flatten it. An inner join
* is semantically equivalent to a FromExpr; we convert it to one,
* allowing it to be flattened into its parent, if the resulting
* FromExpr would have no more than join_collapse_limit members.
*/
if (j->jointype == JOIN_INNER && join_collapse_limit > 1)
{
int leftlen,
rightlen;
if (j->larg && IsA(j->larg, FromExpr))
leftlen = length(((FromExpr *) j->larg)->fromlist);
else
leftlen = 1;
if (j->rarg && IsA(j->rarg, FromExpr))
rightlen = length(((FromExpr *) j->rarg)->fromlist);
else
rightlen = 1;
if ((leftlen + rightlen) <= join_collapse_limit)
{
FromExpr *f = makeNode(FromExpr);
f->fromlist = NIL;
f->quals = NULL;
if (j->larg && IsA(j->larg, FromExpr))
{
FromExpr *subf = (FromExpr *) j->larg;
f->fromlist = subf->fromlist;
f->quals = subf->quals;
}
else
f->fromlist = makeList1(j->larg);
if (j->rarg && IsA(j->rarg, FromExpr))
{
FromExpr *subf = (FromExpr *) j->rarg;
f->fromlist = nconc(f->fromlist,
subf->fromlist);
f->quals = (Node *) nconc((List *) f->quals,
(List *) subf->quals);
}
else
f->fromlist = lappend(f->fromlist, j->rarg);
/* pulled-up quals first */
f->quals = (Node *) nconc((List *) f->quals,
(List *) j->quals);
return (Node *) f;
}
}
}
else
elog(ERROR, "preprocess_jointree: unexpected node type %d",
......@@ -615,6 +687,9 @@ get_relids_in_jointree(Node *jtnode)
/*
* get_relids_for_join: get list of base RT indexes making up a join
*
* NB: this will not work reliably after preprocess_jointree() is run,
* since that may eliminate join nodes from the jointree.
*/
List *
get_relids_for_join(Query *parse, int joinrelid)
......
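The new JoinExpr branch of `preprocess_jointree()` counts each side as its FROM-list length when that side is already a FromExpr, and as one item otherwise. It flattens only inner joins whose combined size fits in `join_collapse_limit`. Here is a minimal C sketch of that decision alone. The names (`inner_join_flattens`, the `ToyJoinType` values) are hypothetical, and the real code operates on Node trees and also concatenates the quals.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum
{
    TOY_JOIN_INNER, TOY_JOIN_LEFT, TOY_JOIN_FULL, TOY_JOIN_RIGHT
} ToyJoinType;

/* leftlen / rightlen: FROM-list length of a child that is already a
 * FromExpr, or 1 for any other child (base rel, outer join, ...). */
bool inner_join_flattens(ToyJoinType jointype, int leftlen, int rightlen,
                         int join_collapse_limit)
{
    assert(leftlen >= 1 && rightlen >= 1);
    /* Outer joins are never flattened; a limit of 1 disables flattening. */
    if (jointype != TOY_JOIN_INNER || join_collapse_limit <= 1)
        return false;
    return leftlen + rightlen <= join_collapse_limit;
}
```

Because children are simplified first, `a JOIN (b JOIN c ON ...) ON ...` with the default limit of 8 flattens bottom-up: `b JOIN c` becomes a two-item FromExpr (1 + 1), and then the outer join becomes a three-item FromExpr (1 + 2). With a limit of 2, only the inner `b JOIN c` is flattened.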
......@@ -5,7 +5,7 @@
* command, configuration file, and command line options.
* See src/backend/utils/misc/README for more information.
*
* $Header: /cvsroot/pgsql/src/backend/utils/misc/guc.c,v 1.110 2003/01/10 22:03:29 petere Exp $
* $Header: /cvsroot/pgsql/src/backend/utils/misc/guc.c,v 1.111 2003/01/25 23:10:27 tgl Exp $
*
* Copyright 2000 by PostgreSQL Global Development Group
* Written by Peter Eisentraut <peter_e@gmx.net>.
......@@ -37,7 +37,7 @@
#include "optimizer/cost.h"
#include "optimizer/geqo.h"
#include "optimizer/paths.h"
#include "optimizer/planmain.h"
#include "optimizer/prep.h"
#include "parser/parse_expr.h"
#include "storage/fd.h"
#include "storage/freespace.h"
......@@ -539,8 +539,16 @@ static struct config_int
10, 1, 1000, NULL, NULL
},
{
{"geqo_threshold", PGC_USERSET}, &geqo_rels,
DEFAULT_GEQO_RELS, 2, INT_MAX, NULL, NULL
{"from_collapse_limit", PGC_USERSET}, &from_collapse_limit,
8, 1, INT_MAX, NULL, NULL
},
{
{"join_collapse_limit", PGC_USERSET}, &join_collapse_limit,
8, 1, INT_MAX, NULL, NULL
},
{
{"geqo_threshold", PGC_USERSET}, &geqo_threshold,
11, 2, INT_MAX, NULL, NULL
},
{
{"geqo_pool_size", PGC_USERSET}, &Geqo_pool_size,
......
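Each `config_int` entry in the guc.c hunk gives a default, a minimum and a maximum. The hypothetical table below copies the three entries touched here and shows how a simple range check would treat candidate settings. The struct and function names are illustrative only; the real GUC machinery (`struct config_int`, `PGC_USERSET` contexts, assign hooks) does more.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <string.h>

typedef struct
{
    const char *name;
    int         boot_val;   /* default */
    int         min;
    int         max;
} ToyIntGuc;

/* Values copied from the guc.c hunk of this commit. */
static const ToyIntGuc toy_int_gucs[] = {
    {"from_collapse_limit", 8, 1, INT_MAX},
    {"join_collapse_limit", 8, 1, INT_MAX},
    {"geqo_threshold", 11, 2, INT_MAX},
};

/* Look up a parameter by name; NULL if it is not in the table. */
const ToyIntGuc *toy_find_guc(const char *name)
{
    size_t i;

    assert(name != NULL);
    for (i = 0; i < sizeof(toy_int_gucs) / sizeof(toy_int_gucs[0]); i++)
        if (strcmp(toy_int_gucs[i].name, name) == 0)
            return &toy_int_gucs[i];
    return NULL;
}

/* True if value lies within the parameter's [min, max] range. */
bool toy_value_in_range(const char *name, int value)
{
    const ToyIntGuc *g = toy_find_guc(name);

    return g != NULL && value >= g->min && value <= g->max;
}
```

The minimum of 1 for `join_collapse_limit` is what makes the documented "set it to 1" setting valid, while `geqo_threshold` rejects values below 2.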
......@@ -94,6 +94,9 @@
#cpu_index_tuple_cost = 0.001 # (same)
#cpu_operator_cost = 0.0025 # (same)
#from_collapse_limit = 8
#join_collapse_limit = 8 # 1 disables collapsing of explicit JOINs
#default_statistics_target = 10 # range 1-1000
#
......
......@@ -3,7 +3,7 @@
*
* Copyright 2000-2002 by PostgreSQL Global Development Group
*
* $Header: /cvsroot/pgsql/src/bin/psql/tab-complete.c,v 1.71 2003/01/10 22:03:30 petere Exp $
* $Header: /cvsroot/pgsql/src/bin/psql/tab-complete.c,v 1.72 2003/01/25 23:10:30 tgl Exp $
*/
/*----------------------------------------------------------------------
......@@ -252,6 +252,7 @@ psql_completion(char *text, int start, int end)
"explain_pretty_print",
"extra_float_digits",
"fixbtree",
"from_collapse_limit",
"fsync",
"geqo",
"geqo_effort",
......@@ -260,6 +261,7 @@ psql_completion(char *text, int start, int end)
"geqo_random_seed",
"geqo_selection_bias",
"geqo_threshold",
"join_collapse_limit",
"log_hostname",
"krb_server_keyfile",
"lc_messages",
......
......@@ -8,7 +8,7 @@
* Portions Copyright (c) 1996-2002, PostgreSQL Global Development Group
* Portions Copyright (c) 1994, Regents of the University of California
*
* $Id: paths.h,v 1.64 2003/01/24 03:58:43 tgl Exp $
* $Id: paths.h,v 1.65 2003/01/25 23:10:30 tgl Exp $
*
*-------------------------------------------------------------------------
*/
......@@ -18,16 +18,11 @@
#include "nodes/relation.h"
/* default GEQO threshold (default value for geqo_rels) */
/* If you change this, update backend/utils/misc/postgresql.sample.conf */
#define DEFAULT_GEQO_RELS 11
/*
* allpaths.c
*/
extern bool enable_geqo;
extern int geqo_rels;
extern int geqo_threshold;
extern RelOptInfo *make_one_rel(Query *root);
extern RelOptInfo *make_fromexpr_rel(Query *root, FromExpr *from);
......
......@@ -7,7 +7,7 @@
* Portions Copyright (c) 1996-2002, PostgreSQL Global Development Group
* Portions Copyright (c) 1994, Regents of the University of California
*
* $Id: prep.h,v 1.34 2003/01/20 18:55:05 tgl Exp $
* $Id: prep.h,v 1.35 2003/01/25 23:10:30 tgl Exp $
*
*-------------------------------------------------------------------------
*/
......@@ -20,6 +20,9 @@
/*
* prototypes for prepjointree.c
*/
extern int from_collapse_limit;
extern int join_collapse_limit;
extern Node *pull_up_IN_clauses(Query *parse, Node *node);
extern Node *pull_up_subqueries(Query *parse, Node *jtnode,
bool below_outer_join);
......