Commit 3969f292 authored by Tom Lane

Revise GEQO planner to make use of some heuristic knowledge about SQL, namely
that it's good to join where there are join clauses rather than where there
are not.  Also enable it to generate bushy plans at need, so that it doesn't
fail in the presence of multiple IN clauses containing sub-joins.  These
changes appear to improve the behavior enough that we can substantially reduce
the default pool size and generations count, thereby decreasing the runtime,
and yet get as good or better plans as we were getting in 7.4.  Consequently,
adjust the default GEQO parameters.  I also modified the way geqo_effort is
used so that it affects both population size and number of generations;
it's now useful as a single control to adjust the GEQO runtime-vs-plan-quality
tradeoff.  Bump geqo_threshold to 12, since even with these changes GEQO
seems to be slower than the regular planner at 11 relations.
parent 81c554bb
<!-- <!--
$PostgreSQL: pgsql/doc/src/sgml/runtime.sgml,v 1.231 2004/01/21 23:33:34 tgl Exp $ $PostgreSQL: pgsql/doc/src/sgml/runtime.sgml,v 1.232 2004/01/23 23:54:20 tgl Exp $
--> -->
<Chapter Id="runtime"> <Chapter Id="runtime">
...@@ -1396,7 +1396,7 @@ SET ENABLE_SEQSCAN TO OFF; ...@@ -1396,7 +1396,7 @@ SET ENABLE_SEQSCAN TO OFF;
Use genetic query optimization to plan queries with at least Use genetic query optimization to plan queries with at least
this many <literal>FROM</> items involved. (Note that an outer this many <literal>FROM</> items involved. (Note that an outer
<literal>JOIN</> construct counts as only one <literal>FROM</> <literal>JOIN</> construct counts as only one <literal>FROM</>
item.) The default is 11. For simpler queries it is usually best item.) The default is 12. For simpler queries it is usually best
to use the deterministic, exhaustive planner, but for queries with to use the deterministic, exhaustive planner, but for queries with
many tables the deterministic planner takes too long. many tables the deterministic planner takes too long.
</para> </para>
...@@ -1404,25 +1404,33 @@ SET ENABLE_SEQSCAN TO OFF; ...@@ -1404,25 +1404,33 @@ SET ENABLE_SEQSCAN TO OFF;
</varlistentry> </varlistentry>
<varlistentry> <varlistentry>
<term><varname>geqo_effort</varname> (<type>integer</type>)</term>
<term><varname>geqo_pool_size</varname> (<type>integer</type>)</term> <term><varname>geqo_pool_size</varname> (<type>integer</type>)</term>
<term><varname>geqo_generations</varname> (<type>integer</type>)</term> <term><varname>geqo_generations</varname> (<type>integer</type>)</term>
<term><varname>geqo_effort</varname> (<type>integer</type>)</term>
<term><varname>geqo_selection_bias</varname> (<type>floating point</type>)</term> <term><varname>geqo_selection_bias</varname> (<type>floating point</type>)</term>
<listitem> <listitem>
<para> <para>
Various tuning parameters for the genetic query optimization Various tuning parameters for the genetic query optimization
algorithm. The pool size is the number of individuals in one algorithm. The recommended one to modify is
population. Valid values are between 128 and 1024. If it is set <varname>geqo_effort</varname>, which can range from 1 to 10 with
to 0 (the default) a pool size of 2^(QS+1), where QS is the a default of 5. Larger values increase the time spent in planning
number of <literal>FROM</> items in the query, is used. but make it more likely that a good plan will be found.
<varname>geqo_effort</varname> doesn't actually do anything directly,
it is just used to compute the default values for the other
parameters. If you prefer, you can set the other parameters by hand
instead.
The pool size is the number of individuals in the genetic population.
It must be at least two, and useful values are typically 100 to 1000.
If it is set to zero (the default setting) then a suitable default
is chosen based on <varname>geqo_effort</varname> and the number of
tables in the query.
Generations specifies the number of iterations of the algorithm. Generations specifies the number of iterations of the algorithm.
The value must be a positive integer. If 0 is specified then It must be at least one, and useful values are in the same range
<literal>Effort * Log2(PoolSize)</literal> is used. as the pool size.
If it is set to zero (the default setting) then a suitable default
is chosen based on the pool size.
The run time of the algorithm is roughly proportional to the sum of The run time of the algorithm is roughly proportional to the sum of
pool size and generations. pool size and generations.
<varname>geqo_effort</varname> is only used in computing the default
generations setting, as just described. The default value is 40,
and the allowed range 1 to 100.
The selection bias is the selective pressure within the The selection bias is the selective pressure within the
population. Values can be from 1.50 to 2.00; the latter is the population. Values can be from 1.50 to 2.00; the latter is the
default. default.
......
...@@ -6,7 +6,7 @@ ...@@ -6,7 +6,7 @@
* Portions Copyright (c) 1996-2003, PostgreSQL Global Development Group * Portions Copyright (c) 1996-2003, PostgreSQL Global Development Group
* Portions Copyright (c) 1994, Regents of the University of California * Portions Copyright (c) 1994, Regents of the University of California
* *
* $PostgreSQL: pgsql/src/backend/optimizer/geqo/geqo_eval.c,v 1.66 2003/11/29 19:51:50 pgsql Exp $ * $PostgreSQL: pgsql/src/backend/optimizer/geqo/geqo_eval.c,v 1.67 2004/01/23 23:54:21 tgl Exp $
* *
*------------------------------------------------------------------------- *-------------------------------------------------------------------------
*/ */
...@@ -31,13 +31,17 @@ ...@@ -31,13 +31,17 @@
#include "utils/memutils.h" #include "utils/memutils.h"
static bool desirable_join(Query *root,
RelOptInfo *outer_rel, RelOptInfo *inner_rel);
/* /*
* geqo_eval * geqo_eval
* *
* Returns cost of a query tree as an individual of the population. * Returns cost of a query tree as an individual of the population.
*/ */
Cost Cost
geqo_eval(Query *root, List *initial_rels, Gene *tour, int num_gene) geqo_eval(Gene *tour, int num_gene, GeqoEvalData *evaldata)
{ {
MemoryContext mycontext; MemoryContext mycontext;
MemoryContext oldcxt; MemoryContext oldcxt;
...@@ -52,9 +56,9 @@ geqo_eval(Query *root, List *initial_rels, Gene *tour, int num_gene) ...@@ -52,9 +56,9 @@ geqo_eval(Query *root, List *initial_rels, Gene *tour, int num_gene)
* redundant cost calculations, we simply reject tours where tour[0] > * redundant cost calculations, we simply reject tours where tour[0] >
* tour[1], assigning them an artificially bad fitness. * tour[1], assigning them an artificially bad fitness.
* *
* (It would be better to tweak the GEQO logic to not generate such tours * init_tour() is aware of this rule and so we should never reject a
* in the first place, but I'm not sure of all the implications in the * tour during the initial filling of the pool. It seems difficult to
* mutation logic.) * persuade the recombination logic never to break the rule, however.
*/ */
if (num_gene >= 2 && tour[0] > tour[1]) if (num_gene >= 2 && tour[0] > tour[1])
return DBL_MAX; return DBL_MAX;
...@@ -80,10 +84,10 @@ geqo_eval(Query *root, List *initial_rels, Gene *tour, int num_gene) ...@@ -80,10 +84,10 @@ geqo_eval(Query *root, List *initial_rels, Gene *tour, int num_gene)
* this, it'll be pointing at recycled storage after the * this, it'll be pointing at recycled storage after the
* MemoryContextDelete below. * MemoryContextDelete below.
*/ */
savelist = root->join_rel_list; savelist = evaldata->root->join_rel_list;
/* construct the best path for the given combination of relations */ /* construct the best path for the given combination of relations */
joinrel = gimme_tree(root, initial_rels, tour, num_gene); joinrel = gimme_tree(tour, num_gene, evaldata);
/* /*
* compute fitness * compute fitness
...@@ -97,7 +101,7 @@ geqo_eval(Query *root, List *initial_rels, Gene *tour, int num_gene) ...@@ -97,7 +101,7 @@ geqo_eval(Query *root, List *initial_rels, Gene *tour, int num_gene)
fitness = DBL_MAX; fitness = DBL_MAX;
/* restore join_rel_list */ /* restore join_rel_list */
root->join_rel_list = savelist; evaldata->root->join_rel_list = savelist;
/* release all the memory acquired within gimme_tree */ /* release all the memory acquired within gimme_tree */
MemoryContextSwitchTo(oldcxt); MemoryContextSwitchTo(oldcxt);
...@@ -111,63 +115,156 @@ geqo_eval(Query *root, List *initial_rels, Gene *tour, int num_gene) ...@@ -111,63 +115,156 @@ geqo_eval(Query *root, List *initial_rels, Gene *tour, int num_gene)
* Form planner estimates for a join tree constructed in the specified * Form planner estimates for a join tree constructed in the specified
* order. * order.
* *
* 'root' is the Query
* 'initial_rels' is the list of initial relations (FROM-list items)
* 'tour' is the proposed join order, of length 'num_gene' * 'tour' is the proposed join order, of length 'num_gene'
* 'evaldata' contains the context we need
* *
* Returns a new join relation whose cheapest path is the best plan for * Returns a new join relation whose cheapest path is the best plan for
* this join order. NB: will return NULL if join order is invalid. * this join order. NB: will return NULL if join order is invalid.
* *
* Note that at each step we consider using the next rel as both left and * The original implementation of this routine always joined in the specified
* right side of a join. However, we cannot build general ("bushy") plan * order, and so could only build left-sided plans (and right-sided and
* trees this way, only left-sided and right-sided trees. * mixtures, as a byproduct of the fact that make_join_rel() is symmetric).
* It could never produce a "bushy" plan. This had a couple of big problems,
* of which the worst was that as of 7.4, there are situations involving IN
* subqueries where the only valid plans are bushy.
*
* The present implementation takes the given tour as a guideline, but
* postpones joins that seem unsuitable according to some heuristic rules.
* This allows correct bushy plans to be generated at need, and as a nice
* side-effect it seems to materially improve the quality of the generated
* plans.
*/ */
RelOptInfo * RelOptInfo *
gimme_tree(Query *root, List *initial_rels, gimme_tree(Gene *tour, int num_gene, GeqoEvalData *evaldata)
Gene *tour, int num_gene)
{ {
RelOptInfo **stack;
int stack_depth;
RelOptInfo *joinrel; RelOptInfo *joinrel;
int cur_rel_index;
int rel_count; int rel_count;
/* /*
* Start with the first relation ... * Create a stack to hold not-yet-joined relations.
*/ */
cur_rel_index = (int) tour[0]; stack = (RelOptInfo **) palloc(num_gene * sizeof(RelOptInfo *));
stack_depth = 0;
joinrel = (RelOptInfo *) nth(cur_rel_index - 1, initial_rels);
/* /*
* And add on each relation in the specified order ... * Push each relation onto the stack in the specified order. After
* pushing each relation, see whether the top two stack entries are
* joinable according to the desirable_join() heuristics. If so,
* join them into one stack entry, and try again to combine with the
* next stack entry down (if any). When the stack top is no longer
* joinable, continue to the next input relation. After we have pushed
* the last input relation, the heuristics are disabled and we force
* joining all the remaining stack entries.
*
* If desirable_join() always returns true, this produces a straight
* left-to-right join just like the old code. Otherwise we may produce
* a bushy plan or a left/right-sided plan that really corresponds to
* some tour other than the one given. To the extent that the heuristics
* are helpful, however, this will be a better plan than the raw tour.
*
* Also, when a join attempt fails (because of IN-clause constraints),
* we may be able to recover and produce a workable plan, where the old
* code just had to give up. This case acts the same as a false result
* from desirable_join().
*/ */
for (rel_count = 1; rel_count < num_gene; rel_count++) for (rel_count = 0; rel_count < num_gene; rel_count++)
{ {
RelOptInfo *inner_rel; int cur_rel_index;
RelOptInfo *new_rel;
/* Get the next input relation and push it */
cur_rel_index = (int) tour[rel_count]; cur_rel_index = (int) tour[rel_count];
stack[stack_depth] = (RelOptInfo *) nth(cur_rel_index - 1,
inner_rel = (RelOptInfo *) nth(cur_rel_index - 1, initial_rels); evaldata->initial_rels);
stack_depth++;
/* /*
* Construct a RelOptInfo representing the previous joinrel joined * While it's feasible, pop the top two stack entries and replace
* to inner_rel. These are always inner joins. Note that we * with their join.
* expect the joinrel not to exist in root->join_rel_list yet, and
* so the paths constructed for it will only include the ones we
* want.
*/ */
new_rel = make_join_rel(root, joinrel, inner_rel, JOIN_INNER); while (stack_depth >= 2)
{
RelOptInfo *outer_rel = stack[stack_depth - 2];
RelOptInfo *inner_rel = stack[stack_depth - 1];
/*
* Don't pop if heuristics say not to join now. However,
* once we have exhausted the input, the heuristics can't
* prevent popping.
*/
if (rel_count < num_gene - 1 &&
!desirable_join(evaldata->root, outer_rel, inner_rel))
break;
/* Fail if join order is not valid */ /*
if (new_rel == NULL) * Construct a RelOptInfo representing the join of these
return NULL; * two input relations. These are always inner joins.
* Note that we expect the joinrel not to exist in
* root->join_rel_list yet, and so the paths constructed for it
* will only include the ones we want.
*/
joinrel = make_join_rel(evaldata->root, outer_rel, inner_rel,
JOIN_INNER);
/* Find and save the cheapest paths for this rel */ /* Can't pop stack here if join order is not valid */
set_cheapest(new_rel); if (!joinrel)
break;
/* and repeat... */ /* Find and save the cheapest paths for this rel */
joinrel = new_rel; set_cheapest(joinrel);
/* Pop the stack and replace the inputs with their join */
stack_depth--;
stack[stack_depth - 1] = joinrel;
}
} }
/* Did we succeed in forming a single join relation? */
if (stack_depth == 1)
joinrel = stack[0];
else
joinrel = NULL;
pfree(stack);
return joinrel; return joinrel;
} }
/*
* Heuristics for gimme_tree: do we want to join these two relations?
*/
static bool
desirable_join(Query *root,
RelOptInfo *outer_rel, RelOptInfo *inner_rel)
{
List *i;
/*
* Join if there is an applicable join clause.
*/
foreach(i, outer_rel->joininfo)
{
JoinInfo *joininfo = (JoinInfo *) lfirst(i);
if (bms_is_subset(joininfo->unjoined_relids, inner_rel->relids))
return true;
}
/*
* Join if the rels are members of the same IN sub-select. This is
* needed to improve the odds that we will find a valid solution in
* a case where an IN sub-select has a clauseless join.
*/
foreach(i, root->in_info_list)
{
InClauseInfo *ininfo = (InClauseInfo *) lfirst(i);
if (bms_is_subset(outer_rel->relids, ininfo->righthand) &&
bms_is_subset(inner_rel->relids, ininfo->righthand))
return true;
}
/* Otherwise postpone the join till later. */
return false;
}
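Review note: the stack-based loop in the new gimme_tree() is easier to follow on a toy model. The sketch below is not the planner code. `ToyRel`, `toy_desirable_join`, `toy_gimme_tree` and the `clause` bitmask table are hypothetical stand-ins invented for illustration, and a toy join never fails the way make_join_rel() can. It only mirrors the push / test / pop-and-join control flow described in the comments above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_RELS 8
#define MAX_DESC 128

/* Toy stand-in for RelOptInfo: a set of base relations plus a printable tree. */
typedef struct
{
    unsigned relids;        /* bit i is set if base relation i is included */
    char     desc[MAX_DESC]; /* parenthesized join tree, for display only */
} ToyRel;

/*
 * Hypothetical heuristic: join only if some relation on the outer side has
 * a join clause mentioning a relation on the inner side.  clause[i] is a
 * bitmask of the relations that relation i shares a join clause with.
 */
static int
toy_desirable_join(const unsigned *clause, int nrels,
                   const ToyRel *outer, const ToyRel *inner)
{
    for (int i = 1; i <= nrels; i++)
        if ((outer->relids & (1u << i)) && (clause[i] & inner->relids))
            return 1;
    return 0;
}

/*
 * Build a join tree for 'tour' (1-based relation numbers) and write its
 * description to 'out'.  Returns 1 on success, 0 otherwise.
 * 'clause' must have num_gene + 1 entries; entry 0 is unused.
 */
static int
toy_gimme_tree(const int *tour, int num_gene, const unsigned *clause,
               char *out, size_t outlen)
{
    ToyRel stack[MAX_RELS];
    int    depth = 0;

    assert(num_gene >= 1 && num_gene <= MAX_RELS);

    for (int n = 0; n < num_gene; n++)
    {
        /* push the next input relation */
        ToyRel *top = &stack[depth++];

        top->relids = 1u << tour[n];
        snprintf(top->desc, sizeof top->desc, "%d", tour[n]);

        /* while feasible, replace the top two entries with their join */
        while (depth >= 2)
        {
            ToyRel *outer = &stack[depth - 2];
            ToyRel *inner = &stack[depth - 1];
            ToyRel  joined;

            /* the heuristic may only postpone joins while input remains */
            if (n < num_gene - 1 &&
                !toy_desirable_join(clause, num_gene, outer, inner))
                break;

            joined.relids = outer->relids | inner->relids;
            snprintf(joined.desc, sizeof joined.desc, "(%.60s %.60s)",
                     outer->desc, inner->desc);

            depth--;
            stack[depth - 1] = joined;
        }
    }

    /* did the stack collapse to a single join relation? */
    if (depth != 1 || strlen(stack[0].desc) >= outlen)
        return 0;
    strcpy(out, stack[0].desc);
    return 1;
}
```

With join clauses only between relations 1 and 2 and between 3 and 4, the tour 1 2 3 4 yields the bushy tree ((1 2) (3 4)) in this model: the join of (1 2) with 3 is postponed because no clause links them, and the forced joins after the last push combine 3 with 4 first.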
...@@ -7,7 +7,7 @@ ...@@ -7,7 +7,7 @@
* Portions Copyright (c) 1996-2003, PostgreSQL Global Development Group * Portions Copyright (c) 1996-2003, PostgreSQL Global Development Group
* Portions Copyright (c) 1994, Regents of the University of California * Portions Copyright (c) 1994, Regents of the University of California
* *
* $PostgreSQL: pgsql/src/backend/optimizer/geqo/geqo_main.c,v 1.42 2004/01/21 23:33:34 tgl Exp $ * $PostgreSQL: pgsql/src/backend/optimizer/geqo/geqo_main.c,v 1.43 2004/01/23 23:54:21 tgl Exp $
* *
*------------------------------------------------------------------------- *-------------------------------------------------------------------------
*/ */
...@@ -37,9 +37,9 @@ ...@@ -37,9 +37,9 @@
/* /*
* Configuration options * Configuration options
*/ */
int Geqo_effort;
int Geqo_pool_size; int Geqo_pool_size;
int Geqo_generations; int Geqo_generations;
int Geqo_effort;
double Geqo_selection_bias; double Geqo_selection_bias;
...@@ -66,6 +66,7 @@ static int gimme_number_generations(int pool_size); ...@@ -66,6 +66,7 @@ static int gimme_number_generations(int pool_size);
RelOptInfo * RelOptInfo *
geqo(Query *root, int number_of_rels, List *initial_rels) geqo(Query *root, int number_of_rels, List *initial_rels)
{ {
GeqoEvalData evaldata;
int generation; int generation;
Chromosome *momma; Chromosome *momma;
Chromosome *daddy; Chromosome *daddy;
...@@ -90,6 +91,10 @@ geqo(Query *root, int number_of_rels, List *initial_rels) ...@@ -90,6 +91,10 @@ geqo(Query *root, int number_of_rels, List *initial_rels)
int mutations = 0; int mutations = 0;
#endif #endif
/* set up evaldata */
evaldata.root = root;
evaldata.initial_rels = initial_rels;
/* set GA parameters */ /* set GA parameters */
pool_size = gimme_pool_size(number_of_rels); pool_size = gimme_pool_size(number_of_rels);
number_generations = gimme_number_generations(pool_size); number_generations = gimme_number_generations(pool_size);
...@@ -99,7 +104,7 @@ geqo(Query *root, int number_of_rels, List *initial_rels) ...@@ -99,7 +104,7 @@ geqo(Query *root, int number_of_rels, List *initial_rels)
pool = alloc_pool(pool_size, number_of_rels); pool = alloc_pool(pool_size, number_of_rels);
/* random initialization of the pool */ /* random initialization of the pool */
random_init_pool(root, initial_rels, pool, 0, pool->size); random_init_pool(pool, &evaldata);
/* sort the pool according to cheapest path as fitness */ /* sort the pool according to cheapest path as fitness */
sort_pool(pool); /* we have to do it only one time, since sort_pool(pool); /* we have to do it only one time, since
...@@ -107,41 +112,54 @@ geqo(Query *root, int number_of_rels, List *initial_rels) ...@@ -107,41 +112,54 @@ geqo(Query *root, int number_of_rels, List *initial_rels)
* in future (-> geqo_pool.c:spread_chromo * in future (-> geqo_pool.c:spread_chromo
* ) */ * ) */
#ifdef GEQO_DEBUG
elog(DEBUG1, "GEQO selected %d pool entries, best %.2f, worst %.2f",
pool_size,
pool->data[0].worth,
pool->data[pool_size - 1].worth);
#endif
/* allocate chromosome momma and daddy memory */ /* allocate chromosome momma and daddy memory */
momma = alloc_chromo(pool->string_length); momma = alloc_chromo(pool->string_length);
daddy = alloc_chromo(pool->string_length); daddy = alloc_chromo(pool->string_length);
#if defined (ERX) #if defined (ERX)
ereport(DEBUG2, #ifdef GEQO_DEBUG
(errmsg_internal("using edge recombination crossover [ERX]"))); elog(DEBUG2, "using edge recombination crossover [ERX]");
#endif
/* allocate edge table memory */ /* allocate edge table memory */
edge_table = alloc_edge_table(pool->string_length); edge_table = alloc_edge_table(pool->string_length);
#elif defined(PMX) #elif defined(PMX)
ereport(DEBUG2, #ifdef GEQO_DEBUG
(errmsg_internal("using partially matched crossover [PMX]"))); elog(DEBUG2, "using partially matched crossover [PMX]");
#endif
/* allocate chromosome kid memory */ /* allocate chromosome kid memory */
kid = alloc_chromo(pool->string_length); kid = alloc_chromo(pool->string_length);
#elif defined(CX) #elif defined(CX)
ereport(DEBUG2, #ifdef GEQO_DEBUG
(errmsg_internal("using cycle crossover [CX]"))); elog(DEBUG2, "using cycle crossover [CX]");
#endif
/* allocate city table memory */ /* allocate city table memory */
kid = alloc_chromo(pool->string_length); kid = alloc_chromo(pool->string_length);
city_table = alloc_city_table(pool->string_length); city_table = alloc_city_table(pool->string_length);
#elif defined(PX) #elif defined(PX)
ereport(DEBUG2, #ifdef GEQO_DEBUG
(errmsg_internal("using position crossover [PX]"))); elog(DEBUG2, "using position crossover [PX]");
#endif
/* allocate city table memory */ /* allocate city table memory */
kid = alloc_chromo(pool->string_length); kid = alloc_chromo(pool->string_length);
city_table = alloc_city_table(pool->string_length); city_table = alloc_city_table(pool->string_length);
#elif defined(OX1) #elif defined(OX1)
ereport(DEBUG2, #ifdef GEQO_DEBUG
(errmsg_internal("using order crossover [OX1]"))); elog(DEBUG2, "using order crossover [OX1]");
#endif
/* allocate city table memory */ /* allocate city table memory */
kid = alloc_chromo(pool->string_length); kid = alloc_chromo(pool->string_length);
city_table = alloc_city_table(pool->string_length); city_table = alloc_city_table(pool->string_length);
#elif defined(OX2) #elif defined(OX2)
ereport(DEBUG2, #ifdef GEQO_DEBUG
(errmsg_internal("using order crossover [OX2]"))); elog(DEBUG2, "using order crossover [OX2]");
#endif
/* allocate city table memory */ /* allocate city table memory */
kid = alloc_chromo(pool->string_length); kid = alloc_chromo(pool->string_length);
city_table = alloc_city_table(pool->string_length); city_table = alloc_city_table(pool->string_length);
...@@ -189,8 +207,7 @@ geqo(Query *root, int number_of_rels, List *initial_rels) ...@@ -189,8 +207,7 @@ geqo(Query *root, int number_of_rels, List *initial_rels)
/* EVALUATE FITNESS */ /* EVALUATE FITNESS */
kid->worth = geqo_eval(root, initial_rels, kid->worth = geqo_eval(kid->string, pool->string_length, &evaldata);
kid->string, pool->string_length);
/* push the kid into the wilderness of life according to its worth */ /* push the kid into the wilderness of life according to its worth */
spread_chromo(kid, pool); spread_chromo(kid, pool);
...@@ -207,24 +224,28 @@ geqo(Query *root, int number_of_rels, List *initial_rels) ...@@ -207,24 +224,28 @@ geqo(Query *root, int number_of_rels, List *initial_rels)
#if defined(ERX) && defined(GEQO_DEBUG) #if defined(ERX) && defined(GEQO_DEBUG)
if (edge_failures != 0) if (edge_failures != 0)
elog(LOG, "[GEQO] failures: %d, average: %d", elog(LOG, "[GEQO] failures: %d, average: %d",
edge_failures, (int) generation / edge_failures); edge_failures, (int) number_generations / edge_failures);
else else
elog(LOG, "[GEQO] no edge failures detected"); elog(LOG, "[GEQO] no edge failures detected");
#endif #endif
#if defined(CX) && defined(GEQO_DEBUG) #if defined(CX) && defined(GEQO_DEBUG)
if (mutations != 0) if (mutations != 0)
elog(LOG, "[GEQO] mutations: %d, generations: %d", mutations, generation); elog(LOG, "[GEQO] mutations: %d, generations: %d",
mutations, number_generations);
else else
elog(LOG, "[GEQO] no mutations processed"); elog(LOG, "[GEQO] no mutations processed");
#endif #endif
#ifdef GEQO_DEBUG #ifdef GEQO_DEBUG
print_pool(stdout, pool, 0, pool_size - 1); print_pool(stdout, pool, 0, pool_size - 1);
#endif #endif
#ifdef GEQO_DEBUG
elog(DEBUG1, "GEQO best is %.2f after %d generations",
pool->data[0].worth, number_generations);
#endif
/* /*
* got the cheapest query tree processed by geqo; first element of the * got the cheapest query tree processed by geqo; first element of the
...@@ -233,8 +254,7 @@ geqo(Query *root, int number_of_rels, List *initial_rels) ...@@ -233,8 +254,7 @@ geqo(Query *root, int number_of_rels, List *initial_rels)
best_tour = (Gene *) pool->data[0].string; best_tour = (Gene *) pool->data[0].string;
/* root->join_rel_list will be modified during this ! */ /* root->join_rel_list will be modified during this ! */
best_rel = gimme_tree(root, initial_rels, best_rel = gimme_tree(best_tour, pool->string_length, &evaldata);
best_tour, pool->string_length);
if (best_rel == NULL) if (best_rel == NULL)
elog(ERROR, "failed to make a valid plan"); elog(ERROR, "failed to make a valid plan");
...@@ -272,53 +292,49 @@ geqo(Query *root, int number_of_rels, List *initial_rels) ...@@ -272,53 +292,49 @@ geqo(Query *root, int number_of_rels, List *initial_rels)
} }
/* /*
* Return either configured pool size or * Return either configured pool size or a good default
* a good default based on query size (no. of relations) *
* = 2^(QS+1) * The default is based on query size (no. of relations) = 2^(QS+1),
* also constrain between 128 and 1024 * but constrained to a range based on the effort value.
*/ */
static int static int
gimme_pool_size(int nr_rel) gimme_pool_size(int nr_rel)
{ {
double size; double size;
int minsize;
int maxsize;
if (Geqo_pool_size > 0) /* Legal pool size *must* be at least 2, so ignore attempt to select 1 */
if (Geqo_pool_size >= 2)
return Geqo_pool_size; return Geqo_pool_size;
size = pow(2.0, nr_rel + 1.0); size = pow(2.0, nr_rel + 1.0);
if (size < MIN_GEQO_POOL_SIZE) maxsize = 50 * Geqo_effort; /* 50 to 500 individuals */
return MIN_GEQO_POOL_SIZE; if (size > maxsize)
else if (size > MAX_GEQO_POOL_SIZE) return maxsize;
return MAX_GEQO_POOL_SIZE;
else
return (int) ceil(size);
}
minsize = 10 * Geqo_effort; /* 10 to 100 individuals */
if (size < minsize)
return minsize;
return (int) ceil(size);
}
/* /*
* Return either configured number of generations or * Return either configured number of generations or a good default
* some reasonable default calculated on the fly. *
* = Effort * Log2(PoolSize) * The default is the same as the pool size, which allows us to be
* sure that less-fit individuals get pushed out of the breeding
* population before the run finishes.
*/ */
static int static int
gimme_number_generations(int pool_size) gimme_number_generations(int pool_size)
{ {
double gens;
if (Geqo_generations > 0) if (Geqo_generations > 0)
return Geqo_generations; return Geqo_generations;
gens = Geqo_effort * log((double) pool_size) / log(2.0); return pool_size;
/* bound it to a sane range */
if (gens <= 0)
gens = 1;
else if (gens > 10000)
gens = 10000;
return (int) ceil(gens);
} }
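Review note: to see what the new defaults work out to, here is a small standalone sketch of the rule in gimme_pool_size() and gimme_number_generations(). The function names and the explicit `effort` argument are assumptions made for illustration (the diff reads the global Geqo_effort), and the power of two is computed with integer doubling instead of pow() and ceil() so the sketch needs no math library. The clamping bounds of 10 * effort and 50 * effort are taken from the diff.

```c
#include <assert.h>

/*
 * Default pool size: 2^(nr_rel + 1), clamped to [10 * effort, 50 * effort].
 * 'effort' is assumed to be in the documented range 1..10.
 */
static int
sketch_default_pool_size(int nr_rel, int effort)
{
    int minsize = 10 * effort;  /* 10 to 100 individuals */
    int maxsize = 50 * effort;  /* 50 to 500 individuals */
    int size = 1;

    assert(effort >= 1 && effort <= 10);

    /* double nr_rel + 1 times, stopping early once past the cap */
    for (int i = 0; i <= nr_rel && size <= maxsize; i++)
        size *= 2;

    if (size > maxsize)
        return maxsize;
    if (size < minsize)
        return minsize;
    return size;
}

/* Default number of generations: the same as the pool size. */
static int
sketch_default_generations(int pool_size)
{
    return pool_size;
}
```

At the default effort of 5, a 12-relation query would be capped at 250 individuals and 250 generations under this rule, since 2^13 is far above 50 * 5.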
...@@ -6,7 +6,7 @@ ...@@ -6,7 +6,7 @@
* Portions Copyright (c) 1996-2003, PostgreSQL Global Development Group * Portions Copyright (c) 1996-2003, PostgreSQL Global Development Group
* Portions Copyright (c) 1994, Regents of the University of California * Portions Copyright (c) 1994, Regents of the University of California
* *
* $PostgreSQL: pgsql/src/backend/optimizer/geqo/geqo_pool.c,v 1.22 2003/11/29 22:39:49 pgsql Exp $ * $PostgreSQL: pgsql/src/backend/optimizer/geqo/geqo_pool.c,v 1.23 2004/01/23 23:54:21 tgl Exp $
* *
*------------------------------------------------------------------------- *-------------------------------------------------------------------------
*/ */
...@@ -22,6 +22,11 @@ ...@@ -22,6 +22,11 @@
/* -- parts of this are adapted from D. Whitley's Genitor algorithm -- */ /* -- parts of this are adapted from D. Whitley's Genitor algorithm -- */
#include "postgres.h" #include "postgres.h"
#include <float.h>
#include <limits.h>
#include <math.h>
#include "optimizer/geqo.h" #include "optimizer/geqo.h"
#include "optimizer/geqo_copy.h" #include "optimizer/geqo_copy.h"
#include "optimizer/geqo_pool.h" #include "optimizer/geqo_pool.h"
...@@ -84,19 +89,42 @@ free_pool(Pool *pool) ...@@ -84,19 +89,42 @@ free_pool(Pool *pool)
* initialize genetic pool * initialize genetic pool
*/ */
void void
random_init_pool(Query *root, List *initial_rels, random_init_pool(Pool *pool, GeqoEvalData *evaldata)
Pool *pool, int strt, int stp)
{ {
Chromosome *chromo = (Chromosome *) pool->data; Chromosome *chromo = (Chromosome *) pool->data;
int i; int i;
int bad = 0;
for (i = strt; i < stp; i++) /*
* We immediately discard any invalid individuals (those that geqo_eval
* returns DBL_MAX for), thereby not wasting pool space on them.
*
* If we fail to make any valid individuals after 10000 tries, give up;
* this probably means something is broken, and we shouldn't just let
* ourselves get stuck in an infinite loop.
*/
i = 0;
while (i < pool->size)
{ {
init_tour(chromo[i].string, pool->string_length); init_tour(chromo[i].string, pool->string_length);
pool->data[i].worth = geqo_eval(root, initial_rels, pool->data[i].worth = geqo_eval(chromo[i].string,
chromo[i].string, pool->string_length,
pool->string_length); evaldata);
if (pool->data[i].worth < DBL_MAX)
i++;
else
{
bad++;
if (i == 0 && bad >= 10000)
elog(ERROR, "failed to make a valid plan");
}
} }
#ifdef GEQO_DEBUG
if (bad > 0)
elog(DEBUG1, "%d invalid tours found while selecting %d pool entries",
bad, pool->size);
#endif
} }
/* /*
...@@ -113,20 +141,17 @@ sort_pool(Pool *pool) ...@@ -113,20 +141,17 @@ sort_pool(Pool *pool)
/* /*
* compare * compare
* static input function for pg_sort * qsort comparison function for sort_pool
*
* return values for sort from smallest to largest are prooved!
* don't change them!
*/ */
static int static int
compare(const void *arg1, const void *arg2) compare(const void *arg1, const void *arg2)
{ {
Chromosome chromo1 = *(Chromosome *) arg1; const Chromosome *chromo1 = (const Chromosome *) arg1;
Chromosome chromo2 = *(Chromosome *) arg2; const Chromosome *chromo2 = (const Chromosome *) arg2;
if (chromo1.worth == chromo2.worth) if (chromo1->worth == chromo2->worth)
return 0; return 0;
else if (chromo1.worth > chromo2.worth) else if (chromo1->worth > chromo2->worth)
return 1; return 1;
else else
return -1; return -1;
......
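Review note: the new fill loop in random_init_pool() keeps drawing candidates until every pool slot holds a valid one. A minimal sketch of that pattern follows. `EvalFn`, `sketch_fill_pool`, and the two sample evaluators are hypothetical names introduced here, and the evaluators replace init_tour() plus geqo_eval() with a deterministic rule so the behaviour is easy to check.

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>

/* Hypothetical evaluator: returns a cost, or DBL_MAX for an invalid candidate. */
typedef double (*EvalFn) (int attempt);

/* Sample evaluator: odd-numbered attempts are invalid. */
static double
eval_reject_odd(int attempt)
{
    return (attempt % 2) ? DBL_MAX : (double) attempt;
}

/* Sample evaluator: every attempt is invalid. */
static double
eval_reject_all(int attempt)
{
    (void) attempt;
    return DBL_MAX;
}

/*
 * Fill worth[0 .. size-1] with valid costs, discarding invalid candidates
 * immediately.  Returns the number of rejected candidates, or -1 if no
 * valid candidate at all was found within max_tries attempts.
 */
static int
sketch_fill_pool(double *worth, int size, EvalFn eval, int max_tries)
{
    int i = 0;
    int bad = 0;
    int attempt = 0;

    assert(worth != NULL && size > 0 && max_tries > 0);

    while (i < size)
    {
        double w = eval(attempt++);

        if (w < DBL_MAX)
            worth[i++] = w;     /* keep it and move to the next slot */
        else
        {
            bad++;
            /* as in the diff, give up only if slot 0 was never filled */
            if (i == 0 && bad >= max_tries)
                return -1;
        }
    }
    return bad;
}
```

As in the diff, the retry limit only applies while the first slot is still empty. Once one valid candidate exists, the loop relies on further valid candidates turning up eventually.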
...@@ -3,7 +3,7 @@ ...@@ -3,7 +3,7 @@
* geqo_recombination.c * geqo_recombination.c
* misc recombination procedures * misc recombination procedures
* *
* $PostgreSQL: pgsql/src/backend/optimizer/geqo/geqo_recombination.c,v 1.12 2003/11/29 22:39:49 pgsql Exp $ * $PostgreSQL: pgsql/src/backend/optimizer/geqo/geqo_recombination.c,v 1.13 2004/01/23 23:54:21 tgl Exp $
* *
*------------------------------------------------------------------------- *-------------------------------------------------------------------------
*/ */
...@@ -19,6 +19,7 @@ ...@@ -19,6 +19,7 @@
/* -- parts of this are adapted from D. Whitley's Genitor algorithm -- */ /* -- parts of this are adapted from D. Whitley's Genitor algorithm -- */
#include "postgres.h" #include "postgres.h"
#include "optimizer/geqo_random.h" #include "optimizer/geqo_random.h"
#include "optimizer/geqo_recombination.h" #include "optimizer/geqo_recombination.h"
...@@ -32,7 +33,6 @@ ...@@ -32,7 +33,6 @@
* points on the tour and randomly chooses the 'next' city from * points on the tour and randomly chooses the 'next' city from
* this array. When a city is chosen, the array is shortened * this array. When a city is chosen, the array is shortened
* and the procedure repeated. * and the procedure repeated.
*
*/ */
void void
init_tour(Gene *tour, int num_gene) init_tour(Gene *tour, int num_gene)
...@@ -42,31 +42,43 @@ init_tour(Gene *tour, int num_gene) ...@@ -42,31 +42,43 @@ init_tour(Gene *tour, int num_gene)
int next, int next,
i; i;
/* Fill a temp array with the IDs of all not-yet-visited cities */
tmp = (Gene *) palloc(num_gene * sizeof(Gene)); tmp = (Gene *) palloc(num_gene * sizeof(Gene));
for (i = 0; i < num_gene; i++) for (i = 0; i < num_gene; i++)
{ tmp[i] = (Gene) (i + 1);
tmp[i] = (Gene) i + 1; /* builds tours "1 - 2 - 3" etc. */
}
remainder = num_gene - 1; remainder = num_gene - 1;
for (i = 0; i < num_gene; i++) for (i = 0; i < num_gene; i++)
{ {
next = (int) geqo_randint(remainder, 0); /* choose city between 0 /* choose value between 0 and remainder inclusive */
* and remainder */ next = (int) geqo_randint(remainder, 0);
/* output that element of the tmp array */
tour[i] = tmp[next]; tour[i] = tmp[next];
/* and delete it */
tmp[next] = tmp[remainder]; tmp[next] = tmp[remainder];
remainder--; remainder--;
} }
/*
* Since geqo_eval() will reject tours where tour[0] > tour[1],
* we may as well switch the two to make it a valid tour.
*/
if (num_gene >= 2 && tour[0] > tour[1])
{
Gene gtmp = tour[0];
tour[0] = tour[1];
tour[1] = gtmp;
}
pfree(tmp); pfree(tmp);
} }
/* alloc_city_table /* alloc_city_table
* *
* allocate memory for city table * allocate memory for city table
*
*/ */
City * City *
alloc_city_table(int num_gene) alloc_city_table(int num_gene)
...@@ -77,7 +89,6 @@ alloc_city_table(int num_gene) ...@@ -77,7 +89,6 @@ alloc_city_table(int num_gene)
* palloc one extra location so that nodes numbered 1..n can be * palloc one extra location so that nodes numbered 1..n can be
* indexed directly; 0 will not be used * indexed directly; 0 will not be used
*/ */
city_table = (City *) palloc((num_gene + 1) * sizeof(City)); city_table = (City *) palloc((num_gene + 1) * sizeof(City));
return city_table; return city_table;
@@ -86,7 +97,6 @@ alloc_city_table(int num_gene)
 /* free_city_table
  *
  * deallocate memory of city table
- *
  */
 void
 free_city_table(City *city_table)
......
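The revised `init_tour` above builds a random permutation by repeatedly picking a random slot among the not-yet-used entries of a temp array, emitting it, and overwriting that slot with the last unused entry. The following is a minimal standalone sketch of that technique, not the PostgreSQL code itself. It assumes `malloc`/`free` in place of `palloc`/`pfree`, and `sketch_randint` is a hypothetical stand-in for `geqo_randint` built on `rand()` (so it is slightly biased). The names `sketch_init_tour` and `sketch_is_permutation` are illustrative only.

```c
#include <assert.h>
#include <stdlib.h>

typedef int Gene;

/* Hypothetical stand-in for geqo_randint(upper, lower): value in [lower, upper]. */
static int
sketch_randint(int upper, int lower)
{
    return lower + rand() % (upper - lower + 1);
}

/* Fill tour[0..num_gene-1] with a random permutation of 1..num_gene. */
static void
sketch_init_tour(Gene *tour, int num_gene)
{
    Gene *tmp;
    int remainder;
    int next;
    int i;

    assert(num_gene > 0);
    tmp = (Gene *) malloc((size_t) num_gene * sizeof(Gene));
    assert(tmp != NULL);

    /* tmp holds the IDs that have not been placed in the tour yet */
    for (i = 0; i < num_gene; i++)
        tmp[i] = (Gene) (i + 1);

    remainder = num_gene - 1;
    for (i = 0; i < num_gene; i++)
    {
        /* pick an index between 0 and remainder inclusive */
        next = sketch_randint(remainder, 0);
        tour[i] = tmp[next];
        /* delete the picked entry by moving the last unused one into its slot */
        tmp[next] = tmp[remainder];
        remainder--;
    }

    /* mirror the diff: make sure tour[0] < tour[1] so the tour is accepted */
    if (num_gene >= 2 && tour[0] > tour[1])
    {
        Gene gtmp = tour[0];

        tour[0] = tour[1];
        tour[1] = gtmp;
    }

    free(tmp);
}

/* Return 1 if tour is a permutation of 1..num_gene, else 0. */
static int
sketch_is_permutation(const Gene *tour, int num_gene)
{
    int *seen = (int *) calloc((size_t) num_gene + 1, sizeof(int));
    int ok = 1;
    int i;

    assert(seen != NULL);
    for (i = 0; i < num_gene; i++)
    {
        if (tour[i] < 1 || tour[i] > num_gene || seen[tour[i]])
        {
            ok = 0;
            break;
        }
        seen[tour[i]] = 1;
    }
    free(seen);
    return ok;
}
```

Swapping the first two genes keeps the result a permutation, so the final check costs nothing in correctness; it only avoids generating a tour that the evaluator would reject anyway.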
@@ -10,7 +10,7 @@
  * Written by Peter Eisentraut <peter_e@gmx.net>.
  *
  * IDENTIFICATION
- * $PostgreSQL: pgsql/src/backend/utils/misc/guc.c,v 1.178 2004/01/21 23:33:34 tgl Exp $
+ * $PostgreSQL: pgsql/src/backend/utils/misc/guc.c,v 1.179 2004/01/23 23:54:21 tgl Exp $
  *
  *--------------------------------------------------------------------
  */
@@ -921,33 +921,32 @@ static struct config_int ConfigureNamesInt[] =
             NULL
         },
         &geqo_threshold,
-        11, 2, INT_MAX, NULL, NULL
+        12, 2, INT_MAX, NULL, NULL
     },
     {
-        {"geqo_pool_size", PGC_USERSET, QUERY_TUNING_GEQO,
-            gettext_noop("GEQO: number of individuals in one population."),
+        {"geqo_effort", PGC_USERSET, QUERY_TUNING_GEQO,
+            gettext_noop("GEQO: effort is used to set the default for other GEQO parameters."),
             NULL
         },
+        &Geqo_effort,
+        DEFAULT_GEQO_EFFORT, MIN_GEQO_EFFORT, MAX_GEQO_EFFORT, NULL, NULL
+    },
+    {
+        {"geqo_pool_size", PGC_USERSET, QUERY_TUNING_GEQO,
+            gettext_noop("GEQO: number of individuals in the population."),
+            gettext_noop("Zero selects a suitable default value.")
+        },
         &Geqo_pool_size,
-        DEFAULT_GEQO_POOL_SIZE, 0, MAX_GEQO_POOL_SIZE, NULL, NULL
+        0, 0, INT_MAX, NULL, NULL
     },
     {
         {"geqo_generations", PGC_USERSET, QUERY_TUNING_GEQO,
             gettext_noop("GEQO: number of iterations of the algorithm."),
-            gettext_noop("The value must be a positive integer. If 0 is "
-                         "specified then effort * log2(poolsize) is used.")
+            gettext_noop("Zero selects a suitable default value.")
         },
         &Geqo_generations,
         0, 0, INT_MAX, NULL, NULL
     },
-    {
-        {"geqo_effort", PGC_USERSET, QUERY_TUNING_GEQO,
-            gettext_noop("GEQO: effort is used to set the default for generations."),
-            NULL
-        },
-        &Geqo_effort,
-        DEFAULT_GEQO_EFFORT, MIN_GEQO_EFFORT, MAX_GEQO_EFFORT, NULL, NULL
-    },
     {
         {"deadlock_timeout", PGC_SIGHUP, LOCK_MANAGEMENT,
......
@@ -122,11 +122,10 @@
 # - Genetic Query Optimizer -
 #geqo = true
-#geqo_threshold = 11
-#geqo_pool_size = 0 # default based on tables in statement,
-                    # range 128-1024
-#geqo_generations = 0 # use default: effort * log2(pool_size)
-#geqo_effort = 40 # range 1-100
+#geqo_threshold = 12
+#geqo_effort = 5 # range 1-10
+#geqo_pool_size = 0 # selects default based on effort
+#geqo_generations = 0 # selects default based on effort
 #geqo_selection_bias = 2.0 # range 1.5-2.0
 # - Other Planner Options -
......
@@ -6,7 +6,7 @@
  * Portions Copyright (c) 1996-2003, PostgreSQL Global Development Group
  * Portions Copyright (c) 1994, Regents of the University of California
  *
- * $PostgreSQL: pgsql/src/include/optimizer/geqo.h,v 1.34 2004/01/21 23:33:34 tgl Exp $
+ * $PostgreSQL: pgsql/src/include/optimizer/geqo.h,v 1.35 2004/01/23 23:54:21 tgl Exp $
  *
  *-------------------------------------------------------------------------
  */
...@@ -25,6 +25,7 @@ ...@@ -25,6 +25,7 @@
#include "nodes/relation.h" #include "nodes/relation.h"
#include "optimizer/geqo_gene.h" #include "optimizer/geqo_gene.h"
/* GEQO debug flag */ /* GEQO debug flag */
/* /*
#define GEQO_DEBUG #define GEQO_DEBUG
@@ -47,21 +48,15 @@
  *
  * If you change these, update backend/utils/misc/postgresql.sample.conf
  */
-extern int Geqo_pool_size;
-#define DEFAULT_GEQO_POOL_SIZE 0 /* = default based on no. of relations. */
-#define MIN_GEQO_POOL_SIZE 128
-#define MAX_GEQO_POOL_SIZE 1024
-extern int Geqo_generations; /* 1 .. inf, or 0 to use default based on
-                              * pool size */
-extern int Geqo_effort; /* only used to calculate default for
-                         * generations */
-#define DEFAULT_GEQO_EFFORT 40
-#define MIN_GEQO_EFFORT 1
-#define MAX_GEQO_EFFORT 100
+extern int Geqo_effort; /* 1 .. 10, knob for adjustment of defaults */
+#define DEFAULT_GEQO_EFFORT 5
+#define MIN_GEQO_EFFORT 1
+#define MAX_GEQO_EFFORT 10
+extern int Geqo_pool_size; /* 2 .. inf, or 0 to use default */
+extern int Geqo_generations; /* 1 .. inf, or 0 to use default */
 extern double Geqo_selection_bias;
@@ -70,13 +65,23 @@ extern double Geqo_selection_bias;
 #define MAX_GEQO_SELECTION_BIAS 2.0
+/*
+ * Data structure to encapsulate information needed for building plan trees
+ * (i.e., geqo_eval and gimme_tree).
+ */
+typedef struct
+{
+    Query *root; /* the query we are planning */
+    List *initial_rels; /* the base relations */
+} GeqoEvalData;
 /* routines in geqo_main.c */
 extern RelOptInfo *geqo(Query *root, int number_of_rels, List *initial_rels);
 /* routines in geqo_eval.c */
-extern Cost geqo_eval(Query *root, List *initial_rels,
-                      Gene *tour, int num_gene);
-extern RelOptInfo *gimme_tree(Query *root, List *initial_rels,
-                              Gene *tour, int num_gene);
+extern Cost geqo_eval(Gene *tour, int num_gene, GeqoEvalData *evaldata);
+extern RelOptInfo *gimme_tree(Gene *tour, int num_gene,
+                              GeqoEvalData *evaldata);
 #endif /* GEQO_H */
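The header change above replaces the separate `root` and `initial_rels` arguments of `geqo_eval` and `gimme_tree` with a single `GeqoEvalData` pointer, so the planning context travels as one value and the tour comes first. A small self-contained sketch of that calling convention follows. Everything in it is hypothetical: `SketchEvalData`, its fields, and the position-weighted toy cost in `sketch_eval` are stand-ins chosen so the example compiles on its own, and they say nothing about how the real `geqo_eval` computes a plan cost.

```c
#include <assert.h>
#include <stddef.h>

typedef int Gene;
typedef double Cost;

/* Hypothetical context struct, playing the role GeqoEvalData plays in the diff. */
typedef struct
{
    const double *rel_cost;     /* toy per-relation cost, indexed 1..num_rels */
    int num_rels;               /* number of relations described by rel_cost */
} SketchEvalData;

/*
 * Toy evaluator with the new argument order: tour, length, then context.
 * Returns a position-weighted sum of the toy costs, or -1.0 if the tour
 * names a relation outside 1..num_rels.
 */
static Cost
sketch_eval(const Gene *tour, int num_gene, const SketchEvalData *evaldata)
{
    Cost total = 0.0;
    int i;

    assert(tour != NULL && evaldata != NULL);
    for (i = 0; i < num_gene; i++)
    {
        Gene g = tour[i];

        if (g < 1 || g > evaldata->num_rels)
            return -1.0;
        total += (Cost) (i + 1) * evaldata->rel_cost[g];
    }
    return total;
}
```

With this shape, a caller fills the context struct once and passes the same pointer to every evaluation, and adding a field later does not change any function signature.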
@@ -6,7 +6,7 @@
  * Portions Copyright (c) 1996-2003, PostgreSQL Global Development Group
  * Portions Copyright (c) 1994, Regents of the University of California
  *
- * $PostgreSQL: pgsql/src/include/optimizer/geqo_pool.h,v 1.18 2003/11/29 22:41:07 pgsql Exp $
+ * $PostgreSQL: pgsql/src/include/optimizer/geqo_pool.h,v 1.19 2004/01/23 23:54:21 tgl Exp $
  *
  *-------------------------------------------------------------------------
  */
@@ -23,14 +23,13 @@
 #ifndef GEQO_POOL_H
 #define GEQO_POOL_H
-#include "optimizer/geqo_gene.h"
-#include "nodes/parsenodes.h"
+#include "optimizer/geqo.h"
 extern Pool *alloc_pool(int pool_size, int string_length);
 extern void free_pool(Pool *pool);
-extern void random_init_pool(Query *root, List *initial_rels,
-                             Pool *pool, int strt, int stop);
+extern void random_init_pool(Pool *pool, GeqoEvalData *evaldata);
 extern Chromosome *alloc_chromo(int string_length);
 extern void free_chromo(Chromosome *chromo);
......