Commit 499be013 authored by Alvaro Herrera

Support partition pruning at execution time

Existing partition pruning works only at plan time, for query quals
that appear in the parsed query.  This is useful but limiting, as
parameters whose values become known only later could be used to prune
further partitions.

This commit adds support for pruning subnodes of Append which cannot
possibly contain any matching tuples, during execution, by evaluating
Params to determine the minimum set of subnodes that can possibly match.
We support more than just simple Params in WHERE clauses. Support
additionally includes:

1. Parameterized Nested Loop Joins: The parameter from the outer side of the
   join can be used to determine the minimum set of inner side partitions to
   scan.

2. Initplans: Once an initplan has been executed we can then determine which
   partitions match the value from the initplan.

Partition pruning is performed in two ways.  When Params external to the plan
are found to match the partition key we attempt to prune away unneeded Append
subplans during the initialization of the executor.  This allows us to bypass
the initialization of non-matching subplans, meaning they won't appear in the
EXPLAIN or EXPLAIN ANALYZE output.
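
One consequence of pruning at executor startup is that the surviving subplans
are renumbered, so any partition-to-subplan map has to be re-sequenced.  The
following is a simplified standalone sketch of that idea, not the committed
code: a 64-bit mask stands in for PostgreSQL's Bitmapset, and the function
name resequence_subnode_map is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (illustration only).  After startup pruning keeps only
 * the subplans whose bits are set in 'kept', rewrite 'subnode_map'
 * (partition index -> subplan index, or -1 for "no subplan") so that it
 * refers to positions in the compacted subplan array.  Returns a mask of
 * the partitions that still have a subplan.
 */
static uint64_t
resequence_subnode_map(int *subnode_map, int nparts,
                       uint64_t kept, int nsubplans)
{
    int         new_index[64];
    int         next = 0;
    uint64_t    present = 0;

    assert(nsubplans >= 0 && nsubplans <= 64);
    assert(nparts >= 0 && nparts <= 64);

    /* Assign consecutive new indexes to the surviving subplans. */
    for (int i = 0; i < nsubplans; i++)
        new_index[i] = ((kept >> i) & 1) ? next++ : -1;

    /* Point each partition at its subplan's new position. */
    for (int p = 0; p < nparts; p++)
    {
        int         old = subnode_map[p];

        if (old < 0)
            continue;           /* partition never had a subplan */
        assert(old < nsubplans);
        subnode_map[p] = new_index[old];
        if (new_index[old] >= 0)
            present |= UINT64_C(1) << p;
    }
    return present;
}
```

For example, with four partitions mapped to subplans 0..3 and only subplans 1
and 3 kept, the map {0, 1, 2, 3} becomes {-1, 0, -1, 1}.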

For parameters whose value is known only during actual execution,
pruning of these subplans must wait.  Subplans which are eliminated
during this stage of pruning are still visible in the EXPLAIN output.
In order to determine if pruning has actually taken place, the EXPLAIN
ANALYZE output must be viewed.  If a certain Append subplan was never
executed due to the elimination of the partition then the execution
timing area will state "(never executed)".  In other cases, for example
parameterized nested loops, the number of loops stated in the EXPLAIN
ANALYZE output for certain subplans may appear lower than for others
due to the subplan having been scanned fewer times.  This is due to the
list of matching subnodes having to be re-evaluated whenever a
parameter which was found to match the partition key changes.
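
The differing loop counts can be illustrated with a small standalone sketch.
It is an illustration only, not the executor code: a 64-bit mask stands in
for the set of valid subplans, and next_valid_subplan and scan_valid_subplans
are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return the smallest member of 'set' greater than 'prev', or -1 if there
 * is none.  Pass prev = -1 to find the first member.
 */
static int
next_valid_subplan(uint64_t set, int prev)
{
    assert(prev >= -1 && prev < 64);
    for (int i = prev + 1; i < 64; i++)
    {
        if ((set >> i) & 1)
            return i;
    }
    return -1;
}

/*
 * Simulate one scan of an Append: visit only the valid subplans and count
 * each visit in loops[].  Subplans outside 'valid' are never touched.
 */
static void
scan_valid_subplans(uint64_t valid, int nplans, int *loops)
{
    for (int i = next_valid_subplan(valid, -1);
         i >= 0 && i < nplans;
         i = next_valid_subplan(valid, i))
        loops[i]++;
}
```

If the valid set is recomputed for each rescan, a subplan that matches on
every rescan accumulates more loops than one that matches only occasionally,
and a subplan that never matches keeps a loop count of zero, which is what
EXPLAIN ANALYZE reports as "(never executed)".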

This commit required some additional infrastructure that permits the
building of a data structure which is able to perform the translation of
the matching partition IDs, as returned by get_matching_partitions, into
the list index of a subpaths list, as exists in node types such as
Append, MergeAppend and ModifyTable.  This allows us to translate a list
of clauses into a Bitmapset of all the subpath indexes which must be
included to satisfy the clause list.
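
The translation step can be sketched in standalone C.  This is a simplified
illustration under stated assumptions, not the committed implementation: a
64-bit mask stands in for a Bitmapset, the LevelInfo struct and
collect_subplans function are hypothetical, and the set of matching
partitions is supplied directly rather than computed by
get_matching_partitions.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the per-table pruning data.  For partition i,
 * subnode_map[i] is its subplan index, or -1 if it has none.  When it is
 * -1, subpart_map[i] is the index of the sub-partitioned table's entry in
 * the 'levels' array, or -1 if there is no such entry.
 */
typedef struct LevelInfo
{
    int         nparts;
    uint64_t    matching;       /* partitions that survived pruning */
    const int  *subnode_map;
    const int  *subpart_map;
} LevelInfo;

/*
 * Add the subplan index of every matching partition to *result, recursing
 * into sub-partitioned tables.  Returns 0 on success, or -1 if a matching
 * partition has neither a subplan nor a sub-partitioned table.
 */
static int
collect_subplans(const LevelInfo *levels, int level, uint64_t *result)
{
    const LevelInfo *info = &levels[level];

    assert(info->nparts >= 0 && info->nparts <= 64);
    for (int i = 0; i < info->nparts; i++)
    {
        if (!((info->matching >> i) & 1))
            continue;           /* partition was pruned */
        if (info->subnode_map[i] >= 0)
            *result |= UINT64_C(1) << info->subnode_map[i];
        else if (info->subpart_map[i] >= 0)
        {
            if (collect_subplans(levels, info->subpart_map[i], result) != 0)
                return -1;
        }
        else
            return -1;          /* partition missing from subplans */
    }
    return 0;
}
```

The two maps keep the lookup cheap: leaf partitions resolve to a subplan
index directly, and only sub-partitioned tables require a recursive step.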

Author: David Rowley, based on an earlier effort by Beena Emerson
Reviewers: Amit Langote, Robert Haas, Amul Sul, Rajkumar Raghuwanshi,
Jesper Pedersen
Discussion: https://postgr.es/m/CAOG9ApE16ac-_VVZVvv0gePSgkg_BwYEV1NBqZFqDR2bBE0X0A@mail.gmail.com
parent 5c067521
......@@ -894,6 +894,18 @@ EXPLAIN ANALYZE SELECT * FROM tenk1 WHERE unique1 < 100 AND unique2 > 9000
BitmapAnd and BitmapOr nodes always report their actual row counts as zero,
due to implementation limitations.
</para>
<para>
Generally, the <command>EXPLAIN</command> output will display details for
every plan node which was generated by the query planner. However, there
are cases where the executor is able to determine that certain nodes are
not required; currently, the only node type to support this is the
<literal>Append</literal> node. This node type has the ability to discard
subnodes which it is able to determine won't contain any records required
by the query. It is possible to determine that nodes have been removed in
this way by the presence of a "Subplans Removed" property in the
<command>EXPLAIN</command> output.
</para>
</sect2>
</sect1>
......
......@@ -118,8 +118,8 @@ static void ExplainModifyTarget(ModifyTable *plan, ExplainState *es);
static void ExplainTargetRel(Plan *plan, Index rti, ExplainState *es);
static void show_modifytable_info(ModifyTableState *mtstate, List *ancestors,
ExplainState *es);
static void ExplainMemberNodes(List *plans, PlanState **planstates,
List *ancestors, ExplainState *es);
static void ExplainMemberNodes(PlanState **planstates, int nsubnodes,
int nplans, List *ancestors, ExplainState *es);
static void ExplainSubPlans(List *plans, List *ancestors,
const char *relationship, ExplainState *es);
static void ExplainCustomChildren(CustomScanState *css,
......@@ -1811,28 +1811,33 @@ ExplainNode(PlanState *planstate, List *ancestors,
switch (nodeTag(plan))
{
case T_ModifyTable:
ExplainMemberNodes(((ModifyTable *) plan)->plans,
((ModifyTableState *) planstate)->mt_plans,
ExplainMemberNodes(((ModifyTableState *) planstate)->mt_plans,
((ModifyTableState *) planstate)->mt_nplans,
list_length(((ModifyTable *) plan)->plans),
ancestors, es);
break;
case T_Append:
ExplainMemberNodes(((Append *) plan)->appendplans,
((AppendState *) planstate)->appendplans,
ExplainMemberNodes(((AppendState *) planstate)->appendplans,
((AppendState *) planstate)->as_nplans,
list_length(((Append *) plan)->appendplans),
ancestors, es);
break;
case T_MergeAppend:
ExplainMemberNodes(((MergeAppend *) plan)->mergeplans,
((MergeAppendState *) planstate)->mergeplans,
ExplainMemberNodes(((MergeAppendState *) planstate)->mergeplans,
((MergeAppendState *) planstate)->ms_nplans,
list_length(((MergeAppend *) plan)->mergeplans),
ancestors, es);
break;
case T_BitmapAnd:
ExplainMemberNodes(((BitmapAnd *) plan)->bitmapplans,
((BitmapAndState *) planstate)->bitmapplans,
ExplainMemberNodes(((BitmapAndState *) planstate)->bitmapplans,
((BitmapAndState *) planstate)->nplans,
list_length(((BitmapAnd *) plan)->bitmapplans),
ancestors, es);
break;
case T_BitmapOr:
ExplainMemberNodes(((BitmapOr *) plan)->bitmapplans,
((BitmapOrState *) planstate)->bitmapplans,
ExplainMemberNodes(((BitmapOrState *) planstate)->bitmapplans,
((BitmapOrState *) planstate)->nplans,
list_length(((BitmapOr *) plan)->bitmapplans),
ancestors, es);
break;
case T_SubqueryScan:
......@@ -3173,18 +3178,28 @@ show_modifytable_info(ModifyTableState *mtstate, List *ancestors,
*
* The ancestors list should already contain the immediate parent of these
* plans.
*
* Note: we don't actually need to examine the Plan list members, but
* we need the list in order to determine the length of the PlanState array.
*
* nsubnodes indicates the number of items in the planstates array.
* nplans indicates the original number of subnodes in the Plan, some of these
* may have been pruned by the run-time pruning code.
*/
static void
ExplainMemberNodes(List *plans, PlanState **planstates,
ExplainMemberNodes(PlanState **planstates, int nsubnodes, int nplans,
List *ancestors, ExplainState *es)
{
int nplans = list_length(plans);
int j;
for (j = 0; j < nplans; j++)
/*
* The number of subnodes being lower than the number of subplans that was
* specified in the plan means that some subnodes have been ignored per
* instruction from the partition pruning code during the executor
* initialization. To make this a bit less mysterious, we'll indicate
* here that this has happened.
*/
if (nsubnodes < nplans)
ExplainPropertyInteger("Subplans Removed", NULL, nplans - nsubnodes, es);
for (j = 0; j < nsubnodes; j++)
ExplainNode(planstates[j], ancestors,
"Member", NULL, es);
}
......
......@@ -40,6 +40,10 @@ static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
bool *isnull,
int maxfieldlen);
static List *adjust_partition_tlist(List *tlist, TupleConversionMap *map);
static void find_subplans_for_params_recurse(PartitionPruneState *prunestate,
PartitionPruningData *pprune,
bool allparams,
Bitmapset **validsubplans);
/*
......@@ -1293,3 +1297,418 @@ adjust_partition_tlist(List *tlist, TupleConversionMap *map)
return new_tlist;
}
/*-------------------------------------------------------------------------
* Run-Time Partition Pruning Support.
*
* The following series of functions exist to support the removal of unneeded
* subnodes for queries against partitioned tables. The supporting functions
* here are designed to work with any node type which supports an arbitrary
* number of subnodes, e.g. Append, MergeAppend.
*
* Normally this pruning work is performed by the query planner's partition
* pruning code, however, the planner is limited to only being able to prune
* away unneeded partitions using quals which compare the partition key to a
* value which is known to be Const during planning. To allow the same
* pruning to be performed for values which are only determined during
* execution, we must make an additional pruning attempt during execution.
*
* Here we support pruning using both external and exec Params. The main
* difference between these that we need to concern ourselves with is the
* time when the values of the Params are known. External Param values are
* known at any time of execution, including executor startup, but exec Param
* values are only known when the executor is running.
*
* For external Params we may be able to prune away unneeded partitions
* during executor startup. This has the added benefit of not having to
* initialize the unneeded subnodes at all. This is useful as it can save
* quite a bit of effort during executor startup.
*
* For exec Params, we must delay pruning until the executor is running.
*
* Functions:
*
* ExecSetupPartitionPruneState:
* This must be called by nodes before any partition pruning is
* attempted. Normally executor startup is a good time. This function
* creates the PartitionPruneState details which are required by each
* of the two pruning functions, details include information about
* how to map the partition index details which are returned by the
* planner's partition prune function into subnode indexes.
*
* ExecFindInitialMatchingSubPlans:
* Returns indexes of matching subnodes utilizing only external Params
* to eliminate subnodes. The function must only be called during
* executor startup for the given node before the subnodes themselves
* are initialized. Subnodes which are found not to match by this
* function must not be included in the node's list of subnodes as this
* function performs a remap of the partition index to subplan index map
* and the newly created map provides indexes only for subnodes which
* remain after calling this function.
*
* ExecFindMatchingSubPlans:
* Returns indexes of matching subnodes utilizing all Params to eliminate
* subnodes which can't possibly contain matching tuples. This function
* can only be called while the executor is running.
*-------------------------------------------------------------------------
*/
/*
* ExecSetupPartitionPruneState
* Set up the data structure required for calling
* ExecFindInitialMatchingSubPlans and ExecFindMatchingSubPlans.
*
* 'partitionpruneinfo' is a List of PartitionPruneInfos as generated by
* make_partition_pruneinfo. Here we build a PartitionPruneContext for each
* item in the List. These contexts can be re-used each time we re-evaluate
* which partitions match the pruning steps provided in each
* PartitionPruneInfo.
*/
PartitionPruneState *
ExecSetupPartitionPruneState(PlanState *planstate, List *partitionpruneinfo)
{
PartitionPruningData *prunedata;
PartitionPruneState *prunestate;
ListCell *lc;
int i;
Assert(partitionpruneinfo != NIL);
prunestate = (PartitionPruneState *) palloc(sizeof(PartitionPruneState));
prunedata = (PartitionPruningData *)
palloc(sizeof(PartitionPruningData) * list_length(partitionpruneinfo));
/*
* The first item in the array contains the details for the query's target
* partition, so record that as the root of the partition hierarchy.
*/
prunestate->partprunedata = prunedata;
prunestate->num_partprunedata = list_length(partitionpruneinfo);
prunestate->extparams = NULL;
prunestate->execparams = NULL;
/*
* Create a sub memory context which we'll use when making calls to the
* query planner's function to determine which partitions will match. The
* planner is not too careful about freeing memory, so we'll ensure we
* call the function in this context to avoid any memory leaking in the
* executor's memory context.
*/
prunestate->prune_context =
AllocSetContextCreate(CurrentMemoryContext,
"Partition Prune",
ALLOCSET_DEFAULT_SIZES);
i = 0;
foreach(lc, partitionpruneinfo)
{
PartitionPruneInfo *pinfo = (PartitionPruneInfo *) lfirst(lc);
PartitionPruningData *pprune = &prunedata[i];
PartitionPruneContext *context = &pprune->context;
PartitionDesc partdesc;
Relation rel;
PartitionKey partkey;
int partnatts;
pprune->present_parts = bms_copy(pinfo->present_parts);
pprune->subnode_map = palloc(sizeof(int) * pinfo->nparts);
/*
* We must make a copy of this rather than pointing directly to the
* plan's version as we may end up making modifications to it later.
*/
memcpy(pprune->subnode_map, pinfo->subnode_map,
sizeof(int) * pinfo->nparts);
/* We can use the subpart_map verbatim, since we never modify it */
pprune->subpart_map = pinfo->subpart_map;
/*
* Grab some info from the table's relcache; lock was already obtained
* by ExecLockNonLeafAppendTables.
*/
rel = relation_open(pinfo->reloid, NoLock);
partkey = RelationGetPartitionKey(rel);
partdesc = RelationGetPartitionDesc(rel);
context->strategy = partkey->strategy;
context->partnatts = partnatts = partkey->partnatts;
context->partopfamily = partkey->partopfamily;
context->partopcintype = partkey->partopcintype;
context->partcollation = partkey->partcollation;
context->partsupfunc = partkey->partsupfunc;
context->nparts = pinfo->nparts;
context->boundinfo = partition_bounds_copy(partdesc->boundinfo, partkey);
context->planstate = planstate;
context->safeparams = NULL; /* empty for now */
pprune->pruning_steps = pinfo->pruning_steps;
pprune->extparams = bms_copy(pinfo->extparams);
pprune->allparams = bms_union(pinfo->extparams, pinfo->execparams);
/*
* Accumulate the paramids which match the partitioned keys of all
* partitioned tables.
*/
prunestate->extparams = bms_add_members(prunestate->extparams,
pinfo->extparams);
prunestate->execparams = bms_add_members(prunestate->execparams,
pinfo->execparams);
relation_close(rel, NoLock);
i++;
}
/*
* Cache the union of the paramids of both types. This saves having to
* recalculate it every time we need to know what they are.
*/
prunestate->allparams = bms_union(prunestate->extparams,
prunestate->execparams);
return prunestate;
}
/*
* ExecFindInitialMatchingSubPlans
* Determine which subset of subplan nodes we need to initialize based
* on the details stored in 'prunestate'. Here we only determine the
* matching partitions using values known during plan startup, which is
* only external Params. Exec Params will be unknown at this time. We
* must delay pruning using exec Params until the actual executor run.
*
* It is expected that callers of this function do so only once during their
* init plan. The caller must only initialize the subnodes which are returned
* by this function. The remaining subnodes should be discarded. Once this
* function has been called, future calls to ExecFindMatchingSubPlans will
* return its matching subnode indexes assuming that the caller discarded
* the original non-matching subnodes.
*
* This function must only be called if 'prunestate' has any extparams.
*
* 'nsubnodes' must be passed as the total number of unpruned subnodes.
*/
Bitmapset *
ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate, int nsubnodes)
{
PartitionPruningData *pprune;
MemoryContext oldcontext;
Bitmapset *result = NULL;
/*
* Ensure there are actually external params and that we've not been
* called already.
*/
Assert(!bms_is_empty(prunestate->extparams));
pprune = prunestate->partprunedata;
/*
* Switch to a temp context to avoid leaking memory in the executor's
* memory context.
*/
oldcontext = MemoryContextSwitchTo(prunestate->prune_context);
/* Determine which subnodes match the external params */
find_subplans_for_params_recurse(prunestate, pprune, false, &result);
MemoryContextSwitchTo(oldcontext);
/* Move to the correct memory context */
result = bms_copy(result);
MemoryContextReset(prunestate->prune_context);
/*
* Record that partition pruning has been performed for external params.
* This partly also serves to ensure we never call this function twice
* with the same input and also so that ExecFindMatchingSubPlans is aware
* that pruning has already been performed for external Params.
*/
bms_free(prunestate->extparams);
prunestate->extparams = NULL;
/*
* If any subnodes were pruned, we must re-sequence the subnode indexes so
* that ExecFindMatchingSubPlans properly returns the indexes from the
* subnodes which will remain after execution of this function.
*/
if (bms_num_members(result) < nsubnodes)
{
int *new_subnode_indexes;
int i;
int newidx;
/*
* First we must build an array which we can use to adjust the
* existing subnode_map so that it contains the new subnode indexes.
*/
new_subnode_indexes = (int *) palloc(sizeof(int) * nsubnodes);
newidx = 0;
for (i = 0; i < nsubnodes; i++)
{
if (bms_is_member(i, result))
new_subnode_indexes[i] = newidx++;
else
new_subnode_indexes[i] = -1; /* Newly pruned */
}
/*
* Now we can re-sequence each PartitionPruneInfo's subnode_map so
* that they point to the new index of the subnode.
*/
for (i = 0; i < prunestate->num_partprunedata; i++)
{
int nparts;
int j;
pprune = &prunestate->partprunedata[i];
nparts = pprune->context.nparts;
/*
* We also need to reset the present_parts field so that it only
* contains partition indexes that we actually still have subnodes
* for. It seems easier to build a fresh one, rather than trying
* to update the existing one.
*/
bms_free(pprune->present_parts);
pprune->present_parts = NULL;
for (j = 0; j < nparts; j++)
{
int oldidx = pprune->subnode_map[j];
/*
* If this partition existed as a subnode then change the old
* subnode index to the new subnode index. The new index may
* become -1 if the partition was pruned above, or it may just
* come earlier in the subnode list due to some subnodes being
* removed earlier in the list.
*/
if (oldidx >= 0)
{
pprune->subnode_map[j] = new_subnode_indexes[oldidx];
if (new_subnode_indexes[oldidx] >= 0)
pprune->present_parts =
bms_add_member(pprune->present_parts, j);
}
}
}
pfree(new_subnode_indexes);
}
return result;
}
/*
* ExecFindMatchingSubPlans
* Determine which subplans match the pruning steps detailed in
* 'prunestate' for the current Param values.
*
* Here we utilize both external and exec Params for pruning.
*/
Bitmapset *
ExecFindMatchingSubPlans(PartitionPruneState *prunestate)
{
PartitionPruningData *pprune;
MemoryContext oldcontext;
Bitmapset *result = NULL;
pprune = prunestate->partprunedata;
/*
* Switch to a temp context to avoid leaking memory in the executor's
* memory context.
*/
oldcontext = MemoryContextSwitchTo(prunestate->prune_context);
find_subplans_for_params_recurse(prunestate, pprune, true, &result);
MemoryContextSwitchTo(oldcontext);
/* Move to the correct memory context */
result = bms_copy(result);
MemoryContextReset(prunestate->prune_context);
return result;
}
/*
* find_subplans_for_params_recurse
* Recursive worker function for ExecFindMatchingSubPlans and
* ExecFindInitialMatchingSubPlans
*/
static void
find_subplans_for_params_recurse(PartitionPruneState *prunestate,
PartitionPruningData *pprune,
bool allparams,
Bitmapset **validsubplans)
{
PartitionPruneContext *context = &pprune->context;
Bitmapset *partset;
Bitmapset *pruneparams;
int i;
/* Guard against stack overflow due to overly deep partition hierarchy. */
check_stack_depth();
/*
* Use only external params unless we've been asked to use exec params
* too.
*/
if (allparams)
pruneparams = pprune->allparams;
else
pruneparams = pprune->extparams;
/*
* We only need to determine the matching partitions if there are any
* params matching the partition key at this level. If there are no
* matching params, then we can simply return all subnodes which belong to
* this parent partition. The planner should have already determined
* these to be the minimum possible set. We must still recursively visit
* any subpartitioned tables as we may find their partition keys match
* some Params at their level.
*/
if (!bms_is_empty(pruneparams))
{
context->safeparams = pruneparams;
partset = get_matching_partitions(context,
pprune->pruning_steps);
}
else
partset = pprune->present_parts;
/* Translate partset into subnode indexes */
i = -1;
while ((i = bms_next_member(partset, i)) >= 0)
{
if (pprune->subnode_map[i] >= 0)
*validsubplans = bms_add_member(*validsubplans,
pprune->subnode_map[i]);
else
{
int partidx = pprune->subpart_map[i];
if (partidx != -1)
find_subplans_for_params_recurse(prunestate,
&prunestate->partprunedata[partidx],
allparams, validsubplans);
else
{
/*
* This could only happen if clauses used in planning were
* more restrictive than those used here, or if the maps are
* somehow corrupt.
*/
elog(ERROR, "partition missing from subplans");
}
}
}
}
......@@ -58,6 +58,7 @@
#include "postgres.h"
#include "executor/execdebug.h"
#include "executor/execPartition.h"
#include "executor/nodeAppend.h"
#include "miscadmin.h"
......@@ -77,11 +78,13 @@ struct ParallelAppendState
};
#define INVALID_SUBPLAN_INDEX -1
#define NO_MATCHING_SUBPLANS -2
static TupleTableSlot *ExecAppend(PlanState *pstate);
static bool choose_next_subplan_locally(AppendState *node);
static bool choose_next_subplan_for_leader(AppendState *node);
static bool choose_next_subplan_for_worker(AppendState *node);
static void mark_invalid_subplans_as_finished(AppendState *node);
/* ----------------------------------------------------------------
* ExecInitAppend
......@@ -99,8 +102,10 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
{
AppendState *appendstate = makeNode(AppendState);
PlanState **appendplanstates;
Bitmapset *validsubplans;
int nplans;
int i;
int i,
j;
ListCell *lc;
/* check for unsupported flags */
......@@ -113,54 +118,116 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
ExecLockNonLeafAppendTables(node->partitioned_rels, estate);
/*
* Set up empty vector of subplan states
* create new AppendState for our append node
*/
appendstate->ps.plan = (Plan *) node;
appendstate->ps.state = estate;
appendstate->ps.ExecProcNode = ExecAppend;
/* Let choose_next_subplan_* function handle setting the first subplan */
appendstate->as_whichplan = INVALID_SUBPLAN_INDEX;
/* If run-time partition pruning is enabled, then set that up now */
if (node->part_prune_infos != NIL)
{
PartitionPruneState *prunestate;
ExecAssignExprContext(estate, &appendstate->ps);
prunestate = ExecSetupPartitionPruneState(&appendstate->ps,
node->part_prune_infos);
/*
* When there are external params matching the partition key we may be
* able to prune away Append subplans now.
*/
if (!bms_is_empty(prunestate->extparams))
{
/* Determine which subplans match the external params */
validsubplans = ExecFindInitialMatchingSubPlans(prunestate,
list_length(node->appendplans));
/*
* If no subplans match the given parameters then we must handle
* this case in a special way. The problem here is that code in
* explain.c requires an Append to have at least one subplan in
* order for it to properly determine the Vars in that subplan's
* targetlist. We sidestep this issue by just initializing the
* first subplan and setting as_whichplan to NO_MATCHING_SUBPLANS
* to indicate that we don't need to scan any subnodes.
*/
if (bms_is_empty(validsubplans))
{
appendstate->as_whichplan = NO_MATCHING_SUBPLANS;
/* Mark the first as valid so that it's initialized below */
validsubplans = bms_make_singleton(0);
}
nplans = bms_num_members(validsubplans);
}
else
{
/* We'll need to initialize all subplans */
nplans = list_length(node->appendplans);
validsubplans = bms_add_range(NULL, 0, nplans - 1);
}
appendplanstates = (PlanState **) palloc0(nplans * sizeof(PlanState *));
/*
* If there are no exec params then no further pruning can be done, so
* we can just set the valid subplans to all remaining subplans.
*/
if (bms_is_empty(prunestate->execparams))
appendstate->as_valid_subplans = bms_add_range(NULL, 0, nplans - 1);
appendstate->as_prune_state = prunestate;
}
else
{
nplans = list_length(node->appendplans);
/*
* create new AppendState for our append node
* When run-time partition pruning is not enabled we can just mark all
* subplans as valid, they must also all be initialized.
*/
appendstate->ps.plan = (Plan *) node;
appendstate->ps.state = estate;
appendstate->ps.ExecProcNode = ExecAppend;
appendstate->appendplans = appendplanstates;
appendstate->as_nplans = nplans;
appendstate->as_valid_subplans = validsubplans =
bms_add_range(NULL, 0, nplans - 1);
appendstate->as_prune_state = NULL;
}
/*
* Initialize result tuple type and slot.
*/
ExecInitResultTupleSlotTL(estate, &appendstate->ps);
appendplanstates = (PlanState **) palloc(nplans *
sizeof(PlanState *));
/*
* call ExecInitNode on each of the plans to be executed and save the
* results into the array "appendplans".
* call ExecInitNode on each of the valid plans to be executed and save
* the results into the appendplanstates array.
*/
i = 0;
j = i = 0;
foreach(lc, node->appendplans)
{
if (bms_is_member(i, validsubplans))
{
Plan *initNode = (Plan *) lfirst(lc);
appendplanstates[i] = ExecInitNode(initNode, estate, eflags);
appendplanstates[j++] = ExecInitNode(initNode, estate, eflags);
}
i++;
}
appendstate->appendplans = appendplanstates;
appendstate->as_nplans = nplans;
/*
* Miscellaneous initialization
*
* Append plans don't have expression contexts because they never call
* ExecQual or ExecProject.
*/
appendstate->ps.ps_ProjInfo = NULL;
/*
* Parallel-aware append plans must choose the first subplan to execute by
* looking at shared memory, but non-parallel-aware append plans can
* always start with the first subplan.
*/
appendstate->as_whichplan =
appendstate->ps.plan->parallel_aware ? INVALID_SUBPLAN_INDEX : 0;
appendstate->ps.ps_ProjInfo = NULL;
/* For parallel query, this will be overridden later. */
appendstate->choose_next_subplan = choose_next_subplan_locally;
......@@ -179,11 +246,21 @@ ExecAppend(PlanState *pstate)
{
AppendState *node = castNode(AppendState, pstate);
/* If no subplan has been chosen, we must choose one before proceeding. */
if (node->as_whichplan < 0)
{
/*
* If no subplan has been chosen, we must choose one before
* proceeding.
*/
if (node->as_whichplan == INVALID_SUBPLAN_INDEX &&
!node->choose_next_subplan(node))
return ExecClearTuple(node->ps.ps_ResultTupleSlot);
/* Nothing to do if there are no matching subplans */
else if (node->as_whichplan == NO_MATCHING_SUBPLANS)
return ExecClearTuple(node->ps.ps_ResultTupleSlot);
}
for (;;)
{
PlanState *subnode;
......@@ -251,6 +328,19 @@ ExecReScanAppend(AppendState *node)
{
int i;
/*
* If any of the parameters being used for partition pruning have changed,
* then we'd better unset the valid subplans so that they are reselected
* for the new parameter values.
*/
if (node->as_prune_state &&
bms_overlap(node->ps.chgParam,
node->as_prune_state->execparams))
{
bms_free(node->as_valid_subplans);
node->as_valid_subplans = NULL;
}
for (i = 0; i < node->as_nplans; i++)
{
PlanState *subnode = node->appendplans[i];
......@@ -270,8 +360,8 @@ ExecReScanAppend(AppendState *node)
ExecReScan(subnode);
}
node->as_whichplan =
node->ps.plan->parallel_aware ? INVALID_SUBPLAN_INDEX : 0;
/* Let choose_next_subplan_* function handle setting the first subplan */
node->as_whichplan = INVALID_SUBPLAN_INDEX;
}
/* ----------------------------------------------------------------
......@@ -360,29 +450,39 @@ static bool
choose_next_subplan_locally(AppendState *node)
{
int whichplan = node->as_whichplan;
int nextplan;
/* We should never be called when there are no subplans */
Assert(whichplan != NO_MATCHING_SUBPLANS);
if (ScanDirectionIsForward(node->ps.state->es_direction))
{
/*
* We won't normally see INVALID_SUBPLAN_INDEX in this case, but we
* might if a plan intended to be run in parallel ends up being run
* serially.
* If first call then have the bms member function choose the first valid
* subplan by initializing whichplan to -1. If there happen to be no
* valid subplans then the bms member function will handle that by
* returning a negative number which will allow us to exit returning a
* false value.
*/
if (whichplan == INVALID_SUBPLAN_INDEX)
node->as_whichplan = 0;
else
{
if (whichplan >= node->as_nplans - 1)
return false;
node->as_whichplan++;
}
if (node->as_valid_subplans == NULL)
node->as_valid_subplans =
ExecFindMatchingSubPlans(node->as_prune_state);
whichplan = -1;
}
/* Ensure whichplan is within the expected range */
Assert(whichplan >= -1 && whichplan <= node->as_nplans);
if (ScanDirectionIsForward(node->ps.state->es_direction))
nextplan = bms_next_member(node->as_valid_subplans, whichplan);
else
{
if (whichplan <= 0)
nextplan = bms_prev_member(node->as_valid_subplans, whichplan);
if (nextplan < 0)
return false;
node->as_whichplan--;
}
node->as_whichplan = nextplan;
return true;
}
......@@ -404,6 +504,9 @@ choose_next_subplan_for_leader(AppendState *node)
/* Backward scan is not supported by parallel-aware plans */
Assert(ScanDirectionIsForward(node->ps.state->es_direction));
/* We should never be called when there are no subplans */
Assert(node->as_whichplan != NO_MATCHING_SUBPLANS);
LWLockAcquire(&pstate->pa_lock, LW_EXCLUSIVE);
if (node->as_whichplan != INVALID_SUBPLAN_INDEX)
......@@ -415,6 +518,23 @@ choose_next_subplan_for_leader(AppendState *node)
{
/* Start with last subplan. */
node->as_whichplan = node->as_nplans - 1;
/*
* If we've yet to determine the valid subplans for these parameters
* then do so now. If run-time pruning is disabled then the valid
* subplans will always be set to all subplans.
*/
if (node->as_valid_subplans == NULL)
{
node->as_valid_subplans =
ExecFindMatchingSubPlans(node->as_prune_state);
/*
* Mark each invalid plan as finished to allow the loop below to
* select the first valid subplan.
*/
mark_invalid_subplans_as_finished(node);
}
}
/* Loop until we find a subplan to execute. */
......@@ -461,12 +581,27 @@ choose_next_subplan_for_worker(AppendState *node)
/* Backward scan is not supported by parallel-aware plans */
Assert(ScanDirectionIsForward(node->ps.state->es_direction));
/* We should never be called when there are no subplans */
Assert(node->as_whichplan != NO_MATCHING_SUBPLANS);
LWLockAcquire(&pstate->pa_lock, LW_EXCLUSIVE);
/* Mark just-completed subplan as finished. */
if (node->as_whichplan != INVALID_SUBPLAN_INDEX)
node->as_pstate->pa_finished[node->as_whichplan] = true;
/*
* If we've yet to determine the valid subplans for these parameters then
* do so now. If run-time pruning is disabled then the valid subplans
* will always be set to all subplans.
*/
else if (node->as_valid_subplans == NULL)
{
node->as_valid_subplans =
ExecFindMatchingSubPlans(node->as_prune_state);
mark_invalid_subplans_as_finished(node);
}
/* If all the plans are already done, we have nothing to do */
if (pstate->pa_next_plan == INVALID_SUBPLAN_INDEX)
{
......@@ -532,3 +667,34 @@ choose_next_subplan_for_worker(AppendState *node)
return true;
}
/*
* mark_invalid_subplans_as_finished
* Marks the ParallelAppendState's pa_finished as true for each invalid
* subplan.
*
* This function should only be called for parallel Append with run-time
* pruning enabled.
*/
static void
mark_invalid_subplans_as_finished(AppendState *node)
{
int i;
/* Only valid to call this while in parallel Append mode */
Assert(node->as_pstate);
/* Shouldn't have been called when run-time pruning is not enabled */
Assert(node->as_prune_state);
/* Nothing to do if all plans are valid */
if (bms_num_members(node->as_valid_subplans) == node->as_nplans)
return;
/* Mark all non-valid plans as finished */
for (i = 0; i < node->as_nplans; i++)
{
if (!bms_is_member(i, node->as_valid_subplans))
node->as_pstate->pa_finished[i] = true;
}
}
......@@ -248,6 +248,7 @@ _copyAppend(const Append *from)
COPY_NODE_FIELD(partitioned_rels);
COPY_NODE_FIELD(appendplans);
COPY_SCALAR_FIELD(first_partial_plan);
COPY_NODE_FIELD(part_prune_infos);
return newnode;
}
......@@ -2182,6 +2183,23 @@ _copyPartitionPruneStepCombine(const PartitionPruneStepCombine *from)
return newnode;
}
static PartitionPruneInfo *
_copyPartitionPruneInfo(const PartitionPruneInfo *from)
{
PartitionPruneInfo *newnode = makeNode(PartitionPruneInfo);
COPY_SCALAR_FIELD(reloid);
COPY_NODE_FIELD(pruning_steps);
COPY_BITMAPSET_FIELD(present_parts);
COPY_SCALAR_FIELD(nparts);
COPY_POINTER_FIELD(subnode_map, from->nparts * sizeof(int));
COPY_POINTER_FIELD(subpart_map, from->nparts * sizeof(int));
COPY_BITMAPSET_FIELD(extparams);
COPY_BITMAPSET_FIELD(execparams);
return newnode;
}
/* ****************************************************************
* relation.h copy functions
*
......@@ -5123,6 +5141,9 @@ copyObjectImpl(const void *from)
case T_PlaceHolderInfo:
retval = _copyPlaceHolderInfo(from);
break;
case T_PartitionPruneInfo:
retval = _copyPartitionPruneInfo(from);
break;
/*
* VALUE NODES
......
......@@ -30,7 +30,7 @@ static int leftmostLoc(int loc1, int loc2);
static bool fix_opfuncids_walker(Node *node, void *context);
static bool planstate_walk_subplans(List *plans, bool (*walker) (),
void *context);
static bool planstate_walk_members(List *plans, PlanState **planstates,
static bool planstate_walk_members(PlanState **planstates, int nplans,
bool (*walker) (), void *context);
......@@ -3806,32 +3806,32 @@ planstate_tree_walker(PlanState *planstate,
switch (nodeTag(plan))
{
case T_ModifyTable:
if (planstate_walk_members(((ModifyTable *) plan)->plans,
((ModifyTableState *) planstate)->mt_plans,
if (planstate_walk_members(((ModifyTableState *) planstate)->mt_plans,
((ModifyTableState *) planstate)->mt_nplans,
walker, context))
return true;
break;
case T_Append:
if (planstate_walk_members(((Append *) plan)->appendplans,
((AppendState *) planstate)->appendplans,
if (planstate_walk_members(((AppendState *) planstate)->appendplans,
((AppendState *) planstate)->as_nplans,
walker, context))
return true;
break;
case T_MergeAppend:
if (planstate_walk_members(((MergeAppend *) plan)->mergeplans,
((MergeAppendState *) planstate)->mergeplans,
if (planstate_walk_members(((MergeAppendState *) planstate)->mergeplans,
((MergeAppendState *) planstate)->ms_nplans,
walker, context))
return true;
break;
case T_BitmapAnd:
if (planstate_walk_members(((BitmapAnd *) plan)->bitmapplans,
((BitmapAndState *) planstate)->bitmapplans,
if (planstate_walk_members(((BitmapAndState *) planstate)->bitmapplans,
((BitmapAndState *) planstate)->nplans,
walker, context))
return true;
break;
case T_BitmapOr:
if (planstate_walk_members(((BitmapOr *) plan)->bitmapplans,
((BitmapOrState *) planstate)->bitmapplans,
if (planstate_walk_members(((BitmapOrState *) planstate)->bitmapplans,
((BitmapOrState *) planstate)->nplans,
walker, context))
return true;
break;
......@@ -3881,15 +3881,11 @@ planstate_walk_subplans(List *plans,
/*
* Walk the constituent plans of a ModifyTable, Append, MergeAppend,
* BitmapAnd, or BitmapOr node.
*
* Note: we don't actually need to examine the Plan list members, but
* we need the list in order to determine the length of the PlanState array.
*/
static bool
planstate_walk_members(List *plans, PlanState **planstates,
planstate_walk_members(PlanState **planstates, int nplans,
bool (*walker) (), void *context)
{
int nplans = list_length(plans);
int j;
for (j = 0; j < nplans; j++)
......
......@@ -419,6 +419,7 @@ _outAppend(StringInfo str, const Append *node)
WRITE_NODE_FIELD(partitioned_rels);
WRITE_NODE_FIELD(appendplans);
WRITE_INT_FIELD(first_partial_plan);
WRITE_NODE_FIELD(part_prune_infos);
}
static void
......@@ -1758,6 +1759,30 @@ _outMergeAction(StringInfo str, const MergeAction *node)
WRITE_NODE_FIELD(targetList);
}
static void
_outPartitionPruneInfo(StringInfo str, const PartitionPruneInfo *node)
{
int i;
WRITE_NODE_TYPE("PARTITIONPRUNEINFO");
WRITE_OID_FIELD(reloid);
WRITE_NODE_FIELD(pruning_steps);
WRITE_BITMAPSET_FIELD(present_parts);
WRITE_INT_FIELD(nparts);
appendStringInfoString(str, " :subnode_map");
for (i = 0; i < node->nparts; i++)
appendStringInfo(str, " %d", node->subnode_map[i]);
appendStringInfoString(str, " :subpart_map");
for (i = 0; i < node->nparts; i++)
appendStringInfo(str, " %d", node->subpart_map[i]);
WRITE_BITMAPSET_FIELD(extparams);
WRITE_BITMAPSET_FIELD(execparams);
}
/*****************************************************************************
*
* Stuff from relation.h.
......@@ -3996,6 +4021,9 @@ outNode(StringInfo str, const void *obj)
case T_PartitionPruneStepCombine:
_outPartitionPruneStepCombine(str, obj);
break;
case T_PartitionPruneInfo:
_outPartitionPruneInfo(str, obj);
break;
case T_Path:
_outPath(str, obj);
break;
......
......@@ -1373,6 +1373,23 @@ _readMergeAction(void)
READ_DONE();
}
static PartitionPruneInfo *
_readPartitionPruneInfo(void)
{
READ_LOCALS(PartitionPruneInfo);
READ_OID_FIELD(reloid);
READ_NODE_FIELD(pruning_steps);
READ_BITMAPSET_FIELD(present_parts);
READ_INT_FIELD(nparts);
READ_INT_ARRAY(subnode_map, local_node->nparts);
READ_INT_ARRAY(subpart_map, local_node->nparts);
READ_BITMAPSET_FIELD(extparams);
READ_BITMAPSET_FIELD(execparams);
READ_DONE();
}
/*
* Stuff from parsenodes.h.
*/
......@@ -1675,6 +1692,7 @@ _readAppend(void)
READ_NODE_FIELD(partitioned_rels);
READ_NODE_FIELD(appendplans);
READ_INT_FIELD(first_partial_plan);
READ_NODE_FIELD(part_prune_infos);
READ_DONE();
}
......@@ -2645,6 +2663,8 @@ parseNodeString(void)
return_value = _readPartitionPruneStepOp();
else if (MATCH("PARTITIONPRUNESTEPCOMBINE", 25))
return_value = _readPartitionPruneStepCombine();
else if (MATCH("PARTITIONPRUNEINFO", 18))
return_value = _readPartitionPruneInfo();
else if (MATCH("RTE", 3))
return_value = _readRangeTblEntry();
else if (MATCH("RANGETBLFUNCTION", 16))
......
......@@ -1604,7 +1604,7 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
* if we have zero or one live subpath due to constraint exclusion.)
*/
if (subpaths_valid)
add_path(rel, (Path *) create_append_path(rel, subpaths, NIL,
add_path(rel, (Path *) create_append_path(root, rel, subpaths, NIL,
NULL, 0, false,
partitioned_rels, -1));
......@@ -1646,8 +1646,8 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
Assert(parallel_workers > 0);
/* Generate a partial append path. */
appendpath = create_append_path(rel, NIL, partial_subpaths, NULL,
parallel_workers,
appendpath = create_append_path(root, rel, NIL, partial_subpaths,
NULL, parallel_workers,
enable_parallel_append,
partitioned_rels, -1);
......@@ -1695,7 +1695,7 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
max_parallel_workers_per_gather);
Assert(parallel_workers > 0);
appendpath = create_append_path(rel, pa_nonpartial_subpaths,
appendpath = create_append_path(root, rel, pa_nonpartial_subpaths,
pa_partial_subpaths,
NULL, parallel_workers, true,
partitioned_rels, partial_rows);
......@@ -1758,7 +1758,7 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
if (subpaths_valid)
add_path(rel, (Path *)
create_append_path(rel, subpaths, NIL,
create_append_path(root, rel, subpaths, NIL,
required_outer, 0, false,
partitioned_rels, -1));
}
......@@ -2024,7 +2024,7 @@ set_dummy_rel_pathlist(RelOptInfo *rel)
rel->pathlist = NIL;
rel->partial_pathlist = NIL;
add_path(rel, (Path *) create_append_path(rel, NIL, NIL, NULL,
add_path(rel, (Path *) create_append_path(NULL, rel, NIL, NIL, NULL,
0, false, NIL, -1));
/*
......
......@@ -1230,7 +1230,7 @@ mark_dummy_rel(RelOptInfo *rel)
rel->partial_pathlist = NIL;
/* Set up the dummy path */
add_path(rel, (Path *) create_append_path(rel, NIL, NIL, NULL,
add_path(rel, (Path *) create_append_path(NULL, rel, NIL, NIL, NULL,
0, false, NIL, -1));
/* Set or update cheapest_total_path and related fields */
......
......@@ -41,6 +41,7 @@
#include "optimizer/var.h"
#include "parser/parse_clause.h"
#include "parser/parsetree.h"
#include "partitioning/partprune.h"
#include "utils/lsyscache.h"
......@@ -210,7 +211,7 @@ static NamedTuplestoreScan *make_namedtuplestorescan(List *qptlist, List *qpqual
static WorkTableScan *make_worktablescan(List *qptlist, List *qpqual,
Index scanrelid, int wtParam);
static Append *make_append(List *appendplans, int first_partial_plan,
List *tlist, List *partitioned_rels);
List *tlist, List *partitioned_rels, List *partpruneinfos);
static RecursiveUnion *make_recursive_union(List *tlist,
Plan *lefttree,
Plan *righttree,
......@@ -1041,6 +1042,8 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path)
List *tlist = build_path_tlist(root, &best_path->path);
List *subplans = NIL;
ListCell *subpaths;
RelOptInfo *rel = best_path->path.parent;
List *partpruneinfos = NIL;
/*
* The subpaths list could be empty, if every child was proven empty by
......@@ -1078,6 +1081,38 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path)
subplans = lappend(subplans, subplan);
}
if (rel->reloptkind == RELOPT_BASEREL &&
best_path->partitioned_rels != NIL)
{
List *prunequal;
prunequal = extract_actual_clauses(rel->baserestrictinfo, false);
if (best_path->path.param_info)
{
List *prmquals = best_path->path.param_info->ppi_clauses;
prmquals = extract_actual_clauses(prmquals, false);
prmquals = (List *) replace_nestloop_params(root,
(Node *) prmquals);
prunequal = list_concat(prunequal, prmquals);
}
/*
* If any quals exist, they may be useful to perform further partition
* pruning during execution. Generate a PartitionPruneInfo for each
* partitioned rel to store these quals and allow translation of
* partition indexes into subpath indexes.
*/
if (prunequal != NIL)
partpruneinfos =
make_partition_pruneinfo(root,
best_path->partitioned_rels,
best_path->subpaths, prunequal);
}
/*
* XXX ideally, if there's just one child, we'd not bother to generate an
* Append node but just return the single child. At the moment this does
......@@ -1086,7 +1121,8 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path)
*/
plan = make_append(subplans, best_path->first_partial_path,
tlist, best_path->partitioned_rels);
tlist, best_path->partitioned_rels,
partpruneinfos);
copy_generic_path_info(&plan->plan, (Path *) best_path);
......@@ -5382,7 +5418,8 @@ make_foreignscan(List *qptlist,
static Append *
make_append(List *appendplans, int first_partial_plan,
List *tlist, List *partitioned_rels)
List *tlist, List *partitioned_rels,
List *partpruneinfos)
{
Append *node = makeNode(Append);
Plan *plan = &node->plan;
......@@ -5394,7 +5431,7 @@ make_append(List *appendplans, int first_partial_plan,
node->partitioned_rels = partitioned_rels;
node->appendplans = appendplans;
node->first_partial_plan = first_partial_plan;
node->part_prune_infos = partpruneinfos;
return node;
}
......
......@@ -3920,7 +3920,8 @@ create_degenerate_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
paths = lappend(paths, path);
}
path = (Path *)
create_append_path(grouped_rel,
create_append_path(root,
grouped_rel,
paths,
NIL,
NULL,
......@@ -6852,8 +6853,9 @@ apply_scanjoin_target_to_paths(PlannerInfo *root,
* node, which would cause this relation to stop appearing to be a
* dummy rel.)
*/
rel->pathlist = list_make1(create_append_path(rel, NIL, NIL, NULL,
0, false, NIL, -1));
rel->pathlist = list_make1(create_append_path(root, rel, NIL, NIL,
NULL, 0, false, NIL,
-1));
rel->partial_pathlist = NIL;
set_cheapest(rel);
Assert(IS_DUMMY_REL(rel));
......
......@@ -648,7 +648,7 @@ generate_union_paths(SetOperationStmt *op, PlannerInfo *root,
/*
* Append the child results together.
*/
path = (Path *) create_append_path(result_rel, pathlist, NIL,
path = (Path *) create_append_path(root, result_rel, pathlist, NIL,
NULL, 0, false, NIL, -1);
/*
......@@ -703,7 +703,7 @@ generate_union_paths(SetOperationStmt *op, PlannerInfo *root,
Assert(parallel_workers > 0);
ppath = (Path *)
create_append_path(result_rel, NIL, partial_pathlist,
create_append_path(root, result_rel, NIL, partial_pathlist,
NULL, parallel_workers, enable_parallel_append,
NIL, -1);
ppath = (Path *)
......@@ -814,7 +814,7 @@ generate_nonunion_paths(SetOperationStmt *op, PlannerInfo *root,
/*
* Append the child results together.
*/
path = (Path *) create_append_path(result_rel, pathlist, NIL,
path = (Path *) create_append_path(root, result_rel, pathlist, NIL,
NULL, 0, false, NIL, -1);
/* Identify the grouping semantics */
......
......@@ -1210,7 +1210,8 @@ create_tidscan_path(PlannerInfo *root, RelOptInfo *rel, List *tidquals,
* Note that we must handle subpaths = NIL, representing a dummy access path.
*/
AppendPath *
create_append_path(RelOptInfo *rel,
create_append_path(PlannerInfo *root,
RelOptInfo *rel,
List *subpaths, List *partial_subpaths,
Relids required_outer,
int parallel_workers, bool parallel_aware,
......@@ -1224,8 +1225,25 @@ create_append_path(RelOptInfo *rel,
pathnode->path.pathtype = T_Append;
pathnode->path.parent = rel;
pathnode->path.pathtarget = rel->reltarget;
/*
* When generating an Append path for a partitioned table, there may be
* parameters that are useful so we can eliminate certain partitions
* during execution. Here we'll go all the way and fully populate the
* parameter info data as we do for normal base relations. However, we
* need only bother doing this for RELOPT_BASEREL rels, as
* RELOPT_OTHER_MEMBER_REL's Append paths are merged into the base rel's
* Append subpaths. It would do no harm to do this; we just avoid it to
* save wasting effort.
*/
if (partitioned_rels != NIL && root && rel->reloptkind == RELOPT_BASEREL)
pathnode->path.param_info = get_baserel_parampathinfo(root,
rel,
required_outer);
else
pathnode->path.param_info = get_appendrel_parampathinfo(rel,
required_outer);
pathnode->path.parallel_aware = parallel_aware;
pathnode->path.parallel_safe = rel->consider_parallel;
pathnode->path.parallel_workers = parallel_workers;
......@@ -3574,7 +3592,7 @@ reparameterize_path(PlannerInfo *root, Path *path,
i++;
}
return (Path *)
create_append_path(rel, childpaths, partialpaths,
create_append_path(root, rel, childpaths, partialpaths,
required_outer,
apath->path.parallel_workers,
apath->path.parallel_aware,
......
/*-------------------------------------------------------------------------
*
* partprune.c
* Support for partition pruning during query planning
* Support for partition pruning during query planning and execution
*
* This module implements partition pruning using the information contained in
* table's partition descriptor and query clauses.
* table's partition descriptor, query clauses, and run-time parameters.
*
* During planning, clauses that can be matched to the table's partition key
* are turned into a set of "pruning steps", which are then executed to
* produce a set of partitions (as indexes of the RelOptInfo->part_rels array)
* that satisfy the constraints in the step Partitions not in the set are said
* that satisfy the constraints in the step. Partitions not in the set are said
* to have been pruned.
*
* A base pruning step may also consist of expressions whose values are only
* known during execution, such as Params, in which case pruning cannot occur
* entirely during planning. In that case, such steps are included alongside
* the plan, so that they can be used by the executor for further pruning.
*
* There are two kinds of pruning steps: a "base" pruning step, which contains
* information extracted from one or more clauses that are matched to the
* (possibly multi-column) partition key, such as the expressions whose values
......@@ -39,10 +44,12 @@
#include "catalog/pg_operator.h"
#include "catalog/pg_opfamily.h"
#include "catalog/pg_type.h"
#include "executor/executor.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
#include "nodes/nodeFuncs.h"
#include "optimizer/clauses.h"
#include "optimizer/pathnode.h"
#include "optimizer/planner.h"
#include "optimizer/predtest.h"
#include "optimizer/prep.h"
......@@ -153,6 +160,7 @@ static PruneStepResult *get_matching_list_bounds(PartitionPruneContext *context,
static PruneStepResult *get_matching_range_bounds(PartitionPruneContext *context,
StrategyNumber opstrategy, Datum *values, int nvalues,
FmgrInfo *partsupfunc, Bitmapset *nullkeys);
static bool pull_partkey_params(PartitionPruneInfo *pinfo, List *steps);
static PruneStepResult *perform_pruning_base_step(PartitionPruneContext *context,
PartitionPruneStepOp *opstep);
static PruneStepResult *perform_pruning_combine_step(PartitionPruneContext *context,
......@@ -163,6 +171,181 @@ static bool match_boolean_partition_clause(Oid partopfamily, Expr *clause,
static bool partkey_datum_from_expr(PartitionPruneContext *context,
Expr *expr, Datum *value);
/*
* make_partition_pruneinfo
* Build List of PartitionPruneInfos, one for each 'partitioned_rels'.
* These can be used in the executor to allow additional partition
* pruning to take place.
*
* Here we generate partition pruning steps for 'prunequal' and also build a
* data structure which allows mapping of partition indexes into 'subpaths'
* indexes.
*
* If no Params were found to match the partition key in any of the
* 'partitioned_rels', then we return NIL. In such a case run-time partition
* pruning would be useless.
*/
List *
make_partition_pruneinfo(PlannerInfo *root, List *partition_rels,
List *subpaths, List *prunequal)
{
RelOptInfo *targetpart = NULL;
ListCell *lc;
List *pinfolist = NIL;
int *relid_subnode_map;
int *relid_subpart_map;
int i;
bool gotparam = false;
/*
* Allocate two arrays to store the 1-based indexes of the 'subpaths' and
* 'partitioned_rels' by relid.
*/
relid_subnode_map = palloc0(sizeof(int) * root->simple_rel_array_size);
relid_subpart_map = palloc0(sizeof(int) * root->simple_rel_array_size);
i = 1;
foreach(lc, subpaths)
{
Path *path = (Path *) lfirst(lc);
RelOptInfo *pathrel = path->parent;
Assert(IS_SIMPLE_REL(pathrel));
Assert(pathrel->relid < root->simple_rel_array_size);
relid_subnode_map[pathrel->relid] = i++;
}
/* Likewise for the partition_rels */
i = 1;
foreach(lc, partition_rels)
{
Index rti = lfirst_int(lc);
Assert(rti < root->simple_rel_array_size);
relid_subpart_map[rti] = i++;
}
/* We now build a PartitionPruneInfo for each of the partition_rels */
foreach(lc, partition_rels)
{
Index rti = lfirst_int(lc);
RelOptInfo *subpart = find_base_rel(root, rti);
PartitionPruneInfo *pinfo;
RangeTblEntry *rte;
Bitmapset *present_parts;
int nparts = subpart->nparts;
int *subnode_map;
int *subpart_map;
List *partprunequal;
List *pruning_steps;
bool contradictory;
/*
* The first item in the list is the target partitioned relation. The
* quals belong to this relation, so require no translation.
*/
if (!targetpart)
{
targetpart = subpart;
partprunequal = prunequal;
}
else
{
/*
* For sub-partitioned tables the columns may not be in the same
* order as the parent, so we must translate the prunequal to make
* it compatible with this relation.
*/
partprunequal = (List *)
adjust_appendrel_attrs_multilevel(root,
(Node *) prunequal,
subpart->relids,
targetpart->relids);
}
pruning_steps = gen_partprune_steps(subpart, partprunequal,
&contradictory);
if (contradictory)
{
/*
* This shouldn't happen as the planner should have detected this
* earlier. However, we do use additional quals from parameterized
* paths here. These do only compare Params to the partition key,
* so this shouldn't cause the discovery of any new qual
* contradictions that were not previously discovered as the Param
* values are unknown during planning. Anyway, we'd better do
* something sane here, so let's just disable run-time pruning.
*/
return NIL;
}
subnode_map = (int *) palloc(nparts * sizeof(int));
subpart_map = (int *) palloc(nparts * sizeof(int));
present_parts = NULL;
/*
* Loop over each partition of the partitioned rel and record the
* subpath index for each. Any partitions which are not present in
* the subpaths List will be set to -1, and any sub-partitioned table
* which is not present will also be set to -1.
*/
for (i = 0; i < nparts; i++)
{
RelOptInfo *partrel = subpart->part_rels[i];
int subnodeidx = relid_subnode_map[partrel->relid] - 1;
int subpartidx = relid_subpart_map[partrel->relid] - 1;
subnode_map[i] = subnodeidx;
subpart_map[i] = subpartidx;
/*
* Record the indexes of all the partitions that we have
* subnodes or subparts for. This allows an optimization to skip
* attempting any run-time pruning when no Params are found
* matching the partition key at this level.
*/
if (subnodeidx >= 0 || subpartidx >= 0)
present_parts = bms_add_member(present_parts, i);
}
rte = root->simple_rte_array[subpart->relid];
pinfo = makeNode(PartitionPruneInfo);
pinfo->reloid = rte->relid;
pinfo->pruning_steps = pruning_steps;
pinfo->present_parts = present_parts;
pinfo->nparts = nparts;
pinfo->extparams = NULL;
pinfo->execparams = NULL;
pinfo->subnode_map = subnode_map;
pinfo->subpart_map = subpart_map;
/*
* Extract Params matching partition key and record if we got any.
* We'll not bother enabling run-time pruning if no params matched the
* partition key at any level of partitioning.
*/
gotparam |= pull_partkey_params(pinfo, pruning_steps);
pinfolist = lappend(pinfolist, pinfo);
}
pfree(relid_subnode_map);
pfree(relid_subpart_map);
if (gotparam)
return pinfolist;
/*
* If no Params were found to match the partition key on any of the
* partitioned relations then there's no point doing any run-time
* partition pruning.
*/
return NIL;
}
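make_partition_pruneinfo stores 1-based positions in zero-filled arrays indexed by relid, so that subtracting one turns "never set" into -1. The sketch below isolates that trick under stated assumptions: relids are small non-negative integers, and the function and parameter names (`build_subnode_map`, `part_relids`, `subpath_relids`, `max_relid`) are hypothetical, with malloc/calloc standing in for palloc/palloc0.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Build a map from partition index to position in 'subpath_relids',
 * using -1 for partitions that have no subpath.
 *
 * Returns a malloc'd array of nparts ints that the caller must free,
 * or NULL on allocation failure.
 */
static int *
build_subnode_map(const int *part_relids, int nparts,
                  const int *subpath_relids, int nsubpaths,
                  int max_relid)
{
    /* calloc zero-fills, so 0 means "no subpath recorded for this relid" */
    int *by_relid = calloc((size_t) max_relid + 1, sizeof(int));
    int *map = malloc((size_t) nparts * sizeof(int));

    if (by_relid == NULL || map == NULL)
    {
        free(by_relid);
        free(map);
        return NULL;
    }

    /* Record 1-based subpath positions by relid. */
    for (int i = 0; i < nsubpaths; i++)
    {
        assert(subpath_relids[i] >= 0 && subpath_relids[i] <= max_relid);
        by_relid[subpath_relids[i]] = i + 1;
    }

    /* Convert back to 0-based; relids that were never set become -1. */
    for (int i = 0; i < nparts; i++)
    {
        assert(part_relids[i] >= 0 && part_relids[i] <= max_relid);
        map[i] = by_relid[part_relids[i]] - 1;
    }

    free(by_relid);
    return map;
}
```

For partitions with relids {4, 5, 6} and subpaths for relids {6, 4}, the resulting map is {1, -1, 0}: the partition with relid 5 was already pruned by the planner.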
/*
* gen_partprune_steps
......@@ -258,6 +441,10 @@ prune_append_rel_partitions(RelOptInfo *rel)
context.nparts = rel->nparts;
context.boundinfo = rel->boundinfo;
/* Not valid when being called from the planner */
context.planstate = NULL;
context.safeparams = NULL;
/* Actual pruning happens here. */
partindexes = get_matching_partitions(&context, pruning_steps);
......@@ -2492,6 +2679,57 @@ get_matching_range_bounds(PartitionPruneContext *context,
return result;
}
/*
* pull_partkey_params
* Loop through each pruning step and record each external and exec
* Params being compared to the partition keys.
*/
static bool
pull_partkey_params(PartitionPruneInfo *pinfo, List *steps)
{
ListCell *lc;
bool gotone = false;
foreach(lc, steps)
{
PartitionPruneStepOp *stepop = lfirst(lc);
ListCell *lc2;
if (!IsA(stepop, PartitionPruneStepOp))
continue;
foreach(lc2, stepop->exprs)
{
Expr *expr = lfirst(lc2);
if (IsA(expr, Param))
{
Param *param = (Param *) expr;
switch (param->paramkind)
{
case PARAM_EXTERN:
pinfo->extparams = bms_add_member(pinfo->extparams,
param->paramid);
break;
case PARAM_EXEC:
pinfo->execparams = bms_add_member(pinfo->execparams,
param->paramid);
break;
default:
elog(ERROR, "unrecognized paramkind: %d",
(int) param->paramkind);
break;
}
gotone = true;
}
}
}
return gotone;
}
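The split between external and exec Params that pull_partkey_params records can be shown in isolation. This is an illustrative sketch only: it assumes param ids below 64 so that a 64-bit mask can replace the Bitmapset, and the types `SketchParamKind` and `SketchParam` and the function `classify_params` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for ParamKind and Param. */
typedef enum { SKETCH_PARAM_EXTERN, SKETCH_PARAM_EXEC } SketchParamKind;

typedef struct
{
    SketchParamKind kind;
    int             id;
} SketchParam;

/*
 * Add each param id to the mask matching its kind.
 * Returns true if at least one param was seen.
 */
static bool
classify_params(const SketchParam *params, int n,
                uint64_t *ext_mask, uint64_t *exec_mask)
{
    bool gotone = false;

    for (int i = 0; i < n; i++)
    {
        uint64_t bit;

        /* The sketch only supports ids that fit in the mask. */
        assert(params[i].id >= 0 && params[i].id < 64);
        bit = UINT64_C(1) << params[i].id;

        if (params[i].kind == SKETCH_PARAM_EXTERN)
            *ext_mask |= bit;   /* value known at executor startup */
        else
            *exec_mask |= bit;  /* value known only while executing */

        gotone = true;
    }
    return gotone;
}
```

Keeping the two sets apart is what lets the executor prune on external params once at startup and defer pruning on exec params until their values exist.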
/*
* perform_pruning_base_step
* Determines the indexes of datums that satisfy conditions specified in
......@@ -2793,6 +3031,29 @@ partkey_datum_from_expr(PartitionPruneContext *context,
*value = ((Const *) expr)->constvalue;
return true;
case T_Param:
/*
* When being called from the executor we may be able to evaluate
* the Param's value.
*/
if (context->planstate &&
bms_is_member(((Param *) expr)->paramid, context->safeparams))
{
ExprState *exprstate;
bool isNull;
exprstate = ExecInitExpr(expr, context->planstate);
*value = ExecEvalExprSwitchContext(exprstate,
context->planstate->ps_ExprContext,
&isNull);
if (isNull)
return false;
return true;
}
default:
break;
}
......
......@@ -17,6 +17,7 @@
#include "nodes/execnodes.h"
#include "nodes/parsenodes.h"
#include "nodes/plannodes.h"
#include "partitioning/partprune.h"
/*-----------------------
* PartitionDispatch - information about one partitioned table in a partition
......@@ -108,6 +109,77 @@ typedef struct PartitionTupleRouting
TupleTableSlot *root_tuple_slot;
} PartitionTupleRouting;
/*-----------------------
* PartitionPruningData - Encapsulates all information required to support
* elimination of partitions in node types which support arbitrary Lists of
* subplans. Information stored here allows the planner's partition pruning
* functions to be called and the return value of partition indexes translated
* into the subpath indexes of node types such as Append, thus allowing us to
* bypass certain subnodes when we have proofs that indicate that no tuple
* matching the 'pruning_steps' will be found within.
*
* subnode_map An array containing the subnode index which
* matches this partition index, or -1 if the
* subnode has been pruned already.
* subpart_map An array containing the offset into the
* 'partprunedata' array in PartitionPruneState, or
* -1 if there is no such element in that array.
* present_parts A Bitmapset of the partition indexes that we have
* subnodes mapped for.
* context Contains the context details required to call
* the partition pruning code.
* pruning_steps Contains a list of PartitionPruneStep used to
* perform the actual pruning.
* extparams Contains paramids of external params found
* matching partition keys in 'pruning_steps'.
* allparams As 'extparams' but also including exec params.
*-----------------------
*/
typedef struct PartitionPruningData
{
int *subnode_map;
int *subpart_map;
Bitmapset *present_parts;
PartitionPruneContext context;
List *pruning_steps;
Bitmapset *extparams;
Bitmapset *allparams;
} PartitionPruningData;
/*-----------------------
* PartitionPruneState - State object required for executor nodes to perform
* partition pruning of their subnodes. This encapsulates a
* flattened hierarchy of PartitionPruningData structs and also stores all
* paramids which were found to match the partition keys of each partition.
* This struct can be attached to node types which support arbitrary Lists of
* subnodes containing partitions to allow subnodes to be eliminated due to
* the clauses being unable to match to any tuple that the subnode could
* possibly produce.
*
* partprunedata Array of PartitionPruningData for the node's target
* partitioned relation. First element contains the
* details for the target partitioned table.
* num_partprunedata Number of items in 'partprunedata' array.
* prune_context A memory context which can be used to call the query
* planner's partition prune functions.
* extparams All PARAM_EXTERN paramids which were found to match a
* partition key in each of the contained
* PartitionPruningData structs.
* execparams As above but for PARAM_EXEC.
* allparams Union of 'extparams' and 'execparams', saved to avoid
* recalculation.
*-----------------------
*/
typedef struct PartitionPruneState
{
PartitionPruningData *partprunedata;
int num_partprunedata;
MemoryContext prune_context;
Bitmapset *extparams;
Bitmapset *execparams;
Bitmapset *allparams;
} PartitionPruneState;
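The comments above describe subnode_map and subpart_map as the bridge from partition indexes to subnode indexes across a flattened hierarchy. The sketch below shows one way such a translation could work. It is not the executor's implementation: the `PruneLevel` struct, its precomputed `matching` mask, and `collect_subnodes` are hypothetical, and both partition and subnode counts are assumed to be at most 64.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified analogue of one PartitionPruningData entry. */
typedef struct PruneLevel
{
    int        nparts;
    const int *subnode_map;  /* subnode index per partition, or -1 */
    const int *subpart_map;  /* index of a sub-level per partition, or -1 */
    uint64_t   matching;     /* bit i set: partition i survived pruning */
} PruneLevel;

/*
 * Translate the surviving partitions of 'level' into a mask of subnode
 * indexes, descending into sub-partitioned tables where needed.
 */
static uint64_t
collect_subnodes(const PruneLevel *levels, int level)
{
    const PruneLevel *pl = &levels[level];
    uint64_t result = 0;

    assert(pl->nparts >= 0 && pl->nparts <= 64);

    for (int i = 0; i < pl->nparts; i++)
    {
        if (((pl->matching >> i) & 1u) == 0)
            continue;           /* partition pruned at this level */

        if (pl->subnode_map[i] >= 0)
            result |= UINT64_C(1) << pl->subnode_map[i];
        else if (pl->subpart_map[i] >= 0)
            result |= collect_subnodes(levels, pl->subpart_map[i]);
        /* both -1: the planner already pruned this partition */
    }
    return result;
}
```

For a two-level layout with three sub-partitioned tables of three leaves each, keeping only the second top-level partition yields the mask 0x38, that is, subnodes 3, 4 and 5.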
extern PartitionTupleRouting *ExecSetupPartitionTupleRouting(ModifyTableState *mtstate,
Relation rel);
extern int ExecFindPartition(ResultRelInfo *resultRelInfo,
......@@ -133,5 +205,10 @@ extern HeapTuple ConvertPartitionTupleSlot(TupleConversionMap *map,
TupleTableSlot **p_my_slot);
extern void ExecCleanupTupleRouting(ModifyTableState *mtstate,
PartitionTupleRouting *proute);
extern PartitionPruneState *ExecSetupPartitionPruneState(PlanState *planstate,
List *partitionpruneinfo);
extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate);
extern Bitmapset *ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate,
int nsubnodes);
#endif /* EXECPARTITION_H */
......@@ -1124,7 +1124,12 @@ typedef struct ModifyTableState
* AppendState information
*
* nplans how many plans are in the array
* whichplan which plan is being executed (0 .. n-1)
* whichplan which plan is being executed (0 .. n-1), or a
* special negative value. See nodeAppend.c.
* pruningstate details required to allow partitions to be
* eliminated from the scan, or NULL if not possible.
* valid_subplans for runtime pruning, valid appendplans indexes to
* scan.
* ----------------
*/
......@@ -1132,6 +1137,7 @@ struct AppendState;
typedef struct AppendState AppendState;
struct ParallelAppendState;
typedef struct ParallelAppendState ParallelAppendState;
struct PartitionPruneState;
struct AppendState
{
......@@ -1141,6 +1147,8 @@ struct AppendState
int as_whichplan;
ParallelAppendState *as_pstate; /* parallel coordination info */
Size pstate_len; /* size of parallel coordination info */
struct PartitionPruneState *as_prune_state;
Bitmapset *as_valid_subplans;
bool (*choose_next_subplan) (AppendState *);
};
......
......@@ -196,6 +196,7 @@ typedef enum NodeTag
T_PartitionPruneStep,
T_PartitionPruneStepOp,
T_PartitionPruneStepCombine,
T_PartitionPruneInfo,
/*
* TAGS FOR EXPRESSION STATE NODES (execnodes.h)
......
......@@ -256,6 +256,11 @@ typedef struct Append
List *partitioned_rels;
List *appendplans;
int first_partial_plan;
/*
* Mapping details for run-time subplan pruning, one per partitioned_rels
*/
List *part_prune_infos;
} Append;
/* ----------------
......
......@@ -1581,4 +1581,27 @@ typedef struct PartitionPruneStepCombine
List *source_stepids;
} PartitionPruneStepCombine;
/*----------
* PartitionPruneInfo - Details required to allow the executor to prune
* partitions.
*
* Here we store mapping details to allow translation of a partitioned table's
* index into subnode indexes for node types which support arbitrary numbers
* of sub nodes, such as Append.
*----------
*/
typedef struct PartitionPruneInfo
{
NodeTag type;
Oid reloid; /* Oid of partition rel */
List *pruning_steps; /* List of PartitionPruneStep */
Bitmapset *present_parts; /* Indexes of all partitions for which subnodes
* are present. */
int nparts; /* The length of the following two arrays */
int *subnode_map; /* subnode index by partition id, or -1 */
int *subpart_map; /* subpart index by partition id, or -1 */
Bitmapset *extparams; /* All external paramids seen in prunesteps */
Bitmapset *execparams; /* All exec paramids seen in prunesteps */
} PartitionPruneInfo;
#endif /* PRIMNODES_H */
......@@ -64,7 +64,7 @@ extern BitmapOrPath *create_bitmap_or_path(PlannerInfo *root,
List *bitmapquals);
extern TidPath *create_tidscan_path(PlannerInfo *root, RelOptInfo *rel,
List *tidquals, Relids required_outer);
extern AppendPath *create_append_path(RelOptInfo *rel,
extern AppendPath *create_append_path(PlannerInfo *root, RelOptInfo *rel,
List *subpaths, List *partial_subpaths,
Relids required_outer,
int parallel_workers, bool parallel_aware,
......
......@@ -37,9 +37,23 @@ typedef struct PartitionPruneContext
/* Partition boundary info */
PartitionBoundInfo boundinfo;
/*
* Can be set when the context is used from the executor to allow params
* found matching the partition key to be evaluated.
*/
PlanState *planstate;
/*
* Parameters that are safe to be used for partition pruning. execparams
* are not safe to use until the executor is running.
*/
Bitmapset *safeparams;
} PartitionPruneContext;
extern List *make_partition_pruneinfo(PlannerInfo *root, List *partition_rels,
List *subpaths, List *prunequal);
extern Relids prune_append_rel_partitions(RelOptInfo *rel);
extern Bitmapset *get_matching_partitions(PartitionPruneContext *context,
List *pruning_steps);
......
......@@ -1331,3 +1331,1138 @@ explain (costs off) select * from rparted_by_int2 where a > 100000000000000;
(3 rows)
drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart, rp, coll_pruning_multi, like_op_noprune, lparted_by_int2, rparted_by_int2;
--
-- Test runtime partition pruning
--
create table ab (a int not null, b int not null) partition by list (a);
create table ab_a2 partition of ab for values in(2) partition by list (b);
create table ab_a2_b1 partition of ab_a2 for values in (1);
create table ab_a2_b2 partition of ab_a2 for values in (2);
create table ab_a2_b3 partition of ab_a2 for values in (3);
create table ab_a1 partition of ab for values in(1) partition by list (b);
create table ab_a1_b1 partition of ab_a1 for values in (1);
create table ab_a1_b2 partition of ab_a1 for values in (2);
create table ab_a1_b3 partition of ab_a1 for values in (3);
create table ab_a3 partition of ab for values in(3) partition by list (b);
create table ab_a3_b1 partition of ab_a3 for values in (1);
create table ab_a3_b2 partition of ab_a3 for values in (2);
create table ab_a3_b3 partition of ab_a3 for values in (3);
prepare ab_q1 (int, int, int) as
select * from ab where a between $1 and $2 and b <= $3;
-- Execute query 5 times to allow choose_custom_plan
-- to start considering a generic plan.
execute ab_q1 (1, 8, 3);
a | b
---+---
(0 rows)
execute ab_q1 (1, 8, 3);
a | b
---+---
(0 rows)
execute ab_q1 (1, 8, 3);
a | b
---+---
(0 rows)
execute ab_q1 (1, 8, 3);
a | b
---+---
(0 rows)
execute ab_q1 (1, 8, 3);
a | b
---+---
(0 rows)
explain (analyze, costs off, summary off, timing off) execute ab_q1 (2, 2, 3);
QUERY PLAN
---------------------------------------------------------
Append (actual rows=0 loops=1)
Subplans Removed: 6
-> Seq Scan on ab_a2_b1 (actual rows=0 loops=1)
Filter: ((a >= $1) AND (a <= $2) AND (b <= $3))
-> Seq Scan on ab_a2_b2 (actual rows=0 loops=1)
Filter: ((a >= $1) AND (a <= $2) AND (b <= $3))
-> Seq Scan on ab_a2_b3 (actual rows=0 loops=1)
Filter: ((a >= $1) AND (a <= $2) AND (b <= $3))
(8 rows)
explain (analyze, costs off, summary off, timing off) execute ab_q1 (1, 2, 3);
QUERY PLAN
---------------------------------------------------------
Append (actual rows=0 loops=1)
Subplans Removed: 3
-> Seq Scan on ab_a1_b1 (actual rows=0 loops=1)
Filter: ((a >= $1) AND (a <= $2) AND (b <= $3))
-> Seq Scan on ab_a1_b2 (actual rows=0 loops=1)
Filter: ((a >= $1) AND (a <= $2) AND (b <= $3))
-> Seq Scan on ab_a1_b3 (actual rows=0 loops=1)
Filter: ((a >= $1) AND (a <= $2) AND (b <= $3))
-> Seq Scan on ab_a2_b1 (actual rows=0 loops=1)
Filter: ((a >= $1) AND (a <= $2) AND (b <= $3))
-> Seq Scan on ab_a2_b2 (actual rows=0 loops=1)
Filter: ((a >= $1) AND (a <= $2) AND (b <= $3))
-> Seq Scan on ab_a2_b3 (actual rows=0 loops=1)
Filter: ((a >= $1) AND (a <= $2) AND (b <= $3))
(14 rows)
deallocate ab_q1;
-- Runtime pruning after optimizer pruning
prepare ab_q1 (int, int) as
select a from ab where a between $1 and $2 and b < 3;
-- Execute query 5 times to allow choose_custom_plan
-- to start considering a generic plan.
execute ab_q1 (1, 8);
a
---
(0 rows)
execute ab_q1 (1, 8);
a
---
(0 rows)
execute ab_q1 (1, 8);
a
---
(0 rows)
execute ab_q1 (1, 8);
a
---
(0 rows)
execute ab_q1 (1, 8);
a
---
(0 rows)
explain (analyze, costs off, summary off, timing off) execute ab_q1 (2, 2);
QUERY PLAN
-------------------------------------------------------
Append (actual rows=0 loops=1)
Subplans Removed: 4
-> Seq Scan on ab_a2_b1 (actual rows=0 loops=1)
Filter: ((a >= $1) AND (a <= $2) AND (b < 3))
-> Seq Scan on ab_a2_b2 (actual rows=0 loops=1)
Filter: ((a >= $1) AND (a <= $2) AND (b < 3))
(6 rows)
explain (analyze, costs off, summary off, timing off) execute ab_q1 (2, 4);
QUERY PLAN
-------------------------------------------------------
Append (actual rows=0 loops=1)
Subplans Removed: 2
-> Seq Scan on ab_a2_b1 (actual rows=0 loops=1)
Filter: ((a >= $1) AND (a <= $2) AND (b < 3))
-> Seq Scan on ab_a2_b2 (actual rows=0 loops=1)
Filter: ((a >= $1) AND (a <= $2) AND (b < 3))
-> Seq Scan on ab_a3_b1 (actual rows=0 loops=1)
Filter: ((a >= $1) AND (a <= $2) AND (b < 3))
-> Seq Scan on ab_a3_b2 (actual rows=0 loops=1)
Filter: ((a >= $1) AND (a <= $2) AND (b < 3))
(10 rows)
-- Ensure a mix of external and exec params work together at different
-- levels of partitioning.
prepare ab_q2 (int, int) as
select a from ab where a between $1 and $2 and b < (select 3);
execute ab_q2 (1, 8);
a
---
(0 rows)
execute ab_q2 (1, 8);
a
---
(0 rows)
execute ab_q2 (1, 8);
a
---
(0 rows)
execute ab_q2 (1, 8);
a
---
(0 rows)
execute ab_q2 (1, 8);
a
---
(0 rows)
explain (analyze, costs off, summary off, timing off) execute ab_q2 (2, 2);
QUERY PLAN
--------------------------------------------------------
Append (actual rows=0 loops=1)
InitPlan 1 (returns $0)
-> Result (actual rows=1 loops=1)
Subplans Removed: 6
-> Seq Scan on ab_a2_b1 (actual rows=0 loops=1)
Filter: ((a >= $1) AND (a <= $2) AND (b < $0))
-> Seq Scan on ab_a2_b2 (actual rows=0 loops=1)
Filter: ((a >= $1) AND (a <= $2) AND (b < $0))
-> Seq Scan on ab_a2_b3 (never executed)
Filter: ((a >= $1) AND (a <= $2) AND (b < $0))
(10 rows)
-- As above, but swap the exec param to the first partition level
prepare ab_q3 (int, int) as
select a from ab where b between $1 and $2 and a < (select 3);
execute ab_q3 (1, 8);
a
---
(0 rows)
execute ab_q3 (1, 8);
a
---
(0 rows)
execute ab_q3 (1, 8);
a
---
(0 rows)
execute ab_q3 (1, 8);
a
---
(0 rows)
execute ab_q3 (1, 8);
a
---
(0 rows)
explain (analyze, costs off, summary off, timing off) execute ab_q3 (2, 2);
QUERY PLAN
--------------------------------------------------------
Append (actual rows=0 loops=1)
InitPlan 1 (returns $0)
-> Result (actual rows=1 loops=1)
Subplans Removed: 6
-> Seq Scan on ab_a1_b2 (actual rows=0 loops=1)
Filter: ((b >= $1) AND (b <= $2) AND (a < $0))
-> Seq Scan on ab_a2_b2 (actual rows=0 loops=1)
Filter: ((b >= $1) AND (b <= $2) AND (a < $0))
-> Seq Scan on ab_a3_b2 (never executed)
Filter: ((b >= $1) AND (b <= $2) AND (a < $0))
(10 rows)
-- Parallel append
prepare ab_q4 (int, int) as
select avg(a) from ab where a between $1 and $2 and b < 4;
-- Encourage use of parallel plans
set parallel_setup_cost = 0;
set parallel_tuple_cost = 0;
set min_parallel_table_scan_size = 0;
set max_parallel_workers_per_gather = 2;
-- Execute query 5 times to allow choose_custom_plan
-- to start considering a generic plan.
execute ab_q4 (1, 8);
avg
-----
(1 row)
execute ab_q4 (1, 8);
avg
-----
(1 row)
execute ab_q4 (1, 8);
avg
-----
(1 row)
execute ab_q4 (1, 8);
avg
-----
(1 row)
execute ab_q4 (1, 8);
avg
-----
(1 row)
explain (analyze, costs off, summary off, timing off) execute ab_q4 (2, 2);
QUERY PLAN
-------------------------------------------------------------------------------
Finalize Aggregate (actual rows=1 loops=1)
-> Gather (actual rows=3 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Partial Aggregate (actual rows=1 loops=3)
-> Parallel Append (actual rows=0 loops=3)
Subplans Removed: 6
-> Parallel Seq Scan on ab_a2_b1 (actual rows=0 loops=1)
Filter: ((a >= $1) AND (a <= $2) AND (b < 4))
-> Parallel Seq Scan on ab_a2_b2 (actual rows=0 loops=1)
Filter: ((a >= $1) AND (a <= $2) AND (b < 4))
-> Parallel Seq Scan on ab_a2_b3 (actual rows=0 loops=1)
Filter: ((a >= $1) AND (a <= $2) AND (b < 4))
(13 rows)
-- Test run-time pruning with IN lists.
prepare ab_q5 (int, int, int) as
select avg(a) from ab where a in($1,$2,$3) and b < 4;
-- Execute query 5 times to allow choose_custom_plan
-- to start considering a generic plan.
execute ab_q5 (1, 2, 3);
avg
-----
(1 row)
execute ab_q5 (1, 2, 3);
avg
-----
(1 row)
execute ab_q5 (1, 2, 3);
avg
-----
(1 row)
execute ab_q5 (1, 2, 3);
avg
-----
(1 row)
execute ab_q5 (1, 2, 3);
avg
-----
(1 row)
explain (analyze, costs off, summary off, timing off) execute ab_q5 (1, 1, 1);
QUERY PLAN
-------------------------------------------------------------------------------
Finalize Aggregate (actual rows=1 loops=1)
-> Gather (actual rows=3 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Partial Aggregate (actual rows=1 loops=3)
-> Parallel Append (actual rows=0 loops=3)
Subplans Removed: 6
-> Parallel Seq Scan on ab_a1_b1 (actual rows=0 loops=1)
Filter: ((b < 4) AND (a = ANY (ARRAY[$1, $2, $3])))
-> Parallel Seq Scan on ab_a1_b2 (actual rows=0 loops=1)
Filter: ((b < 4) AND (a = ANY (ARRAY[$1, $2, $3])))
-> Parallel Seq Scan on ab_a1_b3 (actual rows=0 loops=1)
Filter: ((b < 4) AND (a = ANY (ARRAY[$1, $2, $3])))
(13 rows)
explain (analyze, costs off, summary off, timing off) execute ab_q5 (2, 3, 3);
QUERY PLAN
-------------------------------------------------------------------------------
Finalize Aggregate (actual rows=1 loops=1)
-> Gather (actual rows=3 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Partial Aggregate (actual rows=1 loops=3)
-> Parallel Append (actual rows=0 loops=3)
Subplans Removed: 3
-> Parallel Seq Scan on ab_a2_b1 (actual rows=0 loops=1)
Filter: ((b < 4) AND (a = ANY (ARRAY[$1, $2, $3])))
-> Parallel Seq Scan on ab_a2_b2 (actual rows=0 loops=1)
Filter: ((b < 4) AND (a = ANY (ARRAY[$1, $2, $3])))
-> Parallel Seq Scan on ab_a2_b3 (actual rows=0 loops=1)
Filter: ((b < 4) AND (a = ANY (ARRAY[$1, $2, $3])))
-> Parallel Seq Scan on ab_a3_b1 (actual rows=0 loops=1)
Filter: ((b < 4) AND (a = ANY (ARRAY[$1, $2, $3])))
-> Parallel Seq Scan on ab_a3_b2 (actual rows=0 loops=1)
Filter: ((b < 4) AND (a = ANY (ARRAY[$1, $2, $3])))
-> Parallel Seq Scan on ab_a3_b3 (actual rows=0 loops=1)
Filter: ((b < 4) AND (a = ANY (ARRAY[$1, $2, $3])))
(19 rows)
-- Try some params whose values do not belong to any partition.
-- We'll still get a single subplan in this case, but it should not be scanned.
explain (analyze, costs off, summary off, timing off) execute ab_q5 (33, 44, 55);
QUERY PLAN
-------------------------------------------------------------------------------
Finalize Aggregate (actual rows=1 loops=1)
-> Gather (actual rows=3 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Partial Aggregate (actual rows=1 loops=3)
-> Parallel Append (actual rows=0 loops=3)
Subplans Removed: 8
-> Parallel Seq Scan on ab_a1_b1 (never executed)
Filter: ((b < 4) AND (a = ANY (ARRAY[$1, $2, $3])))
(9 rows)
-- Test parallel Append with IN list and parameterized nested loops
create table lprt_a (a int not null);
-- Insert some values we won't find in ab
insert into lprt_a select 0 from generate_series(1,100);
-- and insert some values that we should find.
insert into lprt_a values(1),(1);
analyze lprt_a;
create index ab_a2_b1_a_idx on ab_a2_b1 (a);
create index ab_a2_b2_a_idx on ab_a2_b2 (a);
create index ab_a2_b3_a_idx on ab_a2_b3 (a);
create index ab_a1_b1_a_idx on ab_a1_b1 (a);
create index ab_a1_b2_a_idx on ab_a1_b2 (a);
create index ab_a1_b3_a_idx on ab_a1_b3 (a);
create index ab_a3_b1_a_idx on ab_a3_b1 (a);
create index ab_a3_b2_a_idx on ab_a3_b2 (a);
create index ab_a3_b3_a_idx on ab_a3_b3 (a);
set enable_hashjoin = 0;
set enable_mergejoin = 0;
prepare ab_q6 (int, int, int) as
select avg(ab.a) from ab inner join lprt_a a on ab.a = a.a where a.a in($1,$2,$3);
execute ab_q6 (1, 2, 3);
avg
-----
(1 row)
execute ab_q6 (1, 2, 3);
avg
-----
(1 row)
execute ab_q6 (1, 2, 3);
avg
-----
(1 row)
execute ab_q6 (1, 2, 3);
avg
-----
(1 row)
execute ab_q6 (1, 2, 3);
avg
-----
(1 row)
explain (analyze, costs off, summary off, timing off) execute ab_q6 (0, 0, 1);
QUERY PLAN
--------------------------------------------------------------------------------------------------------
Finalize Aggregate (actual rows=1 loops=1)
-> Gather (actual rows=2 loops=1)
Workers Planned: 1
Workers Launched: 1
-> Partial Aggregate (actual rows=1 loops=2)
-> Nested Loop (actual rows=0 loops=2)
-> Parallel Seq Scan on lprt_a a (actual rows=51 loops=2)
Filter: (a = ANY ('{0,0,1}'::integer[]))
-> Append (actual rows=0 loops=102)
-> Index Only Scan using ab_a1_b1_a_idx on ab_a1_b1 (actual rows=0 loops=2)
Index Cond: (a = a.a)
Heap Fetches: 0
-> Index Only Scan using ab_a1_b2_a_idx on ab_a1_b2 (actual rows=0 loops=2)
Index Cond: (a = a.a)
Heap Fetches: 0
-> Index Only Scan using ab_a1_b3_a_idx on ab_a1_b3 (actual rows=0 loops=2)
Index Cond: (a = a.a)
Heap Fetches: 0
-> Index Only Scan using ab_a2_b1_a_idx on ab_a2_b1 (never executed)
Index Cond: (a = a.a)
Heap Fetches: 0
-> Index Only Scan using ab_a2_b2_a_idx on ab_a2_b2 (never executed)
Index Cond: (a = a.a)
Heap Fetches: 0
-> Index Only Scan using ab_a2_b3_a_idx on ab_a2_b3 (never executed)
Index Cond: (a = a.a)
Heap Fetches: 0
-> Index Only Scan using ab_a3_b1_a_idx on ab_a3_b1 (never executed)
Index Cond: (a = a.a)
Heap Fetches: 0
-> Index Only Scan using ab_a3_b2_a_idx on ab_a3_b2 (never executed)
Index Cond: (a = a.a)
Heap Fetches: 0
-> Index Only Scan using ab_a3_b3_a_idx on ab_a3_b3 (never executed)
Index Cond: (a = a.a)
Heap Fetches: 0
(36 rows)
insert into lprt_a values(3),(3);
explain (analyze, costs off, summary off, timing off) execute ab_q6 (1, 0, 3);
QUERY PLAN
--------------------------------------------------------------------------------------------------------
Finalize Aggregate (actual rows=1 loops=1)
-> Gather (actual rows=2 loops=1)
Workers Planned: 1
Workers Launched: 1
-> Partial Aggregate (actual rows=1 loops=2)
-> Nested Loop (actual rows=0 loops=2)
-> Parallel Seq Scan on lprt_a a (actual rows=52 loops=2)
Filter: (a = ANY ('{1,0,3}'::integer[]))
-> Append (actual rows=0 loops=104)
-> Index Only Scan using ab_a1_b1_a_idx on ab_a1_b1 (actual rows=0 loops=2)
Index Cond: (a = a.a)
Heap Fetches: 0
-> Index Only Scan using ab_a1_b2_a_idx on ab_a1_b2 (actual rows=0 loops=2)
Index Cond: (a = a.a)
Heap Fetches: 0
-> Index Only Scan using ab_a1_b3_a_idx on ab_a1_b3 (actual rows=0 loops=2)
Index Cond: (a = a.a)
Heap Fetches: 0
-> Index Only Scan using ab_a2_b1_a_idx on ab_a2_b1 (never executed)
Index Cond: (a = a.a)
Heap Fetches: 0
-> Index Only Scan using ab_a2_b2_a_idx on ab_a2_b2 (never executed)
Index Cond: (a = a.a)
Heap Fetches: 0
-> Index Only Scan using ab_a2_b3_a_idx on ab_a2_b3 (never executed)
Index Cond: (a = a.a)
Heap Fetches: 0
-> Index Only Scan using ab_a3_b1_a_idx on ab_a3_b1 (actual rows=0 loops=2)
Index Cond: (a = a.a)
Heap Fetches: 0
-> Index Only Scan using ab_a3_b2_a_idx on ab_a3_b2 (actual rows=0 loops=2)
Index Cond: (a = a.a)
Heap Fetches: 0
-> Index Only Scan using ab_a3_b3_a_idx on ab_a3_b3 (actual rows=0 loops=2)
Index Cond: (a = a.a)
Heap Fetches: 0
(36 rows)
explain (analyze, costs off, summary off, timing off) execute ab_q6 (1, 0, 0);
QUERY PLAN
--------------------------------------------------------------------------------------------------------
Finalize Aggregate (actual rows=1 loops=1)
-> Gather (actual rows=2 loops=1)
Workers Planned: 1
Workers Launched: 1
-> Partial Aggregate (actual rows=1 loops=2)
-> Nested Loop (actual rows=0 loops=2)
-> Parallel Seq Scan on lprt_a a (actual rows=51 loops=2)
Filter: (a = ANY ('{1,0,0}'::integer[]))
Rows Removed by Filter: 1
-> Append (actual rows=0 loops=102)
-> Index Only Scan using ab_a1_b1_a_idx on ab_a1_b1 (actual rows=0 loops=2)
Index Cond: (a = a.a)
Heap Fetches: 0
-> Index Only Scan using ab_a1_b2_a_idx on ab_a1_b2 (actual rows=0 loops=2)
Index Cond: (a = a.a)
Heap Fetches: 0
-> Index Only Scan using ab_a1_b3_a_idx on ab_a1_b3 (actual rows=0 loops=2)
Index Cond: (a = a.a)
Heap Fetches: 0
-> Index Only Scan using ab_a2_b1_a_idx on ab_a2_b1 (never executed)
Index Cond: (a = a.a)
Heap Fetches: 0
-> Index Only Scan using ab_a2_b2_a_idx on ab_a2_b2 (never executed)
Index Cond: (a = a.a)
Heap Fetches: 0
-> Index Only Scan using ab_a2_b3_a_idx on ab_a2_b3 (never executed)
Index Cond: (a = a.a)
Heap Fetches: 0
-> Index Only Scan using ab_a3_b1_a_idx on ab_a3_b1 (never executed)
Index Cond: (a = a.a)
Heap Fetches: 0
-> Index Only Scan using ab_a3_b2_a_idx on ab_a3_b2 (never executed)
Index Cond: (a = a.a)
Heap Fetches: 0
-> Index Only Scan using ab_a3_b3_a_idx on ab_a3_b3 (never executed)
Index Cond: (a = a.a)
Heap Fetches: 0
(37 rows)
delete from lprt_a where a = 1;
explain (analyze, costs off, summary off, timing off) execute ab_q6 (1, 0, 0);
QUERY PLAN
-------------------------------------------------------------------------------------------------
Finalize Aggregate (actual rows=1 loops=1)
-> Gather (actual rows=2 loops=1)
Workers Planned: 1
Workers Launched: 1
-> Partial Aggregate (actual rows=1 loops=2)
-> Nested Loop (actual rows=0 loops=2)
-> Parallel Seq Scan on lprt_a a (actual rows=50 loops=2)
Filter: (a = ANY ('{1,0,0}'::integer[]))
Rows Removed by Filter: 1
-> Append (actual rows=0 loops=100)
-> Index Only Scan using ab_a1_b1_a_idx on ab_a1_b1 (never executed)
Index Cond: (a = a.a)
Heap Fetches: 0
-> Index Only Scan using ab_a1_b2_a_idx on ab_a1_b2 (never executed)
Index Cond: (a = a.a)
Heap Fetches: 0
-> Index Only Scan using ab_a1_b3_a_idx on ab_a1_b3 (never executed)
Index Cond: (a = a.a)
Heap Fetches: 0
-> Index Only Scan using ab_a2_b1_a_idx on ab_a2_b1 (never executed)
Index Cond: (a = a.a)
Heap Fetches: 0
-> Index Only Scan using ab_a2_b2_a_idx on ab_a2_b2 (never executed)
Index Cond: (a = a.a)
Heap Fetches: 0
-> Index Only Scan using ab_a2_b3_a_idx on ab_a2_b3 (never executed)
Index Cond: (a = a.a)
Heap Fetches: 0
-> Index Only Scan using ab_a3_b1_a_idx on ab_a3_b1 (never executed)
Index Cond: (a = a.a)
Heap Fetches: 0
-> Index Only Scan using ab_a3_b2_a_idx on ab_a3_b2 (never executed)
Index Cond: (a = a.a)
Heap Fetches: 0
-> Index Only Scan using ab_a3_b3_a_idx on ab_a3_b3 (never executed)
Index Cond: (a = a.a)
Heap Fetches: 0
(37 rows)
reset enable_hashjoin;
reset enable_mergejoin;
reset parallel_setup_cost;
reset parallel_tuple_cost;
reset min_parallel_table_scan_size;
reset max_parallel_workers_per_gather;
-- Test run-time partition pruning with an initplan
explain (analyze, costs off, summary off, timing off)
select * from ab where a = (select max(a) from lprt_a) and b = (select max(a)-1 from lprt_a);
QUERY PLAN
-------------------------------------------------------------------------
Append (actual rows=0 loops=1)
InitPlan 1 (returns $0)
-> Aggregate (actual rows=1 loops=1)
-> Seq Scan on lprt_a (actual rows=102 loops=1)
InitPlan 2 (returns $1)
-> Aggregate (actual rows=1 loops=1)
-> Seq Scan on lprt_a lprt_a_1 (actual rows=102 loops=1)
-> Bitmap Heap Scan on ab_a1_b1 (never executed)
Recheck Cond: (a = $0)
Filter: (b = $1)
-> Bitmap Index Scan on ab_a1_b1_a_idx (never executed)
Index Cond: (a = $0)
-> Bitmap Heap Scan on ab_a1_b2 (never executed)
Recheck Cond: (a = $0)
Filter: (b = $1)
-> Bitmap Index Scan on ab_a1_b2_a_idx (never executed)
Index Cond: (a = $0)
-> Bitmap Heap Scan on ab_a1_b3 (never executed)
Recheck Cond: (a = $0)
Filter: (b = $1)
-> Bitmap Index Scan on ab_a1_b3_a_idx (never executed)
Index Cond: (a = $0)
-> Bitmap Heap Scan on ab_a2_b1 (never executed)
Recheck Cond: (a = $0)
Filter: (b = $1)
-> Bitmap Index Scan on ab_a2_b1_a_idx (never executed)
Index Cond: (a = $0)
-> Bitmap Heap Scan on ab_a2_b2 (never executed)
Recheck Cond: (a = $0)
Filter: (b = $1)
-> Bitmap Index Scan on ab_a2_b2_a_idx (never executed)
Index Cond: (a = $0)
-> Bitmap Heap Scan on ab_a2_b3 (never executed)
Recheck Cond: (a = $0)
Filter: (b = $1)
-> Bitmap Index Scan on ab_a2_b3_a_idx (never executed)
Index Cond: (a = $0)
-> Bitmap Heap Scan on ab_a3_b1 (never executed)
Recheck Cond: (a = $0)
Filter: (b = $1)
-> Bitmap Index Scan on ab_a3_b1_a_idx (never executed)
Index Cond: (a = $0)
-> Bitmap Heap Scan on ab_a3_b2 (actual rows=0 loops=1)
Recheck Cond: (a = $0)
Filter: (b = $1)
-> Bitmap Index Scan on ab_a3_b2_a_idx (actual rows=0 loops=1)
Index Cond: (a = $0)
-> Bitmap Heap Scan on ab_a3_b3 (never executed)
Recheck Cond: (a = $0)
Filter: (b = $1)
-> Bitmap Index Scan on ab_a3_b3_a_idx (never executed)
Index Cond: (a = $0)
(52 rows)
deallocate ab_q1;
deallocate ab_q2;
deallocate ab_q3;
deallocate ab_q4;
deallocate ab_q5;
deallocate ab_q6;
drop table ab, lprt_a;
-- Join
create table tbl1(col1 int);
insert into tbl1 values (501), (505);
-- Basic table
create table tprt (col1 int) partition by range (col1);
create table tprt_1 partition of tprt for values from (1) to (501);
create table tprt_2 partition of tprt for values from (501) to (1001);
create table tprt_3 partition of tprt for values from (1001) to (2001);
create table tprt_4 partition of tprt for values from (2001) to (3001);
create table tprt_5 partition of tprt for values from (3001) to (4001);
create table tprt_6 partition of tprt for values from (4001) to (5001);
create index tprt1_idx on tprt_1 (col1);
create index tprt2_idx on tprt_2 (col1);
create index tprt3_idx on tprt_3 (col1);
create index tprt4_idx on tprt_4 (col1);
create index tprt5_idx on tprt_5 (col1);
create index tprt6_idx on tprt_6 (col1);
insert into tprt values (10), (20), (501), (502), (505), (1001), (4500);
set enable_hashjoin = off;
set enable_mergejoin = off;
explain (analyze, costs off, summary off, timing off)
select * from tbl1 join tprt on tbl1.col1 > tprt.col1;
QUERY PLAN
-------------------------------------------------------------------------------
Nested Loop (actual rows=6 loops=1)
-> Seq Scan on tbl1 (actual rows=2 loops=1)
-> Append (actual rows=3 loops=2)
-> Index Only Scan using tprt1_idx on tprt_1 (actual rows=2 loops=2)
Index Cond: (col1 < tbl1.col1)
Heap Fetches: 4
-> Index Only Scan using tprt2_idx on tprt_2 (actual rows=2 loops=1)
Index Cond: (col1 < tbl1.col1)
Heap Fetches: 2
-> Index Only Scan using tprt3_idx on tprt_3 (never executed)
Index Cond: (col1 < tbl1.col1)
Heap Fetches: 0
-> Index Only Scan using tprt4_idx on tprt_4 (never executed)
Index Cond: (col1 < tbl1.col1)
Heap Fetches: 0
-> Index Only Scan using tprt5_idx on tprt_5 (never executed)
Index Cond: (col1 < tbl1.col1)
Heap Fetches: 0
-> Index Only Scan using tprt6_idx on tprt_6 (never executed)
Index Cond: (col1 < tbl1.col1)
Heap Fetches: 0
(21 rows)
explain (analyze, costs off, summary off, timing off)
select * from tbl1 join tprt on tbl1.col1 = tprt.col1;
QUERY PLAN
-------------------------------------------------------------------------------
Nested Loop (actual rows=2 loops=1)
-> Seq Scan on tbl1 (actual rows=2 loops=1)
-> Append (actual rows=1 loops=2)
-> Index Only Scan using tprt1_idx on tprt_1 (never executed)
Index Cond: (col1 = tbl1.col1)
Heap Fetches: 0
-> Index Only Scan using tprt2_idx on tprt_2 (actual rows=1 loops=2)
Index Cond: (col1 = tbl1.col1)
Heap Fetches: 2
-> Index Only Scan using tprt3_idx on tprt_3 (never executed)
Index Cond: (col1 = tbl1.col1)
Heap Fetches: 0
-> Index Only Scan using tprt4_idx on tprt_4 (never executed)
Index Cond: (col1 = tbl1.col1)
Heap Fetches: 0
-> Index Only Scan using tprt5_idx on tprt_5 (never executed)
Index Cond: (col1 = tbl1.col1)
Heap Fetches: 0
-> Index Only Scan using tprt6_idx on tprt_6 (never executed)
Index Cond: (col1 = tbl1.col1)
Heap Fetches: 0
(21 rows)
select tbl1.col1, tprt.col1 from tbl1
inner join tprt on tbl1.col1 > tprt.col1
order by tbl1.col1, tprt.col1;
col1 | col1
------+------
501 | 10
501 | 20
505 | 10
505 | 20
505 | 501
505 | 502
(6 rows)
select tbl1.col1, tprt.col1 from tbl1
inner join tprt on tbl1.col1 = tprt.col1
order by tbl1.col1, tprt.col1;
col1 | col1
------+------
501 | 501
505 | 505
(2 rows)
-- Multiple partitions
insert into tbl1 values (1001), (1010), (1011);
explain (analyze, costs off, summary off, timing off)
select * from tbl1 inner join tprt on tbl1.col1 > tprt.col1;
QUERY PLAN
-------------------------------------------------------------------------------
Nested Loop (actual rows=23 loops=1)
-> Seq Scan on tbl1 (actual rows=5 loops=1)
-> Append (actual rows=5 loops=5)
-> Index Only Scan using tprt1_idx on tprt_1 (actual rows=2 loops=5)
Index Cond: (col1 < tbl1.col1)
Heap Fetches: 10
-> Index Only Scan using tprt2_idx on tprt_2 (actual rows=3 loops=4)
Index Cond: (col1 < tbl1.col1)
Heap Fetches: 11
-> Index Only Scan using tprt3_idx on tprt_3 (actual rows=1 loops=2)
Index Cond: (col1 < tbl1.col1)
Heap Fetches: 2
-> Index Only Scan using tprt4_idx on tprt_4 (never executed)
Index Cond: (col1 < tbl1.col1)
Heap Fetches: 0
-> Index Only Scan using tprt5_idx on tprt_5 (never executed)
Index Cond: (col1 < tbl1.col1)
Heap Fetches: 0
-> Index Only Scan using tprt6_idx on tprt_6 (never executed)
Index Cond: (col1 < tbl1.col1)
Heap Fetches: 0
(21 rows)
explain (analyze, costs off, summary off, timing off)
select * from tbl1 inner join tprt on tbl1.col1 = tprt.col1;
QUERY PLAN
-------------------------------------------------------------------------------
Nested Loop (actual rows=3 loops=1)
-> Seq Scan on tbl1 (actual rows=5 loops=1)
-> Append (actual rows=1 loops=5)
-> Index Only Scan using tprt1_idx on tprt_1 (never executed)
Index Cond: (col1 = tbl1.col1)
Heap Fetches: 0
-> Index Only Scan using tprt2_idx on tprt_2 (actual rows=1 loops=2)
Index Cond: (col1 = tbl1.col1)
Heap Fetches: 2
-> Index Only Scan using tprt3_idx on tprt_3 (actual rows=0 loops=3)
Index Cond: (col1 = tbl1.col1)
Heap Fetches: 1
-> Index Only Scan using tprt4_idx on tprt_4 (never executed)
Index Cond: (col1 = tbl1.col1)
Heap Fetches: 0
-> Index Only Scan using tprt5_idx on tprt_5 (never executed)
Index Cond: (col1 = tbl1.col1)
Heap Fetches: 0
-> Index Only Scan using tprt6_idx on tprt_6 (never executed)
Index Cond: (col1 = tbl1.col1)
Heap Fetches: 0
(21 rows)
select tbl1.col1, tprt.col1 from tbl1
inner join tprt on tbl1.col1 > tprt.col1
order by tbl1.col1, tprt.col1;
col1 | col1
------+------
501 | 10
501 | 20
505 | 10
505 | 20
505 | 501
505 | 502
1001 | 10
1001 | 20
1001 | 501
1001 | 502
1001 | 505
1010 | 10
1010 | 20
1010 | 501
1010 | 502
1010 | 505
1010 | 1001
1011 | 10
1011 | 20
1011 | 501
1011 | 502
1011 | 505
1011 | 1001
(23 rows)
select tbl1.col1, tprt.col1 from tbl1
inner join tprt on tbl1.col1 = tprt.col1
order by tbl1.col1, tprt.col1;
col1 | col1
------+------
501 | 501
505 | 505
1001 | 1001
(3 rows)
-- Last partition
delete from tbl1;
insert into tbl1 values (4400);
explain (analyze, costs off, summary off, timing off)
select * from tbl1 join tprt on tbl1.col1 < tprt.col1;
QUERY PLAN
-------------------------------------------------------------------------------
Nested Loop (actual rows=1 loops=1)
-> Seq Scan on tbl1 (actual rows=1 loops=1)
-> Append (actual rows=1 loops=1)
-> Index Only Scan using tprt1_idx on tprt_1 (never executed)
Index Cond: (col1 > tbl1.col1)
Heap Fetches: 0
-> Index Only Scan using tprt2_idx on tprt_2 (never executed)
Index Cond: (col1 > tbl1.col1)
Heap Fetches: 0
-> Index Only Scan using tprt3_idx on tprt_3 (never executed)
Index Cond: (col1 > tbl1.col1)
Heap Fetches: 0
-> Index Only Scan using tprt4_idx on tprt_4 (never executed)
Index Cond: (col1 > tbl1.col1)
Heap Fetches: 0
-> Index Only Scan using tprt5_idx on tprt_5 (never executed)
Index Cond: (col1 > tbl1.col1)
Heap Fetches: 0
-> Index Only Scan using tprt6_idx on tprt_6 (actual rows=1 loops=1)
Index Cond: (col1 > tbl1.col1)
Heap Fetches: 1
(21 rows)
select tbl1.col1, tprt.col1 from tbl1
inner join tprt on tbl1.col1 < tprt.col1
order by tbl1.col1, tprt.col1;
col1 | col1
------+------
4400 | 4500
(1 row)
-- No matching partition
delete from tbl1;
insert into tbl1 values (10000);
explain (analyze, costs off, summary off, timing off)
select * from tbl1 join tprt on tbl1.col1 = tprt.col1;
QUERY PLAN
------------------------------------------------------------------------
Nested Loop (actual rows=0 loops=1)
-> Seq Scan on tbl1 (actual rows=1 loops=1)
-> Append (actual rows=0 loops=1)
-> Index Only Scan using tprt1_idx on tprt_1 (never executed)
Index Cond: (col1 = tbl1.col1)
Heap Fetches: 0
-> Index Only Scan using tprt2_idx on tprt_2 (never executed)
Index Cond: (col1 = tbl1.col1)
Heap Fetches: 0
-> Index Only Scan using tprt3_idx on tprt_3 (never executed)
Index Cond: (col1 = tbl1.col1)
Heap Fetches: 0
-> Index Only Scan using tprt4_idx on tprt_4 (never executed)
Index Cond: (col1 = tbl1.col1)
Heap Fetches: 0
-> Index Only Scan using tprt5_idx on tprt_5 (never executed)
Index Cond: (col1 = tbl1.col1)
Heap Fetches: 0
-> Index Only Scan using tprt6_idx on tprt_6 (never executed)
Index Cond: (col1 = tbl1.col1)
Heap Fetches: 0
(21 rows)
select tbl1.col1, tprt.col1 from tbl1
inner join tprt on tbl1.col1 = tprt.col1
order by tbl1.col1, tprt.col1;
col1 | col1
------+------
(0 rows)
drop table tbl1, tprt;
-- Test with columns defined in varying orders between each level
create table part_abc (a int not null, b int not null, c int not null) partition by list (a);
create table part_bac (b int not null, a int not null, c int not null) partition by list (b);
create table part_cab (c int not null, a int not null, b int not null) partition by list (c);
create table part_abc_p1 (a int not null, b int not null, c int not null);
alter table part_abc attach partition part_bac for values in(1);
alter table part_bac attach partition part_cab for values in(2);
alter table part_cab attach partition part_abc_p1 for values in(3);
prepare part_abc_q1 (int, int, int) as
select * from part_abc where a = $1 and b = $2 and c = $3;
-- Execute query 5 times to allow choose_custom_plan
-- to start considering a generic plan.
execute part_abc_q1 (1, 2, 3);
a | b | c
---+---+---
(0 rows)
execute part_abc_q1 (1, 2, 3);
a | b | c
---+---+---
(0 rows)
execute part_abc_q1 (1, 2, 3);
a | b | c
---+---+---
(0 rows)
execute part_abc_q1 (1, 2, 3);
a | b | c
---+---+---
(0 rows)
execute part_abc_q1 (1, 2, 3);
a | b | c
---+---+---
(0 rows)
-- Single partition should be scanned.
explain (analyze, costs off, summary off, timing off) execute part_abc_q1 (1, 2, 3);
QUERY PLAN
-------------------------------------------------------
Append (actual rows=0 loops=1)
-> Seq Scan on part_abc_p1 (actual rows=0 loops=1)
Filter: ((a = $1) AND (b = $2) AND (c = $3))
(3 rows)
deallocate part_abc_q1;
drop table part_abc;
-- Ensure that an Append node properly handles a sub-partitioned table
-- matching without any of its leaf partitions matching the clause.
create table listp (a int, b int) partition by list (a);
create table listp_1 partition of listp for values in(1) partition by list (b);
create table listp_1_1 partition of listp_1 for values in(1);
create table listp_2 partition of listp for values in(2) partition by list (b);
create table listp_2_1 partition of listp_2 for values in(2);
select * from listp where b = 1;
a | b
---+---
(0 rows)
-- Ensure that an Append node can properly handle selection of all first level
-- partitions before finally detecting the correct set of 2nd level partitions
-- which match the given parameter.
prepare q1 (int,int) as select * from listp where b in ($1,$2);
execute q1 (1,2);
a | b
---+---
(0 rows)
execute q1 (1,2);
a | b
---+---
(0 rows)
execute q1 (1,2);
a | b
---+---
(0 rows)
execute q1 (1,2);
a | b
---+---
(0 rows)
execute q1 (1,2);
a | b
---+---
(0 rows)
explain (analyze, costs off, summary off, timing off) execute q1 (1,1);
QUERY PLAN
-----------------------------------------------------
Append (actual rows=0 loops=1)
Subplans Removed: 1
-> Seq Scan on listp_1_1 (actual rows=0 loops=1)
Filter: (b = ANY (ARRAY[$1, $2]))
(4 rows)
explain (analyze, costs off, summary off, timing off) execute q1 (2,2);
QUERY PLAN
-----------------------------------------------------
Append (actual rows=0 loops=1)
Subplans Removed: 1
-> Seq Scan on listp_2_1 (actual rows=0 loops=1)
Filter: (b = ANY (ARRAY[$1, $2]))
(4 rows)
-- Try with no matching partitions. One subplan should remain in this case,
-- but it shouldn't be executed.
explain (analyze, costs off, summary off, timing off) execute q1 (0,0);
QUERY PLAN
----------------------------------------------
Append (actual rows=0 loops=1)
Subplans Removed: 1
-> Seq Scan on listp_1_1 (never executed)
Filter: (b = ANY (ARRAY[$1, $2]))
(4 rows)
deallocate q1;
-- Test more complex cases where a not-equal condition further eliminates partitions.
prepare q1 (int,int,int,int) as select * from listp where b in($1,$2) and $3 <> b and $4 <> b;
execute q1 (1,2,3,4);
a | b
---+---
(0 rows)
execute q1 (1,2,3,4);
a | b
---+---
(0 rows)
execute q1 (1,2,3,4);
a | b
---+---
(0 rows)
execute q1 (1,2,3,4);
a | b
---+---
(0 rows)
execute q1 (1,2,3,4);
a | b
---+---
(0 rows)
-- Both partitions allowed by IN clause, but one disallowed by <> clause
explain (analyze, costs off, summary off, timing off) execute q1 (1,2,2,0);
QUERY PLAN
-------------------------------------------------------------------------
Append (actual rows=0 loops=1)
Subplans Removed: 1
-> Seq Scan on listp_1_1 (actual rows=0 loops=1)
Filter: ((b = ANY (ARRAY[$1, $2])) AND ($3 <> b) AND ($4 <> b))
(4 rows)
-- Both partitions allowed by IN clause, then both excluded again by <> clauses.
-- One subplan will remain in this case, but it should not be executed.
explain (analyze, costs off, summary off, timing off) execute q1 (1,2,2,1);
QUERY PLAN
-------------------------------------------------------------------------
Append (actual rows=0 loops=1)
Subplans Removed: 1
-> Seq Scan on listp_1_1 (never executed)
Filter: ((b = ANY (ARRAY[$1, $2])) AND ($3 <> b) AND ($4 <> b))
(4 rows)
drop table listp;
-- Ensure runtime pruning works with initplan params of boolean type
create table boolvalues (value bool not null);
insert into boolvalues values('t'),('f');
create table boolp (a bool) partition by list (a);
create table boolp_t partition of boolp for values in('t');
create table boolp_f partition of boolp for values in('f');
explain (analyze, costs off, summary off, timing off)
select * from boolp where a = (select value from boolvalues where value);
QUERY PLAN
--------------------------------------------------------
Append (actual rows=0 loops=1)
InitPlan 1 (returns $0)
-> Seq Scan on boolvalues (actual rows=1 loops=1)
Filter: value
Rows Removed by Filter: 1
-> Seq Scan on boolp_f (never executed)
Filter: (a = $0)
-> Seq Scan on boolp_t (actual rows=0 loops=1)
Filter: (a = $0)
(9 rows)
explain (analyze, costs off, summary off, timing off)
select * from boolp where a = (select value from boolvalues where not value);
QUERY PLAN
--------------------------------------------------------
Append (actual rows=0 loops=1)
InitPlan 1 (returns $0)
-> Seq Scan on boolvalues (actual rows=1 loops=1)
Filter: (NOT value)
Rows Removed by Filter: 1
-> Seq Scan on boolp_f (actual rows=0 loops=1)
Filter: (a = $0)
-> Seq Scan on boolp_t (never executed)
Filter: (a = $0)
(9 rows)
drop table boolp;
@@ -237,3 +237,347 @@ create table rparted_by_int2_maxvalue partition of rparted_by_int2 for values fr
explain (costs off) select * from rparted_by_int2 where a > 100000000000000;
drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart, rp, coll_pruning_multi, like_op_noprune, lparted_by_int2, rparted_by_int2;
--
-- Test runtime partition pruning
--
create table ab (a int not null, b int not null) partition by list (a);
create table ab_a2 partition of ab for values in(2) partition by list (b);
create table ab_a2_b1 partition of ab_a2 for values in (1);
create table ab_a2_b2 partition of ab_a2 for values in (2);
create table ab_a2_b3 partition of ab_a2 for values in (3);
create table ab_a1 partition of ab for values in(1) partition by list (b);
create table ab_a1_b1 partition of ab_a1 for values in (1);
create table ab_a1_b2 partition of ab_a1 for values in (2);
create table ab_a1_b3 partition of ab_a1 for values in (3);
create table ab_a3 partition of ab for values in(3) partition by list (b);
create table ab_a3_b1 partition of ab_a3 for values in (1);
create table ab_a3_b2 partition of ab_a3 for values in (2);
create table ab_a3_b3 partition of ab_a3 for values in (3);
prepare ab_q1 (int, int, int) as
select * from ab where a between $1 and $2 and b <= $3;
-- Execute query 5 times to allow choose_custom_plan
-- to start considering a generic plan.
execute ab_q1 (1, 8, 3);
execute ab_q1 (1, 8, 3);
execute ab_q1 (1, 8, 3);
execute ab_q1 (1, 8, 3);
execute ab_q1 (1, 8, 3);
explain (analyze, costs off, summary off, timing off) execute ab_q1 (2, 2, 3);
explain (analyze, costs off, summary off, timing off) execute ab_q1 (1, 2, 3);
deallocate ab_q1;
-- Runtime pruning after optimizer pruning
prepare ab_q1 (int, int) as
select a from ab where a between $1 and $2 and b < 3;
-- Execute query 5 times to allow choose_custom_plan
-- to start considering a generic plan.
execute ab_q1 (1, 8);
execute ab_q1 (1, 8);
execute ab_q1 (1, 8);
execute ab_q1 (1, 8);
execute ab_q1 (1, 8);
explain (analyze, costs off, summary off, timing off) execute ab_q1 (2, 2);
explain (analyze, costs off, summary off, timing off) execute ab_q1 (2, 4);
-- Ensure a mix of external and exec params work together at different
-- levels of partitioning.
prepare ab_q2 (int, int) as
select a from ab where a between $1 and $2 and b < (select 3);
execute ab_q2 (1, 8);
execute ab_q2 (1, 8);
execute ab_q2 (1, 8);
execute ab_q2 (1, 8);
execute ab_q2 (1, 8);
explain (analyze, costs off, summary off, timing off) execute ab_q2 (2, 2);
-- As above, but with swap the exec param to the first partition level
prepare ab_q3 (int, int) as
select a from ab where b between $1 and $2 and a < (select 3);
execute ab_q3 (1, 8);
execute ab_q3 (1, 8);
execute ab_q3 (1, 8);
execute ab_q3 (1, 8);
execute ab_q3 (1, 8);
explain (analyze, costs off, summary off, timing off) execute ab_q3 (2, 2);
-- Parallel append
prepare ab_q4 (int, int) as
select avg(a) from ab where a between $1 and $2 and b < 4;
-- Encourage use of parallel plans
set parallel_setup_cost = 0;
set parallel_tuple_cost = 0;
set min_parallel_table_scan_size = 0;
set max_parallel_workers_per_gather = 2;
-- Execute query 5 times to allow choose_custom_plan
-- to start considering a generic plan.
execute ab_q4 (1, 8);
execute ab_q4 (1, 8);
execute ab_q4 (1, 8);
execute ab_q4 (1, 8);
execute ab_q4 (1, 8);
explain (analyze, costs off, summary off, timing off) execute ab_q4 (2, 2);
-- Test run-time pruning with IN lists.
prepare ab_q5 (int, int, int) as
select avg(a) from ab where a in($1,$2,$3) and b < 4;
-- Execute query 5 times to allow choose_custom_plan
-- to start considering a generic plan.
execute ab_q5 (1, 2, 3);
execute ab_q5 (1, 2, 3);
execute ab_q5 (1, 2, 3);
execute ab_q5 (1, 2, 3);
execute ab_q5 (1, 2, 3);
explain (analyze, costs off, summary off, timing off) execute ab_q5 (1, 1, 1);
explain (analyze, costs off, summary off, timing off) execute ab_q5 (2, 3, 3);
-- Try some params whose values do not belong to any partition.
-- We'll still get a single subplan in this case, but it should not be scanned.
explain (analyze, costs off, summary off, timing off) execute ab_q5 (33, 44, 55);
-- Test parallel Append with IN list and parameterized nested loops
create table lprt_a (a int not null);
-- Insert some values we won't find in ab
insert into lprt_a select 0 from generate_series(1,100);
-- and insert some values that we should find.
insert into lprt_a values(1),(1);
analyze lprt_a;
create index ab_a2_b1_a_idx on ab_a2_b1 (a);
create index ab_a2_b2_a_idx on ab_a2_b2 (a);
create index ab_a2_b3_a_idx on ab_a2_b3 (a);
create index ab_a1_b1_a_idx on ab_a1_b1 (a);
create index ab_a1_b2_a_idx on ab_a1_b2 (a);
create index ab_a1_b3_a_idx on ab_a1_b3 (a);
create index ab_a3_b1_a_idx on ab_a3_b1 (a);
create index ab_a3_b2_a_idx on ab_a3_b2 (a);
create index ab_a3_b3_a_idx on ab_a3_b3 (a);
set enable_hashjoin = 0;
set enable_mergejoin = 0;
prepare ab_q6 (int, int, int) as
select avg(ab.a) from ab inner join lprt_a a on ab.a = a.a where a.a in($1,$2,$3);
execute ab_q6 (1, 2, 3);
execute ab_q6 (1, 2, 3);
execute ab_q6 (1, 2, 3);
execute ab_q6 (1, 2, 3);
execute ab_q6 (1, 2, 3);
explain (analyze, costs off, summary off, timing off) execute ab_q6 (0, 0, 1);
insert into lprt_a values(3),(3);
explain (analyze, costs off, summary off, timing off) execute ab_q6 (1, 0, 3);
explain (analyze, costs off, summary off, timing off) execute ab_q6 (1, 0, 0);
delete from lprt_a where a = 1;
explain (analyze, costs off, summary off, timing off) execute ab_q6 (1, 0, 0);
reset enable_hashjoin;
reset enable_mergejoin;
reset parallel_setup_cost;
reset parallel_tuple_cost;
reset min_parallel_table_scan_size;
reset max_parallel_workers_per_gather;
-- Test run-time partition pruning with an initplan
explain (analyze, costs off, summary off, timing off)
select * from ab where a = (select max(a) from lprt_a) and b = (select max(a)-1 from lprt_a);
deallocate ab_q1;
deallocate ab_q2;
deallocate ab_q3;
deallocate ab_q4;
deallocate ab_q5;
deallocate ab_q6;
drop table ab, lprt_a;
-- Join
create table tbl1(col1 int);
insert into tbl1 values (501), (505);
-- Basic table
create table tprt (col1 int) partition by range (col1);
create table tprt_1 partition of tprt for values from (1) to (501);
create table tprt_2 partition of tprt for values from (501) to (1001);
create table tprt_3 partition of tprt for values from (1001) to (2001);
create table tprt_4 partition of tprt for values from (2001) to (3001);
create table tprt_5 partition of tprt for values from (3001) to (4001);
create table tprt_6 partition of tprt for values from (4001) to (5001);
create index tprt1_idx on tprt_1 (col1);
create index tprt2_idx on tprt_2 (col1);
create index tprt3_idx on tprt_3 (col1);
create index tprt4_idx on tprt_4 (col1);
create index tprt5_idx on tprt_5 (col1);
create index tprt6_idx on tprt_6 (col1);
insert into tprt values (10), (20), (501), (502), (505), (1001), (4500);
set enable_hashjoin = off;
set enable_mergejoin = off;
explain (analyze, costs off, summary off, timing off)
select * from tbl1 join tprt on tbl1.col1 > tprt.col1;
explain (analyze, costs off, summary off, timing off)
select * from tbl1 join tprt on tbl1.col1 = tprt.col1;
select tbl1.col1, tprt.col1 from tbl1
inner join tprt on tbl1.col1 > tprt.col1
order by tbl1.col1, tprt.col1;
select tbl1.col1, tprt.col1 from tbl1
inner join tprt on tbl1.col1 = tprt.col1
order by tbl1.col1, tprt.col1;
-- Multiple partitions
insert into tbl1 values (1001), (1010), (1011);
explain (analyze, costs off, summary off, timing off)
select * from tbl1 inner join tprt on tbl1.col1 > tprt.col1;
explain (analyze, costs off, summary off, timing off)
select * from tbl1 inner join tprt on tbl1.col1 = tprt.col1;
select tbl1.col1, tprt.col1 from tbl1
inner join tprt on tbl1.col1 > tprt.col1
order by tbl1.col1, tprt.col1;
select tbl1.col1, tprt.col1 from tbl1
inner join tprt on tbl1.col1 = tprt.col1
order by tbl1.col1, tprt.col1;
-- Last partition
delete from tbl1;
insert into tbl1 values (4400);
explain (analyze, costs off, summary off, timing off)
select * from tbl1 join tprt on tbl1.col1 < tprt.col1;
select tbl1.col1, tprt.col1 from tbl1
inner join tprt on tbl1.col1 < tprt.col1
order by tbl1.col1, tprt.col1;
-- No matching partition
delete from tbl1;
insert into tbl1 values (10000);
explain (analyze, costs off, summary off, timing off)
select * from tbl1 join tprt on tbl1.col1 = tprt.col1;
select tbl1.col1, tprt.col1 from tbl1
inner join tprt on tbl1.col1 = tprt.col1
order by tbl1.col1, tprt.col1;
drop table tbl1, tprt;
-- Test with columns defined in varying orders between each level
create table part_abc (a int not null, b int not null, c int not null) partition by list (a);
create table part_bac (b int not null, a int not null, c int not null) partition by list (b);
create table part_cab (c int not null, a int not null, b int not null) partition by list (c);
create table part_abc_p1 (a int not null, b int not null, c int not null);
alter table part_abc attach partition part_bac for values in(1);
alter table part_bac attach partition part_cab for values in(2);
alter table part_cab attach partition part_abc_p1 for values in(3);
prepare part_abc_q1 (int, int, int) as
select * from part_abc where a = $1 and b = $2 and c = $3;
-- Execute query 5 times to allow choose_custom_plan
-- to start considering a generic plan.
execute part_abc_q1 (1, 2, 3);
execute part_abc_q1 (1, 2, 3);
execute part_abc_q1 (1, 2, 3);
execute part_abc_q1 (1, 2, 3);
execute part_abc_q1 (1, 2, 3);
-- Single partition should be scanned.
explain (analyze, costs off, summary off, timing off) execute part_abc_q1 (1, 2, 3);
deallocate part_abc_q1;
drop table part_abc;
-- Ensure that an Append node properly handles a sub-partitioned table
-- matching without any of its leaf partitions matching the clause.
create table listp (a int, b int) partition by list (a);
create table listp_1 partition of listp for values in(1) partition by list (b);
create table listp_1_1 partition of listp_1 for values in(1);
create table listp_2 partition of listp for values in(2) partition by list (b);
create table listp_2_1 partition of listp_2 for values in(2);
select * from listp where b = 1;
-- Ensure that an Append node properly can handle selection of all first level
-- partitions before finally detecting the correct set of 2nd level partitions
-- which match the given parameter.
prepare q1 (int,int) as select * from listp where b in ($1,$2);
execute q1 (1,2);
execute q1 (1,2);
execute q1 (1,2);
execute q1 (1,2);
execute q1 (1,2);
explain (analyze, costs off, summary off, timing off) execute q1 (1,1);
explain (analyze, costs off, summary off, timing off) execute q1 (2,2);
-- Try with no matching partitions. One subplan should remain in this case,
-- but it shouldn't be executed.
explain (analyze, costs off, summary off, timing off) execute q1 (0,0);
deallocate q1;
-- Test more complex cases where a not-equal condition further eliminates partitions.
prepare q1 (int,int,int,int) as select * from listp where b in($1,$2) and $3 <> b and $4 <> b;
execute q1 (1,2,3,4);
execute q1 (1,2,3,4);
execute q1 (1,2,3,4);
execute q1 (1,2,3,4);
execute q1 (1,2,3,4);
-- Both partitions allowed by IN clause, but one disallowed by <> clause
explain (analyze, costs off, summary off, timing off) execute q1 (1,2,2,0);
-- Both partitions allowed by IN clause, then both excluded again by <> clauses.
-- One subplan will remain in this case, but it should not be executed.
explain (analyze, costs off, summary off, timing off) execute q1 (1,2,2,1);
drop table listp;
-- Ensure runtime pruning works with initplans params with boolean types
create table boolvalues (value bool not null);
insert into boolvalues values('t'),('f');
create table boolp (a bool) partition by list (a);
create table boolp_t partition of boolp for values in('t');
create table boolp_f partition of boolp for values in('f');
explain (analyze, costs off, summary off, timing off)
select * from boolp where a = (select value from boolvalues where value);
explain (analyze, costs off, summary off, timing off)
select * from boolp where a = (select value from boolvalues where not value);
drop table boolp;