Commit 2f178441 authored by Robert Haas's avatar Robert Haas

Allow UPDATE to move rows between partitions.

When an UPDATE causes a row to no longer match the partition
constraint, try to move it to a different partition where it does
match the partition constraint.  In essence, the UPDATE is split into
a DELETE from the old partition and an INSERT into the new one.  This
can lead to surprising behavior in concurrency scenarios because
EvalPlanQual rechecks won't work as they normally did; the known
problems are documented.  (There is a pending patch to improve the
situation further, but it needs more review.)

Amit Khandekar, reviewed and tested by Amit Langote, David Rowley,
Rajkumar Raghuwanshi, Dilip Kumar, Amul Sul, Thomas Munro, Álvaro
Herrera, Amit Kapila, and me.  A few final revisions by me.

Discussion: http://postgr.es/m/CAJ3gD9do9o2ccQ7j7+tSgiE1REY65XRiMb=yJO3u3QhyP8EEPQ@mail.gmail.com
parent 7f17fd6f
......@@ -178,6 +178,7 @@ SELECT tableoid::regclass, * FROM p1;
SELECT tableoid::regclass, * FROM p2;
INSERT INTO pt VALUES (1, 'xyzzy'); -- ERROR
INSERT INTO pt VALUES (2, 'xyzzy');
UPDATE pt set a = 1 where a = 2; -- ERROR
SELECT tableoid::regclass, * FROM pt;
SELECT tableoid::regclass, * FROM p1;
SELECT tableoid::regclass, * FROM p2;
......
......@@ -344,6 +344,8 @@ SELECT tableoid::regclass, * FROM p2;
INSERT INTO pt VALUES (1, 'xyzzy'); -- ERROR
ERROR: cannot route inserted tuples to a foreign table
INSERT INTO pt VALUES (2, 'xyzzy');
UPDATE pt set a = 1 where a = 2; -- ERROR
ERROR: cannot route inserted tuples to a foreign table
SELECT tableoid::regclass, * FROM pt;
tableoid | a | b
----------+---+-------
......
......@@ -3005,6 +3005,11 @@ VALUES ('Albany', NULL, NULL, 'NY');
foreign table partitions.
</para>
<para>
Updating the partition key of a row might cause it to be moved into a
different partition where this row satisfies its partition constraint.
</para>
<sect3 id="ddl-partitioning-declarative-example">
<title>Example</title>
......@@ -3302,9 +3307,22 @@ ALTER TABLE measurement ATTACH PARTITION measurement_y2008m02
<listitem>
<para>
An <command>UPDATE</command> that causes a row to move from one partition to
another fails, because the new value of the row fails to satisfy the
implicit partition constraint of the original partition.
When an <command>UPDATE</command> causes a row to move from one
partition to another, there is a chance that another concurrent
<command>UPDATE</command> or <command>DELETE</command> misses this row.
Suppose session 1 is performing an <command>UPDATE</command> on a
partition key, and meanwhile a concurrent session 2 for which this row
is visible performs an <command>UPDATE</command> or
<command>DELETE</command> operation on this row. Session 2 can silently
miss the row if the row is deleted from the partition due to session
1's activity. In such case, session 2's
<command>UPDATE</command> or <command>DELETE</command>, being unaware of
the row movement thinks that the row has just been deleted and concludes
that there is nothing to be done for this row. In the usual case where
the table is not partitioned, or where there is no row movement,
session 2 would have identified the newly updated row and carried out
the <command>UPDATE</command>/<command>DELETE</command> on this new row
version.
</para>
</listitem>
......
......@@ -282,10 +282,15 @@ UPDATE <replaceable class="parameter">count</replaceable>
<para>
In the case of a partitioned table, updating a row might cause it to no
longer satisfy the partition constraint. Since there is no provision to
move the row to the partition appropriate to the new value of its
partitioning key, an error will occur in this case. This can also happen
when updating a partition directly.
longer satisfy the partition constraint of the containing partition. In that
case, if there is some other partition in the partition tree for which this
row satisfies its partition constraint, then the row is moved to that
partition. If there is no such partition, an error will occur. Behind the
scenes, the row movement is actually a <command>DELETE</command> and
<command>INSERT</command> operation. However, there is a possibility that a
concurrent <command>UPDATE</command> or <command>DELETE</command> on the
same row may miss this row. For details see the section
<xref linkend="ddl-partitioning-declarative-limitations"/>.
</para>
</refsect1>
......
......@@ -153,6 +153,29 @@
triggers.
</para>
<para>
If an <command>UPDATE</command> on a partitioned table causes a row to move
to another partition, it will be performed as a <command>DELETE</command>
from the original partition followed by an <command>INSERT</command> into
the new partition. In this case, all row-level <literal>BEFORE</literal>
<command>UPDATE</command> triggers and all row-level
<literal>BEFORE</literal> <command>DELETE</command> triggers are fired on
the original partition. Then all row-level <literal>BEFORE</literal>
<command>INSERT</command> triggers are fired on the destination partition.
The possibility of surprising outcomes should be considered when all these
triggers affect the row being moved. As far as <literal>AFTER ROW</literal>
triggers are concerned, <literal>AFTER</literal> <command>DELETE</command>
and <literal>AFTER</literal> <command>INSERT</command> triggers are
applied; but <literal>AFTER</literal> <command>UPDATE</command> triggers
are not applied because the <command>UPDATE</command> has been converted to
a <command>DELETE</command> and an <command>INSERT</command>. As far as
statement-level triggers are concerned, none of the
<command>DELETE</command> or <command>INSERT</command> triggers are fired,
even if row movement occurs; only the <command>UPDATE</command> triggers
defined on the target table used in the <command>UPDATE</command> statement
will be fired.
</para>
<para>
Trigger functions invoked by per-statement triggers should always
return <symbol>NULL</symbol>. Trigger functions invoked by per-row
......
......@@ -170,7 +170,6 @@ typedef struct CopyStateData
PartitionTupleRouting *partition_tuple_routing;
TransitionCaptureState *transition_capture;
TupleConversionMap **transition_tupconv_maps;
/*
* These variables are used to reduce overhead in textual COPY FROM.
......@@ -2481,19 +2480,7 @@ CopyFrom(CopyState cstate)
* tuple).
*/
if (cstate->transition_capture != NULL)
{
int i;
cstate->transition_tupconv_maps = (TupleConversionMap **)
palloc0(sizeof(TupleConversionMap *) * proute->num_partitions);
for (i = 0; i < proute->num_partitions; ++i)
{
cstate->transition_tupconv_maps[i] =
convert_tuples_by_name(RelationGetDescr(proute->partitions[i]->ri_RelationDesc),
RelationGetDescr(cstate->rel),
gettext_noop("could not convert row type"));
}
}
ExecSetupChildParentMapForLeaf(proute);
}
/*
......@@ -2587,7 +2574,6 @@ CopyFrom(CopyState cstate)
if (cstate->partition_tuple_routing)
{
int leaf_part_index;
TupleConversionMap *map;
PartitionTupleRouting *proute = cstate->partition_tuple_routing;
/*
......@@ -2651,7 +2637,8 @@ CopyFrom(CopyState cstate)
*/
cstate->transition_capture->tcs_original_insert_tuple = NULL;
cstate->transition_capture->tcs_map =
cstate->transition_tupconv_maps[leaf_part_index];
TupConvMapForLeaf(proute, saved_resultRelInfo,
leaf_part_index);
}
else
{
......@@ -2668,23 +2655,10 @@ CopyFrom(CopyState cstate)
* We might need to convert from the parent rowtype to the
* partition rowtype.
*/
map = proute->partition_tupconv_maps[leaf_part_index];
if (map)
{
Relation partrel = resultRelInfo->ri_RelationDesc;
tuple = do_convert_tuple(tuple, map);
/*
* We must use the partition's tuple descriptor from this
* point on. Use a dedicated slot from this point on until
* we're finished dealing with the partition.
*/
slot = proute->partition_tuple_slot;
Assert(slot != NULL);
ExecSetSlotDescriptor(slot, RelationGetDescr(partrel));
ExecStoreTuple(tuple, slot, InvalidBuffer, true);
}
tuple = ConvertPartitionTupleSlot(proute->parent_child_tupconv_maps[leaf_part_index],
tuple,
proute->partition_tuple_slot,
&slot);
tuple->t_tableOid = RelationGetRelid(resultRelInfo->ri_RelationDesc);
}
......
......@@ -2854,8 +2854,13 @@ ExecARUpdateTriggers(EState *estate, ResultRelInfo *relinfo,
{
HeapTuple trigtuple;
Assert(HeapTupleIsValid(fdw_trigtuple) ^ ItemPointerIsValid(tupleid));
if (fdw_trigtuple == NULL)
/*
* Note: if the UPDATE is converted into a DELETE+INSERT as part of
* update-partition-key operation, then this function is also called
* separately for DELETE and INSERT to capture transition table rows.
* In such case, either old tuple or new tuple can be NULL.
*/
if (fdw_trigtuple == NULL && ItemPointerIsValid(tupleid))
trigtuple = GetTupleForTrigger(estate,
NULL,
relinfo,
......@@ -5414,7 +5419,12 @@ AfterTriggerPendingOnRel(Oid relid)
* triggers actually need to be queued. It is also called after each row,
* even if there are no triggers for that event, if there are any AFTER
* STATEMENT triggers for the statement which use transition tables, so that
* the transition tuplestores can be built.
* the transition tuplestores can be built. Furthermore, if the transition
* capture is happening for UPDATEd rows being moved to another partition due
* to the partition-key being changed, then this function is called once when
* the row is deleted (to capture OLD row), and once when the row is inserted
* into another partition (to capture NEW row). This is done separately because
* DELETE and INSERT happen on different tables.
*
* Transition tuplestores are built now, rather than when events are pulled
* off of the queue because AFTER ROW triggers are allowed to select from the
......@@ -5463,12 +5473,25 @@ AfterTriggerSaveEvent(EState *estate, ResultRelInfo *relinfo,
bool update_new_table = transition_capture->tcs_update_new_table;
bool insert_new_table = transition_capture->tcs_insert_new_table;;
if ((event == TRIGGER_EVENT_DELETE && delete_old_table) ||
(event == TRIGGER_EVENT_UPDATE && update_old_table))
/*
* For INSERT events newtup should be non-NULL, for DELETE events
* oldtup should be non-NULL, whereas for UPDATE events normally both
* oldtup and newtup are non-NULL. But for UPDATE events fired for
* capturing transition tuples during UPDATE partition-key row
* movement, oldtup is NULL when the event is for a row being inserted,
* whereas newtup is NULL when the event is for a row being deleted.
*/
Assert(!(event == TRIGGER_EVENT_DELETE && delete_old_table &&
oldtup == NULL));
Assert(!(event == TRIGGER_EVENT_INSERT && insert_new_table &&
newtup == NULL));
if (oldtup != NULL &&
((event == TRIGGER_EVENT_DELETE && delete_old_table) ||
(event == TRIGGER_EVENT_UPDATE && update_old_table)))
{
Tuplestorestate *old_tuplestore;
Assert(oldtup != NULL);
old_tuplestore = transition_capture->tcs_private->old_tuplestore;
if (map != NULL)
......@@ -5481,12 +5504,12 @@ AfterTriggerSaveEvent(EState *estate, ResultRelInfo *relinfo,
else
tuplestore_puttuple(old_tuplestore, oldtup);
}
if ((event == TRIGGER_EVENT_INSERT && insert_new_table) ||
(event == TRIGGER_EVENT_UPDATE && update_new_table))
if (newtup != NULL &&
((event == TRIGGER_EVENT_INSERT && insert_new_table) ||
(event == TRIGGER_EVENT_UPDATE && update_new_table)))
{
Tuplestorestate *new_tuplestore;
Assert(newtup != NULL);
new_tuplestore = transition_capture->tcs_private->new_tuplestore;
if (original_insert_tuple != NULL)
......@@ -5502,11 +5525,18 @@ AfterTriggerSaveEvent(EState *estate, ResultRelInfo *relinfo,
tuplestore_puttuple(new_tuplestore, newtup);
}
/* If transition tables are the only reason we're here, return. */
/*
* If transition tables are the only reason we're here, return. As
* mentioned above, we can also be here during update tuple routing in
* presence of transition tables, in which case this function is called
* separately for oldtup and newtup, so we expect exactly one of them
* to be NULL.
*/
if (trigdesc == NULL ||
(event == TRIGGER_EVENT_DELETE && !trigdesc->trig_delete_after_row) ||
(event == TRIGGER_EVENT_INSERT && !trigdesc->trig_insert_after_row) ||
(event == TRIGGER_EVENT_UPDATE && !trigdesc->trig_update_after_row))
(event == TRIGGER_EVENT_UPDATE && !trigdesc->trig_update_after_row) ||
(event == TRIGGER_EVENT_UPDATE && ((oldtup == NULL) ^ (newtup == NULL))))
return;
}
......
This diff is collapsed.
This diff is collapsed.
......@@ -204,6 +204,7 @@ _copyModifyTable(const ModifyTable *from)
COPY_SCALAR_FIELD(canSetTag);
COPY_SCALAR_FIELD(nominalRelation);
COPY_NODE_FIELD(partitioned_rels);
COPY_SCALAR_FIELD(partColsUpdated);
COPY_NODE_FIELD(resultRelations);
COPY_SCALAR_FIELD(resultRelIndex);
COPY_SCALAR_FIELD(rootResultRelIndex);
......@@ -2263,6 +2264,7 @@ _copyPartitionedChildRelInfo(const PartitionedChildRelInfo *from)
COPY_SCALAR_FIELD(parent_relid);
COPY_NODE_FIELD(child_rels);
COPY_SCALAR_FIELD(part_cols_updated);
return newnode;
}
......
......@@ -908,6 +908,7 @@ _equalPartitionedChildRelInfo(const PartitionedChildRelInfo *a, const Partitione
{
COMPARE_SCALAR_FIELD(parent_relid);
COMPARE_NODE_FIELD(child_rels);
COMPARE_SCALAR_FIELD(part_cols_updated);
return true;
}
......
......@@ -372,6 +372,7 @@ _outModifyTable(StringInfo str, const ModifyTable *node)
WRITE_BOOL_FIELD(canSetTag);
WRITE_UINT_FIELD(nominalRelation);
WRITE_NODE_FIELD(partitioned_rels);
WRITE_BOOL_FIELD(partColsUpdated);
WRITE_NODE_FIELD(resultRelations);
WRITE_INT_FIELD(resultRelIndex);
WRITE_INT_FIELD(rootResultRelIndex);
......@@ -2105,6 +2106,7 @@ _outModifyTablePath(StringInfo str, const ModifyTablePath *node)
WRITE_BOOL_FIELD(canSetTag);
WRITE_UINT_FIELD(nominalRelation);
WRITE_NODE_FIELD(partitioned_rels);
WRITE_BOOL_FIELD(partColsUpdated);
WRITE_NODE_FIELD(resultRelations);
WRITE_NODE_FIELD(subpaths);
WRITE_NODE_FIELD(subroots);
......@@ -2527,6 +2529,7 @@ _outPartitionedChildRelInfo(StringInfo str, const PartitionedChildRelInfo *node)
WRITE_UINT_FIELD(parent_relid);
WRITE_NODE_FIELD(child_rels);
WRITE_BOOL_FIELD(part_cols_updated);
}
static void
......
......@@ -1568,6 +1568,7 @@ _readModifyTable(void)
READ_BOOL_FIELD(canSetTag);
READ_UINT_FIELD(nominalRelation);
READ_NODE_FIELD(partitioned_rels);
READ_BOOL_FIELD(partColsUpdated);
READ_NODE_FIELD(resultRelations);
READ_INT_FIELD(resultRelIndex);
READ_INT_FIELD(rootResultRelIndex);
......
......@@ -1364,7 +1364,7 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
case RTE_RELATION:
if (rte->relkind == RELKIND_PARTITIONED_TABLE)
partitioned_rels =
get_partitioned_child_rels(root, rel->relid);
get_partitioned_child_rels(root, rel->relid, NULL);
break;
case RTE_SUBQUERY:
build_partitioned_rels = true;
......@@ -1403,7 +1403,7 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
{
List *cprels;
cprels = get_partitioned_child_rels(root, childrel->relid);
cprels = get_partitioned_child_rels(root, childrel->relid, NULL);
partitioned_rels = list_concat(partitioned_rels,
list_copy(cprels));
}
......
......@@ -279,6 +279,7 @@ static ProjectSet *make_project_set(List *tlist, Plan *subplan);
static ModifyTable *make_modifytable(PlannerInfo *root,
CmdType operation, bool canSetTag,
Index nominalRelation, List *partitioned_rels,
bool partColsUpdated,
List *resultRelations, List *subplans,
List *withCheckOptionLists, List *returningLists,
List *rowMarks, OnConflictExpr *onconflict, int epqParam);
......@@ -2373,6 +2374,7 @@ create_modifytable_plan(PlannerInfo *root, ModifyTablePath *best_path)
best_path->canSetTag,
best_path->nominalRelation,
best_path->partitioned_rels,
best_path->partColsUpdated,
best_path->resultRelations,
subplans,
best_path->withCheckOptionLists,
......@@ -6442,6 +6444,7 @@ static ModifyTable *
make_modifytable(PlannerInfo *root,
CmdType operation, bool canSetTag,
Index nominalRelation, List *partitioned_rels,
bool partColsUpdated,
List *resultRelations, List *subplans,
List *withCheckOptionLists, List *returningLists,
List *rowMarks, OnConflictExpr *onconflict, int epqParam)
......@@ -6468,6 +6471,7 @@ make_modifytable(PlannerInfo *root,
node->canSetTag = canSetTag;
node->nominalRelation = nominalRelation;
node->partitioned_rels = partitioned_rels;
node->partColsUpdated = partColsUpdated;
node->resultRelations = resultRelations;
node->resultRelIndex = -1; /* will be set correctly in setrefs.c */
node->rootResultRelIndex = -1; /* will be set correctly in setrefs.c */
......
......@@ -1101,6 +1101,7 @@ inheritance_planner(PlannerInfo *root)
Query *parent_parse;
Bitmapset *parent_relids = bms_make_singleton(top_parentRTindex);
PlannerInfo **parent_roots = NULL;
bool partColsUpdated = false;
Assert(parse->commandType != CMD_INSERT);
......@@ -1172,7 +1173,8 @@ inheritance_planner(PlannerInfo *root)
if (parent_rte->relkind == RELKIND_PARTITIONED_TABLE)
{
nominalRelation = top_parentRTindex;
partitioned_rels = get_partitioned_child_rels(root, top_parentRTindex);
partitioned_rels = get_partitioned_child_rels(root, top_parentRTindex,
&partColsUpdated);
/* The root partitioned table is included as a child rel */
Assert(list_length(partitioned_rels) >= 1);
}
......@@ -1512,6 +1514,7 @@ inheritance_planner(PlannerInfo *root)
parse->canSetTag,
nominalRelation,
partitioned_rels,
partColsUpdated,
resultRelations,
subpaths,
subroots,
......@@ -2123,6 +2126,7 @@ grouping_planner(PlannerInfo *root, bool inheritance_update,
parse->canSetTag,
parse->resultRelation,
NIL,
false,
list_make1_int(parse->resultRelation),
list_make1(path),
list_make1(root),
......@@ -6155,17 +6159,24 @@ plan_cluster_use_sort(Oid tableOid, Oid indexOid)
/*
* get_partitioned_child_rels
* Returns a list of the RT indexes of the partitioned child relations
* with rti as the root parent RT index.
* with rti as the root parent RT index. Also sets
* *part_cols_updated to true if any of the root rte's updated
* columns is used in the partition key either of the relation whose RTI
* is specified or of any child relation.
*
* Note: This function might get called even for range table entries that
* are not partitioned tables; in such a case, it will simply return NIL.
*/
List *
get_partitioned_child_rels(PlannerInfo *root, Index rti)
get_partitioned_child_rels(PlannerInfo *root, Index rti,
bool *part_cols_updated)
{
List *result = NIL;
ListCell *l;
if (part_cols_updated)
*part_cols_updated = false;
foreach(l, root->pcinfo_list)
{
PartitionedChildRelInfo *pc = lfirst_node(PartitionedChildRelInfo, l);
......@@ -6173,6 +6184,8 @@ get_partitioned_child_rels(PlannerInfo *root, Index rti)
if (pc->parent_relid == rti)
{
result = pc->child_rels;
if (part_cols_updated)
*part_cols_updated = pc->part_cols_updated;
break;
}
}
......
......@@ -105,7 +105,8 @@ static void expand_partitioned_rtentry(PlannerInfo *root,
RangeTblEntry *parentrte,
Index parentRTindex, Relation parentrel,
PlanRowMark *top_parentrc, LOCKMODE lockmode,
List **appinfos, List **partitioned_child_rels);
List **appinfos, List **partitioned_child_rels,
bool *part_cols_updated);
static void expand_single_inheritance_child(PlannerInfo *root,
RangeTblEntry *parentrte,
Index parentRTindex, Relation parentrel,
......@@ -1461,16 +1462,19 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
if (RelationGetPartitionDesc(oldrelation) != NULL)
{
List *partitioned_child_rels = NIL;
bool part_cols_updated = false;
Assert(rte->relkind == RELKIND_PARTITIONED_TABLE);
/*
* If this table has partitions, recursively expand them in the order
* in which they appear in the PartitionDesc.
* in which they appear in the PartitionDesc. While at it, also
* extract the partition key columns of all the partitioned tables.
*/
expand_partitioned_rtentry(root, rte, rti, oldrelation, oldrc,
lockmode, &root->append_rel_list,
&partitioned_child_rels);
&partitioned_child_rels,
&part_cols_updated);
/*
* We keep a list of objects in root, each of which maps a root
......@@ -1487,6 +1491,7 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
pcinfo = makeNode(PartitionedChildRelInfo);
pcinfo->parent_relid = rti;
pcinfo->child_rels = partitioned_child_rels;
pcinfo->part_cols_updated = part_cols_updated;
root->pcinfo_list = lappend(root->pcinfo_list, pcinfo);
}
}
......@@ -1563,7 +1568,8 @@ static void
expand_partitioned_rtentry(PlannerInfo *root, RangeTblEntry *parentrte,
Index parentRTindex, Relation parentrel,
PlanRowMark *top_parentrc, LOCKMODE lockmode,
List **appinfos, List **partitioned_child_rels)
List **appinfos, List **partitioned_child_rels,
bool *part_cols_updated)
{
int i;
RangeTblEntry *childrte;
......@@ -1578,6 +1584,17 @@ expand_partitioned_rtentry(PlannerInfo *root, RangeTblEntry *parentrte,
Assert(parentrte->inh);
/*
* Note down whether any partition key cols are being updated. Though it's
* the root partitioned table's updatedCols we are interested in, we
* instead use parentrte to get the updatedCols. This is convenient because
* parentrte already has the root partrel's updatedCols translated to match
* the attribute ordering of parentrel.
*/
if (!*part_cols_updated)
*part_cols_updated =
has_partition_attrs(parentrel, parentrte->updatedCols, NULL);
/* First expand the partitioned table itself. */
expand_single_inheritance_child(root, parentrte, parentRTindex, parentrel,
top_parentrc, parentrel,
......@@ -1617,7 +1634,8 @@ expand_partitioned_rtentry(PlannerInfo *root, RangeTblEntry *parentrte,
if (childrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
expand_partitioned_rtentry(root, childrte, childRTindex,
childrel, top_parentrc, lockmode,
appinfos, partitioned_child_rels);
appinfos, partitioned_child_rels,
part_cols_updated);
/* Close child relation, but keep locks */
heap_close(childrel, NoLock);
......
......@@ -3274,6 +3274,8 @@ create_lockrows_path(PlannerInfo *root, RelOptInfo *rel,
* 'partitioned_rels' is an integer list of RT indexes of non-leaf tables in
* the partition tree, if this is an UPDATE/DELETE to a partitioned table.
* Otherwise NIL.
* 'partColsUpdated' is true if any partitioning columns are being updated,
* either from the target relation or a descendent partitioned table.
* 'resultRelations' is an integer list of actual RT indexes of target rel(s)
* 'subpaths' is a list of Path(s) producing source data (one per rel)
* 'subroots' is a list of PlannerInfo structs (one per rel)
......@@ -3287,6 +3289,7 @@ ModifyTablePath *
create_modifytable_path(PlannerInfo *root, RelOptInfo *rel,
CmdType operation, bool canSetTag,
Index nominalRelation, List *partitioned_rels,
bool partColsUpdated,
List *resultRelations, List *subpaths,
List *subroots,
List *withCheckOptionLists, List *returningLists,
......@@ -3354,6 +3357,7 @@ create_modifytable_path(PlannerInfo *root, RelOptInfo *rel,
pathnode->canSetTag = canSetTag;
pathnode->nominalRelation = nominalRelation;
pathnode->partitioned_rels = list_copy(partitioned_rels);
pathnode->partColsUpdated = partColsUpdated;
pathnode->resultRelations = resultRelations;
pathnode->subpaths = subpaths;
pathnode->subroots = subroots;
......
......@@ -62,11 +62,24 @@ typedef struct PartitionDispatchData *PartitionDispatch;
* for every leaf partition in the partition tree.
* num_partitions Number of leaf partitions in the partition tree
* (= 'partitions' array length)
* partition_tupconv_maps Array of TupleConversionMap objects with one
* parent_child_tupconv_maps Array of TupleConversionMap objects with one
* entry for every leaf partition (required to
* convert input tuple based on the root table's
* rowtype to a leaf partition's rowtype after
* tuple routing is done)
* convert tuple from the root table's rowtype to
* a leaf partition's rowtype after tuple routing
* is done)
* child_parent_tupconv_maps Array of TupleConversionMap objects with one
* entry for every leaf partition (required to
* convert an updated tuple from the leaf
* partition's rowtype to the root table's rowtype
* so that tuple routing can be done)
* child_parent_map_not_required Array of bool. True value means that a map is
* determined to be not required for the given
* partition. False means either we haven't yet
* checked if a map is required, or it was
* determined to be required.
* subplan_partition_offsets Integer array ordered by UPDATE subplans. Each
* element of this array has the index into the
* corresponding partition in partitions array.
* partition_tuple_slot TupleTableSlot to be used to manipulate any
* given leaf partition's rowtype after that
* partition is chosen for insertion by
......@@ -79,8 +92,12 @@ typedef struct PartitionTupleRouting
int num_dispatch;
ResultRelInfo **partitions;
int num_partitions;
TupleConversionMap **partition_tupconv_maps;
TupleConversionMap **parent_child_tupconv_maps;
TupleConversionMap **child_parent_tupconv_maps;
bool *child_parent_map_not_required;
int *subplan_partition_offsets;
TupleTableSlot *partition_tuple_slot;
TupleTableSlot *root_tuple_slot;
} PartitionTupleRouting;
extern PartitionTupleRouting *ExecSetupPartitionTupleRouting(ModifyTableState *mtstate,
......@@ -90,6 +107,13 @@ extern int ExecFindPartition(ResultRelInfo *resultRelInfo,
PartitionDispatch *pd,
TupleTableSlot *slot,
EState *estate);
extern void ExecSetupChildParentMapForLeaf(PartitionTupleRouting *proute);
extern TupleConversionMap *TupConvMapForLeaf(PartitionTupleRouting *proute,
ResultRelInfo *rootRelInfo, int leaf_index);
extern HeapTuple ConvertPartitionTupleSlot(TupleConversionMap *map,
HeapTuple tuple,
TupleTableSlot *new_slot,
TupleTableSlot **p_my_slot);
extern void ExecCleanupTupleRouting(PartitionTupleRouting *proute);
#endif /* EXECPARTITION_H */
......@@ -992,8 +992,8 @@ typedef struct ModifyTableState
/* controls transition table population for specified operation */
struct TransitionCaptureState *mt_oc_transition_capture;
/* controls transition table population for INSERT...ON CONFLICT UPDATE */
TupleConversionMap **mt_transition_tupconv_maps;
/* Per plan/partition tuple conversion */
TupleConversionMap **mt_per_subplan_tupconv_maps;
/* Per plan map for tuple conversion from child to root */
} ModifyTableState;
/* ----------------
......
......@@ -219,6 +219,7 @@ typedef struct ModifyTable
Index nominalRelation; /* Parent RT index for use of EXPLAIN */
/* RT indexes of non-leaf tables in a partition tree */
List *partitioned_rels;
bool partColsUpdated; /* some part key in hierarchy updated */
List *resultRelations; /* integer list of RT indexes */
int resultRelIndex; /* index of first resultRel in plan's list */
int rootResultRelIndex; /* index of the partitioned table root */
......
......@@ -1674,6 +1674,7 @@ typedef struct ModifyTablePath
Index nominalRelation; /* Parent RT index for use of EXPLAIN */
/* RT indexes of non-leaf tables in a partition tree */
List *partitioned_rels;
bool partColsUpdated; /* some part key in hierarchy updated */
List *resultRelations; /* integer list of RT indexes */
List *subpaths; /* Path(s) producing source data */
List *subroots; /* per-target-table PlannerInfos */
......@@ -2124,6 +2125,8 @@ typedef struct PartitionedChildRelInfo
Index parent_relid;
List *child_rels;
bool part_cols_updated; /* is the partition key of any of
* the partitioned tables updated? */
} PartitionedChildRelInfo;
/*
......
......@@ -242,6 +242,7 @@ extern ModifyTablePath *create_modifytable_path(PlannerInfo *root,
RelOptInfo *rel,
CmdType operation, bool canSetTag,
Index nominalRelation, List *partitioned_rels,
bool partColsUpdated,
List *resultRelations, List *subpaths,
List *subroots,
List *withCheckOptionLists, List *returningLists,
......
......@@ -57,7 +57,8 @@ extern Expr *preprocess_phv_expression(PlannerInfo *root, Expr *expr);
extern bool plan_cluster_use_sort(Oid tableOid, Oid indexOid);
extern List *get_partitioned_child_rels(PlannerInfo *root, Index rti);
extern List *get_partitioned_child_rels(PlannerInfo *root, Index rti,
bool *part_cols_updated);
extern List *get_partitioned_child_rels_for_join(PlannerInfo *root,
Relids join_relids);
......
This diff is collapsed.
This diff is collapsed.
......@@ -1590,6 +1590,7 @@ PartitionRangeDatum
PartitionRangeDatumKind
PartitionScheme
PartitionSpec
PartitionTupleRouting
PartitionedChildRelInfo
PasswordType
Path
......
Markdown is supported
0% or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment