Commit 8aba9322 authored by Alvaro Herrera's avatar Alvaro Herrera

Fix relcache inconsistency hazard in partition detach

During queries coming from ri_triggers.c, we need to omit partitions
that are marked pending detach -- otherwise, the RI query is tricked
into allowing a row into the referencing table whose corresponding row
is in the detached partition.  Which is bogus: once the detach operation
completes, the row becomes an orphan.

However, the code was not doing that in repeatable-read transactions,
because relcache kept a copy of the partition descriptor that included
the partition, and used it in the RI query.  This commit changes the
partdesc cache code to only keep descriptors that aren't dependent on
a snapshot (namely: those where no detached partition exist, and those
where detached partitions are included).  When a partdesc-without-
detached-partitions is requested, we create one afresh each time; also,
those partdescs are stored in PortalContext instead of
CacheMemoryContext.

find_inheritance_children gets a new output *detached_exist boolean,
which indicates whether any partition marked pending-detach is found.
Its "include_detached" input flag is changed to "omit_detached", because
that name captures desired the semantics more naturally.
CreatePartitionDirectory() and RelationGetPartitionDesc() arguments are
identically renamed.

This was noticed because a buildfarm member that runs with relcache
clobbering, which would not keep the improperly cached partdesc, broke
one test, which led us to realize that the expected output of that test
was bogus.  This commit also corrects that expected output.

Author: Amit Langote <amitlangote09@gmail.com>
Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://postgr.es/m/3269784.1617215412@sss.pgh.pa.us
parent 82b13dbc
...@@ -3840,7 +3840,7 @@ StorePartitionBound(Relation rel, Relation parent, PartitionBoundSpec *bound) ...@@ -3840,7 +3840,7 @@ StorePartitionBound(Relation rel, Relation parent, PartitionBoundSpec *bound)
* removed. * removed.
*/ */
defaultPartOid = defaultPartOid =
get_default_oid_from_partdesc(RelationGetPartitionDesc(parent, false)); get_default_oid_from_partdesc(RelationGetPartitionDesc(parent, true));
if (OidIsValid(defaultPartOid)) if (OidIsValid(defaultPartOid))
CacheInvalidateRelcacheByRelid(defaultPartOid); CacheInvalidateRelcacheByRelid(defaultPartOid);
......
...@@ -52,13 +52,19 @@ typedef struct SeenRelsEntry ...@@ -52,13 +52,19 @@ typedef struct SeenRelsEntry
* then no locks are acquired, but caller must beware of race conditions * then no locks are acquired, but caller must beware of race conditions
* against possible DROPs of child relations. * against possible DROPs of child relations.
* *
* include_detached says to include all partitions, even if they're marked * If a partition's pg_inherits row is marked "detach pending",
* detached. Passing it as false means they might or might not be included, * *detached_exist (if not null) is set true, otherwise it is set false.
* depending on the visibility of the pg_inherits row for the active snapshot. *
* If omit_detached is true and there is an active snapshot (not the same as
* the catalog snapshot used to scan pg_inherits!) and a pg_inherits tuple
* marked "detach pending" is visible to that snapshot, then that partition is
* omitted from the output list. This makes partitions invisible depending on
* whether the transaction that marked those partitions as detached appears
* committed to the active snapshot.
*/ */
List * List *
find_inheritance_children(Oid parentrelId, bool include_detached, find_inheritance_children(Oid parentrelId, bool omit_detached,
LOCKMODE lockmode) LOCKMODE lockmode, bool *detached_exist)
{ {
List *list = NIL; List *list = NIL;
Relation relation; Relation relation;
...@@ -78,6 +84,9 @@ find_inheritance_children(Oid parentrelId, bool include_detached, ...@@ -78,6 +84,9 @@ find_inheritance_children(Oid parentrelId, bool include_detached,
if (!has_subclass(parentrelId)) if (!has_subclass(parentrelId))
return NIL; return NIL;
if (detached_exist)
*detached_exist = false;
/* /*
* Scan pg_inherits and build a working array of subclass OIDs. * Scan pg_inherits and build a working array of subclass OIDs.
*/ */
...@@ -99,29 +108,35 @@ find_inheritance_children(Oid parentrelId, bool include_detached, ...@@ -99,29 +108,35 @@ find_inheritance_children(Oid parentrelId, bool include_detached,
{ {
/* /*
* Cope with partitions concurrently being detached. When we see a * Cope with partitions concurrently being detached. When we see a
* partition marked "detach pending", we only include it in the set of * partition marked "detach pending", we omit it from the returned set
* visible partitions if caller requested all detached partitions, or * of visible partitions if caller requested that and the tuple's xmin
* if its pg_inherits tuple's xmin is still visible to the active * does not appear in progress to the active snapshot. (If there's no
* snapshot. * active snapshot set, that means we're not running a user query, so
* it's OK to always include detached partitions in that case; if the
* xmin is still running to the active snapshot, then the partition
* has not been detached yet and so we include it.)
* *
* The reason for this check is that we want to avoid seeing the * The reason for this hack is that we want to avoid seeing the
* partition as alive in RI queries during REPEATABLE READ or * partition as alive in RI queries during REPEATABLE READ or
* SERIALIZABLE transactions. (If there's no active snapshot set, * SERIALIZABLE transactions: such queries use a different snapshot
* that means we're not running a user query, so it's OK to always * than the one used by regular (user) queries.
* include detached partitions in that case.)
*/ */
if (((Form_pg_inherits) GETSTRUCT(inheritsTuple))->inhdetachpending && if (((Form_pg_inherits) GETSTRUCT(inheritsTuple))->inhdetachpending)
!include_detached &&
ActiveSnapshotSet())
{ {
TransactionId xmin; if (detached_exist)
Snapshot snap; *detached_exist = true;
if (omit_detached && ActiveSnapshotSet())
{
TransactionId xmin;
Snapshot snap;
xmin = HeapTupleHeaderGetXmin(inheritsTuple->t_data); xmin = HeapTupleHeaderGetXmin(inheritsTuple->t_data);
snap = GetActiveSnapshot(); snap = GetActiveSnapshot();
if (!XidInMVCCSnapshot(xmin, snap)) if (!XidInMVCCSnapshot(xmin, snap))
continue; continue;
}
} }
inhrelid = ((Form_pg_inherits) GETSTRUCT(inheritsTuple))->inhrelid; inhrelid = ((Form_pg_inherits) GETSTRUCT(inheritsTuple))->inhrelid;
...@@ -235,8 +250,8 @@ find_all_inheritors(Oid parentrelId, LOCKMODE lockmode, List **numparents) ...@@ -235,8 +250,8 @@ find_all_inheritors(Oid parentrelId, LOCKMODE lockmode, List **numparents)
ListCell *lc; ListCell *lc;
/* Get the direct children of this rel */ /* Get the direct children of this rel */
currentchildren = find_inheritance_children(currentrel, false, currentchildren = find_inheritance_children(currentrel, true,
lockmode); lockmode, NULL);
/* /*
* Add to the queue only those children not already seen. This avoids * Add to the queue only those children not already seen. This avoids
......
...@@ -1123,7 +1123,7 @@ DefineIndex(Oid relationId, ...@@ -1123,7 +1123,7 @@ DefineIndex(Oid relationId,
*/ */
if (partitioned && stmt->relation && !stmt->relation->inh) if (partitioned && stmt->relation && !stmt->relation->inh)
{ {
PartitionDesc pd = RelationGetPartitionDesc(rel, false); PartitionDesc pd = RelationGetPartitionDesc(rel, true);
if (pd->nparts != 0) if (pd->nparts != 0)
flags |= INDEX_CREATE_INVALID; flags |= INDEX_CREATE_INVALID;
...@@ -1180,7 +1180,7 @@ DefineIndex(Oid relationId, ...@@ -1180,7 +1180,7 @@ DefineIndex(Oid relationId,
* *
* If we're called internally (no stmt->relation), recurse always. * If we're called internally (no stmt->relation), recurse always.
*/ */
partdesc = RelationGetPartitionDesc(rel, false); partdesc = RelationGetPartitionDesc(rel, true);
if ((!stmt->relation || stmt->relation->inh) && partdesc->nparts > 0) if ((!stmt->relation || stmt->relation->inh) && partdesc->nparts > 0)
{ {
int nparts = partdesc->nparts; int nparts = partdesc->nparts;
......
...@@ -1041,7 +1041,7 @@ DefineRelation(CreateStmt *stmt, char relkind, Oid ownerId, ...@@ -1041,7 +1041,7 @@ DefineRelation(CreateStmt *stmt, char relkind, Oid ownerId,
*/ */
defaultPartOid = defaultPartOid =
get_default_oid_from_partdesc(RelationGetPartitionDesc(parent, get_default_oid_from_partdesc(RelationGetPartitionDesc(parent,
false)); true));
if (OidIsValid(defaultPartOid)) if (OidIsValid(defaultPartOid))
defaultRel = table_open(defaultPartOid, AccessExclusiveLock); defaultRel = table_open(defaultPartOid, AccessExclusiveLock);
...@@ -3507,7 +3507,7 @@ renameatt_internal(Oid myrelid, ...@@ -3507,7 +3507,7 @@ renameatt_internal(Oid myrelid,
* expected_parents will only be 0 if we are not already recursing. * expected_parents will only be 0 if we are not already recursing.
*/ */
if (expected_parents == 0 && if (expected_parents == 0 &&
find_inheritance_children(myrelid, false, NoLock) != NIL) find_inheritance_children(myrelid, true, NoLock, NULL) != NIL)
ereport(ERROR, ereport(ERROR,
(errcode(ERRCODE_INVALID_TABLE_DEFINITION), (errcode(ERRCODE_INVALID_TABLE_DEFINITION),
errmsg("inherited column \"%s\" must be renamed in child tables too", errmsg("inherited column \"%s\" must be renamed in child tables too",
...@@ -3706,7 +3706,7 @@ rename_constraint_internal(Oid myrelid, ...@@ -3706,7 +3706,7 @@ rename_constraint_internal(Oid myrelid,
else else
{ {
if (expected_parents == 0 && if (expected_parents == 0 &&
find_inheritance_children(myrelid, false, NoLock) != NIL) find_inheritance_children(myrelid, true, NoLock, NULL) != NIL)
ereport(ERROR, ereport(ERROR,
(errcode(ERRCODE_INVALID_TABLE_DEFINITION), (errcode(ERRCODE_INVALID_TABLE_DEFINITION),
errmsg("inherited constraint \"%s\" must be renamed in child tables too", errmsg("inherited constraint \"%s\" must be renamed in child tables too",
...@@ -6580,7 +6580,7 @@ ATExecAddColumn(List **wqueue, AlteredTableInfo *tab, Relation rel, ...@@ -6580,7 +6580,7 @@ ATExecAddColumn(List **wqueue, AlteredTableInfo *tab, Relation rel,
*/ */
if (colDef->identity && if (colDef->identity &&
recurse && recurse &&
find_inheritance_children(myrelid, false, NoLock) != NIL) find_inheritance_children(myrelid, true, NoLock, NULL) != NIL)
ereport(ERROR, ereport(ERROR,
(errcode(ERRCODE_INVALID_TABLE_DEFINITION), (errcode(ERRCODE_INVALID_TABLE_DEFINITION),
errmsg("cannot recursively add identity column to table that has child tables"))); errmsg("cannot recursively add identity column to table that has child tables")));
...@@ -6826,7 +6826,7 @@ ATExecAddColumn(List **wqueue, AlteredTableInfo *tab, Relation rel, ...@@ -6826,7 +6826,7 @@ ATExecAddColumn(List **wqueue, AlteredTableInfo *tab, Relation rel,
* use find_all_inheritors to do it in one pass. * use find_all_inheritors to do it in one pass.
*/ */
children = children =
find_inheritance_children(RelationGetRelid(rel), false, lockmode); find_inheritance_children(RelationGetRelid(rel), true, lockmode, NULL);
/* /*
* If we are told not to recurse, there had better not be any child * If we are told not to recurse, there had better not be any child
...@@ -6980,7 +6980,7 @@ ATPrepDropNotNull(Relation rel, bool recurse, bool recursing) ...@@ -6980,7 +6980,7 @@ ATPrepDropNotNull(Relation rel, bool recurse, bool recursing)
*/ */
if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE) if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
{ {
PartitionDesc partdesc = RelationGetPartitionDesc(rel, false); PartitionDesc partdesc = RelationGetPartitionDesc(rel, true);
Assert(partdesc != NULL); Assert(partdesc != NULL);
if (partdesc->nparts > 0 && !recurse && !recursing) if (partdesc->nparts > 0 && !recurse && !recursing)
...@@ -7689,7 +7689,7 @@ ATPrepDropExpression(Relation rel, AlterTableCmd *cmd, bool recurse, bool recurs ...@@ -7689,7 +7689,7 @@ ATPrepDropExpression(Relation rel, AlterTableCmd *cmd, bool recurse, bool recurs
* resulting state can be properly dumped and restored. * resulting state can be properly dumped and restored.
*/ */
if (!recurse && if (!recurse &&
find_inheritance_children(RelationGetRelid(rel), false, lockmode)) find_inheritance_children(RelationGetRelid(rel), true, lockmode, NULL))
ereport(ERROR, ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED), (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("ALTER TABLE / DROP EXPRESSION must be applied to child tables too"))); errmsg("ALTER TABLE / DROP EXPRESSION must be applied to child tables too")));
...@@ -8297,7 +8297,7 @@ ATExecDropColumn(List **wqueue, Relation rel, const char *colName, ...@@ -8297,7 +8297,7 @@ ATExecDropColumn(List **wqueue, Relation rel, const char *colName,
* use find_all_inheritors to do it in one pass. * use find_all_inheritors to do it in one pass.
*/ */
children = children =
find_inheritance_children(RelationGetRelid(rel), false, lockmode); find_inheritance_children(RelationGetRelid(rel), true, lockmode, NULL);
if (children) if (children)
{ {
...@@ -8785,7 +8785,7 @@ ATAddCheckConstraint(List **wqueue, AlteredTableInfo *tab, Relation rel, ...@@ -8785,7 +8785,7 @@ ATAddCheckConstraint(List **wqueue, AlteredTableInfo *tab, Relation rel,
* use find_all_inheritors to do it in one pass. * use find_all_inheritors to do it in one pass.
*/ */
children = children =
find_inheritance_children(RelationGetRelid(rel), false, lockmode); find_inheritance_children(RelationGetRelid(rel), true, lockmode, NULL);
/* /*
* Check if ONLY was specified with ALTER TABLE. If so, allow the * Check if ONLY was specified with ALTER TABLE. If so, allow the
...@@ -9400,7 +9400,7 @@ addFkRecurseReferenced(List **wqueue, Constraint *fkconstraint, Relation rel, ...@@ -9400,7 +9400,7 @@ addFkRecurseReferenced(List **wqueue, Constraint *fkconstraint, Relation rel,
*/ */
if (pkrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE) if (pkrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
{ {
PartitionDesc pd = RelationGetPartitionDesc(pkrel, false); PartitionDesc pd = RelationGetPartitionDesc(pkrel, true);
for (int i = 0; i < pd->nparts; i++) for (int i = 0; i < pd->nparts; i++)
{ {
...@@ -9534,7 +9534,7 @@ addFkRecurseReferencing(List **wqueue, Constraint *fkconstraint, Relation rel, ...@@ -9534,7 +9534,7 @@ addFkRecurseReferencing(List **wqueue, Constraint *fkconstraint, Relation rel,
} }
else if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE) else if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
{ {
PartitionDesc pd = RelationGetPartitionDesc(rel, false); PartitionDesc pd = RelationGetPartitionDesc(rel, true);
/* /*
* Recurse to take appropriate action on each partition; either we * Recurse to take appropriate action on each partition; either we
...@@ -11318,8 +11318,8 @@ ATExecDropConstraint(Relation rel, const char *constrName, ...@@ -11318,8 +11318,8 @@ ATExecDropConstraint(Relation rel, const char *constrName,
* use find_all_inheritors to do it in one pass. * use find_all_inheritors to do it in one pass.
*/ */
if (!is_no_inherit_constraint) if (!is_no_inherit_constraint)
children = children = find_inheritance_children(RelationGetRelid(rel), true,
find_inheritance_children(RelationGetRelid(rel), false, lockmode); lockmode, NULL);
else else
children = NIL; children = NIL;
...@@ -11703,8 +11703,8 @@ ATPrepAlterColumnType(List **wqueue, ...@@ -11703,8 +11703,8 @@ ATPrepAlterColumnType(List **wqueue,
} }
} }
else if (!recursing && else if (!recursing &&
find_inheritance_children(RelationGetRelid(rel), false, find_inheritance_children(RelationGetRelid(rel), true,
NoLock) != NIL) NoLock, NULL) != NIL)
ereport(ERROR, ereport(ERROR,
(errcode(ERRCODE_INVALID_TABLE_DEFINITION), (errcode(ERRCODE_INVALID_TABLE_DEFINITION),
errmsg("type of inherited column \"%s\" must be changed in child tables too", errmsg("type of inherited column \"%s\" must be changed in child tables too",
...@@ -16875,7 +16875,7 @@ QueuePartitionConstraintValidation(List **wqueue, Relation scanrel, ...@@ -16875,7 +16875,7 @@ QueuePartitionConstraintValidation(List **wqueue, Relation scanrel,
} }
else if (scanrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE) else if (scanrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
{ {
PartitionDesc partdesc = RelationGetPartitionDesc(scanrel, false); PartitionDesc partdesc = RelationGetPartitionDesc(scanrel, true);
int i; int i;
for (i = 0; i < partdesc->nparts; i++) for (i = 0; i < partdesc->nparts; i++)
...@@ -16935,7 +16935,7 @@ ATExecAttachPartition(List **wqueue, Relation rel, PartitionCmd *cmd, ...@@ -16935,7 +16935,7 @@ ATExecAttachPartition(List **wqueue, Relation rel, PartitionCmd *cmd,
* new partition will change its partition constraint. * new partition will change its partition constraint.
*/ */
defaultPartOid = defaultPartOid =
get_default_oid_from_partdesc(RelationGetPartitionDesc(rel, false)); get_default_oid_from_partdesc(RelationGetPartitionDesc(rel, true));
if (OidIsValid(defaultPartOid)) if (OidIsValid(defaultPartOid))
LockRelationOid(defaultPartOid, AccessExclusiveLock); LockRelationOid(defaultPartOid, AccessExclusiveLock);
...@@ -17551,7 +17551,7 @@ ATExecDetachPartition(List **wqueue, AlteredTableInfo *tab, Relation rel, ...@@ -17551,7 +17551,7 @@ ATExecDetachPartition(List **wqueue, AlteredTableInfo *tab, Relation rel,
* will change its partition constraint. * will change its partition constraint.
*/ */
defaultPartOid = defaultPartOid =
get_default_oid_from_partdesc(RelationGetPartitionDesc(rel, false)); get_default_oid_from_partdesc(RelationGetPartitionDesc(rel, true));
if (OidIsValid(defaultPartOid)) if (OidIsValid(defaultPartOid))
{ {
/* /*
...@@ -18148,7 +18148,7 @@ ATExecAttachPartitionIdx(List **wqueue, Relation parentIdx, RangeVar *name) ...@@ -18148,7 +18148,7 @@ ATExecAttachPartitionIdx(List **wqueue, Relation parentIdx, RangeVar *name)
RelationGetRelationName(partIdx)))); RelationGetRelationName(partIdx))));
/* Make sure it indexes a partition of the other index's table */ /* Make sure it indexes a partition of the other index's table */
partDesc = RelationGetPartitionDesc(parentTbl, false); partDesc = RelationGetPartitionDesc(parentTbl, true);
found = false; found = false;
for (i = 0; i < partDesc->nparts; i++) for (i = 0; i < partDesc->nparts; i++)
{ {
...@@ -18302,7 +18302,7 @@ validatePartitionedIndex(Relation partedIdx, Relation partedTbl) ...@@ -18302,7 +18302,7 @@ validatePartitionedIndex(Relation partedIdx, Relation partedTbl)
* If we found as many inherited indexes as the partitioned table has * If we found as many inherited indexes as the partitioned table has
* partitions, we're good; update pg_index to set indisvalid. * partitions, we're good; update pg_index to set indisvalid.
*/ */
if (tuples == RelationGetPartitionDesc(partedTbl, false)->nparts) if (tuples == RelationGetPartitionDesc(partedTbl, true)->nparts)
{ {
Relation idxRel; Relation idxRel;
HeapTuple newtup; HeapTuple newtup;
......
...@@ -1119,7 +1119,7 @@ CreateTrigger(CreateTrigStmt *stmt, const char *queryString, ...@@ -1119,7 +1119,7 @@ CreateTrigger(CreateTrigStmt *stmt, const char *queryString,
*/ */
if (partition_recurse) if (partition_recurse)
{ {
PartitionDesc partdesc = RelationGetPartitionDesc(rel, false); PartitionDesc partdesc = RelationGetPartitionDesc(rel, true);
List *idxs = NIL; List *idxs = NIL;
List *childTbls = NIL; List *childTbls = NIL;
ListCell *l; ListCell *l;
...@@ -1141,8 +1141,8 @@ CreateTrigger(CreateTrigStmt *stmt, const char *queryString, ...@@ -1141,8 +1141,8 @@ CreateTrigger(CreateTrigStmt *stmt, const char *queryString,
ListCell *l; ListCell *l;
List *idxs = NIL; List *idxs = NIL;
idxs = find_inheritance_children(indexOid, false, idxs = find_inheritance_children(indexOid, true,
ShareRowExclusiveLock); ShareRowExclusiveLock, NULL);
foreach(l, idxs) foreach(l, idxs)
childTbls = lappend_oid(childTbls, childTbls = lappend_oid(childTbls,
IndexGetRelation(lfirst_oid(l), IndexGetRelation(lfirst_oid(l),
......
...@@ -991,19 +991,16 @@ ExecInitPartitionDispatchInfo(EState *estate, ...@@ -991,19 +991,16 @@ ExecInitPartitionDispatchInfo(EState *estate,
/* /*
* For data modification, it is better that executor does not include * For data modification, it is better that executor does not include
* partitions being detached, except in snapshot-isolation mode. This * partitions being detached, except when running in snapshot-isolation
* means that a read-committed transaction immediately gets a "no * mode. This means that a read-committed transaction immediately gets a
* partition for tuple" error when a tuple is inserted into a partition * "no partition for tuple" error when a tuple is inserted into a
* that's being detached concurrently, but a transaction in repeatable- * partition that's being detached concurrently, but a transaction in
* read mode can still use the partition. Note that because partition * repeatable-read mode can still use such a partition.
* detach uses ShareLock on the partition (which conflicts with DML),
* we're certain that the detach won't be able to complete until any
* inserting transaction is done.
*/ */
if (estate->es_partition_directory == NULL) if (estate->es_partition_directory == NULL)
estate->es_partition_directory = estate->es_partition_directory =
CreatePartitionDirectory(estate->es_query_cxt, CreatePartitionDirectory(estate->es_query_cxt,
IsolationUsesXactSnapshot()); !IsolationUsesXactSnapshot());
oldcxt = MemoryContextSwitchTo(proute->memcxt); oldcxt = MemoryContextSwitchTo(proute->memcxt);
...@@ -1571,10 +1568,10 @@ ExecCreatePartitionPruneState(PlanState *planstate, ...@@ -1571,10 +1568,10 @@ ExecCreatePartitionPruneState(PlanState *planstate,
ListCell *lc; ListCell *lc;
int i; int i;
/* Executor must always include detached partitions */ /* For data reading, executor always omits detached partitions */
if (estate->es_partition_directory == NULL) if (estate->es_partition_directory == NULL)
estate->es_partition_directory = estate->es_partition_directory =
CreatePartitionDirectory(estate->es_query_cxt, true); CreatePartitionDirectory(estate->es_query_cxt, false);
n_part_hierarchies = list_length(partitionpruneinfo->prune_infos); n_part_hierarchies = list_length(partitionpruneinfo->prune_infos);
Assert(n_part_hierarchies > 0); Assert(n_part_hierarchies > 0);
......
...@@ -2200,7 +2200,7 @@ set_relation_partition_info(PlannerInfo *root, RelOptInfo *rel, ...@@ -2200,7 +2200,7 @@ set_relation_partition_info(PlannerInfo *root, RelOptInfo *rel,
if (root->glob->partition_directory == NULL) if (root->glob->partition_directory == NULL)
{ {
root->glob->partition_directory = root->glob->partition_directory =
CreatePartitionDirectory(CurrentMemoryContext, false); CreatePartitionDirectory(CurrentMemoryContext, true);
} }
partdesc = PartitionDirectoryLookup(root->glob->partition_directory, partdesc = PartitionDirectoryLookup(root->glob->partition_directory,
......
...@@ -2798,7 +2798,7 @@ check_new_partition_bound(char *relname, Relation parent, ...@@ -2798,7 +2798,7 @@ check_new_partition_bound(char *relname, Relation parent,
PartitionBoundSpec *spec, ParseState *pstate) PartitionBoundSpec *spec, ParseState *pstate)
{ {
PartitionKey key = RelationGetPartitionKey(parent); PartitionKey key = RelationGetPartitionKey(parent);
PartitionDesc partdesc = RelationGetPartitionDesc(parent, true); PartitionDesc partdesc = RelationGetPartitionDesc(parent, false);
PartitionBoundInfo boundinfo = partdesc->boundinfo; PartitionBoundInfo boundinfo = partdesc->boundinfo;
int with = -1; int with = -1;
bool overlap = false; bool overlap = false;
...@@ -3991,7 +3991,7 @@ get_qual_for_list(Relation parent, PartitionBoundSpec *spec) ...@@ -3991,7 +3991,7 @@ get_qual_for_list(Relation parent, PartitionBoundSpec *spec)
{ {
int i; int i;
int ndatums = 0; int ndatums = 0;
PartitionDesc pdesc = RelationGetPartitionDesc(parent, true); /* XXX correct? */ PartitionDesc pdesc = RelationGetPartitionDesc(parent, false);
PartitionBoundInfo boundinfo = pdesc->boundinfo; PartitionBoundInfo boundinfo = pdesc->boundinfo;
if (boundinfo) if (boundinfo)
...@@ -4191,7 +4191,7 @@ get_qual_for_range(Relation parent, PartitionBoundSpec *spec, ...@@ -4191,7 +4191,7 @@ get_qual_for_range(Relation parent, PartitionBoundSpec *spec,
if (spec->is_default) if (spec->is_default)
{ {
List *or_expr_args = NIL; List *or_expr_args = NIL;
PartitionDesc pdesc = RelationGetPartitionDesc(parent, true); /* XXX correct? */ PartitionDesc pdesc = RelationGetPartitionDesc(parent, false);
Oid *inhoids = pdesc->oids; Oid *inhoids = pdesc->oids;
int nparts = pdesc->nparts, int nparts = pdesc->nparts,
i; i;
......
...@@ -37,7 +37,7 @@ typedef struct PartitionDirectoryData ...@@ -37,7 +37,7 @@ typedef struct PartitionDirectoryData
{ {
MemoryContext pdir_mcxt; MemoryContext pdir_mcxt;
HTAB *pdir_hash; HTAB *pdir_hash;
bool include_detached; bool omit_detached;
} PartitionDirectoryData; } PartitionDirectoryData;
typedef struct PartitionDirectoryEntry typedef struct PartitionDirectoryEntry
...@@ -47,7 +47,8 @@ typedef struct PartitionDirectoryEntry ...@@ -47,7 +47,8 @@ typedef struct PartitionDirectoryEntry
PartitionDesc pd; PartitionDesc pd;
} PartitionDirectoryEntry; } PartitionDirectoryEntry;
static void RelationBuildPartitionDesc(Relation rel, bool include_detached); static PartitionDesc RelationBuildPartitionDesc(Relation rel,
bool omit_detached);
/* /*
...@@ -60,18 +61,29 @@ static void RelationBuildPartitionDesc(Relation rel, bool include_detached); ...@@ -60,18 +61,29 @@ static void RelationBuildPartitionDesc(Relation rel, bool include_detached);
* for callers to continue to use that pointer as long as (a) they hold the * for callers to continue to use that pointer as long as (a) they hold the
* relation open, and (b) they hold a relation lock strong enough to ensure * relation open, and (b) they hold a relation lock strong enough to ensure
* that the data doesn't become stale. * that the data doesn't become stale.
*
* The above applies to partition descriptors that are complete regarding
* partitions concurrently being detached. When a descriptor that omits
* partitions being detached is requested (and such partitions are present),
* said descriptor is not part of relcache and so it isn't freed by
* invalidations either. Caller must not use such a descriptor beyond the
* current Portal.
*/ */
PartitionDesc PartitionDesc
RelationGetPartitionDesc(Relation rel, bool include_detached) RelationGetPartitionDesc(Relation rel, bool omit_detached)
{ {
if (rel->rd_rel->relkind != RELKIND_PARTITIONED_TABLE) Assert(rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE);
return NULL;
if (unlikely(rel->rd_partdesc == NULL || /*
rel->rd_partdesc->includes_detached != include_detached)) * If relcache has a partition descriptor, use that. However, we can only
RelationBuildPartitionDesc(rel, include_detached); * do so when we are asked to include all partitions including detached;
* and also when we know that there are no detached partitions.
*/
if (likely(rel->rd_partdesc &&
(!rel->rd_partdesc->detached_exist || !omit_detached)))
return rel->rd_partdesc;
return rel->rd_partdesc; return RelationBuildPartitionDesc(rel, omit_detached);
} }
/* /*
...@@ -88,9 +100,15 @@ RelationGetPartitionDesc(Relation rel, bool include_detached) ...@@ -88,9 +100,15 @@ RelationGetPartitionDesc(Relation rel, bool include_detached)
* context the current context except in very brief code sections, out of fear * context the current context except in very brief code sections, out of fear
* that some of our callees allocate memory on their own which would be leaked * that some of our callees allocate memory on their own which would be leaked
* permanently. * permanently.
*
* As a special case, partition descriptors that are requested to omit
* partitions being detached (and which contain such partitions) are transient
* and are not associated with the relcache entry. Such descriptors only last
* through the requesting Portal, so we use the corresponding memory context
* for them.
*/ */
static void static PartitionDesc
RelationBuildPartitionDesc(Relation rel, bool include_detached) RelationBuildPartitionDesc(Relation rel, bool omit_detached)
{ {
PartitionDesc partdesc; PartitionDesc partdesc;
PartitionBoundInfo boundinfo = NULL; PartitionBoundInfo boundinfo = NULL;
...@@ -98,6 +116,7 @@ RelationBuildPartitionDesc(Relation rel, bool include_detached) ...@@ -98,6 +116,7 @@ RelationBuildPartitionDesc(Relation rel, bool include_detached)
PartitionBoundSpec **boundspecs = NULL; PartitionBoundSpec **boundspecs = NULL;
Oid *oids = NULL; Oid *oids = NULL;
bool *is_leaf = NULL; bool *is_leaf = NULL;
bool detached_exist;
ListCell *cell; ListCell *cell;
int i, int i,
nparts; nparts;
...@@ -112,8 +131,8 @@ RelationBuildPartitionDesc(Relation rel, bool include_detached) ...@@ -112,8 +131,8 @@ RelationBuildPartitionDesc(Relation rel, bool include_detached)
* concurrently, whatever this function returns will be accurate as of * concurrently, whatever this function returns will be accurate as of
* some well-defined point in time. * some well-defined point in time.
*/ */
inhoids = find_inheritance_children(RelationGetRelid(rel), include_detached, inhoids = find_inheritance_children(RelationGetRelid(rel), omit_detached,
NoLock); NoLock, &detached_exist);
nparts = list_length(inhoids); nparts = list_length(inhoids);
/* Allocate working arrays for OIDs, leaf flags, and boundspecs. */ /* Allocate working arrays for OIDs, leaf flags, and boundspecs. */
...@@ -234,6 +253,7 @@ RelationBuildPartitionDesc(Relation rel, bool include_detached) ...@@ -234,6 +253,7 @@ RelationBuildPartitionDesc(Relation rel, bool include_detached)
partdesc = (PartitionDescData *) partdesc = (PartitionDescData *)
MemoryContextAllocZero(new_pdcxt, sizeof(PartitionDescData)); MemoryContextAllocZero(new_pdcxt, sizeof(PartitionDescData));
partdesc->nparts = nparts; partdesc->nparts = nparts;
partdesc->detached_exist = detached_exist;
/* If there are no partitions, the rest of the partdesc can stay zero */ /* If there are no partitions, the rest of the partdesc can stay zero */
if (nparts > 0) if (nparts > 0)
{ {
...@@ -241,7 +261,6 @@ RelationBuildPartitionDesc(Relation rel, bool include_detached) ...@@ -241,7 +261,6 @@ RelationBuildPartitionDesc(Relation rel, bool include_detached)
partdesc->boundinfo = partition_bounds_copy(boundinfo, key); partdesc->boundinfo = partition_bounds_copy(boundinfo, key);
partdesc->oids = (Oid *) palloc(nparts * sizeof(Oid)); partdesc->oids = (Oid *) palloc(nparts * sizeof(Oid));
partdesc->is_leaf = (bool *) palloc(nparts * sizeof(bool)); partdesc->is_leaf = (bool *) palloc(nparts * sizeof(bool));
partdesc->includes_detached = include_detached;
/* /*
* Assign OIDs from the original array into mapped indexes of the * Assign OIDs from the original array into mapped indexes of the
...@@ -261,22 +280,41 @@ RelationBuildPartitionDesc(Relation rel, bool include_detached) ...@@ -261,22 +280,41 @@ RelationBuildPartitionDesc(Relation rel, bool include_detached)
} }
/* /*
* We have a fully valid partdesc ready to store into the relcache. * We have a fully valid partdesc. Reparent it so that it has the right
* Reparent it so it has the right lifespan. * lifespan, and if appropriate put it into the relation's relcache entry.
*/ */
MemoryContextSetParent(new_pdcxt, CacheMemoryContext); if (omit_detached && detached_exist)
{
/*
* A transient partition descriptor is only good for the current
* statement, so make it a child of the current portal's context.
*/
MemoryContextSetParent(new_pdcxt, PortalContext);
}
else
{
/*
* This partdesc goes into relcache.
*/
/* MemoryContextSetParent(new_pdcxt, CacheMemoryContext);
* But first, a kluge: if there's an old rd_pdcxt, it contains an old
* partition descriptor that may still be referenced somewhere. Preserve /*
* it, while not leaking it, by reattaching it as a child context of the * But first, a kluge: if there's an old rd_pdcxt, it contains an old
* new rd_pdcxt. Eventually it will get dropped by either RelationClose * partition descriptor that may still be referenced somewhere.
* or RelationClearRelation. * Preserve it, while not leaking it, by reattaching it as a child
*/ * context of the new rd_pdcxt. Eventually it will get dropped by
if (rel->rd_pdcxt != NULL) * either RelationClose or RelationClearRelation.
MemoryContextSetParent(rel->rd_pdcxt, new_pdcxt); */
rel->rd_pdcxt = new_pdcxt; if (rel->rd_pdcxt != NULL)
rel->rd_partdesc = partdesc; MemoryContextSetParent(rel->rd_pdcxt, new_pdcxt);
rel->rd_pdcxt = new_pdcxt;
/* Store it into relcache */
rel->rd_partdesc = partdesc;
}
return partdesc;
} }
/* /*
...@@ -284,7 +322,7 @@ RelationBuildPartitionDesc(Relation rel, bool include_detached) ...@@ -284,7 +322,7 @@ RelationBuildPartitionDesc(Relation rel, bool include_detached)
* Create a new partition directory object. * Create a new partition directory object.
*/ */
PartitionDirectory PartitionDirectory
CreatePartitionDirectory(MemoryContext mcxt, bool include_detached) CreatePartitionDirectory(MemoryContext mcxt, bool omit_detached)
{ {
MemoryContext oldcontext = MemoryContextSwitchTo(mcxt); MemoryContext oldcontext = MemoryContextSwitchTo(mcxt);
PartitionDirectory pdir; PartitionDirectory pdir;
...@@ -299,7 +337,7 @@ CreatePartitionDirectory(MemoryContext mcxt, bool include_detached) ...@@ -299,7 +337,7 @@ CreatePartitionDirectory(MemoryContext mcxt, bool include_detached)
pdir->pdir_hash = hash_create("partition directory", 256, &ctl, pdir->pdir_hash = hash_create("partition directory", 256, &ctl,
HASH_ELEM | HASH_BLOBS | HASH_CONTEXT); HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
pdir->include_detached = include_detached; pdir->omit_detached = omit_detached;
MemoryContextSwitchTo(oldcontext); MemoryContextSwitchTo(oldcontext);
return pdir; return pdir;
...@@ -332,7 +370,7 @@ PartitionDirectoryLookup(PartitionDirectory pdir, Relation rel) ...@@ -332,7 +370,7 @@ PartitionDirectoryLookup(PartitionDirectory pdir, Relation rel)
*/ */
RelationIncrementReferenceCount(rel); RelationIncrementReferenceCount(rel);
pde->rel = rel; pde->rel = rel;
pde->pd = RelationGetPartitionDesc(rel, pdir->include_detached); pde->pd = RelationGetPartitionDesc(rel, pdir->omit_detached);
Assert(pde->pd != NULL); Assert(pde->pd != NULL);
} }
return pde->pd; return pde->pd;
......
...@@ -50,8 +50,8 @@ DECLARE_INDEX(pg_inherits_parent_index, 2187, on pg_inherits using btree(inhpare ...@@ -50,8 +50,8 @@ DECLARE_INDEX(pg_inherits_parent_index, 2187, on pg_inherits using btree(inhpare
#define InheritsParentIndexId 2187 #define InheritsParentIndexId 2187
extern List *find_inheritance_children(Oid parentrelId, bool include_detached, extern List *find_inheritance_children(Oid parentrelId, bool omit_detached,
LOCKMODE lockmode); LOCKMODE lockmode, bool *detached_exist);
extern List *find_all_inheritors(Oid parentrelId, LOCKMODE lockmode, extern List *find_all_inheritors(Oid parentrelId, LOCKMODE lockmode,
List **parents); List **parents);
extern bool has_subclass(Oid relationId); extern bool has_subclass(Oid relationId);
......
...@@ -17,11 +17,19 @@ ...@@ -17,11 +17,19 @@
/* /*
* Information about partitions of a partitioned table. * Information about partitions of a partitioned table.
*
* For partitioned tables where detached partitions exist, we only cache
* descriptors that include all partitions, including detached; when we're
* requested a descriptor without the detached partitions, we create one
* afresh each time. (The reason for this is that the set of detached
* partitions that are visible to each caller depends on the snapshot it has,
* so it's pretty much impossible to evict a descriptor from cache at the
* right time.)
*/ */
typedef struct PartitionDescData typedef struct PartitionDescData
{ {
int nparts; /* Number of partitions */ int nparts; /* Number of partitions */
bool includes_detached; /* Does it include detached partitions */ bool detached_exist; /* Are there any detached partitions? */
Oid *oids; /* Array of 'nparts' elements containing Oid *oids; /* Array of 'nparts' elements containing
* partition OIDs in order of the their bounds */ * partition OIDs in order of the their bounds */
bool *is_leaf; /* Array of 'nparts' elements storing whether bool *is_leaf; /* Array of 'nparts' elements storing whether
...@@ -31,9 +39,9 @@ typedef struct PartitionDescData ...@@ -31,9 +39,9 @@ typedef struct PartitionDescData
} PartitionDescData; } PartitionDescData;
extern PartitionDesc RelationGetPartitionDesc(Relation rel, bool include_detached); extern PartitionDesc RelationGetPartitionDesc(Relation rel, bool omit_detached);
extern PartitionDirectory CreatePartitionDirectory(MemoryContext mcxt, bool include_detached); extern PartitionDirectory CreatePartitionDirectory(MemoryContext mcxt, bool omit_detached);
extern PartitionDesc PartitionDirectoryLookup(PartitionDirectory, Relation); extern PartitionDesc PartitionDirectoryLookup(PartitionDirectory, Relation);
extern void DestroyPartitionDirectory(PartitionDirectory pdir); extern void DestroyPartitionDirectory(PartitionDirectory pdir);
......
...@@ -324,6 +324,7 @@ a ...@@ -324,6 +324,7 @@ a
1 1
2 2
step s1insert: insert into d4_fk values (1); step s1insert: insert into d4_fk values (1);
ERROR: insert or update on table "d4_fk" violates foreign key constraint "d4_fk_a_fkey"
step s1c: commit; step s1c: commit;
starting permutation: s2snitch s1b s1s s2detach s1cancel s3vacfreeze s1s s1insert s1c starting permutation: s2snitch s1b s1s s2detach s1cancel s3vacfreeze s1s s1insert s1c
......
Markdown is supported
0% or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment