Commit e5fac1cb authored by Tom Lane

Avoid unnecessary recursion to child tables in ALTER TABLE SET NOT NULL.

If a partitioned table's column is already marked NOT NULL, there is
no need to examine its partitions, because we can rely on previous
DDL to have enforced that the child columns are NOT NULL as well.
(Unfortunately, the same cannot be said for traditional inheritance,
so for now we have to restrict the optimization to partitioned tables.)
Hence, we may skip recursing to child tables in this situation.
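
The rule in the paragraph above can be sketched as a small standalone C predicate. This is only an illustrative model, not the backend code: ToyRel, ToyRelKind and toy_must_recurse_to_children are hypothetical names, and the three fields merely stand in for the relkind, relhassubclass and attnotnull catalog values that the real check consults.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the catalog fields the check reads. */
typedef enum
{
	TOY_PLAIN_TABLE,			/* may be a traditional-inheritance parent */
	TOY_PARTITIONED_TABLE
} ToyRelKind;

typedef struct
{
	ToyRelKind	relkind;
	bool		has_subclass;	/* models relhassubclass: has, or once had, children */
	bool		col_not_null;	/* models attnotnull of the target column */
} ToyRel;

/*
 * Returns true if SET NOT NULL still has to visit (and therefore lock)
 * the child tables of "rel" in this toy model.
 */
static bool
toy_must_recurse_to_children(const ToyRel *rel)
{
	/* No children at all: nothing to recurse to. */
	if (!rel->has_subclass)
		return false;

	/*
	 * Partitioned parent whose column is already NOT NULL: earlier DDL is
	 * assumed to have enforced NOT NULL on every partition, so skip them.
	 */
	if (rel->relkind == TOY_PARTITIONED_TABLE && rel->col_not_null)
		return false;

	/* Traditional inheritance, or column not yet NOT NULL: must recurse. */
	return true;
}
```

Under this model, a partitioned parent with children and an already NOT NULL column yields false, while a plain inheritance parent with the same flags still yields true, which mirrors the restriction stated in the parenthetical above.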

The reason this case is worth worrying about is that when pg_dump dumps
a partitioned table having a primary key, it will include the requisite
NOT NULL markings in the CREATE TABLE commands, and then add the
primary key as a separate step.  The primary key addition generates a
SET NOT NULL as a subcommand, just to be sure.  So the situation where
a SET NOT NULL is redundant does arise in the real world.

Skipping the recursion does more than just save a few cycles: it means
that a command such as "ALTER TABLE ONLY partition_parent ADD PRIMARY
KEY" will take locks only on the partition parent table, not on the
partitions.  It turns out that parallel pg_restore is effectively
assuming that that's true, and has little choice but to do so because
the dependencies listed for such a TOC entry don't include the
partitions.  pg_restore could thus issue this ALTER while data restores
on the partitions are still in progress.  Taking unnecessary locks on
the partitions not only hurts concurrency, but can lead to actual
deadlock failures, as reported by Domagoj Smoljanovic.

(A contributing factor in the deadlock is that TRUNCATE on a child
partition wants a non-exclusive lock on the parent.  This seems
likewise unnecessary, but the fix for it is more invasive so we
won't consider back-patching it.  Fortunately, getting rid of one
of these two poor behaviors is enough to remove the deadlock.)
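
As a rough illustration of why removing either behavior is sufficient, the deadlock can be modeled as a cycle in a two-session wait-for graph. The sketch below is a toy model under stated assumptions, not PostgreSQL's lock manager: the function names are hypothetical, session 0 is assumed to already hold a conflicting lock on the parent (the ALTER), and session 1 is assumed to already hold a conflicting lock on a partition (the data restore).

```c
#include <stdbool.h>

/*
 * waits_for[i] is the index of the session that session i is blocked on,
 * or -1 if session i is not waiting.  A deadlock is a cycle in this graph.
 */
static bool
toy_has_deadlock(const int waits_for[], int n)
{
	for (int start = 0; start < n; start++)
	{
		int			cur = waits_for[start];

		/* Follow at most n edges; returning to "start" means a cycle. */
		for (int steps = 0; cur != -1 && steps < n; steps++)
		{
			if (cur == start)
				return true;
			cur = waits_for[cur];
		}
	}
	return false;
}

/*
 * Session 0: ALTER TABLE ONLY parent ADD PRIMARY KEY, holding a lock on
 * the parent.  Session 1: data restore on a partition, holding a lock on
 * that partition.  Each flag enables one of the two unnecessary lock
 * requests described in the commit message.
 */
static bool
toy_restore_deadlocks(bool alter_locks_partitions, bool truncate_locks_parent)
{
	int			waits_for[2];

	waits_for[0] = alter_locks_partitions ? 1 : -1;	/* ALTER waits on restore */
	waits_for[1] = truncate_locks_parent ? 0 : -1;	/* TRUNCATE waits on ALTER */
	return toy_has_deadlock(waits_for, 2);
}
```

In this model, toy_restore_deadlocks(true, true) reports a cycle, while clearing either flag breaks it, which matches the observation that fixing one of the two behaviors removes the deadlock.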

Although support for partitioned primary keys came in with v11,
this patch is dependent on the SET NOT NULL refactoring done by
commit f4a3fdfb, so we can only patch back to v12.

Patch by me; thanks to Alvaro Herrera and Amit Langote for review.

Discussion: https://postgr.es/m/VI1PR03MB31670CA1BD9625C3A8C5DD05EB230@VI1PR03MB3167.eurprd03.prod.outlook.com
parent 3d65b059
@@ -5681,14 +5681,10 @@ ATSimpleRecursion(List **wqueue, Relation rel,
 				  AlterTableUtilityContext *context)
 {
 	/*
-	 * Propagate to children if desired. Only plain tables, foreign tables
-	 * and partitioned tables have children, so no need to search for other
-	 * relkinds.
+	 * Propagate to children, if desired and if there are (or might be) any
+	 * children.
 	 */
-	if (recurse &&
-		(rel->rd_rel->relkind == RELKIND_RELATION ||
-		 rel->rd_rel->relkind == RELKIND_FOREIGN_TABLE ||
-		 rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE))
+	if (recurse && rel->rd_rel->relhassubclass)
 	{
 		Oid			relid = RelationGetRelid(rel);
 		ListCell   *child;
@@ -6698,6 +6694,41 @@ ATPrepSetNotNull(List **wqueue, Relation rel,
 	if (recursing)
 		return;
 
+	/*
+	 * If the target column is already marked NOT NULL, we can skip recursing
+	 * to children, because their columns should already be marked NOT NULL as
+	 * well. But there's no point in checking here unless the relation has
+	 * some children; else we can just wait till execution to check. (If it
+	 * does have children, however, this can save taking per-child locks
+	 * unnecessarily. This greatly improves concurrency in some parallel
+	 * restore scenarios.)
+	 *
+	 * Unfortunately, we can only apply this optimization to partitioned
+	 * tables, because traditional inheritance doesn't enforce that child
+	 * columns be NOT NULL when their parent is. (That's a bug that should
+	 * get fixed someday.)
+	 */
+	if (rel->rd_rel->relhassubclass &&
+		rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+	{
+		HeapTuple	tuple;
+		bool		attnotnull;
+
+		tuple = SearchSysCacheAttName(RelationGetRelid(rel), cmd->name);
+
+		/* Might as well throw the error now, if name is bad */
+		if (!HeapTupleIsValid(tuple))
+			ereport(ERROR,
+					(errcode(ERRCODE_UNDEFINED_COLUMN),
+					 errmsg("column \"%s\" of relation \"%s\" does not exist",
+							cmd->name, RelationGetRelationName(rel))));
+
+		attnotnull = ((Form_pg_attribute) GETSTRUCT(tuple))->attnotnull;
+		ReleaseSysCache(tuple);
+		if (attnotnull)
+			return;
+	}
+
 	/*
 	 * If we have ALTER TABLE ONLY ... SET NOT NULL on a partitioned table,
 	 * apply ALTER TABLE ... CHECK NOT NULL to every child. Otherwise, use
......