Commit 19de0ab2 authored by Tom Lane's avatar Tom Lane

Fix inadequate locking during get_rel_oids().

get_rel_oids used to not take any relation locks at all, but that stopped
being a good idea with commit 3c3bb993, which inserted a syscache lookup
into the function.  A concurrent DROP TABLE could now produce "cache lookup
failed", which we don't want to have happen in normal operation.  The best
solution seems to be to transiently take a lock on the relation named by
the RangeVar (which also makes the result of RangeVarGetRelid a lot less
spongy).  But we shouldn't hold the lock beyond this function, because we
don't want VACUUM to lock more than one table at a time.  (That would not
be a big problem right now, but it will become one after the pending
feature patch to allow multiple tables to be named in VACUUM.)

In passing, adjust vacuum_rel and analyze_rel to document that we don't
trust the passed RangeVar to be accurate, and allow the RangeVar to
possibly be NULL --- which it is anyway for a whole-database VACUUM,
though we accidentally didn't crash for that case.

The passed RangeVar is in fact inaccurate when dealing with a child
partition, as of v10, and it has been wrong for a whole long time in the
case of vacuum_rel() recursing to a TOAST table.  None of these things
present visible bugs up to now, because the passed RangeVar is in fact
only consulted for autovacuum logging, and in that particular context it's
always accurate because autovacuum doesn't let vacuum.c expand partitions
nor recurse to toast tables.  Still, this seems like trouble waiting to
happen, so let's nail the door at least partly shut.  (Further cleanup
is planned, in HEAD only, as part of the pending feature patch.)

Fix some sadly inaccurate/obsolete comments too.  Back-patch to v10.

Michael Paquier and Tom Lane

Discussion: https://postgr.es/m/25023.1506107590@sss.pgh.pa.us
parent 69c16983
...@@ -106,6 +106,10 @@ static Datum ind_fetch_func(VacAttrStatsP stats, int rownum, bool *isNull); ...@@ -106,6 +106,10 @@ static Datum ind_fetch_func(VacAttrStatsP stats, int rownum, bool *isNull);
/* /*
* analyze_rel() -- analyze one relation * analyze_rel() -- analyze one relation
*
* relid identifies the relation to analyze. If relation is supplied, use
* the name therein for reporting any failure to open/lock the rel; do not
* use it once we've successfully opened the rel, since it might be stale.
*/ */
void void
analyze_rel(Oid relid, RangeVar *relation, int options, analyze_rel(Oid relid, RangeVar *relation, int options,
...@@ -145,7 +149,8 @@ analyze_rel(Oid relid, RangeVar *relation, int options, ...@@ -145,7 +149,8 @@ analyze_rel(Oid relid, RangeVar *relation, int options,
else else
{ {
onerel = NULL; onerel = NULL;
if (IsAutoVacuumWorkerProcess() && params->log_min_duration >= 0) if (relation &&
IsAutoVacuumWorkerProcess() && params->log_min_duration >= 0)
ereport(LOG, ereport(LOG,
(errcode(ERRCODE_LOCK_NOT_AVAILABLE), (errcode(ERRCODE_LOCK_NOT_AVAILABLE),
errmsg("skipping analyze of \"%s\" --- lock not available", errmsg("skipping analyze of \"%s\" --- lock not available",
......
...@@ -128,9 +128,11 @@ ExecVacuum(VacuumStmt *vacstmt, bool isTopLevel) ...@@ -128,9 +128,11 @@ ExecVacuum(VacuumStmt *vacstmt, bool isTopLevel)
* *
* options is a bitmask of VacuumOption flags, indicating what to do. * options is a bitmask of VacuumOption flags, indicating what to do.
* *
* relid, if not InvalidOid, indicate the relation to process; otherwise, * relid, if not InvalidOid, indicates the relation to process; otherwise,
* the RangeVar is used. (The latter must always be passed, because it's * if a RangeVar is supplied, that's what to process; otherwise, we process
* used for error messages.) * all relevant tables in the database. (If both relid and a RangeVar are
* supplied, the relid is what is processed, but we use the RangeVar's name
* to report any open/lock failure.)
* *
* params contains a set of parameters that can be used to customize the * params contains a set of parameters that can be used to customize the
* behavior. * behavior.
...@@ -226,8 +228,8 @@ vacuum(int options, RangeVar *relation, Oid relid, VacuumParams *params, ...@@ -226,8 +228,8 @@ vacuum(int options, RangeVar *relation, Oid relid, VacuumParams *params,
vac_strategy = bstrategy; vac_strategy = bstrategy;
/* /*
* Build list of relations to process, unless caller gave us one. (If we * Build list of relation OID(s) to process, putting it in vac_context for
* build one, we put it in vac_context for safekeeping.) * safekeeping.
*/ */
relations = get_rel_oids(relid, relation); relations = get_rel_oids(relid, relation);
...@@ -393,22 +395,18 @@ get_rel_oids(Oid relid, const RangeVar *vacrel) ...@@ -393,22 +395,18 @@ get_rel_oids(Oid relid, const RangeVar *vacrel)
} }
else if (vacrel) else if (vacrel)
{ {
/* Process a specific relation */ /* Process a specific relation, and possibly partitions thereof */
Oid relid; Oid relid;
HeapTuple tuple; HeapTuple tuple;
Form_pg_class classForm; Form_pg_class classForm;
bool include_parts; bool include_parts;
/* /*
* Since we don't take a lock here, the relation might be gone, or the * We transiently take AccessShareLock to protect the syscache lookup
* RangeVar might no longer refer to the OID we look up here. In the * below, as well as find_all_inheritors's expectation that the caller
* former case, VACUUM will do nothing; in the latter case, it will * holds some lock on the starting relation.
* process the OID we looked up here, rather than the new one. Neither
* is ideal, but there's little practical alternative, since we're
* going to commit this transaction and begin a new one between now
* and then.
*/ */
relid = RangeVarGetRelid(vacrel, NoLock, false); relid = RangeVarGetRelid(vacrel, AccessShareLock, false);
/* /*
* To check whether the relation is a partitioned table, fetch its * To check whether the relation is a partitioned table, fetch its
...@@ -422,10 +420,11 @@ get_rel_oids(Oid relid, const RangeVar *vacrel) ...@@ -422,10 +420,11 @@ get_rel_oids(Oid relid, const RangeVar *vacrel)
ReleaseSysCache(tuple); ReleaseSysCache(tuple);
/* /*
* Make relation list entries for this guy and its partitions, if any. * Make relation list entries for this rel and its partitions, if any.
* Note that the list returned by find_all_inheritors() include the * Note that the list returned by find_all_inheritors() includes the
* passed-in OID at its head. Also note that we did not request a * passed-in OID at its head. There's no point in taking locks on the
* lock to be taken to match what would be done otherwise. * individual partitions yet, and doing so would just add unnecessary
* deadlock risk.
*/ */
oldcontext = MemoryContextSwitchTo(vac_context); oldcontext = MemoryContextSwitchTo(vac_context);
if (include_parts) if (include_parts)
...@@ -434,6 +433,19 @@ get_rel_oids(Oid relid, const RangeVar *vacrel) ...@@ -434,6 +433,19 @@ get_rel_oids(Oid relid, const RangeVar *vacrel)
else else
oid_list = lappend_oid(oid_list, relid); oid_list = lappend_oid(oid_list, relid);
MemoryContextSwitchTo(oldcontext); MemoryContextSwitchTo(oldcontext);
/*
* Release lock again. This means that by the time we actually try to
* process the table, it might be gone or renamed. In the former case
* we'll silently ignore it; in the latter case we'll process it
* anyway, but we must beware that the RangeVar doesn't necessarily
* identify it anymore. This isn't ideal, perhaps, but there's little
* practical alternative, since we're typically going to commit this
* transaction and begin a new one between now and then. Moreover,
* holding locks on multiple relations would create significant risk
* of deadlock.
*/
UnlockRelationOid(relid, AccessShareLock);
} }
else else
{ {
...@@ -463,7 +475,7 @@ get_rel_oids(Oid relid, const RangeVar *vacrel) ...@@ -463,7 +475,7 @@ get_rel_oids(Oid relid, const RangeVar *vacrel)
classForm->relkind != RELKIND_PARTITIONED_TABLE) classForm->relkind != RELKIND_PARTITIONED_TABLE)
continue; continue;
/* Make a relation list entry for this guy */ /* Make a relation list entry for this rel */
oldcontext = MemoryContextSwitchTo(vac_context); oldcontext = MemoryContextSwitchTo(vac_context);
oid_list = lappend_oid(oid_list, HeapTupleGetOid(tuple)); oid_list = lappend_oid(oid_list, HeapTupleGetOid(tuple));
MemoryContextSwitchTo(oldcontext); MemoryContextSwitchTo(oldcontext);
...@@ -1212,6 +1224,11 @@ vac_truncate_clog(TransactionId frozenXID, ...@@ -1212,6 +1224,11 @@ vac_truncate_clog(TransactionId frozenXID,
/* /*
* vacuum_rel() -- vacuum one heap relation * vacuum_rel() -- vacuum one heap relation
* *
* relid identifies the relation to vacuum. If relation is supplied,
* use the name therein for reporting any failure to open/lock the rel;
* do not use it once we've successfully opened the rel, since it might
* be stale.
*
* Doing one heap at a time incurs extra overhead, since we need to * Doing one heap at a time incurs extra overhead, since we need to
* check that the heap exists again just before we vacuum it. The * check that the heap exists again just before we vacuum it. The
* reason that we do this is so that vacuuming can be spread across * reason that we do this is so that vacuuming can be spread across
...@@ -1300,7 +1317,8 @@ vacuum_rel(Oid relid, RangeVar *relation, int options, VacuumParams *params) ...@@ -1300,7 +1317,8 @@ vacuum_rel(Oid relid, RangeVar *relation, int options, VacuumParams *params)
else else
{ {
onerel = NULL; onerel = NULL;
if (IsAutoVacuumWorkerProcess() && params->log_min_duration >= 0) if (relation &&
IsAutoVacuumWorkerProcess() && params->log_min_duration >= 0)
ereport(LOG, ereport(LOG,
(errcode(ERRCODE_LOCK_NOT_AVAILABLE), (errcode(ERRCODE_LOCK_NOT_AVAILABLE),
errmsg("skipping vacuum of \"%s\" --- lock not available", errmsg("skipping vacuum of \"%s\" --- lock not available",
...@@ -1468,7 +1486,7 @@ vacuum_rel(Oid relid, RangeVar *relation, int options, VacuumParams *params) ...@@ -1468,7 +1486,7 @@ vacuum_rel(Oid relid, RangeVar *relation, int options, VacuumParams *params)
* totally unimportant for toast relations. * totally unimportant for toast relations.
*/ */
if (toast_relid != InvalidOid) if (toast_relid != InvalidOid)
vacuum_rel(toast_relid, relation, options, params); vacuum_rel(toast_relid, NULL, options, params);
/* /*
* Now release the session-level lock on the master table. * Now release the session-level lock on the master table.
......
Markdown is supported
0% or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment