Commit dafa0848 authored by Tom Lane's avatar Tom Lane

Code review for early drop of orphaned temp relations in autovacuum.

Commit a734fd5d exposed some race conditions that existed previously
in the autovac code, but were basically harmless because autovac would
not try to delete orphaned relations immediately.  Specifically, the test
for orphaned-ness was made on a pg_class tuple that might be dead by now,
allowing autovac to try to remove a table that the owning backend had just
finished deleting.  This resulted in a hard crash due to inadequate caution
about accessing the table's catalog entries without any lock.  We must take
a relation lock and then recheck whether the table is still present and
still looks deletable before we do anything.

Also, it seemed to me that deleting multiple tables per transaction, and
trying to continue after errors, represented unjustifiable complexity.
We do not expect this code path to be taken often in the field, nor even
during testing, which means that prioritizing performance over correctness
is a bad tradeoff.  Rip all that out in favor of just starting a new
transaction after each successful temp table deletion.  If we're unlucky
enough to get an error, which shouldn't happen anyway now that we're being
more cautious, let the autovacuum worker fail as it normally would.

In passing, improve the order of operations in the initial scan loop.
Now that we don't care about whether a temp table is a wraparound hazard,
there's no need to perform extract_autovac_opts, get_pgstat_tabentry_relid,
or relation_needs_vacanalyze for temp tables.

Also, if GetTempNamespaceBackendId returns InvalidBackendId (indicating
it doesn't recognize the schema as temp), treat that as meaning it's NOT
an orphaned temp table, not that it IS one, which is what happened before
because BackendIdGetProc necessarily failed.  The case really shouldn't
come up for a table that has RELPERSISTENCE_TEMP, but the consequences
if it did seem undesirable.  (This might represent a back-patchable bug
fix; not sure if it's worth the trouble.)

Discussion: https://postgr.es/m/21299.1480272347@sss.pgh.pa.us
parent c6dbc7b6
...@@ -87,6 +87,7 @@ ...@@ -87,6 +87,7 @@
#include "storage/bufmgr.h" #include "storage/bufmgr.h"
#include "storage/ipc.h" #include "storage/ipc.h"
#include "storage/latch.h" #include "storage/latch.h"
#include "storage/lmgr.h"
#include "storage/pmsignal.h" #include "storage/pmsignal.h"
#include "storage/proc.h" #include "storage/proc.h"
#include "storage/procsignal.h" #include "storage/procsignal.h"
...@@ -130,17 +131,6 @@ int Log_autovacuum_min_duration = -1; ...@@ -130,17 +131,6 @@ int Log_autovacuum_min_duration = -1;
#define MIN_AUTOVAC_SLEEPTIME 100.0 /* milliseconds */ #define MIN_AUTOVAC_SLEEPTIME 100.0 /* milliseconds */
#define MAX_AUTOVAC_SLEEPTIME 300 /* seconds */ #define MAX_AUTOVAC_SLEEPTIME 300 /* seconds */
/*
* Maximum number of orphan temporary tables to drop in a single transaction.
* (If this is too high, we might run out of heavyweight locks.)
*/
#define MAX_ORPHAN_ITEMS 50
/*
* After this many failures, stop trying to drop orphan temporary tables.
*/
#define MAX_ORPHAN_DROP_FAILURE 10
/* Flags to tell if we are in an autovacuum process */ /* Flags to tell if we are in an autovacuum process */
static bool am_autovacuum_launcher = false; static bool am_autovacuum_launcher = false;
static bool am_autovacuum_worker = false; static bool am_autovacuum_worker = false;
...@@ -1899,7 +1889,6 @@ do_autovacuum(void) ...@@ -1899,7 +1889,6 @@ do_autovacuum(void)
Form_pg_database dbForm; Form_pg_database dbForm;
List *table_oids = NIL; List *table_oids = NIL;
List *orphan_oids = NIL; List *orphan_oids = NIL;
List *pending_oids = NIL;
HASHCTL ctl; HASHCTL ctl;
HTAB *table_toast_map; HTAB *table_toast_map;
ListCell *volatile cell; ListCell *volatile cell;
...@@ -1908,7 +1897,6 @@ do_autovacuum(void) ...@@ -1908,7 +1897,6 @@ do_autovacuum(void)
BufferAccessStrategy bstrategy; BufferAccessStrategy bstrategy;
ScanKeyData key; ScanKeyData key;
TupleDesc pg_class_desc; TupleDesc pg_class_desc;
int orphan_failures = 0;
int effective_multixact_freeze_max_age; int effective_multixact_freeze_max_age;
/* /*
...@@ -2000,7 +1988,7 @@ do_autovacuum(void) ...@@ -2000,7 +1988,7 @@ do_autovacuum(void)
* TOAST tables. The reason for doing the second pass is that during it we * TOAST tables. The reason for doing the second pass is that during it we
* want to use the main relation's pg_class.reloptions entry if the TOAST * want to use the main relation's pg_class.reloptions entry if the TOAST
* table does not have any, and we cannot obtain it unless we know * table does not have any, and we cannot obtain it unless we know
* beforehand what's the main table OID. * beforehand what's the main table OID.
* *
* We need to check TOAST tables separately because in cases with short, * We need to check TOAST tables separately because in cases with short,
* wide tables there might be proportionally much more activity in the * wide tables there might be proportionally much more activity in the
...@@ -2028,16 +2016,6 @@ do_autovacuum(void) ...@@ -2028,16 +2016,6 @@ do_autovacuum(void)
relid = HeapTupleGetOid(tuple); relid = HeapTupleGetOid(tuple);
/* Fetch reloptions and the pgstat entry for this table */
relopts = extract_autovac_opts(tuple, pg_class_desc);
tabentry = get_pgstat_tabentry_relid(relid, classForm->relisshared,
shared, dbentry);
/* Check if it needs vacuum or analyze */
relation_needs_vacanalyze(relid, relopts, classForm, tabentry,
effective_multixact_freeze_max_age,
&dovacuum, &doanalyze, &wraparound);
/* /*
* Check if it is a temp table (presumably, of some other backend's). * Check if it is a temp table (presumably, of some other backend's).
* We cannot safely process other backends' temp tables. * We cannot safely process other backends' temp tables.
...@@ -2049,47 +2027,60 @@ do_autovacuum(void) ...@@ -2049,47 +2027,60 @@ do_autovacuum(void)
backendID = GetTempNamespaceBackendId(classForm->relnamespace); backendID = GetTempNamespaceBackendId(classForm->relnamespace);
/* We just ignore it if the owning backend is still active */ /* We just ignore it if the owning backend is still active */
if (backendID == MyBackendId || BackendIdGetProc(backendID) == NULL) if (backendID != InvalidBackendId &&
(backendID == MyBackendId ||
BackendIdGetProc(backendID) == NULL))
{ {
/* /*
* We found an orphan temp table which was probably left * The table seems to be orphaned -- although it might be that
* behind by a crashed backend. Remember it, so we can attempt * the owning backend has already deleted it and exited; our
* to drop it. * pg_class scan snapshot is not necessarily up-to-date
* anymore, so we could be looking at a committed-dead entry.
* Remember it so we can try to delete it later.
*/ */
orphan_oids = lappend_oid(orphan_oids, relid); orphan_oids = lappend_oid(orphan_oids, relid);
} }
continue;
} }
else
{
/* relations that need work are added to table_oids */
if (dovacuum || doanalyze)
table_oids = lappend_oid(table_oids, relid);
/* /* Fetch reloptions and the pgstat entry for this table */
* Remember the association for the second pass. Note: we must do relopts = extract_autovac_opts(tuple, pg_class_desc);
* this even if the table is going to be vacuumed, because we tabentry = get_pgstat_tabentry_relid(relid, classForm->relisshared,
* don't automatically vacuum toast tables along the parent table. shared, dbentry);
*/
if (OidIsValid(classForm->reltoastrelid)) /* Check if it needs vacuum or analyze */
{ relation_needs_vacanalyze(relid, relopts, classForm, tabentry,
av_relation *hentry; effective_multixact_freeze_max_age,
bool found; &dovacuum, &doanalyze, &wraparound);
/* Relations that need work are added to table_oids */
if (dovacuum || doanalyze)
table_oids = lappend_oid(table_oids, relid);
/*
* Remember TOAST associations for the second pass. Note: we must do
* this whether or not the table is going to be vacuumed, because we
* don't automatically vacuum toast tables along the parent table.
*/
if (OidIsValid(classForm->reltoastrelid))
{
av_relation *hentry;
bool found;
hentry = hash_search(table_toast_map, hentry = hash_search(table_toast_map,
&classForm->reltoastrelid, &classForm->reltoastrelid,
HASH_ENTER, &found); HASH_ENTER, &found);
if (!found) if (!found)
{
/* hash_search already filled in the key */
hentry->ar_relid = relid;
hentry->ar_hasrelopts = false;
if (relopts != NULL)
{ {
/* hash_search already filled in the key */ hentry->ar_hasrelopts = true;
hentry->ar_relid = relid; memcpy(&hentry->ar_reloptions, relopts,
hentry->ar_hasrelopts = false; sizeof(AutoVacOpts));
if (relopts != NULL)
{
hentry->ar_hasrelopts = true;
memcpy(&hentry->ar_reloptions, relopts,
sizeof(AutoVacOpts));
}
} }
} }
} }
...@@ -2154,112 +2145,90 @@ do_autovacuum(void) ...@@ -2154,112 +2145,90 @@ do_autovacuum(void)
heap_close(classRel, AccessShareLock); heap_close(classRel, AccessShareLock);
/* /*
* Loop through orphan temporary tables and drop them in batches. If * Recheck orphan temporary tables, and if they still seem orphaned, drop
* we're unable to drop one particular table, we'll retry to see if we * them. We'll eat a transaction per dropped table, which might seem
* can drop others, but if we fail too many times we'll give up and proceed * excessive, but we should only need to do anything as a result of a
* with our regular work, so that this step hopefully can't wedge * previous backend crash, so this should not happen often enough to
* autovacuum for too long. * justify "optimizing". Using separate transactions ensures that we
* don't bloat the lock table if there are many temp tables to be dropped,
* and it ensures that we don't lose work if a deletion attempt fails.
*/ */
while (list_length(orphan_oids) > 0 && foreach(cell, orphan_oids)
orphan_failures < MAX_ORPHAN_DROP_FAILURE)
{ {
Oid relid = linitial_oid(orphan_oids); Oid relid = lfirst_oid(cell);
ObjectAddress object; Form_pg_class classForm;
char *namespace = get_namespace_name(get_rel_namespace(relid)); int backendID;
char *relname = get_rel_name(relid); ObjectAddress object;
orphan_oids = list_delete_first(orphan_oids); /*
* Check for user-requested abort.
*/
CHECK_FOR_INTERRUPTS();
PG_TRY(); /*
{ * Try to lock the table. If we can't get the lock immediately,
ereport(LOG, * somebody else is using (or dropping) the table, so it's not our
(errmsg("autovacuum: dropping orphan temp table \"%s\".\"%s\" in database \"%s\"", * concern anymore. Having the lock prevents race conditions below.
namespace, relname, */
get_database_name(MyDatabaseId)))); if (!ConditionalLockRelationOid(relid, AccessExclusiveLock))
object.classId = RelationRelationId; continue;
object.objectId = relid;
object.objectSubId = 0;
performDeletion(&object, DROP_CASCADE, PERFORM_DELETION_INTERNAL);
/* /*
* This orphan table has been dropped correctly, add it to the * Re-fetch the pg_class tuple and re-check whether it still seems to
* list of tables whose drop should be attempted again if an * be an orphaned temp table. If it's not there or no longer the same
* error after in the same transaction. * relation, ignore it.
*/ */
pending_oids = lappend_oid(pending_oids, relid); tuple = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(relid));
} if (!HeapTupleIsValid(tuple))
PG_CATCH();
{ {
/* Abort the current transaction. */ /* be sure to drop useless lock so we don't bloat lock table */
HOLD_INTERRUPTS(); UnlockRelationOid(relid, AccessExclusiveLock);
continue;
errcontext("dropping of orphan temp table \"%s\".\"%s\" in database \"%s\"",
namespace, relname,
get_database_name(MyDatabaseId));
EmitErrorReport();
/* this resets the PGXACT flags too */
AbortOutOfAnyTransaction();
FlushErrorState();
/*
* Any tables were succesfully dropped before the failure now
* need to be dropped again. Add them back into the list, but
* don't retry the table that failed.
*/
orphan_oids = list_concat(orphan_oids, pending_oids);
orphan_failures++;
/* Start a new transaction. */
StartTransactionCommand();
/* StartTransactionCommand changed elsewhere the memory context */
MemoryContextSwitchTo(AutovacMemCxt);
RESUME_INTERRUPTS();
} }
PG_END_TRY(); classForm = (Form_pg_class) GETSTRUCT(tuple);
/* /*
* If we've successfully dropped quite a few tables, commit the * Make all the same tests made in the loop above. In event of OID
* transaction and begin a new one. The main point of this is to * counter wraparound, the pg_class entry we have now might be
* avoid accumulating too many locks and blowing out the lock table, * completely unrelated to the one we saw before.
* but it also minimizes the amount of work that will have to be rolled
* back if we fail to drop some table later in the list.
*/ */
if (list_length(pending_oids) >= MAX_ORPHAN_ITEMS) if (!((classForm->relkind == RELKIND_RELATION ||
classForm->relkind == RELKIND_MATVIEW) &&
classForm->relpersistence == RELPERSISTENCE_TEMP))
{ {
CommitTransactionCommand(); UnlockRelationOid(relid, AccessExclusiveLock);
StartTransactionCommand(); continue;
}
/* StartTransactionCommand changed elsewhere */ backendID = GetTempNamespaceBackendId(classForm->relnamespace);
MemoryContextSwitchTo(AutovacMemCxt); if (!(backendID != InvalidBackendId &&
(backendID == MyBackendId ||
list_free(pending_oids); BackendIdGetProc(backendID) == NULL)))
pending_oids = NIL; {
UnlockRelationOid(relid, AccessExclusiveLock);
continue;
} }
pfree(relname); /* OK, let's delete it */
pfree(namespace); ereport(LOG,
} (errmsg("autovacuum: dropping orphan temp table \"%s.%s.%s\"",
get_database_name(MyDatabaseId),
get_namespace_name(classForm->relnamespace),
NameStr(classForm->relname))));
/* object.classId = RelationRelationId;
* Commit current transaction to finish the cleanup done previously and object.objectId = relid;
* restart a new one to not bloat the activity of the following steps. object.objectSubId = 0;
* This needs to happen only if there are any items thought as previously performDeletion(&object, DROP_CASCADE, PERFORM_DELETION_INTERNAL);
* pending, but are actually not as the last transaction doing the cleanup
* has been successful. /*
*/ * To commit the deletion, end current transaction and start a new
if (list_length(pending_oids) > 0) * one. Note this also releases the lock we took.
{ */
CommitTransactionCommand(); CommitTransactionCommand();
StartTransactionCommand(); StartTransactionCommand();
/* StartTransactionCommand changed elsewhere */ /* StartTransactionCommand changed current memory context */
MemoryContextSwitchTo(AutovacMemCxt); MemoryContextSwitchTo(AutovacMemCxt);
list_free(pending_oids);
} }
/* /*
......
Markdown is supported
0% or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment