Commit a54e1f15 authored by Andres Freund's avatar Andres Freund

Fix bugs in vacuum of shared rels, by keeping their relcache entries current.

When vacuum processes a relation it uses the corresponding relcache
entry's relfrozenxid / relminmxid as a cutoff for when to remove
tuples etc. Unfortunately for nailed relations (i.e. critical system
catalogs) bugs could frequently lead to the corresponding relcache
entry being stale.

This set of bugs could cause actual data corruption as vacuum would
potentially not remove the correct row versions, potentially reviving
them at a later point.  After 699bf7d0 some corruptions in this vein
were prevented, but the additional error checks could also trigger
spuriously. Examples of such errors are:
  ERROR: found xmin ... from before relfrozenxid ...
and
  ERROR: found multixact ... from before relminmxid ...
To be caused by this bug the errors have to occur on system catalog
tables.

The two bugs are:

1) Invalidations for nailed relations were ignored, based on the
   theory that the relcache entry for such tables doesn't
   change. Which is largely true, except for fields like relfrozenxid
   etc.  This means that changes to relations vacuumed in other
   sessions weren't picked up by already existing sessions.  Luckily
   autovacuum doesn't have particularly longrunning sessions.

2) For shared *and* nailed relations, the shared relcache init file
   was never invalidated while running.  That means that for such
   tables (e.g. pg_authid, pg_database) it's not just already existing
   sessions that are affected, but even new connections are as well.
   That explains why the reports usually were about pg_authid et. al.

To fix 1), revalidate the rd_rel portion of a relcache entry when
invalid. This implies a bit of extra complexity to deal with
bootstrapping, but it's not too bad.  The fix for 2) is simpler,
simply always remove both the shared and local init files.

Author: Andres Freund
Reviewed-By: Alvaro Herrera
Discussion:
    https://postgr.es/m/20180525203736.crkbg36muzxrjj5e@alap3.anarazel.de
    https://postgr.es/m/CAMa1XUhKSJd98JW4o9StWPrfS=11bPgG+_GDMxe25TvUY4Sugg@mail.gmail.com
    https://postgr.es/m/CAKMFJucqbuoDRfxPDX39WhA3vJyxweRg_zDVXzncr6+5wOguWA@mail.gmail.com
    https://postgr.es/m/CAGewt-ujGpMLQ09gXcUFMZaZsGJC98VXHEFbF-tpPB0fB13K+A@mail.gmail.com
Backpatch: 9.3-
parent 8a07ebb3
......@@ -521,12 +521,12 @@ RegisterRelcacheInvalidation(Oid dbId, Oid relId)
(void) GetCurrentCommandId(true);
/*
* If the relation being invalidated is one of those cached in the local
* relcache init file, mark that we need to zap that file at commit. Same
* is true when we are invalidating whole relcache.
* If the relation being invalidated is one of those cached in a relcache
* init file, mark that we need to zap that file at commit. For simplicity
* invalidations for a specific database always invalidate the shared file
* as well. Also zap when we are invalidating whole relcache.
*/
if (OidIsValid(dbId) &&
(RelationIdIsInInitFile(relId) || relId == InvalidOid))
if (relId == InvalidOid || RelationIdIsInInitFile(relId))
transInvalInfo->RelcacheInitFileInval = true;
}
......@@ -881,18 +881,26 @@ ProcessCommittedInvalidationMessages(SharedInvalidationMessage *msgs,
if (RelcacheInitFileInval)
{
elog(trace_recovery(DEBUG4), "removing relcache init files for database %u",
dbid);
/*
* RelationCacheInitFilePreInvalidate requires DatabasePath to be set,
* but we should not use SetDatabasePath during recovery, since it is
* RelationCacheInitFilePreInvalidate, when the invalidation message
* is for a specific database, requires DatabasePath to be set, but we
* should not use SetDatabasePath during recovery, since it is
* intended to be used only once by normal backends. Hence, a quick
* hack: set DatabasePath directly then unset after use.
*/
DatabasePath = GetDatabasePath(dbid, tsid);
elog(trace_recovery(DEBUG4), "removing relcache init file in \"%s\"",
DatabasePath);
if (OidIsValid(dbid))
DatabasePath = GetDatabasePath(dbid, tsid);
RelationCacheInitFilePreInvalidate();
pfree(DatabasePath);
DatabasePath = NULL;
if (OidIsValid(dbid))
{
pfree(DatabasePath);
DatabasePath = NULL;
}
}
SendSharedInvalidMessages(msgs, nmsgs);
......
This diff is collapsed.
......@@ -64,7 +64,7 @@ typedef struct xl_invalidations
{
Oid dbId; /* MyDatabaseId */
Oid tsId; /* MyDatabaseTableSpace */
bool relcacheInitFileInval; /* invalidate relcache init file */
bool relcacheInitFileInval; /* invalidate relcache init files */
int nmsgs; /* number of shared inval msgs */
SharedInvalidationMessage msgs[FLEXIBLE_ARRAY_MEMBER];
} xl_invalidations;
......
Markdown is supported
0% or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment