Commit f912d7de authored by Tom Lane's avatar Tom Lane

Fix reindexing of pg_class indexes some more.

Commits 3dbb317d et al failed under CLOBBER_CACHE_ALWAYS testing.
Investigation showed that to reindex pg_class_oid_index, we must
suppress accesses to the index (via SetReindexProcessing) before we call
RelationSetNewRelfilenode, or at least before we do CommandCounterIncrement
therein; otherwise, relcache reloads happening within the CCI may try to
fetch pg_class rows using the index's new relfilenode value, which is as
yet an empty file.

Of course, the point of 3dbb317d was that that ordering didn't work
either, because then RelationSetNewRelfilenode's own update of the index's
pg_class row cannot access the index, should it need to.

There are various ways we might have got around that, but Andres Freund
came up with a brilliant solution: for a mapped index, we can really just
skip the pg_class update altogether.  The only fields it was actually
changing were relpages etc, but it was just setting them to zeroes which
is useless make-work.  (Correct new values will be installed at the end
of index build.)  All pg_class indexes are mapped and probably always will
be, so this eliminates the problem by removing work rather than adding it,
always a pleasant outcome.  Having taught RelationSetNewRelfilenode to do
it that way, we can revert the code reordering in reindex_index.  (But
I left the moved setup code where it was; there seems no reason why it
has to run without use of the old index.  If you're trying to fix a
busted pg_class index, you'll have had to disable system index use
altogether to get this far.)

Moreover, this means we don't need RelationSetIndexList at all, because
reindex_relation's hacking to make "REINDEX TABLE pg_class" work is
likewise now unnecessary.  We'll leave that code in place in the back
branches, but a follow-on patch will remove it in HEAD.

In passing, do some minor cleanup for commit 5c156060 (in HEAD only),
notably removing a duplicate newrnode assignment.

Patch by me, using a core idea due to Andres Freund.  Back-patch to all
supported branches, as 3dbb317d was.

Discussion: https://postgr.es/m/28926.1556664156@sss.pgh.pa.us
parent 2bf372a4
......@@ -3330,20 +3330,15 @@ reindex_index(Oid indexId, bool skip_constraint_checks, char persistence,
indexInfo->ii_ExclusionStrats = NULL;
}
/*
* Build a new physical relation for the index. Need to do that before
* "officially" starting the reindexing with SetReindexProcessing -
* otherwise the necessary pg_class changes cannot be made with
* encountering assertions.
*/
RelationSetNewRelfilenode(iRel, persistence);
/* ensure SetReindexProcessing state isn't leaked */
PG_TRY();
{
/* Suppress use of the target index while rebuilding it */
SetReindexProcessing(heapId, indexId);
/* Create a new physical relation for the index */
RelationSetNewRelfilenode(iRel, persistence);
/* Initialize the index and rebuild */
/* Note: we do not need to re-establish pkey setting */
index_build(heapRelation, iRel, indexInfo, true, true);
......@@ -3491,7 +3486,6 @@ reindex_relation(Oid relid, int flags, int options)
Relation rel;
Oid toast_relid;
List *indexIds;
bool is_pg_class;
bool result;
int i;
......@@ -3527,32 +3521,8 @@ reindex_relation(Oid relid, int flags, int options)
*/
indexIds = RelationGetIndexList(rel);
/*
* reindex_index will attempt to update the pg_class rows for the relation
* and index. If we are processing pg_class itself, we want to make sure
* that the updates do not try to insert index entries into indexes we
* have not processed yet. (When we are trying to recover from corrupted
* indexes, that could easily cause a crash.) We can accomplish this
* because CatalogTupleInsert/CatalogTupleUpdate will use the relcache's
* index list to know which indexes to update. We just force the index
* list to be only the stuff we've processed.
*
* It is okay to not insert entries into the indexes we have not processed
* yet because all of this is transaction-safe. If we fail partway
* through, the updated rows are dead and it doesn't matter whether they
* have index entries. Also, a new pg_class index will be created with a
* correct entry for its own pg_class row because we do
* RelationSetNewRelfilenode() before we do index_build().
*/
is_pg_class = (RelationGetRelid(rel) == RelationRelationId);
/* Ensure rd_indexattr is valid; see comments for RelationSetIndexList */
if (is_pg_class)
(void) RelationGetIndexAttrBitmap(rel, INDEX_ATTR_BITMAP_ALL);
PG_TRY();
{
List *doneIndexes;
ListCell *indexId;
char persistence;
......@@ -3580,15 +3550,11 @@ reindex_relation(Oid relid, int flags, int options)
persistence = rel->rd_rel->relpersistence;
/* Reindex all the indexes. */
doneIndexes = NIL;
i = 1;
foreach(indexId, indexIds)
{
Oid indexOid = lfirst_oid(indexId);
if (is_pg_class)
RelationSetIndexList(rel, doneIndexes);
reindex_index(indexOid, !(flags & REINDEX_REL_CHECK_CONSTRAINTS),
persistence, options);
......@@ -3597,9 +3563,6 @@ reindex_relation(Oid relid, int flags, int options)
/* Index should no longer be in the pending list */
Assert(!ReindexIsProcessingIndex(indexOid));
if (is_pg_class)
doneIndexes = lappend_oid(doneIndexes, indexOid);
/* Set index rebuild count */
pgstat_progress_update_param(PROGRESS_CLUSTER_INDEX_REBUILD_COUNT,
i);
......@@ -3615,9 +3578,6 @@ reindex_relation(Oid relid, int flags, int options)
PG_END_TRY();
ResetReindexPending();
if (is_pg_class)
RelationSetIndexList(rel, indexIds);
/*
* Close rel, but continue to hold the lock.
*/
......
......@@ -3421,7 +3421,8 @@ RelationBuildLocalRelation(const char *relname,
/*
* RelationSetNewRelfilenode
*
* Assign a new relfilenode (physical file name) to the relation.
* Assign a new relfilenode (physical file name), and possibly a new
* persistence setting, to the relation.
*
* This allows a full rewrite of the relation to be done with transactional
* safety (since the filenode assignment can be rolled back). Note however
......@@ -3463,13 +3464,10 @@ RelationSetNewRelfilenode(Relation relation, char persistence)
*/
RelationDropStorage(relation);
/* initialize new relfilenode from old relfilenode */
newrnode = relation->rd_node;
/*
* Create storage for the main fork of the new relfilenode. If it's
* table-like object, call into table AM to do so, which'll also create
* the table's init fork.
* Create storage for the main fork of the new relfilenode. If it's a
* table-like object, call into the table AM to do so, which'll also
* create the table's init fork if needed.
*
* NOTE: If relevant for the AM, any conflict in relfilenode value will be
* caught here, if GetNewRelFileNode messes up for any reason.
......@@ -3479,18 +3477,10 @@ RelationSetNewRelfilenode(Relation relation, char persistence)
switch (relation->rd_rel->relkind)
{
/* shouldn't be called for these */
case RELKIND_VIEW:
case RELKIND_COMPOSITE_TYPE:
case RELKIND_FOREIGN_TABLE:
case RELKIND_PARTITIONED_TABLE:
case RELKIND_PARTITIONED_INDEX:
elog(ERROR, "should not have storage");
break;
case RELKIND_INDEX:
case RELKIND_SEQUENCE:
{
/* handle these directly, at least for now */
SMgrRelation srel;
srel = RelationCreateStorage(newrnode, persistence);
......@@ -3505,41 +3495,76 @@ RelationSetNewRelfilenode(Relation relation, char persistence)
persistence,
&freezeXid, &minmulti);
break;
default:
/* we shouldn't be called for anything else */
elog(ERROR, "relation \"%s\" does not have storage",
RelationGetRelationName(relation));
break;
}
/*
* However, if we're dealing with a mapped index, pg_class.relfilenode
* doesn't change; instead we have to send the update to the relation
* mapper.
* If we're dealing with a mapped index, pg_class.relfilenode doesn't
* change; instead we have to send the update to the relation mapper.
*
* For mapped indexes, we don't actually change the pg_class entry at all;
* this is essential when reindexing pg_class itself. That leaves us with
* possibly-inaccurate values of relpages etc, but those will be fixed up
* later.
*/
if (RelationIsMapped(relation))
{
/* This case is only supported for indexes */
Assert(relation->rd_rel->relkind == RELKIND_INDEX);
/* Since we're not updating pg_class, these had better not change */
Assert(classform->relfrozenxid == freezeXid);
Assert(classform->relminmxid == minmulti);
Assert(classform->relpersistence == persistence);
/*
* In some code paths it's possible that the tuple update we'd
* otherwise do here is the only thing that would assign an XID for
* the current transaction. However, we must have an XID to delete
* files, so make sure one is assigned.
*/
(void) GetCurrentTransactionId();
/* Do the deed */
RelationMapUpdateMap(RelationGetRelid(relation),
newrelfilenode,
relation->rd_rel->relisshared,
false);
/* Since we're not updating pg_class, must trigger inval manually */
CacheInvalidateRelcache(relation);
}
else
{
/* Normal case, update the pg_class entry */
classform->relfilenode = newrelfilenode;
/* These changes are safe even for a mapped relation */
if (relation->rd_rel->relkind != RELKIND_SEQUENCE)
{
classform->relpages = 0; /* it's empty until further notice */
classform->reltuples = 0;
classform->relallvisible = 0;
}
classform->relfrozenxid = freezeXid;
classform->relminmxid = minmulti;
classform->relpersistence = persistence;
/* relpages etc. never change for sequences */
if (relation->rd_rel->relkind != RELKIND_SEQUENCE)
{
classform->relpages = 0; /* it's empty until further notice */
classform->reltuples = 0;
classform->relallvisible = 0;
}
classform->relfrozenxid = freezeXid;
classform->relminmxid = minmulti;
classform->relpersistence = persistence;
CatalogTupleUpdate(pg_class, &tuple->t_self, tuple);
CatalogTupleUpdate(pg_class, &tuple->t_self, tuple);
}
heap_freetuple(tuple);
table_close(pg_class, RowExclusiveLock);
/*
* Make the pg_class row change visible, as well as the relation map
* change if any. This will cause the relcache entry to get updated, too.
* Make the pg_class row change or relation map change visible. This will
* cause the relcache entry to get updated, too.
*/
CommandCounterIncrement();
......
Markdown is supported
0% or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment