Commit 3c840464 authored by Tom Lane

Fix assorted bugs in CREATE/DROP INDEX CONCURRENTLY.

Commit 8cb53654, which introduced DROP
INDEX CONCURRENTLY, managed to break CREATE INDEX CONCURRENTLY via a poor
choice of catalog state representation.  The pg_index state for an index
that's reached the final pre-drop stage was the same as the state for an
index just created by CREATE INDEX CONCURRENTLY.  This meant that the
(necessary) change to make RelationGetIndexList ignore about-to-die indexes
also made it ignore freshly-created indexes; which is catastrophic because
the latter do need to be considered in HOT-safety decisions.  Failure to
do so leads to incorrect index entries and subsequently wrong results from
queries depending on the concurrently-created index.

To fix, add an additional boolean column "indislive" to pg_index, so that
the freshly-created and about-to-die states can be distinguished.  (This
change obviously is only possible in HEAD.  This patch will need to be
back-patched, but in 9.2 we'll use a kluge consisting of overloading the
formerly-impossible state of indisvalid = true and indisready = false.)

In addition, change CREATE/DROP INDEX CONCURRENTLY so that the pg_index
flag changes they make without exclusive lock on the index are made via
heap_inplace_update() rather than a normal transactional update.  The
latter is not very safe because moving the pg_index tuple could result in
concurrent SnapshotNow scans finding it twice or not at all, thus possibly
resulting in index corruption.  This is a pre-existing bug in CREATE INDEX
CONCURRENTLY, which was copied into the DROP code.

In addition, fix various places in the code that ought to check to make
sure that the indexes they are manipulating are valid and/or ready as
appropriate.  These represent bugs that have existed since 8.2, since
a failed CREATE INDEX CONCURRENTLY could leave a corrupt or invalid
index behind, and we ought not try to do anything that might fail with
such an index.

Also fix RelationReloadIndexInfo to ensure it copies all the pg_index
columns that are allowed to change after initial creation.  Previously we
could have been left with stale values of some fields in an index relcache
entry.  It's not clear whether this actually had any user-visible
consequences, but it's at least a bug waiting to happen.

In addition, do some code and docs review for DROP INDEX CONCURRENTLY;
some cosmetic code cleanup but mostly addition and revision of comments.

This will need to be back-patched, but in a noticeably different form,
so I'm committing it to HEAD before working on the back-patch.

Problem reported by Amit Kapila, diagnosis by Pavan Deolassee,
fix by Tom Lane and Andres Freund.
parent 1577b46b
@@ -141,8 +141,8 @@ triggered_change_notification(PG_FUNCTION_ARGS)
         if (!HeapTupleIsValid(indexTuple)) /* should not happen */
             elog(ERROR, "cache lookup failed for index %u", indexoid);
         index = (Form_pg_index) GETSTRUCT(indexTuple);
-        /* we're only interested if it is the primary key */
-        if (index->indisprimary)
+        /* we're only interested if it is the primary key and valid */
+        if (index->indisprimary && IndexIsValid(index))
         {
             int numatts = index->indnatts;
......
@@ -3480,7 +3480,7 @@
        index is possibly incomplete: it must still be modified by
        <command>INSERT</>/<command>UPDATE</> operations, but it cannot safely
        be used for queries. If it is unique, the uniqueness property is not
-       true either.
+       guaranteed true either.
       </entry>
      </row>
@@ -3507,6 +3507,16 @@
       </entry>
      </row>
+     <row>
+      <entry><structfield>indislive</structfield></entry>
+      <entry><type>bool</type></entry>
+      <entry></entry>
+      <entry>
+       If false, the index is in process of being dropped, and should be
+       ignored for all purposes (including HOT-safety decisions)
+      </entry>
+     </row>
      <row>
       <entry><structfield>indkey</structfield></entry>
       <entry><type>int2vector</type></entry>
......
@@ -40,34 +40,33 @@ DROP INDEX [ CONCURRENTLY ] [ IF EXISTS ] <replaceable class="PARAMETER">name</r
   <variablelist>
    <varlistentry>
-    <term><literal>IF EXISTS</literal></term>
+    <term><literal>CONCURRENTLY</literal></term>
     <listitem>
      <para>
-      Do not throw an error if the index does not exist. A notice is issued
-      in this case.
+      Drop the index without locking out concurrent selects, inserts, updates,
+      and deletes on the index's table. A normal <command>DROP INDEX</>
+      acquires exclusive lock on the table, blocking other accesses until the
+      index drop can be completed. With this option, the command instead
+      waits until conflicting transactions have completed.
+     </para>
+     <para>
+      There are several caveats to be aware of when using this option.
+      Only one index name can be specified, and the <literal>CASCADE</> option
+      is not supported. (Thus, an index that supports a <literal>UNIQUE</> or
+      <literal>PRIMARY KEY</> constraint cannot be dropped this way.)
+      Also, regular <command>DROP INDEX</> commands can be
+      performed within a transaction block, but
+      <command>DROP INDEX CONCURRENTLY</> cannot.
      </para>
     </listitem>
    </varlistentry>
    <varlistentry>
-    <term><literal>CONCURRENTLY</literal></term>
+    <term><literal>IF EXISTS</literal></term>
     <listitem>
      <para>
-      When this option is used, <productname>PostgreSQL</> will drop the
-      index without taking any locks that prevent concurrent selects, inserts,
-      updates, or deletes on the table; whereas a standard index drop
-      waits for a lock that locks out everything on the table until it's done.
-      Concurrent drop index is a two stage process. First, we mark the index
-      both invalid and not ready then commit the change. Next we wait until
-      there are no users locking the table who can see the index.
-     </para>
-     <para>
-      There are several caveats to be aware of when using this option.
-      Only one index name can be specified if the <literal>CONCURRENTLY</literal>
-      parameter is specified. Regular <command>DROP INDEX</> command can be
-      performed within a transaction block, but
-      <command>DROP INDEX CONCURRENTLY</> cannot.
-      The CASCADE option is not supported when dropping an index concurrently.
+      Do not throw an error if the index does not exist. A notice is issued
+      in this case.
      </para>
     </listitem>
    </varlistentry>
......
@@ -386,6 +386,34 @@ from the index, as well as ensuring that no one can see any inconsistent
 rows in a broken HOT chain (the first condition is stronger than the
 second). Finally, we can mark the index valid for searches.
+Note that we do not need to set pg_index.indcheckxmin in this code path,
+because we have outwaited any transactions that would need to avoid using
+the index. (indcheckxmin is only needed because non-concurrent CREATE
+INDEX doesn't want to wait; its stronger lock would create too much risk of
+deadlock if it did.)
+
+
+DROP INDEX CONCURRENTLY
+-----------------------
+
+DROP INDEX CONCURRENTLY is sort of the reverse sequence of CREATE INDEX
+CONCURRENTLY. We first mark the index as not indisvalid, and then wait for
+any transactions that could be using it in queries to end. (During this
+time, index updates must still be performed as normal, since such
+transactions might expect freshly inserted tuples to be findable.)
+Then, we clear indisready and indislive, and again wait for transactions
+that could be updating the index to end. Finally we can drop the index
+normally (though taking only ShareUpdateExclusiveLock on its parent table).
+
+The reason we need the pg_index.indislive flag is that after the second
+wait step begins, we don't want transactions to be touching the index at
+all; otherwise they might suffer errors if the DROP finally commits while
+they are reading catalog entries for the index. If we had only indisvalid
+and indisready, this state would be indistinguishable from the first stage
+of CREATE INDEX CONCURRENTLY --- but in that state, we *do* want
+transactions to examine the index, since they must consider it in
+HOT-safety checks.
+
 Limitations and Restrictions
 ----------------------------
......
@@ -995,7 +995,6 @@ deleteOneObject(const ObjectAddress *object, Relation depRel, int flags)
     int nkeys;
     SysScanDesc scan;
     HeapTuple tup;
-    Oid depRelOid = depRel->rd_id;
     /* DROP hook of the objects being removed */
     if (object_access_hook)
@@ -1008,9 +1007,9 @@ deleteOneObject(const ObjectAddress *object, Relation depRel, int flags)
     }
     /*
-     * Close depRel if we are doing a drop concurrently. The individual
-     * deletion has to commit the transaction and we don't want dangling
-     * references.
+     * Close depRel if we are doing a drop concurrently. The object deletion
+     * subroutine will commit the current transaction, so we can't keep the
+     * relation open across doDeletion().
      */
     if (flags & PERFORM_DELETION_CONCURRENTLY)
         heap_close(depRel, RowExclusiveLock);
@@ -1018,24 +1017,23 @@ deleteOneObject(const ObjectAddress *object, Relation depRel, int flags)
     /*
      * Delete the object itself, in an object-type-dependent way.
      *
-     * Do this before removing outgoing dependencies as deletions can be
-     * happening in concurrent mode. That will only happen for a single object
-     * at once and if so the object will be invalidated inside a separate
-     * transaction and only dropped inside a transaction thats in-progress when
-     * doDeletion returns. This way no observer can see dangling dependency
-     * entries.
+     * We used to do this after removing the outgoing dependency links, but it
+     * seems just as reasonable to do it beforehand. In the concurrent case
+     * we *must* do it in this order, because we can't make any transactional
+     * updates before calling doDeletion() --- they'd get committed right
+     * away, which is not cool if the deletion then fails.
      */
     doDeletion(object, flags);
     /*
-     * Reopen depRel if we closed it before
+     * Reopen depRel if we closed it above
      */
     if (flags & PERFORM_DELETION_CONCURRENTLY)
-        depRel = heap_open(depRelOid, RowExclusiveLock);
+        depRel = heap_open(DependRelationId, RowExclusiveLock);
     /*
-     * Then remove any pg_depend records that link from this object to
-     * others. (Any records linking to this object should be gone already.)
+     * Now remove any pg_depend records that link from this object to others.
+     * (Any records linking to this object should be gone already.)
      *
      * When dropping a whole object (subId = 0), remove all pg_depend records
      * for its sub-objects too.
@@ -1258,15 +1256,23 @@ AcquireDeletionLock(const ObjectAddress *object, int flags)
 {
     if (object->classId == RelationRelationId)
     {
+        /*
+         * In DROP INDEX CONCURRENTLY, take only ShareUpdateExclusiveLock on
+         * the index for the moment. index_drop() will promote the lock once
+         * it's safe to do so. In all other cases we need full exclusive
+         * lock.
+         */
         if (flags & PERFORM_DELETION_CONCURRENTLY)
             LockRelationOid(object->objectId, ShareUpdateExclusiveLock);
         else
             LockRelationOid(object->objectId, AccessExclusiveLock);
     }
     else
+    {
         /* assume we should lock the whole object not a sub-object */
         LockDatabaseObject(object->classId, object->objectId, 0,
                            AccessExclusiveLock);
+    }
 }
 /*
......
@@ -444,7 +444,7 @@ check_index_is_clusterable(Relation OldHeap, Oid indexOid, bool recheck, LOCKMOD
      * might put recently-dead tuples out-of-order in the new table, and there
      * is little harm in that.)
      */
-    if (!OldIndex->rd_index->indisvalid)
+    if (!IndexIsValid(OldIndex->rd_index))
         ereport(ERROR,
                 (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                  errmsg("cannot cluster on invalid index \"%s\"",
@@ -458,6 +458,11 @@ check_index_is_clusterable(Relation OldHeap, Oid indexOid, bool recheck, LOCKMOD
  * mark_index_clustered: mark the specified index as the one clustered on
  *
  * With indexOid == InvalidOid, will mark all indexes of rel not-clustered.
+ *
+ * Note: we do transactional updates of the pg_index rows, which are unsafe
+ * against concurrent SnapshotNow scans of pg_index. Therefore this is unsafe
+ * to execute with less than full exclusive lock on the parent table;
+ * otherwise concurrent executions of RelationGetIndexList could miss indexes.
  */
 void
 mark_index_clustered(Relation rel, Oid indexOid)
@@ -513,6 +518,9 @@ mark_index_clustered(Relation rel, Oid indexOid)
         }
         else if (thisIndexOid == indexOid)
         {
+            /* this was checked earlier, but let's be real sure */
+            if (!IndexIsValid(indexForm))
+                elog(ERROR, "cannot cluster on invalid index %u", indexOid);
             indexForm->indisclustered = true;
             simple_heap_update(pg_index, &indexTuple->t_self, indexTuple);
             CatalogUpdateIndexes(pg_index, indexTuple);
......
@@ -124,6 +124,7 @@ CheckIndexCompatible(Oid oldId,
     Oid accessMethodId;
     Oid relationId;
     HeapTuple tuple;
+    Form_pg_index indexForm;
     Form_pg_am accessMethodForm;
     bool amcanorder;
     int16 *coloptions;
@@ -193,17 +194,22 @@ CheckIndexCompatible(Oid oldId,
     tuple = SearchSysCache1(INDEXRELID, ObjectIdGetDatum(oldId));
     if (!HeapTupleIsValid(tuple))
         elog(ERROR, "cache lookup failed for index %u", oldId);
+    indexForm = (Form_pg_index) GETSTRUCT(tuple);
-    /* We don't assess expressions or predicates; assume incompatibility. */
+    /*
+     * We don't assess expressions or predicates; assume incompatibility.
+     * Also, if the index is invalid for any reason, treat it as incompatible.
+     */
     if (!(heap_attisnull(tuple, Anum_pg_index_indpred) &&
-          heap_attisnull(tuple, Anum_pg_index_indexprs)))
+          heap_attisnull(tuple, Anum_pg_index_indexprs) &&
+          IndexIsValid(indexForm)))
     {
         ReleaseSysCache(tuple);
         return false;
     }
     /* Any change in operator class or collation breaks compatibility. */
-    old_natts = ((Form_pg_index) GETSTRUCT(tuple))->indnatts;
+    old_natts = indexForm->indnatts;
     Assert(old_natts == numberOfAttributes);
     d = SysCacheGetAttr(INDEXRELID, tuple, Anum_pg_index_indcollation, &isnull);
@@ -320,9 +326,6 @@ DefineIndex(IndexStmt *stmt,
     LockRelId heaprelid;
     LOCKTAG heaplocktag;
     Snapshot snapshot;
-    Relation pg_index;
-    HeapTuple indexTuple;
-    Form_pg_index indexForm;
     int i;
     /*
@@ -717,23 +720,7 @@ DefineIndex(IndexStmt *stmt,
      * commit this transaction, any new transactions that open the table must
      * insert new entries into the index for insertions and non-HOT updates.
      */
-    pg_index = heap_open(IndexRelationId, RowExclusiveLock);
-    indexTuple = SearchSysCacheCopy1(INDEXRELID,
-                                     ObjectIdGetDatum(indexRelationId));
-    if (!HeapTupleIsValid(indexTuple))
-        elog(ERROR, "cache lookup failed for index %u", indexRelationId);
-    indexForm = (Form_pg_index) GETSTRUCT(indexTuple);
-    Assert(!indexForm->indisready);
-    Assert(!indexForm->indisvalid);
-    indexForm->indisready = true;
-    simple_heap_update(pg_index, &indexTuple->t_self, indexTuple);
-    CatalogUpdateIndexes(pg_index, indexTuple);
-    heap_close(pg_index, RowExclusiveLock);
+    index_set_state_flags(indexRelationId, INDEX_CREATE_SET_READY);
     /* we can do away with our snapshot */
     PopActiveSnapshot();
@@ -857,23 +844,7 @@ DefineIndex(IndexStmt *stmt,
     /*
      * Index can now be marked valid -- update its pg_index entry
      */
-    pg_index = heap_open(IndexRelationId, RowExclusiveLock);
-    indexTuple = SearchSysCacheCopy1(INDEXRELID,
-                                     ObjectIdGetDatum(indexRelationId));
-    if (!HeapTupleIsValid(indexTuple))
-        elog(ERROR, "cache lookup failed for index %u", indexRelationId);
-    indexForm = (Form_pg_index) GETSTRUCT(indexTuple);
-    Assert(indexForm->indisready);
-    Assert(!indexForm->indisvalid);
-    indexForm->indisvalid = true;
-    simple_heap_update(pg_index, &indexTuple->t_self, indexTuple);
-    CatalogUpdateIndexes(pg_index, indexTuple);
-    heap_close(pg_index, RowExclusiveLock);
+    index_set_state_flags(indexRelationId, INDEX_CREATE_SET_VALID);
     /*
      * The pg_index update will cause backends (including this one) to update
@@ -881,7 +852,7 @@ DefineIndex(IndexStmt *stmt,
      * relcache inval on the parent table to force replanning of cached plans.
      * Otherwise existing sessions might fail to use the new index where it
      * would be useful. (Note that our earlier commits did not create reasons
-     * to replan; relcache flush on the index itself was sufficient.)
+     * to replan; so relcache flush on the index itself was sufficient.)
      */
     CacheInvalidateRelcacheByRelid(heaprelid.relId);
......
@@ -744,10 +744,13 @@ RemoveRelations(DropStmt *drop)
     int flags = 0;
     LOCKMODE lockmode = AccessExclusiveLock;
+    /* DROP CONCURRENTLY uses a weaker lock, and has some restrictions */
     if (drop->concurrent)
     {
+        flags |= PERFORM_DELETION_CONCURRENTLY;
         lockmode = ShareUpdateExclusiveLock;
-        if (list_length(drop->objects) > 1)
+        Assert(drop->removeType == OBJECT_INDEX);
+        if (list_length(drop->objects) != 1)
             ereport(ERROR,
                     (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                      errmsg("DROP INDEX CONCURRENTLY does not support dropping multiple objects")));
@@ -839,19 +842,6 @@ RemoveRelations(DropStmt *drop)
         add_exact_object_address(&obj, objects);
     }
-    /*
-     * Set options and check further requirements for concurrent drop
-     */
-    if (drop->concurrent)
-    {
-        /*
-         * Confirm that concurrent behaviour is restricted in grammar.
-         */
-        Assert(drop->removeType == OBJECT_INDEX);
-        flags |= PERFORM_DELETION_CONCURRENTLY;
-    }
     performMultipleDeletions(objects, drop->behavior, flags);
     free_object_addresses(objects);
@@ -918,7 +908,7 @@ RangeVarCallbackForDropRelation(const RangeVar *rel, Oid relOid, Oid oldRelOid,
      * locking the index. index_drop() will need this anyway, and since
      * regular queries lock tables before their indexes, we risk deadlock if
      * we do it the other way around. No error if we don't find a pg_index
-     * entry, though --- the relation may have been droppd.
+     * entry, though --- the relation may have been dropped.
      */
     if (relkind == RELKIND_INDEX && relOid != oldRelOid)
     {
@@ -4784,6 +4774,8 @@ ATExecDropNotNull(Relation rel, const char *colName, LOCKMODE lockmode)
     /*
      * Check that the attribute is not in a primary key
+     *
+     * Note: we'll throw error even if the pkey index is not valid.
      */
     /* Loop over all indexes on the relation */
@@ -6318,7 +6310,7 @@ transformFkeyGetPrimaryKey(Relation pkrel, Oid *indexOid,
     /*
      * Get the list of index OIDs for the table from the relcache, and look up
      * each one in the pg_index syscache until we find one marked primary key
-     * (hopefully there isn't more than one such).
+     * (hopefully there isn't more than one such). Insist it's valid, too.
      */
     *indexOid = InvalidOid;
@@ -6332,7 +6324,7 @@ transformFkeyGetPrimaryKey(Relation pkrel, Oid *indexOid,
         if (!HeapTupleIsValid(indexTuple))
             elog(ERROR, "cache lookup failed for index %u", indexoid);
         indexStruct = (Form_pg_index) GETSTRUCT(indexTuple);
-        if (indexStruct->indisprimary)
+        if (indexStruct->indisprimary && IndexIsValid(indexStruct))
         {
             /*
              * Refuse to use a deferrable primary key. This is per SQL spec,
@@ -6430,10 +6422,12 @@ transformFkeyCheckAttrs(Relation pkrel,
         /*
          * Must have the right number of columns; must be unique and not a
-         * partial index; forget it if there are any expressions, too
+         * partial index; forget it if there are any expressions, too. Invalid
+         * indexes are out as well.
          */
         if (indexStruct->indnatts == numattrs &&
             indexStruct->indisunique &&
+            IndexIsValid(indexStruct) &&
             heap_attisnull(indexTuple, Anum_pg_index_indpred) &&
             heap_attisnull(indexTuple, Anum_pg_index_indexprs))
         {
......
@@ -1097,9 +1097,16 @@ vacuum_rel(Oid relid, VacuumStmt *vacstmt, bool do_toast, bool for_wraparound)
 /*
- * Open all the indexes of the given relation, obtaining the specified kind
- * of lock on each. Return an array of Relation pointers for the indexes
- * into *Irel, and the number of indexes into *nindexes.
+ * Open all the vacuumable indexes of the given relation, obtaining the
+ * specified kind of lock on each. Return an array of Relation pointers for
+ * the indexes into *Irel, and the number of indexes into *nindexes.
+ *
+ * We consider an index vacuumable if it is marked insertable (IndexIsReady).
+ * If it isn't, probably a CREATE INDEX CONCURRENTLY command failed early in
+ * execution, and what we have is too corrupt to be processable. We will
+ * vacuum even if the index isn't indisvalid; this is important because in a
+ * unique index, uniqueness checks will be performed anyway and had better not
+ * hit dangling index pointers.
  */
 void
 vac_open_indexes(Relation relation, LOCKMODE lockmode,
@@ -1113,21 +1120,30 @@ vac_open_indexes(Relation relation, LOCKMODE lockmode,
     indexoidlist = RelationGetIndexList(relation);
-    *nindexes = list_length(indexoidlist);
+    /* allocate enough memory for all indexes */
+    i = list_length(indexoidlist);
-    if (*nindexes > 0)
-        *Irel = (Relation *) palloc(*nindexes * sizeof(Relation));
+    if (i > 0)
+        *Irel = (Relation *) palloc(i * sizeof(Relation));
     else
         *Irel = NULL;
+    /* collect just the ready indexes */
     i = 0;
     foreach(indexoidscan, indexoidlist)
     {
         Oid indexoid = lfirst_oid(indexoidscan);
+        Relation indrel;
-        (*Irel)[i++] = index_open(indexoid, lockmode);
+        indrel = index_open(indexoid, lockmode);
+        if (IndexIsReady(indrel->rd_index))
+            (*Irel)[i++] = indrel;
+        else
+            index_close(indrel, lockmode);
     }
+    *nindexes = i;
     list_free(indexoidlist);
 }
......
@@ -906,6 +906,9 @@ ExecOpenIndices(ResultRelInfo *resultRelInfo)
     /*
      * For each index, open the index relation and save pg_index info. We
      * acquire RowExclusiveLock, signifying we will update the index.
+     *
+     * Note: we do this even if the index is not IndexIsReady; it's not worth
+     * the trouble to optimize for the case where it isn't.
      */
     i = 0;
     foreach(l, indexoidlist)
......
@@ -170,9 +170,10 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
              * Ignore invalid indexes, since they can't safely be used for
              * queries. Note that this is OK because the data structure we
              * are constructing is only used by the planner --- the executor
-             * still needs to insert into "invalid" indexes!
+             * still needs to insert into "invalid" indexes, if they're marked
+             * IndexIsReady.
              */
-            if (!index->indisvalid)
+            if (!IndexIsValid(index))
             {
                 index_close(indexRelation, NoLock);
                 continue;
......
@@ -1533,18 +1533,12 @@ transformIndexConstraint(Constraint *constraint, CreateStmtContext *cxt)
                             index_name, RelationGetRelationName(heap_rel)),
                      parser_errposition(cxt->pstate, constraint->location)));
-        if (!index_form->indisvalid)
+        if (!IndexIsValid(index_form))
             ereport(ERROR,
                     (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                      errmsg("index \"%s\" is not valid", index_name),
                      parser_errposition(cxt->pstate, constraint->location)));
-        if (!index_form->indisready)
-            ereport(ERROR,
-                    (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
-                     errmsg("index \"%s\" is not ready", index_name),
-                     parser_errposition(cxt->pstate, constraint->location)));
         if (!index_form->indisunique)
             ereport(ERROR,
                     (errcode(ERRCODE_WRONG_OBJECT_TYPE),
......
...@@ -1731,9 +1731,23 @@ RelationReloadIndexInfo(Relation relation) ...@@ -1731,9 +1731,23 @@ RelationReloadIndexInfo(Relation relation)
RelationGetRelid(relation)); RelationGetRelid(relation));
index = (Form_pg_index) GETSTRUCT(tuple); index = (Form_pg_index) GETSTRUCT(tuple);
/*
* Basically, let's just copy all the bool fields. There are one or
* two of these that can't actually change in the current code, but
* it's not worth it to track exactly which ones they are. None of
* the array fields are allowed to change, though.
*/
relation->rd_index->indisunique = index->indisunique;
relation->rd_index->indisprimary = index->indisprimary;
relation->rd_index->indisexclusion = index->indisexclusion;
relation->rd_index->indimmediate = index->indimmediate;
relation->rd_index->indisclustered = index->indisclustered;
relation->rd_index->indisvalid = index->indisvalid; relation->rd_index->indisvalid = index->indisvalid;
relation->rd_index->indcheckxmin = index->indcheckxmin; relation->rd_index->indcheckxmin = index->indcheckxmin;
relation->rd_index->indisready = index->indisready; relation->rd_index->indisready = index->indisready;
relation->rd_index->indislive = index->indislive;
/* Copy xmin too, as that is needed to make sense of indcheckxmin */
HeapTupleHeaderSetXmin(relation->rd_indextuple->t_data, HeapTupleHeaderSetXmin(relation->rd_indextuple->t_data,
HeapTupleHeaderGetXmin(tuple->t_data)); HeapTupleHeaderGetXmin(tuple->t_data));
@@ -3299,6 +3313,10 @@ CheckConstraintFetch(Relation relation)
  * so that we must recompute the index list on next request. This handles
  * creation or deletion of an index.
  *
+ * Indexes that are marked not IndexIsLive are omitted from the returned list.
+ * Such indexes are expected to be dropped momentarily, and should not be
+ * touched at all by any caller of this function.
+ *
  * The returned list is guaranteed to be sorted in order by OID. This is
  * needed by the executor, since for index types that we obtain exclusive
  * locks on when updating the index, all backends must lock the indexes in
@@ -3358,9 +3376,12 @@ RelationGetIndexList(Relation relation)
         bool        isnull;
         /*
-         * Ignore any indexes that are currently being dropped
+         * Ignore any indexes that are currently being dropped. This will
+         * prevent them from being searched, inserted into, or considered in
+         * HOT-safety decisions. It's unsafe to touch such an index at all
+         * since its catalog entries could disappear at any instant.
          */
-        if (!index->indisvalid && !index->indisready)
+        if (!IndexIsLive(index))
             continue;
         /* Add index's OID to result list in the proper order */
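The effect of swapping the filter condition in the hunk above can be checked with a small standalone sketch. `MockIndexFlags`, `old_filter_skips`, `new_filter_skips`, and `demo_filters` are hypothetical names invented for illustration, not PostgreSQL code; the flag values assigned to each state follow the commit message (a freshly created concurrent index and an about-to-die index both had `indisvalid` and `indisready` false before this patch).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the three pg_index state flags. */
typedef struct MockIndexFlags
{
    bool indisvalid;
    bool indisready;
    bool indislive;
} MockIndexFlags;

/* Pre-patch condition: skip the index when it is neither valid nor ready. */
bool
old_filter_skips(const MockIndexFlags *f)
{
    return !f->indisvalid && !f->indisready;
}

/* Post-patch condition: skip the index only when it is not live. */
bool
new_filter_skips(const MockIndexFlags *f)
{
    return !f->indislive;
}

/* Walk both states through both filters. */
void
demo_filters(void)
{
    MockIndexFlags fresh = {false, false, true};   /* just created by CIC */
    MockIndexFlags dying = {false, false, false};  /* final pre-drop stage */

    /* The old test cannot tell the two states apart and drops both. */
    assert(old_filter_skips(&fresh));
    assert(old_filter_skips(&dying));

    /* The new test keeps the fresh index and drops only the dying one. */
    assert(!new_filter_skips(&fresh));
    assert(new_filter_skips(&dying));
}
```

Calling `demo_filters()` from a test program runs through without tripping an assertion, which matches the failure mode described in the commit message: under the old condition the fresh index would have been left out of the index list.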
@@ -3379,7 +3400,8 @@ RelationGetIndexList(Relation relation)
         indclass = (oidvector *) DatumGetPointer(indclassDatum);
         /* Check to see if it is a unique, non-partial btree index on OID */
-        if (index->indnatts == 1 &&
+        if (IndexIsValid(index) &&
+            index->indnatts == 1 &&
             index->indisunique && index->indimmediate &&
             index->indkey.values[0] == ObjectIdAttributeNumber &&
             indclass->values[0] == OID_BTREE_OPS_OID &&
@@ -3674,6 +3696,13 @@ RelationGetIndexAttrBitmap(Relation relation)
     /*
      * For each index, add referenced attributes to indexattrs.
+     *
+     * Note: we consider all indexes returned by RelationGetIndexList, even if
+     * they are not indisready or indisvalid. This is important because an
+     * index for which CREATE INDEX CONCURRENTLY has just started must be
+     * included in HOT-safety decisions (see README.HOT). If a DROP INDEX
+     * CONCURRENTLY is far enough along that we should ignore the index, it
+     * won't be returned at all by RelationGetIndexList.
      */
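The note in the comment above can be illustrated with a simplified model. The sketch below is an assumption-laden stand-in: it uses a plain `uint32_t` bitmask where the real code uses a Bitmapset, and `MockIndex`, `collect_index_attrs`, `update_is_hot_safe`, and `demo_hot_safety` are hypothetical names. It shows only the rule that every live index contributes its columns, whether or not it is ready or valid.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of an index: state flags plus a mask of indexed columns. */
typedef struct MockIndex
{
    bool     indisvalid;
    bool     indisready;
    bool     indislive;
    uint32_t attrs;          /* bit i set means column i is indexed */
} MockIndex;

/* Union the columns of every live index, ignoring indisready/indisvalid. */
uint32_t
collect_index_attrs(const MockIndex *indexes, size_t n)
{
    uint32_t result = 0;

    for (size_t i = 0; i < n; i++)
    {
        if (!indexes[i].indislive)
            continue;        /* such an index is not in the index list at all */
        result |= indexes[i].attrs;
    }
    return result;
}

/* In this model, an update may be HOT only if it changes no indexed column. */
bool
update_is_hot_safe(uint32_t indexed, uint32_t modified)
{
    return (indexed & modified) == 0;
}

void
demo_hot_safety(void)
{
    const MockIndex indexes[] = {
        {true,  true,  true,  1u << 0},   /* ordinary valid index on column 0 */
        {false, false, true,  1u << 2},   /* CIC just started, column 2 */
        {false, false, false, 1u << 3},   /* being dropped, column 3 */
    };
    uint32_t indexed = collect_index_attrs(indexes, 3);

    assert(indexed == ((1u << 0) | (1u << 2)));
    assert(!update_is_hot_safe(indexed, 1u << 2));  /* fresh index blocks HOT */
    assert(update_is_hot_safe(indexed, 1u << 3));   /* dying index is ignored */
}
```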
     indexattrs = NULL;
     foreach(l, indexoidlist)
......
@@ -53,6 +53,6 @@
  */
 /*                          yyyymmddN */
-#define CATALOG_VERSION_NO  201210071
+#define CATALOG_VERSION_NO  201211281
 #endif
@@ -27,6 +27,15 @@ typedef void (*IndexBuildCallback) (Relation index,
                                     bool tupleIsAlive,
                                     void *state);
+/* Action code for index_set_state_flags */
+typedef enum
+{
+    INDEX_CREATE_SET_READY,
+    INDEX_CREATE_SET_VALID,
+    INDEX_DROP_CLEAR_VALID,
+    INDEX_DROP_SET_DEAD
+} IndexStateFlagsAction;
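The body of `index_set_state_flags` is not part of this excerpt, so the following is only a guess at the state machine the four action codes imply, inferred from their names and from the commit message. `MockIndexFlags`, `mock_set_state_flags`, and `demo_lifecycle` are hypothetical; the enum is repeated so the sketch compiles on its own, and the preconditions expressed with `assert` are assumptions. The in-place catalog update that the real function performs is not modeled.

```c
#include <assert.h>
#include <stdbool.h>

/* Repeated from the diff so the sketch is self-contained. */
typedef enum
{
    INDEX_CREATE_SET_READY,
    INDEX_CREATE_SET_VALID,
    INDEX_DROP_CLEAR_VALID,
    INDEX_DROP_SET_DEAD
} IndexStateFlagsAction;

/* Hypothetical stand-in for the pg_index state flags. */
typedef struct MockIndexFlags
{
    bool indisvalid;
    bool indisready;
    bool indislive;
} MockIndexFlags;

/* Assumed transitions; the assert() preconditions are guesses, not source. */
void
mock_set_state_flags(MockIndexFlags *f, IndexStateFlagsAction action)
{
    switch (action)
    {
        case INDEX_CREATE_SET_READY:
            assert(f->indislive && !f->indisready && !f->indisvalid);
            f->indisready = true;
            break;
        case INDEX_CREATE_SET_VALID:
            assert(f->indislive && f->indisready && !f->indisvalid);
            f->indisvalid = true;
            break;
        case INDEX_DROP_CLEAR_VALID:
            f->indisvalid = false;
            break;
        case INDEX_DROP_SET_DEAD:
            assert(!f->indisvalid);
            f->indisready = false;
            f->indislive = false;
            break;
    }
}

/* Drive one index through a concurrent create followed by a concurrent drop. */
void
demo_lifecycle(void)
{
    MockIndexFlags f = {false, false, true};    /* freshly created */

    mock_set_state_flags(&f, INDEX_CREATE_SET_READY);
    mock_set_state_flags(&f, INDEX_CREATE_SET_VALID);
    assert(f.indisvalid && f.indisready && f.indislive);

    mock_set_state_flags(&f, INDEX_DROP_CLEAR_VALID);
    mock_set_state_flags(&f, INDEX_DROP_SET_DEAD);
    assert(!f.indisvalid && !f.indisready && !f.indislive);
}
```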
 extern void index_check_primary_key(Relation heapRel,
                         IndexInfo *indexInfo,
@@ -90,6 +99,8 @@ extern double IndexBuildHeapScan(Relation heapRelation,
 extern void validate_index(Oid heapId, Oid indexId, Snapshot snapshot);
+extern void index_set_state_flags(Oid indexId, IndexStateFlagsAction action);
 extern void reindex_index(Oid indexId, bool skip_constraint_checks);
 /* Flag bits for reindex_relation(): */
......
@@ -41,6 +41,7 @@ CATALOG(pg_index,2610) BKI_WITHOUT_OIDS BKI_SCHEMA_MACRO
     bool        indisvalid;     /* is this index valid for use by queries? */
     bool        indcheckxmin;   /* must we wait for xmin to be old? */
     bool        indisready;     /* is this index ready for inserts? */
+    bool        indislive;      /* is this index alive at all? */
     /* variable-length fields start here, but we allow direct access to indkey */
     int2vector  indkey;         /* column numbers of indexed cols, or 0 */
@@ -68,7 +69,7 @@ typedef FormData_pg_index *Form_pg_index;
  *      compiler constants for pg_index
  * ----------------
  */
-#define Natts_pg_index                  17
+#define Natts_pg_index                  18
 #define Anum_pg_index_indexrelid        1
 #define Anum_pg_index_indrelid          2
 #define Anum_pg_index_indnatts          3
@@ -80,12 +81,13 @@ typedef FormData_pg_index *Form_pg_index;
 #define Anum_pg_index_indisvalid        9
 #define Anum_pg_index_indcheckxmin      10
 #define Anum_pg_index_indisready        11
-#define Anum_pg_index_indkey            12
-#define Anum_pg_index_indcollation      13
-#define Anum_pg_index_indclass          14
-#define Anum_pg_index_indoption         15
-#define Anum_pg_index_indexprs          16
-#define Anum_pg_index_indpred           17
+#define Anum_pg_index_indislive         12
+#define Anum_pg_index_indkey            13
+#define Anum_pg_index_indcollation      14
+#define Anum_pg_index_indclass          15
+#define Anum_pg_index_indoption         16
+#define Anum_pg_index_indexprs          17
+#define Anum_pg_index_indpred           18
 /*
  * Index AMs that support ordered scans must support these two indoption
@@ -95,4 +97,13 @@ typedef FormData_pg_index *Form_pg_index;
 #define INDOPTION_DESC          0x0001  /* values are in reverse order */
 #define INDOPTION_NULLS_FIRST   0x0002  /* NULLs are first instead of last */
+/*
+ * Use of these macros is recommended over direct examination of the state
+ * flag columns where possible; this allows source code compatibility with
+ * the hacky representation used in 9.2.
+ */
+#define IndexIsValid(indexForm) ((indexForm)->indisvalid)
+#define IndexIsReady(indexForm) ((indexForm)->indisready)
+#define IndexIsLive(indexForm)  ((indexForm)->indislive)
 #endif   /* PG_INDEX_H */
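The macro comment refers to the 9.2 representation, which the commit message describes as overloading the formerly impossible state `indisvalid = true`, `indisready = false` to mean "about to die". The back-patched definitions are not shown here, so the macros below are one hypothetical encoding consistent with that description, not the actual 9.2 code. `MockIndexFlags92`, the `Mock...92` macros, and `demo_92_encoding` are invented names.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical two-flag form, as in a catalog without indislive. */
typedef struct MockIndexFlags92
{
    bool indisvalid;
    bool indisready;
} MockIndexFlags92;

/* Assumed decoding: valid=true with ready=false is reserved for "dead". */
#define MockIndexIsValid92(f) ((f)->indisvalid && (f)->indisready)
#define MockIndexIsReady92(f) ((f)->indisready)
#define MockIndexIsLive92(f)  ((f)->indisready || !(f)->indisvalid)

/* Check that the four flag combinations decode to four distinct states. */
void
demo_92_encoding(void)
{
    MockIndexFlags92 fresh = {false, false};  /* just created concurrently */
    MockIndexFlags92 ready = {false, true};   /* accepting inserts */
    MockIndexFlags92 valid = {true,  true};   /* usable by queries */
    MockIndexFlags92 dead  = {true,  false};  /* overloaded "about to die" */

    assert(MockIndexIsLive92(&fresh) && !MockIndexIsReady92(&fresh));
    assert(MockIndexIsLive92(&ready) && MockIndexIsReady92(&ready));
    assert(!MockIndexIsValid92(&ready));
    assert(MockIndexIsLive92(&valid) && MockIndexIsValid92(&valid));
    assert(!MockIndexIsLive92(&dead) && !MockIndexIsValid92(&dead));
}
```

Code written against `IndexIsValid`, `IndexIsReady`, and `IndexIsLive` would not need to change between the two representations, which is the source compatibility the header comment mentions.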