Commit 0a51e707 authored by Tom Lane's avatar Tom Lane

Don't take ProcArrayLock while exiting a transaction that has no XID; there is

no need for serialization against snapshot-taking because the xact doesn't
affect anyone else's snapshot anyway.  Per discussion.  Also, move various
info about the interlocking of transactions and snapshots out of code comments
and into a hopefully-more-cohesive discussion in access/transam/README.

Also, remove a couple of now-obsolete comments about having to force some WAL
to be written to persuade RecordTransactionCommit to do its thing.
parent 85e79a4a
......@@ -8,7 +8,7 @@
*
*
* IDENTIFICATION
* $PostgreSQL: pgsql/src/backend/access/heap/heapam.c,v 1.238 2007/09/05 18:10:47 tgl Exp $
* $PostgreSQL: pgsql/src/backend/access/heap/heapam.c,v 1.239 2007/09/07 20:59:26 tgl Exp $
*
*
* INTERFACE ROUTINES
......@@ -1546,9 +1546,8 @@ UpdateXmaxHintBits(HeapTupleHeader tuple, Buffer buffer, TransactionId xid)
* If use_wal is false, the new tuple is not logged in WAL, even for a
* non-temp relation. Safe usage of this behavior requires that we arrange
* that all new tuples go into new pages not containing any tuples from other
* transactions, that the relation gets fsync'd before commit, and that the
* transaction emits at least one WAL record to ensure RecordTransactionCommit
* will decide to WAL-log the commit. (See also heap_sync() comments)
* transactions, and that the relation gets fsync'd before commit.
* (See also heap_sync() comments)
*
* use_fsm is passed directly to RelationGetBufferForTuple, which see for
* more info.
......
$PostgreSQL: pgsql/src/backend/access/transam/README,v 1.7 2007/09/05 18:10:47 tgl Exp $
$PostgreSQL: pgsql/src/backend/access/transam/README,v 1.8 2007/09/07 20:59:26 tgl Exp $
The Transaction System
----------------------
......@@ -221,6 +221,110 @@ InvalidSubTransactionId.) Note that subtransactions do not have their
own VXIDs; they use the parent top transaction's VXID.
Interlocking transaction begin, transaction end, and snapshots
--------------------------------------------------------------
We try hard to minimize the amount of overhead and lock contention involved
in the frequent activities of beginning/ending a transaction and taking a
snapshot. Unfortunately, we must have some interlocking for this, because
we must ensure consistency about the commit order of transactions.
For example, suppose an UPDATE in xact A is blocked by xact B's prior
update of the same row, and xact B is doing commit while xact C gets a
snapshot. Xact A can complete and commit as soon as B releases its locks.
If xact C's GetSnapshotData sees xact B as still running, then it had
better see xact A as still running as well, or it will be able to see two
tuple versions - one deleted by xact B and one inserted by xact A. Another
reason why this would be bad is that C would see (in the row inserted by A)
earlier changes by B, and it would be inconsistent for C not to see any
of B's changes elsewhere in the database.
Formally, the correctness requirement is "if A sees B as committed,
and B sees C as committed, then A must see C as committed".
What we actually enforce is strict serialization of commits and rollbacks
with snapshot-taking: we do not allow any transaction to exit the set of
running transactions while a snapshot is being taken. (This rule is
stronger than necessary for consistency, but is relatively simple to
enforce, and it assists with some other issues as explained below.) The
implementation of this is that GetSnapshotData takes the ProcArrayLock in
shared mode (so that multiple backends can take snapshots in parallel),
but xact.c must take the ProcArrayLock in exclusive mode while clearing
MyProc->xid at transaction end (either commit or abort).
GetSnapshotData must in fact acquire ProcArrayLock before it calls
ReadNewTransactionId. Otherwise it would be possible for a transaction A
postdating the xmax to commit, and then an existing transaction B that saw
A as committed to commit, before GetSnapshotData is able to acquire
ProcArrayLock and finish taking its snapshot. This would violate the
consistency requirement, because A would be still running and B not
according to this snapshot.
In short, then, the rule is that no transaction may exit the set of
currently-running transactions between the time we fetch xmax and the time
we finish building our snapshot. However, this restriction only applies
to transactions that have an XID --- read-only transactions can end without
acquiring ProcArrayLock, since they don't affect anyone else's snapshot.
Transaction start, per se, doesn't have any interlocking with these
considerations, since we no longer assign an XID immediately at transaction
start. But when we do decide to allocate an XID, we must require
GetNewTransactionId to store the new XID into the shared ProcArray before
releasing XidGenLock. This ensures that when GetSnapshotData calls
ReadNewTransactionId (which also takes XidGenLock), all active XIDs before
the returned value of nextXid are already present in the ProcArray and
can't be missed by GetSnapshotData. Unfortunately, we can't have
GetNewTransactionId take ProcArrayLock to do this, else it could deadlock
against GetSnapshotData. Therefore, we simply let GetNewTransactionId
store into MyProc->xid without any lock. We are thereby relying on
fetch/store of an XID to be atomic, else other backends might see a
partially-set XID. (NOTE: for multiprocessors that need explicit memory
access fence instructions, this means that acquiring/releasing XidGenLock
is just as necessary as acquiring/releasing ProcArrayLock for
GetSnapshotData to ensure it sees up-to-date xid fields.) This also means
that readers of the ProcArray xid fields must be careful to fetch a value
only once, rather than assume they can read it multiple times and get the
same answer each time.
Another important activity that uses the shared ProcArray is GetOldestXmin,
which must determine a lower bound for the oldest xmin of any active MVCC
snapshot, system-wide. Each individual backend advertises the smallest
xmin of its own snapshots in MyProc->xmin, or zero if it currently has no
live snapshots (eg, if it's between transactions or hasn't yet set a
snapshot for a new transaction). GetOldestXmin takes the MIN() of the
valid xmin fields. It does this with only shared lock on ProcArrayLock,
which means there is a potential race condition against other backends
doing GetSnapshotData concurrently: we must be certain that a concurrent
backend that is about to set its xmin does not compute an xmin less than
what GetOldestXmin returns. We ensure that by including all the active
XIDs into the MIN() calculation, along with the valid xmins. The rule that
transactions can't exit without taking exclusive ProcArrayLock ensures that
concurrent holders of shared ProcArrayLock will compute the same minimum of
currently-active XIDs: no xact, in particular not the oldest, can exit
while we hold shared ProcArrayLock. So GetOldestXmin's view of the minimum
active XID will be the same as that of any concurrent GetSnapshotData, and
so it can't produce an overestimate. If there is no active transaction at
all, GetOldestXmin returns the result of ReadNewTransactionId. Note that
two concurrent executions of GetOldestXmin might not see the same result
from ReadNewTransactionId --- but if there is a difference, the intervening
execution(s) of GetNewTransactionId must have stored their XIDs into the
ProcArray, so the later execution of GetOldestXmin will see them and
compute the same global xmin anyway.
GetSnapshotData also performs an oldest-xmin calculation (which had better
match GetOldestXmin's) and stores that into RecentGlobalXmin, which is used
for some tuple age cutoff checks where a fresh call of GetOldestXmin seems
too expensive. Note that while it is certain that two concurrent
executions of GetSnapshotData will compute the same xmin for their own
snapshots, as argued above, it is not certain that they will arrive at the
same estimate of RecentGlobalXmin. This is because we allow XID-less
transactions to clear their MyProc->xmin asynchronously (without taking
ProcArrayLock), so one execution might see what had been the oldest xmin,
and another not. This is OK since RecentGlobalXmin need only be a valid
lower bound. As noted above, we are already assuming that fetch/store
of the xid fields is atomic, so assuming it for xmin as well is no extra
risk.
pg_clog and pg_subtrans
-----------------------
......
......@@ -10,7 +10,7 @@
*
*
* IDENTIFICATION
* $PostgreSQL: pgsql/src/backend/access/transam/xact.c,v 1.248 2007/09/05 18:10:47 tgl Exp $
* $PostgreSQL: pgsql/src/backend/access/transam/xact.c,v 1.249 2007/09/07 20:59:26 tgl Exp $
*
*-------------------------------------------------------------------------
*/
......@@ -747,6 +747,8 @@ AtSubStart_ResourceOwner(void)
/*
* RecordTransactionCommit
*
* This is exported only to support an ugly hack in VACUUM FULL.
*/
void
RecordTransactionCommit(void)
......@@ -1552,33 +1554,27 @@ CommitTransaction(void)
*/
RecordTransactionCommit();
/*----------
PG_TRACE1(transaction__commit, MyProc->lxid);
/*
* Let others know about no transaction in progress by me. Note that
* this must be done _before_ releasing locks we hold and _after_
* RecordTransactionCommit.
*
* LWLockAcquire(ProcArrayLock) is required; consider this example:
* UPDATE with xid 0 is blocked by xid 1's UPDATE.
* xid 1 is doing commit while xid 2 gets snapshot.
* If xid 2's GetSnapshotData sees xid 1 as running then it must see
* xid 0 as running as well, or it will be able to see two tuple versions
* - one deleted by xid 1 and one inserted by xid 0. See notes in
* GetSnapshotData.
*
* Note: MyProc may be null during bootstrap.
*----------
*/
if (MyProc != NULL)
{
if (TransactionIdIsValid(MyProc->xid))
{
/*
* Lock ProcArrayLock because that's what GetSnapshotData uses.
* You might assume that we can skip this step if we had no
* transaction id assigned, because the failure case outlined
* in GetSnapshotData cannot happen in that case. This is true,
* but we *still* need the lock guarantee that two concurrent
* computations of the *oldest* xmin will get the same result.
* We must lock ProcArrayLock while clearing MyProc->xid, so
* that we do not exit the set of "running" transactions while
* someone else is taking a snapshot. See discussion in
* src/backend/access/transam/README.
*/
LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
MyProc->xid = InvalidTransactionId;
MyProc->lxid = InvalidLocalTransactionId;
MyProc->xmin = InvalidTransactionId;
......@@ -1590,8 +1586,21 @@ CommitTransaction(void)
LWLockRelease(ProcArrayLock);
}
else
{
/*
* If we have no XID, we don't need to lock, since we won't
* affect anyone else's calculation of a snapshot. We might
* change their estimate of global xmin, but that's OK.
*/
MyProc->lxid = InvalidLocalTransactionId;
MyProc->xmin = InvalidTransactionId;
MyProc->inVacuum = false; /* must be cleared with xid/xmin */
PG_TRACE1(transaction__commit, s->transactionId);
Assert(MyProc->subxids.nxids == 0);
Assert(MyProc->subxids.overflowed == false);
}
}
/*
* This is all post-commit cleanup. Note that if an error is raised here,
......@@ -1815,28 +1824,21 @@ PrepareTransaction(void)
* Let others know about no transaction in progress by me. This has to be
* done *after* the prepared transaction has been marked valid, else
* someone may think it is unlocked and recyclable.
*
* We can skip locking ProcArrayLock here, because this action does not
* actually change anyone's view of the set of running XIDs: our entry
* is duplicate with the gxact that has already been inserted into the
* ProcArray.
*/
/*
* Lock ProcArrayLock because that's what GetSnapshotData uses.
* You might assume that we can skip this step if we have no
* transaction id assigned, because the failure case outlined
* in GetSnapshotData cannot happen in that case. This is true,
* but we *still* need the lock guarantee that two concurrent
* computations of the *oldest* xmin will get the same result.
*/
LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
MyProc->xid = InvalidTransactionId;
MyProc->lxid = InvalidLocalTransactionId;
MyProc->xmin = InvalidTransactionId;
MyProc->inVacuum = false; /* must be cleared with xid/xmin */
/* Clear the subtransaction-XID cache too while holding the lock */
/* Clear the subtransaction-XID cache too */
MyProc->subxids.nxids = 0;
MyProc->subxids.overflowed = false;
LWLockRelease(ProcArrayLock);
/*
* This is all post-transaction cleanup. Note that if an error is raised
* here, it's too late to abort the transaction. This should be just
......@@ -1987,22 +1989,27 @@ AbortTransaction(void)
*/
RecordTransactionAbort(false);
PG_TRACE1(transaction__abort, MyProc->lxid);
/*
* Let others know about no transaction in progress by me. Note that this
* must be done _before_ releasing locks we hold and _after_
* RecordTransactionAbort.
*
* Note: MyProc may be null during bootstrap.
*/
if (MyProc != NULL)
{
if (TransactionIdIsValid(MyProc->xid))
{
/*
* Lock ProcArrayLock because that's what GetSnapshotData uses.
* You might assume that we can skip this step if we have no
* transaction id assigned, because the failure case outlined
* in GetSnapshotData cannot happen in that case. This is true,
* but we *still* need the lock guarantee that two concurrent
* computations of the *oldest* xmin will get the same result.
* We must lock ProcArrayLock while clearing MyProc->xid, so
* that we do not exit the set of "running" transactions while
* someone else is taking a snapshot. See discussion in
* src/backend/access/transam/README.
*/
LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
MyProc->xid = InvalidTransactionId;
MyProc->lxid = InvalidLocalTransactionId;
MyProc->xmin = InvalidTransactionId;
......@@ -2015,8 +2022,22 @@ AbortTransaction(void)
LWLockRelease(ProcArrayLock);
}
else
{
/*
* If we have no XID, we don't need to lock, since we won't
* affect anyone else's calculation of a snapshot. We might
* change their estimate of global xmin, but that's OK.
*/
MyProc->lxid = InvalidLocalTransactionId;
MyProc->xmin = InvalidTransactionId;
MyProc->inVacuum = false; /* must be cleared with xid/xmin */
MyProc->inCommit = false; /* be sure this gets cleared */
PG_TRACE1(transaction__abort, s->transactionId);
Assert(MyProc->subxids.nxids == 0);
Assert(MyProc->subxids.overflowed == false);
}
}
/*
* Post-abort cleanup. See notes in CommitTransaction() concerning
......
......@@ -8,7 +8,7 @@
*
*
* IDENTIFICATION
* $PostgreSQL: pgsql/src/backend/commands/copy.c,v 1.285 2007/06/20 02:02:49 neilc Exp $
* $PostgreSQL: pgsql/src/backend/commands/copy.c,v 1.286 2007/09/07 20:59:26 tgl Exp $
*
*-------------------------------------------------------------------------
*/
......@@ -1678,13 +1678,6 @@ CopyFrom(CopyState cstate)
* rd_newRelfilenodeSubid can be cleared before the end of the transaction.
* However this is OK since at worst we will fail to make the optimization.
*
* When skipping WAL it's entirely possible that COPY itself will write no
* WAL records at all. This is of concern because RecordTransactionCommit
* might decide it doesn't need to log our eventual commit, which we
* certainly need it to do. However, we need no special action here for
* that, because if we have a new table or new relfilenode then there
* must have been a WAL-logged pg_class update earlier in the transaction.
*
* Also, if the target file is new-in-transaction, we assume that checking
* FSM for free space is a waste of time, even if we must use WAL because
* of archiving. This could possibly be wrong, but it's unlikely.
......
......@@ -26,7 +26,7 @@
*
*
* IDENTIFICATION
* $PostgreSQL: pgsql/src/backend/executor/execMain.c,v 1.296 2007/08/15 21:39:50 tgl Exp $
* $PostgreSQL: pgsql/src/backend/executor/execMain.c,v 1.297 2007/09/07 20:59:26 tgl Exp $
*
*-------------------------------------------------------------------------
*/
......@@ -2635,12 +2635,6 @@ OpenIntoRel(QueryDesc *queryDesc)
/*
* We can skip WAL-logging the insertions, unless PITR is in use.
*
* Note that for a non-temp INTO table, this is safe only because we know
* that the catalog changes above will have been WAL-logged, and so
* RecordTransactionCommit will think it needs to WAL-log the eventual
* transaction commit. Else the commit might be lost, even though all the
* data is safely fsync'd ...
*/
estate->es_into_relation_use_wal = XLogArchivingActive();
estate->es_into_relation_descriptor = intoRelationDesc;
......
......@@ -10,7 +10,7 @@
*
* Because of various subtle race conditions it is critical that a backend
* hold the correct locks while setting or clearing its MyProc->xid field.
* See notes in GetSnapshotData.
* See notes in src/backend/access/transam/README.
*
* The process array now also includes PGPROC structures representing
* prepared transactions. The xid and subxids fields of these are valid,
......@@ -23,7 +23,7 @@
*
*
* IDENTIFICATION
* $PostgreSQL: pgsql/src/backend/storage/ipc/procarray.c,v 1.31 2007/09/07 00:58:56 tgl Exp $
* $PostgreSQL: pgsql/src/backend/storage/ipc/procarray.c,v 1.32 2007/09/07 20:59:26 tgl Exp $
*
*-------------------------------------------------------------------------
*/
......@@ -407,6 +407,7 @@ TransactionIdIsActive(TransactionId xid)
* Note: we include all currently running xids in the set of considered xids.
* This ensures that if a just-started xact has not yet set its snapshot,
* when it does set the snapshot it cannot set xmin less than what we compute.
* See notes in src/backend/access/transam/README.
*/
TransactionId
GetOldestXmin(bool allDbs, bool ignoreVacuum)
......@@ -468,7 +469,7 @@ GetOldestXmin(bool allDbs, bool ignoreVacuum)
return result;
}
/*----------
/*
* GetSnapshotData -- returns information about running transactions.
*
* The returned snapshot includes xmin (lowest still-running xact ID),
......@@ -481,12 +482,13 @@ GetOldestXmin(bool allDbs, bool ignoreVacuum)
* This ensures that the set of transactions seen as "running" by the
* current xact will not change after it takes the snapshot.
*
* All running top-level XIDs are included in the snapshot. We also try
* to include running subtransaction XIDs, but since PGPROC has only a
* limited cache area for subxact XIDs, full information may not be
* available. If we find any overflowed subxid arrays, we have to mark
* the snapshot's subxid data as overflowed, and extra work will need to
* be done to determine what's running (see XidInMVCCSnapshot() in tqual.c).
* All running top-level XIDs are included in the snapshot, except for lazy
* VACUUM processes. We also try to include running subtransaction XIDs,
* but since PGPROC has only a limited cache area for subxact XIDs, full
* information may not be available. If we find any overflowed subxid arrays,
* we have to mark the snapshot's subxid data as overflowed, and extra work
* will need to be done to determine what's running (see XidInMVCCSnapshot()
* in tqual.c).
*
* We also update the following backend-global variables:
* TransactionXmin: the oldest xmin of any snapshot in use in the
......@@ -497,7 +499,6 @@ GetOldestXmin(bool allDbs, bool ignoreVacuum)
* RecentGlobalXmin: the global xmin (oldest TransactionXmin across all
* running transactions, except those running LAZY VACUUM). This is
* the same computation done by GetOldestXmin(true, true).
*----------
*/
Snapshot
GetSnapshotData(Snapshot snapshot, bool serializable)
......@@ -550,57 +551,16 @@ GetSnapshotData(Snapshot snapshot, bool serializable)
/*
* It is sufficient to get shared lock on ProcArrayLock, even if we are
* computing a serializable snapshot and therefore will be setting
* MyProc->xmin. This is because any two backends that have overlapping
* shared holds on ProcArrayLock will certainly compute the same xmin
* (since no xact, in particular not the oldest, can exit the set of
* running transactions while we hold ProcArrayLock --- see further
* discussion just below). So it doesn't matter whether another backend
* concurrently doing GetSnapshotData or GetOldestXmin sees our xmin as
* set or not; he'd compute the same xmin for himself either way.
* (We are assuming here that xmin can be set and read atomically,
* just like xid.)
*
* There is a corner case in which the above argument doesn't work: if
* there isn't any oldest xact, ie, all xids in the array are invalid.
* In that case we will compute xmin as the result of ReadNewTransactionId,
* and since GetNewTransactionId doesn't take the ProcArrayLock, it's not
* so obvious that two backends with overlapping shared locks will get
* the same answer. But GetNewTransactionId is required to store the XID
* it assigned into the ProcArray before releasing XidGenLock. Therefore
* the backend that did ReadNewTransactionId later will see that XID in
* the array, and will compute the same xmin as the earlier one that saw
* no XIDs in the array.
* going to set MyProc->xmin.
*/
LWLockAcquire(ProcArrayLock, LW_SHARED);
/*--------------------
/*
* Unfortunately, we have to call ReadNewTransactionId() after acquiring
* ProcArrayLock above. It's not good because ReadNewTransactionId() does
* LWLockAcquire(XidGenLock), but *necessary*. We need to be sure that
* no transactions exit the set of currently-running transactions
* between the time we fetch xmax and the time we finish building our
* snapshot. Otherwise we could have a situation like this:
*
* 1. Tx Old is running (in Read Committed mode).
* 2. Tx S reads new transaction ID into xmax, then
* is swapped out before acquiring ProcArrayLock.
* 3. Tx New gets new transaction ID (>= S' xmax),
* makes changes and commits.
* 4. Tx Old changes some row R changed by Tx New and commits.
* 5. Tx S finishes getting its snapshot data. It sees Tx Old as
* done, but sees Tx New as still running (since New >= xmax).
*
* Now S will see R changed by both Tx Old and Tx New, *but* does not
* see other changes made by Tx New. If S is supposed to be in
* Serializable mode, this is wrong.
*
* By locking ProcArrayLock before we read xmax, we ensure that TX Old
* cannot exit the set of running transactions seen by Tx S. Therefore
* both Old and New will be seen as still running => no inconsistency.
*--------------------
* ProcArrayLock. It's not good because ReadNewTransactionId() does
* LWLockAcquire(XidGenLock), but *necessary*. See discussion in
* src/backend/access/transam/README.
*/
xmax = ReadNewTransactionId();
/* initialize xmin calculation with xmax */
......@@ -1147,9 +1107,10 @@ XidCacheRemoveRunningXids(TransactionId xid, int nxids, TransactionId *xids)
/*
* We must hold ProcArrayLock exclusively in order to remove transactions
* from the PGPROC array. (See notes in GetSnapshotData.) It's possible
* this could be relaxed since we know this routine is only used to abort
* subtransactions, but pending closer analysis we'd best be conservative.
* from the PGPROC array. (See src/backend/access/transam/README.) It's
* possible this could be relaxed since we know this routine is only used
* to abort subtransactions, but pending closer analysis we'd best be
* conservative.
*/
LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
......
Markdown is supported
0% or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment