Commit a3290f65 authored by Tom Lane

Minor editing for README-SSI.

Fix some grammatical issues, try to clarify a couple of proofs, make the
terminology more consistent.
parent e2a0cb1a
```diff
@@ -3,11 +3,11 @@ src/backend/storage/lmgr/README-SSI
 Serializable Snapshot Isolation (SSI) and Predicate Locking
 ===========================================================
-This is currently sitting in the lmgr directory because about 90% of
-the code is an implementation of predicate locking, which is required
-for SSI, rather than being directly related to SSI itself. When
-another use for predicate locking justifies the effort to tease these
-two things apart, this README file should probably be split.
+This code is in the lmgr directory because about 90% of it is an
+implementation of predicate locking, which is required for SSI,
+rather than being directly related to SSI itself. When another use
+for predicate locking justifies the effort to tease these two things
+apart, this README file should probably be split.
 Credits
```
```diff
@@ -151,11 +151,11 @@ transactions.
 SSI Algorithm
 -------------
-Serializable transaction in PostgreSQL are implemented using
+As of 9.1, serializable transactions in PostgreSQL are implemented using
 Serializable Snapshot Isolation (SSI), based on the work of Cahill
 et al. Fundamentally, this allows snapshot isolation to run as it
-has, while monitoring for conditions which could create a serialization
-anomaly.
+previously did, while monitoring for conditions which could create a
+serialization anomaly.
 SSI is based on the observation [2] that each snapshot isolation
 anomaly corresponds to a cycle that contains a "dangerous structure"
```
```diff
@@ -168,8 +168,10 @@ SSI works by watching for this dangerous structure, and rolling
 back a transaction when needed to prevent any anomaly. This means it
 only needs to track rw-conflicts between concurrent transactions, not
 wr- and ww-dependencies. It also means there is a risk of false
-positives, because not every dangerous structure corresponds to an
-actual serialization failure.
+positives, because not every dangerous structure is embedded in an
+actual cycle. The number of false positives is low in practice, so
+this represents an acceptable tradeoff for keeping the detection
+overhead low.
 The PostgreSQL implementation uses two additional optimizations:
```
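The following is a toy Python model of the "dangerous structure" from the hunks above. It is not PostgreSQL's code. Rw-conflicts are (reader, writer) pairs: the reader read data that a concurrent writer later replaced. The model reports every pair of consecutive edges Tin -> Tpivot -> Tout. The helper names are hypothetical.

`has_cycle_through` exists only to illustrate the false-positive case. Real SSI never runs a cycle check, because it does not track wr- and ww-dependencies. As a result, a dangerous structure that lies on no cycle (the `chain` example) is still treated as a potential anomaly.

```python
from collections import defaultdict


def dangerous_structures(rw_edges):
    """Return every (tin, tpivot, tout) with rw-conflicts tin -> tpivot -> tout.

    rw_edges holds (reader, writer) pairs: the reader read data that the
    concurrent writer later replaced.
    """
    outs = defaultdict(set)
    for reader, writer in rw_edges:
        outs[reader].add(writer)
    found = []
    for tin, pivots in outs.items():
        for tpivot in pivots:
            # A pivot has a rw-conflict in (from tin) and one out (to tout).
            for tout in outs.get(tpivot, ()):
                found.append((tin, tpivot, tout))
    return sorted(found)


def has_cycle_through(start, edges):
    """True if some path of dependency edges leads from start back to start."""
    succ = defaultdict(set)
    for a, b in edges:
        succ[a].add(b)
    stack, seen = list(succ[start]), set()
    while stack:
        node = stack.pop()
        if node == start:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(succ[node])
    return False


if __name__ == "__main__":
    write_skew = [("T1", "T2"), ("T2", "T1")]  # each reads what the other writes
    chain = [("T1", "T2"), ("T2", "T3")]       # nothing leads back to T1
    for label, edges in (("write skew", write_skew), ("chain", chain)):
        for tin, tpivot, tout in dangerous_structures(edges):
            print(label, (tin, tpivot, tout), "on a cycle:", has_cycle_through(tpivot, edges))
```

Both graphs contain a dangerous structure, but only the write-skew one is embedded in a cycle.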
```diff
@@ -182,11 +184,12 @@ The PostgreSQL implementation uses two additional optimizations:
 one. Proof:
 - Because there is a cycle, there must be some transaction T0 that
-precedes Tin in the serial order. (T0 might be the same as Tout).
+precedes Tin in the cycle. (T0 might be the same as Tout.)
-- The dependency between T0 and Tin can't be a rw-conflict,
+- The edge between T0 and Tin can't be a rw-conflict or ww-dependency,
 because Tin was read-only, so it must be a wr-dependency.
-Those can only occur if T0 committed before Tin started.
+Those can only occur if T0 committed before Tin took its snapshot,
+else Tin would have ignored T0's output.
 - Because Tout must commit before any other transaction in the
 cycle, it must commit before T0 commits -- and thus before Tin
```
```diff
@@ -258,8 +261,8 @@ full serializable transactions under either strategy. Practical
 implementations of predicate locking generally involve acquiring
 locks against data as it is accessed, using multiple granularities
 (tuple, page, table, etc.) with escalation as needed to keep the lock
-count to a number which can be tracked within RAM structures, and
-this was used in PostgreSQL. Coarse granularities can cause some
+count to a number which can be tracked within RAM structures. This
+approach was used in PostgreSQL. Coarse granularities can cause some
 false positive indications of conflict. The number of false positives
 can be influenced by plan choice.
```
```diff
@@ -276,7 +279,7 @@ Hellerstein, Stonebraker and Hamilton paper [3], along with the
 locking papers referenced from that and the Cahill papers.
 Because the SIREAD locks don't block, traditional locking techniques
-were be modified. Intent locking (locking higher level objects
+have to be modified. Intent locking (locking higher level objects
 before locking lower level objects) doesn't work with non-blocking
 "locks" (which are, in some respects, more like flags than locks).
```
```diff
@@ -284,10 +287,10 @@ A configurable amount of shared memory is reserved at postmaster
 start-up to track predicate locks. This size cannot be changed
 without a restart.
-* To prevent resource exhaustion, multiple fine-grained locks may
+To prevent resource exhaustion, multiple fine-grained locks may
 be promoted to a single coarser-grained lock as needed.
-* An attempt to acquire an SIREAD lock on a tuple when the same
+An attempt to acquire an SIREAD lock on a tuple when the same
 transaction already holds an SIREAD lock on the page or the relation
 will be ignored. Likewise, an attempt to lock a page when the
 relation is locked will be ignored, and the acquisition of a coarser
```
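The granularity rules in the last few hunks can be sketched with a per-transaction lock set. Those rules are:

- locks on tuples, pages, and relations;
- redundant finer locks are ignored;
- many fine locks are promoted to one coarser lock.

This is an illustrative model only. `SireadLocks`, its methods, and both thresholds are hypothetical. The model also assumes that taking a coarser lock drops the finer locks underneath it, which is what replacing several locks by "a single coarser-grained lock" implies.

```python
# Hypothetical promotion thresholds, kept tiny so the example is easy to follow.
MAX_TUPLE_LOCKS_PER_PAGE = 3
MAX_PAGE_LOCKS_PER_RELATION = 2


class SireadLocks:
    """Toy SIREAD lock set for one transaction.

    Targets are tuples: (rel,), (rel, page) or (rel, page, tuple).
    """

    def __init__(self):
        self.held = set()

    def _covered(self, target):
        # Held already, or some coarser lock (a prefix of target) is held.
        return any(target[:n] in self.held for n in range(1, len(target) + 1))

    def _children(self, parent):
        n = len(parent)
        return sum(1 for t in self.held if len(t) == n + 1 and t[:n] == parent)

    def acquire(self, target):
        if self._covered(target):
            return  # redundant request: ignored
        # A coarser lock makes every finer lock underneath it redundant.
        self.held = {t for t in self.held if t[:len(target)] != target}
        self.held.add(target)
        parent = target[:-1]
        if parent:
            limit = MAX_TUPLE_LOCKS_PER_PAGE if len(parent) == 2 else MAX_PAGE_LOCKS_PER_RELATION
            if self._children(parent) > limit:
                self.acquire(parent)  # promote; may cascade up to the relation


if __name__ == "__main__":
    locks = SireadLocks()
    for tup in range(1, 5):               # four tuples on page 7
        locks.acquire(("accounts", 7, tup))
    print(sorted(locks.held))             # [('accounts', 7)]
    locks.acquire(("accounts", 7, 9))     # covered by the page lock: ignored
    locks.acquire(("accounts", 8))
    locks.acquire(("accounts", 9))        # third page lock: promoted to the relation
    print(sorted(locks.held))             # [('accounts',)]
```

Promotion keeps the number of tracked locks bounded, at the cost of the coarser-granularity false positives the README mentions.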
```diff
@@ -306,8 +309,8 @@ Predicate locks will be acquired for the heap based on the following:
 will be locked, whether or not it meets selection criteria; except
 that there is no need to acquire an SIREAD lock on a tuple when the
 transaction already holds a write lock on any tuple representing the
-row, since a rw-dependency would also create a ww-dependency which
-has more aggressive enforcement and will thus prevent any anomaly.
+row, since a rw-conflict would also create a ww-dependency which
+has more aggressive enforcement and thus will prevent any anomaly.
 * Modifying a heap tuple creates a rw-conflict with any transaction
 that holds a SIREAD lock on that tuple, or on the page or relation
```
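On the write side, a modification has to be checked against SIREAD locks at every granularity that covers the tuple. The sketch below is a toy shared lock table with hypothetical names. It also applies the rule that a transaction need not take a SIREAD lock on a row it has already written. For simplicity, it identifies a row by a single (relation, page, tuple) position, ignoring that an update creates a new tuple version.

```python
from collections import defaultdict


class PredicateLockTable:
    """Toy shared table of SIREAD locks plus the rows each transaction wrote."""

    def __init__(self):
        self.holders = defaultdict(set)  # lock target -> transactions holding it
        self.written = set()             # (xact, rel, page, tup) already written

    def read_relation(self, xact, rel):
        # e.g. a sequential scan locks the whole relation
        self.holders[(rel,)].add(xact)

    def read_tuple(self, xact, rel, page, tup):
        if (xact, rel, page, tup) in self.written:
            return  # a concurrent writer of this row would conflict with our write anyway
        self.holders[(rel, page, tup)].add(xact)

    def write_tuple(self, xact, rel, page, tup):
        """Record a write; return readers that now have a rw-conflict out to xact."""
        self.written.add((xact, rel, page, tup))
        readers = set()
        for target in ((rel,), (rel, page), (rel, page, tup)):
            readers |= self.holders.get(target, set())
        readers.discard(xact)  # a transaction does not conflict with itself
        return readers


if __name__ == "__main__":
    table = PredicateLockTable()
    table.read_tuple("T1", "accounts", 7, 3)
    table.read_relation("T2", "accounts")
    table.read_tuple("T3", "orders", 1, 1)
    print(sorted(table.write_tuple("T4", "accounts", 7, 3)))  # ['T1', 'T2']
```

T3's lock is on a different relation, so only T1 (tuple lock) and T2 (relation lock) get a rw-conflict out to T4.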
```diff
@@ -341,13 +344,13 @@ need not generate a conflict, although an update which "moves" a row
 into the scan must generate a conflict. While correctness allows
 false positives, they should be minimized for performance reasons.
-Several optimizations are possible, though not all implemented yet:
+Several optimizations are possible, though not all are implemented yet:
 * An index scan which is just finding the right position for an
-index insertion or deletion needs not acquire a predicate lock.
+index insertion or deletion need not acquire a predicate lock.
 * An index scan which is comparing for equality on the entire key
-for a unique index needs not acquire a predicate lock as long as a key
+for a unique index need not acquire a predicate lock as long as a key
 is found corresponding to a visible tuple which has not been modified
 by another transaction -- there are no "between or around" gaps to
 cover.
```
```diff
@@ -362,6 +365,9 @@ x = 1 AND x = 2), then no predicate lock is needed.
 Other index AM implementation considerations:
+* For an index AM that doesn't have support for predicate locking,
+we just acquire a predicate lock on the whole index for any search.
 * B-tree index searches acquire predicate locks only on the
 index *leaf* pages needed to lock the appropriate index range. If,
 however, a search discovers that no root page has yet been created, a
```
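The index-side rules from the two hunks above can be read as a small decision procedure for which SIREAD targets a search takes on the index. The sketch is hypothetical throughout; the `IndexSearch` fields are invented for illustration. The README lists the two "no lock needed" cases only as possible optimizations, not all of which are implemented. In those cases, the heap tuples actually read are still locked under the heap rules.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class IndexSearch:
    """Hypothetical summary of one index search (illustration only)."""
    index: str
    am_supports_predicate_locks: bool
    leaf_pages: List[int] = field(default_factory=list)  # leaf pages covering the range
    positioning_only: bool = False         # locating the slot for an insert/delete
    unique_equality_on_full_key: bool = False
    found_visible_unmodified_tuple: bool = False


def index_lock_targets(search):
    """SIREAD targets this search would take on the index (heap locks are separate)."""
    if search.positioning_only:
        return []  # only finding where an index entry goes or is removed from
    if search.unique_equality_on_full_key and search.found_visible_unmodified_tuple:
        return []  # no "between or around" gap to cover
    if not search.am_supports_predicate_locks:
        return [(search.index,)]  # fall back to locking the whole index
    return [(search.index, page) for page in search.leaf_pages]  # leaf pages only


if __name__ == "__main__":
    print(index_lock_targets(IndexSearch("other_idx", am_supports_predicate_locks=False)))
    print(index_lock_targets(IndexSearch("btree_idx", True, leaf_pages=[4, 5])))
```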
```diff
@@ -395,8 +401,8 @@ tracking SIREAD locks.
 any length of time; lock information is written to the tuples
 involved in the transactions.
 * In PostgreSQL, existing lock structures have pointers to
-memory which is related to a connection. SIREAD locks need to persist
-past the end of the originating transaction and even the connection
+memory which is related to a session. SIREAD locks need to persist
+past the end of the originating transaction and even the session
 which ran it.
 * PostgreSQL needs to be able to tolerate a large number of
 transactions executing while one long-running transaction stays open
```
```diff
@@ -411,7 +417,8 @@ isolation level distinct from snapshot isolation.
 in the papers.
 5. PostgreSQL doesn't assign a transaction number to a database
-transaction until and unless necessary.
+transaction until and unless necessary (normally, when the transaction
+attempts to modify data).
 6. PostgreSQL has pluggable data types with user-definable
 operators, as well as pluggable index types, not all of which are
```
```diff
@@ -453,42 +460,46 @@ versions of the row, based on the following proof that any additional
 serialization failures we would get from that would be false
 positives:
-o If transaction T1 reads a row (thus acquiring a predicate
-lock on it) and a second transaction T2 updates that row, must a
-third transaction T3 which updates the new version of the row have a
-rw-conflict in from T1 to prevent anomalies? In other words, does it
-matter whether this edge T1 -> T3 is there?
+o If transaction T1 reads a row version (thus acquiring a
+predicate lock on it) and a second transaction T2 updates that row
+version (thus creating a rw-conflict graph edge from T1 to T2), must a
+third transaction T3 which re-updates the new version of the row also
+have a rw-conflict in from T1 to prevent anomalies? In other words,
+does it matter whether we recognize the edge T1 -> T3?
 o If T1 has a conflict in, it certainly doesn't. Adding the
 edge T1 -> T3 would create a dangerous structure, but we already had
-one from the edge T1 -> T2, so we would have aborted something
-anyway.
+one from the edge T1 -> T2, so we would have aborted something anyway.
+(T2 has already committed, else T3 could not have updated its output;
+but we would have aborted either T1 or T1's predecessor(s). Hence
+no cycle involving T1 and T3 can survive.)
 o Now let's consider the case where T1 doesn't have a
-conflict in. If that's the case, for this edge T1 -> T3 to make a
-difference, T3 must have a rw-conflict out that induces a cycle in
-the dependency graph, i.e. a conflict out to some transaction
-preceding T1 in the serial order. (A conflict out to T1 would work
-too, but that would mean T1 has a conflict in and we would have
-rolled back.)
+rw-conflict in. If that's the case, for this edge T1 -> T3 to make a
+difference, T3 must have a rw-conflict out that induces a cycle in the
+dependency graph, i.e. a conflict out to some transaction preceding T1
+in the graph. (A conflict out to T1 itself would be problematic too,
+but that would mean T1 has a conflict in, the case we already
+eliminated.)
 o So now we're trying to figure out if there can be an
 rw-conflict edge T3 -> T0, where T0 is some transaction that precedes
-T1. For T0 to precede T1, there has to be has to be some edge, or
-sequence of edges, from T0 to T1. At least the last edge has to be a
-wr-dependency or ww-dependency rather than a rw-conflict, because T1
-doesn't have a rw-conflict in. And that gives us enough information
-about the order of transactions to see that T3 can't have a
-rw-dependency to T0:
+T1. For T0 to precede T1, there has to be some edge, or sequence of
+edges, from T0 to T1. At least the last edge has to be a wr-dependency
+or ww-dependency rather than a rw-conflict, because T1 doesn't have a
+rw-conflict in. And that gives us enough information about the order
+of transactions to see that T3 can't have a rw-conflict to T0:
 - T0 committed before T1 started (the wr/ww-dependency implies this)
 - T1 started before T2 committed (the T1->T2 rw-conflict implies this)
-- T2 committed before T3 started (otherwise, T3 would be aborted
+- T2 committed before T3 started (otherwise, T3 would get aborted
 because of an update conflict)
 o That means T0 committed before T3 started, and therefore
 there can't be a rw-conflict from T3 to T0.
-o In both cases, we didn't need the T1 -> T3 edge.
+o So in all cases, we don't need the T1 -> T3 edge to
+recognize cycles. Therefore it's not necessary for T1's SIREAD lock
+on the original tuple version to cover later versions as well.
 * Predicate locking in PostgreSQL starts at the tuple level
 when possible. Multiple fine-grained locks are promoted to a single
```
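The ordering argument at the end of the proof above can be checked mechanically. The sketch models each transaction as a start event and a commit event. The start event stands for the moment the transaction acquires its snapshot, which is a simplifying assumption. The sketch then enumerates every ordering of the eight events that satisfies the three premises. For each, it tests whether T3 starts before T0 commits. That is a necessary condition for a rw-conflict T3 -> T0, since otherwise T0's write would be visible to T3. All names are hypothetical.

```python
from itertools import permutations

XACTS = ("T0", "T1", "T2", "T3")
EVENTS = [(x, kind) for x in XACTS for kind in ("start", "commit")]


def premises_hold(pos):
    """pos maps (xact, kind) to a position in time; check the proof's premises."""
    if any(pos[(x, "start")] > pos[(x, "commit")] for x in XACTS):
        return False  # every transaction starts before it commits
    return (pos[("T0", "commit")] < pos[("T1", "start")]        # wr/ww-dependency T0 -> T1
            and pos[("T1", "start")] < pos[("T2", "commit")]    # rw-conflict T1 -> T2
            and pos[("T2", "commit")] < pos[("T3", "start")])   # else T3's update would abort


def rw_conflict_possible(reader, writer, pos):
    """Necessary for reader -> writer: the writer commits after the reader's snapshot."""
    return pos[(reader, "start")] < pos[(writer, "commit")]


def check_proof():
    """Return (orderings meeting the premises, those still allowing T3 -> T0)."""
    valid = violations = 0
    for order in permutations(EVENTS):
        pos = {event: i for i, event in enumerate(order)}
        if not premises_hold(pos):
            continue
        valid += 1
        if rw_conflict_possible("T3", "T0", pos):
            violations += 1
    return valid, violations


if __name__ == "__main__":
    print(check_proof())
```

A violation count of zero means no ordering consistent with the premises leaves room for the T3 -> T0 edge, matching the chain of "committed before" steps in the proof.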
```diff
@@ -520,10 +531,12 @@ NULL to indicate no conflict and a self-reference to indicate
 multiple conflicts or conflicts with committed transactions, we use a
 list of rw-conflicts. With the more complete information, false
 positives are reduced and we have sufficient data for more aggressive
-clean-up and other optimizations.
+clean-up and other optimizations:
 o We can avoid ever rolling back a transaction until and
 unless there is a pivot where a transaction on the conflict *out*
 side of the pivot committed before either of the other transactions.
 o We can avoid ever rolling back a transaction when the
 transaction on the conflict *in* side of the pivot is explicitly or
 implicitly READ ONLY unless the transaction on the conflict *out*
```
```diff
@@ -531,6 +544,7 @@ side of the pivot committed before the READ ONLY transaction acquired
 its snapshot. (An implicit READ ONLY transaction is one which
 committed without writing, even though it was not explicitly declared
 to be READ ONLY.)
 o We can more aggressively clean up conflicts, predicate
 locks, and SSI transaction information.
```
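Keeping full lists of rw-conflicts is what lets the two rollback rules above be applied edge by edge. Below is a minimal sketch with hypothetical names (`SxactSketch`, `pivot_needs_rollback`). Logical sequence numbers stand in for snapshot and commit times. When Tin and Tout are the same transaction (simple write skew), the sketch treats the pivot as the only "other" transaction; that reading is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class SxactSketch:
    """Hypothetical SSI state for one serializable transaction."""
    name: str
    snapshot_seq: int                 # logical time the snapshot was taken
    commit_seq: Optional[int] = None  # logical commit time; None while running
    read_only: bool = False           # explicitly or implicitly READ ONLY
    conflicts_in: List["SxactSketch"] = field(default_factory=list)   # readers of data we overwrote
    conflicts_out: List["SxactSketch"] = field(default_factory=list)  # writers that overwrote our reads


def add_rw_conflict(reader, writer):
    """Record reader -> writer at both ends instead of collapsing to one pointer."""
    reader.conflicts_out.append(writer)
    writer.conflicts_in.append(reader)


def committed_before(a, b):
    """True if a has committed and b is still running or committed later."""
    return a.commit_seq is not None and (b.commit_seq is None or a.commit_seq < b.commit_seq)


def pivot_needs_rollback(pivot):
    """True if some Tin -> pivot -> Tout pair forces a rollback under both rules."""
    for tout in pivot.conflicts_out:
        for tin in pivot.conflicts_in:
            # Rule 1: Tout must have committed before the other transaction(s).
            if not committed_before(tout, pivot):
                continue
            if tin is not tout and not committed_before(tout, tin):
                continue
            # Rule 2: a READ ONLY Tin matters only if Tout committed before its snapshot.
            if tin.read_only and not tout.commit_seq < tin.snapshot_seq:
                continue
            return True
    return False


if __name__ == "__main__":
    t1, t2 = SxactSketch("T1", 1), SxactSketch("T2", 2)
    add_rw_conflict(t1, t2)  # write skew: each overwrote what the other read
    add_rw_conflict(t2, t1)
    t1.commit_seq = 3
    print(pivot_needs_rollback(t2))  # True

    tin = SxactSketch("Tin", 1, read_only=True)
    piv = SxactSketch("Tpivot", 2)
    tout = SxactSketch("Tout", 3, commit_seq=4)
    add_rw_conflict(tin, piv)
    add_rw_conflict(piv, tout)
    print(pivot_needs_rollback(piv))  # False: Tin's snapshot predates Tout's commit
```

The sketch only answers whether the structure must be broken; choosing which transaction to roll back is a separate decision.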
```diff
@@ -543,7 +557,7 @@ overlapping transaction dependencies.
 until the conditions are right for it to start in the "opt out" state
 described above. We add a DEFERRABLE state to transactions, which is
 specified and maintained in a way similar to READ ONLY. It is
-ignored for transactions which are not SERIALIZABLE and READ ONLY.
+ignored for transactions that are not SERIALIZABLE and READ ONLY.
 * When a transaction must be rolled back, we pick among the
 active transactions such that an immediate retry will not fail again
```
```diff
@@ -593,8 +607,8 @@ might never be touched, or should we keep adding returned items to
 the end of the available list?
-Footnotes
----------
+References
+----------
 [1] http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt
 Search for serial execution to find the relevant section.
```