The row-version chaining in Serializable Snapshot Isolation was still wrong.

Further analysis shows that predicate locks need not be duplicated to the new row version on update; the lock on the version that the transaction saw as visible is enough. However, there was a separate bug in the code that checks for dangerous structures when a new rw-conflict arises. Fix that bug, and remove all code related to row-version chaining. Kevin Grittner & Dan Ports, with some comment editorialization by me.
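
The ordering argument that this commit adds to README-SSI can be sketched as a small C model. This is only an illustration, not code from the patch: the `Txn` struct, its integer timestamps, and the three helper functions are hypothetical names introduced here, and each transaction is assumed to be fully described by one start time and one commit time.

```c
#include <stdbool.h>

/* Hypothetical model: a transaction reduced to its start and commit times. */
typedef struct
{
	int			start;
	int			commit;
} Txn;

/*
 * Assumption taken from the README text: a wr- or ww-dependency from
 * "earlier" to "later" implies that "earlier" committed before "later"
 * started.
 */
static bool
committed_before_start(Txn earlier, Txn later)
{
	return earlier.commit < later.start;
}

/*
 * An rw-conflict reader -> writer requires that the reader could not see
 * the writer's change, i.e. the writer committed after the reader started.
 */
static bool
rw_conflict_possible(Txn reader, Txn writer)
{
	return writer.commit > reader.start;
}

/*
 * Returns true when the three orderings listed in the README hold and,
 * as a consequence, an rw-conflict T3 -> T0 is impossible.
 */
static bool
t3_to_t0_edge_ruled_out(Txn t0, Txn t1, Txn t2, Txn t3)
{
	bool		premises =
		committed_before_start(t0, t1) &&	/* wr/ww-dependency T0 -> T1 */
		rw_conflict_possible(t1, t2) &&		/* rw-conflict T1 -> T2 */
		committed_before_start(t2, t3);		/* T3 updated T2's version */

	return premises && !rw_conflict_possible(t3, t0);
}
```

For any timestamps that satisfy the three premises, T0's commit precedes T3's start by transitivity, so `t3_to_t0_edge_ruled_out` returns true; that is the case in which the extra T1 -> T3 edge could not have closed a cycle.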

Commit 3103f9a7 authored May 30, 2011 by Heikki Linnakangas
7 changed files
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -1529,7 +1529,6 @@ heap_hot_search_buffer(ItemPointer tid, Relation relation, Buffer buffer,
 	OffsetNumber offnum;
 	bool		at_chain_start;
 	bool		valid;
-	bool		match_found;

 	if (all_dead)
 		*all_dead = true;
@@ -1539,7 +1538,6 @@ heap_hot_search_buffer(ItemPointer tid, Relation relation, Buffer buffer,
 	Assert(ItemPointerGetBlockNumber(tid) == BufferGetBlockNumber(buffer));
 	offnum = ItemPointerGetOffsetNumber(tid);
 	at_chain_start = true;
-	match_found = false;

 	/* Scan through possible multiple members of HOT-chain */
 	for (;;)
@@ -1597,9 +1595,6 @@ heap_hot_search_buffer(ItemPointer tid, Relation relation, Buffer buffer,
 			PredicateLockTuple(relation, &heapTuple);
 			if (all_dead)
 				*all_dead = false;
-			if (IsolationIsSerializable())
-				match_found = true;
-			else
 			return true;
 		}

@@ -1629,7 +1624,7 @@ heap_hot_search_buffer(ItemPointer tid, Relation relation, Buffer buffer,
 			break;				/* end of chain */
 	}

-	return match_found;
+	return false;
 }

 /*
@@ -2855,12 +2850,6 @@ l2:

 	END_CRIT_SECTION();

-	/*
-	 * Any existing SIREAD locks on the old tuple must be linked to the new
-	 * tuple for conflict detection purposes.
-	 */
-	PredicateLockTupleRowVersionLink(relation, &oldtup, heaptup);
-
 	if (newbuf != buffer)
 		LockBuffer(newbuf, BUFFER_LOCK_UNLOCK);
 	LockBuffer(buffer, BUFFER_LOCK_UNLOCK);

--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -612,8 +612,7 @@ index_getnext(IndexScanDesc scan, ScanDirection direction)
 				 * any more members.  Otherwise, check for continuation of the
 				 * HOT-chain, and set state for next time.
 				 */
-				if (IsMVCCSnapshot(scan->xs_snapshot)
-					&& !IsolationIsSerializable())
+				if (IsMVCCSnapshot(scan->xs_snapshot))
 					scan->xs_next_hot = InvalidOffsetNumber;
 				else if (HeapTupleIsHotUpdated(heapTuple))
 				{

--- a/src/backend/storage/lmgr/README-SSI
+++ b/src/backend/storage/lmgr/README-SSI
@@ -402,6 +402,54 @@ is based on the top level xid.  When looking at an xid that comes
 from a tuple's xmin or xmax, for example, we always call
 SubTransGetTopmostTransaction() before doing much else with it.

+    * PostgreSQL does not use "update in place" with a rollback log
+for its MVCC implementation.  Where possible it uses "HOT" updates on
+the same page (if there is room and no indexed value is changed).
+For non-HOT updates the old tuple is expired in place and a new tuple
+is inserted at a new location.  Because of this difference, a tuple
+lock in PostgreSQL doesn't automatically lock any other versions of a
+row.  We don't try to copy or expand a tuple lock to any other
+versions of the row, based on the following proof that any additional
+serialization failures we would get from that would be false
+positives:
+
+          o If transaction T1 reads a row (thus acquiring a predicate
+lock on it) and a second transaction T2 updates that row, must a
+third transaction T3 which updates the new version of the row have a
+rw-conflict in from T1 to prevent anomalies?  In other words, does it
+matter whether this edge T1 -> T3 is there?
+
+          o If T1 has a conflict in, it certainly doesn't. Adding the
+edge T1 -> T3 would create a dangerous structure, but we already had
+one from the edge T1 -> T2, so we would have aborted something
+anyway.
+
+          o Now let's consider the case where T1 doesn't have a
+conflict in. If that's the case, for this edge T1 -> T3 to make a
+difference, T3 must have a rw-conflict out that induces a cycle in
+the dependency graph, i.e. a conflict out to some transaction
+preceding T1 in the serial order. (A conflict out to T1 would work
+too, but that would mean T1 has a conflict in and we would have
+rolled back.)
+
+          o So now we're trying to figure out if there can be an
+rw-conflict edge T3 -> T0, where T0 is some transaction that precedes
+T1. For T0 to precede T1, there has to be some edge, or
+sequence of edges, from T0 to T1. At least the last edge has to be a
+wr-dependency or ww-dependency rather than a rw-conflict, because T1
+doesn't have a rw-conflict in. And that gives us enough information
+about the order of transactions to see that T3 can't have a
+rw-dependency to T0:
+ - T0 committed before T1 started (the wr/ww-dependency implies this)
+ - T1 started before T2 committed (the T1->T2 rw-conflict implies this)
+ - T2 committed before T3 started (otherwise, T3 would be aborted
+                                   because of an update conflict)
+
+          o That means T0 committed before T3 started, and therefore
+there can't be a rw-conflict from T3 to T0.
+
+          o In both cases, we didn't need the T1 -> T3 edge.
+
    * Predicate locking in PostgreSQL will start at the tuple level
 when possible, with automatic conversion of multiple fine-grained
 locks to coarser granularity as need to avoid resource exhaustion.

--- a/src/backend/storage/lmgr/predicate.c
+++ b/src/backend/storage/lmgr/predicate.c
--- a/src/include/storage/predicate.h
+++ b/src/include/storage/predicate.h
@@ -47,7 +47,6 @@ extern void RegisterPredicateLockingXid(const TransactionId xid);
 extern void PredicateLockRelation(const Relation relation);
 extern void PredicateLockPage(const Relation relation, const BlockNumber blkno);
 extern void PredicateLockTuple(const Relation relation, const HeapTuple tuple);
-extern void PredicateLockTupleRowVersionLink(const Relation relation, const HeapTuple oldTuple, const HeapTuple newTuple);
 extern void PredicateLockPageSplit(const Relation relation, const BlockNumber oldblkno, const BlockNumber newblkno);
 extern void PredicateLockPageCombine(const Relation relation, const BlockNumber oldblkno, const BlockNumber newblkno);
 extern void ReleasePredicateLocks(const bool isCommit);

--- a/src/test/isolation/expected/multiple-row-versions.out
+++ b/src/test/isolation/expected/multiple-row-versions.out
@@ -19,6 +19,6 @@ id             txt
 1                             
 step c4:  COMMIT; 
 step c3:  COMMIT; 
-ERROR:  could not serialize access due to read/write dependencies among transactions
 step wz1:  UPDATE t SET txt = 'a' WHERE id = 1; 
+ERROR:  could not serialize access due to read/write dependencies among transactions
 step c1:  COMMIT; 
--- a/src/test/isolation/specs/multiple-row-versions.spec
+++ b/src/test/isolation/specs/multiple-row-versions.spec
 # Multiple Row Versions test
 #
-# This test is designed to ensure that predicate locks taken on one version
-# of a row are detected as conflicts when a later version of the row is
-# updated or deleted by a transaction concurrent to the reader.
+# This test is designed to cover some code paths which only occur with
+# four or more transactions interacting with particular timings.
 #
 # Due to long permutation setup time, we are only testing one specific
 # permutation, which should get a serialization error.