Implement genuine serializable isolation level.

Until now, our Serializable mode has in fact been what's called Snapshot Isolation, which allows some anomalies that could not occur in any serialized ordering of the transactions. This patch fixes that using a method called Serializable Snapshot Isolation, based on research papers by Michael J. Cahill (see README-SSI for full references). In Serializable Snapshot Isolation, transactions run like they do in Snapshot Isolation, but a predicate lock manager observes the reads and writes performed and aborts transactions if it detects that an anomaly might occur. This method produces some false positives, i.e. it sometimes aborts transactions even though there is no anomaly.

To track reads we implement predicate locking; see storage/lmgr/predicate.c. Whenever a tuple is read, a predicate lock is acquired on the tuple. Shared memory is finite, so when a transaction takes many tuple-level locks on a page, the locks are promoted to a single page-level lock, and further to a single relation-level lock if necessary. To lock key values with no matching tuple, a sequential scan always takes a relation-level lock, and an index scan acquires a page-level lock that covers the search key, whether or not there are any matching keys at the moment.

A predicate lock doesn't conflict with any regular locks or with other predicate locks in the normal sense. Predicate locks are used only by the predicate lock manager to detect the danger of anomalies. Only serializable transactions participate in predicate locking, so there should be no extra overhead for other transactions.

Predicate locks can't be released at commit, but must be remembered until all the transactions that overlapped with the committing transaction have completed. That means that we need to remember an unbounded number of predicate locks, so we apply a lossy but conservative method of tracking locks for committed transactions. If we run short of shared memory, we overflow to a new "pg_serial" SLRU pool.
We don't currently allow Serializable transactions in Hot Standby mode. That would be hard, because even read-only transactions can cause anomalies that wouldn't otherwise occur.

Serializable isolation mode now means the new fully serializable level. Repeatable Read gives you the old Snapshot Isolation level that we have always had.

Kevin Grittner and Dan Ports, reviewed by Jeff Davis, Heikki Linnakangas and Anssi Kääriäinen
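The tuple-to-page-to-relation promotion described in the commit message can be sketched in C as follows. This is a minimal, hypothetical model, not the code in predicate.c: the SketchLockState type, the sketch_lock_tuple() function and both promotion thresholds are invented for illustration, and the model tracks a single transaction reading a single relation.

```c
#include <stdbool.h>

/* Hypothetical thresholds; predicate.c uses its own tuning. */
#define TUPLE_LOCKS_BEFORE_PAGE_LOCK 4
#define PAGE_LOCKS_BEFORE_REL_LOCK   3
#define MAX_PAGES                    16

typedef enum { LOCK_TUPLE, LOCK_PAGE, LOCK_RELATION } LockGranularity;

/* Simplified lock state for one transaction on one relation. */
typedef struct
{
    bool relation_locked;
    bool page_locked[MAX_PAGES];
    int  tuple_locks_on_page[MAX_PAGES];
    int  page_locks;
} SketchLockState;

/*
 * Record a read of one tuple on 'page' and return the granularity of the
 * lock that now covers it.  A coarser lock subsumes all finer ones.
 */
static LockGranularity
sketch_lock_tuple(SketchLockState *st, int page)
{
    if (st->relation_locked)
        return LOCK_RELATION;
    if (st->page_locked[page])
        return LOCK_PAGE;

    if (++st->tuple_locks_on_page[page] < TUPLE_LOCKS_BEFORE_PAGE_LOCK)
        return LOCK_TUPLE;

    /* Promote: this page's tuple locks become one page lock. */
    st->tuple_locks_on_page[page] = 0;
    st->page_locked[page] = true;
    if (++st->page_locks < PAGE_LOCKS_BEFORE_REL_LOCK)
        return LOCK_PAGE;

    /* Promote again: all page and tuple locks become one relation lock. */
    for (int p = 0; p < MAX_PAGES; p++)
    {
        st->page_locked[p] = false;
        st->tuple_locks_on_page[p] = 0;
    }
    st->page_locks = 0;
    st->relation_locked = true;
    return LOCK_RELATION;
}
```

With these made-up thresholds, the fourth tuple read on a page collapses that page's locks into one page lock, and the third promoted page collapses everything into a relation lock. Coarser locks are conservative: they can add false-positive conflicts but never hide a real one.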

Commit dafaa3ef authored Feb 07, 2011 by Heikki Linnakangas
90 changed files
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -490,6 +490,13 @@
      <entry>Can an index of this type be clustered on?</entry>
     </row>
+     <row>
+      <entry><structfield>ampredlocks</structfield></entry>
+      <entry><type>bool</type></entry>
+      <entry></entry>
+      <entry>Does an index of this type manage fine-grained predicate locks?</entry>
+     </row>
     <row>
      <entry><structfield>amkeytype</structfield></entry>
      <entry><type>oid</type></entry>
@@ -6577,7 +6584,7 @@
      <entry><type>text</type></entry>
      <entry></entry>
      <entry>Name of the lock mode held or desired by this process (see <xref
-      linkend="locking-tables">)</entry>
+      linkend="locking-tables"> and <xref linkend="xact-serializable">)</entry>
     </row>
     <row>
      <entry><structfield>granted</structfield></entry>

--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -4456,6 +4456,7 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
     <varlistentry id="guc-default-transaction-isolation" xreflabel="default_transaction_isolation">
      <indexterm>
       <primary>transaction isolation level</primary>
+       <secondary>setting default</secondary>
      </indexterm>
      <indexterm>
       <primary><varname>default_transaction_isolation</> configuration parameter</primary>
@@ -4481,6 +4482,7 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
     <varlistentry id="guc-default-transaction-read-only" xreflabel="default_transaction_read_only">
      <indexterm>
       <primary>read-only transaction</primary>
+       <secondary>setting default</secondary>
      </indexterm>
      <indexterm>
       <primary><varname>default_transaction_read_only</> configuration parameter</primary>
@@ -4500,6 +4502,41 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
      </listitem>
     </varlistentry>
+     <varlistentry id="guc-default-transaction-deferrable" xreflabel="default_transaction_deferrable">
+      <indexterm>
+       <primary>deferrable transaction</primary>
+       <secondary>setting default</secondary>
+      </indexterm>
+      <indexterm>
+       <primary><varname>default_transaction_deferrable</> configuration parameter</primary>
+      </indexterm>
+      <term><varname>default_transaction_deferrable</varname> (<type>boolean</type>)</term>
+      <listitem>
+       <para>
+        When running at the <literal>serializable</> isolation level,
+        a deferrable read-only SQL transaction may be delayed before
+        it is allowed to proceed.  However, once it begins executing
+        it does not incur any of the overhead required to ensure
+        serializability; so serialization code will have no reason to
+        force it to abort because of concurrent updates, making this
+        option suitable for long-running read-only transactions.
+        </para>
+        <para>
+        This parameter controls the default deferrable status of each
+        new transaction.  It currently has no effect on read-write
+        transactions or those operating at isolation levels lower
+        than <literal>serializable</>. The default is <literal>off</>.
+       </para>
+       <para>
+        Consult <xref linkend="sql-set-transaction"> for more information.
+       </para>
+      </listitem>
+     </varlistentry>
     <varlistentry id="guc-session-replication-role" xreflabel="session_replication_role">
      <term><varname>session_replication_role</varname> (<type>enum</type>)</term>
      <indexterm>
@@ -5125,6 +5162,39 @@ dynamic_library_path = 'C:\tools\postgresql;H:\my_project\lib;$libdir'
      </listitem>
     </varlistentry>
+     <varlistentry id="guc-max-predicate-locks-per-transaction" xreflabel="max_predicate_locks_per_transaction">
+      <term><varname>max_predicate_locks_per_transaction</varname> (<type>integer</type>)</term>
+      <indexterm>
+       <primary><varname>max_predicate_locks_per_transaction</> configuration parameter</primary>
+      </indexterm>
+      <listitem>
+       <para>
+        The shared predicate lock table tracks locks on
+        <varname>max_predicate_locks_per_transaction</varname> * (<xref
+        linkend="guc-max-connections"> + <xref
+        linkend="guc-max-prepared-transactions">) objects (e.g., tables);
+        hence, no more than this many distinct objects can be locked at
+        any one time.  This parameter controls the average number of object
+        locks allocated for each transaction;  individual transactions
+        can lock more objects as long as the locks of all transactions
+        fit in the lock table.  This is <emphasis>not</> the number of
+        rows that can be locked; that value is unlimited.  The default,
+        64, has generally been sufficient in testing, but you might need to
+        raise this value if you have clients that touch many different
+        tables in a single serializable transaction. This parameter can
+        only be set at server start.
+       </para>
+       <para>
+        Increasing this parameter might cause <productname>PostgreSQL</>
+        to request more <systemitem class="osname">System V</> shared
+        memory than your operating system's default configuration
+        allows. See <xref linkend="sysvipc"> for information on how to
+        adjust those parameters, if necessary.
+       </para>
+      </listitem>
+     </varlistentry>
     </variablelist>
   </sect1>
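The sizing rule for the shared predicate lock table documented above is plain arithmetic. The sketch below uses hypothetical helper names and ignores how the real shared hash tables are laid out; it computes the capacity and shows why the setting is an average rather than a per-transaction cap. For example, with 64 locks per transaction and, say, max_connections = 100 and max_prepared_transactions = 0, the table holds 64 * 100 = 6400 objects, and a single transaction may lock far more than 64 of them as long as the total fits.

```c
#include <stdbool.h>

/* Number of objects the shared predicate lock table is sized for. */
static long
predicate_lock_capacity(int max_predicate_locks_per_transaction,
                        int max_connections,
                        int max_prepared_transactions)
{
    return (long) max_predicate_locks_per_transaction *
           (max_connections + max_prepared_transactions);
}

/* The limit is shared: only the total across transactions must fit. */
static bool
locks_fit(const int *locks_per_xact, int nxacts, long capacity)
{
    long total = 0;

    for (int i = 0; i < nxacts; i++)
        total += locks_per_xact[i];
    return total <= capacity;
}
```

With two connection slots the capacity is 128, so one transaction holding 120 object locks is fine while another holds 5, but 100 plus 40 is not.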

--- a/doc/src/sgml/high-availability.sgml
+++ b/doc/src/sgml/high-availability.sgml
@@ -1916,6 +1916,15 @@ LOG:  database system is ready to accept read only connections
     your setting of <varname>max_prepared_transactions</> is 0.
    </para>
   </listitem>
+   <listitem>
+    <para>
+     The Serializable transaction isolation level is not yet available in hot
+     standby.  (See <xref linkend="xact-serializable"> and
+     <xref linkend="serializable-consistency"> for details.)
+     An attempt to set a transaction to the serializable isolation level in
+     hot standby mode will generate an error.
+    </para>
+   </listitem>
  </itemizedlist>
   </para>

--- a/doc/src/sgml/indexam.sgml
+++ b/doc/src/sgml/indexam.sgml
@@ -705,6 +705,19 @@ amrestrpos (IndexScanDesc scan);
   it is only safe to use such scans with MVCC-compliant snapshots.
  </para>
+  <para>
+   When the <structfield>ampredlocks</> flag is not set, any scan using that
+   index access method within a serializable transaction will acquire a
+   non-blocking predicate lock on the full index.  This will generate a
+   read-write conflict with the insert of any tuple into that index by a
+   concurrent serializable transaction.  If certain patterns of read-write
+   conflicts are detected among a set of concurrent serializable
+   transactions, one of those transactions may be cancelled to protect data
+   integrity.  When the flag is set, it indicates that the index access
+   method implements finer-grained predicate locking, which will tend to
+   reduce the frequency of such transaction cancellations.
+  </para>
 </sect1>
 <sect1 id="index-unique-checks">
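The effect of the ampredlocks flag on conflict detection can be modeled as below. This is a simplified, hypothetical sketch: SketchIndexAm stands in for the pg_am row, index pages are reduced to integers, and an access method with fine-grained support is assumed to lock exactly the page it scanned. It mirrors the rule added to index_beginscan_internal() and index_insert(): without ampredlocks the scan locks the whole index, so every insert into that index by a concurrent serializable transaction is a read-write conflict.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the pg_am flag added by this patch. */
typedef struct
{
    bool ampredlocks;
} SketchIndexAm;

/*
 * Would an insert into index page 'insert_page' by a concurrent
 * serializable transaction conflict with a scan that read 'scanned_page'?
 */
static bool
insert_conflicts_with_scan(const SketchIndexAm *am,
                           int scanned_page, int insert_page)
{
    if (!am->ampredlocks)
        return true;                    /* scan locked the whole index */
    return scanned_page == insert_page; /* assumed page-level locking */
}
```

The coarse case never misses a conflict; it just reports many that finer-grained locking would have avoided, which is why setting the flag tends to reduce transaction cancellations.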

--- a/doc/src/sgml/lobj.sgml
+++ b/doc/src/sgml/lobj.sgml
@@ -256,7 +256,7 @@ int lo_open(PGconn *conn, Oid lobjId, int mode);
     from a descriptor opened with <symbol>INV_WRITE</symbol> returns
     data that reflects all writes of other committed transactions as well
     as writes of the current transaction.  This is similar to the behavior
-     of <literal>SERIALIZABLE</> versus <literal>READ COMMITTED</> transaction
+     of <literal>REPEATABLE READ</> versus <literal>READ COMMITTED</> transaction
     modes for ordinary SQL <command>SELECT</> commands.
    </para>

--- a/doc/src/sgml/mvcc.sgml
+++ b/doc/src/sgml/mvcc.sgml
--- a/doc/src/sgml/ref/begin.sgml
+++ b/doc/src/sgml/ref/begin.sgml
@@ -27,6 +27,7 @@ BEGIN [ WORK | TRANSACTION ] [ <replaceable class="parameter">transaction_mode</
    ISOLATION LEVEL { SERIALIZABLE | REPEATABLE READ | READ COMMITTED | READ UNCOMMITTED }
    READ WRITE | READ ONLY
+    [ NOT ] DEFERRABLE
 </synopsis>
 </refsynopsisdiv>
@@ -57,7 +58,7 @@ BEGIN [ WORK | TRANSACTION ] [ <replaceable class="parameter">transaction_mode</
  </para>
  <para>
-   If the isolation level or read/write mode is specified, the new
+   If the isolation level, read/write mode, or deferrable mode is specified, the new
   transaction has those characteristics, as if
   <xref linkend="sql-set-transaction">
   was executed.
@@ -135,6 +136,12 @@ BEGIN;
   contains additional compatibility information.
  </para>
+  <para>
+   The <literal>DEFERRABLE</literal>
+   <replaceable class="parameter">transaction_mode</replaceable>
+   is a <productname>PostgreSQL</productname> language extension.
+  </para>
  <para>
   Incidentally, the <literal>BEGIN</literal> key word is used for a
   different purpose in embedded SQL. You are advised to be careful

--- a/doc/src/sgml/ref/lock.sgml
+++ b/doc/src/sgml/ref/lock.sgml
@@ -67,10 +67,12 @@ LOCK [ TABLE ] [ ONLY ] <replaceable class="PARAMETER">name</replaceable> [, ...
  </para>
  <para>
-   To achieve a similar effect when running a transaction at the Serializable
+   To achieve a similar effect when running a transaction at the
+   <literal>REPEATABLE READ</> or <literal>SERIALIZABLE</>
   isolation level, you have to execute the <command>LOCK TABLE</> statement
   before executing any <command>SELECT</> or data modification statement.
-   A serializable transaction's view of data will be frozen when its first
+   A <literal>REPEATABLE READ</> or <literal>SERIALIZABLE</> transaction's
+   view of data will be frozen when its first
   <command>SELECT</> or data modification statement begins.  A <command>LOCK
   TABLE</> later in the transaction will still prevent concurrent writes
   &mdash; but it won't ensure that what the transaction reads corresponds to

--- a/doc/src/sgml/ref/pg_dump.sgml
+++ b/doc/src/sgml/ref/pg_dump.sgml
@@ -646,6 +646,41 @@ PostgreSQL documentation
      </listitem>
     </varlistentry>
+     <varlistentry>
+      <term><option>--serializable-deferrable</option></term>
+      <listitem>
+       <para>
+        Use a <literal>serializable</literal> transaction for the dump, to
+        ensure that the snapshot used is consistent with later database
+        states; but do this by waiting for a point in the transaction stream
+        at which no anomalies can be present, so that there isn't a risk of
+        the dump failing or causing other transactions to roll back with a
+        <literal>serialization_failure</literal>.  See <xref linkend="mvcc">
+        for more information about transaction isolation and concurrency
+        control.
+       </para>
+       <para>
+        This option is not beneficial for a dump which is intended only for
+        disaster recovery.  It could be useful for a dump used to load a
+        copy of the database for reporting or other read-only load sharing
+        while the original database continues to be updated.  Without it the
+        dump may reflect a state which is not consistent with any serial
+        execution of the transactions eventually committed.  For example, if
+        batch processing techniques are used, a batch may show as closed in
+        the dump without all of the items which are in the batch appearing.
+       </para>
+       <para>
+        This option will make no difference if there are no read-write
+        transactions active when pg_dump is started.  If read-write
+        transactions are active, the start of the dump may be delayed for an
+        indeterminate length of time.  Once running, performance with or
+        without the switch is the same.
+       </para>
+      </listitem>
+     </varlistentry>
     <varlistentry>
      <term><option>--no-tablespaces</option></term>
      <listitem>

--- a/doc/src/sgml/ref/select.sgml
+++ b/doc/src/sgml/ref/select.sgml
@@ -1144,7 +1144,7 @@ FOR SHARE [ OF <replaceable class="parameter">table_name</replaceable> [, ...] ]
    has already locked a selected row or rows, <command>SELECT FOR
    UPDATE</command> will wait for the other transaction to complete,
    and will then lock and return the updated row (or no row, if the
-    row was deleted).  Within a <literal>SERIALIZABLE</> transaction,
+    row was deleted).  Within a <literal>REPEATABLE READ</> or <literal>SERIALIZABLE</> transaction,
    however, an error will be thrown if a row to be locked has changed
    since the transaction started.  For further discussion see <xref
    linkend="mvcc">.

--- a/doc/src/sgml/ref/set_transaction.sgml
+++ b/doc/src/sgml/ref/set_transaction.sgml
@@ -15,6 +15,21 @@
  <primary>SET TRANSACTION</primary>
 </indexterm>
+ <indexterm>
+  <primary>transaction isolation level</primary>
+  <secondary>setting</secondary>
+ </indexterm>
+ <indexterm>
+  <primary>read-only transaction</primary>
+  <secondary>setting</secondary>
+ </indexterm>
+ <indexterm>
+  <primary>deferrable transaction</primary>
+  <secondary>setting</secondary>
+ </indexterm>
 <refsynopsisdiv>
 <synopsis>
 SET TRANSACTION <replaceable class="parameter">transaction_mode</replaceable> [, ...]
@@ -24,6 +39,7 @@ SET SESSION CHARACTERISTICS AS TRANSACTION <replaceable class="parameter">transa
    ISOLATION LEVEL { SERIALIZABLE | REPEATABLE READ | READ COMMITTED | READ UNCOMMITTED }
    READ WRITE | READ ONLY
+    [ NOT ] DEFERRABLE
 </synopsis>
 </refsynopsisdiv>
@@ -42,8 +58,8 @@ SET SESSION CHARACTERISTICS AS TRANSACTION <replaceable class="parameter">transa
  <para>
   The available transaction characteristics are the transaction
-   isolation level and the transaction access mode (read/write or
+   isolation level, the transaction access mode (read/write or
-   read-only).
+   read-only), and the deferrable mode.
  </para>
  <para>
@@ -62,7 +78,7 @@ SET SESSION CHARACTERISTICS AS TRANSACTION <replaceable class="parameter">transa
    </varlistentry>
    <varlistentry>
-     <term><literal>SERIALIZABLE</literal></term>
+     <term><literal>REPEATABLE READ</literal></term>
     <listitem>
      <para>
       All statements of the current transaction can only see rows committed
@@ -71,14 +87,27 @@ SET SESSION CHARACTERISTICS AS TRANSACTION <replaceable class="parameter">transa
      </para>
     </listitem>
    </varlistentry>
+    <varlistentry>
+     <term><literal>SERIALIZABLE</literal></term>
+     <listitem>
+      <para>
+       All statements of the current transaction can only see rows committed
+       before the first query or data-modification statement was executed in
+       this transaction.  If a pattern of reads and writes among concurrent
+       serializable transactions would create a situation which could not
+       have occurred for any serial (one-at-a-time) execution of those
+       transactions, one of them will be rolled back with a
+       <literal>serialization_failure</literal> <literal>SQLSTATE</literal>.
+      </para>
+     </listitem>
+    </varlistentry>
   </variablelist>
-   The SQL standard defines two additional levels, <literal>READ
+   The SQL standard defines one additional level, <literal>READ
-   UNCOMMITTED</literal> and <literal>REPEATABLE READ</literal>.
+   UNCOMMITTED</literal>.
   In <productname>PostgreSQL</productname> <literal>READ
-   UNCOMMITTED</literal> is treated as
+   UNCOMMITTED</literal> is treated as <literal>READ COMMITTED</literal>.
-   <literal>READ COMMITTED</literal>, while <literal>REPEATABLE
-   READ</literal> is treated as <literal>SERIALIZABLE</literal>.
  </para>
  <para>
@@ -127,8 +156,9 @@ SET SESSION CHARACTERISTICS AS TRANSACTION <replaceable class="parameter">transa
  <para>
   The session default transaction modes can also be set by setting the
-   configuration parameters <xref linkend="guc-default-transaction-isolation">
+   configuration parameters <xref linkend="guc-default-transaction-isolation">,
-   and <xref linkend="guc-default-transaction-read-only">.
+   <xref linkend="guc-default-transaction-read-only">, and
+   <xref linkend="guc-default-transaction-deferrable">.
   (In fact <command>SET SESSION CHARACTERISTICS</command> is just a
   verbose equivalent for setting these variables with <command>SET</>.)
   This means the defaults can be set in the configuration file, via
@@ -146,9 +176,7 @@ SET SESSION CHARACTERISTICS AS TRANSACTION <replaceable class="parameter">transa
   isolation level in the standard.  In
   <productname>PostgreSQL</productname> the default is ordinarily
   <literal>READ COMMITTED</literal>, but you can change it as
-   mentioned above.  Because of lack of predicate locking, the
+   mentioned above.
-   <literal>SERIALIZABLE</literal> level is not truly
-   serializable. See <xref linkend="mvcc"> for details.
  </para>
  <para>
@@ -158,6 +186,12 @@ SET SESSION CHARACTERISTICS AS TRANSACTION <replaceable class="parameter">transa
   not implemented in the <productname>PostgreSQL</productname> server.
  </para>
+  <para>
+   The <literal>DEFERRABLE</literal>
+   <replaceable class="parameter">transaction_mode</replaceable>
+   is a <productname>PostgreSQL</productname> language extension.
+  </para>
  <para>
   The SQL standard requires commas between successive <replaceable
   class="parameter">transaction_modes</replaceable>, but for historical
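With this patch the four SQL isolation level names map onto three distinct behaviors, and DEFERRABLE only matters in one corner of that space. The sketch below restates the rules from the set_transaction and config documentation as C; all type and function names are hypothetical, not PostgreSQL's own.

```c
#include <stdbool.h>

typedef enum
{
    REQ_READ_UNCOMMITTED,
    REQ_READ_COMMITTED,
    REQ_REPEATABLE_READ,
    REQ_SERIALIZABLE
} RequestedLevel;

typedef enum
{
    RUNS_READ_COMMITTED,     /* new snapshot for each statement */
    RUNS_SNAPSHOT_ISOLATION, /* one snapshot, taken at the first statement */
    RUNS_SSI                 /* snapshot isolation plus conflict tracking */
} EffectiveBehavior;

static EffectiveBehavior
effective_behavior(RequestedLevel level)
{
    switch (level)
    {
        case REQ_READ_UNCOMMITTED:  /* treated as READ COMMITTED */
        case REQ_READ_COMMITTED:
            return RUNS_READ_COMMITTED;
        case REQ_REPEATABLE_READ:   /* the old "serializable" behavior */
            return RUNS_SNAPSHOT_ISOLATION;
        case REQ_SERIALIZABLE:
            break;
    }
    return RUNS_SSI;                /* true serializability */
}

/* DEFERRABLE has no effect unless the transaction is read-only SERIALIZABLE. */
static bool
deferrable_has_effect(RequestedLevel level, bool read_only, bool deferrable)
{
    return deferrable && read_only && level == REQ_SERIALIZABLE;
}
```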

--- a/doc/src/sgml/ref/start_transaction.sgml
+++ b/doc/src/sgml/ref/start_transaction.sgml
@@ -27,6 +27,7 @@ START TRANSACTION [ <replaceable class="parameter">transaction_mode</replaceable
    ISOLATION LEVEL { SERIALIZABLE | REPEATABLE READ | READ COMMITTED | READ UNCOMMITTED }
    READ WRITE | READ ONLY
+    [ NOT ] DEFERRABLE
 </synopsis>
 </refsynopsisdiv>
@@ -34,8 +35,8 @@ START TRANSACTION [ <replaceable class="parameter">transaction_mode</replaceable
  <title>Description</title>
  <para>
-   This command begins a new transaction block. If the isolation level or
+   This command begins a new transaction block. If the isolation level,
-   read/write mode is specified, the new transaction has those
+   read/write mode, or deferrable mode is specified, the new transaction has those
   characteristics, as if <xref linkend="sql-set-transaction"> was executed. This is the same
   as the <xref linkend="sql-begin"> command.
  </para>
@@ -64,6 +65,12 @@ START TRANSACTION [ <replaceable class="parameter">transaction_mode</replaceable
   as a convenience.
  </para>
+  <para>
+   The <literal>DEFERRABLE</literal>
+   <replaceable class="parameter">transaction_mode</replaceable>
+   is a <productname>PostgreSQL</productname> language extension.
+  </para>
  <para>
   The SQL standard requires commas between successive <replaceable
   class="parameter">transaction_modes</replaceable>, but for historical

--- a/doc/src/sgml/spi.sgml
+++ b/doc/src/sgml/spi.sgml
@@ -340,7 +340,7 @@ SPI_execute("INSERT INTO foo SELECT * FROM bar", false, 5);
   <function>SPI_execute</function> increments the command
   counter and computes a new <firstterm>snapshot</> before executing each
   command in the string.  The snapshot does not actually change if the
-   current transaction isolation level is <literal>SERIALIZABLE</>, but in
+   current transaction isolation level is <literal>SERIALIZABLE</> or <literal>REPEATABLE READ</>, but in
   <literal>READ COMMITTED</> mode the snapshot update allows each command to
   see the results of newly committed transactions from other sessions.
   This is essential for consistent behavior when the commands are modifying

--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -57,6 +57,7 @@
 #include "storage/bufmgr.h"
 #include "storage/freespace.h"
 #include "storage/lmgr.h"
+#include "storage/predicate.h"
 #include "storage/procarray.h"
 #include "storage/smgr.h"
 #include "storage/standby.h"
@@ -261,20 +262,20 @@ heapgetpage(HeapScanDesc scan, BlockNumber page)
 	{
 		if (ItemIdIsNormal(lpp))
 		{
+			HeapTupleData loctup;
 			bool		valid;
+			loctup.t_data = (HeapTupleHeader) PageGetItem((Page) dp, lpp);
+			loctup.t_len = ItemIdGetLength(lpp);
+			ItemPointerSet(&(loctup.t_self), page, lineoff);
 			if (all_visible)
 				valid = true;
 			else
-			{
+				valid = HeapTupleSatisfiesVisibility(&loctup, snapshot, buffer);
-				HeapTupleData loctup;
-				loctup.t_data = (HeapTupleHeader) PageGetItem((Page) dp, lpp);
+			CheckForSerializableConflictOut(valid, scan->rs_rd, &loctup, buffer);
-				loctup.t_len = ItemIdGetLength(lpp);
-				ItemPointerSet(&(loctup.t_self), page, lineoff);
-				valid = HeapTupleSatisfiesVisibility(&loctup, snapshot, buffer);
-			}
 			if (valid)
 				scan->rs_vistuples[ntup++] = lineoff;
 		}
@@ -468,12 +469,16 @@ heapgettup(HeapScanDesc scan,
 													 snapshot,
 													 scan->rs_cbuf);
+				CheckForSerializableConflictOut(valid, scan->rs_rd, tuple, scan->rs_cbuf);
 				if (valid && key != NULL)
 					HeapKeyTest(tuple, RelationGetDescr(scan->rs_rd),
 								nkeys, key, valid);
 				if (valid)
 				{
+					if (!scan->rs_relpredicatelocked)
+						PredicateLockTuple(scan->rs_rd, tuple);
 					LockBuffer(scan->rs_cbuf, BUFFER_LOCK_UNLOCK);
 					return;
 				}
@@ -741,12 +746,16 @@ heapgettup_pagemode(HeapScanDesc scan,
 							nkeys, key, valid);
 				if (valid)
 				{
+					if (!scan->rs_relpredicatelocked)
+						PredicateLockTuple(scan->rs_rd, tuple);
 					scan->rs_cindex = lineindex;
 					return;
 				}
 			}
 			else
 			{
+				if (!scan->rs_relpredicatelocked)
+					PredicateLockTuple(scan->rs_rd, tuple);
 				scan->rs_cindex = lineindex;
 				return;
 			}
@@ -1213,6 +1222,7 @@ heap_beginscan_internal(Relation relation, Snapshot snapshot,
 	scan->rs_strategy = NULL;	/* set in initscan */
 	scan->rs_allow_strat = allow_strat;
 	scan->rs_allow_sync = allow_sync;
+	scan->rs_relpredicatelocked = false;
 	/*
 	 * we can use page-at-a-time mode if it's an MVCC-safe snapshot
@@ -1459,8 +1469,13 @@ heap_fetch(Relation relation,
 	 */
 	valid = HeapTupleSatisfiesVisibility(tuple, snapshot, buffer);
+	if (valid)
+		PredicateLockTuple(relation, tuple);
 	LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
+	CheckForSerializableConflictOut(valid, relation, tuple, buffer);
 	if (valid)
 	{
 		/*
@@ -1506,13 +1521,15 @@ heap_fetch(Relation relation,
 * heap_fetch, we do not report any pgstats count; caller may do so if wanted.
 */
 bool
-heap_hot_search_buffer(ItemPointer tid, Buffer buffer, Snapshot snapshot,
+heap_hot_search_buffer(ItemPointer tid, Relation relation, Buffer buffer,
-					   bool *all_dead)
+					   Snapshot snapshot, bool *all_dead)
 {
 	Page		dp = (Page) BufferGetPage(buffer);
 	TransactionId prev_xmax = InvalidTransactionId;
 	OffsetNumber offnum;
 	bool		at_chain_start;
+	bool		valid;
+	bool		match_found;
 	if (all_dead)
 		*all_dead = true;
@@ -1522,6 +1539,7 @@ heap_hot_search_buffer(ItemPointer tid, Buffer buffer, Snapshot snapshot,
 	Assert(ItemPointerGetBlockNumber(tid) == BufferGetBlockNumber(buffer));
 	offnum = ItemPointerGetOffsetNumber(tid);
 	at_chain_start = true;
+	match_found = false;
 	/* Scan through possible multiple members of HOT-chain */
 	for (;;)
@@ -1552,6 +1570,8 @@ heap_hot_search_buffer(ItemPointer tid, Buffer buffer, Snapshot snapshot,
 		heapTuple.t_data = (HeapTupleHeader) PageGetItem(dp, lp);
 		heapTuple.t_len = ItemIdGetLength(lp);
+		heapTuple.t_tableOid = relation->rd_id;
+		heapTuple.t_self = *tid;
 		/*
 		 * Shouldn't see a HEAP_ONLY tuple at chain start.
@@ -1569,12 +1589,18 @@ heap_hot_search_buffer(ItemPointer tid, Buffer buffer, Snapshot snapshot,
 			break;
 		/* If it's visible per the snapshot, we must return it */
-		if (HeapTupleSatisfiesVisibility(&heapTuple, snapshot, buffer))
+		valid = HeapTupleSatisfiesVisibility(&heapTuple, snapshot, buffer);
+		CheckForSerializableConflictOut(valid, relation, &heapTuple, buffer);
+		if (valid)
 		{
 			ItemPointerSetOffsetNumber(tid, offnum);
+			PredicateLockTuple(relation, &heapTuple);
 			if (all_dead)
 				*all_dead = false;
-			return true;
+			if (IsolationIsSerializable())
+				match_found = true;
+			else
+				return true;
 		}
 		/*
@@ -1603,7 +1629,7 @@ heap_hot_search_buffer(ItemPointer tid, Buffer buffer, Snapshot snapshot,
 			break;				/* end of chain */
 	}
-	return false;
+	return match_found;
 }
 /*
@@ -1622,7 +1648,7 @@ heap_hot_search(ItemPointer tid, Relation relation, Snapshot snapshot,
 	buffer = ReadBuffer(relation, ItemPointerGetBlockNumber(tid));
 	LockBuffer(buffer, BUFFER_LOCK_SHARE);
-	result = heap_hot_search_buffer(tid, buffer, snapshot, all_dead);
+	result = heap_hot_search_buffer(tid, relation, buffer, snapshot, all_dead);
 	LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
 	ReleaseBuffer(buffer);
 	return result;
@@ -1729,6 +1755,7 @@ heap_get_latest_tid(Relation relation,
 		 * result candidate.
 		 */
 		valid = HeapTupleSatisfiesVisibility(&tp, snapshot, buffer);
+		CheckForSerializableConflictOut(valid, relation, &tp, buffer);
 		if (valid)
 			*tid = ctid;
@@ -1893,6 +1920,13 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	buffer = RelationGetBufferForTuple(relation, heaptup->t_len,
 									   InvalidBuffer, options, bistate);
+	/*
+	 * We're about to do the actual insert -- check for conflict at the
+	 * relation or buffer level first, to avoid possibly having to roll
+	 * back work we've just done.
+	 */
+	CheckForSerializableConflictIn(relation, NULL, buffer);
 	/* NO EREPORT(ERROR) from here till changes are logged */
 	START_CRIT_SECTION();
@@ -2193,6 +2227,12 @@ l1:
 		return result;
 	}
+	/*
+	 * We're about to do the actual delete -- check for conflict first,
+	 * to avoid possibly having to roll back work we've just done.
+	 */
+	CheckForSerializableConflictIn(relation, &tp, buffer);
 	/* replace cid with a combo cid if necessary */
 	HeapTupleHeaderAdjustCmax(tp.t_data, &cid, &iscombo);
@@ -2546,6 +2586,12 @@ l2:
 		return result;
 	}
+	/*
+	 * We're about to do the actual update -- check for conflict first,
+	 * to avoid possibly having to roll back work we've just done.
+	 */
+	CheckForSerializableConflictIn(relation, &oldtup, buffer);
 	/* Fill in OID and transaction status data for newtup */
 	if (relation->rd_rel->relhasoids)
 	{
@@ -2690,6 +2736,16 @@ l2:
 		heaptup = newtup;
 	}
+	/*
+	 * We're about to create the new tuple -- check for conflict first,
+	 * to avoid possibly having to roll back work we've just done.
+	 *
+	 * NOTE: For a tuple insert, we only need to check for table locks, since
+	 * predicate locking at the index level will cover ranges for anything
+	 * except a table scan.  Therefore, only provide the relation.
+	 */
+	CheckForSerializableConflictIn(relation, NULL, InvalidBuffer);
 	/*
 	 * At this point newbuf and buffer are both pinned and locked, and newbuf
 	 * has enough space for the new tuple.	If they are the same buffer, only
@@ -2799,6 +2855,12 @@ l2:
 	END_CRIT_SECTION();
+	/*
+	 * Any existing SIREAD locks on the old tuple must be linked to the new
+	 * tuple for conflict detection purposes.
+	 */
+	PredicateLockTupleRowVersionLink(relation, &oldtup, newtup);
 	if (newbuf != buffer)
 		LockBuffer(newbuf, BUFFER_LOCK_UNLOCK);
 	LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
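
The heapam.c call sites above check for rw-conflicts at different granularities. heap_insert passes a NULL tuple plus the target buffer (relation and page). heap_delete and heap_update pass the old tuple and its buffer. The check before creating the new tuple version passes only the relation. The following is a minimal toy model of that coverage rule, assuming a flat array of SIREAD lock targets. The names (ToyTarget, toy_write_conflicts, toy_check_conflict_in) are hypothetical and do not reflect predicate.c's actual hash tables or function signatures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical toy types, not PostgreSQL's: a SIREAD lock target is a
 * whole relation, one page of it, or one tuple on that page.
 */
typedef enum { TARGET_RELATION, TARGET_PAGE, TARGET_TUPLE } TargetLevel;

typedef struct
{
	TargetLevel level;
	unsigned	relid;
	unsigned	page;			/* ignored for TARGET_RELATION */
	unsigned	offset;			/* used only for TARGET_TUPLE */
} ToyTarget;

/*
 * Does a write described by (relid, page, offset) fall under this lock?
 * has_page/has_tuple mirror the callers: a NULL tuple and an invalid
 * buffer mean that only the relation-level check applies.
 */
static bool
toy_write_conflicts(const ToyTarget *lock, unsigned relid,
					bool has_page, unsigned page,
					bool has_tuple, unsigned offset)
{
	if (lock->relid != relid)
		return false;
	switch (lock->level)
	{
		case TARGET_RELATION:
			return true;		/* covers every page and tuple */
		case TARGET_PAGE:
			return has_page && lock->page == page;
		case TARGET_TUPLE:
			return has_tuple && lock->page == page && lock->offset == offset;
	}
	return false;
}

/* True if any SIREAD lock in the array conflicts with the write. */
static bool
toy_check_conflict_in(const ToyTarget *locks, size_t nlocks, unsigned relid,
					  bool has_page, unsigned page,
					  bool has_tuple, unsigned offset)
{
	assert(!has_tuple || has_page);	/* a tuple always lives on a page */
	for (size_t i = 0; i < nlocks; i++)
		if (toy_write_conflicts(&locks[i], relid, has_page, page,
								has_tuple, offset))
			return true;
	return false;
}
```

In this model a relation-only check does not see page-level locks, which is what the NOTE before the new-tuple check relies on: page locks taken by index scans are expected to cover the key ranges for anything except a table scan.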

--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -64,9 +64,11 @@
 #include "access/relscan.h"
 #include "access/transam.h"
+#include "access/xact.h"
 #include "pgstat.h"
 #include "storage/bufmgr.h"
 #include "storage/lmgr.h"
+#include "storage/predicate.h"
 #include "utils/relcache.h"
 #include "utils/snapmgr.h"
 #include "utils/tqual.h"
@@ -192,6 +194,11 @@ index_insert(Relation indexRelation,
 	RELATION_CHECKS;
 	GET_REL_PROCEDURE(aminsert);
+	if (!(indexRelation->rd_am->ampredlocks))
+		CheckForSerializableConflictIn(indexRelation,
+									   (HeapTuple) NULL,
+									   InvalidBuffer);
 	/*
 	 * have the am's insert proc do all the work.
 	 */
@@ -266,6 +273,9 @@ index_beginscan_internal(Relation indexRelation,
 	RELATION_CHECKS;
 	GET_REL_PROCEDURE(ambeginscan);
+	if (!(indexRelation->rd_am->ampredlocks))
+		PredicateLockRelation(indexRelation);
 	/*
 	 * We hold a reference count to the relcache entry throughout the scan.
 	 */
@@ -523,6 +533,7 @@ index_getnext(IndexScanDesc scan, ScanDirection direction)
 		{
 			ItemId		lp;
 			ItemPointer ctid;
+			bool		valid;
 			/* check for bogus TID */
 			if (offnum < FirstOffsetNumber ||
@@ -577,8 +588,13 @@ index_getnext(IndexScanDesc scan, ScanDirection direction)
 				break;
 			/* If it's visible per the snapshot, we must return it */
-			if (HeapTupleSatisfiesVisibility(heapTuple, scan->xs_snapshot,
+			valid = HeapTupleSatisfiesVisibility(heapTuple, scan->xs_snapshot,
-											 scan->xs_cbuf))
+												 scan->xs_cbuf);
+			CheckForSerializableConflictOut(valid, scan->heapRelation,
+											heapTuple, scan->xs_cbuf);
+			if (valid)
 			{
 				/*
 				 * If the snapshot is MVCC, we know that it could accept at
@@ -586,7 +602,8 @@ index_getnext(IndexScanDesc scan, ScanDirection direction)
 				 * any more members.  Otherwise, check for continuation of the
 				 * HOT-chain, and set state for next time.
 				 */
-				if (IsMVCCSnapshot(scan->xs_snapshot))
+				if (IsMVCCSnapshot(scan->xs_snapshot)
+					&& !IsolationIsSerializable())
 					scan->xs_next_hot = InvalidOffsetNumber;
 				else if (HeapTupleIsHotUpdated(heapTuple))
 				{
@@ -598,6 +615,8 @@ index_getnext(IndexScanDesc scan, ScanDirection direction)
 				else
 					scan->xs_next_hot = InvalidOffsetNumber;
+				PredicateLockTuple(scan->heapRelation, heapTuple);
 				LockBuffer(scan->xs_cbuf, BUFFER_LOCK_UNLOCK);
 				pgstat_count_heap_fetch(scan->indexRelation);

--- a/src/backend/access/nbtree/nbtinsert.c
+++ b/src/backend/access/nbtree/nbtinsert.c
@@ -21,6 +21,7 @@
 #include "miscadmin.h"
 #include "storage/bufmgr.h"
 #include "storage/lmgr.h"
+#include "storage/predicate.h"
 #include "utils/inval.h"
 #include "utils/tqual.h"
@@ -174,6 +175,14 @@ top:
 	if (checkUnique != UNIQUE_CHECK_EXISTING)
 	{
+		/*
+		 * The only conflict predicate locking cares about for indexes is when
+		 * an index tuple insert conflicts with an existing lock.  Since the
+		 * actual location of the insert is hard to predict because of the
+		 * random search used to prevent O(N^2) performance when there are many
+		 * duplicate entries, we can just use the "first valid" page.
+		 */
+		CheckForSerializableConflictIn(rel, NULL, buf);
 		/* do the insertion */
 		_bt_findinsertloc(rel, &buf, &offset, natts, itup_scankey, itup, heapRel);
 		_bt_insertonpg(rel, buf, stack, itup, offset, false);
@@ -696,6 +705,9 @@ _bt_insertonpg(Relation rel,
 		/* split the buffer into left and right halves */
 		rbuf = _bt_split(rel, buf, firstright,
 						 newitemoff, itemsz, itup, newitemonleft);
+		PredicateLockPageSplit(rel,
+							   BufferGetBlockNumber(buf),
+							   BufferGetBlockNumber(rbuf));
 		/*----------
 		 * By here,

--- a/src/backend/access/nbtree/nbtpage.c
+++ b/src/backend/access/nbtree/nbtpage.c
@@ -29,6 +29,7 @@
 #include "storage/freespace.h"
 #include "storage/indexfsm.h"
 #include "storage/lmgr.h"
+#include "storage/predicate.h"
 #include "utils/inval.h"
 #include "utils/snapmgr.h"
@@ -1183,6 +1184,12 @@ _bt_pagedel(Relation rel, Buffer buf, BTStack stack)
 			 rightsib, opaque->btpo_prev, target,
 			 RelationGetRelationName(rel));
+	/*
+	 * Any insert which would have gone on the target block will now go to the
+	 * right sibling block.
+	 */
+	PredicateLockPageCombine(rel, target, rightsib);
 	/*
 	 * Next find and write-lock the current parent of the target page. This is
 	 * essentially the same as the corresponding step of splitting.
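
PredicateLockPageSplit and PredicateLockPageCombine keep index-page SIREAD locks meaningful when the B-tree changes shape. The sketch below is a toy model of page-level locks in a single index, using hypothetical names (ToyLockTable, toy_page_split, toy_page_combine) and a fixed-size array instead of shared-memory hash tables. It assumes that a split copies every lock on the original page to the new right page, since keys covered by the old page may have moved there, and, per the _bt_pagedel comment, that a deleted page's locks move to its right sibling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_LOCKS 64

/* Hypothetical page-level SIREAD lock: (transaction, block) in one index. */
typedef struct
{
	int			xact;
	unsigned	blkno;
} ToyPageLock;

typedef struct
{
	ToyPageLock locks[TOY_MAX_LOCKS];
	size_t		n;
} ToyLockTable;

static bool
toy_holds(const ToyLockTable *t, int xact, unsigned blkno)
{
	for (size_t i = 0; i < t->n; i++)
		if (t->locks[i].xact == xact && t->locks[i].blkno == blkno)
			return true;
	return false;
}

static void
toy_add(ToyLockTable *t, int xact, unsigned blkno)
{
	if (toy_holds(t, xact, blkno))
		return;					/* a duplicate lock would be redundant */
	assert(t->n < TOY_MAX_LOCKS);
	t->locks[t->n++] = (ToyPageLock) {xact, blkno};
}

/* Split: every holder of a lock on oldblk also gets one on newblk. */
static void
toy_page_split(ToyLockTable *t, unsigned oldblk, unsigned newblk)
{
	size_t		n = t->n;		/* scan only locks present before copying */

	for (size_t i = 0; i < n; i++)
		if (t->locks[i].blkno == oldblk)
			toy_add(t, t->locks[i].xact, newblk);
}

/* Combine (page deletion): locks move from target to rightsib. */
static void
toy_page_combine(ToyLockTable *t, unsigned target, unsigned rightsib)
{
	size_t		i = 0;

	while (i < t->n)
	{
		if (t->locks[i].blkno == target)
		{
			int			xact = t->locks[i].xact;

			t->locks[i] = t->locks[--t->n];	/* remove by swapping in last */
			toy_add(t, xact, rightsib);
			/* don't advance i: slot i now holds a different lock */
		}
		else
			i++;
	}
}
```

Copying rather than moving on a split is the conservative choice: a reader that locked the old page must still conflict with inserts into either half, at the cost of possibly more false positives.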

--- a/src/backend/access/nbtree/nbtree.c
+++ b/src/backend/access/nbtree/nbtree.c
@@ -29,6 +29,7 @@
 #include "storage/indexfsm.h"
 #include "storage/ipc.h"
 #include "storage/lmgr.h"
+#include "storage/predicate.h"
 #include "storage/smgr.h"
 #include "utils/memutils.h"
@@ -822,6 +823,7 @@ restart:
 	if (_bt_page_recyclable(page))
 	{
 		/* Okay to recycle this page */
+		Assert(!PageIsPredicateLocked(rel, blkno));
 		RecordFreeIndexPage(rel, blkno);
 		vstate->totFreePages++;
 		stats->pages_deleted++;

--- a/src/backend/access/nbtree/nbtsearch.c
+++ b/src/backend/access/nbtree/nbtsearch.c
@@ -21,6 +21,7 @@
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "storage/bufmgr.h"
+#include "storage/predicate.h"
 #include "utils/lsyscache.h"
 #include "utils/rel.h"
@@ -63,7 +64,10 @@ _bt_search(Relation rel, int keysz, ScanKey scankey, bool nextkey,
 	/* If index is empty and access = BT_READ, no root page is created. */
 	if (!BufferIsValid(*bufP))
+	{
+		PredicateLockRelation(rel);  /* Nothing finer to lock exists. */
 		return (BTStack) NULL;
+	}
 	/* Loop iterates once per level descended in the tree */
 	for (;;)
@@ -88,7 +92,11 @@ _bt_search(Relation rel, int keysz, ScanKey scankey, bool nextkey,
 		page = BufferGetPage(*bufP);
 		opaque = (BTPageOpaque) PageGetSpecialPointer(page);
 		if (P_ISLEAF(opaque))
+		{
+			if (access == BT_READ)
+				PredicateLockPage(rel, BufferGetBlockNumber(*bufP));
 			break;
+		}
 		/*
 		 * Find the appropriate item on the internal page, and get the child
@@ -1142,6 +1150,7 @@ _bt_steppage(IndexScanDesc scan, ScanDirection dir)
 			opaque = (BTPageOpaque) PageGetSpecialPointer(page);
 			if (!P_IGNORE(opaque))
 			{
+				PredicateLockPage(rel, blkno);
 				/* see if there are any matches on this page */
 				/* note that this will clear moreRight if we can stop */
 				if (_bt_readpage(scan, dir, P_FIRSTDATAKEY(opaque)))
@@ -1189,6 +1198,7 @@ _bt_steppage(IndexScanDesc scan, ScanDirection dir)
 			opaque = (BTPageOpaque) PageGetSpecialPointer(page);
 			if (!P_IGNORE(opaque))
 			{
+				PredicateLockPage(rel, BufferGetBlockNumber(so->currPos.buf));
 				/* see if there are any matches on this page */
 				/* note that this will clear moreLeft if we can stop */
 				if (_bt_readpage(scan, dir, PageGetMaxOffsetNumber(page)))
@@ -1352,6 +1362,7 @@ _bt_get_endpoint(Relation rel, uint32 level, bool rightmost)
 	if (!BufferIsValid(buf))
 	{
 		/* empty index... */
+		PredicateLockRelation(rel);  /* Nothing finer to lock exists. */
 		return InvalidBuffer;
 	}
@@ -1431,10 +1442,12 @@ _bt_endpoint(IndexScanDesc scan, ScanDirection dir)
 	if (!BufferIsValid(buf))
 	{
 		/* empty index... */
+		PredicateLockRelation(rel);  /* Nothing finer to lock exists. */
 		so->currPos.buf = InvalidBuffer;
 		return false;
 	}
+	PredicateLockPage(rel, BufferGetBlockNumber(buf));
 	page = BufferGetPage(buf);
 	opaque = (BTPageOpaque) PageGetSpecialPointer(page);
 	Assert(P_ISLEAF(opaque));

--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -57,6 +57,7 @@
 #include "pgstat.h"
 #include "replication/walsender.h"
 #include "storage/fd.h"
+#include "storage/predicate.h"
 #include "storage/procarray.h"
 #include "storage/sinvaladt.h"
 #include "storage/smgr.h"
@@ -1357,6 +1358,8 @@ FinishPreparedTransaction(const char *gid, bool isCommit)
 	else
 		ProcessRecords(bufptr, xid, twophase_postabort_callbacks);
+	PredicateLockTwoPhaseFinish(xid, isCommit);
 	/* Count the prepared xact as committed or aborted */
 	AtEOXact_PgStat(isCommit);

--- a/src/backend/access/transam/twophase_rmgr.c
+++ b/src/backend/access/transam/twophase_rmgr.c
@@ -18,12 +18,14 @@
 #include "access/twophase_rmgr.h"
 #include "pgstat.h"
 #include "storage/lock.h"
+#include "storage/predicate.h"
 const TwoPhaseCallback twophase_recover_callbacks[TWOPHASE_RM_MAX_ID + 1] =
 {
 	NULL,						/* END ID */
 	lock_twophase_recover,		/* Lock */
+	predicatelock_twophase_recover,		/* PredicateLock */
 	NULL,						/* pgstat */
 	multixact_twophase_recover	/* MultiXact */
 };
@@ -32,6 +34,7 @@ const TwoPhaseCallback twophase_postcommit_callbacks[TWOPHASE_RM_MAX_ID + 1] =
 {
 	NULL,						/* END ID */
 	lock_twophase_postcommit,	/* Lock */
+	NULL,						/* PredicateLock */
 	pgstat_twophase_postcommit, /* pgstat */
 	multixact_twophase_postcommit		/* MultiXact */
 };
@@ -40,6 +43,7 @@ const TwoPhaseCallback twophase_postabort_callbacks[TWOPHASE_RM_MAX_ID + 1] =
 {
 	NULL,						/* END ID */
 	lock_twophase_postabort,	/* Lock */
+	NULL,						/* PredicateLock */
 	pgstat_twophase_postabort,	/* pgstat */
 	multixact_twophase_postabort	/* MultiXact */
 };
@@ -48,6 +52,7 @@ const TwoPhaseCallback twophase_standby_recover_callbacks[TWOPHASE_RM_MAX_ID + 1
 {
 	NULL,						/* END ID */
 	lock_twophase_standby_recover,		/* Lock */
+	NULL,						/* PredicateLock */
 	NULL,						/* pgstat */
 	NULL						/* MultiXact */
 };
--- a/src/backend/access/transam/varsup.c
+++ b/src/backend/access/transam/varsup.c
@@ -21,6 +21,7 @@
 #include "miscadmin.h"
 #include "postmaster/autovacuum.h"
 #include "storage/pmsignal.h"
+#include "storage/predicate.h"
 #include "storage/proc.h"
 #include "utils/builtins.h"
 #include "utils/syscache.h"
@@ -161,6 +162,10 @@ GetNewTransactionId(bool isSubXact)
 	ExtendCLOG(xid);
 	ExtendSUBTRANS(xid);
+	/* If it's top level, the predicate locking system also needs to know. */
+	if (!isSubXact)
+		RegisterPredicateLockingXid(xid);
 	/*
 	 * Now advance the nextXid counter.  This must not happen until after we
 	 * have successfully completed ExtendCLOG() --- if that routine fails, we

--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -40,6 +40,7 @@
 #include "storage/bufmgr.h"
 #include "storage/fd.h"
 #include "storage/lmgr.h"
+#include "storage/predicate.h"
 #include "storage/procarray.h"
 #include "storage/sinvaladt.h"
 #include "storage/smgr.h"
@@ -63,6 +64,9 @@ int			XactIsoLevel;
 bool		DefaultXactReadOnly = false;
 bool		XactReadOnly;
+bool		DefaultXactDeferrable = false;
+bool		XactDeferrable;
 bool		XactSyncCommit = true;
 int			CommitDelay = 0;	/* precommit delay in microseconds */
@@ -1640,6 +1644,7 @@ StartTransaction(void)
 		s->startedInRecovery = false;
 		XactReadOnly = DefaultXactReadOnly;
 	}
+	XactDeferrable = DefaultXactDeferrable;
 	XactIsoLevel = DefaultXactIsoLevel;
 	forceSyncCommit = false;
 	MyXactAccessedTempRel = false;
@@ -1786,6 +1791,13 @@ CommitTransaction(void)
 	/* close large objects before lower-level cleanup */
 	AtEOXact_LargeObject(true);
+	/*
+	 * Mark serializable transaction as complete for predicate locking
+	 * purposes.  This should be done as late as we can put it and still
+	 * allow errors to be raised for failure patterns found at commit.
+	 */
+	PreCommit_CheckForSerializationFailure();
 	/*
 	 * Insert notifications sent by NOTIFY commands into the queue.  This
 	 * should be late in the pre-commit sequence to minimize time spent
@@ -1980,6 +1992,13 @@ PrepareTransaction(void)
 	/* close large objects before lower-level cleanup */
 	AtEOXact_LargeObject(true);
+	/*
+	 * Mark serializable transaction as complete for predicate locking
+	 * purposes.  This should be done as late as we can put it and still
+	 * allow errors to be raised for failure patterns found at commit.
+	 */
+	PreCommit_CheckForSerializationFailure();
 	/* NOTIFY will be handled below */
 	/*
@@ -2044,6 +2063,7 @@ PrepareTransaction(void)
 	AtPrepare_Notify();
 	AtPrepare_Locks();
+	AtPrepare_PredicateLocks();
 	AtPrepare_PgStat();
 	AtPrepare_MultiXact();
 	AtPrepare_RelationMap();
@@ -2103,6 +2123,7 @@ PrepareTransaction(void)
 	PostPrepare_MultiXact(xid);
 	PostPrepare_Locks(xid);
+	PostPrepare_PredicateLocks(xid);
 	ResourceOwnerRelease(TopTransactionResourceOwner,
 						 RESOURCE_RELEASE_LOCKS,

--- a/src/backend/commands/variable.c
+++ b/src/backend/commands/variable.c
@@ -616,6 +616,15 @@ assign_XactIsoLevel(const char *value, bool doit, GucSource source)
 					 errmsg("SET TRANSACTION ISOLATION LEVEL must not be called in a subtransaction")));
 			return NULL;
 		}
+		/* Can't go to serializable mode while recovery is still active */
+		if (RecoveryInProgress() && strcmp(value, "serializable") == 0)
+		{
+			ereport(GUC_complaint_elevel(source),
+					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+					 errmsg("cannot use serializable mode in a hot standby"),
+					 errhint("You can use REPEATABLE READ instead.")));
+			return false;
+			return NULL;
 	}
 	if (strcmp(value, "serializable") == 0)
@@ -667,6 +676,35 @@ show_XactIsoLevel(void)
 	}
 }
+/*
+ * SET TRANSACTION [NOT] DEFERRABLE
+ */
+bool
+assign_transaction_deferrable(bool newval, bool doit, GucSource source)
+{
+	/* source == PGC_S_OVERRIDE means do it anyway, eg at xact abort */
+	if (source == PGC_S_OVERRIDE)
+		return true;
+	if (IsSubTransaction())
+	{
+		ereport(GUC_complaint_elevel(source),
+				(errcode(ERRCODE_ACTIVE_SQL_TRANSACTION),
+				 errmsg("SET TRANSACTION [NOT] DEFERRABLE cannot be called within a subtransaction")));
+		return false;
+	}
+	if (FirstSnapshotSet)
+	{
+		ereport(GUC_complaint_elevel(source),
+				(errcode(ERRCODE_ACTIVE_SQL_TRANSACTION),
+				 errmsg("SET TRANSACTION [NOT] DEFERRABLE must be called before any query")));
+		return false;
+	}
+	return true;
+}
 /*
 * Random number seed

--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -42,6 +42,7 @@
 #include "executor/nodeBitmapHeapscan.h"
 #include "pgstat.h"
 #include "storage/bufmgr.h"
+#include "storage/predicate.h"
 #include "utils/memutils.h"
 #include "utils/snapmgr.h"
 #include "utils/tqual.h"
@@ -351,7 +352,7 @@ bitgetpage(HeapScanDesc scan, TBMIterateResult *tbmres)
 			ItemPointerData tid;
 			ItemPointerSet(&tid, page, offnum);
-			if (heap_hot_search_buffer(&tid, buffer, snapshot, NULL))
+			if (heap_hot_search_buffer(&tid, scan->rs_rd, buffer, snapshot, NULL))
 				scan->rs_vistuples[ntup++] = ItemPointerGetOffsetNumber(&tid);
 		}
 	}

--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -28,6 +28,7 @@
 #include "access/relscan.h"
 #include "executor/execdebug.h"
 #include "executor/nodeSeqscan.h"
+#include "storage/predicate.h"
 static void InitScanRelation(SeqScanState *node, EState *estate);
 static TupleTableSlot *SeqNext(SeqScanState *node);
@@ -105,11 +106,15 @@ SeqRecheck(SeqScanState *node, TupleTableSlot *slot)
 *		tuple.
 *		We call the ExecScan() routine and pass it the appropriate
 *		access method functions.
+ *		For serializable transactions, we first acquire a predicate
+ *		lock on the entire relation.
 * ----------------------------------------------------------------
 */
 TupleTableSlot *
 ExecSeqScan(SeqScanState *node)
 {
+	PredicateLockRelation(node->ss_currentRelation);
+	node->ss_currentScanDesc->rs_relpredicatelocked = true;
 	return ExecScan((ScanState *) node,
 					(ExecScanAccessMtd) SeqNext,
 					(ExecScanRecheckMtd) SeqRecheck);

--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -6768,6 +6768,12 @@ transaction_mode_item:
 			| READ WRITE
 					{ $$ = makeDefElem("transaction_read_only",
 									   makeIntConst(FALSE, @1)); }
+			| DEFERRABLE
+					{ $$ = makeDefElem("transaction_deferrable",
+									   makeIntConst(TRUE, @1)); }
+			| NOT DEFERRABLE
+					{ $$ = makeDefElem("transaction_deferrable",
+									   makeIntConst(FALSE, @1)); }
 		;
 /* Syntax with commas is SQL-spec, without commas is Postgres historical */

--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -32,6 +32,7 @@
 #include "storage/ipc.h"
 #include "storage/pg_shmem.h"
 #include "storage/pmsignal.h"
+#include "storage/predicate.h"
 #include "storage/procarray.h"
 #include "storage/procsignal.h"
 #include "storage/sinvaladt.h"
@@ -105,6 +106,7 @@ CreateSharedMemoryAndSemaphores(bool makePrivate, int port)
 												 sizeof(ShmemIndexEnt)));
 		size = add_size(size, BufferShmemSize());
 		size = add_size(size, LockShmemSize());
+		size = add_size(size, PredicateLockShmemSize());
 		size = add_size(size, ProcGlobalShmemSize());
 		size = add_size(size, XLOGShmemSize());
 		size = add_size(size, CLOGShmemSize());
@@ -199,6 +201,11 @@ CreateSharedMemoryAndSemaphores(bool makePrivate, int port)
 	 */
 	InitLocks();
+	/*
+	 * Set up predicate lock manager
+	 */
+	InitPredicateLocks();
 	/*
 	 * Set up process table
 	 */

--- a/src/backend/storage/ipc/shmem.c
+++ b/src/backend/storage/ipc/shmem.c
@@ -198,7 +198,7 @@ ShmemAlloc(Size size)
 * Returns TRUE if the pointer points within the shared memory segment.
 */
 bool
-ShmemAddrIsValid(void *addr)
+ShmemAddrIsValid(const void *addr)
 {
 	return (addr >= ShmemBase) && (addr < ShmemEnd);
 }

--- a/src/backend/storage/ipc/shmqueue.c
+++ b/src/backend/storage/ipc/shmqueue.c
@@ -43,14 +43,12 @@ SHMQueueInit(SHM_QUEUE *queue)
 * SHMQueueIsDetached -- TRUE if element is not currently
 *		in a queue.
 */
-#ifdef NOT_USED
 bool
-SHMQueueIsDetached(SHM_QUEUE *queue)
+SHMQueueIsDetached(const SHM_QUEUE *queue)
 {
 	Assert(ShmemAddrIsValid(queue));
 	return (queue->prev == NULL);
 }
-#endif
 /*
 * SHMQueueElemInit -- clear an element's links
@@ -146,7 +144,7 @@ SHMQueueInsertAfter(SHM_QUEUE *queue, SHM_QUEUE *elem)
 *--------------------
 */
 Pointer
-SHMQueueNext(SHM_QUEUE *queue, SHM_QUEUE *curElem, Size linkOffset)
+SHMQueueNext(const SHM_QUEUE *queue, const SHM_QUEUE *curElem, Size linkOffset)
 {
 	SHM_QUEUE  *elemPtr = curElem->next;
@@ -162,7 +160,7 @@ SHMQueueNext(SHM_QUEUE *queue, SHM_QUEUE *curElem, Size linkOffset)
 * SHMQueueEmpty -- TRUE if queue head is only element, FALSE otherwise
 */
 bool
-SHMQueueEmpty(SHM_QUEUE *queue)
+SHMQueueEmpty(const SHM_QUEUE *queue)
 {
 	Assert(ShmemAddrIsValid(queue));

--- a/src/backend/storage/lmgr/Makefile
+++ b/src/backend/storage/lmgr/Makefile
@@ -12,7 +12,7 @@ subdir = src/backend/storage/lmgr
 top_builddir = ../../../..
 include $(top_builddir)/src/Makefile.global
-OBJS = lmgr.o lock.o proc.o deadlock.o lwlock.o spin.o s_lock.o
+OBJS = lmgr.o lock.o proc.o deadlock.o lwlock.o spin.o s_lock.o predicate.o
 include $(top_srcdir)/src/backend/common.mk

--- a/src/backend/storage/lmgr/README
+++ b/src/backend/storage/lmgr/README
@@ -3,7 +3,7 @@ src/backend/storage/lmgr/README
 Locking Overview
 ================
-Postgres uses three types of interprocess locks:
+Postgres uses four types of interprocess locks:
 * Spinlocks.  These are intended for *very* short-term locks.  If a lock
 is to be held more than a few dozen instructions, or across any sort of
@@ -34,6 +34,8 @@ supports a variety of lock modes with table-driven semantics, and it has
 full deadlock detection and automatic release at transaction end.
 Regular locks should be used for all user-driven lock requests.
+* SIReadLock predicate locks.  See separate README-SSI file for details.
 Acquisition of either a spinlock or a lightweight lock causes query
 cancel and die() interrupts to be held off until all such locks are
 released. No such restriction exists for regular locks, however.  Also

--- a/src/backend/storage/lmgr/README-SSI
+++ b/src/backend/storage/lmgr/README-SSI
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -28,6 +28,7 @@
 #include "miscadmin.h"
 #include "pg_trace.h"
 #include "storage/ipc.h"
+#include "storage/predicate.h"
 #include "storage/proc.h"
 #include "storage/spin.h"
@@ -178,6 +179,9 @@ NumLWLocks(void)
 	/* async.c needs one per Async buffer */
 	numLocks += NUM_ASYNC_BUFFERS;
+	/* predicate.c needs one per old serializable xid buffer */
+	numLocks += NUM_OLDSERXID_BUFFERS;
 	/*
 	 * Add any requested by loadable modules; for backwards-compatibility
 	 * reasons, allocate at least NUM_USER_DEFINED_LWLOCKS of them even if

--- a/src/backend/storage/lmgr/predicate.c
+++ b/src/backend/storage/lmgr/predicate.c
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -374,6 +374,10 @@ standard_ProcessUtility(Node *parsetree,
 									SetPGVariable("transaction_read_only",
 												  list_make1(item->arg),
 												  true);
+								else if (strcmp(item->defname, "transaction_deferrable") == 0)
+									SetPGVariable("transaction_deferrable",
+												  list_make1(item->arg),
+												  true);
 							}
 						}
 						break;

--- a/src/backend/utils/adt/lockfuncs.c
+++ b/src/backend/utils/adt/lockfuncs.c
@@ -15,6 +15,7 @@
 #include "catalog/pg_type.h"
 #include "funcapi.h"
 #include "miscadmin.h"
+#include "storage/predicate_internals.h"
 #include "storage/proc.h"
 #include "utils/builtins.h"
@@ -32,11 +33,20 @@ static const char *const LockTagTypeNames[] = {
 	"advisory"
 };
+/* This must match enum PredicateLockTargetType (predicate_internals.h) */
+static const char *const PredicateLockTagTypeNames[] = {
+	"relation",
+	"page",
+	"tuple"
+};
 /* Working status for pg_lock_status */
 typedef struct
 {
 	LockData   *lockData;		/* state data from lmgr */
 	int			currIdx;		/* current PROCLOCK index */
+	PredicateLockData *predLockData;	/* state data for pred locks */
+	int			predLockIdx;	/* current index for pred lock */
 } PG_Lock_Status;
@@ -69,6 +79,7 @@ pg_lock_status(PG_FUNCTION_ARGS)
 	FuncCallContext *funcctx;
 	PG_Lock_Status *mystatus;
 	LockData   *lockData;
+	PredicateLockData *predLockData;
 	if (SRF_IS_FIRSTCALL())
 	{
@@ -126,6 +137,8 @@ pg_lock_status(PG_FUNCTION_ARGS)
 		mystatus->lockData = GetLockStatusData();
 		mystatus->currIdx = 0;
+		mystatus->predLockData = GetPredicateLockStatusData();
+		mystatus->predLockIdx = 0;
 		MemoryContextSwitchTo(oldcontext);
 	}
@@ -303,6 +316,72 @@ pg_lock_status(PG_FUNCTION_ARGS)
 		SRF_RETURN_NEXT(funcctx, result);
 	}
+	/*
+	 * Have returned all regular locks. Now start on the SIREAD predicate
+	 * locks.
+	 */
+	predLockData = mystatus->predLockData;
+	if (mystatus->predLockIdx < predLockData->nelements)
+	{
+		PredicateLockTargetType lockType;
+		PREDICATELOCKTARGETTAG *predTag = &(predLockData->locktags[mystatus->predLockIdx]);
+		SERIALIZABLEXACT *xact = &(predLockData->xacts[mystatus->predLockIdx]);
+		Datum		values[14];
+		bool		nulls[14];
+		HeapTuple	tuple;
+		Datum		result;
+		mystatus->predLockIdx++;
+		/*
+		 * Form tuple with appropriate data.
+		 */
+		MemSet(values, 0, sizeof(values));
+		MemSet(nulls, false, sizeof(nulls));
+		/* lock type */
+		lockType = GET_PREDICATELOCKTARGETTAG_TYPE(*predTag);
+		values[0] = CStringGetTextDatum(PredicateLockTagTypeNames[lockType]);
+		/* lock target */
+		values[1] = GET_PREDICATELOCKTARGETTAG_DB(*predTag);
+		values[2] = GET_PREDICATELOCKTARGETTAG_RELATION(*predTag);
+		if (lockType == PREDLOCKTAG_TUPLE)
+			values[4] = GET_PREDICATELOCKTARGETTAG_OFFSET(*predTag);
+		else
+			nulls[4] = true;
+		if ((lockType == PREDLOCKTAG_TUPLE) ||
+			(lockType == PREDLOCKTAG_PAGE))
+			values[3] = GET_PREDICATELOCKTARGETTAG_PAGE(*predTag);
+		else
+			nulls[3] = true;
+		/* these fields are targets for other types of locks */
+		nulls[5] = true;		/* virtualxid */
+		nulls[6] = true;		/* transactionid */
+		nulls[7] = true;		/* classid */
+		nulls[8] = true;		/* objid */
+		nulls[9] = true;		/* objsubid */
+		/* lock holder */
+		values[10] = VXIDGetDatum(xact->vxid.backendId,
+								  xact->vxid.localTransactionId);
+		nulls[11] = true;		/* pid */
+		/*
+		 * Lock mode. Currently all predicate locks are SIReadLocks, which are
+		 * always held (never waiting).
+		 */
+		values[12] = CStringGetTextDatum("SIReadLock");
+		values[13] = BoolGetDatum(true);
+		tuple = heap_form_tuple(funcctx->tuple_desc, values, nulls);
+		result = HeapTupleGetDatum(tuple);
+		SRF_RETURN_NEXT(funcctx, result);
+	}
 	SRF_RETURN_DONE(funcctx);
 }

--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -59,6 +59,7 @@
 #include "storage/bufmgr.h"
 #include "storage/standby.h"
 #include "storage/fd.h"
+#include "storage/predicate.h"
 #include "tcop/tcopprot.h"
 #include "tsearch/ts_cache.h"
 #include "utils/builtins.h"
@@ -1096,6 +1097,23 @@ static struct config_bool ConfigureNamesBool[] =
 		&XactReadOnly,
 		false, assign_transaction_read_only, NULL
 	},
+	{
+		{"default_transaction_deferrable", PGC_USERSET, CLIENT_CONN_STATEMENT,
+			gettext_noop("Sets the default deferrable status of new transactions."),
+			NULL
+		},
+		&DefaultXactDeferrable,
+		false, NULL, NULL
+	},
+	{
+		{"transaction_deferrable", PGC_USERSET, CLIENT_CONN_STATEMENT,
+			gettext_noop("Whether to defer a read-only serializable transaction until it can be executed with no possible serialization failures."),
+			NULL,
+			GUC_NO_RESET_ALL | GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
+		},
+		&XactDeferrable,
+		false, assign_transaction_deferrable, NULL
+	},
 	{
 		{"check_function_bodies", PGC_USERSET, CLIENT_CONN_STATEMENT,
 			gettext_noop("Check function bodies during CREATE FUNCTION."),
@@ -1695,6 +1713,17 @@ static struct config_int ConfigureNamesInt[] =
 		64, 10, INT_MAX, NULL, NULL
 	},
+	{
+		{"max_predicate_locks_per_transaction", PGC_POSTMASTER, LOCK_MANAGEMENT,
+			gettext_noop("Sets the maximum number of predicate locks per transaction."),
+			gettext_noop("The shared predicate lock table is sized on the assumption that "
+			  "at most max_predicate_locks_per_transaction * max_connections distinct "
+						 "objects will need to be locked at any one time.")
+		},
+		&max_predicate_locks_per_xact,
+		64, 10, INT_MAX, NULL, NULL
+	},
 	{
 		{"authentication_timeout", PGC_SIGHUP, CONN_AUTH_SECURITY,
 			gettext_noop("Sets the maximum allowed time to complete client authentication."),
@@ -3460,6 +3489,8 @@ InitializeGUCOptions(void)
 					PGC_POSTMASTER, PGC_S_OVERRIDE);
 	SetConfigOption("transaction_read_only", "no",
 					PGC_POSTMASTER, PGC_S_OVERRIDE);
+	SetConfigOption("transaction_deferrable", "no",
+					PGC_POSTMASTER, PGC_S_OVERRIDE);
 	/*
 	 * For historical reasons, some GUC parameters can receive defaults from
@@ -5699,6 +5730,9 @@ ExecSetVariableStmt(VariableSetStmt *stmt)
 					else if (strcmp(item->defname, "transaction_read_only") == 0)
 						SetPGVariable("transaction_read_only",
 									  list_make1(item->arg), stmt->is_local);
+					else if (strcmp(item->defname, "transaction_deferrable") == 0)
+						SetPGVariable("transaction_deferrable",
+									  list_make1(item->arg), stmt->is_local);
 					else
 						elog(ERROR, "unexpected SET TRANSACTION element: %s",
 							 item->defname);
@@ -5718,6 +5752,9 @@ ExecSetVariableStmt(VariableSetStmt *stmt)
 					else if (strcmp(item->defname, "transaction_read_only") == 0)
 						SetPGVariable("default_transaction_read_only",
 									  list_make1(item->arg), stmt->is_local);
+					else if (strcmp(item->defname, "transaction_deferrable") == 0)
+						SetPGVariable("default_transaction_deferrable",
+									  list_make1(item->arg), stmt->is_local);
 					else
 						elog(ERROR, "unexpected SET SESSION element: %s",
 							 item->defname);

--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -450,6 +450,7 @@
 #check_function_bodies = on
 #default_transaction_isolation = 'read committed'
 #default_transaction_read_only = off
+#default_transaction_deferrable = off
 #session_replication_role = 'origin'
 #statement_timeout = 0			# in milliseconds, 0 is disabled
 #vacuum_freeze_min_age = 50000000
@@ -501,7 +502,8 @@
 # Note:  Each lock table slot uses ~270 bytes of shared memory, and there are
 # max_locks_per_transaction * (max_connections + max_prepared_transactions)
 # lock table slots.
+#max_predicate_locks_per_transaction = 64	# min 10
+					# (change requires restart)
 #------------------------------------------------------------------------------
 # VERSION/PLATFORM COMPATIBILITY

--- a/src/backend/utils/resowner/resowner.c
+++ b/src/backend/utils/resowner/resowner.c
@@ -22,6 +22,7 @@
 #include "access/hash.h"
 #include "storage/bufmgr.h"
+#include "storage/predicate.h"
 #include "storage/proc.h"
 #include "utils/memutils.h"
 #include "utils/rel.h"
@@ -261,7 +262,10 @@ ResourceOwnerReleaseInternal(ResourceOwner owner,
 			 * the top of the recursion.
 			 */
 			if (owner == TopTransactionResourceOwner)
+			{
 				ProcReleaseLocks(isCommit);
+				ReleasePredicateLocks(isCommit);
+			}
 		}
 		else
 		{

--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -27,6 +27,7 @@
 #include "access/transam.h"
 #include "access/xact.h"
+#include "storage/predicate.h"
 #include "storage/proc.h"
 #include "storage/procarray.h"
 #include "utils/memutils.h"
@@ -126,9 +127,6 @@ GetTransactionSnapshot(void)
 	{
 		Assert(RegisteredSnapshots == 0);
-		CurrentSnapshot = GetSnapshotData(&CurrentSnapshotData);
-		FirstSnapshotSet = true;
 		/*
 		 * In transaction-snapshot mode, the first snapshot must live until
 		 * end of xact regardless of what the caller does with it, so we must
@@ -136,11 +134,20 @@ GetTransactionSnapshot(void)
 		 */
 		if (IsolationUsesXactSnapshot())
 		{
-			CurrentSnapshot = RegisterSnapshotOnOwner(CurrentSnapshot,
+			if (IsolationIsSerializable())
+				CurrentSnapshot = RegisterSerializableTransaction(&CurrentSnapshotData);
+			else
+			{
+				CurrentSnapshot = GetSnapshotData(&CurrentSnapshotData);
+				CurrentSnapshot = RegisterSnapshotOnOwner(CurrentSnapshot,
 												TopTransactionResourceOwner);
+			}
 			registered_xact_snapshot = true;
 		}
+		else
+			CurrentSnapshot = GetSnapshotData(&CurrentSnapshotData);
+		FirstSnapshotSet = true;
 		return CurrentSnapshot;
 	}

--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -2299,6 +2299,7 @@ main(int argc, char *argv[])
 		"pg_xlog/archive_status",
 		"pg_clog",
 		"pg_notify",
+		"pg_serial",
 		"pg_subtrans",
 		"pg_twophase",
 		"pg_multixact/members",

--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -11,14 +11,14 @@
 *	script that reproduces the schema in terms of SQL that is understood
 *	by PostgreSQL
 *
- *	Note that pg_dump runs in a serializable transaction, so it sees a
+ *	Note that pg_dump runs in a transaction-snapshot mode transaction,
- *	consistent snapshot of the database including system catalogs.
+ *	so it sees a consistent snapshot of the database including system
- *	However, it relies in part on various specialized backend functions
+ *	catalogs. However, it relies in part on various specialized backend
- *	like pg_get_indexdef(), and those things tend to run on SnapshotNow
+ *	functions like pg_get_indexdef(), and those things tend to run on
- *	time, ie they look at the currently committed state.  So it is
+ *	SnapshotNow time, ie they look at the currently committed state.  So
- *	possible to get 'cache lookup failed' error if someone performs DDL
+ *	it is possible to get 'cache lookup failed' error if someone
- *	changes while a dump is happening. The window for this sort of thing
+ *	performs DDL changes while a dump is happening. The window for this
- *	is from the beginning of the serializable transaction to
+ *	sort of thing is from the acquisition of the transaction snapshot to
 *	getSchemaData() (when pg_dump acquires AccessShareLock on every
 *	table it intends to dump). It isn't very large, but it can happen.
 *
@@ -135,6 +135,7 @@ static int	dump_inserts = 0;
 static int	column_inserts = 0;
 static int	no_security_label = 0;
 static int	no_unlogged_table_data = 0;
+static int	serializable_deferrable = 0;
 static void help(const char *progname);
@@ -318,6 +319,7 @@ main(int argc, char **argv)
 		{"no-tablespaces", no_argument, &outputNoTablespaces, 1},
 		{"quote-all-identifiers", no_argument, &quote_all_identifiers, 1},
 		{"role", required_argument, NULL, 3},
+		{"serializable-deferrable", no_argument, &serializable_deferrable, 1},
 		{"use-set-session-authorization", no_argument, &use_setsessauth, 1},
 		{"no-security-label", no_argument, &no_security_label, 1},
 		{"no-unlogged-table-data", no_argument, &no_unlogged_table_data, 1},
@@ -669,11 +671,21 @@ main(int argc, char **argv)
 		no_security_label = 1;
 	/*
-	 * Start serializable transaction to dump consistent data.
+	 * Start transaction-snapshot mode transaction to dump consistent data.
 	 */
 	do_sql_command(g_conn, "BEGIN");
+	if (g_fout->remoteVersion >= 90100)
-	do_sql_command(g_conn, "SET TRANSACTION ISOLATION LEVEL SERIALIZABLE");
+	{
+		if (serializable_deferrable)
+			do_sql_command(g_conn,
+						   "SET TRANSACTION ISOLATION LEVEL SERIALIZABLE, "
+						   "READ ONLY, DEFERRABLE");
+		else
+			do_sql_command(g_conn,
+						   "SET TRANSACTION ISOLATION LEVEL REPEATABLE READ");
+	}
+	else
+		do_sql_command(g_conn, "SET TRANSACTION ISOLATION LEVEL SERIALIZABLE");
 	/* Select the appropriate subquery to convert user IDs to names */
 	if (g_fout->remoteVersion >= 80100)
@@ -864,6 +876,7 @@ help(const char *progname)
 	printf(_("  --disable-triggers          disable triggers during data-only restore\n"));
 	printf(_("  --no-tablespaces            do not dump tablespace assignments\n"));
 	printf(_("  --quote-all-identifiers     quote all identifiers, even if not keywords\n"));
+	printf(_("  --serializable-deferrable   wait until the dump can run without anomalies\n"));
 	printf(_("  --role=ROLENAME             do SET ROLE before dump\n"));
 	printf(_("  --no-security-label         do not dump security label assignments\n"));
 	printf(_("  --no-unlogged-table-data	do not dump unlogged table data\n"));
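The pg_dump.c hunk chooses the isolation level inline in main(), based on the server version and the new flag. The same choice can be written as a small pure function. This is only an illustration: dump_isolation_command() is a hypothetical helper that does not exist in pg_dump, and remote_version is assumed to use the same integer encoding as g_fout->remoteVersion (90100 for 9.1).

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper mirroring the branch in pg_dump's main(): returns the
 * SET TRANSACTION command to issue right after BEGIN.
 */
const char *
dump_isolation_command(int remote_version, bool serializable_deferrable)
{
	if (remote_version >= 90100)
	{
		/* Wait for a snapshot that cannot see serialization anomalies */
		if (serializable_deferrable)
			return "SET TRANSACTION ISOLATION LEVEL SERIALIZABLE, "
				"READ ONLY, DEFERRABLE";
		/* REPEATABLE READ keeps the old snapshot-isolation behavior */
		return "SET TRANSACTION ISOLATION LEVEL REPEATABLE READ";
	}
	/* Before 9.1, SERIALIZABLE was snapshot isolation */
	return "SET TRANSACTION ISOLATION LEVEL SERIALIZABLE";
}
```

Written this way, one consequence is easy to see. Against a pre-9.1 server the --serializable-deferrable flag is ignored, because only the 9.1+ branch checks it.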

--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -82,8 +82,8 @@ extern HeapTuple heap_getnext(HeapScanDesc scan, ScanDirection direction);
 extern bool heap_fetch(Relation relation, Snapshot snapshot,
 		   HeapTuple tuple, Buffer *userbuf, bool keep_buf,
 		   Relation stats_relation);
-extern bool heap_hot_search_buffer(ItemPointer tid, Buffer buffer,
+extern bool heap_hot_search_buffer(ItemPointer tid, Relation relation,
-					   Snapshot snapshot, bool *all_dead);
+					   Buffer buffer, Snapshot snapshot, bool *all_dead);
 extern bool heap_hot_search(ItemPointer tid, Relation relation,
 				Snapshot snapshot, bool *all_dead);

--- a/src/include/access/relscan.h
+++ b/src/include/access/relscan.h
@@ -35,6 +35,7 @@ typedef struct HeapScanDescData
 	BlockNumber rs_startblock;	/* block # to start at */
 	BufferAccessStrategy rs_strategy;	/* access strategy for reads */
 	bool		rs_syncscan;	/* report location to syncscan logic? */
+	bool		rs_relpredicatelocked;	/* predicate lock on relation exists */
 	/* scan current state */
 	bool		rs_inited;		/* false = scan not init'd yet */

--- a/src/include/access/twophase_rmgr.h
+++ b/src/include/access/twophase_rmgr.h
@@ -23,8 +23,9 @@ typedef uint8 TwoPhaseRmgrId;
 */
 #define TWOPHASE_RM_END_ID			0
 #define TWOPHASE_RM_LOCK_ID			1
-#define TWOPHASE_RM_PGSTAT_ID		2
+#define TWOPHASE_RM_PREDICATELOCK_ID	2
-#define TWOPHASE_RM_MULTIXACT_ID	3
+#define TWOPHASE_RM_PGSTAT_ID		3
+#define TWOPHASE_RM_MULTIXACT_ID	4
 #define TWOPHASE_RM_MAX_ID			TWOPHASE_RM_MULTIXACT_ID
 extern const TwoPhaseCallback twophase_recover_callbacks[];

--- a/src/include/access/xact.h
+++ b/src/include/access/xact.h
@@ -32,15 +32,26 @@ extern int	DefaultXactIsoLevel;
 extern int	XactIsoLevel;
 /*
- * We only implement two isolation levels internally.  This macro should
+ * We implement three isolation levels internally.
- * be used to check which one is selected.
+ * The two stronger ones use one snapshot per database transaction;
+ * the others use one snapshot per statement.
+ * Serializable uses predicate locks in addition to snapshots.
+ * These macros should be used to check which isolation level is selected.
 */
 #define IsolationUsesXactSnapshot() (XactIsoLevel >= XACT_REPEATABLE_READ)
+#define IsolationIsSerializable() (XactIsoLevel == XACT_SERIALIZABLE)
 /* Xact read-only state */
 extern bool DefaultXactReadOnly;
 extern bool XactReadOnly;
+/*
+ * Xact is deferrable -- only meaningful (currently) for read only
+ * SERIALIZABLE transactions
+ */
+extern bool DefaultXactDeferrable;
+extern bool XactDeferrable;
 /* Asynchronous commits */
 extern bool XactSyncCommit;
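You can exercise the two macros in the xact.h hunk on their own by providing a stand-in for the backend's XactIsoLevel global. This sketch assumes the XACT_* level codes from access/xact.h (0 through 3), which this hunk does not show. IsoBehavior and iso_behavior() are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Level codes as in access/xact.h (assumed here; not part of this hunk) */
#define XACT_READ_UNCOMMITTED	0
#define XACT_READ_COMMITTED		1
#define XACT_REPEATABLE_READ	2
#define XACT_SERIALIZABLE		3

/* Stand-in for the backend global of the same name */
int			XactIsoLevel = XACT_READ_COMMITTED;

/* The two macros as the xact.h hunk defines them */
#define IsolationUsesXactSnapshot() (XactIsoLevel >= XACT_REPEATABLE_READ)
#define IsolationIsSerializable() (XactIsoLevel == XACT_SERIALIZABLE)

/* Hypothetical summary of what a given level turns on */
typedef struct IsoBehavior
{
	bool		xact_snapshot;		/* one snapshot for the whole transaction */
	bool		predicate_locks;	/* SSI predicate locking */
} IsoBehavior;

IsoBehavior
iso_behavior(int level)
{
	IsoBehavior b;

	XactIsoLevel = level;
	b.xact_snapshot = IsolationUsesXactSnapshot();
	b.predicate_locks = IsolationIsSerializable();
	return b;
}
```

REPEATABLE READ keeps the transaction-wide snapshot that the old SERIALIZABLE mode provided. Only SERIALIZABLE also turns on predicate locking.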

--- a/src/include/catalog/pg_am.h
+++ b/src/include/catalog/pg_am.h
@@ -49,6 +49,7 @@ CATALOG(pg_am,2601)
 	bool		amsearchnulls;	/* can AM search for NULL/NOT NULL entries? */
 	bool		amstorage;		/* can storage type differ from column type? */
 	bool		amclusterable;	/* does AM support cluster command? */
+	bool		ampredlocks;	/* does AM handle predicate locks? */
 	Oid			amkeytype;		/* type of data in index, or InvalidOid */
 	regproc		aminsert;		/* "insert this tuple" function */
 	regproc		ambeginscan;	/* "prepare for index scan" function */
@@ -77,7 +78,7 @@ typedef FormData_pg_am *Form_pg_am;
 *		compiler constants for pg_am
 * ----------------
 */
-#define Natts_pg_am						27
+#define Natts_pg_am						28
 #define Anum_pg_am_amname				1
 #define Anum_pg_am_amstrategies			2
 #define Anum_pg_am_amsupport			3
@@ -90,37 +91,38 @@ typedef FormData_pg_am *Form_pg_am;
 #define Anum_pg_am_amsearchnulls		10
 #define Anum_pg_am_amstorage			11
 #define Anum_pg_am_amclusterable		12
-#define Anum_pg_am_amkeytype			13
+#define Anum_pg_am_ampredlocks			13
-#define Anum_pg_am_aminsert				14
+#define Anum_pg_am_amkeytype			14
-#define Anum_pg_am_ambeginscan			15
+#define Anum_pg_am_aminsert				15
-#define Anum_pg_am_amgettuple			16
+#define Anum_pg_am_ambeginscan			16
-#define Anum_pg_am_amgetbitmap			17
+#define Anum_pg_am_amgettuple			17
-#define Anum_pg_am_amrescan				18
+#define Anum_pg_am_amgetbitmap			18
-#define Anum_pg_am_amendscan			19
+#define Anum_pg_am_amrescan				19
-#define Anum_pg_am_ammarkpos			20
+#define Anum_pg_am_amendscan			20
-#define Anum_pg_am_amrestrpos			21
+#define Anum_pg_am_ammarkpos			21
-#define Anum_pg_am_ambuild				22
+#define Anum_pg_am_amrestrpos			22
-#define Anum_pg_am_ambuildempty			23
+#define Anum_pg_am_ambuild				23
-#define Anum_pg_am_ambulkdelete			24
+#define Anum_pg_am_ambuildempty			24
-#define Anum_pg_am_amvacuumcleanup		25
+#define Anum_pg_am_ambulkdelete			25
-#define Anum_pg_am_amcostestimate		26
+#define Anum_pg_am_amvacuumcleanup		26
-#define Anum_pg_am_amoptions			27
+#define Anum_pg_am_amcostestimate		27
+#define Anum_pg_am_amoptions			28
 /* ----------------
 *		initial contents of pg_am
 * ----------------
 */
-DATA(insert OID = 403 (  btree	5 1 t f t t t t t f t 0 btinsert btbeginscan btgettuple btgetbitmap btrescan btendscan btmarkpos btrestrpos btbuild btbuildempty btbulkdelete btvacuumcleanup btcostestimate btoptions ));
+DATA(insert OID = 403 (  btree	5 1 t f t t t t t f t t 0 btinsert btbeginscan btgettuple btgetbitmap btrescan btendscan btmarkpos btrestrpos btbuild btbuildempty btbulkdelete btvacuumcleanup btcostestimate btoptions ));
 DESCR("b-tree index access method");
 #define BTREE_AM_OID 403
-DATA(insert OID = 405 (  hash	1 1 f f t f f f f f f 23 hashinsert hashbeginscan hashgettuple hashgetbitmap hashrescan hashendscan hashmarkpos hashrestrpos hashbuild hashbuildempty hashbulkdelete hashvacuumcleanup hashcostestimate hashoptions ));
+DATA(insert OID = 405 (  hash	1 1 f f t f f f f f f f 23 hashinsert hashbeginscan hashgettuple hashgetbitmap hashrescan hashendscan hashmarkpos hashrestrpos hashbuild hashbuildempty hashbulkdelete hashvacuumcleanup hashcostestimate hashoptions ));
 DESCR("hash index access method");
 #define HASH_AM_OID 405
-DATA(insert OID = 783 (  gist	0 8 f t f f t t t t t 0 gistinsert gistbeginscan gistgettuple gistgetbitmap gistrescan gistendscan gistmarkpos gistrestrpos gistbuild gistbuildempty gistbulkdelete gistvacuumcleanup gistcostestimate gistoptions ));
+DATA(insert OID = 783 (  gist	0 8 f t f f t t t t t f 0 gistinsert gistbeginscan gistgettuple gistgetbitmap gistrescan gistendscan gistmarkpos gistrestrpos gistbuild gistbuildempty gistbulkdelete gistvacuumcleanup gistcostestimate gistoptions ));
 DESCR("GiST index access method");
 #define GIST_AM_OID 783
-DATA(insert OID = 2742 (  gin	0 5 f f f f t t f t f 0 gininsert ginbeginscan - gingetbitmap ginrescan ginendscan ginmarkpos ginrestrpos ginbuild ginbuildempty ginbulkdelete ginvacuumcleanup gincostestimate ginoptions ));
+DATA(insert OID = 2742 (  gin	0 5 f f f f t t f t f f 0 gininsert ginbeginscan - gingetbitmap ginrescan ginendscan ginmarkpos ginrestrpos ginbuild ginbuildempty ginbulkdelete ginvacuumcleanup gincostestimate ginoptions ));
 DESCR("GIN index access method");
 #define GIN_AM_OID 2742

--- a/src/include/commands/variable.h
+++ b/src/include/commands/variable.h
@@ -26,6 +26,8 @@ extern bool assign_transaction_read_only(bool value,
 extern const char *assign_XactIsoLevel(const char *value,
 					bool doit, GucSource source);
 extern const char *show_XactIsoLevel(void);
+extern bool assign_transaction_deferrable(bool newval, bool doit,
+					GucSource source);
 extern bool assign_random_seed(double value,
 				   bool doit, GucSource source);
 extern const char *show_random_seed(void);

--- a/src/include/storage/lwlock.h
+++ b/src/include/storage/lwlock.h
@@ -27,6 +27,10 @@
 #define LOG2_NUM_LOCK_PARTITIONS  4
 #define NUM_LOCK_PARTITIONS  (1 << LOG2_NUM_LOCK_PARTITIONS)
+/* Number of partitions the shared predicate lock tables are divided into */
+#define LOG2_NUM_PREDICATELOCK_PARTITIONS  4
+#define NUM_PREDICATELOCK_PARTITIONS  (1 << LOG2_NUM_PREDICATELOCK_PARTITIONS)
 /*
 * We have a number of predefined LWLocks, plus a bunch of LWLocks that are
 * dynamically assigned (e.g., for shared buffers).  The LWLock structures
@@ -70,12 +74,18 @@ typedef enum LWLockId
 	RelationMappingLock,
 	AsyncCtlLock,
 	AsyncQueueLock,
+	SerializableXactHashLock,
+	SerializableFinishedListLock,
+	SerializablePredicateLockListLock,
+	OldSerXidLock,
+	PredicateLockNextRowLinkLock,
 	/* Individual lock IDs end here */
 	FirstBufMappingLock,
 	FirstLockMgrLock = FirstBufMappingLock + NUM_BUFFER_PARTITIONS,
+	FirstPredicateLockMgrLock = FirstLockMgrLock + NUM_LOCK_PARTITIONS,
 	/* must be last except for MaxDynamicLWLock: */
-	NumFixedLWLocks = FirstLockMgrLock + NUM_LOCK_PARTITIONS,
+	NumFixedLWLocks = FirstPredicateLockMgrLock + NUM_PREDICATELOCK_PARTITIONS,
 	MaxDynamicLWLock = 1000000000
 } LWLockId;
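The lwlock.h hunk reserves NUM_PREDICATELOCK_PARTITIONS fixed LWLocks directly after the regular lock-manager partitions. The sketch below reproduces that layout arithmetic and maps a hash code to a partition lock. It rests on several assumptions. FirstBufMappingLock = 100 is a placeholder, since the real value depends on the individual lock IDs listed before it. NUM_BUFFER_PARTITIONS = 16 is assumed. predicate_partition_lock() is a hypothetical helper modeled on the modulo scheme the regular lock manager uses for its partitions.

```c
#include <assert.h>
#include <stdint.h>

/* Partition counts from the lwlock.h hunk above */
#define LOG2_NUM_LOCK_PARTITIONS  4
#define NUM_LOCK_PARTITIONS  (1 << LOG2_NUM_LOCK_PARTITIONS)
#define LOG2_NUM_PREDICATELOCK_PARTITIONS  4
#define NUM_PREDICATELOCK_PARTITIONS  (1 << LOG2_NUM_PREDICATELOCK_PARTITIONS)

/* Assumed value; not shown in this diff */
#define NUM_BUFFER_PARTITIONS  16

enum
{
	FirstBufMappingLock = 100,	/* placeholder for the real position */
	FirstLockMgrLock = FirstBufMappingLock + NUM_BUFFER_PARTITIONS,
	FirstPredicateLockMgrLock = FirstLockMgrLock + NUM_LOCK_PARTITIONS,
	NumFixedLWLocks = FirstPredicateLockMgrLock + NUM_PREDICATELOCK_PARTITIONS
};

/* Hypothetical helper: which partition lock guards a given tag hash? */
int
predicate_partition_lock(uint32_t hashcode)
{
	return FirstPredicateLockMgrLock +
		(int) (hashcode % NUM_PREDICATELOCK_PARTITIONS);
}
```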

--- a/src/include/storage/predicate.h
+++ b/src/include/storage/predicate.h
+/*-------------------------------------------------------------------------
+ *
+ * predicate.h
+ *	  POSTGRES public predicate locking definitions.
+ *
+ *
+ * Portions Copyright (c) 1996-2011, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/storage/predicate.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef PREDICATE_H
+#define PREDICATE_H
+#include "utils/relcache.h"
+#include "utils/snapshot.h"
+/*
+ * GUC variables
+ */
+extern int	max_predicate_locks_per_xact;
+/* Number of SLRU buffers to use for predicate locking */
+#define NUM_OLDSERXID_BUFFERS	16
+/*
+ * function prototypes
+ */
+/* housekeeping for shared memory predicate lock structures */
+extern void InitPredicateLocks(void);
+extern Size PredicateLockShmemSize(void);
+/* predicate lock reporting */
+extern bool PageIsPredicateLocked(const Relation relation, const BlockNumber blkno);
+/* predicate lock maintenance */
+extern Snapshot RegisterSerializableTransaction(Snapshot snapshot);
+extern void RegisterPredicateLockingXid(const TransactionId xid);
+extern void PredicateLockRelation(const Relation relation);
+extern void PredicateLockPage(const Relation relation, const BlockNumber blkno);
+extern void PredicateLockTuple(const Relation relation, const HeapTuple tuple);
+extern void PredicateLockTupleRowVersionLink(const Relation relation, const HeapTuple oldTuple, const HeapTuple newTuple);
+extern void PredicateLockPageSplit(const Relation relation, const BlockNumber oldblkno, const BlockNumber newblkno);
+extern void PredicateLockPageCombine(const Relation relation, const BlockNumber oldblkno, const BlockNumber newblkno);
+extern void ReleasePredicateLocks(const bool isCommit);
+/* conflict detection (may also trigger rollback) */
+extern void CheckForSerializableConflictOut(const bool valid, const Relation relation, const HeapTuple tuple, const Buffer buffer);
+extern void CheckForSerializableConflictIn(const Relation relation, const HeapTuple tuple, const Buffer buffer);
+/* final rollback checking */
+extern void PreCommit_CheckForSerializationFailure(void);
+/* two-phase commit support */
+extern void AtPrepare_PredicateLocks(void);
+extern void PostPrepare_PredicateLocks(TransactionId xid);
+extern void PredicateLockTwoPhaseFinish(TransactionId xid, bool isCommit);
+extern void predicatelock_twophase_recover(TransactionId xid, uint16 info,
+							   void *recdata, uint32 len);
+#endif   /* PREDICATE_H */
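Shared memory is finite, so the commit message describes promoting many tuple-level predicate locks on a page to a single page lock, and many page locks to a single relation lock. The toy model below illustrates that escalation for one transaction on one relation. It is not predicate.c's algorithm. The thresholds, the ToyPredLocks layout, and all toy_* names are hypothetical, and each toy_lock_tuple() call is assumed to lock a distinct tuple.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical limits; the real promotion rules live in predicate.c */
#define MAX_PAGES					64
#define TUPLES_PER_PAGE_THRESHOLD	3	/* > 3 tuple locks -> page lock */
#define PAGES_PER_REL_THRESHOLD		2	/* > 2 page locks -> relation lock */

typedef enum { TOY_NONE, TOY_TUPLE, TOY_PAGE, TOY_RELATION } ToyGranularity;

/* Predicate locks held by one transaction on one relation (toy layout) */
typedef struct ToyPredLocks
{
	bool		relation_locked;
	bool		page_locked[MAX_PAGES];
	int			tuple_locks[MAX_PAGES];	/* tuple locks held on each page */
	int			npages_locked;
} ToyPredLocks;

static void
toy_lock_page(ToyPredLocks *l, int blkno)
{
	if (l->relation_locked || l->page_locked[blkno])
		return;					/* already covered by a coarser lock */
	l->page_locked[blkno] = true;
	l->tuple_locks[blkno] = 0;	/* the page lock subsumes its tuple locks */
	if (++l->npages_locked > PAGES_PER_REL_THRESHOLD)
	{
		/* Escalate: one relation lock replaces all page and tuple locks */
		l->relation_locked = true;
		for (int i = 0; i < MAX_PAGES; i++)
		{
			l->page_locked[i] = false;
			l->tuple_locks[i] = 0;
		}
		l->npages_locked = 0;
	}
}

/* Lock one (assumed distinct) tuple on page blkno, escalating if needed */
void
toy_lock_tuple(ToyPredLocks *l, int blkno)
{
	if (l->relation_locked || l->page_locked[blkno])
		return;
	if (++l->tuple_locks[blkno] > TUPLES_PER_PAGE_THRESHOLD)
		toy_lock_page(l, blkno);
}

/* Coarsest lock currently covering page blkno */
ToyGranularity
toy_lock_level(const ToyPredLocks *l, int blkno)
{
	if (l->relation_locked)
		return TOY_RELATION;
	if (l->page_locked[blkno])
		return TOY_PAGE;
	return l->tuple_locks[blkno] > 0 ? TOY_TUPLE : TOY_NONE;
}
```

Coarser locks cost less shared memory, but they cover more rows. A later write by another transaction is therefore more likely to register as a conflict. This is one source of the false positives the commit message mentions.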
--- a/src/include/storage/predicate_internals.h
+++ b/src/include/storage/predicate_internals.h
--- a/src/include/storage/shmem.h
+++ b/src/include/storage/shmem.h
@@ -35,7 +35,7 @@ typedef struct SHM_QUEUE
 extern void InitShmemAccess(void *seghdr);
 extern void InitShmemAllocation(void);
 extern void *ShmemAlloc(Size size);
-extern bool ShmemAddrIsValid(void *addr);
+extern bool ShmemAddrIsValid(const void *addr);
 extern void InitShmemIndex(void);
 extern HTAB *ShmemInitHash(const char *name, long init_size, long max_size,
 			  HASHCTL *infoP, int hash_flags);
@@ -67,8 +67,9 @@ extern void SHMQueueInit(SHM_QUEUE *queue);
 extern void SHMQueueElemInit(SHM_QUEUE *queue);
 extern void SHMQueueDelete(SHM_QUEUE *queue);
 extern void SHMQueueInsertBefore(SHM_QUEUE *queue, SHM_QUEUE *elem);
-extern Pointer SHMQueueNext(SHM_QUEUE *queue, SHM_QUEUE *curElem,
+extern Pointer SHMQueueNext(const SHM_QUEUE *queue, const SHM_QUEUE *curElem,
 			 Size linkOffset);
-extern bool SHMQueueEmpty(SHM_QUEUE *queue);
+extern bool SHMQueueEmpty(const SHM_QUEUE *queue);
+extern bool SHMQueueIsDetached(const SHM_QUEUE *queue);
 #endif   /* SHMEM_H */
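The shmem.h hunk adds SHMQueueIsDetached() and const-qualifies several queue functions. The toy doubly linked list below illustrates what a detached element is. It assumes, as shmqueue.c appears to do for SHM_QUEUE links, that an element's links are NULL while it is on no queue and that an empty list header points to itself. ToyQueue and the toy_* functions are hypothetical stand-ins, not the shared-memory implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct ToyQueue
{
	struct ToyQueue *prev;
	struct ToyQueue *next;
} ToyQueue;

/* List header: an empty list links to itself */
void
toy_queue_init(ToyQueue *q)
{
	q->prev = q->next = q;
}

/* Element not yet on any list: NULL links mark it as detached */
void
toy_elem_init(ToyQueue *e)
{
	e->prev = e->next = NULL;
}

/* Link element e in just before q (at the tail when q is the header) */
void
toy_insert_before(ToyQueue *q, ToyQueue *e)
{
	e->next = q;
	e->prev = q->prev;
	q->prev->next = e;
	q->prev = e;
}

/* Unlink e and mark it detached again */
void
toy_delete(ToyQueue *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
	e->prev = e->next = NULL;
}

bool
toy_is_detached(const ToyQueue *e)
{
	return e->prev == NULL;
}

bool
toy_empty(const ToyQueue *q)
{
	return q->prev == q;
}
```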
--- a/src/test/isolation/.gitignore
+++ b/src/test/isolation/.gitignore
+# Local binaries
+/isolationtester
+/pg_isolation_regress
+# Local generated source files
+/specparse.c
+/specscanner.c
+# Generated subdirectories
+/results/
+/log/
+/tmp_check/
--- a/src/test/isolation/Makefile
+++ b/src/test/isolation/Makefile
+#
+# Makefile for isolation tests
+#
+subdir = src/test/isolation
+top_builddir = ../../..
+include $(top_builddir)/src/Makefile.global
+ifeq ($(PORTNAME), win32)
+LDLIBS += -lws2_32
+endif
+override CPPFLAGS := -I$(libpq_srcdir) $(CPPFLAGS)
+override LDLIBS := $(libpq_pgport) $(LDLIBS)
+OBJS =  specparse.o isolationtester.o
+submake-regress:
+	$(MAKE) -C $(top_builddir)/src/test/regress pg_regress.o
+pg_regress.o: | submake-regress
+	rm -f $@ && $(LN_S) $(top_builddir)/src/test/regress/pg_regress.o .
+pg_isolation_regress: isolation_main.o pg_regress.o
+	$(CC) $(CFLAGS) $^ $(libpq_pgport) $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@$(X)
+all: isolationtester pg_isolation_regress
+isolationtester: $(OBJS) | submake-libpq submake-libpgport
+	$(CC) $(CFLAGS) $(OBJS) $(libpq_pgport) $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@$(X)
+distprep: specparse.c
+# There is no correct way to write a rule that generates two files.
+# Rules with two targets don't have that meaning, they are merely
+# shorthand for two otherwise separate rules.  To be safe for parallel
+# make, we must chain the dependencies like this.  The semicolon is
+# important, otherwise make will choose the built-in rule for
+# specparse.y=>specparse.c.
+all: isolationtester$(X) pg_isolation_regress$(X)
+specparse.h: specparse.c ;
+# specscanner is compiled as part of specparse
+specparse.o: specscanner.c
+specparse.c: specparse.y
+ifdef BISON
+	$(BISON) $(BISONFLAGS) -o $@ $<
+else
+	@$(missing) bison $< $@
+endif
+specscanner.c: specscanner.l
+ifdef FLEX
+	$(FLEX) $(FLEXFLAGS) -o'$@' $<
+else
+	@$(missing) flex $< $@
+endif
+# specparse.c is in the distribution tarball, so is not cleaned here
+clean distclean:
+	rm -f isolationtester$(X) pg_isolation_regress$(X) $(OBJS) isolation_main.o
+	rm -f pg_regress.o
+	rm -rf results
+maintainer-clean: distclean
+	rm -f specparse.c specscanner.c
+installcheck: all
+	./pg_isolation_regress --schedule=$(srcdir)/isolation_schedule
+check: all
+	./pg_isolation_regress --temp-install=./tmp_check --top-builddir=$(top_builddir) --schedule=$(srcdir)/isolation_schedule
--- a/src/test/isolation/README
+++ b/src/test/isolation/README
+src/test/isolation/README
+Isolation tests
+===============
+This directory contains a set of tests for the serializable isolation level.
+Testing isolation requires running multiple overlapping transactions,
+which requires multiple concurrent connections, and therefore can't be
+tested using the normal pg_regress program.
+To represent a test with overlapping transactions, we use a test specification
+file with a custom syntax, described in the next section.
+isolationtester is a program that uses libpq to open multiple connections
+and execute a test specified by a spec file. A libpq connection string can
+be given to specify the server and database to connect to; otherwise, the
+defaults derived from environment variables are used.
+pg_isolation_regress is a tool similar to pg_regress, but instead of using
+psql to execute a test, it uses isolationtester.
+To run the tests, you need to have a server up and running. Run
+    gmake installcheck
+Test specification
+==================
+Each isolation test is defined by a specification file, stored in the specs
+subdirectory. A test specification consists of four parts, in this order:
+setup { <SQL> }
+  The given SQL block is executed once, in one session only, before running
+  the test. Create any test tables or such objects here. This part is
+  optional.
+teardown { <SQL> }
+  The teardown SQL block is executed once after the test is finished. Use
+  this to clean up, e.g. by dropping any test tables. This part is optional.
+session "<name>"
+  Each session is executed in a separate connection. A session consists
+  of three parts: setup, teardown and one or more steps. The per-session
+  setup and teardown parts have the same syntax as the per-test setup and
+  teardown described above, but they are executed in every session,
+  before and after each permutation. The setup part typically contains a
+  "BEGIN" command to begin a transaction.
+  Each step has a syntax of
+  step "<name>" { <SQL> }
+  where <name> is a unique name identifying this step, and <SQL> is an SQL
+  statement (or statements, separated by semicolons) that is executed in the
+  step.
+permutation "<step name>" ...
+  A permutation line specifies a list of steps that are run in that order.
+  If no permutation lines are given, the test program automatically generates
+  all possible overlapping orderings of the given sessions.
+Lines beginning with a # are considered comments.
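When a spec has no permutation lines, the README says isolationtester generates every overlapping ordering of the sessions. For two sessions, that means every interleaving that keeps each session's steps in their own order. The recursive sketch below enumerates those interleavings. It is not isolationtester's code and handles only two sessions. interleave() and count_permutations() are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>

#define MAX_STEPS 32

/*
 * Enumerate every interleaving of steps a[0..na) and b[0..nb) that keeps
 * each session's own order; out[0..pos) holds the prefix built so far.
 * Session a's next step is tried first.  Returns the number of orderings.
 */
static int
interleave(const char **a, int na, const char **b, int nb,
		   const char **out, int pos, int print)
{
	int			count = 0;

	if (na == 0 && nb == 0)
	{
		if (print)
		{
			printf("starting permutation:");
			for (int i = 0; i < pos; i++)
				printf(" %s", out[i]);
			printf("\n");
		}
		return 1;
	}
	if (na > 0)
	{
		out[pos] = a[0];
		count += interleave(a + 1, na - 1, b, nb, out, pos + 1, print);
	}
	if (nb > 0)
	{
		out[pos] = b[0];
		count += interleave(a, na, b + 1, nb - 1, out, pos + 1, print);
	}
	return count;
}

/* Count (and optionally print) all orderings of two sessions' steps */
int
count_permutations(const char **a, int na, const char **b, int nb, int print)
{
	const char *out[MAX_STEPS];

	assert(na + nb <= MAX_STEPS);
	return interleave(a, na, b, nb, out, 0, print);
}
```

With sessions {rwx1, c1} and {rwx2, c2} the sketch yields 6 orderings. With {wxry1, c1} and {r2, wyrx2, c2} it yields 10. These counts match the number of permutations in simple-write-skew.out and ri-trigger.out below. Trying session 1's next step first also reproduces the order in which those files list the permutations.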
--- a/src/test/isolation/expected/classroom-scheduling.out
+++ b/src/test/isolation/expected/classroom-scheduling.out
--- a/src/test/isolation/expected/multiple-row-versions.out
+++ b/src/test/isolation/expected/multiple-row-versions.out
+Parsed test spec with 4 sessions
+starting permutation: rx1 wx2 c2 wx3 ry3 wy4 rz4 c4 c3 wz1 c1
+step rx1:  SELECT * FROM t WHERE id = 1000000; 
+id             txt            
+1000000                       
+step wx2:  UPDATE t SET txt = 'b' WHERE id = 1000000; 
+step c2:  COMMIT; 
+step wx3:  UPDATE t SET txt = 'c' WHERE id = 1000000; 
+step ry3:  SELECT * FROM t WHERE id = 500000; 
+id             txt            
+500000                        
+step wy4:  UPDATE t SET txt = 'd' WHERE id = 500000; 
+step rz4:  SELECT * FROM t WHERE id = 1; 
+id             txt            
+1                             
+step c4:  COMMIT; 
+step c3:  COMMIT; 
+ERROR:  could not serialize access due to read/write dependencies among transactions
+step wz1:  UPDATE t SET txt = 'a' WHERE id = 1; 
+step c1:  COMMIT; 
--- a/src/test/isolation/expected/partial-index.out
+++ b/src/test/isolation/expected/partial-index.out
--- a/src/test/isolation/expected/project-manager.out
+++ b/src/test/isolation/expected/project-manager.out
--- a/src/test/isolation/expected/receipt-report.out
+++ b/src/test/isolation/expected/receipt-report.out
--- a/src/test/isolation/expected/referential-integrity.out
+++ b/src/test/isolation/expected/referential-integrity.out
--- a/src/test/isolation/expected/ri-trigger.out
+++ b/src/test/isolation/expected/ri-trigger.out
+Parsed test spec with 2 sessions
+starting permutation: wxry1 c1 r2 wyrx2 c2
+step wxry1:  INSERT INTO child (parent_id) VALUES (0); 
+step c1:  COMMIT; 
+step r2:  SELECT TRUE; 
+bool           
+t              
+step wyrx2:  DELETE FROM parent WHERE parent_id = 0; 
+ERROR:  child row exists
+step c2:  COMMIT; 
+starting permutation: wxry1 r2 c1 wyrx2 c2
+step wxry1:  INSERT INTO child (parent_id) VALUES (0); 
+step r2:  SELECT TRUE; 
+bool           
+t              
+step c1:  COMMIT; 
+step wyrx2:  DELETE FROM parent WHERE parent_id = 0; 
+ERROR:  could not serialize access due to read/write dependencies among transactions
+step c2:  COMMIT; 
+starting permutation: wxry1 r2 wyrx2 c1 c2
+step wxry1:  INSERT INTO child (parent_id) VALUES (0); 
+step r2:  SELECT TRUE; 
+bool           
+t              
+step wyrx2:  DELETE FROM parent WHERE parent_id = 0; 
+step c1:  COMMIT; 
+step c2:  COMMIT; 
+ERROR:  could not serialize access due to read/write dependencies among transactions
+starting permutation: wxry1 r2 wyrx2 c2 c1
+step wxry1:  INSERT INTO child (parent_id) VALUES (0); 
+step r2:  SELECT TRUE; 
+bool           
+t              
+step wyrx2:  DELETE FROM parent WHERE parent_id = 0; 
+step c2:  COMMIT; 
+step c1:  COMMIT; 
+ERROR:  could not serialize access due to read/write dependencies among transactions
+starting permutation: r2 wxry1 c1 wyrx2 c2
+step r2:  SELECT TRUE; 
+bool           
+t              
+step wxry1:  INSERT INTO child (parent_id) VALUES (0); 
+step c1:  COMMIT; 
+step wyrx2:  DELETE FROM parent WHERE parent_id = 0; 
+ERROR:  could not serialize access due to read/write dependencies among transactions
+step c2:  COMMIT; 
+starting permutation: r2 wxry1 wyrx2 c1 c2
+step r2:  SELECT TRUE; 
+bool           
+t              
+step wxry1:  INSERT INTO child (parent_id) VALUES (0); 
+step wyrx2:  DELETE FROM parent WHERE parent_id = 0; 
+step c1:  COMMIT; 
+step c2:  COMMIT; 
+ERROR:  could not serialize access due to read/write dependencies among transactions
+starting permutation: r2 wxry1 wyrx2 c2 c1
+step r2:  SELECT TRUE; 
+bool           
+t              
+step wxry1:  INSERT INTO child (parent_id) VALUES (0); 
+step wyrx2:  DELETE FROM parent WHERE parent_id = 0; 
+step c2:  COMMIT; 
+step c1:  COMMIT; 
+ERROR:  could not serialize access due to read/write dependencies among transactions
+starting permutation: r2 wyrx2 wxry1 c1 c2
+step r2:  SELECT TRUE; 
+bool           
+t              
+step wyrx2:  DELETE FROM parent WHERE parent_id = 0; 
+step wxry1:  INSERT INTO child (parent_id) VALUES (0); 
+step c1:  COMMIT; 
+step c2:  COMMIT; 
+ERROR:  could not serialize access due to read/write dependencies among transactions
+starting permutation: r2 wyrx2 wxry1 c2 c1
+step r2:  SELECT TRUE; 
+bool           
+t              
+step wyrx2:  DELETE FROM parent WHERE parent_id = 0; 
+step wxry1:  INSERT INTO child (parent_id) VALUES (0); 
+step c2:  COMMIT; 
+step c1:  COMMIT; 
+ERROR:  could not serialize access due to read/write dependencies among transactions
+starting permutation: r2 wyrx2 c2 wxry1 c1
+step r2:  SELECT TRUE; 
+bool           
+t              
+step wyrx2:  DELETE FROM parent WHERE parent_id = 0; 
+step c2:  COMMIT; 
+step wxry1:  INSERT INTO child (parent_id) VALUES (0); 
+ERROR:  parent row missing
+step c1:  COMMIT; 
--- a/src/test/isolation/expected/simple-write-skew.out
+++ b/src/test/isolation/expected/simple-write-skew.out
+Parsed test spec with 2 sessions
+starting permutation: rwx1 c1 rwx2 c2
+step rwx1:  UPDATE test SET t = 'apple' WHERE t = 'pear'; 
+step c1:  COMMIT; 
+step rwx2:  UPDATE test SET t = 'pear' WHERE t = 'apple'
+step c2:  COMMIT; 
+starting permutation: rwx1 rwx2 c1 c2
+step rwx1:  UPDATE test SET t = 'apple' WHERE t = 'pear'; 
+step rwx2:  UPDATE test SET t = 'pear' WHERE t = 'apple'
+step c1:  COMMIT; 
+step c2:  COMMIT; 
+ERROR:  could not serialize access due to read/write dependencies among transactions
+starting permutation: rwx1 rwx2 c2 c1
+step rwx1:  UPDATE test SET t = 'apple' WHERE t = 'pear'; 
+step rwx2:  UPDATE test SET t = 'pear' WHERE t = 'apple'
+step c2:  COMMIT; 
+step c1:  COMMIT; 
+ERROR:  could not serialize access due to read/write dependencies among transactions
+starting permutation: rwx2 rwx1 c1 c2
+step rwx2:  UPDATE test SET t = 'pear' WHERE t = 'apple'
+step rwx1:  UPDATE test SET t = 'apple' WHERE t = 'pear'; 
+step c1:  COMMIT; 
+step c2:  COMMIT; 
+ERROR:  could not serialize access due to read/write dependencies among transactions
+starting permutation: rwx2 rwx1 c2 c1
+step rwx2:  UPDATE test SET t = 'pear' WHERE t = 'apple'
+step rwx1:  UPDATE test SET t = 'apple' WHERE t = 'pear'; 
+step c2:  COMMIT; 
+step c1:  COMMIT; 
+ERROR:  could not serialize access due to read/write dependencies among transactions
+starting permutation: rwx2 c2 rwx1 c1
+step rwx2:  UPDATE test SET t = 'pear' WHERE t = 'apple'
+step c2:  COMMIT; 
+step rwx1:  UPDATE test SET t = 'apple' WHERE t = 'pear'; 
+step c1:  COMMIT; 
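In simple-write-skew.out, every permutation in which both UPDATEs run before either COMMIT ends in a serialization failure. Each transaction reads rows matching a predicate that the other's UPDATE changes, so there is a read/write (rw) antidependency in each direction. The toy model below shows a simplified version of the basic rule from Cahill's SSI work, which the commit message cites as the basis for this patch. It flags a transaction that has both an incoming and an outgoing rw-antidependency with concurrent transactions (a "pivot"). ToySsi and the toy_* functions are hypothetical. The real predicate lock manager tracks considerably more state than this, and the sketch shows only the basic rule.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_XACTS 8

/* Per-transaction conflict flags (toy layout, indexed by transaction) */
typedef struct ToySsi
{
	bool		in_conflict[MAX_XACTS];		/* some reader -rw-> this xact */
	bool		out_conflict[MAX_XACTS];	/* this xact -rw-> some writer */
} ToySsi;

/* Record that 'reader' read data that the concurrent 'writer' modified */
void
toy_rw_conflict(ToySsi *s, int reader, int writer)
{
	s->out_conflict[reader] = true;
	s->in_conflict[writer] = true;
}

/* Conservative check: is xact between two rw edges, i.e. a pivot? */
bool
toy_is_pivot(const ToySsi *s, int xact)
{
	return s->in_conflict[xact] && s->out_conflict[xact];
}
```

The rule is conservative. A pivot means an anomaly is possible, not that one has occurred, which is why this method produces some false positives.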
--- a/src/test/isolation/expected/temporal-range-integrity.out
+++ b/src/test/isolation/expected/temporal-range-integrity.out
--- a/src/test/isolation/expected/total-cash.out
+++ b/src/test/isolation/expected/total-cash.out
--- a/src/test/isolation/expected/two-ids.out
+++ b/src/test/isolation/expected/two-ids.out
--- a/src/test/isolation/isolation_main.c
+++ b/src/test/isolation/isolation_main.c
--- a/src/test/isolation/isolation_schedule
+++ b/src/test/isolation/isolation_schedule
--- a/src/test/isolation/isolationtester.c
+++ b/src/test/isolation/isolationtester.c
--- a/src/test/isolation/isolationtester.h
+++ b/src/test/isolation/isolationtester.h
--- a/src/test/isolation/specparse.y
+++ b/src/test/isolation/specparse.y
--- a/src/test/isolation/specs/classroom-scheduling.spec
+++ b/src/test/isolation/specs/classroom-scheduling.spec
--- a/src/test/isolation/specs/multiple-row-versions.spec
+++ b/src/test/isolation/specs/multiple-row-versions.spec
--- a/src/test/isolation/specs/partial-index.spec
+++ b/src/test/isolation/specs/partial-index.spec
--- a/src/test/isolation/specs/project-manager.spec
+++ b/src/test/isolation/specs/project-manager.spec
--- a/src/test/isolation/specs/receipt-report.spec
+++ b/src/test/isolation/specs/receipt-report.spec
--- a/src/test/isolation/specs/referential-integrity.spec
+++ b/src/test/isolation/specs/referential-integrity.spec
--- a/src/test/isolation/specs/ri-trigger.spec
+++ b/src/test/isolation/specs/ri-trigger.spec
--- a/src/test/isolation/specs/simple-write-skew.spec
+++ b/src/test/isolation/specs/simple-write-skew.spec
--- a/src/test/isolation/specs/temporal-range-integrity.spec
+++ b/src/test/isolation/specs/temporal-range-integrity.spec
--- a/src/test/isolation/specs/total-cash.spec
+++ b/src/test/isolation/specs/total-cash.spec
--- a/src/test/isolation/specs/two-ids.spec
+++ b/src/test/isolation/specs/two-ids.spec
--- a/src/test/isolation/specscanner.l
+++ b/src/test/isolation/specscanner.l
--- a/src/test/regress/expected/prepared_xacts.out
+++ b/src/test/regress/expected/prepared_xacts.out
--- a/src/test/regress/expected/prepared_xacts_1.out
+++ b/src/test/regress/expected/prepared_xacts_1.out
--- a/src/test/regress/expected/transactions.out
+++ b/src/test/regress/expected/transactions.out
@@ -44,7 +44,7 @@ SELECT * FROM aggtest;
 CREATE TABLE writetest (a int);
 CREATE TEMPORARY TABLE temptest (a int);
 BEGIN;
-SET TRANSACTION READ ONLY; -- ok
+SET TRANSACTION ISOLATION LEVEL SERIALIZABLE, READ ONLY, DEFERRABLE; -- ok
 SELECT * FROM writetest; -- ok
 a 
 ---

--- a/src/test/regress/sql/prepared_xacts.sql
+++ b/src/test/regress/sql/prepared_xacts.sql
--- a/src/test/regress/sql/transactions.sql
+++ b/src/test/regress/sql/transactions.sql
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list