Commit dafaa3ef authored by Heikki Linnakangas

Implement genuine serializable isolation level.

Until now, our Serializable mode has in fact been what's called Snapshot
Isolation, which allows some anomalies that could not occur in any
serialized ordering of the transactions. This patch fixes that using a
method called Serializable Snapshot Isolation, based on research papers by
Michael J. Cahill (see README-SSI for full references). In Serializable
Snapshot Isolation, transactions run like they do in Snapshot Isolation,
but a predicate lock manager observes the reads and writes performed and
aborts transactions if it detects that an anomaly might occur. This method
produces some false positives, i.e., it sometimes aborts transactions even
though there is no anomaly.
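
As a rough illustration of the detection idea (a toy sketch, not the code in
predicate.c): Cahill's basic scheme treats a transaction as dangerous once it
has both an incoming and an outgoing read-write conflict. The type and
function names below are hypothetical, the fixed-size arrays are an assumption
made for brevity, and the real implementation applies further checks before
it aborts anything.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_XACTS 8             /* hypothetical fixed limit for the sketch */

/* Hypothetical toy state: one pair of conflict flags per transaction. */
typedef struct
{
    bool has_in[MAX_XACTS];     /* someone read data this xact overwrote */
    bool has_out[MAX_XACTS];    /* this xact read data someone overwrote */
} ConflictTracker;

/* A "pivot" has both an incoming and an outgoing rw-conflict. */
bool
is_pivot(const ConflictTracker *t, int xact)
{
    assert(xact >= 0 && xact < MAX_XACTS);
    return t->has_in[xact] && t->has_out[xact];
}

/*
 * Record that 'reader' read something which 'writer' then modified.
 * Returns true if either transaction has become a pivot, meaning the
 * toy model would have to abort one of the transactions involved.
 */
bool
record_rw_conflict(ConflictTracker *t, int reader, int writer)
{
    assert(reader >= 0 && reader < MAX_XACTS);
    assert(writer >= 0 && writer < MAX_XACTS);
    t->has_out[reader] = true;
    t->has_in[writer] = true;
    return is_pivot(t, reader) || is_pivot(t, writer);
}
```

In this model, recording T0 -> T1 followed by T1 -> T2 already reports danger
because T1 is a pivot, even though the serial order T0, T1, T2 would be
valid. That is the kind of false positive mentioned above.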

To track reads, we implement predicate locking; see storage/lmgr/predicate.c.
Whenever a tuple is read, a predicate lock is acquired on the tuple. Shared
memory is finite, so when a transaction takes many tuple-level locks on a
page, the locks are promoted to a single page-level lock, and further to a
single relation-level lock if necessary. To lock key values with no matching
tuple, a sequential scan always takes a relation-level lock, and an index
scan acquires a page-level lock that covers the search key, whether or not
there are any matching keys at the moment.
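
The promotion scheme can be pictured with a small toy model. This is only a
sketch under stated assumptions: the thresholds, the RelLocks structure and
the lock_tuple() function are hypothetical, and the model counts lock
requests rather than distinct tuples. It is not how predicate.c stores locks.

```c
#include <assert.h>

#define TUPLES_PER_PAGE_LIMIT 3 /* hypothetical promotion threshold */
#define PAGES_PER_REL_LIMIT   2 /* hypothetical promotion threshold */
#define MAX_PAGES             16

typedef enum { GRAN_TUPLE, GRAN_PAGE, GRAN_RELATION } Granularity;

/* Hypothetical per-transaction lock state for a single relation. */
typedef struct
{
    int tuple_locks[MAX_PAGES]; /* tuple-level locks held on each page */
    int page_locked[MAX_PAGES]; /* 1 if the page has a page-level lock */
    int page_locks;             /* number of page-level locks held */
    int rel_locked;             /* 1 if the relation-level lock is held */
} RelLocks;

/*
 * Take a predicate lock for a tuple on 'page' and return the granularity
 * at which the read ends up being covered.
 */
Granularity
lock_tuple(RelLocks *r, int page)
{
    assert(page >= 0 && page < MAX_PAGES);

    if (r->rel_locked)
        return GRAN_RELATION;   /* already covered by the coarsest lock */
    if (r->page_locked[page])
        return GRAN_PAGE;       /* already covered by the page lock */

    if (++r->tuple_locks[page] <= TUPLES_PER_PAGE_LIMIT)
        return GRAN_TUPLE;

    /* Too many tuple locks on this page: replace them with one page lock. */
    r->tuple_locks[page] = 0;
    r->page_locked[page] = 1;
    if (++r->page_locks <= PAGES_PER_REL_LIMIT)
        return GRAN_PAGE;

    /* Too many page locks: fall back to a single relation-level lock. */
    r->rel_locked = 1;
    return GRAN_RELATION;
}
```

Coarser locks use less shared memory but cover more data, so in a scheme like
this each promotion trades memory for a higher chance of false positives.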

A predicate lock doesn't conflict with any regular locks or with other
predicate locks in the normal sense. Predicate locks are used only by the
predicate lock manager to detect the danger of anomalies. Only serializable
transactions participate in predicate locking, so there should be no extra
overhead for other transactions.

Predicate locks can't be released at commit, but must be remembered until
all the transactions that overlapped with the committing transaction have
completed. That means that we would need to remember an unbounded number of
predicate locks, so we apply a lossy but conservative method of tracking
locks for committed transactions. If we run short of shared memory, we
overflow to a new "pg_serial" SLRU pool.

We don't currently allow Serializable transactions in Hot Standby mode.
That would be hard, because even read-only transactions can cause anomalies
that wouldn't otherwise occur.

Serializable isolation mode now means the new fully serializable level.
Repeatable Read gives you the old Snapshot Isolation level that we have
always had.

Kevin Grittner and Dan Ports, reviewed by Jeff Davis, Heikki Linnakangas and
Anssi Kääriäinen
parent c18f51da
......@@ -490,6 +490,13 @@
<entry>Can an index of this type be clustered on?</entry>
</row>
<row>
<entry><structfield>ampredlocks</structfield></entry>
<entry><type>bool</type></entry>
<entry></entry>
<entry>Does an index of this type manage fine-grained predicate locks?</entry>
</row>
<row>
<entry><structfield>amkeytype</structfield></entry>
<entry><type>oid</type></entry>
......@@ -6577,7 +6584,7 @@
<entry><type>text</type></entry>
<entry></entry>
<entry>Name of the lock mode held or desired by this process (see <xref
linkend="locking-tables">)</entry>
linkend="locking-tables"> and <xref linkend="xact-serializable">)</entry>
</row>
<row>
<entry><structfield>granted</structfield></entry>
......
......@@ -4456,6 +4456,7 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
<varlistentry id="guc-default-transaction-isolation" xreflabel="default_transaction_isolation">
<indexterm>
<primary>transaction isolation level</primary>
<secondary>setting default</secondary>
</indexterm>
<indexterm>
<primary><varname>default_transaction_isolation</> configuration parameter</primary>
......@@ -4481,6 +4482,7 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
<varlistentry id="guc-default-transaction-read-only" xreflabel="default_transaction_read_only">
<indexterm>
<primary>read-only transaction</primary>
<secondary>setting default</secondary>
</indexterm>
<indexterm>
<primary><varname>default_transaction_read_only</> configuration parameter</primary>
......@@ -4500,6 +4502,41 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
</listitem>
</varlistentry>
<varlistentry id="guc-default-transaction-deferrable" xreflabel="default_transaction_deferrable">
<indexterm>
<primary>deferrable transaction</primary>
<secondary>setting default</secondary>
</indexterm>
<indexterm>
<primary><varname>default_transaction_deferrable</> configuration parameter</primary>
</indexterm>
<term><varname>default_transaction_deferrable</varname> (<type>boolean</type>)</term>
<listitem>
<para>
When running at the <literal>serializable</> isolation level,
a deferrable read-only SQL transaction may be delayed before
it is allowed to proceed. However, once it begins executing
it does not incur any of the overhead required to ensure
serializability; so serialization code will have no reason to
force it to abort because of concurrent updates, making this
option suitable for long-running read-only transactions.
</para>
<para>
This parameter controls the default deferrable status of each
new transaction. It currently has no effect on read-write
transactions or those operating at isolation levels lower
than <literal>serializable</>. The default is <literal>off</>.
</para>
<para>
Consult <xref linkend="sql-set-transaction"> for more information.
</para>
</listitem>
</varlistentry>
<varlistentry id="guc-session-replication-role" xreflabel="session_replication_role">
<term><varname>session_replication_role</varname> (<type>enum</type>)</term>
<indexterm>
......@@ -5125,6 +5162,39 @@ dynamic_library_path = 'C:\tools\postgresql;H:\my_project\lib;$libdir'
</listitem>
</varlistentry>
<varlistentry id="guc-max-predicate-locks-per-transaction" xreflabel="max_predicate_locks_per_transaction">
<term><varname>max_predicate_locks_per_transaction</varname> (<type>integer</type>)</term>
<indexterm>
<primary><varname>max_predicate_locks_per_transaction</> configuration parameter</primary>
</indexterm>
<listitem>
<para>
The shared predicate lock table tracks locks on
<varname>max_predicate_locks_per_transaction</varname> * (<xref
linkend="guc-max-connections"> + <xref
linkend="guc-max-prepared-transactions">) objects (e.g., tables);
hence, no more than this many distinct objects can be locked at
any one time. This parameter controls the average number of object
locks allocated for each transaction; individual transactions
can lock more objects as long as the locks of all transactions
fit in the lock table. This is <emphasis>not</> the number of
rows that can be locked; that value is unlimited. The default,
64, has generally been sufficient in testing, but you might need to
raise this value if you have clients that touch many different
tables in a single serializable transaction. This parameter can
only be set at server start.
</para>
<para>
Increasing this parameter might cause <productname>PostgreSQL</>
to request more <systemitem class="osname">System V</> shared
memory than your operating system's default configuration
allows. See <xref linkend="sysvipc"> for information on how to
adjust those parameters, if necessary.
</para>
</listitem>
</varlistentry>
</variablelist>
</sect1>
......
......@@ -1916,6 +1916,15 @@ LOG: database system is ready to accept read only connections
your setting of <varname>max_prepared_transactions</> is 0.
</para>
</listitem>
<listitem>
<para>
The Serializable transaction isolation level is not yet available in hot
standby. (See <xref linkend="xact-serializable"> and
<xref linkend="serializable-consistency"> for details.)
An attempt to set a transaction to the serializable isolation level in
hot standby mode will generate an error.
</para>
</listitem>
</itemizedlist>
</para>
......
......@@ -705,6 +705,19 @@ amrestrpos (IndexScanDesc scan);
it is only safe to use such scans with MVCC-compliant snapshots.
</para>
<para>
When the <structfield>ampredlocks</> flag is not set, any scan using that
index access method within a serializable transaction will acquire a
non-blocking predicate lock on the full index. This will generate a
read-write conflict with the insert of any tuple into that index by a
concurrent serializable transaction. If certain patterns of read-write
conflicts are detected among a set of concurrent serializable
transactions, one of those transactions may be cancelled to protect data
integrity. When the flag is set, it indicates that the index access
method implements finer-grained predicate locking, which will tend to
reduce the frequency of such transaction cancellations.
</para>
</sect1>
<sect1 id="index-unique-checks">
......
......@@ -256,7 +256,7 @@ int lo_open(PGconn *conn, Oid lobjId, int mode);
from a descriptor opened with <symbol>INV_WRITE</symbol> returns
data that reflects all writes of other committed transactions as well
as writes of the current transaction. This is similar to the behavior
of <literal>SERIALIZABLE</> versus <literal>READ COMMITTED</> transaction
of <literal>REPEATABLE READ</> versus <literal>READ COMMITTED</> transaction
modes for ordinary SQL <command>SELECT</> commands.
</para>
......
......@@ -27,6 +27,7 @@ BEGIN [ WORK | TRANSACTION ] [ <replaceable class="parameter">transaction_mode</
ISOLATION LEVEL { SERIALIZABLE | REPEATABLE READ | READ COMMITTED | READ UNCOMMITTED }
READ WRITE | READ ONLY
[ NOT ] DEFERRABLE
</synopsis>
</refsynopsisdiv>
......@@ -57,7 +58,7 @@ BEGIN [ WORK | TRANSACTION ] [ <replaceable class="parameter">transaction_mode</
</para>
<para>
If the isolation level or read/write mode is specified, the new
If the isolation level, read/write mode, or deferrable mode is specified, the new
transaction has those characteristics, as if
<xref linkend="sql-set-transaction">
was executed.
......@@ -135,6 +136,12 @@ BEGIN;
contains additional compatibility information.
</para>
<para>
The <literal>DEFERRABLE</literal>
<replaceable class="parameter">transaction_mode</replaceable>
is a <productname>PostgreSQL</productname> language extension.
</para>
<para>
Incidentally, the <literal>BEGIN</literal> key word is used for a
different purpose in embedded SQL. You are advised to be careful
......
......@@ -67,10 +67,12 @@ LOCK [ TABLE ] [ ONLY ] <replaceable class="PARAMETER">name</replaceable> [, ...
</para>
<para>
To achieve a similar effect when running a transaction at the Serializable
To achieve a similar effect when running a transaction at the
<literal>REPEATABLE READ</> or <literal>SERIALIZABLE</>
isolation level, you have to execute the <command>LOCK TABLE</> statement
before executing any <command>SELECT</> or data modification statement.
A serializable transaction's view of data will be frozen when its first
A <literal>REPEATABLE READ</> or <literal>SERIALIZABLE</> transaction's
view of data will be frozen when its first
<command>SELECT</> or data modification statement begins. A <command>LOCK
TABLE</> later in the transaction will still prevent concurrent writes
&mdash; but it won't ensure that what the transaction reads corresponds to
......
......@@ -646,6 +646,41 @@ PostgreSQL documentation
</listitem>
</varlistentry>
<varlistentry>
<term><option>--serializable-deferrable</option></term>
<listitem>
<para>
Use a <literal>serializable</literal> transaction for the dump, to
ensure that the snapshot used is consistent with later database
states; but do this by waiting for a point in the transaction stream
at which no anomalies can be present, so that there isn't a risk of
the dump failing or causing other transactions to roll back with a
<literal>serialization_failure</literal>. See <xref linkend="mvcc">
for more information about transaction isolation and concurrency
control.
</para>
<para>
This option is not beneficial for a dump which is intended only for
disaster recovery. It could be useful for a dump used to load a
copy of the database for reporting or other read-only load sharing
while the original database continues to be updated. Without it the
dump may reflect a state which is not consistent with any serial
execution of the transactions eventually committed. For example, if
batch processing techniques are used, a batch may show as closed in
the dump without all of the items which are in the batch appearing.
</para>
<para>
This option will make no difference if there are no read-write
transactions active when pg_dump is started. If read-write
transactions are active, the start of the dump may be delayed for an
indeterminate length of time. Once running, performance with or
without the switch is the same.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>--no-tablespaces</option></term>
<listitem>
......
......@@ -1144,7 +1144,7 @@ FOR SHARE [ OF <replaceable class="parameter">table_name</replaceable> [, ...] ]
has already locked a selected row or rows, <command>SELECT FOR
UPDATE</command> will wait for the other transaction to complete,
and will then lock and return the updated row (or no row, if the
row was deleted). Within a <literal>SERIALIZABLE</> transaction,
row was deleted). Within a <literal>REPEATABLE READ</> or <literal>SERIALIZABLE</> transaction,
however, an error will be thrown if a row to be locked has changed
since the transaction started. For further discussion see <xref
linkend="mvcc">.
......
......@@ -15,6 +15,21 @@
<primary>SET TRANSACTION</primary>
</indexterm>
<indexterm>
<primary>transaction isolation level</primary>
<secondary>setting</secondary>
</indexterm>
<indexterm>
<primary>read-only transaction</primary>
<secondary>setting</secondary>
</indexterm>
<indexterm>
<primary>deferrable transaction</primary>
<secondary>setting</secondary>
</indexterm>
<refsynopsisdiv>
<synopsis>
SET TRANSACTION <replaceable class="parameter">transaction_mode</replaceable> [, ...]
......@@ -24,6 +39,7 @@ SET SESSION CHARACTERISTICS AS TRANSACTION <replaceable class="parameter">transa
ISOLATION LEVEL { SERIALIZABLE | REPEATABLE READ | READ COMMITTED | READ UNCOMMITTED }
READ WRITE | READ ONLY
[ NOT ] DEFERRABLE
</synopsis>
</refsynopsisdiv>
......@@ -42,8 +58,8 @@ SET SESSION CHARACTERISTICS AS TRANSACTION <replaceable class="parameter">transa
<para>
The available transaction characteristics are the transaction
isolation level and the transaction access mode (read/write or
read-only).
isolation level, the transaction access mode (read/write or
read-only), and the deferrable mode.
</para>
<para>
......@@ -62,7 +78,7 @@ SET SESSION CHARACTERISTICS AS TRANSACTION <replaceable class="parameter">transa
</varlistentry>
<varlistentry>
<term><literal>SERIALIZABLE</literal></term>
<term><literal>REPEATABLE READ</literal></term>
<listitem>
<para>
All statements of the current transaction can only see rows committed
......@@ -71,14 +87,27 @@ SET SESSION CHARACTERISTICS AS TRANSACTION <replaceable class="parameter">transa
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><literal>SERIALIZABLE</literal></term>
<listitem>
<para>
All statements of the current transaction can only see rows committed
before the first query or data-modification statement was executed in
this transaction. If a pattern of reads and writes among concurrent
serializable transactions would create a situation which could not
have occurred for any serial (one-at-a-time) execution of those
transactions, one of them will be rolled back with a
<literal>serialization_failure</literal> <literal>SQLSTATE</literal>.
</para>
</listitem>
</varlistentry>
</variablelist>
The SQL standard defines two additional levels, <literal>READ
UNCOMMITTED</literal> and <literal>REPEATABLE READ</literal>.
The SQL standard defines one additional level, <literal>READ
UNCOMMITTED</literal>.
In <productname>PostgreSQL</productname> <literal>READ
UNCOMMITTED</literal> is treated as
<literal>READ COMMITTED</literal>, while <literal>REPEATABLE
READ</literal> is treated as <literal>SERIALIZABLE</literal>.
UNCOMMITTED</literal> is treated as <literal>READ COMMITTED</literal>.
</para>
<para>
......@@ -127,8 +156,9 @@ SET SESSION CHARACTERISTICS AS TRANSACTION <replaceable class="parameter">transa
<para>
The session default transaction modes can also be set by setting the
configuration parameters <xref linkend="guc-default-transaction-isolation">
and <xref linkend="guc-default-transaction-read-only">.
configuration parameters <xref linkend="guc-default-transaction-isolation">,
<xref linkend="guc-default-transaction-read-only">, and
<xref linkend="guc-default-transaction-deferrable">.
(In fact <command>SET SESSION CHARACTERISTICS</command> is just a
verbose equivalent for setting these variables with <command>SET</>.)
This means the defaults can be set in the configuration file, via
......@@ -146,9 +176,7 @@ SET SESSION CHARACTERISTICS AS TRANSACTION <replaceable class="parameter">transa
isolation level in the standard. In
<productname>PostgreSQL</productname> the default is ordinarily
<literal>READ COMMITTED</literal>, but you can change it as
mentioned above. Because of lack of predicate locking, the
<literal>SERIALIZABLE</literal> level is not truly
serializable. See <xref linkend="mvcc"> for details.
mentioned above.
</para>
<para>
......@@ -158,6 +186,12 @@ SET SESSION CHARACTERISTICS AS TRANSACTION <replaceable class="parameter">transa
not implemented in the <productname>PostgreSQL</productname> server.
</para>
<para>
The <literal>DEFERRABLE</literal>
<replaceable class="parameter">transaction_mode</replaceable>
is a <productname>PostgreSQL</productname> language extension.
</para>
<para>
The SQL standard requires commas between successive <replaceable
class="parameter">transaction_modes</replaceable>, but for historical
......
......@@ -27,6 +27,7 @@ START TRANSACTION [ <replaceable class="parameter">transaction_mode</replaceable
ISOLATION LEVEL { SERIALIZABLE | REPEATABLE READ | READ COMMITTED | READ UNCOMMITTED }
READ WRITE | READ ONLY
[ NOT ] DEFERRABLE
</synopsis>
</refsynopsisdiv>
......@@ -34,8 +35,8 @@ START TRANSACTION [ <replaceable class="parameter">transaction_mode</replaceable
<title>Description</title>
<para>
This command begins a new transaction block. If the isolation level or
read/write mode is specified, the new transaction has those
This command begins a new transaction block. If the isolation level,
read/write mode, or deferrable mode is specified, the new transaction has those
characteristics, as if <xref linkend="sql-set-transaction"> was executed. This is the same
as the <xref linkend="sql-begin"> command.
</para>
......@@ -64,6 +65,12 @@ START TRANSACTION [ <replaceable class="parameter">transaction_mode</replaceable
as a convenience.
</para>
<para>
The <literal>DEFERRABLE</literal>
<replaceable class="parameter">transaction_mode</replaceable>
is a <productname>PostgreSQL</productname> language extension.
</para>
<para>
The SQL standard requires commas between successive <replaceable
class="parameter">transaction_modes</replaceable>, but for historical
......
......@@ -340,7 +340,7 @@ SPI_execute("INSERT INTO foo SELECT * FROM bar", false, 5);
<function>SPI_execute</function> increments the command
counter and computes a new <firstterm>snapshot</> before executing each
command in the string. The snapshot does not actually change if the
current transaction isolation level is <literal>SERIALIZABLE</>, but in
current transaction isolation level is <literal>SERIALIZABLE</> or <literal>REPEATABLE READ</>, but in
<literal>READ COMMITTED</> mode the snapshot update allows each command to
see the results of newly committed transactions from other sessions.
This is essential for consistent behavior when the commands are modifying
......
......@@ -57,6 +57,7 @@
#include "storage/bufmgr.h"
#include "storage/freespace.h"
#include "storage/lmgr.h"
#include "storage/predicate.h"
#include "storage/procarray.h"
#include "storage/smgr.h"
#include "storage/standby.h"
......@@ -261,20 +262,20 @@ heapgetpage(HeapScanDesc scan, BlockNumber page)
{
if (ItemIdIsNormal(lpp))
{
HeapTupleData loctup;
bool valid;
loctup.t_data = (HeapTupleHeader) PageGetItem((Page) dp, lpp);
loctup.t_len = ItemIdGetLength(lpp);
ItemPointerSet(&(loctup.t_self), page, lineoff);
if (all_visible)
valid = true;
else
{
HeapTupleData loctup;
valid = HeapTupleSatisfiesVisibility(&loctup, snapshot, buffer);
loctup.t_data = (HeapTupleHeader) PageGetItem((Page) dp, lpp);
loctup.t_len = ItemIdGetLength(lpp);
ItemPointerSet(&(loctup.t_self), page, lineoff);
CheckForSerializableConflictOut(valid, scan->rs_rd, &loctup, buffer);
valid = HeapTupleSatisfiesVisibility(&loctup, snapshot, buffer);
}
if (valid)
scan->rs_vistuples[ntup++] = lineoff;
}
......@@ -468,12 +469,16 @@ heapgettup(HeapScanDesc scan,
snapshot,
scan->rs_cbuf);
CheckForSerializableConflictOut(valid, scan->rs_rd, tuple, scan->rs_cbuf);
if (valid && key != NULL)
HeapKeyTest(tuple, RelationGetDescr(scan->rs_rd),
nkeys, key, valid);
if (valid)
{
if (!scan->rs_relpredicatelocked)
PredicateLockTuple(scan->rs_rd, tuple);
LockBuffer(scan->rs_cbuf, BUFFER_LOCK_UNLOCK);
return;
}
......@@ -741,12 +746,16 @@ heapgettup_pagemode(HeapScanDesc scan,
nkeys, key, valid);
if (valid)
{
if (!scan->rs_relpredicatelocked)
PredicateLockTuple(scan->rs_rd, tuple);
scan->rs_cindex = lineindex;
return;
}
}
else
{
if (!scan->rs_relpredicatelocked)
PredicateLockTuple(scan->rs_rd, tuple);
scan->rs_cindex = lineindex;
return;
}
......@@ -1213,6 +1222,7 @@ heap_beginscan_internal(Relation relation, Snapshot snapshot,
scan->rs_strategy = NULL; /* set in initscan */
scan->rs_allow_strat = allow_strat;
scan->rs_allow_sync = allow_sync;
scan->rs_relpredicatelocked = false;
/*
* we can use page-at-a-time mode if it's an MVCC-safe snapshot
......@@ -1459,8 +1469,13 @@ heap_fetch(Relation relation,
*/
valid = HeapTupleSatisfiesVisibility(tuple, snapshot, buffer);
if (valid)
PredicateLockTuple(relation, tuple);
LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
CheckForSerializableConflictOut(valid, relation, tuple, buffer);
if (valid)
{
/*
......@@ -1506,13 +1521,15 @@ heap_fetch(Relation relation,
* heap_fetch, we do not report any pgstats count; caller may do so if wanted.
*/
bool
heap_hot_search_buffer(ItemPointer tid, Buffer buffer, Snapshot snapshot,
bool *all_dead)
heap_hot_search_buffer(ItemPointer tid, Relation relation, Buffer buffer,
Snapshot snapshot, bool *all_dead)
{
Page dp = (Page) BufferGetPage(buffer);
TransactionId prev_xmax = InvalidTransactionId;
OffsetNumber offnum;
bool at_chain_start;
bool valid;
bool match_found;
if (all_dead)
*all_dead = true;
......@@ -1522,6 +1539,7 @@ heap_hot_search_buffer(ItemPointer tid, Buffer buffer, Snapshot snapshot,
Assert(ItemPointerGetBlockNumber(tid) == BufferGetBlockNumber(buffer));
offnum = ItemPointerGetOffsetNumber(tid);
at_chain_start = true;
match_found = false;
/* Scan through possible multiple members of HOT-chain */
for (;;)
......@@ -1552,6 +1570,8 @@ heap_hot_search_buffer(ItemPointer tid, Buffer buffer, Snapshot snapshot,
heapTuple.t_data = (HeapTupleHeader) PageGetItem(dp, lp);
heapTuple.t_len = ItemIdGetLength(lp);
heapTuple.t_tableOid = relation->rd_id;
heapTuple.t_self = *tid;
/*
* Shouldn't see a HEAP_ONLY tuple at chain start.
......@@ -1569,12 +1589,18 @@ heap_hot_search_buffer(ItemPointer tid, Buffer buffer, Snapshot snapshot,
break;
/* If it's visible per the snapshot, we must return it */
if (HeapTupleSatisfiesVisibility(&heapTuple, snapshot, buffer))
valid = HeapTupleSatisfiesVisibility(&heapTuple, snapshot, buffer);
CheckForSerializableConflictOut(valid, relation, &heapTuple, buffer);
if (valid)
{
ItemPointerSetOffsetNumber(tid, offnum);
PredicateLockTuple(relation, &heapTuple);
if (all_dead)
*all_dead = false;
return true;
if (IsolationIsSerializable())
match_found = true;
else
return true;
}
/*
......@@ -1603,7 +1629,7 @@ heap_hot_search_buffer(ItemPointer tid, Buffer buffer, Snapshot snapshot,
break; /* end of chain */
}
return false;
return match_found;
}
/*
......@@ -1622,7 +1648,7 @@ heap_hot_search(ItemPointer tid, Relation relation, Snapshot snapshot,
buffer = ReadBuffer(relation, ItemPointerGetBlockNumber(tid));
LockBuffer(buffer, BUFFER_LOCK_SHARE);
result = heap_hot_search_buffer(tid, buffer, snapshot, all_dead);
result = heap_hot_search_buffer(tid, relation, buffer, snapshot, all_dead);
LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
ReleaseBuffer(buffer);
return result;
......@@ -1729,6 +1755,7 @@ heap_get_latest_tid(Relation relation,
* result candidate.
*/
valid = HeapTupleSatisfiesVisibility(&tp, snapshot, buffer);
CheckForSerializableConflictOut(valid, relation, &tp, buffer);
if (valid)
*tid = ctid;
......@@ -1893,6 +1920,13 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
buffer = RelationGetBufferForTuple(relation, heaptup->t_len,
InvalidBuffer, options, bistate);
/*
* We're about to do the actual insert -- check for conflict at the
* relation or buffer level first, to avoid possibly having to roll
* back work we've just done.
*/
CheckForSerializableConflictIn(relation, NULL, buffer);
/* NO EREPORT(ERROR) from here till changes are logged */
START_CRIT_SECTION();
......@@ -2193,6 +2227,12 @@ l1:
return result;
}
/*
* We're about to do the actual delete -- check for conflict first,
* to avoid possibly having to roll back work we've just done.
*/
CheckForSerializableConflictIn(relation, &tp, buffer);
/* replace cid with a combo cid if necessary */
HeapTupleHeaderAdjustCmax(tp.t_data, &cid, &iscombo);
......@@ -2546,6 +2586,12 @@ l2:
return result;
}
/*
* We're about to do the actual update -- check for conflict first,
* to avoid possibly having to roll back work we've just done.
*/
CheckForSerializableConflictIn(relation, &oldtup, buffer);
/* Fill in OID and transaction status data for newtup */
if (relation->rd_rel->relhasoids)
{
......@@ -2690,6 +2736,16 @@ l2:
heaptup = newtup;
}
/*
* We're about to create the new tuple -- check for conflict first,
* to avoid possibly having to roll back work we've just done.
*
* NOTE: For a tuple insert, we only need to check for table locks, since
* predicate locking at the index level will cover ranges for anything
* except a table scan. Therefore, only provide the relation.
*/
CheckForSerializableConflictIn(relation, NULL, InvalidBuffer);
/*
* At this point newbuf and buffer are both pinned and locked, and newbuf
* has enough space for the new tuple. If they are the same buffer, only
......@@ -2799,6 +2855,12 @@ l2:
END_CRIT_SECTION();
/*
* Any existing SIREAD locks on the old tuple must be linked to the new
* tuple for conflict detection purposes.
*/
PredicateLockTupleRowVersionLink(relation, &oldtup, newtup);
if (newbuf != buffer)
LockBuffer(newbuf, BUFFER_LOCK_UNLOCK);
LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
......
......@@ -64,9 +64,11 @@
#include "access/relscan.h"
#include "access/transam.h"
#include "access/xact.h"
#include "pgstat.h"
#include "storage/bufmgr.h"
#include "storage/lmgr.h"
#include "storage/predicate.h"
#include "utils/relcache.h"
#include "utils/snapmgr.h"
#include "utils/tqual.h"
......@@ -192,6 +194,11 @@ index_insert(Relation indexRelation,
RELATION_CHECKS;
GET_REL_PROCEDURE(aminsert);
if (!(indexRelation->rd_am->ampredlocks))
CheckForSerializableConflictIn(indexRelation,
(HeapTuple) NULL,
InvalidBuffer);
/*
* have the am's insert proc do all the work.
*/
......@@ -266,6 +273,9 @@ index_beginscan_internal(Relation indexRelation,
RELATION_CHECKS;
GET_REL_PROCEDURE(ambeginscan);
if (!(indexRelation->rd_am->ampredlocks))
PredicateLockRelation(indexRelation);
/*
* We hold a reference count to the relcache entry throughout the scan.
*/
......@@ -523,6 +533,7 @@ index_getnext(IndexScanDesc scan, ScanDirection direction)
{
ItemId lp;
ItemPointer ctid;
bool valid;
/* check for bogus TID */
if (offnum < FirstOffsetNumber ||
......@@ -577,8 +588,13 @@ index_getnext(IndexScanDesc scan, ScanDirection direction)
break;
/* If it's visible per the snapshot, we must return it */
if (HeapTupleSatisfiesVisibility(heapTuple, scan->xs_snapshot,
scan->xs_cbuf))
valid = HeapTupleSatisfiesVisibility(heapTuple, scan->xs_snapshot,
scan->xs_cbuf);
CheckForSerializableConflictOut(valid, scan->heapRelation,
heapTuple, scan->xs_cbuf);
if (valid)
{
/*
* If the snapshot is MVCC, we know that it could accept at
......@@ -586,7 +602,8 @@ index_getnext(IndexScanDesc scan, ScanDirection direction)
* any more members. Otherwise, check for continuation of the
* HOT-chain, and set state for next time.
*/
if (IsMVCCSnapshot(scan->xs_snapshot))
if (IsMVCCSnapshot(scan->xs_snapshot)
&& !IsolationIsSerializable())
scan->xs_next_hot = InvalidOffsetNumber;
else if (HeapTupleIsHotUpdated(heapTuple))
{
......@@ -598,6 +615,8 @@ index_getnext(IndexScanDesc scan, ScanDirection direction)
else
scan->xs_next_hot = InvalidOffsetNumber;
PredicateLockTuple(scan->heapRelation, heapTuple);
LockBuffer(scan->xs_cbuf, BUFFER_LOCK_UNLOCK);
pgstat_count_heap_fetch(scan->indexRelation);
......
......@@ -21,6 +21,7 @@
#include "miscadmin.h"
#include "storage/bufmgr.h"
#include "storage/lmgr.h"
#include "storage/predicate.h"
#include "utils/inval.h"
#include "utils/tqual.h"
......@@ -174,6 +175,14 @@ top:
if (checkUnique != UNIQUE_CHECK_EXISTING)
{
/*
* The only conflict predicate locking cares about for indexes is when
* an index tuple insert conflicts with an existing lock. Since the
* actual location of the insert is hard to predict because of the
* random search used to prevent O(N^2) performance when there are many
* duplicate entries, we can just use the "first valid" page.
*/
CheckForSerializableConflictIn(rel, NULL, buf);
/* do the insertion */
_bt_findinsertloc(rel, &buf, &offset, natts, itup_scankey, itup, heapRel);
_bt_insertonpg(rel, buf, stack, itup, offset, false);
......@@ -696,6 +705,9 @@ _bt_insertonpg(Relation rel,
/* split the buffer into left and right halves */
rbuf = _bt_split(rel, buf, firstright,
newitemoff, itemsz, itup, newitemonleft);
PredicateLockPageSplit(rel,
BufferGetBlockNumber(buf),
BufferGetBlockNumber(rbuf));
/*----------
* By here,
......
......@@ -29,6 +29,7 @@
#include "storage/freespace.h"
#include "storage/indexfsm.h"
#include "storage/lmgr.h"
#include "storage/predicate.h"
#include "utils/inval.h"
#include "utils/snapmgr.h"
......@@ -1183,6 +1184,12 @@ _bt_pagedel(Relation rel, Buffer buf, BTStack stack)
rightsib, opaque->btpo_prev, target,
RelationGetRelationName(rel));
/*
* Any insert which would have gone on the target block will now go to the
* right sibling block.
*/
PredicateLockPageCombine(rel, target, rightsib);
/*
* Next find and write-lock the current parent of the target page. This is
* essentially the same as the corresponding step of splitting.
......
......@@ -29,6 +29,7 @@
#include "storage/indexfsm.h"
#include "storage/ipc.h"
#include "storage/lmgr.h"
#include "storage/predicate.h"
#include "storage/smgr.h"
#include "utils/memutils.h"
......@@ -822,6 +823,7 @@ restart:
if (_bt_page_recyclable(page))
{
/* Okay to recycle this page */
Assert(!PageIsPredicateLocked(rel, blkno));
RecordFreeIndexPage(rel, blkno);
vstate->totFreePages++;
stats->pages_deleted++;
......
......@@ -21,6 +21,7 @@
#include "miscadmin.h"
#include "pgstat.h"
#include "storage/bufmgr.h"
#include "storage/predicate.h"
#include "utils/lsyscache.h"
#include "utils/rel.h"
......@@ -63,7 +64,10 @@ _bt_search(Relation rel, int keysz, ScanKey scankey, bool nextkey,
/* If index is empty and access = BT_READ, no root page is created. */
if (!BufferIsValid(*bufP))
{
PredicateLockRelation(rel); /* Nothing finer to lock exists. */
return (BTStack) NULL;
}
/* Loop iterates once per level descended in the tree */
for (;;)
......@@ -88,7 +92,11 @@ _bt_search(Relation rel, int keysz, ScanKey scankey, bool nextkey,
page = BufferGetPage(*bufP);
opaque = (BTPageOpaque) PageGetSpecialPointer(page);
if (P_ISLEAF(opaque))
{
if (access == BT_READ)
PredicateLockPage(rel, BufferGetBlockNumber(*bufP));
break;
}
/*
* Find the appropriate item on the internal page, and get the child
......@@ -1142,6 +1150,7 @@ _bt_steppage(IndexScanDesc scan, ScanDirection dir)
opaque = (BTPageOpaque) PageGetSpecialPointer(page);
if (!P_IGNORE(opaque))
{
PredicateLockPage(rel, blkno);
/* see if there are any matches on this page */
/* note that this will clear moreRight if we can stop */
if (_bt_readpage(scan, dir, P_FIRSTDATAKEY(opaque)))
......@@ -1189,6 +1198,7 @@ _bt_steppage(IndexScanDesc scan, ScanDirection dir)
opaque = (BTPageOpaque) PageGetSpecialPointer(page);
if (!P_IGNORE(opaque))
{
PredicateLockPage(rel, BufferGetBlockNumber(so->currPos.buf));
/* see if there are any matches on this page */
/* note that this will clear moreLeft if we can stop */
if (_bt_readpage(scan, dir, PageGetMaxOffsetNumber(page)))
......@@ -1352,6 +1362,7 @@ _bt_get_endpoint(Relation rel, uint32 level, bool rightmost)
if (!BufferIsValid(buf))
{
/* empty index... */
PredicateLockRelation(rel); /* Nothing finer to lock exists. */
return InvalidBuffer;
}
......@@ -1431,10 +1442,12 @@ _bt_endpoint(IndexScanDesc scan, ScanDirection dir)
if (!BufferIsValid(buf))
{
/* empty index... */
PredicateLockRelation(rel); /* Nothing finer to lock exists. */
so->currPos.buf = InvalidBuffer;
return false;
}
PredicateLockPage(rel, BufferGetBlockNumber(buf));
page = BufferGetPage(buf);
opaque = (BTPageOpaque) PageGetSpecialPointer(page);
Assert(P_ISLEAF(opaque));
......
......@@ -57,6 +57,7 @@
#include "pgstat.h"
#include "replication/walsender.h"
#include "storage/fd.h"
#include "storage/predicate.h"
#include "storage/procarray.h"
#include "storage/sinvaladt.h"
#include "storage/smgr.h"
......@@ -1357,6 +1358,8 @@ FinishPreparedTransaction(const char *gid, bool isCommit)
else
ProcessRecords(bufptr, xid, twophase_postabort_callbacks);
PredicateLockTwoPhaseFinish(xid, isCommit);
/* Count the prepared xact as committed or aborted */
AtEOXact_PgStat(isCommit);
......
......@@ -18,12 +18,14 @@
#include "access/twophase_rmgr.h"
#include "pgstat.h"
#include "storage/lock.h"
#include "storage/predicate.h"
const TwoPhaseCallback twophase_recover_callbacks[TWOPHASE_RM_MAX_ID + 1] =
{
NULL, /* END ID */
lock_twophase_recover, /* Lock */
predicatelock_twophase_recover, /* PredicateLock */
NULL, /* pgstat */
multixact_twophase_recover /* MultiXact */
};
......@@ -32,6 +34,7 @@ const TwoPhaseCallback twophase_postcommit_callbacks[TWOPHASE_RM_MAX_ID + 1] =
{
NULL, /* END ID */
lock_twophase_postcommit, /* Lock */
NULL, /* PredicateLock */
pgstat_twophase_postcommit, /* pgstat */
multixact_twophase_postcommit /* MultiXact */
};
......@@ -40,6 +43,7 @@ const TwoPhaseCallback twophase_postabort_callbacks[TWOPHASE_RM_MAX_ID + 1] =
{
NULL, /* END ID */
lock_twophase_postabort, /* Lock */
NULL, /* PredicateLock */
pgstat_twophase_postabort, /* pgstat */
multixact_twophase_postabort /* MultiXact */
};
......@@ -48,6 +52,7 @@ const TwoPhaseCallback twophase_standby_recover_callbacks[TWOPHASE_RM_MAX_ID + 1
{
NULL, /* END ID */
lock_twophase_standby_recover, /* Lock */
NULL, /* PredicateLock */
NULL, /* pgstat */
NULL /* MultiXact */
};
......@@ -21,6 +21,7 @@
#include "miscadmin.h"
#include "postmaster/autovacuum.h"
#include "storage/pmsignal.h"
#include "storage/predicate.h"
#include "storage/proc.h"
#include "utils/builtins.h"
#include "utils/syscache.h"
......@@ -161,6 +162,10 @@ GetNewTransactionId(bool isSubXact)
ExtendCLOG(xid);
ExtendSUBTRANS(xid);
/* If it's top level, the predicate locking system also needs to know. */
if (!isSubXact)
RegisterPredicateLockingXid(xid);
/*
* Now advance the nextXid counter. This must not happen until after we
* have successfully completed ExtendCLOG() --- if that routine fails, we
......
......@@ -40,6 +40,7 @@
#include "storage/bufmgr.h"
#include "storage/fd.h"
#include "storage/lmgr.h"
#include "storage/predicate.h"
#include "storage/procarray.h"
#include "storage/sinvaladt.h"
#include "storage/smgr.h"
......@@ -63,6 +64,9 @@ int XactIsoLevel;
bool DefaultXactReadOnly = false;
bool XactReadOnly;
bool DefaultXactDeferrable = false;
bool XactDeferrable;
bool XactSyncCommit = true;
int CommitDelay = 0; /* precommit delay in microseconds */
......@@ -1640,6 +1644,7 @@ StartTransaction(void)
s->startedInRecovery = false;
XactReadOnly = DefaultXactReadOnly;
}
XactDeferrable = DefaultXactDeferrable;
XactIsoLevel = DefaultXactIsoLevel;
forceSyncCommit = false;
MyXactAccessedTempRel = false;
......@@ -1786,6 +1791,13 @@ CommitTransaction(void)
/* close large objects before lower-level cleanup */
AtEOXact_LargeObject(true);
/*
* Mark serializable transaction as complete for predicate locking
* purposes. This should be done as late as we can put it and still
* allow errors to be raised for failure patterns found at commit.
*/
PreCommit_CheckForSerializationFailure();
/*
* Insert notifications sent by NOTIFY commands into the queue. This
* should be late in the pre-commit sequence to minimize time spent
......@@ -1980,6 +1992,13 @@ PrepareTransaction(void)
/* close large objects before lower-level cleanup */
AtEOXact_LargeObject(true);
/*
* Mark serializable transaction as complete for predicate locking
* purposes. This should be done as late as we can put it and still
* allow errors to be raised for failure patterns found at commit.
*/
PreCommit_CheckForSerializationFailure();
/* NOTIFY will be handled below */
/*
......@@ -2044,6 +2063,7 @@ PrepareTransaction(void)
AtPrepare_Notify();
AtPrepare_Locks();
AtPrepare_PredicateLocks();
AtPrepare_PgStat();
AtPrepare_MultiXact();
AtPrepare_RelationMap();
......@@ -2103,6 +2123,7 @@ PrepareTransaction(void)
PostPrepare_MultiXact(xid);
PostPrepare_Locks(xid);
PostPrepare_PredicateLocks(xid);
ResourceOwnerRelease(TopTransactionResourceOwner,
RESOURCE_RELEASE_LOCKS,
......
......@@ -616,6 +616,15 @@ assign_XactIsoLevel(const char *value, bool doit, GucSource source)
errmsg("SET TRANSACTION ISOLATION LEVEL must not be called in a subtransaction")));
return NULL;
}
/* Can't go to serializable mode while recovery is still active */
if (RecoveryInProgress() && strcmp(value, "serializable") == 0)
{
ereport(GUC_complaint_elevel(source),
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("cannot use serializable mode in a hot standby"),
errhint("You can use REPEATABLE READ instead.")));
return NULL;
}
}
if (strcmp(value, "serializable") == 0)
......@@ -667,6 +676,35 @@ show_XactIsoLevel(void)
}
}
/*
* SET TRANSACTION [NOT] DEFERRABLE
*/
bool
assign_transaction_deferrable(bool newval, bool doit, GucSource source)
{
/* source == PGC_S_OVERRIDE means do it anyway, eg at xact abort */
if (source == PGC_S_OVERRIDE)
return true;
if (IsSubTransaction())
{
ereport(GUC_complaint_elevel(source),
(errcode(ERRCODE_ACTIVE_SQL_TRANSACTION),
errmsg("SET TRANSACTION [NOT] DEFERRABLE cannot be called within a subtransaction")));
return false;
}
if (FirstSnapshotSet)
{
ereport(GUC_complaint_elevel(source),
(errcode(ERRCODE_ACTIVE_SQL_TRANSACTION),
errmsg("SET TRANSACTION [NOT] DEFERRABLE must be called before any query")));
return false;
}
return true;
}
/*
* Random number seed
......
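The order of checks in assign_transaction_deferrable() can be summarized in a small standalone sketch. This is only an illustration: the enum `DeferrableCheck` and the helper `check_set_deferrable` are hypothetical names, and the three booleans stand in for `source == PGC_S_OVERRIDE`, `IsSubTransaction()` and `FirstSnapshotSet`.

```c
#include <stdbool.h>

/* Hypothetical result codes; the real hook reports errors via ereport(). */
typedef enum
{
	DEFERRABLE_OK,
	DEFERRABLE_IN_SUBXACT,
	DEFERRABLE_AFTER_QUERY
} DeferrableCheck;

/*
 * Mirrors the order of the checks in the assign hook: an override always
 * succeeds, then the subtransaction check, then the "before any query" check.
 */
static DeferrableCheck
check_set_deferrable(bool is_override, bool in_subxact, bool first_snapshot_set)
{
	if (is_override)
		return DEFERRABLE_OK;
	if (in_subxact)
		return DEFERRABLE_IN_SUBXACT;
	if (first_snapshot_set)
		return DEFERRABLE_AFTER_QUERY;
	return DEFERRABLE_OK;
}
```

Because the override test comes first, a reset at transaction abort succeeds even when the other two conditions would otherwise reject the change.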
......@@ -42,6 +42,7 @@
#include "executor/nodeBitmapHeapscan.h"
#include "pgstat.h"
#include "storage/bufmgr.h"
#include "storage/predicate.h"
#include "utils/memutils.h"
#include "utils/snapmgr.h"
#include "utils/tqual.h"
......@@ -351,7 +352,7 @@ bitgetpage(HeapScanDesc scan, TBMIterateResult *tbmres)
ItemPointerData tid;
ItemPointerSet(&tid, page, offnum);
if (heap_hot_search_buffer(&tid, buffer, snapshot, NULL))
if (heap_hot_search_buffer(&tid, scan->rs_rd, buffer, snapshot, NULL))
scan->rs_vistuples[ntup++] = ItemPointerGetOffsetNumber(&tid);
}
}
......
......@@ -28,6 +28,7 @@
#include "access/relscan.h"
#include "executor/execdebug.h"
#include "executor/nodeSeqscan.h"
#include "storage/predicate.h"
static void InitScanRelation(SeqScanState *node, EState *estate);
static TupleTableSlot *SeqNext(SeqScanState *node);
......@@ -105,11 +106,15 @@ SeqRecheck(SeqScanState *node, TupleTableSlot *slot)
* tuple.
* We call the ExecScan() routine and pass it the appropriate
* access method functions.
* For serializable transactions, we first acquire a predicate
* lock on the entire relation.
* ----------------------------------------------------------------
*/
TupleTableSlot *
ExecSeqScan(SeqScanState *node)
{
PredicateLockRelation(node->ss_currentRelation);
node->ss_currentScanDesc->rs_relpredicatelocked = true;
return ExecScan((ScanState *) node,
(ExecScanAccessMtd) SeqNext,
(ExecScanRecheckMtd) SeqRecheck);
......
......@@ -6768,6 +6768,12 @@ transaction_mode_item:
| READ WRITE
{ $$ = makeDefElem("transaction_read_only",
makeIntConst(FALSE, @1)); }
| DEFERRABLE
{ $$ = makeDefElem("transaction_deferrable",
makeIntConst(TRUE, @1)); }
| NOT DEFERRABLE
{ $$ = makeDefElem("transaction_deferrable",
makeIntConst(FALSE, @1)); }
;
/* Syntax with commas is SQL-spec, without commas is Postgres historical */
......
......@@ -32,6 +32,7 @@
#include "storage/ipc.h"
#include "storage/pg_shmem.h"
#include "storage/pmsignal.h"
#include "storage/predicate.h"
#include "storage/procarray.h"
#include "storage/procsignal.h"
#include "storage/sinvaladt.h"
......@@ -105,6 +106,7 @@ CreateSharedMemoryAndSemaphores(bool makePrivate, int port)
sizeof(ShmemIndexEnt)));
size = add_size(size, BufferShmemSize());
size = add_size(size, LockShmemSize());
size = add_size(size, PredicateLockShmemSize());
size = add_size(size, ProcGlobalShmemSize());
size = add_size(size, XLOGShmemSize());
size = add_size(size, CLOGShmemSize());
......@@ -199,6 +201,11 @@ CreateSharedMemoryAndSemaphores(bool makePrivate, int port)
*/
InitLocks();
/*
* Set up predicate lock manager
*/
InitPredicateLocks();
/*
* Set up process table
*/
......
......@@ -198,7 +198,7 @@ ShmemAlloc(Size size)
* Returns TRUE if the pointer points within the shared memory segment.
*/
bool
ShmemAddrIsValid(void *addr)
ShmemAddrIsValid(const void *addr)
{
return (addr >= ShmemBase) && (addr < ShmemEnd);
}
......
......@@ -43,14 +43,12 @@ SHMQueueInit(SHM_QUEUE *queue)
* SHMQueueIsDetached -- TRUE if element is not currently
* in a queue.
*/
#ifdef NOT_USED
bool
SHMQueueIsDetached(SHM_QUEUE *queue)
SHMQueueIsDetached(const SHM_QUEUE *queue)
{
Assert(ShmemAddrIsValid(queue));
return (queue->prev == NULL);
}
#endif
/*
* SHMQueueElemInit -- clear an element's links
......@@ -146,7 +144,7 @@ SHMQueueInsertAfter(SHM_QUEUE *queue, SHM_QUEUE *elem)
*--------------------
*/
Pointer
SHMQueueNext(SHM_QUEUE *queue, SHM_QUEUE *curElem, Size linkOffset)
SHMQueueNext(const SHM_QUEUE *queue, const SHM_QUEUE *curElem, Size linkOffset)
{
SHM_QUEUE *elemPtr = curElem->next;
......@@ -162,7 +160,7 @@ SHMQueueNext(SHM_QUEUE *queue, SHM_QUEUE *curElem, Size linkOffset)
* SHMQueueEmpty -- TRUE if queue head is only element, FALSE otherwise
*/
bool
SHMQueueEmpty(SHM_QUEUE *queue)
SHMQueueEmpty(const SHM_QUEUE *queue)
{
Assert(ShmemAddrIsValid(queue));
......
......@@ -12,7 +12,7 @@ subdir = src/backend/storage/lmgr
top_builddir = ../../../..
include $(top_builddir)/src/Makefile.global
OBJS = lmgr.o lock.o proc.o deadlock.o lwlock.o spin.o s_lock.o
OBJS = lmgr.o lock.o proc.o deadlock.o lwlock.o spin.o s_lock.o predicate.o
include $(top_srcdir)/src/backend/common.mk
......
......@@ -3,7 +3,7 @@ src/backend/storage/lmgr/README
Locking Overview
================
Postgres uses three types of interprocess locks:
Postgres uses four types of interprocess locks:
* Spinlocks. These are intended for *very* short-term locks. If a lock
is to be held more than a few dozen instructions, or across any sort of
......@@ -34,6 +34,8 @@ supports a variety of lock modes with table-driven semantics, and it has
full deadlock detection and automatic release at transaction end.
Regular locks should be used for all user-driven lock requests.
* SIReadLock predicate locks. See separate README-SSI file for details.
Acquisition of either a spinlock or a lightweight lock causes query
cancel and die() interrupts to be held off until all such locks are
released. No such restriction exists for regular locks, however. Also
......
......@@ -28,6 +28,7 @@
#include "miscadmin.h"
#include "pg_trace.h"
#include "storage/ipc.h"
#include "storage/predicate.h"
#include "storage/proc.h"
#include "storage/spin.h"
......@@ -178,6 +179,9 @@ NumLWLocks(void)
/* async.c needs one per Async buffer */
numLocks += NUM_ASYNC_BUFFERS;
/* predicate.c needs one per old serializable xid buffer */
numLocks += NUM_OLDSERXID_BUFFERS;
/*
* Add any requested by loadable modules; for backwards-compatibility
* reasons, allocate at least NUM_USER_DEFINED_LWLOCKS of them even if
......
......@@ -374,6 +374,10 @@ standard_ProcessUtility(Node *parsetree,
SetPGVariable("transaction_read_only",
list_make1(item->arg),
true);
else if (strcmp(item->defname, "transaction_deferrable") == 0)
SetPGVariable("transaction_deferrable",
list_make1(item->arg),
true);
}
}
break;
......
......@@ -15,6 +15,7 @@
#include "catalog/pg_type.h"
#include "funcapi.h"
#include "miscadmin.h"
#include "storage/predicate_internals.h"
#include "storage/proc.h"
#include "utils/builtins.h"
......@@ -32,11 +33,20 @@ static const char *const LockTagTypeNames[] = {
"advisory"
};
/* This must match enum PredicateLockTargetType (predicate_internals.h) */
static const char *const PredicateLockTagTypeNames[] = {
"relation",
"page",
"tuple"
};
/* Working status for pg_lock_status */
typedef struct
{
LockData *lockData; /* state data from lmgr */
int currIdx; /* current PROCLOCK index */
PredicateLockData *predLockData; /* state data for pred locks */
int predLockIdx; /* current index for pred lock */
} PG_Lock_Status;
......@@ -69,6 +79,7 @@ pg_lock_status(PG_FUNCTION_ARGS)
FuncCallContext *funcctx;
PG_Lock_Status *mystatus;
LockData *lockData;
PredicateLockData *predLockData;
if (SRF_IS_FIRSTCALL())
{
......@@ -126,6 +137,8 @@ pg_lock_status(PG_FUNCTION_ARGS)
mystatus->lockData = GetLockStatusData();
mystatus->currIdx = 0;
mystatus->predLockData = GetPredicateLockStatusData();
mystatus->predLockIdx = 0;
MemoryContextSwitchTo(oldcontext);
}
......@@ -303,6 +316,72 @@ pg_lock_status(PG_FUNCTION_ARGS)
SRF_RETURN_NEXT(funcctx, result);
}
/*
* Have returned all regular locks. Now start on the SIREAD predicate
* locks.
*/
predLockData = mystatus->predLockData;
if (mystatus->predLockIdx < predLockData->nelements)
{
PredicateLockTargetType lockType;
PREDICATELOCKTARGETTAG *predTag = &(predLockData->locktags[mystatus->predLockIdx]);
SERIALIZABLEXACT *xact = &(predLockData->xacts[mystatus->predLockIdx]);
Datum values[14];
bool nulls[14];
HeapTuple tuple;
Datum result;
mystatus->predLockIdx++;
/*
* Form tuple with appropriate data.
*/
MemSet(values, 0, sizeof(values));
MemSet(nulls, false, sizeof(nulls));
/* lock type */
lockType = GET_PREDICATELOCKTARGETTAG_TYPE(*predTag);
values[0] = CStringGetTextDatum(PredicateLockTagTypeNames[lockType]);
/* lock target */
values[1] = GET_PREDICATELOCKTARGETTAG_DB(*predTag);
values[2] = GET_PREDICATELOCKTARGETTAG_RELATION(*predTag);
if (lockType == PREDLOCKTAG_TUPLE)
values[4] = GET_PREDICATELOCKTARGETTAG_OFFSET(*predTag);
else
nulls[4] = true;
if ((lockType == PREDLOCKTAG_TUPLE) ||
(lockType == PREDLOCKTAG_PAGE))
values[3] = GET_PREDICATELOCKTARGETTAG_PAGE(*predTag);
else
nulls[3] = true;
/* these fields are targets for other types of locks */
nulls[5] = true; /* virtualxid */
nulls[6] = true; /* transactionid */
nulls[7] = true; /* classid */
nulls[8] = true; /* objid */
nulls[9] = true; /* objsubid */
/* lock holder */
values[10] = VXIDGetDatum(xact->vxid.backendId,
xact->vxid.localTransactionId);
nulls[11] = true; /* pid */
/*
* Lock mode. Currently all predicate locks are SIReadLocks, which are
* always held (never waiting).
*/
values[12] = CStringGetTextDatum("SIReadLock");
values[13] = BoolGetDatum(true);
tuple = heap_form_tuple(funcctx->tuple_desc, values, nulls);
result = HeapTupleGetDatum(tuple);
SRF_RETURN_NEXT(funcctx, result);
}
SRF_RETURN_DONE(funcctx);
}
......
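The page and tuple columns that pg_lock_status() reports for a predicate lock depend only on the lock's granularity. The following standalone sketch restates that mapping. The struct `DemoTargetNulls` and the helper `demo_target_nulls` are hypothetical; the enum order is taken from the name table in the hunk above.

```c
#include <stdbool.h>

/* Order must match PredicateLockTagTypeNames, as the diff's comment notes. */
typedef enum
{
	PREDLOCKTAG_RELATION,
	PREDLOCKTAG_PAGE,
	PREDLOCKTAG_TUPLE
} PredicateLockTargetType;

static const char *const PredicateLockTagTypeNames[] = {
	"relation",
	"page",
	"tuple"
};

/* Hypothetical holder for the two granularity-dependent null flags. */
typedef struct
{
	bool		page_is_null;	/* corresponds to nulls[3] */
	bool		offset_is_null; /* corresponds to nulls[4] */
} DemoTargetNulls;

static DemoTargetNulls
demo_target_nulls(PredicateLockTargetType lockType)
{
	DemoTargetNulls n;

	/* Only tuple locks carry an offset within the page. */
	n.offset_is_null = (lockType != PREDLOCKTAG_TUPLE);
	/* Tuple and page locks both carry a page number. */
	n.page_is_null = !(lockType == PREDLOCKTAG_TUPLE ||
					   lockType == PREDLOCKTAG_PAGE);
	return n;
}
```

A relation-level lock therefore shows NULL in both columns, a page-level lock shows only the page, and a tuple-level lock shows both.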
......@@ -59,6 +59,7 @@
#include "storage/bufmgr.h"
#include "storage/standby.h"
#include "storage/fd.h"
#include "storage/predicate.h"
#include "tcop/tcopprot.h"
#include "tsearch/ts_cache.h"
#include "utils/builtins.h"
......@@ -1096,6 +1097,23 @@ static struct config_bool ConfigureNamesBool[] =
&XactReadOnly,
false, assign_transaction_read_only, NULL
},
{
{"default_transaction_deferrable", PGC_USERSET, CLIENT_CONN_STATEMENT,
gettext_noop("Sets the default deferrable status of new transactions."),
NULL
},
&DefaultXactDeferrable,
false, NULL, NULL
},
{
{"transaction_deferrable", PGC_USERSET, CLIENT_CONN_STATEMENT,
gettext_noop("Whether to defer a read-only serializable transaction until it can be executed with no possible serialization failures."),
NULL,
GUC_NO_RESET_ALL | GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
},
&XactDeferrable,
false, assign_transaction_deferrable, NULL
},
{
{"check_function_bodies", PGC_USERSET, CLIENT_CONN_STATEMENT,
gettext_noop("Check function bodies during CREATE FUNCTION."),
......@@ -1695,6 +1713,17 @@ static struct config_int ConfigureNamesInt[] =
64, 10, INT_MAX, NULL, NULL
},
{
{"max_predicate_locks_per_transaction", PGC_POSTMASTER, LOCK_MANAGEMENT,
gettext_noop("Sets the maximum number of predicate locks per transaction."),
gettext_noop("The shared predicate lock table is sized on the assumption that "
"at most max_predicate_locks_per_transaction * max_connections distinct "
"objects will need to be locked at any one time.")
},
&max_predicate_locks_per_xact,
64, 10, INT_MAX, NULL, NULL
},
{
{"authentication_timeout", PGC_SIGHUP, CONN_AUTH_SECURITY,
gettext_noop("Sets the maximum allowed time to complete client authentication."),
......@@ -3460,6 +3489,8 @@ InitializeGUCOptions(void)
PGC_POSTMASTER, PGC_S_OVERRIDE);
SetConfigOption("transaction_read_only", "no",
PGC_POSTMASTER, PGC_S_OVERRIDE);
SetConfigOption("transaction_deferrable", "no",
PGC_POSTMASTER, PGC_S_OVERRIDE);
/*
* For historical reasons, some GUC parameters can receive defaults from
......@@ -5699,6 +5730,9 @@ ExecSetVariableStmt(VariableSetStmt *stmt)
else if (strcmp(item->defname, "transaction_read_only") == 0)
SetPGVariable("transaction_read_only",
list_make1(item->arg), stmt->is_local);
else if (strcmp(item->defname, "transaction_deferrable") == 0)
SetPGVariable("transaction_deferrable",
list_make1(item->arg), stmt->is_local);
else
elog(ERROR, "unexpected SET TRANSACTION element: %s",
item->defname);
......@@ -5718,6 +5752,9 @@ ExecSetVariableStmt(VariableSetStmt *stmt)
else if (strcmp(item->defname, "transaction_read_only") == 0)
SetPGVariable("default_transaction_read_only",
list_make1(item->arg), stmt->is_local);
else if (strcmp(item->defname, "transaction_deferrable") == 0)
SetPGVariable("default_transaction_deferrable",
list_make1(item->arg), stmt->is_local);
else
elog(ERROR, "unexpected SET SESSION element: %s",
item->defname);
......
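The description of max_predicate_locks_per_transaction states a sizing assumption that is easy to work through numerically. The sketch below only restates that product; the helper name `assumed_predicate_lock_targets` is hypothetical, and the actual shared-memory computation in predicate.c is not shown in this section and may include further terms.

```c
#include <stdint.h>

/*
 * Number of distinct lockable objects the shared predicate lock table is
 * assumed to hold, per the GUC's long description:
 *   max_predicate_locks_per_transaction * max_connections
 * A 64-bit result avoids overflow for large settings.
 */
static int64_t
assumed_predicate_lock_targets(int max_predicate_locks_per_xact,
							   int max_connections)
{
	return (int64_t) max_predicate_locks_per_xact * max_connections;
}
```

With the default of 64 and, for example, 100 connections, the table is sized for 6400 lock targets. The total is shared, so one transaction may hold more than 64 as long as others hold fewer.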
......@@ -450,6 +450,7 @@
#check_function_bodies = on
#default_transaction_isolation = 'read committed'
#default_transaction_read_only = off
#default_transaction_deferrable = off
#session_replication_role = 'origin'
#statement_timeout = 0 # in milliseconds, 0 is disabled
#vacuum_freeze_min_age = 50000000
......@@ -501,7 +502,8 @@
# Note: Each lock table slot uses ~270 bytes of shared memory, and there are
# max_locks_per_transaction * (max_connections + max_prepared_transactions)
# lock table slots.
#max_predicate_locks_per_transaction = 64 # min 10
# (change requires restart)
#------------------------------------------------------------------------------
# VERSION/PLATFORM COMPATIBILITY
......
......@@ -22,6 +22,7 @@
#include "access/hash.h"
#include "storage/bufmgr.h"
#include "storage/predicate.h"
#include "storage/proc.h"
#include "utils/memutils.h"
#include "utils/rel.h"
......@@ -261,7 +262,10 @@ ResourceOwnerReleaseInternal(ResourceOwner owner,
* the top of the recursion.
*/
if (owner == TopTransactionResourceOwner)
{
ProcReleaseLocks(isCommit);
ReleasePredicateLocks(isCommit);
}
}
else
{
......
......@@ -27,6 +27,7 @@
#include "access/transam.h"
#include "access/xact.h"
#include "storage/predicate.h"
#include "storage/proc.h"
#include "storage/procarray.h"
#include "utils/memutils.h"
......@@ -126,9 +127,6 @@ GetTransactionSnapshot(void)
{
Assert(RegisteredSnapshots == 0);
CurrentSnapshot = GetSnapshotData(&CurrentSnapshotData);
FirstSnapshotSet = true;
/*
* In transaction-snapshot mode, the first snapshot must live until
* end of xact regardless of what the caller does with it, so we must
......@@ -136,11 +134,20 @@ GetTransactionSnapshot(void)
*/
if (IsolationUsesXactSnapshot())
{
CurrentSnapshot = RegisterSnapshotOnOwner(CurrentSnapshot,
if (IsolationIsSerializable())
CurrentSnapshot = RegisterSerializableTransaction(&CurrentSnapshotData);
else
{
CurrentSnapshot = GetSnapshotData(&CurrentSnapshotData);
CurrentSnapshot = RegisterSnapshotOnOwner(CurrentSnapshot,
TopTransactionResourceOwner);
}
registered_xact_snapshot = true;
}
else
CurrentSnapshot = GetSnapshotData(&CurrentSnapshotData);
FirstSnapshotSet = true;
return CurrentSnapshot;
}
......
......@@ -2299,6 +2299,7 @@ main(int argc, char *argv[])
"pg_xlog/archive_status",
"pg_clog",
"pg_notify",
"pg_serial",
"pg_subtrans",
"pg_twophase",
"pg_multixact/members",
......
......@@ -11,14 +11,14 @@
* script that reproduces the schema in terms of SQL that is understood
* by PostgreSQL
*
* Note that pg_dump runs in a serializable transaction, so it sees a
* consistent snapshot of the database including system catalogs.
* However, it relies in part on various specialized backend functions
* like pg_get_indexdef(), and those things tend to run on SnapshotNow
* time, ie they look at the currently committed state. So it is
* possible to get 'cache lookup failed' error if someone performs DDL
* changes while a dump is happening. The window for this sort of thing
* is from the beginning of the serializable transaction to
* Note that pg_dump runs in a transaction-snapshot mode transaction,
* so it sees a consistent snapshot of the database including system
* catalogs. However, it relies in part on various specialized backend
* functions like pg_get_indexdef(), and those things tend to run on
* SnapshotNow time, ie they look at the currently committed state. So
* it is possible to get 'cache lookup failed' error if someone
* performs DDL changes while a dump is happening. The window for this
* sort of thing is from the acquisition of the transaction snapshot to
* getSchemaData() (when pg_dump acquires AccessShareLock on every
* table it intends to dump). It isn't very large, but it can happen.
*
......@@ -135,6 +135,7 @@ static int dump_inserts = 0;
static int column_inserts = 0;
static int no_security_label = 0;
static int no_unlogged_table_data = 0;
static int serializable_deferrable = 0;
static void help(const char *progname);
......@@ -318,6 +319,7 @@ main(int argc, char **argv)
{"no-tablespaces", no_argument, &outputNoTablespaces, 1},
{"quote-all-identifiers", no_argument, &quote_all_identifiers, 1},
{"role", required_argument, NULL, 3},
{"serializable-deferrable", no_argument, &serializable_deferrable, 1},
{"use-set-session-authorization", no_argument, &use_setsessauth, 1},
{"no-security-label", no_argument, &no_security_label, 1},
{"no-unlogged-table-data", no_argument, &no_unlogged_table_data, 1},
......@@ -669,11 +671,21 @@ main(int argc, char **argv)
no_security_label = 1;
/*
* Start serializable transaction to dump consistent data.
* Start transaction-snapshot mode transaction to dump consistent data.
*/
do_sql_command(g_conn, "BEGIN");
do_sql_command(g_conn, "SET TRANSACTION ISOLATION LEVEL SERIALIZABLE");
if (g_fout->remoteVersion >= 90100)
{
if (serializable_deferrable)
do_sql_command(g_conn,
"SET TRANSACTION ISOLATION LEVEL SERIALIZABLE, "
"READ ONLY, DEFERRABLE");
else
do_sql_command(g_conn,
"SET TRANSACTION ISOLATION LEVEL REPEATABLE READ");
}
else
do_sql_command(g_conn, "SET TRANSACTION ISOLATION LEVEL SERIALIZABLE");
/* Select the appropriate subquery to convert user IDs to names */
if (g_fout->remoteVersion >= 80100)
......@@ -864,6 +876,7 @@ help(const char *progname)
printf(_(" --disable-triggers disable triggers during data-only restore\n"));
printf(_(" --no-tablespaces do not dump tablespace assignments\n"));
printf(_(" --quote-all-identifiers quote all identifiers, even if not keywords\n"));
printf(_(" --serializable-deferrable wait until the dump can run without anomalies\n"));
printf(_(" --role=ROLENAME do SET ROLE before dump\n"));
printf(_(" --no-security-label do not dump security label assignments\n"));
printf(_(" --no-unlogged-table-data do not dump unlogged table data\n"));
......
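The isolation-level choice pg_dump makes at startup reduces to a pure function of the server version and the new flag. A minimal sketch follows, using a hypothetical helper `dump_isolation_sql`; the real code issues the command directly through do_sql_command().

```c
/*
 * Hypothetical helper mirroring the branch in pg_dump's main():
 * servers from 9.1 on get REPEATABLE READ unless --serializable-deferrable
 * was given; older servers still get SERIALIZABLE, which on those versions
 * means snapshot isolation.
 */
static const char *
dump_isolation_sql(int remoteVersion, int serializable_deferrable)
{
	if (remoteVersion >= 90100)
	{
		if (serializable_deferrable)
			return "SET TRANSACTION ISOLATION LEVEL SERIALIZABLE, "
				"READ ONLY, DEFERRABLE";
		return "SET TRANSACTION ISOLATION LEVEL REPEATABLE READ";
	}
	return "SET TRANSACTION ISOLATION LEVEL SERIALIZABLE";
}
```

In every branch the dump keeps a single transaction-wide snapshot. Only the deferrable variant additionally waits for a snapshot that cannot be involved in a serialization anomaly.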
......@@ -82,8 +82,8 @@ extern HeapTuple heap_getnext(HeapScanDesc scan, ScanDirection direction);
extern bool heap_fetch(Relation relation, Snapshot snapshot,
HeapTuple tuple, Buffer *userbuf, bool keep_buf,
Relation stats_relation);
extern bool heap_hot_search_buffer(ItemPointer tid, Buffer buffer,
Snapshot snapshot, bool *all_dead);
extern bool heap_hot_search_buffer(ItemPointer tid, Relation relation,
Buffer buffer, Snapshot snapshot, bool *all_dead);
extern bool heap_hot_search(ItemPointer tid, Relation relation,
Snapshot snapshot, bool *all_dead);
......
......@@ -35,6 +35,7 @@ typedef struct HeapScanDescData
BlockNumber rs_startblock; /* block # to start at */
BufferAccessStrategy rs_strategy; /* access strategy for reads */
bool rs_syncscan; /* report location to syncscan logic? */
bool rs_relpredicatelocked; /* predicate lock on relation exists */
/* scan current state */
bool rs_inited; /* false = scan not init'd yet */
......
......@@ -23,8 +23,9 @@ typedef uint8 TwoPhaseRmgrId;
*/
#define TWOPHASE_RM_END_ID 0
#define TWOPHASE_RM_LOCK_ID 1
#define TWOPHASE_RM_PGSTAT_ID 2
#define TWOPHASE_RM_MULTIXACT_ID 3
#define TWOPHASE_RM_PREDICATELOCK_ID 2
#define TWOPHASE_RM_PGSTAT_ID 3
#define TWOPHASE_RM_MULTIXACT_ID 4
#define TWOPHASE_RM_MAX_ID TWOPHASE_RM_MULTIXACT_ID
extern const TwoPhaseCallback twophase_recover_callbacks[];
......
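Inserting TWOPHASE_RM_PREDICATELOCK_ID as 2 shifts the pgstat and MultiXact IDs, which is why every callback table in twophase_rmgr.c gains a new slot in the same position. The sketch below shows the indexing convention with a simplified, hypothetical callback type (`DemoCallback`) and demo functions; the real TwoPhaseCallback signature and the real table entries differ.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint8_t TwoPhaseRmgrId;

/* IDs as renumbered by this commit. */
#define TWOPHASE_RM_END_ID				0
#define TWOPHASE_RM_LOCK_ID				1
#define TWOPHASE_RM_PREDICATELOCK_ID	2
#define TWOPHASE_RM_PGSTAT_ID			3
#define TWOPHASE_RM_MULTIXACT_ID		4
#define TWOPHASE_RM_MAX_ID				TWOPHASE_RM_MULTIXACT_ID

/* Simplified, hypothetical callback type for illustration only. */
typedef void (*DemoCallback) (int *calls);

static void demo_lock_cb(int *calls) { (*calls)++; }
static void demo_predlock_cb(int *calls) { (*calls)++; }

/* Slot order must follow the ID numbering; NULL means "nothing to do". */
static const DemoCallback demo_recover_callbacks[TWOPHASE_RM_MAX_ID + 1] = {
	NULL,				/* END ID */
	demo_lock_cb,		/* Lock */
	demo_predlock_cb,	/* PredicateLock */
	NULL,				/* pgstat */
	NULL				/* MultiXact (left empty in this demo) */
};

/* Returns how many callbacks ran for the given resource manager ID. */
static int
demo_dispatch(TwoPhaseRmgrId rmid)
{
	int			calls = 0;

	if (rmid <= TWOPHASE_RM_MAX_ID && demo_recover_callbacks[rmid] != NULL)
		demo_recover_callbacks[rmid] (&calls);
	return calls;
}
```

A table that was not extended would silently route PredicateLock records to the pgstat handler, so the header and all four tables have to change together.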
......@@ -32,15 +32,26 @@ extern int DefaultXactIsoLevel;
extern int XactIsoLevel;
/*
* We only implement two isolation levels internally. This macro should
* be used to check which one is selected.
* We implement three isolation levels internally.
* The two stronger ones use one snapshot per database transaction;
* the weakest one uses one snapshot per statement.
* Serializable uses predicate locks in addition to snapshots.
* These macros should be used to check which isolation level is selected.
*/
#define IsolationUsesXactSnapshot() (XactIsoLevel >= XACT_REPEATABLE_READ)
#define IsolationIsSerializable() (XactIsoLevel == XACT_SERIALIZABLE)
/* Xact read-only state */
extern bool DefaultXactReadOnly;
extern bool XactReadOnly;
/*
* Xact is deferrable -- only meaningful (currently) for read only
* SERIALIZABLE transactions
*/
extern bool DefaultXactDeferrable;
extern bool XactDeferrable;
/* Asynchronous commits */
extern bool XactSyncCommit;
......
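The two macros partition the isolation levels into three behaviors. The standalone sketch below makes that explicit. The numeric level values follow PostgreSQL's xact.h as far as I know; treat them as an assumption. The enum `SnapshotPolicy` and the helper `snapshot_policy` are hypothetical.

```c
/* Isolation level numbering (assumed to match xact.h). */
#define XACT_READ_UNCOMMITTED	0
#define XACT_READ_COMMITTED		1
#define XACT_REPEATABLE_READ	2
#define XACT_SERIALIZABLE		3

static int	XactIsoLevel = XACT_READ_COMMITTED;

/* Macros as defined in the diff above. */
#define IsolationUsesXactSnapshot() (XactIsoLevel >= XACT_REPEATABLE_READ)
#define IsolationIsSerializable() (XactIsoLevel == XACT_SERIALIZABLE)

/* Hypothetical classification of what each level relies on. */
typedef enum
{
	POLICY_STATEMENT_SNAPSHOT,
	POLICY_XACT_SNAPSHOT,
	POLICY_XACT_SNAPSHOT_PLUS_PREDICATE_LOCKS
} SnapshotPolicy;

static SnapshotPolicy
snapshot_policy(int level)
{
	XactIsoLevel = level;
	/* Test the stricter condition first: serializable implies xact snapshot. */
	if (IsolationIsSerializable())
		return POLICY_XACT_SNAPSHOT_PLUS_PREDICATE_LOCKS;
	if (IsolationUsesXactSnapshot())
		return POLICY_XACT_SNAPSHOT;
	return POLICY_STATEMENT_SNAPSHOT;
}
```

Code that only cares about snapshot lifetime should keep using IsolationUsesXactSnapshot(), which stays true for both REPEATABLE READ and SERIALIZABLE.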
......@@ -49,6 +49,7 @@ CATALOG(pg_am,2601)
bool amsearchnulls; /* can AM search for NULL/NOT NULL entries? */
bool amstorage; /* can storage type differ from column type? */
bool amclusterable; /* does AM support cluster command? */
bool ampredlocks; /* does AM handle predicate locks? */
Oid amkeytype; /* type of data in index, or InvalidOid */
regproc aminsert; /* "insert this tuple" function */
regproc ambeginscan; /* "prepare for index scan" function */
......@@ -77,7 +78,7 @@ typedef FormData_pg_am *Form_pg_am;
* compiler constants for pg_am
* ----------------
*/
#define Natts_pg_am 27
#define Natts_pg_am 28
#define Anum_pg_am_amname 1
#define Anum_pg_am_amstrategies 2
#define Anum_pg_am_amsupport 3
......@@ -90,37 +91,38 @@ typedef FormData_pg_am *Form_pg_am;
#define Anum_pg_am_amsearchnulls 10
#define Anum_pg_am_amstorage 11
#define Anum_pg_am_amclusterable 12
#define Anum_pg_am_amkeytype 13
#define Anum_pg_am_aminsert 14
#define Anum_pg_am_ambeginscan 15
#define Anum_pg_am_amgettuple 16
#define Anum_pg_am_amgetbitmap 17
#define Anum_pg_am_amrescan 18
#define Anum_pg_am_amendscan 19
#define Anum_pg_am_ammarkpos 20
#define Anum_pg_am_amrestrpos 21
#define Anum_pg_am_ambuild 22
#define Anum_pg_am_ambuildempty 23
#define Anum_pg_am_ambulkdelete 24
#define Anum_pg_am_amvacuumcleanup 25
#define Anum_pg_am_amcostestimate 26
#define Anum_pg_am_amoptions 27
#define Anum_pg_am_ampredlocks 13
#define Anum_pg_am_amkeytype 14
#define Anum_pg_am_aminsert 15
#define Anum_pg_am_ambeginscan 16
#define Anum_pg_am_amgettuple 17
#define Anum_pg_am_amgetbitmap 18
#define Anum_pg_am_amrescan 19
#define Anum_pg_am_amendscan 20
#define Anum_pg_am_ammarkpos 21
#define Anum_pg_am_amrestrpos 22
#define Anum_pg_am_ambuild 23
#define Anum_pg_am_ambuildempty 24
#define Anum_pg_am_ambulkdelete 25
#define Anum_pg_am_amvacuumcleanup 26
#define Anum_pg_am_amcostestimate 27
#define Anum_pg_am_amoptions 28
/* ----------------
* initial contents of pg_am
* ----------------
*/
DATA(insert OID = 403 ( btree 5 1 t f t t t t t f t 0 btinsert btbeginscan btgettuple btgetbitmap btrescan btendscan btmarkpos btrestrpos btbuild btbuildempty btbulkdelete btvacuumcleanup btcostestimate btoptions ));
DATA(insert OID = 403 ( btree 5 1 t f t t t t t f t t 0 btinsert btbeginscan btgettuple btgetbitmap btrescan btendscan btmarkpos btrestrpos btbuild btbuildempty btbulkdelete btvacuumcleanup btcostestimate btoptions ));
DESCR("b-tree index access method");
#define BTREE_AM_OID 403
DATA(insert OID = 405 ( hash 1 1 f f t f f f f f f 23 hashinsert hashbeginscan hashgettuple hashgetbitmap hashrescan hashendscan hashmarkpos hashrestrpos hashbuild hashbuildempty hashbulkdelete hashvacuumcleanup hashcostestimate hashoptions ));
DATA(insert OID = 405 ( hash 1 1 f f t f f f f f f f 23 hashinsert hashbeginscan hashgettuple hashgetbitmap hashrescan hashendscan hashmarkpos hashrestrpos hashbuild hashbuildempty hashbulkdelete hashvacuumcleanup hashcostestimate hashoptions ));
DESCR("hash index access method");
#define HASH_AM_OID 405
DATA(insert OID = 783 ( gist 0 8 f t f f t t t t t 0 gistinsert gistbeginscan gistgettuple gistgetbitmap gistrescan gistendscan gistmarkpos gistrestrpos gistbuild gistbuildempty gistbulkdelete gistvacuumcleanup gistcostestimate gistoptions ));
DATA(insert OID = 783 ( gist 0 8 f t f f t t t t t f 0 gistinsert gistbeginscan gistgettuple gistgetbitmap gistrescan gistendscan gistmarkpos gistrestrpos gistbuild gistbuildempty gistbulkdelete gistvacuumcleanup gistcostestimate gistoptions ));
DESCR("GiST index access method");
#define GIST_AM_OID 783
DATA(insert OID = 2742 ( gin 0 5 f f f f t t f t f 0 gininsert ginbeginscan - gingetbitmap ginrescan ginendscan ginmarkpos ginrestrpos ginbuild ginbuildempty ginbulkdelete ginvacuumcleanup gincostestimate ginoptions ));
DATA(insert OID = 2742 ( gin 0 5 f f f f t t f t f f 0 gininsert ginbeginscan - gingetbitmap ginrescan ginendscan ginmarkpos ginrestrpos ginbuild ginbuildempty ginbulkdelete ginvacuumcleanup gincostestimate ginoptions ));
DESCR("GIN index access method");
#define GIN_AM_OID 2742
......
......@@ -26,6 +26,8 @@ extern bool assign_transaction_read_only(bool value,
extern const char *assign_XactIsoLevel(const char *value,
bool doit, GucSource source);
extern const char *show_XactIsoLevel(void);
extern bool assign_transaction_deferrable(bool newval, bool doit,
GucSource source);
extern bool assign_random_seed(double value,
bool doit, GucSource source);
extern const char *show_random_seed(void);
......
......@@ -27,6 +27,10 @@
#define LOG2_NUM_LOCK_PARTITIONS 4
#define NUM_LOCK_PARTITIONS (1 << LOG2_NUM_LOCK_PARTITIONS)
/* Number of partitions the shared predicate lock tables are divided into */
#define LOG2_NUM_PREDICATELOCK_PARTITIONS 4
#define NUM_PREDICATELOCK_PARTITIONS (1 << LOG2_NUM_PREDICATELOCK_PARTITIONS)
/*
* We have a number of predefined LWLocks, plus a bunch of LWLocks that are
* dynamically assigned (e.g., for shared buffers). The LWLock structures
......@@ -70,12 +74,18 @@ typedef enum LWLockId
RelationMappingLock,
AsyncCtlLock,
AsyncQueueLock,
SerializableXactHashLock,
SerializableFinishedListLock,
SerializablePredicateLockListLock,
OldSerXidLock,
PredicateLockNextRowLinkLock,
/* Individual lock IDs end here */
FirstBufMappingLock,
FirstLockMgrLock = FirstBufMappingLock + NUM_BUFFER_PARTITIONS,
FirstPredicateLockMgrLock = FirstLockMgrLock + NUM_LOCK_PARTITIONS,
/* must be last except for MaxDynamicLWLock: */
NumFixedLWLocks = FirstLockMgrLock + NUM_LOCK_PARTITIONS,
NumFixedLWLocks = FirstPredicateLockMgrLock + NUM_PREDICATELOCK_PARTITIONS,
MaxDynamicLWLock = 1000000000
} LWLockId;
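As a rough illustration of the enum arithmetic in this hunk, the sketch below shows how the predicate lock partition LWLocks sit after the regular lock manager partitions, and how a hash code might be mapped to one of them. The `Sketch*` names, the starting value 50, `NUM_BUFFER_PARTITIONS = 16`, and the modulo mapping are assumptions made for this example; they are not taken from the patch.

```c
#include <stdint.h>

/* The two predicate-lock constants mirror the hunk above; the rest are stand-ins. */
#define NUM_BUFFER_PARTITIONS 16            /* assumption for this sketch */
#define LOG2_NUM_LOCK_PARTITIONS 4
#define NUM_LOCK_PARTITIONS (1 << LOG2_NUM_LOCK_PARTITIONS)
#define LOG2_NUM_PREDICATELOCK_PARTITIONS 4
#define NUM_PREDICATELOCK_PARTITIONS (1 << LOG2_NUM_PREDICATELOCK_PARTITIONS)

/* Hypothetical stand-in for the tail of LWLockId. */
enum sketch_lwlock_id
{
    SketchFirstBufMappingLock = 50,         /* hypothetical starting value */
    SketchFirstLockMgrLock = SketchFirstBufMappingLock + NUM_BUFFER_PARTITIONS,
    SketchFirstPredicateLockMgrLock = SketchFirstLockMgrLock + NUM_LOCK_PARTITIONS,
    SketchNumFixedLWLocks = SketchFirstPredicateLockMgrLock + NUM_PREDICATELOCK_PARTITIONS
};

/* Pick a partition number from a hash code (assumed mapping: simple modulo). */
int
sketch_predicate_partition(uint32_t hashcode)
{
    return (int) (hashcode % NUM_PREDICATELOCK_PARTITIONS);
}

/* Translate the partition number into the id of the LWLock guarding it. */
int
sketch_predicate_partition_lock(uint32_t hashcode)
{
    return SketchFirstPredicateLockMgrLock + sketch_predicate_partition(hashcode);
}
```

With these stand-in values every partition lock id falls in the range from `SketchFirstPredicateLockMgrLock` up to, but not including, `SketchNumFixedLWLocks`, which is why the last fixed-lock marker has to move when the new partitions are added.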
......
/*-------------------------------------------------------------------------
*
* predicate.h
* POSTGRES public predicate locking definitions.
*
*
* Portions Copyright (c) 1996-2011, PostgreSQL Global Development Group
* Portions Copyright (c) 1994, Regents of the University of California
*
* src/include/storage/predicate.h
*
*-------------------------------------------------------------------------
*/
#ifndef PREDICATE_H
#define PREDICATE_H
#include "utils/relcache.h"
#include "utils/snapshot.h"
/*
* GUC variables
*/
extern int max_predicate_locks_per_xact;
/* Number of SLRU buffers to use for predicate locking */
#define NUM_OLDSERXID_BUFFERS 16
/*
* function prototypes
*/
/* housekeeping for shared memory predicate lock structures */
extern void InitPredicateLocks(void);
extern Size PredicateLockShmemSize(void);
/* predicate lock reporting */
extern bool PageIsPredicateLocked(const Relation relation, const BlockNumber blkno);
/* predicate lock maintenance */
extern Snapshot RegisterSerializableTransaction(Snapshot snapshot);
extern void RegisterPredicateLockingXid(const TransactionId xid);
extern void PredicateLockRelation(const Relation relation);
extern void PredicateLockPage(const Relation relation, const BlockNumber blkno);
extern void PredicateLockTuple(const Relation relation, const HeapTuple tuple);
extern void PredicateLockTupleRowVersionLink(const Relation relation, const HeapTuple oldTuple, const HeapTuple newTuple);
extern void PredicateLockPageSplit(const Relation relation, const BlockNumber oldblkno, const BlockNumber newblkno);
extern void PredicateLockPageCombine(const Relation relation, const BlockNumber oldblkno, const BlockNumber newblkno);
extern void ReleasePredicateLocks(const bool isCommit);
/* conflict detection (may also trigger rollback) */
extern void CheckForSerializableConflictOut(const bool valid, const Relation relation, const HeapTuple tuple, const Buffer buffer);
extern void CheckForSerializableConflictIn(const Relation relation, const HeapTuple tuple, const Buffer buffer);
/* final rollback checking */
extern void PreCommit_CheckForSerializationFailure(void);
/* two-phase commit support */
extern void AtPrepare_PredicateLocks(void);
extern void PostPrepare_PredicateLocks(TransactionId xid);
extern void PredicateLockTwoPhaseFinish(TransactionId xid, bool isCommit);
extern void predicatelock_twophase_recover(TransactionId xid, uint16 info,
void *recdata, uint32 len);
#endif /* PREDICATE_H */
......@@ -35,7 +35,7 @@ typedef struct SHM_QUEUE
extern void InitShmemAccess(void *seghdr);
extern void InitShmemAllocation(void);
extern void *ShmemAlloc(Size size);
extern bool ShmemAddrIsValid(void *addr);
extern bool ShmemAddrIsValid(const void *addr);
extern void InitShmemIndex(void);
extern HTAB *ShmemInitHash(const char *name, long init_size, long max_size,
HASHCTL *infoP, int hash_flags);
......@@ -67,8 +67,9 @@ extern void SHMQueueInit(SHM_QUEUE *queue);
extern void SHMQueueElemInit(SHM_QUEUE *queue);
extern void SHMQueueDelete(SHM_QUEUE *queue);
extern void SHMQueueInsertBefore(SHM_QUEUE *queue, SHM_QUEUE *elem);
extern Pointer SHMQueueNext(SHM_QUEUE *queue, SHM_QUEUE *curElem,
extern Pointer SHMQueueNext(const SHM_QUEUE *queue, const SHM_QUEUE *curElem,
Size linkOffset);
extern bool SHMQueueEmpty(SHM_QUEUE *queue);
extern bool SHMQueueEmpty(const SHM_QUEUE *queue);
extern bool SHMQueueIsDetached(const SHM_QUEUE *queue);
#endif /* SHMEM_H */
# Local binaries
/isolationtester
/pg_isolation_regress
# Local generated source files
/specparse.c
/specscanner.c
# Generated subdirectories
/results/
/log/
/tmp_check/
#
# Makefile for isolation tests
#
subdir = src/test/isolation
top_builddir = ../../..
include $(top_builddir)/src/Makefile.global
ifeq ($(PORTNAME), win32)
LDLIBS += -lws2_32
endif
override CPPFLAGS := -I$(libpq_srcdir) $(CPPFLAGS)
override LDLIBS := $(libpq_pgport) $(LDLIBS)
OBJS = specparse.o isolationtester.o
submake-regress:
$(MAKE) -C $(top_builddir)/src/test/regress pg_regress.o
pg_regress.o: | submake-regress
rm -f $@ && $(LN_S) $(top_builddir)/src/test/regress/pg_regress.o .
pg_isolation_regress: isolation_main.o pg_regress.o
$(CC) $(CFLAGS) $^ $(libpq_pgport) $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@$(X)
all: isolationtester pg_isolation_regress
isolationtester: $(OBJS) | submake-libpq submake-libpgport
$(CC) $(CFLAGS) $(OBJS) $(libpq_pgport) $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@$(X)
distprep: specparse.c
# There is no correct way to write a rule that generates two files.
# Rules with two targets don't have that meaning, they are merely
# shorthand for two otherwise separate rules. To be safe for parallel
# make, we must chain the dependencies like this. The semicolon is
# important, otherwise make will choose the built-in rule for
# gram.y=>gram.c.
all: isolationtester$(X) pg_isolation_regress$(X)
specparse.h: specparse.c ;
# specscanner is compiled as part of specparse
specparse.o: specscanner.c
specparse.c: specparse.y
ifdef BISON
$(BISON) $(BISONFLAGS) -o $@ $<
else
@$(missing) bison $< $@
endif
specscanner.c: specscanner.l
ifdef FLEX
$(FLEX) $(FLEXFLAGS) -o'$@' $<
else
@$(missing) flex $< $@
endif
# specparse.c is in the distribution tarball, so is not cleaned here
clean distclean:
rm -f isolationtester$(X) pg_isolation_regress$(X) $(OBJS) isolation_main.o
rm -f pg_regress.o
rm -rf results
maintainer-clean: distclean
rm -f specparse.c specscanner.c
installcheck: all
./pg_isolation_regress --schedule=$(srcdir)/isolation_schedule
check: all
./pg_isolation_regress --temp-install=./tmp_check --top-builddir=$(top_builddir) --schedule=$(srcdir)/isolation_schedule
src/test/isolation/README
Isolation tests
===============
This directory contains a set of tests for the serializable isolation level.
Testing isolation requires running multiple overlapping transactions,
which requires multiple concurrent connections, and therefore can't be
tested using the normal pg_regress program.
To represent a test with overlapping transactions, we use a test specification
file with a custom syntax, described in the next section.
isolationtester is a program that uses libpq to open multiple connections,
and executes a test specified by a spec file. A libpq connection string can
be given to specify the server and database to connect to; otherwise, the
defaults derived from environment variables are used.
pg_isolation_regress is a tool similar to pg_regress, but instead of using
psql to execute a test, it uses isolationtester.
To run the tests, you need to have a server up and running. Run
gmake installcheck
Test specification
==================
Each isolation test is defined by a specification file, stored in the specs
subdirectory. A test specification consists of four parts, in this order:
setup { <SQL> }
The given SQL block is executed once, in one session only, before running
the test. Create any test tables or such objects here. This part is
optional.
teardown { <SQL> }
The teardown SQL block is executed once after the test is finished. Use
this to clean up, e.g. dropping any test tables. This part is optional.
session "<name>"
Each session is executed in a separate connection. A session consists
of three parts: setup, teardown and one or more steps. The per-session
setup and teardown parts have the same syntax as the per-test setup and
teardown described above, but they are executed in every session,
before and after each permutation. The setup part typically contains a
"BEGIN" command to begin a transaction.
Each step has a syntax of
step "<name>" { <SQL> }
where <name> is a unique name identifying this step, and SQL is a SQL
statement (or statements, separated by semicolons) that is executed in the
step.
permutation "<step name>" ...
A permutation line specifies a list of steps that are run in that order.
If no permutation lines are given, the test program automatically generates
all possible overlapping orderings of the given sessions.
Lines beginning with a # are considered comments.
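The number of orderings produced when no permutation lines are given can be estimated with a small sketch. It assumes that "all possible overlapping orderings" means every interleaving that keeps each session's own steps in order. The function name count_interleavings is hypothetical, and this is not the code isolationtester itself uses.

```c
#include <stddef.h>

/*
 * Count the orderings in which the steps of all sessions can be interleaved
 * while each session's own steps stay in their original order.
 * remaining[i] is the number of steps session i still has to run.
 */
unsigned long
count_interleavings(int *remaining, size_t nsessions)
{
    unsigned long total = 0;
    int         any_left = 0;
    size_t      i;

    for (i = 0; i < nsessions; i++)
    {
        if (remaining[i] == 0)
            continue;
        any_left = 1;
        remaining[i]--;         /* run the next step of session i */
        total += count_interleavings(remaining, nsessions);
        remaining[i]++;         /* backtrack and try another session */
    }

    /* No steps left anywhere: exactly one complete ordering was reached. */
    return any_left ? total : 1;
}
```

For two sessions with two and three steps the count is 10, and for two sessions with two steps each it is 6, so the number of permutations grows quickly as steps are added.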
Parsed test spec with 4 sessions
starting permutation: rx1 wx2 c2 wx3 ry3 wy4 rz4 c4 c3 wz1 c1
step rx1: SELECT * FROM t WHERE id = 1000000;
id txt
1000000
step wx2: UPDATE t SET txt = 'b' WHERE id = 1000000;
step c2: COMMIT;
step wx3: UPDATE t SET txt = 'c' WHERE id = 1000000;
step ry3: SELECT * FROM t WHERE id = 500000;
id txt
500000
step wy4: UPDATE t SET txt = 'd' WHERE id = 500000;
step rz4: SELECT * FROM t WHERE id = 1;
id txt
1
step c4: COMMIT;
step c3: COMMIT;
ERROR: could not serialize access due to read/write dependencies among transactions
step wz1: UPDATE t SET txt = 'a' WHERE id = 1;
step c1: COMMIT;
Parsed test spec with 2 sessions
starting permutation: wxry1 c1 r2 wyrx2 c2
step wxry1: INSERT INTO child (parent_id) VALUES (0);
step c1: COMMIT;
step r2: SELECT TRUE;
bool
t
step wyrx2: DELETE FROM parent WHERE parent_id = 0;
ERROR: child row exists
step c2: COMMIT;
starting permutation: wxry1 r2 c1 wyrx2 c2
step wxry1: INSERT INTO child (parent_id) VALUES (0);
step r2: SELECT TRUE;
bool
t
step c1: COMMIT;
step wyrx2: DELETE FROM parent WHERE parent_id = 0;
ERROR: could not serialize access due to read/write dependencies among transactions
step c2: COMMIT;
starting permutation: wxry1 r2 wyrx2 c1 c2
step wxry1: INSERT INTO child (parent_id) VALUES (0);
step r2: SELECT TRUE;
bool
t
step wyrx2: DELETE FROM parent WHERE parent_id = 0;
step c1: COMMIT;
step c2: COMMIT;
ERROR: could not serialize access due to read/write dependencies among transactions
starting permutation: wxry1 r2 wyrx2 c2 c1
step wxry1: INSERT INTO child (parent_id) VALUES (0);
step r2: SELECT TRUE;
bool
t
step wyrx2: DELETE FROM parent WHERE parent_id = 0;
step c2: COMMIT;
step c1: COMMIT;
ERROR: could not serialize access due to read/write dependencies among transactions
starting permutation: r2 wxry1 c1 wyrx2 c2
step r2: SELECT TRUE;
bool
t
step wxry1: INSERT INTO child (parent_id) VALUES (0);
step c1: COMMIT;
step wyrx2: DELETE FROM parent WHERE parent_id = 0;
ERROR: could not serialize access due to read/write dependencies among transactions
step c2: COMMIT;
starting permutation: r2 wxry1 wyrx2 c1 c2
step r2: SELECT TRUE;
bool
t
step wxry1: INSERT INTO child (parent_id) VALUES (0);
step wyrx2: DELETE FROM parent WHERE parent_id = 0;
step c1: COMMIT;
step c2: COMMIT;
ERROR: could not serialize access due to read/write dependencies among transactions
starting permutation: r2 wxry1 wyrx2 c2 c1
step r2: SELECT TRUE;
bool
t
step wxry1: INSERT INTO child (parent_id) VALUES (0);
step wyrx2: DELETE FROM parent WHERE parent_id = 0;
step c2: COMMIT;
step c1: COMMIT;
ERROR: could not serialize access due to read/write dependencies among transactions
starting permutation: r2 wyrx2 wxry1 c1 c2
step r2: SELECT TRUE;
bool
t
step wyrx2: DELETE FROM parent WHERE parent_id = 0;
step wxry1: INSERT INTO child (parent_id) VALUES (0);
step c1: COMMIT;
step c2: COMMIT;
ERROR: could not serialize access due to read/write dependencies among transactions
starting permutation: r2 wyrx2 wxry1 c2 c1
step r2: SELECT TRUE;
bool
t
step wyrx2: DELETE FROM parent WHERE parent_id = 0;
step wxry1: INSERT INTO child (parent_id) VALUES (0);
step c2: COMMIT;
step c1: COMMIT;
ERROR: could not serialize access due to read/write dependencies among transactions
starting permutation: r2 wyrx2 c2 wxry1 c1
step r2: SELECT TRUE;
bool
t
step wyrx2: DELETE FROM parent WHERE parent_id = 0;
step c2: COMMIT;
step wxry1: INSERT INTO child (parent_id) VALUES (0);
ERROR: parent row missing
step c1: COMMIT;
Parsed test spec with 2 sessions
starting permutation: rwx1 c1 rwx2 c2
step rwx1: UPDATE test SET t = 'apple' WHERE t = 'pear';
step c1: COMMIT;
step rwx2: UPDATE test SET t = 'pear' WHERE t = 'apple'
step c2: COMMIT;
starting permutation: rwx1 rwx2 c1 c2
step rwx1: UPDATE test SET t = 'apple' WHERE t = 'pear';
step rwx2: UPDATE test SET t = 'pear' WHERE t = 'apple'
step c1: COMMIT;
step c2: COMMIT;
ERROR: could not serialize access due to read/write dependencies among transactions
starting permutation: rwx1 rwx2 c2 c1
step rwx1: UPDATE test SET t = 'apple' WHERE t = 'pear';
step rwx2: UPDATE test SET t = 'pear' WHERE t = 'apple'
step c2: COMMIT;
step c1: COMMIT;
ERROR: could not serialize access due to read/write dependencies among transactions
starting permutation: rwx2 rwx1 c1 c2
step rwx2: UPDATE test SET t = 'pear' WHERE t = 'apple'
step rwx1: UPDATE test SET t = 'apple' WHERE t = 'pear';
step c1: COMMIT;
step c2: COMMIT;
ERROR: could not serialize access due to read/write dependencies among transactions
starting permutation: rwx2 rwx1 c2 c1
step rwx2: UPDATE test SET t = 'pear' WHERE t = 'apple'
step rwx1: UPDATE test SET t = 'apple' WHERE t = 'pear';
step c2: COMMIT;
step c1: COMMIT;
ERROR: could not serialize access due to read/write dependencies among transactions
starting permutation: rwx2 c2 rwx1 c1
step rwx2: UPDATE test SET t = 'pear' WHERE t = 'apple'
step c2: COMMIT;
step rwx1: UPDATE test SET t = 'apple' WHERE t = 'pear';
step c1: COMMIT;
......@@ -44,7 +44,7 @@ SELECT * FROM aggtest;
CREATE TABLE writetest (a int);
CREATE TEMPORARY TABLE temptest (a int);
BEGIN;
SET TRANSACTION READ ONLY; -- ok
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE, READ ONLY, DEFERRABLE; -- ok
SELECT * FROM writetest; -- ok
a
---
......