mvcc.sgml 18.6 KB
Newer Older
1
<!--
2
$Header: /cvsroot/pgsql/doc/src/sgml/mvcc.sgml,v 2.14 2001/05/12 22:51:35 petere Exp $
3 4
-->

5 6 7
 <chapter id="mvcc">
  <title>Multi-Version Concurrency Control</title>

8 9 10 11
  <indexterm>
   <primary>concurrency</primary>
  </indexterm>

12 13 14 15 16 17
  <abstract>
   <para>
    Multi-Version Concurrency Control
    (MVCC)
    is an advanced technique for improving database performance in a
    multi-user environment. 
18
    Vadim Mikheev (<email>vadim@krs.ru</email>) provided
19 20 21 22
    the implementation for <productname>Postgres</productname>.
   </para>
  </abstract>

23
  <sect1 id="mvcc-intro">
24 25 26 27 28 29
   <title>Introduction</title>

   <para>
    Unlike most other database systems which use locks for concurrency control,
    <productname>Postgres</productname>
    maintains data consistency by using a multiversion model. 
Thomas G. Lockhart's avatar
Thomas G. Lockhart committed
30
    This means that while querying a database each transaction sees
31 32
    a snapshot of data (a <firstterm>database version</firstterm>)
    as it was some
Thomas G. Lockhart's avatar
Thomas G. Lockhart committed
33
    time ago, regardless of the current state of the underlying data.
34 35 36 37 38 39 40 41 42 43 44 45 46 47
    This protects the transaction from viewing inconsistent data that
    could be caused by (other) concurrent transaction updates on the same
    data rows, providing <firstterm>transaction isolation</firstterm>
    for each database session.
   </para>

   <para>
    The main difference between multiversion and lock models is that
    in MVCC locks acquired for querying (reading) data don't conflict
    with locks acquired for writing data and so reading never blocks
    writing and writing never blocks reading.
   </para>
  </sect1>

48
  <sect1 id="transaction-iso">
49 50 51 52 53 54 55 56 57 58 59 60 61
   <title>Transaction Isolation</title>

   <para>
    The <acronym>ANSI</acronym>/<acronym>ISO</acronym> <acronym>SQL</acronym>
    standard defines four levels of transaction
    isolation in terms of three phenomena that must be prevented 
    between concurrent transactions.
    These undesirable phenomena are:

    <variablelist>
     <varlistentry>
      <term>
       dirty reads
62
       <indexterm><primary>dirty reads</primary></indexterm>
63 64 65 66 67 68 69 70 71 72 73
      </term>
     <listitem>
      <para>
	A transaction reads data written by concurrent uncommitted transaction.
       </para>
      </listitem>
     </varlistentry>

     <varlistentry>
      <term>
       non-repeatable reads
74
       <indexterm><primary>non-repeatable reads</primary></indexterm>
75 76 77 78
      </term>
     <listitem>
      <para>
	A transaction re-reads data it has previously read and finds that data
Tom Lane's avatar
Tom Lane committed
79 80
	has been modified by another transaction (that committed since the
	initial read).
81 82 83 84 85 86 87
       </para>
      </listitem>
     </varlistentry>

     <varlistentry>
      <term>
       phantom read
88
       <indexterm><primary>phantom reads</primary></indexterm>
89 90 91 92
      </term>
     <listitem>
      <para>
	A transaction re-executes a query returning a set of rows that satisfy a
Tom Lane's avatar
Tom Lane committed
93 94
	search condition and finds that the set of rows satisfying the condition
	has changed due to another recently-committed transaction.
95 96 97 98 99 100 101
       </para>
      </listitem>
     </varlistentry>
    </variablelist>
   </para>

   <para>
102 103 104
    <indexterm>
     <primary>isolation levels</primary>
    </indexterm>
Thomas G. Lockhart's avatar
Thomas G. Lockhart committed
105 106 107
    The four isolation levels and the corresponding behaviors are described below.

    <table tocentry="1">
108
     <title><acronym>ANSI</acronym>/<acronym>ISO</acronym> <acronym>SQL</acronym> Isolation Levels</title>
Thomas G. Lockhart's avatar
Thomas G. Lockhart committed
109 110 111 112
     <titleabbrev>Isolation Levels</titleabbrev>
     <tgroup cols="4">
      <thead>
       <row>
113
	<entry>
114
         Isolation Level
115
	</entry>
Thomas G. Lockhart's avatar
Thomas G. Lockhart committed
116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189
	<entry>
	 Dirty Read
	</entry>
	<entry>
	 Non-Repeatable Read
	</entry>
	<entry>
	 Phantom Read
	</entry>
       </row>
      </thead>
      <tbody>
       <row>
	<entry>
	 Read uncommitted
	</entry>
	<entry>
	 Possible
	</entry>
	<entry>
	 Possible
	</entry>
	<entry>
	 Possible
	</entry>
       </row>

       <row>
	<entry>
	 Read committed
	</entry>
	<entry>
	 Not possible
	</entry>
	<entry>
	 Possible
	</entry>
	<entry>
	 Possible
	</entry>
       </row>

       <row>
	<entry>
	 Repeatable read
	</entry>
	<entry>
	 Not possible
	</entry>
	<entry>
	 Not possible
	</entry>
	<entry>
	 Possible
	</entry>
       </row>

       <row>
	<entry>
	 Serializable
	</entry>
	<entry>
	 Not possible
	</entry>
	<entry>
	 Not possible
	</entry>
	<entry>
	 Not possible
	</entry>
       </row>
      </tbody>
     </tgroup>
    </table>
Tom Lane's avatar
Tom Lane committed
190
   </para>
191

Tom Lane's avatar
Tom Lane committed
192
   <para>
193 194 195 196 197
    <productname>Postgres</productname>
    offers the read committed and serializable isolation levels.
   </para>
  </sect1>

198
  <sect1 id="xact-read-committed">
199 200
   <title>Read Committed Isolation Level</title>

201 202 203 204 205
   <indexterm>
    <primary>isolation levels</primary>
    <secondary>read committed</secondary>
   </indexterm>

206
   <para>
Thomas G. Lockhart's avatar
Thomas G. Lockhart committed
207 208
    <firstterm>Read Committed</firstterm>
    is the default isolation level in <productname>Postgres</productname>. 
Tom Lane's avatar
Tom Lane committed
209 210
    When a transaction runs on this isolation level,
    a <command>SELECT</command> query sees only data committed before the
211 212
    query began and never sees either uncommitted data or changes committed
    during query execution by concurrent transactions.  (However, the
Tom Lane's avatar
Tom Lane committed
213
    <command>SELECT</command> does see the effects of previous updates
214 215 216 217
    executed within this same transaction, even though they are not yet
    committed.)  Notice that two successive <command>SELECT</command>s can see different data,
    even though they are within a single transaction, when other transactions
    commit changes during execution of the first <command>SELECT</command>.
218 219 220
   </para>

   <para>
Tom Lane's avatar
Tom Lane committed
221
    If a target row found by a query while executing an
222
    <command>UPDATE</command> statement
Tom Lane's avatar
Tom Lane committed
223 224
    (or <command>DELETE</command> or <command>SELECT FOR UPDATE</command>)
    has already been updated by a
225 226 227 228 229
    concurrent uncommitted transaction then the second transaction
    that tries to update this row will wait for the other transaction to
    commit or rollback. In the case of rollback, the waiting transaction
    can proceed to change the row. In the case of commit (and if the
    row still exists; i.e. was not deleted by the other transaction), the
Tom Lane's avatar
Tom Lane committed
230 231 232 233 234 235 236 237
    query will be re-executed for this row to check that the new row
    version still satisfies the query search condition. If the new row version
    satisfies the query search condition then the row will be
    updated (or deleted or marked for update).  Note that the starting point
    for the update will be the new row version; moreover, after the update
    the doubly-updated row is visible to subsequent <command>SELECT</command>s
    in the current transaction.  Thus, the current transaction is able to see
    the effects of the other transaction for this specific row.
238 239 240
   </para>

   <para>
Tom Lane's avatar
Tom Lane committed
241 242 243 244 245
    The partial transaction isolation provided by Read Committed level is
    adequate for many applications, and this level is fast and simple to use.
    However, for applications that do complex queries and updates, it may
    be necessary to guarantee a more rigorously consistent view of the
    database than Read Committed level provides.
246 247 248
   </para>
  </sect1>

249
  <sect1 id="xact-serializable">
250 251
   <title>Serializable Isolation Level</title>

252 253 254 255 256
   <indexterm>
    <primary>isolation levels</primary>
    <secondary>read serializable</secondary>
   </indexterm>

257
   <para>
Tom Lane's avatar
Tom Lane committed
258 259 260 261 262 263 264 265
    <firstterm>Serializable</firstterm> provides the highest transaction
    isolation.  This level emulates serial transaction execution,
    as if transactions had been executed one after another, serially,
    rather than concurrently.  However, applications using this level must
    be prepared to retry transactions due to serialization failures.
   </para>

   <para>
Bruce Momjian's avatar
Bruce Momjian committed
266
    When a transaction is on the serializable level,
Tom Lane's avatar
Tom Lane committed
267
    a <command>SELECT</command> query sees only data committed before the
268 269 270
    transaction began and never sees either uncommitted data or changes
    committed
    during transaction execution by concurrent transactions.  (However, the
Tom Lane's avatar
Tom Lane committed
271
    <command>SELECT</command> does see the effects of previous updates
272 273 274 275 276
    executed within this same transaction, even though they are not yet
    committed.)  This is different from Read Committed in that the
    <command>SELECT</command>
    sees a snapshot as of the start of the transaction, not as of the start
    of the current query within the transaction.
277 278 279
   </para>

   <para>
Tom Lane's avatar
Tom Lane committed
280 281
    If a target row found by a query while executing an
    <command>UPDATE</command> statement
Thomas G. Lockhart's avatar
Thomas G. Lockhart committed
282
    (or <command>DELETE</command> or <command>SELECT FOR UPDATE</command>)
Tom Lane's avatar
Tom Lane committed
283 284
    has already been updated by a
    concurrent uncommitted transaction then the second transaction
285 286 287 288 289 290 291 292 293 294 295 296 297 298
    that tries to update this row will wait for the other transaction to
    commit or rollback. In the case of rollback, the waiting transaction
    can proceed to change the row. In the case of a concurrent
    transaction commit, a serializable transaction will be rolled back
    with the message

    <programlisting>
ERROR:  Can't serialize access due to concurrent update
    </programlisting>

    because a serializable transaction cannot modify rows changed by
    other transactions after the serializable transaction began.
   </para>

Tom Lane's avatar
Tom Lane committed
299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316
   <para>
    When the application receives this error message, it should abort
    the current transaction and then retry the whole transaction from
    the beginning.  The second time through, the transaction sees the
    previously-committed change as part of its initial view of the database,
    so there is no logical conflict in using the new version of the row
    as the starting point for the new transaction's update.
    Note that only updating transactions may need to be retried --- read-only
    transactions never have serialization conflicts.
   </para>

   <para>
    Serializable transaction level provides a rigorous guarantee that each
    transaction sees a wholly consistent view of the database.  However,
    the application has to be prepared to retry transactions when concurrent
    updates make it impossible to sustain the illusion of serial execution,
    and the cost of redoing complex transactions may be significant.  So
    this level is recommended only when update queries contain logic
317
    sufficiently complex that they may give wrong answers in Read Committed
Tom Lane's avatar
Tom Lane committed
318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367
    level.
   </para>
  </sect1>

  <sect1 id="applevel-consistency">
   <title>Data consistency checks at the application level</title>

   <para>
    Because readers in <productname>Postgres</productname>
    don't lock data, regardless of
    transaction isolation level, data read by one transaction can be
    overwritten by another concurrent transaction. In other words,
    if a row is returned by <command>SELECT</command> it doesn't mean that
    the row still exists at the time it is returned (i.e. sometime after the
    current transaction began); the row might have been modified or deleted
    by an already-committed transaction that committed after this one started.
    Even if the row is still valid "now", it could be changed or deleted
    before the current transaction does a commit or rollback.
   </para>

   <para>
    Another way to think about it is that each
    transaction sees a snapshot of the database contents, and concurrently
    executing transactions may very well see different snapshots.  So the
    whole concept of "now" is somewhat suspect anyway.  This is not normally
    a big problem if the client applications are isolated from each other,
    but if the clients can communicate via channels outside the database
    then serious confusion may ensue.
   </para>

   <para>
    To ensure the current existence of a row and protect it against
    concurrent updates one must use <command>SELECT FOR UPDATE</command> or
    an appropriate <command>LOCK TABLE</command> statement.
    (<command>SELECT FOR UPDATE</command> locks just the returned rows against
    concurrent updates, while <command>LOCK TABLE</command> protects the
    whole table.)
    This should be taken into account when porting applications to
    <productname>Postgres</productname> from other environments.

    <note>
     <para>
      Before version 6.5 <productname>Postgres</productname>
      used read-locks and so the
      above consideration is also the case
      when upgrading to 6.5 (or higher) from previous
      <productname>Postgres</productname> versions.
     </para>
    </note>
   </para>
368 369
  </sect1>

370
  <sect1 id="locking-tables">
371 372
   <title>Locking and Tables</title>

373 374 375 376
   <indexterm>
    <primary>locking</primary>
   </indexterm>

377 378 379 380 381 382
   <para>
    <productname>Postgres</productname>
    provides various lock modes to control concurrent
    access to data in tables. Some of these lock modes are acquired by
    <productname>Postgres</productname>
    automatically before statement execution, while others are
Tom Lane's avatar
Tom Lane committed
383 384
    provided to be used by applications. All lock modes acquired in a
    transaction are held for the duration 
385 386 387 388 389 390 391 392 393 394 395 396 397 398
    of the transaction.
   </para>

   <sect2>
    <title>Table-level locks</title>

    <para>
     <variablelist>
      <varlistentry>
       <term>
	AccessShareLock
       </term>
       <listitem>
	<para>
Tom Lane's avatar
Tom Lane committed
399 400
	 A read-lock mode acquired automatically on tables
	 being queried.
401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510
	</para>

	<para>
	 Conflicts with AccessExclusiveLock only.
	</para>
       </listitem>
      </varlistentry>

      <varlistentry>
       <term>
	RowShareLock
       </term>
       <listitem>
	<para>
	 Acquired by <command>SELECT FOR UPDATE</command>
	 and <command>LOCK TABLE</command>
	 for <option>IN ROW SHARE MODE</option> statements.
	</para>

	<para>
	 Conflicts with ExclusiveLock and AccessExclusiveLock modes.
	</para>
       </listitem>
      </varlistentry>

      <varlistentry>
       <term>
	RowExclusiveLock
       </term>
       <listitem>
	<para>
	 Acquired by <command>UPDATE</command>, <command>DELETE</command>,
	 <command>INSERT</command> and <command>LOCK TABLE</command>
	 for <option>IN ROW EXCLUSIVE MODE</option> statements.
	</para>

	<para>
	 Conflicts with ShareLock, ShareRowExclusiveLock, ExclusiveLock and
	 AccessExclusiveLock modes.
	</para>
       </listitem>
      </varlistentry>

      <varlistentry>
       <term>
	ShareLock
       </term>
       <listitem>
	<para>
	 Acquired by <command>CREATE INDEX</command>
	 and <command>LOCK TABLE</command> table
	 for <option>IN SHARE MODE</option>
	 statements.
	</para>

	<para>
	 Conflicts with RowExclusiveLock, ShareRowExclusiveLock,
	 ExclusiveLock and AccessExclusiveLock modes.
	</para>
       </listitem>
      </varlistentry>

      <varlistentry>
       <term>
	ShareRowExclusiveLock
       </term>
       <listitem>
	<para>
	 Acquired by <command>LOCK TABLE</command> for
	 <option>IN SHARE ROW EXCLUSIVE MODE</option> statements.
	</para>

	<para>
	 Conflicts with RowExclusiveLock, ShareLock, ShareRowExclusiveLock,
	 ExclusiveLock and AccessExclusiveLock modes.
	</para>
       </listitem>
      </varlistentry>

      <varlistentry>
       <term>
	ExclusiveLock
       </term>
       <listitem>
	<para>
	 Acquired by <command>LOCK TABLE</command> table 
	 for <option>IN EXCLUSIVE MODE</option> statements.
	</para>

	<para>
	 Conflicts with RowShareLock, RowExclusiveLock, ShareLock,
	 ShareRowExclusiveLock, ExclusiveLock and AccessExclusiveLock
	 modes.
	</para>
       </listitem>
      </varlistentry>

      <varlistentry>
       <term>
	AccessExclusiveLock
       </term>
       <listitem>
	<para>
	 Acquired by <command>ALTER TABLE</command>,
	 <command>DROP TABLE</command>,
	 <command>VACUUM</command> and <command>LOCK TABLE</command>
	 statements.
	</para>

	<para>
511 512 513
	 Conflicts with all modes (AccessShareLock, RowShareLock,
	 RowExclusiveLock, ShareLock,
	 ShareRowExclusiveLock, ExclusiveLock and AccessExclusiveLock).
514 515 516 517
	</para>
       </listitem>
      </varlistentry>
     </variablelist>
518 519 520 521 522 523 524

     <note>
      <para>
       Only AccessExclusiveLock blocks <command>SELECT</command> (without
       <option>FOR UPDATE</option>) statement.
      </para>
     </note>
525 526 527 528 529 530 531
    </para>
   </sect2>

   <sect2>
    <title>Row-level locks</title>

    <para>
Tom Lane's avatar
Tom Lane committed
532 533 534 535
     These locks are acquired when rows are being updated (or deleted or
     marked for update).
     Row-level locks don't affect data querying. They block
     writers to <emphasis>the same row</emphasis> only.
536 537 538
    </para>

    <para>
Tom Lane's avatar
Tom Lane committed
539 540 541 542 543 544
     <productname>Postgres</productname>
     doesn't remember any information about modified rows in memory and
     so has no limit to the number of rows locked at one time.  However,
     locking a row may cause a disk write; thus, for example,
     <command>SELECT FOR UPDATE</command> will modify
     selected rows to mark them and so will result in disk writes.
545 546 547
    </para>

    <para>
Tom Lane's avatar
Tom Lane committed
548 549 550 551 552 553
    In addition to table and row locks, short-term share/exclusive locks are
    used to control read/write access to table pages in the shared buffer
    pool.  These locks are released immediately after a tuple is fetched or
    updated.  Application writers normally need not be concerned with
    page-level locks, but we mention them for completeness.
   </para>
554 555 556
   </sect2>
  </sect1>

557
  <sect1 id="locking-indices">
558 559 560 561
   <title>Locking and Indices</title>

   <para>
    Though <productname>Postgres</productname>
Tom Lane's avatar
Tom Lane committed
562 563 564
    provides nonblocking read/write access to table
    data, nonblocking read/write access is not currently offered for every
    index access method implemented
565 566 567 568 569 570 571 572 573 574 575 576 577
    in <productname>Postgres</productname>.
   </para>

   <para>
    The various index types are handled as follows:

    <variablelist>
     <varlistentry>
      <term>
       GiST and R-Tree indices
      </term>
      <listitem>
       <para>
Thomas G. Lockhart's avatar
Thomas G. Lockhart committed
578
	Share/exclusive index-level locks are used for read/write access.
579 580 581 582 583 584 585 586 587 588 589
	Locks are released after statement is done.
       </para>
      </listitem>
     </varlistentry>

     <varlistentry>
      <term>
       Hash indices
      </term>
      <listitem>
       <para>
Thomas G. Lockhart's avatar
Thomas G. Lockhart committed
590
	Share/exclusive page-level locks are used for read/write access.
591 592 593 594
	Locks are released after page is processed.
       </para>

       <para>
Tom Lane's avatar
Tom Lane committed
595
	Page-level locks provide better concurrency than index-level ones
596 597 598 599 600 601 602
	but are subject to deadlocks.
       </para>
      </listitem>
     </varlistentry>

     <varlistentry>
      <term>
Tom Lane's avatar
Tom Lane committed
603
       Btree indices
604 605 606
      </term>
      <listitem>
       <para>
Tom Lane's avatar
Tom Lane committed
607 608 609
	Short-term share/exclusive page-level locks are used for
	read/write access. Locks are released immediately after each index
	tuple is fetched/inserted.
610 611 612
       </para>

       <para>
Thomas G. Lockhart's avatar
Thomas G. Lockhart committed
613
	Btree indices provide the highest concurrency without deadlock
614 615 616 617 618 619 620 621
	conditions.
       </para>
      </listitem>
     </varlistentry>
    </variablelist>
   </para>

   <para>
Tom Lane's avatar
Tom Lane committed
622 623
    In short, btree indices are the recommended index type for concurrent
    applications.
624 625 626 627 628 629
   </para>
  </sect1>
 </chapter>

<!-- Keep this comment at the end of the file
Local variables:
630
mode:sgml
631 632 633 634 635 636 637 638 639
sgml-omittag:nil
sgml-shorttag:t
sgml-minimize-attributes:nil
sgml-always-quote-attributes:t
sgml-indent-step:1
sgml-indent-data:t
sgml-parent-document:nil
sgml-default-dtd-file:"./reference.ced"
sgml-exposed-tags:nil
640
sgml-local-catalogs:("/usr/lib/sgml/catalog")
641 642 643
sgml-local-ecat-files:nil
End:
-->