Commit 282d2a03 authored by Tom Lane

HOT updates. When we update a tuple without changing any of its indexed
columns, and the new version can be stored on the same heap page, we no longer
generate extra index entries for the new version.  Instead, index searches
follow the HOT-chain links to ensure they find the correct tuple version.

In addition, this patch introduces the ability to "prune" dead tuples on a
per-page basis, without having to do a complete VACUUM pass to recover space.
VACUUM is still needed to clean up dead index entries, however.

Pavan Deolasee, with help from a bunch of other people.
parent bbf4fdc2
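The rule in the first paragraph of the commit message can be written as a small predicate. This is a toy model, not the heap_update code. ToyUpdate, its fields and toy_update_is_hot are hypothetical names. Column changes are modeled as a bitmask rather than by comparing old and new column values, and the real test may involve details not modeled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary of one UPDATE, for illustration only. */
typedef struct ToyUpdate
{
    uint32_t changed_cols;  /* bit i set: column i gets a new value */
    uint32_t indexed_cols;  /* bit i set: column i is used by some index */
    size_t   new_tuple_len; /* bytes needed by the new tuple version */
    size_t   page_free;     /* free bytes on the old version's heap page */
} ToyUpdate;

/*
 * HOT is possible only if no indexed column changes and the new version
 * fits on the same page. In that case no new index entries are made, and
 * index scans reach the new version by following the chain from the old one.
 */
static bool
toy_update_is_hot(const ToyUpdate *u)
{
    assert(u != NULL);

    bool indexed_cols_unchanged = (u->changed_cols & u->indexed_cols) == 0;
    bool fits_on_same_page = u->new_tuple_len <= u->page_free;

    return indexed_cols_unchanged && fits_on_same_page;
}
```

For example, changing only a non-indexed column of a row whose page still has room yields a HOT update. Changing an indexed column, or outgrowing the page, does not.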
/*
* $PostgreSQL: pgsql/contrib/pgstattuple/pgstattuple.c,v 1.29 2007/09/12 22:10:25 tgl Exp $
* $PostgreSQL: pgsql/contrib/pgstattuple/pgstattuple.c,v 1.30 2007/09/20 17:56:30 tgl Exp $
*
* Copyright (c) 2001,2002 Tatsuo Ishii
*
@@ -290,7 +290,7 @@ pgstat_heap(Relation rel, FunctionCallInfo fcinfo)
{
buffer = ReadBuffer(rel, block);
LockBuffer(buffer, BUFFER_LOCK_SHARE);
stat.free_space += PageGetFreeSpace((Page) BufferGetPage(buffer));
stat.free_space += PageGetHeapFreeSpace((Page) BufferGetPage(buffer));
LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
ReleaseBuffer(buffer);
block++;
@@ -301,7 +301,7 @@ pgstat_heap(Relation rel, FunctionCallInfo fcinfo)
while (block < nblocks)
{
buffer = ReadBuffer(rel, block);
stat.free_space += PageGetFreeSpace((Page) BufferGetPage(buffer));
stat.free_space += PageGetHeapFreeSpace((Page) BufferGetPage(buffer));
ReleaseBuffer(buffer);
block++;
}
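pgstattuple, hio.c and rewriteheap.c switch from PageGetFreeSpace to PageGetHeapFreeSpace. Our reading is that pruning can leave line pointers behind (redirects and dead stubs). As a result, a heap page may reach the MaxHeapTuplesPerPage line-pointer limit while still showing free bytes. The heap-specific variant reports zero in that case unless a line pointer can be recycled.

The model below illustrates only that idea. ToyPage, TOY_MAX_TUPLES, TOY_LP_SIZE and both functions are hypothetical, and the real routines read an actual page header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_TUPLES 4  /* stand-in for MaxHeapTuplesPerPage */
#define TOY_LP_SIZE    4  /* assumed size of one line pointer */

/* Hypothetical page summary, for illustration only. */
typedef struct ToyPage
{
    size_t free_bytes;               /* gap between line pointers and tuples */
    int    nline;                    /* line pointers allocated so far */
    bool   lp_used[TOY_MAX_TUPLES];  /* false: line pointer can be recycled */
} ToyPage;

/* Generic free space: free bytes minus room for one more line pointer. */
static size_t
toy_get_free_space(const ToyPage *p)
{
    return p->free_bytes >= TOY_LP_SIZE ? p->free_bytes - TOY_LP_SIZE : 0;
}

/*
 * Heap free space: as above, but a page whose line-pointer array is already
 * at the heap limit, with no recyclable entry, cannot take another tuple.
 */
static size_t
toy_get_heap_free_space(const ToyPage *p)
{
    size_t space = toy_get_free_space(p);

    assert(p->nline <= TOY_MAX_TUPLES);
    if (space > 0 && p->nline == TOY_MAX_TUPLES)
    {
        for (int i = 0; i < TOY_MAX_TUPLES; i++)
            if (!p->lp_used[i])
                return space;  /* a line pointer can be reused */
        return 0;
    }
    return space;
}
```

With 200 free bytes and all four line pointers in use, the generic function reports 196 bytes while the heap variant reports 0.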
<!-- $PostgreSQL: pgsql/doc/src/sgml/catalogs.sgml,v 2.157 2007/09/05 18:10:47 tgl Exp $ -->
<!-- $PostgreSQL: pgsql/doc/src/sgml/catalogs.sgml,v 2.158 2007/09/20 17:56:30 tgl Exp $ -->
<!--
Documentation of the system catalogs, directed toward PostgreSQL developers
-->
@@ -2565,6 +2565,29 @@
</entry>
</row>
<row>
<entry><structfield>indcheckxmin</structfield></entry>
<entry><type>bool</type></entry>
<entry></entry>
<entry>
If true, queries must not use the index until the <structfield>xmin</>
of this <structname>pg_index</> row is below their TransactionXmin
event horizon, because the table may contain broken HOT chains with
incompatible rows that they can see
</entry>
</row>
<row>
<entry><structfield>indisready</structfield></entry>
<entry><type>bool</type></entry>
<entry></entry>
<entry>
If true, the index is currently ready for inserts. False means the
index must be ignored by <command>INSERT</>/<command>UPDATE</>
operations
</entry>
</row>
<row>
<entry><structfield>indkey</structfield></entry>
<entry><type>int2vector</type></entry>
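The new indcheckxmin column boils down to a transaction-ID comparison. The index stays unusable for a query until the pg_index row's xmin is older than the query's TransactionXmin. Transaction IDs wrap around, so "older" has to be computed modulo 2^32.

The sketch below uses the circular comparison we believe TransactionIdPrecedes applies to normal XIDs, and it ignores the special permanent XIDs. toy_xid_precedes and toy_index_usable are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t ToyXid;

/*
 * Circular comparison: id1 precedes id2 if id2 is less than 2^31 steps
 * ahead of id1. The unsigned subtraction wraps, and the cast to a signed
 * 32-bit value yields the signed distance. The cast is implementation-defined
 * before C23, but it is two's complement on all common platforms.
 */
static bool
toy_xid_precedes(ToyXid id1, ToyXid id2)
{
    int32_t diff = (int32_t) (id1 - id2);

    return diff < 0;
}

/* With indcheckxmin set, the index is usable only once its row xmin is old enough. */
static bool
toy_index_usable(bool indcheckxmin, ToyXid index_row_xmin, ToyXid transaction_xmin)
{
    if (!indcheckxmin)
        return true;
    return toy_xid_precedes(index_row_xmin, transaction_xmin);
}
```

Because the comparison is circular, an XID just below the wraparound point (for example 4294967290) correctly precedes a small XID such as 5.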
<!-- $PostgreSQL: pgsql/doc/src/sgml/monitoring.sgml,v 1.51 2007/06/28 00:02:37 tgl Exp $ -->
<!-- $PostgreSQL: pgsql/doc/src/sgml/monitoring.sgml,v 1.52 2007/09/20 17:56:30 tgl Exp $ -->
<chapter id="monitoring">
<title>Monitoring Database Activity</title>
@@ -276,6 +276,8 @@ postgres: <replaceable>user</> <replaceable>database</> <replaceable>host</> <re
scans, number of index scans initiated (over all indexes
belonging to the table), number of live rows fetched by index
scans, numbers of row insertions, updates, and deletions,
number of row updates that were HOT (i.e., no separate index update),
numbers of live and dead rows,
the last time the table was vacuumed manually,
the last time it was vacuumed by the autovacuum daemon,
the last time it was analyzed manually,
@@ -580,7 +582,7 @@ postgres: <replaceable>user</> <replaceable>database</> <replaceable>host</> <re
<entry><literal><function>pg_stat_get_tuples_updated</function>(<type>oid</type>)</literal></entry>
<entry><type>bigint</type></entry>
<entry>
Number of rows updated in table
Number of rows updated in table (includes HOT updates)
</entry>
</row>
@@ -592,6 +594,30 @@ postgres: <replaceable>user</> <replaceable>database</> <replaceable>host</> <re
</entry>
</row>
<row>
<entry><literal><function>pg_stat_get_tuples_hot_updated</function>(<type>oid</type>)</literal></entry>
<entry><type>bigint</type></entry>
<entry>
Number of rows HOT-updated in table
</entry>
</row>
<row>
<entry><literal><function>pg_stat_get_live_tuples</function>(<type>oid</type>)</literal></entry>
<entry><type>bigint</type></entry>
<entry>
Number of live rows in table
</entry>
</row>
<row>
<entry><literal><function>pg_stat_get_dead_tuples</function>(<type>oid</type>)</literal></entry>
<entry><type>bigint</type></entry>
<entry>
Number of dead rows in table
</entry>
</row>
<row>
<entry><literal><function>pg_stat_get_blocks_fetched</function>(<type>oid</type>)</literal></entry>
<entry><type>bigint</type></entry>
@@ -716,6 +742,18 @@ postgres: <replaceable>user</> <replaceable>database</> <replaceable>host</> <re
</entry>
</row>
<row>
<entry><literal><function>pg_stat_get_backend_xact_start</function>(<type>integer</type>)</literal></entry>
<entry><type>timestamp with time zone</type></entry>
<entry>
The time at which the given server process' currently
executing transaction was started, but only if the
current user is a superuser or the same user as that of
the session being queried (and
<varname>stats_command_string</varname> is on)
</entry>
</row>
<row>
<entry><literal><function>pg_stat_get_backend_start</function>(<type>integer</type>)</literal></entry>
<entry><type>timestamp with time zone</type></entry>
<!--
$PostgreSQL: pgsql/doc/src/sgml/ref/create_index.sgml,v 1.64 2007/09/07 00:58:56 tgl Exp $
$PostgreSQL: pgsql/doc/src/sgml/ref/create_index.sgml,v 1.65 2007/09/20 17:56:30 tgl Exp $
PostgreSQL documentation
-->
@@ -329,7 +329,10 @@ CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] <replaceable class="parameter">name</re
</para>
<para>
If a problem arises during the second scan of the table, such as a
In a concurrent index build, the index is actually entered into the
system catalogs in one transaction, then the two table scans occur in a
second and third transaction.
If a problem arises while scanning the table, such as a
uniqueness violation in a unique index, the <command>CREATE INDEX</>
command will fail but leave behind an <quote>invalid</> index. This index
will be ignored for querying purposes because it might be incomplete;
@@ -8,7 +8,7 @@
* Portions Copyright (c) 1994, Regents of the University of California
*
* IDENTIFICATION
* $PostgreSQL: pgsql/src/backend/access/gin/ginentrypage.c,v 1.8 2007/09/12 22:10:25 tgl Exp $
* $PostgreSQL: pgsql/src/backend/access/gin/ginentrypage.c,v 1.9 2007/09/20 17:56:30 tgl Exp $
*-------------------------------------------------------------------------
*/
@@ -359,7 +359,7 @@ entryPlaceToPage(GinBtree btree, Buffer buf, OffsetNumber off, XLogRecData **prd
*prdata = rdata;
data.updateBlkno = entryPreparePage(btree, page, off);
placed = PageAddItem(page, (Item) btree->entry, IndexTupleSize(btree->entry), off, false);
placed = PageAddItem(page, (Item) btree->entry, IndexTupleSize(btree->entry), off, false, false);
if (placed != off)
elog(ERROR, "failed to add item to index page in \"%s\"",
RelationGetRelationName(btree->index));
@@ -488,7 +488,7 @@ entrySplitPage(GinBtree btree, Buffer lbuf, Buffer rbuf, OffsetNumber off, XLogR
lsize += MAXALIGN(IndexTupleSize(itup)) + sizeof(ItemIdData);
}
if (PageAddItem(page, (Item) itup, IndexTupleSize(itup), InvalidOffsetNumber, false) == InvalidOffsetNumber)
if (PageAddItem(page, (Item) itup, IndexTupleSize(itup), InvalidOffsetNumber, false, false) == InvalidOffsetNumber)
elog(ERROR, "failed to add item to index page in \"%s\"",
RelationGetRelationName(btree->index));
ptr += MAXALIGN(IndexTupleSize(itup));
@@ -563,11 +563,11 @@ entryFillRoot(GinBtree btree, Buffer root, Buffer lbuf, Buffer rbuf)
page = BufferGetPage(root);
itup = ginPageGetLinkItup(lbuf);
if (PageAddItem(page, (Item) itup, IndexTupleSize(itup), InvalidOffsetNumber, false) == InvalidOffsetNumber)
if (PageAddItem(page, (Item) itup, IndexTupleSize(itup), InvalidOffsetNumber, false, false) == InvalidOffsetNumber)
elog(ERROR, "failed to add item to index root page");
itup = ginPageGetLinkItup(rbuf);
if (PageAddItem(page, (Item) itup, IndexTupleSize(itup), InvalidOffsetNumber, false) == InvalidOffsetNumber)
if (PageAddItem(page, (Item) itup, IndexTupleSize(itup), InvalidOffsetNumber, false, false) == InvalidOffsetNumber)
elog(ERROR, "failed to add item to index root page");
}
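Every PageAddItem call in this commit gains a sixth argument. Index access methods pass false, while the heap callers in hio.c and rewriteheap.c pass true. As we read it, the flag tells PageAddItem that it is filling a heap page, so it must refuse to exceed the MaxHeapTuplesPerPage line-pointer limit.

The toy below models only that distinction. ToyItemPage, toy_page_add_item and the TOY_* constants are hypothetical. The real function also handles the overwrite flag, caller-specified offsets, free-line-pointer hints and item storage.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_INVALID_OFFSET 0  /* stand-in for InvalidOffsetNumber */
#define TOY_MAX_HEAP_ITEMS 3  /* stand-in for MaxHeapTuplesPerPage */
#define TOY_MAX_ITEMS      8  /* capacity of the toy line-pointer array */

/* Hypothetical line-pointer bookkeeping; offsets are 1-based. */
typedef struct ToyItemPage
{
    int  nline;                    /* highest offset allocated so far */
    bool used[TOY_MAX_ITEMS + 1];  /* used[off] for off in 1..nline */
} ToyItemPage;

/*
 * Add an item at the first recyclable line pointer, or just past the last
 * one. With is_heap set, refuse to go beyond the heap line-pointer limit.
 * Returns the chosen offset, or TOY_INVALID_OFFSET on failure.
 */
static int
toy_page_add_item(ToyItemPage *p, bool is_heap)
{
    int off;

    assert(p->nline <= TOY_MAX_ITEMS);
    for (off = 1; off <= p->nline; off++)
        if (!p->used[off])
            break;  /* recycle an unused line pointer */

    if (is_heap && off > TOY_MAX_HEAP_ITEMS)
        return TOY_INVALID_OFFSET;
    if (off > TOY_MAX_ITEMS)
        return TOY_INVALID_OFFSET;  /* toy array is full */

    p->used[off] = true;
    if (off > p->nline)
        p->nline = off;
    return off;
}
```

In this model a heap page refuses a fourth item until one of its line pointers is freed, while an index page keeps accepting items.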
@@ -8,7 +8,7 @@
* Portions Copyright (c) 1994, Regents of the University of California
*
* IDENTIFICATION
* $PostgreSQL: pgsql/src/backend/access/gin/ginvacuum.c,v 1.16 2007/09/12 22:10:25 tgl Exp $
* $PostgreSQL: pgsql/src/backend/access/gin/ginvacuum.c,v 1.17 2007/09/20 17:56:30 tgl Exp $
*-------------------------------------------------------------------------
*/
@@ -544,7 +544,7 @@ ginVacuumEntryPage(GinVacuumState *gvs, Buffer buffer, BlockNumber *roots, uint3
itup = GinFormTuple(&gvs->ginstate, value, GinGetPosting(itup), newN);
PageIndexTupleDelete(tmppage, i);
if (PageAddItem(tmppage, (Item) itup, IndexTupleSize(itup), i, false) != i)
if (PageAddItem(tmppage, (Item) itup, IndexTupleSize(itup), i, false, false) != i)
elog(ERROR, "failed to add item to index page in \"%s\"",
RelationGetRelationName(gvs->index));
@@ -8,7 +8,7 @@
* Portions Copyright (c) 1994, Regents of the University of California
*
* IDENTIFICATION
* $PostgreSQL: pgsql/src/backend/access/gin/ginxlog.c,v 1.8 2007/09/12 22:10:25 tgl Exp $
* $PostgreSQL: pgsql/src/backend/access/gin/ginxlog.c,v 1.9 2007/09/20 17:56:30 tgl Exp $
*-------------------------------------------------------------------------
*/
#include "postgres.h"
@@ -199,7 +199,7 @@ ginRedoInsert(XLogRecPtr lsn, XLogRecord *record)
itup = (IndexTuple) (XLogRecGetData(record) + sizeof(ginxlogInsert));
if (PageAddItem(page, (Item) itup, IndexTupleSize(itup), data->offset, false) == InvalidOffsetNumber)
if (PageAddItem(page, (Item) itup, IndexTupleSize(itup), data->offset, false, false) == InvalidOffsetNumber)
elog(ERROR, "failed to add item to index page in %u/%u/%u",
data->node.spcNode, data->node.dbNode, data->node.relNode);
@@ -281,7 +281,7 @@ ginRedoSplit(XLogRecPtr lsn, XLogRecord *record)
for (i = 0; i < data->separator; i++)
{
if (PageAddItem(lpage, (Item) itup, IndexTupleSize(itup), InvalidOffsetNumber, false) == InvalidOffsetNumber)
if (PageAddItem(lpage, (Item) itup, IndexTupleSize(itup), InvalidOffsetNumber, false, false) == InvalidOffsetNumber)
elog(ERROR, "failed to add item to index page in %u/%u/%u",
data->node.spcNode, data->node.dbNode, data->node.relNode);
itup = (IndexTuple) (((char *) itup) + MAXALIGN(IndexTupleSize(itup)));
@@ -289,7 +289,7 @@ ginRedoSplit(XLogRecPtr lsn, XLogRecord *record)
for (i = data->separator; i < data->nitem; i++)
{
if (PageAddItem(rpage, (Item) itup, IndexTupleSize(itup), InvalidOffsetNumber, false) == InvalidOffsetNumber)
if (PageAddItem(rpage, (Item) itup, IndexTupleSize(itup), InvalidOffsetNumber, false, false) == InvalidOffsetNumber)
elog(ERROR, "failed to add item to index page in %u/%u/%u",
data->node.spcNode, data->node.dbNode, data->node.relNode);
itup = (IndexTuple) (((char *) itup) + MAXALIGN(IndexTupleSize(itup)));
@@ -375,7 +375,7 @@ ginRedoVacuumPage(XLogRecPtr lsn, XLogRecord *record)
for (i = 0; i < data->nitem; i++)
{
if (PageAddItem(page, (Item) itup, IndexTupleSize(itup), InvalidOffsetNumber, false) == InvalidOffsetNumber)
if (PageAddItem(page, (Item) itup, IndexTupleSize(itup), InvalidOffsetNumber, false, false) == InvalidOffsetNumber)
elog(ERROR, "failed to add item to index page in %u/%u/%u",
data->node.spcNode, data->node.dbNode, data->node.relNode);
itup = (IndexTuple) (((char *) itup) + MAXALIGN(IndexTupleSize(itup)));
@@ -8,7 +8,7 @@
* Portions Copyright (c) 1994, Regents of the University of California
*
* IDENTIFICATION
* $PostgreSQL: pgsql/src/backend/access/gist/gist.c,v 1.146 2007/09/12 22:10:25 tgl Exp $
* $PostgreSQL: pgsql/src/backend/access/gist/gist.c,v 1.147 2007/09/20 17:56:30 tgl Exp $
*
*-------------------------------------------------------------------------
*/
@@ -366,7 +366,7 @@ gistplacetopage(GISTInsertState *state, GISTSTATE *giststate)
data = (char *) (ptr->list);
for (i = 0; i < ptr->block.num; i++)
{
if (PageAddItem(ptr->page, (Item) data, IndexTupleSize((IndexTuple) data), i + FirstOffsetNumber, false) == InvalidOffsetNumber)
if (PageAddItem(ptr->page, (Item) data, IndexTupleSize((IndexTuple) data), i + FirstOffsetNumber, false, false) == InvalidOffsetNumber)
elog(ERROR, "failed to add item to index page in \"%s\"", RelationGetRelationName(state->r));
data += IndexTupleSize((IndexTuple) data);
}
@@ -8,7 +8,7 @@
* Portions Copyright (c) 1994, Regents of the University of California
*
* IDENTIFICATION
* $PostgreSQL: pgsql/src/backend/access/gist/gistutil.c,v 1.23 2007/09/12 22:10:25 tgl Exp $
* $PostgreSQL: pgsql/src/backend/access/gist/gistutil.c,v 1.24 2007/09/20 17:56:30 tgl Exp $
*-------------------------------------------------------------------------
*/
#include "postgres.h"
@@ -42,7 +42,7 @@ gistfillbuffer(Relation r, Page page, IndexTuple *itup,
for (i = 0; i < len; i++)
{
l = PageAddItem(page, (Item) itup[i], IndexTupleSize(itup[i]),
off, false);
off, false, false);
if (l == InvalidOffsetNumber)
elog(ERROR, "failed to add item to index page in \"%s\"",
RelationGetRelationName(r));
@@ -8,7 +8,7 @@
* Portions Copyright (c) 1994, Regents of the University of California
*
* IDENTIFICATION
* $PostgreSQL: pgsql/src/backend/access/gist/gistvacuum.c,v 1.31 2007/09/12 22:10:25 tgl Exp $
* $PostgreSQL: pgsql/src/backend/access/gist/gistvacuum.c,v 1.32 2007/09/20 17:56:30 tgl Exp $
*
*-------------------------------------------------------------------------
*/
@@ -201,7 +201,7 @@ vacuumSplitPage(GistVacuum *gv, Page tempPage, Buffer buffer, IndexTuple *addon,
data = (char *) (ptr->list);
for (i = 0; i < ptr->block.num; i++)
{
if (PageAddItem(ptr->page, (Item) data, IndexTupleSize((IndexTuple) data), i + FirstOffsetNumber, false) == InvalidOffsetNumber)
if (PageAddItem(ptr->page, (Item) data, IndexTupleSize((IndexTuple) data), i + FirstOffsetNumber, false, false) == InvalidOffsetNumber)
elog(ERROR, "failed to add item to index page in \"%s\"", RelationGetRelationName(gv->index));
data += IndexTupleSize((IndexTuple) data);
}
@@ -8,7 +8,7 @@
*
*
* IDENTIFICATION
* $PostgreSQL: pgsql/src/backend/access/hash/hashinsert.c,v 1.46 2007/09/12 22:10:25 tgl Exp $
* $PostgreSQL: pgsql/src/backend/access/hash/hashinsert.c,v 1.47 2007/09/20 17:56:30 tgl Exp $
*
*-------------------------------------------------------------------------
*/
@@ -200,7 +200,7 @@ _hash_pgaddtup(Relation rel,
page = BufferGetPage(buf);
itup_off = OffsetNumberNext(PageGetMaxOffsetNumber(page));
if (PageAddItem(page, (Item) itup, itemsize, itup_off, false)
if (PageAddItem(page, (Item) itup, itemsize, itup_off, false, false)
== InvalidOffsetNumber)
elog(ERROR, "failed to add index item to \"%s\"",
RelationGetRelationName(rel));
@@ -8,7 +8,7 @@
*
*
* IDENTIFICATION
* $PostgreSQL: pgsql/src/backend/access/hash/hashovfl.c,v 1.59 2007/09/12 22:10:25 tgl Exp $
* $PostgreSQL: pgsql/src/backend/access/hash/hashovfl.c,v 1.60 2007/09/20 17:56:30 tgl Exp $
*
* NOTES
* Overflow pages look like ordinary relation pages.
@@ -684,7 +684,7 @@ _hash_squeezebucket(Relation rel,
* we have found room so insert on the "write" page.
*/
woffnum = OffsetNumberNext(PageGetMaxOffsetNumber(wpage));
if (PageAddItem(wpage, (Item) itup, itemsz, woffnum, false)
if (PageAddItem(wpage, (Item) itup, itemsz, woffnum, false, false)
== InvalidOffsetNumber)
elog(ERROR, "failed to add index item to \"%s\"",
RelationGetRelationName(rel));
@@ -8,7 +8,7 @@
*
*
* IDENTIFICATION
* $PostgreSQL: pgsql/src/backend/access/hash/hashpage.c,v 1.69 2007/09/12 22:10:25 tgl Exp $
* $PostgreSQL: pgsql/src/backend/access/hash/hashpage.c,v 1.70 2007/09/20 17:56:30 tgl Exp $
*
* NOTES
* Postgres hash pages look like ordinary relation pages. The opaque
@@ -830,7 +830,7 @@ _hash_splitbucket(Relation rel,
}
noffnum = OffsetNumberNext(PageGetMaxOffsetNumber(npage));
if (PageAddItem(npage, (Item) itup, itemsz, noffnum, false)
if (PageAddItem(npage, (Item) itup, itemsz, noffnum, false, false)
== InvalidOffsetNumber)
elog(ERROR, "failed to add index item to \"%s\"",
RelationGetRelationName(rel));
@@ -4,7 +4,7 @@
# Makefile for access/heap
#
# IDENTIFICATION
# $PostgreSQL: pgsql/src/backend/access/heap/Makefile,v 1.16 2007/06/08 18:23:52 tgl Exp $
# $PostgreSQL: pgsql/src/backend/access/heap/Makefile,v 1.17 2007/09/20 17:56:30 tgl Exp $
#
#-------------------------------------------------------------------------
@@ -12,7 +12,7 @@ subdir = src/backend/access/heap
top_builddir = ../../../..
include $(top_builddir)/src/Makefile.global
OBJS = heapam.o hio.o rewriteheap.o syncscan.o tuptoaster.o
OBJS = heapam.o hio.o pruneheap.o rewriteheap.o syncscan.o tuptoaster.o
all: SUBSYS.o
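pruneheap.c, added to the Makefile above, implements the per-page pruning described in the commit message. Its diff is not shown here. The toy below illustrates the general idea for one HOT chain, under these simplifying assumptions:

- Only a dead prefix of the chain is removed.
- The root line pointer is kept because index entries still reference it. It becomes a redirect to the first live member, or is marked dead if nothing survives. In the latter case VACUUM is still needed for the index entries.
- Removed heap-only members become reusable.

ToyHeapPage, toy_prune_chain and the TOY_LP_* states are hypothetical and only loosely mirror PostgreSQL's line-pointer states.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NLP 8  /* line pointers per toy page (offsets 1..TOY_NLP) */

typedef enum ToyLpState
{
    TOY_LP_UNUSED,    /* free, can be reused */
    TOY_LP_NORMAL,    /* points at a tuple */
    TOY_LP_REDIRECT,  /* kept only to point at a later chain member */
    TOY_LP_DEAD       /* tuple gone, but index entries may still point here */
} ToyLpState;

typedef struct ToyLinePointer
{
    ToyLpState state;
    int  next;  /* NORMAL: next chain member (0 = none); REDIRECT: target */
    bool dead;  /* NORMAL: tuple is dead to every transaction */
} ToyLinePointer;

typedef struct ToyHeapPage
{
    ToyLinePointer lp[TOY_NLP + 1];  /* lp[0] is unused */
} ToyHeapPage;

/* Prune the dead prefix of the HOT chain rooted at `root`. */
static void
toy_prune_chain(ToyHeapPage *p, int root)
{
    ToyLinePointer *rootlp = &p->lp[root];
    int off = root;
    int first_live = 0;

    if (rootlp->state != TOY_LP_NORMAL && rootlp->state != TOY_LP_REDIRECT)
        return;  /* nothing to prune */
    if (rootlp->state == TOY_LP_REDIRECT)
        off = rootlp->next;  /* start after an earlier prune's redirect */

    while (off != 0)
    {
        assert(off >= 1 && off <= TOY_NLP);
        ToyLinePointer *lp = &p->lp[off];
        int next = lp->next;

        if (!lp->dead)
        {
            first_live = off;
            break;
        }
        if (off != root)
        {
            lp->state = TOY_LP_UNUSED;  /* heap-only member: free it */
            lp->next = 0;
        }
        off = next;
    }

    if (first_live == root)
        return;  /* root itself is still live */

    /* Root keeps its slot because index entries still point at it. */
    rootlp->state = first_live ? TOY_LP_REDIRECT : TOY_LP_DEAD;
    rootlp->next = first_live;
}
```

For a chain 1 -> 2 -> 3 in which only member 3 is live, pruning leaves slot 1 as a redirect to 3 and frees slot 2. If member 3 later dies too, a second prune frees slot 3 and marks slot 1 dead.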
@@ -8,7 +8,7 @@
*
*
* IDENTIFICATION
* $PostgreSQL: pgsql/src/backend/access/heap/hio.c,v 1.66 2007/09/12 22:10:26 tgl Exp $
* $PostgreSQL: pgsql/src/backend/access/heap/hio.c,v 1.67 2007/09/20 17:56:30 tgl Exp $
*
*-------------------------------------------------------------------------
*/
@@ -41,7 +41,7 @@ RelationPutHeapTuple(Relation relation,
pageHeader = BufferGetPage(buffer);
offnum = PageAddItem(pageHeader, (Item) tuple->t_data,
tuple->t_len, InvalidOffsetNumber, false);
tuple->t_len, InvalidOffsetNumber, false, true);
if (offnum == InvalidOffsetNumber)
elog(PANIC, "failed to add tuple to page");
@@ -218,7 +218,7 @@ RelationGetBufferForTuple(Relation relation, Size len,
* we're done.
*/
pageHeader = (Page) BufferGetPage(buffer);
pageFreeSpace = PageGetFreeSpace(pageHeader);
pageFreeSpace = PageGetHeapFreeSpace(pageHeader);
if (len + saveFreeSpace <= pageFreeSpace)
{
/* use this page as future insert target, too */
@@ -311,7 +311,7 @@ RelationGetBufferForTuple(Relation relation, Size len,
PageInit(pageHeader, BufferGetPageSize(buffer), 0);
if (len > PageGetFreeSpace(pageHeader))
if (len > PageGetHeapFreeSpace(pageHeader))
{
/* We should not get here given the test at the top */
elog(PANIC, "tuple is too big: size %lu", (unsigned long) len);
@@ -96,7 +96,7 @@
* Portions Copyright (c) 1994-5, Regents of the University of California
*
* IDENTIFICATION
* $PostgreSQL: pgsql/src/backend/access/heap/rewriteheap.c,v 1.6 2007/09/12 22:10:26 tgl Exp $
* $PostgreSQL: pgsql/src/backend/access/heap/rewriteheap.c,v 1.7 2007/09/20 17:56:30 tgl Exp $
*
*-------------------------------------------------------------------------
*/
@@ -320,12 +320,14 @@ rewrite_heap_tuple(RewriteState state,
* Copy the original tuple's visibility information into new_tuple.
*
* XXX we might later need to copy some t_infomask2 bits, too?
* Right now, we intentionally clear the HOT status bits.
*/
memcpy(&new_tuple->t_data->t_choice.t_heap,
&old_tuple->t_data->t_choice.t_heap,
sizeof(HeapTupleFields));
new_tuple->t_data->t_infomask &= ~HEAP_XACT_MASK;
new_tuple->t_data->t_infomask2 &= ~HEAP2_XACT_MASK;
new_tuple->t_data->t_infomask |=
old_tuple->t_data->t_infomask & HEAP_XACT_MASK;
@@ -593,7 +595,7 @@ raw_heap_insert(RewriteState state, HeapTuple tup)
/* Now we can check to see if there's enough free space already. */
if (state->rs_buffer_valid)
{
pageFreeSpace = PageGetFreeSpace(page);
pageFreeSpace = PageGetHeapFreeSpace(page);
if (len + saveFreeSpace > pageFreeSpace)
{
@@ -628,7 +630,7 @@ raw_heap_insert(RewriteState state, HeapTuple tup)
/* And now we can insert the tuple into the page */
newoff = PageAddItem(page, (Item) heaptup->t_data, len,
InvalidOffsetNumber, false);
InvalidOffsetNumber, false, true);
if (newoff == InvalidOffsetNumber)
elog(ERROR, "failed to add tuple");
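rewrite_heap_tuple copies the old tuple's transaction status bits into the rewritten tuple. The added line clears t_infomask2's HEAP2_XACT_MASK bits, so the HOT status bits are intentionally dropped, as the new comment says.

The masking is shown below with local stand-in constants. The values are our assumption about htup.h and are for illustration only. Our understanding is that the rewritten heap's indexes get an entry for every tuple, so no copied tuple should still claim to be HOT-updated or heap-only.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed stand-ins for the t_infomask2 flags (illustrative values). */
#define TOY_NATTS_MASK      0x07FF  /* low bits: number of attributes */
#define TOY_HOT_UPDATED     0x4000  /* tuple was HOT-updated */
#define TOY_ONLY_TUPLE      0x8000  /* heap-only tuple, no index entry */
#define TOY_HEAP2_XACT_MASK (TOY_HOT_UPDATED | TOY_ONLY_TUPLE)

/* Clear the HOT status bits while keeping every other bit of infomask2. */
static uint16_t
toy_clear_hot_bits(uint16_t infomask2)
{
    uint16_t cleared = (uint16_t) (infomask2 & ~TOY_HEAP2_XACT_MASK);

    /* The attribute count in the low bits must survive unchanged. */
    assert((cleared & TOY_NATTS_MASK) == (infomask2 & TOY_NATTS_MASK));
    return cleared;
}
```

For example, a heap-only, HOT-updated tuple with 5 attributes (0xC005) comes out as a plain tuple with 5 attributes (0x0005).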
@@ -8,7 +8,7 @@
*
*
* IDENTIFICATION
* $PostgreSQL: pgsql/src/backend/access/index/genam.c,v 1.62 2007/05/27 03:50:38 tgl Exp $
* $PostgreSQL: pgsql/src/backend/access/index/genam.c,v 1.63 2007/09/20 17:56:30 tgl Exp $
*
* NOTES
* many of the old access method routines have been turned into
@@ -21,6 +21,7 @@
#include "access/genam.h"
#include "access/heapam.h"
#include "access/transam.h"
#include "miscadmin.h"
#include "pgstat.h"
@@ -95,6 +96,9 @@ RelationGetIndexScan(Relation indexRelation,
ItemPointerSetInvalid(&scan->xs_ctup.t_self);
scan->xs_ctup.t_data = NULL;
scan->xs_cbuf = InvalidBuffer;
scan->xs_prev_xmax = InvalidTransactionId;
scan->xs_next_hot = InvalidOffsetNumber;
scan->xs_hot_dead = false;
/*
* Let the AM fill in the key and any opaque data it wants.
@@ -8,7 +8,7 @@
*
*
* IDENTIFICATION
* $PostgreSQL: pgsql/src/backend/access/index/indexam.c,v 1.98 2007/05/27 03:50:38 tgl Exp $
* $PostgreSQL: pgsql/src/backend/access/index/indexam.c,v 1.99 2007/09/20 17:56:30 tgl Exp $
*
* INTERFACE ROUTINES
* index_open - open an index relation by relation OID
@@ -64,6 +64,7 @@
#include "access/genam.h"
#include "access/heapam.h"
#include "access/transam.h"
#include "pgstat.h"
#include "utils/relcache.h"
@@ -313,6 +314,8 @@ index_rescan(IndexScanDesc scan, ScanKey key)
scan->xs_cbuf = InvalidBuffer;
}
scan->xs_next_hot = InvalidOffsetNumber;
scan->kill_prior_tuple = false; /* for safety */
FunctionCall2(procedure,
@@ -370,6 +373,14 @@ index_markpos(IndexScanDesc scan)
* NOTE: this only restores the internal scan state of the index AM.
* The current result tuple (scan->xs_ctup) doesn't change. See comments
* for ExecRestrPos().
*
* NOTE: in the presence of HOT chains, mark/restore only works correctly
* if the scan's snapshot is MVCC-safe; that ensures that there's at most one
* returnable tuple in each HOT chain, and so restoring the prior state at the
* granularity of the index AM is sufficient. Since the only current user
* of mark/restore functionality is nodeMergejoin.c, this effectively means
* that merge-join plans only work for MVCC snapshots. This could be fixed
* if necessary, but for now it seems unimportant.
* ----------------
*/
void
@@ -377,9 +388,13 @@ index_restrpos(IndexScanDesc scan)
{
FmgrInfo *procedure;
Assert(IsMVCCSnapshot(scan->xs_snapshot));
SCAN_CHECKS;
GET_SCAN_PROCEDURE(amrestrpos);
scan->xs_next_hot = InvalidOffsetNumber;
scan->kill_prior_tuple = false; /* for safety */
FunctionCall1(procedure, PointerGetDatum(scan));
@@ -398,20 +413,53 @@ HeapTuple
index_getnext(IndexScanDesc scan, ScanDirection direction)
{
HeapTuple heapTuple = &scan->xs_ctup;
ItemPointer tid = &heapTuple->t_self;
FmgrInfo *procedure;
SCAN_CHECKS;
GET_SCAN_PROCEDURE(amgettuple);
/* just make sure this is false... */
scan->kill_prior_tuple = false;
/*
* We always reset xs_hot_dead; if we are here then either we are just
* starting the scan, or we previously returned a visible tuple, and in
* either case it's inappropriate to kill the prior index entry.
*/
scan->xs_hot_dead = false;
for (;;)
{
OffsetNumber offnum;
bool at_chain_start;
Page dp;
if (scan->xs_next_hot != InvalidOffsetNumber)
{
/*
* We are resuming scan of a HOT chain after having returned
* an earlier member. Must still hold pin on current heap page.
*/
Assert(BufferIsValid(scan->xs_cbuf));
Assert(ItemPointerGetBlockNumber(tid) ==
BufferGetBlockNumber(scan->xs_cbuf));
Assert(TransactionIdIsValid(scan->xs_prev_xmax));
offnum = scan->xs_next_hot;
at_chain_start = false;
scan->xs_next_hot = InvalidOffsetNumber;
}
else
{
bool found;
Buffer prev_buf;
/*
* If we scanned a whole HOT chain and found only dead tuples,
* tell index AM to kill its entry for that TID.
*/
scan->kill_prior_tuple = scan->xs_hot_dead;
/*
* The AM's gettuple proc finds the next tuple matching the scan keys.
* The AM's gettuple proc finds the next index entry matching the
* scan keys, and puts the TID in xs_ctup.t_self (ie, *tid).
*/
found = DatumGetBool(FunctionCall2(procedure,
PointerGetDatum(scan),
@@ -420,50 +468,169 @@ index_getnext(IndexScanDesc scan, ScanDirection direction)
/* Reset kill flag immediately for safety */
scan->kill_prior_tuple = false;
/* If we're out of index entries, break out of outer loop */
if (!found)
break;
pgstat_count_index_tuples(scan->indexRelation, 1);
/* Switch to correct buffer if we don't have it already */
prev_buf = scan->xs_cbuf;
scan->xs_cbuf = ReleaseAndReadBuffer(scan->xs_cbuf,
scan->heapRelation,
ItemPointerGetBlockNumber(tid));
/*
* Prune page, but only if we weren't already on this page
*/
if (prev_buf != scan->xs_cbuf)
heap_page_prune_opt(scan->heapRelation, scan->xs_cbuf,
RecentGlobalXmin);
/* Prepare to scan HOT chain starting at index-referenced offnum */
offnum = ItemPointerGetOffsetNumber(tid);
at_chain_start = true;
/* We don't know what the first tuple's xmin should be */
scan->xs_prev_xmax = InvalidTransactionId;
/* Initialize flag to detect if all entries are dead */
scan->xs_hot_dead = true;
}
/* Obtain share-lock on the buffer so we can examine visibility */
LockBuffer(scan->xs_cbuf, BUFFER_LOCK_SHARE);
dp = (Page) BufferGetPage(scan->xs_cbuf);
/* Scan through possible multiple members of HOT-chain */
for (;;)
{
/* Release any held pin on a heap page */
if (BufferIsValid(scan->xs_cbuf))
ItemId lp;
ItemPointer ctid;
/* check for bogus TID */
if (offnum < FirstOffsetNumber ||
offnum > PageGetMaxOffsetNumber(dp))
break;
lp = PageGetItemId(dp, offnum);
/* check for unused, dead, or redirected items */
if (!ItemIdIsNormal(lp))
{
ReleaseBuffer(scan->xs_cbuf);
scan->xs_cbuf = InvalidBuffer;
/* We should only see a redirect at start of chain */
if (ItemIdIsRedirected(lp) && at_chain_start)
{
/* Follow the redirect */
offnum = ItemIdGetRedirect(lp);
at_chain_start = false;
continue;
}
return NULL; /* failure exit */
/* else must be end of chain */
break;
}
pgstat_count_index_tuples(scan->indexRelation, 1);
/*
* We must initialize all of *heapTuple (ie, scan->xs_ctup)
* since it is returned to the executor on success.
*/
heapTuple->t_data = (HeapTupleHeader) PageGetItem(dp, lp);
heapTuple->t_len = ItemIdGetLength(lp);
ItemPointerSetOffsetNumber(tid, offnum);
heapTuple->t_tableOid = RelationGetRelid(scan->heapRelation);
ctid = &heapTuple->t_data->t_ctid;
/*
* Fetch the heap tuple and see if it matches the snapshot.
* Shouldn't see a HEAP_ONLY tuple at chain start. (This test
* should be unnecessary, since the chain root can't be removed
* while we have pin on the index entry, but let's make it anyway.)
*/
if (heap_release_fetch(scan->heapRelation, scan->xs_snapshot,
heapTuple, &scan->xs_cbuf, true,
scan->indexRelation))
if (at_chain_start && HeapTupleIsHeapOnly(heapTuple))
break;
/* Skip if no undeleted tuple at this location */
if (heapTuple->t_data == NULL)
continue;
/*
* The xmin should match the previous xmax value, else chain is
* broken. (Note: this test is not optional because it protects
* us against the case where the prior chain member's xmax
* aborted since we looked at it.)
*/
if (TransactionIdIsValid(scan->xs_prev_xmax) &&
!TransactionIdEquals(scan->xs_prev_xmax,
HeapTupleHeaderGetXmin(heapTuple->t_data)))
break;
/* If it's visible per the snapshot, we must return it */
if (HeapTupleSatisfiesVisibility(heapTuple, scan->xs_snapshot,
scan->xs_cbuf))
{
/*
* If we can't see it, maybe no one else can either. Check to see if
* the tuple is dead to all transactions. If so, signal the index AM
* to not return it on future indexscans.
*
* We told heap_release_fetch to keep a pin on the buffer, so we can
* re-access the tuple here. But we must re-lock the buffer first.
* If the snapshot is MVCC, we know that it could accept
* at most one member of the HOT chain, so we can skip
* examining any more members. Otherwise, check for
* continuation of the HOT-chain, and set state for next time.
*/
LockBuffer(scan->xs_cbuf, BUFFER_LOCK_SHARE);
if (IsMVCCSnapshot(scan->xs_snapshot))
scan->xs_next_hot = InvalidOffsetNumber;
else if (HeapTupleIsHotUpdated(heapTuple))
{
Assert(ItemPointerGetBlockNumber(ctid) ==
ItemPointerGetBlockNumber(tid));
scan->xs_next_hot = ItemPointerGetOffsetNumber(ctid);
scan->xs_prev_xmax = HeapTupleHeaderGetXmax(heapTuple->t_data);
}
else
scan->xs_next_hot = InvalidOffsetNumber;
LockBuffer(scan->xs_cbuf, BUFFER_LOCK_UNLOCK);
pgstat_count_heap_fetch(scan->indexRelation);
if (HeapTupleSatisfiesVacuum(heapTuple->t_data, RecentGlobalXmin,
scan->xs_cbuf) == HEAPTUPLE_DEAD)
scan->kill_prior_tuple = true;
return heapTuple;
}
/*
* If we can't see it, maybe no one else can either. Check to see
* if the tuple is dead to all transactions. If we find that all
* the tuples in the HOT chain are dead, we'll signal the index AM
* to not return that TID on future indexscans.
*/
if (scan->xs_hot_dead &&
HeapTupleSatisfiesVacuum(heapTuple->t_data, RecentGlobalXmin,
scan->xs_cbuf) != HEAPTUPLE_DEAD)
scan->xs_hot_dead = false;
/*
* Check to see if HOT chain continues past this tuple; if so
* fetch the next offnum (we don't bother storing it into
* xs_next_hot, but must store xs_prev_xmax), and loop around.
*/
if (HeapTupleIsHotUpdated(heapTuple))
{
Assert(ItemPointerGetBlockNumber(ctid) ==
ItemPointerGetBlockNumber(tid));
offnum = ItemPointerGetOffsetNumber(ctid);
at_chain_start = false;
scan->xs_prev_xmax = HeapTupleHeaderGetXmax(heapTuple->t_data);
}
else
break; /* end of chain */
} /* loop over a single HOT chain */
LockBuffer(scan->xs_cbuf, BUFFER_LOCK_UNLOCK);
/* Loop around to ask index AM for another TID */
scan->xs_next_hot = InvalidOffsetNumber;
}
/* Success exit */
return heapTuple;
/* Release any held pin on a heap page */
if (BufferIsValid(scan->xs_cbuf))
{
ReleaseBuffer(scan->xs_cbuf);
scan->xs_cbuf = InvalidBuffer;
}
return NULL; /* failure exit */
}
/* ----------------
@@ -8,7 +8,7 @@
*
*
* IDENTIFICATION
* $PostgreSQL: pgsql/src/backend/access/nbtree/nbtinsert.c,v 1.159 2007/09/12 22:10:26 tgl Exp $
* $PostgreSQL: pgsql/src/backend/access/nbtree/nbtinsert.c,v 1.160 2007/09/20 17:56:30 tgl Exp $
*
*-------------------------------------------------------------------------
*/
@@ -193,8 +193,6 @@ _bt_check_unique(Relation rel, IndexTuple itup, Relation heapRel,
*/
for (;;)
{
HeapTupleData htup;
Buffer hbuffer;
ItemId curitemid;
IndexTuple curitup;
BlockNumber nblkno;
@@ -223,6 +221,9 @@ _bt_check_unique(Relation rel, IndexTuple itup, Relation heapRel,
*/
if (!ItemIdIsDead(curitemid))
{
ItemPointerData htid;
bool all_dead;
/*
* _bt_compare returns 0 for (1,NULL) and (1,NULL) - this's
* how we handling NULLs - and so we must not use _bt_compare
@@ -234,17 +235,20 @@ _bt_check_unique(Relation rel, IndexTuple itup, Relation heapRel,
/* okay, we gotta fetch the heap tuple ... */
curitup = (IndexTuple) PageGetItem(page, curitemid);
htup.t_self = curitup->t_tid;
if (heap_fetch(heapRel, &SnapshotDirty, &htup, &hbuffer,
true, NULL))
htid = curitup->t_tid;
/*
* We check the whole HOT-chain to see if there is any tuple
* that satisfies SnapshotDirty. This is necessary because
* we have just a single index entry for the entire chain.
*/
if (heap_hot_search(&htid, heapRel, &SnapshotDirty, &all_dead))
{
/* it is a duplicate */
TransactionId xwait =
(TransactionIdIsValid(SnapshotDirty.xmin)) ?
SnapshotDirty.xmin : SnapshotDirty.xmax;
ReleaseBuffer(hbuffer);
/*
* If this tuple is being updated by other transaction
* then we have to wait for its commit/abort.
@@ -263,15 +267,22 @@ _bt_check_unique(Relation rel, IndexTuple itup, Relation heapRel,
* is itself now committed dead --- if so, don't complain.
* This is a waste of time in normal scenarios but we must
* do it to support CREATE INDEX CONCURRENTLY.
*/
htup.t_self = itup->t_tid;
if (heap_fetch(heapRel, SnapshotSelf, &htup, &hbuffer,
false, NULL))
*
* We must follow HOT-chains here because during
* concurrent index build, we insert the root TID though
* the actual tuple may be somewhere in the HOT-chain.
* While following the chain we might not stop at the exact
* tuple which triggered the insert, but that's OK because
* if we find a live tuple anywhere in this chain, we have
* a unique key conflict. The other live tuple is not part
* of this chain because it had a different index entry.
*/
htid = itup->t_tid;
if (heap_hot_search(&htid, heapRel, SnapshotSelf, NULL))
{
/* Normal case --- it's still live */
ReleaseBuffer(hbuffer);
}
else if (htup.t_data != NULL)
else
{
/*
* It's been deleted, so no error, and no need to
......@@ -279,28 +290,19 @@ _bt_check_unique(Relation rel, IndexTuple itup, Relation heapRel,
*/
break;
}
else
{
/* couldn't find the tuple?? */
elog(ERROR, "failed to fetch tuple being inserted");
}
ereport(ERROR,
(errcode(ERRCODE_UNIQUE_VIOLATION),
errmsg("duplicate key value violates unique constraint \"%s\"",
RelationGetRelationName(rel))));
}
else if (htup.t_data != NULL)
else if (all_dead)
{
/*
* Hmm, if we can't see the tuple, maybe it can be marked
* killed. This logic should match index_getnext and
* btgettuple.
* The conflicting tuple (or whole HOT chain) is dead to
* everyone, so we may as well mark the index entry
* killed.
*/
LockBuffer(hbuffer, BUFFER_LOCK_SHARE);
if (HeapTupleSatisfiesVacuum(htup.t_data, RecentGlobalXmin,
hbuffer) == HEAPTUPLE_DEAD)
{
ItemIdMarkDead(curitemid);
opaque->btpo_flags |= BTP_HAS_GARBAGE;
/* be sure to mark the proper buffer dirty... */
......@@ -309,9 +311,6 @@ _bt_check_unique(Relation rel, IndexTuple itup, Relation heapRel,
else
SetBufferCommitInfoNeedsSave(buf);
}
LockBuffer(hbuffer, BUFFER_LOCK_UNLOCK);
}
ReleaseBuffer(hbuffer);
}
}
......@@ -840,7 +839,7 @@ _bt_split(Relation rel, Buffer buf, OffsetNumber firstright,
itemsz = ItemIdGetLength(itemid);
item = (IndexTuple) PageGetItem(origpage, itemid);
if (PageAddItem(rightpage, (Item) item, itemsz, rightoff,
false) == InvalidOffsetNumber)
false, false) == InvalidOffsetNumber)
elog(PANIC, "failed to add hikey to the right sibling");
rightoff = OffsetNumberNext(rightoff);
}
......@@ -865,7 +864,7 @@ _bt_split(Relation rel, Buffer buf, OffsetNumber firstright,
item = (IndexTuple) PageGetItem(origpage, itemid);
}
if (PageAddItem(leftpage, (Item) item, itemsz, leftoff,
false) == InvalidOffsetNumber)
false, false) == InvalidOffsetNumber)
elog(PANIC, "failed to add hikey to the left sibling");
leftoff = OffsetNumberNext(leftoff);
......@@ -1700,7 +1699,7 @@ _bt_newroot(Relation rel, Buffer lbuf, Buffer rbuf)
* benefit of _bt_restore_page().
*/
if (PageAddItem(rootpage, (Item) new_item, itemsz, P_HIKEY,
false) == InvalidOffsetNumber)
false, false) == InvalidOffsetNumber)
elog(PANIC, "failed to add leftkey to new root page");
pfree(new_item);
......@@ -1718,7 +1717,7 @@ _bt_newroot(Relation rel, Buffer lbuf, Buffer rbuf)
* insert the right page pointer into the new root page.
*/
if (PageAddItem(rootpage, (Item) new_item, itemsz, P_FIRSTKEY,
false) == InvalidOffsetNumber)
false, false) == InvalidOffsetNumber)
elog(PANIC, "failed to add rightkey to new root page");
pfree(new_item);
......@@ -1805,7 +1804,7 @@ _bt_pgaddtup(Relation rel,
}
if (PageAddItem(page, (Item) itup, itemsize, itup_off,
false) == InvalidOffsetNumber)
false, false) == InvalidOffsetNumber)
elog(PANIC, "failed to add item to the %s for \"%s\"",
where, RelationGetRelationName(rel));
}
......
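The _bt_check_unique() comments above depend on one fact: a single index entry now covers an entire HOT chain, so a uniqueness check has to walk the chain rather than fetch one TID. The following is a simplified, self-contained model of that walk. It is not the real heap_hot_search(). ToyItem and toy_hot_search() are hypothetical names, line pointers are plain array slots with a 1-based `next` link, redirect pointers are omitted, and snapshot visibility is reduced to two precomputed booleans.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of one heap page slot (not PostgreSQL's ItemId). */
typedef struct ToyItem
{
    bool visible;      /* does this version satisfy our snapshot? */
    bool dead_to_all;  /* is this version dead to every transaction? */
    int  next;         /* 1-based offset of the next chain member, 0 = end */
} ToyItem;

/*
 * Walk the HOT chain starting at 'root' (1-based offset into items[]).
 * Returns the offset of the first visible member, or 0 if there is none.
 * If all_dead is not NULL, *all_dead ends up true only when every member
 * examined is dead to everyone.
 */
static int
toy_hot_search(const ToyItem *items, int nitems, int root, bool *all_dead)
{
    int off = root;
    int steps = 0;

    if (all_dead)
        *all_dead = true;

    /* The step counter guards against a corrupted, cyclic chain. */
    while (off >= 1 && off <= nitems && steps++ < nitems)
    {
        const ToyItem *it = &items[off - 1];

        if (it->visible)
        {
            if (all_dead)
                *all_dead = false;
            return off;
        }
        if (!it->dead_to_all && all_dead)
            *all_dead = false;
        off = it->next;
    }
    return 0;
}
```

As in the patched code, a true `all_dead` result is the condition under which the index entry can be marked killed; passing NULL mirrors the SnapshotSelf call, which does not need that information.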
......@@ -57,7 +57,7 @@
* Portions Copyright (c) 1994, Regents of the University of California
*
* IDENTIFICATION
* $PostgreSQL: pgsql/src/backend/access/nbtree/nbtsort.c,v 1.112 2007/09/12 22:10:26 tgl Exp $
* $PostgreSQL: pgsql/src/backend/access/nbtree/nbtsort.c,v 1.113 2007/09/20 17:56:30 tgl Exp $
*
*-------------------------------------------------------------------------
*/
......@@ -400,7 +400,7 @@ _bt_sortaddtup(Page page,
}
if (PageAddItem(page, (Item) itup, itemsize, itup_off,
false) == InvalidOffsetNumber)
false, false) == InvalidOffsetNumber)
elog(ERROR, "failed to add item to the index page");
}
......
......@@ -8,7 +8,7 @@
* Portions Copyright (c) 1994, Regents of the University of California
*
* IDENTIFICATION
* $PostgreSQL: pgsql/src/backend/access/nbtree/nbtxlog.c,v 1.45 2007/09/12 22:10:26 tgl Exp $
* $PostgreSQL: pgsql/src/backend/access/nbtree/nbtxlog.c,v 1.46 2007/09/20 17:56:30 tgl Exp $
*
*-------------------------------------------------------------------------
*/
......@@ -141,8 +141,8 @@ _bt_restore_page(Page page, char *from, int len)
memcpy(&itupdata, from, sizeof(IndexTupleData));
itemsz = IndexTupleDSize(itupdata);
itemsz = MAXALIGN(itemsz);
if (PageAddItem(page, (Item) from, itemsz,
FirstOffsetNumber, false) == InvalidOffsetNumber)
if (PageAddItem(page, (Item) from, itemsz, FirstOffsetNumber,
false, false) == InvalidOffsetNumber)
elog(PANIC, "_bt_restore_page: cannot add item to page");
from += itemsz;
}
......@@ -238,7 +238,7 @@ btree_xlog_insert(bool isleaf, bool ismeta,
{
if (PageAddItem(page, (Item) datapos, datalen,
ItemPointerGetOffsetNumber(&(xlrec->target.tid)),
false) == InvalidOffsetNumber)
false, false) == InvalidOffsetNumber)
elog(PANIC, "btree_insert_redo: failed to add item");
PageSetLSN(page, lsn);
......@@ -389,7 +389,7 @@ btree_xlog_split(bool onleft, bool isroot,
if (onleft)
{
if (PageAddItem(lpage, newitem, newitemsz, newitemoff,
false) == InvalidOffsetNumber)
false, false) == InvalidOffsetNumber)
elog(PANIC, "failed to add new item to left page after split");
}
......@@ -398,7 +398,7 @@ btree_xlog_split(bool onleft, bool isroot,
hiItem = PageGetItem(rpage, hiItemId);
if (PageAddItem(lpage, hiItem, ItemIdGetLength(hiItemId),
P_HIKEY, false) == InvalidOffsetNumber)
P_HIKEY, false, false) == InvalidOffsetNumber)
elog(PANIC, "failed to add high key to left page after split");
/* Fix opaque fields */
......
......@@ -9,7 +9,7 @@
*
*
* IDENTIFICATION
* $PostgreSQL: pgsql/src/backend/catalog/indexing.c,v 1.114 2007/01/05 22:19:24 momjian Exp $
* $PostgreSQL: pgsql/src/backend/catalog/indexing.c,v 1.115 2007/09/20 17:56:30 tgl Exp $
*
*-------------------------------------------------------------------------
*/
......@@ -78,6 +78,10 @@ CatalogIndexInsert(CatalogIndexState indstate, HeapTuple heapTuple)
Datum values[INDEX_MAX_KEYS];
bool isnull[INDEX_MAX_KEYS];
/* HOT update does not require index inserts */
if (HeapTupleIsHeapOnly(heapTuple))
return;
/*
* Get information from the state structure. Fall out if nothing to do.
*/
......@@ -101,6 +105,10 @@ CatalogIndexInsert(CatalogIndexState indstate, HeapTuple heapTuple)
indexInfo = indexInfoArray[i];
/* If the index is marked as read-only, ignore it */
if (!indexInfo->ii_ReadyForInserts)
continue;
/*
* Expressional and partial indexes on system catalogs are not
* supported
......
......@@ -3,7 +3,7 @@
*
* Copyright (c) 1996-2007, PostgreSQL Global Development Group
*
* $PostgreSQL: pgsql/src/backend/catalog/system_views.sql,v 1.44 2007/09/11 08:51:22 teodor Exp $
* $PostgreSQL: pgsql/src/backend/catalog/system_views.sql,v 1.45 2007/09/20 17:56:30 tgl Exp $
*/
CREATE VIEW pg_roles AS
......@@ -207,6 +207,7 @@ CREATE VIEW pg_stat_all_tables AS
pg_stat_get_tuples_inserted(C.oid) AS n_tup_ins,
pg_stat_get_tuples_updated(C.oid) AS n_tup_upd,
pg_stat_get_tuples_deleted(C.oid) AS n_tup_del,
pg_stat_get_tuples_hot_updated(C.oid) AS n_tup_hot_upd,
pg_stat_get_live_tuples(C.oid) AS n_live_tup,
pg_stat_get_dead_tuples(C.oid) AS n_dead_tup,
pg_stat_get_last_vacuum_time(C.oid) as last_vacuum,
......
......@@ -8,7 +8,7 @@
* Portions Copyright (c) 1994, Regents of the University of California
*
* IDENTIFICATION
* $PostgreSQL: pgsql/src/backend/catalog/toasting.c,v 1.7 2007/07/25 22:16:18 tgl Exp $
* $PostgreSQL: pgsql/src/backend/catalog/toasting.c,v 1.8 2007/09/20 17:56:30 tgl Exp $
*
*-------------------------------------------------------------------------
*/
......@@ -225,7 +225,9 @@ create_toast_table(Relation rel, Oid toastOid, Oid toastIndexOid)
indexInfo->ii_Predicate = NIL;
indexInfo->ii_PredicateState = NIL;
indexInfo->ii_Unique = true;
indexInfo->ii_ReadyForInserts = true;
indexInfo->ii_Concurrent = false;
indexInfo->ii_BrokenHotChain = false;
classObjectId[0] = OID_BTREE_OPS_OID;
classObjectId[1] = INT4_BTREE_OPS_OID;
......
......@@ -8,7 +8,7 @@
*
*
* IDENTIFICATION
* $PostgreSQL: pgsql/src/backend/commands/indexcmds.c,v 1.165 2007/09/10 21:59:37 alvherre Exp $
* $PostgreSQL: pgsql/src/backend/commands/indexcmds.c,v 1.166 2007/09/20 17:56:31 tgl Exp $
*
*-------------------------------------------------------------------------
*/
......@@ -119,6 +119,7 @@ DefineIndex(RangeVar *heapRelation,
Oid namespaceId;
Oid tablespaceId;
Relation rel;
Relation indexRelation;
HeapTuple tuple;
Form_pg_am accessMethodForm;
bool amcanorder;
......@@ -420,7 +421,10 @@ DefineIndex(RangeVar *heapRelation,
indexInfo->ii_Predicate = make_ands_implicit(predicate);
indexInfo->ii_PredicateState = NIL;
indexInfo->ii_Unique = unique;
/* In a concurrent build, mark it not-ready-for-inserts */
indexInfo->ii_ReadyForInserts = !concurrent;
indexInfo->ii_Concurrent = concurrent;
indexInfo->ii_BrokenHotChain = false;
classObjectId = (Oid *) palloc(numberOfAttributes * sizeof(Oid));
coloptions = (int16 *) palloc(numberOfAttributes * sizeof(int16));
......@@ -439,23 +443,38 @@ DefineIndex(RangeVar *heapRelation,
primary ? "PRIMARY KEY" : "UNIQUE",
indexRelationName, RelationGetRelationName(rel))));
/* save lockrelid for below, then close rel */
/* save lockrelid and locktag for below, then close rel */
heaprelid = rel->rd_lockInfo.lockRelId;
SET_LOCKTAG_RELATION(heaplocktag, heaprelid.dbId, heaprelid.relId);
heap_close(rel, NoLock);
if (!concurrent)
{
indexRelationId =
index_create(relationId, indexRelationName, indexRelationId,
indexInfo, accessMethodId, tablespaceId, classObjectId,
coloptions, reloptions, primary, isconstraint,
allowSystemTableMods, skip_build, concurrent);
if (!concurrent)
return; /* We're done, in the standard case */
}
/*
* For a concurrent build, we next insert the catalog entry and add
* constraints. We don't build the index just yet; we must first make
* the catalog entry so that the new index is visible to updating
* transactions. That will prevent them from making incompatible HOT
* updates. The new index will be marked not indisready and not
* indisvalid, so that no one else tries to either insert into it or use
* it for queries. We pass skip_build = true to prevent the build.
*/
indexRelationId =
index_create(relationId, indexRelationName, indexRelationId,
indexInfo, accessMethodId, tablespaceId, classObjectId,
coloptions, reloptions, primary, isconstraint,
allowSystemTableMods, true, concurrent);
/*
* Phase 2 of concurrent index build (see comments for validate_index()
* for an overview of how this works)
*
* We must commit our current transaction so that the index becomes
* visible; then start another. Note that all the data structures we just
* built are lost in the commit. The only data we keep past here are the
......@@ -476,6 +495,9 @@ DefineIndex(RangeVar *heapRelation,
StartTransactionCommand();
/*
* Phase 2 of concurrent index build (see comments for validate_index()
* for an overview of how this works)
*
* Now we must wait until no running transaction could have the table open
* with the old list of indexes. To do this, inquire which xacts
* currently would conflict with ShareLock on the table -- ie, which ones
......@@ -494,7 +516,91 @@ DefineIndex(RangeVar *heapRelation,
* check for that. Also, prepared xacts are not reported, which is
* fine since they certainly aren't going to do anything more.
*/
SET_LOCKTAG_RELATION(heaplocktag, heaprelid.dbId, heaprelid.relId);
old_lockholders = GetLockConflicts(&heaplocktag, ShareLock);
while (VirtualTransactionIdIsValid(*old_lockholders))
{
VirtualXactLockTableWait(*old_lockholders);
old_lockholders++;
}
/*
* At this moment we are sure that there are no transactions with the
* table open for write that don't have this new index in their list of
* indexes. We have waited out all the existing transactions and any new
* transaction will have the new index in its list, but the index is still
* marked as "not-ready-for-inserts". The index is consulted while
* deciding HOT-safety though. This arrangement ensures that no new HOT
* chains can be created where the new tuple and the old tuple in the
* chain have different index keys.
*
* We now take a new snapshot, and build the index using all tuples that
* are visible in this snapshot. We can be sure that any HOT updates
* to these tuples will be compatible with the index, since any updates
* made by transactions that didn't know about the index are now committed
* or rolled back. Thus, each visible tuple is either the end of its
* HOT-chain or the extension of the chain is HOT-safe for this index.
*/
/* Open and lock the parent heap relation */
rel = heap_openrv(heapRelation, ShareUpdateExclusiveLock);
/* And the target index relation */
indexRelation = index_open(indexRelationId, RowExclusiveLock);
/* Set ActiveSnapshot since functions in the indexes may need it */
ActiveSnapshot = CopySnapshot(GetTransactionSnapshot());
/* We have to re-build the IndexInfo struct, since it was lost in commit */
indexInfo = BuildIndexInfo(indexRelation);
Assert(!indexInfo->ii_ReadyForInserts);
indexInfo->ii_Concurrent = true;
indexInfo->ii_BrokenHotChain = false;
/* Now build the index */
index_build(rel, indexRelation, indexInfo, primary);
/* Close both the relations, but keep the locks */
heap_close(rel, NoLock);
index_close(indexRelation, NoLock);
/*
* Update the pg_index row to mark the index as ready for inserts.
* Once we commit this transaction, any new transactions that
* open the table must insert new entries into the index for insertions
* and non-HOT updates.
*/
pg_index = heap_open(IndexRelationId, RowExclusiveLock);
indexTuple = SearchSysCacheCopy(INDEXRELID,
ObjectIdGetDatum(indexRelationId),
0, 0, 0);
if (!HeapTupleIsValid(indexTuple))
elog(ERROR, "cache lookup failed for index %u", indexRelationId);
indexForm = (Form_pg_index) GETSTRUCT(indexTuple);
Assert(!indexForm->indisready);
Assert(!indexForm->indisvalid);
indexForm->indisready = true;
simple_heap_update(pg_index, &indexTuple->t_self, indexTuple);
CatalogUpdateIndexes(pg_index, indexTuple);
heap_close(pg_index, RowExclusiveLock);
/*
* Commit this transaction to make the indisready update visible.
*/
CommitTransactionCommand();
StartTransactionCommand();
/*
* Phase 3 of concurrent index build
*
* We once again wait until no transaction can have the table open with
* the index marked as read-only for updates.
*/
old_lockholders = GetLockConflicts(&heaplocktag, ShareLock);
while (VirtualTransactionIdIsValid(*old_lockholders))
......@@ -505,7 +611,7 @@ DefineIndex(RangeVar *heapRelation,
/*
* Now take the "reference snapshot" that will be used by validate_index()
* to filter candidate tuples. Beware! There might be still snapshots
* to filter candidate tuples. Beware! There might still be snapshots
* in use that treat some transaction as in-progress that our reference
* snapshot treats as committed. If such a recently-committed transaction
* deleted tuples in the table, we will not include them in the index; yet
......@@ -560,7 +666,7 @@ DefineIndex(RangeVar *heapRelation,
elog(ERROR, "cache lookup failed for index %u", indexRelationId);
indexForm = (Form_pg_index) GETSTRUCT(indexTuple);
Assert(indexForm->indexrelid == indexRelationId);
Assert(indexForm->indisready);
Assert(!indexForm->indisvalid);
indexForm->indisvalid = true;
......@@ -575,7 +681,8 @@ DefineIndex(RangeVar *heapRelation,
* relcache entries for the index itself, but we should also send a
* relcache inval on the parent table to force replanning of cached plans.
* Otherwise existing sessions might fail to use the new index where it
* would be useful.
* would be useful. (Note that our earlier commits did not create
* reasons to replan; relcache flush on the index itself was sufficient.)
*/
CacheInvalidateRelcacheByRelid(heaprelid.relId);
......
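The concurrent path in DefineIndex() now moves the new index through two pg_index flags in a fixed order: created with neither flag set, marked indisready after the phase-2 build, and marked indisvalid after validation. A minimal sketch of just that ordering, with hypothetical names; the commits and the waits on conflicting transactions (GetLockConflicts() / VirtualXactLockTableWait()) that separate the steps are deliberately not modeled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the two pg_index flags used by the concurrent build. */
typedef struct ToyIndexFlags
{
    bool indisready;   /* inserts and non-HOT updates must maintain it */
    bool indisvalid;   /* queries may use it */
} ToyIndexFlags;

/* Catalog entry exists but is neither ready nor valid.  (It is still
 * taken into account when deciding whether an update is HOT-safe.) */
static ToyIndexFlags
toy_phase1_create(void)
{
    ToyIndexFlags f = {false, false};
    return f;
}

/* After the initial build: start accepting inserts. */
static void
toy_phase2_mark_ready(ToyIndexFlags *f)
{
    assert(!f->indisready && !f->indisvalid);
    f->indisready = true;
}

/* After validation: allow queries to use the index. */
static void
toy_phase3_mark_valid(ToyIndexFlags *f)
{
    assert(f->indisready && !f->indisvalid);
    f->indisvalid = true;
}

static bool toy_accepts_inserts(const ToyIndexFlags *f) { return f->indisready; }
static bool toy_usable_for_queries(const ToyIndexFlags *f) { return f->indisvalid; }
```

The assertions encode the same preconditions as the Assert() calls in the diff, so calling the transitions out of order fails immediately.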
......@@ -8,7 +8,7 @@
*
*
* IDENTIFICATION
* $PostgreSQL: pgsql/src/backend/commands/sequence.c,v 1.145 2007/09/12 22:10:26 tgl Exp $
* $PostgreSQL: pgsql/src/backend/commands/sequence.c,v 1.146 2007/09/20 17:56:31 tgl Exp $
*
*-------------------------------------------------------------------------
*/
......@@ -1281,7 +1281,7 @@ seq_redo(XLogRecPtr lsn, XLogRecord *record)
itemsz = record->xl_len - sizeof(xl_seq_rec);
itemsz = MAXALIGN(itemsz);
if (PageAddItem(page, (Item) item, itemsz,
FirstOffsetNumber, false) == InvalidOffsetNumber)
FirstOffsetNumber, false, false) == InvalidOffsetNumber)
elog(PANIC, "seq_redo: failed to add item to page");
PageSetLSN(page, lsn);
......
......@@ -36,7 +36,7 @@
*
*
* IDENTIFICATION
* $PostgreSQL: pgsql/src/backend/commands/vacuumlazy.c,v 1.96 2007/09/16 02:37:46 tgl Exp $
* $PostgreSQL: pgsql/src/backend/commands/vacuumlazy.c,v 1.97 2007/09/20 17:56:31 tgl Exp $
*
*-------------------------------------------------------------------------
*/
......@@ -326,8 +326,8 @@ lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
buf = ReadBufferWithStrategy(onerel, blkno, vac_strategy);
/* Initially, we only need shared access to the buffer */
LockBuffer(buf, BUFFER_LOCK_SHARE);
/* We need buffer cleanup lock so that we can prune HOT chains. */
LockBufferForCleanup(buf);
page = BufferGetPage(buf);
......@@ -341,11 +341,10 @@ lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
* We have to be careful here because we could be looking at a
* page that someone has just added to the relation and not yet
* been able to initialize (see RelationGetBufferForTuple). To
* interlock against that, release the buffer read lock (which we
* must do anyway) and grab the relation extension lock before
* re-locking in exclusive mode. If the page is still
* uninitialized by then, it must be left over from a crashed
* backend, and we can initialize it.
* protect against that, release the buffer lock, grab the
* relation extension lock momentarily, and re-lock the buffer.
* If the page is still uninitialized by then, it must be left
* over from a crashed backend, and we can initialize it.
*
* We don't really need the relation lock when this is a new or
* temp relation, but it's probably not worth the code space to
......@@ -357,7 +356,7 @@ lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
LockBuffer(buf, BUFFER_LOCK_UNLOCK);
LockRelationForExtension(onerel, ExclusiveLock);
UnlockRelationForExtension(onerel, ExclusiveLock);
LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
LockBufferForCleanup(buf);
if (PageIsNew(page))
{
ereport(WARNING,
......@@ -366,7 +365,7 @@ lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
PageInit(page, BufferGetPageSize(buf), 0);
empty_pages++;
lazy_record_free_space(vacrelstats, blkno,
PageGetFreeSpace(page));
PageGetHeapFreeSpace(page));
}
MarkBufferDirty(buf);
UnlockReleaseBuffer(buf);
......@@ -377,11 +376,23 @@ lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
{
empty_pages++;
lazy_record_free_space(vacrelstats, blkno,
PageGetFreeSpace(page));
PageGetHeapFreeSpace(page));
UnlockReleaseBuffer(buf);
continue;
}
/*
* Prune all HOT-update chains in this page.
*
* We count tuples removed by the pruning step as removed by VACUUM.
*/
tups_vacuumed += heap_page_prune(onerel, buf, OldestXmin,
false, false);
/*
* Now scan the page to collect vacuumable items and check for
* tuples requiring freezing.
*/
nfrozen = 0;
hastup = false;
prev_dead_count = vacrelstats->num_dead_tuples;
......@@ -394,21 +405,63 @@ lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
itemid = PageGetItemId(page, offnum);
/* Unused items require no processing, but we count 'em */
if (!ItemIdIsUsed(itemid))
{
nunused += 1;
continue;
}
/* Redirect items mustn't be touched */
if (ItemIdIsRedirected(itemid))
{
hastup = true; /* this page won't be truncatable */
continue;
}
ItemPointerSet(&(tuple.t_self), blkno, offnum);
/*
* DEAD item pointers are to be vacuumed normally; but we don't
* count them in tups_vacuumed, else we'd be double-counting
* (at least in the common case where heap_page_prune() just
* freed up a non-HOT tuple).
*/
if (ItemIdIsDead(itemid))
{
lazy_record_dead_tuple(vacrelstats, &(tuple.t_self));
continue;
}
Assert(ItemIdIsNormal(itemid));
tuple.t_data = (HeapTupleHeader) PageGetItem(page, itemid);
tuple.t_len = ItemIdGetLength(itemid);
ItemPointerSet(&(tuple.t_self), blkno, offnum);
tupgone = false;
switch (HeapTupleSatisfiesVacuum(tuple.t_data, OldestXmin, buf))
{
case HEAPTUPLE_DEAD:
/*
* Ordinarily, DEAD tuples would have been removed by
* heap_page_prune(), but it's possible that the tuple
* state changed since heap_page_prune() looked. In
* particular an INSERT_IN_PROGRESS tuple could have
* changed to DEAD if the inserter aborted. So this
* cannot be considered an error condition.
*
* If the tuple is HOT-updated then it must only be
* removed by a prune operation; so we keep it just as
* if it were RECENTLY_DEAD. Also, if it's a heap-only
* tuple, we choose to keep it, because it'll be a
* lot cheaper to get rid of it in the next pruning pass
* than to treat it like an indexed tuple.
*/
if (HeapTupleIsHotUpdated(&tuple) ||
HeapTupleIsHeapOnly(&tuple))
nkeep += 1;
else
tupgone = true; /* we can delete the tuple */
break;
case HEAPTUPLE_LIVE:
......@@ -449,11 +502,10 @@ lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
/*
* Each non-removable tuple must be checked to see if it
* needs freezing. If we already froze anything, then
* we've already switched the buffer lock to exclusive.
* needs freezing. Note we already have exclusive buffer lock.
*/
if (heap_freeze_tuple(tuple.t_data, FreezeLimit,
(nfrozen > 0) ? InvalidBuffer : buf))
InvalidBuffer))
frozen[nfrozen++] = offnum;
}
} /* scan along page */
......@@ -485,9 +537,6 @@ lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
if (nindexes == 0 &&
vacrelstats->num_dead_tuples > 0)
{
/* Trade in buffer share lock for super-exclusive lock */
LockBuffer(buf, BUFFER_LOCK_UNLOCK);
LockBufferForCleanup(buf);
/* Remove tuples from heap */
lazy_vacuum_page(onerel, blkno, buf, 0, vacrelstats);
/* Forget the now-vacuumed tuples, and press on */
......@@ -505,7 +554,7 @@ lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
if (vacrelstats->num_dead_tuples == prev_dead_count)
{
lazy_record_free_space(vacrelstats, blkno,
PageGetFreeSpace(page));
PageGetHeapFreeSpace(page));
}
/* Remember the location of the last page with nonremovable tuples */
......@@ -598,7 +647,7 @@ lazy_vacuum_heap(Relation onerel, LVRelStats *vacrelstats)
/* Now that we've compacted the page, record its available space */
page = BufferGetPage(buf);
lazy_record_free_space(vacrelstats, tblk,
PageGetFreeSpace(page));
PageGetHeapFreeSpace(page));
UnlockReleaseBuffer(buf);
npages++;
}
......@@ -615,7 +664,7 @@ lazy_vacuum_heap(Relation onerel, LVRelStats *vacrelstats)
* lazy_vacuum_page() -- free dead tuples on a page
* and repair its fragmentation.
*
* Caller must hold pin and lock on the buffer.
* Caller must hold pin and buffer cleanup lock on the buffer.
*
* tupindex is the index in vacrelstats->dead_tuples of the first dead
* tuple for this page. We assume the rest follow sequentially.
......@@ -625,10 +674,9 @@ static int
lazy_vacuum_page(Relation onerel, BlockNumber blkno, Buffer buffer,
int tupindex, LVRelStats *vacrelstats)
{
OffsetNumber unused[MaxOffsetNumber];
int uncnt;
Page page = BufferGetPage(buffer);
ItemId itemid;
OffsetNumber unused[MaxOffsetNumber];
int uncnt = 0;
START_CRIT_SECTION();
......@@ -636,6 +684,7 @@ lazy_vacuum_page(Relation onerel, BlockNumber blkno, Buffer buffer,
{
BlockNumber tblk;
OffsetNumber toff;
ItemId itemid;
tblk = ItemPointerGetBlockNumber(&vacrelstats->dead_tuples[tupindex]);
if (tblk != blkno)
......@@ -643,9 +692,10 @@ lazy_vacuum_page(Relation onerel, BlockNumber blkno, Buffer buffer,
toff = ItemPointerGetOffsetNumber(&vacrelstats->dead_tuples[tupindex]);
itemid = PageGetItemId(page, toff);
ItemIdSetUnused(itemid);
unused[uncnt++] = toff;
}
uncnt = PageRepairFragmentation(page, unused);
PageRepairFragmentation(page);
MarkBufferDirty(buffer);
......@@ -654,7 +704,10 @@ lazy_vacuum_page(Relation onerel, BlockNumber blkno, Buffer buffer,
{
XLogRecPtr recptr;
recptr = log_heap_clean(onerel, buffer, unused, uncnt);
recptr = log_heap_clean(onerel, buffer,
NULL, 0, NULL, 0,
unused, uncnt,
false);
PageSetLSN(page, recptr);
PageSetTLI(page, ThisTimeLineID);
}
......@@ -980,7 +1033,7 @@ lazy_record_dead_tuple(LVRelStats *vacrelstats,
/*
* The array shouldn't overflow under normal behavior, but perhaps it
* could if we are given a really small maintenance_work_mem. In that
* case, just forget the last few tuples.
* case, just forget the last few tuples (we'll get 'em next time).
*/
if (vacrelstats->num_dead_tuples < vacrelstats->max_dead_tuples)
{
......
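After heap_page_prune() runs, the per-item loop in lazy_scan_heap() classifies each line pointer before it considers tuple visibility at all. The following condensed sketch of that decision table uses hypothetical enum and function names. `tuple_dead` stands in for HeapTupleSatisfiesVacuum() returning HEAPTUPLE_DEAD; every other outcome (live, recently dead, in progress) is collapsed into "keep".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical line-pointer states, loosely mirroring the ItemIdIs* tests. */
typedef enum { TOY_LP_UNUSED, TOY_LP_NORMAL, TOY_LP_REDIRECT, TOY_LP_DEAD } ToyLpState;

typedef enum
{
    TOY_ACT_COUNT_UNUSED,    /* just count it (nunused) */
    TOY_ACT_KEEP_REDIRECT,   /* leave alone; page is not truncatable */
    TOY_ACT_RECORD_DEAD,     /* vacuum normally, but not counted as removed here */
    TOY_ACT_REMOVE,          /* tupgone = true */
    TOY_ACT_KEEP             /* nonremovable (includes DEAD HOT members) */
} ToyAction;

static ToyAction
toy_classify(ToyLpState lp, bool tuple_dead, bool hot_updated, bool heap_only)
{
    switch (lp)
    {
        case TOY_LP_UNUSED:
            return TOY_ACT_COUNT_UNUSED;
        case TOY_LP_REDIRECT:
            return TOY_ACT_KEEP_REDIRECT;
        case TOY_LP_DEAD:
            return TOY_ACT_RECORD_DEAD;
        case TOY_LP_NORMAL:
            break;
    }

    /* A DEAD tuple in a HOT chain is left for the next pruning pass. */
    if (tuple_dead && !hot_updated && !heap_only)
        return TOY_ACT_REMOVE;
    return TOY_ACT_KEEP;
}
```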
......@@ -26,7 +26,7 @@
*
*
* IDENTIFICATION
* $PostgreSQL: pgsql/src/backend/executor/execMain.c,v 1.297 2007/09/07 20:59:26 tgl Exp $
* $PostgreSQL: pgsql/src/backend/executor/execMain.c,v 1.298 2007/09/20 17:56:31 tgl Exp $
*
*-------------------------------------------------------------------------
*/
......@@ -1813,8 +1813,10 @@ lreplace:;
*
* Note: heap_update returns the tid (location) of the new tuple in the
* t_self field.
*
* If it's a HOT update, we mustn't insert new index entries.
*/
if (resultRelInfo->ri_NumIndices > 0)
if (resultRelInfo->ri_NumIndices > 0 && !HeapTupleIsHeapOnly(tuple))
ExecInsertIndexTuples(slot, &(tuple->t_self), estate, false);
/* AFTER ROW UPDATE Triggers */
......
......@@ -8,7 +8,7 @@
*
*
* IDENTIFICATION
* $PostgreSQL: pgsql/src/backend/executor/execUtils.c,v 1.150 2007/08/15 21:39:50 tgl Exp $
* $PostgreSQL: pgsql/src/backend/executor/execUtils.c,v 1.151 2007/09/20 17:56:31 tgl Exp $
*
*-------------------------------------------------------------------------
*/
......@@ -981,6 +981,10 @@ ExecCloseIndices(ResultRelInfo *resultRelInfo)
* stuff as it only exists here because the genam stuff
* doesn't provide the functionality needed by the
* executor.. -cim 9/27/89
*
* CAUTION: this must not be called for a HOT update.
* We can't defend against that here for lack of info.
* Should we change the API to make it safer?
* ----------------------------------------------------------------
*/
void
......@@ -1029,6 +1033,10 @@ ExecInsertIndexTuples(TupleTableSlot *slot,
indexInfo = indexInfoArray[i];
/* If the index is marked as read-only, ignore it */
if (!indexInfo->ii_ReadyForInserts)
continue;
/* Check for partial index */
if (indexInfo->ii_Predicate != NIL)
{
......
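Three changes above interact. CatalogIndexInsert() and the executor skip index insertion entirely for a heap-only tuple. ExecInsertIndexTuples() skips any index that is not yet ready for inserts. As the DefineIndex() comment explains, a not-ready index is still consulted when deciding HOT-safety. The sketch below models that decision under stated assumptions: the changed columns arrive as a hypothetical bitmask, and ToyIndex is a hypothetical descriptor. The real check, which is not part of this excerpt, compares old and new column values. Following the commit message, an update is HOT only when no indexed column changes and the new version fits on the same page.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical index descriptor: indexed columns plus ii_ReadyForInserts. */
typedef struct ToyIndex
{
    unsigned int key_columns;       /* bit i set => column i is indexed */
    bool         ready_for_inserts;
} ToyIndex;

/* Every index counts for HOT-safety, including not-ready ones. */
static bool
toy_update_is_hot(unsigned int changed_columns, bool fits_on_same_page,
                  const ToyIndex *indexes, int nindexes)
{
    if (!fits_on_same_page)
        return false;
    for (int i = 0; i < nindexes; i++)
        if (indexes[i].key_columns & changed_columns)
            return false;
    return true;
}

/* How many index entries the new tuple version needs. */
static int
toy_index_inserts_needed(bool is_hot, const ToyIndex *indexes, int nindexes)
{
    int n = 0;

    if (is_hot)
        return 0;   /* heap-only tuple: found by following the HOT chain */
    for (int i = 0; i < nindexes; i++)
        if (indexes[i].ready_for_inserts)
            n++;
    return n;
}
```

Because a not-ready index still vetoes HOT updates, no HOT chain can form whose members disagree on that index's key. That is why the concurrent build can later index each chain through its root TID.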
......@@ -21,7 +21,7 @@
*
*
* IDENTIFICATION
* $PostgreSQL: pgsql/src/backend/executor/nodeBitmapHeapscan.c,v 1.19 2007/09/12 22:10:26 tgl Exp $
* $PostgreSQL: pgsql/src/backend/executor/nodeBitmapHeapscan.c,v 1.20 2007/09/20 17:56:31 tgl Exp $
*
*-------------------------------------------------------------------------
*/
......@@ -240,12 +240,7 @@ bitgetpage(HeapScanDesc scan, TBMIterateResult *tbmres)
BlockNumber page = tbmres->blockno;
Buffer buffer;
Snapshot snapshot;
Page dp;
int ntup;
int curslot;
int minslot;
int maxslot;
int maxoff;
/*
* Acquire pin on the target heap page, trading in any pin we held before.
......@@ -258,6 +253,13 @@ bitgetpage(HeapScanDesc scan, TBMIterateResult *tbmres)
buffer = scan->rs_cbuf;
snapshot = scan->rs_snapshot;
ntup = 0;
/*
* Prune and repair fragmentation for the whole page, if possible.
*/
heap_page_prune_opt(scan->rs_rd, buffer, RecentGlobalXmin);
/*
* We must hold share lock on the buffer content while examining tuple
* visibility. Afterwards, however, the tuples we have found to be
......@@ -265,71 +267,51 @@ bitgetpage(HeapScanDesc scan, TBMIterateResult *tbmres)
*/
LockBuffer(buffer, BUFFER_LOCK_SHARE);
dp = (Page) BufferGetPage(buffer);
maxoff = PageGetMaxOffsetNumber(dp);
/*
* Determine how many entries we need to look at on this page. If the
* bitmap is lossy then we need to look at each physical item pointer;
* otherwise we just look through the offsets listed in tbmres.
* We need two separate strategies for lossy and non-lossy cases.
*/
if (tbmres->ntuples >= 0)
{
/* non-lossy case */
minslot = 0;
maxslot = tbmres->ntuples - 1;
}
else
{
/* lossy case */
minslot = FirstOffsetNumber;
maxslot = maxoff;
}
/*
* Bitmap is non-lossy, so we just look through the offsets listed in
* tbmres; but we have to follow any HOT chain starting at each such
* offset.
*/
int curslot;
ntup = 0;
for (curslot = minslot; curslot <= maxslot; curslot++)
for (curslot = 0; curslot < tbmres->ntuples; curslot++)
{
OffsetNumber targoffset;
ItemId lp;
HeapTupleData loctup;
bool valid;
OffsetNumber offnum = tbmres->offsets[curslot];
ItemPointerData tid;
if (tbmres->ntuples >= 0)
{
/* non-lossy case */
targoffset = tbmres->offsets[curslot];
ItemPointerSet(&tid, page, offnum);
if (heap_hot_search_buffer(&tid, buffer, snapshot, NULL))
scan->rs_vistuples[ntup++] = ItemPointerGetOffsetNumber(&tid);
}
}
else
{
/* lossy case */
targoffset = (OffsetNumber) curslot;
}
/*
* We'd better check for out-of-range offnum in case of VACUUM since
* the TID was obtained.
* Bitmap is lossy, so we must examine each item pointer on the page.
* But we can ignore HOT chains, since we'll check each tuple anyway.
*/
if (targoffset < FirstOffsetNumber || targoffset > maxoff)
continue;
Page dp = (Page) BufferGetPage(buffer);
OffsetNumber maxoff = PageGetMaxOffsetNumber(dp);
OffsetNumber offnum;
lp = PageGetItemId(dp, targoffset);
for (offnum = FirstOffsetNumber; offnum <= maxoff; offnum++)
{
ItemId lp;
HeapTupleData loctup;
/*
* Must check for deleted tuple.
*/
lp = PageGetItemId(dp, offnum);
if (!ItemIdIsNormal(lp))
continue;
/*
* check time qualification of tuple, remember it if valid
*/
loctup.t_data = (HeapTupleHeader) PageGetItem((Page) dp, lp);
loctup.t_len = ItemIdGetLength(lp);
ItemPointerSet(&(loctup.t_self), page, targoffset);
valid = HeapTupleSatisfiesVisibility(&loctup, snapshot, buffer);
if (valid)
scan->rs_vistuples[ntup++] = targoffset;
if (HeapTupleSatisfiesVisibility(&loctup, snapshot, buffer))
scan->rs_vistuples[ntup++] = offnum;
}
}
LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
......
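The rewritten bitgetpage() splits into two strategies. For a non-lossy bitmap, each listed offset is a chain root and the chain is followed to the visible member. For a lossy bitmap, every normal line pointer on the page is checked directly and chains are ignored. A standalone sketch of both paths follows. ToySlot, toy_follow_chain(), and toy_bitgetpage() are hypothetical, visibility is a precomputed flag, and, as with tbmres->ntuples, a negative offset count means "lossy".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { TOY_UNUSED, TOY_NORMAL, TOY_REDIRECT } ToyKind;

/* Hypothetical slot: NORMAL uses 'visible' and 'next'; REDIRECT uses 'next'. */
typedef struct ToySlot
{
    ToyKind kind;
    bool    visible;   /* NORMAL only: satisfies the snapshot */
    int     next;      /* 1-based chain link or redirect target, 0 = none */
} ToySlot;

/* Follow a HOT chain from a root offset; return the visible member or 0. */
static int
toy_follow_chain(const ToySlot *page, int nslots, int off)
{
    for (int steps = 0; off >= 1 && off <= nslots && steps < nslots; steps++)
    {
        const ToySlot *s = &page[off - 1];

        if (s->kind == TOY_REDIRECT)
            off = s->next;
        else if (s->kind != TOY_NORMAL)
            break;
        else if (s->visible)
            return off;
        else
            off = s->next;
    }
    return 0;
}

/* Collect visible offsets into out[]; noffsets < 0 selects the lossy path. */
static int
toy_bitgetpage(const ToySlot *page, int nslots,
               const int *offsets, int noffsets, int *out)
{
    int ntup = 0;

    if (noffsets >= 0)
    {
        for (int i = 0; i < noffsets; i++)
        {
            int off = toy_follow_chain(page, nslots, offsets[i]);

            if (off != 0)
                out[ntup++] = off;
        }
    }
    else
    {
        for (int off = 1; off <= nslots; off++)
            if (page[off - 1].kind == TOY_NORMAL && page[off - 1].visible)
                out[ntup++] = off;
    }
    return ntup;
}
```

Both paths should report the same visible tuple. The lossy path simply does not need chain links, because it examines every normal tuple anyway.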
......@@ -8,7 +8,7 @@
*
*
* IDENTIFICATION
* $PostgreSQL: pgsql/src/backend/executor/spi.c,v 1.180 2007/08/15 19:15:46 tgl Exp $
* $PostgreSQL: pgsql/src/backend/executor/spi.c,v 1.181 2007/09/20 17:56:31 tgl Exp $
*
*-------------------------------------------------------------------------
*/
......@@ -1407,6 +1407,7 @@ _SPI_prepare_plan(const char *src, SPIPlanPtr plan)
plansource->num_params = nargs;
plansource->fully_planned = true;
plansource->fixed_result = false;
/* no need to set search_path, generation or saved_xmin */
plansource->resultDesc = PlanCacheComputeResultDesc(stmt_list);
plansource->plan = cplan;
......@@ -1973,6 +1974,7 @@ _SPI_copy_plan(SPIPlanPtr plan, MemoryContext parentcxt)
newsource->num_params = newplan->nargs;
newsource->fully_planned = plansource->fully_planned;
newsource->fixed_result = plansource->fixed_result;
/* no need to worry about search_path, generation or saved_xmin */
if (plansource->resultDesc)
newsource->resultDesc = CreateTupleDescCopy(plansource->resultDesc);
newsource->plan = newcplan;
......
......@@ -23,7 +23,7 @@
* Copyright (c) 2003-2007, PostgreSQL Global Development Group
*
* IDENTIFICATION
* $PostgreSQL: pgsql/src/backend/nodes/tidbitmap.c,v 1.12 2007/04/26 23:24:44 tgl Exp $
* $PostgreSQL: pgsql/src/backend/nodes/tidbitmap.c,v 1.13 2007/09/20 17:56:31 tgl Exp $
*
*-------------------------------------------------------------------------
*/
......@@ -32,6 +32,7 @@
#include <limits.h>
#include "access/htup.h"
#include "nodes/bitmapset.h"
#include "nodes/tidbitmap.h"
#include "storage/bufpage.h"
#include "utils/hsearch.h"
......@@ -61,9 +62,7 @@
*/
#define PAGES_PER_CHUNK (BLCKSZ / 32)
/* The bitmap unit size can be adjusted by changing these declarations: */
#define BITS_PER_BITMAPWORD 32
typedef uint32 bitmapword; /* must be an unsigned type */
/* We use BITS_PER_BITMAPWORD and typedef bitmapword from nodes/bitmapset.h */
#define WORDNUM(x) ((x) / BITS_PER_BITMAPWORD)
#define BITNUM(x) ((x) % BITS_PER_BITMAPWORD)
......
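tidbitmap.c now takes BITS_PER_BITMAPWORD and bitmapword from nodes/bitmapset.h instead of defining them locally. The WORDNUM/BITNUM macros themselves are unchanged. To show how those macros address a 1-based line-pointer offset, here is a standalone sketch with its own prefixed copies of the definitions. It assumes a 32-bit word (the value the removed local definition used) and a hypothetical per-page limit of 300 offsets.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 32-bit words, as in the old local definition. */
#define TOY_BITS_PER_BITMAPWORD 32
typedef uint32_t toy_bitmapword;

#define TOY_WORDNUM(x) ((x) / TOY_BITS_PER_BITMAPWORD)
#define TOY_BITNUM(x)  ((x) % TOY_BITS_PER_BITMAPWORD)

#define TOY_MAX_OFFSETS 300   /* hypothetical per-page limit */
#define TOY_WORDS ((TOY_MAX_OFFSETS - 1) / TOY_BITS_PER_BITMAPWORD + 1)

typedef struct ToyPageBits
{
    toy_bitmapword words[TOY_WORDS];
} ToyPageBits;

/* Offsets are 1-based, so bit 0 of word 0 represents offset 1. */
static void
toy_set_offset(ToyPageBits *p, int offnum)
{
    int bitno = offnum - 1;

    assert(offnum >= 1 && offnum <= TOY_MAX_OFFSETS);
    p->words[TOY_WORDNUM(bitno)] |= ((toy_bitmapword) 1 << TOY_BITNUM(bitno));
}

static bool
toy_test_offset(const ToyPageBits *p, int offnum)
{
    int bitno = offnum - 1;

    assert(offnum >= 1 && offnum <= TOY_MAX_OFFSETS);
    return (p->words[TOY_WORDNUM(bitno)] &
            ((toy_bitmapword) 1 << TOY_BITNUM(bitno))) != 0;
}
```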
......@@ -8,7 +8,7 @@
*
*
* IDENTIFICATION
* $PostgreSQL: pgsql/src/backend/optimizer/plan/planner.c,v 1.221 2007/05/26 18:23:01 tgl Exp $
* $PostgreSQL: pgsql/src/backend/optimizer/plan/planner.c,v 1.222 2007/09/20 17:56:31 tgl Exp $
*
*-------------------------------------------------------------------------
*/
......@@ -134,6 +134,7 @@ standard_planner(Query *parse, int cursorOptions, ParamListInfo boundParams)
glob->subrtables = NIL;
glob->rewindPlanIDs = NULL;
glob->finalrtable = NIL;
glob->transientPlan = false;
/* Determine what fraction of the plan is likely to be scanned */
if (cursorOptions & CURSOR_OPT_FAST_PLAN)
......@@ -183,6 +184,7 @@ standard_planner(Query *parse, int cursorOptions, ParamListInfo boundParams)
result->commandType = parse->commandType;
result->canSetTag = parse->canSetTag;
result->transientPlan = glob->transientPlan;
result->planTree = top_plan;
result->rtable = glob->finalrtable;
result->resultRelations = root->resultRelations;
......
......@@ -9,7 +9,7 @@
*
*
* IDENTIFICATION
* $PostgreSQL: pgsql/src/backend/optimizer/util/plancat.c,v 1.136 2007/05/31 16:57:34 tgl Exp $
* $PostgreSQL: pgsql/src/backend/optimizer/util/plancat.c,v 1.137 2007/09/20 17:56:31 tgl Exp $
*
*-------------------------------------------------------------------------
*/
......@@ -19,6 +19,7 @@
#include "access/genam.h"
#include "access/heapam.h"
#include "access/transam.h"
#include "catalog/pg_inherits.h"
#include "nodes/makefuncs.h"
#include "optimizer/clauses.h"
......@@ -164,6 +165,20 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
continue;
}
/*
* If the index is valid, but cannot yet be used, ignore it;
* but mark the plan we are generating as transient.
* See src/backend/access/heap/README.HOT for discussion.
*/
if (index->indcheckxmin &&
!TransactionIdPrecedes(HeapTupleHeaderGetXmin(indexRelation->rd_indextuple->t_data),
TransactionXmin))
{
root->glob->transientPlan = true;
index_close(indexRelation, NoLock);
continue;
}
info = makeNode(IndexOptInfo);
info->indexoid = index->indexrelid;
......
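The new plancat.c test skips an index whose `indcheckxmin` flag is set until the xmin of its pg_index row precedes the current `TransactionXmin`. It also marks the plan as transient so the plan is not kept once the index becomes usable. XIDs wrap around, so that comparison is circular. The sketch below is modeled on `TransactionIdPrecedes` in transam.c as this editor understands it: plain comparison for special XIDs below `FirstNormalTransactionId`, modulo-2^32 comparison otherwise. `toy_xid_precedes` and `toy_index_usable` are hypothetical names, and the real code reads xmin from `rd_indextuple` rather than taking it as a parameter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Assumption: XIDs 0..2 are special (invalid, bootstrap, frozen). */
#define FirstNormalTransactionId ((TransactionId) 3)
#define TransactionIdIsNormal(xid) ((xid) >= FirstNormalTransactionId)

/* Circular "id1 is older than id2" test, modeled on TransactionIdPrecedes. */
static bool
toy_xid_precedes(TransactionId id1, TransactionId id2)
{
    int32_t diff;

    /* Special XIDs compare as plain integers. */
    if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
        return id1 < id2;

    /* Modulo-2^32 difference: negative means id1 is "before" id2. */
    diff = (int32_t) (id1 - id2);
    return diff < 0;
}

/*
 * Hypothetical mirror of the get_relation_info() check: returns false (and
 * flags the plan as transient) while the index may not yet be used.
 */
static bool
toy_index_usable(bool indcheckxmin, TransactionId index_xmin,
                 TransactionId transaction_xmin, bool *transient_plan)
{
    if (indcheckxmin && !toy_xid_precedes(index_xmin, transaction_xmin))
    {
        *transient_plan = true;
        return false;
    }
    return true;
}
```

The transient flag is what `standard_planner` copies from `glob->transientPlan` into the finished plan, as shown in the planner.c hunk.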
......@@ -8,12 +8,13 @@
*
*
* IDENTIFICATION
* $PostgreSQL: pgsql/src/backend/optimizer/util/var.c,v 1.70 2007/06/11 01:16:23 tgl Exp $
* $PostgreSQL: pgsql/src/backend/optimizer/util/var.c,v 1.71 2007/09/20 17:56:31 tgl Exp $
*
*-------------------------------------------------------------------------
*/
#include "postgres.h"
#include "access/htup.h"
#include "optimizer/clauses.h"
#include "optimizer/prep.h"
#include "optimizer/var.h"
......@@ -54,6 +55,7 @@ typedef struct
static bool pull_varnos_walker(Node *node,
pull_varnos_context *context);
static bool pull_varattnos_walker(Node *node, Bitmapset **varattnos);
static bool contain_var_reference_walker(Node *node,
contain_var_reference_context *context);
static bool contain_var_clause_walker(Node *node, void *context);
......@@ -134,6 +136,47 @@ pull_varnos_walker(Node *node, pull_varnos_context *context)
(void *) context);
}
/*
* pull_varattnos
* Find all the distinct attribute numbers present in an expression tree,
* and add them to the initial contents of *varattnos.
* Only Vars that reference RTE 1 of rtable level zero are considered.
*
* Attribute numbers are offset by FirstLowInvalidHeapAttributeNumber so that
* we can include system attributes (e.g., OID) in the bitmap representation.
*
* Currently, this does not support subqueries or expressions containing
* references to multiple tables; neither is needed since it's only applied
* to index expressions and predicates.
*/
void
pull_varattnos(Node *node, Bitmapset **varattnos)
{
(void) pull_varattnos_walker(node, varattnos);
}
static bool
pull_varattnos_walker(Node *node, Bitmapset **varattnos)
{
if (node == NULL)
return false;
if (IsA(node, Var))
{
Var *var = (Var *) node;
Assert(var->varno == 1);
*varattnos = bms_add_member(*varattnos,
var->varattno - FirstLowInvalidHeapAttributeNumber);
return false;
}
/* Should not find a subquery or subplan */
Assert(!IsA(node, Query));
Assert(!is_subplan(node));
return expression_tree_walker(node, pull_varattnos_walker,
(void *) varattnos);
}
/*
* contain_var_reference
......
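`pull_varattnos` stores each attribute number offset by `FirstLowInvalidHeapAttributeNumber`, so system attributes with negative numbers (such as OID) still map to non-negative bitmap members. The commit message explains why the set of indexed columns matters: an update qualifies for HOT only when none of those columns change. The sketch below shows both ideas. Several parts are assumptions. The value -8 for `FirstLowInvalidHeapAttributeNumber` follows this editor's reading of 8.3-era access/htup.h. A 64-bit mask stands in for `Bitmapset`. `toy_add_attr` and `toy_hot_update_possible` are hypothetical. The real HOT check compares old and new values of each indexed attribute instead of intersecting with a "modified columns" set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef int16_t AttrNumber;

/* Assumption: system attributes use -1..-7 here, so the offset base is -8. */
#define FirstLowInvalidHeapAttributeNumber (-8)

/* Hypothetical stand-in for Bitmapset: bit k set means member k present. */
typedef uint64_t ToyAttrSet;

/* Add an attribute number, shifted so system attributes stay non-negative. */
static ToyAttrSet
toy_add_attr(ToyAttrSet set, AttrNumber attno)
{
    int member = attno - FirstLowInvalidHeapAttributeNumber;

    assert(member > 0 && member < 64);
    return set | ((ToyAttrSet) 1 << member);
}

/*
 * Simplified HOT eligibility test: true if no indexed attribute appears in
 * the set of attributes the update changed.
 */
static bool
toy_hot_update_possible(ToyAttrSet indexed, ToyAttrSet modified)
{
    return (indexed & modified) == 0;
}
```

With the assumed base of -8, user column 1 becomes member 9 and OID (-2) becomes member 6. An update touching only column 3 leaves the indexed set disjoint and could be HOT; one that also touches column 1 could not.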
......@@ -7,7 +7,7 @@
* Portions Copyright (c) 1996-2007, PostgreSQL Global Development Group
* Portions Copyright (c) 1994, Regents of the University of California
*
* $PostgreSQL: pgsql/src/include/access/relscan.h,v 1.56 2007/06/09 18:49:55 tgl Exp $
* $PostgreSQL: pgsql/src/include/access/relscan.h,v 1.57 2007/09/20 17:56:32 tgl Exp $
*
*-------------------------------------------------------------------------
*/
......@@ -82,6 +82,9 @@ typedef struct IndexScanDescData
HeapTupleData xs_ctup; /* current heap tuple, if any */
Buffer xs_cbuf; /* current heap buffer in scan, if any */
/* NB: if xs_cbuf is not InvalidBuffer, we hold a pin on that buffer */
TransactionId xs_prev_xmax; /* previous HOT chain member's XMAX, if any */
OffsetNumber xs_next_hot; /* next member of HOT chain, if any */
bool xs_hot_dead; /* T if all members of HOT chain are dead */
} IndexScanDescData;
typedef IndexScanDescData *IndexScanDesc;
......
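The three fields added to `IndexScanDescData` hold the state an index scan needs to walk a HOT chain. `xs_prev_xmax` is the previous member's XMAX, used to confirm the next member really is its successor. `xs_next_hot` is the next offset to visit, and `xs_hot_dead` records whether every member seen so far was dead. The sketch below is a simplified in-memory model of that walk, not the heapam.c/indexam.c code. `ToyTuple` and `follow_hot_chain` are hypothetical, and the `visible` and `dead` flags replace real MVCC visibility tests.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;
typedef uint16_t OffsetNumber;

#define InvalidTransactionId ((TransactionId) 0)
#define InvalidOffsetNumber  ((OffsetNumber) 0)

/* Hypothetical, heavily simplified heap tuple on one page. */
typedef struct ToyTuple
{
    TransactionId xmin;      /* inserting transaction */
    TransactionId xmax;      /* updating transaction, or invalid */
    OffsetNumber  next_hot;  /* next HOT chain member, or invalid */
    bool          visible;   /* stand-in for a snapshot visibility test */
    bool          dead;      /* stand-in for "dead to all transactions" */
} ToyTuple;

/*
 * Walk the chain starting at 'root' (page[] is indexed by 1-based offset;
 * page[0] is unused).  Returns the offset of the first visible member, or
 * InvalidOffsetNumber.  *all_dead is set only if every member seen is dead.
 */
static OffsetNumber
follow_hot_chain(const ToyTuple *page, OffsetNumber root, bool *all_dead)
{
    TransactionId prev_xmax = InvalidTransactionId;  /* like xs_prev_xmax */
    OffsetNumber  off = root;                        /* like xs_next_hot */

    assert(root != InvalidOffsetNumber);
    *all_dead = true;                                /* like xs_hot_dead */

    while (off != InvalidOffsetNumber)
    {
        const ToyTuple *tup = &page[off];

        /* Chain is broken if this tuple was not created by the prior updater. */
        if (prev_xmax != InvalidTransactionId && tup->xmin != prev_xmax)
            break;

        if (!tup->dead)
            *all_dead = false;
        if (tup->visible)
            return off;

        prev_xmax = tup->xmax;
        off = tup->next_hot;
    }
    return InvalidOffsetNumber;
}
```

Matching each member's xmin against the previous XMAX guards against following a link into a slot that has since been reused. The all-dead result indicates when the index entry pointing at the chain root could be treated as dead.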