Commit 9d37c038 authored by Tom Lane

Repair PANIC condition in hash indexes when a previous index extension attempt
failed (due to lock conflicts or out-of-space).  We might have already
extended the index's filesystem EOF before failing, causing the EOF to be
beyond what the metapage says is the last used page.  Hence the invariant
maintained by the code needs to be "EOF is at or beyond last used page",
not "EOF is exactly the last used page".  Problem was created by my patch
of 2006-11-19 that attempted to repair bug #2737.  Since that was
back-patched to 7.4, this needs to be as well.  Per report and test case
from Vlastimil Krejcir.
parent 77a41e71
-$PostgreSQL: pgsql/src/backend/access/hash/README,v 1.5 2007/01/09 07:30:49 tgl Exp $
+$PostgreSQL: pgsql/src/backend/access/hash/README,v 1.6 2007/04/19 20:24:04 tgl Exp $
This directory contains an implementation of hash indexing for Postgres. Most
of the core ideas are taken from Margo Seltzer and Ozan Yigit, A New Hashing
@@ -77,6 +77,18 @@ index, and preparing to allocate additional overflow pages after those
bucket pages. hashm_spares[] entries before S cannot change anymore,
since that would require moving already-created bucket pages.
The last page nominally used by the index is always determinable from
hashm_spares[S]. To avoid complaints from smgr, the logical EOF as seen by
the filesystem and smgr must always be greater than or equal to this page.
We have to allow the case "greater than" because it's possible that during
an index extension we crash after allocating filesystem space and before
updating the metapage. Note that on filesystems that allow "holes" in
files, it's entirely likely that pages before the logical EOF are not yet
allocated: when we allocate a new splitpoint's worth of bucket pages, we
physically zero the last such page to force the EOF up, and the first such
page will be used immediately, but the intervening pages are not written
until needed.
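
A minimal sketch of the allocation step just described, as it might look at the storage-manager level. This is illustrative only: the real helper lives in the collapsed hashpage.c diff further down, and the function name, the overflow guard, and the exact smgrextend() signature used here are assumptions based on the 8.2-era smgr API.

#include "postgres.h"
#include "storage/smgr.h"
#include "utils/rel.h"

/*
 * Hypothetical sketch: reserve filesystem space for one splitpoint's worth
 * of bucket pages by zero-filling only the *last* page, which forces the
 * smgr EOF past the whole range.  Intervening pages stay as "holes" until
 * they are actually used.
 */
static bool
sketch_alloc_buckets(Relation rel, BlockNumber firstblock, uint32 nblocks)
{
	BlockNumber	lastblock = firstblock + nblocks - 1;
	char		zerobuf[BLCKSZ];

	/* Give up if the block-number arithmetic would wrap around. */
	if (lastblock < firstblock || lastblock == InvalidBlockNumber)
		return false;

	MemSet(zerobuf, 0, sizeof(zerobuf));

	/* Writing the last page is enough to push EOF past all new pages. */
	RelationOpenSmgr(rel);
	smgrextend(rel->rd_smgr, lastblock, zerobuf, rel->rd_istemp);

	return true;
}

Note that a crash after this write but before the metapage update leaves the file longer than hashm_spares[] implies, which is exactly the "EOF at or beyond last used page" case the paragraph above now allows.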
Since overflow pages may be recycled if enough tuples are deleted from
their bucket, we need a way to keep track of currently-free overflow
pages. The state of each overflow page (0 = available, 1 = not available)
@@ -310,6 +322,10 @@ we can just error out without any great harm being done.
Free space management
---------------------
(Question: why is this so complicated? Why not just have a linked list
of free pages with the list head in the metapage? It's not like we
avoid needing to modify the metapage with all this.)
Free space management consists of two sub-algorithms, one for reserving
an overflow page to add to a bucket chain, and one for returning an empty
overflow page to the free pool.
......
@@ -8,7 +8,7 @@
*
*
* IDENTIFICATION
- * $PostgreSQL: pgsql/src/backend/access/hash/hashovfl.c,v 1.55 2007/04/09 22:03:57 tgl Exp $
+ * $PostgreSQL: pgsql/src/backend/access/hash/hashovfl.c,v 1.56 2007/04/19 20:24:04 tgl Exp $
*
* NOTES
* Overflow pages look like ordinary relation pages.
@@ -272,19 +272,12 @@ _hash_getovflpage(Relation rel, Buffer metabuf)
blkno = bitno_to_blkno(metap, bit);
/*
- * We have to fetch the page with P_NEW to ensure smgr's idea of the
+ * Fetch the page with _hash_getnewbuf to ensure smgr's idea of the
* relation length stays in sync with ours. XXX It's annoying to do this
* with metapage write lock held; would be better to use a lock that
- * doesn't block incoming searches. Best way to fix it would be to stop
- * maintaining hashm_spares[hashm_ovflpoint] and rely entirely on the
- * smgr relation length to track where new overflow pages come from;
- * then we could release the metapage before we do the smgrextend.
- * FIXME later (not in beta...)
+ * doesn't block incoming searches.
*/
- newbuf = _hash_getbuf(rel, P_NEW, HASH_WRITE);
- if (BufferGetBlockNumber(newbuf) != blkno)
-     elog(ERROR, "unexpected hash relation size: %u, should be %u",
-          BufferGetBlockNumber(newbuf), blkno);
+ newbuf = _hash_getnewbuf(rel, blkno, HASH_WRITE);
metap->hashm_spares[splitnum]++;
@@ -507,19 +500,14 @@ _hash_initbitmap(Relation rel, HashMetaPage metap, BlockNumber blkno)
/*
* It is okay to write-lock the new bitmap page while holding metapage
* write lock, because no one else could be contending for the new page.
- * Also, the metapage lock makes it safe to extend the index using P_NEW,
- * which we want to do to ensure the smgr's idea of the relation size
- * stays in step with ours.
+ * Also, the metapage lock makes it safe to extend the index using
+ * _hash_getnewbuf.
*
* There is some loss of concurrency in possibly doing I/O for the new
* page while holding the metapage lock, but this path is taken so seldom
* that it's not worth worrying about.
*/
- buf = _hash_getbuf(rel, P_NEW, HASH_WRITE);
- if (BufferGetBlockNumber(buf) != blkno)
-     elog(ERROR, "unexpected hash relation size: %u, should be %u",
-          BufferGetBlockNumber(buf), blkno);
+ buf = _hash_getnewbuf(rel, blkno, HASH_WRITE);
pg = BufferGetPage(buf);
/* initialize the page */
......
This diff is collapsed.
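
The collapsed diff is presumably hashpage.c, where the new _hash_getnewbuf() is defined. A hedged sketch of what such a helper might look like, reconstructed only from the calls visible in hashovfl.c above, the cross-check the old code performed, and the "EOF at or beyond last used page" invariant from the README; the name, the HASH_NOLOCK handling, and leaving page initialization to the caller are all assumptions, not the committed implementation.

#include "postgres.h"
#include "access/hash.h"
#include "storage/bufmgr.h"

/*
 * Hypothetical sketch of a _hash_getnewbuf-style helper.  Because a failed
 * extension attempt may already have pushed the filesystem EOF past blkno,
 * the requested block can either exist already (just read it) or be exactly
 * one past the current end (extend by one block).
 */
static Buffer
sketch_getnewbuf(Relation rel, BlockNumber blkno, int access)
{
	BlockNumber	nblocks = RelationGetNumberOfBlocks(rel);
	Buffer		buf;

	if (blkno == P_NEW)
		elog(ERROR, "hash AM does not use P_NEW");
	if (blkno > nblocks)
		elog(ERROR, "access to noninitialized page %u", blkno);

	if (blkno < nblocks)
	{
		/* A previous (possibly failed) extension already allocated it. */
		buf = ReadBuffer(rel, blkno);
	}
	else
	{
		/* Relation is exactly blkno blocks long: extend by one block. */
		buf = ReadBuffer(rel, P_NEW);
		if (BufferGetBlockNumber(buf) != blkno)
			elog(ERROR, "unexpected hash relation size: %u, should be %u",
				 BufferGetBlockNumber(buf), blkno);
	}

	if (access != HASH_NOLOCK)
		LockBuffer(buf, access);

	/* Caller (or the real helper) initializes the page before use. */
	return buf;
}

The key difference from the old P_NEW-only code path: when the EOF is already beyond blkno, the block is simply read back instead of tripping the size cross-check, which tolerates the state the commit message describes (EOF already beyond what the metapage says is the last used page).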
@@ -7,7 +7,7 @@
* Portions Copyright (c) 1996-2007, PostgreSQL Global Development Group
* Portions Copyright (c) 1994, Regents of the University of California
*
- * $PostgreSQL: pgsql/src/include/access/hash.h,v 1.78 2007/04/09 22:04:05 tgl Exp $
+ * $PostgreSQL: pgsql/src/include/access/hash.h,v 1.79 2007/04/19 20:24:04 tgl Exp $
*
* NOTES
* modeled after Margo Seltzer's hash implementation for unix.
@@ -284,6 +284,7 @@ extern void _hash_getlock(Relation rel, BlockNumber whichlock, int access);
extern bool _hash_try_getlock(Relation rel, BlockNumber whichlock, int access);
extern void _hash_droplock(Relation rel, BlockNumber whichlock, int access);
extern Buffer _hash_getbuf(Relation rel, BlockNumber blkno, int access);
extern Buffer _hash_getnewbuf(Relation rel, BlockNumber blkno, int access);
extern void _hash_relbuf(Relation rel, Buffer buf);
extern void _hash_dropbuf(Relation rel, Buffer buf);
extern void _hash_wrtbuf(Relation rel, Buffer buf);
......