Commit 5d508736 authored by Tom Lane

Replace the BufMgrLock with separate locks on the lookup hashtable and
the freelist, plus per-buffer spinlocks that protect access to individual
shared buffer headers.  This requires abandoning a global freelist (since
the freelist is a global contention point), which shoots down ARC and 2Q
as well as plain LRU management.  Adopt a clock sweep algorithm instead.
Preliminary results show substantial improvement in multi-backend situations.
parent 5592a6cf
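For readers unfamiliar with the clock sweep named in the commit message, here is a minimal single-process sketch modeled on the loop this commit adds to LocalBufferAlloc() in localbuf.c. The names SketchBuffer, sketch_buffers, next_victim, clock_sweep and the pool size are hypothetical. The shared-buffer version (which the localbuf.c comment attributes to freelist.c) also needs the locking the message describes; this sketch leaves that out entirely.

```c
#define SKETCH_NBUFFERS 4       /* hypothetical pool size, for illustration only */

typedef struct
{
    int refcount;               /* number of pins currently held */
    int usage_count;            /* recent-use counter, decayed by the sweep */
} SketchBuffer;

static SketchBuffer sketch_buffers[SKETCH_NBUFFERS];
static int next_victim = 0;     /* the "clock hand" */

/*
 * Advance the hand until an unpinned buffer with usage_count == 0 is found.
 * Returns its index, or -1 if a full pass finds nothing to decay or reclaim.
 */
static int
clock_sweep(void)
{
    int trycounter = SKETCH_NBUFFERS;

    for (;;)
    {
        int           b = next_victim;
        SketchBuffer *buf = &sketch_buffers[b];

        if (++next_victim >= SKETCH_NBUFFERS)
            next_victim = 0;

        if (buf->refcount == 0 && buf->usage_count == 0)
            return b;           /* victim found */

        if (buf->usage_count > 0)
        {
            buf->usage_count--; /* give it one less chance next time */
            trycounter = SKETCH_NBUFFERS;
        }
        else if (--trycounter == 0)
            return -1;          /* every buffer is pinned */
    }
}
```

Repeated calls to clock_sweep() hand out buffers whose usage_count has decayed to zero. A buffer whose count is bumped on release, as WriteLocalBuffer does in this commit, survives more passes of the hand than one that is touched only once.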
<!-- <!--
$PostgreSQL: pgsql/doc/src/sgml/runtime.sgml,v 1.306 2005/03/02 19:58:54 tgl Exp $ $PostgreSQL: pgsql/doc/src/sgml/runtime.sgml,v 1.307 2005/03/04 20:21:05 tgl Exp $
--> -->
<chapter Id="runtime"> <chapter Id="runtime">
...@@ -1379,9 +1379,7 @@ SET ENABLE_SEQSCAN TO OFF; ...@@ -1379,9 +1379,7 @@ SET ENABLE_SEQSCAN TO OFF;
Specifies the delay between activity rounds for the Specifies the delay between activity rounds for the
background writer. In each round the writer issues writes background writer. In each round the writer issues writes
for some number of dirty buffers (controllable by the for some number of dirty buffers (controllable by the
following parameters). The selected buffers will always be following parameters). It then sleeps for <varname>bgwriter_delay</>
the least recently used ones among the currently dirty
buffers. It then sleeps for <varname>bgwriter_delay</>
milliseconds, and repeats. The default value is 200. Note milliseconds, and repeats. The default value is 200. Note
that on many systems, the effective resolution of sleep that on many systems, the effective resolution of sleep
delays is 10 milliseconds; setting <varname>bgwriter_delay</> delays is 10 milliseconds; setting <varname>bgwriter_delay</>
...@@ -1393,32 +1391,77 @@ SET ENABLE_SEQSCAN TO OFF; ...@@ -1393,32 +1391,77 @@ SET ENABLE_SEQSCAN TO OFF;
</listitem> </listitem>
</varlistentry> </varlistentry>
<varlistentry id="guc-bgwriter-percent" xreflabel="bgwriter_percent"> <varlistentry id="guc-bgwriter-lru-percent" xreflabel="bgwriter_lru_percent">
<term><varname>bgwriter_percent</varname> (<type>integer</type>)</term> <term><varname>bgwriter_lru_percent</varname> (<type>floating point</type>)</term>
<indexterm> <indexterm>
<primary><varname>bgwriter_percent</> configuration parameter</primary> <primary><varname>bgwriter_lru_percent</> configuration parameter</primary>
</indexterm> </indexterm>
<listitem> <listitem>
<para> <para>
In each round, no more than this percentage of the currently To reduce the probability that server processes will need to issue
dirty buffers will be written (rounding up any fraction to their own writes, the background writer tries to write buffers that
the next whole number of buffers). The default value is are likely to be recycled soon. In each round, it examines up to
1. This option can only be set at server start or in the <varname>bgwriter_lru_percent</> of the buffers that are nearest to
being recycled, and writes any that are dirty.
The default value is 1.0 (this is a percentage of the total number
of shared buffers).
This option can only be set at server start or in the
<filename>postgresql.conf</filename> file.
</para>
</listitem>
</varlistentry>
<varlistentry id="guc-bgwriter-lru-maxpages" xreflabel="bgwriter_lru_maxpages">
<term><varname>bgwriter_lru_maxpages</varname> (<type>integer</type>)</term>
<indexterm>
<primary><varname>bgwriter_lru_maxpages</> configuration parameter</primary>
</indexterm>
<listitem>
<para>
In each round, no more than this many buffers will be written
as a result of scanning soon-to-be-recycled buffers.
The default value is 5.
This option can only be set at server start or in the
<filename>postgresql.conf</filename> file.
</para>
</listitem>
</varlistentry>
<varlistentry id="guc-bgwriter-all-percent" xreflabel="bgwriter_all_percent">
<term><varname>bgwriter_all_percent</varname> (<type>floating point</type>)</term>
<indexterm>
<primary><varname>bgwriter_all_percent</> configuration parameter</primary>
</indexterm>
<listitem>
<para>
To reduce the amount of work that will be needed at checkpoint time,
the background writer also does a circular scan through the entire
buffer pool, writing buffers that are found to be dirty.
In each round, it examines up to
<varname>bgwriter_all_percent</> of the buffers for this purpose.
The default value is 0.333 (this is a percentage of the total number
of shared buffers). With the default <varname>bgwriter_delay</>
setting, this will allow the entire shared buffer pool to be scanned
about once per minute.
This option can only be set at server start or in the
<filename>postgresql.conf</filename> file. <filename>postgresql.conf</filename> file.
</para> </para>
</listitem> </listitem>
</varlistentry> </varlistentry>
<varlistentry id="guc-bgwriter-maxpages" xreflabel="bgwriter_maxpages"> <varlistentry id="guc-bgwriter-all-maxpages" xreflabel="bgwriter_all_maxpages">
<term><varname>bgwriter_maxpages</varname> (<type>integer</type>)</term> <term><varname>bgwriter_all_maxpages</varname> (<type>integer</type>)</term>
<indexterm> <indexterm>
<primary><varname>bgwriter_maxpages</> configuration parameter</primary> <primary><varname>bgwriter_all_maxpages</> configuration parameter</primary>
</indexterm> </indexterm>
<listitem> <listitem>
<para> <para>
In each round, no more than this many dirty buffers will be In each round, no more than this many buffers will be written
written. The default value is 100. This option can only be as a result of the scan of the entire buffer pool. (If this
set at server start or in the limit is reached, the scan stops, and resumes at the next buffer
during the next round.)
The default value is 5.
This option can only be set at server start or in the
<filename>postgresql.conf</filename> file. <filename>postgresql.conf</filename> file.
</para> </para>
</listitem> </listitem>
...@@ -1426,13 +1469,19 @@ SET ENABLE_SEQSCAN TO OFF; ...@@ -1426,13 +1469,19 @@ SET ENABLE_SEQSCAN TO OFF;
</variablelist> </variablelist>
<para> <para>
Smaller values of <varname>bgwriter_percent</varname> and Smaller values of <varname>bgwriter_all_percent</varname> and
<varname>bgwriter_maxpages</varname> reduce the extra I/O load <varname>bgwriter_all_maxpages</varname> reduce the extra I/O load
caused by the background writer, but leave more work to be done caused by the background writer, but leave more work to be done
at checkpoint time. To reduce load spikes at checkpoints, at checkpoint time. To reduce load spikes at checkpoints,
increase the values. To disable background writing entirely, increase these two values.
set <varname>bgwriter_percent</varname> and/or Similarly, smaller values of <varname>bgwriter_lru_percent</varname> and
<varname>bgwriter_maxpages</varname> to zero. <varname>bgwriter_lru_maxpages</varname> reduce the extra I/O load
caused by the background writer, but make it more likely that server
processes will have to issue writes for themselves, delaying interactive
queries.
To disable background writing entirely,
set both <varname>maxpages</varname> values and/or both
<varname>percent</varname> values to zero.
</para> </para>
</sect3> </sect3>
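The "about once per minute" figure quoted for bgwriter_all_percent in the hunk above can be checked with a little arithmetic: at 0.333 percent per round it takes roughly 300 rounds to cover the pool, and with a 200 ms delay that is about 60 seconds. A hedged sketch follows; full_scan_seconds is a hypothetical helper, not part of the commit, and it ignores both the time spent writing and the bgwriter_all_maxpages cap, which can end a round early.

```c
/*
 * Estimate how long one full circular scan of the buffer pool takes.
 * all_percent: percentage of the pool examined per round
 * delay_ms:    sleep between rounds, in milliseconds
 * Returns seconds, or -1.0 if the scan is disabled (all_percent <= 0).
 */
static double
full_scan_seconds(double all_percent, int delay_ms)
{
    double rounds_per_scan;

    if (all_percent <= 0.0)
        return -1.0;

    rounds_per_scan = 100.0 / all_percent;      /* e.g. 100 / 0.333 ~= 300 */
    return rounds_per_scan * (delay_ms / 1000.0);
}
```

With the documented defaults, full_scan_seconds(0.333, 200) comes out at roughly 60 seconds, in line with the documentation text. Doubling bgwriter_delay doubles the estimate.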
...@@ -3866,20 +3915,6 @@ plruby.bar = true # generates error, unknown class name ...@@ -3866,20 +3915,6 @@ plruby.bar = true # generates error, unknown class name
</listitem> </listitem>
</varlistentry> </varlistentry>
<varlistentry id="guc-debug-shared-buffers" xreflabel="debug_shared_buffers">
<term><varname>debug_shared_buffers</varname> (<type>integer</type>)</term>
<indexterm>
<primary><varname>debug_shared_buffers</> configuration parameter</primary>
</indexterm>
<listitem>
<para>
Number of seconds between ARC reports.
If set greater than zero, emit ARC statistics to the log every so many
seconds. Zero (the default) disables reporting.
</para>
</listitem>
</varlistentry>
<varlistentry id="guc-pre-auth-delay" xreflabel="pre_auth_delay"> <varlistentry id="guc-pre-auth-delay" xreflabel="pre_auth_delay">
<term><varname>pre_auth_delay</varname> (<type>integer</type>)</term> <term><varname>pre_auth_delay</varname> (<type>integer</type>)</term>
<indexterm> <indexterm>
......
...@@ -8,7 +8,7 @@ ...@@ -8,7 +8,7 @@
* *
* *
* IDENTIFICATION * IDENTIFICATION
* $PostgreSQL: pgsql/src/backend/catalog/index.c,v 1.244 2005/01/10 20:02:19 tgl Exp $ * $PostgreSQL: pgsql/src/backend/catalog/index.c,v 1.245 2005/03/04 20:21:05 tgl Exp $
* *
* *
* INTERFACE ROUTINES * INTERFACE ROUTINES
...@@ -1060,7 +1060,6 @@ setRelhasindex(Oid relid, bool hasindex, bool isprimary, Oid reltoastidxid) ...@@ -1060,7 +1060,6 @@ setRelhasindex(Oid relid, bool hasindex, bool isprimary, Oid reltoastidxid)
/* Send out shared cache inval if necessary */ /* Send out shared cache inval if necessary */
if (!IsBootstrapProcessingMode()) if (!IsBootstrapProcessingMode())
CacheInvalidateHeapTuple(pg_class, tuple); CacheInvalidateHeapTuple(pg_class, tuple);
BufferSync(-1, -1);
} }
else if (dirty) else if (dirty)
{ {
......
...@@ -15,7 +15,7 @@ ...@@ -15,7 +15,7 @@
* *
* *
* IDENTIFICATION * IDENTIFICATION
* $PostgreSQL: pgsql/src/backend/commands/dbcommands.c,v 1.151 2005/02/26 18:43:33 tgl Exp $ * $PostgreSQL: pgsql/src/backend/commands/dbcommands.c,v 1.152 2005/03/04 20:21:05 tgl Exp $
* *
*------------------------------------------------------------------------- *-------------------------------------------------------------------------
*/ */
...@@ -339,7 +339,7 @@ createdb(const CreatedbStmt *stmt) ...@@ -339,7 +339,7 @@ createdb(const CreatedbStmt *stmt)
* up-to-date for the copy. (We really only need to flush buffers for * up-to-date for the copy. (We really only need to flush buffers for
* the source database, but bufmgr.c provides no API for that.) * the source database, but bufmgr.c provides no API for that.)
*/ */
BufferSync(-1, -1); BufferSync();
/* /*
* Close virtual file descriptors so the kernel has more available for * Close virtual file descriptors so the kernel has more available for
...@@ -1201,7 +1201,7 @@ dbase_redo(XLogRecPtr lsn, XLogRecord *record) ...@@ -1201,7 +1201,7 @@ dbase_redo(XLogRecPtr lsn, XLogRecord *record)
* up-to-date for the copy. (We really only need to flush buffers for * up-to-date for the copy. (We really only need to flush buffers for
* the source database, but bufmgr.c provides no API for that.) * the source database, but bufmgr.c provides no API for that.)
*/ */
BufferSync(-1, -1); BufferSync();
#ifndef WIN32 #ifndef WIN32
......
...@@ -13,7 +13,7 @@ ...@@ -13,7 +13,7 @@
* *
* *
* IDENTIFICATION * IDENTIFICATION
* $PostgreSQL: pgsql/src/backend/commands/vacuum.c,v 1.302 2005/02/26 18:43:33 tgl Exp $ * $PostgreSQL: pgsql/src/backend/commands/vacuum.c,v 1.303 2005/03/04 20:21:06 tgl Exp $
* *
*------------------------------------------------------------------------- *-------------------------------------------------------------------------
*/ */
...@@ -36,7 +36,6 @@ ...@@ -36,7 +36,6 @@
#include "commands/vacuum.h" #include "commands/vacuum.h"
#include "executor/executor.h" #include "executor/executor.h"
#include "miscadmin.h" #include "miscadmin.h"
#include "storage/buf_internals.h"
#include "storage/freespace.h" #include "storage/freespace.h"
#include "storage/sinval.h" #include "storage/sinval.h"
#include "storage/smgr.h" #include "storage/smgr.h"
......
...@@ -37,7 +37,7 @@ ...@@ -37,7 +37,7 @@
* *
* *
* IDENTIFICATION * IDENTIFICATION
* $PostgreSQL: pgsql/src/backend/postmaster/bgwriter.c,v 1.14 2005/02/19 23:16:15 tgl Exp $ * $PostgreSQL: pgsql/src/backend/postmaster/bgwriter.c,v 1.15 2005/03/04 20:21:06 tgl Exp $
* *
*------------------------------------------------------------------------- *-------------------------------------------------------------------------
*/ */
...@@ -116,9 +116,6 @@ static BgWriterShmemStruct *BgWriterShmem; ...@@ -116,9 +116,6 @@ static BgWriterShmemStruct *BgWriterShmem;
* GUC parameters * GUC parameters
*/ */
int BgWriterDelay = 200; int BgWriterDelay = 200;
int BgWriterPercent = 1;
int BgWriterMaxPages = 100;
int CheckPointTimeout = 300; int CheckPointTimeout = 300;
int CheckPointWarning = 30; int CheckPointWarning = 30;
...@@ -274,7 +271,6 @@ BackgroundWriterMain(void) ...@@ -274,7 +271,6 @@ BackgroundWriterMain(void)
bool force_checkpoint = false; bool force_checkpoint = false;
time_t now; time_t now;
int elapsed_secs; int elapsed_secs;
int n;
long udelay; long udelay;
/* /*
...@@ -365,16 +361,13 @@ BackgroundWriterMain(void) ...@@ -365,16 +361,13 @@ BackgroundWriterMain(void)
* checkpoints happen at a predictable spacing. * checkpoints happen at a predictable spacing.
*/ */
last_checkpoint_time = now; last_checkpoint_time = now;
/* Nap for configured time before rechecking */
n = 1;
} }
else else
n = BufferSync(BgWriterPercent, BgWriterMaxPages); BgBufferSync();
/* /*
* Nap for the configured time or sleep for 10 seconds if there * Nap for the configured time, or sleep for 10 seconds if there
* was nothing to do at all. * is no bgwriter activity configured.
* *
* On some platforms, signals won't interrupt the sleep. To ensure * On some platforms, signals won't interrupt the sleep. To ensure
* we respond reasonably promptly when someone signals us, break * we respond reasonably promptly when someone signals us, break
...@@ -383,7 +376,11 @@ BackgroundWriterMain(void) ...@@ -383,7 +376,11 @@ BackgroundWriterMain(void)
* *
* We absorb pending requests after each short sleep. * We absorb pending requests after each short sleep.
*/ */
udelay = ((n > 0) ? BgWriterDelay : 10000) * 1000L; if ((bgwriter_all_percent > 0.0 && bgwriter_all_maxpages > 0) ||
(bgwriter_lru_percent > 0.0 && bgwriter_lru_maxpages > 0))
udelay = BgWriterDelay * 1000L;
else
udelay = 10000000L;
while (udelay > 1000000L) while (udelay > 1000000L)
{ {
if (got_SIGHUP || checkpoint_requested || shutdown_requested) if (got_SIGHUP || checkpoint_requested || shutdown_requested)
......
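The new nap-time rule in BackgroundWriterMain() can be read in isolation: the writer sleeps for bgwriter_delay only if at least one of the two scans is enabled, and otherwise falls back to 10 seconds. The sketch below restates that condition as a standalone function; bgwriter_nap_usec and its parameter names are hypothetical, and the real code reads the GUC variables directly.

```c
/*
 * Return the background writer's nap time in microseconds.
 * A scan counts as enabled only when both its percent and its
 * maxpages setting are greater than zero.
 */
static long
bgwriter_nap_usec(int delay_ms,
                  double all_percent, int all_maxpages,
                  double lru_percent, int lru_maxpages)
{
    if ((all_percent > 0.0 && all_maxpages > 0) ||
        (lru_percent > 0.0 && lru_maxpages > 0))
        return delay_ms * 1000L;    /* normal case: configured delay */

    return 10000000L;               /* nothing to do: sleep 10 seconds */
}
```

This matches the documentation change in runtime.sgml: setting both maxpages values, or both percent values, to zero leaves neither scan enabled, so the writer only wakes every 10 seconds.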
...@@ -8,7 +8,7 @@ ...@@ -8,7 +8,7 @@
* *
* *
* IDENTIFICATION * IDENTIFICATION
* $PostgreSQL: pgsql/src/backend/storage/buffer/buf_init.c,v 1.71 2005/02/03 23:29:11 tgl Exp $ * $PostgreSQL: pgsql/src/backend/storage/buffer/buf_init.c,v 1.72 2005/03/04 20:21:06 tgl Exp $
* *
*------------------------------------------------------------------------- *-------------------------------------------------------------------------
*/ */
...@@ -22,6 +22,8 @@ BufferDesc *BufferDescriptors; ...@@ -22,6 +22,8 @@ BufferDesc *BufferDescriptors;
Block *BufferBlockPointers; Block *BufferBlockPointers;
int32 *PrivateRefCount; int32 *PrivateRefCount;
static char *BufferBlocks;
/* statistics counters */ /* statistics counters */
long int ReadBufferCount; long int ReadBufferCount;
long int ReadLocalBufferCount; long int ReadLocalBufferCount;
...@@ -50,16 +52,11 @@ long int LocalBufferFlushCount; ...@@ -50,16 +52,11 @@ long int LocalBufferFlushCount;
* *
* Synchronization/Locking: * Synchronization/Locking:
* *
* BufMgrLock lock -- must be acquired before manipulating the
* buffer search datastructures (lookup/freelist, as well as the
* flag bits of any buffer). Must be released
* before exit and before doing any IO.
*
* IO_IN_PROGRESS -- this is a flag in the buffer descriptor. * IO_IN_PROGRESS -- this is a flag in the buffer descriptor.
* It must be set when an IO is initiated and cleared at * It must be set when an IO is initiated and cleared at
* the end of the IO. It is there to make sure that one * the end of the IO. It is there to make sure that one
* process doesn't start to use a buffer while another is * process doesn't start to use a buffer while another is
* faulting it in. see IOWait/IOSignal. * faulting it in. see WaitIO and related routines.
* *
* refcount -- Counts the number of processes holding pins on a buffer. * refcount -- Counts the number of processes holding pins on a buffer.
* A buffer is pinned during IO and immediately after a BufferAlloc(). * A buffer is pinned during IO and immediately after a BufferAlloc().
...@@ -85,10 +82,8 @@ long int LocalBufferFlushCount; ...@@ -85,10 +82,8 @@ long int LocalBufferFlushCount;
void void
InitBufferPool(void) InitBufferPool(void)
{ {
char *BufferBlocks;
bool foundBufs, bool foundBufs,
foundDescs; foundDescs;
int i;
BufferDescriptors = (BufferDesc *) BufferDescriptors = (BufferDesc *)
ShmemInitStruct("Buffer Descriptors", ShmemInitStruct("Buffer Descriptors",
...@@ -102,52 +97,42 @@ InitBufferPool(void) ...@@ -102,52 +97,42 @@ InitBufferPool(void)
{ {
/* both should be present or neither */ /* both should be present or neither */
Assert(foundDescs && foundBufs); Assert(foundDescs && foundBufs);
/* note: this path is only taken in EXEC_BACKEND case */
} }
else else
{ {
BufferDesc *buf; BufferDesc *buf;
char *block; int i;
/*
* It's probably not really necessary to grab the lock --- if
* there's anyone else attached to the shmem at this point, we've
* got problems.
*/
LWLockAcquire(BufMgrLock, LW_EXCLUSIVE);
buf = BufferDescriptors; buf = BufferDescriptors;
block = BufferBlocks;
/* /*
* Initialize all the buffer headers. * Initialize all the buffer headers.
*/ */
for (i = 0; i < NBuffers; block += BLCKSZ, buf++, i++) for (i = 0; i < NBuffers; buf++, i++)
{ {
Assert(ShmemIsValid((unsigned long) block)); CLEAR_BUFFERTAG(buf->tag);
buf->flags = 0;
buf->usage_count = 0;
buf->refcount = 0;
buf->wait_backend_id = 0;
/* SpinLockInit(&buf->buf_hdr_lock);
* The bufNext fields link together all totally-unused buffers.
* Subsequent management of this list is done by
* StrategyGetBuffer().
*/
buf->bufNext = i + 1;
CLEAR_BUFFERTAG(buf->tag);
buf->buf_id = i; buf->buf_id = i;
buf->data = MAKE_OFFSET(block); /*
buf->flags = 0; * Initially link all the buffers together as unused.
buf->refcount = 0; * Subsequent management of this list is done by freelist.c.
*/
buf->freeNext = i + 1;
buf->io_in_progress_lock = LWLockAssign(); buf->io_in_progress_lock = LWLockAssign();
buf->cntx_lock = LWLockAssign(); buf->content_lock = LWLockAssign();
buf->cntxDirty = false;
buf->wait_backend_id = 0;
} }
/* Correct last entry of linked list */ /* Correct last entry of linked list */
BufferDescriptors[NBuffers - 1].bufNext = -1; BufferDescriptors[NBuffers - 1].freeNext = FREENEXT_END_OF_LIST;
LWLockRelease(BufMgrLock);
} }
/* Init other shared buffer-management stuff */ /* Init other shared buffer-management stuff */
...@@ -162,12 +147,13 @@ InitBufferPool(void) ...@@ -162,12 +147,13 @@ InitBufferPool(void)
* buffer pool. * buffer pool.
* *
* NB: this is called before InitProcess(), so we do not have a PGPROC and * NB: this is called before InitProcess(), so we do not have a PGPROC and
* cannot do LWLockAcquire; hence we can't actually access the bufmgr's * cannot do LWLockAcquire; hence we can't actually access stuff in
* shared memory yet. We are only initializing local data here. * shared memory yet. We are only initializing local data here.
*/ */
void void
InitBufferPoolAccess(void) InitBufferPoolAccess(void)
{ {
char *block;
int i; int i;
/* /*
...@@ -179,12 +165,18 @@ InitBufferPoolAccess(void) ...@@ -179,12 +165,18 @@ InitBufferPoolAccess(void)
sizeof(*PrivateRefCount)); sizeof(*PrivateRefCount));
/* /*
* Convert shmem offsets into addresses as seen by this process. This * Construct addresses for the individual buffer data blocks. We do
* is just to speed up the BufferGetBlock() macro. It is OK to do this * this just to speed up the BufferGetBlock() macro. (Since the
* without any lock since the data pointers never change. * addresses should be the same in every backend, we could inherit
* this data from the postmaster --- but in the EXEC_BACKEND case
* that doesn't work.)
*/ */
block = BufferBlocks;
for (i = 0; i < NBuffers; i++) for (i = 0; i < NBuffers; i++)
BufferBlockPointers[i] = (Block) MAKE_PTR(BufferDescriptors[i].data); {
BufferBlockPointers[i] = (Block) block;
block += BLCKSZ;
}
} }
/* /*
......
...@@ -3,12 +3,9 @@ ...@@ -3,12 +3,9 @@
* buf_table.c * buf_table.c
* routines for mapping BufferTags to buffer indexes. * routines for mapping BufferTags to buffer indexes.
* *
* NOTE: this module is called only by freelist.c, and the "buffer IDs" * Note: the routines in this file do no locking of their own. The caller
* it deals with are whatever freelist.c needs them to be; they may not be * must hold a suitable lock on the BufMappingLock, as specified in the
* directly equivalent to Buffer numbers. * comments.
*
* Note: all routines in this file assume that the BufMgrLock is held
* by the caller, so no synchronization is needed.
* *
* *
* Portions Copyright (c) 1996-2005, PostgreSQL Global Development Group * Portions Copyright (c) 1996-2005, PostgreSQL Global Development Group
...@@ -16,7 +13,7 @@ ...@@ -16,7 +13,7 @@
* *
* *
* IDENTIFICATION * IDENTIFICATION
* $PostgreSQL: pgsql/src/backend/storage/buffer/buf_table.c,v 1.39 2005/02/03 23:29:11 tgl Exp $ * $PostgreSQL: pgsql/src/backend/storage/buffer/buf_table.c,v 1.40 2005/03/04 20:21:06 tgl Exp $
* *
*------------------------------------------------------------------------- *-------------------------------------------------------------------------
*/ */
...@@ -74,17 +71,17 @@ InitBufTable(int size) ...@@ -74,17 +71,17 @@ InitBufTable(int size)
/* /*
* BufTableLookup * BufTableLookup
* Lookup the given BufferTag; return buffer ID, or -1 if not found * Lookup the given BufferTag; return buffer ID, or -1 if not found
*
* Caller must hold at least share lock on BufMappingLock
*/ */
int int
BufTableLookup(BufferTag *tagPtr) BufTableLookup(BufferTag *tagPtr)
{ {
BufferLookupEnt *result; BufferLookupEnt *result;
if (tagPtr->blockNum == P_NEW)
return -1;
result = (BufferLookupEnt *) result = (BufferLookupEnt *)
hash_search(SharedBufHash, (void *) tagPtr, HASH_FIND, NULL); hash_search(SharedBufHash, (void *) tagPtr, HASH_FIND, NULL);
if (!result) if (!result)
return -1; return -1;
...@@ -93,14 +90,23 @@ BufTableLookup(BufferTag *tagPtr) ...@@ -93,14 +90,23 @@ BufTableLookup(BufferTag *tagPtr)
/* /*
* BufTableInsert * BufTableInsert
* Insert a hashtable entry for given tag and buffer ID * Insert a hashtable entry for given tag and buffer ID,
* unless an entry already exists for that tag
*
* Returns -1 on successful insertion. If a conflicting entry exists
* already, returns the buffer ID in that entry.
*
* Caller must hold write lock on BufMappingLock
*/ */
void int
BufTableInsert(BufferTag *tagPtr, int buf_id) BufTableInsert(BufferTag *tagPtr, int buf_id)
{ {
BufferLookupEnt *result; BufferLookupEnt *result;
bool found; bool found;
Assert(buf_id >= 0); /* -1 is reserved for not-in-table */
Assert(tagPtr->blockNum != P_NEW); /* invalid tag */
result = (BufferLookupEnt *) result = (BufferLookupEnt *)
hash_search(SharedBufHash, (void *) tagPtr, HASH_ENTER, &found); hash_search(SharedBufHash, (void *) tagPtr, HASH_ENTER, &found);
...@@ -109,15 +115,19 @@ BufTableInsert(BufferTag *tagPtr, int buf_id) ...@@ -109,15 +115,19 @@ BufTableInsert(BufferTag *tagPtr, int buf_id)
(errcode(ERRCODE_OUT_OF_MEMORY), (errcode(ERRCODE_OUT_OF_MEMORY),
errmsg("out of shared memory"))); errmsg("out of shared memory")));
if (found) /* found something already in the table? */ if (found) /* found something already in the table */
elog(ERROR, "shared buffer hash table corrupted"); return result->id;
result->id = buf_id; result->id = buf_id;
return -1;
} }
/* /*
* BufTableDelete * BufTableDelete
* Delete the hashtable entry for given tag (which must exist) * Delete the hashtable entry for given tag (which must exist)
*
* Caller must hold write lock on BufMappingLock
*/ */
void void
BufTableDelete(BufferTag *tagPtr) BufTableDelete(BufferTag *tagPtr)
......
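The changed contract of BufTableInsert() (return -1 on success, or the existing buffer ID when the tag is already present, instead of raising an error) is an "insert unless present" pattern. The toy table below illustrates only that return convention. ToyEntry, toy_table and toy_insert are hypothetical, the key is a plain int rather than a BufferTag, and the -2 "table full" result is an invention of this sketch; the real function reports out-of-memory through ereport() and expects the caller to hold BufMappingLock.

```c
#define TOY_TABLE_SIZE 8        /* hypothetical capacity */

typedef struct
{
    int used;                   /* slot occupied? */
    int key;                    /* stands in for a BufferTag */
    int id;                     /* buffer ID stored for this key */
} ToyEntry;

static ToyEntry toy_table[TOY_TABLE_SIZE];

/*
 * Insert (key, buf_id) unless key is already present.
 * Returns -1 on successful insertion, the existing id on conflict,
 * or -2 if the toy table is full (sketch-only convention).
 */
static int
toy_insert(int key, int buf_id)
{
    int free_slot = -1;
    int i;

    for (i = 0; i < TOY_TABLE_SIZE; i++)
    {
        if (toy_table[i].used && toy_table[i].key == key)
            return toy_table[i].id;     /* conflict: report existing entry */
        if (!toy_table[i].used && free_slot < 0)
            free_slot = i;              /* remember first empty slot */
    }

    if (free_slot < 0)
        return -2;

    toy_table[free_slot].used = 1;
    toy_table[free_slot].key = key;
    toy_table[free_slot].id = buf_id;
    return -1;
}
```

Returning the conflicting ID lets a caller that lost a race reuse the buffer another process has already installed for the same tag, rather than treating the collision as table corruption.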
...@@ -9,7 +9,7 @@ ...@@ -9,7 +9,7 @@
* *
* *
* IDENTIFICATION * IDENTIFICATION
* $PostgreSQL: pgsql/src/backend/storage/buffer/localbuf.c,v 1.62 2005/01/10 20:02:21 tgl Exp $ * $PostgreSQL: pgsql/src/backend/storage/buffer/localbuf.c,v 1.63 2005/03/04 20:21:06 tgl Exp $
* *
*------------------------------------------------------------------------- *-------------------------------------------------------------------------
*/ */
...@@ -24,6 +24,10 @@ ...@@ -24,6 +24,10 @@
/*#define LBDEBUG*/ /*#define LBDEBUG*/
/* Note: this macro only works on local buffers, not shared ones! */
#define LocalBufHdrGetBlock(bufHdr) \
LocalBufferBlockPointers[-((bufHdr)->buf_id + 2)]
/* should be a GUC parameter some day */ /* should be a GUC parameter some day */
int NLocBuffer = 64; int NLocBuffer = 64;
...@@ -39,7 +43,7 @@ static int nextFreeLocalBuf = 0; ...@@ -39,7 +43,7 @@ static int nextFreeLocalBuf = 0;
* allocate a local buffer. We do round robin allocation for now. * allocate a local buffer. We do round robin allocation for now.
* *
* API is similar to bufmgr.c's BufferAlloc, except that we do not need * API is similar to bufmgr.c's BufferAlloc, except that we do not need
* to have the BufMgrLock since this is all local. Also, IO_IN_PROGRESS * to do any locking since this is all local. Also, IO_IN_PROGRESS
* does not get set. * does not get set.
*/ */
BufferDesc * BufferDesc *
...@@ -47,11 +51,12 @@ LocalBufferAlloc(Relation reln, BlockNumber blockNum, bool *foundPtr) ...@@ -47,11 +51,12 @@ LocalBufferAlloc(Relation reln, BlockNumber blockNum, bool *foundPtr)
{ {
BufferTag newTag; /* identity of requested block */ BufferTag newTag; /* identity of requested block */
int i; int i;
int trycounter;
BufferDesc *bufHdr; BufferDesc *bufHdr;
INIT_BUFFERTAG(newTag, reln, blockNum); INIT_BUFFERTAG(newTag, reln, blockNum);
/* a low tech search for now -- not optimized for scans */ /* a low tech search for now -- should use a hashtable */
for (i = 0; i < NLocBuffer; i++) for (i = 0; i < NLocBuffer; i++)
{ {
bufHdr = &LocalBufferDescriptors[i]; bufHdr = &LocalBufferDescriptors[i];
...@@ -81,32 +86,44 @@ LocalBufferAlloc(Relation reln, BlockNumber blockNum, bool *foundPtr) ...@@ -81,32 +86,44 @@ LocalBufferAlloc(Relation reln, BlockNumber blockNum, bool *foundPtr)
RelationGetRelid(reln), blockNum, -nextFreeLocalBuf - 1); RelationGetRelid(reln), blockNum, -nextFreeLocalBuf - 1);
#endif #endif
/* need to get a new buffer (round robin for now) */ /*
bufHdr = NULL; * Need to get a new buffer. We use a clock sweep algorithm
for (i = 0; i < NLocBuffer; i++) * (essentially the same as what freelist.c does now...)
*/
trycounter = NLocBuffer;
for (;;)
{ {
int b = (nextFreeLocalBuf + i) % NLocBuffer; int b = nextFreeLocalBuf;
if (++nextFreeLocalBuf >= NLocBuffer)
nextFreeLocalBuf = 0;
if (LocalRefCount[b] == 0) bufHdr = &LocalBufferDescriptors[b];
if (LocalRefCount[b] == 0 && bufHdr->usage_count == 0)
{ {
bufHdr = &LocalBufferDescriptors[b];
LocalRefCount[b]++; LocalRefCount[b]++;
ResourceOwnerRememberBuffer(CurrentResourceOwner, ResourceOwnerRememberBuffer(CurrentResourceOwner,
BufferDescriptorGetBuffer(bufHdr)); BufferDescriptorGetBuffer(bufHdr));
nextFreeLocalBuf = (b + 1) % NLocBuffer;
break; break;
} }
if (bufHdr->usage_count > 0)
{
bufHdr->usage_count--;
trycounter = NLocBuffer;
}
else if (--trycounter == 0)
ereport(ERROR,
(errcode(ERRCODE_INSUFFICIENT_RESOURCES),
errmsg("no empty local buffer available")));
} }
if (bufHdr == NULL)
ereport(ERROR,
(errcode(ERRCODE_INSUFFICIENT_RESOURCES),
errmsg("no empty local buffer available")));
/* /*
* this buffer is not referenced but it might still be dirty. if * this buffer is not referenced but it might still be dirty. if
* that's the case, write it out before reusing it! * that's the case, write it out before reusing it!
*/ */
if (bufHdr->flags & BM_DIRTY || bufHdr->cntxDirty) if (bufHdr->flags & BM_DIRTY)
{ {
SMgrRelation oreln; SMgrRelation oreln;
...@@ -116,7 +133,7 @@ LocalBufferAlloc(Relation reln, BlockNumber blockNum, bool *foundPtr) ...@@ -116,7 +133,7 @@ LocalBufferAlloc(Relation reln, BlockNumber blockNum, bool *foundPtr)
/* And write... */ /* And write... */
smgrwrite(oreln, smgrwrite(oreln,
bufHdr->tag.blockNum, bufHdr->tag.blockNum,
(char *) MAKE_PTR(bufHdr->data), (char *) LocalBufHdrGetBlock(bufHdr),
true); true);
LocalBufferFlushCount++; LocalBufferFlushCount++;
...@@ -129,7 +146,7 @@ LocalBufferAlloc(Relation reln, BlockNumber blockNum, bool *foundPtr) ...@@ -129,7 +146,7 @@ LocalBufferAlloc(Relation reln, BlockNumber blockNum, bool *foundPtr)
* use, so it's okay to do it (and possibly error out) before marking * use, so it's okay to do it (and possibly error out) before marking
* the buffer as not dirty. * the buffer as not dirty.
*/ */
if (bufHdr->data == (SHMEM_OFFSET) 0) if (LocalBufHdrGetBlock(bufHdr) == NULL)
{ {
char *data = (char *) malloc(BLCKSZ); char *data = (char *) malloc(BLCKSZ);
...@@ -138,17 +155,10 @@ LocalBufferAlloc(Relation reln, BlockNumber blockNum, bool *foundPtr) ...@@ -138,17 +155,10 @@ LocalBufferAlloc(Relation reln, BlockNumber blockNum, bool *foundPtr)
(errcode(ERRCODE_OUT_OF_MEMORY), (errcode(ERRCODE_OUT_OF_MEMORY),
errmsg("out of memory"))); errmsg("out of memory")));
/*
* This is a bit of a hack: bufHdr->data needs to be a shmem
* offset for consistency with the shared-buffer case, so make it
* one even though it's not really a valid shmem offset.
*/
bufHdr->data = MAKE_OFFSET(data);
/* /*
* Set pointer for use by BufferGetBlock() macro. * Set pointer for use by BufferGetBlock() macro.
*/ */
LocalBufferBlockPointers[-(bufHdr->buf_id + 2)] = (Block) data; LocalBufHdrGetBlock(bufHdr) = (Block) data;
} }
/* /*
...@@ -156,7 +166,8 @@ LocalBufferAlloc(Relation reln, BlockNumber blockNum, bool *foundPtr) ...@@ -156,7 +166,8 @@ LocalBufferAlloc(Relation reln, BlockNumber blockNum, bool *foundPtr)
*/ */
bufHdr->tag = newTag; bufHdr->tag = newTag;
bufHdr->flags &= ~(BM_VALID | BM_DIRTY | BM_JUST_DIRTIED | BM_IO_ERROR); bufHdr->flags &= ~(BM_VALID | BM_DIRTY | BM_JUST_DIRTIED | BM_IO_ERROR);
bufHdr->cntxDirty = false; bufHdr->flags |= BM_TAG_VALID;
bufHdr->usage_count = 0;
*foundPtr = FALSE; *foundPtr = FALSE;
return bufHdr; return bufHdr;
...@@ -170,6 +181,7 @@ void ...@@ -170,6 +181,7 @@ void
WriteLocalBuffer(Buffer buffer, bool release) WriteLocalBuffer(Buffer buffer, bool release)
{ {
int bufid; int bufid;
BufferDesc *bufHdr;
Assert(BufferIsLocal(buffer)); Assert(BufferIsLocal(buffer));
...@@ -178,12 +190,18 @@ WriteLocalBuffer(Buffer buffer, bool release) ...@@ -178,12 +190,18 @@ WriteLocalBuffer(Buffer buffer, bool release)
#endif #endif
bufid = -(buffer + 1); bufid = -(buffer + 1);
LocalBufferDescriptors[bufid].flags |= BM_DIRTY;
Assert(LocalRefCount[bufid] > 0);
bufHdr = &LocalBufferDescriptors[bufid];
bufHdr->flags |= BM_DIRTY;
if (release) if (release)
{ {
Assert(LocalRefCount[bufid] > 0);
LocalRefCount[bufid]--; LocalRefCount[bufid]--;
if (LocalRefCount[bufid] == 0 &&
bufHdr->usage_count < BM_MAX_USAGE_COUNT)
bufHdr->usage_count++;
ResourceOwnerForgetBuffer(CurrentResourceOwner, buffer); ResourceOwnerForgetBuffer(CurrentResourceOwner, buffer);
} }
} }
......
...@@ -10,7 +10,7 @@ ...@@ -10,7 +10,7 @@
* Written by Peter Eisentraut <peter_e@gmx.net>. * Written by Peter Eisentraut <peter_e@gmx.net>.
* *
* IDENTIFICATION * IDENTIFICATION
* $PostgreSQL: pgsql/src/backend/utils/misc/guc.c,v 1.253 2005/03/01 20:23:34 tgl Exp $ * $PostgreSQL: pgsql/src/backend/utils/misc/guc.c,v 1.254 2005/03/04 20:21:06 tgl Exp $
* *
*-------------------------------------------------------------------- *--------------------------------------------------------------------
*/ */
...@@ -77,7 +77,6 @@ extern bool Log_disconnections; ...@@ -77,7 +77,6 @@ extern bool Log_disconnections;
extern DLLIMPORT bool check_function_bodies; extern DLLIMPORT bool check_function_bodies;
extern int CommitDelay; extern int CommitDelay;
extern int CommitSiblings; extern int CommitSiblings;
extern int DebugSharedBuffers;
extern char *default_tablespace; extern char *default_tablespace;
static const char *assign_log_destination(const char *value, static const char *assign_log_destination(const char *value,
...@@ -1230,15 +1229,6 @@ static struct config_int ConfigureNamesInt[] = ...@@ -1230,15 +1229,6 @@ static struct config_int ConfigureNamesInt[] =
-1, -1, INT_MAX / 1000, NULL, NULL -1, -1, INT_MAX / 1000, NULL, NULL
}, },
{
{"debug_shared_buffers", PGC_POSTMASTER, STATS_MONITORING,
gettext_noop("Interval to report shared buffer status in seconds"),
NULL
},
&DebugSharedBuffers,
0, 0, 600, NULL, NULL
},
{ {
{"bgwriter_delay", PGC_SIGHUP, RESOURCES, {"bgwriter_delay", PGC_SIGHUP, RESOURCES,
gettext_noop("Background writer sleep time between rounds in milliseconds"), gettext_noop("Background writer sleep time between rounds in milliseconds"),
...@@ -1249,21 +1239,21 @@ static struct config_int ConfigureNamesInt[] = ...@@ -1249,21 +1239,21 @@ static struct config_int ConfigureNamesInt[] =
}, },
{ {
{"bgwriter_percent", PGC_SIGHUP, RESOURCES, {"bgwriter_lru_maxpages", PGC_SIGHUP, RESOURCES,
gettext_noop("Background writer percentage of dirty buffers to flush per round"), gettext_noop("Background writer maximum number of LRU pages to flush per round"),
NULL NULL
}, },
&BgWriterPercent, &bgwriter_lru_maxpages,
1, 0, 100, NULL, NULL 5, 0, 1000, NULL, NULL
}, },
{ {
{"bgwriter_maxpages", PGC_SIGHUP, RESOURCES, {"bgwriter_all_maxpages", PGC_SIGHUP, RESOURCES,
gettext_noop("Background writer maximum number of pages to flush per round"), gettext_noop("Background writer maximum number of all pages to flush per round"),
NULL NULL
}, },
&BgWriterMaxPages, &bgwriter_all_maxpages,
100, 0, 1000, NULL, NULL 5, 0, 1000, NULL, NULL
}, },
{ {
...@@ -1394,6 +1384,24 @@ static struct config_real ConfigureNamesReal[] = ...@@ -1394,6 +1384,24 @@ static struct config_real ConfigureNamesReal[] =
MAX_GEQO_SELECTION_BIAS, NULL, NULL MAX_GEQO_SELECTION_BIAS, NULL, NULL
}, },
{
{"bgwriter_lru_percent", PGC_SIGHUP, RESOURCES,
gettext_noop("Background writer percentage of LRU buffers to flush per round"),
NULL
},
&bgwriter_lru_percent,
1.0, 0.0, 100.0, NULL, NULL
},
{
{"bgwriter_all_percent", PGC_SIGHUP, RESOURCES,
gettext_noop("Background writer percentage of all buffers to flush per round"),
NULL
},
&bgwriter_all_percent,
0.333, 0.0, 100.0, NULL, NULL
},
{ {
{"seed", PGC_USERSET, UNGROUPED, {"seed", PGC_USERSET, UNGROUPED,
gettext_noop("Sets the seed for random-number generation."), gettext_noop("Sets the seed for random-number generation."),
......
...@@ -99,8 +99,10 @@ ...@@ -99,8 +99,10 @@
# - Background writer - # - Background writer -
#bgwriter_delay = 200 # 10-10000 milliseconds between rounds #bgwriter_delay = 200 # 10-10000 milliseconds between rounds
#bgwriter_percent = 1 # 0-100% of dirty buffers in each round #bgwriter_lru_percent = 1.0 # 0-100% of LRU buffers scanned in each round
#bgwriter_maxpages = 100 # 0-1000 buffers max per round #bgwriter_lru_maxpages = 5 # 0-1000 buffers max written per round
#bgwriter_all_percent = 0.333 # 0-100% of all buffers scanned in each round
#bgwriter_all_maxpages = 5 # 0-1000 buffers max written per round
#--------------------------------------------------------------------------- #---------------------------------------------------------------------------
......
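Reviewer sketch: for a rough feel of what the new defaults mean, the fragment below estimates how many buffers the "all" scan examines per round and how long one full pass over the pool would take. Rounding up to a whole buffer and ignoring the time spent on writes are assumptions of this sketch, not behavior confirmed by the hunks shown here. `sketch_scan_per_round` and `sketch_seconds_per_full_pass` are hypothetical helpers.

```c
#include <assert.h>

/*
 * Buffers examined in one round: percent of the pool, rounded up to a
 * whole buffer (the rounding rule is an assumption of this sketch).
 */
static int
sketch_scan_per_round(int nbuffers, double percent)
{
    double exact = nbuffers * percent / 100.0;
    int    whole = (int) exact;

    if ((double) whole < exact)
        whole++;                /* manual ceiling, avoids needing libm */
    return whole;
}

/*
 * Seconds needed to visit every buffer once, sleeping delay_ms between
 * rounds and treating the scan itself as instantaneous.
 */
static double
sketch_seconds_per_full_pass(int nbuffers, double percent, int delay_ms)
{
    int per_round = sketch_scan_per_round(nbuffers, percent);
    int rounds;

    assert(per_round > 0 && delay_ms > 0);
    rounds = (nbuffers + per_round - 1) / per_round;
    return rounds * (double) delay_ms / 1000.0;
}
```

Under these assumptions, with 1000 shared buffers, bgwriter_all_percent = 0.333 and bgwriter_delay = 200, the scan examines 4 buffers per round and needs 250 rounds, about 50 seconds, to cover the pool. The bgwriter_all_maxpages limit caps the writes per round separately from the number of buffers scanned.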
...@@ -14,7 +14,7 @@ ...@@ -14,7 +14,7 @@
* *
* *
* IDENTIFICATION * IDENTIFICATION
* $PostgreSQL: pgsql/src/backend/utils/resowner/resowner.c,v 1.9 2004/12/31 22:02:50 pgsql Exp $ * $PostgreSQL: pgsql/src/backend/utils/resowner/resowner.c,v 1.10 2005/03/04 20:21:06 tgl Exp $
* *
*------------------------------------------------------------------------- *-------------------------------------------------------------------------
*/ */
...@@ -200,12 +200,7 @@ ResourceOwnerReleaseInternal(ResourceOwner owner, ...@@ -200,12 +200,7 @@ ResourceOwnerReleaseInternal(ResourceOwner owner,
* that would indicate failure to clean up the executor correctly --- * that would indicate failure to clean up the executor correctly ---
* so issue warnings. In the abort case, just clean up quietly. * so issue warnings. In the abort case, just clean up quietly.
* *
* XXX this is fairly inefficient due to multiple BufMgrLock * We are careful to do the releasing back-to-front, so as to
* grabs if there are lots of buffers to be released, but we
* don't expect many (indeed none in the success case) so it's
* probably not worth optimizing.
*
* We are however careful to release back-to-front, so as to
* avoid O(N^2) behavior in ResourceOwnerForgetBuffer(). * avoid O(N^2) behavior in ResourceOwnerForgetBuffer().
*/ */
while (owner->nbuffers > 0) while (owner->nbuffers > 0)
......
...@@ -5,7 +5,7 @@ ...@@ -5,7 +5,7 @@
* *
* Portions Copyright (c) 1996-2005, PostgreSQL Global Development Group * Portions Copyright (c) 1996-2005, PostgreSQL Global Development Group
* *
* $PostgreSQL: pgsql/src/include/postmaster/bgwriter.h,v 1.4 2004/12/31 22:03:39 pgsql Exp $ * $PostgreSQL: pgsql/src/include/postmaster/bgwriter.h,v 1.5 2005/03/04 20:21:06 tgl Exp $
* *
*------------------------------------------------------------------------- *-------------------------------------------------------------------------
*/ */
...@@ -18,8 +18,6 @@ ...@@ -18,8 +18,6 @@
/* GUC options */ /* GUC options */
extern int BgWriterDelay; extern int BgWriterDelay;
extern int BgWriterPercent;
extern int BgWriterMaxPages;
extern int CheckPointTimeout; extern int CheckPointTimeout;
extern int CheckPointWarning; extern int CheckPointWarning;
......
...@@ -8,7 +8,7 @@ ...@@ -8,7 +8,7 @@
* Portions Copyright (c) 1996-2005, PostgreSQL Global Development Group * Portions Copyright (c) 1996-2005, PostgreSQL Global Development Group
* Portions Copyright (c) 1994, Regents of the University of California * Portions Copyright (c) 1994, Regents of the University of California
* *
* $PostgreSQL: pgsql/src/include/storage/buf_internals.h,v 1.76 2005/02/03 23:29:19 tgl Exp $ * $PostgreSQL: pgsql/src/include/storage/buf_internals.h,v 1.77 2005/03/04 20:21:07 tgl Exp $
* *
*------------------------------------------------------------------------- *-------------------------------------------------------------------------
*/ */
...@@ -19,24 +19,39 @@ ...@@ -19,24 +19,39 @@
#include "storage/buf.h" #include "storage/buf.h"
#include "storage/lwlock.h" #include "storage/lwlock.h"
#include "storage/shmem.h" #include "storage/shmem.h"
#include "storage/spin.h"
#include "utils/rel.h" #include "utils/rel.h"
/* /*
* Flags for buffer descriptors * Flags for buffer descriptors
*
* Note: TAG_VALID essentially means that there is a buffer hashtable
* entry associated with the buffer's tag.
*/ */
#define BM_DIRTY (1 << 0) /* data needs writing */ #define BM_DIRTY (1 << 0) /* data needs writing */
#define BM_VALID (1 << 1) /* data is valid */ #define BM_VALID (1 << 1) /* data is valid */
#define BM_IO_IN_PROGRESS (1 << 2) /* read or write in #define BM_TAG_VALID (1 << 2) /* tag is assigned */
#define BM_IO_IN_PROGRESS (1 << 3) /* read or write in
* progress */ * progress */
#define BM_IO_ERROR (1 << 3) /* previous I/O failed */ #define BM_IO_ERROR (1 << 4) /* previous I/O failed */
#define BM_JUST_DIRTIED (1 << 4) /* dirtied since write #define BM_JUST_DIRTIED (1 << 5) /* dirtied since write
* started */ * started */
#define BM_PIN_COUNT_WAITER (1 << 5) /* have waiter for sole #define BM_PIN_COUNT_WAITER (1 << 6) /* have waiter for sole
* pin */ * pin */
typedef bits16 BufFlags; typedef bits16 BufFlags;
/*
* The maximum allowed value of usage_count represents a tradeoff between
* accuracy and speed of the clock-sweep buffer management algorithm. A
* large value (comparable to NBuffers) would approximate LRU semantics.
* But it can take as many as BM_MAX_USAGE_COUNT+1 complete cycles of
* clock sweeps to find a free buffer, so in practice we don't want the
* value to be very large.
*/
#define BM_MAX_USAGE_COUNT 5
/* /*
* Buffer tag identifies which disk block the buffer contains. * Buffer tag identifies which disk block the buffer contains.
* *
...@@ -77,45 +92,81 @@ typedef struct buftag ...@@ -77,45 +92,81 @@ typedef struct buftag
/* /*
* BufferDesc -- shared descriptor/state data for a single shared buffer. * BufferDesc -- shared descriptor/state data for a single shared buffer.
*
* Note: buf_hdr_lock must be held to examine or change the tag, flags,
* usage_count, refcount, or wait_backend_id fields. buf_id field never
* changes after initialization, so does not need locking. freeNext is
* protected by the BufFreelistLock not buf_hdr_lock. The LWLocks can take
* care of themselves. The buf_hdr_lock is *not* used to control access to
* the data in the buffer!
*
* An exception is that if we have the buffer pinned, its tag can't change
* underneath us, so we can examine the tag without locking the spinlock.
* Also, in places we do one-time reads of the flags without bothering to
* lock the spinlock; this is generally for situations where we don't expect
* the flag bit being tested to be changing.
*
* We can't physically remove items from a disk page if another backend has
* the buffer pinned. Hence, a backend may need to wait for all other pins
* to go away. This is signaled by storing its own backend ID into
* wait_backend_id and setting flag bit BM_PIN_COUNT_WAITER. At present,
* there can be only one such waiter per buffer.
*
* We use this same struct for local buffer headers, but the lock fields
* are not used and not all of the flag bits are useful either.
*/ */
typedef struct sbufdesc typedef struct sbufdesc
{ {
Buffer bufNext; /* link in freelist chain */ BufferTag tag; /* ID of page contained in buffer */
SHMEM_OFFSET data; /* pointer to data in buf pool */
/* tag and id must be together for table lookup (still true?) */
BufferTag tag; /* file/block identifier */
int buf_id; /* buffer's index number (from 0) */
BufFlags flags; /* see bit definitions above */ BufFlags flags; /* see bit definitions above */
uint16 usage_count; /* usage counter for clock sweep code */
unsigned refcount; /* # of backends holding pins on buffer */ unsigned refcount; /* # of backends holding pins on buffer */
BackendId wait_backend_id; /* backend ID of pin-count waiter */
slock_t buf_hdr_lock; /* protects the above fields */
int buf_id; /* buffer's index number (from 0) */
int freeNext; /* link in freelist chain */
LWLockId io_in_progress_lock; /* to wait for I/O to complete */ LWLockId io_in_progress_lock; /* to wait for I/O to complete */
LWLockId cntx_lock; /* to lock access to page context */ LWLockId content_lock; /* to lock access to buffer contents */
bool cntxDirty; /* new way to mark block as dirty */
/*
* We can't physically remove items from a disk page if another
* backend has the buffer pinned. Hence, a backend may need to wait
* for all other pins to go away. This is signaled by storing its own
* backend ID into wait_backend_id and setting flag bit
* BM_PIN_COUNT_WAITER. At present, there can be only one such waiter
* per buffer.
*/
BackendId wait_backend_id; /* backend ID of pin-count waiter */
} BufferDesc; } BufferDesc;
#define BufferDescriptorGetBuffer(bdesc) ((bdesc)->buf_id + 1) #define BufferDescriptorGetBuffer(bdesc) ((bdesc)->buf_id + 1)
/*
* The freeNext field is either the index of the next freelist entry,
* or one of these special values:
*/
#define FREENEXT_END_OF_LIST (-1)
#define FREENEXT_NOT_IN_LIST (-2)
/* in bufmgr.c */ /*
* Macros for acquiring/releasing a buffer header's spinlock. The
* NoHoldoff cases may be used when we know that we hold some LWLock
* and therefore interrupts are already held off. Do not apply these
* to local buffers!
*/
#define LockBufHdr(bufHdr) \
SpinLockAcquire(&(bufHdr)->buf_hdr_lock)
#define UnlockBufHdr(bufHdr) \
SpinLockRelease(&(bufHdr)->buf_hdr_lock)
#define LockBufHdr_NoHoldoff(bufHdr) \
SpinLockAcquire_NoHoldoff(&(bufHdr)->buf_hdr_lock)
#define UnlockBufHdr_NoHoldoff(bufHdr) \
SpinLockRelease_NoHoldoff(&(bufHdr)->buf_hdr_lock)
/* in buf_init.c */
extern BufferDesc *BufferDescriptors; extern BufferDesc *BufferDescriptors;
/* in localbuf.c */ /* in localbuf.c */
extern BufferDesc *LocalBufferDescriptors; extern BufferDesc *LocalBufferDescriptors;
/* counters in buf_init.c */ /* in freelist.c */
extern bool strategy_hint_vacuum;
/* event counters in buf_init.c */
extern long int ReadBufferCount; extern long int ReadBufferCount;
extern long int ReadLocalBufferCount; extern long int ReadLocalBufferCount;
extern long int BufferHitCount; extern long int BufferHitCount;
...@@ -129,15 +180,9 @@ extern long int LocalBufferFlushCount; ...@@ -129,15 +180,9 @@ extern long int LocalBufferFlushCount;
*/ */
/* freelist.c */ /* freelist.c */
extern BufferDesc *StrategyBufferLookup(BufferTag *tagPtr, bool recheck, extern BufferDesc *StrategyGetBuffer(void);
int *cdb_found_index); extern void StrategyFreeBuffer(BufferDesc *buf, bool at_head);
extern BufferDesc *StrategyGetBuffer(int *cdb_replace_index); extern int StrategySyncStart(void);
extern void StrategyReplaceBuffer(BufferDesc *buf, BufferTag *newTag,
int cdb_found_index, int cdb_replace_index);
extern void StrategyInvalidateBuffer(BufferDesc *buf);
extern void StrategyHintVacuum(bool vacuum_active);
extern int StrategyDirtyBufferList(BufferDesc **buffers, BufferTag *buftags,
int max_buffers);
extern int StrategyShmemSize(void); extern int StrategyShmemSize(void);
extern void StrategyInitialize(bool init); extern void StrategyInitialize(bool init);
...@@ -145,7 +190,7 @@ extern void StrategyInitialize(bool init); ...@@ -145,7 +190,7 @@ extern void StrategyInitialize(bool init);
extern int BufTableShmemSize(int size); extern int BufTableShmemSize(int size);
extern void InitBufTable(int size); extern void InitBufTable(int size);
extern int BufTableLookup(BufferTag *tagPtr); extern int BufTableLookup(BufferTag *tagPtr);
extern void BufTableInsert(BufferTag *tagPtr, int buf_id); extern int BufTableInsert(BufferTag *tagPtr, int buf_id);
extern void BufTableDelete(BufferTag *tagPtr); extern void BufTableDelete(BufferTag *tagPtr);
/* localbuf.c */ /* localbuf.c */
......
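Reviewer sketch: the BM_MAX_USAGE_COUNT comment in this header says a sweep can take up to BM_MAX_USAGE_COUNT+1 complete cycles to find a free buffer. The single-threaded fragment below shows why, using a plain array and a clock hand. It is not the patch's StrategyGetBuffer: all locking is omitted (the header comment requires buf_hdr_lock to examine these fields), there is no freelist, and `SketchBuf` and `sketch_clock_sweep` are hypothetical names.

```c
#include <assert.h>

#define BM_MAX_USAGE_COUNT 5    /* same cap as defined in this header */

/* Hypothetical, simplified stand-in for a shared buffer header. */
typedef struct SketchBuf
{
    unsigned refcount;      /* number of pins; pinned buffers are skipped */
    unsigned usage_count;   /* assumed to be <= BM_MAX_USAGE_COUNT */
} SketchBuf;

/*
 * Advance the clock hand until an unpinned buffer with usage_count 0 is
 * found.  Each unpinned buffer passed over loses one usage count, so
 * after at most BM_MAX_USAGE_COUNT full cycles every unpinned buffer has
 * reached 0 and the next cycle must find a victim.  Returns the victim's
 * index, or -1 if every buffer is pinned.
 */
static int
sketch_clock_sweep(SketchBuf *bufs, int nbufs, int *hand)
{
    long steps_left = (long) (BM_MAX_USAGE_COUNT + 1) * nbufs;

    assert(nbufs > 0 && *hand >= 0 && *hand < nbufs);
    while (steps_left-- > 0)
    {
        int        idx = *hand;
        SketchBuf *buf = &bufs[idx];

        *hand = (*hand + 1) % nbufs;    /* move the hand past this buffer */
        if (buf->refcount == 0)
        {
            if (buf->usage_count == 0)
                return idx;             /* cold and unpinned: evict this one */
            buf->usage_count--;         /* give it one more trip around */
        }
    }
    return -1;                          /* only pinned buffers were seen */
}
```

The step bound makes the tradeoff in the comment concrete: a larger cap tracks recency more finely but lengthens the worst-case search in proportion.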
...@@ -7,7 +7,7 @@ ...@@ -7,7 +7,7 @@
* Portions Copyright (c) 1996-2005, PostgreSQL Global Development Group * Portions Copyright (c) 1996-2005, PostgreSQL Global Development Group
* Portions Copyright (c) 1994, Regents of the University of California * Portions Copyright (c) 1994, Regents of the University of California
* *
* $PostgreSQL: pgsql/src/include/storage/bufmgr.h,v 1.89 2004/12/31 22:03:42 pgsql Exp $ * $PostgreSQL: pgsql/src/include/storage/bufmgr.h,v 1.90 2005/03/04 20:21:07 tgl Exp $
* *
*------------------------------------------------------------------------- *-------------------------------------------------------------------------
*/ */
...@@ -27,21 +27,25 @@ extern DLLIMPORT int NBuffers; ...@@ -27,21 +27,25 @@ extern DLLIMPORT int NBuffers;
/* in bufmgr.c */ /* in bufmgr.c */
extern bool zero_damaged_pages; extern bool zero_damaged_pages;
extern double bgwriter_lru_percent;
extern double bgwriter_all_percent;
extern int bgwriter_lru_maxpages;
extern int bgwriter_all_maxpages;
/* in buf_init.c */ /* in buf_init.c */
extern DLLIMPORT Block *BufferBlockPointers; extern DLLIMPORT Block *BufferBlockPointers;
extern int32 *PrivateRefCount; extern DLLIMPORT int32 *PrivateRefCount;
/* in localbuf.c */ /* in localbuf.c */
extern DLLIMPORT int NLocBuffer; extern DLLIMPORT int NLocBuffer;
extern DLLIMPORT Block *LocalBufferBlockPointers; extern DLLIMPORT Block *LocalBufferBlockPointers;
extern int32 *LocalRefCount; extern DLLIMPORT int32 *LocalRefCount;
/* special block number for ReadBuffer() */ /* special block number for ReadBuffer() */
#define P_NEW InvalidBlockNumber /* grow the file to get a new page */ #define P_NEW InvalidBlockNumber /* grow the file to get a new page */
/* /*
* Buffer context lock modes * Buffer content lock modes (mode argument for LockBuffer())
*/ */
#define BUFFER_LOCK_UNLOCK 0 #define BUFFER_LOCK_UNLOCK 0
#define BUFFER_LOCK_SHARE 1 #define BUFFER_LOCK_SHARE 1
...@@ -150,8 +154,12 @@ extern void LockBufferForCleanup(Buffer buffer); ...@@ -150,8 +154,12 @@ extern void LockBufferForCleanup(Buffer buffer);
extern void AbortBufferIO(void); extern void AbortBufferIO(void);
extern void BufmgrCommit(void); extern void BufmgrCommit(void);
extern int BufferSync(int percent, int maxpages); extern void BufferSync(void);
extern void BgBufferSync(void);
extern void InitLocalBuffer(void); extern void InitLocalBuffer(void);
/* in freelist.c */
extern void StrategyHintVacuum(bool vacuum_active);
#endif #endif
...@@ -7,7 +7,7 @@ ...@@ -7,7 +7,7 @@
* Portions Copyright (c) 1996-2005, PostgreSQL Global Development Group * Portions Copyright (c) 1996-2005, PostgreSQL Global Development Group
* Portions Copyright (c) 1994, Regents of the University of California * Portions Copyright (c) 1994, Regents of the University of California
* *
* $PostgreSQL: pgsql/src/include/storage/lwlock.h,v 1.16 2004/12/31 22:03:42 pgsql Exp $ * $PostgreSQL: pgsql/src/include/storage/lwlock.h,v 1.17 2005/03/04 20:21:07 tgl Exp $
* *
*------------------------------------------------------------------------- *-------------------------------------------------------------------------
*/ */
...@@ -25,7 +25,8 @@ ...@@ -25,7 +25,8 @@
*/ */
typedef enum LWLockId typedef enum LWLockId
{ {
BufMgrLock, BufMappingLock,
BufFreelistLock,
LockMgrLock, LockMgrLock,
OidGenLock, OidGenLock,
XidGenLock, XidGenLock,
......