Commits · 3c71244b74b164e6d86654cab6fd0df0de1d3312 · Abuhujair Javed / Postgres FD Implementation

27 Jun, 2006 1 commit

Put #ifdef NOT_USED around posix_fadvise call. We may want to resurrect · 3c71244b

Tom Lane authored 18 years ago

this someday, but right now it seems that posix_fadvise is immature to
the point of being broken on many platforms ... and we don't have any
benchmark evidence proving it's worth spending time on.

3c71244b

22 Jun, 2006 1 commit

pg_stop_backup was calling XLogArchiveNotify() twice for the newly created · 3a04f53e

Tom Lane authored 18 years ago

backup history file.  Bug introduced by the 8.1 change to make pg_stop_backup
delete older history files.  Per report from Masao Fujii.

3a04f53e

18 Jun, 2006 1 commit

Don't try to call posix_fadvise() unless <fcntl.h> supplies a declaration · 1e8ae136

Tom Lane authored 18 years ago

for it.  Hopefully will fix core dump evidenced by some buildfarm members
since fadvise patch went in.  The actual definition of the function is not
ABI-compatible with compiler's default assumption in the absence of any
declaration, so it's clearly unsafe to try to call it without seeing a
declaration.

1e8ae136
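
The guard described in this commit can be sketched as below. This is a minimal illustration, not the code from the PostgreSQL tree: the helper name `advise_dontneed` is hypothetical, and the sketch assumes that a visible `POSIX_FADV_DONTNEED` macro implies `<fcntl.h>` also declares `posix_fadvise()`. A real build would more likely rely on a configure-time check for the declaration.

```c
#include <assert.h>
#include <fcntl.h>

/*
 * Hypothetical helper: hint that the kernel need not keep a file's pages
 * cached. Returns 0 on success or when the hint is unavailable, otherwise
 * the error number reported by posix_fadvise().
 */
int
advise_dontneed(int fd)
{
    assert(fd >= 0);
#if defined(POSIX_FADV_DONTNEED)
    /* Offset 0 with length 0 covers the whole file. */
    return posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
#else
    /* No declaration visible: skip the call rather than guess its ABI. */
    return 0;
#endif
}
```

Because the call is compiled only when the header advertises it, a platform without a usable declaration silently loses the hint instead of calling the function through an implicit, possibly ABI-incompatible prototype.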

16 Jun, 2006 1 commit

Test for POSIX_FADV_DONTNEED to use posix_fadvise(). · 40bc06fa

Bruce Momjian authored 18 years ago

40bc06fa

15 Jun, 2006 1 commit

Use posix_fadvise() to avoid kernel caching of WAL contents on WAL file · 94a5c4a0

Bruce Momjian authored 18 years ago

close.

ITAGAKI Takahiro

94a5c4a0

20 Apr, 2006 1 commit

Ensure that we validate the page header of the first page of a WAL file · eac825aa

Tom Lane authored 18 years ago

whenever we start to read within that file.  The first page carries
extra identification information that really ought to be checked, but
as the code stood, this was only checked when we switched sequentially
into a new WAL file, or if by chance the starting checkpoint record was
within the first page.  This patch ensures that we will detect bogus
'long header' information before we start replaying the WAL sequence.

eac825aa

17 Apr, 2006 1 commit

Fix the torn-page hazard for PITR base backups by forcing full page writes · 0a873949

Tom Lane authored 18 years ago

to occur between pg_start_backup() and pg_stop_backup(), even if the GUC
setting full_page_writes is OFF. Per discussion, doing this in combination
with the already-existing checkpoint during pg_start_backup() should ensure
safety against partial page updates being included in the backup. We do
not have to force full page writes to occur during normal PITR operation,
as I had first feared.

0a873949

14 Apr, 2006 1 commit

Make the world safe for full_page_writes. Allow XLOG records that try to · defe9346

Tom Lane authored 18 years ago

update no-longer-existing pages to fall through as no-ops, but make a note
of each page number referenced by such records. If we don't see a later
XLOG entry dropping the table or truncating away the page, complain at
the end of XLOG replay. Since this fixes the known failure mode for
full_page_writes = off, revert my previous band-aid patch that disabled
that GUC variable.

defe9346

05 Apr, 2006 1 commit

Add a field to the first page of each WAL file to indicate the · 09b5271e

Tom Lane authored 18 years ago

XLOG_BLCKSZ. This ought to help in preventing configuration mismatch
problems if anyone tries to ship PITR files between servers compiled
with different XLOG_BLCKSZ settings. Simon Riggs

09b5271e

04 Apr, 2006 1 commit

Don't use BLCKSZ for the physical length of the pg_control file, but · e6140d90

Tom Lane authored 18 years ago

instead a dedicated symbol.  This probably makes no functional difference
for likely values of BLCKSZ, but it makes the intent clearer.
Simon Riggs, minor editorialization by Tom Lane.

e6140d90

03 Apr, 2006 1 commit

Define a separately configurable XLOG_BLCKSZ symbol for the page size · eaef1113

Tom Lane authored 18 years ago

used within WAL files. Historically this was the same as the data file
BLCKSZ, but there's no necessary connection, and it's possible that
performance gains might ensue from reducing XLOG_BLCKSZ. In any case
distinguishing two symbols should improve code clarity. This commit
does not actually change the page size, only provide the infrastructure
to make it possible to do so. initdb forced because of addition of a
field to pg_control.
Mark Wong, with some help from Simon Riggs and Tom Lane.

eaef1113

31 Mar, 2006 1 commit

Clean up WAL/buffer interactions as per my recent proposal. Get rid of the · a8b8f4db

Tom Lane authored 18 years ago

misleadingly-named WriteBuffer routine, and instead require routines that
change buffer pages to call MarkBufferDirty (which does exactly what it says).
We also require that they do so before calling XLogInsert; this takes care of
the synchronization requirement documented in SyncOneBuffer. Note that
because bufmgr takes the buffer content lock (in shared mode) while writing
out any buffer, it doesn't matter whether MarkBufferDirty is executed before
the buffer content change is complete, so long as the content change is
completed before releasing exclusive lock on the buffer. So it's OK to set
the dirtybit before we fill in the LSN.
This eliminates the former kluge of needing to set the dirtybit in LockBuffer.
Aside from making the code more transparent, we can also add some new
debugging assertions, in particular that the caller of MarkBufferDirty must
hold the buffer content lock, not merely a pin.

a8b8f4db

29 Mar, 2006 1 commit

Clean up and document the API for XLogOpenRelation and XLogReadBuffer. · 6d61cdec

Tom Lane authored 18 years ago

This commit doesn't make much functional change, but it does eliminate some
duplicated code --- for instance, PageIsNew tests are now done inside
XLogReadBuffer rather than by each caller.
The GIST xlog code still needs a lot of love, but I'll worry about that
separately.

6d61cdec

28 Mar, 2006 1 commit

Disable full_page_writes, because turning it off risks causing crash-recovery · 0a971e2f

Tom Lane authored 18 years ago

failures even when the hardware and OS did nothing wrong. Per recent analysis
of a problem report from Alex Bahdushka.

For the moment I've just diked out the test of the parameter, rather than
removing the GUC infrastructure and documentation, in case we conclude that
there's something salvageable there. There seems no chance of it being
resurrected in the 8.1 branch though.

0a971e2f

24 Mar, 2006 1 commit

Arrange to emit a description of the current XLOG record as error context · 0a202070

Tom Lane authored 18 years ago

when an error occurs during xlog replay.  Also, replace the former risky
'write into a fixed-size buffer with no overflow detection' API for XLOG
record description routines; use an expansible StringInfo instead.  (The
latter accounts for most of the patch bulk.)

Qingqing Zhou

0a202070

05 Mar, 2006 1 commit

Update copyright for 2006. Update scripts. · f2f5b056

Bruce Momjian authored 18 years ago

f2f5b056

11 Jan, 2006 1 commit

Cosmetic code cleanup: fix a bunch of places that used "return (expr);" · fb627b76

Neil Conway authored 19 years ago

rather than "return expr;" -- the latter style is used in most of the
tree. I kept the parentheses when they were necessary or useful because
the return expression was complex.

fb627b76

29 Dec, 2005 1 commit

Get rid of the SpinLockAcquire/SpinLockAcquire_NoHoldoff distinction · 195f1642

Tom Lane authored 19 years ago

in favor of having just one set of macros that don't do HOLD/RESUME_INTERRUPTS
(hence, these correspond to the old SpinLockAcquire_NoHoldoff case).
Given our coding rules for spinlock use, there is no reason to allow
CHECK_FOR_INTERRUPTS to be done while holding a spinlock, and also there
is no situation where ImmediateInterruptOK will be true while holding a
spinlock.  Therefore doing HOLD/RESUME_INTERRUPTS while taking/releasing a
spinlock is just a waste of cycles.  Qingqing Zhou and Tom Lane.

195f1642

28 Dec, 2005 1 commit

Arrange to set the LC_XXX environment variables to match our locale · ab51bbaa

Tom Lane authored 19 years ago

setup.  This protects against undesired changes in locale behavior
if someone carelessly does setlocale(LC_ALL, "") (and we know who
you are, perl guys).

ab51bbaa

22 Nov, 2005 1 commit

Re-run pgindent, fixing a problem where comment lines after a blank · 436a2956

Bruce Momjian authored 19 years ago

comment line were output as too long, and update typedefs for /lib
directory.  Also fix case where identifiers were used as variable names
in the backend, but as typedefs in ecpg (favor the backend for
indenting).

Backpatch to 8.1.X.

436a2956

29 Oct, 2005 1 commit

Message corrections · 07bb9f08

Peter Eisentraut authored 19 years ago

07bb9f08

22 Oct, 2005 1 commit

Make code for selecting default WAL sync method less confusing. · 6d6c3722

Tom Lane authored 19 years ago

6d6c3722

15 Oct, 2005 1 commit

Standard pgindent run for 8.1. · 1dc34982

Bruce Momjian authored 19 years ago

1dc34982

03 Oct, 2005 1 commit

Expand pg_control information so that we can verify that the database · 64eea6c2

Tom Lane authored 19 years ago

was created on a machine with alignment rules and floating-point format
similar to the current machine. Per recent discussion, this seems like
a good idea with the increasing prevalence of 32/64 bit environments.

64eea6c2

22 Aug, 2005 2 commits

Rewrite gather-write patch into something less obviously bolted on · 90525373

Tom Lane authored 19 years ago

after the fact. Fix bug with incorrect test for whether we are at end
of logfile segment. Arrange for writes triggered by XLogInsert's
is-cache-more-than-half-full test to synchronize with the cache boundaries,
so that in long transactions we tend to write alternating halves of the
cache rather than randomly chosen portions of it; this saves one more
write syscall per cache load.

90525373

Fix some inconsistent choices of datatypes in xlog.c. Make buffer · d0096a41

Tom Lane authored 19 years ago

indexes all be int, rather than variously int, uint16 and uint32;
add some casts where necessary to support large buffer arrays.

d0096a41

20 Aug, 2005 1 commit

Convert the arithmetic for shared memory size calculation from 'int' · 0007490e

Tom Lane authored 19 years ago

to 'Size' (that is, size_t), and install overflow detection checks in it.
This allows us to remove the former arbitrary restrictions on NBuffers
etc.  It won't make any difference in a 32-bit machine, but in a 64-bit
machine you could theoretically have terabytes of shared buffers.
(How efficiently we could manage 'em remains to be seen.)  Similarly,
num_temp_buffers, work_mem, and maintenance_work_mem can be set above
2GB on a 64-bit machine.  Original patch from Koichi Suzuki, additional
work by moi.

0007490e
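
The overflow detection this commit describes can be illustrated with a pair of checked `size_t` helpers. This is a sketch under stated assumptions: the names `checked_add_size` and `checked_mul_size` are hypothetical and not taken from the PostgreSQL source, and they report overflow through a return value, whereas a server would presumably raise an error instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: store a + b in *result; return false on overflow. */
bool
checked_add_size(size_t a, size_t b, size_t *result)
{
    assert(result != NULL);
    if (b > SIZE_MAX - a)
        return false;           /* the sum would wrap around */
    *result = a + b;
    return true;
}

/* Hypothetical helper: store a * b in *result; return false on overflow. */
bool
checked_mul_size(size_t a, size_t b, size_t *result)
{
    assert(result != NULL);
    if (a != 0 && b > SIZE_MAX / a)
        return false;           /* the product would wrap around */
    *result = a * b;
    return true;
}
```

A shared-memory estimate would then chain these calls, for example multiplying a buffer count by a block size and adding the result to a running total, and stop as soon as either helper returns false.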

11 Aug, 2005 1 commit

Autovacuum loose end mop-up. Provide autovacuum-specific vacuum cost · d90c5311

Tom Lane authored 19 years ago

delay and limit, both as global GUCs and as table-specific entries in
pg_autovacuum. stats_reset_on_server_start is now OFF by default,
but a reset is forced if we did WAL replay. XID-wrap vacuums do not
ANALYZE, but do FREEZE if it's a template database. Alvaro Herrera

d90c5311

30 Jul, 2005 1 commit

Fix compile for no O_SYNC, but introduced with O_DIRECT. · 5b0bfec4

Bruce Momjian authored 19 years ago

5b0bfec4

29 Jul, 2005 3 commits

Clean up a number of autovacuum loose ends. Make the stats collector · 5d5f1a79

Tom Lane authored 19 years ago

track shared relations in a separate hashtable, so that operations done
from different databases are counted correctly. Add proper support for
anti-XID-wraparound vacuuming, even in databases that are never connected
to and so have no stats entries. Miscellaneous other bug fixes.
Alvaro Herrera, some additional fixes by Tom Lane.

5d5f1a79

Update O_DIRECT comment. · c6b1724c

Bruce Momjian authored 19 years ago

c6b1724c

· c34bb005

Bruce Momjian authored 19 years ago

Use O_DIRECT if available when using O_SYNC for wal_sync_method.

Also, write multiple WAL buffers out in one write() operation.

ITAGAKI Takahiro

---------------------------------------------------------------------------

> If we disable writeback-cache and use open_sync, the per-page writing
> behavior in WAL module will show up as bad result. O_DIRECT is similar
> to O_DSYNC (at least on linux), so that the benefit of it will disappear
> behind the slow disk revolution.
>
> In the current source, WAL is written as:
>     for (i = 0; i < N; i++) { write(&buffers[i], BLCKSZ); }
> Is this intentional? Can we rewrite it as follows?
>    write(&buffers[0], N * BLCKSZ);
>
> In order to achieve it, I wrote a 'gather-write' patch (xlog.gw.diff).
> Aside from this, I'll also send the fixed direct io patch (xlog.dio.diff).
> These two patches are independent, so they can be applied either or both.
>
>
> I tested them on my machine and the results as follows. It shows that
> direct-io and gather-write is the best choice when writeback-cache is off.
> Are these two patches worth trying if they are used together?
>
>
>             | writeback | fsync= | fdata | open_ | fsync_ | open_
> patch       | cache     |  false |  sync |  sync | direct | direct
> ------------+-----------+--------+-------+-------+--------+---------
> direct io   | off       |  124.2 | 105.7 |  48.3 |   48.3 |  48.2
> direct io   | on        |  129.1 | 112.3 | 114.1 |  142.9 | 144.5
> gather-write| off       |  124.3 | 108.7 | 105.4 |  (N/A) | (N/A)
> both        | off       |  131.5 | 115.5 | 114.4 |  145.4 | 145.2
>
> - 20runs * pgbench -s 100 -c 50 -t 200
>    - with tuning (wal_buffers=64, commit_delay=500, checkpoint_segments=8)
> - using 2 ATA disks:
>    - hda(reiserfs) includes system and wal.
>    - hdc(jfs) includes database files. writeback-cache is always on.
>
> ---
> ITAGAKI Takahiro

c34bb005
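
The change proposed in the quoted message, one `write()` for N pages instead of N calls, might look like the sketch below. It is not the patch itself: `write_pages_gathered` and `SKETCH_BLCKSZ` are hypothetical names, the 8192-byte page size is an assumed value, and the sketch assumes the pages are contiguous in memory, as the quoted `write(&buffers[0], N * BLCKSZ)` implies.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

#define SKETCH_BLCKSZ 8192      /* stand-in for BLCKSZ; value assumed */

/*
 * Write npages contiguous pages starting at 'pages' using as few write()
 * calls as possible. Returns 0 on success, -1 on error with errno set.
 */
int
write_pages_gathered(int fd, const char *pages, size_t npages)
{
    size_t      remaining = npages * SKETCH_BLCKSZ;

    assert(pages != NULL);
    while (remaining > 0)
    {
        ssize_t     n = write(fd, pages, remaining);

        if (n < 0)
        {
            if (errno == EINTR)
                continue;       /* interrupted before any byte was written */
            return -1;
        }
        if (n == 0)
        {
            errno = ENOSPC;     /* no progress; treating it as out of space
                                 * is an assumption of this sketch */
            return -1;
        }
        pages += n;             /* short write: resume after the bytes written */
        remaining -= (size_t) n;
    }
    return 0;
}
```

In the common case the loop body runs once, so N pages cost a single syscall; the loop exists only to handle short writes and interruptions.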

23 Jul, 2005 2 commits

Remove unintended code addition. · 9af9d674

Bruce Momjian authored 19 years ago

9af9d674

Macro alignment cleanup. · 4098c886

Bruce Momjian authored 19 years ago

4098c886

08 Jul, 2005 1 commit

Even though I'd like to see full_page_writes go away before 8.1, · d7207cfc

Tom Lane authored 19 years ago

a minimum requirement is that it not completely break the system
meanwhile.  Put the test in the right place.

d7207cfc

05 Jul, 2005 1 commit

Add GUC full_page_writes to control writing full pages to WAL. · 326a7a07

Bruce Momjian authored 19 years ago

326a7a07

04 Jul, 2005 1 commit

Arrange for the postmaster (and standalone backends, initdb, etc) to · eb5949d1

Tom Lane authored 19 years ago

chdir into PGDATA and subsequently use relative paths instead of absolute
paths to access all files under PGDATA. This seems to give a small
performance improvement, and it should make the system more robust
against naive DBAs doing things like moving a database directory that
has a live postmaster in it. Per recent discussion.

eb5949d1

30 Jun, 2005 1 commit

Improve the checkpoint signaling mechanism so that the bgwriter can tell · 401de9c8

Tom Lane authored 19 years ago

the difference between checkpoints forced due to WAL segment consumption
and checkpoints forced for other reasons (such as CREATE DATABASE).  Avoid
generating 'checkpoints are occurring too frequently' messages when the
checkpoint wasn't caused by WAL segment consumption.  Per gripe from
Chris K-L.

401de9c8

29 Jun, 2005 1 commit

Clean up the rather historically encumbered interface to now() and · b5f7cff8

Tom Lane authored 19 years ago

current time: provide a GetCurrentTimestamp() function that returns
current time in the form of a TimestampTz, instead of separate time_t
and microseconds fields. This is what all the callers really want
anyway, and it eliminates low-level dependencies on AbsoluteTime,
which is a deprecated datatype that will have to disappear eventually.

b5f7cff8

19 Jun, 2005 1 commit

Simplify uses of readdir() by creating a function ReadDir() that · 3f749924

Tom Lane authored 19 years ago

includes error checking and an appropriate ereport(ERROR) message.
This gets rid of rather tedious and error-prone manipulation of errno,
as well as a Windows-specific bug workaround, at more than a dozen
call sites. After an idea in a recent patch by Heikki Linnakangas.

3f749924
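
The errno handling that such a wrapper centralizes can be sketched as follows. This is an illustration only: `read_dir_checked` and `count_dir_entries` are hypothetical names, the sketch reports failure through a flag because `ereport(ERROR)` is internal to PostgreSQL, and the Windows-specific workaround mentioned above is not reproduced.

```c
#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical wrapper: return the next entry, or NULL at end of directory.
 * *failed is set to 1 only when readdir() reported a real error.
 */
struct dirent *
read_dir_checked(DIR *dir, int *failed)
{
    struct dirent *entry;

    assert(dir != NULL && failed != NULL);
    errno = 0;                  /* readdir() returns NULL for EOF and for errors */
    entry = readdir(dir);
    *failed = (entry == NULL && errno != 0);
    return entry;
}

/* Count the entries in a directory; return -1 if it cannot be opened or read. */
int
count_dir_entries(const char *path)
{
    DIR        *dir = opendir(path);
    int         failed = 0;
    int         count = 0;

    if (dir == NULL)
        return -1;
    while (read_dir_checked(dir, &failed) != NULL)
        count++;
    closedir(dir);
    return failed ? -1 : count;
}
```

Keeping the `errno = 0` reset inside the wrapper means each call site is reduced to a plain loop, which is the kind of simplification the commit message describes.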