Commits · 1494931d7375ccdc6afd34f135bc708f8954eecc · Abuhujair Javed / Postgres FD Implementation

21 Mar, 2014 3 commits

Remove MinGW readdir/errno bug workaround fixed on 2003-10-10 · 1494931d
Bruce Momjian authored Mar 21, 2014

1494931d

Properly check for readdir/closedir() failures · 6f03927f

Bruce Momjian authored Mar 21, 2014

Clear errno before calling readdir() and handle old MinGW errno bug
while adding full test coverage for readdir/closedir failures.

Backpatch through 8.4.

6f03927f

Replace the XLogInsert slots with regular LWLocks. · 68a2e52b

Heikki Linnakangas authored Mar 21, 2014

The special feature the XLogInsert slots had over regular LWLocks is the
insertingAt value that was updated atomically with releasing backends
waiting on it. Add new functions to the LWLock API to do that, and replace
the slots with LWLocks. This reduces the amount of duplicated code.
(There's still some duplication, but at least it's all in lwlock.c now.)

Reviewed by Andres Freund.

68a2e52b

20 Mar, 2014 3 commits

Again fix initialization of auto-tuned effective_cache_size. · af930e60

Tom Lane authored Mar 20, 2014

The previous method was overly complex and underly correct; in particular,
by assigning the default value with PGC_S_OVERRIDE, it prevented later
attempts to change the setting in postgresql.conf, as noted by Jeff Janes.
We should just assign the default value with source PGC_S_DYNAMIC_DEFAULT,
which will have the desired priority relative to the boot_val as well as
user-set values.

There is still a gap in this method: if there's an explicit assignment of
effective_cache_size = -1 in the postgresql.conf file, and that assignment
appears before shared_buffers is assigned, the code will substitute 4 times
the bootstrap default for shared_buffers, and that value will then persist
(since it will have source PGC_S_FILE).  I don't see any very nice way
to avoid that though, and it's not a case to be expected in practice.
The existing comments in guc-file.l look forward to a redesign of the
DYNAMIC_DEFAULT mechanism; if that ever happens, we should consider this
case as one of the things we'd like to improve.

af930e60
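
The priority argument in af930e60 can be illustrated with a toy model. It assumes only what the message states: a value assigned with an "override" source blocks later assignments from the configuration file, while a "dynamic default" source does not. The enum, struct, and function names below are hypothetical and much simpler than the real GUC machinery.

```c
#include <stdbool.h>

/* Simplified, hypothetical ordering of setting sources, weakest first. */
typedef enum
{
    SRC_DEFAULT,                /* compiled-in boot value */
    SRC_DYNAMIC_DEFAULT,        /* default computed at startup */
    SRC_FILE,                   /* configuration file */
    SRC_OVERRIDE                /* forced value */
} SettingSource;

typedef struct
{
    int         value;
    SettingSource source;
} Setting;

/*
 * Apply a new value only if its source is at least as strong as the
 * source of the current value.  Returns true if the value was applied.
 */
bool
setting_assign(Setting *s, int value, SettingSource source)
{
    if (source < s->source)
        return false;
    s->value = value;
    s->source = source;
    return true;
}
```

In this model, an auto-tuned value assigned with SRC_OVERRIDE would make a later SRC_FILE assignment fail, which mirrors the reported problem; assigning it with SRC_DYNAMIC_DEFAULT leaves the file setting free to win.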

libpq: pass a memory allocation failure error up to PQconndefaults() · a4c8f143
Bruce Momjian authored Mar 20, 2014
```
Previously user name memory allocation failures were ignored and the
default user name set to NULL.
```
a4c8f143

test_shm_mq: Improve regression tests. · d1bdab2f
Robert Haas authored Mar 20, 2014
```
Per discussion with Tom Lane.
```
d1bdab2f

19 Mar, 2014 3 commits

Setup error context callback for transaction lock waits · f88d4cfc

Alvaro Herrera authored Mar 19, 2014

With this in place, a session blocked behind another one because of
tuple locks will get a context line mentioning the relation name, the tuple
TID, and the operation being performed on the tuple.  For example:

LOG:  process 11367 still waiting for ShareLock on transaction 717 after 1000.108 ms
DETAIL:  Process holding the lock: 11366. Wait queue: 11367.
CONTEXT:  while updating tuple (0,2) in relation "foo"
STATEMENT:  UPDATE foo SET value = 3;

Most usefully, the new line is displayed by log entries due to
log_lock_waits, although of course it will be printed by any other log
message as well.

Author: Christian Kruse, some tweaks by Álvaro Herrera
Reviewed-by: Amit Kapila, Andres Freund, Tom Lane, Robert Haas

f88d4cfc
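
The mechanism in f88d4cfc can be pictured as a stack of callbacks that each add a CONTEXT line to whatever message is logged while they are active. The sketch below is a self-contained toy model under that assumption; every name in it (ContextCallback, context_stack, format_log, wait_for_tuple_lock) is illustrative and not taken from the PostgreSQL source.

```c
#include <stdio.h>
#include <string.h>

/* Toy model of a stack of context callbacks; all names are illustrative. */
typedef struct ContextCallback
{
    struct ContextCallback *previous;   /* next-older entry on the stack */
    void        (*callback) (void *arg, char *buf, size_t buflen);
    void       *arg;
} ContextCallback;

static ContextCallback *context_stack = NULL;

typedef struct
{
    const char *relname;
    unsigned    block;
    unsigned    offset;
} TupleLockInfo;

/* Appends one CONTEXT line describing the tuple being waited on. */
static void
tuple_wait_context(void *arg, char *buf, size_t buflen)
{
    TupleLockInfo *info = arg;
    size_t      used = strlen(buf);

    snprintf(buf + used, buflen - used,
             "CONTEXT:  while updating tuple (%u,%u) in relation \"%s\"\n",
             info->block, info->offset, info->relname);
}

/* Format a log message, then let every active callback add its context. */
void
format_log(const char *msg, char *buf, size_t buflen)
{
    ContextCallback *cb;

    snprintf(buf, buflen, "LOG:  %s\n", msg);
    for (cb = context_stack; cb != NULL; cb = cb->previous)
        cb->callback(cb->arg, buf, buflen);
}

/* Simulate a lock wait: push the callback, log, then pop it again. */
void
wait_for_tuple_lock(TupleLockInfo *info, char *buf, size_t buflen)
{
    ContextCallback cb;

    cb.callback = tuple_wait_context;
    cb.arg = info;
    cb.previous = context_stack;
    context_stack = &cb;

    format_log("still waiting for ShareLock", buf, buflen);

    context_stack = cb.previous;    /* pop before returning */
}
```

Because the callback is only on the stack for the duration of the wait, messages logged outside the wait carry no such context line, while any message logged during it does.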

Fix memory leak during regular expression execution. · ea8c7e90

Tom Lane authored Mar 19, 2014

For a regex containing backrefs, pg_regexec() might fail to free all the
sub-DFAs that were created during execution, resulting in a permanent
(session lifespan) memory leak.  Problem was introduced by me in commit
58735947.  Per report from Sandro Santilli;
diagnosis by Greg Stark.

ea8c7e90

Some minor improvements to logical decoding document. · fb1d92a9
Fujii Masao authored Mar 19, 2014
```
Also improve help message in pg_recvlogical.
```
fb1d92a9

18 Mar, 2014 16 commits

Fix compilation of pg_xlogdump, now that rm_safe_restartpoint is no more. · 033dc1c9
Heikki Linnakangas authored Mar 18, 2014
```
Oops. Pointed out by Andres Freund.
```
033dc1c9

Remove rm_safe_restartpoint machinery. · 59a5ab3f

Heikki Linnakangas authored Mar 18, 2014

It is no longer used; none of the resource managers have multi-record
actions that would make it unsafe to perform a restartpoint.

Also don't allow rm_cleanup to write WAL records; that is also no longer
required. Move the call to rm_cleanup routines to make it more symmetric
with rm_startup.

59a5ab3f

Fix misc typos in comments. · 1d3b258c
Heikki Linnakangas authored Mar 18, 2014

1d3b258c

Logical decoding documentation corrections. · 3ee4fcfc
Robert Haas authored Mar 18, 2014
```
Thom Brown
```
3ee4fcfc

Fix uninitialized variable. · a3b30d4c
Robert Haas authored Mar 18, 2014
```
Report from Andres Freund, but not his fix.
```
a3b30d4c

Make the handling of interrupted B-tree page splits more robust. · 40dae7ec

Heikki Linnakangas authored Mar 18, 2014

Splitting a page consists of two separate steps: splitting the child page,
and inserting the downlink for the new right page to the parent. Previously,
we handled the case that you crash in between those steps with a cleanup
routine after the WAL recovery had finished, which finished the incomplete
split. However, that doesn't help if the page split is interrupted but the
database doesn't crash, so that you don't perform WAL recovery. That could
happen for example if you run out of disk space.

Remove the end-of-recovery cleanup step. Instead, when a page is split, the
left page is marked with a new INCOMPLETE_SPLIT flag, and when the downlink
is inserted to the parent, the flag is cleared again. If an insertion sees
a page with the flag set, it knows that the split was interrupted for some
reason, and inserts the missing downlink before proceeding.

I used the same approach to fix GIN and GiST split algorithms earlier. This
was the last WAL cleanup routine, so we could get rid of that whole
machinery now, but I'll leave that for a separate patch.

Reviewed by Peter Geoghegan.

40dae7ec
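
The flag protocol described in 40dae7ec can be sketched with a toy page structure. This is an assumption-laden illustration: the Page struct, the PAGE_INCOMPLETE_SPLIT bit, and the three functions are hypothetical, and locking and WAL logging are left out entirely.

```c
#include <stdbool.h>
#include <stddef.h>

#define PAGE_INCOMPLETE_SPLIT 0x01u     /* hypothetical flag bit */

typedef struct Page
{
    unsigned    flags;
    struct Page *right;         /* right sibling, or NULL */
    bool        has_downlink;   /* does the parent point to this page? */
} Page;

/* Step 1: split the child.  The new right page has no downlink yet. */
void
split_page(Page *left, Page *new_right)
{
    new_right->flags = 0;
    new_right->right = left->right;
    new_right->has_downlink = false;
    left->right = new_right;
    left->flags |= PAGE_INCOMPLETE_SPLIT;
}

/* Step 2: insert the downlink into the parent and clear the flag. */
void
finish_split(Page *left)
{
    left->right->has_downlink = true;
    left->flags &= ~PAGE_INCOMPLETE_SPLIT;
}

/*
 * Called before inserting into a page.  If an earlier split was
 * interrupted between step 1 and step 2, complete it first.
 * Returns true if a repair was needed.
 */
bool
prepare_insert(Page *page)
{
    if (page->flags & PAGE_INCOMPLETE_SPLIT)
    {
        finish_split(page);
        return true;
    }
    return false;
}
```

The point of the design is that the repair no longer depends on crash recovery running: whichever insertion next visits the flagged page completes the split.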

Fix some remaining int64 vestiges in contrib/test_shm_mq. · b6ec7c92
Tom Lane authored Mar 18, 2014
```
Andres Freund and Tom Lane
```
b6ec7c92

test_shm_mq: Use Size rather than uint64. · c676ac0f

Robert Haas authored Mar 18, 2014

Commit 3bd261ca updated the API but
neglected to make the corresponding edits here.

Per Tom Lane and the buildfarm.

c676ac0f

Documentation for logical decoding. · 49c0864d
Robert Haas authored Mar 18, 2014
```
Craig Ringer, Andres Freund, Christian Kruse, with edits by me.
```
49c0864d

Add pg_recvlogical, a tool to receive logical decoding data. · 8bdd12bb

Robert Haas authored Mar 18, 2014

This is fairly basic at the moment, but it's at least useful for
testing and debugging, and possibly more.

Andres Freund

8bdd12bb

Rewrite comment for shm_mq_receive_bytes. · 250f8a7b

Robert Haas authored Mar 18, 2014

The comment and the code diverged at some point before the initial
commit of this feature, and I failed to notice.

Noted by Tom Lane.

250f8a7b

Fix relcache reference leak in refresh_by_match_merge(). · f7271c44

Tom Lane authored Mar 18, 2014

One path through the loop over indexes forgot to do index_close(). Rather
than adding a fourth call, restructure slightly so that there's only one.

In passing, get rid of an unnecessary syscache lookup: the pg_index struct
for the index is already available from its relcache entry.

Per report from YAMAMOTO Takashi, though this is a bit different from his
suggested patch. This is new code in HEAD, so no need for back-patch.

f7271c44

Improve shm_mq portability around MAXIMUM_ALIGNOF and sizeof(Size). · 3bd261ca

Robert Haas authored Mar 18, 2014

Revise the original decision to expose a uint64-based interface and
use Size everywhere possible.  Avoid assuming that MAXIMUM_ALIGNOF is
8, or making any assumption about the relationship between that value
and sizeof(Size).  If MAXIMUM_ALIGNOF is bigger, we'll now insert
padding after the length word; if it's smaller, we are now prepared
to read and write the length word in chunks.

Per discussion with Tom Lane.

3bd261ca
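
The padding rule from 3bd261ca can be worked through with a little arithmetic. The helpers below are hypothetical and take the length-word size and the maximum alignment as parameters, so that the cases in the message (alignment larger than, equal to, or smaller than the length word) can be compared on any platform.

```c
#include <stddef.h>

/* Round len up to the next multiple of align (align is a power of two). */
static size_t
align_up(size_t len, size_t align)
{
    return (len + align - 1) & ~(align - 1);
}

/* Padding inserted after a length word of word_size bytes. */
size_t
length_word_padding(size_t word_size, size_t max_align)
{
    return align_up(word_size, max_align) - word_size;
}

/*
 * Total bytes one message occupies when both the length word and the
 * payload are padded out to the maximum alignment.
 */
size_t
message_footprint(size_t payload_len, size_t word_size, size_t max_align)
{
    return align_up(word_size, max_align) + align_up(payload_len, max_align);
}
```

With an 8-byte length word, an alignment of 16 yields 8 bytes of padding and an alignment of 8 yields none. An alignment of 4 also yields no padding, but then the word is no longer guaranteed to sit in one aligned unit, which is the case the message handles by reading and writing the length word in chunks.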

Fix pg_dumpall option parsing: -i doesn't take an argument. · 19f2d6cd
Tom Lane authored Mar 18, 2014
```
This used to work properly, but got fat-fingered in commit
3dee636e.  Per bug #9620 from
Nicolas Payart.
```
19f2d6cd

Fix help message and document in pg_receivexlog. · e726e59d
Fujii Masao authored Mar 18, 2014
```
Add SLOTNAME placeholder to --slot option in help message and
document.
```
e726e59d

Make it easy to detach completely from shared memory. · 79a4d24f

Robert Haas authored Mar 18, 2014

The new function dsm_detach_all() can be used either by postmaster
children that don't wish to take any risk of accidentally corrupting
shared memory; or by forked children of regular backends with
the same need.  This patch also updates the postmaster children that
already do PGSharedMemoryDetach() to do dsm_detach_all() as well.

Per discussion with Tom Lane.

79a4d24f

17 Mar, 2014 11 commits

Release notes for 9.3.4, 9.2.8, 9.1.13, 9.0.17, 8.4.21. · 551fb5ac
Tom Lane authored Mar 17, 2014

551fb5ac

During index build, check and elog (not just Assert) for broken HOT chain. · d70cf811

Tom Lane authored Mar 17, 2014

The recently-fixed bug in WAL replay could result in not finding a parent
tuple for a heap-only tuple.  The existing code would either Assert or
generate an invalid index entry, neither of which is desirable.  Throw a
regular error instead.

d70cf811

Fix thinko: have trueTriConsistentFn return GIN_TRUE. · d663d439
Heikki Linnakangas authored Mar 17, 2014
```
While we're at it, also improve comments in ginlogic.c.
```
d663d439

Fix typos in comments. · 2bccced1
Fujii Masao authored Mar 17, 2014
```
Thom Brown
```
2bccced1

Fix bug in clean shutdown of walsender that pg_receivexlog is connecting to. · 5c6d9fc4

Fujii Masao authored Mar 17, 2014

On clean shutdown, walsender waits for all WAL to be replicated to a standby,
and then exits. It determined whether replication had completed by checking
whether its sent location was equal to the standby's flush location.
Unfortunately, this condition never becomes true when the connected standby
is a client such as pg_receivexlog, which always returns an invalid flush
location, so walsender waits forever.

This commit changes walsender so that it just checks the standby's write
location if the flush location is invalid.

Back-patch to 9.1 where enough infrastructure for this exists.

5c6d9fc4
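
The shutdown check described in 5c6d9fc4 reduces to a small decision. The sketch below assumes that an invalid location is represented by zero; WalPosition, INVALID_WAL_POSITION, and standby_caught_up are hypothetical names, not the identifiers used in walsender.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t WalPosition;   /* hypothetical stand-in for a WAL location */

#define INVALID_WAL_POSITION ((WalPosition) 0)

/*
 * Decide whether the standby has received everything that was sent.
 * Use the flush location when the standby reports one; otherwise fall
 * back to the write location.
 */
bool
standby_caught_up(WalPosition sent, WalPosition write_pos, WalPosition flush_pos)
{
    WalPosition replicated =
        (flush_pos == INVALID_WAL_POSITION) ? write_pos : flush_pos;

    return sent == replicated;
}
```

A client that never reports a flush location can now satisfy the check through its write location, so the sender no longer waits forever.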

Fix small typo in comment · 02703ff2
Magnus Hagander authored Mar 17, 2014
```
Michael Paquier
```
02703ff2

plperl: Fix memory leak in hek2cstr · bd1154ed

Alvaro Herrera authored Mar 16, 2014

Backpatch all the way back to 9.1, where it was introduced by commit
50d89d42.

Reported by Sergey Burladyan in #9223
Author: Alex Hunsaker

bd1154ed

Fix unportable shell-script syntax in pg_upgrade's test.sh. · 0268d21e

Tom Lane authored Mar 16, 2014

I discovered the hard way that on some old shells, the locution
    FOO=""   unset FOO
does not behave the same as
    FOO="";  unset FOO
and in fact leaves FOO set to an empty string.  test.sh was inconsistently
spelling it different ways on adjacent lines.

This got broken relatively recently, in commit c737a2e5, so the lack of
field reports to date doesn't represent a lot of evidence that the problem
is rare.

0268d21e

Make punctuation consistent · 2861e8e9
Peter Eisentraut authored Mar 16, 2014

2861e8e9

Fix whitespace · e2b95947
Peter Eisentraut authored Mar 16, 2014

e2b95947

Fix advertised dispsize for libpq's sslmode connection parameter. · f4051e36

Tom Lane authored Mar 16, 2014

"8" was correct back when "disable" was the longest allowed value, but
since "verify-full" was added, it should be "12". Given the lack of
complaints, I wouldn't be surprised if nobody is actually using these
values ... but still, if they're in the API, they should be right.

Noticed while pursuing a different problem. It's been wrong for quite
a long time, so back-patch to all supported branches.

f4051e36
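
The two numbers in f4051e36 are consistent with a display size equal to the longest allowed value plus one: "disable" has 7 characters and "verify-full" has 11. That rule is inferred from the message rather than confirmed by it, and display_size below is a hypothetical helper that simply does the arithmetic over the sslmode values.

```c
#include <stddef.h>
#include <string.h>

/* sslmode values; the first four predate the two verify-* modes. */
const char *const sslmode_values[] = {
    "disable", "allow", "prefer", "require", "verify-ca", "verify-full"
};

/*
 * Length of the longest value plus one.  This is an inferred rule that
 * happens to reproduce both "8" and "12" from the commit message.
 */
size_t
display_size(const char *const *values, size_t nvalues)
{
    size_t      longest = 0;

    for (size_t i = 0; i < nvalues; i++)
    {
        size_t      len = strlen(values[i]);

        if (len > longest)
            longest = len;
    }
    return longest + 1;
}
```

Deriving the number from the list of values, instead of hard-coding it, would keep it from going stale the next time a longer value is added.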

16 Mar, 2014 1 commit

Cleanups from the remove-native-krb5 patch · 0294023a

Magnus Hagander authored Mar 16, 2014

krb_srvname is actually not available anymore as a parameter server-side, since
with gssapi we accept all principals in our keytab. It's still used in libpq for
client side specification.

In passing remove declaration of krb_server_hostname, where all the functionality
was already removed.

Noted by Stephen Frost, though a different solution than his suggestion

0294023a

15 Mar, 2014 2 commits

First-draft release notes for 9.3.4. · e3c9f232

Tom Lane authored Mar 15, 2014

As usual, the release notes for older branches will be made by cutting
these down, but put them up for community review first.

e3c9f232

Update time zone data files to tzdata release 2014a. · aba7f567
Tom Lane authored Mar 15, 2014
```
DST law changes in Fiji, Turkey; historical changes in Israel, Ukraine.
```
aba7f567

14 Mar, 2014 1 commit

Fix race condition in B-tree page deletion. · efada2b8

Heikki Linnakangas authored Mar 14, 2014

In short, we don't allow a page to be deleted if it's the rightmost child
of its parent, but that situation can change after we check for it.

Problem
-------

We check that the page to be deleted is not the rightmost child of its
parent, and then lock its left sibling, the page itself, its right sibling,
and the parent, in that order. However, if the parent page is split after
the check but before acquiring the locks, the target page might become the
rightmost child, if the split happens at the right place. That leads to an
error in vacuum (I reproduced this by setting a breakpoint in debugger):

ERROR: failed to delete rightmost child 41 of block 3 in index "foo_pkey"

We currently re-check that the page is still not the rightmost child, and throw
the above error if it is. We could easily just give up rather than throw
an error, but that approach doesn't scale to half-dead pages. To recap,
although we don't normally allow deleting the rightmost child, if the page
is the *only* child of its parent, we delete the child page and mark the
parent page as half-dead in one atomic operation. But before we do that, we
check that the parent can later be deleted, by checking that it in turn is
not the rightmost child of the grandparent (potentially recursing all the
way up to the root). But the same situation can arise there - the
grandparent can be split while we're not holding the locks. We end up with
a half-dead page that we cannot delete.

To make things worse, the keyspace of the deleted page has already been
transferred to its right sibling. As the README points out, the keyspace at
the grandparent level is "out-of-whack" until the half-dead page is deleted,
and if enough tuples with keys in the transferred keyspace are inserted, the
page might get split and a downlink might be inserted into the grandparent
that is out-of-order. That might not cause any serious problem if it's
transient (as the README ponders), but is surely bad if it stays that way.

Solution
--------

This patch changes the page deletion algorithm to avoid that problem. After
checking that the topmost page in the chain of to-be-deleted pages is not
the rightmost child of its parent, and then deleting the pages from bottom
up, unlink the pages from top to bottom. This way, the intermediate stages
are similar to the intermediate stages in page splitting, and there is no
transient stage where the keyspace is "out-of-whack". The topmost page in
the to-be-deleted chain doesn't have a downlink pointing to it, like a page
split before the downlink has been inserted.

This also allows us to get rid of the cleanup step after WAL recovery, if we
crash during page deletion. The deletion will be continued at next VACUUM,
but the tree is consistent for searches and insertions at every step.

This bug is old, all supported versions are affected, but this patch is too
big to back-patch (and changes the WAL record formats of related records).
We have not heard any reports of the bug from users, so clearly it's not
easy to bump into. Maybe backpatch later, after this has had some field
testing.

Reviewed by Kevin Grittner and Peter Geoghegan.

efada2b8