1. 14 Dec, 2012 4 commits
  2. 13 Dec, 2012 2 commits
    • Allow a streaming replication standby to follow a timeline switch. · abfd192b
      Heikki Linnakangas authored
      Before this patch, streaming replication would refuse to start replicating
      if the primary's timeline didn't exactly match the standby's. The mismatch
      arises when you have a master and two standbys, and you promote one of the
      standbys to become the new master. Promotion bumps up the timeline ID, and
      after that bump, the other standby would refuse to continue.
      
      There's significantly more timeline related logic in streaming replication
      now. First of all, when a standby connects to primary, it will ask the
      primary for any timeline history files that are missing from the standby.
      The missing files are sent using a new replication command TIMELINE_HISTORY,
      and stored in standby's pg_xlog directory. Using the timeline history files,
      the standby can follow the latest timeline present in the primary
      (recovery_target_timeline='latest'), just as it can follow new timelines
      appearing in an archive directory.
      
      START_REPLICATION now takes a TIMELINE parameter, to specify exactly which
      timeline to stream WAL from. This allows the standby to request the primary
      to send over WAL that precedes the promotion. The replication protocol is
      changed slightly (in a backwards-compatible way, although there's little hope
      of streaming replication working across major versions anyway) to allow
      replication to stop when the end of the timeline is reached, returning the
      walsender to a state where it accepts the next replication command.
      
      Many thanks to Amit Kapila for testing and reviewing various versions of
      this patch.
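      The timeline-following idea can be illustrated with a small sketch (Python,
      with invented names; this is not the server code): given the parsed contents
      of a timeline history file, the standby can work out which timeline any WAL
      position belongs to, which is what lets it request pre-promotion WAL from
      the new primary.

```python
# Hypothetical sketch, not PostgreSQL source. A timeline history is
# modeled as a list of (tli, switch_lsn) entries in ascending order:
# WAL positions below switch_lsn belong to that (older) timeline.

def timeline_for_position(history, target_tli, pos):
    """Return the timeline the WAL position `pos` belongs to."""
    for tli, switch_lsn in history:
        if pos < switch_lsn:
            return tli
    return target_tli

# Example: timeline 1 switched to timeline 2 at LSN 1000.
history = [(1, 1000)]
print(timeline_for_position(history, 2, 500))   # pre-promotion WAL: timeline 1
print(timeline_for_position(history, 2, 1500))  # post-promotion WAL: timeline 2
```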
    • Make xlog_internal.h includable in frontend context. · 52766871
      Heikki Linnakangas authored
      This makes unnecessary the ugly hack used to #include postgres.h in
      pg_basebackup.
      
      Based on Alvaro Herrera's patch
  3. 12 Dec, 2012 3 commits
    • In multi-insert, don't go into infinite loop on a huge tuple and fillfactor. · 6264cd3d
      Heikki Linnakangas authored
      If a tuple is larger than page size minus space reserved for fillfactor,
      heap_multi_insert would never find a page that it fits in and repeatedly ask
      for a new page from RelationGetBufferForTuple. If a tuple is too large to
      fit on any page, taking fillfactor into account, RelationGetBufferForTuple
      will always expand the relation. In a normal insert, heap_insert will accept
      that and put the tuple on the new page. heap_multi_insert, however, does a
      fillfactor check of its own, and doesn't accept the newly-extended page
      RelationGetBufferForTuple returns, even though there is no other choice to
      make the tuple fit.
      
      Fix that by making the logic in heap_multi_insert more like the heap_insert
      logic. The first tuple is always put on the page RelationGetBufferForTuple
      gives us, and the fillfactor check is only applied to the subsequent tuples.
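      As a rough illustration of the fixed loop (a toy Python model with assumed
      names and sizes, not the C code): the first tuple of a batch is always
      placed on the page RelationGetBufferForTuple returned, and the fillfactor
      limit only applies to the tuples after it.

```python
# Toy model of the corrected heap_multi_insert placement logic.
PAGE_SIZE = 8192

def fill_page(tuple_sizes, fillfactor=0.5):
    """Return how many tuples from the batch fit on one empty page."""
    limit = PAGE_SIZE * fillfactor  # space usable under fillfactor
    used = 0
    placed = 0
    for i, size in enumerate(tuple_sizes):
        # First tuple: accept unconditionally, like heap_insert does.
        if i > 0 and used + size > limit:
            break
        used += size
        placed += 1
    return placed

# An oversized first tuple no longer causes an infinite retry loop:
# it is simply placed alone on the freshly extended page.
print(fill_page([6000, 1000]))                    # 1
print(fill_page([1000, 1000, 1000, 1000, 1000]))  # 4 (limit is 4096)
```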
      
      Report from David Gould, although I didn't use his patch.
    • Add defenses against integer overflow in dynahash numbuckets calculations. · 691c5ebf
      Tom Lane authored
      The dynahash code requires the number of buckets in a hash table to fit
      in an int; but since we calculate the desired hash table size dynamically,
      there are various scenarios where we might calculate too large a value.
      The resulting overflow can lead to infinite loops, division-by-zero
      crashes, etc.  I (tgl) had previously installed some defenses against that
      in commit 299d1716, but that covered only one
      call path.  Moreover it worked by limiting the request size to work_mem,
      but on a 64-bit machine it's possible to set work_mem high enough that the
      problem appears anyway.  So let's fix the problem at the root by installing
      limits in the dynahash.c functions themselves.
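      The idea behind the fix can be sketched in Python (names and the exact
      clamping rule are illustrative assumptions, not the dynahash.c code): clamp
      the requested element count before rounding up to a power of two, so the
      resulting bucket count always fits in a signed 32-bit int.

```python
# Illustrative clamp against int overflow in a bucket-count calculation.
INT_MAX = 2**31 - 1

def next_pow2(n):
    """Smallest power of two >= n."""
    p = 1
    while p < n:
        p <<= 1
    return p

def choose_nbuckets(nelem):
    # Clamp first, so the rounded-up power of two cannot exceed INT_MAX.
    nelem = min(nelem, 2**30)
    return next_pow2(nelem)

print(choose_nbuckets(1000))                 # 1024
print(choose_nbuckets(10**12) <= INT_MAX)    # True: huge requests are clamped
```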
      
      Trouble report and patch by Jeff Davis.
    • Disable event triggers in standalone mode. · cd3413ec
      Tom Lane authored
      Per discussion, this seems necessary to allow recovery from broken event
      triggers, or broken indexes on pg_event_trigger.
      
      Dimitri Fontaine
  4. 11 Dec, 2012 6 commits
    • Fix performance problems with autovacuum truncation in busy workloads. · b19e4250
      Kevin Grittner authored
      In situations where a table has over 8MB of empty pages at its end,
      the truncation work for those trailing pages takes longer than
      deadlock_timeout, and the table is frequently accessed by processes
      other than autovacuum, the autovacuum worker process could be canceled
      by the deadlock checking code. The truncation work done by autovacuum
      up to that point was lost, and the attempt was retried by a later
      autovacuum worker. The attempts could continue indefinitely without
      making progress, consuming resources and blocking other processes for
      up to deadlock_timeout each time.
      
      This patch has the autovacuum worker check at 20ms intervals whether
      it is blocking any other process. If such a condition
      develops, the autovacuum worker will persist the work it has done
      so far, release its lock on the table, and sleep in 50ms intervals
      for up to 5 seconds, hoping to be able to re-acquire the lock and
      try again. If it is unable to get the lock in that time, it moves
      on and a worker will try to continue later from the point this one
      left off.
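      The steps above can be sketched as a toy Python loop (timings taken from
      the commit message; the helper callbacks and their names are invented for
      illustration, not the actual C implementation):

```python
import time

def truncate_with_yield(has_waiters, reacquire, do_some_truncation,
                        check_every=0.020, retry_every=0.050, retry_for=5.0):
    """Truncate trailing empty pages, yielding the lock under contention."""
    while do_some_truncation():          # truncate one chunk; False when done
        time.sleep(check_every)          # check for blocked processes at 20ms
        if has_waiters():
            # Persist the work done so far and release the table lock,
            # then retry in 50ms steps for up to 5 seconds.
            deadline = time.monotonic() + retry_for
            while time.monotonic() < deadline:
                if reacquire():
                    break                # got the lock back; keep truncating
                time.sleep(retry_every)
            else:
                return False             # give up; a later worker resumes
    return True

# Toy run: no contention, two chunks of work, completes fully.
work = iter([True, True, False])
done = truncate_with_yield(lambda: False, lambda: True, lambda: next(work),
                           check_every=0, retry_every=0, retry_for=0)
print(done)  # True
```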
      
      While this patch doesn't change the rules about when and what to
      truncate, it does cause the truncation to occur sooner, with less
      blocking, and with the consumption of fewer resources when there is
      contention for the table's lock.
      
      The only user-visible change other than improved performance is
      that the table size during truncation may change incrementally
      instead of just once.
      
      This problem exists in all supported versions but is infrequently
      reported, although some reports of performance problems when
      autovacuum runs might be caused by this. Initial commit is just the
      master branch, but this should probably be backpatched once the
      build farm and general developer usage confirm that there are no
      surprising effects.
      
      Jan Wieck
    • Fix pg_upgrade for invalid indexes · e95c4bd1
      Bruce Momjian authored
      All versions of pg_upgrade upgraded invalid indexes caused by CREATE
      INDEX CONCURRENTLY failures and marked them as valid.  The patch adds a
      check to all pg_upgrade versions and throws an error during upgrade or
      --check.
      
      Backpatch to 9.2, 9.1, 9.0.  Patch slightly adjusted.
    • Consistency check should compare last record replayed, not last record read. · 970fb12d
      Heikki Linnakangas authored
      EndRecPtr is the last record that we've read, but not necessarily yet
      replayed. CheckRecoveryConsistency should compare minRecoveryPoint with the
      last replayed record instead. This caused recovery to think it's reached
      consistency too early.
      
      Now that we do the check in CheckRecoveryConsistency correctly, we have to
      move the call of that function to after redoing a record. The current place,
      after reading a record but before replaying it, is wrong. In particular, if
      there are no more records after the one ending at minRecoveryPoint, we don't
      enter hot standby until one extra record is generated and read by the
      standby, and CheckRecoveryConsistency is called. These two bugs conspired
      to make the code appear to work correctly, except for the small window
      between reading the last record that reaches minRecoveryPoint, and
      replaying it.
      
      In passing, rename recoveryLastRecPtr, which is the last record
      replayed, to lastReplayedEndRecPtr. This makes it slightly less confusing
      with replayEndRecPtr, which is the last record read that we're about to
      replay.
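      A minimal model of the corrected check (invented names; the real logic
      lives in C in xlog.c): consistency is declared only once the last
      *replayed* record reaches minRecoveryPoint, not merely the last record
      *read*.

```python
# Toy model of the corrected CheckRecoveryConsistency comparison.
def is_consistent(min_recovery_point, last_replayed_end):
    """Consistent only when replay, not just reading, has passed the point."""
    return last_replayed_end >= min_recovery_point

# A record ending at LSN 900 has been read but not yet replayed:
min_recovery_point = 900
last_replayed_end = 800
print(is_consistent(min_recovery_point, last_replayed_end))  # False
# Only after the record is replayed is the standby really consistent:
print(is_consistent(min_recovery_point, 900))                # True
```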
      
      Original report from Kyotaro HORIGUCHI, further diagnosis by Fujii Masao.
      Backpatch to 9.0, where Hot Standby subtly changed the test from
      "minRecoveryPoint < EndRecPtr" to "minRecoveryPoint <= EndRecPtr". The
      former works because where the test is performed, we have always read one
      more record than we've replayed.
    • Add mode where contrib installcheck runs each module in a separately named database. · ad69bd05
      Andrew Dunstan authored
      Normally each module is tested in a database named contrib_regression,
      which is dropped and recreated at the beginning of each pg_regress run.
      This new mode, enabled by adding USE_MODULE_DB=1 to the make command
      line, runs most modules in a database with the module name embedded in
      it.
      
      This will make testing pg_upgrade on clusters with the contrib modules
      a lot easier.
      
      Second attempt at this, this time accommodating make versions older
      than 3.82.
      
      Still to be done: adapt to the MSVC build system.
      
      Backpatch to 9.0, which is the earliest version it is reasonably
      possible to test upgrading from.
    • Fix pg_upgrade -O/-o options · acdb8c22
      Bruce Momjian authored
      Fix previous commit that added synchronous_commit=off, but broke -O/-o
      due to missing space in argument passing.
      
      Backpatch to 9.2.
    • doc: Remove blastwave.org link · 8e48d77c
      Peter Eisentraut authored
      Apparently, this service has been dead since 2008.
  5. 10 Dec, 2012 2 commits
    • Update minimum recovery point on truncation. · 7bffc9b7
      Heikki Linnakangas authored
      If a file is truncated, we must update minRecoveryPoint. Once a file is
      truncated, there's no going back; it would not be safe to stop recovery
      at a point earlier than that anymore.
      
      Per report from Kyotaro HORIGUCHI. Backpatch to 8.4. Before that,
      minRecoveryPoint was not updated during recovery at all.
    • Fix the tracking of min recovery point timeline. · 6be79966
      Heikki Linnakangas authored
      Forgot to update it in the right place. Also, consider a checkpoint record
      that switches to a new timeline to be on the new timeline.
      
      This fixes erroneous "requested timeline 2 does not contain minimum recovery
      point" errors, pointed out by Amit Kapila while testing another patch.
  6. 09 Dec, 2012 1 commit
    • Fix assorted bugs in privileges-for-types patch. · b46c9211
      Tom Lane authored
      Commit 72920557 added privileges on data
      types, but there were a number of oversights.  The implementation of
      default privileges for types missed a few places, and pg_dump was
      utterly innocent of the whole concept.  Per bug #7741 from Nathan Alden,
      and subsequent wider investigation.
  7. 08 Dec, 2012 2 commits
    • Support automatically-updatable views. · a99c42f2
      Tom Lane authored
      This patch makes "simple" views automatically updatable, without the need
      to create either INSTEAD OF triggers or INSTEAD rules.  "Simple" views
      are those classified as updatable according to SQL-92 rules.  The rewriter
      transforms INSERT/UPDATE/DELETE commands on such views directly into an
      equivalent command on the underlying table, which will generally have
      noticeably better performance than is possible with either triggers or
      user-written rules.  A view that has INSTEAD OF triggers or INSTEAD rules
      continues to operate the same as before.
      
      For the moment, security_barrier views are not considered simple.
      Also, we do not support WITH CHECK OPTION.  These features may be
      added in future.
      
      Dean Rasheed, reviewed by Amit Kapila
    • Update iso.org page link · d12d9f59
      Peter Eisentraut authored
      The old one is responding with 404.
  8. 07 Dec, 2012 5 commits
    • Improve pg_upgrade's status display · 6dd95845
      Bruce Momjian authored
      Pg_upgrade displays file names during copy and database names during
      dump/restore.  Andrew Dunstan identified three bugs:
      
      *  long file names were being truncated to 60 _leading_ characters, which
         often do not change for long file names
      
      *  file names were truncated to 60 characters in log files
      
      *  carriage returns were being output to log files
      
      This commit fixes these --- it prints 60 _trailing_ characters to the
      status display, and full path names without carriage returns to log
      files.  It also suppresses status output to the log file unless verbose
      mode is used.
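      The display fix amounts to tail-truncating instead of head-truncating. A
      quick Python sketch (function name and ellipsis convention are assumptions,
      not pg_upgrade's code): keep the last 60 characters of a long path, since
      that is the part that actually varies between files.

```python
# Toy sketch of trailing truncation for a fixed-width status display.
def status_display(path, width=60):
    """Show the tail of an over-long path, prefixed with an ellipsis."""
    if len(path) <= width:
        return path
    return "..." + path[-(width - 3):]

long_path = "/very/long/cluster/path/base/16384/" + "x" * 60
print(len(status_display(long_path)))  # 60
print(status_display("short"))         # short
```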
    • Correct xmax test for COPY FREEZE · ef754fb5
      Simon Riggs authored
    • Optimize COPY FREEZE with CREATE TABLE also. · 1f023f92
      Simon Riggs authored
      Jeff Davis, additional test by me
    • Clarify that COPY FREEZE is not a hard rule. · 1eb6cee4
      Simon Riggs authored
      Remove message when FREEZE not honoured,
      clarify reasons in comments and docs.
    • Improve pl/pgsql to support composite-type expressions in RETURN. · 31a89185
      Tom Lane authored
      For some reason lost in the mists of prehistory, RETURN was only coded to
      allow a simple reference to a composite variable when the function's return
      type is composite.  Allow an expression instead, while preserving the
      efficiency of the original code path in the case where the expression is
      indeed just a composite variable's name.  Likewise for RETURN NEXT.
      
      As is true in various other places, the supplied expression must yield
      exactly the number and data types of the required columns.  There was some
      discussion of relaxing that for pl/pgsql, but no consensus yet, so this
      patch doesn't address that.
      
      Asif Rehman, reviewed by Pavel Stehule
  9. 06 Dec, 2012 3 commits
    • Background worker processes · da07a1e8
      Alvaro Herrera authored
      Background workers are postmaster subprocesses that run arbitrary
      user-specified code.  They can request shared memory access as well as
      backend database connections; or they can just use plain libpq frontend
      database connections.
      
      Modules listed in shared_preload_libraries can register background
      workers in their _PG_init() function; this is early enough that it's not
      necessary to provide an extra GUC option, because the necessary extra
      resources can be allocated early on.  Modules can install more than one
      bgworker, if necessary.
      
      Care is taken that these extra processes do not interfere with other
      postmaster tasks: only one such process is started on each ServerLoop
      iteration, so even if a large number of them are waiting to be started,
      the postmaster is still able to quickly service external connection
      requests. Also, the shutdown sequence should not be impacted by a worker
      process that's reasonably well behaved (i.e., one that promptly responds
      to termination signals).
      
      The current implementation lets worker processes specify their start
      time, i.e. at what point in the server startup process they are to be
      started: right after postmaster start (in which case they mustn't ask
      for shared memory access), when consistent state has been reached
      (useful during recovery in a HOT standby server), or when recovery has
      terminated (i.e. when normal backends are allowed).
      
      In case of a bgworker crash, the actions taken depend on registration
      data: if shared memory was requested, then all other connections are
      taken down (as well as other bgworkers), just as if a regular backend
      had crashed.  The bgworker itself is restarted, too, within a
      configurable timeframe (which can be configured to be never).
      
      More features to add to this framework can be imagined without much
      effort, and have been discussed, but this seems good enough as a useful
      unit already.
      
      An elementary sample module is supplied.
      
      Author: Álvaro Herrera
      
      This patch is loosely based on prior patches submitted by KaiGai Kohei,
      and unsubmitted code by Simon Riggs.
      
      Reviewed by: KaiGai Kohei, Markus Wanner, Andres Freund,
      Heikki Linnakangas, Simon Riggs, Amit Kapila
    • Fix intermittent crash in DROP INDEX CONCURRENTLY. · e31d5248
      Tom Lane authored
      When deleteOneObject closes and reopens the pg_depend relation,
      we must see to it that the relcache pointer held by the calling function
      (typically performMultipleDeletions) is updated.  Usually the relcache
      entry is retained so that the pointer value doesn't change, which is why
      the problem had escaped notice ... but after a cache flush event there's
      no guarantee that the same memory will be reassigned.  To fix, change
      the recursive functions' APIs so that we pass around a "Relation *"
      not just "Relation".
      
      Per investigation of occasional buildfarm failures.  This is trivial
      to reproduce with -DCLOBBER_CACHE_ALWAYS, which points up the sad
      lack of any buildfarm member running that way on a regular basis.
    • Update comment at top of index_create · 5e15cdb2
      Alvaro Herrera authored
      I neglected to update it in commit f4c4335a.
      
      Michael Paquier
  10. 05 Dec, 2012 4 commits
    • Ensure recovery pause feature doesn't pause unless users can connect. · af4aba2f
      Tom Lane authored
      If we're not in hot standby mode, then there's no way for users to connect
      to reset the recoveryPause flag, so we shouldn't pause.  The code was aware
      of this but the test to see if pausing was safe was seriously inadequate:
      it wasn't paying attention to reachedConsistency, and besides what it was
      testing was that we could legally enter hot standby, not that we have
      done so.  Get rid of that in favor of checking LocalHotStandbyActive,
      which because of the coding in CheckRecoveryConsistency is tantamount to
      checking that we have told the postmaster to enter hot standby.
      
      Also, move the recoveryPausesHere() call that reacts to asynchronous
      recoveryPause requests so that it's not in the middle of application of a
      WAL record.  I put it next to the recoveryStopsHere() call --- in future
      those are going to need to interact significantly, so this seems like a
      good waystation.
      
      Also, don't bother trying to read another WAL record if we've already
      decided not to continue recovery.  This was no big deal when the code was
      written originally, but now that reading a record might entail actions like
      fetching an archive file, it seems a bit silly to do it like that.
      
      Per report from Jeff Janes and subsequent discussion.  The pause feature
      needs quite a lot more work, but this gets rid of some indisputable bugs,
      and seems safe enough to back-patch.
    • Must not reach consistency before XLOG_BACKUP_RECORD · 6aa2e49a
      Simon Riggs authored
      When waiting for an XLOG_BACKUP_RECORD, the minRecoveryPoint
      will be incorrect, so we must not declare recovery consistent
      before we have seen the record. This was a major bug that allowed
      recovery to end too early in some cases, letting users see an
      inconsistent database. This patch is for HEAD and 9.2; a different
      fix is required for 9.1 and 9.0.
      
      Simon Riggs and Andres Freund, bug report by Jeff Janes
    • Add pgstatginindex() function to get the size of the GIN pending list. · 357cbaae
      Heikki Linnakangas authored
      Fujii Masao, reviewed by Kyotaro Horiguchi.
  11. 04 Dec, 2012 8 commits