- 23 Dec, 2012 2 commits
-
-
Robert Haas authored
Extracted from a larger patch by Dimitri Fontaine. It is hoped that this will provide infrastructure for enriching the new event trigger functionality, but it seems possibly useful for other purposes as well.
-
Tom Lane authored
transformExpr() is required to cope with already-transformed expression trees, for various ugly-but-not-quite-worth-cleaning-up reasons. However, some of its newer subroutines hadn't gotten the memo. This accounts for bug #7763 from Norbert Buchmuller: transformRowExpr() was overwriting the previously determined type of a RowExpr during CREATE TABLE LIKE INCLUDING INDEXES. Additional investigation showed that transformXmlExpr had the same kind of problem, but all the other cases seem to be safe. Andres Freund and Tom Lane
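A minimal sketch, assuming an illustrative node type, of the kind of idempotency guard this fix implies: a transform routine hands back an already-transformed node untouched instead of redoing, and possibly clobbering, its earlier analysis. None of this is the actual parser code.

```c
#include <stdio.h>
#include <stdbool.h>

/* Illustrative stand-in for a parse-tree node; not the real RowExpr. */
typedef struct ExprNode
{
    bool already_transformed;   /* set once parse analysis has run */
    int  assigned_type;         /* type chosen during the first pass */
} ExprNode;

/* Transform routine that tolerates already-transformed input. */
static ExprNode *
transform_expr_sketch(ExprNode *node)
{
    if (node->already_transformed)
        return node;            /* don't overwrite earlier decisions */

    node->assigned_type = 42;   /* pretend we resolved the type here */
    node->already_transformed = true;
    return node;
}

int
main(void)
{
    ExprNode n = {false, 0};

    transform_expr_sketch(&n);      /* first pass assigns the type */
    n.assigned_type = 7;            /* imagine a caller refined it */
    transform_expr_sketch(&n);      /* second pass must not clobber it */
    printf("type = %d\n", n.assigned_type);   /* prints 7 */
    return 0;
}
```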
-
- 22 Dec, 2012 1 commit
-
-
Tom Lane authored
"GetForeignTableColumnOptions" should be "GetForeignColumnOptions". Noted by Metin Döşlü.
-
- 21 Dec, 2012 5 commits
-
-
Heikki Linnakangas authored
Here's another attempt at fixing the logic that decides how far the WAL can be streamed, which was still broken if the timeline changed while streaming. You would get an assertion failure. The way the logic is now written is more readable, too. Thom Brown reported the assertion failure.
-
Heikki Linnakangas authored
If a relation file was removed when the server-side counterpart of pg_basebackup was just about to open it to send it to the client, you'd get a "could not open file" error. Fix that. Backpatch to 9.1; this goes back to when pg_basebackup was introduced.
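A hedged sketch of the general pattern the fix suggests: if the file vanished between the directory scan and the open, treat ENOENT as "skip this file" rather than an error. The path and helper below are made up for illustration.

```c
#include <stdio.h>
#include <errno.h>

/* Returns 1 if the file was sent, 0 if it disappeared, -1 on a real error. */
static int
send_file_if_present(const char *path)
{
    FILE *fp = fopen(path, "rb");

    if (fp == NULL)
    {
        if (errno == ENOENT)
            return 0;       /* concurrently removed: silently skip it */
        perror(path);       /* anything else is still a hard error */
        return -1;
    }

    /* ... read the file and stream it to the client here ... */
    fclose(fp);
    return 1;
}

int
main(void)
{
    /* Hypothetical relation file name, for illustration only. */
    send_file_if_present("base/16384/16385");
    return 0;
}
```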
-
Heikki Linnakangas authored
Fujii Masao
-
Peter Eisentraut authored
-
Peter Eisentraut authored
-
- 20 Dec, 2012 6 commits
-
-
Tom Lane authored
If pg_extension_config_dump() is executed again for a table already listed in the extension's extconfig, the code was blindly making a new array entry. This does not seem useful. Fix it to replace the existing array entry instead, so that it's possible for extension update scripts to alter the filter conditions for configuration tables. In addition, teach ALTER EXTENSION DROP TABLE to check for an extconfig entry for the target table, and remove it if present. This is not a 100% solution because it's allowed for an extension update script to just summarily DROP a member table, and that code path doesn't go through ExecAlterExtensionContentsStmt. We could probably make that case clean things up if we had to, but it would involve sticking a very ugly wart somewhere in the guts of dependency.c. Since on the whole it seems quite unlikely that extension updates would want to remove pre-existing configuration tables, making the case possible with an explicit command seems sufficient. Per bug #7756 from Regina Obe. Back-patch to 9.1 where extensions were introduced.
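A minimal sketch of the replace-instead-of-append behavior described, assuming simple parallel arrays keyed by table OID; the real code manipulates the extconfig/extcondition catalog arrays, not C globals.

```c
#include <stdio.h>

#define MAX_CONFIG_TABLES 16

/* Illustrative stand-ins for the extconfig/extcondition arrays. */
static unsigned int extconfig[MAX_CONFIG_TABLES];     /* table OIDs */
static const char  *extcondition[MAX_CONFIG_TABLES];  /* filter conditions */
static int          n_config = 0;

/* Replace the entry for an already-registered table, else append one. */
static void
register_config_table(unsigned int table_oid, const char *filter)
{
    for (int i = 0; i < n_config; i++)
    {
        if (extconfig[i] == table_oid)
        {
            extcondition[i] = filter;   /* update, don't add a duplicate */
            return;
        }
    }
    extconfig[n_config] = table_oid;
    extcondition[n_config] = filter;
    n_config++;
}

int
main(void)
{
    register_config_table(50010, "WHERE id > 0");
    register_config_table(50010, "WHERE id > 100");  /* replaces the entry */
    printf("%d entries, filter: %s\n", n_config, extcondition[0]);
    return 0;
}
```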
-
Heikki Linnakangas authored
This was broken before: we would recycle old WAL segments on the wrong timeline after the recovery target timeline had changed, but my recent commit to not initialize ThisTimeLineID at all in a standby's checkpointer process broke this completely. The problem is that when installing a recycled WAL segment as a future one, ThisTimeLineID is used to construct the filename. To fix, always update ThisTimeLineID to the current timeline being recovered, before recycling WAL segments at a restartpoint. This still leaves a small window where we might install WAL segments under the wrong timeline ID, if the timeline is changed just as we're about to start recycling. Also, even if we're replaying timeline X at the moment, there's no guarantee that we'll need as many WAL segments on that timeline as we recycle. We might be just about to reach the point where we switch to the next timeline, so we might only need one more WAL segment on the current timeline. We'll live with the waste in that situation. Bug pointed out by Fujii Masao. 9.1 and 9.2 had the same issue, when the recovery target timeline was changed, but I committed a slightly different version of this patch on those branches.
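A hedged sketch of the point being made: the recycled segment's future file name embeds a timeline ID, so the timeline has to be refreshed to the one currently being replayed before the name is built. The file-name layout is the standard WAL segment naming; everything else is illustrative.

```c
#include <stdio.h>

typedef unsigned int TimeLineID;

/* Global that recycled-segment names are keyed off; illustrative here. */
static TimeLineID ThisTimeLineID = 0;

/* Standard WAL segment file name: timeline, log id, segment number. */
static void
wal_segment_file_name(char *buf, size_t len,
                      TimeLineID tli, unsigned int log, unsigned int seg)
{
    snprintf(buf, len, "%08X%08X%08X", tli, log, seg);
}

int
main(void)
{
    char       name[64];
    TimeLineID replay_tli = 2;      /* timeline currently being recovered */

    /* Refresh ThisTimeLineID *before* naming recycled segments. */
    ThisTimeLineID = replay_tli;
    wal_segment_file_name(name, sizeof(name), ThisTimeLineID, 0, 4);
    printf("recycled as %s\n", name);   /* 000000020000000000000004 */
    return 0;
}
```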
-
Bruce Momjian authored
Because the client encoding might not match the server encoding, pg_upgrade can't allocate NAMEDATALEN bytes for storage of database, relation, and namespace identifiers. Instead pg_strdup() the memory and free it. Also add C comment in initdb.c about safe NAMEDATALEN usage.
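A hedged sketch of why the fixed-size buffer is unsafe and what the strdup-based approach looks like: a name that fits in NAMEDATALEN bytes in the server encoding can grow when converted to the client encoding, so the copy must be sized to whatever actually arrives and freed afterwards. Plain strdup stands in for pg_strdup; the name is a made-up example.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NAMEDATALEN 64   /* server-side limit, in *server* encoding bytes */

int
main(void)
{
    /* Pretend this came back from libpq already converted to the client
     * encoding; the conversion can make it longer than NAMEDATALEN. */
    const char *relname_from_query =
        "relation_name_whose_client_encoding_form_is_longer_than_sixty_four_bytes_x";

    /* Unsafe: char relname[NAMEDATALEN]; strcpy(...) could overflow.
     * Safe: size the copy to the actual string and free it when done. */
    char *relname = strdup(relname_from_query);

    if (relname == NULL)
    {
        fprintf(stderr, "out of memory\n");
        return 1;
    }

    printf("%zu bytes: %s\n", strlen(relname), relname);
    free(relname);
    return 0;
}
```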
-
Heikki Linnakangas authored
Most of the time, the last replayed record comes from the recovery target timeline, but there is a corner case where it makes a difference. When the startup process scans for a new timeline, and decides to change recovery target timeline, there is a window where the recovery target TLI has already been bumped, but there are no WAL segments from the new timeline in pg_xlog yet. For example, if we have just replayed up to point 0/30002D8, on timeline 1, there is a WAL file called 000000010000000000000003 in pg_xlog that contains the WAL up to that point. When recovery switches the recovery target timeline to 2, a walsender can immediately try to read WAL from 0/30002D8, from timeline 2, so it will try to open WAL file 000000020000000000000003. However, that doesn't exist yet - the startup process hasn't copied that file from the archive yet, nor has the walreceiver streamed it yet, so walsender fails with the error "requested WAL segment 000000020000000000000003 has already been removed". That's harmless, in that the standby will try to reconnect later and by that time the segment is already created, but error messages that should be ignored are not good. To fix that, have walsender track the TLI of the last replayed record, instead of the recovery target timeline. That way walsender will not try to read anything from timeline 2 until the WAL segment has been created and at least one record has been replayed from it. The recovery target timeline is now xlog.c's internal affair; it doesn't need to be exposed in shared memory anymore. This fixes the error reported by Thom Brown. depesz reported the same error message, but I'm not sure if this fixes his scenario.
-
Heikki Linnakangas authored
We used to set it to the current recovery target timeline, but the recovery target timeline can change during recovery, leaving ThisTimeLineID at an old value. That seems worse than always leaving it at zero to begin with. AFAICS there was no good reason to set it in the first place. ThisTimeLineID is not needed in checkpointer or bgwriter process, until it's time to write the end-of-recovery checkpoint, and at that point ThisTimeLineID is updated anyway.
-
Bruce Momjian authored
Add comment stating that constraint and index names must match.
-
- 19 Dec, 2012 3 commits
-
-
Heikki Linnakangas authored
If you restored from a backup taken from a standby, and the last record in the backup is the checkpoint record, i.e. there is no redo required except for the checkpoint record, we would fail to notice that we've reached the end-of-backup point, and that the database is consistent. The result was an error "WAL ends before end of online backup". To fix, move the have-we-reached-end-of-backup check into CheckRecoveryConsistency(), which is already responsible for similar checks with minRecoveryPoint, and is called in the right places. Backpatch to 9.2; this check and bug did not exist before that.
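A hedged sketch of the check that moves into CheckRecoveryConsistency(): once the last replayed record reaches the backup end location, the backup is complete and consistency can be declared, even if no further WAL follows. Variable names mirror the idea, not the actual xlog.c code.

```c
#include <stdio.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;    /* WAL position, simplified to one integer */

static XLogRecPtr lastReplayedEndRecPtr = 0;   /* end of last replayed record */
static XLogRecPtr backupEndPoint = 0x30002D8;  /* end-of-backup location */
static bool       backupEndRequired = true;
static bool       reachedConsistency = false;

static void
check_recovery_consistency_sketch(void)
{
    /* Reached the end of the base backup? Then the data is consistent. */
    if (backupEndRequired && lastReplayedEndRecPtr >= backupEndPoint)
    {
        backupEndRequired = false;
        reachedConsistency = true;
        printf("consistent recovery state reached\n");
    }
}

int
main(void)
{
    lastReplayedEndRecPtr = 0x30002D8;   /* the checkpoint record itself */
    check_recovery_consistency_sketch(); /* no extra WAL records needed */
    return 0;
}
```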
-
Peter Eisentraut authored
In an earlier version of the standard, this was called just "MAX_CARDINALITY".
-
Peter Eisentraut authored
-
- 18 Dec, 2012 5 commits
-
-
Andrew Dunstan authored
Error exposed by recent Assert changes. Complaint from Peter Eisentraut.
-
Tom Lane authored
Some versions of libedit expose bogus definitions of setproctitle(), optreset, and perhaps other symbols that we don't want configure to pick up on. There was a previous report of similar problems with strlcpy(), which we addressed in commit 59cf88da, but the problem has evidently grown in scope since then. In hopes of not having to deal with it again in future, rearrange configure's tests for supplied functions so that we ignore libedit/libreadline except when probing specifically for functions we expect them to provide. Per report from Christoph Berg, though this is slightly more aggressive than his proposed patch.
-
Peter Eisentraut authored
This was used in a time when a shared libperl or libpython was difficult to come by. That is obsolete, and the idea behind the flag was never fully portable anyway and will likely fail on more modern CPU architectures.
-
Peter Eisentraut authored
Karl O. Pinc
-
Tom Lane authored
During crash recovery, we remove disk files belonging to temporary tables, but the system catalog entries for such tables are intentionally not cleaned up right away. Instead, the first backend that uses a temp schema is expected to clean out any leftover objects therein. This approach requires that we be careful to ignore leftover temp tables (since any actual access attempt would fail), *even if their BackendId matches our session*, if we have not yet established use of the session's corresponding temp schema. That worked fine in the past, but was broken by commit debcec7d which incorrectly removed the rd_islocaltemp relcache flag. Put it back, and undo various changes that substituted tests like "rel->rd_backend == MyBackendId" for use of a state-aware flag. Per trouble report from Heikki Linnakangas. Back-patch to 9.1 where the erroneous change was made. In the back branches, be careful to add rd_islocaltemp in a spot in the struct that was alignment padding before, so as not to break existing add-on code.
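A minimal sketch of the distinction the commit restores, using an illustrative struct rather than the real relcache entry: a BackendId match alone cannot tell a leftover pre-crash temp table from one this session actually owns, whereas a state-aware flag set only after the session's temp schema is in use can.

```c
#include <stdio.h>
#include <stdbool.h>

typedef int BackendId;

static BackendId MyBackendId = 3;   /* this session's backend id */

/* Illustrative relcache-entry stand-in, not the real struct. */
typedef struct RelSketch
{
    BackendId rd_backend;       /* backend id encoded in the temp schema */
    bool      rd_islocaltemp;   /* true only once *this* session owns the schema */
} RelSketch;

/* Naive test: wrongly claims a leftover pre-crash temp table as ours. */
static bool
is_local_temp_naive(const RelSketch *rel)
{
    return rel->rd_backend == MyBackendId;
}

/* State-aware flag: set only after we've established use of our temp schema. */
static bool
is_local_temp_flagged(const RelSketch *rel)
{
    return rel->rd_islocaltemp;
}

int
main(void)
{
    /* Leftover temp table from a crashed session that also used backend id 3,
     * encountered before this session has initialized its own temp schema. */
    RelSketch leftover = { 3, false };

    printf("naive check: %d (would try to access it and fail)\n",
           is_local_temp_naive(&leftover));
    printf("flag check:  %d (correctly ignored)\n",
           is_local_temp_flagged(&leftover));
    return 0;
}
```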
-
- 16 Dec, 2012 4 commits
-
-
Tom Lane authored
We failed to ever fill the sixth line (LISTEN_ADDR), which caused the attempt to fill the seventh line (SHMEM_KEY) to fail, so that the shared memory key never got added to the file in standalone mode. This has been broken since we added more content to our lock files in 9.1. To fix, tweak the logic in CreateLockFile to add an empty LISTEN_ADDR line in standalone mode. This is a tad grotty, but since that function already knows almost everything there is to know about the contents of lock files, it doesn't seem that it's any better to hack it elsewhere. It's not clear how significant this bug really is, since a standalone backend should never have any children and thus it seems not critical to be able to check the nattch count of the shmem segment externally. But I'm going to back-patch the fix anyway. This problem had escaped notice because of an ancient (and in hindsight pretty dubious) decision to suppress LOG-level messages by default in standalone mode; so that the elog(LOG) complaint in AddToDataDirLockFile that should have warned of the problem didn't do anything. Fixing that is material for a separate patch though.
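A hedged sketch of the file-layout issue: the lines in the lock file are positional, so in standalone mode an empty placeholder still has to be written for the listen-address line or the shared memory key ends up on the wrong line. The writer below is illustrative, with made-up values; only the line ordering reflects the description above.

```c
#include <stdio.h>

int
main(void)
{
    int   standalone = 1;          /* simulating a --single backend */
    FILE *fp = fopen("postmaster.pid.sketch", "w");

    if (fp == NULL)
        return 1;

    fprintf(fp, "%d\n", 12345);            /* line 1: PID */
    fprintf(fp, "%s\n", "/data");          /* line 2: data directory */
    fprintf(fp, "%ld\n", 1356000000L);     /* line 3: start timestamp */
    fprintf(fp, "%d\n", 5432);             /* line 4: port */
    fprintf(fp, "%s\n", "/tmp");           /* line 5: socket directory */
    /* Line 6 is the listen address; a standalone backend has none, but the
     * line still has to exist so the shmem key lands on line 7. */
    fprintf(fp, "%s\n", standalone ? "" : "localhost");
    fprintf(fp, "%9lu %9lu\n", 5432001UL, 123456UL);  /* line 7: shmem key */

    fclose(fp);
    return 0;
}
```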
-
Andrew Dunstan authored
Quiet compiler warnings noted by Peter Eisentraut.
-
Magnus Hagander authored
Craig Ringer
-
Peter Eisentraut authored
Not all system catalog description tables have the same number of columns, and the patch to add oid columns did one bit too much copy-and-pasting.
-
- 15 Dec, 2012 2 commits
-
-
Peter Eisentraut authored
Karl O. Pinc and Jeff Davis
-
Peter Eisentraut authored
-
- 14 Dec, 2012 4 commits
-
-
Andrew Dunstan authored
Per discussion on -hackers. psql is converted to use the new code. Follows a suggestion from Heikki Linnakangas.
-
Robert Haas authored
Pavan Deolasee, slightly modified by me
-
Peter Eisentraut authored
It provides some additional help to translators.
-
Peter Eisentraut authored
Karl O. Pinc
-
- 13 Dec, 2012 2 commits
-
-
Heikki Linnakangas authored
Before this patch, streaming replication would refuse to start replicating if the timeline in the primary doesn't exactly match the standby. The situation where it doesn't match is when you have a master, and two standbys, and you promote one of the standbys to become the new master. Promoting bumps up the timeline ID, and after that bump, the other standby would refuse to continue. There's significantly more timeline-related logic in streaming replication now. First of all, when a standby connects to the primary, it will ask the primary for any timeline history files that are missing from the standby. The missing files are sent using a new replication command TIMELINE_HISTORY, and stored in the standby's pg_xlog directory. Using the timeline history files, the standby can follow the latest timeline present in the primary (recovery_target_timeline='latest'), just as it can follow new timelines appearing in an archive directory. START_REPLICATION now takes a TIMELINE parameter, to specify exactly which timeline to stream WAL from. This allows the standby to request the primary to send over WAL that precedes the promotion. The replication protocol is changed slightly (in a backwards-compatible way, although there's little hope of streaming replication working across major versions anyway), to allow replication to stop when the end of the timeline is reached, putting the walsender back into a state where it accepts replication commands. Many thanks to Amit Kapila for testing and reviewing various versions of this patch.
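A hedged libpq sketch of how a client might exercise the commands described above, TIMELINE_HISTORY and START_REPLICATION ... TIMELINE. The connection string, WAL position, and timeline numbers are placeholders, and error handling is minimal; the real walreceiver and pg_basebackup code is considerably more careful.

```c
#include <stdio.h>
#include <libpq-fe.h>

int
main(void)
{
    /* Placeholder conninfo; replication=true requests walsender mode. */
    PGconn *conn = PQconnectdb("host=primary replication=true");

    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
        PQfinish(conn);
        return 1;
    }

    /* Fetch the history file for timeline 2, to be stored in pg_xlog. */
    PGresult *res = PQexec(conn, "TIMELINE_HISTORY 2");
    PQclear(res);

    /* Ask for WAL that precedes the promotion, explicitly from timeline 1. */
    res = PQexec(conn, "START_REPLICATION 0/3000000 TIMELINE 1");
    if (PQresultStatus(res) == PGRES_COPY_BOTH)
    {
        char *copybuf;

        /* Read a single CopyData message; a real client loops here until
         * the server reports the end of the timeline. */
        if (PQgetCopyData(conn, &copybuf, 0) > 0)
            PQfreemem(copybuf);
    }
    PQclear(res);

    PQfinish(conn);
    return 0;
}
```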
-
Heikki Linnakangas authored
This makes unnecessary the ugly hack used to #include postgres.h in pg_basebackup. Based on Alvaro Herrera's patch
-
- 12 Dec, 2012 3 commits
-
-
Heikki Linnakangas authored
If a tuple is larger than page size minus space reserved for fillfactor, heap_multi_insert would never find a page that it fits in and repeatedly ask for a new page from RelationGetBufferForTuple. If a tuple is too large to fit on any page, taking fillfactor into account, RelationGetBufferForTuple will always expand the relation. In a normal insert, heap_insert will accept that and put the tuple on the new page. heap_multi_insert, however, does a fillfactor check of its own, and doesn't accept the newly-extended page RelationGetBufferForTuple returns, even though there is no other choice to make the tuple fit. Fix that by making the logic in heap_multi_insert more like the heap_insert logic. The first tuple is always put on the page RelationGetBufferForTuple gives us, and the fillfactor check is only applied to the subsequent tuples. Report from David Gould, although I didn't use his patch.
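A hedged sketch of the corrected control flow: the first tuple of each page goes onto whatever page the buffer manager hands back, even a freshly extended one that violates fillfactor, and only the following tuples are subject to the fillfactor-derived free-space check. Page size, the fillfactor limit, and the "get a page" step are all mocked for illustration.

```c
#include <stdio.h>

#define PAGE_SIZE        8192
#define FILLFACTOR_LIMIT 6553   /* pretend fillfactor leaves ~80% usable */

static void
multi_insert_sketch(const int *tuple_sizes, int ntuples)
{
    int i = 0;

    while (i < ntuples)
    {
        int page_used;

        if (tuple_sizes[i] > PAGE_SIZE)
        {
            printf("tuple %d can never fit, error out\n", i);
            return;
        }

        /* First tuple: accept the page we were given unconditionally (the
         * old code re-applied the fillfactor check here and could ask for
         * a new page forever when the tuple exceeded the limit). */
        page_used = tuple_sizes[i];
        printf("tuple %d starts a page (used %d)\n", i, page_used);
        i++;

        /* Remaining tuples: only keep filling while fillfactor allows. */
        while (i < ntuples &&
               page_used + tuple_sizes[i] <= FILLFACTOR_LIMIT)
        {
            page_used += tuple_sizes[i];
            printf("tuple %d joins the page (used %d)\n", i, page_used);
            i++;
        }
    }
}

int
main(void)
{
    int sizes[] = { 7000, 500, 500, 7000 };   /* 7000 > fillfactor limit */

    multi_insert_sketch(sizes, 4);            /* terminates, no infinite loop */
    return 0;
}
```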
-
Tom Lane authored
The dynahash code requires the number of buckets in a hash table to fit in an int; but since we calculate the desired hash table size dynamically, there are various scenarios where we might calculate too large a value. The resulting overflow can lead to infinite loops, division-by-zero crashes, etc. I (tgl) had previously installed some defenses against that in commit 299d1716, but that covered only one call path. Moreover it worked by limiting the request size to work_mem, but in a 64-bit machine it's possible to set work_mem high enough that the problem appears anyway. So let's fix the problem at the root by installing limits in the dynahash.c functions themselves. Trouble report and patch by Jeff Davis.
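A hedged sketch of the kind of clamp described: cap the requested element count so the derived bucket count still fits in an int before the table is sized. The constant and helper below are illustrative, not dynahash's actual code.

```c
#include <stdio.h>
#include <limits.h>

/* Largest element count we let callers request: leave headroom so that
 * rounding the bucket count up to a power of two still fits in an int. */
#define MAX_HASH_ELEMS  ((long) (INT_MAX / 2))

static long
clamp_hash_table_size(double requested)
{
    if (requested <= 0)
        return 1;
    if (requested >= (double) MAX_HASH_ELEMS)
        return MAX_HASH_ELEMS;
    return (long) requested;
}

int
main(void)
{
    /* A planner estimate might "want" billions of entries when work_mem is
     * huge on a 64-bit machine; the clamp keeps the sizing arithmetic sane. */
    double wanted = 5e9;

    printf("requested %.0f, clamped to %ld\n",
           wanted, clamp_hash_table_size(wanted));
    return 0;
}
```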
-
Tom Lane authored
Per discussion, this seems necessary to allow recovery from broken event triggers, or broken indexes on pg_event_trigger. Dimitri Fontaine
-
- 11 Dec, 2012 3 commits
-
-
Kevin Grittner authored
In situations where there are over 8MB of empty pages at the end of a table, the truncation work for trailing empty pages takes longer than deadlock_timeout, and there is frequent access to the table by processes other than autovacuum, there was a problem with the autovacuum worker process being canceled by the deadlock checking code. The truncation work done by autovacuum up to that point was lost, and the attempt was tried again by a later autovacuum worker. The attempts could continue indefinitely without making progress, consuming resources and blocking other processes for up to deadlock_timeout each time. This patch has the autovacuum worker checking whether it is blocking any other process at 20ms intervals. If such a condition develops, the autovacuum worker will persist the work it has done so far, release its lock on the table, and sleep in 50ms intervals for up to 5 seconds, hoping to be able to re-acquire the lock and try again. If it is unable to get the lock in that time, it moves on and a worker will try to continue later from the point this one left off. While this patch doesn't change the rules about when and what to truncate, it does cause the truncation to occur sooner, with less blocking, and with the consumption of fewer resources when there is contention for the table's lock. The only user-visible change other than improved performance is that the table size during truncation may change incrementally instead of just once. This problem exists in all supported versions but is infrequently reported, although some reports of performance problems when autovacuum runs might be caused by this. Initial commit is just the master branch, but this should probably be backpatched once the build farm and general developer usage confirm that there are no surprising effects. Jan Wieck
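A hedged sketch of the retry scheme described above: poll every ~20ms for anyone waiting on the table lock; when someone is, remember how far truncation got, release the lock, then spend up to 5 seconds retrying the lock in 50ms naps before giving up and leaving the rest for a later worker. All of the lock primitives here are mocks; the intervals are the ones named in the description.

```c
#include <stdio.h>
#include <stdbool.h>
#include <unistd.h>

#define CHECK_INTERVAL_US   (20 * 1000)    /* 20 ms between waiter checks  */
#define RETRY_NAP_US        (50 * 1000)    /* 50 ms between lock attempts  */
#define RETRY_BUDGET_US     (5 * 1000000)  /* give up after ~5 seconds     */

/* Mocked lock primitives; the real code goes through the lock manager. */
static bool someone_is_waiting_for_our_lock(void) { return false; }
static bool try_reacquire_exclusive_lock(void)    { return true;  }

static void
truncate_trailing_pages_sketch(long *next_page_to_truncate)
{
    for (;;)
    {
        /* ... truncate a batch of empty pages here, updating the counter
         * so a later worker can pick up where we left off ... */
        (*next_page_to_truncate)--;
        if (*next_page_to_truncate <= 0)
            return;                               /* all done */

        usleep(CHECK_INTERVAL_US);
        if (!someone_is_waiting_for_our_lock())
            continue;                             /* nobody blocked, go on */

        /* Someone is blocked on us: persist progress, release the lock,
         * and try to get it back for a bounded amount of time. */
        long waited = 0;
        bool got_it_back = false;

        while (waited < RETRY_BUDGET_US)
        {
            usleep(RETRY_NAP_US);
            waited += RETRY_NAP_US;
            if (try_reacquire_exclusive_lock())
            {
                got_it_back = true;
                break;
            }
        }
        if (!got_it_back)
            return;     /* leave the remainder for a later autovacuum run */
    }
}

int
main(void)
{
    long pages = 3;

    truncate_trailing_pages_sketch(&pages);
    printf("pages left to truncate: %ld\n", pages);
    return 0;
}
```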
-
Bruce Momjian authored
All versions of pg_upgrade upgraded invalid indexes caused by CREATE INDEX CONCURRENTLY failures and marked them as valid. The patch adds a check to all pg_upgrade versions and throws an error during upgrade or --check. Backpatch to 9.2, 9.1, 9.0. Patch slightly adjusted.
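A hedged libpq sketch of the sort of --check query involved: look for pg_index entries with indisvalid = false and refuse to proceed if any turn up. The connection string and exact query text are illustrative, not necessarily what pg_upgrade itself runs.

```c
#include <stdio.h>
#include <libpq-fe.h>

int
main(void)
{
    PGconn *conn = PQconnectdb("dbname=postgres");   /* placeholder conninfo */

    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "%s", PQerrorMessage(conn));
        PQfinish(conn);
        return 2;
    }

    PGresult *res = PQexec(conn,
        "SELECT n.nspname, c.relname "
        "FROM pg_catalog.pg_class c "
        "JOIN pg_catalog.pg_index i ON i.indexrelid = c.oid "
        "JOIN pg_catalog.pg_namespace n ON c.relnamespace = n.oid "
        "WHERE NOT i.indisvalid");

    int bad = (PQresultStatus(res) == PGRES_TUPLES_OK) ? PQntuples(res) : 0;

    for (int i = 0; i < bad; i++)
        fprintf(stderr, "invalid index: %s.%s\n",
                PQgetvalue(res, i, 0), PQgetvalue(res, i, 1));

    PQclear(res);
    PQfinish(conn);

    if (bad > 0)
    {
        fprintf(stderr, "reindex or drop these indexes before upgrading\n");
        return 1;
    }
    return 0;
}
```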
-
Heikki Linnakangas authored
EndRecPtr is the last record that we've read, but not necessarily yet replayed. CheckRecoveryConsistency should compare minRecoveryPoint with the last replayed record instead. This caused recovery to think it had reached consistency too early. Now that we do the check in CheckRecoveryConsistency correctly, we have to move the call of that function to after redoing a record. The current place, after reading a record but before replaying it, is wrong. In particular, if there are no more records after the one ending at minRecoveryPoint, we don't enter hot standby until one extra record is generated and read by the standby, and CheckRecoveryConsistency is called. These two bugs conspired to make the code appear to work correctly, except for the small window between reading the last record that reaches minRecoveryPoint, and replaying it. In passing, rename recoveryLastRecPtr, which is the last record replayed, to lastReplayedEndRecPtr. This makes it slightly less confusing with replayEndRecPtr, which is the last record read that we're about to replay. Original report from Kyotaro HORIGUCHI, further diagnosis by Fujii Masao. Backpatch to 9.0, where Hot Standby subtly changed the test from "minRecoveryPoint < EndRecPtr" to "minRecoveryPoint <= EndRecPtr". The former works because where the test is performed, we have always read one more record than we've replayed.
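A hedged sketch contrasting the two pointers described: consistency should be judged against the end of the last *replayed* record, and the check has to run after applying a record, not merely after reading it. The names mirror the commit message; the rest is illustrative.

```c
#include <stdio.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;   /* WAL position, reduced to a single integer */

static XLogRecPtr minRecoveryPoint      = 0x3000100;
static XLogRecPtr lastReplayedEndRecPtr = 0;   /* end of last record replayed */
static XLogRecPtr EndRecPtr             = 0;   /* end of last record *read*   */
static bool       reachedConsistency    = false;

static void
check_recovery_consistency_sketch(void)
{
    /* Compare against what has actually been replayed, not merely read. */
    if (!reachedConsistency && lastReplayedEndRecPtr >= minRecoveryPoint)
    {
        reachedConsistency = true;
        printf("consistent recovery state reached at %llX\n",
               (unsigned long long) lastReplayedEndRecPtr);
    }
}

int
main(void)
{
    /* Read one record ending exactly at minRecoveryPoint... */
    EndRecPtr = 0x3000100;
    check_recovery_consistency_sketch();   /* too early: nothing replayed yet */

    /* ...and only after redoing it are we actually consistent. */
    lastReplayedEndRecPtr = EndRecPtr;
    check_recovery_consistency_sketch();
    return 0;
}
```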
-