- 22 Nov, 2013 4 commits
-
-
Heikki Linnakangas authored
These bugs can cause data loss on standbys started with hot_standby=on at the moment they start to accept read-only queries, by marking committed transactions as uncommitted. The likelihood of such corruptions is small unless the primary has a high transaction rate. 5a031a55 fixed bugs in HS's startup logic by maintaining less state until at least STANDBY_SNAPSHOT_PENDING state was reached, missing the fact that both clog and subtrans are written to before that. This only failed to fail in common cases because the usage of ExtendCLOG in procarray.c was superfluous, since clog extensions are actually WAL logged. In f44eedc3f0f347a856eea8590730769125964597 I then tried to fix the missing extensions of pg_subtrans due to the former commit's changes - which are not WAL logged - by performing the extensions when switching to a state > STANDBY_INITIALIZED and not performing xid assignments before that - again missing the fact that ExtendCLOG is unnecessary - but screwed up twice: once because latestObservedXid wasn't updated anymore in that state due to the earlier commit, and once by having an off-by-one error in the loop performing extensions. This means that whenever a CLOG_XACTS_PER_PAGE (32768 with default settings) boundary was crossed between the start of the checkpoint recovery started from and the first xl_running_xact record, old transactions' commit bits in pg_clog could be overwritten if they started and committed in that window. Fix this mess by not performing ExtendCLOG() in HS at all anymore, since it's unneeded and evidently dangerous, and by performing subtrans extensions even before reaching STANDBY_SNAPSHOT_PENDING. Analysis and patch by Andres Freund. Reported by Christophe Pettus. Backpatch down to 9.0, like the previous commit that caused this.
-
Heikki Linnakangas authored
RecoveryIsInProgress() can be called very frequently. During normal operation, it just checks a backend-local variable and returns quickly, but during hot standby, it checks a spinlock-protected shared variable. Those spinlock acquisitions can become a point of contention on a busy hot standby system. Replace the spinlock acquisition with a memory barrier. Per discussion with Andres Freund, Ants Aasma and Merlin Moncure.
-
Peter Eisentraut authored
The previous change added a new scan-build warning about need_password being assigned but never read.
-
Tom Lane authored
This patch adds the ability to write TABLE( function1(), function2(), ...) as a single FROM-clause entry. The result is the concatenation of the first row from each function, followed by the second row from each function, etc., with NULLs inserted if any function produces fewer rows than others. This is believed to be a much more useful behavior than what Postgres currently does with multiple SRFs in a SELECT list. This syntax also provides a reasonable way to combine use of column definition lists with WITH ORDINALITY: put the column definition list inside TABLE(), where it's clear that it doesn't control the ordinality column as well. Also implement SQL-compliant multiple-argument UNNEST(), by turning UNNEST(a,b,c) into TABLE(unnest(a), unnest(b), unnest(c)). The SQL standard specifies TABLE() with only a single function, not multiple functions, and it seems to require an implicit UNNEST(), which is not what this patch does. There may be something wrong with that reading of the spec, though, because if it's right then the spec's TABLE() is just a pointless alternative spelling of UNNEST(). After further review of that, we might choose to adopt a different syntax for what this patch does, but in any case this functionality seems clearly worthwhile. Andrew Gierth, reviewed by Zoltán Böszörményi and Heikki Linnakangas, and significantly revised by me
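As a minimal sketch of the new syntax (the column aliases and array values are purely illustrative; the NULL padding of the shorter column follows the behavior described above):

    -- Two set-returning functions combined into a single FROM item
    SELECT * FROM TABLE(generate_series(1, 3),
                        unnest(ARRAY['a', 'b'])) AS t(num, letter);
    -- yields (1,'a'), (2,'b'), (3,NULL)

    -- SQL-compliant multi-argument UNNEST(), which the patch turns into
    -- TABLE(unnest(...), unnest(...)) internally
    SELECT * FROM unnest(ARRAY[1, 2, 3], ARRAY['a', 'b']) AS u(num, letter);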
-
- 21 Nov, 2013 1 commit
-
-
Fujii Masao authored
Previously, the -d option of pg_isready was broken. When the name of the database was specified by the -d option, pg_isready failed with an error. When the conninfo specified by the -d option contained a host name setting but no numeric IP address (i.e., hostaddr), pg_isready displayed the wrong connection message. The -d option also could not handle a valid URI prefix at all. This commit fixes these bugs of pg_isready. Backpatch to 9.3, where pg_isready was introduced. Per report from Josh Berkus and Robert Haas. Original patch by Fabrízio de Royes Mello, heavily modified by me.
-
- 20 Nov, 2013 4 commits
-
-
Heikki Linnakangas authored
Split off the portion of ginInsertValue that inserts the tuple to the current level into a separate function, ginPlaceToPage. ginInsertValue's charter is now to recurse up the tree to insert the downlink when a page split is required. This is in preparation for a patch to change the way incomplete splits are handled, which will need to do these operations separately. And IMHO it makes the code more readable anyway.
-
Heikki Linnakangas authored
This creates a new gin-btree callback function for creating a downlink for a page. Previously, ginxlog.c duplicated the logic used during normal operation.
-
Heikki Linnakangas authored
Merge some functions that were always called together. Makes the code a little bit more readable.
-
Peter Eisentraut authored
This allows decorating mmfatal() with noreturn compiler hints, leading to better diagnostics.
-
- 19 Nov, 2013 4 commits
-
-
Bruce Momjian authored
Backpatch to 9.3. Per report from Steffen Hildebrandt.
-
Bruce Momjian authored
This only affects upgrades from 8.3 currently, and is harmless as the child just generates an error in the script, but we should get it right in case we ever need this for more complex uses. Per report from Peter Eisentraut
-
Fujii Masao authored
Pavel Stehule, reviewed by Ian Lawrence Barwick
-
Peter Eisentraut authored
Previously, pg_upgrade would abort copy_file() on a short write without setting errno, which the caller would report as an error with the message "Success". We assume ENOSPC in that case, as we do elsewhere in the code. Also set errno in some other error cases in copy_file() to avoid bogus "Success" error messages. This was broken in 6b711cf3, so 9.2 and before are OK.
-
- 18 Nov, 2013 5 commits
-
-
Bruce Momjian authored
The reverted patch to change functions from strict to immutable was incorrect and needs additional research.
-
Heikki Linnakangas authored
The server won't care, but let's be consistent. David Rowley.
-
Heikki Linnakangas authored
Arguably makes the code a bit more readable, and might give a small performance gain. David Rowley
-
Robert Haas authored
This avoids a potentially-expensive extra call to strlen(). David Rowley
-
Heikki Linnakangas authored
Previously, if VACUUM skipped vacuuming a page because it was pinned, it didn't count that page as scanned. However, that meant that relfrozenxid was not bumped up either, which prevented anti-wraparound vacuum from doing its job. Report by Миша Тюрин, analysis and patch by Sergey Burladyn and Jeff Janes. Backpatch to 9.2, where the skip-locked-pages behavior was introduced.
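For context, a standard catalog query (illustrative only, not part of the patch) that shows whether relfrozenxid is being advanced; tables whose age keeps growing despite regular vacuums would exhibit the symptom described above:

    -- How far each table's relfrozenxid lags behind the current xid counter
    SELECT relname, relfrozenxid, age(relfrozenxid) AS xid_age
    FROM pg_class
    WHERE relkind = 'r'
    ORDER BY age(relfrozenxid) DESC
    LIMIT 10;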
-
- 17 Nov, 2013 1 commit
-
-
Tom Lane authored
Pavel Stehule, reviewed by Jeevan Chalke and Atri Sharma
-
- 16 Nov, 2013 4 commits
-
-
Tom Lane authored
This patch improves performance of most built-in aggregates that formerly used a NUMERIC or NUMERIC array as their transition type; this includes not only aggregates on numeric inputs, but some aggregates on integer inputs where overflow of an int8 value is a possibility. The code now uses a special-purpose data structure to avoid array construction and deconstruction overhead, as well as packing and unpacking overhead for numeric values. These aggregates' transition type is now declared as INTERNAL, since it doesn't correspond to any SQL data type. To keep the planner from thinking that that means a lot of storage will be used, we make use of the just-added pg_aggregate.aggtransspace feature. The space estimate is set to 128 bytes, which is at least in the right ballpark. Hadi Moshayedi, reviewed by Pavel Stehule and Tomas Vondra
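An illustrative way to see the effect in the catalogs (the specific aggregates queried are just examples; per the message above, converted aggregates now report an internal transition type with a 128-byte space estimate):

    SELECT aggfnoid::regprocedure AS aggregate,
           aggtranstype::regtype  AS transition_type,
           aggtransspace
    FROM pg_aggregate
    WHERE aggfnoid IN ('avg(numeric)'::regprocedure,
                       'sum(int8)'::regprocedure);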
-
Tom Lane authored
Formerly the planner had a hard-wired rule of thumb for guessing the amount of space consumed by an aggregate function's transition state data. This estimate is critical to deciding whether it's OK to use hash aggregation, and in many situations the built-in estimate isn't very good. This patch adds a column to pg_aggregate wherein a per-aggregate estimate can be provided, overriding the planner's default, and infrastructure for setting the column via CREATE AGGREGATE. It may be that additional smarts will be required in future, perhaps even a per-aggregate estimation function. But this is already a step forward. This is extracted from a larger patch to improve the performance of numeric and int8 aggregates. I (tgl) thought it was worth reviewing and committing this infrastructure separately. In this commit, all built-in aggregates are given aggtransspace = 0, so no behavior should change. Hadi Moshayedi, reviewed by Pavel Stehule and Tomas Vondra
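A hedged sketch of how the new estimate might be supplied through CREATE AGGREGATE (the aggregate itself is made up, and the SSPACE spelling of the option is an assumption not stated in the commit message):

    CREATE AGGREGATE my_numeric_sum (numeric) (
        SFUNC    = numeric_add,
        STYPE    = numeric,
        SSPACE   = 128,   -- assumed keyword: per-aggregate transition-state size hint
        INITCOND = '0'
    );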
-
Peter Eisentraut authored
-
Tom Lane authored
pgbench formerly failed on lines longer than BUFSIZ, unexpectedly splitting them into multiple commands. Allow it to work with any length of input line. Sawada Masahiko
-
- 15 Nov, 2013 10 commits
-
-
Tom Lane authored
A couple of places that should have been iterating over WORDS_PER_CHUNK words were iterating over WORDS_PER_PAGE words instead. This thinko accidentally failed to fail, because (at least on common architectures with default BLCKSZ) WORDS_PER_CHUNK is a bit less than WORDS_PER_PAGE, and the extra words being looked at were always zero so nothing happened. Still, it's a bug waiting to happen if anybody ever fools with the parameters affecting TIDBitmap sizes, and it's a small waste of cycles too. So back-patch to all active branches. Etsuro Fujita
-
Tom Lane authored
In --inserts and especially --column-inserts mode, we can get a useful speedup by generating the common prefix of all a table's INSERT commands just once, and then printing the prebuilt string for each row. This avoids multiple invocations of fmtId() and other minor fooling around. David Rowley
-
Tom Lane authored
The previous coding was fairly unreadable and drew double-free warnings from clang. I believe the double free was actually not reachable, because PQconnectionNeedsPassword is coded to not return true if a password was provided, so that the loop can't iterate more than twice. Nonetheless it seems worth rewriting. No back-patch since this is just cosmetic.
-
Tom Lane authored
Bug #8591 from Claudio Freire demonstrates that get_eclass_for_sort_expr must be able to compute valid em_nullable_relids for any new equivalence class members it creates. I'd worried about this in the commit message for db9f0e1d, but claimed that it wasn't a problem because multi-member ECs should already exist when it runs. That is transparently wrong, though, because this function is also called by initialize_mergeclause_eclasses, which runs during deconstruct_jointree. The example given in the bug report (which the new regression test item is based upon) fails because the COALESCE() expression is first seen by initialize_mergeclause_eclasses rather than process_equivalence. Fixing this requires passing the appropriate nullable_relids set to get_eclass_for_sort_expr, and it requires new code to compute that set for top-level expressions such as ORDER BY, GROUP BY, etc. We store the top-level nullable_relids in a new field in PlannerInfo to avoid computing it many times. In the back branches, I've added the new field at the end of the struct to minimize ABI breakage for planner plugins. There doesn't seem to be a good alternative to changing get_eclass_for_sort_expr's API signature, though. There probably aren't any third-party extensions calling that function directly; moreover, if there are, they probably need to think about what to pass for nullable_relids anyway. Back-patch to 9.2, like the previous patch in this area.
-
Tom Lane authored
plpgsql likes to cache query plans and simple-expression execution state trees across calls. This is a considerable win for multiple executions of the same function. However, it's useless for DO blocks, since by definition those are executed only once and discarded. Nonetheless, we were allowing a DO block's expression execution trees to survive until end of transaction, resulting in a significant intra-transaction memory leak, as reported by Yeb Havinga. Worse, if the DO block exited with an error, the compiled form of the block's code was leaked till end of session --- along with subsidiary plancache entries. To fix, make DO blocks keep their expression execution trees in a private EState that's deleted at exit from the block, and add a PG_TRY block to plpgsql_inline_handler to make sure that memory cleanup happens even on error exits. Also add a regression test covering error handling in a DO block, because my first try at this broke that. (The test is not meant to prove that we don't leak memory anymore, though it could be used for that with a much larger loop count.) Ideally we'd back-patch this into all versions supporting DO blocks; but the patch needs to add a field to struct PLpgSQL_execstate, and that would break ABI compatibility for third-party plugins such as the plpgsql debugger. Given the small number of complaints so far, fixing this in HEAD only seems like an acceptable choice.
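A minimal sketch (not taken from the commit or its regression test) of the kind of one-shot DO block whose simple-expression execution trees previously lived until end of transaction:

    BEGIN;
    DO $$
    DECLARE
        total bigint := 0;
    BEGIN
        FOR i IN 1 .. 100000 LOOP
            total := total + i;   -- each simple expression gets a cached execution tree
        END LOOP;
        RAISE NOTICE 'total = %', total;
    END
    $$;
    COMMIT;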
-
Tom Lane authored
There were enough typos in the comments to annoy me ...
-
Kevin Grittner authored
Commit 061b88c7 saved argv0 to a global buffer without ensuring that it was zero terminated, allowing references to it to overrun the buffer and access other memory. This probably would not have presented any security risk, but could have resulted in very confusing failures if the path to the executable was very long. Reported by David Rowley
-
Robert Haas authored
Colin 't Hart
-
Heikki Linnakangas authored
Andres Freund
-
Heikki Linnakangas authored
This speeds up nextval() and currval() when you touch a lot of different sequences in the same backend. David Rowley
-
- 14 Nov, 2013 3 commits
-
-
Tom Lane authored
The previous commit shows the need for this. The coverage isn't really thorough, but it's better than nothing.
-
Peter Eisentraut authored
-
- 13 Nov, 2013 4 commits
-
-
Tom Lane authored
The previous text was a bit misleading, as well as unnecessarily vague about what information would be discarded. Per gripe from Craig Skinner.
-
Andrew Dunstan authored
-
Robert Haas authored
The old code entered a new hash table entry first, then scanned pg_class to determine what value to fill in, and then populated the entry. This fails to work properly if a cache invalidation happens as a result of opening pg_class. Repair. Along the way, get rid of the idea of blowing away the entire hash table as a method of processing invalidations. Instead, just delete all the entries one by one. This is probably not quite as cheap but it's simpler, and shouldn't happen often. Andres Freund
-
Bruce Momjian authored
-