Commits · 37484ad2aacef5ec794f4dd3d5cf814475180a78 · Abuhujair Javed / Postgres FD Implementation

22 Dec, 2013 1 commit

Change the way we mark tuples as frozen. · 37484ad2

Robert Haas authored Dec 22, 2013

Instead of changing the tuple xmin to FrozenTransactionId, the combination
of HEAP_XMIN_COMMITTED and HEAP_XMIN_INVALID, which were previously never
set together, is now defined as HEAP_XMIN_FROZEN. A variety of previous
proposals to freeze tuples opportunistically before vacuum_freeze_min_age
is reached have foundered on the objection that replacing xmin by
FrozenTransactionId might hinder debugging efforts when things in this
area go awry; this patch is intended to solve that problem by keeping
the XID around (but largely ignoring the value to which it is set).

Third-party code that checks for HEAP_XMIN_INVALID on tuples where
HEAP_XMIN_COMMITTED might be set will be broken by this change. To fix,
use the new accessor macros in htup_details.h rather than consulting the
bits directly. HeapTupleHeaderGetXmin has been modified to return
FrozenTransactionId when the infomask bits indicate that the tuple is
frozen; use HeapTupleHeaderGetRawXmin when you already know that the
tuple isn't marked committed or frozen, or want the raw value anyway.
We currently do this in routines that display the xmin for user consumption,
in tqual.c where it's known to be safe and important for the avoidance of
extra cycles, and in the function-caching code for various procedural
languages, which shouldn't invalidate the cache just because the tuple
gets frozen.

Robert Haas and Andres Freund

37484ad2
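
The flag combination described in commit 37484ad2 can be sketched as follows. This is a simplified Python model, not PostgreSQL's C code. The flag values and the numeric value of FrozenTransactionId are assumed to match htup_details.h and transam.h, and the function names are hypothetical stand-ins for the accessor macros.

```python
# Assumed to match PostgreSQL's htup_details.h / transam.h.
HEAP_XMIN_COMMITTED = 0x0100
HEAP_XMIN_INVALID = 0x0200
HEAP_XMIN_FROZEN = HEAP_XMIN_COMMITTED | HEAP_XMIN_INVALID
FROZEN_TRANSACTION_ID = 2


def xmin_frozen(infomask: int) -> bool:
    """Frozen means BOTH bits are set."""
    return (infomask & HEAP_XMIN_FROZEN) == HEAP_XMIN_FROZEN


def xmin_invalid(infomask: int) -> bool:
    """Invalid means ONLY the invalid bit is set; testing that bit alone
    would misreport frozen tuples as invalid."""
    return (infomask & HEAP_XMIN_FROZEN) == HEAP_XMIN_INVALID


def get_xmin(raw_xmin: int, infomask: int) -> int:
    """Hypothetical analogue of HeapTupleHeaderGetXmin."""
    return FROZEN_TRANSACTION_ID if xmin_frozen(infomask) else raw_xmin


def get_raw_xmin(raw_xmin: int, infomask: int) -> int:
    """Hypothetical analogue of HeapTupleHeaderGetRawXmin: the stored XID
    is returned untouched, which is what keeps it available for debugging."""
    return raw_xmin
```

A naive `infomask & HEAP_XMIN_INVALID` check returns a nonzero value for a frozen tuple, which is the third-party breakage the commit message warns about.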

20 Dec, 2013 4 commits

Rename wal_log_hintbits to wal_log_hints, per discussion on pgsql-hackers. · 961bf59f
Fujii Masao authored Dec 21, 2013
```
Sawada Masahiko
```
961bf59f

Avoid useless palloc during transaction commit · 6130208e

Alvaro Herrera authored Dec 20, 2013

We can allocate the initial relations-to-drop array when first needed,
instead of at function entry; this avoids allocating it when the
function is not going to do anything, which is most of the time.

Backpatch to 9.3, where this behavior was introduced by commit
279628a0.

There's more that could be done here, such as possible reworking of the
code to avoid having to palloc anything, but that doesn't sound as
backpatchable as this relatively minor change.

Per complaint from Noah Misch in
20131031145234.GA621493@tornado.leadboat.com

6130208e

pg_prewarm, a contrib module for prewarming relation data. · c32afe53
Robert Haas authored Dec 20, 2013
```
Patch by me.  Review by Álvaro Herrera, Amit Kapila, Jeff Janes,
Gurjeet Singh, and others.
```
c32afe53

isolationtester: Ensure stderr is unbuffered, too · 6eda3e9c
Alvaro Herrera authored Dec 19, 2013

6eda3e9c

19 Dec, 2013 6 commits

Move pg_upgrade_support global variables to their own include file · 527fdd9d
Bruce Momjian authored Dec 19, 2013
```
Previously their declarations were spread around to avoid accidental
access.
```
527fdd9d

Make stdout unbuffered · 73bcb76b

Alvaro Herrera authored Dec 19, 2013

This ensures that all stdout output is flushed immediately, to match
stderr.  This eliminates the need for fflush(stdout) calls sprinkled all
over the place.

Per Daniel Wood in message 519A79C6.90308@salesforce.com

73bcb76b

Optimize updating a row that's locked by same xid · 13aa6244

Alvaro Herrera authored Dec 19, 2013

Updating or locking a row that was already locked by the same
transaction under the same Xid caused a MultiXact to be created; but
this is unnecessary, because there's no usefulness in being able to
differentiate two locks by the same transaction.  In particular, if a
transaction executed SELECT FOR UPDATE followed by an UPDATE that didn't
modify columns of the key, we would dutifully represent the resulting
combination as a multixact -- even though a single key-update is
sufficient.

Optimize the case so that only the strongest of both locks/updates is
represented in Xmax.  This can save some Xmax's from becoming
MultiXacts, which can be a significant optimization.

This missed optimization opportunity was spotted by Andres Freund while
investigating a bug reported by Oliver Seemann in message
CANCipfpfzoYnOz5jj=UZ70_R=CwDHv36dqWSpwsi27vpm1z5sA@mail.gmail.com
and also directly as a performance regression reported by Dong Ye in
message
d54b8387.000012d8.00000010@YED-DEVD1.vmware.com
Reportedly, this patch fixes the performance regression.

Since the missing optimization was reported as a significant performance
regression from 9.2, backpatch to 9.3.

Andres Freund, tweaked by Álvaro Herrera

13aa6244
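
The idea in commit 13aa6244 can be sketched as a rough Python model, assuming lock strengths can be ordered from weakest to strongest. The enum, the function name, and the tuple-based return value are hypothetical; the real code also distinguishes locks from updates, which this sketch ignores.

```python
from enum import IntEnum


class LockStrength(IntEnum):
    """Tuple lock strengths, weakest first (names are illustrative)."""
    KEY_SHARE = 0
    SHARE = 1
    NO_KEY_EXCLUSIVE = 2
    EXCLUSIVE = 3


def combine_xmax(old_xid, old_strength, new_xid, new_strength):
    """Decide what to store in Xmax when a second lock or update arrives.

    Same transaction: keep a single XID with the stronger of the two
    strengths. Different transactions: both members must be remembered,
    so a multixact is needed.
    """
    if old_xid == new_xid:
        return ("single", new_xid, max(old_strength, new_strength))
    return ("multi", [(old_xid, old_strength), (new_xid, new_strength)])
```

With this model, a row locked and then updated by the same transaction keeps a plain XID in Xmax, and only locks held by different transactions produce a multixact.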

Add tab completion for ALTER SYSTEM SET in psql. · 084e385a
Fujii Masao authored Dec 20, 2013

084e385a

Fix typo in docs for min_recovery_apply_delay. · f83a7545
Fujii Masao authored Dec 19, 2013
```
Bernd Helmle
```
f83a7545

Upgrade to Autoconf 2.69 · 94b899b8
Peter Eisentraut authored Dec 18, 2013

94b899b8

18 Dec, 2013 5 commits

Fix compiler warning. · 6bb9d301

Robert Haas authored Dec 18, 2013

get_user_name returns const char *, but we were assigning the result
to a char * variable.

6bb9d301

Allow on-detach callbacks for dynamic shared memory segments. · 001a573a

Robert Haas authored Dec 18, 2013

Just as backends must clean up their shared memory state (releasing
lwlocks, buffer pins, etc.) before exiting, they must also perform
any similar cleanups related to dynamic shared memory segments they
have mapped before unmapping those segments.  So add a mechanism to
ensure that.

Existing on_shmem_exit hooks include both "user level" cleanup such
as transaction abort and removal of leftover temporary relations and
also "low level" cleanup that forcibly releases leftover shared
memory resources.  On-detach callbacks should run after the first
group but before the second group, so create a new before_shmem_exit
function for registering the early callbacks and keep on_shmem_exit
for the regular callbacks.  (An earlier draft of this patch added an
additional argument to on_shmem_exit, but that had a much larger
footprint and probably a substantially higher risk of breaking third
party code for no real gain.)

Patch by me, reviewed by KaiGai Kohei and Andres Freund.

001a573a
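
The ordering that commit 001a573a describes can be sketched with a small Python model. The class and its method bodies are hypothetical; only the names before_shmem_exit and on_shmem_exit come from the commit message. Running each list in reverse registration order is an assumption of this sketch.

```python
class ExitCallbacks:
    """Toy model of the two callback lists run at backend exit."""

    def __init__(self):
        self._before = []  # "user level" cleanup, runs first
        self._on = []      # "low level" cleanup, runs last

    def before_shmem_exit(self, fn):
        self._before.append(fn)

    def on_shmem_exit(self, fn):
        self._on.append(fn)

    def shmem_exit(self, detach_callbacks):
        # 1. early callbacks (e.g. transaction abort)
        for fn in reversed(self._before):
            fn()
        # 2. on-detach callbacks of dynamic shared memory segments
        for fn in detach_callbacks:
            fn()
        # 3. regular callbacks (e.g. forcibly releasing leftover resources)
        for fn in reversed(self._on):
            fn()


log = []
callbacks = ExitCallbacks()
callbacks.on_shmem_exit(lambda: log.append("release leftover resources"))
callbacks.before_shmem_exit(lambda: log.append("abort transaction"))
callbacks.shmem_exit([lambda: log.append("dsm on-detach")])
```

Registration order does not matter across the two lists: the detach callbacks always run between the two groups.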

Fix incorrect error message reported for non-existent users · 613c6d26

Bruce Momjian authored Dec 18, 2013

Previously, lookups of non-existent user names could return "Success";
they now return "User does not exist" because errno is reset first. This also
centralizes the user name lookup code in libpgport.

Report and analysis by Nicolas Marchildon; patch by me

613c6d26

Don't ignore tuple locks propagated by our updates · 11ac4c73

Alvaro Herrera authored Dec 18, 2013

If a tuple was locked by transaction A, and transaction B updated it,
the new version of the tuple created by B would be locked by A, yet
visible only to B; due to an oversight in HeapTupleSatisfiesUpdate, the
lock held by A wouldn't get checked if transaction B later deleted (or
key-updated) the new version of the tuple. This might cause referential
integrity checks to give false positives (that is, allow deletes that
should have been rejected).

This is an easy oversight to have made, because prior to improved tuple
locks in commit 0ac5ad51 it wasn't possible to have tuples created by
our own transaction that were also locked by remote transactions, and so
locks weren't even considered in that code path.

It is recommended that foreign keys be rechecked manually in bulk after
installing this update, in case some referenced rows are missing with
some referencing row remaining.

Per bug reported by Daniel Wood in
CAPweHKe5QQ1747X2c0tA=5zf4YnS2xcvGf13Opd-1Mq24rF1cQ@mail.gmail.com

11ac4c73

Add ALTER SYSTEM command to edit the server configuration file. · 65d6e4cb

Tatsuo Ishii authored Dec 18, 2013

Patch contributed by Amit Kapila. Reviewed by Hari Babu, Masao Fujii,
Boszormenyi Zoltan, Andres Freund, Greg Smith and others.

65d6e4cb

17 Dec, 2013 1 commit

Comment: COPY comment improvement · dba5a9dd
Bruce Momjian authored Dec 17, 2013
```
Etsuro Fujita
```
dba5a9dd

16 Dec, 2013 2 commits

Rework tuple freezing protocol · 3b97e682

Alvaro Herrera authored Dec 16, 2013

Tuple freezing was broken in connection to MultiXactIds; commit
8e53ae025de9 tried to fix it, but didn't go far enough. As noted by
Noah Misch, freezing a tuple whose Xmax is a multi containing an aborted
update might cause locks in the multi to go ignored by later
transactions. This is because the code depended on a multixact above
their cutoff point not having any lock-only member older than the cutoff
point for Xids, which is easily defeated in READ COMMITTED transactions.

The fix for this involves creating a new MultiXactId when necessary.
But this cannot be done during WAL replay, and moreover multixact
examination requires using CLOG access routines which are not supposed
to be used during WAL replay either; so tuple freezing cannot be done
with the old freeze WAL record. Therefore, separate the freezing
computation from its execution, and change the WAL record to carry all
necessary information. At WAL replay time, it's easy to re-execute
freezing because we don't need to re-compute the new infomask/Xmax
values but just take them from the WAL record.

While at it, restructure the coding to ensure all page changes occur in
a single critical section without much room for failures. The previous
coding wasn't using a critical section, without any explanation as to
why this was acceptable.

In replication scenarios using the 9.3 branch, standby servers must be
upgraded before their master, so that they are prepared to deal with the
new WAL record once the master is upgraded; failure to do so will cause
WAL replay to die with a PANIC message. Later upgrade of the standby
will allow the process to continue where it left off, so there's no
disruption of the data in the standby in any case. Standbys know how to
deal with the old WAL record, so it's okay to keep the master running
the old code for a while.

In master, the old freeze WAL record is gone, for cleanliness' sake;
there's no compatibility concern there.

Backpatch to 9.3, where the original bug was introduced and where the
previous fix was backpatched.

Álvaro Herrera and Andres Freund

3b97e682
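
The separation of freezing computation from execution in commit 3b97e682 can be sketched as follows. This is a simplified Python model under stated assumptions: the types, function names, and flag value are hypothetical, and `xmax_is_obsolete` stands in for the CLOG and multixact lookups that are only allowed on the primary.

```python
from dataclasses import dataclass

INVALID_XID = 0            # stand-in for InvalidTransactionId
XMAX_INVALID_FLAG = 0x0800  # hypothetical flag value for this sketch


@dataclass
class Tuple:
    xmax: int
    infomask: int


@dataclass(frozen=True)
class FreezePlan:
    """What the WAL record would carry: final values, no decisions."""
    offset: int
    new_xmax: int
    new_infomask: int


def prepare_freeze(offset, tup, xmax_is_obsolete):
    """Primary only: decide the new values; may consult other state."""
    if xmax_is_obsolete(tup.xmax):
        return FreezePlan(offset, INVALID_XID, tup.infomask | XMAX_INVALID_FLAG)
    return None


def execute_freeze(page, plan):
    """Primary and WAL replay: apply recorded values without any lookups."""
    tup = page[plan.offset]
    tup.xmax = plan.new_xmax
    tup.infomask = plan.new_infomask
```

Because `execute_freeze` only copies values out of the plan, replaying the same plans on a standby yields the same page contents without recomputing anything.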

Mark variables 'static' where possible. Move GinFuzzySearchLimit to ginget.c · 30b96549

Heikki Linnakangas authored Dec 16, 2013

Per "clang -Wmissing-variable-declarations" output, posted by Andres Freund.
I didn't silence all those warnings, though, only the most obvious cases.

30b96549

15 Dec, 2013 2 commits

Add "SHIFT_JIS" as an accepted encoding name for locale checking. · 1f0626ee

Tatsuo Ishii authored Dec 15, 2013

When locale is "ja_JP.SJIS", nl_langinfo(CODESET) returns "SHIFT_JIS"
on some platforms, at least on RedHat Linux. So the encoding/locale
match table (encoding_match_list) needs the entry. Otherwise client
encoding is set to SQL_ASCII.

Back patch to all supported branches.

1f0626ee

Allow empty target list in SELECT. · 1b4f7f93

Tom Lane authored Dec 14, 2013

This fixes a problem noted as a followup to bug #8648: if a query has a
semantically-empty target list, e.g. SELECT * FROM zero_column_table,
ruleutils.c will dump it as a syntactically-empty target list, which was
not allowed.  There doesn't seem to be any reliable way to fix this by
hacking ruleutils (note in particular that the originally zero-column table
might since have had columns added to it); and even if we had such a fix,
it would do nothing for existing dump files that might contain bad syntax.
The best bet seems to be to relax the syntactic restriction.

Also, add parse-analysis errors for SELECT DISTINCT with no columns (after
*-expansion) and RETURNING with no columns.  These cases previously
produced unexpected behavior because the parsed Query looked like it had
no DISTINCT or RETURNING clause, respectively.  If anyone ever offers
a plausible use-case for this, we could work a bit harder on making the
situation distinguishable.

Arguably this is a bug fix that should be back-patched, but I'm worried
that there may be client apps or PLs that expect "SELECT ;" to throw a
syntax error.  The issue doesn't seem important enough to risk changing
behavior in minor releases.

1b4f7f93

14 Dec, 2013 1 commit

Fix inherited UPDATE/DELETE with UNION ALL subqueries. · c03ad560

Tom Lane authored Dec 14, 2013

Fix an oversight in commit b3aaf908: we do
indeed need to process the planner's append_rel_list when copying RTE
subqueries, because if any of them were flattenable UNION ALL subqueries,
the append_rel_list shows which subquery RTEs were pulled up out of which
other ones. Without this, UNION ALL subqueries aren't correctly inserted
into the update plans for inheritance child tables after the first one,
typically resulting in no update happening for those child table(s).
Per report from Victor Yegorov.

Experimentation with this case also exposed a fault in commit
a7b96538: if an inherited UPDATE/DELETE
was proven totally dummy by constraint exclusion, we might arrive at
add_rtes_to_flat_rtable with root->simple_rel_array being NULL. This
should be interpreted as not having any RelOptInfos. I chose to code
the guard as a check against simple_rel_array_size, so as to also
provide some protection against indexing off the end of the array.

Back-patch to 9.2 where the faulty code was added.

c03ad560

13 Dec, 2013 9 commits

Fix typo · 60eea378
Alvaro Herrera authored Dec 13, 2013

60eea378

Rework MultiXactId cache code · d881dd62

Alvaro Herrera authored Dec 13, 2013

The original performs too poorly; in some scenarios it shows up far too
high in profiles.  Try to make it a bit smarter to avoid excessive
cost.  In particular, give it a maximum size and keep entries sorted
in LRU order; once the maximum size is reached, evict the oldest
entry to keep the cache from growing too large.

Per complaint from Andres Freund in connection with new tuple freezing
code.

d881dd62
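
The cache policy described in commit d881dd62 (a maximum size, entries kept in LRU order, oldest entry evicted when full) can be sketched in Python as below. The class name and the `max_entries` parameter are hypothetical, and the actual C implementation uses its own data structures rather than anything like `OrderedDict`.

```python
from collections import OrderedDict


class BoundedLRUCache:
    """Size-capped cache that evicts the least recently used entry."""

    def __init__(self, max_entries):
        if max_entries < 1:
            raise ValueError("max_entries must be at least 1")
        self.max_entries = max_entries
        self._entries = OrderedDict()  # oldest first, newest last

    def get(self, key):
        if key not in self._entries:
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        return self._entries[key]

    def put(self, key, value):
        self._entries[key] = value
        self._entries.move_to_end(key)
        if len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)  # evict the oldest entry

    def __len__(self):
        return len(self._entries)
```

A lookup counts as a use, so an entry that is read often survives even if it was inserted long ago.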

Add HOLD/RESUME_INTERRUPTS in HandleCatchupInterrupt/HandleNotifyInterrupt. · 2efc6dc2

Tom Lane authored Dec 13, 2013

This prevents a possible longjmp out of the signal handler if a timeout
or SIGINT occurs while something within the handler has transiently set
ImmediateInterruptOK. For safety we must hold off the timeout or cancel
error until we're back in mainline, or at least till we reach the end of
the signal handler when ImmediateInterruptOK was true at entry. This
syncs these functions with the logic now present in handle_sig_alarm.

AFAICT there is no live bug here in 9.0 and up, because I don't think we
currently can wait for any heavyweight lock inside these functions, and
there is no other code (except read-from-client) that will turn on
ImmediateInterruptOK. However, that was not true pre-9.0: in older
branches ProcessIncomingNotify might block trying to lock pg_listener, and
then a SIGINT could lead to undesirable control flow. It might be all
right anyway given the relatively narrow code ranges in which NOTIFY
interrupts are enabled, but for safety's sake I'm back-patching this.

2efc6dc2

Fix more instances of "the the" in comments. · dde62825
Heikki Linnakangas authored Dec 13, 2013
```
Plus one instance of "to to" in the docs.
```
dde62825

Don't let timeout interrupts happen unless ImmediateInterruptOK is set. · e8312b4f

Tom Lane authored Dec 13, 2013

Serious oversight in commit 16e1b7a1:
we should not allow an interrupt to take control away from mainline code
except when ImmediateInterruptOK is set.  Just to be safe, let's adopt
the same save-clear-restore dance that's been used for many years in
HandleCatchupInterrupt and HandleNotifyInterrupt, so that nothing bad
happens if a timeout handler invokes code that tests or even manipulates
ImmediateInterruptOK.

Per report of "stuck spinlock" failures from Christophe Pettus, though
many other symptoms are possible.  Diagnosis by Andres Freund.

e8312b4f
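
The save-clear-restore pattern that commit e8312b4f refers to can be sketched as follows. This is a toy Python model: the class, attribute, and function names are hypothetical, and the `try`/`finally` is a choice made for this sketch rather than a description of the C code.

```python
class InterruptState:
    """Holds a flag playing the role of ImmediateInterruptOK."""

    def __init__(self, immediate_interrupt_ok=False):
        self.immediate_interrupt_ok = immediate_interrupt_ok


def run_handler(state, handler):
    """Run a handler with the flag cleared, then restore the saved value."""
    saved = state.immediate_interrupt_ok   # save
    state.immediate_interrupt_ok = False   # clear
    try:
        handler(state)
    finally:
        state.immediate_interrupt_ok = saved  # restore
```

Whatever the handler does to the flag while it runs, mainline code sees the same value afterwards as it did before the handler was invoked.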

Add GUC to enable WAL-logging of hint bits, even with checksums disabled. · 50e54709

Heikki Linnakangas authored Dec 13, 2013

WAL records of hint bit updates are useful to tools that want to examine
which pages have been modified. In particular, this is required to make
the pg_rewind tool safe (without checksums).

This can also be used to test how much extra WAL-logging would occur if
you enabled checksums, without actually enabling them (which you can't
currently do without re-initdb'ing).

Sawada Masahiko, docs by Samrat Revagade. Reviewed by Dilip Kumar, with
further changes by me.

50e54709

Fix double "the" in the documentation · 56afe850
Magnus Hagander authored Dec 13, 2013
```
Erik Rijkers
```
56afe850

Fix WAL-logging of setting the visibility map bit. · a49633d8

Heikki Linnakangas authored Dec 13, 2013

The operation that removes the remaining dead tuples from the page must
be WAL-logged before the setting of the VM bit. Otherwise, if you replay
the WAL to between those two records, you end up with the VM bit set, but
the dead tuples are still there.

Backpatch to 9.3, where this bug was introduced.

a49633d8
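
The ordering problem fixed by commit a49633d8 can be illustrated with a toy replay model. The record names, the page representation, and the helper functions are all hypothetical. The sketch only shows why stopping replay between the two records is safe in one order and not in the other.

```python
def replay(records):
    """Apply a prefix of the WAL to a page that starts with dead tuples."""
    page = {"dead_tuples": 2, "vm_all_visible": False}
    for record in records:
        if record == "prune":
            page["dead_tuples"] = 0
        elif record == "set_vm":
            page["vm_all_visible"] = True
    return page


def consistent(page):
    """The VM bit must never be set while dead tuples remain."""
    return not (page["vm_all_visible"] and page["dead_tuples"] > 0)


def every_stop_point_consistent(records):
    """Check the invariant after replaying each possible prefix."""
    return all(consistent(replay(records[:n])) for n in range(len(records) + 1))
```

With `["prune", "set_vm"]` every stopping point is consistent; with `["set_vm", "prune"]` stopping after the first record leaves the VM bit set on a page that still has dead tuples.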

configure: Allow adding a custom string to PG_VERSION · 46328916

Peter Eisentraut authored Dec 12, 2013

This can be used to mark custom built binaries with an extra version
string such as a git describe identifier or distribution package release
version.

From: Oskari Saarenmaa <os@ohmu.fi>

46328916

12 Dec, 2013 7 commits

Fix ancient docs/comments thinko: XID comparison is mod 2^32, not 2^31. · ccca6f56
Tom Lane authored Dec 12, 2013
```
Pointed out by Gianni Ciolli.
```
ccca6f56
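
The modulo-2^32 comparison mentioned in commit ccca6f56 can be sketched in Python as below. This models the signed 32-bit difference that the C code relies on. The function names are hypothetical, and the special handling of XIDs below 3 (the permanent XIDs) is an assumption about how the real comparison treats them.

```python
FIRST_NORMAL_XID = 3  # assumed: XIDs below this are special, permanent values


def to_int32(value):
    """Wrap an integer to a signed 32-bit value, as a C cast to int32 would."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value >= 0x80000000 else value


def xid_precedes(id1, id2):
    """True if id1 is logically older than id2.

    The subtraction wraps modulo 2^32; the sign of the 32-bit result
    decides the order, so each XID has about 2^31 XIDs on either side.
    """
    if id1 < FIRST_NORMAL_XID or id2 < FIRST_NORMAL_XID:
        return id1 < id2
    return to_int32(id1 - id2) < 0
```

For example, an XID just below 2^32 compares as older than a small XID issued after wraparound, even though it is numerically larger.
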
Improve EXPLAIN to print the grouping columns in Agg and Group nodes. · f2609905
Tom Lane authored Dec 12, 2013
```
Per request from Kevin Grittner.
```
f2609905

New autovacuum_work_mem parameter · 8693559c

Simon Riggs authored Dec 12, 2013

If autovacuum_work_mem is set, autovacuum workers now use
this parameter in preference to maintenance_work_mem.

Peter Geoghegan

8693559c

Allow time delayed standbys and recovery · 36da3cfb

Simon Riggs authored Dec 12, 2013

Set min_recovery_apply_delay to force a delay in recovery apply for commit and
restore point WAL records. Other records are replayed immediately. Delay is
measured between WAL record time and local standby time.

Robert Haas, Fabrízio de Royes Mello and Simon Riggs
Detailed review by Mitsumasa Kondo

36da3cfb
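
The rule in commit 36da3cfb can be sketched as a small Python function. The function name, parameter names, and record-type strings are hypothetical, and times are plain seconds for simplicity.

```python
def apply_delay_seconds(record_type, record_time, standby_now, min_apply_delay):
    """How long replay should wait before applying a WAL record.

    Only commit and restore point records are delayed; everything else
    is replayed immediately. The delay is measured from the timestamp in
    the WAL record against the standby's local clock.
    """
    if record_type not in ("commit", "restore_point"):
        return 0.0
    return max(0.0, record_time + min_apply_delay - standby_now)
```

Because the comparison uses the standby's own clock, clock skew between the two servers changes the effective delay.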

Fix progress logging when scale factor is large. · 841a6548

Tatsuo Ishii authored Dec 12, 2013

Integer overflow caused the progress output to show a negative percentage and a negative remaining time, like this:
  239300000 of 3800000000 tuples (-48%) done (elapsed 226.86 s, remaining -696.10 s).

841a6548
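
The numbers in the sample output of commit 841a6548 can be reproduced with a short Python model of 32-bit arithmetic. The assumption here is that the total tuple count was computed as a 32-bit product of 100000 rows per scale unit and a scale factor of 38000; both values are inferred from the 3800000000 in the message, not stated in it. The helper names are hypothetical.

```python
def to_int32(value):
    """Wrap an integer to a signed 32-bit value, as overflow typically does in C."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value >= 0x80000000 else value


def c_div(a, b):
    """Integer division truncating toward zero, as in C."""
    q = abs(a) // abs(b)
    return q if (a < 0) == (b < 0) else -q


def progress(done, total, elapsed):
    """Return (percent done, estimated seconds remaining)."""
    percent = c_div(done * 100, total)
    remaining = (total - done) * elapsed / done
    return percent, remaining


NACCOUNTS = 100000   # assumed rows per scale unit
SCALE = 38000        # assumed scale factor
DONE = 239300000
ELAPSED = 226.86

wrapped_total = to_int32(NACCOUNTS * SCALE)  # overflows to a negative value
correct_total = NACCOUNTS * SCALE            # 64-bit style arithmetic

bad_percent, bad_remaining = progress(DONE, wrapped_total, ELAPSED)
good_percent, good_remaining = progress(DONE, correct_total, ELAPSED)
```

Under these assumptions the wrapped total yields -48 percent and roughly -696.10 seconds remaining, matching the message, while the unwrapped total yields 6 percent and a positive estimate.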

Display old and new values in pg_resetxlog -n output. · 108e3992
Heikki Linnakangas authored Dec 12, 2013
```
For extra clarity.

Rajeev Rastogi, reviewed by Amit Kapila
```
108e3992

Remove bogus executable permissions on xlog.c. · 22310b80
Tom Lane authored Dec 11, 2013
```
Apparently fat-fingered in 1a3d1044.
Noted by Peter Geoghegan.
```
22310b80

11 Dec, 2013 2 commits

Add a regression test case for plpython function returning setof RECORD. · 6bff0e7d

Tom Lane authored Dec 11, 2013

We had coverage for functions returning setof a named composite type,
but not for anonymous records, which is a somewhat different code path.
In view of recent crash report from Sergey Konoplev, this seems worth
testing, though I doubt there's any deterministic bug here today.

6bff0e7d

Regression tests for SCHEMA commands · cf589c9c
Simon Riggs authored Dec 11, 2013
```
Hari Babu Kommi reviewed by David Rowley
```
cf589c9c