Commits · 312bde3d404f770943c992992565c73f0336d21b · Abuhujair Javed / Postgres FD Implementation

05 Dec, 2013 3 commits

Fix improper abort during update chain locking · 312bde3d

Alvaro Herrera authored Dec 05, 2013

In 247c76a9, I added some code to do fine-grained checking of
MultiXact status of locking/updating transactions when traversing an
update chain.  There was a thinko in that patch which would make the
traversal abort, that is, return HeapTupleUpdated, when the other
transaction is a committed lock-only.  In this case we should ignore it
and return success instead.  Of course, in the case where there is a
committed update, HeapTupleUpdated is the correct return value.

A user-visible symptom of this bug is that in REPEATABLE READ and
SERIALIZABLE transaction isolation modes spurious serializability errors
can occur:
  ERROR:  could not serialize access due to concurrent update

In order for this to happen, there needs to be a tuple that's key-share-
locked and also updated, and the update must abort; a subsequent
transaction trying to acquire a new lock on that tuple would abort with
the above error.  The reason is that the initial FOR KEY SHARE is seen
as committed by the new locking transaction, which triggers this bug.
(If the UPDATE commits, then the serialization error is correctly
reported.)

When running a query in READ COMMITTED mode, what happens is that the
locking is aborted by the HeapTupleUpdated return value, then
EvalPlanQual fetches the newest version of the tuple, which is then the
only version that gets locked.  (The second time the tuple is checked
there is no misbehavior on the committed lock-only, because it's not
checked by the code that traverses update chains; so no bug.) Only the
newest version of the tuple is locked, not older ones, but this is
harmless.

The isolation test added by this commit illustrates the desired
behavior, including the proper serialization errors that get thrown.

Backpatch to 9.3.
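
The corrected rule can be sketched as a small decision function.  This is
an illustrative model only, not the code in the patch: the types and the
function name are hypothetical, CHAIN_UPDATED merely stands in for the
HeapTupleUpdated return value, and the case of a still-running transaction
is left out because the message above does not discuss it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: final state of the other transaction on the chain. */
typedef enum { OTHER_COMMITTED, OTHER_ABORTED } OtherXactState;

typedef struct
{
    OtherXactState state;     /* did the other transaction commit or abort? */
    bool           lock_only; /* true: it only locked; false: it updated */
} ChainMember;

/* CHAIN_UPDATED stands in for the HeapTupleUpdated return value. */
typedef enum { CHAIN_OK, CHAIN_UPDATED } ChainResult;

ChainResult
check_chain_member(const ChainMember *member)
{
    assert(member != NULL);

    /* An aborted locker or updater leaves nothing to conflict with. */
    if (member->state == OTHER_ABORTED)
        return CHAIN_OK;

    /* Committed lock-only: the lock is gone, so ignore it (the fixed case). */
    if (member->lock_only)
        return CHAIN_OK;

    /* Committed update: the tuple really did change. */
    return CHAIN_UPDATED;
}
```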

Clear retry flags properly in replacement OpenSSL sock_write function. · 74242c23

Tom Lane authored Dec 05, 2013

Current OpenSSL code includes a BIO_clear_retry_flags() step in the
sock_write() function.  Either we failed to copy the code correctly, or
they added this since we copied it.  In any case, lack of the clear step
appears to be the cause of the server lockup after connection loss reported
in bug #8647 from Valentine Gogichashvili.  Assume that this is correct
coding for all OpenSSL versions, and hence back-patch to all supported
branches.

Diagnosis and patch by Alexander Kukushkin.

Avoid resetting Xmax when it's a multi with an aborted update · 07aeb1fe

Alvaro Herrera authored Dec 05, 2013

HeapTupleSatisfiesUpdate can very easily "forget" tuple locks while
checking the contents of a multixact and finding it contains an aborted
update, by setting the HEAP_XMAX_INVALID bit.  This would lead to
concurrent transactions not noticing any previous locks held by
transactions that might still be running, and thus being able to acquire
subsequent locks they wouldn't be normally able to acquire.

This bug was introduced in commit 1ce150b7; backpatch this fix to 9.3,
like that commit.

This change reverts the change to the delete-abort-savept isolation test
in 1ce150b7, because that behavior change was caused by this bug.

Noticed by Andres Freund while investigating a different issue reported
by Noah Misch.

04 Dec, 2013 3 commits

build: pass EXTRA_REGRESS_OPTS to secondary regression tests · 86ef4796

Bruce Momjian authored Dec 04, 2013

Christoph Berg

doc: split long query into multiple lines · 5043fc82

Bruce Momjian authored Dec 04, 2013

Report from Erik Rijkers

Fix whitespace · dfd5151c

Peter Eisentraut authored Dec 03, 2013

03 Dec, 2013 8 commits

Don't include unused space in LOG_NEWPAGE records. · 9e857436

Heikki Linnakangas authored Dec 04, 2013

This is the same trick we use when taking a full page image of a buffer
passed to XLogInsert.

Fix full-page writes of internal GIN pages. · 22122c83

Heikki Linnakangas authored Dec 03, 2013

Insertion to a non-leaf GIN page didn't make a full-page image of the page,
which is wrong. The code used to do it correctly, but was changed (commit
853d1c31) because the redo routine didn't track incomplete splits correctly
when the page was restored from a full-page image. Of course, that was not
the right way to fix it; the redo routine should've been fixed instead. The
redo routine was surreptitiously fixed in 2010 (commit 4016bdef), so all we
need to do now is revert the code that creates the record to its original
form.

This doesn't change the format of the WAL record.

Backpatch to all supported versions.

C comment: again update comment for pg_fe_sendauth for error cases · 4a8adfd4

Bruce Momjian authored Dec 03, 2013

Update C comment for pg_fe_getauthname · 6a6b7bbb

Bruce Momjian authored Dec 03, 2013

This function no longer takes an argument.

libpq: change PQconndefaults() to ignore invalid service files · 9e0a97f1

Bruce Momjian authored Dec 03, 2013

Previously, PQconndefaults() returned NULL when the service file was
missing or invalid.  Also fix pg_upgrade to report "out of memory" for a
null return from PQconndefaults().

Patch by Steve Singer, rewritten by me

doc: Refine documentation about recovery command exit status · 95e3d505

Peter Eisentraut authored Dec 02, 2013

Add more documentation about how different exit codes and signals are
handled in each case.

Reviewed-by: Peter Geoghegan <pg@heroku.com>

Report exit code from external recovery commands properly · fef88b3f

Peter Eisentraut authored Nov 13, 2013

When an external recovery command such as restore_command or
archive_cleanup_command fails, report the exit code properly,
distinguishing signals and normal exits, using the existing
wait_result_to_str() facility, instead of just reporting the return
value from system().

Reviewed-by: Peter Geoghegan <pg@heroku.com>
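
As a rough illustration of the distinction, the value returned by system()
can be decoded with the POSIX wait-status macros.  The sketch below is not
wait_result_to_str(); the function name, the enum, and the message wording
are assumptions made for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>

/* Hypothetical classification of a status returned by system(). */
typedef enum { WAIT_NOT_STARTED, WAIT_EXITED, WAIT_SIGNALED, WAIT_UNKNOWN } WaitKind;

/* Classify a wait status and write a human-readable description to buf. */
WaitKind
describe_wait_status(int status, char *buf, size_t buflen)
{
    assert(buf != NULL && buflen > 0);

    if (status == -1)
    {
        /* system() itself failed, e.g. no child could be created */
        snprintf(buf, buflen, "command could not be started");
        return WAIT_NOT_STARTED;
    }
    if (WIFEXITED(status))
    {
        snprintf(buf, buflen, "child process exited with exit code %d",
                 WEXITSTATUS(status));
        return WAIT_EXITED;
    }
    if (WIFSIGNALED(status))
    {
        snprintf(buf, buflen, "child process was terminated by signal %d",
                 WTERMSIG(status));
        return WAIT_SIGNALED;
    }
    snprintf(buf, buflen, "child process exited with unrecognized status %d",
             status);
    return WAIT_UNKNOWN;
}
```

Printing the raw return value of system() would show an encoded status word
rather than the exit code or signal number, which is the problem described
above.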

Fix crash in assign_collations_walker for EXISTS with empty SELECT list. · 7ab32140

Tom Lane authored Dec 02, 2013

We (I think I, actually) forgot about this corner case while coding
collation resolution.  Per bug #8648 from Arjen Nienhuis.

02 Dec, 2013 8 commits

Update release notes for 9.3.2, 9.2.6, 9.1.11, 9.0.15, 8.4.19. · 02bb4bbc

Tom Lane authored Dec 02, 2013

doc: update wording of ineffective SET and ABORT commands · 54916b99

Bruce Momjian authored Dec 02, 2013

Wording by Alvaro Herrera

Improve draft release notes. · b8b7b723

Tom Lane authored Dec 02, 2013

Per suggestions from Andres Freund.  Also fix spelling of
Sergey Burladyan's name.

Increase git_changelog's timestamp_slop from 10 min to 1 day. · 7a1e34d3

Tom Lane authored Dec 02, 2013

Many committers seem to now be using a work flow in which back-patched
commits are timestamped minutes or even hours apart in different branches
(most likely because they commit in one branch before starting work on
the next one).  git_changelog was failing to merge its reports in such
cases, so increase the max time it's willing to merge commits across.
I considered getting rid of the limit altogether, but that produces
some odd results in terms of how the merged commit gets sorted relative
to unrelated commits.
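
The merge rule can be pictured as follows.  git_changelog itself is a Perl
script; this C sketch only models the idea, and it assumes that commits are
matched on author and message, which the message above does not spell out.
All names here are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>
#include <time.h>

/* New limit from this commit: one day, expressed in seconds. */
#define TIMESTAMP_SLOP_SECS (24 * 60 * 60)

/* Hypothetical record for one commit as seen on one branch. */
typedef struct
{
    const char *author;
    const char *message;
    time_t      when;
} BranchCommit;

/* Treat two per-branch commits as one logical change if they match and
 * their timestamps differ by no more than slop_secs. */
bool
same_logical_commit(const BranchCommit *a, const BranchCommit *b, long slop_secs)
{
    double gap;

    assert(a != NULL && b != NULL);

    gap = difftime(a->when, b->when);
    if (gap < 0)
        gap = -gap;

    return strcmp(a->author, b->author) == 0 &&
           strcmp(a->message, b->message) == 0 &&
           gap <= (double) slop_secs;
}
```

With the old 10-minute limit, two back-patches committed two hours apart
would be reported separately; with the one-day limit they are merged.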

Flag mmap implementation of dynamic shared memory as resize-capable. · c6d4b1dd

Robert Haas authored Dec 02, 2013

Error noted by Heikki Linnakangas

Make NUM_TOCHAR_prepare and NUM_TOCHAR_finish macros declare "len". · a8656a3a

Robert Haas authored Dec 02, 2013

Remove the variable from the enclosing scopes so that nothing can be
relying on it.  The net result of this refactoring is that we get rid
of a few unnecessary strlen() calls.

Original patch from Greg Jaskiewicz, substantially expanded by me.

Avoid out-of-bounds read in errfinish if errordata_stack_depth < 0. · 9d140f7b

Robert Haas authored Dec 02, 2013

If errordata_stack_depth < 0, we won't find that out and correct the
problem until CHECK_STACK_DEPTH() is invoked.  In the meantime,
elevel will be set based on an invalid read.  This is probably
harmless in practice, but it seems cleaner this way.

Xi Wang
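
The ordering issue can be shown with a simplified stand-in for the error
stack.  This is a sketch under assumptions, not the elog.c code: the
struct, the stack size constant, and the function name are hypothetical.
The point is only that the depth is validated before the array is indexed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define STACK_SIZE 5            /* hypothetical capacity */

typedef struct { int elevel; } ErrorEntry;

typedef struct
{
    ErrorEntry entries[STACK_SIZE];
    int        depth;           /* index of the top entry, -1 when empty */
} ErrorStack;

/* Fetch the top entry's elevel, checking the depth first so that an
 * empty (or corrupted) stack never causes an out-of-bounds read. */
bool
top_elevel(const ErrorStack *stack, int *elevel)
{
    assert(stack != NULL && elevel != NULL);

    if (stack->depth < 0 || stack->depth >= STACK_SIZE)
        return false;           /* caller must handle the invalid state */

    *elevel = stack->entries[stack->depth].elevel;
    return true;
}
```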

Translation updates · 3e3520cf

Peter Eisentraut authored Dec 02, 2013

01 Dec, 2013 3 commits

Draft release notes for 9.3.2. · 23e796de

Tom Lane authored Dec 01, 2013

I'm putting these up for review before I start to extract the relevant
subsets for the older branches.  It'll be easier to make any suggested
wording improvements at this stage.

doc: Disable preface.autolabel in XSLT · 3c81b5c1

Peter Eisentraut authored Dec 01, 2013

This makes the output more consistent with the existing DSSSL setup.

Update time zone data files to tzdata release 2013h. · 33547025

Tom Lane authored Dec 01, 2013

DST law changes in Argentina, Brazil, Jordan, Libya, Liechtenstein,
Morocco, Palestine.  New timezone abbreviations WIB, WIT, WITA for
Indonesia.

30 Nov, 2013 10 commits

Editorial corrections to the October 2013 minor-release notes. · 47960354

Tom Lane authored Nov 30, 2013

This is mostly to fix incorrect migration instructions: since the preceding
minor releases advised reindexing some GIST indexes, it's important that
we back-link to that advice rather than earlier instances.

Also improve some bug descriptions and fix a few typos.

No back-patch yet; these files will get copied into the back branches
later in the release process.

pg_upgrade: Handle default_transaction_read_only settings · e7d56aee

Bruce Momjian authored Nov 30, 2013

Setting default_transaction_read_only=true could prevent pg_upgrade from
completing, so prepend default_transaction_read_only=false to
PGOPTIONS.
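
A minimal sketch of the string handling involved, assuming the setting is
passed with the usual "-c name=value" option syntax.  The function name and
the fixed-buffer interface are hypothetical and are not taken from
pg_upgrade.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build a PGOPTIONS value that starts with the read-only override,
 * followed by whatever the user already had.  Returns 0 on success,
 * -1 if the result does not fit in out. */
int
prepend_read_only_override(const char *existing, char *out, size_t outlen)
{
    const char *forced = "-c default_transaction_read_only=false";
    int         n;

    assert(out != NULL && outlen > 0);

    if (existing == NULL || strlen(existing) == 0)
        n = snprintf(out, outlen, "%s", forced);
    else
        n = snprintf(out, outlen, "%s %s", forced, existing);

    return (n >= 0 && (size_t) n < outlen) ? 0 : -1;
}
```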

Fix pg_dumpall to work for databases flagged as read-only. · 4bd371f6

Kevin Grittner authored Nov 30, 2013

pg_dumpall's charter is to be able to recreate a database cluster's
contents in a virgin installation, but it was failing to honor that
contract if the cluster had any ALTER DATABASE SET
default_transaction_read_only settings.  By including a SET command
for each connection opened by pg_dumpall output, errors are avoided
and the source cluster is successfully recreated.

There was discussion of whether to also set this for the connection
applying pg_dump output, but it was felt that it was both less
appropriate in that context, and far easier to work around.

Backpatch to all supported branches.

Remove use of obsolescent Autoconf macros · 34fa72ec

Peter Eisentraut authored Nov 30, 2013

Remove the use of the following macros, which are obsolescent according
to the Autoconf documentation:

- AC_C_CONST
- AC_C_STRINGIZE
- AC_C_VOLATILE
- AC_FUNC_MEMCMP

doc: Simplify handling of variablelists in XSLT build · 1eafea5d

Peter Eisentraut authored Nov 29, 2013

The previously used custom template is no longer necessary because
parameters provided by the standard style sheet can achieve the same
outcome.

Fix a couple of bugs in MultiXactId freezing · 2393c7d1

Alvaro Herrera authored Nov 28, 2013

Both heap_freeze_tuple() and heap_tuple_needs_freeze() neglected to look
into a multixact to check the members against cutoff_xid.  This means
that a very old Xid could survive hidden within a multi, possibly
outliving its CLOG storage.  In the distant future, this would cause
clog lookup failures:
  ERROR:  could not access status of transaction 3883960912
  DETAIL:  Could not open file "pg_clog/0E78": No such file or directory.

This mostly was problematic when the updating transaction aborted, since
in that case the row wouldn't get pruned away earlier in vacuum and the
multixact could possibly survive for a long time.  In many cases, data
that is inaccessible for this reason can be brought back
heuristically.

As a second bug, heap_freeze_tuple() didn't properly handle multixacts
that need to be frozen according to cutoff_multi, but whose updater xid
is still alive.  Instead of preserving the update Xid, it just set Xmax
invalid, which leads to both old and new tuple versions becoming
visible.  This is pretty rare in practice, but a real threat
nonetheless.  Existing corrupted rows, unfortunately, cannot be repaired
in an automated fashion.

Existing physical replicas might have already incorrectly frozen tuples
because of different behavior than in master, which might only become
apparent in the future once pg_multixact/ is truncated; it is
recommended that all clones be rebuilt after upgrading.

Found during code analysis prompted by bug reports from J Smith, in message
CADFUPgc5bmtv-yg9znxV-vcfkb+JPRqs7m2OesQXaM_4Z1JpdQ@mail.gmail.com
and privately from F-Secure.

Backpatch to 9.3, where freezing of MultiXactIds was introduced.

Analysis and patch by Andres Freund, with some tweaks by Álvaro.
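
The first bug amounts to a missing loop over the multixact's members.  The
sketch below models that check with hypothetical names; it uses a
modulo-2^32 comparison for transaction ids and ignores special reserved
ids, so it is an approximation rather than the logic of
heap_tuple_needs_freeze().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t Xid;           /* hypothetical stand-in for a transaction id */

/* Wraparound-aware "a is older than b": compare modulo 2^32.  Relies on
 * the usual two's-complement conversion to int32_t. */
static bool
xid_precedes(Xid a, Xid b)
{
    return (int32_t) (a - b) < 0;
}

/* A multixact needs freezing if ANY member is older than cutoff_xid;
 * looking only at the multixact id itself would miss such members. */
bool
multi_needs_freeze(const Xid *members, size_t nmembers, Xid cutoff_xid)
{
    assert(members != NULL || nmembers == 0);

    for (size_t i = 0; i < nmembers; i++)
    {
        if (xid_precedes(members[i], cutoff_xid))
            return true;
    }
    return false;
}
```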

Don't TransactionIdDidAbort in HeapTupleGetUpdateXid · 1ce150b7

Alvaro Herrera authored Nov 29, 2013

It is dangerous to do so, because some code expects to be able to see what's
the true Xmax even if it is aborted (particularly while traversing HOT
chains).  So don't do it, and instead rely on the callers to verify for
abortedness, if necessary.

Several race conditions and bugs fixed in the process.  One isolation test
changes the expected output due to these.

This also reverts commit c235a6a5, which is no longer necessary.

Backpatch to 9.3, where this function was introduced.

Andres Freund

Truncate pg_multixact/'s contents during crash recovery · 1df0122d

Alvaro Herrera authored Nov 29, 2013

Commit 9dc842f0 of 8.2 era prevented MultiXact truncation during crash
recovery, because there was no guarantee that enough state had been
set up, and because it wasn't deemed to be a good idea to remove data
during crash recovery anyway.  Since then, due to Hot-Standby, streaming
replication and PITR, the amount of time a cluster can spend doing crash
recovery has increased significantly, to the point that a cluster may
never come out of it.  This makes it no longer defensible to leave the
contents of pg_multixact/ untruncated.

To fix, take care to set up enough state for multixact truncation before
crash recovery starts (easy since checkpoints contain the required
information), and move the current end-of-recovery actions to a new
TrimMultiXact() function, analogous to TrimCLOG().

At some later point, this should probably be done similarly to the way
clog.c is doing it, which is to just WAL log truncations, but we can't
do that for the back branches.

Back-patch to 9.0.  8.4 also has the problem, but since there's no hot
standby there, it's much less pressing.  In 9.2 and earlier, this patch
is simpler than in newer branches, because multixact access during
recovery isn't required.  Add appropriate checks to make sure that's not
happening.

Andres Freund

Fix full-table-vacuum request mechanism for MultiXactIds · f54106f7

Alvaro Herrera authored Nov 28, 2013

While autovacuum dutifully launched anti-multixact-wraparound vacuums
when the multixact "age" was reached, the vacuum code was not aware that
it needed to make them be full table vacuums.  As the resulting
partial-table vacuums aren't capable of actually increasing relminmxid,
autovacuum continued to launch anti-wraparound vacuums that didn't have
the intended effect, until age of relfrozenxid caused the vacuum to
finally be a full table one via vacuum_freeze_table_age.

To fix, introduce logic for multixacts similar to that for plain
TransactionIds, using the same GUCs.

Backpatch to 9.3, where permanent MultiXactIds were introduced.

Andres Freund, some cleanup by Álvaro
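
In outline, the decision to scan the whole table now has two triggers
instead of one.  The sketch below is a simplified model with hypothetical
names: ages are computed modulo 2^32 from the next value to be assigned,
and a single threshold stands in for the shared setting.

```c
#include <stdbool.h>
#include <stdint.h>

/* Age of a counter value relative to the next value to be assigned. */
static uint32_t
counter_age(uint32_t next, uint32_t oldest)
{
    return next - oldest;       /* unsigned arithmetic wraps modulo 2^32 */
}

/* Scan the whole table if EITHER the oldest transaction id or the oldest
 * multixact id recorded for the table has reached the threshold. */
bool
vacuum_must_scan_all(uint32_t next_xid, uint32_t relfrozenxid,
                     uint32_t next_multi, uint32_t relminmxid,
                     uint32_t freeze_table_age)
{
    return counter_age(next_xid, relfrozenxid) >= freeze_table_age ||
           counter_age(next_multi, relminmxid) >= freeze_table_age;
}
```

Before the fix, only the first condition existed, so a table whose
multixact age alone was high kept getting partial vacuums.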

Replace hardcoded 200000000 with autovacuum_freeze_max_age · 76a31c68

Alvaro Herrera authored Nov 28, 2013

Parts of the code used autovacuum_freeze_max_age to determine whether
anti-multixact-wraparound vacuums are necessary, while others used a
hardcoded 200000000 value.  This leads to problems when
autovacuum_freeze_max_age is set to a non-default value.  Use
autovacuum_freeze_max_age everywhere.

Backpatch to 9.3, where vacuuming of multixacts was introduced.

Andres Freund

29 Nov, 2013 5 commits

Fix assorted issues in pg_ctl's pgwin32_CommandLine(). · 79193c75

Tom Lane authored Nov 29, 2013

Ensure that the invocation command for postgres or pg_ctl runservice
double-quotes the executable's pathname; failure to do this leads to
trouble when the path contains spaces.

Also, ensure that the path ends in ".exe" in both cases and uses
backslashes rather than slashes as directory separators.  The latter issue
is reported to confuse some third-party tools such as Symantec Backup Exec.

Also, rewrite the function to avoid buffer overrun issues by using a
PQExpBuffer instead of a fixed-size static buffer.  Combinations of
very long executable pathnames and very long data directory pathnames
could have caused trouble before, for example.

Back-patch to all active branches, since this code has been like this
for a long while.

Naoya Anzai and Tom Lane, reviewed by Rajeev Rastogi
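
The path handling described above can be sketched as follows.  This is not
pgwin32_CommandLine(): the function names are hypothetical, the result is
sized with malloc() rather than built in a PQExpBuffer, and embedded quote
characters in the path are not handled.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Case-insensitive check for a trailing ".exe". */
static int
has_exe_suffix(const char *path)
{
    const char *suffix = ".exe";
    size_t      len = strlen(path);

    if (len < 4)
        return 0;
    for (size_t i = 0; i < 4; i++)
    {
        if (tolower((unsigned char) path[len - 4 + i]) != suffix[i])
            return 0;
    }
    return 1;
}

/* Return a newly allocated, double-quoted copy of path that uses
 * backslashes and ends in ".exe".  Caller frees; NULL on out of memory. */
char *
quote_exe_path(const char *path)
{
    size_t len;
    int    add_suffix;
    char  *out;
    char  *p;

    assert(path != NULL);
    len = strlen(path);
    add_suffix = !has_exe_suffix(path);

    /* two quotes + optional ".exe" + terminating NUL */
    out = malloc(len + 2 + (add_suffix ? 4 : 0) + 1);
    if (out == NULL)
        return NULL;

    p = out;
    *p++ = '"';
    for (size_t i = 0; i < len; i++)
        *p++ = (path[i] == '/') ? '\\' : path[i];
    if (add_suffix)
    {
        memcpy(p, ".exe", 4);
        p += 4;
    }
    *p++ = '"';
    *p = '\0';
    return out;
}
```

Sizing the buffer from the input length, instead of using a fixed-size
static buffer, is what removes the overrun risk for very long paths.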

Be sure to release proc->backendLock after SetupLockInTable() failure. · 8b151558

Tom Lane authored Nov 29, 2013

The various places that transferred fast-path locks to the main lock table
neglected to release the PGPROC's backendLock if SetupLockInTable failed
due to being out of shared memory. In most cases this is no big deal since
ensuing error cleanup would release all held LWLocks anyway. But there are
some hot-standby functions that don't consider failure of
FastPathTransferRelationLocks to be a hard error, and in those cases this
oversight could lead to system lockup. For consistency, make all of these
places look the same as FastPathTransferRelationLocks.

Noted while looking for the cause of Dan Wood's bugs --- this wasn't it,
but it's a bug anyway.

Fix assorted race conditions in the new timeout infrastructure. · 16e1b7a1

Tom Lane authored Nov 29, 2013

Prevent handle_sig_alarm from losing control partway through due to a query
cancel (either an asynchronous SIGINT, or a cancel triggered by one of the
timeout handler functions). That would at least result in failure to
schedule any required future interrupt, and might result in actual
corruption of timeout.c's data structures, if the interrupt happened while
we were updating those.

We could still lose control if an asynchronous SIGINT arrives just as the
function is entered. This wouldn't break any data structures, but it would
have the same effect as if the SIGALRM interrupt had been silently lost:
we'd not fire any currently-due handlers, nor schedule any new interrupt.
To forestall that scenario, forcibly reschedule any pending timer interrupt
during AbortTransaction and AbortSubTransaction. We can avoid any extra
kernel call in most cases by not doing that until we've allowed
LockErrorCleanup to kill the DEADLOCK_TIMEOUT and LOCK_TIMEOUT events.

Another hazard is that some platforms (at least Linux and *BSD) block a
signal before calling its handler and then unblock it on return. When we
longjmp out of the handler, the unblock doesn't happen, and the signal is
left blocked indefinitely. Again, we can fix that by forcibly unblocking
signals during AbortTransaction and AbortSubTransaction.

These latter two problems do not manifest when the longjmp reaches
postgres.c, because the error recovery code there kills all pending timeout
events anyway, and it uses sigsetjmp(..., 1) so that the appropriate signal
mask is restored. So errors thrown outside any transaction should be OK
already, and cleaning up in AbortTransaction and AbortSubTransaction should
be enough to fix these issues. (We're assuming that any code that catches
a query cancel error and doesn't re-throw it will do at least a
subtransaction abort to clean up; but that was pretty much required already
by other subsystems.)

Lastly, ProcSleep should not clear the LOCK_TIMEOUT indicator flag when
disabling that event: if a lock timeout interrupt happened after the lock
was granted, the ensuing query cancel is still going to happen at the next
CHECK_FOR_INTERRUPTS, and we want to report it as a lock timeout not a user
cancel.

Per reports from Dan Wood.

Back-patch to 9.3 where the new timeout handling infrastructure was
introduced. We may at some point decide to back-patch the signal
unblocking changes further, but I'll desist from that until we hear
actual field complaints about it.
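
The signal-mask hazard can be reproduced in isolation.  The following is a
standalone POSIX sketch, not code from the patch: the function and handler
names are hypothetical and SIGUSR1 is used in place of SIGALRM.  It jumps
out of a signal handler, reports whether the signal was left blocked, and
then forcibly unblocks it, much as the abort paths now do.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <string.h>

static sigjmp_buf jump_env;

/* Leave the handler via a non-local jump instead of returning. */
static void
jump_out_handler(int signo)
{
    (void) signo;
    siglongjmp(jump_env, 1);
}

/* Returns 1 if SIGUSR1 was left blocked after jumping out of its handler,
 * 0 if not, -1 on setup failure.  savemask is passed to sigsetjmp(). */
int
blocked_after_jump(int savemask)
{
    struct sigaction sa;
    struct sigaction old_sa;
    sigset_t         current;
    sigset_t         to_unblock;
    int              blocked;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = jump_out_handler;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, &old_sa) != 0)
        return -1;

    /* With savemask != 0, the jump restores the mask saved here. */
    if (sigsetjmp(jump_env, savemask) == 0)
        raise(SIGUSR1);         /* handler runs and jumps back */

    sigprocmask(SIG_BLOCK, NULL, &current);     /* query only */
    blocked = sigismember(&current, SIGUSR1);

    /* Forcibly unblock, so the signal cannot stay blocked indefinitely. */
    sigemptyset(&to_unblock);
    sigaddset(&to_unblock, SIGUSR1);
    sigprocmask(SIG_UNBLOCK, &to_unblock, NULL);

    sigaction(SIGUSR1, &old_sa, NULL);
    return blocked;
}
```

Calling blocked_after_jump(1) corresponds to the sigsetjmp(..., 1) case
mentioned above and should report the signal as unblocked.  The result for
blocked_after_jump(0) depends on the platform.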

doc: Enhance documentation of ssl_ciphers setting a bit · 50107ee7

Peter Eisentraut authored Nov 29, 2013

doc: Allow selecting web site CSS style sheet in XSLT HTML build · 384eb1d4

Peter Eisentraut authored Nov 28, 2013