Commits · c170655cc81fd5e3c152e951c52247171bb57611 · Abuhujair Javed / Postgres FD Implementation

09 Jun, 2014 2 commits

Fix infinite loop when splitting inner tuples in SPGiST text indexes. · c170655c

Tom Lane authored Jun 09, 2014

Previously, the code used a node label of zero both for strings that
contain no bytes beyond the inner tuple's prefix, and for cases where an
"allTheSame" inner tuple has to be split to allow a string with a different
next byte to be inserted into it. Failing to distinguish these cases meant
that if a string ending with the current prefix needed to be inserted into
an allTheSame tuple, we got into an infinite loop, because after splitting
the tuple we'd descend into the child allTheSame tuple and then find we
need to split again.

To fix, instead use -1 and -2 as the node labels for these two cases.
This requires widening the node label type from "char" to int2, but
fortunately SPGiST stores all pass-by-value node label types in their
Datum representation, which means that this change is transparently upward
compatible so far as the on-disk representation goes. We continue to
recognize zero as a dummy node label for reading purposes, but will not
attempt to push new index entries down into such a label, so that the loop
won't occur even when dealing with an existing index.

Per report from Teodor Sigaev. Back-patch to 9.2 where the faulty
code was introduced.

c170655c

Wrap multixact/members correctly during extension, take 2 · b0b263ba

Alvaro Herrera authored Jun 09, 2014

In a50d9762 I already changed this, but got it wrong for the case
where the number of members is larger than the number of entries that
fit in the last page of the last segment.

As reported by Serge Negodyuck in a followup to bug #8673.

b0b263ba

05 Jun, 2014 7 commits

Fix off-by-one in decoding causing one-record events to be skipped. · fe7337f2

Andres Freund authored Jun 05, 2014

A ReorderBufferTransaction's end_lsn, the sentPtr advocated by
walsender keepalive messages, and the end location remembered by the
decoding get_*changes* SQL functions all use the location of the last
read record + 1. I.e. the LSN points to the beginning of the next
record. That cannot realistically be changed without changing the
replication protocol because that's how keepalive messages have worked
since 9.0.
The bug is that the logic inside the snapshot builder, which decides
whether a transaction's contents should be decoded, assumed the start
location would point towards the last byte of the last record. The
reason this didn't actually cause visible problems is that currently
that decision is only made for commit records. Since interesting
transactions always have at least one additional record - containing
actual data - we'd never skip a transaction.
But if there ever were transactions, or other events, with just one
record containing important information, we'd skip them after stopping
and restarting logical decoding.

fe7337f2

Add defenses against running with a wrong selection of LOBLKSIZE. · 5f93c378

Tom Lane authored Jun 05, 2014

It's critical that the backend's idea of LOBLKSIZE match the way data has
actually been divided up in pg_largeobject. While we don't provide any
direct way to adjust that value, doing so is a one-line source code change
and various people have expressed interest recently in changing it. So,
just as with TOAST_MAX_CHUNK_SIZE, it seems prudent to record the value in
pg_control and cross-check that the backend's compiled-in setting matches
the on-disk data.

Also tweak the code in inv_api.c so that fetches from pg_largeobject
explicitly verify that the length of the data field is not more than
LOBLKSIZE. Formerly we just had Asserts() for that, which is no protection
at all in production builds. In some of the call sites an overlength data
value would translate directly to a security-relevant stack clobber, so it
seems worth one extra runtime comparison to be sure.

In the back branches, we can't change the contents of pg_control; but we
can still make the extra checks in inv_api.c, which will offer some amount
of protection against running with the wrong value of LOBLKSIZE.

5f93c378

Consistently spell a replication slot's name as slot_name. · f0c10856

Andres Freund authored Jun 05, 2014

Previously there's been a mix between 'slotname' and 'slot_name'. It's
not nice to be unneccessarily inconsistent in a new feature. As a post
beta1 initdb now is required in the wake of eeca4cd3, fix the
inconsistencies.
Most the changes won't affect usage of replication slots because the
majority of changes is around function parameter names. The prominent
exception to that is that the recovery.conf parameter
'primary_slotname' is now named 'primary_slot_name'.

f0c10856

Move regression test listing of builtin leakproof functions to opr_sanity.sql. · e0cb4aa8

Andres Freund authored Jun 05, 2014

The original location in create_function_3.sql didn't invite the close
structinity warranted for adding new leakproof functions. Add comments
to the test explaining that functions should only be added after
careful consideration and understanding what a leakproof function is.

Per complaint from Tom Lane after 5eebb8d9.

e0cb4aa8

Adjust SP-GiST WAL record formats to reduce alignment padding. · 8776faa8

Heikki Linnakangas authored Jun 05, 2014

The way the code was written, the padding was copied from uninitialized
memory areas.. Because the structs are local variables in the code where
the WAL records are constructed, making them larger and zeroing the padding
bytes would not make the code very pretty, so rather than fixing this
directly by zeroing out the padding bytes, it seems more clear to not try to
align the tuples in the WAL records. The redo functions are taught to copy
the tuple header to a local variable to avoid unaligned access.

Stable-branches have the same problem, but we can't change the WAL format
there, so fix in master only. Reading a few random extra bytes at the stack
is harmless in practice, so it's not worth crafting a different
back-patchable fix.

Per reports from Kevin Grittner and Andres Freund, using clang static
analyzer and Valgrind, respectively.

8776faa8

Tweak new regression test case for better portability. · d4d48a5e

Tom Lane authored Jun 04, 2014

Buildfarm says we get different plans on 32-bit and 64-bit platforms,
probably because of MAXALIGN-related differences in memory-consumption
calculations. Add some dummy WHERE clauses so that the planner estimates
different sizes for the three generate_series() relations; that should
stabilize the choice of join order.

d4d48a5e

Add btree and hash opclasses for pg_lsn. · 4c8ab1b9

Tom Lane authored Jun 04, 2014

This is needed to allow ORDER BY, DISTINCT, etc to work as expected for
pg_lsn values.

We had previously decided to put this off for 9.5, but in view of commit
eeca4cd3 there's no reason to avoid a
catversion bump for 9.4beta2, and this does make a pretty significant
usability difference for pg_lsn.

Michael Paquier, with fixes from Andres Freund and Tom Lane

4c8ab1b9

04 Jun, 2014 5 commits

Bump PG_CONTROL_VERSION for previous 9.4 changes. · eeca4cd3

Tom Lane authored Jun 04, 2014

This should have been done in 6bc8ef0b
and/or 50e54709, but better late than
never.  If we don't change this then we risk 9.3 pg_controldata or
pg_resetxlog being inappropriately used against a 9.4 pg_control file,
or vice versa.

eeca4cd3

Fix longstanding bug in HeapTupleSatisfiesVacuum(). · 621a99a6

Andres Freund authored Jun 04, 2014

HeapTupleSatisfiesVacuum() didn't properly discern between
DELETE_IN_PROGRESS and INSERT_IN_PROGRESS for rows that have been
inserted in the current transaction and deleted in a aborted
subtransaction of the current backend. At the very least that caused
problems for CLUSTER and CREATE INDEX in transactions that had
aborting subtransactions producing rows, leading to warnings like:
WARNING:  concurrent delete in progress within table "..."
possibly in an endless, uninterruptible, loop.

Instead of treating *InProgress xmins the same as *IsCurrent ones,
treat them as being distinct like the other visibility routines. As
implemented this separatation can cause a behaviour change for rows
that have been inserted and deleted in another, still running,
transaction. HTSV will now return INSERT_IN_PROGRESS instead of
DELETE_IN_PROGRESS for those. That's both, more in line with the other
visibility routines and arguably more correct. The latter because a
INSERT_IN_PROGRESS will make callers look at/wait for xmin, instead of
xmax.
The only current caller where that's possibly worse than the old
behaviour is heap_prune_chain() which now won't mark the page as
prunable if a row has concurrently been inserted and deleted. That's
harmless enough.

As a cautionary measure also insert a interrupt check before the gotos
in IndexBuildHeapScan() that lead to the uninterruptible loop. There
are other possible causes, like a row that several sessions try to
update and all fail, for repeated loops and the cost of doing so in
the retry case is low.

As this bug goes back all the way to the introduction of
subtransactions in 573a71a5 backpatch to all supported releases.

Reported-By: Sandro Santilli

621a99a6

Add description of pg_stat directory into doc. · c8c9c1f5
Fujii Masao authored Jun 05, 2014
```
Back-patch to 9.3 where pg_stat directory was introduced.
```
c8c9c1f5

Save pg_stat_statements statistics file into $PGDATA/pg_stat directory at shutdown. · 654e8e44

Fujii Masao authored Jun 04, 2014

187492b6 changed pgstat.c so that
the stats files were saved into $PGDATA/pg_stat directory when the server
was shutdowned. But it accidentally forgot to change the location of
pg_stat_statements permanent stats file. This commit fixes pg_stat_statements
so that its stats file is also saved into $PGDATA/pg_stat at shutdown.

Since this fix changes the file layout, we don't back-patch it to 9.3
where this oversight was introduced.

654e8e44

Silence Bison deprecation warnings · 55fb759a

Peter Eisentraut authored Jun 03, 2014

Bison >=3.0 issues warnings about

    %name-prefix="base_yy"

instead of the now preferred

    %name-prefix "base_yy"

but the latter doesn't work with Bison 2.3 or less.  So for now we
silence the deprecation warnings.

55fb759a

03 Jun, 2014 6 commits

Use EncodeDateTime instead of to_char to render JSON timestamps. · ab14a73a

Andrew Dunstan authored Jun 03, 2014

Per gripe from Peter Eisentraut and Tom Lane.

The output is slightly different, but still ISO 8601 compliant: to_char
doesn't output the minutes when time zone offset is an integer number of
hours, while EncodeDateTime outputs ":00".

The code is slightly adapted from code in xml.c

ab14a73a

Do not escape a unicode sequence when escaping JSON text. · 0ad1a816

Andrew Dunstan authored Jun 03, 2014

Previously, any backslash in text being escaped for JSON was doubled so
that the result was still valid JSON. However, this led to some perverse
results in the case of Unicode sequences, These are now detected and the
initial backslash is no longer escaped. All other backslashes are
still escaped. No validity check is performed, all that is looked for is
\uXXXX where X is a hexidecimal digit.

This is a change from the 9.2 and 9.3 behaviour as noted in the Release
notes.

Per complaint from Teodor Sigaev.

0ad1a816

Output timestamps in ISO 8601 format when rendering JSON. · f30015b6

Andrew Dunstan authored Jun 03, 2014

Many JSON processors require timestamp strings in ISO 8601 format in
order to convert the strings. When converting a timestamp, with or
without timezone, to a JSON datum we therefore now use such a format
rather than the type's default text output, in functions such as
to_json().

This is a change in behaviour from 9.2 and 9.3, as noted in the release
notes.

f30015b6

Make plpython_unicode regression test work in more database encodings. · 2dfa15de

Tom Lane authored Jun 03, 2014

This test previously used a data value containing U+0080, and would
therefore fail if the database encoding didn't have an equivalent to
that; which only about half of our supported server encodings do.
We could fall back to using some plain-ASCII character, but that seems
like it's losing most of the point of the test. Instead switch to using
U+00A0 (no-break space), which translates into all our supported encodings
except the four in the EUC_xx family.

Per buildfarm testing. Back-patch to 9.1, which is as far back as this
test is expected to succeed everywhere. (9.0 has the test, but without
back-patching some 9.1 code changes we could not expect to get consistent
results across platforms anyway.)

2dfa15de

Set the process latch when processing recovery conflict interrupts. · 44445b28

Andres Freund authored Jun 03, 2014

Because RecoveryConflictInterrupt() didn't set the process latch
anything using the latter to wait for events didn't get notified about
recovery conflicts. Most latch users are never the target of recovery
conflicts, which explains the lack of reports about this until
now.
Since 9.3 two possible affected users exist though: The sql callable
pg_sleep() now uses latches to wait and background workers are
expected to use latches in their main loop. Both would currently wait
until the end of WaitLatch's timeout.

Fix by adding a SetLatch() to RecoveryConflictInterrupt(). It'd also
be possible to fix the issue by having each latch user set
set_latch_on_sigusr1. That seems failure prone and though, as most of
these callsites won't often receive recovery conflicts and thus will
likely only be tested against normal query cancels et al. It'd also be
unnecessarily verbose.

Backpatch to 9.1 where latches were introduced. Arguably 9.3 would be
sufficient, because that's where pg_sleep() was converted to waiting
on the latch and background workers got introduced; but there could be
user level code making use of the latch pre 9.3.

44445b28

Use unaligned output in another regression test query to reduce diff noise. · 5eebb8d9

Andres Freund authored Jun 03, 2014

Use the unaligned/no rowcount output mode in a regression tests that
shows all built-in leakproof functions. Currently a new leakproof
function will often change the alignment of all existing functions,
making it hard to see the actual difference and creating unnecessary
patch conflicts.

Noticed while looking over a patch introducing new leakproof functions.

5eebb8d9

02 Jun, 2014 1 commit
- doc: fix JSON function prototype variable label · 19c9da1d
  Bruce Momjian authored Jun 02, 2014
```
from_jsonb -> from_json, for consistency

Patch by rudolf (private report)
```
  19c9da1d
01 Jun, 2014 1 commit

Improve the efficiency of certain jsonb get operations. · 1a4174a4

Andrew Dunstan authored Jun 01, 2014

Instead of iterating over jsonb structures, use the inbuilt functions
findJsonbValueFromContainerLen() and getIthJsonbValueFromContainer() to
extract values directly. These functions use algorithms that are O(n log
n) and O(1) respectively, whereas iterating is O(n), so we should see
considerable speedup here.

Teodor Sigaev.

1a4174a4

31 May, 2014 1 commit

Improvements to the replication protocol documentation. · a5750982

Andres Freund authored May 31, 2014

Document the CREATE_REPLICATION_SLOT's output_plugin parameter; that
START_REPLICATION ... LOGICAL takes parameters; that START_REPLICATION
... LOGICAL uses the same messages as ... PHYSICAL; and be more
consistent with the usage of <literal/>.

Michael Paquier, with some additional changes by me.

a5750982

30 May, 2014 3 commits

On OS X, link libpython normally, ignoring the "framework" framework. · 20561acf

Tom Lane authored May 30, 2014

As of Xcode 5.0, Apple isn't including the Python framework as part of the
SDK-level files, which means that linking to it might fail depending on
whether Xcode thinks you've selected a specific SDK version.  According to
their Tech Note 2328, they've basically deprecated the framework method of
linking to libpython and are telling people to link to the shared library
normally.  (I'm pretty sure this is in direct contradiction to the advice
they were giving a few years ago, but whatever.)  Testing says that this
approach works fine at least as far back as OS X 10.4.11, so let's just
rip out the framework special case entirely.  We do still need a special
case to decide that OS X provides a shared library at all, unfortunately
(I wonder why the distutils check doesn't work ...).  But this is still
less of a special case than before, so it's fine.

Back-patch to all supported branches, since we'll doubtless be hearing
about this more as more people update to recent Xcode.

20561acf

Fix typos in MSVC solution file. · 512f3b03
Heikki Linnakangas authored May 30, 2014
```
Michael Paquier
```
512f3b03
In release notes, mention the need to initialize bgw_notify_pid. · 42be7d69
Robert Haas authored May 29, 2014
```
Michael Paquier
```
42be7d69

29 May, 2014 2 commits

When using the OSSP UUID library, cache its uuid_t state object. · c941aed9

Tom Lane authored May 29, 2014

The original coding in contrib/uuid-ossp created and destroyed a uuid_t
object (or, in some cases, even two of them) each time it was called.
This is not the intended usage: you're supposed to keep the uuid_t object
around so that the library can cache its state across uses. (Other UUID
libraries seem to keep equivalent state behind-the-scenes in static
variables, but OSSP chose differently.) Aside from being quite inefficient,
creating a new uuid_t loses knowledge of the previously generated UUID,
which in theory could result in duplicate V1-style UUIDs being created
on sufficiently fast machines.

On at least some platforms, creating a new uuid_t also draws some entropy
from /dev/urandom, leaving less for the rest of the system. This seems
sufficiently unpleasant to justify back-patching this change.

c941aed9

Fix uuid-ossp regression tests based on buildfarm feedback. · 25dd07e0

Tom Lane authored May 28, 2014

The previous version of these tests expected uuid_generate_v1() to always
emit MAC addresses with the local-admin and multicast address bits zero.
However, several of the buildfarm critters are reporting values with the
local-admin bit set. (Perhaps they're running inside VMs or jails.)
And a couple are reporting values with the multicast bit set, probably
meaning that the UUID library couldn't read the system MAC address.

Also, it emerges that if OSSP UUID can't read the system MAC address, it
falls back to V1MC behavior wherein the whole node field gets randomized
each time, breaking the test that expected the node field to remain stable
in V1 output. (It looks like e2fs doesn't behave that way, though.)

It's not entirely clear why we can't get a system MAC address, since the
buildfarm scripts would not work without internet access. Nonetheless,
the regression tests had better cope with the case, so adjust the tests
to expect these behaviors.

25dd07e0

28 May, 2014 12 commits

Revert "Fix bogus %name-prefix option syntax in all our Bison files." · 71ed8b3c

Tom Lane authored May 28, 2014

This reverts commit 45b7abe5.

It turns out that the %name-prefix syntax without "=" does not work
at all in pre-2.4 Bison.  We are not prepared to make such a large
jump in minimum required Bison version just to suppress a warning
message in a version hardly any developers are using yet.
When 3.0 gets more popular, we'll figure out a way to deal with this.
In the meantime, BISONFLAGS=-Wno-deprecated is recommendable for
anyone using 3.0 who doesn't want to see the warning.

71ed8b3c

Don't pay heed to wal_sender_timeout while creating a decoding slot. · 21d48d66

Andres Freund authored May 29, 2014

Sometimes CREATE_REPLICATION_SLOT ... LOGICAL ... needs to wait for
further WAL using WalSndWaitForWal(). That used to always respect
wal_sender_timeout and kill the session when waiting long enough
because no feedback/ping messages can be sent while the slot is still
being created.
Introduce the notion that last_reply_timestamp = 0 means that the
walsender currently doesn't need timeout processing to avoid that
problem. Use that notion for CREATE_REPLICATION_SLOT ... LOGICAL.

Bugreport and initial patch by Steve Singer, revised by me.

21d48d66

Minor refactoring of jsonb_util.c · d1d50bff

Heikki Linnakangas authored May 28, 2014

The only caller of compareJsonbScalarValue that needed locale-sensitive
comparison of strings was also the only caller that didn't just check for
equality. Separate the two cases for clarity: compareJsonbScalarValue now
does locale-sensitive comparison, and a new function,
equalsJsonbScalarValue, just checks for equality.

d1d50bff

Jsonb comparison bug fixes. · b3e5cfd5

Heikki Linnakangas authored May 28, 2014

Fix an over-zealous assertion, which didn't take into account that sometimes
a scalar element can be compared against an array/object element.

Avoid comparing possibly-uninitialized local variables when end-of-array or
end-of-object is reached. Also fix and enhance comments a bit.

Peter Geoghegan, per reports by Pavel Stehule and me.

b3e5cfd5

Fix bogus %name-prefix option syntax in all our Bison files. · 45b7abe5

Tom Lane authored May 28, 2014

%name-prefix doesn't use an "=" sign according to the Bison docs, but it
silently accepted one anyway, until Bison 3.0.  This was originally a
typo of mine in commit 012abeba, and we
seem to have slavishly copied the error into all the other grammar files.

Per report from Vik Fearing; analysis by Peter Eisentraut.

Back-patch to all active branches, since somebody might try to build
a back branch with up-to-date tools.

45b7abe5

Improve regression tests for uuid-ossp. · c0f27628

Tom Lane authored May 28, 2014

On reflection, the timestamp-advances test might fail if we're unlucky
enough for the time_mid field to change between two calls, since uuid_cmp
is just bytewise comparison and the field ordering has more significant
fields later. Build some field extraction functions so we can do a more
honest test of that. Also check that the version and reserved fields
contain what they should.

c0f27628

Fix stack clobber in new uuid-ossp code. · 2103218d

Tom Lane authored May 28, 2014

The V5 (SHA1 hashing) code wrote 20 bytes into a 16-byte local variable.
This had accidentally failed to fail in my testing and Matteo's, but
buildfarm results exposed the problem.

2103218d

Ensure cleanup in case of early errors in streaming base backups · 8232d6df

Magnus Hagander authored May 28, 2014

Move the code that sends the initial status information as well as the
calculation of paths inside the ENSURE_ERROR_CLEANUP block. If this code
failed, we would "leak" a counter of number of concurrent backups, thereby
making the system always believe it was in backup mode. This could happen
if the sending failed (which it probably never did given that the small
amount of data to send would never cause a flush) or if the psprintf calls
ran out of memory. Both are very low risk, but all operations after
do_pg_start_backup should be protected.

8232d6df

doc: improve markup of ssl_ecdh_curve commit · c6763156
Bruce Momjian authored May 28, 2014

c6763156

pg_lsn should not be marked typispreferred. · ec3357a3

Tom Lane authored May 28, 2014

In general it's not a good idea for built-in types in the 'U' category
to be marked preferred; they could draw behavior away from user-defined
types with similarly-named operators. pg_lsn is probably at low risk
of that right now given the lack of casts between it and other types,
but that doesn't make this marking OK.

Ordinarily we'd bump catversion when changing any predefined catalog
contents like this, but since we're past beta1, the costs of a forced
initdb seem to outweigh the benefits of guaranteed behavioral consistency.
There's not any known behavioral impact today anyway --- this is more
in the nature of being sure there's not problems in future.

Per an off-list complaint from Thomas Fanghaenel.

ec3357a3

Fix obsolete config-module-exclusion logic in vcregress.pl. · 86000311

Tom Lane authored May 27, 2014

The recent addition of regression tests to uuid-ossp exposed the fact
that the MSVC build system wasn't being consistent about whether it was
building/testing that contrib module, ie, it would try to test the module
even when it hadn't built it.  The same hazard was latent for sslinfo.

For the moment I just copied the more up-to-date logic from point A to
point B, but this is screaming for refactoring.

Per buildfarm results.

86000311

Propagate system identifier generation improvement into pg_resetxlog. · 4bcb3946

Tom Lane authored May 27, 2014

Commit 5035701e improved xlog.c's method
for creating a database system identifier, but I neglected to fix the
copy of that code appearing in pg_resetxlog.c.  Spotted by Andres Freund.

4bcb3946