Commits · a3215431ab7c667bf581728f10c80a36abbe1d5a · Abuhujair Javed / Postgres FD Implementation

19 Oct, 2016 9 commits

Suppress "Factory" zone in pg_timezone_names view for tzdata >= 2016g. · a3215431

Tom Lane authored Oct 19, 2016

IANA got rid of the really silly "abbreviation" and replaced it with one
that's only moderately silly.  But it's still pointless, so keep on not
showing it.

a3215431

Update time zone data files to tzdata release 2016g. · ecbac3e6

Tom Lane authored Oct 19, 2016

DST law changes in Turkey. Historical corrections for America/Los_Angeles,
Europe/Kirov, Europe/Moscow, Europe/Samara, and Europe/Ulyanovsk.
Rename Asia/Rangoon to Asia/Yangon, with a backward compatibility link.

The IANA crew continue their campaign to replace invented time zone
abbrevations with numeric GMT offsets. This update changes numerous zones
in Antarctica and the former Soviet Union, for instance Antarctica/Casey
now reports "+08" not "AWST" in the pg_timezone_names view. I kept these
abbreviations in the tznames/ data files, however, so that we will still
accept them for input. (We may want to start trimming those files someday,
but today is not that day.)

An exception is that since IANA no longer claims that "AMT" is in use
in Armenia for GMT+4, I replaced it in the Default file with GMT-4,
corresponding to Amazon Time which is in use in South America. It may be
that that meaning is also invented and IANA will drop it in a future
update; but for now, it seems silly to give pride of place to a meaning
not traceable to IANA over one that is.

ecbac3e6

Make getrusage() output a little more readable · 9ffe4a8b
Peter Eisentraut authored Oct 19, 2016
```
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Peter Geoghegan <pg@heroku.com>
```
9ffe4a8b

Use pg_ctl promote -w in TAP tests · e5a9bcb5

Peter Eisentraut authored Oct 19, 2016

Switch TAP tests to use the new wait mode of pg_ctl promote.  This
allows avoiding extra logic with poll_query_until() to be sure that a
promoted standby is ready for read-write queries.

From: Michael Paquier <michael.paquier@gmail.com>

e5a9bcb5

initdb pg_basebackup: Rename --noxxx options to --no-xxx · 5d58c07a

Peter Eisentraut authored Oct 19, 2016

--noclean and --nosync were the only options spelled without a hyphen,
so change this for consistency with other options.  The options in
pg_basebackup have not been in a release, so we just rename them.  For
initdb, we retain the old variants.

Vik Fearing and me

5d58c07a

pg_ctl: Add long option for -o · caf936b0
Peter Eisentraut authored Oct 19, 2016
```
Now all normally used options are covered by long options as well.
```
caf936b0
doc: Consistently use = sign in long options synopses · c709c607
Peter Eisentraut authored Oct 19, 2016
```
This was already the predominant form in man pages and help output.
```
c709c607
pg_ctl: Add long options for -w and -W · 0be22457
Peter Eisentraut authored Oct 19, 2016
```
From: Vik Fearing <vik@2ndquadrant.fr>
```
0be22457

Fix WAL-logging of FSM and VM truncation. · 917dc7d2

Heikki Linnakangas authored Oct 19, 2016

When a relation is truncated, it is important that the FSM is truncated as
well. Otherwise, after recovery, the FSM can return a page that has been
truncated away, leading to errors like:

ERROR:  could not read block 28991 in file "base/16390/572026": read only 0
of 8192 bytes

We were using MarkBufferDirtyHint() to dirty the buffer holding the last
remaining page of the FSM, but during recovery, that might in fact not
dirty the page, and the FSM update might be lost.

To fix, use the stronger MarkBufferDirty() function. MarkBufferDirty()
requires us to do WAL-logging ourselves, to protect from a torn page, if
checksumming is enabled.

Also fix an oversight in visibilitymap_truncate: it also needs to WAL-log
when checksumming is enabled.

Analysis by Pavan Deolasee.

Discussion: <CABOikdNr5vKucqyZH9s1Mh0XebLs_jRhKv6eJfNnD2wxTn=_9A@mail.gmail.com>

917dc7d2

18 Oct, 2016 5 commits

Improve regression test coverage for hash indexes. · b801e120

Robert Haas authored Oct 18, 2016

On my system, this improves coverage for src/backend/access/hash from
61.3% of lines to 88.2% of lines, and from 83.5% of functions to 97.5%
of functions, which is pretty good for 36 lines of tests.

Mithun Cy, reviewing by Amit Kapila and Álvaro Herrera

b801e120

Fix a few typos in simplehash.h. · 90d3da11
Andres Freund authored Oct 18, 2016
```
Author: Erik Rijkers
Discussion: <274e4c8ac545d6622735f97c1f6c354b@xs4all.nl>
```
90d3da11
Fix typo in comment. · fca41acb
Robert Haas authored Oct 18, 2016
```
Amit Langote
```
fca41acb

Fix cidin() to handle values above 2^31 platform-independently. · 6f13a682

Tom Lane authored Oct 18, 2016

CommandId is declared as uint32, and values up to 4G are indeed legal.
cidout() handles them properly by treating the value as unsigned int.
But cidin() was just using atoi(), which has platform-dependent behavior
for values outside the range of signed int, as reported by Bart Lengkeek
in bug #14379.  Use strtoul() instead, as xidin() does.

In passing, make some purely cosmetic changes to make xidin/xidout
look more like cidin/cidout; the former didn't have a monopoly on
best practice IMO.

Neither xidin nor cidin make any attempt to throw error for invalid input.
I didn't change that here, and am not sure it's worth worrying about
since neither is really a user-facing type.  The point is just to ensure
that indubitably-valid inputs work as expected.

It's been like this for a long time, so back-patch to all supported
branches.

Report: <20161018152550.1413.6439@wrigleys.postgresql.org>

6f13a682

Revert "Replace PostmasterRandom() with a stronger way of generating randomness." · faae1c91

Heikki Linnakangas authored Oct 18, 2016

This reverts commit 9e083fd4. That was a
few bricks shy of a load:

* Query cancel stopped working
* Buildfarm member pademelon stopped working, because the box doesn't have
  /dev/urandom nor /dev/random.

This clearly needs some more discussion, and a quite different patch, so
revert for now.

faae1c91

17 Oct, 2016 4 commits

By default, set log_line_prefix = '%m [%p] '. · 7d3235ba

Robert Haas authored Oct 17, 2016

This value might not be to everyone's taste; in particular, some
people might prefer %t to %m, and others may want %u, %d, or other
fields.  However, it's a vast improvement on the old default of ''.

Christoph Berg

7d3235ba

Use OpenSSL EVP API for symmetric encryption in pgcrypto. · 5ff4a67f

Heikki Linnakangas authored Oct 17, 2016

The old "low-level" API is deprecated, and doesn't support hardware
acceleration. And this makes the code simpler, too.

Discussion: <561274F1.1030000@iki.fi>

5ff4a67f

Fix use-after-free around DISTINCT transition function calls. · d8589946

Heikki Linnakangas authored Oct 17, 2016

Have tuplesort_gettupleslot() copy the contents of its current table slot
as needed. This is based on an approach taken by tuplestore_gettupleslot().
In the future, tuplesort_gettupleslot() may also be taught to avoid copying
the tuple where caller can determine that that is safe (the
tuplestore_gettupleslot() interface already offers this option to callers).

Patch by Peter Geoghegan. Fixes bug #14344, reported by Regina Obe.

Report: <20160929035538.20224.39628@wrigleys.postgresql.org>

Backpatch-through: 9.6

d8589946

Replace PostmasterRandom() with a stronger way of generating randomness. · 9e083fd4

Heikki Linnakangas authored Oct 17, 2016

This adds a new routine, pg_strong_random() for generating random bytes,
for use in both frontend and backend. At the moment, it's only used in
the backend, but the upcoming SCRAM authentication patches need strong
random numbers in libpq as well.

pg_strong_random() is based on, and replaces, the existing implementation
in pgcrypto. It can acquire strong random numbers from a number of sources,
depending on what's available:
- OpenSSL RAND_bytes(), if built with OpenSSL
- On Windows, the native cryptographic functions are used
- /dev/urandom
- /dev/random

Original patch by Magnus Hagander, with further work by Michael Paquier
and me.

Discussion: <CAB7nPqRy3krN8quR9XujMVVHYtXJ0_60nqgVc6oUk8ygyVkZsA@mail.gmail.com>

9e083fd4

15 Oct, 2016 1 commit

Use more efficient hashtable for execGrouping.c to speed up hash aggregation. · 5dfc1981

Andres Freund authored Oct 14, 2016

The more efficient hashtable speeds up hash-aggregations with more than
a few hundred groups significantly. Improvements of over 120% have been
measured.

Due to the the different hash table queries that not fully
determined (e.g. GROUP BY without ORDER BY) may change their result
order.

The conversion is largely straight-forward, except that, due to the
static element types of simplehash.h type hashes, the additional data
some users store in elements (e.g. the per-group working data for hash
aggregaters) is now stored in TupleHashEntryData->additional.  The
meaning of BuildTupleHashTable's entrysize (renamed to additionalsize)
has been changed to only be about the additionally stored size.  That
size is only used for the initial sizing of the hash-table.

Reviewed-By: Tomas Vondra
Discussion: <20160727004333.r3e2k2y6fvk2ntup@alap3.anarazel.de>

5dfc1981

14 Oct, 2016 5 commits

Use more efficient hashtable for tidbitmap.c to speed up bitmap scans. · 75ae538b

Andres Freund authored Oct 14, 2016

Use the new simplehash.h to speed up tidbitmap.c uses. For bitmap scan
heavy queries speedups of over 100% have been measured. Both lossy and
exact scans benefit, but the wins are bigger for mostly exact scans.

The conversion is mostly trivial, except that tbm_lossify() now restarts
lossifying at the point it previously stopped. Otherwise the hash table
becomes unbalanced because the scan in done in hash-order, leaving the
end of the hashtable more densely filled then the beginning. That caused
performance issues with dynahash as well, but due to the open chaining
they were less pronounced than with the linear adressing from
simplehash.h.

Reviewed-By: Tomas Vondra
Discussion: <20160727004333.r3e2k2y6fvk2ntup@alap3.anarazel.de>

75ae538b

Add a macro templatized hashtable. · b30d3ea8

Andres Freund authored Oct 14, 2016

dynahash.c hash tables aren't quite fast enough for some
use-cases. There are several reasons for lacking performance:
- the use of chaining for collision handling makes them cache
  inefficient, that's especially an issue when the tables get bigger.
- as the element sizes for dynahash are only determined at runtime,
  offset computations are somewhat expensive
- hash and element comparisons are indirect function calls, causing
  unnecessary pipeline stalls
- it's two level structure has some benefits (somewhat natural
  partitioning), but increases the number of indirections
to fix several of these the hash tables have to be adjusted to the
individual use-case at compile-time. C unfortunately doesn't provide a
good way to do compile code generation (like e.g. c++'s templates for
all their weaknesses do).  Thus the somewhat ugly approach taken here is
to allow for code generation using a macro-templatized header file,
which generates functions and types based on a prefix and other
parameters.

Later patches use this infrastructure to use such hash tables for
tidbitmap.c (bitmap scans) and execGrouping.c (hash aggregation,
...). In queries where these use up a large fraction of the time, this
has been measured to lead to performance improvements of over 100%.

There are other cases where this could be useful (e.g. catcache.c).

The hash table design chosen is a variant of linear open-addressing. The
biggest disadvantage of simple linear addressing schemes are highly
variable lookup times due to clustering, and deletions leaving a lot of
tombstones around.  To address these issues a variant of "robin hood"
hashing is employed.  Robin hood hashing optimizes chaining lengths by
moving elements close to their optimal bucket ("rich" elements), out of
the way if a to-be-inserted element is further away from its optimal
position (i.e. it's "poor").  While that can make insertions slower, the
average lookup performance is a lot better, and higher fill factors can
be used in a still performant manner.  To avoid tombstones - which
normally solve the issue that a deleted node's presence is relevant to
determine whether a lookup needs to continue looking or is done -
buckets following a deleted element are shifted backwards, unless
they're empty or already at their optimal position.

There's further possible improvements that can be made to this
implementation. Amongst others:
- Use distance as a termination criteria during searches. This is
  generally a good idea, but I've been able to see the overhead of
  distance calculations in some cases.
- Consider combining the 'empty' status into the hashvalue, and enforce
  storing the hashvalue. That could, in some cases, increase memory
  density and remove a few instructions.
- Experiment further with the, very conservatively choosen, fillfactor.
- Make maximum size of hashtable configurable, to allow storing very
  very large tables. That'd require 64bit hash values to be more common
  than now, though.
- some smaller memcpy calls could be optimized to copy larger chunks
But since the new implementation is already considerably faster than
dynahash it seem sensible to start using it.

Reviewed-By: Tomas Vondra
Discussion: <20160727004333.r3e2k2y6fvk2ntup@alap3.anarazel.de>

b30d3ea8

Add likely/unlikely() branch hint macros. · aa3ca5e3

Andres Freund authored Oct 14, 2016

These are useful for very hot code paths. Because it's easy to guess
wrongly about likelihood, and because such likelihoods change over time,
they should be used sparingly.

Past tests have shown it'd be a good idea to use them in some places,
e.g. in error checks around ereports that ERROR out, but that's work for
later.

Discussion: <20160727004333.r3e2k2y6fvk2ntup@alap3.anarazel.de>

aa3ca5e3

Fix assorted integer-overflow hazards in varbit.c. · 32fdf42c

Tom Lane authored Oct 14, 2016

bitshiftright() and bitshiftleft() would recursively call each other
infinitely if the user passed INT_MIN for the shift amount, due to integer
overflow in negating the shift amount.  To fix, clamp to -VARBITMAXLEN.
That doesn't change the results since any shift distance larger than the
input bit string's length produces an all-zeroes result.

Also fix some places that seemed inadequately paranoid about input typmods
exceeding VARBITMAXLEN.  While a typmod accepted by anybit_typmodin() will
certainly be much less than that, at least some of these spots are
reachable with user-chosen integer values.

Andreas Seltenreich and Tom Lane

Discussion: <87d1j2zqtz.fsf@credativ.de>

32fdf42c

Fix typo. · 13d3180f
Tatsuo Ishii authored Oct 14, 2016
```
Confirmed by Michael Paquier.
```
13d3180f

13 Oct, 2016 11 commits

Fix handling of pgstat counters for TRUNCATE in a prepared transaction. · 81e82a2b

Tom Lane authored Oct 13, 2016

pgstat_twophase_postcommit is supposed to duplicate the math in
AtEOXact_PgStat, but it had missed out the bit about clearing
t_delta_live_tuples/t_delta_dead_tuples for a TRUNCATE.

It's harder than you might think to replicate the issue here, because
those counters would only be nonzero when a previous transaction in
the same backend had added/deleted tuples in the truncated table,
and those counts hadn't been sent to the stats collector yet.

Evident oversight in commit d42358ef.  I've not added a regression
test for this; we tried to add one in d42358ef, and had to revert it
because it was too timing-sensitive for the buildfarm.

Back-patch to 9.5 where d42358ef came in.

Stas Kelvich

Discussion: <EB57BF68-C06D-4737-BDDC-4BA778F4E62B@postgrespro.ru>

81e82a2b

Fix typo. · b1ee762a
Tatsuo Ishii authored Oct 14, 2016
```
Confirmed by Tom Lane.
```
b1ee762a

Fix another bug in merging of inherited CHECK constraints. · 3cca13cb

Tom Lane authored Oct 13, 2016

It's not good for an inherited child constraint to be marked connoinherit;
that would result in the constraint not propagating to grandchild tables,
if any are created later. The code mostly prevented this from happening
but there was one case that was missed.

This is somewhat related to commit e55a946a, which also tightened checks
on constraint merging. Hence, back-patch to 9.2 like that one. This isn't
so much because there's a concrete feature-related reason to stop there,
as to avoid having more distinct behaviors than we have to in this area.

Amit Langote

Discussion: <b28ee774-7009-313d-dd55-5bdd81242c41@lab.ntt.co.jp>

3cca13cb

Remove dead code in pg_dump. · c08521eb

Tom Lane authored Oct 13, 2016

I'm not sure if this provision for "pg_backup" behaving a bit differently
from "pg_dump" ever did anything useful in a released version.  But it's
definitely dead code now.

Michael Paquier

c08521eb

Try to find out the actual hugepage size when making a MAP_HUGETLB request. · cb775768

Tom Lane authored Oct 13, 2016

Even if Linux's mmap() is okay with a partial-hugepage request, munmap()
is not, as reported by Chris Richards.  Therefore it behooves us to try
a bit harder to find out the actual hugepage size, instead of assuming
that we can skate by with a guess.

For the moment, just look into /proc/meminfo to find out the default
hugepage size, and use that.  Later, on kernels that support requests
for nondefault sizes, we might try to consider other alternatives.
But that smells more like a new feature than a bug fix, especially if
we want to provide any way for the DBA to control it, so leave it for
another day.

I set this up to allow easy addition of platform-specific code for
non-Linux platforms, if needed; but right now there are no reports
suggesting that we need to work harder on other platforms.

Back-patch to 9.4 where hugepage support was introduced.

Discussion: <31056.1476303954@sss.pgh.pa.us>

cb775768

Clean up handling of anonymous mmap'd shared-memory segment. · 15fc5e15

Tom Lane authored Oct 13, 2016

Fix detaching of the mmap'd segment to have its own on_shmem_exit callback,
rather than piggybacking on the one for detaching from the SysV segment.
That was confusing, and given the distance between the two attach calls,
it was trouble waiting to happen.

Make the detaching calls idempotent by clearing AnonymousShmem to show
we've already unmapped. I spent quite a bit of time yesterday trying
to find a path that would allow the munmap()'s to be done twice, and
while I did not succeed, it seems silly that there's even a question.

Make the #ifdef logic less confusing by separating "do we want to use
anonymous shmem" from EXEC_BACKEND. Even though there's no current
scenario where those conditions are different, it is not helpful for
different places in the same file to be testing EXEC_BACKEND for what
are fundamentally different reasons.

Don't do on_exit_reset() in StartBackgroundWorker(). At best that's
useless (InitPostmasterChild would have done it already) and at worst
it could zap some callback that's unrelated to shared memory.

Improve comments, and simplify the huge_pages enablement logic slightly.

Back-patch to 9.4 where hugepage support was introduced.
Arguably this should go into 9.3 as well, but the code looks
significantly different there, and I doubt it's worth the
trouble of adapting the patch given I can't show a live bug.

15fc5e15

Fix pg_dumpall regression test to be locale-independent. · 0a4bf6b1

Tom Lane authored Oct 13, 2016

The expected results in commit b4fc6457 seem to have been generated
in a non-C locale, which just points up the fact that the ORDER BY
clause was locale-sensitive.

Per buildfarm.

0a4bf6b1

Fix broken jsonb_set() logic for replacing array elements. · 9c4cc9e2

Tom Lane authored Oct 13, 2016

Commit 0b62fd03 did a fairly sloppy job of refactoring setPath()
to support jsonb_insert() along with jsonb_set(). In its defense,
though, there was no regression test case exercising the case of
replacing an existing element in a jsonb array.

Per bug #14366 from Peng Sun. Back-patch to 9.6 where bug was introduced.

Report: <20161012065349.1412.47858@wrigleys.postgresql.org>

9c4cc9e2

Fix further hash table order dependent tests. · ccbb852c

Andres Freund authored Oct 12, 2016

Similar to 0137caf2, this makes contrib and pl tests less dependant on
hash-table order.  After this commit, at least some order affecting
changes to execGrouping.c don't result in regression test changes
anymore.

ccbb852c

Make pg_dumpall's database ACL query independent of hash table order. · b4fc6457

Andres Freund authored Oct 12, 2016

Previously GRANT order on databases was not well defined, due to the use
of EXCEPT without an ORDER BY. Add an ORDER BY, adapt test output.

I don't, at the moment, see reason to backpatch this.

b4fc6457

Remove spurious word. · 248776ea
Robert Haas authored Oct 12, 2016
```
Tatsuo Ishii
```
248776ea

12 Oct, 2016 5 commits

Revert addition of PGDLLEXPORT in PG_FUNCTION_INFO_V1 macro. · 4f52fd3c

Tom Lane authored Oct 12, 2016

This turns out not to be as harmless as I thought: MSVC will complain
if it sees an "extern" declaration without PGDLLEXPORT and then one with.
(Seems fairly silly, given that this can be changed after the fact by the
linker, but there you have it.)  Therefore, contrib modules that have
extern's for V1 functions in header files are falling over in the
buildfarm, since none of those externs are marked PGDLLEXPORT.

We might or might not conclude that we're willing to plaster those
declarations with PGDLLEXPORT in HEAD, but in any case there's no way we're
going to ship this change in the back branches.  Third-party authors would
not thank us for breaking their code in a minor release.  Hence, revert
the addition of PGDLLEXPORT (but let's keep the extra info in the comment).
If we do the other changes we can revert this commit in HEAD.

Per buildfarm.

4f52fd3c

pg_dump's getTypes() needn't retrieve typinput or typoutput anymore. · c0a3b211

Tom Lane authored Oct 12, 2016

Commit 64f3524e removed the stanza of code that examined these values.
I failed to notice they were unnecessary because my compiler didn't
warn about the un-read variables. Noted by Peter Eisentraut.

c0a3b211

Remove unnecessary int2vector-specific hash function and equality operator. · 5c80642a

Tom Lane authored Oct 12, 2016

These functions were originally added in commit d8cedf67 to support
use of int2vector columns as catcache lookup keys. However, there are
no catcaches that use such columns. (Indeed I now think it must always
have been dead code: a catcache with such a key column would need an
underlying unique index on the column, but we've never had an int2vector
btree opclass.)

Getting rid of the int2vector-specific operator and function does not
lose any functionality, because operations on int2vectors will now fall
back to the generic anyarray support. This avoids a wart that a btree
index on an int2vector column (made using anyarray_ops) would fail to
match equality searches, because int2vectoreq wasn't a member of the
opclass. We don't really care much about that, since int2vector is not
meant as a type for users to use, but it's silly to have extra code and
less functionality.

If we ever do want a catcache to be indexed by an int2vector column,
we'd need to put back full btree and hash opclasses for int2vector,
comparable to the support for oidvector. (The anyarray code can't be
used at such a low level, because it needs to do catcache lookups.)
But we'll deal with that if/when the need arises.

Also worth noting is that removal of the hash int2vector_ops opclass will
break any user-created hash indexes on int2vector columns. While hash
anyarray_ops would serve the same purpose, it would probably not compute
the same hash values and thus wouldn't be on-disk-compatible. Given that
int2vector isn't a user-facing type and we're planning other incompatible
changes in hash indexes for v10 anyway, this doesn't seem like something
to worry about, but it's probably worth mentioning here.

Amit Langote

Discussion: <d9bb74f8-b194-7307-9ebd-90645d377e45@lab.ntt.co.jp>

5c80642a

Provide DLLEXPORT markers for C functions via PG_FUNCTION_INFO_V1 macro. · 8518583c

Tom Lane authored Oct 12, 2016

This isn't really necessary for our own code, because we use a .DEF file
in MSVC builds (see gendef.pl), or --export-all-symbols in MinGW and
Cygwin builds, to ensure that all global symbols in loadable modules
will be exported on Windows.  However, third-party authors might use
different build processes that need this marker, and it's harmless
enough for our own builds.

To some extent, this is an oversight in commit e7128e8d, so back-patch
to 9.4 where that was added.

Laurenz Albe

Discussion: <A737B7A37273E048B164557ADEF4A58B539300BD@ntex2010a.host.magwien.gv.at>

8518583c

Remove pg_dump/pg_dumpall support for dumping from pre-8.0 servers. · 64f3524e

Tom Lane authored Oct 12, 2016

The need for dumping from such ancient servers has decreased to about nil
in the field, so let's remove all the code that catered to it. Aside
from removing a lot of boilerplate variant queries, this allows us to not
have to cope with servers that don't have (a) schemas or (b) pg_depend.
That means we can get rid of assorted squishy code around that. There
may be some nonobvious additional simplifications possible, but this patch
already removes about 1500 lines of code.

I did not remove the ability for pg_restore to read custom-format archives
generated by these old versions (and light testing says that that does
still work). If you have an old server, you probably also have a pg_dump
that will work with it; but you have an old custom-format backup file,
that might be all you have.

It'd be possible at this point to remove fmtQualifiedId()'s version
argument, but I refrained since that would affect code outside pg_dump.

Discussion: <2661.1475849167@sss.pgh.pa.us>

64f3524e