Commits · e6c039d13e16a3a2dec5ba479d9d1fb3229c03a3 · Abuhujair Javed / Postgres FD Implementation

28 Mar, 2018 17 commits

Add documentation for the JIT feature. · e6c039d1

Andres Freund authored Mar 28, 2018

As promised in earlier commits, this adds documentation about the new
build options, the new GUCs, about the planner logic when JIT is used,
and the benefits of JIT in general.

Also adds a more implementation oriented README.

I'm sure we're going to want to expand this further, but I think this
is a reasonable start.

Author: Andres Freund, with contributions by Thomas Munro
Reviewed-By: Thomas Munro
Discussion: https://postgr.es/m/20170901064131.tazjxwus3k2w3ybh@alap3.anarazel.de

e6c039d1

Add EXPLAIN support for JIT. · 1f0c6a9e

Andres Freund authored Mar 28, 2018

This just shows a few details about JITing, e.g. how many functions
have been JITed, and how long that took.  To avoid noise in regression
tests with functions sometimes being JITed in --with-llvm builds,
disable display when COSTS OFF is specified.

Author: Andres Freund
Discussion: https://postgr.es/m/20170901064131.tazjxwus3k2w3ybh@alap3.anarazel.de

1f0c6a9e

Add inlining support to LLVM JIT provider. · 9370462e

Andres Freund authored Mar 28, 2018

This provides infrastructure to allow JITed code to inline code
implemented in C. This e.g. can be postgres internal functions or
extension code.

This already speeds up long running queries, by allowing the LLVM
optimizer to optimize across function boundaries. The optimization
potential currently doesn't reach its full potential because LLVM
cannot optimize the FunctionCallInfoData argument fully away, because
it's allocated on the heap rather than the stack. Fixing that is
beyond what's realistic for v11.

To be able to do that, use CLANG to convert C code to LLVM bitcode,
and have LLVM build a summary for it. That bitcode can then be used to
to inline functions at runtime. For that the bitcode needs to be
installed. Postgres bitcode goes into $pkglibdir/bitcode/postgres,
extensions go into equivalent directories.  PGXS has been modified so
that happens automatically if postgres has been compiled with LLVM
support.

Currently this isn't the fastest inline implementation, modules are
reloaded from disk during inlining. That's to work around an apparent
LLVM bug, triggering an apparently spurious error in LLVM assertion
enabled builds.  Once that is resolved we can remove the superfluous
read from disk.

Docs will follow in a later commit containing docs for the whole JIT
feature.

Author: Andres Freund
Discussion: https://postgr.es/m/20170901064131.tazjxwus3k2w3ybh@alap3.anarazel.de

9370462e

Use isinf builtin for clang, for performance. · 8a934d67

Andres Freund authored Mar 28, 2018

When compiling with clang glibc's definition of isinf() ends up
leading to and external libc function call. That's because there was a
bug in the builtin in an old gcc version, and clang claims
compatibility with an older version.  That causes clang to be
measurably slower for floating point heavy workloads than gcc.

To fix simply redirect isinf when using clang and clang confirms it
has __builtin_isinf().

8a934d67

Make pg_rewind skip files and directories that are removed during server start. · 266b6acb

Fujii Masao authored Mar 29, 2018

The target cluster that was rewound needs to perform recovery from
the checkpoint created at failover, which leads it to remove or recreate
some files and directories that may have been copied from the source
cluster. So pg_rewind can skip synchronizing such files and directories,
and which reduces the amount of data transferred during a rewind
without changing the usefulness of the operation.

Author: Michael Paquier
Reviewed-by: Anastasia Lubennikova, Stephen Frost and me

Discussion: https://postgr.es/m/20180205071022.GA17337@paquier.xyz

266b6acb

Fix handling of files that source server removes during pg_rewind is running. · 09e96b3f

Fujii Masao authored Mar 29, 2018

After processing the filemap to build the list of chunks that will be
fetched from the source to rewing the target server, it is possible that
a file which was previously processed is removed from the source.  A
simple example of such an occurence is a WAL segment which gets recycled
on the target in-between.  When the filemap is processed, files not
categorized as relation files are first truncated to prepare for its
full copy of which is going to be taken from the source, divided into a
set of junks.  However, for a recycled WAL segment, this would result in
a segment which has a zero-byte size.  With such an empty file,
post-rewind recovery thinks that records are saved but they are actually
not because of the truncation which happened when processing the
filemap, resulting in data loss.

In order to fix the problem, make sure that files which are found as
removed on the source when receiving chunks of them are as well deleted
on the target server for consistency.

Back-patch to 9.5 where pg_rewind was added.

Author: Tsunakawa Takayuki
Reviewed-by: Michael Paquier
Reported-by: Tsunakawa Takayuki

Discussion: https://postgr.es/m/0A3221C70F24FB45833433255569204D1F8DAAA2%40G01JPEXMBYT05

09e96b3f

PL/pgSQL: Nested CALL with transactions · d92bc83c

Peter Eisentraut authored Mar 24, 2018

So far, a nested CALL or DO in PL/pgSQL would not establish a context
where transaction control statements were allowed.  This fixes that by
handling CALL and DO specially in PL/pgSQL, passing the atomic/nonatomic
execution context through and doing the required management around
transaction boundaries.
Reviewed-by: Tomas Vondra <tomas.vondra@2ndquadrant.com>

d92bc83c

Fix actual and potential double-frees around tuplesort usage. · c2d4eb1b

Tom Lane authored Mar 28, 2018

tuplesort_gettupleslot() passed back tuples allocated in the tuplesort's
own memory context, even when the caller was responsible to free them.
This created a double-free hazard, because some callers might destroy
the tuplesort object (via tuplesort_end) before trying to clean up the
last returned tuple. To avoid this, change the API to specify that the
tuple is allocated in the caller's memory context. v10 and HEAD already
did things that way, but in 9.5 and 9.6 this is a live bug that can
demonstrably cause crashes with some grouping-set usages.

In 9.5 and 9.6, this requires doing an extra tuple copy in some cases,
which is unfortunate. But the amount of refactoring needed to avoid it
seems excessive for a back-patched change, especially since the cases
where an extra copy happens are less performance-critical.

Likewise change tuplesort_getdatum() to return pass-by-reference Datums
in the caller's context not the tuplesort's context. There seem to be
no live bugs among its callers, but clearly the same sort of situation
could happen in future.

For other tuplesort fetch routines, continue to allocate the memory in
the tuplesort's context. This is a little inconsistent with what we now
do for tuplesort_gettupleslot() and tuplesort_getdatum(), but that's
preferable to adding new copy overhead in the back branches where it's
clearly unnecessary. These other fetch routines provide the weakest
possible guarantees about tuple memory lifespan from v10 on, anyway,
so this actually seems more consistent overall.

Adjust relevant comments to reflect these API redefinitions.

Arguably, we should change the pre-9.5 branches as well, but since
there are no known failure cases there, it seems not worth the risk.

Peter Geoghegan, per report from Bernd Helmle. Reviewed by Kyotaro
Horiguchi; thanks also to Andreas Seltenreich for extracting a
self-contained test case.

Discussion: https://postgr.es/m/1512661638.9720.34.camel@oopsware.de

c2d4eb1b

Store 2PC GID in commit/abort WAL recs for logical decoding · 1eb6d652

Simon Riggs authored Mar 28, 2018

Store GID of 2PC in commit/abort WAL records when wal_level = logical.
This allows logical decoding to send the SAME gid to subscribers
across restarts of logical replication.

Track relica origin replay progress for 2PC.

(Edited from patch 0003 in the logical decoding 2PC series.)

Authors: Nikhil Sontakke, Stas Kelvich
Reviewed-by: Simon Riggs, Andres Freund

1eb6d652

Attempt to fix jsonb_plpython build on Windows · 75e95dd7
Peter Eisentraut authored Mar 28, 2018

75e95dd7

Fix jsonb_plpython tests on older Python versions · e81fc9b9

Peter Eisentraut authored Mar 28, 2018

Rewrite one test to avoid a case where some Python versions have output
format differences (Decimal('1') vs Decimal("1")).

e81fc9b9

Transforms for jsonb to PL/Python · 3f44e3db

Peter Eisentraut authored Mar 28, 2018

Add a new contrib module jsonb_plpython that provide a transform between
jsonb and PL/Python.  jsonb values are converted to appropriate Python
types such as dicts and lists, and vice versa.

Author: Anthony Bykov <a.bykov@postgrespro.ru>
Reviewed-by: Aleksander Alekseev <a.alekseev@postgrespro.ru>
Reviewed-by: Nikita Glukhov <n.gluhov@postgrespro.ru>

3f44e3db

Make fast_default regression tests locale independent · a437551a
Andrew Dunstan authored Mar 28, 2018

a437551a

Use pg_stat_get_xact* functions within xacts · 5b0d7f69

Simon Riggs authored Mar 28, 2018

Resolve build farm failures from c203d6cf,
diagnosed by Tom Lane.

The output of pg_stat_get_xact_tuples_hot_updated() and friends
is not guaranteed to show anything after the transaction completes.
Data is flushed slowly to stats collector, so using them can
give timing issues.

5b0d7f69

Quick adaption of JIT tuple deforming to the fast default patch. · f4f5845b

Andres Freund authored Mar 27, 2018

Instead using memset to set tts_isnull, call the new
slot_getmissingattrs().

Also fix a bug (= instead of >=) in the code generation. Normally = is
correct, but when repeatedly deforming fields not in a
tuple (e.g. deform up to natts + 1 and then natts + 2) >= is needed.

Discussion: https://postgr.es/m/20180328010053.i2qvsuuusst4lgmc@alap3.anarazel.de

f4f5845b

Add catversion bump missed in 16828d5c. · b4013b8e
Andres Freund authored Mar 27, 2018
```
Given that pg_attribute changed its layout...
```
b4013b8e

Fast ALTER TABLE ADD COLUMN with a non-NULL default · 16828d5c

Andrew Dunstan authored Mar 28, 2018

Currently adding a column to a table with a non-NULL default results in
a rewrite of the table. For large tables this can be both expensive and
disruptive. This patch removes the need for the rewrite as long as the
default value is not volatile. The default expression is evaluated at
the time of the ALTER TABLE and the result stored in a new column
(attmissingval) in pg_attribute, and a new column (atthasmissing) is set
to true. Any existing row when fetched will be supplied with the
attmissingval. New rows will have the supplied value or the default and
so will never need the attmissingval.

Any time the table is rewritten all the atthasmissing and attmissingval
settings for the attributes are cleared, as they are no longer needed.

The most visible code change from this is in heap_attisnull, which
acquires a third TupleDesc argument, allowing it to detect a missing
value if there is one. In many cases where it is known that there will
not be any (e.g.  catalog relations) NULL can be passed for this
argument.

Andrew Dunstan, heavily modified from an original patch from Serge
Rielau.
Reviewed by Tom Lane, Andres Freund, Tomas Vondra and David Rowley.

Discussion: https://postgr.es/m/31e2e921-7002-4c27-59f5-51f08404c858@2ndQuadrant.com

16828d5c

27 Mar, 2018 7 commits

Update pgindent's typedefs blacklist, and make it easier to adjust. · ef1978d6

Tom Lane authored Mar 27, 2018

It seems that all buildfarm members are now using the <stdbool.h> code
path, so that none of them report "bool" as a typedef.  We still need it
to be treated that way, so adjust pgindent to force that whether or not
it's in the given list.

Also, the recent introduction of LLVM infrastructure has caused the
appearance of some typedef names that we definitely *don't* want
treated as typedefs, such as "string" and "abs".  Extend the existing
blacklist to include these.  (Additions based on comparing v10's
typedefs list to what the buildfarm is currently emitting.)

Rearrange the code so that the lists of whitelisted/blacklisted
names are a bit easier to find and modify.

Andrew Dunstan and Tom Lane

Discussion: https://postgr.es/m/28690.1521912334@sss.pgh.pa.us

ef1978d6

Allow memory contexts to have both fixed and variable ident strings. · 442accc3

Tom Lane authored Mar 27, 2018

Originally, we treated memory context names as potentially variable in
all cases, and therefore always copied them into the context header.
Commit 9fa6f00b rethought this a little bit and invented a distinction
between fixed and variable names, skipping the copy step for the former.
But we can make things both simpler and more useful by instead allowing
there to be two parts to a context's identification, a fixed "name" and
an optional, variable "ident". The name supplied in the context create
call is now required to be a compile-time-constant string in all cases,
as it is never copied but just pointed to. The "ident" string, if
wanted, is supplied later. This is needed because typically we want
the ident to be stored inside the context so that it's cleaned up
automatically on context deletion; that means it has to be copied into
the context before we can set the pointer.

The cost of this approach is basically just an additional pointer field
in struct MemoryContextData, which isn't much overhead, and is bought
back entirely in the AllocSet case by not needing a headerSize field
anymore, since we no longer have to cope with variable header length.
In addition, we can simplify the internal interfaces for memory context
creation still further, saving a few cycles there. And it's no longer
true that a custom identifier disqualifies a context from participating
in aset.c's freelist scheme, so possibly there's some win on that end.

All the places that were using non-compile-time-constant context names
are adjusted to put the variable info into the "ident" instead. This
allows more effective identification of those contexts in many cases;
for example, subsidary contexts of relcache entries are now identified
by both type (e.g. "index info") and relname, where before you got only
one or the other. Contexts associated with PL function cache entries
are now identified more fully and uniformly, too.

I also arranged for plancache contexts to use the query source string
as their identifier. This is basically free for CachedPlanSources, as
they contained a copy of that string already. We pay an extra pstrdup
to do it for CachedPlans. That could perhaps be avoided, but it would
make things more fragile (since the CachedPlanSource is sometimes
destroyed first). I suspect future improvements in error reporting will
require CachedPlans to have a copy of that string anyway, so it's not
clear that it's worth moving mountains to avoid it now.

This also changes the APIs for context statistics routines so that the
context-specific routines no longer assume that output goes straight
to stderr, nor do they know all details of the output format. This
is useful immediately to reduce code duplication, and it also allows
for external code to do something with stats output that's different
from printing to stderr.

The reason for pushing this now rather than waiting for v12 is that
it rethinks some of the API changes made by commit 9fa6f00b. Seems
better for extension authors to endure just one round of API changes
not two.

Discussion: https://postgr.es/m/CAB=Je-FdtmFZ9y9REHD7VsSrnCkiBhsA4mdsLKSPauwXtQBeNA@mail.gmail.com

442accc3

Allow HOT updates for some expression indexes · c203d6cf

Simon Riggs authored Mar 27, 2018

If the value of an index expression is unchanged after UPDATE,
allow HOT updates where previously we disallowed them, giving
a significant performance boost in those cases.

Particularly useful for indexes such as JSON->>field where the
JSON value changes but the indexed value does not.

Submitted as "surjective indexes" patch, now enabled by use
of new "recheck_on_update" parameter.

Author: Konstantin Knizhnik
Reviewer: Simon Riggs, with much wordsmithing and some cleanup

c203d6cf

libpq: PQhost to return active connected host or hostaddr · 1944cdc9

Peter Eisentraut authored Mar 27, 2018

Previously, PQhost didn't return the connected host details when the
connection type was CHT_HOST_ADDRESS (i.e., via hostaddr).  Instead, it
returned the complete host connection parameter (which could contain
multiple hosts) or the default host details, which was confusing and
arguably incorrect.

Change this to return the actually connected host or hostaddr
irrespective of the connection type.  When hostaddr but no host was
specified, hostaddr is now returned.  Never return the original host
connection parameter, and document that PQhost cannot be relied on
before the connection is established.

PQport is similarly changed to always return the active connection port
and never the original connection parameter.

Author: Hari Babu <kommi.haribabu@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>
Reviewed-by: David G. Johnston <david.g.johnston@gmail.com>

1944cdc9

Fix count of skipped test of basebackup on Windows · 44bd9584

Teodor Sigaev authored Mar 27, 2018

Commit 920a5e50 add tests which should be
skipped on Windows boxes, but patch doesn't contain right count of them.

David Steel

44bd9584

Skip temp tables from basebackup. · 920a5e50

Teodor Sigaev authored Mar 27, 2018

Do not store temp tables in basebackup, they will not be visible anyway, so,
there are not reasons to store them.

Author: David Steel
Reviewed by: me
Discussion: https://www.postgresql.org/message-id/flat/5ea4d26a-a453-c1b7-eff9-5a3ef8f8aceb@pgmasters.net

920a5e50

Add predicate locking for GiST · 3ad55863

Teodor Sigaev authored Mar 27, 2018

Add page-level predicate locking, due to gist's code organization, patch seems
close to trivial: add check before page changing, add predicate lock before page
scanning. Although choosing right place to check is not simple: it should not
be called during index build, it should support insertion of new downlink and so
on.

Author: Shubham Barai with editorization by me and Alexander Korotkov
Reviewed by: Alexander Korotkov, Andrey Borodin, me
Discussion: https://www.postgresql.org/message-id/flat/CALxAEPtdcANpw5ePU3LvnTP8HCENFw6wygupQAyNBgD-sG3h0g@mail.gmail.com

3ad55863

26 Mar, 2018 12 commits

Adapt to LLVM 7+ Orc API changes. · 4b9094eb

Andres Freund authored Mar 26, 2018

This is mostly done to be able to validate features and fixes
submitted to LLVM. Given the size of these changes that seems
acceptable.

Author: Andres Freund

4b9094eb

LLVMJIT: Free created module in LLVM < 5. · 071371bc

Andres Freund authored Mar 26, 2018

Due to the differing APIs between versions, I forgot to deallocate the
generated module in older LLVM versions, leading to a memory leak.

Author: Andres Freund

071371bc

Make new regression indpendent of max_parallel_workers_per_gather. · 0976c4dd

Andres Freund authored Mar 26, 2018

The tests in e2f1eb0e ("Implement partition-wise
grouping/aggregation.") weren't independent of the server's
max_parallel_workers_per_gather setting.  I (Andres) find it useful to
locally run with that disabled, and the aforementioned patch broke
this.

Author: Jeevan Chalke
Discussion:
    https://postgr.es/m/20180322210703.qmga3vsxqmiiypci@alap3.anarazel.de
    https://postgr.es/m/CAM2+6=UNWGKTgh9aOn4=SQ72HfFzbVFseh9=5N54bD6KB+D9OQ@mail.gmail.com

0976c4dd

Correct some typos in the new JIT code. · 96b5eac9
Andres Freund authored Mar 26, 2018
```
Author: Thomas Munro
```
96b5eac9

JIT tuple deforming in LLVM JIT provider. · 32af96b2

Andres Freund authored Mar 26, 2018

Performing JIT compilation for deforming gains performance benefits
over unJITed deforming from compile-time knowledge of the tuple
descriptor. Fixed column widths, NOT NULLness, etc can be taken
advantage of.

Right now the JITed deforming is only used when deforming tuples as
part of expression evaluation (and obviously only if the descriptor is
known). It's likely to be beneficial in other cases, too.

By default tuple deforming is JITed whenever an expression is JIT
compiled. There's a separate boolean GUC controlling it, but that's
expected to be primarily useful for development and benchmarking.

Docs will follow in a later commit containing docs for the whole JIT
feature.

Author: Andres Freund
Discussion: https://postgr.es/m/20170901064131.tazjxwus3k2w3ybh@alap3.anarazel.de

32af96b2

Set random seed for pgbench. · 64f85894

Teodor Sigaev authored Mar 26, 2018

Setting random could increase reproducibility of test in some cases. Patch
suggests three providers for seed: time (default), strong random
generator (if available) and unsigned constant. Seed could be set from
command line or enviroment variable.

Author: Fabien Coelho
Reviewed by: Chapman Flack
Discussion: https://www.postgresql.org/message-id/flat/20160407082711.q7iq3ykffqxcszkv@alap3.anarazel.de

64f85894

Fix thinko in comment · 530bcf75

Alvaro Herrera authored Mar 26, 2018

The listed numbers disagreed with the ones being used in the symbols;
but instead of just fixing the numbers in the comment, use the symbolic
name instead, which seems clearer.

This has been wrong all along, so apply back to 9.5 where BRIN was
introduced.

Reported-by: Tomas Vondra
Discussion: https://postgr.es/m/5ff514f2-8b1e-6366-b11c-8e2ed442562d@2ndquadrant.com

530bcf75

Fix test impredictability · 186b6df2

Alvaro Herrera authored Mar 26, 2018

Test 'triggers' fails when another one creates triggers concurrently at
some precise time, because of a missing WHERE clause.

Per buildfarm members snapper, desmoxytes.

186b6df2

Handle INSERT .. ON CONFLICT with partitioned tables · 555ee77a

Alvaro Herrera authored Mar 26, 2018

Commit eb7ed3f3 enabled unique constraints on partitioned tables,
but one thing that was not working properly is INSERT/ON CONFLICT.
This commit introduces a new node keeps state related to the ON CONFLICT
clause per partition, and fills it when that partition is about to be
used for tuple routing.

Author: Amit Langote, Álvaro Herrera
Reviewed-by: Etsuro Fujita, Pavan Deolasee
Discussion: https://postgr.es/m/20180228004602.cwdyralmg5ejdqkq@alvherre.pgsql

555ee77a

Fix typo · 1b89c218
Alvaro Herrera authored Mar 26, 2018

1b89c218
Remove two tests inadvertently added in 2b272734 · 1d494b62
Andrew Dunstan authored Mar 26, 2018

1d494b62

Optimize btree insertions for common case of increasing values · 2b272734

Andrew Dunstan authored Mar 26, 2018

Remember the last page of an index insert if it's the rightmost leaf
page. If the next entry belongs on and can fit in the remembered page,
insert the new entry there as long as we can get a lock on the page.
Otherwise, fall back on the more expensive method of searching for
the right place to insert the entry.

This provides a performance improvement for the common case where an
index entry is for monotonically increasing or nearly monotonically
increasing value such as an identity field or a current timestamp.

Pavan Deolasee
Reviewed by Claudio Freire, Simon Riggs and Peter Geoghegan

Discussion: https://postgr.es/m/CABOikdM9DrupjyKZZFM5k8-0RCDs1wk6JzEkg7UgSW6QzOwMZw@mail.gmail.com

2b272734

25 Mar, 2018 4 commits

Doc: add example of type resolution in nested UNIONs. · c515ff8d

Tom Lane authored Mar 25, 2018

Section 10.5 didn't say explicitly that multiple UNIONs are resolved
pairwise.  Since the resolution algorithm is described as taking any
number of inputs, readers might well think that a query like
"select x union select y union select z" would be resolved by
considering x, y, and z in one resolution step.  But that's not what
happens (and I think that behavior is per SQL spec).  Add an example
clarifying this point.

Per bug #15129 from Philippe Beaudoin.

Discussion: https://postgr.es/m/152196085023.32649.9916472370480121694@wrigleys.postgresql.org

c515ff8d

Fix unsafe extraction of the OID part of a relation filename. · d0c0c894

Tom Lane authored Mar 25, 2018

Commit 8694cc96 did this randomly differently from other callers of
parse_filename_for_nontemp_relation(). Perhaps unsurprisingly,
the randomly different way is wrong; it fails to ensure the
extracted string is null-terminated. Per buildfarm member skink.

Discussion: https://postgr.es/m/14453.1522001792@sss.pgh.pa.us

d0c0c894

pg_resetwal: Allow users to change the WAL segment size · bf4a8676

Peter Eisentraut authored Mar 25, 2018

This adds a new option --wal-segsize (analogous to initdb) that changes
the WAL segment size in pg_control.

Author: Nathan Bossart <bossartn@amazon.com>

bf4a8676

initdb: Further polishing of --wal-segsize option · 8ad8d916
Peter Eisentraut authored Mar 25, 2018
```
Extend documentation.  Improve option parsing in case no argument was
specified.
```
8ad8d916