Commits · 499be013de65242235ebdde06adb08db887f0ea5 · Abuhujair Javed / Postgres FD Implementation

07 Apr, 2018 16 commits

Support partition pruning at execution time · 499be013

Alvaro Herrera authored Apr 07, 2018

Existing partition pruning is only able to work at plan time, for query
quals that appear in the parsed query. This is good but limiting, as
there can be parameters that appear later that can be usefully used to
further prune partitions.

This commit adds support for pruning subnodes of Append which cannot
possibly contain any matching tuples, during execution, by evaluating
Params to determine the minimum set of subnodes that can possibly match.
We support more than just simple Params in WHERE clauses. Support
additionally includes:

1. Parameterized Nested Loop Joins: The parameter from the outer side of the
join can be used to determine the minimum set of inner side partitions to
scan.

2. Initplans: Once an initplan has been executed we can then determine which
partitions match the value from the initplan.

Partition pruning is performed in two ways. When Params external to the plan
are found to match the partition key we attempt to prune away unneeded Append
subplans during the initialization of the executor. This allows us to bypass
the initialization of non-matching subplans meaning they won't appear in the
EXPLAIN or EXPLAIN ANALYZE output.

For parameters whose value is only known during the actual execution
then the pruning of these subplans must wait. Subplans which are
eliminated during this stage of pruning are still visible in the EXPLAIN
output. In order to determine if pruning has actually taken place, the
EXPLAIN ANALYZE must be viewed. If a certain Append subplan was never
executed due to the elimination of the partition then the execution
timing area will state "(never executed)". Whereas, if, for example in
the case of parameterized nested loops, the number of loops stated in
the EXPLAIN ANALYZE output for certain subplans may appear lower than
others due to the subplan having been scanned fewer times. This is due
to the list of matching subnodes having to be evaluated whenever a
parameter which was found to match the partition key changes.

This commit required some additional infrastructure that permits the
building of a data structure which is able to perform the translation of
the matching partition IDs, as returned by get_matching_partitions, into
the list index of a subpaths list, as exist in node types such as
Append, MergeAppend and ModifyTable. This allows us to translate a list
of clauses into a Bitmapset of all the subpath indexes which must be
included to satisfy the clause list.

Author: David Rowley, based on an earlier effort by Beena Emerson
Reviewers: Amit Langote, Robert Haas, Amul Sul, Rajkumar Raghuwanshi,
Jesper Pedersen
Discussion: https://postgr.es/m/CAOG9ApE16ac-_VVZVvv0gePSgkg_BwYEV1NBqZFqDR2bBE0X0A@mail.gmail.com

499be013

Add bms_prev_member function · 5c067521

Alvaro Herrera authored Apr 07, 2018

This works very much like the existing bms_last_member function, only it
traverses through the Bitmapset in the opposite direction from the most
significant bit down to the least significant bit.  A special prevbit value of
-1 may be used to have the function determine the most significant bit.  This
is useful for starting a loop.  When there are no members less than prevbit,
the function returns -2 to indicate there are no more members.

Author: David Rowley
Discussion: https://postgr.es/m/CAKJS1f-K=3d5MDASNYFJpUpc20xcBnAwNC1-AOeunhn0OtkWbQ@mail.gmail.com

5c067521

Raise error when affecting tuple moved into different partition. · f16241be

Andres Freund authored Apr 07, 2018

When an update moves a row between partitions (supported since
2f178441), our normal logic for following update chains in READ
COMMITTED mode doesn't work anymore. Cross partition updates are
modeled as an delete from the old and insert into the new
partition. No ctid chain exists across partitions, and there's no
convenient space to introduce that link.

Not throwing an error in a partitioned context when one would have
been thrown without partitioning is obviously problematic. This commit
introduces infrastructure to detect when a tuple has been moved, not
just plainly deleted. That allows to throw an error when encountering
a deletion that's actually a move, while attempting to following a
ctid chain.

The row deleted as part of a cross partition update is marked by
pointing it's t_ctid to an invalid block, instead of self as a normal
update would.  That was deemed to be the least invasive and most
future proof way to represent the knowledge, given how few infomask
bits are there to be recycled (there's also some locking issues with
using infomask bits).

External code following ctid chains should be updated to check for
moved tuples. The most likely consequence of not doing so is a missed
error.

Author: Amul Sul, editorialized by me
Reviewed-By: Amit Kapila, Pavan Deolasee, Andres Freund, Robert Haas
Discussion: http://postgr.es/m/CAAJ_b95PkwojoYfz0bzXU8OokcTVGzN6vYGCNVUukeUDrnF3dw@mail.gmail.com

f16241be

Indexes with INCLUDE columns and their support in B-tree · 8224de4f

Teodor Sigaev authored Apr 07, 2018

This patch introduces INCLUDE clause to index definition. This clause
specifies a list of columns which will be included as a non-key part in
the index. The INCLUDE columns exist solely to allow more queries to
benefit from index-only scans. Also, such columns don't need to have
appropriate operator classes. Expressions are not supported as INCLUDE
columns since they cannot be used in index-only scans.

Index access methods supporting INCLUDE are indicated by amcaninclude flag
in IndexAmRoutine. For now, only B-tree indexes support INCLUDE clause.

In B-tree indexes INCLUDE columns are truncated from pivot index tuples
(tuples located in non-leaf pages and high keys). Therefore, B-tree indexes
now might have variable number of attributes. This patch also provides
generic facility to support that: pivot tuples contain number of their
attributes in t_tid.ip_posid. Free 13th bit of t_info is used for indicating
that. This facility will simplify further support of index suffix truncation.
The changes of above are backward-compatible, pg_upgrade doesn't need special
handling of B-tree indexes for that.

Bump catalog version

Author: Anastasia Lubennikova with contribition by Alexander Korotkov and me
Reviewed by: Peter Geoghegan, Tomas Vondra, Antonin Houska, Jeff Janes,
David Rowley, Alexander Korotkov
Discussion: https://www.postgresql.org/message-id/flat/56168952.4010101@postgrespro.ru

8224de4f

Make test of json(b)_to_tsvector language-independ · 01bb8516
Teodor Sigaev authored Apr 07, 2018
```
Missed in 1c1791e0 commit
```
01bb8516

Add json(b)_to_tsvector function · 1c1791e0

Teodor Sigaev authored Apr 07, 2018

Jsonb has a complex nature so there isn't best-for-everything way to convert it
to tsvector for full text search. Current to_tsvector(json(b)) suggests to
convert only string values, but it's possible to index keys, numerics and even
booleans value. To solve that json(b)_to_tsvector has a second required
argument contained a list of desired types of json fields. Second argument is
a jsonb scalar or array right now with possibility to add new options in a
future.

Bump catalog version

Author: Dmitry Dolgov with some editorization by me
Reviewed by: Teodor Sigaev
Discussion: https://www.postgresql.org/message-id/CA+q6zcXJQbS1b4kJ_HeAOoOc=unfnOrUEL=KGgE32QKDww7d8g@mail.gmail.com

1c1791e0

Fix timing issue in new subscription truncate test · 529ab7bd

Peter Eisentraut authored Apr 07, 2018

We need to wait for the initial sync of all subscriptions.  On
some (faster?) machines, this didn't make a difference, but
the (slower?) buildfarm machines are upset.

529ab7bd

Deactive flapping checksum isolation tests. · bf75fe47

Andres Freund authored Apr 07, 2018

They've been broken for days, and prevent other tests from being
run. The plan is to revert their addition later.

Discussion: https://postgr.es/m/20180407162252.wfo5aorjrjw2n5ws@alap3.anarazel.de

bf75fe47

Logical replication support for TRUNCATE · 039eb6e9

Peter Eisentraut authored Apr 07, 2018

Update the built-in logical replication system to make use of the
previously added logical decoding for TRUNCATE support.  Add the
required truncate callback to pgoutput and a new logical replication
protocol message.

Publications get a new attribute to determine whether to replicate
truncate actions.  When updating a publication via pg_dump from an older
version, this is not set, thus preserving the previous behavior.

Author: Simon Riggs <simon@2ndquadrant.com>
Author: Marco Nenciarini <marco.nenciarini@2ndquadrant.it>
Author: Peter Eisentraut <peter.eisentraut@2ndquadrant.com>
Reviewed-by: Petr Jelinek <petr.jelinek@2ndquadrant.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org>

039eb6e9

Logical decoding of TRUNCATE · 5dfd1e5a

Peter Eisentraut authored Apr 07, 2018

Add a new WAL record type for TRUNCATE, which is only used when
wal_level >= logical.  (For physical replication, TRUNCATE is already
replicated via SMGR records.)  Add new callback for logical decoding
output plugins to receive TRUNCATE actions.

Author: Simon Riggs <simon@2ndquadrant.com>
Author: Marco Nenciarini <marco.nenciarini@2ndquadrant.it>
Author: Peter Eisentraut <peter.eisentraut@2ndquadrant.com>
Reviewed-by: Petr Jelinek <petr.jelinek@2ndquadrant.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org>

5dfd1e5a

Predicate locking in hash indexes. · b508a56f

Teodor Sigaev authored Apr 07, 2018

Hash index searches acquire predicate locks on the primary
page of a bucket. It acquires a lock on both the old and new buckets
for scans that happen concurrently with page splits. During a bucket
split, a predicate lock is copied from the primary page of an old
bucket to the primary page of a new bucket.

Author: Shubham Barai, Amit Kapila
Reviewed by: Amit Kapila, Alexander Korotkov, Thomas Munro
Discussion: https://www.postgresql.org/message-id/flat/CALxAEPvNsM2GTiXdRgaaZ1Pjd1bs+sxfFsf7Ytr+iq+5JJoYXA@mail.gmail.com

b508a56f

Document partprune.c a little better · 971d7ddb

Alvaro Herrera authored Apr 07, 2018

Author: Amit Langote
Reviewed-by: Álvaro Herrera, David Rowley
Discussion: https://postgr.es/m/CA+HiwqGzq4D6z=8R0AP+XhbTFCQ-4Ct+t2ekqjE9Fpm84_JUGg@mail.gmail.com

971d7ddb

Blindly attempt to fix sepgsql tests broken due to . · 4f813c72

Andres Freund authored Apr 06, 2018

The failure appears to solely be caused by the changed partition
pruning logic.

Author: Andres Freund
Discussion: https://postgr.es/m/20180406210330.wmqw42wqgiicktli@alap3.anarazel.de

4f813c72

Attempt to fix endianess issues in new hash partition test. · 40e42e10

Andres Freund authored Apr 06, 2018

The tests added as part of 9fdb675f yield differing results
depending on endianess, causing buildfarm failures. As the differences
are expected, split the hash partitioning tests into a different file
and maintain alternative output. The separate file is so the amount of
duplicated output is reduced.

David produced the alternative output without a machine to test on, so
it's possible this'll require a buildfarm cycle or two to get right.

Author: David Rowley
Discussion: https://postgr.es/m/CAKJS1f-6f4c2Qhuipe-GY7BKmFd0FMBobRnLS7hVCoAmTszsBg@mail.gmail.com

40e42e10

Fix and improve pg_atomic_flag fallback implementation. · 8c3debbb

Andres Freund authored Apr 06, 2018

The atomics fallback implementation for pg_atomic_flag was broken,
returning the inverted value from pg_atomic_test_set_flag().  This was
unnoticed because a) atomic flags were unused until recently b) the
test code wasn't run when the fallback implementation was in
use (because it didn't allow to test for some edge cases).

Fix the bug, and improve the fallback so it has the same behaviour as
the non-fallback implementation in the problematic edge cases. That
breaks ABI compatibility in the back branches when fallbacks are in
use, but given they were broken until now...

Author: Andres Freund
Reported-by: Daniel Gustafsson
Discussion:
    https://postgr.es/m/FB948276-7B32-4B77-83E6-D00167F8EEB4@yesql.se
    https://postgr.es/m/20180406233854.uni2h3mbnveczl32@alap3.anarazel.de
Backpatch: 9.5-, where the atomics abstraction was introduced.

8c3debbb

Doc: fix broken markup. · eb2a0e00

Tom Lane authored Apr 06, 2018

Commit 3d956d95 was apparently not checked against HEAD's doc toolchain.
Per buildfarm.

eb2a0e00

06 Apr, 2018 18 commits

Fix possible failure in parallel index build. · 47cb9ca4

Robert Haas authored Apr 06, 2018

Report and proposed fix by David Rowley, put in patch form by
Peter Geoghegan.

Discussion: http://postgr.es/m/CAKJS1f91kq1wfYR8rnRRfKtxyhU2woEA+=whd640UxMyU+O0EQ@mail.gmail.com

47cb9ca4

Allow insert and update tuple routing and COPY for foreign tables. · 3d956d95

Robert Haas authored Apr 06, 2018

Also enable this for postgres_fdw.

Etsuro Fujita, based on an earlier patch by Amit Langote. The larger
patch series of which this is a part has been reviewed by Amit
Langote, David Fetter, Maksim Milyutin, Álvaro Herrera, Stephen Frost,
and me. Minor documentation changes to the final version by me.

Discussion: http://postgr.es/m/29906a26-da12-8c86-4fb9-d8f88442f2b9@lab.ntt.co.jp

3d956d95

Remove some unnecessary quote marks from catalog DATA lines. · cb1ff1e5

Tom Lane authored Apr 06, 2018

This has no functional impact whatsoever. However, it causes
these unnecessary quote marks to disappear from the generated
postgres.bki file, making it easier to verify that the upcoming
bootstrap data conversion patch doesn't change the generated file.

cb1ff1e5

Fix badly edited doc sentence · 3cabe386
Alvaro Herrera authored Apr 06, 2018
```
Noted by Vik Fearing and Robert Haas
```
3cabe386

Clean up intermetiate state in pg_basebackup tests · 03242970

Magnus Hagander authored Apr 06, 2018

These tests accummulated almost a gigabyte of data during the test which
was then removed at the end. Instead, remove output that's no longer
needed between the individual tests, to keep the total disk usage down
lower.

Author: Michael Banck

03242970

Fix typo · f66c37b2
Magnus Hagander authored Apr 06, 2018
```
Author: Michael Banck
```
f66c37b2

Faster partition pruning · 9fdb675f

Alvaro Herrera authored Apr 06, 2018

Add a new module backend/partitioning/partprune.c, implementing a more
sophisticated algorithm for partition pruning. The new module uses each
partition's "boundinfo" for pruning instead of constraint exclusion,
based on an idea proposed by Robert Haas of a "pruning program": a list
of steps generated from the query quals which are run iteratively to
obtain a list of partitions that must be scanned in order to satisfy
those quals.

At present, this targets planner-time partition pruning, but there exist
further patches to apply partition pruning at execution time as well.

This commit also moves some definitions from include/catalog/partition.h
to a new file include/partitioning/partbounds.h, in an attempt to
rationalize partitioning related code.

Authors: Amit Langote, David Rowley, Dilip Kumar
Reviewers: Robert Haas, Kyotaro Horiguchi, Ashutosh Bapat, Jesper Pedersen.
Discussion: https://postgr.es/m/098b9c71-1915-1a2a-8d52-1a7a50ce79e8@lab.ntt.co.jp

9fdb675f

Support new default roles with adminpack · 11523e86

Stephen Frost authored Apr 06, 2018

This provides a newer version of adminpack which works with the newly
added default roles to support GRANT'ing to non-superusers access to
read and write files, along with related functions (unlinking files,
getting file length, renaming/removing files, scanning the log file
directory) which are supported through adminpack.

Note that new versions of the functions are required because an
environment might have an updated version of the library but still have
the old adminpack 1.0 catalog definitions (where EXECUTE is GRANT'd to
PUBLIC for the functions).

This patch also removes the long-deprecated alternative names for
functions that adminpack used to include and which are now included in
the backend, in adminpack v1.1. Applications using the deprecated names
should be updated to use the backend functions instead. Existing
installations which continue to use adminpack v1.0 should continue to
function until/unless adminpack is upgraded.

Reviewed-By: Michael Paquier
Discussion: https://postgr.es/m/20171231191939.GR2416%40tamriel.snowman.net

11523e86

Add default roles for file/program access · 0fdc8495

Stephen Frost authored Apr 06, 2018

This patch adds new default roles named 'pg_read_server_files',
'pg_write_server_files', 'pg_execute_server_program' which
allow an administrator to GRANT to a non-superuser role the ability to
access server-side files or run programs through PostgreSQL (as the user
the database is running as).  Having one of these roles allows a
non-superuser to use server-side COPY to read, write, or with a program,
and to use file_fdw (if installed by a superuser and GRANT'd USAGE on
it) to read from files or run a program.

The existing misc file functions are also changed to allow a user with
the 'pg_read_server_files' default role to read any files on the
filesystem, matching the privileges given to that role through COPY and
file_fdw from above.

Reviewed-By: Michael Paquier
Discussion: https://postgr.es/m/20171231191939.GR2416%40tamriel.snowman.net

0fdc8495

Remove explicit superuser checks in favor of ACLs · e79350fe

Stephen Frost authored Apr 06, 2018

This removes the explicit superuser checks in the various file-access
functions in the backend, specifically pg_ls_dir(), pg_read_file(),
pg_read_binary_file(), and pg_stat_file(). Instead, EXECUTE is REVOKE'd
from public for these, meaning that only a superuser is able to run them
by default, but access to them can be GRANT'd to other roles.

Reviewed-By: Michael Paquier
Discussion: https://postgr.es/m/20171231191939.GR2416%40tamriel.snowman.net

e79350fe

Add memory context identifier to portal context · 94c1f9ba
Peter Eisentraut authored Apr 06, 2018
```
Discussion: https://www.postgresql.org/message-id/6421.1522194949@sss.pgh.pa.us
```
94c1f9ba

Rename MemoryContextCopySetIdentifier() for clarity · bbca7762

Peter Eisentraut authored Apr 06, 2018

MemoryContextCopySetIdentifier -> MemoryContextCopyAndSetIdentifier

Discussion: https://www.postgresql.org/message-id/6421.1522194949@sss.pgh.pa.us

bbca7762

Enforce child constraints during COPY TO a partitioned table. · cfbecf81

Robert Haas authored Apr 06, 2018

The previous coding inadvertently checked the constraints for the
partitioned table rather than the target partition, which could
lead to data in a partition that fails to satisfy some constraint
on that partition.  This problem seems to date back to when
table partitioning was introduced; prior to that, there was only
one target table for a COPY, so the problem didn't occur, and the
code just didn't get updated.

Etsuro Fujita, reviewed by Amit Langote and Ashutosh Bapat

Discussion: https://postgr.es/message-id/5ABA4074.1090500%40lab.ntt.co.jp

cfbecf81

Refactor PgFdwModifyState creation/destruction into separate functions. · 870d8960

Robert Haas authored Apr 06, 2018

Etsuro Fujita.  The larger patch series of which this is a part has
been reviewed by Amit Langote, David Fetter, Maksim Milyutin,
Álvaro Herrera, Stephen Frost, and me.

Discussion: http://postgr.es/m/5A95487E.9050808@lab.ntt.co.jp

870d8960

Split the SetSubscriptionRelState function into two · bcf79b5b

Peter Eisentraut authored Apr 06, 2018

We don't actually need the insert-or-update logic, so it's clearer to
have separate functions for the inserting and updating.

Author: Petr Jelinek <petr.jelinek@2ndquadrant.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>

bcf79b5b

Improve messaging during logical replication worker startup · c25304a9

Peter Eisentraut authored Apr 06, 2018

In case the subscription is removed before the worker is fully started,
give a specific error message instead of the generic "cache lookup"
error.

Author: Petr Jelinek <petr.jelinek@2ndquadrant.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>

c25304a9

Fix compiler warning about format truncation · 2cd6520e
Peter Eisentraut authored Apr 06, 2018

2cd6520e

Improve parse representation for MERGE · f1464c53

Simon Riggs authored Apr 06, 2018

Separation of parser data structures from executor, as
requested by Tom Lane. Further improvements possible.

While there, implement error for multiple VALUES clauses via parser
to allow line number of error, as requested by Andres Freund.

Author: Pavan Deolasee

Discussion: https://www.postgresql.org/message-id/CABOikdPpqjectFchg0FyTOpsGXyPoqwgC==OLKWuxgBOsrDDZw@mail.gmail.com

f1464c53

05 Apr, 2018 6 commits

Attempt to fix win32 build of pg_verify_checksums · 3b0b4f31
Magnus Hagander authored Apr 05, 2018
```
S_ISLNK doesn't exist on Win32, instead we should use
pgwin32_is_junction().
```
3b0b4f31

Allow on-line enabling and disabling of data checksums · 1fde38be

Magnus Hagander authored Apr 05, 2018

This makes it possible to turn checksums on in a live cluster, without
the previous need for dump/reload or logical replication (and to turn it
off).

Enabling checkusm starts a background process in the form of a
launcher/worker combination that goes through the entire database and
recalculates checksums on each and every page. Only when all pages have
been checksummed are they fully enabled in the cluster. Any failure of
the process will revert to checksums off and the process has to be
started.

This adds a new WAL record that indicates the state of checksums, so
the process works across replicated clusters.

Authors: Magnus Hagander and Daniel Gustafsson
Review: Tomas Vondra, Michael Banck, Heikki Linnakangas, Andrey Borodin

1fde38be

doc: remove mention of the DMOZ catalog in ltree docs · c39e903d

Bruce Momjian authored Apr 05, 2018

Discussion: https://postgr.es/m/CAF4Au4xYem_W3KOuxcKct7=G4j8Z3uO9j3DUKTFJqUsfp_9pQg@mail.gmail.com

Author: Oleg Bartunov

Backpatch-through: 9.3

c39e903d

MERGE syntax diagram correction · ddb41585
Simon Riggs authored Apr 05, 2018
```
Reported-by: Andrew Gierth
```
ddb41585

PL/pgSQL: Add support for SET TRANSACTION · b981275b

Peter Eisentraut authored Mar 29, 2018

A normal SQL command run inside PL/pgSQL acquires a snapshot, but SET
TRANSACTION does not work anymore if a snapshot is set.  So we have to
handle this separately.
Reviewed-by: Alexander Korotkov <a.korotkov@postgrespro.ru>
Reviewed-by: Tomas Vondra <tomas.vondra@2ndquadrant.com>

b981275b

Allow cpluspluscheck to pass by renaming variable · 530e69e5
Simon Riggs authored Apr 05, 2018
```
Use of a C++ keyword as a function name caused problems

Reported-by: Álvaro Herrera
```
530e69e5