Commits · ee07e38c14a06a8479d2605406345c4a868465ae · Abuhujair Javed / Postgres FD Implementation

21 Nov, 2018 5 commits

Fix PartitionDispatchData vertical whitespace · ee07e38c

Alvaro Herrera authored Nov 21, 2018

Per David Rowley
Discussion: https://postgr.es/m/CAKJS1f-MstvBWdkOzACsOHyBgj2oXcBM8kfv+NhVe-Ux-wq9Sg@mail.gmail.com

ee07e38c

instr_time.h: add INSTR_TIME_SET_CURRENT_LAZY · 6f7d02aa

Alvaro Herrera authored Nov 21, 2018

Sets the timestamp to current if not already set.  Will acquire more
callers momentarily.

Author: Fabien Coelho
Discussion: https://postgr.es/m/alpine.DEB.2.21.1808111104320.1705@lancre

6f7d02aa

Blind attempt at fixing sepgsql output for 578b22. · 7306d5e9
Andres Freund authored Nov 20, 2018

7306d5e9

Fix sepgsql compile error caused by oid removal. · 937e4e50

Andres Freund authored Nov 20, 2018

Per buildfarm animal rhinoceros. I (Andres) missed replacing a few
uses of ObjectIdAttributeNumber in sepgsql.

It's quite probable that the sepgsql test output will need more
adapting than done in 578b22...

Author: Thomas Munro
Discussion: https://postgr.es/m/CAEepm=2Sk+66HJV8FLDfm_sKTn22j7cWTY_Y1Rok3RxeWL_Y0w@mail.gmail.com

937e4e50

Remove WITH OIDS support, change oid catalog column visibility. · 578b2297

Andres Freund authored Nov 20, 2018

Previously tables declared WITH OIDS, including a significant fraction
of the catalog tables, stored the oid column not as a normal column,
but as part of the tuple header.

This special column was not shown by default, which was somewhat odd,
as it's often (consider e.g. pg_class.oid) one of the more important
parts of a row.  Neither pg_dump nor COPY included the contents of the
oid column by default.

The fact that the oid column was not an ordinary column necessitated a
significant amount of special case code to support oid columns. That
already was painful for the existing, but upcoming work aiming to make
table storage pluggable, would have required expanding and duplicating
that "specialness" significantly.

WITH OIDS has been deprecated since 2005 (commit ff02d0a05280e0).
Remove it.

Removing includes:
- CREATE TABLE and ALTER TABLE syntax for declaring the table to be
  WITH OIDS has been removed (WITH (oids[ = true]) will error out)
- pg_dump does not support dumping tables declared WITH OIDS and will
  issue a warning when dumping one (and ignore the oid column).
- restoring an pg_dump archive with pg_restore will warn when
  restoring a table with oid contents (and ignore the oid column)
- COPY will refuse to load binary dump that includes oids.
- pg_upgrade will error out when encountering tables declared WITH
  OIDS, they have to be altered to remove the oid column first.
- Functionality to access the oid of the last inserted row (like
  plpgsql's RESULT_OID, spi's SPI_lastoid, ...) has been removed.

The syntax for declaring a table WITHOUT OIDS (or WITH (oids = false)
for CREATE TABLE) is still supported. While that requires a bit of
support code, it seems unnecessary to break applications / dumps that
do not use oids, and are explicit about not using them.

The biggest user of WITH OID columns was postgres' catalog. This
commit changes all 'magic' oid columns to be columns that are normally
declared and stored. To reduce unnecessary query breakage all the
newly added columns are still named 'oid', even if a table's column
naming scheme would indicate 'reloid' or such.  This obviously
requires adapting a lot code, mostly replacing oid access via
HeapTupleGetOid() with access to the underlying Form_pg_*->oid column.

The bootstrap process now assigns oids for all oid columns in
genbki.pl that do not have an explicit value (starting at the largest
oid previously used), only oids assigned later by oids will be above
FirstBootstrapObjectId. As the oid column now is a normal column the
special bootstrap syntax for oids has been removed.

Oids are not automatically assigned during insertion anymore, all
backend code explicitly assigns oids with GetNewOidWithIndex(). For
the rare case that insertions into the catalog via SQL are called for
the new pg_nextoid() function can be used (which only works on catalog
tables).

The fact that oid columns on system tables are now normal columns
means that they will be included in the set of columns expanded
by * (i.e. SELECT * FROM pg_class will now include the table's oid,
previously it did not). It'd not technically be hard to hide oid
column by default, but that'd mean confusing behavior would either
have to be carried forward forever, or it'd cause breakage down the
line.

While it's not unlikely that further adjustments are needed, the
scope/invasiveness of the patch makes it worthwhile to get merge this
now. It's painful to maintain externally, too complicated to commit
after the code code freeze, and a dependency of a number of other
patches.

Catversion bump, for obvious reasons.

Author: Andres Freund, with contributions by John Naylor
Discussion: https://postgr.es/m/20180930034810.ywp2c7awz7opzcfr@alap3.anarazel.de

578b2297

20 Nov, 2018 7 commits

Improve description of buffer used to store records in WAL reader · 0999ac47

Michael Paquier authored Nov 21, 2018

The dedicated private buffer to store records is used only for these
crossing a page boundary since 285bd0ac, but its description did not
match completely the reality.

Reported-by: Andrey Lepikhov
Author: Michael Paquier
Discussion: https://postgr.es/m/49518b48-2036-5e43-1818-0f594e375e76@postgrespro.ru

0999ac47

Make detection of SSL_CTX_set_min_proto_version more portable · ea8bc349

Peter Eisentraut authored Nov 20, 2018

As already explained in configure.in, using the OpenSSL version number
to detect presence of functions doesn't work, because LibreSSL reports
incompatible version numbers. Fortunately, the functions we need here
are actually macros, so we can just test for them directly.

ea8bc349

Add settings to control SSL/TLS protocol version · e73e67c7

Peter Eisentraut authored Nov 20, 2018

For example:

    ssl_min_protocol_version = 'TLSv1.1'
    ssl_max_protocol_version = 'TLSv1.2'
Reviewed-by: Steve Singer <steve@ssinger.info>
Discussion: https://www.postgresql.org/message-id/flat/1822da87-b862-041a-9fc2-d0310c3da173@2ndquadrant.com

e73e67c7

Make WAL description output more consistent · 2d9140ed

Peter Eisentraut authored Nov 20, 2018

The output for record types XLOG_DBASE_CREATE and XLOG_DBASE_DROP used
the order dbid/tablespaceid, whereas elsewhere the order is
tablespaceid/dbid[/relfilenodeid]. Flip the order for those two types
to make it consistent.

Author: Jean-Christophe Arnu <jcarnu@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/CAHZmTm18Ln62KW-G8NYvO1wbBL3QU1E76Zep=DuHmg-zS2XFAg@mail.gmail.com/

2d9140ed

doc: Clarify CREATE TYPE ENUM documentation · 69bae237

Peter Eisentraut authored Nov 13, 2018

The documentation claimed that an enum type requires "one or more"
labels, but since 1fd9883f, zero labels are also allowed.
Reported-by: Lukas Eder <lukas.eder@gmail.com>
Bug: #15356

69bae237

Refine some guc.c help texts · a568cada

Peter Eisentraut authored Nov 20, 2018

These settings apply to communication with the sending server, which
is not necessarily a primary.

Author: Sergei Kornilov <sk@zsrv.org>

a568cada

Fix issues with TAP tests of pg_verify_checksums · 9685d738

Michael Paquier authored Nov 20, 2018

Two issues have been spotted and get fixed here:
- When checking for corrupted files, make sure that pg_verify_checksums
complains about the correct file.  In order to make the logic more
robust, all files created are immediately removed once checks on them
are done.  The error message generated by pg_verify_checksums also now
includes the file name it sees as corrupted.
- Before running corruption-related tests, empty files are generated
which used names mapping with the corrupted files, potentially leading
to conflicts.  So use different set of names for both.

Author: Michael Banck
Discussion: https://postgr.es/m/20181119181119.GC23740@nighthawk.caipicrew.dd-dns.de

9685d738

19 Nov, 2018 14 commits

Add needed #include. · cb09903f

Tom Lane authored Nov 19, 2018

Per POSIX, WIFSIGNALED and related macros are provided by <sys/wait.h>.
Apparently on Linux they're also pulled in by some other inclusions,
but BSD-ish systems are pickier.  Fixes portability issue in ffa4cbd6.

Per buildfarm.

cb09903f

Handle EPIPE more sanely when we close a pipe reading from a program. · ffa4cbd6

Tom Lane authored Nov 19, 2018

Previously, any program launched by COPY TO/FROM PROGRAM inherited the
server's setting of SIGPIPE handling, i.e. SIG_IGN. Hence, if we were
doing COPY FROM PROGRAM and closed the pipe early, the child process
would see EPIPE on its output file and typically would treat that as
a fatal error, in turn causing the COPY to report error. Similarly,
one could get a failure report from a query that didn't read all of
the output from a contrib/file_fdw foreign table that uses file_fdw's
PROGRAM option.

To fix, ensure that child programs inherit SIG_DFL not SIG_IGN processing
of SIGPIPE. This seems like an all-around better situation since if
the called program wants some non-default treatment of SIGPIPE, it would
expect to have to set that up for itself. Then in COPY, if it's COPY
FROM PROGRAM and we stop reading short of detecting EOF, treat a SIGPIPE
exit from the called program as a non-error condition. This still allows
us to report an error for any case where the called program gets SIGPIPE
on some other file descriptor.

As coded, we won't report a SIGPIPE if we stop reading as a result of
seeing an in-band EOF marker (e.g. COPY BINARY EOF marker). It's
somewhat debatable whether we should complain if the called program
continues to transmit data after an EOF marker. However, it seems like
we should avoid throwing error in any questionable cases, especially in a
back-patched fix, and anyway it would take additional code to make such
an error get reported consistently.

Back-patch to v10. We could go further back, since COPY FROM PROGRAM
has been around awhile, but AFAICS the only way to reach this situation
using core or contrib is via file_fdw, which has only supported PROGRAM
sources since v10. The COPY statement per se has no feature whereby
it'd stop reading without having hit EOF or an error already. Therefore,
I don't see any upside to back-patching further that'd outweigh the
risk of complaints about behavioral change.

Per bug #15449 from Eric Cyr.

Patch by me, review by Etsuro Fujita and Kyotaro Horiguchi

Discussion: https://postgr.es/m/15449-1cf737dd5929450e@postgresql.org

ffa4cbd6

psql: Describe partitioned tables/indexes as such · d56e0fde

Alvaro Herrera authored Nov 19, 2018

In \d and \z, instead of conflating partitioned tables and indexes with
plain ones, set the "type" column and table title differently to make
the distinction obvious. A simple ease-of-use improvement.

Author: Pavel Stehule, Michaël Paquier, Álvaro Herrera
Reviewed-by: Amit Langote
Discussion: https://postgr.es/m/CAFj8pRDMWPgijpt_vPj1t702PgLG4Ls2NCf+rEcb+qGPpossmg@mail.gmail.com

d56e0fde

Update config/ax_pthread.m4 to latest upstream version. · df303aff

Tom Lane authored Nov 19, 2018

This change doesn't fix any bugs that we've heard about, but it seems
like a good idea on general principles to track upstream occasionally.

Discussion: https://postgr.es/m/3320.1542647565@sss.pgh.pa.us

df303aff

Postpone LLVM-related uses of AC_CHECK_DECLS. · 640a4ba0

Tom Lane authored Nov 19, 2018

Calling AC_CHECK_DECLS before we've finished setting up the compiler's
CFLAGS seems like a pretty risky proposition, especially now that the
first use of that macro will result in a test to see whether the compiler
gives warning or error for undeclared built-in functions. That answer
could very easily get changed later than where PGAC_LLVM_SUPPORT is
called; furthermore, it's hardly unlikely that flags such as -D_GNU_SOURCE
could change visibility of declarations. Hence, be a little less cavalier
about where to do LLVM-related tests. This results in v11 and HEAD doing
the warning-or-error check at the same place in the script as older
branches are doing it, which seems like a good thing.

Per further thought about commits 0b59b0e8 and 16fbac39.

640a4ba0

psql: Show IP address in \conninfo · 6e5f8d48

Alvaro Herrera authored Nov 19, 2018

When hostaddr is given, the actual IP address that psql is connected to
can be totally unexpected for the given host. The more verbose output
we now generate makes things clearer. Since the "host" and "hostaddr"
parts of the conninfo could come from different sources (say, one of
them is in the service specification or a URI-style conninfo and the
other is not), this is not as silly as it may first appear. This is
also definitely useful if the hostname resolves to multiple addresses.

Author: Fabien Coelho
Reviewed-by: Pavel Stehule, Arthur Zakirov
Discussion: https://postgr.es/m/alpine.DEB.2.21.1810261532380.27686@lancre
https://postgr.es/m/alpine.DEB.2.21.1808201323020.13832@lancre

6e5f8d48

Reduce unnecessary list construction in RelationBuildPartitionDesc. · 7ee5f88e

Robert Haas authored Nov 19, 2018

The 'partoids' list which was constructed by the previous version
of this code was necessarily identical to 'inhoids'.  There's no
point to duplicating the list, so avoid that.  Instead, construct
the array representation directly from the original 'inhoids' list.

Also, use an array rather than a list for 'boundspecs'.  We know
exactly how many items we need to store, so there's really no
reason to use a list.  Using an array instead reduces the number
of memory allocations we perform.

Patch by me, reviewed by Michael Paquier and Amit Langote, the
latter of whom also helped with rebasing.

7ee5f88e

Fix configure's AC_CHECK_DECLS tests to work correctly with clang. · 16fbac39

Tom Lane authored Nov 19, 2018

The test case that Autoconf uses to discover whether a function has
been declared doesn't work reliably with clang, because clang reports
a warning not an error if the name is a known built-in function.
On some platforms, this results in a lot of compile-time warnings about
strlcpy and related functions not having been declared.

There is a fix for this (by Noah Misch) in the upstream Autoconf sources,
but since they've not made a release in years and show no indication of
doing so anytime soon, let's just absorb their fix directly. We can
revert this when and if we update to a newer Autoconf release.

Back-patch to all supported branches.

Discussion: https://postgr.es/m/26819.1542515567@sss.pgh.pa.us

16fbac39

Disallow COPY FREEZE on partitioned tables · 5c9a5513

Alvaro Herrera authored Nov 19, 2018

This didn't actually work: COPY would fail to flush the right files, and
instead would try to flush a non-existing file, causing the whole
transaction to fail.

Cope by raising an error as soon as the command is sent instead, to
avoid a nasty later surprise.  Of course, it would be much better to
make it work, but we don't have a patch for that yet, and we don't know
if we'll want to backpatch one when we do.

Reported-by: Tomas Vondra
Author: David Rowley
Reviewed-by: Amit Langote, Steve Singer, Tomas Vondra

5c9a5513

pg_archivecleanup: Update file header comment a bit · fc47e99a
Peter Eisentraut authored Nov 19, 2018

fc47e99a

PANIC on fsync() failure. · 9ccdd7f6

Thomas Munro authored Nov 19, 2018

On some operating systems, it doesn't make sense to retry fsync(),
because dirty data cached by the kernel may have been dropped on
write-back failure.  In that case the only remaining copy of the
data is in the WAL.  A subsequent fsync() could appear to succeed,
but not have flushed the data.  That means that a future checkpoint
could apparently complete successfully but have lost data.

Therefore, violently prevent any future checkpoint attempts by
panicking on the first fsync() failure.  Note that we already
did the same for WAL data; this change extends that behavior to
non-temporary data files.

Provide a GUC data_sync_retry to control this new behavior, for
users of operating systems that don't eject dirty data, and possibly
forensic/testing uses.  If it is set to on and the write-back error
was transient, a later checkpoint might genuinely succeed (on a
system that does not throw away buffers on failure); if the error is
permanent, later checkpoints will continue to fail.  The GUC defaults
to off, meaning that we panic.

Back-patch to all supported releases.

There is still a narrow window for error-loss on some operating
systems: if the file is closed and later reopened and a write-back
error occurs in the intervening time, but the inode has the bad
luck to be evicted due to memory pressure before we reopen, we could
miss the error.  A later patch will address that with a scheme
for keeping files with dirty data open at all times, but we judge
that to be too complicated to back-patch.

Author: Craig Ringer, with some adjustments by Thomas Munro
Reported-by: Craig Ringer
Reviewed-by: Robert Haas, Thomas Munro, Andres Freund
Discussion: https://postgr.es/m/20180427222842.in2e4mibx45zdth5%40alap3.anarazel.de

9ccdd7f6

Don't forget about failed fsync() requests. · 1556cb2f

Thomas Munro authored Nov 19, 2018

If fsync() fails, md.c must keep the request in its bitmap, so that
future attempts will try again.

Back-patch to all supported releases.

Author: Thomas Munro
Reviewed-by: Amit Kapila
Reported-by: Andrew Gierth
Discussion: https://postgr.es/m/87y3i1ia4w.fsf%40news-spur.riddles.org.uk

1556cb2f

Rework documentation of pg_promote · 74171f8c

Michael Paquier authored Nov 19, 2018

This clarifies the behavior of how the "wait" flag works, which is
something that the previous version of the documentation failed to do.

Author: Ian Barwick
Discussion: https://postgr.es/m/cbd38450-2295-10a1-1f73-591a692ae0b0@2ndquadrant.com

74171f8c

Remove unnecessary memcpy when reading WAL record fitting on page · 285bd0ac

Michael Paquier authored Nov 19, 2018

When reading a WAL record, its contents are copied into an intermediate
buffer.  However, doing so is not necessary if the record fits fully
into the current page, saving one memcpy for each such record.  The
allocation handling of the intermediate buffer is also now done only
when a record crosses a page boundary, shaving some extra cycles when
reading a WAL record.

Author: Andrey Lepikhov
Reviewed-by: Kyotaro Horiguchi, Heikki Linnakangas
Discussion: https://postgr.es/m/c2ea54dd-a1d3-80eb-ddbf-7e6f258e615e@postgrespro.ru

285bd0ac

18 Nov, 2018 3 commits

fix typo · 79376e07
Andrew Dunstan authored Nov 18, 2018

79376e07

Silence MSVC warnings about redefinition of isnan · d5d7f7f3

Andrew Dunstan authored Nov 18, 2018

Some versions of perl.h define isnan when the compiler is MSVC. To avoid
warnings about this, undefine the symbol before including perl.h and
re-add the definition afterwards if it wasn't recreated.

Discussion: https://postgr.es/m/caf0568e-3c1f-07fd-6914-d903f22560f2@2ndQuadrant.com

d5d7f7f3

Fix AC_REQUIRES breakage in LLVM autoconf tests. · 0b59b0e8

Tom Lane authored Nov 17, 2018

Any Autoconf macro that uses AC_REQUIRES -- directly or indirectly --
must not be inside a plain shell "if" test; if it is, whatever code
gets pulled in by the AC_REQUIRES will also be inside that "if".
Instead of "if" we can use AS_IF, which knows how to get this right
(cf commit 01051a98).

The only immediate problem from getting this wrong was that AC_PROG_AWK
had to be run twice, once inside the "if llvm" block and once in the
main line.  However, it broke a different patch I'm about to submit
more thoroughly.

0b59b0e8

17 Nov, 2018 5 commits

Add valgrind suppressions for wcsrtombs optimizations · d3bbc4b9

Tomas Vondra authored Nov 17, 2018

wcsrtombs (called through wchar2char from common functions like lower,
upper, etc.) uses various optimizations that may look like access to
uninitialized data, triggering valgrind reports.

For example AVX2 instructions load data in 256-bit chunks, and  gconv
does something similar with 32-bit chunks.  This is faster than accessing
the bytes one by one, and the uninitialized part of the buffer is not
actually used. So suppress the bogus reports.

The exact stack depends on possible optimizations - it might be AVX, SSE
(as in the report by Aleksander Alekseev) or something else. Hence the
last frame is wildcarded, to deal with this.

Backpatch all the way back to 9.4.

Author: Tomas Vondra
Discussion: https://www.postgresql.org/message-id/flat/90ac0452-e907-e7a4-b3c8-15bd33780e62%402ndquadrant.com
Discussion: https://www.postgresql.org/message-id/20180220150838.GD18315@e733.localdomain

d3bbc4b9

Avoid defining SIGTTIN/SIGTTOU on Windows. · 37afc079

Tom Lane authored Nov 17, 2018

Setting them to SIG_IGN seems unlikely to have any beneficial effect
on that platform, and given the signal numbering collision with SIGABRT,
it could easily have bad effects.

Given the lack of field complaints that can be traced to this, I don't
presently feel a need to back-patch.

Discussion: https://postgr.es/m/5627.1542477392@sss.pgh.pa.us

37afc079

Leave SIGTTIN/SIGTTOU signal handling alone in postmaster child processes. · 125f551c

Tom Lane authored Nov 17, 2018

For reasons lost in the mists of time, most postmaster child processes
reset SIGTTIN/SIGTTOU signal handling to SIG_DFL, with the major exception
that backend sessions do not.  It seems like a pretty bad idea for any
postmaster children to do that: if stderr is connected to the terminal,
and the user has put the postmaster in background, any log output would
result in the child process freezing up.  Hence, switch them all to
doing what backends do, ie, nothing.  This allows them to inherit the
postmaster's SIG_IGN setting.  On the other hand, manually-launched
processes such as standalone backends will have default processing,
which seems fine.

In passing, also remove useless resets of SIGCONT and SIGWINCH signal
processing.  Perhaps the postmaster once changed those to something
besides SIG_DFL, but it doesn't now, so these are just wasted (and
confusing) syscalls.

Basically, this propagates the changes made in commit 8e2998d8 from
backends to other postmaster children.  Probably the only reason these
calls now exist elsewhere is that I missed changing pgstat.c along with
postgres.c at the time.

Given the lack of field complaints that can be traced to this, I don't
presently feel a need to back-patch.

Discussion: https://postgr.es/m/5627.1542477392@sss.pgh.pa.us

125f551c

Fix some spurious new compiler warnings in MSVC. · 73616126

Andres Freund authored Nov 17, 2018

Per buildfarm animal bowerbird.

Discussion: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=bowerbird&dt=2018-11-17%2002%3A30%3A20

73616126

Make TupleTableSlots extensible, finish split of existing slot type. · 4da597ed

Andres Freund authored Nov 16, 2018

This commit completes the work prepared in 1a0586de, splitting the
old TupleTableSlot implementation (which could store buffer, heap,
minimal and virtual slots) into four different slot types. As
described in the aforementioned commit, this is done with the goal of
making tuple table slots extensible, to allow for pluggable table
access methods.

To achieve runtime extensibility for TupleTableSlots, operations on
slots that can differ between types of slots are performed using the
TupleTableSlotOps struct provided at slot creation time. That
includes information from the size of TupleTableSlot struct to be
allocated, initialization, deforming etc. See the struct's definition
for more detailed information about callbacks TupleTableSlotOps.

I decided to rename TTSOpsBufferTuple to TTSOpsBufferHeapTuple and
ExecCopySlotTuple to ExecCopySlotHeapTuple, as that seems more
consistent with other naming introduced in recent patches.

There's plenty optimization potential in the slot implementation, but
according to benchmarking the state after this commit has similar
performance characteristics to before this set of changes, which seems
sufficient.

There's a few changes in execReplication.c that currently need to poke
through the slot abstraction, that'll be repaired once the pluggable
storage patchset provides the necessary infrastructure.

Author: Andres Freund and Ashutosh Bapat, with changes by Amit Khandekar
Discussion: https://postgr.es/m/20181105210039.hh4vvi4vwoq5ba2q@alap3.anarazel.de

4da597ed

16 Nov, 2018 6 commits

Avoid re-typedef'ing PartitionTupleRouting · 0201d79a
Alvaro Herrera authored Nov 16, 2018
```
Apparently, gcc on macOS (?) doesn't like it.

Per buildfarm.
```
0201d79a

pgbench: introduce a RandomState struct · 40923191

Alvaro Herrera authored Nov 16, 2018

This becomes useful when used to retry a transaction after a
serialization error or deadlock abort.  (We don't yet have that feature,
but this is preparation for it.)

While at it, use separate random state for thread administratrivia such
as deciding which script to run, how long to delay for throttling, or
whether to log a message when sampling; this not only makes these tasks
independent of each other, but makes the actual thread run
deterministic.

Author: Marina Polyakova
Reviewed-by: Fabien Coelho
Discussion: https://postgr.es/m/72a0d590d6ba06f242d75c2e641820ec@postgrespro.ru

40923191

Inline hot path of slot_getsomeattrs(). · a7aa608e

Andres Freund authored Nov 16, 2018

This yields a minor speedup, which roughly balances the loss from the
upcoming introduction of callbacks to do some operations on slots.

Author: Andres Freund
Discussion: https://postgr.es/m/20181105210039.hh4vvi4vwoq5ba2q@alap3.anarazel.de

a7aa608e

Redesign initialization of partition routing structures · 3f2393ed

Alvaro Herrera authored Nov 16, 2018

This speeds up write operations (INSERT, UPDATE, DELETE, COPY, as well
as the future MERGE) on partitioned tables.

This changes the setup for tuple routing so that it does far less work
during the initial setup and pushes more work out to when partitions
receive tuples. PartitionDispatchData structs for sub-partitioned
tables are only created when a tuple gets routed through it. The
possibly large arrays in the PartitionTupleRouting struct have largely
been removed. The partitions[] array remains but now never contains any
NULL gaps. Previously the NULLs had to be skipped during
ExecCleanupTupleRouting(), which could add a large overhead to the
cleanup when the number of partitions was large. The partitions[] array
is allocated small to start with and only enlarged when we route tuples
to enough partitions that it runs out of space. This allows us to keep
simple single-row partition INSERTs running quickly. Redesign

The arrays in PartitionTupleRouting which stored the tuple translation maps
have now been removed. These have been moved out into a
PartitionRoutingInfo struct which is an additional field in ResultRelInfo.

The find_all_inheritors() call still remains by far the slowest part of
ExecSetupPartitionTupleRouting(). This commit just removes the other slow
parts.

In passing also rename the tuple translation maps from being ParentToChild
and ChildToParent to being RootToPartition and PartitionToRoot. The old
names mislead you into thinking that a partition of some sub-partitioned
table would translate to the rowtype of the sub-partitioned table rather
than the root partitioned table.

Authors: David Rowley and Amit Langote, heavily revised by Álvaro Herrera
Testing help from Jesper Pedersen and Kato Sho.
Discussion: https://postgr.es/m/CAKJS1f_1RJyFquuCKRFHTdcXqoPX-PYqAd7nz=GVBwvGh4a6xA@mail.gmail.com

3f2393ed

Fix slot type assumptions for nodeGather[Merge]. · a387a3df

Andres Freund authored Nov 15, 2018

The assumption made in 1a0586de was wrong, as evidenced by
buildfarm failure on locust, which runs with
force_parallel_mode=regress.  The tuples accessed in either nodes are
in the outer slot, and we can't trivially rely on the slot type being
known because the leader might execute the subsidiary node directly,
or via the tuple queue on a worker. In the latter case the tuple will
always be a heaptuple slot, but in the former, it'll be whatever the
subsidiary node returns.

a387a3df

Add dummy field to currently empty struct TupleTableSlotOps. · f92cd739
Andres Freund authored Nov 15, 2018
```
Per MSVC complaint on buildfarm member dory.
```
f92cd739