1. 30 Apr, 2017 2 commits
    • Fix possible null pointer dereference or invalid warning message. · 12d11432
      Tom Lane authored
      Thinko in commit de438971: this warning message references the wrong
      "LogicalRepWorker *" variable.  This would often result in a core dump,
      but if it didn't, the message would show the wrong subscription OID.
      
      In passing, adjust the message text to format a subscription OID
      similarly to how that's done elsewhere in the function; and fix
      grammatical issues in some nearby messages.
      
      Per Coverity testing.
    • Micro-optimize some slower queries in the opr_sanity regression test. · c2384421
      Tom Lane authored
      Convert the binary_coercible() and physically_coercible() functions from
      SQL to plpgsql.  It's not that plpgsql is inherently better at doing
      queries; if you simply convert the previous single SQL query into one
      RETURN expression, it's no faster.  The problem with the existing code
      is that it fools the plancache into deciding that it's worth re-planning
      the query every time, since constant-folding with a concrete value for $2
      allows elimination of at least one sub-SELECT.  In reality that's using the
      planner to do the equivalent of a few runtime boolean tests, causing the
      function to run much slower than it should.  Splitting the AND/OR logic
      into separate plpgsql statements allows each if-expression to acquire a
      static plan.
      
      Also, get rid of some uses of obj_description() in favor of explicitly
      joining to pg_description, allowing the joins to be optimized better.
      (Someday we might improve the SQL-function-inlining logic enough that
      this happens automatically, but today is not that day.)
      
      Together, these changes reduce the runtime of the opr_sanity regression
      test by about a factor of two on one of my slower machines.  They don't
      seem to help as much on a fast machine, but this should at least benefit
      the buildfarm.
  2. 28 Apr, 2017 9 commits
  3. 27 Apr, 2017 12 commits
    • Don't build full initial logical decoding snapshot if NOEXPORT_SNAPSHOT. · ab9c4338
      Andres Freund authored
      Earlier commits (56e19d93 and 2bef06d5) made it cheaper to
      create a logical slot if not exporting the initial snapshot.  With
      NOEXPORT_SNAPSHOT specified, we can now skip that overhead too, not
      just when creating a slot via SQL (which can't export snapshots).  As
      NOEXPORT_SNAPSHOT has only recently been introduced, this shouldn't be
      backpatched.
    • Don't use on-disk snapshots for exported logical decoding snapshot. · 56e19d93
      Andres Freund authored
      Logical decoding stores historical snapshots on disk, so that logical
      decoding can restart without having to reconstruct a snapshot from
      scratch (for which the resources are not guaranteed to be present
      anymore).  These serialized snapshots were also used when creating a
      new slot via the walsender interface, which can export a "full"
      snapshot (i.e. one that can read all tables, not just catalog ones).
      
      The problem is that the serialized snapshots are only useful for
      catalogs and not for normal user tables.  Thus the use of such a
      serialized snapshot could result in an inconsistent snapshot being
      exported, which could lead to queries returning wrong data.  This
      would only happen if logical slots are created while another logical
      slot already exists.
      
      Author: Petr Jelinek
      Reviewed-By: Andres Freund
      Discussion: https://postgr.es/m/f37e975c-908f-858e-707f-058d3b1eb214@2ndquadrant.com
      Backport: 9.4, where logical decoding was introduced.
    • Avoid slow shutdown of pg_basebackup. · 7834d20b
      Tom Lane authored
      pg_basebackup's child process did not pay any attention to the pipe
      from its parent while waiting for input from the source server.
      If no server data was arriving, it would only wake up and check the
      pipe every standby_message_timeout or so.  This creates a problem
      since the parent process might determine and send the desired stop
      position only after the server has reached end-of-WAL and stopped
      sending data.  In the src/test/recovery regression tests, the timing
      is repeatably such that it takes nearly 10 seconds for the child
      process to realize that it should shut down.  It's not clear how
      often that would happen in real-world cases, but it sure seems like
      a bug --- and if the user turns off standby_message_timeout or sets
      it very large, the delay could be a lot worse.
      
      To fix, expand the StreamCtl API to allow the pipe input FD to be
      passed down to the low-level wait routine, and watch both sockets
      when sleeping.
      
      (Note: AFAICS this issue doesn't affect the Windows port, since
      it doesn't rely on a pipe to transfer the stop position to the
      child thread.)
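The fix amounts to waiting on both file descriptors at once. A minimal Python sketch of the idea (not the actual StreamCtl C code; the function name and fds here are illustrative):

```python
import os
import select

def wait_for_input(server_fd, pipe_fd, timeout_s):
    # Sleep until either the server connection or the parent's pipe is
    # readable; previously only server_fd was watched, so a stop request
    # on the pipe went unnoticed until the timeout expired.
    readable, _, _ = select.select([server_fd, pipe_fd], [], [], timeout_s)
    return readable

if __name__ == "__main__":
    # A silent "server" fd and a parent pipe that becomes readable.
    server_r, _server_w = os.pipe()
    pipe_r, pipe_w = os.pipe()
    os.write(pipe_w, b"stop")          # parent announces the stop position
    ready = wait_for_input(server_r, pipe_r, 1.0)
    print(pipe_r in ready)             # wakes immediately: True
```

With only the server fd in the wait set, this call would instead block for the full timeout.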
      
      Discussion: https://postgr.es/m/6456.1493263884@sss.pgh.pa.us
    • Fix bug so logical rep launcher correctly saves time of last worker startup. · 9f11fcec
      Fujii Masao authored
      Previously the logical replication launcher stored the timestamp of
      the last worker startup in the local variable "last_start_time",
      in order to check whether wal_retrieve_retry_interval has elapsed
      since that startup. If it has, the launcher scans pg_subscription
      and starts a new worker if necessary. This limits worker startups
      to once per wal_retrieve_retry_interval.
      
      The bug was that the variable "last_start_time" was declared and
      initialized to 0 at the beginning of the launcher's main loop, so
      even if it was set to the last startup timestamp later in the loop,
      it was reset to 0 on the next iteration. Therefore the launcher
      could not correctly check whether wal_retrieve_retry_interval had
      elapsed since the last startup.
      
      This patch moves the variable "last_start_time" outside the main loop
      so that it will not be reset.
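The scoping bug is easy to reproduce in miniature. A hedged Python sketch (variable and constant names are illustrative, not the launcher's actual C identifiers):

```python
WAL_RETRIEVE_RETRY_INTERVAL = 5.0  # assumed interval, in seconds

def launcher_loop(now, iterations):
    # Correct placement: initialized once, *outside* the loop, so the
    # timestamp of the last worker start survives across iterations.
    last_start_time = 0.0
    started = 0
    for _ in range(iterations):
        # The buggy version re-declared `last_start_time = 0.0` here,
        # so the elapsed-time check always saw "interval elapsed" and
        # a worker was started on every iteration.
        if now - last_start_time >= WAL_RETRIEVE_RETRY_INTERVAL:
            last_start_time = now   # remember when we started a worker
            started += 1
    return started

print(launcher_loop(100.0, 10))  # 1: only the first iteration starts a worker
```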
      
      Reviewed-by: Petr Jelinek
      Discussion: http://postgr.es/m/CAHGQGwGJrPO++XM4mFENAwpy1eGXKsGdguYv43GUgLgU-x8nTQ@mail.gmail.com
    • Cope with glibc too old to have epoll_create1(). · 82ebbeb0
      Tom Lane authored
      Commit fa31b6f4 supposed that we didn't have to worry about that
      anymore, but it seems that RHEL5 is like that, and that's still
      a supported platform.  Put back the prior coding under an #ifdef,
      adding an explicit fcntl() to retain the desired CLOEXEC property.
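The fallback pattern — create the descriptor, then set close-on-exec explicitly since it couldn't be requested atomically at creation — can be sketched in Python's fcntl terms (a conceptual illustration, not the latch.c C code; the pipe stands in for the epoll descriptor):

```python
import fcntl
import os

def set_cloexec(fd):
    # Fallback path: set FD_CLOEXEC explicitly when the descriptor could
    # not be created with the flag atomically (no epoll_create1()).
    flags = fcntl.fcntl(fd, fcntl.F_GETFD)
    fcntl.fcntl(fd, fcntl.F_SETFD, flags | fcntl.FD_CLOEXEC)

r, w = os.pipe()
os.set_inheritable(r, True)    # simulate an fd created without CLOEXEC
set_cloexec(r)
print(bool(fcntl.fcntl(r, fcntl.F_GETFD) & fcntl.FD_CLOEXEC))  # True
```

Note the non-atomic window between creation and the fcntl() call, which is exactly why epoll_create1(EPOLL_CLOEXEC) is preferred where available.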
      
      Discussion: https://postgr.es/m/12307.1493325329@sss.pgh.pa.us
    • Preserve required !catalog tuples while computing initial decoding snapshot. · 2bef06d5
      Andres Freund authored
      The logical decoding machinery already preserved all the required
      catalog tuples, which is sufficient in the course of normal logical
      decoding, but did not guarantee that non-catalog tuples were preserved
      during computation of the initial snapshot when creating a slot over
      the replication protocol.
      
      This could cause a corrupted initial snapshot to be exported.  The
      time window for issues is usually not terribly large, but on a busy
      server it's perfectly possible to hit it.  Ongoing decoding is not
      affected by this bug.
      
      To avoid increased overhead for the SQL API, only retain additional
      tuples when a logical slot is being created over the replication
      protocol.  To do so this commit changes the signature of
      CreateInitDecodingContext(), but it seems unlikely that it's being
      used in an extension, so that's probably ok.
      
      In a drive-by fix, fix handling of
      ReplicationSlotsComputeRequiredXmin's already_locked argument, which
      should only apply to ProcArrayLock, not ReplicationSlotControlLock.
      
      Reported-By: Erik Rijkers
      Analyzed-By: Petr Jelinek
      Author: Petr Jelinek, heavily editorialized by Andres Freund
      Reviewed-By: Andres Freund
      Discussion: https://postgr.es/m/9a897b86-46e1-9915-ee4c-da02e4ff6a95@2ndquadrant.com
      Backport: 9.4, where logical decoding was introduced.
    • Make latch.c more paranoid about child-process cases. · fa31b6f4
      Tom Lane authored
      Although the postmaster doesn't currently create a self-pipe or any
      latches, there's discussion of it doing so in future.  It's also
      conceivable that a shared_preload_libraries extension would try to
      create such a thing in the postmaster process today.  In that case
      the self-pipe FDs would be inherited by forked child processes.
      latch.c was entirely unprepared for such a case and could suffer an
      assertion failure, or worse, try to use the inherited pipe if somebody
      called WaitLatch without having called InitializeLatchSupport in that
      process.  Make it keep track of whether InitializeLatchSupport has been
      called in the *current* process, and do the right thing if state has
      been inherited from a parent.
      
      Apply FD_CLOEXEC to file descriptors created in latch.c (the self-pipe,
      as well as epoll event sets).  This ensures that child processes spawned
      in backends, the archiver, etc cannot accidentally or intentionally mess
      with these FDs.  It also ensures that we end up with the right state
      for the self-pipe in EXEC_BACKEND processes, which otherwise wouldn't
      know to close the postmaster's self-pipe FDs.
      
      Back-patch to 9.6, mainly to keep latch.c looking similar in all branches
      it exists in.
      
      Discussion: https://postgr.es/m/8322.1493240739@sss.pgh.pa.us
    • doc: PG10 release note typo fix · a311d2a0
      Bruce Momjian authored
      Reported-by: daniel.westermann
    • doc PG10rel: adjust hash index commits and add parallel subquery · f8ab08ad
      Bruce Momjian authored
      Reported-by: Amit Kapila
    • Rework handling of subtransactions in 2PC recovery · 49e92815
      Simon Riggs authored
      The bug fixed by 0874d4f3
      caused us to question and rework the handling of
      subtransactions in 2PC during and at the end of recovery.
      This patch adds checks and tests to guard against further bugs.
      
      This effectively removes the temporary measure put in place
      by 546c13e1.
      
      Author: Simon Riggs
      Reviewed-by: Tom Lane, Michael Paquier
      Discussion: http://postgr.es/m/CANP8+j+vvXmruL_i2buvdhMeVv5TQu0Hm2+C5N+kdVwHJuor8w@mail.gmail.com
    • Additional tests for subtransactions in recovery · 0352c15e
      Simon Riggs authored
      Tests for normal and prepared transactions
      
      Author: Nikhil Sontakke, placed in new test file by me
    • Fix typo in comment · 6c9bd27a
      Peter Eisentraut authored
      Author: Masahiko Sawada <sawada.mshk@gmail.com>
  4. 26 Apr, 2017 10 commits
    • Allow multiple bgworkers to be launched per postmaster iteration. · aa1351f1
      Tom Lane authored
      Previously, maybe_start_bgworker() would launch at most one bgworker
      process per call, on the grounds that the postmaster might otherwise
      neglect its other duties for too long.  However, that seems overly
      conservative, especially since bad effects only become obvious when
      many hundreds of bgworkers need to be launched at once.  On the other
      side of the coin is that the existing logic could result in substantial
      delay of bgworker launches, because ServerLoop isn't guaranteed to
      iterate immediately after a signal arrives.  (My attempt to fix that
      by using pselect(2) encountered too many portability question marks,
      and in any case could not help on platforms without pselect().)
      One could also question the wisdom of using an O(N^2) processing
      method if the system is intended to support so many bgworkers.
      
      As a compromise, allow that function to launch up to 100 bgworkers
      per call (and in consequence, rename it to maybe_start_bgworkers).
      This will allow any normal parallel-query request for workers
      to be satisfied immediately during sigusr1_handler, avoiding the
      question of whether ServerLoop will be able to launch more promptly.
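The compromise is just a cap on work done per invocation. A minimal Python sketch of the control flow (names are illustrative; the real function walks the registered-worker list in C):

```python
MAX_BGWORKERS_TO_LAUNCH = 100  # the per-call cap chosen by this commit

def maybe_start_bgworkers(pending):
    # Launch queued workers, but at most 100 per call, so the postmaster
    # still returns promptly to its other duties instead of draining an
    # arbitrarily long queue in one go.
    launched = []
    while pending and len(launched) < MAX_BGWORKERS_TO_LAUNCH:
        launched.append(pending.pop(0))
    return launched
```

A typical parallel-query request needs only a handful of workers, so it is satisfied in a single call; only pathological backlogs span multiple postmaster iterations.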
      
      There is talk of rewriting the postmaster to use a WaitEventSet to
      avoid the signal-response-delay problem, but I'd argue that this change
      should be kept even after that happens (if it ever does).
      
      Backpatch to 9.6 where parallel query was added.  The issue exists
      before that, but previous uses of bgworkers typically aren't as
      sensitive to how quickly they get launched.
      
      Discussion: https://postgr.es/m/4707.1493221358@sss.pgh.pa.us
    • fda4fec5
      Bruce Momjian authored
    • pg_get_partkeydef: return NULL for non-partitions · 0c76c246
      Stephen Frost authored
      Our general rule for pg_get_X(oid) functions is to simply return NULL
      when passed an invalid or inappropriate OID.  Teach pg_get_partkeydef to
      do this also, making it easier for users to use this function when
      querying against tables with both partitions and non-partitions (such as
      pg_class).
      
      As a concrete example, this makes pg_dump's life a little easier.
      
      Author: Amit Langote
    • Silence compiler warning induced by commit de438971. · 49da0067
      Tom Lane authored
      Smarter compilers can see that "slot" can't be used uninitialized,
      but some popular ones cannot.  Noted by Jeff Janes.
    • doc: ALTER SUBSCRIPTION documentation fixes · e315346d
      Peter Eisentraut authored
      WITH is optional for REFRESH PUBLICATION.  Also, remove a spurious
      bracket and fix some punctuation.
      
      Author: Euler Taveira <euler@timbira.com.br>
    • Fix query that gets remote relation info · 61ecc90b
      Peter Eisentraut authored
      The publisher relation could be chosen incorrectly if there is more
      than one relation with the same name in different schemas.
      
      Author: Euler Taveira <euler@timbira.com.br>
    • Spelling fixes in code comments · e495c168
      Peter Eisentraut authored
      Author: Euler Taveira <euler@timbira.com.br>
    • Fix typo in comment. · 1f8b0601
      Fujii Masao authored
      Author: Masahiko Sawada
    • Fix various concurrency issues in logical replication worker launching · de438971
      Peter Eisentraut authored
      The code was originally written with the assumption that the launcher
      is the only process starting workers.  However, that hasn't been true
      since commit 7c4f5240, which failed to modify the worker management
      code adequately.
      
      This patch adds an in_use field to the LogicalRepWorker struct to
      indicate whether the worker slot is being used and uses proper locking
      everywhere this flag is set or read.
      
      However, if the parent process dies while the new worker is starting
      and the new worker fails to attach to shared memory, this flag would
      never get cleared.  We solve this rare corner case by adding a sort
      of garbage collector for in_use slots.  This uses another field in
      the LogicalRepWorker struct, launch_time, that records when the
      worker was started.  If a request to start a new worker does not
      find a free slot, we check for workers that were supposed to start
      but took too long to actually do so, and reuse their slot.
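The slot-reclamation logic can be sketched as follows — a hedged Python illustration of the in_use/launch_time scheme, not the actual C struct or locking (the grace-period constant and function names are assumptions):

```python
from dataclasses import dataclass

LAUNCH_GRACE_PERIOD_S = 10.0  # assumed attach timeout, for illustration

@dataclass
class WorkerSlot:
    in_use: bool = False      # slot claimed for a worker
    attached: bool = False    # worker actually attached to shared memory
    launch_time: float = 0.0  # when the worker was started

def claim_slot(slots, now):
    # First look for a genuinely free slot.
    for slot in slots:
        if not slot.in_use:
            slot.in_use, slot.launch_time = True, now
            return slot
    # None free: "garbage collect" a slot whose worker was launched long
    # ago but never attached (e.g. its parent died during startup).
    for slot in slots:
        if not slot.attached and now - slot.launch_time > LAUNCH_GRACE_PERIOD_S:
            slot.in_use, slot.launch_time = True, now  # reuse the stale slot
            return slot
    return None  # no slot available
```

In the real code these checks are of course done under the lock protecting the worker array.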
      
      In passing also fix possible race conditions when stopping a worker that
      hasn't finished starting yet.
      
      Author: Petr Jelinek <petr.jelinek@2ndquadrant.com>
      Reported-by: Fujii Masao <masao.fujii@gmail.com>
    • doc PG10: add Rafia Sabih to parallel index scan item · 309191f6
      Bruce Momjian authored
      Reported-by: Amit Kapila
  5. 25 Apr, 2017 7 commits