Commits · 6b3094c26f9ec8688d802e71562e9be714cfe6ac · Abuhujair Javed / Postgres FD Implementation

28 May, 2016 1 commit

Lots of comment-fixing, and minor cosmetic cleanup, in pg_dump/parallel.c. · 6b3094c2

Tom Lane authored May 28, 2016

The commentary in this file was in extremely sad shape. The author(s)
had clearly never heard of the project convention that a function header
comment should provide an API spec of some sort for that function. Much
of it was flat out wrong, too --- maybe it was accurate when written, but
if so it had not been updated to track subsequent code revisions. Rewrite
and rearrange to try to bring it up to speed, and annotate some of the
places where more work is needed. (I've refrained from actually fixing
anything of substance ... yet.)

Also, rename a couple of functions for more clarity as to what they do,
do some very minor code rearrangement, remove some pointless Asserts,
fix an incorrect Assert in readMessageFromPipe, and add a missing socket
close in one error exit from pgpipe(). The last would be a bug if we
tried to continue after pgpipe() failure, but since we don't, it's just
cosmetic at present.

Although this is only cosmetic, back-patch to 9.3 where parallel.c was
added. It's sufficiently invasive that it'll pose a hazard for future
back-patching if we don't.

Discussion: <25239.1464386067@sss.pgh.pa.us>

6b3094c2

27 May, 2016 4 commits

Clean up thread management in parallel pg_dump for Windows. · 807b4537

Tom Lane authored May 27, 2016

Since we start the worker threads with _beginthreadex(), we should use
_endthreadex() to terminate them.  We got this right in the normal-exit
code path, but not so much during an error exit from a worker.
In addition, be sure to apply CloseHandle to the thread handle after
each thread exits.

It's not clear that these oversights cause any user-visible problems,
since the pg_dump run is about to terminate anyway.  Still, it's clearly
better to follow Microsoft's API specifications than ignore them.

Also a few cosmetic cleanups in WaitForTerminatingWorkers(), including
being a bit less random about where to cast between uintptr_t and HANDLE,
and being sure to clear the worker identity field for each dead worker
(not that false matches should be possible later, but let's be careful).

Original observation and patch by Armin Schöffmann, cosmetic improvements
by Michael Paquier and me.  (Armin's patch also included closing sockets
in ShutdownWorkersHard(), but that's been dealt with already in commit
df8d2d8c.)  Back-patch to 9.3 where parallel pg_dump was introduced.

Discussion: <zarafa.570306bd.3418.074bf1420d8f2ba2@root.aegaeon.de>

807b4537

Fix release-note typo. · d81ecb9b
Tom Lane authored May 27, 2016
```
Léonard Benedetti
```
d81ecb9b

Fix DROP ACCESS METHOD IF EXISTS. · 83dbde94

Tom Lane authored May 27, 2016

The IF EXISTS option was documented, and implemented in the grammar, but
it didn't actually work for lack of support in does_not_exist_skipping().
Per bug #14160.

Report and patch by Kouhei Sutou

Report: <20160527070433.19424.81712@wrigleys.postgresql.org>

83dbde94

Be more predictable about reporting "lock timeout" vs "statement timeout". · 9dd4178c

Tom Lane authored May 27, 2016

If both timeout indicators are set when we arrive at ProcessInterrupts,
we've historically just reported "lock timeout". However, some buildfarm
members have been observed to fail isolationtester's timeouts test by
reporting "lock timeout" when the statement timeout was expected to fire
first. The cause seems to be that the process is allowed to sleep longer
than expected (probably due to heavy machine load) so that the lock
timeout happens before we reach the point of reporting the error, and
then this arbitrary tiebreak rule does the wrong thing. We can improve
matters by comparing the scheduled timeout times to decide which error
to report.

I had originally proposed greatly reducing the 1-second window between
the two timeouts in the test cases. On reflection that is a bad idea,
at least for the case where the lock timeout is expected to fire first,
because that would assume that it takes negligible time to get from
statement start to the beginning of the lock wait. Thus, this patch
doesn't completely remove the risk of test failures on slow machines.
Empirically, however, the case this handles is the one we are seeing
in the buildfarm. The explanation may be that the other case requires
the scheduler to take the CPU away from a busy process, whereas the
case fixed here only requires the scheduler to not give the CPU back
right away to a process that has been woken from a multi-second sleep
(and, perhaps, has been swapped out meanwhile).

Back-patch to 9.3 where the isolationtester timeouts test was added.

Discussion: <8693.1464314819@sss.pgh.pa.us>

9dd4178c

26 May, 2016 5 commits

Make pg_dump error cleanly with -j against hot standby · d74048de

Magnus Hagander authored May 26, 2016

Getting a synchronized snapshot is not supported on a hot standby node,
and is by default taken when using -j with multiple sessions. Trying to
do so still failed, but with a server error that would also go in the
log. Instead, proprely detect this case and give a better error message.

d74048de

Disable physical tlist if any Var would need multiple sortgroupref labels. · aeb9ae64

Tom Lane authored May 26, 2016

As part of upper planner pathification (commit 3fc6e2d7) I redid
createplan.c's approach to the physical-tlist optimization, in which scan
nodes are allowed to return exactly the underlying table's columns so as
to save doing a projection step at runtime. The logic was intentionally
more aggressive than before about applying the optimization, which is
generally a good thing, but Andres Freund found a case in which it got
too aggressive. Namely, if any column is referenced more than once in
the parent plan node's sorting or grouping column list, we can't optimize
because then that column would need to have more than one ressortgroupref
label, and we only have space for one.

Add logic to detect this situation in use_physical_tlist(), and also add
some error checking in apply_pathtarget_labeling_to_tlist(), which this
example proves was being overly cavalier about whether what it was doing
made any sense.

The added test case exposes the problem only because we do not eliminate
duplicate grouping keys. That might be something to fix someday, but it
doesn't seem like appropriate post-beta work.

Report: <20160526021235.w4nq7k3gnheg7vit@alap3.anarazel.de>

aeb9ae64

Fix typo in 9.5 release nodes · d7ef3572
Alvaro Herrera authored May 26, 2016
```
Noted by 星合 拓馬 (HOSHIAI Takuma)
```
d7ef3572

Make pg_dump behave more sanely when built without HAVE_LIBZ. · cae2bb19

Tom Lane authored May 26, 2016

For some reason the code to emit a warning and switch to uncompressed
output was placed down in the guts of pg_backup_archiver.c. This is
definitely too late in the case of parallel operation (and I rather
wonder if it wasn't too late for other purposes as well). Put it in
pg_dump.c's option-processing logic, which seems a much saner place.

Also, the default behavior with custom or directory output format was
to emit the warning telling you the output would be uncompressed. This
seems unhelpful, so silence that case.

Back-patch to 9.3 where parallel dump was introduced.

Kyotaro Horiguchi, adjusted a bit by me

Report: <20160526.185551.242041780.horiguchi.kyotaro@lab.ntt.co.jp>

cae2bb19

In Windows pg_dump, ensure idle workers will shut down during error exit. · df8d2d8c

Tom Lane authored May 26, 2016

The Windows coding of ShutdownWorkersHard() thought that setting termEvent
was sufficient to make workers exit after an error.  But that only helps
if a worker is busy and passes through checkAborting().  An idle worker
will just sit, resulting in pg_dump failing to exit until the user gives up
and hits control-C.  We should close the write end of the command pipe
so that idle workers will see socket EOF and exit, as the Unix coding was
already doing.

Back-patch to 9.3 where parallel pg_dump was introduced.

Kyotaro Horiguchi

df8d2d8c

25 May, 2016 8 commits

Remove option to write USING before opclass name in CREATE INDEX. · b898eb63

Tom Lane authored May 25, 2016

Dating back to commit f10b6392, our grammar has allowed "USING" to
optionally appear before an opclass name in CREATE INDEX (and, lately,
some related places such as ON CONFLICT specifications).  Nikolay Shaplov
noticed that this syntax existed but wasn't documented, and proposed
documenting it.  But what seems like a better idea is to remove the
production, thereby making the code match the docs not vice versa.
This isn't our usual modus operandi for such cases, but there are a
couple of good reasons to proceed this way:

* So far as I can find, this syntax has never been documented anywhere.
It isn't relied on by any of our own code or test cases, and there seems
little reason to suppose that it's been used in the wild either.

* Documenting it would mean that there would be two separate uses of
USING in the CREATE INDEX syntax, the other being "USING access_method".
That can lead to nothing but confusion.

So, let's just remove it.  On the off chance that somebody somewhere
is using it, this isn't something to back-patch, but we can fix it
in HEAD.

Discussion: <1593237.l7oKHRpxSe@nataraj-amd64>

b898eb63

Ensure that backends see up-to-date statistics for shared catalogs. · 52e8fc3e

Tom Lane authored May 25, 2016

Ever since we split the statistics collector's reports into per-database
files (commit 187492b6), backends have been seeing stale statistics
for shared catalogs. This is because the inquiry message only prompts the
collector to write the per-database file for the requesting backend's own
database. Stats for shared catalogs are in a separate file for "DB 0",
which didn't get updated.

In normal operation this was partially masked by the fact that the
autovacuum launcher would send an inquiry message at least once per
autovacuum_naptime that asked for "DB 0"; so the shared-catalog stats would
never be more than a minute out of date. However the problem becomes very
obvious with autovacuum disabled, as reported by Peter Eisentraut.

To fix, redefine the semantics of inquiry messages so that both the
specified DB and DB 0 will be dumped. (This might seem a bit inefficient,
but we have no good way to know whether a backend's transaction will look
at shared-catalog stats, so we have to read both groups of stats whenever
we request stats. Sending two inquiry messages would definitely not be
better.)

Back-patch to 9.3 where the bug was introduced.

Report: <56AD41AC.1030509@gmx.net>

52e8fc3e

Fix broken error handling in parallel pg_dump/pg_restore. · 9abd64ec

Tom Lane authored May 25, 2016

In the original design for parallel dump, worker processes reported errors
by sending them up to the master process, which would print the messages.
This is unworkably fragile for a couple of reasons: it risks deadlock if a
worker sends an error at an unexpected time, and if the master has already
died for some reason, the user will never get to see the error at all.
Revert that idea and go back to just always printing messages to stderr.
This approach means that if all the workers fail for similar reasons (eg,
bad password or server shutdown), the user will see N copies of that
message, not only one as before. While that's slightly annoying, it's
certainly better than not seeing any message; not to mention that we
shouldn't assume that only the first failure is interesting.

An additional problem in the same area was that the master failed to
disable SIGPIPE (at least until much too late), which meant that sending a
command to an already-dead worker would cause the master to crash silently.
That was bad enough in itself but was made worse by the total reliance on
the master to print errors: even if the worker had reported an error, you
would probably not see it, depending on timing. Instead disable SIGPIPE
right after we've forked the workers, before attempting to send them
anything.

Additionally, the master relies on seeing socket EOF to realize that a
worker has exited prematurely --- but on Windows, there would be no EOF
since the socket is attached to the process that includes both the master
and worker threads, so it remains open. Make archive_close_connection()
close the worker end of the sockets so that this acts more like the Unix
case. It's not perfect, because if a worker thread exits without going
through exit_nicely() the closures won't happen; but that's not really
supposed to happen.

This has been wrong all along, so back-patch to 9.3 where parallel dump
was introduced.

Report: <2458.1450894615@sss.pgh.pa.us>

9abd64ec

Update doc text to reflect new column in MVCC phenomena table. · 627e3603
Kevin Grittner authored May 25, 2016
```
Scott Wehrenberg
```
627e3603

Do not DROP default roles in pg_dumpall -c · 018eb027

Stephen Frost authored May 24, 2016

When pulling the list of roles to drop, exclude roles whose names
begin with "pg_" (as we do when we are dumping the roles out to
recreate them).

Also add regression tests to cover pg_dumpall -c and this specific
issue.

Noticed by Rushabh Lathia.  Patch by me.

018eb027

Mark wal_level as PGDLLIMPORT. · f5e7b2f9

Tom Lane authored May 24, 2016

Per buildfarm, this is needed to allow extensions to use XLogIsNeeded()
in Windows builds.

f5e7b2f9

Fix contrib/bloom to work for unlogged indexes. · abaffa90

Tom Lane authored May 24, 2016

blbuildempty did not do even approximately the right thing: it tried
to add a metapage to the relation's regular data fork, which already
has one at that point.  It should look like the ambuildempty methods
for all the standard index types, ie, initialize a metapage image in
some transient storage and then write it directly to the init fork.
To support that, refactor BloomInitMetapage into two functions.

In passing, fix BloomInitMetapage so it doesn't leave the rd_options
field of the index's relcache entry pointing at transient storage.
I'm not sure this had any visible consequence, since nothing much
else is likely to look at a bloom index's rd_options, but it's
certainly poor practice.

Per bug #14155 from Zhou Digoal.

Report: <20160524144146.22598.42558@wrigleys.postgresql.org>

abaffa90

Qualify table usage in dumpTable() and use regclass · 2e8b4bf8

Stephen Frost authored May 24, 2016

All of the other tables used in the query in dumpTable(), which is
collecting column-level ACLs, are qualified, so we should be qualifying
the pg_init_privs, the related sub-select against pg_class and the
other queries added by the pg_dump catalog ACLs work.

Also, use ::regclass (or ::pg_catalog.regclass, where appropriate)
instead of using a poorly constructed query to get the OID for various
catalog tables.

Issues identified by Noah and Alvaro, patch by me.

2e8b4bf8

24 May, 2016 7 commits

Fetch XIDs atomically during vac_truncate_clog(). · 2d2e40e3

Tom Lane authored May 24, 2016

Because vac_update_datfrozenxid() updates datfrozenxid and datminmxid
in-place, it's unsafe to assume that successive reads of those values will
give consistent results.  Fetch each one just once to ensure sane behavior
in the minimum calculation.  Noted while reviewing Alexander Korotkov's
patch in the same area.

Discussion: <8564.1464116473@sss.pgh.pa.us>

2d2e40e3

Avoid consuming an XID during vac_truncate_clog(). · 996d2739

Tom Lane authored May 24, 2016

vac_truncate_clog() uses its own transaction ID as the comparison point in
a sanity check that no database's datfrozenxid has already wrapped around
"into the future". That was probably fine when written, but in a lazy
vacuum we won't have assigned an XID, so calling GetCurrentTransactionId()
causes an XID to be assigned when otherwise one would not be. Most of the
time that's not a big problem ... but if we are hard up against the
wraparound limit, consuming XIDs during antiwraparound vacuums is a very
bad thing.

Instead, use ReadNewTransactionId(), which not only avoids this problem
but is in itself a better comparison point to test whether wraparound
has already occurred.

Report and patch by Alexander Korotkov. Back-patch to all versions.

Report: <CAPpHfdspOkmiQsxh-UZw2chM6dRMwXAJGEmmbmqYR=yvM7-s6A@mail.gmail.com>

996d2739

Fix range check for effective_io_concurrency · 0c7cd45b

Alvaro Herrera authored May 24, 2016

Commit 1aba62ec moved the range check of that option form guc.c into
bufmgr.c, but introduced a bug by changing a >= 0.0 to > 0.0, which made
the value 0 no longer accepted.  Put it back.

Reported by Jeff Janes, diagnosed by Tom Lane

0c7cd45b

Docs: mention pg_reload_conf() in ALTER SYSTEM reference page. · c45fb43c
Tom Lane authored May 24, 2016
```
Takayuki Tsunakawa

Discussion: <0A3221C70F24FB45833433255569204D1F578FC3@G01JPEXMBYT05>
```
c45fb43c

In examples of Oracle PL/SQL code, use varchar2 not varchar. · 23f11dc2

Tom Lane authored May 24, 2016

Oracle recommends using VARCHAR2 not VARCHAR, allegedly because they might
someday change VARCHAR to be spec-compliant about distinguishing null from
empty string. (I'm not holding my breath, though.) Our examples of PL/SQL
code were using VARCHAR, which while not wrong is missing the pedagogical
opportunity to talk about converting Oracle type names to Postgres. So
switch the examples to use VARCHAR2, and add some text about what to do
with common Oracle type names like VARCHAR2 and NUMBER. (There is probably
more to be said here, but those are the ones I'm sure about offhand.)
Per suggestion from rapg12@gmail.com.

Discussion: <20160521140046.22591.24672@wrigleys.postgresql.org>

23f11dc2

Fix typo in docs · 6ee7fb82

Teodor Sigaev authored May 24, 2016

Add missing USING BLOOM in example of contrib/bloom

Nikolay Shaplov

6ee7fb82

Fix typo in TAP test identification string. · 1087aa23
Tom Lane authored May 23, 2016
```
Michael Paquier
```
1087aa23

23 May, 2016 4 commits

Fix BTREE_BUILD_STATS build. · 1e0d6512

Tom Lane authored May 23, 2016

Commit 65c5fcd3 broke this by removing a
header include directive that is conditionally required.  Add that back
to nbtree.c, with annotation to keep pgrminclude from re-breaking it.

Peter Geoghegan

Report: <CAM3SWZTNjHFYW_UG8bu0BnogqQ2HfsTgkzXLueuUhfTcYbu5HA@mail.gmail.com>

1e0d6512

Support IndexElem in raw_expression_tree_walker(). · eae1ad9b

Tom Lane authored May 23, 2016

Needed for cases in which INSERT ... ON CONFLICT appears inside a
recursive CTE item.  Per bug #14153 from Thomas Alton.

Patch by Peter Geoghegan, slightly adjusted by me

Report: <20160521232802.22598.13537@wrigleys.postgresql.org>

eae1ad9b

Add support for more extensive testing of raw_expression_tree_walker(). · 465e09da

Tom Lane authored May 23, 2016

If RAW_EXPRESSION_COVERAGE_TEST is defined, do a no-op tree walk over
every basic DML statement submitted to parse analysis.  If we'd had this
in place earlier, bug #14153 would have been caught by buildfarm testing.
The difficulty is that raw_expression_tree_walker() is only used in
limited cases involving CTEs (particularly recursive ones), so it's
very easy for an oversight in it to not be noticed during testing of a
seemingly-unrelated feature.

The type of error we can expect to catch with this is complete omission
of a node type from raw_expression_tree_walker(), and perhaps also
recursion into a field that doesn't contain a node tree, though that
would be an unlikely mistake.  It won't catch failure to add new fields
that need to be recursed into, unfortunately.

I'll go enable this on one or two of my own buildfarm animals once
bug #14153 is dealt with.

Discussion: <27861.1464040417@sss.pgh.pa.us>

465e09da

Fix latent crash in do_text_output_multiline(). · 8a4930e3

Tom Lane authored May 23, 2016

do_text_output_multiline() would fail (typically with a null pointer
dereference crash) if its input string did not end with a newline.  Such
cases do not arise in our current sources; but it certainly could happen
in future, or in extension code's usage of the function, so we should fix
it.  To fix, replace "eol += len" with "eol = text + len".

While at it, make two cosmetic improvements: mark the input string const,
and rename the argument from "text" to "txt" to dodge pgindent strangeness
(since "text" is a typedef name).

Even though this problem is only latent at present, it seems like a good
idea to back-patch the fix, since it's a very simple/safe patch and it's
not out of the realm of possibility that we might in future back-patch
something that expects sane behavior from do_text_output_multiline().

Per report from Hao Lee.

Report: <CAGoxFiFPAGyPAJLcFxTB5cGhTW2yOVBDYeqDugYwV4dEd1L_Ag@mail.gmail.com>

8a4930e3

22 May, 2016 1 commit
- psql: Message style improvements · a50b605a
  Peter Eisentraut authored May 21, 2016
  
  a50b605a
21 May, 2016 2 commits

Improve docs about contrib/intarray's benchmark suite. · 768d6f90

Tom Lane authored May 21, 2016

Correct obsolete install instructions, as noted by Daniel Gustafsson.
Clarify the test code's prerequisites.

Discussion: <88E617F2-7721-4C4E-84F4-886A2041C1D0@yesql.se>

768d6f90

Improve docs about using ORDER BY to control aggregate input order. · 82eafabe

Tom Lane authored May 21, 2016

David Johnston pointed out that the original text here had been obsoleted
by SQL:2008, which allowed ORDER BY in subqueries. We could weaken the
text to describe ORDER-BY-in-subqueries as an optional SQL feature that's
possibly unportable; but then the exact same statements would apply to
the alternative it's being compared to (ORDER-BY-in-aggregate-calls).
So really that would be pretty useless; let's just take out the sentence
entirely. Instead, point out the hazard that any extra processing in the
upper query might cause the subquery output order to be destroyed.

Discussion: <CAKFQuwbAX=iO9QbpN7_jr+BnUWm9FYX8WbEPUvG0p+nZhp6TZg@mail.gmail.com>

82eafabe

20 May, 2016 2 commits

Further improve documentation about --quote-all-identifiers switch. · 50e5315a

Tom Lane authored May 20, 2016

Mention it in the Notes section too, per suggestion from David Johnston.

Discussion: <20160520165824.22598.31426@wrigleys.postgresql.org>

50e5315a

Improve documentation about pg_dump's --quote-all-identifiers switch. · 960be4a9

Tom Lane authored May 20, 2016

Per bug #14152 from Alejandro Martínez.  Back-patch to all supported
branches.

Discussion: <20160520165824.22598.31426@wrigleys.postgresql.org>

960be4a9

19 May, 2016 1 commit

Pin the built-in index access methods. · 16ea51a2

Tom Lane authored May 19, 2016

This was overlooked in commit 473b9328, which introduced DROP ACCESS
METHOD. Although that command is restricted to superusers, we don't want
even superusers dropping the built-in methods; "DROP ACCESS METHOD btree"
in particular is unrecoverable from. Pin these objects in the same way
that other initdb-created objects are pinned.

I chose to bump catversion for this fix. That's not absolutely necessary
perhaps, but it will ensure that no 9.6 production systems are missing
the pin entries.

16ea51a2

17 May, 2016 3 commits

Avoid possible crash in contrib/bloom's blendscan(). · e13ac558

Tom Lane authored May 17, 2016

It's possible to begin and end an indexscan without ever calling
amrescan.  contrib/bloom, unlike every other index AM, allocated
its "scan->opaque" storage at amrescan time, and thus would crash
in amendscan if amrescan hadn't been called.  We could fix this
by putting in a null-pointer check in blendscan, but I see no very
good reason why contrib/bloom should march to its own drummer in
this respect.  Let's move that initialization to blbeginscan
instead.  Per report from Jeff Janes.

e13ac558

Allocate all page images at once in generic wal interface · 7c979c95
Teodor Sigaev authored May 17, 2016
```
That reduces number of allocation.

Per gripe from Michael Paquier and Tom Lane suggestion.
```
7c979c95
Fix typo · b09cd2e5
Magnus Hagander authored May 17, 2016
```
Amit Langote
```
b09cd2e5

16 May, 2016 2 commits

Correctly align page's images in generic wal API · 7c8345f6

Teodor Sigaev authored May 17, 2016

Page image should be MAXALIGN'ed because existing code could directly align
pointers in page instead of align offset from beginning of page.

Found during play with indexes as extenstion, Alexander Korotkov and me

7c8345f6

postgres_fdw: Fix the fix for crash when pushing down multiple joins. · 02a568a0

Robert Haas authored May 16, 2016

Commit 3151f16e was intended to be
a commit of a patch from Ashutosh Bapat, but instead I mistakenly
committed an earlier version from Michael Paquier (because both
patches were submitted with the same filename, and I confused them).
Michael's patch fixes the crash but doesn't actually implement the
correct test.

Repair the incorrect logic, and also expand the comments considerably
so that this is all more clear.

Ashutosh Bapat and Robert Haas

02a568a0