- 22 Feb, 2016 3 commits
-
-
Tom Lane authored
This patch introduces "pg_blocking_pids(int) returns int[]", which returns the PIDs of any sessions that are blocking the session with the given PID. Historically people have obtained such information using a self-join on the pg_locks view, but it's unreasonably tedious to do it that way with any modicum of correctness, and the addition of parallel queries has pretty much broken that approach altogether. (Given some more columns in the view than there are today, you could imagine handling parallel-query cases with a 4-way join; but ugh.)

The new function has the following behaviors that are painful or impossible to get right via pg_locks:

1. Correctly understands which lock modes block which other ones.
2. In soft-block situations (two processes both waiting for conflicting lock modes), only the one that's in front in the wait queue is reported to block the other.
3. In parallel-query cases, reports all sessions blocking any member of the given PID's lock group, and reports a session by naming its leader process's PID, which will be the pg_backend_pid() value visible to clients.

The motivation for doing this right now is mostly to fix the isolation tests. Commit 38f8bdca lobotomized isolationtester's is-it-waiting query by removing its ability to recognize nonconflicting lock modes, as a crude workaround for the inability to handle soft-block situations properly. But even without the lock mode tests, the old query was excessively slow, particularly in CLOBBER_CACHE_ALWAYS builds; some of our buildfarm animals fail the new deadlock-hard test because the deadlock timeout elapses before they can probe the waiting status of all eight sessions. Replacing the pg_locks self-join with use of pg_blocking_pids() is not only much more correct, but a lot faster: I measure it at about 9X faster in a typical dev build with Asserts, and 3X faster in CLOBBER_CACHE_ALWAYS builds. That should provide enough headroom for the slower CLOBBER_CACHE_ALWAYS animals to pass the test, without having to lengthen deadlock_timeout yet more and thus slow down the test for everyone else.
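A minimal SQL sketch of how a monitoring session might use the new function; the join against pg_stat_activity is an illustration, not part of the commit:

    -- list every waiting backend together with the PIDs blocking it
    SELECT pid, query, pg_blocking_pids(pid) AS blocked_by
    FROM pg_stat_activity
    WHERE cardinality(pg_blocking_pids(pid)) > 0;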
-
Tom Lane authored
We don't really need this field, because it's either zero or redundant with PGPROC.pid. The use of zero to mark "not a group leader" is not necessary since we can just as well test whether lockGroupLeader is NULL. This does not save very much, either as to code or data, but the simplification seems worthwhile anyway.
-
Andres Freund authored
In 4b4b680c I accidentally used sizeof(PrivateRefCountArray) instead of sizeof(PrivateRefCountEntry) when creating the refcount overflow hashtable. As the former is bigger than the latter, this luckily only resulted in slightly increased memory usage when many buffers are pinned in a backend. Reported-By: Takashi Horikawa Discussion: 73FA3881462C614096F815F75628AFCD035A48C3@BPXM01GP.gisp.nec.co.jp Backpatch: 9.5, where the new ref count infrastructure was introduced
-
- 21 Feb, 2016 7 commits
-
-
Tom Lane authored
The "Session Information Functions" table seems to be sorted mostly alphabetically (although it's not perfect), which would be all right if it didn't lead to some related functions being described in a pretty nonintuitive order. Also, the prose discussions after the table were in an order that hardly matched the table at all. Rearrange to make things a bit easier to follow.
-
Tom Lane authored
Coverity griped about use of unchecked strcpy() into a local variable. There's unlikely to be any actual bug there, since no caller would be passing a path longer than MAXPGPATH, but nonetheless use of strlcpy() seems preferable. While at it, get rid of unmaintainable separation between list of field names and list of field values in favor of initializing them in parallel. And we might as well declare get_configdata()'s path argument as const char *, even though no current caller needs that.
-
Andrew Dunstan authored
Some over-eager copy-and-pasting on my part resulted in a nonsense result being returned in this case. I have adopted the same pattern for handling this case as is used in the one argument form of the function, i.e. we just skip over the code that adds values to the object. Diagnosis and patch from Michael Paquier, although not quite his solution. Fixes bug #13936. Backpatch to 9.5 where jsonb_object was introduced.
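A quick sketch of the corrected behavior of the two-argument form called with empty arrays (the result shown is what one would expect, not output verified against this build):

    SELECT jsonb_object('{}'::text[], '{}'::text[]);
    -- expected result: {}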
-
Robert Haas authored
Spotted by Tom Lane.
-
Robert Haas authored
Reflow text in lock manager README so that it fits within 80 columns. Correct some mistakes. Expand the README to explain not only why group locking exists but also the data structures that support it. Improve comments related to group locking in several files. Change the name of a macro argument for improved clarity. Most of these problems were reported by Tom Lane, but I found a few of them myself. Robert Haas and Tom Lane
-
Robert Haas authored
list_concat(list_concat(a, b), c) destructively changes both a and b; to avoid such perils, copy lists of remote_conds before incorporating them into larger lists via list_concat(). Ashutosh Bapat, per a report from Etsuro Fujita
-
Tatsuo Ishii authored
With suggestions from Tom Lane.
-
- 20 Feb, 2016 5 commits
-
-
Dean Rasheed authored
Not all compilers support "long long" and the "LL" integer literal suffix, so use a cast to int64 instead.
-
Dean Rasheed authored
Commit 53874c52 broke various 32-bit buildfarm machines because it incorrectly used an 'L' suffix for what needed to be a 64-bit literal. Thanks to Michael Paquier for helping to diagnose this.
-
Dean Rasheed authored
This will parse strings in the format produced by pg_size_pretty() and return sizes in bytes. This allows queries to be written with clauses like "pg_total_relation_size(oid) > pg_size_bytes('10 GB')". Author: Pavel Stehule with various improvements by Vitaly Burovoy Discussion: http://www.postgresql.org/message-id/CAFj8pRD-tGoDKnxdYgECzA4On01_uRqPrwF-8LdkSE-6bDHp0w@mail.gmail.com Reviewed-by: Vitaly Burovoy, Oleksandr Shulgin, Kyotaro Horiguchi, Michael Paquier and Robert Haas
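For instance (a sketch based on the description above; treating 'GB' as 1024^3 bytes is an assumption about the parsing rules rather than something stated in the commit):

    SELECT pg_size_bytes('10 GB');   -- 10737418240 under that assumption

    -- the motivating use case from the commit message
    SELECT oid::regclass AS relation, pg_total_relation_size(oid) AS bytes
    FROM pg_class
    WHERE pg_total_relation_size(oid) > pg_size_bytes('10 GB');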
-
Peter Eisentraut authored
Prevent wrapping of the element content to avoid confusing line breaks between hyphens.
-
Noah Misch authored
Architecture reference material specifies this order, and s_lock.h inline assembly agrees. The former order failed to provide mutual exclusion to lwlock.c and perhaps to other clients. The two xlc buildfarm members, hornet and mandrill, have failed sixteen times with duplicate key errors involving pg_class_oid_index or pg_type_oid_index. Back-patch to 9.5, where commit b64d92f1 introduced atomics. Reviewed by Andres Freund and Tom Lane.
-
- 19 Feb, 2016 3 commits
-
-
Simon Riggs authored
StartupSUBTRANS() incorrectly handled cases near the max pageid in the subtrans data structure, which in some cases could lead to errors in startup for Hot Standby. This patch wraps the pageids correctly, avoiding any such errors. Identified by exhaustive crash testing by Jeff Janes. Jeff Janes
-
Peter Eisentraut authored
It was using %u to read, into a signed integer variable, a string that was earlier produced by snprintf with %d. This seems to work in practice but is incorrect. Found by cppcheck.
-
Tom Lane authored
Up to now, there's been an assumption that all Paths for a given relation compute the same output column set (targetlist). However, there are good reasons to remove that assumption. For example, an indexscan on an expression index might be able to return the value of an expensive function "for free". While we have the ability to generate such a plan today in simple cases, we don't have a way to model that it's cheaper than a plan that computes the function from scratch, nor a way to create such a plan in join cases (where the function computation would normally happen at the topmost join node). Also, we need this so that we can have Paths representing post-scan/join steps, where the targetlist may well change from one step to the next. Therefore, invent a "struct PathTarget" representing the columns we expect a plan step to emit. It's convenient to include the output tuple width and tlist evaluation cost in this struct, and there will likely be additional fields in future. While Path nodes that actually do have custom outputs will need their own PathTargets, it will still be true that most Paths for a given relation will compute the same tlist. To reduce the overhead added by this patch, keep a "default PathTarget" in RelOptInfo, and allow Paths that compute that column set to just point to their parent RelOptInfo's reltarget. (In the patch as committed, actually every Path is like that, since we do not yet have any cases of custom PathTargets.) I took this opportunity to provide some more-honest costing of PlaceHolderVar evaluation. Up to now, the assumption that "scan/join reltargetlists have cost zero" was applied not only to Vars, where it's reasonable, but also PlaceHolderVars where it isn't. Now, we add the eval cost of a PlaceHolderVar's expression to the first plan level where it can be computed, by including it in the PathTarget cost field and adding that to the cost estimates for Paths. This isn't perfect yet but it's much better than before, and there is a way forward to improve it more. This costing change affects the join order chosen for a couple of the regression tests, changing expected row ordering.
-
- 18 Feb, 2016 3 commits
-
-
Bruce Momjian authored
Suppress creation of the pg_upgrade delete script when the new data directory is inside the old data directory. Reported-by: IRC Backpatch-through: 9.3, where delete script tests were added
-
Tom Lane authored
Dead or half-dead index leaf pages were incorrectly reported as live, as a consequence of a code rearrangement I made (during a moment of severe brain fade, evidently) in commit d287818e. The index metapage was not counted in index_size, causing that result to not agree with the actual index size on-disk. Index root pages were not counted in internal_pages, which is inconsistent compared to the case of a root that's also a leaf (one-page index), where the root would be counted in leaf_pages. Aside from that inconsistency, this could lead to additional transient discrepancies between the reported page counts and index_size, since it's possible for pgstatindex's scan to see zero or multiple pages marked as BTP_ROOT, if the root moves due to a split during the scan. With these fixes, index_size will always be exactly one page more than the sum of the displayed page counts. Also, the index_size result was incorrectly documented as being measured in pages; it's always been measured in bytes. (While fixing that, I couldn't resist doing some small additional wordsmithing on the pgstattuple docs.) Including the metapage causes the reported index_size to not be zero for an empty index. To preserve the desired property that the pgstattuple regression test results are platform-independent (ie, BLCKSZ configuration independent), scale the index_size result in the regression tests. The documentation issue was reported by Otsuka Kenji, and the inconsistent root page counting by Peter Geoghegan; the other problems noted by me. Back-patch to all supported branches, because this has been broken for a long time.
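The page-accounting invariant described above can be spot-checked with a query along these lines (a sketch; 'some_btree_index' is a placeholder, and the column names are those of the pgstattuple extension as I recall them):

    SELECT index_size,
           (internal_pages + leaf_pages + empty_pages + deleted_pages + 1)
             * current_setting('block_size')::bigint AS accounted_bytes
    FROM pgstatindex('some_btree_index');
    -- after this fix, index_size should equal accounted_bytes
    -- (the "+ 1" is the metapage)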
-
Peter Eisentraut authored
The old phrasing was awkward if a replication slot is activated and deactivated repeatedly.
-
- 17 Feb, 2016 4 commits
-
-
Joe Conway authored
In commit a5c43b88 the behavior of command-line pg_config was inadvertently changed to include the config name when specific configs are requested, similar to when none are requested and all are emitted. This breaks scripts that expect to use pg_config for, e.g., PGXS. Revert to the previous behavior.
-
Joe Conway authored
Move and refactor the underlying code for the pg_config client application to src/common in support of sharing it with a new system information SRF called pg_config(), which makes the same information available via SQL. Additionally wrap the SRF with a new system view, also called pg_config. Patch by me with extensive input and review by Michael Paquier and additional review by Alvaro Herrera.
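A minimal sketch of the new SQL-level access (the column names are assumed from the pg_config client's output, not quoted from the commit):

    -- through the view
    SELECT name, setting FROM pg_config WHERE name IN ('BINDIR', 'VERSION');
    -- or directly through the SRF
    SELECT * FROM pg_config();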
-
Robert Haas authored
When processing ordered aggregates following a sort that could make use of the abbreviated key optimization, only call the equality operator to compare successive pairs of tuples when their abbreviated keys were not equal. Peter Geoghegan, reviewed by Andreas Karlsson and by me.
-
Tom Lane authored
A function name that's double-quoted in SQL can contain almost any characters, but we were using that name directly as part of the name generated for the Python-level function, and Python doesn't like anything that isn't pretty much a standard identifier. To fix, replace anything that isn't an ASCII letter or digit with an underscore in the generated name. This doesn't create any risk of duplicate Python function names because we were already appending the function OID to the generated name to ensure uniqueness. Per bug #13960 from Jim Nasby. Patch by Jim Nasby, modified a bit by me. Back-patch to all supported branches.
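For example, a quoted name like the one below previously yielded an invalid Python-level identifier; "my func #1" is purely hypothetical:

    CREATE FUNCTION "my func #1"(a integer) RETURNS integer
    AS $$ return a + 1 $$ LANGUAGE plpythonu;

    SELECT "my func #1"(41);   -- 42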
-
- 16 Feb, 2016 7 commits
-
-
Tom Lane authored
Clarify the description of which transactions will block a CREATE INDEX CONCURRENTLY command from proceeding, and mention that the index might still not be usable after CREATE INDEX completes. (This happens if the index build detected broken HOT chains, so that pg_index.indcheckxmin gets set, and there are open old transactions preventing the xmin horizon from advancing past the index's initial creation. I didn't want to explain what broken HOT chains are, though, so I omitted an explanation of exactly when old transactions prevent the index from being used.) Per discussion with Chris Travers. Back-patch to all supported branches, since the same text appears in all of them.
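One way to see whether a concurrently built index is still held back in this way is to look at pg_index (a sketch, not part of the commit; foo and bar are hypothetical):

    CREATE INDEX CONCURRENTLY foo_bar_idx ON foo (bar);

    -- if indcheckxmin is true, sufficiently old open transactions may still
    -- be unable to use the index even though indisvalid is true
    SELECT indisvalid, indcheckxmin
    FROM pg_index
    WHERE indexrelid = 'foo_bar_idx'::regclass;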
-
Bruce Momjian authored
Reported-by: Tatsuo Ishii Backpatch-through: 9.5
-
Michael Meskes authored
-
Michael Meskes authored
-
Tatsuo Ishii authored
Change "In this case" to "In the example above" to clarify what it actually refers to.
-
Fujii Masao authored
In runtime.sgml, the old formulas for calculating the reasonable values of SEMMNI and SEMMNS were incorrect: they did not count the semaphores needed by the checkpointer process (introduced in 9.2) and the background worker processes (introduced in 9.3). This commit fixes those formulas so that they account for the semaphores those processes need. Report and patch by Kyotaro Horiguchi. Only the patch for 9.3 was modified by me. Back-patch to 9.2 where the checkpointer process was added and the number of needed semaphores was increased. Author: Kyotaro Horiguchi Reviewed-by: Fujii Masao Backpatch: 9.2 Discussion: http://www.postgresql.org/message-id/20160203.125119.66820697.horiguchi.kyotaro@lab.ntt.co.jp
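As a rough cross-check, the number of semaphore sets a server needs can be estimated from SQL. This assumes the post-fix formula is approximately ceil((max_connections + autovacuum_max_workers + max_worker_processes + 5) / 16); runtime.sgml remains the authoritative source:

    SELECT ceil((current_setting('max_connections')::int
               + current_setting('autovacuum_max_workers')::int
               + current_setting('max_worker_processes')::int + 5) / 16.0)
           AS semaphore_sets_needed;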
-
Joe Conway authored
In commit 7b4bfc87 the DATA and DESCR entries for the new row_security_active() function were inadvertently put after the PROVOLATILE defines rather than before, where they should have been placed. Move them up where they belong. Backpatch to 9.5 where the new entries were introduced.
-
- 15 Feb, 2016 8 commits
-
-
Andres Freund authored
Commit 4de82f7d increased the WAL flush rate, mainly to increase the likelihood that hint bits can be set quickly. More quickly set hint bits can reduce contention around the clog et al. But unfortunately the increased flush rate can have a significant negative performance impact, I have measured up to a factor of ~4. The reason for this slowdown is that if there are independent writes to the underlying devices, for example because shared buffers is a lot smaller than the hot data set, or because a checkpoint is ongoing, the fdatasync() calls force cache flushes to be emitted to the storage. This is achieved by flushing WAL only if the last flush was longer than wal_writer_delay ago, or if more than wal_writer_flush_after (new GUC) unflushed blocks are pending. Based on some tests the default for wal_writer_flush_after is 1MB, which seems to work well on both SSDs and rotational media. To avoid negative performance impact due to 4de82f7d, an earlier commit (db76b1ef) made SetHintBits() more likely to succeed, preventing performance regressions in the pgbench tests I performed. Discussion: 20160118163908.GW10941@awork2.anarazel.de
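The new knob can be adjusted at run time; a sketch (the value shown is just the default mentioned above):

    -- flush pending WAL once this much has accumulated in the WAL writer,
    -- rather than after every wakeup; takes effect on configuration reload
    ALTER SYSTEM SET wal_writer_flush_after = '1MB';
    SELECT pg_reload_conf();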
-
Alvaro Herrera authored
The original code wasn't careful to test the file descriptor returned by PQsocket() for an invalid socket. If an invalid socket did turn up, that would amount to calling FD_ISSET with fd = -1, whereby undefined behavior can be invoked. To fix, test the file descriptor for validity and stop further processing if that fails. Problem noticed by Coverity. There is an existing FD_ISSET callsite that does check for invalid sockets beforehand, but the error message reported by it was strerror(errno); in testing the aforementioned change, that turns out to result in "bad socket: Success", which isn't terribly helpful. Instead use PQerrorMessage() in both places, which is more likely to contain a useful error message. Backpatch-through: 9.1.
-
Tom Lane authored
Reportedly, some compilers warn about tests like "c < 0" if c is unsigned, and hence complain about the character range checks I added in commit 3bb3f42f. This is a bit of a pain since the regex library doesn't really want to assume that chr is unsigned. However, since any such reconfiguration would involve manual edits of regcustom.h anyway, we can put it on the shoulders of whoever wants to do that to adjust this new range-checking macro correctly. Per gripes from Coverity and Andres.
-
Andres Freund authored
Previously we only allowed SetHintBits() to succeed if the commit LSN of the last transaction touching the page has already been flushed to disk. We can't generally change the LSN of the page, because we don't necessarily have the required locks on the page. But the required LSN interlock does not mean the commit record has to be flushed immediately, it just requires that the commit record will be flushed before the page is written out. Therefore if the buffer LSN is newer than the commit LSN, the hint bit can be safely set. In a number of scenarios (e.g. pgbench) this noticeably increases the number of hint bits that are set. But more importantly it also keeps the success rate up when flushing WAL less frequently. That was the original reason for commit 4de82f7d, which has negative performance consequences in a number of scenarios. This will allow a followup commit to reduce the flush rate. Discussion: 20160118163908.GW10941@awork2.anarazel.de
-
Joe Conway authored
Looks like this patch went in after Copyright messages were updated for 2016 and it missed the boat. Fixed.
-
Fujii Masao authored
In the REFRESH MATERIALIZED VIEW command, the CONCURRENTLY option is allowed only if there is at least one unique index with no WHERE clause on one or more columns of the matview. Previously, concurrent refresh checked the existence of a unique index on the matview after filling the data into the new snapshot, i.e., after calling refresh_matview_datafill(). So, when there was no unique index, we could have to wait a long time before that was detected and the error reported, which was a waste of time. To eliminate that wasted time, this commit changes concurrent refresh so that it checks the existence of a unique index at the beginning of the refresh operation, i.e., before starting any time-consuming jobs. If the CONCURRENTLY option is not allowed due to lack of a unique index, concurrent refresh can now detect that immediately and emit an error. Author: Masahiko Sawada Reviewed-by: Michael Paquier, Fujii Masao
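A minimal illustration of the requirement (mv_orders and orders are hypothetical names):

    CREATE MATERIALIZED VIEW mv_orders AS SELECT id, total FROM orders;

    -- now fails immediately, before any data is recomputed,
    -- because there is no unique index yet
    REFRESH MATERIALIZED VIEW CONCURRENTLY mv_orders;

    CREATE UNIQUE INDEX ON mv_orders (id);
    REFRESH MATERIALIZED VIEW CONCURRENTLY mv_orders;   -- OK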
-
Magnus Hagander authored
-
Noah Misch authored
-