Commits · 28605968322b70a7efe1cc89595d1cfc557d80b9 · Abuhujair Javed / Postgres FD Implementation

11 Oct, 2017 5 commits

Doc: fix missing explanation of default object privileges. · 28605968

Tom Lane authored Oct 11, 2017

The GRANT reference page, which lists the default privileges for new
objects, failed to mention that USAGE is granted by default for data
types and domains. As a lesser sin, it also did not specify anything
about the initial privileges for sequences, FDWs, foreign servers,
or large objects. Fix that, and add a comment to acldefault() in the
probably vain hope of getting people to maintain this list in future.

Noted by Laurenz Albe, though I editorialized on the wording a bit.
Back-patch to all supported branches, since they all have this behavior.

Discussion: https://postgr.es/m/1507620895.4152.1.camel@cybertec.at

28605968

Fix mistakes in comments. · 20d210bf

Robert Haas authored Oct 11, 2017

Masahiko Sawada

Discussion: http://postgr.es/m/CAD21AoBsfYsMHD6_SL9iN3n_Foaa+oPbL5jG55DxU1ChaujqwQ@mail.gmail.com

20d210bf

Fix low-probability loss of NOTIFY messages due to XID wraparound. · 118e99c3

Tom Lane authored Oct 11, 2017

Up to now async.c has used TransactionIdIsInProgress() to detect whether
a notify message's source transaction is still running. However, that
function has a quick-exit path that reports that XIDs before RecentXmin
are no longer running. If a listening backend is doing nothing but
listening, and not running any queries, there is nothing that will advance
its value of RecentXmin. Once 2 billion transactions elapse, the
RecentXmin check causes active transactions to be reported as not running.
If they aren't committed yet according to CLOG, async.c decides they
aborted and discards their messages. The timing for that is a bit tight
but it can happen when multiple backends are sending notifies concurrently.
The net symptom therefore is that a sufficiently-long-surviving
listen-only backend starts to miss some fraction of NOTIFY traffic,
but only under heavy load.

The only function that updates RecentXmin is GetSnapshotData().
A brute-force fix would therefore be to take a snapshot before
processing incoming notify messages. But that would add cycles,
as well as contention for the ProcArrayLock. We can be smarter:
having taken the snapshot, let's use that to check for running
XIDs, and not call TransactionIdIsInProgress() at all. In this
way we reduce the number of ProcArrayLock acquisitions from one
per message to one per notify interrupt; that's the same under
light load but should be a benefit under heavy load. Light testing
says that this change is a wash performance-wise for normal loads.

I looked around for other callers of TransactionIdIsInProgress()
that might be at similar risk, and didn't find any; all of them
are inside transactions that presumably have already taken a
snapshot.

Problem report and diagnosis by Marko Tiikkaja, patch by me.
Back-patch to all supported branches, since it's been like this
since 9.0.

Discussion: https://postgr.es/m/20170926182935.14128.65278@wrigleys.postgresql.org

118e99c3

Add port/strnlen support to libpq and ecpg Makefiles. · 46912d9b

Tom Lane authored Oct 11, 2017

In the wake of fffd651e, any makefile that pulls in snprintf.c
from src/port/ needs to be prepared to pull in strnlen.c as well.
Per buildfarm.

46912d9b

Fix whitespace · e9e0f78b
Peter Eisentraut authored Oct 11, 2017

e9e0f78b

10 Oct, 2017 4 commits

Regenerate configure script. · f4128ab4

Tom Lane authored Oct 10, 2017

Not sure how fffd651e ended up
probing for "strnlenfrak" rather than "strnlen".
My autoconf doesn't do that ...

f4128ab4

Rewrite strnlen replacement implementation from . · fffd651e

Andres Freund authored Oct 10, 2017

The previous placement of the fallback implementation in libpgcommon
was problematic, because libpqport functions need strnlen
functionality.

Move replacement into libpgport. Provide strnlen() under its posix
name, instead of pg_strnlen(). Fix stupid configure bug, executing the
test only when compiled with threading support.

Author: Andres Freund
Discussion: https://postgr.es/m/E1e1gR2-0005fB-SI@gemulon.postgresql.org

fffd651e

Add missing clean step to src/test/modules/brin/Makefile. · fa5e119d

Tom Lane authored Oct 10, 2017

I noticed the tmp_check subdirectory wasn't getting cleaned up
after a check-world run.  Apparently pgxs.mk will only do this
for you if you've defined REGRESS.  The only other src/test/modules
Makefile that does not set that is snapshot_too_old, and it
does it like this.

fa5e119d

Use lower-case SGML attribute values · 44b3230e
Peter Eisentraut authored Oct 08, 2017
```
for DocBook XML compatibility
```
44b3230e

09 Oct, 2017 3 commits

Fix pnstrdup() to not memcpy() the maximum allowed length. · 82c117cb

Andres Freund authored Oct 09, 2017

The previous behaviour was dangerous if the length passed wasn't the
size of the underlying buffer, but the maximum size of the underlying
buffer.

Author: Andres Freund
Discussion: https://postgr.es/m/20161003215524.mwz5p45pcverrkyk@alap3.anarazel.de

82c117cb

Add pg_strnlen() a portable implementation of strlen. · 8a241792

Andres Freund authored Oct 09, 2017

As the OS version is likely going to be more optimized, fall back to
it if available, as detected by configure.

8a241792

Remove unused documentation file · 71c75ddf
Peter Eisentraut authored Oct 08, 2017

71c75ddf

08 Oct, 2017 3 commits

Reduce memory usage of targetlist SRFs. · 84ad4b03

Andres Freund authored Oct 08, 2017

Previously nodeProjectSet only released memory once per input tuple,
rather than once per returned tuple. If the computation of an
individual returned tuple requires a lot of memory, that can lead to
problems.

Instead change things so that the expression context can be reset once
per output tuple, which requires a new memory context to store SRF
arguments in.

This is a longstanding issue, but was hard to fix before 9.6, due to
the way tSRFs where evaluated. But it's fairly easy to fix now. We
could backpatch this into 10, but given there've been fewc omplaints
that doesn't seem worth the risk so far.

Reported-By: Lucas Fairchild
Author: Andres Freund, per discussion with Tom Lane
Discussion: https://postgr.es/m/4514.1507318623@sss.pgh.pa.us

84ad4b03

Increase distance between flush requests during bulk file copies. · 643c27e3

Tom Lane authored Oct 08, 2017

copy_file() reads and writes data 64KB at a time (with default BLCKSZ),
and historically has issued a pg_flush_data request after each write.
This turns out to interact really badly with macOS's new APFS file
system: a large file copy takes over 100X longer than it ought to on
APFS, as reported by Brent Dearth. While that's arguably a macOS bug,
it's not clear whether Apple will do anything about it in the near
future, and in any case experimentation suggests that issuing flushes
a bit less often can be helpful on other platforms too.

Hence, rearrange the logic in copy_file() so that flush requests are
issued once per N writes rather than every time through the loop.
I set the FLUSH_DISTANCE to 32MB on macOS (any less than that still
results in a noticeable speed degradation on APFS), but 1MB elsewhere.
In limited testing on Linux and FreeBSD, this seems slightly faster
than the previous code, and certainly no worse. It helps noticeably
on macOS even with the older HFS filesystem.

A simpler change would have been to just increase the size of the
copy buffer without changing the loop logic, but that seems likely
to trash the processor cache without really helping much.

Back-patch to 9.6 where we introduced msync() as an implementation
option for pg_flush_data(). The problem seems specific to APFS's
mmap/msync support, so I don't think we need to go further back.

Discussion: https://postgr.es/m/CADkxhTNv-j2jw2g8H57deMeAbfRgYBoLmVuXkC=YCFBXRuCOww@mail.gmail.com

643c27e3

Reduce "X = X" to "X IS NOT NULL", if it's easy to do so. · 8ec5429e

Tom Lane authored Oct 08, 2017

If the operator is a strict btree equality operator, and X isn't volatile,
then the clause must yield true for any non-null value of X, or null if X
is null. At top level of a WHERE clause, we can ignore the distinction
between false and null results, so it's valid to simplify the clause to
"X IS NOT NULL". This is a useful improvement mainly because we'll get
a far better selectivity estimate in most cases.

Because such cases seldom arise in well-written queries, it is unappetizing
to expend a lot of planner cycles looking for them ... but it turns out
that there's a place we can shoehorn this in practically for free, because
equivclass.c already has to detect and reject candidate equivalences of the
form X = X. That doesn't catch every place that it would be valid to
simplify to X IS NOT NULL, but it catches the typical case. Working harder
doesn't seem justified.

Patch by me, reviewed by Petr Jelinek

Discussion: https://postgr.es/m/CAMjNa7cC4X9YR-vAJS-jSYCajhRDvJQnN7m2sLH1wLh-_Z2bsw@mail.gmail.com

8ec5429e

07 Oct, 2017 3 commits

Improve pg_regress's error reporting for schedule-file problems. · b11f0d36

Tom Lane authored Oct 07, 2017

The previous coding here trashed the line buffer as it scanned it,
making it impossible to print the source line in subsequent error
messages.  With a few save/restore/strdup pushups we can improve
that situation.

In passing, move the free'ing of the various strings that are collected
while processing one set of tests down to the bottom of the loop.
That's simpler, less surprising, and should make valgrind less unhappy
about the strings that were previously leaked by the last iteration.

b11f0d36

Enforce our convention about max number of parallel regression tests. · ef73a816

Tom Lane authored Oct 07, 2017

We have a very old rule that parallel_schedule should have no more
than twenty tests in any one parallel group, so as to provide a
bound on the number of concurrently running processes needed to
pass the tests. But people keep forgetting the rule, so let's add
a few lines of code to check it.

Discussion: https://postgr.es/m/a37e9c57-22d4-1b82-1270-4501cd2e984e@2ndquadrant.com

ef73a816

Clean up sloppy maintenance of regression test schedule files. · 1fdab4d5

Tom Lane authored Oct 07, 2017

The partition_join test was added to a parallel group that was already
at the maximum of 20 concurrent tests. The hash_func test wasn't
added to serial_schedule at all. The identity and partition_join tests
were added to serial_schedule with the aid of a dartboard, rather than
maintaining consistency with parallel_schedule.

There are proposals afoot to make these sorts of errors harder to make,
but in the meantime let's fix the ones already in place.

Discussion: https://postgr.es/m/a37e9c57-22d4-1b82-1270-4501cd2e984e@2ndquadrant.com

1fdab4d5

06 Oct, 2017 10 commits

Fix crash when logical decoding is invoked from a PL function. · 1518d078

Tom Lane authored Oct 06, 2017

The logical decoding functions do BeginInternalSubTransaction and
RollbackAndReleaseCurrentSubTransaction to clean up after themselves.
It turns out that AtEOSubXact_SPI has an unrecognized assumption that
we always need to cancel the active SPI operation in the SPI context
that surrounds the subtransaction (if there is one).  That's true
when the RollbackAndReleaseCurrentSubTransaction call is coming from
the SPI-using function itself, but not when it's happening inside
some unrelated function invoked by a SPI query.  In practice the
affected callers are the various PLs.

To fix, record the current subtransaction ID when we begin a SPI
operation, and clean up only if that ID is the subtransaction being
canceled.

Also, remove AtEOSubXact_SPI's assertion that it must have cleaned
up the surrounding SPI context's active tuptable.  That's proven
wrong by the same test case.

Also clarify (or, if you prefer, reinterpret) the calling conventions
for _SPI_begin_call and _SPI_end_call.  The memory context cleanup
in the latter means that these have always had the flavor of a matched
resource-management pair, but they weren't documented that way before.

Per report from Ben Chobot.

Back-patch to 9.4 where logical decoding came in.  In principle,
the SPI changes should go all the way back, since the problem dates
back to commit 7ec1c5a8.  But given the lack of field complaints
it seems few people are using internal subtransactions in this way.
So I don't feel a need to take any risks in 9.2/9.3.

Discussion: https://postgr.es/m/73FBA179-C68C-4540-9473-71E865408B15@silentmedia.com

1518d078

Copy information from the relcache instead of pointing to it. · 45866c75

Robert Haas authored Oct 06, 2017

We have the relations continuously locked, but not open, so relcache
pointers are not guaranteed to be stable.  Per buildfarm member
prion.

Ashutosh Bapat.  I fixed a typo.

Discussion: http://postgr.es/m/CAFjFpRcRBqoKLZSNmRsjKr81uEP=ennvqSQaXVCCBTXvJ2rW+Q@mail.gmail.com

45866c75

Fix intra-query memory leakage in nodeProjectSet.c. · a1c2c430

Tom Lane authored Oct 06, 2017

Both ExecMakeFunctionResultSet() and evaluation of simple expressions
need to be done in the per-tuple memory context, not per-query, else
we leak data until end of query. This is a consideration that was
missed while refactoring code in the ProjectSet patch (note that in
pre-v10, ExecMakeFunctionResult is called in the per-tuple context).

Per bug #14843 from Ben M. Diagnosed independently by Andres and myself.

Discussion: https://postgr.es/m/20171005230321.28561.15927@wrigleys.postgresql.org

a1c2c430

Fix access-off-end-of-array in clog.c. · 6b87416c

Tom Lane authored Oct 06, 2017

Sloppy loop coding in set_status_by_pages() resulted in fetching one array
element more than it should from the subxids[] array. The odds of this
resulting in SIGSEGV are pretty small, but we've certainly seen that happen
with similar mistakes elsewhere. While at it, we can get rid of an extra
TransactionIdToPage() calculation per loop.

Per report from David Binderman. Back-patch to all supported branches,
since this code is quite old.

Discussion: https://postgr.es/m/HE1PR0802MB2331CBA919CBFFF0C465EB429C710@HE1PR0802MB2331.eurprd08.prod.outlook.com

6b87416c

Support coverage on vpath builds · c3d9a660

Peter Eisentraut authored Aug 10, 2017

A few paths needed to be tweaked so everything looks into the
appropriate directories.
Reviewed-by: Michael Paquier <michael.paquier@gmail.com>

c3d9a660

Run coverage commands quietly · 52e1b1b0

Peter Eisentraut authored Aug 10, 2017

They are very chatty by default, but the output doesn't seem all that
useful for normal operation.
Reviewed-by: Michael Paquier <michael.paquier@gmail.com>

52e1b1b0

Remove coverage details view · c0112363

Peter Eisentraut authored Aug 10, 2017

This is only useful if we name the different tests, which we don't do at
the moment.
Reviewed-by: Michael Paquier <michael.paquier@gmail.com>

c0112363

#ifdef out some dead code in psql/mainloop.c. · 3620569f

Tom Lane authored Oct 06, 2017

This pg_send_history() call is unreachable, since the block it's in
is currently only entered in !cur_cmd_interactive mode. But rather
than just delete it, make it #ifdef NOT_USED, in hopes that we'll
remember to enable it if we ever change that decision.

Per report from David Binderman. Since this is basically cosmetic,
I see no great need to back-patch.

Discussion: https://postgr.es/m/HE1PR0802MB233122B61F00A15E035C83BE9C710@HE1PR0802MB2331.eurprd08.prod.outlook.com

3620569f

Fix traversal of half-frozen update chains · a5736bf7

Alvaro Herrera authored Oct 06, 2017

When some tuple versions in an update chain are frozen due to them being
older than freeze_min_age, the xmax/xmin trail can become broken. This
breaks HOT (and probably other things). A subsequent VACUUM can break
things in more serious ways, such as leaving orphan heap-only tuples
whose root HOT redirect items were removed. This can be seen because
index creation (or REINDEX) complain like
ERROR: XX000: failed to find parent tuple for heap-only tuple at (0,7) in table "t"

Because of relfrozenxid contraints, we cannot avoid the freezing of the
early tuples, so we must cope with the results: whenever we see an Xmin
of FrozenTransactionId, consider it a match for whatever the previous
Xmax value was.

This problem seems to have appeared in 9.3 with multixact changes,
though strictly speaking it seems unrelated.

Since 9.4 we have commit 37484ad2 "Change the way we mark tuples as
frozen", so the fix is simple: just compare the raw Xmin (still stored
in the tuple header, since freezing merely set an infomask bit) to the
Xmax. But in 9.3 we rewrite the Xmin value to FrozenTransactionId, so
the original value is lost and we have nothing to compare the Xmax with.
To cope with that case we need to compare the Xmin with FrozenXid,
assume it's a match, and hope for the best. Sadly, since you can
pg_upgrade a 9.3 instance containing half-frozen pages to newer
releases, we need to keep the old check in newer versions too, which
seems a bit brittle; I hope we can somehow get rid of that.

I didn't optimize the new function for performance. The new coding is
probably a bit slower than before, since there is a function call rather
than a straight comparison, but I'd rather have it work correctly than
be fast but wrong.

This is a followup after 20b65522 fixed a few related problems.
Apparently, in 9.6 and up there are more ways to get into trouble, but
in 9.3 - 9.5 I cannot reproduce a problem anymore with this patch, so
there must be a separate bug.

Reported-by: Peter Geoghegan
Diagnosed-by: Peter Geoghegan, Michael Paquier, Daniel Wood,
Yi Wen Wong, Álvaro
Discussion: https://postgr.es/m/CAH2-Wznm4rCrhFAiwKPWTpEw2bXDtgROZK7jWWGucXeH3D1fmA@mail.gmail.com

a5736bf7

Basic partition-wise join functionality. · f49842d1

Robert Haas authored Oct 06, 2017

Instead of joining two partitioned tables in their entirety we can, if
it is an equi-join on the partition keys, join the matching partitions
individually. This involves teaching the planner about "other join"
rels, which are related to regular join rels in the same way that
other member rels are related to baserels. This can use significantly
more CPU time and memory than regular join planning, because there may
now be a set of "other" rels not only for every base relation but also
for every join relation. In most practical cases, this probably
shouldn't be a problem, because (1) it's probably unusual to join many
tables each with many partitions using the partition keys for all
joins and (2) if you do that scenario then you probably have a big
enough machine to handle the increased memory cost of planning and (3)
the resulting plan is highly likely to be better, so what you spend in
planning you'll make up on the execution side. All the same, for now,
turn this feature off by default.

Currently, we can only perform joins between two tables whose
partitioning schemes are absolutely identical. It would be nice to
cope with other scenarios, such as extra partitions on one side or the
other with no match on the other side, but that will have to wait for
a future patch.

Ashutosh Bapat, reviewed and tested by Rajkumar Raghuwanshi, Amit
Langote, Rafia Sabih, Thomas Munro, Dilip Kumar, Antonin Houska, Amit
Khandekar, and by me. A few final adjustments by me.

Discussion: http://postgr.es/m/CAFjFpRfQ8GrQvzp3jA2wnLqrHmaXna-urjm_UY9BqXj=EaDTSA@mail.gmail.com
Discussion: http://postgr.es/m/CAFjFpRcitjfrULr5jfuKWRPsGUX0LQ0k8-yG0Qw2+1LBGNpMdw@mail.gmail.com

f49842d1

05 Oct, 2017 10 commits

Fix typo in README. · fe9ba28e
Tom Lane authored Oct 05, 2017
```
s/BeginInternalSubtransaction/BeginInternalSubTransaction/
```
fe9ba28e

On CREATE TABLE, consider skipping validation of subpartitions. · 6476b261

Robert Haas authored Oct 05, 2017

This is just like commit 14f67a8e, but
for CREATE PARTITION rather than ATTACH PARTITION.

Jeevan Ladhe, with test case changes by me.

Discussion: http://postgr.es/m/CAOgcT0MWwG8WBw8frFMtRYHAgDD=tpt6U7WcsO_L2k0KYpm4Jg@mail.gmail.com

6476b261

On attach, consider skipping validation of subpartitions individually. · 14f67a8e

Robert Haas authored Oct 05, 2017

If the table attached as a partition is itself partitioned, individual
partitions might have constraints strong enough to skip scanning the
table even if the table actually attached does not. This is pretty
cheap to check, and possibly a big win if it works out.

Amit Langote, with test case changes by me.

Discussion: http://postgr.es/m/1f08b844-0078-aa8d-452e-7af3bf77d05f@lab.ntt.co.jp

14f67a8e

Improve error message when skipping scan of default partition. · c31e9d4b

Robert Haas authored Oct 05, 2017

It seems like a good idea to clearly distinguish between skipping the
scan of the new partition itself and skipping the scan of the default
partition.

Amit Langote

Discussion: http://postgr.es/m/1f08b844-0078-aa8d-452e-7af3bf77d05f@lab.ntt.co.jp

c31e9d4b

Allow DML commands that create tables to use parallel query. · e9baa5e9

Robert Haas authored Oct 05, 2017

Haribabu Kommi, reviewed by Dilip Kumar and Rafia Sabih. Various
cosmetic changes by me to explain why this appears to be safe but
allowing inserts in parallel mode in general wouldn't be. Also, I
removed the REFRESH MATERIALIZED VIEW case from Haribabu's patch,
since I'm not convinced that case is OK, and hacked on the
documentation somewhat.

Discussion: http://postgr.es/m/CAJrrPGdo5bak6qnPWe8Kpi8g_jfQEs-G4SYmG9y+OFaw2-dPvA@mail.gmail.com

e9baa5e9

Improve comments in vacuum_rel() and analyze_rel(). · 4d85c290

Tom Lane authored Oct 05, 2017

Remove obsolete references to get_rel_oids(). Avoid listing specific
relkinds in the comments, since we seem unable to keep such things
in sync with the code, and it's not all that helpful anyhow.

Noted by Michael Paquier, though I rewrote the comments a bit more.

Discussion: https://postgr.es/m/CAB7nPqTWiN9zwKTaOrsnKiGDChqRt7C1+CiiDk4N4OMn92rs6A@mail.gmail.com

4d85c290

Fix typo. · 4b2ba1fe

Robert Haas authored Oct 05, 2017

Etsuro Fujita

Discussion: http://postgr.es/m/1b2e9ac7-b99a-2769-5e42-afdf62bfa7fa@lab.ntt.co.jp

4b2ba1fe

Fix more user-visible elog() calls. · c097b271

Robert Haas authored Oct 05, 2017

Michael Paquier discovered that this could be triggered via SQL;
give a nicer message instead.

Patch by Michael Paquier, reviewed by Masahiko Sawada.

Discussion: http://postgr.es/m/CAB7nPqQtPg+LKKtzdKN26judHcvPZ0s1gNigzOT4j8CYuuuBYg@mail.gmail.com

c097b271

Document and use SPI_result_code_string() · 036166f2

Peter Eisentraut authored Aug 30, 2017

A lot of semi-internal code just prints out numeric SPI error codes,
which is not very helpful. We already have an API function to convert
the codes to a string, so let's make more use of that.
Reviewed-by: Michael Paquier <michael.paquier@gmail.com>

036166f2

Move SPI error reporting out of ri_ReportViolation() · 582bbcf3

Peter Eisentraut authored Aug 30, 2017

These are two completely unrelated code paths, so it doesn't make sense
to pack them into one function.

Add attribute noreturn to ri_ReportViolation().
Reviewed-by: Michael Paquier <michael.paquier@gmail.com>

582bbcf3

04 Oct, 2017 2 commits
- Msvc doesn't know UINT16_MAX, replace with PG_UINT16_MAX. · 9eafa2b5
  Andres Freund authored Oct 04, 2017
```
UINT16_MAX usage is originating from commit 212e6f34.

Per buildfarm animal currawong.
```
  9eafa2b5
- Attempt to adapt windows build for 212e6f34. · 15334ad1
  Andres Freund authored Oct 04, 2017
```
Per buildfarm animal baiji.
```
  15334ad1