1. 01 Oct, 2015 7 commits
    • Fix commit_ts for standby · f12e814b
      Alvaro Herrera authored
      Module initialization was still not completely correct after commit
      6b619551, per crash report from Takashi Ohnishi.  To fix, instead of
      trying to monkey around with the value of the GUC setting directly, add
      a separate boolean flag that enables the feature on a standby, but only
      for the startup (recovery) process, when it sees that its master server
      has the feature enabled.
      Discussion: http://www.postgresql.org/message-id/ca44c6c7f9314868bdc521aea4f77cbf@MP-MSGSS-MBX004.msg.nttdata.co.jp
      
      Also change the deactivation routine to delete all segment files rather
      than leaving the last one around.  (This doesn't need separate
      WAL-logging, because on recovery we execute the same deactivation
      routine anyway.)
      
      In passing, clean up the code structure somewhat, particularly so that
      xlog.c doesn't know so much about when to activate/deactivate the
      feature.
      
      Thanks to Fujii Masao for testing and Petr Jelínek for off-list discussion.
      
      Back-patch to 9.5, where commit_ts was introduced.
    • Fix incorrect tab-completion for GRANT and REVOKE · bf4817e4
      Fujii Masao authored
      Previously "GRANT * ON * TO " was tab-completed to add an extra "TO"
      rather than a list of roles.  This bug was introduced unintentionally
      by commit 2f888070.  This commit fixes that incorrect tab-completion.
      
      Thomas Munro, reviewed by Jeff Janes.
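      The completion fix above lives in psql's C tab-completion code; as an
      illustrative sketch (hypothetical names and rules, not the actual psql
      implementation), word-based completion of this form can be modeled as:

```python
# Toy model of word-based tab completion, loosely following how psql
# matches the words before the cursor.  Names and rules are hypothetical.
ROLES = ["alice", "bob", "postgres"]

def complete(prev_words):
    """Return candidates for 'GRANT <priv> ON <obj> TO <role>'."""
    words = [w.upper() for w in prev_words]
    # Most specific rule first: after "GRANT ... ON ... TO ", offer roles.
    # A buggy rule ordering can match a less specific pattern and offer
    # "TO" again instead.
    if len(words) >= 5 and words[0] == "GRANT" and words[2] == "ON" and words[4] == "TO":
        return ROLES
    if len(words) >= 4 and words[0] == "GRANT" and words[2] == "ON":
        return ["TO"]
    return []
```

      With this ordering, complete(["GRANT", "SELECT", "ON", "t1", "TO"])
      yields role names rather than another "TO".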
    • Fix documentation error in commit 8703059c. · 21995d3f
      Tom Lane authored
      Etsuro Fujita spotted a thinko in the README commentary.
    • Fix mention of htup.h in storage.sgml · 3123ee0d
      Fujii Masao authored
      Previously the documentation stated that the details of the
      HeapTupleHeaderData struct could be found in htup.h.  This is no
      longer correct because the struct is now defined in htup_details.h.
      
      Back-patch to 9.3 where the definition of HeapTupleHeaderData struct
      was moved from htup.h to htup_details.h.
      
      Michael Paquier
    • Fix readfuncs/outfuncs problems in last night's Gather patch. · 286a3a68
      Robert Haas authored
      KaiGai Kohei, with one correction by me.
    • Fix errors in commit a04bb65f. · 5884b92a
      Tom Lane authored
      Not a lot of commentary needed here really.
    • Improve LISTEN startup time when there are many unread notifications. · 07e4d03f
      Tom Lane authored
      If some existing listener is far behind, incoming new listener sessions
      would start from that session's read pointer and then need to advance over
      many already-committed notification messages, which they have no interest
      in.  This was expensive in itself and also thrashed the pg_notify SLRU
      buffers a lot more than necessary.  We can improve matters considerably
      in typical scenarios, without much added cost, by starting from the
      furthest-ahead read pointer, not the furthest-behind one.  We do have to
      consider only sessions in our own database when doing this, which requires
      an extra field in the data structure, but that's a pretty small cost.
      
      Back-patch to 9.0 where the current LISTEN/NOTIFY logic was introduced.
      
      Matt Newell, slightly adjusted by me
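      The pointer-selection change described above can be sketched as
      follows (a Python model with hypothetical field names; the real logic
      is C in src/backend/commands/async.c):

```python
# Each listening backend tracks a read position in the shared notification
# queue, plus the OID of its database.  A new listener picks its starting
# position from its peers.

def initial_read_pos(listeners, my_db, queue_head):
    """Old behavior: start at the furthest-behind pointer over all
    listeners, forcing the new session to advance over many
    already-committed notifications it has no interest in.
    New behavior: start at the furthest-ahead pointer among listeners in
    our own database; with no such peer, start at the queue head."""
    same_db = [l["pos"] for l in listeners if l["db"] == my_db]
    return max(same_db) if same_db else queue_head

listeners = [
    {"db": 1, "pos": 10},   # a listener far behind
    {"db": 1, "pos": 950},  # an up-to-date listener in our database
    {"db": 2, "pos": 400},  # other database: must be ignored
]
print(initial_read_pos(listeners, my_db=1, queue_head=1000))  # 950
```

      Only same-database peers are safe to copy from, which is why the
      commit adds a database field to the per-listener data structure.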
  2. 30 Sep, 2015 5 commits
    • Add a Gather executor node. · 3bd909b2
      Robert Haas authored
      A Gather executor node runs any number of copies of a plan in an equal
      number of workers and merges all of the results into a single tuple
      stream.  It can also run the plan itself, if the workers are
      unavailable or haven't started up yet.  It is intended to work with
      the Partial Seq Scan node which will be added in future commits.
      
      It could also be used to implement parallel query of a different sort
      by itself, without help from Partial Seq Scan, if the single_copy mode
      is used.  In that mode, a worker executes the plan, and the parallel
      leader does not, merely collecting the worker's results.  So, a Gather
      node could be inserted into a plan to split the execution of that plan
      across two processes.  Nested Gather nodes aren't currently supported,
      but we might want to add support for that in the future.
      
      There's nothing in the planner to actually generate Gather nodes yet,
      so it's not quite time to break out the champagne.  But we're getting
      close.
      
      Amit Kapila.  Some design suggestions were provided by me, and I also
      reviewed the patch.  Single-copy mode, documentation, and other minor
      changes also by me.
    • Don't dump core when destroying an unused ParallelContext. · 227d57f3
      Robert Haas authored
      If a transaction or subtransaction creates a ParallelContext but ends
      without calling InitializeParallelDSM, the previous code would
      seg fault.  Fix that.
    • Include policies based on ACLs needed · 7d8db3e8
      Stephen Frost authored
      When considering which policies should be included, rather than look at
      individual bits of the query (eg: if a RETURNING clause exists, or if a
      WHERE clause exists which is referencing the table, or if it's a
      FOR SHARE/UPDATE query), consider any case where we've determined
      the user needs SELECT rights on the relation while doing an UPDATE or
      DELETE to be a case where we apply SELECT policies, and any case where
      we've determined that the user needs UPDATE rights on the relation while
      doing a SELECT to be a case where we apply UPDATE policies.
      
      This simplifies the logic and addresses concerns that a user could use
      UPDATE or DELETE with a WHERE clause to determine whether rows exist, or
      they could use SELECT .. FOR UPDATE to lock rows which they are not
      actually allowed to modify through UPDATE policies.
      
      Use list_append_unique() to avoid adding the same quals multiple times,
      as, on balance, the cost of checking when adding the quals will almost
      always be cheaper than keeping them and doing busywork for each tuple
      during execution.
      
      Back-patch to 9.5 where RLS was added.
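      The simplified rule above can be modeled like this (a Python sketch
      with hypothetical names; the real logic is C in PostgreSQL's RLS
      code):

```python
def policies_to_apply(command, required_rights):
    """Choose policy kinds from the command plus the access rights the
    query was determined to need on the relation."""
    kinds = {command}  # always apply the command's own policies
    # UPDATE/DELETE that must read the relation => SELECT policies too,
    # so WHERE/RETURNING cannot probe rows the user may not see.
    if command in ("UPDATE", "DELETE") and "SELECT" in required_rights:
        kinds.add("SELECT")
    # SELECT ... FOR UPDATE/SHARE needs UPDATE rights => UPDATE policies
    # too, so rows the user cannot modify cannot be locked.
    if command == "SELECT" and "UPDATE" in required_rights:
        kinds.add("UPDATE")
    return kinds

def append_unique(quals, qual):
    """Mimic list_append_unique(): skip duplicates at plan time, which is
    cheaper than re-evaluating the same qual for every tuple at run time."""
    if qual not in quals:
        quals.append(qual)
    return quals
```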
    • Small improvements in comments in async.c. · 6057f61b
      Tom Lane authored
      We seem to have lost a line somewhere along the way in the comment block
      that discusses async.c's locks, because it suddenly refers to "both locks"
      without previously having mentioned more than one.  Add a sentence to make
      that read more sanely.  Also, refer to the "pos of the slowest backend"
      not the "tail of the slowest backend", since we have no per-backend value
      called "tail".
    • Fix incorrect tps number calculation in "excluding connections establishing". · a16db3a0
      Tatsuo Ishii authored
      The error (the reported tps number was larger than the actual one)
      grew as the number of threads decreased.  The bug has been there
      since thread support was introduced in 9.0.  Because back-patching
      would introduce an incompatible change in the reported tps number,
      the fix is committed to the master and 9.5 stable branches only.
      
      Problem spotted by me and fix proposed by Fabien COELHO.  Note that
      his original patch also included a code refactoring unrelated to the
      problem, which I omitted.
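      As a hedged sketch of the arithmetic involved (a simplified model
      with made-up names, not pgbench's actual C code): connection-
      establishing time accumulated across threads that connect in parallel
      must be scaled to wall-clock time before it is subtracted from the
      run time, and a wrong divisor there skews the reported tps:

```python
def tps_excluding_connections(xacts, wall_time, conn_time_sum, nthreads):
    """Transactions per second, excluding connection establishment.

    conn_time_sum is connection time summed over all threads; with
    nthreads threads connecting concurrently, roughly
    conn_time_sum / nthreads of wall-clock time was actually spent
    establishing connections, and that is what should be subtracted.
    """
    return xacts / (wall_time - conn_time_sum / nthreads)

# 10000 transactions in 10 s; two threads spent 1 s each connecting.
print(tps_excluding_connections(10000, 10.0, 2.0, 2))  # ~1111.1 tps
```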
  3. 29 Sep, 2015 4 commits
    • Code review for transaction commit timestamps · 6b619551
      Alvaro Herrera authored
      There are three main changes here:
      
      1. No longer cause a start failure in a standby if the feature is
      disabled in postgresql.conf but enabled in the master.  This reverts one
      part of commit 4f3924d9; what we keep is the ability of the standby
      to activate/deactivate the module (which includes creating and removing
      segments as appropriate) during replay of such actions in the master.
      
      2. Replay WAL records affecting commitTS even if the feature is
      disabled.  This means the standby will always have the same state as the
      master after replay.
      
      3. Have COMMIT PREPARE record the transaction commit time as well.  We
      were previously only applying it in the normal transaction commit path.
      
      Author: Petr Jelínek
      Discussion: http://www.postgresql.org/message-id/CAHGQGwHereDzzzmfxEBYcVQu3oZv6vZcgu1TPeERWbDc+gQ06g@mail.gmail.com
      Discussion: http://www.postgresql.org/message-id/CAHGQGwFuzfO4JscM9LCAmCDCxp_MfLvN4QdB+xWsS-FijbjTYQ@mail.gmail.com
      
      Additionally, I cleaned up nearby code related to replication origins,
      which I found a bit hard to follow, and fixed a couple of typos.
      
      Backpatch to 9.5, where this code was introduced.
      
      Per bug reports from Fujii Masao and subsequent discussion.
    • Fix plperl to handle non-ASCII error message texts correctly. · b631a46e
      Tom Lane authored
      We were passing error message texts to croak() verbatim, which turns out
      not to work if the text contains non-ASCII characters; Perl mangles their
      encoding, as reported in bug #13638 from Michal Leinweber.  To fix, convert
      the text into a UTF8-encoded SV first.
      
      It's hard to test this without risking failures in different database
      encodings; but we can follow the lead of plpython, which is already
      assuming that no-break space (U+00A0) has an equivalent in all encodings
      we care about running the regression tests in (cf commit 2dfa15de).
      
      Back-patch to 9.1.  The code is quite different in 9.0, and anyway it seems
      too risky to put something like this into 9.0's final minor release.
      
      Alex Hunsaker, with suggestions from Tim Bunce and Tom Lane
    • Comment update for join pushdown. · 758fcfdc
      Robert Haas authored
      Etsuro Fujita
    • Parallel executor support. · d1b7c1ff
      Robert Haas authored
      This code provides infrastructure for a parallel leader to start up
      parallel workers to execute subtrees of the plan tree being executed
      in the master.  User-supplied parameters from ParamListInfo are passed
      down, but PARAM_EXEC parameters are not.  Various other constructs,
      such as initplans, subplans, and CTEs, are also not currently shared.
      Nevertheless, there's enough here to support a basic implementation of
      parallel query, and we can lift some of the current restrictions as
      needed.
      
      Amit Kapila and Robert Haas
  4. 28 Sep, 2015 10 commits
  5. 27 Sep, 2015 3 commits
  6. 26 Sep, 2015 2 commits
    • Remove legacy multixact truncation support. · aa29c1cc
      Andres Freund authored
      In 9.5 and master there is no need to support legacy truncation. This is
      just committed separately to make it easier to backpatch the WAL logged
      multixact truncation to 9.3 and 9.4 if we later decide to do so.
      
      I bumped master's magic from 0xD086 to 0xD088 and 9.5's from 0xD085 to
      0xD087 to avoid 9.5 reusing a value that has been in use on master while
      keeping the numbers increasing between major versions.
      
      Discussion: 20150621192409.GA4797@alap3.anarazel.de
      Backpatch: 9.5
    • Rework the way multixact truncations work. · 4f627f89
      Andres Freund authored
      The fact that multixact truncations are not WAL-logged has caused a
      fair share of problems.  Among other things, it requires doing
      computations during recovery while the database is not in a
      consistent state, delaying truncations until checkpoints, and
      handling the case where members have been truncated but offsets
      have not.
      
      We tried to put bandaids on lots of these issues over the last years,
      but it seems time to change course. Thus this patch introduces WAL
      logging for multixact truncations.
      
      This allows:
      1) performing the truncation directly during VACUUM, instead of
         delaying it to the checkpoint.
      2) avoiding looking at the offsets SLRU for truncation during
         recovery; we can just use the master's values.
      3) simplifying a fair amount of the logic for keeping the in-memory
         limits straight.
      
      During the course of fixing this a bunch of additional bugs had to be
      fixed:
      1) Data was not purged from the in-memory members SLRU before
         deleting segments.  This happened to be hard or impossible to hit
         due to the interlock between checkpoints and truncation.
      2) find_multixact_start() relied on SimpleLruDoesPhysicalPageExist - but
         that doesn't work for offsets that haven't yet been flushed to
         disk. Add code to flush the SLRUs to fix. Not pretty, but it feels
         slightly safer to only make decisions based on actual on-disk state.
      3) find_multixact_start() could be called concurrently with a truncation
         and thus fail. Via SetOffsetVacuumLimit() that could lead to a round
         of emergency vacuuming. The problem remains in
         pg_get_multixact_members(), but that's quite harmless.
      
      For now this is going to only get applied to 9.5+, leaving the issues in
      the older branches in place. It is quite possible that we need to
      backpatch at a later point though.
      
      In case this gets back-patched, we need to handle an updated standby
      replaying WAL from a not-yet-upgraded primary.  We have to recognize
      that situation and use "old style" truncation (i.e. looking at the
      SLRUs) during WAL replay.  In contrast to before, this now happens in
      the startup process, when replaying a checkpoint record, rather than
      in the checkpointer.  Doing the truncation in a restartpoint would be
      incorrect: restartpoints can happen much later than the original
      checkpoint, thereby leading to wraparound.  To avoid "multixact_redo:
      unknown op code 48" errors, standbys would have to be upgraded before
      primaries.
      
      A later patch will bump the WAL page magic, and remove the legacy
      truncation codepaths. Legacy truncation support is just included to make
      a possible future backpatch easier.
      
      Discussion: 20150621192409.GA4797@alap3.anarazel.de
      Reviewed-By: Robert Haas, Alvaro Herrera, Thomas Munro
      Backpatch: 9.5 for now
  7. 25 Sep, 2015 4 commits
    • Second try at fixing O(N^2) problem in foreign key references. · 2abfd9d5
      Tom Lane authored
      This replaces ill-fated commit 5ddc7288,
      which was reverted because it broke active uses of FK cache entries.  In
      this patch, we still do nothing more to invalidatable cache entries than
      mark them as needing revalidation, so we won't break active uses.  To keep
      down the overhead of InvalidateConstraintCacheCallBack(), keep a list of
      just the currently-valid cache entries.  (The entries are large enough that
      some added space for list links doesn't seem like a big problem.)  This
      would still be O(N^2) when there are many valid entries, though, so when
      the list gets too long, just force the "sinval reset" behavior to remove
      everything from the list.  I set the threshold at 1000 entries, somewhat
      arbitrarily.  Possibly that could be fine-tuned later.  Another item for
      future study is whether it's worth adding reference counting so that we
      could safely remove invalidated entries.  As-is, problem cases are likely
      to end up with large and mostly invalid FK caches.
      
      Like the previous attempt, backpatch to 9.3.
      
      Jan Wieck and Tom Lane
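      The strategy above can be sketched like so (a Python model with
      hypothetical names; the real cache is C code in the backend):

```python
MAX_VALID = 1000  # past this, fall back to invalidating everything

class FKCache:
    """Model of the FK constraint cache: invalidation only marks entries
    as needing revalidation (never frees them, so active uses survive),
    and a side list of currently-valid entries keeps the callback cheap."""

    def __init__(self):
        self.entries = {}     # key -> {"valid": bool}
        self.valid_keys = []  # only the currently-valid entries

    def lookup(self, key):
        e = self.entries.get(key)
        if e is None or not e["valid"]:
            e = {"valid": True}  # build or revalidate the entry
            self.entries[key] = e
            self.valid_keys.append(key)
        return e

    def invalidate(self, key=None):
        """Invalidation callback.  Scans only the valid list; when that
        list grows too long, force the "reset" behavior instead."""
        if key is None or len(self.valid_keys) > MAX_VALID:
            for k in self.valid_keys:
                self.entries[k]["valid"] = False
            self.valid_keys.clear()
        elif key in self.valid_keys:
            self.entries[key]["valid"] = False
            self.valid_keys.remove(key)
```

      Keeping the valid list short is what bounds the per-invalidation
      cost; without reference counting, invalidated entries simply linger
      until revalidated.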
    • Further fix for psql's code for locale-aware formatting of numeric output. · 77130fc1
      Tom Lane authored
      (Third time's the charm, I hope.)
      
      Additional testing disclosed that this code could mangle already-localized
      output from the "money" datatype.  We can't very easily skip applying it
      to "money" values, because the logic is tied to column right-justification
      and people expect "money" output to be right-justified.  Short of
      decoupling that, we can fix it in what should be a safe enough way by
      testing to make sure the string doesn't contain any characters that would
      not be expected in plain numeric output.
    • Further fix for psql's code for locale-aware formatting of numeric output. · 6325527d
      Tom Lane authored
      On closer inspection, those seemingly redundant atoi() calls were not so
      much inefficient as just plain wrong: the author of this code either had
      not read, or had not understood, the POSIX specification for localeconv().
      The grouping field is *not* a textual digit string but separate integers
      encoded as chars.
      
      We'll follow the existing code as well as the backend's cash.c in only
      honoring the first group width, but let's at least honor it correctly.
      
      This doesn't actually result in any behavioral change in any of the
      locales I have installed on my Linux box, which may explain why nobody's
      complained; grouping width 3 is close enough to universal that it's barely
      worth considering other cases.  Still, wrong is wrong, so back-patch.
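      Concretely, POSIX localeconv() returns the grouping field as small
      integers stored directly in chars (e.g. "\3" for groups of three),
      not as the text "3".  A Python model of the distinction (psql's
      actual code is C):

```python
# grouping is a byte string of group widths: b"\x03" means "groups of 3",
# not the ASCII text "3".
def first_group_width(grouping: bytes) -> int:
    # Correct: read the first char as a small integer.
    return grouping[0] if grouping else 0

def first_group_width_buggy(grouping: bytes) -> int:
    # The buggy pattern was effectively atoi() on the same bytes:
    # atoi("\x03") yields 0, because '\x03' is not a digit character.
    try:
        return int(grouping.decode())
    except ValueError:
        return 0

print(first_group_width(b"\x03"))        # 3
print(first_group_width_buggy(b"\x03"))  # 0
```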
    • Fix psql's code for locale-aware formatting of numeric output. · 4778a0bd
      Tom Lane authored
      This code did the wrong thing entirely for numbers with an exponent
      but no decimal point (e.g., '1e6'), as reported by Jeff Janes in
      bug #13636.  More generally, it made lots of unverified assumptions
      about what the input string could possibly look like.  Rearrange so
      that it only fools with leading digits that it's directly verified
      are there, and an immediately adjacent decimal point.  While at it,
      get rid of some useless inefficiencies, like converting the grouping
      count string to integer over and over (and over).
      
      This has been broken for a long time, so back-patch to all supported
      branches.
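      The safer approach above can be modeled like this (Python sketch;
      psql's code is C, and this simplification ignores locale-specific
      separators and signs):

```python
# Group only the leading run of digits that we have directly verified,
# stopping at the first non-digit, so inputs like '1e6' pass through
# unmangled instead of being treated as a plain integer.
def format_numeric_locale(s, group_width=3, sep=","):
    i = 0
    while i < len(s) and s[i].isdigit():
        i += 1
    digits, rest = s[:i], s[i:]
    groups = []
    while len(digits) > group_width:
        groups.insert(0, digits[-group_width:])
        digits = digits[:-group_width]
    groups.insert(0, digits)
    return sep.join(groups) + rest

print(format_numeric_locale("1234567.89"))  # 1,234,567.89
print(format_numeric_locale("1e6"))         # 1e6
```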
  8. 24 Sep, 2015 5 commits
    • Allow planner to use expression-index stats for function calls in WHERE. · 39df0f15
      Tom Lane authored
      Previously, a function call appearing at the top level of WHERE had a
      hard-wired selectivity estimate of 0.3333333, a kludge conveniently dated
      in the source code itself to July 1992.  The expectation at the time was
      that somebody would soon implement estimator support functions analogous
      to those for operators; but no such code has appeared, nor does it seem
      likely to in the near future.  We do have an alternative solution though,
      at least for immutable functions on single relations: creating an
      expression index on the function call will allow ANALYZE to gather stats
      about the function's selectivity.  But the code in clause_selectivity()
      failed to make use of such data even if it exists.
      
      Refactor so that that will happen.  I chose to make it try this technique
      for any clause type for which clause_selectivity() doesn't have a special
      case, not just functions.  To avoid adding unnecessary overhead in the
      common case where we don't learn anything new, make selfuncs.c provide an
      API that hooks directly to examine_variable() and then var_eq_const(),
      rather than the previous coding which laboriously constructed an OpExpr
      only so that it could be expensively deconstructed again.
      
      I preserved the behavior that the default estimate for a function call
      is 0.3333333.  (For any other expression node type, it's 0.5, as before.)
      I had originally thought to make the default be 0.5 across the board, but
      changing a default estimate that's survived for twenty-three years seems
      like something not to do without a lot more testing than I care to put
      into it right now.
      
      Per a complaint from Jehan-Guillaume de Rorthais.  Back-patch into 9.5,
      but not further, at least for the moment.
    • Improve handling of collations in contrib/postgres_fdw. · 76f965ff
      Tom Lane authored
      If we have a local Var of say varchar type with default collation, and
      we apply a RelabelType to convert that to text with default collation, we
      don't want to consider that as creating an FDW_COLLATE_UNSAFE situation.
      It should be okay to compare that to a remote Var, so long as the remote
      Var determines the comparison collation.  (When we actually ship such an
      expression to the remote side, the local Var would become a Param with
      default collation, meaning the remote Var would in fact control the
      comparison collation, because non-default implicit collation overrides
      default implicit collation in parse_collate.c.)  To fix, be more precise
      about what FDW_COLLATE_NONE means: it applies either to a noncollatable
      data type or to a collatable type with default collation, if that collation
      can't be traced to a remote Var.  (When it can, FDW_COLLATE_SAFE is
      appropriate.)  We were essentially using that interpretation already at
      the Var/Const/Param level, but we weren't bubbling it up properly.
      
      An alternative fix would be to introduce a separate FDW_COLLATE_DEFAULT
      value to describe the second situation, but that would add more code
      without changing the actual behavior, so it didn't seem worthwhile.
      
      Also, since we're clarifying the rule to be that we care about whether
      operator/function input collations match, there seems no need to fail
      immediately upon seeing a Const/Param/non-foreign-Var with nondefault
      collation.  We only have to reject if it appears in a collation-sensitive
      context (for example, "var IS NOT NULL" is perfectly safe from a collation
      standpoint, whatever collation the var has).  So just set the state to
      UNSAFE rather than failing immediately.
      
      Per report from Jeevan Chalke.  This essentially corrects some sloppy
      thinking in commit ed3ddf91, so back-patch
      to 9.3 where that logic appeared.
    • Don't zero opfuncid when reading nodes. · 9f1255ac
      Robert Haas authored
      The comments here stated that this was just in case we ever had an
      ALTER OPERATOR command that could remap an operator to a different
      function.  But those comments have been here for a long time, and no
      such command has come about.  In the absence of such a feature,
      forcing the pg_proc OID to be looked up again each time we reread a
      stored rule or similar is just a waste of cycles.  Moreover, parallel
      query needs a way to reread the exact same node tree that was written
      out, not one that has been slightly stomped on.  So just get rid of
      this for now.
      
      Per discussion with Tom Lane.
    • Make pg_controldata report newest XID with valid commit timestamp · 18d938de
      Fujii Masao authored
      Previously pg_controldata didn't report newestCommitTs and this was
      an oversight in commit 73c986ad.
      
      Also this patch changes pg_resetxlog so that it uses the same sentences
      as pg_controldata does, regarding oldestCommitTs and newestCommitTs,
      for the sake of consistency.
      
      Back-patch to 9.5 where track_commit_timestamp was added.
      
      Euler Taveira
    • Lower *_freeze_max_age minimum values. · 020235a5
      Andres Freund authored
      The old minimum values are rather large, making it time consuming to
      test related behaviour. Additionally the current limits, especially for
      multixacts, can be problematic in space-constrained systems. 10000000
      multixacts can contain a lot of members.
      
      Since there's no good reason for the current limits, lower them a good
      bit. Setting them to 0 would be a bad idea, triggering endless vacuums,
      so still retain a limit.
      
      While at it fix autovacuum_multixact_freeze_max_age to refer to
      multixact.c instead of varsup.c.
      
      Reviewed-By: Robert Haas
      Discussion: CA+TgmoYmQPHcrc3GSs7vwvrbTkbcGD9Gik=OztbDGGrovkkEzQ@mail.gmail.com
      Backpatch: back to 9.0 (in parts)