Commits · f65e8270587f3e9b8224e20f7d020ed1f816dfe1 · Abuhujair Javed / Postgres FD Implementation

27 Feb, 2015 3 commits

Invent a memory context reset/delete callback mechanism. · f65e8270

Tom Lane authored Feb 27, 2015

This allows cleanup actions to be registered to be called just before a
particular memory context's contents are flushed (either by deletion or
MemoryContextReset). The patch in itself has no use-cases for this, but
several likely reasons for wanting this exist.

In passing, per discussion, rearrange some boolean fields in struct
MemoryContextData so as to avoid wasted padding space. For safety,
this requires making allowInCritSection's existence unconditional;
but I think that's a better approach than what was there anyway.

f65e8270

Fix a couple of trivial issues in jsonb.c · 654809e7

Alvaro Herrera authored Feb 27, 2015

Typo "aggreagate" appeared three times, and the return value of function
JsonbIteratorNext() was being assigned to an int variable in a bunch of
places.

654809e7

Fix table_rewrite event trigger for ALTER TYPE/SET DATA TYPE CASCADE · 3f190f67

Alvaro Herrera authored Feb 27, 2015

When a composite type being used in a typed table is modified by way
of ALTER TYPE, a table rewrite occurs appearing to come from ALTER TYPE.
The existing event_trigger.c code was unable to cope with that
and raised a spurious error.  The fix is just to accept that command
tag for the event, and document this properly.

Noted while fooling with deparsing of DDL commands.  This appears to be
an oversight in commit 618c9430.

Thanks to Mark Wong for documentation wording help.

3f190f67

26 Feb, 2015 6 commits

Render infinite date/timestamps as 'infinity' for json/jsonb · bda76c1c

Andrew Dunstan authored Feb 26, 2015

Commit ab14a73a raised an error in these cases and later the
behaviour was copied to jsonb. This is what the XML code, which we
then adopted, does, as the XSD types don't accept infinite values.
However, json dates and timestamps are just strings as far as json is
concerned, so there is no reason not to render these values as
'infinity'.

The json portion of this is backpatched to 9.4 where the behaviour was
introduced. The jsonb portion only affects the development branch.

Per gripe on pgsql-general.

bda76c1c

Reconsider when to wait for WAL flushes/syncrep during commit. · fd6a3f3a

Andres Freund authored Feb 26, 2015

Up to now RecordTransactionCommit() waited for WAL to be flushed (if
synchronous_commit != off) and to be synchronously replicated (if
enabled), even if a transaction did not have a xid assigned. The primary
reason for that is that sequence's nextval() did not assign a xid, but
are worthwhile to wait for on commit.

This can be problematic because sometimes read only transactions do
write WAL, e.g. HOT page prune records. That then could lead to read only
transactions having to wait during commit. Not something people expect
in a read only transaction.

This lead to such strange symptoms as backends being seemingly stuck
during connection establishment when all synchronous replicas are
down. Especially annoying when said stuck connection is the standby
trying to reconnect to allow syncrep again...

This behavior also is involved in a rather complicated <= 9.4 bug where
the transaction started by catchup interrupt processing waited for
syncrep using latches, but didn't get the wakeup because it was already
running inside the same overloaded signal handler. Fix the issue here
doesn't properly solve that issue, merely papers over the problems. In
9.5 catchup interrupts aren't processed out of signal handlers anymore.

To fix all this, make nextval() acquire a top level xid, and only wait for
transaction commit if a transaction both acquired a xid and emitted WAL
records. If only a xid has been assigned we don't uselessly want to
wait just because of writes to temporary/unlogged tables; if only WAL
has been written we don't want to wait just because of HOT prunes.

The xid assignment in nextval() is unlikely to cause overhead in
real-world workloads. For one it only happens SEQ_LOG_VALS/32 values
anyway, for another only usage of nextval() without using the result in
an insert or similar is affected.

Discussion: 20150223165359.GF30784@awork2.anarazel.de,
369698E947874884A77849D8FE3680C2@maumau,
5CF4ABBA67674088B3941894E22A0D25@maumau

Per complaint from maumau and Thom Brown

Backpatch all the way back; 9.0 doesn't have syncrep, but it seems
better to be consistent behavior across all maintained branches.

fd6a3f3a

Add note about how to make the SRF detoasted arguments live accross calls. · a7920b87
Fujii Masao authored Feb 26, 2015
```
Andrew Gierth and Ali Akbar
```
a7920b87

Free SQLSTATE and SQLERRM no earlier than other PL/pgSQL variables. · f5ef00ae

Noah Misch authored Feb 25, 2015

"RETURN SQLERRM" prompted plpgsql_exec_function() to read from freed
memory. Back-patch to 9.0 (all supported versions). Little code ran
between the premature free and the read, so non-assert builds are
unlikely to witness user-visible consequences.

f5ef00ae

Add hasRowSecurity to copyfuncs/outfuncs · 62a4a1af

Stephen Frost authored Feb 25, 2015

The RLS patch added a hasRowSecurity field to PlannerGlobal and
PlannedStmt but didn't update nodes/copyfuncs.c and nodes/outfuncs.c to
reflect those additional fields.

Correct that by adding entries to the appropriate functions for those
fields.

Pointed out by Robert.

62a4a1af

Add locking clause for SB views for update/delete · 6f9bd50e

Stephen Frost authored Feb 25, 2015

In expand_security_qual(), we were handling locking correctly when a
PlanRowMark existed, but not when we were working with the target
relation (which doesn't have any PlanRowMarks, but the subquery created
for the security barrier quals still needs to lock the rows under it).

Noted by Etsuro Fujita when working with the Postgres FDW, which wasn't
properly issuing a SELECT ... FOR UPDATE to the remote side under a
DELETE.

Back-patch to 9.4 where updatable security barrier views were
introduced.

Per discussion with Etsuro and Dean Rasheed.

6f9bd50e

25 Feb, 2015 3 commits

Fix over-optimistic caching in fetch_array_arg_replace_nulls(). · 77903ede

Tom Lane authored Feb 25, 2015

When I rewrote this in commit 56a79a86,
I forgot that it's possible for the input array type to change from one
call to the next (this can happen when applying the function to
pg_statistic columns, for instance).  Fix that.

77903ede

Fix dumping of views that are just VALUES(...) but have column aliases. · e9f1c01b

Tom Lane authored Feb 25, 2015

The "simple" path for printing VALUES clauses doesn't work if we need
to attach nondefault column aliases, because there's noplace to do that
in the minimal VALUES() syntax. So modify get_simple_values_rte() to
detect nondefault aliases and treat that as a non-simple case. This
further exposes that the "non-simple" path never actually worked;
it didn't produce valid syntax. Fix that too. Per bug #12789 from
Curtis McEnroe, and analysis by Andrew Gierth.

Back-patch to all supported branches. Before 9.3, this also requires
back-patching the part of commit 092d7ded
that created get_simple_values_rte() to begin with; inserting the extra
test into the old factorization of that logic would've been too messy.

e9f1c01b

Remove null-pointer checks that are not needed. · 8794bf1c

Michael Meskes authored Feb 25, 2015

If a pointer is guaranteed to carry information there is no need to check
for NULL again. Patch by Michael Paquier.

8794bf1c

24 Feb, 2015 4 commits

Improve parser's one-extra-token lookahead mechanism. · d809fd00

Tom Lane authored Feb 24, 2015

There are a couple of places in our grammar that fail to be strict LALR(1),
by requiring more than a single token of lookahead to decide what to do.
Up to now we've dealt with that by using a filter between the lexer and
parser that merges adjacent tokens into one in the places where two tokens
of lookahead are necessary. But that creates a number of user-visible
anomalies, for instance that you can't name a CTE "ordinality" because
"WITH ordinality AS ..." triggers folding of WITH and ORDINALITY into one
token. I realized that there's a better way.

In this patch, we still do the lookahead basically as before, but we never
merge the second token into the first; we replace just the first token by
a special lookahead symbol when one of the lookahead pairs is seen.

This requires a couple extra productions in the grammar, but it involves
fewer special tokens, so that the grammar tables come out a bit smaller
than before. The filter logic is no slower than before, perhaps a bit
faster.

I also fixed the filter logic so that when backing up after a lookahead,
the current token's terminator is correctly restored; this eliminates some
weird behavior in error message issuance, as is shown by the one change in
existing regression test outputs.

I believe that this patch entirely eliminates odd behaviors caused by
lookahead for WITH. It doesn't really improve the situation for NULLS
followed by FIRST/LAST unfortunately: those sequences still act like a
reserved word, even though there are cases where they should be seen as two
ordinary identifiers, eg "SELECT nulls first FROM ...". I experimented
with additional grammar hacks but couldn't find any simple solution for
that. Still, this is better than before, and it seems much more likely
that we *could* somehow solve the NULLS case on the basis of this filter
behavior than the previous one.

d809fd00

Error when creating names too long for tar format · 23a78352

Peter Eisentraut authored Feb 24, 2015

The tar format (at least the version we are using), does not support
file names or symlink targets longer than 99 bytes.  Until now, the tar
creation code would silently truncate any names that are too long.  (Its
original application was pg_dump, where this never happens.)  This
creates problems when running base backups over the replication
protocol.

The most important problem is when a tablespace path is longer than 99
bytes, which will result in a truncated tablespace path being backed up.
Less importantly, the basebackup protocol also promises to back up any
other files it happens to find in the data directory, which would also
lead to file name truncation if someone put a file with a long name in
there.

Now both of these cases result in an error during the backup.

Add tests that fail when a too-long file name or symlink is attempted to
be backed up.
Reviewed-by: Robert Hass <robertmhaas@gmail.com>

23a78352

Fix recovery_command -> restore_command typo in 8.3 release notes. · 347c7432
Heikki Linnakangas authored Feb 24, 2015
```
Kyotaro Horiguchi
```
347c7432
Fix typo in README. · dd58c609
Heikki Linnakangas authored Feb 24, 2015
```
Kyotaro Horiguchi
```
dd58c609

23 Feb, 2015 10 commits

Fix invalid DocBook XML · b007bee1
Peter Eisentraut authored Feb 23, 2015

b007bee1
Fix stupid merge errors in previous commit · d1712d01
Alvaro Herrera authored Feb 23, 2015
```
Brown paper bag installed permanently.
```
d1712d01

Further tweaking of raw grammar output to distinguish different inputs. · 56be925e

Tom Lane authored Feb 23, 2015

Use a different A_Expr_Kind for LIKE/ILIKE/SIMILAR TO constructs, so that
they can be distinguished from direct invocation of the underlying
operators.  Also, postpone selection of the operator name when transforming
"x IN (select)" to "x = ANY (select)", so that those syntaxes can be told
apart at parse analysis time.

I had originally thought I'd also have to do something special for the
syntaxes IS NOT DISTINCT FROM, IS NOT DOCUMENT, and x NOT IN (SELECT...),
which the grammar translates as though they were NOT (construct).
On reflection though, we can distinguish those cases reliably by noting
whether the parse location shown for the NOT is the same as for its child
node.  This only requires tweaking the parse locations for NOT IN, which
I've done here.

These changes should have no effect outside the parser; they're just in
support of being able to give accurate warnings for planned operator
precedence changes.

56be925e

Support more commands in event triggers · 296f3a60

Alvaro Herrera authored Feb 23, 2015

COMMENT, SECURITY LABEL, and GRANT/REVOKE now also fire
ddl_command_start and ddl_command_end event triggers, when they operate
on database-local objects.

Reviewed-By: Michael Paquier, Andres Freund, Stephen Frost

296f3a60

Replace checkpoint_segments with min_wal_size and max_wal_size. · 88e98230

Heikki Linnakangas authored Feb 23, 2015

Instead of having a single knob (checkpoint_segments) that both triggers
checkpoints, and determines how many checkpoints to recycle, they are now
separate concerns. There is still an internal variable called
CheckpointSegments, which triggers checkpoints. But it no longer determines
how many segments to recycle at a checkpoint. That is now auto-tuned by
keeping a moving average of the distance between checkpoints (in bytes),
and trying to keep that many segments in reserve. The advantage of this is
that you can set max_wal_size very high, but the system won't actually
consume that much space if there isn't any need for it. The min_wal_size
sets a floor for that; you can effectively disable the auto-tuning behavior
by setting min_wal_size equal to max_wal_size.

The max_wal_size setting is now the actual target size of WAL at which a
new checkpoint is triggered, instead of the distance between checkpoints.
Previously, you could calculate the actual WAL usage with the formula
"(2 + checkpoint_completion_target) * checkpoint_segments + 1". With this
patch, you set the desired WAL usage with max_wal_size, and the system
calculates the appropriate CheckpointSegments with the reverse of that
formula. That's a lot more intuitive for administrators to set.

Reviewed by Amit Kapila and Venkata Balaji N.

88e98230

Renumber GUC_* constants. · 0fec0003

Heikki Linnakangas authored Feb 23, 2015

This moves all the regular flags back together (for aesthetic reasons), and
makes room for more GUC_UNIT_* types.

0fec0003

Refactor unit conversions code in guc.c. · 1b630264

Heikki Linnakangas authored Feb 23, 2015

Replace the if-switch-case constructs with two conversion tables,
containing all the supported conversions between human-readable unit
strings and the base units used in GUC variables. This makes the code
easier to read, and makes adding new units simpler.

1b630264

Guard against spurious signals in LockBufferForCleanup. · bc208a5a

Andres Freund authored Feb 23, 2015

When LockBufferForCleanup() has to wait for getting a cleanup lock on a
buffer it does so by setting a flag in the buffer header and then wait
for other backends to signal it using ProcWaitForSignal().
Unfortunately LockBufferForCleanup() missed that ProcWaitForSignal() can
return for other reasons than the signal it is hoping for. If such a
spurious signal arrives the wait flags on the buffer header will still
be set. That then triggers "ERROR: multiple backends attempting to wait
for pincount 1".

The fix is simple, unset the flag if still set when retrying. That
implies an additional spinlock acquisition/release, but that's unlikely
to matter given the cost of waiting for a cleanup lock.  Alternatively
it'd have been possible to move responsibility for maintaining the
relevant flag to the waiter all together, but that might have had
negative consequences due to possible floods of signals. Besides being
more invasive.

This looks to be a very longstanding bug. The relevant code in
LockBufferForCleanup() hasn't changed materially since its introduction
and ProcWaitForSignal() was documented to return for unrelated reasons
since 8.2.  The master only patch series removing ImmediateInterruptOK
made it much easier to hit though, as ProcSendSignal/ProcWaitForSignal
now uses a latch shared with other tasks.

Per discussion with Kevin Grittner, Tom Lane and me.

Backpatch to all supported branches.

Discussion: 11553.1423805224@sss.pgh.pa.us

bc208a5a

Add GUC to control the time to wait before retrieving WAL after failed attempt. · 5d2b45e3

Fujii Masao authored Feb 23, 2015

Previously when the standby server failed to retrieve WAL files from any sources
(i.e., streaming replication, local pg_xlog directory or WAL archive), it always
waited for five seconds (hard-coded) before the next attempt. For example,
this is problematic in warm-standby because restore_command can fail
every five seconds even while new WAL file is expected to be unavailable for
a long time and flood the log files with its error messages.

This commit adds new parameter, wal_retrieve_retry_interval, to control that
wait time.

Alexey Vasiliev and Michael Paquier, reviewed by Andres Freund and me.

5d2b45e3

Fix potential deadlock with libpq non-blocking mode. · 2a3f6e36

Heikki Linnakangas authored Feb 23, 2015

If libpq output buffer is full, pqSendSome() function tries to drain any
incoming data. This avoids deadlock, if the server e.g. sends a lot of
NOTICE messages, and blocks until we read them. However, pqSendSome() only
did that in blocking mode. In non-blocking mode, the deadlock could still
happen.

To fix, take a two-pronged approach:

1. Change the documentation to instruct that when PQflush() returns 1, you
should wait for both read- and write-ready, and call PQconsumeInput() if it
becomes read-ready. That fixes the deadlock, but applications are not going
to change overnight.

2. In pqSendSome(), drain the input buffer before returning 1. This
alleviates the problem for applications that only wait for write-ready. In
particular, a slow but steady stream of NOTICE messages during COPY FROM
STDIN will no longer cause a deadlock. The risk remains that the server
attempts to send a large burst of data and fills its output buffer, and at
the same time the client also sends enough data to fill its output buffer.
The application will deadlock if it goes to sleep, waiting for the socket
to become write-ready, before the server's data arrives. In practice,
NOTICE messages and such that the server might be sending are usually
short, so it's highly unlikely that the server would fill its output buffer
so quickly.

Backpatch to all supported versions.

2a3f6e36

22 Feb, 2015 5 commits

Add parse location fields to NullTest and BooleanTest structs. · c063da17

Tom Lane authored Feb 22, 2015

We did not need a location tag on NullTest or BooleanTest before, because
no error messages referred directly to their locations. That's planned
to change though, so add these fields in a separate housekeeping commit.

Catversion bump because stored rules may change.

c063da17

Get rid of multiple applications of transformExpr() to the same tree. · 6a75562e

Tom Lane authored Feb 22, 2015

transformExpr() has for many years had provisions to do nothing when
applied to an already-transformed expression tree. However, this was
always ugly and of dubious reliability, so we'd be much better off without
it. The primary historical reason for it was that gram.y sometimes
returned multiple links to the same subexpression, which is no longer true
as of my BETWEEN fixes. We'd also grown some lazy hacks in CREATE TABLE
LIKE (failing to distinguish between raw and already-transformed index
specifications) and one or two other places.

This patch removes the need for and support for re-transforming already
transformed expressions. The index case is dealt with by adding a flag
to struct IndexStmt to indicate that it's already been transformed;
which has some benefit anyway in that tablecmds.c can now Assert that
transformation has happened rather than just assuming. The other main
reason was some rather sloppy code for array type coercion, which can
be fixed (and its performance improved too) by refactoring.

I did leave transformJoinUsingClause() still constructing expressions
containing untransformed operator nodes being applied to Vars, so that
transformExpr() still has to allow Var inputs. But that's a much narrower,
and safer, special case than before, since Vars will never appear in a raw
parse tree, and they don't have any substructure to worry about.

In passing fix some oversights in the patch that added CREATE INDEX
IF NOT EXISTS (missing processing of IndexStmt.if_not_exists). These
appear relatively harmless, but still sloppy coding practice.

6a75562e

Represent BETWEEN as a special node type in raw parse trees. · 34af082f

Tom Lane authored Feb 22, 2015

Previously, gram.y itself converted BETWEEN into AND (or AND/OR) nests of
expression comparisons. This was always as bogus as could be, but fixing
it hasn't risen to the top of the to-do list. The present patch invents an
A_Expr representation for BETWEEN expressions, and does the expansion to
comparison trees in parse_expr.c which is at least a slightly saner place
to be doing semantic conversions. There should be no change in the post-
parse-analysis results.

This does nothing for the semantic issues with BETWEEN (dubious connection
to btree-opclass semantics, and multiple evaluation of possibly volatile
subexpressions) ... but it's a necessary preliminary step before we could
fix any of that. The main immediate benefit is that preserving BETWEEN as
an identifiable raw-parse-tree construct will enable better error messages.

While at it, fix the code so that multiply-referenced subexpressions are
physically duplicated before being passed through transformExpr(). This
gets rid of one of the principal reasons why transformExpr() has
historically had to allow already-processed input.

34af082f

Rename variable in AllocSetContextCreate to be consistent. · 74811c40

Jeff Davis authored Feb 21, 2015

Everywhere else in the file, "context" is of type MemoryContext and
"set" is of type AllocSet. AllocSetContextCreate uses a variable of
type AllocSet, so rename it from "context" to "set".

74811c40

In array_agg(), don't create a new context for every group. · b419865a

Jeff Davis authored Feb 21, 2015

Previously, each new array created a new memory context that started
out at 8kB. This is incredibly wasteful when there are lots of small
groups of just a few elements each.

Change initArrayResult() and friends to accept a "subcontext" argument
to indicate whether the caller wants the ArrayBuildState allocated in
a new subcontext or not. If not, it can no longer be released
separately from the rest of the memory context.

Fixes bug report by Frank van Vugt on 2013-10-19.

Tomas Vondra. Reviewed by Ali Akbar, Tom Lane, and me.

b419865a

21 Feb, 2015 9 commits

Try to fix busted gettimeofday() code. · e9fd5545
Tom Lane authored Feb 21, 2015
```
Per buildfarm, we have to match the _stdcall property of the system
functions.
```
e9fd5545
Use FLEXIBLE_ARRAY_MEMBER in Windows-specific code. · 332f02f8
Tom Lane authored Feb 21, 2015
```
Be a tad more paranoid about overlength input, too.
```
332f02f8

Force some system catalog table columns to be marked NOT NULL. · 82a532b3

Andres Freund authored Feb 21, 2015

In a manual pass over the catalog declaration I found a number of
columns which the boostrap automatism didn't mark NOT NULL even though
they actually were. Add BKI_FORCE_NOT_NULL markings to them.

It's usually not critical if a system table column is falsely determined
to be nullable as the code should always catch relevant cases. But it's
good to have a extra layer in place.

Discussion: 20150215170014.GE15326@awork2.anarazel.de

82a532b3

Allow forcing nullness of columns during bootstrap. · eb68379c

Andres Freund authored Feb 21, 2015

Bootstrap determines whether a column is null based on simple builtin
rules. Those work surprisingly well, but nonetheless a few existing
columns aren't set correctly. Additionally there is at least one patch
sent to hackers where forcing the nullness of a column would be helpful.

The boostrap format has gained FORCE [NOT] NULL for this, which will be
emitted by genbki.pl when BKI_FORCE_(NOT_)?NULL is specified for a
column in a catalog header.

This patch doesn't change the marking of any existing columns.

Discussion: 20150215170014.GE15326@awork2.anarazel.de

eb68379c

Don't need to explain [1] kluge anymore in xfunc.sgml. · 0627eff3
Tom Lane authored Feb 21, 2015

0627eff3
Use FLEXIBLE_ARRAY_MEMBER in a number of other places. · 2e211211
Tom Lane authored Feb 21, 2015
```
I think we're about done with this...
```
2e211211

Use FLEXIBLE_ARRAY_MEMBER for HeapTupleHeaderData.t_bits[]. · e1a11d93

Tom Lane authored Feb 21, 2015

This requires changing quite a few places that were depending on
sizeof(HeapTupleHeaderData), but it seems for the best.

Michael Paquier, some adjustments by me

e1a11d93

Minor code beautification in conninfo_uri_parse_params(). · 3d9b6f31
Tom Lane authored Feb 21, 2015
```
Reading this made me itch, so clean the logic a bit.
```
3d9b6f31

Fix misparsing of empty value in conninfo_uri_parse_params(). · b26e2081

Tom Lane authored Feb 21, 2015

After finding an "=" character, the pointer was advanced twice when it
should only advance once. This is harmless as long as the value after "="
has at least one character; but if it doesn't, we'd miss the terminator
character and include too much in the value.

In principle this could lead to reading off the end of memory. It does not
seem worth treating as a security issue though, because it would happen on
client side, and besides client logic that's taking conninfo strings from
untrusted sources has much worse security problems than this.

Report and patch received off-list from Thomas Fanghaenel.
Back-patch to 9.2 where the faulty code was introduced.

b26e2081