1. 30 Oct, 2015 4 commits
    • Implement lookbehind constraints in our regular-expression engine. · 12c9a040
      Tom Lane authored
      A lookbehind constraint is like a lookahead constraint in that it consumes
      no text; but it checks for existence (or nonexistence) of a match *ending*
      at the current point in the string, rather than one *starting* at the
      current point.  This is a long-requested feature since it exists in many
      other regex libraries, but Henry Spencer had never got around to
      implementing it in the code we use.
      
      Just making it work is actually pretty trivial; but naive copying of the
      logic for lookahead constraints leads to code that often spends O(N^2) time
      to scan an N-character string, because we have to run the match engine
      from string start to the current probe point each time the constraint is
      checked.  In typical use-cases a lookbehind constraint will be written at
      the start of the regex and hence will need to be checked at every character
      --- so O(N^2) work overall.  To fix that, I introduced a third copy of the
      core DFA matching loop, paralleling the existing longest() and shortest()
      loops.  This version, matchuntil(), can suspend and resume matching given
      a couple of pointers' worth of storage space.  So we need only run it
      across the string once, stopping at each interesting probe point and then
      resuming to advance to the next one.
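      The suspend-and-resume idea behind matchuntil() can be sketched in Python (a toy DFA driver with invented names, nothing like the engine's actual code):

```python
# Sketch of the suspend-and-resume idea behind matchuntil(): keep the DFA
# state between probe points instead of rerunning the automaton from the
# start of the string for every probe (which is what costs O(N^2)).
def make_resumable_matcher(transition, start_state, accepting, text):
    state = start_state
    pos = 0

    def match_ends_at(probe):
        """Does some match end exactly at offset `probe`?

        Probes must be asked in non-decreasing order; we only ever
        advance from where the previous probe left off."""
        nonlocal state, pos
        while pos < probe:
            state = transition(state, text[pos])
            pos += 1
        return state in accepting

    return match_ends_at

# Toy DFA: accepting iff the text read so far ends with "ab".
def ends_with_ab(state, ch):
    if ch == "a":
        return 1
    if ch == "b" and state == 1:
        return 2
    return 0

m = make_resumable_matcher(ends_with_ab, 0, {2}, "xabab")
print(m(3), m(5))  # True True -- each probe only advances, never rescans
```

      Checking a lookbehind this way at every character of an N-character string stays O(N) overall, which is the point of the optimization.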
      
      I also put in an optimization that simplifies one-character lookahead and
      lookbehind constraints, such as "(?=x)" or "(?<!\w)", into AHEAD and BEHIND
      constraints, which already existed in the engine.  This avoids the overhead
      of the LACON machinery entirely for these rather common cases.
      
      The net result is that lookbehind constraints run a factor of three or so
      slower than Perl's for multi-character constraints, but faster than Perl's
      for one-character constraints ... and they work fine for variable-length
      constraints, which Perl gives up on entirely.  So that's not bad from a
      competitive perspective, and there's room for further optimization if
      anyone cares.  (In reality, raw scan rate across a large input string is
      probably not that big a deal for Postgres usage anyway; so I'm happy if
      it's linear.)
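      The constraint semantics can be seen in any engine that supports them; a quick illustration with Python's re module (which, like Perl, allows only fixed-length lookbehind, unlike the variable-length support described above):

```python
import re

text = "cat catalog dog"

# Lookahead (?=...): zero-width test on what *starts* at the current point.
# Matches the "cat" inside "catalog" but not the standalone word.
print(re.findall(r"\bcat(?=alog)", text))          # ['cat']

# Lookbehind (?<=...): zero-width test on what *ends* at the current point.
# Matches digits only when a "$" ends just before them.
print(re.findall(r"(?<=\$)\d+", "$5, 7 and $42"))  # ['5', '42']

# Negative lookbehind (?<!...), as in the "(?<!\w)" example above:
# digits not immediately preceded by a word character.
print(re.findall(r"(?<!\w)\d+", "a1 2 b34 56"))    # ['2', '56']
```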
    • doc: security_barrier option is a Boolean, not a string. · c5057b2b
      Robert Haas authored
      Mistake introduced by commit 5bd91e3a.
      
      Hari Babu
    • Update parallel executor support to reuse the same DSM. · 3a1f8611
      Robert Haas authored
      Commit b0b0d84b purported to make it
      possible to relaunch workers using the same parallel context, but it had
      an unpleasant race condition: we might reinitialize after the workers
      have sent their last control message but before they have detached the
      DSM, leading to crashes.  Repair by introducing a new ParallelContext
      operation, ReinitializeParallelDSM.
      
      Adjust execParallel.c to use this new support, so that we can rescan a
      Gather node by relaunching workers but without needing to recreate the
      DSM.
      
      Amit Kapila, with some adjustments by me.  Extracted from latest parallel
      sequential scan patch.
    • Fix typo in bgworker.c · c6baec92
      Robert Haas authored
  2. 29 Oct, 2015 3 commits
  3. 28 Oct, 2015 3 commits
  4. 27 Oct, 2015 5 commits
    • Make Gather node projection-capable. · 8538a630
      Robert Haas authored
      The original Gather code failed to mark a Gather node as not able
      to do projection even though it could not actually project, despite
      initializing its projection info via ExecAssignProjectionInfo.
      There doesn't seem to
      be any good reason for this node not to have projection capability,
      so clean things up so that it does.  Without this, plans using Gather
      nodes might need to carry extra Result nodes to do projection.
    • Document BRIN's inclusion opclass framework · c15898c1
      Alvaro Herrera authored
      Backpatch to 9.5 -- this should have been part of b0b7be61, but we
      didn't have 38b03caebc5de either at the time.
      
      Author: Emre Hasegeli
      Revised by: Ian Barwick
      Discussion:
       http://www.postgresql.org/message-id/CAE2gYzyB39Q9up_-TO6FKhH44pcAM1x6n_Cuj15qKoLoFihUVg@mail.gmail.com
       http://www.postgresql.org/message-id/562DA711.3020305@2ndquadrant.com
    • Fix BRIN free space computations · 21a4e4a4
      Alvaro Herrera authored
      A bug in the original free space computation made it possible to
      return a page which wasn't actually able to fit the item.  Since the
      insertion code isn't prepared to deal with PageAddItem failing, a PANIC
      resulted ("failed to add BRIN tuple [to new page]").  Add a macro to
      encapsulate the correct computation, and use it in
      brin_getinsertbuffer's callers before calling that routine, to raise an
      early error.
      
      I became aware of the possibility of a problem in this area while working
      on ccc4c074.  There's no archived discussion about it, but it's
      easy to reproduce a problem in the unpatched code with something like
      
      CREATE TABLE t (a text);
      CREATE INDEX ti ON t USING brin (a) WITH (pages_per_range=1);
      
      for length in `seq 8000 8196`
      do
      	psql -f - <<EOF
      TRUNCATE TABLE t;
      INSERT INTO t VALUES ('z'), (repeat('a', $length));
      EOF
      done
      
      Backpatch to 9.5, where BRIN was introduced.
    • Cleanup commit timestamp module activation, again · 531d21b7
      Alvaro Herrera authored
      Further tweak commit_ts.c so that on a standby the state is completely
      consistent with that in the master, rather than behaving differently
      in cases where the settings differ.  Now the module should always be
      active or inactive in lockstep on standby and master.
      
      Author: Petr Jelínek, with some further tweaks by Álvaro Herrera.
      
      Backpatch to 9.5, where commit timestamps were introduced.
      
      Discussion: http://www.postgresql.org/message-id/5622BF9D.2010409@2ndquadrant.com
    • Measure string lengths only once · 0cd836a4
      Alvaro Herrera authored
      Bernd Helmle complained that CreateReplicationSlot() was assigning the
      same value to the same variable twice, so we could remove one of them.
      Code inspection reveals that we can actually remove both assignments:
      according to the author, the assignment was there only for the beauty
      of the strlen line; another possible fix is to put the strlen on its
      own line, so do that.
      
      To be consistent within the file, refactor all duplicated strlen()
      calls, which is what we do elsewhere in the backend anyway.  In
      basebackup.c, snprintf already returns the right length; no need for
      strlen afterwards.
      
      Backpatch to 9.4, where replication slots were introduced, to keep code
      identical.  Some of this is older, but the patch doesn't apply cleanly
      and it's only of cosmetic value anyway.
      
      Discussion: http://www.postgresql.org/message-id/BE2FD71DEA35A2287EA5F018@eje.credativ.lan
  5. 23 Oct, 2015 1 commit
  6. 22 Oct, 2015 8 commits
  7. 20 Oct, 2015 9 commits
    • Fix incorrect translation of minus-infinity datetimes for json/jsonb. · d4355425
      Tom Lane authored
      Commit bda76c1c caused both plus and
      minus infinity to be rendered as "infinity", which is not only wrong
      but inconsistent with the pre-9.4 behavior of to_json().  Fix that by
      duplicating the coding in date_out/timestamp_out/timestamptz_out more
      closely.  Per bug #13687 from Stepan Perlov.  Back-patch to 9.4, like
      the previous commit.
      
      In passing, also re-pgindent json.c, since it had gotten a bit messed up by
      recent patches (and I was already annoyed by indentation-related problems
      in back-patching this fix ...)
    • Fix incorrect comment in plannodes.h · a1c466c5
      Robert Haas authored
      Etsuro Fujita
    • Remove duplicate word. · dc486fb9
      Robert Haas authored
      Amit Langote
    • Tab complete CREATE EXTENSION .. VERSION. · 7c0b49cd
      Robert Haas authored
      Jeff Janes
    • Put back ssl_renegotiation_limit parameter, but only allow 0. · 84ef9c59
      Robert Haas authored
      Per a report from Shay Rojansky, Npgsql sends ssl_renegotiation_limit=0
      in the startup packet because it does not support renegotiation; other
      clients which have not attempted to support renegotiation might well
      behave similarly.  The recent removal of this parameter forces them to
      break compatibility with either current PostgreSQL versions, or
      previous ones.  Per discussion, the best solution is to accept the
      parameter but only allow a value of 0.
      
      Shay Rojansky, edited a little by me.
    • Be a bit more rigorous about how we cache strcoll and strxfrm results. · 5be94a9e
      Robert Haas authored
      Commit 0e57b4d8 contained some clever
      logic that attempted to make sure that we couldn't get confused about
      whether the last thing we cached was a strcoll() result or a strxfrm()
      result, but it wasn't quite clever enough, because we can perform
      further abbreviations after having already performed some comparisons.
      Introduce an explicit flag in the hopes of making this watertight.
      
      Peter Geoghegan, reviewed by me.
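      A minimal sketch of the pattern (invented names, not the datum-sort code itself): the cache slot records which *kind* of result it holds, so a cached comparison can never be misread as a cached transform after interleaved calls:

```python
# Sketch: one cache slot shared by two kinds of results (a la strcoll vs
# strxfrm).  The explicit `is_transform` flag is the fix: without it,
# interleaved calls could misread one kind of cached value as the other.
class LastResultCache:
    def __init__(self):
        self.key = None
        self.value = None
        self.is_transform = False

    def store(self, key, value, is_transform):
        self.key = key
        self.value = value
        self.is_transform = is_transform

    def lookup(self, key, want_transform):
        # A hit requires both the key AND the kind to match.
        if self.key == key and self.is_transform == want_transform:
            return self.value
        return None

cache = LastResultCache()
cache.store("abc", -1, is_transform=False)        # cached comparison result
print(cache.lookup("abc", want_transform=True))   # None -- wrong kind
print(cache.lookup("abc", want_transform=False))  # -1
```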
    • Remove obsolete comment. · d53f808e
      Robert Haas authored
      Peter Geoghegan
    • Eschew "RESET statement_timeout" in tests. · 8e3b4d9d
      Noah Misch authored
      Instead, use transaction abort.  Given an unlucky bout of latency, the
      timeout would cancel the RESET itself.  Buildfarm members gharial,
      lapwing, mereswine, shearwater, and sungazer witness that.  Back-patch
      to 9.1 (all supported versions).  The query_canceled test could still
      time out before entering its subtransaction; for whatever reason, that
      has yet to happen on the buildfarm.
  8. 19 Oct, 2015 1 commit
    • Fix incorrect handling of lookahead constraints in pg_regprefix(). · 9f1e642d
      Tom Lane authored
      pg_regprefix was doing nothing with lookahead constraints, which would
      be fine if it were the right kind of nothing, but it isn't: we have to
      terminate our search for a fixed prefix, not just pretend the LACON arc
      isn't there.  Otherwise, if the current state has both a LACON outarc and a
      single plain-color outarc, we'd falsely conclude that the color represents
      an addition to the fixed prefix, and generate an extracted index condition
      that restricts the indexscan too much.  (See added regression test case.)
      
      Terminating the search is conservative: we could traverse the LACON arc
      (thus assuming that the constraint can be satisfied at runtime) and then
      examine the outarcs of the linked-to state.  But that would be a lot more
      work than it seems worth, because writing a LACON followed by a single
      plain character is a pretty silly thing to do.
      
      This makes a difference only in rather contrived cases, but it's a bug,
      so back-patch to all supported branches.
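      The conservative rule is easiest to see with a simplified prefix extractor (a toy character-level walk, nothing like pg_regprefix's actual NFA traversal):

```python
# Sketch: extract the fixed prefix of a pattern for index-range purposes.
# The conservative rule from the commit: stop at the first lookahead
# constraint instead of pretending it isn't there.
def fixed_prefix(pattern):
    prefix = []
    i = 0
    while i < len(pattern):
        if pattern.startswith(("(?=", "(?!"), i):
            break  # LACON: terminate the search for a fixed prefix here
        ch = pattern[i]
        if ch in ".*+?[](){}|^$\\":
            break  # not a plain literal character either
        prefix.append(ch)
        i += 1
    return "".join(prefix)

# Stopping yields "abc".  Skipping the lookahead as if it weren't there
# would have yielded "abce": an index condition that is too restrictive,
# wrongly excluding strings like "abcde" that the regex can match.
print(fixed_prefix("abc(?=d)e"))  # abc
```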
  9. 16 Oct, 2015 6 commits
    • Add a C API for parallel heap scans. · ee7ca559
      Robert Haas authored
      Using this API, one backend can set up a ParallelHeapScanDesc to
      which multiple backends can then attach.  Each tuple in the relation
      will be returned to exactly one of the scanning backends.  Only
      forward scans are supported, and rescans must be carefully
      coordinated.
      
      This is not exposed to the planner or executor yet.
      
      The original version of this code was written by me.  Amit Kapila
      reviewed it, tested it, and improved it, including adding support for
      synchronized scans, per review comments from Jeff Davis.  Extensive
      testing of this and related patches was performed by Haribabu Kommi.
      Final cleanup of this patch by me.
    • Allow a parallel context to relaunch workers. · b0b0d84b
      Robert Haas authored
      This may allow some callers to avoid the overhead involved in tearing
      down a parallel context and then setting up a new one, which means
      releasing the DSM and then allocating and populating a new one.  I
      suspect we'll want to revise the Gather node to make use of this new
      capability, but even if not it may be useful elsewhere and requires
      very little additional code.
    • Miscellaneous cleanup of regular-expression compiler. · afdfcd3f
      Tom Lane authored
      Revert our previous addition of "all" flags to copyins() and copyouts();
      they're no longer needed, and were never anything but an unsightly hack.
      
      Improve a couple of infelicities in the REG_DEBUG code for dumping
      the NFA data structure, including adding code to count the total
      number of states and arcs.
      
      Add a couple of missed error checks.
      
      Add some more documentation in the README file, and some regression tests
      illustrating cases that exceeded the state-count limit and/or took
      unreasonable amounts of time before this set of patches.
      
      Back-patch to all supported branches.
    • Improve memory-usage accounting in regular-expression compiler. · 538b3b8b
      Tom Lane authored
      This code previously counted the number of NFA states it created, and
      complained if a limit was exceeded, so as to prevent bizarre regex patterns
      from consuming unreasonable time or memory.  That's fine as far as it went,
      but the code paid no attention to how many arcs linked those states.  Since
      regexes can be contrived that have O(N) states but will need O(N^2) arcs
      after fixempties() processing, it was still possible to blow out memory,
      and take a long time doing it too.  To fix, modify the bookkeeping to count
      space used by both states and arcs.
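      The shape of the bookkeeping change, as a sketch (illustrative costs and limit, not the engine's real struct sizes or error handling):

```python
# Sketch: charge one complexity budget for both NFA states and arcs, so a
# pattern with few states but O(N^2) arcs still hits the limit in time.
STATE_COST = 1
ARC_COST = 1

class NfaBudget:
    def __init__(self, limit):
        self.used = 0
        self.limit = limit

    def charge(self, amount):
        self.used += amount
        if self.used > self.limit:
            # analogous to REG_ETOOBIG: "regular expression is too complex"
            raise OverflowError("regular expression is too complex")

budget = NfaBudget(limit=10)
budget.charge(STATE_COST * 3)     # a handful of states is fine...
try:
    budget.charge(ARC_COST * 50)  # ...but the arcs they spawn count too
except OverflowError as e:
    print(e)  # regular expression is too complex
```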
      
      I did not bother with including the "color map" in the accounting; it
      can only grow to a few megabytes, which is not a lot in comparison to
      what we're allowing for states+arcs (about 150MB on 64-bit machines
      or half that on 32-bit machines).
      
      Looking at some of the larger real-world regexes captured in the Tcl
      regression test suite suggests that the most that is likely to be needed
      for regexes found in the wild is under 10MB, so I believe that the current
      limit has enough headroom to make it okay to keep it as a hard-wired limit.
      
      In connection with this, redefine REG_ETOOBIG as meaning "regular
      expression is too complex"; the previous wording of "nfa has too many
      states" was already somewhat inapropos because of the error code's use
      for stack depth overrun, and it was not very user-friendly either.
      
      Back-patch to all supported branches.
    • Improve performance of pullback/pushfwd in regular-expression compiler. · 6a715366
      Tom Lane authored
      The previous coding would create a new intermediate state every time it
      wanted to interchange the ordering of two constraint arcs.  Certain regex
      features such as \Y can generate large numbers of parallel constraint arcs,
      and if we needed to reorder the results of that, we created unreasonable
      numbers of intermediate states.  To improve matters, keep a list of
      already-created intermediate states associated with the state currently
      being considered by the outer loop; we can re-use such states to place all
      the new arcs leading to the same destination or source.
      
      I also took the trouble to redefine push() and pull() to have a less risky
      API: they no longer delete any state or arc that the caller might possibly
      have a pointer to, except for the specifically-passed constraint arc.
      This reduces the risk of re-introducing the same type of error seen in
      the failed patch for CVE-2007-4772.
      
      Back-patch to all supported branches.
    • Improve performance of fixempties() pass in regular-expression compiler. · f5b7d103
      Tom Lane authored
      The previous coding took something like O(N^4) time to fully process a
      chain of N EMPTY arcs.  We can't really do much better than O(N^2) because
      we have to insert about that many arcs, but we can do lots better than
      what's there now.  The win comes partly from using mergeins() to amortize
      de-duplication of arcs across multiple source states, and partly from
      exploiting knowledge of the ordering of arcs for each state to avoid
      looking at arcs we don't need to consider during the scan.  We do have
      to be a bit careful of the possible reordering of arcs introduced by
      the sort-merge coding of the previous commit, but that's not hard to
      deal with.
      
      Back-patch to all supported branches.