- 02 Mar, 2021 9 commits
-
-
Tom Lane authored
This allows clients to find out the setting at connection time without having to expend a query round trip to do so, which is helpful when trying to identify read/write servers. (One must also look at in_hot_standby, but that's already GUC_REPORT, cf bf8a662c.) Modifying libpq to make use of this will come soon, but I felt it cleaner to push the server change separately.

Haribabu Kommi, Greg Nancarrow, Vignesh C; reviewed at various times by Laurenz Albe, Takayuki Tsunakawa, Peter Smith.

Discussion: https://postgr.es/m/CAF3+xM+8-ztOkaV9gHiJ3wfgENTq97QcjXQt+rbFQ6F7oNzt9A@mail.gmail.com
-
Alvaro Herrera authored
On Windows, CMD.EXE allegedly does not run a command that uses forward slashes, so let's convert the path to use backslashes instead. Backpatch to 10. Author: Nitin Jadhav <nitinjadhavpostgres@gmail.com> Reviewed-by: Juan José Santamaría Flecha <juanjo.santamaria@gmail.com> Discussion: https://postgr.es/m/CAMm1aWaNDuaPYFYMAqDeJrZmPtNvLcJRS++CcZWY8LT6KcoBZw@mail.gmail.com
-
Tom Lane authored
This extends the changes made in commit cebc1d34, teaching parseqatom() to generate fewer or cheaper subre nodes in some edge cases. The case of interest here is a quantified atom that is "messy" only because it has greediness opposite to what preceded it (whereas captures and backrefs are intrinsically messy). In this case we don't need an iteration node, since we don't care where the sub-matches of the quantifier are; and we might also not need a second concatenation node. This seems of only marginal real-world use according to my testing, but I wanted to get it in before wrapping up this series of regex performance fixes. Discussion: https://postgr.es/m/1340281.1613018383@sss.pgh.pa.us
-
Tom Lane authored
In some cases, at the time that we're doing an NFA-based precheck of whether a backref subexpression can match at a particular place in the string, we already know which substring the referenced subexpression matched. If so, we might as well forget about the NFA and just compare the substring; this is faster and it gives an exact rather than approximate answer.

In general, this optimization can help while we are prechecking within the second child expression of a concat node, while the capture was within the first child expression; then the substring was saved during cdissect() of the first child and will be available to NFA checks done while cdissect() recurses into the second child. It can help quite a lot if the tree looks like

              concat
             /      \
       capture      concat
                   /      \
      expensive stuff    backref

as we will be able to avoid recursively dissecting the "expensive stuff" before discovering that the backref isn't satisfied with a particular midpoint that the lower concat node is testing. This doesn't help if the concat tree is left-deep, as the capture node won't get set soon enough (and it's hard to fix that without changing the engine's match behavior). Fortunately, right-deep concat trees are the common case.

Patch by me, reviewed by Joel Jacobson

Discussion: https://postgr.es/m/661609.1614560029@sss.pgh.pa.us
-
Tom Lane authored
POSIX defines the behavior of back-references thus:

    The back-reference expression '\n' shall match the same (possibly empty)
    string of characters as was matched by a subexpression enclosed between
    "\(" and "\)" preceding the '\n'.

As far as I can see, the back-reference is supposed to consider only the data characters matched by the referenced subexpression. However, because our engine copies the NFA constructed from the referenced subexpression, it effectively enforces any constraints therein, too.

As an example, '(^.)\1' ought to match 'xx', or any other string starting with two occurrences of the same character; but in our code it does not, and indeed can't match anything, because the '^' anchor constraint is included in the backref's copied NFA. If POSIX intended that, you'd think they'd mention it. Perl for one doesn't act that way, so it's hard to conclude that this isn't a bug.

Fix by modifying the backref's NFA immediately after it's copied from the reference, replacing all constraint arcs by EMPTY arcs so that the constraints are treated as automatically satisfied. This still allows us to enforce matching rules that depend only on the data characters; for example, in '(^\d+).*\1' the NFA matching step will still know that the backref can only match strings of digits.

Perhaps surprisingly, this change does not affect the results of any of a rather large corpus of real-world regexes. Nonetheless, I would not consider back-patching it, since it's a clear compatibility break.

Patch by me, reviewed by Joel Jacobson

Discussion: https://postgr.es/m/661609.1614560029@sss.pgh.pa.us
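A quick check of the behavior change, using the example quoted in the message (the constraint stripped from the backref's copied NFA is the '^' anchor):

    -- Previously this pattern could not match anything, because the copied
    -- NFA enforced the '^' constraint inside the backref; with constraint
    -- arcs replaced by EMPTY arcs it matches 'xx'.
    SELECT 'xx' ~ '(^.)\1';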
-
Michael Paquier authored
The same test for REINDEX (VERBOSE) was done twice, while it is clear that the second test should use --concurrently. Issue introduced in 5dc92b84, for what looks like a copy-paste mistake. Reviewed-by: Mark Dilger Discussion: https://postgr.es/m/A7AE97EA-F4B0-4CAB-8FFF-3FECD31F9D63@enterprisedb.com Backpatch-through: 12
-
Michael Paquier authored
The same code pattern was repeated twice to enable or disable ROW LEVEL SECURITY with an ALTER TABLE command. This makes the code slightly cleaner. Author: Justin Pryzby Reviewed-by: Zhihong Yu Discussion: https://postgr.es/m/20210228211854.GC20769@telsasoft.com
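For reference, these are the two command forms whose handling shared the repeated pattern (the table name is hypothetical):

    ALTER TABLE t ENABLE ROW LEVEL SECURITY;
    ALTER TABLE t DISABLE ROW LEVEL SECURITY;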
-
Michael Paquier authored
This partially reverts bcf2667b, which got merged incorrectly, and improves the wording of the documentation that existed before that. Per discussion with Justin Pryzby.

Discussion: https://postgr.es/m/20210301004647.GF20769@telsasoft.com
-
Michael Paquier authored
The behavior is similar to restore_command, which was already documented for the restore part, but not the archive part. Author: Benoit Lobréau Reviewed-by: Julien Rouhaud Discussion: https://postgr.es/m/CAPE8EZ7akCzc1hWohA4AcbmKtHh9rcWAB5MStOeZD2+9jC+hLQ@mail.gmail.com
-
- 01 Mar, 2021 12 commits
-
-
Tom Lane authored
Point to the specific line where the error was detected; the previous code tended to include several preceding lines as well. Avoid re-scanning the entire input to recompute which line that was. Simplify the logic a bit. Add test cases. Simon Riggs and Hamid Akhtar, reviewed by Daniel Gustafsson and myself Discussion: https://postgr.es/m/CANbhV-EPBnXm3MF_TTWBwwqgn1a1Ghmep9VHfqmNBQ8BT0f+_g@mail.gmail.com
-
Thomas Munro authored
Commit 814f1d8b removed the behavior described. Reported-by: Amit Kapila <amit.kapila16@gmail.com>
-
Andres Freund authored
The psql processes were not explicitly killed (but would eventually exit due to postgres shutting down). For some reason Windows Perl doesn't like that, resulting in errors like:

    Warning: unable to close filehandle GEN20 properly: Bad file descriptor during global destruction.

The test was introduced in d6734a897e3, so no backpatching is necessary.
-
Thomas Munro authored
Adjust the condition variable sleep loop to work correctly when code reached by its internal CHECK_FOR_INTERRUPTS() call interacts with another condition variable. There are no such cases currently, but a proposed patch would do this. Discussion: https://postgr.es/m/CA+hUKGLdemy2gBm80kz20GTe6hNVwoErE8KwcJk6-U56oStjtg@mail.gmail.com
-
Thomas Munro authored
Instead of a poll/sleep loop, use a condition variable for precise wake-up whenever a backend's pss_barrierGeneration advances. Discussion: https://postgr.es/m/CA+hUKGLdemy2gBm80kz20GTe6hNVwoErE8KwcJk6-U56oStjtg@mail.gmail.com
-
Amit Kapila authored
In commit a271a1b5, we allowed decoding at prepare time, and the prepare was decoded again if there was a restart after decoding it. It was done that way because we can't distinguish between the cases where we have not decoded the prepare because it was prior to a consistent snapshot or we have decoded it earlier but restarted. To distinguish between these two cases, we have introduced an initial_consistent_point at the slot level, which is an LSN at which we found a consistent point at the time of slot creation. This is also the point where we have exported a snapshot for the initial copy. So, prepared transactions prior to this point are sent along with commit prepared.

This commit bumps SNAPBUILD_VERSION because of a change in SnapBuild. It will break existing slots, which is fine in a major release.

Author: Ajin Cherian, based on an idea by Andres Freund
Reviewed-by: Amit Kapila and Vignesh C
Discussion: https://postgr.es/m/d0f60d60-133d-bf8d-bd70-47784d8fabf3@enterprisedb.com
-
Thomas Munro authored
This avoids the need to set up and tear down a fresh WaitEventSet every time we need to wait. We have to add an explicit exit on postmaster exit (FeBeWaitSet isn't set up to do that automatically), so move the code to do that into a new function to avoid repetition.

Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com> (earlier version)
Discussion: https://postgr.es/m/CA%2BhUKGJAC4Oqao%3DqforhNey20J8CiG2R%3DoBPqvfR0vOJrFysGw%40mail.gmail.com
-
Thomas Munro authored
Previously we used 0 and 1 to refer to the socket and latch in far flung parts of the tree, without any explanation. Also use PGINVALID_SOCKET rather than -1 in a couple of places that didn't already do that. Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Discussion: https://postgr.es/m/CA%2BhUKGJAC4Oqao%3DqforhNey20J8CiG2R%3DoBPqvfR0vOJrFysGw%40mail.gmail.com
-
Amit Kapila authored
Forgot to update the logical replication configuration settings page. After commit ce0fdbfe, table synchronization workers also started using replication origins to track the progress and the same should be reflected in docs. Author: Amit Kapila Discussion: https://postgr.es/m/CAA4eK1KkbppndxxRKbaT2sXrLkdPwy44F4pjEZ0EDrVjD9MPjQ@mail.gmail.com
-
Amit Kapila authored
Commit a271a1b5 introduced decoding at prepare time in ReorderBuffer. This can lead to deadlock for out-of-core logical replication solutions that use this feature to build distributed 2PC, in case such transactions lock [user] catalog tables exclusively. They need to inform users not to take locks on catalog tables (via an explicit LOCK command) in such transactions.

Reported-by: Andres Freund
Discussion: https://postgr.es/m/20210222222847.tpnb6eg3yiykzpky@alap3.anarazel.de
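As a hedged illustration of the pattern such solutions must tell users to avoid (the exclusive catalog lock is the problem; the particular catalog table and transaction name are just examples):

    BEGIN;
    -- an explicit exclusive lock on a [user] catalog table ...
    LOCK TABLE pg_class IN ACCESS EXCLUSIVE MODE;
    -- ... combined with decoding at PREPARE time can deadlock the decoder
    PREPARE TRANSACTION 'dist_txn_1';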
-
Thomas Munro authored
Cut down on system calls and other overheads by waiting for SIGURG explicitly with kqueue instead of using a signal handler and self-pipe. Affects *BSD and macOS systems. This leaves only the poll implementation with a signal handler and the traditional self-pipe trick. Discussion: https://postgr.es/m/CA+hUKGJjxPDpzBE0a3hyUywBvaZuC89yx3jK9RFZgfv_KHU7gg@mail.gmail.com
-
Thomas Munro authored
Cut down on system calls and other overheads by reading from a signalfd instead of using a signal handler and self-pipe. Affects Linux systems, and possibly others including illumos that implement the Linux epoll and signalfd interfaces.

Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CA+hUKGJjxPDpzBE0a3hyUywBvaZuC89yx3jK9RFZgfv_KHU7gg@mail.gmail.com
-
- 28 Feb, 2021 3 commits
-
-
Thomas Munro authored
Traditionally, SIGUSR1 has been overloaded for ad-hoc signals, procsignal.c signals and latch.c wakeups. Move that last use over to a new dedicated signal. SIGURG is normally used to report out-of-band socket data, but PostgreSQL doesn't use that facility. The signal handler is now installed in all postmaster children by InitializeLatchSupport(). Those wishing to disconnect from it should call ShutdownLatchSupport(). Future patches will use this separation of signals to avoid the need for a signal handler on some operating systems. Discussion: https://postgr.es/m/CA+hUKGJjxPDpzBE0a3hyUywBvaZuC89yx3jK9RFZgfv_KHU7gg@mail.gmail.com
-
Thomas Munro authored
Don't send signals to processes that aren't sleeping. Author: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/CA+hUKGJjxPDpzBE0a3hyUywBvaZuC89yx3jK9RFZgfv_KHU7gg@mail.gmail.com
-
Thomas Munro authored
Commit 82ebbeb0 added a workaround for systems with no epoll_create1() and EPOLL_CLOEXEC. Linux < 2.6.27 and glibc < 2.9 are long gone. Now seems like a good time to drop the extra code, because otherwise we'd have to add similar already-dead workaround code to new patches using XXX_CLOEXEC flags that arrived in the same kernel release. Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/CA%2BhUKGKL_%3DaO%3Dr30N%3Ds9VoDgTqHpRSzePRbA9dkYO7snc7HsxA%40mail.gmail.com
-
- 27 Feb, 2021 6 commits
-
-
Michael Paquier authored
The last use of ecnt was in 12788ae4. It was getting incremented after a backend error without any purpose since then, so let's get rid of it. Author: Kota Miyake Reviewed-by: Álvaro Herrera Discussion: https://postgr.es/m/786c3d9fbe067763d899e78c296f9f0f@oss.nttdata.com
-
Alvaro Herrera authored
AfterTriggerSaveEvent() wrongly allocates the slot in execution-span memory context, whereas the correct thing is to allocate it in a transaction-span context, because that's where the enclosing AfterTriggersTableData instance belongs. Backpatch to 12 (the test back to 11, where it works well with no code changes, and it's good to have it there to confirm that the case was previously well supported); this bug seems introduced by commit ff11e7f4.

Reported-by: Bertrand Drouvot <bdrouvot@amazon.com>
Author: Amit Langote <amitlangote09@gmail.com>
Discussion: https://postgr.es/m/39a71864-b120-5a5c-8cc5-c632b6f16761@amazon.com
-
Noah Misch authored
Per buildfarm member hornet. The test is new in v14, so no back-patch.
-
David Rowley authored
Mistakenly forgotten in bb437f99
-
David Rowley authored
This adds a new executor node named TID Range Scan. The query planner will generate paths for TID Range scans when quals are discovered on base relations which search for ranges on the table's ctid column. These ranges may be open at either end. For example, WHERE ctid >= '(10,0)'; will return all tuples on page 10 and over.

To support this, two new optional callback functions have been added to table AM. scan_set_tidrange is used to set the scan range to just the given range of TIDs. scan_getnextslot_tidrange fetches the next tuple in the given range. For AMs where scanning ranges of TIDs would not make sense, these functions can be set to NULL in the TableAmRoutine. The query planner won't generate TID Range Scan Paths in that case.

Author: Edmund Horner, David Rowley
Reviewed-by: David Rowley, Tomas Vondra, Tom Lane, Andres Freund, Zhihong Yu
Discussion: https://postgr.es/m/CAMyN-kB-nFTkF=VA_JPwFNo08S0d-Yk0F741S2B7LDmYAi8eyA@mail.gmail.com
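A minimal sketch of a query that can now use the new node (the table name is hypothetical; the ctid qual is the one from the message):

    -- Fetches all tuples on page 10 and over; with this patch the planner
    -- can choose a TID Range Scan instead of scanning the whole table.
    EXPLAIN SELECT * FROM t WHERE ctid >= '(10,0)';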
-
Peter Eisentraut authored
Per SQL:202x draft, in the CYCLE clause of a recursive query, the cycle mark values can be of type boolean and can be omitted, in which case they default to TRUE and FALSE. Reviewed-by: Vik Fearing <vik@postgresfriends.org> Discussion: https://www.postgresql.org/message-id/flat/db80ceee-6f97-9b4a-8ee8-3ba0c58e5be2@2ndquadrant.com
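A sketch of the shortened form this allows (table and column names are hypothetical; per the message, the omitted mark values default to TRUE and FALSE and the mark column is then of type boolean):

    WITH RECURSIVE search_graph(id, link) AS (
        SELECT g.id, g.link FROM graph g
      UNION ALL
        SELECT g.id, g.link
        FROM graph g JOIN search_graph sg ON g.id = sg.link
    ) CYCLE id SET is_cycle USING path   -- no "TO ... DEFAULT ..." needed
    SELECT * FROM search_graph;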
-
- 26 Feb, 2021 6 commits
-
-
Tom Lane authored
Break the synopsis into named parts to make it less confusing. Make more than zero effort at applying SGML markup. Do a bit of copy-editing of nearby text. The synopsis revision is by Alvaro Herrera and Paul Förster, the rest is my fault. Back-patch to v10 where multi-host connection strings appeared. Discussion: https://postgr.es/m/6E752D6B-487C-463E-B6E2-C32E7FB007EA@gmail.com
-
Tom Lane authored
The previous logic here created a separate pool of arcs for each state, so that the out-arcs of each state were physically stored within it. Perhaps this choice was driven by trying to not include a "from" pointer within each arc; but Spencer gave up on that idea long ago, and it's hard to see what the value is now. The approach turns out to be fairly disastrous in terms of memory consumption, though.

In the first place, NFAs built by this engine seem to have about 4 arcs per state on average, with a majority having only one or two out-arcs. So pre-allocating 10 out-arcs for each state is already cause for a factor of two or more bloat. Worse, the NFA optimization phase moves arcs around with abandon. In a large NFA, some of the states will have hundreds of out-arcs, so towards the end of the optimization phase we have a significant number of states whose arc pools have room for hundreds of arcs each, even though only a few of those arcs are in use. We have seen real-world regexes in which this effect bloats the memory requirement by 25X or even more.

Hence, get rid of the per-state arc pools in favor of a single arc pool for the whole NFA, with variable-sized allocation batches instead of always asking for 10 at a time. While we're at it, let's batch the allocations of state structs too, to further reduce the malloc traffic.

This incidentally allows moveouts() to be optimized in a similar way to moveins(): when moving an arc to another state, it's now valid to just re-link the same arc struct into a different outchain, where before the code invariants required us to make a physically new arc and then free the old one.

These changes reduce the regex compiler's typical space consumption for average-size regexes by about a factor of two, and much more for large or complicated regexes. In a large test set of real-world regexes, we formerly had half a dozen cases that failed with "regular expression too complex" due to exceeding the REG_MAX_COMPILE_SPACE limit (about 150MB); we would have had to raise that limit to something close to 400MB to make them work with the old code. Now, none of those cases need more than 13MB to compile. Furthermore, the test set is about 10% faster overall due to less malloc traffic.

Discussion: https://postgr.es/m/168861.1614298592@sss.pgh.pa.us
-
Peter Eisentraut authored
This will possibly help a subsequent patch by making sure the notice messages are distinct so that it's clear that they come out in the right order. Author: Fabien COELHO <coelho@cri.ensmp.fr> Discussion: https://www.postgresql.org/message-id/alpine.DEB.2.21.1904240654120.3407%40lancre
-
Michael Paquier authored
The commands mentioned in the docs with gzip and gunzip did not add a ".gz" suffix to the archives and used inconsistent paths for the archives, which can be confusing.

Reported-by: Philipp Gramzow
Reviewed-by: Fujii Masao
Discussion: https://postgr.es/m/161397938841.15451.13129264141285167267@wrigleys.postgresql.org
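A hedged sketch of what the corrected examples amount to, adapted into SQL form here (the archive directory path is hypothetical; %p is the path of the file to archive or restore and %f its file name):

    -- archiving side: compress and keep the ".gz" suffix on the archive
    ALTER SYSTEM SET archive_command = 'gzip < %p > /mnt/server/archivedir/%f.gz';
    -- restore side: decompress the matching ".gz" archive
    ALTER SYSTEM SET restore_command = 'gunzip < /mnt/server/archivedir/%f.gz > %p';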
-
Thomas Munro authored
This reverts commit 9cf184cc. Name change less well received than anticipated. Discussion: https://postgr.es/m/afcfb97e-88a1-a540-db95-6c573b93bc2b%40eisentraut.org
-
Tom Lane authored
makeDependencyGraphWalker and checkWellFormedRecursionWalker thought they could hold onto a pointer to a list's first cons cell while the list was modified by recursive calls. That was okay when the cons cell was actually separately palloc'd ... but since commit 1cff1b95, it's quite unsafe, leading to core dumps or incorrect complaints of faulty WITH nesting. In the field this'd require at least a seven-deep WITH nest to cause an issue, but enabling DEBUG_LIST_MEMORY_USAGE allows the bug to be seen with lesser nesting depths. Per bug #16801 from Alexander Lakhin. Back-patch to v13. Michael Paquier and Tom Lane Discussion: https://postgr.es/m/16801-393c7922143eaa4d@postgresql.org
-
- 25 Feb, 2021 4 commits
-
-
Peter Geoghegan authored
Teach VACUUM VERBOSE to report on pages deleted by the _current_ VACUUM operation -- these are newly deleted pages. VACUUM VERBOSE continues to report on the total number of deleted pages in the entire index (no change there). The former is a subset of the latter.

The distinction between each category of deleted index page only arises with index AMs where page deletion is supported and is decoupled from page recycling for performance reasons.

This is follow-up work to commit e5d8a999, which made nbtree store 64-bit XIDs (not 32-bit XIDs) in pages at the point at which they're deleted. Note that the btm_last_cleanup_num_delpages metapage field added by that commit usually gets set to pages_newly_deleted. The exceptions (the scenarios in which they're not equal) all seem to be tricky cases for the implementation (of page deletion and recycling) in general.

Author: Peter Geoghegan <pg@bowt.ie>
Discussion: https://postgr.es/m/CAH2-WznpdHvujGUwYZ8sihX%3Dd5u-tRYhi-F4wnV2uN2zHpMUXw%40mail.gmail.com
-
Tom Lane authored
We aren't publishing this file as documentation, and it's been much more haphazardly maintained than the real docs in func.sgml, so let's just drop it. I think the only reason I included it in commit 7bcc6d98 was that the Berkeley-era sources had had a man page in this directory. Discussion: https://postgr.es/m/4099447.1614186542@sss.pgh.pa.us
-
Tom Lane authored
Newline is certainly not a digit, nor a word character, so it is sensible that it should match these complemented character classes. Previously, \D and \W acted that way by default, but in newline-sensitive mode ('n' or 'p' flag) they did not match newlines.

This behavior was previously forced because explicit complemented character classes don't match newlines in newline-sensitive mode; but as of the previous commit that implementation constraint no longer exists. It seems useful to change this because the primary real-world use for newline-sensitive mode seems to be to match the default behavior of other regex engines such as Perl and Javascript ... and their default behavior is that these match newlines.

The old behavior can be kept by writing an explicit complemented character class, i.e. [^[:digit:]] or [^[:word:]]. (This means that \D and \W are not exactly equivalent to those strings, but they weren't anyway.)

Discussion: https://postgr.es/m/3220564.1613859619@sss.pgh.pa.us
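A hedged illustration of the new default (standard_conforming_strings is assumed, so the pattern literal keeps its backslash):

    -- With the newline-sensitive 'n' flag, \D now matches a newline, as in
    -- Perl; write [^[:digit:]] instead to keep the old behavior.
    SELECT regexp_match(E'1\n2', '1\D2', 'n');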
-
Tom Lane authored
The complement-class escapes \D, \S, \W are now allowed within bracket expressions. There is no semantic difficulty with doing that, but the rather hokey macro-expansion-based implementation previously used here couldn't cope.

Also, invent "word" as an allowed character class name, thus "\w" is now equivalent to "[[:word:]]" outside brackets, or "[:word:]" within brackets. POSIX allows such implementation-specific extensions, and the same name is used in e.g. bash.

One surprising compatibility issue this raises is that constructs such as "[\w-_]" are now disallowed, as our documentation has always said they should be: character classes can't be endpoints of a range. Previously, because \w was just a macro for "[:alnum:]_", such a construct was read as "[[:alnum:]_-_]", so it was accepted so long as the character after "-" was numerically greater than or equal to "_".

Some implementation cleanup along the way:

* Remove the lexnest() hack, and in consequence clean up wordchrs() to not interact with the lexer.

* Fix colorcomplement() to not be O(N^2) in the number of colors involved.

* Get rid of useless-as-far-as-I-can-see calls of element() on single-character character element names in brackpart(). element() always maps these to the character itself, and things would be quite broken if it didn't --- should "[a]" match something different than "a" does? Besides, the shortcut path in brackpart() wasn't doing this anyway, making it even more inconsistent.

Discussion: https://postgr.es/m/2845172.1613674385@sss.pgh.pa.us
Discussion: https://postgr.es/m/3220564.1613859619@sss.pgh.pa.us
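A few hedged examples of the equivalences described above (ordinary string literals, so the backslashes reach the regex engine intact):

    SELECT 'foo_1' ~ '^\w+$';            -- word-character escape, as before
    SELECT 'foo_1' ~ '^[[:word:]]+$';    -- new class name, same meaning
    SELECT 'a b'   ~ '[\W]';             -- \W is now accepted inside brackets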
-