1. 23 Feb, 2021 7 commits
  2. 22 Feb, 2021 13 commits
  3. 21 Feb, 2021 2 commits
    • Tom Lane's avatar
      Avoid generating extra subre tree nodes for capturing parentheses. · ea1268f6
      Tom Lane authored
      Previously, each pair of capturing parentheses gave rise to a separate
      subre tree node, whose only function was to identify that we ought to
      capture the match details for this particular sub-expression.  In
      most cases we don't really need that, since we can perfectly well
      put a "capture this" annotation on the child node that does the real
      matching work.  As with the two preceding commits, the main value
      of this is to avoid generating and optimizing an NFA for a tree node
      that's not really pulling its weight.
      
      The chosen data representation only allows one capture annotation
      per subre node.  In the legal-per-spec, but seemingly not very useful,
      case where there are multiple capturing parens around the exact same
      bit of the regex (i.e. "((xyz))"), wrap the child node in N-1 capture
      nodes that act the same as before.  We could work harder at that but
      I'll refrain, pending some evidence that such cases are worth troubling
      over.
      
      In passing, improve the comments in regex.h to say what all the
      different re_info bits mean.  Some of them were pretty obvious
      but others not so much, so reverse-engineer some documentation.
      
      This is part of a patch series that in total reduces the regex engine's
      runtime by about a factor of four on a large corpus of real-world regexes.
      
      Patch by me, reviewed by Joel Jacobson
      
      Discussion: https://postgr.es/m/1340281.1613018383@sss.pgh.pa.us
      ea1268f6
    • Tom Lane's avatar
      Convert regex engine's subre tree from binary to N-ary style. · 58104308
      Tom Lane authored
      Instead of having left and right child links in subre structs,
      have a single child link plus a sibling link.  Multiple children
      of a tree node are now reached by chasing the sibling chain.
      
      The beneficiary of this is alternation tree nodes.  A regular
      expression with N (>1) branches is now represented by one alternation
      node with N children, rather than a tree that includes N alternation
      nodes as well as N children.  While the old representation didn't
      really cost anything extra at execution time, it was pretty horrid
      for compilation purposes, because each of the alternation nodes had
      its own NFA, which we were too stupid not to separately optimize.
      (To make matters worse, all of those NFAs described the entire
      alternation pattern, not just the portion of it that one might
      expect from the tree structure.)
      
      We continue to require concatenation nodes to have exactly two
      children.  This data structure is now prepared to support more,
      but the executor's logic would need some careful redesign, and
      it's not clear that a lot of benefit could be had.
      
      This is part of a patch series that in total reduces the regex engine's
      runtime by about a factor of four on a large corpus of real-world regexes.
      
      Patch by me, reviewed by Joel Jacobson
      
      Discussion: https://postgr.es/m/1340281.1613018383@sss.pgh.pa.us
      58104308
  4. 20 Feb, 2021 5 commits
    • Tom Lane's avatar
      Fix regex engine to suppress useless concatenation sub-REs. · cebc1d34
      Tom Lane authored
      The comment for parsebranch() claims that it avoids generating
      unnecessary concatenation nodes in the "subre" tree, but it missed
      some significant cases.  Once we've decided that a given atom is
      "messy" and can't be bundled with the preceding atom(s) of the
      current regex branch, parseqatom() always generated two new concat
      nodes, one to concat the messy atom to what follows it in the branch,
      and an upper node to concatenate the preceding part of the branch
      to that one.  But one or both of these could be unnecessary, if the
      messy atom is the first, last, or only one in the branch.  Improve
      the code to suppress such useless concat nodes, along with the
      no-op child nodes representing empty chunks of a branch.
      
      Reducing the number of subre tree nodes offers significant savings
      not only at execution but during compilation, because each subre node
      has its own NFA that has to be separately optimized.  (Maybe someday
      we'll figure out how to share the optimization work across multiple
      tree nodes, but it doesn't look easy.)  Eliminating upper tree nodes
      is especially useful because they tend to have larger NFAs.
      
      This is part of a patch series that in total reduces the regex engine's
      runtime by about a factor of four on a large corpus of real-world regexes.
      
      Patch by me, reviewed by Joel Jacobson
      
      Discussion: https://postgr.es/m/1340281.1613018383@sss.pgh.pa.us
      cebc1d34
    • Tom Lane's avatar
      Recognize "match-all" NFAs within the regex engine. · 824bf719
      Tom Lane authored
      This builds on the previous "rainbow" patch to detect NFAs that will
      match any string, though possibly with constraints on the string length.
      This definition is chosen to match constructs such as ".*", ".+", and
      ".{1,100}".  Recognizing such an NFA after the optimization pass is
      fairly cheap, since we basically just have to verify that all arcs
      are RAINBOW arcs and count the number of steps to the end state.
      (Well, there's a bit of complication with pseudo-color arcs for string
      boundary conditions, but not much.)
      
      Once we have these markings, the regex executor functions longest(),
      shortest(), and matchuntil() don't have to expend per-character work
      to determine whether a given substring satisfies such an NFA; they
      just need to check its length against the bounds.  Since some matching
      problems require O(N) invocations of these functions, we've reduced
      the runtime for an N-character string from O(N^2) to O(N).  Of course,
      this is no help for non-matchall sub-patterns, but those usually have
      constraints that allow us to avoid needing O(N) substring checks in the
      first place.  It's precisely the unconstrained "match-all" cases that
      cause the most headaches.
      
      This is part of a patch series that in total reduces the regex engine's
      runtime by about a factor of four on a large corpus of real-world regexes.
      
      Patch by me, reviewed by Joel Jacobson
      
      Discussion: https://postgr.es/m/1340281.1613018383@sss.pgh.pa.us
      824bf719
    • Tom Lane's avatar
      Invent "rainbow" arcs within the regex engine. · 08c0d6ad
      Tom Lane authored
      Some regular expression constructs, most notably the "." match-anything
      metacharacter, produce a sheaf of parallel NFA arcs covering all
      possible colors (that is, character equivalence classes).  We can make
      a noticeable improvement in the space and time needed to process large
      regexes by replacing such cases with a single arc bearing the special
      color code "RAINBOW".  This requires only minor additional complication
      in places such as pull() and push().
      
      Callers of pg_reg_getoutarcs() must now be prepared for the possibility
      of seeing a RAINBOW arc.  For the one known user, contrib/pg_trgm,
      that's a net benefit since it cuts the number of arcs to be dealt with,
      and the handling isn't any different than for other colors that contain
      too many characters to be dealt with individually.
      
      This is part of a patch series that in total reduces the regex engine's
      runtime by about a factor of four on a large corpus of real-world regexes.
      
      Patch by me, reviewed by Joel Jacobson
      
      Discussion: https://postgr.es/m/1340281.1613018383@sss.pgh.pa.us
      08c0d6ad
    • Michael Paquier's avatar
      doc: Mention that partitions_{done,total} is 0 for REINDEX progress reports · 17661188
      Michael Paquier authored
      REINDEX has recently gained support for partitions, so it can be
      confusing to see those fields not being set.  Making useful reports for
      for such relations is more complicated than it looks with the current
      set of columns available in pg_stat_progress_create_index, and this
      touches equally REINDEX DATABASE/SYSTEM/SCHEMA.  This commit documents
      that those two columns are not touched during a REINDEX.
      
      Reported-by: Justin Pryzby
      Discussion: https://postgr.es/m/20210216064214.GI28165@telsasoft.com
      17661188
    • Michael Paquier's avatar
      Fix inconsistent configure data for --with-ssl · a899ec1c
      Michael Paquier authored
      This inconsistency was showing up after an autoreconf.
      
      Reported-by: Antonin Houska
      Reviewed-by: Tom Lane
      Discussion: https://postgr.es/m/47255.1613716807@antos
      a899ec1c
  5. 19 Feb, 2021 5 commits
    • Fujii Masao's avatar
      Fix psql's ON_ERROR_ROLLBACK so that it handles COMMIT AND CHAIN. · fe06819f
      Fujii Masao authored
      When ON_ERROR_ROLLBACK is enabled, psql releases a temporary savepoint
      if it's idle in a valid transaction block after executing a query. But psql
      doesn't do that after RELEASE or ROLLBACK is executed because a temporary
      savepoint has already been destroyed in that case.
      
      This commit changes psql's ON_ERROR_ROLLBACK so that it doesn't release
      a temporary savepoint also when COMMIT AND CHAIN is executed. A temporary
      savepoint doesn't need to be released in that case because
      COMMIT AND CHAIN also destroys any savepoints defined within the transaction
      to commit. Otherwise psql tries to release the savepoint that
      COMMIT AND CHAIN has already destroyed and cause an error
      "ERROR:  savepoint "pg_psql_temporary_savepoint" does not exist".
      
      Back-patch to v12 where transaction chaining was added.
      
      Reported-by: Arthur Nascimento
      Author: Arthur Nascimento
      Reviewed-by: Fujii Masao, Vik Fearing
      Discussion: https://postgr.es/m/16867-3475744069228158@postgresql.org
      fe06819f
    • Fujii Masao's avatar
      Fix bug in COMMIT AND CHAIN command. · 8a55cb5b
      Fujii Masao authored
      This commit fixes COMMIT AND CHAIN command so that it starts new transaction
      immediately even if savepoints are defined within the transaction to commit.
      Previously COMMIT AND CHAIN command did not in that case because
      commit 280a408b forgot to make CommitTransactionCommand() handle
      a transaction chaining when the transaction state was TBLOCK_SUBCOMMIT.
      
      Also this commit adds the regression test for COMMIT AND CHAIN command
      when savepoints are defined.
      
      Back-patch to v12 where transaction chaining was added.
      
      Reported-by: Arthur Nascimento
      Author: Fujii Masao
      Reviewed-by: Arthur Nascimento, Vik Fearing
      Discussion: https://postgr.es/m/16867-3475744069228158@postgresql.org
      8a55cb5b
    • Peter Eisentraut's avatar
      Update snowball · 678d0e23
      Peter Eisentraut authored
      Update to snowball tag v2.1.0.  Major changes are new stemmers for
      Armenian, Serbian, and Yiddish.
      678d0e23
    • Peter Geoghegan's avatar
      Add nbtree README section on page recycling. · b071a311
      Peter Geoghegan authored
      Consolidate discussion of how VACUUM places pages in the FSM for
      recycling by adding a new section that comes after discussion of page
      deletion.  This structure reflects the fact that page recycling is
      explicitly decoupled from page deletion in Lanin & Shasha's paper.  Page
      recycling in nbtree is an implementation of what the paper calls "the
      drain technique".
      
      This decoupling is an important concept for nbtree VACUUM.  Searchers
      have to detect and recover from concurrent page deletions, but they will
      never have to reason about concurrent page recycling.  Recycling can
      almost always be thought of as a low level garbage collection operation
      that asynchronously frees the physical space that backs a logical tree
      node.  Almost all code need only concern itself with logical tree nodes.
      (Note that "logical tree node" is not currently a term of art in the
      nbtree code -- this all works implicitly.)
      
      This is preparation for an upcoming patch that teaches nbtree VACUUM to
      remember the details of pages that it deletes on the fly, in local
      memory.  This enables the same VACUUM operation to consider placing its
      own deleted pages in the FSM later on, when it reaches the end of
      btvacuumscan().
      b071a311
    • Tom Lane's avatar
      Fix another ancient bug in parsing of BRE-mode regular expressions. · b5a66e73
      Tom Lane authored
      While poking at the regex code, I happened to notice that the bug
      squashed in commit afcc8772 had a sibling: next() failed to return
      a specific value associated with the '}' token for a "\{m,n\}"
      quantifier when parsing in basic RE mode.  Again, this could result
      in treating the quantifier as non-greedy, which it never should be in
      basic mode.  For that to happen, the last character before "\}" that
      sets "nextvalue" would have to set it to zero, or it'd have to have
      accidentally been zero from the start.  The failure can be provoked
      repeatably with, for example, a bound ending in digit "0".
      
      Like the previous patch, back-patch all the way.
      b5a66e73
  6. 18 Feb, 2021 4 commits
    • Fujii Masao's avatar
      Fix "invalid spinlock number: 0" error in pg_stat_wal_receiver. · 614b7f18
      Fujii Masao authored
      Commit 2c8dd05d added the atomic variable writtenUpto into
      walreceiver's shared memory information. It's initialized only
      when walreceiver started up but could be read via pg_stat_wal_receiver
      view anytime, i.e., even before it's initialized. In the server built
      with --disable-atomics and --disable-spinlocks, this uninitialized
      atomic variable read could cause "invalid spinlock number: 0" error.
      
      This commit changed writtenUpto so that it's initialized at
      the postmaster startup, to avoid the uninitialized variable read
      via pg_stat_wal_receiver and fix the error.
      
      Also this commit moved the read of writtenUpto after the release
      of spinlock protecting walreceiver's shared variables. This is
      necessary to prevent new spinlock from being taken by atomic
      variable read while holding another spinlock, and to shorten
      the spinlock duration. This change leads writtenUpto not to be
      consistent with the other walreceiver's shared variables protected
      by a spinlock. But this is OK because writtenUpto should not be
      used for data integrity checks.
      
      Back-patch to v13 where commit 2c8dd05d introduced the bug.
      
      Author: Fujii Masao
      Reviewed-by: Michael Paquier, Thomas Munro, Andres Freund
      Discussion: https://postgr.es/m/7ef8708c-5b6b-edd3-2cf2-7783f1c7c175@oss.nttdata.com
      614b7f18
    • Peter Eisentraut's avatar
      Add tests for bytea LIKE operator · eb42110d
      Peter Eisentraut authored
      Add test coverage for the following operations, which were previously
      not tested at all:
      
      bytea LIKE bytea (bytealike)
      bytea NOT LIKE bytea (byteanlike)
      ESCAPE clause for the above (like_escape_bytea)
      
      also
      
      name NOT ILIKE text (nameicnlike)
      
      Discussion: https://www.postgresql.org/message-id/flat/4d13563a-2c8d-fd91-20d5-e71b7a4eaa87%40enterprisedb.com
      eb42110d
    • Peter Eisentraut's avatar
      Allow specifying CRL directory · f5465fad
      Peter Eisentraut authored
      Add another method to specify CRLs, hashed directory method, for both
      server and client side.  This offers a means for server or libpq to
      load only CRLs that are required to verify a certificate.  The CRL
      directory is specifed by separate GUC variables or connection options
      ssl_crl_dir and sslcrldir, alongside the existing ssl_crl_file and
      sslcrl, so both methods can be used at the same time.
      
      Author: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
      Discussion: https://www.postgresql.org/message-id/flat/20200731.173911.904649928639357911.horikyota.ntt@gmail.com
      f5465fad
    • Peter Geoghegan's avatar
      nbtree README: move VACUUM linear scan section. · 128dd901
      Peter Geoghegan authored
      Discuss VACUUM's linear scan after discussion of tuple deletion by
      VACUUM, but before discussion of page deletion by VACUUM.  This
      progression is a lot more natural.
      
      Also tweak the wording a little.  It seems unnecessary to talk about how
      it worked prior to PostgreSQL 8.2.
      128dd901
  7. 17 Feb, 2021 4 commits
    • Tomas Vondra's avatar
      Fix tuple routing to initialize batching only for inserts · 927f453a
      Tomas Vondra authored
      A cross-partition update on a partitioned table is implemented as a
      delete followed by an insert. With foreign partitions, this was however
      causing issues, because the FDW and core may disagree on when to enable
      batching.  postgres_fdw was only allowing batching for plain inserts
      (CMD_INSERT) while core was trying to batch the insert component of the
      cross-partition update.  Fix by restricting core to apply batching only
      to plain CMD_INSERT queries.
      
      It's possible to allow batching for cross-partition updates, but that
      will require more extensive changes, so better to leave that for a
      separate patch.
      
      Author: Amit Langote
      Reviewed-by: Tomas Vondra, Takayuki Tsunakawa
      Discussion: https://postgr.es/m/20200628151002.7x5laxwpgvkyiu3q@development
      927f453a
    • Tomas Vondra's avatar
    • Tom Lane's avatar
      Make some minor improvements in the regex code. · 4e703d67
      Tom Lane authored
      Push some hopefully-uncontroversial bits extracted from an upcoming
      patch series, to remove non-relevant clutter from the main patches.
      
      In compact(), return immediately after setting REG_ASSERT error;
      continuing the loop would just lead to assertion failure below.
      (Ask me how I know.)
      
      In parseqatom(), remove assertion that moresubs() did its job.
      When moresubs actually did its job, this is redundant with that
      function's final assert; but when it failed on OOM, this is an
      assertion crash.  We could avoid the crash by adding a NOERR()
      check before the assertion, but it seems better to subtract code
      than add it.  (Note that there's a NOERR exit a few lines further
      down, and nothing else between here and there requires moresubs
      to have succeeded.  So we don't really need an extra error exit.)
      This is a live bug in assert-enabled builds, but given the very
      low likelihood of OOM in moresub's tiny allocation, I don't think
      it's worth back-patching.
      
      On the other hand, it seems worthwhile to add an assertion that
      our intended v->subs[subno] target is still null by the time we
      are ready to insert into it, since there's a recursion in between.
      
      In pg_regexec, ensure we fflush any debug output on the way out,
      and try to make MDEBUG messages more uniform and helpful.  (In
      particular, ensure that all of them are prefixed with the subre's
      id number, so one can match up entry and exit reports.)
      
      Add some test cases in test_regex to improve coverage of lookahead
      and lookbehind constraints.  Adding these now is mainly to establish
      that this is indeed the existing behavior.
      
      Discussion: https://postgr.es/m/1340281.1613018383@sss.pgh.pa.us
      4e703d67
    • Peter Eisentraut's avatar
      Routine usage information schema tables · f40c6969
      Peter Eisentraut authored
      Several information schema views track dependencies between
      functions/procedures and objects used by them.  These had not been
      implemented so far because PostgreSQL doesn't track objects used in a
      function body.  However, formally, these also show dependencies used
      in parameter default expressions, which PostgreSQL does support and
      track.  So for the sake of completeness, we might as well add these.
      If dependency tracking for function bodies is ever implemented, these
      views will automatically work correctly.
      Reviewed-by: default avatarErik Rijkers <er@xs4all.nl>
      Discussion: https://www.postgresql.org/message-id/flat/ac80fc74-e387-8950-9a31-2560778fc1e3%40enterprisedb.com
      f40c6969