1. 27 Feb, 2021 1 commit
  2. 26 Feb, 2021 6 commits
    • Doc: further clarify libpq's description of connection string URIs. · 4e90052c
      Tom Lane authored
      Break the synopsis into named parts to make it less confusing.
      Make more than zero effort at applying SGML markup.  Do a bit
      of copy-editing of nearby text.
      
      The synopsis revision is by Alvaro Herrera and Paul Förster,
      the rest is my fault.  Back-patch to v10 where multi-host
      connection strings appeared.
      
      Discussion: https://postgr.es/m/6E752D6B-487C-463E-B6E2-C32E7FB007EA@gmail.com
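
      For illustration, a multi-host connection URI of the form the revised
      synopsis describes (user, host names, ports, and database here are
      placeholder values):

          postgresql://alice@host1:5432,host2:5433/mydb?target_session_attrs=any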
    • Improve memory management in regex compiler. · 0fc1af17
      Tom Lane authored
      The previous logic here created a separate pool of arcs for each
      state, so that the out-arcs of each state were physically stored
      within it.  Perhaps this choice was driven by trying to not include
      a "from" pointer within each arc; but Spencer gave up on that idea
      long ago, and it's hard to see what the value is now.  The approach
      turns out to be fairly disastrous in terms of memory consumption,
      though.  In the first place, NFAs built by this engine seem to have
      about 4 arcs per state on average, with a majority having only one
      or two out-arcs.  So pre-allocating 10 out-arcs for each state
      already bloats memory use by a factor of two or more.  Worse, the NFA
      optimization phase moves arcs around with abandon.  In a large NFA,
      some of the states will have hundreds of out-arcs, so towards the
      end of the optimization phase we have a significant number of states
      whose arc pools have room for hundreds of arcs each, even though only
      a few of those arcs are in use.  We have seen real-world regexes in
      which this effect bloats the memory requirement by 25X or even more.
      
      Hence, get rid of the per-state arc pools in favor of a single arc
      pool for the whole NFA, with variable-sized allocation batches
      instead of always asking for 10 at a time.  While we're at it,
      let's batch the allocations of state structs too, to further reduce
      the malloc traffic.
      
      This incidentally allows moveouts() to be optimized in a similar
      way to moveins(): when moving an arc to another state, it's now
      valid to just re-link the same arc struct into a different outchain,
      where before the code invariants required us to make a physically
      new arc and then free the old one.
      
      These changes reduce the regex compiler's typical space consumption
      for average-size regexes by about a factor of two, and much more for
      large or complicated regexes.  In a large test set of real-world
      regexes, we formerly had half a dozen cases that failed with "regular
      expression too complex" due to exceeding the REG_MAX_COMPILE_SPACE
      limit (about 150MB); we would have had to raise that limit to
      something close to 400MB to make them work with the old code.  Now,
      none of those cases need more than 13MB to compile.  Furthermore,
      the test set is about 10% faster overall due to less malloc traffic.
      
      Discussion: https://postgr.es/m/168861.1614298592@sss.pgh.pa.us
    • Extend a test case a little · b3a9e989
      Peter Eisentraut authored
      This will possibly help a subsequent patch by making sure the notice
      messages are distinct so that it's clear that they come out in the
      right order.
      
      Author: Fabien COELHO <coelho@cri.ensmp.fr>
      Discussion: https://www.postgresql.org/message-id/alpine.DEB.2.21.1904240654120.3407%40lancre
    • doc: Improve {archive,restore}_command for compressed logs · 329784e1
      Michael Paquier authored
      The commands mentioned in the docs with gzip and gunzip did not suffix
      the archives with ".gz" and used inconsistent paths for the archives,
      which can be confusing.
      
      Reported-by: Philipp Gramzow
      Reviewed-by: Fujii Masao
      Discussion: https://postgr.es/m/161397938841.15451.13129264141285167267@wrigleys.postgresql.org
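
      The corrected documentation examples take roughly this form in
      postgresql.conf (the archive directory is a placeholder path):

          archive_command = 'gzip < %p > /mnt/server/archivedir/%f.gz'
          restore_command = 'gunzip < /mnt/server/archivedir/%f.gz > %p'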
    • Revert "pg_collation_actual_version() -> pg_collation_current_version()." · 8556267b
      Thomas Munro authored
      This reverts commit 9cf184cc.  The name change was less well
      received than anticipated.
      
      Discussion: https://postgr.es/m/afcfb97e-88a1-a540-db95-6c573b93bc2b%40eisentraut.org
    • Fix list-manipulation bug in WITH RECURSIVE processing. · 80ca8464
      Tom Lane authored
      makeDependencyGraphWalker and checkWellFormedRecursionWalker
      thought they could hold onto a pointer to a list's first
      cons cell while the list was modified by recursive calls.
      That was okay when the cons cell was actually separately
      palloc'd ... but since commit 1cff1b95, it's quite unsafe,
      leading to core dumps or incorrect complaints of faulty
      WITH nesting.
      
      In the field, this would require at least a seven-deep WITH nest
      to cause an issue, but enabling DEBUG_LIST_MEMORY_USAGE
      allows the bug to be seen with lesser nesting depths.
      
      Per bug #16801 from Alexander Lakhin.  Back-patch to v13.
      
      Michael Paquier and Tom Lane
      
      Discussion: https://postgr.es/m/16801-393c7922143eaa4d@postgresql.org
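
      For illustration, a contrived query of roughly the shape involved
      (not taken from the bug report) would nest WITH clauses seven deep:

          WITH RECURSIVE w1 AS (
            WITH w2 AS (
              WITH w3 AS (
                WITH w4 AS (
                  WITH w5 AS (
                    WITH w6 AS (
                      WITH w7 AS (SELECT 1 AS x)
                      SELECT x FROM w7)
                    SELECT x FROM w6)
                  SELECT x FROM w5)
                SELECT x FROM w4)
              SELECT x FROM w3)
            SELECT x FROM w2)
          SELECT x FROM w1;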
  3. 25 Feb, 2021 8 commits
    • VACUUM VERBOSE: Count "newly deleted" index pages. · 23763618
      Peter Geoghegan authored
      Teach VACUUM VERBOSE to report on pages deleted by the _current_ VACUUM
      operation -- these are newly deleted pages.  VACUUM VERBOSE continues to
      report on the total number of deleted pages in the entire index (no
      change there).  The former is a subset of the latter.
      
      The distinction between these two categories of deleted index page
      only arises with index AMs where page deletion is supported and is
      decoupled from page recycling for performance reasons.
      
      This is follow-up work to commit e5d8a999, which made nbtree store
      64-bit XIDs (not 32-bit XIDs) in pages at the point at which they're
      deleted.  Note that the btm_last_cleanup_num_delpages metapage field
      added by that commit usually gets set to pages_newly_deleted.  The
      exceptions (the scenarios in which they're not equal) all seem to be
      tricky cases for the implementation (of page deletion and recycling) in
      general.
      
      Author: Peter Geoghegan <pg@bowt.ie>
      Discussion: https://postgr.es/m/CAH2-WznpdHvujGUwYZ8sihX%3Dd5u-tRYhi-F4wnV2uN2zHpMUXw%40mail.gmail.com
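
      The new counter shows up in ordinary use, e.g. (the table name is a
      placeholder):

          VACUUM (VERBOSE) some_table;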
    • Doc: remove src/backend/regex/re_syntax.n. · 301ed881
      Tom Lane authored
      We aren't publishing this file as documentation, and it's been
      much more haphazardly maintained than the real docs in func.sgml,
      so let's just drop it.  I think the only reason I included it in
      commit 7bcc6d98 was that the Berkeley-era sources had had a man
      page in this directory.
      
      Discussion: https://postgr.es/m/4099447.1614186542@sss.pgh.pa.us
    • Change regex \D and \W shorthands to always match newlines. · 7dc13a0f
      Tom Lane authored
      Newline is certainly not a digit, nor a word character, so it is
      sensible that it should match these complemented character classes.
      Previously, \D and \W acted that way by default, but in
      newline-sensitive mode ('n' or 'p' flag) they did not match newlines.
      
      This behavior was previously forced because explicit complemented
      character classes don't match newlines in newline-sensitive mode;
      but as of the previous commit that implementation constraint no
      longer exists.  It seems useful to change this because the primary
      real-world use for newline-sensitive mode seems to be to match the
      default behavior of other regex engines such as Perl and JavaScript
      ... and their default behavior is that these match newlines.
      
      The old behavior can be kept by writing an explicit complemented
      character class, i.e. [^[:digit:]] or [^[:word:]].  (This means
      that \D and \W are not exactly equivalent to those strings, but
      they weren't anyway.)
      
      Discussion: https://postgr.es/m/3220564.1613859619@sss.pgh.pa.us
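
      For illustration, in newline-sensitive mode (written here with the
      embedded '(?n)' option, assuming standard_conforming_strings is on):

          SELECT E'\n' ~ '(?n)\D';             -- now true: \D matches newline
          SELECT E'\n' ~ '(?n)[^[:digit:]]';   -- still false, as before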
    • Allow complemented character class escapes within regex brackets. · 2a0af7fe
      Tom Lane authored
      The complement-class escapes \D, \S, \W are now allowed within
      bracket expressions.  There is no semantic difficulty with doing
      that, but the rather hokey macro-expansion-based implementation
      previously used here couldn't cope.
      
      Also, invent "word" as an allowed character class name, thus "\w"
      is now equivalent to "[[:word:]]" outside brackets, or "[:word:]"
      within brackets.  POSIX allows such implementation-specific
      extensions, and the same name is used in e.g. bash.
      
      One surprising compatibility issue this raises is that constructs
      such as "[\w-_]" are now disallowed, as our documentation has always
      said they should be: character classes can't be endpoints of a range.
      Previously, because \w was just a macro for "[:alnum:]_", such a
      construct was read as "[[:alnum:]_-_]", so it was accepted so long as
      the character after "-" was numerically greater than or equal to "_".
      
      Some implementation cleanup along the way:
      
      * Remove the lexnest() hack, and in consequence clean up wordchrs()
      to not interact with the lexer.
      
      * Fix colorcomplement() to not be O(N^2) in the number of colors
      involved.
      
      * Get rid of useless-as-far-as-I-can-see calls to element()
      on single-character character element names in brackpart().
      element() always maps these to the character itself, and things
      would be quite broken if it didn't --- should "[a]" match something
      different than "a" does?  Besides, the shortcut path in brackpart()
      wasn't doing this anyway, making it even more inconsistent.
      
      Discussion: https://postgr.es/m/2845172.1613674385@sss.pgh.pa.us
      Discussion: https://postgr.es/m/3220564.1613859619@sss.pgh.pa.us
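
      For illustration (contrived patterns):

          SELECT 'x!' ~ '[\W]';              -- escape now legal inside brackets
          SELECT 'foo_1' ~ '^[[:word:]]+$';  -- new "word" class name
          -- SELECT 'a' ~ '[\w-_]';          -- now an error: \w can't bound a range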
    • Improve tab-completion for TRUNCATE. · 6b40d9bd
      Fujii Masao authored
      Author: Kota Miyake
      Reviewed-by: Muhammad Usama
      Discussion: https://postgr.es/m/f5d30053d00dcafda3280c9e267ecb0f@oss.nttdata.com
    • doc: Mention PGDATABASE as supported by pgbench · a6f8dc47
      Michael Paquier authored
      PGHOST, PGPORT and PGUSER were already mentioned, but not PGDATABASE.
      Like 5aaa584f, backpatch down to 12.
      
      Reported-by: Christophe Courtois
      Discussion: https://postgr.es/m/161399398648.21711.15387267201764682579@wrigleys.postgresql.org
      Backpatch-through: 12
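
      For illustration, with a placeholder database name:

          PGDATABASE=bench pgbench --select-only --time=10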
    • Use full 64-bit XIDs in deleted nbtree pages. · e5d8a999
      Peter Geoghegan authored
      Otherwise we risk "leaking" deleted pages by making them non-recyclable
      indefinitely.  Commit 6655a729 did the same thing for deleted pages in
      GiST indexes.  That work was used as a starting point here.
      
      Stop storing an XID indicating the oldest btpo.xact across all deleted
      though unrecycled pages in nbtree metapages.  There is no longer any
      reason to track that oldest XID; it only ever mattered when wraparound
      was something _bt_vacuum_needs_cleanup() had to consider.
      
      The btm_oldest_btpo_xact metapage field has been repurposed and renamed.
      It is now btm_last_cleanup_num_delpages, which is used to remember how
      many non-recycled deleted pages remain from the last VACUUM (in practice
      its value is usually the precise number of pages that were _newly
      deleted_ during the specific VACUUM operation that last set the field).
      
      The general idea behind storing btm_last_cleanup_num_delpages is to use
      it to give _some_ consideration to non-recycled deleted pages inside
      _bt_vacuum_needs_cleanup() -- though never too much.  We only really
      need to avoid leaving a truly excessive number of deleted pages in an
      unrecycled state forever.  We only do this to cover certain narrow cases
      where no other factor makes VACUUM do a full scan, and yet the index
      continues to grow (and so actually misses out on recycling existing
      deleted pages).
      
      These metapage changes result in a clear user-visible benefit: We no
      longer trigger full index scans during VACUUM operations solely due to
      the presence of only 1 or 2 known deleted (though unrecycled) blocks
      from a very large index.  All that matters now is keeping the costs and
      benefits in balance over time.
      
      Fix an issue that has been around since commit 857f9c36, which added the
      "skip full scan of index" mechanism (i.e. the _bt_vacuum_needs_cleanup()
      logic).  The accuracy of btm_last_cleanup_num_heap_tuples accidentally
      hinged upon _when_ the source value gets stored.  We now always store
      btm_last_cleanup_num_heap_tuples in btvacuumcleanup().  This fixes the
      issue because IndexVacuumInfo.num_heap_tuples (the source field) is
      expected to accurately indicate the state of the table _after_ the
      VACUUM completes inside btvacuumcleanup().
      
      A backpatchable fix cannot easily be extracted from this commit.  A
      targeted fix for the issue will follow in a later commit, though that
      won't happen today.
      
      I (pgeoghegan) have chosen to remove any mention of deleted pages in the
      documentation of the vacuum_cleanup_index_scale_factor GUC/param, since
      the presence of deleted (though unrecycled) pages is no longer of much
      concern to users.  The vacuum_cleanup_index_scale_factor description in
      the docs now seems rather unclear in any case, and it should probably be
      rewritten in the near future.  Perhaps some passing mention of page
      deletion will be added back at the same time.
      
      Bump XLOG_PAGE_MAGIC due to nbtree WAL records using full XIDs now.
      
      Author: Peter Geoghegan <pg@bowt.ie>
      Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
      Discussion: https://postgr.es/m/CAH2-WznpdHvujGUwYZ8sihX=d5u-tRYhi-F4wnV2uN2zHpMUXw@mail.gmail.com
    • Fix relcache reference leak introduced by ce0fdbfe. · 8a4f9522
      Amit Kapila authored
      Author: Sawada Masahiko
      Reviewed-by: Amit Kapila
      Discussion: https://postgr.es/m/CAD21AoA7ZEfsOXQ9HQqMv3QYGsEm2H5Wk5ic5S=mvzDf-3a3SA@mail.gmail.com
  4. 24 Feb, 2021 3 commits
  5. 23 Feb, 2021 7 commits
  6. 22 Feb, 2021 13 commits
  7. 21 Feb, 2021 2 commits
    • Avoid generating extra subre tree nodes for capturing parentheses. · ea1268f6
      Tom Lane authored
      Previously, each pair of capturing parentheses gave rise to a separate
      subre tree node, whose only function was to identify that we ought to
      capture the match details for this particular sub-expression.  In
      most cases we don't really need that, since we can perfectly well
      put a "capture this" annotation on the child node that does the real
      matching work.  As with the two preceding commits, the main value
      of this is to avoid generating and optimizing an NFA for a tree node
      that's not really pulling its weight.
      
      The chosen data representation only allows one capture annotation
      per subre node.  In the legal-per-spec, but seemingly not very useful,
      case where there are multiple capturing parens around the exact same
      bit of the regex (i.e. "((xyz))"), wrap the child node in N-1 capture
      nodes that act the same as before.  We could work harder at that but
      I'll refrain, pending some evidence that such cases are worth troubling
      over.
      
      In passing, improve the comments in regex.h to say what all the
      different re_info bits mean.  Some of them were pretty obvious
      but others not so much, so reverse-engineer some documentation.
      
      This is part of a patch series that in total reduces the regex engine's
      runtime by about a factor of four on a large corpus of real-world regexes.
      
      Patch by me, reviewed by Joel Jacobson
      
      Discussion: https://postgr.es/m/1340281.1613018383@sss.pgh.pa.us
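
      The doubled-parens case continues to capture as before; for
      illustration:

          SELECT regexp_match('xyz', '((xyz))');   -- returns {xyz,xyz}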
    • Convert regex engine's subre tree from binary to N-ary style. · 58104308
      Tom Lane authored
      Instead of having left and right child links in subre structs,
      have a single child link plus a sibling link.  Multiple children
      of a tree node are now reached by chasing the sibling chain.
      
      The beneficiary of this is alternation tree nodes.  A regular
      expression with N (>1) branches is now represented by one alternation
      node with N children, rather than a tree that includes N alternation
      nodes as well as N children.  While the old representation didn't
      really cost anything extra at execution time, it was pretty horrid
      for compilation purposes, because each of the alternation nodes had
      its own NFA, each of which we then went to the trouble of optimizing
      separately.
      (To make matters worse, all of those NFAs described the entire
      alternation pattern, not just the portion of it that one might
      expect from the tree structure.)
      
      We continue to require concatenation nodes to have exactly two
      children.  This data structure is now prepared to support more,
      but the executor's logic would need some careful redesign, and
      it's not clear that a lot of benefit could be had.
      
      This is part of a patch series that in total reduces the regex engine's
      runtime by about a factor of four on a large corpus of real-world regexes.
      
      Patch by me, reviewed by Joel Jacobson
      
      Discussion: https://postgr.es/m/1340281.1613018383@sss.pgh.pa.us
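
      For illustration, an alternation such as the one below is now compiled
      as a single node with four children instead of a chain of binary
      alternation nodes (matching behavior is unchanged):

          SELECT 'c' ~ '^(a|b|c|d)$';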