1. 26 Jul, 2020 2 commits
    • Allocate consecutive blocks during parallel seqscans · 56788d21
      David Rowley authored
      Previously, during a parallel sequential scan, blocks were allocated to
      parallel workers one at a time.  Since other workers were likely to
      request a block before a given worker returned for its next block number,
      each worker could end up with a non-sequential I/O pattern, which could
      cause the operating system's readahead to perform poorly or not at all.
      
      Here we change things so that we allocate consecutive "chunks" of blocks
      to workers and have them work on those until they're done, at which time
      we allocate another chunk for the worker.  The size of these chunks is
      based on the size of the relation.
      
      The initial patch here was by Thomas Munro, and it already showed good
      improvements with just a fixed chunk size of 64 blocks and a simple
      ramp-down near the end of the scan.  The revisions to base the chunk size
      on the relation size and to perform the ramp-down in powers of two were
      done by me, along with quite extensive benchmarking to determine the
      optimal chunk sizes.
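
      As a rough illustration of the idea (a hedged sketch only: the constant
      names and the exact heuristic below are hypothetical, not what the
      committed code uses), the chunk sizing and power-of-two ramp-down could
      be expressed in C like this:

          #include <stdint.h>

          #define MAX_CHUNK_BLOCKS 8192   /* assumed cap on chunk size */
          #define TARGET_NCHUNKS   2048   /* assumed target number of chunks */

          /* Grow the chunk size with the relation, in powers of two. */
          static uint32_t
          choose_chunk_size(uint32_t rel_nblocks)
          {
              uint32_t chunk = 1;

              while (chunk < MAX_CHUNK_BLOCKS &&
                     (uint64_t) chunk * TARGET_NCHUNKS < rel_nblocks)
                  chunk <<= 1;
              return chunk;
          }

          /* Near the end of the scan, shrink the chunk so workers finish together. */
          static uint32_t
          ramp_down_chunk(uint32_t chunk, uint32_t blocks_remaining)
          {
              while (chunk > 1 && chunk > blocks_remaining)
                  chunk >>= 1;
              return chunk;
          }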
      
      For the most part, benchmarks have shown significant performance
      improvements for large parallel sequential scans on Linux, FreeBSD and
      Windows using SSDs.  It's less clear how this affects performance on
      cloud providers' storage: tests done so far have been unable to obtain
      stable enough performance to provide meaningful benchmark results.  It is
      possible that
      this could cause some performance regressions on more obscure filesystems,
      so we may need to later provide users with some ability to get something
      closer to the old behavior.  For now, let's leave that until we see that
      it's really required.
      
      Author: Thomas Munro, David Rowley
      Reviewed-by: Ranier Vilela, Soumyadeep Chakraborty, Robert Haas
      Reviewed-by: Amit Kapila, Kirk Jamison
      Discussion: https://postgr.es/m/CA+hUKGJ_EErDv41YycXcbMbCBkztA34+z1ts9VQH+ACRuvpxig@mail.gmail.com
    • Tweak behavior of pg_stat_activity.leader_pid · 11a68e4b
      Michael Paquier authored
      The initial implementation of leader_pid in pg_stat_activity, added by
      b025f32e, took the approach of strictly printing what a PGPROC entry
      includes.  In short, if a backend had been involved in a parallel query
      at least once, leader_pid would remain set for as long as the backend
      stayed alive.  For a parallel group leader, this meant that the field was
      always set after it had participated in a parallel query at least once,
      which, after more discussion, was judged confusing when, for example, a
      connection pooler is in use.
      
      This commit changes the data printed so that leader_pid is always NULL
      for a parallel group leader and non-NULL only for parallel workers, and
      then only while a parallel query is actually running, since workers are
      shut down once the query has completed.
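
      A minimal C sketch of the new reporting rule, assuming the view-building
      code has the backend's PGPROC at hand (the helper name and output
      variables are hypothetical; the real check lives in pg_stat_get_activity()):

          #include "postgres.h"
          #include "storage/proc.h"

          /* Decide what leader_pid should show for one backend. */
          static void
          report_leader_pid(PGPROC *proc, Datum *value, bool *isnull)
          {
              PGPROC *leader = proc->lockGroupLeader;

              /* Only parallel workers show a leader; the leader itself shows NULL. */
              if (leader != NULL && leader != proc)
              {
                  *value = Int32GetDatum(leader->pid);
                  *isnull = false;
              }
              else
                  *isnull = true;
          }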
      
      This does not change the definition of any catalog, so no catalog bump
      is needed.  Per discussion with Justin Pryzby, Álvaro Herrera, Julien
      Rouhaud and me.
      
      Discussion: https://postgr.es/m/20200721035145.GB17300@paquier.xyz
      Backpatch-through: 13
  2. 25 Jul, 2020 5 commits
    • Remove optimization for RAND_poll() failing. · 15e44197
      Noah Misch authored
      The loop to generate seed data will exit on RAND_status(), so we don't
      need to handle the case of RAND_poll() failing separately.  Failures
      here are rare, so this is essentially a code cleanup.
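
      For illustration, a seeding loop of the shape described above might look
      like the following hedged sketch (public OpenSSL calls; the retry limit
      is an arbitrary assumption, not the value used in the tree):

          #include <stdbool.h>
          #include <openssl/rand.h>

          /* Re-seed OpenSSL's PRNG, e.g. in a child process after fork(). */
          static bool
          reseed_openssl_prng(void)
          {
              /*
               * RAND_status() reports whether the PRNG is already seeded; the
               * loop exits on that, so a failing RAND_poll() needs no special
               * handling of its own.
               */
              for (int i = 0; i < 8 && RAND_status() != 1; i++)
                  (void) RAND_poll();

              return RAND_status() == 1;
          }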
      
      Daniel Gustafsson, reviewed by David Steele and Michael Paquier.
      
      Discussion: https://postgr.es/m/9B038FA5-23E8-40D0-B932-D515E1D8F66A@yesql.se
    • Use RAND_poll() for seeding randomness after fork(). · ce4939ff
      Noah Misch authored
      OpenSSL deprecated RAND_cleanup(), and OpenSSL 1.1.0 made it into a
      no-op.  Replace it with RAND_poll(), per an OpenSSL community
      recommendation.  While this has no user-visible consequences under
      OpenSSL defaults, it might help under non-default settings.
      
      Daniel Gustafsson, reviewed by David Steele and Michael Paquier.
      
      Discussion: https://postgr.es/m/9B038FA5-23E8-40D0-B932-D515E1D8F66A@yesql.se
    • Improve performance of binary COPY FROM through better buffering. · 0a0727cc
      Tom Lane authored
      At least on Linux and macOS, fread() turns out to have far higher
      per-call overhead than one could wish.  Reading 64KB of data at a time
      and then parceling it out with our own memcpy logic makes binary COPY
      from a file significantly faster --- around 30% in simple testing for
      cases with narrow text columns (on Linux ... even more on macOS).
      
      In binary COPY from frontend, there's no per-call fread(), and this
      patch introduces an extra layer of memcpy'ing, but it still manages
      to eke out a small win.  Apparently, the control-logic overhead in
      CopyGetData() is enough to be worth avoiding for small fetches.
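
      The buffering idea itself is generic.  A hedged C sketch (the type and
      buffer size are illustrative, not the actual copy.c code):

          #include <stdio.h>
          #include <string.h>

          #define RAW_BUF_SIZE 65536      /* 64kB read-ahead buffer (assumption) */

          typedef struct
          {
              FILE   *fp;
              char    buf[RAW_BUF_SIZE];
              size_t  len;                /* valid bytes in buf */
              size_t  pos;                /* next unread byte */
          } BufferedReader;

          /* Copy nbytes into dst, refilling buf with one big fread() when empty. */
          static int
          buffered_read(BufferedReader *r, void *dst, size_t nbytes)
          {
              char *out = dst;

              while (nbytes > 0)
              {
                  if (r->pos == r->len)
                  {
                      r->len = fread(r->buf, 1, RAW_BUF_SIZE, r->fp);
                      r->pos = 0;
                      if (r->len == 0)
                          return -1;      /* EOF or read error */
                  }

                  size_t avail = r->len - r->pos;
                  size_t chunk = (nbytes < avail) ? nbytes : avail;

                  memcpy(out, r->buf + r->pos, chunk);
                  r->pos += chunk;
                  out += chunk;
                  nbytes -= chunk;
              }
              return 0;
          }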
      
      Bharath Rupireddy and Amit Langote, reviewed by Vignesh C,
      cosmetic tweaks by me
      
      Discussion: https://postgr.es/m/CALj2ACU5Bz06HWLwqSzNMN=Gupoj6Rcn_QVC+k070V4em9wu=A@mail.gmail.com
    • Mark built-in coercion functions as leakproof where possible. · 8a37951e
      Tom Lane authored
      Making these leakproof seems helpful since (for example) if you have a
      function f(int8) that is leakproof, you don't want it to effectively
      become non-leakproof when you apply it to an int4 or int2 column.
      But that's what happens today, since the implicit up-coercion will
      not be leakproof.
      
      Most of the coercion functions that visibly can't throw errors are
      functions that convert numeric datatypes to other, wider ones.
      Notable is that float4_numeric and float8_numeric can be marked
      leakproof; before commit a57d312a they could not have been.
      I also marked the functions that coerce strings to "name" as leakproof;
      that's okay today because they truncate silently, but if we ever
      reconsidered that behavior then they could no longer be leakproof.
      
      I desisted from marking rtrim1() as leakproof; it appears so right now,
      but the code seems a little too complex and perhaps subject to change,
      since it's shared with other SQL functions.
      
      Discussion: https://postgr.es/m/459322.1595607431@sss.pgh.pa.us
    • Fix buffer usage stats for nodes above Gather Merge. · 2a249422
      Amit Kapila authored
      Commit 85c9d347 addressed a similar problem for Gather and Gather
      Merge nodes but forgot to account for nodes above parallel nodes.  This
      still works for nodes above a Gather node because we shut down the
      workers for the Gather node as soon as there are no more tuples.  We can
      do a similar thing for Gather Merge as well, but it seems better to
      account for the stats during node shutdown, after execution completes.
      
      Reported-by: Stéphane Lorek, Jehan-Guillaume de Rorthais
      Author: Jehan-Guillaume de Rorthais <jgdr@dalibo.com>
      Reviewed-by: Amit Kapila
      Backpatch-through: 10, where it was introduced
      Discussion: https://postgr.es/m/20200718160206.584532a2@firost
  3. 24 Jul, 2020 3 commits
    • Replace TS_execute's TS_EXEC_CALC_NOT flag with TS_EXEC_SKIP_NOT. · 79d6d1a2
      Tom Lane authored
      It's fairly silly that ignoring NOT subexpressions is TS_execute's
      default behavior.  It's wrong on its face and it encourages errors
      of omission.  Moreover, the only two remaining callers that aren't
      specifying CALC_NOT are in ts_headline calculations, and it's very
      arguable that those are bugs: if you've specified "!foo" in your
      query, why would you want to get a headline that includes "foo"?
      
      Hence, rip that out and change the default behavior to be to calculate
      NOT accurately.  As a concession to the slim chance that there is still
      somebody somewhere who needs the incorrect behavior, provide a new
      SKIP_NOT flag to explicitly request that.
      
      Back-patch into v13, mainly because it seems better to change this
      at the same time as the previous commit's rejiggering of TS_execute
      related APIs.  Any outside callers affected by this change are
      probably also affected by that one.
      
      Discussion: https://postgr.es/m/CALT9ZEE-aLotzBg-pOp2GFTesGWVYzXA3=mZKzRDa_OKnLF7Mg@mail.gmail.com
    • Fix assorted bugs by changing TS_execute's callback API to ternary logic. · 2f2007fb
      Tom Lane authored
      Text search sometimes failed to find valid matches, for instance
      '!crew:A'::tsquery might fail to locate 'crew:1B'::tsvector during
      an index search.  The root of the issue is that TS_execute's callback
      functions were not changed to use ternary (yes/no/maybe) reporting
      when we made the search logic itself do so.  It's somewhat annoying
      to break that API, but on the other hand we now see that any code
      using plain boolean logic is almost certainly broken since the
      addition of phrase search.  There seem to be very few outside callers
      of this code anyway, so we'll just break them intentionally to get
      them to adapt.
      
      This allows removal of tsginidx.c's private re-implementation of
      TS_execute, since that's now entirely duplicative.  It's also no
      longer necessary to avoid use of CALC_NOT in tsgistidx.c, since
      the underlying callbacks can now do something reasonable.
      
      Back-patch into v13.  We can't change this in stable branches,
      but it seems not quite too late to fix it in v13.
      
      Tom Lane and Pavel Borisov
      
      Discussion: https://postgr.es/m/CALT9ZEE-aLotzBg-pOp2GFTesGWVYzXA3=mZKzRDa_OKnLF7Mg@mail.gmail.com
    • Rename configure.in to configure.ac · 25244b89
      Peter Eisentraut authored
      The new name has been preferred by Autoconf for a long time.  Future
      versions of Autoconf will warn about the old name.
      
      Discussion: https://www.postgresql.org/message-id/flat/e796c185-5ece-8569-248f-dd3799701be1%402ndquadrant.com
  4. 23 Jul, 2020 4 commits
    • Fix ancient violation of zlib's API spec. · b9b61057
      Tom Lane authored
      contrib/pgcrypto mishandled the case where deflate() does not consume
      all of the offered input on the first try.  It reset the next_in pointer
      to the start of the input instead of leaving it alone, causing the wrong
      data to be fed to the next deflate() call.
      
      This has been broken since pgcrypto was committed.  The reason for the
      lack of complaints seems to be that it's fairly hard to get stock zlib
      to not consume all the input, so long as the output buffer is big enough
      (which it normally would be in pgcrypto's usage; AFAICT the input is
      always going to be packetized into packets no larger than ZIP_OUT_BUF).
      However, IBM's zlibNX implementation for AIX evidently will do it
      in some cases.
      
      I did not add a test case for this, because I couldn't find one that
      would fail with stock zlib.  When we put back the test case for
      bug #16476, that will cover the zlibNX situation well enough.
      
      While here, write deflate()'s second argument as Z_NO_FLUSH per its
      API spec, instead of hard-wiring the value zero.
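
      For reference, a deflate() loop that honors this part of the API contract
      might look like the following hedged sketch (standard zlib calls, not
      pgcrypto's actual code); next_in is never rewound and the flush argument
      is spelled Z_NO_FLUSH:

          #include <zlib.h>

          /* Compress one input chunk; assumes zs was set up with deflateInit(). */
          static int
          compress_chunk(z_stream *zs, unsigned char *in, unsigned inlen,
                         unsigned char *out, unsigned outlen)
          {
              zs->next_in = in;
              zs->avail_in = inlen;

              /* Keep calling deflate() until it has consumed all of the input. */
              while (zs->avail_in > 0)
              {
                  zs->next_out = out;
                  zs->avail_out = outlen;

                  /* deflate() advances next_in itself; do not reset it. */
                  if (deflate(zs, Z_NO_FLUSH) != Z_OK)
                      return -1;

                  /* ... hand off the (outlen - zs->avail_out) bytes produced ... */
              }
              return 0;
          }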
      
      Per buildfarm results and subsequent investigation.
      
      Discussion: https://postgr.es/m/16476-692ef7b84e5fb893@postgresql.org
    • doc: Document that ssl_ciphers does not affect TLS 1.3 · 5733fa0f
      Peter Eisentraut authored
      TLS 1.3 uses a different way of specifying ciphers and a different
      OpenSSL API.  PostgreSQL currently does not support setting those
      ciphers.  For now, just document this.  In the future, support for
      this might be added somehow.
      Reviewed-by: Jonathan S. Katz <jkatz@postgresql.org>
      Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
    • Fix error message. · 42dee8b8
      Thomas Munro authored
      Remove extra space.  Back-patch to all releases, like commit 7897e3bb.
      
      Author: Lu, Chenyang <lucy.fnst@cn.fujitsu.com>
      Discussion: https://postgr.es/m/795d03c6129844d3803e7eea48f5af0d%40G08CNEXMBPEKD04.g08.fujitsu.local
    • WAL Log invalidations at command end with wal_level=logical. · c55040cc
      Amit Kapila authored
      When wal_level=logical, write invalidations at command end into WAL so
      that decoding can use this information.
      
      This patch is required to allow the streaming of in-progress transactions
      in logical decoding.  The actual work to allow streaming will be committed
      as a separate patch.
      
      We still add the invalidations to the cache and write them to WAL at
      commit time in RecordTransactionCommit(). This uses the existing
      XLOG_INVALIDATIONS xlog record type, from the RM_STANDBY_ID resource
      manager (see LogStandbyInvalidations for details).
      
      So existing code relying on those invalidations (e.g. redo) does not need
      to be changed.
      
      The invalidations written at command end use a new xlog record type,
      XLOG_XACT_INVALIDATIONS, from the RM_XACT_ID resource manager.  See
      LogLogicalInvalidations for details.
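
      A hedged sketch of what emitting such a record could look like (the real
      code is LogLogicalInvalidations(); the record struct and surrounding
      variables here are only illustrative):

          /* Illustrative only; see LogLogicalInvalidations() for the real code. */
          if (XLogLogicalInfoActive() && nmsgs > 0)
          {
              xl_xact_invals xlrec;

              xlrec.nmsgs = nmsgs;

              XLogBeginInsert();
              XLogRegisterData((char *) &xlrec, MinSizeOfXactInvals);
              XLogRegisterData((char *) msgs,
                               nmsgs * sizeof(SharedInvalidationMessage));
              XLogInsert(RM_XACT_ID, XLOG_XACT_INVALIDATIONS);
          }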
      
      These new xlog records are ignored by existing redo procedures, which
      still rely on the invalidations written to commit records.
      
      The invalidations are decoded and accumulated in the top-level
      transaction, and then executed during replay.  This obviates the need to
      decode the invalidations as part of a commit record.
      
      Bump XLOG_PAGE_MAGIC, since this introduces XLOG_XACT_INVALIDATIONS.
      
      Author: Dilip Kumar, Tomas Vondra, Amit Kapila
      Reviewed-by: Amit Kapila
      Tested-by: Neha Sharma and Mahendra Singh Thalor
      Discussion: https://postgr.es/m/688b0b7f-2f6c-d827-c27b-216a8e3ea700@2ndquadrant.com
  5. 22 Jul, 2020 5 commits
  6. 21 Jul, 2020 9 commits
    • neqjoinsel must now pass through collation to eqjoinsel. · bd0d893a
      Tom Lane authored
      Since commit 044c99bc, eqjoinsel passes the passed-in collation
      to any operators it invokes.  However, neqjoinsel failed to pass
      on whatever collation it got, so that if we invoked a
      collation-dependent operator via that code path, we'd get "could not
      determine which collation to use for string comparison" or the like.
      
      Per report from Justin Pryzby.  Back-patch to v12, like the previous
      commit.
      
      Discussion: https://postgr.es/m/20200721191606.GL5748@telsasoft.com
    • Add nbtree Valgrind buffer lock checks. · 4a70f829
      Peter Geoghegan authored
      Holding just a buffer pin (with no buffer lock) on an nbtree buffer/page
      provides very weak guarantees, especially compared to heapam, where it's
      often safe to read a page while only holding a buffer pin.  This commit
      has Valgrind enforce the following rule: it is never okay to access an
      nbtree buffer without holding both a pin and a lock on the buffer.
      
      A draft version of this patch detected questionable code that was
      cleaned up by commits fa7ff642 and 7154aa16.  The code in question used
      to access an nbtree buffer page's special/opaque area with no buffer
      lock (only a buffer pin).  This practice (which isn't obviously unsafe)
      is hereby formally disallowed in nbtree.  There doesn't seem to be any
      reason to allow it, and banning it keeps things simple for Valgrind.
      
      The new checks are implemented by adding custom nbtree client requests
      (located in LockBuffer() wrapper functions); these requests are
      "superimposed" on top of the generic bufmgr.c Valgrind client requests
      added by commit 1e0dfd16.  No custom resource management cleanup code is
      needed to undo the effects of marking buffers as non-accessible under
      this scheme.
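
      A hedged sketch of the wrapper idea (the function names below are
      hypothetical; the real wrappers live in the nbtree code), using the
      standard Valgrind memcheck client requests:

          #include "postgres.h"
          #include "storage/bufmgr.h"
          #include <valgrind/memcheck.h>

          /* Page contents are accessible only while both a pin and a lock are held. */
          static void
          btree_lockbuf(Buffer buf, int access)
          {
              LockBuffer(buf, access);
              VALGRIND_MAKE_MEM_DEFINED(BufferGetPage(buf), BLCKSZ);
          }

          static void
          btree_unlockbuf(Buffer buf)
          {
              VALGRIND_MAKE_MEM_NOACCESS(BufferGetPage(buf), BLCKSZ);
              LockBuffer(buf, BUFFER_LOCK_UNLOCK);
          }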
      
      Author: Peter Geoghegan
      Reviewed-By: Anastasia Lubennikova, Georgios Kokolatos
      Discussion: https://postgr.es/m/CAH2-WzkLgyN3zBvRZ1pkNJThC=xi_0gpWRUb_45eexLH1+k2_Q@mail.gmail.com
    • Weaken type-OID-matching checks in array_recv and record_recv. · 670c0a1d
      Tom Lane authored
      Rather than always insisting on an exact match of the type OID in the
      data to the element type or column type we expect, complain only when
      both OIDs fall within the manually-assigned range.  This acknowledges
      the reality that user-defined types don't have stable OIDs, while
      still preserving some of the mistake-detection value of the old test.
      
      (It's not entirely clear whether to error if one OID is manually
      assigned and the other isn't.  But perhaps that case could arise in
      cross-version cases where a former extension type has been imported
      into core, so I let it pass.)
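
      A hedged sketch of the relaxed check (the helper name is hypothetical;
      FirstGenbkiObjectId marks the upper end of the manually-assigned OID
      range):

          #include "postgres.h"
          #include "access/transam.h"     /* FirstGenbkiObjectId */

          /* Complain about a type-OID mismatch only when both OIDs should be stable. */
          static bool
          binary_recv_oid_mismatch_is_error(Oid received, Oid expected)
          {
              return received != expected &&
                     received < FirstGenbkiObjectId &&
                     expected < FirstGenbkiObjectId;
          }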
      
      This change allows us to remove the prohibition on binary transfer
      of user-defined arrays and composites in the recently-landed support
      for binary logical replication (commit 9de77b54).  We can just
      unconditionally drop that check, since if the client has asked for
      binary transfer it must be >= v14 and must have this change.
      
      Discussion: https://postgr.es/m/CADK3HH+R3xMn=8t3Ct+uD+qJ1KD=Hbif5NFMJ+d5DkoCzp6Vgw@mail.gmail.com
    • Glossary: Add term "base backup" · 606c3845
      Alvaro Herrera authored
      Author: Jürgen Purtz <juergen@purtz.de>
      Discussion: https://postgr.es/m/95f90a5d-7692-701d-2c0c-0c88eb5cea7d@purtz.de
    • Minor glossary tweaks · a0b2d583
      Alvaro Herrera authored
      Add "(process)" qualifier to two terms, remove self-reference in one
      term.
      
      Author: Jürgen Purtz <juergen@purtz.de>
      Discussion: https://postgr.es/m/95f90a5d-7692-701d-2c0c-0c88eb5cea7d@purtz.de
    • Be more careful about marking catalog columns NOT NULL by default. · fc032bed
      Tom Lane authored
      The bug fixed in commit 72eab84a would not have occurred if initdb
      had a less surprising rule about which columns should be marked
      NOT NULL by default.  Let's make that rule be strictly that the
      column must be fixed-width and its predecessors must be fixed-width
      and NOT NULL, removing the hacky and unsafe exceptions for oidvector
      and int2vector.
      
      Since we do still want all existing oidvector and int2vector columns
      to be marked NOT NULL, we have to put BKI_FORCE_NOT_NULL labels on
      them.  But making this less magic and more documented seems like a
      good idea, even if it's a shade more verbose.
      
      I didn't bump catversion since the initial catalog contents are
      not actually changed by this patch.  Note however that the
      contents of postgres.bki do change, and feeding an old copy of
      that to a new backend will produce wrong results.
      
      Discussion: https://postgr.es/m/204760.1595181800@sss.pgh.pa.us
    • Assert that we don't insert nulls into attnotnull catalog columns. · 3e66019f
      Tom Lane authored
      The executor checks for this error, and so does the bootstrap catalog
      loader, but we never checked for it in retail catalog manipulations.
      The folly of that has now been exposed, so let's add assertions
      checking it.  Checking in CatalogTupleInsert[WithInfo] and
      CatalogTupleUpdate[WithInfo] should be enough to cover this.
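
      A hedged sketch of the kind of assertion involved (the helper and its
      placement are illustrative; the actual checks sit inside the
      CatalogTuple* functions):

          #include "postgres.h"
          #include "access/htup_details.h"
          #include "utils/rel.h"

          /* Assert that no attnotnull column of this catalog tuple is NULL. */
          static void
          assert_no_nulls_in_notnull_cols(Relation rel, HeapTuple tup)
          {
              TupleDesc desc = RelationGetDescr(rel);

              for (int attnum = 1; attnum <= desc->natts; attnum++)
              {
                  bool isnull;

                  (void) heap_getattr(tup, attnum, desc, &isnull);
                  Assert(!(isnull && TupleDescAttr(desc, attnum - 1)->attnotnull));
              }
          }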
      
      Back-patch to v10; the aforesaid functions didn't exist before that,
      and it didn't seem worth adapting the patch to the oldest branches.
      But given the risk of JIT crashes, I think we certainly need this
      as far back as v11.
      
      Pre-v13, we have to explicitly exclude pg_subscription.subslotname
      and pg_subscription_rel.srsublsn from the checks, since they are
      mismarked.  (Even if we change our mind about applying BKI_FORCE_NULL
      in the branch tips, it doesn't seem wise to have assertions that
      would fire in existing databases.)
      
      Discussion: https://postgr.es/m/298837.1595196283@sss.pgh.pa.us
    • Rework tab completion of COPY and \copy in psql · c273d9d8
      Michael Paquier authored
      This corrects and simplifies the tab completion of COPY and \copy in a
      number of ways:
      - Remove from the completion the pre-9.0 grammar still supported for
      compatibility purposes.  This simplifies the code and allows it to be
      extended more easily with new patterns.
      - Add completion for the options of FORMAT within a WITH clause.
      - Complete WHERE and WITH clauses correctly depending on whether TO or
      FROM is used, WHERE being available only with COPY FROM.
      
      Author: Vignesh C, Michael Paquier
      Reviewed-by: Ahsan Hadi
      Discussion: https://postgr.es/m/CALDaNm3zWr=OmxeNqOqfT=uZTSdam_j-gkX94CL8eTNfgUtf6A@mail.gmail.com
    • Fix some corner cases for window ranges with infinite offsets. · a4faef8f
      Tom Lane authored
      Many situations where the offset is infinity were not handled sanely.
      We should generally allow the val versus base +/- offset comparison to
      proceed according to the normal rules of IEEE arithmetic; however, we
      must do something special for the corner cases where base +/- offset
      would produce NaN due to subtracting two like-signed infinities.
      That corresponds to asking which values infinitely precede +inf or
      infinitely follow -inf, which should certainly be true of any finite
      value or of the opposite-signed infinity.  After some discussion it
      seems that the best decision is to make it true of the same-signed
      infinity as well, ie, just return constant TRUE if the calculation
      would produce a NaN.
      
      (We could write this with a bit less code by subtracting anyway,
      and then checking for a NaN result.  However, I prefer this
      formulation because it'll be easier to transpose into numeric.c.)
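
      A hedged C sketch of that rule for float8 (the function shape loosely
      mirrors the in_range support functions; the name is illustrative):

          #include <math.h>
          #include <stdbool.h>

          /* Is val in range of base +/- offset, per the rule described above? */
          static bool
          in_range_float8_sketch(double val, double base, double offset,
                                 bool subtract, bool less)
          {
              double bound;

              /*
               * base - inf with base = +inf, or base + inf with base = -inf,
               * would yield NaN; per the rule above, treat every value as
               * being in range and just return true.
               */
              if (isinf(offset) && isinf(base) &&
                  (subtract ? base > 0 : base < 0))
                  return true;

              bound = subtract ? (base - offset) : (base + offset);
              return less ? (val <= bound) : (val >= bound);
          }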
      
      Although this seems like clearly a bug fix with respect to finite
      values, it is less obviously correct for infinite values.  Between
      that and the fact that the whole issue only arises for very strange
      window specifications (e.g. RANGE BETWEEN 'inf' PRECEDING AND 'inf'
      PRECEDING), I'll desist from back-patching.
      
      Noted by Dean Rasheed.
      
      Discussion: https://postgr.es/m/3393130.1594925893@sss.pgh.pa.us
  7. 20 Jul, 2020 9 commits
  8. 19 Jul, 2020 3 commits
    • Avoid harmless Valgrind no-buffer-pin errors. · a766d6ca
      Peter Geoghegan authored
      Valgrind builds with assertions enabled sometimes perform a
      theoretically unsafe page access inside an assertion in
      heapam_tuple_lock().  This happened when the eval-plan-qual isolation
      test ran one of the permutations added by commit a2418f9e.
      
      Avoid complaints from Valgrind by moving the assertion ever so slightly.
      This is minor cleanup for commit 1e0dfd16, which added Valgrind buffer
      access instrumentation.
      
      No backpatch, since this only happens within an assertion, and seems
      very unlikely to cause any real problems even with assert-enabled
      builds.
    • Mark buffers as defined to Valgrind consistently. · 46ef520b
      Peter Geoghegan authored
      Make PinBuffer() mark buffers as defined to Valgrind unconditionally,
      including when the buffer header spinlock must be acquired.  Failure to
      handle that case could lead to false positive reports from Valgrind.
      
      This theoretically creates a risk that we'll mark buffers defined even
      when external callers don't end up with a buffer pin.  That seems
      perfectly acceptable, though, since in general we make no guarantees
      about buffers that are unsafe to access being reliably marked as unsafe.
      
      Oversight in commit 1e0dfd16, which added valgrind buffer access
      instrumentation.
    • Correctly mark pg_subscription.subslotname as nullable. · 72eab84a
      Tom Lane authored
      Due to the layout of this catalog, subslotname has to be explicitly
      marked BKI_FORCE_NULL, else initdb will default to the assumption
      that it's non-nullable.  Since, in fact, CREATE/ALTER SUBSCRIPTION
      will store null values there, the existing marking is just wrong,
      and has been since this catalog was invented.
      
      We haven't noticed because not much in the system actually depends
      on attnotnull being truthful.  However, JIT'ed tuple deconstruction
      does depend on that in some cases, allowing crashes or wrong answers
      in queries that inspect pg_subscription.  Commit 9de77b54 quite
      accidentally exposed this on the buildfarm members that force JIT
      activation.
      
      Back-patch to v13.  The problem goes further back, but we cannot
      force initdb in released branches, so some klugier solution will
      be needed there.  Before working on that, push this simple fix
      to try to get the buildfarm back to green.
      
      Discussion: https://postgr.es/m/4118109.1595096139@sss.pgh.pa.us