1. 19 Jun, 2021 1 commit
    • Peter Geoghegan's avatar
      Support disabling index bypassing by VACUUM. · 3499df0d
      Peter Geoghegan authored
      Generalize the INDEX_CLEANUP VACUUM parameter (and the corresponding
      reloption): make it into a ternary style boolean parameter.  It now
      exposes a third option, "auto".  The "auto" option (which is now the
      default) enables the "bypass index vacuuming" optimization added by
      commit 1e55e7d1.
      
      "VACUUM (INDEX_CLEANUP TRUE)" is redefined to once again make VACUUM
      simply do any required index vacuuming, regardless of how few dead
      tuples are encountered during the first scan of the target heap relation
      (unless there are exactly zero).  This gives users a way of opting out
      of the "bypass index vacuuming" optimization, if for whatever reason
      that proves necessary.  It is also expected to be used by PostgreSQL
      developers as a testing option from time to time.
      
      "VACUUM (INDEX_CLEANUP FALSE)" does the same thing as it always has: it
      forcibly disables both index vacuuming and index cleanup.  It's not
      expected to be used much in PostgreSQL 14.  The failsafe mechanism added
      by commit 1e55e7d1 addresses the same problem in a simpler way.
      INDEX_CLEANUP can now be thought of as a testing and compatibility
      option.
      
      Author: Peter Geoghegan <pg@bowt.ie>
      Reviewed-By: default avatarMasahiko Sawada <sawada.mshk@gmail.com>
      Reviewed-By: default avatarJustin Pryzby <pryzby@telsasoft.com>
      Discussion: https://postgr.es/m/CAH2-WznrBoCST4_Gxh_G9hA8NzGUbeBGnOUC8FcXcrhqsv6OHQ@mail.gmail.com
      3499df0d
  2. 18 Jun, 2021 7 commits
    • Alvaro Herrera's avatar
      Add test case for obsoleting slot with active walsender · 09126984
      Alvaro Herrera authored
      The code to signal a running walsender when its reserved WAL size grows
      too large is completely uncovered before this commit; this adds coverage
      for that case.
      
      This test involves sending SIGSTOP to walsender and walreceiver and
      running a checkpoint while advancing WAL, then sending SIGCONT.  There's
      no precedent for this coding in Perl tests, and my reading of relevant
      manpages says it's likely to fail on Windows.  Because of this, this
      test is always skipped on that platform.
      
      Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
      Discussion: https://postgr.es/m/202106102202.mjw4huiix7lo@alvherre.pgsql
      09126984
    • Tom Lane's avatar
      Fix misbehavior of DROP OWNED BY with duplicate polroles entries. · d21fca08
      Tom Lane authored
      Ordinarily, a pg_policy.polroles array wouldn't list the same role
      more than once; but CREATE POLICY does not prevent that.  If we
      perform DROP OWNED BY on a role that is listed more than once,
      RemoveRoleFromObjectPolicy either suffered an assertion failure
      or encountered a tuple-updated-by-self error.  Rewrite it to cope
      correctly with duplicate entries, and add a CommandCounterIncrement
      call to prevent the other problem.
      
      Per discussion, there's other cleanup that ought to happen here,
      but this seems like the minimum essential fix.
      
      Per bug #17062 from Alexander Lakhin.  It's been broken all along,
      so back-patch to all supported branches.
      
      Discussion: https://postgr.es/m/17062-11f471ae3199ca23@postgresql.org
      d21fca08
    • Tom Lane's avatar
      Improve version reporting in pgbench. · 84bee961
      Tom Lane authored
      Commit 547f04e7 caused pgbench to start printing its version number,
      which seems like a fine idea, but it needs a bit more work:
      * Print the server version number too, when different.
      * Print the PG_VERSION string, not some reconstructed approximation.
      
      This patch copies psql's well-tested code for the same purpose.
      
      Discussion: https://postgr.es/m/1226654.1624036821@sss.pgh.pa.us
      84bee961
    • Tom Lane's avatar
      Centralize the logic for protective copying of utility statements. · 7c337b6b
      Tom Lane authored
      In the "simple Query" code path, it's fine for parse analysis or
      execution of a utility statement to scribble on the statement's node
      tree, since that'll just be thrown away afterwards.  However it's
      not fine if the node tree is in the plan cache, as then it'd be
      corrupted for subsequent executions.  Up to now we've dealt with
      that by having individual utility-statement functions apply
      copyObject() if they were going to modify the tree.  But that's
      prone to errors of omission.  Bug #17053 from Charles Samborski
      shows that CREATE/ALTER DOMAIN didn't get this memo, and can
      crash if executed repeatedly from plan cache.
      
      In the back branches, we'll just apply a narrow band-aid for that,
      but in HEAD it seems prudent to have a more principled fix that
      will close off the possibility of other similar bugs in future.
      Hence, let's hoist the responsibility for doing copyObject up into
      ProcessUtility from its children, thus ensuring that it happens for
      all utility statement types.
      
      Also, modify ProcessUtility's API so that its callers can tell it
      whether a copy step is necessary.  It turns out that in all cases,
      the immediate caller knows whether the node tree is transient, so
      this doesn't involve a huge amount of code thrashing.  In this way,
      while we lose a little bit in the execute-from-cache code path due
      to sometimes copying node trees that wouldn't be mutated anyway,
      we gain something in the simple-Query code path by not copying
      throwaway node trees.  Statements that are complex enough to be
      expensive to copy are almost certainly ones that would have to be
      copied anyway, so the loss in the cache code path shouldn't be much.
      
      (Note that this whole problem applies only to utility statements.
      Optimizable statements don't have the issue because we long ago made
      the executor treat Plan trees as read-only.  Perhaps someday we will
      make utility statement execution act likewise, but I'm not holding
      my breath.)
      
      Discussion: https://postgr.es/m/931771.1623893989@sss.pgh.pa.us
      Discussion: https://postgr.es/m/17053-3ca3f501bbc212b4@postgresql.org
      7c337b6b
    • Andrew Dunstan's avatar
      Don't set a fast default for anything but a plain table · 0a4efdc7
      Andrew Dunstan authored
      The fast default code added in Release 11 omitted to check that the
      table a fast default was being added to was a plain table. Thus one
      could be added to a foreign table, which predicably blows up. Here we
      perform that check.
      
      In addition, on the back branches, since some of these might have
      escaped into the wild, if we encounter a missing value for
      an attribute of something other than a plain table we ignore it.
      
      Fixes bug #17056
      
      Backpatch to release 11,
      
      Reviewed by: Andres Freund, Álvaro Herrera and Tom Lane
      0a4efdc7
    • Fujii Masao's avatar
      Make archiver process handle barrier events. · 981524d2
      Fujii Masao authored
      Commit d75288fb made WAL archiver process an auxiliary process.
      An auxiliary process needs to handle barrier events but the commit
      forgot to make archiver process do that.
      
      Reported-by: Thomas Munro
      Author: Fujii Masao
      Reviewed-by: Thomas Munro
      Discussion: https://postgr.es/m/CA+hUKGLah2w1pWKHonZP_+EQw69=q56AHYwCgEN8GDzsRG_Hgw@mail.gmail.com
      981524d2
    • Michael Paquier's avatar
  3. 17 Jun, 2021 2 commits
    • Heikki Linnakangas's avatar
      Tidy up GetMultiXactIdMembers()'s behavior on error · d24c5658
      Heikki Linnakangas authored
      One of the error paths left *members uninitialized. That's not a live
      bug, because most callers don't look at *members when the function
      returns -1, but let's be tidy. One caller, in heap_lock_tuple(), does
      "if (members != NULL) pfree(members)", but AFAICS it never passes an
      invalid 'multi' value so it should not reach that error case.
      
      The callers are also a bit inconsistent in their expectations.
      heap_lock_tuple() pfrees the 'members' array if it's not-NULL, others
      pfree() it if "nmembers >= 0", and others if "nmembers > 0". That's
      not a live bug either, because the function should never return 0, but
      add an Assert for that to make it more clear. I left the callers alone
      for now.
      
      I also moved the line where we set *nmembers. It wasn't wrong before,
      but I like to do that right next to the 'return' statement, to make it
      clear that it's always set on return.
      
      Also remove one unreachable return statement after ereport(ERROR), for
      brevity and for consistency with the similar if-block right after it.
      
      Author: Greg Nancarrow with the additional changes by me
      Backpatch-through: 9.6, all supported versions
      d24c5658
    • Amit Kapila's avatar
      Document a few caveats in synchronous logical replication. · 3cb828db
      Amit Kapila authored
      In a synchronous logical setup, locking [user] catalog tables can cause
      deadlock. This is because logical decoding of transactions can lock
      catalog tables to access them so exclusively locking those in transactions
      can lead to deadlock. To avoid this users must refrain from having
      exclusive locks on catalog tables.
      
      Author: Takamichi Osumi
      Reviewed-by: Vignesh C, Amit Kapila
      Backpatch-through: 9.6
      Discussion: https://www.postgresql.org/message-id/20210222222847.tpnb6eg3yiykzpky%40alap3.anarazel.de
      3cb828db
  4. 16 Jun, 2021 4 commits
    • Tom Lane's avatar
      Fix plancache refcount leak after error in ExecuteQuery. · 131ea3e9
      Tom Lane authored
      When stuffing a plan from the plancache into a Portal, one is
      not supposed to risk throwing an error between GetCachedPlan and
      PortalDefineQuery; if that happens, the plan refcount incremented
      by GetCachedPlan will be leaked.  I managed to break this rule
      while refactoring code in 9dbf2b7d.  There is no visible
      consequence other than some memory leakage, and since nobody is
      very likely to trigger the relevant error conditions many times
      in a row, it's not surprising we haven't noticed.  Nonetheless,
      it's a bug, so rearrange the order of operations to remove the
      hazard.
      
      Noted on the way to looking for a better fix for bug #17053.
      This mistake is pretty old, so back-patch to all supported
      branches.
      131ea3e9
    • Tomas Vondra's avatar
      Fix copying data into slots with FDW batching · 99cea49d
      Tomas Vondra authored
      Commit b676ac44 optimized handling of tuple slots with bulk inserts
      into foreign tables, so that the slots are initialized only once and
      reused for all batches. The data was however copied into the slots only
      after the initialization, inserting duplicate values when the slot gets
      reused. Fixed by moving the ExecCopySlot outside the init branch.
      
      The existing postgres_fdw tests failed to catch this due to inserting
      data into foreign tables without unique indexes, and then checking only
      the number of inserted rows. This adds a new test with both a unique
      index and a check of inserted values.
      
      Reported-by: Alexander Pyhalov
      Discussion: https://postgr.es/m/7a8cf8d56b3d18e5c0bccd6cd42d04ac%40postgrespro.ru
      99cea49d
    • Tom Lane's avatar
      Improve SQLSTATE reporting in some replication-related code. · 6b787d9e
      Tom Lane authored
      I started out with the goal of reporting ERRCODE_CONNECTION_FAILURE
      when walrcv_connect() fails, but as I looked around I realized that
      whoever wrote this code was of the opinion that errcodes are purely
      optional.  That's not my understanding of our project policy.  Hence,
      make sure that an errcode is provided in each ereport that (a) is
      ERROR or higher level and (b) isn't arguably an internal logic error.
      Also fix some very dubious existing errcode assignments.
      
      While this is not per policy, it's also largely cosmetic, since few
      of these cases could get reported to applications.  So I don't
      feel a need to back-patch.
      
      Discussion: https://postgr.es/m/2189704.1623512522@sss.pgh.pa.us
      6b787d9e
    • Heikki Linnakangas's avatar
      Fix outdated comment that talked about seek position of WAL file. · d0303bc8
      Heikki Linnakangas authored
      Since commit c24dcd0c, we have been using pg_pread() to read the WAL
      file, which doesn't change the seek position (unless we fall back to
      the implementation in src/port/pread.c). Update comment accordingly.
      
      Backpatch-through: 12, where we started to use pg_pread()
      d0303bc8
  5. 15 Jun, 2021 12 commits
  6. 14 Jun, 2021 6 commits
  7. 13 Jun, 2021 3 commits
    • Tom Lane's avatar
      Work around portability issue with newer versions of mktime(). · f807e341
      Tom Lane authored
      Recent glibc versions have made mktime() fail if tm_isdst is
      inconsistent with the prevailing timezone; in particular it fails for
      tm_isdst = 1 when the zone is UTC.  (This seems wildly inconsistent
      with the POSIX-mandated treatment of "incorrect" values for the other
      fields of struct tm, so if you ask me it's a bug, but I bet they'll
      say it's intentional.)  This has been observed to cause cosmetic
      problems when pg_restore'ing an archive created in a different
      timezone.
      
      To fix, do mktime() using the field values from the archive, and if
      that fails try again with tm_isdst = -1.  This will give a result
      that's off by the UTC-offset difference from the original zone, but
      that was true before, too.  It's not terribly critical since we don't
      do anything with the result except possibly print it.  (Someday we
      should flush this entire bit of logic and record a standard-format
      timestamp in the archive instead.  That's not okay for a back-patched
      bug fix, though.)
      
      Also, guard our only other use of mktime() by having initdb's
      build_time_t() set tm_isdst = -1 not 0.  This case could only have
      an issue in zones that are DST year-round; but I think some do exist,
      or could in future.
      
      Per report from Wells Oliver.  Back-patch to all supported
      versions, since any of them might need to run with a newer glibc.
      
      Discussion: https://postgr.es/m/CAOC+FBWDhDHO7G-i1_n_hjRzCnUeFO+H-Czi1y10mFhRWpBrew@mail.gmail.com
      f807e341
    • Andrew Dunstan's avatar
      Further tweaks to stuck_on_old_timeline recovery test · 9d97c340
      Andrew Dunstan authored
      Translate path slashes on target directory path. This was confusing old
      branches, but is applied to all branches for the sake of uniformity.
      Perl is perfectly able to understand paths with forward slashes.
      
      Along the way, restore the previous archive_wait query, for the sake of
      uniformity with other tests, per gripe from Tom Lane.
      9d97c340
    • Michael Paquier's avatar
      Ignore more environment variables in pg_regress.c · a9e0b3b0
      Michael Paquier authored
      This is similar to the work done in 8279f68a for TestLib.pm, where
      environment variables set may cause unwanted failures if using a
      temporary installation with pg_regress.  The list of variables reset is
      adjusted in each stable branch depending on what is supported.
      
      Comments are added to remember that the lists in TestLib.pm and
      pg_regress.c had better be kept in sync.
      
      Reviewed-by: Álvaro Herrera
      Discussion: https://postgr.es/m/YMNR9GYDn+fHlMta@paquier.xyz
      Backpatch-through: 9.6
      a9e0b3b0
  8. 12 Jun, 2021 5 commits
    • Tom Lane's avatar
      Restore robustness of TAP tests that wait for postmaster restart. · f452aaf7
      Tom Lane authored
      Several TAP tests use poll_query_until() to wait for the postmaster
      to restart.  They were checking to see if a trivial query
      (e.g. "SELECT 1") succeeds.  However, that's problematic in the wake
      of commit 11e9caff, because now that we feed said query to psql
      via stdin, we risk IPC::Run whining about a SIGPIPE failure if psql
      quits before reading the query.  Hence, we can't use a nonempty
      query in cases where we need to wait for connection failures to
      stop happening.
      
      Per the precedent of commits c757a3da and 6d41dd04, we can pass
      "undef" as the query in such cases to ensure that IPC::Run has
      nothing to write.  However, then we have to say that the expected
      output is empty, and this exposes a deficiency in poll_query_until:
      if psql fails altogether and returns empty stdout, poll_query_until
      will treat that as a success!  That's because, contrary to its
      documentation, it makes no actual check for psql failure, looking
      neither at the exit status nor at stderr.
      
      To fix that, adjust poll_query_until to insist on empty stderr as
      well as a stdout match.  (I experimented with checking exit status
      instead, but it seems that psql often does exit(1) in cases that we
      need to consider successes.  That might be something to fix someday,
      but it would be a non-back-patchable behavior change.)
      
      Back-patch to v10.  The test cases needing this exist only as far
      back as v11, but it seems wise to keep poll_query_until's behavior
      the same in v10, in case we back-patch another such test case in
      future.  (9.6 does not currently need this change, because in that
      branch poll_query_until can't be told to accept empty stdout as
      a success case.)
      
      Per assorted buildfarm failures, mostly on hoverfly.
      
      Discussion: https://postgr.es/m/CAA4eK1+zM6L4QSA1XMvXY_qqWwdUmqkOS1+hWvL8QcYEBGA1Uw@mail.gmail.com
      f452aaf7
    • Tom Lane's avatar
      Ensure pg_filenode_relation(0, 0) returns NULL. · 1250aad4
      Tom Lane authored
      Previously, a zero value for the relfilenode resulted in
      a confusing error message about "unexpected duplicate".
      This function returns NULL for other invalid relfilenode
      values, so zero should be treated likewise.
      
      It's been like this all along, so back-patch to all supported
      branches.
      
      Justin Pryzby
      
      Discussion: https://postgr.es/m/20210612023324.GT16435@telsasoft.com
      1250aad4
    • Tom Lane's avatar
      Don't use Asserts to check for violations of replication protocol. · fe6a20ce
      Tom Lane authored
      Using an Assert to check the validity of incoming messages is an
      extremely poor decision.  In a debug build, it should not be that easy
      for a broken or malicious remote client to crash the logrep worker.
      The consequences could be even worse in non-debug builds, which will
      fail to make such checks at all, leading to who-knows-what misbehavior.
      Hence, promote every Assert that could possibly be triggered by wrong
      or out-of-order replication messages to a full test-and-ereport.
      
      To avoid bloating the set of messages the translation team has to cope
      with, establish a policy that replication protocol violation error
      reports don't need to be translated.  Hence, all the new messages here
      use errmsg_internal().  A couple of old messages are changed likewise
      for consistency.
      
      Along the way, fix some non-idiomatic or outright wrong uses of
      hash_search().
      
      Most of these mistakes are new with the "streaming replication"
      patch (commit 46482432), but a couple go back a long way.
      Back-patch as appropriate.
      
      Discussion: https://postgr.es/m/1719083.1623351052@sss.pgh.pa.us
      fe6a20ce
    • Andrew Dunstan's avatar
      Fix new recovery test for use under msys · c3652f97
      Andrew Dunstan authored
      Commit caba8f0d wasn't quite right for msys, as demonstrated by
      several buildfarm animals, including jacana and fairywren. We need to
      use the msys perl in the archive command, but call it in such a way that
      Windows will understand the path. Furthermore, inside the copy script we
      need to convert a Windows path to an msys path.
      c3652f97
    • Michael Paquier's avatar
      Simplify some code in getObjectTypeDescription() · b56b83aa
      Michael Paquier authored
      This routine is designed to never return an empty description or NULL,
      providing description fallbacks even if missing objects are accepted,
      but it included a code path where this was considered possible.  All the
      callers of this routine already consider NULL as not possible, so
      change a bit the code to map with the assumptions of the callers, and
      add more comments close to the callers of this routine to outline the
      behavior expected.
      
      This code is new as of 2a10fdc4, so no backpatch is needed.
      
      Discussion: https://postgr.es/m/YMNY6RGPBRCeLmFb@paquier.xyz
      b56b83aa