1. 24 Mar, 2021 3 commits
    • Rename wait event WalrcvExit to WalReceiverExit. · 84007043
      Fujii Masao authored
      Commit de829ddf added wait event WalrcvExit. But its name is not
      consistent with other wait events such as WalReceiverMain and
      WalReceiverWaitStart. So this commit renames WalrcvExit to
      WalReceiverExit.
      
      Author: Fujii Masao
      Reviewed-by: Thomas Munro
      Discussion: https://postgr.es/m/cced9995-8fa2-7b22-9d91-3f22a2b8c23c@oss.nttdata.com
      84007043
    • Log when GetNewOidWithIndex() fails to find unused OID many times. · 7fbcee1b
      Fujii Masao authored
      GetNewOidWithIndex() generates new OIDs one at a time until it finds
      one that is not already present in the relation. If there are very
      long runs of consecutive existing OIDs, GetNewOidWithIndex() needs
      many iterations of its loop to find an unused OID. Since a TOAST
      table can contain a large number of entries, and can therefore have
      such long runs of OIDs, it can take a great many iterations to find
      a new OID that is not already in the TOAST table. Furthermore, if
      all (i.e., 2^32) OIDs are already in use, GetNewOidWithIndex()
      effectively enters a busy loop and keeps iterating until at least
      one OID is marked as unused.
      
      Several problems caused by a large number of iterations in
      GetNewOidWithIndex() have been reported. For example, while
      inserting a billion records into a table, all the backends
      performing the insertions eventually appeared to hang at 100%
      CPU usage.
      
      Previously there was no easy way to detect that GetNewOidWithIndex()
      had repeatedly failed to find an unused OID. Investigating such a
      problem required, for example, taking a full gdb backtrace of the
      hung backends, which is inconvenient and may not even be possible
      in some production environments.
      
      To provide an easy way to detect this, this commit makes
      GetNewOidWithIndex() log a message once it has iterated more than
      GETNEWOID_LOG_THRESHOLD times without finding an OID unused in the
      relation. The message is then repeated at exponentially increasing
      intervals until the iteration count exceeds
      GETNEWOID_LOG_MAX_INTERVAL, after which it is repeated every
      GETNEWOID_LOG_MAX_INTERVAL iterations until an unused OID is found.
      These macros are used to keep the server log from filling up with
      nearly identical messages.
      
      In the discussion on pgsql-hackers, another idea was to report the
      excessive number of iterations in GetNewOidWithIndex() via a wait
      event. But GetNewOidWithIndex() traverses an index to find an
      unused OID, and doing so performs I/O, acquires locks, etc., which
      overwrites the wait event and resets it once done. So that idea
      doesn't work well, and we didn't adopt it.
      
      Author: Tomohiro Hiramitsu
      Reviewed-by: Tatsuhito Kasahara, Kyotaro Horiguchi, Tom Lane, Fujii Masao
      Discussion: https://postgr.es/m/16722-93043fb459a41073@postgresql.org
      7fbcee1b
    • Reword slightly logs generated for index stats in autovacuum · 99dd75fb
      Michael Paquier authored
      Using "remain" is confusing, as it implies that the index file can
      shrink.  Instead, use "in total".
      
      Per discussion with Peter Geoghegan.
      
      Discussion: https://postgr.es/m/CAH2-WzkYgHZzpGOwR14CScJsjaQpvJrEkEfkh_=wGhzLb=yVdQ@mail.gmail.com
      99dd75fb
  2. 23 Mar, 2021 14 commits
  3. 22 Mar, 2021 14 commits
  4. 21 Mar, 2021 9 commits
    • Simplify TAP tests of kerberos with expected log file contents · 11e1577a
      Michael Paquier authored
      The kerberos TAP tests rely on the logs generated by the backend to
      check various connection scenarios.  To make sure that a given test
      does not match log contents generated by a previous test, the test
      suite relied on the logging collector and log file rotation,
      combined with a wait phase, to ensure that each test worked against
      a unique log.
      
      Parsing the log contents for expected patterns is a problem already
      solved more simply by PostgresNode::issues_sql_like(): the log file
      is truncated before checking for the generated contents, with the
      backend sending its output to a log file given by pg_ctl instead.
      This commit switches the kerberos test suite to that method,
      removing the wait phase and simplifying the whole logic, resulting
      in less code.  If a failure happens in the tests, the contents of
      the logs are still shown to the user at the moment of the failure
      thanks to like(), so this has no impact on debugging capabilities.
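
      For illustration only, here is the truncate-then-scan idea as a
      small standalone C program (the test suite itself is Perl TAP
      code); the log path and the expected pattern are made-up examples:

          #include <stdio.h>
          #include <string.h>
          #include <unistd.h>

          /* Return 1 if "pattern" appears on any line of "logfile". */
          static int
          log_contains(const char *logfile, const char *pattern)
          {
              char    line[4096];
              FILE   *fp = fopen(logfile, "r");
              int     found = 0;

              if (fp == NULL)
                  return 0;
              while (fgets(line, sizeof(line), fp) != NULL)
              {
                  if (strstr(line, pattern) != NULL)
                  {
                      found = 1;
                      break;
                  }
              }
              fclose(fp);
              return found;
          }

          int
          main(void)
          {
              const char *logfile = "server.log";   /* made-up path */

              /* Truncate the log first, so a later match can only have been
               * produced by the scenario exercised after this point. */
              truncate(logfile, 0);

              /* ... run the connection attempt being tested here ... */

              if (log_contains(logfile, "connection authorized"))
                  printf("expected log pattern found\n");
              return 0;
          }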
      
      I bumped into this issue while reviewing a different patch set
      aiming at extending the kerberos test suite to check for multiple
      log patterns instead of just one.
      
      Author: Michael Paquier
      Reviewed-by: Stephen Frost, Bharath Rupireddy
      Discussion: https://postgr.es/m/YFXcq2vBTDGQVBNC@paquier.xyz
      11e1577a
    • Fix timeline assignment in checkpoints with 2PC transactions · 595b9cba
      Michael Paquier authored
      Any transactions found by a checkpoint to still be prepared have
      their state data read from the WAL records generated by PREPARE
      TRANSACTION before being moved into their new location within
      pg_twophase/.  While reading such records, the WAL reader uses the
      callback read_local_xlog_page() to read a page, a callback that is
      shared across various parts of the system.  Since 1148e22a, this
      callback updates ThisTimeLineID when reading a record while in
      recovery, which is potentially helpful in the context of cascading
      WAL senders.
      
      This update of ThisTimeLineID interacts badly with the checkpointer
      if a promotion happens while some 2PC data is being read from its
      record: by changing ThisTimeLineID, any follow-up WAL records would
      be written to a timeline older than the promoted one.  This results
      in consistency issues.  For instance, a subsequent server restart
      could fail to find a valid checkpoint record, resulting in a PANIC.
      
      This commit changes the code reading the 2PC data to reset the
      timeline once the 2PC record has been read, to avoid disturbing the
      static state of the checkpointer.  It would be tempting to do the
      same thing directly in read_local_xlog_page().  However, based on
      the discussion that led to 1148e22a, users may rely on the updates
      of ThisTimeLineID when a WAL record page is read in recovery, so
      changing this callback could break some cases that are working
      currently.
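
      The shape of the fix, as a minimal sketch (the function name is
      invented for this example and error handling is reduced to a bare
      elog; the real change lives in the two-phase state code):

          /* Read a 2PC state record without letting the page-read callback's
           * timeline update leak into the caller's (checkpointer's) state. */
          static XLogRecord *
          read_twophase_record_preserving_tli(XLogReaderState *xlogreader, XLogRecPtr lsn)
          {
              TimeLineID  save_tli = ThisTimeLineID;
              XLogRecord *record;
              char       *errormsg;

              XLogBeginRead(xlogreader, lsn);
              record = XLogReadRecord(xlogreader, &errormsg);
              if (record == NULL)
                  elog(ERROR, "could not read two-phase state from WAL at %X/%X",
                       (uint32) (lsn >> 32), (uint32) lsn);

              /* read_local_xlog_page() may have changed ThisTimeLineID while in
               * recovery; restore it so follow-up WAL goes to the right timeline
               * after a promotion. */
              ThisTimeLineID = save_tli;

              return record;
          }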
      
      A TAP test reproducing the issue is added, relying on a PITR to
      precisely trigger a promotion with a prepared transaction still
      tracked.
      
      Per discussion with Heikki Linnakangas, Kyotaro Horiguchi, Fujii Masao
      and myself.
      
      Author: Soumyadeep Chakraborty, Jimmy Yih, Kevin Yeap
      Discussion: https://postgr.es/m/CAE-ML+_EjH_fzfq1F3RJ1=XaaNG=-Jz-i3JqkNhXiLAsM3z-Ew@mail.gmail.com
      Backpatch-through: 10
      595b9cba
    • Fix assorted silliness in ATExecSetCompression(). · ac897c48
      Tom Lane authored
      It's not okay to scribble directly on a syscache entry.
      Nor to continue accessing said entry after releasing it.
      
      Also get rid of unused local variables.
      
      Per valgrind testing.
      ac897c48
    • Recycle nbtree pages deleted during same VACUUM. · 9dd963ae
      Peter Geoghegan authored
      Maintain a simple array of metadata about pages that were deleted during
      nbtree VACUUM's current btvacuumscan() call.  Use this metadata at the
      end of btvacuumscan() to attempt to place newly deleted pages in the FSM
      without further delay.  It might not yet be safe to place any of the
      pages in the FSM by then (they may not be deemed recyclable), but we
      have little to lose and plenty to gain by trying.  In practice there is
      a very good chance that this will work out when vacuuming larger
      indexes, where scanning the index naturally takes quite a while.
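
      A minimal sketch of that idea, with invented names (DeletedPage,
      deleted_pages, ndeleted, page_is_recyclable()); only
      RecordFreeIndexPage() and IndexFreeSpaceMapVacuum() are real
      backend functions, and the actual data structure and safety check
      in the patch differ in detail:

          /* One entry per page deleted during the current btvacuumscan() call. */
          typedef struct DeletedPage
          {
              BlockNumber        blkno;    /* page deleted earlier in this VACUUM */
              FullTransactionId  safexid;  /* XID that must be sufficiently old */
          } DeletedPage;

          static void
          place_deleted_pages_in_fsm(Relation index,
                                     DeletedPage *deleted_pages, int ndeleted)
          {
              for (int i = 0; i < ndeleted; i++)
              {
                  /* Not every page will be recyclable yet; a later VACUUM can
                   * still pick those up, so just skip them for now. */
                  if (!page_is_recyclable(deleted_pages[i].safexid))
                      continue;

                  RecordFreeIndexPage(index, deleted_pages[i].blkno);
              }

              /* Make the newly recorded free pages visible in the FSM. */
              IndexFreeSpaceMapVacuum(index);
          }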
      
      This commit doesn't change the page recycling invariants; it merely
      improves the efficiency of page recycling within the confines of the
      existing design.  Recycle safety is a part of nbtree's implementation of
      what Lanin & Shasha call "the drain technique".  The design happens to
      use transaction IDs (they're stored in deleted pages), but that in
      itself doesn't align the cutoff for recycle safety to any of the
      XID-based cutoffs used by VACUUM (e.g., OldestXmin).  All that matters
      is whether or not _other_ backends might be able to observe various
      inconsistencies in the tree structure (that they cannot just detect and
      recover from by moving right).  Recycle safety is purely a question of
      maintaining the consistency (or the apparent consistency) of a physical
      data structure.
      
      Note that running a simple serial test case involving a large range
      DELETE followed by a VACUUM VERBOSE will probably show that any newly
      deleted nbtree pages are not yet reusable/recyclable.  This is expected
      in the absence of even one concurrent XID assignment.  It is an old
      implementation restriction.  In practice it's unlikely to be the thing
      that makes recycling remain unsafe, at least with larger indexes, where
      recycling newly deleted pages during the same VACUUM actually matters.
      
      An important high-level goal of this commit (as well as related recent
      commits e5d8a999 and 9f3665fb) is to make expensive deferred cleanup
      operations in index AMs rare in general.  If index vacuuming frequently
      depends on the next VACUUM operation finishing off work that the current
      operation started, then the general behavior of index vacuuming is hard
      to predict.  This is relevant to ongoing work that adds a vacuumlazy.c
      mechanism to skip index vacuuming in certain cases.  Anything that makes
      the real world behavior of index vacuuming simpler and more linear will
      also make top-down modeling in vacuumlazy.c more robust.
      
      Author: Peter Geoghegan <pg@bowt.ie>
      Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com>
      Discussion: https://postgr.es/m/CAH2-Wzk76_P=67iUscb1UN44-gyZL-KgpsXbSxq_bdcMa7Q+wQ@mail.gmail.com
      9dd963ae
    • Bring configure support for LZ4 up to snuff. · 4d399a6f
      Tom Lane authored
      It's not okay to just shove the pkg_config results right into our
      build flags, for a couple different reasons:
      
      * This fails to maintain the separation between CPPFLAGS and CFLAGS,
      as well as that between LDFLAGS and LIBS.  (The CPPFLAGS angle is,
      I believe, the reason for warning messages reported when building
      with MacPorts' liblz4.)
      
      * If pkg_config emits anything other than -I/-D/-L/-l switches,
      it's highly unlikely that we want to absorb those.  That'd be more
      likely to break the build than do anything helpful.  (Even the -D
      case is questionable; but we're doing that for libxml2, so I kept it.)
      
      Also, it's not okay to skip doing an AC_CHECK_LIB probe, as
      evidenced by recent build failure on topminnow; that should
      have been caught at configure time.
      
      Model fixes for this on configure's libxml2 support.
      
      It appears that somebody overlooked an autoheader run, too.
      
      Discussion: https://postgr.es/m/20210119190720.GL8560@telsasoft.com
      4d399a6f
    • Make compression.sql regression test independent of default. · fd1ac9a5
      Tom Lane authored
      This test will fail in "make installcheck" if the installation's
      default_toast_compression setting is not 'pglz'.  Make it robust
      against that situation.
      
      Dilip Kumar
      
      Discussion: https://postgr.es/m/CAFiTN-t0w+Rc2U3S+y=7KWcLuOYNB5MfWeGdNa7+pg0UovVdcQ@mail.gmail.com
      fd1ac9a5
    • Don't run recover crash_temp_files test in Windows perl · ef823873
      Andrew Dunstan authored
      This reverts commit 677271a3.
      "Unbreak recovery test on Windows"
      
      The test hangs on Windows, and attempts to remedy the problem have
      proved fragile at best. So we simply disable the test on Windows perl.
      (Msys perl seems perfectly happy).
      
      Discussion: https://postgr.es/m/5b748470-7335-5439-e876-6a88c951e1c5@dunslane.net
      ef823873
    • Fix new memory leaks in libpq · 2b526ed2
      Alvaro Herrera authored
      My oversight in commit 9aa491ab.
      
      Per coverity.
      2b526ed2
    • Unbreak recovery test on Windows · 677271a3
      Andrew Dunstan authored
      On Windows we need to send explicit quit messages to psql or the TAP tests
      can hang.
      677271a3