1. 24 Mar, 2021 8 commits
    • Change checkpoint_completion_target default to 0.9 · bbcc4eb2
      Stephen Frost authored
      Common recommendations are that the checkpoint should be spread out as
      much as possible, provided we avoid having it take too long.  This
      change updates the default to 0.9 (from 0.5) to match that
      recommendation.
      
      There was some debate about removing the option entirely, but there may
      be corner cases where setting it much lower, to force the checkpoint to
      finish as fast as possible, results in fewer periods of reduced
      performance due to kernel flushing.  General agreement is that "spread
      more" is the preferred approach, though, and those who need to tune away
      from that value are much less common.
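
      As a hedged illustration, the new default can be inspected or overridden
      with standard GUC commands (the lower value shown is only an example):

        -- 0.9 is now the compiled-in default
        SHOW checkpoint_completion_target;

        -- override it cluster-wide if a faster checkpoint is really wanted
        ALTER SYSTEM SET checkpoint_completion_target = 0.5;
        SELECT pg_reload_conf();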
      
      Reviewed-By: Michael Paquier, Peter Eisentraut, Tom Lane, David Steele,
      Nathan Bossart
      Discussion: https://postgr.es/m/20201207175329.GM16415%40tamriel.snowman.net
    • Tidy up more loose ends related to configurable TOAST compression. · e5595de0
      Robert Haas authored
      Change the default_toast_compression GUC to be an enum rather than
      a string. Earlier, uncommitted versions of the patch supported using
      CREATE ACCESS METHOD to add new compression methods to a running
      system, but that idea was dropped before commit. So, we can simplify
      the GUC handling as well, which has the nice side effect of improving
      the error messages.
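
      A minimal sketch of how the enum GUC behaves from SQL (the exact error
      text is not reproduced here):

        -- valid values are 'pglz' and 'lz4' (the latter when built with LZ4 support)
        SET default_toast_compression = 'lz4';

        -- anything else is now rejected with an error listing the valid values
        SET default_toast_compression = 'bogus';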
      
      While updating the documentation to reflect the new GUC type, also
      move it back to the right place in the list. I moved this while
      revising what became commit 24f0e395,
      but apparently the intended ordering is "alphabetical" rather than
      "whatever Robert thinks looks nice."
      
      Rejigger things to avoid having access/toast_compression.h depend on
      utils/guc.h, so that we don't end up with every file that includes
      it also depending on something largely unrelated. Move a few
      inline functions back into the C source file partly to help reduce
      dependencies and partly just to avoid clutter. A few very minor
      cosmetic fixes.
      
      Original patch by Justin Pryzby, but very heavily edited by me,
      and reverse reviewed by him and also reviewed by Tom Lane.
      
      Discussion: http://postgr.es/m/CA+TgmoYp=GT_ztUCeZg2i4hkHAQv8o=-nVJ1-TKWTG1zQOmOpg@mail.gmail.com
    • Add date_bin function · 49ab61f0
      Peter Eisentraut authored
      Similar to date_trunc, but allows binning by an arbitrary interval
      rather than just full units.
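
      For example, using the date_bin(stride, source, origin) form added here:

        -- bin a timestamp into 15-minute buckets anchored at an arbitrary origin
        SELECT date_bin('15 minutes',
                        TIMESTAMP '2021-03-24 08:38:30',
                        TIMESTAMP '2021-01-01');
        -- expected result: 2021-03-24 08:30:00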
      
      Author: John Naylor <john.naylor@enterprisedb.com>
      Reviewed-by: David Fetter <david@fetter.org>
      Reviewed-by: Isaac Morland <isaac.morland@gmail.com>
      Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
      Reviewed-by: Artur Zakirov <zaartur@gmail.com>
      Discussion: https://www.postgresql.org/message-id/flat/CACPNZCt4buQFRgy6DyjuZS-2aPDpccRkrJBmgUfwYc1KiaXYxg@mail.gmail.com
    • Improve an error message · 1509c6fc
      Peter Eisentraut authored
      Make it the same as another nearby message.
    • Revert "Enable parallel SELECT for "INSERT INTO ... SELECT ..."." · 26acb54a
      Amit Kapila authored
      To allow inserts in parallel mode, this feature has to ensure that all
      the constraints, triggers, etc. in the partition hierarchy are
      parallel-safe, which is costly, and we need to find a better way to do
      that.  Additionally, we could have used existing cached information in
      some cases, such as indexes and domains, to determine parallel-safety.
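
      For reference, a statement of the shape the reverted feature targeted
      (table names here are hypothetical); after the revert, the SELECT part
      of such a statement is again planned without parallel workers:

        EXPLAIN (COSTS OFF)
        INSERT INTO orders_archive
        SELECT * FROM orders WHERE created_at < now() - interval '1 year';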
      
      List of commits reverted, in reverse chronological order:
      
      ed62d373 Doc: Update description for parallel insert reloption.
      c8f78b61 Add a new GUC and a reloption to enable inserts in parallel-mode.
      c5be48f0 Improve FK trigger parallel-safety check added by 05c8482f.
      e2cda3c2 Fix use of relcache TriggerDesc field introduced by commit 05c8482f.
      e4e87a32 Fix valgrind issue in commit 05c8482f.
      05c8482f Enable parallel SELECT for "INSERT INTO ... SELECT ...".
      
      Discussion: https://postgr.es/m/E1lMiB9-0001c3-SY@gemulon.postgresql.org
    • Rename wait event WalrcvExit to WalReceiverExit. · 84007043
      Fujii Masao authored
      Commit de829ddf added the wait event WalrcvExit. But its name is not
      consistent with other wait events such as WalReceiverMain or
      WalReceiverWaitStart. So this commit renames WalrcvExit to
      WalReceiverExit.
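
      A hedged example of where the renamed event shows up for monitoring:

        -- wait events are reported in pg_stat_activity
        SELECT pid, wait_event_type, wait_event
        FROM pg_stat_activity
        WHERE wait_event = 'WalReceiverExit';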
      
      Author: Fujii Masao
      Reviewed-by: Thomas Munro
      Discussion: https://postgr.es/m/cced9995-8fa2-7b22-9d91-3f22a2b8c23c@oss.nttdata.com
    • Log when GetNewOidWithIndex() fails to find unused OID many times. · 7fbcee1b
      Fujii Masao authored
      GetNewOidWithIndex() generates new OIDs one by one until it finds one
      not present in the relation. If there are very long runs of consecutive
      existing OIDs, GetNewOidWithIndex() needs many iterations of its loop to
      find an unused OID. Since a TOAST table can have a large number of
      entries and can contain such long runs of OIDs, there are cases where it
      takes very many iterations to find a new OID not already in the TOAST
      table. Furthermore, if all (i.e., 2^32) OIDs are already used,
      GetNewOidWithIndex() enters something like a busy loop and keeps
      iterating until at least one OID is marked as unused.
      
      There have been reports of trouble caused by a large number of
      iterations in GetNewOidWithIndex(). For example, when inserting a
      billion records into a table, all the backends performing that insertion
      eventually hung with 100% CPU usage.
      
      Previously there was no easy way to detect that GetNewOidWithIndex()
      had failed to find an unused OID many times. So, for example, a full gdb
      backtrace of the hung backends had to be taken in order to investigate
      the trouble. This is inconvenient and may not be possible in some
      production environments.
      
      To provide an easy way to detect this, this commit makes
      GetNewOidWithIndex() log a message once it has iterated more than
      GETNEWOID_LOG_THRESHOLD times without finding an OID unused in the
      relation. It then repeats the logging at exponentially increasing
      intervals until the interval reaches GETNEWOID_LOG_MAX_INTERVAL, after
      which it logs every GETNEWOID_LOG_MAX_INTERVAL iterations until an
      unused OID is found. These macros exist to avoid filling up the server
      log with similar messages.
      
      In the discussion on pgsql-hackers, there was another idea: reporting
      the large number of iterations in GetNewOidWithIndex() via a wait event.
      But GetNewOidWithIndex() traverses indexes to find an unused OID, which
      involves I/O, lock acquisition, etc., and those operations would
      overwrite the wait event and reset it to nothing once done. So that idea
      doesn't work well, and we didn't adopt it.
      
      Author: Tomohiro Hiramitsu
      Reviewed-by: Tatsuhito Kasahara, Kyotaro Horiguchi, Tom Lane, Fujii Masao
      Discussion: https://postgr.es/m/16722-93043fb459a41073@postgresql.org
    • Reword slightly logs generated for index stats in autovacuum · 99dd75fb
      Michael Paquier authored
      Using "remain" is confusing, as it implies that the index file can
      shrink.  Instead, use "in total".
      
      Per discussion with Peter Geoghegan.
      
      Discussion: https://postgr.es/m/CAH2-WzkYgHZzpGOwR14CScJsjaQpvJrEkEfkh_=wGhzLb=yVdQ@mail.gmail.com
  2. 23 Mar, 2021 14 commits
  3. 22 Mar, 2021 14 commits
  4. 21 Mar, 2021 4 commits
    • Simplify TAP tests of kerberos with expected log file contents · 11e1577a
      Michael Paquier authored
      The TAP tests of kerberos rely on the logs generated by the backend to
      check various connection scenarios.  In order to make sure that a given
      test does not overlap with the log contents generated by a previous
      test, the test suite relied on logic involving the logging collector and
      rotation of the log files, together with a wait phase, to ensure the
      uniqueness of the generated log.
      
      Parsing the log contents for expected patterns is a problem that has
      been solved in a simpler way by PostgresNode::issues_sql_like() where
      the log file is truncated before checking for the contents generated,
      with the backend sending its output to a log file given by pg_ctl
      instead.  This commit switches the kerberos test suite to use such a
      method, removing any wait phase and simplifying the whole logic,
      resulting in less code.  If a failure happens in the tests, the contents
      of the logs are still shown to the user at the moment of the failure
      thanks to like(), so this has no impact on debugging capabilities.
      
      I bumped into this issue while reviewing a different patch set aiming
      at extending the kerberos test suite to check for multiple log patterns
      instead of just one.
      
      Author: Michael Paquier
      Reviewed-by: Stephen Frost, Bharath Rupireddy
      Discussion: https://postgr.es/m/YFXcq2vBTDGQVBNC@paquier.xyz
    • Fix timeline assignment in checkpoints with 2PC transactions · 595b9cba
      Michael Paquier authored
      Any transactions found as still prepared by a checkpoint have their
      state data read from the WAL records generated by PREPARE TRANSACTION
      before being moved into their new location within pg_twophase/.  While
      reading such records, the WAL reader uses the callback
      read_local_xlog_page(), which is shared across various parts of the
      system, to read a page.  Since 1148e22a, this callback updates
      ThisTimeLineID when reading a record while in recovery, which is
      potentially helpful in the context of cascading WAL senders.
      
      This update of ThisTimeLineID interacts badly with the checkpointer if a
      promotion happens while some 2PC data is read from its record: by
      changing ThisTimeLineID, any follow-up WAL records would be written to
      a timeline older than the promoted one.  This results in consistency
      issues.  For instance, a subsequent server restart would fail to find a
      valid checkpoint record, resulting in a PANIC.
      
      This commit changes the code reading the 2PC data to reset the timeline
      once the 2PC record has been read, to prevent messing up the static
      state of the checkpointer.  It would be tempting to do the same thing
      directly in read_local_xlog_page().  However, based on the discussion
      that has led to 1148e22a, users may rely on the updates of
      ThisTimeLineID when a WAL record page is read in recovery, so changing
      this callback could break some cases that are working currently.
      
      A TAP test reproducing the issue is added, relying on a PITR to
      precisely trigger a promotion with a prepared transaction still
      tracked.
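
      A hedged sketch of the kind of state involved (object names are made
      up; this assumes max_prepared_transactions > 0); a prepared transaction
      left pending like this is what the checkpoint later has to move into
      pg_twophase/:

        CREATE TABLE t2pc (id int);
        BEGIN;
        INSERT INTO t2pc VALUES (1);
        PREPARE TRANSACTION 'pt1';
        -- the transaction now survives until COMMIT PREPARED / ROLLBACK PREPARED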
      
      Per discussion with Heikki Linnakangas, Kyotaro Horiguchi, Fujii Masao
      and myself.
      
      Author: Soumyadeep Chakraborty, Jimmy Yih, Kevin Yeap
      Discussion: https://postgr.es/m/CAE-ML+_EjH_fzfq1F3RJ1=XaaNG=-Jz-i3JqkNhXiLAsM3z-Ew@mail.gmail.com
      Backpatch-through: 10
    • Fix assorted silliness in ATExecSetCompression(). · ac897c48
      Tom Lane authored
      It's not okay to scribble directly on a syscache entry.
      Nor to continue accessing said entry after releasing it.
      
      Also get rid of unused local variables.
      
      Per valgrind testing.
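
      For context, this code path is reached by commands such as the following
      (table and column names are hypothetical):

        ALTER TABLE docs ALTER COLUMN body SET COMPRESSION lz4;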
    • Recycle nbtree pages deleted during same VACUUM. · 9dd963ae
      Peter Geoghegan authored
      Maintain a simple array of metadata about pages that were deleted during
      nbtree VACUUM's current btvacuumscan() call.  Use this metadata at the
      end of btvacuumscan() to attempt to place newly deleted pages in the FSM
      without further delay.  It might not yet be safe to place any of the
      pages in the FSM by then (they may not be deemed recyclable), but we
      have little to lose and plenty to gain by trying.  In practice there is
      a very good chance that this will work out when vacuuming larger
      indexes, where scanning the index naturally takes quite a while.
      
      This commit doesn't change the page recycling invariants; it merely
      improves the efficiency of page recycling within the confines of the
      existing design.  Recycle safety is a part of nbtree's implementation of
      what Lanin & Shasha call "the drain technique".  The design happens to
      use transaction IDs (they're stored in deleted pages), but that in
      itself doesn't align the cutoff for recycle safety to any of the
      XID-based cutoffs used by VACUUM (e.g., OldestXmin).  All that matters
      is whether or not _other_ backends might be able to observe various
      inconsistencies in the tree structure (that they cannot just detect and
      recover from by moving right).  Recycle safety is purely a question of
      maintaining the consistency (or the apparent consistency) of a physical
      data structure.
      
      Note that running a simple serial test case involving a large range
      DELETE followed by a VACUUM VERBOSE will probably show that any newly
      deleted nbtree pages are not yet reusable/recyclable.  This is expected
      in the absence of even one concurrent XID assignment.  It is an old
      implementation restriction.  In practice it's unlikely to be the thing
      that makes recycling remain unsafe, at least with larger indexes, where
      recycling newly deleted pages during the same VACUUM actually matters.
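
      A hedged sketch of that serial test case (object names are made up):

        CREATE TABLE recycle_test (id int);
        INSERT INTO recycle_test SELECT generate_series(1, 1000000);
        CREATE INDEX recycle_test_idx ON recycle_test (id);
        DELETE FROM recycle_test WHERE id > 100000;
        -- VERBOSE output will typically report index pages that were deleted
        -- but are not yet reusable, per the restriction described above.
        VACUUM VERBOSE recycle_test;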
      
      An important high-level goal of this commit (as well as related recent
      commits e5d8a999 and 9f3665fb) is to make expensive deferred cleanup
      operations in index AMs rare in general.  If index vacuuming frequently
      depends on the next VACUUM operation finishing off work that the current
      operation started, then the general behavior of index vacuuming is hard
      to predict.  This is relevant to ongoing work that adds a vacuumlazy.c
      mechanism to skip index vacuuming in certain cases.  Anything that makes
      the real world behavior of index vacuuming simpler and more linear will
      also make top-down modeling in vacuumlazy.c more robust.
      
      Author: Peter Geoghegan <pg@bowt.ie>
      Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com>
      Discussion: https://postgr.es/m/CAH2-Wzk76_P=67iUscb1UN44-gyZL-KgpsXbSxq_bdcMa7Q+wQ@mail.gmail.com