1. 18 Jan, 2021 3 commits
    • Peter Eisentraut's avatar
      Pause recovery for insufficient parameter settings · 15251c0a
      Peter Eisentraut authored
      When certain parameters are changed on a physical replication primary,
      this is communicated to standbys using the XLOG_PARAMETER_CHANGE WAL
      record.  The standby then checks whether its own settings are at least
      as big as the ones on the primary.  If not, the standby shuts down
      with a fatal error.
      
      This patch changes this behavior for hot standbys to pause recovery at
      that point instead.  That allows read traffic on the standby to
      continue while database administrators figure out next steps.  When
      recovery is unpaused, the server shuts down (as before).  The idea is
      to fix the parameters while recovery is paused and then restart when
      there is a maintenance window.
      Reviewed-by: default avatarSergei Kornilov <sk@zsrv.org>
      Discussion: https://www.postgresql.org/message-id/flat/4ad69a4c-cc9b-0dfe-0352-8b1b0cd36c7b@2ndquadrant.com
      15251c0a
    • Fujii Masao's avatar
      postgres_fdw: Add function to list cached connections to foreign servers. · 708d165d
      Fujii Masao authored
      This commit adds function postgres_fdw_get_connections() to return
      the foreign server names of all the open connections that postgres_fdw
      established from the local session to the foreign servers. This function
      also returns whether each connection is valid or not.
      
      This function is useful when checking all the open foreign server connections.
      If we found some connection to drop, from the result of function, probably
      we can explicitly close them by the function that upcoming commit will add.
      
      This commit bumps the version of postgres_fdw to 1.1 since it adds
      new function.
      
      Author: Bharath Rupireddy, tweaked by Fujii Masao
      Reviewed-by: Zhijie Hou, Alexey Kondratov, Zhihong Yu, Fujii Masao
      Discussion: https://postgr.es/m/2d5cb0b3-a6e8-9bbb-953f-879f47128faa@oss.nttdata.com
      708d165d
    • Michael Paquier's avatar
      Refactor option handling of CLUSTER, REINDEX and VACUUM · a3dc9260
      Michael Paquier authored
      This continues the work done in b5913f61.  All the options of those
      commands are changed to use hex values rather than enums to reduce the
      risk of compatibility bugs when introducing new options.  Each option
      set is moved into a new structure that can be extended with more
      non-boolean options (this was already the case of VACUUM).  The code of
      REINDEX is restructured so as manual REINDEX commands go through a
      single routine from utility.c, like VACUUM, to ease the allocation
      handling of option parameters when a command needs to go through
      multiple transactions.
      
      This can be used as a base infrastructure for future patches related to
      those commands, including reindex filtering and tablespace support.
      
      Per discussion with people mentioned below, as well as Alvaro Herrera
      and Peter Eisentraut.
      
      Author: Michael Paquier, Justin Pryzby
      Reviewed-by: Alexey Kondratov, Justin Pryzby
      Discussion: https://postgr.es/m/X8riynBLwxAD9uKk@paquier.xyz
      a3dc9260
  2. 17 Jan, 2021 7 commits
  3. 16 Jan, 2021 5 commits
  4. 15 Jan, 2021 7 commits
  5. 14 Jan, 2021 7 commits
    • Tom Lane's avatar
      pg_dump: label PUBLICATION TABLE ArchiveEntries with an owner. · 8e396a77
      Tom Lane authored
      This is the same fix as commit 9eabfe30 applied to INDEX ATTACH
      entries, but for table-to-publication attachments.  As in that
      case, even though the backend doesn't record "ownership" of the
      attachment, we still ought to label it in the dump archive with
      the role name that should run the ALTER PUBLICATION command.
      The existing behavior causes the ALTER to be done by the original
      role that started the restore; that will usually work fine, but
      there may be corner cases where it fails.
      
      The bulk of the patch is concerned with changing struct
      PublicationRelInfo to include a pointer to the associated
      PublicationInfo object, so that we can get the owner's name
      out of that when the time comes.  While at it, I rewrote
      getPublicationTables() to do just one query of pg_publication_rel,
      not one per table.
      
      Back-patch to v10 where this code was introduced.
      
      Discussion: https://postgr.es/m/1165710.1610473242@sss.pgh.pa.us
      8e396a77
    • Alvaro Herrera's avatar
      Prevent drop of tablespaces used by partitioned relations · ebfe2dbd
      Alvaro Herrera authored
      When a tablespace is used in a partitioned relation (per commits
      ca410302 in pg12 for tables and 33e6c34c3267 in pg11 for indexes),
      it is possible to drop the tablespace, potentially causing various
      problems.  One such was reported in bug #16577, where a rewriting ALTER
      TABLE causes a server crash.
      
      Protect against this by using pg_shdepend to keep track of tablespaces
      when used for relations that don't keep physical files; we now abort a
      tablespace if we see that the tablespace is referenced from any
      partitioned relations.
      
      Backpatch this to 11, where this problem has been latent all along.  We
      don't try to create pg_shdepend entries for existing partitioned
      indexes/tables, but any ones that are modified going forward will be
      protected.
      
      Note slight behavior change: when trying to drop a tablespace that
      contains both regular tables as well as partitioned ones, you'd
      previously get ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE and now you'll
      get ERRCODE_DEPENDENT_OBJECTS_STILL_EXIST.  Arguably, the latter is more
      correct.
      
      It is possible to add protecting pg_shdepend entries for existing
      tables/indexes, by doing
        ALTER TABLE ONLY some_partitioned_table SET TABLESPACE pg_default;
        ALTER TABLE ONLY some_partitioned_table SET TABLESPACE original_tablespace;
      for each partitioned table/index that is not in the database default
      tablespace.  Because these partitioned objects do not have storage, no
      file needs to be actually moved, so it shouldn't take more time than
      what's required to acquire locks.
      
      This query can be used to search for such relations:
      SELECT ... FROM pg_class WHERE relkind IN ('p', 'I') AND reltablespace <> 0
      Reported-by: default avatarAlexander Lakhin <exclusion@gmail.com>
      Discussion: https://postgr.es/m/16577-881633a9f9894fd5@postgresql.org
      Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
      Reviewed-by: default avatarMichael Paquier <michael@paquier.xyz>
      ebfe2dbd
    • Fujii Masao's avatar
      Stabilize timeline switch regression test. · 424d7a9b
      Fujii Masao authored
      Commit fef5b47f added the regression test to check whether a standby is
      able to follow a primary on a newer timeline when WAL archiving is enabled.
      But the buildfarm member florican reported that this test failed because
      the requested WAL segment was removed and replication failed. This is a
      timing issue. Since neither replication slot is used nor wal_keep_size is set
      in the test, checkpoint could remove the WAL segment that's still necessary
      for replication.
      
      This commit stabilizes the test by setting wal_keep_size.
      
      Back-patch to v13 where the regression test that this commit stabilizes
      was added.
      
      Author: Fujii Masao
      Discussion: https://postgr.es/m/X//PsenxcC50jDzX@paquier.xyz
      424d7a9b
    • Fujii Masao's avatar
      Improve tab-completion for CLOSE, DECLARE, FETCH and MOVE. · 3f238b88
      Fujii Masao authored
      This commit makes CLOSE, FETCH and MOVE commands tab-complete the list of
      cursors. Also this commit makes DECLARE command tab-complete the options.
      
      Author: Shinya Kato, Sawada Masahiko, tweaked by Fujii Masao
      Reviewed-by: Shinya Kato, Sawada Masahiko, Fujii Masao
      Discussion: https://postgr.es/m/b0e4c5c53ef84c5395524f5056fc71f0@MP-MSGSS-MBX001.msg.nttdata.co.jp
      3f238b88
    • Thomas Munro's avatar
      Minor header cleanup for the new iovec code. · fb29ab26
      Thomas Munro authored
      Remove redundant function declaration and improve header comment in
      pg_iovec.h.  Move the new declaration in fd.h next to a group of more
      similar functions.
      fb29ab26
    • Fujii Masao's avatar
      Ensure that a standby is able to follow a primary on a newer timeline. · fef5b47f
      Fujii Masao authored
      Commit 709d003f refactored WAL-reading code, but accidentally caused
      WalSndSegmentOpen() to fail to follow a timeline switch while reading from
      a historic timeline. This issue caused a standby to fail to follow a primary
      on a newer timeline when WAL archiving is enabled.
      
      If there is a timeline switch within the segment, WalSndSegmentOpen() should
      read from the WAL segment belonging to the new timeline. But previously
      since it failed to follow a timeline switch, it tried to read the WAL segment
      with old timeline. When WAL archiving is enabled, that WAL segment with
      old timeline doesn't exist because it's renamed to .partial. This leads
      a primary to have tried to read non-existent WAL segment, and which caused
      replication to faill with the error "ERROR:  requested WAL segment ... has
       already been removed".
      
      This commit fixes WalSndSegmentOpen() so that it's able to follow a timeline
      switch, to ensure that a standby is able to follow a primary on a newer
      timeline even when WAL archiving is enabled.
      
      This commit also adds the regression test to check whether a standby is
      able to follow a primary on a newer timeline when WAL archiving is enabled.
      
      Back-patch to v13 where the bug was introduced.
      
      Reported-by: Kyotaro Horiguchi
      Author: Kyotaro Horiguchi, tweaked by Fujii Masao
      Reviewed-by:  Alvaro Herrera, Fujii Masao
      Discussion: https://postgr.es/m/20201209.174314.282492377848029776.horikyota.ntt@gmail.com
      fef5b47f
    • Michael Paquier's avatar
      Rework refactoring of hex and encoding routines · aef8948f
      Michael Paquier authored
      This commit addresses some issues with c3826f83 that moved the hex
      decoding routine to src/common/:
      - The decoding function lacked overflow checks, so when used for
      security-related features it was an open door to out-of-bound writes if
      not carefully used that could remain undetected.  Like the base64
      routines already in src/common/ used by SCRAM, this routine is reworked
      to check for overflows by having the size of the destination buffer
      passed as argument, with overflows checked before doing any writes.
      - The encoding routine was missing.  This is moved to src/common/ and
      it gains the same overflow checks as the decoding part.
      
      On failure, the hex routines of src/common/ issue an error as per the
      discussion done to make them usable by frontend tools, but not by shared
      libraries.  Note that this is why ECPG is left out of this commit, and
      it still includes a duplicated logic doing hex encoding and decoding.
      
      While on it, this commit uses better variable names for the source and
      destination buffers in the existing escape and base64 routines in
      encode.c and it makes them more robust to overflow detection.  The
      previous core code issued a FATAL after doing out-of-bound writes if
      going through the SQL functions, which would be enough to detect
      problems when working on changes that impacted this area of the
      code.  Instead, an error is issued before doing an out-of-bound write.
      The hex routines were being directly called for bytea conversions and
      backup manifests without such sanity checks.  The current calls happen
      to not have any problems, but careless uses of such APIs could easily
      lead to CVE-class bugs.
      
      Author: Bruce Momjian, Michael Paquier
      Reviewed-by: Sehrope Sarkuni
      Discussion: https://postgr.es/m/20201231003557.GB22199@momjian.us
      aef8948f
  6. 13 Jan, 2021 11 commits
    • Thomas Munro's avatar
      Move our p{read,write}v replacements into their own files. · 0d56acfb
      Thomas Munro authored
      macOS's ranlib issued a warning about an empty pread.o file with the
      previous arrangement, on systems new enough to require no replacement
      functions.  Let's go back to using configure's AC_REPLACE_FUNCS system
      to build and include each .o in the library only if it's needed, which
      requires moving the *v() functions to their own files.
      
      Also move the _with_retry() wrapper to a more permanent home.
      Reported-by: default avatarTom Lane <tgl@sss.pgh.pa.us>
      Discussion: https://postgr.es/m/1283127.1610554395%40sss.pgh.pa.us
      0d56acfb
    • Tom Lane's avatar
      Mark inet_server_addr() and inet_server_port() as parallel-restricted. · 5a6f9bce
      Tom Lane authored
      These need to be PR because they access the MyProcPort data structure,
      which doesn't get copied to parallel workers.  The very similar
      functions inet_client_addr() and inet_client_port() are already
      marked PR, but somebody missed these.
      
      Although this is a pre-existing bug, we can't readily fix it in the back
      branches since we can't force initdb.  Given the small usage of these
      two functions, and the even smaller likelihood that they'd get pushed to
      a parallel worker anyway, it doesn't seem worth the trouble to suggest
      that DBAs should fix it manually.
      
      Masahiko Sawada
      
      Discussion: https://postgr.es/m/CAD21AoAT4aHP0Uxq91qpD7NL009tnUYQe-b14R3MnSVOjtE71g@mail.gmail.com
      5a6f9bce
    • Tom Lane's avatar
      Run reformat-dat-files to declutter the catalog data files. · 8b411b8f
      Tom Lane authored
      Things had gotten pretty messy here, apparently mostly but not
      entirely the fault of the multirange patch.  No functional changes.
      8b411b8f
    • Tom Lane's avatar
      Doc, more or less: uncomment tutorial example that was fixed long ago. · dce62490
      Tom Lane authored
      Reverts a portion of commit 344190b7.  Apparently, back in the
      twentieth century we had some issues with multi-statement SQL
      functions, but they've worked fine for a long time.
      
      Daniel Westermann
      
      Discussion: https://postgr.es/m/GVAP278MB04242DCBF5E31F528D53FA18D2A90@GVAP278MB0424.CHEP278.PROD.OUTLOOK.COM
      dce62490
    • Alvaro Herrera's avatar
      Call out vacuum considerations in create index docs · 93c39f98
      Alvaro Herrera authored
      Backpatch to pg12, which is as far as it goes without conflicts.
      
      Author: James Coleman <jtc331@gmail.com>
      Reviewed-by: default avatar"David G. Johnston" <david.g.johnston@gmail.com>
      Discussion: https://postgr.es/m/CAAaqYe9oEfbz7AxXq7OX+FFVi5w5p1e_Of8ON8ZnKO9QqBfmjg@mail.gmail.com
      93c39f98
    • Tom Lane's avatar
      Disallow a digit as the first character of a variable name in pgbench. · c21ea4d5
      Tom Lane authored
      The point of this restriction is to avoid trying to substitute variables
      into timestamp literal values, which may contain strings like '12:34'.
      
      There is a good deal more that should be done to reduce pgbench's
      tendency to substitute where it shouldn't.  But this is sufficient to
      solve the case complained of by Jaime Soler, and it's simple enough
      to back-patch.
      
      Back-patch to v11; before commit 9d36a386, pgbench had a slightly
      different definition of what a variable name is, and anyway it seems
      unwise to change long-stable branches for this.
      
      Fabien Coelho
      
      Discussion: https://postgr.es/m/alpine.DEB.2.22.394.2006291740420.805678@pseudo
      c21ea4d5
    • Heikki Linnakangas's avatar
      Fix test failure with wal_level=minimal. · 5abca4b1
      Heikki Linnakangas authored
      The newly-added gist pageinspect test prints the LSNs of GiST pages,
      expecting them all to be 1 (GistBuildLSN). But with wal_level=minimal,
      they got updated by the whole-relation WAL-logging at commit. Fix by
      wrapping the problematic tests in the same transaction with the CREATE
      INDEX.
      
      Per buildfarm failure on thorntail.
      
      Discussion: https://www.postgresql.org/message-id/3B4F97E5-40FB-4142-8CAA-B301CDFBF982%40iki.fi
      5abca4b1
    • Tom Lane's avatar
      Doc: clarify behavior of back-half options in pg_dump. · 06ed235a
      Tom Lane authored
      Options that change how the archive data is converted to SQL text
      are ignored when dumping to archive formats.  The documentation
      previously said "not meaningful", which is not helpful.
      
      Discussion: https://postgr.es/m/161052021249.12228.9598689907884726185@wrigleys.postgresql.org
      06ed235a
    • Peter Geoghegan's avatar
      Enhance nbtree index tuple deletion. · d168b666
      Peter Geoghegan authored
      Teach nbtree and heapam to cooperate in order to eagerly remove
      duplicate tuples representing dead MVCC versions.  This is "bottom-up
      deletion".  Each bottom-up deletion pass is triggered lazily in response
      to a flood of versions on an nbtree leaf page.  This usually involves a
      "logically unchanged index" hint (these are produced by the executor
      mechanism added by commit 9dc718bd).
      
      The immediate goal of bottom-up index deletion is to avoid "unnecessary"
      page splits caused entirely by version duplicates.  It naturally has an
      even more useful effect, though: it acts as a backstop against
      accumulating an excessive number of index tuple versions for any given
      _logical row_.  Bottom-up index deletion complements what we might now
      call "top-down index deletion": index vacuuming performed by VACUUM.
      Bottom-up index deletion responds to the immediate local needs of
      queries, while leaving it up to autovacuum to perform infrequent clean
      sweeps of the index.  The overall effect is to avoid certain
      pathological performance issues related to "version churn" from UPDATEs.
      
      The previous tableam interface used by index AMs to perform tuple
      deletion (the table_compute_xid_horizon_for_tuples() function) has been
      replaced with a new interface that supports certain new requirements.
      Many (perhaps all) of the capabilities added to nbtree by this commit
      could also be extended to other index AMs.  That is left as work for a
      later commit.
      
      Extend deletion of LP_DEAD-marked index tuples in nbtree by adding logic
      to consider extra index tuples (that are not LP_DEAD-marked) for
      deletion in passing.  This increases the number of index tuples deleted
      significantly in many cases.  The LP_DEAD deletion process (which is now
      called "simple deletion" to clearly distinguish it from bottom-up
      deletion) won't usually need to visit any extra table blocks to check
      these extra tuples.  We have to visit the same table blocks anyway to
      generate a latestRemovedXid value (at least in the common case where the
      index deletion operation's WAL record needs such a value).
      
      Testing has shown that the "extra tuples" simple deletion enhancement
      increases the number of index tuples deleted with almost any workload
      that has LP_DEAD bits set in leaf pages.  That is, it almost never fails
      to delete at least a few extra index tuples.  It helps most of all in
      cases that happen to naturally have a lot of delete-safe tuples.  It's
      not uncommon for an individual deletion operation to end up deleting an
      order of magnitude more index tuples compared to the old naive approach
      (e.g., custom instrumentation of the patch shows that this happens
      fairly often when the regression tests are run).
      
      Add a further enhancement that augments simple deletion and bottom-up
      deletion in indexes that make use of deduplication: Teach nbtree's
      _bt_delitems_delete() function to support granular TID deletion in
      posting list tuples.  It is now possible to delete individual TIDs from
      posting list tuples provided the TIDs have a tableam block number of a
      table block that gets visited as part of the deletion process (visiting
      the table block can be triggered directly or indirectly).  Setting the
      LP_DEAD bit of a posting list tuple is still an all-or-nothing thing,
      but that matters much less now that deletion only needs to start out
      with the right _general_ idea about which index tuples are deletable.
      
      Bump XLOG_PAGE_MAGIC because xl_btree_delete changed.
      
      No bump in BTREE_VERSION, since there are no changes to the on-disk
      representation of nbtree indexes.  Indexes built on PostgreSQL 12 or
      PostgreSQL 13 will automatically benefit from bottom-up index deletion
      (i.e. no reindexing required) following a pg_upgrade.  The enhancement
      to simple deletion is available with all B-Tree indexes following a
      pg_upgrade, no matter what PostgreSQL version the user upgrades from.
      
      Author: Peter Geoghegan <pg@bowt.ie>
      Reviewed-By: default avatarHeikki Linnakangas <hlinnaka@iki.fi>
      Reviewed-By: default avatarVictor Yegorov <vyegorov@gmail.com>
      Discussion: https://postgr.es/m/CAH2-Wzm+maE3apHB8NOtmM=p-DO65j2V5GzAWCOEEuy3JZgb2g@mail.gmail.com
      d168b666
    • Peter Geoghegan's avatar
      Pass down "logically unchanged index" hint. · 9dc718bd
      Peter Geoghegan authored
      Add an executor aminsert() hint mechanism that informs index AMs that
      the incoming index tuple (the tuple that accompanies the hint) is not
      being inserted by execution of an SQL statement that logically modifies
      any of the index's key columns.
      
      The hint is received by indexes when an UPDATE takes place that does not
      apply an optimization like heapam's HOT (though only for indexes where
      all key columns are logically unchanged).  Any index tuple that receives
      the hint on insert is expected to be a duplicate of at least one
      existing older version that is needed for the same logical row.  Related
      versions will typically be stored on the same index page, at least
      within index AMs that apply the hint.
      
      Recognizing the difference between MVCC version churn duplicates and
      true logical row duplicates at the index AM level can help with cleanup
      of garbage index tuples.  Cleanup can intelligently target tuples that
      are likely to be garbage, without wasting too many cycles on less
      promising tuples/pages (index pages with little or no version churn).
      
      This is infrastructure for an upcoming commit that will teach nbtree to
      perform bottom-up index deletion.  No index AM actually applies the hint
      just yet.
      
      Author: Peter Geoghegan <pg@bowt.ie>
      Reviewed-By: default avatarVictor Yegorov <vyegorov@gmail.com>
      Discussion: https://postgr.es/m/CAH2-Wz=CEKFa74EScx_hFVshCOn6AA5T-ajFASTdzipdkLTNQQ@mail.gmail.com
      9dc718bd
    • Fujii Masao's avatar
      Log long wait time on recovery conflict when it's resolved. · 39b03690
      Fujii Masao authored
      This is a follow-up of the work done in commit 0650ff23. This commit
      extends log_recovery_conflict_waits so that a log message is produced
      also when recovery conflict has already been resolved after deadlock_timeout
      passes, i.e., when the startup process finishes waiting for recovery
      conflict after deadlock_timeout. This is useful in investigating how long
      recovery conflicts prevented the recovery from applying WAL.
      
      Author: Fujii Masao
      Reviewed-by: Kyotaro Horiguchi, Bertrand Drouvot
      Discussion: https://postgr.es/m/9a60178c-a853-1440-2cdc-c3af916cff59@amazon.com
      39b03690