1. 07 Apr, 2021 19 commits
    • Peter Geoghegan's avatar
      Add wraparound failsafe to VACUUM. · 1e55e7d1
      Peter Geoghegan authored
      Add a failsafe mechanism that is triggered by VACUUM when it notices
      that the table's relfrozenxid and/or relminmxid are dangerously far in
      the past.  VACUUM checks the age of the table dynamically, at regular
      intervals.
      
      When the failsafe triggers, VACUUM takes extraordinary measures to
      finish as quickly as possible so that relfrozenxid and/or relminmxid can
      be advanced.  VACUUM will stop applying any cost-based delay that may be
      in effect.  VACUUM will also bypass any further index vacuuming and heap
      vacuuming -- it only completes whatever remaining pruning and freezing
      is required.  Bypassing index/heap vacuuming is enabled by commit
      8523492d, which made it possible to dynamically trigger the mechanism
      already used within VACUUM when it is run with INDEX_CLEANUP off.
      
      It is expected that the failsafe will almost always trigger within an
      autovacuum to prevent wraparound, long after the autovacuum began.
      However, the failsafe mechanism can trigger in any VACUUM operation.
      Even in a non-aggressive VACUUM, where we're likely to not advance
      relfrozenxid, it still seems like a good idea to finish off remaining
      pruning and freezing.   An aggressive/anti-wraparound VACUUM will be
      launched immediately afterwards.  Note that the anti-wraparound VACUUM
      that follows will itself trigger the failsafe, usually before it even
      begins its first (and only) pass over the heap.
      
      The failsafe is controlled by two new GUCs: vacuum_failsafe_age, and
      vacuum_multixact_failsafe_age.  There are no equivalent reloptions,
      since that isn't expected to be useful.  The GUCs have rather high
      defaults (both default to 1.6 billion), and are expected to generally
      only be used to make the failsafe trigger sooner/more frequently.
      
      Author: Masahiko Sawada <sawada.mshk@gmail.com>
      Author: Peter Geoghegan <pg@bowt.ie>
      Discussion: https://postgr.es/m/CAD21AoD0SkE11fMw4jD4RENAwBMcw1wasVnwpJVw3tVqPOQgAw@mail.gmail.com
      Discussion: https://postgr.es/m/CAH2-WzmgH3ySGYeC-m-eOBsa2=sDwa292-CFghV4rESYo39FsQ@mail.gmail.com
      1e55e7d1
    • Bruce Momjian's avatar
      Make use of in-core query id added by commit 5fd9dfa5 · 4f0b0966
      Bruce Momjian authored
      Use the in-core query id computation for pg_stat_activity,
      log_line_prefix, and EXPLAIN VERBOSE.
      
      Similar to other fields in pg_stat_activity, only the queryid from the
      top level statements are exposed, and if the backends status isn't
      active then the queryid from the last executed statements is displayed.
      
      Add a %Q placeholder to include the queryid in log_line_prefix, which
      will also only expose top level statements.
      
      For EXPLAIN VERBOSE, if a query identifier has been computed, either by
      enabling compute_query_id or using a third-party module, display it.
      
      Bump catalog version.
      
      Discussion: https://postgr.es/m/20210407125726.tkvjdbw76hxnpwfi@nol
      
      Author: Julien Rouhaud
      
      Reviewed-by: Alvaro Herrera, Nitin Jadhav, Zhihong Yu
      4f0b0966
    • Robert Haas's avatar
      amcheck: fix multiple problems with TOAST pointer validation · ec7ffb80
      Robert Haas authored
      First, don't perform database access while holding a buffer lock.
      When checking a heap, we can validate that TOAST pointers are sane by
      performing a scan on the TOAST index and looking up the chunks that
      correspond to each value ID that appears in a TOAST poiner in the main
      table. But, to do that while holding a buffer lock at least risks
      causing other backends to wait uninterruptibly, and probably can cause
      undetected and uninterruptible deadlocks.  So, instead, make a list of
      checks to perform while holding the lock, and then perform the checks
      after releasing it.
      
      Second, adjust things so that we don't try to follow TOAST pointers
      for tuples that are already eligible to be pruned. The TOAST tuples
      become eligible for pruning at the same time that the main tuple does,
      so trying to check them may lead to spurious reports of corruption,
      as observed in the buildfarm. The necessary infrastructure to decide
      whether or not the tuple being checked is prunable was added by
      commit 3b6c1259, but it wasn't
      actually used for its intended purpose prior to this patch.
      
      Mark Dilger, adjusted by me to avoid a memory leak.
      
      Discussion: http://postgr.es/m/AC5479E4-6321-473D-AC92-5EC36299FBC2@enterprisedb.com
      ec7ffb80
    • Bruce Momjian's avatar
      Move pg_stat_statements query jumbling to core. · 5fd9dfa5
      Bruce Momjian authored
      Add compute_query_id GUC to control whether a query identifier should be
      computed by the core (off by default).  It's thefore now possible to
      disable core queryid computation and use pg_stat_statements with a
      different algorithm to compute the query identifier by using a
      third-party module.
      
      To ensure that a single source of query identifier can be used and is
      well defined, modules that calculate a query identifier should throw an
      error if compute_query_id specified to compute a query id and if a query
      idenfitier was already calculated.
      
      Discussion: https://postgr.es/m/20210407125726.tkvjdbw76hxnpwfi@nol
      
      Author: Julien Rouhaud
      
      Reviewed-by: Alvaro Herrera, Nitin Jadhav, Zhihong Yu
      5fd9dfa5
    • Tom Lane's avatar
      Remove channel binding requirement from clientcert=verify-full test. · a282ee68
      Tom Lane authored
      This fails on older OpenSSL versions that lack channel binding
      support.  Since that feature is not essential to this test case,
      just remove it, instead of complicating matters.  Per buildfarm.
      
      Jacob Champion
      
      Discussion: https://postgr.es/m/fa8dbbb58c20b1d1adf0082769f80d5466eaf485.camel@vmware.com
      a282ee68
    • Tom Lane's avatar
    • Robert Haas's avatar
      amcheck: Remove duplicate XID/MXID bounds checks. · 4573f6a9
      Robert Haas authored
      Commit 3b6c1259 resulted in the same
      xmin and xmax bounds checking being performed in both check_tuple()
      and check_tuple_visibility(). Remove the duplication.
      
      While at it, adjust some code comments that were overlooked in that
      commit.
      
      Mark Dilger
      
      Discussion: http://postgr.es/m/AC5479E4-6321-473D-AC92-5EC36299FBC2@enterprisedb.com
      4573f6a9
    • Peter Geoghegan's avatar
      Truncate line pointer array during VACUUM. · 3c3b8a4b
      Peter Geoghegan authored
      Teach VACUUM to truncate the line pointer array of each heap page when a
      contiguous group of LP_UNUSED line pointers appear at the end of the
      array -- these unused and unreferenced items are excluded.  This process
      occurs during VACUUM's second pass over the heap, right after LP_DEAD
      line pointers on the page (those encountered/pruned during the first
      pass) are marked LP_UNUSED.
      
      Truncation avoids line pointer bloat with certain workloads,
      particularly those involving continual range DELETEs and bulk INSERTs
      against the same table.
      
      Also harden heapam code to check for an out-of-range page offset number
      in places where we weren't already doing so.
      
      Author: Matthias van de Meent <boekewurm+postgres@gmail.com>
      Author: Peter Geoghegan <pg@bowt.ie>
      Reviewed-By: default avatarMasahiko Sawada <sawada.mshk@gmail.com>
      Reviewed-By: default avatarPeter Geoghegan <pg@bowt.ie>
      Discussion: https://postgr.es/m/CAEze2WjgaQc55Y5f5CQd3L=eS5CZcff2Obxp=O6pto8-f0hC4w@mail.gmail.com
      Discussion: https://postgr.es/m/CAH2-Wzn6a64PJM1Ggzm=uvx2otsopJMhFQj_g1rAj4GWr3ZSzw@mail.gmail.com
      3c3b8a4b
    • Tom Lane's avatar
      Tighten up allowed names for custom GUC parameters. · 3db826bd
      Tom Lane authored
      Formerly we were pretty lax about what a custom GUC's name could
      be; so long as it had at least one dot in it, we'd take it.
      However, corner cases such as dashes or equal signs in the name
      would cause various bits of functionality to misbehave.  Rather
      than trying to make the world perfectly safe for that, let's
      just require that custom names look like "identifier.identifier",
      where "identifier" means something that scan.l would accept
      without double quotes.
      
      Along the way, this patch refactors things slightly in guc.c
      so that find_option() is responsible for reporting GUC-not-found
      cases, allowing removal of duplicative code from its callers.
      
      Per report from Hubert Depesz Lubaczewski.  No back-patch,
      since the consequences of the problem don't seem to warrant
      changing behavior in stable branches.
      
      Discussion: https://postgr.es/m/951335.1612910077@sss.pgh.pa.us
      3db826bd
    • Tomas Vondra's avatar
      Don't add non-existent pages to bitmap from BRIN · 23607a81
      Tomas Vondra authored
      The code in bringetbitmap() simply added the whole matching page range
      to the TID bitmap, as determined by pages_per_range, even if some of the
      pages were beyond the end of the heap. The query then might fail with
      an error like this:
      
        ERROR:  could not open file "base/20176/20228.2" (target block
                262144): previous segment is only 131021 blocks
      
      In this case, the relation has 262093 pages (131072 and 131021 pages),
      but we're trying to acess block 262144, i.e. first block of the 3rd
      segment. At that point _mdfd_getseg() notices the preceding segment is
      incomplete, and fails.
      
      Hitting this in practice is rather unlikely, because:
      
      * Most indexes use power-of-two ranges, so segments and page ranges
        align perfectly (segment end is also a page range end).
      
      * The table size has to be just right, with the last segment being
        almost full - less than one page range from full segment, so that the
        last page range actually crosses the segment boundary.
      
      * Prefetch has to be enabled. The regular page access checks that
        pages are not beyond heap end, but prefetch does not. On older
        releases (before 12) the execution stops after hitting the first
        non-existent page, so the prefetch distance has to be sufficient
        to reach the first page in the next segment to trigger the issue.
        Since 12 it's enough to just have prefetch enabled, the prefetch
        distance does not matter.
      
      Fixed by not adding non-existent pages to the TID bitmap. Backpatch
      all the way back to 9.6 (BRIN indexes were introduced in 9.5, but that
      release is EOL).
      
      Backpatch-through: 9.6
      23607a81
    • Peter Eisentraut's avatar
      libpq: Set Server Name Indication (SNI) for SSL connections · 5c55dc8b
      Peter Eisentraut authored
      By default, have libpq set the TLS extension "Server Name Indication" (SNI).
      
      This allows an SNI-aware SSL proxy to route connections.  (This
      requires a proxy that is aware of the PostgreSQL protocol, not just
      any SSL proxy.)
      
      In the future, this could also allow the server to use different SSL
      certificates for different host specifications.  (That would require
      new server functionality.  This would be the client-side functionality
      for that.)
      
      Since SNI makes the host name appear in cleartext in the network
      traffic, this might be undesirable in some cases.  Therefore, also add
      a libpq connection option "sslsni" to turn it off.
      
      Discussion: https://www.postgresql.org/message-id/flat/7289d5eb-62a5-a732-c3b9-438cee2cb709%40enterprisedb.com
      5c55dc8b
    • Magnus Hagander's avatar
      Refactor hba_authname · c1968426
      Magnus Hagander authored
      The previous implementation (from 9afffcb8) had an unnecessary check
      on the boundaries of the enum which trigtered compile warnings. To clean
      it up, move the pre-existing static assert to a central location and
      call that.
      
      Reported-By: Erik Rijkers
      Reviewed-By: Michael Paquier
      Discussion: https://postgr.es/m/1056399262.13159.1617793249020@webmailclassic.xs4all.nl
      c1968426
    • Peter Eisentraut's avatar
    • Heikki Linnakangas's avatar
      Revert "Add sortsupport for gist_btree opclasses, for faster index builds." · d92b1cdb
      Heikki Linnakangas authored
      This reverts commit 9f984ba6.
      
      It was making the buildfarm unhappy, apparently setting client_min_messages
      in a regression test produces different output if log_statement='all'.
      Another issue is that I now suspect the bit sortsupport function was in
      fact not correct to call byteacmp(). Revert to investigate both of those
      issues.
      d92b1cdb
    • Heikki Linnakangas's avatar
      Add sortsupport for gist_btree opclasses, for faster index builds. · 9f984ba6
      Heikki Linnakangas authored
      Commit 16fa9b2b introduced a faster way to build GiST indexes, by
      sorting all the data. This commit adds the sortsupport functions needed
      to make use of that feature for btree_gist.
      
      Author: Andrey Borodin
      Discussion: https://www.postgresql.org/message-id/2F3F7265-0D22-44DB-AD71-8554C743D943@yandex-team.ru
      9f984ba6
    • Peter Eisentraut's avatar
      Fix use of cursor sensitivity terminology · dd13ad9d
      Peter Eisentraut authored
      Documentation and comments in code and tests have been using the terms
      sensitive/insensitive cursor incorrectly relative to the SQL standard.
      (Cursor sensitivity is only relevant for changes made in the same
      transaction as the cursor, not for concurrent changes in other
      sessions.)  Moreover, some of the behavior of PostgreSQL is incorrect
      according to the SQL standard, confusing the issue further.  (WHERE
      CURRENT OF changes are not visible in insensitive cursors, but they
      should be.)
      
      This change corrects the terminology and removes the claim that
      sensitive cursors are supported.  It also adds a test case that checks
      the insensitive behavior in a "correct" way, using a change command
      not using WHERE CURRENT OF.  Finally, it adds the ASENSITIVE cursor
      option to select the default asensitive behavior, per SQL standard.
      
      There are no changes to cursor behavior in this patch.
      
      Discussion: https://www.postgresql.org/message-id/flat/96ee8b30-9889-9e1b-b053-90e10c050e85%40enterprisedb.com
      dd13ad9d
    • Peter Eisentraut's avatar
      Message improvement · 0b5e8245
      Peter Eisentraut authored
      The previous wording contained a superfluous comma.  Adjust phrasing
      for grammatical correctness and clarity.
      0b5e8245
    • Michael Paquier's avatar
      Remove redundant memset(0) calls for page init of some index AMs · 4c0239cb
      Michael Paquier authored
      Bloom, GIN, GiST and SP-GiST rely on PageInit() to initialize the
      contents of a page, and this routine fills entirely a page with zeros
      for a size of BLCKSZ, including the special space.  Those index AMs have
      been using an extra memset() call to fill with zeros the special page
      space, or even the whole page, which is not necessary as PageInit()
      already does this work, so let's remove them.  GiST was not doing this
      extra call, but has commented out a system call that did so since
      62369911.
      
      While on it, remove one MAXALIGN() for SP-GiST as PageInit() takes care
      of that.  This makes the whole page initialization logic more consistent
      across all index AMs.
      
      Author: Bharath Rupireddy
      Reviewed-by: Vignesh C, Mahendra Singh Thalor
      Discussion: https://postgr.es/m/CALj2ACViOo2qyaPT7krWm4LRyRTw9kOXt+g6PfNmYuGA=YHj9A@mail.gmail.com
      4c0239cb
    • Michael Paquier's avatar
      Add some information about authenticated identity via log_connections · 9afffcb8
      Michael Paquier authored
      The "authenticated identity" is the string used by an authentication
      method to identify a particular user.  In many common cases, this is the
      same as the PostgreSQL username, but for some third-party authentication
      methods, the identifier in use may be shortened or otherwise translated
      (e.g. through pg_ident user mappings) before the server stores it.
      
      To help administrators see who has actually interacted with the system,
      this commit adds the capability to store the original identity when
      authentication succeeds within the backend's Port, and generates a log
      entry when log_connections is enabled.  The log entries generated look
      something like this (where a local user named "foouser" is connecting to
      the database as the database user called "admin"):
      
        LOG:  connection received: host=[local]
        LOG:  connection authenticated: identity="foouser" method=peer (/data/pg_hba.conf:88)
        LOG:  connection authorized: user=admin database=postgres application_name=psql
      
      Port->authn_id is set according to the authentication method:
      
        bsd: the PostgreSQL username (aka the local username)
        cert: the client's Subject DN
        gss: the user principal
        ident: the remote username
        ldap: the final bind DN
        pam: the PostgreSQL username (aka PAM username)
        password (and all pw-challenge methods): the PostgreSQL username
        peer: the peer's pw_name
        radius: the PostgreSQL username (aka the RADIUS username)
        sspi: either the down-level (SAM-compatible) logon name, if
              compat_realm=1, or the User Principal Name if compat_realm=0
      
      The trust auth method does not set an authenticated identity.  Neither
      does clientcert=verify-full.
      
      Port->authn_id could be used for other purposes, like a superuser-only
      extra column in pg_stat_activity, but this is left as future work.
      
      PostgresNode::connect_{ok,fails}() have been modified to let tests check
      the backend log files for required or prohibited patterns, using the
      new log_like and log_unlike parameters.  This uses a method based on a
      truncation of the existing server log file, like issues_sql_like().
      Tests are added to the ldap, kerberos, authentication and SSL test
      suites.
      
      Author: Jacob Champion
      Reviewed-by: Stephen Frost, Magnus Hagander, Tom Lane, Michael Paquier
      Discussion: https://postgr.es/m/c55788dd1773c521c862e8e0dddb367df51222be.camel@vmware.com
      9afffcb8
  2. 06 Apr, 2021 21 commits
    • Fujii Masao's avatar
      Fix test added by commit 9de9294b. · 8ee9b662
      Fujii Masao authored
      The buildfarm members "drongo" and "fairywren" reported that
      the regression test (024_archive_recovery.pl) added by commit 9de9294b
      failed. The cause of this failure is that the test calls $node->init()
      without "allows_streaming => 1" and which doesn't add pg_hba.conf
      entry for TCP/IP connection from pg_basebackup.
      This commit fixes the issue by specifying "allows_streaming => 1"
      when calling $node->init().
      
      Author: Fujii Masao
      Discussion: https://postgr.es/m/3cc3909d-f779-7a74-c201-f1f7f62c7497@oss.nttdata.com
      8ee9b662
    • Tom Lane's avatar
      Postpone some more stuff out of ExecInitModifyTable. · a1115fa0
      Tom Lane authored
      Delay creation of the projections for INSERT and UPDATE tuples
      until they're needed.  This saves a pretty fair amount of work
      when only some of the partitions are actually touched.
      
      The logic associated with identifying junk columns in UPDATE/DELETE
      is moved to another loop, allowing removal of one loop over the
      target relations; but it didn't actually change at all.
      
      Extracted from a larger patch, which seemed to me to be too messy
      to push in one commit.
      
      Amit Langote, reviewed at different times by Heikki Linnakangas and
      myself
      
      Discussion: https://postgr.es/m/CA+HiwqG7ZruBmmih3wPsBZ4s0H2EhywrnXEduckY5Hr3fWzPWA@mail.gmail.com
      a1115fa0
    • David Rowley's avatar
      Fix compiler warning for MSVC in libpq_pipeline.c · 3b82d990
      David Rowley authored
      DEBUG was already defined by the MSVC toolchain for "Debug" builds. On
      these systems the unconditional #define DEBUG was causing a 'DEBUG': macro
      redefinition warning.
      
      Here we rename DEBUG to DEBUG_OUPUT and also get rid of the #define which
      defined this constant.  This appears to have been left in the code by
      mistake.
      
      Discussion: https://postgr.es/m/CAApHDvqTTgDm38s4HRj03nhzhzQ1oMOj-RXFUB1pE6Bj07jyuQ@mail.gmail.com
      3b82d990
    • Tom Lane's avatar
      Postpone some stuff out of ExecInitModifyTable. · c5b7ba4e
      Tom Lane authored
      Arrange to do some things on-demand, rather than immediately during
      executor startup, because there's a fair chance of never having to do
      them at all:
      
      * Don't open result relations' indexes until needed.
      
      * Don't initialize partition tuple routing, nor the child-to-root
      tuple conversion map, until needed.
      
      This wins in UPDATEs on partitioned tables when only some of the
      partitions will actually receive updates; with larger partition
      counts the savings is quite noticeable.  Also, we can remove some
      sketchy heuristics in ExecInitModifyTable about whether to set up
      tuple routing.
      
      Also, remove execPartition.c's private hash table tracking which
      partitions were already opened by the ModifyTable node.  Instead
      use the hash added to ModifyTable itself by commit 86dc9005.
      
      To allow lazy computation of the conversion maps, we now set
      ri_RootResultRelInfo in all child ResultRelInfos.  We formerly set it
      only in some, not terribly well-defined, cases.  This has user-visible
      side effects in that now more error messages refer to the root
      relation instead of some partition (and provide error data in the
      root's column order, too).  It looks to me like this is a strict
      improvement in consistency, so I don't have a problem with the
      output changes visible in this commit.
      
      Extracted from a larger patch, which seemed to me to be too messy
      to push in one commit.
      
      Amit Langote, reviewed at different times by Heikki Linnakangas and
      myself
      
      Discussion: https://postgr.es/m/CA+HiwqG7ZruBmmih3wPsBZ4s0H2EhywrnXEduckY5Hr3fWzPWA@mail.gmail.com
      c5b7ba4e
    • Fujii Masao's avatar
      postgres_fdw: Allow partitions specified in LIMIT TO to be imported. · a3740c48
      Fujii Masao authored
      Commit f49bcd4e disallowed postgres_fdw to import table partitions.
      Because all data can be accessed through the partitioned table which
      is the root of the partitioning hierarchy, importing only partitioned
      table should allow access to all the data without creating extra objects.
      This is a reasonable default when importing a whole schema. But there
      may be the case where users want to explicitly import one of
      a partitioned tables' partitions.
      
      For that use case, this commit allows postgres_fdw to import tables or
      foreign tables which are partitions of some other table only when they
      are explicitly specified in LIMIT TO clause.  It doesn't change
      the behavior that any partitions not specified in LIMIT TO are
      automatically excluded in IMPORT FOREIGN SCHEMA command.
      
      Author: Matthias van de Meent
      Reviewed-by: Bernd Helmle, Amit Langote, Michael Paquier, Fujii Masao
      Discussion: https://postgr.es/m/CAEze2Whwg4i=mzApMe+PXxCEfgoZmHGqdqQFW7J4bmj_5p6t1A@mail.gmail.com
      a3740c48
    • Andres Freund's avatar
      Increment xactCompletionCount during subtransaction abort. · 90c885cd
      Andres Freund authored
      Snapshot caching, introduced in 623a9ba7, did not increment
      xactCompletionCount during subtransaction abort. That could lead to an older
      snapshot being reused. That is, at least as far as I can see, not a
      correctness issue (for MVCC snapshots there's no difference between "in
      progress" and "aborted"). The only difference between the old and new
      snapshots would be a newer ->xmax.
      
      While HeapTupleSatisfiesMVCC makes the same visibility determination, reusing
      the old snapshot leads HeapTupleSatisfiesMVCC to not set
      HEAP_XMIN_INVALID. Which subsequently causes the kill_prior_tuple optimization
      to not kick in (via HeapTupleIsSurelyDead() returning false). The performance
      effects of doing the same index-lookups over and over again is how the issue
      was discovered...
      
      Fix the issue by incrementing xactCompletionCount in
      XidCacheRemoveRunningXids. It already acquires ProcArrayLock exclusively,
      making that an easy proposition.
      
      Add a test to ensure that kill_prior_tuple prevents index growth when it
      involves aborted subtransaction of the current transaction.
      
      Author: Andres Freund
      Discussion: https://postgr.es/m/20210406043521.lopeo7bbigad3n6t@alap3.anarazel.de
      Discussion: https://postgr.es/m/20210317055718.v6qs3ltzrformqoa%40alap3.anarazel.de
      90c885cd
    • Peter Geoghegan's avatar
      Remove tupgone special case from vacuumlazy.c. · 8523492d
      Peter Geoghegan authored
      Retry the call to heap_prune_page() in rare cases where there is
      disagreement between the heap_prune_page() call and the call to
      HeapTupleSatisfiesVacuum() that immediately follows.  Disagreement is
      possible when a concurrently-aborted transaction makes a tuple DEAD
      during the tiny window between each step.  This was the only case where
      a tuple considered DEAD by VACUUM still had storage following pruning.
      VACUUM's definition of dead tuples is now uniformly simple and
      unambiguous: dead tuples from each page are always LP_DEAD line pointers
      that were encountered just after we performed pruning (and just before
      we considered freezing remaining items with tuple storage).
      
      Eliminating the tupgone=true special case enables INDEX_CLEANUP=off
      style skipping of index vacuuming that takes place based on flexible,
      dynamic criteria.  The INDEX_CLEANUP=off case had to know about skipping
      indexes up-front before now, due to a subtle interaction with the
      special case (see commit dd695979) -- this was a special case unto
      itself.  Now there are no special cases.  And so now it won't matter
      when or how we decide to skip index vacuuming: it won't affect how
      pruning behaves, and it won't be affected by any of the implementation
      details of pruning or freezing.
      
      Also remove XLOG_HEAP2_CLEANUP_INFO records.  These are no longer
      necessary because we now rely entirely on heap pruning taking care of
      recovery conflicts.  There is no longer any need to generate recovery
      conflicts for DEAD tuples that pruning just missed.  This also means
      that heap vacuuming now uses exactly the same strategy for recovery
      conflicts as index vacuuming always has: REDO routines never need to
      process a latestRemovedXid from the WAL record, since earlier REDO of
      the WAL record from pruning is sufficient in all cases.  The generic
      XLOG_HEAP2_CLEAN record type is now split into two new record types to
      reflect this new division (these are called XLOG_HEAP2_PRUNE and
      XLOG_HEAP2_VACUUM).
      
      Also stop acquiring a super-exclusive lock for heap pages when they're
      vacuumed during VACUUM's second heap pass.  A regular exclusive lock is
      enough.  This is correct because heap page vacuuming is now strictly a
      matter of setting the LP_DEAD line pointers to LP_UNUSED.  No other
      backend can have a pointer to a tuple located in a pinned buffer that
      can be invalidated by a concurrent heap page vacuum operation.
      
      Heap vacuuming can now be thought of as conceptually similar to index
      vacuuming and conceptually dissimilar to heap pruning.  Heap pruning now
      has sole responsibility for anything involving the logical contents of
      the database (e.g., managing transaction status information, recovery
      conflicts, considering what to do with HOT chains).  Index vacuuming and
      heap vacuuming are now only concerned with recycling garbage items from
      physical data structures that back the logical database.
      
      Bump XLOG_PAGE_MAGIC due to pruning and heap page vacuum WAL record
      changes.
      
      Credit for the idea of retrying pruning a page to avoid the tupgone case
      goes to Andres Freund.
      
      Author: Peter Geoghegan <pg@bowt.ie>
      Reviewed-By: default avatarAndres Freund <andres@anarazel.de>
      Reviewed-By: default avatarMasahiko Sawada <sawada.mshk@gmail.com>
      Discussion: https://postgr.es/m/CAH2-WznneCXTzuFmcwx_EyRQgfsfJAAsu+CsqRFmFXCAar=nJw@mail.gmail.com
      8523492d
    • Tom Lane's avatar
      Fix missing #include in nodeResultCache.h. · 789d81de
      Tom Lane authored
      Per cpluspluscheck.
      789d81de
    • Peter Eisentraut's avatar
      psql: Show all query results by default · 3a513067
      Peter Eisentraut authored
      Previously, psql printed only the last result if a command string
      returned multiple result sets.  Now it prints all of them.  The
      previous behavior can be obtained by setting the psql variable
      SHOW_ALL_RESULTS to off.
      
      Author: Fabien COELHO <coelho@cri.ensmp.fr>
      Reviewed-by: default avatar"Iwata, Aya" <iwata.aya@jp.fujitsu.com>
      Reviewed-by: default avatarDaniel Verite <daniel@manitou-mail.org>
      Reviewed-by: default avatarPeter Eisentraut <peter.eisentraut@2ndquadrant.com>
      Reviewed-by: default avatarKyotaro Horiguchi <horikyota.ntt@gmail.com>
      Reviewed-by: default avatarvignesh C <vignesh21@gmail.com>
      Discussion: https://www.postgresql.org/message-id/flat/alpine.DEB.2.21.1904132231510.8961@lancre
      3a513067
    • Tomas Vondra's avatar
      Fix handling of clauses incompatible with extended statistics · 518442c7
      Tomas Vondra authored
      Handling of incompatible clauses while applying extended statistics was
      a bit confused - while handling a mix of compatible and incompatible
      clauses it sometimes incorrectly treated the incompatible clauses as
      compatible, resulting in a crash.
      
      Fixed by reworking the code applying the selected statistics object to
      make it easier to understand, and adding a proper compatibility check.
      
      Reported-by: David Rowley
      Discussion: https://postgr.es/m/CAApHDvpYT10-nkSp8xXe-nbO3jmoaRyRFHbzh-RWMfAJynqgpQ%40mail.gmail.com
      518442c7
    • Peter Geoghegan's avatar
      Refactor lazy_scan_heap() loop. · 7ab96cf6
      Peter Geoghegan authored
      Add a lazy_scan_heap() subsidiary function that handles heap pruning and
      tuple freezing: lazy_scan_prune().  This is a great deal cleaner.  The
      code that remains in lazy_scan_heap()'s per-block loop can now be
      thought of as code that either comes before or after the call to
      lazy_scan_prune(), which is now the clear focal point.  This division is
      enforced by the way in which we now manage state.  lazy_scan_prune()
      outputs state (using its own struct) that describes what to do with the
      page following pruning and freezing (e.g., visibility map maintenance,
      recording free space in the FSM).  It doesn't get passed any special
      instructional state from the preamble code, though.
      
      Also cleanly separate the logic used by a VACUUM with INDEX_CLEANUP=off
      from the logic used by single-heap-pass VACUUMs.  The former case is now
      structured as the omission of index and heap vacuuming by a two pass
      VACUUM.  The latter case goes back to being used only when the table
      happens to have no indexes (just as it was before commit a96c41fe).
      This structure is much more natural, since the whole point of
      INDEX_CLEANUP=off is to skip the index and heap vacuuming that would
      otherwise take place.  The single-heap-pass case doesn't skip any useful
      work, though -- it just does heap pruning and heap vacuuming together
      when the table happens to have no indexes.
      
      Both of these changes are preparation for an upcoming patch that
      generalizes the mechanism used by INDEX_CLEANUP=off.  The later patch
      will allow VACUUM to give up on index and heap vacuuming dynamically, as
      problems emerge (e.g., with wraparound), so that an affected VACUUM
      operation can finish up as soon as possible.
      
      Also fix a very old bug in single-pass VACUUM VERBOSE output.  We were
      reporting the number of tuples deleted via pruning as a direct
      substitute for reporting the number of LP_DEAD items removed in a
      function that deals with the second pass over the heap.  But that
      doesn't work at all -- they're two different things.
      
      To fix, start tracking the total number of LP_DEAD items encountered
      during pruning, and use that in the report instead.  A single pass
      VACUUM will always vacuum away whatever LP_DEAD items a heap page has
      immediately after it is pruned, so the total number of LP_DEAD items
      encountered during pruning equals the total number vacuumed-away.
      (They are _not_ equal in the INDEX_CLEANUP=off case, but that's okay
      because skipping index vacuuming is now a totally orthogonal concept to
      one-pass VACUUM.)
      
      Also stop reporting the count of LP_UNUSED items in VACUUM VERBOSE
      output.  This makes the output of VACUUM VERBOSE more consistent with
      log_autovacuum's output (because it never showed information about
      LP_UNUSED items).  VACUUM VERBOSE reported LP_UNUSED items left behind
      by the last VACUUM, and LP_UNUSED items created via pruning HOT chains
      during the current VACUUM (it never included LP_UNUSED items left behind
      by the current VACUUM's second pass over the heap).  This makes it
      useless as an indicator of line pointer bloat, which must have been the
      original intention. (Like the first VACUUM VERBOSE issue, this issue was
      arguably an oversight in commit 282d2a03, which added the heap-only
      tuple optimization.)
      
      Finally, stop reporting empty_pages in VACUUM VERBOSE output, and start
      reporting pages_removed instead.  This also makes the output of VACUUM
      VERBOSE more consistent with log_autovacuum's output (which does not
      show empty_pages, but does show pages_removed).  An empty page isn't
      meaningfully different to a page that is almost empty, or a page that is
      empty but for only a small number of remaining LP_UNUSED items.
      
      Author: Peter Geoghegan <pg@bowt.ie>
      Reviewed-By: default avatarRobert Haas <robertmhaas@gmail.com>
      Reviewed-By: default avatarMasahiko Sawada <sawada.mshk@gmail.com>
      Discussion: https://postgr.es/m/CAH2-WznneCXTzuFmcwx_EyRQgfsfJAAsu+CsqRFmFXCAar=nJw@mail.gmail.com
      7ab96cf6
    • Tom Lane's avatar
      Clean up treatment of missing default and CHECK-constraint records. · 091e22b2
      Tom Lane authored
      Andrew Gierth reported that it's possible to crash the backend if no
      pg_attrdef record is found to match an attribute that has atthasdef set.
      AttrDefaultFetch warns about this situation, but then leaves behind
      a relation tupdesc that has null "adbin" pointer(s), which most places
      don't guard against.
      
      We considered promoting the warning to an error, but throwing errors
      during relcache load is pretty drastic: it effectively locks one out
      of using the relation at all.  What seems better is to leave the
      load-time behavior as a warning, but then throw an error in any code
      path that wants to use a default and can't find it.  This confines
      the error to a subset of INSERT/UPDATE operations on the table, and
      in particular will at least allow a pg_dump to succeed.
      
      Also, we should fix AttrDefaultFetch to not leave any null pointers
      in the tupdesc, because that just creates an untested bug hazard.
      
      While at it, apply the same philosophy of "warn at load, throw error
      only upon use of the known-missing info" to CHECK constraints.
      CheckConstraintFetch is very nearly the same logic as AttrDefaultFetch,
      but for reasons lost in the mists of time, it was throwing ERROR for
      the same cases that AttrDefaultFetch treats as WARNING.  Make the two
      functions more nearly alike.
      
      In passing, get rid of potentially-O(N^2) loops in equalTupleDesc
      by making AttrDefaultFetch sort the entries after fetching them,
      so that equalTupleDesc can assume that entries in two equal tupdescs
      must be in matching order.  (CheckConstraintFetch already was sorting
      CHECK constraints, but equalTupleDesc hadn't been told about it.)
      
      There's some argument for back-patching this, but with such a small
      number of field reports, I'm content to fix it in HEAD.
      
      Discussion: https://postgr.es/m/87pmzaq4gx.fsf@news-spur.riddles.org.uk
      091e22b2
    • Fujii Masao's avatar
      Stop archive recovery if WAL generated with wal_level=minimal is found. · 9de9294b
      Fujii Masao authored
      Previously if hot standby was enabled, archive recovery exited with
      an error when it found WAL generated with wal_level=minimal.
      But if hot standby was disabled, it just reported a warning and
      continued in that case. Which could lead to data loss or errors
      during normal operation. A warning was emitted, but users could
      easily miss that and not notice this serious situation until
      they encountered the actual errors.
      
      To improve this situation, this commit changes archive recovery
      so that it exits with FATAL error when it finds WAL generated with
      wal_level=minimal whatever the setting of hot standby. This enables
      users to notice the serious situation soon.
      
      The FATAL error is thrown if archive recovery starts from a base
      backup taken before wal_level is changed to minimal. When archive
      recovery exits with the error, if users have a base backup taken
      after setting wal_level to higher than minimal, they can recover
      the database by starting archive recovery from that newer backup.
      But note that if such backup doesn't exist, there is no easy way to
      complete archive recovery, which may make the database server
      unstartable and users may lose whole database. The commit adds
      the note about this risk into the document.
      
      Even in the case of unstartable database server, previously by just
      disabling hot standby users could avoid the error during archive
      recovery, forcibly start up the server and salvage data from it.
      But note that this commit makes this procedure unavailable at all.
      
      Author: Takamichi Osumi
      Reviewed-by: Laurenz Albe, Kyotaro Horiguchi, David Steele, Fujii Masao
      Discussion: https://postgr.es/m/OSBPR01MB4888CBE1DA08818FD2D90ED8EDF90@OSBPR01MB4888.jpnprd01.prod.outlook.com
      9de9294b
    • Heikki Linnakangas's avatar
      Mark test_enc_conversion() as STRICT. · c4c393b3
      Heikki Linnakangas authored
      Reported-by: Jaime Casanova, using SQLsmith
      Discussion: https://www.postgresql.org/message-id/20210402235337.GA4082@ahch-to
      c4c393b3
    • Dean Rasheed's avatar
      pgbench: Function to generate random permutations. · 6b258e3d
      Dean Rasheed authored
      This adds a new function, permute(), that generates pseudorandom
      permutations of arbitrary sizes. This can be used to randomly shuffle
      a set of values to remove unwanted correlations. For example,
      permuting the output from a non-uniform random distribution so that
      all the most common values aren't collocated, allowing more realistic
      tests to be performed.
      
      Formerly, hash() was recommended for this purpose, but that suffers
      from collisions that might alter the distribution, so recommend
      permute() for this purpose instead.
      
      Fabien Coelho and Hironobu Suzuki, with additional hacking be me.
      Reviewed by Thomas Munro, Alvaro Herrera and Muhammad Usama.
      
      Discussion: https://postgr.es/m/alpine.DEB.2.21.1807280944370.5142@lancre
      6b258e3d
    • Etsuro Fujita's avatar
      Adjust input value to WaitEventSetWait() in ExecAppendAsyncEventWait(). · a8af856d
      Etsuro Fujita authored
      Adjust the number of events given to WaitEventSetWait() so that it
      doesn't exceed the maximum number of events in the WaitEventSet given
      to that function (set->nevents_space) in hopes of making the buildfarm
      green.
      
      Per valgrind failure report from Tom Lane and the buildfarm.
      
      Author: Etsuro Fujita
      Discussion: https://postgr.es/m/3411577.1617289776%40sss.pgh.pa.us
      a8af856d
    • Peter Eisentraut's avatar
      ALTER SUBSCRIPTION ... ADD/DROP PUBLICATION · 82ed7748
      Peter Eisentraut authored
      At present, if we want to update publications in a subscription, we
      can use SET PUBLICATION.  However, it requires supplying all
      publications that exists and the new publications.  If we want to add
      new publications, it's inconvenient.  The new syntax only supplies the
      new publications.  When the refresh is true, it only refreshes the new
      publications.
      
      Author: Japin Li <japinli@hotmail.com>
      Author: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
      Discussion: https://www.postgresql.org/message-id/flat/MEYP282MB166939D0D6C480B7FBE7EFFBB6BC0@MEYP282MB1669.AUSP282.PROD.OUTLOOK.COM
      82ed7748
    • Amit Kapila's avatar
      Fix the tests added by commit ac4645c0. · 266b5673
      Amit Kapila authored
      In the tests, after disabling the subscription, we were not waiting for
      the replication connection to drop from the publisher. So when the test
      was trying to use the same slot to fetch the messages via SQL API, it
      sometimes gives an error that the replication slot is active for other
      PID.
      
      Per buildfarm.
      266b5673
    • David Rowley's avatar
      Fix compiler warning in fe-trace.c for MSVC · 9bc9b460
      David Rowley authored
      It seems that in MSVC timeval's tv_sec field is of type long.
      localtime() takes a time_t pointer.  Since long is 32-bit even on 64-bit
      builds in MSVC, passing a long pointer instead of the correct time_t
      pointer generated a compiler warning.  Fix that.
      
      Reviewed-by: Tom Lane
      Discussion: https://postgr.es/m/CAApHDvoRG25X_=ZCGSPb4KN_j2iu=G2uXsRSg8NBZeuhkOSETg@mail.gmail.com
      9bc9b460
    • Peter Eisentraut's avatar
      Change return type of EXTRACT to numeric · a2da77cd
      Peter Eisentraut authored
      The previous implementation of EXTRACT mapped internally to
      date_part(), which returned type double precision (since it was
      implemented long before the numeric type existed).  This can lead to
      imprecise output in some cases, so returning numeric would be
      preferrable.  Changing the return type of an existing function is a
      bit risky, so instead we do the following:  We implement a new set of
      functions, which are now called "extract", in parallel to the existing
      date_part functions.  They work the same way internally but use
      numeric instead of float8.  The EXTRACT construct is now mapped by the
      parser to these new extract functions.  That way, dumps of views
      etc. from old versions (which would use date_part) continue to work
      unchanged, but new uses will map to the new extract functions.
      
      Additionally, the reverse compilation of EXTRACT now reproduces the
      original syntax, using the new mechanism introduced in
      40c24bfe.
      
      The following minor changes of behavior result from the new
      implementation:
      
      - The column name from an isolated EXTRACT call is now "extract"
        instead of "date_part".
      
      - Extract from date now rejects inappropriate field names such as
        HOUR.  It was previously mapped internally to extract from
        timestamp, so it would silently accept everything appropriate for
        timestamp.
      
      - Return values when extracting fields with possibly fractional
        values, such as second and epoch, now have the full scale that the
        value has internally (so, for example, '1.000000' instead of just
        '1').
      Reported-by: default avatarPetr Fedorov <petr.fedorov@phystech.edu>
      Reviewed-by: default avatarTom Lane <tgl@sss.pgh.pa.us>
      Discussion: https://www.postgresql.org/message-id/flat/42b73d2d-da12-ba9f-570a-420e0cce19d9@phystech.edu
      a2da77cd
    • Fujii Masao's avatar
      Fix typo in pgstat.c. · f5d94e40
      Fujii Masao authored
      Introduced by 98681675.
      
      Author: Vignesh C
      Discussion: https://postgr.es/m/CALDaNm1DqgaLBAJrtGznKk1sR1mH-augmp7LfGvxWwTUhah+rg@mail.gmail.com
      f5d94e40