1. 14 Jan, 2021 6 commits
    • Prevent drop of tablespaces used by partitioned relations · ebfe2dbd
      Alvaro Herrera authored
      When a tablespace is used in a partitioned relation (per commits
      ca410302 in pg12 for tables and 33e6c34c3267 in pg11 for indexes),
      it is possible to drop the tablespace, potentially causing various
      problems.  One such was reported in bug #16577, where a rewriting ALTER
      TABLE causes a server crash.
      
      Protect against this by using pg_shdepend to keep track of tablespaces
      when used for relations that don't keep physical files; we now abort a
      tablespace drop if we see that the tablespace is referenced from any
      partitioned relations.
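      The check described above can be sketched in Python, with pg_shdepend rows modeled as dicts (the structure and error text are illustrative, not the actual C implementation):

```python
def check_tablespace_drop(shdepend, tablespace_oid):
    """Sketch: before dropping a tablespace, look for pg_shdepend entries
    recorded for storage-less (partitioned) relations that reference it,
    and abort the drop if any exist."""
    dependents = [dep for dep in shdepend if dep["refobjid"] == tablespace_oid]
    if dependents:
        # corresponds to ERRCODE_DEPENDENT_OBJECTS_STILL_EXIST
        raise RuntimeError(
            f"tablespace is referenced by {len(dependents)} partitioned relation(s)")
```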
      
      Backpatch this to 11, where this problem has been latent all along.  We
      don't try to create pg_shdepend entries for existing partitioned
      indexes/tables, but any ones that are modified going forward will be
      protected.
      
      Note slight behavior change: when trying to drop a tablespace that
      contains both regular tables as well as partitioned ones, you'd
      previously get ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE and now you'll
      get ERRCODE_DEPENDENT_OBJECTS_STILL_EXIST.  Arguably, the latter is more
      correct.
      
      It is possible to add protecting pg_shdepend entries for existing
      tables/indexes, by doing
        ALTER TABLE ONLY some_partitioned_table SET TABLESPACE pg_default;
        ALTER TABLE ONLY some_partitioned_table SET TABLESPACE original_tablespace;
      for each partitioned table/index that is not in the database default
      tablespace.  Because these partitioned objects do not have storage, no
      file needs to be actually moved, so it shouldn't take more time than
      what's required to acquire locks.
      
      This query can be used to search for such relations:
      SELECT ... FROM pg_class WHERE relkind IN ('p', 'I') AND reltablespace <> 0
      Reported-by: Alexander Lakhin <exclusion@gmail.com>
      Discussion: https://postgr.es/m/16577-881633a9f9894fd5@postgresql.org
      Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
      Reviewed-by: Michael Paquier <michael@paquier.xyz>
    • Stabilize timeline switch regression test. · 424d7a9b
      Fujii Masao authored
      Commit fef5b47f added the regression test to check whether a standby is
      able to follow a primary on a newer timeline when WAL archiving is enabled.
      But the buildfarm member florican reported that this test failed because
      the requested WAL segment was removed and replication failed. This is a
      timing issue. Since the test neither uses a replication slot nor sets
      wal_keep_size, a checkpoint could remove a WAL segment that was still
      necessary for replication.
      
      This commit stabilizes the test by setting wal_keep_size.
      
      Back-patch to v13 where the regression test that this commit stabilizes
      was added.
      
      Author: Fujii Masao
      Discussion: https://postgr.es/m/X//PsenxcC50jDzX@paquier.xyz
    • Improve tab-completion for CLOSE, DECLARE, FETCH and MOVE. · 3f238b88
      Fujii Masao authored
      This commit makes the CLOSE, FETCH and MOVE commands tab-complete the list
      of cursors. It also makes the DECLARE command tab-complete its options.
      
      Author: Shinya Kato, Sawada Masahiko, tweaked by Fujii Masao
      Reviewed-by: Shinya Kato, Sawada Masahiko, Fujii Masao
      Discussion: https://postgr.es/m/b0e4c5c53ef84c5395524f5056fc71f0@MP-MSGSS-MBX001.msg.nttdata.co.jp
    • Minor header cleanup for the new iovec code. · fb29ab26
      Thomas Munro authored
      Remove redundant function declaration and improve header comment in
      pg_iovec.h.  Move the new declaration in fd.h next to a group of more
      similar functions.
    • Ensure that a standby is able to follow a primary on a newer timeline. · fef5b47f
      Fujii Masao authored
      Commit 709d003f refactored WAL-reading code, but accidentally caused
      WalSndSegmentOpen() to fail to follow a timeline switch while reading from
      a historic timeline. This issue caused a standby to fail to follow a primary
      on a newer timeline when WAL archiving is enabled.
      
      If there is a timeline switch within the segment, WalSndSegmentOpen() should
      read from the WAL segment belonging to the new timeline. But previously,
      since it failed to follow the timeline switch, it tried to read the WAL
      segment from the old timeline. When WAL archiving is enabled, that
      old-timeline segment doesn't exist because it has been renamed to .partial.
      The primary therefore tried to read a non-existent WAL segment, causing
      replication to fail with the error "ERROR: requested WAL segment ... has
      already been removed".
      
      This commit fixes WalSndSegmentOpen() so that it's able to follow a timeline
      switch, to ensure that a standby is able to follow a primary on a newer
      timeline even when WAL archiving is enabled.
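      The timeline selection rule can be sketched as follows (a simplified model of a timeline-history lookup, not the actual WalSndSegmentOpen() code; all names here are illustrative):

```python
def timeline_for_lsn(history, lsn, latest_tli):
    """Sketch: pick the timeline that owns a given WAL position.

    history is a list of (tli, end_lsn) pairs in ascending order, as in a
    timeline history file: a timeline owns LSNs strictly below its end_lsn,
    and the latest timeline has no end. Reading from the timeline that owns
    the position, rather than sticking with the old one, is what lets the
    sender cross a timeline switch inside a segment."""
    for tli, end_lsn in history:
        if lsn < end_lsn:
            return tli
    return latest_tli
```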
      
      This commit also adds the regression test to check whether a standby is
      able to follow a primary on a newer timeline when WAL archiving is enabled.
      
      Back-patch to v13 where the bug was introduced.
      
      Reported-by: Kyotaro Horiguchi
      Author: Kyotaro Horiguchi, tweaked by Fujii Masao
      Reviewed-by: Alvaro Herrera, Fujii Masao
      Discussion: https://postgr.es/m/20201209.174314.282492377848029776.horikyota.ntt@gmail.com
    • Rework refactoring of hex and encoding routines · aef8948f
      Michael Paquier authored
      This commit addresses some issues with c3826f83 that moved the hex
      decoding routine to src/common/:
      - The decoding function lacked overflow checks, so when used for
      security-related features it was an open door to out-of-bound writes
      that, if the function was not used carefully, could remain undetected.
      Like the base64 routines already in src/common/ used by SCRAM, this
      routine is reworked to check for overflows by having the size of the
      destination buffer passed as an argument, with overflows checked before
      doing any writes.
      - The encoding routine was missing.  This is moved to src/common/ and
      it gains the same overflow checks as the decoding part.
      
      On failure, the hex routines of src/common/ issue an error, per the
      discussion about making them usable by frontend tools but not by shared
      libraries. Note that this is why ECPG is left out of this commit: it
      still contains duplicated hex encoding and decoding logic.
      
      While at it, this commit uses better variable names for the source and
      destination buffers in the existing escape and base64 routines in
      encode.c and makes their overflow detection more robust. The previous
      core code issued a FATAL error after doing an out-of-bound write when
      going through the SQL functions, which was enough to detect problems
      when working on changes that impacted this area of the code. Instead,
      an error is now issued before any out-of-bound write happens.
      The hex routines were being directly called for bytea conversions and
      backup manifests without such sanity checks.  The current calls happen
      to not have any problems, but careless uses of such APIs could easily
      lead to CVE-class bugs.
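      The overflow-safe contract described above can be sketched in Python (the real routines are C code in src/common/; the function name and error handling here are illustrative):

```python
def hex_decode_safe(src: str, dst_capacity: int) -> bytes:
    """Sketch of an overflow-checked hex decoder: the caller passes the
    destination buffer's size, and the check happens before any write."""
    if len(src) % 2 != 0:
        raise ValueError("invalid hexadecimal input length")
    needed = len(src) // 2
    if needed > dst_capacity:
        # fail up front instead of writing out of bounds
        raise ValueError("overflow of destination buffer")
    return bytes.fromhex(src)
```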
      
      Author: Bruce Momjian, Michael Paquier
      Reviewed-by: Sehrope Sarkuni
      Discussion: https://postgr.es/m/20201231003557.GB22199@momjian.us
  2. 13 Jan, 2021 18 commits
  3. 12 Jan, 2021 8 commits
    • Invent struct ReindexIndexInfo · c6c4b373
      Alvaro Herrera authored
      This struct is used by ReindexRelationConcurrently to keep track of the
      relations to process.  This saves having to obtain some data repeatedly,
      and has future uses as well.
      Reviewed-by: Dmitry Dolgov <9erthalion6@gmail.com>
      Reviewed-by: Hamid Akhtar <hamid.akhtar@gmail.com>
      Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
      Discussion: https://postgr.es/m/20201130195439.GA24598@alvherre.pgsql
    • pg_dump: label INDEX ATTACH ArchiveEntries with an owner. · 9eabfe30
      Tom Lane authored
      Although a partitioned index's attachment to its parent doesn't
      have separate ownership, the ArchiveEntry for it needs to be
      marked with an owner anyway, to ensure that the ALTER command
      is run by the appropriate role when restoring with
      --use-set-session-authorization.  Without this, the ALTER will
      be run by the role that started the restore session, which will
      usually work but it's formally the wrong thing.
      
      Back-patch to v11 where this type of ArchiveEntry was added.
      In HEAD, add equivalent commentary to the just-added TABLE ATTACH
      case, which I'd made do the right thing already.
      
      Discussion: https://postgr.es/m/1094034.1610418498@sss.pgh.pa.us
    • Doc: fix description of privileges needed for ALTER PUBLICATION. · cc865c0f
      Tom Lane authored
      Adding a table to a publication requires ownership of the table
      (in addition to ownership of the publication).  This was mentioned
      nowhere.
    • Fix thinko in comment · a3e51a36
      Alvaro Herrera authored
      This comment has been wrong since its introduction in commit
      2c03216d.
      
      Author: Masahiko Sawada <sawada.mshk@gmail.com>
      Discussion: https://postgr.es/m/CAD21AoAzz6qipFJBbGEaHmyWxvvNDp8httbwLR9tUQWaTjUs2Q@mail.gmail.com
    • Fix relation descriptor leak. · 044aa9e7
      Amit Kapila authored
      We missed closing the relation descriptor while sending changes via the
      root of partitioned relations during logical replication.
      
      Author: Amit Langote and Mark Zhao
      Reviewed-by: Amit Kapila and Ashutosh Bapat
      Backpatch-through: 13, where it was introduced
      Discussion: https://postgr.es/m/tencent_41FEA657C206F19AB4F406BE9252A0F69C06@qq.com
      Discussion: https://postgr.es/m/tencent_6E296D2F7D70AFC90D83353B69187C3AA507@qq.com
    • Optimize DropRelFileNodeBuffers() for recovery. · d6ad34f3
      Amit Kapila authored
      The recovery path of DropRelFileNodeBuffers() is optimized so that
      scanning of the whole buffer pool can be avoided when the number of
      blocks to be truncated in a relation is below a certain threshold. For
      such cases, we find the buffers by doing lookups in BufMapping table.
      This improves performance by more than 100 times in many cases where
      several small tables (tested with 1000 relations) are truncated and the
      server is configured with a large amount of shared buffers (greater
      than or equal to 100GB).
      
      This optimization helps cases (a) when vacuum or autovacuum truncated off
      any of the empty pages at the end of a relation, or (b) when the relation is
      truncated in the same transaction in which it was created.
      
      This commit introduces a new API, smgrnblocks_cached, which returns a
      cached value for the number of blocks in a relation fork. This helps us
      determine the exact size of a relation, which is required to apply this
      optimization. The exact size is required to ensure that we don't leave
      any buffer behind for the relation being dropped, as the background
      writer or checkpointer could otherwise hit a PANIC error while flushing
      buffers corresponding to files that don't exist.
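      The shape of the optimization can be sketched with the buffer mapping table modeled as a dict (the threshold value and all names are assumptions for illustration; the real code operates on shared buffers in C):

```python
BUF_DROP_FULL_SCAN_THRESHOLD = 32  # illustrative; the real threshold differs

def drop_rel_buffers(buf_mapping, rel, first_del_block, nblocks_cached):
    """Sketch: when the cached relation size is known and the range to drop
    is small, probe the buffer mapping table per block; otherwise fall back
    to scanning every buffer."""
    if (nblocks_cached is not None
            and nblocks_cached - first_del_block < BUF_DROP_FULL_SCAN_THRESHOLD):
        for blk in range(first_del_block, nblocks_cached):
            buf_mapping.pop((rel, blk), None)  # targeted lookup + invalidate
    else:
        for key in [k for k in buf_mapping
                    if k[0] == rel and k[1] >= first_del_block]:
            del buf_mapping[key]               # full-scan fallback
```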
      
      Author: Kirk Jamison based on ideas by Amit Kapila
      Reviewed-by: Kyotaro Horiguchi, Takayuki Tsunakawa, and Amit Kapila
      Tested-By: Haiying Tang
      Discussion: https://postgr.es/m/OSBPR01MB3207DCA7EC725FDD661B3EDAEF660@OSBPR01MB3207.jpnprd01.prod.outlook.com
    • Dump ALTER TABLE ... ATTACH PARTITION as a separate ArchiveEntry. · 9a4c0e36
      Tom Lane authored
      Previously, we emitted the ATTACH PARTITION command as part of
      the child table's ArchiveEntry.  This was a poor choice since it
      complicates restoring the partition as a standalone table; you have
      to ignore the error from the ATTACH, which isn't even an option when
      restoring direct-to-database with pg_restore.  (pg_restore will issue
      the whole ArchiveEntry as one PQexec, so that any error rolls back
      the table creation as well.)  Hence, separate it out as its own
      ArchiveEntry, as indeed we already did for index ATTACH PARTITION
      commands.
      
      Justin Pryzby
      
      Discussion: https://postgr.es/m/20201023052940.GE9241@telsasoft.com
    • Make pg_dump's table of object-type priorities more maintainable. · d5ab79d8
      Tom Lane authored
      Wedging a new object type into this table has historically required
      manually renumbering a lot of existing entries.  (Although it appears
      that some people got lazy and re-used the priority level of an
      existing object type, even if it wasn't particularly related.)
      We can let the compiler do the counting by inventing an enum type that
      lists the desired priority levels in order.  Now, if you want to add
      or remove a priority level, that's a one-liner.
      
      This patch is not purely cosmetic, because I split apart the priorities
      of DO_COLLATION and DO_TRANSFORM, as well as those of DO_ACCESS_METHOD
      and DO_OPERATOR, which look to me to have been merged out of expediency
      rather than because it was a good idea.  Shell types continue to be
      sorted interchangeably with full types, and opclasses interchangeably
      with opfamilies.
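      The let-the-compiler-count idea translates directly; here is a Python analogue using auto-numbered enum members (the member names are hypothetical, not the actual pg_dump identifiers):

```python
from enum import IntEnum, auto

class SortPriority(IntEnum):
    """Sketch: list priority levels in order and let the language number
    them; adding or removing a level is a one-liner, with everything after
    it renumbered automatically."""
    PRIO_NAMESPACE = auto()
    PRIO_TYPE = auto()        # shell types still sort with full types
    PRIO_COLLATION = auto()
    PRIO_TRANSFORM = auto()   # now distinct from PRIO_COLLATION
    PRIO_TABLE = auto()
```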
  4. 11 Jan, 2021 8 commits
    • Fix function prototypes in dependency.h. · f315205f
      Thomas Munro authored
      Commit 257836a7 accidentally deleted a couple of
      redundant-but-conventional "extern" keywords on function prototypes.
      Put them back.
      Reported-by: Alvaro Herrera <alvherre@alvh.no-ip.org>
    • Rethink SQLSTATE code for ERRCODE_IDLE_SESSION_TIMEOUT. · 4edf9684
      Tom Lane authored
      Move it to class 57 (Operator Intervention), which seems like a
      better choice given that from the client's standpoint it behaves
      a heck of a lot like, e.g., ERRCODE_ADMIN_SHUTDOWN.
      
      In a green field I'd put ERRCODE_IDLE_IN_TRANSACTION_SESSION_TIMEOUT
      here as well.  But that's been around for a few years, so it's
      probably too late to change its SQLSTATE code.
      
      Discussion: https://postgr.es/m/763A0689-F189-459E-946F-F0EC4458980B@hotmail.com
    • Try next host after a "cannot connect now" failure. · c1d58957
      Tom Lane authored
      If a server returns ERRCODE_CANNOT_CONNECT_NOW, try the next host,
      if multiple host names have been provided.  This allows dealing
      gracefully with standby servers that might not be in hot standby mode
      yet.
      
      In the wake of the preceding commit, it might be plausible to retry
      many more error cases than we do now, but I (tgl) am hesitant to
      move too aggressively on that --- it's not clear it'd be desirable
      for cases such as bad-password, for example.  But this case seems
      safe enough.
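      The retry rule can be sketched as a connection loop that treats ERRCODE_CANNOT_CONNECT_NOW (SQLSTATE 57P03) as "try the next host" (a simplified model of the behavior described, not libpq's actual connection state machine; names are illustrative):

```python
CANNOT_CONNECT_NOW = "57P03"  # SQLSTATE: "the database system is starting up"

def connect_any(hosts, connect):
    """Sketch: try each host in turn; a CANNOT_CONNECT_NOW failure (e.g. a
    standby not yet in hot standby mode) moves on to the next host, while
    other errors propagate immediately in this simplified model."""
    last_err = None
    for host in hosts:
        try:
            return connect(host)
        except ConnectionError as err:
            if getattr(err, "sqlstate", None) != CANNOT_CONNECT_NOW:
                raise
            last_err = err  # retryable: remember it and try the next host
    raise last_err
```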
      
      Hubert Zhang, reviewed by Takayuki Tsunakawa
      
      Discussion: https://postgr.es/m/BN6PR05MB3492948E4FD76C156E747E8BC9160@BN6PR05MB3492.namprd05.prod.outlook.com
    • Uniformly identify the target host in libpq connection failure reports. · 52a10224
      Tom Lane authored
      Prefix "could not connect to host-or-socket-path:" to all connection
      failure cases that occur after the socket() call, and remove the
      ad-hoc server identity data that was appended to a few of these
      messages.  This should produce much more intelligible error reports
      in multiple-target-host situations, especially for error cases that
      are off the beaten track to any degree (because none of those provided
      any server identity info).
      
      As an example of the change, formerly a connection attempt with a bad
      port number such as "psql -p 12345 -h localhost,/tmp" might produce
      
      psql: error: could not connect to server: Connection refused
              Is the server running on host "localhost" (::1) and accepting
              TCP/IP connections on port 12345?
      could not connect to server: Connection refused
              Is the server running on host "localhost" (127.0.0.1) and accepting
              TCP/IP connections on port 12345?
      could not connect to server: No such file or directory
              Is the server running locally and accepting
              connections on Unix domain socket "/tmp/.s.PGSQL.12345"?
      
      Now it looks like
      
      psql: error: could not connect to host "localhost" (::1), port 12345: Connection refused
              Is the server running on that host and accepting TCP/IP connections?
      could not connect to host "localhost" (127.0.0.1), port 12345: Connection refused
              Is the server running on that host and accepting TCP/IP connections?
      could not connect to socket "/tmp/.s.PGSQL.12345": No such file or directory
              Is the server running locally and accepting connections on that socket?
      
      This requires adjusting a couple of regression tests to allow for
      variation in the contents of a connection failure message.
      
      Discussion: https://postgr.es/m/BN6PR05MB3492948E4FD76C156E747E8BC9160@BN6PR05MB3492.namprd05.prod.outlook.com
    • Allow pg_regress.c wrappers to postprocess test result files. · 800d93f3
      Tom Lane authored
      Add an optional callback to regression_main() that, if provided,
      is invoked on each test output file before we try to compare it
      to the expected-result file.
      
      The main and isolation test programs don't need this (yet).
      In pg_regress_ecpg, add a filter that eliminates target-host
      details from "could not connect" error reports.  This filter
      doesn't do anything as of this commit, but it will be needed
      by the next one.
      
      In the long run we might want to provide some more general,
      perhaps pattern-based, filtering mechanism for test output.
      For now, this will solve the immediate problem.
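      A filter callback of the kind described might look like this sketch (the regex and placeholder text are assumptions; the real filter is C code in pg_regress_ecpg):

```python
import re

def filter_connect_errors(path):
    """Sketch of a per-output-file postprocessing callback: normalize
    target-host details in "could not connect" lines so the expected
    files can stay independent of host names and ports."""
    with open(path) as f:
        text = f.read()
    text = re.sub(r'could not connect to .*:', 'could not connect to <target>:', text)
    with open(path, "w") as f:
        f.write(text)
```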
      
      Discussion: https://postgr.es/m/BN6PR05MB3492948E4FD76C156E747E8BC9160@BN6PR05MB3492.namprd05.prod.outlook.com
    • In libpq, always append new error messages to conn->errorMessage. · ffa2e467
      Tom Lane authored
      Previously, we had an undisciplined mish-mash of printfPQExpBuffer and
      appendPQExpBuffer calls to report errors within libpq.  This commit
      establishes a uniform rule that appendPQExpBuffer[Str] should be used.
      conn->errorMessage is reset only at the start of an application request,
      and then accumulates messages till we're done.  We can remove no less
      than three different ad-hoc mechanisms that were used to get the effect
      of concatenation of error messages within a sequence of operations.
      
      Although this makes things quite a bit cleaner conceptually, the main
      reason to do it is to make the world safer for the multiple-target-host
      feature that was added awhile back.  Previously, there were many cases
      in which an error occurring during an individual host connection attempt
      would wipe out the record of what had happened during previous attempts.
      (The reporting is still inadequate, in that it can be hard to tell which
      host got the failure, but that seems like a matter for a separate commit.)
      
      Currently, lo_import and lo_export contain exceptions to the "never
      use printfPQExpBuffer" rule.  If we changed them, we'd risk reporting
      an incidental lo_close failure before the actual read or write
      failure, which would be confusing, not least because lo_close happened
      after the main failure.  We could improve this by inventing an
      internal version of lo_close that doesn't reset the errorMessage; but
      we'd also need a version of PQfn() that does that, and it didn't quite
      seem worth the trouble for now.
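      The accumulate-don't-overwrite rule can be modeled in a few lines (a Python sketch of the convention, not libpq code):

```python
class ErrorBuffer:
    """Sketch: reset only at the start of an application request; every
    failure during that request appends, so earlier hosts' errors survive."""
    def __init__(self):
        self.message = ""

    def reset_for_request(self):
        self.message = ""              # one reset per application request

    def append(self, msg):
        self.message += msg + "\n"     # appendPQExpBuffer-style accumulation
```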
      
      Discussion: https://postgr.es/m/BN6PR05MB3492948E4FD76C156E747E8BC9160@BN6PR05MB3492.namprd05.prod.outlook.com
    • Use vectored I/O to fill new WAL segments. · ce6a71fa
      Thomas Munro authored
      Instead of making many block-sized write() calls to fill a new WAL file
      with zeroes, make a smaller number of pwritev() calls (or various
      emulations).  The actual number depends on the OS's IOV_MAX, which
      PG_IOV_MAX currently caps at 32.  That means we'll write 256kB per call
      on typical systems.  We may want to tune the number later with more
      experience.
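      The batching idea can be sketched in Python using os.pwritev (Unix-only; BLCKSZ here and the assumption that regular-file writes complete in full are simplifications of this sketch):

```python
import os

PG_IOV_MAX = 32   # cap on iovecs per call, per the commit message
BLCKSZ = 8192     # PostgreSQL's default block size

def zero_fill(fd, total_size):
    """Sketch: fill a new file with zeroes using a few vectored writes
    (up to 32 blocks, i.e. 256kB, per call) instead of one write() per
    block. total_size must be a multiple of BLCKSZ; short writes are not
    handled here."""
    assert total_size % BLCKSZ == 0
    block = bytes(BLCKSZ)
    offset = 0
    while offset < total_size:
        nblocks = min(PG_IOV_MAX, (total_size - offset) // BLCKSZ)
        offset += os.pwritev(fd, [block] * nblocks, offset)
```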
      Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
      Reviewed-by: Andres Freund <andres@anarazel.de>
      Discussion: https://postgr.es/m/CA%2BhUKGJA%2Bu-220VONeoREBXJ9P3S94Y7J%2BkqCnTYmahvZJwM%3Dg%40mail.gmail.com
    • Provide pg_preadv() and pg_pwritev(). · 13a021f3
      Thomas Munro authored
      Provide synchronous vectored file I/O routines.  These map to preadv()
      and pwritev(), with fallback implementations for systems that don't have
      them.  Also provide a wrapper pg_pwritev_with_retry() that automatically
      retries on short writes.
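      The retry-on-short-write behavior can be sketched in Python (os.pwritev is Unix-only; this models the described pg_pwritev_with_retry contract, not its C implementation):

```python
import os

def pwritev_with_retry(fd, buffers, offset):
    """Sketch: issue vectored writes until every buffer is fully written,
    advancing the file offset and dropping or trimming buffers after each
    (possibly short) write."""
    bufs = [memoryview(b) for b in buffers]
    total = 0
    while bufs:
        n = os.pwritev(fd, bufs, offset)
        total += n
        offset += n
        while bufs and n >= len(bufs[0]):   # drop buffers written in full
            n -= len(bufs[0])
            bufs.pop(0)
        if bufs and n:                      # trim a partially written buffer
            bufs[0] = bufs[0][n:]
    return total
```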
      Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
      Reviewed-by: Andres Freund <andres@anarazel.de>
      Discussion: https://postgr.es/m/CA%2BhUKGJA%2Bu-220VONeoREBXJ9P3S94Y7J%2BkqCnTYmahvZJwM%3Dg%40mail.gmail.com