1. 15 Jan, 2021 4 commits
  2. 14 Jan, 2021 7 commits
    • pg_dump: label PUBLICATION TABLE ArchiveEntries with an owner. · 8e396a77
      Tom Lane authored
      This is the same fix as commit 9eabfe30 applied to INDEX ATTACH
      entries, but for table-to-publication attachments.  As in that
      case, even though the backend doesn't record "ownership" of the
      attachment, we still ought to label it in the dump archive with
      the role name that should run the ALTER PUBLICATION command.
      The existing behavior causes the ALTER to be done by the original
      role that started the restore; that will usually work fine, but
      there may be corner cases where it fails.
      
      The bulk of the patch is concerned with changing struct
      PublicationRelInfo to include a pointer to the associated
      PublicationInfo object, so that we can get the owner's name
      out of that when the time comes.  While at it, I rewrote
      getPublicationTables() to do just one query of pg_publication_rel,
      not one per table.
      
      Back-patch to v10 where this code was introduced.
      
      Discussion: https://postgr.es/m/1165710.1610473242@sss.pgh.pa.us
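      A minimal C sketch of the struct change, assuming heavily simplified stand-ins for pg_dump's PublicationInfo and PublicationRelInfo (the fields, PubRelRow and all helper names are hypothetical). The idea it shows: each publication-table entry keeps a pointer to its publication, so the owner for the ALTER PUBLICATION archive entry can be read from there. It also shows the rows of one pg_publication_rel query being linked to their publications.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int Oid;

/* Hypothetical, simplified stand-in for pg_dump's PublicationInfo. */
typedef struct PublicationInfo
{
    Oid         oid;
    const char *pubname;
    const char *rolname;        /* owner of the publication */
} PublicationInfo;

/* Hypothetical stand-in for PublicationRelInfo: it now points back at
 * the publication it belongs to. */
typedef struct PublicationRelInfo
{
    PublicationInfo *publication;
    Oid         relid;          /* the published table */
} PublicationRelInfo;

/* One row of a single query over pg_publication_rel (prpubid, prrelid). */
typedef struct PubRelRow
{
    Oid         prpubid;
    Oid         prrelid;
} PubRelRow;

/* Linear search keeps the sketch short. */
static PublicationInfo *
find_publication(PublicationInfo *pubs, size_t npubs, Oid oid)
{
    for (size_t i = 0; i < npubs; i++)
        if (pubs[i].oid == oid)
            return &pubs[i];
    return NULL;
}

/* Turn the rows of one query into PublicationRelInfo entries, linking each
 * to its publication.  Rows whose publication is unknown are skipped in
 * this sketch.  Returns the number of entries filled in 'out'. */
static size_t
link_publication_rels(const PubRelRow *rows, size_t nrows,
                      PublicationInfo *pubs, size_t npubs,
                      PublicationRelInfo *out)
{
    size_t      n = 0;

    for (size_t i = 0; i < nrows; i++)
    {
        PublicationInfo *pub = find_publication(pubs, npubs, rows[i].prpubid);

        if (pub == NULL)
            continue;
        out[n].publication = pub;
        out[n].relid = rows[i].prrelid;
        n++;
    }
    return n;
}

/* Owner to record on the ALTER PUBLICATION ... ADD TABLE archive entry. */
static const char *
publication_rel_owner(const PublicationRelInfo *pubrinfo)
{
    assert(pubrinfo->publication != NULL);
    return pubrinfo->publication->rolname;
}
```

      Keeping a pointer rather than a copied name means the owner is looked up once, when the dump entry is built, from the object that already holds it.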
    • Prevent drop of tablespaces used by partitioned relations · ebfe2dbd
      Alvaro Herrera authored
      When a tablespace is used in a partitioned relation (per commits
      ca410302 in pg12 for tables and 33e6c34c3267 in pg11 for indexes),
      it is possible to drop the tablespace, potentially causing various
      problems.  One such was reported in bug #16577, where a rewriting ALTER
      TABLE causes a server crash.
      
      Protect against this by using pg_shdepend to keep track of tablespaces
      when used for relations that don't keep physical files; we now abort a
      tablespace drop if we see that the tablespace is referenced from any
      partitioned relations.
      
      Backpatch this to 11, where this problem has been latent all along.  We
      don't try to create pg_shdepend entries for existing partitioned
      indexes/tables, but any ones that are modified going forward will be
      protected.
      
      Note slight behavior change: when trying to drop a tablespace that
      contains both regular tables and partitioned ones, you'd previously get
      ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE and now you'll get
      ERRCODE_DEPENDENT_OBJECTS_STILL_EXIST.  Arguably, the latter is more
      correct.
      
      It is possible to add protective pg_shdepend entries for existing
      tables/indexes by doing
        ALTER TABLE ONLY some_partitioned_table SET TABLESPACE pg_default;
        ALTER TABLE ONLY some_partitioned_table SET TABLESPACE original_tablespace;
      for each partitioned table/index that is not in the database default
      tablespace.  Because these partitioned objects do not have storage, no
      file needs to be actually moved, so it shouldn't take more time than
      what's required to acquire locks.
      
      This query can be used to search for such relations:
      SELECT ... FROM pg_class WHERE relkind IN ('p', 'I') AND reltablespace <> 0

      Reported-by: Alexander Lakhin <exclusion@gmail.com>
      Discussion: https://postgr.es/m/16577-881633a9f9894fd5@postgresql.org
      Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
      Reviewed-by: Michael Paquier <michael@paquier.xyz>
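      A toy C model of the check order this commit implies, assuming (as the error-code change suggests) that the pg_shdepend check runs before the tablespace-is-empty check. The record layout, the DropResult enum and the function names are hypothetical; only the two ERRCODE names come from the text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned int Oid;

/* Outcome of a DROP TABLESPACE attempt; names mirror the ERRCODEs. */
typedef enum DropResult
{
    DROP_OK,
    ERR_DEPENDENT_OBJECTS_STILL_EXIST,      /* pg_shdepend entry found */
    ERR_OBJECT_NOT_IN_PREREQUISITE_STATE    /* directory not empty */
} DropResult;

/* Hypothetical shared-dependency record: storage-less relation 'relid'
 * (partitioned table or index) is placed in tablespace 'spcid'. */
typedef struct ShDepend
{
    Oid         relid;
    Oid         spcid;
} ShDepend;

static bool
tablespace_has_shdepend(const ShDepend *deps, size_t ndeps, Oid spcid)
{
    for (size_t i = 0; i < ndeps; i++)
        if (deps[i].spcid == spcid)
            return true;
    return false;
}

/* files_in_tablespace counts on-disk relation files (regular tables and
 * indexes).  Partitioned relations contribute no files, which is why
 * they need pg_shdepend entries to be noticed at all. */
static DropResult
drop_tablespace(const ShDepend *deps, size_t ndeps, Oid spcid,
                int files_in_tablespace)
{
    assert(files_in_tablespace >= 0);
    if (tablespace_has_shdepend(deps, ndeps, spcid))
        return ERR_DEPENDENT_OBJECTS_STILL_EXIST;
    if (files_in_tablespace > 0)
        return ERR_OBJECT_NOT_IN_PREREQUISITE_STATE;
    return DROP_OK;
}
```

      With both kinds of relation present, the dependency check fires first, which matches the behavior change noted in the commit message.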
    • Stabilize timeline switch regression test. · 424d7a9b
      Fujii Masao authored
      Commit fef5b47f added a regression test to check whether a standby is
      able to follow a primary on a newer timeline when WAL archiving is enabled.
      However, the buildfarm member florican reported that this test failed
      because replication broke after the requested WAL segment was removed.
      This is a timing issue: since the test neither uses a replication slot nor
      sets wal_keep_size, a checkpoint could remove a WAL segment that is still
      needed for replication.
      
      This commit stabilizes the test by setting wal_keep_size.
      
      Back-patch to v13, where the regression test that this commit stabilizes
      was added.
      
      Author: Fujii Masao
      Discussion: https://postgr.es/m/X//PsenxcC50jDzX@paquier.xyz
    • Improve tab-completion for CLOSE, DECLARE, FETCH and MOVE. · 3f238b88
      Fujii Masao authored
      This commit makes the CLOSE, FETCH and MOVE commands tab-complete the
      list of cursors, and makes the DECLARE command tab-complete its options.
      
      Author: Shinya Kato, Sawada Masahiko, tweaked by Fujii Masao
      Reviewed-by: Shinya Kato, Sawada Masahiko, Fujii Masao
      Discussion: https://postgr.es/m/b0e4c5c53ef84c5395524f5056fc71f0@MP-MSGSS-MBX001.msg.nttdata.co.jp
    • Minor header cleanup for the new iovec code. · fb29ab26
      Thomas Munro authored
      Remove redundant function declaration and improve header comment in
      pg_iovec.h.  Move the new declaration in fd.h next to a group of more
      similar functions.
    • Ensure that a standby is able to follow a primary on a newer timeline. · fef5b47f
      Fujii Masao authored
      Commit 709d003f refactored WAL-reading code, but accidentally caused
      WalSndSegmentOpen() to fail to follow a timeline switch while reading from
      a historic timeline. This issue caused a standby to fail to follow a primary
      on a newer timeline when WAL archiving is enabled.
      
      If there is a timeline switch within the segment, WalSndSegmentOpen() should
      read from the WAL segment belonging to the new timeline. Previously, because
      it failed to follow the timeline switch, it tried to read the WAL segment
      with the old timeline. When WAL archiving is enabled, that segment with the
      old timeline doesn't exist because it has been renamed to .partial. This led
      the primary to try to read a nonexistent WAL segment, which caused
      replication to fail with the error "ERROR:  requested WAL segment ... has
      already been removed".
      
      This commit fixes WalSndSegmentOpen() so that it's able to follow a timeline
      switch, to ensure that a standby is able to follow a primary on a newer
      timeline even when WAL archiving is enabled.
      
      This commit also adds a regression test to check whether a standby is
      able to follow a primary on a newer timeline when WAL archiving is enabled.
      
      Back-patch to v13 where the bug was introduced.
      
      Reported-by: Kyotaro Horiguchi
      Author: Kyotaro Horiguchi, tweaked by Fujii Masao
      Reviewed-by: Alvaro Herrera, Fujii Masao
      Discussion: https://postgr.es/m/20201209.174314.282492377848029776.horikyota.ntt@gmail.com
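      A minimal C sketch of the segment-naming decision described above. It assumes the default 16MB WAL segment size and the usual 24-hex-digit file name (timeline, then the segment number split into two 32-bit halves). The helper names and arguments are hypothetical; the real WalSndSegmentOpen() works from the walsender's own state.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef uint64_t XLogRecPtr;
typedef uint64_t XLogSegNo;
typedef uint32_t TimeLineID;

#define WAL_SEG_SIZE ((uint64_t) 16 * 1024 * 1024)   /* assumed default */
#define SEGS_PER_XLOGID (UINT64_C(0x100000000) / WAL_SEG_SIZE)

/* Build the 24-character WAL file name for (tli, segno). */
static void
wal_file_name(char *buf, size_t buflen, TimeLineID tli, XLogSegNo segno)
{
    snprintf(buf, buflen, "%08X%08X%08X", (unsigned) tli,
             (unsigned) (segno / SEGS_PER_XLOGID),
             (unsigned) (segno % SEGS_PER_XLOGID));
}

/*
 * Pick the timeline whose file to open for segment 'segno' while streaming
 * historic timeline 'tli', which ends at 'switchpoint' and is followed by
 * 'next_tli'.  The segment containing the switch point exists in full only
 * on the new timeline; with archiving enabled, the old-timeline copy has
 * been renamed to ".partial", so open the new one.
 */
static TimeLineID
timeline_for_segment(XLogSegNo segno, TimeLineID tli,
                     XLogRecPtr switchpoint, TimeLineID next_tli)
{
    XLogSegNo   switch_segno = switchpoint / WAL_SEG_SIZE;

    assert(next_tli > tli);
    return (segno == switch_segno) ? next_tli : tli;
}
```

      For example, with a switch from timeline 1 to 2 at LSN 0x3500000, segment 3 must be opened as 000000020000000000000003, while segment 2 is still read from timeline 1.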
    • Rework refactoring of hex and encoding routines · aef8948f
      Michael Paquier authored
      This commit addresses some issues with c3826f83, which moved the hex
      decoding routine to src/common/:
      - The decoding function lacked overflow checks, so if used carelessly in
      security-related features it could allow out-of-bounds writes that might
      go undetected.  Like the base64 routines already in src/common/ used by
      SCRAM, this routine is reworked to check for overflows: the size of the
      destination buffer is passed as an argument, and overflows are checked
      before doing any writes.
      - The encoding routine was missing from src/common/.  It is now moved
      there and gains the same overflow checks as the decoding routine.
      
      On failure, the hex routines of src/common/ issue an error, as agreed in
      the discussion; this makes them usable by frontend tools but not by
      shared libraries.  This is why ECPG is left out of this commit; it still
      includes duplicated logic for hex encoding and decoding.
      
      While at it, this commit uses better variable names for the source and
      destination buffers in the existing escape and base64 routines in
      encode.c and makes their overflow detection more robust.  Previously,
      when going through the SQL functions, the core code issued a FATAL only
      after doing an out-of-bounds write, which was enough to detect problems
      when working on changes that impacted this area of the code.  Now, an
      error is issued before any out-of-bounds write happens.  The hex routines
      were being called directly for bytea conversions and backup manifests
      without such sanity checks.  The current callers happen not to have any
      problems, but careless use of such APIs could easily lead to CVE-class
      bugs.
      
      Author: Bruce Momjian, Michael Paquier
      Reviewed-by: Sehrope Sarkuni
      Discussion: https://postgr.es/m/20201231003557.GB22199@momjian.us
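      A minimal C sketch of the overflow-checked pattern described above: the caller passes the destination size, and each routine checks it before writing anything. The function names and the -1 return convention are assumptions for illustration; per the commit message, the real src/common/ routines issue an error instead.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static const char hextbl[] = "0123456789abcdef";

/* Encode srclen bytes as hex into dst (no NUL terminator).  Returns the
 * number of chars written, or -1 if dst is too small; the size check
 * happens before any write. */
static long
hex_encode_checked(const unsigned char *src, size_t srclen,
                   char *dst, size_t dstlen)
{
    assert(src != NULL || srclen == 0);
    if (srclen > dstlen / 2)
        return -1;
    for (size_t i = 0; i < srclen; i++)
    {
        dst[2 * i] = hextbl[src[i] >> 4];
        dst[2 * i + 1] = hextbl[src[i] & 0x0F];
    }
    return (long) (srclen * 2);
}

static int
hex_digit_value(char c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    if (c >= 'A' && c <= 'F')
        return c - 'A' + 10;
    return -1;
}

/* Decode hex text into dst.  Returns the number of bytes written, or -1
 * on odd input length, a too-small destination (both checked before any
 * write) or an invalid digit. */
static long
hex_decode_checked(const char *src, size_t srclen,
                   unsigned char *dst, size_t dstlen)
{
    assert(src != NULL || srclen == 0);
    if (srclen % 2 != 0 || srclen / 2 > dstlen)
        return -1;
    for (size_t i = 0; i < srclen; i += 2)
    {
        int         hi = hex_digit_value(src[i]);
        int         lo = hex_digit_value(src[i + 1]);

        if (hi < 0 || lo < 0)
            return -1;
        dst[i / 2] = (unsigned char) ((hi << 4) | lo);
    }
    return (long) (srclen / 2);
}
```

      Comparing srclen against dstlen / 2, rather than srclen * 2 against dstlen, avoids overflowing the multiplication for very large inputs.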
  3. 13 Jan, 2021 18 commits
  4. 12 Jan, 2021 8 commits
    • Invent struct ReindexIndexInfo · c6c4b373
      Alvaro Herrera authored
      This struct is used by ReindexRelationConcurrently to keep track of the
      relations to process.  This saves having to obtain some data repeatedly,
      and has future uses as well.

      Reviewed-by: Dmitry Dolgov <9erthalion6@gmail.com>
      Reviewed-by: Hamid Akhtar <hamid.akhtar@gmail.com>
      Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
      Discussion: https://postgr.es/m/20201130195439.GA24598@alvherre.pgsql
    • pg_dump: label INDEX ATTACH ArchiveEntries with an owner. · 9eabfe30
      Tom Lane authored
      Although a partitioned index's attachment to its parent doesn't
      have separate ownership, the ArchiveEntry for it needs to be
      marked with an owner anyway, to ensure that the ALTER command
      is run by the appropriate role when restoring with
      --use-set-session-authorization.  Without this, the ALTER will
      be run by the role that started the restore session, which will
      usually work but is formally the wrong thing.
      
      Back-patch to v11 where this type of ArchiveEntry was added.
      In HEAD, add equivalent commentary to the just-added TABLE ATTACH
      case, which I'd made do the right thing already.
      
      Discussion: https://postgr.es/m/1094034.1610418498@sss.pgh.pa.us
    • Doc: fix description of privileges needed for ALTER PUBLICATION. · cc865c0f
      Tom Lane authored
      Adding a table to a publication requires ownership of the table
      (in addition to ownership of the publication).  This was mentioned
      nowhere.
    • Fix thinko in comment · a3e51a36
      Alvaro Herrera authored
      This comment has been wrong since its introduction in commit
      2c03216d.
      
      Author: Masahiko Sawada <sawada.mshk@gmail.com>
      Discussion: https://postgr.es/m/CAD21AoAzz6qipFJBbGEaHmyWxvvNDp8httbwLR9tUQWaTjUs2Q@mail.gmail.com
    • Fix relation descriptor leak. · 044aa9e7
      Amit Kapila authored
      We missed closing the relation descriptor while sending changes via the
      root of partitioned relations during logical replication.
      
      Author: Amit Langote and Mark Zhao
      Reviewed-by: Amit Kapila and Ashutosh Bapat
      Backpatch-through: 13, where it was introduced
      Discussion: https://postgr.es/m/tencent_41FEA657C206F19AB4F406BE9252A0F69C06@qq.com
      Discussion: https://postgr.es/m/tencent_6E296D2F7D70AFC90D83353B69187C3AA507@qq.com
    • Optimize DropRelFileNodeBuffers() for recovery. · d6ad34f3
      Amit Kapila authored
      The recovery path of DropRelFileNodeBuffers() is optimized so that
      scanning of the whole buffer pool can be avoided when the number of
      blocks to be truncated in a relation is below a certain threshold. For
      such cases, we find the buffers by doing lookups in the BufMapping table.
      This improves performance by more than 100 times in many cases where
      several small tables (tested with 1000 relations) are truncated and the
      server is configured with a large value of shared buffers (greater than
      or equal to 100GB).
      
      This optimization helps cases (a) when vacuum or autovacuum truncated off
      any of the empty pages at the end of a relation, or (b) when the relation is
      truncated in the same transaction in which it was created.
      
      This commit introduces a new API, smgrnblocks_cached, which returns a cached
      value for the number of blocks in a relation fork. This helps us determine
      the exact size of the relation, which is required to apply this
      optimization. The exact size is needed to ensure that we don't leave any
      buffer for the relation being dropped; otherwise the background writer or
      checkpointer could hit a PANIC error while flushing buffers corresponding
      to files that don't exist.
      
      Author: Kirk Jamison based on ideas by Amit Kapila
      Reviewed-by: Kyotaro Horiguchi, Takayuki Tsunakawa, and Amit Kapila
      Tested-By: Haiying Tang
      Discussion: https://postgr.es/m/OSBPR01MB3207DCA7EC725FDD661B3EDAEF660@OSBPR01MB3207.jpnprd01.prod.outlook.com
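      A minimal C sketch of the decision the commit describes. Per-block lookups are used only when the fork's size is known exactly (the role of smgrnblocks_cached) and the number of blocks to drop is small; otherwise the whole buffer pool is scanned. The SIZE_NOT_CACHED constant, the helper names and the nbuffers / 32 cut-off are assumptions for illustration; the commit text only says "a certain threshold".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t BlockNumber;

#define SIZE_NOT_CACHED UINT32_MAX   /* hypothetical "size unknown" marker */

/* Illustrative threshold: a fraction of the buffer pool size. */
static uint64_t
full_scan_threshold(uint64_t nbuffers)
{
    assert(nbuffers > 0);
    return nbuffers / 32;
}

/*
 * cached_nblocks: size from a cache lookup, or SIZE_NOT_CACHED.
 * first_block:    truncation point; blocks >= first_block are dropped.
 * Per-block lookups are safe only when the exact size is known (so no
 * buffer of the relation can be missed) and cheaper only when few blocks
 * are being dropped.
 */
static bool
use_per_block_lookups(BlockNumber cached_nblocks, BlockNumber first_block,
                      uint64_t nbuffers)
{
    if (cached_nblocks == SIZE_NOT_CACHED)
        return false;
    if (first_block >= cached_nblocks)
        return true;                    /* nothing to drop */
    return (uint64_t) (cached_nblocks - first_block) <
        full_scan_threshold(nbuffers);
}

/* Rough count of buffer-pool entries each strategy has to examine. */
static uint64_t
buffers_examined(BlockNumber cached_nblocks, BlockNumber first_block,
                 uint64_t nbuffers)
{
    if (!use_per_block_lookups(cached_nblocks, first_block, nbuffers))
        return nbuffers;                /* full scan */
    return first_block >= cached_nblocks ? 0 :
        (uint64_t) (cached_nblocks - first_block);
}
```

      Assuming the default 8kB block size, 100GB of shared buffers is 13,107,200 buffers, so in this sketch truncating a 10-block relation examines 10 mapping entries instead of every buffer header.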
    • Dump ALTER TABLE ... ATTACH PARTITION as a separate ArchiveEntry. · 9a4c0e36
      Tom Lane authored
      Previously, we emitted the ATTACH PARTITION command as part of
      the child table's ArchiveEntry.  This was a poor choice since it
      complicates restoring the partition as a standalone table; you have
      to ignore the error from the ATTACH, which isn't even an option when
      restoring direct-to-database with pg_restore.  (pg_restore will issue
      the whole ArchiveEntry as one PQexec, so that any error rolls back
      the table creation as well.)  Hence, separate it out as its own
      ArchiveEntry, as indeed we already did for index ATTACH PARTITION
      commands.
      
      Justin Pryzby
      
      Discussion: https://postgr.es/m/20201023052940.GE9241@telsasoft.com
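      A toy C model of the restore behavior described above, assuming each archive entry is applied as one all-or-nothing unit (as the commit says pg_restore does by sending an entry in a single PQexec). The statements, the DbState flags and the function names are hypothetical. The point is only that a failing ATTACH takes the CREATE TABLE down with it when both share one entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical archive entry: statements sent together in one request. */
typedef struct Entry
{
    const char *stmts[2];
    size_t      nstmts;
} Entry;

typedef struct DbState
{
    bool        parent_exists;  /* is the partitioned parent present? */
    bool        child_created;
    bool        child_attached;
} DbState;

/* Apply one statement to a scratch state; return false on error. */
static bool
run_stmt(DbState *db, const char *stmt)
{
    if (strcmp(stmt, "CREATE TABLE child") == 0)
    {
        db->child_created = true;
        return true;
    }
    if (strcmp(stmt, "ALTER TABLE parent ATTACH PARTITION child") == 0)
    {
        if (!db->parent_exists || !db->child_created)
            return false;
        db->child_attached = true;
        return true;
    }
    return false;               /* unknown statement */
}

/* All-or-nothing: changes are kept only if every statement succeeds. */
static bool
apply_entry(DbState *db, const Entry *e)
{
    DbState     scratch = *db;

    for (size_t i = 0; i < e->nstmts; i++)
        if (!run_stmt(&scratch, e->stmts[i]))
            return false;       /* error rolls back the whole entry */
    *db = scratch;
    return true;
}
```

      Restoring only the partition (no parent) with a combined entry leaves nothing behind; with separate entries, the table survives and only the ATTACH fails.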
    • Make pg_dump's table of object-type priorities more maintainable. · d5ab79d8
      Tom Lane authored
      Wedging a new object type into this table has historically required
      manually renumbering a lot of existing entries.  (Although it appears
      that some people got lazy and re-used the priority level of an
      existing object type, even if it wasn't particularly related.)
      We can let the compiler do the counting by inventing an enum type that
      lists the desired priority levels in order.  Now, if you want to add
      or remove a priority level, that's a one-liner.
      
      This patch is not purely cosmetic, because I split apart the priorities
      of DO_COLLATION and DO_TRANSFORM, as well as those of DO_ACCESS_METHOD
      and DO_OPERATOR, which look to me to have been merged out of expediency
      rather than because it was a good idea.  Shell types continue to be
      sorted interchangeably with full types, and opclasses interchangeably
      with opfamilies.
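      A minimal C sketch of the technique: an enum lists priority levels in dump order and the compiler numbers them, while a table maps each object type to a level. Only a handful of object types are shown, and their relative order is illustrative. Apart from the DO_* names, the identifiers (the PRIO_* levels, type_priority and compare_by_priority) are hypothetical rather than copied from pg_dump.

```c
#include <assert.h>

/* Object types: a simplified subset of pg_dump's DO_* codes. */
typedef enum DumpableObjectType
{
    DO_NAMESPACE,
    DO_TYPE,
    DO_SHELL_TYPE,
    DO_COLLATION,
    DO_TRANSFORM,
    DO_ACCESS_METHOD,
    DO_OPERATOR,
    DO_OPCLASS,
    DO_OPFAMILY,
    DO_NUM_TYPES                /* must be last */
} DumpableObjectType;

/* Priority levels in dump order; the compiler assigns the numbers, so
 * adding or removing a level is a one-line change. */
enum DumpPriority
{
    PRIO_NAMESPACE = 1,
    PRIO_TYPE,                  /* shell types share this level */
    PRIO_COLLATION,
    PRIO_TRANSFORM,             /* no longer shared with collations */
    PRIO_ACCESS_METHOD,
    PRIO_OPERATOR,              /* no longer shared with access methods */
    PRIO_OPFAMILY               /* opclasses share this level */
};

static const int type_priority[] = {
    [DO_NAMESPACE] = PRIO_NAMESPACE,
    [DO_TYPE] = PRIO_TYPE,
    [DO_SHELL_TYPE] = PRIO_TYPE,
    [DO_COLLATION] = PRIO_COLLATION,
    [DO_TRANSFORM] = PRIO_TRANSFORM,
    [DO_ACCESS_METHOD] = PRIO_ACCESS_METHOD,
    [DO_OPERATOR] = PRIO_OPERATOR,
    [DO_OPCLASS] = PRIO_OPFAMILY,
    [DO_OPFAMILY] = PRIO_OPFAMILY,
};

/* Catches a new type appended at the end of the enum without a priority. */
_Static_assert(sizeof(type_priority) / sizeof(type_priority[0]) == DO_NUM_TYPES,
               "every object type needs a priority");

/* Compare two object types by dump priority: negative, zero or positive. */
static int
compare_by_priority(DumpableObjectType a, DumpableObjectType b)
{
    assert((int) a >= 0 && (int) a < DO_NUM_TYPES);
    assert((int) b >= 0 && (int) b < DO_NUM_TYPES);
    return type_priority[a] - type_priority[b];
}
```

      Designated initializers tie each entry to its object type by name, so reordering either enum cannot silently shift the table.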
  5. 11 Jan, 2021 3 commits