1. 17 Aug, 2020 1 commit
  2. 16 Aug, 2020 3 commits
  3. 15 Aug, 2020 7 commits
  4. 14 Aug, 2020 9 commits
    • snapshot scalability: Move subxact info to ProcGlobal, remove PGXACT. · 73487a60
      Andres Freund authored
      Similar to the previous changes this increases the chance that data
      frequently needed by GetSnapshotData() stays in L2 cache. In many
      workloads subtransactions are very rare, and this makes the check for
      that considerably cheaper.
      
      As this removes the last member of PGXACT, there is no need to keep it
      around anymore.
      
      On a larger 2 socket machine this and the two preceding commits result
      in a ~1.07x performance increase in read-only pgbench. For read-heavy
      mixed r/w workloads without row level contention, I see about 1.1x.
      
      Author: Andres Freund <andres@anarazel.de>
      Reviewed-By: Robert Haas <robertmhaas@gmail.com>
      Reviewed-By: Thomas Munro <thomas.munro@gmail.com>
      Reviewed-By: David Rowley <dgrowleyml@gmail.com>
      Discussion: https://postgr.es/m/20200301083601.ews6hz5dduc3w2se@alap3.anarazel.de
    • snapshot scalability: Move PGXACT->vacuumFlags to ProcGlobal->vacuumFlags. · 5788e258
      Andres Freund authored
      Similar to the previous commit this increases the chance that data
      frequently needed by GetSnapshotData() stays in L2 cache. As we now
      take care to not unnecessarily write to ProcGlobal->vacuumFlags, there
      should be very few modifications to the ProcGlobal->vacuumFlags array.
      
      Author: Andres Freund <andres@anarazel.de>
      Reviewed-By: Robert Haas <robertmhaas@gmail.com>
      Reviewed-By: Thomas Munro <thomas.munro@gmail.com>
      Reviewed-By: David Rowley <dgrowleyml@gmail.com>
      Discussion: https://postgr.es/m/20200301083601.ews6hz5dduc3w2se@alap3.anarazel.de
    • snapshot scalability: Introduce dense array of in-progress xids. · 941697c3
      Andres Freund authored
      The new array contains the xids for all connected backends / in-use
      PGPROC entries in a dense manner (in contrast to the PGPROC/PGXACT
      arrays which can have unused entries interspersed).
      
      This improves performance because GetSnapshotData() always needs to
      scan the xids of all live procarray entries and now there's no need to
      go through the procArray->pgprocnos indirection anymore.
      
      As the set of running top-level xids changes rarely, compared to the
      number of snapshots taken, this substantially increases the likelihood
      of most data required for a snapshot being in L2 cache.  In
      read-mostly workloads scanning the xids[] array will be sufficient to
      build a snapshot, as most backends will not have an xid assigned.
      
      To keep the xid array dense ProcArrayRemove() needs to move entries
      behind the to-be-removed proc's one further up in the array. Obviously
      moving array entries cannot happen while a backend sets its
      xid, i.e. locking needs to prevent array entries from being moved while
      a backend modifies its xid.
      
      To avoid locking ProcArrayLock in GetNewTransactionId() - a fairly hot
      spot already - ProcArrayAdd() / ProcArrayRemove() now need to hold
      XidGenLock in addition to ProcArrayLock. Adding or removing a procarray
      entry is not a very frequent operation, even taking 2PC into account.
      
      Due to the above, the dense array entries can only be read or modified
      while holding ProcArrayLock and/or XidGenLock. This prevents a
      concurrent ProcArrayRemove() from shifting the dense array while it is
      accessed concurrently.
      
      While the new dense array is very good when needing to look at all
      xids it is less suitable when accessing a single backend's xid. In
      particular it would be problematic to have to acquire a lock to access
      a backend's own xid. Therefore a backend's xid is not just stored in
      the dense array, but also in PGPROC. This also allows a backend to
      only access the shared xid value when the backend had acquired an
      xid.
      
      The infrastructure added in this commit will be used for the remaining
      PGXACT fields in subsequent commits. They are kept separate to make
      review easier.
      
      Author: Andres Freund <andres@anarazel.de>
      Reviewed-By: Robert Haas <robertmhaas@gmail.com>
      Reviewed-By: Thomas Munro <thomas.munro@gmail.com>
      Reviewed-By: David Rowley <dgrowleyml@gmail.com>
      Discussion: https://postgr.es/m/20200301083601.ews6hz5dduc3w2se@alap3.anarazel.de
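The dense-array bookkeeping described in this commit message can be illustrated with a small standalone sketch. This is not the PostgreSQL code: `DenseXids`, `dense_add`, `dense_remove`, and the fixed `MAX_PROCS` are hypothetical names chosen for illustration, and all locking (ProcArrayLock, XidGenLock) is omitted. It only shows how removing one entry shifts the later entries up and why each owner's recorded offset has to be adjusted afterwards.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_PROCS 8

typedef uint32_t TransactionId;

/* Hypothetical stand-in for the dense arrays; not PostgreSQL's layout. */
typedef struct DenseXids
{
    int           nprocs;               /* number of live entries */
    TransactionId xids[MAX_PROCS];      /* packed, no unused slots in between */
    int           owner[MAX_PROCS];     /* proc number owning slot i */
    int           offset_of[MAX_PROCS]; /* proc number -> slot, -1 if absent */
} DenseXids;

static void
dense_init(DenseXids *d)
{
    d->nprocs = 0;
    for (int i = 0; i < MAX_PROCS; i++)
        d->offset_of[i] = -1;
}

/* Append a proc at the end of the dense array and return its slot. */
static int
dense_add(DenseXids *d, int procno, TransactionId xid)
{
    assert(d->nprocs < MAX_PROCS && d->offset_of[procno] == -1);

    int slot = d->nprocs++;

    d->xids[slot] = xid;
    d->owner[slot] = procno;
    d->offset_of[procno] = slot;
    return slot;
}

/* Remove a proc, moving every later entry one slot further up. */
static void
dense_remove(DenseXids *d, int procno)
{
    int slot = d->offset_of[procno];

    assert(slot >= 0);

    size_t ntail = (size_t) (d->nprocs - slot - 1);

    memmove(&d->xids[slot], &d->xids[slot + 1], ntail * sizeof(TransactionId));
    memmove(&d->owner[slot], &d->owner[slot + 1], ntail * sizeof(int));
    d->nprocs--;
    d->offset_of[procno] = -1;

    /* The shifted entries now live one slot earlier; fix their offsets. */
    for (int i = slot; i < d->nprocs; i++)
        d->offset_of[d->owner[i]] = i;
}
```

A scan over `xids[0 .. nprocs-1]` then touches only live entries. The shifting in `dense_remove` is the step that, per the message above, must not run while another backend is writing its own slot.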
    • pg_dump: fix dependencies on FKs to partitioned tables · 2ba5b2db
      Alvaro Herrera authored
      Parallel-restoring a foreign key that references a partitioned table
      with several levels of partitions can fail:
      
      pg_restore: while PROCESSING TOC:
      pg_restore: from TOC entry 6684; 2606 29166 FK CONSTRAINT fk fk_a_fkey postgres
      pg_restore: error: could not execute query: ERROR:  there is no unique constraint matching given keys for referenced table "pk"
      Command was: ALTER TABLE fkpart3.fk
          ADD CONSTRAINT fk_a_fkey FOREIGN KEY (a) REFERENCES fkpart3.pk(a);
      
      This happens in parallel restore mode because some index partitions
      aren't yet attached to the topmost partitioned index that the FK uses,
      and so the index is still invalid.  The current code marks the FK as
      dependent on the first level of index-attach dump objects; the bug is
      fixed by recursively marking the FK as dependent on their children too.
      
      Backpatch to 12, where FKs to partitioned tables were introduced.
      Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
      Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
      Discussion: https://postgr.es/m/3170626.1594842723@sss.pgh.pa.us
      Backpatch: 12-master
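The recursive marking described above can be sketched as a walk over a tree of index-attach objects. The types and names here (`IndexAttach`, `FkDeps`, `add_attach_deps`) are hypothetical and do not correspond to pg_dump's data structures; the sketch only shows how every level of children ends up in the FK's dependency list rather than just the first.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DEPS 16
#define MAX_CHILDREN 4

/* Hypothetical index-attach dump object with its child attachments. */
typedef struct IndexAttach
{
    int                       id;        /* dump object id */
    int                       nchildren;
    const struct IndexAttach *children[MAX_CHILDREN];
} IndexAttach;

/* Hypothetical dependency list of one FK constraint. */
typedef struct FkDeps
{
    int n;
    int ids[MAX_DEPS];
} FkDeps;

/* Record a dependency on every attach object below 'node', at any depth. */
static void
add_attach_deps(FkDeps *fk, const IndexAttach *node)
{
    for (int i = 0; i < node->nchildren; i++)
    {
        const IndexAttach *child = node->children[i];

        assert(fk->n < MAX_DEPS);
        fk->ids[fk->n++] = child->id;
        add_attach_deps(fk, child);     /* descend into sub-partitions */
    }
}
```

Stopping after the first loop level (no recursive call) would reproduce the behavior the commit message calls buggy: deeper index partitions could still be unattached when the FK is restored.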
    • Fix obsolete comment in xlogutils.c. · 914140e8
      Peter Geoghegan authored
      Oversight in commit 2c03216d.
    • Fix postmaster's behavior during smart shutdown. · 0038f943
      Tom Lane authored
      Up to now, upon receipt of a SIGTERM ("smart shutdown" command), the
      postmaster has immediately killed all "optional" background processes,
      and subsequently refused to launch new ones while it's waiting for
      foreground client processes to exit.  No doubt this seemed like an OK
      policy at some point; but it's a pretty bad one now, because it makes
      for a seriously degraded environment for the remaining clients:
      
      * Parallel queries are killed, and new ones fail to launch. (And our
      parallel-query infrastructure utterly fails to deal with the case
      in a reasonable way --- it just hangs waiting for workers that are
      not going to arrive.  There is more work needed in that area IMO.)
      
      * Autovacuum ceases to function.  We can tolerate that for a while,
      but if bulk-update queries continue to run in the surviving client
      sessions, there's eventually going to be a mess.  In the worst case
      the system could reach a forced shutdown to prevent XID wraparound.
      
      * The bgwriter and walwriter are also stopped immediately, likely
      resulting in performance degradation.
      
      Hence, let's rearrange things so that the only immediate change in
      behavior is refusing to let in new normal connections.  Once the last
      normal connection is gone, shut everything down as though we'd received
      a "fast" shutdown.  To implement this, remove the PM_WAIT_BACKUP and
      PM_WAIT_READONLY states, instead staying in PM_RUN or PM_HOT_STANDBY
      while normal connections remain.  A subsidiary state variable tracks
      whether or not we're letting in new connections in those states.
      
      This also allows having just one copy of the logic for killing child
      processes in smart and fast shutdown modes.  I moved that logic into
      PostmasterStateMachine() by inventing a new state PM_STOP_BACKENDS.
      
      Back-patch to 9.6 where parallel query was added.  In principle
      this'd be a good idea in 9.5 as well, but the risk/reward ratio
      is not as good there, since lack of autovacuum is not a problem
      during typical uses of smart shutdown.
      
      Per report from Bharath Rupireddy.
      
      Patch by me, reviewed by Thomas Munro
      
      Discussion: https://postgr.es/m/CALj2ACXAZ5vKxT9P7P89D87i3MDO9bfS+_bjMHgnWJs8uwUOOw@mail.gmail.com
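The rearranged shutdown flow can be sketched as a tiny state machine. Only the state names PM_RUN and PM_STOP_BACKENDS come from the message above; the struct, the `conns_allowed` flag name, and the helper functions are assumptions made for illustration, and the real postmaster has many more states.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum PMState
{
    PM_RUN,
    PM_STOP_BACKENDS
} PMState;

typedef struct Postmaster
{
    PMState state;
    bool    conns_allowed;  /* hypothetical name for the subsidiary flag */
    int     normal_conns;   /* remaining normal client connections */
} Postmaster;

/* Once no normal connection remains after a smart shutdown, stop everything. */
static void
advance_state(Postmaster *pm)
{
    if (pm->state == PM_RUN && !pm->conns_allowed && pm->normal_conns == 0)
        pm->state = PM_STOP_BACKENDS;
}

/* Smart shutdown: the only immediate change is refusing new connections. */
static void
smart_shutdown(Postmaster *pm)
{
    pm->conns_allowed = false;
    advance_state(pm);
}

static bool
try_connect(Postmaster *pm)
{
    if (pm->state != PM_RUN || !pm->conns_allowed)
        return false;
    pm->normal_conns++;
    return true;
}

static void
disconnect(Postmaster *pm)
{
    assert(pm->normal_conns > 0);
    pm->normal_conns--;
    advance_state(pm);
}
```

While `state` stays PM_RUN, background processes in this model are simply left alone, which is the point of the change: existing sessions keep a normal environment until the last one exits.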
    • Fix typo in test comment. · 5bdf6945
      Heikki Linnakangas authored
    • Fix compilation warnings with libselinux 3.1 in contrib/sepgsql/ · 1f32136a
      Michael Paquier authored
      Upstream SELinux has recently marked security_context_t as officially
      deprecated, causing warnings with -Wdeprecated-declarations.  Upstream
      has considered this legacy code for some time, as
      security_context_t was removed from most of the code tree during the
      development of 2.3 back in 2014.
      
      This removes all the references to security_context_t in sepgsql/ to be
      consistent with SELinux, fixing the warnings.  Note that this does not
      impact the minimum version of libselinux supported.
      
      Reviewed-by: Tom Lane
      Discussion: https://postgr.es/m/20200813012735.GC11663@paquier.xyz
    • Doc: improve examples for json_populate_record() and related functions. · a9306f10
      Tom Lane authored
      Make these examples self-contained by providing declarations of the
      user-defined row types they rely on.  There wasn't room to do this
      in the old doc format, but now there is, and I think it makes the
      examples a good bit less confusing.
  5. 13 Aug, 2020 3 commits
  6. 12 Aug, 2020 4 commits
    • snapshot scalability: Don't compute global horizons while building snapshots. · dc7420c2
      Andres Freund authored
      To make GetSnapshotData() more scalable, it cannot look at each proc's
      xmin: While snapshot contents do not need to change whenever a read-only
      transaction commits or a snapshot is released, a proc's xmin is modified in
      those cases. The frequency of xmin modifications leads to, particularly on
      higher core count systems, many cache misses inside GetSnapshotData(), despite
      the data underlying a snapshot not changing. That is the most
      significant source of GetSnapshotData() scaling poorly on larger systems.
      
      Without accessing xmins, GetSnapshotData() cannot calculate accurate horizons /
      thresholds as it has so far. But we don't really have to: The horizons don't
      actually change that much between GetSnapshotData() calls. Nor are the horizons
      actually used every time a snapshot is built.
      
      The trick this commit introduces is to delay computation of accurate horizons
      until their use, and to use horizon boundaries to determine whether accurate
      horizons need to be computed.
      
      The use of RecentGlobal[Data]Xmin to decide whether a row version could be
      removed has been replaced with new GlobalVisTest* functions.  These use two
      thresholds to determine whether a row can be pruned:
      1) definitely_needed, indicating that rows deleted by XIDs >= definitely_needed
         are definitely still visible.
      2) maybe_needed, indicating that rows deleted by XIDs < maybe_needed can
         definitely be removed.
      GetSnapshotData() updates definitely_needed to be the xmin of the computed
      snapshot.
      
      When testing whether a row can be removed (with GlobalVisTestIsRemovableXid())
      and the tested XID falls in between the two (i.e. XID >= maybe_needed && XID <
      definitely_needed) the boundaries can be recomputed to be more accurate. As it
      is not cheap to compute accurate boundaries, we limit the number of times that
      happens in short succession.  As the boundaries used by
      GlobalVisTestIsRemovableXid() are never reset (with maybe_needed updated by
      GetSnapshotData()), it is likely that further tests can benefit from an earlier
      computation of accurate horizons.
      
      To avoid regressing performance when old_snapshot_threshold is set (as that
      requires an accurate horizon to be computed), heap_page_prune_opt() doesn't
      unconditionally call TransactionIdLimitedForOldSnapshots() anymore. Both the
      computation of the limited horizon, and the triggering of errors (with
      SetOldSnapshotThresholdTimestamp()) are now only done when necessary to remove
      tuples.
      
      This commit just removes the accesses to PGXACT->xmin from
      GetSnapshotData(), but other members of PGXACT residing in the same
      cache line are accessed. Therefore this in itself does not result in a
      significant improvement. Subsequent commits will take advantage of the
      fact that GetSnapshotData() now does not need to access xmins anymore.
      
      Note: This contains a workaround in heap_page_prune_opt() to keep the
      snapshot_too_old tests working. While that workaround is ugly, the tests
      currently are not meaningful, and it seems best to address them separately.
      
      Author: Andres Freund <andres@anarazel.de>
      Reviewed-By: Robert Haas <robertmhaas@gmail.com>
      Reviewed-By: Thomas Munro <thomas.munro@gmail.com>
      Reviewed-By: David Rowley <dgrowleyml@gmail.com>
      Discussion: https://postgr.es/m/20200301083601.ews6hz5dduc3w2se@alap3.anarazel.de
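The two-threshold test described in this commit message can be sketched as follows. This is a simplified illustration, not the GlobalVisTest* implementation: `VisState`, `is_removable`, and `recompute_horizon` are hypothetical names, XIDs are modeled as plain 64-bit integers so ordinary comparison is safe, the accurate horizon is passed in by the caller instead of being computed from shared state, and the rate limiting of recomputations mentioned above is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t FullXid;   /* 64-bit xid, so no wraparound handling here */

typedef struct VisState
{
    FullXid maybe_needed;       /* XIDs below this can definitely be removed */
    FullXid definitely_needed;  /* XIDs at or above this are still needed */
    int     recomputations;     /* how often the expensive path ran */
} VisState;

/* Hypothetical slow path: install a freshly computed accurate horizon. */
static void
recompute_horizon(VisState *s, FullXid accurate)
{
    assert(accurate >= s->maybe_needed && accurate <= s->definitely_needed);
    s->maybe_needed = accurate;
    s->recomputations++;
}

/* Decide whether rows deleted by 'xid' may be pruned. */
static bool
is_removable(VisState *s, FullXid xid, FullXid accurate_horizon)
{
    if (xid < s->maybe_needed)
        return true;            /* cheap answer: removable */
    if (xid >= s->definitely_needed)
        return false;           /* cheap answer: still needed */

    /* In between the boundaries: only now pay for an accurate horizon. */
    recompute_horizon(s, accurate_horizon);
    return xid < s->maybe_needed;
}
```

Because `maybe_needed` keeps its raised value, a later test for an XID below the new boundary takes the cheap path, which mirrors the remark above that further tests can benefit from an earlier computation.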
    • BRIN: Handle concurrent desummarization properly · 1f42d35a
      Alvaro Herrera authored
      If a page range is desummarized at just the right time concurrently with
      an index walk, BRIN would raise an error indicating index corruption.
      This is scary and unhelpful; silently returning that the page range is
      not summarized is a sufficient reaction.
      
      This bug was introduced by commit 975ad4e6 as additional protection
      against a bug whose actual fix was elsewhere.  Backpatch equally.
      Reported-By: Anastasia Lubennikova <a.lubennikova@postgrespro.ru>
      Diagnosed-By: Alexander Lakhin <exclusion@gmail.com>
      Discussion: https://postgr.es/m/2588667e-d07d-7e10-74e2-7e1e46194491@postgrespro.ru
      Backpatch: 9.5 - master
    • Improve comments for postmaster.c's BackendList. · 3546cf8a
      Tom Lane authored
      This had gotten a little disjointed over time, and some of the grammar
      was sloppy.  Rewrite for more clarity.
      
      In passing, re-pgindent some recently added comments.
      
      No code changes.
    • Track latest completed xid as a FullTransactionId. · 3bd7f996
      Andres Freund authored
      The reason for doing so is that a subsequent commit will need that to
      avoid wraparound issues. As the subsequent change is large this was
      split out for easier review.
      
      The reason this is not a perfectly straightforward change is that we do
      not want to track 64bit xids in the procarray or the WAL. Therefore we
      need to advance latestCompletedXid in relation to 32 bit xids. The
      code for that is now centralized in MaintainLatestCompletedXid*.
      
      Author: Andres Freund
      Reviewed-By: Thomas Munro, Robert Haas, David Rowley
      Discussion: https://postgr.es/m/20200301083601.ews6hz5dduc3w2se@alap3.anarazel.de
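Advancing a 64-bit value "in relation to" 32-bit xids can be sketched like this. The function names are hypothetical and this is not the MaintainLatestCompletedXid* code: it assumes the 32-bit xid lies within 2^31 of the 64-bit reference, and it ignores the special (non-normal) xid values that PostgreSQL reserves.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Widen a 32-bit xid to 64 bits, assuming it lies within 2^31 of the
 * 64-bit reference value 'rel' (before or after it).
 */
static uint64_t
full_xid_relative_to(uint64_t rel, uint32_t xid)
{
    uint32_t rel32 = (uint32_t) rel;
    uint32_t ahead = (uint32_t) (xid - rel32);  /* forward distance mod 2^32 */

    if (ahead < UINT32_C(0x80000000))
        return rel + ahead;                     /* xid is at or after rel */
    return rel - (uint32_t) (rel32 - xid);      /* xid is before rel */
}

/* Move the 64-bit "latest completed" value forward, never backward. */
static void
advance_latest_completed(uint64_t *latest, uint32_t xid)
{
    uint64_t full = full_xid_relative_to(*latest, xid);

    assert(latest != 0);
    if (full > *latest)
        *latest = full;
}
```

The epoch (upper 32 bits) is never stored alongside the 32-bit xid; it is inferred from the reference value, which is why the procarray and the WAL in this model can keep using 32-bit xids.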
  7. 11 Aug, 2020 2 commits
  8. 10 Aug, 2020 5 commits
    • Replace remaining StrNCpy() by strlcpy() · 1784f278
      Peter Eisentraut authored
      They are equivalent, except that StrNCpy() zero-fills the entire
      destination buffer instead of providing just one trailing zero.  For
      all but a tiny number of callers, that's just overhead rather than
      being desirable.
      
      Remove StrNCpy() as it is now unused.
      
      In some cases, namestrcpy() is the more appropriate function to use.
      While we're here, simplify the API of namestrcpy(): Remove the return
      value, don't check for NULL input.  Nothing was using that anyway.
      Also, remove a few unused name-related functions.
      Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
      Discussion: https://www.postgresql.org/message-id/flat/44f5e198-36f6-6cdb-7fa9-60e34784daae%402ndquadrant.com
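The difference between the two copy styles can be made concrete with a short sketch. Both helpers are hypothetical stand-ins written for illustration: `copy_zero_fill` mimics the zero-filling behavior the message attributes to StrNCpy(), and `copy_one_terminator` follows strlcpy() semantics without relying on the platform providing strlcpy().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Zero-filling copy: every byte of dst after the string becomes '\0'. */
static void
copy_zero_fill(char *dst, const char *src, size_t len)
{
    assert(len > 0);
    strncpy(dst, src, len);     /* pads the remainder of dst with zeros */
    dst[len - 1] = '\0';        /* guarantee termination on truncation */
}

/*
 * strlcpy()-style copy: writes exactly one terminator and leaves the rest
 * of dst untouched.  Returns strlen(src) so callers can detect truncation.
 */
static size_t
copy_one_terminator(char *dst, const char *src, size_t len)
{
    size_t srclen = strlen(src);

    if (len > 0)
    {
        size_t n = (srclen < len - 1) ? srclen : len - 1;

        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return srclen;
}
```

For a short string in a large buffer, the first helper writes the whole buffer while the second writes only the string plus one byte, which is the overhead the commit message refers to. Callers that depend on the padding (for example when the whole buffer is later compared or hashed) are the "tiny number" for which zero-filling is still wanted.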
    • Document clashes between logical replication and untrusted users. · cec57b1a
      Noah Misch authored
      Back-patch to v10, which introduced logical replication.
      
      Security: CVE-2020-14349
    • Empty search_path in logical replication apply worker and walsender. · 11da9702
      Noah Misch authored
      This is like CVE-2018-1058 commit 582edc36.  Today, a malicious user of a
      publisher or subscriber database can invoke arbitrary SQL functions
      under an identity running replication, often a superuser.  This fix may
      cause "does not exist" or "no schema has been selected to create in"
      errors in a replication process.  After upgrading, consider watching
      server logs for these errors.  Objects accruing schema qualification in
      the wake of the earlier commit are unlikely to need further correction.
      Back-patch to v10, which introduced logical replication.
      
      Security: CVE-2020-14349
    • Move connect.h from fe_utils to src/include/common. · e078fb5d
      Noah Misch authored
      Any libpq client can use the header.  Such clients include the backend
      components postgres_fdw, dblink, and the logical replication apply
      worker.  Back-patch
      to v10, because another fix needs this.  In released branches, just copy
      the header and keep the original.
    • Make contrib modules' installation scripts more secure. · 7eeb1d98
      Tom Lane authored
      Hostile objects located within the installation-time search_path could
      capture references in an extension's installation or upgrade script.
      If the extension is being installed with superuser privileges, this
      opens the door to privilege escalation.  While such hazards have existed
      all along, their urgency increases with the v13 "trusted extensions"
      feature, because that lets a non-superuser control the installation path
      for a superuser-privileged script.  Therefore, make a number of changes
      to make such situations more secure:
      
      * Tweak the construction of the installation-time search_path to ensure
      that references to objects in pg_catalog can't be subverted; and
      explicitly add pg_temp to the end of the path to prevent attacks using
      temporary objects.
      
      * Disable check_function_bodies within installation/upgrade scripts,
      so that any security gaps in SQL-language or PL-language function bodies
      cannot create a risk of unwanted installation-time code execution.
      
      * Adjust lookup of type input/receive functions and join estimator
      functions to complain if there are multiple candidate functions.  This
      prevents capture of references to functions whose signature is not the
      first one checked; and it's arguably more user-friendly anyway.
      
      * Modify various contrib upgrade scripts to ensure that catalog
      modification queries are executed with secure search paths.  (These
      are in-place modifications with no extension version changes, since
      it is the update process itself that is at issue, not the end result.)
      
      Extensions that depend on other extensions cannot be made fully secure
      by these methods alone; therefore, revert the "trusted" marking that
      commit eb67623c applied to earthdistance and hstore_plperl, pending
      some better solution to that set of issues.
      
      Also add documentation around these issues, to help extension authors
      write secure installation scripts.
      
      Patch by me, following an observation by Andres Freund; thanks
      to Noah Misch for review.
      
      Security: CVE-2020-14350
  9. 09 Aug, 2020 3 commits
    • Correct nbtree page split lock coupling comment. · d129c074
      Peter Geoghegan authored
      There is no reason to distinguish between readers and writers here.
    • Check for fseeko() failure in pg_dump's _tarAddFile(). · 1b9cde51
      Tom Lane authored
      Coverity pointed out, not unreasonably, that we checked fseeko's
      result at every other call site but these.  Failure to seek in the
      temp file (note this is NOT pg_dump's output file) seems quite
      unlikely, and even if it did happen the file length cross-check
      further down would probably detect the problem.  Still, that's a
      poor excuse for not checking the result of a system call.
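A minimal sketch of the kind of check this commit adds, assuming a POSIX system. `file_length` is a hypothetical helper, not the code in _tarAddFile(); it measures a file by seeking to its end and rewinding, and reports failure if any seek or tell call fails instead of silently continuing.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>

/*
 * Store the length of 'fp' in *len and rewind it.
 * Returns 0 on success, -1 if any seek or tell call fails.
 */
static int
file_length(FILE *fp, off_t *len)
{
    assert(fp != NULL && len != NULL);

    if (fseeko(fp, 0, SEEK_END) != 0)
        return -1;              /* do not trust ftello() after a failed seek */

    *len = ftello(fp);
    if (*len < 0)
        return -1;

    if (fseeko(fp, 0, SEEK_SET) != 0)
        return -1;

    return 0;
}
```

A caller would typically report the failure (errno is set by the failing call) and abort, rather than rely on a later cross-check to notice the problem.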
    • Remove useless Assert. · 1c164ef3
      Tom Lane authored
      Testing that an unsigned variable is >= 0 is pretty pointless,
      as noted by Coverity and numerous buildfarm members.
      
      In passing, add comment about new uses of "volatile" --- Coverity
      doesn't much like that either, but it seems probably necessary.
  10. 08 Aug, 2020 3 commits
    • Remove <@ from contrib/intarray's GiST operator classes. · 20e7e1fe
      Tom Lane authored
      Since commit efc77cf5, an indexed query using <@ has required a
      full-index scan, so that it actually performs worse than a plain seqscan
      would do.  As I noted at the time, we'd be better off to not treat <@ as
      being indexable by such indexes at all; and that's what this patch does.
      
      It would have been difficult to remove these opclass members without
      dropping the whole opclass before commit 9f968278 fixed GiST opclass
      member dependency rules, but now it's quite simple, so let's do it.
      
      I left the existing support code in place for the time being, with
      comments noting it's now unreachable.  At some point, perhaps we should
      remove that code in favor of throwing an error telling people to upgrade
      the extension version.
      
      Discussion: https://postgr.es/m/2176979.1596389859@sss.pgh.pa.us
      Discussion: https://postgr.es/m/458.1565114141@sss.pgh.pa.us
    • Teach amcheck to verify sibling links in all cases. · 39132b78
      Peter Geoghegan authored
      Teach contrib/amcheck's bt_index_check() function to check agreement
      between sibling links.  The left sibling's right link should point to a
      right sibling page whose left link points back to the same original left
      sibling.  This extends a check that bt_index_parent_check() always
      performed to bt_index_check().
      
      This is the first time amcheck has been taught to perform buffer lock
      coupling, which we have explicitly avoided up until now.  The sibling
      link check tends to catch a lot of real world index corruption with
      little overhead, so it seems worth accepting the complexity.  Note that
      the new lock coupling logic would not work correctly on replica servers
      without the changes made by commits 0a7d771f and 9a9db08a (there could
      be false positives without those changes).
      
      Author: Andrey Borodin, Peter Geoghegan
      Discussion: https://postgr.es/m/0EB0CFA8-CBD8-4296-8049-A2C0F28FAE8C@yandex-team.ru
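The sibling-link invariant stated above can be sketched over an in-memory array of pages. This is an illustration only: `Page`, `NO_PAGE`, and `sibling_links_agree` are hypothetical, the pages are plain structs rather than B-tree buffers, and the buffer lock coupling that the real check needs is not modeled.

```c
#include <assert.h>
#include <stdbool.h>

#define NO_PAGE (-1)

/* Hypothetical page on one B-tree level, holding only its sibling links. */
typedef struct Page
{
    int left;   /* index of left sibling, or NO_PAGE */
    int right;  /* index of right sibling, or NO_PAGE */
} Page;

/*
 * Walk one level left to right and verify that each page's right sibling
 * points back to it with its left link.
 */
static bool
sibling_links_agree(const Page *pages, int npages, int leftmost)
{
    int visited = 0;

    assert(pages != 0);
    for (int cur = leftmost; cur != NO_PAGE; cur = pages[cur].right)
    {
        int right = pages[cur].right;

        if (++visited > npages)
            return false;       /* more steps than pages: links form a cycle */
        if (right != NO_PAGE && pages[right].left != cur)
            return false;       /* right sibling does not point back */
    }
    return true;
}
```

In a live index the two pages can change between reading one and reading the other, which is why the commit message says the real check has to hold locks on both siblings at once.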
    • walsnd: Don't set waiting_for_ping_response spuriously · 470687b4
      Alvaro Herrera authored
      Ashutosh Bapat noticed that when logical walsender needs to wait for
      WAL, and it realizes that it must send a keepalive message to
      walreceiver to update the sent-LSN, which *does not* request a reply
      from walreceiver, it wrongly sets the flag that it's going to wait for
      that reply.  That means that any future would-be sender of feedback
      messages ends up not sending a feedback message, because they all
      believe that a reply is expected.
      
      With built-in logical replication there's not much harm in this, because
      WalReceiverMain will send a ping-back every wal_receiver_timeout/2
      anyway; but with other logical replication systems (e.g. pglogical) it
      can cause significant pain.
      
      This problem was introduced in commit 41d5f8ad, where the
      request-reply flag passed to WalSndKeepalive was changed from true to
      false without at the same time removing the line that sets
      waiting_for_ping_response.
      
      Just removing that line would be a sufficient fix, but it seems better
      to shift the responsibility of setting the flag to WalSndKeepalive
      itself instead of requiring the caller to do it; this is clearly less
      error-prone.
      
      Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
      Reported-by: Ashutosh Bapat <ashutosh.bapat@2ndquadrant.com>
      Backpatch: 9.5 and up
      Discussion: https://postgr.es/m/20200806225558.GA22401@alvherre.pgsql
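The design choice in the last paragraph of the message, letting the keepalive routine own the flag, can be sketched as below. `Sender` and `send_keepalive` are hypothetical; only the flag name waiting_for_ping_response comes from the commit message, and the message write is replaced by a counter.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sender state; the counter stands in for real message I/O. */
typedef struct Sender
{
    bool waiting_for_ping_response;
    int  messages_sent;
} Sender;

/*
 * Send a keepalive.  The flag is set here, and only when a reply was
 * actually requested, so no caller can set it by mistake.
 */
static void
send_keepalive(Sender *s, bool request_reply)
{
    assert(s != 0);
    s->messages_sent++;
    if (request_reply)
        s->waiting_for_ping_response = true;
}
```

With the flag tied to the `request_reply` argument, changing a call site from true to false cannot leave a stale assignment behind, which is the failure mode the message describes.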