1. 19 Aug, 2020 1 commit
    • Add pg_backend_memory_contexts system view. · 3e98c0ba
      Fujii Masao authored
      This view displays the memory usage of all the memory contexts of the
      server process attached to the current session. This information is useful
      for investigating the cause of backend-local memory bloat.
      
      This information can also be collected by calling
      MemoryContextStats(TopMemoryContext) via a debugger. But this technique
      cannot be used in some environments because no debugger is available there.
      It also outputs a lot of text, which is not easy to analyze.
      So, the pg_backend_memory_contexts view allows us to access backend-local
      memory context information more easily.
      
      Bump catalog version.
      
      Author: Atsushi Torikoshi, Fujii Masao
      Reviewed-by: Tatsuhito Kasahara, Andres Freund, Daniel Gustafsson, Robert Haas, Michael Paquier
      Discussion: https://postgr.es/m/72a656e0f71d0860161e0b3f67e4d771@oss.nttdata.com
  2. 18 Aug, 2020 5 commits
    • Fix race condition in snapshot caching when 2PC is used. · 07f32fcd
      Andres Freund authored
      When preparing a transaction xactCompletionCount needs to be
      incremented, even though the transaction has not committed
      yet. Otherwise the snapshot used within the transaction can
      get reused outside of the prepared transaction. As GetSnapshotData()
      does not include the current xid when building a snapshot, reuse would
      not be correct.
      
      Somewhat surprisingly the regression tests only rarely show incorrect
      results without the fix. The reason for that is that often the
      snapshot's xmax will be >= the backend xid, yielding a snapshot that
      is correct, despite the bug.
      
      I'm working on a reliable test for the bug, but it seems worth seeing
      whether this fixes all the BF failures while I do.
      
      Author: Andres Freund <andres@anarazel.de>
      Discussion: https://postgr.es/m/E1k7tGP-0005V0-5k@gemulon.postgresql.org
    • Avoid non-constant format string argument to fprintf(). · 73447820
      Heikki Linnakangas authored
      As Tom Lane pointed out, it could defeat the compiler's printf() format
      string verification.
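
The pattern at issue can be sketched as follows. This is a minimal illustration, not the patched PostgreSQL code: `write_message` is a hypothetical helper invented for this example.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper (not PostgreSQL code): writes a message whose text is
 * only known at run time.
 *
 * Passing msg directly as the format, i.e. fprintf(out, msg), would hide the
 * format from the compiler's printf() checks and would misbehave if msg
 * happened to contain '%'.  A constant "%s" format keeps the call checkable.
 *
 * Returns the number of characters written, or a negative value on error.
 */
int
write_message(FILE *out, const char *msg)
{
    assert(out != NULL && msg != NULL);
    return fprintf(out, "%s\n", msg);
}
```

With the constant format, a message such as `"100%s done"` is printed literally rather than being interpreted as a conversion specification.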
      
      Backpatch to v12, like the patch that introduced it.
      
      Discussion: https://www.postgresql.org/message-id/1069283.1597672779%40sss.pgh.pa.us
    • snapshot scalability: cache snapshots using a xact completion counter. · 623a9ba7
      Andres Freund authored
      Previous commits made it faster/more scalable to compute snapshots. But not
      building a snapshot is still faster. Now that GetSnapshotData() does not
      maintain RecentGlobal* anymore, that is actually not too hard:
      
      This commit introduces xactCompletionCount, which tracks the number of
      top-level transactions with xids (i.e. which may have modified the database)
      that completed in some form since the start of the server.
      
      We can avoid rebuilding the snapshot's contents whenever the current
      xactCompletionCount is the same as it was when the snapshot was
      originally built.  Currently this check happens while holding
      ProcArrayLock. While it's likely possible to perform the check without
      acquiring ProcArrayLock, it seems better to do that separately /
      later, as some careful analysis is required. Even with the lock this is a
      significant win on its own.
      
      On a smaller two socket machine this gains another ~1.03x, on a larger
      machine the effect is roughly double (earlier patch version tested
      though).  If we were able to safely avoid the lock there'd be another
      significant gain on top of that.
      
      Author: Andres Freund <andres@anarazel.de>
      Reviewed-By: Robert Haas <robertmhaas@gmail.com>
      Reviewed-By: Thomas Munro <thomas.munro@gmail.com>
      Reviewed-By: David Rowley <dgrowleyml@gmail.com>
      Discussion: https://postgr.es/m/20200301083601.ews6hz5dduc3w2se@alap3.anarazel.de
    • Fix use-after-release issue in PL/Sample · 51300b45
      Michael Paquier authored
      Introduced in adbe62d0.  Per buildfarm member prion, when using
      RELCACHE_FORCE_RELEASE.
    • Add PL/Sample to src/test/modules/ · adbe62d0
      Michael Paquier authored
      PL/Sample is an example template of procedural-language handler.  This
      can be used as a base to implement a custom PL, or as a facility to test
      APIs dedicated to PLs.  Much more could be done in this module, like
      adding a simple validator, but this is left as future work.
      
      The documentation originally included some C code to help understand the
      basics of PL handler implementation, but it was outdated, and not really
      helpful either if trying to implement a new procedural language,
      particularly when it came to the integration of a PL installation with
      CREATE EXTENSION.
      
      Author: Mark Wong
      Reviewed-by: Tom Lane, Michael Paquier
      Discussion: https://postgr.es/m/20200612172648.GA3327@2ndQuadrant.com
  3. 17 Aug, 2020 6 commits
  4. 16 Aug, 2020 3 commits
  5. 15 Aug, 2020 7 commits
  6. 14 Aug, 2020 9 commits
    • snapshot scalability: Move subxact info to ProcGlobal, remove PGXACT. · 73487a60
      Andres Freund authored
      Similar to the previous changes this increases the chance that data
      frequently needed by GetSnapshotData() stays in L2 cache. In many
      workloads subtransactions are very rare, and this makes the check for
      that considerably cheaper.
      
      As this removes the last member of PGXACT, there is no need to keep it
      around anymore.
      
      On a larger 2 socket machine this and the two preceding commits result
      in a ~1.07x performance increase in read-only pgbench. For read-heavy
      mixed r/w workloads without row level contention, I see about 1.1x.
      
      Author: Andres Freund <andres@anarazel.de>
      Reviewed-By: Robert Haas <robertmhaas@gmail.com>
      Reviewed-By: Thomas Munro <thomas.munro@gmail.com>
      Reviewed-By: David Rowley <dgrowleyml@gmail.com>
      Discussion: https://postgr.es/m/20200301083601.ews6hz5dduc3w2se@alap3.anarazel.de
    • snapshot scalability: Move PGXACT->vacuumFlags to ProcGlobal->vacuumFlags. · 5788e258
      Andres Freund authored
      Similar to the previous commit this increases the chance that data
      frequently needed by GetSnapshotData() stays in L2 cache. As we now
      take care to not unnecessarily write to ProcGlobal->vacuumFlags, there
      should be very few modifications to the ProcGlobal->vacuumFlags array.
      
      Author: Andres Freund <andres@anarazel.de>
      Reviewed-By: Robert Haas <robertmhaas@gmail.com>
      Reviewed-By: Thomas Munro <thomas.munro@gmail.com>
      Reviewed-By: David Rowley <dgrowleyml@gmail.com>
      Discussion: https://postgr.es/m/20200301083601.ews6hz5dduc3w2se@alap3.anarazel.de
    • snapshot scalability: Introduce dense array of in-progress xids. · 941697c3
      Andres Freund authored
      The new array contains the xids for all connected backends / in-use
      PGPROC entries in a dense manner (in contrast to the PGPROC/PGXACT
      arrays which can have unused entries interspersed).
      
      This improves performance because GetSnapshotData() always needs to
      scan the xids of all live procarray entries and now there's no need to
      go through the procArray->pgprocnos indirection anymore.
      
      As the set of running top-level xids changes rarely, compared to the
      number of snapshots taken, this substantially increases the likelihood
      of most data required for a snapshot being in L2 cache.  In
      read-mostly workloads scanning the xids[] array will be sufficient to
      build a snapshot, as most backends will not have an xid assigned.
      
      To keep the xid array dense ProcArrayRemove() needs to move entries
      behind the to-be-removed proc's one further up in the array. Obviously
      moving array entries cannot happen while a backend sets its
      xid. I.e. locking needs to prevent array entries from being moved while
      a backend modifies its xid.
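
The "keep the array dense on removal" idea can be sketched with a toy array. The names (`ToyDenseXids`, `toy_dense_add`, `toy_dense_remove`) and the fixed capacity are assumptions made for this example, and the locking that the commit message requires is only noted in a comment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_MAX_PROCS 8             /* arbitrary capacity for the sketch */

/* Toy dense array: entries 0 .. nprocs-1 are always in use, with no gaps. */
typedef struct ToyDenseXids
{
    size_t      nprocs;
    uint32_t    xids[TOY_MAX_PROCS];
} ToyDenseXids;

/* Append an entry; returns its index, or -1 if the array is full. */
int
toy_dense_add(ToyDenseXids *arr, uint32_t xid)
{
    assert(arr->nprocs <= TOY_MAX_PROCS);
    if (arr->nprocs == TOY_MAX_PROCS)
        return -1;
    arr->xids[arr->nprocs] = xid;
    return (int) arr->nprocs++;
}

/*
 * Remove the entry at idx by shifting all later entries one slot toward the
 * front, so that no unused slot is left behind.  In a real concurrent system
 * the caller would have to hold the locks that keep other backends from
 * touching their entries while they move.
 *
 * Returns 0 on success, -1 if idx is out of range.
 */
int
toy_dense_remove(ToyDenseXids *arr, size_t idx)
{
    assert(arr->nprocs <= TOY_MAX_PROCS);
    if (idx >= arr->nprocs)
        return -1;
    memmove(&arr->xids[idx], &arr->xids[idx + 1],
            (arr->nprocs - idx - 1) * sizeof(arr->xids[0]));
    arr->nprocs--;
    return 0;
}
```

A scan over `xids[0 .. nprocs-1]` then touches only live entries, without any index indirection.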
      
      To avoid locking ProcArrayLock in GetNewTransactionId() - a fairly hot
      spot already - ProcArrayAdd() / ProcArrayRemove() now need to hold
      XidGenLock in addition to ProcArrayLock. Adding / Removing a procarray
      entry is not a very frequent operation, even taking 2PC into account.
      
      Due to the above, the dense array entries can only be read or modified
      while holding ProcArrayLock and/or XidGenLock. This prevents a
      concurrent ProcArrayRemove() from shifting the dense array while it is
      accessed concurrently.
      
      While the new dense array is very good when needing to look at all
      xids it is less suitable when accessing a single backend's xid. In
      particular it would be problematic to have to acquire a lock to access
      a backend's own xid. Therefore a backend's xid is not just stored in
      the dense array, but also in PGPROC. This also allows a backend to
      only access the shared xid value when the backend has acquired an
      xid.
      
      The infrastructure added in this commit will be used for the remaining
      PGXACT fields in subsequent commits. They are kept separate to make
      review easier.
      
      Author: Andres Freund <andres@anarazel.de>
      Reviewed-By: Robert Haas <robertmhaas@gmail.com>
      Reviewed-By: Thomas Munro <thomas.munro@gmail.com>
      Reviewed-By: David Rowley <dgrowleyml@gmail.com>
      Discussion: https://postgr.es/m/20200301083601.ews6hz5dduc3w2se@alap3.anarazel.de
    • pg_dump: fix dependencies on FKs to partitioned tables · 2ba5b2db
      Alvaro Herrera authored
      Parallel-restoring a foreign key that references a partitioned table
      with several levels of partitions can fail:
      
      pg_restore: while PROCESSING TOC:
      pg_restore: from TOC entry 6684; 2606 29166 FK CONSTRAINT fk fk_a_fkey postgres
      pg_restore: error: could not execute query: ERROR:  there is no unique constraint matching given keys for referenced table "pk"
      Command was: ALTER TABLE fkpart3.fk
          ADD CONSTRAINT fk_a_fkey FOREIGN KEY (a) REFERENCES fkpart3.pk(a);
      
      This happens in parallel restore mode because some index partitions
      aren't yet attached to the topmost partitioned index that the FK uses,
      and so the index is still invalid.  The current code marks the FK as
      dependent on the first level of index-attach dump objects; the bug is
      fixed by recursively marking the FK as dependent on their children.
      
      Backpatch to 12, where FKs to partitioned tables were introduced.
      Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
      Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
      Discussion: https://postgr.es/m/3170626.1594842723@sss.pgh.pa.us
      Backpatch: 12-master
    • Fix obsolete comment in xlogutils.c. · 914140e8
      Peter Geoghegan authored
      Oversight in commit 2c03216d.
    • Fix postmaster's behavior during smart shutdown. · 0038f943
      Tom Lane authored
      Up to now, upon receipt of a SIGTERM ("smart shutdown" command), the
      postmaster has immediately killed all "optional" background processes,
      and subsequently refused to launch new ones while it's waiting for
      foreground client processes to exit.  No doubt this seemed like an OK
      policy at some point; but it's a pretty bad one now, because it makes
      for a seriously degraded environment for the remaining clients:
      
      * Parallel queries are killed, and new ones fail to launch. (And our
      parallel-query infrastructure utterly fails to deal with the case
      in a reasonable way --- it just hangs waiting for workers that are
      not going to arrive.  There is more work needed in that area IMO.)
      
      * Autovacuum ceases to function.  We can tolerate that for a while,
      but if bulk-update queries continue to run in the surviving client
      sessions, there's eventually going to be a mess.  In the worst case
      the system could reach a forced shutdown to prevent XID wraparound.
      
      * The bgwriter and walwriter are also stopped immediately, likely
      resulting in performance degradation.
      
      Hence, let's rearrange things so that the only immediate change in
      behavior is refusing to let in new normal connections.  Once the last
      normal connection is gone, shut everything down as though we'd received
      a "fast" shutdown.  To implement this, remove the PM_WAIT_BACKUP and
      PM_WAIT_READONLY states, instead staying in PM_RUN or PM_HOT_STANDBY
      while normal connections remain.  A subsidiary state variable tracks
      whether or not we're letting in new connections in those states.
      
      This also allows having just one copy of the logic for killing child
      processes in smart and fast shutdown modes.  I moved that logic into
      PostmasterStateMachine() by inventing a new state PM_STOP_BACKENDS.
      
      Back-patch to 9.6 where parallel query was added.  In principle
      this'd be a good idea in 9.5 as well, but the risk/reward ratio
      is not as good there, since lack of autovacuum is not a problem
      during typical uses of smart shutdown.
      
      Per report from Bharath Rupireddy.
      
      Patch by me, reviewed by Thomas Munro
      
      Discussion: https://postgr.es/m/CALj2ACXAZ5vKxT9P7P89D87i3MDO9bfS+_bjMHgnWJs8uwUOOw@mail.gmail.com
    • Fix typo in test comment. · 5bdf6945
      Heikki Linnakangas authored
    • Fix compilation warnings with libselinux 3.1 in contrib/sepgsql/ · 1f32136a
      Michael Paquier authored
      Upstream SELinux has recently marked security_context_t as officially
      deprecated, causing warnings with -Wdeprecated-declarations.  Upstream
      has considered this legacy code for some time now, as
      security_context_t got removed from most of the code tree during the
      development of 2.3 back in 2014.
      
      This removes all the references to security_context_t in sepgsql/ to be
      consistent with SELinux, fixing the warnings.  Note that this does not
      impact the minimum version of libselinux supported.
      
      Reviewed-by: Tom Lane
      Discussion: https://postgr.es/m/20200813012735.GC11663@paquier.xyz
    • Doc: improve examples for json_populate_record() and related functions. · a9306f10
      Tom Lane authored
      Make these examples self-contained by providing declarations of the
      user-defined row types they rely on.  There wasn't room to do this
      in the old doc format, but now there is, and I think it makes the
      examples a good bit less confusing.
  7. 13 Aug, 2020 3 commits
  8. 12 Aug, 2020 4 commits
    • snapshot scalability: Don't compute global horizons while building snapshots. · dc7420c2
      Andres Freund authored
      To make GetSnapshotData() more scalable, it cannot look at each proc's
      xmin: While snapshot contents do not need to change whenever a read-only
      transaction commits or a snapshot is released, a proc's xmin is modified in
      those cases. The frequency of xmin modifications leads to, particularly on
      higher core count systems, many cache misses inside GetSnapshotData(), despite
      the data underlying a snapshot not changing. That is the most
      significant source of GetSnapshotData() scaling poorly on larger systems.
      
      Without accessing xmins, GetSnapshotData() cannot calculate accurate horizons /
      thresholds as it has so far. But we don't really have to: The horizons don't
      actually change that much between GetSnapshotData() calls. Nor are the horizons
      actually used every time a snapshot is built.
      
      The trick this commit introduces is to delay computation of accurate horizons
      until their use, and to use horizon boundaries to determine whether accurate
      horizons need to be computed.
      
      The use of RecentGlobal[Data]Xmin to decide whether a row version could be
      removed has been replaced with new GlobalVisTest* functions.  These use two
      thresholds to determine whether a row can be pruned:
      1) definitely_needed, indicating that rows deleted by XIDs >= definitely_needed
         are definitely still visible.
      2) maybe_needed, indicating that rows deleted by XIDs < maybe_needed can
         definitely be removed
      GetSnapshotData() updates definitely_needed to be the xmin of the computed
      snapshot.
      
      When testing whether a row can be removed (with GlobalVisTestIsRemovableXid())
      and the tested XID falls in between the two (i.e. XID >= maybe_needed && XID <
      definitely_needed) the boundaries can be recomputed to be more accurate. As it
      is not cheap to compute accurate boundaries, we limit the number of times that
      happens in short succession.  As the boundaries used by
      GlobalVisTestIsRemovableXid() are never reset (with maybe_needed updated by
      GetSnapshotData()), it is likely that further tests can benefit from an earlier
      computation of accurate horizons.
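
The two-threshold test described above can be sketched as follows. This is a toy model, not the GlobalVisTest* implementation: `ToyVisState`, `toy_is_removable`, and the `true_horizon` field (which stands in for the result of the expensive horizon computation) are assumptions made for this example, and the rate limiting of recomputations is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy visibility state; all names are illustrative. */
typedef struct ToyVisState
{
    uint64_t    maybe_needed;       /* XIDs below this are definitely removable */
    uint64_t    definitely_needed;  /* XIDs at or above this are still needed */
    uint64_t    true_horizon;       /* stand-in for an expensive accurate scan */
    unsigned    recomputes;         /* how often the accurate value was fetched */
} ToyVisState;

/* Decide whether a row deleted by xid can be removed. */
bool
toy_is_removable(ToyVisState *state, uint64_t xid)
{
    assert(state->maybe_needed <= state->definitely_needed);

    if (xid < state->maybe_needed)
        return true;                /* cheap answer: definitely removable */
    if (xid >= state->definitely_needed)
        return false;               /* cheap answer: definitely still needed */

    /* In between the two boundaries: pay for an accurate horizon. */
    state->recomputes++;
    if (state->true_horizon > state->maybe_needed)
        state->maybe_needed = state->true_horizon;
    if (state->definitely_needed < state->maybe_needed)
        state->definitely_needed = state->maybe_needed;

    return xid < state->maybe_needed;
}
```

Because the refined `maybe_needed` is kept, a later test for a nearby XID can often be answered from the cheap path.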
      
      To avoid regressing performance when old_snapshot_threshold is set (as that
      requires an accurate horizon to be computed), heap_page_prune_opt() doesn't
      unconditionally call TransactionIdLimitedForOldSnapshots() anymore. Both the
      computation of the limited horizon, and the triggering of errors (with
      SetOldSnapshotThresholdTimestamp()) are now only done when necessary to remove
      tuples.
      
      This commit just removes the accesses to PGXACT->xmin from
      GetSnapshotData(), but other members of PGXACT residing in the same
      cache line are accessed. Therefore this in itself does not result in a
      significant improvement. Subsequent commits will take advantage of the
      fact that GetSnapshotData() now does not need to access xmins anymore.
      
      Note: This contains a workaround in heap_page_prune_opt() to keep the
      snapshot_too_old tests working. While that workaround is ugly, the tests
      currently are not meaningful, and it seems best to address them separately.
      
      Author: Andres Freund <andres@anarazel.de>
      Reviewed-By: Robert Haas <robertmhaas@gmail.com>
      Reviewed-By: Thomas Munro <thomas.munro@gmail.com>
      Reviewed-By: David Rowley <dgrowleyml@gmail.com>
      Discussion: https://postgr.es/m/20200301083601.ews6hz5dduc3w2se@alap3.anarazel.de
    • BRIN: Handle concurrent desummarization properly · 1f42d35a
      Alvaro Herrera authored
      If a page range is desummarized at just the right time concurrently with
      an index walk, BRIN would raise an error indicating index corruption.
      This is scary and unhelpful; silently returning that the page range is
      not summarized is sufficient reaction.
      
      This bug was introduced by commit 975ad4e6 as additional protection
      against a bug whose actual fix was elsewhere.  Backpatch equally.
      Reported-By: Anastasia Lubennikova <a.lubennikova@postgrespro.ru>
      Diagnosed-By: Alexander Lakhin <exclusion@gmail.com>
      Discussion: https://postgr.es/m/2588667e-d07d-7e10-74e2-7e1e46194491@postgrespro.ru
      Backpatch: 9.5 - master
    • Improve comments for postmaster.c's BackendList. · 3546cf8a
      Tom Lane authored
      This had gotten a little disjointed over time, and some of the grammar
      was sloppy.  Rewrite for more clarity.
      
      In passing, re-pgindent some recently added comments.
      
      No code changes.
    • Track latest completed xid as a FullTransactionId. · 3bd7f996
      Andres Freund authored
      The reason for doing so is that a subsequent commit will need that to
      avoid wraparound issues. As the subsequent change is large this was
      split out for easier review.
      
      The reason this is not a perfectly straightforward change is that we do
      not want to track 64bit xids in the procarray or the WAL. Therefore we
      need to advance latestCompletedXid in relation to 32 bit xids. The
      code for that is now centralized in MaintainLatestCompletedXid*.
      
      Author: Andres Freund
      Reviewed-By: Thomas Munro, Robert Haas, David Rowley
      Discussion: https://postgr.es/m/20200301083601.ews6hz5dduc3w2se@alap3.anarazel.de
  9. 11 Aug, 2020 2 commits