1. 18 Aug, 2020 4 commits
  2. 17 Aug, 2020 6 commits
  3. 16 Aug, 2020 3 commits
  4. 15 Aug, 2020 7 commits
  5. 14 Aug, 2020 9 commits
    • snapshot scalability: Move subxact info to ProcGlobal, remove PGXACT. · 73487a60
      Andres Freund authored
      Similar to the previous changes, this increases the chance that data
      frequently needed by GetSnapshotData() stays in L2 cache. In many
      workloads subtransactions are very rare, and this makes the check for
      them considerably cheaper.
      
      As this removes the last member of PGXACT, there is no need to keep it
      around anymore.
      
      On a larger 2 socket machine this and the two preceding commits result
      in a ~1.07x performance increase in read-only pgbench. For read-heavy
      mixed r/w workloads without row level contention, I see about 1.1x.
      
      Author: Andres Freund <andres@anarazel.de>
      Reviewed-By: Robert Haas <robertmhaas@gmail.com>
      Reviewed-By: Thomas Munro <thomas.munro@gmail.com>
      Reviewed-By: David Rowley <dgrowleyml@gmail.com>
      Discussion: https://postgr.es/m/20200301083601.ews6hz5dduc3w2se@alap3.anarazel.de
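
      A minimal sketch of the layout pattern described by this commit and the
      related snapshot-scalability commits below; the names, types and array
      sizes are simplified stand-ins, not PostgreSQL's actual definitions.
      Per-backend fields move out of a per-process struct into parallel dense
      arrays, so a snapshot scan touches a few contiguous cache lines instead
      of striding across per-process structs:

        /* Sketch only: hypothetical simplification of the dense-array layout. */
        #include <stdint.h>
        #include <stdio.h>

        #define MAX_BACKENDS 128            /* assumed cap, not a real setting */

        typedef uint32_t TransactionId;

        typedef struct ProcGlobalSketch
        {
            /* Dense, parallel arrays indexed by a compact per-backend offset;
             * snapshot code scans these contiguously instead of chasing one
             * per-backend struct per entry. */
            TransactionId xids[MAX_BACKENDS];          /* top-level xid, or 0 */
            uint8_t       subxidCounts[MAX_BACKENDS];  /* rarely nonzero */
            uint8_t       vacuumFlags[MAX_BACKENDS];
            int           numProcs;                    /* in-use entries */
        } ProcGlobalSketch;

        /* Counting running xids only walks the small, dense xids[] array. */
        static int
        count_running_xids(const ProcGlobalSketch *pg)
        {
            int n = 0;

            for (int i = 0; i < pg->numProcs; i++)
                if (pg->xids[i] != 0)
                    n++;
            return n;
        }

        int
        main(void)
        {
            ProcGlobalSketch pg = { .numProcs = 3 };

            pg.xids[1] = 742;
            printf("running xids: %d\n", count_running_xids(&pg));
            return 0;
        }
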
    • snapshot scalability: Move PGXACT->vacuumFlags to ProcGlobal->vacuumFlags. · 5788e258
      Andres Freund authored
      Similar to the previous commit, this increases the chance that data
      frequently needed by GetSnapshotData() stays in L2 cache. As we now
      take care not to write to ProcGlobal->vacuumFlags unnecessarily, there
      should be very few modifications to the ProcGlobal->vacuumFlags array.
      
      Author: Andres Freund <andres@anarazel.de>
      Reviewed-By: Robert Haas <robertmhaas@gmail.com>
      Reviewed-By: Thomas Munro <thomas.munro@gmail.com>
      Reviewed-By: David Rowley <dgrowleyml@gmail.com>
      Discussion: https://postgr.es/m/20200301083601.ews6hz5dduc3w2se@alap3.anarazel.de
    • snapshot scalability: Introduce dense array of in-progress xids. · 941697c3
      Andres Freund authored
      The new array contains the xids for all connected backends / in-use
      PGPROC entries in a dense manner (in contrast to the PGPROC/PGXACT
      arrays which can have unused entries interspersed).
      
      This improves performance because GetSnapshotData() always needs to
      scan the xids of all live procarray entries and now there's no need to
      go through the procArray->pgprocnos indirection anymore.
      
      As the set of running top-level xids changes rarely compared to the
      number of snapshots taken, this substantially increases the likelihood
      of most data required for a snapshot being in L2 cache.  In
      read-mostly workloads, scanning the xids[] array is sufficient to
      build a snapshot, as most backends will not have an xid assigned.
      
      To keep the xid array dense, ProcArrayRemove() needs to move the
      entries behind the to-be-removed proc's entry further up in the
      array. Obviously moving array entries cannot happen while a backend
      sets its xid, i.e. locking needs to prevent array entries from being
      moved while a backend modifies its xid.
      
      To avoid locking ProcArrayLock in GetNewTransactionId() - a fairly hot
      spot already - ProcArrayAdd() / ProcArrayRemove() now needs to hold
      XidGenLock in addition to ProcArrayLock. Adding / Removing a procarray
      entry is not a very frequent operation, even taking 2PC into account.
      
      Due to the above, the dense array entries can only be read or modified
      while holding ProcArrayLock and/or XidGenLock. This prevents a
      concurrent ProcArrayRemove() from shifting the dense array while it is
      accessed concurrently.
      
      While the new dense array is very good when all xids need to be
      looked at, it is less suitable when accessing a single backend's
      xid. In particular, it would be problematic to have to acquire a lock
      to access a backend's own xid. Therefore a backend's xid is not just
      stored in the dense array, but also in PGPROC. This also allows a
      backend to access the shared xid value only once it has actually
      acquired an xid.
      
      The infrastructure added in this commit will be used for the remaining
      PGXACT fields in subsequent commits. They are kept separate to make
      review easier.
      
      Author: Andres Freund <andres@anarazel.de>
      Reviewed-By: Robert Haas <robertmhaas@gmail.com>
      Reviewed-By: Thomas Munro <thomas.munro@gmail.com>
      Reviewed-By: David Rowley <dgrowleyml@gmail.com>
      Discussion: https://postgr.es/m/20200301083601.ews6hz5dduc3w2se@alap3.anarazel.de
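
      A small, self-contained sketch of the removal-by-shift described in
      the commit above, using hypothetical names; the real code additionally
      holds ProcArrayLock and XidGenLock while entries move, as the message
      explains:

        /* Sketch only: keeping a dense xid array while removing one entry. */
        #include <stdio.h>
        #include <string.h>
        #include <stdint.h>

        typedef uint32_t TransactionId;

        #define MAX_PROCS 8

        static TransactionId xids[MAX_PROCS];   /* dense: slots 0..numProcs-1 in use */
        static int pgxactoff[MAX_PROCS];        /* per-proc offset into xids[] */
        static int procno_of[MAX_PROCS];        /* reverse mapping: offset -> proc */
        static int numProcs = 0;

        /* Analogous to ProcArrayAdd(): append at the end, remember the offset. */
        static void
        proc_add(int procno, TransactionId xid)
        {
            int off = numProcs++;

            xids[off] = xid;
            pgxactoff[procno] = off;
            procno_of[off] = procno;
        }

        /* Analogous to ProcArrayRemove(): shift later entries up so the array
         * stays dense, fixing each moved proc's cached offset.  In the real
         * code this happens while holding both ProcArrayLock and XidGenLock,
         * so no backend sets its xid while its slot is moving. */
        static void
        proc_remove(int procno)
        {
            int off = pgxactoff[procno];

            memmove(&xids[off], &xids[off + 1],
                    (numProcs - off - 1) * sizeof(TransactionId));
            memmove(&procno_of[off], &procno_of[off + 1],
                    (numProcs - off - 1) * sizeof(int));
            numProcs--;
            for (int i = off; i < numProcs; i++)
                pgxactoff[procno_of[i]] = i;
        }

        int
        main(void)
        {
            proc_add(3, 100);
            proc_add(5, 0);      /* backend without an assigned xid */
            proc_add(7, 103);
            proc_remove(5);

            for (int i = 0; i < numProcs; i++)
                printf("slot %d: proc %d xid %u\n", i, procno_of[i], xids[i]);
            return 0;
        }
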
    • pg_dump: fix dependencies on FKs to partitioned tables · 2ba5b2db
      Alvaro Herrera authored
      Parallel-restoring a foreign key that references a partitioned table
      with several levels of partitions can fail:
      
      pg_restore: while PROCESSING TOC:
      pg_restore: from TOC entry 6684; 2606 29166 FK CONSTRAINT fk fk_a_fkey postgres
      pg_restore: error: could not execute query: ERROR:  there is no unique constraint matching given keys for referenced table "pk"
      Command was: ALTER TABLE fkpart3.fk
          ADD CONSTRAINT fk_a_fkey FOREIGN KEY (a) REFERENCES fkpart3.pk(a);
      
      This happens in parallel restore mode because some index partitions
      aren't yet attached to the topmost partitioned index that the FK uses,
      and so the index is still invalid.  The current code marks the FK as
      dependent on the first level of index-attach dump objects; the bug is
      fixed by recursively marking the FK as dependent on their children as
      well.
      
      Backpatch to 12, where FKs to partitioned tables were introduced.
      Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
      Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
      Discussion: https://postgr.es/m/3170626.1594842723@sss.pgh.pa.us
      Backpatch: 12-master
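
      A rough sketch of the recursive marking idea; the structs and helpers
      below are hypothetical illustrations, not pg_dump's real dump-object
      API:

        /* Sketch only: hypothetical types, not pg_dump's actual structures. */
        #include <stdio.h>

        typedef struct AttachObj
        {
            int                dumpId;
            struct AttachObj **children;    /* attach objects of child partitions */
            int                nchildren;
        } AttachObj;

        typedef struct FkObj
        {
            int deps[64];                   /* dump ids this FK depends on */
            int ndeps;
        } FkObj;

        /* Mark the FK as dependent on an index-attach object and, recursively,
         * on every attach object beneath it, so parallel restore cannot run
         * the ADD CONSTRAINT before all index partitions are attached. */
        static void
        mark_fk_deps_recursively(FkObj *fk, const AttachObj *attach)
        {
            fk->deps[fk->ndeps++] = attach->dumpId;
            for (int i = 0; i < attach->nchildren; i++)
                mark_fk_deps_recursively(fk, attach->children[i]);
        }

        int
        main(void)
        {
            AttachObj  leaf1 = { .dumpId = 11 };
            AttachObj  leaf2 = { .dumpId = 12 };
            AttachObj *kids[] = { &leaf1, &leaf2 };
            AttachObj  top = { .dumpId = 10, .children = kids, .nchildren = 2 };
            FkObj      fk = { .ndeps = 0 };

            mark_fk_deps_recursively(&fk, &top);
            printf("fk depends on %d attach objects\n", fk.ndeps);
            return 0;
        }
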
    • Fix obsolete comment in xlogutils.c. · 914140e8
      Peter Geoghegan authored
      Oversight in commit 2c03216d.
    • Fix postmaster's behavior during smart shutdown. · 0038f943
      Tom Lane authored
      Up to now, upon receipt of a SIGTERM ("smart shutdown" command), the
      postmaster has immediately killed all "optional" background processes,
      and subsequently refused to launch new ones while it's waiting for
      foreground client processes to exit.  No doubt this seemed like an OK
      policy at some point; but it's a pretty bad one now, because it makes
      for a seriously degraded environment for the remaining clients:
      
      * Parallel queries are killed, and new ones fail to launch. (And our
      parallel-query infrastructure utterly fails to deal with the case
      in a reasonable way --- it just hangs waiting for workers that are
      not going to arrive.  There is more work needed in that area IMO.)
      
      * Autovacuum ceases to function.  We can tolerate that for a while,
      but if bulk-update queries continue to run in the surviving client
      sessions, there's eventually going to be a mess.  In the worst case
      the system could reach a forced shutdown to prevent XID wraparound.
      
      * The bgwriter and walwriter are also stopped immediately, likely
      resulting in performance degradation.
      
      Hence, let's rearrange things so that the only immediate change in
      behavior is refusing to let in new normal connections.  Once the last
      normal connection is gone, shut everything down as though we'd received
      a "fast" shutdown.  To implement this, remove the PM_WAIT_BACKUP and
      PM_WAIT_READONLY states, instead staying in PM_RUN or PM_HOT_STANDBY
      while normal connections remain.  A subsidiary state variable tracks
      whether or not we're letting in new connections in those states.
      
      This also allows having just one copy of the logic for killing child
      processes in smart and fast shutdown modes.  I moved that logic into
      PostmasterStateMachine() by inventing a new state PM_STOP_BACKENDS.
      
      Back-patch to 9.6 where parallel query was added.  In principle
      this'd be a good idea in 9.5 as well, but the risk/reward ratio
      is not as good there, since lack of autovacuum is not a problem
      during typical uses of smart shutdown.
      
      Per report from Bharath Rupireddy.
      
      Patch by me, reviewed by Thomas Munro
      
      Discussion: https://postgr.es/m/CALj2ACXAZ5vKxT9P7P89D87i3MDO9bfS+_bjMHgnWJs8uwUOOw@mail.gmail.com
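
      A toy model of the revised flow, with made-up names rather than the
      actual postmaster states: a smart shutdown now only clears a
      "connections allowed" flag, and the usual stop-the-children path runs
      once the last normal backend is gone:

        /* Sketch only: toy model of the smart-shutdown flow described above. */
        #include <stdbool.h>
        #include <stdio.h>

        typedef enum
        {
            PM_RUN_SKETCH,            /* normal operation */
            PM_STOP_BACKENDS_SKETCH,  /* telling remaining children to exit */
            PM_SHUTDOWN_SKETCH
        } PmStateSketch;

        static PmStateSketch state = PM_RUN_SKETCH;
        static bool connsAllowed = true;   /* subsidiary flag: accept new conns? */
        static int  normalBackends = 2;

        /* Smart shutdown request: keep everything running, just stop letting
         * new normal connections in. */
        static void
        handle_smart_shutdown(void)
        {
            connsAllowed = false;
        }

        /* When the last normal connection is gone, behave as if a fast
         * shutdown had been requested: stop the remaining children in one
         * shared code path. */
        static void
        backend_exited(void)
        {
            normalBackends--;
            if (!connsAllowed && normalBackends == 0)
                state = PM_STOP_BACKENDS_SKETCH;
        }

        int
        main(void)
        {
            handle_smart_shutdown();
            backend_exited();
            backend_exited();
            printf("state=%d connsAllowed=%d\n", state, connsAllowed);
            return 0;
        }
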
    • Fix typo in test comment. · 5bdf6945
      Heikki Linnakangas authored
    • Fix compilation warnings with libselinux 3.1 in contrib/sepgsql/ · 1f32136a
      Michael Paquier authored
      Upstream SELinux has recently marked security_context_t as officially
      deprecated, causing warnings with -Wdeprecated-declarations.  Upstream
      has treated this as legacy code for some time now, as
      security_context_t was removed from most of the code tree during the
      development of 2.3 back in 2014.
      
      This removes all the references to security_context_t in sepgsql/ to be
      consistent with SELinux, fixing the warnings.  Note that this does not
      impact the minimum version of libselinux supported.
      
      Reviewed-by: Tom Lane
      Discussion: https://postgr.es/m/20200813012735.GC11663@paquier.xyz
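
      For illustration, a tiny standalone program showing the kind of change
      involved, assuming a system with libselinux headers available (compile
      with -lselinux); it uses plain char * where older code would have used
      the deprecated security_context_t typedef:

        /* Sketch only: uses char * instead of the deprecated security_context_t. */
        #include <stdio.h>
        #include <selinux/selinux.h>

        int
        main(void)
        {
            char *context = NULL;   /* formerly: security_context_t context; */

            if (getcon(&context) < 0)
            {
                perror("getcon");
                return 1;
            }
            printf("current context: %s\n", context);
            freecon(context);
            return 0;
        }
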
    • Doc: improve examples for json_populate_record() and related functions. · a9306f10
      Tom Lane authored
      Make these examples self-contained by providing declarations of the
      user-defined row types they rely on.  There wasn't room to do this
      in the old doc format, but now there is, and I think it makes the
      examples a good bit less confusing.
  6. 13 Aug, 2020 3 commits
  7. 12 Aug, 2020 4 commits
    • snapshot scalability: Don't compute global horizons while building snapshots. · dc7420c2
      Andres Freund authored
      To make GetSnapshotData() more scalable, it cannot look at each proc's
      xmin: while snapshot contents do not need to change whenever a read-only
      transaction commits or a snapshot is released, a proc's xmin is modified in
      those cases. The frequency of xmin modifications leads to many cache misses
      inside GetSnapshotData(), particularly on higher core count systems, despite
      the data underlying a snapshot not changing. That is the most
      significant source of GetSnapshotData() scaling poorly on larger systems.
      
      Without accessing xmins, GetSnapshotData() cannot calculate accurate horizons /
      thresholds as it has so far. But we don't really have to: The horizons don't
      actually change that much between GetSnapshotData() calls. Nor are the horizons
      actually used every time a snapshot is built.
      
      The trick this commit introduces is to delay the computation of accurate
      horizons until they are actually used, and to use horizon boundaries to
      determine whether accurate horizons need to be computed at all.
      
      The use of RecentGlobal[Data]Xmin to decide whether a row version could be
      removed has been replaced with new GlobalVisTest* functions.  These use two
      thresholds to determine whether a row can be pruned:
      1) definitely_needed, indicating that rows deleted by XIDs >= definitely_needed
         are definitely still visible.
      2) maybe_needed, indicating that rows deleted by XIDs < maybe_needed can
         definitely be removed.
      GetSnapshotData() updates definitely_needed to be the xmin of the computed
      snapshot.
      
      When testing whether a row can be removed (with GlobalVisTestIsRemovableXid())
      and the tested XID falls in between the two (i.e. XID >= maybe_needed && XID <
      definitely_needed) the boundaries can be recomputed to be more accurate. As it
      is not cheap to compute accurate boundaries, we limit the number of times that
      happens in short succession.  As the boundaries used by
      GlobalVisTestIsRemovableXid() are never reset (with maybe_needed updated by
      GetSnapshotData()), it is likely that further tests can benefit from an earlier
      computation of accurate horizons.
      
      To avoid regressing performance when old_snapshot_threshold is set (as that
      requires an accurate horizon to be computed), heap_page_prune_opt() doesn't
      unconditionally call TransactionIdLimitedForOldSnapshots() anymore.  Both the
      computation of the limited horizon and the triggering of errors (with
      SetOldSnapshotThresholdTimestamp()) are now done only when necessary to
      remove tuples.
      
      This commit just removes the accesses to PGXACT->xmin from
      GetSnapshotData(), but other members of PGXACT residing in the same
      cache line are accessed. Therefore this in itself does not result in a
      significant improvement. Subsequent commits will take advantage of the
      fact that GetSnapshotData() now does not need to access xmins anymore.
      
      Note: This contains a workaround in heap_page_prune_opt() to keep the
      snapshot_too_old tests working. While that workaround is ugly, the tests
      currently are not meaningful, and it seems best to address them separately.
      
      Author: Andres Freund <andres@anarazel.de>
      Reviewed-By: Robert Haas <robertmhaas@gmail.com>
      Reviewed-By: Thomas Munro <thomas.munro@gmail.com>
      Reviewed-By: David Rowley <dgrowleyml@gmail.com>
      Discussion: https://postgr.es/m/20200301083601.ews6hz5dduc3w2se@alap3.anarazel.de
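
      A simplified sketch of the two-threshold test described above; the
      names are invented, plain 64-bit integers stand in for
      FullTransactionIds, and the "accurate horizon" computation is only a
      placeholder for the real, rate-limited recomputation:

        /* Sketch only: simplified two-threshold removability test. */
        #include <stdbool.h>
        #include <stdint.h>
        #include <stdio.h>

        typedef uint64_t FullXidSketch;

        typedef struct
        {
            /* maybe_needed <= definitely_needed; both may lag the accurate horizon. */
            FullXidSketch maybe_needed;      /* deleters below this are removable */
            FullXidSketch definitely_needed; /* deleters at/above this are not */
        } GlobalVisStateSketch;

        /* Stand-in for the expensive accurate-horizon computation; in the
         * real code this is rate-limited and refreshes the boundaries. */
        static FullXidSketch
        compute_accurate_horizon(GlobalVisStateSketch *st)
        {
            /* Placeholder: pretend the accurate horizon lies midway between
             * the two bounds, and cache it by raising maybe_needed. */
            FullXidSketch horizon = (st->maybe_needed + st->definitely_needed) / 2;

            st->maybe_needed = horizon;
            return horizon;
        }

        static bool
        is_removable(GlobalVisStateSketch *st, FullXidSketch deleter_xid)
        {
            if (deleter_xid < st->maybe_needed)
                return true;                    /* definitely dead to everyone */
            if (deleter_xid >= st->definitely_needed)
                return false;                   /* definitely still needed */
            /* In between: only now pay for an accurate horizon. */
            return deleter_xid < compute_accurate_horizon(st);
        }

        int
        main(void)
        {
            GlobalVisStateSketch st = { .maybe_needed = 100, .definitely_needed = 200 };

            printf("%d %d %d\n",
                   is_removable(&st, 50),    /* removable */
                   is_removable(&st, 250),   /* not removable */
                   is_removable(&st, 150));  /* depends on the recomputed horizon */
            return 0;
        }
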
    • BRIN: Handle concurrent desummarization properly · 1f42d35a
      Alvaro Herrera authored
      If a page range is desummarized at just the right time concurrently with
      an index walk, BRIN would raise an error indicating index corruption.
      This is scary and unhelpful; silently reporting that the page range is
      not summarized is a sufficient reaction.
      
      This bug was introduced by commit 975ad4e6 as additional protection
      against a bug whose actual fix was elsewhere.  Backpatch equally.
      Reported-By: Anastasia Lubennikova <a.lubennikova@postgrespro.ru>
      Diagnosed-By: Alexander Lakhin <exclusion@gmail.com>
      Discussion: https://postgr.es/m/2588667e-d07d-7e10-74e2-7e1e46194491@postgrespro.ru
      Backpatch: 9.5 - master
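
      A tiny sketch of the defensive pattern the fix amounts to, with
      hypothetical functions rather than BRIN's revmap code: when the summary
      tuple for a range has vanished concurrently, report the range as not
      summarized instead of raising a corruption error:

        /* Sketch only: treat a concurrently-desummarized range as unsummarized. */
        #include <stdbool.h>
        #include <stddef.h>
        #include <stdio.h>

        typedef struct { int placeholder; } RangeSummary;

        /* Hypothetical lookup: returns NULL when the revmap entry is (now) empty. */
        static RangeSummary *
        fetch_range_summary(int range)
        {
            (void) range;
            return NULL;   /* simulate a concurrent desummarization */
        }

        static bool
        get_range_summary(int range, RangeSummary **out)
        {
            RangeSummary *tup = fetch_range_summary(range);

            if (tup == NULL)
            {
                /* Previously this path raised an index-corruption error;
                 * reporting the range as not summarized is sufficient. */
                *out = NULL;
                return false;
            }
            *out = tup;
            return true;
        }

        int
        main(void)
        {
            RangeSummary *s;

            printf("summarized: %d\n", get_range_summary(0, &s));
            return 0;
        }
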
    • Improve comments for postmaster.c's BackendList. · 3546cf8a
      Tom Lane authored
      This had gotten a little disjointed over time, and some of the grammar
      was sloppy.  Rewrite for more clarity.
      
      In passing, re-pgindent some recently added comments.
      
      No code changes.
    • Track latest completed xid as a FullTransactionId. · 3bd7f996
      Andres Freund authored
      The reason for doing so is that a subsequent commit will need that to
      avoid wraparound issues. As the subsequent change is large this was
      split out for easier review.
      
      The reason this is not a perfectly straightforward change is that we do
      not want to track 64-bit xids in the procarray or the WAL. Therefore we
      need to advance latestCompletedXid in relation to 32-bit xids. The
      code for that is now centralized in MaintainLatestCompletedXid*.
      
      Author: Andres Freund
      Reviewed-By: Thomas Munro, Robert Haas, David Rowley
      Discussion: https://postgr.es/m/20200301083601.ews6hz5dduc3w2se@alap3.anarazel.de
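
      A sketch of advancing a 64-bit xid from 32-bit values while staying
      safe across wraparound; the helper below is an invented illustration
      (assuming any observed xid lies within 2^31 of the current 64-bit
      value), not the actual MaintainLatestCompletedXid* code:

        /* Sketch only: maintain a 64-bit "latest completed xid" from 32-bit xids. */
        #include <stdint.h>
        #include <stdio.h>

        typedef uint32_t TransactionId;
        typedef uint64_t FullXidSketch;

        /* Widen a 32-bit xid relative to a nearby 64-bit reference point.
         * The signed 32-bit difference handles values slightly ahead of or
         * behind the reference, including across a 32-bit wraparound. */
        static FullXidSketch
        widen_relative_to(FullXidSketch rel, TransactionId xid)
        {
            return rel + (int32_t) (xid - (TransactionId) rel);
        }

        static void
        maintain_latest_completed(FullXidSketch *latest, TransactionId completed_xid)
        {
            FullXidSketch full = widen_relative_to(*latest, completed_xid);

            if (full > *latest)
                *latest = full;
        }

        int
        main(void)
        {
            /* Start just below a 32-bit wraparound boundary. */
            FullXidSketch latest = ((FullXidSketch) 1 << 32) - 2;

            maintain_latest_completed(&latest, 5);   /* 32-bit xid after wraparound */
            printf("latest = %llu (epoch %llu, xid %u)\n",
                   (unsigned long long) latest,
                   (unsigned long long) (latest >> 32),
                   (unsigned) (TransactionId) latest);
            return 0;
        }
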
  8. 11 Aug, 2020 2 commits
  9. 10 Aug, 2020 2 commits