1. 21 Aug, 2020 3 commits
    • Fix explain regression test failure. · eabba4a3
      Fujii Masao authored
      Commit 9d701e62 caused the regression test for EXPLAIN to fail on
      the buildfarm member prion. This happened because of instability of
      test output, i.e., in text format, whether "Planning:" line is output
      varies depending on the system state.
      
      This commit updated the regression test so that it ignores that
      "Planning:" line to produce more stable test output and get rid of
      the test failure.
      
      Back-patch to v13.
      
      Author: Fujii Masao
      Discussion: https://postgr.es/m/1803897.1598021621@sss.pgh.pa.us
    • Rework EXPLAIN for planner's buffer usage. · 9d701e62
      Fujii Masao authored
      Commit ce77abe6 allowed EXPLAIN (BUFFERS) to report the information
      on buffer usage during planning phase. However three issues were
      reported regarding this feature.
      
      (1) Previously, EXPLAIN option BUFFERS required ANALYZE. So the query
          had to be actually executed by specifying ANALYZE even when we
          wanted to see only the planner's buffer usage. This was inconvenient
          especially when the query was a write query such as DELETE.
      
      (2) EXPLAIN included the planner's buffer usage in summary
          information. So SUMMARY option had to be enabled to report that.
          Also this format was confusing.
      
      (3) The output structure for planning information was not consistent
          between TEXT format and the others. For example, "Planning" tag
          was output in JSON format, but not in TEXT format.
      
      For (1), this commit allows us to perform EXPLAIN (BUFFERS) without
      ANALYZE to report the planner's buffer usage.
      
      For (2), this commit changed EXPLAIN output so that the planner's
      buffer usage is reported before summary information.
      
      For (3), this commit made the output structure for planning
      information more consistent between the formats.
      
      Back-patch to v13 where the planner's buffer usage was allowed to
      be reported in EXPLAIN.
      
      Reported-by: Pierre Giraud, David Rowley
      Author: Fujii Masao
      Reviewed-by: David Rowley, Julien Rouhaud, Pierre Giraud
      Discussion: https://postgr.es/m/07b226e6-fa49-687f-b110-b7c37572f69e@dalibo.com
    • Fix typos in comments. · d259afa7
      Fujii Masao authored
      Author: Masahiko Sawada
      Reviewed-by: Fujii Masao
      Discussion: https://postgr.es/m/CA+fd4k4m9hFSrRLB3etPWO5_v5=MujVZWRtz63q+55hM0Dz25Q@mail.gmail.com
  2. 20 Aug, 2020 4 commits
  3. 19 Aug, 2020 2 commits
    • Suppress unnecessary RelabelType nodes in yet more cases. · 20729324
      Tom Lane authored
      Commit a477bfc1 fixed eval_const_expressions() to ensure that it
      didn't generate unnecessary RelabelType nodes, but I failed to notice
      that some other places in the planner had the same issue.  Really,
      no place in the planner should be using plain makeRelabelType(), for
      fear of generating expressions that should be equal() to semantically
      equivalent trees, but aren't.
      
      An example is that because canonicalize_ec_expression() failed
      to be careful about this, we could end up with an equivalence class
      containing both a plain Const, and a Const-with-RelabelType
      representing exactly the same value.  So far as I can tell this led to
      no visible misbehavior, but we did waste a bunch of cycles generating
      and evaluating "Const = Const-with-RelabelType" to prove such entries
      are redundant.
      
      Hence, move the support function added by a477bfc1 to where it can
      be more generally useful, and use it in the places where planner code
      previously used makeRelabelType.
      
      Back-patch to v12, like the previous patch.  While I have no concrete
      evidence of any real misbehavior here, it's certainly possible that
      I overlooked a case where equivalent expressions that aren't equal()
      could cause a user-visible problem.  In any case carrying extra
      RelabelType nodes through planning to execution isn't very desirable.
      
      Discussion: https://postgr.es/m/1311836.1597781384@sss.pgh.pa.us
    • Add pg_backend_memory_contexts system view. · 3e98c0ba
      Fujii Masao authored
      This view displays the memory usage of all the memory contexts of the
      server process attached to the current session. This information is
      useful for investigating the cause of backend-local memory bloat.
      
      This information can also be collected by calling
      MemoryContextStats(TopMemoryContext) via a debugger. But this technique
      cannot be used in some environments because no debugger is available
      there. It also outputs lots of text messages that are not easy to
      analyze. So, the pg_backend_memory_contexts view allows us to access
      backend-local memory context information more easily.
      
      Bump catalog version.
      
      Author: Atsushi Torikoshi, Fujii Masao
      Reviewed-by: Tatsuhito Kasahara, Andres Freund, Daniel Gustafsson, Robert Haas, Michael Paquier
      Discussion: https://postgr.es/m/72a656e0f71d0860161e0b3f67e4d771@oss.nttdata.com
  4. 18 Aug, 2020 5 commits
    • Fix race condition in snapshot caching when 2PC is used. · 07f32fcd
      Andres Freund authored
      When preparing a transaction, xactCompletionCount needs to be
      incremented, even though the transaction has not committed
      yet. Otherwise the snapshot used within the transaction can
      get reused outside of the prepared transaction. As GetSnapshotData()
      does not include the current xid when building a snapshot, reuse would
      not be correct.
      
      Somewhat surprisingly the regression tests only rarely show incorrect
      results without the fix. The reason for that is that often the
      snapshot's xmax will be >= the backend xid, yielding a snapshot that
      is correct, despite the bug.
      
      I'm working on a reliable test for the bug, but it seems worth seeing
      whether this fixes all the BF failures while I do.
      
      Author: Andres Freund <andres@anarazel.de>
      Discussion: https://postgr.es/m/E1k7tGP-0005V0-5k@gemulon.postgresql.org
    • Avoid non-constant format string argument to fprintf(). · 73447820
      Heikki Linnakangas authored
      As Tom Lane pointed out, it could defeat the compiler's printf() format
      string verification.
      
      Backpatch to v12, like the patch that introduced it.
      
      Discussion: https://www.postgresql.org/message-id/1069283.1597672779%40sss.pgh.pa.us
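The point of the commit above can be illustrated with a minimal sketch. It is not the PostgreSQL change itself: `write_message` is a hypothetical helper invented here, and it only shows the general pattern of passing a message as an argument to a constant `"%s"` format instead of using it as the format string.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper (not PostgreSQL code): write an arbitrary message to
 * a stream without ever using that message as the format string.
 */
int
write_message(FILE *out, const char *msg)
{
    assert(out != NULL && msg != NULL);

    /*
     * Risky form, shown only as a comment:  fprintf(out, msg);
     * A '%' inside msg would be parsed as a conversion specification, and
     * the compiler cannot verify a format that is not a string literal.
     */
    return fprintf(out, "%s", msg);
}
```

With the constant format, a message that happens to contain `%s` is written literally, and the compiler's format checking still applies to the call.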
    • snapshot scalability: cache snapshots using a xact completion counter. · 623a9ba7
      Andres Freund authored
      Previous commits made it faster/more scalable to compute snapshots. But not
      building a snapshot is still faster. Now that GetSnapshotData() does not
      maintain RecentGlobal* anymore, that is actually not too hard:
      
      This commit introduces xactCompletionCount, which tracks the number of
      top-level transactions with xids (i.e. which may have modified the database)
      that completed in some form since the start of the server.
      
      We can avoid rebuilding the snapshot's contents whenever the current
      xactCompletionCount is the same as it was when the snapshot was
      originally built.  Currently this check happens while holding
      ProcArrayLock. While it's likely possible to perform the check without
      acquiring ProcArrayLock, it seems better to do that separately /
      later, as some careful analysis is required. Even with the lock this is
      a significant win on its own.
      
      On a smaller two socket machine this gains another ~1.03x, on a larger
      machine the effect is roughly double (earlier patch version tested
      though).  If we were able to safely avoid the lock there'd be another
      significant gain on top of that.
      
      Author: Andres Freund <andres@anarazel.de>
      Reviewed-By: Robert Haas <robertmhaas@gmail.com>
      Reviewed-By: Thomas Munro <thomas.munro@gmail.com>
      Reviewed-By: David Rowley <dgrowleyml@gmail.com>
      Discussion: https://postgr.es/m/20200301083601.ews6hz5dduc3w2se@alap3.anarazel.de
    • Fix use-after-release issue in PL/Sample · 51300b45
      Michael Paquier authored
      Introduced in adbe62d0.  Per buildfarm member prion, when using
      RELCACHE_FORCE_RELEASE.
    • Add PL/Sample to src/test/modules/ · adbe62d0
      Michael Paquier authored
      PL/Sample is an example template of a procedural-language handler.  This
      can be used as a base to implement a custom PL, or as a facility to test
      APIs dedicated to PLs.  Much more could be done in this module, like
      adding a simple validator, but this is left as future work.
      
      The documentation originally included some C code to understand the
      basics of PL handler implementation, but it was outdated, and not really
      helpful either if trying to implement a new procedural language,
      particularly when it came to the integration of a PL installation with
      CREATE EXTENSION.
      
      Author: Mark Wong
      Reviewed-by: Tom Lane, Michael Paquier
      Discussion: https://postgr.es/m/20200612172648.GA3327@2ndQuadrant.com
  5. 17 Aug, 2020 6 commits
  6. 16 Aug, 2020 3 commits
  7. 15 Aug, 2020 7 commits
  8. 14 Aug, 2020 9 commits
    • snapshot scalability: Move subxact info to ProcGlobal, remove PGXACT. · 73487a60
      Andres Freund authored
      Similar to the previous changes, this increases the chance that data
      frequently needed by GetSnapshotData() stays in L2 cache. In many
      workloads subtransactions are very rare, and this makes the check for
      that considerably cheaper.
      
      As this removes the last member of PGXACT, there is no need to keep it
      around anymore.
      
      On a larger 2 socket machine this and the two preceding commits result
      in a ~1.07x performance increase in read-only pgbench. For read-heavy
      mixed r/w workloads without row level contention, I see about 1.1x.
      
      Author: Andres Freund <andres@anarazel.de>
      Reviewed-By: Robert Haas <robertmhaas@gmail.com>
      Reviewed-By: Thomas Munro <thomas.munro@gmail.com>
      Reviewed-By: David Rowley <dgrowleyml@gmail.com>
      Discussion: https://postgr.es/m/20200301083601.ews6hz5dduc3w2se@alap3.anarazel.de
    • snapshot scalability: Move PGXACT->vacuumFlags to ProcGlobal->vacuumFlags. · 5788e258
      Andres Freund authored
      Similar to the previous commit, this increases the chance that data
      frequently needed by GetSnapshotData() stays in L2 cache. As we now
      take care to not unnecessarily write to ProcGlobal->vacuumFlags, there
      should be very few modifications to the ProcGlobal->vacuumFlags array.
      
      Author: Andres Freund <andres@anarazel.de>
      Reviewed-By: Robert Haas <robertmhaas@gmail.com>
      Reviewed-By: Thomas Munro <thomas.munro@gmail.com>
      Reviewed-By: David Rowley <dgrowleyml@gmail.com>
      Discussion: https://postgr.es/m/20200301083601.ews6hz5dduc3w2se@alap3.anarazel.de
    • snapshot scalability: Introduce dense array of in-progress xids. · 941697c3
      Andres Freund authored
      The new array contains the xids for all connected backends / in-use
      PGPROC entries in a dense manner (in contrast to the PGPROC/PGXACT
      arrays which can have unused entries interspersed).
      
      This improves performance because GetSnapshotData() always needs to
      scan the xids of all live procarray entries and now there's no need to
      go through the procArray->pgprocnos indirection anymore.
      
      As the set of running top-level xids changes rarely, compared to the
      number of snapshots taken, this substantially increases the likelihood
      of most data required for a snapshot being in L2 cache.  In
      read-mostly workloads scanning the xids[] array will be sufficient to
      build a snapshot, as most backends will not have an xid assigned.
      
      To keep the xid array dense, ProcArrayRemove() needs to move entries
      behind the to-be-removed proc's one further up in the array. Obviously
      moving array entries cannot happen while a backend sets its
      xid. I.e., locking needs to prevent array entries from being moved
      while a backend modifies its xid.
      
      To avoid locking ProcArrayLock in GetNewTransactionId() - a fairly hot
      spot already - ProcArrayAdd() / ProcArrayRemove() now need to hold
      XidGenLock in addition to ProcArrayLock. Adding / Removing a procarray
      entry is not a very frequent operation, even taking 2PC into account.
      
      Due to the above, the dense array entries can only be read or modified
      while holding ProcArrayLock and/or XidGenLock. This prevents a
      concurrent ProcArrayRemove() from shifting the dense array while it is
      accessed concurrently.
      
      While the new dense array is very good when needing to look at all
      xids it is less suitable when accessing a single backend's xid. In
      particular it would be problematic to have to acquire a lock to access
      a backend's own xid. Therefore a backend's xid is not just stored in
      the dense array, but also in PGPROC. This also allows a backend to
      only access the shared xid value when the backend had acquired an
      xid.
      
      The infrastructure added in this commit will be used for the remaining
      PGXACT fields in subsequent commits. They are kept separate to make
      review easier.
      
      Author: Andres Freund <andres@anarazel.de>
      Reviewed-By: Robert Haas <robertmhaas@gmail.com>
      Reviewed-By: Thomas Munro <thomas.munro@gmail.com>
      Reviewed-By: David Rowley <dgrowleyml@gmail.com>
      Discussion: https://postgr.es/m/20200301083601.ews6hz5dduc3w2se@alap3.anarazel.de
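A toy sketch of the dense-array idea from this commit message follows. It is an assumption-laden illustration, not the procarray code: `ToyDenseXids`, `toy_dense_add` and `toy_dense_remove` are invented names, the toy simply appends new entries, and the locking the message describes (ProcArrayLock and XidGenLock around add/remove) is only noted in comments.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_MAX_PROCS 8

/* Toy dense array: entries [0, count) are all in use, with no gaps. */
typedef struct ToyDenseXids
{
    uint32_t xids[TOY_MAX_PROCS];   /* 0 means "no xid assigned" in this toy */
    size_t   count;
} ToyDenseXids;

/*
 * Append an entry and return its index.  In the design described above the
 * caller would hold the relevant locks; this sketch is single-threaded.
 */
size_t
toy_dense_add(ToyDenseXids *a, uint32_t xid)
{
    assert(a != NULL && a->count < TOY_MAX_PROCS);
    a->xids[a->count] = xid;
    return a->count++;
}

/*
 * Remove the entry at 'index' and shift every later entry one slot towards
 * the front, so that a scan over [0, count) never meets an unused slot.
 */
void
toy_dense_remove(ToyDenseXids *a, size_t index)
{
    assert(a != NULL && index < a->count);
    memmove(&a->xids[index], &a->xids[index + 1],
            (a->count - index - 1) * sizeof(a->xids[0]));
    a->count--;
}
```

Because removal shifts the entries behind it, an index held by another entry changes when an earlier entry is removed, which is why the commit message insists that entries must not move while a backend is setting its xid.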
    • pg_dump: fix dependencies on FKs to partitioned tables · 2ba5b2db
      Alvaro Herrera authored
      Parallel-restoring a foreign key that references a partitioned table
      with several levels of partitions can fail:
      
      pg_restore: while PROCESSING TOC:
      pg_restore: from TOC entry 6684; 2606 29166 FK CONSTRAINT fk fk_a_fkey postgres
      pg_restore: error: could not execute query: ERROR:  there is no unique constraint matching given keys for referenced table "pk"
      Command was: ALTER TABLE fkpart3.fk
          ADD CONSTRAINT fk_a_fkey FOREIGN KEY (a) REFERENCES fkpart3.pk(a);
      
      This happens in parallel restore mode because some index partitions
      aren't yet attached to the topmost partitioned index that the FK uses,
      and so the index is still invalid.  The current code marks the FK as
      dependent on the first level of index-attach dump objects; the bug is
      fixed by recursively marking the FK on their children.
      
      Backpatch to 12, where FKs to partitioned tables were introduced.
      Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
      Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
      Discussion: https://postgr.es/m/3170626.1594842723@sss.pgh.pa.us
      Backpatch: 12-master
    • Fix obsolete comment in xlogutils.c. · 914140e8
      Peter Geoghegan authored
      Oversight in commit 2c03216d.
    • Fix postmaster's behavior during smart shutdown. · 0038f943
      Tom Lane authored
      Up to now, upon receipt of a SIGTERM ("smart shutdown" command), the
      postmaster has immediately killed all "optional" background processes,
      and subsequently refused to launch new ones while it's waiting for
      foreground client processes to exit.  No doubt this seemed like an OK
      policy at some point; but it's a pretty bad one now, because it makes
      for a seriously degraded environment for the remaining clients:
      
      * Parallel queries are killed, and new ones fail to launch. (And our
      parallel-query infrastructure utterly fails to deal with the case
      in a reasonable way --- it just hangs waiting for workers that are
      not going to arrive.  There is more work needed in that area IMO.)
      
      * Autovacuum ceases to function.  We can tolerate that for a while,
      but if bulk-update queries continue to run in the surviving client
      sessions, there's eventually going to be a mess.  In the worst case
      the system could reach a forced shutdown to prevent XID wraparound.
      
      * The bgwriter and walwriter are also stopped immediately, likely
      resulting in performance degradation.
      
      Hence, let's rearrange things so that the only immediate change in
      behavior is refusing to let in new normal connections.  Once the last
      normal connection is gone, shut everything down as though we'd received
      a "fast" shutdown.  To implement this, remove the PM_WAIT_BACKUP and
      PM_WAIT_READONLY states, instead staying in PM_RUN or PM_HOT_STANDBY
      while normal connections remain.  A subsidiary state variable tracks
      whether or not we're letting in new connections in those states.
      
      This also allows having just one copy of the logic for killing child
      processes in smart and fast shutdown modes.  I moved that logic into
      PostmasterStateMachine() by inventing a new state PM_STOP_BACKENDS.
      
      Back-patch to 9.6 where parallel query was added.  In principle
      this'd be a good idea in 9.5 as well, but the risk/reward ratio
      is not as good there, since lack of autovacuum is not a problem
      during typical uses of smart shutdown.
      
      Per report from Bharath Rupireddy.
      
      Patch by me, reviewed by Thomas Munro
      
      Discussion: https://postgr.es/m/CALj2ACXAZ5vKxT9P7P89D87i3MDO9bfS+_bjMHgnWJs8uwUOOw@mail.gmail.com
    • Fix typo in test comment. · 5bdf6945
      Heikki Linnakangas authored
    • Fix compilation warnings with libselinux 3.1 in contrib/sepgsql/ · 1f32136a
      Michael Paquier authored
      Upstream SELinux has recently marked security_context_t as officially
      deprecated, causing warnings with -Wdeprecated-declarations.  Upstream
      has considered this legacy code for some time now, as
      security_context_t got removed from most of the code tree during the
      development of 2.3 back in 2014.
      
      This removes all the references to security_context_t in sepgsql/ to be
      consistent with SELinux, fixing the warnings.  Note that this does not
      impact the minimum version of libselinux supported.
      
      Reviewed-by: Tom Lane
      Discussion: https://postgr.es/m/20200813012735.GC11663@paquier.xyz
    • Doc: improve examples for json_populate_record() and related functions. · a9306f10
      Tom Lane authored
      Make these examples self-contained by providing declarations of the
      user-defined row types they rely on.  There wasn't room to do this
      in the old doc format, but now there is, and I think it makes the
      examples a good bit less confusing.
  9. 13 Aug, 2020 1 commit