1. 08 Aug, 2020 4 commits
    • Alvaro Herrera's avatar
      walsnd: Don't set waiting_for_ping_response spuriously · 470687b4
      Alvaro Herrera authored
      Ashutosh Bapat noticed that when logical walsender needs to wait for
      WAL, and it realizes that it must send a keepalive message to
      walreceiver to update the sent-LSN, which *does not* request a reply
      from walreceiver, it wrongly sets the flag that it's going to wait for
      that reply.  That means that any future would-be sender of feedback
      messages ends up not sending a feedback message, because they all
      believe that a reply is expected.
      
      With built-in logical replication there's not much harm in this, because
      WalReceiverMain will send a ping-back every wal_receiver_timeout/2
      anyway; but with other logical replication systems (e.g. pglogical) it
      can cause significant pain.
      
      This problem was introduced in commit 41d5f8ad, where the
      request-reply flag was changed from true to false to WalSndKeepalive,
      without at the same time removing the line that sets
      waiting_for_ping_response.
      
      Just removing that line would be a sufficient fix, but it seems better
      to shift the responsibility of setting the flag to WalSndKeepalive
      itself instead of requiring caller to do it; this is clearly less
      error-prone.
      
      Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
      Reported-by: default avatarAshutosh Bapat <ashutosh.bapat@2ndquadrant.com>
      Backpatch: 9.5 and up
      Discussion: https://postgr.es/m/20200806225558.GA22401@alvherre.pgsql
      470687b4
    • Amit Kapila's avatar
      Fix the logical streaming test. · 82a0ba77
      Amit Kapila authored
      Commit 7259736a added the capability to stream changes in ReorderBuffer
      which has some tests to test the streaming mode. It is quite possible that
      while this test is running a parallel transaction could be logged by
      autovacuum. Such a transaction won't perform any insert/update/delete to
      non-catalog tables so will be shown as an empty transaction. Fix it by
      skipping the empty transactions during this test.
      
      Per report by buildfarm.
      82a0ba77
    • Peter Eisentraut's avatar
      Add some const decorations · a13421c9
      Peter Eisentraut authored
      a13421c9
    • Amit Kapila's avatar
      Implement streaming mode in ReorderBuffer. · 7259736a
      Amit Kapila authored
      Instead of serializing the transaction to disk after reaching the
      logical_decoding_work_mem limit in memory, we consume the changes we have
      in memory and invoke stream API methods added by commit 45fdc973.
      However, sometimes if we have incomplete toast or speculative insert we
      spill to the disk because we can't generate the complete tuple and stream.
      And, as soon as we get the complete tuple we stream the transaction
      including the serialized changes.
      
      We can do this incremental processing thanks to having assignments
      (associating subxact with toplevel xacts) in WAL right away, and
      thanks to logging the invalidation messages at each command end. These
      features are added by commits 0bead9af and c55040cc respectively.
      
      Now that we can stream in-progress transactions, the concurrent aborts
      may cause failures when the output plugin consults catalogs (both system
      and user-defined).
      
      We handle such failures by returning ERRCODE_TRANSACTION_ROLLBACK
      sqlerrcode from system table scan APIs to the backend or WALSender
      decoding a specific uncommitted transaction. The decoding logic on the
      receipt of such a sqlerrcode aborts the decoding of the current
      transaction and continue with the decoding of other transactions.
      
      We have ReorderBufferTXN pointer in each ReorderBufferChange by which we
      know which xact it belongs to.  The output plugin can use this to decide
      which changes to discard in case of stream_abort_cb (e.g. when a subxact
      gets discarded).
      
      We also provide a new option via SQL APIs to fetch the changes being
      streamed.
      
      Author: Dilip Kumar, Tomas Vondra, Amit Kapila, Nikhil Sontakke
      Reviewed-by: Amit Kapila, Kuntal Ghosh, Ajin Cherian
      Tested-by: Neha Sharma, Mahendra Singh Thalor and Ajin Cherian
      Discussion: https://postgr.es/m/688b0b7f-2f6c-d827-c27b-216a8e3ea700@2ndquadrant.com
      7259736a
  2. 07 Aug, 2020 5 commits
    • Peter Geoghegan's avatar
      Make nbtree split REDO locking match original execution. · 0a7d771f
      Peter Geoghegan authored
      Make the nbtree page split REDO routine consistent with original
      execution in its approach to acquiring and releasing buffer locks (at
      least for pages on the tree level of the page being split).  This brings
      btree_xlog_split() in line with btree_xlog_unlink_page(), which was
      taught to couple buffer locks by commit 9a9db08a.
      
      Note that the precise order in which we both acquire and release sibling
      buffer locks in btree_xlog_split() now matches original execution
      exactly (the precise order in which the locks are released probably
      doesn't matter much, but we might as well be consistent about it).
      
      The rule for nbtree REDO routines from here on is that same-level locks
      should be acquired in an order that's consistent with original
      execution.  It's not practical to have a similar rule for cross-level
      page locks, since for the most part original execution holds those locks
      for a period that spans multiple atomic actions/WAL records.  It's also
      not necessary, because clearly the cross-level lock coupling is only
      truly needed during original execution because of the presence of
      concurrent inserters.
      
      This is not a bug fix (unlike the similar aforementioned commit, commit
      9a9db08a).  The immediate reason to tighten things up in this area is to
      enable an upcoming enhancement to contrib/amcheck that allows it to
      verify that sibling links are in agreement with only an AccessShareLock
      (this check produced false positives when run on a replica server on
      account of the inconsistency fixed by this commit).  But that's not the
      only reason to be stricter here.
      
      It is generally useful to make locking on replicas be as close to what
      happens during original execution as practically possible.  It makes it
      less likely that hard to catch bugs will slip in in the future.  The
      previous state of affairs seems to be a holdover from before the
      introduction of Hot Standby, when buffer lock acquisitions during
      recovery were totally unnecessary.  See also: commit 3bbf668d, which
      tightened things up in this area a few years after the introduction of
      Hot Standby.
      
      Discussion: https://postgr.es/m/CAH2-Wz=465cJj11YXD9RKH8z=nhQa2dofOZ_23h67EXUGOJ00Q@mail.gmail.com
      0a7d771f
    • Alvaro Herrera's avatar
      Remove PROC_IN_ANALYZE and derived flags · cea3d558
      Alvaro Herrera authored
      These flags are unused and always have been.
      
      Discussion: https://postgr.es/m/20200805235549.GA8118@alvherre.pgsql
      cea3d558
    • Tom Lane's avatar
      Support testing of cases where table schemas change after planning. · 6f0b632f
      Tom Lane authored
      We have various cases where we allow DDL on tables to be performed with
      less than full AccessExclusiveLock.  This requires concurrent queries
      to be able to cope with the DDL change mid-flight, but up to now we had
      no repeatable way to test such cases.  To improve that, invent a test
      module that allows halting a backend after planning and then resuming
      execution once we've done desired actions in another session.  (The same
      approach could be used to inject delays in other places, if there's a
      suitable hook available.)
      
      This commit includes a single test case, which is meant to exercise the
      previously-untestable ExecCreatePartitionPruneState code repaired by
      commit 7a980dfc.  We'd probably not bother with this if that were the
      only foreseen benefit, but I expect additional test cases will use this
      infrastructure in the future.
      
      Test module by Andy Fan, partition-addition test case by me.
      
      Discussion: https://postgr.es/m/20200802181131.GA27754@telsasoft.com
      6f0b632f
    • Peter Geoghegan's avatar
      Rename nbtree split REDO routine variables. · 3df92bbd
      Peter Geoghegan authored
      Make the nbtree page split REDO routine variable names consistent with
      _bt_split() (which handles the original execution of page splits).
      These names make the code easier to follow by making the distinction
      between the original page and the left half of the split clear.  (The
      left half of the split page is a temp page that REDO creates to replace
      the origpage contents.)
      
      Also reduce the elevel used when adding a new high key to the temp page
      from PANIC to ERROR to be consistent.  We already only raise an ERROR
      when data item PageAddItem() temp page calls fail.
      3df92bbd
    • Etsuro Fujita's avatar
      Fix yet another issue with step generation in partition pruning. · 199cec97
      Etsuro Fujita authored
      Commit 13838740 fixed some issues with step generation in partition
      pruning, but there was yet another one: get_steps_using_prefix() assumes
      that clauses in the passed-in prefix list are sorted in ascending order
      of their partition key numbers, but the caller failed to ensure this for
      range partitioning, which led to an assertion failure in debug builds.
      Adjust the caller function to arrange the clauses in the prefix list in
      the required order for range partitioning.
      
      Back-patch to v11, like the previous commit.
      
      Patch by me, reviewed by Amit Langote.
      
      Discussion: https://postgr.es/m/CAPmGK16jkXiFG0YqMbU66wte-oJTfW6D1HaNvQf%3D%2B5o9%3Dm55wQ%40mail.gmail.com
      199cec97
  3. 06 Aug, 2020 4 commits
    • Peter Geoghegan's avatar
      Remove obsolete amcheck comment. · 3a3be806
      Peter Geoghegan authored
      Oversight in commit d114cc53.
      3a3be806
    • Peter Geoghegan's avatar
      amcheck: Sanitize metapage's allequalimage field. · c254d8d7
      Peter Geoghegan authored
      This will be helpful if it ever proves necessary to revoke an opclass's
      support for deduplication.
      
      Backpatch: 13-, where nbtree deduplication was introduced.
      c254d8d7
    • David Rowley's avatar
      Fix bogus EXPLAIN output for Hash Aggregate · d5e96520
      David Rowley authored
      9bdb300d modified the EXPLAIN output for Hash Aggregate to show details
      from parallel workers. However, it neglected to consider that a given
      parallel worker may not have assisted with the given Hash Aggregate. This
      can occur when workers fail to start or during Parallel Append with
      enable_partitionwise_join enabled when only a single worker is working on
      a non-parallel aware sub-plan. It could also happen if a worker simply
      wasn't fast enough to get any work done before other processes went and
      finished all the work.
      
      The bogus output came from the fact that ExplainOpenWorker() skipped
      showing any details for non-initialized workers but show_hashagg_info()
      did show details from the worker.  This meant that the worker properties
      that were shown were not properly attributed to the worker that they
      belong to.
      
      In passing, we also now don't show Hash Aggregate properties for the
      leader process when it did not contribute any work to the Hash Aggregate.
      This can occur either during Parallel Append when only a parallel worker
      worked on a given sub plan or with parallel_leader_participation set to
      off.  This aims to make the behavior of Hash Aggregate's EXPLAIN output
      more similar to Sort's.
      
      Reported-by: Justin Pryzby
      Discussion: https://postgr.es/m/20200805012105.GZ28072%40telsasoft.com
      Backpatch-through: 13, where the original breakage was introduced
      d5e96520
    • Robert Haas's avatar
      Register llvm_shutdown using on_proc_exit, not before_shmem_exit. · bab15004
      Robert Haas authored
      This seems more correct, because other before_shmem_exit calls may
      expect the infrastructure that is needed to run queries and access the
      database to be working, and also because this cleanup has nothing to
      do with shared memory.
      
      There are no known user-visible consequences to this, though, apart
      from what was previous fixed by commit
      30364019 and back-patched as commit
      bcbc27251d35336a6442761f59638138a772b839 and commit
      f7013683d9bb663a6a917421b1374306a32f165b, so for now, no back-patch.
      
      Bharath Rupireddy
      
      Discussion: http://postgr.es/m/CALj2ACWk7j4F2v2fxxYfrroOF=AdFNPr1WsV+AGtHAFQOqm_pw@mail.gmail.com
      bab15004
  4. 05 Aug, 2020 2 commits
    • Bruce Momjian's avatar
      doc: clarify "state" table reference in tutorial · a6775352
      Bruce Momjian authored
      Reported-by: Vyacheslav Shablistyy
      
      Discussion: https://postgr.es/m/159586122762.680.1361378513036616007@wrigleys.postgresql.org
      
      Backpatch-through: 9.5
      a6775352
    • Tom Lane's avatar
      Fix matching of sub-partitions when a partitioned plan is stale. · 7a980dfc
      Tom Lane authored
      Since we no longer require AccessExclusiveLock to add a partition,
      the executor may see that a partitioned table has more partitions
      than the planner saw.  ExecCreatePartitionPruneState's code for
      matching up the partition lists in such cases was faulty, and would
      misbehave if the planner had successfully pruned any partitions from
      the query.  (Thus, trouble would occur only if a partition addition
      happens concurrently with a query that uses both static and dynamic
      partition pruning.)  This led to an Assert failure in debug builds,
      and probably to crashes or query misbehavior in production builds.
      
      To repair the bug, just explicitly skip zeroes in the plan's
      relid_map[] list.  I also made some cosmetic changes to make the code
      more readable (IMO anyway).  Also, convert the cross-checking Assert
      to a regular test-and-elog, since it's now apparent that this logic
      is more fragile than one would like.
      
      Currently, there's no way to repeatably exercise this code, except
      with manual use of a debugger to stop the backend between planning
      and execution.  Hence, no test case in this patch.  We oughta do
      something about that testability gap, but that's for another day.
      
      Amit Langote and Tom Lane, per report from Justin Pryzby.  Oversight
      in commit 898e5e32; backpatch to v12 where that appeared.
      
      Discussion: https://postgr.es/m/20200802181131.GA27754@telsasoft.com
      7a980dfc
  5. 04 Aug, 2020 3 commits
  6. 03 Aug, 2020 8 commits
    • Peter Geoghegan's avatar
      Fix replica backward scan race condition. · 9a9db08a
      Peter Geoghegan authored
      It was possible for the logic used by backward scans (which must reason
      about concurrent page splits/deletions in its own peculiar way) to
      become confused when running on a replica.  Concurrent replay of a WAL
      record that describes the second phase of page deletion could cause
      _bt_walk_left() to get confused.  btree_xlog_unlink_page() simply failed
      to adhere to the same locking protocol that we use on the primary, which
      is obviously wrong once you consider these two disparate functions
      together.  This bug is present in all stable branches.
      
      More concretely, the problem was that nothing stopped _bt_walk_left()
      from observing inconsistencies between the deletion's target page and
      its original sibling pages when running on a replica.  This is true even
      though the second phase of page deletion is supposed to work as a single
      atomic action.  Queries running on replicas raised "could not find left
      sibling of block %u in index %s" can't-happen errors when they went back
      to their scan's "original" page and observed that the page has not been
      marked deleted (even though it really was concurrently deleted).
      
      There is no evidence that this actually happened in the real world.  The
      issue came to light during unrelated feature development work.  Note
      that _bt_walk_left() is the only code that cares about the difference
      between a half-dead page and a fully deleted page that isn't also
      exclusively used by nbtree VACUUM (unless you include contrib/amcheck
      code).  It seems very likely that backward scans are the only thing that
      could become confused by the inconsistency.  Even amcheck's complex
      bt_right_page_check_scankey() dance was unaffected.
      
      To fix, teach btree_xlog_unlink_page() to lock the left sibling, target,
      and right sibling pages in that order before releasing any locks (just
      like _bt_unlink_halfdead_page()).  This is the simplest possible
      approach.  There doesn't seem to be any opportunity to be more clever
      about lock acquisition in the REDO routine, and it hardly seems worth
      the trouble in any case.
      
      This fix might enable contrib/amcheck verification of leaf page sibling
      links with only an AccessShareLock on the relation.  An amcheck patch
      from Andrey Borodin was rejected back in January because it clashed with
      btree_xlog_unlink_page()'s lax approach to locking pages.  It now seems
      likely that the real problem was with btree_xlog_unlink_page(), not the
      patch.
      
      This is a low severity, low likelihood bug, so no backpatch.
      
      Author: Michail Nikolaev
      Diagnosed-By: Michail Nikolaev
      Discussion: https://postgr.es/m/CANtu0ohkR-evAWbpzJu54V8eCOtqjJyYp3PQ_SGoBTRGXWhWRw@mail.gmail.com
      9a9db08a
    • Peter Geoghegan's avatar
      Add nbtree page deletion assertion. · a451b7d4
      Peter Geoghegan authored
      Add a documenting assertion that's similar to the nearby assertion added
      by commit cd8c73a3.  This conveys that the entire call to _bt_pagedel()
      does no work if it isn't possible to get a descent stack for the initial
      scanblkno page.
      a451b7d4
    • Tom Lane's avatar
      Remove unnecessary "DISTINCT" in psql's queries for \dAc and \dAf. · 9e496768
      Tom Lane authored
      A moment's examination of these queries is sufficient to see that
      they do not produce duplicate rows, unless perhaps there's
      catalog corruption.  Using DISTINCT anyway is inefficient and
      confusing; moreover it sets a poor example for anyone who
      refers to psql -E output to see how to query the catalogs.
      9e496768
    • Tom Lane's avatar
      Doc: fix obsolete info about allowed range of TZ offsets in timetz. · eeb01e31
      Tom Lane authored
      We've allowed UTC offsets up to +/- 15:59 since commit cd0ff9c0, but
      that commit forgot to fix the documentation about timetz.
      
      Per bug #16571 from osdba.
      
      Discussion: https://postgr.es/m/16571-eb7501598de78c8a@postgresql.org
      eeb01e31
    • Tom Lane's avatar
      Fix behavior of ecpg's "EXEC SQL elif name". · 5f28b21e
      Tom Lane authored
      This ought to work much like C's "#elif defined(name)"; but the code
      implemented it in a way equivalent to endif followed by ifdef, so that
      it didn't matter whether any previous branch of the IF construct had
      succeeded.  Fix that; add some test cases covering elif and nested IFs;
      and improve the documentation, which also seemed a bit confused.
      
      AFAICS the code has been like this since the feature was added in 1999
      (commit b57b0e04).  So while it's surely wrong, there might be code
      out there relying on the current behavior.  Hence, don't back-patch
      into stable branches.  It seems all right to fix it in v13 though.
      
      Per report from Ashutosh Sharma.  Reviewed by Ashutosh Sharma and
      Michael Meskes.
      
      Discussion: https://postgr.es/m/CAE9k0P=dQk9X0cU2tN49S7a9tv733-e1pVdpB1P-pWJ5PdTktg@mail.gmail.com
      5f28b21e
    • Michael Paquier's avatar
      Add %P to log_line_prefix for parallel group leader · b8fdee7d
      Michael Paquier authored
      This is useful for monitoring purposes with log parsing.  Similarly to
      pg_stat_activity, the leader's PID is shown only for active parallel
      workers, minimizing the log footprint for the leaders as the equivalent
      shared memory field is set as long as a backend is alive.
      
      Author: Justin Pryzby
      Reviewed-by: Álvaro Herrera, Michael Paquier, Julien Rouhaud, Tom Lane
      Discussion: https://postgr.es/m/20200315111831.GA21492@telsasoft.com
      b8fdee7d
    • Thomas Munro's avatar
      Fix rare failure in LDAP tests. · f44b9b62
      Thomas Munro authored
      Instead of writing a query to psql's stdin, use -c.  This avoids a
      failure where psql exits before we write, seen a few times on the build
      farm.  Thanks to Tom Lane for the suggestion.
      
      Back-patch to 11, where the LDAP tests arrived.
      Reviewed-by: default avatarNoah Misch <noah@leadboat.com>
      Discussion: https://postgr.es/m/CA%2BhUKGLFmW%2BHQYPeKiwSp5sdFFHtFViCpw4Mh6yAgEx74r5-Cw%40mail.gmail.com
      f44b9b62
    • Thomas Munro's avatar
      Correct comment in simplehash.h. · 63e9aa68
      Thomas Munro authored
      Post-commit review for commit 84c0e4b9.
      
      Author: David Rowley <dgrowleyml@gmail.com>
      Discussion: https://postgr.es/m/CAApHDvptBx_%2BUPAzY0uXzopbvPVGKPeZ6Hoy8rnPcWz20Cr0Bw%40mail.gmail.com
      63e9aa68
  7. 02 Aug, 2020 2 commits
    • Tom Lane's avatar
      Fix minor issues in psql's new \dAc and related commands. · 533020d0
      Tom Lane authored
      The type-name pattern in \dAc and \dAf was matched only to the actual
      pg_type.typname string, which is fairly user-unfriendly in cases where
      that is not what's shown to the user by format_type (compare "_int4"
      and "integer[]").  Make this code match what \dT does, i.e. match the
      pattern against either typname or format_type() output.  Also fix its
      broken handling of schema-name restrictions.  (IOW, make these
      processSQLNamePattern calls match \dT's.)  While here, adjust
      whitespace to make the query a little prettier in -E output, too.
      
      Also improve some inaccuracies and shaky grammar in the related
      documentation.
      
      Noted while working on a patch for intarray's opclasses; I wondered
      why I couldn't get a match to "integer*" for the input type name.
      533020d0
    • David Rowley's avatar
      Use int64 instead of long in incremental sort code · 6ee3b5fb
      David Rowley authored
      Windows 64bit has 4-byte long values which is not suitable for tracking
      disk space usage in the incremental sort code. Let's just make all these
      fields int64s.
      
      Author: James Coleman
      Discussion: https://postgr.es/m/CAApHDvpky%2BUhof8mryPf5i%3D6e6fib2dxHqBrhp0Qhu0NeBhLJw%40mail.gmail.com
      Backpatch-through: 13, where the incremental sort code was added
      6ee3b5fb
  8. 01 Aug, 2020 5 commits
  9. 31 Jul, 2020 7 commits
    • Peter Geoghegan's avatar
      Restore lost amcheck TOAST test coverage. · c79aed4f
      Peter Geoghegan authored
      Commit eba77534 fixed an amcheck false positive bug involving
      inconsistencies in TOAST input state between table and index.  A test
      case was added that verified that such an inconsistency didn't result in
      a spurious corruption related error.
      
      Test coverage from the test was accidentally lost by commit 501e41dd,
      which propagated ALTER TABLE ...  SET STORAGE attstorage state to
      indexes.  This broke the test because the test specifically relied on
      attstorage not being propagated.  This artificially forced there to be
      index tuples whose datums were equivalent to the datums in the heap
      without the datums actually being bitwise equal.
      
      Fix this by updating pg_attribute directly instead.  Commit 501e41dd
      made similar changes to a test_decoding TOAST-related test case which
      made the same assumption, but overlooked the amcheck test case.
      
      Backpatch: 11-, just like commit eba77534 (and commit 501e41dd).
      c79aed4f
    • Tom Lane's avatar
      Fix oversight in ALTER TYPE: typmodin/typmodout must propagate to arrays. · 3d2376d5
      Tom Lane authored
      If a base type supports typmods, its array type does too, with the
      same interpretation.  Hence changes in pg_type.typmodin/typmodout
      must be propagated to the array type.
      
      While here, improve AlterTypeRecurse to not recurse to domains if
      there is nothing we'd need to change.
      
      Oversight in fe30e7eb.  Back-patch to v13 where that came in.
      3d2376d5
    • Tom Lane's avatar
      Fix recently-introduced performance problem in ts_headline(). · 78e73e87
      Tom Lane authored
      The new hlCover() algorithm that I introduced in commit c9b0c678
      turns out to potentially take O(N^2) or worse time on long documents,
      if there are many occurrences of individual query words but few or no
      substrings that actually satisfy the query.  (One way to hit this
      behavior is with a "common_word & rare_word" type of query.)  This
      seems unavoidable given the original goal of checking every substring
      of the document, so we have to back off that idea.  Fortunately, it
      seems unlikely that anyone would really want headlines spanning all of
      a long document, so we can avoid the worse-than-linear behavior by
      imposing a maximum length of substring that we'll consider.
      
      For now, just hard-wire that maximum length as a multiple of max_words
      times max_fragments.  Perhaps at some point somebody will argue for
      exposing it as a ts_headline parameter, but I'm hesitant to make such
      a feature addition in a back-patched bug fix.
      
      I also noted that the hlFirstIndex() function I'd added in that
      commit was unnecessarily stupid: it really only needs to check whether
      a HeadlineWordEntry's item pointer is null or not.  This wouldn't make
      all that much difference in typical cases with queries having just
      a few terms, but a cycle shaved is a cycle earned.
      
      In addition, add a CHECK_FOR_INTERRUPTS call in TS_execute_recurse.
      This ensures that hlCover's loop is cancellable if it manages to take
      a long time, and it may protect some other TS_execute callers as well.
      
      Back-patch to 9.6 as the previous commit was.  I also chose to add the
      CHECK_FOR_INTERRUPTS call to 9.5.  The old hlCover() algorithm seems
      to avoid the O(N^2) behavior, at least on the test case I tried, but
      nonetheless it's not very quick on a long document.
      
      Per report from Stephen Frost.
      
      Discussion: https://postgr.es/m/20200724160535.GW12375@tamriel.snowman.net
      78e73e87
    • Thomas Munro's avatar
      Fix compiler warning from Clang. · 7be04496
      Thomas Munro authored
      Per build farm.
      
      Discussion: https://postgr.es/m/20200731062626.GD3317%40paquier.xyz
      7be04496
    • Thomas Munro's avatar
      Preallocate some DSM space at startup. · 84b1c63a
      Thomas Munro authored
      Create an optional region in the main shared memory segment that can be
      used to acquire and release "fast" DSM segments, and can benefit from
      huge pages allocated at cluster startup time, if configured.  Fall back
      to the existing mechanisms when that space is full.  The size is
      controlled by a new GUC min_dynamic_shared_memory, defaulting to 0.
      
      Main region DSM segments initially contain whatever garbage the memory
      held last time they were used, rather than zeroes.  That change revealed
      that DSA areas failed to initialize themselves correctly in memory that
      wasn't zeroed first, so fix that problem.
      
      Discussion: https://postgr.es/m/CA%2BhUKGLAE2QBv-WgGp%2BD9P_J-%3Dyne3zof9nfMaqq1h3EGHFXYQ%40mail.gmail.com
      84b1c63a
    • Michael Paquier's avatar
      Fix comment in instrument.h · 7b1110d2
      Michael Paquier authored
      local_blks_dirtied tracks the number of local blocks dirtied, not shared
      ones.
      
      Author: Kirk Jamison
      Discussion: https://postgr.es/m/OSBPR01MB2341760686DC056DE89D2AB9EF710@OSBPR01MB2341.jpnprd01.prod.outlook.com
      7b1110d2
    • Thomas Munro's avatar
      Cache smgrnblocks() results in recovery. · c5315f4f
      Thomas Munro authored
      Avoid repeatedly calling lseek(SEEK_END) during recovery by caching
      the size of each fork.  For now, we can't use the same technique in
      other processes, because we lack a shared invalidation mechanism.
      
      Do this by generalizing the pre-existing caching used by FSM and VM
      to support all forks.
      
      Discussion: https://postgr.es/m/CAEepm%3D3SSw-Ty1DFcK%3D1rU-K6GSzYzfdD4d%2BZwapdN7dTa6%3DnQ%40mail.gmail.com
      c5315f4f