1. 27 Apr, 2017 3 commits
  2. 26 Apr, 2017 10 commits
    • Tom Lane's avatar
      Allow multiple bgworkers to be launched per postmaster iteration. · aa1351f1
      Tom Lane authored
      Previously, maybe_start_bgworker() would launch at most one bgworker
      process per call, on the grounds that the postmaster might otherwise
      neglect its other duties for too long.  However, that seems overly
      conservative, especially since bad effects only become obvious when
      many hundreds of bgworkers need to be launched at once.  On the other
      side of the coin is that the existing logic could result in substantial
      delay of bgworker launches, because ServerLoop isn't guaranteed to
      iterate immediately after a signal arrives.  (My attempt to fix that
      by using pselect(2) encountered too many portability question marks,
      and in any case could not help on platforms without pselect().)
      One could also question the wisdom of using an O(N^2) processing
      method if the system is intended to support so many bgworkers.
      
      As a compromise, allow that function to launch up to 100 bgworkers
      per call (and in consequence, rename it to maybe_start_bgworkers).
      This will allow any normal parallel-query request for workers
      to be satisfied immediately during sigusr1_handler, avoiding the
      question of whether ServerLoop will be able to launch more promptly.
      
      There is talk of rewriting the postmaster to use a WaitEventSet to
      avoid the signal-response-delay problem, but I'd argue that this change
      should be kept even after that happens (if it ever does).
      
      Backpatch to 9.6 where parallel query was added.  The issue exists
      before that, but previous uses of bgworkers typically aren't as
      sensitive to how quickly they get launched.
      
      Discussion: https://postgr.es/m/4707.1493221358@sss.pgh.pa.us
      aa1351f1
    • Bruce Momjian's avatar
      fda4fec5
    • Stephen Frost's avatar
      pg_get_partkeydef: return NULL for non-partitions · 0c76c246
      Stephen Frost authored
      Our general rule for pg_get_X(oid) functions is to simply return NULL
      when passed an invalid or inappropriate OID.  Teach pg_get_partkeydef to
      do this also, making it easier for users to use this function when
      querying against tables with both partitions and non-partitions (such as
      pg_class).
      
      As a concrete example, this makes pg_dump's life a little easier.
      
      Author: Amit Langote
      0c76c246
    • Tom Lane's avatar
      Silence compiler warning induced by commit de438971. · 49da0067
      Tom Lane authored
      Smarter compilers can see that "slot" can't be used uninitialized,
      but some popular ones cannot.  Noted by Jeff Janes.
      49da0067
    • Peter Eisentraut's avatar
      doc: ALTER SUBSCRIPTION documentation fixes · e315346d
      Peter Eisentraut authored
      WITH is optional for REFRESH PUBLICATION.  Also, remove a spurious
      bracket and fix a punctuation.
      
      Author: Euler Taveira <euler@timbira.com.br>
      e315346d
    • Peter Eisentraut's avatar
      Fix query that gets remote relation info · 61ecc90b
      Peter Eisentraut authored
      Publisher relation can be incorrectly chosen, if there are more than
      one relation in different schemas with the same name.
      
      Author: Euler Taveira <euler@timbira.com.br>
      61ecc90b
    • Peter Eisentraut's avatar
      Spelling fixes in code comments · e495c168
      Peter Eisentraut authored
      Author: Euler Taveira <euler@timbira.com.br>
      e495c168
    • Fujii Masao's avatar
      Fix typo in comment. · 1f8b0601
      Fujii Masao authored
      Author: Masahiko Sawada
      1f8b0601
    • Peter Eisentraut's avatar
      Fix various concurrency issues in logical replication worker launching · de438971
      Peter Eisentraut authored
      The code was originally written with assumption that launcher is the
      only process starting the worker.  However that hasn't been true since
      commit 7c4f5240 which failed to modify the worker management code
      adequately.
      
      This patch adds an in_use field to the LogicalRepWorker struct to
      indicate whether the worker slot is being used and uses proper locking
      everywhere this flag is set or read.
      
      However if the parent process dies while the new worker is starting and
      the new worker fails to attach to shared memory, this flag would never
      get cleared.  We solve this rare corner case by adding a sort of garbage
      collector for in_use slots.  This uses another field in the
      LogicalRepWorker struct named launch_time that contains the time when
      the worker was started.  If any request to start a new worker does not
      find free slot, we'll check for workers that were supposed to start but
      took too long to actually do so, and reuse their slot.
      
      In passing also fix possible race conditions when stopping a worker that
      hasn't finished starting yet.
      
      Author: Petr Jelinek <petr.jelinek@2ndquadrant.com>
      Reported-by: default avatarFujii Masao <masao.fujii@gmail.com>
      de438971
    • Bruce Momjian's avatar
      doc PG10: add Rafia Sabih to parallel index scan item · 309191f6
      Bruce Momjian authored
      Reported-by: Amit Kapila
      309191f6
  3. 25 Apr, 2017 22 commits
  4. 24 Apr, 2017 5 commits
    • Bruce Momjian's avatar
      66fade8a
    • Tom Lane's avatar
      Revert "Use pselect(2) not select(2), if available, to wait in postmaster's loop." · 64925603
      Tom Lane authored
      This reverts commit 81069a9e.
      
      Buildfarm results suggest that some platforms have versions of pselect(2)
      that are not merely non-atomic, but flat out non-functional.  Revert the
      use-pselect patch to confirm this diagnosis (and exclude the no-SA_RESTART
      patch as the source of trouble).  If it's so, we should probably look into
      blacklisting specific platforms that have broken pselect.
      
      Discussion: https://postgr.es/m/9696.1493072081@sss.pgh.pa.us
      64925603
    • Tom Lane's avatar
      Use pselect(2) not select(2), if available, to wait in postmaster's loop. · 81069a9e
      Tom Lane authored
      Traditionally we've unblocked signals, called select(2), and then blocked
      signals again.  The code expects that the select() will be cancelled with
      EINTR if an interrupt occurs; but there's a race condition, which is that
      an already-pending signal will be delivered as soon as we unblock, and then
      when we reach select() there will be nothing preventing it from waiting.
      This can result in a long delay before we perform any action that
      ServerLoop was supposed to have taken in response to the signal.  As with
      the somewhat-similar symptoms fixed by commit 89390208, the main practical
      problem is slow launching of parallel workers.  The window for trouble is
      usually pretty short, corresponding to one iteration of ServerLoop; but
      it's not negligible.
      
      To fix, use pselect(2) in place of select(2) where available, as that's
      designed to solve exactly this problem.  Where not available, we continue
      to use the old way, and are no worse off than before.
      
      pselect(2) has been required by POSIX since about 2001, so most modern
      platforms should have it.  A bigger portability issue is that some
      implementations are said to be non-atomic, ie pselect() isn't really
      any different from unblock/select/reblock.  Still, we're no worse off
      than before on such a platform.
      
      There is talk of rewriting the postmaster to use a WaitEventSet and
      not do signal response work in signal handlers, at which point this
      could be reverted, since we'd be using a self-pipe to solve the race
      condition.  But that's not happening before v11 at the earliest.
      
      Back-patch to 9.6.  The problem exists much further back, but the
      worst symptom arises only in connection with parallel query, so it
      does not seem worth taking any portability risks in older branches.
      
      Discussion: https://postgr.es/m/9205.1492833041@sss.pgh.pa.us
      81069a9e
    • Tom Lane's avatar
      Run the postmaster's signal handlers without SA_RESTART. · 89390208
      Tom Lane authored
      The postmaster keeps signals blocked everywhere except while waiting
      for something to happen in ServerLoop().  The code expects that the
      select(2) will be cancelled with EINTR if an interrupt occurs; without
      that, followup actions that should be performed by ServerLoop() itself
      will be delayed.  However, some platforms interpret the SA_RESTART
      signal flag as meaning that they should restart rather than cancel
      the select(2).  Worse yet, some of them restart it with the original
      timeout delay, meaning that a steady stream of signal interrupts can
      prevent ServerLoop() from iterating at all if there are no incoming
      connection requests.
      
      Observable symptoms of this, on an affected platform such as HPUX 10,
      include extremely slow parallel query startup (possibly as much as
      30 seconds) and failure to update timestamps on the postmaster's sockets
      and lockfiles when no new connections arrive for a long time.
      
      We can fix this by running the postmaster's signal handlers without
      SA_RESTART.  That would be quite a scary change if the range of code
      where signals are accepted weren't so tiny, but as it is, it seems
      safe enough.  (Note that postmaster children do, and must, reset all
      the handlers before unblocking signals; so this change should not
      affect any child process.)
      
      There is talk of rewriting the postmaster to use a WaitEventSet and
      not do signal response work in signal handlers, at which point it might
      be appropriate to revert this patch.  But that's not happening before
      v11 at the earliest.
      
      Back-patch to 9.6.  The problem exists much further back, but the
      worst symptom arises only in connection with parallel query, so it
      does not seem worth taking any portability risks in older branches.
      
      Discussion: https://postgr.es/m/9205.1492833041@sss.pgh.pa.us
      89390208
    • Fujii Masao's avatar
      Get rid of extern declarations of non-existent functions. · cbc2270e
      Fujii Masao authored
      Those extern declartions were mistakenly added by commit 7c4f5240.
      
      Author: Petr Jelinek
      cbc2270e