1. 25 Apr, 2017 11 commits
  2. 24 Apr, 2017 7 commits
    • Bruce Momjian's avatar
      66fade8a
    • Revert "Use pselect(2) not select(2), if available, to wait in postmaster's loop." · 64925603
      Tom Lane authored
      This reverts commit 81069a9e.
      
      Buildfarm results suggest that some platforms have versions of pselect(2)
      that are not merely non-atomic, but flat out non-functional.  Revert the
      use-pselect patch to confirm this diagnosis (and exclude the no-SA_RESTART
      patch as the source of trouble).  If it's so, we should probably look into
      blacklisting specific platforms that have broken pselect.
      
      Discussion: https://postgr.es/m/9696.1493072081@sss.pgh.pa.us
    • Use pselect(2) not select(2), if available, to wait in postmaster's loop. · 81069a9e
      Tom Lane authored
      Traditionally we've unblocked signals, called select(2), and then blocked
      signals again.  The code expects that the select() will be cancelled with
      EINTR if an interrupt occurs; but there's a race condition, which is that
      an already-pending signal will be delivered as soon as we unblock, and then
      when we reach select() there will be nothing preventing it from waiting.
      This can result in a long delay before we perform any action that
      ServerLoop was supposed to have taken in response to the signal.  As with
      the somewhat-similar symptoms fixed by commit 89390208, the main practical
      problem is slow launching of parallel workers.  The window for trouble is
      usually pretty short, corresponding to one iteration of ServerLoop; but
      it's not negligible.
      
      To fix, use pselect(2) in place of select(2) where available, as that's
      designed to solve exactly this problem.  Where not available, we continue
      to use the old way, and are no worse off than before.
      
      pselect(2) has been required by POSIX since about 2001, so most modern
      platforms should have it.  A bigger portability issue is that some
      implementations are said to be non-atomic, i.e., pselect() isn't really
      any different from unblock/select/reblock.  Still, we're no worse off
      than before on such a platform.
      
      There is talk of rewriting the postmaster to use a WaitEventSet and
      not do signal response work in signal handlers, at which point this
      could be reverted, since we'd be using a self-pipe to solve the race
      condition.  But that's not happening before v11 at the earliest.
      
      Back-patch to 9.6.  The problem exists much further back, but the
      worst symptom arises only in connection with parallel query, so it
      does not seem worth taking any portability risks in older branches.
      
      Discussion: https://postgr.es/m/9205.1492833041@sss.pgh.pa.us
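To make the race concrete, here is a minimal C sketch; it is not the postmaster's actual code. It keeps a signal blocked, lets it become pending, and then waits in pselect(2) with a mask that unblocks it, so the pending signal interrupts the wait with EINTR rather than being delivered just before the wait starts. The names wait_with_pselect, note_signal and got_signal are hypothetical, and the sketch assumes a platform whose pselect() really is atomic.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* request pselect() and sigaction() declarations */
#endif

#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/select.h>
#include <time.h>

/* Hypothetical flag set by the handler; the wait loop would act on it. */
static volatile sig_atomic_t got_signal = 0;

static void note_signal(int signo)
{
    (void) signo;
    got_signal = 1;
}

/*
 * Block signo, make it pending, then wait with pselect().
 * Returns 1 if the wait was interrupted by the signal (EINTR),
 * 0 if it timed out, and -1 on any other error.
 */
int wait_with_pselect(int signo, long timeout_sec)
{
    struct sigaction sa;
    sigset_t block_set, old_mask, wait_mask;
    struct timespec timeout;
    int rc, result;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = note_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;              /* no SA_RESTART: the wait must be cancelled */
    if (sigaction(signo, &sa, NULL) == -1)
        return -1;

    /* Keep the signal blocked everywhere except inside the wait. */
    sigemptyset(&block_set);
    sigaddset(&block_set, signo);
    if (sigprocmask(SIG_BLOCK, &block_set, &old_mask) == -1)
        return -1;

    /* Simulate a signal that arrives while blocked: it stays pending. */
    raise(signo);

    /* Mask to install atomically for the duration of the wait. */
    wait_mask = old_mask;
    sigdelset(&wait_mask, signo);

    timeout.tv_sec = timeout_sec;
    timeout.tv_nsec = 0;

    /* No file descriptors are watched; only the signal or the timeout ends the wait. */
    rc = pselect(0, NULL, NULL, NULL, &timeout, &wait_mask);
    if (rc == -1 && errno == EINTR)
        result = 1;
    else if (rc == 0)
        result = 0;
    else
        result = -1;

    /* Restore the caller's original signal mask. */
    sigprocmask(SIG_SETMASK, &old_mask, NULL);
    return result;
}
```

With unblock/select/reblock, the same pending signal would be delivered at the unblock step, and select() would then sleep for the whole timeout.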
    • Run the postmaster's signal handlers without SA_RESTART. · 89390208
      Tom Lane authored
      The postmaster keeps signals blocked everywhere except while waiting
      for something to happen in ServerLoop().  The code expects that the
      select(2) will be cancelled with EINTR if an interrupt occurs; without
      that, followup actions that should be performed by ServerLoop() itself
      will be delayed.  However, some platforms interpret the SA_RESTART
      signal flag as meaning that they should restart rather than cancel
      the select(2).  Worse yet, some of them restart it with the original
      timeout delay, meaning that a steady stream of signal interrupts can
      prevent ServerLoop() from iterating at all if there are no incoming
      connection requests.
      
      Observable symptoms of this, on an affected platform such as HPUX 10,
      include extremely slow parallel query startup (possibly as much as
      30 seconds) and failure to update timestamps on the postmaster's sockets
      and lockfiles when no new connections arrive for a long time.
      
      We can fix this by running the postmaster's signal handlers without
      SA_RESTART.  That would be quite a scary change if the range of code
      where signals are accepted weren't so tiny, but as it is, it seems
      safe enough.  (Note that postmaster children do, and must, reset all
      the handlers before unblocking signals; so this change should not
      affect any child process.)
      
      There is talk of rewriting the postmaster to use a WaitEventSet and
      not do signal response work in signal handlers, at which point it might
      be appropriate to revert this patch.  But that's not happening before
      v11 at the earliest.
      
      Back-patch to 9.6.  The problem exists much further back, but the
      worst symptom arises only in connection with parallel query, so it
      does not seem worth taking any portability risks in older branches.
      
      Discussion: https://postgr.es/m/9205.1492833041@sss.pgh.pa.us
    • Get rid of extern declarations of non-existent functions. · cbc2270e
      Fujii Masao authored
      Those extern declarations were mistakenly added by commit 7c4f5240.
      
      Author: Petr Jelinek
    • Fix postmaster's handling of fork failure for a bgworker process. · 4fe04244
      Tom Lane authored
      This corner case didn't behave nicely at all: the postmaster would
      (partially) update its state as though the process had started
      successfully, and be quite confused thereafter.  Fix it to act
      like the worker had crashed, instead.
      
      In passing, refactor so that do_start_bgworker contains all the
      state-change logic for bgworker launch, rather than just some of it.
      
      Back-patch as far as 9.4.  9.3 contains similar logic, but it's just
      enough different that I don't feel comfortable applying the patch
      without more study; and the use of bgworkers in 9.3 was so small
      that it doesn't seem worth the extra work.
      
      transam/parallel.c is still entirely unprepared for the possibility
      of bgworker startup failure, but that seems like material for a
      separate patch.
      
      Discussion: https://postgr.es/m/4905.1492813727@sss.pgh.pa.us
    • Code review for commands/statscmds.c. · 4b34624d
      Tom Lane authored
      Fix machine-dependent sorting of column numbers.  (Odd behavior
      would only materialize for column numbers above 255, but that's
      certainly legal.)
      
      Fix poor choice of SQLSTATE for some errors, and improve error message
      wording.  (Notably, "is not a scalar type" is a totally misleading way
      to explain "does not have a default btree opclass".)
      
      Avoid taking AccessExclusiveLock on the associated relation during DROP
      STATISTICS.  That's neither necessary nor desirable, and it could easily
      have put us into situations where DROP fails (compare commit 68ea2b7f).
      
      Adjust/improve comments.
      
      David Rowley and Tom Lane
      
      Discussion: https://postgr.es/m/CAKJS1f-GmCfPvBbAEaM5xoVOaYdVgVN1gicALSoYQ77z-+vLbw@mail.gmail.com
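The commit message does not say what the original comparator did wrong, so the following is only a sketch of a sort with well-defined ordering, assuming column numbers are stored as 16-bit integers. The names compare_int16 and sort_column_numbers are hypothetical here. The comparator looks at whole values, not individual bytes, so values above 255 sort the same way on every platform.

```c
#include <stdint.h>
#include <stdlib.h>

/* qsort comparator over whole int16_t values; returns <0, 0 or >0. */
static int compare_int16(const void *a, const void *b)
{
    int16_t av = *(const int16_t *) a;
    int16_t bv = *(const int16_t *) b;

    /* Compare, don't subtract, so narrowing can never flip the sign. */
    return (av > bv) - (av < bv);
}

/* Sort an array of column numbers into ascending order, in place. */
void sort_column_numbers(int16_t *cols, size_t n)
{
    qsort(cols, n, sizeof(int16_t), compare_int16);
}
```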
  3. 23 Apr, 2017 8 commits
  4. 22 Apr, 2017 4 commits
    • Make PostgresNode.pm check server status more carefully. · 7d68f228
      Tom Lane authored
      PostgresNode blithely ignored the exit status of pg_ctl, and in general
      made no effort to be sure that the server was running when it should be.
      This caused it to miss server crashes, which is a serious shortcoming
      in a test scaffold.  Make it complain if pg_ctl fails, and modify the
      start and stop logic to complain if the server doesn't start, or doesn't
      stop, when expected.
      
      Also, have it turn off the "restart_after_crash" configuration parameter
      in created clusters, as bitter experience has shown that leaving that on
      can mask crashes too.
      
      We might at some point need variant functions that allow for, e.g.,
      server start failure to be expected.  But no existing test case appears
      to want that, and it surely shouldn't be the default behavior.
      
      Note that this *will* break the buildfarm, as it will expose known
      bugs that the previous testing failed to expose.  I'm committing it despite
      that, to verify that we get the expected failures in the buildfarm
      not just in manual testing.
      
      Back-patch into 9.6 where PostgresNode was introduced.  (The 9.6
      branch is not expected to show any failures.)
      
      Discussion: https://postgr.es/m/21432.1492886428@sss.pgh.pa.us
    • Make PostgresNode::append_conf append a newline automatically. · 8a19c1a3
      Tom Lane authored
      Although the documentation for append_conf said clearly that it didn't
      add a newline, many test authors seem to have forgotten that ... or maybe
      they just consulted the example at the top of the POD documentation,
      which clearly shows adding a config entry without bothering to add a
      trailing newline.  The worst part of that is that it works, as long as
      you don't do it more than once, since the backend isn't picky about
      whether config files end with newlines.  So there's not a strong forcing
      function reminding test authors not to do it like that.  Upshot is that
      this is a terribly fragile way to go about things, and there's at least
      one existing test case that is demonstrably broken and not testing what
      it thinks it is.
      
      Let's just make append_conf append a newline, instead; that is clearly
      way safer than the old definition.
      
      I also cleaned up a few call sites that were unnecessarily ugly.
      (I left things alone in places where it's plausible that additional
      config lines would need to be added someday.)
      
      Back-patch the change in append_conf itself to 9.6 where it was added,
      as having a definitional inconsistency between branches would obviously
      be pretty hazardous for back-patching TAP tests.  The other changes are
      just cosmetic and don't need to be back-patched.
      
      Discussion: https://postgr.es/m/19751.1492892376@sss.pgh.pa.us
    • Require sufficiently modern version of Test::More for TAP tests · f92562ad
      Andrew Dunstan authored
      Ancient versions of Test::More don't support the note() function used in
      some TAP tests, so we require the minimum version of the module that
      does.
    • Partially revert commit 536d47bd. · 5041cdf2
      Tom Lane authored
      Per buildfarm, the "#ifdef F_SETFD" removed in that commit actually
      is needed on Windows, because fcntl() isn't available at all on that
      platform, unless using Cygwin.  We could perhaps spell it more like
      "#ifdef HAVE_FCNTL", or "#ifndef WIN32", but it's not clear that
      those choices are better.
      
      It does seem that we don't need the bogus manual definition of
      FD_CLOEXEC, though, so keep that change.
      
      Discussion: https://postgr.es/m/26254.1492805635@sss.pgh.pa.us
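The guard under discussion can be pictured with a short C sketch. This is an illustration, not the code touched by the commit, and set_close_on_exec is a hypothetical helper name. Where F_SETFD is not defined (the commit message names Windows without Cygwin), the fcntl() call is compiled out and the helper does nothing.

```c
#include <fcntl.h>

/*
 * Mark fd close-on-exec where the platform supports it.
 * Returns 1 on success (or when there is nothing to do), 0 on failure.
 */
int set_close_on_exec(int fd)
{
#ifdef F_SETFD
    /* POSIX only promises "not -1" on success, so test for -1 exactly. */
    if (fcntl(fd, F_SETFD, FD_CLOEXEC) == -1)
        return 0;
#else
    /* No fcntl() on this platform: compile the call out entirely. */
    (void) fd;
#endif
    return 1;
}
```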
  5. 21 Apr, 2017 6 commits
    • doc: Update link · f58b6643
      Peter Eisentraut authored
      The reference "That is the topic of the next section." has been
      incorrect since the materialized views documentation got inserted
      between the section "rules-views" and "rules-update".
      
      Author: Zertrin <postgres_wiki@zertrin.org>
    • Avoid depending on non-POSIX behavior of fcntl(2). · 3e51725b
      Tom Lane authored
      The POSIX standard does not say that the success return value for
      fcntl(F_SETFD) and fcntl(F_SETFL) is zero; it says only that it's not -1.
      We had several calls that were making the stronger assumption.  Adjust
      them to test specifically for -1 for strict spec compliance.
      
      The standard further leaves open the possibility that the O_NONBLOCK
      flag bit is not the only active one in F_SETFL's argument.  Formally,
      therefore, one ought to get the current flags with F_GETFL and store
      them back with only the O_NONBLOCK bit changed when trying to change
      the nonblock state.  In port/noblock.c, we were doing the full pushup
      in pg_set_block but not in pg_set_noblock, which is just weird.  Make
      both of them do it properly, since they have little business making
      any assumptions about the socket they're handed.  The other places
      where we're issuing F_SETFL are working with FDs we just got from
      pipe(2), so it's reasonable to assume the FDs' properties are all
      default, so I didn't bother adding F_GETFL steps there.
      
      Also, while pg_set_block deserves some points for trying to do things
      right, somebody had decided that it'd be even better to cast fcntl's
      third argument to "long".  Which is completely loony, because POSIX
      clearly says the third argument for an F_SETFL call is "int".
      
      Given the lack of field complaints, these missteps apparently are not
      of significance on any common platforms.  But they're still wrong,
      so back-patch to all supported branches.
      
      Discussion: https://postgr.es/m/30882.1492800880@sss.pgh.pa.us
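The read-modify-write pattern described above looks roughly like this in C. It is a sketch, not the contents of port/noblock.c; set_nonblocking and is_nonblocking are hypothetical names. It fetches the current flags with F_GETFL, changes only the O_NONBLOCK bit, passes the flags to F_SETFL as an int, and treats only -1 as failure.

```c
#include <fcntl.h>

/*
 * Turn O_NONBLOCK on (enable != 0) or off (enable == 0) for fd, leaving
 * every other file status flag as it was.  Returns 1 on success, 0 on failure.
 */
int set_nonblocking(int fd, int enable)
{
    int flags = fcntl(fd, F_GETFL);

    if (flags == -1)
        return 0;

    if (enable)
        flags |= O_NONBLOCK;
    else
        flags &= ~O_NONBLOCK;

    /* The third argument for F_SETFL is an int; success is "anything but -1". */
    if (fcntl(fd, F_SETFL, flags) == -1)
        return 0;
    return 1;
}

/* Returns 1 if fd is non-blocking, 0 if it is blocking, -1 on error. */
int is_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);

    if (flags == -1)
        return -1;
    return (flags & O_NONBLOCK) != 0;
}
```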
    • Change the on-disk format of SCRAM verifiers to conform to RFC 5803. · 68e61ee7
      Heikki Linnakangas authored
      It doesn't make any immediate difference to PostgreSQL, but might as well
      follow the standard, since one exists. (I looked at RFC 5803 earlier, but
      didn't fully understand it back then.)
      
      The new format uses Base64 instead of hex to encode StoredKey and
      ServerKey, which makes the verifiers slightly smaller. Using the same
      encoding for the salt and the keys also means that you only need one
      encoder/decoder instead of two. Although we have code in the backend to
      do both, we are talking about teaching libpq how to create SCRAM verifiers
      for PQencodePassword(), and libpq doesn't currently have any code for hex
      encoding.
      
      Bump catversion, because this renders any existing SCRAM verifiers in
      pg_authid invalid.
      
      Discussion: https://www.postgresql.org/message-id/351ba574-85ea-d9b8-9689-8c928dd0955d@iki.fi
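The size difference is easy to check with a little arithmetic, sketched here in C. Hex encoding uses two characters per byte, while padded Base64 uses four characters per group of three bytes. The helper names are hypothetical, and the 32-byte key length in the note below assumes the keys are SHA-256 outputs, which the commit message itself does not state.

```c
#include <stddef.h>

/* Characters needed to hex-encode n bytes: two per byte. */
size_t hex_encoded_len(size_t n)
{
    return 2 * n;
}

/* Characters needed to Base64-encode n bytes with padding:
 * four per (possibly partial) group of three bytes. */
size_t base64_encoded_len(size_t n)
{
    return 4 * ((n + 2) / 3);
}
```

For a 32-byte key this gives 64 characters in hex and 44 in Base64.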
    • doc: Fix typo · c29a752c
      Peter Eisentraut authored
    • Remove long-obsolete catering for platforms without F_SETFD/FD_CLOEXEC. · 536d47bd
      Tom Lane authored
      SUSv2 mandates that <fcntl.h> provide both F_SETFD and FD_CLOEXEC,
      so it seems pretty unlikely that any platforms remain without those.
      Remove the #ifdef-ery installed by commit 7627b91c to see if the
      buildfarm agrees.
      
      Discussion: https://postgr.es/m/21444.1492798101@sss.pgh.pa.us
    • Synchronize table list before creating slot in CREATE SUBSCRIPTION · dcb39c37
      Peter Eisentraut authored
      This way a failure to synchronize the table list will not leave an
      unused slot on the publisher.
      
      Author: Petr Jelinek <petr.jelinek@2ndquadrant.com>
  6. 20 Apr, 2017 4 commits
    • Add missing erand48.c to libpq/.gitignore. · 77c316be
      Tom Lane authored
      Oversight in commit 818fd4a6.  While at it, sync order of file list
      in .gitignore with those in the Makefile.
    • Improve multivariate statistics documentation · 919f6d74
      Alvaro Herrera authored
      Extended statistics commit 7b504eb2 did not include appropriate
      documentation next to where we document regular planner statistics (I
      ripped what was submitted before commit and then forgot to put it back),
      and while later commit 2686ee1b added some material, it structurally
      depended on what I had ripped out, so the end result wasn't proper.
      
      Fix those problems by shuffling what was added by 2686ee1b and
      including some additional material, so that chapter 14 "Performance
      Tips" now describes the types of multivariate statistics we currently
      have, and chapter 68 "How the Planner Uses Statistics" shows some
      examples.  The new text should be more in line with previous material,
      in (hopefully) the appropriate depth.
      
      While at it, fix a small bug in pg_statistic_ext docs: one column was
      listed in the wrong spot.
    • Sync pg_ctl documentation and usage message with reality. · 8bcb31ad
      Tom Lane authored
      Commit 05cd12ed ("pg_ctl: Change default to wait for all actions")
      was a tad sloppy about updating the documentation to match.  The
      documentation was also sorely in need of a copy-editing pass, having
      been adjusted at different times by different people who took little
      care to maintain consistency of style.
    • Modify message when partitioned table is added to publication · 594b526b
      Peter Eisentraut authored
      Give a more specific error message than "xyz is not a table".
      
      Also document in CREATE PUBLICATION which kinds of relations are not
      supported.
      
      based on patch by Amit Langote <Langote_Amit_f8@lab.ntt.co.jp>