1. 27 May, 2016 4 commits
    • Tom Lane's avatar
      Clean up thread management in parallel pg_dump for Windows. · 807b4537
      Tom Lane authored
      Since we start the worker threads with _beginthreadex(), we should use
      _endthreadex() to terminate them.  We got this right in the normal-exit
      code path, but not so much during an error exit from a worker.
      In addition, be sure to apply CloseHandle to the thread handle after
      each thread exits.
      
      It's not clear that these oversights cause any user-visible problems,
      since the pg_dump run is about to terminate anyway.  Still, it's clearly
      better to follow Microsoft's API specifications than ignore them.
      
      Also a few cosmetic cleanups in WaitForTerminatingWorkers(), including
      being a bit less random about where to cast between uintptr_t and HANDLE,
      and being sure to clear the worker identity field for each dead worker
      (not that false matches should be possible later, but let's be careful).
      
      Original observation and patch by Armin Schöffmann, cosmetic improvements
      by Michael Paquier and me.  (Armin's patch also included closing sockets
      in ShutdownWorkersHard(), but that's been dealt with already in commit
      df8d2d8c.)  Back-patch to 9.3 where parallel pg_dump was introduced.
      
      Discussion: <zarafa.570306bd.3418.074bf1420d8f2ba2@root.aegaeon.de>
      807b4537
    • Tom Lane's avatar
      Fix release-note typo. · d81ecb9b
      Tom Lane authored
      Léonard Benedetti
      d81ecb9b
    • Tom Lane's avatar
      Fix DROP ACCESS METHOD IF EXISTS. · 83dbde94
      Tom Lane authored
      The IF EXISTS option was documented, and implemented in the grammar, but
      it didn't actually work for lack of support in does_not_exist_skipping().
      Per bug #14160.
      
      Report and patch by Kouhei Sutou
      
      Report: <20160527070433.19424.81712@wrigleys.postgresql.org>
      83dbde94
    • Tom Lane's avatar
      Be more predictable about reporting "lock timeout" vs "statement timeout". · 9dd4178c
      Tom Lane authored
      If both timeout indicators are set when we arrive at ProcessInterrupts,
      we've historically just reported "lock timeout".  However, some buildfarm
      members have been observed to fail isolationtester's timeouts test by
      reporting "lock timeout" when the statement timeout was expected to fire
      first.  The cause seems to be that the process is allowed to sleep longer
      than expected (probably due to heavy machine load) so that the lock
      timeout happens before we reach the point of reporting the error, and
      then this arbitrary tiebreak rule does the wrong thing.  We can improve
      matters by comparing the scheduled timeout times to decide which error
      to report.
      
      I had originally proposed greatly reducing the 1-second window between
      the two timeouts in the test cases.  On reflection that is a bad idea,
      at least for the case where the lock timeout is expected to fire first,
      because that would assume that it takes negligible time to get from
      statement start to the beginning of the lock wait.  Thus, this patch
      doesn't completely remove the risk of test failures on slow machines.
      Empirically, however, the case this handles is the one we are seeing
      in the buildfarm.  The explanation may be that the other case requires
      the scheduler to take the CPU away from a busy process, whereas the
      case fixed here only requires the scheduler to not give the CPU back
      right away to a process that has been woken from a multi-second sleep
      (and, perhaps, has been swapped out meanwhile).
      
      Back-patch to 9.3 where the isolationtester timeouts test was added.
      
      Discussion: <8693.1464314819@sss.pgh.pa.us>
      9dd4178c
  2. 26 May, 2016 5 commits
    • Magnus Hagander's avatar
      Make pg_dump error cleanly with -j against hot standby · d74048de
      Magnus Hagander authored
      Getting a synchronized snapshot is not supported on a hot standby node,
      and is by default taken when using -j with multiple sessions. Trying to
      do so still failed, but with a server error that would also go in the
      log. Instead, proprely detect this case and give a better error message.
      d74048de
    • Tom Lane's avatar
      Disable physical tlist if any Var would need multiple sortgroupref labels. · aeb9ae64
      Tom Lane authored
      As part of upper planner pathification (commit 3fc6e2d7) I redid
      createplan.c's approach to the physical-tlist optimization, in which scan
      nodes are allowed to return exactly the underlying table's columns so as
      to save doing a projection step at runtime.  The logic was intentionally
      more aggressive than before about applying the optimization, which is
      generally a good thing, but Andres Freund found a case in which it got
      too aggressive.  Namely, if any column is referenced more than once in
      the parent plan node's sorting or grouping column list, we can't optimize
      because then that column would need to have more than one ressortgroupref
      label, and we only have space for one.
      
      Add logic to detect this situation in use_physical_tlist(), and also add
      some error checking in apply_pathtarget_labeling_to_tlist(), which this
      example proves was being overly cavalier about whether what it was doing
      made any sense.
      
      The added test case exposes the problem only because we do not eliminate
      duplicate grouping keys.  That might be something to fix someday, but it
      doesn't seem like appropriate post-beta work.
      
      Report: <20160526021235.w4nq7k3gnheg7vit@alap3.anarazel.de>
      aeb9ae64
    • Alvaro Herrera's avatar
      Fix typo in 9.5 release nodes · d7ef3572
      Alvaro Herrera authored
      Noted by 星合 拓馬 (HOSHIAI Takuma)
      d7ef3572
    • Tom Lane's avatar
      Make pg_dump behave more sanely when built without HAVE_LIBZ. · cae2bb19
      Tom Lane authored
      For some reason the code to emit a warning and switch to uncompressed
      output was placed down in the guts of pg_backup_archiver.c.  This is
      definitely too late in the case of parallel operation (and I rather
      wonder if it wasn't too late for other purposes as well).  Put it in
      pg_dump.c's option-processing logic, which seems a much saner place.
      
      Also, the default behavior with custom or directory output format was
      to emit the warning telling you the output would be uncompressed.  This
      seems unhelpful, so silence that case.
      
      Back-patch to 9.3 where parallel dump was introduced.
      
      Kyotaro Horiguchi, adjusted a bit by me
      
      Report: <20160526.185551.242041780.horiguchi.kyotaro@lab.ntt.co.jp>
      cae2bb19
    • Tom Lane's avatar
      In Windows pg_dump, ensure idle workers will shut down during error exit. · df8d2d8c
      Tom Lane authored
      The Windows coding of ShutdownWorkersHard() thought that setting termEvent
      was sufficient to make workers exit after an error.  But that only helps
      if a worker is busy and passes through checkAborting().  An idle worker
      will just sit, resulting in pg_dump failing to exit until the user gives up
      and hits control-C.  We should close the write end of the command pipe
      so that idle workers will see socket EOF and exit, as the Unix coding was
      already doing.
      
      Back-patch to 9.3 where parallel pg_dump was introduced.
      
      Kyotaro Horiguchi
      df8d2d8c
  3. 25 May, 2016 8 commits
    • Tom Lane's avatar
      Remove option to write USING before opclass name in CREATE INDEX. · b898eb63
      Tom Lane authored
      Dating back to commit f10b6392, our grammar has allowed "USING" to
      optionally appear before an opclass name in CREATE INDEX (and, lately,
      some related places such as ON CONFLICT specifications).  Nikolay Shaplov
      noticed that this syntax existed but wasn't documented, and proposed
      documenting it.  But what seems like a better idea is to remove the
      production, thereby making the code match the docs not vice versa.
      This isn't our usual modus operandi for such cases, but there are a
      couple of good reasons to proceed this way:
      
      * So far as I can find, this syntax has never been documented anywhere.
      It isn't relied on by any of our own code or test cases, and there seems
      little reason to suppose that it's been used in the wild either.
      
      * Documenting it would mean that there would be two separate uses of
      USING in the CREATE INDEX syntax, the other being "USING access_method".
      That can lead to nothing but confusion.
      
      So, let's just remove it.  On the off chance that somebody somewhere
      is using it, this isn't something to back-patch, but we can fix it
      in HEAD.
      
      Discussion: <1593237.l7oKHRpxSe@nataraj-amd64>
      b898eb63
    • Tom Lane's avatar
      Ensure that backends see up-to-date statistics for shared catalogs. · 52e8fc3e
      Tom Lane authored
      Ever since we split the statistics collector's reports into per-database
      files (commit 187492b6), backends have been seeing stale statistics
      for shared catalogs.  This is because the inquiry message only prompts the
      collector to write the per-database file for the requesting backend's own
      database.  Stats for shared catalogs are in a separate file for "DB 0",
      which didn't get updated.
      
      In normal operation this was partially masked by the fact that the
      autovacuum launcher would send an inquiry message at least once per
      autovacuum_naptime that asked for "DB 0"; so the shared-catalog stats would
      never be more than a minute out of date.  However the problem becomes very
      obvious with autovacuum disabled, as reported by Peter Eisentraut.
      
      To fix, redefine the semantics of inquiry messages so that both the
      specified DB and DB 0 will be dumped.  (This might seem a bit inefficient,
      but we have no good way to know whether a backend's transaction will look
      at shared-catalog stats, so we have to read both groups of stats whenever
      we request stats.  Sending two inquiry messages would definitely not be
      better.)
      
      Back-patch to 9.3 where the bug was introduced.
      
      Report: <56AD41AC.1030509@gmx.net>
      52e8fc3e
    • Tom Lane's avatar
      Fix broken error handling in parallel pg_dump/pg_restore. · 9abd64ec
      Tom Lane authored
      In the original design for parallel dump, worker processes reported errors
      by sending them up to the master process, which would print the messages.
      This is unworkably fragile for a couple of reasons: it risks deadlock if a
      worker sends an error at an unexpected time, and if the master has already
      died for some reason, the user will never get to see the error at all.
      Revert that idea and go back to just always printing messages to stderr.
      This approach means that if all the workers fail for similar reasons (eg,
      bad password or server shutdown), the user will see N copies of that
      message, not only one as before.  While that's slightly annoying, it's
      certainly better than not seeing any message; not to mention that we
      shouldn't assume that only the first failure is interesting.
      
      An additional problem in the same area was that the master failed to
      disable SIGPIPE (at least until much too late), which meant that sending a
      command to an already-dead worker would cause the master to crash silently.
      That was bad enough in itself but was made worse by the total reliance on
      the master to print errors: even if the worker had reported an error, you
      would probably not see it, depending on timing.  Instead disable SIGPIPE
      right after we've forked the workers, before attempting to send them
      anything.
      
      Additionally, the master relies on seeing socket EOF to realize that a
      worker has exited prematurely --- but on Windows, there would be no EOF
      since the socket is attached to the process that includes both the master
      and worker threads, so it remains open.  Make archive_close_connection()
      close the worker end of the sockets so that this acts more like the Unix
      case.  It's not perfect, because if a worker thread exits without going
      through exit_nicely() the closures won't happen; but that's not really
      supposed to happen.
      
      This has been wrong all along, so back-patch to 9.3 where parallel dump
      was introduced.
      
      Report: <2458.1450894615@sss.pgh.pa.us>
      9abd64ec
    • Kevin Grittner's avatar
      Update doc text to reflect new column in MVCC phenomena table. · 627e3603
      Kevin Grittner authored
      Scott Wehrenberg
      627e3603
    • Stephen Frost's avatar
      Do not DROP default roles in pg_dumpall -c · 018eb027
      Stephen Frost authored
      When pulling the list of roles to drop, exclude roles whose names
      begin with "pg_" (as we do when we are dumping the roles out to
      recreate them).
      
      Also add regression tests to cover pg_dumpall -c and this specific
      issue.
      
      Noticed by Rushabh Lathia.  Patch by me.
      018eb027
    • Tom Lane's avatar
      Mark wal_level as PGDLLIMPORT. · f5e7b2f9
      Tom Lane authored
      Per buildfarm, this is needed to allow extensions to use XLogIsNeeded()
      in Windows builds.
      f5e7b2f9
    • Tom Lane's avatar
      Fix contrib/bloom to work for unlogged indexes. · abaffa90
      Tom Lane authored
      blbuildempty did not do even approximately the right thing: it tried
      to add a metapage to the relation's regular data fork, which already
      has one at that point.  It should look like the ambuildempty methods
      for all the standard index types, ie, initialize a metapage image in
      some transient storage and then write it directly to the init fork.
      To support that, refactor BloomInitMetapage into two functions.
      
      In passing, fix BloomInitMetapage so it doesn't leave the rd_options
      field of the index's relcache entry pointing at transient storage.
      I'm not sure this had any visible consequence, since nothing much
      else is likely to look at a bloom index's rd_options, but it's
      certainly poor practice.
      
      Per bug #14155 from Zhou Digoal.
      
      Report: <20160524144146.22598.42558@wrigleys.postgresql.org>
      abaffa90
    • Stephen Frost's avatar
      Qualify table usage in dumpTable() and use regclass · 2e8b4bf8
      Stephen Frost authored
      All of the other tables used in the query in dumpTable(), which is
      collecting column-level ACLs, are qualified, so we should be qualifying
      the pg_init_privs, the related sub-select against pg_class and the
      other queries added by the pg_dump catalog ACLs work.
      
      Also, use ::regclass (or ::pg_catalog.regclass, where appropriate)
      instead of using a poorly constructed query to get the OID for various
      catalog tables.
      
      Issues identified by Noah and Alvaro, patch by me.
      2e8b4bf8
  4. 24 May, 2016 7 commits
    • Tom Lane's avatar
      Fetch XIDs atomically during vac_truncate_clog(). · 2d2e40e3
      Tom Lane authored
      Because vac_update_datfrozenxid() updates datfrozenxid and datminmxid
      in-place, it's unsafe to assume that successive reads of those values will
      give consistent results.  Fetch each one just once to ensure sane behavior
      in the minimum calculation.  Noted while reviewing Alexander Korotkov's
      patch in the same area.
      
      Discussion: <8564.1464116473@sss.pgh.pa.us>
      2d2e40e3
    • Tom Lane's avatar
      Avoid consuming an XID during vac_truncate_clog(). · 996d2739
      Tom Lane authored
      vac_truncate_clog() uses its own transaction ID as the comparison point in
      a sanity check that no database's datfrozenxid has already wrapped around
      "into the future".  That was probably fine when written, but in a lazy
      vacuum we won't have assigned an XID, so calling GetCurrentTransactionId()
      causes an XID to be assigned when otherwise one would not be.  Most of the
      time that's not a big problem ... but if we are hard up against the
      wraparound limit, consuming XIDs during antiwraparound vacuums is a very
      bad thing.
      
      Instead, use ReadNewTransactionId(), which not only avoids this problem
      but is in itself a better comparison point to test whether wraparound
      has already occurred.
      
      Report and patch by Alexander Korotkov.  Back-patch to all versions.
      
      Report: <CAPpHfdspOkmiQsxh-UZw2chM6dRMwXAJGEmmbmqYR=yvM7-s6A@mail.gmail.com>
      996d2739
    • Alvaro Herrera's avatar
      Fix range check for effective_io_concurrency · 0c7cd45b
      Alvaro Herrera authored
      Commit 1aba62ec moved the range check of that option form guc.c into
      bufmgr.c, but introduced a bug by changing a >= 0.0 to > 0.0, which made
      the value 0 no longer accepted.  Put it back.
      
      Reported by Jeff Janes, diagnosed by Tom Lane
      0c7cd45b
    • Tom Lane's avatar
      Docs: mention pg_reload_conf() in ALTER SYSTEM reference page. · c45fb43c
      Tom Lane authored
      Takayuki Tsunakawa
      
      Discussion: <0A3221C70F24FB45833433255569204D1F578FC3@G01JPEXMBYT05>
      c45fb43c
    • Tom Lane's avatar
      In examples of Oracle PL/SQL code, use varchar2 not varchar. · 23f11dc2
      Tom Lane authored
      Oracle recommends using VARCHAR2 not VARCHAR, allegedly because they might
      someday change VARCHAR to be spec-compliant about distinguishing null from
      empty string.  (I'm not holding my breath, though.)  Our examples of PL/SQL
      code were using VARCHAR, which while not wrong is missing the pedagogical
      opportunity to talk about converting Oracle type names to Postgres.  So
      switch the examples to use VARCHAR2, and add some text about what to do
      with common Oracle type names like VARCHAR2 and NUMBER.  (There is probably
      more to be said here, but those are the ones I'm sure about offhand.)
      Per suggestion from rapg12@gmail.com.
      
      Discussion: <20160521140046.22591.24672@wrigleys.postgresql.org>
      23f11dc2
    • Teodor Sigaev's avatar
      Fix typo in docs · 6ee7fb82
      Teodor Sigaev authored
      Add missing USING BLOOM in example of contrib/bloom
      
      Nikolay Shaplov
      6ee7fb82
    • Tom Lane's avatar
      Fix typo in TAP test identification string. · 1087aa23
      Tom Lane authored
      Michael Paquier
      1087aa23
  5. 23 May, 2016 4 commits
    • Tom Lane's avatar
      Fix BTREE_BUILD_STATS build. · 1e0d6512
      Tom Lane authored
      Commit 65c5fcd3 broke this by removing a
      header include directive that is conditionally required.  Add that back
      to nbtree.c, with annotation to keep pgrminclude from re-breaking it.
      
      Peter Geoghegan
      
      Report: <CAM3SWZTNjHFYW_UG8bu0BnogqQ2HfsTgkzXLueuUhfTcYbu5HA@mail.gmail.com>
      1e0d6512
    • Tom Lane's avatar
      Support IndexElem in raw_expression_tree_walker(). · eae1ad9b
      Tom Lane authored
      Needed for cases in which INSERT ... ON CONFLICT appears inside a
      recursive CTE item.  Per bug #14153 from Thomas Alton.
      
      Patch by Peter Geoghegan, slightly adjusted by me
      
      Report: <20160521232802.22598.13537@wrigleys.postgresql.org>
      eae1ad9b
    • Tom Lane's avatar
      Add support for more extensive testing of raw_expression_tree_walker(). · 465e09da
      Tom Lane authored
      If RAW_EXPRESSION_COVERAGE_TEST is defined, do a no-op tree walk over
      every basic DML statement submitted to parse analysis.  If we'd had this
      in place earlier, bug #14153 would have been caught by buildfarm testing.
      The difficulty is that raw_expression_tree_walker() is only used in
      limited cases involving CTEs (particularly recursive ones), so it's
      very easy for an oversight in it to not be noticed during testing of a
      seemingly-unrelated feature.
      
      The type of error we can expect to catch with this is complete omission
      of a node type from raw_expression_tree_walker(), and perhaps also
      recursion into a field that doesn't contain a node tree, though that
      would be an unlikely mistake.  It won't catch failure to add new fields
      that need to be recursed into, unfortunately.
      
      I'll go enable this on one or two of my own buildfarm animals once
      bug #14153 is dealt with.
      
      Discussion: <27861.1464040417@sss.pgh.pa.us>
      465e09da
    • Tom Lane's avatar
      Fix latent crash in do_text_output_multiline(). · 8a4930e3
      Tom Lane authored
      do_text_output_multiline() would fail (typically with a null pointer
      dereference crash) if its input string did not end with a newline.  Such
      cases do not arise in our current sources; but it certainly could happen
      in future, or in extension code's usage of the function, so we should fix
      it.  To fix, replace "eol += len" with "eol = text + len".
      
      While at it, make two cosmetic improvements: mark the input string const,
      and rename the argument from "text" to "txt" to dodge pgindent strangeness
      (since "text" is a typedef name).
      
      Even though this problem is only latent at present, it seems like a good
      idea to back-patch the fix, since it's a very simple/safe patch and it's
      not out of the realm of possibility that we might in future back-patch
      something that expects sane behavior from do_text_output_multiline().
      
      Per report from Hao Lee.
      
      Report: <CAGoxFiFPAGyPAJLcFxTB5cGhTW2yOVBDYeqDugYwV4dEd1L_Ag@mail.gmail.com>
      8a4930e3
  6. 22 May, 2016 1 commit
  7. 21 May, 2016 2 commits
    • Tom Lane's avatar
      Improve docs about contrib/intarray's benchmark suite. · 768d6f90
      Tom Lane authored
      Correct obsolete install instructions, as noted by Daniel Gustafsson.
      Clarify the test code's prerequisites.
      
      Discussion: <88E617F2-7721-4C4E-84F4-886A2041C1D0@yesql.se>
      768d6f90
    • Tom Lane's avatar
      Improve docs about using ORDER BY to control aggregate input order. · 82eafabe
      Tom Lane authored
      David Johnston pointed out that the original text here had been obsoleted
      by SQL:2008, which allowed ORDER BY in subqueries.  We could weaken the
      text to describe ORDER-BY-in-subqueries as an optional SQL feature that's
      possibly unportable; but then the exact same statements would apply to
      the alternative it's being compared to (ORDER-BY-in-aggregate-calls).
      So really that would be pretty useless; let's just take out the sentence
      entirely.  Instead, point out the hazard that any extra processing in the
      upper query might cause the subquery output order to be destroyed.
      
      Discussion: <CAKFQuwbAX=iO9QbpN7_jr+BnUWm9FYX8WbEPUvG0p+nZhp6TZg@mail.gmail.com>
      82eafabe
  8. 20 May, 2016 2 commits
  9. 19 May, 2016 1 commit
    • Tom Lane's avatar
      Pin the built-in index access methods. · 16ea51a2
      Tom Lane authored
      This was overlooked in commit 473b9328, which introduced DROP ACCESS
      METHOD.  Although that command is restricted to superusers, we don't want
      even superusers dropping the built-in methods; "DROP ACCESS METHOD btree"
      in particular is unrecoverable from.  Pin these objects in the same way
      that other initdb-created objects are pinned.
      
      I chose to bump catversion for this fix.  That's not absolutely necessary
      perhaps, but it will ensure that no 9.6 production systems are missing
      the pin entries.
      16ea51a2
  10. 17 May, 2016 3 commits
    • Tom Lane's avatar
      Avoid possible crash in contrib/bloom's blendscan(). · e13ac558
      Tom Lane authored
      It's possible to begin and end an indexscan without ever calling
      amrescan.  contrib/bloom, unlike every other index AM, allocated
      its "scan->opaque" storage at amrescan time, and thus would crash
      in amendscan if amrescan hadn't been called.  We could fix this
      by putting in a null-pointer check in blendscan, but I see no very
      good reason why contrib/bloom should march to its own drummer in
      this respect.  Let's move that initialization to blbeginscan
      instead.  Per report from Jeff Janes.
      e13ac558
    • Teodor Sigaev's avatar
      Allocate all page images at once in generic wal interface · 7c979c95
      Teodor Sigaev authored
      That reduces number of allocation.
      
      Per gripe from Michael Paquier and Tom Lane suggestion.
      7c979c95
    • Magnus Hagander's avatar
      Fix typo · b09cd2e5
      Magnus Hagander authored
      Amit Langote
      b09cd2e5
  11. 16 May, 2016 3 commits
    • Teodor Sigaev's avatar
      Correctly align page's images in generic wal API · 7c8345f6
      Teodor Sigaev authored
      Page image should be MAXALIGN'ed because existing code could directly align
      pointers in page instead of align offset from beginning of page.
      
      Found during play with indexes as extenstion, Alexander Korotkov and me
      7c8345f6
    • Robert Haas's avatar
      postgres_fdw: Fix the fix for crash when pushing down multiple joins. · 02a568a0
      Robert Haas authored
      Commit 3151f16e was intended to be
      a commit of a patch from Ashutosh Bapat, but instead I mistakenly
      committed an earlier version from Michael Paquier (because both
      patches were submitted with the same filename, and I confused them).
      Michael's patch fixes the crash but doesn't actually implement the
      correct test.
      
      Repair the incorrect logic, and also expand the comments considerably
      so that this is all more clear.
      
      Ashutosh Bapat and Robert Haas
      02a568a0
    • Robert Haas's avatar
      Fix multiple problems in postgres_fdw query cancellation logic. · 1b812afb
      Robert Haas authored
      First, even if we cancel a query, we still have to roll back the
      containing transaction; otherwise, the session will be left in a
      failed transaction state.
      
      Second, we need to support canceling queries whe aborting a
      subtransaction as well as when aborting a toplevel transaction.
      
      Etsuro Fujita, reviewed by Michael Paquier
      1b812afb