1. 21 Dec, 2016 5 commits
    • Robert Haas's avatar
      Fix corner-case bug in WaitEventSetWaitBlock on Windows. · 3b790d25
      Robert Haas authored
      If we do not reset the FD_READ event, WaitForMultipleObjects won't
      return it again again unless we've meanwhile read from the socket,
      which is generally true but not guaranteed.  WaitEventSetWaitBlock
      itself may fail to return the event to the caller if the latch is
      also set, and even if we changed that, the caller isn't obliged to
      handle all returned events at once.  On non-Windows systems, the
      socket-read event is purely level-triggered, so this issue does
      not exist.  To fix, make Windows reset the event when needed.
      
      This bug was introduced by 98a64d0b,
      and causes hangs when trying to use the pldebugger extension.
      
      Patch by Amit Kapial.  Reported and tested by Ashutosh Sharma, who
      also provided some analysis.  Further analysis by Michael Paquier.
      3b790d25
    • Robert Haas's avatar
      Refactor merge path generation code. · 59649c3f
      Robert Haas authored
      This shouldn't change the set of paths that get generated in any
      way, but it is preparatory work for further changes to allow a
      partial path to be merge-joined witih a non-partial path to produce
      a partial join path.
      
      Dilip Kumar, with cosmetic adjustments by me.
      59649c3f
    • Peter Eisentraut's avatar
      Reorder pg_sequence columns to avoid alignment issue · f3b421da
      Peter Eisentraut authored
      On AIX, doubles are aligned at 4 bytes, but int64 is aligned at 8 bytes.
      Our code assumes that doubles have alignment that can also be applied to
      int64, but that fails in this case.  One effect is that
      heap_form_tuple() writes tuples in a different layout than
      Form_pg_sequence expects.
      
      Rather than rewrite the whole alignment code, work around the issue by
      reordering the columns in pg_sequence so that the first int64 column
      naturally comes out at an 8-byte boundary.
      f3b421da
    • Fujii Masao's avatar
      Forbid invalid combination of options in pg_basebackup. · ecbdc4c5
      Fujii Masao authored
      Commit 56c7d8d4 allowed pg_basebackup
      to stream WAL in tar mode. But there is the restriction that WAL
      streaming in tar mode works only when the value - (dash) is not
      specified as output directory. This means that the combination of
      three options "-D -", "-F t" and "-X stream" is invalid. However,
      previously, even when those options were specified at the same time,
      pg_basebackup background process unexpectedly started streaming WAL.
      And then it exited with an error.
      
      This commit changes pg_basebackup so that it errors out on such
      invalid combination of options at the beginning.
      
      Reviewed by Magnus Hagander, and patch by me.
      ecbdc4c5
    • Tom Lane's avatar
      Fix minor oversights in nodeAgg.c. · c080b223
      Tom Lane authored
      aggstate->evalproj is always set up by ExecInitAgg, so there's no
      need to test.  Doing so led Coverity to think that we might be
      intending "slot" to be possibly NULL here, and it quite properly
      complained that the rest of combine_aggregates() wasn't prepared
      for that.
      
      Also fix a couple of obvious thinkos in Asserts checking that
      "inputoff" isn't past the end of the slot.
      
      Errors introduced in commit 8ed3f11b, so no need for back-patch.
      c080b223
  2. 20 Dec, 2016 4 commits
  3. 19 Dec, 2016 6 commits
    • Robert Haas's avatar
      Provide a DSA area for all parallel queries. · e13029a5
      Robert Haas authored
      This will allow future parallel query code to dynamically allocate
      storage shared by all participants.
      
      Thomas Munro, with assorted changes by me.
      e13029a5
    • Tom Lane's avatar
      Fix handling of phrase operator removal while removing tsquery stopwords. · 26044384
      Tom Lane authored
      The distance of a removed phrase operator should propagate up to a
      parent phrase operator if there is one, but this only worked correctly
      in left-deep trees.  Throwing in a few parentheses confused it completely,
      as indeed was illustrated by bizarre results in existing regression test
      cases.
      
      To fix, track unaccounted-for distances that should propagate to the left
      and to the right of the current node, rather than trying to make it work
      with only one returned distance.
      
      Also make some adjustments to behave as well as we can for cases of
      intermixed phrase and regular (AND/OR) operators.  I don't think it's
      possible to be 100% correct for that without a rethinking of the tsquery
      representation; for example, maybe we should just not drop stopword nodes
      at all underneath phrase operators.  But this is better than it was,
      and changing tsquery representation wouldn't be safely back-patchable.
      
      While at it, I simplified the API of the clean_fakeval_intree function
      a bit by getting rid of the "char *result" output parameter; that wasn't
      doing anything that wasn't redundant with whether the result node is
      NULL or not, and testing for NULL seems a lot clearer/safer.
      
      This is part of a larger project to fix various infelicities in the
      phrase-search implementation, but this part seems comittable on its own.
      
      Back-patch to 9.6 where phrase operators were introduced.
      
      Discussion: https://postgr.es/m/28215.1481999808@sss.pgh.pa.us
      Discussion: https://postgr.es/m/26706.1482087250@sss.pgh.pa.us
      26044384
    • Robert Haas's avatar
      Fix locking problem in _hash_squeezebucket() / _hash_freeovflpage(). · dd728826
      Robert Haas authored
      A bucket squeeze operation needs to lock each page of the bucket
      before releasing the prior page, but the previous coding fumbled the
      locking when freeing an overflow page during a bucket squeeze
      operation.  Commit 6d46f478
      introduced this bug.
      
      Amit Kapila, with help from Kuntal Ghosh and Dilip Kumar, after
      an initial trouble report by Jeff Janes.  Reviewed by me.  I also
      fixed a problem with a comment.
      dd728826
    • Robert Haas's avatar
      Remove unused file. · 668dbbec
      Robert Haas authored
      This was added in 10540974, but has
      never been used for anything as far as I can tell.  There seems to
      be no reason to keep it.
      668dbbec
    • Fujii Masao's avatar
      Support quorum-based synchronous replication. · 3901fd70
      Fujii Masao authored
      This feature is also known as "quorum commit" especially in discussion
      on pgsql-hackers.
      
      This commit adds the following new syntaxes into synchronous_standby_names
      GUC. By using FIRST and ANY keywords, users can specify the method to
      choose synchronous standbys from the listed servers.
      
        FIRST num_sync (standby_name [, ...])
        ANY num_sync (standby_name [, ...])
      
      The keyword FIRST specifies a priority-based synchronous replication
      which was available also in 9.6 or before. This method makes transaction
      commits wait until their WAL records are replicated to num_sync
      synchronous standbys chosen based on their priorities.
      
      The keyword ANY specifies a quorum-based synchronous replication
      and makes transaction commits wait until their WAL records are
      replicated to *at least* num_sync listed standbys. In this method,
      the values of sync_state.pg_stat_replication for the listed standbys
      are reported as "quorum". The priority is still assigned to each standby,
      but not used in this method.
      
      The existing syntaxes having neither FIRST nor ANY keyword are still
      supported. They are the same as new syntax with FIRST keyword, i.e.,
      a priorirty-based synchronous replication.
      
      Author: Masahiko Sawada
      Reviewed-By: Michael Paquier, Amit Kapila and me
      Discussion: <CAD21AoAACi9NeC_ecm+Vahm+MMA6nYh=Kqs3KB3np+MBOS_gZg@mail.gmail.com>
      
      Many thanks to the various individuals who were involved in
      discussing and developing this feature.
      3901fd70
    • Magnus Hagander's avatar
      Fix base backup rate limiting in presence of slow i/o · 10238fad
      Magnus Hagander authored
      When source i/o on disk was too slow compared to the rate limiting
      specified, the system could end up with a negative value for sleep that
      it never got out of, which caused rate limiting to effectively be
      turned off.
      
      Discussion: https://postgr.es/m/CABUevEy_-e0YvL4ayoX8bH_Ja9w%2BBHoP6jUgdxZuG2nEj3uAfQ%40mail.gmail.com
      
      Analysis by me, patch by Antonin Houska
      10238fad
  4. 18 Dec, 2016 2 commits
    • Noah Misch's avatar
      MSVC: Position MSBFLAGS after flags it might override. · cc07e06b
      Noah Misch authored
      Christian Ullrich
      cc07e06b
    • Tom Lane's avatar
      In contrib/uuid-ossp, #include headers needed for ntohl() and ntohs(). · 4a0a34b5
      Tom Lane authored
      Oversight in commit b8cc8f94.  I just noticed this causes compiler
      warnings on FreeBSD, and it really ought to cause warnings elsewhere too:
      all references I can find say that <arpa/inet.h> is required for these.
      We have a lot of code elsewhere that thinks that both <netinet/in.h>
      and <arpa/inet.h> should be included for these functions, so do it that
      way here too, even though <arpa/inet.h> ought to be sufficient according
      to the references I consulted.
      
      Back-patch to 9.4 where the previous commit landed.
      4a0a34b5
  5. 17 Dec, 2016 3 commits
    • Tom Lane's avatar
      Fix FK-based join selectivity estimation for semi/antijoins. · 7fa93eec
      Tom Lane authored
      This case wasn't thought through sufficiently in commit 100340e2.
      It's true that the FK proves that every outer row has a match in the
      inner table, but we forgot that some of the inner rows might be filtered
      away by WHERE conditions located within the semijoin's RHS.
      
      If the RHS is just one table, we can reasonably take the semijoin
      selectivity as equal to the fraction of the referenced table's rows
      that are expected to survive its restriction clauses.
      
      If the RHS is a join, it's not clear how much of the referenced table
      might get through the join, so fall back to the same rule we were
      already using for other outer-join cases: use the minimum of the
      regular per-clause selectivity estimates.  This gives the same result
      as if we hadn't considered the FK at all when there's a single FK
      column, but it should still help for multi-column FKs, which is the
      case that 100340e2 is really meant to help with.
      
      Back-patch to 9.6 where the previous commit came in.
      
      Discussion: https://postgr.es/m/16149.1481835103@sss.pgh.pa.us
      7fa93eec
    • Peter Eisentraut's avatar
      doc: Remove some trailing whitespace · b645a05f
      Peter Eisentraut authored
      Per discussion, we will not at this time remove trailing whitespace in
      psql output displays where it is part of the actual psql output.
      
      From: Vladimir Rusinov <vrusinov@google.com>
      b645a05f
    • Magnus Hagander's avatar
      Fix typos in comments · 01776a07
      Magnus Hagander authored
      Michael Paquier
      01776a07
  6. 16 Dec, 2016 9 commits
    • Robert Haas's avatar
      Fix outdated comment in lwlock.c · 591ccb66
      Robert Haas authored
      Commit 3761fe3c should have made
      this change, but didn't.
      
      Reported by Álvaro Herrera.
      591ccb66
    • Fujii Masao's avatar
      Ensure that num_sync is greater than zero in synchronous_standby_names. · 93eb619c
      Fujii Masao authored
      Previously num_sync could be set to zero and this setting caused
      an assertion failure. This means that multiple synchronous standbys
      code should assume that num_sync is greater than zero.
      Also setting num_sync to zero is nonsense because it's basically
      the configuration for synchronous replication. If users want not to
      make transaction commits wait for any standbys,
      synchronous_standby_names should be emptied to disable synchronous
      replication instead of setting num_sync to zero.
      
      This patch forbids users from setting num_sync to zero in
      synchronous_standby_names. If zero is specified, an error will
      happen during processing the parameter settings.
      
      Back-patch to 9.6 where multiple synchronous standbys feature was added.
      
      Patch by me. Reviewed by Tom Lane.
      Discussion: <CAHGQGwHWB3izc6cXuFLh5kOcAbFXaRhhgwd-X5PeN9TEjxqXwg@mail.gmail.com>
      93eb619c
    • Tom Lane's avatar
      Improve documentation around TS_execute(). · 23c75b55
      Tom Lane authored
      I got frustrated by the lack of commentary in this area, so here is some
      reverse-engineered documentation, along with minor stylistic cleanup.
      No code changes more significant than removal of unused variables.
      
      Back-patch to 9.6, not because that's useful in itself, but because
      we have some bugs to fix in phrase search and this would cause merge
      failures if it's only in HEAD.
      23c75b55
    • Robert Haas's avatar
      Simplify LWLock tranche machinery by removing array_base/array_stride. · 3761fe3c
      Robert Haas authored
      array_base and array_stride were added so that we could identify the
      offset of an LWLock within a tranche, but this facility is only very
      marginally used apart from the main tranche.  So, give every lock in
      the main tranche its own tranche ID and get rid of array_base,
      array_stride, and all that's attached.  For debugging facilities
      (Trace_lwlocks and LWLOCK_STATS) print the pointer address of the
      LWLock using %p instead of the offset.  This is arguably more useful,
      and certainly a lot cheaper.  Drop the offset-within-tranche from
      the information reported to dtrace and from one can't-happen message
      inside lwlock.c.
      
      The main user-visible impact of this change is that pg_stat_activity
      will now report all waits for LWLocks as "LWLock" rather than
      reporting some as "LWLockTranche" and others as "LWLockNamed".
      
      The main motivation for this change is that the need to specify an
      array_base and an array_stride is awkward for parallel query.  There
      is only a very limited supply of tranche IDs so we can't just keep
      allocating new ones, and if we try to use the same tranche IDs every
      time then we run into trouble when multiple parallel contexts are
      use simultaneously.  So if we didn't get rid of this mechanism we'd
      have to make it even more complicated.  By simplifying it in this
      way, we instead reduce the size of the generated code for lwlock.c
      by about 5%.
      
      Discussion: http://postgr.es/m/CA+TgmoYsFn6NUW1x0AZtupJGUAs1UDY4dJtCN47_Q6D0sP80PA@mail.gmail.com
      3761fe3c
    • Fujii Masao's avatar
      Add missing documentation for effective_io_concurrency tablespace option. · 4e344c2c
      Fujii Masao authored
      The description of effective_io_concurrency option was missing in ALTER
      TABLESPACE docs though it's included in CREATE TABLESPACE one.
      
      Back-patch to 9.6 where effective_io_concurrency tablespace option was added.
      
      Michael Paquier, reported by Marc-Olaf Jaschke
      4e344c2c
    • Robert Haas's avatar
      Unbreak Finalize HashAggregate over Partial HashAggregate. · b81b5a96
      Robert Haas authored
      Commit 5dfc1981 introduced the use
      of a new type of hash table with linear reprobing for hash aggregates.
      Such a hash table behaves very poorly if keys are inserted in hash
      order, which does in fact happen in the case where a query use a
      Finalize HashAggregate node fed (via Gather) by a Partial
      HashAggregate node.  In fact, queries with this type of plan tend
      to run effectively forever.
      
      Fix that by seeding the hash value differently in each worker
      (and in the leader, if it participates).
      
      Andres Freund and Robert Haas
      b81b5a96
    • Robert Haas's avatar
      Fix more hash index bugs around marking buffers dirty. · 6a4fe112
      Robert Haas authored
      In _hash_freeovflpage(), if we're freeing the overflow page that
      immediate follows the page to which tuples are being moved (the
      confusingly-named "write buffer"), don't forget to mark that
      page dirty after updating its hasho_nextblkno.
      
      In _hash_squeezebucket(), it's not necessary to mark the primary
      bucket page dirty if there are no overflow pages, because there's
      nothing to squeeze in that case.
      
      Amit Kapila, with help from Kuntal Ghosh and Dilip Kumar, after
      an initial trouble report by Jeff Janes.
      6a4fe112
    • Robert Haas's avatar
      Remove _hash_wrtbuf() in favor of calling MarkBufferDirty(). · 25216c98
      Robert Haas authored
      The whole concept of _hash_wrtbuf() is that we need to know at the
      time we're releasing the buffer lock (and pin) whether we dirtied the
      buffer, but this is easy to get wrong.  This patch actually fixes one
      non-obvious bug of that form: hashbucketcleanup forgot to signal
      _hash_squeezebucket, which gets the primary bucket page already
      locked, as to whether it had already dirtied the page.  Calling
      MarkBufferDirty() at the places where we dirty the buffer is more
      intuitive and lets us simplify the code in various places as well.
      
      On top of all that, the ultimate goal here is to make hash indexes
      WAL-logged, and as the comments to _hash_wrtbuf() note, it should
      go away when that happens.  Making it go away a little earlier than
      that seems like a good preparatory step.
      
      Report by Jeff Janes.  Diagnosis by Amit Kapila, Kuntal Ghosh,
      and Dilip Kumar.  Patch by me, after studying an alternative patch
      submitted by Amit Kapila.
      
      Discussion: http://postgr.es/m/CAA4eK1Kf6tOY0oVz_SEdngiNFkeXrA3xUSDPPORQvsWVPdKqnA@mail.gmail.com
      25216c98
    • Heikki Linnakangas's avatar
      Fix off-by-one in memory allocation for quote_literal_cstr(). · 4f5182e1
      Heikki Linnakangas authored
      The calculation didn't take into account the NULL terminator. That lead
      to overwriting the palloc'd buffer by one byte, if the input consists
      entirely of backslashes. For example "format('%L', E'\\')".
      
      Fixes bug #14468. Backpatch to all supported versions.
      
      Report: https://www.postgresql.org/message-id/20161216105001.13334.42819%40wrigleys.postgresql.org
      4f5182e1
  7. 15 Dec, 2016 3 commits
  8. 13 Dec, 2016 7 commits
    • Tom Lane's avatar
      Improve handling of array elements as getdiag_targets and cursor_variables. · 55caaaeb
      Tom Lane authored
      There's no good reason why plpgsql's GET DIAGNOSTICS statement can't
      support an array element as target variable, since the execution code
      already uses the generic exec_assign_value() function to assign to it.
      Hence, refactor the grammar to allow that, by making getdiag_target
      depend on the assign_var production.
      
      Ideally we'd also let a cursor_variable expand to an element of a
      refcursor[] array, but that's substantially harder since those statements
      also have to handle bound-cursor-variable cases.  For now, just make sure
      the reported error is sensible, ie "cursor variable must be a simple
      variable" not "variable must be of type cursor or refcursor".  The latter
      was quite confusing from the user's viewpoint, since what he wrote
      satisfies the claimed restriction.
      
      Per bug #14463 from Zhou Digoal.  Given the lack of previous complaints,
      I see no need for a back-patch.
      
      Discussion: https://postgr.es/m/20161213152548.14897.81245@wrigleys.postgresql.org
      55caaaeb
    • Tom Lane's avatar
      Prevent planagg.c from failing on queries containing CTEs. · 1f542a2e
      Tom Lane authored
      The existing tests in preprocess_minmax_aggregates() usually prevent it
      from trying to do anything with queries containing CTEs, but there's an
      exception: a CTE could be present as a member of an appendrel, if we
      flattened a UNION ALL that contains CTE references.  If it did try to
      generate an optimized path for a query using a CTE, it failed with
      "could not find plan for CTE", as reported by Torsten Förtsch.
      
      The proximate cause is an unwise decision in commit 3fc6e2d7 to clear
      subroot->cte_plan_ids in build_minmax_path().  That left the subroot's
      cte_plan_ids list out of step with its parse->cteList.
      
      Removing the "subroot->cte_plan_ids = NIL;" assignment is enough to let
      the case work again, but really it's pretty silly to be expending any
      cycles at all in this module when there are CTEs: we always treat their
      outputs as unordered so there's no way for the optimization to win.
      Hence, also add an early-exit test so we don't waste time like that.
      
      Back-patch to 9.6 where the misbehavior was introduced.
      
      Report: https://postgr.es/m/CAKkG4_=gjY5QiHtqSZyWMwDuTd_CftKoTaCqxjJ7uUz1-Gw=qw@mail.gmail.com
      1f542a2e
    • Robert Haas's avatar
      Fix bug in hashbulkdelete. · 501c7b94
      Robert Haas authored
      Commit 6d46f478 failed to account for
      the possibility that hashbulkdelete() might encounter a bucket that
      has been split since it began scanning the bucket array.  Repair.
      
      Extracted from a larger pathc by Amit Kapila; I rewrote the comment.
      501c7b94
    • Robert Haas's avatar
      Fix bugs in RelationGetPartitionDispatchInfo. · a2566508
      Robert Haas authored
      The previous coding was not quite right for cases involving multiple
      levels of partitioning.
      
      Amit Langote
      a2566508
    • Robert Haas's avatar
      Clean up code, comments, and formatting for table partitioning. · 4b9a98e1
      Robert Haas authored
      Amit Langote, plus pgindent-ing by me.  Inspired in part by review
      comments from Tomas Vondra.
      4b9a98e1
    • Robert Haas's avatar
      Update typedefs.list · acddbe22
      Robert Haas authored
      So developers can more easily run pgindent locally
      acddbe22
    • Robert Haas's avatar
      doc: Improve documentation related to table partitioning feature. · a1a4459c
      Robert Haas authored
      Commit f0e44751 implemented table
      partitioning, but failed to mention the "no row movement"
      restriction in the documentation.  Fix that and a few other issues.
      
      Amit Langote, with some additional wordsmithing by me.
      a1a4459c
  9. 12 Dec, 2016 1 commit