1. 08 Jun, 2017 1 commit
  2. 07 Jun, 2017 7 commits
    • Tom Lane's avatar
      Docs: improve CREATE TABLE ref page's discussion of partition bounds. · 0198c277
      Tom Lane authored
      Clarify in the syntax synopsis that partition bound values must be
      exactly numeric literals or string literals; previously it
      said "bound_literal" which was defined nowhere.
      
      Replace confusing --- and, I think, incorrect in detail --- definition
      of how range bounds work with a reference to row-wise comparison plus
      a concrete example (which I stole from Robert Haas).
      
      Minor copy-editing in the same area.
      
      Discussion: https://postgr.es/m/30475.1496005465@sss.pgh.pa.us
      Discussion: https://postgr.es/m/28106.1496041449@sss.pgh.pa.us
      0198c277
    • Robert Haas's avatar
      postgres_fdw: Allow cancellation of transaction control commands. · ae9bfc5d
      Robert Haas authored
      Commit f039eaac, later back-patched
      with commit 1b812afb, allowed many of
      the queries issued by postgres_fdw to fetch remote data to respond to
      cancel interrupts in a timely fashion.  However, it didn't do anything
      about the transaction control commands, which remained
      noninterruptible.
      
      Improve the situation by changing do_sql_command() to retrieve query
      results using pgfdw_get_result(), which uses the asynchronous
      interface to libpq so that it can check for interrupts every time
      libpq returns control.  Since this might result in a situation
      where we can no longer be sure that the remote transaction state
      matches the local transaction state, add a facility to force all
      levels of the local transaction to abort if we've lost track of
      the remote state; without this, an apparently-successful commit of
      the local transaction might fail to commit changes made on the
      remote side.  Also, add a 60-second timeout for queries issue during
      transaction abort; if that expires, give up and mark the state of
      the connection as unknown.  Drop all such connections when we exit
      the local transaction.  Together, these changes mean that if we're
      aborting the local toplevel transaction anyway, we can just drop the
      remote connection in lieu of waiting (possibly for a very long time)
      for it to complete an abort.
      
      This still leaves quite a bit of room for improvement.  PQcancel()
      has no asynchronous interface, so if we get stuck sending the cancel
      request we'll still hang.  Also, PQsetnonblocking() is not used, which
      means we could block uninterruptibly when sending a query.  There
      might be some other optimizations possible as well.  Nonetheless,
      this allows us to escape a wait for an unresponsive remote server
      quickly in many more cases than previously.
      
      Report by Suraj Kharage.  Patch by me and Rafia Sabih.  Review
      and testing by Amit Kapila and Tushar Ahuja.
      
      Discussion: http://postgr.es/m/CAF1DzPU8Kx+fMXEbFoP289xtm3bz3t+ZfxhmKavr98Bh-C0TqQ@mail.gmail.com
      ae9bfc5d
    • Peter Eisentraut's avatar
      Fix updating of pg_subscription_rel from workers · 644ea35f
      Peter Eisentraut authored
      A logical replication worker should not insert new rows into
      pg_subscription_rel, only update existing rows, so that there are no
      races if a concurrent refresh removes rows.  Adjust the API to be able
      to choose that behavior.
      
      Author: Masahiko Sawada <sawada.mshk@gmail.com>
      Reported-by: default avatartushar <tushar.ahuja@enterprisedb.com>
      644ea35f
    • Robert Haas's avatar
      Prevent BEFORE triggers from violating partitioning constraints. · 15ce775f
      Robert Haas authored
      Since tuple-routing implicitly checks the partitioning constraints
      at least for the levels of the partitioning hierarchy it traverses,
      there's normally no need to revalidate the partitioning constraint
      after performing tuple routing.  However, if there's a BEFORE trigger
      on the target partition, it could modify the tuple, causing the
      partitioning constraint to be violated.  Catch that case.
      
      Also, instead of checking the root table's partition constraint after
      tuple-routing, check it beforehand.  Otherwise, the rules for when
      the partitioning constraint gets checked get too complicated, because
      you sometimes have to check part of the constraint but not all of it.
      This effectively reverts commit 39162b20
      in favor of a different approach altogether.
      
      Report by me.  Initial debugging by Jeevan Ladhe.  Patch by Amit
      Langote, reviewed by me.
      
      Discussion: http://postgr.es/m/CA+Tgmoa9DTgeVOqopieV8d1QRpddmP65aCdxyjdYDoEO5pS5KA@mail.gmail.com
      15ce775f
    • Heikki Linnakangas's avatar
      Clear auth context correctly when re-connecting after failed auth attempt. · e6c33d59
      Heikki Linnakangas authored
      If authentication over an SSL connection fails, with sslmode=prefer,
      libpq will reconnect without SSL and retry. However, we did not clear
      the variables related to GSS, SSPI, and SASL authentication state, when
      reconnecting. Because of that, the second authentication attempt would
      always fail with a "duplicate GSS/SASL authentication request" error.
      pg_SSPI_startup did not check for duplicate authentication requests like
      the corresponding GSS and SASL functions, so with SSPI, you would leak
      some memory instead.
      
      Another way this could manifest itself, on version 10, is if you list
      multiple hostnames in the "host" parameter. If the first server requests
      Kerberos or SCRAM authentication, but it fails, the attempts to connect to
      the other servers will also fail with "duplicate authentication request"
      errors.
      
      To fix, move the clearing of authentication state from closePGconn to
      pgDropConnection, so that it is cleared also when re-connecting.
      
      Patch by Michael Paquier, with some kibitzing by me.
      
      Backpatch down to 9.3. 9.2 has the same bug, but the code around closing
      the connection is somewhat different, so that this patch doesn't apply.
      To fix this in 9.2, I think we would need to back-port commit 210eb9b7
      first, and then apply this patch. However, given that we only bumped into
      this in our own testing, we haven't heard any reports from users about
      this, and that 9.2 will be end-of-lifed in a couple of months anyway, it
      doesn't seem worth the risk and trouble.
      
      Discussion: https://www.postgresql.org/message-id/CAB7nPqRuOUm0MyJaUy9L3eXYJU3AKCZ-0-03=-aDTZJGV4GyWw@mail.gmail.com
      e6c33d59
    • Heikki Linnakangas's avatar
      Fix double-free bug in GSS authentication. · 3344582e
      Heikki Linnakangas authored
      The logic to free the buffer after the gss_init_sec_context() call was
      always a bit wonky. Because gss_init_sec_context() sets the GSS context
      variable, conn->gctx, we would in fact always attempt to free the buffer.
      That only works, because previously conn->ginbuf.value was initialized to
      NULL, and free(NULL) is a no-op. Commit 61bf96ca refactored things so
      that the GSS input token buffer is allocated locally in pg_GSS_continue,
      and not held in the PGconn object. After that, the now-local ginbuf.value
      variable isn't initialized when it's not used, so we pass a bogus pointer
      to free().
      
      To fix, only try to free the input buffer if we allocated it. That was the
      intention, certainly after the refactoring, and probably even before that.
      But because there's no live bug before the refactoring, I refrained from
      backpatching this.
      
      The bug was also independently reported by Graham Dutton, as bug #14690.
      Patch reviewed by Michael Paquier.
      
      Discussion: https://www.postgresql.org/message-id/6288d80e-a0bf-d4d3-4e12-7b79c77f1771%40iki.fi
      Discussion: https://www.postgresql.org/message-id/20170605130954.1438.90535%40wrigleys.postgresql.org
      3344582e
    • Peter Eisentraut's avatar
      Consistently use subscription name as application name · d4bfc06e
      Peter Eisentraut authored
      The logical replication apply worker uses the subscription name as
      application name, except for table sync.  This was incorrectly set to
      use the replication slot name, which might be different, in one case.
      Also add a comment why the other case is different.
      d4bfc06e
  3. 06 Jun, 2017 15 commits
  4. 05 Jun, 2017 6 commits
  5. 04 Jun, 2017 5 commits
    • Tom Lane's avatar
      Replace over-optimistic Assert in partitioning code with a runtime test. · e7941a97
      Tom Lane authored
      get_partition_parent felt that it could simply Assert that systable_getnext
      found a tuple.  This is unlike any other caller of that function, and it's
      unsafe IMO --- in fact, the reason I noticed it was that the Assert failed.
      (OK, I was working with known-inconsistent catalog contents, but I wasn't
      expecting the DB to fall over quite that violently.  The behavior in a
      non-assert-enabled build wouldn't be very nice, either.)  Fix it to do what
      other callers do, namely an actual runtime-test-and-elog.
      
      Also, standardize the wording of elog messages that are complaining about
      unexpected failure of systable_getnext.  90% of them say "could not find
      tuple for <object>", so make the remainder do likewise.  Many of the
      holdouts were using the phrasing "cache lookup failed", which is outright
      misleading since no catcache search is involved.
      e7941a97
    • Tom Lane's avatar
      #ifdef out assorted unused GEQO code. · 9db7d47f
      Tom Lane authored
      I'd always assumed that backend/optimizer/geqo/'s remarkably poor
      showing on code coverage metrics was because we weren't exercising
      it much in the regression tests.  But it turns out that a good chunk
      of the problem is that there's a bunch of code that is physically
      unreachable (because the calls to it are #ifdef'd out in geqo_main.c)
      but is being built anyway.  Making the called code have #if guards
      similar to the calling code saves a couple of kilobytes of executable
      size and should make the coverage numbers more reflective of reality.
      
      It's arguable that we should just delete all the unused recombination
      mechanisms altogether, but I didn't feel a need to go that far today.
      9db7d47f
    • Tom Lane's avatar
      Disallow CREATE INDEX if table is already in use in current session. · 0d188526
      Tom Lane authored
      If we allow this, whatever outer command has the table open will not know
      about the new index and may fail to update it as needed, as shown in a
      report from Laurenz Albe.  We already had such a prohibition in place for
      ALTER TABLE, but the CREATE INDEX syntax missed the check.
      
      Fixing it requires an API change for DefineIndex(), which conceivably
      would break third-party extensions if we were to back-patch it.  Given
      how long this problem has existed without being noticed, fixing it in
      the back branches doesn't seem worth that risk.
      
      Discussion: https://postgr.es/m/A737B7A37273E048B164557ADEF4A58B53A4DC9A@ntex2010i.host.magwien.gv.at
      0d188526
    • Alvaro Herrera's avatar
      Assorted translatable string fixes · 55a70a02
      Alvaro Herrera authored
      Mark our rusage reportage string translatable; remove quotes from type
      names; unify formatting of very similar messages.
      55a70a02
    • Tom Lane's avatar
      Remove dead variables. · 5936d25f
      Tom Lane authored
      Commit 512c7356 left a couple of variables unused except for being set.
      My compiler didn't whine about this, but some buildfarm members did.
      5936d25f
  6. 03 Jun, 2017 6 commits
    • Tom Lane's avatar
      Add some missing backslash commands to psql's tab-completion knowledge. · f1175556
      Tom Lane authored
      \if and related commands were overlooked here, as were \dRp and \dRs
      from the logical-replication patch, as was \?.
      
      While here, reformat the list to put each new first command letter on
      a separate line; perhaps that will limit the need to reflow the whole
      list when we add more commands in future.
      
      Masahiko Sawada (reformatting by me)
      
      Discussion: https://postgr.es/m/CAD21AoDW1QHtBsM33hV+Fg2mYEs+FWj4qtoCU72AwHAXQ3U6ZQ@mail.gmail.com
      f1175556
    • Tom Lane's avatar
      Fix <> and pattern-NOT-match estimators to handle nulls correctly. · 512c7356
      Tom Lane authored
      These estimators returned 1 minus the corresponding equality/match
      estimate, which is incorrect: we need to subtract off the fraction
      of nulls in the column, since those are neither equal nor not equal
      to the comparison value.  The error only becomes obvious if the
      nullfrac is large, but it could be very bad in a mostly-nulls
      column, as reported in bug #14676 from Marko Tiikkaja.
      
      To fix the <> case, refactor eqsel() and neqsel() to call a common
      support routine, which can be made to account for nullfrac correctly.
      The pattern-match cases were already factored that way, and it was
      simply an oversight that patternsel() wasn't subtracting off nullfrac.
      
      neqjoinsel() has a similar problem, but since we're elsewhere discussing
      changing its behavior entirely, I left it alone for now.
      
      This is a very longstanding bug, but I'm hesitant to back-patch a fix for
      it.  Given the lack of prior complaints, such cases must not come up often,
      so it's probably not worth the risk of destabilizing plans in stable
      branches.
      
      Discussion: https://postgr.es/m/20170529153847.4275.95416@wrigleys.postgresql.org
      512c7356
    • Tom Lane's avatar
      Fix old corner-case logic error in final_cost_nestloop(). · 23886581
      Tom Lane authored
      When costing a nestloop with stop-at-first-inner-match semantics, and a
      non-indexscan inner path, final_cost_nestloop() wants to charge the full
      scan cost of the inner rel at least once, with additional scans charged
      at inner_rescan_run_cost which might be less.  However the logic for
      doing this effectively assumed that outer_matched_rows is at least 1.
      If it's zero, which is not unlikely for a small outer rel, we ended up
      charging inner_run_cost plus N times inner_rescan_run_cost, as much as
      double the correct charge for an outer rel with only one row that
      we're betting won't be matched.  (Unless the inner rel is materialized,
      in which case it has very small inner_rescan_run_cost and the cost
      is not so far off what it should have been.)
      
      The upshot of this was that the planner had a tendency to select plans
      that failed to make effective use of the stop-at-first-inner-match
      semantics, and that might have Materialize nodes in them even when the
      predicted number of executions of the Materialize subplan was only 1.
      This was not so obvious before commit 9c7f5229, because the case only
      arose in connection with semi/anti joins where there's not freedom to
      reverse the join order.  But with the addition of unique-inner joins,
      it could result in some fairly bad planning choices, as reported by
      Teodor Sigaev.  Indeed, some of the test cases added by that commit
      have plans that look dubious on closer inspection, and are changed
      by this patch.
      
      Fix the logic to ensure that we don't charge for too many inner scans.
      I chose to adjust it so that the full-freight scan cost is associated
      with an unmatched outer row if possible, not a matched one, since that
      seems like a better model of what would happen at runtime.
      
      This is a longstanding bug, but given the lesser impact in back branches,
      and the lack of field complaints, I won't risk a back-patch.
      
      Discussion: https://postgr.es/m/CAKJS1f-LzkUsFxdJ_-Luy38orQ+AdEXM5o+vANR+-pHAWPSecg@mail.gmail.com
      23886581
    • Peter Eisentraut's avatar
      Receive invalidation messages correctly in tablesync worker · 66b84fa8
      Peter Eisentraut authored
      We didn't accept any invalidation messages until the whole sync process
      had finished (because it flattens all the remote transactions in the
      single one).  So the sync worker didn't learn about subscription
      changes/drop until it has finished.  This could lead to "orphaned" sync
      workers.
      
      Author: Petr Jelinek <petr.jelinek@2ndquadrant.com>
      Reported-by: default avatarMasahiko Sawada <sawada.mshk@gmail.com>
      66b84fa8
    • Peter Eisentraut's avatar
      Make tablesync worker exit when apply dies while it was waiting for it · 3c9bc215
      Peter Eisentraut authored
      This avoids "orphaned" sync workers.
      
      This was caused by a thinko in wait_for_sync_status_change.
      
      Author: Petr Jelinek <petr.jelinek@2ndquadrant.com>
      Reported-by: default avatarMasahiko Sawada <sawada.mshk@gmail.com>
      3c9bc215
    • Andres Freund's avatar
      Allow parallelism in COPY (query) TO ...; · 34aebcf4
      Andres Freund authored
      Previously this was not allowed, as copy.c didn't set the
      CURSOR_OPT_PARALLEL_OK flag when planning the query. Set it.
      
      While the lack of parallel query for COPY isn't strictly speaking a
      bug, it does prevent parallelism from being used in a facility
      commonly used to run long running queries. Thus backpatch to 9.6.
      
      Author: Andres Freund
      Discussion: https://postgr.es/m/20170531231958.ihanapplorptykzm@alap3.anarazel.de
      Backpatch: 9.6, where parallelism was introduced.
      34aebcf4