1. 06 Jun, 2017 8 commits
    • Andres Freund's avatar
      Unify SIGHUP handling between normal and walsender backends. · 6e1dd277
      Andres Freund authored
      Because walsender and normal backends share the same main loop it's
      problematic to have two different flag variables, set in signal
      handlers, indicating a pending configuration reload.  Only certain
      walsender commands reach code paths checking for the
      variable (START_[LOGICAL_]REPLICATION, CREATE_REPLICATION_SLOT
      ... LOGICAL, notably not base backups).
      
      This is a bug present since the introduction of walsender, but has
      gotten worse in releases since then which allow walsender to do more.
      
      A later patch, not slated for v10, will similarly unify SIGHUP
      handling in other types of processes as well.
      
      Author: Petr Jelinek, Andres Freund
      Reviewed-By: Michael Paquier
      Discussion: https://postgr.es/m/20170423235941.qosiuoyqprq4nu7v@alap3.anarazel.de
      Backpatch: 9.2-, bug is present since 9.0
      6e1dd277
    • Andres Freund's avatar
      Prevent possibility of panics during shutdown checkpoint. · c6c33343
      Andres Freund authored
      When the checkpointer writes the shutdown checkpoint, it checks
      afterwards whether any WAL has been written since it started and
      throws a PANIC if so.  At that point, only walsenders are still
      active, so one might think this could not happen, but walsenders can
      also generate WAL, for instance in BASE_BACKUP and logical decoding
      related commands (e.g. via hint bits).  So they can trigger this panic
      if such a command is run while the shutdown checkpoint is being
      written.
      
      To fix this, divide the walsender shutdown into two phases.  First,
      checkpointer, itself triggered by postmaster, sends a
      PROCSIG_WALSND_INIT_STOPPING signal to all walsenders.  If the backend
      is idle or runs an SQL query this causes the backend to shutdown, if
      logical replication is in progress all existing WAL records are
      processed followed by a shutdown.  Otherwise this causes the walsender
      to switch to the "stopping" state. In this state, the walsender will
      reject any further replication commands. The checkpointer begins the
      shutdown checkpoint once all walsenders are confirmed as
      stopping. When the shutdown checkpoint finishes, the postmaster sends
      us SIGUSR2. This instructs walsender to send any outstanding WAL,
      including the shutdown checkpoint record, wait for it to be replicated
      to the standby, and then exit.
      
      Author: Andres Freund, based on an earlier patch by Michael Paquier
      Reported-By: Fujii Masao, Andres Freund
      Reviewed-By: Michael Paquier
      Discussion: https://postgr.es/m/20170602002912.tqlwn4gymzlxpvs2@alap3.anarazel.de
      Backpatch: 9.4, where logical decoding was introduced
      c6c33343
    • Andres Freund's avatar
      Have walsenders participate in procsignal infrastructure. · 47fd420f
      Andres Freund authored
      The non-participation in procsignal was a problem for both changes in
      master, e.g. parallelism not working for normal statements run in
      walsender backends, and older branches, e.g. recovery conflicts and
      catchup interrupts not working for logical decoding walsenders.
      
      This commit thus replaces the previous WalSndXLogSendHandler with
      procsignal_sigusr1_handler.  In branches since db0f6cad that can
      lead to additional SetLatch calls, but that only rarely seems to make
      a difference.
      
      Author: Andres Freund
      Reviewed-By: Michael Paquier
      Discussion: https://postgr.es/m/20170421014030.fdzvvvbrz4nckrow@alap3.anarazel.de
      Backpatch: 9.4, earlier commits don't seem to benefit sufficiently
      47fd420f
    • Andres Freund's avatar
      Revert "Prevent panic during shutdown checkpoint" · 703f148e
      Andres Freund authored
      This reverts commit 086221cf, which
      was made to master only.
      
      The approach implemented in the above commit has some issues.  While
      those could easily be fixed incrementally, doing so would make
      backpatching considerably harder, so instead first revert this patch.
      
      Discussion: https://postgr.es/m/20170602002912.tqlwn4gymzlxpvs2@alap3.anarazel.de
      703f148e
    • Peter Eisentraut's avatar
      Don't set application_name in logical replication workers · 41a21bf9
      Peter Eisentraut authored
      This was bothering some people because it's not the intended use of
      application_name and it makes the default view of pg_stat_activity
      bulky.
      41a21bf9
    • Peter Eisentraut's avatar
      Fix ALTER SUBSCRIPTION grammar ambiguity · 9907b55c
      Peter Eisentraut authored
      There was a grammar ambiguity between SET PUBLICATION name REFRESH and
      SET PUBLICATION SKIP REFRESH, because SKIP is not a reserved word.  To
      resolve that, fold the refresh choice into the WITH options.  Refreshing
      is the default now.
      Reported-by: default avatartushar <tushar.ahuja@enterprisedb.com>
      9907b55c
    • Peter Eisentraut's avatar
      Ignore WL_POSTMASTER_DEATH latch event in single user mode · 06bfb801
      Peter Eisentraut authored
      Otherwise code that uses this will abort with an assertion failure,
      because postmaster_alive_fds are not initialized.
      Reported-by: default avatartushar <tushar.ahuja@enterprisedb.com>
      06bfb801
    • Andrew Dunstan's avatar
      Fix thinko in previous openssl change · 2e02136f
      Andrew Dunstan authored
      2e02136f
  2. 05 Jun, 2017 6 commits
  3. 04 Jun, 2017 5 commits
    • Tom Lane's avatar
      Replace over-optimistic Assert in partitioning code with a runtime test. · e7941a97
      Tom Lane authored
      get_partition_parent felt that it could simply Assert that systable_getnext
      found a tuple.  This is unlike any other caller of that function, and it's
      unsafe IMO --- in fact, the reason I noticed it was that the Assert failed.
      (OK, I was working with known-inconsistent catalog contents, but I wasn't
      expecting the DB to fall over quite that violently.  The behavior in a
      non-assert-enabled build wouldn't be very nice, either.)  Fix it to do what
      other callers do, namely an actual runtime-test-and-elog.
      
      Also, standardize the wording of elog messages that are complaining about
      unexpected failure of systable_getnext.  90% of them say "could not find
      tuple for <object>", so make the remainder do likewise.  Many of the
      holdouts were using the phrasing "cache lookup failed", which is outright
      misleading since no catcache search is involved.
      e7941a97
    • Tom Lane's avatar
      #ifdef out assorted unused GEQO code. · 9db7d47f
      Tom Lane authored
      I'd always assumed that backend/optimizer/geqo/'s remarkably poor
      showing on code coverage metrics was because we weren't exercising
      it much in the regression tests.  But it turns out that a good chunk
      of the problem is that there's a bunch of code that is physically
      unreachable (because the calls to it are #ifdef'd out in geqo_main.c)
      but is being built anyway.  Making the called code have #if guards
      similar to the calling code saves a couple of kilobytes of executable
      size and should make the coverage numbers more reflective of reality.
      
      It's arguable that we should just delete all the unused recombination
      mechanisms altogether, but I didn't feel a need to go that far today.
      9db7d47f
    • Tom Lane's avatar
      Disallow CREATE INDEX if table is already in use in current session. · 0d188526
      Tom Lane authored
      If we allow this, whatever outer command has the table open will not know
      about the new index and may fail to update it as needed, as shown in a
      report from Laurenz Albe.  We already had such a prohibition in place for
      ALTER TABLE, but the CREATE INDEX syntax missed the check.
      
      Fixing it requires an API change for DefineIndex(), which conceivably
      would break third-party extensions if we were to back-patch it.  Given
      how long this problem has existed without being noticed, fixing it in
      the back branches doesn't seem worth that risk.
      
      Discussion: https://postgr.es/m/A737B7A37273E048B164557ADEF4A58B53A4DC9A@ntex2010i.host.magwien.gv.at
      0d188526
    • Alvaro Herrera's avatar
      Assorted translatable string fixes · 55a70a02
      Alvaro Herrera authored
      Mark our rusage reportage string translatable; remove quotes from type
      names; unify formatting of very similar messages.
      55a70a02
    • Tom Lane's avatar
      Remove dead variables. · 5936d25f
      Tom Lane authored
      Commit 512c7356 left a couple of variables unused except for being set.
      My compiler didn't whine about this, but some buildfarm members did.
      5936d25f
  4. 03 Jun, 2017 6 commits
    • Tom Lane's avatar
      Add some missing backslash commands to psql's tab-completion knowledge. · f1175556
      Tom Lane authored
      \if and related commands were overlooked here, as were \dRp and \dRs
      from the logical-replication patch, as was \?.
      
      While here, reformat the list to put each new first command letter on
      a separate line; perhaps that will limit the need to reflow the whole
      list when we add more commands in future.
      
      Masahiko Sawada (reformatting by me)
      
      Discussion: https://postgr.es/m/CAD21AoDW1QHtBsM33hV+Fg2mYEs+FWj4qtoCU72AwHAXQ3U6ZQ@mail.gmail.com
      f1175556
    • Tom Lane's avatar
      Fix <> and pattern-NOT-match estimators to handle nulls correctly. · 512c7356
      Tom Lane authored
      These estimators returned 1 minus the corresponding equality/match
      estimate, which is incorrect: we need to subtract off the fraction
      of nulls in the column, since those are neither equal nor not equal
      to the comparison value.  The error only becomes obvious if the
      nullfrac is large, but it could be very bad in a mostly-nulls
      column, as reported in bug #14676 from Marko Tiikkaja.
      
      To fix the <> case, refactor eqsel() and neqsel() to call a common
      support routine, which can be made to account for nullfrac correctly.
      The pattern-match cases were already factored that way, and it was
      simply an oversight that patternsel() wasn't subtracting off nullfrac.
      
      neqjoinsel() has a similar problem, but since we're elsewhere discussing
      changing its behavior entirely, I left it alone for now.
      
      This is a very longstanding bug, but I'm hesitant to back-patch a fix for
      it.  Given the lack of prior complaints, such cases must not come up often,
      so it's probably not worth the risk of destabilizing plans in stable
      branches.
      
      Discussion: https://postgr.es/m/20170529153847.4275.95416@wrigleys.postgresql.org
      512c7356
    • Tom Lane's avatar
      Fix old corner-case logic error in final_cost_nestloop(). · 23886581
      Tom Lane authored
      When costing a nestloop with stop-at-first-inner-match semantics, and a
      non-indexscan inner path, final_cost_nestloop() wants to charge the full
      scan cost of the inner rel at least once, with additional scans charged
      at inner_rescan_run_cost which might be less.  However the logic for
      doing this effectively assumed that outer_matched_rows is at least 1.
      If it's zero, which is not unlikely for a small outer rel, we ended up
      charging inner_run_cost plus N times inner_rescan_run_cost, as much as
      double the correct charge for an outer rel with only one row that
      we're betting won't be matched.  (Unless the inner rel is materialized,
      in which case it has very small inner_rescan_run_cost and the cost
      is not so far off what it should have been.)
      
      The upshot of this was that the planner had a tendency to select plans
      that failed to make effective use of the stop-at-first-inner-match
      semantics, and that might have Materialize nodes in them even when the
      predicted number of executions of the Materialize subplan was only 1.
      This was not so obvious before commit 9c7f5229, because the case only
      arose in connection with semi/anti joins where there's not freedom to
      reverse the join order.  But with the addition of unique-inner joins,
      it could result in some fairly bad planning choices, as reported by
      Teodor Sigaev.  Indeed, some of the test cases added by that commit
      have plans that look dubious on closer inspection, and are changed
      by this patch.
      
      Fix the logic to ensure that we don't charge for too many inner scans.
      I chose to adjust it so that the full-freight scan cost is associated
      with an unmatched outer row if possible, not a matched one, since that
      seems like a better model of what would happen at runtime.
      
      This is a longstanding bug, but given the lesser impact in back branches,
      and the lack of field complaints, I won't risk a back-patch.
      
      Discussion: https://postgr.es/m/CAKJS1f-LzkUsFxdJ_-Luy38orQ+AdEXM5o+vANR+-pHAWPSecg@mail.gmail.com
      23886581
    • Peter Eisentraut's avatar
      Receive invalidation messages correctly in tablesync worker · 66b84fa8
      Peter Eisentraut authored
      We didn't accept any invalidation messages until the whole sync process
      had finished (because it flattens all the remote transactions in the
      single one).  So the sync worker didn't learn about subscription
      changes/drop until it has finished.  This could lead to "orphaned" sync
      workers.
      
      Author: Petr Jelinek <petr.jelinek@2ndquadrant.com>
      Reported-by: default avatarMasahiko Sawada <sawada.mshk@gmail.com>
      66b84fa8
    • Peter Eisentraut's avatar
      Make tablesync worker exit when apply dies while it was waiting for it · 3c9bc215
      Peter Eisentraut authored
      This avoids "orphaned" sync workers.
      
      This was caused by a thinko in wait_for_sync_status_change.
      
      Author: Petr Jelinek <petr.jelinek@2ndquadrant.com>
      Reported-by: default avatarMasahiko Sawada <sawada.mshk@gmail.com>
      3c9bc215
    • Andres Freund's avatar
      Allow parallelism in COPY (query) TO ...; · 34aebcf4
      Andres Freund authored
      Previously this was not allowed, as copy.c didn't set the
      CURSOR_OPT_PARALLEL_OK flag when planning the query. Set it.
      
      While the lack of parallel query for COPY isn't strictly speaking a
      bug, it does prevent parallelism from being used in a facility
      commonly used to run long running queries. Thus backpatch to 9.6.
      
      Author: Andres Freund
      Discussion: https://postgr.es/m/20170531231958.ihanapplorptykzm@alap3.anarazel.de
      Backpatch: 9.6, where parallelism was introduced.
      34aebcf4
  5. 02 Jun, 2017 7 commits
  6. 01 Jun, 2017 4 commits
  7. 31 May, 2017 4 commits