1. 19 Aug, 2018 1 commit
  2. 18 Aug, 2018 2 commits
  3. 17 Aug, 2018 6 commits
    • Ensure schema qualification in pg_restore DISABLE/ENABLE TRIGGER commands. · 6771c932
      Tom Lane authored
      Previously, this code blindly followed the common coding pattern of
      passing PQserverVersion(AH->connection) as the server-version parameter
      of fmtQualifiedId.  That works as long as we have a connection; but in
      pg_restore with text output, we don't.  Instead we got a zero from
      PQserverVersion, which fmtQualifiedId interpreted as "server is too old to
      have schemas", and so the name went unqualified.  That still accidentally
      managed to work in many cases, which is probably why this ancient bug went
      undetected for so long.  It only became obvious in the wake of the changes
      to force dump/restore to execute with restricted search_path.
      
      In HEAD/v11, let's deal with this by ripping out fmtQualifiedId's server-
      version behavioral dependency, and just making it schema-qualify all the
      time.  We no longer support pg_dump from servers old enough to need the
      ability to omit schema name, let alone restoring to them.  (Also, the few
      callers outside pg_dump already didn't work with pre-schema servers.)
      
      In older branches, that's not an acceptable solution, so instead just
      tweak the DISABLE/ENABLE TRIGGER logic to ensure it will schema-qualify
      its output regardless of server version.
      
      Per bug #15338 from Oleg somebody.  Back-patch to all supported branches.
      
      Discussion: https://postgr.es/m/153452458706.1316.5328079417086507743@wrigleys.postgresql.org
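
      A minimal illustration of the failure mode (illustrative code, not
      pg_dump's actual source; libpq does return 0 from PQserverVersion for
      a null connection pointer):

          #include <stdio.h>
          #include <libpq-fe.h>

          int
          main(void)
          {
              PGconn *conn = NULL;    /* pg_restore in text mode: no connection */
              int     version = PQserverVersion(conn);    /* yields 0 */

              /* Old fmtQualifiedId logic: qualify only for servers >= 7.3 */
              if (version >= 70300)
                  printf("public.mytable\n");
              else
                  printf("mytable\n");    /* reached for version == 0: the bug */
              return 0;
          }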
    • InsertPgAttributeTuple() to set attcacheoff · e4597ee6
      Peter Eisentraut authored
      InsertPgAttributeTuple() is the interface between in-memory tuple
      descriptors and on-disk pg_attribute, so it makes sense to give it the
      job of resetting attcacheoff.  This avoids requiring every caller to
      do so.
      Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
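
      The shape of the change, as a hedged self-contained sketch (hypothetical
      struct and function names; the real code deals in pg_attribute tuples):

          #include <stdio.h>

          typedef struct AttributeRow
          {
              char name[64];
              int  attcacheoff;       /* cache-only field; never valid on disk */
          } AttributeRow;

          /* The boundary function owns the reset, so no caller can forget it. */
          static void
          insert_attribute_row(AttributeRow *row)
          {
              row->attcacheoff = -1;  /* always store the "not cached" marker */
              printf("stored %s with attcacheoff=%d\n", row->name, row->attcacheoff);
          }

          int
          main(void)
          {
              AttributeRow r = {"id", 42};    /* stale in-memory cached offset */
              insert_attribute_row(&r);       /* reset happens here, not at call sites */
              return 0;
          }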
    • Set scan direction appropriately for SubPlans (bug #15336) · 520acab1
      Andrew Gierth authored
      When executing a SubPlan in an expression, the EState's direction
      field was left alone, resulting in an attempt to execute the subplan
      backwards if it was encountered during a backwards scan of a cursor.
      Also, though much less likely, it was possible to reach the execution
      of an InitPlan while in backwards-scan state.
      
      Repair by saving/restoring estate->es_direction and forcing forward
      scan mode in the relevant places.
      
      Backpatch all the way, since this has been broken since 8.3 (prior to
      commit c7ff7663, SubPlans had their own EStates rather than sharing
      the parent plan's, so there was no confusion over scan direction).
      
      Per bug #15336 reported by Vladimir Baranoff; analysis and patch by
      me, review by Tom Lane.
      
      Discussion: https://postgr.es/m/153449812167.1304.1741624125628126322@wrigleys.postgresql.org
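
      A compact sketch of the save/restore pattern described above (standalone
      toy code; the real fix manipulates estate->es_direction):

          #include <stdio.h>

          typedef enum { BackwardScan = -1, ForwardScan = 1 } ScanDir;

          static ScanDir es_direction = ForwardScan;  /* stand-in for the EState field */

          static void
          run_subplan(void)
          {
              ScanDir saved = es_direction;   /* save the outer scan's direction */

              es_direction = ForwardScan;     /* subplans must always run forward */
              printf("subplan runs with direction %d\n", es_direction);
              es_direction = saved;           /* restore on the way out */
          }

          int
          main(void)
          {
              es_direction = BackwardScan;    /* cursor being fetched backwards */
              run_subplan();                  /* must not inherit that direction */
              printf("outer direction restored: %d\n", es_direction);
              return 0;
          }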
    • Fix configure's snprintf test so it exposes HP-UX bug. · 9bed827b
      Tom Lane authored
      Since commit e1d19c90, buildfarm member gharial has been failing with
      symptoms indicating that snprintf sometimes returns -1 for buffer
      overrun, even though it passes the added configure check.  Some
      google research suggests that this happens only in limited cases,
      such as when the overrun happens partway through a %d item.  Adjust
      the configure check to exercise it that way.  Since I'm now feeling
      more paranoid than I was before, also make the test explicitly verify
      that the buffer doesn't get physically overrun.
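
      A sketch in the spirit of the adjusted probe (not the configure test's
      exact text): overrun partway through a %d item, then verify both the
      C99 return value and that the bytes past the window are untouched:

          #include <stdio.h>
          #include <string.h>

          int
          main(void)
          {
              char buf[16];

              memset(buf, 0x7f, sizeof(buf)); /* canary bytes past the window */

              /* Overrun in mid-%d: the case where HP-UX's snprintf was seen
               * returning -1 despite passing simpler overrun tests. */
              int n = snprintf(buf, 3, "%d", 123456);

              if (n != 6)
                  return 1;           /* C99: full would-be length, not -1 */
              if (buf[3] != 0x7f)
                  return 1;           /* buffer was physically overrun: fail */
              return 0;               /* exit 0 == snprintf looks C99-clean */
          }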
    • pg_upgrade: issue helpful error message for use on standbys · b94f7b53
      Bruce Momjian authored
      Commit 777e6ddf checked for a shut-down message from a standby and
      allowed pg_upgrade to continue.  This patch reports a helpful error
      message in these cases, suggesting the use of rsync as documented.
      
      Diagnosed-by: Martín Marqués
      
      Discussion: https://postgr.es/m/CAPdiE1xYCow-reLjrhJ9DqrMu-ppNq0ChUUEvVdxhdjGRD5_eA@mail.gmail.com
      
      Backpatch-through: 9.3
  4. 16 Aug, 2018 10 commits
  5. 15 Aug, 2018 3 commits
    • Update FSM on WAL replay of page all-visible/frozen · ab7dbd68
      Alvaro Herrera authored
      We aren't very strict about keeping FSM up to date on WAL replay,
      because per-page freespace values aren't critical in replicas (can't
      write to heap in a replica; and if the replica is promoted, the values
      would be updated by VACUUM anyway).  However, VACUUM since 9.6 can skip
      processing pages marked all-visible or all-frozen, and if such pages are
      recorded in FSM with wrong values, those values are blindly propagated
      to FSM's upper layers by VACUUM's FreeSpaceMapVacuum.  (This rationale
      assumes that crashes are not very frequent, because a crash would leave
      the primary with an outdated FSM as well.)
      
      Even when the FSM is outdated on a standby, things are not too bad
      normally, because most per-page FSM values will be zero (other than
      those propagated with the base-backup that created the standby); the
      per-page value is only maintained by WAL replay of heap ins/upd/del
      once the remaining free space is less than 0.2*BLCKSZ.  However, if
      wal_log_hints=on causes complete FSM pages to be propagated to a standby
      via full-page images, many too-optimistic per-page values can end up
      being registered in the standby.
      
      Incorrect per-page values aren't critical in most cases, since an
      inserter that is given a page that doesn't actually contain the claimed
      free space will update FSM with the correct value, and retry until it
      finds a usable page.  However, if there are many such updates to do, an
      inserter can spend a long time doing them before a usable page is found;
      in a heavily trafficked insert-only table with many concurrent inserters
      this has been observed to cause several second stalls, causing visible
      application malfunction.
      
      To fix this problem, it seems sufficient to have heap_xlog_visible
      (replay of setting all-visible and all-frozen VM bits for a heap page)
      update the FSM value for the page being processed.  This fixes the
      per-page counters together with making the page skippable to vacuum, so
      when vacuum does FreeSpaceMapVacuum, the values propagated to FSM upper
      layers are the correct ones, avoiding the problem.
      
      While at it, apply the same fix to heap_xlog_clean (replay of tuple
      removal by HOT pruning and vacuum).  This makes any space freed by the
      cleaning available earlier than the next vacuum in the promoted replica.
      
      Backpatch to 9.6, where this problem was diagnosed on an insert-only
      table with all-frozen pages, which were introduced as a concept in that
      release.  Theoretically it could apply with all-visible pages to older
      branches, but there's been no report of that and it doesn't backpatch
      cleanly anyway.
      
      Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
      Discussion: https://postgr.es/m/20180802172857.5skoexsilnjvgruk@alvherre.pgsql
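
      In backend terms, the fix boils down to something like the following
      sketch (the function name here is made up; PageGetHeapFreeSpace and
      XLogRecordPageWithFreeSpace are the real helpers, and this assumes the
      usual backend headers rather than being standalone):

          /* After replaying the all-visible/all-frozen bits for a heap page,
           * refresh that page's FSM slot so a later FreeSpaceMapVacuum
           * propagates a correct value to the FSM's upper layers. */
          static void
          visible_replay_update_fsm(RelFileNode rnode, BlockNumber blkno,
                                    Buffer buffer)
          {
              if (BufferIsValid(buffer))
              {
                  Size    freespace = PageGetHeapFreeSpace(BufferGetPage(buffer));

                  UnlockReleaseBuffer(buffer);
                  XLogRecordPageWithFreeSpace(rnode, blkno, freespace);
              }
          }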
    • Clean up assorted misuses of snprintf()'s result value. · cc4f6b77
      Tom Lane authored
      Fix a small number of places that were testing the result of snprintf()
      but doing so incorrectly.  The right test for buffer overrun, per C99,
      is "result >= bufsize" not "result > bufsize".  Some places were also
      checking for failure with "result == -1", but the standard only says
      that a negative value is delivered on failure.
      
      (Note that this only makes these places correct if snprintf() delivers
      C99-compliant results.  But at least now these places are consistent
      with all the other places where we assume that.)
      
      Also, make psql_start_test() and isolation_start_test() check for
      buffer overrun while constructing their shell commands.  There seems
      to be a higher risk of overrun, with more severe consequences, here
      than there is for the individual file paths that are made elsewhere
      in the same functions, so this seemed like a worthwhile change.
      
      Also fix guc.c's do_serialize() to initialize errno = 0 before
      calling vsnprintf.  In principle, this should be unnecessary because
      vsnprintf should have set errno if it returns a failure indication ...
      but the other two places this coding pattern is cribbed from don't
      assume that, so let's be consistent.
      
      These errors are all very old, so back-patch as appropriate.  I think
      that only the shell command overrun cases are even theoretically
      reachable in practice, but there's not much point in erroneous error
      checks.
      
      Discussion: https://postgr.es/m/17245.1534289329@sss.pgh.pa.us
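
      The wrong and right idioms side by side (a standalone example; the call
      sites fixed by the commit vary, but the tests have this shape):

          #include <stdio.h>

          int
          main(void)
          {
              char buf[8];
              int  n = snprintf(buf, sizeof(buf), "%s", "far too long for buf");

              /* Wrong: misses the n == bufsize boundary case, and C99 only
               * promises "a negative value" on failure, not -1 exactly. */
              if (n > (int) sizeof(buf) || n == -1)
                  puts("overrun (unreliable test)");

              /* Right, per C99: truncation iff n >= bufsize, failure iff n < 0. */
              if (n < 0 || n >= (int) sizeof(buf))
                  puts("overrun or failure (correct test)");
              return 0;
          }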
    • Make snprintf.c follow the C99 standard for snprintf's result value. · 805889d7
      Tom Lane authored
      C99 says that the result should be the number of bytes that would have
      been emitted given a large enough buffer, not the number we actually
      were able to put in the buffer.  It's time to make our substitute
      implementation comply with that.  Not doing so results in inefficiency
      in buffer-enlargement cases, and also poses a portability hazard for
      third-party code that might expect C99-compliant snprintf behavior
      within Postgres.
      
      In passing, remove useless tests for str == NULL; neither C99 nor
      predecessor standards ever allowed that except when count == 0,
      so I see no reason to expend cycles on making that a non-crash case
      for this implementation.  Also, don't waste a byte in pg_vfprintf's
      local I/O buffer; this might have performance benefits by allowing
      aligned writes during flushbuffer calls.
      
      Discussion: https://postgr.es/m/17245.1534289329@sss.pgh.pa.us
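
      One practical payoff of C99-compliant behavior is cheap two-pass
      formatting; a standalone sketch (not code from the commit):

          #include <stdio.h>
          #include <stdlib.h>
          #include <stdarg.h>

          /* Allocate exactly enough space for a formatted string: the first
           * vsnprintf call, with count == 0 (the one case where str may be
           * NULL), reports the full length the result needs. */
          static char *
          format_alloc(const char *fmt, ...)
          {
              va_list args;
              int     needed;
              char   *buf;

              va_start(args, fmt);
              needed = vsnprintf(NULL, 0, fmt, args);
              va_end(args);
              if (needed < 0)
                  return NULL;

              buf = malloc((size_t) needed + 1);
              if (buf == NULL)
                  return NULL;

              va_start(args, fmt);
              vsnprintf(buf, (size_t) needed + 1, fmt, args);
              va_end(args);
              return buf;
          }

          int
          main(void)
          {
              char *s = format_alloc("%d items in %s", 42, "the queue");

              if (s)
              {
                  puts(s);
                  free(s);
              }
              return 0;
          }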
  6. 14 Aug, 2018 3 commits
  7. 13 Aug, 2018 11 commits
    • Remove obsolete netbsd dynloader code · b68ff3ea
      Peter Eisentraut authored
      dlopen() has been documented since NetBSD 1.1 (1995).
    • Remove obsolete openbsd dynloader code · 29351a06
      Peter Eisentraut authored
      dlopen() has been documented since OpenBSD 2.0 (1996).
    • Remove obsolete freebsd dynloader code · 2db1905f
      Peter Eisentraut authored
      dlopen() has been documented since FreeBSD 3.0 (1998).
    • Remove obsolete linux dynloader code · 351855fc
      Peter Eisentraut authored
      This has been obsolete probably since the late 1990s.
    • Remove obsolete darwin dynloader code · b5d29299
      Peter Eisentraut authored
      Not needed since macOS 10.3 (2003).
    • Remove obsolete comment · 3ebdd21b
      Peter Eisentraut authored
      The sequence name is no longer stored in the sequence relation, since
      1753b1b0.
    • Fix libpq's implementation of per-host connection timeouts. · 1e6e98f7
      Tom Lane authored
      Commit 5f374fe7 attempted to turn the connect_timeout from an overall
      maximum time limit into a per-host limit, but it didn't do a great job of
      that.  The timer would only get restarted if we actually detected timeout
      within connectDBComplete(), not if we changed our attention to a new host
      for some other reason.  In that case the old timeout continued to run,
      possibly causing a premature timeout failure for the new host.
      
      Fix that, and also tweak the logic so that if we do get a timeout,
      we advance to the next available IP address, not to the next host name.
      There doesn't seem to be a good reason to assume that all the IP
      addresses supplied for a given host name will necessarily fail the
      same way as the current one.  Moreover, this conforms better to the
      admittedly-vague documentation statement that the timeout is "per
      connection attempt".  I changed that to "per host name or IP address"
      to be clearer.  (Note that reconnections to the same server, such as for
      switching protocol version or SSL status, don't get their own separate
      timeout; that was true before and remains so.)
      
      Also clarify documentation about the interpretation of connect_timeout
      values less than 2.
      
      This seems like a bug, so back-patch to v10 where this logic came in.
      
      Tom Lane, reviewed by Fabien Coelho
      
      Discussion: https://postgr.es/m/5735.1533828184@sss.pgh.pa.us
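
      From an application's viewpoint (hypothetical host names and database),
      the timeout now bounds each host name or IP address tried separately,
      so this attempt can take up to roughly 2 x 5 seconds, not 5 total:

          #include <stdio.h>
          #include <libpq-fe.h>

          int
          main(void)
          {
              /* libpq tries the listed hosts in order; connect_timeout
               * applies per host name or IP address, not overall. */
              PGconn *conn = PQconnectdb(
                  "host=db-primary.example.com,db-standby.example.com "
                  "connect_timeout=5 dbname=app");

              if (PQstatus(conn) != CONNECTION_OK)
                  fprintf(stderr, "%s", PQerrorMessage(conn));
              PQfinish(conn);
              return 0;
          }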
    • Make autovacuum more aggressive to remove orphaned temp tables · 246a6c8f
      Michael Paquier authored
      Commit dafa0848, added in 10, made the removal of orphaned temporary
      tables more aggressive.  This commit goes a step further by adding a
      flag to each backend's MyProc which tracks the temporary namespace
      currently in use.  The flag is set when the namespace gets created,
      and is reset if the namespace was created in a transaction or
      sub-transaction which is aborted.  The flag assignment is assumed to
      be atomic, so it can be done in a lock-less fashion like other fields
      already present in PGPROC, such as databaseId or backendId; meanwhile,
      the fact that the temporary namespace and the tables created in it
      stay locked until the creating transaction commits acts as a barrier
      for other backends.
      
      Autovacuum uses this new flag to discard orphaned tables more
      aggressively: in addition to checking which database a backend is
      connected to, it checks whether the backend's temporary namespace is
      actually in use, so orphaned temporary relations are removed even if
      a backend reuses the same slot as one which created temporary
      relations in a past session.
      
      The base idea of this patch comes from Robert Haas; the first version
      was written by Tsunakawa Takayuki, then heavily reviewed by me.
      
      Author: Tsunakawa Takayuki
      Reviewed-by: Michael Paquier, Kyotaro Horiguchi, Andres Freund
      Discussion: https://postgr.es/m/0A3221C70F24FB45833433255569204D1F8A4DC6@G01JPEXMBYT05
      Backpatch: 11-, as PGPROC gains a new flag and we don't want silent ABI
      breakages on already released versions.
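
      Backend-context sketch of the test autovacuum can now make (the helper
      is invented for illustration; the PGPROC field name follows the
      description above and may not match the committed code exactly):

          static bool
          temp_table_is_orphaned(Oid tableNamespace, PGPROC *owner,
                                 Oid ourDatabaseId)
          {
              /* Slot empty, or its backend is in another database: orphaned. */
              if (owner == NULL || owner->databaseId != ourDatabaseId)
                  return true;

              /* Slot reused, but the new session never claimed this
               * temporary namespace: also orphaned. */
              if (owner->tempNamespaceId != tableNamespace)
                  return true;

              return false;
          }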
    • Adjust comment atop ExecShutdownNode. · 4f9a97e4
      Amit Kapila authored
      After commits a315b967 and b805b63ac2, part of the comment atop
      ExecShutdownNode is redundant.  Adjust it.
      
      Author: Amit Kapila
      Backpatch-through: 10 where both the mentioned commits are present.
      Discussion: https://postgr.es/m/86137f17-1dfb-42f9-7421-82fd786b04a1@anayrat.info
    • Prohibit shutting down resources if there is a possibility of backing up. · 2cd0acfd
      Amit Kapila authored
      Currently, we release the asynchronous resources as soon as it is
      evident that no more rows will be needed, e.g. when a Limit is filled.
      This can be problematic especially for custom and foreign scans, where
      we can scan backward.  Fix that by disallowing the shutting down of
      resources in such cases.
      
      Reported-by: Robert Haas
      Analysed-by: Robert Haas and Amit Kapila
      Author: Amit Kapila
      Reviewed-by: Robert Haas
      Backpatch-through: 9.6 where this code was introduced
      Discussion: https://postgr.es/m/86137f17-1dfb-42f9-7421-82fd786b04a1@anayrat.info
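
      The guard has roughly this shape in executor terms (EXEC_FLAG_BACKWARD
      is the real executor flag; the surrounding line is illustrative, not
      the patch itself):

          /* Only release async resources if this plan tree can never be
           * asked to scan backwards. */
          if (!(node->ps.state->es_top_eflags & EXEC_FLAG_BACKWARD))
              (void) ExecShutdownNode(outerPlanState(node));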
    • Avoid query-lifetime memory leaks in XMLTABLE (bug #15321) · 07172d5a
      Andrew Gierth authored
      Multiple calls to XMLTABLE in a query (e.g. laterally applying it to a
      table with an xml column, an important use-case) were leaking large
      amounts of memory into the per-query context, blowing up memory usage.
      
      Repair by reorganizing memory context usage in nodeTableFuncscan; use
      the usual per-tuple context for row-by-row evaluations instead of
      perValueCxt, and use the explicitly created context -- renamed from
      perValueCxt to perTableCxt -- for arguments and state for each
      individual table-generation operation.
      
      Backpatch to PG10 where this code was introduced.
      
      Original report by IRC user begriffs; analysis and patch by me.
      Reviewed by Tom Lane and Pavel Stehule.
      
      Discussion: https://postgr.es/m/153394403528.10284.7530399040974170549@wrigleys.postgresql.org
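
      The reorganization follows the usual executor memory discipline; a
      backend-context sketch with invented helper names:

          MemoryContext oldcxt;

          /* Row-by-row evaluation: the per-tuple context is reset between
           * tuples, so per-row allocations cannot accumulate. */
          oldcxt = MemoryContextSwitchTo(econtext->ecxt_per_tuple_memory);
          evaluate_row_expressions(scanstate);
          MemoryContextSwitchTo(oldcxt);

          /* Arguments and state for one whole table-generation operation
           * live in the explicitly managed, longer-lived context. */
          oldcxt = MemoryContextSwitchTo(scanstate->perTableCxt);
          build_table_state(scanstate);
          MemoryContextSwitchTo(oldcxt);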
  8. 12 Aug, 2018 2 commits
    • Revert "Distinguish printf-like functions that support %m from those that don't." · 46b5e7c4
      Tom Lane authored
      This reverts commit 3a60c8ff.  Buildfarm
      results show that that caused a whole bunch of new warnings on platforms
      where gcc believes the local printf to be non-POSIX-compliant.  This
      problem outweighs the hypothetical-anyway possibility of getting warnings
      for misuse of %m.  We could use the gnu_printf archetype when we've
      substituted src/port/snprintf.c, but that brings us right back to the
      problem of not getting warnings for %m.
      
      A possible answer is to attack it in the other direction by insisting
      that %m support be included in printf's feature set, but that will take
      more investigation.  In the meantime, revert the previous change, and
      update the comment for PGAC_C_PRINTF_ARCHETYPE to more fully explain
      what's going on.
      
      Discussion: https://postgr.es/m/2975.1526862605@sss.pgh.pa.us
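
      For reference, the distinction being wrestled with (a gcc-specific
      fragment; log_msg is a made-up function for the example):

          #include <errno.h>

          /* With the gnu_printf archetype, gcc type-checks %m without
           * complaint; with the plain printf archetype, gcc on platforms
           * whose local printf lacks %m warns about it instead. */
          extern void log_msg(const char *fmt, ...)
              __attribute__((format(gnu_printf, 1, 2)));

          void
          report_open_failure(void)
          {
              errno = ENOENT;
              log_msg("open failed: %m");
          }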
    • Fix bogus loop logic in 013_crash_restart test's pump_until subroutine. · d11eae09
      Tom Lane authored
      The pump_nb() step might've already received the desired data, so we must
      check for that at the top of the loop, not the bottom.  Otherwise, the
      call to pump() will sit with nothing to do until the timeout elapses.
      pump_until then falls out with apparent success ... but the timeout has
      been used up, causing the next call of pump_until to report a timeout
      failure.  I believe this explains the intermittent timeout failures
      we've seen in the buildfarm ever since this test went in.  I was able
      to reproduce the problem on gaur semi-repeatably, and this appears to
      fix it.
      
      In passing, remove a duplicate assignment, fix one stdin-assignment to
      look like the rest, and document the test's dependency on test_decoding.
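
      The loop-shape bug, restated in C for illustration (the test itself is
      Perl; these helper names are stand-ins for its primitives):

          #include <stdbool.h>

          extern bool output_matches(void);
          extern bool deadline_passed(void);
          extern void pump_once(void);

          bool
          pump_until(void)
          {
              for (;;)
              {
                  /* Check FIRST: the data may already have arrived. */
                  if (output_matches())
                      return true;
                  /* Only then consider the deadline and block for more. */
                  if (deadline_passed())
                      return false;
                  pump_once();
              }
          }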
  9. 11 Aug, 2018 2 commits
    • Fix wrong order of operations in inheritance_planner. · 4a2994f0
      Tom Lane authored
      When considering a partitioning parent rel, we should stop processing that
      subroot as soon as we've done adjust_appendrel_attrs and any securityQuals
      updates.  The rest of this is unnecessary, and indeed adding duplicate
      subquery RTEs to the subroot is *wrong*.  As the code stood, the children
      of that partition ended up with two sets of copied subquery RTEs, confusing
      matters greatly.  Even more hilarity ensued if all of the children got
      excluded by constraint exclusion, so that the extra RTEs didn't make it
      back into the parent rtable.
      
      Per fuzz testing by Andreas Seltenreich.  Back-patch to v11 where this
      got broken (by commit 0a480502, it looks like).
      
      Discussion: https://postgr.es/m/87va8g7vq0.fsf@ansel.ydns.eu
    • Produce compiler errors if errno is referenced inside elog/ereport calls. · a2a8acd1
      Tom Lane authored
      It's often unsafe to reference errno within an elog/ereport call, because
      there are a lot of sub-functions involved and they might not all preserve
      errno.  (This is why we support the %m format spec: it works off a value
      of errno captured before we execute any potentially-unsafe functions in
      the arguments.)  Therefore, we have a project policy not to use errno
      there.
      
      This patch adds a hack to cause an (admittedly obscure) compiler error
      for such unsafe usages.  With the current code, the error will only be seen
      on Linux, macOS, and FreeBSD, but that should certainly be enough to catch
      mistakes in the buildfarm if they somehow get missed earlier.
      
      In addition, fix some places in src/common/exec.c that trip the error.
      I think these places are actually all safe, but it's simple enough to
      avoid the error by capturing errno manually, and doing so is good
      future-proofing in case these call sites get any more complicated.
      
      Thomas Munro (exec.c fixes by me)
      
      Discussion: https://postgr.es/m/2975.1526862605@sss.pgh.pa.us
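
      The difference between unsafe and safe usages, in the project's own
      idiom (a backend-context sketch assuming elog.h and a variable "path"):

          /* Unsafe: sub-functions evaluated inside the ereport expansion may
           * clobber errno before strerror() reads it. */
          ereport(ERROR,
                  (errcode_for_file_access(),
                   errmsg("could not open file \"%s\": %s",
                          path, strerror(errno))));

          /* Safe: %m formats an errno value captured before the arguments
           * are evaluated. */
          ereport(ERROR,
                  (errcode_for_file_access(),
                   errmsg("could not open file \"%s\": %m", path)));

          /* Also safe (the exec.c approach): capture errno by hand first. */
          int     saved_errno = errno;

          ereport(ERROR,
                  (errcode_for_file_access(),
                   errmsg("could not open file \"%s\": %s",
                          path, strerror(saved_errno))));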