1. 16 Aug, 2018 8 commits
  2. 15 Aug, 2018 3 commits
    • Alvaro Herrera's avatar
      Update FSM on WAL replay of page all-visible/frozen · ab7dbd68
      Alvaro Herrera authored
      We aren't very strict about keeping FSM up to date on WAL replay,
      because per-page freespace values aren't critical in replicas (can't
      write to heap in a replica; and if the replica is promoted, the values
      would be updated by VACUUM anyway).  However, VACUUM since 9.6 can skip
      processing pages marked all-visible or all-frozen, and if such pages are
      recorded in FSM with wrong values, those values are blindly propagated
      to FSM's upper layers by VACUUM's FreeSpaceMapVacuum.  (This rationale
      assumes that crashes are not very frequent, because those would cause
      outdated FSM to occur in the primary.)
      
      Even when the FSM is outdated in standby, things are not too bad
      normally, because, most per-page FSM values will be zero (other than
      those propagated with the base-backup that created the standby); only
      once the remaining free space is less than 0.2*BLCKSZ the per-page value
      is maintained by WAL replay of heap ins/upd/del.  However, if
      wal_log_hints=on causes complete FSM pages to be propagated to a standby
      via full-page images, many too-optimistic per-page values can end up
      being registered in the standby.
      
      Incorrect per-page values aren't critical in most cases, since an
      inserter that is given a page that doesn't actually contain the claimed
      free space will update FSM with the correct value, and retry until it
      finds a usable page.  However, if there are many such updates to do, an
      inserter can spend a long time doing them before a usable page is found;
      in a heavily trafficked insert-only table with many concurrent inserters
      this has been observed to cause several second stalls, causing visible
      application malfunction.
      
      To fix this problem, it seems sufficient to have heap_xlog_visible
      (replay of setting all-visible and all-frozen VM bits for a heap page)
      update the FSM value for the page being processed.  This fixes the
      per-page counters together with making the page skippable to vacuum, so
      when vacuum does FreeSpaceMapVacuum, the values propagated to FSM upper
      layers are the correct ones, avoiding the problem.
      
      While at it, apply the same fix to heap_xlog_clean (replay of tuple
      removal by HOT pruning and vacuum).  This makes any space freed by the
      cleaning available earlier than the next vacuum in the promoted replica.
      
      Backpatch to 9.6, where this problem was diagnosed on an insert-only
      table with all-frozen pages, which were introduced as a concept in that
      release.  Theoretically it could apply with all-visible pages to older
      branches, but there's been no report of that and it doesn't backpatch
      cleanly anyway.
      
      Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
      Discussion: https://postgr.es/m/20180802172857.5skoexsilnjvgruk@alvherre.pgsql
      ab7dbd68
    • Tom Lane's avatar
      Clean up assorted misuses of snprintf()'s result value. · cc4f6b77
      Tom Lane authored
      Fix a small number of places that were testing the result of snprintf()
      but doing so incorrectly.  The right test for buffer overrun, per C99,
      is "result >= bufsize" not "result > bufsize".  Some places were also
      checking for failure with "result == -1", but the standard only says
      that a negative value is delivered on failure.
      
      (Note that this only makes these places correct if snprintf() delivers
      C99-compliant results.  But at least now these places are consistent
      with all the other places where we assume that.)
      
      Also, make psql_start_test() and isolation_start_test() check for
      buffer overrun while constructing their shell commands.  There seems
      like a higher risk of overrun, with more severe consequences, here
      than there is for the individual file paths that are made elsewhere
      in the same functions, so this seemed like a worthwhile change.
      
      Also fix guc.c's do_serialize() to initialize errno = 0 before
      calling vsnprintf.  In principle, this should be unnecessary because
      vsnprintf should have set errno if it returns a failure indication ...
      but the other two places this coding pattern is cribbed from don't
      assume that, so let's be consistent.
      
      These errors are all very old, so back-patch as appropriate.  I think
      that only the shell command overrun cases are even theoretically
      reachable in practice, but there's not much point in erroneous error
      checks.
      
      Discussion: https://postgr.es/m/17245.1534289329@sss.pgh.pa.us
      cc4f6b77
    • Tom Lane's avatar
      Make snprintf.c follow the C99 standard for snprintf's result value. · 805889d7
      Tom Lane authored
      C99 says that the result should be the number of bytes that would have
      been emitted given a large enough buffer, not the number we actually
      were able to put in the buffer.  It's time to make our substitute
      implementation comply with that.  Not doing so results in inefficiency
      in buffer-enlargement cases, and also poses a portability hazard for
      third-party code that might expect C99-compliant snprintf behavior
      within Postgres.
      
      In passing, remove useless tests for str == NULL; neither C99 nor
      predecessor standards ever allowed that except when count == 0,
      so I see no reason to expend cycles on making that a non-crash case
      for this implementation.  Also, don't waste a byte in pg_vfprintf's
      local I/O buffer; this might have performance benefits by allowing
      aligned writes during flushbuffer calls.
      
      Discussion: https://postgr.es/m/17245.1534289329@sss.pgh.pa.us
      805889d7
  3. 14 Aug, 2018 3 commits
  4. 13 Aug, 2018 11 commits
    • Peter Eisentraut's avatar
      Remove obsolete netbsd dynloader code · b68ff3ea
      Peter Eisentraut authored
      dlopen() has been documented since NetBSD 1.1 (1995).
      b68ff3ea
    • Peter Eisentraut's avatar
      Remove obsolete openbsd dynloader code · 29351a06
      Peter Eisentraut authored
      dlopen() has been documented since OpenBSD 2.0 (1996).
      29351a06
    • Peter Eisentraut's avatar
      Remove obsolete freebsd dynloader code · 2db1905f
      Peter Eisentraut authored
      dlopen() has been documented since FreeBSD 3.0 (1989).
      2db1905f
    • Peter Eisentraut's avatar
      Remove obsolete linux dynloader code · 351855fc
      Peter Eisentraut authored
      This has been obsolete probably since the late 1990s.
      351855fc
    • Peter Eisentraut's avatar
      Remove obsolete darwin dynloader code · b5d29299
      Peter Eisentraut authored
      not needed since macOS 10.3 (2003)
      b5d29299
    • Peter Eisentraut's avatar
      Remove obsolete comment · 3ebdd21b
      Peter Eisentraut authored
      The sequence name is no longer stored in the sequence relation, since
      1753b1b0.
      3ebdd21b
    • Tom Lane's avatar
      Fix libpq's implementation of per-host connection timeouts. · 1e6e98f7
      Tom Lane authored
      Commit 5f374fe7 attempted to turn the connect_timeout from an overall
      maximum time limit into a per-host limit, but it didn't do a great job of
      that.  The timer would only get restarted if we actually detected timeout
      within connectDBComplete(), not if we changed our attention to a new host
      for some other reason.  In that case the old timeout continued to run,
      possibly causing a premature timeout failure for the new host.
      
      Fix that, and also tweak the logic so that if we do get a timeout,
      we advance to the next available IP address, not to the next host name.
      There doesn't seem to be a good reason to assume that all the IP
      addresses supplied for a given host name will necessarily fail the
      same way as the current one.  Moreover, this conforms better to the
      admittedly-vague documentation statement that the timeout is "per
      connection attempt".  I changed that to "per host name or IP address"
      to be clearer.  (Note that reconnections to the same server, such as for
      switching protocol version or SSL status, don't get their own separate
      timeout; that was true before and remains so.)
      
      Also clarify documentation about the interpretation of connect_timeout
      values less than 2.
      
      This seems like a bug, so back-patch to v10 where this logic came in.
      
      Tom Lane, reviewed by Fabien Coelho
      
      Discussion: https://postgr.es/m/5735.1533828184@sss.pgh.pa.us
      1e6e98f7
    • Michael Paquier's avatar
      Make autovacuum more aggressive to remove orphaned temp tables · 246a6c8f
      Michael Paquier authored
      Commit dafa0848, added in 10, made the removal of temporary orphaned
      tables more aggressive.  This commit makes an extra step into the
      aggressiveness by adding a flag in each backend's MyProc which tracks
      down any temporary namespace currently in use.  The flag is set when the
      namespace gets created and can be reset if the temporary namespace has
      been created in a transaction or sub-transaction which is aborted.  The
      flag value assignment is assumed to be atomic, so this can be done in a
      lock-less fashion like other flags already present in PGPROC like
      databaseId or backendId, still the fact that the temporary namespace and
      table created are still locked until the transaction creating those
      commits acts as a barrier for other backends.
      
      This new flag gets used by autovacuum to discard more aggressively
      orphaned tables by additionally checking for the database a backend is
      connected to as well as its temporary namespace in-use, removing
      orphaned temporary relations even if a backend reuses the same slot as
      one which created temporary relations in a past session.
      
      The base idea of this patch comes from Robert Haas, has been written in
      its first version by Tsunakawa Takayuki, then heavily reviewed by me.
      
      Author: Tsunakawa Takayuki
      Reviewed-by: Michael Paquier, Kyotaro Horiguchi, Andres Freund
      Discussion: https://postgr.es/m/0A3221C70F24FB45833433255569204D1F8A4DC6@G01JPEXMBYT05
      Backpatch: 11-, as PGPROC gains a new flag and we don't want silent ABI
      breakages on already released versions.
      246a6c8f
    • Amit Kapila's avatar
      Adjust comment atop ExecShutdownNode. · 4f9a97e4
      Amit Kapila authored
      After commits a315b967 and b805b63ac2, part of the comment atop
      ExecShutdownNode is redundant.  Adjust it.
      
      Author: Amit Kapila
      Backpatch-through: 10 where both the mentioned commits are present.
      Discussion: https://postgr.es/m/86137f17-1dfb-42f9-7421-82fd786b04a1@anayrat.info
      4f9a97e4
    • Amit Kapila's avatar
      Prohibit shutting down resources if there is a possibility of back up. · 2cd0acfd
      Amit Kapila authored
      Currently, we release the asynchronous resources as soon as it is evident
      that no more rows will be needed e.g. when a Limit is filled.  This can be
      problematic especially for custom and foreign scans where we can scan
      backward.  Fix that by disallowing the shutting down of resources in such
      cases.
      
      Reported-by: Robert Haas
      Analysed-by: Robert Haas and Amit Kapila
      Author: Amit Kapila
      Reviewed-by: Robert Haas
      Backpatch-through: 9.6 where this code was introduced
      Discussion: https://postgr.es/m/86137f17-1dfb-42f9-7421-82fd786b04a1@anayrat.info
      2cd0acfd
    • Andrew Gierth's avatar
      Avoid query-lifetime memory leaks in XMLTABLE (bug #15321) · 07172d5a
      Andrew Gierth authored
      Multiple calls to XMLTABLE in a query (e.g. laterally applying it to a
      table with an xml column, an important use-case) were leaking large
      amounts of memory into the per-query context, blowing up memory usage.
      
      Repair by reorganizing memory context usage in nodeTableFuncscan; use
      the usual per-tuple context for row-by-row evaluations instead of
      perValueCxt, and use the explicitly created context -- renamed from
      perValueCxt to perTableCxt -- for arguments and state for each
      individual table-generation operation.
      
      Backpatch to PG10 where this code was introduced.
      
      Original report by IRC user begriffs; analysis and patch by me.
      Reviewed by Tom Lane and Pavel Stehule.
      
      Discussion: https://postgr.es/m/153394403528.10284.7530399040974170549@wrigleys.postgresql.org
      07172d5a
  5. 12 Aug, 2018 2 commits
    • Tom Lane's avatar
      Revert "Distinguish printf-like functions that support %m from those that don't." · 46b5e7c4
      Tom Lane authored
      This reverts commit 3a60c8ff.  Buildfarm
      results show that that caused a whole bunch of new warnings on platforms
      where gcc believes the local printf to be non-POSIX-compliant.  This
      problem outweighs the hypothetical-anyway possibility of getting warnings
      for misuse of %m.  We could use gnu_printf archetype when we've substituted
      src/port/snprintf.c, but that brings us right back to the problem of not
      getting warnings for %m.
      
      A possible answer is to attack it in the other direction by insisting
      that %m support be included in printf's feature set, but that will take
      more investigation.  In the meantime, revert the previous change, and
      update the comment for PGAC_C_PRINTF_ARCHETYPE to more fully explain
      what's going on.
      
      Discussion: https://postgr.es/m/2975.1526862605@sss.pgh.pa.us
      46b5e7c4
    • Tom Lane's avatar
      Fix bogus loop logic in 013_crash_restart test's pump_until subroutine. · d11eae09
      Tom Lane authored
      The pump_nb() step might've already received the desired data, so we must
      check for that at the top of the loop not the bottom.  Otherwise, the
      call to pump() will sit with nothing to do until the timeout elapses.
      pump_until then falls out with apparent success ... but the timeout has
      been used up, causing the next call of pump_until to report a timeout
      failure.  I believe this explains the intermittent timeout failures
      we've seen in the buildfarm ever since this test went in.  I was able
      to reproduce the problem on gaur semi-repeatably, and this appears to
      fix it.
      
      In passing, remove a duplicate assignment, fix one stdin-assignment to
      look like the rest, and document the test's dependency on test_decoding.
      d11eae09
  6. 11 Aug, 2018 3 commits
    • Tom Lane's avatar
      Fix wrong order of operations in inheritance_planner. · 4a2994f0
      Tom Lane authored
      When considering a partitioning parent rel, we should stop processing that
      subroot as soon as we've done adjust_appendrel_attrs and any securityQuals
      updates.  The rest of this is unnecessary, and indeed adding duplicate
      subquery RTEs to the subroot is *wrong*.  As the code stood, the children
      of that partition ended up with two sets of copied subquery RTEs, confusing
      matters greatly.  Even more hilarity ensued if all of the children got
      excluded by constraint exclusion, so that the extra RTEs didn't make it
      back into the parent rtable.
      
      Per fuzz testing by Andreas Seltenreich.  Back-patch to v11 where this
      got broken (by commit 0a480502, it looks like).
      
      Discussion: https://postgr.es/m/87va8g7vq0.fsf@ansel.ydns.eu
      4a2994f0
    • Tom Lane's avatar
      Produce compiler errors if errno is referenced inside elog/ereport calls. · a2a8acd1
      Tom Lane authored
      It's often unsafe to reference errno within an elog/ereport call, because
      there are a lot of sub-functions involved and they might not all preserve
      errno.  (This is why we support the %m format spec: it works off a value
      of errno captured before we execute any potentially-unsafe functions in
      the arguments.)  Therefore, we have a project policy not to use errno
      there.
      
      This patch adds a hack to cause an (admittedly obscure) compiler error
      for such unsafe usages.  With the current code, the error will only be seen
      on Linux, macOS, and FreeBSD, but that should certainly be enough to catch
      mistakes in the buildfarm if they somehow get missed earlier.
      
      In addition, fix some places in src/common/exec.c that trip the error.
      I think these places are actually all safe, but it's simple enough to
      avoid the error by capturing errno manually, and doing so is good
      future-proofing in case these call sites get any more complicated.
      
      Thomas Munro (exec.c fixes by me)
      
      Discussion: https://postgr.es/m/2975.1526862605@sss.pgh.pa.us
      a2a8acd1
    • Tom Lane's avatar
      Distinguish printf-like functions that support %m from those that don't. · 3a60c8ff
      Tom Lane authored
      The elog/ereport family of functions certainly support the %m format spec,
      because they implement it "by hand".  But elsewhere we have printf wrappers
      that might or might not allow it depending on whether the platform's printf
      does.  (Most non-glibc versions don't, and notably, src/port/snprintf.c
      doesn't.)  Hence, rather than using the gnu_printf format archetype
      interchangeably for all these functions, use it only for elog/ereport.
      This will allow us to get compiler warnings for mistakes like the ones
      fixed in commit a13b47a5, at least on platforms where printf doesn't
      take %m and gcc is correctly configured to know it.  (Unfortunately,
      that won't happen on Linux, nor on macOS according to my testing.
      It remains to be seen what the buildfarm's gcc-on-Windows animals will
      think of this, but we may well have to rely on less-popular platforms
      to warn us about unportable code of this kind.)
      
      Discussion: https://postgr.es/m/2975.1526862605@sss.pgh.pa.us
      3a60c8ff
  7. 10 Aug, 2018 7 commits
  8. 09 Aug, 2018 3 commits