1. 12 Jan, 2021 3 commits
    • Optimize DropRelFileNodeBuffers() for recovery. · d6ad34f3
      Amit Kapila authored
      The recovery path of DropRelFileNodeBuffers() is optimized so that
      scanning the whole buffer pool can be avoided when the number of
      blocks to be truncated in a relation is below a certain threshold. In
      such cases, we find the buffers by doing lookups in the BufMapping
      table. This improves performance by more than 100 times in many cases
      where several small tables (tested with 1000 relations) are truncated
      and the server is configured with a large amount of shared buffers
      (greater than or equal to 100 GB).
      
      This optimization helps in cases (a) where vacuum or autovacuum has
      truncated off any of the empty pages at the end of a relation, or
      (b) where the relation is truncated in the same transaction in which
      it was created.
      
      This commit introduces a new API, smgrnblocks_cached, which returns a
      cached value for the number of blocks in a relation fork. This helps us
      determine the exact size of the relation, which is required to apply
      this optimization: we must not leave behind any buffer for the relation
      being dropped, as otherwise the background writer or checkpointer can
      run into a PANIC error while flushing buffers corresponding to files
      that don't exist.
      
      Author: Kirk Jamison based on ideas by Amit Kapila
      Reviewed-by: Kyotaro Horiguchi, Takayuki Tsunakawa, and Amit Kapila
      Tested-By: Haiying Tang
      Discussion: https://postgr.es/m/OSBPR01MB3207DCA7EC725FDD661B3EDAEF660@OSBPR01MB3207.jpnprd01.prod.outlook.com
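      
      A minimal sketch of the lookup-based strategy described above, with a
      hypothetical threshold constant and helper functions (none of the names
      below are the actual PostgreSQL symbols; cached_nblocks() stands in for
      smgrnblocks_cached()):
      
      /* Illustrative sketch only -- not the real bufmgr.c code. */
      #define DROP_BUFFERS_LOOKUP_THRESHOLD 64    /* hypothetical cutoff */
      
      typedef struct BufferTag { unsigned rel; int fork; unsigned block; } BufferTag;
      
      extern long cached_nblocks(unsigned rel, int fork);   /* -1 if size unknown */
      extern long lookup_buffer_id(BufferTag tag);          /* BufMapping probe; -1 if absent */
      extern void invalidate_buffer(long buf_id);
      extern void scan_whole_buffer_pool_and_drop(unsigned rel);
      
      static void
      drop_rel_buffers(unsigned rel, int fork, unsigned first_drop_block)
      {
          long    nblocks = cached_nblocks(rel, fork);
      
          /* The exact size must be known, or stray buffers could be left behind. */
          if (nblocks < 0 ||
              (unsigned) nblocks - first_drop_block > DROP_BUFFERS_LOOKUP_THRESHOLD)
          {
              scan_whole_buffer_pool_and_drop(rel);   /* fall back to the full scan */
              return;
          }
      
          /* Small truncation: probe the mapping table once per block. */
          for (unsigned blk = first_drop_block; blk < (unsigned) nblocks; blk++)
          {
              BufferTag   tag = {rel, fork, blk};
              long        buf_id = lookup_buffer_id(tag);
      
              if (buf_id >= 0)
                  invalidate_buffer(buf_id);
          }
      }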
    • Dump ALTER TABLE ... ATTACH PARTITION as a separate ArchiveEntry. · 9a4c0e36
      Tom Lane authored
      Previously, we emitted the ATTACH PARTITION command as part of
      the child table's ArchiveEntry.  This was a poor choice since it
      complicates restoring the partition as a standalone table; you have
      to ignore the error from the ATTACH, which isn't even an option when
      restoring direct-to-database with pg_restore.  (pg_restore will issue
      the whole ArchiveEntry as one PQexec, so that any error rolls back
      the table creation as well.)  Hence, separate it out as its own
      ArchiveEntry, as indeed we already did for index ATTACH PARTITION
      commands.
      
      Justin Pryzby
      
      Discussion: https://postgr.es/m/20201023052940.GE9241@telsasoft.com
    • Make pg_dump's table of object-type priorities more maintainable. · d5ab79d8
      Tom Lane authored
      Wedging a new object type into this table has historically required
      manually renumbering a lot of existing entries.  (Although it appears
      that some people got lazy and re-used the priority level of an
      existing object type, even if it wasn't particularly related.)
      We can let the compiler do the counting by inventing an enum type that
      lists the desired priority levels in order.  Now, if you want to add
      or remove a priority level, that's a one-liner.
      
      This patch is not purely cosmetic, because I split apart the priorities
      of DO_COLLATION and DO_TRANSFORM, as well as those of DO_ACCESS_METHOD
      and DO_OPERATOR, which look to me to have been merged out of expediency
      rather than because it was a good idea.  Shell types continue to be
      sorted interchangeably with full types, and opclasses interchangeably
      with opfamilies.
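      
      The technique itself is simple; a hedged illustration with made-up names
      (not the actual pg_dump object types) of how an enum lets the compiler do
      the counting, so that inserting a new level is a one-liner:
      
      /* Illustrative only -- not the real pg_dump priority table. */
      typedef enum ObjectSortPriority
      {
          PRIO_NAMESPACE = 1,     /* explicit start; the rest count up from here */
          PRIO_TYPE,
          PRIO_FUNCTION,
          PRIO_TABLE,             /* adding a level here renumbers everything below */
          PRIO_INDEX,
          PRIO_TRIGGER
      } ObjectSortPriority;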
  2. 11 Jan, 2021 8 commits
    • Fix function prototypes in dependency.h. · f315205f
      Thomas Munro authored
      Commit 257836a7 accidentally deleted a couple of
      redundant-but-conventional "extern" keywords on function prototypes.
      Put them back.
      Reported-by: Alvaro Herrera <alvherre@alvh.no-ip.org>
    • Rethink SQLSTATE code for ERRCODE_IDLE_SESSION_TIMEOUT. · 4edf9684
      Tom Lane authored
      Move it to class 57 (Operator Intervention), which seems like a
      better choice given that from the client's standpoint it behaves
      a heck of a lot like, e.g., ERRCODE_ADMIN_SHUTDOWN.
      
      In a green field I'd put ERRCODE_IDLE_IN_TRANSACTION_SESSION_TIMEOUT
      here as well.  But that's been around for a few years, so it's
      probably too late to change its SQLSTATE code.
      
      Discussion: https://postgr.es/m/763A0689-F189-459E-946F-F0EC4458980B@hotmail.com
    • Try next host after a "cannot connect now" failure. · c1d58957
      Tom Lane authored
      If a server returns ERRCODE_CANNOT_CONNECT_NOW, try the next host,
      if multiple host names have been provided.  This allows dealing
      gracefully with standby servers that might not be in hot standby mode
      yet.
      
      In the wake of the preceding commit, it might be plausible to retry
      many more error cases than we do now, but I (tgl) am hesitant to
      move too aggressively on that --- it's not clear it'd be desirable
      for cases such as bad-password, for example.  But this case seems
      safe enough.
      
      Hubert Zhang, reviewed by Takayuki Tsunakawa
      
      Discussion: https://postgr.es/m/BN6PR05MB3492948E4FD76C156E747E8BC9160@BN6PR05MB3492.namprd05.prod.outlook.com
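      
      For illustration, a client relying on this behavior might simply list
      several hosts in the connection string (the host names below are
      placeholders); libpq now moves on to the next host when one of them
      reports ERRCODE_CANNOT_CONNECT_NOW:
      
      #include <stdio.h>
      #include <stdlib.h>
      #include <libpq-fe.h>
      
      int
      main(void)
      {
          /* Tries "standby1" first; if it is not accepting connections yet,
           * libpq now falls through to "standby2" instead of failing. */
          PGconn *conn = PQconnectdb("host=standby1,standby2 port=5432 "
                                     "dbname=postgres connect_timeout=5");
      
          if (PQstatus(conn) != CONNECTION_OK)
          {
              fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
              PQfinish(conn);
              return EXIT_FAILURE;
          }
          printf("connected to host %s, port %s\n", PQhost(conn), PQport(conn));
          PQfinish(conn);
          return EXIT_SUCCESS;
      }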
    • Uniformly identify the target host in libpq connection failure reports. · 52a10224
      Tom Lane authored
      Prefix "could not connect to host-or-socket-path:" to all connection
      failure cases that occur after the socket() call, and remove the
      ad-hoc server identity data that was appended to a few of these
      messages.  This should produce much more intelligible error reports
      in multiple-target-host situations, especially for error cases that
      are off the beaten track to any degree (because none of those provided
      any server identity info).
      
      As an example of the change, formerly a connection attempt with a bad
      port number such as "psql -p 12345 -h localhost,/tmp" might produce
      
      psql: error: could not connect to server: Connection refused
              Is the server running on host "localhost" (::1) and accepting
              TCP/IP connections on port 12345?
      could not connect to server: Connection refused
              Is the server running on host "localhost" (127.0.0.1) and accepting
              TCP/IP connections on port 12345?
      could not connect to server: No such file or directory
              Is the server running locally and accepting
              connections on Unix domain socket "/tmp/.s.PGSQL.12345"?
      
      Now it looks like
      
      psql: error: could not connect to host "localhost" (::1), port 12345: Connection refused
              Is the server running on that host and accepting TCP/IP connections?
      could not connect to host "localhost" (127.0.0.1), port 12345: Connection refused
              Is the server running on that host and accepting TCP/IP connections?
      could not connect to socket "/tmp/.s.PGSQL.12345": No such file or directory
              Is the server running locally and accepting connections on that socket?
      
      This requires adjusting a couple of regression tests to allow for
      variation in the contents of a connection failure message.
      
      Discussion: https://postgr.es/m/BN6PR05MB3492948E4FD76C156E747E8BC9160@BN6PR05MB3492.namprd05.prod.outlook.com
    • Allow pg_regress.c wrappers to postprocess test result files. · 800d93f3
      Tom Lane authored
      Add an optional callback to regression_main() that, if provided,
      is invoked on each test output file before we try to compare it
      to the expected-result file.
      
      The main and isolation test programs don't need this (yet).
      In pg_regress_ecpg, add a filter that eliminates target-host
      details from "could not connect" error reports.  This filter
      doesn't do anything as of this commit, but it will be needed
      by the next one.
      
      In the long run we might want to provide some more general,
      perhaps pattern-based, filtering mechanism for test output.
      For now, this will solve the immediate problem.
      
      Discussion: https://postgr.es/m/BN6PR05MB3492948E4FD76C156E747E8BC9160@BN6PR05MB3492.namprd05.prod.outlook.com
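      
      As a rough sketch of the kind of per-file filter such a wrapper-supplied
      callback might apply -- assuming, purely for illustration, that the
      callback receives the path of a result file and may rewrite it in place:
      
      #include <stdio.h>
      #include <string.h>
      
      /* Hypothetical postprocessing callback: blank out target-host details
       * in "could not connect" lines so the output compares stably. */
      static void
      filter_connect_errors(const char *resultfile)
      {
          char    tmpname[1024];
          char    line[4096];
          FILE   *in;
          FILE   *out;
      
          snprintf(tmpname, sizeof(tmpname), "%s.tmp", resultfile);
          in = fopen(resultfile, "r");
          out = fopen(tmpname, "w");
          if (in == NULL || out == NULL)
          {
              if (in)
                  fclose(in);
              if (out)
                  fclose(out);
              return;             /* leave the file untouched on error */
          }
      
          while (fgets(line, sizeof(line), in) != NULL)
          {
              if (strstr(line, "could not connect to") != NULL)
                  fputs("could not connect to <target details removed>\n", out);
              else
                  fputs(line, out);
          }
          fclose(in);
          fclose(out);
          rename(tmpname, resultfile);
      }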
    • In libpq, always append new error messages to conn->errorMessage. · ffa2e467
      Tom Lane authored
      Previously, we had an undisciplined mish-mash of printfPQExpBuffer and
      appendPQExpBuffer calls to report errors within libpq.  This commit
      establishes a uniform rule that appendPQExpBuffer[Str] should be used.
      conn->errorMessage is reset only at the start of an application request,
      and then accumulates messages till we're done.  We can remove no less
      than three different ad-hoc mechanisms that were used to get the effect
      of concatenation of error messages within a sequence of operations.
      
      Although this makes things quite a bit cleaner conceptually, the main
      reason to do it is to make the world safer for the multiple-target-host
      feature that was added awhile back.  Previously, there were many cases
      in which an error occurring during an individual host connection attempt
      would wipe out the record of what had happened during previous attempts.
      (The reporting is still inadequate, in that it can be hard to tell which
      host got the failure, but that seems like a matter for a separate commit.)
      
      Currently, lo_import and lo_export contain exceptions to the "never
      use printfPQExpBuffer" rule.  If we changed them, we'd risk reporting
      an incidental lo_close failure before the actual read or write
      failure, which would be confusing, not least because lo_close happened
      after the main failure.  We could improve this by inventing an
      internal version of lo_close that doesn't reset the errorMessage; but
      we'd also need a version of PQfn() that does that, and it didn't quite
      seem worth the trouble for now.
      
      Discussion: https://postgr.es/m/BN6PR05MB3492948E4FD76C156E747E8BC9160@BN6PR05MB3492.namprd05.prod.outlook.com
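      
      A sketch of the convention, using libpq's internal PQExpBuffer helpers
      (the surrounding function names are invented for illustration):
      
      #include "pqexpbuffer.h"
      
      /* Reset once per application-level request... */
      static void
      start_request(PQExpBuffer errorMessage)
      {
          resetPQExpBuffer(errorMessage);
      }
      
      /* ...then only ever append, so messages from earlier connection
       * attempts survive instead of being overwritten. */
      static void
      note_attempt_failure(PQExpBuffer errorMessage,
                           const char *host, const char *why)
      {
          appendPQExpBuffer(errorMessage,
                            "could not connect to host \"%s\": %s\n",
                            host, why);
      }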
    • Use vectored I/O to fill new WAL segments. · ce6a71fa
      Thomas Munro authored
      Instead of making many block-sized write() calls to fill a new WAL file
      with zeroes, make a smaller number of pwritev() calls (or various
      emulations).  The actual number depends on the OS's IOV_MAX, which
      PG_IOV_MAX currently caps at 32.  That means we'll write 256kB per call
      on typical systems.  We may want to tune the number later with more
      experience.
      Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
      Reviewed-by: Andres Freund <andres@anarazel.de>
      Discussion: https://postgr.es/m/CA%2BhUKGJA%2Bu-220VONeoREBXJ9P3S94Y7J%2BkqCnTYmahvZJwM%3Dg%40mail.gmail.com
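      
      A hedged sketch of the zero-filling pattern using plain POSIX pwritev()
      (constants and the helper name are illustrative, not the actual xlog.c
      code; it assumes the total size is a multiple of the block size, as WAL
      segments are):
      
      #include <string.h>
      #include <sys/uio.h>
      #include <unistd.h>
      
      #define BLOCK_SIZE  8192    /* cf. XLOG_BLCKSZ */
      #define IOV_BATCH   32      /* cf. PG_IOV_MAX */
      
      /* Write 'total' zero bytes with as few pwritev() calls as possible:
       * every iovec points at the same zeroed block, so each call can cover
       * up to IOV_BATCH blocks (256kB with the values above). */
      static int
      zero_fill(int fd, size_t total)
      {
          static const char zeroes[BLOCK_SIZE];   /* zero-initialized */
          struct iovec iov[IOV_BATCH];
          off_t   offset = 0;
      
          for (int i = 0; i < IOV_BATCH; i++)
          {
              iov[i].iov_base = (void *) zeroes;
              iov[i].iov_len = BLOCK_SIZE;
          }
      
          while ((size_t) offset < total)
          {
              int     niov = (int) ((total - offset) / BLOCK_SIZE);
      
              if (niov > IOV_BATCH)
                  niov = IOV_BATCH;
              /* For brevity this ignores short writes; see the retry wrapper
               * sketched under the next commit. */
              if (pwritev(fd, iov, niov, offset) < 0)
                  return -1;
              offset += (off_t) niov * BLOCK_SIZE;
          }
          return 0;
      }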
    • Provide pg_preadv() and pg_pwritev(). · 13a021f3
      Thomas Munro authored
      Provide synchronous vectored file I/O routines.  These map to preadv()
      and pwritev(), with fallback implementations for systems that don't have
      them.  Also provide a wrapper pg_pwritev_with_retry() that automatically
      retries on short writes.
      Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
      Reviewed-by: Andres Freund <andres@anarazel.de>
      Discussion: https://postgr.es/m/CA%2BhUKGJA%2Bu-220VONeoREBXJ9P3S94Y7J%2BkqCnTYmahvZJwM%3Dg%40mail.gmail.com
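      
      The retry-on-short-write idea can be sketched as follows (simplified,
      and, unlike the real pg_pwritev_with_retry(), it scribbles on the
      caller's iovec array):
      
      #include <errno.h>
      #include <sys/uio.h>
      #include <unistd.h>
      
      /* Keep calling pwritev() until everything is written, stepping past
       * any iovecs (or partial iovec) that have already completed. */
      static ssize_t
      pwritev_retry(int fd, struct iovec *iov, int iovcnt, off_t offset)
      {
          ssize_t total = 0;
      
          for (;;)
          {
              ssize_t written = pwritev(fd, iov, iovcnt, offset);
      
              if (written < 0)
              {
                  if (errno == EINTR)
                      continue;
                  return -1;
              }
              total += written;
              offset += written;
      
              /* Drop iovecs that were written in full. */
              while (iovcnt > 0 && (size_t) written >= iov->iov_len)
              {
                  written -= (ssize_t) iov->iov_len;
                  iov++;
                  iovcnt--;
              }
              if (iovcnt == 0)
                  return total;           /* all done */
      
              /* Partially written iovec: advance within it and retry. */
              iov->iov_base = (char *) iov->iov_base + written;
              iov->iov_len -= (size_t) written;
          }
      }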
  3. 09 Jan, 2021 2 commits
  4. 08 Jan, 2021 4 commits
    • Fix plpgsql tests for debug_invalidate_system_caches_always. · 39d4a153
      Tom Lane authored
      Commit c9d52984 resulted in having a couple more places where
      the error context stack for a failure varies depending on
      debug_invalidate_system_caches_always (nee CLOBBER_CACHE_ALWAYS).
      This is not very surprising, since we have to re-parse cached
      plans if the plan cache is clobbered.  Stabilize the expected
      test output by hiding the context stack in these places,
      as we've done elsewhere in this test script.
      
      (Another idea worth considering, now that we have
      debug_invalidate_system_caches_always, is to force it to zero for
      these test cases.  That seems like it'd risk reducing the coverage
      of cache-clobber testing, which might or might not be a worthwhile
      trade for being able to verify that we get the expected error output
      in normal cases.  For the moment I just stuck with the existing
      technique.)
      
      In passing, update comments that referred to CLOBBER_CACHE_ALWAYS.
      
      Per buildfarm member hyrax.
    • Fix ancient bug in parsing of BRE-mode regular expressions. · afcc8772
      Tom Lane authored
      brenext(), when parsing a '*' quantifier, forgot to return any "value"
      for the token; per the equivalent case in next(), it should return
      value 1 to indicate that greedy rather than non-greedy behavior is
      wanted.  The result is that the compiled regexp could behave like 'x*?'
      rather than the intended 'x*', if we were unlucky enough to have
      a zero in v->nextvalue at this point.  That seems to happen with some
      reliability if we have '.*' at the beginning of a BRE-mode regexp,
      although that depends on the initial contents of a stack-allocated
      struct, so it's not guaranteed to fail.
      
      Found by Alexander Lakhin using valgrind testing.  This bug seems
      to be aboriginal in Spencer's code, so back-patch all the way.
      
      Discussion: https://postgr.es/m/16814-6c5e3edd2bdf0d50@postgresql.org
    • Fix and simplify some code related to cryptohashes · 15b824da
      Michael Paquier authored
      This commit addresses two issues:
      - In pgcrypto, MD5 computation called pg_cryptohash_{init,update,final}
      without checking the result status.
      - Simplify pg_checksum_raw_context to use only one variable for all the
      SHA2 options available in checksum manifests.
      
      Reported-by: Heikki Linnakangas
      Discussion: https://postgr.es/m/f62f26bb-47a5-8411-46e5-4350823e06a5@iki.fi
    • Adjust createdb TAP tests to work on recent OpenBSD. · 9ffe2278
      Tom Lane authored
      We found last February that the error-case tests added by commit
      008cf040 failed on OpenBSD, because that platform doesn't really
      check locale names.  At the time it seemed that that was only an issue
      for LC_CTYPE, but testing on a more recent version of OpenBSD shows
      that it's now equally lax about LC_COLLATE.
      
      Rather than dropping the LC_COLLATE test too, put back LC_CTYPE
      (reverting c4b0edb0), and adjust these tests to accept the different
      error message that we get if setlocale() doesn't reject a bogus locale
      name.  The point of these tests is not really what the backend does
      with the locale name, but to show that createdb quotes funny locale
      names safely; so we're not losing test reliability this way.
      
      Back-patch as appropriate.
      
      Discussion: https://postgr.es/m/231373.1610058324@sss.pgh.pa.us
  5. 07 Jan, 2021 6 commits
  6. 06 Jan, 2021 10 commits
    • Add idle_session_timeout. · 9877374b
      Tom Lane authored
      This GUC variable works much like idle_in_transaction_session_timeout,
      in that it kills sessions that have waited too long for a new client
      query.  But it applies when we're not in a transaction, rather than
      when we are.
      
      Li Japin, reviewed by David Johnston and Hayato Kuroda, some
      fixes by me
      
      Discussion: https://postgr.es/m/763A0689-F189-459E-946F-F0EC4458980B@hotmail.com
    • Improve timeout.c's handling of repeated timeout set/cancel. · 09cf1d52
      Tom Lane authored
      A very common usage pattern is that we set a timeout that we don't
      expect to reach, cancel it after a little bit, and later repeat.
      With the original implementation of timeout.c, this results in one
      setitimer() call per timeout set or cancel.  We can do a lot better
      by being lazy about changing the timeout interrupt request, namely:
      (1) never cancel the outstanding interrupt, even when we have no
      active timeout events;
      (2) if we need to set an interrupt, but there already is one pending
      at or before the required time, leave it alone.  When the interrupt
      happens, the signal handler will reschedule it at whatever time is
      then needed.
      
      For example, with a one-second setting for statement_timeout, this
      method results in having to interact with the kernel only a little
      more than once a second, no matter how many statements we execute
      in between.  The mainline code might never call setitimer() at all
      after the first time, while each time the signal handler fires,
      it sees that the then-pending request is most of a second away,
      and that is the time for which it schedules the next interrupt
      request.  Each mainline timeout-set request after that will observe
      that the time it wants is past the pending interrupt request time,
      and do nothing.
      
      This also works pretty well for cases where a few different timeout
      lengths are in use, as long as none of them are very short.  But
      that describes our usage well.
      
      Idea and original patch by Thomas Munro; I fixed a race condition
      and improved the comments.
      
      Discussion: https://postgr.es/m/CA+hUKG+o6pbuHBJSGnud=TadsuXySWA7CCcPgCt2QE9F6_4iHQ@mail.gmail.com
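      
      The scheduling rule boils down to a few lines; an illustrative sketch
      follows (not the actual timeout.c code, and the helper state here is
      invented):
      
      #include <string.h>
      #include <stdbool.h>
      #include <sys/time.h>
      
      static bool             signal_pending = false;   /* an interrupt is outstanding */
      static struct timeval   signal_due_at;            /* when it will fire */
      
      static bool
      timeval_le(struct timeval a, struct timeval b)
      {
          return a.tv_sec < b.tv_sec ||
                 (a.tv_sec == b.tv_sec && a.tv_usec <= b.tv_usec);
      }
      
      /* Ask for a wakeup at 'needed_at'.  If an interrupt is already pending
       * at or before that time, do nothing; the signal handler reschedules
       * for whatever is needed once it fires.  Cancels never call setitimer()
       * at all -- an unneeded interrupt is simply ignored when it arrives. */
      static void
      schedule_wakeup(struct timeval needed_at)
      {
          struct timeval   now;
          struct itimerval it;
      
          if (signal_pending && timeval_le(signal_due_at, needed_at))
              return;
      
          gettimeofday(&now, NULL);
          memset(&it, 0, sizeof(it));
          timersub(&needed_at, &now, &it.it_value);   /* delay until needed_at */
          setitimer(ITIMER_REAL, &it, NULL);
          signal_pending = true;
          signal_due_at = needed_at;
      }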
    • Report progress of COPY commands · 8a4f618e
      Tomas Vondra authored
      This commit introduces a view, pg_stat_progress_copy, reporting the
      progress of COPY commands.  This allows rough estimates of how far a
      running COPY command has progressed, with the caveat that the total
      number of bytes may not be available in some cases (e.g., when the
      input comes from the client).
      
      Author: Josef Šimánek
      Reviewed-by: Fujii Masao, Bharath Rupireddy, Vignesh C, Matthias van de Meent
      Discussion: https://postgr.es/m/CAFp7QwqMGEi4OyyaLEK9DR0+E+oK3UtA4bEjDVCa4bNkwUY2PQ@mail.gmail.com
      Discussion: https://postgr.es/m/CAFp7Qwr6_FmRM6pCO0x_a0mymOfX_Gg+FEKet4XaTGSW=LitKQ@mail.gmail.com
    • Add a test module for the regular expression package. · ca8217c1
      Tom Lane authored
      This module provides a function test_regex() that is functionally
      rather like regexp_matches(), but with additional debugging-oriented
      options and additional output.  The debug options are somewhat obscure;
      they are chosen to match the API of the test harness that Henry Spencer
      wrote way-back-when for use in Tcl.  With this, we can import all the
      test cases that Spencer wrote originally, even for regex functionality
      that we don't currently expose in Postgres.  This seems necessary
      because we can no longer rely on Tcl to act as upstream and verify
      any fixes or improvements that we make.
      
      In addition to Spencer's tests, I added a few for lookbehind
      constraints (which we added in 2015, and Tcl still hasn't absorbed)
      that are modeled on his tests for lookahead constraints.  After looking
      at code coverage reports, I also threw in a couple of tests to more
      fully exercise our "high colormap" logic.
      
      According to my testing, this brings the check-world coverage
      for src/backend/regex/ from 71.1% to 86.7% of lines.
      (coverage.postgresql.org shows a slightly different number,
      which I think is because it measures a non-assert build.)
      
      Discussion: https://postgr.es/m/2873268.1609732164@sss.pgh.pa.us
    • Replace CLOBBER_CACHE_ALWAYS with run-time GUC · 4656e3d6
      Peter Eisentraut authored
      Forced cache invalidation (CLOBBER_CACHE_ALWAYS) has been impractical
      to use for testing in PostgreSQL because it's so slow and because it's
      toggled on/off only at build time.  It is helpful when hunting bugs in
      any code that uses the sycache/relcache because causes cache
      invalidations to be injected whenever it would be possible for an
      invalidation to occur, whether or not one was really pending.
      
      Address this by providing run-time control over cache clobber
      behaviour using the new debug_invalidate_system_caches_always GUC.
      Support is not compiled in at all unless assertions are enabled or
      CLOBBER_CACHE_ENABLED is explicitly defined at compile time.  It
      defaults to 0 if compiled in, so it has negligible effect on assert
      build performance by default.
      
      When support is compiled in, test code can now set
      debug_invalidate_system_caches_always=1 locally to a backend to test
      specific queries, functions, extensions, etc.  Or tests can toggle it
      globally for a specific test case while retaining normal performance
      during test setup and teardown.
      
      For backwards compatibility with existing test harnesses and scripts,
      debug_invalidate_system_caches_always defaults to 1 if
      CLOBBER_CACHE_ALWAYS is defined, and to 3 if CLOBBER_CACHE_RECURSIVE
      is defined.
      
      CLOBBER_CACHE_ENABLED is now visible in pg_config_manual.h, as is the
      related RECOVER_RELATION_BUILD_MEMORY setting for the relcache.
      
      Author: Craig Ringer <craig.ringer@2ndquadrant.com>
      Discussion: https://www.postgresql.org/message-id/flat/CAMsr+YF=+ctXBZj3ywmvKNUjWpxmuTuUKuv-rgbHGX5i5pLstQ@mail.gmail.com
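      
      The backwards-compatibility mapping could be expressed at compile time
      roughly as follows (an illustrative sketch; the macro spelling of the
      default is invented, and only CLOBBER_CACHE_ALWAYS,
      CLOBBER_CACHE_RECURSIVE, and CLOBBER_CACHE_ENABLED come from the commit
      message):
      
      /* Compile the feature in for assert builds or when explicitly
       * requested, including via the legacy macros. */
      #if defined(USE_ASSERT_CHECKING) || \
          defined(CLOBBER_CACHE_ALWAYS) || defined(CLOBBER_CACHE_RECURSIVE)
      #define CLOBBER_CACHE_ENABLED
      #endif
      
      /* Map the legacy macros onto the GUC's default value. */
      #if defined(CLOBBER_CACHE_RECURSIVE)
      #define DEBUG_INVALIDATE_CACHES_DEFAULT 3   /* recursive clobbering */
      #elif defined(CLOBBER_CACHE_ALWAYS)
      #define DEBUG_INVALIDATE_CACHES_DEFAULT 1   /* clobber at every opportunity */
      #else
      #define DEBUG_INVALIDATE_CACHES_DEFAULT 0   /* compiled in but inactive */
      #endif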
    • Detect the deadlocks between backends and the startup process. · 8900b5a9
      Fujii Masao authored
      Deadlocks involving the recovery conflict on lock can happen between
      hot-standby backends and the startup process.  If a backend takes an
      access exclusive lock on a table and that finally triggers a deadlock,
      the deadlock can be detected as expected.  On the other hand,
      previously, if the startup process took an access exclusive lock and
      that finally triggered a deadlock, the deadlock could not be detected
      and could remain even after deadlock_timeout passed.  This is a bug.
      
      The cause of this bug was that the code for handling the recovery
      conflict on lock didn't take care of the deadlock case at all.  It
      assumed that deadlocks involving the startup process and backends
      could be detected by the deadlock detector invoked within the backends.
      But this assumption was incorrect.  The startup process also should
      have invoked the deadlock detector if necessary.
      
      To fix this bug, this commit makes the startup process invoke
      the deadlock detector if deadlock_timeout is reached while handling
      the recovery conflict on lock. Specifically, in that case, the startup
      process requests all the backends holding the conflicting locks to
      check themselves for deadlocks.
      
      Back-patch to v9.6.  v9.5 also has this bug, but per discussion we
      decided not to back-patch the fix to v9.5, because v9.5 lacks some
      infrastructure code (e.g., 37c54863) that this bug fix depends on.
      We could back-patch that code as well, but since the next minor version
      release is the final one for v9.5, it's risky to do that.  If the
      back-patch unexpectedly introduced a new bug into v9.5, there would be
      no chance to fix it.  We determined that the back-patch to v9.5 would
      carry more risk than gain.
      
      Author: Fujii Masao
      Reviewed-by: Bertrand Drouvot, Masahiko Sawada, Kyotaro Horiguchi
      Discussion: https://postgr.es/m/4041d6b6-cf24-a120-36fa-1294220f8243@oss.nttdata.com
    • doc: Fix description about default behavior of recovery_target_timeline. · 25dde583
      Fujii Masao authored
      The default value of recovery_target_timeline was changed in v12,
      but the description of its default behavior was not updated.
      
      Back-patch to v12 where the default behavior of recovery_target_timeline
      was changed.
      
      Author: Benoit Lobréau
      Reviewed-by: Fujii Masao
      Discussion: https://postgr.es/m/CAPE8EZ7c3aruEmM24GYkj8y8WmHKD1m9TtPtgCF0nQ3zw4LCkQ@mail.gmail.com
    • Promote --data-checksums to the common set of options in initdb --help · bc08f797
      Michael Paquier authored
      This was previously part of the section dedicated to less common
      options, but it is an option commonly used these days.
      
      Author: Michael Banck
      Reviewed-by: Stephen Frost, Michael Paquier
      Discussion: https://postgr.es/m/d7938aca4d4ea8e8c72c33bd75efe9f8218fe390.camel@credativ.de
    • Revert unstable test cases from commit 7d80441d. · 14d49f48
      Tom Lane authored
      I momentarily forgot that the "owner" column wouldn't be stable
      in the buildfarm.  Oh well, these tests weren't very valuable
      anyway.
      
      Discussion: https://postgr.es/m/20201130165436.GX24052@telsasoft.com
  7. 05 Jan, 2021 7 commits