1. 05 Apr, 2021 3 commits
  2. 04 Apr, 2021 8 commits
    • Tom Lane's avatar
      Fix more confusion in SP-GiST. · dfc843d4
      Tom Lane authored
      spg_box_quad_leaf_consistent unconditionally returned the leaf
      datum as leafValue, even though in its usage for poly_ops that
      value is of completely the wrong type.
      
      In versions before 12, that was harmless because the core code did
      nothing with leafValue in non-index-only scans ... but since commit
      2a636834, if we were doing a KNN-style scan, spgNewHeapItem would
      unconditionally try to copy the value using the wrong datatype
      parameters.  Said copying is a waste of time and space if we're not
      going to return the data, but it accidentally failed to fail until
      I fixed the datatype confusion in ac9099fc.
      
      Hence, change spgNewHeapItem to not copy the datum unless we're
      actually going to return it later.  This saves cycles and dodges
      the question of whether lossy opclasses are returning the right
      type.  Also change spg_box_quad_leaf_consistent to not return
      data that might be of the wrong type, as insurance against
      somebody introducing a similar bug into the core code in future.
      
      It seems like a good idea to back-patch these two changes into
      v12 and v13, although I'm afraid to change spgNewHeapItem's
      mistaken idea of which datatype to use in those branches.
      
      Per buildfarm results from ac9099fc.
      
      Discussion: https://postgr.es/m/3728741.1617381471@sss.pgh.pa.us
      dfc843d4
    • Tom Lane's avatar
      Fix confusion in SP-GiST between attribute type and leaf storage type. · ac9099fc
      Tom Lane authored
      According to the documentation, the attType passed to the opclass
      config function (and also relied on by the core code) is the type
      of the heap column or expression being indexed.  But what was
      actually being passed was the type stored for the index column.
      This made no difference for user-defined SP-GiST opclasses,
      because we weren't allowing the STORAGE clause of CREATE OPCLASS
      to be used, so the two types would be the same.  But it's silly
      not to allow that, seeing that the built-in poly_ops opclass
      has a different value for opckeytype than opcintype, and that if you
      want to do lossy storage then the types must really be different.
      (Thus, user-defined opclasses doing lossy storage had to lie about
      what type is in the index.)  Hence, remove the restriction, and make
      sure that we use the input column type not opckeytype where relevant.
      
      For reasons of backwards compatibility with existing user-defined
      opclasses, we can't quite insist that the specified leafType match
      the STORAGE clause; instead just add an amvalidate() warning if
      they don't match.
      
      Also fix some bugs that would only manifest when trying to return
      index entries when attType is different from attLeafType.  It's not
      too surprising that these have not been reported, because the only
      usual reason for such a difference is to store the leaf value
      lossily, rendering index-only scans impossible.
      
      Add a src/test/modules module to exercise cases where attType is
      different from attLeafType and yet index-only scan is supported.
      
      Discussion: https://postgr.es/m/3728741.1617381471@sss.pgh.pa.us
      ac9099fc
    • Tomas Vondra's avatar
      Fix bug in brin_minmax_multi_union · d9c5b9a9
      Tomas Vondra authored
      When calling sort_expanded_ranges() we need to remember the return
      value, because the function sorts and also deduplicates the ranges. So
      the number of ranges may decrease. brin_minmax_multi_union failed to do
      that, which resulted in crashes due to bogus ranges (equal minval/maxval
      but not marked as compacted).
      
      Reported-by: Jaime Casanova
      Discussion: https://postgr.es/m/20210404052550.GA4376%40ahch-to
      d9c5b9a9
    • Tomas Vondra's avatar
      Add regression test for minmax-multi macaddr8 type · 4908684d
      Tomas Vondra authored
      The regression test for BRIN minmax-multi opclasses tested almost all
      supported data types, with the exception of macaddr8. So this adds it.
      4908684d
    • Tomas Vondra's avatar
      Fix order of parameters in BRIN minmax-multi calls · 1dad2a5e
      Tomas Vondra authored
      The BRIN minmax-multi consistent function incorrectly assumed it can
      lookup an operator, and then swap the arguments to get the commutator.
      For example <(a,b) would be called as <(b,a) to get >(a,b). This works
      when the arguments are of the same type, but with cross-type opclasses
      this fails. We can't swap <(float4,float8) arguments, for example.
      
      Fixed by passing arguments in the right order.
      
      Discussion: https://postgr.es/m/CAJKUy5jLZFLCxyxfT%3DMfK5mtPfSzHA1rVLowR-j4RRsFVvKm7A%40mail.gmail.com
      1dad2a5e
    • Tomas Vondra's avatar
      Fix BRIN minmax-multi distance for inet type · e1fbe118
      Tomas Vondra authored
      The distance calculation ignored the mask, unlike the inet comparator,
      which resulted in negative distance in some cases. Fixed by applying the
      mask in brin_minmax_multi_distance_inet. I've considered simply calling
      inetmi() to calculate the delta, but that does not consider mask either.
      
      Reviewed-by: Zhihong Yu
      Discussion: https://postgr.es/m/1a0a7b9d-9bda-e3a2-7fa4-88f15042a051%40enterprisedb.com
      e1fbe118
    • Tomas Vondra's avatar
      Fix BRIN minmax-multi distance for timetz type · 7262f242
      Tomas Vondra authored
      The distance calculation ignored the time zone, so the result of (b-a)
      might have ended negative even if (b > a). Fixed by considering the time
      zone difference.
      
      Reported-by: Jaime Casanova
      Discussion: https://postgr.es/m/CAJKUy5jLZFLCxyxfT%3DMfK5mtPfSzHA1rVLowR-j4RRsFVvKm7A%40mail.gmail.com
      7262f242
    • Tomas Vondra's avatar
      Fix BRIN minmax-multi distance for interval type · 2b10e0e3
      Tomas Vondra authored
      The distance calculation for interval type was treating months as having
      31 days, which is inconsistent with the interval comparator (using 30
      days). Due to this it was possible to get negative distance (b-a) when
      (a<b), trigerring an assert.
      
      Fixed by adopting the same logic as interval_cmp_value.
      
      Reported-by: Jaime Casanova
      Discussion: https://postgr.es/m/CAJKUy5jKH0Xhneau2mNftNPtTy-BVgQfXc8zQkEvRvBHfeUThQ%40mail.gmail.com
      2b10e0e3
  3. 03 Apr, 2021 7 commits
    • Tom Lane's avatar
      Improve psql's behavior when the editor is exited without saving. · 55873a00
      Tom Lane authored
      When editing the previous query buffer, if the editor is exited
      without modifying the temp file then clear the query buffer,
      rather than re-loading (and probably re-executing) the previous
      query buffer.  This reduces the probability of accidentally
      re-executing something you didn't intend to.
      
      Similarly, in "\e file", if the file isn't actually modified
      then don't load it into the query buffer.  And in "\ef" and
      "\ev", if no changes are made then clear the query buffer
      instead of loading the function or view definition into it.
      
      Cases where we fail to invoke the editor at all, or it returns
      a nonzero status, are treated like the no-file-modification case.
      
      Laurenz Albe, reviewed by Jacob Champion
      
      Discussion: https://postgr.es/m/0ba3f2a658bac6546d9934ab6ba63a805d46a49b.camel@cybertec.at
      55873a00
    • Andres Freund's avatar
      Improve efficiency of wait event reporting, remove proc.h dependency. · 225a22b1
      Andres Freund authored
      pgstat_report_wait_start() and pgstat_report_wait_end() required two
      conditional branches so far. One to check if MyProc is NULL, the other to
      check if pgstat_track_activities is set. As wait events are used around
      comparatively lightweight operations, and are inlined (reducing branch
      predictor effectiveness), that's not great.
      
      The dependency on MyProc has a second disadvantage: Low-level subsystems, like
      storage/file/fd.c, report wait events, but architecturally it is preferable
      for them to not depend on inter-process subsystems like proc.h (defining
      PGPROC).  After this change including pgstat.h (nor obviously its
      sub-components like backend_status.h, wait_event.h, ...) does not pull in IPC
      related headers anymore.
      
      These goals, efficiency and abstraction, are achieved by having
      pgstat_report_wait_start/end() not interact with MyProc, but instead a new
      my_wait_event_info variable. At backend startup it points to a local variable,
      removing the need to check for MyProc being NULL. During process
      initialization my_wait_event_info is redirected to MyProc->wait_event_info. At
      shutdown this is reversed. Because wait event reporting now does not need to
      know about where the wait event is stored, it does not need to know about
      PGPROC anymore.
      
      The removal of the branch for checking pgstat_track_activities is simpler:
      Don't check anymore. The cost due to the branch are often higher than the
      store - and even if not, pgstat_track_activities is rarely disabled.
      
      The main motivator to commit this work now is that removing the (indirect)
      pgproc.h include from pgstat.h simplifies a patch to move statistics reporting
      to shared memory (which still has a chance to get into 14).
      
      Author: Andres Freund <andres@anarazel.de>
      Discussion: https://postgr.es/m/20210402194458.2vu324hkk2djq6ce@alap3.anarazel.de
      225a22b1
    • Andres Freund's avatar
      Split backend status and progress related functionality out of pgstat.c. · e1025044
      Andres Freund authored
      Backend status (supporting pg_stat_activity) and command
      progress (supporting pg_stat_progress*) related code is largely
      independent from the rest of pgstat.[ch] (supporting views like
      pg_stat_all_tables that accumulate data over time). See also
      a333476b.
      
      This commit doesn't rename the function names to make the distinction
      from the rest of pgstat_ clearer - that'd be more invasive and not
      clearly beneficial. If we were to decide to do such a rename at some
      point, it's better done separately from moving the code as well.
      
      Robert's review was of an earlier version.
      Reviewed-By: default avatarRobert Haas <robertmhaas@gmail.com>
      Discussion: https://postgr.es/m/20210316195440.twxmlov24rr2nxrg@alap3.anarazel.de
      e1025044
    • Michael Paquier's avatar
      Use more verbose matching patterns for errors in SSL TAP tests · 8d3a4c3e
      Michael Paquier authored
      The TAP tests of src/test/ssl/ have been using rather generic matching
      patterns to check some failure scenarios, like "SSL error" or just
      "FATAL".  These have been introduced in 081bfc19.
      
      Those messages are not wrong per se, but when working on the integration
      of new SSL libraries it becomes hard to know if those errors are legit
      or not, and existing scenarios may fail in incorrect ways.  This commit
      makes all those messages more verbose by adding the information
      generated by OpenSSL.  Fortunately, the same error messages are used for
      all the versions supported on HEAD (checked that after running the tests
      from 1.0.1 to 1.1.1), so the change is straight-forward.
      
      Reported-by: Jacob Champion, Álvaro Herrera
      Discussion: https://postgr.es/m/YGU3AxQh0zBMMW8m@paquier.xyz
      8d3a4c3e
    • Michael Paquier's avatar
      Refactor HMAC implementations · e6bdfd97
      Michael Paquier authored
      Similarly to the cryptohash implementations, this refactors the existing
      HMAC code into a single set of APIs that can be plugged with any crypto
      libraries PostgreSQL is built with (only OpenSSL currently).  If there
      is no such libraries, a fallback implementation is available.  Those new
      APIs are designed similarly to the existing cryptohash layer, so there
      is no real new design here, with the same logic around buffer bound
      checks and memory handling.
      
      HMAC has a dependency on cryptohashes, so all the cryptohash types
      supported by cryptohash{_openssl}.c can be used with HMAC.  This
      refactoring is an advantage mainly for SCRAM, that included its own
      implementation of HMAC with SHA256 without relying on the existing
      crypto libraries even if PostgreSQL was built with their support.
      
      This code has been tested on Windows and Linux, with and without
      OpenSSL, across all the versions supported on HEAD from 1.1.1 down to
      1.0.1.  I have also checked that the implementations are working fine
      using some sample results, a custom extension of my own, and doing
      cross-checks across different major versions with SCRAM with the client
      and the backend.
      
      Author: Michael Paquier
      Reviewed-by: Bruce Momjian
      Discussion: https://postgr.es/m/X9m0nkEJEzIPXjeZ@paquier.xyz
      e6bdfd97
    • Andres Freund's avatar
      Do not rely on pgstat.h to indirectly include storage/ headers. · 1d9c5d0c
      Andres Freund authored
      An upcoming patch might remove the (now indirect) proc.h
      include (which in turn includes other headers), and it's cleaner for
      the modified files to include their dependencies directly anyway...
      
      Discussion: https://postgr.es/m/20210402194458.2vu324hkk2djq6ce@alap3.anarazel.de
      1d9c5d0c
    • Andres Freund's avatar
      Split wait event related code from pgstat.[ch] into wait_event.[ch]. · a333476b
      Andres Freund authored
      The wait event related code is independent from the rest of the
      pgstat.[ch] code, of nontrivial size and changes on a regular
      basis. Put it into its own set of files.
      
      As there doesn't seem to be a good pre-existing directory for code
      like this, add src/backend/utils/activity.
      Reviewed-By: default avatarRobert Haas <robertmhaas@gmail.com>
      Discussion: https://postgr.es/m/20210316195440.twxmlov24rr2nxrg@alap3.anarazel.de
      a333476b
  4. 02 Apr, 2021 16 commits
    • David Rowley's avatar
      Remove useless Asserts in Result Cache code · 1267d986
      David Rowley authored
      Testing if an unsigned variable is >= 0 is pretty pointless.
      
      There's likely enough code in remove_cache_entry() to verify the cache
      memory accounting is correct in assert enabled builds. These Asserts
      were not adding much extra cover, even if they had been checking >= 0 on a
      signed variable.
      
      Reported-by: Andres Freund
      Discussion: https://postgr.es/m/20210402204734.6mo3nfacnljlicgn@alap3.anarazel.de
      1267d986
    • Bruce Momjian's avatar
      Use macro MONTHS_PER_YEAR instead of '12' in /ecpg/pgtypeslib · 84bc2b17
      Bruce Momjian authored
      All other places already use MONTHS_PER_YEAR appropriately.
      
      Backpatch-through: 9.6
      84bc2b17
    • Thomas Munro's avatar
      Detect POLLHUP/POLLRDHUP while running queries. · c30f54ad
      Thomas Munro authored
      Provide a new GUC check_client_connection_interval that can be used to
      check whether the client connection has gone away, while running very
      long queries.  It is disabled by default.
      
      For now this uses a non-standard Linux extension (also adopted by at
      least one other OS).  POLLRDHUP is not defined by POSIX, and other OSes
      don't have a reliable way to know if a connection was closed without
      actually trying to read or write.
      
      In future we might consider trying to send a no-op/heartbeat message
      instead, but that could require protocol changes.
      
      Author: Sergey Cherkashin <s.cherkashin@postgrespro.ru>
      Author: Thomas Munro <thomas.munro@gmail.com>
      Reviewed-by: default avatarThomas Munro <thomas.munro@gmail.com>
      Reviewed-by: default avatarTatsuo Ishii <ishii@sraoss.co.jp>
      Reviewed-by: default avatarKonstantin Knizhnik <k.knizhnik@postgrespro.ru>
      Reviewed-by: default avatarZhihong Yu <zyu@yugabyte.com>
      Reviewed-by: default avatarAndres Freund <andres@anarazel.de>
      Reviewed-by: default avatarMaksim Milyutin <milyutinma@gmail.com>
      Reviewed-by: default avatarTsunakawa, Takayuki/綱川 貴之 <tsunakawa.takay@fujitsu.com>
      Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> (much earlier version)
      Discussion: https://postgr.es/m/77def86b27e41f0efcba411460e929ae%40postgrespro.ru
      c30f54ad
    • Joe Conway's avatar
      Clarify documentation of RESET ROLE · 174edbe9
      Joe Conway authored
      Command-line options, or previous "ALTER (ROLE|DATABASE) ...
      SET ROLE ..." commands, can change the value of the default role
      for a session. In the presence of one of these, RESET ROLE will
      change the current user identifier to the default role rather
      than the session user identifier. Fix the documentation to
      reflect this reality. Backpatch to all supported versions.
      
      Author: Nathan Bossart
      Reviewed-By: Laurenz Albe, David G. Johnston, Joe Conway
      Reported by: Nathan Bossart
      Discussion: https://postgr.es/m/flat/925134DB-8212-4F60-8AB1-B1231D750CB4%40amazon.com
      Backpatch-through: 9.6
      174edbe9
    • Fujii Masao's avatar
      pg_checksums: Fix progress reporting. · 2eb1fc8b
      Fujii Masao authored
      pg_checksums uses two counters, total size and current size,
      to calculate the progress. Previously the progress that
      pg_checksums reported could not reach 100% at the end.
      The cause of this issue was that the sizes of only pages excluding
      new ones in each file were counted as the current size
      while the size of each file is counted as the total size.
      That is, the total size of all new pages could be reported
      as the difference between the total size and current size.
      
      This commit fixes this issue by making pg_checksums count
      the sizes of all pages including new ones in each file as
      the current size.
      
      Back-patch to v12 where progress reporting was added to pg_checksums.
      
      Reported-by: Shinya Kato
      Author: Shinya Kato
      Reviewed-by: Fujii Masao
      Discussion: https://postgr.es/m/TYAPR01MB289656B1ACA0A5E7CAD07BE3C47A9@TYAPR01MB2896.jpnprd01.prod.outlook.com
      2eb1fc8b
    • Tom Lane's avatar
      Strip file names reported in error messages on Windows, too. · 53aafdb9
      Tom Lane authored
      Commit dd136052 established a policy that error message FILE items
      should include only the base name of the reporting source file, for
      uniformity and succinctness.  We now observe that some Windows compilers
      use backslashes in __FILE__ strings, so truncate at backslashes as well.
      
      This is expected to fix some platform variation in the results of the
      new libpq_pipeline test module.
      
      Discussion: https://postgr.es/m/3650140.1617372290@sss.pgh.pa.us
      53aafdb9
    • Andrew Dunstan's avatar
      Fix typo in 6d7a6fea · 1877c9ac
      Andrew Dunstan authored
      Per gripe from Daniel Gustafsson
      1877c9ac
    • Fujii Masao's avatar
      postgres_fdw: Add option to control whether to keep connections open. · b1be3074
      Fujii Masao authored
      This commit adds a new option keep_connections that controls
      whether postgres_fdw keeps the connections to the foreign server open
      so that the subsequent queries can re-use them. This option can only be
      specified for a foreign server. The default is on. If set to off,
      all connections to the foreign server will be discarded
      at the end of transaction. Closed connections will be re-established
      when they are necessary by future queries using a foreign table.
      
      This option is useful, for example, when users want to prevent
      the connections from eating up the foreign servers connections
      capacity.
      
      Author: Bharath Rupireddy
      Reviewed-by: Alexey Kondratov, Vignesh C, Fujii Masao
      Discussion: https://postgr.es/m/CALj2ACVvrp5=AVp2PupEm+nAC8S4buqR3fJMmaCoc7ftT0aD2A@mail.gmail.com
      b1be3074
    • Peter Eisentraut's avatar
    • Fujii Masao's avatar
      Fix pgstat_report_replslot() to use proper data types for its arguments. · 96bdb7e1
      Fujii Masao authored
      The caller of pgstat_report_replslot() passes int64 values to the function.
      Also the function stores those values in PgStat_Counter (i.e., int64) fields
      of PgStat_MsgReplSlot struct. But previously the function used "int" as
      the data types of some arguments for those values, which could lead to
      the overflow of values.
      
      To avoid this risk, this commit fixes pgstat_report_replslot() to use
      PgStat_Counter type for the arguments. Since they are the statistics counters,
      PgStat_Counter, the data type used for counters, is used for them
      instead of int64.
      
      Reported-by: Vignesh C
      Author: Vignesh C
      Reviewed-by: Jeevan Ladhe, Fujii Masao
      Discussion: https://postgr.es/m/CALDaNm080OpG=ZwOb0i8EyChH5SyHAMFWJCKaKTXmrfvJLbgaA@mail.gmail.com
      96bdb7e1
    • Michael Paquier's avatar
      doc: Clarify how to generate backup files with non-exclusive backups · 6fb66c26
      Michael Paquier authored
      The current instructions describing how to write the backup_label and
      tablespace_map files are confusing.  For example, opening a file in text
      mode on Windows and copy-pasting the file's contents would result in a
      failure at recovery because of the extra CRLF characters generated.  The
      documentation was not stating that clearly, and per discussion this is
      not considered as a supported scenario.
      
      This commit extends a bit the documentation to mention that it may be
      required to open the file in binary mode before writing its data.
      
      Reported-by: Wang Shenhao
      Author: David Steele
      Reviewed-by: Andrew Dunstan, Magnus Hagander
      Discussion: https://postgr.es/m/8373f61426074f2cb6be92e02f838389@G08CNEXMBPEKD06.g08.fujitsu.local
      Backpatch-through: 9.6
      6fb66c26
    • Fujii Masao's avatar
      98e5bd10
    • David Rowley's avatar
      Attempt to fix unstable Result Cache regression tests · a4fac4ff
      David Rowley authored
      force_parallel_mode = regress is causing a few more problems than I
      thought.  It seems that both the leader and the single worker can
      contribute to the execution. I had mistakenly thought that only the worker
      process would do any work.  Since it's not deterministic as to which
      of the two processes will get a chance to work on the plan, it seems just
      better to disable force_parallel_mode for these tests.  At least doing
      this seems better than changing to EXPLAIN only rather than EXPLAIN
      ANALYZE.
      
      Additionally, I overlooked the fact that the number of executions of the
      sub-plan below a Result Cache will execute a varying number of times
      depending on cache eviction.  32-bit machines will use less memory and
      evict fewer tuples from the cache.  That results in the subnode being
      executed fewer times on 32-bit machines.  Let's just blank out the number
      of loops in each node.
      a4fac4ff
    • Bruce Momjian's avatar
      doc: mention that intervening major releases can be skipped · 2bda93f8
      Bruce Momjian authored
      Also mention that you should read the intervening major releases notes.
      This change was also applied to the website.
      
      Discussion: https://postgr.es/m/20210330144949.GA8259@momjian.us
      
      Backpatch-through: 9.6
      2bda93f8
    • David Rowley's avatar
      Add Result Cache executor node (take 2) · 9eacee2e
      David Rowley authored
      Here we add a new executor node type named "Result Cache".  The planner
      can include this node type in the plan to have the executor cache the
      results from the inner side of parameterized nested loop joins.  This
      allows caching of tuples for sets of parameters so that in the event that
      the node sees the same parameter values again, it can just return the
      cached tuples instead of rescanning the inner side of the join all over
      again.  Internally, result cache uses a hash table in order to quickly
      find tuples that have been previously cached.
      
      For certain data sets, this can significantly improve the performance of
      joins.  The best cases for using this new node type are for join problems
      where a large portion of the tuples from the inner side of the join have
      no join partner on the outer side of the join.  In such cases, hash join
      would have to hash values that are never looked up, thus bloating the hash
      table and possibly causing it to multi-batch.  Merge joins would have to
      skip over all of the unmatched rows.  If we use a nested loop join with a
      result cache, then we only cache tuples that have at least one join
      partner on the outer side of the join.  The benefits of using a
      parameterized nested loop with a result cache increase when there are
      fewer distinct values being looked up and the number of lookups of each
      value is large.  Also, hash probes to lookup the cache can be much faster
      than the hash probe in a hash join as it's common that the result cache's
      hash table is much smaller than the hash join's due to result cache only
      caching useful tuples rather than all tuples from the inner side of the
      join.  This variation in hash probe performance is more significant when
      the hash join's hash table no longer fits into the CPU's L3 cache, but the
      result cache's hash table does.  The apparent "random" access of hash
      buckets with each hash probe can cause a poor L3 cache hit ratio for large
      hash tables.  Smaller hash tables generally perform better.
      
      The hash table used for the cache limits itself to not exceeding work_mem
      * hash_mem_multiplier in size.  We maintain a dlist of keys for this cache
      and when we're adding new tuples and realize we've exceeded the memory
      budget, we evict cache entries starting with the least recently used ones
      until we have enough memory to add the new tuples to the cache.
      
      For parameterized nested loop joins, we now consider using one of these
      result cache nodes in between the nested loop node and its inner node.  We
      determine when this might be useful based on cost, which is primarily
      driven off of what the expected cache hit ratio will be.  Estimating the
      cache hit ratio relies on having good distinct estimates on the nested
      loop's parameters.
      
      For now, the planner will only consider using a result cache for
      parameterized nested loop joins.  This works for both normal joins and
      also for LATERAL type joins to subqueries.  It is possible to use this new
      node for other uses in the future.  For example, to cache results from
      correlated subqueries.  However, that's not done here due to some
      difficulties obtaining a distinct estimation on the outer plan to
      calculate the estimated cache hit ratio.  Currently we plan the inner plan
      before planning the outer plan so there is no good way to know if a result
      cache would be useful or not since we can't estimate the number of times
      the subplan will be called until the outer plan is generated.
      
      The functionality being added here is newly introducing a dependency on
      the return value of estimate_num_groups() during the join search.
      Previously, during the join search, we only ever needed to perform
      selectivity estimations.  With this commit, we need to use
      estimate_num_groups() in order to estimate what the hit ratio on the
      result cache will be.   In simple terms, if we expect 10 distinct values
      and we expect 1000 outer rows, then we'll estimate the hit ratio to be
      99%.  Since cache hits are very cheap compared to scanning the underlying
      nodes on the inner side of the nested loop join, then this will
      significantly reduce the planner's cost for the join.   However, it's
      fairly easy to see here that things will go bad when estimate_num_groups()
      incorrectly returns a value that's significantly lower than the actual
      number of distinct values.  If this happens then that may cause us to make
      use of a nested loop join with a result cache instead of some other join
      type, such as a merge or hash join.  Our distinct estimations have been
      known to be a source of trouble in the past, so the extra reliance on them
      here could cause the planner to choose slower plans than it did previous
      to having this feature.  Distinct estimations are also fairly hard to
      estimate accurately when several tables have been joined already or when a
      WHERE clause filters out a set of values that are correlated to the
      expressions we're estimating the number of distinct value for.
      
      For now, the costing we perform during query planning for result caches
      does put quite a bit of faith in the distinct estimations being accurate.
      When these are accurate then we should generally see faster execution
      times for plans containing a result cache.  However, in the real world, we
      may find that we need to either change the costings to put less trust in
      the distinct estimations being accurate or perhaps even disable this
      feature by default.  There's always an element of risk when we teach the
      query planner to do new tricks that it decides to use that new trick at
      the wrong time and causes a regression.  Users may opt to get the old
      behavior by turning the feature off using the enable_resultcache GUC.
      Currently, this is enabled by default.  It remains to be seen if we'll
      maintain that setting for the release.
      
      Additionally, the name "Result Cache" is the best name I could think of
      for this new node at the time I started writing the patch.  Nobody seems
      to strongly dislike the name. A few people did suggest other names but no
      other name seemed to dominate in the brief discussion that there was about
      names. Let's allow the beta period to see if the current name pleases
      enough people.  If there's some consensus on a better name, then we can
      change it before the release.  Please see the 2nd discussion link below
      for the discussion on the "Result Cache" name.
      
      Author: David Rowley
      Reviewed-by: Andy Fan, Justin Pryzby, Zhihong Yu, Hou Zhijie
      Tested-By: Konstantin Knizhnik
      Discussion: https://postgr.es/m/CAApHDvrPcQyQdWERGYWx8J%2B2DLUNgXu%2BfOSbQ1UscxrunyXyrQ%40mail.gmail.com
      Discussion: https://postgr.es/m/CAApHDvq=yQXr5kqhRviT2RhNKwToaWr9JAN5t+5_PzhuRJ3wvg@mail.gmail.com
      9eacee2e
    • Michael Paquier's avatar
      Improve stability of test with vacuum_truncate in reloptions.sql · fe246d1c
      Michael Paquier authored
      This test has been using a simple VACUUM with pg_relation_size() to
      check if a relation gets physically truncated or not, but forgot the
      fact that some concurrent activity, like checkpoint buffer writes, could
      cause some pages to be skipped.  The second test enabling
      vacuum_truncate could fail, seeing a non-empty relation.  The first test
      would not have failed, but could finish by testing a behavior different
      than the one aimed for.  Both tests gain a FREEZE option, to make the
      vacuums more aggressive and prevent page skips.
      
      This is similar to the issues fixed in c2dc1a79.
      
      Author: Arseny Sher
      Reviewed-by: Masahiko Sawada
      Discussion: https://postgr.es/m/87tuotr2hh.fsf@ars-thinkpad
      backpatch-through: 12
      fe246d1c
  5. 01 Apr, 2021 6 commits
    • Tom Lane's avatar
      Rethink handling of pass-by-value leaf datums in SP-GiST. · 1ebdec8c
      Tom Lane authored
      The existing convention in SP-GiST is that any pass-by-value datatype
      is stored in Datum representation, i.e. it's of width sizeof(Datum)
      even when typlen is less than that.  This is okay, or at least it's
      too late to change it, for prefix datums and node-label datums in inner
      (upper) tuples.  But it's problematic for leaf datums, because we'd
      prefer those to be stored in Postgres' standard on-disk representation
      so that we can easily extend leaf tuples to carry additional "included"
      columns.
      
      I believe, however, that we can get away with just up and changing that.
      This would be an unacceptable on-disk-format break, but there are two
      big mitigating factors:
      
      1. It seems quite unlikely that there are any SP-GiST opclasses out
      there that use pass-by-value leaf datatypes.  Certainly none of the
      ones in core do, nor has codesearch.debian.net heard of any.  Given
      what SP-GiST is good for, it's hard to conceive of a use-case where
      the leaf-level values would be both small and fixed-width.  (As an
      example, if you wanted to index text values with the leaf level being
      just a byte, then every text string would have to be represented
      with one level of inner tuple per preceding byte, which would be
      horrendously space-inefficient and slow to access.  You always want
      to use as few inner-tuple levels as possible, leaving as much as
      possible in the leaf values.)
      
      2. Even granting that you have such an index, this change only
      breaks things on big-endian machines.  On little-endian, the high
      order bytes of the Datum format will now just appear to be alignment
      padding space.
      
      So, change the code to store pass-by-value leaf datums in their
      usual on-disk form.  Inner-tuple datums are not touched.
      
      This is extracted from a larger patch that intends to add support for
      "included" columns.  I'm committing it separately for visibility in
      our commit logs.
      
      Pavel Borisov and Tom Lane, reviewed by Andrey Borodin
      
      Discussion: https://postgr.es/m/CALT9ZEFi-vMp4faht9f9Junb1nO3NOSjhpxTmbm1UGLMsLqiEQ@mail.gmail.com
      1ebdec8c
    • Stephen Frost's avatar
      Rename Default Roles to Predefined Roles · c9c41c7a
      Stephen Frost authored
      The term 'default roles' wasn't quite apt as these roles aren't able to
      be modified or removed after installation, so rename them to be
      'Predefined Roles' instead, adding an entry into the newly added
      Obsolete Appendix to help users of current releases find the new
      documentation.
      
      Bruce Momjian and Stephen Frost
      
      Discussion: https://postgr.es/m/157742545062.1149.11052653770497832538%40wrigleys.postgresql.org
      and https://www.postgresql.org/message-id/20201120211304.GG16415@tamriel.snowman.net
      c9c41c7a
    • Alvaro Herrera's avatar
      Fix setvbuf()-induced crash in libpq_pipeline · a68a894f
      Alvaro Herrera authored
      Windows doesn't like setvbuf(..., _IOLBF) and crashes if you use it,
      which has been causing the libpq_pipeline failures all along ... and our
      own port.h has known about it for a long time: it offers PG_IOLBF that's
      defined to _IONBF on that platform.  Follow its advice.
      
      While at it, get rid of a bogus bitshift that used a constant of the
      wrong size.  Decorate the constant as LL to fix.  While at it, remove a
      pointless addition that only confused matters.
      
      All as diagnosed by Tom Lane.
      
      Discussion: https://postgr.es/m/3458958.1617302154@sss.pgh.pa.us
      a68a894f
    • Robert Haas's avatar
      amcheck: Fix verify_heapam's tuple visibility checking rules. · 3b6c1259
      Robert Haas authored
      We now follow the order of checks from HeapTupleSatisfies* more
      closely to avoid coming to erroneous conclusions.
      
      Mark Dilger and Robert Haas
      
      Discussion: http://postgr.es/m/CA+Tgmob6sii0yTvULYJ0Vq4w6ZBmj7zUhddL3b+SKDi9z9NA7Q@mail.gmail.com
      3b6c1259
    • Tom Lane's avatar
      Fix pg_restore's misdesigned code for detecting archive file format. · ec03f2df
      Tom Lane authored
      Despite the clear comments pointing out that the duplicative code
      segments in ReadHead() and _discoverArchiveFormat() needed to be
      in sync, they were not: the latter did not bother to apply any of
      the sanity checks in the former.  We'd missed noticing this partly
      because none of those checks would fail in scenarios we customarily
      test, and partly because the oversight would be masked if both
      segments execute, which they would in cases other than needing to
      autodetect the format of a non-seekable stdin source.  However,
      in a case meeting all these requirements --- for example, trying
      to read a newer-than-supported archive format from non-seekable
      stdin --- pg_restore missed applying the version check and would
      likely dump core or otherwise misbehave.
      
      The whole thing is silly anyway, because there seems little reason
      to duplicate the logic beyond the one-line verification that the
      file starts with "PGDMP".  There seems to have been an undocumented
      assumption that multiple major formats (major enough to require
      separate reader modules) would nonetheless share the first half-dozen
      fields of the custom-format header.  This seems unlikely, so let's
      fix it by just nuking the duplicate logic in _discoverArchiveFormat().
      
      Also get rid of the pointless attempt to seek back to the start of
      the file after successful autodetection.  That wastes cycles and
      it means we have four behaviors to verify not two.
      
      Per bug #16951 from Sergey Koposov.  This has been broken for
      decades, so back-patch to all supported versions.
      
      Discussion: https://postgr.es/m/16951-a4dd68cf0de23048@postgresql.org
      ec03f2df
    • Peter Eisentraut's avatar
      Fix internal extract(timezone_minute) formulas · 91e7c903
      Peter Eisentraut authored
      Through various refactorings over time, the extract(timezone_minute
      from time with time zone) and extract(timezone_minute from timestamp
      with time zone) implementations ended up with two different but
      equally nonsensical formulas by using SECS_PER_MINUTE and
      MINS_PER_HOUR interchangeably.  Since those two are of course both the
      same number, the formulas do work, but for readability, fix them to be
      semantically correct.
      91e7c903