1. 02 Mar, 2020 8 commits
  2. 01 Mar, 2020 2 commits
  3. 29 Feb, 2020 5 commits
    • Doc: Fix pageinspect bt_page_items() example. · dba91533
      Peter Geoghegan authored
      Oversight in commit 93ee38ea.
    • Teach pageinspect about nbtree deduplication. · 93ee38ea
      Peter Geoghegan authored
      Add a new bt_metap() column to display the metapage's allequalimage
      field.  Also add three new columns to contrib/pageinspect's
      bt_page_items() function:
      
      * Add a boolean column ("dead") that displays the LP_DEAD bit value for
      each non-pivot tuple.
      
      * Add a TID column ("htid") that displays a single heap TID value for
      each tuple.  This is the TID that is returned by BTreeTupleGetHeapTID(),
      so comparable values are shown for pivot tuples, plain non-pivot tuples,
      and posting list tuples.
      
      * Add a TID array column ("tids") that displays TIDs from each tuple's
      posting list, if any.  This works just like the "tids" column from
      pageinspect's gin_leafpage_items() function.
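
      As a rough, hypothetical sketch of how the new columns fit together
      (plain C for illustration, not pageinspect's actual code): "htid" is
      the one TID the tuple represents as a whole, while "tids" is the full
      posting list.

          #include <stdbool.h>

          /* Hypothetical, much-simplified view of an nbtree leaf tuple. */
          typedef struct HeapTid { unsigned block; unsigned offset; } HeapTid;

          typedef struct LeafTupleView
          {
              bool     dead;       /* LP_DEAD bit, shown in the new "dead" column */
              HeapTid  tid;        /* single TID for plain (non-posting) tuples */
              HeapTid *posting;    /* posting list TIDs, or NULL */
              int      nposting;   /* number of posting list TIDs */
          } LeafTupleView;

          /* "htid": the single heap TID shown for the tuple as a whole */
          static HeapTid view_htid(const LeafTupleView *tup)
          {
              return tup->posting ? tup->posting[0] : tup->tid;
          }

          /* "tids": every posting list TID (empty for plain tuples) */
          static const HeapTid *view_tids(const LeafTupleView *tup, int *ntids)
          {
              *ntids = tup->posting ? tup->nposting : 0;
              return tup->posting;
          }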
      
      No version bump for the pageinspect extension, since there hasn't been a
      stable Postgres release since the last version bump (the last bump was
      part of commit 58b4cb30).
      
      Author: Peter Geoghegan
      Discussion: https://postgr.es/m/CAH2-WzmSMmU2eNvY9+a4MNP+z02h6sa-uxZvN3un6jY02ZVBSw@mail.gmail.com
    • Correctly re-use hash tables in buildSubPlanHash(). · 58c47ccf
      Tom Lane authored
      Commit 356687bd omitted to remove leftover code for destroying
      a hashed subplan's hash tables, with the result that the tables
      were always rebuilt, not reused; this leads to severe memory
      leakage if a hashed subplan is re-executed enough times.
      Moreover, the code for reusing the hashnulls table had a typo
      that would have made it do the wrong thing if it were reached.
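
      A minimal sketch of the intended reuse-instead-of-rebuild pattern
      (hypothetical names, not the actual executor code):

          #include <stdlib.h>

          typedef struct HashTable { int nentries; } HashTable;

          /* expensive: allocates the table and its memory */
          static HashTable *build_hash_table(void) { return calloc(1, sizeof(HashTable)); }

          /* cheap: empties the table but keeps its memory around */
          static void reset_hash_table(HashTable *ht) { ht->nentries = 0; }

          typedef struct SubPlanState
          {
              HashTable *hashtable;
              HashTable *hashnulls;
          } SubPlanState;

          /* called each time the hashed subplan is (re)executed */
          static void prepare_subplan_hash(SubPlanState *node)
          {
              if (node->hashtable)
                  reset_hash_table(node->hashtable);    /* reuse, don't rebuild */
              else
                  node->hashtable = build_hash_table();

              if (node->hashnulls)
                  reset_hash_table(node->hashnulls);    /* the branch the typo affected */
              else
                  node->hashnulls = build_hash_table();
          }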
      
      Looking at the code coverage report shows severe under-coverage
      of the potential callers of ResetTupleHashTable, so add some test
      cases that exercise them.
      
      Andreas Karlsson and Tom Lane, per reports from Ranier Vilela
      and Justin Pryzby.
      
      Backpatch to v11, as the faulty commit was.
      
      Discussion: https://postgr.es/m/edb62547-c453-c35b-3ed6-a069e4d6b937@proxel.se
      Discussion: https://postgr.es/m/CAEudQAo=DCebm1RXtig9OH+QivpS97sMkikt0A9qHmMUs+g6ZA@mail.gmail.com
      Discussion: https://postgr.es/m/20200210032547.GA1412@telsasoft.com
    • Remove obsolete comment. · 6afc8aef
      Tom Lane authored
      Noted while studying subplan hash issue.
    • Avoid failure if autovacuum tries to access a just-dropped temp namespace. · 80d76be5
      Tom Lane authored
      Such an access became possible when commit 246a6c8f added more
      aggressive cleanup of orphaned temp relations by autovacuum.
      Since autovacuum's snapshot might be slightly stale, it could
      attempt to access an already-dropped temp namespace, resulting in
      an assertion failure or null-pointer dereference.  (In practice,
      since we don't drop temp namespaces automatically but merely
      recycle them, this situation could only arise if a superuser does
      a manual drop of a temp namespace.  Still, that should be allowed.)
      
      The core of the bug, IMO, is that isTempNamespaceInUse and its callers
      failed to think hard about whether to treat "temp namespace isn't there"
      differently from "temp namespace isn't in use".  In hopes of forestalling
      future mistakes of the same ilk, replace that function with a new one
      checkTempNamespaceStatus, which makes the same tests but returns a
      three-way enum rather than just a bool.  isTempNamespaceInUse is gone
      entirely in HEAD; but just in case some external code is relying on it,
      keep it in the back branches, as a bug-compatible wrapper around the
      new function.
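
      A minimal sketch of the three-way status and the back-branch wrapper,
      using hypothetical enum values and a simplified signature:

          #include <stdbool.h>

          typedef enum TempNamespaceStatus
          {
              TEMPNS_NOT_TEMP,   /* namespace isn't there, or isn't a temp namespace */
              TEMPNS_IDLE,       /* temp namespace exists, but no backend is using it */
              TEMPNS_IN_USE      /* some backend is actively using it */
          } TempNamespaceStatus;

          /* toy stub standing in for the real catalog and PGPROC checks */
          static TempNamespaceStatus checkTempNamespaceStatus(unsigned namespaceId)
          {
              (void) namespaceId;
              return TEMPNS_IDLE;
          }

          /* back-branch, bug-compatible wrapper collapsing the enum to a bool */
          static bool isTempNamespaceInUse(unsigned namespaceId)
          {
              return checkTempNamespaceStatus(namespaceId) == TEMPNS_IN_USE;
          }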
      
      Per report originally from Prabhat Kumar Sahu, investigated by Mahendra
      Singh and Michael Paquier; the final form of the patch is my fault.
      This replaces the failed fix attempt in a052f6cb.
      
      Backpatch as far as v11, as 246a6c8f was.
      
      Discussion: https://postgr.es/m/CAKYtNAr9Zq=1-ww4etHo-VCC-k120YxZy5OS01VkaLPaDbv2tg@mail.gmail.com
  4. 28 Feb, 2020 5 commits
  5. 27 Feb, 2020 8 commits
  6. 26 Feb, 2020 7 commits
    • Silence another compiler warning in nbtinsert.c. · 2c0797da
      Peter Geoghegan authored
      Per complaint from Álvaro Herrera.
    • Suppress unnecessary RelabelType nodes in more cases. · a477bfc1
      Tom Lane authored
      eval_const_expressions sometimes produced RelabelType nodes that
      were useless because they just relabeled an expression to the same
      exposed type it already had.  This is worth avoiding because it can
      cause two equivalent expressions to not be equal(), preventing
      recognition of useful optimizations.  In the test case added here,
      an unpatched planner fails to notice that the "sqli = constant" clause
      renders a sort step unnecessary, because one code path produces an
      extra RelabelType and another doesn't.
      
      Fix by ensuring that eval_const_expressions_mutator's T_RelabelType
      case will not add in an unnecessary RelabelType.  Also save some
      code by sharing a subroutine with the effectively-equivalent cases
      for CollateExpr and CoerceToDomain.  (CollateExpr had no bug, and
      I think that the case couldn't arise with CoerceToDomain, but
      it seems prudent to do the same check for all three cases.)
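
      A minimal sketch of the rule being applied, with a hypothetical,
      much-simplified node layout (the point is only the early return):

          #include <stdlib.h>

          typedef struct Expr
          {
              struct Expr *arg;          /* child, if this node is a relabel */
              int          exposedtype;
              int          exposedtypmod;
              int          collation;
          } Expr;

          /* Wrap "arg" to expose the requested type, unless that is a no-op. */
          static Expr *relabel_to_type(Expr *arg, int type, int typmod, int collation)
          {
              if (arg->exposedtype == type &&
                  arg->exposedtypmod == typmod &&
                  arg->collation == collation)
                  return arg;            /* suppress the useless RelabelType */

              Expr *relabel = calloc(1, sizeof(Expr));
              relabel->arg = arg;
              relabel->exposedtype = type;
              relabel->exposedtypmod = typmod;
              relabel->collation = collation;
              return relabel;
          }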
      
      Back-patch to v12.  In principle this has been wrong all along,
      but I haven't seen a case where it causes visible misbehavior
      before v12, so refrain from changing stable branches unnecessarily.
      
      Per investigation of a report from Eric Gillum.
      
      Discussion: https://postgr.es/m/CAMmjdmvAZsUEskHYj=KT9sTukVVCiCSoe_PBKOXsncFeAUDPCQ@mail.gmail.com
    • Fix docs regarding AFTER triggers on partitioned tables · 3acfe6b0
      Alvaro Herrera authored
      In commit 86f57594 I forgot to update the trigger.sgml paragraph
      that needs to explain that AFTER triggers are allowed in partitioned
      tables.  Do so now.
      
      Discussion: https://postgr.es/m/20200224185850.GA30899@alvherre.pgsql
    • Silence compiler warning in nbtinsert.c. · 2d8a6fad
      Peter Geoghegan authored
      Per buildfarm member longfin.
    • Add deduplication to nbtree. · 0d861bbb
      Peter Geoghegan authored
      Deduplication reduces the storage overhead of duplicates in indexes that
      use the standard nbtree index access method.  The deduplication process
      is applied lazily, after the point where opportunistic deletion of
      LP_DEAD-marked index tuples occurs.  Deduplication is only applied at
      the point where a leaf page split would otherwise be required.  New
      posting list tuples are formed by merging together existing duplicate
      tuples.  The physical representation of the items on an nbtree leaf page
      is made more space efficient by deduplication, but the logical contents
      of the page are not changed.  Even unique indexes make use of
      deduplication as a way of controlling bloat from duplicates whose TIDs
      point to different versions of the same logical table row.
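
      Roughly, and ignoring the on-disk tuple format, space accounting, and
      page splits, the merge step can be pictured like this standalone toy
      (not the actual nbtree code):

          #include <stdio.h>

          /* hypothetical, simplified leaf items: one key and one heap TID each */
          typedef struct LeafItem { int key; long tid; } LeafItem;

          /* group items (already sorted by key, then TID) into posting lists */
          static void deduplicate(const LeafItem *items, int nitems)
          {
              int i = 0;

              while (i < nitems)
              {
                  int j = i;

                  printf("key %d -> TIDs:", items[i].key);
                  while (j < nitems && items[j].key == items[i].key)
                      printf(" %ld", items[j++].tid);
                  printf("\n");
                  i = j;
              }
          }

          int main(void)
          {
              const LeafItem items[] =
                  {{1, 100}, {1, 101}, {1, 205}, {2, 17}, {3, 300}, {3, 301}};

              deduplicate(items, 6);   /* prints one line per distinct key */
              return 0;
          }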
      
      The lazy approach taken by nbtree has significant advantages over a GIN
      style eager approach.  Most individual inserts of index tuples have
      exactly the same overhead as before.  The extra overhead of
      deduplication is amortized across insertions, just like the overhead of
      page splits.  The key space of indexes works in the same way as it has
      since commit dd299df8 (the commit that made heap TID a tiebreaker
      column).
      
      Testing has shown that nbtree deduplication can generally make indexes
      with about 10 or 15 tuples for each distinct key value about 2.5X - 4X
      smaller, even with single column integer indexes (e.g., an index on a
      referencing column that accompanies a foreign key).  The final size of
      single column nbtree indexes comes close to the final size of a similar
      contrib/btree_gin index, at least in cases where GIN's posting list
      compression isn't very effective.  This can significantly improve
      transaction throughput, and significantly reduce the cost of vacuuming
      indexes.
      
      A new index storage parameter (deduplicate_items) controls the use of
      deduplication.  The default setting is 'on', so all new B-Tree indexes
      automatically use deduplication where possible.  This decision will be
      reviewed at the end of the Postgres 13 beta period.
      
      There is a regression of approximately 2% of transaction throughput with
      synthetic workloads that consist of append-only inserts into a table
      with several non-unique indexes, where all indexes have few or no
      repeated values.  The underlying issue is that cycles are wasted on
      unsuccessful attempts at deduplicating items in non-unique indexes.
      There doesn't seem to be a way around it short of disabling
      deduplication entirely.  Note that deduplication of items in unique
      indexes is fairly well targeted in general, which avoids the problem
      there (we can use a special heuristic to trigger deduplication passes in
      unique indexes, since we're specifically targeting "version bloat").
      
      Bump XLOG_PAGE_MAGIC because xl_btree_vacuum changed.
      
      No bump in BTREE_VERSION, since the representation of posting list
      tuples works in a way that's backwards compatible with version 4 indexes
      (i.e. indexes built on PostgreSQL 12).  However, users must still
      REINDEX a pg_upgrade'd index to use deduplication, regardless of the
      Postgres version they've upgraded from.  This is the only way to set the
      new nbtree metapage flag indicating that deduplication is generally
      safe.
      
      Author: Anastasia Lubennikova, Peter Geoghegan
      Reviewed-By: Peter Geoghegan, Heikki Linnakangas
      Discussion:
          https://postgr.es/m/55E4051B.7020209@postgrespro.ru
          https://postgr.es/m/4ab6e2db-bcee-f4cf-0916-3a06e6ccbb55@postgrespro.ru
    • Add equalimage B-Tree support functions. · 612a1ab7
      Peter Geoghegan authored
      Invent the concept of a B-Tree equalimage ("equality implies image
      equality") support function, registered as support function 4.  This
      indicates whether it is safe (or not safe) to apply optimizations that
      assume that any two datums considered equal by an operator class's order
      method must be interchangeable without any loss of semantic information.
      This is static information about an operator class and a collation.
      
      Register an equalimage routine for almost all of the existing B-Tree
      opclasses.  We only need two trivial routines for all of the opclasses
      that are included with the core distribution.  There is one routine for
      opclasses that index non-collatable types (which returns 'true'
      unconditionally), plus another routine for collatable types (which
      returns 'true' when the collation is a deterministic collation).
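
      Sketched as plain C (the real routines are catalog-registered support
      functions, so these signatures are purely illustrative):

          #include <stdbool.h>

          /* non-collatable types: equality always implies image equality */
          static bool equalimage_noncollatable(void)
          {
              return true;
          }

          /* collatable types: safe only under a deterministic collation */
          static bool equalimage_collatable(bool collation_is_deterministic)
          {
              return collation_is_deterministic;
          }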
      
      This patch is infrastructure for an upcoming patch that adds B-Tree
      deduplication.
      
      Author: Peter Geoghegan, Anastasia Lubennikova
      Discussion: https://postgr.es/m/CAH2-Wzn3Ee49Gmxb7V1VJ3-AC8fWn-Fr8pfWQebHe8rYRxt5OQ@mail.gmail.com
    • Include error code in message from pg_upgrade · 4109bb5d
      Magnus Hagander authored
      In passing, also quote the filename in one message where it wasn't.
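
      The shape of the change, as a hedged illustration with a hypothetical
      reporting helper rather than pg_upgrade's own:

          #include <errno.h>
          #include <stdio.h>
          #include <string.h>

          /* report the OS error and quote the file name */
          static void report_open_failure(const char *path)
          {
              fprintf(stderr, "could not open file \"%s\": %s\n",
                      path, strerror(errno));
          }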
      
      Author: Dagfinn Ilmari Mannsåker
      Discussion: https://postgr.es/m/87pne2w98h.fsf@wibble.ilmari.org
  7. 25 Feb, 2020 1 commit
  8. 24 Feb, 2020 4 commits
    • Fix compile failure. · 36390713
      Tom Lane authored
      I forgot that some compilers won't handle #if constructs within
      ereport() calls.  Duplicating most of the call is annoying but simple.
      Per buildfarm.
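
      The underlying portability rule, shown with a hypothetical report()
      macro standing in for ereport():

          #include <stdio.h>

          #define report(msg)  fprintf(stderr, "%s\n", msg)

          static void log_build_info(void)
          {
              /*
               * Putting #ifdef/#else/#endif inside the argument list of a
               * macro call is undefined behavior, and some compilers reject
               * it.  The portable workaround is to duplicate most of the
               * call under the conditional instead.
               */
          #ifdef HAVE_FEATURE
              report("built with feature");
          #else
              report("built without feature");
          #endif
          }
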
    • expression eval: Reduce number of steps for agg transition invocations. · 2742c450
      Andres Freund authored
      Do so by combining the various steps that are part of aggregate
      transition function invocation into one larger step. As some of the
      current steps are only necessary for some aggregates, have one variant
      of the aggregate transition step for each possible combination.
      
      To avoid further manual copies of code in the different transition
      step implementations, move most of the code into helper functions
      marked as "always inline".
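
      A minimal sketch of the inlining trick, assuming a GCC-style
      always_inline attribute and a toy transition function:

          #include <stdbool.h>

          #if defined(__GNUC__)
          #define toy_always_inline __attribute__((always_inline)) inline
          #else
          #define toy_always_inline inline
          #endif

          typedef struct ToyAggState { long transval; bool transnull; } ToyAggState;

          /* one shared helper; constant flags let the compiler drop dead branches */
          static toy_always_inline void
          agg_trans(ToyAggState *agg, long newval, bool newnull, bool strict, bool init)
          {
              if (strict && newnull)
                  return;                    /* strict transition: ignore null input */
              if (init && agg->transnull)
              {
                  agg->transval = newval;    /* first input initializes the group */
                  agg->transnull = false;
                  return;
              }
              agg->transval += newval;       /* stand-in for the real transition fn */
          }

          /* two of the specialized step variants an interpreter could dispatch to */
          static void step_agg_plain(ToyAggState *agg, long v)
          {
              agg_trans(agg, v, false, false, false);
          }

          static void step_agg_strict_init(ToyAggState *agg, long v, bool isnull)
          {
              agg_trans(agg, v, isnull, true, true);
          }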
      
      The benefit of this change is an increase in performance when
      aggregating lots of rows. This comes partly from the reduced number
      of indirect jumps (there are fewer steps to dispatch), and partly
      from less redundant setup code across steps. This mainly benefits
      interpreted execution, but the code generated by JIT is also improved
      a bit.
      
      As a nice side-effect it also ends up making the code a bit simpler.
      
      A small additional optimization is removing the need to set
      aggstate->curaggcontext before calling ExecAggInitGroup, instead
      passing curaggcontext as an argument.  In contrast to other
      aggregate-related functions, it was only needed to fetch a memory
      context to copy the transition value into.
      
      Author: Andres Freund
      Discussion:
         https://postgr.es/m/20191023163849.sosqbfs5yenocez3@alap3.anarazel.de
         https://postgr.es/m/5c371df7cee903e8cd4c685f90c6c72086d3a2dc.camel@j-davis.com
    • Properly issue WAL record for CID of first catalog tuple in multi-insert · 7d672b76
      Michael Paquier authored
      Multi-insert for heap is not yet used actively for catalogs, but the
      code to support this case is in place for logical decoding.  The
      existing code forgot to issue an XLOG_HEAP2_NEW_CID record for the first
      tuple inserted, leading to failures when attempting to use multiple
      inserts for catalogs at decoding time.  This commit fixes the problem by
      WAL-logging the needed CID.
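
      A hedged sketch of the intended behavior (hypothetical helpers, not
      the real heapam code): when CID records are needed, one is emitted
      for every tuple, including the first.

          #include <stdbool.h>
          #include <stdio.h>

          typedef struct Tuple { int id; } Tuple;

          static void place_tuple(const Tuple *t) { printf("place tuple %d\n", t->id); }
          static void log_new_cid(const Tuple *t) { printf("new-CID record for %d\n", t->id); }
          static void log_multi_insert(int n)     { printf("multi-insert record, %d tuples\n", n); }

          static void multi_insert(const Tuple *tuples, int ntuples, bool need_cids)
          {
              for (int i = 0; i < ntuples; i++)
              {
                  place_tuple(&tuples[i]);
                  if (need_cids)
                      log_new_cid(&tuples[i]);   /* the first tuple used to miss this */
              }
              log_multi_insert(ntuples);
          }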
      
      This is not an active bug, so no back-patch is done.
      
      Author: Daniel Gustafsson
      Discussion: https://postgr.es/m/E0D4CC67-A1CF-4DF4-991D-B3AC2EB5FAE9@yesql.se
    • Account explicitly for long-lived FDs that are allocated outside fd.c. · 3d475515
      Tom Lane authored
      The comments in fd.c have long claimed that all file allocations should
      go through that module, but in reality that's not always practical.
      fd.c doesn't supply APIs for invoking some FD-producing syscalls like
      pipe() or epoll_create(); and the APIs it does supply for non-virtual
      FDs are mostly insistent on releasing those FDs at transaction end;
      and in some cases the actual open() call is in code that can't be made
      to use fd.c, such as libpq.
      
      This has led to a situation where, in a modern server, there are likely
      to be seven or so long-lived FDs per backend process that are not known
      to fd.c.  Since NUM_RESERVED_FDS is only 10, that meant we had *very*
      few spare FDs if max_files_per_process is >= the system ulimit and
      fd.c had opened all the files it thought it safely could.  The
      contrib/postgres_fdw regression test, in particular, could easily be
      made to fall over by running it under a restrictive ulimit.
      
      To improve matters, invent functions Acquire/Reserve/ReleaseExternalFD
      that allow outside callers to tell fd.c that they have or want to allocate
      a FD that's not directly managed by fd.c.  Add calls to track all the
      fixed FDs in a standard backend session, so that we are honestly
      guaranteeing that NUM_RESERVED_FDS FDs remain unused below the EMFILE
      limit in a backend's idle state.  The coding rules for these functions say
      that there's no need to call them in code that just allocates one FD over
      a fairly short interval; we can dip into NUM_RESERVED_FDS for such cases.
      That means that there aren't all that many places where we need to worry.
      But postgres_fdw and dblink must use this facility to account for
      long-lived FDs consumed by libpq connections.  There may be other places
      where it's worth doing such accounting, too, but this seems like enough
      to solve the immediate problem.
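
      A minimal sketch of the accounting idea, with hypothetical internals
      (the real functions live in fd.c and cooperate with its file tables):

          #include <stdbool.h>

          static int max_safe_fds = 90;       /* established at startup in fd.c */
          static int num_external_fds = 0;

          /* ask for room to open one long-lived FD that fd.c won't manage */
          static bool reserve_external_fd(void)
          {
              if (num_external_fds >= max_safe_fds / 3)   /* the limit described below */
                  return false;               /* caller must close something, or fail */
              num_external_fds++;
              return true;
          }

          /* pair with reserve_external_fd() once the FD is eventually closed */
          static void release_external_fd(void)
          {
              num_external_fds--;
          }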
      
      Internally to fd.c, "external" FDs are limited to max_safe_fds/3 FDs.
      (Callers can choose to ignore this limit, but of course it's unwise
      to do so except for fixed file allocations.)  I also reduced the limit
      on "allocated" files to max_safe_fds/3 FDs (it had been max_safe_fds/2).
      Conceivably a smarter rule could be used here --- but in practice,
      on reasonable systems, max_safe_fds should be large enough that this
      isn't much of an issue, so KISS for now.  To avoid possible regression
      in the number of external or allocated files that can be opened,
      increase FD_MINFREE and the lower limit on max_files_per_process a
      little bit; we now insist that the effective "ulimit -n" be at least 64.
      
      This seems like pretty clearly a bug fix, but in view of the lack of
      field complaints, I'll refrain from risking a back-patch.
      
      Discussion: https://postgr.es/m/E1izCmM-0005pV-Co@gemulon.postgresql.org