1. 08 Apr, 2020 6 commits
    • Allow partitionwise join to handle nested FULL JOIN USING cases. · 981643dc
      Tom Lane authored
      This case didn't work because columns merged by FULL JOIN USING are
      represented in the parse tree by COALESCE expressions, and the logic
      for recognizing a partitionable join failed to match upper-level join
      clauses to such expressions.  To fix, synthesize suitable COALESCE
      expressions and add them to the nullable_partexprs lists.  This is
      pretty ugly and brute-force, but it gets the job done.  (I have
      ambitions of rethinking the way outer-join output Vars are
      represented, so maybe that will provide a cleaner solution someday.
      For now, do this.)
      
      Amit Langote, reviewed by Justin Pryzby, Richard Guo, and myself
      
      Discussion: https://postgr.es/m/CA+HiwqG2WVUGmLJqtR0tPFhniO=H=9qQ+Z3L_ZC+Y3-EVQHFGg@mail.gmail.com
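      As an illustrative sketch of why these merged columns behave like COALESCE expressions (plain Python, not PostgreSQL internals; full_join_using and its inputs are made up for the example):

```python
# Toy model: the column merged by FULL JOIN USING behaves like
# COALESCE(t1.k, t2.k), taking the non-NULL side for rows that
# have no match on the other side.

def coalesce(*args):
    """Return the first non-None argument, like SQL's COALESCE."""
    for a in args:
        if a is not None:
            return a
    return None

def full_join_using(t1, t2):
    """Full outer join two {key: payload} dicts on their key column.

    Each output row carries the merged key, i.e. COALESCE(k1, k2),
    plus the payload from each side (None where there is no match)."""
    rows = []
    for k in t1.keys() | t2.keys():
        k1 = k if k in t1 else None
        k2 = k if k in t2 else None
        rows.append((coalesce(k1, k2), t1.get(k), t2.get(k)))
    return sorted(rows)

rows = full_join_using({1: "a", 2: "b"}, {2: "x", 3: "y"})
# Key 1 exists only on the left, key 3 only on the right, yet the
# merged column is never None thanks to the COALESCE.
```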
    • Allow partitionwise joins in more cases. · c8434d64
      Etsuro Fujita authored
      Previously, the partitionwise join technique only allowed partitionwise
      join when input partitioned tables had exactly the same partition
      bounds.  This commit extends the technique to some cases when the tables
      have different partition bounds, by using an advanced partition-matching
      algorithm introduced by this commit.  The algorithm checks whether
      each partition of one input table matches at most one partition of
      the other, and vice versa.  In that case the join between the tables
      can be broken down into joins between the matching partitions, so the
      algorithm produces the pairs of matching partitions, plus the
      partition bounds for the join relation, allowing the join to be
      computed partitionwise.  Currently, the algorithm
      works for list-partitioned and range-partitioned tables, but not
      hash-partitioned tables.  See comments in partition_bounds_merge().
      
      Ashutosh Bapat and Etsuro Fujita, most of regression tests by Rajkumar
      Raghuwanshi, some of the tests by Mark Dilger and Amul Sul, reviewed by
      Dmitry Dolgov and Amul Sul, with additional review at various points by
      Ashutosh Bapat, Mark Dilger, Robert Haas, Antonin Houska, Amit Langote,
      Justin Pryzby, and Tomas Vondra
      
      Discussion: https://postgr.es/m/CAFjFpRdjQvaUEV5DJX3TW6pU5eq54NCkadtxHX2JiJG_GvbrCA@mail.gmail.com
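      The at-most-one-match condition can be sketched for list-partitioned tables as follows (a hypothetical helper, not partition_bounds_merge itself; partition bounds are modeled as sets of list values):

```python
# Toy model: two partitions can pair up only if their list values
# overlap, and the join can be done partitionwise only when no
# partition on either side overlaps more than one partition on the
# other side.

def match_list_partitions(sides_a, sides_b):
    """sides_a, sides_b: lists of frozensets of list-partition values.

    Returns the (i, j) index pairs of matching partitions, or None
    if some partition matches more than one on the other side."""
    pairs = []
    for i, vals_a in enumerate(sides_a):
        matches = [j for j, vals_b in enumerate(sides_b) if vals_a & vals_b]
        if len(matches) > 1:
            return None          # one partition of A spans several of B
        if matches:
            pairs.append((i, matches[0]))
    seen_b = [j for _, j in pairs]
    if len(seen_b) != len(set(seen_b)):
        return None              # one partition of B spans several of A
    return pairs
```

      A partition with no counterpart is simply dropped from the pair list, mirroring the fact that such a partition-to-partition join would be empty on one side.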
    • Fix circle_in to accept "(x,y),r" as it's advertised to do. · 41a194f4
      Tom Lane authored
      Our documentation describes four allowed input syntaxes for circles,
      but the regression tests tried only three ... with predictable
      consequences.  Remarkably, this has been wrong since the circle
      datatype was added in 1997, but nobody noticed till now.
      
      David Zhang, with some help from me
      
      Discussion: https://postgr.es/m/332c47fa-d951-7574-b5cc-a8f7f7201202@highgo.ca
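      The four documented input forms can be sketched with a toy parser (plain Python regexes, not circle_in; the third pattern is the "(x,y),r" form the regression tests had never exercised):

```python
import re

# Toy parser for the four documented circle input syntaxes:
#   <(x,y),r>   ((x,y),r)   (x,y),r   x,y,r
NUM = r"[-+]?\d+(?:\.\d+)?"
CIRCLE_FORMS = [
    rf"<\s*\(\s*({NUM})\s*,\s*({NUM})\s*\)\s*,\s*({NUM})\s*>",    # <(x,y),r>
    rf"\(\s*\(\s*({NUM})\s*,\s*({NUM})\s*\)\s*,\s*({NUM})\s*\)",  # ((x,y),r)
    rf"\(\s*({NUM})\s*,\s*({NUM})\s*\)\s*,\s*({NUM})",            # (x,y),r
    rf"({NUM})\s*,\s*({NUM})\s*,\s*({NUM})",                      # x,y,r
]

def parse_circle(text):
    """Return (x, y, r) if text matches one of the four forms, else None."""
    for form in CIRCLE_FORMS:
        m = re.fullmatch(rf"\s*{form}\s*", text)
        if m:
            return tuple(float(g) for g in m.groups())
    return None
```

      A test suite covering only three of the four patterns would, as the commit message notes, never notice a parser that rejects the fourth.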
    • snapshot scalability: Move delayChkpt from PGXACT to PGPROC. · 75848bc7
      Andres Freund authored
      The goal of separating hotly accessed per-backend data from PGPROC
      into PGXACT is to make accesses fast (GetSnapshotData() in
      particular). But delayChkpt is not actually accessed frequently; only
      when starting a checkpoint. As it is frequently modified (multiple
      times in the course of a single transaction), storing it in the same
      cacheline as hotly accessed data unnecessarily dirties a contended
      cacheline.
      
      Therefore move delayChkpt to PGPROC.
      
      This is part of a larger series of patches intending to improve
      GetSnapshotData() scalability. It is committed and pushed separately,
      as it is independently beneficial (small but measurable win, limited
      by the other frequent modifications of PGXACT).
      
      Author: Andres Freund
      Reviewed-By: Robert Haas, Thomas Munro, David Rowley
      Discussion: https://postgr.es/m/20200301083601.ews6hz5dduc3w2se@alap3.anarazel.de
    • Track SLRU page hits in SimpleLruReadPage_ReadOnly · 2b88fdde
      Tomas Vondra authored
      SLRU page hits were tracked only in SimpleLruReadPage, but that's not
      enough, because we may hit the page in SimpleLruReadPage_ReadOnly, in
      which case we don't call SimpleLruReadPage at all.
      
      Reported-by: Kuntal Ghosh
      Discussion: https://postgr.es/m/20200119143707.gyinppnigokesjok@development
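      The bug class can be sketched with a toy cache (not the SLRU code): when a cache has two lookup entry points, instrumenting only one of them undercounts hits.

```python
# Toy cache with two read paths, mirroring SimpleLruReadPage vs
# SimpleLruReadPage_ReadOnly: both paths must bump the hit counter.

class ToyCache:
    def __init__(self):
        self.pages = {}
        self.hits = 0

    def read_page(self, pageno):
        # The "full" entry point: hits here were always counted.
        if pageno in self.pages:
            self.hits += 1
            return self.pages[pageno]
        self.pages[pageno] = f"page-{pageno}"
        return self.pages[pageno]

    def read_page_readonly(self, pageno):
        # The fast path: before the fix, a hit here bumped no counter.
        if pageno in self.pages:
            self.hits += 1       # count this hit too
            return self.pages[pageno]
        return self.read_page(pageno)
```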
    • Fix XLogReader FD leak that makes backends unusable after 2PC usage. · 91c40548
      Andres Freund authored
      Before the fix every 2PC commit/abort leaked a file descriptor. As the
      files are opened using BasicOpenFile(), that quickly leads to the
      backend running out of file descriptors.
      
      Once enough 2PC aborts/commits have caused enough FDs to leak, any IO
      in the backend will fail with "Too many open files", as
      BasicOpenFilePerm() will have triggered all open files known to fd.c
      to be closed.
      
      The leak causing the problem at hand is a consequence of 0dc8ead4,
      but is only exacerbated by it. Previously most XLogPageReadCB
      callbacks used static variables to cache one open file, but after the
      commit the cache is private to each XLogReader instance. There never
      was infrastructure to close FDs at the time of XLogReaderFree, but the
      way XLogReader was used limited the leak to one FD.
      
      This commit just closes the FD during XLogReaderFree() if the FD is
      stored in XLogReaderState.seg.ws_file.  This may not be the right way
      to solve this in the medium/long term, but at least it unbreaks 2PC.
      
      Discussion: https://postgr.es/m/20200406025651.fpzdb5yyb7qyhqko@alap3.anarazel.de
  2. 07 Apr, 2020 14 commits
  3. 06 Apr, 2020 13 commits
  4. 05 Apr, 2020 4 commits
  5. 04 Apr, 2020 3 commits
    • Add perl2host call missing from a new test file. · 70de4e95
      Noah Misch authored
      Oversight in today's commit c6b92041.
      Per buildfarm member jacana.
      
      Discussion: http://postgr.es/m/20200404223212.GC3442685@rfd.leadboat.com
    • Remove bogus Assert, add some regression test cases showing why. · 07871d40
      Tom Lane authored
      Commit 77ec5aff added an assertion to enforce_generic_type_consistency
      that boils down to "if the function result is polymorphic, there must be
      at least one polymorphic argument".  This should be true for user-created
      functions, but there are built-in functions for which it's not true, as
      pointed out by Jaime Casanova.  Hence, go back to the old behavior of
      leaving the return type alone.  There's only a limited amount of stuff
      you can do with such a function result, but it does work to some extent;
      add some regression test cases to ensure we don't break that again.
      
      Discussion: https://postgr.es/m/CAJGNTeMbhtsCUZgJJ8h8XxAJbK7U2ipsX8wkHRtZRz-NieT8RA@mail.gmail.com
    • Skip WAL for new relfilenodes, under wal_level=minimal. · c6b92041
      Noah Misch authored
      Until now, only selected bulk operations (e.g. COPY) did this.  If a
      given relfilenode received both a WAL-skipping COPY and a WAL-logged
      operation (e.g. INSERT), recovery could lose tuples from the COPY.  See
      src/backend/access/transam/README section "Skipping WAL for New
      RelFileNode" for the new coding rules.  Maintainers of table access
      methods should examine that section.
      
      To maintain data durability, just before commit, we choose between an
      fsync of the relfilenode and copying its contents to WAL.  A new GUC,
      wal_skip_threshold, guides that choice.  If this change slows a workload
      that creates small, permanent relfilenodes under wal_level=minimal, try
      adjusting wal_skip_threshold.  Users setting a timeout on COMMIT may
      need to adjust that timeout, and log_min_duration_statement analysis
      will reflect time consumption moving to COMMIT from commands like COPY.
      
      Internally, this requires a reliable determination of whether
      RollbackAndReleaseCurrentSubTransaction() would unlink a relation's
      current relfilenode.  Introduce rd_firstRelfilenodeSubid.  Amend the
      specification of rd_createSubid such that the field is zero when a new
      rel has an old rd_node.  Make relcache.c retain entries for certain
      dropped relations until end of transaction.
      
      Bump XLOG_PAGE_MAGIC, since this introduces XLOG_GIST_ASSIGN_LSN.
      Future servers accept older WAL, so this bump is discretionary.
      
      Kyotaro Horiguchi, reviewed (in earlier, similar versions) by Robert
      Haas.  Heikki Linnakangas and Michael Paquier implemented earlier
      designs that materially clarified the problem.  Reviewed, in earlier
      designs, by Andrew Dunstan, Andres Freund, Alvaro Herrera, Tom Lane,
      Fujii Masao, and Simon Riggs.  Reported by Martijn van Oosterhout.
      
      Discussion: https://postgr.es/m/20150702220524.GA9392@svana.org
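      The commit-time choice can be sketched as follows (toy logic; wal_skip_threshold is a real GUC, but the helper and the default shown here are illustrative assumptions): small relations are cheaper to copy into WAL, large ones to fsync.

```python
# Toy model of the per-relfilenode decision made just before commit
# for WAL-skipped relations under wal_level=minimal.
WAL_SKIP_THRESHOLD = 2048 * 1024     # bytes; the GUC is set in kilobytes

def commit_action(rel_size_bytes, threshold=WAL_SKIP_THRESHOLD):
    """Copy small relations into WAL; fsync large ones."""
    return "copy-to-wal" if rel_size_bytes <= threshold else "fsync"
```

      This is why the commit message warns that time consumption moves to COMMIT: whichever branch is taken, the work happens at commit rather than during the COPY itself.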