1. 21 Dec, 2017 7 commits
    • Adjust assertion in GetCurrentCommandId. · cce1ecfc
      Robert Haas authored
      currentCommandIdUsed is only used to skip redundant increments of the
      command counter, and CommandCounterIncrement() is categorically denied
      under parallelism anyway.  Therefore, it's OK for
      GetCurrentCommandId() to mark the counter value used, as long as it
      happens in the leader, not a worker.
      
      Prior to commit e9baa5e9, the slightly
      incorrect check didn't matter, but now it does.  A test case added by
      commit 18042840 uncovered the problem
      by accident; it caused failures with force_parallel_mode=on/regress.
      
      Report and review by Andres Freund.  Patch by me.
      
      Discussion: http://postgr.es/m/20171221143106.5lhtygohvmazli3x@alap3.anarazel.de
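      
      A sketch of the adjusted check, per the description above (the real
      function lives in src/backend/access/transam/xact.c):
      
          CommandId
          GetCurrentCommandId(bool used)
          {
              /* this is global to a transaction, not subtransaction-local */
              if (used)
              {
                  /*
                   * Marking the counter value used is forbidden only in a
                   * worker; the previous check rejected the leader too.
                   */
                  Assert(!IsParallelWorker());
                  currentCommandIdUsed = true;
              }
              return currentCommandId;
          }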
    • Rearrange execution of PARAM_EXTERN Params for plpgsql's benefit. · 6719b238
      Tom Lane authored
      This patch does three interrelated things:
      
      * Create a new expression execution step type EEOP_PARAM_CALLBACK
      and add the infrastructure needed for add-on modules to generate that.
      As discussed, the best control mechanism for that seems to be to add
      another hook function to ParamListInfo, which will be called by
      ExecInitExpr if it's supplied and a PARAM_EXTERN Param is found.
      For stand-alone expressions, we add a new entry point to allow the
      ParamListInfo to be specified directly, since it can't be retrieved
      from the parent plan node's EState.
      
      * Redesign the API for the ParamListInfo paramFetch hook so that the
      ParamExternData array can be entirely virtual.  This also lets us get rid
      of ParamListInfo.paramMask, instead leaving it to the paramFetch hook to
      decide which param IDs should be accessible or not.  plpgsql_param_fetch
      was already doing the identical masking check, so having callers do it too
      seemed redundant.  While I was at it, I added a "speculative" flag to
      paramFetch that the planner can specify as TRUE to avoid unwanted failures.
      This solves an ancient problem for plpgsql that it couldn't provide values
      of non-DTYPE_VAR variables to the planner for fear of triggering premature
      "record not assigned yet" or "field not found" errors during planning.
      
      * Rework plpgsql to get rid of the need for "unshared" parameter lists,
      by dint of turning the single ParamListInfo per estate into a nearly
      read-only data structure that doesn't instantiate any per-variable data.
      Instead, the paramFetch hook controls access to per-variable data and can
      make the right decisions on the fly, replacing the cases that we used to
      need multiple ParamListInfos for.  This might perhaps have been a
      performance loss on its own, but by using a paramCompile hook we can
      bypass plpgsql_param_fetch entirely during normal query execution.
      (It's now only called when, e.g., we copy the ParamListInfo into a cursor
      portal.  copyParamList() or SerializeParamList() effectively instantiate
      the virtual parameter array as a simple physical array without a
      paramFetch hook, which is what we want in those cases.)  This allows
      reverting most of commit 6c82d8d1, though I kept the cosmetic
      code-consolidation aspects of that (eg the assign_simple_var function).
      
      Performance testing shows this to be at worst a break-even change,
      and it can provide wins ranging up to 20% in test cases involving
      accesses to fields of "record" variables.  The fact that values of
      such variables can now be exposed to the planner might produce wins
      in some situations, too, but I've not pursued that angle.
      
      In passing, remove the "parent" pointer from the arguments to
      ExecInitExprRec and related functions, instead storing that pointer in a
      transient field in ExprState.  The ParamListInfo pointer for a stand-alone
      expression is handled the same way; we'd otherwise have had to add
      yet another recursively-passed-down argument in expression compilation.
      
      Discussion: https://postgr.es/m/32589.1513706441@sss.pgh.pa.us
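      
      For reference, a sketch of the two hooks as redesigned here (shapes
      per src/include/nodes/params.h; treat the details as illustrative),
      with ExecInitExprWithParams() as the new stand-alone entry point:
      
          typedef ParamExternData *(*ParamFetchHook) (ParamListInfo params,
                                                      int paramid,
                                                      bool speculative,
                                                      ParamExternData *workspace);
          
          typedef void (*ParamCompileHook) (ParamListInfo params,
                                            Param *param,
                                            ExprState *state,
                                            Datum *resv, bool *resnull);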
    • Get rid of copy_partition_key · 8a0596cb
      Alvaro Herrera authored
      That function currently exists to avoid leaking memory in
      CacheMemoryContext in case of trouble while the partition key is being
      built, but there's a better way: allocate everything in a memcxt that
      goes away if the current (sub)transaction fails, and once the partition
      key is built and no further errors can occur, make the memcxt permanent
      by making it a child of CacheMemoryContext.
      
      Reviewed-by: Tom Lane
      Discussion: https://postgr.es/m/20171027172730.eh2domlkpn4ja62m@alvherre.pgsql
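      
      A minimal sketch of the pattern adopted here (context and variable
      names illustrative):
      
          /* Build in a transaction-lifetime context: an error during
           * construction frees everything automatically. */
          MemoryContext partkeycxt;
          
          partkeycxt = AllocSetContextCreate(CurTransactionContext,
                                             "partition key",
                                             ALLOCSET_SMALL_SIZES);
          /* ... build the partition key inside partkeycxt ... */
          
          /* No further errors possible: make the context permanent. */
          MemoryContextSetParent(partkeycxt, CacheMemoryContext);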
    • Fix typo · 9ef6aba1
      Alvaro Herrera authored
    • Avoid putting build-location-dependent strings into generated files. · c98c35cd
      Tom Lane authored
      Various Perl scripts we use to generate files were in the habit of
      printing things like "generated by $0" into their output files.
      That looks like a fine idea at first glance, but it results in
      non-reproducible output, because in VPATH builds $0 won't be just
      the name of the script file, but a full path for it.  We'd prefer
      that you get identical results whether using VPATH or not, so this
      is a bad thing.
      
      Some of these places also printed their input file name(s), causing
      an additional hazard of the same type.
      
      Hence, establish a policy that thou shalt not print $0, nor input file
      pathnames, into output files (they're still allowed in error messages,
      though).  Instead just write the script name verbatim.  While we are at
      it, we can make these annotations more useful by giving the script's
      full relative path name within the PG source tree, eg instead of
      Gen_fmgrtab.pl let's print src/backend/utils/Gen_fmgrtab.pl.
      
      Not all of the changes made here actually affect any files shipped
      in finished tarballs today, but it seems best to apply the policy
      everyplace so that nobody copies unsafe code into places where it
      could matter.
      
      Christoph Berg and Tom Lane
      
      Discussion: https://postgr.es/m/20171215102223.GB31812@msg.df7cb.de
    • Cancel CV sleep during subtransaction abort. · 59d1e2b9
      Robert Haas authored
      Generally, error recovery paths that need to do things like
      LWLockReleaseAll and pgstat_report_wait_end also need to call
      ConditionVariableCancelSleep, but AbortSubTransaction was missed.
      
      Since subtransaction abort might destroy the DSM segment that
      contains the ConditionVariable stored in cv_sleep_target, this
      can result in a crash for anything using condition variables.
      
      Reported and diagnosed by Andres Freund.
      
      Discussion: http://postgr.es/m/20171221110048.rxk6464azzl5t2fi@alap3.anarazel.de
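      
      An abridged sketch of the repaired cleanup sequence in
      AbortSubTransaction():
      
          LWLockReleaseAll();
          pgstat_report_wait_end();
          ConditionVariableCancelSleep();   /* the call that was missing */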
    • Add parallel-aware hash joins. · 18042840
      Andres Freund authored
      Introduce parallel-aware hash joins that appear in EXPLAIN plans as Parallel
      Hash Join with Parallel Hash.  While hash joins could already appear in
      parallel queries, they were previously always parallel-oblivious and had a
      partial subplan only on the outer side, meaning that the work of the inner
      subplan was duplicated in every worker.
      
      After this commit, the planner will consider using a partial subplan on the
      inner side too, using the Parallel Hash node to divide the work over the
      available CPU cores and combine its results in shared memory.  If the join
      needs to be split into multiple batches in order to respect work_mem, then
      workers process different batches as much as possible and then work together
      on the remaining batches.
      
      The advantages of a parallel-aware hash join over a parallel-oblivious hash
      join used in a parallel query are that it:
      
       * avoids wasting memory on duplicated hash tables
       * avoids wasting disk space on duplicated batch files
       * divides the work of building the hash table over the CPUs
      
      One disadvantage is that there is some communication between the participating
      CPUs which might outweigh the benefits of parallelism in the case of small
      hash tables.  This is avoided by the planner's existing reluctance to supply
      partial plans for small scans, but it may be necessary to estimate
      synchronization costs in future if that situation changes.  Another is that
      outer batch 0 must be written to disk if multiple batches are required.
      
      A potential future advantage of parallel-aware hash joins is that right and
      full outer joins could be supported, since there is a single set of matched
      bits for each hashtable, but that is not yet implemented.
      
      A new GUC enable_parallel_hash is defined to control the feature, defaulting
      to on.
      
      Author: Thomas Munro
      Reviewed-By: Andres Freund, Robert Haas
      Tested-By: Rafia Sabih, Prabhat Sahu
      Discussion:
          https://postgr.es/m/CAEepm=2W=cOkiZxcg6qiFQP-dHUe09aqTrEMM7yJDrHMhDv_RA@mail.gmail.com
          https://postgr.es/m/CAEepm=37HKyJ4U6XOLi=JgfSHM3o6B-GaeO-6hkOmneTDkH+Uw@mail.gmail.com
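      
      Very roughly, the planner gate can be pictured as below; the helper
      name is invented for illustration, and only enable_parallel_hash is
      taken from the commit:
      
          if (enable_parallel_hash &&           /* new GUC, default on */
              inner_cheapest_partial != NULL)   /* partial inner plan exists */
          {
              /* Workers cooperate to build one shared hash table from the
               * partial inner plan, then probe it with their share of the
               * outer side; shows up in EXPLAIN as Parallel Hash Join over
               * a Parallel Hash node. */
              consider_parallel_hash_join(joinrel, outer_partial,
                                          inner_cheapest_partial);
          }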
  2. 20 Dec, 2017 1 commit
  3. 19 Dec, 2017 5 commits
  4. 18 Dec, 2017 6 commits
    • Add shared tuplestores. · ab9e0e71
      Andres Freund authored
      SharedTuplestore allows multiple participants to write into it and
      then read the tuples back from it in parallel.  Each reader receives
      partial results.
      
      For now it always uses disk files, but other buffering policies and
      other kinds of scans (i.e. each reader receives complete results) may
      be useful in future.
      
      The upcoming parallel hash join feature will use this facility.
      
      Author: Thomas Munro
      Reviewed-By: Peter Geoghegan, Andres Freund, Robert Haas
      Discussion: https://postgr.es/m/CAEepm=2W=cOkiZxcg6qiFQP-dHUe09aqTrEMM7yJDrHMhDv_RA@mail.gmail.com
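      
      A rough usage sketch, assuming the accessor API in
      src/include/utils/sharedtuplestore.h (signatures illustrative; sts,
      fileset and my_participant_number come from surrounding context):
      
          SharedTuplestoreAccessor *acc;
          MinimalTuple tuple;
          
          acc = sts_attach(sts, my_participant_number, fileset);
          sts_puttuple(acc, NULL, my_tuple);          /* write phase */
          sts_end_write(acc);
          
          sts_begin_parallel_scan(acc);               /* read phase */
          while ((tuple = sts_parallel_scan_next(acc, NULL)) != NULL)
              ;   /* ... process this participant's share of the tuples ... */
          sts_end_parallel_scan(acc);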
    • Move SCRAM-related name definitions to scram-common.h · 25d53269
      Peter Eisentraut authored
      The mechanism names for SCRAM and channel binding have been included
      in scram.h by the libpq frontend code, but that header references a
      set of routines which are only used by the backend.  scram-common.h,
      on the contrary, is usable by both the backend and libpq, so getting
      those names from there seems more reasonable.
      
      Author: Michael Paquier <michael.paquier@gmail.com>
    • doc: Fix figures in example description · 53cba77b
      Peter Eisentraut authored
      oversight in 244c8b46
      
      Reported-by: Blaz Merela <blaz@merela.org>
    • Fix bug in cancellation of non-exclusive backup to avoid assertion failure. · 56a95ee5
      Fujii Masao authored
      Previously an assertion failure occurred when pg_stop_backup() for
      a non-exclusive backup was aborted while it was waiting for WAL files
      to be archived. This assertion failure happened in do_pg_abort_backup(),
      which was called when a non-exclusive backup was canceled.
      do_pg_abort_backup() assumes that there is at least one non-exclusive
      backup running when it's called. But pg_stop_backup() can be canceled
      even after it marks the end of the non-exclusive backup (e.g., while
      waiting for WAL archiving), which broke the assumption that
      do_pg_abort_backup() relies on and caused an assertion failure.
      
      This commit changes do_pg_abort_backup() so that it does nothing
      when the non-exclusive backup has already been marked as completed.
      That is, the assumption is also changed, and do_pg_abort_backup()
      can now handle even the case where it's called when there is
      no backup running.
      
      Backpatch to 9.6 where SQL-callable non-exclusive backup was added.
      
      Author: Masahiko Sawada and Michael Paquier
      Reviewed-By: Robert Haas and Fujii Masao
      Discussion: https://www.postgresql.org/message-id/CAD21AoD2L1Fu2c==gnVASMyFAAaq3y-AQ2uEVj-zTCGFFjvmDg@mail.gmail.com
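      
      The fix amounts to a quick-exit guard at the top of
      do_pg_abort_backup(); a sketch (session-state names follow xlog.c,
      details illustrative):
      
          /* Do nothing unless a non-exclusive backup is still running. */
          if (sessionBackupState != SESSION_BACKUP_NON_EXCLUSIVE)
              return;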
    • Fix crashes on plans with multiple Gather (Merge) nodes. · fd7c0fa7
      Robert Haas authored
      es_query_dsa turns out to be broken by design, because it supposes
      that there is only one DSA for the whole query, whereas there is
      actually one per Gather (Merge) node.  For now, work around that
      problem by setting and clearing the pointer around the sections of
      code that might need it.  It's probably a better idea to get rid of
      es_query_dsa altogether in favor of having each node keep track
      individually of which DSA is relevant, but that seems like more than
      we would want to back-patch.
      
      Thomas Munro, reviewed and tested by Andreas Seltenreich, Amit
      Kapila, and by me.
      
      Discussion: http://postgr.es/m/CAEepm=1U6as=brnVvMNixEV2tpi8NuyQoTmO8Qef0-VV+=7MDA@mail.gmail.com
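      
      The workaround pattern, abridged (field names follow the executor
      code):
      
          /* Point es_query_dsa at this node's own DSA only while needed. */
          estate->es_query_dsa = node->pei->area;
          /* ... run the code that needs per-node DSA access ... */
          estate->es_query_dsa = NULL;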
    • Fix typo on comment · 7731c320
      Magnus Hagander authored
      Author: David Rowley
  5. 17 Dec, 2017 2 commits
  6. 16 Dec, 2017 3 commits
  7. 15 Dec, 2017 2 commits
    • Perform a lot more sanity checks when freezing tuples. · 699bf7d0
      Andres Freund authored
      The previous commit has shown that the sanity checks around freezing
      aren't strong enough. Strengthening them seems especially important
      because the existence of the bug has caused corruption that we don't
      want to make even worse during future vacuum cycles.
      
      The errors are emitted with ereport rather than elog, despite being
      "should never happen" messages, so a proper error code is emitted. To
      avoid superfluous translations, mark the messages as internal.
      
      Author: Andres Freund and Alvaro Herrera
      Reviewed-By: Alvaro Herrera, Michael Paquier
      Discussion: https://postgr.es/m/20171102112019.33wb7g5wp4zpjelu@alap3.anarazel.de
      Backpatch: 9.3-
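      
      The reporting convention described above, as a representative sketch
      (not a specific check from the commit):
      
          /* "Should never happen", yet reported with a proper error code;
           * errmsg_internal() exempts the text from translation. */
          if (!TransactionIdDidCommit(xid))
              ereport(ERROR,
                      (errcode(ERRCODE_DATA_CORRUPTED),
                       errmsg_internal("cannot freeze tuple: xid %u did not commit",
                                       xid)));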
    • Fix pruning of locked and updated tuples. · 9c2f0a6c
      Andres Freund authored
      Previously it was possible that a tuple was not pruned during vacuum,
      even though its update xmax (i.e. the updating xid in a multixact with
      both key share lockers and an updater) was below the cutoff horizon.
      
      As the freezing code assumed, rightly so, that this is not supposed
      to happen, xmax would be preserved (as a member of a new multixact or
      as xmax directly). That causes two problems: for one, the tuple is
      below the xmin horizon, which can cause trouble if the clog is
      truncated or once there's an xid wraparound. The bigger problem is
      that it will break HOT chains, which in turn can lead to two kinds of
      breakage: first, failing index lookups, which can e.g. lead to
      constraints being violated; second, future HOT prunes / vacuums can
      end up making invisible tuples visible again. There are other harmful
      scenarios.
      
      Fix the problem by recognizing that tuples can be DEAD instead of
      RECENTLY_DEAD, even if the multixactid has alive members, if the
      update_xid is below the xmin horizon. That's safe because newer
      versions of the tuple will contain the locking xids.
      
      A followup commit will harden the code somewhat against future similar
      bugs and already corrupted data.
      
      Author: Andres Freund, with changes by Alvaro Herrera
      Reported-By: Daniel Wood
      Analyzed-By: Andres Freund, Alvaro Herrera, Robert Haas, Peter
         Geoghegan, Daniel Wood, Yi Wen Wong, Michael Paquier
      Reviewed-By: Alvaro Herrera, Robert Haas, Michael Paquier
      Discussion:
          https://postgr.es/m/E5711E62-8FDF-4DCA-A888-C200BF6B5742@amazon.com
          https://postgr.es/m/20171102112019.33wb7g5wp4zpjelu@alap3.anarazel.de
      Backpatch: 9.3-
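      
      The recognition rule, as a hedged sketch (the real change lives in
      the tuple-visibility code):
      
          /* With a multixact xmax, judge deadness by the updating xid
           * alone: live lockers don't matter, since newer versions of the
           * tuple carry the locking xids. */
          if (tuple->t_infomask & HEAP_XMAX_IS_MULTI)
          {
              TransactionId update_xid = HeapTupleGetUpdateXid(tuple);
          
              if (TransactionIdIsValid(update_xid) &&
                  TransactionIdPrecedes(update_xid, OldestXmin))
                  return HEAPTUPLE_DEAD;   /* not merely RECENTLY_DEAD */
          }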
  8. 14 Dec, 2017 4 commits
  9. 13 Dec, 2017 10 commits
    • Allow executor nodes to change their ExecProcNode function. · 538d114f
      Andres Freund authored
      In order for executor nodes to be able to change their ExecProcNode function
      after ExecInitNode() has finished, provide ExecSetExecProcNode().  This allows
      any wrapper functions that only execProcnode.c knows about to be reinstalled.
      The motivation for wanting to change ExecProcNode after ExecInitNode() has
      finished is that it is not known until later whether parallel query is
      available, so if a parallel variant is to be installed then ExecInitNode()
      is too soon to decide.
      
      Author: Thomas Munro
      Reviewed-By: Andres Freund
      Discussion: https://postgr.es/m/CAEepm=09rr65VN+cAV5FgyM_z=D77Xy8Fuc9CDDDYbq3pQUezg@mail.gmail.com
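      
      Usage is a single call once the decision is known; a sketch with an
      illustrative variant name:
      
          /* After it becomes known that parallelism will be used, swap in
           * the parallel-aware variant (function name illustrative). */
          ExecSetExecProcNode(&node->ss.ps, ExecMyNodeParallel);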
    • Add pg_attribute_always_inline. · dbb3d6f0
      Andres Freund authored
      Sometimes it is useful to be able to insist that the compiler inline a
      function that its cost analysis would not normally choose to inline.
      This can be useful for instantiating different variants of a function that
      remove branches of code by constant folding.
      
      Author: Thomas Munro
      Reviewed-By: Andres Freund
      Discussion: https://postgr.es/m/CAEepm=09rr65VN+cAV5FgyM_z=D77Xy8Fuc9CDDDYbq3pQUezg@mail.gmail.com
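      
      A sketch of the macro and its intended use (compiler-specific
      spellings approximate; see src/include/c.h):
      
          #if defined(__GNUC__)                 /* GCC and Clang */
          #define pg_attribute_always_inline __attribute__((always_inline)) inline
          #elif defined(_MSC_VER)
          #define pg_attribute_always_inline __forceinline
          #else
          #define pg_attribute_always_inline inline
          #endif
          
          /* One body, several constant-folded variants (illustrative): */
          static pg_attribute_always_inline int
          scan_step(int value, bool check)
          {
              if (check)        /* folds away when callers pass a constant */
                  value += 1;
              return value;
          }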
    • Add defenses against pre-crash files to BufFileOpenShared(). · 923e8dee
      Andres Freund authored
      Crash restarts currently don't clean up temporary files, as a debugging aid.
      If a left-over file happens to have the same name as a segment file we're
      trying to create, we'll just truncate and reuse it, but there is a problem:
      BufFileOpenShared() determines how many segment files exist by trying to open
      .0, .1, .2, ... until it finds no more files.  It might be confused by a junk
      file that has the next segment number.  To defend against that, make sure we
      always create a gap after the end file by unlinking the following name if it
      exists.  Also make it an error to try to open a BufFile that doesn't exist
      (has no segment 0), so as not to encourage the development of client code
      that depends on an interface that we can't reliably provide.
      
      Author: Thomas Munro
      Reviewed-By: Andres Freund
      Discussion: https://postgr.es/m/CAEepm%3D2jhCbC_GFQJaaDhWxLB4EXtT3vVd5czuRNaqF5CWSTog%40mail.gmail.com
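      
      The gap defense, as a hedged sketch (the segment-name helper is
      invented for illustration; SharedFileSetDelete() tolerates a missing
      file when told to):
      
          char        name[MAXPGPATH];
          
          /* After writing segments 0..n-1, unlink the name segment n would
           * use, so a pre-crash leftover can't extend the sequence. */
          MakeSegmentName(name, buffile_name, n);   /* illustrative helper */
          SharedFileSetDelete(fileset, name, false);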
    • Fix parallel index scan hang with deleted or half-dead pages. · 884a6084
      Robert Haas authored
      The previous coding forgot to release the scan before seizing
      it again, leading to a lockup.
      
      Report by Patrick Hemmer.  Diagnosis by Thomas Munro.  Patch by
      Amit Kapila.
      
      Discussion: http://postgr.es/m/CAEepm=2xZUcOGP9V0O_G0=2P2wwXwPrkF=upWTCJSisUxMnuSg@mail.gmail.com
    • Revert "Fix accumulation of parallel worker instrumentation." · 1d6fb35a
      Robert Haas authored
      This reverts commit 2c09a5c1.  Per
      further discussion, that doesn't seem to be the best possible fix.
      
      Discussion: http://postgr.es/m/CAA4eK1LW2aFKzY3=vwvc=t-juzPPVWP2uT1bpx_MeyEqnM+p8g@mail.gmail.com
    • Rethink MemoryContext creation to improve performance. · 9fa6f00b
      Tom Lane authored
      This patch makes a number of interrelated changes to reduce the overhead
      involved in creating/deleting memory contexts.  The key ideas are:
      
      * Include the AllocSetContext header of an aset.c context in its first
      malloc request, rather than allocating it separately in TopMemoryContext.
      This means that we now always create an initial or "keeper" block in an
      aset, even if it never receives any allocation requests.
      
      * Create freelists in which we can save and recycle recently-destroyed
      asets (this idea is due to Robert Haas).
      
      * In the common case where the name of a context is a constant string,
      just store a pointer to it in the context header, rather than copying
      the string.
      
      The first change eliminates a palloc/pfree cycle per context, and
      also avoids bloat in TopMemoryContext, at the price that creating
      a context now involves a malloc/free cycle even if the context never
      receives any allocations.  That would be a loser for some common
      usage patterns, but recycling short-lived contexts via the freelist
      eliminates that pain.
      
      Avoiding copying constant strings not only saves strlen() and strcpy()
      overhead, but is an essential part of the freelist optimization because
      it makes the context header size constant.  Currently we make no
      attempt to use the freelist for contexts with non-constant names.
      (Perhaps someday we'll need to think harder about that, but in current
      usage, most contexts with custom names are long-lived anyway.)
      
      The freelist management in this initial commit is pretty simplistic,
      and we might want to refine it later --- but in common workloads that
      will never matter because the freelists will never get full anyway.
      
      To create a context with a non-constant name, one is now required to
      call AllocSetContextCreateExtended and specify the MEMCONTEXT_COPY_NAME
      option.  AllocSetContextCreate becomes a wrapper macro, and it includes
      a test that will complain about non-string-literal context name
      parameters on gcc and similar compilers.
      
      An unfortunate side effect of making AllocSetContextCreate a macro is
      that one is now *required* to use the size parameter abstraction macros
      (ALLOCSET_DEFAULT_SIZES and friends) with it; the pre-9.6 habit of
      writing out individual size parameters no longer works unless you
      switch to AllocSetContextCreateExtended.
      
      Internally to the memory-context-related modules, the context creation
      APIs are simplified, removing the rather baroque original design whereby
      a context-type module called mcxt.c which then called back into the
      context-type module.  That saved a bit of code duplication, but not much,
      and it prevented context-type modules from exercising control over the
      allocation of context headers.
      
      In passing, I converted the test-and-elog validation of aset size
      parameters into Asserts to save a few more cycles.  The original thought
      was that callers might compute size parameters on the fly, but in practice
      nobody does that, so it's useless to expend cycles on checking those
      numbers in production builds.
      
      Also, mark the memory context method-pointer structs "const",
      just for cleanliness.
      
      Discussion: https://postgr.es/m/2264.1512870796@sss.pgh.pa.us
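      
      Under the new rules the two entry points are used like this (a
      sketch of the constraints described above; namebuf stands for some
      computed string):
      
          MemoryContext cxt;
          
          /* Constant name: the macro form; the string is not copied. */
          cxt = AllocSetContextCreate(TopMemoryContext,
                                      "parser context",
                                      ALLOCSET_DEFAULT_SIZES);
          
          /* Computed name: use the Extended form and request a copy. */
          cxt = AllocSetContextCreateExtended(TopMemoryContext,
                                              namebuf,
                                              MEMCONTEXT_COPY_NAME,
                                              ALLOCSET_DEFAULT_SIZES);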
    • Start a separate test suite for plpgsql · 632b03da
      Peter Eisentraut authored
      The plpgsql.sql test file in the main regression tests is now by far the
      largest after numeric_big, making editing and managing the test cases
      very cumbersome.  The other PLs have their own test suites split up into
      smaller files by topic.  It would be nice to have that for plpgsql as
      well.  So, to get that started, set up test infrastructure in
      src/pl/plpgsql/src/ and split out the recently added procedure test
      cases into a new file there.  That file now mirrors the test cases added
      to the other PLs, making managing those matching tests a bit easier too.
      
      msvc build system changes with help from Michael Paquier
    • Fix crash when using CALL on an aggregate · 3d887422
      Peter Eisentraut authored
      Author: Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>
      Reported-by: Rushabh Lathia <rushabh.lathia@gmail.com>
    • Add float.h include to int8.c, for isnan(). · 8e211f53
      Andres Freund authored
      port.h redirects isnan() to _isnan() on Windows, which in turn is
      provided by float.h rather than math.h.  Therefore include the former
      as well.
      
      Per buildfarm.
    • Consistently use PG_INT(16|32|64)_(MIN|MAX). · f512a6e1
      Andres Freund authored
      Per buildfarm animal woodlouse.