1. 30 Sep, 2015 5 commits
    • Add a Gather executor node. · 3bd909b2
      Robert Haas authored
      A Gather executor node runs any number of copies of a plan in an equal
      number of workers and merges all of the results into a single tuple
      stream.  It can also run the plan itself, if the workers are
      unavailable or haven't started up yet.  It is intended to work with
      the Partial Seq Scan node which will be added in future commits.
      
      It could also be used to implement parallel query of a different sort
      by itself, without help from Partial Seq Scan, if the single_copy mode
      is used.  In that mode, a worker executes the plan, and the parallel
      leader does not, merely collecting the worker's results.  So, a Gather
      node could be inserted into a plan to split the execution of that plan
      across two processes.  Nested Gather nodes aren't currently supported,
      but we might want to add support for that in the future.
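
      Here is a rough sketch of the merging behavior. It is not taken from
      the executor. Worker output is modeled as plain int arrays, and
      TupleQueue, GatherState and gather_next() are hypothetical names. The
      sketch reads worker queues round-robin. When none of them has a tuple
      to offer, the leader runs its own copy of the plan, unless single_copy
      is set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a worker's tuple queue: a fixed int array. */
typedef struct TupleQueue
{
    const int  *tuples;
    size_t      ntuples;
    size_t      pos;            /* next tuple to hand out */
} TupleQueue;

typedef struct GatherState
{
    TupleQueue *queues;         /* one queue per worker */
    size_t      nqueues;
    size_t      next;           /* round-robin cursor over worker queues */
    bool        single_copy;    /* if true, the leader never runs the plan */
    TupleQueue *leader_plan;    /* the leader's own copy of the plan */
} GatherState;

static bool
queue_pop(TupleQueue *q, int *out)
{
    if (q == NULL || q->pos >= q->ntuples)
        return false;
    *out = q->tuples[q->pos++];
    return true;
}

/*
 * Return the next tuple from any worker, visiting queues round-robin.
 * If no worker has anything left, fall back to the leader's own copy of
 * the plan unless single_copy mode is in effect.
 */
static bool
gather_next(GatherState *g, int *out)
{
    assert(g != NULL && out != NULL);
    for (size_t i = 0; i < g->nqueues; i++)
    {
        TupleQueue *q = &g->queues[g->next];

        g->next = (g->next + 1) % g->nqueues;
        if (queue_pop(q, out))
            return true;
    }
    if (!g->single_copy)
        return queue_pop(g->leader_plan, out);
    return false;
}
```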
      
      There's nothing in the planner to actually generate Gather nodes yet,
      so it's not quite time to break out the champagne.  But we're getting
      close.
      
      Amit Kapila.  Some design suggestions were provided by me, and I also
      reviewed the patch.  Single-copy mode, documentation, and other minor
      changes also by me.
    • Don't dump core when destroying an unused ParallelContext. · 227d57f3
      Robert Haas authored
      If a transaction or subtransaction creates a ParallelContext but ends
      without calling InitializeParallelDSM, the previous code would
      seg fault.  Fix that.
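
      A fix of this kind usually makes teardown tolerate a context whose
      initialization never ran. The struct and functions below are
      hypothetical simplifications and do not reproduce the actual
      ParallelContext API. They only show teardown checking each resource
      before releasing it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, heavily simplified stand-in for a parallel context. */
typedef struct ParallelCtx
{
    int     nworkers;
    void   *seg;            /* shared segment; NULL until initialized */
    int    *worker_pids;    /* set up together with the segment */
} ParallelCtx;

static ParallelCtx *
create_ctx(int nworkers)
{
    ParallelCtx *pcxt = calloc(1, sizeof(ParallelCtx));

    assert(pcxt != NULL);
    pcxt->nworkers = nworkers;
    return pcxt;            /* seg and worker_pids are still NULL */
}

static void
initialize_ctx(ParallelCtx *pcxt)
{
    pcxt->seg = malloc(1024);
    pcxt->worker_pids = calloc((size_t) pcxt->nworkers, sizeof(int));
    assert(pcxt->seg != NULL);
}

/* Safe to call whether or not initialize_ctx() ever ran. */
static void
destroy_ctx(ParallelCtx *pcxt)
{
    if (pcxt->worker_pids != NULL)
    {
        /* only touch per-worker state if it was actually set up */
        for (int i = 0; i < pcxt->nworkers; i++)
            pcxt->worker_pids[i] = 0;
        free(pcxt->worker_pids);
    }
    free(pcxt->seg);        /* free(NULL) is a no-op */
    free(pcxt);
}
```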
    • Include policies based on ACLs needed · 7d8db3e8
      Stephen Frost authored
      When considering which policies should be included, rather than look at
      individual bits of the query (eg: if a RETURNING clause exists, or if a
      WHERE clause exists which is referencing the table, or if it's a
      FOR SHARE/UPDATE query), consider any case where we've determined
      the user needs SELECT rights on the relation while doing an UPDATE or
      DELETE to be a case where we apply SELECT policies, and any case where
      we've determined that the user needs UPDATE rights on the relation while
      doing a SELECT to be a case where we apply UPDATE policies.
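
      The decision rule can be written as a small function over the
      permissions the query was found to need. This is a hypothetical
      sketch and not the RLS code itself. The NEED_* bits and
      choose_policies() are invented for illustration and are only loosely
      modeled on the ACL bits the planner computes.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical permission bits required on the target relation. */
enum
{
    NEED_SELECT = 1 << 0,
    NEED_UPDATE = 1 << 1,
    NEED_DELETE = 1 << 2
};

typedef enum { CMD_SELECT, CMD_UPDATE, CMD_DELETE } CmdType;

typedef struct
{
    bool apply_select_policies;
    bool apply_update_policies;
    bool apply_delete_policies;
} PolicyChoice;

static PolicyChoice
choose_policies(CmdType cmd, unsigned required_perms)
{
    PolicyChoice pc = {false, false, false};

    assert((required_perms & ~7u) == 0u);
    switch (cmd)
    {
        case CMD_SELECT:
            pc.apply_select_policies = true;
            /* e.g. SELECT ... FOR UPDATE needs UPDATE rights */
            if (required_perms & NEED_UPDATE)
                pc.apply_update_policies = true;
            break;
        case CMD_UPDATE:
            pc.apply_update_policies = true;
            /* e.g. a WHERE or RETURNING clause that reads the table */
            if (required_perms & NEED_SELECT)
                pc.apply_select_policies = true;
            break;
        case CMD_DELETE:
            pc.apply_delete_policies = true;
            if (required_perms & NEED_SELECT)
                pc.apply_select_policies = true;
            break;
    }
    return pc;
}
```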
      
      This simplifies the logic and addresses concerns that a user could use
      UPDATE or DELETE with WHERE clauses to determine if rows exist, or
      they could use SELECT .. FOR UPDATE to lock rows which they are not
      actually allowed to modify through UPDATE policies.
      
      Use list_append_unique() to avoid adding the same quals multiple times,
      as, on balance, the cost of checking when adding the quals will almost
      always be cheaper than keeping them and doing busywork for each tuple
      during execution.
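
      The trade-off is a linear duplicate check at plan time versus
      per-tuple evaluation of a redundant qual. In the sketch below, quals
      are strings, and strcmp() stands in for the node comparison (equal())
      that list_append_unique() performs. QualList and append_unique() are
      hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_QUALS 16

/* Hypothetical qual list; quals are represented as strings for brevity. */
typedef struct
{
    const char *items[MAX_QUALS];
    size_t      n;
} QualList;

/*
 * Append qual only if an equal one is not already present.  The scan is
 * paid once at plan time instead of re-evaluating a duplicate qual for
 * every tuple at execution time.
 */
static void
append_unique(QualList *l, const char *qual)
{
    for (size_t i = 0; i < l->n; i++)
        if (strcmp(l->items[i], qual) == 0)
            return;
    assert(l->n < MAX_QUALS);
    l->items[l->n++] = qual;
}
```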
      
      Back-patch to 9.5 where RLS was added.
    • Small improvements in comments in async.c. · 6057f61b
      Tom Lane authored
      We seem to have lost a line somewhere along the way in the comment block
      that discusses async.c's locks, because it suddenly refers to "both locks"
      without previously having mentioned more than one.  Add a sentence to make
      that read more sanely.  Also, refer to the "pos of the slowest backend"
      not the "tail of the slowest backend", since we have no per-backend value
      called "tail".
    • Fix incorrect tps number calculation in "excluding connections establishing". · a16db3a0
      Tatsuo Ishii authored
      The error (the reported tps being larger than the actual tps) grows as
      the number of threads decreases.  The bug has been there since thread
      support was introduced in 9.0.  Because back-patching would introduce
      incompatible behavior changes in the reported tps number, the fix is
      committed to the master and 9.5 stable branches only.

      Problem spotted by me and fix proposed by Fabien COELHO.  Note that his
      original patch also included a code refactoring unrelated to the
      problem; I omitted that part.
  2. 29 Sep, 2015 4 commits
    • Code review for transaction commit timestamps · 6b619551
      Alvaro Herrera authored
      There are three main changes here:
      
      1. No longer cause a start failure in a standby if the feature is
      disabled in postgresql.conf but enabled in the master.  This reverts one
      part of commit 4f3924d9; what we keep is the ability of the standby
      to activate/deactivate the module (which includes creating and removing
      segments as appropriate) during replay of such actions in the master.
      
      2. Replay WAL records affecting commitTS even if the feature is
      disabled.  This means the standby will always have the same state as the
      master after replay.
      
      3. Have COMMIT PREPARED record the transaction commit time as well.  We
      were previously only applying it in the normal transaction commit path.
      
      Author: Petr Jelínek
      Discussion: http://www.postgresql.org/message-id/CAHGQGwHereDzzzmfxEBYcVQu3oZv6vZcgu1TPeERWbDc+gQ06g@mail.gmail.com
      Discussion: http://www.postgresql.org/message-id/CAHGQGwFuzfO4JscM9LCAmCDCxp_MfLvN4QdB+xWsS-FijbjTYQ@mail.gmail.com
      
      Additionally, I cleaned up nearby code related to replication origins,
      which I found a bit hard to follow, and fixed a couple of typos.
      
      Backpatch to 9.5, where this code was introduced.
      
      Per bug reports from Fujii Masao and subsequent discussion.
    • Fix plperl to handle non-ASCII error message texts correctly. · b631a46e
      Tom Lane authored
      We were passing error message texts to croak() verbatim, which turns out
      not to work if the text contains non-ASCII characters; Perl mangles their
      encoding, as reported in bug #13638 from Michal Leinweber.  To fix, convert
      the text into a UTF8-encoded SV first.
      
      It's hard to test this without risking failures in different database
      encodings; but we can follow the lead of plpython, which is already
      assuming that no-break space (U+00A0) has an equivalent in all encodings
      we care about running the regression tests in (cf commit 2dfa15de).
      
      Back-patch to 9.1.  The code is quite different in 9.0, and anyway it seems
      too risky to put something like this into 9.0's final minor release.
      
      Alex Hunsaker, with suggestions from Tim Bunce and Tom Lane
    • Comment update for join pushdown. · 758fcfdc
      Robert Haas authored
      Etsuro Fujita
    • Parallel executor support. · d1b7c1ff
      Robert Haas authored
      This code provides infrastructure for a parallel leader to start up
      parallel workers to execute subtrees of the plan tree being executed
      in the master.  User-supplied parameters from ParamListInfo are passed
      down, but PARAM_EXEC parameters are not.  Various other constructs,
      such as initplans, subplans, and CTEs, are also not currently shared.
      Nevertheless, there's enough here to support a basic implementation of
      parallel query, and we can lift some of the current restrictions as
      needed.
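
      The parameter restriction can be pictured as a flat buffer that the
      leader fills and a worker reads back. The Param struct, the
      PARAM_USER/PARAM_INTERNAL tags and the byte layout are all
      hypothetical. They only mirror the rule that user-supplied values are
      shipped to workers and executor-internal (PARAM_EXEC) ones are not.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef enum { PARAM_USER, PARAM_INTERNAL } ParamSource;   /* hypothetical */

typedef struct
{
    ParamSource source;
    int32_t     id;
    int64_t     value;
} Param;

/* Leader side: write [count][id,value]... for user params only. */
static size_t
serialize_user_params(const Param *params, size_t n,
                      unsigned char *buf, size_t cap)
{
    uint32_t    count = 0;
    size_t      off = sizeof(count);

    assert(sizeof(count) <= cap);
    for (size_t i = 0; i < n; i++)
    {
        if (params[i].source != PARAM_USER)
            continue;           /* internal params are not shipped */
        assert(off + sizeof(int32_t) + sizeof(int64_t) <= cap);
        memcpy(buf + off, &params[i].id, sizeof(int32_t));
        off += sizeof(int32_t);
        memcpy(buf + off, &params[i].value, sizeof(int64_t));
        off += sizeof(int64_t);
        count++;
    }
    memcpy(buf, &count, sizeof(count));
    return off;
}

/* Worker side: read the parameters back; returns how many were found. */
static size_t
restore_params(const unsigned char *buf, Param *out, size_t max)
{
    uint32_t    count;
    size_t      off = sizeof(count);

    memcpy(&count, buf, sizeof(count));
    assert(count <= max);
    for (uint32_t i = 0; i < count; i++)
    {
        out[i].source = PARAM_USER;
        memcpy(&out[i].id, buf + off, sizeof(int32_t));
        off += sizeof(int32_t);
        memcpy(&out[i].value, buf + off, sizeof(int64_t));
        off += sizeof(int64_t);
    }
    return count;
}
```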
      
      Amit Kapila and Robert Haas
  3. 28 Sep, 2015 10 commits
  4. 27 Sep, 2015 3 commits
  5. 26 Sep, 2015 2 commits
    • Remove legacy multixact truncation support. · aa29c1cc
      Andres Freund authored
      In 9.5 and master there is no need to support legacy truncation. This is
      just committed separately to make it easier to backpatch the WAL logged
      multixact truncation to 9.3 and 9.4 if we later decide to do so.
      
      I bumped master's WAL page magic from 0xD086 to 0xD088 and 9.5's from
      0xD085 to 0xD087, to avoid 9.5 reusing a value that has been in use on
      master while keeping the numbers increasing between major versions.
      
      Discussion: 20150621192409.GA4797@alap3.anarazel.de
      Backpatch: 9.5
    • Rework the way multixact truncations work. · 4f627f89
      Andres Freund authored
      The fact that multixact truncations are not WAL logged has caused a fair
      share of problems.  Among other things, it requires doing computations
      during recovery while the database is not in a consistent state,
      delaying truncations until checkpoints, and handling the case where
      members have been truncated but offsets have not.
      
      We have tried to put band-aids on many of these issues over the last few
      years, but it seems time to change course.  Thus this patch introduces
      WAL logging for multixact truncations.
      
      This allows us:
      1) to perform the truncation directly during VACUUM, instead of delaying
         it to the checkpoint;
      2) to avoid looking at the offsets SLRU for truncation during recovery,
         since we can just use the master's values;
      3) to simplify a fair amount of the logic that keeps in-memory limits
         straight; this has gotten much easier.
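
      The core idea is that replay applies the values the primary already
      computed and does not re-derive them from local SLRU state. The
      following sketch illustrates this. All names in it (MultiXactLimits,
      TruncateRecord, FakeWal) are hypothetical, and wraparound-aware
      comparisons are omitted for brevity.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical in-memory limits that truncation advances. */
typedef struct
{
    uint32_t oldest_multi;      /* oldest multixact still needed */
    uint32_t oldest_offset;     /* its starting position in members */
} MultiXactLimits;

/* Hypothetical WAL record carrying the primary's computed values. */
typedef struct
{
    uint32_t new_oldest_multi;
    uint32_t new_oldest_offset;
} TruncateRecord;

/* A tiny stand-in for the WAL: an append-only array of records. */
typedef struct
{
    TruncateRecord recs[8];
    int            n;
} FakeWal;

/* Primary, during VACUUM: log the decision, then truncate right away. */
static void
truncate_and_log(FakeWal *wal, MultiXactLimits *lim,
                 uint32_t oldest_multi, uint32_t oldest_offset)
{
    assert(wal->n < 8);
    wal->recs[wal->n++] = (TruncateRecord) {oldest_multi, oldest_offset};
    lim->oldest_multi = oldest_multi;
    lim->oldest_offset = oldest_offset;
}

/* Standby, during replay: trust the record, never consult local SLRUs. */
static void
redo_truncate(const TruncateRecord *rec, MultiXactLimits *lim)
{
    lim->oldest_multi = rec->new_oldest_multi;
    lim->oldest_offset = rec->new_oldest_offset;
}
```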
      
      During the course of fixing this, a bunch of additional bugs had to be
      fixed:
      1) Data from the members SLRU was not purged from memory before deleting
         segments. This happened to be hard or impossible to hit due to the
         interlock between checkpoints and truncation.
      2) find_multixact_start() relied on SimpleLruDoesPhysicalPageExist, but
         that doesn't work for offsets that haven't yet been flushed to
         disk. Add code to flush the SLRUs to fix. Not pretty, but it feels
         slightly safer to only make decisions based on actual on-disk state.
      3) find_multixact_start() could be called concurrently with a truncation
         and thus fail. Via SetOffsetVacuumLimit() that could lead to a round
         of emergency vacuuming. The problem remains in
         pg_get_multixact_members(), but that's quite harmless.
      
      For now this is going to only get applied to 9.5+, leaving the issues in
      the older branches in place. It is quite possible that we need to
      backpatch at a later point though.
      
      In case this gets backpatched, we need to handle the situation where an
      updated standby is replaying WAL from a not-yet-upgraded primary.  We
      have to recognize that situation and use "old style" truncation (i.e.
      looking at the SLRUs) during WAL replay.  In contrast to before, this now
      happens in the startup process, when replaying a checkpoint record,
      instead of in the checkpointer.  Doing truncation in the restartpoint is
      incorrect, because restartpoints can happen much later than the original
      checkpoint, thereby leading to wraparound.  To avoid
      "multixact_redo: unknown op code 48" errors, standbys would have to be
      upgraded before primaries.
      
      A later patch will bump the WAL page magic, and remove the legacy
      truncation codepaths. Legacy truncation support is just included to make
      a possible future backpatch easier.
      
      Discussion: 20150621192409.GA4797@alap3.anarazel.de
      Reviewed-By: Robert Haas, Alvaro Herrera, Thomas Munro
      Backpatch: 9.5 for now
  6. 25 Sep, 2015 4 commits
    • Second try at fixing O(N^2) problem in foreign key references. · 2abfd9d5
      Tom Lane authored
      This replaces ill-fated commit 5ddc7288,
      which was reverted because it broke active uses of FK cache entries.  In
      this patch, we still do nothing more to invalidatable cache entries than
      mark them as needing revalidation, so we won't break active uses.  To keep
      down the overhead of InvalidateConstraintCacheCallBack(), keep a list of
      just the currently-valid cache entries.  (The entries are large enough that
      some added space for list links doesn't seem like a big problem.)  This
      would still be O(N^2) when there are many valid entries, though, so when
      the list gets too long, just force the "sinval reset" behavior to remove
      everything from the list.  I set the threshold at 1000 entries, somewhat
      arbitrarily.  Possibly that could be fine-tuned later.  Another item for
      future study is whether it's worth adding reference counting so that we
      could safely remove invalidated entries.  As-is, problem cases are likely
      to end up with large and mostly invalid FK caches.
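
      Here is a simplified model of the invalidation scheme. Only valid
      entries are kept on a list. Invalidation marks entries invalid without
      freeing them, and an overlong list triggers the reset-everything path.
      The 1000-entry threshold comes from the text above. The types are
      hypothetical. A hashvalue of 0 stands for a full cache reset, which is
      how PostgreSQL's syscache callbacks use it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_VALID_LIST 1000     /* threshold named in the commit message */

typedef struct CacheEntry
{
    unsigned int       hashvalue;   /* catalog row this entry depends on */
    bool               valid;
    struct CacheEntry *next_valid;  /* link in the list of valid entries */
} CacheEntry;

typedef struct
{
    CacheEntry *valid_head;
    size_t      valid_count;
} ConstraintCache;

static void
mark_valid(ConstraintCache *c, CacheEntry *e)
{
    assert(!e->valid);
    e->valid = true;
    e->next_valid = c->valid_head;
    c->valid_head = e;
    c->valid_count++;
}

/*
 * Invalidation callback.  Only the valid list is walked; entries are
 * marked invalid and unlinked but never freed, so active users survive.
 */
static void
invalidate(ConstraintCache *c, unsigned int hashvalue)
{
    bool        reset_all = (hashvalue == 0 || c->valid_count > MAX_VALID_LIST);
    CacheEntry **link = &c->valid_head;

    while (*link != NULL)
    {
        CacheEntry *e = *link;

        if (reset_all || e->hashvalue == hashvalue)
        {
            e->valid = false;
            *link = e->next_valid;
            e->next_valid = NULL;
            c->valid_count--;
        }
        else
            link = &e->next_valid;
    }
}
```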
      
      Like the previous attempt, backpatch to 9.3.
      
      Jan Wieck and Tom Lane
    • Further fix for psql's code for locale-aware formatting of numeric output. · 77130fc1
      Tom Lane authored
      (Third time's the charm, I hope.)
      
      Additional testing disclosed that this code could mangle already-localized
      output from the "money" datatype.  We can't very easily skip applying it
      to "money" values, because the logic is tied to column right-justification
      and people expect "money" output to be right-justified.  Short of
      decoupling that, we can fix it in what should be a safe enough way by
      testing to make sure the string doesn't contain any characters that would
      not be expected in plain numeric output.
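
      Below is a minimal version of that test. It assumes that plain numeric
      output never contains anything beyond digits, signs, a decimal point
      and an exponent marker. That character set, and the function name,
      are assumptions of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical check: is s made only of characters plain numbers use? */
static bool
looks_like_plain_number(const char *s)
{
    assert(s != NULL);
    return s[0] != '\0' && strspn(s, "0123456789+-.eE") == strlen(s);
}
```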
    • Further fix for psql's code for locale-aware formatting of numeric output. · 6325527d
      Tom Lane authored
      On closer inspection, those seemingly redundant atoi() calls were not so
      much inefficient as just plain wrong: the author of this code either had
      not read, or had not understood, the POSIX specification for localeconv().
      The grouping field is *not* a textual digit string but separate integers
      encoded as chars.
      
      We'll follow the existing code as well as the backend's cash.c in only
      honoring the first group width, but let's at least honor it correctly.
      
      This doesn't actually result in any behavioral change in any of the
      locales I have installed on my Linux box, which may explain why nobody's
      complained; grouping width 3 is close enough to universal that it's barely
      worth considering other cases.  Still, wrong is wrong, so back-patch.
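
      A pair of helpers shows the difference between the two readings of
      the grouping field. Only localeconv() and struct lconv are real; the
      helper names are hypothetical. Falling back to a width of 3 when the
      locale gives no usable value is also an assumption of this sketch.

```c
#include <assert.h>
#include <limits.h>
#include <locale.h>
#include <stdlib.h>

/* The mistaken reading: treat lconv.grouping as a textual digit string. */
static int
buggy_group_width(const char *grouping)
{
    return atoi(grouping);      /* "\3" has no digit characters, so 0 */
}

/*
 * The POSIX reading: each char of grouping is itself a small integer.
 * Only the first group width is honored.  An empty string (as in the C
 * locale), CHAR_MAX, or a negative value yields no usable width, and
 * this sketch then falls back to 3.
 */
static int
first_group_width(const char *grouping)
{
    if (grouping == NULL || grouping[0] == '\0' ||
        grouping[0] == CHAR_MAX || grouping[0] < 0)
        return 3;
    return grouping[0];
}

static int
current_locale_group_width(void)
{
    return first_group_width(localeconv()->grouping);
}
```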
    • Fix psql's code for locale-aware formatting of numeric output. · 4778a0bd
      Tom Lane authored
      This code did the wrong thing entirely for numbers with an exponent
      but no decimal point (e.g., '1e6'), as reported by Jeff Janes in
      bug #13636.  More generally, it made lots of unverified assumptions
      about what the input string could possibly look like.  Rearrange so
      that it only fools with leading digits that it's directly verified
      are there, and an immediately adjacent decimal point.  While at it,
      get rid of some useless inefficiencies, like converting the grouping
      count string to integer over and over (and over).
      
      This has been broken for a long time, so back-patch to all supported
      branches.
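
      The following sketch shows the more careful approach. It groups only
      the run of leading digits that was actually found. It translates a
      decimal point only when that point directly follows the run. Everything
      else is copied verbatim, so '1e6' passes through untouched. The
      function name and signature are hypothetical, and single-character
      separators are used for simplicity.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Returns 0 on success, -1 if out (of size outsz) is too small.
 */
static int
format_numeric_sketch(const char *in, char thousands_sep, char decimal_point,
                      int group, char *out, size_t outsz)
{
    size_t      o = 0;
    size_t      i = 0;
    size_t      ndigits = 0;

    assert(group > 0 && outsz > 0);
    /* copy an optional leading sign */
    if (in[i] == '-' || in[i] == '+')
    {
        if (o + 1 >= outsz)
            return -1;
        out[o++] = in[i++];
    }
    /* count only the digits we have verified are there */
    while (isdigit((unsigned char) in[i + ndigits]))
        ndigits++;
    for (size_t d = 0; d < ndigits; d++)
    {
        /* separator at each group boundary, never before the first digit */
        if (d > 0 && (ndigits - d) % (size_t) group == 0)
        {
            if (o + 1 >= outsz)
                return -1;
            out[o++] = thousands_sep;
        }
        if (o + 1 >= outsz)
            return -1;
        out[o++] = in[i + d];
    }
    i += ndigits;
    /* only a '.' directly after the leading digits is a decimal point */
    if (ndigits > 0 && in[i] == '.')
    {
        if (o + 1 >= outsz)
            return -1;
        out[o++] = decimal_point;
        i++;
    }
    /* copy the rest (fraction digits, exponent) unchanged */
    while (in[i] != '\0')
    {
        if (o + 1 >= outsz)
            return -1;
        out[o++] = in[i++];
    }
    out[o] = '\0';
    return 0;
}
```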
  7. 24 Sep, 2015 5 commits
    • Allow planner to use expression-index stats for function calls in WHERE. · 39df0f15
      Tom Lane authored
      Previously, a function call appearing at the top level of WHERE had a
      hard-wired selectivity estimate of 0.3333333, a kludge conveniently dated
      in the source code itself to July 1992.  The expectation at the time was
      that somebody would soon implement estimator support functions analogous
      to those for operators; but no such code has appeared, nor does it seem
      likely to in the near future.  We do have an alternative solution though,
      at least for immutable functions on single relations: creating an
      expression index on the function call will allow ANALYZE to gather stats
      about the function's selectivity.  But the code in clause_selectivity()
      failed to make use of such data even if it exists.
      
      Refactor so that that will happen.  I chose to make it try this technique
      for any clause type for which clause_selectivity() doesn't have a special
      case, not just functions.  To avoid adding unnecessary overhead in the
      common case where we don't learn anything new, make selfuncs.c provide an
      API that hooks directly to examine_variable() and then var_eq_const(),
      rather than the previous coding which laboriously constructed an OpExpr
      only so that it could be expensively deconstructed again.
      
      I preserved the behavior that the default estimate for a function call
      is 0.3333333.  (For any other expression node type, it's 0.5, as before.)
      I had originally thought to make the default be 0.5 across the board, but
      changing a default estimate that's survived for twenty-three years seems
      like something not to do without a lot more testing than I care to put
      into it right now.
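
      The resulting decision fits in a few lines. This is a hypothetical
      condensation and not clause_selectivity() itself. Here frac_true
      stands for the fraction of sampled rows in which the expression was
      true. ANALYZE would gather that statistic for an expression index
      built on, say, f(x).

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { CLAUSE_FUNCTION_CALL, CLAUSE_OTHER } ClauseKind;  /* hypothetical */

#define DEFAULT_FUNC_SEL  0.3333333     /* the long-standing function default */
#define DEFAULT_OTHER_SEL 0.5

/*
 * Selectivity of a boolean clause with no special-case support: use
 * expression-index statistics when they exist, else keep the defaults.
 */
static double
boolean_clause_selectivity(ClauseKind kind, bool have_expr_stats,
                           double frac_true)
{
    if (have_expr_stats)
    {
        assert(frac_true >= 0.0 && frac_true <= 1.0);
        return frac_true;
    }
    return (kind == CLAUSE_FUNCTION_CALL) ? DEFAULT_FUNC_SEL : DEFAULT_OTHER_SEL;
}
```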
      
      Per a complaint from Jehan-Guillaume de Rorthais.  Back-patch into 9.5,
      but not further, at least for the moment.
    • Improve handling of collations in contrib/postgres_fdw. · 76f965ff
      Tom Lane authored
      If we have a local Var of say varchar type with default collation, and
      we apply a RelabelType to convert that to text with default collation, we
      don't want to consider that as creating an FDW_COLLATE_UNSAFE situation.
      It should be okay to compare that to a remote Var, so long as the remote
      Var determines the comparison collation.  (When we actually ship such an
      expression to the remote side, the local Var would become a Param with
      default collation, meaning the remote Var would in fact control the
      comparison collation, because non-default implicit collation overrides
      default implicit collation in parse_collate.c.)  To fix, be more precise
      about what FDW_COLLATE_NONE means: it applies either to a noncollatable
      data type or to a collatable type with default collation, if that collation
      can't be traced to a remote Var.  (When it can, FDW_COLLATE_SAFE is
      appropriate.)  We were essentially using that interpretation already at
      the Var/Const/Param level, but we weren't bubbling it up properly.
      
      An alternative fix would be to introduce a separate FDW_COLLATE_DEFAULT
      value to describe the second situation, but that would add more code
      without changing the actual behavior, so it didn't seem worthwhile.
      
      Also, since we're clarifying the rule to be that we care about whether
      operator/function input collations match, there seems no need to fail
      immediately upon seeing a Const/Param/non-foreign-Var with nondefault
      collation.  We only have to reject if it appears in a collation-sensitive
      context (for example, "var IS NOT NULL" is perfectly safe from a collation
      standpoint, whatever collation the var has).  So just set the state to
      UNSAFE rather than failing immediately.
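
      Here is a condensed model of the revised rules, using hypothetical
      names and a made-up OID for the default collation. Each leaf is
      classified as NONE, SAFE or UNSAFE, and the higher state wins when
      states are merged. An input whose collation cannot be traced to a
      remote Var is rejected only by a collation-sensitive operator. The
      real walker in postgres_fdw handles many more node types and cases.

```c
#include <assert.h>
#include <stdbool.h>

#define DEFAULT_COLL 100u       /* hypothetical OID of the default collation */

typedef enum { COLL_NONE, COLL_SAFE, COLL_UNSAFE } CollState;  /* ordered */

typedef struct
{
    CollState   state;
    unsigned    coll;           /* 0 means noncollatable */
} CollCxt;

/* Classify a leaf: a remote Var, or a local Var/Const/Param. */
static CollCxt
leaf_state(bool is_remote_var, unsigned coll)
{
    CollCxt     c = {COLL_NONE, coll};

    if (coll == 0)
        c.state = COLL_NONE;
    else if (is_remote_var)
        c.state = COLL_SAFE;
    else if (coll == DEFAULT_COLL)
        c.state = COLL_NONE;    /* default, not traced to a remote Var */
    else
        c.state = COLL_UNSAFE;  /* remember it, but don't fail yet */
    return c;
}

/* Merge a child's state into its parent's; higher states win. */
static void
merge_state(CollCxt *parent, CollCxt child)
{
    if (child.state > parent->state)
        *parent = child;
    else if (child.state == COLL_SAFE && parent->state == COLL_SAFE &&
             child.coll != parent->coll)
        parent->state = COLL_UNSAFE;    /* conflicting remote collations */
}

/* Collation-insensitive operators are always fine; sensitive ones need
 * their input collation to come from a remote Var. */
static bool
operator_shippable(bool collation_sensitive, unsigned inputcoll, CollCxt in)
{
    if (!collation_sensitive || inputcoll == 0)
        return true;
    return in.state == COLL_SAFE && in.coll == inputcoll;
}
```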
      
      Per report from Jeevan Chalke.  This essentially corrects some sloppy
      thinking in commit ed3ddf91, so back-patch
      to 9.3 where that logic appeared.
    • Don't zero opfuncid when reading nodes. · 9f1255ac
      Robert Haas authored
      The comments here stated that this was just in case we ever had an
      ALTER OPERATOR command that could remap an operator to a different
      function.  But those comments have been here for a long time, and no
      such command has come about.  In the absence of such a feature,
      forcing the pg_proc OID to be looked up again each time we reread a
      stored rule or similar is just a waste of cycles.  Moreover, parallel
      query needs a way to reread the exact same node tree that was written
      out, not one that has been slightly stomped on.  So just get rid of
      this for now.
      
      Per discussion with Tom Lane.
    • Make pg_controldata report newest XID with valid commit timestamp · 18d938de
      Fujii Masao authored
      Previously pg_controldata didn't report newestCommitTs; this was an
      oversight in commit 73c986ad.

      Also, this patch changes pg_resetxlog so that it uses the same wording
      as pg_controldata regarding oldestCommitTs and newestCommitTs, for the
      sake of consistency.
      
      Back-patch to 9.5 where track_commit_timestamp was added.
      
      Euler Taveira
    • Lower *_freeze_max_age minimum values. · 020235a5
      Andres Freund authored
      The old minimum values are rather large, making it time consuming to
      test related behaviour. Additionally the current limits, especially for
      multixacts, can be problematic in space-constrained systems. 10000000
      multixacts can contain a lot of members.
      
      Since there's no good reason for the current limits, lower them a good
      bit. Setting them to 0 would be a bad idea, triggering endless vacuums,
      so still retain a limit.
      
      While at it, fix autovacuum_multixact_freeze_max_age to refer to
      multixact.c instead of varsup.c.
      
      Reviewed-By: Robert Haas
      Discussion: CA+TgmoYmQPHcrc3GSs7vwvrbTkbcGD9Gik=OztbDGGrovkkEzQ@mail.gmail.com
      Backpatch: back to 9.0 (in parts)
  8. 23 Sep, 2015 5 commits
    • Make ANALYZE compute basic statistics even for types with no "=" operator. · 82e1ba7f
      Tom Lane authored
      Previously, ANALYZE simply ignored columns of datatypes that have neither
      a btree nor hash opclass (which means they have no recognized equality
      operator).  Without a notion of equality, we can't identify most-common
      values nor estimate the number of distinct values.  But we can still
      count nulls and compute the average physical column width, and those
      stats might be of value.  Moreover there are some tools out there that
      don't work so well if rows are missing from pg_statistic.  So let's
      add suitable logic for this case.
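
      Without an equality operator, two statistics can still be computed,
      and both are simple aggregates over the sample. The types below are
      hypothetical. The average width is taken over non-null values only,
      matching how pg_statistic documents stawidth.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct { bool isnull; size_t width; } SampleValue;     /* hypothetical */
typedef struct { double null_frac; double avg_width; } BasicStats;

/* Stats needing no "=" operator: NULL fraction and average non-NULL width.
 * No most-common values or distinct-count estimate can be derived. */
static BasicStats
compute_basic_stats(const SampleValue *rows, size_t nrows)
{
    BasicStats  s = {0.0, 0.0};
    size_t      nnull = 0;
    double      total_width = 0.0;

    assert(nrows > 0);
    for (size_t i = 0; i < nrows; i++)
    {
        if (rows[i].isnull)
            nnull++;
        else
            total_width += (double) rows[i].width;
    }
    s.null_frac = (double) nnull / (double) nrows;
    if (nnull < nrows)
        s.avg_width = total_width / (double) (nrows - nnull);
    return s;
}
```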
      
      While this is arguably a bug fix, it also has the potential to change
      query plans, and the gain seems not worth taking a risk of that in
      stable branches.  So back-patch into 9.5 but not further.
      
      Oleksandr Shulgin, rewritten a bit by me.
    • Add readfuncs.c support for plan nodes. · a0d9f6e4
      Robert Haas authored
      For parallel query, we need to be able to pass a Plan to a worker, so
      that it knows what it's supposed to do.  We could invent our own way
      of serializing plans for that purpose, but piggybacking on the
      existing node infrastructure seems like a much better idea.
      
      Initially, we'll probably only support a limited number of nodes
      within parallel workers, but this commit adds support for everything
      in plannodes.h except CustomScan, because doing it all at once seems
      easier than doing it piecemeal, and it makes testing this code easier,
      too.  CustomScan is excluded because making that work requires a
      larger rework of that facility.
      
      Amit Kapila, reviewed and slightly revised by me.
    • Print a MergeJoin's mergeNullsFirst array as bool, not int. · 4fe6f72b
      Robert Haas authored
      It's declared as being an array of bool, but it's printed
      differently from the way bool and arrays of bool are handled
      elsewhere.
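
      The helper below prints each array element with the same spelling
      used for scalar booleans. It is a hypothetical illustration that
      assumes the "true"/"false" spelling used elsewhere in node output.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Write " true false ..." rather than " 1 0 ..."; returns chars written. */
static size_t
write_bool_array(char *buf, size_t bufsz, const bool *arr, int n)
{
    size_t      used = 0;

    assert(bufsz > 0);
    buf[0] = '\0';
    for (int i = 0; i < n; i++)
    {
        int         w = snprintf(buf + used, bufsz - used, " %s",
                                 arr[i] ? "true" : "false");

        assert(w >= 0 && (size_t) w < bufsz - used);
        used += (size_t) w;
    }
    return used;
}
```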
      
      Patch by Amit Kapila.  Anomaly noted independently by Amit Kapila
      and KaiGai Kohei.
    • Allow autoanalyze to add pages deleted from pending list to FSM · dc943ad9
      Teodor Sigaev authored
      Commit e9568083 introduced adding pages to the FSM for ordinary inserts,
      but autoanalyze could only clean up the pending list without adding the
      deleted pages to the FSM.

      Also fix a double call of IndexFreeSpaceMapVacuum() during
      ginvacuumcleanup().
      
      Report from Fujii Masao
      Patch by me
      Review by Jeff Janes
    • Teach planstate_tree_walker about custom scans. · 262e56bc
      Robert Haas authored
      This logic was missing from ExplainPreScanNode, from which I derived
      planstate_tree_walker.  But it shouldn't be missing, especially not
      from a generic walker function, so add it.
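
      This stripped-down walker shows where the missing step fits. After
      the left and right subtrees, it also visits the child plan states
      attached to a custom scan. The PlanState layout here is hypothetical
      and much smaller than the real one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct PlanState PlanState;     /* hypothetical, minimal */
struct PlanState
{
    PlanState  *lefttree;
    PlanState  *righttree;
    PlanState **custom_children;        /* extra children of a custom scan */
    size_t      ncustom;
};

typedef bool (*walker_fn) (PlanState *node, void *context);

/* Visit every child; stop early if the callback returns true. */
static bool
planstate_walk_children(PlanState *node, walker_fn walker, void *context)
{
    if (node->lefttree && walker(node->lefttree, context))
        return true;
    if (node->righttree && walker(node->righttree, context))
        return true;
    /* the step that was missing: children of a custom scan */
    for (size_t i = 0; i < node->ncustom; i++)
        if (walker(node->custom_children[i], context))
            return true;
    return false;
}

/* Example callback: count nodes by recursing through the walker. */
static bool
count_nodes(PlanState *node, void *context)
{
    assert(context != NULL);
    (*(int *) context)++;
    return planstate_walk_children(node, count_nodes, context);
}
```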
      
      KaiGai Kohei
  9. 22 Sep, 2015 2 commits