1. 17 Oct, 2008 1 commit
  2. 04 Oct, 2008 1 commit
    • Tom Lane's avatar
      Implement SQL-standard WITH clauses, including WITH RECURSIVE. · 44d5be0e
      Tom Lane authored
      There are some unimplemented aspects: recursive queries must use UNION ALL
      (should allow UNION too), and we don't have SEARCH or CYCLE clauses.
      These might or might not get done for 8.4, but even without them it's a
      pretty useful feature.
      
      There are also a couple of small loose ends and definitional quibbles,
      which I'll send a memo about to pgsql-hackers shortly.  But let's land
      the patch now so we can get on with other development.
      
      Yoshiyuki Asaba, with lots of help from Tatsuo Ishii and Tom Lane
      44d5be0e
  3. 05 Sep, 2008 1 commit
    • Tom Lane's avatar
      Fix an oversight in the 8.2 patch that improved mergejoin performance by · e540b972
      Tom Lane authored
      inserting a materialize node above an inner-side sort node, when the sort is
      expected to spill to disk.  (The materialize protects the sort from having
      to support mark/restore, allowing it to do its final merge pass on-the-fly.)
      We neglected to teach cost_mergejoin about that hack, so it was failing to
      include the materialize's costs in the estimated cost of the mergejoin.
      The materialize's costs are generally going to be pretty negligible in
      comparison to the sort's, so this is only a small error and probably not
      worth back-patching; but it's still wrong.
      
      In the similar case where a materialize is inserted to protect an inner-side
      node that can't do mark/restore at all, it's still true that the materialize
      should not spill to disk, and so we should cost it cheaply rather than
      expensively.
      
      Noted while thinking about a question from Tom Raney.
      e540b972
  4. 25 Aug, 2008 1 commit
  5. 22 Aug, 2008 1 commit
    • Tom Lane's avatar
      Arrange to convert EXISTS subqueries that are equivalent to hashable IN · bd3dadda
      Tom Lane authored
      subqueries into the same thing you'd have gotten from IN (except always with
      unknownEqFalse = true, so as to get the proper semantics for an EXISTS).
      I believe this fixes the last case within CVS HEAD in which an EXISTS could
      give worse performance than an equivalent IN subquery.
      
      The tricky part of this is that if the upper query probes the EXISTS for only
      a few rows, the hashing implementation can actually be worse than the default,
      and therefore we need to make a cost-based decision about which way to use.
      But at the time when the planner generates plans for subqueries, it doesn't
      really know how many times the subquery will be executed.  The least invasive
      solution seems to be to generate both plans and postpone the choice until
      execution.  Therefore, in a query that has been optimized this way, EXPLAIN
      will show two subplans for the EXISTS, of which only one will actually get
      executed.
      
      There is a lot more that could be done based on this infrastructure: in
      particular it's interesting to consider switching to the hash plan if we start
      out using the non-hashed plan but find a lot more upper rows going by than we
      expected.  I have therefore left some minor inefficiencies in place, such as
      initializing both subplans even though we will currently only use one.
      bd3dadda
  6. 16 Aug, 2008 1 commit
    • Tom Lane's avatar
      Clean up the loose ends in selectivity estimation left by my patch for semi · d4af2a64
      Tom Lane authored
      and anti joins.  To do this, pass the SpecialJoinInfo struct for the current
      join as an additional optional argument to operator join selectivity
      estimation functions.  This allows the estimator to tell not only what kind
      of join is being formed, but which variable is on which side of the join;
      a requirement long recognized but not dealt with till now.  This also leaves
      the door open for future improvements in the estimators, such as accounting
      for the null-insertion effects of lower outer joins.  I didn't do anything
      about that in the current patch but the information is in principle deducible
      from what's passed.
      
      The patch also clarifies the definition of join selectivity for semi/anti
      joins: it's the fraction of the left input that has (at least one) match
      in the right input.  This allows getting rid of some very fuzzy thinking
      that I had committed in the original 7.4-era IN-optimization patch.
      There's probably room to estimate this better than the present patch does,
      but at least we know what to estimate.
      
      Since I had to touch CREATE OPERATOR anyway to allow a variant signature
      for join estimator functions, I took the opportunity to add a couple of
      additional checks that were missing, per my recent message to -hackers:
      * Check that estimator functions return float8;
      * Require execute permission at the time of CREATE OPERATOR on the
      operator's function as well as the estimator functions;
      * Require ownership of any pre-existing operator that's modified by
      the command.
      I also moved the lookup of the functions out of OperatorCreate() and
      into operatorcmds.c, since that seemed more consistent with most of
      the other catalog object creation processes, eg CREATE TYPE.
      d4af2a64
  7. 14 Aug, 2008 1 commit
    • Tom Lane's avatar
      Implement SEMI and ANTI joins in the planner and executor. (Semijoins replace · e006a24a
      Tom Lane authored
      the old JOIN_IN code, but antijoins are new functionality.)  Teach the planner
      to convert appropriate EXISTS and NOT EXISTS subqueries into semi and anti
      joins respectively.  Also, LEFT JOINs with suitable upper-level IS NULL
      filters are recognized as being anti joins.  Unify the InClauseInfo and
      OuterJoinInfo infrastructure into "SpecialJoinInfo".  With that change,
      it becomes possible to associate a SpecialJoinInfo with every join attempt,
      which permits some cleanup of join selectivity estimation.  That needs to be
      taken much further than this patch does, but the next step is to change the
      API for oprjoin selectivity functions, which seems like material for a
      separate patch.  So for the moment the output size estimates for semi and
      especially anti joins are quite bogus.
      e006a24a
  8. 24 Mar, 2008 1 commit
    • Tom Lane's avatar
      When a relation has been proven empty by constraint exclusion, propagate that · fd791e7b
      Tom Lane authored
      knowledge up through any joins it participates in.  We were doing that already
      in some special cases but not in the general case.  Also, defend against zero
      row estimates for the input relations in cost_mergejoin --- this fix may have
      eliminated the only scenario in which that can happen, but be safe.  Per
      report from Alex Solovey.
      fd791e7b
  9. 01 Jan, 2008 1 commit
  10. 08 Dec, 2007 1 commit
    • Tom Lane's avatar
      Fix mergejoin cost estimation so that we consider the statistical ranges of · 9fd88436
      Tom Lane authored
      the two join variables at both ends: not only trailing rows that need not be
      scanned because there cannot be a match on the other side, but initial rows
      that will be scanned without possibly having a match.  This allows a more
      realistic estimate of startup cost to be made, per recent pgsql-performance
      discussion.  In passing, fix a couple of bugs that had crept into
      mergejoinscansel: it was not quite up to speed for the task of estimating
      descending-order scans, which is a new requirement in 8.3.
      9fd88436
  11. 15 Nov, 2007 2 commits
  12. 24 Oct, 2007 1 commit
    • Tom Lane's avatar
      Fix UPDATE/DELETE WHERE CURRENT OF to support repeated update and update- · c29a9c37
      Tom Lane authored
      then-delete on the current cursor row.  The basic fix is that nodeTidscan.c
      has to apply heap_get_latest_tid() to the current-scan-TID obtained from the
      cursor query; this ensures we get the latest row version to work with.
      However, since that only works if the query plan is a TID scan, we also have
      to hack the planner to make sure only that type of plan will be selected.
      (Formerly, the planner might decide to apply a seqscan if the table is very
      small.  This change is probably a Good Thing anyway, since it's hard to see
      how a seqscan could really win.)  That means the execQual.c code to support
      CurrentOfExpr as a regular expression type is dead code, so replace it with
      just an elog().  Also, add regression tests covering these cases.  Note
      that the added tests expose the fact that re-fetching an updated row
      misbehaves if the cursor used FOR UPDATE.  That's an independent bug that
      should be fixed later.  Per report from Dharmendra Goyal.
      c29a9c37
  13. 22 Sep, 2007 1 commit
    • Tom Lane's avatar
      Fix cost estimates for EXISTS subqueries that are evaluated as initPlans · 71256875
      Tom Lane authored
      (because they are uncorrelated with the immediate parent query).  We were
      charging the full run cost to the parent node, disregarding the fact that
      only one row need be fetched for EXISTS.  While this would only be a
      cosmetic issue in most cases, it might possibly affect planning outcomes
      if the parent query were itself a subquery to some upper query.
      Per recent discussion with Steve Crawford.
      71256875
  14. 11 Jun, 2007 1 commit
    • Tom Lane's avatar
      Support UPDATE/DELETE WHERE CURRENT OF cursor_name, per SQL standard. · 6808f1b1
      Tom Lane authored
      Along the way, allow FOR UPDATE in non-WITH-HOLD cursors; there may once
      have been a reason to disallow that, but it seems to work now, and it's
      really rather necessary if you want to select a row via a cursor and then
      update it in a concurrent-safe fashion.
      
      Original patch by Arul Shaji, rather heavily editorialized by Tom Lane.
      6808f1b1
  15. 05 Jun, 2007 1 commit
    • Tom Lane's avatar
      Downgrade implicit casts to text to be assignment-only, except for the ones · 31edbadf
      Tom Lane authored
      from the other string-category types; this eliminates a lot of surprising
      interpretations that the parser could formerly make when there was no directly
      applicable operator.
      
      Create a general mechanism that supports casts to and from the standard string
      types (text,varchar,bpchar) for *every* datatype, by invoking the datatype's
      I/O functions.  These new casts are assignment-only in the to-string direction,
      explicit-only in the other, and therefore should create no surprising behavior.
      Remove a bunch of thereby-obsoleted datatype-specific casting functions.
      
      The "general mechanism" is a new expression node type CoerceViaIO that can
      actually convert between *any* two datatypes if their external text
      representations are compatible.  This is more general than needed for the
      immediate feature, but might be useful in plpgsql or other places in future.
      
      This commit does nothing about the issue that applying the concatenation
      operator || to non-text types will now fail, often with strange error messages
      due to misinterpreting the operator as array concatenation.  Since it often
      (not always) worked before, we should either make it succeed or at least give
      a more user-friendly error; but details are still under debate.
      
      Peter Eisentraut and Tom Lane
      31edbadf
  16. 21 May, 2007 1 commit
    • Tom Lane's avatar
      Teach tuplestore.c to throw away data before the "mark" point when the caller · 2415ad98
      Tom Lane authored
      is using mark/restore but not rewind or backward-scan capability.  Insert a
      materialize plan node between a mergejoin and its inner child if the inner
      child is a sort that is expected to spill to disk.  The materialize shields
      the sort from the need to do mark/restore and thereby allows it to perform
      its final merge pass on-the-fly; while the materialize itself is normally
      cheap since it won't spill to disk unless the number of tuples with equal
      key values exceeds work_mem.
      
      Greg Stark, with some kibitzing from Tom Lane.
      2415ad98
  17. 04 May, 2007 1 commit
    • Tom Lane's avatar
      Teach tuplesort.c about "top N" sorting, in which only the first N tuples · d26559db
      Tom Lane authored
      need be returned.  We keep a heap of the current best N tuples and sift-up
      new tuples into it as we scan the input.  For M input tuples this means
      only about M*log(N) comparisons instead of M*log(M), not to mention a lot
      less workspace when N is small --- avoiding spill-to-disk for large M
      is actually the most attractive thing about it.  Patch includes planner
      and executor support for invoking this facility in ORDER BY ... LIMIT
      queries.  Greg Stark, with some editorialization by moi.
      d26559db
  18. 21 Apr, 2007 2 commits
  19. 27 Mar, 2007 1 commit
    • Tom Lane's avatar
      Fix array coercion expressions to ensure that the correct volatility is · bf940763
      Tom Lane authored
      seen by code inspecting the expression.  The best way to do this seems
      to be to drop the original representation as a function invocation, and
      instead make a special expression node type that represents applying
      the element-type coercion function to each array element.  In this way
      the element function is exposed and will be checked for volatility.
      Per report from Guillaume Smet.
      bf940763
  20. 22 Feb, 2007 1 commit
    • Tom Lane's avatar
      Turn the rangetable used by the executor into a flat list, and avoid storing · eab6b8b2
      Tom Lane authored
      useless substructure for its RangeTblEntry nodes.  (I chose to keep using the
      same struct node type and just zero out the link fields for unneeded info,
      rather than making a separate ExecRangeTblEntry type --- it seemed too
      fragile to have two different rangetable representations.)
      
      Along the way, put subplans into a list in the toplevel PlannedStmt node,
      and have SubPlan nodes refer to them by list index instead of direct pointers.
      Vadim wanted to do that years ago, but I never understood what he was on about
      until now.  It makes things a *whole* lot more robust, because we can stop
      worrying about duplicate processing of subplans during expression tree
      traversals.  That's been a constant source of bugs, and it's finally gone.
      
      There are some consequent simplifications yet to be made, like not using
      a separate EState for subplans in the executor, but I'll tackle that later.
      eab6b8b2
  21. 22 Jan, 2007 2 commits
    • Tom Lane's avatar
      Put back planner's ability to cache the results of mergejoinscansel(), · 4f06c688
      Tom Lane authored
      which I had removed in the first cut of the EquivalenceClass rewrite to
      simplify that patch a little.  But it's still important --- in a four-way
      join problem mergejoinscansel() was eating about 40% of the planning time
      according to gprof.  Also, improve the EquivalenceClass code to re-use
      join RestrictInfos rather than generating fresh ones for each join
      considered.  This saves some memory space but more importantly improves
      the effectiveness of caching planning info in RestrictInfos.
      4f06c688
    • Tom Lane's avatar
      Add COST and ROWS options to CREATE/ALTER FUNCTION, plus underlying pg_proc · 5a7471c3
      Tom Lane authored
      columns procost and prorows, to allow simple user adjustment of the estimated
      cost of a function call, as well as control of the estimated number of rows
      returned by a set-returning function.  We might eventually wish to extend this
      to allow function-specific estimation routines, but there seems to be
      consensus that we should try a simple constant estimate first.  In particular
      this provides a relatively simple way to control the order in which different
      WHERE clauses are applied in a plan node, which is a Good Thing in view of the
      fact that the recent EquivalenceClass planner rewrite made that much less
      predictable than before.
      5a7471c3
  22. 20 Jan, 2007 1 commit
    • Tom Lane's avatar
      Refactor planner's pathkeys data structure to create a separate, explicit · f41803bb
      Tom Lane authored
      representation of equivalence classes of variables.  This is an extensive
      rewrite, but it brings a number of benefits:
      * planner no longer fails in the presence of "incomplete" operator families
      that don't offer operators for every possible combination of datatypes.
      * avoid generating and then discarding redundant equality clauses.
      * remove bogus assumption that derived equalities always use operators
      named "=".
      * mergejoins can work with a variety of sort orders (e.g., descending) now,
      instead of tying each mergejoinable operator to exactly one sort order.
      * better recognition of redundant sort columns.
      * can make use of equalities appearing underneath an outer join.
      f41803bb
  23. 10 Jan, 2007 1 commit
    • Tom Lane's avatar
      Change the planner-to-executor API so that the planner tells the executor · a191a169
      Tom Lane authored
      which comparison operators to use for plan nodes involving tuple comparison
      (Agg, Group, Unique, SetOp).  Formerly the executor looked up the default
      equality operator for the datatype, which was really pretty shaky, since it's
      possible that the data being fed to the node is sorted according to some
      nondefault operator class that could have an incompatible idea of equality.
      The planner knows what it has sorted by and therefore can provide the right
      equality operator to use.  Also, this change moves a couple of catalog lookups
      out of the executor and into the planner, which should help startup time for
      pre-planned queries by some small amount.  Modify the planner to remove some
      other cavalier assumptions about always being able to use the default
      operators.  Also add "nulls first/last" info to the Plan node for a mergejoin
      --- neither the executor nor the planner can cope yet, but at least the API is
      in place.
      a191a169
  24. 08 Jan, 2007 1 commit
    • Tom Lane's avatar
      Remove cost_hashjoin's very ancient hack to discourage (once, entirely forbid) · 9a9a143a
      Tom Lane authored
      hash joins with the estimated-larger relation on the inside.  There are
      several cases where doing that makes perfect sense, and in cases where it
      doesn't, the regular cost computation really ought to be able to figure that
      out.  Make some marginal tweaks in said computation to try to get results
      approximating reality a bit better.  Per an example from Shane Ambler.
      
      Also, fix an oversight in the original patch to add seq_page_cost: the costs
      of spilling a hash join to disk should be scaled by seq_page_cost.
      9a9a143a
  25. 05 Jan, 2007 1 commit
  26. 23 Dec, 2006 1 commit
    • Tom Lane's avatar
      Restructure operator classes to allow improved handling of cross-data-type · a78fcfb5
      Tom Lane authored
      cases.  Operator classes now exist within "operator families".  While most
      families are equivalent to a single class, related classes can be grouped
      into one family to represent the fact that they are semantically compatible.
      Cross-type operators are now naturally adjunct parts of a family, without
      having to wedge them into a particular opclass as we had done originally.
      
      This commit restructures the catalogs and cleans up enough of the fallout so
      that everything still works at least as well as before, but most of the work
      needed to actually improve the planner's behavior will come later.  Also,
      there are not yet CREATE/DROP/ALTER OPERATOR FAMILY commands; the only way
      to create a new family right now is to allow CREATE OPERATOR CLASS to make
      one by default.  I owe some more documentation work, too.  But that can all
      be done in smaller pieces once this infrastructure is in place.
      a78fcfb5
  27. 15 Dec, 2006 1 commit
    • Tom Lane's avatar
      Fix some planner bugs exposed by reports from Arjen van der Meijden. These · 281f4018
      Tom Lane authored
      are all in new-in-8.2 logic associated with indexability of ScalarArrayOpExpr
      (IN-clauses) or amortization of indexscan costs across repeated indexscans
      on the inside of a nestloop.  In particular:
      
      Fix some logic errors in the estimation for multiple scans induced by a
      ScalarArrayOpExpr indexqual.
      
      Include a small cost component in bitmap index scans to reflect the costs of
      manipulating the bitmap itself; this is mainly to prevent a bitmap scan from
      appearing to have the same cost as a plain indexscan for fetching a single
      tuple.
      
      Also add a per-index-scan-startup CPU cost component; while prior releases
      were clearly too pessimistic about the cost of repeated indexscans, the
      original 8.2 coding allowed the cost of an indexscan to effectively go to zero
      if repeated often enough, which is overly optimistic.
      
      Pay some attention to index correlation when estimating costs for a nestloop
      inner indexscan: this is significant when the plan fetches multiple heap
      tuples per iteration, since high correlation means those tuples are probably
      on the same or adjacent heap pages.
      281f4018
  28. 11 Nov, 2006 1 commit
  29. 10 Nov, 2006 1 commit
  30. 04 Oct, 2006 1 commit
  31. 19 Sep, 2006 1 commit
    • Tom Lane's avatar
      Improve usage of effective_cache_size parameter by assuming that all the · b74c5436
      Tom Lane authored
      tables in the query compete for cache space, not just the one we are
      currently costing an indexscan for.  This seems more realistic, and it
      definitely will help in examples recently exhibited by Stefan
      Kaltenbrunner.  To get the total size of all the tables involved, we must
      tweak the handling of 'append relations' a bit --- formerly we looked up
      information about the child tables on-the-fly during set_append_rel_pathlist,
      but it needs to be done before we start doing any cost estimation, so
      push it into the add_base_rels_to_query scan.
      b74c5436
  32. 02 Aug, 2006 1 commit
  33. 26 Jul, 2006 1 commit
  34. 22 Jul, 2006 1 commit
  35. 14 Jul, 2006 1 commit
  36. 11 Jul, 2006 1 commit
  37. 01 Jul, 2006 1 commit
    • Tom Lane's avatar
      Fix oversight in planning for multiple indexscans driven by · 08ccdf02
      Tom Lane authored
      ScalarArrayOpExpr index quals: we were estimating the right total
      number of rows returned, but treating the index-access part of the
      cost as if a single scan were fetching that many consecutive index
      tuples.  Actually we should treat it as a multiple indexscan, and
      if there are enough of 'em the Mackert-Lohman discount should kick in.
      08ccdf02