1. 18 Jan, 2017 1 commit
    • Tom Lane's avatar
      Improve RLS planning by marking individual quals with security levels. · 215b43cd
      Tom Lane authored
      In an RLS query, we must ensure that security filter quals are evaluated
      before ordinary query quals, in case the latter contain "leaky" functions
      that could expose the contents of sensitive rows.  The original
      implementation of RLS planning ensured this by pushing the scan of a
      secured table into a sub-query that it marked as a security-barrier view.
      Unfortunately this results in very inefficient plans in many cases, because
      the sub-query cannot be flattened and gets planned independently of the
      rest of the query.
      
      To fix, drop the use of sub-queries to enforce RLS qual order, and instead
      mark each qual (RestrictInfo) with a security_level field establishing its
      priority for evaluation.  Quals must be evaluated in security_level order,
      except that "leakproof" quals can be allowed to go ahead of quals of lower
      security_level, if it's helpful to do so.  This has to be enforced within
      the ordering of any one list of quals to be evaluated at a table scan node,
      and we also have to ensure that quals are not chosen for early evaluation
      (i.e., use as an index qual or TID scan qual) if they're not allowed to go
      ahead of other quals at the scan node.
      
      This is sufficient to fix the problem for RLS quals, since we only support
      RLS policies on simple tables and thus RLS quals will always exist at the
      table scan level only.  Eventually these qual ordering rules should be
      enforced for join quals as well, which would permit improving planning for
      explicit security-barrier views; but that's a task for another patch.
      
      Note that FDWs would need to be aware of these rules --- and not, for
      example, send an insecure qual for remote execution --- but since we do
      not yet allow RLS policies on foreign tables, the case doesn't arise.
      This will need to be addressed before we can allow such policies.
      
      Patch by me, reviewed by Stephen Frost and Dean Rasheed.
      
      Discussion: https://postgr.es/m/8185.1477432701@sss.pgh.pa.us
      215b43cd
  2. 01 Jul, 2016 1 commit
    • Tom Lane's avatar
      Rethink the GetForeignUpperPaths API (again). · 9e703987
      Tom Lane authored
      In the previous design, the GetForeignUpperPaths FDW callback hook was
      called before we got around to labeling upper relations with the proper
      consider_parallel flag; this meant that any upper paths created by an FDW
      would be marked not-parallel-safe.  While that's probably just as well
      right now, we aren't going to want it to be true forever.  Hence, abandon
      the idea that FDWs should be allowed to inject upper paths before the core
      code has gotten around to creating the relevant upper relation.  (Well,
      actually they still can, but it's on their own heads how well it works.)
      Instead, adopt the same API already designed for create_upper_paths_hook:
      we call GetForeignUpperPaths after each upperrel has been created and
      populated with the paths the core planner knows how to make.
      9e703987
  3. 30 Apr, 2016 1 commit
    • Tom Lane's avatar
      Fix planner crash from pfree'ing a partial path that a GatherPath uses. · c45bf575
      Tom Lane authored
      We mustn't run generate_gather_paths() during add_paths_to_joinrel(),
      because that function can be invoked multiple times for the same target
      joinrel.  Not only is it wasteful to build GatherPaths repeatedly, but
      a later add_partial_path() could delete the partial path that a previously
      created GatherPath depends on.  Instead establish the convention that we
      do generate_gather_paths() for a rel only just before set_cheapest().
      
      The code was accidentally not broken for baserels, because as of today there
      never is more than one partial path for a baserel.  But that assumption
      obviously has a pretty short half-life, so move the generate_gather_paths()
      calls for those cases as well.
      
      Also add some generic comments explaining how and why this all works.
      
      Per fuzz testing by Andreas Seltenreich.
      
      Report: <871t5pgwdt.fsf@credativ.de>
      c45bf575
  4. 21 Apr, 2016 1 commit
  5. 17 Mar, 2016 1 commit
  6. 15 Mar, 2016 1 commit
    • Tom Lane's avatar
      Add a GetForeignUpperPaths callback function for FDWs. · 101fd934
      Tom Lane authored
      This is basically like the just-added create_upper_paths_hook, but
      control is funneled only to the FDW responsible for all the baserels
      of the current query; so providing such a callback is much less likely
      to add useless overhead than using the hook function is.
      
      The documentation is a bit sketchy.  We'll likely want to improve it,
      and/or adjust the call conventions, when we get some experience with
      actually using this callback.  Hopefully somebody will find time to
      experiment with it before 9.6 feature freeze.
      101fd934
  7. 08 Mar, 2016 1 commit
  8. 07 Mar, 2016 1 commit
    • Tom Lane's avatar
      Make the upper part of the planner work by generating and comparing Paths. · 3fc6e2d7
      Tom Lane authored
      I've been saying we needed to do this for more than five years, and here it
      finally is.  This patch removes the ever-growing tangle of spaghetti logic
      that grouping_planner() used to use to try to identify the best plan for
      post-scan/join query steps.  Now, there is (nearly) independent
      consideration of each execution step, and entirely separate construction of
      Paths to represent each of the possible ways to do that step.  We choose
      the best Path or set of Paths using the same add_path() logic that's been
      used inside query_planner() for years.
      
      In addition, this patch removes the old restriction that subquery_planner()
      could return only a single Plan.  It now returns a RelOptInfo containing a
      set of Paths, just as query_planner() does, and the parent query level can
      use each of those Paths as the basis of a SubqueryScanPath at its level.
      This allows finding some optimizations that we missed before, wherein a
      subquery was capable of returning presorted data and thereby avoiding a
      sort in the parent level, making the overall cost cheaper even though
      delivering sorted output was not the cheapest plan for the subquery in
      isolation.  (A couple of regression test outputs change in consequence of
      that.  However, there is very little change in visible planner behavior
      overall, because the point of this patch is not to get immediate planning
      benefits but to create the infrastructure for future improvements.)
      
      There is a great deal left to do here.  This patch unblocks a lot of
      planner work that was basically impractical in the old code structure,
      such as allowing FDWs to implement remote aggregation, or rewriting
      plan_set_operations() to allow consideration of multiple implementation
      orders for set operations.  (The latter will likely require a full
      rewrite of plan_set_operations(); what I've done here is only to fix it
      to return Paths not Plans.)  I have also left unfinished some localized
      refactoring in createplan.c and planner.c, because it was not necessary
      to get this patch to a working state.
      
      Thanks to Robert Haas, David Rowley, and Amit Kapila for review.
      3fc6e2d7
  9. 20 Jan, 2016 1 commit
    • Robert Haas's avatar
      Support parallel joins, and make related improvements. · 45be99f8
      Robert Haas authored
      The core innovation of this patch is the introduction of the concept
      of a partial path; that is, a path which if executed in parallel will
      generate a subset of the output rows in each process.  Gathering a
      partial path produces an ordinary (complete) path.  This allows us to
      generate paths for parallel joins by joining a partial path for one
      side (which at the baserel level is currently always a Partial Seq
      Scan) to an ordinary path on the other side.  This is subject to
      various restrictions at present, especially that this strategy seems
      unlikely to be sensible for merge joins, so only nested loops and
      hash joins paths are generated.
      
      This also allows an Append node to be pushed below a Gather node in
      the case of a partitioned table.
      
      Testing revealed that early versions of this patch made poor decisions
      in some cases, which turned out to be caused by the fact that the
      original cost model for Parallel Seq Scan wasn't very good.  So this
      patch tries to make some modest improvements in that area.
      
      There is much more to be done in the area of generating good parallel
      plans in all cases, but this seems like a useful step forward.
      
      Patch by me, reviewed by Dilip Kumar and Amit Kapila.
      45be99f8
  10. 01 Oct, 2015 1 commit
  11. 06 Aug, 2015 1 commit
    • Tom Lane's avatar
      Further fixes for degenerate outer join clauses. · 8703059c
      Tom Lane authored
      Further testing revealed that commit f69b4b94 was still a few
      bricks shy of a load: minor tweaking of the previous test cases resulted
      in the same wrong-outer-join-order problem coming back.  After study
      I concluded that my previous changes in make_outerjoininfo() were just
      accidentally masking the problem, and should be reverted in favor of
      forcing syntactic join order whenever an upper outer join's predicate
      doesn't mention a lower outer join's LHS.  This still allows the
      chained-outer-joins style that is the normally optimizable case.
      
      I also tightened things up some more in join_is_legal().  It seems to me
      on review that what's really happening in the exception case where we
      ignore a mismatched special join is that we're allowing the proposed join
      to associate into the RHS of the outer join we're comparing it to.  As
      such, we should *always* insist that the proposed join be a left join,
      which eliminates a bunch of rather dubious argumentation.  The case where
      we weren't enforcing that was the one that was already known buggy anyway
      (it had a violatable Assert before the aforesaid commit) so it hardly
      deserves a lot of deference.
      
      Back-patch to all active branches, like the previous patch.  The added
      regression test case failed in all branches back to 9.1, and I think it's
      only an unrelated change in costing calculations that kept 9.0 from
      choosing a broken plan.
      8703059c
  12. 03 Jun, 2015 1 commit
    • Tom Lane's avatar
      Fix planner's cost estimation for SEMI/ANTI joins with inner indexscans. · 3f59be83
      Tom Lane authored
      When the inner side of a nestloop SEMI or ANTI join is an indexscan that
      uses all the join clauses as indexquals, it can be presumed that both
      matched and unmatched outer rows will be processed very quickly: for
      matched rows, we'll stop after fetching one row from the indexscan, while
      for unmatched rows we'll have an indexscan that finds no matching index
      entries, which should also be quick.  The planner already knew about this,
      but it was nonetheless charging for at least one full run of the inner
      indexscan, as a consequence of concerns about the behavior of materialized
      inner scans --- but those concerns don't apply in the fast case.  If the
      inner side has low cardinality (many matching rows) this could make an
      indexscan plan look far more expensive than it actually is.  To fix,
      rearrange the work in initial_cost_nestloop/final_cost_nestloop so that we
      don't add the inner scan cost until we've inspected the indexquals, and
      then we can add either the full-run cost or just the first tuple's cost as
      appropriate.
      
      Experimentation with this fix uncovered another problem: add_path and
      friends were coded to disregard cheap startup cost when considering
      parameterized paths.  That's usually okay (and desirable, because it thins
      the path herd faster); but in this fast case for SEMI/ANTI joins, it could
      result in throwing away the desired plain indexscan path in favor of a
      bitmap scan path before we ever get to the join costing logic.  In the
      many-matching-rows cases of interest here, a bitmap scan will do a lot more
      work than required, so this is a problem.  To fix, add a per-relation flag
      consider_param_startup that works like the existing consider_startup flag,
      but applies to parameterized paths, and set it for relations that are the
      inside of a SEMI or ANTI join.
      
      To make this patch reasonably safe to back-patch, care has been taken to
      avoid changing the planner's behavior except in the very narrow case of
      SEMI/ANTI joins with inner indexscans.  There are places in
      compare_path_costs_fuzzily and add_path_precheck that are not terribly
      consistent with the new approach, but changing them will affect planner
      decisions at the margins in other cases, so we'll leave that for a
      HEAD-only fix.
      
      Back-patch to 9.3; before that, the consider_startup flag didn't exist,
      meaning that the second aspect of the patch would be too invasive.
      
      Per a complaint from Peter Holzer and analysis by Tomas Vondra.
      3f59be83
  13. 20 May, 2015 1 commit
    • Heikki Linnakangas's avatar
      Collection of typo fixes. · 4fc72cc7
      Heikki Linnakangas authored
      Use "a" and "an" correctly, mostly in comments. Two error messages were
      also fixed (they were just elogs, so no translation work required). Two
      function comments in pg_proc.h were also fixed. Etsuro Fujita reported one
      of these, but I found a lot more with grep.
      
      Also fix a few other typos spotted while grepping for the a/an typos.
      For example, "consists out of ..." -> "consists of ...". Plus a "though"/
      "through" mixup reported by Euler Taveira.
      
      Many of these typos were in old code, which would be nice to backpatch to
      make future backpatching easier. But much of the code was new, and I didn't
      feel like crafting separate patches for each branch. So no backpatching.
      4fc72cc7
  14. 28 Feb, 2015 1 commit
    • Tom Lane's avatar
      Fix planning of star-schema-style queries. · b514a746
      Tom Lane authored
      Part of the intent of the parameterized-path mechanism was to handle
      star-schema queries efficiently, but some overly-restrictive search
      limiting logic added in commit e2fa76d8
      prevented such cases from working as desired.  Fix that and add a
      regression test about it.  Per gripe from Marc Cousin.
      
      This is arguably a bug rather than a new feature, so back-patch to 9.2
      where parameterized paths were introduced.
      b514a746
  15. 05 Nov, 2013 1 commit
  16. 19 Aug, 2013 1 commit
    • Tom Lane's avatar
      Fix qual-clause-misplacement issues with pulled-up LATERAL subqueries. · c64de21e
      Tom Lane authored
      In an example such as
      SELECT * FROM
        i LEFT JOIN LATERAL (SELECT * FROM j WHERE i.n = j.n) j ON true;
      it is safe to pull up the LATERAL subquery into its parent, but we must
      then treat the "i.n = j.n" clause as a qual clause of the LEFT JOIN.  The
      previous coding in deconstruct_recurse mistakenly labeled the clause as
      "is_pushed_down", resulting in wrong semantics if the clause were applied
      at the join node, as per an example submitted awhile ago by Jeremy Evans.
      To fix, postpone processing of such clauses until we return back up to
      the appropriate recursion depth in deconstruct_recurse.
      
      In addition, tighten the is-safe-to-pull-up checks in is_simple_subquery;
      we previously missed the possibility that the LATERAL subquery might itself
      contain an outer join that makes lateral references in lower quals unsafe.
      
      A regression test case equivalent to Jeremy's example was already in my
      commit of yesterday, but was giving the wrong results because of this
      bug.  This patch fixes the expected output for that, and also adds a
      test case for the second problem.
      c64de21e
  17. 18 Aug, 2013 1 commit
    • Tom Lane's avatar
      Fix planner problems with LATERAL references in PlaceHolderVars. · 9e7e29c7
      Tom Lane authored
      The planner largely failed to consider the possibility that a
      PlaceHolderVar's expression might contain a lateral reference to a Var
      coming from somewhere outside the PHV's syntactic scope.  We had a previous
      report of a problem in this area, which I tried to fix in a quick-hack way
      in commit 4da6439b, but Antonin Houska
      pointed out that there were still some problems, and investigation turned
      up other issues.  This patch largely reverts that commit in favor of a more
      thoroughly thought-through solution.  The new theory is that a PHV's
      ph_eval_at level cannot be higher than its original syntactic level.  If it
      contains lateral references, those don't change the ph_eval_at level, but
      rather they create a lateral-reference requirement for the ph_eval_at join
      relation.  The code in joinpath.c needs to handle that.
      
      Another issue is that createplan.c wasn't handling nested PlaceHolderVars
      properly.
      
      In passing, push knowledge of lateral-reference checks for join clauses
      into join_clause_is_movable_to.  This is mainly so that FDWs don't need
      to deal with it.
      
      This patch doesn't fix the original join-qual-placement problem reported by
      Jeremy Evans (and indeed, one of the new regression test cases shows the
      wrong answer because of that).  But the PlaceHolderVar problems need to be
      fixed before that issue can be addressed, so committing this separately
      seems reasonable.
      9e7e29c7
  18. 05 Aug, 2013 1 commit
    • Tom Lane's avatar
      Simplify query_planner's API by having it return the top-level RelOptInfo. · 3ced8837
      Tom Lane authored
      Formerly, query_planner returned one or possibly two Paths for the topmost
      join relation, so that grouping_planner didn't see the join RelOptInfo
      (at least not directly; it didn't have any hesitation about examining
      cheapest_path->parent, though).  However, correct selection of the Paths
      involved a significant amount of coupling between query_planner and
      grouping_planner, a problem which has gotten worse over time.  It seems
      best to give up on this API choice and instead return the topmost
      RelOptInfo explicitly.  Then grouping_planner can pull out the Paths it
      wants from the rel's path list.  In this way we can remove all knowledge
      of grouping behaviors from query_planner.
      
      The only real benefit of the old way is that in the case of an empty
      FROM clause, we never made any RelOptInfos at all, just a Path.  Now
      we have to gin up a dummy RelOptInfo to represent the empty FROM clause.
      That's not a very big deal though.
      
      While at it, simplify query_planner's API a bit more by having the caller
      set up root->tuple_fraction and root->limit_tuples, rather than passing
      those values as separate parameters.  Since query_planner no longer does
      anything with either value, requiring it to fill the PlannerInfo fields
      seemed pretty arbitrary.
      
      This patch just rearranges code; it doesn't (intentionally) change any
      behaviors.  Followup patches will do more interesting things.
      3ced8837
  19. 29 Apr, 2013 1 commit
    • Tom Lane's avatar
      Postpone creation of pathkeys lists to fix bug #8049. · db9f0e1d
      Tom Lane authored
      This patch gets rid of the concept of, and infrastructure for,
      non-canonical PathKeys; we now only ever create canonical pathkey lists.
      
      The need for non-canonical pathkeys came from the desire to have
      grouping_planner initialize query_pathkeys and related pathkey lists before
      calling query_planner.  However, since query_planner didn't actually *do*
      anything with those lists before they'd been made canonical, we can get rid
      of the whole mess by just not creating the lists at all until the point
      where we formerly canonicalized them.
      
      There are several ways in which we could implement that without making
      query_planner itself deal with grouping/sorting features (which are
      supposed to be the province of grouping_planner).  I chose to add a
      callback function to query_planner's API; other alternatives would have
      required adding more fields to PlannerInfo, which while not bad in itself
      would create an ABI break for planner-related plugins in the 9.2 release
      series.  This still breaks ABI for anything that calls query_planner
      directly, but it seems somewhat unlikely that there are any such plugins.
      
      I had originally conceived of this change as merely a step on the way to
      fixing bug #8049 from Teun Hoogendoorn; but it turns out that this fixes
      that bug all by itself, as per the added regression test.  The reason is
      that now get_eclass_for_sort_expr is adding the ORDER BY expression at the
      end of EquivalenceClass creation not the start, and so anything that is in
      a multi-member EquivalenceClass has already been created with correct
      em_nullable_relids.  I am suspicious that there are related scenarios in
      which we still need to teach get_eclass_for_sort_expr to compute correct
      nullable_relids, but am not eager to risk destabilizing either 9.2 or 9.3
      to fix bugs that are only hypothetical.  So for the moment, do this and
      stop here.
      
      Back-patch to 9.2 but not to earlier branches, since they don't exhibit
      this bug for lack of join-clause-movement logic that depends on
      em_nullable_relids being correct.  (We might have to revisit that choice
      if any related bugs turn up.)  In 9.2, don't change the signature of
      make_pathkeys_for_sortclauses nor remove canonicalize_pathkeys, so as
      not to risk more plugin breakage than we have to.
      db9f0e1d
  20. 30 Aug, 2012 1 commit
    • Tom Lane's avatar
      Adjust definition of cheapest_total_path to work better with LATERAL. · e83bb10d
      Tom Lane authored
      In the initial cut at LATERAL, I kept the rule that cheapest_total_path
      was always unparameterized, which meant it had to be NULL if the relation
      has no unparameterized paths.  It turns out to work much more nicely if
      we always have *some* path nominated as cheapest-total for each relation.
      In particular, let's still say it's the cheapest unparameterized path if
      there is one; if not, take the cheapest-total-cost path among those of
      the minimum available parameterization.  (The first rule is actually
      a special case of the second.)
      
      This allows reversion of some temporary lobotomizations I'd put in place.
      In particular, the planner can now consider hash and merge joins for
      joins below a parameter-supplying nestloop, even if there aren't any
      unparameterized paths available.  This should bring planning of
      LATERAL-containing queries to the same level as queries not using that
      feature.
      
      Along the way, simplify management of parameterized paths in add_path()
      and friends.  In the original coding for parameterized paths in 9.2,
      I tried to minimize the logic changes in add_path(), so it just treated
      parameterization as yet another dimension of comparison for paths.
      We later made it ignore pathkeys (sort ordering) of parameterized paths,
      on the grounds that ordering isn't a useful property for the path on the
      inside of a nestloop, so we might as well get rid of useless parameterized
      paths as quickly as possible.  But we didn't take that reasoning as far as
      we should have.  Startup cost isn't a useful property inside a nestloop
      either, so add_path() ought to discount startup cost of parameterized paths
      as well.  Having done that, the secondary sorting I'd implemented (in
      add_parameterized_path) is no longer needed --- any parameterized path that
      survives add_path() at all is worth considering at higher levels.  So this
      should be a bit faster as well as simpler.
      e83bb10d
  21. 19 Apr, 2012 1 commit
    • Tom Lane's avatar
      Revise parameterized-path mechanism to fix assorted issues. · 5b7b5518
      Tom Lane authored
      This patch adjusts the treatment of parameterized paths so that all paths
      with the same parameterization (same set of required outer rels) for the
      same relation will have the same rowcount estimate.  We cache the rowcount
      estimates to ensure that property, and hopefully save a few cycles too.
      Doing this makes it practical for add_path_precheck to operate without
      a rowcount estimate: it need only assume that paths with different
      parameterizations never dominate each other, which is close enough to
      true anyway for coarse filtering, because normally a more-parameterized
      path should yield fewer rows thanks to having more join clauses to apply.
      
      In add_path, we do the full nine yards of comparing rowcount estimates
      along with everything else, so that we can discard parameterized paths that
      don't actually have an advantage.  This fixes some issues I'd found with
      add_path rejecting parameterized paths on the grounds that they were more
      expensive than not-parameterized ones, even though they yielded many fewer
      rows and hence would be cheaper once subsequent joining was considered.
      
      To make the same-rowcounts assumption valid, we have to require that any
      parameterized path enforce *all* join clauses that could be obtained from
      the particular set of outer rels, even if not all of them are useful for
      indexing.  This is required at both base scans and joins.  It's a good
      thing anyway since the net impact is that join quals are checked at the
      lowest practical level in the join tree.  Hence, discard the original
      rather ad-hoc mechanism for choosing parameterization joinquals, and build
      a better one that has a more principled rule for when clauses can be moved.
      The original rule was actually buggy anyway for lack of knowledge about
      which relations are part of an outer join's outer side; getting this right
      requires adding an outer_relids field to RestrictInfo.
      5b7b5518
  22. 16 Mar, 2012 1 commit
    • Tom Lane's avatar
      Revisit handling of UNION ALL subqueries with non-Var output columns. · dd4134ea
      Tom Lane authored
      In commit 57664ed2 I tried to fix a bug
      reported by Teodor Sigaev by making non-simple-Var output columns distinct
      (by wrapping their expressions with dummy PlaceHolderVar nodes).  This did
      not work too well.  Commit b28ffd0f fixed
      some ensuing problems with matching to child indexes, but per a recent
      report from Claus Stadler, constraint exclusion of UNION ALL subqueries was
      still broken, because constant-simplification didn't handle the injected
      PlaceHolderVars well either.  On reflection, the original patch was quite
      misguided: there is no reason to expect that EquivalenceClass child members
      will be distinct.  So instead of trying to make them so, we should ensure
      that we can cope with the situation when they're not.
      
      Accordingly, this patch reverts the code changes in the above-mentioned
      commits (though the regression test cases they added stay).  Instead, I've
      added assorted defenses to make sure that duplicate EC child members don't
      cause any problems.  Teodor's original problem ("MergeAppend child's
      targetlist doesn't match MergeAppend") is addressed more directly by
      revising prepare_sort_from_pathkeys to let the parent MergeAppend's sort
      list guide creation of each child's sort list.
      
      In passing, get rid of add_sort_column; as far as I can tell, testing for
      duplicate sort keys at this stage is dead code.  Certainly it doesn't
      trigger often enough to be worth expending cycles on in ordinary queries.
      And keeping the test would've greatly complicated the new logic in
      prepare_sort_from_pathkeys, because comparing pathkey list entries against
      a previous output array requires that we not skip any entries in the list.
      
      Back-patch to 9.1, like the previous patches.  The only known issue in
      this area that wasn't caused by the ill-advised previous patches was the
      MergeAppend planning failure, which of course is not relevant before 9.1.
      It's possible that we need some of the new defenses against duplicate child
      EC entries in older branches, but until there's some clear evidence of that
      I'm going to refrain from back-patching further.
      dd4134ea
  23. 28 Jan, 2012 1 commit
    • Tom Lane's avatar
      Use parameterized paths to generate inner indexscans more flexibly. · e2fa76d8
      Tom Lane authored
      This patch fixes the planner so that it can generate nestloop-with-
      inner-indexscan plans even with one or more levels of joining between
      the indexscan and the nestloop join that is supplying the parameter.
      The executor was fixed to handle such cases some time ago, but the
      planner was not ready.  This should improve our plans in many situations
      where join ordering restrictions formerly forced complete table scans.
      
      There is probably a fair amount of tuning work yet to be done, because
      of various heuristics that have been added to limit the number of
      parameterized paths considered.  However, we are not going to find out
      what needs to be adjusted until the code gets some real-world use, so
      it's time to get it in there where it can be tested easily.
      
      Note API change for index AM amcostestimate functions.  I'm not aware of
      any non-core index AMs, but if there are any, they will need minor
      adjustments.
      e2fa76d8
  24. 20 Feb, 2011 1 commit
  25. 29 Oct, 2010 1 commit
    • Tom Lane's avatar
      Avoid creation of useless EquivalenceClasses during planning. · 14231a41
      Tom Lane authored
      Zoltan Boszormenyi exhibited a test case in which planning time was
      dominated by construction of EquivalenceClasses and PathKeys that had no
      actual relevance to the query (and in fact got discarded immediately).
      This happened because we generated PathKeys describing the sort ordering of
      every index on every table in the query, and only after that checked to see
      if the sort ordering was relevant.  The EC/PK construction code is O(N^2)
      in the number of ECs, which is all right for the intended number of such
      objects, but it gets out of hand if there are ECs for lots of irrelevant
      indexes.
      
      To fix, twiddle the handling of mergeclauses a little bit to ensure that
      every interesting EC is created before we begin path generation.  (This
      doesn't cost anything --- in fact I think it's a bit cheaper than before
      --- since we always eventually created those ECs anyway.)  Then, if an
      index column can't be found in any pre-existing EC, we know that that sort
      ordering is irrelevant for the query.  Instead of creating a useless EC,
      we can just not build a pathkey for the index column in the first place.
      The index will still be considered if it's useful for non-order-related
      reasons, but we will think of its output as unsorted.
      14231a41
  26. 14 Oct, 2010 1 commit
    • Tom Lane's avatar
      Support MergeAppend plans, to allow sorted output from append relations. · 11cad29c
      Tom Lane authored
      This patch eliminates the former need to sort the output of an Append scan
      when an ordered scan of an inheritance tree is wanted.  This should be
      particularly useful for fast-start cases such as queries with LIMIT.
      
      Original patch by Greg Stark, with further hacking by Hans-Jurgen Schonig,
      Robert Haas, and Tom Lane.
      11cad29c
  27. 20 Sep, 2010 1 commit
  28. 19 Aug, 2010 1 commit
  29. 28 Mar, 2010 1 commit
  30. 29 Sep, 2009 1 commit
    • Tom Lane's avatar
      Fix equivclass.c's not-quite-right strategy for handling X=X clauses. · 25549edb
      Tom Lane authored
      The original coding correctly noted that these aren't just redundancies
      (they're effectively X IS NOT NULL, assuming = is strict).  However, they
      got treated that way if X happened to be in a single-member EquivalenceClass
      already, which could happen if there was an ORDER BY X clause, for instance.
      The simplest and most reliable solution seems to be to not try to process
      such clauses through the EquivalenceClass machinery; just throw them back
      for traditional processing.  The amount of work that'd be needed to be
      smarter than that seems out of proportion to the benefit.
      
      Per bug #5084 from Bernt Marius Johnsen, and analysis by Andrew Gierth.
      25549edb
  31. 17 Sep, 2009 1 commit
    • Tom Lane's avatar
      Implement "join removal" for cases where the inner side of a left join · 488d70ab
      Tom Lane authored
      is unique and is not referenced above the join.  In this case the inner
      side doesn't affect the query result and can be thrown away entirely.
      Although perhaps nobody would ever write such a thing by hand, it's
      a reasonably common case in machine-generated SQL.
      
      The current implementation only recognizes the case where the inner side
      is a simple relation with a unique index matching the query conditions.
      This is enough for the use-cases that have been shown so far, but we
      might want to try to handle other cases later.
      
      Robert Haas, somewhat rewritten by Tom
      488d70ab
  32. 21 Jul, 2009 1 commit
    • Tom Lane's avatar
      Fix another semijoin-ordering bug. We already knew that we couldn't · b2c51e6e
      Tom Lane authored
      reorder a semijoin into or out of the righthand side of another semijoin,
      but actually it doesn't work to reorder it into or out of the righthand
      side of a left or antijoin, either.  Per bug #4906 from Mathieu Fenniak.
      
      This was sloppy thinking on my part.  This identity does work:
      
      	( A left join B on (Pab) ) semijoin C on (Pac)
      ==
      	( A semijoin C on (Pac) ) left join B on (Pab)
      
      but I failed to see that that doesn't mean this does:
      
      	( A left join B on (Pab) ) semijoin C on (Pbc)
      !=
      	A left join ( B semijoin C on (Pbc) ) on (Pab)
      b2c51e6e
  33. 27 Feb, 2009 1 commit
  34. 14 Aug, 2008 1 commit
    • Tom Lane's avatar
      Implement SEMI and ANTI joins in the planner and executor. (Semijoins replace · e006a24a
      Tom Lane authored
      the old JOIN_IN code, but antijoins are new functionality.)  Teach the planner
      to convert appropriate EXISTS and NOT EXISTS subqueries into semi and anti
      joins respectively.  Also, LEFT JOINs with suitable upper-level IS NULL
      filters are recognized as being anti joins.  Unify the InClauseInfo and
      OuterJoinInfo infrastructure into "SpecialJoinInfo".  With that change,
      it becomes possible to associate a SpecialJoinInfo with every join attempt,
      which permits some cleanup of join selectivity estimation.  That needs to be
      taken much further than this patch does, but the next step is to change the
      API for oprjoin selectivity functions, which seems like material for a
      separate patch.  So for the moment the output size estimates for semi and
      especially anti joins are quite bogus.
      e006a24a
  35. 02 Aug, 2008 1 commit
    • Tom Lane's avatar
      Rearrange the querytree representation of ORDER BY/GROUP BY/DISTINCT items · 95113047
      Tom Lane authored
      as per my recent proposal:
      
      1. Fold SortClause and GroupClause into a single node type SortGroupClause.
      We were already relying on them to be struct-equivalent, so using two node
      tags wasn't accomplishing much except to get in the way of comparing items
      with equal().
      
      2. Add an "eqop" field to SortGroupClause to carry the associated equality
      operator.  This is cheap for the parser to get at the same time it's looking
      up the sort operator, and storing it eliminates the need for repeated
      not-so-cheap lookups during planning.  In future this will also let us
      represent GROUP/DISTINCT operations on datatypes that have hash opclasses
      but no btree opclasses (ie, they have equality but no natural sort order).
      The previous representation simply didn't work for that, since its only
      indicator of comparison semantics was a sort operator.
      
      3. Add a hasDistinctOn boolean to struct Query to explicitly record whether
      the distinctClause came from DISTINCT or DISTINCT ON.  This allows removing
      some complicated and not 100% bulletproof code that attempted to figure
      that out from the distinctClause alone.
      
      This patch doesn't in itself create any new capability, but it's necessary
      infrastructure for future attempts to use hash-based grouping for DISTINCT
      and UNION/INTERSECT/EXCEPT.
      95113047
  36. 09 Apr, 2008 3 commits
  37. 21 Mar, 2008 1 commit
  38. 20 Mar, 2008 1 commit