Commits · c23bc6fbb02455ee9c2e0206747a929aa79b7d01 · Abuhujair Javed / Postgres FD Implementation

09 May, 2001 2 commits

First cut at making indexscan cost estimates depend on correlation · c23bc6fb

Tom Lane authored 23 years ago

between index order and table order.

c23bc6fb
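
As a rough illustration of the idea in this commit, the sketch below blends a best-case I/O cost (index order matches table order) with a worst-case one (no relationship between the two). The function name, its inputs, and the choice to weight by the squared correlation are assumptions made for illustration; the commit message does not spell out the formula.

```python
def interpolate_io_cost(min_io_cost: float, max_io_cost: float, correlation: float) -> float:
    """Blend best-case and worst-case I/O cost by index/table order correlation.

    correlation is expected in [-1, 1]. Squaring it is an illustrative
    assumption, so that a strongly negative correlation also counts as ordered.
    """
    if not -1.0 <= correlation <= 1.0:
        raise ValueError("correlation must be in [-1, 1]")
    weight = correlation * correlation
    # weight == 0 gives the worst case, weight == 1 gives the best case.
    return max_io_cost + weight * (min_io_cost - max_io_cost)
```

With no correlation the estimate stays at the worst case; as the correlation approaches 1 or -1 it moves toward the best case.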

Cause planner to make use of average-column-width statistic that is now · 6cda3ad8

Tom Lane authored 23 years ago

collected by ANALYZE. Also, add some modest amount of intelligence to
guesses that are used for varlena columns in the absence of any ANALYZE
statistics. The 'width' reported by EXPLAIN is finally something less
than totally bogus for varlena columns ... and, in consequence, hashjoin
estimating should be a little better ...

6cda3ad8
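
A minimal sketch of how a width estimate might prefer a collected average-width statistic and fall back to a guess otherwise. The names and the 32-byte fallback are hypothetical, not taken from the planner source.

```python
from typing import Iterable, Optional, Tuple

DEFAULT_VARLENA_WIDTH = 32  # hypothetical fallback guess, in bytes


def column_width(fixed_len: Optional[int], avg_width_stat: Optional[int]) -> int:
    """Pick a width for one column: statistic first, then fixed size, then a guess."""
    if avg_width_stat is not None:
        return avg_width_stat      # collected average width, if any
    if fixed_len is not None:
        return fixed_len           # fixed-size type needs no statistic
    return DEFAULT_VARLENA_WIDTH   # variable-length column with no statistics


def row_width(columns: Iterable[Tuple[Optional[int], Optional[int]]]) -> int:
    """Sum per-column widths; each column is (fixed_len, avg_width_stat)."""
    return sum(column_width(fixed, stat) for fixed, stat in columns)
```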

07 May, 2001 1 commit

Rewrite of planner statistics-gathering code. ANALYZE is now available as · f905d65e

Tom Lane authored 23 years ago

a separate statement (though it can still be invoked as part of VACUUM, too).
pg_statistic redesigned to be more flexible about what statistics are
stored. ANALYZE now collects a list of several of the most common values,
not just one, plus a histogram (not just the min and max values). Random
sampling is used to make the process reasonably fast even on very large
tables. The number of values and histogram bins collected is now
user-settable via an ALTER TABLE command.

There is more still to do; the new stats are not being used everywhere
they could be in the planner. But the remaining changes for this project
should be localized, and the behavior is already better than before.
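
The kind of statistics described above (several most-common values plus a histogram, computed from a random sample) can be sketched as follows. This is an illustrative approximation under assumed names and defaults, not the ANALYZE implementation.

```python
import random
from collections import Counter


def collect_stats(values, sample_size=100, num_mcv=3, num_bins=4, seed=0):
    """Return (most_common_values, histogram_bounds) from a random sample.

    All parameter names and defaults here are assumptions for illustration.
    """
    if not values:
        return [], []
    rng = random.Random(seed)
    # Sample only when the input is larger than the sample size.
    sample = list(values) if len(values) <= sample_size else rng.sample(list(values), sample_size)
    mcv = Counter(sample).most_common(num_mcv)
    ordered = sorted(sample)
    # Equi-depth histogram: num_bins + 1 boundary values taken at even ranks.
    last = len(ordered) - 1
    bounds = [ordered[last * i // num_bins] for i in range(num_bins + 1)]
    return mcv, bounds
```

Because only the sample is sorted and counted, the work is bounded by `sample_size` rather than by the table size.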

A not-very-related change is that sorting now makes use of btree comparison
routines if it can find one, rather than invoking '<' twice.

f905d65e
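
The sorting change mentioned in this commit can be illustrated by counting operator invocations: deciding the order of two values with only '<' may take two calls, while a three-way comparison takes one. The helper names below are hypothetical.

```python
from functools import cmp_to_key


def sort_and_count(items):
    """Sort items two ways and count comparison-operator invocations."""
    calls = {"less_than": 0, "compare": 0}

    def less_than(a, b):
        calls["less_than"] += 1
        return a < b

    def order_via_less_than(a, b):
        # Needs a second '<' call whenever a is not less than b.
        if less_than(a, b):
            return -1
        if less_than(b, a):
            return 1
        return 0

    def compare(a, b):
        # A single three-way call, standing in for a btree comparison routine.
        calls["compare"] += 1
        return (a > b) - (a < b)

    by_less_than = sorted(items, key=cmp_to_key(order_via_less_than))
    by_compare = sorted(items, key=cmp_to_key(compare))
    assert by_less_than == by_compare
    return by_compare, calls["less_than"], calls["compare"]
```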

25 Apr, 2001 1 commit

Tweak nestloop costing to weight restart cost of inner path more heavily. · a43f20cb

Tom Lane authored 23 years ago

Without this, it was making some pretty silly decisions about whether an
expensive sub-SELECT should be the inner or outer side of a join...

a43f20cb

22 Mar, 2001 1 commit
- pgindent run. Make it all clean. · 9e155260
  Bruce Momjian authored 23 years ago
  
  9e155260
16 Feb, 2001 1 commit

Take OUTER JOIN semantics into account when estimating the size of join · b29f68f6

Tom Lane authored 24 years ago

relations.  It's not very bright, but at least it now knows that
A LEFT JOIN B must produce at least as many rows as are in A ...

b29f68f6
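
A minimal sketch of the clamp described in this commit: the usual product-times-selectivity estimate, raised to the outer relation's row count for a left join. The function and parameter names are assumptions.

```python
def estimate_join_rows(outer_rows: float, inner_rows: float,
                       selectivity: float, join_type: str = "inner") -> float:
    """Estimate join output rows; a LEFT JOIN keeps every outer row at least once."""
    rows = outer_rows * inner_rows * selectivity
    if join_type == "left":
        rows = max(rows, outer_rows)
    return rows
```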

15 Feb, 2001 1 commit
- Update a couple of obsolete comments. · 83b4ab53
  Tom Lane authored 24 years ago
  
  83b4ab53
24 Jan, 2001 1 commit
- Change Copyright from PostgreSQL, Inc to PostgreSQL Global Development Group. · 623bf843
  Bruce Momjian authored 24 years ago
  
  623bf843
12 Dec, 2000 1 commit

Cache eval cost of qualification expressions in RestrictInfo nodes to · 17b843d6

Tom Lane authored 24 years ago

avoid repeated evaluations in cost_qual_eval().  This turns out to save
a useful fraction of planning time.  No change to external representation
of RestrictInfo --- although that node type doesn't appear in stored
rules anyway.

17b843d6
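
The caching idea can be sketched as a node that remembers its evaluation cost after the first computation. The class below is a hypothetical stand-in, not the actual RestrictInfo structure.

```python
class RestrictInfoSketch:
    """Hypothetical stand-in for a clause node that caches its evaluation cost."""

    def __init__(self, clause_ops):
        self.clause_ops = clause_ops
        self._eval_cost = None  # not computed yet

    def eval_cost(self, cost_fn):
        # Compute once on first request, then reuse the stored value.
        if self._eval_cost is None:
            self._eval_cost = cost_fn(self.clause_ops)
        return self._eval_cost
```

Repeated calls to `eval_cost` on the same node invoke `cost_fn` only once.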

05 Oct, 2000 1 commit
- Add proofreader's changes to docs. · b32685a9
  Bruce Momjian authored 24 years ago
  
  Fix misspelling of disbursion to dispersion.
  
  b32685a9
29 Sep, 2000 1 commit

Subselects in FROM clause, per ISO syntax: FROM (SELECT ...) [AS] alias. · 3a94e789

Tom Lane authored 24 years ago

(Don't forget that an alias is required.)  Views reimplemented as expanding
to subselect-in-FROM.  Grouping, aggregates, DISTINCT in views actually
work now (he says optimistically).  No UNION support in subselects/views
yet, but I have some ideas about that.  Rule-related permissions checking
moved out of rewriter and into executor.
INITDB REQUIRED!

3a94e789

18 Jun, 2000 1 commit

Reimplement nodeMaterial to use a temporary BufFile (or even memory, if the · 1ee26b77

Tom Lane authored 24 years ago

materialized tupleset is small enough) instead of a temporary relation.
This was something I was thinking of doing anyway for performance, and Jan
says he needs it for TOAST because he doesn't want to cope with toasting
noname relations. With this change, the 'noname table' support in heap.c
is dead code, and I have accordingly removed it. Also clean up 'noname'
plan handling in planner --- nonames are either sort or materialize plans,
and it seems less confusing to handle them separately under those names.

1ee26b77
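
As a loose analogy for the approach described here (keep a small materialized tuple set in memory, spill to a temporary file once it grows), Python's `tempfile.SpooledTemporaryFile` behaves similarly. The class name, the pickle encoding, and the threshold are assumptions for illustration only.

```python
import io
import pickle
import tempfile


class MaterializedTuples:
    """Store rows in memory until a size threshold, then in a temp file."""

    def __init__(self, max_memory_bytes=8192):
        # SpooledTemporaryFile rolls over to disk once max_size is exceeded.
        self._store = tempfile.SpooledTemporaryFile(max_size=max_memory_bytes)

    def append(self, row):
        self._store.seek(0, io.SEEK_END)
        pickle.dump(row, self._store)

    def rescan(self):
        """Yield all stored rows from the beginning, in insertion order."""
        self._store.seek(0)
        while True:
            try:
                yield pickle.load(self._store)
            except EOFError:
                return
```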

31 May, 2000 1 commit

The heralded `Grand Unified Configuration scheme' (GUC) · 6a68f426

Peter Eisentraut authored 24 years ago

That means you can now set your options in any or all of $PGDATA/configuration,
some postmaster option (--enable-fsync=off), or a SET command. The list of
options is in backend/utils/misc/guc.c; documentation will be written posthaste.

pg_options is gone, and so is that pg_geqo config file. Also removed were backend -K,
-Q, and -T options (no longer applicable, although -d0 does the same as -Q).

Added an --enable-syslog option to configure.

Changed all callers from TPRINTF to elog(DEBUG).

6a68f426
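
One way to picture a unified configuration scheme with several sources is a layered lookup. The precedence used below (session SET over command line over configuration file over built-in defaults) is an assumption for this sketch; the commit message does not state it.

```python
from collections import ChainMap


def resolve_options(defaults, config_file, command_line, session_set):
    """Return a view in which earlier layers shadow later ones.

    The layer order is an illustrative assumption.
    """
    return ChainMap(session_set, command_line, config_file, defaults)
```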

30 May, 2000 2 commits
- Third round of fmgr updates: eliminate calls using fmgr() and · 0f1e3964
  Tom Lane authored 24 years ago
  
  fmgr_faddr() in favor of new-style calls.  Lots of cleanup of
  sloppy casts to use XXXGetDatum and DatumGetXXX ...
  
  0f1e3964
- Remove unused include files. Do not touch /port or includes used by defines. · a12a23f0
  Bruce Momjian authored 24 years ago
  
  a12a23f0
18 Apr, 2000 1 commit

Correct oversight in hashjoin cost estimation: nodeHash sizes its hash · 25442d8d

Tom Lane authored 24 years ago

table for an average of NTUP_PER_BUCKET tuples/bucket, but cost_hashjoin
was assuming a target load of one tuple/bucket.  This was causing a
noticeable underestimate of hashjoin costs.

25442d8d
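
The effect of the bucket-load assumption can be sketched as follows: if the table is sized for several tuples per bucket, each probe compares against that many tuples on average, so assuming one tuple per bucket undercounts the work. The function name and the value 10 are illustrative assumptions.

```python
import math

NTUP_PER_BUCKET = 10  # illustrative value only


def hash_probe_comparisons(inner_rows, outer_rows, tuples_per_bucket=NTUP_PER_BUCKET):
    """Expected tuple comparisons when each outer row scans one bucket."""
    num_buckets = max(1, math.ceil(inner_rows / tuples_per_bucket))
    avg_bucket_len = inner_rows / num_buckets
    return outer_rows * avg_bucket_len
```

For 1000 inner rows and 500 probes, a load of 10 tuples per bucket gives ten times as many comparisons as a load of 1.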

12 Apr, 2000 1 commit
- Ye-old pgindent run. Same 4-space tabs. · 52f77df6
  Bruce Momjian authored 24 years ago
  
  52f77df6
09 Apr, 2000 1 commit
- Further tweaking of indexscan cost estimates. · 9c38a8d2
  Tom Lane authored 24 years ago
  
  9c38a8d2
30 Mar, 2000 1 commit

Tweak indexscan cost estimation: round estimated # of tuples visited up · e55985d3

Tom Lane authored 24 years ago

to next integer.  Previously, if selectivity was small, we could compute
very tiny scan cost on the basis of estimating that only 0.001 tuple
would be fetched, which is silly.  This naturally led to some rather
silly plans...

e55985d3
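
A minimal sketch of the rounding described in this commit, with an assumed function name: a fractional tuple estimate is rounded up to the next integer before costing.

```python
import math


def tuples_visited(table_rows: float, selectivity: float) -> int:
    """Round the estimated number of tuples visited up to the next integer."""
    return math.ceil(table_rows * selectivity)
```

So an estimate of 0.001 tuple becomes 1 tuple rather than producing a near-zero scan cost.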

22 Mar, 2000 1 commit

Repair logic flaw in cost estimator: cost_nestloop() was estimating CPU · 1d5e7a6f

Tom Lane authored 24 years ago

costs using the inner path's parent->rows count as the number of tuples
processed per inner scan iteration. This is wrong when we are using an
inner indexscan with indexquals based on join clauses, because the rows
count in a Relation node reflects the selectivity of the restriction
clauses for that rel only. Upshot was that if join clause was very
selective, we'd drastically overestimate the true cost of the join.
Fix is to calculate correct output-rows estimate for an inner indexscan
when the IndexPath node is created and save it in the path node.
Change of path node doesn't require initdb, since path nodes don't
appear in saved rules.

1d5e7a6f
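
The flaw can be illustrated by comparing two inputs to the same CPU-cost formula: the relation-level row count versus a row count specific to the inner indexscan path. The formula and the per-tuple cost are simplified assumptions, not the planner's actual arithmetic.

```python
def nestloop_cpu_cost(outer_rows: float, inner_rows_per_scan: float,
                      cpu_tuple_cost: float = 0.01) -> float:
    """CPU cost of processing inner tuples once per outer row (simplified)."""
    return outer_rows * inner_rows_per_scan * cpu_tuple_cost


# Hypothetical numbers: the relation estimate ignores the join clause used as
# an indexqual, while the path-level estimate accounts for it.
relation_rows = 10000
indexscan_path_rows = 1
overestimate = nestloop_cpu_cost(100, relation_rows)
corrected = nestloop_cpu_cost(100, indexscan_path_rows)
```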

14 Mar, 2000 1 commit

Fix some bogosities in the code that deals with estimating the fraction · 6217a8c7

Tom Lane authored 24 years ago

of tuples we are going to retrieve from a sub-SELECT.  Must have been
half asleep when I did this code the first time :-(

6217a8c7

15 Feb, 2000 1 commit

New cost model for planning, incorporating a penalty for random page · b1577a7c

Tom Lane authored 25 years ago

accesses versus sequential accesses, a (very crude) estimate of the
effects of caching on random page accesses, and cost to evaluate WHERE-
clause expressions.  Export critical parameters for this model as SET
variables.  Also, create SET variables for the planner's enable flags
(enable_seqscan, enable_indexscan, etc) so that these can be controlled
more conveniently than via PGOPTIONS.

Planner now estimates both startup cost (cost before retrieving
first tuple) and total cost of each path, so it can optimize queries
with LIMIT on a reasonable basis by interpolating between these costs.
Same facility is a win for EXISTS(...) subqueries and some other cases.

Redesign pathkey representation to achieve a major speedup in planning
(I saw as much as 5X on a 10-way join); also minor changes in planner
to reduce memory consumption by recycling discarded Path nodes and
not constructing unnecessary lists.

Minor cleanups to display more-plausible costs in some cases in
EXPLAIN output.

Initdb forced by change in interface to index cost estimation
functions.

b1577a7c
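
The LIMIT optimization mentioned in this commit rests on interpolating between startup cost and total cost. A sketch under assumed names, using linear interpolation by the fraction of rows fetched:

```python
def cost_for_limit(startup_cost: float, total_cost: float,
                   total_rows: float, limit_rows: float) -> float:
    """Estimate the cost of fetching only the first limit_rows rows of a path."""
    if total_rows <= 0 or limit_rows >= total_rows:
        return total_cost
    fraction = limit_rows / total_rows
    return startup_cost + fraction * (total_cost - startup_cost)
```

With hypothetical paths over 1000 rows, a sort-based path (startup 100, total 110) and an index path (startup 0, total 400) swap places depending on the limit: fetching 250 rows favors the index path, fetching 500 favors the sort.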

07 Feb, 2000 1 commit

Repair planning bugs caused by my misguided removal of restrictinfo link · d8733ce6

Tom Lane authored 25 years ago

fields in JoinPaths --- turns out that we do need that after all :-(.
Also, rearrange planner so that only one RelOptInfo is created for a
particular set of joined base relations, no matter how many different
subsets of relations it can be created from. This saves memory and
processing time compared to the old method of making a bunch of RelOptInfos
and then removing the duplicates. Clean up the jointree iteration logic;
not sure if it's better, but I sure find it more readable and plausible
now, particularly for the case of 'bushy plans'.

d8733ce6
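
The "one RelOptInfo per set of base relations" idea can be sketched as a registry keyed by the set of relations, so that every join order producing the same set shares one entry. The class and field names are hypothetical.

```python
class JoinRelRegistry:
    """Keep exactly one entry per distinct set of joined base relations."""

    def __init__(self):
        self._rels = {}

    def get_or_create(self, base_rels):
        key = frozenset(base_rels)  # order of joining does not matter
        rel = self._rels.get(key)
        if rel is None:
            rel = {"relids": key, "paths": []}
            self._rels[key] = rel
        return rel

    def __len__(self):
        return len(self._rels)
```

Looking up by set avoids building duplicate entries and removing them afterwards.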

26 Jan, 2000 1 commit

Add: · 5c25d602

Bruce Momjian authored 25 years ago

  * Portions Copyright (c) 1996-2000, PostgreSQL, Inc

to all files copyright Regents of Berkeley.  Man, that's a lot of files.

5c25d602

23 Jan, 2000 1 commit
- First cut at unifying regular selectivity estimation with indexscan · 8449df8a
  Tom Lane authored 25 years ago
  
  selectivity estimation wasn't right.  This is better...
  
  8449df8a
22 Jan, 2000 1 commit

Revise handling of index-type-specific indexscan cost estimation, per · 71ed7eb4

Tom Lane authored 25 years ago

pghackers discussion of 5-Jan-2000.  The amopselect and amopnpages
estimators are gone, and in their place is a per-AM amcostestimate
procedure (linked to from pg_am, not pg_amop).

71ed7eb4

09 Jan, 2000 1 commit

Another round of planner/optimizer work. This is just restructuring and · 166b5c1d

Tom Lane authored 25 years ago

code cleanup; no major improvements yet.  However, EXPLAIN does produce
more intuitive outputs for nested loops with indexscans now...

166b5c1d

23 Nov, 1999 1 commit
- Tid access method feature from Hiroshi Inoue, Inoue@tpf.co.jp · 6f9ff92c
  Bruce Momjian authored 25 years ago
  
  6f9ff92c
22 Aug, 1999 1 commit

Further planner/optimizer cleanups. Move all set_tlist_references · 78114cd4

Tom Lane authored 25 years ago

and fix_opids processing to a single recursive pass over the plan tree
executed at the very tail end of planning, rather than haphazardly here
and there at different places. Now that tlist Vars do not get modified
until the very end, it's possible to get rid of the klugy var_equal and
match_varid partial-matching routines, and just use plain equal()
throughout the optimizer. This is a step towards allowing merge and
hash joins to be done on expressions instead of only Vars ...

78114cd4

06 Aug, 1999 1 commit

Revise generation of hashjoin paths: generate one path per · e1fad50a

Tom Lane authored 25 years ago

hashjoinable clause, not one path for a randomly-chosen element of each
set of clauses with the same join operator. That is, if you wrote
SELECT ... WHERE t1.f1 = t2.f2 and t1.f3 = t2.f4,
and both '=' ops were the same opcode (say, all four fields are int4),
then the system would either consider hashing on f1=f2 or on f3=f4,
but it would *not* consider both possibilities. Boo hiss.
Also, revise estimation of hashjoin costs to include a penalty when the
inner join var has a high disbursion --- ie, the most common value is
pretty common. This tends to lead to badly skewed hash bucket occupancy
and way more comparisons than you'd expect on average.
I imagine that the cost calculation still needs tweaking, but at least
it generates a more reasonable plan than before on George Young's example.

e1fad50a

16 Jul, 1999 1 commit
- Final cleanup. · a71802e1
  Bruce Momjian authored 25 years ago
  
  a71802e1
15 Jul, 1999 1 commit
- Remove unused #includes in *.c files. · 2e6b1e63
  Bruce Momjian authored 25 years ago
  
  2e6b1e63
07 Jul, 1999 3 commits
- Fix spelling of variable name. · e9c977da
  Bruce Momjian authored 25 years ago
  
  e9c977da
- Cleanup of min tuple size. · 9f7ac20e
  Bruce Momjian authored 25 years ago
  
  9f7ac20e
- Fix misspelling. · 13910988
  Bruce Momjian authored 25 years ago
  
  13910988
25 May, 1999 2 commits
- Another pgindent run. Sorry folks. · fcff1cdf
  Bruce Momjian authored 25 years ago
  
  fcff1cdf
- pgindent run over code. · 07842084
  Bruce Momjian authored 25 years ago
  
  07842084
01 May, 1999 1 commit
- Clean up cost_sort some more: most callers were double-counting · 605d8494
  Tom Lane authored 25 years ago
  
  the cost of reading the source data.
  
  605d8494
30 Apr, 1999 1 commit

Clean up some bogosities in path cost estimation, like · 7a7ba335

Tom Lane authored 25 years ago

sometimes estimating an index scan of a table to be cheaper than a
sequential scan of the same tuples...

7a7ba335

05 Apr, 1999 1 commit
- Fix potential overflow problems when relation size exceeds · e91f43a1
  Tom Lane authored 25 years ago
  
  2gig.  Fix failure to reliably put the smaller relation on the inside of
  a hashjoin.
  
  e91f43a1