Skip to content
Projects
Groups
Snippets
Help
Loading...
Help
Support
Keyboard shortcuts
?
Submit feedback
Contribute to GitLab
Sign in
Toggle navigation
P
Postgres FD Implementation
Project overview
Project overview
Details
Activity
Releases
Repository
Repository
Files
Commits
Branches
Tags
Contributors
Graph
Compare
Issues
0
Issues
0
List
Boards
Labels
Milestones
Merge Requests
0
Merge Requests
0
CI / CD
CI / CD
Pipelines
Jobs
Schedules
Analytics
Analytics
CI / CD
Repository
Value Stream
Wiki
Wiki
Snippets
Snippets
Members
Members
Collapse sidebar
Close sidebar
Activity
Graph
Create a new issue
Jobs
Commits
Issue Boards
Open sidebar
Abuhujair Javed
Postgres FD Implementation
Commits
2bb6b98d
Commit
2bb6b98d
authored
Oct 16, 2001
by
Tom Lane
Browse files
Options
Browse Files
Download
Email Patches
Plain Diff
Add a section about the planner's statistics, including a description
of the new pg_stats view.
parent
daf72bba
Changes
2
Show whitespace changes
Inline
Side-by-side
Showing
2 changed files
with
199 additions
and
3 deletions
+199
-3
doc/src/sgml/perform.sgml
doc/src/sgml/perform.sgml
+195
-1
doc/src/sgml/ref/analyze.sgml
doc/src/sgml/ref/analyze.sgml
+4
-2
No files found.
doc/src/sgml/perform.sgml
View file @
2bb6b98d
<!--
<!--
$Header: /cvsroot/pgsql/doc/src/sgml/perform.sgml,v 1.1
2 2001/10/12 23:32:34 momjian
Exp $
$Header: /cvsroot/pgsql/doc/src/sgml/perform.sgml,v 1.1
3 2001/10/16 01:13:44 tgl
Exp $
-->
-->
<chapter id="performance-tips">
<chapter id="performance-tips">
...
@@ -315,6 +315,200 @@ Total runtime: 30.67 msec
...
@@ -315,6 +315,200 @@ Total runtime: 30.67 msec
</para>
</para>
</sect1>
</sect1>
<sect1 id="planner-stats">
<title>Statistics used by the Planner</title>
<para>
As we saw in the previous section, the query planner needs to estimate
the number of rows retrieved by a query in order to make good choices
of query plans. This section provides a quick look at the statistics
that the system uses for these estimates.
</para>
<para>
One component of the statistics is the total number of entries in each
table and index, as well as the number of disk blocks occupied by each
table and index. This information is kept in
<structname>pg_class</structname>'s <structfield>reltuples</structfield>
and <structfield>relpages</structfield> columns. We can look at it
with queries similar to this one:
<screen>
regression=# select relname, relkind, reltuples, relpages from pg_class
regression-# where relname like 'tenk1%';
relname | relkind | reltuples | relpages
---------------+---------+-----------+----------
tenk1 | r | 10000 | 233
tenk1_hundred | i | 10000 | 30
tenk1_unique1 | i | 10000 | 30
tenk1_unique2 | i | 10000 | 30
(4 rows)
</screen>
Here we can see that <structname>tenk1</structname> contains 10000
rows, as do its indexes, but the indexes are (unsurprisingly) much
smaller than the table.
</para>
<para>
For efficiency reasons, <structfield>reltuples</structfield>
and <structfield>relpages</structfield> are not updated on-the-fly,
and so they usually contain only approximate values (which is good
enough for the planner's purposes). They are initialized with dummy
values (presently 1000 and 10 respectively) when a table is created.
They are updated by certain commands, presently <command>VACUUM</>,
<command>ANALYZE</>, and <command>CREATE INDEX</>. A stand-alone
<command>ANALYZE</>, that is one not part of <command>VACUUM</>,
generates an approximate <structfield>reltuples</structfield> value
since it does not read every row of the table.
</para>
<para>
Most queries retrieve only a fraction of the rows in a table, due
to having WHERE clauses that restrict the rows to be examined.
The planner thus needs to make an estimate of the
<firstterm>selectivity</> of WHERE clauses, that is, the fraction of
rows that match each clause of the WHERE condition. The information
used for this task is stored in the <structname>pg_statistic</structname>
system catalog. Entries in <structname>pg_statistic</structname> are
updated by <command>ANALYZE</> and <command>VACUUM ANALYZE</> commands,
and are always approximate even when freshly updated.
</para>
<para>
Rather than look at <structname>pg_statistic</structname> directly,
it's better to look at its view <structname>pg_stats</structname>
when examining the statistics manually. <structname>pg_stats</structname>
is designed to be more easily readable. Furthermore,
<structname>pg_stats</structname> is readable by all, whereas
<structname>pg_statistic</structname> is only readable by the superuser.
(This prevents unprivileged users from learning something about
the contents of other people's tables from the statistics. The
<structname>pg_stats</structname> view is restricted to show only
rows about tables that the current user can read.)
For example, we might do:
<screen>
regression=# select attname, n_distinct, most_common_vals from pg_stats where tablename = 'road';
attname | n_distinct | most_common_vals
---------+------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
name | -0.467008 | {"I- 580 Ramp","I- 880 Ramp","Sp Railroad ","I- 580 ","I- 680 Ramp","I- 80 Ramp","14th St ","5th St ","Mission Blvd","I- 880 "}
thepath | 20 | {"[(-122.089,37.71),(-122.0886,37.711)]"}
(2 rows)
regression=#
</screen>
As of <productname>Postgres</productname> 7.2 the following columns exist
in <structname>pg_stats</structname>:
</para>
<table>
<title>pg_stats Columns</title>
<tgroup cols=3>
<thead>
<row>
<entry>Name</entry>
<entry>Type</entry>
<entry>Description</entry>
</row>
</thead>
<tbody>
<row>
<entry>tablename</entry>
<entry><type>name</type></entry>
<entry>Name of table containing column</entry>
</row>
<row>
<entry>attname</entry>
<entry><type>name</type></entry>
<entry>Column described by this row</entry>
</row>
<row>
<entry>null_frac</entry>
<entry><type>real</type></entry>
<entry>Fraction of column's entries that are NULL</entry>
</row>
<row>
<entry>avg_width</entry>
<entry><type>integer</type></entry>
<entry>Average width in bytes of column's entries</entry>
</row>
<row>
<entry>n_distinct</entry>
<entry><type>real</type></entry>
<entry>If greater than zero, the estimated number of distinct values
in the column. If less than zero, the negative of the number of
distinct values divided by the number of rows. (The negated form
is used when ANALYZE believes that the number of distinct values
is likely to increase as the table grows; the positive form is used
when the column seems to have a fixed number of possible values.)
For example, -1 indicates a unique column in which the number of
distinct values is the same as the number of rows.
</entry>
</row>
<row>
<entry>most_common_vals</entry>
<entry><type>text[]</type></entry>
<entry>A list of the most common values in the column. (Omitted if
no values seem to be more common than any others.)</entry>
</row>
<row>
<entry>most_common_freqs</entry>
<entry><type>real[]</type></entry>
<entry>A list of the frequencies of the most common values,
ie, number of occurrences of each divided by total number of rows.
</entry>
</row>
<row>
<entry>histogram_bounds</entry>
<entry><type>text[]</type></entry>
<entry>A list of values that divide the column's values into
groups of approximately equal population. The
<structfield>most_common_vals</>, if present, are omitted from the
histogram calculation. (Omitted if column datatype does not have a
<literal><</> operator, or if the <structfield>most_common_vals</>
list accounts for the entire population.)
</entry>
</row>
<row>
<entry>correlation</entry>
<entry><type>real</type></entry>
<entry>Statistical correlation between physical row ordering and
logical ordering of the column values. This ranges from -1 to +1.
When the value is near -1 or +1, an indexscan on the column will
be estimated to be cheaper than when it is near zero, due to reduction
of random access to the disk. (Omitted if column datatype does
not have a <literal><</> operator.)
</entry>
</row>
</tbody>
</tgroup>
</table>
<para>
The maximum number of entries in the <structfield>most_common_vals</>
and <structfield>histogram_bounds</> arrays can be set on a
column-by-column basis using the <command>ALTER TABLE SET STATISTICS</>
command. The default limit is presently 10 entries. Raising the limit
may allow more accurate planner estimates to be made, particularly for
columns with irregular data distributions, at the price of consuming
more space in <structname>pg_statistic</structname> and slightly more
time to compute the estimates. Conversely, a lower limit may be
appropriate for columns with simple data distributions.
</para>
</sect1>
<sect1 id="explicit-joins">
<sect1 id="explicit-joins">
<title>Controlling the Planner with Explicit JOINs</title>
<title>Controlling the Planner with Explicit JOINs</title>
...
...
doc/src/sgml/ref/analyze.sgml
View file @
2bb6b98d
<!--
<!--
$Header: /cvsroot/pgsql/doc/src/sgml/ref/analyze.sgml,v 1.
2 2001/09/03 12:57:49 petere
Exp $
$Header: /cvsroot/pgsql/doc/src/sgml/ref/analyze.sgml,v 1.
3 2001/10/16 01:13:44 tgl
Exp $
Postgres documentation
Postgres documentation
-->
-->
...
@@ -152,7 +152,9 @@ ANALYZE [ VERBOSE ] [ <replaceable class="PARAMETER">table</replaceable> [ (<rep
...
@@ -152,7 +152,9 @@ ANALYZE [ VERBOSE ] [ <replaceable class="PARAMETER">table</replaceable> [ (<rep
distribution in each column. One or both of these may be omitted if
distribution in each column. One or both of these may be omitted if
<command>ANALYZE</command> deems them uninteresting (for example, in
<command>ANALYZE</command> deems them uninteresting (for example, in
a unique-key column, there are no common values) or if the column
a unique-key column, there are no common values) or if the column
datatype does not support the appropriate operators.
datatype does not support the appropriate operators. There is more
information about the statistics in the <citetitle>User's
Guide</citetitle>.
</para>
</para>
<para>
<para>
...
...
Write
Preview
Markdown
is supported
0%
Try again
or
attach a new file
Attach a file
Cancel
You are about to add
0
people
to the discussion. Proceed with caution.
Finish editing this message first!
Cancel
Please
register
or
sign in
to comment