Commit f905d65e authored by Tom Lane

Rewrite of planner statistics-gathering code. ANALYZE is now available as
a separate statement (though it can still be invoked as part of VACUUM, too).
pg_statistic redesigned to be more flexible about what statistics are
stored.  ANALYZE now collects a list of several of the most common values,
not just one, plus a histogram (not just the min and max values).  Random
sampling is used to make the process reasonably fast even on very large
tables.  The number of values and histogram bins collected is now
user-settable via an ALTER TABLE command.

There is more still to do; the new stats are not being used everywhere
they could be in the planner.  But the remaining changes for this project
should be localized, and the behavior is already better than before.

A not-very-related change is that sorting now uses a btree comparison
routine if it can find one, rather than invoking '<' twice.
parent 9583aea9
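To illustrate the sorting change mentioned in the commit message: when only a boolean '<' operator is available, a sort may need two operator calls to get a three-way result, while a btree-style comparison routine gives the same answer in one call. The following is a minimal sketch in plain C, not the code from this commit; `lt_int`, `cmp_via_lt`, `cmp_direct`, and the `operator_calls` counter are hypothetical names used only for illustration.

```c
#include <assert.h>

/* Hypothetical counter: number of operator/routine calls made so far. */
int operator_calls = 0;

/* Stand-in for a boolean '<' operator on ints (illustrative only). */
int
lt_int(int a, int b)
{
    operator_calls++;
    return a < b;
}

/* Three-way comparison built only from '<': needs up to two calls. */
int
cmp_via_lt(int a, int b)
{
    if (lt_int(a, b))
        return -1;
    if (lt_int(b, a))
        return 1;
    return 0;
}

/* Stand-in for a btree-style comparison routine: always one call. */
int
cmp_direct(int a, int b)
{
    operator_calls++;
    return (a > b) - (a < b);
}

/* Sanity check that both approaches agree on the ordering. */
void
check_comparators_agree(void)
{
    assert(cmp_via_lt(1, 2) == cmp_direct(1, 2));
    assert(cmp_via_lt(2, 1) == cmp_direct(2, 1));
    assert(cmp_via_lt(3, 3) == cmp_direct(3, 3));
}
```

For equal keys, `cmp_via_lt` always makes two calls and `cmp_direct` makes one, which is where the saving comes from in this sketch.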
<!--
Documentation of the system catalogs, directed toward PostgreSQL developers
$Header: /cvsroot/pgsql/doc/src/sgml/catalogs.sgml,v 2.15 2001/04/20 15:52:33 thomas Exp $
$Header: /cvsroot/pgsql/doc/src/sgml/catalogs.sgml,v 2.16 2001/05/07 00:43:14 tgl Exp $
-->
<chapter id="catalogs">
......@@ -16,7 +16,7 @@
<productname>PostgreSQL</productname>'s system catalogs are regular
tables. You can drop and recreate the tables, add columns, insert
and update values, and severely mess up your system that way.
Normally one never has to change the system catalogs by hand, there
Normally one should not change the system catalogs by hand, there
are always SQL commands to do that. (For example, <command>CREATE
DATABASE</command> inserts a row into the
<structname>pg_database</structname> catalog -- and actually
......@@ -185,7 +185,7 @@
<para>
<structname>pg_aggregate</structname> stores information about
aggregate functions. An aggregate function is a function that
operates on a set of values (typically one column from each the row
operates on a set of values (typically one column from each row
that matches a query condition) and returns a single value computed
from all these values. Typical aggregate functions are
<function>sum</function>, <function>count</function>, and
......@@ -233,7 +233,7 @@
<entry>aggbasetype</entry>
<entry><type>oid</type></entry>
<entry>pg_type.oid</entry>
<entry>The type on which this function operates when invoked from SQL</entry>
<entry>The input datatype for this aggregate function</entry>
</row>
<row>
<entry>aggtranstype</entry>
......@@ -269,7 +269,7 @@
<para>
An aggregate function is identified through name
<emphasis>and</emphasis> argument type. Hence aggname and aggname
<emphasis>and</emphasis> argument type. Hence aggname and aggbasetype
are the composite primary key.
</para>
......@@ -311,11 +311,8 @@
<row>
<entry>adnum</entry>
<entry><type>int2</type></entry>
<entry></entry>
<entry>
The number of the column; see
<structname>pg_attribute</structname>.<structfield>pg_attnum</structfield>
</entry>
<entry>pg_attribute.attnum</entry>
<entry>The number of the column</entry>
</row>
<row>
......@@ -390,20 +387,18 @@
</row>
<row>
<entry>attdispersion</entry>
<entry><type>float4</type></entry>
<entry>attstattarget</entry>
<entry><type>int4</type></entry>
<entry></entry>
<entry>
<structfield>attdispersion</structfield> is the dispersion
statistic of the column (0.0 to 1.0), or zero if the statistic
has not been calculated, or -1.0 if <command>VACUUM</command>
found that the column contains no duplicate entries (in which
case the dispersion should be taken as
1.0/<symbol>numberOfRows</symbol> for the current table size).
The -1.0 hack is useful because the number of rows may be
updated more often than
<structfield>attdispersion</structfield> is. We assume that the
column will retain its no-duplicate-entry property.
<structfield>attstattarget</structfield> controls the level of detail
of statistics accumulated for this column by
<command>ANALYZE</command>.
A zero value indicates that no statistics should be collected.
The exact meaning of positive values is datatype-dependent.
For scalar datatypes, <structfield>attstattarget</structfield>
is both the target number of <quote>most common values</quote>
to collect, and the target number of histogram bins to create.
</entry>
</row>
......@@ -430,10 +425,12 @@
</row>
<row>
<entry>attnelems</entry>
<entry>attndims</entry>
<entry><type>int4</type></entry>
<entry></entry>
<entry>Number of dimensions, if the column is an array</entry>
<entry>
Number of dimensions, if the column is an array; otherwise 0.
</entry>
</row>
<row>
......@@ -610,18 +607,22 @@
<entry></entry>
<entry>
Size of the on-disk representation of this table in pages (size
<symbol>BLCKSZ</symbol>). This is only an approximate value
which is calculated during vacuum.
<symbol>BLCKSZ</symbol>).
This is only an estimate used by the planner.
It is updated by <command>VACUUM</command>,
<command>ANALYZE</command>, and <command>CREATE INDEX</command>.
</entry>
</row>
<row>
<entry>reltuples</entry>
<entry><type>int4</type></entry>
<entry><type>float4</type></entry>
<entry></entry>
<entry>
Number of tuples in the table. This is only an estimate used
by the planner, updated by <command>VACUUM</command>.
Number of tuples in the table.
This is only an estimate used by the planner.
It is updated by <command>VACUUM</command>,
<command>ANALYZE</command>, and <command>CREATE INDEX</command>.
</entry>
</row>
......@@ -1671,6 +1672,130 @@
</section>
<section id="catalog-pg-statistic">
<title>pg_statistic</title>
<para>
<structname>pg_statistic</structname> stores statistical data about
the contents of the database. Entries are created by
<command>ANALYZE</command> and subsequently used by the query planner.
There is one entry for each table column that has been analyzed.
Note that all the statistical data is inherently approximate,
even assuming that it is up-to-date.
</para>
<para>
Since different kinds of statistics may be appropriate for different
kinds of data, <structname>pg_statistic</structname> is designed not
to assume very much about what sort of statistics it stores. Only
extremely general statistics (such as NULL-ness) are given dedicated
columns in <structname>pg_statistic</structname>. Everything else
is stored in "slots", which are groups of associated columns whose
content is identified by a code number in one of the slot's columns.
For more information see
<filename>src/include/catalog/pg_statistic.h</filename>.
</para>
<table>
<title>pg_statistic Columns</title>
<tgroup cols=4>
<thead>
<row>
<entry>Name</entry>
<entry>Type</entry>
<entry>References</entry>
<entry>Description</entry>
</row>
</thead>
<tbody>
<row>
<entry>starelid</entry>
<entry><type>oid</type></entry>
<entry>pg_class.oid</entry>
<entry>The table that the described column belongs to</entry>
</row>
<row>
<entry>staattnum</entry>
<entry><type>int2</type></entry>
<entry>pg_attribute.attnum</entry>
<entry>The number of the described column</entry>
</row>
<row>
<entry>stanullfrac</entry>
<entry><type>float4</type></entry>
<entry></entry>
<entry>The fraction of the column's entries that are NULL</entry>
</row>
<row>
<entry>stawidth</entry>
<entry><type>int4</type></entry>
<entry></entry>
<entry>The average stored width, in bytes, of non-NULL entries</entry>
</row>
<row>
<entry>stadistinct</entry>
<entry><type>float4</type></entry>
<entry></entry>
<entry>The number of distinct non-NULL data values in the column.
A value greater than zero is the actual number of distinct values.
A value less than zero is the negative of a fraction of the number
of rows in the table (for example, a column in which values appear about
twice on the average could be represented by stadistinct = -0.5).
A zero value means the number of distinct values is unknown.
</entry>
</row>
<row>
<entry>stakindN</entry>
<entry><type>int2</type></entry>
<entry></entry>
<entry>A code number indicating the kind of statistics stored in the Nth
"slot" of the <structname>pg_statistic</structname> row.
</entry>
</row>
<row>
<entry>staopN</entry>
<entry><type>oid</type></entry>
<entry>pg_operator.oid</entry>
<entry>An operator used to derive the statistics stored in the
Nth "slot". For example, a histogram slot would show the "&lt;"
operator that defines the sort order of the data.
</entry>
</row>
<row>
<entry>stanumbersN</entry>
<entry><type>float4[]</type></entry>
<entry></entry>
<entry>Numerical statistics of the appropriate kind for the Nth
"slot", or NULL if the slot kind does not involve numerical values.
</entry>
</row>
<row>
<entry>stavaluesN</entry>
<entry><type>text[]</type></entry>
<entry></entry>
<entry>Column data values of the appropriate kind for the Nth
"slot", or NULL if the slot kind does not store any data values.
For datatype independence, all column data values are converted
to external textual form and stored as TEXT datums.
</entry>
</row>
</tbody>
</tgroup>
</table>
</section>
<section id="catalog-pg-type">
<title>pg_type</title>
......
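The stadistinct encoding described in the pg_statistic column table above can be sketched as a small decoding function. This is a hedged illustration, not planner code from this commit: `estimate_num_distinct` is a hypothetical name, and returning -1.0 for the "unknown" case is a convention chosen only for this sketch.

```c
#include <assert.h>

/*
 * Turn a stadistinct value into an estimated number of distinct values.
 *
 *   stadistinct > 0  : the value is the number of distinct values itself
 *   stadistinct < 0  : the negated value is a fraction of the row count
 *   stadistinct == 0 : unknown; this sketch returns -1.0 as a sentinel
 */
double
estimate_num_distinct(double stadistinct, double ntuples)
{
    assert(ntuples >= 0.0);

    if (stadistinct > 0.0)
        return stadistinct;
    if (stadistinct < 0.0)
        return -stadistinct * ntuples;
    return -1.0;
}
```

With stadistinct = -0.5 and 1000 rows, the function yields 500 distinct values, matching the documented example of values that appear about twice on average. The negative form stays meaningful as the table grows, since it scales with the current row count.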
<!-- $Header: /cvsroot/pgsql/doc/src/sgml/indices.sgml,v 1.14 2001/02/20 22:27:56 petere Exp $ -->
<!-- $Header: /cvsroot/pgsql/doc/src/sgml/indices.sgml,v 1.15 2001/05/07 00:43:14 tgl Exp $ -->
<chapter id="indices">
<title id="indices-title">Indices</title>
......@@ -71,7 +71,7 @@ CREATE INDEX test1_id_index ON test1 (id);
Once the index is created, no further intervention is required: the
system will use the index when it thinks it would be more efficient
than a sequential table scan. But you may have to run the
<command>VACUUM ANALYZE</command> command regularly to update
<command>ANALYZE</command> command regularly to update
statistics to allow the query planner to make educated decisions.
Also read <xref linkend="performance-tips"> for information about
how to find out whether an index is used and when and why the
......
<!--
$Header: /cvsroot/pgsql/doc/src/sgml/ref/allfiles.sgml,v 1.27 2001/01/13 03:11:12 petere Exp $
$Header: /cvsroot/pgsql/doc/src/sgml/ref/allfiles.sgml,v 1.28 2001/05/07 00:43:14 tgl Exp $
Postgres documentation
Complete list of usable sgml source files in this directory.
-->
......@@ -40,6 +40,7 @@ Complete list of usable sgml source files in this directory.
<!entity alterGroup system "alter_group.sgml">
<!entity alterTable system "alter_table.sgml">
<!entity alterUser system "alter_user.sgml">
<!entity analyze system "analyze.sgml">
<!entity begin system "begin.sgml">
<!entity checkpoint system "checkpoint.sgml">
<!entity close system "close.sgml">
......
<!--
$Header: /cvsroot/pgsql/doc/src/sgml/ref/alter_table.sgml,v 1.22 2001/03/05 18:42:55 momjian Exp $
$Header: /cvsroot/pgsql/doc/src/sgml/ref/alter_table.sgml,v 1.23 2001/05/07 00:43:15 tgl Exp $
Postgres documentation
-->
......@@ -29,7 +29,9 @@ ALTER TABLE [ ONLY ] <replaceable class="PARAMETER">table</replaceable> [ * ]
ALTER TABLE [ ONLY ] <replaceable class="PARAMETER">table</replaceable> [ * ]
ALTER [ COLUMN ] <replaceable class="PARAMETER">column</replaceable> { SET DEFAULT <replaceable
class="PARAMETER">value</replaceable> | DROP DEFAULT }
ALTER TABLE <replaceable class="PARAMETER">table</replaceable> [ * ]
ALTER TABLE [ ONLY ] <replaceable class="PARAMETER">table</replaceable> [ * ]
ALTER [ COLUMN ] <replaceable class="PARAMETER">column</replaceable> SET STATISTICS <replaceable class="PARAMETER">integer</replaceable>
ALTER TABLE [ ONLY ] <replaceable class="PARAMETER">table</replaceable> [ * ]
RENAME [ COLUMN ] <replaceable class="PARAMETER">column</replaceable> TO <replaceable
class="PARAMETER">newcolumn</replaceable>
ALTER TABLE <replaceable class="PARAMETER">table</replaceable>
......@@ -159,9 +161,14 @@ ALTER TABLE <replaceable class="PARAMETER">table</replaceable>
<command>ALTER TABLE</command> changes the definition of an existing table.
The <literal>ADD COLUMN</literal> form adds a new column to the table
using the same syntax as <xref linkend="SQL-CREATETABLE"
endterm="SQL-CREATETABLE-title">. The <literal>ALTER COLUMN</literal> form
allows you to set or remove the default for the column. Note that defaults
only apply to newly inserted rows.
endterm="SQL-CREATETABLE-title">.
The <literal>ALTER COLUMN SET/DROP DEFAULT</literal> forms
allow you to set or remove the default for the column. Note that defaults
only apply to subsequent <command>INSERT</command> commands; they do not
cause rows already in the table to change.
The <literal>ALTER COLUMN SET STATISTICS</literal> form allows you to
set the statistics-gathering target for subsequent
<xref linkend="sql-analyze" endterm="sql-analyze-title"> operations.
The <literal>RENAME</literal> clause causes the name of a table or column
to change without changing any of the data contained in
the affected table. Thus, the table or column will
......@@ -170,7 +177,7 @@ ALTER TABLE <replaceable class="PARAMETER">table</replaceable>
The ADD <replaceable class="PARAMETER">table constraint definition</replaceable> clause
adds a new constraint to the table using the same syntax as <xref
linkend="SQL-CREATETABLE" endterm="SQL-CREATETABLE-title">.
The OWNER clause chnages the owner of the table to the user <replaceable class="PARAMETER">
The OWNER clause changes the owner of the table to the user <replaceable class="PARAMETER">
new user</replaceable>.
</para>
......@@ -190,10 +197,11 @@ ALTER TABLE <replaceable class="PARAMETER">table</replaceable>
</para>
<para>
In the current implementation, default and constraint clauses for the
In the current implementation of <literal>ADD COLUMN</literal>,
default and constraint clauses for the
new column will be ignored. You can use the <literal>SET DEFAULT</literal>
form of <command>ALTER TABLE</command> to set the default later.
(You will also have to update the already existing rows to the
(You may also want to update the already existing rows to the
new default value, using <xref linkend="sql-update"
endterm="sql-update-title">.)
</para>
......@@ -210,7 +218,7 @@ ALTER TABLE <replaceable class="PARAMETER">table</replaceable>
<para>
You must own the table in order to change it.
Renaming any part of the schema of a system
Changing any part of the schema of a system
catalog is not permitted.
The <citetitle>PostgreSQL User's Guide</citetitle> has further
information on inheritance.
......
<!--
$Header: /cvsroot/pgsql/doc/src/sgml/ref/analyze.sgml,v 1.1 2001/05/07 00:43:15 tgl Exp $
Postgres documentation
-->
<refentry id="SQL-ANALYZE">
<refmeta>
<refentrytitle id="sql-analyze-title">
ANALYZE
</refentrytitle>
<refmiscinfo>SQL - Language Statements</refmiscinfo>
</refmeta>
<refnamediv>
<refname>
ANALYZE
</refname>
<refpurpose>
Collect statistics about a <productname>Postgres</productname> database
</refpurpose>
</refnamediv>
<refsynopsisdiv>
<refsynopsisdivinfo>
<date>2001-05-04</date>
</refsynopsisdivinfo>
<synopsis>
ANALYZE [ VERBOSE ] [ <replaceable class="PARAMETER">table</replaceable> [ (<replaceable class="PARAMETER">column</replaceable> [, ...] ) ] ]
</synopsis>
<refsect2 id="R2-SQL-ANALYZE-1">
<refsect2info>
<date>2001-05-04</date>
</refsect2info>
<title>
Inputs
</title>
<para>
<variablelist>
<varlistentry>
<term>VERBOSE</term>
<listitem>
<para>
Enables display of progress messages.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><replaceable class="PARAMETER">table</replaceable></term>
<listitem>
<para>
The name of a specific table to analyze. Defaults to all tables.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><replaceable class="PARAMETER">column</replaceable></term>
<listitem>
<para>
The name of a specific column to analyze. Defaults to all columns.
</para>
</listitem>
</varlistentry>
</variablelist>
</para>
</refsect2>
<refsect2 id="R2-SQL-ANALYZE-2">
<refsect2info>
<date>2001-05-04</date>
</refsect2info>
<title>
Outputs
</title>
<para>
<variablelist>
<varlistentry>
<term><computeroutput>
<returnvalue>ANALYZE</returnvalue>
</computeroutput></term>
<listitem>
<para>
The command is complete.
</para>
</listitem>
</varlistentry>
</variablelist>
</para>
</refsect2>
</refsynopsisdiv>
<refsect1 id="R1-SQL-ANALYZE-1">
<refsect1info>
<date>2001-05-04</date>
</refsect1info>
<title>
Description
</title>
<para>
<command>ANALYZE</command> collects statistics about the contents of
<productname>Postgres</productname> tables, and stores the results in
the system table <literal>pg_statistic</literal>. Subsequently,
the query planner uses the statistics to help determine the most efficient
execution plans for queries.
</para>
<para>
With no parameter, <command>ANALYZE</command> examines every table in the
current database. With a parameter, <command>ANALYZE</command> examines
only that table. It is further possible to give a list of column names,
in which case only the statistics for those columns are updated.
</para>
<refsect2 id="R2-SQL-ANALYZE-3">
<refsect2info>
<date>2001-05-04</date>
</refsect2info>
<title>
Notes
</title>
<para>
It is a good idea to run <command>ANALYZE</command> periodically, or
just after making major changes in the contents of a table. Accurate
statistics will help the planner to choose the most appropriate query
plan, and thereby improve the speed of query processing. A common
strategy is to run <command>VACUUM</command> and <command>ANALYZE</command>
once a day during a low-usage time of day.
</para>
<para>
Unlike <xref linkend="sql-vacuum" endterm="sql-vacuum-title">,
<command>ANALYZE</command> requires
only a read lock on the target table, so it can run in parallel with
other activity on the table.
</para>
<para>
For large tables, <command>ANALYZE</command> takes a random sample of the
table contents, rather than examining every row. This allows even very
large tables to be analyzed in a small amount of time. Note however
that the statistics are only approximate, and will change slightly each
time <command>ANALYZE</command> is run, even if the actual table contents
did not change. This may result in small changes in the planner's
estimated costs shown by <command>EXPLAIN</command>.
</para>
<para>
The collected statistics usually include a list of some of the most common
values in each column and a histogram showing the approximate data
distribution in each column. One or both of these may be omitted if
<command>ANALYZE</command> deems them uninteresting (for example, in
a unique-key column, there are no common values) or if the column
datatype does not support the appropriate operators.
</para>
<para>
The extent of analysis can be controlled by adjusting the per-column
statistics target with <command>ALTER TABLE ALTER COLUMN SET
STATISTICS</command> (see
<xref linkend="sql-altertable" endterm="sql-altertable-title">). The
target value sets the maximum number of entries in the most-common-value
list and the maximum number of bins in the histogram. The default
target value is 10, but this can be adjusted up or down to trade off
accuracy of planner estimates against the time taken for
<command>ANALYZE</command> and the
amount of space occupied in <literal>pg_statistic</literal>.
In particular, setting the statistics target to zero disables collection of
statistics for that column. It may be useful to do that for columns that
are never used as part of the WHERE, GROUP BY, or ORDER BY clauses of
queries, since the planner will have no use for statistics on such columns.
</para>
<para>
The largest statistics target among the columns being analyzed determines
the number of table rows sampled to prepare the statistics. Increasing
the target causes a proportional increase in the time and space needed
to do <command>ANALYZE</command>.
</para>
</refsect2>
</refsect1>
<refsect1 id="R1-SQL-ANALYZE-3">
<title>
Compatibility
</title>
<refsect2 id="R2-SQL-ANALYZE-4">
<refsect2info>
<date>2001-05-04</date>
</refsect2info>
<title>
SQL92
</title>
<para>
There is no <command>ANALYZE</command> statement in <acronym>SQL92</acronym>.
</para>
</refsect2>
</refsect1>
</refentry>
<!-- Keep this comment at the end of the file
Local variables:
mode: sgml
sgml-omittag:nil
sgml-shorttag:t
sgml-minimize-attributes:nil
sgml-always-quote-attributes:t
sgml-indent-step:1
sgml-indent-data:t
sgml-parent-document:nil
sgml-default-dtd-file:"../reference.ced"
sgml-exposed-tags:nil
sgml-local-catalogs:"/usr/lib/sgml/catalog"
sgml-local-ecat-files:nil
End:
-->
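The ANALYZE reference page above says that large tables are sampled at random rather than read in full. The page does not say which sampling algorithm is used, so the following is only a generic reservoir-sampling sketch showing how a fixed-size uniform sample can be drawn in one pass; `reservoir_sample` is a hypothetical name and this is not presented as the method used by this commit.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Keep a random sample of up to k row numbers out of nrows rows, visiting
 * each row once (classic reservoir sampling).  Returns the number of
 * entries stored in sample[], which must have room for k ints.
 *
 * rand() % n has a slight modulo bias; that is acceptable for a sketch.
 */
int
reservoir_sample(int nrows, int k, int *sample)
{
    int stored = 0;
    int row;

    assert(nrows >= 0 && k > 0);

    for (row = 0; row < nrows; row++)
    {
        if (stored < k)
        {
            /* Fill the reservoir with the first k rows. */
            sample[stored++] = row;
        }
        else
        {
            /* Keep this row with probability k / (row + 1). */
            int j = rand() % (row + 1);

            if (j < k)
                sample[j] = row;
        }
    }
    return stored;
}
```

Because the sample depends on the random number sequence, two runs over the same rows generally pick different rows, which is consistent with the note that statistics change slightly from one ANALYZE run to the next.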
<!--
$Header: /cvsroot/pgsql/doc/src/sgml/ref/vacuum.sgml,v 1.13 2001/01/13 23:58:55 petere Exp $
$Header: /cvsroot/pgsql/doc/src/sgml/ref/vacuum.sgml,v 1.14 2001/05/07 00:43:15 tgl Exp $
Postgres documentation
-->
......@@ -15,15 +15,15 @@ Postgres documentation
VACUUM
</refname>
<refpurpose>
Clean and analyze a <productname>Postgres</productname> database
Clean and optionally analyze a <productname>Postgres</productname> database
</refpurpose>
</refnamediv>
<refsynopsisdiv>
<refsynopsisdivinfo>
<date>1999-07-20</date>
<date>2001-05-04</date>
</refsynopsisdivinfo>
<synopsis>
VACUUM [ VERBOSE ] [ ANALYZE ] [ <replaceable class="PARAMETER">table</replaceable> ]
VACUUM [ VERBOSE ] [ <replaceable class="PARAMETER">table</replaceable> ]
VACUUM [ VERBOSE ] ANALYZE [ <replaceable class="PARAMETER">table</replaceable> [ (<replaceable class="PARAMETER">column</replaceable> [, ...] ) ] ]
</synopsis>
......@@ -49,7 +49,7 @@ VACUUM [ VERBOSE ] ANALYZE [ <replaceable class="PARAMETER">table</replaceable>
<term>ANALYZE</term>
<listitem>
<para>
Updates column statistics used by the optimizer to
Updates statistics used by the optimizer to
determine the most efficient way to execute a query.
</para>
</listitem>
......@@ -90,7 +90,7 @@ VACUUM [ VERBOSE ] ANALYZE [ <replaceable class="PARAMETER">table</replaceable>
</computeroutput></term>
<listitem>
<para>
The command has been accepted and the database is being cleaned.
The command is complete.
</para>
</listitem>
</varlistentry>
......@@ -144,28 +144,26 @@ NOTICE: Index <replaceable class="PARAMETER">index</replaceable>: Pages 28;
Description
</title>
<para>
<command>VACUUM</command> serves two purposes in
<productname>Postgres</productname> as both a means to reclaim storage and
also a means to collect information for the optimizer.
<command>VACUUM</command> reclaims storage occupied by deleted tuples.
In normal <productname>Postgres</productname> operation, tuples that
are DELETEd or obsoleted by UPDATE are not physically removed from
their table; they remain present until a <command>VACUUM</command> is
done. Therefore it's necessary to do <command>VACUUM</command>
periodically, especially on frequently-updated tables.
</para>
<para>
<command>VACUUM</command> opens every table in the database,
cleans out records from rolled back transactions, and updates statistics in the
system catalogs. The statistics maintained include the number of
tuples and number of pages stored in all tables.
</para>
<para>
<command>VACUUM ANALYZE</command> collects statistics representing the
dispersion of the data in each column.
This information is valuable when several query execution paths are possible.
With no parameter, <command>VACUUM</command> processes every table in the
current database. With a parameter, <command>VACUUM</command> processes
only that table.
</para>
<para>
Running <command>VACUUM</command>
periodically will increase the speed of the database in processing user queries.
<command>VACUUM ANALYZE</command> performs a <command>VACUUM</command>
and then an <command>ANALYZE</command> for each selected table. This
is a handy combination form for routine maintenance scripts. See
<xref linkend="sql-analyze" endterm="sql-analyze-title">
for more details about its processing.
</para>
<refsect2 id="R2-SQL-VACUUM-3">
......@@ -175,16 +173,15 @@ NOTICE: Index <replaceable class="PARAMETER">index</replaceable>: Pages 28;
<title>
Notes
</title>
<para>
The open database is the target for <command>VACUUM</command>.
</para>
<para>
We recommend that active production databases be
<command>VACUUM</command>-ed nightly, in order to remove
expired rows. After copying a large table into
<productname>Postgres</productname> or after deleting a large number
of records, it may be a good idea to issue a <command>VACUUM
ANALYZE</command> query. This will update the system catalogs with
ANALYZE</command> command for the affected table. This will update the
system catalogs with
the results of all recent changes, and allow the
<productname>Postgres</productname> query optimizer to make better
choices in planning user queries.
......
<!-- reference.sgml
$Header: /cvsroot/pgsql/doc/src/sgml/reference.sgml,v 1.15 2001/03/24 13:21:14 petere Exp $
$Header: /cvsroot/pgsql/doc/src/sgml/reference.sgml,v 1.16 2001/05/07 00:43:14 tgl Exp $
PostgreSQL Reference Manual
-->
......@@ -26,6 +26,7 @@ PostgreSQL Reference Manual
&alterGroup;
&alterTable;
&alterUser;
&analyze;
&begin;
&checkpoint;
&close;
......
<!--
$Header: /cvsroot/pgsql/doc/src/sgml/xoper.sgml,v 1.11 2000/09/29 20:21:34 petere Exp $
$Header: /cvsroot/pgsql/doc/src/sgml/xoper.sgml,v 1.12 2001/05/07 00:43:14 tgl Exp $
-->
<Chapter Id="xoper">
......@@ -244,7 +244,7 @@ SELECT (a + b) AS c FROM test_complex;
only a small fraction. '&lt;' will accept a fraction that depends on
where the given constant falls in the range of values for that table
column (which, it just so happens, is information collected by
VACUUM ANALYZE and made available to the selectivity estimator).
<command>ANALYZE</command> and made available to the selectivity estimator).
'&lt;=' will accept a slightly larger fraction than '&lt;' for the same
comparison constant, but they're close enough to not be worth
distinguishing, especially since we're not likely to do better than a
......
......@@ -8,7 +8,7 @@
*
*
* IDENTIFICATION
* $Header: /cvsroot/pgsql/src/backend/access/common/tupdesc.c,v 1.73 2001/03/22 06:16:06 momjian Exp $
* $Header: /cvsroot/pgsql/src/backend/access/common/tupdesc.c,v 1.74 2001/05/07 00:43:15 tgl Exp $
*
* NOTES
* some of the executor utility code such as "ExecTypeFromTL" should be
......@@ -237,16 +237,16 @@ equalTupleDescs(TupleDesc tupdesc1, TupleDesc tupdesc2)
Form_pg_attribute attr2 = tupdesc2->attrs[i];
/*
* We do not need to check every single field here, and in fact
* some fields such as attdispersion probably shouldn't be
* compared. We can also disregard attnum (it was used to place
* the row in the attrs array) and everything derived from the
* column datatype.
* We do not need to check every single field here: we can disregard
* attrelid, attnum (it was used to place the row in the attrs array)
* and everything derived from the column datatype.
*/
if (strcmp(NameStr(attr1->attname), NameStr(attr2->attname)) != 0)
return false;
if (attr1->atttypid != attr2->atttypid)
return false;
if (attr1->attstattarget != attr2->attstattarget)
return false;
if (attr1->atttypmod != attr2->atttypmod)
return false;
if (attr1->attstorage != attr2->attstorage)
......@@ -365,12 +365,12 @@ TupleDescInitEntry(TupleDesc desc,
else
MemSet(NameStr(att->attname), 0, NAMEDATALEN);
att->attdispersion = 0; /* dummy value */
att->attstattarget = 0;
att->attcacheoff = -1;
att->atttypmod = typmod;
att->attnum = attributeNumber;
att->attnelems = attdim;
att->attndims = attdim;
att->attisset = attisset;
att->attnotnull = false;
......@@ -506,7 +506,7 @@ TupleDescMakeSelfReference(TupleDesc desc,
att->attbyval = true;
att->attalign = 'i';
att->attstorage = 'p';
att->attnelems = 0;
att->attndims = 0;
}
/* ----------------------------------------------------------------
......
......@@ -6,7 +6,7 @@
*
*
* IDENTIFICATION
* $Header: /cvsroot/pgsql/src/backend/access/gist/gist.c,v 1.72 2001/03/22 03:59:12 momjian Exp $
* $Header: /cvsroot/pgsql/src/backend/access/gist/gist.c,v 1.73 2001/05/07 00:43:15 tgl Exp $
*
*-------------------------------------------------------------------------
*/
......@@ -84,8 +84,8 @@ static void gist_dumptree(Relation r, int level, BlockNumber blk, OffsetNumber c
#endif
/*
** routine to build an index. Basically calls insert over and over
*/
* routine to build an index. Basically calls insert over and over
*/
Datum
gistbuild(PG_FUNCTION_ARGS)
{
......@@ -105,7 +105,7 @@ gistbuild(PG_FUNCTION_ARGS)
itupdesc;
Datum attdata[INDEX_MAX_KEYS];
char nulls[INDEX_MAX_KEYS];
int nhtups,
double nhtups,
nitups;
Node *pred = indexInfo->ii_Predicate;
......@@ -172,7 +172,7 @@ gistbuild(PG_FUNCTION_ARGS)
#endif /* OMIT_PARTIAL_INDEX */
/* build the index */
nhtups = nitups = 0;
nhtups = nitups = 0.0;
compvec = (bool *) palloc(sizeof(bool) * indexInfo->ii_NumIndexAttrs);
......@@ -183,7 +183,7 @@ gistbuild(PG_FUNCTION_ARGS)
{
MemoryContextReset(econtext->ecxt_per_tuple_memory);
nhtups++;
nhtups += 1.0;
#ifndef OMIT_PARTIAL_INDEX
......@@ -196,7 +196,7 @@ gistbuild(PG_FUNCTION_ARGS)
slot->val = htup;
if (ExecQual((List *) oldPred, econtext, false))
{
nitups++;
nitups += 1.0;
continue;
}
}
......@@ -213,7 +213,7 @@ gistbuild(PG_FUNCTION_ARGS)
}
#endif /* OMIT_PARTIAL_INDEX */
nitups++;
nitups += 1.0;
/*
* For the current heap tuple, extract all the attributes we use
......
......@@ -8,7 +8,7 @@
*
*
* IDENTIFICATION
* $Header: /cvsroot/pgsql/src/backend/access/hash/hash.c,v 1.50 2001/03/22 03:59:12 momjian Exp $
* $Header: /cvsroot/pgsql/src/backend/access/hash/hash.c,v 1.51 2001/05/07 00:43:15 tgl Exp $
*
* NOTES
* This file contains only the public interface routines.
......@@ -57,7 +57,7 @@ hashbuild(PG_FUNCTION_ARGS)
itupdesc;
Datum attdata[INDEX_MAX_KEYS];
char nulls[INDEX_MAX_KEYS];
int nhtups,
double nhtups,
nitups;
HashItem hitem;
Node *pred = indexInfo->ii_Predicate;
......@@ -109,7 +109,7 @@ hashbuild(PG_FUNCTION_ARGS)
#endif /* OMIT_PARTIAL_INDEX */
/* build the index */
nhtups = nitups = 0;
nhtups = nitups = 0.0;
/* start a heap scan */
hscan = heap_beginscan(heap, 0, SnapshotNow, 0, (ScanKey) NULL);
......@@ -118,7 +118,7 @@ hashbuild(PG_FUNCTION_ARGS)
{
MemoryContextReset(econtext->ecxt_per_tuple_memory);
nhtups++;
nhtups += 1.0;
#ifndef OMIT_PARTIAL_INDEX
......@@ -131,7 +131,7 @@ hashbuild(PG_FUNCTION_ARGS)
slot->val = htup;
if (ExecQual((List *) oldPred, econtext, false))
{
nitups++;
nitups += 1.0;
continue;
}
}
......@@ -148,7 +148,7 @@ hashbuild(PG_FUNCTION_ARGS)
}
#endif /* OMIT_PARTIAL_INDEX */
nitups++;
nitups += 1.0;
/*
* For the current heap tuple, extract all the attributes we use
......
......@@ -8,7 +8,7 @@
*
*
* IDENTIFICATION
* $Header: /cvsroot/pgsql/src/backend/access/heap/tuptoaster.c,v 1.21 2001/03/25 00:45:20 tgl Exp $
* $Header: /cvsroot/pgsql/src/backend/access/heap/tuptoaster.c,v 1.22 2001/05/07 00:43:15 tgl Exp $
*
*
* INTERFACE ROUTINES
......@@ -166,6 +166,43 @@ heap_tuple_untoast_attr(varattrib *attr)
}
/* ----------
* toast_raw_datum_size -
*
* Return the raw (detoasted) size of a varlena datum
* ----------
*/
Size
toast_raw_datum_size(Datum value)
{
varattrib *attr = (varattrib *) DatumGetPointer(value);
Size result;
if (VARATT_IS_COMPRESSED(attr))
{
/*
* va_rawsize shows the original data size, whether the datum
* is external or not.
*/
result = attr->va_content.va_compressed.va_rawsize + VARHDRSZ;
}
else if (VARATT_IS_EXTERNAL(attr))
{
/*
* an uncompressed external attribute has rawsize including the
* header (not too consistent!)
*/
result = attr->va_content.va_external.va_rawsize;
}
else
{
/* plain untoasted datum */
result = VARSIZE(attr);
}
return result;
}
/* ----------
* toast_delete -
*
......
......@@ -12,7 +12,7 @@
* Portions Copyright (c) 1994, Regents of the University of California
*
* IDENTIFICATION
* $Header: /cvsroot/pgsql/src/backend/access/nbtree/nbtree.c,v 1.79 2001/03/22 03:59:15 momjian Exp $
* $Header: /cvsroot/pgsql/src/backend/access/nbtree/nbtree.c,v 1.80 2001/05/07 00:43:16 tgl Exp $
*
*-------------------------------------------------------------------------
*/
......@@ -69,7 +69,7 @@ btbuild(PG_FUNCTION_ARGS)
itupdesc;
Datum attdata[INDEX_MAX_KEYS];
char nulls[INDEX_MAX_KEYS];
int nhtups,
double nhtups,
nitups;
Node *pred = indexInfo->ii_Predicate;
......@@ -156,7 +156,7 @@ btbuild(PG_FUNCTION_ARGS)
#endif /* OMIT_PARTIAL_INDEX */
/* build the index */
nhtups = nitups = 0;
nhtups = nitups = 0.0;
if (usefast)
{
......@@ -196,7 +196,7 @@ btbuild(PG_FUNCTION_ARGS)
MemoryContextReset(econtext->ecxt_per_tuple_memory);
nhtups++;
nhtups += 1.0;
#ifndef OMIT_PARTIAL_INDEX
......@@ -209,7 +209,7 @@ btbuild(PG_FUNCTION_ARGS)
slot->val = htup;
if (ExecQual((List *) oldPred, econtext, false))
{
nitups++;
nitups += 1.0;
continue;
}
}
......@@ -226,7 +226,7 @@ btbuild(PG_FUNCTION_ARGS)
}
#endif /* OMIT_PARTIAL_INDEX */
nitups++;
nitups += 1.0;
/*
* For the current heap tuple, extract all the attributes we use
......
......@@ -8,7 +8,7 @@
*
*
* IDENTIFICATION
* $Header: /cvsroot/pgsql/src/backend/access/rtree/Attic/rtree.c,v 1.61 2001/03/22 03:59:16 momjian Exp $
* $Header: /cvsroot/pgsql/src/backend/access/rtree/Attic/rtree.c,v 1.62 2001/05/07 00:43:16 tgl Exp $
*
*-------------------------------------------------------------------------
*/
......@@ -100,7 +100,7 @@ rtbuild(PG_FUNCTION_ARGS)
itupdesc;
Datum attdata[INDEX_MAX_KEYS];
char nulls[INDEX_MAX_KEYS];
int nhtups,
double nhtups,
nitups;
Node *pred = indexInfo->ii_Predicate;
......@@ -163,7 +163,7 @@ rtbuild(PG_FUNCTION_ARGS)
#endif /* OMIT_PARTIAL_INDEX */
/* count the tuples as we insert them */
nhtups = nitups = 0;
nhtups = nitups = 0.0;
/* start a heap scan */
hscan = heap_beginscan(heap, 0, SnapshotNow, 0, (ScanKey) NULL);
......@@ -172,7 +172,7 @@ rtbuild(PG_FUNCTION_ARGS)
{
MemoryContextReset(econtext->ecxt_per_tuple_memory);
nhtups++;
nhtups += 1.0;
#ifndef OMIT_PARTIAL_INDEX
......@@ -185,7 +185,7 @@ rtbuild(PG_FUNCTION_ARGS)
slot->val = htup;
if (ExecQual((List *) oldPred, econtext, false))
{
nitups++;
nitups += 1.0;
continue;
}
}
......@@ -202,7 +202,7 @@ rtbuild(PG_FUNCTION_ARGS)
}
#endif /* OMIT_PARTIAL_INDEX */
nitups++;
nitups += 1.0;
/*
* For the current heap tuple, extract all the attributes we use
......
......@@ -10,7 +10,7 @@
#
#
# IDENTIFICATION
# $Header: /cvsroot/pgsql/src/backend/catalog/Attic/genbki.sh,v 1.19 2001/01/16 22:48:34 tgl Exp $
# $Header: /cvsroot/pgsql/src/backend/catalog/Attic/genbki.sh,v 1.20 2001/05/07 00:43:16 tgl Exp $
#
# NOTES
# non-essential whitespace is removed from the generated file.
......@@ -126,10 +126,12 @@ for dir in $INCLUDE_DIRS; do
fi
done
# Get INDEX_MAX_KEYS from config.h (who needs consistency?)
# Get INDEX_MAX_KEYS and DEFAULT_ATTSTATTARGET from config.h
# (who needs consistency?)
for dir in $INCLUDE_DIRS; do
if [ -f "$dir/config.h" ]; then
INDEXMAXKEYS=`grep '#define[ ]*INDEX_MAX_KEYS' $dir/config.h | $AWK '{ print $3 }'`
DEFAULTATTSTATTARGET=`grep '#define[ ]*DEFAULT_ATTSTATTARGET' $dir/config.h | $AWK '{ print $3 }'`
break
fi
done
......@@ -168,6 +170,7 @@ sed -e "s/;[ ]*$//g" \
-e "s/(NameData/(name/g" \
-e "s/(Oid/(oid/g" \
-e "s/NAMEDATALEN/$NAMEDATALEN/g" \
-e "s/DEFAULT_ATTSTATTARGET/$DEFAULTATTSTATTARGET/g" \
-e "s/INDEX_MAX_KEYS\*2/$INDEXMAXKEYS2/g" \
-e "s/INDEX_MAX_KEYS\*4/$INDEXMAXKEYS4/g" \
-e "s/INDEX_MAX_KEYS/$INDEXMAXKEYS/g" \
......
......@@ -8,7 +8,7 @@
*
*
* IDENTIFICATION
* $Header: /cvsroot/pgsql/src/backend/catalog/heap.c,v 1.162 2001/03/22 06:16:10 momjian Exp $
* $Header: /cvsroot/pgsql/src/backend/catalog/heap.c,v 1.163 2001/05/07 00:43:17 tgl Exp $
*
*
* INTERFACE ROUTINES
......@@ -96,54 +96,72 @@ static void RemoveStatistics(Relation rel);
/*
* Note:
* Should the executor special case these attributes in the future?
* Advantage: consume 1/2 the space in the ATTRIBUTE relation.
* Disadvantage: having rules to compute values in these tuples may
* be more difficult if not impossible.
* Should the system special case these attributes in the future?
* Advantage: consume much less space in the ATTRIBUTE relation.
* Disadvantage: special cases will be all over the place.
*/
static FormData_pg_attribute a1 = {
0xffffffff, {"ctid"}, TIDOID, 0, sizeof(ItemPointerData),
SelfItemPointerAttributeNumber, 0, -1, -1, '\0', 'p', '\0', 'i', '\0', '\0'
0, {"ctid"}, TIDOID, 0, sizeof(ItemPointerData),
SelfItemPointerAttributeNumber, 0, -1, -1,
false, 'p', false, 'i', false, false
};
static FormData_pg_attribute a2 = {
0xffffffff, {"oid"}, OIDOID, 0, sizeof(Oid),
ObjectIdAttributeNumber, 0, -1, -1, '\001', 'p', '\0', 'i', '\0', '\0'
0, {"oid"}, OIDOID, 0, sizeof(Oid),
ObjectIdAttributeNumber, 0, -1, -1,
true, 'p', false, 'i', false, false
};
static FormData_pg_attribute a3 = {
0xffffffff, {"xmin"}, XIDOID, 0, sizeof(TransactionId),
MinTransactionIdAttributeNumber, 0, -1, -1, '\001', 'p', '\0', 'i', '\0', '\0'
0, {"xmin"}, XIDOID, 0, sizeof(TransactionId),
MinTransactionIdAttributeNumber, 0, -1, -1,
true, 'p', false, 'i', false, false
};
static FormData_pg_attribute a4 = {
0xffffffff, {"cmin"}, CIDOID, 0, sizeof(CommandId),
MinCommandIdAttributeNumber, 0, -1, -1, '\001', 'p', '\0', 'i', '\0', '\0'
0, {"cmin"}, CIDOID, 0, sizeof(CommandId),
MinCommandIdAttributeNumber, 0, -1, -1,
true, 'p', false, 'i', false, false
};
static FormData_pg_attribute a5 = {
0xffffffff, {"xmax"}, XIDOID, 0, sizeof(TransactionId),
MaxTransactionIdAttributeNumber, 0, -1, -1, '\001', 'p', '\0', 'i', '\0', '\0'
0, {"xmax"}, XIDOID, 0, sizeof(TransactionId),
MaxTransactionIdAttributeNumber, 0, -1, -1,
true, 'p', false, 'i', false, false
};
static FormData_pg_attribute a6 = {
0xffffffff, {"cmax"}, CIDOID, 0, sizeof(CommandId),
MaxCommandIdAttributeNumber, 0, -1, -1, '\001', 'p', '\0', 'i', '\0', '\0'
0, {"cmax"}, CIDOID, 0, sizeof(CommandId),
MaxCommandIdAttributeNumber, 0, -1, -1,
true, 'p', false, 'i', false, false
};
/*
We decide to call this attribute "tableoid" rather than say
"classoid" on the basis that in the future there may be more than one
table of a particular class/type. In any case table is still the word
used in SQL.
*/
* We decided to call this attribute "tableoid" rather than say
* "classoid" on the basis that in the future there may be more than one
* table of a particular class/type. In any case table is still the word
* used in SQL.
*/
static FormData_pg_attribute a7 = {
0xffffffff, {"tableoid"}, OIDOID, 0, sizeof(Oid),
TableOidAttributeNumber, 0, -1, -1, '\001', 'p', '\0', 'i', '\0', '\0'
0, {"tableoid"}, OIDOID, 0, sizeof(Oid),
TableOidAttributeNumber, 0, -1, -1,
true, 'p', false, 'i', false, false
};
static Form_pg_attribute HeapAtt[] = {&a1, &a2, &a3, &a4, &a5, &a6, &a7};
static Form_pg_attribute SysAtt[] = {&a1, &a2, &a3, &a4, &a5, &a6, &a7};
/*
* This function returns a Form_pg_attribute pointer for a system attribute.
*/
Form_pg_attribute
SystemAttributeDefinition(AttrNumber attno)
{
if (attno >= 0 || attno < - (int) lengthof(SysAtt))
elog(ERROR, "SystemAttributeDefinition: invalid attribute number %d",
attno);
return SysAtt[-attno - 1];
}
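
SystemAttributeDefinition maps a negative attribute number onto a zero-based array slot with `-attno - 1`, and CheckAttributeNames walks the same array to reject user columns that reuse a system name. A minimal standalone sketch of both lookups follows; the `sketch_` names are hypothetical, and the array holds only the attribute names in the order a1..a7 rather than full FormData_pg_attribute entries.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_LENGTHOF(a) (sizeof(a) / sizeof((a)[0]))

/* Names in the same order as a1..a7, so attno -1 is "ctid", -7 is "tableoid". */
static const char *const sketch_sysatt_names[] = {
    "ctid", "oid", "xmin", "cmin", "xmax", "cmax", "tableoid"
};

/* Same range test as SystemAttributeDefinition, expressed as a predicate. */
static bool
sketch_is_sysatt_number(int attno)
{
    return attno < 0 && attno >= -(int) SKETCH_LENGTHOF(sketch_sysatt_names);
}

/* Negative attribute number to array slot: -1 -> 0, -2 -> 1, ... */
static const char *
sketch_sysatt_name(int attno)
{
    assert(sketch_is_sysatt_number(attno));
    return sketch_sysatt_names[-attno - 1];
}

/* Mirrors the CheckAttributeNames loop: does a column name collide? */
static bool
sketch_is_sysatt_name(const char *name)
{
    size_t j;

    for (j = 0; j < SKETCH_LENGTHOF(sketch_sysatt_names); j++)
    {
        if (strcmp(sketch_sysatt_names[j], name) == 0)
            return true;
    }
    return false;
}
```

The real function raises an error through elog for an out-of-range number; the sketch separates the range test into a predicate so that callers can check it without aborting.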
/* ----------------------------------------------------------------
* XXX END OF UGLY HARD CODED BADNESS XXX
......@@ -380,32 +398,6 @@ heap_storage_create(Relation rel)
* 8) the relations are closed and the new relation's oid
* is returned.
*
* old comments:
* A new relation is inserted into the RELATION relation
* with the specified attribute(s) (newly inserted into
* the ATTRIBUTE relation). How does concurrency control
* work? Is it automatic now? Expects the caller to have
* attname, atttypid, atttyparg, attproc, and attlen domains filled.
* Create fills the attnum domains sequentually from zero,
* fills the attdispersion domains with zeros, and fills the
* attrelid fields with the relid.
*
* scan relation catalog for name conflict
* scan type catalog for typids (if not arg)
* create and insert attribute(s) into attribute catalog
* create new relation
* insert new relation into attribute catalog
*
* Should coordinate with heap_create_with_catalog(). Either
* it should not be called or there should be a way to prevent
* the relation from being removed at the end of the
* transaction if it is successful ('u'/'r' may be enough).
* Also, if the transaction does not commit, then the
* relation should be removed.
*
* XXX amcreate ignores "off" when inserting (for now).
* XXX amcreate (like the other utilities) needs to understand indexes.
*
* ----------------------------------------------------------------
*/
......@@ -432,14 +424,14 @@ CheckAttributeNames(TupleDesc tupdesc)
*/
for (i = 0; i < natts; i++)
{
for (j = 0; j < (int) (sizeof(HeapAtt) / sizeof(HeapAtt[0])); j++)
for (j = 0; j < (int) lengthof(SysAtt); j++)
{
if (strcmp(NameStr(HeapAtt[j]->attname),
if (strcmp(NameStr(SysAtt[j]->attname),
NameStr(tupdesc->attrs[i]->attname)) == 0)
{
elog(ERROR, "Attribute '%s' has a name conflict"
"\n\tName matches an existing system attribute",
NameStr(HeapAtt[j]->attname));
NameStr(SysAtt[j]->attname));
}
}
if (tupdesc->attrs[i]->atttypid == UNKNOWNOID)
......@@ -574,7 +566,7 @@ AddNewAttributeTuples(Oid new_rel_oid,
/* Fill in the correct relation OID */
(*dpp)->attrelid = new_rel_oid;
/* Make sure these are OK, too */
(*dpp)->attdispersion = 0;
(*dpp)->attstattarget = DEFAULT_ATTSTATTARGET;
(*dpp)->attcacheoff = -1;
tup = heap_addheader(Natts_pg_attribute,
......@@ -593,14 +585,14 @@ AddNewAttributeTuples(Oid new_rel_oid,
/*
* next we add the system attributes..
*/
dpp = HeapAtt;
dpp = SysAtt;
for (i = 0; i < -1 - FirstLowInvalidHeapAttributeNumber; i++)
{
/* Fill in the correct relation OID */
/* HACK: we are writing on static data here */
(*dpp)->attrelid = new_rel_oid;
/* Unneeded since they should be OK in the constant data anyway */
/* (*dpp)->attdispersion = 0; */
/* (*dpp)->attstattarget = 0; */
/* (*dpp)->attcacheoff = -1; */
tup = heap_addheader(Natts_pg_attribute,
......@@ -669,8 +661,23 @@ AddNewRelationTuple(Relation pg_class_desc,
* save. (NOTE: CREATE INDEX inserts the same bogus estimates if it
* finds the relation has 0 rows and pages. See index.c.)
*/
new_rel_reltup->relpages = 10; /* bogus estimates */
new_rel_reltup->reltuples = 1000;
switch (relkind)
{
case RELKIND_RELATION:
case RELKIND_INDEX:
case RELKIND_TOASTVALUE:
new_rel_reltup->relpages = 10; /* bogus estimates */
new_rel_reltup->reltuples = 1000;
break;
case RELKIND_SEQUENCE:
new_rel_reltup->relpages = 1;
new_rel_reltup->reltuples = 1;
break;
default: /* views, etc */
new_rel_reltup->relpages = 0;
new_rel_reltup->reltuples = 0;
break;
}
new_rel_reltup->relowner = GetUserId();
new_rel_reltup->reltype = new_type_oid;
......
......@@ -8,7 +8,7 @@
*
*
* IDENTIFICATION
* $Header: /cvsroot/pgsql/src/backend/catalog/index.c,v 1.145 2001/04/02 14:34:25 momjian Exp $
* $Header: /cvsroot/pgsql/src/backend/catalog/index.c,v 1.146 2001/05/07 00:43:17 tgl Exp $
*
*
* INTERFACE ROUTINES
......@@ -55,7 +55,7 @@
*/
#define AVG_ATTR_SIZE 8
#define NTUPLES_PER_PAGE(natts) \
((BLCKSZ - MAXALIGN(sizeof (PageHeaderData))) / \
((BLCKSZ - MAXALIGN(sizeof(PageHeaderData))) / \
((natts) * AVG_ATTR_SIZE + MAXALIGN(sizeof(HeapTupleHeaderData))))
/* non-export function prototypes */
......@@ -98,39 +98,6 @@ IsReindexProcessing(void)
return reindexing;
}
/* ----------------------------------------------------------------
* sysatts is a structure containing attribute tuple forms
* for system attributes (numbered -1, -2, ...). This really
* should be generated or eliminated or moved elsewhere. -cim 1/19/91
*
* typedef struct FormData_pg_attribute {
* Oid attrelid;
* NameData attname;
* Oid atttypid;
* uint32 attnvals;
* int16 attlen;
* AttrNumber attnum;
* uint32 attnelems;
* int32 attcacheoff;
* int32 atttypmod;
* bool attbyval;
* bool attisset;
* char attalign;
* bool attnotnull;
* bool atthasdef;
* } FormData_pg_attribute;
*
* ----------------------------------------------------------------
*/
static FormData_pg_attribute sysatts[] = {
{0, {"ctid"}, TIDOID, 0, 6, -1, 0, -1, -1, '\0', 'p', '\0', 'i', '\0', '\0'},
{0, {"oid"}, OIDOID, 0, 4, -2, 0, -1, -1, '\001', 'p', '\0', 'i', '\0', '\0'},
{0, {"xmin"}, XIDOID, 0, 4, -3, 0, -1, -1, '\001', 'p', '\0', 'i', '\0', '\0'},
{0, {"cmin"}, CIDOID, 0, 4, -4, 0, -1, -1, '\001', 'p', '\0', 'i', '\0', '\0'},
{0, {"xmax"}, XIDOID, 0, 4, -5, 0, -1, -1, '\001', 'p', '\0', 'i', '\0', '\0'},
{0, {"cmax"}, CIDOID, 0, 4, -6, 0, -1, -1, '\001', 'p', '\0', 'i', '\0', '\0'},
};
/* ----------------------------------------------------------------
* GetHeapRelationOid
* ----------------------------------------------------------------
......@@ -250,7 +217,6 @@ ConstructTupleDescriptor(Relation heapRelation,
for (i = 0; i < numatts; i++)
{
AttrNumber atnum; /* attributeNumber[attributeOffset] */
AttrNumber atind;
Form_pg_attribute from;
Form_pg_attribute to;
......@@ -264,16 +230,9 @@ ConstructTupleDescriptor(Relation heapRelation,
{
/*
* here we are indexing on a system attribute (-1...-n) so we
* convert atnum into a usable index 0...n-1 so we can use it
* to dereference the array sysatts[] which stores tuple
* descriptor information for system attributes.
* here we are indexing on a system attribute (-1...-n)
*/
if (atnum <= FirstLowInvalidHeapAttributeNumber || atnum >= 0)
elog(ERROR, "Cannot create index on system attribute: attribute number out of range (%d)", atnum);
atind = (-atnum) - 1;
from = &sysatts[atind];
from = SystemAttributeDefinition(atnum);
}
else
{
......@@ -284,9 +243,8 @@ ConstructTupleDescriptor(Relation heapRelation,
if (atnum > natts)
elog(ERROR, "Cannot create index: attribute %d does not exist",
atnum);
atind = AttrNumberGetAttrOffset(atnum);
from = heapTupDesc->attrs[atind];
from = heapTupDesc->attrs[AttrNumberGetAttrOffset(atnum)];
}
/*
......@@ -303,10 +261,10 @@ ConstructTupleDescriptor(Relation heapRelation,
*/
to->attnum = i + 1;
to->attdispersion = 0.0;
to->attstattarget = 0;
to->attcacheoff = -1;
to->attnotnull = false;
to->atthasdef = false;
to->attcacheoff = -1;
/*
* We do not yet have the correct relation OID for the index, so
......@@ -1542,10 +1500,14 @@ setNewRelfilenode(Relation relation)
/* ----------------
* UpdateStats
*
* Update pg_class' relpages and reltuples statistics for the given relation
* (which can be either a table or an index). Note that this is not used
* in the context of VACUUM.
* ----------------
*/
void
UpdateStats(Oid relid, long reltuples)
UpdateStats(Oid relid, double reltuples)
{
Relation whichRel;
Relation pg_class;
......@@ -1636,6 +1598,10 @@ UpdateStats(Oid relid, long reltuples)
* with zero size statistics until a VACUUM is done. The optimizer
* will generate very bad plans if the stats claim the table is empty
* when it is actually sizable. See also CREATE TABLE in heap.c.
*
* Note: this path is also taken during bootstrap, because bootstrap.c
* passes reltuples = 0 after loading a table. We have to estimate some
* number for reltuples based on the actual number of pages.
*/
relpages = RelationGetNumberOfBlocks(whichRel);
......@@ -1689,15 +1655,15 @@ UpdateStats(Oid relid, long reltuples)
for (i = 0; i < Natts_pg_class; i++)
{
nulls[i] = heap_attisnull(tuple, i + 1) ? 'n' : ' ';
nulls[i] = ' ';
replace[i] = ' ';
values[i] = (Datum) NULL;
}
replace[Anum_pg_class_relpages - 1] = 'r';
values[Anum_pg_class_relpages - 1] = (Datum) relpages;
values[Anum_pg_class_relpages - 1] = Int32GetDatum(relpages);
replace[Anum_pg_class_reltuples - 1] = 'r';
values[Anum_pg_class_reltuples - 1] = (Datum) reltuples;
values[Anum_pg_class_reltuples - 1] = Float4GetDatum((float4) reltuples);
newtup = heap_modifytuple(tuple, pg_class, values, nulls, replace);
simple_heap_update(pg_class, &tuple->t_self, newtup);
if (!IsIgnoringSystemIndexes())
......@@ -1741,7 +1707,7 @@ DefaultBuild(Relation heapRelation,
TupleDesc heapDescriptor;
Datum datum[INDEX_MAX_KEYS];
char nullv[INDEX_MAX_KEYS];
long reltuples,
double reltuples,
indtuples;
Node *predicate = indexInfo->ii_Predicate;
......@@ -1796,7 +1762,7 @@ DefaultBuild(Relation heapRelation,
0, /* number of keys */
(ScanKey) NULL); /* scan key */
reltuples = indtuples = 0;
reltuples = indtuples = 0.0;
/*
* for each tuple in the base relation, we create an index tuple and
......@@ -1808,7 +1774,7 @@ DefaultBuild(Relation heapRelation,
{
MemoryContextReset(econtext->ecxt_per_tuple_memory);
reltuples++;
reltuples += 1.0;
#ifndef OMIT_PARTIAL_INDEX
......@@ -1821,7 +1787,7 @@ DefaultBuild(Relation heapRelation,
slot->val = heapTuple;
if (ExecQual((List *) oldPred, econtext, false))
{
indtuples++;
indtuples += 1.0;
continue;
}
}
......@@ -1838,7 +1804,7 @@ DefaultBuild(Relation heapRelation,
}
#endif /* OMIT_PARTIAL_INDEX */
indtuples++;
indtuples += 1.0;
/*
* FormIndexDatum fills in its datum and null parameters with
......
......@@ -8,7 +8,7 @@
*
*
* IDENTIFICATION
* $Header: /cvsroot/pgsql/src/backend/commands/Attic/command.c,v 1.125 2001/03/23 04:49:52 momjian Exp $
* $Header: /cvsroot/pgsql/src/backend/commands/Attic/command.c,v 1.126 2001/05/07 00:43:17 tgl Exp $
*
* NOTES
* The PerformAddAttribute() code, like most of the relation
......@@ -56,6 +56,7 @@
#include "access/genam.h"
static void drop_default(Oid relid, int16 attnum);
static bool needs_toast_table(Relation rel);
static bool is_relation(char *name);
......@@ -408,7 +409,7 @@ AlterTableAddColumn(const char *relationName,
HeapTuple typeTuple;
Form_pg_type tform;
char *typename;
int attnelems;
int attndims;
if (SearchSysCacheExists(ATTNAME,
ObjectIdGetDatum(reltup->t_data->t_oid),
......@@ -425,11 +426,11 @@ AlterTableAddColumn(const char *relationName,
if (colDef->typename->arrayBounds)
{
attnelems = length(colDef->typename->arrayBounds);
attndims = length(colDef->typename->arrayBounds);
typename = makeArrayTypeName(colDef->typename->name);
}
else
attnelems = 0;
attndims = 0;
typeTuple = SearchSysCache(TYPENAME,
PointerGetDatum(typename),
......@@ -441,12 +442,12 @@ AlterTableAddColumn(const char *relationName,
namestrcpy(&(attribute->attname), colDef->colname);
attribute->atttypid = typeTuple->t_data->t_oid;
attribute->attlen = tform->typlen;
attribute->attdispersion = 0;
attribute->attstattarget = DEFAULT_ATTSTATTARGET;
attribute->attcacheoff = -1;
attribute->atttypmod = colDef->typename->typmod;
attribute->attnum = i;
attribute->attbyval = tform->typbyval;
attribute->attnelems = attnelems;
attribute->attndims = attndims;
attribute->attisset = (bool) (tform->typtype == 'c');
attribute->attstorage = tform->typstorage;
attribute->attalign = tform->typalign;
......@@ -496,17 +497,13 @@ AlterTableAddColumn(const char *relationName,
}
static void drop_default(Oid relid, int16 attnum);
/*
* ALTER TABLE ALTER COLUMN SET/DROP DEFAULT
*/
void
AlterTableAlterColumn(const char *relationName,
bool inh, const char *colName,
Node *newDefault)
AlterTableAlterColumnDefault(const char *relationName,
bool inh, const char *colName,
Node *newDefault)
{
Relation rel;
HeapTuple tuple;
......@@ -551,8 +548,8 @@ AlterTableAlterColumn(const char *relationName,
if (childrelid == myrelid)
continue;
rel = heap_open(childrelid, AccessExclusiveLock);
AlterTableAlterColumn(RelationGetRelationName(rel),
false, colName, newDefault);
AlterTableAlterColumnDefault(RelationGetRelationName(rel),
false, colName, newDefault);
heap_close(rel, AccessExclusiveLock);
}
}
......@@ -560,7 +557,7 @@ AlterTableAlterColumn(const char *relationName,
/* -= now do the thing on this relation =- */
/* reopen the business */
rel = heap_openr((char *) relationName, AccessExclusiveLock);
rel = heap_openr(relationName, AccessExclusiveLock);
/*
* get the number of the attribute
......@@ -647,7 +644,6 @@ AlterTableAlterColumn(const char *relationName,
}
static void
drop_default(Oid relid, int16 attnum)
{
......@@ -675,6 +671,104 @@ drop_default(Oid relid, int16 attnum)
}
/*
* ALTER TABLE ALTER COLUMN SET STATISTICS
*/
void
AlterTableAlterColumnStatistics(const char *relationName,
bool inh, const char *colName,
Node *statsTarget)
{
Relation rel;
Oid myrelid;
int newtarget;
Relation attrelation;
HeapTuple tuple;
#ifndef NO_SECURITY
if (!pg_ownercheck(GetUserId(), relationName, RELNAME))
elog(ERROR, "ALTER TABLE: permission denied");
#endif
rel = heap_openr(relationName, AccessExclusiveLock);
if (rel->rd_rel->relkind != RELKIND_RELATION)
elog(ERROR, "ALTER TABLE: relation \"%s\" is not a table",
relationName);
myrelid = RelationGetRelid(rel);
heap_close(rel, NoLock); /* close rel, but keep lock! */
/*
* Propagate to children if desired
*/
if (inh)
{
List *child,
*children;
/* this routine is actually in the planner */
children = find_all_inheritors(myrelid);
/*
* find_all_inheritors does the recursive search of the
* inheritance hierarchy, so all we have to do is process all of
* the relids in the list that it returns.
*/
foreach(child, children)
{
Oid childrelid = lfirsti(child);
if (childrelid == myrelid)
continue;
rel = heap_open(childrelid, AccessExclusiveLock);
AlterTableAlterColumnStatistics(RelationGetRelationName(rel),
false, colName, statsTarget);
heap_close(rel, AccessExclusiveLock);
}
}
/* -= now do the thing on this relation =- */
Assert(IsA(statsTarget, Integer));
newtarget = intVal(statsTarget);
/* Limit target to sane range (should we raise an error instead?) */
if (newtarget < 0)
newtarget = 0;
else if (newtarget > 1000)
newtarget = 1000;
attrelation = heap_openr(AttributeRelationName, RowExclusiveLock);
tuple = SearchSysCacheCopy(ATTNAME,
ObjectIdGetDatum(myrelid),
PointerGetDatum(colName),
0, 0);
if (!HeapTupleIsValid(tuple))
elog(ERROR, "ALTER TABLE: relation \"%s\" has no column \"%s\"",
relationName, colName);
if (((Form_pg_attribute) GETSTRUCT(tuple))->attnum < 0)
elog(ERROR, "ALTER TABLE: cannot change system attribute \"%s\"",
colName);
((Form_pg_attribute) GETSTRUCT(tuple))->attstattarget = newtarget;
simple_heap_update(attrelation, &tuple->t_self, tuple);
/* keep system catalog indices current */
{
Relation irelations[Num_pg_attr_indices];
CatalogOpenIndices(Num_pg_attr_indices, Name_pg_attr_indices, irelations);
CatalogIndexInsert(irelations, Num_pg_attr_indices, attrelation, tuple);
CatalogCloseIndices(Num_pg_attr_indices, irelations);
}
heap_freetuple(tuple);
heap_close(attrelation, RowExclusiveLock);
}
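
AlterTableAlterColumnStatistics silently limits the requested target to the range 0..1000 instead of raising an error. A minimal sketch of just that clamping step, using a hypothetical helper name and constant that do not exist in the sources:

```c
#include <assert.h>

/* Upper bound taken from the code above; the macro name is hypothetical. */
#define SKETCH_MAX_STATS_TARGET 1000

/* Clamp a requested statistics target into [0, SKETCH_MAX_STATS_TARGET]. */
static int
sketch_clamp_stats_target(int requested)
{
    int target = requested;

    if (target < 0)
        target = 0;
    else if (target > SKETCH_MAX_STATS_TARGET)
        target = SKETCH_MAX_STATS_TARGET;

    /* postcondition: the stored value is always within the sane range */
    assert(target >= 0 && target <= SKETCH_MAX_STATS_TARGET);
    return target;
}
```

Out-of-range requests are accepted and adjusted rather than rejected, which is the behavior the in-code comment questions.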
#ifdef _DROP_COLUMN_HACK__
/*
* ALTER TABLE DROP COLUMN trial implementation
......
......@@ -8,7 +8,7 @@
*
*
* IDENTIFICATION
* $Header: /cvsroot/pgsql/src/backend/executor/nodeSort.c,v 1.32 2001/03/22 06:16:13 momjian Exp $
* $Header: /cvsroot/pgsql/src/backend/executor/nodeSort.c,v 1.33 2001/05/07 00:43:18 tgl Exp $
*
*-------------------------------------------------------------------------
*/
......@@ -20,24 +20,24 @@
#include "utils/tuplesort.h"
/* ----------------------------------------------------------------
* FormSortKeys(node)
* ExtractSortKeys
*
* Forms the structure containing information used to sort the relation.
* Extract the sorting key information from the plan node.
*
* Returns an array of ScanKeyData.
* Returns two palloc'd arrays, one of sort operator OIDs and
* one of attribute numbers.
* ----------------------------------------------------------------
*/
static ScanKey
FormSortKeys(Sort *sortnode)
static void
ExtractSortKeys(Sort *sortnode,
Oid **sortOperators,
AttrNumber **attNums)
{
ScanKey sortkeys;
List *targetList;
List *tl;
int keycount;
Resdom *resdom;
AttrNumber resno;
Index reskey;
Oid reskeyop;
Oid *sortOps;
AttrNumber *attNos;
List *tl;
/*
* get information from the node
......@@ -46,36 +46,33 @@ FormSortKeys(Sort *sortnode)
keycount = sortnode->keycount;
/*
* first allocate space for scan keys
* first allocate space for results
*/
if (keycount <= 0)
elog(ERROR, "FormSortKeys: keycount <= 0");
sortkeys = (ScanKey) palloc(keycount * sizeof(ScanKeyData));
MemSet((char *) sortkeys, 0, keycount * sizeof(ScanKeyData));
elog(ERROR, "ExtractSortKeys: keycount <= 0");
sortOps = (Oid *) palloc(keycount * sizeof(Oid));
MemSet(sortOps, 0, keycount * sizeof(Oid));
*sortOperators = sortOps;
attNos = (AttrNumber *) palloc(keycount * sizeof(AttrNumber));
MemSet(attNos, 0, keycount * sizeof(AttrNumber));
*attNums = attNos;
/*
* form each scan key from the resdom info in the target list
* extract info from the resdom nodes in the target list
*/
foreach(tl, targetList)
{
TargetEntry *target = (TargetEntry *) lfirst(tl);
resdom = target->resdom;
resno = resdom->resno;
reskey = resdom->reskey;
reskeyop = resdom->reskeyop;
Resdom *resdom = target->resdom;
Index reskey = resdom->reskey;
if (reskey > 0) /* ignore TLEs that are not sort keys */
{
ScanKeyEntryInitialize(&sortkeys[reskey - 1],
0x0,
resno,
(RegProcedure) reskeyop,
(Datum) 0);
Assert(reskey <= keycount);
sortOps[reskey - 1] = resdom->reskeyop;
attNos[reskey - 1] = resdom->resno;
}
}
return sortkeys;
}
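
The new ExtractSortKeys fills two parallel arrays indexed by `reskey - 1` and skips target entries whose reskey is zero. The sketch below reproduces that pattern in standalone C under loud assumptions: `SketchTarget` is a hypothetical stand-in for the Resdom fields used here, `calloc` replaces `palloc` plus `MemSet`, and errors are reported by return code rather than elog.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the Resdom fields that ExtractSortKeys reads. */
typedef struct SketchTarget
{
    unsigned reskey;    /* 1-based sort key position; 0 means not a sort key */
    unsigned reskeyop;  /* sort operator identifier */
    int      resno;     /* attribute number */
} SketchTarget;

/*
 * Fill two zero-initialized arrays of length keycount, one of operator
 * identifiers and one of attribute numbers.  Returns 0 on success and
 * -1 if keycount is zero or allocation fails.  The caller frees both
 * arrays.
 */
static int
sketch_extract_sort_keys(const SketchTarget *targets, size_t ntargets,
                         size_t keycount,
                         unsigned **sort_ops, int **att_nums)
{
    unsigned *ops;
    int      *nums;
    size_t    i;

    if (keycount == 0)
        return -1;

    ops = calloc(keycount, sizeof *ops);
    nums = calloc(keycount, sizeof *nums);
    if (ops == NULL || nums == NULL)
    {
        free(ops);
        free(nums);
        return -1;
    }

    for (i = 0; i < ntargets; i++)
    {
        unsigned reskey = targets[i].reskey;

        if (reskey > 0)         /* ignore entries that are not sort keys */
        {
            assert(reskey <= keycount);
            ops[reskey - 1] = targets[i].reskeyop;
            nums[reskey - 1] = targets[i].resno;
        }
    }

    *sort_ops = ops;
    *att_nums = nums;
    return 0;
}
```

As in ExecSort above, the arrays are only needed while the sort is being set up, so the caller can release them immediately afterwards.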
/* ----------------------------------------------------------------
......@@ -124,8 +121,8 @@ ExecSort(Sort *node)
{
Plan *outerNode;
TupleDesc tupDesc;
int keycount;
ScanKey sortkeys;
Oid *sortOperators;
AttrNumber *attNums;
SO1_printf("ExecSort: %s\n",
"sorting subplan");
......@@ -145,14 +142,17 @@ ExecSort(Sort *node)
outerNode = outerPlan((Plan *) node);
tupDesc = ExecGetTupType(outerNode);
keycount = node->keycount;
sortkeys = (ScanKey) sortstate->sort_Keys;
tuplesortstate = tuplesort_begin_heap(tupDesc, keycount, sortkeys,
true /* randomAccess */ );
ExtractSortKeys(node, &sortOperators, &attNums);
tuplesortstate = tuplesort_begin_heap(tupDesc, node->keycount,
sortOperators, attNums,
true /* randomAccess */ );
sortstate->tuplesortstate = (void *) tuplesortstate;
pfree(sortOperators);
pfree(attNums);
/*
* Scan the subplan and feed all the tuples to tuplesort.
*/
......@@ -230,7 +230,6 @@ ExecInitSort(Sort *node, EState *estate, Plan *parent)
*/
sortstate = makeNode(SortState);
sortstate->sort_Done = false;
sortstate->sort_Keys = NULL;
sortstate->tuplesortstate = NULL;
node->sortstate = sortstate;
......@@ -258,11 +257,6 @@ ExecInitSort(Sort *node, EState *estate, Plan *parent)
outerPlan = outerPlan((Plan *) node);
ExecInitNode(outerPlan, estate, (Plan *) node);
/*
* initialize sortstate information
*/
sortstate->sort_Keys = FormSortKeys(node);
/*
* initialize tuple type. no need to initialize projection info
* because this node doesn't do projections.
......@@ -321,9 +315,6 @@ ExecEndSort(Sort *node)
tuplesort_end((Tuplesortstate *) sortstate->tuplesortstate);
sortstate->tuplesortstate = NULL;
if (sortstate->sort_Keys != NULL)
pfree(sortstate->sort_Keys);
pfree(sortstate);
node->sortstate = NULL;
......
......@@ -15,7 +15,7 @@
* Portions Copyright (c) 1994, Regents of the University of California
*
* IDENTIFICATION
* $Header: /cvsroot/pgsql/src/backend/nodes/copyfuncs.c,v 1.140 2001/03/22 06:16:14 momjian Exp $
* $Header: /cvsroot/pgsql/src/backend/nodes/copyfuncs.c,v 1.141 2001/05/07 00:43:18 tgl Exp $
*
*-------------------------------------------------------------------------
*/
......@@ -1378,8 +1378,8 @@ _copyRestrictInfo(RestrictInfo *from)
newnode->left_pathkey = NIL;
newnode->right_pathkey = NIL;
newnode->hashjoinoperator = from->hashjoinoperator;
newnode->left_dispersion = from->left_dispersion;
newnode->right_dispersion = from->right_dispersion;
newnode->left_bucketsize = from->left_bucketsize;
newnode->right_bucketsize = from->right_bucketsize;
return newnode;
}
......@@ -2209,11 +2209,12 @@ _copyVacuumStmt(VacuumStmt *from)
{
VacuumStmt *newnode = makeNode(VacuumStmt);
newnode->verbose = from->verbose;
newnode->vacuum = from->vacuum;
newnode->analyze = from->analyze;
newnode->verbose = from->verbose;
if (from->vacrel)
newnode->vacrel = pstrdup(from->vacrel);
Node_Copy(from, newnode, va_spec);
Node_Copy(from, newnode, va_cols);
return newnode;
}
......
......@@ -20,7 +20,7 @@
* Portions Copyright (c) 1994, Regents of the University of California
*
* IDENTIFICATION
* $Header: /cvsroot/pgsql/src/backend/nodes/equalfuncs.c,v 1.88 2001/03/22 03:59:31 momjian Exp $
* $Header: /cvsroot/pgsql/src/backend/nodes/equalfuncs.c,v 1.89 2001/05/07 00:43:19 tgl Exp $
*
*-------------------------------------------------------------------------
*/
......@@ -516,7 +516,7 @@ _equalRestrictInfo(RestrictInfo *a, RestrictInfo *b)
return false;
/*
* ignore eval_cost, left/right_pathkey, and left/right_dispersion,
* ignore eval_cost, left/right_pathkey, and left/right_bucketsize,
* since they may not be set yet, and should be derivable from the
* clause anyway
*/
......@@ -1113,13 +1113,15 @@ _equalDropdbStmt(DropdbStmt *a, DropdbStmt *b)
static bool
_equalVacuumStmt(VacuumStmt *a, VacuumStmt *b)
{
if (a->verbose != b->verbose)
if (a->vacuum != b->vacuum)
return false;
if (a->analyze != b->analyze)
return false;
if (a->verbose != b->verbose)
return false;
if (!equalstr(a->vacrel, b->vacrel))
return false;
if (!equal(a->va_spec, b->va_spec))
if (!equal(a->va_cols, b->va_cols))
return false;
return true;
......
......@@ -8,7 +8,7 @@
*
*
* IDENTIFICATION
* $Header: /cvsroot/pgsql/src/backend/nodes/readfuncs.c,v 1.107 2001/03/22 03:59:32 momjian Exp $
* $Header: /cvsroot/pgsql/src/backend/nodes/readfuncs.c,v 1.108 2001/05/07 00:43:19 tgl Exp $
*
* NOTES
* Most of the read functions for plan nodes are tested. (In fact, they
......@@ -1874,11 +1874,11 @@ _readRestrictInfo(void)
/* eval_cost is not part of saved representation; compute on first use */
local_node->eval_cost = -1;
/* ditto for cached pathkeys and dispersion */
/* ditto for cached pathkeys and bucketsize */
local_node->left_pathkey = NIL;
local_node->right_pathkey = NIL;
local_node->left_dispersion = -1;
local_node->right_dispersion = -1;
local_node->left_bucketsize = -1;
local_node->right_bucketsize = -1;
return local_node;
}
......
......@@ -41,7 +41,7 @@
* Portions Copyright (c) 1994, Regents of the University of California
*
* IDENTIFICATION
* $Header: /cvsroot/pgsql/src/backend/optimizer/path/costsize.c,v 1.70 2001/04/25 22:04:37 tgl Exp $
* $Header: /cvsroot/pgsql/src/backend/optimizer/path/costsize.c,v 1.71 2001/05/07 00:43:20 tgl Exp $
*
*-------------------------------------------------------------------------
*/
......@@ -50,11 +50,15 @@
#include <math.h>
#include "catalog/pg_statistic.h"
#include "executor/nodeHash.h"
#include "miscadmin.h"
#include "optimizer/clauses.h"
#include "optimizer/cost.h"
#include "optimizer/pathnode.h"
#include "parser/parsetree.h"
#include "utils/lsyscache.h"
#include "utils/syscache.h"
/*
......@@ -573,7 +577,7 @@ cost_mergejoin(Path *path,
* 'outer_path' is the path for the outer relation
* 'inner_path' is the path for the inner relation
* 'restrictlist' are the RestrictInfo nodes to be applied at the join
* 'innerdispersion' is an estimate of the dispersion statistic
* 'innerbucketsize' is an estimate of the bucketsize statistic
* for the inner hash key.
*/
void
......@@ -581,7 +585,7 @@ cost_hashjoin(Path *path,
Path *outer_path,
Path *inner_path,
List *restrictlist,
Selectivity innerdispersion)
Selectivity innerbucketsize)
{
Cost startup_cost = 0;
Cost run_cost = 0;
......@@ -607,22 +611,20 @@ cost_hashjoin(Path *path,
/*
* The number of tuple comparisons needed is the number of outer
* tuples times the typical hash bucket size. nodeHash.c tries for
* average bucket loading of NTUP_PER_BUCKET, but that goal will be
* reached only if data values are uniformly distributed among the
* buckets. To be conservative, we scale up the target bucket size by
* the number of inner rows times inner dispersion, giving an estimate
* of the typical number of duplicates of each value. We then charge
* one cpu_operator_cost per tuple comparison.
* tuples times the typical number of tuples in a hash bucket,
* which is the inner relation size times its bucketsize fraction.
* We charge one cpu_operator_cost per tuple comparison.
*/
run_cost += cpu_operator_cost * outer_path->parent->rows *
NTUP_PER_BUCKET * ceil(inner_path->parent->rows * innerdispersion);
ceil(inner_path->parent->rows * innerbucketsize);
/*
* Estimate the number of tuples that get through the hashing filter
* as one per tuple in the two source relations. This could be a
* drastic underestimate if there are many equal-keyed tuples in
* either relation, but we have no good way of estimating that...
* either relation, but we have no simple way of estimating that;
* and since this is only a second-order parameter, it's probably
* not worth expending a lot of effort on the estimate.
*/
ntuples = outer_path->parent->rows + inner_path->parent->rows;
......@@ -651,7 +653,7 @@ cost_hashjoin(Path *path,
/*
* Bias against putting larger relation on inside. We don't want an
* absolute prohibition, though, since larger relation might have
* better dispersion --- and we can't trust the size estimates
* better bucketsize --- and we can't trust the size estimates
* unreservedly, anyway. Instead, inflate the startup cost by the
* square root of the size ratio. (Why square root? No real good
* reason, but it seems reasonable...)
......@@ -663,6 +665,171 @@ cost_hashjoin(Path *path,
path->total_cost = startup_cost + run_cost;
}
/*
* Estimate hash bucketsize fraction (ie, number of entries in a bucket
* divided by total tuples in relation) if the specified Var is used
* as a hash key.
*
* This statistic is used by cost_hashjoin. We split out the calculation
* because it's useful to cache the result for re-use across multiple path
* cost calculations.
*
* XXX This is really pretty bogus since we're effectively assuming that the
* distribution of hash keys will be the same after applying restriction
* clauses as it was in the underlying relation. However, we are not nearly
* smart enough to figure out how the restrict clauses might change the
* distribution, so this will have to do for now.
*
* The executor tries for average bucket loading of NTUP_PER_BUCKET by setting
* number of buckets equal to ntuples / NTUP_PER_BUCKET, which would yield
* a bucketsize fraction of NTUP_PER_BUCKET / ntuples. But that goal will
* be reached only if the data values are uniformly distributed among the
* buckets, which requires (a) at least ntuples / NTUP_PER_BUCKET distinct
* data values, and (b) a not-too-skewed data distribution. Otherwise the
* buckets will be nonuniformly occupied. If the other relation in the join
* has a similar distribution, the most-loaded buckets are exactly those
* that will be probed most often. Therefore, the "average" bucket size for
* costing purposes should really be taken as something close to the "worst
* case" bucket size. We try to estimate this by first scaling up if there
* are too few distinct data values, and then scaling up again by the
* ratio of the most common value's frequency to the average frequency.
*
* If no statistics are available, use a default estimate of 0.1. This will
* discourage use of a hash rather strongly if the inner relation is large,
* which is what we want. We do not want to hash unless we know that the
* inner rel is well-dispersed (or the alternatives seem much worse).
*/
Selectivity
estimate_hash_bucketsize(Query *root, Var *var)
{
Oid relid;
RelOptInfo *rel;
HeapTuple tuple;
Form_pg_statistic stats;
double estfract,
ndistinct,
needdistinct,
mcvfreq,
avgfreq;
float4 *numbers;
int nnumbers;
/*
* Lookup info about var's relation and attribute;
* if none available, return default estimate.
*/
if (!IsA(var, Var))
return 0.1;
relid = getrelid(var->varno, root->rtable);
if (relid == InvalidOid)
return 0.1;
rel = get_base_rel(root, var->varno);
if (rel->tuples <= 0.0 || rel->rows <= 0.0)
return 0.1; /* ensure we can divide below */
tuple = SearchSysCache(STATRELATT,
ObjectIdGetDatum(relid),
Int16GetDatum(var->varattno),
0, 0);
if (!HeapTupleIsValid(tuple))
{
/*
* Perhaps the Var is a system attribute; if so, it will have no
* entry in pg_statistic, but we may be able to guess something
* about its distribution anyway.
*/
switch (var->varattno)
{
case ObjectIdAttributeNumber:
case SelfItemPointerAttributeNumber:
/* these are unique, so buckets should be well-distributed */
return (double) NTUP_PER_BUCKET / rel->rows;
case TableOidAttributeNumber:
/* hashing this is a terrible idea... */
return 1.0;
}
return 0.1;
}
stats = (Form_pg_statistic) GETSTRUCT(tuple);
/*
* Obtain number of distinct data values in raw relation.
*/
ndistinct = stats->stadistinct;
if (ndistinct < 0.0)
ndistinct = -ndistinct * rel->tuples;
/*
* Adjust ndistinct to account for restriction clauses. Observe we are
* assuming that the data distribution is affected uniformly by the
* restriction clauses!
*
* XXX Possibly better way, but much more expensive: multiply by
* selectivity of rel's restriction clauses that mention the target Var.
*/
ndistinct *= rel->rows / rel->tuples;
/*
* Discourage use of hash join if there seem not to be very many distinct
* data values. The threshold here is somewhat arbitrary, as is the
* fraction used to "discourage" the choice.
*/
if (ndistinct < 50.0)
{
ReleaseSysCache(tuple);
return 0.5;
}
/*
* Form initial estimate of bucketsize fraction. Here we use rel->rows,
* ie the number of rows after applying restriction clauses, because
* that's what the fraction will eventually be multiplied by in
* cost_hashjoin.
*/
estfract = (double) NTUP_PER_BUCKET / rel->rows;
/*
* Adjust estimated bucketsize if too few distinct values to fill
* all the buckets.
*/
needdistinct = rel->rows / (double) NTUP_PER_BUCKET;
if (ndistinct < needdistinct)
estfract *= needdistinct / ndistinct;
/*
* Look up the frequency of the most common value, if available.
*/
mcvfreq = 0.0;
if (get_attstatsslot(tuple, var->vartype, var->vartypmod,
STATISTIC_KIND_MCV, InvalidOid,
NULL, NULL, &numbers, &nnumbers))
{
/*
* The first MCV stat is for the most common value.
*/
if (nnumbers > 0)
mcvfreq = numbers[0];
free_attstatsslot(var->vartype, NULL, 0,
numbers, nnumbers);
}
/*
* Adjust estimated bucketsize upward to account for skewed distribution.
*/
avgfreq = (1.0 - stats->stanullfrac) / ndistinct;
if (avgfreq > 0.0 && mcvfreq > avgfreq)
estfract *= mcvfreq / avgfreq;
ReleaseSysCache(tuple);
return (Selectivity) estfract;
}
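To make the arithmetic of estimate_hash_bucketsize easier to experiment with, here is a sketch that drops the catalog lookups and takes the statistics as plain arguments. The function name, the parameter names, and the value 10 for the bucket-loading constant are assumptions made for illustration; only the sequence of adjustments follows the function above.

```c
#include <assert.h>

#define SKETCH_NTUP_PER_BUCKET 10   /* assumed value, for illustration only */

/*
 * Hypothetical stand-alone version of the bucketsize arithmetic.
 *   rows         rows remaining after restriction clauses
 *   tuples       rows in the raw relation
 *   stadistinct  distinct-value count; negative means a fraction of tuples
 *   nullfrac     fraction of entries that are NULL
 *   mcvfreq      frequency of the most common value, 0 if unknown
 */
static double
sketch_bucketsize(double rows, double tuples, double stadistinct,
                  double nullfrac, double mcvfreq)
{
    double ndistinct, estfract, needdistinct, avgfreq;

    assert(nullfrac >= 0.0 && nullfrac <= 1.0);

    if (rows <= 0.0 || tuples <= 0.0)
        return 0.1;                 /* default: we cannot divide below */

    /* number of distinct values in the raw relation */
    ndistinct = stadistinct;
    if (ndistinct < 0.0)
        ndistinct = -ndistinct * tuples;

    /* assume restriction clauses thin out the values uniformly */
    ndistinct *= rows / tuples;

    if (ndistinct < 50.0)
        return 0.5;                 /* too few distinct values: discourage */

    /* ideal case: every bucket holds SKETCH_NTUP_PER_BUCKET tuples */
    estfract = (double) SKETCH_NTUP_PER_BUCKET / rows;

    /* scale up if there are too few distinct values to fill all buckets */
    needdistinct = rows / (double) SKETCH_NTUP_PER_BUCKET;
    if (ndistinct < needdistinct)
        estfract *= needdistinct / ndistinct;

    /* scale up again by the skew of the most common value */
    avgfreq = (1.0 - nullfrac) / ndistinct;
    if (avgfreq > 0.0 && mcvfreq > avgfreq)
        estfract *= mcvfreq / avgfreq;

    return estfract;
}
```

Under these assumptions, a 10000-row relation with 100 distinct values and a most common value at frequency 0.05 starts at 10/10000 = 0.001, is scaled by 1000/100 = 10 for the shortage of distinct values, and by 0.05/0.01 = 5 for skew, giving roughly 0.05.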
/*
* cost_qual_eval
......
......@@ -8,15 +8,15 @@
*
*
* IDENTIFICATION
* $Header: /cvsroot/pgsql/src/backend/optimizer/path/joinpath.c,v 1.63 2001/04/15 00:48:17 tgl Exp $
* $Header: /cvsroot/pgsql/src/backend/optimizer/path/joinpath.c,v 1.64 2001/05/07 00:43:20 tgl Exp $
*
*-------------------------------------------------------------------------
*/
#include "postgres.h"
#include <sys/types.h>
#include <math.h>
#include "postgres.h"
#include "optimizer/clauses.h"
#include "optimizer/cost.h"
#include "optimizer/pathnode.h"
......@@ -45,7 +45,6 @@ static void hash_inner_and_outer(Query *root, RelOptInfo *joinrel,
List *restrictlist, JoinType jointype);
static Path *best_innerjoin(List *join_paths, List *outer_relid,
JoinType jointype);
static Selectivity estimate_dispersion(Query *root, Var *var);
static List *select_mergejoin_clauses(RelOptInfo *joinrel,
RelOptInfo *outerrel,
RelOptInfo *innerrel,
......@@ -722,7 +721,7 @@ hash_inner_and_outer(Query *root,
Expr *clause;
Var *left,
*right;
Selectivity innerdispersion;
Selectivity innerbucketsize;
List *hashclauses;
if (restrictinfo->hashjoinoperator == InvalidOid)
......@@ -742,34 +741,34 @@ hash_inner_and_outer(Query *root,
/*
* Check if clause is usable with these sub-rels, find inner side,
* estimate dispersion of inner var for costing purposes.
* estimate bucketsize of inner var for costing purposes.
*
* Since we tend to visit the same clauses over and over when
* planning a large query, we cache the dispersion estimates in
* planning a large query, we cache the bucketsize estimates in
* the RestrictInfo node to avoid repeated lookups of statistics.
*/
if (intMember(left->varno, outerrelids) &&
intMember(right->varno, innerrelids))
{
/* righthand side is inner */
innerdispersion = restrictinfo->right_dispersion;
if (innerdispersion < 0)
innerbucketsize = restrictinfo->right_bucketsize;
if (innerbucketsize < 0)
{
/* not cached yet */
innerdispersion = estimate_dispersion(root, right);
restrictinfo->right_dispersion = innerdispersion;
innerbucketsize = estimate_hash_bucketsize(root, right);
restrictinfo->right_bucketsize = innerbucketsize;
}
}
else if (intMember(left->varno, innerrelids) &&
intMember(right->varno, outerrelids))
{
/* lefthand side is inner */
innerdispersion = restrictinfo->left_dispersion;
if (innerdispersion < 0)
innerbucketsize = restrictinfo->left_bucketsize;
if (innerbucketsize < 0)
{
/* not cached yet */
innerdispersion = estimate_dispersion(root, left);
restrictinfo->left_dispersion = innerdispersion;
innerbucketsize = estimate_hash_bucketsize(root, left);
restrictinfo->left_bucketsize = innerbucketsize;
}
}
else
......@@ -790,7 +789,7 @@ hash_inner_and_outer(Query *root,
innerrel->cheapest_total_path,
restrictlist,
hashclauses,
innerdispersion));
innerbucketsize));
if (outerrel->cheapest_startup_path != outerrel->cheapest_total_path)
add_path(joinrel, (Path *)
create_hashjoin_path(joinrel,
......@@ -799,7 +798,7 @@ hash_inner_and_outer(Query *root,
innerrel->cheapest_total_path,
restrictlist,
hashclauses,
innerdispersion));
innerbucketsize));
}
}
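The caching scheme used above (a negative value in the RestrictInfo field means "not computed yet") can be sketched without any planner types. Everything named `Sketch...` or `sketch_...` below is hypothetical, and the estimator returns a made-up placeholder value; the point is only that the expensive lookup runs once per clause side.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the two cache fields of a RestrictInfo node. */
typedef struct SketchClause
{
    double left_bucketsize;     /* -1 means "not computed yet" */
    double right_bucketsize;    /* -1 means "not computed yet" */
} SketchClause;

/* Counts how often the expensive estimator runs; illustration only. */
static int sketch_estimator_calls = 0;

/* Placeholder for the statistics lookup; the returned value is made up. */
static double
sketch_expensive_estimate(int attno)
{
    sketch_estimator_calls++;
    return 1.0 / (double) (attno + 1);
}

/* Mark both cached estimates as not yet computed. */
static void
sketch_init_clause(SketchClause *clause)
{
    clause->left_bucketsize = -1;
    clause->right_bucketsize = -1;
}

/* Return the cached right-hand estimate, computing it on first use. */
static double
sketch_right_bucketsize(SketchClause *clause, int attno)
{
    assert(clause != NULL);
    if (clause->right_bucketsize < 0)
        clause->right_bucketsize = sketch_expensive_estimate(attno);
    return clause->right_bucketsize;
}
```

The sentinel works because a valid bucketsize fraction is never negative, so no separate "is cached" flag is needed.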
......@@ -866,31 +865,6 @@ best_innerjoin(List *join_paths, Relids outer_relids, JoinType jointype)
return cheapest;
}
/*
* Estimate dispersion of the specified Var
*
* We use a default of 0.1 if we can't figure out anything better.
* This will typically discourage use of a hash rather strongly,
* if the inner relation is large. We do not want to hash unless
* we know that the inner rel is well-dispersed (or the alternatives
* seem much worse).
*/
static Selectivity
estimate_dispersion(Query *root, Var *var)
{
Oid relid;
if (!IsA(var, Var))
return 0.1;
relid = getrelid(var->varno, root->rtable);
if (relid == InvalidOid)
return 0.1;
return (Selectivity) get_attdispersion(relid, var->varattno, 0.1);
}
/*
* select_mergejoin_clauses
* Select mergejoin clauses that are usable for a particular join.
......
......@@ -10,14 +10,14 @@
*
*
* IDENTIFICATION
* $Header: /cvsroot/pgsql/src/backend/optimizer/plan/createplan.c,v 1.104 2001/03/22 03:59:36 momjian Exp $
* $Header: /cvsroot/pgsql/src/backend/optimizer/plan/createplan.c,v 1.105 2001/05/07 00:43:20 tgl Exp $
*
*-------------------------------------------------------------------------
*/
#include <sys/types.h>
#include "postgres.h"
#include <sys/types.h>
#include "catalog/pg_index.h"
#include "nodes/makefuncs.h"
#include "nodes/nodeFuncs.h"
......@@ -1484,9 +1484,9 @@ make_sort_from_pathkeys(List *tlist, Plan *lefttree, List *pathkeys)
*/
if (resdom->reskey == 0)
{
/* OK, mark it as a sort key and set the sort operator regproc */
/* OK, mark it as a sort key and set the sort operator */
resdom->reskey = ++numsortkeys;
resdom->reskeyop = get_opcode(pathkey->sortop);
resdom->reskeyop = pathkey->sortop;
}
}
......
......@@ -8,13 +8,14 @@
*
*
* IDENTIFICATION
* $Header: /cvsroot/pgsql/src/backend/optimizer/plan/initsplan.c,v 1.59 2001/04/16 19:44:10 tgl Exp $
* $Header: /cvsroot/pgsql/src/backend/optimizer/plan/initsplan.c,v 1.60 2001/05/07 00:43:21 tgl Exp $
*
*-------------------------------------------------------------------------
*/
#include "postgres.h"
#include <sys/types.h>
#include "postgres.h"
#include "catalog/pg_operator.h"
#include "catalog/pg_type.h"
#include "nodes/makefuncs.h"
......@@ -348,8 +349,8 @@ distribute_qual_to_rels(Query *root, Node *clause,
restrictinfo->left_pathkey = NIL; /* not computable yet */
restrictinfo->right_pathkey = NIL;
restrictinfo->hashjoinoperator = InvalidOid;
restrictinfo->left_dispersion = -1; /* not computed until needed */
restrictinfo->right_dispersion = -1;
restrictinfo->left_bucketsize = -1; /* not computed until needed */
restrictinfo->right_bucketsize = -1;
/*
* Retrieve all relids and vars contained within the clause.
......
......@@ -8,7 +8,7 @@
*
*
* IDENTIFICATION
* $Header: /cvsroot/pgsql/src/backend/optimizer/plan/planner.c,v 1.105 2001/04/30 19:24:47 tgl Exp $
* $Header: /cvsroot/pgsql/src/backend/optimizer/plan/planner.c,v 1.106 2001/05/07 00:43:21 tgl Exp $
*
*-------------------------------------------------------------------------
*/
......@@ -1367,7 +1367,7 @@ make_groupplan(List *group_tlist,
{
/* OK, insert the ordering info needed by the executor. */
resdom->reskey = ++keyno;
resdom->reskeyop = get_opcode(grpcl->sortop);
resdom->reskeyop = grpcl->sortop;
}
}
......@@ -1412,7 +1412,7 @@ make_sortplan(List *tlist, Plan *plannode, List *sortcls)
{
/* OK, insert the ordering info needed by the executor. */
resdom->reskey = ++keyno;
resdom->reskeyop = get_opcode(sortcl->sortop);
resdom->reskeyop = sortcl->sortop;
}
}
......
......@@ -14,7 +14,7 @@
*
*
* IDENTIFICATION
* $Header: /cvsroot/pgsql/src/backend/optimizer/prep/prepunion.c,v 1.62 2001/03/27 18:02:19 tgl Exp $
* $Header: /cvsroot/pgsql/src/backend/optimizer/prep/prepunion.c,v 1.63 2001/05/07 00:43:22 tgl Exp $
*
*-------------------------------------------------------------------------
*/
......@@ -682,8 +682,8 @@ adjust_inherited_attrs_mutator(Node *node,
newinfo->eval_cost = -1; /* reset this too */
newinfo->left_pathkey = NIL; /* and these */
newinfo->right_pathkey = NIL;
newinfo->left_dispersion = -1;
newinfo->right_dispersion = -1;
newinfo->left_bucketsize = -1;
newinfo->right_bucketsize = -1;
return (Node *) newinfo;
}
......
......@@ -8,14 +8,14 @@
*
*
* IDENTIFICATION
* $Header: /cvsroot/pgsql/src/backend/optimizer/util/pathnode.c,v 1.71 2001/03/22 03:59:39 momjian Exp $
* $Header: /cvsroot/pgsql/src/backend/optimizer/util/pathnode.c,v 1.72 2001/05/07 00:43:22 tgl Exp $
*
*-------------------------------------------------------------------------
*/
#include <math.h>
#include "postgres.h"
#include <math.h>
#include "nodes/plannodes.h"
#include "optimizer/cost.h"
#include "optimizer/pathnode.h"
......@@ -559,7 +559,7 @@ create_mergejoin_path(RelOptInfo *joinrel,
* 'restrict_clauses' are the RestrictInfo nodes to apply at the join
* 'hashclauses' is a list of the hash join clause (always a 1-element list)
* (this should be a subset of the restrict_clauses list)
* 'innerdispersion' is an estimate of the dispersion of the inner hash key
* 'innerbucketsize' is an estimate of the bucketsize of the inner hash key
*
*/
HashPath *
......@@ -569,7 +569,7 @@ create_hashjoin_path(RelOptInfo *joinrel,
Path *inner_path,
List *restrict_clauses,
List *hashclauses,
Selectivity innerdispersion)
Selectivity innerbucketsize)
{
HashPath *pathnode = makeNode(HashPath);
......@@ -587,7 +587,7 @@ create_hashjoin_path(RelOptInfo *joinrel,
outer_path,
inner_path,
restrict_clauses,
innerdispersion);
innerbucketsize);
return pathnode;
}
......@@ -9,11 +9,10 @@
*
*
* IDENTIFICATION
* $Header: /cvsroot/pgsql/src/backend/optimizer/util/plancat.c,v 1.64 2001/03/22 03:59:40 momjian Exp $
* $Header: /cvsroot/pgsql/src/backend/optimizer/util/plancat.c,v 1.65 2001/05/07 00:43:22 tgl Exp $
*
*-------------------------------------------------------------------------
*/
#include "postgres.h"
#include <math.h>
......
......@@ -6,7 +6,7 @@
* Portions Copyright (c) 1996-2001, PostgreSQL Global Development Group
* Portions Copyright (c) 1994, Regents of the University of California
*
* $Header: /cvsroot/pgsql/src/backend/parser/analyze.c,v 1.183 2001/03/22 06:16:15 momjian Exp $
* $Header: /cvsroot/pgsql/src/backend/parser/analyze.c,v 1.184 2001/05/07 00:43:22 tgl Exp $
*
*-------------------------------------------------------------------------
*/
......@@ -2660,7 +2660,7 @@ transformForUpdate(Query *qry, List *forUpdate)
/* just the named tables */
foreach(l, forUpdate)
{
char *relname = lfirst(l);
char *relname = strVal(lfirst(l));
i = 0;
foreach(rt, qry->rtable)
......
This diff is collapsed.
......@@ -8,7 +8,7 @@
*
*
* IDENTIFICATION
* $Header: /cvsroot/pgsql/src/backend/parser/keywords.c,v 1.90 2001/03/22 03:59:40 momjian Exp $
* $Header: /cvsroot/pgsql/src/backend/parser/keywords.c,v 1.91 2001/05/07 00:43:23 tgl Exp $
*
*-------------------------------------------------------------------------
*/
......@@ -238,6 +238,7 @@ static ScanKeyword ScanKeywords[] = {
{"some", SOME},
{"start", START},
{"statement", STATEMENT},
{"statistics", STATISTICS},
{"stdin", STDIN},
{"stdout", STDOUT},
{"substring", SUBSTRING},
......
......@@ -8,7 +8,7 @@
*
*
* IDENTIFICATION
* $Header: /cvsroot/pgsql/src/backend/parser/parse_relation.c,v 1.54 2001/04/18 17:04:24 tgl Exp $
* $Header: /cvsroot/pgsql/src/backend/parser/parse_relation.c,v 1.55 2001/05/07 00:43:23 tgl Exp $
*
*-------------------------------------------------------------------------
*/
......@@ -75,7 +75,7 @@ static struct
}
};
#define SPECIALS ((int) (sizeof(special_attr)/sizeof(special_attr[0])))
#define SPECIALS ((int) lengthof(special_attr))
/*
......@@ -670,7 +670,7 @@ isForUpdate(ParseState *pstate, char *relname)
foreach(l, pstate->p_forUpdate)
{
char *rname = lfirst(l);
char *rname = strVal(lfirst(l));
if (strcmp(relname, rname) == 0)
return true;
......@@ -1020,20 +1020,6 @@ attnameIsSet(Relation rd, char *name)
#endif
#ifdef NOT_USED
/*
* This should only be used if the relation is already
* heap_open()'ed. Use the cache version
* for access to non-opened relations.
*/
int
attnumAttNelems(Relation rd, int attid)
{
return rd->rd_att->attrs[attid - 1]->attnelems;
}
#endif
/* given attribute id, return type of that attribute */
/*
* This should only be used if the relation is already
......
This diff is collapsed.