Postgres FD Implementation
Commit c6521b1b, authored Feb 13, 2005 by Tom Lane
Write some real documentation about the index access method API.
Parent: 67ff8009
Showing 6 changed files with 852 additions and 315 deletions
doc/src/sgml/catalogs.sgml   +5   -18
doc/src/sgml/filelist.sgml   +2   -2
doc/src/sgml/indexam.sgml    +837 -0
doc/src/sgml/indexcost.sgml  +0   -285
doc/src/sgml/postgres.sgml   +2   -2
doc/src/sgml/xindex.sgml     +6   -8
doc/src/sgml/catalogs.sgml
<!--
Documentation of the system catalogs, directed toward PostgreSQL developers
-$PostgreSQL: pgsql/doc/src/sgml/catalogs.sgml,v 2.95 2005/01/05 23:42:03 tgl Exp $
+$PostgreSQL: pgsql/doc/src/sgml/catalogs.sgml,v 2.96 2005/02/13 03:04:15 tgl Exp $
-->
<chapter id="catalogs">
...
@@ -289,9 +289,10 @@
</indexterm>
<para>
-The catalog <structname>pg_am</structname> stores information about index access
-methods. There is one row for each index access method supported by
-the system.
+The catalog <structname>pg_am</structname> stores information about index
+access methods. There is one row for each index access method supported by
+the system. The contents of this catalog are discussed in detail in
+<xref linkend="indexam">.
</para>
<table>
...
@@ -453,20 +454,6 @@
</tgroup>
</table>
-<para>
-An index access method that supports multiple columns (has
-<structfield>amcanmulticol</structfield> true) <emphasis>must</>
-support indexing null values in columns after the first, because the planner
-will assume the index can be used for queries on just the first
-column(s). For example, consider an index on (a,b) and a query with
-<literal>WHERE a = 4</literal>. The system will assume the index can be used to scan for
-rows with <literal>a = 4</literal>, which is wrong if the index omits rows where <literal>b</> is null.
-It is, however, OK to omit rows where the first indexed column is null.
-(GiST currently does so.)
-<structfield>amindexnulls</structfield> should be set true only if the
-index access method indexes all rows, including arbitrary combinations of null values.
-</para>
</sect1>
...
doc/src/sgml/filelist.sgml
-<!-- $PostgreSQL: pgsql/doc/src/sgml/filelist.sgml,v 1.41 2005/01/10 00:04:38 tgl Exp $ -->
+<!-- $PostgreSQL: pgsql/doc/src/sgml/filelist.sgml,v 1.42 2005/02/13 03:04:15 tgl Exp $ -->
<!entity history SYSTEM "history.sgml">
<!entity info SYSTEM "info.sgml">
...
@@ -77,7 +77,7 @@
<!entity catalogs SYSTEM "catalogs.sgml">
<!entity geqo SYSTEM "geqo.sgml">
<!entity gist SYSTEM "gist.sgml">
-<!entity indexcost SYSTEM "indexcost.sgml">
+<!entity indexam SYSTEM "indexam.sgml">
<!entity nls SYSTEM "nls.sgml">
<!entity plhandler SYSTEM "plhandler.sgml">
<!entity protocol SYSTEM "protocol.sgml">
...
doc/src/sgml/indexam.sgml
new file mode 100644
<!--
$PostgreSQL: pgsql/doc/src/sgml/indexam.sgml,v 2.1 2005/02/13 03:04:15 tgl Exp $
-->
<chapter id="indexam">
<title>Index Access Method Interface Definition</title>
<para>
This chapter defines the interface between the core
<productname>PostgreSQL</productname> system and <firstterm>index access
methods</>, which manage individual index types. The core system
knows nothing about indexes beyond what is specified here, so it is
possible to develop entirely new index types by writing add-on code.
</para>
<para>
All indexes in <productname>PostgreSQL</productname> are what are known
technically as <firstterm>secondary indexes</>; that is, the index is
physically separate from the table file that it describes. Each index
is stored as its own physical <firstterm>relation</> and so is described
by an entry in the <structname>pg_class</> catalog. The contents of an
index are entirely under the control of its index access method. In
practice, all index access methods divide indexes into standard-size
pages so that they can use the regular storage manager and buffer manager
to access the index contents. (All the existing index access methods
furthermore use the standard page layout described in <xref
linkend="storage-page-layout">, and they all use the same format for index
tuple headers; but these decisions are not forced on an access method.)
</para>
<para>
An index is effectively a mapping from some data key values to
<firstterm>tuple identifiers</>, or <acronym>TIDs</>, of row versions
(tuples) in the index's parent table. A TID consists of a
block number and an item number within that block (see <xref
linkend="storage-page-layout">). This is sufficient
information to fetch a particular row version from the table.
Indexes are not directly aware that under MVCC, there may be multiple
extant versions of the same logical row; to an index, each tuple is
an independent object that needs its own index entry. Thus, an
update of a row always creates all-new index entries for the row, even if
the key values did not change. Index entries for dead tuples are
reclaimed (by vacuuming) when the dead tuples themselves are reclaimed.
</para>
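<para>
To make the TID mapping and its MVCC consequence concrete, here is a minimal, self-contained C sketch. It deliberately avoids PostgreSQL headers: <literal>SketchTid</>, <literal>SketchIndex</> and the helper functions are hypothetical stand-ins (the real TID type is <structname>ItemPointerData</>), chosen only to show that an update whose key is unchanged still produces a second index entry.
</para>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified TID: (block number, item number within block). */
typedef struct {
    uint32_t block;   /* heap block number */
    uint16_t item;    /* item (line pointer) number within that block */
} SketchTid;

/* One index entry: a key value mapped to the TID of one row version. */
typedef struct {
    int       key;
    SketchTid tid;
} SketchIndexEntry;

#define SKETCH_MAX_ENTRIES 64

typedef struct {
    SketchIndexEntry entries[SKETCH_MAX_ENTRIES];
    size_t           nentries;
} SketchIndex;

/* Add an entry for a newly created tuple (row version). */
static void sketch_index_add(SketchIndex *idx, int key, SketchTid tid)
{
    assert(idx->nentries < SKETCH_MAX_ENTRIES);
    idx->entries[idx->nentries].key = key;
    idx->entries[idx->nentries].tid = tid;
    idx->nentries++;
}

/*
 * An UPDATE writes a new tuple at a new TID.  The index cannot tell that
 * the old and new tuples are versions of one logical row, so it simply
 * gains another entry, even though the key did not change.
 */
static void sketch_update_row(SketchIndex *idx, int key, SketchTid new_tid)
{
    sketch_index_add(idx, key, new_tid);
}

/* Count the row versions the index knows about for a given key. */
static size_t sketch_count_key(const SketchIndex *idx, int key)
{
    size_t n = 0;
    for (size_t i = 0; i < idx->nentries; i++)
        if (idx->entries[i].key == key)
            n++;
    return n;
}
```

<para>
After one insert and one key-preserving update, the index holds two entries for the same key; the one pointing at the dead version goes away only when <command>VACUUM</> reclaims that tuple.
</para>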
<sect1 id="index-catalog">
<title>Catalog Entries for Indexes</title>
<para>
Each index access method is described by a row in the
<structname>pg_am</structname> system catalog (see
<xref linkend="catalog-pg-am">). The principal contents of a
<structname>pg_am</structname> row are references to
<link linkend="catalog-pg-proc"><structname>pg_proc</structname></link>
entries that identify the index access
functions supplied by the access method. The APIs for these functions
are defined later in this chapter. In addition, the
<structname>pg_am</structname> row specifies a few fixed properties of
the access method, such as whether it can support multi-column indexes.
There is not currently any special support
for creating or deleting <structname>pg_am</structname> entries;
anyone able to write a new access method is expected to be competent
to insert an appropriate row for themselves.
</para>
<para>
To be useful, an index access method must also have one or more
<firstterm>operator classes</> defined in
<link linkend="catalog-pg-opclass"><structname>pg_opclass</structname></link>,
<link linkend="catalog-pg-amop"><structname>pg_amop</structname></link>, and
<link linkend="catalog-pg-amproc"><structname>pg_amproc</structname></link>.
These entries allow the planner
to determine what kinds of query qualifications can be used with
indexes of this access method. Operator classes are described
in <xref linkend="xindex">, which is prerequisite material for reading
this chapter.
</para>
<para>
An individual index is defined by a
<link linkend="catalog-pg-class"><structname>pg_class</structname></link>
entry that describes it as a physical relation, plus a
<link linkend="catalog-pg-index"><structname>pg_index</structname></link>
entry that shows the logical content of the index — that is, the set
of index columns it has and the semantics of those columns, as captured by
the associated operator classes. The index columns (key values) can be
either simple columns of the underlying table or expressions over the table
rows. The index access method normally has no interest in where the index
key values come from (it is always handed precomputed key values) but it
will be very interested in the operator class information in
<structname>pg_index</structname>. Both of these catalog entries can be
accessed as part of the <structname>Relation</> data structure that is
passed to all operations on the index.
</para>
<para>
Some of the flag columns of <structname>pg_am</structname> have nonobvious
implications. The requirements of <structfield>amcanunique</structfield>
are discussed in <xref linkend="index-unique-checks">, and those of
<structfield>amconcurrent</structfield> in <xref linkend="index-locking">.
The <structfield>amcanmulticol</structfield> flag asserts that the
access method supports multi-column indexes, while
<structfield>amindexnulls</structfield> asserts that index entries are
created for NULL key values. Since most indexable operators are
strict and hence cannot return TRUE for NULL inputs,
it is at first sight attractive to not store index entries for NULLs:
they could never be returned by an index scan anyway. However, this
argument fails for a full-table index scan (one with no scan keys);
such a scan should include null rows. In practice this means that
indexes that support ordered scans (have <structfield>amorderstrategy</>
nonzero) must index nulls, since the planner might decide to use such a
scan as a substitute for sorting. Another restriction is that an index
access method that supports multiple index columns <emphasis>must</>
support indexing null values in columns after the first, because the planner
will assume the index can be used for queries on just the first
column(s). For example, consider an index on (a,b) and a query with
<literal>WHERE a = 4</literal>. The system will assume the index can be
used to scan for rows with <literal>a = 4</literal>, which is wrong if the
index omits rows where <literal>b</> is null.
It is, however, OK to omit rows where the first indexed column is null.
(GiST currently does so.) Thus,
<structfield>amindexnulls</structfield> should be set true only if the
index access method indexes all rows, including arbitrary combinations of
null values.
</para>
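<para>
The (a,b) example can be checked mechanically. The following C sketch is purely illustrative: <literal>SketchRow</> and both skip policies are hypothetical. It simulates a scan for <literal>WHERE a = 4</literal> over an index that omits rows with a null in any column (the wrong policy for a multi-column access method) and over one that omits only rows whose first column is null (the policy GiST follows).
</para>

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* A hypothetical two-column row (a, b); either column may be null. */
typedef struct {
    int  a, b;
    bool a_null, b_null;
} SketchRow;

/* Wrong for a multi-column AM: skip rows with a null in ANY column. */
static bool skip_any_null(const SketchRow *r) { return r->a_null || r->b_null; }

/* Allowed: skip only rows whose FIRST indexed column is null. */
static bool skip_first_null(const SketchRow *r) { return r->a_null; }

/*
 * Simulate an index scan for "WHERE a = value" on an index over (a, b)
 * built with the given skip policy; returns the number of rows found.
 */
static size_t scan_a_equals(const SketchRow *rows, size_t nrows, int value,
                            bool (*skip)(const SketchRow *))
{
    size_t found = 0;
    assert(rows != NULL || nrows == 0);
    for (size_t i = 0; i < nrows; i++) {
        if (skip(&rows[i]))
            continue;               /* row has no index entry at all */
        if (!rows[i].a_null && rows[i].a == value)
            found++;
    }
    return found;
}
```

<para>
With rows (4,1), (4,NULL) and (NULL,7), the first policy finds only one row for <literal>a = 4</literal> while the second correctly finds two.
</para>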
</sect1>
<sect1 id="index-functions">
<title>Index Access Method Functions</title>
<para>
The index construction and maintenance functions that an index access
method must provide are:
</para>
<para>
<programlisting>
void
ambuild (Relation heapRelation,
Relation indexRelation,
IndexInfo *indexInfo);
</programlisting>
Build a new index. The index relation has been physically created,
but is empty. It must be filled in with whatever fixed data the
access method requires, plus entries for all tuples already existing
in the table. Ordinarily the <function>ambuild</> function will call
<function>IndexBuildHeapScan()</> to scan the table for existing tuples
and compute the keys that need to be inserted into the index.
</para>
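<para>
The division of labor in <function>ambuild</> (the core scans the heap and computes keys; the access method stores them) can be sketched in standalone C. Everything here is hypothetical: <literal>sketch_heap_scan</> only loosely imitates the per-tuple callback style of <function>IndexBuildHeapScan()</>, and the <quote>index</> is a sorted array rather than real pages.
</para>

```c
#include <assert.h>
#include <stddef.h>

typedef struct { unsigned block, item; } SketchTid;
typedef struct { int key; SketchTid tid; } SketchEntry;

#define SKETCH_CAP 32
typedef struct { SketchEntry e[SKETCH_CAP]; size_t n; } SketchSortedIndex;

/* Per-tuple callback, loosely modeled on the heap-scan helper's style. */
typedef void (*SketchBuildCallback)(int key, SketchTid tid, void *state);

/* Stand-in heap scan: reports each tuple's key and TID to the callback. */
static void sketch_heap_scan(const int *heap_keys, size_t ntuples,
                             SketchBuildCallback cb, void *state)
{
    for (size_t i = 0; i < ntuples; i++) {
        /* Pretend 4 tuples fit per block, with items numbered from 1. */
        SketchTid tid = { (unsigned)(i / 4), (unsigned)(i % 4 + 1) };
        cb(heap_keys[i], tid, state);
    }
}

/* AM side: insert each entry, keeping the array sorted by key. */
static void sketch_build_callback(int key, SketchTid tid, void *state)
{
    SketchSortedIndex *idx = state;
    size_t pos = idx->n;
    assert(idx->n < SKETCH_CAP);
    while (pos > 0 && idx->e[pos - 1].key > key) {
        idx->e[pos] = idx->e[pos - 1];
        pos--;
    }
    idx->e[pos].key = key;
    idx->e[pos].tid = tid;
    idx->n++;
}

/* Hypothetical ambuild: initialize the empty index, then fill it. */
static void sketch_ambuild(const int *heap_keys, size_t ntuples,
                           SketchSortedIndex *idx)
{
    idx->n = 0;   /* the only "fixed data" this toy AM needs */
    sketch_heap_scan(heap_keys, ntuples, sketch_build_callback, idx);
}
```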
<para>
<programlisting>
InsertIndexResult
aminsert (Relation indexRelation,
Datum *datums,
char *nulls,
ItemPointer heap_tid,
Relation heapRelation,
bool check_uniqueness);
</programlisting>
Insert a new tuple into an existing index. The <literal>datums</> and
<literal>nulls</> arrays give the key values to be indexed, and
<literal>heap_tid</> is the TID to be indexed.
If the access method supports unique indexes (its
<structname>pg_am</>.<structfield>amcanunique</> flag is true) then
<literal>check_uniqueness</> may be true, in which case the access method
must verify that there is no conflicting row; this is the only situation in
which the access method normally needs the <literal>heapRelation</>
parameter. See <xref linkend="index-unique-checks"> for details.
The result is a struct that must be pfree'd by the caller. (The result
struct is really quite useless and should be removed...)
</para>
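<para>
A minimal sketch of the <literal>check_uniqueness</> path for a toy sorted-array index follows. It shows why the duplicate search and the choice of insertion point naturally form one pass. It makes a large simplification, stated here and in the code: any existing equal key is treated as a live conflict, whereas a real access method must consult the heap as described in <xref linkend="index-unique-checks">. All names are hypothetical.
</para>

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct { int key; unsigned tid; } SketchEntry; /* TID flattened for brevity */

#define SKETCH_CAP 16
typedef struct { SketchEntry e[SKETCH_CAP]; size_t n; } SketchIndex;

typedef enum { SKETCH_INSERTED, SKETCH_UNIQUE_VIOLATION, SKETCH_FULL } SketchInsertResult;

/*
 * Hypothetical aminsert.  Simplification: an equal key is always treated
 * as a conflict; a real AM would first check the heap tuple's status.
 */
static SketchInsertResult sketch_aminsert(SketchIndex *idx, int key,
                                          unsigned heap_tid, bool check_uniqueness)
{
    size_t pos = 0;
    assert(idx != NULL);
    if (idx->n == SKETCH_CAP)
        return SKETCH_FULL;
    /* One search finds both the insertion point and any duplicate. */
    while (pos < idx->n && idx->e[pos].key < key)
        pos++;
    if (check_uniqueness && pos < idx->n && idx->e[pos].key == key)
        return SKETCH_UNIQUE_VIOLATION;
    for (size_t i = idx->n; i > pos; i--)
        idx->e[i] = idx->e[i - 1];
    idx->e[pos].key = key;
    idx->e[pos].tid = heap_tid;
    idx->n++;
    return SKETCH_INSERTED;
}
```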
<para>
<programlisting>
IndexBulkDeleteResult *
ambulkdelete (Relation indexRelation,
IndexBulkDeleteCallback callback,
void *callback_state);
</programlisting>
Delete tuple(s) from the index. This is a <quote>bulk delete</> operation
that is intended to be implemented by scanning the whole index and checking
each entry to see if it should be deleted.
The passed-in <literal>callback</> function may be called, in the style
<literal>callback(<replaceable>TID</>, callback_state) returns bool</literal>,
to determine whether any particular index entry, as identified by its
referenced TID, is to be deleted. The function must return either NULL or a
palloc'd struct containing statistics about the effects of the deletion operation.
</para>
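<para>
The bulk-delete contract (one pass over the whole index, asking the callback about each entry's TID) can be sketched as follows. This is an illustration under assumptions: the statistics struct is a hypothetical two-field subset and is returned by value rather than palloc'd, and the callback's state is a plain array of dead TIDs standing in for whatever <command>VACUUM</> actually passes.
</para>

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct { int key; unsigned tid; } SketchEntry;
typedef struct { SketchEntry *e; size_t n; } SketchIndex;

/* Shaped like the documented callback(TID, callback_state) returns bool. */
typedef bool (*SketchBulkDeleteCallback)(unsigned tid, void *callback_state);

/* Hypothetical subset of the deletion statistics. */
typedef struct { size_t num_index_tuples; size_t tuples_removed; } SketchBulkDeleteResult;

/* Scan the whole index once, dropping every entry the callback calls dead. */
static SketchBulkDeleteResult sketch_ambulkdelete(SketchIndex *idx,
                                                  SketchBulkDeleteCallback callback,
                                                  void *callback_state)
{
    SketchBulkDeleteResult stats = {0, 0};
    size_t keep = 0;
    for (size_t i = 0; i < idx->n; i++) {
        if (callback(idx->e[i].tid, callback_state))
            stats.tuples_removed++;
        else
            idx->e[keep++] = idx->e[i];   /* compact survivors in place */
    }
    idx->n = keep;
    stats.num_index_tuples = keep;
    return stats;
}

/* Example callback state: a list of dead heap TIDs. */
typedef struct { const unsigned *dead; size_t ndead; } SketchDeadList;

static bool sketch_tid_is_dead(unsigned tid, void *state)
{
    const SketchDeadList *list = state;
    assert(list != NULL);
    for (size_t i = 0; i < list->ndead; i++)
        if (list->dead[i] == tid)
            return true;
    return false;
}
```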
<para>
<programlisting>
IndexBulkDeleteResult *
amvacuumcleanup (Relation indexRelation,
IndexVacuumCleanupInfo *info,
IndexBulkDeleteResult *stats);
</programlisting>
Clean up after a <command>VACUUM</command> operation (one or more
<function>ambulkdelete</> calls). An index access method does not have
to provide this function (if it does not, its entry in <structname>pg_am</>
must be zero). If it is provided, it is typically used for bulk cleanup
such as reclaiming empty index pages. <literal>info</>
provides some additional arguments such as a message level for statistical
reports, and <literal>stats</> is whatever the last
<function>ambulkdelete</> call returned. <function>amvacuumcleanup</>
may replace or modify this struct before returning it. If the result
is not NULL it must be a palloc'd struct. The statistics it contains
will be reported by <command>VACUUM</> if <literal>VERBOSE</> is given.
</para>
<para>
The purpose of an index, of course, is to support scans for tuples matching
an indexable <literal>WHERE</> condition, often called a
<firstterm>qualifier</> or <firstterm>scan key</>. The semantics of
index scanning are described more fully in <xref linkend="index-scanning">,
below. The scan-related functions that an index access method must provide
are:
</para>
<para>
<programlisting>
IndexScanDesc
ambeginscan (Relation indexRelation,
int nkeys,
ScanKey key);
</programlisting>
Begin a new scan. The <literal>key</> array (of length <literal>nkeys</>)
describes the scan key(s) for the index scan. The result must be a
palloc'd struct. For implementation reasons the index access method
<emphasis>must</> create this struct by calling
<function>RelationGetIndexScan()</>. In most cases
<function>ambeginscan</> itself does little beyond making that call;
the interesting parts of indexscan startup are in <function>amrescan</>.
</para>
<para>
<programlisting>
bool
amgettuple (IndexScanDesc scan,
ScanDirection direction);
</programlisting>
Fetch the next tuple in the given scan, moving in the given
direction (forward or backward in the index). Returns TRUE if a tuple was
obtained, FALSE if no matching tuples remain. In the TRUE case the tuple
TID is stored into the <literal>scan</> structure. Note that
<quote>success</> means only that the index contains an entry that matches
the scan keys, not that the tuple necessarily still exists in the heap or
will pass the caller's snapshot test.
</para>
<para>
<programlisting>
void
amrescan (IndexScanDesc scan,
ScanKey key);
</programlisting>
Restart the given scan, possibly with new scan keys (to continue using
the old keys, NULL is passed for <literal>key</>). Note that it is not
possible for the number of keys to be changed. In practice the restart
feature is used when a new outer tuple is selected by a nestloop join
and so a new key comparison value is needed, but the scan key structure
remains the same. This function is also called by
<function>RelationGetIndexScan()</>, so it is used for initial setup
of an indexscan as well as rescanning.
</para>
<para>
<programlisting>
void
amendscan (IndexScanDesc scan);
</programlisting>
End a scan and release resources. The <literal>scan</> struct itself
should not be freed, but any locks or pins taken internally by the
access method must be released.
</para>
<para>
<programlisting>
void
ammarkpos (IndexScanDesc scan);
</programlisting>
Mark current scan position. The access method need only support one
remembered scan position per scan.
</para>
<para>
<programlisting>
void
amrestrpos (IndexScanDesc scan);
</programlisting>
Restore the scan to the most recently marked position.
</para>
<para>
<programlisting>
void
amcostestimate (Query *root,
RelOptInfo *rel,
IndexOptInfo *index,
List *indexQuals,
Cost *indexStartupCost,
Cost *indexTotalCost,
Selectivity *indexSelectivity,
double *indexCorrelation);
</programlisting>
Estimate the costs of an index scan. This function is described fully
in <xref linkend="index-cost-estimation">, below.
</para>
<para>
By convention, the <literal>pg_proc</literal> entry for any index
access method function should show the correct number of arguments,
but declare them all as type <type>internal</> (since most of the arguments
have types that are not known to SQL, and we don't want users calling
the functions directly anyway). The return type is declared as
<type>void</>, <type>internal</>, or <type>boolean</> as appropriate.
</para>
</sect1>
<sect1 id="index-scanning">
<title>Index Scanning</title>
<para>
In an index scan, the index access method is responsible for regurgitating
the TIDs of all the tuples it has been told about that match the
<firstterm>scan keys</>. The access method is <emphasis>not</> involved in
actually fetching those tuples from the index's parent table, nor in
determining whether they pass the scan's time qualification test or other
conditions.
</para>
<para>
A scan key is the internal representation of a <literal>WHERE</> clause of
the form <replaceable>index_key</> <replaceable>operator</>
<replaceable>constant</>, where the index key is one of the columns of the
index and the operator is one of the members of the operator class
associated with that index column. An index scan has zero or more scan
keys, which are implicitly ANDed — the returned tuples are expected
to satisfy all the indicated conditions.
</para>
<para>
The operator class may indicate that the index is <firstterm>lossy</> for a
particular operator; this implies that the index scan will return all the
entries that pass the scan key, plus possibly additional entries that do
not. The core system's indexscan machinery will then apply that operator
again to the heap tuple to verify whether or not it really should be
selected. For non-lossy operators, the index scan must return exactly the
set of matching entries, as there is no recheck.
</para>
<para>
Note that it is entirely up to the access method to ensure that it
correctly finds all and only the entries passing all the given scan keys.
Also, the core system will simply hand off all the <literal>WHERE</>
clauses that match the index keys and operator classes, without any
semantic analysis to determine whether they are redundant or
contradictory. As an example, given
<literal>WHERE x > 4 AND x > 14</> where <literal>x</> is a b-tree
indexed column, it is left to the b-tree <function>amrescan</> function
to realize that the first scan key is redundant and can be discarded.
The extent of preprocessing needed during <function>amrescan</> will
depend on the extent to which the index access method needs to reduce
the scan keys to a <quote>normalized</> form.
</para>
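<para>
As an illustration of the normalization an <function>amrescan</> might perform, the sketch below reduces ANDed <literal>&gt;</> and <literal>&lt;</> keys on one integer column to the tightest bounds, and flags key sets that no value can satisfy. It is a hypothetical simplification: only two operators are handled and the key representation is invented, not <structname>ScanKeyData</>.
</para>

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { SKETCH_GT, SKETCH_LT } SketchOp;   /* ">" and "<" only */
typedef struct { SketchOp op; int constant; } SketchScanKey;

typedef struct {
    bool has_lower, has_upper;
    int  lower, upper;        /* valid only when the matching flag is set */
    bool contradictory;       /* true if no integer x satisfies every key */
} SketchBounds;

static SketchBounds sketch_normalize_keys(const SketchScanKey *keys, size_t nkeys)
{
    SketchBounds b = { false, false, 0, 0, false };
    assert(nkeys == 0 || keys != NULL);
    for (size_t i = 0; i < nkeys; i++) {
        int c = keys[i].constant;
        if (keys[i].op == SKETCH_GT) {
            /* x > 4 AND x > 14: keep only the larger lower bound. */
            if (!b.has_lower || c > b.lower) { b.lower = c; b.has_lower = true; }
        } else {
            /* x < 20 AND x < 9: keep only the smaller upper bound. */
            if (!b.has_upper || c < b.upper) { b.upper = c; b.has_upper = true; }
        }
    }
    /* lower < x < upper has an integer solution only if upper - lower >= 2. */
    if (b.has_lower && b.has_upper)
        b.contradictory = (long long) b.upper - (long long) b.lower < 2;
    return b;
}
```

<para>
For the example in the text, the keys <literal>x &gt; 4</> and <literal>x &gt; 14</> reduce to the single bound <literal>x &gt; 14</>. Detecting a contradiction lets a scan return nothing without touching the index.
</para>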
<para>
The <function>amgettuple</> function has a <literal>direction</> argument,
which can be either <literal>ForwardScanDirection</> (the normal case)
or <literal>BackwardScanDirection</>. If the first call after
<function>amrescan</> specifies <literal>BackwardScanDirection</>, then the
set of matching index entries is to be scanned back-to-front rather than in
the normal front-to-back direction, so <function>amgettuple</> must return
the last matching tuple in the index, rather than the first one as it
normally would. (This will only occur for access
methods that advertise they support ordered scans by setting
<structname>pg_am</>.<structfield>amorderstrategy</> nonzero.) After the
first call, <function>amgettuple</> must be prepared to advance the scan in
either direction from the most recently returned entry.
</para>
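<para>
The direction rules can be sketched over a sorted in-memory array. In this hypothetical model, the first call after a rescan starts at the front or the back depending on direction, and later calls step either way from the most recently returned entry. The toy scan examines entries one by one instead of descending a tree, and its names are invented for illustration.
</para>

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { SKETCH_FORWARD, SKETCH_BACKWARD } SketchScanDirection;

/* Toy scan over a sorted key array, matching lo <= key <= hi. */
typedef struct {
    const int *keys;
    size_t     nkeys;
    int        lo, hi;       /* the (single, range-style) scan key */
    bool       started;      /* false until the first gettuple after rescan */
    long       pos;          /* position of the most recently returned entry */
} SketchScan;

static void sketch_rescan(SketchScan *s, int lo, int hi)
{
    s->lo = lo;
    s->hi = hi;
    s->started = false;
}

static bool sketch_matches(const SketchScan *s, long i)
{
    return s->keys[i] >= s->lo && s->keys[i] <= s->hi;
}

/* Returns true and sets *out to the next matching position, else false. */
static bool sketch_gettuple(SketchScan *s, SketchScanDirection dir, long *out)
{
    long step = (dir == SKETCH_FORWARD) ? 1 : -1;
    long i;

    assert(out != NULL);
    if (!s->started)
        i = (dir == SKETCH_FORWARD) ? 0 : (long) s->nkeys - 1;
    else
        i = s->pos + step;          /* move from the last returned entry */

    for (; i >= 0 && i < (long) s->nkeys; i += step) {
        if (sketch_matches(s, i)) {
            s->started = true;
            s->pos = i;
            *out = i;
            return true;
        }
    }
    return false;
}
```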
<para>
The access method must support <quote>marking</> a position in a scan
and later returning to the marked position. The same position may be
restored multiple times. However, only one position need be remembered
per scan; a new <function>ammarkpos</> call overrides the previously
marked position.
</para>
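<para>
The single-mark rule amounts to very little state, as this hypothetical C sketch shows: one remembered position that a new mark overwrites and a restore leaves in place, so it can be restored repeatedly.
</para>

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical scan state: current position plus one remembered mark. */
typedef struct {
    long pos;
    long mark;
    bool has_mark;
} SketchScanPos;

static void sketch_markpos(SketchScanPos *s)
{
    s->mark = s->pos;       /* a new mark simply replaces the old one */
    s->has_mark = true;
}

static void sketch_restrpos(SketchScanPos *s)
{
    assert(s->has_mark);    /* restoring without a mark is a caller bug */
    s->pos = s->mark;       /* the mark is kept, so it can be restored again */
}
```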
<para>
Both the scan position and the mark position (if any) must be maintained
consistently in the face of concurrent insertions or deletions in the
index. It is OK if a freshly-inserted entry is not returned by a scan that
would have found the entry had it existed when the scan started; it is
equally OK for the scan to return such an entry upon rescanning or backing
up, even though it was not returned the first time through. Similarly,
a concurrent delete may or may not be reflected in the results of a scan.
What is important is that insertions or deletions not cause the scan to
miss or multiply return entries that were not themselves being inserted or
deleted. (For an index type that does not set
<structname>pg_am</>.<structfield>amconcurrent</>, it is sufficient to
handle these cases for insertions or deletions performed by the same
backend that's doing the scan. But when <structfield>amconcurrent</> is
true, insertions or deletions from other backends must be handled as well.)
</para>
</sect1>
<sect1 id="index-locking">
<title>Index Locking Considerations</title>
<para>
An index access method can choose whether it supports concurrent updates
of the index by multiple processes. If the method's
<structname>pg_am</>.<structfield>amconcurrent</> flag is true, then
the core <productname>PostgreSQL</productname> system obtains
<literal>AccessShareLock</> on the index during an index scan, and
<literal>RowExclusiveLock</> when updating the index. Since these lock
types do not conflict, the access method is responsible for handling any
fine-grained locking it may need. An exclusive lock on the index as a whole
will be taken only during index creation, destruction, or
<literal>REINDEX</>. When <structfield>amconcurrent</> is false,
<productname>PostgreSQL</productname> still obtains
<literal>AccessShareLock</> during index scans, but it obtains
<literal>AccessExclusiveLock</> during any update. This ensures that
updaters have sole use of the index. Note that this implicitly assumes
that index scans are read-only; an access method that might modify the
index during a scan will still have to do its own locking to handle the
case of concurrent scans.
</para>
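<para>
The lock choices in the preceding paragraph can be tabulated in code. This hypothetical C sketch maps (<structfield>amconcurrent</>, scan or update) to a lock mode and checks conflicts between different backends. The conflict rule covers only these three modes: among them, only <literal>AccessExclusiveLock</> conflicts with anything. A backend's own locks never conflict, so that case is not modeled.
</para>

```c
#include <assert.h>
#include <stdbool.h>

typedef enum {
    SKETCH_ACCESS_SHARE,       /* AccessShareLock */
    SKETCH_ROW_EXCLUSIVE,      /* RowExclusiveLock */
    SKETCH_ACCESS_EXCLUSIVE    /* AccessExclusiveLock */
} SketchLockMode;

/* Lock the core takes on the index, per the rules described above. */
static SketchLockMode sketch_index_lock_mode(bool amconcurrent, bool is_update)
{
    if (!is_update)
        return SKETCH_ACCESS_SHARE;            /* scans: same either way */
    return amconcurrent ? SKETCH_ROW_EXCLUSIVE      /* AM locks finely */
                        : SKETCH_ACCESS_EXCLUSIVE;  /* updater has sole use */
}

/* Conflict between locks held by two DIFFERENT backends (three modes only). */
static bool sketch_locks_conflict(SketchLockMode a, SketchLockMode b)
{
    return a == SKETCH_ACCESS_EXCLUSIVE || b == SKETCH_ACCESS_EXCLUSIVE;
}
```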
<para>
Recall that a backend's own locks never conflict; therefore, even a
non-concurrent index type must be prepared to handle the case where
a backend is inserting or deleting entries in an index that it is itself
scanning. (This is of course necessary to support an <command>UPDATE</>
that uses the index to find the rows to be updated.)
</para>
<para>
Building an index type that supports concurrent updates usually requires
extensive and subtle analysis of the required behavior. For the b-tree
and hash index types, you can read about the design decisions involved in
<filename>src/backend/access/nbtree/README</> and
<filename>src/backend/access/hash/README</>.
</para>
<para>
Aside from the index's own internal consistency requirements, concurrent
updates create issues about consistency between the parent table (the
<firstterm>heap</>) and the index. Because
<productname>PostgreSQL</productname> separates accesses
and updates of the heap from those of the index, there are windows in
which the index may be inconsistent with the heap. We handle this problem
with the following rules:
<itemizedlist>
<listitem>
<para>
A new heap entry is made before making its index entries. (Therefore
a concurrent index scan is likely to fail to see the heap entry.
This is okay because the index reader would be uninterested in an
uncommitted row anyway. But see <xref linkend="index-unique-checks">.)
</para>
</listitem>
<listitem>
<para>
When a heap entry is to be deleted (by <command>VACUUM</>), all its
index entries must be removed first.
</para>
</listitem>
<listitem>
<para>
For concurrent index types, an indexscan must maintain a pin
on the index page holding the item last returned by
<function>amgettuple</>, and <function>ambulkdelete</> cannot delete
entries from pages that are pinned by other backends. The need
for this rule is explained below.
</para>
</listitem>
</itemizedlist>
If an index is concurrent then it is possible for an index reader to
see an index entry just before it is removed by <command>VACUUM</>, and
then to arrive at the corresponding heap entry after that was removed by
<command>VACUUM</>. (With a non-concurrent index, this is not possible
because of the conflicting index-level locks that will be taken out.)
This creates no serious problems if that item
number is still unused when the reader reaches it, since an empty
item slot will be ignored by <function>heap_fetch()</>. But what if a
third backend has already re-used the item slot for something else?
When using an MVCC-compliant snapshot, there is no problem because
the new occupant of the slot is certain to be too new to pass the
snapshot test. However, with a non-MVCC-compliant snapshot (such as
<literal>SnapshotNow</>), it would be possible to accept and return
a row that does not in fact match the scan keys. We could defend
against this scenario by requiring the scan keys to be rechecked
against the heap row in all cases, but that is too expensive. Instead,
we use a pin on an index page as a proxy to indicate that the reader
may still be <quote>in flight</> from the index entry to the matching
heap entry. Making <function>ambulkdelete</> block on such a pin ensures
that <command>VACUUM</> cannot delete the heap entry before the reader
is done with it. This solution costs little in runtime, and adds blocking
overhead only in the rare cases where there actually is a conflict.
</para>
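<para>
The pin-as-proxy protocol can be modeled in a few lines. In this hypothetical single-threaded sketch, a reader pins the page holding its last-returned item, and the bulk-delete side refuses to clean a page that another backend has pinned, returning <literal>false</> to mean <quote>wait and retry</>. A real implementation blocks in the buffer manager rather than returning; that detail is outside this sketch.
</para>

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical index page: pins held by other backends, plus entry counts. */
typedef struct {
    int other_backend_pins;
    int live_entries;
    int dead_entries;
} SketchIndexPage;

/* Reader: keep the page of the last-returned item pinned while in flight. */
static void sketch_reader_pin(SketchIndexPage *p)
{
    p->other_backend_pins++;
}

static void sketch_reader_unpin(SketchIndexPage *p)
{
    assert(p->other_backend_pins > 0);
    p->other_backend_pins--;
}

/* VACUUM side: remove dead entries only if no other backend holds a pin. */
static bool sketch_try_bulkdelete_page(SketchIndexPage *p)
{
    if (p->other_backend_pins > 0)
        return false;        /* a reader may be in flight to the heap */
    p->dead_entries = 0;
    return true;
}
```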
<para>
This solution requires that index scans be <quote>synchronous</>: we have
to fetch each heap tuple immediately after scanning the corresponding index
entry. This is expensive for a number of reasons. An
<quote>asynchronous</> scan in which we collect many TIDs from the index,
and only visit the heap tuples sometime later, requires much less index
locking overhead and may allow a more efficient heap access pattern.
Per the above analysis, we must use the synchronous approach for
non-MVCC-compliant snapshots, but an asynchronous scan would be safe
for a query using an MVCC snapshot. This possibility is not exploited
as of <productname>PostgreSQL</productname> 8.0, but it is likely to be
investigated soon.
</para>
</sect1>
<sect1 id="index-unique-checks">
<title>Index Uniqueness Checks</title>
<para>
<productname>PostgreSQL</productname> enforces SQL uniqueness constraints
using <firstterm>unique indexes</>, which are indexes that disallow
multiple entries with identical keys. An access method that supports this
feature sets <structname>pg_am</>.<structfield>amcanunique</> true.
(At present, only b-tree supports it.)
</para>
<para>
Because of MVCC, it is always necessary to allow duplicate entries to
exist physically in an index: the entries might refer to successive
versions of a single logical row. The behavior we actually want to
enforce is that no MVCC snapshot could include two rows with equal
index keys. This breaks down into the following cases that must be
checked when inserting a new row into a unique index:
<itemizedlist>
<listitem>
<para>
If a conflicting valid row has been deleted by the current transaction,
it's okay. (In particular, since an <command>UPDATE</> always deletes the old row
version before inserting the new version, this will allow an <command>UPDATE</> on
a row without changing the key.)
</para>
</listitem>
<listitem>
<para>
If a conflicting row has been inserted by an as-yet-uncommitted
transaction, the would-be inserter must wait to see if that transaction
commits. If it rolls back then there is no conflict. If it commits
without deleting the conflicting row again, there is a uniqueness
violation. (In practice we just wait for the other transaction to
end and then redo the visibility check in toto.)
</para>
</listitem>
<listitem>
<para>
Similarly, if a conflicting valid row has been deleted by an
as-yet-uncommitted transaction, the would-be inserter must wait
for that transaction to commit or abort, and then repeat the test.
</para>
</listitem>
</itemizedlist>
</para>
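<para>
The case analysis above reduces to a small decision function. In this hypothetical C sketch, the enum summarizes what the access method learns when it visits the heap tuple behind a duplicate key. A row that is no longer valid (its inserter aborted, or its deleter committed) is included as the implied non-conflicting case; determining the state itself is not modeled.
</para>

```c
#include <assert.h>

/* Hypothetical summary of a conflicting heap row's state. */
typedef enum {
    SKETCH_ROW_LIVE,                     /* valid and not being deleted */
    SKETCH_ROW_DELETED_BY_CURRENT_XACT,  /* deleted by our own transaction */
    SKETCH_ROW_INSERT_IN_PROGRESS,       /* inserter not yet committed */
    SKETCH_ROW_DELETE_IN_PROGRESS,       /* deleter not yet committed */
    SKETCH_ROW_NOT_VALID                 /* inserter aborted or deleter committed */
} SketchConflictState;

typedef enum { SKETCH_OK, SKETCH_WAIT_AND_RECHECK, SKETCH_VIOLATION } SketchUniqueVerdict;

static SketchUniqueVerdict sketch_check_conflict(SketchConflictState s)
{
    switch (s) {
    case SKETCH_ROW_DELETED_BY_CURRENT_XACT:
    case SKETCH_ROW_NOT_VALID:
        return SKETCH_OK;
    case SKETCH_ROW_INSERT_IN_PROGRESS:
    case SKETCH_ROW_DELETE_IN_PROGRESS:
        /* Wait for the other transaction to end, then redo the check. */
        return SKETCH_WAIT_AND_RECHECK;
    case SKETCH_ROW_LIVE:
        return SKETCH_VIOLATION;
    }
    assert(!"unreachable");
    return SKETCH_VIOLATION;
}
```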
<para>
We require the index access method to apply these tests itself, which
means that it must reach into the heap to check the commit status of
any row that is shown to have a duplicate key according to the index
contents. This is without a doubt ugly and non-modular, but it saves
redundant work: if we did a separate probe then the index lookup for
a conflicting row would be essentially repeated while finding the place to
insert the new row's index entry. What's more, there is no obvious way
to avoid race conditions unless the conflict check is an integral part
of insertion of the new index entry.
</para>
<para>
The main limitation of this scheme is that it has no convenient way
to support deferred uniqueness checks.
</para>
</sect1>
<sect1 id="index-cost-estimation">
<title>Index Cost Estimation Functions</title>
<para>
The amcostestimate function is given a list of WHERE clauses that have
been determined to be usable with the index. It must return estimates
of the cost of accessing the index and the selectivity of the WHERE
clauses (that is, the fraction of parent-table rows that will be
retrieved during the index scan). For simple cases, nearly all the
work of the cost estimator can be done by calling standard routines
in the optimizer; the point of having an amcostestimate function is
to allow index access methods to provide index-type-specific knowledge,
in case it is possible to improve on the standard estimates.
</para>
<para>
Each amcostestimate function must have the signature:
<programlisting>
void
amcostestimate (Query *root,
RelOptInfo *rel,
IndexOptInfo *index,
List *indexQuals,
Cost *indexStartupCost,
Cost *indexTotalCost,
Selectivity *indexSelectivity,
double *indexCorrelation);
</programlisting>
The first four parameters are inputs:
<variablelist>
<varlistentry>
<term>root</term>
<listitem>
<para>
The query being processed.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term>rel</term>
<listitem>
<para>
The relation the index is on.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term>index</term>
<listitem>
<para>
The index itself.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term>indexQuals</term>
<listitem>
<para>
List of index qual clauses (implicitly ANDed);
a NIL list indicates no qualifiers are available.
Note that the list contains expression trees, not ScanKeys.
</para>
</listitem>
</varlistentry>
</variablelist>
</para>
<para>
The last four parameters are pass-by-reference outputs:
<variablelist>
<varlistentry>
<term>*indexStartupCost</term>
<listitem>
<para>
Set to cost of index start-up processing
</para>
</listitem>
</varlistentry>
<varlistentry>
<term>*indexTotalCost</term>
<listitem>
<para>
Set to total cost of index processing
</para>
</listitem>
</varlistentry>
<varlistentry>
<term>*indexSelectivity</term>
<listitem>
<para>
Set to index selectivity
</para>
</listitem>
</varlistentry>
<varlistentry>
<term>*indexCorrelation</term>
<listitem>
<para>
Set to correlation coefficient between index scan order and
underlying table's order
</para>
</listitem>
</varlistentry>
</variablelist>
</para>
<para>
Note that cost estimate functions must be written in C, not in SQL or
any available procedural language, because they must access internal
data structures of the planner/optimizer.
</para>
<para>
The index access costs should be computed in the units used by
<filename>src/backend/optimizer/path/costsize.c</filename>: a sequential disk block fetch
has cost 1.0, a nonsequential fetch has cost random_page_cost, and
the cost of processing one index row should usually be taken as
cpu_index_tuple_cost (which is a user-adjustable optimizer parameter).
In addition, an appropriate multiple of cpu_operator_cost should be charged
for any comparison operators invoked during index processing (especially
evaluation of the indexQuals themselves).
</para>
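<para>
As a worked example of these units, the helper below charges 1.0 per sequentially read page, <varname>random_page_cost</varname> per nonsequentially read page, and, for each index row, <varname>cpu_index_tuple_cost</varname> plus one <varname>cpu_operator_cost</varname> per comparison operator evaluated. The helper name is hypothetical, and the three parameter values are constants assumed purely for illustration; a real estimator uses the current settings of these user-adjustable parameters.
</para>

```c
#include <assert.h>

typedef double Cost;   /* local stand-in for the planner's Cost */

/* Assumed sample values; the real ones are user-adjustable settings. */
static const double random_page_cost     = 4.0;
static const double cpu_index_tuple_cost = 0.001;
static const double cpu_operator_cost    = 0.0025;

/*
 * Hypothetical helper: cost of reading seq_pages pages sequentially,
 * rand_pages pages nonsequentially, and examining index_tuples index
 * rows with ops_per_tuple comparison operators evaluated on each.
 */
static Cost
index_scan_cost(double seq_pages, double rand_pages,
                double index_tuples, double ops_per_tuple)
{
    assert(seq_pages >= 0 && rand_pages >= 0 && index_tuples >= 0);

    return seq_pages * 1.0 +                    /* sequential fetch = 1.0 */
        rand_pages * random_page_cost +         /* nonsequential fetch */
        index_tuples * (cpu_index_tuple_cost +
                        ops_per_tuple * cpu_operator_cost);
}
```

<para>
With these assumed values, reading 10 pages sequentially and evaluating two operators on each of 1000 index rows costs 10 + 1000 * (0.001 + 2 * 0.0025) = 16.0.
</para>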
<para>
The access costs should include all disk and CPU costs associated with
scanning the index itself, but NOT the costs of retrieving or processing
the parent-table rows that are identified by the index.
</para>
<para>
The <quote>start-up cost</quote> is the part of the total scan cost that must be expended
before we can begin to fetch the first row. For most indexes this can
be taken as zero, but an index type with a high start-up cost might want
to set it nonzero.
</para>
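<para>
Start-up cost matters mainly when the planner expects to stop early, as with <literal>LIMIT</literal>. A common way to estimate the cost of fetching only part of a scan's output is to pay the start-up cost in full and then a proportional share of the remainder; to our understanding this is the interpolation the planner applies when comparing paths for a fractional fetch. The sketch isolates that arithmetic; <function>partial_scan_cost</function> is a hypothetical name.
</para>

```c
#include <assert.h>

typedef double Cost;   /* local stand-in for the planner's Cost */

/*
 * Hypothetical helper: estimated cost of fetching only `fraction` of a
 * scan's rows, assuming the start-up cost is paid in full before the
 * first row and the remaining cost accrues linearly per row fetched.
 */
static Cost
partial_scan_cost(Cost startup, Cost total, double fraction)
{
    assert(startup <= total);
    assert(fraction >= 0.0 && fraction <= 1.0);

    return startup + fraction * (total - startup);
}
```

<para>
For example, with a start-up cost of 10 and a total cost of 110, fetching 10% of the rows is estimated at 10 + 0.1 * 100 = 20, so a large start-up cost weighs heavily on queries that stop early.
</para>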
<para>
The indexSelectivity should be set to the estimated fraction of the parent
table rows that will be retrieved during the index scan. In the case
of a lossy index, this will typically be higher than the fraction of
rows that actually pass the given qual conditions.
</para>
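<para>
One way to see why a lossy index's selectivity exceeds the true selectivity of the quals is a simple model in which the index returns every matching row plus a fixed fraction of the non-matching ones. Both the model and its false-positive rate are assumptions made for illustration; a real lossy access method would derive its estimate from knowledge of its own structure.
</para>

```c
#include <assert.h>

typedef double Selectivity;   /* local stand-in for the planner's type */

/*
 * Hypothetical model of a lossy index: all rows satisfying the quals
 * are returned, plus false_pos_rate of the rows that do not.  The result
 * is the fraction of parent-table rows the index scan hands back, which
 * is what indexSelectivity should describe.
 */
static Selectivity
lossy_index_selectivity(Selectivity qual_sel, double false_pos_rate)
{
    assert(qual_sel >= 0.0 && qual_sel <= 1.0);
    assert(false_pos_rate >= 0.0 && false_pos_rate <= 1.0);

    return qual_sel + (1.0 - qual_sel) * false_pos_rate;
}
```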
<para>
The indexCorrelation should be set to the correlation (ranging between
-1.0 and 1.0) between the index order and the table order. This is used
to adjust the estimate for the cost of fetching rows from the parent
table.
</para>
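<para>
The sketch below shows one way a correlation estimate can feed into the parent-table fetch cost: interpolate between the cost expected when index order is unrelated to table order and the cost when the two orders match, weighted by the squared correlation so that a perfectly reversed order (-1.0) counts the same as a perfectly matching one (1.0). To our understanding, <function>cost_index()</function> in <filename>costsize.c</filename> uses this form; the function here is a hypothetical, simplified stand-in whose two endpoint costs are supplied by the caller.
</para>

```c
#include <assert.h>

typedef double Cost;   /* local stand-in for the planner's Cost */

/*
 * Hypothetical helper: interpolate between max_io (index order unrelated
 * to table order) and min_io (orders match exactly) using the squared
 * correlation, so that -1.0 and 1.0 both count as perfectly ordered.
 */
static Cost
heap_fetch_cost(Cost max_io, Cost min_io, double correlation)
{
    assert(correlation >= -1.0 && correlation <= 1.0);
    assert(min_io <= max_io);

    double csquared = correlation * correlation;

    return max_io + csquared * (min_io - max_io);
}
```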
<procedure>
<title>Cost Estimation</title>
<para>
A typical cost estimator will proceed as follows:
</para>
<step>
<para>
Estimate and return the fraction of parent-table rows that will be visited
based on the given qual conditions. In the absence of any index-type-specific
knowledge, use the standard optimizer function <function>clauselist_selectivity()</function>:
<programlisting>
*indexSelectivity = clauselist_selectivity(root, indexQuals,
rel->relid, JOIN_INNER);
</programlisting>
</para>
</step>
<step>
<para>
Estimate the number of index rows that will be visited during the
scan. For many index types this is the same as indexSelectivity times
the number of rows in the index, but it might be more. (Note that the
index's size in pages and rows is available from the IndexOptInfo struct.)
</para>
</step>
<step>
<para>
Estimate the number of index pages that will be retrieved during the scan.
This might be just indexSelectivity times the index's size in pages.
</para>
</step>
<step>
<para>
Compute the index access cost. A generic estimator might do this:
<programlisting>
/*
* Our generic assumption is that the index pages will be read
* sequentially, so they have cost 1.0 each, not random_page_cost.
* Also, we charge for evaluation of the indexquals at each index row.
* All the costs are assumed to be paid incrementally during the scan.
*/
cost_qual_eval(&index_qual_cost, indexQuals);
*indexStartupCost = index_qual_cost.startup;
*indexTotalCost = numIndexPages +
(cpu_index_tuple_cost + index_qual_cost.per_tuple) * numIndexTuples;
</programlisting>
</para>
</step>
<step>
<para>
Estimate the index correlation. For a simple ordered index on a single
field, this can be retrieved from pg_statistic. If the correlation
is not known, the conservative estimate is zero (no correlation).
</para>
</step>
</procedure>
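<para>
Steps 2 through 4 above can be collected into a single standalone function. In this hypothetical sketch the planner structs are replaced by plain arguments (index pages, index rows, a selectivity assumed to come from step 1, and a two-field stand-in for the qual evaluation cost), and <varname>cpu_index_tuple_cost</varname> is a constant assumed for illustration. Step 5 is omitted; an estimator with no correlation statistics would simply report zero.
</para>

```c
#include <assert.h>

typedef double Cost;
typedef double Selectivity;

/* Stand-in for the startup/per-tuple pair computed by cost_qual_eval(). */
typedef struct
{
    Cost startup;     /* one-time cost of evaluating the quals */
    Cost per_tuple;   /* cost of evaluating the quals per index row */
} QualCostSketch;

static const double cpu_index_tuple_cost = 0.001;   /* assumed value */

/* Hypothetical generic estimator following steps 2-4 of the procedure. */
static void
generic_index_cost(double index_pages, double index_tuples,
                   Selectivity selectivity, QualCostSketch qual_cost,
                   Cost *indexStartupCost, Cost *indexTotalCost)
{
    assert(selectivity >= 0.0 && selectivity <= 1.0);

    /* Step 2: index rows visited, here selectivity times the row count. */
    double numIndexTuples = selectivity * index_tuples;

    /* Step 3: index pages read, here selectivity times the page count. */
    double numIndexPages = selectivity * index_pages;

    /* Step 4: pages read sequentially (1.0 each) plus per-row CPU costs. */
    *indexStartupCost = qual_cost.startup;
    *indexTotalCost = numIndexPages +
        (cpu_index_tuple_cost + qual_cost.per_tuple) * numIndexTuples;
}
```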
<para>
Examples of cost estimator functions can be found in
<filename>src/backend/utils/adt/selfuncs.c</filename>.
</para>
</sect1>
</chapter>
<!-- Keep this comment at the end of the file
Local variables:
mode:sgml
sgml-omittag:nil
sgml-shorttag:t
sgml-minimize-attributes:nil
sgml-always-quote-attributes:t
sgml-indent-step:1
sgml-indent-data:t
sgml-parent-document:nil
sgml-default-dtd-file:"./reference.ced"
sgml-exposed-tags:nil
sgml-local-catalogs:("/usr/lib/sgml/catalog")
sgml-local-ecat-files:nil
End:
-->
doc/src/sgml/indexcost.sgml
deleted
100644 → 0
<!--
$PostgreSQL: pgsql/doc/src/sgml/indexcost.sgml,v 2.19 2005/01/22 22:06:17 momjian Exp $
-->
<chapter id="indexcost">
<title>Index Cost Estimation Functions</title>
<note>
<title>Author</title>
<para>
Written by Tom Lane (<email>tgl@sss.pgh.pa.us</email>) on 2000-01-24
</para>
</note>
<note>
<para>
This must eventually become part of a much larger chapter about
writing new index access methods.
</para>
</note>
<para>
Every index access method must provide a cost estimation function for
use by the planner/optimizer. The procedure OID of this function is
given in the <literal>amcostestimate</literal> field of the access
method's <literal>pg_am</literal> entry.
<note>
<para>
Prior to <productname>PostgreSQL</productname> 7.0, a different
scheme was used for registering
index-specific cost estimation functions.
</para>
</note>
</para>
<para>
The amcostestimate function is given a list of WHERE clauses that have
been determined to be usable with the index. It must return estimates
of the cost of accessing the index and the selectivity of the WHERE
clauses (that is, the fraction of main-table rows that will be
retrieved during the index scan). For simple cases, nearly all the
work of the cost estimator can be done by calling standard routines
in the optimizer; the point of having an amcostestimate function is
to allow index access methods to provide index-type-specific knowledge,
in case it is possible to improve on the standard estimates.
</para>
<para>
Each amcostestimate function must have the signature:
<programlisting>
void
amcostestimate (Query *root,
RelOptInfo *rel,
IndexOptInfo *index,
List *indexQuals,
Cost *indexStartupCost,
Cost *indexTotalCost,
Selectivity *indexSelectivity,
double *indexCorrelation);
</programlisting>
The first four parameters are inputs:
<variablelist>
<varlistentry>
<term>root</term>
<listitem>
<para>
The query being processed.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term>rel</term>
<listitem>
<para>
The relation the index is on.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term>index</term>
<listitem>
<para>
The index itself.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term>indexQuals</term>
<listitem>
<para>
List of index qual clauses (implicitly ANDed);
a NIL list indicates no qualifiers are available.
</para>
</listitem>
</varlistentry>
</variablelist>
</para>
<para>
The last four parameters are pass-by-reference outputs:
<variablelist>
<varlistentry>
<term>*indexStartupCost</term>
<listitem>
<para>
Set to cost of index start-up processing
</para>
</listitem>
</varlistentry>
<varlistentry>
<term>*indexTotalCost</term>
<listitem>
<para>
Set to total cost of index processing
</para>
</listitem>
</varlistentry>
<varlistentry>
<term>*indexSelectivity</term>
<listitem>
<para>
Set to index selectivity
</para>
</listitem>
</varlistentry>
<varlistentry>
<term>*indexCorrelation</term>
<listitem>
<para>
Set to correlation coefficient between index scan order and
underlying table's order
</para>
</listitem>
</varlistentry>
</variablelist>
</para>
<para>
Note that cost estimate functions must be written in C, not in SQL or
any available procedural language, because they must access internal
data structures of the planner/optimizer.
</para>
<para>
The index access costs should be computed in the units used by
<filename>src/backend/optimizer/path/costsize.c</filename>: a sequential disk block fetch
has cost 1.0, a nonsequential fetch has cost random_page_cost, and
the cost of processing one index row should usually be taken as
cpu_index_tuple_cost (which is a user-adjustable optimizer parameter).
In addition, an appropriate multiple of cpu_operator_cost should be charged
for any comparison operators invoked during index processing (especially
evaluation of the indexQuals themselves).
</para>
<para>
The access costs should include all disk and CPU costs associated with
scanning the index itself, but NOT the costs of retrieving or processing
the main-table rows that are identified by the index.
</para>
<para>
The <quote>start-up cost</quote> is the part of the total scan cost that must be expended
before we can begin to fetch the first row. For most indexes this can
be taken as zero, but an index type with a high start-up cost might want
to set it nonzero.
</para>
<para>
The indexSelectivity should be set to the estimated fraction of the main
table rows that will be retrieved during the index scan. In the case
of a lossy index, this will typically be higher than the fraction of
rows that actually pass the given qual conditions.
</para>
<para>
The indexCorrelation should be set to the correlation (ranging between
-1.0 and 1.0) between the index order and the table order. This is used
to adjust the estimate for the cost of fetching rows from the main
table.
</para>
<procedure>
<title>Cost Estimation</title>
<para>
A typical cost estimator will proceed as follows:
</para>
<step>
<para>
Estimate and return the fraction of main-table rows that will be visited
based on the given qual conditions. In the absence of any index-type-specific
knowledge, use the standard optimizer function <function>clauselist_selectivity()</function>:
<programlisting>
*indexSelectivity = clauselist_selectivity(root, indexQuals,
rel->relid, JOIN_INNER);
</programlisting>
</para>
</step>
<step>
<para>
Estimate the number of index rows that will be visited during the
scan. For many index types this is the same as indexSelectivity times
the number of rows in the index, but it might be more. (Note that the
index's size in pages and rows is available from the IndexOptInfo struct.)
</para>
</step>
<step>
<para>
Estimate the number of index pages that will be retrieved during the scan.
This might be just indexSelectivity times the index's size in pages.
</para>
</step>
<step>
<para>
Compute the index access cost. A generic estimator might do this:
<programlisting>
/*
* Our generic assumption is that the index pages will be read
* sequentially, so they have cost 1.0 each, not random_page_cost.
* Also, we charge for evaluation of the indexquals at each index row.
* All the costs are assumed to be paid incrementally during the scan.
*/
cost_qual_eval(&index_qual_cost, indexQuals);
*indexStartupCost = index_qual_cost.startup;
*indexTotalCost = numIndexPages +
(cpu_index_tuple_cost + index_qual_cost.per_tuple) * numIndexTuples;
</programlisting>
</para>
</step>
<step>
<para>
Estimate the index correlation. For a simple ordered index on a single
field, this can be retrieved from pg_statistic. If the correlation
is not known, the conservative estimate is zero (no correlation).
</para>
</step>
</procedure>
<para>
Examples of cost estimator functions can be found in
<filename>src/backend/utils/adt/selfuncs.c</filename>.
</para>
<para>
By convention, the <literal>pg_proc</literal> entry for an
<literal>amcostestimate</literal> function should show
eight arguments all declared as <type>internal</> (since none of them have
types that are known to SQL), and the return type is <type>void</>.
</para>
</chapter>
<!-- Keep this comment at the end of the file
Local variables:
mode:sgml
sgml-omittag:nil
sgml-shorttag:t
sgml-minimize-attributes:nil
sgml-always-quote-attributes:t
sgml-indent-step:1
sgml-indent-data:t
sgml-parent-document:nil
sgml-default-dtd-file:"./reference.ced"
sgml-exposed-tags:nil
sgml-local-catalogs:("/usr/lib/sgml/catalog")
sgml-local-ecat-files:nil
End:
-->
doc/src/sgml/postgres.sgml
<!--
$PostgreSQL: pgsql/doc/src/sgml/postgres.sgml,v 1.73 2005/01/10 00:04:38 tgl Exp $
$PostgreSQL: pgsql/doc/src/sgml/postgres.sgml,v 1.74 2005/02/13 03:04:15 tgl Exp $
-->
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook V4.2//EN" [
...
...
@@ -235,7 +235,7 @@ $PostgreSQL: pgsql/doc/src/sgml/postgres.sgml,v 1.73 2005/01/10 00:04:38 tgl Exp
&nls;
&plhandler;
&geqo;
&indexcost;
&indexam;
&gist;
&storage;
&bki;
...
...
doc/src/sgml/xindex.sgml
<!--
$PostgreSQL: pgsql/doc/src/sgml/xindex.sgml,v 1.38 2005/01/23 00:30:18 momjian Exp $
$PostgreSQL: pgsql/doc/src/sgml/xindex.sgml,v 1.39 2005/02/13 03:04:15 tgl Exp $
-->
<sect1 id="xindex">
...
...
@@ -43,7 +43,7 @@ $PostgreSQL: pgsql/doc/src/sgml/xindex.sgml,v 1.38 2005/01/23 00:30:18 momjian E
described in <classname>pg_am</classname>. It is possible to add a
new index method by defining the required interface routines and
then creating a row in <classname>pg_am</classname> — but that is
far beyond the scope of this chapter.
beyond the scope of this chapter (see <xref linkend="indexam">).
</para>
<para>
...
...
@@ -514,7 +514,7 @@ CREATE OPERATOR < (
<listitem>
<para>
Although <productname>PostgreSQL</productname> can cope with
functions having the same name as long as they have different
functions having the same SQL name as long as they have different
argument data types, C can only cope with one global function
having a given name. So we shouldn't name the C function
something simple like <filename>abs_eq</filename>. Usually it's
...
...
@@ -525,14 +525,12 @@ CREATE OPERATOR < (
<listitem>
<para>
We could have made the <productname>PostgreSQL</productname> name
We could have made the SQL name
of the function <filename>abs_eq</filename>, relying on
<productname>PostgreSQL</productname> to distinguish it by
argument data types from any other
<productname>PostgreSQL</productname> function of the same name.
argument data types from any other SQL function of the same name.
To keep the example simple, we make the function have the same
names at the C level and <productname>PostgreSQL</productname>
level.
names at the C level and SQL level.
</para>
</listitem>
</itemizedlist>
...
...