Commit 73e35660 authored by Tom Lane

Improve comments about btree's use of ScanKey data structures: there
are two basically different kinds of scankeys, and we ought to try harder
to indicate which is used in each place in the code.  I've chosen the names
"search scankey" and "insertion scankey", though you could make about
as good an argument for "operator scankey" and "comparison function
scankey".
parent e38217d1
- $PostgreSQL: pgsql/src/backend/access/nbtree/README,v 1.8 2003/11/29 19:51:40 pgsql Exp $
+ $PostgreSQL: pgsql/src/backend/access/nbtree/README,v 1.9 2006/01/17 00:09:00 tgl Exp $
  This directory contains a correct implementation of Lehman and Yao's
  high-concurrency B-tree management algorithm (P. Lehman and S. Yao,
@@ -325,15 +325,26 @@ work sometimes, but could cause failures later on depending on
  what else gets put on their page.
  "ScanKey" data structures are used in two fundamentally different ways
- in this code. Searches for the initial position for a scan, as well as
- insertions, use scankeys in which the comparison function is a 3-way
- comparator (<0, =0, >0 result). These scankeys are built within the
- btree code (eg, by _bt_mkscankey()) and used by _bt_compare(). Once we
- are positioned, sequential examination of tuples in a scan is done by
- _bt_checkkeys() using scankeys in which the comparison functions return
- booleans --- for example, int4lt might be used. These scankeys are the
- ones originally passed in from outside the btree code. Same
- representation, but different comparison functions!
+ in this code, which we describe as "search" scankeys and "insertion"
+ scankeys. A search scankey is the kind passed to btbeginscan() or
+ btrescan() from outside the btree code. The sk_func pointers in a search
+ scankey point to comparison functions that return boolean, such as int4lt.
+ There might be more than one scankey entry for a given index column, or
+ none at all. (We require the keys to appear in index column order, but
+ the order of multiple keys for a given column is unspecified.) An
+ insertion scankey uses the same array-of-ScanKey data structure, but the
+ sk_func pointers point to btree comparison support functions (ie, 3-way
+ comparators that return int4 values interpreted as <0, =0, >0). In an
+ insertion scankey there is exactly one entry per index column. Insertion
+ scankeys are built within the btree code (eg, by _bt_mkscankey()) and are
+ used to locate the starting point of a scan, as well as for locating the
+ place to insert a new index tuple. (Note: in the case of an insertion
+ scankey built from a search scankey, there might be fewer keys than
+ index columns, indicating that we have no constraints for the remaining
+ index columns.) After we have located the starting point of a scan, the
+ original search scankey is consulted as each index entry is sequentially
+ scanned to decide whether to return the entry and whether the scan can
+ stop (see _bt_checkkeys()).
  Notes about data representation
  -------------------------------
......
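To make the README's distinction concrete, here is a minimal sketch in C. It assumes integer-valued columns and uses hypothetical, simplified types (`SearchKey`, `InsertionKey`) and helpers (`int_lt`, `int_ge`, `int_cmp`, `tuple_matches`, `key_compare`); none of these are PostgreSQL's real ScanKey structures or sk_func machinery. It only illustrates that the two kinds of key share a shape but carry different kinds of comparison function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a search scankey entry (not the real ScanKeyData). */
typedef struct SearchKey {
    int   column;                   /* index column the qual applies to */
    bool (*op)(int lhs, int rhs);   /* boolean operator, in the spirit of int4lt */
    int   arg;                      /* constant the column is compared against */
} SearchKey;

/* Hypothetical stand-in for an insertion scankey entry; entry i is column i. */
typedef struct InsertionKey {
    int (*cmp)(int lhs, int rhs);   /* 3-way comparator: <0, 0, >0 */
    int  arg;                       /* key value for this column */
} InsertionKey;

bool int_lt(int a, int b) { return a < b; }
bool int_ge(int a, int b) { return a >= b; }
int  int_cmp(int a, int b) { return (a > b) - (a < b); }

/*
 * Search-key usage: a tuple qualifies only if every boolean qual holds.
 * Several quals may name the same column, and some columns may have none.
 */
bool tuple_matches(const int *tuple, const SearchKey *keys, size_t nkeys)
{
    assert(keys != NULL || nkeys == 0);
    for (size_t i = 0; i < nkeys; i++) {
        if (!keys[i].op(tuple[keys[i].column], keys[i].arg))
            return false;
    }
    return true;
}

/*
 * Insertion-key usage: compare the key to a tuple column by column.
 * nkeys may be smaller than the number of columns, meaning the remaining
 * columns are unconstrained. Result is <0, 0 or >0 (key versus tuple).
 */
int key_compare(const InsertionKey *keys, size_t nkeys, const int *tuple)
{
    assert(keys != NULL || nkeys == 0);
    for (size_t i = 0; i < nkeys; i++) {
        int result = keys[i].cmp(keys[i].arg, tuple[i]);
        if (result != 0)
            return result;
    }
    return 0;   /* equal on every column the key supplies */
}
```

With a tuple (5, 9), the search-style quals "column 0 >= 3" and "column 0 < 7" both pass, while an insertion-style key (5, 10) compares greater than the tuple on two columns and equal when only the first column is supplied.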
@@ -8,7 +8,7 @@
  *
  *
  * IDENTIFICATION
- * $PostgreSQL: pgsql/src/backend/access/nbtree/nbtinsert.c,v 1.130 2006/01/11 08:43:11 neilc Exp $
+ * $PostgreSQL: pgsql/src/backend/access/nbtree/nbtinsert.c,v 1.131 2006/01/17 00:09:00 tgl Exp $
  *
  *-------------------------------------------------------------------------
  */
@@ -80,7 +80,7 @@ _bt_doinsert(Relation rel, BTItem btitem,
  BTStack stack;
  Buffer buf;
- /* we need a scan key to do our search, so build one */
+ /* we need an insertion scan key to do our search, so build one */
  itup_scankey = _bt_mkscankey(rel, itup);
  top:
@@ -331,7 +331,8 @@ _bt_check_unique(Relation rel, BTItem btitem, Relation heapRel,
  * If 'afteritem' is >0 then the new tuple must be inserted after the
  * existing item of that number, noplace else. If 'afteritem' is 0
  * then the procedure finds the exact spot to insert it by searching.
- * (keysz and scankey parameters are used ONLY if afteritem == 0.)
+ * (keysz and scankey parameters are used ONLY if afteritem == 0.
+ * The scankey must be an insertion-type scankey.)
  *
  * NOTE: if the new key is equal to one or more existing keys, we can
  * legitimately place it anywhere in the series of equal keys --- in fact,
......
@@ -9,7 +9,7 @@
  *
  *
  * IDENTIFICATION
- * $PostgreSQL: pgsql/src/backend/access/nbtree/nbtpage.c,v 1.90 2005/11/22 18:17:06 momjian Exp $
+ * $PostgreSQL: pgsql/src/backend/access/nbtree/nbtpage.c,v 1.91 2006/01/17 00:09:01 tgl Exp $
  *
  * NOTES
  * Postgres btree pages look like ordinary relation pages. The opaque
@@ -813,7 +813,7 @@ _bt_pagedel(Relation rel, Buffer buf, bool vacuum_full)
  * better drop the target page lock first.
  */
  _bt_relbuf(rel, buf);
- /* we need a scan key to do our search, so build one */
+ /* we need an insertion scan key to do our search, so build one */
  itup_scankey = _bt_mkscankey(rel, &(targetkey->bti_itup));
  /* find the leftmost leaf page containing this key */
  stack = _bt_search(rel, rel->rd_rel->relnatts, itup_scankey, false,
......
@@ -8,7 +8,7 @@
  * Portions Copyright (c) 1994, Regents of the University of California
  *
  * IDENTIFICATION
- * $PostgreSQL: pgsql/src/backend/access/nbtree/nbtsearch.c,v 1.99 2005/12/07 19:37:53 tgl Exp $
+ * $PostgreSQL: pgsql/src/backend/access/nbtree/nbtsearch.c,v 1.100 2006/01/17 00:09:01 tgl Exp $
  *
  *-------------------------------------------------------------------------
  */
@@ -29,6 +29,9 @@ static bool _bt_endpoint(IndexScanDesc scan, ScanDirection dir);
  * _bt_search() -- Search the tree for a particular scankey,
  * or more precisely for the first leaf page it could be on.
  *
+ * The passed scankey must be an insertion-type scankey (see nbtree/README),
+ * but it can omit the rightmost column(s) of the index.
+ *
  * When nextkey is false (the usual case), we are looking for the first
  * item >= scankey. When nextkey is true, we are looking for the first
  * item strictly greater than scankey.
@@ -127,15 +130,18 @@ _bt_search(Relation rel, int keysz, ScanKey scankey, bool nextkey,
  * data that appeared on the page originally is either on the page
  * or strictly to the right of it.
  *
- * When nextkey is false (the usual case), we are looking for the first
- * item >= scankey. When nextkey is true, we are looking for the first
- * item strictly greater than scankey.
- *
  * This routine decides whether or not we need to move right in the
  * tree by examining the high key entry on the page. If that entry
  * is strictly less than the scankey, or <= the scankey in the nextkey=true
  * case, then we followed the wrong link and we need to move right.
  *
+ * The passed scankey must be an insertion-type scankey (see nbtree/README),
+ * but it can omit the rightmost column(s) of the index.
+ *
+ * When nextkey is false (the usual case), we are looking for the first
+ * item >= scankey. When nextkey is true, we are looking for the first
+ * item strictly greater than scankey.
+ *
  * On entry, we have the buffer pinned and a lock of the type specified by
  * 'access'. If we move right, we release the buffer and lock and acquire
  * the same on the right sibling. Return value is the buffer we stop at.
@@ -194,14 +200,13 @@ _bt_moveright(Relation rel,
  /*
  * _bt_binsrch() -- Do a binary search for a key on a particular page.
  *
+ * The passed scankey must be an insertion-type scankey (see nbtree/README),
+ * but it can omit the rightmost column(s) of the index.
+ *
  * When nextkey is false (the usual case), we are looking for the first
  * item >= scankey. When nextkey is true, we are looking for the first
  * item strictly greater than scankey.
  *
- * The scankey we get has the compare function stored in the procedure
- * entry of each data struct. We invoke this regproc to do the
- * comparison for every key in the scankey.
- *
  * On a leaf page, _bt_binsrch() returns the OffsetNumber of the first
  * key >= given scankey, or > scankey if nextkey is true. (NOTE: in
  * particular, this means it is possible to return a value 1 greater than the
@@ -301,8 +306,11 @@ _bt_binsrch(Relation rel,
  /*----------
  * _bt_compare() -- Compare scankey to a particular tuple on the page.
  *
+ * The passed scankey must be an insertion-type scankey (see nbtree/README),
+ * but it can omit the rightmost column(s) of the index.
+ *
  * keysz: number of key conditions to be checked (might be less than the
- * total length of the scan key!)
+ * number of index columns!)
  * page/offnum: location of btree item to be compared to.
  *
  * This routine returns:
@@ -464,12 +472,17 @@ _bt_next(IndexScanDesc scan, ScanDirection dir)
  /*
  * _bt_first() -- Find the first item in a scan.
  *
- * We need to be clever about the type of scan, the operation it's
- * performing, and the tree ordering. We find the
- * first item in the tree that satisfies the qualification
- * associated with the scan descriptor. On exit, the page containing
+ * We need to be clever about the direction of scan, the search
+ * conditions, and the tree ordering. We find the first item (or,
+ * if backwards scan, the last item) in the tree that satisfies the
+ * qualifications in the scan key. On exit, the page containing
  * the current index tuple is read locked and pinned, and the scan's
  * opaque data entry is updated to include the buffer.
+ *
+ * Note that scan->keyData[], and the so->keyData[] scankey built from it,
+ * are both search-type scankeys (see nbtree/README for more about this).
+ * Within this routine, we build a temporary insertion-type scankey to use
+ * in locating the scan start position.
  */
  bool
  _bt_first(IndexScanDesc scan, ScanDirection dir)
@@ -537,6 +550,9 @@ _bt_first(IndexScanDesc scan, ScanDirection dir)
  * equality quals survive preprocessing, however, it doesn't matter which
  * one we use --- by definition, they are either redundant or
  * contradictory.
+ *
+ * The selected scan keys (at most one per index column) are remembered by
+ * storing their addresses into the local startKeys[] array.
  *----------
  */
  strat_total = BTEqualStrategyNumber;
@@ -631,9 +647,10 @@ _bt_first(IndexScanDesc scan, ScanDirection dir)
  return _bt_endpoint(scan, dir);
  /*
- * We want to start the scan somewhere within the index. Set up a
- * 3-way-comparison scankey we can use to search for the boundary point we
- * identified above.
+ * We want to start the scan somewhere within the index. Set up an
+ * insertion scankey we can use to search for the boundary point we
+ * identified above. The insertion scankey is built in the local
+ * scankeys[] array, using the keys identified by startKeys[].
  */
  Assert(keysCount <= INDEX_MAX_KEYS);
  for (i = 0; i < keysCount; i++)
@@ -681,19 +698,20 @@ _bt_first(IndexScanDesc scan, ScanDirection dir)
  }
  }
- /*
+ /*----------
  * Examine the selected initial-positioning strategy to determine exactly
  * where we need to start the scan, and set flag variables to control the
  * code below.
  *
  * If nextkey = false, _bt_search and _bt_binsrch will locate the first
- * item >= scan key. If nextkey = true, they will locate the first item >
- * scan key.
+ * item >= scan key. If nextkey = true, they will locate the first
+ * item > scan key.
  *
- * If goback = true, we will then step back one item, while if goback =
- * false, we will start the scan on the located item.
+ * If goback = true, we will then step back one item, while if
+ * goback = false, we will start the scan on the located item.
  *
  * it's yet other place to add some code later for is(not)null ...
+ *----------
  */
  switch (strat_total)
  {
@@ -774,8 +792,8 @@ _bt_first(IndexScanDesc scan, ScanDirection dir)
  }
  /*
- * Use the manufactured scan key to descend the tree and position
- * ourselves on the target leaf page.
+ * Use the manufactured insertion scan key to descend the tree and
+ * position ourselves on the target leaf page.
  */
  stack = _bt_search(rel, keysCount, scankeys, nextkey, &buf, BT_READ);
......
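The nextkey rule described in these comments ("first item >= scankey" versus "first item strictly greater than scankey"), together with a key that omits the rightmost column(s), can be sketched as an ordinary binary search. This is an illustration under stated assumptions, not the nbtree code: `Item`, `NCOLS`, `prefix_compare` and `find_start` are hypothetical names, columns are plain ints, and the "page" is a sorted in-memory array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NCOLS 2                      /* hypothetical two-column index */

typedef struct Item {
    int col[NCOLS];                  /* one value per index column */
} Item;

/*
 * 3-way comparison of the first keysz key columns against an item.
 * Returns <0, 0 or >0 for key < item, key == item, key > item.
 * Columns beyond keysz are ignored, so a short key matches any suffix.
 */
int prefix_compare(const int *key, size_t keysz, const Item *item)
{
    assert(keysz <= NCOLS);
    for (size_t i = 0; i < keysz; i++) {
        if (key[i] != item->col[i])
            return key[i] < item->col[i] ? -1 : 1;
    }
    return 0;
}

/*
 * Binary search over items sorted in column order.
 * nextkey == false: return the index of the first item >= key.
 * nextkey == true:  return the index of the first item >  key.
 * The result can equal nitems when no such item exists.
 */
size_t find_start(const Item *items, size_t nitems,
                  const int *key, size_t keysz, bool nextkey)
{
    size_t low = 0;
    size_t high = nitems;
    /* Keep moving right while the key compares >= this threshold. */
    int threshold = nextkey ? 0 : 1;

    while (low < high) {
        size_t mid = low + (high - low) / 2;

        if (prefix_compare(key, keysz, &items[mid]) >= threshold)
            low = mid + 1;           /* item is too small, look right */
        else
            high = mid;              /* item qualifies, look left */
    }
    return low;
}
```

Stepping back one position from the returned index corresponds to the goback idea: for example, the last item < key sits immediately before the first item >= key, provided the returned index is not 0.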
@@ -8,7 +8,7 @@
  *
  *
  * IDENTIFICATION
- * $PostgreSQL: pgsql/src/backend/access/nbtree/nbtutils.c,v 1.67 2005/12/07 19:37:53 tgl Exp $
+ * $PostgreSQL: pgsql/src/backend/access/nbtree/nbtutils.c,v 1.68 2006/01/17 00:09:01 tgl Exp $
  *
  *-------------------------------------------------------------------------
  */
@@ -23,7 +23,7 @@
  /*
  * _bt_mkscankey
- * Build a scan key that contains comparison data from itup
+ * Build an insertion scan key that contains comparison data from itup
  * as well as comparator routines appropriate to the key datatypes.
  *
  * The result is intended for use with _bt_compare().
@@ -67,11 +67,12 @@ _bt_mkscankey(Relation rel, IndexTuple itup)
  /*
  * _bt_mkscankey_nodata
- * Build a scan key that contains comparator routines appropriate to
- * the key datatypes, but no comparison data. The comparison data
- * ultimately used must match the key datatypes.
+ * Build an insertion scan key that contains 3-way comparator routines
+ * appropriate to the key datatypes, but no comparison data. The
+ * comparison data ultimately used must match the key datatypes.
  *
- * The result cannot be used with _bt_compare(). Currently this
+ * The result cannot be used with _bt_compare(), unless comparison
+ * data is first stored into the key entries. Currently this
  * routine is only called by nbtsort.c and tuplesort.c, which have
  * their own comparison routines.
  */
@@ -160,7 +161,7 @@ _bt_formitem(IndexTuple itup)
  /*----------
  * _bt_preprocess_keys() -- Preprocess scan keys
  *
- * The caller-supplied keys (in scan->keyData[]) are copied to
+ * The caller-supplied search-type keys (in scan->keyData[]) are copied to
  * so->keyData[] with possible transformation. scan->numberOfKeys is
  * the number of input keys, so->numberOfKeys gets the number of output
  * keys (possibly less, never greater).
@@ -485,7 +486,7 @@ _bt_preprocess_keys(IndexScanDesc scan)
  * accordingly. See comments for _bt_preprocess_keys(), above, about how
  * this is done.
  *
- * scan: index scan descriptor
+ * scan: index scan descriptor (containing a search-type scankey)
  * page: buffer page containing index tuple
  * offnum: offset number of index tuple (must be a valid item!)
  * dir: direction we are scanning in
......
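The README text in this commit says that, once the start of a scan is located, the search-type scankey is consulted for each index entry to decide whether to return it and whether the scan can stop. The following is a rough sketch of that idea for a forward scan over a single sorted integer column. The names `QualOp`, `Qual`, `qual_passes` and `scan_forward` are hypothetical, and the stop rule used here (a failed upper-bound qual ends a forward scan) is a simplifying assumption rather than a description of what _bt_checkkeys() actually does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical boolean qual on a single integer column. */
typedef enum { QUAL_LT, QUAL_LE, QUAL_GE, QUAL_GT } QualOp;

typedef struct Qual {
    QualOp op;
    int    arg;
} Qual;

/* Evaluate "value <op> arg" as a boolean, like a search-type key would. */
bool qual_passes(Qual q, int value)
{
    switch (q.op) {
    case QUAL_LT: return value < q.arg;
    case QUAL_LE: return value <= q.arg;
    case QUAL_GE: return value >= q.arg;
    case QUAL_GT: return value > q.arg;
    }
    return false;
}

/*
 * Scan sorted items forward from 'start', copying matching values to 'out'.
 * Assumption of this sketch: in ascending order, once an upper-bound qual
 * (< or <=) fails, no later item can pass it, so the scan stops there.
 * A failed lower-bound qual only skips the current item.
 * Returns the number of values written to 'out'.
 */
size_t scan_forward(const int *items, size_t nitems, size_t start,
                    const Qual *quals, size_t nquals, int *out)
{
    size_t nout = 0;

    assert(out != NULL);
    for (size_t i = start; i < nitems; i++) {
        bool matches = true;
        bool keep_going = true;

        for (size_t k = 0; k < nquals; k++) {
            if (!qual_passes(quals[k], items[i])) {
                matches = false;
                if (quals[k].op == QUAL_LT || quals[k].op == QUAL_LE)
                    keep_going = false;
            }
        }
        if (matches)
            out[nout++] = items[i];
        if (!keep_going)
            break;
    }
    return nout;
}
```

For the sorted values 1, 3, 5, 7, 9 with quals ">= 3" and "< 8", a scan from position 0 skips 1, returns 3, 5 and 7, and stops at 9.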