Commit 53e99f57 authored by Tom Lane's avatar Tom Lane

Make an editorial pass over the newly SGML-ified contrib documentation.

Fix lots of bad markup, bad English, bad explanations.

This commit covers only about half the contrib modules, but I grow weary...
parent a37a0a41
<!-- $PostgreSQL: pgsql/doc/src/sgml/adminpack.sgml,v 1.3 2007/12/06 04:12:09 tgl Exp $ -->
<sect1 id="adminpack">
<title>adminpack</title>
<indexterm zone="adminpack">
<primary>adminpack</primary>
</indexterm>
<para>
adminpack is a PostgreSQL standard module that implements a number of
support functions which pgAdmin and other administration and management tools
can use to provide additional functionality if installed on a server.
<filename>adminpack</> provides a number of support functions which
<application>pgAdmin</> and other administration and management tools can
use to provide additional functionality, such as remote management
of server log files.
</para>
<sect2>
<title>Functions implemented</title>
<para>
Functions implemented by adminpack can only be run by a superuser. Here's a
list of these functions:
</para>
<para>
<programlisting>
int8 pg_catalog.pg_file_write(fname text, data text, append bool)
bool pg_catalog.pg_file_rename(oldname text, newname text, archivname text)
bool pg_catalog.pg_file_rename(oldname text, newname text)
bool pg_catalog.pg_file_unlink(fname text)
setof record pg_catalog.pg_logdir_ls()
/* Renaming of existing backend functions for pgAdmin compatibility */
int8 pg_catalog.pg_file_read(fname text, data text, append bool)
bigint pg_catalog.pg_file_length(text)
int4 pg_catalog.pg_logfile_rotate()
</programlisting>
The functions implemented by <filename>adminpack</> can only be run by a
superuser. Here's a list of these functions:
<programlisting>
int8 pg_catalog.pg_file_write(fname text, data text, append bool)
bool pg_catalog.pg_file_rename(oldname text, newname text, archivename text)
bool pg_catalog.pg_file_rename(oldname text, newname text)
bool pg_catalog.pg_file_unlink(fname text)
setof record pg_catalog.pg_logdir_ls()
/* Renaming of existing backend functions for pgAdmin compatibility */
int8 pg_catalog.pg_file_read(fname text, data text, append bool)
bigint pg_catalog.pg_file_length(text)
int4 pg_catalog.pg_logfile_rotate()
</programlisting>
</para>
</sect2>
</sect1>
<!-- $PostgreSQL: pgsql/doc/src/sgml/btree-gist.sgml,v 1.4 2007/12/06 04:12:09 tgl Exp $ -->
<sect1 id="btree-gist">
<title>btree_gist</title>
<indexterm zone="btree-gist">
<primary>btree_gist</primary>
</indexterm>
<para>
btree_gist is a B-Tree implementation using GiST that supports the int2, int4,
int8, float4, float8 timestamp with/without time zone, time
with/without time zone, date, interval, oid, money, macaddr, char,
varchar/text, bytea, numeric, bit, varbit and inet/cidr types.
<filename>btree_gist</> provides sample GiST operator classes that
implement B-Tree equivalent behavior for the data types
<type>int2</>, <type>int4</>, <type>int8</>, <type>float4</>,
<type>float8</>, <type>numeric</>, <type>timestamp with time zone</>,
<type>timestamp without time zone</>, <type>time with time zone</>,
<type>time without time zone</>, <type>date</>, <type>interval</>,
<type>oid</>, <type>money</>, <type>char</>,
<type>varchar</>, <type>text</>, <type>bytea</>, <type>bit</>,
<type>varbit</>, <type>macaddr</>, <type>inet</>, and <type>cidr</>.
</para>
<para>
In general, these operator classes will not outperform the equivalent
standard btree index methods, and they lack one major feature of the
standard btree code: the ability to enforce uniqueness. However,
they are useful for GiST testing and as a base for developing other
GiST operator classes.
</para>
<sect2>
<title>Example usage</title>
<programlisting>
CREATE TABLE test (a int4);
-- create index
CREATE INDEX testidx ON test USING gist (a);
-- query
SELECT * FROM test WHERE a &lt; 10;
</programlisting>
<programlisting>
CREATE TABLE test (a int4);
-- create index
CREATE INDEX testidx ON test USING gist (a);
-- query
SELECT * FROM test WHERE a &lt; 10;
</programlisting>
</sect2>
<sect2>
<title>Authors</title>
<para>
All work was done by Teodor Sigaev (<email>teodor@stack.net</email>) ,
Oleg Bartunov (<email>oleg@sai.msu.su</email>), Janko Richter
(<email>jankorichter@yahoo.de</email>). See
<ulink url="http://www.sai.msu.su/~megera/postgres/gist"></ulink> for additional
information.
Teodor Sigaev (<email>teodor@stack.net</email>) ,
Oleg Bartunov (<email>oleg@sai.msu.su</email>), and
Janko Richter (<email>jankorichter@yahoo.de</email>). See
<ulink url="http://www.sai.msu.su/~megera/postgres/gist"></ulink>
for additional information.
</para>
</sect2>
</sect1>
<!-- $PostgreSQL: pgsql/doc/src/sgml/chkpass.sgml,v 1.2 2007/12/06 04:12:09 tgl Exp $ -->
<sect1 id="chkpass">
<title>chkpass</title>
<!--
<title>chkpass</title>
<indexterm zone="chkpass">
<primary>chkpass</primary>
</indexterm>
-->
<para>
chkpass is a password type that is automatically checked and converted upon
entry. It is stored encrypted. To compare, simply compare against a clear
This module implements a data type <type>chkpass</> that is
designed for storing encrypted passwords.
Each password is automatically converted to encrypted form upon entry,
and is always stored encrypted. To compare, simply compare against a clear
text password and the comparison function will encrypt it before comparing.
It also returns an error if the code determines that the password is easily
crackable. This is currently a stub that does nothing.
</para>
<para>
Note that the chkpass data type is not indexable.
<!--
I haven't worried about making this type indexable. I doubt that anyone
would ever need to sort a file in order of encrypted password.
-->
There are provisions in the code to report an error if the password is
determined to be easily crackable. However, this is currently just
a stub that does nothing.
</para>
<para>
If you precede the string with a colon, the encryption and checking are
skipped so that you can enter existing passwords into the field.
If you precede an input string with a colon, it is assumed to be an
already-encrypted password, and is stored without further encryption.
This allows entry of previously-encrypted passwords.
</para>
<para>
On output, a colon is prepended. This makes it possible to dump and reload
passwords without re-encrypting them. If you want the password (encrypted)
without the colon then use the raw() function. This allows you to use the
passwords without re-encrypting them. If you want the encrypted password
without the colon then use the <function>raw()</> function.
This allows you to use the
type with things like Apache's Auth_PostgreSQL module.
</para>
<para>
The encryption uses the standard Unix function crypt(), and so it suffers
The encryption uses the standard Unix function <function>crypt()</>,
and so it suffers
from all the usual limitations of that function; notably that only the
first eight characters of a password are considered.
</para>
<para>
Here is some sample usage:
Note that the chkpass data type is not indexable.
<!--
I haven't worried about making this type indexable. I doubt that anyone
would ever need to sort a file in order of encrypted password.
-->
</para>
<programlisting>
<para>
Sample usage:
</para>
<programlisting>
test=# create table test (p chkpass);
CREATE TABLE
test=# insert into test values ('hello');
......@@ -72,13 +82,14 @@ test=# select p = 'goodbye' from test;
----------
f
(1 row)
</programlisting>
</programlisting>
<sect2>
<title>Author</title>
<para>
D'Arcy J.M. Cain <email>darcy@druid.net</email>
D'Arcy J.M. Cain (<email>darcy@druid.net</email>)
</para>
</sect2>
</sect1>
</sect1>
<!-- $PostgreSQL: pgsql/doc/src/sgml/contrib-spi.sgml,v 1.1 2007/12/03 04:18:47 tgl Exp $ -->
<!-- $PostgreSQL: pgsql/doc/src/sgml/contrib-spi.sgml,v 1.2 2007/12/06 04:12:09 tgl Exp $ -->
<sect1 id="contrib-spi">
<title>spi</title>
......@@ -29,27 +29,28 @@
<para>
<function>check_primary_key()</> checks the referencing table.
To use, create a BEFORE INSERT OR UPDATE trigger using this
function on a table referencing another table. You are to specify
as trigger arguments: triggered table column names which correspond
to foreign key, referenced table name and column names in referenced
table which correspond to primary/unique key. To handle multiple
foreign keys, create a trigger for each reference.
To use, create a <literal>BEFORE INSERT OR UPDATE</> trigger using this
function on a table referencing another table. Specify as the trigger
arguments: the referencing table's column name(s) which form the foreign
key, the referenced table name, and the column names in the referenced table
which form the primary/unique key. To handle multiple foreign
keys, create a trigger for each reference.
</para>
<para>
<function>check_foreign_key()</> checks the referenced table.
To use, create a BEFORE DELETE OR UPDATE trigger using this
function on a table referenced by other table(s). You are to specify
as trigger arguments: number of references for which the function has to
perform checking, action if referencing key found ('cascade' &mdash; to delete
corresponding foreign key, 'restrict' &mdash; to abort transaction if foreign keys
exist, 'setnull' &mdash; to set foreign key referencing primary/unique key
being deleted to null), triggered table column names which correspond
to primary/unique key, then referencing table name and column names
corresponding to foreign key (repeated for as many referencing tables/keys
as were specified by first argument). Note that the primary/unique key
columns should be marked NOT NULL and should have a unique index.
To use, create a <literal>BEFORE DELETE OR UPDATE</> trigger using this
function on a table referenced by other table(s). Specify as the trigger
arguments: the number of referencing tables for which the function has to
perform checking, the action if a referencing key is found
(<literal>cascade</> &mdash; to delete the referencing row,
<literal>restrict</> &mdash; to abort transaction if referencing keys
exist, <literal>setnull</> &mdash; to set referencing key fields to null),
the triggered table's column names which form the primary/unique key, then
the referencing table name and column names (repeated for as many
referencing tables as were specified by first argument). Note that the
primary/unique key columns should be marked NOT NULL and should have a
unique index.
</para>
<para>
......@@ -64,60 +65,65 @@
Long ago, <productname>PostgreSQL</> had a built-in time travel feature
that kept the insert and delete times for each tuple. This can be
emulated using these functions. To use these functions,
you are to add to a table two columns of <type>abstime</> type to store
you must add to a table two columns of <type>abstime</> type to store
the date when a tuple was inserted (start_date) and changed/deleted
(stop_date):
<programlisting>
CREATE TABLE mytab (
... ...
start_date abstime default now(),
stop_date abstime default 'infinity'
start_date abstime,
stop_date abstime
... ...
);
</programlisting>
So, tuples being inserted with unspecified start_date/stop_date will get
the current time in start_date and <literal>infinity</> in
stop_date.
The columns can be named whatever you like, but in this discussion
we'll call them start_date and stop_date.
</para>
<para>
When a new row is inserted, start_date should normally be set to
current time, and stop_date to <literal>infinity</>. The trigger
will automatically substitute these values if the inserted data
contains nulls in these columns. Generally, inserting explicit
non-null data in these columns should only be done when re-loading
dumped data.
</para>
<para>
Tuples with stop_date equal to <literal>infinity</> are <quote>valid
now</quote>: when trigger will be fired for UPDATE/DELETE of a tuple with
stop_date NOT equal to <literal>infinity</> then
this tuple will not be changed/deleted!
now</quote>, and can be modified. Tuples with a finite stop_date cannot
be modified anymore &mdash; the trigger will prevent it. (If you need
to do that, you can turn off time travel as shown below.)
</para>
<para>
If stop_date is equal to <literal>infinity</> then on
update only the stop_date in the tuple being updated will be changed (to
current time) and a new tuple with new data (coming from SET ... in UPDATE)
will be inserted. Start_date in this new tuple will be set to current time
and stop_date to <literal>infinity</>.
For a modifiable row, on update only the stop_date in the tuple being
updated will be changed (to current time) and a new tuple with the modified
data will be inserted. Start_date in this new tuple will be set to current
time and stop_date to <literal>infinity</>.
</para>
<para>
A delete does not actually remove the tuple but only set its stop_date
A delete does not actually remove the tuple but only sets its stop_date
to current time.
</para>
<para>
To query for tuples <quote>valid now</quote>, include
<literal>stop_date = 'infinity'</> in the query's WHERE condition.
(You might wish to incorporate that in a view.)
</para>
<para>
You can't change start/stop date columns with UPDATE!
Use set_timetravel (below) if you need this.
(You might wish to incorporate that in a view.) Similarly, you can
query for tuples valid at any past time with suitable conditions on
start_date and stop_date.
</para>
<para>
<function>timetravel()</> is the general trigger function that supports
this behavior. Create a BEFORE INSERT OR UPDATE OR DELETE trigger using this
function on each time-traveled table. You are to specify two trigger arguments:
name of start_date column and name of stop_date column in triggered table.
this behavior. Create a <literal>BEFORE INSERT OR UPDATE OR DELETE</>
trigger using this function on each time-traveled table. Specify two
trigger arguments: the actual
names of the start_date and stop_date columns.
Optionally, you can specify one to three more arguments, which must refer
to columns of type <type>text</>. The trigger will store the name of
the current user into the first of these columns during INSERT, the
......@@ -130,7 +136,9 @@ CREATE TABLE mytab (
<literal>set_timetravel('mytab', 1)</> will turn TT ON for table mytab.
<literal>set_timetravel('mytab', 0)</> will turn TT OFF for table mytab.
In both cases the old status is reported. While TT is off, you can modify
the start_date and stop_date columns freely.
the start_date and stop_date columns freely. Note that the on/off status
is local to the current database session &mdash; fresh sessions will
always start out with TT ON for all tables.
</para>
<para>
......@@ -156,9 +164,9 @@ CREATE TABLE mytab (
</para>
<para>
To use, create a BEFORE INSERT (or optionally BEFORE INSERT OR UPDATE)
trigger using this function. You are to specify
as trigger arguments: the name of the integer column to be modified,
To use, create a <literal>BEFORE INSERT</> (or optionally <literal>BEFORE
INSERT OR UPDATE</>) trigger using this function. Specify two
trigger arguments: the name of the integer column to be modified,
and the name of the sequence object that will supply values.
(Actually, you can specify any number of pairs of such names, if
you'd like to update more than one autoincrementing column.)
......@@ -180,8 +188,8 @@ CREATE TABLE mytab (
</para>
<para>
To use, create a BEFORE INSERT and/or UPDATE
trigger using this function. You are to specify a single trigger
To use, create a <literal>BEFORE INSERT</> and/or <literal>UPDATE</>
trigger using this function. Specify a single trigger
argument: the name of the text column to be modified.
</para>
......@@ -201,8 +209,8 @@ CREATE TABLE mytab (
</para>
<para>
To use, create a BEFORE UPDATE
trigger using this function. You are to specify a single trigger
To use, create a <literal>BEFORE UPDATE</>
trigger using this function. Specify a single trigger
argument: the name of the <type>timestamp</> column to be modified.
</para>
......
<!-- $PostgreSQL: pgsql/doc/src/sgml/contrib.sgml,v 1.7 2007/12/03 04:18:47 tgl Exp $ -->
<!-- $PostgreSQL: pgsql/doc/src/sgml/contrib.sgml,v 1.8 2007/12/06 04:12:09 tgl Exp $ -->
<appendix id="contrib">
<title>Additional Supplied Modules</title>
......@@ -44,7 +44,7 @@
<para>
Many modules supply new user-defined functions, operators, or types.
To make use of one of these modules, after you have installed the code
you need to register the new objects in the database
you need to register the new objects in the database
system by running the SQL commands in the <literal>.sql</> file
supplied by the module. For example,
......@@ -54,6 +54,7 @@ psql -d dbname -f <replaceable>SHAREDIR</>/contrib/<replaceable>module</>.sql
Here, <replaceable>SHAREDIR</> means the installation's <quote>share</>
directory (<literal>pg_config --sharedir</> will tell you what this is).
In most cases the script must be run by a database superuser.
</para>
<para>
......
This diff is collapsed.
This diff is collapsed.
<!-- $PostgreSQL: pgsql/doc/src/sgml/dict-int.sgml,v 1.2 2007/12/06 04:12:10 tgl Exp $ -->
<sect1 id="dict-int">
<title>dict_int</title>
<indexterm zone="dict-int">
<primary>dict_int</primary>
</indexterm>
<para>
The motivation for this example dictionary is to control the indexing of
integers (signed and unsigned), and, consequently, to minimize the number of
unique words which greatly affect the performance of searching.
<filename>dict_int</> is an example of an add-on dictionary template
for full-text search. The motivation for this example dictionary is to
control the indexing of integers (signed and unsigned), allowing such
numbers to be indexed while preventing excessive growth in the number of
unique words, which greatly affects the performance of searching.
</para>
<sect2>
<title>Configuration</title>
<para>
The dictionary accepts two options:
The dictionary accepts two options:
</para>
<itemizedlist>
<listitem>
<para>
The MAXLEN parameter specifies the maximum length (number of digits)
allowed in an integer word. The default value is 6.
The <literal>maxlen</> parameter specifies the maximum number of
digits allowed in an integer word. The default value is 6.
</para>
</listitem>
<listitem>
<para>
The REJECTLONG parameter specifies if an overlength integer should be
truncated or ignored. If REJECTLONG=FALSE (default), the dictionary returns
the first MAXLEN digits of the integer. If REJECTLONG=TRUE, the
dictionary treats an overlength integer as a stop word, so that it will
not be indexed.
The <literal>rejectlong</> parameter specifies whether an overlength
integer should be truncated or ignored. If <literal>rejectlong</> is
<literal>false</> (the default), the dictionary returns the first
<literal>maxlen</> digits of the integer. If <literal>rejectlong</> is
<literal>true</>, the dictionary treats an overlength integer as a stop
word, so that it will not be indexed. Note that this also means that
such an integer cannot be searched for.
</para>
</listitem>
</itemizedlist>
......
<!-- $PostgreSQL: pgsql/doc/src/sgml/dict-xsyn.sgml,v 1.2 2007/12/06 04:12:10 tgl Exp $ -->
<sect1 id="dict-xsyn">
<title>dict_xsyn</title>
<indexterm zone="dict-xsyn">
<primary>dict_xsyn</primary>
</indexterm>
<para>
The Extended Synonym Dictionary module replaces words with groups of their
synonyms, and so makes it possible to search for a word using any of its
synonyms.
<filename>dict_xsyn</> (Extended Synonym Dictionary) is an example of an
add-on dictionary template for full-text search. This dictionary type
replaces words with groups of their synonyms, and so makes it possible to
search for a word using any of its synonyms.
</para>
<sect2>
<title>Configuration</title>
<para>
A <literal>dict_xsyn</> dictionary accepts the following options:
</para>
<itemizedlist>
<listitem>
<para>
KEEPORIG controls whether the original word is included, or only its
synonyms. Default is 'true'.
<literal>keeporig</> controls whether the original word is included (if
<literal>true</>), or only its synonyms (if <literal>false</>). Default
is <literal>true</>.
</para>
</listitem>
<listitem>
<para>
RULES is the base name of the file containing the list of synonyms.
This file must be in $(prefix)/share/tsearch_data/, and its name must
end in ".rules" (which is not included in the RULES parameter).
<literal>rules</> is the base name of the file containing the list of
synonyms. This file must be stored in
<filename>$SHAREDIR/tsearch_data/</> (where <literal>$SHAREDIR</> means
the <productname>PostgreSQL</> installation's shared-data directory).
Its name must end in <literal>.rules</> (which is not to be included in
the <literal>rules</> parameter).
</para>
</listitem>
</itemizedlist>
......@@ -38,41 +46,63 @@
<listitem>
<para>
Each line represents a group of synonyms for a single word, which is
given first on the line. Synonyms are separated by whitespace:
</para>
given first on the line. Synonyms are separated by whitespace, thus:
<programlisting>
word syn1 syn2 syn3
</programlisting>
</para>
</listitem>
<listitem>
<para>
Sharp ('#') sign is a comment delimiter. It may appear at any position
inside the line. The rest of the line will be skipped.
The sharp (<literal>#</>) sign is a comment delimiter. It may appear at
any position in a line. The rest of the line will be skipped.
</para>
</listitem>
</itemizedlist>
<para>
Look at xsyn_sample.rules, which is installed in $(prefix)/share/tsearch_data/,
for an example.
Look at <filename>xsyn_sample.rules</>, which is installed in
<filename>$SHAREDIR/tsearch_data/</>, for an example.
</para>
</sect2>
<sect2>
<title>Usage</title>
<programlisting>
mydb=# SELECT ts_lexize('xsyn','word');
ts_lexize
----------------
{word,syn1,syn2,syn3)
</programlisting>
<para>
Change dictionary options:
</para>
<programlisting>
mydb# ALTER TEXT SEARCH DICTIONARY xsyn (KEEPORIG=false);
Running the installation script creates a text search template
<literal>xsyn_template</> and a dictionary <literal>xsyn</>
based on it, with default parameters. You can alter the
parameters, for example
<programlisting>
mydb# ALTER TEXT SEARCH DICTIONARY xsyn (RULES='my_rules', KEEPORIG=false);
ALTER TEXT SEARCH DICTIONARY
</programlisting>
</programlisting>
or create new dictionaries based on the template.
</para>
<para>
To test the dictionary, you can try
<programlisting>
mydb=# SELECT ts_lexize('xsyn', 'word');
ts_lexize
-----------------------
{word,syn1,syn2,syn3}
</programlisting>
but real-world usage will involve including it in a text search
configuration as described in <xref linkend="textsearch">.
That might look like this:
<programlisting>
ALTER TEXT SEARCH CONFIGURATION english
ALTER MAPPING FOR word, asciiword WITH xsyn, english_stem;
</programlisting>
</para>
</sect2>
</sect1>
This diff is collapsed.
<!-- $PostgreSQL: pgsql/doc/src/sgml/fuzzystrmatch.sgml,v 1.3 2007/12/06 04:12:10 tgl Exp $ -->
<sect1 id="fuzzystrmatch">
<title>fuzzystrmatch</title>
<indexterm zone="fuzzystrmatch">
<primary>fuzzystrmatch</primary>
</indexterm>
<para>
This section describes the fuzzystrmatch module which provides different
The <filename>fuzzystrmatch</> module provides several
functions to determine similarities and distance between strings.
</para>
<sect2>
<title>Soundex</title>
<para>
The Soundex system is a method of matching similar sounding names
(or any words) to the same code. It was initially used by the
United States Census in 1880, 1900, and 1910, but it has little use
beyond English names (or the English pronunciation of names), and
it is not a linguistic tool.
The Soundex system is a method of matching similar-sounding names
by converting them to the same code. It was initially used by the
United States Census in 1880, 1900, and 1910. Note that Soundex
is not very useful for non-English names.
</para>
<para>
When comparing two soundex values to determine similarity, the
difference function reports how close the match is on a scale
from zero to four, with zero being no match and four being an
exact match.
The <filename>fuzzystrmatch</> module provides two functions
for working with Soundex codes:
</para>
<programlisting>
soundex(text) returns text
difference(text, text) returns int
</programlisting>
<para>
The <function>soundex</> function converts a string to its Soundex code.
The <function>difference</> function converts two strings to their Soundex
codes and then reports the number of matching code positions. Since
Soundex codes have four characters, the result ranges from zero to four,
with zero being no match and four being an exact match. (Thus, the
function is misnamed &mdash; <function>similarity</> would have been
a better name.)
</para>
<para>
The following are some usage examples:
Here are some usage examples:
</para>
<programlisting>
SELECT soundex('hello world!');
......@@ -41,81 +62,106 @@ INSERT INTO s VALUES ('jack');
SELECT * FROM s WHERE soundex(nm) = soundex('john');
SELECT a.nm, b.nm FROM s a, s b WHERE soundex(a.nm) = soundex(b.nm) AND a.oid &lt;&gt; b.oid;
CREATE FUNCTION text_sx_eq(text, text) RETURNS boolean AS
'select soundex($1) = soundex($2)'
LANGUAGE SQL;
CREATE FUNCTION text_sx_lt(text, text) RETURNS boolean AS
'select soundex($1) &lt; soundex($2)'
LANGUAGE SQL;
CREATE FUNCTION text_sx_gt(text, text) RETURNS boolean AS
'select soundex($1) &gt; soundex($2)'
LANGUAGE SQL;
CREATE FUNCTION text_sx_le(text, text) RETURNS boolean AS
'select soundex($1) &lt;= soundex($2)'
LANGUAGE SQL;
CREATE FUNCTION text_sx_ge(text, text) RETURNS boolean AS
'select soundex($1) &gt;= soundex($2)'
LANGUAGE SQL;
SELECT * FROM s WHERE difference(s.nm, 'john') &gt; 2;
</programlisting>
</sect2>
CREATE FUNCTION text_sx_ne(text, text) RETURNS boolean AS
'select soundex($1) &lt;&gt; soundex($2)'
LANGUAGE SQL;
<sect2>
<title>Levenshtein</title>
DROP OPERATOR #= (text, text);
<para>
This function calculates the Levenshtein distance between two strings:
</para>
CREATE OPERATOR #= (leftarg=text, rightarg=text, procedure=text_sx_eq, commutator = #=);
<programlisting>
levenshtein(text source, text target) returns int
</programlisting>
SELECT * FROM s WHERE text_sx_eq(nm, 'john');
<para>
Both <literal>source</literal> and <literal>target</literal> can be any
non-null string, with a maximum of 255 characters.
</para>
SELECT * FROM s WHERE s.nm #= 'john';
<para>
Example:
</para>
SELECT * FROM s WHERE difference(s.nm, 'john') &gt; 2;
<programlisting>
test=# SELECT levenshtein('GUMBO', 'GAMBOL');
levenshtein
-------------
2
(1 row)
</programlisting>
</sect2>
<sect2>
<title>levenshtein</title>
<title>Metaphone</title>
<para>
Metaphone, like Soundex, is based on the idea of constructing a
representative code for an input string. Two strings are then
deemed similar if they have the same codes.
</para>
<para>
This function calculates the levenshtein distance between two strings:
This function calculates the metaphone code of an input string:
</para>
<programlisting>
int levenshtein(text source, text target)
metaphone(text source, int max_output_length) returns text
</programlisting>
<para>
Both <literal>source</literal> and <literal>target</literal> can be any
NOT NULL string with a maximum of 255 characters.
<literal>source</literal> has to be a non-null string with a maximum of
255 characters. <literal>max_output_length</literal> sets the maximum
length of the output metaphone code; if longer, the output is truncated
to this length.
</para>
<para>
Example:
</para>
<programlisting>
SELECT levenshtein('GUMBO','GAMBOL');
test=# SELECT metaphone('GUMBO', 4);
metaphone
-----------
KM
(1 row)
</programlisting>
</sect2>
<sect2>
<title>metaphone</title>
<title>Double Metaphone</title>
<para>
This function calculates and returns the metaphone code of an input string:
The Double Metaphone system computes two <quote>sounds like</> strings
for a given input string &mdash; a <quote>primary</> and an
<quote>alternate</>. In most cases they are the same, but for non-English
names especially they can be a bit different, depending on pronunciation.
These functions compute the primary and alternate codes:
</para>
<programlisting>
text metahpone(text source, int max_output_length)
dmetaphone(text source) returns text
dmetaphone_alt(text source) returns text
</programlisting>
<para>
<literal>source</literal> has to be a NOT NULL string with a maximum of
255 characters. <literal>max_output_length</literal> fixes the maximum
length of the output metaphone code; if longer, the output is truncated
to this length.
There is no length limit on the input strings.
</para>
<para>
Example:
</para>
<para>Example</para>
<programlisting>
SELECT metaphone('GUMBO',4);
test=# select dmetaphone('gumbo');
dmetaphone
------------
KMP
(1 row)
</programlisting>
</sect2>
......
This diff is collapsed.
<!-- $PostgreSQL: pgsql/doc/src/sgml/lo.sgml,v 1.3 2007/12/06 04:12:10 tgl Exp $ -->
<sect1 id="lo">
<title>lo</title>
<indexterm zone="lo">
<primary>lo</primary>
</indexterm>
<para>
PostgreSQL type extension for managing Large Objects
The <filename>lo</> module provides support for managing Large Objects
(also called LOs or BLOBs). This includes a data type <type>lo</>
and a trigger <function>lo_manage</>.
</para>
<sect2>
<title>Overview</title>
<title>Rationale</title>
<para>
One of the problems with the JDBC driver (and this affects the ODBC driver
also), is that the specification assumes that references to BLOBS (Binary
Large OBjectS) are stored within a table, and if that entry is changed, the
also), is that the specification assumes that references to BLOBs (Binary
Large OBjects) are stored within a table, and if that entry is changed, the
associated BLOB is deleted from the database.
</para>
<para>
As PostgreSQL stands, this doesn't occur. Large objects are treated as
objects in their own right; a table entry can reference a large object by
OID, but there can be multiple table entries referencing the same large
object OID, so the system doesn't delete the large object just because you
change or remove one such entry.
</para>
<para>
Now this is fine for new PostgreSQL-specific applications, but existing ones
using JDBC or ODBC won't delete the objects, resulting in orphaning - objects
that are not referenced by anything, and simply occupy disk space.
As <productname>PostgreSQL</> stands, this doesn't occur. Large objects
are treated as objects in their own right; a table entry can reference a
large object by OID, but there can be multiple table entries referencing
the same large object OID, so the system doesn't delete the large object
just because you change or remove one such entry.
</para>
</sect2>
<sect2>
<title>The Fix</title>
<para>
I've fixed this by creating a new data type 'lo', some support functions, and
a Trigger which handles the orphaning problem. The trigger essentially just
does a 'lo_unlink' whenever you delete or modify a value referencing a large
object. When you use this trigger, you are assuming that there is only one
database reference to any large object that is referenced in a
trigger-controlled column!
Now this is fine for <productname>PostgreSQL</>-specific applications, but
standard code using JDBC or ODBC won't delete the objects, resulting in
orphan objects &mdash; objects that are not referenced by anything, and
simply occupy disk space.
</para>
<para>
The 'lo' type was created because we needed to differentiate between plain
OIDs and Large Objects. Currently the JDBC driver handles this dilemma easily,
but (after talking to Byron), the ODBC driver needed a unique type. They had
created an 'lo' type, but not the solution to orphaning.
The <filename>lo</> module allows fixing this by attaching a trigger
to tables that contain LO reference columns. The trigger essentially just
does a <function>lo_unlink</> whenever you delete or modify a value
referencing a large object. When you use this trigger, you are assuming
that there is only one database reference to any large object that is
referenced in a trigger-controlled column!
</para>
<para>
You don't actually have to use the 'lo' type to use the trigger, but it may be
convenient to use it to keep track of which columns in your database represent
large objects that you are managing with the trigger.
The module also provides a data type <type>lo</>, which is really just
a domain of the <type>oid</> type. This is useful for differentiating
database columns that hold large object references from those that are
OIDs of other things. You don't have to use the <type>lo</> type to
use the trigger, but it may be convenient to use it to keep track of which
columns in your database represent large objects that you are managing with
the trigger. It is also rumored that the ODBC driver gets confused if you
don't use <type>lo</> for BLOB columns.
</para>
</sect2>
<sect2>
<title>How to Use</title>
<title>How to Use It</title>
<para>
The easiest way is by an example:
Here's a simple example of usage:
</para>
<programlisting>
CREATE TABLE image (title TEXT, raster lo);
CREATE TRIGGER t_raster BEFORE UPDATE OR DELETE ON image
FOR EACH ROW EXECUTE PROCEDURE lo_manage(raster);
</programlisting>
<para>
Create a trigger for each column that contains a lo type, and give the column
name as the trigger procedure argument. You can have more than one trigger on
a table if you need multiple lo columns in the same table, but don't forget to
give a different name to each trigger.
For each column that will contain unique references to large objects,
create a <literal>BEFORE UPDATE OR DELETE</> trigger, and give the column
name as the sole trigger argument. If you need multiple <type>lo</>
columns in the same table, create a separate trigger for each one,
remembering to give a different name to each trigger on the same table.
</para>
</sect2>
<sect2>
<title>Issues</title>
<title>Limitations</title>
<itemizedlist>
<listitem>
<para>
Dropping a table will still orphan any objects it contains, as the trigger
is not executed.
is not executed. You can avoid this by preceding the <command>DROP
TABLE</> with <command>DELETE FROM <replaceable>table</></command>.
</para>
<para>
Avoid this by preceding the 'drop table' with 'delete from {table}'.
<command>TRUNCATE</> has the same hazard.
</para>
<para>
If you already have, or suspect you have, orphaned large objects, see
the contrib/vacuumlo module to help you clean them up. It's a good idea
to run contrib/vacuumlo occasionally as a back-stop to the lo_manage
trigger.
If you already have, or suspect you have, orphaned large objects, see the
<filename>contrib/vacuumlo</> module (<xref linkend="vacuumlo">) to help
you clean them up. It's a good idea to run <application>vacuumlo</>
occasionally as a back-stop to the <function>lo_manage</> trigger.
</para>
</listitem>
<listitem>
<para>
Some frontends may create their own tables, and will not create the
associated trigger(s). Also, users may not remember (or know) to create
associated trigger(s). Also, users may not remember (or know) to create
the triggers.
</para>
</listitem>
</itemizedlist>
<para>
As the ODBC driver needs a permanent lo type (&amp; JDBC could be optimised to
use it if it's Oid is fixed), and as the above issues can only be fixed by
some internal changes, I feel it should become a permanent built-in type.
</para>
</sect2>
<sect2>
<title>Author</title>
<para>
Peter Mount <email>peter@retep.org.uk</email> June 13 1998
Peter Mount <email>peter@retep.org.uk</email>
</para>
</sect2>
</sect1>
</sect1>
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
<!-- $PostgreSQL: pgsql/doc/src/sgml/test-parser.sgml,v 1.1 2007/12/03 04:18:47 tgl Exp $ -->
<!-- $PostgreSQL: pgsql/doc/src/sgml/test-parser.sgml,v 1.2 2007/12/06 04:12:10 tgl Exp $ -->
<sect1 id="test-parser">
<title>test_parser</title>
......@@ -8,15 +8,18 @@
</indexterm>
<para>
This is an example of a custom parser for full text search.
<filename>test_parser</> is an example of a custom parser for full-text
search. It doesn't do anything especially useful, but can serve as
a starting point for developing your own parser.
</para>
<para>
It recognizes space-delimited words and returns just two token types:
<filename>test_parser</> recognizes words separated by white space,
and returns just two token types:
<programlisting>
mydb=# SELECT * FROM ts_token_type('testparser');
tokid | alias | description
tokid | alias | description
-------+-------+---------------
3 | word | Word
12 | blank | Space symbols
......@@ -41,16 +44,16 @@ mydb=# SELECT * FROM ts_token_type('testparser');
<programlisting>
mydb=# SELECT * FROM ts_parse('testparser', 'That''s my first own parser');
tokid | token
tokid | token
-------+--------
3 | That's
12 |
12 |
3 | my
12 |
12 |
3 | first
12 |
12 |
3 | own
12 |
12 |
3 | parser
</programlisting>
</para>
......@@ -68,14 +71,14 @@ mydb-# ADD MAPPING FOR word WITH english_stem;
ALTER TEXT SEARCH CONFIGURATION
mydb=# SELECT to_tsvector('testcfg', 'That''s my first own parser');
to_tsvector
to_tsvector
-------------------------------
'that':1 'first':3 'parser':5
(1 row)
mydb=# SELECT ts_headline('testcfg', 'Supernovae stars are the brightest phenomena in galaxies',
mydb(# to_tsquery('testcfg', 'star'));
ts_headline
ts_headline
-----------------------------------------------------------------
Supernovae &lt;b&gt;stars&lt;/b&gt; are the brightest phenomena in galaxies
(1 row)
......
<!-- $PostgreSQL: pgsql/doc/src/sgml/tsearch2.sgml,v 1.2 2007/12/06 04:12:10 tgl Exp $ -->
<sect1 id="tsearch2">
<title>tsearch2</title>
<indexterm zone="tsearch2">
<primary>tsearch2</primary>
</indexterm>
......
<!-- $PostgreSQL: pgsql/doc/src/sgml/uuid-ossp.sgml,v 1.2 2007/12/06 04:12:10 tgl Exp $ -->
<sect1 id="uuid-ossp">
<title>uuid-ossp</title>
<indexterm zone="uuid-ossp">
<primary>uuid-ossp</primary>
</indexterm>
<para>
This module provides functions to generate universally unique
identifiers (UUIDs) using one of the several standard algorithms, as
well as functions to produce certain special UUID constants.
The <filename>uuid-ossp</> module provides functions to generate universally
unique identifiers (UUIDs) using one of several standard algorithms. There
are also functions to produce certain special UUID constants.
</para>
<para>
This module depends on the OSSP UUID library, which can be found at
<ulink url="http://www.ossp.org/pkg/lib/uuid/"></ulink>.
</para>
<sect2>
<title>UUID Generation</title>
<title><literal>uuid-ossp</literal> Functions</title>
<para>
The relevant standards ITU-T Rec. X.667, ISO/IEC 9834-8:2005, and RFC
4122 specify four algorithms for generating UUIDs, identified by the
......@@ -23,7 +30,7 @@
</para>
<table>
<title><literal>uuid-ossp</literal> functions</title>
<title>Functions for UUID Generation</title>
<tgroup cols="2">
<thead>
<row>
......@@ -59,22 +66,9 @@
<para>
This function generates a version 3 UUID in the given namespace using
the specified input name. The namespace should be one of the special
constants produced by the uuid_ns_*() functions shown below. (It
could be any UUID in theory.) The name is an identifier in the
selected namespace. For example:
</para>
</entry>
</row>
<row>
<entry><literal>uuid_generate_v3(uuid_ns_url(), 'http://www.postgresql.org')</literal></entry>
<entry>
<para>
The name parameter will be MD5-hashed, so the cleartext cannot be
derived from the generated UUID.
</para>
<para>
The generation of UUIDs by this method has no random or
environment-dependent element and is therefore reproducible.
constants produced by the <function>uuid_ns_*()</> functions shown
below. (It could be any UUID in theory.) The name is an identifier
in the selected namespace.
</para>
</entry>
</row>
......@@ -102,15 +96,28 @@
</tgroup>
</table>
<para>
For example:
<programlisting>
SELECT uuid_generate_v3(uuid_ns_url(), 'http://www.postgresql.org');
</programlisting>
The name parameter will be MD5-hashed, so the cleartext cannot be
derived from the generated UUID.
The generation of UUIDs by this method has no random or
environment-dependent element and is therefore reproducible.
</para>
<table>
<title>UUID Constants</title>
<title>Functions Returning UUID Constants</title>
<tgroup cols="2">
<tbody>
<row>
<entry><literal>uuid_nil()</literal></entry>
<entry>
<para>
A "nil" UUID constant, which does not occur as a real UUID.
A <quote>nil</> UUID constant, which does not occur as a real UUID.
</para>
</entry>
</row>
......@@ -135,8 +142,8 @@
<entry>
<para>
Constant designating the ISO object identifier (OID) namespace for
UUIDs. (This pertains to ASN.1 OIDs, unrelated to the OIDs used in
PostgreSQL.)
UUIDs. (This pertains to ASN.1 OIDs, which are unrelated to the OIDs
used in <productname>PostgreSQL</>.)
</para>
</entry>
</row>
......@@ -153,11 +160,14 @@
</tgroup>
</table>
</sect2>
<sect2>
<title>Author</title>
<para>
Peter Eisentraut <email>peter_e@gmx.net</email>
</para>
</sect2>
</sect1>
</sect1>
This diff is collapsed.
This diff is collapsed.
Markdown is supported
0% or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment