Commit 53e99f57 authored by Tom Lane's avatar Tom Lane

Make an editorial pass over the newly SGML-ified contrib documentation.

Fix lots of bad markup, bad English, bad explanations.

This commit covers only about half the contrib modules, but I grow weary...
parent a37a0a41
<!-- $PostgreSQL: pgsql/doc/src/sgml/adminpack.sgml,v 1.3 2007/12/06 04:12:09 tgl Exp $ -->
<sect1 id="adminpack">
<title>adminpack</title>
<indexterm zone="adminpack">
<primary>adminpack</primary>
</indexterm>
<para>
<filename>adminpack</> provides a number of support functions which
<application>pgAdmin</> and other administration and management tools can
use to provide additional functionality, such as remote management
of server log files.
</para>
<sect2>
<title>Functions implemented</title>
<para>
The functions implemented by <filename>adminpack</> can only be run by a
superuser. Here's a list of these functions:
<programlisting>
int8 pg_catalog.pg_file_write(fname text, data text, append bool)
bool pg_catalog.pg_file_rename(oldname text, newname text, archivename text)
bool pg_catalog.pg_file_rename(oldname text, newname text)
bool pg_catalog.pg_file_unlink(fname text)
setof record pg_catalog.pg_logdir_ls()
/* Renaming of existing backend functions for pgAdmin compatibility */
int8 pg_catalog.pg_file_read(fname text, data text, append bool)
bigint pg_catalog.pg_file_length(text)
int4 pg_catalog.pg_logfile_rotate()
</programlisting>
</para>
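<para>
For example, a superuser could append some text to a file and then
remove it again (paths are interpreted on the server side; the file
name here is arbitrary, chosen only for illustration):
</para>
<programlisting>
SELECT pg_catalog.pg_file_write('adminpack_demo.txt', 'some text', true);
SELECT pg_catalog.pg_file_unlink('adminpack_demo.txt');
</programlisting>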
</sect2>
</sect1>
<!-- $PostgreSQL: pgsql/doc/src/sgml/btree-gist.sgml,v 1.4 2007/12/06 04:12:09 tgl Exp $ -->
<sect1 id="btree-gist">
<title>btree_gist</title>
<indexterm zone="btree-gist">
<primary>btree_gist</primary>
</indexterm>
<para>
<filename>btree_gist</> provides sample GiST operator classes that
implement B-Tree equivalent behavior for the data types
<type>int2</>, <type>int4</>, <type>int8</>, <type>float4</>,
<type>float8</>, <type>numeric</>, <type>timestamp with time zone</>,
<type>timestamp without time zone</>, <type>time with time zone</>,
<type>time without time zone</>, <type>date</>, <type>interval</>,
<type>oid</>, <type>money</>, <type>char</>,
<type>varchar</>, <type>text</>, <type>bytea</>, <type>bit</>,
<type>varbit</>, <type>macaddr</>, <type>inet</>, and <type>cidr</>.
</para>
<para>
In general, these operator classes will not outperform the equivalent
standard btree index methods, and they lack one major feature of the
standard btree code: the ability to enforce uniqueness. However,
they are useful for GiST testing and as a base for developing other
GiST operator classes.
</para>
<sect2>
<title>Example usage</title>
<programlisting>
CREATE TABLE test (a int4);
-- create index
CREATE INDEX testidx ON test USING gist (a);
-- query
SELECT * FROM test WHERE a &lt; 10;
</programlisting>
</sect2>
<sect2>
<title>Authors</title>
<para>
Teodor Sigaev (<email>teodor@stack.net</email>),
Oleg Bartunov (<email>oleg@sai.msu.su</email>), and
Janko Richter (<email>jankorichter@yahoo.de</email>). See
<ulink url="http://www.sai.msu.su/~megera/postgres/gist"></ulink>
for additional information.
</para>
</sect2>
</sect1>
<!-- $PostgreSQL: pgsql/doc/src/sgml/chkpass.sgml,v 1.2 2007/12/06 04:12:09 tgl Exp $ -->
<sect1 id="chkpass">
<title>chkpass</title>
<indexterm zone="chkpass">
<primary>chkpass</primary>
</indexterm>
<para>
This module implements a data type <type>chkpass</> that is
designed for storing encrypted passwords.
Each password is automatically converted to encrypted form upon entry,
and is always stored encrypted. To compare, simply compare against a clear
text password and the comparison function will encrypt it before comparing.
</para>
<para>
There are provisions in the code to report an error if the password is
determined to be easily crackable. However, this is currently just
a stub that does nothing.
</para>
<para>
If you precede an input string with a colon, it is assumed to be an
already-encrypted password, and is stored without further encryption.
This allows entry of previously-encrypted passwords.
</para>
<para>
On output, a colon is prepended. This makes it possible to dump and reload
passwords without re-encrypting them. If you want the encrypted password
without the colon then use the <function>raw()</> function.
This allows you to use the
type with things like Apache's Auth_PostgreSQL module.
</para>
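<para>
For example, assuming a table <literal>test</> with a
<type>chkpass</> column <literal>p</>, a previously-encrypted password
can be entered with a leading colon and read back without it (the
crypt string shown is invented for illustration):
</para>
<programlisting>
test=# insert into test values (':abVbJXljdkYK2');
test=# select raw(p) from test;
</programlisting>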
<para>
The encryption uses the standard Unix function <function>crypt()</>,
and so it suffers
from all the usual limitations of that function; notably that only the
first eight characters of a password are considered.
</para>
<para>
Note that the chkpass data type is not indexable.
<!--
I haven't worried about making this type indexable. I doubt that anyone
would ever need to sort a file in order of encrypted password.
-->
</para>
<para>
Sample usage:
</para>
<programlisting>
test=# create table test (p chkpass);
CREATE TABLE
test=# insert into test values ('hello');
test=# select p = 'goodbye' from test;
 ?column?
----------
f
(1 row)
</programlisting>
<sect2>
<title>Author</title>
<para>
D'Arcy J.M. Cain (<email>darcy@druid.net</email>)
</para>
</sect2>
</sect1>
<!-- $PostgreSQL: pgsql/doc/src/sgml/contrib-spi.sgml,v 1.2 2007/12/06 04:12:09 tgl Exp $ -->
<sect1 id="contrib-spi">
<title>spi</title>
<para>
<function>check_primary_key()</> checks the referencing table.
To use, create a <literal>BEFORE INSERT OR UPDATE</> trigger using this
function on a table referencing another table. Specify as the trigger
arguments: the referencing table's column name(s) which form the foreign
key, the referenced table name, and the column names in the referenced table
which form the primary/unique key. To handle multiple foreign
keys, create a trigger for each reference.
</para>
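<para>
A sketch of such a trigger, using invented table and column names
(<literal>A.REFB</> referencing <literal>B.ID</>):
</para>
<programlisting>
CREATE TABLE B (ID int4 PRIMARY KEY);
CREATE TABLE A (REFB int4);

CREATE TRIGGER A_fkey_check BEFORE INSERT OR UPDATE ON A FOR EACH ROW
EXECUTE PROCEDURE check_primary_key('REFB', 'B', 'ID');
</programlisting>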
<para>
<function>check_foreign_key()</> checks the referenced table.
To use, create a <literal>BEFORE DELETE OR UPDATE</> trigger using this
function on a table referenced by other table(s). Specify as the trigger
arguments: the number of referencing tables for which the function has to
perform checking, the action if a referencing key is found
(<literal>cascade</> &mdash; to delete the referencing row,
<literal>restrict</> &mdash; to abort transaction if referencing keys
exist, <literal>setnull</> &mdash; to set referencing key fields to null),
the triggered table's column names which form the primary/unique key, then
the referencing table name and column names (repeated for as many
referencing tables as were specified by first argument). Note that the
primary/unique key columns should be marked NOT NULL and should have a
unique index.
</para>
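<para>
A sketch with invented names, where table <literal>A</>'s column
<literal>REFB</> references table <literal>B</>'s primary key
<literal>ID</>, and referencing rows are deleted along with the
referenced row:
</para>
<programlisting>
CREATE TRIGGER B_refcheck BEFORE DELETE OR UPDATE ON B FOR EACH ROW
EXECUTE PROCEDURE check_foreign_key(1, 'cascade', 'ID', 'A', 'REFB');
</programlisting>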
<para>
Long ago, <productname>PostgreSQL</> had a built-in time travel feature
that kept the insert and delete times for each tuple. This can be
emulated using these functions. To use these functions,
you must add to a table two columns of <type>abstime</> type to store
the date when a tuple was inserted (start_date) and changed/deleted
(stop_date):
<programlisting>
CREATE TABLE mytab (
... ...
start_date abstime,
stop_date abstime
... ...
);
</programlisting>
The columns can be named whatever you like, but in this discussion
we'll call them start_date and stop_date.
</para>
<para>
When a new row is inserted, start_date should normally be set to
current time, and stop_date to <literal>infinity</>. The trigger
will automatically substitute these values if the inserted data
contains nulls in these columns. Generally, inserting explicit
non-null data in these columns should only be done when re-loading
dumped data.
</para>
<para>
Tuples with stop_date equal to <literal>infinity</> are <quote>valid
now</quote>, and can be modified. Tuples with a finite stop_date cannot
be modified anymore &mdash; the trigger will prevent it. (If you need
to do that, you can turn off time travel as shown below.)
</para>
<para>
For a modifiable row, on update only the stop_date in the tuple being
updated will be changed (to current time) and a new tuple with the modified
data will be inserted. Start_date in this new tuple will be set to current
time and stop_date to <literal>infinity</>.
</para>
<para>
A delete does not actually remove the tuple but only sets its stop_date
to current time.
</para>
<para>
To query for tuples <quote>valid now</quote>, include
<literal>stop_date = 'infinity'</> in the query's WHERE condition.
(You might wish to incorporate that in a view.) Similarly, you can
query for tuples valid at any past time with suitable conditions on
start_date and stop_date.
</para>
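<para>
For example, using the start_date and stop_date column names from above:
</para>
<programlisting>
SELECT * FROM mytab WHERE stop_date = 'infinity';
</programlisting>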
<para>
<function>timetravel()</> is the general trigger function that supports
this behavior. Create a <literal>BEFORE INSERT OR UPDATE OR DELETE</>
trigger using this function on each time-traveled table. Specify two
trigger arguments: the actual
names of the start_date and stop_date columns.
Optionally, you can specify one to three more arguments, which must refer
to columns of type <type>text</>. The trigger will store the name of
the current user into the first of these columns during INSERT, the
</para>
<para>
<literal>set_timetravel('mytab', 1)</> will turn TT ON for table mytab.
<literal>set_timetravel('mytab', 0)</> will turn TT OFF for table mytab.
In both cases the old status is reported. While TT is off, you can modify
the start_date and stop_date columns freely. Note that the on/off status
is local to the current database session &mdash; fresh sessions will
always start out with TT ON for all tables.
</para>
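<para>
A sketch of setting up the trigger for the table above (the trigger
name is arbitrary):
</para>
<programlisting>
CREATE TRIGGER mytab_timetravel
    BEFORE INSERT OR UPDATE OR DELETE ON mytab
    FOR EACH ROW
    EXECUTE PROCEDURE timetravel('start_date', 'stop_date');
</programlisting>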
<para>
To use, create a <literal>BEFORE INSERT</> (or optionally <literal>BEFORE
INSERT OR UPDATE</>) trigger using this function. Specify two
trigger arguments: the name of the integer column to be modified,
and the name of the sequence object that will supply values.
(Actually, you can specify any number of pairs of such names, if
you'd like to update more than one autoincrementing column.)
</para>
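<para>
A sketch with invented table and sequence names (the trigger function
provided by this module for this purpose is <function>autoinc()</>):
</para>
<programlisting>
CREATE TABLE ttt (id int4, note text);
CREATE SEQUENCE ttt_id_seq;

CREATE TRIGGER ttt_autoinc
    BEFORE INSERT ON ttt
    FOR EACH ROW
    EXECUTE PROCEDURE autoinc('id', 'ttt_id_seq');
</programlisting>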
<para>
To use, create a <literal>BEFORE INSERT</> and/or <literal>UPDATE</>
trigger using this function. Specify a single trigger
argument: the name of the text column to be modified.
</para>
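<para>
A sketch with invented names (the trigger function provided by this
module for this purpose is <function>insert_username()</>):
</para>
<programlisting>
CREATE TABLE notes (note text, username text);

CREATE TRIGGER notes_username
    BEFORE INSERT OR UPDATE ON notes
    FOR EACH ROW
    EXECUTE PROCEDURE insert_username('username');
</programlisting>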
<para>
To use, create a <literal>BEFORE UPDATE</>
trigger using this function. Specify a single trigger
argument: the name of the <type>timestamp</> column to be modified.
</para>
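<para>
A sketch with invented names (the trigger function provided by this
module for this purpose is <function>moddatetime()</>):
</para>
<programlisting>
CREATE TABLE mdt (txt text, moddate timestamp);

CREATE TRIGGER mdt_moddatetime
    BEFORE UPDATE ON mdt
    FOR EACH ROW
    EXECUTE PROCEDURE moddatetime('moddate');
</programlisting>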
<!-- $PostgreSQL: pgsql/doc/src/sgml/contrib.sgml,v 1.8 2007/12/06 04:12:09 tgl Exp $ -->
<appendix id="contrib">
<title>Additional Supplied Modules</title>
<para>
Many modules supply new user-defined functions, operators, or types.
To make use of one of these modules, after you have installed the code
you need to register the new objects in the database
system by running the SQL commands in the <literal>.sql</> file
supplied by the module. For example,
<programlisting>
psql -d dbname -f <replaceable>SHAREDIR</>/contrib/<replaceable>module</>.sql
</programlisting>
Here, <replaceable>SHAREDIR</> means the installation's <quote>share</>
directory (<literal>pg_config --sharedir</> will tell you what this is).
In most cases the script must be run by a database superuser.
</para>
<!-- $PostgreSQL: pgsql/doc/src/sgml/cube.sgml,v 1.5 2007/12/06 04:12:09 tgl Exp $ -->
<sect1 id="cube">
<title>cube</title>
<indexterm zone="cube">
<primary>cube</primary>
</indexterm>
<para>
This module implements a data type <type>cube</> for
representing multi-dimensional cubes.
</para>
<sect2>
<title>Syntax</title>
<para>
The following are valid external representations for the <type>cube</>
type. <replaceable>x</>, <replaceable>y</>, etc denote floating-point
numbers:
</para>
<table>
<title>Cube external representations</title>
<tgroup cols="2">
<tbody>
<row>
<entry><literal><replaceable>x</></literal></entry>
<entry>A one-dimensional point
(or, zero-length one-dimensional interval)
</entry>
</row>
<row>
<entry><literal>(<replaceable>x</>)</literal></entry>
<entry>Same as above</entry>
</row>
<row>
<entry><literal><replaceable>x1</>,<replaceable>x2</>,...,<replaceable>xn</></literal></entry>
<entry>A point in n-dimensional space, represented internally as a
zero-volume cube
</entry>
</row>
<row>
<entry><literal>(<replaceable>x1</>,<replaceable>x2</>,...,<replaceable>xn</>)</literal></entry>
<entry>Same as above</entry>
</row>
<row>
<entry><literal>(<replaceable>x</>),(<replaceable>y</>)</literal></entry>
<entry>A one-dimensional interval starting at <replaceable>x</> and ending at <replaceable>y</> or vice versa; the
order does not matter
</entry>
</row>
<row>
<entry><literal>[(<replaceable>x</>),(<replaceable>y</>)]</literal></entry>
<entry>Same as above</entry>
</row>
<row>
<entry><literal>(<replaceable>x1</>,...,<replaceable>xn</>),(<replaceable>y1</>,...,<replaceable>yn</>)</literal></entry>
<entry>An n-dimensional cube represented by a pair of its diagonally
opposite corners
</entry>
</row>
<row>
<entry><literal>[(<replaceable>x1</>,...,<replaceable>xn</>),(<replaceable>y1</>,...,<replaceable>yn</>)]</literal></entry>
<entry>Same as above</entry>
</row>
</tbody>
</tgroup>
</table>
<para>
It does not matter which order the opposite corners of a cube are
entered in. The <type>cube</> functions
automatically swap values if needed to create a uniform
<quote>lower left &mdash; upper right</> internal representation.
</para>
<para>
White space is ignored, so <literal>[(<replaceable>x</>),(<replaceable>y</>)]</literal> is the same as
<literal>[ ( <replaceable>x</> ), ( <replaceable>y</> ) ]</literal>.
</para>
</sect2>
<sect2>
<title>Defaults</title>
<para>
I believe this union:
</para>
<programlisting>
select cube_union('(0,5,2),(2,3,1)','0');
cube_union
-------------------
(0, 0, 0),(2, 5, 2)
(1 row)
</programlisting>
<para>
does not contradict common sense, nor does the intersection:
</para>
<programlisting>
select cube_inter('(0,-1),(1,1)','(-2),(2)');
cube_inter
-------------
(0, 0),(1, 0)
(1 row)
</programlisting>
<para>
In all binary operations on differently sized boxes, I assume the smaller
one to be a Cartesian projection, i.e., having zeroes in place of coordinates
omitted in the string representation. The above examples are equivalent to:
</para>
<programlisting>
cube_union('(0,5,2),(2,3,1)','(0,0,0),(0,0,0)');
cube_inter('(0,-1),(1,1)','(-2,0),(2,0)');
</programlisting>
<para>
The following containment predicate uses the point syntax,
while in fact the second argument is internally represented by a box.
This syntax makes it unnecessary to define the special Point type
and functions for (box,point) predicates.
</para>
<programlisting>
select cube_contains('(0,0),(1,1)', '0.5,0.5');
cube_contains
--------------
t
(1 row)
</programlisting>
</sect2>
<sect2>
<title>Precision</title>
<para>
Values are stored internally as 64-bit floating point numbers. This means
that numbers with more than about 16 significant digits will be truncated.
</para>
</sect2>
<sect2>
<title>Usage</title>
<para>
The <filename>cube</> module includes a GiST index operator class for
<type>cube</> values.
The operators supported by the GiST opclass include:
</para>
<itemizedlist>
<listitem>
<programlisting>
a = b Same as
</programlisting>
<para>
The cubes a and b are identical.
</para>
</listitem>
<listitem>
<programlisting>
a &amp;&amp; b Overlaps
</programlisting>
<para>
The cubes a and b overlap.
</para>
</listitem>
<listitem>
<programlisting>
a @&gt; b Contains
</programlisting>
<para>
The cube a contains the cube b.
</para>
</listitem>
<listitem>
<programlisting>
a &lt;@ b Contained in
</programlisting>
<para>
The cube a is contained in the cube b.
</para>
</listitem>
</itemizedlist>
<para>
(Before PostgreSQL 8.2, the containment operators @&gt; and &lt;@ were
respectively called <literal>@</> and <literal>~</>.)
</para>
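<para>
For example, an indexed overlap search might look like this (the table
and index names are invented for illustration):
</para>
<programlisting>
CREATE TABLE boxes (b cube);
CREATE INDEX boxes_b_idx ON boxes USING gist (b);
SELECT * FROM boxes WHERE b &amp;&amp; '(0,0),(10,10)'::cube;
</programlisting>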
<para>
The standard B-tree operators are also provided, for example
</para>
<programlisting>
[a, b] &lt; [c, d] Less than
[a, b] &gt; [c, d] Greater than
</programlisting>
<para>
These operators do not make a lot of sense for any practical
purpose but sorting. These operators first compare (a) to (c),
and if these are equal, compare (b) to (d). That results in
reasonably good sorting in most cases, which is useful if
you want to use ORDER BY with this type.
</para>
<table>
<title>Cube functions</title>
<tgroup cols="2">
<tbody>
<row>
<entry><literal>cube(text)</literal></entry>
<entry>Takes text input and returns a cube. This is useful for making
cubes from computed strings.
</entry>
</row>
<row>
<entry><literal>cube(float8) returns cube</literal></entry>
<entry>Makes a one dimensional cube with both coordinates the same.
<literal>cube(1) == '(1)'</literal>
</entry>
</row>
<row>
<entry><literal>cube(float8, float8) returns cube</literal></entry>
<entry>Makes a one dimensional cube.
<literal>cube(1,2) == '(1),(2)'</literal>
</entry>
</row>
<row>
<entry><literal>cube(float8[]) returns cube</literal></entry>
<entry>Makes a zero-volume cube using the coordinates
defined by the array.
<literal>cube(ARRAY[1,2]) == '(1,2)'</literal>
</entry>
</row>
<row>
<entry><literal>cube(float8[], float8[]) returns cube</literal></entry>
<entry>Makes a cube with upper right and lower left
coordinates as defined by the two arrays, which must be of the
same length.
<literal>cube('{1,2}'::float[], '{3,4}'::float[]) == '(1,2),(3,4)'
</literal>
</entry>
</row>
<row>
<entry><literal>cube(cube, float8) returns cube</literal></entry>
<entry>Makes a new cube by adding a dimension on to an
existing cube with the same values for both parts of the new coordinate.
This is useful for building cubes piece by piece from calculated values.
<literal>cube('(1)',2) == '(1,2),(1,2)'</literal>
</entry>
</row>
<row>
<entry><literal>cube(cube, float8, float8) returns cube</literal></entry>
<entry>Makes a new cube by adding a dimension on to an
existing cube. This is useful for building cubes piece by piece from
calculated values. <literal>cube('(1,2)',3,4) == '(1,3),(2,4)'</literal>
</entry>
</row>
<row>
<entry><literal>cube_dim(cube) returns int</literal></entry>
<entry>Returns the number of dimensions of the cube
</entry>
</row>
<row>
<entry><literal>cube_ll_coord(cube, int) returns double </literal></entry>
<entry>Returns the n'th coordinate value for the lower left
corner of a cube
</entry>
</row>
<row>
<entry><literal>cube_ur_coord(cube, int) returns double
</literal></entry>
<entry>Returns the n'th coordinate value for the
upper right corner of a cube
</entry>
</row>
<row>
<entry><literal>cube_is_point(cube) returns bool</literal></entry>
<entry>Returns true if a cube is a point, that is,
the two defining corners are the same.</entry>
</row>
<row>
<entry><literal>cube_distance(cube, cube) returns double</literal></entry>
<entry>Returns the distance between two cubes. If both
cubes are points, this is the normal distance function.
</entry>
</row>
<row>
<entry><literal>cube_subset(cube, int[]) returns cube
</literal></entry>
<entry>Builds a new cube from an existing cube, using a list of
dimension indexes
from an array. Can be used to find both the ll and ur coordinate of single
dimenion, e.g.: cube_subset(cube('(1,3,5),(6,7,8)'), ARRAY[2]) = '(3),(7)'
Or can be used to drop dimensions, or reorder them as desired, e.g.:
cube_subset(cube('(1,3,5),(6,7,8)'), ARRAY[3,2,1,1]) =
'(5, 3, 1, 1),(8, 7, 6, 6)'
<entry>Makes a new cube from an existing cube, using a list of
dimension indexes from an array. Can be used to find both the LL and UR
coordinates of a single dimension, e.g.
<literal>cube_subset(cube('(1,3,5),(6,7,8)'), ARRAY[2]) = '(3),(7)'</>.
Or can be used to drop dimensions, or reorder them as desired, e.g.
<literal>cube_subset(cube('(1,3,5),(6,7,8)'), ARRAY[3,2,1,1]) = '(5, 3,
1, 1),(8, 7, 6, 6)'</>.
</entry>
</row>
<row>
<entry><literal>cube_union(cube, cube) returns cube</literal></entry>
<entry>Produces the union of two cubes
</entry>
</row>
<row>
<entry><literal>cube_inter(cube, cube) returns cube</literal></entry>
<entry>Produces the intersection of two cubes
</entry>
</row>
<row>
<entry><literal>cube_enlarge(cube c, double r, int n) returns cube</literal></entry>
<entry>Increases the size of a cube by a specified radius in at least
n dimensions. If the radius is negative the cube is shrunk instead. This
is useful for creating bounding boxes around a point for searching for
nearby points. All defined dimensions are changed by the radius r.
LL coordinates are decreased by r and UR coordinates are increased by r.
If a LL coordinate is increased to larger than the corresponding UR
coordinate (this can only happen when r &lt; 0) then both coordinates
are set to their average. If n is greater than the number of defined
dimensions and the cube is being increased (r &gt;= 0) then 0 is used
as the base for the extra coordinates.
</entry>
</row>
</tbody>
</tgroup>
</table>
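<para>
As a simple illustration of <function>cube_distance</>, the distance
between the points (0,0) and (3,4) is the ordinary Euclidean distance
(the output follows directly from the definition above):
</para>
<programlisting>
select cube_distance('(0,0)', '(3,4)');
 cube_distance
---------------
             5
(1 row)
</programlisting>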
</sect2>
<sect2>
<title>Defaults</title>
<para>
I believe this union:
</para>
<programlisting>
select cube_union('(0,5,2),(2,3,1)', '0');
cube_union
-------------------
(0, 0, 0),(2, 5, 2)
(1 row)
</programlisting>
<para>
does not contradict common sense, nor does the intersection
</para>
<programlisting>
select cube_inter('(0,-1),(1,1)', '(-2),(2)');
cube_inter
-------------
(0, 0),(1, 0)
(1 row)
</programlisting>
<para>
In all binary operations on differently-dimensioned cubes, I assume the
lower-dimensional one to be a Cartesian projection, i.e., having zeroes
in place of coordinates omitted in the string representation. The above
examples are equivalent to:
</para>
<programlisting>
cube_union('(0,5,2),(2,3,1)','(0,0,0),(0,0,0)');
cube_inter('(0,-1),(1,1)','(-2,0),(2,0)');
</programlisting>
<para>
The following containment predicate uses the point syntax,
while in fact the second argument is internally represented by a box.
This syntax makes it unnecessary to define a separate point type
and functions for (box,point) predicates.
</para>
<programlisting>
select cube_contains('(0,0),(1,1)', '0.5,0.5');
cube_contains
--------------
t
(1 row)
</programlisting>
</sect2>
<sect2>
<title>Notes</title>
<para>
There are a few other potentially useful functions defined in cube.c
that vanished from the schema because I stopped using them. Some of
these were meant to support type casting. Let me know if I was wrong:
I will then add them back to the schema. I would also appreciate
other ideas that would enhance the type and make it more useful.
For examples of usage, see the regression test <filename>sql/cube.sql</>.
</para>
<para>
To make it harder for people to break things, there
is a limit of 100 on the number of dimensions of cubes. This is set
in <filename>cubedata.h</> if you need something bigger.
</para>
</sect2>
<sect2>
<title>Credits</title>
<para>
Original author: Gene Selkov, Jr. <email>selkovjr@mcs.anl.gov</email>,
Mathematics and Computer Science Division, Argonne National Laboratory.
</para>
<para>
My thanks are primarily to Prof. Joe Hellerstein
(<ulink url="http://db.cs.berkeley.edu/~jmh/"></ulink>) for elucidating the
gist of the GiST (<ulink url="http://gist.cs.berkeley.edu/"></ulink>), and
to his former student, Andy Dong (<ulink
url="http://best.me.berkeley.edu/~adong/"></ulink>), for his example
written for Illustra,
<ulink url="http://garcia.me.berkeley.edu/~adong/rtree"></ulink>.
I am also grateful to all Postgres developers, present and past, for
enabling myself to create my own world and live undisturbed in it. And I
would like to acknowledge my gratitude to Argonne Lab and to the
U.S. Department of Energy for the years of faithful support of my database
research.
</para>
<para>
Minor updates to this package were made by Bruno Wolff III
<email>bruno@wolff.to</email> in August/September of 2002. These include
changing the precision from single precision to double precision and adding
some new functions.
</para>
<para>
Additional updates were made by Joshua Reich <email>josh@root.net</email> in
July 2006. These include <literal>cube(float8[], float8[])</literal> and
cleaning up the code to use the V1 call protocol instead of the deprecated
V0 protocol.
</para>
</sect2>
</sect1>
<!-- $PostgreSQL: pgsql/doc/src/sgml/dblink.sgml,v 1.3 2007/12/06 04:12:09 tgl Exp $ -->
<sect1 id="dblink">
<title>dblink</title>
<indexterm zone="dblink">
<primary>dblink</primary>
</indexterm>
<para>
<filename>dblink</> is a module which supports connections to
other <productname>PostgreSQL</> databases from within a database
session.
</para>
<refentry id="CONTRIB-DBLINK-CONNECT">
<refnamediv>
<refname>dblink_connect</refname>
<refpurpose>opens a persistent connection to a remote database</refpurpose>
</refnamediv>
<refsynopsisdiv>
<synopsis>
dblink_connect(text connstr) returns text
dblink_connect(text connname, text connstr) returns text
</synopsis>
</refsynopsisdiv>
<refsect1>
<title>Description</title>
<para>
<function>dblink_connect()</> establishes a connection to a remote
<productname>PostgreSQL</> database. The server and database to
be contacted are identified through a standard <application>libpq</>
connection string. Optionally, a name can be assigned to the
connection. Multiple named connections can be open at once, but
only one unnamed connection is permitted at a time. The connection
will persist until closed or until the database session is ended.
</para>
</refsect1>
<refsect1>
<title>Arguments</title>
<variablelist>
<varlistentry>
<term><parameter>conname</parameter></term>
<listitem>
<para>
The name to use for this connection; if omitted, an unnamed
connection is opened, replacing any existing unnamed connection.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><parameter>connstr</parameter></term>
<listitem>
<para>
<application>libpq</>-style connection info string, for example
<literal>hostaddr=127.0.0.1 port=5432 dbname=mydb user=postgres
password=mypasswd</>.
For details see <function>PQconnectdb</> in
<xref linkend="libpq-connect">.
</para>
</listitem>
</varlistentry>
</variablelist>
</refsect1>
<refsect1>
<title>Return Value</title>
<para>
Returns status, which is always <literal>OK</> (since any error
causes the function to throw an error instead of returning).
</para>
</refsect1>
<refsect1>
<title>Notes</title>
<para>
Only superusers may use <function>dblink_connect</> to create
non-password-authenticated connections. If non-superusers need this
capability, use <function>dblink_connect_u</> instead.
</para>
<para>
It is unwise to choose connection names that contain equal signs,
as this opens a risk of confusion with connection info strings
in other <filename>dblink</> functions.
</para>
</refsect1>
<refsect1>
<title>Example</title>
<programlisting>
select dblink_connect('dbname=postgres');
dblink_connect
----------------
OK
(1 row)
select dblink_connect('myconn', 'dbname=postgres');
dblink_connect
----------------
OK
(1 row)
</programlisting>
</refsect1>
</refentry>
<refentry id="CONTRIB-DBLINK-CONNECT-U">
<refnamediv>
<refname>dblink_connect_u</refname>
<refpurpose>opens a persistent connection to a remote database, insecurely</refpurpose>
</refnamediv>
<refsynopsisdiv>
<synopsis>
dblink_connect_u(text connstr) returns text
dblink_connect_u(text connname, text connstr) returns text
</synopsis>
</refsynopsisdiv>
<refsect1>
<title>Description</title>
<para>
<function>dblink_connect_u()</> is identical to
<function>dblink_connect()</>, except that it will allow non-superusers
to connect using any authentication method.
</para>
<para>
If the remote server selects an authentication method that does not
involve a password, then impersonation and subsequent escalation of
privileges can occur, because the session will appear to have
originated from the user as which the local <productname>PostgreSQL</>
server runs. Therefore, <function>dblink_connect_u()</> is initially
installed with all privileges revoked from <literal>PUBLIC</>,
making it un-callable except by superusers. In some situations
it may be appropriate to grant <literal>EXECUTE</> permission for
<function>dblink_connect_u()</> to specific users who are considered
trustworthy, but this should be done with care.
</para>
<para>
For further details see <function>dblink_connect()</>.
</para>
</refsect1>
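<refsect1>
<title>Example</title>
<para>
A sketch of granting <function>dblink_connect_u()</> to a specific
trustworthy user, as discussed above (the role name
<literal>trusted_user</> is hypothetical; the signatures match the
synopsis):
</para>
<programlisting>
GRANT EXECUTE ON FUNCTION dblink_connect_u(text) TO trusted_user;
GRANT EXECUTE ON FUNCTION dblink_connect_u(text, text) TO trusted_user;
</programlisting>
</refsect1>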
</refentry>
<refentry id="CONTRIB-DBLINK-DISCONNECT">
<refnamediv>
<refname>dblink_disconnect</refname>
<refpurpose>closes a persistent connection to a remote database</refpurpose>
</refnamediv>
<refsynopsisdiv>
<synopsis>
dblink_disconnect() returns text
dblink_disconnect(text connname) returns text
</synopsis>
</refsynopsisdiv>
<refsect1>
<title>Description</title>
<para>
<function>dblink_disconnect()</> closes a connection previously opened
by <function>dblink_connect()</>. The form with no arguments closes
an unnamed connection.
</para>
</refsect1>
<refsect1>
<title>Arguments</title>
<variablelist>
<varlistentry>
<term><parameter>conname</parameter></term>
<listitem>
<para>
The name of a named connection to be closed.
</para>
</listitem>
</varlistentry>
</variablelist>
</refsect1>
<refsect1>
<title>Return Value</title>
<para>
Returns status, which is always <literal>OK</> (since any error
causes the function to throw an error instead of returning).
</para>
</refsect1>
<refsect1>
<title>Example</title>
<programlisting>
select dblink_disconnect();
dblink_disconnect
-------------------
OK
(1 row)
select dblink_disconnect('myconn');
dblink_disconnect
-------------------
 OK
(1 row)
</programlisting>
</refsect1>
</refentry>
<refentry id="CONTRIB-DBLINK">
<refnamediv>
<refname>dblink</refname>
<refpurpose>executes a query in a remote database</refpurpose>
</refnamediv>
<refsynopsisdiv>
<synopsis>
dblink(text connname, text sql [, bool fail_on_error]) returns setof record
dblink(text connstr, text sql [, bool fail_on_error]) returns setof record
dblink(text sql [, bool fail_on_error]) returns setof record
</synopsis>
</refsynopsisdiv>
<refsect1>
<title>Description</title>
<para>
<function>dblink</> executes a query (usually a <command>SELECT</>,
but it can be any SQL statement that returns rows) in a remote database.
</para>
<para>
When two <type>text</> arguments are given, the first one is first
looked up as a persistent connection's name; if found, the command
is executed on that connection. If not found, the first argument
is treated as a connection info string as for <function>dblink_connect</>,
and the indicated connection is made just for the duration of this command.
</para>
</refsect1>
<refsect1>
<title>Arguments</title>
<variablelist>
<varlistentry>
<term><parameter>conname</parameter></term>
<listitem>
<para>
Name of the connection to use; omit this parameter to use the
unnamed connection.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><parameter>connstr</parameter></term>
<listitem>
<para>
A connection info string, as previously described for
<function>dblink_connect</>.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><parameter>sql</parameter></term>
<listitem>
<para>
The SQL query that you wish to execute in the remote database,
for example <literal>select * from foo</>.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><parameter>fail_on_error</parameter></term>
<listitem>
<para>
If true (the default when omitted) then an error thrown on the
remote side of the connection causes an error to also be thrown
locally. If false, the remote error is locally reported as a NOTICE,
and the function returns no rows.
</para>
</listitem>
</varlistentry>
</variablelist>
</refsect1>
<refsect1>
<title>Return Value</title>
<para>
The function returns the row(s) produced by the query. Since
<function>dblink</> can be used with any query, it is declared
to return <type>record</>, rather than specifying any particular
set of columns. This means that you must specify the expected
set of columns in the calling query &mdash; otherwise
<productname>PostgreSQL</> would not know what to expect.
Here is an example:
<programlisting>
SELECT *
FROM dblink('dbname=mydb', 'select proname, prosrc from pg_proc')
AS t1(proname name, prosrc text)
WHERE proname LIKE 'bytea%';
</programlisting>
The <quote>alias</> part of the <literal>FROM</> clause must
specify the column names and types that the function will return.
(Specifying column names in an alias is actually standard SQL
syntax, but specifying column types is a <productname>PostgreSQL</>
extension.) This allows the system to understand what
<literal>*</> should expand to, and what <structname>proname</>
in the <literal>WHERE</> clause refers to, in advance of trying
to execute the function. At runtime, an error will be thrown
if the actual query result from the remote database does not
have the same number of columns shown in the <literal>FROM</> clause.
The column names need not match, however, and <function>dblink</>
does not insist on exact type matches either. It will succeed
so long as the returned data strings are valid input for the
column type declared in the <literal>FROM</> clause.
</para>
</refsect1>
<refsect1>
<title>Notes</title>
<para>
<function>dblink</> fetches the entire remote query result before
returning any of it to the local system. If the query is expected
to return a large number of rows, it's better to open it as a cursor
with <function>dblink_open</> and then fetch a manageable number
of rows at a time.
</para>
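<para>
For instance (a sketch only; the cursor name, column list, and batch
size are arbitrary):
</para>
<programlisting>
select dblink_open('crs', 'select proname, prosrc from pg_proc');
select * from dblink_fetch('crs', 1000) as t(proname name, prosrc text);
-- repeat dblink_fetch until it returns no rows, then:
select dblink_close('crs');
</programlisting>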
<para>
A convenient way to use <function>dblink</> with predetermined
queries is to create a view.
This allows the column type information to be buried in the view,
instead of having to spell it out in every query. For example,
<programlisting>
create view myremote_pg_proc as
select *
from dblink('dbname=postgres', 'select proname, prosrc from pg_proc')
as t1(proname name, prosrc text);
select * from myremote_pg_proc where proname like 'bytea%';
</programlisting>
</para>
</refsect1>
<refsect1>
<title>Example</title>
<programlisting>
select * from dblink('dbname=postgres', 'select proname, prosrc from pg_proc')
as t1(proname name, prosrc text) where proname like 'bytea%';
proname | prosrc
------------+------------
byteacat | byteacat
byteaeq | byteaeq
bytealt | bytealt
byteale | byteale
byteagt | byteagt
byteage | byteage
byteane | byteane
byteacmp | byteacmp
bytealike | bytealike
byteanlike | byteanlike
byteain | byteain
byteaout | byteaout
(12 rows)
select dblink_connect('dbname=postgres');
dblink_connect
----------------
OK
(1 row)
select * from dblink('select proname, prosrc from pg_proc')
as t1(proname name, prosrc text) where proname like 'bytea%';
proname | prosrc
------------+------------
byteacat | byteacat
byteaeq | byteaeq
bytealt | bytealt
byteale | byteale
byteagt | byteagt
byteage | byteage
byteane | byteane
byteacmp | byteacmp
bytealike | bytealike
byteanlike | byteanlike
byteain | byteain
byteaout | byteaout
(12 rows)
select dblink_connect('myconn', 'dbname=regression');
dblink_connect
----------------
OK
(1 row)
select * from dblink('myconn', 'select proname, prosrc from pg_proc')
as t1(proname name, prosrc text) where proname like 'bytea%';
proname | prosrc
------------+------------
bytearecv | bytearecv
byteasend | byteasend
byteale | byteale
byteagt | byteagt
byteage | byteage
byteane | byteane
byteacmp | byteacmp
bytealike | bytealike
byteanlike | byteanlike
byteacat | byteacat
byteaeq | byteaeq
bytealt | bytealt
byteain | byteain
byteaout | byteaout
(14 rows)
</programlisting>
</refsect1>
</refentry>
<refentry id="CONTRIB-DBLINK-EXEC">
<refnamediv>
<refname>dblink_exec</refname>
<refpurpose>executes a command in a remote database</refpurpose>
</refnamediv>
<refsynopsisdiv>
<synopsis>
dblink_exec(text connname, text sql [, bool fail_on_error]) returns text
dblink_exec(text connstr, text sql [, bool fail_on_error]) returns text
dblink_exec(text sql [, bool fail_on_error]) returns text
</synopsis>
</refsynopsisdiv>
<refsect1>
<title>Description</title>
<para>
<function>dblink_exec</> executes a command (that is, any SQL statement
that doesn't return rows) in a remote database.
</para>
<para>
When two <type>text</> arguments are given, the first one is first
looked up as a persistent connection's name; if found, the command
is executed on that connection. If not found, the first argument
is treated as a connection info string as for <function>dblink_connect</>,
and the indicated connection is made just for the duration of this command.
</para>
</refsect1>
<refsect1>
<title>Arguments</title>
<variablelist>
<varlistentry>
<term><parameter>conname</parameter></term>
<listitem>
<para>
Name of the connection to use; omit this parameter to use the
unnamed connection.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><parameter>connstr</parameter></term>
<listitem>
<para>
A connection info string, as previously described for
<function>dblink_connect</>.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><parameter>sql</parameter></term>
<listitem>
<para>
The SQL command that you wish to execute in the remote database,
for example
<literal>insert into foo values(0,'a','{"a0","b0","c0"}')</>.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><parameter>fail_on_error</parameter></term>
<listitem>
<para>
If true (the default when omitted) then an error thrown on the
remote side of the connection causes an error to also be thrown
locally. If false, the remote error is locally reported as a NOTICE,
and the function's return value is set to <literal>ERROR</>.
</para>
</listitem>
</varlistentry>
</variablelist>
</refsect1>
<refsect1>
<title>Return Value</title>
<para>
Returns status, either the command's status string or <literal>ERROR</>.
</para>
</refsect1>
<refsect1>
<title>Example</title>
<programlisting>
select dblink_connect('dbname=dblink_test_slave');
dblink_connect
----------------
OK
(1 row)
select dblink_exec('insert into foo values(21,''z'',''{"a0","b0","c0"}'');');
dblink_exec
-----------------
INSERT 943366 1
(1 row)
select dblink_connect('myconn', 'dbname=regression');
dblink_connect
----------------
OK
(1 row)
select dblink_exec('myconn', 'insert into foo values(21,''z'',''{"a0","b0","c0"}'');');
dblink_exec
------------------
INSERT 6432584 1
(1 row)
select dblink_exec('myconn', 'insert into pg_class values (''foo'')',false);
NOTICE: sql error
DETAIL: ERROR: null value in column "relnamespace" violates not-null constraint
dblink_exec
-------------
ERROR
(1 row)
</programlisting>
</refsect1>
</refentry>
<refentry id="CONTRIB-DBLINK-OPEN">
<refnamediv>
<refname>dblink_open</refname>
<refpurpose>opens a cursor in a remote database</refpurpose>
</refnamediv>
<refsynopsisdiv>
<synopsis>
dblink_open(text cursorname, text sql [, bool fail_on_error]) returns text
dblink_open(text connname, text cursorname, text sql [, bool fail_on_error]) returns text
</synopsis>
</refsynopsisdiv>
<refsect1>
<title>Description</title>
<para>
<function>dblink_open()</> opens a cursor in a remote database.
The cursor can subsequently be manipulated with
<function>dblink_fetch()</> and <function>dblink_close()</>.
</para>
</refsect1>
<refsect1>
<title>Arguments</title>
<variablelist>
<varlistentry>
<term><parameter>conname</parameter></term>
<listitem>
<para>
Name of the connection to use; omit this parameter to use the
unnamed connection.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><parameter>cursorname</parameter></term>
<listitem>
<para>
The name to assign to this cursor.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><parameter>sql</parameter></term>
<listitem>
<para>
The <command>SELECT</> statement that you wish to execute in the remote
database, for example <literal>select * from pg_class</>.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><parameter>fail_on_error</parameter></term>
<listitem>
<para>
If true (the default when omitted) then an error thrown on the
remote side of the connection causes an error to also be thrown
locally. If false, the remote error is locally reported as a NOTICE,
and the function's return value is set to <literal>ERROR</>.
</para>
</listitem>
</varlistentry>
</variablelist>
</refsect1>
<refsect1>
<title>Return Value</title>
<para>
Returns status, either <literal>OK</> or <literal>ERROR</>.
</para>
</refsect1>
<refsect1>
<title>Notes</title>
<para>
dblink_open starts an explicit transaction. If, after using dblink_open,
you use dblink_exec to change data, and then an error occurs or you use
dblink_disconnect without a dblink_close first, your change *will* be
lost.
Since a cursor can only persist within a transaction,
<function>dblink_open</> starts an explicit transaction block
(<command>BEGIN</>) on the remote side, if the remote side was
not already within a transaction. This transaction will be
closed again when the matching <function>dblink_close</> is
executed. Note that if
you use <function>dblink_exec</> to change data between
<function>dblink_open</> and <function>dblink_close</>,
and then an error occurs or you use <function>dblink_disconnect</> before
<function>dblink_close</>, your change <emphasis>will be
lost</> because the transaction will be aborted.
</para>
</refsect1>
<refsect1>
<title>Example</title>
<programlisting>
test=# select dblink_connect('dbname=postgres');
dblink_connect
----------------
OK
(1 row)
test=# select dblink_open('foo', 'select proname, prosrc from pg_proc');
dblink_open
-------------
OK
(1 row)
</programlisting>
</refsect1>
</refentry>
<refentry id="CONTRIB-DBLINK-FETCH">
<refnamediv>
<refname>dblink_fetch</refname>
<refpurpose>returns rows from an open cursor in a remote database</refpurpose>
</refnamediv>
<refsynopsisdiv>
<synopsis>
dblink_fetch(text cursorname, int howmany [, bool fail_on_error]) returns setof record
dblink_fetch(text connname, text cursorname, int howmany [, bool fail_on_error]) returns setof record
</synopsis>
</refsynopsisdiv>
<refsect1>
<title>Description</title>
<para>
<function>dblink_fetch</> fetches rows from a cursor previously
established by <function>dblink_open</>.
</para>
</refsect1>
<refsect1>
<title>Arguments</title>
<variablelist>
<varlistentry>
<term><parameter>connname</parameter></term>
<listitem>
<para>
Name of the connection to use; omit this parameter to use the
unnamed connection.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><parameter>cursorname</parameter></term>
<listitem>
<para>
The name of the cursor to fetch from.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><parameter>howmany</parameter></term>
<listitem>
<para>
The maximum number of rows to retrieve. The next <parameter>howmany</>
rows are fetched, starting at the current cursor position, moving
forward. Once the cursor has reached its end, no more rows are produced.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><parameter>fail_on_error</parameter></term>
<listitem>
<para>
If true (the default when omitted) then an error thrown on the
remote side of the connection causes an error to also be thrown
locally. If false, the remote error is locally reported as a NOTICE,
and the function returns no rows.
</para>
</listitem>
</varlistentry>
</variablelist>
</refsect1>
<refsect1>
<title>Return Value</title>
<para>
The function returns the row(s) fetched from the cursor. To use this
function, you will need to specify the expected set of columns,
as previously discussed for <function>dblink</>.
</para>
</refsect1>
<refsect1>
<title>Notes</title>
<para>
On a mismatch between the number of return columns specified in the
<literal>FROM</> clause, and the actual number of columns returned by the
remote cursor, an error will be thrown. In this event, the remote cursor
is still advanced by as many rows as it would have been if the error had
not occurred. The same is true for any other error occurring in the local
query after the remote <command>FETCH</> has been done.
</para>
</refsect1>
<refsect1>
<title>Example</title>
<programlisting>
test=# select dblink_connect('dbname=postgres');
dblink_connect
----------------
OK
(1 row)
test=# select dblink_open('foo', 'select proname, prosrc from pg_proc where proname like ''bytea%''');
dblink_open
-------------
OK
(1 row)
test=# select * from dblink_fetch('foo', 5) as (funcname name, source text);
funcname | source
----------+----------
byteacat | byteacat
byteacmp | byteacmp
byteaeq | byteaeq
byteage | byteage
byteagt | byteagt
(5 rows)
test=# select * from dblink_fetch('foo', 5) as (funcname name, source text);
funcname | source
-----------+-----------
byteain | byteain
byteale | byteale
bytealike | bytealike
bytealt | bytealt
byteane | byteane
(5 rows)
test=# select * from dblink_fetch('foo', 5) as (funcname name, source text);
funcname | source
------------+------------
byteanlike | byteanlike
byteaout | byteaout
(2 rows)
test=# select * from dblink_fetch('foo', 5) as (funcname name, source text);
funcname | source
----------+--------
(0 rows)
</programlisting>
</refsect1>
</refentry>
<refentry id="CONTRIB-DBLINK-CLOSE">
<refnamediv>
<refname>dblink_close</refname>
<refpurpose>closes a cursor in a remote database</refpurpose>
</refnamediv>
<refsynopsisdiv>
<synopsis>
dblink_close(text cursorname [, bool fail_on_error]) returns text
dblink_close(text connname, text cursorname [, bool fail_on_error]) returns text
</synopsis>
</refsynopsisdiv>
<refsect1>
<title>Description</title>
<para>
<function>dblink_close</> closes a cursor previously opened with
<function>dblink_open</>.
</para>
</refsect1>
<refsect1>
<title>Arguments</title>
<variablelist>
<varlistentry>
<term><parameter>conname</parameter></term>
<listitem>
<para>
Name of the connection to use; omit this parameter to use the
unnamed connection.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><parameter>cursorname</parameter></term>
<listitem>
<para>
The name of the cursor to close.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><parameter>fail_on_error</parameter></term>
<listitem>
<para>
If true (the default when omitted) then an error thrown on the
remote side of the connection causes an error to also be thrown
locally. If false, the remote error is locally reported as a NOTICE,
and the function's return value is set to <literal>ERROR</>.
</para>
</listitem>
</varlistentry>
</variablelist>
</refsect1>
<refsect1>
<title>Return Value</title>
<para>
Returns status, either <literal>OK</> or <literal>ERROR</>.
</para>
</refsect1>
<refsect1>
<title>Notes</title>
<para>
If <function>dblink_open</> started an explicit transaction block,
and this is the last remaining open cursor in this connection,
<function>dblink_close</> will issue the matching <command>COMMIT</>.
</para>
</refsect1>
<refsect1>
<title>Example</title>
<programlisting>
test=# select dblink_connect('dbname=postgres');
dblink_connect
----------------
OK
(1 row)
test=# select dblink_open('foo', 'select proname, prosrc from pg_proc');
dblink_open
-------------
OK
(1 row)
test=# select dblink_close('foo');
dblink_close
--------------
OK
(1 row)
</programlisting>
</refsect1>
</refentry>
<refentry id="CONTRIB-DBLINK-GET-CONNECTIONS">
<refnamediv>
<refname>dblink_get_connections</refname>
<refpurpose>returns the names of all open named dblink connections</refpurpose>
</refnamediv>
<refsynopsisdiv>
<synopsis>
dblink_get_connections() returns text[]
</synopsis>
</refsynopsisdiv>
<refsect1>
<title>Description</title>
<para>
<function>dblink_get_connections</> returns an array of the names
of all open named <filename>dblink</> connections.
</para>
</refsect1>
<refsect1>
<title>Return Value</title>
<para>Returns a text array of connection names, or NULL if none.</para>
</refsect1>
<refsect1>
<title>Example</title>
<programlisting>
SELECT dblink_get_connections();
</programlisting>
</refsect1>
</refentry>
<refentry id="CONTRIB-DBLINK-ERROR-MESSAGE">
<refnamediv>
<refname>dblink_error_message</refname>
<refpurpose>gets last error message on the named connection</refpurpose>
</refnamediv>
<refsynopsisdiv>
<synopsis>
dblink_error_message(text connname) returns text
</synopsis>
</refsynopsisdiv>
<refsect1>
<title>Description</title>
<para>
<function>dblink_error_message</> fetches the most recent remote
error message for a given connection.
</para>
</refsect1>
<refsect1>
<title>Arguments</title>
<variablelist>
<varlistentry>
<term><parameter>connname</parameter></term>
<listitem>
<para>
Name of the connection to use.
</para>
</listitem>
</varlistentry>
</variablelist>
</refsect1>
<refsect1>
<title>Return Value</title>
<para>
Returns last error message, or an empty string if there has been
no error in this connection.
</para>
</refsect1>
<refsect1>
<title>Example</title>
<programlisting>
SELECT dblink_error_message('dtest1');
</programlisting>
</refsect1>
</refentry>
<refentry id="CONTRIB-DBLINK-SEND-QUERY">
<refnamediv>
<refname>dblink_send_query</refname>
<refpurpose>sends an async query to a remote database</refpurpose>
</refnamediv>
<refsynopsisdiv>
<synopsis>
dblink_send_query(text connname, text sql) returns int
</synopsis>
</refsynopsisdiv>
<refsect1>
<title>Description</title>
<para>
<function>dblink_send_query</> sends a query to be executed
asynchronously, that is, without immediately waiting for the result.
There must not be an async query already in progress on the
connection.
</para>
<para>
After successfully dispatching an async query, completion status
can be checked with <function>dblink_is_busy</>, and the results
are ultimately collected with <function>dblink_get_result</>.
It is also possible to attempt to cancel an active async query
using <function>dblink_cancel_query</>.
</para>
</refsect1>
<refsect1>
<title>Arguments</title>
<variablelist>
<varlistentry>
<term><parameter>connname</parameter></term>
<listitem>
<para>
Name of the connection to use.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><parameter>sql</parameter></term>
<listitem>
<para>
The SQL statement that you wish to execute in the remote database,
for example <literal>select * from pg_class</>.
</para>
</listitem>
</varlistentry>
</variablelist>
</refsect1>
<refsect1>
<title>Return Value</title>
<para>
Returns 1 if the query was successfully dispatched, 0 otherwise.
</para>
</refsect1>
<refsect1>
<title>Example</title>
<programlisting>
SELECT dblink_send_query('dtest1', 'SELECT * FROM foo WHERE f1 &lt; 3');
</programlisting>
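<para>
A complete asynchronous round trip can be sketched as follows
(this assumes a connection named <literal>dtest1</> and a remote
table <literal>foo</> with the columns used in the other examples):
</para>
<programlisting>
SELECT dblink_connect('dtest1', 'dbname=contrib_regression');
SELECT dblink_send_query('dtest1', 'SELECT * FROM foo WHERE f1 &lt; 3');
-- optionally poll until the connection is no longer busy
SELECT dblink_is_busy('dtest1');
-- fetch the rows, then call once more to drain the empty result
SELECT * FROM dblink_get_result('dtest1') AS t1(f1 int, f2 text, f3 text[]);
SELECT * FROM dblink_get_result('dtest1') AS t1(f1 int, f2 text, f3 text[]);
SELECT dblink_disconnect('dtest1');
</programlisting>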
</refsect1>
</refentry>
<refentry id="CONTRIB-DBLINK-IS-BUSY">
<refnamediv>
<refname>dblink_is_busy</refname>
<refpurpose>checks if connection is busy with an async query</refpurpose>
</refnamediv>
<refsynopsisdiv>
<synopsis>
dblink_is_busy(text connname) returns int
</synopsis>
</refsynopsisdiv>
<refsect1>
<title>Description</title>
<para>
<function>dblink_is_busy</> tests whether an async query is in progress.
</para>
</refsect1>
<refsect1>
<title>Arguments</title>
<variablelist>
<varlistentry>
<term><parameter>connname</parameter></term>
<listitem>
<para>
Name of the connection to check.
</para>
</listitem>
</varlistentry>
</variablelist>
</refsect1>
<refsect1>
<title>Return Value</title>
<para>
Returns 1 if connection is busy, 0 if it is not busy.
If this function returns 0, it is guaranteed that
<function>dblink_get_result</> will not block.
</para>
</refsect1>
<refsect1>
<title>Example</title>
<programlisting>
SELECT dblink_is_busy('dtest1');
</programlisting>
</refsect1>
</refentry>
<refentry id="CONTRIB-DBLINK-GET-RESULT">
<refnamediv>
<refname>dblink_get_result</refname>
<refpurpose>gets an async query result</refpurpose>
</refnamediv>
<refsynopsisdiv>
<synopsis>
dblink_get_result(text connname [, bool fail_on_error]) returns setof record
</synopsis>
</refsynopsisdiv>
<refsect1>
<title>Description</title>
<para>
<function>dblink_get_result</> collects the results of an
asynchronous query previously sent with <function>dblink_send_query</>.
If the query is not already completed, <function>dblink_get_result</>
will wait until it is.
</para>
</refsect1>
<refsect1>
<title>Arguments</title>
<variablelist>
<varlistentry>
<term><parameter>connname</parameter></term>
<listitem>
<para>
Name of the connection to use.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><parameter>fail_on_error</parameter></term>
<listitem>
<para>
If true (the default when omitted) then an error thrown on the
remote side of the connection causes an error to also be thrown
locally. If false, the remote error is locally reported as a NOTICE,
and the function returns no rows.
</para>
</listitem>
</varlistentry>
</variablelist>
</refsect1>
<refsect1>
<title>Return Value</title>
<para>
For an async query (that is, a SQL statement returning rows),
the function returns the row(s) produced by the query. To use this
function, you will need to specify the expected set of columns,
as previously discussed for <function>dblink</>.
</para>
<para>
For an async command (that is, a SQL statement not returning rows),
the function returns a single row with a single text column containing
the command's status string. It is still necessary to specify that
the result will have a single text column in the calling <literal>FROM</>
clause.
</para>
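<para>
For instance, the status of a remote command might be collected like
this (assuming an established connection named <literal>dtest1</>):
</para>
<programlisting>
SELECT dblink_send_query('dtest1', 'DELETE FROM foo WHERE f1 = 1');
-- the command status arrives as a single text column
SELECT * FROM dblink_get_result('dtest1') AS t1(status text);
</programlisting>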
</refsect1>
<refsect1>
<title>Notes</title>
<para>
This function <emphasis>must</> be called if
<function>dblink_send_query</> returned 1.
It must be called once for each query
sent, and one additional time to obtain an empty set result,
before the connection can be used again.
</para>
</refsect1>
<refsect1>
<title>Example</title>
<programlisting>
contrib_regression=# SELECT dblink_connect('dtest1', 'dbname=contrib_regression');
dblink_connect
----------------
OK
(1 row)
contrib_regression=# SELECT * from
contrib_regression-# dblink_send_query('dtest1', 'select * from foo where f1 &lt; 3') as t1;
t1
----
1
(1 row)
contrib_regression=# SELECT * from dblink_get_result('dtest1') as t1(f1 int, f2 text, f3 text[]);
f1 | f2 | f3
----+----+------------
0 | a | {a0,b0,c0}
1 | b | {a1,b1,c1}
2 | c | {a2,b2,c2}
(3 rows)
contrib_regression=# SELECT * from dblink_get_result('dtest1') as t1(f1 int, f2 text, f3 text[]);
f1 | f2 | f3
----+----+----
(0 rows)
contrib_regression=# SELECT * from
dblink_send_query('dtest1', 'select * from foo where f1 &lt; 3; select * from foo where f1 &gt; 6') as t1;
t1
----
1
(1 row)
contrib_regression=# SELECT * from dblink_get_result('dtest1') as t1(f1 int, f2 text, f3 text[]);
f1 | f2 | f3
----+----+------------
0 | a | {a0,b0,c0}
1 | b | {a1,b1,c1}
2 | c | {a2,b2,c2}
(3 rows)
contrib_regression=# SELECT * from dblink_get_result('dtest1') as t1(f1 int, f2 text, f3 text[]);
f1 | f2 | f3
----+----+---------------
7 | h | {a7,b7,c7}
8 | i | {a8,b8,c8}
9 | j | {a9,b9,c9}
10 | k | {a10,b10,c10}
(4 rows)
contrib_regression=# SELECT * from dblink_get_result('dtest1') as t1(f1 int, f2 text, f3 text[]);
f1 | f2 | f3
----+----+----
(0 rows)
</programlisting>
</refsect1>
</refentry>
<refentry id="CONTRIB-DBLINK-CANCEL-QUERY">
<refnamediv>
<refname>dblink_cancel_query</refname>
<refpurpose>cancels any active query on the named connection</refpurpose>
</refnamediv>
<refsynopsisdiv>
<synopsis>
dblink_cancel_query(text connname) returns text
</synopsis>
</refsynopsisdiv>
<refsect1>
<title>Description</title>
<para>
<function>dblink_cancel_query</> attempts to cancel any query that
is in progress on the named connection. Note that this is not
certain to succeed (since, for example, the remote query might
already have finished). A cancel request simply improves the
odds that the query will fail soon. You must still complete the
normal query protocol, for example by calling
<function>dblink_get_result</>.
</para>
</refsect1>
<refsect1>
<title>Arguments</title>
<variablelist>
<varlistentry>
<term><parameter>connname</parameter></term>
<listitem>
<para>
Name of the connection to use.
</para>
</listitem>
</varlistentry>
</variablelist>
</refsect1>
<refsect1>
<title>Return Value</title>
<para>
Returns <literal>OK</> if the cancel request has been sent, or
the text of an error message on failure.
</para>
</refsect1>
<refsect1>
<title>Example</title>
<programlisting>
SELECT dblink_cancel_query('dtest1');
</programlisting>
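<para>
As a hypothetical illustration, a long-running remote query could be
cancelled and the connection drained like this; passing
<literal>false</> for <parameter>fail_on_error</> keeps the remote
cancellation error from being re-thrown locally:
</para>
<programlisting>
SELECT dblink_send_query('dtest1', 'SELECT pg_sleep(600)');
SELECT dblink_cancel_query('dtest1');
-- the cancelled query must still be drained
SELECT * FROM dblink_get_result('dtest1', false) AS t1(x text);
</programlisting>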
</refsect1>
</refentry>
<refentry id="CONTRIB-DBLINK-CURRENT-QUERY">
<refnamediv>
<refname>dblink_current_query</refname>
<refpurpose>returns the current query string</refpurpose>
</refnamediv>
<refsynopsisdiv>
<synopsis>
dblink_current_query() returns text
</synopsis>
</refsynopsisdiv>
<refsect1>
<title>Description</title>
<para>
Returns the currently executing interactive command string of the
local database session, or NULL if it can't be determined. Note
that this function is not really related to <filename>dblink</>'s
other functionality. It is provided since it is sometimes useful
in generating queries to be forwarded to remote databases.
</para>
</refsect1>
<refsect1>
<title>Return Value</title>
<para>Returns a copy of the currently executing query string.</para>
</refsect1>
<refsect1>
<title>Example</title>
<programlisting>
test=# select dblink_current_query();
dblink_current_query
--------------------------------
select dblink_current_query();
(1 row)
</programlisting>
</refsect1>
</refentry>
<refentry id="CONTRIB-DBLINK-GET-PKEY">
<refnamediv>
<refname>dblink_get_pkey</refname>
<refpurpose>returns the positions and field names of a relation's
primary key fields
</refpurpose>
</refnamediv>
<refsynopsisdiv>
<synopsis>
dblink_get_pkey(text relname) returns setof dblink_pkey_results
</synopsis>
</refsynopsisdiv>
<refsect1>
<title>Description</title>
<para>
<function>dblink_get_pkey</> provides information about the primary
key of a relation in the local database. This is sometimes useful
in generating queries to be sent to remote databases.
</para>
</refsect1>
<refsect1>
<title>Arguments</title>
<variablelist>
<varlistentry>
<term><parameter>relname</parameter></term>
<listitem>
<para>
Name of a local relation, for example <literal>foo</> or
<literal>myschema.mytab</>. Include double quotes if the
name is mixed-case or contains special characters, for
example <literal>"FooBar"</>; without quotes, the string
will be folded to lower case.
</para>
</listitem>
</varlistentry>
</variablelist>
</refsect1>
<refsect1>
<title>Return Value</title>
<para>
Returns one row for each primary key field, or no rows if the relation
has no primary key. The result rowtype is defined as
<programlisting>
CREATE TYPE dblink_pkey_results AS (position int, colname text);
</programlisting>
</para>
</refsect1>
<refsect1>
<title>Example</title>
<programlisting>
test=# create table foobar(f1 int, f2 int, f3 int,
test(# primary key(f1,f2,f3));
CREATE TABLE
test=# select * from dblink_get_pkey('foobar');
position | colname
----------+---------
1 | f1
2 | f2
3 | f3
(3 rows)
</programlisting>
</refsect1>
</refentry>
<refentry id="CONTRIB-DBLINK-BUILD-SQL-INSERT">
<refnamediv>
<refname>dblink_build_sql_insert</refname>
<refpurpose>
builds an INSERT statement using a local tuple, replacing the
primary key field values with alternative supplied values
</refpurpose>
</refnamediv>
<refsynopsisdiv>
<synopsis>
dblink_build_sql_insert(text relname,
int2vector primary_key_attnums,
int2 num_primary_key_atts,
text[] src_pk_att_vals_array,
text[] tgt_pk_att_vals_array) returns text
</synopsis>
</refsynopsisdiv>
<refsect1>
<title>Description</title>
<para>
<function>dblink_build_sql_insert</> can be useful in doing selective
replication of a local table to a remote database. It selects a row
from the local table based on primary key, and then builds a SQL
<command>INSERT</> command that will duplicate that row, but with
the primary key values replaced by the values in the last argument.
(To make an exact copy of the row, just specify the same values for
the last two arguments.)
</para>
</refsect1>
<refsect1>
<title>Arguments</title>
<variablelist>
<varlistentry>
<term><parameter>relname</parameter></term>
<listitem>
<para>
Name of a local relation, for example <literal>foo</> or
<literal>myschema.mytab</>. Include double quotes if the
name is mixed-case or contains special characters, for
example <literal>"FooBar"</>; without quotes, the string
will be folded to lower case.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><parameter>primary_key_attnums</parameter></term>
<listitem>
<para>
Attribute numbers (1-based) of the primary key fields,
for example <literal>1 2</>.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><parameter>num_primary_key_atts</parameter></term>
<listitem>
<para>
The number of primary key fields.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><parameter>src_pk_att_vals_array</parameter></term>
<listitem>
<para>
Values of the primary key fields to be used to look up the
local tuple. Each field is represented in text form.
An error is thrown if there is no local row with these
primary key values.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><parameter>tgt_pk_att_vals_array</parameter></term>
<listitem>
<para>
Values of the primary key fields to be placed in the resulting
<command>INSERT</> command. Each field is represented in text form.
</para>
</listitem>
</varlistentry>
</variablelist>
</refsect1>
<refsect1>
<title>Return Value</title>
<para>Returns the requested SQL statement as text.</para>
</refsect1>
<refsect1>
<title>Example</title>
<programlisting>
test=# select dblink_build_sql_insert('foo', '1 2', 2, '{"1", "a"}', '{"1", "b''a"}');
dblink_build_sql_insert
--------------------------------------------------
INSERT INTO foo(f1,f2,f3) VALUES('1','b''a','1')
(1 row)
</programlisting>
</refsect1>
</refentry>
<refentry id="CONTRIB-DBLINK-BUILD-SQL-DELETE">
<refnamediv>
<refname>dblink_build_sql_delete</refname>
<refpurpose>builds a DELETE statement using supplied values for primary
key field values
</refpurpose>
</refnamediv>
<refsynopsisdiv>
<synopsis>
dblink_build_sql_delete(text relname,
int2vector primary_key_attnums,
int2 num_primary_key_atts,
text[] tgt_pk_att_vals_array) returns text
</synopsis>
</refsynopsisdiv>
<refsect1>
<title>Description</title>
<para>
<function>dblink_build_sql_delete</> can be useful in doing selective
replication of a local table to a remote database. It builds a SQL
<command>DELETE</> command that will delete the row with the given
primary key values.
</para>
</refsect1>
<refsect1>
<title>Arguments</title>
<variablelist>
<varlistentry>
<term><parameter>relname</parameter></term>
<listitem>
<para>
Name of a local relation, for example <literal>foo</> or
<literal>myschema.mytab</>. Include double quotes if the
name is mixed-case or contains special characters, for
example <literal>"FooBar"</>; without quotes, the string
will be folded to lower case.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><parameter>primary_key_attnums</parameter></term>
<listitem>
<para>
Attribute numbers (1-based) of the primary key fields,
for example <literal>1 2</>.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><parameter>num_primary_key_atts</parameter></term>
<listitem>
<para>
The number of primary key fields.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><parameter>tgt_pk_att_vals_array</parameter></term>
<listitem>
<para>
Values of the primary key fields to be used in the resulting
<command>DELETE</> command. Each field is represented in text form.
</para>
</listitem>
</varlistentry>
</variablelist>
</refsect1>
<refsect1>
<title>Return Value</title>
<para>Returns the requested SQL statement as text.</para>
</refsect1>
<refsect1>
<title>Example</title>
<programlisting>
test=# select dblink_build_sql_delete('"MyFoo"', '1 2', 2, '{"1", "b"}');
dblink_build_sql_delete
---------------------------------------------
DELETE FROM "MyFoo" WHERE f1='1' AND f2='b'
(1 row)
</programlisting>
</refsect1>
</refentry>
<refentry id="CONTRIB-DBLINK-BUILD-SQL-UPDATE">
<refnamediv>
<refname>dblink_build_sql_update</refname>
<refpurpose>builds an UPDATE statement using a local tuple, replacing
the primary key field values with alternative supplied values
</refpurpose>
</refnamediv>
<refsynopsisdiv>
<synopsis>
dblink_build_sql_update(text relname,
int2vector primary_key_attnums,
int2 num_primary_key_atts,
text[] src_pk_att_vals_array,
text[] tgt_pk_att_vals_array) returns text
</synopsis>
</refsynopsisdiv>
<refsect1>
<title>Description</title>
<para>
<function>dblink_build_sql_update</> can be useful in doing selective
replication of a local table to a remote database. It selects a row
from the local table based on primary key, and then builds a SQL
<command>UPDATE</> command that will duplicate that row, but with
the primary key values replaced by the values in the last argument.
(To make an exact copy of the row, just specify the same values for
the last two arguments.) The <command>UPDATE</> command always assigns
all fields of the row &mdash; the main difference between this and
<function>dblink_build_sql_insert</> is that it's assumed that
the target row already exists in the remote table.
</para>
</refsect1>
<refsect1>
<title>Arguments</title>
<variablelist>
<varlistentry>
<term><parameter>relname</parameter></term>
<listitem>
<para>
Name of a local relation, for example <literal>foo</> or
<literal>myschema.mytab</>. Include double quotes if the
name is mixed-case or contains special characters, for
example <literal>"FooBar"</>; without quotes, the string
will be folded to lower case.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><parameter>primary_key_attnums</parameter></term>
<listitem>
<para>
Attribute numbers (1-based) of the primary key fields,
for example <literal>1 2</>.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><parameter>num_primary_key_atts</parameter></term>
<listitem>
<para>
The number of primary key fields.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><parameter>src_pk_att_vals_array</parameter></term>
<listitem>
<para>
Values of the primary key fields to be used to look up the
local tuple. Each field is represented in text form.
An error is thrown if there is no local row with these
primary key values.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><parameter>tgt_pk_att_vals_array</parameter></term>
<listitem>
<para>
Values of the primary key fields to be placed in the resulting
<command>UPDATE</> command. Each field is represented in text form.
</para>
</listitem>
</varlistentry>
</variablelist>
</refsect1>
<refsect1>
<title>Return Value</title>
<para>Returns the requested SQL statement as text.</para>
</refsect1>
<refsect1>
<title>Example</title>
<programlisting>
test=# select dblink_build_sql_update('foo', '1 2', 2, '{"1", "a"}', '{"1", "b"}');
dblink_build_sql_update
-------------------------------------------------------------
UPDATE foo SET f1='1',f2='b',f3='1' WHERE f1='1' AND f2='b'
(1 row)
</programlisting>
</refsect1>
</refentry>
</sect1>
<!-- $PostgreSQL: pgsql/doc/src/sgml/dict-int.sgml,v 1.2 2007/12/06 04:12:10 tgl Exp $ -->
<sect1 id="dict-int">
<title>dict_int</title>
<indexterm zone="dict-int">
<primary>dict_int</primary>
</indexterm>
<para>
<filename>dict_int</> is an example of an add-on dictionary template
for full-text search. The motivation for this example dictionary is to
control the indexing of integers (signed and unsigned), allowing such
numbers to be indexed while preventing excessive growth in the number of
unique words, which greatly affects the performance of searching.
</para>
<sect2>
<title>Configuration</title>
<para>
The dictionary accepts two options:
</para>
<itemizedlist>
<listitem>
<para>
The <literal>maxlen</> parameter specifies the maximum number of
digits allowed in an integer word. The default value is 6.
</para>
</listitem>
<listitem>
<para>
The <literal>rejectlong</> parameter specifies whether an overlength
integer should be truncated or ignored. If <literal>rejectlong</> is
<literal>false</> (the default), the dictionary returns the first
<literal>maxlen</> digits of the integer. If <literal>rejectlong</> is
<literal>true</>, the dictionary treats an overlength integer as a stop
word, so that it will not be indexed. Note that this also means that
such an integer cannot be searched for.
</para>
</listitem>
</itemizedlist>
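<para>
As an illustration (assuming the module's installation script has
created a dictionary named <literal>intdict</>; adjust the name if
your setup differs), the options can be changed and then tested
with <function>ts_lexize</>:
<programlisting>
mydb# ALTER TEXT SEARCH DICTIONARY intdict (MAXLEN = 4, REJECTLONG = false);
ALTER TEXT SEARCH DICTIONARY
mydb=# SELECT ts_lexize('intdict', '12345678');
 ts_lexize
-----------
 {1234}
</programlisting>
With these settings an overlength integer is truncated to its first
four digits; with <literal>REJECTLONG = true</> it would be treated
as a stop word instead.
</para>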
<!-- $PostgreSQL: pgsql/doc/src/sgml/dict-xsyn.sgml,v 1.2 2007/12/06 04:12:10 tgl Exp $ -->
<sect1 id="dict-xsyn">
<title>dict_xsyn</title>
<indexterm zone="dict-xsyn">
<primary>dict_xsyn</primary>
</indexterm>
<para>
<filename>dict_xsyn</> (Extended Synonym Dictionary) is an example of an
add-on dictionary template for full-text search. This dictionary type
replaces words with groups of their synonyms, and so makes it possible to
search for a word using any of its synonyms.
</para>
<sect2>
<title>Configuration</title>
<para>
A <literal>dict_xsyn</> dictionary accepts the following options:
</para>
<itemizedlist>
<listitem>
<para>
<literal>keeporig</> controls whether the original word is included (if
<literal>true</>), or only its synonyms (if <literal>false</>). Default
is <literal>true</>.
</para>
</listitem>
<listitem>
<para>
<literal>rules</> is the base name of the file containing the list of
synonyms. This file must be stored in
<filename>$SHAREDIR/tsearch_data/</> (where <literal>$SHAREDIR</> means
the <productname>PostgreSQL</> installation's shared-data directory).
Its name must end in <literal>.rules</> (which is not to be included in
the <literal>rules</> parameter).
</para>
</listitem>
</itemizedlist>
<listitem>
<para>
Each line represents a group of synonyms for a single word, which is
given first on the line. Synonyms are separated by whitespace, thus:
<programlisting>
word syn1 syn2 syn3
</programlisting>
</para>
</listitem>
<listitem>
<para>
The sharp (<literal>#</>) sign is a comment delimiter. It may appear at
any position in a line. The rest of the line will be skipped.
</para>
</listitem>
</itemizedlist>
<para>
Look at <filename>xsyn_sample.rules</>, which is installed in
<filename>$SHAREDIR/tsearch_data/</>, for an example.
</para>
</sect2>
<sect2>
<title>Usage</title>
<para>
Running the installation script creates a text search template
<literal>xsyn_template</> and a dictionary <literal>xsyn</>
based on it, with default parameters. You can alter the
parameters, for example
<programlisting>
mydb# ALTER TEXT SEARCH DICTIONARY xsyn (RULES='my_rules', KEEPORIG=false);
ALTER TEXT SEARCH DICTIONARY
</programlisting>
or create new dictionaries based on the template.
</para>
<para>
To test the dictionary, you can try
<programlisting>
mydb=# SELECT ts_lexize('xsyn', 'word');
ts_lexize
-----------------------
{word,syn1,syn2,syn3}
</programlisting>
but real-world usage will involve including it in a text search
configuration as described in <xref linkend="textsearch">.
That might look like this:
<programlisting>
ALTER TEXT SEARCH CONFIGURATION english
ALTER MAPPING FOR word, asciiword WITH xsyn, english_stem;
</programlisting>
</para>
</sect2>
</sect1>
<!-- $PostgreSQL: pgsql/doc/src/sgml/earthdistance.sgml,v 1.3 2007/12/06 04:12:10 tgl Exp $ -->
<sect1 id="earthdistance">
<title>earthdistance</title>
<indexterm zone="earthdistance">
<primary>earthdistance</primary>
</indexterm>
<para>
The <filename>earthdistance</> module provides two different approaches to
calculating great circle distances on the surface of the Earth. The one
described first depends on the <filename>cube</> package (which
<emphasis>must</> be installed before <filename>earthdistance</> can be
installed). The second one is based on the built-in <type>point</> datatype,
using longitude and latitude for the coordinates.
</para>
<para>
In this module, the Earth is assumed to be perfectly spherical.
(If that's too inaccurate for you, you might want to look at the
<application><ulink url="http://www.postgis.org/">PostGIS</ulink></>
project.)
</para>
<sect2>
<title>Cube-based earth distances</title>
<para>
Data is stored in cubes that are points (both corners are the same) using 3
coordinates representing the x, y, and z distance from the center of the
Earth. A domain <type>earth</> over <type>cube</> is provided, which
includes constraint checks that the value meets these restrictions and
is reasonably close to the actual surface of the Earth.
</para>
<para>
The radius of the Earth is obtained from the <function>earth()</>
function. It is given in meters. But by changing this one function you can
change the module to use some other units, or to use a different value of
the radius that you feel is more appropriate.
</para>
<para>
This package has applications to astronomical databases as well.
Astronomers will probably want to change <function>earth()</> to return a
radius of <literal>180/pi()</> so that distances are in degrees.
</para>
<para>
Functions are provided to support input in latitude and longitude (in
degrees), to support output of latitude and longitude, to calculate
the great circle distance between two points and to easily specify a
bounding box usable for index searches.
</para>
<para>
The following functions are provided:
</para>
<table id="earthdistance-cube-functions">
<title>Cube-based earthdistance functions</title>
<tgroup cols="3">
<thead>
<row>
<entry>Function</entry>
<entry>Returns</entry>
<entry>Description</entry>
</row>
</thead>
<tbody>
<row>
<entry><function>earth()</function></entry>
<entry><type>float8</type></entry>
<entry>Returns the assumed radius of the Earth.</entry>
</row>
<row>
<entry><function>sec_to_gc(float8)</function></entry>
<entry><type>float8</type></entry>
<entry>Converts the normal straight line
(secant) distance between two points on the surface of the Earth
to the great circle distance between them.
</entry>
</row>
<row>
<entry><function>gc_to_sec(float8)</function></entry>
<entry><type>float8</type></entry>
<entry>Converts the great circle distance between two points on the
surface of the Earth to the normal straight line (secant) distance
between them.
</entry>
</row>
<row>
<entry><function>ll_to_earth(float8, float8)</function></entry>
<entry><type>earth</type></entry>
<entry>Returns the location of a point on the surface of the Earth given
its latitude (argument 1) and longitude (argument 2) in degrees.
</entry>
</row>
<row>
<entry><function>latitude(earth)</function></entry>
<entry><type>float8</type></entry>
<entry>Returns the latitude in degrees of a point on the surface of the
Earth.
</entry>
</row>
<row>
<entry><function>longitude(earth)</function></entry>
<entry><type>float8</type></entry>
<entry>Returns the longitude in degrees of a point on the surface of the
Earth.
</entry>
</row>
<row>
<entry><function>earth_distance(earth, earth)</function></entry>
<entry><type>float8</type></entry>
<entry>Returns the great circle distance between two points on the
surface of the Earth.
</entry>
</row>
<row>
<entry><function>earth_box(earth, float8)</function></entry>
<entry><type>cube</type></entry>
<entry>Returns a box suitable for an indexed search using the cube
<literal>@&gt;</>
operator for points within a given great circle distance of a location.
Some points in this box are further than the specified great circle
distance from the location, so a second check using
<function>earth_distance</> should be included in the query.
</entry>
</row>
</tbody>
</tgroup>
</table>
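<para>
Putting these functions together gives the typical index-assisted
radius search. The sketch below assumes a hypothetical table
<literal>places</> with <literal>name</>, <literal>lat</>, and
<literal>lon</> columns, the coordinates being in degrees:
<programlisting>
CREATE INDEX places_earth_idx ON places USING gist (ll_to_earth(lat, lon));

SELECT name FROM places
WHERE earth_box(ll_to_earth(40.0, -111.0), 10000) @&gt; ll_to_earth(lat, lon)
AND earth_distance(ll_to_earth(40.0, -111.0), ll_to_earth(lat, lon)) &lt; 10000;
</programlisting>
The <function>earth_box</> test can use the index, but may accept some
points that lie beyond the requested distance, so the exact
<function>earth_distance</> comparison is included to filter them out.
</para>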
</sect2>
<sect2>
<title>Point-based earth distances</title>
<para>
The second part of the module relies on representing Earth locations as
values of type <type>point</>, in which the first component is taken to
represent longitude in degrees, and the second component is taken to
represent latitude in degrees. Points are taken as (longitude, latitude)
and not vice versa because longitude is closer to the intuitive idea of
x-axis and latitude to y-axis.
</para>
<para>
A single operator is provided:
</para>
<table id="earthdistance-point-operators">
<title>Point-based earthdistance operators</title>
<tgroup cols="3">
<thead>
<row>
<entry>Operator</entry>
<entry>Returns</entry>
<entry>Description</entry>
</row>
</thead>
<tbody>
<row>
<entry><type>point</> <literal>&lt;@&gt;</literal> <type>point</></entry>
<entry><type>float8</type></entry>
<entry>Gives the distance in statute miles between
two points on the Earth's surface.
</entry>
</row>
</tbody>
</tgroup>
</table>
<para>
Note that unlike the <type>cube</>-based part of the module, units
are hardwired here: changing the <function>earth()</> function will
not affect the results of this operator.
</para>
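<para>
For example, given the approximate coordinates of two cities
(illustrative values), the mileage between them can be computed
directly:
<programlisting>
SELECT '(-118.4, 33.9)'::point &lt;@&gt; '(-122.4, 37.8)'::point;
</programlisting>
Each point is written as (longitude, latitude) in degrees; the result
is the great circle distance between the two locations in statute
miles.
</para>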
<para>
One disadvantage of the longitude/latitude representation is that
you need to be careful about the edge conditions near the poles
and near +/- 180 degrees of longitude. The <type>cube</>-based
representation avoids these discontinuities.
</para>
</sect2>
</sect1>
<!-- $PostgreSQL: pgsql/doc/src/sgml/fuzzystrmatch.sgml,v 1.3 2007/12/06 04:12:10 tgl Exp $ -->
<sect1 id="fuzzystrmatch">
<title>fuzzystrmatch</title>
<indexterm zone="fuzzystrmatch">
<primary>fuzzystrmatch</primary>
</indexterm>
<para>
The <filename>fuzzystrmatch</> module provides several
functions to determine similarities and distance between strings.
</para>
<sect2>
<title>Soundex</title>
<para>
The Soundex system is a method of matching similar-sounding names
by converting them to the same code. It was initially used by the
United States Census in 1880, 1900, and 1910. Note that Soundex
is not very useful for non-English names.
</para>
<para>
The <filename>fuzzystrmatch</> module provides two functions
for working with Soundex codes:
</para>
<programlisting>
soundex(text) returns text
difference(text, text) returns int
</programlisting>
<para>
The <function>soundex</> function converts a string to its Soundex code.
The <function>difference</> function converts two strings to their Soundex
codes and then reports the number of matching code positions. Since
Soundex codes have four characters, the result ranges from zero to four,
with zero being no match and four being an exact match. (Thus, the
function is misnamed &mdash; <function>similarity</> would have been
a better name.)
</para>
<para>
Here are some usage examples:
</para>
<programlisting>
SELECT soundex('hello world!');
INSERT INTO s VALUES ('jack');
SELECT * FROM s WHERE soundex(nm) = soundex('john');
SELECT a.nm, b.nm FROM s a, s b WHERE soundex(a.nm) = soundex(b.nm) AND a.oid &lt;&gt; b.oid;
SELECT * FROM s WHERE difference(s.nm, 'john') &gt; 2;
</programlisting>
</sect2>
<sect2>
<title>Levenshtein</title>
<para>
This function calculates the Levenshtein distance between two strings:
</para>
<programlisting>
levenshtein(text source, text target) returns int
</programlisting>
<para>
Both <literal>source</literal> and <literal>target</literal> can be any
non-null string, with a maximum of 255 characters.
</para>
<para>
Example:
</para>
<programlisting>
test=# SELECT levenshtein('GUMBO', 'GAMBOL');
levenshtein
-------------
2
(1 row)
</programlisting>
</sect2>
<sect2>
<title>Metaphone</title>
<para>
Metaphone, like Soundex, is based on the idea of constructing a
representative code for an input string. Two strings are then
deemed similar if they have the same codes.
</para>
<para>
This function calculates the metaphone code of an input string:
</para>
<programlisting>
metaphone(text source, int max_output_length) returns text
</programlisting>
<para>
<literal>source</literal> has to be a non-null string with a maximum of
255 characters. <literal>max_output_length</literal> sets the maximum
length of the output metaphone code; if longer, the output is truncated
to this length.
</para>
<para>
Example:
</para>
<programlisting>
test=# SELECT metaphone('GUMBO', 4);
metaphone
-----------
KM
(1 row)
</programlisting>
</sect2>
<sect2>
<title>Double Metaphone</title>
<para>
The Double Metaphone system computes two <quote>sounds like</> strings
for a given input string &mdash; a <quote>primary</> and an
<quote>alternate</>. In most cases they are the same, but for non-English
names especially they can be a bit different, depending on pronunciation.
These functions compute the primary and alternate codes:
</para>
<programlisting>
dmetaphone(text source) returns text
dmetaphone_alt(text source) returns text
</programlisting>
<para>
There is no length limit on the input strings.
</para>
<para>
Example:
</para>
<programlisting>
test=# select dmetaphone('gumbo');
dmetaphone
------------
KMP
(1 row)
</programlisting>
</sect2>
<!-- $PostgreSQL: pgsql/doc/src/sgml/hstore.sgml,v 1.2 2007/12/06 04:12:10 tgl Exp $ -->
<sect1 id="hstore">
<title>hstore</title>
<indexterm zone="hstore">
<primary>hstore</primary>
</indexterm>
<para>
This module implements a data type <type>hstore</> for storing sets of
(key,value) pairs within a single <productname>PostgreSQL</> data field.
This can be useful in various scenarios, such as rows with many attributes
that are rarely examined, or semi-structured data.
</para>
<sect2>
<title>Operations</title>
<itemizedlist>
<listitem>
<para>
<literal>hstore -> text</literal> - gets the value for a key (like Perl's <literal>$h{key}</>)
</para>
<programlisting>
select 'a=>q, b=>g'::hstore -> 'a';
 ?column?
----------
 q
</programlisting>
<para>
Note the use of parentheses in the select below: they are required
because the precedence of <literal>IS</> is higher than that of
<literal>-></>:
</para>
<programlisting>
SELECT id FROM entrants WHERE (info->'education_period') IS NOT NULL;
</programlisting>
</listitem>
<listitem>
<para>
<literal>hstore || hstore</literal> - concatenation (like Perl's <literal>%a = ( %b, %c )</>)
</para>
<programlisting>
regression=# select 'a=>b'::hstore || 'c=>d'::hstore;
?column?
--------------------
"a"=>"b", "c"=>"d"
(1 row)
</programlisting>
<para>
but notice that when both operands contain the same key, the
right-hand value wins:
</para>
<programlisting>
regression=# select 'a=>b'::hstore || 'a=>d'::hstore;
?column?
----------
"a"=>"d"
(1 row)
</programlisting>
</listitem>
<listitem>
<para>
<literal>text => text</literal> - makes an <type>hstore</> value from two text strings
</para>
<programlisting>
select 'a'=>'b';
?column?
----------
"a"=>"b"
</programlisting>
</listitem>
<listitem>
<para>
<literal>hstore @> hstore</literal> - containment: does the left operand contain the right operand?
</para>
<programlisting>
regression=# select 'a=>b, b=>1, c=>NULL'::hstore @> 'a=>c';
?column?
----------
f
(1 row)
regression=# select 'a=>b, b=>1, c=>NULL'::hstore @> 'b=>1';
?column?
----------
t
(1 row)
</programlisting>
</listitem>
<listitem>
<para>
<literal>hstore &lt;@ hstore</literal> - containment: is the left
operand contained in the right operand?
</para>
<para>
(Before PostgreSQL 8.2, the containment operators @&gt; and &lt;@ were
respectively called @ and ~. These names are still available, but are
deprecated and will eventually be retired. Notice that the old names
are reversed from the convention formerly followed by the core geometric
datatypes!)
</para>
</listitem>
</itemizedlist>
<title><type>hstore</> External Representation</title>
<para>
The text representation of an <type>hstore</> value includes zero
or more <replaceable>key</> <literal>=&gt;</> <replaceable>value</>
items, separated by commas. For example:
<programlisting>
k => v
foo => bar, baz => whatever
"1-a" => "anything at all"
</programlisting>
The order of the items is not considered significant (and may not be
reproduced on output). Whitespace between items or around the
<literal>=&gt;</> sign is ignored. Use double quotes if a key or
value includes whitespace, comma, <literal>=</> or <literal>&gt;</>.
To include a double quote or a backslash in a key or value, precede
it with another backslash. (Keep in mind that depending on the
setting of <varname>standard_conforming_strings</>, you may need to
double backslashes in SQL literal strings.)
</para>
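<para>
For instance, the following is valid <type>hstore</> input (a sketch;
the key and value strings are purely illustrative). The quoted key
contains a character that would otherwise end the key:
</para>
<programlisting>
SELECT 'foo=>bar, "1-a"=>"anything at all"'::hstore;
</programlisting>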
<para>
A value (but not a key) can be a SQL NULL. This is represented as
<programlisting>
key => NULL
</programlisting>
The <literal>NULL</> keyword is not case-sensitive. Again, use
double quotes if you want the string <literal>null</> to be treated
as an ordinary data value.
</para>
<para>
Currently, double quotes are always used to surround key and value
strings on output, even when this is not strictly necessary.
</para>
</sect2>
<sect2>
<title><type>hstore</> Operators and Functions</title>
<table id="hstore-op-table">
<title><type>hstore</> Operators</title>
<tgroup cols="4">
<thead>
<row>
<entry>Operator</entry>
<entry>Description</entry>
<entry>Example</entry>
<entry>Result</entry>
</row>
</thead>
<tbody>
<row>
<entry><type>hstore</> <literal>-&gt;</> <type>text</></entry>
<entry>get value for key (null if not present)</entry>
<entry><literal>'a=&gt;x, b=&gt;y'::hstore -&gt; 'a'</literal></entry>
<entry><literal>x</literal></entry>
</row>
<row>
<entry><type>text</> <literal>=&gt;</> <type>text</></entry>
<entry>make single-item <type>hstore</></entry>
<entry><literal>'a' =&gt; 'b'</literal></entry>
<entry><literal>"a"=&gt;"b"</literal></entry>
</row>
<row>
<entry><type>hstore</> <literal>||</> <type>hstore</></entry>
<entry>concatenation</entry>
<entry><literal>'a=&gt;b, c=&gt;d'::hstore || 'c=&gt;x, d=&gt;q'::hstore</literal></entry>
<entry><literal>"a"=&gt;"b", "c"=&gt;"x", "d"=&gt;"q"</literal></entry>
</row>
<row>
<entry><type>hstore</> <literal>?</> <type>text</></entry>
<entry>does <type>hstore</> contain key?</entry>
<entry><literal>'a=&gt;1'::hstore ? 'a'</literal></entry>
<entry><literal>t</literal></entry>
</row>
<row>
<entry><type>hstore</> <literal>@&gt;</> <type>hstore</></entry>
<entry>does left operand contain right?</entry>
<entry><literal>'a=&gt;b, b=&gt;1, c=&gt;NULL'::hstore @&gt; 'b=&gt;1'</literal></entry>
<entry><literal>t</literal></entry>
</row>
<row>
<entry><type>hstore</> <literal>&lt;@</> <type>hstore</></entry>
<entry>is left operand contained in right?</entry>
<entry><literal>'a=&gt;c'::hstore &lt;@ 'a=&gt;b, b=&gt;1, c=&gt;NULL'</literal></entry>
<entry><literal>f</literal></entry>
</row>
</tbody>
</tgroup>
</table>
<para>
(Before PostgreSQL 8.2, the containment operators @&gt; and &lt;@ were
respectively called @ and ~. These names are still available, but are
deprecated and will eventually be retired. Notice that the old names
are reversed from the convention formerly followed by the core geometric
datatypes!)
</para>
<table id="hstore-func-table">
<title><type>hstore</> Functions</title>
<tgroup cols="5">
<thead>
<row>
<entry>Function</entry>
<entry>Return Type</entry>
<entry>Description</entry>
<entry>Example</entry>
<entry>Result</entry>
</row>
</thead>
<tbody>
<row>
<entry><function>akeys(hstore)</function></entry>
<entry><type>text[]</type></entry>
<entry>get <type>hstore</>'s keys as array</entry>
<entry><literal>akeys('a=&gt;1,b=&gt;2')</literal></entry>
<entry><literal>{a,b}</literal></entry>
</row>
<row>
<entry><function>skeys(hstore)</function></entry>
<entry><type>setof text</type></entry>
<entry>get <type>hstore</>'s keys as set</entry>
<entry><literal>skeys('a=&gt;1,b=&gt;2')</literal></entry>
<entry>
<programlisting>
a
b
</programlisting></entry>
</row>
<row>
<entry><function>avals(hstore)</function></entry>
<entry><type>text[]</type></entry>
<entry>get <type>hstore</>'s values as array</entry>
<entry><literal>avals('a=&gt;1,b=&gt;2')</literal></entry>
<entry><literal>{1,2}</literal></entry>
</row>
<row>
<entry><function>svals(hstore)</function></entry>
<entry><type>setof text</type></entry>
<entry>get <type>hstore</>'s values as set</entry>
<entry><literal>svals('a=&gt;1,b=&gt;2')</literal></entry>
<entry>
<programlisting>
1
2
</programlisting></entry>
</row>
<row>
<entry><function>each(hstore)</function></entry>
<entry><type>setof (key text, value text)</type></entry>
<entry>get <type>hstore</>'s keys and values as set</entry>
<entry><literal>select * from each('a=&gt;1,b=&gt;2')</literal></entry>
<entry>
<programlisting>
key | value
-----+-------
a | 1
b | 2
</programlisting>
</programlisting></entry>
</row>
<row>
<entry><function>exist(hstore,text)</function></entry>
<entry><type>boolean</type></entry>
<entry>does <type>hstore</> contain key?</entry>
<entry><literal>exist('a=&gt;1','a')</literal></entry>
<entry><literal>t</literal></entry>
</row>
<row>
<entry><function>defined(hstore,text)</function></entry>
<entry><type>boolean</type></entry>
<entry>does <type>hstore</> contain non-null value for key?</entry>
<entry><literal>defined('a=&gt;NULL','a')</literal></entry>
<entry><literal>f</literal></entry>
</row>
<row>
<entry><function>delete(hstore,text)</function></entry>
<entry><type>hstore</type></entry>
<entry>delete any item matching key</entry>
<entry><literal>delete('a=&gt;1,b=&gt;2','b')</literal></entry>
<entry><literal>"a"=&gt;"1"</literal></entry>
</row>
</tbody>
</tgroup>
</table>
</sect2>
<sect2>
<title>Indexes</title>
<para>
<type>hstore</> has index support for <literal>@&gt;</> and <literal>?</>
operators. You can use either GiST or GIN index types. For example:
</para>
<programlisting>
CREATE INDEX hidx ON testhstore USING GIST(h);
CREATE INDEX hidx ON testhstore USING GIN(h);
</programlisting>
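<para>
Such an index can then speed up queries using the indexable operators,
for example (the table, column, and key names here are only illustrative):
</para>
<programlisting>
SELECT * FROM testhstore WHERE h @&gt; 'company=>nike';
SELECT * FROM testhstore WHERE h ? 'paid_by';
</programlisting>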
</sect2>
<title>Examples</title>
<para>
Add a key, or update an existing key with a new value:
</para>
<programlisting>
UPDATE tab SET h = h || ('c' => '3');
</programlisting>
<para>
Delete a key:
</para>
<programlisting>
UPDATE tab SET h = delete(h, 'k1');
</programlisting>
</sect2>
<sect2>
<title>Statistics</title>
<para>
The <type>hstore</> type, because of its intrinsic liberality, could
contain a lot of different keys. Checking for valid keys is the task of the
application. Examples below demonstrate several techniques for checking
keys and obtaining statistics.
</para>
<para>
Simple example:
</para>
<programlisting>
SELECT * FROM each('aaa=>bq, b=>NULL, ""=>1');
</programlisting>
<para>
Using a table:
</para>
<programlisting>
SELECT (each(h)).key, (each(h)).value INTO stat FROM testhstore;
</programlisting>
<para>
Online statistics:
</para>
<programlisting>
SELECT key, count(*) FROM
(SELECT (each(h)).key FROM testhstore) AS stat
GROUP BY key
ORDER BY count DESC, key;
key | count
-----------+-------
line | 883
query | 207
<sect2>
<title>Authors</title>
<para>
Oleg Bartunov <email>oleg@sai.msu.su</email>, Moscow, Moscow University, Russia
</para>
<para>
Teodor Sigaev <email>teodor@sigaev.ru</email>, Moscow, Delta-Soft Ltd., Russia
</para>
</sect2>
</sect1>
<!-- $PostgreSQL: pgsql/doc/src/sgml/lo.sgml,v 1.3 2007/12/06 04:12:10 tgl Exp $ -->
<sect1 id="lo">
<title>lo</title>
<indexterm zone="lo">
<primary>lo</primary>
</indexterm>
<para>
The <filename>lo</> module provides support for managing Large Objects
(also called LOs or BLOBs). This includes a data type <type>lo</>
and a trigger <function>lo_manage</>.
</para>
<sect2>
<title>Rationale</title>
<para>
One of the problems with the JDBC driver (and this affects the ODBC driver
also), is that the specification assumes that references to BLOBs (Binary
Large OBjects) are stored within a table, and if that entry is changed, the
associated BLOB is deleted from the database.
</para>
<para>
As <productname>PostgreSQL</> stands, this doesn't occur. Large objects
are treated as objects in their own right; a table entry can reference a
large object by OID, but there can be multiple table entries referencing
the same large object OID, so the system doesn't delete the large object
just because you change or remove one such entry.
</para>
</sect2>
<sect2>
<title>The Fix</title>
<para>
Now this is fine for <productname>PostgreSQL</>-specific applications, but
standard code using JDBC or ODBC won't delete the objects, resulting in
orphan objects &mdash; objects that are not referenced by anything, and
simply occupy disk space.
</para>
<para>
The <filename>lo</> module allows fixing this by attaching a trigger
to tables that contain LO reference columns. The trigger essentially just
does a <function>lo_unlink</> whenever you delete or modify a value
referencing a large object. When you use this trigger, you are assuming
that there is only one database reference to any large object that is
referenced in a trigger-controlled column!
</para>
<para>
The module also provides a data type <type>lo</>, which is really just
a domain of the <type>oid</> type. This is useful for differentiating
database columns that hold large object references from those that are
OIDs of other things. You don't have to use the <type>lo</> type to
use the trigger, but it may be convenient to use it to keep track of which
columns in your database represent large objects that you are managing with
the trigger. It is also rumored that the ODBC driver gets confused if you
don't use <type>lo</> for BLOB columns.
</para>
</sect2>
<sect2>
<title>How to Use It</title>
<para>
Here's a simple example of usage:
</para>
<programlisting>
CREATE TABLE image (title TEXT, raster lo);
CREATE TRIGGER t_raster BEFORE UPDATE OR DELETE ON image
FOR EACH ROW EXECUTE PROCEDURE lo_manage(raster);
</programlisting>
<para>
For each column that will contain unique references to large objects,
create a <literal>BEFORE UPDATE OR DELETE</> trigger, and give the column
name as the sole trigger argument. If you need multiple <type>lo</>
columns in the same table, create a separate trigger for each one,
remembering to give a different name to each trigger on the same table.
</para>
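<para>
With the trigger in place, deleting a row unlinks the referenced large
object automatically. A sketch of such a session, continuing the example
above (the file path is only for illustration):
</para>
<programlisting>
INSERT INTO image (title, raster)
    VALUES ('beautiful image', lo_import('/etc/motd'));
-- the trigger performs lo_unlink on the old object here:
DELETE FROM image WHERE title = 'beautiful image';
</programlisting>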
</sect2>
<sect2>
<title>Limitations</title>
<itemizedlist>
<listitem>
<para>
Dropping a table will still orphan any objects it contains, as the trigger
is not executed. You can avoid this by preceding the <command>DROP
TABLE</> with <command>DELETE FROM <replaceable>table</></command>.
</para>
<para>
<command>TRUNCATE</> has the same hazard.
</para>
<para>
If you already have, or suspect you have, orphaned large objects, see the
<filename>contrib/vacuumlo</> module (<xref linkend="vacuumlo">) to help
you clean them up. It's a good idea to run <application>vacuumlo</>
occasionally as a back-stop to the <function>lo_manage</> trigger.
</para>
</listitem>
<listitem>
<para>
Some frontends may create their own tables, and will not create the
associated trigger(s). Also, users may not remember (or know) to create
the triggers.
</para>
</listitem>
</itemizedlist>
</sect2>
<sect2>
<title>Author</title>
<para>
Peter Mount <email>peter@retep.org.uk</email>
</para>
</sect2>
</sect1>
<!-- $PostgreSQL: pgsql/doc/src/sgml/seg.sgml,v 1.4 2007/12/06 04:12:10 tgl Exp $ -->
<sect1 id="seg">
<title>seg</title>
<indexterm zone="seg">
<primary>seg</primary>
</indexterm>
<para>
The <literal>seg</literal> module contains the code for the user-defined
type, <literal>SEG</literal>, representing laboratory measurements as
floating point intervals.
This module implements a data type <type>seg</> for
representing line segments, or floating point intervals.
<type>seg</> can represent uncertainty in the interval endpoints,
making it especially useful for representing laboratory measurements.
</para>
<sect2>
<title>Rationale</title>
<para>
The geometry of measurements is usually more complex than that of a
point in a numeric continuum. A measurement is usually a segment of
the value being measured may naturally be an interval indicating some
condition, such as the temperature range of stability of a protein.
</para>
<para>
Using just common sense, it appears more convenient to store such data
as intervals, rather than pairs of numbers. In practice, it even turns
out more efficient in most applications.
</para>
<para>
Further along the line of common sense, the fuzziness of the limits
suggests that the use of traditional numeric data types leads to a
certain loss of information. Consider this: your instrument reads
6.50, and you input this reading into the database. What do you get
when you fetch it? Watch:
</para>
<programlisting>
test=> select 6.50 :: float8 as "pH";
pH
---
6.5
(1 row)
</programlisting>
<para>
In the world of measurements, 6.50 is not the same as 6.5. It may
sometimes be critically different. The experimenters usually write
down (and publish) the digits they trust. 6.50 is actually a fuzzy
share. We definitely do not want such different data items to appear the
same.
</para>
<para>
Conclusion? It is nice to have a special data type that can record the
limits of an interval with arbitrarily variable precision. Variable in
the sense that each data element records its own precision.
</para>
<para>
Check this out:
<programlisting>
test=> select '6.25 .. 6.50'::seg as "pH";
pH
------------
6.25 .. 6.50
(1 row)
</programlisting>
</para>
</sect2>
<sect2>
<title>Syntax</title>
<para>
The external representation of an interval is formed using one or two
floating point numbers joined by the range operator (<literal>..</literal>
or <literal>...</literal>). Alternatively, it can be specified as a
center point plus or minus a deviation.
Optional certainty indicators (<literal>&lt;</literal>,
<literal>&gt;</literal> and <literal>~</literal>) can be stored as well.
(Certainty indicators are ignored by all the built-in operators, however.)
</para>
<para>
In the following table, <replaceable>x</>, <replaceable>y</>, and
<replaceable>delta</> denote
floating-point numbers. <replaceable>x</> and <replaceable>y</>, but
not <replaceable>delta</>, can be preceded by a certainty indicator:
</para>
<table>
<title><type>seg</> external representations</title>
<tgroup cols="2">
<tbody>
<row>
<entry><literal><replaceable>x</></literal></entry>
<entry>Single value (zero-length interval)
</entry>
</row>
<row>
<entry><literal><replaceable>x</> .. <replaceable>y</></literal></entry>
<entry>Interval from <replaceable>x</> to <replaceable>y</>
</entry>
</row>
<row>
<entry><literal><replaceable>x</> (+-) <replaceable>delta</></literal></entry>
<entry>Interval from <replaceable>x</> - <replaceable>delta</> to
<replaceable>x</> + <replaceable>delta</>
</entry>
</row>
<row>
<entry><literal><replaceable>x</> ..</literal></entry>
<entry>Open interval with lower bound <replaceable>x</>
</entry>
</row>
<row>
<entry><literal>.. <replaceable>x</></literal></entry>
<entry>Open interval with upper bound <replaceable>x</>
</entry>
</row>
</tbody>
</tgroup>
</table>
<table>
<title>Examples of valid <type>seg</> input</title>
<tgroup cols="2">
<tbody>
<row>
<entry><literal>5.0</literal></entry>
<entry>
Creates a zero-length segment (a point, if you will)
</entry>
</row>
<row>
<entry><literal>~5.0</literal></entry>
<entry>
Creates a zero-length segment and records
<literal>~</> in the data. <literal>~</literal> is ignored
by <type>seg</> operations, but
is preserved as a comment.
</entry>
</row>
<row>
<entry><literal>&lt;5.0</literal></entry>
<entry>
Creates a point at 5.0. <literal>&lt;</literal> is ignored but
is preserved as a comment.
</entry>
</row>
<row>
<entry><literal>&gt;5.0</literal></entry>
<entry>
Creates a point at 5.0. <literal>&gt;</literal> is ignored but
is preserved as a comment.
</entry>
</row>
<row>
<entry><literal>5(+-)0.3</literal></entry>
<entry>
Creates an interval <literal>4.7 .. 5.3</literal>.
Note that the <literal>(+-)</> notation isn't preserved.
</entry>
</row>
<row>
<entry><literal>50 .. </literal></entry>
<entry>Everything that is greater than or equal to 50</entry>
</row>
<row>
<entry><literal>.. 0</literal></entry>
<entry>Everything that is less than or equal to 0</entry>
</row>
<row>
<entry><literal>1.5e-2 .. 2E-2 </literal></entry>
<entry>Creates an interval <literal>0.015 .. 0.02</literal></entry>
</row>
<row>
<entry><literal>1 ... 2</literal></entry>
<entry>
The same as <literal>1...2</literal>, or <literal>1 .. 2</literal>,
or <literal>1..2</literal>
(spaces around the range operator are ignored)
</entry>
</row>
</tbody>
</tgroup>
</table>
<para>
Because <literal>...</> is widely used in data sources, it is allowed
as an alternative spelling of <literal>..</>. Unfortunately, this
creates a parsing ambiguity: it is not clear whether the upper bound
in <literal>0...23</> is meant to be <literal>23</> or <literal>0.23</>.
This is resolved by requiring at least one digit before the decimal
point in all numbers in <type>seg</> input.
</para>
<para>
As a sanity check, <type>seg</> rejects intervals with the lower bound
greater than the upper, for example <literal>5 .. 2</>.
</para>
</sect2>
<sect2>
<title>Precision</title>
<para>
<type>seg</> values are stored internally as pairs of 32-bit floating point
numbers. This means that numbers with more than 7 significant digits
will be truncated.
</para>
<para>
Numbers with 7 or fewer significant digits retain their
original precision. That is, if your query returns 0.00, you will be
sure that the trailing zeroes are not the artifacts of formatting: they
reflect the precision of the original data. The number of leading
<sect2>
<title>Usage</title>
<para>
The <filename>seg</> module includes a GiST index operator class for
<type>seg</> values.
The operators supported by the GiST opclass include:
</para>
<itemizedlist>
<listitem>
<programlisting>
[a, b] &lt;&lt; [c, d] Is left of
</programlisting>
<para>
[a, b] is entirely to the left of [c, d]. That is,
[a, b] &lt;&lt; [c, d] is true if b &lt; c and false otherwise
</para>
</listitem>
[a, b] &gt;&gt; [c, d] Is right of
</programlisting>
<para>
[a, b] is entirely to the right of [c, d]. That is,
[a, b] &gt;&gt; [c, d] is true if a &gt; d and false otherwise
</para>
</listitem>
<listitem>
[a, b] &amp;&lt; [c, d] Overlaps or is left of
</programlisting>
<para>
This might be better read as <quote>does not extend to right of</quote>.
It is true when b &lt;= d.
</para>
</listitem>
<listitem>
[a, b] &amp;&gt; [c, d] Overlaps or is right of
</programlisting>
<para>
This might be better read as <quote>does not extend to left of</quote>.
It is true when a &gt;= c.
</para>
</listitem>
<listitem>
<programlisting>
[a, b] = [c, d] Same as
</programlisting>
<para>
The segments [a, b] and [c, d] are identical, that is, a = c and b = d
</para>
</listitem>
<listitem>
[a, b] &amp;&amp; [c, d] Overlaps
</programlisting>
<para>
The segments [a, b] and [c, d] overlap.
</para>
</listitem>
<listitem>
<programlisting>
[a, b] @&gt; [c, d] Contains
</programlisting>
<para>
The segment [a, b] contains the segment [c, d], that is,
a &lt;= c and b &gt;= d
</para>
</listitem>
<listitem>
<programlisting>
[a, b] &lt;@ [c, d] Contained in
</programlisting>
<para>
The segment [a, b] is contained in [c, d], that is,
a &gt;= c and b &lt;= d
</para>
</listitem>
</itemizedlist>
<para>
(Before PostgreSQL 8.2, the containment operators @&gt; and &lt;@ were
respectively called @ and ~. These names are still available, but are
are reversed from the convention formerly followed by the core geometric
datatypes!)
</para>
<para>
The standard B-tree operators are also provided, for example
</para>
<programlisting>
[a, b] &lt; [c, d] Less than
[a, b] &gt; [c, d] Greater than
</programlisting>
<para>
These operators do not make a lot of sense for any practical
purpose but sorting. These operators first compare (a) to (c),
and if these are equal, compare (b) to (d). That results in
reasonably good sorting in most cases, which is useful if
you want to use ORDER BY with this type.
</para>
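<para>
For example, a hypothetical session sorting a few <type>seg</> values:
</para>
<programlisting>
test=# SELECT s FROM (VALUES ('1 .. 2'::seg), ('0 .. 3'::seg), ('1 .. 3'::seg)) AS t(s) ORDER BY s;
    s
--------
 0 .. 3
 1 .. 2
 1 .. 3
(3 rows)
</programlisting>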
</sect2>
<sect2>
<title>Notes</title>
<para>
For examples of usage, see the regression test <filename>sql/seg.sql</>.
</para>
<para>
The mechanism that converts <literal>(+-)</> to regular ranges
isn't completely accurate in determining the number of significant digits
for the boundaries. For example, it adds an extra digit to the lower
boundary if the resulting interval includes a power of ten:
<programlisting>
postgres=> select '10(+-)1'::seg as seg;
seg
---------
9.0 .. 11 -- should be: 9 .. 11
</programlisting>
</para>
<para>
The performance of an R-tree index can largely depend on the initial
order of input values. It may be very helpful to sort the input table
on the <type>seg</> column; see the script <filename>sort-segments.pl</>
for an example.
</para>
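<para>
For instance, a GiST index can be created in the usual way (the table
and column names here are hypothetical):
</para>
<programlisting>
CREATE INDEX test_seg_ix ON test_seg USING gist (s);
</programlisting>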
</sect2>
<sect2>
<title>Credits</title>
<para>
Original author: Gene Selkov, Jr. <email>selkovjr@mcs.anl.gov</email>,
Mathematics and Computer Science Division, Argonne National Laboratory.
</para>
<para>
My thanks are primarily to Prof. Joe Hellerstein
(<ulink url="http://db.cs.berkeley.edu/~jmh/"></ulink>) for elucidating the
gist of the GiST (<ulink url="http://gist.cs.berkeley.edu/"></ulink>). I am
also grateful to all Postgres developers, present and past, for enabling
me to create my own world and live undisturbed in it. And I would like
to acknowledge my gratitude to Argonne Lab and to the U.S. Department of
Energy for the years of faithful support of my database research.
</para>
</sect2>
</sect1>
<!-- $PostgreSQL: pgsql/doc/src/sgml/sslinfo.sgml,v 1.3 2007/12/06 04:12:10 tgl Exp $ -->
<sect1 id="sslinfo">
<title>sslinfo</title>
<indexterm zone="sslinfo">
<primary>sslinfo</primary>
</indexterm>
<para>
The <filename>sslinfo</> module provides information about the SSL
certificate that the current client provided when connecting to
<productname>PostgreSQL</>. The module is useless (most functions
will return NULL) if the current connection does not use SSL.
</para>
<para>
This extension won't build at all unless the installation was
configured with <literal>--with-openssl</>.
</para>
<sect2>
<title>Functions Provided</title>
<variablelist>
<varlistentry>
<term><function>
ssl_is_used() returns boolean
</function></term>
<listitem>
<para>
Returns TRUE if current connection to server uses SSL, and FALSE
otherwise.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><function>
ssl_client_cert_present() returns boolean
</function></term>
<listitem>
<para>
Returns TRUE if current client has presented a valid SSL client
certificate to the server, and FALSE otherwise. (The server
might or might not be configured to require a client certificate.)
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><function>
ssl_client_serial() returns numeric
</function></term>
<listitem>
<para>
Returns serial number of current client certificate. The combination of
certificate serial number and certificate issuer is guaranteed to
uniquely identify a certificate (but not its owner &mdash; the owner
ought to regularly change his keys, and get new certificates from the
issuer).
</para>
<para>
So, if you run your own CA and allow only certificates from this CA to
be accepted by the server, the serial number is the most reliable (albeit
not very mnemonic) means to identify a user.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><function>
ssl_client_dn() returns text
</function></term>
<listitem>
<para>
Returns the full subject of the current client certificate, converting
character data into the current database encoding. It is assumed that
if you use non-ASCII characters in the certificate names, your
database is able to represent these characters, too. If your database
uses the SQL_ASCII encoding, non-ASCII characters in the name will be
represented as UTF-8 sequences.
</para>
<para>
The result looks like <literal>/CN=Somebody /C=Some country/O=Some organization</>.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><function>
ssl_issuer_dn() returns text
</function></term>
<listitem>
<para>
Returns the full issuer name of the current client certificate, converting
character data into the current database encoding. Encoding conversions
are handled the same as for <function>ssl_client_dn</>.
</para>
<para>
The combination of the return value of this function with the
certificate serial number uniquely identifies the certificate.
</para>
<para>
This function is really useful only if you have more than one trusted CA
certificate in your server's <filename>root.crt</> file, or if this CA
has issued some intermediate certificate authority certificates.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><function>
ssl_client_dn_field(fieldname text) returns text
</function></term>
<listitem>
<para>
This function returns the value of the specified field in the
certificate subject, or NULL if the field is not present.
Field names are string constants that are
converted into ASN1 object identifiers using the OpenSSL object
database. The following values are acceptable:
</para>
<programlisting>
commonName (alias CN)
surname (alias SN)
name
givenName (alias GN)
countryName (alias C)
localityName (alias L)
stateOrProvinceName (alias ST)
organizationName (alias O)
generationQualifier
description
dnQualifier
x500UniqueIdentifier
pseudonym
role
emailAddress
</programlisting>
<para>
All of these fields are optional, except <structfield>commonName</>.
It depends
entirely on your CA's policy which of them would be included and which
wouldn't. The meaning of these fields, however, is strictly defined by
the X.500 and X.509 standards, so you cannot just assign arbitrary
meaning to them.
</para>
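<para>
For example, a hypothetical call retrieving the common name of the
current client certificate:
</para>
<programlisting>
SELECT ssl_client_dn_field('commonName');
</programlisting>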
</listitem>
</varlistentry>
<varlistentry>
<term><function>
ssl_issuer_field(fieldname text) returns text
</function></term>
<listitem>
<para>
Same as <function>ssl_client_dn_field</>, but for the certificate issuer
rather than the certificate subject.
</para>
</listitem>
</varlistentry>
</variablelist>
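<para>
For example, one might check the SSL state of the current connection
like this:
</para>
<programlisting>
SELECT ssl_is_used() AS ssl, ssl_client_cert_present() AS cert, ssl_client_dn() AS dn;
</programlisting>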
</sect2>
<sect2>
<title>Author</title>
<para>
Victor Wagner <email>vitus@cryptocom.ru</email>, Cryptocom LTD
</para>
<para>
E-Mail of Cryptocom OpenSSL development group:
<email>openssl@cryptocom.ru</email>
</para>
</sect2>
</sect1>
<!-- $PostgreSQL: pgsql/doc/src/sgml/tablefunc.sgml,v 1.4 2007/12/06 04:12:10 tgl Exp $ -->
<sect1 id="tablefunc">
<title>tablefunc</title>
<indexterm zone="tablefunc">
<primary>tablefunc</primary>
</indexterm>
<para>
The <filename>tablefunc</> module includes various functions that return
tables (that is, multiple rows). These functions are useful both in their
own right and as examples of how to write C functions that return
multiple rows.
</para>
<sect2>
<title>Functions Provided</title>
<table>
<title><filename>tablefunc</> functions</title>
<tgroup cols="3">
<thead>
<row>
<entry>Function</entry>
<entry>Returns</entry>
<entry>Description</entry>
</row>
</thead>
<tbody>
<row>
<entry><function>normal_rand(int numvals, float8 mean, float8 stddev)</function></entry>
<entry><type>setof float8</></entry>
<entry>
Produces a set of normally distributed random values
</entry>
</row>
<row>
<entry><function>crosstab(text sql)</function></entry>
<entry><type>setof record</></entry>
<entry>
Produces a <quote>pivot table</> containing
row names plus <replaceable>N</> value columns, where
<replaceable>N</> is determined by the rowtype specified in the calling
query
</entry>
</row>
<row>
<entry><function>crosstab<replaceable>N</>(text sql)</function></entry>
<entry><type>setof table_crosstab_<replaceable>N</></></entry>
<entry>
Produces a <quote>pivot table</> containing
row names plus <replaceable>N</> value columns.
<function>crosstab2</>, <function>crosstab3</>, and
<function>crosstab4</> are predefined, but you can create additional
<function>crosstab<replaceable>N</></> functions as described below
</entry>
</row>
<row>
<entry><function>crosstab(text source_sql, text category_sql)</function></entry>
<entry><type>setof record</></entry>
<entry>
Produces a <quote>pivot table</>
with the value columns specified by a second query
</entry>
</row>
<row>
<entry><function>crosstab(text sql, int N)</function></entry>
<entry><type>setof record</></entry>
<entry>
<para>Obsolete version of <function>crosstab(text)</>.
The parameter <replaceable>N</> is now ignored, since the number of
value columns is always determined by the calling query
</para>
</entry>
</row>
<row>
<entry>
<function>
connectby(text relname, text keyid_fld, text parent_keyid_fld
[, text orderby_fld ], text start_with, int max_depth
[, text branch_delim ])
</function>
</entry>
<entry><type>setof record</></entry>
<entry>
Produces a representation of a hierarchical tree structure
</entry>
</row>
</tbody>
</tgroup>
</table>
<sect3>
<title><function>normal_rand</function></title>
<programlisting>
normal_rand(int numvals, float8 mean, float8 stddev) returns setof float8
</programlisting>
<para>
<function>normal_rand</> produces a set of normally distributed random
values (Gaussian distribution).
</para>
<para>
<parameter>numvals</parameter> is the number of values to be returned
from the function. <parameter>mean</parameter> is the mean of the normal
distribution of values and <parameter>stddev</parameter> is the standard
deviation of the normal distribution of values.
</para>
<para>
For example, this call requests 1000 values with a mean of 5 and a
standard deviation of 3:
</para>
<programlisting>
test=# SELECT * FROM normal_rand(1000, 5, 3);
normal_rand
----------------------
1.56556322244898
...
2.49639286969028
(1000 rows)
</programlisting>
</sect3>
<sect3>
<title><function>crosstab(text)</function></title>
<programlisting>
crosstab(text sql)
crosstab(text sql, int N)
</programlisting>
<para>
The <function>crosstab</> function is used to produce <quote>pivot</>
displays, wherein data is listed across the page rather than down.
For example, we might have data like
<programlisting>
row1 val11
row1 val12
row1 val13
...
row2 val21
row2 val22
row2 val23
...
</programlisting>
which we wish to display like
<programlisting>
row1 val11 val12 val13 ...
row2 val21 val22 val23 ...
...
</programlisting>
The <function>crosstab</> function takes a text parameter that is a SQL
query producing raw data formatted in the first way, and produces a table
formatted in the second way.
</para>
<para>
The <parameter>sql</parameter> parameter is a SQL statement that produces
the source set of data. This statement must return one
<structfield>row_name</structfield> column, one
<structfield>category</structfield> column, and one
<structfield>value</structfield> column. <parameter>N</parameter> is an
obsolete parameter, ignored if supplied (formerly this had to match the
number of output value columns, but now that is determined by the
calling query).
</para>
<para>
For example, the provided query might produce a set something like:
</para>
<programlisting>
row_name cat value
----------+-------+-------
row1 cat1 val1
row1 cat2 val2
row1 cat3 val3
row1 cat4 val4
row2 cat1 val5
row2 cat2 val6
row2 cat3 val7
row2 cat4 val8
</programlisting>
<para>
The <function>crosstab</> function is declared to return <type>setof
record</type>, so the actual names and types of the output columns must be
defined in the <literal>FROM</> clause of the calling <command>SELECT</>
statement, for example:
</para>
<programlisting>
SELECT * FROM crosstab('...') AS ct(row_name text, category_1 text, category_2 text);
</programlisting>
<para>
This example produces a set something like:
</para>
<programlisting>
&lt;== value columns ==&gt;
row_name category_1 category_2
---------+------------+------------
row1 val1 val2
row2 val5 val6
</programlisting>
<para>
The <literal>FROM</> clause must define the output as one
<structfield>row_name</> column (of the same datatype as the first result
column of the SQL query) followed by N <structfield>value</> columns
(all of the same datatype as the third result column of the SQL query).
You can set up as many output value columns as you wish. The names of the
output columns are up to you.
</para>
<para>
The <function>crosstab</> function produces one output row for each
consecutive group of input rows with the same
<structfield>row_name</structfield> value. It fills the output
<structfield>value</> columns, left to right, with the
<structfield>value</structfield> fields from these rows. If there
are fewer rows in a group than there are output <structfield>value</>
columns, the extra output columns are filled with nulls; if there are
more rows, the extra input rows are skipped.
</para>
<para>
In practice the SQL query should always specify <literal>ORDER BY 1,2</>
to ensure that the input rows are properly ordered, that is, values with
the same <structfield>row_name</structfield> are brought together and
correctly ordered within the row. Notice that <function>crosstab</>
itself does not pay any attention to the second column of the query
result; it's just there to be ordered by, to control the order in which
the third-column values appear across the page.
</para>
<para>
Here is a complete example:
</para>
<programlisting>
CREATE TABLE ct(id SERIAL, rowid TEXT, attribute TEXT, value TEXT);
INSERT INTO ct(rowid, attribute, value) VALUES('test1','att1','val1');
INSERT INTO ct(rowid, attribute, value) VALUES('test1','att2','val2');
INSERT INTO ct(rowid, attribute, value) VALUES('test1','att3','val3');
INSERT INTO ct(rowid, attribute, value) VALUES('test1','att4','val4');
INSERT INTO ct(rowid, attribute, value) VALUES('test2','att1','val5');
INSERT INTO ct(rowid, attribute, value) VALUES('test2','att2','val6');
INSERT INTO ct(rowid, attribute, value) VALUES('test2','att3','val7');
INSERT INTO ct(rowid, attribute, value) VALUES('test2','att4','val8');
SELECT *
FROM crosstab(
'select rowid, attribute, value
from ct
where attribute = ''att2'' or attribute = ''att3''
order by 1,2')
AS ct(row_name text, category_1 text, category_2 text, category_3 text);
row_name | category_1 | category_2 | category_3
----------+------------+------------+------------
test1 | val2 | val3 |
test2 | val6 | val7 |
(2 rows)
</programlisting>
<para>
You can avoid always having to write out a <literal>FROM</> clause to
define the output columns, by setting up a custom crosstab function that
has the desired output row type wired into its definition. This is
described in the next section. Another possibility is to embed the
required <literal>FROM</> clause in a view definition.
</para>
</sect3>
<sect3>
<title><function>crosstab<replaceable>N</>(text)</function></title>
<programlisting>
crosstab<replaceable>N</>(text sql)
</programlisting>
<para>
The <function>crosstab<replaceable>N</></> functions are examples of how
to set up custom wrappers for the general <function>crosstab</> function,
so that you need not write out column names and types in the calling
<command>SELECT</> query. The <filename>tablefunc</> module includes
<function>crosstab2</>, <function>crosstab3</>, and
<function>crosstab4</>, whose output rowtypes are defined as
</para>
<programlisting>
CREATE TYPE tablefunc_crosstab_N AS (
row_name TEXT,
category_1 TEXT,
category_2 TEXT,
.
.
.
category_N TEXT
);
</programlisting>
<para>
Thus, these functions can be used directly when the input query produces
<structfield>row_name</> and <structfield>value</> columns of type
<type>text</>, and you want 2, 3, or 4 output values columns.
In all other ways they behave exactly as described above for the
general <function>crosstab</> function.
</para>
<para>
For instance, the example given in the previous section would also
work as
</para>
<programlisting>
SELECT *
FROM crosstab3(
'select rowid, attribute, value
from ct
where attribute = ''att2'' or attribute = ''att3''
order by 1,2');
</programlisting>
<para>
These functions are provided mostly for illustration purposes. You
can create your own return types and functions based on the
underlying <function>crosstab()</> function. There are two ways
to do it:
</para>
<itemizedlist>
<listitem>
<para>
Create a composite type describing the desired output columns,
similar to the examples in the installation script. Then define a
unique function name accepting one <type>text</> parameter and returning
<type>setof your_type_name</>, but linking to the same underlying
<function>crosstab</> C function. For example, if your source data
produces row names that are <type>text</>, and values that are
<type>float8</>, and you want 5 value columns:
</para>
<programlisting>
CREATE TYPE my_crosstab_float8_5_cols AS (
my_row_name text,
my_category_1 float8,
my_category_2 float8,
my_category_3 float8,
my_category_4 float8,
my_category_5 float8
);
CREATE OR REPLACE FUNCTION crosstab_float8_5_cols(text)
RETURNS setof my_crosstab_float8_5_cols
AS '$libdir/tablefunc','crosstab' LANGUAGE C STABLE STRICT;
</programlisting>
</listitem>
<listitem>
<para>
Use <literal>OUT</> parameters to define the return type implicitly.
The same example could also be done this way:
</para>
<programlisting>
CREATE OR REPLACE FUNCTION crosstab_float8_5_cols(IN text,
OUT my_row_name text,
OUT my_category_1 float8,
OUT my_category_2 float8,
OUT my_category_3 float8,
OUT my_category_4 float8,
OUT my_category_5 float8)
RETURNS setof record
AS '$libdir/tablefunc','crosstab' LANGUAGE C STABLE STRICT;
</programlisting>
</listitem>
</itemizedlist>
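<para>
Either way, the custom function can then be called without writing a
column definition list (the source query here is hypothetical):
</para>
<programlisting>
SELECT * FROM crosstab_float8_5_cols(
  'select rowid, attribute, value from my_source_data order by 1,2');
</programlisting>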
</sect3>
<sect3>
<title><function>crosstab(text, text)</function></title>
<programlisting>
crosstab(text source_sql, text category_sql)
</programlisting>
<para>
The main limitation of the single-parameter form of <function>crosstab</>
is that it treats all values in a group alike, inserting each value into
the first available column. If you want the value
columns to correspond to specific categories of data, and some groups
might not have data for some of the categories, that doesn't work well.
The two-parameter form of <function>crosstab</> handles this case by
providing an explicit list of the categories corresponding to the
output columns.
</para>
<para>
<parameter>source_sql</parameter> is a SQL statement that produces the
source set of data. This statement must return one
<structfield>row_name</structfield> column, one
<structfield>category</structfield> column, and one
<structfield>value</structfield> column. It may also have one or more
<quote>extra</quote> columns.
The <structfield>row_name</structfield> column must be first. The
<structfield>category</structfield> and <structfield>value</structfield>
columns must be the last two columns, in that order. Any columns between
<structfield>row_name</structfield> and
<structfield>category</structfield> are treated as <quote>extra</>.
The <quote>extra</quote> columns are expected to be the same for all rows
with the same <structfield>row_name</structfield> value.
</para>
<para>
For example, <parameter>source_sql</parameter> might produce a set
something like:
</para>
<programlisting>
SELECT row_name, extra_col, cat, value FROM foo ORDER BY 1;
row_name extra_col cat value
----------+------------+-----+---------
row1 extra1 cat1 val1
row1 extra1 cat2 val2
row1 extra1 cat4 val4
row2 extra2 cat1 val5
row2 extra2 cat2 val6
row2 extra2 cat3 val7
row2 extra2 cat4 val8
</programlisting>
<para>
<parameter>category_sql</parameter> is a SQL statement that produces
the set of categories. This statement must return only one column.
It must produce at least one row, or an error will be generated.
Also, it must not produce duplicate values, or an error will be
generated. <parameter>category_sql</parameter> might be something like:
</para>
<programlisting>
SELECT DISTINCT cat FROM foo ORDER BY 1;
cat
-------
cat1
cat2
cat3
cat4
</programlisting>
<para>
The <function>crosstab</> function is declared to return <type>setof
record</type>, so the actual names and types of the output columns must be
defined in the <literal>FROM</> clause of the calling <command>SELECT</>
statement, for example:
</para>
<programlisting>
SELECT * FROM crosstab('...', '...')
AS ct(row_name text, extra text, cat1 text, cat2 text, cat3 text, cat4 text);
</programlisting>
<para>
This will produce a result something like:
</para>
<programlisting>
&lt;== value columns ==&gt;
row_name extra cat1 cat2 cat3 cat4
---------+-------+------+------+------+------
row1 extra1 val1 val2 val4
row2 extra2 val5 val6 val7 val8
</programlisting>
<para>
The <literal>FROM</> clause must define the proper number of output
columns of the proper data types. If there are <replaceable>N</>
columns in the <parameter>source_sql</> query's result, the first
<replaceable>N</>-2 of them must match up with the first
<replaceable>N</>-2 output columns. The remaining output columns
must have the type of the last column of the <parameter>source_sql</>
query's result, and there must be exactly as many of them as there
are rows in the <parameter>category_sql</parameter> query's result.
</para>
<para>
The <function>crosstab</> function produces one output row for each
consecutive group of input rows with the same
<structfield>row_name</structfield> value. The output
<structfield>row_name</structfield> column, plus any <quote>extra</>
columns, are copied from the first row of the group. The output
<structfield>value</> columns are filled with the
<structfield>value</structfield> fields from rows having matching
<structfield>category</> values. If a row's <structfield>category</>
does not match any output of the <parameter>category_sql</parameter>
query, its <structfield>value</structfield> is ignored. Output
columns whose matching category is not present in any input row
of the group are filled with nulls.
</para>
<para>
In practice the <parameter>source_sql</parameter> query should always
specify <literal>ORDER BY 1</> to ensure that values with the same
<structfield>row_name</structfield> are brought together. However,
ordering of the categories within a group is not important.
Also, it is essential to be sure that the order of the
<parameter>category_sql</parameter> query's output matches the specified
output column order.
</para>
<para>
Here are two complete examples:
</para>
<programlisting>
create table sales(year int, month int, qty int);
insert into sales values(2007, 1, 1000);
insert into sales values(2007, 2, 1500);
insert into sales values(2007, 7, 500);
insert into sales values(2007, 11, 1500);
insert into sales values(2007, 12, 2000);
insert into sales values(2008, 1, 1000);
select * from crosstab(
'select year, month, qty from sales order by 1',
'select m from generate_series(1,12) m'
) as (
year int,
"Jan" int,
"Feb" int,
"Mar" int,
"Apr" int,
"May" int,
"Jun" int,
"Jul" int,
"Aug" int,
"Sep" int,
"Oct" int,
"Nov" int,
"Dec" int
);
year | Jan | Feb | Mar | Apr | May | Jun | Jul | Aug | Sep | Oct | Nov | Dec
------+------+------+-----+-----+-----+-----+-----+-----+-----+-----+------+------
2007 | 1000 | 1500 | | | | | 500 | | | | 1500 | 2000
2008 | 1000 | | | | | | | | | | |
(2 rows)
</programlisting>
<programlisting>
CREATE TABLE cth(rowid text, rowdt timestamp, attribute text, val text);
INSERT INTO cth VALUES('test1','01 March 2003','temperature','42');
INSERT INTO cth VALUES('test1','01 March 2003','test_result','PASS');
INSERT INTO cth VALUES('test1','01 March 2003','volts','2.6987');
INSERT INTO cth VALUES('test2','02 March 2003','temperature','53');
INSERT INTO cth VALUES('test2','02 March 2003','test_result','FAIL');
INSERT INTO cth VALUES('test2','02 March 2003','test_startdate','01 March 2003');
INSERT INTO cth VALUES('test2','02 March 2003','volts','3.1234');
SELECT * FROM crosstab
(
  'SELECT rowid, rowdt, attribute, val FROM cth ORDER BY 1',
  'SELECT DISTINCT attribute FROM cth ORDER BY 1'
)
AS
(
       rowid text,
       rowdt timestamp,
       temperature int4,
       test_result text,
test_startdate timestamp,
volts float8
);
rowid | rowdt | temperature | test_result | test_startdate | volts
-------+--------------------------+-------------+-------------+--------------------------+--------
test1 | Sat Mar 01 00:00:00 2003 | 42 | PASS | | 2.6987
test2 | Sun Mar 02 00:00:00 2003 | 53 | FAIL | Sat Mar 01 00:00:00 2003 | 3.1234
(2 rows)
</programlisting>
<para>
You can create predefined functions to avoid having to write out
the result column names and types in each query. See the examples
in the previous section. The underlying C function for this form
of <function>crosstab</> is named <literal>crosstab_hash</>.
</para>
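<para>
For example, a predefined function for the <structname>cth</> example
could be set up like this (the type and function names shown here are
arbitrary choices, not part of the module):
</para>
<programlisting>
CREATE TYPE cth_result AS (rowid text, rowdt timestamp, temperature int4,
                           test_result text, test_startdate timestamp, volts float8);

CREATE FUNCTION crosstab_cth(text, text) RETURNS SETOF cth_result
  AS '$libdir/tablefunc', 'crosstab_hash'
  LANGUAGE C STABLE STRICT;

SELECT * FROM crosstab_cth(
  'SELECT rowid, rowdt, attribute, val FROM cth ORDER BY 1',
  'SELECT DISTINCT attribute FROM cth ORDER BY 1');
</programlisting>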
</sect3>
<sect3>
<title><function>connectby</function></title>
<programlisting>
connectby(text relname, text keyid_fld, text parent_keyid_fld
[, text orderby_fld ], text start_with, int max_depth
[, text branch_delim ])
</programlisting>
<para>
The <function>connectby</> function produces a display of hierarchical
data that is stored in a table. The table must have a key field that
uniquely identifies rows, and a parent-key field that references the
parent (if any) of each row. <function>connectby</> can display the
sub-tree descending from any row.
</para>
<table>
<title><function>connectby</function> parameters</title>
<tgroup cols="2">
<thead>
<row>
<entry>Parameter</entry>
<entry>Description</entry>
</row>
</thead>
<tbody>
<row>
<entry><parameter>relname</parameter></entry>
<entry>Name of the source relation</entry>
</row>
<row>
<entry><parameter>keyid_fld</parameter></entry>
<entry>Name of the key field</entry>
</row>
<row>
<entry><parameter>parent_keyid_fld</parameter></entry>
<entry>Name of the parent-key field</entry>
</row>
<row>
<entry><parameter>orderby_fld</parameter></entry>
<entry>Name of the field to order siblings by (optional)</entry>
</row>
<row>
<entry><parameter>start_with</parameter></entry>
<entry>Key value of the row to start at</entry>
</row>
<row>
<entry><parameter>max_depth</parameter></entry>
<entry>Maximum depth to descend to, or zero for unlimited depth</entry>
</row>
<row>
<entry><parameter>branch_delim</parameter></entry>
<entry>String to separate keys with in branch output (optional)</entry>
</row>
</tbody>
</tgroup>
</table>
<para>
The key and parent-key fields can be any data type, but they must be
the same type. Note that the <parameter>start_with</> value must be
entered as a text string, regardless of the type of the key field.
</para>
<para>
The <function>connectby</> function is declared to return <type>setof
record</type>, so the actual names and types of the output columns must be
defined in the <literal>FROM</> clause of the calling <command>SELECT</>
statement, for example:
</para>
<programlisting>
SELECT * FROM connectby('connectby_tree', 'keyid', 'parent_keyid', 'pos', 'row2', 0, '~')
AS t(keyid text, parent_keyid text, level int, branch text, pos int);
</programlisting>
<para>
The first two output columns are used for the current row's key and
its parent row's key; they must match the type of the table's key field.
The third output column is the depth in the tree and must be of type
<type>integer</>. If a <parameter>branch_delim</parameter> parameter was
given, the next output column is the branch display and must be of type
<type>text</>. Finally, if an <parameter>orderby_fld</parameter>
parameter was given, the last output column is a serial number, and must
be of type <type>integer</>.
</para>
<para>
The <quote>branch</> output column shows the path of keys taken to
reach the current row. The keys are separated by the specified
<parameter>branch_delim</parameter> string. If no branch display is
wanted, omit both the <parameter>branch_delim</parameter> parameter
and the branch column in the output column list.
</para>
<para>
If the ordering of siblings of the same parent is important,
include the <parameter>orderby_fld</parameter> parameter to
specify which field to order siblings by. This field can be of any
sortable data type. The output column list must include a final
integer serial-number column, if and only if
<parameter>orderby_fld</parameter> is specified.
</para>
<para>
The parameters representing table and field names are copied as-is
into the SQL queries that <function>connectby</> generates internally.
Therefore, include double quotes if the names are mixed-case or contain
special characters. You may also need to schema-qualify the table name.
</para>
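<para>
For instance, a mixed-case table in a non-default schema might be
referenced like this (the names here are purely illustrative):
</para>
<programlisting>
SELECT * FROM connectby('myschema."TreeTable"', '"KeyId"', '"ParentKeyId"', 'row2', 0)
AS t(keyid text, parent_keyid text, level int);
</programlisting>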
<para>
In large tables, performance will be poor unless there is an index on
the parent-key field.
</para>
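<para>
For the <structname>connectby_tree</> example table shown below, such an
index could be created with:
</para>
<programlisting>
CREATE INDEX connectby_tree_parent_idx ON connectby_tree (parent_keyid);
</programlisting>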
<para>
It is important that the <parameter>branch_delim</parameter> string
not appear in any key values, else <function>connectby</> may incorrectly
report an infinite-recursion error. Note that if
<parameter>branch_delim</parameter> is not provided, a default value
of <literal>~</> is used for recursion detection purposes.
<!-- That pretty well sucks. FIXME -->
</para>
<para>
Here is an example:
</para>
<programlisting>
CREATE TABLE connectby_tree(keyid text, parent_keyid text, pos int);
INSERT INTO connectby_tree VALUES('row1',NULL, 0);
INSERT INTO connectby_tree VALUES('row2','row1', 0);
INSERT INTO connectby_tree VALUES('row3','row1', 0);
INSERT INTO connectby_tree VALUES('row4','row2', 1);
INSERT INTO connectby_tree VALUES('row5','row2', 0);
INSERT INTO connectby_tree VALUES('row6','row4', 0);
INSERT INTO connectby_tree VALUES('row7','row3', 0);
INSERT INTO connectby_tree VALUES('row8','row6', 0);
INSERT INTO connectby_tree VALUES('row9','row5', 0);
-- with branch, without orderby_fld (order of results is not guaranteed)
SELECT * FROM connectby('connectby_tree', 'keyid', 'parent_keyid', 'row2', 0, '~')
AS t(keyid text, parent_keyid text, level int, branch text);
keyid | parent_keyid | level | branch
-------+--------------+-------+---------------------
row2 | | 0 | row2
row4 | row2 | 1 | row2~row4
row6 | row4 | 2 | row2~row4~row6
row8 | row6 | 3 | row2~row4~row6~row8
row5 | row2 | 1 | row2~row5
row9 | row5 | 2 | row2~row5~row9
(6 rows)
-- without branch, without orderby_fld (order of results is not guaranteed)
SELECT * FROM connectby('connectby_tree', 'keyid', 'parent_keyid', 'row2', 0)
AS t(keyid text, parent_keyid text, level int);
keyid | parent_keyid | level
-------+--------------+-------
row2 | | 0
row4 | row2 | 1
row6 | row4 | 2
row8 | row6 | 3
row5 | row2 | 1
row9 | row5 | 2
(6 rows)
-- with branch, with orderby_fld (notice that row5 comes before row4)
SELECT * FROM connectby('connectby_tree', 'keyid', 'parent_keyid', 'pos', 'row2', 0, '~')
AS t(keyid text, parent_keyid text, level int, branch text, pos int);
keyid | parent_keyid | level | branch | pos
-------+--------------+-------+---------------------+-----
row2 | | 0 | row2 | 1
row5 | row2 | 1 | row2~row5 | 2
row9 | row5 | 2 | row2~row5~row9 | 3
row4 | row2 | 1 | row2~row4 | 4
row6 | row4 | 2 | row2~row4~row6 | 5
row8 | row6 | 3 | row2~row4~row6~row8 | 6
(6 rows)
-- without branch, with orderby_fld (notice that row5 comes before row4)
SELECT * FROM connectby('connectby_tree', 'keyid', 'parent_keyid', 'pos', 'row2', 0)
AS t(keyid text, parent_keyid text, level int, pos int);
keyid | parent_keyid | level | pos
-------+--------------+-------+-----
row2 | | 0 | 1
row5 | row2 | 1 | 2
row9 | row5 | 2 | 3
row4 | row2 | 1 | 4
row6 | row4 | 2 | 5
row8 | row6 | 3 | 6
(6 rows)
</programlisting>
</sect3>
</sect2>
<sect2>
<title>Author</title>
<para>
Joe Conway
</para>
</sect2>
</sect1>
<!-- $PostgreSQL: pgsql/doc/src/sgml/test-parser.sgml,v 1.2 2007/12/06 04:12:10 tgl Exp $ -->
<sect1 id="test-parser">
<title>test_parser</title>
<indexterm zone="test-parser">
<primary>test_parser</primary>
</indexterm>
<para>
<filename>test_parser</> is an example of a custom parser for full-text
search. It doesn't do anything especially useful, but can serve as
a starting point for developing your own parser.
</para>
<para>
<filename>test_parser</> recognizes words separated by white space,
and returns just two token types:
<programlisting>
mydb=# SELECT * FROM ts_token_type('testparser');
tokid | alias | description
-------+-------+---------------
3 | word | Word
12 | blank | Space symbols
(2 rows)
</programlisting>
</para>
<para>
For example:
<programlisting>
mydb=# SELECT * FROM ts_parse('testparser', 'That''s my first own parser');
tokid | token
-------+--------
3 | That's
12 |
3 | my
12 |
3 | first
12 |
3 | own
12 |
3 | parser
</programlisting>
</para>
<para>
Here is an example of using the parser in a text search configuration
(the configuration shown is created just for this test):
</para>
<programlisting>
mydb=# CREATE TEXT SEARCH CONFIGURATION testcfg ( PARSER = testparser );
CREATE TEXT SEARCH CONFIGURATION
mydb=# ALTER TEXT SEARCH CONFIGURATION testcfg
mydb-# ADD MAPPING FOR word WITH english_stem;
ALTER TEXT SEARCH CONFIGURATION
mydb=# SELECT to_tsvector('testcfg', 'That''s my first own parser');
to_tsvector
-------------------------------
'that':1 'first':3 'parser':5
(1 row)
mydb=# SELECT ts_headline('testcfg', 'Supernovae stars are the brightest phenomena in galaxies',
mydb(# to_tsquery('testcfg', 'star'));
ts_headline
-----------------------------------------------------------------
Supernovae &lt;b&gt;stars&lt;/b&gt; are the brightest phenomena in galaxies
(1 row)
</programlisting>
</sect1>
<!-- $PostgreSQL: pgsql/doc/src/sgml/tsearch2.sgml,v 1.2 2007/12/06 04:12:10 tgl Exp $ -->
<sect1 id="tsearch2">
<title>tsearch2</title>
<indexterm zone="tsearch2">
<primary>tsearch2</primary>
</indexterm>
<para>
The <filename>tsearch2</> module provides backward compatibility for
applications written against the pre-8.3 <application>tsearch2</> contrib
module, on top of the text search functionality that is now part of the
core server.
</para>
</sect1>
<!-- $PostgreSQL: pgsql/doc/src/sgml/uuid-ossp.sgml,v 1.2 2007/12/06 04:12:10 tgl Exp $ -->
<sect1 id="uuid-ossp">
<title>uuid-ossp</title>
<indexterm zone="uuid-ossp">
<primary>uuid-ossp</primary>
</indexterm>
<para>
The <filename>uuid-ossp</> module provides functions to generate universally
unique identifiers (UUIDs) using one of several standard algorithms. There
are also functions to produce certain special UUID constants.
</para>
<para>
This module depends on the OSSP UUID library, which can be found at
<ulink url="http://www.ossp.org/pkg/lib/uuid/"></ulink>.
</para>
<sect2>
<title><literal>uuid-ossp</literal> Functions</title>
<para>
The relevant standards ITU-T Rec. X.667, ISO/IEC 9834-8:2005, and RFC
4122 specify four algorithms for generating UUIDs, identified by the
version numbers 1, 3, 4, and 5. (There is no version 2 algorithm.)
Each of these algorithms could be suitable for a different set of
applications.
</para>
<table>
<title>Functions for UUID Generation</title>
<tgroup cols="2">
<thead>
<row>
<entry>Function</entry>
<entry>Description</entry>
</row>
</thead>
<tbody>
<row>
<entry><literal>uuid_generate_v3(namespace uuid, name text)</literal></entry>
<entry>
<para>
This function generates a version 3 UUID in the given namespace using
the specified input name. The namespace should be one of the special
constants produced by the <function>uuid_ns_*()</> functions shown
below. (It could be any UUID in theory.) The name is an identifier
in the selected namespace.
</para>
</entry>
</row>
</tbody>
</tgroup>
</table>
<para>
For example:
<programlisting>
SELECT uuid_generate_v3(uuid_ns_url(), 'http://www.postgresql.org');
</programlisting>
The name parameter will be MD5-hashed, so the cleartext cannot be
derived from the generated UUID.
The generation of UUIDs by this method has no random or
environment-dependent element and is therefore reproducible.
</para>
<table>
<title>Functions Returning UUID Constants</title>
<tgroup cols="2">
<tbody>
<row>
<entry><literal>uuid_nil()</literal></entry>
<entry>
<para>
A <quote>nil</> UUID constant, which does not occur as a real UUID.
</para>
</entry>
</row>
<row>
<entry><literal>uuid_ns_oid()</literal></entry>
<entry>
<para>
Constant designating the ISO object identifier (OID) namespace for
UUIDs. (This pertains to ASN.1 OIDs, which are unrelated to the OIDs
used in <productname>PostgreSQL</>.)
</para>
</entry>
</row>
</tbody>
</tgroup>
</table>
</sect2>
<sect2>
<title>Author</title>
<para>
Peter Eisentraut <email>peter_e@gmx.net</email>
</para>
</sect2>
</sect1>
<!-- $PostgreSQL: pgsql/doc/src/sgml/vacuumlo.sgml,v 1.2 2007/12/06 04:12:10 tgl Exp $ -->
<sect1 id="vacuumlo">
<title>vacuumlo</title>
<indexterm zone="vacuumlo">
<primary>vacuumlo</primary>
</indexterm>
<para>
<application>vacuumlo</> is a simple utility program that will remove any
<quote>orphaned</> large objects from a
<productname>PostgreSQL</> database. An orphaned large object (LO) is
considered to be any LO whose OID does not appear in any <type>oid</> or
<type>lo</> data column of the database.
</para>
<para>
If you use this, you may also be interested in the <function>lo_manage</>
trigger in <filename>contrib/lo</> (see <xref linkend="lo">).
<function>lo_manage</> is useful to try
to avoid creating orphaned LOs in the first place.
</para>
<sect2>
<title>Usage</title>
<synopsis>
vacuumlo [options] database [database2 ... databaseN]
</synopsis>
<para>
All databases named on the command line are processed. Available options
include:
</para>
<variablelist>
<varlistentry>
<term><option>-v</option></term>
<listitem>
<para>Write a lot of progress messages</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>-n</option></term>
<listitem>
<para>Don't remove anything, just show what would be done</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>-U</option> <replaceable>username</></term>
<listitem>
<para>Username to connect as</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>-W</option></term>
<listitem>
<para>Force prompt for password (generally useless)</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>-h</option> <replaceable>hostname</></term>
<listitem>
<para>Database server's host</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>-p</option> <replaceable>port</></term>
<listitem>
<para>Database server's port</para>
</listitem>
</varlistentry>
</variablelist>
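<para>
For example, to see which large objects would be removed from a database
named <literal>testdb</> (a hypothetical name), without actually deleting
anything:
</para>
<programlisting>
vacuumlo -v -n testdb
</programlisting>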
</sect2>
<sect2>
<title>Method</title>
<para>
First, it builds a temporary table which contains all of the OIDs of the
large objects in that database.
</para>
<para>
It then scans through all columns in the database that are of type
<type>oid</> or <type>lo</>, and removes matching entries from the
temporary table.
</para>
<para>
The remaining entries in the temp table identify orphaned LOs.
These are removed.
</para>
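<para>
In SQL terms, the method amounts to something like the following sketch
(the temporary table name is internal and may differ; this is not the
exact implementation):
</para>
<programlisting>
-- collect the OIDs of all large objects
CREATE TEMP TABLE vacuum_l AS
  SELECT DISTINCT loid AS lo FROM pg_largeobject;

-- for each user column "colname" of type oid or lo in table "tabname":
DELETE FROM vacuum_l WHERE lo IN (SELECT "colname" FROM "tabname");

-- each OID still in vacuum_l is an orphan; vacuumlo removes it
-- with lo_unlink()
</programlisting>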
</sect2>
<sect2>
<title>Author</title>
<para>
Peter Mount <email>peter@retep.org.uk</email>
</para>
<para>
<ulink url="http://www.retep.org.uk"></ulink>
</para>
</sect2>
</sect1>
<!-- $PostgreSQL: pgsql/doc/src/sgml/xml2.sgml,v 1.4 2007/12/06 04:12:10 tgl Exp $ -->
<sect1 id="xml2">
<title>xml2</title>
<indexterm zone="xml2">
<primary>xml2</primary>
</indexterm>
<para>
The <filename>xml2</> module provides XPath querying and
XSLT functionality.
</para>
<sect2>
<title>Deprecation notice</title>
<para>
From <productname>PostgreSQL</> 8.3 on, there is XML-related
functionality based on the SQL/XML standard in the core server.
That functionality covers XML syntax checking and XPath queries,
which is what this module does, and more, but the API is
not at all compatible. It is planned that this module will be
removed in PostgreSQL 8.4 in favor of the newer standard API, so
you are encouraged to try converting your applications. If you
find that some of the functionality of this module is not
available in an adequate form with the newer API, please explain
your issue to pgsql-hackers@postgresql.org so that the deficiency
can be addressed.
</para>
</sect2>
<sect2>
<title>Description of functions</title>
<para>
These functions provide straightforward XML parsing and XPath queries.
All arguments are of type <type>text</>, so for brevity that is not shown.
</para>
<table>
<title>Functions</title>
<tgroup cols="2">
<thead>
<row>
<entry>Function</entry>
<entry>Description</entry>
</row>
</thead>
<tbody>
<row>
<entry>
<synopsis>
xml_is_well_formed(document) returns bool
</synopsis>
</entry>
<entry>
<para>
This parses the document text in its parameter and returns true if the
document is well-formed XML. (Note: before PostgreSQL 8.2, this
function was called <function>xml_valid()</>. That is the wrong name
since validity and well-formedness have different meanings in XML.
The old name is still available, but is deprecated.)
</para>
</entry>
</row>
<row>
<entry>
<synopsis>
xpath_string(document,query) returns text
xpath_number(document,query) returns float4
xpath_bool(document,query) returns bool
</synopsis>
</entry>
<entry>
<para>
These functions evaluate the XPath query on the supplied document, and
cast the result to the specified type.
</para>
</entry>
</row>
<row>
<entry>
<synopsis>
xpath_nodeset(document,query,toptag,itemtag) returns text
</synopsis>
</entry>
<entry>
<para>
This evaluates the query on the document and wraps the result in XML
tags. If
the result is multivalued, the output will look like:
</para>
<literal>
&lt;toptag&gt;
&lt;itemtag&gt;Value 1 which could be an XML fragment&lt;/itemtag&gt;
&lt;itemtag&gt;Value 2....&lt;/itemtag&gt;
&lt;/toptag&gt;
</literal>
<para>
If either toptag or itemtag is an empty string, the relevant tag is omitted.
</para>
</entry>
</row>
<row>
<entry>
<synopsis>
xpath_nodeset(document,query) returns text
</synopsis>
</entry>
<entry>
<para>
Like xpath_nodeset(document,query,toptag,itemtag) but result omits both tags.
</para>
</entry>
</row>
<row>
<entry>
<synopsis>
xpath_nodeset(document,query,itemtag) returns text
</synopsis>
</entry>
<entry>
<para>
Like xpath_nodeset(document,query,toptag,itemtag) but result omits toptag.
</para>
</entry>
</row>
<row>
<entry>
<synopsis>
xpath_list(document,query,separator) returns text
</synopsis>
</entry>
<entry>
<para>
This function returns multiple values separated by the specified
separator, for example <literal>Value 1,Value 2,Value 3</> if
separator is <literal>,</>.
</para>
</entry>
</row>
<row>
<entry>
<synopsis>
xpath_list(document,query) returns text
</synopsis>
</entry>
<entry>
This is a wrapper for the above function that uses <literal>,</>
as the separator.
</entry>
</row>
</tbody>
</tgroup>
</table>
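<para>
 For example, the scalar functions might be used like this (the table
 <structname>docs</> and its XML column <structfield>doc</> are
 hypothetical names, used only for illustration):
</para>
<programlisting>
SELECT xpath_string(doc, '/article/title/text()') AS title,
       xpath_number(doc, '/article/pages/text()') AS pages,
       xpath_list(doc, '/article/author/text()') AS authors
FROM docs;
</programlisting>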
</sect2>
<sect2>
<title><literal>xpath_table</literal></title>
<synopsis>
xpath_table(text key, text document, text relation, text xpaths, text criteria) returns setof record
</synopsis>
<para>
<function>xpath_table</> is a table function that evaluates a set of XPath
queries on each of a set of documents and returns the results as a
table. The primary key field from the original document table is returned
as the first column of the result so that the result set
can readily be used in joins.
</para>
<table>
<title>Parameters</title>
<tgroup cols="2">
<tbody>
<row>
<entry><parameter>key</parameter></entry>
<entry>
<para>
the name of the <quote>key</> field &mdash; this is just a field to be used as
the first column of the output table, i.e. it identifies the record from
which each output row came (see note below about multiple values)
</para>
</entry>
</row>
<row>
<entry><parameter>document</parameter></entry>
<entry>
<para>
the name of the field containing the XML document
</para>
</entry>
</row>
<row>
<entry><parameter>relation</parameter></entry>
<entry>
<para>
the name of the table or view containing the documents
</para>
</entry>
</row>
<row>
<entry><parameter>xpaths</parameter></entry>
<entry>
<para>
one or more XPath expressions, separated by <literal>|</literal>
</para>
</entry>
</row>
<row>
<entry><parameter>criteria</parameter></entry>
<entry>
<para>
the contents of the WHERE clause. This cannot be omitted, so use
<literal>true</literal> or <literal>1=1</literal> if you want to
process all the rows in the relation
</para>
</entry>
</row>
</tbody>
</tgroup>
</table>
<para>
These parameters (except the XPath strings) are just substituted
into a plain SQL SELECT statement, so you have some flexibility &mdash; the
statement is
</para>
<para>
<literal>
SELECT &lt;key&gt;, &lt;document&gt; FROM &lt;relation&gt; WHERE &lt;criteria&gt;
</literal>
</para>
<para>
so those parameters can be <emphasis>anything</> valid in those particular
locations. The result from this SELECT needs to return exactly two
columns (which it will unless you try to list multiple fields for key
or document). Beware that this simplistic approach requires that you
validate any user-supplied values to avoid SQL injection attacks.
</para>
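<para>
 For example, if the comparison value in the criteria comes from an
 untrusted source, one way to guard against injection is to build the
 criteria string with <function>quote_literal</>. This is only a sketch,
 reusing the <literal>articles</> table of the example below:
</para>
<programlisting>
SELECT * FROM
  xpath_table('article_id', 'article_xml', 'articles',
              '/article/title',
              'date_entered > ' || quote_literal('2003-01-01'))
  AS t(article_id integer, title text);
</programlisting>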
<para>
The function has to be used in a <literal>FROM</> expression, with an
<literal>AS</> clause to specify the output columns; for example
</para>
<programlisting>
SELECT * FROM
xpath_table('article_id',
'article_xml',
'articles',
'/article/author|/article/pages|/article/title',
'date_entered > ''2003-01-01'' ')
AS t(article_id integer, author text, page_count integer, title text);
</programlisting>
<para>
The <literal>AS</> clause defines the names and types of the columns in the
output table. The first is the <quote>key</> field and the rest correspond
to the XPath queries.
If there are more XPath queries than result columns,
the extra queries will be ignored. If there are more result columns
than XPath queries, the extra columns will be NULL.
</para>
<para>
Notice that this example defines the <structname>page_count</> result
column as an integer. The function deals internally with string
representations, so when you say you want an integer in the output, it will
take the string representation of the XPath result and use PostgreSQL input
functions to transform it into an integer (or whatever type the <literal>AS</>
clause requests). An error will result if it can't do this &mdash; for
example if the result is empty &mdash; so you may wish to just stick to
<type>text</> as the column type if you think your data has any problems.
</para>
<para>
The calling <command>SELECT</> statement doesn't necessarily have to
be just <literal>SELECT *</> &mdash; it can reference the output
columns by name or join them to other tables. The function produces a
virtual table with which you can perform any operation you wish (e.g.
aggregation, joining, sorting etc). So we could also have:
</para>
<programlisting>
SELECT t.title, p.fullname, p.email
FROM xpath_table('article_id', 'article_xml', 'articles',
'/article/title|/article/author/@id',
'xpath_string(article_xml,''/article/@date'') > ''2003-03-20'' ')
AS t(article_id integer, title text, author_id integer),
tblPeopleInfo AS p
WHERE t.author_id = p.person_id;
</programlisting>
<para>
as a more complicated example. Of course, you could wrap all
of this in a view for convenience.
</para>
<sect3>
<title>Multivalued results</title>
<para>
The <function>xpath_table</> function assumes that the results of each XPath query
might be multi-valued, so the number of rows returned by the function
may not be the same as the number of input documents. The first row
returned contains the first result from each query, the second row the
second result from each query. If one of the queries has fewer values
than the others, NULLs will be returned instead.
</para>
<para>
In some cases, a user will know that a given XPath query will return
only a single result (perhaps a unique document identifier) &mdash; if used
alongside an XPath query returning multiple results, the single-valued
result will appear only on the first row of the result. The solution
to this is to use the key field as part of a join against a simpler
XPath query. As an example:
</para>
<programlisting>
CREATE TABLE test (
id int4 NOT NULL,
xml text,
CONSTRAINT pk PRIMARY KEY (id)
);
INSERT INTO test VALUES (1, '&lt;doc num="C1"&gt;
&lt;line num="L1"&gt;&lt;a&gt;1&lt;/a&gt;&lt;b&gt;2&lt;/b&gt;&lt;c&gt;3&lt;/c&gt;&lt;/line&gt;
&lt;line num="L2"&gt;&lt;a&gt;11&lt;/a&gt;&lt;b&gt;22&lt;/b&gt;&lt;c&gt;33&lt;/c&gt;&lt;/line&gt;
&lt;/doc&gt;');
INSERT INTO test VALUES (2, '&lt;doc num="C2"&gt;
&lt;line num="L1"&gt;&lt;a&gt;111&lt;/a&gt;&lt;b&gt;222&lt;/b&gt;&lt;c&gt;333&lt;/c&gt;&lt;/line&gt;
&lt;line num="L2"&gt;&lt;a&gt;111&lt;/a&gt;&lt;b&gt;222&lt;/b&gt;&lt;c&gt;333&lt;/c&gt;&lt;/line&gt;
&lt;/doc&gt;');
SELECT * FROM
xpath_table('id','xml','test',
'/doc/@num|/doc/line/@num|/doc/line/a|/doc/line/b|/doc/line/c',
'true')
AS t(id int4, doc_num varchar(10), line_num varchar(10), val1 int4, val2 int4, val3 int4)
WHERE id = 1 ORDER BY doc_num, line_num
id | doc_num | line_num | val1 | val2 | val3
----+---------+----------+------+------+------
1 | C1 | L1 | 1 | 2 | 3
1 | | L2 | 11 | 22 | 33
</programlisting>
<para>
To get doc_num on every line, the solution is to use two invocations
of xpath_table and join the results:
</para>
<programlisting>
SELECT t.*,i.doc_num FROM
xpath_table('id', 'xml', 'test',
'/doc/line/@num|/doc/line/a|/doc/line/b|/doc/line/c',
'true')
AS t(id int4, line_num varchar(10), val1 int4, val2 int4, val3 int4),
xpath_table('id', 'xml', 'test', '/doc/@num', 'true')
AS i(id int4, doc_num varchar(10))
WHERE i.id=t.id AND i.id=1
ORDER BY doc_num, line_num;
</programlisting>
<para>
which gives the desired result:
</para>
<programlisting>
id | line_num | val1 | val2 | val3 | doc_num
----+----------+------+------+------+---------
1 | L1 | 1 | 2 | 3 | C1
  1 | L2       |   11 |   22 |   33 | C1
</programlisting>
</sect3>
</sect2>
<sect2>
<title>XSLT functions</title>
<para>
The following functions are available if libxslt is installed (this is
not currently detected automatically, so you will have to amend the
Makefile):
</para>
<sect3>
<title><literal>xslt_process</literal></title>
<synopsis>
xslt_process(text document, text stylesheet, text paramlist) returns text
</synopsis>
<para>
This function applies the XSL stylesheet to the document and returns
the transformed result. The paramlist is a list of parameter
assignments to be used in the transformation, specified in the form
<literal>a=1,b=2</>. Note that the
parameter parsing is very simple-minded: parameter values cannot
contain commas!
</para>
<para>
Also note that if either the document or stylesheet values do not
begin with a &lt; then they will be treated as URLs and libxslt will
fetch them. It follows that you can use <function>xslt_process</> as a
means to fetch the contents of URLs &mdash; you should be aware of the
security implications of this.
</para>
<para>
There is also a two-parameter version of <function>xslt_process</> which
does not pass any parameters to the transformation.
</para>
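<para>
 As a minimal sketch (the document and stylesheet shown are invented
 purely for illustration), a call might look like:
</para>
<programlisting>
SELECT xslt_process('&lt;doc&gt;&lt;name&gt;PostgreSQL&lt;/name&gt;&lt;/doc&gt;',
'&lt;xsl:stylesheet version="1.0"
     xmlns:xsl="http://www.w3.org/1999/XSL/Transform"&gt;
   &lt;xsl:template match="/"&gt;
     &lt;greeting&gt;Hello, &lt;xsl:value-of select="/doc/name"/&gt;!&lt;/greeting&gt;
   &lt;/xsl:template&gt;
 &lt;/xsl:stylesheet&gt;');
</programlisting>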
</sect3>
</sect2>
<sect2>
<title>Author</title>
<para>
John Gray <email>jgray@azuli.co.uk</email>
</para>
<para>
Development of this module was sponsored by Torchbox Ltd. (www.torchbox.com).
It has the same BSD licence as PostgreSQL.
</para>
</sect2>
</sect1>