Commit a6554df4 authored by Peter Eisentraut

In an effort to reduce the total number of chapters, combine the small
chapters on extending types, operators, and aggregates into the extending
functions chapter.  Move the information on how to call table functions
into the queries chapter.  Remove some outdated information that is
already present in a better form in other parts of the documentation.
parent 730840c9
...@@ -2,7 +2,7 @@ ...@@ -2,7 +2,7 @@
# #
# PostgreSQL documentation makefile # PostgreSQL documentation makefile
# #
# $Header: /cvsroot/pgsql/doc/src/sgml/Makefile,v 1.56 2003/03/25 16:15:35 petere Exp $ # $Header: /cvsroot/pgsql/doc/src/sgml/Makefile,v 1.57 2003/04/10 01:22:44 petere Exp $
# #
#---------------------------------------------------------------------------- #----------------------------------------------------------------------------
...@@ -77,7 +77,7 @@ all: html ...@@ -77,7 +77,7 @@ all: html
.PHONY: html .PHONY: html
html: postgres.sgml $(ALLSGML) stylesheet.dsl catalogs.gif connections.gif html: postgres.sgml $(ALLSGML) stylesheet.dsl
@rm -f *.html @rm -f *.html
$(JADE) $(JADEFLAGS) $(SGMLINCLUDE) $(CATALOG) -d stylesheet.dsl -i output-html -t sgml $< $(JADE) $(JADEFLAGS) $(SGMLINCLUDE) $(CATALOG) -d stylesheet.dsl -i output-html -t sgml $<
...@@ -114,8 +114,6 @@ features-unsupported.sgml: $(top_srcdir)/src/backend/catalog/sql_feature_package ...@@ -114,8 +114,6 @@ features-unsupported.sgml: $(top_srcdir)/src/backend/catalog/sql_feature_package
%.rtf: %.sgml $(ALLSGML) stylesheet.dsl %.rtf: %.sgml $(ALLSGML) stylesheet.dsl
$(JADE) $(JADEFLAGS) $(SGMLINCLUDE) $(CATALOG) -d stylesheet.dsl -t rtf -V rtf-backend -i output-print $< $(JADE) $(JADEFLAGS) $(SGMLINCLUDE) $(CATALOG) -d stylesheet.dsl -t rtf -V rtf-backend -i output-print $<
postgres.rtf: catalogs.gif connections.gif
# TeX # TeX
# Regular TeX and pdfTeX have slightly differing requirements, so we # Regular TeX and pdfTeX have slightly differing requirements, so we
# need to distinguish the path we're taking. # need to distinguish the path we're taking.
...@@ -123,13 +121,9 @@ postgres.rtf: catalogs.gif connections.gif ...@@ -123,13 +121,9 @@ postgres.rtf: catalogs.gif connections.gif
%.tex-ps: %.sgml $(ALLSGML) stylesheet.dsl %.tex-ps: %.sgml $(ALLSGML) stylesheet.dsl
$(JADE) $(JADEFLAGS) $(SGMLINCLUDE) $(CATALOG) -d stylesheet.dsl -t tex -V tex-backend -i output-print -V texdvi-output -o $@ $< $(JADE) $(JADEFLAGS) $(SGMLINCLUDE) $(CATALOG) -d stylesheet.dsl -t tex -V tex-backend -i output-print -V texdvi-output -o $@ $<
postgres.tex-ps: catalogs.eps connections.eps
%.tex-pdf: %.sgml $(ALLSGML) stylesheet.dsl %.tex-pdf: %.sgml $(ALLSGML) stylesheet.dsl
$(JADE) $(JADEFLAGS) $(SGMLINCLUDE) $(CATALOG) -d stylesheet.dsl -t tex -V tex-backend -i output-print -V texpdf-output -o $@ $< $(JADE) $(JADEFLAGS) $(SGMLINCLUDE) $(CATALOG) -d stylesheet.dsl -t tex -V tex-backend -i output-print -V texpdf-output -o $@ $<
postgres.tex-pdf: catalogs.pdf connections.pdf
%.dvi: %.tex-ps %.dvi: %.tex-ps
@rm -f $*.aux $*.log @rm -f $*.aux $*.log
jadetex $< jadetex $<
......
<Chapter Id="arch-pg">
<TITLE>Architecture</TITLE>
<Sect1 id="arch-pg-concepts">
<Title><ProductName>PostgreSQL</ProductName> Architectural Concepts</Title>
<Para>
Before we begin, you should understand the basic
<ProductName>PostgreSQL</ProductName> system architecture. Understanding how the
parts of <ProductName>PostgreSQL</ProductName> interact will make the next chapter
somewhat clearer.
In database jargon, <ProductName>PostgreSQL</ProductName> uses a simple "process
per-user" client/server model. A <ProductName>PostgreSQL</ProductName> session
consists of the following cooperating Unix processes (programs):
<ItemizedList>
<ListItem>
<Para>
A supervisory daemon process (the <Application>postmaster</Application>),
</Para>
</ListItem>
<ListItem>
<Para>
the user's frontend application (e.g., the <Application>psql</Application> program), and
</Para>
</ListItem>
<ListItem>
<Para>
one or more backend database servers (the <Application>postgres</Application> process itself).
</Para>
</ListItem>
</ItemizedList>
</para>
<Para>
A single <Application>postmaster</Application> manages a given collection of
databases on a single host. Such a collection of
databases is called a cluster (of databases). A frontend
application that wishes to access a given database
within a cluster makes calls to an interface library (e.g., <application>libpq</>)
that is linked into the application.
The library sends user requests over the network to the
<Application>postmaster</Application>
(<XRef LinkEnd="PGARCH-CONNECTIONS">(a)),
which in turn starts a new backend server process
(<XRef LinkEnd="PGARCH-CONNECTIONS">(b))
<figure id="PGARCH-CONNECTIONS">
<title>How a connection is established</title>
<mediaobject>
<imageobject>
<imagedata align="center" fileref="connections">
</imageobject>
</mediaobject>
</figure>
and connects the frontend process to the new server
(<XRef LinkEnd="PGARCH-CONNECTIONS">(c)).
From that point on, the frontend process and the backend
server communicate without intervention by the
<Application>postmaster</Application>. Hence, the <Application>postmaster</Application> is always running, waiting
for connection requests, whereas frontend and backend processes
come and go. The <FileName>libpq</FileName> library allows a single
frontend to make multiple connections to backend processes.
However, each backend process is a single-threaded process that can
only execute one query at a time; so the communication over any one
frontend-to-backend connection is single-threaded.
</Para>
<Para>
One implication of this architecture is that the
<Application>postmaster</Application> and the backend always run on the
same machine (the database server), while the frontend
application may run anywhere. You should keep this
in mind,
because the files that can be accessed on a client
machine may not be accessible (or may only be accessed
using a different path name) on the database server
machine.
</Para>
<Para>
You should also be aware that the <Application>postmaster</Application> and
<application>postgres</> servers run with the user ID of the <ProductName>PostgreSQL</ProductName>
<quote>superuser</>.
Note that the <ProductName>PostgreSQL</ProductName> superuser does not
have to be any particular user (e.g., a user named
<literal>postgres</literal>), although many systems are installed that way.
Furthermore, the <ProductName>PostgreSQL</ProductName> superuser should
definitely not be the Unix superuser, <literal>root</literal>!
It is safest if the <ProductName>PostgreSQL</ProductName> superuser is an
ordinary, unprivileged user so far as the surrounding Unix system is
concerned.
In any case, all files relating to a database should belong to
this <ProductName>Postgres</ProductName> superuser.
</Para>
</sect1>
</Chapter>
<!-- Keep this comment at the end of the file
Local variables:
mode:sgml
sgml-omittag:nil
sgml-shorttag:t
sgml-minimize-attributes:nil
sgml-always-quote-attributes:t
sgml-indent-step:1
sgml-indent-tabs-mode:nil
sgml-indent-data:t
sgml-parent-document:nil
sgml-default-dtd-file:"./reference.ced"
sgml-exposed-tags:nil
sgml-local-catalogs:("/usr/share/sgml/catalog")
sgml-local-ecat-files:nil
End:
-->
<!-- <!--
$Header: /cvsroot/pgsql/doc/src/sgml/dfunc.sgml,v 1.24 2003/03/25 16:15:35 petere Exp $ $Header: /cvsroot/pgsql/doc/src/sgml/dfunc.sgml,v 1.25 2003/04/10 01:22:44 petere Exp $
--> -->
<sect2 id="dfunc"> <sect2 id="dfunc">
...@@ -14,7 +14,8 @@ $Header: /cvsroot/pgsql/doc/src/sgml/dfunc.sgml,v 1.24 2003/03/25 16:15:35 peter ...@@ -14,7 +14,8 @@ $Header: /cvsroot/pgsql/doc/src/sgml/dfunc.sgml,v 1.24 2003/03/25 16:15:35 peter
</para> </para>
<para> <para>
For more information you should read the documentation of your For information beyond what is contained in this section
you should read the documentation of your
operating system, in particular the manual pages for the C compiler, operating system, in particular the manual pages for the C compiler,
<command>cc</command>, and the link editor, <command>ld</command>. <command>cc</command>, and the link editor, <command>ld</command>.
In addition, the <productname>PostgreSQL</productname> source code In addition, the <productname>PostgreSQL</productname> source code
...@@ -47,13 +48,10 @@ $Header: /cvsroot/pgsql/doc/src/sgml/dfunc.sgml,v 1.24 2003/03/25 16:15:35 peter ...@@ -47,13 +48,10 @@ $Header: /cvsroot/pgsql/doc/src/sgml/dfunc.sgml,v 1.24 2003/03/25 16:15:35 peter
here. here.
</para> </para>
<para>
<!-- <!--
Note: Reading GNU Libtool sources is generally a good way of figuring out Note: Reading GNU Libtool sources is generally a good way of
this information. The methods used within figuring out this information. The methods used within PostgreSQL
<productname>PostgreSQL</> source code are not source code are not necessarily ideal.
necessarily ideal.
--> -->
<variablelist> <variablelist>
...@@ -160,7 +158,7 @@ cc -shared -o foo.so foo.o ...@@ -160,7 +158,7 @@ cc -shared -o foo.so foo.o
<indexterm><primary>MacOS X</></> <indexterm><primary>MacOS X</></>
<listitem> <listitem>
<para> <para>
Here is a sample. It assumes the developer tools are installed. Here is an example. It assumes the developer tools are installed.
<programlisting> <programlisting>
cc -c foo.c cc -c foo.c
cc -bundle -flat_namespace -undefined suppress -o foo.so foo.o cc -bundle -flat_namespace -undefined suppress -o foo.so foo.o
...@@ -271,17 +269,13 @@ gcc -shared -o foo.so foo.o ...@@ -271,17 +269,13 @@ gcc -shared -o foo.so foo.o
</varlistentry> </varlistentry>
</variablelist> </variablelist>
</para>
<tip> <tip>
<para> <para>
If you want to package your extension modules for wide distribution If this is too complicated for you, you should consider using
you should consider using <ulink <ulink url="http://www.gnu.org/software/libtool/"><productname>GNU
url="http://www.gnu.org/software/libtool/"><productname>GNU Libtool</productname></ulink>, which hides the platform differences
Libtool</productname></ulink> for building shared libraries. It behind a uniform interface.
encapsulates the platform differences into a general and powerful
interface. Serious packaging also requires considerations about
library versioning, symbol resolution methods, and other issues.
</para> </para>
</tip> </tip>
......
<!-- <!--
$Header: /cvsroot/pgsql/doc/src/sgml/extend.sgml,v 1.20 2003/03/25 16:15:36 petere Exp $ $Header: /cvsroot/pgsql/doc/src/sgml/extend.sgml,v 1.21 2003/04/10 01:22:44 petere Exp $
--> -->
<chapter id="extend"> <chapter id="extend">
<title>Extending <acronym>SQL</acronym>: An Overview</title> <title>Extending <acronym>SQL</acronym></title>
<indexterm zone="extend"> <indexterm zone="extend">
<primary>extending SQL</primary> <primary>extending SQL</primary>
...@@ -17,22 +17,22 @@ $Header: /cvsroot/pgsql/doc/src/sgml/extend.sgml,v 1.20 2003/03/25 16:15:36 pete ...@@ -17,22 +17,22 @@ $Header: /cvsroot/pgsql/doc/src/sgml/extend.sgml,v 1.20 2003/03/25 16:15:36 pete
<itemizedlist spacing="compact" mark="bullet"> <itemizedlist spacing="compact" mark="bullet">
<listitem> <listitem>
<para> <para>
functions functions (starting in <xref linkend="xfunc">)
</para> </para>
</listitem> </listitem>
<listitem> <listitem>
<para> <para>
data types data types (starting in <xref linkend="xtypes">)
</para> </para>
</listitem> </listitem>
<listitem> <listitem>
<para> <para>
operators operators (starting in <xref linkend="xoper">)
</para> </para>
</listitem> </listitem>
<listitem> <listitem>
<para> <para>
aggregates aggregates (starting in <xref linkend="xaggr">)
</para> </para>
</listitem> </listitem>
</itemizedlist> </itemizedlist>
...@@ -44,30 +44,29 @@ $Header: /cvsroot/pgsql/doc/src/sgml/extend.sgml,v 1.20 2003/03/25 16:15:36 pete ...@@ -44,30 +44,29 @@ $Header: /cvsroot/pgsql/doc/src/sgml/extend.sgml,v 1.20 2003/03/25 16:15:36 pete
<para> <para>
<productname>PostgreSQL</productname> is extensible because its operation is <productname>PostgreSQL</productname> is extensible because its operation is
catalog-driven. If you are familiar with standard catalog-driven. If you are familiar with standard
relational systems, you know that they store information relational database systems, you know that they store information
about databases, tables, columns, etc., in what are about databases, tables, columns, etc., in what are
commonly known as system catalogs. (Some systems call commonly known as system catalogs. (Some systems call
this the data dictionary). The catalogs appear to the this the data dictionary). The catalogs appear to the
user as tables like any other, but the <acronym>DBMS</acronym> stores user as tables like any other, but the <acronym>DBMS</acronym> stores
its internal bookkeeping in them. One key difference its internal bookkeeping in them. One key difference
between <productname>PostgreSQL</productname> and standard relational systems is between <productname>PostgreSQL</productname> and standard relational database systems is
that <productname>PostgreSQL</productname> stores much more information in its that <productname>PostgreSQL</productname> stores much more information in its
catalogs -- not only information about tables and columns, catalogs: not only information about tables and columns,
but also information about its types, functions, access but also information about data types, functions, access
methods, and so on. These tables can be modified by methods, and so on. These tables can be modified by
the user, and since <productname>PostgreSQL</productname> bases its internal operation the user, and since <productname>PostgreSQL</productname> bases its operation
on these tables, this means that <productname>PostgreSQL</productname> can be on these tables, this means that <productname>PostgreSQL</productname> can be
extended by users. By comparison, conventional extended by users. By comparison, conventional
database systems can only be extended by changing hardcoded database systems can only be extended by changing hardcoded
procedures within the <acronym>DBMS</acronym> or by loading modules procedures in the source code or by loading modules
specially written by the <acronym>DBMS</acronym> vendor. specially written by the <acronym>DBMS</acronym> vendor.
</para> </para>
<para> <para>
<productname>PostgreSQL</productname> is also unlike most other data managers in The PostgreSQL server can moreover incorporate user-written code into
that the server can incorporate user-written code into
itself through dynamic loading. That is, the user can itself through dynamic loading. That is, the user can
specify an object code file (e.g., a shared library) that implements a new type or function specify an object code file (e.g., a shared library) that implements a new type or function,
and <productname>PostgreSQL</productname> will load it as required. Code written and <productname>PostgreSQL</productname> will load it as required. Code written
in <acronym>SQL</acronym> is even more trivial to add to the server. in <acronym>SQL</acronym> is even more trivial to add to the server.
This ability to modify its operation <quote>on the fly</quote> makes This ability to modify its operation <quote>on the fly</quote> makes
...@@ -89,195 +88,25 @@ $Header: /cvsroot/pgsql/doc/src/sgml/extend.sgml,v 1.20 2003/03/25 16:15:36 pete ...@@ -89,195 +88,25 @@ $Header: /cvsroot/pgsql/doc/src/sgml/extend.sgml,v 1.20 2003/03/25 16:15:36 pete
</indexterm> </indexterm>
<para> <para>
The <productname>PostgreSQL</productname> type system Data types are divided into base types and composite types.
can be broken down in several ways.
Types are divided into base types and composite types.
Base types are those, like <type>int4</type>, that are implemented Base types are those, like <type>int4</type>, that are implemented
in a language such as C. They generally correspond to in a language such as C. They generally correspond to
what are often known as <firstterm>abstract data types</firstterm>; <productname>PostgreSQL</productname> what are often known as abstract data types. <productname>PostgreSQL</productname>
can only operate on such types through methods provided can only operate on such types through methods provided
by the user and only understands the behavior of such by the user and only understands the behavior of such
types to the extent that the user describes them. types to the extent that the user describes them.
Composite types are created whenever the user creates a Composite types are created whenever the user creates a
table. table. The
</para>
<para>
<productname>PostgreSQL</productname> stores these types
in only one way (within the
file that stores all rows of a table) but the
user can <quote>look inside</quote> at the attributes of these types user can <quote>look inside</quote> at the attributes of these types
from the query language and optimize their retrieval by from the query language.
(for example) defining indexes on the attributes.
<productname>PostgreSQL</productname> base types are further
divided into built-in
types and user-defined types. Built-in types (like
<type>int4</type>) are those that are compiled
into the system.
User-defined types are those created by the user in the
manner to be described later.
</para> </para>
</sect1> </sect1>
<sect1 id="pg-system-catalogs"> &xfunc;
<title>About the <productname>PostgreSQL</productname> System Catalogs</title> &xtypes;
&xoper;
<indexterm zone="pg-system-catalogs"> &xaggr;
<primary>catalogs</primary>
</indexterm>
<para>
Having introduced the basic extensibility concepts, we
can now take a look at how the catalogs are actually
laid out. You can skip this section for now, but some
later sections will be incomprehensible without the
information given here, so mark this page for later
reference.
All system catalogs have names that begin with
<literal>pg_</literal>.
The following tables contain information that may be
useful to the end user. (There are many other system
catalogs, but there should rarely be a reason to query
them directly.)
<table tocentry="1">
<title>PostgreSQL System Catalogs</title>
<titleabbrev>Catalogs</titleabbrev>
<tgroup cols="2">
<thead>
<row>
<entry>Catalog Name</entry>
<entry>Description</entry>
</row>
</thead>
<tbody>
<row>
<entry><structname>pg_database</></entry>
<entry> databases</entry>
</row>
<row>
<entry><structname>pg_class</></entry>
<entry> tables</entry>
</row>
<row>
<entry><structname>pg_attribute</></entry>
<entry> table columns</entry>
</row>
<row>
<entry><structname>pg_index</></entry>
<entry> indexes</entry>
</row>
<row>
<entry><structname>pg_proc</></entry>
<entry> procedures/functions </entry>
</row>
<row>
<entry><structname>pg_type</></entry>
<entry> data types (both base and complex)</entry>
</row>
<row>
<entry><structname>pg_operator</></entry>
<entry> operators</entry>
</row>
<row>
<entry><structname>pg_aggregate</></entry>
<entry> aggregate functions</entry>
</row>
<row>
<entry><structname>pg_am</></entry>
<entry> access methods</entry>
</row>
<row>
<entry><structname>pg_amop</></entry>
<entry> access method operators</entry>
</row>
<row>
<entry><structname>pg_amproc</></entry>
<entry> access method support functions</entry>
</row>
<row>
<entry><structname>pg_opclass</></entry>
<entry> access method operator classes</entry>
</row>
</tbody>
</tgroup>
</table>
</para>
<para>
<figure float="1" id="EXTEND-CATALOGS">
<title>The major <productname>PostgreSQL</productname> system catalogs</title>
<mediaobject>
<imageobject>
<imagedata fileref="catalogs" align="center">
</imageobject>
</mediaobject>
</figure>
<xref linkend="catalogs"> gives a more detailed explanation of these
catalogs and their columns. However,
<xref linkend="EXTEND-CATALOGS">
shows the major entities and their relationships
in the system catalogs. (Columns that do not refer
to other entities are not shown unless they are part of
a primary key.)
This diagram is more or less incomprehensible until you
actually start looking at the contents of the catalogs
and see how they relate to each other. For now, the
main things to take away from this diagram are as follows:
<itemizedlist spacing="compact" mark="bullet">
<listitem>
<para>
In several of the sections that follow, we will
present various join queries on the system
catalogs that display information we need to extend
the system. Looking at this diagram should make
some of these join queries (which are often
three- or four-way joins) more understandable,
because you will be able to see that the
columns used in the queries form foreign keys
in other tables.
</para>
</listitem>
<listitem>
<para>
Many different features (tables, columns,
functions, types, access methods, etc.) are
tightly integrated in this schema. A simple
create command may modify many of these catalogs.
</para>
</listitem>
<listitem>
<para>
Types and procedures
are central to the schema.
<note>
<para>
We use the words <firstterm>procedure</firstterm>
and <firstterm>function</firstterm> more or less interchangeably.
</para>
</note>
Nearly every catalog contains some reference to
rows in one or both of these tables. For
example, <productname>PostgreSQL</productname> frequently uses type
signatures (e.g., of functions and operators) to
identify unique rows of other catalogs.
</para>
</listitem>
<listitem>
<para>
There are many columns and relationships that
have obvious meanings, but there are many
(particularly those that have to do with access
methods) that do not.
</para>
</listitem>
</itemizedlist>
</para>
</sect1>
</chapter> </chapter>
<!-- Keep this comment at the end of the file <!-- Keep this comment at the end of the file
......
<!-- $Header: /cvsroot/pgsql/doc/src/sgml/filelist.sgml,v 1.27 2003/03/25 16:15:36 petere Exp $ --> <!-- $Header: /cvsroot/pgsql/doc/src/sgml/filelist.sgml,v 1.28 2003/04/10 01:22:44 petere Exp $ -->
<!entity history SYSTEM "history.sgml"> <!entity history SYSTEM "history.sgml">
<!entity info SYSTEM "info.sgml"> <!entity info SYSTEM "info.sgml">
...@@ -57,7 +57,6 @@ ...@@ -57,7 +57,6 @@
<!entity wal SYSTEM "wal.sgml"> <!entity wal SYSTEM "wal.sgml">
<!-- programmer's guide --> <!-- programmer's guide -->
<!entity arch-pg SYSTEM "arch-pg.sgml">
<!entity dfunc SYSTEM "dfunc.sgml"> <!entity dfunc SYSTEM "dfunc.sgml">
<!entity ecpg SYSTEM "ecpg.sgml"> <!entity ecpg SYSTEM "ecpg.sgml">
<!entity extend SYSTEM "extend.sgml"> <!entity extend SYSTEM "extend.sgml">
......
<!-- <!--
$Header: /cvsroot/pgsql/doc/src/sgml/postgres.sgml,v 1.49 2003/03/25 16:15:38 petere Exp $ $Header: /cvsroot/pgsql/doc/src/sgml/postgres.sgml,v 1.50 2003/04/10 01:22:44 petere Exp $
--> -->
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook V3.1//EN" [ <!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook V3.1//EN" [
...@@ -210,15 +210,10 @@ $Header: /cvsroot/pgsql/doc/src/sgml/postgres.sgml,v 1.49 2003/03/25 16:15:38 pe ...@@ -210,15 +210,10 @@ $Header: /cvsroot/pgsql/doc/src/sgml/postgres.sgml,v 1.49 2003/03/25 16:15:38 pe
</para> </para>
</partintro> </partintro>
&arch-pg;
&extend; &extend;
&xfunc;
&xtypes;
&xoper;
&xaggr;
&rules;
&xindex; &xindex;
&indexcost; &indexcost;
&rules;
&trigger; &trigger;
&spi; &spi;
......
<!-- $Header: /cvsroot/pgsql/doc/src/sgml/queries.sgml,v 1.20 2003/03/13 01:30:29 petere Exp $ --> <!-- $Header: /cvsroot/pgsql/doc/src/sgml/queries.sgml,v 1.21 2003/04/10 01:22:44 petere Exp $ -->
<chapter id="queries"> <chapter id="queries">
<title>Queries</title> <title>Queries</title>
...@@ -550,6 +550,78 @@ FROM (SELECT * FROM table1) AS alias_name ...@@ -550,6 +550,78 @@ FROM (SELECT * FROM table1) AS alias_name
grouping or aggregation. grouping or aggregation.
</para> </para>
</sect3> </sect3>
<sect3 id="queries-tablefunctions">
<title>Table Functions</title>
<indexterm zone="queries-tablefunctions"><primary>table function</></>
<para>
Table functions are functions that produce a set of rows, made up
of either base data types (scalar types) or composite data types
(table rows). They are used like a table, view, or subquery in
the <literal>FROM</> clause of a query. Columns returned by table
functions may be included in <literal>SELECT</>,
<literal>JOIN</>, or <literal>WHERE</> clauses in the same manner
as a table, view, or subquery column.
</para>
<para>
If a table function returns a base data type, the single result
column is named like the function. If the function returns a
composite type, the result columns get the same names as the
individual attributes of the type.
</para>
<para>
A table function may be aliased in the <literal>FROM</> clause,
but it also may be left unaliased. If a function is used in the
<literal>FROM</> clause with no alias, the function name is used
as the resulting table name.
</para>
<para>
Some examples:
<programlisting>
CREATE TABLE foo (fooid int, foosubid int, fooname text);
CREATE FUNCTION getfoo(int) RETURNS SETOF foo AS '
SELECT * FROM foo WHERE fooid = $1;
' LANGUAGE SQL;
SELECT * FROM getfoo(1) AS t1;
SELECT * FROM foo
WHERE foosubid IN (select foosubid from getfoo(foo.fooid) z
where z.fooid = foo.fooid);
CREATE VIEW vw_getfoo AS SELECT * FROM getfoo(1);
SELECT * FROM vw_getfoo;
</programlisting>
</para>
<para>
In some cases it is useful to define table functions that can
return different column sets depending on how they are invoked.
To support this, the table function can be declared as returning
the pseudotype <type>record</>. When such a function is used in
a query, the expected row structure must be specified in the
query itself, so that the system can know how to parse and plan
the query. Consider this example:
<programlisting>
SELECT *
FROM dblink('dbname=mydb', 'select proname, prosrc from pg_proc')
AS t1(proname name, prosrc text)
WHERE proname LIKE 'bytea%';
</programlisting>
The <literal>dblink</> function executes a remote query (see
<filename>contrib/dblink</>). It is declared to return
<type>record</> since it might be used for any kind of query.
The actual column set must be specified in the calling query so
that the parser knows, for example, what <literal>*</> should
expand to.
</para>
</sect3>
</sect2> </sect2>
<sect2 id="queries-where"> <sect2 id="queries-where">
...@@ -951,7 +1023,7 @@ SELECT DISTINCT ON (<replaceable>expression</replaceable> <optional>, <replaceab ...@@ -951,7 +1023,7 @@ SELECT DISTINCT ON (<replaceable>expression</replaceable> <optional>, <replaceab
The <literal>DISTINCT ON</> clause is not part of the SQL standard The <literal>DISTINCT ON</> clause is not part of the SQL standard
and is sometimes considered bad style because of the potentially and is sometimes considered bad style because of the potentially
indeterminate nature of its results. With judicious use of indeterminate nature of its results. With judicious use of
<literal>GROUP BY</> and subselects in <literal>FROM</> the <literal>GROUP BY</> and subqueries in <literal>FROM</> the
construct can be avoided, but it is often the most convenient construct can be avoided, but it is often the most convenient
alternative. alternative.
</para> </para>
......
<!-- <!--
$Header: /cvsroot/pgsql/doc/src/sgml/xaggr.sgml,v 1.19 2003/03/25 16:15:38 petere Exp $ $Header: /cvsroot/pgsql/doc/src/sgml/xaggr.sgml,v 1.20 2003/04/10 01:22:44 petere Exp $
--> -->
<chapter id="xaggr"> <sect1 id="xaggr">
<title>Extending <acronym>SQL</acronym>: Aggregates</title> <title>User-Defined Aggregates</title>
<indexterm zone="xaggr"> <indexterm zone="xaggr">
<primary>aggregate functions</primary> <primary>aggregate functions</primary>
...@@ -22,38 +22,36 @@ $Header: /cvsroot/pgsql/doc/src/sgml/xaggr.sgml,v 1.19 2003/03/25 16:15:38 peter ...@@ -22,38 +22,36 @@ $Header: /cvsroot/pgsql/doc/src/sgml/xaggr.sgml,v 1.19 2003/03/25 16:15:38 peter
function. The state transition function is just an function. The state transition function is just an
ordinary function that could also be used outside the ordinary function that could also be used outside the
context of the aggregate. A <firstterm>final function</firstterm> context of the aggregate. A <firstterm>final function</firstterm>
can also be specified, in case the desired output of the aggregate can also be specified, in case the desired result of the aggregate
is different from the data that needs to be kept in the running is different from the data that needs to be kept in the running
state value. state value.
</para> </para>
<para> <para>
Thus, in addition to the input and result data types seen by a user Thus, in addition to the argument and result data types seen by a user
of the aggregate, there is an internal state-value data type that of the aggregate, there is an internal state-value data type that
may be different from both the input and result types. may be different from both the argument and result types.
</para> </para>
<para> <para>
If we define an aggregate that does not use a final function, If we define an aggregate that does not use a final function,
we have an aggregate that computes a running function of we have an aggregate that computes a running function of
the column values from each row. <function>Sum</> is an the column values from each row. <function>sum</> is an
example of this kind of aggregate. <function>Sum</> starts at example of this kind of aggregate. <function>sum</> starts at
zero and always adds the current row's value to zero and always adds the current row's value to
its running total. For example, if we want to make a <function>sum</> its running total. For example, if we want to make a <function>sum</>
aggregate to work on a data type for complex numbers, aggregate to work on a data type for complex numbers,
we only need the addition function for that data type. we only need the addition function for that data type.
The aggregate definition is: The aggregate definition would be:
<programlisting> <screen>
CREATE AGGREGATE complex_sum ( CREATE AGGREGATE complex_sum (
sfunc = complex_add, sfunc = complex_add,
basetype = complex, basetype = complex,
stype = complex, stype = complex,
initcond = '(0,0)' initcond = '(0,0)'
); );
</programlisting>
<screen>
SELECT complex_sum(a) FROM test_complex; SELECT complex_sum(a) FROM test_complex;
complex_sum complex_sum
...@@ -61,43 +59,43 @@ SELECT complex_sum(a) FROM test_complex; ...@@ -61,43 +59,43 @@ SELECT complex_sum(a) FROM test_complex;
(34,53.9) (34,53.9)
</screen> </screen>
(In practice, we'd just name the aggregate <function>sum</function>, and rely on (In practice, we'd just name the aggregate <function>sum</function> and rely on
<productname>PostgreSQL</productname> to figure out which kind <productname>PostgreSQL</productname> to figure out which kind
of sum to apply to a column of type <type>complex</type>.) of sum to apply to a column of type <type>complex</type>.)
</para> </para>
<para> <para>
The above definition of <function>sum</function> will return zero (the initial The above definition of <function>sum</function> will return zero (the initial
state condition) if there are no non-null input values. state condition) if there are no nonnull input values.
Perhaps we want to return NULL in that case instead --- the SQL standard Perhaps we want to return null in that case instead --- the SQL standard
expects <function>sum</function> to behave that way. We can do this simply by expects <function>sum</function> to behave that way. We can do this simply by
omitting the <literal>initcond</literal> phrase, so that the initial state omitting the <literal>initcond</literal> phrase, so that the initial state
condition is NULL. Ordinarily this would mean that the <literal>sfunc</literal> condition is null. Ordinarily this would mean that the <literal>sfunc</literal>
would need to check for a NULL state-condition input, but for would need to check for a null state-condition input, but for
<function>sum</function> and some other simple aggregates like <function>max</> and <function>min</>, <function>sum</function> and some other simple aggregates like <function>max</> and <function>min</>,
it's sufficient to insert the first non-null input value into it would be sufficient to insert the first nonnull input value into
the state variable and then start applying the transition function the state variable and then start applying the transition function
at the second non-null input value. <productname>PostgreSQL</productname> at the second nonnull input value. <productname>PostgreSQL</productname>
will do that automatically if the initial condition is NULL and will do that automatically if the initial condition is null and
the transition function is marked <quote>strict</> (i.e., not to be called the transition function is marked <quote>strict</> (i.e., not to be called
for NULL inputs). for null inputs).
</para> </para>
<para> <para>
Another bit of default behavior for a <quote>strict</> transition function Another bit of default behavior for a <quote>strict</> transition function
is that the previous state value is retained unchanged whenever a is that the previous state value is retained unchanged whenever a
NULL input value is encountered. Thus, null values are ignored. If you null input value is encountered. Thus, null values are ignored. If you
need some other behavior for NULL inputs, just define your transition need some other behavior for null inputs, just do not define your transition
function as non-strict, and code it to test for NULL inputs and do function as strict, and code it to test for null inputs and do
whatever is needed. whatever is needed.
</para> </para>
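The strict-transition behavior described above can be sketched as a small, self-contained C simulation. This is only an illustration of the semantics, not the server's executor code; the names `NullableDouble`, `sum_step`, and `aggregate_strict_sum` are hypothetical.

```c
#include <stddef.h>

/* Hypothetical stand-in for an SQL value that may be null. */
typedef struct
{
    int    is_null;
    double value;
} NullableDouble;

/* The "strict" transition function: it never sees a null argument. */
static double
sum_step(double state, double input)
{
    return state + input;
}

/* Simulates an aggregate with no initcond and a strict transition function. */
static NullableDouble
aggregate_strict_sum(const NullableDouble *inputs, size_t n)
{
    NullableDouble state = {1, 0.0};    /* no initcond: the state starts out null */

    for (size_t i = 0; i < n; i++)
    {
        if (inputs[i].is_null)
            continue;                   /* null input: state is kept unchanged */

        if (state.is_null)
        {
            /* first nonnull input is inserted into the state directly */
            state.is_null = 0;
            state.value = inputs[i].value;
        }
        else
            state.value = sum_step(state.value, inputs[i].value);
    }
    return state;                       /* still null if no nonnull input was seen */
}
```

With the inputs (null, 1.5, null, 2.5) the simulated result is 4.0; with no nonnull inputs the result stays null, which is the behavior the SQL standard expects of sum.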
<para> <para>
<function>Avg</> (average) is a more complex example of an aggregate. It requires <function>avg</> (average) is a more complex example of an aggregate. It requires
two pieces of running state: the sum of the inputs and the count two pieces of running state: the sum of the inputs and the count
of the number of inputs. The final result is obtained by dividing of the number of inputs. The final result is obtained by dividing
these quantities. Average is typically implemented by using a these quantities. Average is typically implemented by using a
two-element array as the transition state value. For example, two-element array as the state value. For example,
the built-in implementation of <function>avg(float8)</function> the built-in implementation of <function>avg(float8)</function>
looks like: looks like:
...@@ -116,7 +114,7 @@ CREATE AGGREGATE avg ( ...@@ -116,7 +114,7 @@ CREATE AGGREGATE avg (
For further details see the description of the <command>CREATE For further details see the description of the <command>CREATE
AGGREGATE</command> command in <xref linkend="reference">. AGGREGATE</command> command in <xref linkend="reference">.
</para> </para>
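As a rough illustration of the two-element running state described for avg, the following C sketch keeps the sum and the count in a two-element array and divides them at the end. The function names `avg_accum` and `avg_final` are hypothetical, and returning "no result" for an empty input is an assumption of this sketch rather than a statement about the built-in implementation.

```c
/* state[0] holds the running sum, state[1] the number of inputs seen. */
static void
avg_accum(double state[2], double input)
{
    state[0] += input;
    state[1] += 1.0;
}

/*
 * Final function: divides sum by count.
 * Returns 1 and stores the average in *result, or returns 0 (meaning
 * "null", an assumption of this sketch) when no inputs were accumulated.
 */
static int
avg_final(const double state[2], double *result)
{
    if (state[1] == 0.0)
        return 0;
    *result = state[0] / state[1];
    return 1;
}
```

Starting from the state {0, 0} and accumulating 1, 2, and 6 leaves the state at {9, 3}, so the final function yields 3.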
</chapter> </sect1>
<!-- Keep this comment at the end of the file <!-- Keep this comment at the end of the file
Local variables: Local variables:
......
<!-- <!--
$Header: /cvsroot/pgsql/doc/src/sgml/xfunc.sgml,v 1.66 2003/03/25 16:15:38 petere Exp $ $Header: /cvsroot/pgsql/doc/src/sgml/xfunc.sgml,v 1.67 2003/04/10 01:22:45 petere Exp $
--> -->
<chapter id="xfunc"> <sect1 id="xfunc">
<title id="xfunc-title">Extending <acronym>SQL</acronym>: Functions</title> <title>User-Defined Functions</title>
<indexterm zone="xfunc"><primary>function</></> <indexterm zone="xfunc"><primary>function</></>
<sect1 id="xfunc-intro">
<title>Introduction</title>
<para> <para>
<productname>PostgreSQL</productname> provides four kinds of <productname>PostgreSQL</productname> provides four kinds of
functions: functions:
...@@ -17,24 +14,25 @@ $Header: /cvsroot/pgsql/doc/src/sgml/xfunc.sgml,v 1.66 2003/03/25 16:15:38 peter ...@@ -17,24 +14,25 @@ $Header: /cvsroot/pgsql/doc/src/sgml/xfunc.sgml,v 1.66 2003/03/25 16:15:38 peter
<itemizedlist> <itemizedlist>
<listitem> <listitem>
<para> <para>
query language functions query language functions (functions written in
(functions written in <acronym>SQL</acronym>) <acronym>SQL</acronym>) (<xref linkend="xfunc-sql">)
</para> </para>
</listitem> </listitem>
<listitem> <listitem>
<para> <para>
procedural language procedural language functions (functions written in, for
functions (functions written in, for example, <application>PL/Tcl</> or <application>PL/pgSQL</>) example, <application>PL/Tcl</> or <application>PL/pgSQL</>)
(<xref linkend="xfunc-pl">)
</para> </para>
</listitem> </listitem>
<listitem> <listitem>
<para> <para>
internal functions internal functions (<xref linkend="xfunc-internal">)
</para> </para>
</listitem> </listitem>
<listitem> <listitem>
<para> <para>
C language functions C-language functions (<xref linkend="xfunc-c">)
</para> </para>
</listitem> </listitem>
</itemizedlist> </itemizedlist>
...@@ -42,10 +40,14 @@ $Header: /cvsroot/pgsql/doc/src/sgml/xfunc.sgml,v 1.66 2003/03/25 16:15:38 peter ...@@ -42,10 +40,14 @@ $Header: /cvsroot/pgsql/doc/src/sgml/xfunc.sgml,v 1.66 2003/03/25 16:15:38 peter
<para> <para>
Every kind Every kind
of function can take a base type, a composite type, or of function can take base types, composite types, or
some combination as arguments (parameters). In addition, some combination as arguments (parameters). In addition,
every kind of function can return a base type or every kind of function can return a base type or
a composite type. It's easiest to define <acronym>SQL</acronym> a composite type.
</para>
<para>
It's easiest to define <acronym>SQL</acronym>
functions, so we'll start with those. Examples in this section functions, so we'll start with those. Examples in this section
can also be found in <filename>funcs.sql</filename> can also be found in <filename>funcs.sql</filename>
and <filename>funcs.c</filename> in the tutorial directory. and <filename>funcs.c</filename> in the tutorial directory.
...@@ -72,14 +74,14 @@ $Header: /cvsroot/pgsql/doc/src/sgml/xfunc.sgml,v 1.66 2003/03/25 16:15:38 peter ...@@ -72,14 +74,14 @@ $Header: /cvsroot/pgsql/doc/src/sgml/xfunc.sgml,v 1.66 2003/03/25 16:15:38 peter
(Bear in mind that <quote>the first row</quote> of a multirow (Bear in mind that <quote>the first row</quote> of a multirow
result is not well-defined unless you use <literal>ORDER BY</>.) result is not well-defined unless you use <literal>ORDER BY</>.)
If the last query happens If the last query happens
to return no rows at all, NULL will be returned. to return no rows at all, the null value will be returned.
</para> </para>
<para> <para>
<indexterm><primary>SETOF</><seealso>function</></> <indexterm><primary>SETOF</><seealso>function</></>
Alternatively, an SQL function may be declared to return a set, Alternatively, an SQL function may be declared to return a set,
by specifying the function's return type by specifying the function's return type
as <literal>SETOF</literal> <replaceable>sometype</>. In this case as <literal>SETOF <replaceable>sometype</></literal>. In this case
all rows of the last query's result are returned. Further details all rows of the last query's result are returned. Further details
appear below. appear below.
</para> </para>
...@@ -97,22 +99,65 @@ $Header: /cvsroot/pgsql/doc/src/sgml/xfunc.sgml,v 1.66 2003/03/25 16:15:38 peter ...@@ -97,22 +99,65 @@ $Header: /cvsroot/pgsql/doc/src/sgml/xfunc.sgml,v 1.66 2003/03/25 16:15:38 peter
<para> <para>
Arguments to the SQL function may be referenced in the function Arguments to the SQL function may be referenced in the function
body using the syntax <literal>$<replaceable>n</></>: $1 refers to body using the syntax <literal>$<replaceable>n</></>: <literal>$1</> refers to
the first argument, $2 to the second, and so on. If an argument the first argument, <literal>$2</> to the second, and so on. If an argument
is of a composite type, then the <quote>dot notation</quote>, is of a composite type, then the dot notation,
e.g., <literal>$1.emp</literal>, may be used to access attributes e.g., <literal>$1.name</literal>, may be used to access attributes
of the argument. of the argument.
</para> </para>
<sect2> <sect2>
<title>Examples</title> <title><acronym>SQL</acronym> Functions on Base Types</title>
<para> <para>
To illustrate a simple SQL function, consider the following, The simplest possible <acronym>SQL</acronym> function has no arguments and
which might be used to debit a bank account: simply returns a base type, such as <type>integer</type>:
<screen>
CREATE FUNCTION one() RETURNS integer AS '
SELECT 1 AS result;
' LANGUAGE SQL;
SELECT one();
one
-----
1
</screen>
</para>
<para>
Notice that we defined a column alias within the function body for the result of the function
(with the name <literal>result</>), but this column alias is not visible
outside the function. Hence, the result is labeled <literal>one</>
instead of <literal>result</>.
</para>
<para>
It is almost as easy to define <acronym>SQL</acronym> functions
that take base types as arguments. In the example below, notice
how we refer to the arguments within the function as <literal>$1</>
and <literal>$2</>.
<screen>
CREATE FUNCTION add_em(integer, integer) RETURNS integer AS '
SELECT $1 + $2;
' LANGUAGE SQL;
SELECT add_em(1, 2) AS answer;
answer
--------
3
</screen>
</para>
<para>
Here is a more useful function, which might be used to debit a
bank account:
<programlisting> <programlisting>
CREATE FUNCTION tp1 (integer, numeric) RETURNS integer AS ' CREATE FUNCTION tf1 (integer, numeric) RETURNS integer AS '
UPDATE bank UPDATE bank
SET balance = balance - $2 SET balance = balance - $2
WHERE accountno = $1; WHERE accountno = $1;
...@@ -124,17 +169,17 @@ CREATE FUNCTION tp1 (integer, numeric) RETURNS integer AS ' ...@@ -124,17 +169,17 @@ CREATE FUNCTION tp1 (integer, numeric) RETURNS integer AS '
follows: follows:
<programlisting> <programlisting>
SELECT tp1(17, 100.0); SELECT tf1(17, 100.0);
</programlisting> </programlisting>
</para> </para>
<para> <para>
In practice one would probably like a more useful result from the In practice one would probably like a more useful result from the
function than a constant <quote>1</>, so a more likely definition function than a constant 1, so a more likely definition
is is
<programlisting> <programlisting>
CREATE FUNCTION tp1 (integer, numeric) RETURNS numeric AS ' CREATE FUNCTION tf1 (integer, numeric) RETURNS numeric AS '
UPDATE bank UPDATE bank
SET balance = balance - $2 SET balance = balance - $2
WHERE accountno = $1; WHERE accountno = $1;
...@@ -148,83 +193,29 @@ CREATE FUNCTION tp1 (integer, numeric) RETURNS numeric AS ' ...@@ -148,83 +193,29 @@ CREATE FUNCTION tp1 (integer, numeric) RETURNS numeric AS '
<para> <para>
Any collection of commands in the <acronym>SQL</acronym> Any collection of commands in the <acronym>SQL</acronym>
language can be packaged together and defined as a function. language can be packaged together and defined as a function.
The commands can include data modification (i.e., Besides <command>SELECT</command> queries,
the commands can include data modification (i.e.,
<command>INSERT</command>, <command>UPDATE</command>, and <command>INSERT</command>, <command>UPDATE</command>, and
<command>DELETE</command>) as well <command>DELETE</command>). However, the final command
as <command>SELECT</command> queries. However, the final command
must be a <command>SELECT</command> that returns whatever is must be a <command>SELECT</command> that returns whatever is
specified as the function's return type. Alternatively, if you specified as the function's return type. Alternatively, if you
want to define an SQL function that performs actions but has no want to define an SQL function that performs actions but has no
useful value to return, you can define it as returning <type>void</>. useful value to return, you can define it as returning <type>void</>.
In that case it must not end with a <command>SELECT</command>. In that case, the function body must not end with a <command>SELECT</command>.
For example: For example:
<programlisting> <screen>
CREATE FUNCTION clean_EMP () RETURNS void AS ' CREATE FUNCTION clean_emp() RETURNS void AS '
DELETE FROM EMP DELETE FROM emp
WHERE EMP.salary &lt;= 0; WHERE salary &lt;= 0;
' LANGUAGE SQL; ' LANGUAGE SQL;
SELECT clean_EMP(); SELECT clean_emp();
</programlisting>
<screen>
clean_emp clean_emp
----------- -----------
(1 row) (1 row)
</screen>
</para>
</sect2>
<sect2>
<title><acronym>SQL</acronym> Functions on Base Types</title>
<para>
The simplest possible <acronym>SQL</acronym> function has no arguments and
simply returns a base type, such as <type>integer</type>:
<programlisting>
CREATE FUNCTION one() RETURNS integer AS '
SELECT 1 as RESULT;
' LANGUAGE SQL;
SELECT one();
</programlisting>
<screen>
one
-----
1
</screen>
</para>
<para>
Notice that we defined a column alias within the function body for the result of the function
(with the name <literal>RESULT</>), but this column alias is not visible
outside the function. Hence, the result is labeled <literal>one</>
instead of <literal>RESULT</>.
</para>
<para>
It is almost as easy to define <acronym>SQL</acronym> functions
that take base types as arguments. In the example below, notice
how we refer to the arguments within the function as <literal>$1</>
and <literal>$2</>:
<programlisting>
CREATE FUNCTION add_em(integer, integer) RETURNS integer AS '
SELECT $1 + $2;
' LANGUAGE SQL;
SELECT add_em(1, 2) AS answer;
</programlisting>
<screen>
answer
--------
3
</screen> </screen>
</para> </para>
</sect2> </sect2>
...@@ -237,22 +228,27 @@ SELECT add_em(1, 2) AS answer; ...@@ -237,22 +228,27 @@ SELECT add_em(1, 2) AS answer;
types, we must not only specify which types, we must not only specify which
argument we want (as we did above with <literal>$1</> and <literal>$2</literal>) but argument we want (as we did above with <literal>$1</> and <literal>$2</literal>) but
also the attributes of that argument. For example, suppose that also the attributes of that argument. For example, suppose that
<type>EMP</type> is a table containing employee data, and therefore <type>emp</type> is a table containing employee data, and therefore
also the name of the composite type of each row of the table. Here also the name of the composite type of each row of the table. Here
is a function <function>double_salary</function> that computes what your is a function <function>double_salary</function> that computes what someone's
salary would be if it were doubled: salary would be if it were doubled:
<programlisting> <screen>
CREATE FUNCTION double_salary(EMP) RETURNS integer AS ' CREATE TABLE emp (
name text,
salary integer,
age integer,
cubicle point
);
CREATE FUNCTION double_salary(emp) RETURNS integer AS '
SELECT $1.salary * 2 AS salary; SELECT $1.salary * 2 AS salary;
' LANGUAGE SQL; ' LANGUAGE SQL;
SELECT name, double_salary(EMP) AS dream SELECT name, double_salary(emp) AS dream
FROM EMP FROM emp
WHERE EMP.cubicle ~= point '(2,1)'; WHERE emp.cubicle ~= point '(2,1)';
</programlisting>
<screen>
name | dream name | dream
------+------- ------+-------
Sam | 2400 Sam | 2400
...@@ -269,28 +265,29 @@ SELECT name, double_salary(EMP) AS dream ...@@ -269,28 +265,29 @@ SELECT name, double_salary(EMP) AS dream
<para> <para>
It is also possible to build a function that returns a composite type. It is also possible to build a function that returns a composite type.
This is an example of a function This is an example of a function
that returns a single <type>EMP</type> row: that returns a single <type>emp</type> row:
<programlisting> <programlisting>
CREATE FUNCTION new_emp() RETURNS EMP AS ' CREATE FUNCTION new_emp() RETURNS emp AS '
SELECT text ''None'' AS name, SELECT text ''None'' AS name,
1000 AS salary, 1000 AS salary,
25 AS age, 25 AS age,
point ''(2,2)'' AS cubicle; point ''(2,2)'' AS cubicle;
' LANGUAGE SQL; ' LANGUAGE SQL;
</programlisting> </programlisting>
</para>
<para>
In this case we have specified each of the attributes In this case we have specified each of the attributes
with a constant value, but any computation or expression with a constant value, but any computation
could have been substituted for these constants. could have been substituted for these constants.
</para>
<para>
Note two important things about defining the function: Note two important things about defining the function:
<itemizedlist> <itemizedlist>
<listitem> <listitem>
<para> <para>
The target list order must be exactly the same as The select list order in the query must be exactly the same as
that in which the columns appear in the table associated that in which the columns appear in the table associated
with the composite type. (Naming the columns, as we did above, with the composite type. (Naming the columns, as we did above,
is irrelevant to the system.) is irrelevant to the system.)
...@@ -315,13 +312,15 @@ ERROR: function declared to return emp returns varchar instead of text at colum ...@@ -315,13 +312,15 @@ ERROR: function declared to return emp returns varchar instead of text at colum
function, as described below. It can also be called in the context function, as described below. It can also be called in the context
of an SQL expression, but only when you of an SQL expression, but only when you
extract a single attribute out of the row or pass the entire row into extract a single attribute out of the row or pass the entire row into
another function that accepts the same composite type. For example, another function that accepts the same composite type.
</para>
<programlisting> <para>
SELECT (new_emp()).name; This is an example of how to extract an attribute out of a row type:
</programlisting>
<screen> <screen>
SELECT (new_emp()).name;
name name
------ ------
None None
...@@ -340,29 +339,24 @@ ERROR: parser: parse error at or near "." ...@@ -340,29 +339,24 @@ ERROR: parser: parse error at or near "."
functional notation for extracting an attribute. The simple way functional notation for extracting an attribute. The simple way
to explain this is that we can use the to explain this is that we can use the
notations <literal>attribute(table)</> and <literal>table.attribute</> notations <literal>attribute(table)</> and <literal>table.attribute</>
interchangeably: interchangeably.
<programlisting> <screen>
SELECT name(new_emp()); SELECT name(new_emp());
</programlisting>
<screen>
name name
------ ------
None None
</screen> </screen>
<programlisting>
--
-- this is the same as:
-- SELECT EMP.name AS youngster FROM EMP WHERE EMP.age &lt; 30
--
SELECT name(EMP) AS youngster
FROM EMP
WHERE age(EMP) &lt; 30;
</programlisting>
<screen> <screen>
-- This is the same as:
-- SELECT emp.name AS youngster FROM emp WHERE emp.age &lt; 30
SELECT name(emp) AS youngster
FROM emp
WHERE age(emp) &lt; 30;
youngster youngster
----------- -----------
Sam Sam
...@@ -370,17 +364,15 @@ SELECT name(EMP) AS youngster ...@@ -370,17 +364,15 @@ SELECT name(EMP) AS youngster
</para> </para>
<para> <para>
Another way to use a function returning a row result is to declare a The other way to use a function returning a row result is to declare a
second function accepting a row type parameter, and pass the function second function accepting a row type argument and pass the
result to it: result of the first function to it:
<programlisting> <screen>
CREATE FUNCTION getname(emp) RETURNS text AS CREATE FUNCTION getname(emp) RETURNS text AS
'SELECT $1.name;' 'SELECT $1.name;'
LANGUAGE SQL; LANGUAGE SQL;
</programlisting>
<screen>
SELECT getname(new_emp()); SELECT getname(new_emp());
getname getname
--------- ---------
...@@ -391,35 +383,32 @@ SELECT getname(new_emp()); ...@@ -391,35 +383,32 @@ SELECT getname(new_emp());
</sect2> </sect2>
<sect2> <sect2>
<title><acronym>SQL</acronym> Table Functions</title> <title><acronym>SQL</acronym> Functions as Table Sources</title>
<para> <para>
A table function is one that may be used in the <command>FROM</command> All SQL functions may be used in the <literal>FROM</> clause of a query,
clause of a query. All SQL language functions may be used in this manner,
but it is particularly useful for functions returning composite types. but it is particularly useful for functions returning composite types.
If the function is defined to return a base type, the table function If the function is defined to return a base type, the table function
produces a one-column table. If the function is defined to return produces a one-column table. If the function is defined to return
a composite type, the table function produces a column for each column a composite type, the table function produces a column for each attribute
of the composite type. of the composite type.
</para> </para>
<para> <para>
Here is an example: Here is an example:
<programlisting> <screen>
CREATE TABLE foo (fooid int, foosubid int, fooname text); CREATE TABLE foo (fooid int, foosubid int, fooname text);
INSERT INTO foo VALUES(1,1,'Joe'); INSERT INTO foo VALUES (1, 1, 'Joe');
INSERT INTO foo VALUES(1,2,'Ed'); INSERT INTO foo VALUES (1, 2, 'Ed');
INSERT INTO foo VALUES(2,1,'Mary'); INSERT INTO foo VALUES (2, 1, 'Mary');
CREATE FUNCTION getfoo(int) RETURNS foo AS ' CREATE FUNCTION getfoo(int) RETURNS foo AS '
SELECT * FROM foo WHERE fooid = $1; SELECT * FROM foo WHERE fooid = $1;
' LANGUAGE SQL; ' LANGUAGE SQL;
SELECT *, upper(fooname) FROM getfoo(1) AS t1; SELECT *, upper(fooname) FROM getfoo(1) AS t1;
</programlisting>
<screen>
fooid | foosubid | fooname | upper fooid | foosubid | fooname | upper
-------+----------+---------+------- -------+----------+---------+-------
1 | 1 | Joe | JOE 1 | 1 | Joe | JOE
...@@ -432,35 +421,35 @@ SELECT *, upper(fooname) FROM getfoo(1) AS t1; ...@@ -432,35 +421,35 @@ SELECT *, upper(fooname) FROM getfoo(1) AS t1;
<para> <para>
Note that we only got one row out of the function. This is because Note that we only got one row out of the function. This is because
we did not say <literal>SETOF</>. we did not use <literal>SETOF</>. This is described in the next section.
</para> </para>
</sect2> </sect2>
<sect2> <sect2>
<title><acronym>SQL</acronym> Functions Returning Sets</title> <title><acronym>SQL</acronym> Functions Returning Sets</title>
<para> <para>
When an SQL function is declared as returning <literal>SETOF</literal> When an SQL function is declared as returning <literal>SETOF
<replaceable>sometype</>, the function's final <replaceable>sometype</></literal>, the function's final
<command>SELECT</> query is executed to completion, and each row it <command>SELECT</> query is executed to completion, and each row it
outputs is returned as an element of the set. outputs is returned as an element of the result set.
</para> </para>
<para> <para>
This feature is normally used by calling the function as a table This feature is normally used when calling the function in the <literal>FROM</>
function. In this case each row returned by the function becomes clause. In this case each row returned by the function becomes
a row of the table seen by the query. For example, assume that a row of the table seen by the query. For example, assume that
table <literal>foo</> has the same contents as above, and we say: table <literal>foo</> has the same contents as above, and we say:
<programlisting> <programlisting>
CREATE FUNCTION getfoo(int) RETURNS setof foo AS ' CREATE FUNCTION getfoo(int) RETURNS SETOF foo AS '
SELECT * FROM foo WHERE fooid = $1; SELECT * FROM foo WHERE fooid = $1;
' LANGUAGE SQL; ' LANGUAGE SQL;
SELECT * FROM getfoo(1) AS t1; SELECT * FROM getfoo(1) AS t1;
</programlisting> </programlisting>
Then we would get:
<screen> <screen>
fooid | foosubid | fooname fooid | foosubid | fooname
-------+----------+--------- -------+----------+---------
...@@ -471,21 +460,19 @@ SELECT * FROM getfoo(1) AS t1; ...@@ -471,21 +460,19 @@ SELECT * FROM getfoo(1) AS t1;
</para> </para>
<para> <para>
Currently, functions returning sets may also be called in the target list Currently, functions returning sets may also be called in the select list
of a <command>SELECT</> query. For each row that the <command>SELECT</> of a query. For each row that the query
generates by itself, the function returning set is invoked, and an output generates by itself, the function returning set is invoked, and an output
row is generated for each element of the function's result set. Note, row is generated for each element of the function's result set. Note,
however, that this capability is deprecated and may be removed in future however, that this capability is deprecated and may be removed in future
releases. The following is an example function returning a set from the releases. The following is an example function returning a set from the
target list: select list:
<programlisting> <screen>
CREATE FUNCTION listchildren(text) RETURNS SETOF text AS CREATE FUNCTION listchildren(text) RETURNS SETOF text AS
'SELECT name FROM nodes WHERE parent = $1' 'SELECT name FROM nodes WHERE parent = $1'
LANGUAGE SQL; LANGUAGE SQL;
</programlisting>
<screen>
SELECT * FROM nodes; SELECT * FROM nodes;
name | parent name | parent
-----------+-------- -----------+--------
...@@ -519,7 +506,7 @@ SELECT name, listchildren(name) FROM nodes; ...@@ -519,7 +506,7 @@ SELECT name, listchildren(name) FROM nodes;
In the last <command>SELECT</command>, In the last <command>SELECT</command>,
notice that no output row appears for <literal>Child2</>, <literal>Child3</>, etc. notice that no output row appears for <literal>Child2</>, <literal>Child3</>, etc.
This happens because <function>listchildren</function> returns an empty set This happens because <function>listchildren</function> returns an empty set
for those inputs, so no output rows are generated. for those arguments, so no result rows are generated.
</para> </para>
</sect2> </sect2>
</sect1> </sect1>
...@@ -562,7 +549,7 @@ SELECT name, listchildren(name) FROM nodes; ...@@ -562,7 +549,7 @@ SELECT name, listchildren(name) FROM nodes;
<para> <para>
Normally, all internal functions present in the Normally, all internal functions present in the
backend are declared during the initialization of the database cluster (<command>initdb</command>), server are declared during the initialization of the database cluster (<command>initdb</command>),
but a user could use <command>CREATE FUNCTION</command> but a user could use <command>CREATE FUNCTION</command>
to create additional alias names for an internal function. to create additional alias names for an internal function.
Internal functions are declared in <command>CREATE FUNCTION</command> Internal functions are declared in <command>CREATE FUNCTION</command>
...@@ -571,8 +558,8 @@ SELECT name, listchildren(name) FROM nodes; ...@@ -571,8 +558,8 @@ SELECT name, listchildren(name) FROM nodes;
<programlisting> <programlisting>
CREATE FUNCTION square_root(double precision) RETURNS double precision CREATE FUNCTION square_root(double precision) RETURNS double precision
AS 'dsqrt' AS 'dsqrt'
LANGUAGE INTERNAL LANGUAGE internal
WITH (isStrict); STRICT;
</programlisting> </programlisting>
(Most internal functions expect to be declared <quote>strict</quote>.) (Most internal functions expect to be declared <quote>strict</quote>.)
</para> </para>
...@@ -587,7 +574,7 @@ CREATE FUNCTION square_root(double precision) RETURNS double precision ...@@ -587,7 +574,7 @@ CREATE FUNCTION square_root(double precision) RETURNS double precision
</sect1> </sect1>
<sect1 id="xfunc-c"> <sect1 id="xfunc-c">
<title>C Language Functions</title> <title>C-Language Functions</title>
<para> <para>
User-defined functions can be written in C (or a language that can User-defined functions can be written in C (or a language that can
...@@ -617,7 +604,7 @@ CREATE FUNCTION square_root(double precision) RETURNS double precision ...@@ -617,7 +604,7 @@ CREATE FUNCTION square_root(double precision) RETURNS double precision
<para> <para>
The first time a user-defined function in a particular The first time a user-defined function in a particular
loadable object file is called in a backend session, loadable object file is called in a session,
the dynamic loader loads that object file into memory so that the the dynamic loader loads that object file into memory so that the
function can be called. The <command>CREATE FUNCTION</command> function can be called. The <command>CREATE FUNCTION</command>
for a user-defined C function must therefore specify two pieces of for a user-defined C function must therefore specify two pieces of
...@@ -736,9 +723,140 @@ CREATE FUNCTION square_root(double precision) RETURNS double precision ...@@ -736,9 +723,140 @@ CREATE FUNCTION square_root(double precision) RETURNS double precision
<title>Base Types in C-Language Functions</title> <title>Base Types in C-Language Functions</title>
<para> <para>
<xref linkend="xfunc-c-type-table"> gives the C type required for To know how to write C-language functions, you need to know how
parameters in the C functions that will be loaded into PostgreSQL internally represents base data types and how they can
<productname>PostgreSQL</>. be passed to and from functions.
Internally, <productname>PostgreSQL</productname> regards a
base type as a <quote>blob of memory</quote>. The user-defined
functions that you define over a type in turn define the
way that <productname>PostgreSQL</productname> can operate
on it. That is, <productname>PostgreSQL</productname> will
only store and retrieve the data from disk and use your
user-defined functions to input, process, and output the data.
</para>
<para>
Base types can have one of three internal formats:
<itemizedlist>
<listitem>
<para>
pass by value, fixed-length
</para>
</listitem>
<listitem>
<para>
pass by reference, fixed-length
</para>
</listitem>
<listitem>
<para>
pass by reference, variable-length
</para>
</listitem>
</itemizedlist>
</para>
<para>
By-value types can only be 1, 2, or 4 bytes in length
(also 8 bytes, if <literal>sizeof(Datum)</literal> is 8 on your machine).
You should be careful
to define your types such that they will be the same
size (in bytes) on all architectures. For example, the
<literal>long</literal> type is dangerous because it
is 4 bytes on some machines and 8 bytes on others, whereas
the <type>int</type> type is 4 bytes on most
Unix machines. A reasonable implementation of
the <type>int4</type> type on Unix
machines might be:
<programlisting>
/* 4-byte integer, passed by value */
typedef int int4;
</programlisting>
</para>
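One way to avoid the size ambiguity described above is to build on the fixed-width types from `<stdint.h>`. The sketch below is illustrative only: `my_int4` and `fits_by_value` are hypothetical names, and `sizeof(void *)` is used as an assumed stand-in for `sizeof(Datum)`, which is not available outside the server headers.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 4-byte integer type with the same size on all platforms
   that provide int32_t (assuming 8-bit bytes). */
typedef int32_t my_int4;

/*
 * Returns 1 if a type of the given size could be passed by value under
 * the rule in the text: 1, 2, or 4 bytes, plus 8 bytes when the machine
 * word is 8 bytes wide. sizeof(void *) approximates sizeof(Datum) here.
 */
static int
fits_by_value(size_t size)
{
    if (size == 1 || size == 2 || size == 4)
        return 1;
    return size == 8 && sizeof(void *) == 8;
}
```

A 16-byte structure fails this check on every platform, which is why such types have to be passed by reference.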
<para>
On the other hand, fixed-length types of any size may
be passed by-reference. For example, here is a sample
implementation of a <productname>PostgreSQL</productname> type:
<programlisting>
/* 16-byte structure, passed by reference */
typedef struct
{
double x, y;
} Point;
</programlisting>
Only pointers to such types can be used when passing
them in and out of <productname>PostgreSQL</productname> functions.
To return a value of such a type, allocate the right amount of
memory with <literal>palloc</literal>, fill in the allocated memory,
and return a pointer to it. (You can also return an input value
that has the same type as the return value directly by returning
the pointer to the input value. <emphasis>Never</> modify the
contents of a pass-by-reference input value, however.)
</para>
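The allocate, fill in, and return pattern for pass-by-reference types might look like the following sketch. It is written to compile outside the server, so plain `malloc` stands in for `palloc` (an assumption of this example), and the function name `point_add` is hypothetical. The inputs are declared `const` to reflect the rule that input values must never be modified.

```c
#include <stdlib.h>

/* 16-byte structure, passed by reference */
typedef struct
{
    double x, y;
} Point;

/*
 * Returns a newly allocated Point holding the component-wise sum.
 * A real server function would allocate with palloc instead of malloc.
 */
static Point *
point_add(const Point *a, const Point *b)
{
    Point *result = malloc(sizeof(Point));

    if (result == NULL)
        return NULL;            /* malloc can fail; the caller must check */

    result->x = a->x + b->x;    /* fill in the freshly allocated memory */
    result->y = a->y + b->y;
    return result;              /* hand back a pointer, never the struct itself */
}
```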
<para>
Finally, all variable-length types must also be passed
by reference. All variable-length types must begin
with a length field of exactly 4 bytes, and all data to
be stored within that type must be located in the memory
immediately following that length field. The
length field contains the total length of the structure,
that is, it includes the size of the length field
itself.
</para>
<para>
As an example, we can define the type <type>text</type> as
follows:
<programlisting>
typedef struct {
int4 length;
char data[1];
} text;
</programlisting>
Obviously, the data field declared here is not long enough to hold
all possible strings. Since it's impossible to declare a variable-size
structure in <acronym>C</acronym>, we rely on the knowledge that the
<acronym>C</acronym> compiler won't range-check array subscripts. We
just allocate the necessary amount of space and then access the array as
if it were declared the right length. (This is a common trick, which
you can read about in many textbooks about C.)
</para>
<para>
When manipulating
variable-length types, we must be careful to allocate
the correct amount of memory and set the length field correctly.
For example, if we wanted to store 40 bytes in a <structname>text</>
structure, we might use a code fragment like this:
<programlisting>
#include "postgres.h"
...
char buffer[40]; /* our source data */
...
text *destination = (text *) palloc(VARHDRSZ + 40);
destination-&gt;length = VARHDRSZ + 40;
memcpy(destination-&gt;data, buffer, 40);
...
</programlisting>
<literal>VARHDRSZ</> is the same as <literal>sizeof(int4)</>, but
it's considered good style to use the macro <literal>VARHDRSZ</>
to refer to the size of the overhead for a variable-length type.
</para>
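<para>
The same steps can be wrapped in a small helper for an arbitrary
payload length. The sketch below is self-contained and makes several
assumptions: <type>int4</type>, <literal>VARHDRSZ</literal>, and
<type>text</type> are local stand-ins for the server definitions,
<function>malloc</function> replaces <function>palloc</function>, and
a C99 flexible array member is used in place of
<literal>data[1]</literal>. The helper names
<function>make_text</function> and
<function>text_payload_len</function> are hypothetical.
</para>

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Stand-ins for the server definitions, so the sketch is self-contained. */
typedef int32_t int4;
#define VARHDRSZ ((int4) sizeof(int4))

typedef struct
{
    int4 length;    /* total size in bytes, header included */
    char data[];    /* C99 flexible array member instead of data[1] */
} text;

/* Hypothetical helper: build a text value holding len bytes copied from src. */
text *
make_text(const char *src, size_t len)
{
    /* Allocate the header plus exactly len payload bytes. */
    text *t = (text *) malloc(VARHDRSZ + len);

    if (t == NULL)
        return NULL;

    /* The length field counts the header as well as the payload. */
    t->length = VARHDRSZ + (int4) len;
    memcpy(t->data, src, len);
    return t;
}

/* The payload size is the stored length minus the header size. */
size_t
text_payload_len(const text *t)
{
    return (size_t) (t->length - VARHDRSZ);
}
```

<para>
Note that the payload is not null-terminated; readers of the value
must rely on the length field rather than on
<function>strlen</function>.
</para>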
<para>
<xref linkend="xfunc-c-type-table"> specifies which C type
corresponds to which SQL type when writing a C-language function
that uses a built-in type of <productname>PostgreSQL</>.
The <quote>Defined In</quote> column gives the header file that The <quote>Defined In</quote> column gives the header file that
needs to be included to get the type definition. (The actual needs to be included to get the type definition. (The actual
definition may be in a different file that is included by the definition may be in a different file that is included by the
...@@ -749,9 +867,7 @@ CREATE FUNCTION square_root(double precision) RETURNS double precision ...@@ -749,9 +867,7 @@ CREATE FUNCTION square_root(double precision) RETURNS double precision
</para> </para>
<table tocentry="1" id="xfunc-c-type-table"> <table tocentry="1" id="xfunc-c-type-table">
<title>Equivalent C Types <title>Equivalent C Types for Built-In SQL Types</title>
for Built-In <productname>PostgreSQL</productname> Types</title>
<titleabbrev>Equivalent C Types</titleabbrev>
<tgroup cols="3"> <tgroup cols="3">
<thead> <thead>
<row> <row>
...@@ -862,190 +978,64 @@ CREATE FUNCTION square_root(double precision) RETURNS double precision ...@@ -862,190 +978,64 @@ CREATE FUNCTION square_root(double precision) RETURNS double precision
<entry><type>PATH*</type></entry> <entry><type>PATH*</type></entry>
<entry><filename>utils/geo_decls.h</filename></entry> <entry><filename>utils/geo_decls.h</filename></entry>
</row> </row>
<row> <row>
<entry><type>point</type></entry> <entry><type>point</type></entry>
<entry><type>POINT*</type></entry> <entry><type>POINT*</type></entry>
<entry><filename>utils/geo_decls.h</filename></entry> <entry><filename>utils/geo_decls.h</filename></entry>
</row> </row>
<row> <row>
<entry><type>regproc</type></entry> <entry><type>regproc</type></entry>
<entry><type>regproc</type></entry> <entry><type>regproc</type></entry>
<entry><filename>postgres.h</filename></entry> <entry><filename>postgres.h</filename></entry>
</row> </row>
<row> <row>
<entry><type>reltime</type></entry> <entry><type>reltime</type></entry>
<entry><type>RelativeTime</type></entry> <entry><type>RelativeTime</type></entry>
<entry><filename>utils/nabstime.h</filename></entry> <entry><filename>utils/nabstime.h</filename></entry>
</row> </row>
<row> <row>
<entry><type>text</type></entry> <entry><type>text</type></entry>
<entry><type>text*</type></entry> <entry><type>text*</type></entry>
<entry><filename>postgres.h</filename></entry> <entry><filename>postgres.h</filename></entry>
</row> </row>
<row> <row>
<entry><type>tid</type></entry> <entry><type>tid</type></entry>
<entry><type>ItemPointer</type></entry> <entry><type>ItemPointer</type></entry>
<entry><filename>storage/itemptr.h</filename></entry> <entry><filename>storage/itemptr.h</filename></entry>
</row> </row>
<row> <row>
<entry><type>time</type></entry> <entry><type>time</type></entry>
<entry><type>TimeADT</type></entry> <entry><type>TimeADT</type></entry>
<entry><filename>utils/date.h</filename></entry> <entry><filename>utils/date.h</filename></entry>
</row> </row>
<row> <row>
<entry><type>time with time zone</type></entry> <entry><type>time with time zone</type></entry>
<entry><type>TimeTzADT</type></entry> <entry><type>TimeTzADT</type></entry>
<entry><filename>utils/date.h</filename></entry> <entry><filename>utils/date.h</filename></entry>
</row> </row>
<row> <row>
<entry><type>timestamp</type></entry> <entry><type>timestamp</type></entry>
<entry><type>Timestamp*</type></entry> <entry><type>Timestamp*</type></entry>
<entry><filename>utils/timestamp.h</filename></entry> <entry><filename>utils/timestamp.h</filename></entry>
</row> </row>
<row> <row>
<entry><type>tinterval</type></entry> <entry><type>tinterval</type></entry>
<entry><type>TimeInterval</type></entry> <entry><type>TimeInterval</type></entry>
<entry><filename>utils/nabstime.h</filename></entry> <entry><filename>utils/nabstime.h</filename></entry>
</row> </row>
<row> <row>
<entry><type>varchar</type></entry> <entry><type>varchar</type></entry>
<entry><type>VarChar*</type></entry> <entry><type>VarChar*</type></entry>
<entry><filename>postgres.h</filename></entry> <entry><filename>postgres.h</filename></entry>
</row> </row>
<row> <row>
<entry><type>xid</type></entry> <entry><type>xid</type></entry>
<entry><type>TransactionId</type></entry> <entry><type>TransactionId</type></entry>
<entry><filename>postgres.h</filename></entry> <entry><filename>postgres.h</filename></entry>
</row> </row>
</tbody> </tbody>
</tgroup> </tgroup>
</table> </table>
<para>
Internally, <productname>PostgreSQL</productname> regards a
base type as a <quote>blob of memory</quote>. The user-defined
functions that you define over a type in turn define the
way that <productname>PostgreSQL</productname> can operate
on it. That is, <productname>PostgreSQL</productname> will
only store and retrieve the data from disk and use your
user-defined functions to input, process, and output the data.
Base types can have one of three internal formats:
<itemizedlist>
<listitem>
<para>
pass by value, fixed-length
</para>
</listitem>
<listitem>
<para>
pass by reference, fixed-length
</para>
</listitem>
<listitem>
<para>
pass by reference, variable-length
</para>
</listitem>
</itemizedlist>
</para>
<para>
By-value types can only be 1, 2 or 4 bytes in length
(also 8 bytes, if <literal>sizeof(Datum)</literal> is 8 on your machine).
You should be careful
to define your types such that they will be the same
size (in bytes) on all architectures. For example, the
<literal>long</literal> type is dangerous because it
is 4 bytes on some machines and 8 bytes on others, whereas
<type>int</type> type is 4 bytes on most
Unix machines. A reasonable implementation of
the <type>int4</type> type on Unix
machines might be:
<programlisting>
/* 4-byte integer, passed by value */
typedef int int4;
</programlisting>
<productname>PostgreSQL</productname> automatically figures
things out so that the integer types really have the size they
advertise.
</para>
<para>
On the other hand, fixed-length types of any size may
be passed by-reference. For example, here is a sample
implementation of a <productname>PostgreSQL</productname> type:
<programlisting>
/* 16-byte structure, passed by reference */
typedef struct
{
double x, y;
} Point;
</programlisting>
</para>
<para>
Only pointers to such types can be used when passing
them in and out of <productname>PostgreSQL</productname> functions.
To return a value of such a type, allocate the right amount of
memory with <literal>palloc()</literal>, fill in the allocated memory,
and return a pointer to it. (Alternatively, you can return an input
value of the same type by returning its pointer. <emphasis>Never</>
modify the contents of a pass-by-reference input value, however.)
</para>
<para>
Finally, all variable-length types must also be passed
by reference. All variable-length types must begin
with a length field of exactly 4 bytes, and all data to
be stored within that type must be located in the memory
immediately following that length field. The
length field is the total length of the structure
(i.e., it includes the size of the length field
itself). We can define the text type as follows:
<programlisting>
typedef struct {
int4 length;
char data[1];
} text;
</programlisting>
</para>
<para>
Obviously, the data field declared here is not long enough to hold
all possible strings. Since it's impossible to declare a variable-size
structure in <acronym>C</acronym>, we rely on the knowledge that the
<acronym>C</acronym> compiler won't range-check array subscripts. We
just allocate the necessary amount of space and then access the array as
if it were declared the right length. (If this isn't a familiar trick to
you, you may wish to spend some time with an introductory
<acronym>C</acronym> programming textbook before delving deeper into
<productname>PostgreSQL</productname> server programming.)
When manipulating
variable-length types, we must be careful to allocate
the correct amount of memory and set the length field correctly.
For example, if we wanted to store 40 bytes in a text
structure, we might use a code fragment like this:
<programlisting>
#include "postgres.h"
...
char buffer[40]; /* our source data */
...
text *destination = (text *) palloc(VARHDRSZ + 40);
destination-&gt;length = VARHDRSZ + 40;
memcpy(destination-&gt;data, buffer, 40);
...
</programlisting>
<literal>VARHDRSZ</> is the same as <literal>sizeof(int4)</>, but
it's considered good style to use the macro <literal>VARHDRSZ</>
to refer to the size of the overhead for a variable-length type.
</para>
<para> <para>
Now that we've gone over all of the possible structures Now that we've gone over all of the possible structures
...@@ -1054,7 +1044,7 @@ memcpy(destination-&gt;data, buffer, 40); ...@@ -1054,7 +1044,7 @@ memcpy(destination-&gt;data, buffer, 40);
</sect2> </sect2>
<sect2> <sect2>
<title>Version-0 Calling Conventions for C-Language Functions</title> <title>Calling Conventions Version 0 for C-Language Functions</title>
<para> <para>
We present the <quote>old style</quote> calling convention first --- although We present the <quote>old style</quote> calling convention first --- although
...@@ -1072,7 +1062,7 @@ memcpy(destination-&gt;data, buffer, 40); ...@@ -1072,7 +1062,7 @@ memcpy(destination-&gt;data, buffer, 40);
#include "postgres.h" #include "postgres.h"
#include &lt;string.h&gt; #include &lt;string.h&gt;
/* By Value */ /* by value */
int int
add_one(int arg) add_one(int arg)
...@@ -1080,7 +1070,7 @@ add_one(int arg) ...@@ -1080,7 +1070,7 @@ add_one(int arg)
return arg + 1; return arg + 1;
} }
/* By Reference, Fixed Length */ /* by reference, fixed length */
float8 * float8 *
add_one_float8(float8 *arg) add_one_float8(float8 *arg)
...@@ -1103,7 +1093,7 @@ makepoint(Point *pointx, Point *pointy) ...@@ -1103,7 +1093,7 @@ makepoint(Point *pointx, Point *pointy)
return new_point; return new_point;
} }
/* By Reference, Variable Length */ /* by reference, variable length */
text * text *
copytext(text *t) copytext(text *t)
...@@ -1144,47 +1134,48 @@ concat_text(text *arg1, text *arg2) ...@@ -1144,47 +1134,48 @@ concat_text(text *arg1, text *arg2)
with commands like this: with commands like this:
<programlisting> <programlisting>
CREATE FUNCTION add_one(int4) RETURNS int4 CREATE FUNCTION add_one(integer) RETURNS integer
AS '<replaceable>PGROOT</replaceable>/tutorial/funcs' LANGUAGE C AS '<replaceable>DIRECTORY</replaceable>/funcs', 'add_one'
WITH (isStrict); LANGUAGE C STRICT;
-- note overloading of SQL function name add_one() -- note overloading of SQL function name "add_one"
CREATE FUNCTION add_one(float8) RETURNS float8 CREATE FUNCTION add_one(double precision) RETURNS double precision
AS '<replaceable>PGROOT</replaceable>/tutorial/funcs', AS '<replaceable>DIRECTORY</replaceable>/funcs', 'add_one_float8'
'add_one_float8' LANGUAGE C STRICT;
LANGUAGE C WITH (isStrict);
CREATE FUNCTION makepoint(point, point) RETURNS point CREATE FUNCTION makepoint(point, point) RETURNS point
AS '<replaceable>PGROOT</replaceable>/tutorial/funcs' LANGUAGE C AS '<replaceable>DIRECTORY</replaceable>/funcs', 'makepoint'
WITH (isStrict); LANGUAGE C STRICT;
CREATE FUNCTION copytext(text) RETURNS text CREATE FUNCTION copytext(text) RETURNS text
AS '<replaceable>PGROOT</replaceable>/tutorial/funcs' LANGUAGE C AS '<replaceable>DIRECTORY</replaceable>/funcs', 'copytext'
WITH (isStrict); LANGUAGE C STRICT;
CREATE FUNCTION concat_text(text, text) RETURNS text CREATE FUNCTION concat_text(text, text) RETURNS text
AS '<replaceable>PGROOT</replaceable>/tutorial/funcs' LANGUAGE C AS '<replaceable>DIRECTORY</replaceable>/funcs', 'concat_text'
WITH (isStrict); LANGUAGE C STRICT;
</programlisting> </programlisting>
</para> </para>
<para> <para>
Here <replaceable>PGROOT</replaceable> stands for the full path to Here, <replaceable>DIRECTORY</replaceable> stands for the
the <productname>PostgreSQL</productname> source tree. (Better style would directory of the shared library file (for instance the PostgreSQL
be to use just <literal>'funcs'</> in the <literal>AS</> clause, tutorial directory, which contains the code for the examples used
after having added <replaceable>PGROOT</replaceable><literal>/tutorial</> in this section). (Better style would be to use just
to the search path. In any case, we may omit the system-specific <literal>'funcs'</> in the <literal>AS</> clause, after having
extension for a shared library, commonly <literal>.so</literal> or added <replaceable>DIRECTORY</replaceable> to the search path.
In any case, we may omit the system-specific extension for a
shared library, commonly <literal>.so</literal> or
<literal>.sl</literal>.) <literal>.sl</literal>.)
</para> </para>
<para> <para>
Notice that we have specified the functions as <quote>strict</quote>, Notice that we have specified the functions as <quote>strict</quote>,
meaning that meaning that
the system should automatically assume a NULL result if any input the system should automatically assume a null result if any input
value is NULL. By doing this, we avoid having to check for NULL inputs value is null. By doing this, we avoid having to check for null inputs
in the function code. Without this, we'd have to check for null values in the function code. Without this, we'd have to check for null values
explicitly, for example by checking for a null pointer for each explicitly, by checking for a null pointer for each
pass-by-reference argument. (For pass-by-value arguments, we don't pass-by-reference argument. (For pass-by-value arguments, we don't
even have a way to check!) even have a way to check!)
</para> </para>
...@@ -1192,15 +1183,15 @@ CREATE FUNCTION concat_text(text, text) RETURNS text ...@@ -1192,15 +1183,15 @@ CREATE FUNCTION concat_text(text, text) RETURNS text
<para> <para>
Although this calling convention is simple to use, Although this calling convention is simple to use,
it is not very portable; on some architectures there are problems it is not very portable; on some architectures there are problems
with passing smaller-than-int data types this way. Also, there is with passing data types that are smaller than <type>int</type> this way. Also, there is
no simple way to return a NULL result, nor to cope with NULL arguments no simple way to return a null result, nor to cope with null arguments
in any way other than making the function strict. The version-1 in any way other than making the function strict. The version-1
convention, presented next, overcomes these objections. convention, presented next, overcomes these objections.
</para> </para>
</sect2> </sect2>
<sect2> <sect2>
<title>Version-1 Calling Conventions for C-Language Functions</title> <title>Calling Conventions Version 1 for C-Language Functions</title>
<para> <para>
The version-1 calling convention relies on macros to suppress most The version-1 calling convention relies on macros to suppress most
...@@ -1213,21 +1204,26 @@ Datum funcname(PG_FUNCTION_ARGS) ...@@ -1213,21 +1204,26 @@ Datum funcname(PG_FUNCTION_ARGS)
<programlisting> <programlisting>
PG_FUNCTION_INFO_V1(funcname); PG_FUNCTION_INFO_V1(funcname);
</programlisting> </programlisting>
must appear in the same source file (conventionally it's written must appear in the same source file. (Conventionally, it's
just before the function itself). This macro call is not needed written just before the function itself.) This macro call is not
for <literal>internal</>-language functions, since needed for <literal>internal</>-language functions, since
<productname>PostgreSQL</> currently <productname>PostgreSQL</> assumes that all internal functions
assumes all internal functions are version-1. However, it is use the version-1 convention. It is, however, required for
<emphasis>required</emphasis> for dynamically-loaded functions. dynamically-loaded functions.
</para> </para>
<para> <para>
In a version-1 function, each actual argument is fetched using a In a version-1 function, each actual argument is fetched using a
<function>PG_GETARG_<replaceable>xxx</replaceable>()</function> <function>PG_GETARG_<replaceable>xxx</replaceable>()</function>
macro that corresponds to the argument's data type, and the result macro that corresponds to the argument's data type, and the
is returned using a result is returned using a
<function>PG_RETURN_<replaceable>xxx</replaceable>()</function> <function>PG_RETURN_<replaceable>xxx</replaceable>()</function>
macro for the return type. macro for the return type.
<function>PG_GETARG_<replaceable>xxx</replaceable>()</function>
takes as its argument the number of the function argument to
fetch, where the count starts at 0.
<function>PG_RETURN_<replaceable>xxx</replaceable>()</function>
takes as its argument the actual value to return.
</para> </para>
<para> <para>
...@@ -1238,7 +1234,7 @@ PG_FUNCTION_INFO_V1(funcname); ...@@ -1238,7 +1234,7 @@ PG_FUNCTION_INFO_V1(funcname);
#include &lt;string.h&gt; #include &lt;string.h&gt;
#include "fmgr.h" #include "fmgr.h"
/* By Value */ /* by value */
PG_FUNCTION_INFO_V1(add_one); PG_FUNCTION_INFO_V1(add_one);
...@@ -1250,14 +1246,14 @@ add_one(PG_FUNCTION_ARGS) ...@@ -1250,14 +1246,14 @@ add_one(PG_FUNCTION_ARGS)
PG_RETURN_INT32(arg + 1); PG_RETURN_INT32(arg + 1);
} }
/* By Reference, Fixed Length */ /* by reference, fixed length */
PG_FUNCTION_INFO_V1(add_one_float8); PG_FUNCTION_INFO_V1(add_one_float8);
Datum Datum
add_one_float8(PG_FUNCTION_ARGS) add_one_float8(PG_FUNCTION_ARGS)
{ {
/* The macros for FLOAT8 hide its pass-by-reference nature */ /* The macros for FLOAT8 hide its pass-by-reference nature. */
float8 arg = PG_GETARG_FLOAT8(0); float8 arg = PG_GETARG_FLOAT8(0);
PG_RETURN_FLOAT8(arg + 1.0); PG_RETURN_FLOAT8(arg + 1.0);
...@@ -1268,7 +1264,7 @@ PG_FUNCTION_INFO_V1(makepoint); ...@@ -1268,7 +1264,7 @@ PG_FUNCTION_INFO_V1(makepoint);
Datum Datum
makepoint(PG_FUNCTION_ARGS) makepoint(PG_FUNCTION_ARGS)
{ {
/* Here, the pass-by-reference nature of Point is not hidden */ /* Here, the pass-by-reference nature of Point is not hidden. */
Point *pointx = PG_GETARG_POINT_P(0); Point *pointx = PG_GETARG_POINT_P(0);
Point *pointy = PG_GETARG_POINT_P(1); Point *pointy = PG_GETARG_POINT_P(1);
Point *new_point = (Point *) palloc(sizeof(Point)); Point *new_point = (Point *) palloc(sizeof(Point));
...@@ -1279,7 +1275,7 @@ makepoint(PG_FUNCTION_ARGS) ...@@ -1279,7 +1275,7 @@ makepoint(PG_FUNCTION_ARGS)
PG_RETURN_POINT_P(new_point); PG_RETURN_POINT_P(new_point);
} }
/* By Reference, Variable Length */ /* by reference, variable length */
PG_FUNCTION_INFO_V1(copytext); PG_FUNCTION_INFO_V1(copytext);
...@@ -1327,33 +1323,27 @@ concat_text(PG_FUNCTION_ARGS) ...@@ -1327,33 +1323,27 @@ concat_text(PG_FUNCTION_ARGS)
<para> <para>
At first glance, the version-1 coding conventions may appear to At first glance, the version-1 coding conventions may appear to
be just pointless obscurantism. However, they do offer a number be just pointless obscurantism. They do, however, offer a number
of improvements, because the macros can hide unnecessary detail. of improvements, because the macros can hide unnecessary detail.
An example is that in coding <function>add_one_float8</>, we no longer need to An example is that in coding <function>add_one_float8</>, we no longer need to
be aware that <type>float8</type> is a pass-by-reference type. Another be aware that <type>float8</type> is a pass-by-reference type. Another
example is that the <literal>GETARG</> macros for variable-length types hide example is that the <literal>GETARG</> macros for variable-length types allow
the need to deal with fetching <quote>toasted</quote> (compressed or for more efficient fetching of <quote>toasted</quote> (compressed or
out-of-line) values. The old-style <function>copytext</function> out-of-line) values.
and <function>concat_text</function> functions shown above are
actually wrong in the presence of toasted values, because they
don't call <function>pg_detoast_datum()</function> on their
inputs. (The handler for old-style dynamically-loaded functions
currently takes care of this detail, but it does so less
efficiently than is possible for a version-1 function.)
</para> </para>
<para> <para>
One big improvement in version-1 functions is better handling of NULL One big improvement in version-1 functions is better handling of null
inputs and results. The macro <function>PG_ARGISNULL(<replaceable>n</>)</function> inputs and results. The macro <function>PG_ARGISNULL(<replaceable>n</>)</function>
allows a function to test whether each input is NULL (of course, doing allows a function to test whether each input is null. (Of course, doing
this is only necessary in functions not declared <quote>strict</>). this is only necessary in functions not declared <quote>strict</>.)
As with the As with the
<function>PG_GETARG_<replaceable>xxx</replaceable>()</function> macros, <function>PG_GETARG_<replaceable>xxx</replaceable>()</function> macros,
the input arguments are counted beginning at zero. Note that one the input arguments are counted beginning at zero. Note that one
should refrain from executing should refrain from executing
<function>PG_GETARG_<replaceable>xxx</replaceable>()</function> until <function>PG_GETARG_<replaceable>xxx</replaceable>()</function> until
one has verified that the argument isn't NULL. one has verified that the argument isn't null.
To return a NULL result, execute <function>PG_RETURN_NULL()</function>; To return a null result, execute <function>PG_RETURN_NULL()</function>;
this works in both strict and nonstrict functions. this works in both strict and nonstrict functions.
</para> </para>
...@@ -1362,45 +1352,149 @@ concat_text(PG_FUNCTION_ARGS) ...@@ -1362,45 +1352,149 @@ concat_text(PG_FUNCTION_ARGS)
variants of the variants of the
<function>PG_GETARG_<replaceable>xxx</replaceable>()</function> <function>PG_GETARG_<replaceable>xxx</replaceable>()</function>
macros. The first of these, macros. The first of these,
<function>PG_GETARG_<replaceable>xxx</replaceable>_COPY()</function> <function>PG_GETARG_<replaceable>xxx</replaceable>_COPY()</function>,
guarantees to return a copy of the specified parameter which is guarantees to return a copy of the specified argument that is
safe for writing into. (The normal macros will sometimes return a safe for writing into. (The normal macros will sometimes return a
pointer to a value that is physically stored in a table, and so pointer to a value that is physically stored in a table, which
must not be written to. Using the must not be written to. Using the
<function>PG_GETARG_<replaceable>xxx</replaceable>_COPY()</function> <function>PG_GETARG_<replaceable>xxx</replaceable>_COPY()</function>
macros guarantees a writable result.) macros guarantees a writable result.)
</para>
<para>
The second variant consists of the The second variant consists of the
<function>PG_GETARG_<replaceable>xxx</replaceable>_SLICE()</function> <function>PG_GETARG_<replaceable>xxx</replaceable>_SLICE()</function>
macros which take three parameters. The first is the number of the macros which take three arguments. The first is the number of the
parameter (as above). The second and third are the offset and function argument (as above). The second and third are the offset and
length of the segment to be returned. Offsets are counted from length of the segment to be returned. Offsets are counted from
zero, and a negative length requests that the remainder of the zero, and a negative length requests that the remainder of the
value be returned. These routines provide more efficient access to value be returned. These macros provide more efficient access to
parts of large values in the case where they have storage type parts of large values in the case where they have storage type
<quote>external</quote>. (The storage type of a column can be specified using <quote>external</quote>. (The storage type of a column can be specified using
<literal>ALTER TABLE <replaceable>tablename</replaceable> ALTER <literal>ALTER TABLE <replaceable>tablename</replaceable> ALTER
COLUMN <replaceable>colname</replaceable> SET STORAGE COLUMN <replaceable>colname</replaceable> SET STORAGE
<replaceable>storagetype</replaceable></literal>. Storage type is one of <replaceable>storagetype</replaceable></literal>. <replaceable>storagetype</replaceable> is one of
<literal>plain</>, <literal>external</>, <literal>extended</literal>, <literal>plain</>, <literal>external</>, <literal>extended</literal>,
or <literal>main</>.) or <literal>main</>.)
</para> </para>
<para> <para>
The version-1 function call conventions make it possible to Finally, the version-1 function call conventions make it possible
return <quote>set</quote> results and implement trigger functions and to return set results (<xref linkend="xfunc-c-return-set">) and
procedural-language call handlers. Version-1 code is also more implement trigger functions (<xref linkend="triggers">) and
portable than version-0, because it does not break ANSI C restrictions procedural-language call handlers (<xref
on function call protocol. For more details see linkend="xfunc-plhandler">). Version-1 code is also more
<filename>src/backend/utils/fmgr/README</filename> in the source portable than version-0, because it does not break restrictions
distribution. on function call protocol in the C standard. For more details
see <filename>src/backend/utils/fmgr/README</filename> in the
source distribution.
</para> </para>
</sect2> </sect2>
<sect2> <sect2>
<title>Composite Types in C-Language Functions</title> <title>Writing Code</title>
<para>
Before we turn to the more advanced topics, we should discuss
some coding rules for PostgreSQL C-language functions. While it
may be possible to load functions written in languages other than
C into <productname>PostgreSQL</productname>, this is usually
difficult (when it is possible at all) because other languages,
such as C++, FORTRAN, or Pascal, often do not follow the same
calling convention as C. That is, other languages do not pass
arguments and return values between functions in the same way.
For this reason, we will assume that your C-language functions
are actually written in C.
</para>
<para>
The basic rules for writing and building C functions are as follows:
<itemizedlist>
<listitem>
<para>
Use <literal>pg_config
--includedir-server</literal><indexterm><primary>pg_config</></>
to find out where the <productname>PostgreSQL</> server header
files are installed on your system (or the system that your
users will be running on). This option is new with
<productname>PostgreSQL</> 7.2. For
<productname>PostgreSQL</> 7.1 you should use the option
<option>--includedir</option>. (<command>pg_config</command>
will exit with a non-zero status if it encounters an unknown
option.) For releases prior to 7.1 you will have to guess,
but since that was before the current calling conventions were
introduced, it is unlikely that you want to support those
releases.
</para>
</listitem>
<listitem>
<para>
When allocating memory, use the
<productname>PostgreSQL</productname> functions
<function>palloc</function> and <function>pfree</function>
instead of the corresponding C library functions
<function>malloc</function> and <function>free</function>.
The memory allocated by <function>palloc</function> will be
freed automatically at the end of each transaction, preventing
memory leaks.
</para>
</listitem>
<listitem>
<para>
Always zero the bytes of your structures using
<function>memset</function> or <function>bzero</function>.
Several routines (such as the hash access method, hash joins,
and the sort algorithm) compute functions of the raw bits
contained in your structure. Even if you initialize all
fields of your structure, there may be several bytes of
alignment padding (holes in the structure) that may contain
garbage values.
</para>
</listitem>
<listitem>
<para>
Most of the internal <productname>PostgreSQL</productname>
types are declared in <filename>postgres.h</filename>, while
the function manager interfaces
(<symbol>PG_FUNCTION_ARGS</symbol>, etc.) are in
<filename>fmgr.h</filename>, so you will need to include at
least these two files. For portability reasons it's best to
include <filename>postgres.h</filename> <emphasis>first</>,
before any other system or user header files. Including
<filename>postgres.h</filename> will also include
<filename>elog.h</filename> and <filename>palloc.h</filename>
for you.
</para>
</listitem>
<listitem>
<para>
Symbol names defined within object files must not conflict
with each other or with symbols defined in the
<productname>PostgreSQL</productname> server executable. You
will have to rename your functions or variables if you get
error messages to this effect.
</para>
</listitem>
<listitem>
<para>
Compiling and linking your code so that it can be dynamically
loaded into <productname>PostgreSQL</productname> always
requires special flags. See <xref linkend="dfunc"> for a
detailed explanation of how to do it for your particular
operating system.
</para>
</listitem>
</itemizedlist>
</para>
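<para>
The rule about zeroing structures can be illustrated with a small
standalone sketch. The type <type>Tagged</type> and both helpers are
hypothetical and exist only for this example. On most platforms the
compiler inserts alignment padding between <structfield>tag</> and
<structfield>value</>, and the byte-wise comparison stands in for
server code that hashes or compares the raw bits of a value.
</para>

```c
#include <string.h>

/* Hypothetical type: most ABIs insert padding between tag and value. */
typedef struct
{
    char   tag;
    double value;
} Tagged;

/* Zero the whole structure first, then assign the individual fields. */
void
tagged_init(Tagged *t, char tag, double value)
{
    memset(t, 0, sizeof(Tagged));
    t->tag = tag;
    t->value = value;
}

/* Byte-wise equality over the full structure, padding included. */
int
tagged_same_bits(const Tagged *a, const Tagged *b)
{
    return memcmp(a, b, sizeof(Tagged)) == 0;
}
```

<para>
Without the <function>memset</function> call, two values with equal
fields could still differ in their padding bytes and therefore
compare or hash differently.
</para>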
</sect2>
&dfunc;
<sect2>
<title>Composite-Type Arguments in C-Language Functions</title>
<para> <para>
Composite types do not have a fixed layout like C Composite types do not have a fixed layout like C
...@@ -1409,26 +1503,28 @@ concat_text(PG_FUNCTION_ARGS) ...@@ -1409,26 +1503,28 @@ concat_text(PG_FUNCTION_ARGS)
part of an inheritance hierarchy may have different part of an inheritance hierarchy may have different
fields than other members of the same inheritance hierarchy. fields than other members of the same inheritance hierarchy.
Therefore, <productname>PostgreSQL</productname> provides Therefore, <productname>PostgreSQL</productname> provides
a procedural interface for accessing fields of composite types a function interface for accessing fields of composite types
from C. As <productname>PostgreSQL</productname> processes from C.
a set of rows, each row will be passed into your </para>
function as an opaque structure of type <literal>TUPLE</literal>.
<para>
Suppose we want to write a function to answer the query Suppose we want to write a function to answer the query
<programlisting> <programlisting>
SELECT name, c_overpaid(emp, 1500) AS overpaid SELECT name, c_overpaid(emp, 1500) AS overpaid
FROM emp FROM emp
WHERE name = 'Bill' OR name = 'Sam'; WHERE name = 'Bill' OR name = 'Sam';
</programlisting> </programlisting>
In the query above, we can define <function>c_overpaid</> as: Using call conventions version 0, we can define
<function>c_overpaid</> as:
<programlisting> <programlisting>
#include "postgres.h" #include "postgres.h"
#include "executor/executor.h" /* for GetAttributeByName() */ #include "executor/executor.h" /* for GetAttributeByName() */
bool bool
c_overpaid(TupleTableSlot *t, /* the current row of EMP */ c_overpaid(TupleTableSlot *t, /* the current row of emp */
int32 limit) int32 limit)
{ {
bool isnull; bool isnull;
...@@ -1436,11 +1532,16 @@ c_overpaid(TupleTableSlot *t, /* the current row of EMP */ ...@@ -1436,11 +1532,16 @@ c_overpaid(TupleTableSlot *t, /* the current row of EMP */
salary = DatumGetInt32(GetAttributeByName(t, "salary", &amp;isnull)); salary = DatumGetInt32(GetAttributeByName(t, "salary", &amp;isnull));
if (isnull) if (isnull)
return (false); return false;
return salary &gt; limit; return salary &gt; limit;
} }
</programlisting>
/* In version-1 coding, the above would look like this: */ In version-1 coding, the above would look like this:
<programlisting>
#include "postgres.h"
#include "executor/executor.h" /* for GetAttributeByName() */
PG_FUNCTION_INFO_V1(c_overpaid); PG_FUNCTION_INFO_V1(c_overpaid);
@@ -1455,7 +1556,7 @@ c_overpaid(PG_FUNCTION_ARGS)
salary = DatumGetInt32(GetAttributeByName(t, "salary", &amp;isnull)); salary = DatumGetInt32(GetAttributeByName(t, "salary", &amp;isnull));
if (isnull) if (isnull)
PG_RETURN_BOOL(false); PG_RETURN_BOOL(false);
/* Alternatively, we might prefer to do PG_RETURN_NULL() for null salary */ /* Alternatively, we might prefer to do PG_RETURN_NULL() for null salary. */
PG_RETURN_BOOL(salary &gt; limit); PG_RETURN_BOOL(salary &gt; limit);
} }
@@ -1465,7 +1566,7 @@ c_overpaid(PG_FUNCTION_ARGS)
<para> <para>
<function>GetAttributeByName</function> is the <function>GetAttributeByName</function> is the
<productname>PostgreSQL</productname> system function that <productname>PostgreSQL</productname> system function that
returns attributes out of the current row. It has returns attributes out of the specified row. It has
three arguments: the argument of type <type>TupleTableSlot*</type> passed into three arguments: the argument of type <type>TupleTableSlot*</type> passed into
the function, the name of the desired attribute, and a the function, the name of the desired attribute, and a
return parameter that tells whether the attribute return parameter that tells whether the attribute
@@ -1475,55 +1576,43 @@ c_overpaid(PG_FUNCTION_ARGS)
</para> </para>
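The out-parameter contract described for <function>GetAttributeByName</function> (look up an attribute by name, report nullness through a pointer) can be tried outside the server. The following is a standalone sketch and not PostgreSQL code: `Field`, `Row`, `get_int_by_name`, and `overpaid` are hypothetical names that only mimic the shape of the real call. The sketch reports a missing attribute as null, which is a simplification made here and not a statement about the server's behavior.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one attribute of a row. */
typedef struct
{
    const char *name;
    int         value;
    bool        isnull;
} Field;

/* Hypothetical stand-in for a row (this is not a TupleTableSlot). */
typedef struct
{
    const Field *fields;
    size_t       nfields;
} Row;

/*
 * Look up an integer attribute by name.  *isnull is set to true when the
 * attribute is null or (a simplification of this sketch) does not exist;
 * the return value is then 0 and must not be interpreted.
 */
int
get_int_by_name(const Row *row, const char *name, bool *isnull)
{
    size_t i;

    assert(row != NULL && name != NULL && isnull != NULL);

    for (i = 0; i < row->nfields; i++)
    {
        if (strcmp(row->fields[i].name, name) == 0)
        {
            *isnull = row->fields[i].isnull;
            return *isnull ? 0 : row->fields[i].value;
        }
    }
    *isnull = true;
    return 0;
}

/* Same decision as c_overpaid: a null salary counts as "not overpaid". */
bool
overpaid(const Row *emp, int limit)
{
    bool isnull;
    int  salary = get_int_by_name(emp, "salary", &isnull);

    if (isnull)
        return false;
    return salary > limit;
}
```

The point of the pattern is that the caller always checks the flag before looking at the returned value, exactly as both versions of `c_overpaid` do.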
<para> <para>
The following command lets <productname>PostgreSQL</productname> The following command declares the function
know about the <function>c_overpaid</function> function: <function>c_overpaid</function> in SQL:
<programlisting> <programlisting>
CREATE FUNCTION c_overpaid(emp, int4) CREATE FUNCTION c_overpaid(emp, integer)
RETURNS bool RETURNS boolean
AS '<replaceable>PGROOT</replaceable>/tutorial/funcs' AS '<replaceable>DIRECTORY</replaceable>/funcs', 'c_overpaid'
LANGUAGE C; LANGUAGE C;
</programlisting> </programlisting>
</para> </para>
</sect2> </sect2>
<sect2> <sect2>
<title>Table Function API</title> <title>Returning Rows (Composite Types) from C-Language Functions</title>
<para>
The Table Function API assists in the creation of user-defined
C language table functions (<xref linkend="xfunc-tablefunctions">).
Table functions are functions that produce a set of rows, made up of
either base (scalar) data types, or composite (multi-column) data types.
The API is split into two main components: support for returning
composite data types, and support for returning multiple rows
(set-returning functions or <acronym>SRF</>s).
</para>
<para> <para>
The Table Function API relies on macros and functions to suppress most To return a row or composite-type value from a C-language
of the complexity of building composite data types and returning multiple function, you can use a special API that provides macros and
results. A table function must follow the version-1 calling convention functions to hide most of the complexity of building composite
described above. In addition, the source file must include: data types. To use this API, the source file must include:
<programlisting> <programlisting>
#include "funcapi.h" #include "funcapi.h"
</programlisting> </programlisting>
</para> </para>
<sect3>
<title>Returning Rows (Composite Types)</title>
<para> <para>
The Table Function API support for returning composite data types The support for returning composite data types (or rows) starts
(or rows) starts with the <structname>AttInMetadata</> with the <structname>AttInMetadata</> structure. This structure
structure. This structure holds arrays of individual attribute holds arrays of individual attribute information needed to create
information needed to create a row from raw C strings. It also a row from raw C strings. The information contained in the
saves a pointer to the <structname>TupleDesc</>. The information structure is derived from a <structname>TupleDesc</> structure,
carried here is derived from the <structname>TupleDesc</>, but it but it is stored to avoid redundant computations on each call to
is stored here to avoid redundant CPU cycles on each call to a a set-returning function (see next section). In the case of a
table function. In the case of a function returning a set, the function returning a set, the <structname>AttInMetadata</>
<structname>AttInMetadata</> structure should be computed structure should be computed once during the first call and saved
once during the first call and saved for re-use in later calls. for reuse in later calls. <structname>AttInMetadata</> also
saves a pointer to the original <structname>TupleDesc</>.
<programlisting> <programlisting>
typedef struct AttInMetadata typedef struct AttInMetadata
{ {
@@ -1548,13 +1637,13 @@ typedef struct AttInMetadata
<programlisting> <programlisting>
TupleDesc RelationNameGetTupleDesc(const char *relname) TupleDesc RelationNameGetTupleDesc(const char *relname)
</programlisting> </programlisting>
to get a <structname>TupleDesc</> based on a specified relation, or to get a <structname>TupleDesc</> for a named relation, or
<programlisting> <programlisting>
TupleDesc TypeGetTupleDesc(Oid typeoid, List *colaliases) TupleDesc TypeGetTupleDesc(Oid typeoid, List *colaliases)
</programlisting> </programlisting>
to get a <structname>TupleDesc</> based on a type OID. This can to get a <structname>TupleDesc</> based on a type OID. This can
be used to get a <structname>TupleDesc</> for a base (scalar) or be used to get a <structname>TupleDesc</> for a base or
composite (relation) type. Then composite type. Then
<programlisting> <programlisting>
AttInMetadata *TupleDescGetAttInMetadata(TupleDesc tupdesc) AttInMetadata *TupleDescGetAttInMetadata(TupleDesc tupdesc)
</programlisting> </programlisting>
@@ -1562,8 +1651,7 @@ AttInMetadata *TupleDescGetAttInMetadata(TupleDesc tupdesc)
initialized based on the given initialized based on the given
<structname>TupleDesc</>. <structname>AttInMetadata</> can be <structname>TupleDesc</>. <structname>AttInMetadata</> can be
used in conjunction with C strings to produce a properly formed used in conjunction with C strings to produce a properly formed
tuple. The metadata is stored here to avoid redundant work across row value (internally called tuple).
multiple calls.
</para> </para>
<para> <para>
@@ -1574,7 +1662,7 @@ TupleTableSlot *TupleDescGetSlot(TupleDesc tupdesc)
</programlisting> </programlisting>
to initialize this tuple slot, or obtain one through other (user provided) to initialize this tuple slot, or obtain one through other (user provided)
means. The tuple slot is needed to create a <type>Datum</> for return by the means. The tuple slot is needed to create a <type>Datum</> for return by the
function. The same slot can (and should) be re-used on each call. function. The same slot can (and should) be reused on each call.
</para> </para>
<para> <para>
@@ -1583,13 +1671,13 @@ TupleTableSlot *TupleDescGetSlot(TupleDesc tupdesc)
HeapTuple BuildTupleFromCStrings(AttInMetadata *attinmeta, char **values) HeapTuple BuildTupleFromCStrings(AttInMetadata *attinmeta, char **values)
</programlisting> </programlisting>
can be used to build a <structname>HeapTuple</> given user data can be used to build a <structname>HeapTuple</> given user data
in C string form. <quote>values</quote> is an array of C strings, one for in C string form. <literal>values</literal> is an array of C strings, one for
each attribute of the return tuple. Each C string should be in each attribute of the return row. Each C string should be in
the form expected by the input function of the attribute data the form expected by the input function of the attribute data
type. In order to return a null value for one of the attributes, type. In order to return a null value for one of the attributes,
the corresponding pointer in the <parameter>values</> array the corresponding pointer in the <parameter>values</> array
should be set to <symbol>NULL</>. This function will need to should be set to <symbol>NULL</>. This function will need to
be called again for each tuple you return. be called again for each row you return.
</para> </para>
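The convention for the <parameter>values</parameter> array (one C string per attribute, with a <symbol>NULL</symbol> pointer standing for an SQL null) can be illustrated without the server headers. This is a minimal standalone sketch under stated assumptions: `IntRow3` and `int_row_from_cstrings` are hypothetical names, the row is fixed at three integer attributes, and `strtol()` merely plays the role that a data type's input function plays inside <function>BuildTupleFromCStrings</function>.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical result of converting one text-form row of three integers. */
typedef struct
{
    long values[3];
    bool nulls[3];
} IntRow3;

/*
 * Convert an array of three C strings into an IntRow3.  A NULL pointer in
 * values[] marks a null attribute; every other entry is parsed with
 * strtol(), which stands in for the type's input function in this sketch.
 */
IntRow3
int_row_from_cstrings(char **values)
{
    IntRow3 row;
    size_t  i;

    assert(values != NULL);

    for (i = 0; i < 3; i++)
    {
        row.nulls[i] = (values[i] == NULL);
        row.values[i] = row.nulls[i] ? 0 : strtol(values[i], NULL, 10);
    }
    return row;
}
```

As in the real API, the conversion has to be repeated for every row, while whatever describes the row layout (here the fixed size of three) is decided once.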
<para> <para>
@@ -1597,16 +1685,16 @@ HeapTuple BuildTupleFromCStrings(AttInMetadata *attinmeta, char **values)
<function>BuildTupleFromCStrings</> is only convenient if your <function>BuildTupleFromCStrings</> is only convenient if your
function naturally computes the values to be returned as text function naturally computes the values to be returned as text
strings. If your code naturally computes the values as a set of strings. If your code naturally computes the values as a set of
Datums, you should instead use the underlying <type>Datum</> values, you should instead use the underlying
<function>heap_formtuple</> routine to convert the function <function>heap_formtuple</> to convert the
<type>Datum</type>s directly into a tuple. You will still need <type>Datum</type> values directly into a tuple. You will still need
the <structname>TupleDesc</> and a <structname>TupleTableSlot</>, the <structname>TupleDesc</> and a <structname>TupleTableSlot</>,
but not <structname>AttInMetadata</>. but not <structname>AttInMetadata</>.
</para> </para>
<para> <para>
Once you have built a tuple to return from your function, the tuple must Once you have built a tuple to return from your function, it
be converted into a <type>Datum</>. Use must be converted into a <type>Datum</>. Use
<programlisting> <programlisting>
TupleGetDatum(TupleTableSlot *slot, HeapTuple tuple) TupleGetDatum(TupleTableSlot *slot, HeapTuple tuple)
</programlisting> </programlisting>
@@ -1617,28 +1705,36 @@ TupleGetDatum(TupleTableSlot *slot, HeapTuple tuple)
</para> </para>
<para> <para>
An example appears below. An example appears in the next section.
</para> </para>
</sect3> </sect2>
<sect2 id="xfunc-c-return-set">
<title>Returning Sets from C-Language Functions</title>
<sect3> <para>
<title>Returning Sets</title> There is also a special API that provides support for returning
sets (multiple rows) from a C-language function. A set-returning
function must follow the version-1 calling conventions. Also,
source files must include <filename>funcapi.h</filename>, as
above.
</para>
<para> <para>
A set-returning function (<acronym>SRF</>) is normally called A set-returning function (<acronym>SRF</>) is called
once for each item it returns. The <acronym>SRF</> must once for each item it returns. The <acronym>SRF</> must
therefore save enough state to remember what it was doing and therefore save enough state to remember what it was doing and
return the next item on each call. The Table Function API return the next item on each call.
provides the <structname>FuncCallContext</> structure to help The structure <structname>FuncCallContext</> is provided to help
control this process. <literal>fcinfo-&gt;flinfo-&gt;fn_extra</> control this process. Within a function, <literal>fcinfo-&gt;flinfo-&gt;fn_extra</>
is used to hold a pointer to <structname>FuncCallContext</> is used to hold a pointer to <structname>FuncCallContext</>
across calls. across calls.
<programlisting> <programlisting>
typedef struct typedef struct
{ {
/* /*
* Number of times we've been called before. * Number of times we've been called before
* *
* call_cntr is initialized to 0 for you by SRF_FIRSTCALL_INIT(), and * call_cntr is initialized to 0 for you by SRF_FIRSTCALL_INIT(), and
* incremented for you every time SRF_RETURN_NEXT() is called. * incremented for you every time SRF_RETURN_NEXT() is called.
@@ -1648,7 +1744,7 @@ typedef struct
/* /*
* OPTIONAL maximum number of calls * OPTIONAL maximum number of calls
* *
* max_calls is here for convenience ONLY and setting it is OPTIONAL. * max_calls is here for convenience only and setting it is optional.
* If not set, you must provide alternative means to know when the * If not set, you must provide alternative means to know when the
* function is done. * function is done.
*/ */
@@ -1657,41 +1753,43 @@ typedef struct
/* /*
* OPTIONAL pointer to result slot * OPTIONAL pointer to result slot
* *
* slot is for use when returning tuples (i.e. composite data types) * slot is for use when returning tuples (i.e., composite data types)
* and is not needed when returning base (i.e. scalar) data types. * and is not needed when returning base data types.
*/ */
TupleTableSlot *slot; TupleTableSlot *slot;
/* /*
* OPTIONAL pointer to misc user provided context info * OPTIONAL pointer to miscellaneous user-provided context information
* *
* user_fctx is for use as a pointer to your own struct to retain * user_fctx is for use as a pointer to your own data to retain
* arbitrary context information between calls for your function. * arbitrary context information between calls of your function.
*/ */
void *user_fctx; void *user_fctx;
/* /*
* OPTIONAL pointer to struct containing arrays of attribute type input * OPTIONAL pointer to struct containing attribute type input metadata
* metainfo
* *
* attinmeta is for use when returning tuples (i.e. composite data types) * attinmeta is for use when returning tuples (i.e., composite data types)
* and is not needed when returning base (i.e. scalar) data types. It * and is not needed when returning base data types. It
* is ONLY needed if you intend to use BuildTupleFromCStrings() to create * is only needed if you intend to use BuildTupleFromCStrings() to create
* the return tuple. * the return tuple.
*/ */
AttInMetadata *attinmeta; AttInMetadata *attinmeta;
/* /*
* memory context used for structures which must live for multiple calls * memory context used for structures that must live for multiple calls
* *
* multi_call_memory_ctx is set by SRF_FIRSTCALL_INIT() for you, and used * multi_call_memory_ctx is set by SRF_FIRSTCALL_INIT() for you, and used
* by SRF_RETURN_DONE() for cleanup. It is the most appropriate memory * by SRF_RETURN_DONE() for cleanup. It is the most appropriate memory
* context for any memory that is to be re-used across multiple calls * context for any memory that is to be reused across multiple calls
* of the SRF. * of the SRF.
*/ */
MemoryContext multi_call_memory_ctx; MemoryContext multi_call_memory_ctx;
} FuncCallContext; } FuncCallContext;
</programlisting> </programlisting>
</para>
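The calling protocol that <structname>FuncCallContext</> supports (one call per returned item, with state saved between calls) can be simulated in plain C. The sketch below is an assumption-laden model, not the PostgreSQL API: `MiniCallContext` and `next_multiple` are hypothetical, the `state` pointer plays the role of <literal>fn_extra</>, and `malloc`/`free` stand in for the multi-call memory context.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, much reduced analogue of FuncCallContext. */
typedef struct
{
    unsigned call_cntr;     /* number of items returned so far */
    unsigned max_calls;     /* total number of items to return */
} MiniCallContext;

/*
 * Produce the multiples of "step", one item per call.  *state is NULL on
 * the first call and holds the saved context on later calls.  Returns true
 * and stores the next item in *result, or returns false once the set is
 * exhausted, after releasing the saved context.
 */
bool
next_multiple(void **state, unsigned count, int step, int *result)
{
    MiniCallContext *ctx;

    assert(state != NULL && result != NULL);

    if (*state == NULL)
    {
        /* first call: one-time setup, kept alive across calls */
        ctx = malloc(sizeof(MiniCallContext));
        if (ctx == NULL)
            return false;
        ctx->call_cntr = 0;
        ctx->max_calls = count;
        *state = ctx;
    }

    /* every call: recover the saved context */
    ctx = *state;

    if (ctx->call_cntr < ctx->max_calls)
    {
        /* return another item and advance the counter */
        ctx->call_cntr++;
        *result = (int) ctx->call_cntr * step;
        return true;
    }

    /* done: clean up so that a later call starts a fresh set */
    free(ctx);
    *state = NULL;
    return false;
}
```

The three phases in the sketch (first-call setup, per-call recovery of state, and the return-next or return-done decision) correspond loosely to `SRF_FIRSTCALL_INIT()`, `SRF_PERCALL_SETUP()`, and `SRF_RETURN_NEXT()` / `SRF_RETURN_DONE()`; in the real macros the counter is incremented for you.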
<para>
An <acronym>SRF</> uses several functions and macros that An <acronym>SRF</> uses several functions and macros that
automatically manipulate the <structname>FuncCallContext</> automatically manipulate the <structname>FuncCallContext</>
structure (and expect to find it via <literal>fn_extra</>). Use structure (and expect to find it via <literal>fn_extra</>). Use
@@ -1718,9 +1816,9 @@ SRF_PERCALL_SETUP()
<programlisting> <programlisting>
SRF_RETURN_NEXT(funcctx, result) SRF_RETURN_NEXT(funcctx, result)
</programlisting> </programlisting>
to return it to the caller. (The <literal>result</> must be a to return it to the caller. (<literal>result</> must be of type
<type>Datum</>, either a single value or a tuple prepared as <type>Datum</>, either a single value or a tuple prepared as
described earlier.) Finally, when your function is finished described above.) Finally, when your function is finished
returning data, use returning data, use
<programlisting> <programlisting>
SRF_RETURN_DONE(funcctx) SRF_RETURN_DONE(funcctx)
@@ -1731,8 +1829,8 @@ SRF_RETURN_DONE(funcctx)
<para> <para>
The memory context that is current when the <acronym>SRF</> is called is The memory context that is current when the <acronym>SRF</> is called is
a transient context that will be cleared between calls. This means a transient context that will be cleared between calls. This means
that you do not need to <function>pfree</> everything that you do not need to call <function>pfree</> on everything
you <function>palloc</>; it will go away anyway. However, if you want to allocate you allocated using <function>palloc</>; it will go away anyway. However, if you want to allocate
any data structures to live across calls, you need to put them somewhere any data structures to live across calls, you need to put them somewhere
else. The memory context referenced by else. The memory context referenced by
<structfield>multi_call_memory_ctx</> is a suitable location for any <structfield>multi_call_memory_ctx</> is a suitable location for any
@@ -1745,45 +1843,45 @@ SRF_RETURN_DONE(funcctx)
A complete pseudo-code example looks like the following: A complete pseudo-code example looks like the following:
<programlisting> <programlisting>
Datum Datum
my_Set_Returning_Function(PG_FUNCTION_ARGS) my_set_returning_function(PG_FUNCTION_ARGS)
{ {
FuncCallContext *funcctx; FuncCallContext *funcctx;
Datum result; Datum result;
MemoryContext oldcontext; MemoryContext oldcontext;
[user defined declarations] <replaceable>further declarations as needed</replaceable>
if (SRF_IS_FIRSTCALL()) if (SRF_IS_FIRSTCALL())
{ {
funcctx = SRF_FIRSTCALL_INIT(); funcctx = SRF_FIRSTCALL_INIT();
oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx); oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
/* one-time setup code appears here: */ /* One-time setup code appears here: */
[user defined code] <replaceable>user code</replaceable>
[if returning composite] <replaceable>if returning composite</replaceable>
[build TupleDesc, and perhaps AttInMetadata] <replaceable>build TupleDesc, and perhaps AttInMetadata</replaceable>
[obtain slot] <replaceable>obtain slot</replaceable>
funcctx-&gt;slot = slot; funcctx-&gt;slot = slot;
[endif returning composite] <replaceable>endif returning composite</replaceable>
[user defined code] <replaceable>user code</replaceable>
MemoryContextSwitchTo(oldcontext); MemoryContextSwitchTo(oldcontext);
} }
/* each-time setup code appears here: */ /* Each-time setup code appears here: */
[user defined code] <replaceable>user code</replaceable>
funcctx = SRF_PERCALL_SETUP(); funcctx = SRF_PERCALL_SETUP();
[user defined code] <replaceable>user code</replaceable>
/* this is just one way we might test whether we are done: */ /* this is just one way we might test whether we are done: */
if (funcctx-&gt;call_cntr &lt; funcctx-&gt;max_calls) if (funcctx-&gt;call_cntr &lt; funcctx-&gt;max_calls)
{ {
/* here we want to return another item: */ /* Here we want to return another item: */
[user defined code] <replaceable>user code</replaceable>
[obtain result Datum] <replaceable>obtain result Datum</replaceable>
SRF_RETURN_NEXT(funcctx, result); SRF_RETURN_NEXT(funcctx, result);
} }
else else
{ {
/* here we are done returning items, and just need to clean up: */ /* Here we are done returning items and just need to clean up: */
[user defined code] <replaceable>user code</replaceable>
SRF_RETURN_DONE(funcctx); SRF_RETURN_DONE(funcctx);
} }
} }
@@ -1794,6 +1892,7 @@ my_Set_Returning_Function(PG_FUNCTION_ARGS)
A complete example of a simple <acronym>SRF</> returning a composite type looks like: A complete example of a simple <acronym>SRF</> returning a composite type looks like:
<programlisting> <programlisting>
PG_FUNCTION_INFO_V1(testpassbyval); PG_FUNCTION_INFO_V1(testpassbyval);
Datum Datum
testpassbyval(PG_FUNCTION_ARGS) testpassbyval(PG_FUNCTION_ARGS)
{ {
@@ -1818,9 +1917,7 @@ testpassbyval(PG_FUNCTION_ARGS)
/* total number of tuples to be returned */ /* total number of tuples to be returned */
funcctx-&gt;max_calls = PG_GETARG_UINT32(0); funcctx-&gt;max_calls = PG_GETARG_UINT32(0);
/* /* Build a tuple description for a __testpassbyval tuple */
* Build a tuple description for a __testpassbyval tuple
*/
tupdesc = RelationNameGetTupleDesc("__testpassbyval"); tupdesc = RelationNameGetTupleDesc("__testpassbyval");
/* allocate a slot for a tuple with this tupdesc */ /* allocate a slot for a tuple with this tupdesc */
@@ -1830,7 +1927,7 @@ testpassbyval(PG_FUNCTION_ARGS)
funcctx-&gt;slot = slot; funcctx-&gt;slot = slot;
/* /*
* Generate attribute metadata needed later to produce tuples from raw * generate attribute metadata needed later to produce tuples from raw
* C strings * C strings
*/ */
attinmeta = TupleDescGetAttInMetadata(tupdesc); attinmeta = TupleDescGetAttInMetadata(tupdesc);
@@ -1856,7 +1953,7 @@ testpassbyval(PG_FUNCTION_ARGS)
/* /*
* Prepare a values array for storage in our slot. * Prepare a values array for storage in our slot.
* This should be an array of C strings which will * This should be an array of C strings which will
* be processed later by the appropriate "in" functions. * be processed later by the type input functions.
*/ */
values = (char **) palloc(3 * sizeof(char *)); values = (char **) palloc(3 * sizeof(char *));
values[0] = (char *) palloc(16 * sizeof(char)); values[0] = (char *) palloc(16 * sizeof(char));
@@ -1873,7 +1970,7 @@ testpassbyval(PG_FUNCTION_ARGS)
/* make the tuple into a datum */ /* make the tuple into a datum */
result = TupleGetDatum(slot, tuple); result = TupleGetDatum(slot, tuple);
/* Clean up (this is not actually necessary) */ /* clean up (this is not really necessary) */
pfree(values[0]); pfree(values[0]);
pfree(values[1]); pfree(values[1]);
pfree(values[2]); pfree(values[2]);
@@ -1887,136 +1984,22 @@ testpassbyval(PG_FUNCTION_ARGS)
} }
} }
</programlisting> </programlisting>
with supporting SQL code of
The SQL code to declare this function is:
<programlisting> <programlisting>
CREATE TYPE __testpassbyval AS (f1 int4, f2 int4, f3 int4); CREATE TYPE __testpassbyval AS (f1 integer, f2 integer, f3 integer);
CREATE OR REPLACE FUNCTION testpassbyval(int4, int4) RETURNS setof __testpassbyval CREATE OR REPLACE FUNCTION testpassbyval(integer, integer) RETURNS SETOF __testpassbyval
AS 'MODULE_PATHNAME','testpassbyval' LANGUAGE 'c' IMMUTABLE STRICT; AS '<replaceable>filename</>', 'testpassbyval'
LANGUAGE C IMMUTABLE STRICT;
</programlisting> </programlisting>
</para> </para>
<para> <para>
See <filename>contrib/tablefunc</> for more examples of table functions. The directory <filename>contrib/tablefunc</> in the source
</para> distribution contains more examples of set-returning functions.
</sect3>
</sect2>
<sect2>
<title>Writing Code</title>
<para>
We now turn to the more difficult task of writing
programming language functions. Be warned: this section
of the manual will not make you a programmer. You must
have a good understanding of <acronym>C</acronym>
(including the use of pointers)
before trying to write <acronym>C</acronym> functions for
use with <productname>PostgreSQL</productname>. While it may
be possible to load functions written in languages other
than <acronym>C</acronym> into <productname>PostgreSQL</productname>,
this is often difficult (when it is possible at all)
because other languages, such as <acronym>FORTRAN</acronym>
and <acronym>Pascal</acronym> often do not follow the same
<firstterm>calling convention</firstterm>
as <acronym>C</acronym>. That is, other
languages do not pass argument and return values
between functions in the same way. For this reason, we
will assume that your programming language functions
are written in <acronym>C</acronym>.
</para>
<para>
The basic rules for building <acronym>C</acronym> functions
are as follows:
<itemizedlist>
<listitem>
<para>
Use <literal>pg_config --includedir-server</literal><indexterm><primary>pg_config</></> to find
out where the <productname>PostgreSQL</> server header files are installed on
your system (or the system that your users will be running
on). This option is new with <productname>PostgreSQL</> 7.2.
For <productname>PostgreSQL</>
7.1 you should use the option <option>--includedir</option>.
(<command>pg_config</command> will exit with a non-zero status
if it encounters an unknown option.) For releases prior to
7.1 you will have to guess, but since that was before the
current calling conventions were introduced, it is unlikely
that you want to support those releases.
</para>
</listitem>
<listitem>
<para>
When allocating memory, use the
<productname>PostgreSQL</productname> routines
<function>palloc</function> and <function>pfree</function>
instead of the corresponding <acronym>C</acronym> library
routines <function>malloc</function> and
<function>free</function>. The memory allocated by
<function>palloc</function> will be freed automatically at the
end of each transaction, preventing memory leaks.
</para>
</listitem>
<listitem>
<para>
Always zero the bytes of your structures using
<function>memset</function> or <function>bzero</function>.
Several routines (such as the hash access method, hash join
and the sort algorithm) compute functions of the raw bits
contained in your structure. Even if you initialize all
fields of your structure, there may be several bytes of
alignment padding (holes in the structure) that may contain
garbage values.
</para>
</listitem>
<listitem>
<para>
Most of the internal <productname>PostgreSQL</productname> types
are declared in <filename>postgres.h</filename>, while the function
manager interfaces (<symbol>PG_FUNCTION_ARGS</symbol>, etc.)
are in <filename>fmgr.h</filename>, so you will need to
include at least these two files. For portability reasons it's best
to include <filename>postgres.h</filename> <emphasis>first</>,
before any other system or user header files.
Including <filename>postgres.h</filename> will also include
<filename>elog.h</filename> and <filename>palloc.h</filename>
for you.
</para>
</listitem>
<listitem>
<para>
Symbol names defined within object files must not conflict
with each other or with symbols defined in the
<productname>PostgreSQL</productname> server executable. You
will have to rename your functions or variables if you get
error messages to this effect.
</para>
</listitem>
<listitem>
<para>
Compiling and linking your object code so that
it can be dynamically loaded into
<productname>PostgreSQL</productname>
always requires special flags.
See <xref linkend="dfunc">
for a detailed explanation of how to do it for
your particular operating system.
</para>
</listitem>
</itemizedlist>
</para> </para>
</sect2> </sect2>
&dfunc;
</sect1> </sect1>
<sect1 id="xfunc-overload"> <sect1 id="xfunc-overload">
@@ -2035,9 +2018,11 @@ CREATE OR REPLACE FUNCTION testpassbyval(int4, int4) RETURNS setof __testpassbyv
</para> </para>
<para> <para>
A function may also have the same name as an attribute. In the case A function may also have the same name as an attribute. (Recall
that there is an ambiguity between a function on a complex type and that <literal>attribute(table)</literal> is equivalent to
an attribute of the complex type, the attribute will always be used. <literal>table.attribute</literal>.) In the case that there is an
ambiguity between a function on a complex type and an attribute of
the complex type, the attribute will always be used.
</para> </para>
<para> <para>
@@ -2056,7 +2041,7 @@ CREATE FUNCTION test(smallint, double precision) RETURNS ...
</para> </para>
<para> <para>
When overloading C language functions, there is an additional When overloading C-language functions, there is an additional
constraint: The C name of each function in the family of constraint: The C name of each function in the family of
overloaded functions must be different from the C names of all overloaded functions must be different from the C names of all
other functions, either internal or dynamically loaded. If this other functions, either internal or dynamically loaded. If this
@@ -2076,85 +2061,6 @@ CREATE FUNCTION test(int, int) RETURNS int
</programlisting> </programlisting>
The names of the C functions here reflect one of many possible conventions. The names of the C functions here reflect one of many possible conventions.
</para> </para>
<para>
Prior to <productname>PostgreSQL</productname> 7.0, this
alternative syntax did not exist. There is a trick to get around
the problem, by defining a set of C functions with different names
and then define a set of identically-named SQL function wrappers
that take the appropriate argument types and call the matching C
function.
</para>
</sect1>
<sect1 id="xfunc-tablefunctions">
<title>Table Functions</title>
<indexterm zone="xfunc-tablefunctions"><primary>function</></>
<para>
Table functions are functions that produce a set of rows, made up of
either base (scalar) data types, or composite (multi-column) data types.
They are used like a table, view, or subselect in the <literal>FROM</>
clause of a query. Columns returned by table functions may be included in
<literal>SELECT</>, <literal>JOIN</>, or <literal>WHERE</> clauses in the
same manner as a table, view, or subselect column.
</para>
<para>
If a table function returns a base data type, the single result column
is named for the function. If the function returns a composite type, the
result columns get the same names as the individual attributes of the type.
</para>
<para>
A table function may be aliased in the <literal>FROM</> clause, but it also
may be left unaliased. If a function is used in the FROM clause with no
alias, the function name is used as the relation name.
</para>
<para>
Table functions work wherever tables do in <literal>SELECT</> statements.
For example
<programlisting>
CREATE TABLE foo (fooid int, foosubid int, fooname text);
CREATE FUNCTION getfoo(int) RETURNS setof foo AS '
SELECT * FROM foo WHERE fooid = $1;
' LANGUAGE SQL;
SELECT * FROM getfoo(1) AS t1;
SELECT * FROM foo
WHERE foosubid in (select foosubid from getfoo(foo.fooid) z
where z.fooid = foo.fooid);
CREATE VIEW vw_getfoo AS SELECT * FROM getfoo(1);
SELECT * FROM vw_getfoo;
</programlisting>
are all valid statements.
</para>
<para>
In some cases it is useful to define table functions that can return
different column sets depending on how they are invoked. To support this,
the table function can be declared as returning the pseudo-type
<type>record</>. When such a function is used in a query, the expected
row structure must be specified in the query itself, so that the system
can know how to parse and plan the query. Consider this example:
<programlisting>
SELECT *
FROM dblink('dbname=template1', 'select proname, prosrc from pg_proc')
AS t1(proname name, prosrc text)
WHERE proname LIKE 'bytea%';
</programlisting>
The <literal>dblink</> function executes a remote query (see
<literal>contrib/dblink</>). It is declared to return <type>record</>
since it might be used for any kind of query. The actual column set
must be specified in the calling query so that the parser knows, for
example, what <literal>*</> should expand to.
</para>
</sect1> </sect1>
<sect1 id="xfunc-plhandler"> <sect1 id="xfunc-plhandler">
...@@ -2179,21 +2085,13 @@ WHERE proname LIKE 'bytea%'; ...@@ -2179,21 +2085,13 @@ WHERE proname LIKE 'bytea%';
<para> <para>
The call handler for a procedural language is a The call handler for a procedural language is a
<quote>normal</quote> function, which must be written in a <quote>normal</quote> function that must be written in a compiled
compiled language such as C and registered with language such as C, using the version-1 interface, and registered
<productname>PostgreSQL</productname> as taking no arguments and with <productname>PostgreSQL</productname> as taking no arguments
returning the <type>language_handler</type> type. and returning the type <type>language_handler</type>. This
This special pseudo-type identifies the handler as a call handler special pseudotype identifies the function as a call handler and
and prevents it from being called directly in queries. prevents it from being called directly in SQL commands.
</para>
<note>
<para>
In <productname>PostgreSQL</productname> 7.1 and later, call
handlers must adhere to the <quote>version 1</quote> function
manager interface, not the old-style interface.
</para> </para>
</note>
<para> <para>
The call handler is called in the same way as any other function: The call handler is called in the same way as any other function:
...@@ -2203,7 +2101,7 @@ WHERE proname LIKE 'bytea%'; ...@@ -2203,7 +2101,7 @@ WHERE proname LIKE 'bytea%';
is expected to return a <type>Datum</type> result (and possibly is expected to return a <type>Datum</type> result (and possibly
set the <structfield>isnull</structfield> field of the set the <structfield>isnull</structfield> field of the
<structname>FunctionCallInfoData</structname> structure, if it wishes <structname>FunctionCallInfoData</structname> structure, if it wishes
to return an SQL NULL result). The difference between a call to return an SQL null result). The difference between a call
handler and an ordinary callee function is that the handler and an ordinary callee function is that the
<structfield>flinfo-&gt;fn_oid</structfield> field of the <structfield>flinfo-&gt;fn_oid</structfield> field of the
<structname>FunctionCallInfoData</structname> structure will contain <structname>FunctionCallInfoData</structname> structure will contain
...@@ -2215,12 +2113,12 @@ WHERE proname LIKE 'bytea%'; ...@@ -2215,12 +2113,12 @@ WHERE proname LIKE 'bytea%';
</para> </para>
<para> <para>
It's up to the call handler to fetch the It's up to the call handler to fetch the entry of the function from the system table
<classname>pg_proc</classname> entry and to analyze the argument <classname>pg_proc</classname> and to analyze the argument
and return types of the called procedure. The AS clause from the and return types of the called function. The <literal>AS</> clause from the
<command>CREATE FUNCTION</command> of the procedure will be found <command>CREATE FUNCTION</command> of the function will be found
in the <literal>prosrc</literal> attribute of the in the <literal>prosrc</literal> column of the
<classname>pg_proc</classname> table entry. This may be the source <classname>pg_proc</classname> row. This may be the source
text in the procedural language itself (like for PL/Tcl), a text in the procedural language itself (like for PL/Tcl), a
path name to a file, or anything else that tells the call handler path name to a file, or anything else that tells the call handler
what to do in detail. what to do in detail.
...@@ -2231,11 +2129,11 @@ WHERE proname LIKE 'bytea%'; ...@@ -2231,11 +2129,11 @@ WHERE proname LIKE 'bytea%';
A call handler can avoid repeated lookups of information about the A call handler can avoid repeated lookups of information about the
called function by using the called function by using the
<structfield>flinfo-&gt;fn_extra</structfield> field. This will <structfield>flinfo-&gt;fn_extra</structfield> field. This will
initially be NULL, but can be set by the call handler to point at initially be <symbol>NULL</>, but can be set by the call handler to point at
information about the PL function. On subsequent calls, if information about the called function. On subsequent calls, if
<structfield>flinfo-&gt;fn_extra</structfield> is already non-NULL <structfield>flinfo-&gt;fn_extra</structfield> is already non-<symbol>NULL</>
then it can be used and the information lookup step skipped. The then it can be used and the information lookup step skipped. The
call handler must be careful that call handler must make sure that
<structfield>flinfo-&gt;fn_extra</structfield> is made to point at <structfield>flinfo-&gt;fn_extra</structfield> is made to point at
memory that will live at least until the end of the current query, memory that will live at least until the end of the current query,
since an <structname>FmgrInfo</structname> data structure could be since an <structname>FmgrInfo</structname> data structure could be
...@@ -2244,23 +2142,23 @@ WHERE proname LIKE 'bytea%'; ...@@ -2244,23 +2142,23 @@ WHERE proname LIKE 'bytea%';
<structfield>flinfo-&gt;fn_mcxt</structfield>; such data will <structfield>flinfo-&gt;fn_mcxt</structfield>; such data will
normally have the same lifespan as the normally have the same lifespan as the
<structname>FmgrInfo</structname> itself. But the handler could <structname>FmgrInfo</structname> itself. But the handler could
also choose to use a longer-lived context so that it can cache also choose to use a longer-lived memory context so that it can cache
function definition information across queries. function definition information across queries.
</para> </para>
<para> <para>
When a PL function is invoked as a trigger, no explicit arguments When a procedural-language function is invoked as a trigger, no arguments
are passed, but the are passed in the usual way, but the
<structname>FunctionCallInfoData</structname>'s <structname>FunctionCallInfoData</structname>'s
<structfield>context</structfield> field points at a <structfield>context</structfield> field points at a
<structname>TriggerData</structname> node, rather than being NULL <structname>TriggerData</structname> structure, rather than being <symbol>NULL</>
as it is in a plain function call. A language handler should as it is in a plain function call. A language handler should
provide mechanisms for PL functions to get at the trigger provide mechanisms for procedural-language functions to get at the trigger
information. information.
</para> </para>
<para> <para>
This is a template for a PL handler written in C: This is a template for a procedural-language handler written in C:
<programlisting> <programlisting>
#include "postgres.h" #include "postgres.h"
#include "executor/spi.h" #include "executor/spi.h"
...@@ -2288,7 +2186,8 @@ plsample_call_handler(PG_FUNCTION_ARGS) ...@@ -2288,7 +2186,8 @@ plsample_call_handler(PG_FUNCTION_ARGS)
retval = ... retval = ...
} }
else { else
{
/* /*
* Called as a function * Called as a function
*/ */
...@@ -2299,27 +2198,23 @@ plsample_call_handler(PG_FUNCTION_ARGS) ...@@ -2299,27 +2198,23 @@ plsample_call_handler(PG_FUNCTION_ARGS)
return retval; return retval;
} }
</programlisting> </programlisting>
</para>
<para>
Only a few thousand lines of code have to be added instead of the Only a few thousand lines of code have to be added instead of the
dots to complete the call handler. See <xref linkend="xfunc-c"> dots to complete the call handler.
for information on how to compile it into a loadable module.
</para> </para>
<para> <para>
The following commands then register the sample procedural After having compiled the handler function into a loadable module
language: (see <xref linkend="dfunc">), the following commands then
register the sample procedural language:
<programlisting> <programlisting>
CREATE FUNCTION plsample_call_handler () RETURNS language_handler CREATE FUNCTION plsample_call_handler() RETURNS language_handler
AS '/usr/local/pgsql/lib/plsample' AS '<replaceable>filename</replaceable>'
LANGUAGE C; LANGUAGE C;
CREATE LANGUAGE plsample CREATE LANGUAGE plsample
HANDLER plsample_call_handler; HANDLER plsample_call_handler;
</programlisting> </programlisting>
</para> </para>
</sect1> </sect1>
</chapter>
<!-- Keep this comment at the end of the file <!-- Keep this comment at the end of the file
Local variables: Local variables:
......
<!-- <!--
$Header: /cvsroot/pgsql/doc/src/sgml/xoper.sgml,v 1.22 2003/01/15 19:35:35 tgl Exp $ $Header: /cvsroot/pgsql/doc/src/sgml/xoper.sgml,v 1.23 2003/04/10 01:22:45 petere Exp $
--> -->
<Chapter Id="xoper"> <sect1 id="xoper">
<Title>Extending <Acronym>SQL</Acronym>: Operators</Title> <title>User-defined Operators</title>
<sect1 id="xoper-intro">
<title>Introduction</title>
<Para>
<ProductName>PostgreSQL</ProductName> supports left unary,
right unary, and binary
operators. Operators can be overloaded; that is,
the same operator name can be used for different operators
that have different numbers and types of operands. If
there is an ambiguous situation and the system cannot
determine the correct operator to use, it will return
an error. You may have to type-cast the left and/or
right operands to help it understand which operator you
meant to use.
</Para>
<Para> <Para>
Every operator is <quote>syntactic sugar</quote> for a call to an Every operator is <quote>syntactic sugar</quote> for a call to an
...@@ -28,13 +12,18 @@ $Header: /cvsroot/pgsql/doc/src/sgml/xoper.sgml,v 1.22 2003/01/15 19:35:35 tgl E ...@@ -28,13 +12,18 @@ $Header: /cvsroot/pgsql/doc/src/sgml/xoper.sgml,v 1.22 2003/01/15 19:35:35 tgl E
the operator. However, an operator is <emphasis>not merely</emphasis> the operator. However, an operator is <emphasis>not merely</emphasis>
syntactic sugar, because it carries additional information syntactic sugar, because it carries additional information
that helps the query planner optimize queries that use the that helps the query planner optimize queries that use the
operator. Much of this chapter will be devoted to explaining operator. The next section will be devoted to explaining
that additional information. that additional information.
</Para> </Para>
</sect1>
<sect1 id="xoper-example"> <Para>
<title>Example</title> <productname>PostgreSQL</productname> supports left unary, right
unary, and binary operators. Operators can be overloaded; that is,
the same operator name can be used for different operators that
have different numbers and types of operands. When a query is
executed, the system determines the operator to call from the
number and types of the provided operands.
</Para>
<Para> <Para>
Here is an example of creating an operator for adding two complex Here is an example of creating an operator for adding two complex
...@@ -45,7 +34,7 @@ $Header: /cvsroot/pgsql/doc/src/sgml/xoper.sgml,v 1.22 2003/01/15 19:35:35 tgl E ...@@ -45,7 +34,7 @@ $Header: /cvsroot/pgsql/doc/src/sgml/xoper.sgml,v 1.22 2003/01/15 19:35:35 tgl E
<ProgramListing> <ProgramListing>
CREATE FUNCTION complex_add(complex, complex) CREATE FUNCTION complex_add(complex, complex)
RETURNS complex RETURNS complex
AS '<replaceable>PGROOT</replaceable>/tutorial/complex' AS '<replaceable>filename</replaceable>', 'complex_add'
LANGUAGE C; LANGUAGE C;
CREATE OPERATOR + ( CREATE OPERATOR + (
...@@ -58,7 +47,7 @@ CREATE OPERATOR + ( ...@@ -58,7 +47,7 @@ CREATE OPERATOR + (
</Para> </Para>
<Para> <Para>
Now we can do: Now we could execute a query like this:
<screen> <screen>
SELECT (a + b) AS c FROM test_complex; SELECT (a + b) AS c FROM test_complex;
...@@ -78,20 +67,13 @@ SELECT (a + b) AS c FROM test_complex; ...@@ -78,20 +67,13 @@ SELECT (a + b) AS c FROM test_complex;
<command>CREATE OPERATOR</command>. The <literal>commutator</> <command>CREATE OPERATOR</command>. The <literal>commutator</>
clause shown in the example is an optional hint to the query clause shown in the example is an optional hint to the query
optimizer. Further details about <literal>commutator</> and other optimizer. Further details about <literal>commutator</> and other
optimizer hints appear below. optimizer hints appear in the next section.
</Para> </Para>
</sect1> </sect1>
<sect1 id="xoper-optimization"> <sect1 id="xoper-optimization">
<title>Operator Optimization Information</title> <title>Operator Optimization Information</title>
<note>
<title>Author</title>
<para>
Written by Tom Lane.
</para>
</note>
<para> <para>
A <ProductName>PostgreSQL</ProductName> operator definition can include A <ProductName>PostgreSQL</ProductName> operator definition can include
several optional clauses that tell the system useful things about how several optional clauses that tell the system useful things about how
...@@ -99,7 +81,7 @@ SELECT (a + b) AS c FROM test_complex; ...@@ -99,7 +81,7 @@ SELECT (a + b) AS c FROM test_complex;
appropriate, because they can make for considerable speedups in execution appropriate, because they can make for considerable speedups in execution
of queries that use the operator. But if you provide them, you must be of queries that use the operator. But if you provide them, you must be
sure that they are right! Incorrect use of an optimization clause can sure that they are right! Incorrect use of an optimization clause can
result in backend crashes, subtly wrong output, or other Bad Things. result in server process crashes, subtly wrong output, or other Bad Things.
You can always leave out an optimization clause if you are not sure You can always leave out an optimization clause if you are not sure
about it; the only consequence is that queries might run slower than about it; the only consequence is that queries might run slower than
they need to. they need to.
...@@ -112,7 +94,7 @@ SELECT (a + b) AS c FROM test_complex; ...@@ -112,7 +94,7 @@ SELECT (a + b) AS c FROM test_complex;
</para> </para>
<sect2> <sect2>
<title>COMMUTATOR</title> <title><literal>COMMUTATOR</></title>
<para> <para>
The <literal>COMMUTATOR</> clause, if provided, names an operator that is the The <literal>COMMUTATOR</> clause, if provided, names an operator that is the
...@@ -155,7 +137,7 @@ SELECT (a + b) AS c FROM test_complex; ...@@ -155,7 +137,7 @@ SELECT (a + b) AS c FROM test_complex;
<para> <para>
The other, more straightforward way is just to include <literal>COMMUTATOR</> clauses The other, more straightforward way is just to include <literal>COMMUTATOR</> clauses
in both definitions. When <ProductName>PostgreSQL</ProductName> processes in both definitions. When <ProductName>PostgreSQL</ProductName> processes
the first definition and realizes that <literal>COMMUTATOR</> refers to a non-existent the first definition and realizes that <literal>COMMUTATOR</> refers to a nonexistent
operator, the system will make a dummy entry for that operator in the operator, the system will make a dummy entry for that operator in the
system catalog. This dummy entry will have valid data only system catalog. This dummy entry will have valid data only
for the operator name, left and right operand types, and result type, for the operator name, left and right operand types, and result type,
...@@ -164,9 +146,7 @@ SELECT (a + b) AS c FROM test_complex; ...@@ -164,9 +146,7 @@ SELECT (a + b) AS c FROM test_complex;
dummy entry. Later, when you define the second operator, the system dummy entry. Later, when you define the second operator, the system
updates the dummy entry with the additional information from the second updates the dummy entry with the additional information from the second
definition. If you try to use the dummy operator before it's been filled definition. If you try to use the dummy operator before it's been filled
in, you'll just get an error message. (Note: This procedure did not work in, you'll just get an error message.
reliably in <ProductName>PostgreSQL</ProductName> versions before 6.5,
but it is now the recommended way to do things.)
</para> </para>
</listitem> </listitem>
</itemizedlist> </itemizedlist>
...@@ -174,7 +154,7 @@ SELECT (a + b) AS c FROM test_complex; ...@@ -174,7 +154,7 @@ SELECT (a + b) AS c FROM test_complex;
</sect2> </sect2>
<sect2> <sect2>
<title>NEGATOR</title> <title><literal>NEGATOR</></title>
<para> <para>
The <literal>NEGATOR</> clause, if provided, names an operator that is the The <literal>NEGATOR</> clause, if provided, names an operator that is the
...@@ -194,14 +174,14 @@ SELECT (a + b) AS c FROM test_complex; ...@@ -194,14 +174,14 @@ SELECT (a + b) AS c FROM test_complex;
<para> <para>
An operator's negator must have the same left and/or right operand types An operator's negator must have the same left and/or right operand types
as the operator itself, so just as with <literal>COMMUTATOR</>, only the operator as the operator to be defined, so just as with <literal>COMMUTATOR</>, only the operator
name need be given in the <literal>NEGATOR</> clause. name need be given in the <literal>NEGATOR</> clause.
</para> </para>
<para> <para>
Providing a negator is very helpful to the query optimizer since Providing a negator is very helpful to the query optimizer since
it allows expressions like <literal>NOT (x = y)</> to be simplified into it allows expressions like <literal>NOT (x = y)</> to be simplified into
x &lt;&gt; y. This comes up more often than you might think, because <literal>x &lt;&gt; y</>. This comes up more often than you might think, because
<literal>NOT</> operations can be inserted as a consequence of other rearrangements. <literal>NOT</> operations can be inserted as a consequence of other rearrangements.
</para> </para>
...@@ -213,12 +193,12 @@ SELECT (a + b) AS c FROM test_complex; ...@@ -213,12 +193,12 @@ SELECT (a + b) AS c FROM test_complex;
</sect2> </sect2>
<sect2> <sect2>
<title>RESTRICT</title> <title><literal>RESTRICT</></title>
<para> <para>
The <literal>RESTRICT</> clause, if provided, names a restriction selectivity The <literal>RESTRICT</> clause, if provided, names a restriction selectivity
estimation function for the operator (note that this is a function estimation function for the operator. (Note that this is a function
name, not an operator name). <literal>RESTRICT</> clauses only make sense for name, not an operator name.) <literal>RESTRICT</> clauses only make sense for
binary operators that return <type>boolean</>. The idea behind a restriction binary operators that return <type>boolean</>. The idea behind a restriction
selectivity estimator is to guess what fraction of the rows in a selectivity estimator is to guess what fraction of the rows in a
table will satisfy a <literal>WHERE</literal>-clause condition of the form table will satisfy a <literal>WHERE</literal>-clause condition of the form
...@@ -269,15 +249,15 @@ column OP constant ...@@ -269,15 +249,15 @@ column OP constant
You can use <function>scalarltsel</> and <function>scalargtsel</> for comparisons on data types that You can use <function>scalarltsel</> and <function>scalargtsel</> for comparisons on data types that
have some sensible means of being converted into numeric scalars for have some sensible means of being converted into numeric scalars for
range comparisons. If possible, add the data type to those understood range comparisons. If possible, add the data type to those understood
by the routine <function>convert_to_scalar()</function> in <filename>src/backend/utils/adt/selfuncs.c</filename>. by the function <function>convert_to_scalar()</function> in <filename>src/backend/utils/adt/selfuncs.c</filename>.
(Eventually, this routine should be replaced by per-data-type functions (Eventually, this function should be replaced by per-data-type functions
identified through a column of the <classname>pg_type</> system catalog; but that hasn't happened identified through a column of the <classname>pg_type</> system catalog; but that hasn't happened
yet.) If you do not do this, things will still work, but the optimizer's yet.) If you do not do this, things will still work, but the optimizer's
estimates won't be as good as they could be. estimates won't be as good as they could be.
</para> </para>
<para> <para>
There are additional selectivity functions designed for geometric There are additional selectivity estimation functions designed for geometric
operators in <filename>src/backend/utils/adt/geo_selfuncs.c</filename>: <function>areasel</function>, <function>positionsel</function>, operators in <filename>src/backend/utils/adt/geo_selfuncs.c</filename>: <function>areasel</function>, <function>positionsel</function>,
and <function>contsel</function>. At this writing these are just stubs, but you may want and <function>contsel</function>. At this writing these are just stubs, but you may want
to use them (or even better, improve them) anyway. to use them (or even better, improve them) anyway.
...@@ -285,12 +265,12 @@ column OP constant ...@@ -285,12 +265,12 @@ column OP constant
</sect2> </sect2>
<sect2> <sect2>
<title>JOIN</title> <title><literal>JOIN</></title>
<para> <para>
The <literal>JOIN</> clause, if provided, names a join selectivity The <literal>JOIN</> clause, if provided, names a join selectivity
estimation function for the operator (note that this is a function estimation function for the operator. (Note that this is a function
name, not an operator name). <literal>JOIN</> clauses only make sense for name, not an operator name.) <literal>JOIN</> clauses only make sense for
binary operators that return <type>boolean</type>. The idea behind a join binary operators that return <type>boolean</type>. The idea behind a join
selectivity estimator is to guess what fraction of the rows in a selectivity estimator is to guess what fraction of the rows in a
pair of tables will satisfy a <literal>WHERE</>-clause condition of the form pair of tables will satisfy a <literal>WHERE</>-clause condition of the form
...@@ -319,13 +299,13 @@ table1.column1 OP table2.column2 ...@@ -319,13 +299,13 @@ table1.column1 OP table2.column2
</sect2> </sect2>
<sect2> <sect2>
<title>HASHES</title> <title><literal>HASHES</></title>
<para> <para>
The <literal>HASHES</literal> clause, if present, tells the system that The <literal>HASHES</literal> clause, if present, tells the system that
it is permissible to use the hash join method for a join based on this it is permissible to use the hash join method for a join based on this
operator. <literal>HASHES</> only makes sense for binary operators that operator. <literal>HASHES</> only makes sense for a binary operator that
return <literal>boolean</>, and in practice the operator had better be returns <literal>boolean</>, and in practice the operator had better be
equality for some data type. equality for some data type.
</para> </para>
...@@ -340,33 +320,35 @@ table1.column1 OP table2.column2 ...@@ -340,33 +320,35 @@ table1.column1 OP table2.column2
<para> <para>
In fact, logical equality is not good enough either; the operator In fact, logical equality is not good enough either; the operator
had better represent pure bitwise equality, because the hash function had better represent pure bitwise equality, because the hash
will be computed on the memory representation of the values regardless function will be computed on the memory representation of the
of what the bits mean. For example, equality of values regardless of what the bits mean. For example, the
time intervals is not bitwise equality; the interval equality operator polygon operator <literal>~=</literal>, which checks whether two
considers two time intervals equal if they have the same polygons are the same, is not bitwise equality, because two
duration, whether or not their endpoints are identical. What this means polygons can be considered the same even if their vertices are
is that a join using <literal>=</literal> between interval fields would yield different specified in a different order. What this means is that a join
results if implemented as a hash join than if implemented another way, using <literal>~=</literal> between polygon fields would yield
because a large fraction of the pairs that should match will hash to different results if implemented as a hash join than if
different values and will never be compared by the hash join. But implemented another way, because a large fraction of the pairs
if the optimizer chose to use a different kind of join, all the pairs that should match will hash to different values and will never be
that the equality operator says are equal will be found. compared by the hash join. But if the optimizer chooses to use a
We don't want that kind of inconsistency, so we don't mark interval different kind of join, all the pairs that the operator
equality as hashable. <literal>~=</literal> says are the same will be found. We don't
want that kind of inconsistency, so we don't mark the polygon
operator <literal>~=</literal> as hashable.
</para> </para>
<para> <para>
There are also machine-dependent ways in which a hash join might fail There are also machine-dependent ways in which a hash join might fail
to do the right thing. For example, if your data type to do the right thing. For example, if your data type
is a structure in which there may be uninteresting pad bits, it's unsafe is a structure in which there may be uninteresting pad bits, it's unsafe
to mark the equality operator <literal>HASHES</>. (Unless, perhaps, you write to mark the equality operator <literal>HASHES</>. (Unless you write
your other operators to ensure that the unused bits are always zero.) your other operators and functions to ensure that the unused bits are always zero, which is the recommended strategy.)
Another example is that the floating-point data types are unsafe for hash Another example is that the floating-point data types are unsafe for hash
joins. On machines that meet the <acronym>IEEE</> floating-point standard, minus joins. On machines that meet the <acronym>IEEE</> floating-point standard, negative
zero and plus zero are different values (different bit patterns) but zero and positive zero are different values (different bit patterns) but
they are defined to compare equal. So, if the equality operator on floating-point data types were marked they are defined to compare equal. So, if the equality operator on floating-point data types were marked
<literal>HASHES</>, a minus zero and a plus zero would probably not be matched up <literal>HASHES</>, a negative zero and a positive zero would probably not be matched up
by a hash join, but they would be matched up by any other join process. by a hash join, but they would be matched up by any other join process.
</para> </para>
...@@ -403,9 +385,9 @@ table1.column1 OP table2.column2 ...@@ -403,9 +385,9 @@ table1.column1 OP table2.column2
<para> <para>
The <literal>MERGES</literal> clause, if present, tells the system that The <literal>MERGES</literal> clause, if present, tells the system that
it is permissible to use the merge join method for a join based on this it is permissible to use the merge-join method for a join based on this
operator. <literal>MERGES</> only makes sense for binary operators that operator. <literal>MERGES</> only makes sense for a binary operator that
return <literal>boolean</>, and in practice the operator must represent returns <literal>boolean</>, and in practice the operator must represent
equality for some data type or pair of data types. equality for some data type or pair of data types.
</para> </para>
...@@ -420,7 +402,7 @@ table1.column1 OP table2.column2 ...@@ -420,7 +402,7 @@ table1.column1 OP table2.column2
data types had better be the same (or at least bitwise equivalent), data types had better be the same (or at least bitwise equivalent),
it is possible to merge-join two it is possible to merge-join two
distinct data types so long as they are logically compatible. For distinct data types so long as they are logically compatible. For
example, the <type>int2</type>-versus-<type>int4</type> equality operator example, the <type>smallint</type>-versus-<type>integer</type> equality operator
is merge-joinable. is merge-joinable.
We only need sorting operators that will bring both data types into a We only need sorting operators that will bring both data types into a
logically compatible sequence. logically compatible sequence.
...@@ -429,11 +411,11 @@ table1.column1 OP table2.column2 ...@@ -429,11 +411,11 @@ table1.column1 OP table2.column2
<para> <para>
Execution of a merge join requires that the system be able to identify Execution of a merge join requires that the system be able to identify
four operators related to the merge-join equality operator: less-than four operators related to the merge-join equality operator: less-than
comparison for the left input data type, less-than comparison for the comparison for the left operand data type, less-than comparison for the
right input data type, less-than comparison between the two data types, and right operand data type, less-than comparison between the two data types, and
greater-than comparison between the two data types. (These are actually greater-than comparison between the two data types. (These are actually
four distinct operators if the merge-joinable operator has two different four distinct operators if the merge-joinable operator has two different
input data types; but when the input types are the same the three operand data types; but when the operand types are the same the three
less-than operators are all the same operator.) less-than operators are all the same operator.)
It is possible to It is possible to
specify these operators individually by name, as the <literal>SORT1</>, specify these operators individually by name, as the <literal>SORT1</>,
...@@ -447,8 +429,8 @@ table1.column1 OP table2.column2 ...@@ -447,8 +429,8 @@ table1.column1 OP table2.column2
</para> </para>
<para> <para>
The input data types of the four comparison operators can be deduced The operand data types of the four comparison operators can be deduced
from the input types of the merge-joinable operator, so just as with from the operand types of the merge-joinable operator, so just as with
<literal>COMMUTATOR</>, only the operator names need be given in these <literal>COMMUTATOR</>, only the operator names need be given in these
clauses. Unless you are using peculiar choices of operator names, clauses. Unless you are using peculiar choices of operator names,
it's sufficient to write <literal>MERGES</> and let the system fill in it's sufficient to write <literal>MERGES</> and let the system fill in
...@@ -469,7 +451,7 @@ table1.column1 OP table2.column2 ...@@ -469,7 +451,7 @@ table1.column1 OP table2.column2
<listitem> <listitem>
<para> <para>
A merge-joinable equality operator must have a merge-joinable A merge-joinable equality operator must have a merge-joinable
commutator (itself if the two data types are the same, or a related commutator (itself if the two operand data types are the same, or a related
equality operator if they are different). equality operator if they are different).
</para> </para>
</listitem> </listitem>
...@@ -523,11 +505,8 @@ table1.column1 OP table2.column2 ...@@ -523,11 +505,8 @@ table1.column1 OP table2.column2
<literal>&lt;</> and <literal>&gt;</> respectively. <literal>&lt;</> and <literal>&gt;</> respectively.
</para> </para>
</note> </note>
</sect2> </sect2>
</sect1> </sect1>
</Chapter>
<!-- Keep this comment at the end of the file <!-- Keep this comment at the end of the file
Local variables: Local variables:
......
<chapter id="xtypes"> <!--
<title>Extending <acronym>SQL</acronym>: Types</title> $Header: /cvsroot/pgsql/doc/src/sgml/xtypes.sgml,v 1.17 2003/04/10 01:22:45 petere Exp $
-->
<sect1 id="xtypes">
<title>User-Defined Types</title>
<indexterm zone="xtypes"> <indexterm zone="xtypes">
<primary>data types</primary> <primary>data types</primary>
...@@ -7,22 +11,20 @@ ...@@ -7,22 +11,20 @@
</indexterm> </indexterm>
<comment> <comment>
This chapter needs to be updated for the version-1 function manager This section needs to be updated for the version-1 function manager
interface. interface.
</comment> </comment>
<para> <para>
As previously mentioned, there are two kinds of types in As described above, there are two kinds of data types in
<productname>PostgreSQL</productname>: base types (defined in a <productname>PostgreSQL</productname>: base types and composite
programming language) and composite types. This chapter describes types. This section describes how to define new base types.
how to define new base types.
</para> </para>
<para> <para>
The examples in this section can be found in The examples in this section can be found in
<filename>complex.sql</filename> and <filename>complex.c</filename> <filename>complex.sql</filename> and <filename>complex.c</filename>
in the tutorial directory. Composite examples are in in the tutorial directory.
<filename>funcs.sql</filename>.
</para> </para>
<para> <para>
...@@ -36,15 +38,15 @@ ...@@ -36,15 +38,15 @@
These functions determine how the type appears in strings (for input These functions determine how the type appears in strings (for input
by the user and output to the user) and how the type is organized in by the user and output to the user) and how the type is organized in
memory. The input function takes a null-terminated character string memory. The input function takes a null-terminated character string
as its input and returns the internal (in memory) representation of as its argument and returns the internal (in memory) representation of
the type. The output function takes the internal representation of the type. The output function takes the internal representation of
the type and returns a null-terminated character string. the type as argument and returns a null-terminated character string.
</para> </para>
<para> <para>
Suppose we want to define a complex type which represents complex Suppose we want to define a type <type>complex</> that represents
numbers. Naturally, we would choose to represent a complex in memory complex numbers. A natural way to represent a complex number in
as the following <acronym>C</acronym> structure: memory would be the following C structure:
<programlisting> <programlisting>
typedef struct Complex { typedef struct Complex {
...@@ -53,24 +55,16 @@ typedef struct Complex { ...@@ -53,24 +55,16 @@ typedef struct Complex {
} Complex; } Complex;
</programlisting> </programlisting>
and a string of the form <literal>(x,y)</literal> as the external string As the external string representation of the type, we choose a
representation. string of the form <literal>(x,y)</literal>.
</para>
<para>
The functions are usually not hard to write, especially the output
function. However, there are a number of points to remember:
<itemizedlist>
<listitem>
<para>
When defining your external (string) representation, remember
that you must eventually write a complete and robust parser for
that representation as your input function!
</para> </para>
<para> <para>
For instance: The input and output functions are usually not hard to write,
especially the output function. But when defining the external
string representation of the type, remember that you must eventually
write a complete and robust parser for that representation as your
input function. For instance:
<programlisting> <programlisting>
Complex * Complex *
...@@ -78,19 +72,19 @@ complex_in(char *str) ...@@ -78,19 +72,19 @@ complex_in(char *str)
{ {
double x, y; double x, y;
Complex *result; Complex *result;
if (sscanf(str, " ( %lf , %lf )", &amp;x, &amp;y) != 2) {
if (sscanf(str, " ( %lf , %lf )", &amp;x, &amp;y) != 2)
{
elog(ERROR, "complex_in: error in parsing %s", str); elog(ERROR, "complex_in: error in parsing %s", str);
return NULL; return NULL;
} }
result = (Complex *)palloc(sizeof(Complex)); result = (Complex *) palloc(sizeof(Complex));
result-&gt;x = x; result-&gt;x = x;
result-&gt;y = y; result-&gt;y = y;
return (result); return result;
} }
</programlisting> </programlisting>
</para>
<para>
The output function can simply be: The output function can simply be:
<programlisting> <programlisting>
...@@ -98,29 +92,23 @@ char * ...@@ -98,29 +92,23 @@ char *
complex_out(Complex *complex) complex_out(Complex *complex)
{ {
char *result; char *result;
if (complex == NULL) if (complex == NULL)
return(NULL); return(NULL);
result = (char *) palloc(60); result = (char *) palloc(60);
sprintf(result, "(%g,%g)", complex-&gt;x, complex-&gt;y); sprintf(result, "(%g,%g)", complex-&gt;x, complex-&gt;y);
return(result); return result;
} }
</programlisting> </programlisting>
</para> </para>
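The point about writing a complete and robust parser can be made concrete. The <function>sscanf</function> call in <function>complex_in</function> above accepts input such as <literal>(1,2)garbage</literal> and <literal>(1,2</literal>, because it only counts the two converted numbers. The following is a minimal standalone sketch, not the tutorial's code: <function>parse_complex</function> is a hypothetical helper that reports failure through its return value instead of calling <function>elog</function>, and it uses <literal>%n</literal> to verify that the closing parenthesis was matched and that nothing follows it.

```c
#include <stdio.h>

typedef struct Complex {
    double x;
    double y;
} Complex;

/*
 * Hypothetical helper (not part of the tutorial sources):
 * parse "(x,y)" strictly. Returns 0 on success, -1 on malformed input.
 */
int
parse_complex(const char *str, Complex *out)
{
    double x, y;
    int    consumed = 0;

    /*
     * %n stores the number of characters consumed so far. It is only
     * reached if the closing parenthesis matched, so consumed stays 0
     * when the parenthesis is missing.
     */
    if (sscanf(str, " ( %lf , %lf ) %n", &x, &y, &consumed) != 2)
        return -1;

    /* Reject a missing ')' and any trailing non-whitespace characters. */
    if (consumed == 0 || str[consumed] != '\0')
        return -1;

    out->x = x;
    out->y = y;
    return 0;
}
```

In a real input function the failure branch would raise an error with <function>elog</function>, as <function>complex_in</function> does.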
</listitem>
<listitem>
<para> <para>
You should try to make the input and output functions inverses of You should try to make the input and output functions inverses of
each other. If you do not, you will have severe problems when each other. If you do not, you will have severe problems when you
you need to dump your data into a file and then read it back in need to dump your data into a file and then read it back in. This
(say, into someone else's database on another computer). This is is a particularly common problem when floating-point numbers are
a particularly common problem when floating-point numbers are
involved. involved.
</para> </para>
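The floating-point problem mentioned above can be illustrated with a small standalone check. This is a sketch under the assumption of IEEE 754 double precision; <function>roundtrips</function> is a hypothetical helper, not a PostgreSQL function. It prints a value with a given format, parses the text back, and reports whether the original value was recovered. The <literal>%g</literal> format used in <function>complex_out</function> prints six significant digits by default, which is not enough to reproduce most values.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper: returns 1 if printing "value" with "fmt" and
 * parsing the text back yields exactly the same double, 0 otherwise.
 */
int
roundtrips(double value, const char *fmt)
{
    char buf[64];

    snprintf(buf, sizeof(buf), fmt, value);
    return strtod(buf, NULL) == value;
}
```

For example, <literal>roundtrips(1.0 / 3.0, "%g")</literal> fails because the text <literal>0.333333</literal> does not parse back to the same value, while 17 significant digits (<literal>"%.17g"</literal>) are sufficient for any IEEE 754 double.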
</listitem>
</itemizedlist>
</para>
<para> <para>
To define the <type>complex</type> type, we need to create the two To define the <type>complex</type> type, we need to create the two
...@@ -130,14 +118,18 @@ complex_out(Complex *complex) ...@@ -130,14 +118,18 @@ complex_out(Complex *complex)
<programlisting> <programlisting>
CREATE FUNCTION complex_in(cstring) CREATE FUNCTION complex_in(cstring)
RETURNS complex RETURNS complex
AS '<replaceable>PGROOT</replaceable>/tutorial/complex' AS '<replaceable>filename</replaceable>'
LANGUAGE C; LANGUAGE C;
CREATE FUNCTION complex_out(complex) CREATE FUNCTION complex_out(complex)
RETURNS cstring RETURNS cstring
AS '<replaceable>PGROOT</replaceable>/tutorial/complex' AS '<replaceable>filename</replaceable>'
LANGUAGE C; LANGUAGE C;
</programlisting> </programlisting>
Notice that the declarations of the input and output functions must
reference the not-yet-defined type. This is allowed, but will draw
warning messages that may be ignored.
</para> </para>
<para> <para>
...@@ -149,49 +141,36 @@ CREATE TYPE complex ( ...@@ -149,49 +141,36 @@ CREATE TYPE complex (
output = complex_out output = complex_out
); );
</programlisting> </programlisting>
Notice that the declarations of the input and output functions must
reference the not-yet-defined type. This is allowed, but will draw
warning messages that may be ignored.
</para> </para>
<para> <para>
<indexterm> When you define a new base type,
<primary>arrays</primary>
</indexterm>
As discussed earlier, <productname>PostgreSQL</productname> fully
supports arrays of base types. Additionally,
<productname>PostgreSQL</productname> supports arrays of
user-defined types as well. When you define a type,
<productname>PostgreSQL</productname> automatically provides support <productname>PostgreSQL</productname> automatically provides support
for arrays of that type. For historical reasons, the array type has for arrays of that
the same name as the user-defined type with the underscore character type.<indexterm><primary>array</primary><secondary>of user-defined
<literal>_</> prepended. type</secondary></indexterm> For historical reasons, the array type
has the same name as the base type with the underscore character
(<literal>_</>) prepended.
</para> </para>
<para> <para>
Composite types do not need any function defined on them, since the If the values of your data type might exceed a few hundred bytes in
system already understands what they look like inside. size (in internal form), you should mark them
TOAST-able.<indexterm><primary>TOAST</primary><secondary>and
user-defined types</secondary></indexterm> To do this, the internal
representation must follow the standard layout for variable-length
data: the first four bytes must be an <type>int32</type> containing
the total length in bytes of the datum (including itself). Also,
when running the <command>CREATE TYPE</command> command, specify the
internal length as <literal>variable</> and select the appropriate
storage option.
</para> </para>
<para> <para>
<indexterm> For further details see the description of the <command>CREATE
<primary>TOAST</primary> TYPE</command> command in <xref linkend="reference">.
<secondary>and user-defined types</secondary>
</indexterm>
If the values of your data type might exceed a few hundred bytes in
size (in internal form), you should be careful to mark them
TOAST-able. To do this, the internal representation must follow the
standard layout for variable-length data: the first four bytes must
be an <type>int32</type> containing the total length in bytes of the
datum (including itself). Then, all your functions that accept
values of the type must be careful to call
<function>pg_detoast_datum()</function> on the supplied values ---
after checking that the value is not NULL, if your function is not
strict. Finally, select the appropriate storage option when giving
the <command>CREATE TYPE</command> command.
</para> </para>
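The variable-length layout described above (a four-byte length word that counts itself, followed by the data) can be sketched in standalone C. This is an illustration only: <type>VarBlob</type>, <function>make_blob</function>, and <function>blob_payload_size</function> are hypothetical names, and <function>malloc</function> stands in for the server's memory allocator. A real type would use <function>palloc</function> and the definitions from the PostgreSQL headers.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical variable-length datum: length word first, payload after. */
typedef struct VarBlob {
    int32_t length;   /* total size in bytes, including this field */
    char    data[];   /* payload bytes */
} VarBlob;

/* Build a datum holding n payload bytes; returns NULL if allocation fails. */
VarBlob *
make_blob(const char *payload, size_t n)
{
    size_t   total = offsetof(VarBlob, data) + n;
    VarBlob *blob = malloc(total);

    if (blob == NULL)
        return NULL;
    blob->length = (int32_t) total;   /* the length counts itself */
    memcpy(blob->data, payload, n);
    return blob;
}

/* Payload size is the total length minus the length word. */
size_t
blob_payload_size(const VarBlob *blob)
{
    return (size_t) blob->length - offsetof(VarBlob, data);
}
```

A three-byte payload therefore produces a datum whose length word is 7.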
</chapter> </sect1>
<!-- Keep this comment at the end of the file <!-- Keep this comment at the end of the file
Local variables: Local variables:
......