Commit e358a61d authored by Tom Lane

Updates for schema features.

parent 52200bef
<!--
$Header: /cvsroot/pgsql/doc/src/sgml/datatype.sgml,v 1.90 2002/04/25 02:56:55 tgl Exp $
$Header: /cvsroot/pgsql/doc/src/sgml/datatype.sgml,v 1.91 2002/04/25 20:14:43 tgl Exp $
-->
<chapter id="datatype">
......@@ -2924,6 +2924,18 @@ SELECT SUBSTRING(b FROM 1 FOR 2) FROM test;
<primary>regtype</primary>
</indexterm>
<indexterm zone="datatype-oid">
<primary>xid</primary>
</indexterm>
<indexterm zone="datatype-oid">
<primary>cid</primary>
</indexterm>
<indexterm zone="datatype-oid">
<primary>tid</primary>
</indexterm>
<para>
Object identifiers (OIDs) are used internally by
<productname>PostgreSQL</productname> as primary keys for various system
......@@ -3034,7 +3046,7 @@ SELECT SUBSTRING(b FROM 1 FOR 2) FROM test;
</para>
<para>
All of the alias types accept schema-qualified names, and will
All of the OID alias types accept schema-qualified names, and will
display schema-qualified names on output if the object would not
be found in the current search path without being qualified.
The <type>regproc</> and <type>regoper</> alias types will only
......@@ -3045,6 +3057,52 @@ SELECT SUBSTRING(b FROM 1 FOR 2) FROM test;
operand.
</para>
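<para>
 As a rough illustration of the input side of this rule, the sketch below
 feeds both a schema-qualified and an unqualified name to
 <type>regclass</>. It uses only the built-in <literal>pg_class</> catalog.
 Whether the result is displayed with or without its schema depends on the
 session's search path, so no output is shown.
</para>

```sql
-- A schema-qualified name is accepted as input.
SELECT 'pg_catalog.pg_class'::regclass;

-- An unqualified name is looked up through the current search path.
SELECT 'pg_class'::regclass;
```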
<para>
OIDs are 32-bit quantities and are assigned from a single cluster-wide
counter. In a large or long-lived database, it is possible for the
counter to wrap around. Hence, it is bad practice to assume that OIDs
are unique, unless you take steps to ensure that they are unique.
Recommended practice when using OIDs for row identification is to create
a unique constraint on the OID column of each table for which the OID will
be used. Never assume that OIDs are unique across tables; use the
combination of <structfield>tableoid</> and row OID if you need a
database-wide identifier. (Future releases of
<productname>PostgreSQL</productname> are likely to use a separate
OID counter for each table, so that <structfield>tableoid</>
<emphasis>must</> be included to arrive at a globally unique identifier.)
</para>
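<para>
 A minimal sketch of the recommended practice follows. The table
 <literal>mytable</>, its column <literal>payload</>, and the index name
 are hypothetical, and the example assumes a release in which tables are
 created with an OID column. Later <productname>PostgreSQL</productname>
 releases removed per-row OIDs, so it does not apply there.
</para>

```sql
-- Hypothetical table, assumed to have been created with OIDs.
CREATE TABLE mytable (payload text);

-- Enforce uniqueness of the OID column within this one table.
CREATE UNIQUE INDEX mytable_oid_idx ON mytable (oid);

-- For a database-wide identifier, pair the table's OID with the row's OID.
SELECT tableoid, oid, payload FROM mytable;
```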
<para>
Another identifier type used by the system is <type>xid</>, or transaction
(abbreviated xact) identifier. This is the datatype of the system columns
<structfield>xmin</> and <structfield>xmax</>.
Transaction identifiers are 32-bit quantities. In a long-lived
database it is possible for transaction IDs to wrap around. This
is not a fatal problem given appropriate maintenance procedures;
see the <citetitle>Administrator's Guide</> for details. However, it is
unwise to depend on uniqueness of transaction IDs over the long term
(more than one billion transactions).
</para>
<para>
A third identifier type used by the system is <type>cid</>, or command
identifier. This is the datatype of the system columns
<structfield>cmin</> and <structfield>cmax</>.
Command identifiers are also 32-bit quantities. This creates a hard
limit of 2<superscript>32</> (4 billion) SQL commands within a single
transaction.
In practice this limit is not a problem --- note that the limit is on
number of SQL queries, not number of tuples processed.
</para>
<para>
A final identifier type used by the system is <type>tid</>, or tuple
identifier. This is the datatype of the system column
<structfield>ctid</>. A tuple ID is a pair
(block number, tuple index within block) that identifies the
physical location of the tuple within its table.
</para>
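<para>
 The system columns carrying these identifier types can be inspected
 directly, as in the sketch below. The table name <literal>mytable</> is
 hypothetical. A <type>tid</> value is displayed as a parenthesized pair
 such as <literal>(0,1)</>; the actual values depend on the table's
 history, so none are shown.
</para>

```sql
-- ctid is of type tid, xmin and xmax are of type xid,
-- cmin and cmax are of type cid.
SELECT ctid, xmin, xmax, cmin, cmax FROM mytable;
```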
</sect1>
</chapter>
......
<!-- $Header: /cvsroot/pgsql/doc/src/sgml/queries.sgml,v 1.14 2001/11/28 20:49:10 petere Exp $ -->
<!-- $Header: /cvsroot/pgsql/doc/src/sgml/queries.sgml,v 1.15 2002/04/25 20:14:43 tgl Exp $ -->
<chapter id="queries">
<title>Queries</title>
......@@ -86,7 +86,8 @@ SELECT random();
FROM <replaceable>table_reference</replaceable> <optional>, <replaceable>table_reference</replaceable> <optional>, ...</optional></optional>
</synopsis>
A table reference may be a table name or a derived table such as a
A table reference may be a table name (possibly schema-qualified),
or a derived table such as a
subquery, a table join, or complex combinations of these. If more
than one table reference is listed in the FROM clause they are
cross-joined (see below) to form the derived table that may then
......
<!--
$Header: /cvsroot/pgsql/doc/src/sgml/syntax.sgml,v 1.59 2002/03/22 19:20:31 petere Exp $
$Header: /cvsroot/pgsql/doc/src/sgml/syntax.sgml,v 1.60 2002/04/25 20:14:43 tgl Exp $
-->
<chapter id="sql-syntax">
......@@ -623,14 +623,197 @@ CAST ( '<replaceable>string</replaceable>' AS <replaceable>type</replaceable> )
</sect2>
</sect1>
<sect1 id="sql-naming">
<title>Schemas and naming conventions</title>
<sect1 id="sql-syntax-columns">
<title>Columns</title>
<indexterm>
<primary>schemas</primary>
</indexterm>
<indexterm>
<primary>search path</primary>
</indexterm>
<indexterm>
<primary>namespaces</primary>
</indexterm>
<para>
A <productname>PostgreSQL</productname> database cluster (installation)
contains one or more named databases. Users and groups of users are
shared across the entire cluster, but no other data is shared across
databases. Any given client connection to the server can access
only the data in a single database, the one specified in the connection
request.
</para>
<note>
<para>
A <firstterm>column</firstterm>
is either a user-defined column of a given table or one of the
following system-defined columns:
Users of a cluster do not necessarily have the privilege to access every
database in the cluster. Sharing of user names means that there
cannot be different users named, say, <literal>joe</> in two databases
in the same cluster; but the system can be configured to allow
<literal>joe</> access to only some of the databases.
</para>
</note>
<para>
A database contains one or more named <firstterm>schemas</>, which
in turn contain tables. Schemas also contain other kinds of named
objects, including datatypes, functions, and operators. The same
object name can be used in different schemas without conflict; for
example, both <literal>schema1</> and <literal>myschema</> may
contain tables named <literal>mytable</>. Unlike databases, schemas
are not rigidly separated: a user may access objects in any of the
schemas in the database he is connected to, if he has privileges
to do so.
</para>
<indexterm>
<primary>qualified names</primary>
</indexterm>
<indexterm>
<primary>names</primary>
<secondary>qualified</secondary>
</indexterm>
<para>
To name a table precisely, write a <firstterm>qualified name</> consisting
of the schema name and table name separated by a dot:
<synopsis>
<replaceable>schema</><literal>.</><replaceable>table</>
</synopsis>
Actually, the even more general syntax
<synopsis>
<replaceable>database</><literal>.</><replaceable>schema</><literal>.</><replaceable>table</>
</synopsis>
can be used too, but at present this is just for pro-forma compliance
with the SQL standard; if you write a database name it must be the
same as the database you are connected to.
</para>
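<para>
 The sketch below shows qualified names in use, reusing the schema and
 table names from the example above. The column definition is
 hypothetical and exists only to make the statements complete.
</para>

```sql
-- Two schemas may each contain a table named mytable without conflict.
CREATE SCHEMA schema1;
CREATE SCHEMA myschema;
CREATE TABLE schema1.mytable (id integer);
CREATE TABLE myschema.mytable (id integer);

-- A qualified name selects one of them unambiguously.
SELECT * FROM schema1.mytable;
SELECT * FROM myschema.mytable;
```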
<indexterm>
<primary>unqualified names</primary>
</indexterm>
<indexterm>
<primary>names</primary>
<secondary>unqualified</secondary>
</indexterm>
<para>
Qualified names are tedious to write, and it's often best not to
wire a particular schema name into applications anyway. Therefore
tables are often referred to by <firstterm>unqualified names</>,
which consist of just the table name. The system determines which table
is meant by following a <firstterm>search path</>, which is a list
of schemas to look in. The first matching table in the search path
is taken to be the one wanted. If there is no match in the search
path, an error is reported, even if matching table names exist
in other schemas in the database.
</para>
<para>
The first schema named in the search path is called the current schema.
Aside from being the first schema searched, it is also the schema in
which new tables will be created if the <command>CREATE TABLE</>
command does not specify a schema name.
</para>
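<para>
 The behavior of unqualified names and of the current schema can be
 sketched as follows. The schema <literal>sales</> and the tables
 <literal>orders</> and <literal>invoices</> are hypothetical names
 chosen for illustration.
</para>

```sql
-- Hypothetical setup: the same table name exists in two schemas.
CREATE SCHEMA sales;
CREATE TABLE sales.orders (id integer);
CREATE TABLE public.orders (id integer);

-- Search sales first, then public.
SET search_path TO sales, public;

-- The first schema in the path is the current schema.
SELECT current_schema();

-- The unqualified name resolves to the first match in the path,
-- which is sales.orders here.
SELECT * FROM orders;

-- With no schema name given, the new table is created in the
-- current schema, that is, as sales.invoices.
CREATE TABLE invoices (id integer);
```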
<para>
The search path works in the same way for datatype names, function names,
and operator names as it does for table names. Datatype and function
names can be qualified in exactly the same way as table names. If you
need to write a qualified operator name in an expression, there is a
special provision: you must write
<synopsis>
<literal>OPERATOR(</><replaceable>schema</><literal>.</><replaceable>operator</><literal>)</>
</synopsis>
This is needed to avoid syntactic ambiguity. An example is
<programlisting>
SELECT 3 OPERATOR(pg_catalog.+) 4;
</programlisting>
In practice one usually relies on the search path for operators,
so as not to have to write anything so ugly as that.
</para>
<para>
The standard search path in <productname>PostgreSQL</productname>
contains first the schema having the same name as the session user
(if it exists), and second the schema named <literal>public</>
(if it exists, which it does by default). This arrangement allows
a flexible combination of private and shared tables. If no per-user
schemas are created then all user tables will exist in the shared
<literal>public</> schema, providing behavior that is backwards-compatible
with pre-7.3 <productname>PostgreSQL</productname> releases.
</para>
<note>
<para>
There is no concept of a <literal>public</> schema in the SQL standard.
To achieve closest conformance to the standard, the DBA should
create per-user schemas for every user, and not use (perhaps even
remove) the <literal>public</> schema.
</para>
</note>
<para>
In addition to <literal>public</> and user-created schemas, each database
contains a
<literal>pg_catalog</> schema, which contains the system tables
and all the built-in datatypes, functions, and operators.
<literal>pg_catalog</> is always effectively part of the search path.
If it is not named explicitly in the path then it is implicitly searched
<emphasis>before</> searching the path's schemas. This ensures that
built-in names will always be findable. However, you may explicitly
place <literal>pg_catalog</> at the end of your search path if you
prefer to have user-defined names override built-in names.
</para>
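<para>
 The two placements of <literal>pg_catalog</> described above can be
 sketched as follows. Only the <literal>public</> schema is assumed to
 exist.
</para>

```sql
-- pg_catalog is not named, so it is searched implicitly before public;
-- built-in names take precedence.
SET search_path TO public;

-- pg_catalog is named last, so names defined in public override
-- built-in names of the same kind.
SET search_path TO public, pg_catalog;

-- A built-in operator can still be reached by qualifying it.
SELECT 3 OPERATOR(pg_catalog.+) 4;
```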
<sect2 id="sql-reserved-names">
<title>Reserved names</title>
<indexterm>
<primary>reserved names</primary>
</indexterm>
<indexterm>
<primary>names</primary>
<secondary>reserved</secondary>
</indexterm>
<para>
There are several restrictions on the names that can be chosen for
user-defined database objects. These restrictions vary depending
on the kind of object. (Note that these restrictions are
separate from whether the name is a key word or not; quoting a
name will not allow you to escape these restrictions.)
</para>
<para>
Schema names beginning with <literal>pg_</> are reserved for system
purposes and may not be created by users.
</para>
<para>
In <productname>PostgreSQL</productname> versions before 7.3, table
names beginning with <literal>pg_</> were reserved. This is no longer
true: you may create such a table name if you wish, in any non-system
schema. However, it's best to continue to avoid such names,
to ensure that you won't suffer a conflict if some future version
defines a system catalog named the same as your table. (With the
default search path, an unqualified reference to your table name
would be resolved as the system catalog instead.) System catalogs will
continue to follow the convention of having names beginning with
<literal>pg_</>, so that they will not conflict with unqualified
user-table names so long as users avoid the <literal>pg_</> prefix.
</para>
<para>
Every table has several <firstterm>system columns</> that are
implicitly defined by the system. Therefore, these names cannot
be used as names of user-defined columns:
<indexterm>
<primary>columns</primary>
......@@ -648,7 +831,7 @@ CAST ( '<replaceable>string</replaceable>' AS <replaceable>type</replaceable> )
The object identifier (object ID) of a row. This is a serial number
that is automatically added by <productname>PostgreSQL</productname> to all table rows (unless
the table was created WITHOUT OIDS, in which case this column is
not present).
not present). See <xref linkend="datatype-oid"> for more info.
</para>
</listitem>
</varlistentry>
......@@ -715,13 +898,13 @@ CAST ( '<replaceable>string</replaceable>' AS <replaceable>type</replaceable> )
<term><structfield>ctid</></term>
<listitem>
<para>
The tuple ID of the tuple within its table. This is a pair
(block number, tuple index within block) that identifies the
physical location of the tuple. Note that although the <structfield>ctid</structfield>
can be used to locate the tuple very quickly, a row's <structfield>ctid</structfield>
will change each time it is updated or moved by <command>VACUUM
FULL</>.
Therefore <structfield>ctid</structfield> is useless as a long-term row identifier.
The physical location of the tuple within its table.
Note that although the <structfield>ctid</structfield>
can be used to locate the tuple very quickly, a row's
<structfield>ctid</structfield> will change each time it is updated
or moved by <command>VACUUM FULL</>.
Therefore <structfield>ctid</structfield> is useless as a long-term
row identifier.
The OID, or even better a user-defined serial number, should
be used to identify logical rows.
</para>
......@@ -729,38 +912,8 @@ CAST ( '<replaceable>string</replaceable>' AS <replaceable>type</replaceable> )
</varlistentry>
</variablelist>
</para>
<para>
OIDs are 32-bit quantities and are assigned from a single cluster-wide
counter. In a large or long-lived database, it is possible for the
counter to wrap around. Hence, it is bad practice to assume that OIDs
are unique, unless you take steps to ensure that they are unique.
Recommended practice when using OIDs for row identification is to create
a unique constraint on the OID column of each table for which the OID will be
used. Never assume that OIDs are unique across tables; use the
combination of <structfield>tableoid</> and row OID if you need a database-wide
identifier. (Future releases of <productname>PostgreSQL</productname> are likely to use a separate
OID counter for each table, so that <structfield>tableoid</> <emphasis>must</> be
included to arrive at a globally unique identifier.)
</para>
<para>
Transaction identifiers are 32-bit quantities. In a long-lived
database it is possible for transaction IDs to wrap around. This
is not a fatal problem given appropriate maintenance procedures;
see the <citetitle>Administrator's Guide</> for details. However, it is
unwise to depend on uniqueness of transaction IDs over the long term
(more than one billion transactions).
</para>
<para>
Command identifiers are also 32-bit quantities. This creates a hard
limit of 2<superscript>32</> (4 billion) SQL commands within a single transaction.
In practice this limit is not a problem --- note that the limit is on
number of SQL queries, not number of tuples processed.
</para>
</sect1>
</sect2>
</sect1>
<sect1 id="sql-expressions">
<title>Value Expressions</title>
......@@ -864,8 +1017,9 @@ CAST ( '<replaceable>string</replaceable>' AS <replaceable>type</replaceable> )
<replaceable>correlation</replaceable>.<replaceable>columnname</replaceable> `['<replaceable>subscript</replaceable>`]'
</synopsis>
<replaceable>correlation</replaceable> is either the name of a
table, an alias for a table defined by means of a FROM clause, or
<replaceable>correlation</replaceable> is the name of a
table (possibly qualified), or an alias for a table defined by means of a
FROM clause, or
the key words <literal>NEW</literal> or <literal>OLD</literal>.
(NEW and OLD can only appear in the action portion of a rule,
while other correlation names can be used in any SQL statement.)
......@@ -918,9 +1072,13 @@ CREATE FUNCTION dept (text) RETURNS dept
<member><replaceable>expression</replaceable> <replaceable>operator</replaceable> (unary postfix operator)</member>
</simplelist>
where the <replaceable>operator</replaceable> token follows the syntax
rules of <xref linkend="sql-syntax-operators"> or is one of the
tokens <token>AND</token>, <token>OR</token>, and
<token>NOT</token>. Which particular operators exist and whether
rules of <xref linkend="sql-syntax-operators">, or is one of the
keywords <token>AND</token>, <token>OR</token>, and
<token>NOT</token>, or is a qualified operator name
<synopsis>
<literal>OPERATOR(</><replaceable>schema</><literal>.</><replaceable>operatorname</><literal>)</>
</synopsis>
Which particular operators exist and whether
they are unary or binary depends on what operators have been
defined by the system or the user. <xref linkend="functions">
describes the built-in operators.
......@@ -932,8 +1090,7 @@ CREATE FUNCTION dept (text) RETURNS dept
<para>
The syntax for a function call is the name of a function
(which is subject to the syntax rules for identifiers of <xref
linkend="sql-syntax-identifiers">), followed by its argument list
(possibly qualified with a schema name), followed by its argument list
enclosed in parentheses:
<synopsis>
......@@ -976,7 +1133,8 @@ sqrt(2)
</simplelist>
where <replaceable>aggregate_name</replaceable> is a previously
defined aggregate, and <replaceable>expression</replaceable> is
defined aggregate (possibly a qualified name), and
<replaceable>expression</replaceable> is
any value expression that does not itself contain an aggregate
expression.
</para>
......@@ -1044,10 +1202,14 @@ CAST ( <replaceable>expression</replaceable> AS <replaceable>type</replaceable>
</para>
<para>
An explicit type cast may be omitted if there is no ambiguity as to the
type that a value expression must produce (for example, when it is
An explicit type cast may usually be omitted if there is no ambiguity as
to the type that a value expression must produce (for example, when it is
assigned to a table column); the system will automatically apply a
type cast in such cases.
type cast in such cases. However, automatic casting is only done for
cast functions that are marked <quote>okay to apply implicitly</>
in the system catalogs. Other cast functions must be invoked with
explicit casting syntax. This restriction is intended to prevent
surprising conversions from being applied silently.
</para>
<para>
......@@ -1061,7 +1223,7 @@ CAST ( <replaceable>expression</replaceable> AS <replaceable>type</replaceable>
can't be used this way, but the equivalent <literal>float8</literal>
can. Also, the names <literal>interval</>, <literal>time</>, and
<literal>timestamp</> can only be used in this fashion if they are
double-quoted, because of parser conflicts. Therefore, the use of
double-quoted, because of syntactic conflicts. Therefore, the use of
the function-like cast syntax leads to inconsistencies and should
probably be avoided in new applications.
</para>
......@@ -1143,21 +1305,21 @@ SELECT (5 !) - 6;
<tbody>
<row>
<entry><token>::</token></entry>
<entry><token>.</token></entry>
<entry>left</entry>
<entry><productname>PostgreSQL</productname>-style typecast</entry>
<entry>table/column name separator</entry>
</row>
<row>
<entry><token>[</token> <token>]</token></entry>
<entry><token>::</token></entry>
<entry>left</entry>
<entry>array element selection</entry>
<entry><productname>PostgreSQL</productname>-style typecast</entry>
</row>
<row>
<entry><token>.</token></entry>
<entry><token>[</token> <token>]</token></entry>
<entry>left</entry>
<entry>table/column name separator</entry>
<entry>array element selection</entry>
</row>
<row>
......
......@@ -228,7 +228,32 @@ should use this new function and will no longer do the implicit conversion using
<step performance="required">
<para>
Check for an exact match in the <classname>pg_operator</classname> system catalog.
Select the operators to be considered from the
<classname>pg_operator</classname> system catalog. If an unqualified
operator name is used (the usual case), the operators
considered are those of the right name and argument count that are
visible in the current search path (see <xref linkend="sql-naming">).
If a qualified operator name was given, only operators in the specified
schema are considered.
</para>
<substeps>
<step performance="optional">
<para>
If the search path finds multiple operators of identical argument types,
only the one appearing earliest in the path is considered. But operators of
different argument types are considered on an equal footing regardless of
search path position.
</para>
</step>
</substeps>
</step>
<step performance="required">
<para>
Check for an operator accepting exactly the input argument types.
If one exists (there can be only one exact match in the set of
operators considered), use it.
</para>
<substeps>
......@@ -250,10 +275,11 @@ Look for the best match.
<substeps>
<step performance="required">
<para>
Make a list of all operators of the same name for which the input types
match or can be coerced to match. (<type>unknown</type> literals are
assumed to be coercible to anything for this purpose.) If there is only
one, use it; else continue to the next step.
Discard candidate operators for which the input types do not match
and cannot be coerced (using an implicit coercion function) to match.
<type>unknown</type> literals are
assumed to be coercible to anything for this purpose. If only one
candidate remains, use it; else continue to the next step.
</para>
</step>
<step performance="required">
......@@ -467,13 +493,38 @@ tgl=> select cast(text '20' as int8) ! as "factorial";
<step performance="required">
<para>
Check for an exact match in the <classname>pg_proc</classname> system catalog.
Select the functions to be considered from the
<classname>pg_proc</classname> system catalog. If an unqualified
function name is used, the functions
considered are those of the right name and argument count that are
visible in the current search path (see <xref linkend="sql-naming">).
If a qualified function name was given, only functions in the specified
schema are considered.
</para>
<substeps>
<step performance="optional">
<para>
If the search path finds multiple functions of identical argument types,
only the one appearing earliest in the path is considered. But functions of
different argument types are considered on an equal footing regardless of
search path position.
</para>
</step>
</substeps>
</step>
<step performance="required">
<para>
Check for a function accepting exactly the input argument types.
If one exists (there can be only one exact match in the set of
functions considered), use it.
(Cases involving <type>unknown</type> will never find a match at
this step.)
</para></step>
<step performance="required">
<para>
If no exact match appears in the catalog, see whether the function call appears
If no exact match is found, see whether the function call appears
to be a trivial type coercion request. This happens if the function call
has just one argument and the function name is the same as the (internal)
name of some data type. Furthermore, the function argument must be either
......@@ -489,11 +540,11 @@ Look for the best match.
<substeps>
<step performance="required">
<para>
Make a list of all functions of the same name with the same number of
arguments for which the input types
match or can be coerced to match. (<type>unknown</type> literals are
assumed to be coercible to anything for this purpose.) If there is only
one, use it; else continue to the next step.
Discard candidate functions for which the input types do not match
and cannot be coerced (using an implicit coercion function) to match.
<type>unknown</type> literals are
assumed to be coercible to anything for this purpose. If only one
candidate remains, use it; else continue to the next step.
</para>
</step>
<step performance="required">
......