Updates for schema features.

Commit e358a61d authored Apr 25, 2002 by Tom Lane
4 changed files
--- a/doc/src/sgml/datatype.sgml
+++ b/doc/src/sgml/datatype.sgml
 <!--
-$Header: /cvsroot/pgsql/doc/src/sgml/datatype.sgml,v 1.90 2002/04/25 02:56:55 tgl Exp $
+$Header: /cvsroot/pgsql/doc/src/sgml/datatype.sgml,v 1.91 2002/04/25 20:14:43 tgl Exp $
 -->

 <chapter id="datatype">
@@ -2924,6 +2924,18 @@ SELECT SUBSTRING(b FROM 1 FOR 2) FROM test;
    <primary>regtype</primary>
   </indexterm>

+   <indexterm zone="datatype-oid">
+    <primary>xid</primary>
+   </indexterm>
+
+   <indexterm zone="datatype-oid">
+    <primary>cid</primary>
+   </indexterm>
+
+   <indexterm zone="datatype-oid">
+    <primary>tid</primary>
+   </indexterm>
+
   <para>
    Object identifiers (OIDs) are used internally by
    <productname>PostgreSQL</productname> as primary keys for various system
@@ -3034,7 +3046,7 @@ SELECT SUBSTRING(b FROM 1 FOR 2) FROM test;
   </para>

   <para>
-    All of the alias types accept schema-qualified names, and will
+    All of the OID alias types accept schema-qualified names, and will
    display schema-qualified names on output if the object would not
    be found in the current search path without being qualified.
    The <type>regproc</> and <type>regoper</> alias types will only
@@ -3045,6 +3057,52 @@ SELECT SUBSTRING(b FROM 1 FOR 2) FROM test;
    operand.
   </para>

+   <para>
+    OIDs are 32-bit quantities and are assigned from a single cluster-wide
+    counter.  In a large or long-lived database, it is possible for the
+    counter to wrap around.  Hence, it is bad practice to assume that OIDs
+    are unique, unless you take steps to ensure that they are unique.
+    Recommended practice when using OIDs for row identification is to create
+    a unique constraint on the OID column of each table for which the OID will
+    be used.  Never assume that OIDs are unique across tables; use the
+    combination of <structfield>tableoid</> and row OID if you need a
+    database-wide identifier.  (Future releases of
+    <productname>PostgreSQL</productname> are likely to use a separate
+    OID counter for each table, so that <structfield>tableoid</>
+    <emphasis>must</> be included to arrive at a globally unique identifier.)
+   </para>
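A minimal Python sketch of why OID uniqueness cannot be assumed. It makes a simplifying assumption: OIDs come from one 32-bit counter that wraps straight back to 0. A real server may reserve or skip some ranges, and this model ignores that. The names `OidCounter`, `UniqueOidColumn`, and `row_identifier` are hypothetical. `UniqueOidColumn` stands in for the recommended unique constraint on a table's OID column.

```python
from typing import Set, Tuple

OID_MODULUS = 2 ** 32  # OIDs are 32-bit quantities


class OidCounter:
    """Hypothetical model of the single cluster-wide OID counter."""

    def __init__(self, start: int = 0) -> None:
        self.next_oid = start % OID_MODULUS

    def allocate(self) -> int:
        oid = self.next_oid
        # Wrap around after 2**32 - 1 (simplified: real servers may skip ranges).
        self.next_oid = (self.next_oid + 1) % OID_MODULUS
        return oid


class UniqueOidColumn:
    """Stands in for a unique constraint on one table's OID column."""

    def __init__(self) -> None:
        self.seen: Set[int] = set()

    def insert(self, oid: int) -> None:
        if oid in self.seen:
            raise ValueError(f"duplicate OID {oid}")
        self.seen.add(oid)


def row_identifier(tableoid: int, oid: int) -> Tuple[int, int]:
    """Database-wide identifier: the row OID alone is not enough."""
    return (tableoid, oid)
```

If the counter starts at `OID_MODULUS - 1`, two allocations yield 4294967295 and then 0. After a full cycle, previously used OIDs come back, and only the unique constraint notices. The same OID in two different tables remains distinguishable through the `(tableoid, oid)` pair.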
+
+   <para>
+    Another identifier type used by the system is <type>xid</>, or transaction
+    (abbreviated xact) identifier.  This is the datatype of the system columns
+    <structfield>xmin</> and <structfield>xmax</>.
+    Transaction identifiers are 32-bit quantities.  In a long-lived
+    database it is possible for transaction IDs to wrap around.  This
+    is not a fatal problem given appropriate maintenance procedures;
+    see the <citetitle>Administrator's Guide</> for details.  However, it is
+    unwise to depend on uniqueness of transaction IDs over the long term
+    (more than one billion transactions).
+   </para>
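A circular, modulo-2^32 comparison of transaction IDs shows both why wraparound is survivable and why long-term uniqueness is not. The sketch is a generic illustration of that technique, not PostgreSQL source code, and `xid_precedes` is a hypothetical name. It reads the 32-bit difference as a signed number. As a result, it gives sensible answers only while the two IDs are less than about 2^31 (roughly 2.1 billion) apart. The one-billion figure in the text stays well inside that window.

```python
XID_MODULUS = 2 ** 32  # transaction IDs are 32-bit quantities
HALF_RANGE = 2 ** 31


def xid_precedes(a: int, b: int) -> bool:
    """Return True if xid `a` is logically older than xid `b`.

    The difference a - b is reduced modulo 2**32 and then read as a
    signed 32-bit value, so comparisons still work across a wraparound
    as long as the two IDs are less than 2**31 apart.
    """
    diff = (a - b) % XID_MODULUS
    if diff >= HALF_RANGE:
        diff -= XID_MODULUS
    return diff < 0
```

An xid assigned just before the counter wraps (`XID_MODULUS - 10`) still precedes one assigned just after it (`5`). By contrast, `0` and `2**31 + 1` are too far apart, so the comparison treats `2**31 + 1` as the older one.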
+
+   <para>
+    A third identifier type used by the system is <type>cid</>, or command
+    identifier.  This is the datatype of the system columns
+    <structfield>cmin</> and <structfield>cmax</>.
+    Command identifiers are also 32-bit quantities.  This creates a hard
+    limit of 2<superscript>32</> (4 billion) SQL commands within a single
+    transaction.
+    In practice this limit is not a problem: it applies to the number of
+    SQL commands, not to the number of tuples processed.
+   </para>
+
+   <para>
+    A final identifier type used by the system is <type>tid</>, or tuple
+    identifier.  This is the datatype of the system column
+    <structfield>ctid</>.  A tuple ID is a pair
+    (block number, tuple index within block) that identifies the
+    physical location of the tuple within its table.
+   </para>
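A `tid` is just a (block number, tuple index within block) pair, so it can be modelled as a small value type. This sketch is hypothetical. It assumes the parenthesised text form such as `(0,1)` and uses invented names (`Tid`, `parse_tid`).

```python
from typing import NamedTuple


class Tid(NamedTuple):
    block: int  # block number within the table
    index: int  # tuple index within that block


def parse_tid(text: str) -> Tid:
    """Parse a tuple identifier written as '(block,index)'."""
    s = text.strip()
    if not (s.startswith("(") and s.endswith(")")):
        raise ValueError(f"not a tid literal: {text!r}")
    parts = s[1:-1].split(",")
    if len(parts) != 2:
        raise ValueError(f"not a tid literal: {text!r}")
    return Tid(int(parts[0]), int(parts[1]))
```

Because `Tid` is a named tuple, values sort by block first and then by index within the block. As the `ctid` entry in syntax.sgml points out, this value changes when a row is updated or moved by VACUUM FULL. It therefore identifies a physical location, not a logical row.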
+
  </sect1>

 </chapter>

--- a/doc/src/sgml/queries.sgml
+++ b/doc/src/sgml/queries.sgml
-<!-- $Header: /cvsroot/pgsql/doc/src/sgml/queries.sgml,v 1.14 2001/11/28 20:49:10 petere Exp $ -->
+<!-- $Header: /cvsroot/pgsql/doc/src/sgml/queries.sgml,v 1.15 2002/04/25 20:14:43 tgl Exp $ -->

 <chapter id="queries">
 <title>Queries</title>
@@ -86,7 +86,8 @@ SELECT random();
 FROM <replaceable>table_reference</replaceable> <optional>, <replaceable>table_reference</replaceable> <optional>, ...</optional></optional>
 </synopsis>

-    A table reference may be a table name or a derived table such as a
+    A table reference may be a table name (possibly schema-qualified),
+    or a derived table such as a
    subquery, a table join, or complex combinations of these.  If more
    than one table reference is listed in the FROM clause they are
    cross-joined (see below) to form the derived table that may then

--- a/doc/src/sgml/syntax.sgml
+++ b/doc/src/sgml/syntax.sgml
 <!--
-$Header: /cvsroot/pgsql/doc/src/sgml/syntax.sgml,v 1.59 2002/03/22 19:20:31 petere Exp $
+$Header: /cvsroot/pgsql/doc/src/sgml/syntax.sgml,v 1.60 2002/04/25 20:14:43 tgl Exp $
 -->

 <chapter id="sql-syntax">
@@ -623,14 +623,197 @@ CAST ( '<replaceable>string</replaceable>' AS <replaceable>type</replaceable> )
  </sect2>
 </sect1>

+ <sect1 id="sql-naming">
+  <title>Schemas and naming conventions</title>

-  <sect1 id="sql-syntax-columns">
-   <title>Columns</title>
+     <indexterm>
+      <primary>schemas</primary>
+     </indexterm>

+     <indexterm>
+      <primary>search path</primary>
+     </indexterm>
+
+     <indexterm>
+      <primary>namespaces</primary>
+     </indexterm>
+
+   <para>
+    A <productname>PostgreSQL</productname> database cluster (installation)
+    contains one or more named databases.  Users and groups of users are
+    shared across the entire cluster, but no other data is shared across
+    databases.  Any given client connection to the server can access
+    only the data in a single database, the one specified in the connection
+    request.
+   </para>
+
+   <note>
    <para>
-     A <firstterm>column</firstterm>
-     is either a user-defined column of a given table or one of the
-     following system-defined columns:
+     Users of a cluster do not necessarily have the privilege to access every
+     database in the cluster.  Sharing of user names means that there
+     cannot be different users named, say, <literal>joe</> in two databases
+     in the same cluster; but the system can be configured to allow
+     <literal>joe</> access to only some of the databases.
+    </para>
+   </note>
+
+   <para>
+    A database contains one or more named <firstterm>schemas</>, which
+    in turn contain tables.  Schemas also contain other kinds of named
+    objects, including datatypes, functions, and operators.  The same
+    object name can be used in different schemas without conflict; for
+    example, both <literal>schema1</> and <literal>myschema</> may
+    contain tables named <literal>mytable</>.  Unlike databases, schemas
+    are not rigidly separated: a user may access objects in any of the
+    schemas in the database he is connected to, if he has privileges
+    to do so.
+   </para>
+
+     <indexterm>
+      <primary>qualified names</primary>
+     </indexterm>
+
+     <indexterm>
+      <primary>names</primary>
+      <secondary>qualified</secondary>
+     </indexterm>
+
+   <para>
+    To name a table precisely, write a <firstterm>qualified name</> consisting
+    of the schema name and table name separated by a dot:
+<synopsis>
+    <replaceable>schema</><literal>.</><replaceable>table</>
+</synopsis>
+    Actually, the even more general syntax
+<synopsis>
+    <replaceable>database</><literal>.</><replaceable>schema</><literal>.</><replaceable>table</>
+</synopsis>
+    can be used too, but at present this is just for pro-forma compliance
+    with the SQL standard; if you write a database name it must be the
+    same as the database you are connected to.
+   </para>
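A small parser illustrates the three accepted name forms. This is a hypothetical sketch (`parse_table_name` and `QualifiedName` are invented names). It deliberately ignores quoted identifiers and case folding, so a dot inside a quoted name would be mis-split. It shows only how the parts map onto schema and table, and how a database part must match the current database.

```python
from typing import NamedTuple, Optional


class QualifiedName(NamedTuple):
    schema: Optional[str]  # None for an unqualified name
    table: str


def parse_table_name(name: str, current_database: str) -> QualifiedName:
    """Split table, schema.table, or database.schema.table."""
    parts = name.split(".")
    if len(parts) == 1:
        return QualifiedName(None, parts[0])
    if len(parts) == 2:
        return QualifiedName(parts[0], parts[1])
    if len(parts) == 3:
        database, schema, table = parts
        # The database part is accepted only for standard compliance.
        if database != current_database:
            raise ValueError(f"reference to another database: {name}")
        return QualifiedName(schema, table)
    raise ValueError(f"too many dotted parts: {name}")
```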
+
+     <indexterm>
+      <primary>unqualified names</primary>
+     </indexterm>
+
+     <indexterm>
+      <primary>names</primary>
+      <secondary>unqualified</secondary>
+     </indexterm>
+
+   <para>
+    Qualified names are tedious to write, and it's often best not to
+    wire a particular schema name into applications anyway.  Therefore
+    tables are often referred to by <firstterm>unqualified names</>,
+    which consist of just the table name.  The system determines which table
+    is meant by following a <firstterm>search path</>, which is a list
+    of schemas to look in.  The first matching table in the search path
+    is taken to be the one wanted.  If there is no match in the search
+    path, an error is reported, even if matching table names exist
+    in other schemas in the database.
+   </para>
+
+   <para>
+    The first schema named in the search path is called the current schema.
+    Aside from being the first schema searched, it is also the schema in
+    which new tables will be created if the <command>CREATE TABLE</>
+    command does not specify a schema name.
+   </para>
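The lookup rule and the choice of creation schema can be sketched as follows. Here the catalog is a hypothetical dictionary that maps schema names to the table names they contain. The sketch assumes every schema in the path exists. `resolve_table` and `creation_schema` are invented names.

```python
from typing import Dict, List, Set

# Hypothetical catalog: schema name -> names of the tables it contains.
Catalog = Dict[str, Set[str]]


def resolve_table(table: str, search_path: List[str], catalog: Catalog) -> str:
    """Return the schema of the first matching table along the search path."""
    for schema in search_path:
        if table in catalog.get(schema, set()):
            return schema
    # No match in the path is an error, even if another schema has the name.
    raise LookupError(f"no table named {table!r} in the search path")


def creation_schema(search_path: List[str]) -> str:
    """CREATE TABLE without a schema name uses the current (first) schema."""
    if not search_path:
        raise LookupError("search path is empty")
    return search_path[0]
```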
+
+   <para>
+    The search path works in the same way for datatype names, function names,
+    and operator names as it does for table names.  Datatype and function
+    names can be qualified in exactly the same way as table names.  If you
+    need to write a qualified operator name in an expression, there is a
+    special provision: you must write
+<synopsis>
+    <literal>OPERATOR(</><replaceable>schema</><literal>.</><replaceable>operator</><literal>)</>
+</synopsis>
+    This is needed to avoid syntactic ambiguity.  An example is
+<programlisting>
+    SELECT 3 OPERATOR(pg_catalog.+) 4;
+</programlisting>
+    In practice one usually relies on the search path for operators,
+    so as not to have to write anything so ugly as that.
+   </para>
+
+   <para>
+    The standard search path in <productname>PostgreSQL</productname>
+    contains first the schema having the same name as the session user
+    (if it exists), and second the schema named <literal>public</>
+    (if it exists, which it does by default).  This arrangement allows
+    a flexible combination of private and shared tables.  If no per-user
+    schemas are created then all user tables will exist in the shared
+    <literal>public</> schema, providing behavior that is backwards-compatible
+    with pre-7.3 <productname>PostgreSQL</productname> releases.
+   </para>
+
+   <note>
+    <para>
+     There is no concept of a <literal>public</> schema in the SQL standard.
+     To achieve closest conformance to the standard, the DBA should
+     create per-user schemas for every user, and not use (perhaps even
+     remove) the <literal>public</> schema.
+    </para>
+   </note>
+
+   <para>
+    In addition to <literal>public</> and user-created schemas, each database
+    contains a <literal>pg_catalog</> schema, which contains the system tables
+    and all the built-in datatypes, functions, and operators.
+    <literal>pg_catalog</> is always effectively part of the search path.
+    If it is not named explicitly in the path then it is implicitly searched
+    <emphasis>before</> searching the path's schemas.  This ensures that
+    built-in names will always be findable.  However, you may explicitly
+    place <literal>pg_catalog</> at the end of your search path if you
+    prefer to have user-defined names override built-in names.
+   </para>
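Combining the standard path with the implicit <literal>pg_catalog</> rule gives the list of schemas that are actually searched. The following hypothetical sketch makes three simplifications:

- The placeholder `$user` stands for the session user's schema.
- Entries that name schemas that do not exist are skipped, as the text describes for the standard path.
- `pg_catalog` is prepended only when the path does not already name it.

```python
from typing import List, Set

DEFAULT_PATH = ["$user", "public"]  # hypothetical encoding of the standard path


def effective_search_path(path: List[str], session_user: str,
                          existing_schemas: Set[str]) -> List[str]:
    """Return the schemas that are searched, in order."""
    searched: List[str] = []
    for entry in path:
        # "$user" is this sketch's placeholder for the session user's schema.
        name = session_user if entry == "$user" else entry
        # Schemas that do not exist are skipped rather than searched.
        if name in existing_schemas:
            searched.append(name)
    # pg_catalog is always searched; if not named explicitly, it goes first.
    if "pg_catalog" not in searched:
        searched.insert(0, "pg_catalog")
    return searched
```

Naming `pg_catalog` last, as in the third test case, is how user-defined names can be made to override built-in ones.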
+
+  <sect2 id="sql-reserved-names">
+   <title>Reserved names</title>
+
+     <indexterm>
+      <primary>reserved names</primary>
+     </indexterm>
+
+     <indexterm>
+      <primary>names</primary>
+      <secondary>reserved</secondary>
+     </indexterm>
+
+    <para>
+     There are several restrictions on the names that can be chosen for
+     user-defined database objects.  These restrictions vary depending
+     on the kind of object.  (Note that these restrictions are
+     separate from whether the name is a key word or not; quoting a
+     name will not allow you to escape these restrictions.)
+    </para>
+
+    <para>
+     Schema names beginning with <literal>pg_</> are reserved for system
+     purposes and may not be created by users.
+    </para>
+
+    <para>
+     In <productname>PostgreSQL</productname> versions before 7.3, table
+     names beginning with <literal>pg_</> were reserved.  This is no longer
+     true: you may give a table such a name if you wish, in any non-system
+     schema.  However, it's best to continue to avoid such names,
+     to ensure that you won't suffer a conflict if some future version
+     defines a system catalog named the same as your table.  (With the
+     default search path, an unqualified reference to your table name
+     would be resolved as the system catalog instead.)  System catalogs will
+     continue to follow the convention of having names beginning with
+     <literal>pg_</>, so that they will not conflict with unqualified
+     user-table names so long as users avoid the <literal>pg_</> prefix.
+    </para>
+
+    <para>
+     Every table has several <firstterm>system columns</> that are
+     implicitly defined by the system.  Therefore, these names cannot
+     be used as names of user-defined columns:

     <indexterm>
      <primary>columns</primary>
@@ -648,7 +831,7 @@ CAST ( '<replaceable>string</replaceable>' AS <replaceable>type</replaceable> )
 	 The object identifier (object ID) of a row.  This is a serial number
 	 that is automatically added by <productname>PostgreSQL</productname> to all table rows (unless
 	 the table was created WITHOUT OIDS, in which case this column is
-	 not present).
+	 not present).  See <xref linkend="datatype-oid"> for more info.
 	</para>
       </listitem>
      </varlistentry>
@@ -715,13 +898,13 @@ CAST ( '<replaceable>string</replaceable>' AS <replaceable>type</replaceable> )
      <term><structfield>ctid</></term>
       <listitem>
 	<para>
-	 The tuple ID of the tuple within its table.  This is a pair
-	 (block number, tuple index within block) that identifies the
-	 physical location of the tuple.  Note that although the <structfield>ctid</structfield>
-	 can be used to locate the tuple very quickly, a row's <structfield>ctid</structfield>
-	 will change each time it is updated or moved by <command>VACUUM
-	 FULL</>.
-	 Therefore <structfield>ctid</structfield> is useless as a long-term row identifier.
+	 The physical location of the tuple within its table.
+	 Note that although the <structfield>ctid</structfield>
+	 can be used to locate the tuple very quickly, a row's
+	 <structfield>ctid</structfield> will change each time it is updated
+	 or moved by <command>VACUUM FULL</>.
+	 Therefore <structfield>ctid</structfield> is useless as a long-term
+	 row identifier.
 	 The OID, or even better a user-defined serial number, should
 	 be used to identify logical rows.
 	</para>
@@ -729,38 +912,8 @@ CAST ( '<replaceable>string</replaceable>' AS <replaceable>type</replaceable> )
      </varlistentry>
     </variablelist>
    </para>
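The two restrictions in this subsection can be expressed as simple checks. This sketch is hypothetical, and the function names are invented. The set of system column names comes from the columns this chapter documents (`oid`, `tableoid`, `xmin`, `cmin`, `xmax`, `cmax`, `ctid`). Names are assumed to be already stripped of any quotes, because quoting does not lift these restrictions.

```python
SYSTEM_COLUMNS = frozenset(
    {"oid", "tableoid", "xmin", "cmin", "xmax", "cmax", "ctid"}
)


def check_schema_name(name: str) -> None:
    """Reject user-created schema names reserved for system purposes."""
    if name.startswith("pg_"):
        raise ValueError(f"schema name {name!r} is reserved (pg_ prefix)")


def check_column_name(name: str) -> None:
    """Reject user column names that collide with system columns."""
    if name in SYSTEM_COLUMNS:
        raise ValueError(f"column name {name!r} conflicts with a system column")
```

There is deliberately no check for table names. From 7.3 on, a `pg_` table name is legal in a non-system schema, although it is unwise.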
-
-    <para>
-     OIDs are 32-bit quantities and are assigned from a single cluster-wide
-     counter.  In a large or long-lived database, it is possible for the
-     counter to wrap around.  Hence, it is bad practice to assume that OIDs
-     are unique, unless you take steps to ensure that they are unique.
-     Recommended practice when using OIDs for row identification is to create
-     a unique constraint on the OID column of each table for which the OID will be
-     used.  Never assume that OIDs are unique across tables; use the
-     combination of <structfield>tableoid</> and row OID if you need a database-wide
-     identifier.  (Future releases of <productname>PostgreSQL</productname> are likely to use a separate
-     OID counter for each table, so that <structfield>tableoid</> <emphasis>must</> be
-     included to arrive at a globally unique identifier.)
-    </para>
-
-    <para>
-     Transaction identifiers are 32-bit quantities.  In a long-lived
-     database it is possible for transaction IDs to wrap around.  This
-     is not a fatal problem given appropriate maintenance procedures;
-     see the <citetitle>Administrator's Guide</> for details.  However, it is
-     unwise to depend on uniqueness of transaction IDs over the long term
-     (more than one billion transactions).
-    </para>
-
-    <para>
-     Command identifiers are also 32-bit quantities.  This creates a hard
-     limit of 2<superscript>32</> (4 billion) SQL commands within a single transaction.
-     In practice this limit is not a problem --- note that the limit is on
-     number of SQL queries, not number of tuples processed.
-    </para>
-  </sect1>
-
+  </sect2>
+ </sect1>

 <sect1 id="sql-expressions">
  <title>Value Expressions</title>
@@ -864,8 +1017,9 @@ CAST ( '<replaceable>string</replaceable>' AS <replaceable>type</replaceable> )
 <replaceable>correlation</replaceable>.<replaceable>columnname</replaceable> `['<replaceable>subscript</replaceable>`]'
 </synopsis>

-    <replaceable>correlation</replaceable> is either the name of a
-    table, an alias for a table defined by means of a FROM clause, or
+    <replaceable>correlation</replaceable> is the name of a
+    table (possibly qualified), an alias for a table defined by means of a
+    FROM clause, or
    the key words <literal>NEW</literal> or <literal>OLD</literal>.
    (NEW and OLD can only appear in the action portion of a rule,
    while other correlation names can be used in any SQL statement.)
@@ -918,9 +1072,13 @@ CREATE FUNCTION dept (text) RETURNS dept
     <member><replaceable>expression</replaceable> <replaceable>operator</replaceable> (unary postfix operator)</member>
    </simplelist>
    where the <replaceable>operator</replaceable> token follows the syntax
-    rules of <xref linkend="sql-syntax-operators"> or is one of the
-    tokens <token>AND</token>, <token>OR</token>, and
-    <token>NOT</token>.  Which particular operators exist and whether
+    rules of <xref linkend="sql-syntax-operators">, or is one of the
+    keywords <token>AND</token>, <token>OR</token>, and
+    <token>NOT</token>, or is a qualified operator name
+<synopsis>
+    <literal>OPERATOR(</><replaceable>schema</><literal>.</><replaceable>operatorname</><literal>)</>
+</synopsis>
+    Which particular operators exist and whether
    they are unary or binary depends on what operators have been
    defined by the system or the user.  <xref linkend="functions">
    describes the built-in operators.
@@ -932,8 +1090,7 @@ CREATE FUNCTION dept (text) RETURNS dept

   <para>
    The syntax for a function call is the name of a function
-    (which is subject to the syntax rules for identifiers of <xref
-    linkend="sql-syntax-identifiers">), followed by its argument list
+    (possibly qualified with a schema name), followed by its argument list
    enclosed in parentheses:

 <synopsis>
@@ -976,7 +1133,8 @@ sqrt(2)
    </simplelist>

    where <replaceable>aggregate_name</replaceable> is a previously
-    defined aggregate, and <replaceable>expression</replaceable> is
+    defined aggregate (possibly a qualified name), and
+    <replaceable>expression</replaceable> is 
    any value expression that does not itself contain an aggregate
    expression.
   </para>
@@ -1044,10 +1202,14 @@ CAST ( <replaceable>expression</replaceable> AS <replaceable>type</replaceable>
   </para>

   <para>
-    An explicit type cast may be omitted if there is no ambiguity as to the
-    type that a value expression must produce (for example, when it is
+    An explicit type cast may usually be omitted if there is no ambiguity as
+    to the type that a value expression must produce (for example, when it is
    assigned to a table column); the system will automatically apply a
-    type cast in such cases.
+    type cast in such cases.  However, automatic casting is only done for
+    cast functions that are marked <quote>okay to apply implicitly</>
+    in the system catalogs.  Other cast functions must be invoked with
+    explicit casting syntax.  This restriction is intended to prevent
+    surprising conversions from being applied silently.
   </para>
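The implicit-versus-explicit rule can be sketched as a lookup in a cast table. For each cast function, the table records whether it is marked okay to apply implicitly. The type names and entries below are invented for illustration and do not reflect PostgreSQL's actual catalog. `can_convert` is a hypothetical name.

```python
from typing import Dict, Tuple

# Hypothetical cast table: (source type, target type) -> implicitly applicable?
CASTS: Dict[Tuple[str, str], bool] = {
    ("type_a", "type_b"): True,   # marked okay to apply implicitly
    ("type_b", "type_a"): False,  # usable only via explicit casting syntax
}


def can_convert(source: str, target: str, explicit: bool) -> bool:
    """Decide whether a value of `source` type may become `target` type."""
    if source == target:
        return True
    implicit_ok = CASTS.get((source, target))
    if implicit_ok is None:
        return False  # no cast function exists at all
    # Explicit syntax may use any cast; automatic casting only implicit ones.
    return explicit or implicit_ok
```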

   <para>
@@ -1061,7 +1223,7 @@ CAST ( <replaceable>expression</replaceable> AS <replaceable>type</replaceable>
    can't be used this way, but the equivalent <literal>float8</literal>
    can.  Also, the names <literal>interval</>, <literal>time</>, and
    <literal>timestamp</> can only be used in this fashion if they are
-    double-quoted, because of parser conflicts.  Therefore, the use of
+    double-quoted, because of syntactic conflicts.  Therefore, the use of
    the function-like cast syntax leads to inconsistencies and should
    probably be avoided in new applications.
   </para>
@@ -1143,21 +1305,21 @@ SELECT (5 !) - 6;

     <tbody>
      <row>
-       <entry><token>::</token></entry>
+       <entry><token>.</token></entry>
       <entry>left</entry>
-       <entry><productname>PostgreSQL</productname>-style typecast</entry>
+       <entry>table/column name separator</entry>
      </row>

      <row>
-       <entry><token>[</token> <token>]</token></entry>
+       <entry><token>::</token></entry>
       <entry>left</entry>
-       <entry>array element selection</entry>
+       <entry><productname>PostgreSQL</productname>-style typecast</entry>
      </row>

      <row>
-       <entry><token>.</token></entry>
+       <entry><token>[</token> <token>]</token></entry>
       <entry>left</entry>
-       <entry>table/column name separator</entry>
+       <entry>array element selection</entry>
      </row>

      <row>

--- a/doc/src/sgml/typeconv.sgml
+++ b/doc/src/sgml/typeconv.sgml
@@ -228,7 +228,32 @@ should use this new function and will no longer do the implicit conversion using

 <step performance="required">
 <para>
-Check for an exact match in the <classname>pg_operator</classname> system catalog.
+Select the operators to be considered from the
+<classname>pg_operator</classname> system catalog.  If an unqualified
+operator name is used (the usual case), the operators
+considered are those of the right name and argument count that are
+visible in the current search path (see <xref linkend="sql-naming">).
+If a qualified operator name was given, only operators in the specified
+schema are considered.
+</para>
+
+<substeps>
+<step performance="optional">
+<para>
+If the search path finds multiple operators of identical argument types,
+only the one appearing earliest in the path is considered.  But operators of
+different argument types are considered on an equal footing regardless of
+search path position.
+</para>
+</step>
+</substeps>
+</step>
+
+<step performance="required">
+<para>
+Check for an operator accepting exactly the input argument types.
+If one exists (there can be only one exact match in the set of
+operators considered), use it.
 </para>

 <substeps>
@@ -250,10 +275,11 @@ Look for the best match.
 <substeps>
 <step performance="required">
 <para>
-Make a list of all operators of the same name for which the input types
-match or can be coerced to match.  (<type>unknown</type> literals are
-assumed to be coercible to anything for this purpose.)  If there is only
-one, use it; else continue to the next step.
+Discard candidate operators for which the input types do not match
+and cannot be coerced (using an implicit coercion function) to match.
+<type>unknown</type> literals are
+assumed to be coercible to anything for this purpose.  If only one
+candidate remains, use it; else continue to the next step.
 </para>
 </step>
 <step performance="required">
@@ -467,13 +493,38 @@ tgl=> select cast(text '20' as int8) ! as "factorial";

 <step performance="required">
 <para>
-Check for an exact match in the <classname>pg_proc</classname> system catalog.
+Select the functions to be considered from the
+<classname>pg_proc</classname> system catalog.  If an unqualified
+function name is used, the functions
+considered are those of the right name and argument count that are
+visible in the current search path (see <xref linkend="sql-naming">).
+If a qualified function name was given, only functions in the specified
+schema are considered.
+</para>
+
+<substeps>
+<step performance="optional">
+<para>
+If the search path finds multiple functions of identical argument types,
+only the one appearing earliest in the path is considered.  But functions of
+different argument types are considered on an equal footing regardless of
+search path position.
+</para>
+</step>
+</substeps>
+</step>
+
+<step performance="required">
+<para>
+Check for a function accepting exactly the input argument types.
+If one exists (there can be only one exact match in the set of
+functions considered), use it.
 (Cases involving <type>unknown</type> will never find a match at
 this step.)
 </para></step>
 <step performance="required">
 <para>
-If no exact match appears in the catalog, see whether the function call appears
+If no exact match is found, see whether the function call appears
 to be a trivial type coercion request.  This happens if the function call
 has just one argument and the function name is the same as the (internal)
 name of some data type.  Furthermore, the function argument must be either
@@ -489,11 +540,11 @@ Look for the best match.
 <substeps>
 <step performance="required">
 <para>
-Make a list of all functions of the same name with the same number of
-arguments for which the input types
-match or can be coerced to match.  (<type>unknown</type> literals are
-assumed to be coercible to anything for this purpose.)  If there is only
-one, use it; else continue to the next step.
+Discard candidate functions for which the input types do not match
+and cannot be coerced (using an implicit coercion function) to match.
+<type>unknown</type> literals are
+assumed to be coercible to anything for this purpose.  If only one
+candidate remains, use it; else continue to the next step.
 </para>
 </step>
 <step performance="required">
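The first steps of operator resolution above, and the parallel steps for functions, can be sketched in Python. This is a simplified, hypothetical model, not PostgreSQL's implementation. It works as follows:

- The catalog is a list of `Candidate` tuples whose entries are invented.
- Implicit coercions are a set of (source, target) pairs.
- `unknown` is treated as coercible to anything.

The later tie-breaking steps are not modelled, nor is the trivial-type-coercion check for functions. Where the real algorithm would continue, the sketch returns `None`.

```python
from typing import Dict, List, NamedTuple, Optional, Sequence, Set, Tuple


class Candidate(NamedTuple):
    schema: str
    name: str
    arg_types: Tuple[str, ...]


def select_candidates(name: str, nargs: int, search_path: List[str],
                      catalog: List[Candidate],
                      schema: Optional[str] = None) -> List[Candidate]:
    """Pick the operators/functions to consider (qualified or via the path)."""
    schemas = [schema] if schema is not None else search_path
    chosen: Dict[Tuple[str, ...], Candidate] = {}
    for s in schemas:
        for cand in catalog:
            if (cand.schema == s and cand.name == name
                    and len(cand.arg_types) == nargs):
                # Identical argument types: the earliest schema in the path wins.
                chosen.setdefault(cand.arg_types, cand)
    return list(chosen.values())


def resolve(name: str, input_types: Sequence[str], search_path: List[str],
            catalog: List[Candidate], implicit_casts: Set[Tuple[str, str]],
            schema: Optional[str] = None) -> Optional[Candidate]:
    candidates = select_candidates(name, len(input_types), search_path,
                                   catalog, schema)
    # Exact match: at most one can exist in the candidate set.
    for cand in candidates:
        if cand.arg_types == tuple(input_types):
            return cand

    def coercible(src: str, dst: str) -> bool:
        # "unknown" literals are assumed coercible to anything.
        return src == dst or src == "unknown" or (src, dst) in implicit_casts

    # Discard candidates whose inputs cannot be implicitly coerced.
    survivors = [c for c in candidates
                 if all(coercible(s, d) for s, d in zip(input_types, c.arg_types))]
    if len(survivors) == 1:
        return survivors[0]
    return None  # later tie-breaking steps are not modelled
```

Keying the candidate set on argument types is what gives the rule that only the earliest same-signature entry in the path counts. Candidates with different argument types all stay in play regardless of schema position.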