Commit a612b171 authored by Tom Lane

Assorted editing for collation documentation.

I made a pass over this to familiarize myself with the feature, and found
some things that could be improved.
parent 4502c8e1
...@@ -1128,8 +1128,8 @@ ...@@ -1128,8 +1128,8 @@
<entry><type>oid</type></entry> <entry><type>oid</type></entry>
<entry><literal><link linkend="catalog-pg-collation"><structname>pg_collation</structname></link>.oid</literal></entry> <entry><literal><link linkend="catalog-pg-collation"><structname>pg_collation</structname></link>.oid</literal></entry>
<entry> <entry>
The defined collation of the column, zero if the column does The defined collation of the column, or zero if the column is
not have a collatable type. not of a collatable datatype.
</entry> </entry>
</row> </row>
...@@ -2088,7 +2088,7 @@ ...@@ -2088,7 +2088,7 @@
The catalog <structname>pg_collation</structname> describes the The catalog <structname>pg_collation</structname> describes the
available collations, which are essentially mappings from an SQL available collations, which are essentially mappings from an SQL
name to operating system locale categories. name to operating system locale categories.
See <xref linkend="locale"> for more information. See <xref linkend="collation"> for more information.
</para> </para>
<table> <table>
...@@ -2132,38 +2132,48 @@ ...@@ -2132,38 +2132,48 @@
<entry><structfield>collencoding</structfield></entry> <entry><structfield>collencoding</structfield></entry>
<entry><type>int4</type></entry> <entry><type>int4</type></entry>
<entry></entry> <entry></entry>
<entry> <entry>Encoding to which the collation is applicable</entry>
Encoding to which the collation is applicable. SQL-level
commands such as <command>ALTER COLLATION</command> only
operate on the collation belonging to the current database
encoding. But this field is necessary because when this
catalog is initialized, the encoding of future databases is not
yet known. For practical purposes, collations that do not
match the current database encoding should be considered
invalid or invisible. It could be useful, however, to create
collations whose encoding does not match the database encoding
in template databases. This would currently have to be done
manually.
</entry>
</row> </row>
<row> <row>
<entry><structfield>collcollate</structfield></entry> <entry><structfield>collcollate</structfield></entry>
<entry><type>name</type></entry> <entry><type>name</type></entry>
<entry></entry> <entry></entry>
<entry>LC_COLLATE for this collation object</entry> <entry><symbol>LC_COLLATE</> for this collation object</entry>
</row> </row>
<row> <row>
<entry><structfield>collctype</structfield></entry> <entry><structfield>collctype</structfield></entry>
<entry><type>name</type></entry> <entry><type>name</type></entry>
<entry></entry> <entry></entry>
<entry>LC_CTYPE for this collation object</entry> <entry><symbol>LC_CTYPE</> for this collation object</entry>
</row> </row>
</tbody> </tbody>
</tgroup> </tgroup>
</table> </table>
<para>
Note that the unique key on this catalog is (<structfield>collname</>,
<structfield>collencoding</>, <structfield>collnamespace</>) not just
(<structfield>collname</>, <structfield>collnamespace</>).
<productname>PostgreSQL</productname> generally ignores all
collations not belonging to the current database's encoding; therefore
it is sufficient to use a qualified SQL name
(<replaceable>schema</>.<replaceable>name</>) to identify a collation,
even though this is not unique according to the catalog definition.
The current database's encoding is automatically used as an additional
lookup key. The reason for defining the catalog this way is that
<application>initdb</> fills it in at cluster initialization time with
entries for all locales available on the system, so it must be able to
hold entries for all encodings that might ever be used in the cluster.
</para>
<para>
In the <literal>template0</> database, it could be useful to create
collations whose encoding does not match the database encoding,
since they could match the encodings of databases later cloned from
<literal>template0</>. This would currently have to be done manually.
</para>
</sect1> </sect1>
<sect1 id="catalog-pg-conversion"> <sect1 id="catalog-pg-conversion">
...@@ -6123,12 +6133,11 @@ ...@@ -6123,12 +6133,11 @@
<entry><literal><link linkend="catalog-pg-collation"><structname>pg_collation</structname></link>.oid</literal></entry> <entry><literal><link linkend="catalog-pg-collation"><structname>pg_collation</structname></link>.oid</literal></entry>
<entry><para> <entry><para>
<structfield>typcollation</structfield> specifies the collation <structfield>typcollation</structfield> specifies the collation
of the type. If a type does not support collations, this will of the type. If the type does not support collations, this will
be zero, collation analysis at parse time is skipped, and be zero. A base type that supports collations will have
the use of <literal>COLLATE</literal> clauses with the type is <symbol>DEFAULT_COLLATION_OID</symbol> here. A domain over a
invalid. A base type that supports collations will have collatable type can have some other collation OID, if one was defined
<symbol>DEFAULT_COLLATION_OID</symbol> here. A domain can have for the domain.
another collation OID, if one was defined for the domain.
</para></entry> </para></entry>
</row> </row>
......
...@@ -15,6 +15,8 @@ ...@@ -15,6 +15,8 @@
Using the locale features of the operating system to provide Using the locale features of the operating system to provide
locale-specific collation order, number formatting, translated locale-specific collation order, number formatting, translated
messages, and other aspects. messages, and other aspects.
This is covered in <xref linkend="locale"> and
<xref linkend="collation">.
</para> </para>
</listitem> </listitem>
...@@ -23,6 +25,7 @@ ...@@ -23,6 +25,7 @@
Providing a number of different character sets to support storing text Providing a number of different character sets to support storing text
in all kinds of languages, and providing character set translation in all kinds of languages, and providing character set translation
between client and server. between client and server.
This is covered in <xref linkend="multibyte">.
</para> </para>
</listitem> </listitem>
</itemizedlist> </itemizedlist>
...@@ -138,9 +141,12 @@ initdb --locale=sv_SE ...@@ -138,9 +141,12 @@ initdb --locale=sv_SE
fixed when the database is created. You can use different settings fixed when the database is created. You can use different settings
for different databases, but once a database is created, you cannot for different databases, but once a database is created, you cannot
change them for that database anymore. <literal>LC_COLLATE</literal> change them for that database anymore. <literal>LC_COLLATE</literal>
and <literal>LC_CTYPE</literal> are these type of categories. They affect and <literal>LC_CTYPE</literal> are these categories. They affect
the sort order of indexes, so they must be kept fixed, or indexes on the sort order of indexes, so they must be kept fixed, or indexes on
text columns would become corrupt. The default values for these text columns would become corrupt.
(But you can alleviate this restriction using collations, as discussed
in <xref linkend="collation">.)
The default values for these
categories are determined when <command>initdb</command> is run, and categories are determined when <command>initdb</command> is run, and
those values are used when new databases are created, unless those values are used when new databases are created, unless
specified otherwise in the <command>CREATE DATABASE</command> command. specified otherwise in the <command>CREATE DATABASE</command> command.
...@@ -153,7 +159,7 @@ initdb --locale=sv_SE ...@@ -153,7 +159,7 @@ initdb --locale=sv_SE
linkend="runtime-config-client-format"> for details). The values linkend="runtime-config-client-format"> for details). The values
that are chosen by <command>initdb</command> are actually only written that are chosen by <command>initdb</command> are actually only written
into the configuration file <filename>postgresql.conf</filename> to into the configuration file <filename>postgresql.conf</filename> to
serve as defaults when the server is started. If you disable these serve as defaults when the server is started. If you remove these
assignments from <filename>postgresql.conf</filename> then the assignments from <filename>postgresql.conf</filename> then the
server will inherit the settings from its execution environment. server will inherit the settings from its execution environment.
</para> </para>
...@@ -308,17 +314,17 @@ initdb --locale=sv_SE ...@@ -308,17 +314,17 @@ initdb --locale=sv_SE
<title>Collation Support</title> <title>Collation Support</title>
<para> <para>
The collation support allows specifying the sort order and certain The collation feature allows specifying the sort order and certain
other locale aspects of data per column or per operation at run other locale aspects of data per-column, or even per-operation.
time. This alleviates the problem that the This alleviates the restriction that the
<symbol>LC_COLLATE</symbol> and <symbol>LC_CTYPE</symbol> settings <symbol>LC_COLLATE</symbol> and <symbol>LC_CTYPE</symbol> settings
of a database cannot be changed after its creation. of a database cannot be changed after its creation.
</para> </para>
<note> <note>
<para> <para>
The collation support feature is currently only known to work on Collation support is currently only known to work on
Linux/glibc and Mac OS X platforms. Linux (glibc) and Mac OS X platforms.
</para> </para>
</note> </note>
...@@ -326,48 +332,51 @@ initdb --locale=sv_SE ...@@ -326,48 +332,51 @@ initdb --locale=sv_SE
<title>Concepts</title> <title>Concepts</title>
<para> <para>
Conceptually, every datum of a collatable data type has a Conceptually, every expression of a collatable data type has a
collation. (Collatable data types in the base system are collation. (The built-in collatable data types are
<type>text</type>, <type>varchar</type>, and <type>char</type>. <type>text</type>, <type>varchar</type>, and <type>char</type>.
User-defined base types can also be marked collatable.) If the User-defined base types can also be marked collatable.) If the
datum is a column reference, the collation of the datum is the expression is a column reference, the collation of the expression is the
defined collation of the column. If the datum is a constant, the defined collation of the column. If the expression is a constant, the
collation is the default collation of the data type of the collation is the default collation of the data type of the
constant. The collation of more complex expressions is derived constant. The collation of a more complex expression is derived
from the input collations as described below. from the collations of its inputs, as described below.
</para> </para>
<para> <para>
The collation of a datum can also be the <quote>default</quote> The collation of an expression can be the <quote>default</quote>
collation, which reverts to the locale settings defined for the collation, which means the locale settings defined for the
database. In some cases, a datum can also have no known database. In some cases, an expression can also have no known
collation. In such cases, ordering operations and other collation. In such cases, ordering operations and other
operations that need to know the collation will fail. operations that need to know the collation will fail.
</para> </para>
<para> <para>
When the database system has to perform an ordering or a When the database system has to perform an ordering or a
comparison, it considers the collation of the input data. This comparison, it uses the collation of the input expression. This
happens in two situations: an <literal>ORDER BY</literal> clause happens, for example, with <literal>ORDER BY</literal> clauses
and a function or operator call such as <literal>&lt;</literal>. and function or operator calls such as <literal>&lt;</literal>.
The collation to apply for the performance of the <literal>ORDER The collation to apply for an <literal>ORDER BY</literal> clause
BY</literal> clause is simply the collation of the sort key. The is simply the collation of the sort key. The collation to apply for a
collation to apply for a function or operator call is derived from function or operator call is derived from the arguments, as described
the arguments, as described below. Additionally, collations are below. In addition to comparison operators, collations are taken into
taken into account by functions that convert between lower and account by functions that convert between lower and upper case
upper case letters, that is, <function>lower</function>, letters, such as <function>lower</>, <function>upper</>, and
<function>upper</function>, and <function>initcap</function>. <function>initcap</>.
</para> </para>
<para> <para>
For a function call, the collation that is derived from combining For a function or operator call, the collation that is derived by
the argument collations is both used for performing any examining the argument collations is used at run time for performing
comparisons or ordering and for the collation of the function the specified operation. If the result of the function or operator
result, if the result type is collatable. call is of a collatable data type, the collation is also used at parse
time as the defined collation of the function or operator expression,
in case there is a surrounding expression that requires knowledge of
its collation.
</para> </para>
<para> <para>
The <firstterm>collation derivation</firstterm> of a datum can be The <firstterm>collation derivation</firstterm> of an expression can be
implicit or explicit. This distinction affects how collations are implicit or explicit. This distinction affects how collations are
combined when multiple different collations appear in an combined when multiple different collations appear in an
expression. An explicit collation derivation arises when a expression. An explicit collation derivation arises when a
...@@ -379,9 +388,9 @@ initdb --locale=sv_SE ...@@ -379,9 +388,9 @@ initdb --locale=sv_SE
<orderedlist> <orderedlist>
<listitem> <listitem>
<para> <para>
If any input item has an explicit collation derivation, then If any input expression has an explicit collation derivation, then
all explicitly derived collations among the input items must be all explicitly derived collations among the input expressions must be
the same, otherwise an error is raised. If an explicitly the same, otherwise an error is raised. If any explicitly
derived collation is present, that is the result of the derived collation is present, that is the result of the
collation combination. collation combination.
</para> </para>
...@@ -389,8 +398,8 @@ initdb --locale=sv_SE ...@@ -389,8 +398,8 @@ initdb --locale=sv_SE
<listitem> <listitem>
<para> <para>
Otherwise, all input items must have the same implicit Otherwise, all input expressions must have the same implicit
collation derivation or the default collation. If an collation derivation or the default collation. If any
implicitly derived collation is present, that is the result of implicitly derived collation is present, that is the result of
the collation combination. Otherwise, the result is the the collation combination. Otherwise, the result is the
default collation. default collation.
...@@ -428,19 +437,19 @@ SELECT a || ('foo' COLLATE "y") FROM test1; ...@@ -428,19 +437,19 @@ SELECT a || ('foo' COLLATE "y") FROM test1;
A collation is an SQL schema object that maps an SQL name to A collation is an SQL schema object that maps an SQL name to
operating system locales. In particular, it maps to a combination operating system locales. In particular, it maps to a combination
of <symbol>LC_COLLATE</symbol> and <symbol>LC_CTYPE</symbol>. (As of <symbol>LC_COLLATE</symbol> and <symbol>LC_CTYPE</symbol>. (As
the name would indicate, the main purpose of a collation is to set the name would suggest, the main purpose of a collation is to set
<symbol>LC_COLLATE</symbol>, which controls the sort order. But <symbol>LC_COLLATE</symbol>, which controls the sort order. But
it is rarely necessary in practice to have an it is rarely necessary in practice to have an
<symbol>LC_CTYPE</symbol> setting that is different from <symbol>LC_CTYPE</symbol> setting that is different from
<symbol>LC_COLLATE</symbol>, so it is more convenient to collect <symbol>LC_COLLATE</symbol>, so it is more convenient to collect
these under one concept than to create another infrastructure for these under one concept than to create another infrastructure for
setting <symbol>LC_CTYPE</symbol> per datum.) Also, a collation setting <symbol>LC_CTYPE</symbol> per expression.) Also, a collation
is tied to a character encoding. The same collation name may is tied to a character set encoding (see <xref linkend="multibyte">).
exist for different encodings. The same collation name may exist for different encodings.
</para> </para>
<para> <para>
When a database system is initialized, <command>initdb</command> When a database cluster is initialized, <command>initdb</command>
populates the system catalog <literal>pg_collation</literal> with populates the system catalog <literal>pg_collation</literal> with
collations based on all the locales it finds on the operating collations based on all the locales it finds on the operating
system at the time. For example, the operating system might system at the time. For example, the operating system might
...@@ -463,8 +472,19 @@ SELECT a || ('foo' COLLATE "y") FROM test1; ...@@ -463,8 +472,19 @@ SELECT a || ('foo' COLLATE "y") FROM test1;
collation may be created using collation may be created using
the <xref linkend="sql-createcollation"> command. That command the <xref linkend="sql-createcollation"> command. That command
can also be used to create a new collation from an existing can also be used to create a new collation from an existing
collation, which can be useful to be able to use operating-system collation, which can be useful to be able to use
independent collation names in applications. operating-system-independent collation names in applications.
</para>
<para>
Within any particular database, only collations that use that
database's encoding are of interest. Other entries in
<literal>pg_collation</literal> are ignored. Thus, a stripped collation
name such as <literal>de_DE</literal> can be considered unique
within a given database even though it would not be unique globally.
Use of the stripped collation names is recommendable, since it will
make one less thing you need to change if you decide to change to
another database encoding.
</para> </para>
</sect2> </sect2>
</sect1> </sect1>
......
...@@ -21,7 +21,7 @@ ...@@ -21,7 +21,7 @@
CREATE COLLATION <replaceable>name</replaceable> ( CREATE COLLATION <replaceable>name</replaceable> (
[ LOCALE = <replaceable>locale</replaceable>, ] [ LOCALE = <replaceable>locale</replaceable>, ]
[ LC_COLLATE = <replaceable>lc_collate</replaceable>, ] [ LC_COLLATE = <replaceable>lc_collate</replaceable>, ]
[ LC_CTYPE = <replaceable>lc_ctype</replaceable>, ] [ LC_CTYPE = <replaceable>lc_ctype</replaceable> ]
) )
CREATE COLLATION <replaceable>name</replaceable> FROM <replaceable>existing_collation</replaceable> CREATE COLLATION <replaceable>name</replaceable> FROM <replaceable>existing_collation</replaceable>
</synopsis> </synopsis>
...@@ -32,7 +32,8 @@ CREATE COLLATION <replaceable>name</replaceable> FROM <replaceable>existing_coll ...@@ -32,7 +32,8 @@ CREATE COLLATION <replaceable>name</replaceable> FROM <replaceable>existing_coll
<para> <para>
<command>CREATE COLLATION</command> defines a new collation using <command>CREATE COLLATION</command> defines a new collation using
the specified operating system locales or from an existing collation. the specified operating system locale settings,
or by copying an existing collation.
</para> </para>
<para> <para>
...@@ -53,26 +54,14 @@ CREATE COLLATION <replaceable>name</replaceable> FROM <replaceable>existing_coll ...@@ -53,26 +54,14 @@ CREATE COLLATION <replaceable>name</replaceable> FROM <replaceable>existing_coll
<para> <para>
The name of the collation. The collation name can be The name of the collation. The collation name can be
schema-qualified. If it is not, the collation is defined in the schema-qualified. If it is not, the collation is defined in the
current schema. The collation name must be unique within a current schema. The collation name must be unique within that
schema. (The system catalogs can contain collations with the schema. (The system catalogs can contain collations with the
same name for other encodings, but these are not usable if the same name for other encodings, but these are ignored if the
database encoding does not match.) database encoding does not match.)
</para> </para>
</listitem> </listitem>
</varlistentry> </varlistentry>
<varlistentry>
<term><replaceable>existing_collation</replaceable></term>
<listitem>
<para>
The name of an existing collation to copy. The new collation
will have the same properties as the existing one, but they
will become independent objects.
</para>
</listitem>
</varlistentry>
<varlistentry> <varlistentry>
<term><replaceable>locale</replaceable></term> <term><replaceable>locale</replaceable></term>
...@@ -80,7 +69,7 @@ CREATE COLLATION <replaceable>name</replaceable> FROM <replaceable>existing_coll ...@@ -80,7 +69,7 @@ CREATE COLLATION <replaceable>name</replaceable> FROM <replaceable>existing_coll
<para> <para>
This is a shortcut for setting <symbol>LC_COLLATE</symbol> This is a shortcut for setting <symbol>LC_COLLATE</symbol>
and <symbol>LC_CTYPE</symbol> at once. If you specify this, and <symbol>LC_CTYPE</symbol> at once. If you specify this,
you cannot specify either of the other parameters. you cannot specify either of those parameters.
</para> </para>
</listitem> </listitem>
</varlistentry> </varlistentry>
...@@ -112,6 +101,18 @@ CREATE COLLATION <replaceable>name</replaceable> FROM <replaceable>existing_coll ...@@ -112,6 +101,18 @@ CREATE COLLATION <replaceable>name</replaceable> FROM <replaceable>existing_coll
</para> </para>
</listitem> </listitem>
</varlistentry> </varlistentry>
<varlistentry>
<term><replaceable>existing_collation</replaceable></term>
<listitem>
<para>
The name of an existing collation to copy. The new collation
will have the same properties as the existing one, but they
will become independent objects.
</para>
</listitem>
</varlistentry>
</variablelist> </variablelist>
</refsect1> </refsect1>
...@@ -145,8 +146,8 @@ CREATE COLLATION french (LOCALE = 'fr_FR.utf8'); ...@@ -145,8 +146,8 @@ CREATE COLLATION french (LOCALE = 'fr_FR.utf8');
<programlisting> <programlisting>
CREATE COLLATION german FROM "de_DE"; CREATE COLLATION german FROM "de_DE";
</programlisting> </programlisting>
This can be convenient to be able to use operating-system This can be convenient to be able to use operating-system-independent
independent collation names in applications. collation names in applications.
</para> </para>
</refsect1> </refsect1>
......
...@@ -94,7 +94,7 @@ DROP COLLATION german; ...@@ -94,7 +94,7 @@ DROP COLLATION german;
<para> <para>
The <command>DROP COLLATION</command> command conforms to the The <command>DROP COLLATION</command> command conforms to the
<acronym>SQL</acronym> standard, apart from the <literal>IF <acronym>SQL</acronym> standard, apart from the <literal>IF
EXISTS</> option, which is a <productname>PostgreSQL</> extension.. EXISTS</> option, which is a <productname>PostgreSQL</> extension.
</para> </para>
</refsect1> </refsect1>
......
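As a reading aid for the collation-combination rules touched in the Concepts hunk above, here is a minimal Python sketch of the two-step procedure as the revised text words it. It is not PostgreSQL's implementation. The names `Derived`, `combine`, `CollationConflict`, and `DEFAULT` are hypothetical, and the sketch assumes that mismatched implicit collations raise an error immediately, which is how the quoted text reads ("must have the same"); the real parser may behave differently.

```python
from dataclasses import dataclass
from typing import Iterable

# Hypothetical marker for the database-wide "default" collation.
DEFAULT = "default"


class CollationConflict(Exception):
    """Raised when input collations cannot be combined (assumed behavior)."""


@dataclass(frozen=True)
class Derived:
    """A collation plus its derivation: explicit (COLLATE clause) or implicit."""
    collation: str
    explicit: bool = False


def combine(inputs: Iterable[Derived]) -> str:
    """Return the collation name resulting from combining the inputs."""
    items = list(inputs)

    # Rule 1: explicit derivations win, but they must all agree.
    explicit = {d.collation for d in items if d.explicit}
    if len(explicit) > 1:
        raise CollationConflict(f"explicit collations differ: {sorted(explicit)}")
    if explicit:
        return next(iter(explicit))

    # Rule 2: otherwise every input must carry the same implicit
    # collation or the default collation.
    implicit = {d.collation for d in items if d.collation != DEFAULT}
    if len(implicit) > 1:
        raise CollationConflict(f"implicit collations differ: {sorted(implicit)}")
    if implicit:
        return next(iter(implicit))

    # No non-default collation present: the result is the default collation.
    return DEFAULT
```

Under these assumptions, `combine([Derived("x"), Derived("y", explicit=True)])` yields `"y"`, mirroring the `a || ('foo' COLLATE "y")` example named in the hunk header, while two inputs with different explicit collations raise `CollationConflict`.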