Commit 0ba77c14 authored by Peter Eisentraut

Convert more charset/locale documentation to DocBook

parent 333cbc2d
PostgreSQL Charsets README
Josef Balatka, <balatka@email.cz>
Draft v0.1, Tue Jul 20 15:49:07 CEST 1999
This document is a brief overview of the national charset support
implemented in PostgreSQL 6.5. It describes the relevant compilation
options and gives setup tips for particular uses.
---------------------------------------------------------------------------
Table of Contents
1. Locale awareness
2. Single-byte charsets recoding
3. Multi-byte support/recoding
4. Credits
---------------------------------------------------------------------------
1. Locale awareness
The PostgreSQL server supports both a locale-aware and a non-locale-aware
(default) mode of operation. You choose the mode at the configuration
stage of the installation with the --enable-locale option.
If you don't use --enable-locale, the multi-language code will not be
compiled and PostgreSQL will behave as a plain ASCII application.
This mode is faster, but it is only suitable if you don't have to
handle national characters.
With --enable-locale you will get a locale aware server using LC_*
environment variables to determine how to process national specifics.
In this case strcoll(3) and similar functions are used internally
so speed is somewhat lower.
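As an illustration of what locale-aware comparison means, here is a minimal
sketch in Python (not the server's C code). Python's locale.strcoll wraps the
C library's strcoll(3). The helper name locale_sorted is made up for this
example, and the "C" locale is used only because it is available everywhere;
with a national locale installed, the same call would follow that locale's
collation rules instead.

```python
import functools
import locale


def locale_sorted(strings):
    """Sort strings using the collation rules of the current LC_COLLATE."""
    # locale.strcoll compares two strings the way strcoll(3) would.
    return sorted(strings, key=functools.cmp_to_key(locale.strcoll))


# The "C" locale compares by character code, so upper case sorts first.
# Substituting an installed national locale name here (an assumption about
# your system) changes the ordering accordingly.
locale.setlocale(locale.LC_COLLATE, "C")
c_order = locale_sorted(["b", "a", "B"])
```

Each comparison goes through the locale machinery instead of a plain byte
comparison, which is where the extra cost mentioned above comes from.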
Note that --enable-locale alone is sufficient when all your clients
use the same single-byte encoding as the database server.
When your clients use an encoding different from the server's, then you
also have to use the --enable-recode or --with-mb=<encoding> option on
the server side, or a client that does the recoding itself (e.g.
there is a PostgreSQL ODBC driver for Win32 that can handle various
Cyrillic encodings). The --with-mb=<encoding> option is required for
multi-byte charset support.
2. Single-byte charsets recoding
You can set up this feature with --enable-recode option. This option
is described as 'enable Cyrillic recode support' which doesn't express
all its power. It can be used for *any* single-byte charset recoding.
This method uses charset.conf file located in the $PGDATA directory.
It's a typical configuration text file where spaces and newlines
separate items and records and # specifies comments. Three keywords
with the following syntax are recognized here:
BaseCharset <server_charset>
RecodeTable <from_charset> <to_charset> <file_name>
HostCharset <host_spec> <host_charset>
BaseCharset defines encoding of the database server. All charset
names are only used for mapping inside the charset.conf so you can
freely use typing-friendly names.
RecodeTable records specify translation table between server and client.
The file name is relative to the $PGDATA directory. Table file format
is very simple. There are no keywords and characters are represented by
a pair of decimal or hexadecimal (0x prefixed) values on single lines:
<char_value> <translated_char_value>
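The table format above can be sketched in a few lines of Python. This is only
an illustration of the format, not the server's implementation. The function
names, the sample values and the handling of '#' comments are assumptions made
for the example; unmapped bytes are assumed to pass through unchanged.

```python
def parse_value(token):
    """Accept a decimal ("225") or 0x-prefixed hexadecimal ("0xE1") value."""
    if token.lower().startswith("0x"):
        return int(token, 16)
    return int(token, 10)


def parse_recode_table(text):
    """Build a 256-entry byte translation table from recode-table text."""
    table = bytearray(range(256))  # identity mapping for unlisted bytes
    for raw_line in text.splitlines():
        # Assumption: '#' starts a comment, as it does in charset.conf.
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue
        src, dst = line.split()
        table[parse_value(src)] = parse_value(dst)
    return bytes(table)


def recode(data, table):
    """Translate a byte string through the table, one byte at a time."""
    return data.translate(table)


# Hypothetical two-line excerpt of a KOI8-R -> WIN-1251 table.
sample = """
0xC1 0xE0
194 225
"""
table = parse_recode_table(sample)
recoded = recode(b"\xc1\xc2 ok", table)
```

A single-byte recode is a plain 256-entry lookup, which is why this approach
cannot handle multi-byte charsets.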
HostCharset records define an IP address and a charset. You can use a
single IP address, an IP mask range starting from the given address, or
an IP interval (e.g. 127.0.0.1, 192.168.1.100/24, 192.168.1.20-192.168.1.40).
The charset.conf file is always processed up to the end, so you can easily
specify exceptions to earlier rules. In src/data you will find an
example charset.conf and a few recoding tables.
As this solution is based on mapping the client's IP address to a charset,
there are obviously some restrictions as well. You can't use different
encodings on the same host at the same time. It's also inconvenient when
you boot your client hosts into several operating systems.
Nevertheless, when these restrictions are not limiting and you don't
need multi-byte chars, then it's a simple and effective solution.
3. Multi-byte support/recoding
This is a new generation of charset support in PostgreSQL, designed as a
more comprehensive solution supporting both single-byte and multi-byte chars.
You can set up this feature with the --with-mb=<encoding> option.
There is no IP mapping file and recoding is controlled through new
SQL statements. Recoding tables are included in the code. Many national
charsets are already supported and more will follow.
See doc/README.mb, doc/README.mb.jp to get detailed instruction on how
to use the multibyte support. In the file doc/README.locale there is
a particular instruction on usage of the multibyte support with Cyrillic.
4. Credits
I'd like to thank the PostgreSQL development team and all contributors
for creating PostgreSQL. Thanks to Oleg Bartunov, Oleg Broytmann and
Tatsuo Ishii for opening the door into the multi-language world.
===========
1999 Jul 21
===========
Josef Balatka, <balatka@email.cz> asked us not to remove RECODE and sent me
a Czech ISO-8859-2 -> WIN-1250 translation table.
RECODE no longer contains just Cyrillic RECODE and will stay in
PostgreSQL.
He also wrote some documentation, mostly concerning RECODE -
see README.Charsets.
===========
1999 Apr 14
===========
Tatsuo Ishii <t-ishii@sra.co.jp> updated Multibyte support extending it
to Cyrillic language. Now PostgreSQL supports KOI8-R, WIN-1251, ISO8859-5
and CP866 (ALT) encodings.
Short instruction on using this feature follows. Longer discussion of
Multibyte support is in README.mb.
WARNING! Now that Multibyte support is available, Cyrillic RECODE is
declared obsolete and will be removed from Postgres. If you are using
RECODE, consider switching to Multibyte support.
Instructions on how to prepare Postgres for Cyrillic Multibyte support.
----------------------------------------------------------------------
First, you need to back up all your databases. I recommend backing up the
entire Postgres directory, including binaries and libraries, so that you can
easily restore it if something goes wrong.
Dump your data: pg_dumpall > dump.db
Stop postmaster.
Configure, compile and install Postgres. (I'll mostly talk about KOI8-R
encoding, this is just to make examples a little more clear; you can use
any supported encoding.)
cd src
./configure --enable-locale --with-mb=KOI8
make
make install
Make sure you've backed up your databases. Double-check your backup. I
really mean it - make regular backups and test them now and then with a
trial restore.
Remove your data directory (better, rename or move it).
Run initdb, specifying your primary encoding: initdb -e KOI8. If you omit
the encoding, the primary encoding from configure will be used.
Start postmaster.
Create databases: createdb -e KOI8. Again, you can omit encoding -
default encoding will be used. You are not forced to use the same encoding
for all your databases - you can create different databases with different
encodings.
Load your data from the dump you've created: psql < dump.db
That's all! Now you are ready to enjoy the full power of Multibyte
support.
To use Multibyte support you do not need to do anything special - just
execute your queries. If the client program does not set an encoding, it will
get the data in the database encoding. But a client may ask Postgres to do
automatic server-to-client and client-to-server conversions. There are two
ways for a client program to declare its encoding:
1) client explicitly executes the query SET CLIENT_ENCODING TO 'win';
2) client started with environment variable set. Examples -
using sh syntax:
PGCLIENTENCODING='win'; export PGCLIENTENCODING
using csh syntax:
setenv PGCLIENTENCODING 'win'
Setting PGCLIENTENCODING even if you use the same client encoding as the
database avoids the overhead of asking for the database encoding while
initiating the connection, so it is a good idea to set it in any case.
Now you may run the test suite and see Multibyte support in action. Go to
.../src/test/locale and run
make clean all test-koi2win
===========
1998 Nov 20
===========
I extended locale support, originally written by Oleg Bartunov
<oleg@sai.msu.su>. Now ORDER BY (if PostgreSQL is configured with
--enable-locale) uses strcoll() for all text fields: char(n), varchar(n),
text.
I included a test suite in .../src/test/locale. I didn't include it in
the regression tests because not many people require locale support. Read
.../src/test/locale/README for details on the test suite.
Many thanks to Oleg Bartunov (oleg@sai.msu.su) and Thomas G. Lockhart
(lockhart@alumni.caltech.edu) for hints, tips, help and discussion.
Oleg.
<!--
$Header: /cvsroot/pgsql/doc/src/sgml/Attic/admin.sgml,v 1.26 2000/09/12 05:37:07 thomas Exp $
$Header: /cvsroot/pgsql/doc/src/sgml/Attic/admin.sgml,v 1.27 2000/09/30 16:58:20 petere Exp $
Postgres Administrator's Guide.
Derived from postgres.sgml.
......@@ -98,9 +98,9 @@ Derived from postgres.sgml.
&intro-ag;
&installation;
&installw;
&charset;
&runtime;
&client-auth;
&charset;
&manage-ag;
&user-manag;
&backup;
......
<!-- $Header: /cvsroot/pgsql/doc/src/sgml/installation.sgml,v 1.21 2000/09/29 20:21:34 petere Exp $ -->
<!-- $Header: /cvsroot/pgsql/doc/src/sgml/installation.sgml,v 1.22 2000/09/30 16:58:20 petere Exp $ -->
<chapter id="installation">
<title><![%flattext-install-include[<productname>PostgreSQL</> ]]>Installation Instructions</title>
......@@ -447,8 +447,9 @@ su - postgres
<term>--enable-recode</term>
<listitem>
<para>
Enables character set recode support. See
<filename>doc/README.Charsets</> for details on this feature.
Enables single-byte character set recode support. See
<![%flattext-install-include[the <citetitle>Administrator's Guide</citetitle>]]>
<![%flattext-install-ignore[<xref linkend="recode">]]> about this feature.
</para>
</listitem>
</varlistentry>
......@@ -459,7 +460,10 @@ su - postgres
<para>
Allows the use of multibyte character encodings. This is
primarily for languages like Japanese, Korean, and Chinese.
Read <filename>doc/README.mb</> for details.
Read
<![%flattext-install-include[the <citetitle>Administrator's Guide</citetitle>]]>
<![%flattext-install-ignore[<xref linkend="multibyte">]]>
for details.
</para>
</listitem>
</varlistentry>
......
<!--
$Header: /cvsroot/pgsql/doc/src/sgml/postgres.sgml,v 1.41 2000/09/12 05:37:09 thomas Exp $
$Header: /cvsroot/pgsql/doc/src/sgml/postgres.sgml,v 1.42 2000/09/30 16:58:20 petere Exp $
-->
<!doctype set PUBLIC "-//OASIS//DTD DocBook V3.1//EN" [
......@@ -173,9 +173,9 @@ $Header: /cvsroot/pgsql/doc/src/sgml/postgres.sgml,v 1.41 2000/09/12 05:37:09 th
-->
&installation;
&installw;
&charset;
&runtime;
&client-auth;
&charset;
&manage-ag;
&user-manag;
&backup;
......
<!--
$Header: /cvsroot/pgsql/doc/src/sgml/runtime.sgml,v 1.25 2000/09/29 20:21:34 petere Exp $
$Header: /cvsroot/pgsql/doc/src/sgml/runtime.sgml,v 1.26 2000/09/30 16:58:20 petere Exp $
-->
<Chapter Id="runtime">
......@@ -1553,126 +1553,6 @@ set semsys:seminfo_semmsl=32
</sect1>
<sect1 id="locale">
<title>Locale Support</title>
<note>
<title>Acknowledgement</title>
<para>
Written by Oleg Bartunov. See <ulink
url="http://www.sai.msu.su/~megera/postgres/">Oleg's web
page</ulink> for additional information on locale and Russian
language support.
</para>
</note>
<para>
While doing a project for a company in Moscow, Russia, I
encountered the problem that <productname>Postgres</> had no
support of national alphabets. After looking for possible
workarounds I decided to develop support of locale myself. I'm not
a C programmer but already had some experience with locale
programming from working with <productname>Perl</> (debugging) and
<productname>Glimpse</>. After several days of digging through the
<productname>Postgres</> source tree I made very minor corrections
to <filename>src/backend/utils/adt/varlena.c</> and
<filename>src/backend/main/main.c</> and got what I needed! I did
support only for <envar>LC_CTYPE</envar> and
<envar>LC_COLLATE</envar>, but later <envar>LC_MONETARY</envar> was
added by others. I got many messages from people about this patch
so I decided to send it to developers and (to my surprise) it was
incorporated into the <productname>Postgres</> distribution.
</para>
<para>
People often complain that locale doesn't work for them. There are
several common mistakes:
<itemizedlist>
<listitem>
<para>
Didn't properly configure <productname>Postgres</> before
compilation. You must run <filename>configure</> with the
<option>--enable-locale</> option to enable locale support.
</para>
</listitem>
<listitem>
<para>
Didn't set up the environment correctly when starting postmaster. You
must define the environment variables <envar>LC_CTYPE</envar> and
<envar>LC_COLLATE</envar> before running postmaster because the
backend gets its locale information from the environment. I use the
following shell script:
<programlisting>
#!/bin/sh
export LC_CTYPE=koi8-r
export LC_COLLATE=koi8-r
postmaster -B 1024 -S -D/usr/local/pgsql/data/ -o '-Fe'
</programlisting>
</para>
</listitem>
<listitem>
<para>
Broken locale support in the operating system (for example,
locale support in libc under Linux has changed several times and
this caused a lot of problems). Perl also supports locale,
and if locale is broken <command>perl -v</> will complain
with something like:
<screen>
<prompt>$</> <userinput>export LC_CTYPE='not_exist'</>
<prompt>$</> <userinput>perl -v</>
<computeroutput>
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
LC_ALL = (unset),
LC_CTYPE = "not_exist",
LANG = (unset)
are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").
</computeroutput>
</screen>
</para>
</listitem>
<listitem>
<para>
Wrong location of locale files. Possible locations include:
<filename>/usr/lib/locale</filename> (Linux, Solaris),
<filename>/usr/share/locale</filename> (Linux),
<filename>/usr/lib/nls/loc</filename> (DUX 4.0).
Check <command>man locale</command> to find the correct
location. Under Linux I made a symbolic link between
<filename>/usr/lib/locale</filename> and
<filename>/usr/share/locale</filename> to be sure that the next
libc will not break my locale.
</para>
</listitem>
</itemizedlist>
</para>
<formalpara>
<title>What are the Benefits?</title>
<para>
You can use the ~* operator and ORDER BY on strings that contain
characters from national alphabets. Non-English users definitely
need that.
</para>
</formalpara>
<formalpara>
<title>What are the Drawbacks?</title>
<para>
There is one evident drawback of using locale - its speed! So, use
locale only if you really need it.
</para>
</formalpara>
</sect1>
<sect1 id="postmaster-shutdown">
<title>Shutting down the server</title>
......