Commit 38e2bf62 authored by Teodor Sigaev

ISpell info updated

parent ef38ca9b
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"><html><head><title>tsearch-v2-intro</title>
<link type="text/css" rel="stylesheet" href="tsearch-V2-intro_files/tsearch.txt"></head>
<html>
<head>
<title>tsearch-v2-intro</title>
<link type="text/css" rel="stylesheet" href="/~megera/postgres/gist/tsearch/tsearch.css">
</head>
<body>
<div class="content">
<h2>Tsearch2 - Introduction</h2>
<p><a href=
"http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/docs/tsearch-V2-intro.html">
<p><a href="http://www.sai.msu.su/%7Emegera/postgres/gist/tsearch/V2/docs/tsearch-V2-intro.html">
[Online version]</a> of this document is available.</p>
<p>The tsearch2 module is available to add as an extension to
......@@ -38,13 +34,11 @@
<p>The README.tsearch2 file included in the contrib/tsearch2
directory contains a brief overview and history behind tsearch.
This can also be found online <a href=
"http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/">[right
This can also be found online <a href="http://www.sai.msu.su/%7Emegera/postgres/gist/tsearch/V2/">[right
here]</a>.</p>
<p>Further in depth documentation such as a full function
reference, and user guide can be found online at the <a href=
"http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/docs/">[tsearch
reference, and user guide can be found online at the <a href="http://www.sai.msu.su/%7Emegera/postgres/gist/tsearch/V2/docs/">[tsearch
documentation home]</a>.</p>
<h3>ACKNOWLEDGEMENTS</h3>
......@@ -105,11 +99,9 @@
<p>Step one is to download the tsearch V2 module :</p>
<p><a href=
"http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/">[http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/]</a>
<p><a href="http://www.sai.msu.su/%7Emegera/postgres/gist/tsearch/V2/">[http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/]</a>
(check Development History for latest stable version !)</p>
<pre>
tar -zxvf tsearch-v2.tar.gz
<pre> tar -zxvf tsearch-v2.tar.gz
mv tsearch2 PGSQL_SRC/contrib/
cd PGSQL_SRC/contrib/tsearch2
</pre>
......@@ -121,18 +113,15 @@
<p>Then continue with the regular building and installation
process</p>
<pre>
gmake
<pre> gmake
gmake install
gmake installcheck
</pre>
<p>That is pretty much all you have to do, unless of course you
get errors. If you do, it is best to check with
the mailing lists over at <a href=
"http://www.postgresql.org">http://www.postgresql.org</a> or
<a href=
"http://openfts.sourceforge.net/">http://openfts.sourceforge.net/</a>
the mailing lists over at <a href="http://www.postgresql.org/">http://www.postgresql.org</a> or
<a href="http://openfts.sourceforge.net/">http://openfts.sourceforge.net/</a>
since it has never failed for me.</p>
<p>The directory in the contrib/ and the directory from the
......@@ -151,15 +140,13 @@
<p>We should create a database to use as an example for the
remainder of this file. We can call the database "ftstest". You
can create it from the command line like this:</p>
<pre>
#createdb ftstest
<pre> #createdb ftstest
</pre>
<p>If you thought installation was easy, this next bit is even
easier. Change to the PGSQL_SRC/contrib/tsearch2 directory and
type:</p>
<pre>
psql ftstest &lt; tsearch2.sql
<pre> psql ftstest &lt; tsearch2.sql
</pre>
<p>The file "tsearch2.sql" holds all the wonderful little
......@@ -170,8 +157,7 @@
pg_ts_cfgmap are added.</p>
<p>You can check out the tables if you like:</p>
<pre>
#psql ftstest
<pre> #psql ftstest
ftstest=# \d
List of relations
Schema | Name | Type | Owner
......@@ -188,8 +174,7 @@
<p>The first thing we can do is try out some of the types that
are provided for us. Let's look at the tsvector type:</p>
<pre>
SELECT 'Our first string used today'::tsvector;
<pre> SELECT 'Our first string used today'::tsvector;
tsvector
---------------------------------------
'Our' 'used' 'first' 'today' 'string'
......@@ -199,8 +184,7 @@
<p>The results are the words used within our string. Notice
they are not in any particular order. The tsvector type returns
a string of space separated words.</p>
<pre>
SELECT 'Our first string used today first string'::tsvector;
<pre> SELECT 'Our first string used today first string'::tsvector;
tsvector
---------------------------------------
'Our' 'used' 'first' 'today' 'string'
......@@ -217,8 +201,7 @@
by the tsearch2 module.</p>
<p>The function to_tsvector has 3 possible signatures:</p>
<pre>
to_tsvector(oid, text);
<pre> to_tsvector(oid, text);
to_tsvector(text, text);
to_tsvector(text);
</pre>
......@@ -228,8 +211,7 @@
the searchable text is broken up into words (Stemming process).
Right now we will specify the 'default' configuration. See the
section on TSEARCH2 CONFIGURATION to learn more about this.</p>
<pre>
SELECT to_tsvector('default',
<pre> SELECT to_tsvector('default',
'Our first string used today first string');
to_tsvector
--------------------------------------------
......@@ -259,8 +241,7 @@
<p>If you want to view the output of the tsvector fields
without their positions, you can do so with the function
"strip(tsvector)".</p>
<pre>
SELECT strip(to_tsvector('default',
<pre> SELECT strip(to_tsvector('default',
'Our first string used today first string'));
strip
--------------------------------
......@@ -270,8 +251,7 @@
<p>If you wish to know the number of unique words returned in
the tsvector you can do so by using the function
"length(tsvector)"</p>
<pre>
SELECT length(to_tsvector('default',
<pre> SELECT length(to_tsvector('default',
'Our first string used today first string'));
length
--------
......@@ -282,15 +262,13 @@
<p>Let's take a look at the function to_tsquery. It also has 3
signatures which follow the same rationale as the to_tsvector
function:</p>
<pre>
to_tsquery(oid, text);
<pre> to_tsquery(oid, text);
to_tsquery(text, text);
to_tsquery(text);
</pre>
<p>Let's try using the function with a single word:</p>
<pre>
SELECT to_tsquery('default', 'word');
<pre> SELECT to_tsquery('default', 'word');
to_tsquery
-----------
'word'
......@@ -303,8 +281,7 @@
<p>Let's attempt to use the function with a string of multiple
words:</p>
<pre>
SELECT to_tsquery('default', 'this is many words');
<pre> SELECT to_tsquery('default', 'this is many words');
ERROR: Syntax error
</pre>
......@@ -313,8 +290,7 @@
"tsquery" used for searching a tsvector field. What we need to
do is search for one to many words with some kind of logic (for
now simple boolean).</p>
<pre>
SELECT to_tsquery('default', 'searching|sentence');
<pre> SELECT to_tsquery('default', 'searching|sentence');
to_tsquery
----------------------
'search' | 'sentenc'
......@@ -328,8 +304,7 @@
<p>You cannot use words defined as being a stop word in your
configuration. The function will not fail ... you will just get
no result, and a NOTICE like this:</p>
<pre>
SELECT to_tsquery('default', 'a|is&amp;not|!the');
<pre> SELECT to_tsquery('default', 'a|is&amp;not|!the');
NOTICE: Query contains only stopword(s)
or doesn't contain lexem(s), ignored
to_tsquery
......@@ -348,8 +323,7 @@
<p>The next stage is to add a full text index to an existing
table. In this example we already have a table defined as
follows:</p>
<pre>
CREATE TABLE tblMessages
<pre> CREATE TABLE tblMessages
(
intIndex int4,
strTopic varchar(100),
......@@ -362,8 +336,7 @@
test strings for a topic, and a message. Here is some test data
I inserted. (Yes, I know it's completely useless stuff ;-) but
it will serve our purpose right now.)</p>
<pre>
INSERT INTO tblMessages
<pre> INSERT INTO tblMessages
VALUES ('1', 'Testing Topic', 'Testing message data input');
INSERT INTO tblMessages
VALUES ('2', 'Movie', 'Breakfast at Tiffany\'s');
......@@ -400,8 +373,7 @@
<p>The next stage is to create a special text index which we
will use for FTI, so we can search our table of messages for
words or a phrase. We do this using the SQL command:</p>
<pre>
ALTER TABLE tblMessages ADD idxFTI tsvector;
<pre> ALTER TABLE tblMessages ADD COLUMN idxFTI tsvector;
</pre>
<p>Note that unlike traditional indexes, this is actually a new
......@@ -411,8 +383,7 @@
<p>The general rule for the initial insertion of data will
follow four steps:</p>
<pre>
1. update table
<pre> 1. update table
2. vacuum full analyze
3. create index
4. vacuum full analyze
......@@ -426,8 +397,7 @@
the index has been created on the table, vacuum full analyze is
run again to update postgres's statistics (i.e. having the index
take effect).</p>
<pre>
UPDATE tblMessages SET idxFTI=to_tsvector('default', strMessage);
<pre> UPDATE tblMessages SET idxFTI=to_tsvector('default', strMessage);
VACUUM FULL ANALYZE;
</pre>
......@@ -436,8 +406,7 @@
information stored, you should instead do the following, which
effectively concatenates the two fields into one before being
inserted into the table:</p>
<pre>
UPDATE tblMessages
<pre> UPDATE tblMessages
SET idxFTI=to_tsvector('default',coalesce(strTopic,'') ||' '|| coalesce(strMessage,''));
VACUUM FULL ANALYZE;
</pre>
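<p>A quick sanity check before building the index is to count the
rows whose idxFTI is still NULL. This is only a sketch against the
example table; with the coalesce() form of the UPDATE shown above the
count would be expected to be zero:</p>
<pre> -- rows that did not receive a tsvector value
 SELECT count(*) FROM tblMessages WHERE idxFTI IS NULL;
</pre>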
......@@ -451,8 +420,7 @@
Full Text INDEXING ;-)), so don't worry about any indexing
overhead. We will create an index based on the gist function.
GiST is an index structure for Generalized Search Tree.</p>
<pre>
CREATE INDEX idxFTI_idx ON tblMessages USING gist(idxFTI);
<pre> CREATE INDEX idxFTI_idx ON tblMessages USING gist(idxFTI);
VACUUM FULL ANALYZE;
</pre>
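<p>To see whether the planner actually considers the new index,
EXPLAIN can be run on a search that uses the @@ match operator. This
is only a sketch; on a table as small as this example PostgreSQL may
still prefer a sequential scan, so the absence of an index scan here
is not necessarily a problem:</p>
<pre> -- show the plan chosen for a full text search on the indexed column
 EXPLAIN SELECT intindex, strtopic FROM tblmessages
 WHERE idxfti @@ to_tsquery('default', 'test');
</pre>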
......@@ -464,15 +432,13 @@
<p>The last thing to do is set up a trigger so every time a row
in this table is changed, the text index is automatically
updated. This is easily done using:</p>
<pre>
CREATE TRIGGER tsvectorupdate BEFORE UPDATE OR INSERT ON tblMessages
<pre> CREATE TRIGGER tsvectorupdate BEFORE UPDATE OR INSERT ON tblMessages
FOR EACH ROW EXECUTE PROCEDURE tsearch2(idxFTI, strMessage);
</pre>
<p>Or if you are indexing both strMessage and strTopic you
should instead do:</p>
<pre>
CREATE TRIGGER tsvectorupdate BEFORE UPDATE OR INSERT ON tblMessages
<pre> CREATE TRIGGER tsvectorupdate BEFORE UPDATE OR INSERT ON tblMessages
FOR EACH ROW EXECUTE PROCEDURE
tsearch2(idxFTI, strTopic, strMessage);
</pre>
......@@ -490,15 +456,13 @@
the tsearch2 function. Let's say we want to create a function to
remove certain characters (like the @ symbol) from all
text.</p>
<pre>
CREATE FUNCTION dropatsymbol(text)
<pre> CREATE FUNCTION dropatsymbol(text)
RETURNS text AS 'select replace($1, \'@\', \' \');' LANGUAGE SQL;
</pre>
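<p>The function can be tried on its own before it is wired into a
trigger. Assuming it was created exactly as shown above, the call
below should return the address with a space where the @ was:</p>
<pre> -- call the helper directly to confirm the replacement
 SELECT dropatsymbol('Test@test.com');
</pre>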
<p>Now we can use this function within the tsearch2 function on
the trigger.</p>
<pre>
DROP TRIGGER tsvectorupdate ON tblmessages;
<pre> DROP TRIGGER tsvectorupdate ON tblmessages;
CREATE TRIGGER tsvectorupdate BEFORE UPDATE OR INSERT ON tblMessages
FOR EACH ROW EXECUTE PROCEDURE tsearch2(idxFTI, dropatsymbol, strMessage);
INSERT INTO tblmessages VALUES (69, 'Attempt for dropatsymbol', 'Test@test.com');
......@@ -513,8 +477,7 @@
locale of the server. All you have to do is change your default
configuration, or add a new one for your specific locale. See
the section on TSEARCH2 CONFIGURATION.</p>
<pre class="real">
SELECT * FROM tblmessages WHERE intindex = 69;
<pre class="real"> SELECT * FROM tblmessages WHERE intindex = 69;
intindex | strtopic | strmessage | idxfti
----------+--------------------------+---------------+-----------------------
......@@ -540,8 +503,7 @@ in the tsvector column.
<p>Let's search the indexed data for the word "Test". I indexed
based on the concatenation of the strTopic, and the
strMessage:</p>
<pre>
SELECT intindex, strtopic FROM tblmessages
<pre> SELECT intindex, strtopic FROM tblmessages
WHERE idxfti @@ 'test'::tsquery;
intindex | strtopic
----------+---------------
......@@ -553,8 +515,7 @@ in the tsvector column.
"Testing Topic". Notice that the word I search for was all
lowercase. Let's see what happens when I query for uppercase
"Test".</p>
<pre>
SELECT intindex, strtopic FROM tblmessages
<pre> SELECT intindex, strtopic FROM tblmessages
WHERE idxfti @@ 'Test'::tsquery;
intindex | strtopic
----------+----------
......@@ -570,8 +531,7 @@ in the tsvector column.
<p>Most likely the best way to query the field is to use the
to_tsquery function on the right hand side of the @@ operator
like this:</p>
<pre>
SELECT intindex, strtopic FROM tblmessages
<pre> SELECT intindex, strtopic FROM tblmessages
WHERE idxfti @@ to_tsquery('default', 'Test | Zeppelin');
intindex | strtopic
----------+--------------------
......@@ -592,8 +552,7 @@ in the tsvector column.
a way around which doesn't appear to have a significant impact
on query time, and that is to use a query such as the
following:</p>
<pre>
SELECT intindex, strTopic FROM tblmessages
<pre> SELECT intindex, strTopic FROM tblmessages
WHERE idxfti @@ to_tsquery('default', 'gettysburg &amp; address')
AND strMessage ~* '.*men are created equal.*';
intindex | strtopic
......@@ -626,8 +585,7 @@ in the tsvector column.
english stemming. We could edit the file
'/usr/local/pgsql/share/english.stop' and add a word to the
list. I edited mine to exclude my name from indexing:</p>
<pre>
- Edit /usr/local/pgsql/share/english.stop
<pre> - Edit /usr/local/pgsql/share/english.stop
- Add 'andy' to the list
- Save the file.
</pre>
......@@ -638,16 +596,14 @@ in the tsvector column.
connected to the DB while editing the stop words, you will need
to end the current session and re-connect. When you re-connect
to the database, 'andy' is no longer indexed:</p>
<pre>
SELECT to_tsvector('default', 'Andy');
<pre> SELECT to_tsvector('default', 'Andy');
to_tsvector
------------
(1 row)
</pre>
<p>Originally I would get the result :</p>
<pre>
SELECT to_tsvector('default', 'Andy');
<pre> SELECT to_tsvector('default', 'Andy');
to_tsvector
------------
'andi':1
......@@ -660,8 +616,7 @@ in the tsvector column.
'simple', the results would be different. There are no stop
words for the simple dictionary. It will just convert to lower
case, and index every unique word.</p>
<pre>
SELECT to_tsvector('simple', 'Andy andy The the in out');
<pre> SELECT to_tsvector('simple', 'Andy andy The the in out');
to_tsvector
-------------------------------------
'in':5 'out':6 'the':3,4 'andy':1,2
......@@ -672,8 +627,7 @@ in the tsvector column.
into the actual configuration of tsearch2. In the examples in
this document the configuration has always been specified when
using the tsearch2 functions:</p>
<pre>
SELECT to_tsvector('default', 'Testing the default config');
<pre> SELECT to_tsvector('default', 'Testing the default config');
SELECT to_tsvector('simple', 'Example of simple Config');
</pre>
......@@ -682,8 +636,7 @@ in the tsvector column.
contains both the 'default' configuration, based on the 'C'
locale, and the 'simple' configuration, which is not based on
any locale.</p>
<pre>
SELECT * from pg_ts_cfg;
<pre> SELECT * from pg_ts_cfg;
ts_name | prs_name | locale
-----------------+----------+--------------
default | default | C
......@@ -706,8 +659,7 @@ in the tsvector column.
configuration or just use one that already exists. If I do not
specify which configuration to use in the to_tsvector function,
I receive the following error.</p>
<pre>
SELECT to_tsvector('learning tsearch is like going to school');
<pre> SELECT to_tsvector('learning tsearch is like going to school');
ERROR: Can't find tsearch config by locale
</pre>
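<p>Before adding a configuration it can help to compare the locale
the server is running under with the locales the installed
configurations claim. A small sketch, assuming tsearch2 selects the
configuration whose locale column matches the server locale; both
settings are shown because it is not stated here which one is
consulted:</p>
<pre> -- server locale settings
 SHOW lc_ctype;
 SHOW lc_collate;
 -- locales claimed by the installed configurations
 SELECT ts_name, locale FROM pg_ts_cfg;
</pre>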
......@@ -716,8 +668,7 @@ in the tsvector column.
into the pg_ts_cfg table. We will call the configuration
'default_english', with the default parser and use the locale
'en_US'.</p>
<pre>
INSERT INTO pg_ts_cfg (ts_name, prs_name, locale)
<pre> INSERT INTO pg_ts_cfg (ts_name, prs_name, locale)
VALUES ('default_english', 'default', 'en_US');
</pre>
......@@ -732,15 +683,14 @@ in the tsvector column.
tsearch2.sql</p>
<p>Let's take a first look at the pg_ts_dict table:</p>
<pre>
ftstest=# \d pg_ts_dict
<pre> ftstest=# \d pg_ts_dict
Table "public.pg_ts_dict"
Column | Type | Modifiers
-----------------+---------+-----------
dict_name | text | not null
dict_init | oid |
dict_initoption | text |
dict_lemmatize | oid | not null
dict_lexize | oid | not null
dict_comment | text |
Indexes: pg_ts_dict_idx unique btree (dict_name)
</pre>
......@@ -763,28 +713,57 @@ in the tsvector column.
ISpell. We will assume you have ISpell installed on your
machine. (in /usr/local/lib)</p>
<p>First let's register the dictionary(ies) to use from ISpell.
We will use the english dictionary from ISpell. We insert the
paths to the relevant ISpell dictionary (*.hash) and affixes
(*.aff) files. There seems to be some question as to which
ISpell files are to be used. I installed ISpell from the latest
sources on my computer. The installation installed the
dictionary files with an extension of *.hash. Some
installations install with an extension of *.dict. As far as I
know the two extensions are equivalent. So *.hash ==
*.dict.</p>
<p>We will also continue to use the english word stop file that
<p>There has been some confusion in the past as to which files
are used from ISpell. ISpell operates using a hash file. This
is a binary file created by the ISpell command line utility
"buildhash". This utility accepts a file containing the words
from the dictionary, and the affixes file and the output is the
hash file. The default installation of ISpell installs the
english hash file english.hash, which is the exact same file as
american.hash. ISpell uses this as the fallback
dictionary.</p>
<p>This hash file is not what tsearch2 requires as the ISpell
interface. The file(s) needed are those used to create the
hash. Tsearch uses the dictionary words for morphology, so the
word listing is needed, not the spellchecking hash. Regardless, these files
are included in the ISpell sources, and you can use them to
integrate into tsearch2. This is not complicated, but is not
very obvious to begin with. The tsearch2 ISpell interface needs
only the listing of dictionary words, it will parse and load
those words, and use the ISpell dictionary for lexem
processing.</p>
<p>I found the ISpell make system to be very finicky. Their
documentation actually states this to be the case. So I just
did things the command line way. In the ISpell source tree
there are several files under languages/english.
For a complete description, please read the ISpell
README. Basically for the english dictionary there is the
option to create the small, medium, large and extra large
dictionaries. The medium dictionary is recommended. If the make
system is configured correctly, it would build and install the
english.hash file from the medium size dictionary. Since we are
only concerned with the dictionary word listing, it can be
created from the languages/english directory with the
following command:</p>
<pre> sort -u -t/ +0f -1 +0 -T /usr/tmp -o english.med english.0 english.1
</pre>
<p>This will create a file called english.med. You can copy
this file to wherever you like. I place mine in /usr/local/lib so
it coincides with the ISpell hash files. You can now add the
tsearch2 configuration entry for the ISpell english dictionary.
We will also continue to use the english word stop file that
was installed for the en_stem dictionary. You could use a
different one if you like. The ISpell configuration is based on
the "ispell_template" dictionary installed by default with
tsearch2. We will use the OIDs to the stored procedures from
the row where the dict_name = 'ispell_template'.</p>
<pre>
INSERT INTO pg_ts_dict
<pre> INSERT INTO pg_ts_dict
(SELECT 'en_ispell',
dict_init,
'DictFile="/usr/local/lib/english.hash",'
'DictFile="/usr/local/lib/english.med",'
'AffFile="/usr/local/lib/english.aff",'
'StopFile="/usr/local/pgsql/share/english.stop"',
dict_lexize
......@@ -792,6 +771,50 @@ in the tsvector column.
WHERE dict_name = 'ispell_template');
</pre>
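<p>To confirm that the new row landed as intended, the dictionary
entry can be read back. This sketch uses only the pg_ts_dict columns
listed earlier and assumes the name 'en_ispell' from the INSERT
above:</p>
<pre> -- read back the registered ISpell dictionary and its file paths
 SELECT dict_name, dict_initoption
 FROM pg_ts_dict
 WHERE dict_name = 'en_ispell';
</pre>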
<p>Now that we have a dictionary we can specify its use in a
query to get a lexem. For this we will use the lexize function.
The lexize function takes the name of the dictionary to use as
an argument, much as the other tsearch2 functions take a
configuration name.</p>
<pre> SELECT lexize('en_ispell', 'program');
lexize
-----------
{program}
(1 row)
</pre>
<p>If you wanted to always use the ISpell english dictionary
you have installed, you can configure tsearch2 to always use a
specific dictionary.</p>
<pre> SELECT set_curdict('en_ispell');
</pre>
<p>Lexize is meant to turn a word into a lexem. It is possible
to receive more than one lexem returned for a single word.</p>
<pre> SELECT lexize('en_ispell', 'conditionally');
lexize
-----------------------------
{conditionally,conditional}
(1 row)
</pre>
<p>The lexize function is not meant to take a full string as an
argument to return lexems for. If you passed in an entire
sentence, it attempts to find that entire sentence in the
dictionary. Since the dictionary contains only words, you will
receive an empty result set back.</p>
<pre> SELECT lexize('en_ispell', 'This is a sentence to lexize');
lexize
--------
(1 row)
</pre>
<p>If you parse a lexem from a word not in the dictionary, then you
will receive an empty result. This makes sense because the word
"tsearch" is not in the english dictionary. You can create your own
additions to the dictionary if you like. This may be useful for
scientific or technical glossaries that need to be indexed.</p>
<pre> SELECT lexize('en_ispell', 'tsearch');
lexize
--------
(1 row)
</pre>
<p>This is not to say that tsearch will be ignored when adding
text information to the tsvector index column. This will be
explained in greater detail with the table pg_ts_cfgmap.</p>
<p>Next we need to set up the configuration for mapping the
dictionary use to the lexem parsing. This will be done by
altering the pg_ts_cfgmap table. We will insert several rows,
......@@ -799,8 +822,7 @@ in the tsvector column.
configured for use within tsearch2. There are several types of
lexems for which we want to force the use of the ISpell
dictionary.</p>
<pre>
INSERT INTO pg_ts_cfgmap (ts_name, tok_alias, dict_name)
<pre> INSERT INTO pg_ts_cfgmap (ts_name, tok_alias, dict_name)
VALUES ('default_english', 'lhword', '{en_ispell,en_stem}');
INSERT INTO pg_ts_cfgmap (ts_name, tok_alias, dict_name)
VALUES ('default_english', 'lpart_hword', '{en_ispell,en_stem}');
......@@ -818,8 +840,7 @@ in the tsvector column.
<p>There are several other lexem types used that we do not need
to specify as using the ISpell dictionary. We can simply insert
values using the 'simple' stemming process dictionary.</p>
<pre>
INSERT INTO pg_ts_cfgmap
<pre> INSERT INTO pg_ts_cfgmap
VALUES ('default_english', 'url', '{simple}');
INSERT INTO pg_ts_cfgmap
VALUES ('default_english', 'host', '{simple}');
......@@ -857,8 +878,7 @@ in the tsvector column.
complete. We have successfully created a new tsearch2
configuration. At the same time we have also set the new
configuration to be our default for en_US locale.</p>
<pre>
SELECT to_tsvector('default_english',
<pre> SELECT to_tsvector('default_english',
'learning tsearch is like going to school');
to_tsvector
--------------------------------------------------
......@@ -870,12 +890,37 @@ in the tsvector column.
(1 row)
</pre>
<p>Notice here that words like "tsearch" are still parsed and
indexed in the tsvector column. There is a lexem returned for
the word because in the configuration mapping table, we specify
words to be used from the 'en_ispell' dictionary first, but as
a fallback to use the 'en_stem' dictionary. Therefore a lexem
is not returned from en_ispell, but is returned from en_stem,
and added to the tsvector.</p>
<pre> SELECT to_tsvector('learning tsearch is like going to computer school');
to_tsvector
---------------------------------------------------------------------------
'go':5 'like':4 'learn':1 'school':8 'compute':7 'tsearch':2 'computer':7
(1 row)
</pre>
<p>Notice in this last example I added the word "computer" to
the text to be converted into a tsvector. Because we have setup
our default configuration to use the ISpell english dictionary,
the words are lexized, and computer returns 2 lexems at the
same position. 'compute':7 and 'computer':7 are now both
indexed for the word computer.</p>
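<p>One consequence is that a search for either form should be able
to find the text. A minimal sketch, assuming the 'default_english'
configuration built above and that to_tsquery reduces 'compute'
through the same dictionaries; no result is shown because it depends
on the installed word list:</p>
<pre> -- match a tsvector built from "computer" against the query "compute"
 SELECT to_tsvector('default_english', 'going to computer school')
 @@ to_tsquery('default_english', 'compute');
</pre>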
<p>You can create additional dictionary lists, or use the extra
large dictionary from ISpell. You can read through the ISpell
documents, and source tree to make modifications as you see
fit.</p>
<p>In the case that you already have a configuration set for
the locale, and you are changing it to your new dictionary
configuration, you will have to set the old locale to NULL. If
we are using the 'C' locale then we would do this:</p>
<pre>
UPDATE pg_ts_cfg SET locale=NULL WHERE locale = 'C';
<pre> UPDATE pg_ts_cfg SET locale=NULL WHERE locale = 'C';
</pre>
<p>That about wraps up the configuration of tsearch2. There is
......@@ -917,38 +962,32 @@ in the tsvector column.
<p>1) Backup any global database objects such as users and
groups (this step is usually only necessary when you will be
restoring to a virgin system)</p>
<pre>
pg_dumpall -g &gt; GLOBALobjects.sql
<pre> pg_dumpall -g &gt; GLOBALobjects.sql
</pre>
<p>2) Backup the full database schema using pg_dump</p>
<pre>
pg_dump -s DATABASE &gt; DATABASEschema.sql
<pre> pg_dump -s DATABASE &gt; DATABASEschema.sql
</pre>
<p>3) Backup the full database using pg_dump</p>
<pre>
pg_dump -Fc DATABASE &gt; DATABASEdata.tar
<pre> pg_dump -Fc DATABASE &gt; DATABASEdata.tar
</pre>
<p>To Restore a PostgreSQL database that uses the tsearch2
module:</p>
<p>1) Create the blank database</p>
<pre>
createdb DATABASE
<pre> createdb DATABASE
</pre>
<p>2) Restore any global database objects such as users and
groups (this step is usually only necessary when you will be
restoring to a virgin system)</p>
<pre>
psql DATABASE &lt; GLOBALobjects.sql
<pre> psql DATABASE &lt; GLOBALobjects.sql
</pre>
<p>3) Create the tsearch2 objects, functions and operators</p>
<pre>
psql DATABASE &lt; tsearch2.sql
<pre> psql DATABASE &lt; tsearch2.sql
</pre>
<p>4) Edit the backed up database schema and delete all SQL
......@@ -957,13 +996,11 @@ in the tsvector column.
tsvector types. If you're not sure what these are, they are the
ones listed in tsearch2.sql. Then restore the edited schema to
the database</p>
<pre>
psql DATABASE &lt; DATABASEschema.sql
<pre> psql DATABASE &lt; DATABASEschema.sql
</pre>
<p>5) Restore the data for the database</p>
<pre>
pg_restore -N -a -d DATABASE DATABASEdata.tar
<pre> pg_restore -N -a -d DATABASE DATABASEdata.tar
</pre>
<p>If you get any errors in step 4, it will most likely be
......@@ -971,5 +1008,4 @@ in the tsvector column.
tsearch2.sql. Any errors in step 5 will mean the database
schema was probably restored wrongly.</p>
</div>
</body>
</html>
</body></html>
\ No newline at end of file