Abuhujair Javed
Postgres FD Implementation
Commits
8405e505
Commit
8405e505
authored
Aug 04, 2003
by
Teodor Sigaev
Docs fixes
parent
fb19e2f4
Changes
1
Showing
1 changed file
with
137 additions
and
127 deletions
contrib/tsearch2/docs/tsearch2-ref.html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html><head>
<html>
<link
type=
"text/css"
rel=
"stylesheet"
href=
"tsearch2-ref_files/tsearch.txt"
><title>
tsearch2 reference
</title></head>
<head>
<link
type=
"text/css"
rel=
"stylesheet"
href=
"/~megera/postgres/gist/tsearch/tsearch.css"
>
<title>
tsearch2 reference
</title>
</head>
<body>
<body>
<h1
align=
center
>
The tsearch2 Reference
</h1>
<h1
align=
"center"
>
The tsearch2 Reference
</h1>
<p
align=
center
>
<p
align=
"center"
>
Brandon Craig Rhodes
<br>
30 June 2003
Brandon Craig Rhodes
<br>
30 June 2003
(edited by Oleg Bartunov, 2 Aug 2003).
<p>
<
/p><
p>
This Reference documents the user types and functions
This Reference documents the user types and functions
of the tsearch2 module for PostgreSQL.
of the tsearch2 module for PostgreSQL.
An introduction to the module is provided
An introduction to the module is provided
by the
<a
href=
"tsearch2-guide.html"
>
tsearch2 Guide
</a>
,
by the
<a
href=
"
http://www.sai.msu.su/%7Emegera/postgres/gist/tsearch/V2/docs/
tsearch2-guide.html"
>
tsearch2 Guide
</a>
,
a companion document to this one.
a companion document to this one.
You can retrieve a beta copy of the tsearch2 module from the
You can retrieve a beta copy of the tsearch2 module from the
<a
href=
"http://www.sai.msu.su/
~
megera/postgres/gist/"
>
GiST for PostgreSQL
</a>
<a
href=
"http://www.sai.msu.su/
%7E
megera/postgres/gist/"
>
GiST for PostgreSQL
</a>
page
—
look under the section entitled
<i>
Development History
</i>
page
--
look under the section entitled
<i>
Development History
</i>
for the current version.
for the current version.
<
h2><a
name=
"vq"
>
Vectors and Queries
</h2>
<
/p><h2><a
name=
"vq"
>
Vectors and Queries
</a>
</h2>
Vectors and queries both store lexemes,
<a
name=
"vq"
>
Vectors and queries both store lexemes,
but for different purposes.
but for different purposes.
A
<tt>
tsvector
</tt>
stores the lexemes
A
<tt>
tsvector
</tt>
stores the lexemes
of the words that are parsed out of a document,
of the words that are parsed out of a document,
and can also remember the position of each word.
and can also remember the position of each word.
A
<tt>
tsquery
</tt>
specifies a boolean condition among lexemes.
A
<tt>
tsquery
</tt>
specifies a boolean condition among lexemes.
<p>
<
/a><
p>
Any of the following functions with a
<tt><i>
configuration
</i></tt>
argument
<a
name=
"vq"
>
Any of the following functions with a
<tt><i>
configuration
</i></tt>
argument
can use either an integer
<tt>
id
</tt>
or textual
<tt>
ts_name
</tt>
can use either an integer
<tt>
id
</tt>
or textual
<tt>
ts_name
</tt>
to select a configuration;
to select a configuration;
if the option is omitted, then the current configuration is used.
if the option is omitted, then the current configuration is used.
For more information on the current configuration,
For more information on the current configuration,
read the next section on Configurations.
read the next section on Configurations.
<
h3>
Vector Operations
</h3>
<
/a></p><h3><a
name=
"vq"
>
Vector Operations
</a>
</h3>
<dl>
<dl><dt>
<dt>
<a
name=
"vq"
>
<tt>
to_tsvector(
<em>
[
</em><i>
configuration
</i>
,
<em>
]
</em>
<tt>
to_tsvector(
<em>
[
</em><i>
configuration
</i>
,
<em>
]
</em>
<i>
document
</i>
TEXT) RETURNS tsvector
</tt>
<i>
document
</i>
TEXT) RETURNS tsvector
</tt>
<dd>
<
/a></dt><
dd>
Parses a document into tokens,
<a
name=
"vq"
>
Parses a document into tokens,
reduces the tokens to lexemes,
reduces the tokens to lexemes,
and returns a
<tt>
tsvector
</tt>
which lists the lexemes
and returns a
<tt>
tsvector
</tt>
which lists the lexemes
together with their positions in the document.
together with their positions in the document.
For the best description of this process,
For the best description of this process,
see the section on
<
a
href=
"
tsearch2-guide.html#ps"
>
Parsing and Stemming
</a>
see the section on
<
/a><a
href=
"http://www.sai.msu.su/%7Emegera/postgres/gist/tsearch/V2/docs/
tsearch2-guide.html#ps"
>
Parsing and Stemming
</a>
in the accompanying tsearch2 Guide.
in the accompanying tsearch2 Guide.
<dt>
<
/dd><
dt>
<tt>
strip(
<i>
vector
</i>
tsvector) RETURNS tsvector
</tt>
<tt>
strip(
<i>
vector
</i>
tsvector) RETURNS tsvector
</tt>
<dd>
<
/dt><
dd>
Return a vector which lists the same lexemes
Return a vector which lists the same lexemes
as the given
<tt><i>
vector
</i></tt>
,
as the given
<tt><i>
vector
</i></tt>
,
but which lacks any information
but which lacks any information
about where in the document each lexeme appeared.
about where in the document each lexeme appeared.
While the returned vector is thus useless for relevance ranking,
While the returned vector is thus useless for relevance ranking,
it will usually be much smaller.
it will usually be much smaller.
<dt>
<
/dd><
dt>
<tt>
setweight(
<i>
vector
</i>
tsvector,
<i>
letter
</i>
) RETURNS tsvector
</tt>
<tt>
setweight(
<i>
vector
</i>
tsvector,
<i>
letter
</i>
) RETURNS tsvector
</tt>
<dd>
<
/dt><
dd>
This function returns a copy of the input vector
This function returns a copy of the input vector
in which every location has been labelled
in which every location has been labelled
with either the
<tt><i>
letter
</i></tt>
with either the
<tt><i>
letter
</i></tt>
...
@@ -72,12 +68,12 @@ read the next section on Configurations.
...
@@ -72,12 +68,12 @@ read the next section on Configurations.
These labels are retained when vectors are concatenated,
These labels are retained when vectors are concatenated,
allowing words from different parts of a document
allowing words from different parts of a document
to be weighted differently by ranking functions.
to be weighted differently by ranking functions.
<dt>
<
/dd><
dt>
<tt><i>
vector1
</i>
||
<i>
vector2
</i></tt>
<tt><i>
vector1
</i>
||
<i>
vector2
</i></tt>
<
dt
class=
br
>
<
/dt><dt
class=
"br"
>
<tt>
concat(
<i>
vector1
</i>
tsvector,
<i>
vector2
</i>
tsvector)
<tt>
concat(
<i>
vector1
</i>
tsvector,
<i>
vector2
</i>
tsvector)
RETURNS tsvector
</tt>
RETURNS tsvector
</tt>
<dd>
<
/dt><
dd>
Returns a vector which combines the lexemes and position information
Returns a vector which combines the lexemes and position information
in the two vectors given as arguments.
in the two vectors given as arguments.
Position weight labels (described in the previous paragraph)
Position weight labels (described in the previous paragraph)
...
@@ -98,53 +94,52 @@ read the next section on Configurations.
...
@@ -98,53 +94,52 @@ read the next section on Configurations.
and then providing a
<tt><i>
weights
</i></tt>
argument
and then providing a
<tt><i>
weights
</i></tt>
argument
to the
<tt>
rank()
</tt>
function
to the
<tt>
rank()
</tt>
function
that assigns different weights to positions with different labels.
that assigns different weights to positions with different labels.
<dt>
<
/dd><
dt>
<tt>
tsvector_size(
<i>
vector
</i>
tsvector) RETURNS INT4
</tt>
<tt>
tsvector_size(
<i>
vector
</i>
tsvector) RETURNS INT4
</tt>
<dd>
<
/dt><
dd>
Returns the number of lexemes stored in the vector.
Returns the number of lexemes stored in the vector.
<dt>
<
/dd><
dt>
<tt><i>
text
</i>
::tsvector RETURNS tsvector
</tt>
<tt><i>
text
</i>
::tsvector RETURNS tsvector
</tt>
<dd>
<
/dt><
dd>
Directly casting text to a
<tt>
tsvector
</tt>
Directly casting text to a
<tt>
tsvector
</tt>
allows you to directly inject lexemes into a vector,
allows you to directly inject lexemes into a vector,
with whatever positions and position weights you choose to specify.
with whatever positions and position weights you choose to specify.
The
<tt><i>
text
</i></tt>
should be formatted
The
<tt><i>
text
</i></tt>
should be formatted
like the vector would be printed by the output of a
<tt>
SELECT
</tt>
.
like the vector would be printed by the output of a
<tt>
SELECT
</tt>
.
See the
<a
href=
"tsearch2-guide.html#casting"
>
Casting
</a>
See the
<a
href=
"
http://www.sai.msu.su/%7Emegera/postgres/gist/tsearch/V2/docs/
tsearch2-guide.html#casting"
>
Casting
</a>
section in the Guide for details.
section in the Guide for details.
</dl>
</d
d></d
l>
<h3>
Query Operations
</h3>
<h3>
Query Operations
</h3>
<dl>
<dl><dt>
<dt>
<tt>
to_tsquery(
<em>
[
</em><i>
configuration
</i>
,
<em>
]
</em>
<tt>
to_tsquery(
<em>
[
</em><i>
configuration
</i>
,
<em>
]
</em>
<i>
querytext
</i>
text) RETURNS tsquery
</tt>
<i>
querytext
</i>
text) RETURNS tsquery
</tt>
<dd>
<
/dt><
dd>
Parses a query,
Parses a query,
which should be single words separated by the boolean operators
which should be single words separated by the boolean operators
“
<tt>
&
</tt>
”
and,
"
<tt>
&
</tt>
"
and,
“
<tt>
|
</tt>
”
or,
"
<tt>
|
</tt>
"
or,
and
“
<tt>
!
</tt>
”
not,
and
"
<tt>
!
</tt>
"
not,
which can be grouped using parentheses.
which can be grouped using parentheses.
Each word is reduced to a lexeme using the current
Each word is reduced to a lexeme using the current
or specified configuration.
or specified configuration.
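As a rough illustration of the query syntax just described (not the tsearch2 implementation), the Python sketch below evaluates a query of single words combined with `&`, `|`, `!` and parentheses against a set of lexemes. The function names are hypothetical, the precedence (`!` binds tightest, then `&`, then `|`) is an assumption, and no stemming is performed.

```python
import re

# A token is either one operator/parenthesis or a run of other characters.
TOKEN_RE = re.compile(r"\s*([&|!()]|[^\s&|!()]+)")


def tokenize(querytext):
    """Split a query string into words, operators and parentheses."""
    text = querytext.rstrip()
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def matches(querytext, lexemes):
    """Return True if the boolean query holds for the given set of lexemes."""
    tokens = tokenize(querytext)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        return token

    def parse_or():
        value = parse_and()
        while peek() == "|":
            take()
            rhs = parse_and()
            value = value or rhs
        return value

    def parse_and():
        value = parse_not()
        while peek() == "&":
            take()
            rhs = parse_not()
            value = value and rhs
        return value

    def parse_not():
        if peek() == "!":
            take()
            return not parse_not()
        return parse_atom()

    def parse_atom():
        token = peek()
        if token is None or token in ("&", "|", ")"):
            raise ValueError("expected a word or '('")
        take()
        if token == "(":
            value = parse_or()
            if peek() != ")":
                raise ValueError("missing ')'")
            take()
            return value
        # Assumption: the word is already a lexeme; no dictionary is applied.
        return token in lexemes

    result = parse_or()
    if pos != len(tokens):
        raise ValueError("unexpected token: " + tokens[pos])
    return result
```

For example, `matches("cat & !dog", {"cat"})` holds, while `matches("(cat | dog) & rat", {"dog"})` does not.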
</ul>
<dt>
<
/dd><
dt>
<tt>
querytree(
<i>
query
</i>
tsquery) RETURNS text
</tt>
<tt>
querytree(
<i>
query
</i>
tsquery) RETURNS text
</tt>
<dd>
<
/dt><
dd>
This might return a textual representation of the given query.
This might return a textual representation of the given query.
<dt>
<
/dd><
dt>
<tt><i>
text
</i>
::tsquery RETURNS tsquery
</tt>
<tt><i>
text
</i>
::tsquery RETURNS tsquery
</tt>
<dd>
<
/dt><
dd>
Directly casting text to a
<tt>
tsquery
</tt>
Directly casting text to a
<tt>
tsquery
</tt>
allows you to directly inject lexemes into a query,
allows you to directly inject lexemes into a query,
with whatever positions and position weight flags you choose to specify.
with whatever positions and position weight flags you choose to specify.
The
<tt><i>
text
</i></tt>
should be formatted
The
<tt><i>
text
</i></tt>
should be formatted
like the query would be printed by the output of a
<tt>
SELECT
</tt>
.
like the query would be printed by the output of a
<tt>
SELECT
</tt>
.
See the
<a
href=
"tsearch2-guide.html#casting"
>
Casting
</a>
See the
<a
href=
"
http://www.sai.msu.su/%7Emegera/postgres/gist/tsearch/V2/docs/
tsearch2-guide.html#casting"
>
Casting
</a>
section in the Guide for details.
section in the Guide for details.
</dl>
</d
d></d
l>
<h2><a
name=
"configurations"
>
Configurations
</a></h2>
<h2><a
name=
"configurations"
>
Configurations
</a></h2>
...
@@ -157,39 +152,38 @@ uses a configuration to perform its processing.
...
@@ -157,39 +152,38 @@ uses a configuration to perform its processing.
Three configurations come with tsearch2:
Three configurations come with tsearch2:
<ul>
<ul>
<li><b>
default
</b>
—
Indexes words and numbers,
<li><b>
default
</b>
--
Indexes words and numbers,
using the
<i>
en_stem
</i>
English Snowball stemmer for Latin-alphabet words
using the
<i>
en_stem
</i>
English Snowball stemmer for Latin-alphabet words
and the
<i>
simple
</i>
dictionary for all others.
and the
<i>
simple
</i>
dictionary for all others.
<
li><b>
default_russian
</b>
—
Indexes words and numbers,
<
/li><li><b>
default_russian
</b>
--
Indexes words and numbers,
using the
<i>
en_stem
</i>
English Snowball stemmer for Latin-alphabet words
using the
<i>
en_stem
</i>
English Snowball stemmer for Latin-alphabet words
and the
<i>
ru_stem
</i>
Russian Snowball dictionary for all others.
and the
<i>
ru_stem
</i>
Russian Snowball dictionary for all others.
<
li><b>
simple
</b>
—
Processes both words and numbers
<
/li><li><b>
simple
</b>
--
Processes both words and numbers
with the
<i>
simple
</i>
dictionary,
with the
<i>
simple
</i>
dictionary,
which neither discards any stop words nor alters them.
which neither discards any stop words nor alters them.
</ul>
</
li></
ul>
The tsearch2 module initially chooses your current configuration
The tsearch2 module initially chooses your current configuration
by looking for your current locale in the
<tt>
locale
</tt>
field
by looking for your current locale in the
<tt>
locale
</tt>
field
of the
<tt>
pg_ts_cfg
</tt>
table described below.
of the
<tt>
pg_ts_cfg
</tt>
table described below.
You can manipulate the current configuration yourself with these functions:
You can manipulate the current configuration yourself with these functions:
<dl>
<dl><dt>
<dt>
<tt>
set_curcfg(
<i>
id
</i>
INT
<em>
|
</em>
<i>
ts_name
</i>
TEXT
<tt>
set_curcfg(
<i>
id
</i>
INT
<em>
|
</em>
<i>
ts_name
</i>
TEXT
) RETURNS VOID
</tt>
) RETURNS VOID
</tt>
<dd>
<
/dt><
dd>
Set the current configuration used by
<tt>
to_tsvector
</tt>
Set the current configuration used by
<tt>
to_tsvector
</tt>
and
<tt>
to_tsquery
</tt>
.
and
<tt>
to_tsquery
</tt>
.
<dt>
<
/dd><
dt>
<tt>
show_curcfg() RETURNS INT4
</tt>
<tt>
show_curcfg() RETURNS INT4
</tt>
<dd>
<
/dt><
dd>
Returns the integer
<tt>
id
</tt>
of the current configuration.
Returns the integer
<tt>
id
</tt>
of the current configuration.
</dl>
</d
d></d
l>
<p>
<p>
Each configuration is defined by a record in the
<tt>
pg_ts_cfg
</tt>
table:
Each configuration is defined by a record in the
<tt>
pg_ts_cfg
</tt>
table:
<pre>
create table pg_ts_cfg (
<
/p><
pre>
create table pg_ts_cfg (
id int not null primary key,
id int not null primary key,
ts_name text not null,
ts_name text not null,
prs_name text not null,
prs_name text not null,
...
@@ -200,17 +194,17 @@ The <tt>id</tt> and <tt>ts_name</tt> are unique values
...
@@ -200,17 +194,17 @@ The <tt>id</tt> and <tt>ts_name</tt> are unique values
which identify the configuration;
which identify the configuration;
the
<tt>
prs_name
</tt>
specifies which parser the configuration uses.
the
<tt>
prs_name
</tt>
specifies which parser the configuration uses.
Once this parser has split document text into tokens,
Once this parser has split document text into tokens,
the type of each resulting token
—
the type of each resulting token
--
or, more specifically, the type's
<tt>
lex
_alias
</tt>
or, more specifically, the type's
<tt>
tok
_alias
</tt>
as specified in the parser's
<tt>
lexem_type()
</tt>
table
—
as specified in the parser's
<tt>
lexem_type()
</tt>
table
--
is searched for together with the configuration's
<tt>
ts_name
</tt>
is searched for together with the configuration's
<tt>
ts_name
</tt>
in the
<tt>
pg_ts_cfgmap
</tt>
table:
in the
<tt>
pg_ts_cfgmap
</tt>
table:
<pre>
create table pg_ts_cfgmap (
<pre>
create table pg_ts_cfgmap (
ts_name text not null,
ts_name text not null,
lex
_alias text not null,
tok
_alias text not null,
dict_name text[],
dict_name text[],
primary key (ts_name,
lex
_alias)
primary key (ts_name,
tok
_alias)
);
</pre>
);
</pre>
Those tokens whose types are not listed are discarded.
Those tokens whose types are not listed are discarded.
...
@@ -227,17 +221,16 @@ or discarding the token if no dictionary returns a lexeme for it.
...
@@ -227,17 +221,16 @@ or discarding the token if no dictionary returns a lexeme for it.
Each parser is defined by a record in the
<tt>
pg_ts_parser
</tt>
table:
Each parser is defined by a record in the
<tt>
pg_ts_parser
</tt>
table:
<pre>
create table pg_ts_parser (
<pre>
create table pg_ts_parser (
prs_id int not null primary key,
prs_name text not null,
prs_name text not null,
prs_start oid not null,
prs_start oid not null,
prs_
getlexem
oid not null,
prs_
nexttoken
oid not null,
prs_end oid not null,
prs_end oid not null,
prs_headline oid not null,
prs_headline oid not null,
prs_lextype oid not null,
prs_lextype oid not null,
prs_comment text
prs_comment text
);
</pre>
);
</pre>
The
<tt>
prs_
id
</tt>
and
<tt>
prs_
name
</tt>
uniquely identify the parser,
The
<tt>
prs_name
</tt>
uniquely identify the parser,
while
<tt>
prs_comment
</tt>
usually describes its name and version
while
<tt>
prs_comment
</tt>
usually describes its name and version
for the reference of users.
for the reference of users.
The other items identify the low-level functions
The other items identify the low-level functions
...
@@ -246,40 +239,65 @@ and are only of interest to someone writing a parser of their own.
...
@@ -246,40 +239,65 @@ and are only of interest to someone writing a parser of their own.
<p>
<p>
The tsearch2 module comes with one parser named
<tt>
default
</tt>
The tsearch2 module comes with one parser named
<tt>
default
</tt>
which is suitable for parsing most plain text and HTML documents.
which is suitable for parsing most plain text and HTML documents.
<p>
<
/p><
p>
Each
<tt><i>
parser
</i></tt>
argument below
Each
<tt><i>
parser
</i></tt>
argument below
must designate a parser with either an integer
<tt><i>
prs_id
</i></tt>
must designate a parser with
<tt><i>
prs_name
</i></tt>
;
or a textual
<tt><i>
prs_name
</i></tt>
;
the current parser is used when this argument is omitted.
the current parser is used when this argument is omitted.
<dl>
</p><dl><dt>
<dt>
<tt>
CREATE FUNCTION set_curprs(
<i>
parser
</i>
) RETURNS VOID
</tt>
<tt>
CREATE FUNCTION set_curprs(
<i>
parser
</i>
) RETURNS VOID
</tt>
<dd>
<
/dt><
dd>
Selects a current parser
Selects a current parser
which will be used when any of the following functions
which will be used when any of the following functions
are called without a parser as an argument.
are called without a parser as an argument.
<dt>
<
/dd><
dt>
<tt>
CREATE FUNCTION
lexem
_type(
<tt>
CREATE FUNCTION
token
_type(
<em>
[
</em>
<i>
parser
</i>
<em>
]
</em>
<em>
[
</em>
<i>
parser
</i>
<em>
]
</em>
) RETURNS SETOF
lexem
type
</tt>
) RETURNS SETOF
token
type
</tt>
<dd>
<
/dt><
dd>
Returns a table which defines and describes
Returns a table which defines and describes
each kind of token the parser may produce as output.
each kind of token the parser may produce as output.
For each token type the table gives the
<tt>
lex
id
</tt>
For each token type the table gives the
<tt>
tok
id
</tt>
which the parser will label each token of that type,
which the parser will label each token of that type,
the
<tt>
alias
</tt>
which names the token type,
the
<tt>
alias
</tt>
which names the token type,
and a short description
<tt>
descr
</tt>
for the user to read.
and a short description
<tt>
descr
</tt>
for the user to read.
<dt>
<br>
Example:
<br>
<pre>
apod=# select m.ts_name, t.alias as tok_type, t.descr as description, p.token,\
apod=# m.dict_name, strip(to_tsvector(p.token)) as tsvector\
apod=# from parse('Tsearch module for PostgreSQL 7.3.3') as\
apod=# p, token_type() as t, pg_ts_cfgmap as m, pg_ts_cfg as c\
apod=# where t.tokid=p.tokid and t.alias = m.tok_alias\
apod=# and m.ts_name=c.ts_name and c.oid=show_curcfg();
ts_name | tok_type | description | token | dict_name | tsvector
---------+----------+-------------+------------+-----------+--------------
default | lword | Latin word | Tsearch | {en_stem} | 'tsearch'
default | word | Word | module | {simple} | 'modul'
default | lword | Latin word | for | {en_stem} |
default | lword | Latin word | PostgreSQL | {en_stem} | 'postgresql'
default | version | VERSION | 7.3.3 | {simple} | '7.3.3'
</pre>
Here:
<ul>
<li>
ts_name - configuration name
</li><li>
tok_type - token type
</li><li>
description - human readable name of tok_type
</li><li>
token - parser's token
</li><li>
dict_name - dictionary that will be used for the token
</li><li>
tsvector - final result
</li></ul>
</dd><dt>
<tt>
CREATE FUNCTION parse(
<tt>
CREATE FUNCTION parse(
<em>
[
</em>
<i>
parser
</i>
,
<em>
]
</em>
<i>
document
</i>
TEXT
<em>
[
</em>
<i>
parser
</i>
,
<em>
]
</em>
<i>
document
</i>
TEXT
) RETURNS SETOF
lexemtype
</tt>
) RETURNS SETOF
tokenout
</tt>
<dd>
<
/dt><
dd>
Parses the given document and returns a series of records,
Parses the given document and returns a series of records,
one for each token produced by parsing.
one for each token produced by parsing.
Each token includes a
<tt>
lex
id
</tt>
giving its type
Each token includes a
<tt>
tok
id
</tt>
giving its type
and a
<tt>
lexem
</tt>
which gives its content.
and a
<tt>
lexem
</tt>
which gives its content.
</dl>
</d
d></d
l>
<h2><a
name=
"dictionaries"
>
Dictionaries
</a></h2>
<h2><a
name=
"dictionaries"
>
Dictionaries
</a></h2>
...
@@ -291,24 +309,23 @@ Among the dictionaries which come installed with tsearch2 are:
...
@@ -291,24 +309,23 @@ Among the dictionaries which come installed with tsearch2 are:
<ul>
<ul>
<li><b>
simple
</b>
simply folds uppercase letters to lowercase
<li><b>
simple
</b>
simply folds uppercase letters to lowercase
before returning the word.
before returning the word.
<li><b>
en_stem
</b>
runs an English Snowball stemmer on each word
<
/li><
li><b>
en_stem
</b>
runs an English Snowball stemmer on each word
that attempts to reduce the various forms of a verb or noun
that attempts to reduce the various forms of a verb or noun
to a single recognizable form.
to a single recognizable form.
<li><b>
ru_stem
</b>
runs a Russian Snowball stemmer on each word.
<
/li><
li><b>
ru_stem
</b>
runs a Russian Snowball stemmer on each word.
</ul>
</
li></
ul>
Each dictionary is defined by an entry in the
<tt>
pg_ts_dict
</tt>
table:
Each dictionary is defined by an entry in the
<tt>
pg_ts_dict
</tt>
table:
<pre>
CREATE TABLE pg_ts_dict (
<pre>
CREATE TABLE pg_ts_dict (
dict_id int not null primary key,
dict_name text not null,
dict_name text not null,
dict_init oid,
dict_init oid,
dict_initoption text,
dict_initoption text,
dict_le
mmat
ize oid not null,
dict_le
x
ize oid not null,
dict_comment text
dict_comment text
);
</pre>
);
</pre>
The
<tt>
dict_
id
</tt>
and
<tt>
dict_
name
</tt>
The
<tt>
dict_name
</tt>
serve as unique identifiers for the dictionary.
serve as unique identifiers for the dictionary.
The meaning of the
<tt>
dict_initoption
</tt>
varies among dictionaries,
The meaning of the
<tt>
dict_initoption
</tt>
varies among dictionaries,
but for the built-in Snowball dictionaries
but for the built-in Snowball dictionaries
...
@@ -319,33 +336,32 @@ useful only to developers trying to implement their own dictionaries.
...
@@ -319,33 +336,32 @@ useful only to developers trying to implement their own dictionaries.
<p>
<p>
The argument named
<tt><i>
dictionary
</i></tt>
The argument named
<tt><i>
dictionary
</i></tt>
in each of the following functions
in each of the following functions
should be
either an integer
<tt>
dict_id
</tt>
or a textual
<tt>
dict_name
</tt>
should be
<tt>
dict_name
</tt>
identifying which dictionary should be used for the operation;
identifying which dictionary should be used for the operation;
if omitted then the current dictionary is used.
if omitted then the current dictionary is used.
<dl>
</p><dl><dt>
<dt>
<tt>
CREATE FUNCTION set_curdict(
<i>
dictionary
</i>
) RETURNS VOID
</tt>
<tt>
CREATE FUNCTION set_curdict(
<i>
dictionary
</i>
) RETURNS VOID
</tt>
<dd>
<
/dt><
dd>
Selects a current dictionary for use by functions
Selects a current dictionary for use by functions
that do not select a dictionary explicitly.
that do not select a dictionary explicitly.
<dt>
<
/dd><
dt>
<tt>
CREATE FUNCTION lexize(
<tt>
CREATE FUNCTION lexize(
<em>
[
</em>
<i>
dictionary
</i>
,
<em>
]
</em>
<i>
word
</i>
text)
<em>
[
</em>
<i>
dictionary
</i>
,
<em>
]
</em>
<i>
word
</i>
text)
RETURNS TEXT[]
</tt>
RETURNS TEXT[]
</tt>
<dd>
<
/dt><
dd>
Reduces a single word to a lexeme.
Reduces a single word to a lexeme.
Note that lexemes are arrays of zero or more strings,
Note that lexemes are arrays of zero or more strings,
since in some languages there might be several base words
since in some languages there might be several base words
from which an inflected form could arise.
from which an inflected form could arise.
</dl>
</d
d></d
l>
<h2><a
name=
"ranking"
>
Ranking
</a></h2>
<h2><a
name=
"ranking"
>
Ranking
</a></h2>
Ranking attempts to measure how relevant documents are to particular queries
Ranking attempts to measure how relevant documents are to particular queries
by inspecting the number of times each search word appears in the document,
by inspecting the number of times each search word appears in the document,
and whether different search terms occur near each other.
and whether different search terms occur near each other.
Note that this information is only available in unstripped vectors
—
Note that this information is only available in unstripped vectors
--
ranking functions will only return a useful result
ranking functions will only return a useful result
for a
<tt>
tsvector
</tt>
which still has position information!
for a
<tt>
tsvector
</tt>
which still has position information!
<p>
<p>
...
@@ -357,45 +373,42 @@ since a hundred-word document with five instances of a search word
...
@@ -357,45 +373,42 @@ since a hundred-word document with five instances of a search word
is probably more relevant than a thousand-word document with five instances.
is probably more relevant than a thousand-word document with five instances.
The option can have the values:
The option can have the values:
<ul>
<
/p><
ul>
<li><tt>
0
</tt>
(the default) ignores document length.
<li><tt>
0
</tt>
(the default) ignores document length.
<li><tt>
1
</tt>
divides the rank by the logarithm of the length.
<
/li><
li><tt>
1
</tt>
divides the rank by the logarithm of the length.
<li><tt>
2
</tt>
divides the rank by the length itself.
<
/li><
li><tt>
2
</tt>
divides the rank by the length itself.
</ul>
</
li></
ul>
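The three normalization values listed above can be sketched in Python as follows. This is only an illustration of the arithmetic, not the module's code: the function name is hypothetical, the natural logarithm is an assumption (the text does not state a base), and the guard for very short documents is an assumption added to avoid dividing by zero.

```python
import math


def normalize_rank(rank, length, option=0):
    """Adjust a raw rank for document length using option 0, 1 or 2."""
    if option == 0:
        # The default: ignore document length.
        return rank
    if option == 1:
        # Assumption: natural logarithm; leave the rank alone when the
        # logarithm would be zero or undefined.
        if length <= 1:
            return rank
        return rank / math.log(length)
    if option == 2:
        # Divide by the length itself (guarded for empty documents).
        return rank / length if length > 0 else rank
    raise ValueError("normalization must be 0, 1 or 2")
```

With the same raw rank, `normalize_rank(r, 100, 2)` is ten times `normalize_rank(r, 1000, 2)`, which reflects the hundred-word versus thousand-word comparison made above.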
The two ranking functions currently available are:
The two ranking functions currently available are:
<dl>
<dl><dt>
<dt>
<tt>
CREATE FUNCTION rank(
<br>
<tt>
CREATE FUNCTION rank(
<br>
<em>
[
</em>
<i>
weights
</i>
float4[],
<em>
]
</em>
<em>
[
</em>
<i>
weights
</i>
float4[],
<em>
]
</em>
<i>
vector
</i>
tsvector,
<i>
query
</i>
tsquery,
<i>
vector
</i>
tsvector,
<i>
query
</i>
tsquery,
<em>
[
</em>
<i>
normalization
</i>
int4
<em>
]
</em><br>
<em>
[
</em>
<i>
normalization
</i>
int4
<em>
]
</em><br>
) RETURNS float4
</tt>
) RETURNS float4
</tt>
<dd>
<
/dt><
dd>
This is the ranking function from the old version of OpenFTS,
This is the ranking function from the old version of OpenFTS,
and offers the ability to weight word instances more heavily
and offers the ability to weight word instances more heavily
depending on how you have classified them.
depending on how you have classified them.
The
<i>
weights
</i>
specify how heavily to weight each category of word:
The
<i>
weights
</i>
specify how heavily to weight each category of word:
<pre>{<i>D-weight</i>, <i>A-weight</i>, <i>B-weight</i>, <i>C-weight</i>}</pre>
<pre>{<i>D-weight</i>, <i>C-weight</i>, <i>B-weight</i>, <i>A-weight</i>}</pre>
If no weights are provided, then these defaults are used:
If no weights are provided, then these defaults are used:
<pre>
{0.1, 0.2, 0.4, 1.0}
</pre>
<pre>
{0.1, 0.2, 0.4, 1.0}
</pre>
Often weights are used to mark words from special areas of the document,
Often weights are used to mark words from special areas of the document,
like the title or an initial abstract,
like the title or an initial abstract,
and make them more or less important than words in the document body.
and make them more or less important than words in the document body.
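To make the relationship between the weights array and the position labels concrete, here is a small Python sketch. It assumes the array is ordered D, C, B, A, so that the largest default (1.0) belongs to label A; the names are hypothetical, and the sum below only illustrates the lookup, it is not the formula that `rank()` uses.

```python
# Assumed order of the weights array: D, C, B, A.
LABEL_ORDER = "DCBA"
DEFAULT_WEIGHTS = (0.1, 0.2, 0.4, 1.0)


def weight_for(label, weights=DEFAULT_WEIGHTS):
    """Look up the weight that applies to a position labelled A, B, C or D."""
    return weights[LABEL_ORDER.index(label)]


def weighted_hits(labels, weights=DEFAULT_WEIGHTS):
    """Sum the weights of matching positions, given the label of each one."""
    return sum(weight_for(label, weights) for label in labels)
```

Under these assumptions, one match labelled A contributes as much as ten matches labelled D.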
<dt>
<
/dd><
dt>
<tt>
CREATE FUNCTION rank_cd(
<br>
<tt>
CREATE FUNCTION rank_cd(
<br>
<em>
[
</em>
<i>
K
</i>
int4,
<em>
]
</em>
<em>
[
</em>
<i>
K
</i>
int4,
<em>
]
</em>
<i>
vector
</i>
tsvector,
<i>
query
</i>
tsquery,
<i>
vector
</i>
tsvector,
<i>
query
</i>
tsquery,
<em>
[
</em>
<i>
normalization
</i>
int4
<em>
]
</em><br>
<em>
[
</em>
<i>
normalization
</i>
int4
<em>
]
</em><br>
) RETURNS float4
</tt>
) RETURNS float4
</tt>
<dd>
<
/dt><
dd>
This function computes the cover density ranking
This function computes the cover density ranking
for the given document
<i>
vector
</i>
and
<i>
query
</i>
,
for the given document
<i>
vector
</i>
and
<i>
query
</i>
,
as described in Clarke, Cormack, and Tudhope's
as described in Clarke, Cormack, and Tudhope's
“
<a
href=
"http://citeseer.nj.nec.com/clarke00relevance.html"
"
<a
href=
"http://citeseer.nj.nec.com/clarke00relevance.html"
>
Relevance Ranking for One to Three Term Queries
</a>
"
>
Relevance Ranking for One to Three Term Queries
</a>
”
in the 1999
<i>
Information Processing and Management
</i>
.
in the 1999
<i>
Information Processing and Management
</i>
.
The value
<i>
K
</i>
is one of the values from their formula,
The value
<i>
K
</i>
is one of the values from their formula,
and defaults to
<i>
K
</i>
=4.
and defaults to
<i>
K
</i>
=4.
...
@@ -403,18 +416,17 @@ The two ranking functions currently available are:
...
@@ -403,18 +416,17 @@ The two ranking functions currently available are:
we can roughly describe the term
we can roughly describe the term
as stating how far apart two search terms can fall
as stating how far apart two search terms can fall
before the formula begins penalizing them for lack of proximity.
before the formula begins penalizing them for lack of proximity.
</dl>
</d
d></d
l>
<h2><a
name=
"headlines"
>
Headlines
</a></h2>
<h2><a
name=
"headlines"
>
Headlines
</a></h2>
<dl>
<dl><dt>
<dt>
<tt>
CREATE FUNCTION headline(
<br>
<tt>
CREATE FUNCTION headline(
<br>
<em>
[
</em>
<i>
id
</i>
int4,
<em>
|
</em>
<i>
ts_name
</i>
text,
<em>
]
</em>
<em>
[
</em>
<i>
id
</i>
int4,
<em>
|
</em>
<i>
ts_name
</i>
text,
<em>
]
</em>
<i>
document
</i>
text,
<i>
query
</i>
tsquery,
<i>
document
</i>
text,
<i>
query
</i>
tsquery,
<em>
[
</em>
<i>
options
</i>
text
<em>
]
</em><br>
<em>
[
</em>
<i>
options
</i>
text
<em>
]
</em><br>
) RETURNS text
</tt>
) RETURNS text
</tt>
<dd>
<
/dt><
dd>
Every form of the
<tt>
headline()
</tt>
function
Every form of the
<tt>
headline()
</tt>
function
accepts a
<tt>
document
</tt>
along with a
<tt>
query
</tt>
,
accepts a
<tt>
document
</tt>
along with a
<tt>
query
</tt>
,
and returns one or more ellipsis-separated excerpts from the document
and returns one or more ellipsis-separated excerpts from the document
...
@@ -424,25 +436,23 @@ The two ranking functions currently available are:
...
@@ -424,25 +436,23 @@ The two ranking functions currently available are:
if none is specified then the current configuration is used instead.
if none is specified then the current configuration is used instead.
<p>
<p>
An
<i>
options
</i>
string if provided should be a comma-separated list
An
<i>
options
</i>
string if provided should be a comma-separated list
of one or more
‘
<i>
option
</i><tt>
=
</tt><i>
value
</i>
’
pairs.
of one or more
'
<i>
option
</i><tt>
=
</tt><i>
value
</i>
'
pairs.
The available options are:
The available options are:
<ul>
<
/p><
ul>
<li><tt>
StartSel
</tt>
,
<tt>
StopSel
</tt>
—
<li><tt>
StartSel
</tt>
,
<tt>
StopSel
</tt>
--
the strings with which query words appearing in the document
the strings with which query words appearing in the document
should be delimited to distinguish them from other excerpted words.
should be delimited to distinguish them from other excerpted words.
<
li><tt>
MaxWords
</tt>
,
<tt>
MinWords
</tt>
—
<
/li><li><tt>
MaxWords
</tt>
,
<tt>
MinWords
</tt>
--
limits on the shortest and longest headlines you will accept.
limits on the shortest and longest headlines you will accept.
<
li><tt>
ShortWord
</tt>
—
<
/li><li><tt>
ShortWord
</tt>
--
this prevents your headline from beginning or ending
this prevents your headline from beginning or ending
with a word which has this many characters or less.
with a word which has this many characters or less.
The default value of
<tt>
3
</tt>
should eliminate most English
The default value of
<tt>
3
</tt>
should eliminate most English
conjunctions and articles.
conjunctions and articles.
</ul>
</
li></
ul>
Any unspecified options receive these defaults:
Any unspecified options receive these defaults:
<pre>
<pre>
StartSel=
<
b
>
, StopSel=
<
/b
>
, MaxWords=35, MinWords=15, ShortWord=3
StartSel=
<
b
>
, StopSel=
<
/b
>
, MaxWords=35, MinWords=15, ShortWord=3
</pre>
</pre>
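The way an options string combines with these defaults can be sketched in Python as below. This is a hedged illustration rather than the module's parser: the function name is hypothetical, and exact-case option names, trimming of surrounding whitespace, and rejection of unknown names are all assumptions. Values containing commas are not handled.

```python
# Defaults as listed above.
HEADLINE_DEFAULTS = {
    "StartSel": "<b>",
    "StopSel": "</b>",
    "MaxWords": 35,
    "MinWords": 15,
    "ShortWord": 3,
}


def parse_headline_options(options=""):
    """Merge a comma-separated 'option=value' string with the defaults."""
    settings = dict(HEADLINE_DEFAULTS)
    for pair in options.split(","):
        if not pair.strip():
            continue  # tolerate an empty string or a trailing comma
        name, sep, value = pair.partition("=")
        name = name.strip()
        if not sep or name not in settings:
            raise ValueError("bad headline option: " + pair.strip())
        value = value.strip()
        # Numeric options stay integers; the selector strings stay text.
        if isinstance(HEADLINE_DEFAULTS[name], int):
            settings[name] = int(value)
        else:
            settings[name] = value
    return settings
```

Calling `parse_headline_options("StartSel=<<, StopSel=>>, MaxWords=20")` overrides three settings and leaves `MinWords` and `ShortWord` at their defaults.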
</dl>
</d
d></d
l>
</body>
</body></html>
</html>
\ No newline at end of file