Abuhujair Javed / Postgres FD Implementation
Commit bf028fa8, authored Oct 31, 2006 by Teodor Sigaev
Add description of new features
Parent: 7e63445d
Showing 3 changed files with 503 additions and 90 deletions
contrib/tsearch2/docs/tsearch-V2-intro.html (+3 -3)
contrib/tsearch2/docs/tsearch2-guide.html (+40 -12)
contrib/tsearch2/docs/tsearch2-ref.html (+460 -75)
contrib/tsearch2/docs/tsearch-V2-intro.html
@@ -427,9 +427,9 @@ concatenation also works with NULL fields.</strong></p>
 <p>We need to create the index on the column idxFTI. Keep in mind
 that the database will update the index when some action is taken.
 In this case we _need_ the index (The whole point of Full Text
-INDEXINGi ;-)), so don't worry about any indexing overhead. We will
-create an index based on the gist function. GiST is an index
-structure for Generalized Search Tree.</p>
+INDEXING ;-)), so don't worry about any indexing overhead. We will
+create an index based on the gist or gin function. GiST is an index
+structure for Generalized Search Tree; GIN is an inverted index (see <a href="tsearch2-ref.html#indexes">The tsearch2 Reference: Indexes</a>).</p>
 <pre>
 CREATE INDEX idxFTI_idx ON tblMessages USING gist(idxFTI);
 VACUUM FULL ANALYZE;
contrib/tsearch2/docs/tsearch2-guide.html
 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
 <html>
 <head>
-<link type="text/css" rel="stylesheet" href="/~megera/postgres/gist/tsearch/tsearch.css">
 <title>tsearch2 guide</title>
 </head>
 <body>
@@ -9,16 +8,13 @@
 <p align=center>
 Brandon Craig Rhodes<br>30 June 2003
+<br>Updated to 8.2 release by Oleg Bartunov, October 2006</br>
 <p>
 This Guide introduces the reader to the PostgreSQL tsearch2 module,
 version 2.
 More formal descriptions of the module's types and functions
 are provided in the <a href="tsearch2-ref.html">tsearch2 Reference</a>,
 which is a companion to this document.
-You can retrieve a beta copy of the tsearch2 module from the
-<a href="http://www.sai.msu.su/~megera/postgres/gist/">GiST for PostgreSQL</a>
-page — look under the section entitled <i>Development History</i>
-for the current version.
 <p>
 First we will examine the <tt>tsvector</tt> and <tt>tsquery</tt> types
 and how they are used to search documents;
@@ -32,15 +28,40 @@ you should be able to run the examples here exactly as they are typed.
 <hr>
 <h2>Table of Contents</h2>
 <blockquote>
+<a href="#intro">Introduction to FTS with tsearch2</a><br>
 <a href="#vectors_queries">Vectors and Queries</a><br>
 <a href="#simple_search">A Simple Search Engine</a><br>
 <a href="#weights">Ranking and Position Weights</a><br>
 <a href="#casting">Casting Vectors and Queries</a><br>
 <a href="#parsing_lexing">Parsing and Lexing</a><br>
+<a href="#ref">Additional information</a>
 </blockquote>
 <hr>
+<h2><a name="intro">Introduction to FTS with tsearch2</a></h2>
+The purpose of FTS is to
+find <b>documents</b> that satisfy a <b>query</b> and, optionally, to return
+them in some <b>order</b>.
+The most common case: find documents containing all query terms and return them in order
+of their similarity to the query. A document in the database can be
+any text attribute, or a combination of text attributes from one or many tables
+(using joins).
+Text search operators have existed for years; in PostgreSQL they are
+<tt><b>~, ~*, LIKE, ILIKE</b></tt>, but they lack linguistic support,
+tend to be slow, and have no relevance ranking. The idea behind tsearch2
+is rather simple: preprocess the document at index time to save time at the search stage.
+Preprocessing includes
+<ul>
+<li>parsing the document into words
+<li>linguistic processing: normalizing words to obtain lexemes
+<li>storing the document in a form optimized for searching
+</ul>
+Tsearch2, in a nutshell, provides an FTS operator (contains) for two new data types,
+which represent the document and the query: <tt>tsquery @@ tsvector</tt>.
+<P>
 <h2><a name=vectors_queries>Vectors and Queries</a></h2>
 <blockquote>
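Review note on the new "Introduction to FTS with tsearch2" section: the three preprocessing steps it lists (parse into words, normalize to lexemes, store in a searchable form) can be sketched in a few lines of Python. This is only an illustration of the idea, not how tsearch2 is implemented. The stop-word list, the plural-stripping rule, and the names `to_lexemes` and `matches` are all assumptions made for the example.

```python
import re

# Hypothetical stop-word list; tsearch2 takes its own from dictionary configuration.
STOP_WORDS = {"a", "an", "and", "in", "of", "on", "the"}


def to_lexemes(document):
    """Parse a document into words, normalize them, and record positions.

    Returns a dict mapping each lexeme to the list of 1-based word
    positions where it occurs (a toy analogue of a tsvector).
    """
    vector = {}
    words = re.findall(r"[a-z0-9]+", document.lower())
    for position, word in enumerate(words, start=1):
        if word in STOP_WORDS:
            continue  # stop words occupy a position but are not stored
        # Crude normalization: strip a plural "s". Real dictionaries do far more.
        lexeme = word[:-1] if len(word) > 3 and word.endswith("s") else word
        vector.setdefault(lexeme, []).append(position)
    return vector


def matches(vector, query):
    """Return True if every query term is present in the vector (AND query)."""
    return all(term in vector for term in to_lexemes(query))


vector = to_lexemes("The cats sat on the mat")
print(vector)
print(matches(vector, "cat mat"))
print(matches(vector, "dog"))
```

The work of splitting and normalizing is done once, when the document is stored, so a query only has to look up lexemes. That is the trade-off the introduction describes.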
@@ -79,6 +100,8 @@ Preparing your document index involves two steps:
 on the <tt>tsvector</tt> column of a table,
 which implements a form of the Berkeley
 <a href="http://gist.cs.berkeley.edu/"><i>Generalized Search Tree</i></a>.
+Since PostgreSQL 8.2, tsearch2 also supports the <a href="http://www.sigaev.ru/gin/">Gin</a> index,
+which is an inverted index of the kind commonly used in search engines. It adds scalability to tsearch2.
 </ul>
 Once your documents are indexed,
 performing a search involves:
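Review note on the Gin sentence added in the hunk above: an inverted index maps each lexeme to the documents that contain it, so an AND query becomes an intersection of small sets rather than a scan of every document. The sketch below shows that idea only. It is not the Gin implementation, and the function names and sample data are assumptions made for the example.

```python
from collections import defaultdict


def build_inverted_index(documents):
    """Map each lexeme to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, lexemes in documents.items():
        for lexeme in lexemes:
            index[lexeme].add(doc_id)
    return index


def search_all(index, terms):
    """Return the ids of documents containing every term (set intersection)."""
    postings = [index.get(term, set()) for term in terms]
    if not postings:
        return set()
    return set.intersection(*postings)


# Documents are assumed to be already reduced to lexemes.
documents = {
    1: ["cat", "sat", "mat"],
    2: ["dog", "sat", "log"],
    3: ["cat", "dog"],
}
index = build_inverted_index(documents)
print(sorted(search_all(index, ["cat", "sat"])))
print(sorted(search_all(index, ["dog"])))
```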
@@ -251,7 +274,7 @@ and give you an error to prevent this mistake:
 <pre>
 =# <b>SELECT to_tsquery('the')</b>
-NOTICE: Query contains only stopword(s) or doesn't contain lexeme(s), ignored
+NOTICE: Query contains only stopword(s) or doesn't contain lexem(s), ignored
 to_tsquery
 ------------
@@ -483,8 +506,8 @@ The <tt>rank()</tt> function existed in older versions of OpenFTS,
 and has the feature that you can assign different weights
 to words from different sections of your document.
 The <tt>rank_cd()</tt> uses a recent technique for weighting results
-but does not allow different weight to be given
-to different sections of your document.
+and also allows different weight to be given
+to different sections of your document (since 8.2).
 <p>
 Both ranking functions allow you to specify,
 as an optional last argument,
@@ -511,9 +534,6 @@ for details
 see the <a href="tsearch2-ref.html#ranking">section on ranking</a>
 in the Reference.
 <p>
-The <tt>rank()</tt> function offers more flexibility
-because it pays attention to the <i>weights</i>
-with which you have labelled lexeme positions.
 Currently tsearch2 supports four different weight labels:
 <tt>'D'</tt>, the default weight;
 and <tt>'A'</tt>, <tt>'B'</tt>, and <tt>'C'</tt>.
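Review note on the weight labels in the hunk above: the sketch below shows one way labelled positions could feed a score, by summing a per-label weight for every position where a query term occurs. It is not the formula used by `rank()` or `rank_cd()`. The numeric weights, the data layout, and the name `weighted_score` are assumptions chosen for illustration.

```python
# Illustrative weights only; not taken from the tsearch2 sources.
WEIGHTS = {"A": 1.0, "B": 0.4, "C": 0.2, "D": 0.1}


def weighted_score(vector, terms):
    """Sum the label weights of every position where a query term occurs.

    `vector` maps a lexeme to a list of (position, label) pairs.
    """
    return sum(
        WEIGHTS[label]
        for term in terms
        for _position, label in vector.get(term, [])
    )


# "cat" occurs once in a title-like section (A) and once in body text (D).
vector = {"cat": [(1, "A"), (7, "D")], "mat": [(9, "D")]}
print(weighted_score(vector, ["cat"]))
print(weighted_score(vector, ["mat"]))
```

With these weights, a match in an 'A' position counts ten times as much as a match in a default 'D' position, which is the kind of section-based preference the guide describes.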
@@ -730,7 +750,7 @@ The main problem is that the apostrophe and backslash
 are important <i>both</i> to PostgreSQL when it is interpreting a string,
 <i>and</i> to the <tt>tsvector</tt> conversion function.
 You may want to review section
-<a href="http://www.postgresql.org/docs/view.php?version=7.3&idoc=0&file=sql-syntax.html#SQL-SYNTAX-STRINGS">1.1.2.1,
+<a href="http://www.postgresql.org/docs/current/static/sql-syntax.html#SQL-SYNTAX-STRINGS">
 “String Constants”</a>
 in the PostgreSQL documentation before proceeding.
 <p>
@@ -1051,6 +1071,14 @@ using the same scheme to determine the dictionary for each token,
 with the difference that the query parser recognizes as special
 the boolean operators that separate query words.
+<h2><a name="ref">Additional information</a></h2>
+More information about tsearch2 is available from the
+<a href="http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2">tsearch2</a> page.
+Also, it is worth checking the
+<a href="http://www.sai.msu.su/~megera/wiki/Tsearch2">tsearch2 wiki</a> pages.
 </body>
 </html>
contrib/tsearch2/docs/tsearch2-ref.html
This diff is collapsed.