Commit bb8f629c authored by Bruce Momjian

Move full text search operators, functions, and data type sections into the main documentation, out of its own text search chapter.
parent 8bc225e7
<!-- $PostgreSQL: pgsql/doc/src/sgml/datatype.sgml,v 1.207 2007/08/21 01:11:11 tgl Exp $ -->
<!-- $PostgreSQL: pgsql/doc/src/sgml/datatype.sgml,v 1.208 2007/08/29 20:37:14 momjian Exp $ -->
<chapter id="datatype">
<title id="datatype-title">Data Types</title>
......@@ -234,6 +234,18 @@
<entry>date and time, including time zone</entry>
</row>
<row>
<entry><type>tsquery</type></entry>
<entry></entry>
<entry>full text search query</entry>
</row>
<row>
<entry><type>tsvector</type></entry>
<entry></entry>
<entry>full text search document</entry>
</row>
<row>
<entry><type>uuid</type></entry>
<entry></entry>
......@@ -3264,6 +3276,137 @@ a0eebc999c0b4ef8bb6d6bb9bd380a11
</para>
</sect1>
<sect1 id="datatype-textsearch">
<title>Full Text Search</title>
<variablelist>
<indexterm zone="datatype-textsearch">
<primary>tsvector</primary>
</indexterm>
<varlistentry>
<term><firstterm>tsvector</firstterm></term>
<listitem>
<para>
<type>tsvector</type> is a data type that represents a document and is
optimized for full text searching. In the simplest case,
<type>tsvector</type> is a sorted list of lexemes, so even without indexes
full text searches perform better than standard <literal>~</literal> and
<literal>LIKE</literal> operations:
<programlisting>
SELECT 'a fat cat sat on a mat and ate a fat rat'::tsvector;
tsvector
----------------------------------------------------
'a' 'on' 'and' 'ate' 'cat' 'fat' 'mat' 'rat' 'sat'
</programlisting>
Note that a quoted space is also a lexeme:
<programlisting>
SELECT 'space '' '' is a lexeme'::tsvector;
tsvector
----------------------------------
'a' 'is' ' ' 'space' 'lexeme'
</programlisting>
Each lexeme can optionally carry positional information, which is used for
<varname>proximity ranking</varname>:
<programlisting>
SELECT 'a:1 fat:2 cat:3 sat:4 on:5 a:6 mat:7 and:8 ate:9 a:10 fat:11 rat:12'::tsvector;
tsvector
-------------------------------------------------------------------------------
'a':1,6,10 'on':5 'and':8 'ate':9 'cat':3 'fat':2,11 'mat':7 'rat':12 'sat':4
</programlisting>
Each lexeme position can also be labeled <literal>A</literal>,
<literal>B</literal>, <literal>C</literal>, or <literal>D</literal>,
where <literal>D</literal> is the default. These labels can be used to group
lexemes by <emphasis>importance</emphasis> or
<emphasis>ranking</emphasis>, for example to reflect document structure.
Numeric weights for the labels can be assigned at search time and are used
when calculating the document rank, which makes it possible to control the
ordering of search results.
</para>
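<para>
As a minimal sketch of the labeling described above, a label letter can be
written directly after a position in the <type>tsvector</type> input. The
lexemes and positions here are made up for illustration, and a position
written without a letter is assumed to take the default label
<literal>D</literal>:
<programlisting>
-- 'title' carries label A, 'body' label B; 'text' has no letter (default D)
SELECT 'title:1A body:2B text:3'::tsvector;
</programlisting>
</para>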
<para>
The concatenation operator, e.g. <literal>tsvector || tsvector</literal>,
can construct a document from several parts, which may come from different
tables. The order of the operands matters if the <type>tsvector</type>
values contain positional information, because positions in the right-hand
operand are shifted by the largest position in the left-hand operand:
<programlisting>
SELECT 'fat:1 cat:2'::tsvector || 'fat:1 rat:2'::tsvector;
?column?
---------------------------
'cat':2 'fat':1,3 'rat':4

SELECT 'fat:1 rat:2'::tsvector || 'fat:1 cat:2'::tsvector;
?column?
---------------------------
'cat':4 'fat':1,3 'rat':2
</programlisting>
</para>
</listitem>
</varlistentry>
<indexterm zone="datatype-textsearch">
<primary>tsquery</primary>
</indexterm>
<varlistentry>
<term><firstterm>tsquery</firstterm></term>
<listitem>
<para>
<type>tsquery</type> is a data type for textual queries that supports
the boolean operators <literal>&amp;</literal> (AND), <literal>|</literal> (OR),
and <literal>!</literal> (NOT), as well as parentheses for grouping.
A <type>tsquery</type> consists of lexemes
(optionally labeled by letters) with boolean operators in between:
<programlisting>
SELECT 'fat &amp; cat'::tsquery;
tsquery
---------------
'fat' &amp; 'cat'

SELECT 'fat:ab &amp; cat'::tsquery;
tsquery
------------------
'fat':AB &amp; 'cat'
</programlisting>
A labeled query lexeme matches only <type>tsvector</type> positions that
carry one of the given labels. This restricts the search region and allows
different search engines to be built on the same full text index.
</para>
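<para>
A sketch of how a label restricts matching, assuming the
<literal>@@</literal> match operator described with the text search
operators. The values are illustrative only. The first query asks for
<literal>fat</literal> at a position labeled <literal>A</literal>, which the
document provides; the second asks for label <literal>B</literal>, which it
does not, so it would not be expected to match:
<programlisting>
SELECT 'fat:1A cat:2'::tsvector @@ 'fat:a &amp; cat'::tsquery;

SELECT 'fat:1A cat:2'::tsvector @@ 'fat:b &amp; cat'::tsquery;
</programlisting>
</para>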
<para>
<type>tsquery</type> values can be combined using the
<literal>&amp;&amp;</literal> (AND) and <literal>||</literal> (OR) operators:
<programlisting>
SELECT 'a &amp; b'::tsquery &amp;&amp; 'c | d'::tsquery;
?column?
---------------------------
'a' &amp; 'b' &amp; ( 'c' | 'd' )

SELECT 'a &amp; b'::tsquery || 'c|d'::tsquery;
?column?
---------------------------
'a' &amp; 'b' | ( 'c' | 'd' )
</programlisting>
</para>
</listitem>
</varlistentry>
</variablelist>
</sect1>
<sect1 id="datatype-xml">
<title><acronym>XML</> Type</title>
......