Commit 4242a715 authored by Tom Lane

Adjust text search documentation for recent commits.

Fix some now-obsolete statements that were overlooked in commits
6734a1ca, 3dbbd0f0, 028350f6.  Document the behavior of <0>.
Also do a little bit of rearranging and copy-editing for clarity.
parent 8dee039f
@@ -3885,12 +3885,12 @@ SELECT 'a:1A fat:2B,4C cat:5D'::tsvector;
 <para>
 It is important to understand that the
-<type>tsvector</type> type itself does not perform any normalization;
-it assumes the words it is given are normalized appropriately
-for the application. For example,
+<type>tsvector</type> type itself does not perform any word
+normalization; it assumes the words it is given are normalized
+appropriately for the application. For example,
 <programlisting>
-select 'The Fat Rats'::tsvector;
+SELECT 'The Fat Rats'::tsvector;
 tsvector
 --------------------
 'Fat' 'Rats' 'The'
@@ -3929,12 +3929,20 @@ SELECT to_tsvector('english', 'The Fat Rats');
 <literal>&lt;-&gt;</> (FOLLOWED BY). There is also a variant
 <literal>&lt;<replaceable>N</>&gt;</literal> of the FOLLOWED BY
 operator, where <replaceable>N</> is an integer constant that
-specifies a maximum distance between the two lexemes being searched
+specifies the distance between the two lexemes being searched
 for. <literal>&lt;-&gt;</> is equivalent to <literal>&lt;1&gt;</>.
 </para>
 <para>
-Parentheses can be used to enforce grouping of the operators:
+Parentheses can be used to enforce grouping of these operators.
+In the absence of parentheses, <literal>!</> (NOT) binds most tightly,
+<literal>&lt;-&gt;</literal> (FOLLOWED BY) next most tightly, then
+<literal>&amp;</literal> (AND), with <literal>|</literal> (OR) binding
+the least tightly.
+</para>
+<para>
+Here are some examples:
 <programlisting>
 SELECT 'fat &amp; rat'::tsquery;
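The precedence order stated in this hunk is, from tightest to loosest: `!`, then `<->` (and `<N>`), then `&`, then `|`. A small precedence-climbing parser can make that order visible by adding explicit parentheses. This is a hypothetical Python sketch, not PostgreSQL's tsquery parser. `parenthesize` and its helpers are names invented here, operands are treated as opaque words, and binary operators are assumed to be left-associative.

```python
import re

# Tokens: parentheses, !, &, |, <-> or <N>; any other run of characters
# is treated as an opaque operand (e.g. fat, fat:ab, super:*).
TOKEN_RE = re.compile(r"\s*(<->|<\d+>|[!&|()]|[^\s!&|()<>]+)")


def binary_precedence(tok):
    """Binding strength of a binary operator token; higher binds tighter."""
    if tok == "|":
        return 1
    if tok == "&":
        return 2
    if tok is not None and re.fullmatch(r"<(?:-|\d+)>", tok):
        return 3  # <-> and <N> share one level
    return None  # not a binary operator


def tokenize(text):
    tokens, pos, text = [], 0, text.strip()
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"cannot tokenize at {text[pos:]!r}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def parenthesize(query):
    """Return `query` with every binary operation explicitly parenthesized."""
    tokens = tokenize(query)
    i = 0

    def peek():
        return tokens[i] if i < len(tokens) else None

    def take():
        nonlocal i
        tok = peek()
        i += 1
        return tok

    def operand():
        tok = take()
        if tok == "!":  # NOT applies to the next operand only: tightest binding
            return "!" + operand()
        if tok == "(":
            inner = expr(1)
            if take() != ")":
                raise ValueError("missing ')'")
            return inner
        if tok is None or tok == ")" or binary_precedence(tok) is not None:
            raise ValueError(f"expected an operand, got {tok!r}")
        return tok

    def expr(min_prec):
        left = operand()
        while (prec := binary_precedence(peek())) is not None and prec >= min_prec:
            op = take()
            right = expr(prec + 1)  # prec + 1 makes same-level operators left-associative
            left = f"({left} {op} {right})"
        return left

    result = expr(1)
    if peek() is not None:
        raise ValueError(f"unexpected token {peek()!r}")
    return result
```

For example, `parenthesize("fat | rat & ! cat <-> dog")` returns `"(fat | (rat & (!cat <-> dog)))"`.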
@@ -3951,17 +3959,21 @@ SELECT 'fat &amp; rat &amp; ! cat'::tsquery;
 tsquery
 ------------------------
 'fat' &amp; 'rat' &amp; !'cat'
+SELECT '(fat | rat) &lt;-&gt; cat'::tsquery;
+tsquery
+-----------------------------------
+'fat' &lt;-&gt; 'cat' | 'rat' &lt;-&gt; 'cat'
 </programlisting>
-In the absence of parentheses, <literal>!</> (NOT) binds most tightly,
-and <literal>&amp;</literal> (AND) and <literal>&lt;-&gt;</literal> (FOLLOWED BY)
-both bind more tightly than <literal>|</literal> (OR).
+The last example demonstrates that <type>tsquery</type> sometimes
+rearranges nested operators into a logically equivalent formulation.
 </para>
 <para>
 Optionally, lexemes in a <type>tsquery</type> can be labeled with
 one or more weight letters, which restricts them to match only
-<type>tsvector</> lexemes with matching weights:
+<type>tsvector</> lexemes with one of those weights:
 <programlisting>
 SELECT 'fat:ab &amp; cat'::tsquery;
@@ -3981,25 +3993,7 @@ SELECT 'super:*'::tsquery;
 'super':*
 </programlisting>
 This query will match any word in a <type>tsvector</> that begins
-with <quote>super</>. Note that prefixes are first processed by
-text search configurations, which means this comparison returns
-true:
-<programlisting>
-SELECT to_tsvector( 'postgraduate' ) @@ to_tsquery( 'postgres:*' );
-?column?
-----------
-t
-(1 row)
-</programlisting>
-because <literal>postgres</> gets stemmed to <literal>postgr</>:
-<programlisting>
-SELECT to_tsquery('postgres:*');
-to_tsquery
-------------
-'postgr':*
-(1 row)
-</programlisting>
-which then matches <literal>postgraduate</>.
+with <quote>super</>.
 </para>
 <para>
@@ -4015,6 +4009,24 @@ SELECT to_tsquery('Fat:ab &amp; Cats');
 ------------------
 'fat':AB &amp; 'cat'
 </programlisting>
+Note that <function>to_tsquery</> will process prefixes in the same way
+as other words, which means this comparison returns true:
+<programlisting>
+SELECT to_tsvector( 'postgraduate' ) @@ to_tsquery( 'postgres:*' );
+?column?
+----------
+t
+</programlisting>
+because <literal>postgres</> gets stemmed to <literal>postgr</>:
+<programlisting>
+SELECT to_tsvector( 'postgraduate' ), to_tsquery( 'postgres:*' );
+to_tsvector | to_tsquery
+---------------+------------
+'postgradu':1 | 'postgr':*
+</programlisting>
+which will match the stemmed form of <literal>postgraduate</>.
 </para>
 </sect2>
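The weight-label and prefix rules in this part of the documentation can be modeled as a single check per query operand. An operand with weights matches only lexemes that have at least one occurrence with one of those weights. A prefix operand (`:*`) matches any lexeme that starts with the already-normalized query text. The following is a hypothetical Python sketch under simplifying assumptions, not PostgreSQL code. `parse_tsvector` and `operand_matches` are invented names, the literal parser handles only unquoted lexemes, and stemming is not modeled. The stemmed forms `'postgradu'` and `'postgr'` from the example are taken as given inputs.

```python
import re

# One position entry: a number optionally followed by a weight letter A-D.
POSITION_RE = re.compile(r"(\d+)([A-Da-d]?)")


def parse_tsvector(text):
    """Parse a simplified tsvector literal such as 'a:1A fat:2B,4C cat:5D'.

    Returns {lexeme: [(position, weight), ...]}. Assumes unquoted lexemes
    with no embedded spaces or colons; a position without a weight letter
    is given weight D, the default weight.
    """
    vector = {}
    for item in text.split():
        lexeme, _, positions = item.partition(":")
        occurrences = []
        for entry in (positions.split(",") if positions else []):
            m = POSITION_RE.fullmatch(entry)
            if m is None:
                raise ValueError(f"bad position entry {entry!r}")
            occurrences.append((int(m.group(1)), (m.group(2) or "D").upper()))
        vector[lexeme] = occurrences
    return vector


def operand_matches(vector, lexeme, weights="", prefix=False):
    """Model one tsquery operand such as fat:ab (weights) or postgr:* (prefix).

    `lexeme` is compared against the already-normalized lexemes in `vector`;
    no stemming happens here. An empty `weights` string accepts any weight.
    """
    wanted = set(weights.upper())
    for candidate, occurrences in vector.items():
        if candidate != lexeme and not (prefix and candidate.startswith(lexeme)):
            continue
        if not wanted or any(weight in wanted for _, weight in occurrences):
            return True
    return False
```

With `v = parse_tsvector("a:1A fat:2B,4C cat:5D")`, `operand_matches(v, "fat", "ab")` is true because `fat` occurs with weight B. `operand_matches(v, "fat", "a")` is false. Against `parse_tsvector("postgradu:1")`, the stemmed prefix `postgr` matches but the unstemmed `postgres` does not, which is why stemming the query matters.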
......
@@ -322,8 +322,7 @@ text @@ text
 match. Similarly, the <literal>|</literal> (OR) operator specifies that
 at least one of its arguments must appear, while the <literal>!</> (NOT)
 operator specifies that its argument must <emphasis>not</> appear in
-order to have a match. Parentheses can be used to control nesting of
-these operators.
+order to have a match.
 </para>
 <para>
@@ -346,10 +345,10 @@ SELECT to_tsvector('error is not fatal') @@ to_tsquery('fatal &lt;-&gt; error');
 There is a more general version of the FOLLOWED BY operator having the
 form <literal>&lt;<replaceable>N</>&gt;</literal>,
-where <replaceable>N</> is an integer standing for the exact distance
-allowed between the matching lexemes. <literal>&lt;1&gt;</literal> is
+where <replaceable>N</> is an integer standing for the difference between
+the positions of the matching lexemes. <literal>&lt;1&gt;</literal> is
 the same as <literal>&lt;-&gt;</>, while <literal>&lt;2&gt;</literal>
-allows one other lexeme to appear between the matches, and so
+allows exactly one other lexeme to appear between the matches, and so
 on. The <literal>phraseto_tsquery</> function makes use of this
 operator to construct a <literal>tsquery</> that can match a multi-word
 phrase when some of the words are stop words. For example:
@@ -366,9 +365,17 @@ SELECT phraseto_tsquery('the cats ate the rats');
 'cat' &lt;-&gt; 'ate' &lt;2&gt; 'rat'
 </programlisting>
 </para>
+<para>
+A special case that's sometimes useful is that <literal>&lt;0&gt;</literal>
+can be used to require that two patterns match the same word.
+</para>
 <para>
-The precedence of tsquery operators is as follows: <literal>|</literal>, <literal>&amp;</literal>,
-<literal>&lt;-&gt;</literal>, <literal>!</literal>.
+Parentheses can be used to control nesting of the <type>tsquery</>
+operators. Without parentheses, <literal>|</literal> binds least tightly,
+then <literal>&amp;</literal>, then <literal>&lt;-&gt;</literal>,
+and <literal>!</literal> most tightly.
 </para>
 </sect2>
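The revised wording treats `<N>` as the exact difference between lexeme positions. Under that reading, `<->` is a difference of 1, `<2>` leaves exactly one position between the matches, and `<0>` requires both lexemes at the same position. The following is a minimal Python sketch of that rule for a flat chain of single lexemes. It is a hypothetical helper (`phrase_positions`) and does not reproduce PostgreSQL's handling of nested phrase operands. The positions for 'the cats ate the rats' are written by hand, assuming the English configuration drops "the" but still counts its positions, which is consistent with the `phraseto_tsquery` output above. The two-lexemes-at-one-position vector is likewise invented for illustration.

```python
def phrase_positions(vector, lexemes, distances):
    """Start positions where the chain l0 <d0> l1 <d1> l2 ... matches.

    `vector` maps each lexeme to its list of integer positions. Each entry
    of `distances` is the required difference between the positions of two
    consecutive lexemes: 1 models <->, 2 models <2>, and 0 models <0>
    (both lexemes at the same position, i.e. the same word).
    """
    if len(distances) != len(lexemes) - 1:
        raise ValueError("need exactly one distance between each pair of lexemes")
    if any(d < 0 for d in distances):
        raise ValueError("distances must be non-negative")
    matches = []
    for start in vector.get(lexemes[0], []):
        pos = start
        for lexeme, distance in zip(lexemes[1:], distances):
            pos += distance
            if pos not in vector.get(lexeme, []):
                break
        else:  # every lexeme in the chain was found at its required position
            matches.append(start)
    return matches


# 'the cats ate the rats': stop words are dropped but still use up positions.
cats = {"cat": [2], "ate": [3], "rat": [5]}
print(phrase_positions(cats, ["cat", "ate", "rat"], [1, 2]))  # cat <-> ate <2> rat: prints [2]

# Hypothetical vector in which a dictionary emitted two lexemes ("foot" and
# "ball") for the single word at position 1.
compound = {"foot": [1], "ball": [1], "game": [2]}
print(phrase_positions(compound, ["foot", "ball"], [0]))  # foot <0> ball: prints [1]
```

Because the distance is exact rather than a maximum, `cat <-> rat` finds nothing in the first vector, while `cat <3> rat` matches.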
@@ -1423,9 +1430,10 @@ FROM (SELECT id, body, q, ts_rank_cd(ti, q) AS rank
 lacks any position or weight information. The result is usually much
 smaller than an unstripped vector, but it is also less useful.
 Relevance ranking does not work as well on stripped vectors as
-unstripped ones. Also, when given stripped input,
+unstripped ones. Also,
 the <literal>&lt;-&gt;</> (FOLLOWED BY) <type>tsquery</> operator
-effectively degenerates to a simple <literal>&amp;</> (AND) test.
+will never match stripped input, since it cannot determine the
+distance between lexeme occurrences.
 </para>
 </listitem>
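The corrected statement follows from what `strip` removes. An AND test only needs each lexeme to be present. FOLLOWED BY needs a pair of positions with the right difference, so once positions are gone it can never succeed. The Python sketch below models only that reasoning. `strip`, `and_match`, and `followed_by` here are hypothetical stand-ins over a plain dict of lexeme to positions (weights are not modeled), and the example positions are invented.

```python
def strip(vector):
    """Model of strip(): keep every lexeme but drop its positions."""
    return {lexeme: [] for lexeme in vector}


def and_match(vector, left, right):
    """left & right only needs both lexemes to be present."""
    return left in vector and right in vector


def followed_by(vector, left, right, distance=1):
    """left <distance> right needs positions p, q with q - p == distance.

    A stripped vector has no positions, so this is always False for it.
    """
    right_positions = set(vector.get(right, []))
    return any(p + distance in right_positions for p in vector.get(left, []))


doc = {"fat": [2], "rat": [3]}  # hypothetical positions for illustration
print(and_match(doc, "fat", "rat"), followed_by(doc, "fat", "rat"))  # prints True True
bare = strip(doc)
print(and_match(bare, "fat", "rat"), followed_by(bare, "fat", "rat"))  # prints True False
```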