Commit a4d4f591 authored by Tom Lane

Doc: improve documentation about ts_headline() function.

Now that I've had my nose in that code, I thought the docs about
it left something to be desired.
parent c9b0c678
@@ -1295,64 +1295,75 @@ ts_headline(<optional> <replaceable class="parameter">config</replaceable> <type
 <itemizedlist spacing="compact" mark="bullet">
 <listitem>
 <para>
-<literal>StartSel</literal>, <literal>StopSel</literal>: the strings with
-which to delimit query words appearing in the document, to distinguish
-them from other excerpted words. You must double-quote these strings
-if they contain spaces or commas.
+<literal>MaxWords</literal>, <literal>MinWords</literal> (integers):
+these numbers determine the longest and shortest headlines to output.
+The default values are 35 and 15.
 </para>
 </listitem>
 <listitem>
 <para>
-<literal>MaxWords</literal>, <literal>MinWords</literal>: these numbers
-determine the longest and shortest headlines to output.
+<literal>ShortWord</literal> (integer): words of this length or less
+will be dropped at the start and end of a headline, unless they are
+query terms. The default value of three eliminates common English
+articles.
 </para>
 </listitem>
 <listitem>
 <para>
-<literal>ShortWord</literal>: words of this length or less will be
-dropped at the start and end of a headline. The default
-value of three eliminates common English articles.
+<literal>HighlightAll</literal> (boolean): if
+<literal>true</literal> the whole document will be used as the
+headline, ignoring the preceding three parameters. The default
+is <literal>false</literal>.
 </para>
 </listitem>
 <listitem>
 <para>
-<literal>HighlightAll</literal>: Boolean flag; if
-<literal>true</literal> the whole document will be used as the
-headline, ignoring the preceding three parameters.
+<literal>MaxFragments</literal> (integer): maximum number of text
+fragments to display. The default value of zero selects a
+non-fragment-based headline generation method. A value greater
+than zero selects fragment-based headline generation (see below).
 </para>
 </listitem>
 <listitem>
 <para>
-<literal>MaxFragments</literal>: maximum number of text excerpts
-or fragments to display. The default value of zero selects a
-non-fragment-oriented headline generation method. A value greater than
-zero selects fragment-based headline generation. This method
-finds text fragments with as many query words as possible and
-stretches those fragments around the query words. As a result
-query words are close to the middle of each fragment and have words on
-each side. Each fragment will be of at most <literal>MaxWords</literal> and
-words of length <literal>ShortWord</literal> or less are dropped at the start
-and end of each fragment. If not all query words are found in the
-document, then a single fragment of the first <literal>MinWords</literal>
-in the document will be displayed.
+<literal>StartSel</literal>, <literal>StopSel</literal> (strings):
+the strings with which to delimit query words appearing in the
+document, to distinguish them from other excerpted words. The
+default values are <quote><literal>&lt;b&gt;</literal></quote> and
+<quote><literal>&lt;/b&gt;</literal></quote>, which can be suitable
+for HTML output.
 </para>
 </listitem>
 <listitem>
 <para>
-<literal>FragmentDelimiter</literal>: When more than one fragment is
-displayed, the fragments will be separated by this string.
+<literal>FragmentDelimiter</literal> (string): When more than one
+fragment is displayed, the fragments will be separated by this string.
+The default is <quote><literal> ... </literal></quote>.
 </para>
 </listitem>
 </itemizedlist>
 These option names are recognized case-insensitively.
-Any unspecified options receive these defaults:
-<programlisting>
-StartSel=&lt;b&gt;, StopSel=&lt;/b&gt;,
-MaxWords=35, MinWords=15, ShortWord=3, HighlightAll=FALSE,
-MaxFragments=0, FragmentDelimiter=" ... "
-</programlisting>
+You must double-quote string values if they contain spaces or commas.
+</para>
+<para>
+In non-fragment-based headline
+generation, <function>ts_headline</function> locates matches for the
+given <replaceable class="parameter">query</replaceable> and chooses a
+single one to display, preferring matches that have more query words
+within the allowed headline length.
+In fragment-based headline generation, <function>ts_headline</function>
+locates the query matches and splits each match
+into <quote>fragments</quote> of no more than <literal>MaxWords</literal>
+words each, preferring fragments with more query words, and when
+possible <quote>stretching</quote> fragments to include surrounding
+words. The fragment-based mode is thus more useful when the query
+matches span large sections of the document, or when it's desirable to
+display multiple matches.
+In either mode, if no query matches can be identified, then a single
+fragment of the first <literal>MinWords</literal> words in the document
+will be displayed.
 </para>
 <para>
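The option rules in this hunk can be illustrated with a small Python sketch. Those rules are: option names are case-insensitive, string values containing spaces or commas must be double-quoted, and anything left unspecified takes its documented default. This is not PostgreSQL's parser, which is written in C. `parse_headline_options`, `_split_options` and `DEFAULTS` are hypothetical names. Two behaviors are assumptions made only for the sketch: it rejects unknown names, and it accepts only `true`/`false` for HighlightAll.

```python
# Hypothetical sketch (not PostgreSQL's C implementation): parse a
# ts_headline-style options string using only the rules documented above.

# Documented defaults, keyed by lower-cased option name.
DEFAULTS = {
    "maxwords": 35,
    "minwords": 15,
    "shortword": 3,
    "highlightall": False,
    "maxfragments": 0,
    "startsel": "<b>",
    "stopsel": "</b>",
    "fragmentdelimiter": " ... ",
}


def _split_options(text):
    """Split 'name=value, ...' on commas that are outside double quotes."""
    items, buf, in_quotes = [], [], False
    for ch in text:
        if ch == '"':
            in_quotes = not in_quotes
        if ch == "," and not in_quotes:
            items.append("".join(buf))
            buf = []
        else:
            buf.append(ch)
    if in_quotes:
        raise ValueError("unterminated double quote in options string")
    items.append("".join(buf))

    pairs = []
    for item in items:
        if not item.strip():
            continue  # tolerate empty input or a trailing comma (assumption)
        name, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"option without '=': {item.strip()!r}")
        value = value.strip()
        # Double quotes let a value keep embedded spaces and commas.
        if len(value) >= 2 and value[0] == value[-1] == '"':
            value = value[1:-1]
        pairs.append((name.strip().lower(), value))  # names are case-insensitive
    return pairs


def parse_headline_options(text):
    """Return a dict of all options, with defaults for anything unspecified."""
    opts = dict(DEFAULTS)
    for name, value in _split_options(text):
        if name not in DEFAULTS:
            raise ValueError(f"unrecognized option: {name}")  # assumption
        default = DEFAULTS[name]
        if isinstance(default, bool):  # check bool before int: bool is an int subclass
            if value.lower() not in ("true", "false"):
                raise ValueError(f"{name} expects true or false, got {value!r}")
            opts[name] = value.lower() == "true"
        elif isinstance(default, int):
            opts[name] = int(value)
        else:
            opts[name] = value
    return opts
```

For example, `parse_headline_options('MaxFragments=10, MaxWords=7, MinWords=3, StartSel=<<, StopSel=>>')` returns the settings of the fragment-based example in this commit. ShortWord, HighlightAll and FragmentDelimiter stay at their defaults of 3, false and `" ... "`.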
@@ -1364,25 +1375,24 @@ SELECT ts_headline('english',
 is to find all documents containing given query terms
 and return them in order of their similarity to the
 query.',
-to_tsquery('query &amp; similarity'));
+to_tsquery('english', 'query &amp; similarity'));
 ts_headline
 ------------------------------------------------------------
-containing given &lt;b&gt;query&lt;/b&gt; terms
-and return them in order of their &lt;b&gt;similarity&lt;/b&gt; to the
+containing given &lt;b&gt;query&lt;/b&gt; terms +
+and return them in order of their &lt;b&gt;similarity&lt;/b&gt; to the+
 &lt;b&gt;query&lt;/b&gt;.
 SELECT ts_headline('english',
-'The most common type of search
-is to find all documents containing given query terms
-and return them in order of their similarity to the
-query.',
-to_tsquery('query &amp; similarity'),
-'StartSel = &lt;, StopSel = &gt;');
+'Search terms may occur
+many times in a document,
+requiring ranking of the search matches to decide which
+occurrences to display in the result.',
+to_tsquery('english', 'search &amp; term'),
+'MaxFragments=10, MaxWords=7, MinWords=3, StartSel=&lt;&lt;, StopSel=&gt;&gt;');
 ts_headline
--------------------------------------------------------
-containing given &lt;query&gt; terms
-and return them in order of their &lt;similarity&gt; to the
-&lt;query&gt;.
+------------------------------------------------------------
+&lt;&lt;Search&gt;&gt; &lt;&lt;terms&gt;&gt; may occur +
+many times ... ranking of the &lt;&lt;search&gt;&gt; matches to decide
 </screen>
 </para>
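This commit describes the non-fragment mode: choose one match, preferring more query words within MaxWords; drop short non-query words at the edges; fall back to the first MinWords words when nothing matches. A toy Python approximation can make that easier to picture. It is a simplification, not the algorithm `ts_headline` actually uses:

- `simple_headline` is a hypothetical helper.
- Words come from a regular expression.
- Query words match by case-insensitive equality, with no text search parser, dictionaries or stemming. Unlike the real function, it would not match `term` against `terms`.
- Candidate windows always start at a matching word.
- MinWords is honored only in the fallback case, and fragment mode is not modeled.

```python
import re


def simple_headline(document, query_words, max_words=35, min_words=15,
                    short_word=3, start_sel="<b>", stop_sel="</b>"):
    """Toy non-fragment headline (hypothetical helper, not ts_headline)."""
    words = re.findall(r"\w+", document)          # naive tokenizer (assumption)
    targets = {w.lower() for w in query_words}    # no stemming, case-insensitive
    is_query = [w.lower() in targets for w in words]

    # Documented fallback: no match -> first MinWords words of the document.
    if not any(is_query):
        return " ".join(words[:min_words])

    # Simplification: try a window of at most max_words starting at each
    # matching word; keep the one covering the most distinct query words
    # (the earliest window wins ties).
    best_start, best_score = 0, -1
    for pos, hit in enumerate(is_query):
        if not hit:
            continue
        window = words[pos:pos + max_words]
        score = len({w.lower() for w in window} & targets)
        if score > best_score:
            best_start, best_score = pos, score

    lo, hi = best_start, min(best_start + max_words, len(words))
    # ShortWord rule: drop short non-query words at both ends.
    while lo < hi and not is_query[lo] and len(words[lo]) <= short_word:
        lo += 1
    while hi > lo and not is_query[hi - 1] and len(words[hi - 1]) <= short_word:
        hi -= 1

    # Wrap query words in StartSel/StopSel.
    return " ".join(
        f"{start_sel}{w}{stop_sel}" if is_query[i] else w
        for i, w in enumerate(words[lo:hi], start=lo)
    )
```

Take the first example's document, with query words `query` and `similarity` and `max_words=12`. The sketch returns `<b>query</b> terms and return them in order of their <b>similarity</b>`, because the ShortWord rule drops the trailing `to the`.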