Commit 323ae003 authored by Peter Eisentraut's avatar Peter Eisentraut

doc: Expand recursive query documentation

Break the section up with subsection headings.  Add examples for depth-
and breadth-first search ordering.  For consistency with the SQL
search clause, start the depth counting at 0 instead of 1 in the
examples.

Discussion: https://www.postgresql.org/message-id/c5603982-0088-7f14-0caa-fdcd0c837b57@2ndquadrant.com
parent 8fccf758
...@@ -2011,6 +2011,10 @@ GROUP BY region, product; ...@@ -2011,6 +2011,10 @@ GROUP BY region, product;
but we'd have needed two levels of nested sub-<command>SELECT</command>s. It's a bit but we'd have needed two levels of nested sub-<command>SELECT</command>s. It's a bit
easier to follow this way. easier to follow this way.
</para> </para>
</sect2>
<sect2 id="queries-with-recursive">
<title>Recursive Queries</title>
<para> <para>
<indexterm> <indexterm>
...@@ -2114,6 +2118,120 @@ GROUP BY sub_part ...@@ -2114,6 +2118,120 @@ GROUP BY sub_part
</programlisting> </programlisting>
</para> </para>
<sect3 id="queries-with-search">
<title>Search Order</title>
<para>
When computing a tree traversal using a recursive query, you might want to
order the results in either depth-first or breadth-first order. This can
be done by computing an ordering column alongside the other data columns
and using that to sort the results at the end. Note that this does not
actually control in which order the query evaluation visits the rows; that
is as always in SQL implementation-dependent. This approach merely
provides a convenient way to order the results afterwards.
</para>
<para>
To create a depth-first order, we compute for each result row an array of
rows that we have visited so far. For example, consider the following
query that searches a table <structname>tree</structname> using a
<structfield>link</structfield> field:
<programlisting>
WITH RECURSIVE search_tree(id, link, data) AS (
SELECT t.id, t.link, t.data
FROM tree t
UNION ALL
SELECT t.id, t.link, t.data
FROM tree t, search_tree st
WHERE t.id = st.link
)
SELECT * FROM search_tree;
</programlisting>
To add depth-first ordering information, you can write this:
<programlisting>
WITH RECURSIVE search_tree(id, link, data, <emphasis>path</emphasis>) AS (
SELECT t.id, t.link, t.data, <emphasis>ARRAY[t.id]</emphasis>
FROM tree t
UNION ALL
SELECT t.id, t.link, t.data, <emphasis>path || t.id</emphasis>
FROM tree t, search_tree st
WHERE t.id = st.link
)
SELECT * FROM search_tree <emphasis>ORDER BY path</emphasis>;
</programlisting>
</para>
<para>
In the general case where more than one field needs to be used to identify
a row, use an array of rows. For example, if we needed to track fields
<structfield>f1</structfield> and <structfield>f2</structfield>:
<programlisting>
WITH RECURSIVE search_tree(id, link, data, <emphasis>path</emphasis>) AS (
SELECT t.id, t.link, t.data, <emphasis>ARRAY[ROW(t.f1, t.f2)]</emphasis>
FROM tree t
UNION ALL
SELECT t.id, t.link, t.data, <emphasis>path || ROW(t.f1, t.f2)</emphasis>
FROM tree t, search_tree st
WHERE t.id = st.link
)
SELECT * FROM search_tree <emphasis>ORDER BY path</emphasis>;
</programlisting>
</para>
<note>
<para>
The queries shown in this and the following section involving
<literal>ROW</literal> constructors in the target list only support
<literal>UNION ALL</literal> (not plain <literal>UNION</literal>) in the
current implementation.
</para>
</note>
<tip>
<para>
Omit the <literal>ROW()</literal> syntax in the common case where only one
field needs to be tracked. This allows a simple array rather than a
composite-type array to be used, gaining efficiency.
</para>
</tip>
<para>
To create a breadth-first order, you can add a column that tracks the depth
of the search, for example:
<programlisting>
WITH RECURSIVE search_tree(id, link, data, <emphasis>depth</emphasis>) AS (
SELECT t.id, t.link, t.data, <emphasis>0</emphasis>
FROM tree t
UNION ALL
SELECT t.id, t.link, t.data, <emphasis>depth + 1</emphasis>
FROM tree t, search_tree st
WHERE t.id = st.link
)
SELECT * FROM search_tree <emphasis>ORDER BY depth</emphasis>;
</programlisting>
To get a stable sort, add data columns as secondary sorting columns.
</para>
<tip>
<para>
The recursive query evaluation algorithm produces its output in
breadth-first search order. However, this is an implementation detail and
it is perhaps unsound to rely on it. The order of the rows within each
level is certainly undefined, so some explicit ordering might be desired
in any case.
</para>
</tip>
</sect3>
<sect3 id="queries-with-cycle">
<title>Cycle Detection</title>
<para> <para>
When working with recursive queries it is important to be sure that When working with recursive queries it is important to be sure that
the recursive part of the query will eventually return no tuples, the recursive part of the query will eventually return no tuples,
...@@ -2123,13 +2241,13 @@ GROUP BY sub_part ...@@ -2123,13 +2241,13 @@ GROUP BY sub_part
cycle does not involve output rows that are completely duplicate: it may be cycle does not involve output rows that are completely duplicate: it may be
necessary to check just one or a few fields to see if the same point has necessary to check just one or a few fields to see if the same point has
been reached before. The standard method for handling such situations is been reached before. The standard method for handling such situations is
to compute an array of the already-visited values. For example, consider to compute an array of the already-visited values. For example, consider again
the following query that searches a table <structname>graph</structname> using a the following query that searches a table <structname>graph</structname> using a
<structfield>link</structfield> field: <structfield>link</structfield> field:
<programlisting> <programlisting>
WITH RECURSIVE search_graph(id, link, data, depth) AS ( WITH RECURSIVE search_graph(id, link, data, depth) AS (
SELECT g.id, g.link, g.data, 1 SELECT g.id, g.link, g.data, 0
FROM graph g FROM graph g
UNION ALL UNION ALL
SELECT g.id, g.link, g.data, sg.depth + 1 SELECT g.id, g.link, g.data, sg.depth + 1
...@@ -2147,17 +2265,17 @@ SELECT * FROM search_graph; ...@@ -2147,17 +2265,17 @@ SELECT * FROM search_graph;
<structfield>is_cycle</structfield> and <structfield>path</structfield> to the loop-prone query: <structfield>is_cycle</structfield> and <structfield>path</structfield> to the loop-prone query:
<programlisting> <programlisting>
WITH RECURSIVE search_graph(id, link, data, depth, is_cycle, path) AS ( WITH RECURSIVE search_graph(id, link, data, depth, <emphasis>is_cycle, path</emphasis>) AS (
SELECT g.id, g.link, g.data, 1, SELECT g.id, g.link, g.data, 0,
false, <emphasis>false,
ARRAY[g.id] ARRAY[g.id]</emphasis>
FROM graph g FROM graph g
UNION ALL UNION ALL
SELECT g.id, g.link, g.data, sg.depth + 1, SELECT g.id, g.link, g.data, sg.depth + 1,
g.id = ANY(path), <emphasis>g.id = ANY(path),
path || g.id path || g.id</emphasis>
FROM graph g, search_graph sg FROM graph g, search_graph sg
WHERE g.id = sg.link AND NOT is_cycle WHERE g.id = sg.link <emphasis>AND NOT is_cycle</emphasis>
) )
SELECT * FROM search_graph; SELECT * FROM search_graph;
</programlisting> </programlisting>
...@@ -2172,17 +2290,17 @@ SELECT * FROM search_graph; ...@@ -2172,17 +2290,17 @@ SELECT * FROM search_graph;
compare fields <structfield>f1</structfield> and <structfield>f2</structfield>: compare fields <structfield>f1</structfield> and <structfield>f2</structfield>:
<programlisting> <programlisting>
WITH RECURSIVE search_graph(id, link, data, depth, is_cycle, path) AS ( WITH RECURSIVE search_graph(id, link, data, depth, <emphasis>is_cycle, path</emphasis>) AS (
SELECT g.id, g.link, g.data, 1, SELECT g.id, g.link, g.data, 0,
false, <emphasis>false,
ARRAY[ROW(g.f1, g.f2)] ARRAY[ROW(g.f1, g.f2)]</emphasis>
FROM graph g FROM graph g
UNION ALL UNION ALL
SELECT g.id, g.link, g.data, sg.depth + 1, SELECT g.id, g.link, g.data, sg.depth + 1,
ROW(g.f1, g.f2) = ANY(path), <emphasis>ROW(g.f1, g.f2) = ANY(path),
path || ROW(g.f1, g.f2) path || ROW(g.f1, g.f2)</emphasis>
FROM graph g, search_graph sg FROM graph g, search_graph sg
WHERE g.id = sg.link AND NOT is_cycle WHERE g.id = sg.link <emphasis>AND NOT is_cycle</emphasis>
) )
SELECT * FROM search_graph; SELECT * FROM search_graph;
</programlisting> </programlisting>
...@@ -2198,10 +2316,8 @@ SELECT * FROM search_graph; ...@@ -2198,10 +2316,8 @@ SELECT * FROM search_graph;
<tip> <tip>
<para> <para>
The recursive query evaluation algorithm produces its output in The cycle path column is computed in the same way as the depth-first
breadth-first search order. You can display the results in depth-first ordering column show in the previous section.
search order by making the outer query <literal>ORDER BY</literal> a
<quote>path</quote> column constructed in this way.
</para> </para>
</tip> </tip>
...@@ -2217,7 +2333,7 @@ WITH RECURSIVE t(n) AS ( ...@@ -2217,7 +2333,7 @@ WITH RECURSIVE t(n) AS (
UNION ALL UNION ALL
SELECT n+1 FROM t SELECT n+1 FROM t
) )
SELECT n FROM t LIMIT 100; SELECT n FROM t <emphasis>LIMIT 100</emphasis>;
</programlisting> </programlisting>
This works because <productname>PostgreSQL</productname>'s implementation This works because <productname>PostgreSQL</productname>'s implementation
...@@ -2229,6 +2345,11 @@ SELECT n FROM t LIMIT 100; ...@@ -2229,6 +2345,11 @@ SELECT n FROM t LIMIT 100;
outer query will usually try to fetch all of the <literal>WITH</literal> query's outer query will usually try to fetch all of the <literal>WITH</literal> query's
output anyway. output anyway.
</para> </para>
</sect3>
</sect2>
<sect2>
<title>Common Table Expression Materialization</title>
<para> <para>
A useful property of <literal>WITH</literal> queries is that they are A useful property of <literal>WITH</literal> queries is that they are
......
Markdown is supported
0% or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment