doc: Expand recursive query documentation

Break the section up with subsection headings. Add examples for depth- and breadth-first search ordering. For consistency with the SQL search clause, start the depth counting at 0 instead of 1 in the examples. Discussion: https://www.postgresql.org/message-id/c5603982-0088-7f14-0caa-fdcd0c837b57@2ndquadrant.com

doc: Expand recursive query documentation
Break the section up with subsection headings. Add examples for depth- and breadth-first search ordering. For consistency with the SQL search clause, start the depth counting at 0 instead of 1 in the examples. Discussion: https://www.postgresql.org/message-id/c5603982-0088-7f14-0caa-fdcd0c837b57@2ndquadrant.com
323ae003 · Peter Eisentraut · 8fccf758 · 323ae003
Commit 323ae003 authored Oct 13, 2020 by Peter Eisentraut
Hide whitespace changes
Inline Side-by-side

Showing with 142 additions and 21 deletions

doc/src/sgml/queries.sgml doc/src/sgml/queries.sgml +142 -21

No files found.
--- a/doc/src/sgml/queries.sgml
+++ b/doc/src/sgml/queries.sgml
@@ -2011,6 +2011,10 @@ GROUP BY region, product;
   but we'd have needed two levels of nested sub-<command>SELECT</command>s.  It's a bit
   easier to follow this way.
  </para>
+ </sect2>
+ <sect2 id="queries-with-recursive">
+  <title>Recursive Queries</title>
  <para>
   <indexterm>
@@ -2114,6 +2118,120 @@ GROUP BY sub_part
 </programlisting>
  </para>
+  <sect3 id="queries-with-search">
+   <title>Search Order</title>
+   <para>
+    When computing a tree traversal using a recursive query, you might want to
+    order the results in either depth-first or breadth-first order.  This can
+    be done by computing an ordering column alongside the other data columns
+    and using that to sort the results at the end.  Note that this does not
+    actually control in which order the query evaluation visits the rows; that
+    is as always in SQL implementation-dependent.  This approach merely
+    provides a convenient way to order the results afterwards.
+   </para>
+   <para>
+    To create a depth-first order, we compute for each result row an array of
+    rows that we have visited so far.  For example, consider the following
+    query that searches a table <structname>tree</structname> using a
+    <structfield>link</structfield> field:
+<programlisting>
+WITH RECURSIVE search_tree(id, link, data) AS (
+    SELECT t.id, t.link, t.data
+    FROM tree t
+  UNION ALL
+    SELECT t.id, t.link, t.data
+    FROM tree t, search_tree st
+    WHERE t.id = st.link
+)
+SELECT * FROM search_tree;
+</programlisting>
+    To add depth-first ordering information, you can write this:
+<programlisting>
+WITH RECURSIVE search_tree(id, link, data, <emphasis>path</emphasis>) AS (
+    SELECT t.id, t.link, t.data, <emphasis>ARRAY[t.id]</emphasis>
+    FROM tree t
+  UNION ALL
+    SELECT t.id, t.link, t.data, <emphasis>path || t.id</emphasis>
+    FROM tree t, search_tree st
+    WHERE t.id = st.link
+)
+SELECT * FROM search_tree <emphasis>ORDER BY path</emphasis>;
+</programlisting>
+   </para>
+   <para>
+    In the general case where more than one field needs to be used to identify
+    a row, use an array of rows.  For example, if we needed to track fields
+    <structfield>f1</structfield> and <structfield>f2</structfield>:
+<programlisting>
+WITH RECURSIVE search_tree(id, link, data, <emphasis>path</emphasis>) AS (
+    SELECT t.id, t.link, t.data, <emphasis>ARRAY[ROW(t.f1, t.f2)]</emphasis>
+    FROM tree t
+  UNION ALL
+    SELECT t.id, t.link, t.data, <emphasis>path || ROW(t.f1, t.f2)</emphasis>
+    FROM tree t, search_tree st
+    WHERE t.id = st.link
+)
+SELECT * FROM search_tree <emphasis>ORDER BY path</emphasis>;
+</programlisting>
+   </para>
+   <note>
+    <para>
+     The queries shown in this and the following section involving
+     <literal>ROW</literal> constructors in the target list only support
+     <literal>UNION ALL</literal> (not plain <literal>UNION</literal>) in the
+     current implementation.
+    </para>
+   </note>
+   <tip>
+    <para>
+     Omit the <literal>ROW()</literal> syntax in the common case where only one
+     field needs to be tracked.  This allows a simple array rather than a
+     composite-type array to be used, gaining efficiency.
+    </para>
+   </tip>
+   <para>
+    To create a breadth-first order, you can add a column that tracks the depth
+    of the search, for example:
+<programlisting>
+WITH RECURSIVE search_tree(id, link, data, <emphasis>depth</emphasis>) AS (
+    SELECT t.id, t.link, t.data, <emphasis>0</emphasis>
+    FROM tree t
+  UNION ALL
+    SELECT t.id, t.link, t.data, <emphasis>depth + 1</emphasis>
+    FROM tree t, search_tree st
+    WHERE t.id = st.link
+)
+SELECT * FROM search_tree <emphasis>ORDER BY depth</emphasis>;
+</programlisting>
+    To get a stable sort, add data columns as secondary sorting columns.
+   </para>
+   <tip>
+    <para>
+     The recursive query evaluation algorithm produces its output in
+     breadth-first search order.  However, this is an implementation detail and
+     it is perhaps unsound to rely on it.  The order of the rows within each
+     level is certainly undefined, so some explicit ordering might be desired
+     in any case.
+    </para>
+   </tip>
+  </sect3>
+  <sect3 id="queries-with-cycle">
+   <title>Cycle Detection</title>
  <para>
   When working with recursive queries it is important to be sure that
   the recursive part of the query will eventually return no tuples,
@@ -2123,13 +2241,13 @@ GROUP BY sub_part
   cycle does not involve output rows that are completely duplicate: it may be
   necessary to check just one or a few fields to see if the same point has
   been reached before.  The standard method for handling such situations is
-   to compute an array of the already-visited values.  For example, consider
+   to compute an array of the already-visited values.  For example, consider again
   the following query that searches a table <structname>graph</structname> using a
   <structfield>link</structfield> field:
 <programlisting>
 WITH RECURSIVE search_graph(id, link, data, depth) AS (
-    SELECT g.id, g.link, g.data, 1
+    SELECT g.id, g.link, g.data, 0
    FROM graph g
  UNION ALL
    SELECT g.id, g.link, g.data, sg.depth + 1
@@ -2147,17 +2265,17 @@ SELECT * FROM search_graph;
   <structfield>is_cycle</structfield> and <structfield>path</structfield> to the loop-prone query:
 <programlisting>
-WITH RECURSIVE search_graph(id, link, data, depth, is_cycle, path) AS (
+WITH RECURSIVE search_graph(id, link, data, depth, <emphasis>is_cycle, path</emphasis>) AS (
-    SELECT g.id, g.link, g.data, 1,
+    SELECT g.id, g.link, g.data, 0,
-      false,
+      <emphasis>false,
-      ARRAY[g.id]
+      ARRAY[g.id]</emphasis>
    FROM graph g
  UNION ALL
    SELECT g.id, g.link, g.data, sg.depth + 1,
-      g.id = ANY(path),
+      <emphasis>g.id = ANY(path),
-      path || g.id
+      path || g.id</emphasis>
    FROM graph g, search_graph sg
-    WHERE g.id = sg.link AND NOT is_cycle
+    WHERE g.id = sg.link <emphasis>AND NOT is_cycle</emphasis>
 )
 SELECT * FROM search_graph;
 </programlisting>
@@ -2172,17 +2290,17 @@ SELECT * FROM search_graph;
   compare fields <structfield>f1</structfield> and <structfield>f2</structfield>:
 <programlisting>
-WITH RECURSIVE search_graph(id, link, data, depth, is_cycle, path) AS (
+WITH RECURSIVE search_graph(id, link, data, depth, <emphasis>is_cycle, path</emphasis>) AS (
-    SELECT g.id, g.link, g.data, 1,
+    SELECT g.id, g.link, g.data, 0,
-      false,
+      <emphasis>false,
-      ARRAY[ROW(g.f1, g.f2)]
+      ARRAY[ROW(g.f1, g.f2)]</emphasis>
    FROM graph g
  UNION ALL
    SELECT g.id, g.link, g.data, sg.depth + 1,
-      ROW(g.f1, g.f2) = ANY(path),
+      <emphasis>ROW(g.f1, g.f2) = ANY(path),
-      path || ROW(g.f1, g.f2)
+      path || ROW(g.f1, g.f2)</emphasis>
    FROM graph g, search_graph sg
-    WHERE g.id = sg.link AND NOT is_cycle
+    WHERE g.id = sg.link <emphasis>AND NOT is_cycle</emphasis>
 )
 SELECT * FROM search_graph;
 </programlisting>
@@ -2198,10 +2316,8 @@ SELECT * FROM search_graph;
  <tip>
   <para>
-    The recursive query evaluation algorithm produces its output in
+    The cycle path column is computed in the same way as the depth-first
-    breadth-first search order.  You can display the results in depth-first
+    ordering column show in the previous section.
-    search order by making the outer query <literal>ORDER BY</literal> a
-    <quote>path</quote> column constructed in this way.
   </para>
  </tip>
@@ -2217,7 +2333,7 @@ WITH RECURSIVE t(n) AS (
  UNION ALL
    SELECT n+1 FROM t
 )
-SELECT n FROM t LIMIT 100;
+SELECT n FROM t <emphasis>LIMIT 100</emphasis>;
 </programlisting>
   This works because <productname>PostgreSQL</productname>'s implementation
@@ -2229,6 +2345,11 @@ SELECT n FROM t LIMIT 100;
   outer query will usually try to fetch all of the <literal>WITH</literal> query's
   output anyway.
  </para>
+  </sect3>
+ </sect2>
+ <sect2>
+  <title>Common Table Expression Materialization</title>
  <para>
   A useful property of <literal>WITH</literal> queries is that they are