Add documentation for data-modifying statements in WITH clauses.

Marko Tiikkaja, somewhat reworked by Tom

Commit 0ef0b302 authored Feb 28, 2011 by Tom Lane

Showing with 214 additions and 12 deletions

doc/src/sgml/queries.sgml +171 -6

doc/src/sgml/ref/select.sgml +43 -6

--- a/doc/src/sgml/queries.sgml
+++ b/doc/src/sgml/queries.sgml
@@ -1539,11 +1539,23 @@ SELECT <replaceable>select_list</replaceable> FROM <replaceable>table_expression
  </indexterm>

  <para>
-   <literal>WITH</> provides a way to write subqueries for use in a larger
-   query.  The subqueries, which are often referred to as Common Table
-   Expressions or <acronym>CTE</acronym>s, can be thought of as defining
-   temporary tables that exist just for this query.  One use of this feature
-   is to break down complicated queries into simpler parts.  An example is:
+   <literal>WITH</> provides a way to write auxiliary statements for use in a
+   larger query.  These statements, which are often referred to as Common
+   Table Expressions or <acronym>CTE</acronym>s, can be thought of as defining
+   temporary tables that exist just for one query.  Each auxiliary statement
+   in a <literal>WITH</> clause can be a <command>SELECT</>,
+   <command>INSERT</>, <command>UPDATE</>, or <command>DELETE</>; and the
+   <literal>WITH</> clause itself is attached to a primary statement that can
+   also be a <command>SELECT</>, <command>INSERT</>, <command>UPDATE</>, or
+   <command>DELETE</>.
+  </para>
+
+ <sect2 id="queries-with-select">
+   <title><command>SELECT</> in <literal>WITH</></title>
+
+  <para>
+   The basic value of <command>SELECT</> in <literal>WITH</> is to
+   break down complicated queries into simpler parts.  An example is:

 <programlisting>
 WITH regional_sales AS (
@@ -1565,6 +1577,11 @@ GROUP BY region, product;
 </programlisting>

   which displays per-product sales totals in only the top sales regions.
+   The <literal>WITH</> clause defines two auxiliary statements named
+   <structname>regional_sales</> and <structname>top_regions</>,
+   where the output of <structname>regional_sales</> is used in
+   <structname>top_regions</> and the output of <structname>top_regions</>
+   is used in the primary <command>SELECT</> query.
   This example could have been written without <literal>WITH</>,
   but we'd have needed two levels of nested sub-SELECTs.  It's a bit
   easier to follow this way.
@@ -1779,7 +1796,9 @@ SELECT n FROM t LIMIT 100;
   fetched by the parent query.  Using this trick in production is not
   recommended, because other systems might work differently.  Also, it
   usually won't work if you make the outer query sort the recursive query's
-   results or join them to some other table.
+   results or join them to some other table, because in such cases the
+   outer query will usually try to fetch all of the <literal>WITH</> query's
+   output anyway.
  </para>

  <para>
@@ -1806,6 +1825,152 @@ SELECT n FROM t LIMIT 100;
   In each case it effectively provides temporary table(s) that can
   be referred to in the main command.
  </para>
+ </sect2>
+
+ <sect2 id="queries-with-modifying">
+   <title>Data-Modifying Statements in <literal>WITH</></title>
+
+   <para>
+    You can use data-modifying statements (<command>INSERT</>,
+    <command>UPDATE</>, or <command>DELETE</>) in <literal>WITH</>.  This
+    allows you to perform several different operations in the same query.
+    An example is:
+
+<programlisting>
+WITH moved_rows AS (
+    DELETE FROM products
+    WHERE
+        "date" &gt;= '2010-10-01' AND
+        "date" &lt; '2010-11-01'
+    RETURNING *
+)
+INSERT INTO products_log
+SELECT * FROM moved_rows;
+</programlisting>
+
+    This query effectively moves rows from <structname>products</> to
+    <structname>products_log</>.  The <command>DELETE</> in <literal>WITH</>
+    deletes the specified rows from <structname>products</>, returning their
+    contents by means of its <literal>RETURNING</> clause; and then the
+    primary query reads that output and inserts it into
+    <structname>products_log</>.
+   </para>
+
+   <para>
+    A fine point of the above example is that the <literal>WITH</> clause is
+    attached to the <command>INSERT</>, not the sub-<command>SELECT</> within
+    the <command>INSERT</>.  This is necessary because data-modifying
+    statements are only allowed in <literal>WITH</> clauses that are attached
+    to the top-level statement.  However, normal <literal>WITH</> visibility
+    rules apply, so it is possible to refer to the <literal>WITH</>
+    statement's output from the sub-<command>SELECT</>.
+   </para>
+
+   <para>
+    Data-modifying statements in <literal>WITH</> usually have
+    <literal>RETURNING</> clauses, as seen in the example above.
+    It is the output of the <literal>RETURNING</> clause, <emphasis>not</> the
+    target table of the data-modifying statement, that forms the temporary
+    table that can be referred to by the rest of the query.  If a
+    data-modifying statement in <literal>WITH</> lacks a <literal>RETURNING</>
+    clause, then it forms no temporary table and cannot be referred to in
+    the rest of the query.  Such a statement will be executed nonetheless.
+    A not-particularly-useful example is:
+
+<programlisting>
+WITH t AS (
+    DELETE FROM foo
+)
+DELETE FROM bar;
+</programlisting>
+
+    This example would remove all rows from tables <structname>foo</> and
+    <structname>bar</>.  The number of affected rows reported to the client
+    would only include rows removed from <structname>bar</>.
+   </para>
+
+   <para>
+    Recursive self-references in data-modifying statements are not
+    allowed.  In some cases it is possible to work around this limitation by
+    referring to the output of a recursive <literal>WITH</>, for example:
+
+<programlisting>
+WITH RECURSIVE included_parts(sub_part, part) AS (
+    SELECT sub_part, part FROM parts WHERE part = 'our_product'
+  UNION ALL
+    SELECT p.sub_part, p.part
+    FROM included_parts pr, parts p
+    WHERE p.part = pr.sub_part
+  )
+DELETE FROM parts
+  WHERE part IN (SELECT part FROM included_parts);
+</programlisting>
+
+    This query would remove all direct and indirect subparts of a product.
+   </para>
+
+   <para>
+    Data-modifying statements in <literal>WITH</> are executed exactly once,
+    and always to completion, independently of whether the primary query
+    reads all (or indeed any) of their output.  Notice that this is different
+    from the rule for <command>SELECT</> in <literal>WITH</>: as stated in the
+    previous section, execution of a <command>SELECT</> is carried only as far
+    as the primary query demands its output.
+   </para>
+
+   <para>
+    The sub-statements in <literal>WITH</> are executed concurrently with
+    each other and with the main query.  Therefore, when using data-modifying
+    statements in <literal>WITH</>, the order in which the specified updates
+    actually happen is unpredictable.  All the statements are executed with
+    the same <firstterm>snapshot</> (see <xref linkend="mvcc">), so they
+    cannot <quote>see</> each others' effects on the target tables.  This
+    alleviates the effects of the unpredictability of the actual order of row
+    updates, and means that <literal>RETURNING</> data is the only way to
+    communicate changes between different <literal>WITH</> sub-statements and
+    the main query.  An example of this is that in
+
+<programlisting>
+WITH t AS (
+    UPDATE products SET price = price * 1.05
+    RETURNING *
+)
+SELECT * FROM products;
+</programlisting>
+
+    the outer <command>SELECT</> would return the original prices before the
+    action of the <command>UPDATE</>, while in
+
+<programlisting>
+WITH t AS (
+    UPDATE products SET price = price * 1.05
+    RETURNING *
+)
+SELECT * FROM t;
+</programlisting>
+
+    the outer <command>SELECT</> would return the updated data.
+   </para>
+
+   <para>
+    Trying to update the same row twice in a single statement is not
+    supported.  Only one of the modifications takes place, but it is not easy
+    (and sometimes not possible) to reliably predict which one.  This also
+    applies to deleting a row that was already updated in the same statement:
+    only the update is performed.  Therefore you should generally avoid trying
+    to modify a single row twice in a single statement.  In particular avoid
+    writing <literal>WITH</> sub-statements that could affect the same rows
+    changed by the main statement or a sibling sub-statement.  The effects
+    of such a statement will not be predictable.
+   </para>
+
+   <para>
+    At present, any table used as the target of a data-modifying statement in
+    <literal>WITH</> must not have a conditional rule, nor an <literal>ALSO</>
+    rule, nor an <literal>INSTEAD</> rule that expands to multiple statements.
+   </para>
+
+  </sect2>

 </sect1>


--- a/doc/src/sgml/ref/select.sgml
+++ b/doc/src/sgml/ref/select.sgml
@@ -58,9 +58,9 @@ SELECT [ ALL | DISTINCT [ ON ( <replaceable class="parameter">expression</replac

 <phrase>and <replaceable class="parameter">with_query</replaceable> is:</phrase>

-    <replaceable class="parameter">with_query_name</replaceable> [ ( <replaceable class="parameter">column_name</replaceable> [, ...] ) ] AS ( <replaceable class="parameter">select</replaceable> )
+    <replaceable class="parameter">with_query_name</replaceable> [ ( <replaceable class="parameter">column_name</replaceable> [, ...] ) ] AS ( <replaceable class="parameter">select</replaceable> | <replaceable class="parameter">insert</replaceable> | <replaceable class="parameter">update</replaceable> | <replaceable class="parameter">delete</replaceable> )

-TABLE { [ ONLY ] <replaceable class="parameter">table_name</replaceable> [ * ] | <replaceable class="parameter">with_query_name</replaceable> }
+TABLE [ ONLY ] <replaceable class="parameter">table_name</replaceable> [ * ]
 </synopsis>

 </refsynopsisdiv>
@@ -209,6 +209,17 @@ TABLE { [ ONLY ] <replaceable class="parameter">table_name</replaceable> [ * ] |
    subqueries that can be referenced by name in the primary query.
    The subqueries effectively act as temporary tables or views
    for the duration of the primary query.
+    Each subquery can be a <command>SELECT</command>,
+    <command>INSERT</command>, <command>UPDATE</command> or
+    <command>DELETE</command> statement.
+    When writing a data-modifying statement (<command>INSERT</command>,
+    <command>UPDATE</command> or <command>DELETE</command>) in
+    <literal>WITH</>, it is usual to include a <literal>RETURNING</> clause.
+    It is the output of <literal>RETURNING</>, <emphasis>not</> the underlying
+    table that the statement modifies, that forms the temporary table that is
+    read by the primary query.  If <literal>RETURNING</> is omitted, the
+    statement is still executed, but it produces no output so it cannot be
+    referenced as a table by the primary query.
   </para>

   <para>
@@ -220,14 +231,18 @@ TABLE { [ ONLY ] <replaceable class="parameter">table_name</replaceable> [ * ] |

   <para>
    If <literal>RECURSIVE</literal> is specified, it allows a
-    subquery to reference itself by name.  Such a subquery must have
-    the form
+    <command>SELECT</command> subquery to reference itself by name.  Such a
+    subquery must have the form
 <synopsis>
 <replaceable class="parameter">non_recursive_term</replaceable> UNION [ ALL | DISTINCT ] <replaceable class="parameter">recursive_term</replaceable>
 </synopsis>
    where the recursive self-reference must appear on the right-hand
    side of the <literal>UNION</>.  Only one recursive self-reference
-    is permitted per query.
+    is permitted per query.  Recursive data-modifying statements are not
+    supported, but you can use the results of a recursive
+    <command>SELECT</command> query in
+    a data-modifying statement.  See <xref linkend="queries-with"> for
+    an example.
   </para>

   <para>
@@ -241,9 +256,21 @@ TABLE { [ ONLY ] <replaceable class="parameter">table_name</replaceable> [ * ] |
   </para>

   <para>
-    A useful property of <literal>WITH</literal> queries is that they
+    A key property of <literal>WITH</literal> queries is that they
    are evaluated only once per execution of the primary query,
    even if the primary query refers to them more than once.
+    In particular, data-modifying statements are guaranteed to be
+    executed once and only once, regardless of whether the primary query
+    reads all or any of their output.
+   </para>
+
+   <para>
+    The primary query and the <literal>WITH</literal> queries are all
+    (notionally) executed at the same time.  This implies that the effects of
+    a data-modifying statement in <literal>WITH</literal> cannot be seen from
+    other parts of the query, other than by reading its <literal>RETURNING</>
+    output.  If two such data-modifying statements attempt to modify the same
+    row, the results are unspecified.
   </para>

   <para>
@@ -1657,6 +1684,16 @@ SELECT distributors.* WHERE distributors.name = 'Westward';
   </para>
  </refsect2>

+  <refsect2>
+   <title>Data-Modifying Statements in <literal>WITH</></title>
+
+   <para>
+    <productname>PostgreSQL</productname> allows <command>INSERT</>,
+    <command>UPDATE</>, and <command>DELETE</> to be used as <literal>WITH</>
+    queries.  This is not found in the SQL standard.
+   </para>
+  </refsect2>
+
  <refsect2>
   <title>Nonstandard Clauses</title>