When appropriate, postpone SELECT output expressions till after ORDER BY.

It is frequently useful for volatile, set-returning, or expensive functions in a SELECT's targetlist to be postponed till after ORDER BY and LIMIT are done. Otherwise, the functions might be executed for every row of the table despite the presence of LIMIT, and/or be executed in an unexpected order. For example, in SELECT x, nextval('seq') FROM tab ORDER BY x LIMIT 10; it's probably desirable that the nextval() values are ordered the same as x, and that nextval() is not run more than 10 times. In the past, Postgres was inconsistent in this area: you would get the desirable behavior if the ordering were performed via an indexscan, but not if it had to be done by an explicit sort step. Getting the desired behavior reliably required contortions like SELECT x, nextval('seq') FROM (SELECT x FROM tab ORDER BY x) ss LIMIT 10; This patch conditionally postpones evaluation of pure-output target expressions (that is, those that are not used as DISTINCT, ORDER BY, or GROUP BY columns) so that they effectively occur after sorting, even if an explicit sort step is necessary. Volatile expressions and set-returning expressions are always postponed, so as to provide consistent semantics. Expensive expressions (costing more than 10 times typical operator cost, which by default would include any user-defined function) are postponed if there is a LIMIT or if there are expressions that must be postponed. We could be more aggressive and postpone any nontrivial expression, but there are costs associated with doing so: it requires an extra Result plan node which adds some overhead, and postponement changes the volume of data going through the sort step, perhaps for the worse. Since we tend not to have very good estimates of the output width of nontrivial expressions, it's hard to have much confidence in our ability to predict whether postponement would increase or decrease the cost of the sort; therefore this patch doesn't attempt to make decisions conditionally on that. Between these factors and a general desire not to change query behavior when there's not a demonstrable benefit, it seems best to be conservative about applying postponement. We might tweak the decision rules in the future, though. Konstantin Knizhnik, heavily rewritten by me

When appropriate, postpone SELECT output expressions till after ORDER BY.
It is frequently useful for volatile, set-returning, or expensive functions in a SELECT's targetlist to be postponed till after ORDER BY and LIMIT are done. Otherwise, the functions might be executed for every row of the table despite the presence of LIMIT, and/or be executed in an unexpected order. For example, in SELECT x, nextval('seq') FROM tab ORDER BY x LIMIT 10; it's probably desirable that the nextval() values are ordered the same as x, and that nextval() is not run more than 10 times. In the past, Postgres was inconsistent in this area: you would get the desirable behavior if the ordering were performed via an indexscan, but not if it had to be done by an explicit sort step. Getting the desired behavior reliably required contortions like SELECT x, nextval('seq') FROM (SELECT x FROM tab ORDER BY x) ss LIMIT 10; This patch conditionally postpones evaluation of pure-output target expressions (that is, those that are not used as DISTINCT, ORDER BY, or GROUP BY columns) so that they effectively occur after sorting, even if an explicit sort step is necessary. Volatile expressions and set-returning expressions are always postponed, so as to provide consistent semantics. Expensive expressions (costing more than 10 times typical operator cost, which by default would include any user-defined function) are postponed if there is a LIMIT or if there are expressions that must be postponed. We could be more aggressive and postpone any nontrivial expression, but there are costs associated with doing so: it requires an extra Result plan node which adds some overhead, and postponement changes the volume of data going through the sort step, perhaps for the worse. Since we tend not to have very good estimates of the output width of nontrivial expressions, it's hard to have much confidence in our ability to predict whether postponement would increase or decrease the cost of the sort; therefore this patch doesn't attempt to make decisions conditionally on that. Between these factors and a general desire not to change query behavior when there's not a demonstrable benefit, it seems best to be conservative about applying postponement. We might tweak the decision rules in the future, though. Konstantin Knizhnik, heavily rewritten by me
9118d03a · Tom Lane · b1fdc727 · 9118d03a · 9118d03a · 9118d03a
Commit 9118d03a authored Mar 11, 2016 by Tom Lane
4 changed files
--- a/doc/src/sgml/ref/select.sgml
+++ b/doc/src/sgml/ref/select.sgml
@@ -993,6 +993,36 @@ UNBOUNDED FOLLOWING
    cases it is not possible to specify new names with <literal>AS</>;
    the output column names will be the same as the table columns' names.
   </para>
+   <para>
+    According to the SQL standard, the expressions in the output list should
+    be computed before applying <literal>DISTINCT</literal>, <literal>ORDER
+    BY</literal>, or <literal>LIMIT</literal>.  This is obviously necessary
+    when using <literal>DISTINCT</literal>, since otherwise it's not clear
+    what values are being made distinct.  However, in many cases it is
+    convenient if output expressions are computed after <literal>ORDER
+    BY</literal> and <literal>LIMIT</literal>; particularly if the output list
+    contains any volatile or expensive functions.  With that behavior, the
+    order of function evaluations is more intuitive and there will not be
+    evaluations corresponding to rows that never appear in the output.
+    <productname>PostgreSQL</> will effectively evaluate output expressions
+    after sorting and limiting, so long as those expressions are not
+    referenced in <literal>DISTINCT</literal>, <literal>ORDER BY</literal>
+    or <literal>GROUP BY</literal>.  (As a counterexample, <literal>SELECT
+    f(x) FROM tab ORDER BY 1</> clearly must evaluate <function>f(x)</>
+    before sorting.)  Output expressions that contain set-returning functions
+    are effectively evaluated after sorting and before limiting, so
+    that <literal>LIMIT</literal> will act to cut off the output from a
+    set-returning function.
+   </para>
+   <note>
+    <para>
+     <productname>PostgreSQL</> versions before 9.6 did not provide any
+     guarantees about the timing of evaluation of output expressions versus
+     sorting and limiting; it depended on the form of the chosen query plan.
+    </para>
+   </note>
  </refsect2>
  <refsect2 id="sql-distinct">

--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
--- a/src/test/regress/expected/limit.out
+++ b/src/test/regress/expected/limit.out
@@ -129,3 +129,132 @@ SELECT
 10
 (10 rows)
+--
+-- Test behavior of volatile and set-returning functions in conjunction
+-- with ORDER BY and LIMIT.
+--
+create temp sequence testseq;
+explain (verbose, costs off)
+select unique1, unique2, nextval('testseq')
+  from tenk1 order by unique2 limit 10;
+                           QUERY PLAN                           
+----------------------------------------------------------------
+ Limit
+   Output: unique1, unique2, (nextval('testseq'::regclass))
+   ->  Index Scan using tenk1_unique2 on public.tenk1
+         Output: unique1, unique2, nextval('testseq'::regclass)
+(4 rows)
+select unique1, unique2, nextval('testseq')
+  from tenk1 order by unique2 limit 10;
+ unique1 | unique2 | nextval 
+---------+---------+---------
+    8800 |       0 |       1
+    1891 |       1 |       2
+    3420 |       2 |       3
+    9850 |       3 |       4
+    7164 |       4 |       5
+    8009 |       5 |       6
+    5057 |       6 |       7
+    6701 |       7 |       8
+    4321 |       8 |       9
+    3043 |       9 |      10
+(10 rows)
+select currval('testseq');
+ currval 
+---------
+      10
+(1 row)
+explain (verbose, costs off)
+select unique1, unique2, nextval('testseq')
+  from tenk1 order by tenthous limit 10;
+                                QUERY PLAN                                
+--------------------------------------------------------------------------
+ Limit
+   Output: unique1, unique2, (nextval('testseq'::regclass)), tenthous
+   ->  Result
+         Output: unique1, unique2, nextval('testseq'::regclass), tenthous
+         ->  Sort
+               Output: unique1, unique2, tenthous
+               Sort Key: tenk1.tenthous
+               ->  Seq Scan on public.tenk1
+                     Output: unique1, unique2, tenthous
+(9 rows)
+select unique1, unique2, nextval('testseq')
+  from tenk1 order by tenthous limit 10;
+ unique1 | unique2 | nextval 
+---------+---------+---------
+       0 |    9998 |      11
+       1 |    2838 |      12
+       2 |    2716 |      13
+       3 |    5679 |      14
+       4 |    1621 |      15
+       5 |    5557 |      16
+       6 |    2855 |      17
+       7 |    8518 |      18
+       8 |    5435 |      19
+       9 |    4463 |      20
+(10 rows)
+select currval('testseq');
+ currval 
+---------
+      20
+(1 row)
+explain (verbose, costs off)
+select unique1, unique2, generate_series(1,10)
+  from tenk1 order by unique2 limit 7;
+                        QUERY PLAN                        
+----------------------------------------------------------
+ Limit
+   Output: unique1, unique2, (generate_series(1, 10))
+   ->  Index Scan using tenk1_unique2 on public.tenk1
+         Output: unique1, unique2, generate_series(1, 10)
+(4 rows)
+select unique1, unique2, generate_series(1,10)
+  from tenk1 order by unique2 limit 7;
+ unique1 | unique2 | generate_series 
+---------+---------+-----------------
+    8800 |       0 |               1
+    8800 |       0 |               2
+    8800 |       0 |               3
+    8800 |       0 |               4
+    8800 |       0 |               5
+    8800 |       0 |               6
+    8800 |       0 |               7
+(7 rows)
+explain (verbose, costs off)
+select unique1, unique2, generate_series(1,10)
+  from tenk1 order by tenthous limit 7;
+                             QUERY PLAN                             
+--------------------------------------------------------------------
+ Limit
+   Output: unique1, unique2, (generate_series(1, 10)), tenthous
+   ->  Result
+         Output: unique1, unique2, generate_series(1, 10), tenthous
+         ->  Sort
+               Output: unique1, unique2, tenthous
+               Sort Key: tenk1.tenthous
+               ->  Seq Scan on public.tenk1
+                     Output: unique1, unique2, tenthous
+(9 rows)
+select unique1, unique2, generate_series(1,10)
+  from tenk1 order by tenthous limit 7;
+ unique1 | unique2 | generate_series 
+---------+---------+-----------------
+       0 |    9998 |               1
+       0 |    9998 |               2
+       0 |    9998 |               3
+       0 |    9998 |               4
+       0 |    9998 |               5
+       0 |    9998 |               6
+       0 |    9998 |               7
+(7 rows)
--- a/src/test/regress/sql/limit.sql
+++ b/src/test/regress/sql/limit.sql
@@ -39,3 +39,42 @@ SELECT
          (SELECT n FROM generate_series(1,10) AS n
             ORDER BY n LIMIT 1 OFFSET s-1) AS y) AS z
  FROM generate_series(1,10) AS s;
+--
+-- Test behavior of volatile and set-returning functions in conjunction
+-- with ORDER BY and LIMIT.
+--
+create temp sequence testseq;
+explain (verbose, costs off)
+select unique1, unique2, nextval('testseq')
+  from tenk1 order by unique2 limit 10;
+select unique1, unique2, nextval('testseq')
+  from tenk1 order by unique2 limit 10;
+select currval('testseq');
+explain (verbose, costs off)
+select unique1, unique2, nextval('testseq')
+  from tenk1 order by tenthous limit 10;
+select unique1, unique2, nextval('testseq')
+  from tenk1 order by tenthous limit 10;
+select currval('testseq');
+explain (verbose, costs off)
+select unique1, unique2, generate_series(1,10)
+  from tenk1 order by unique2 limit 7;
+select unique1, unique2, generate_series(1,10)
+  from tenk1 order by unique2 limit 7;
+explain (verbose, costs off)
+select unique1, unique2, generate_series(1,10)
+  from tenk1 order by tenthous limit 7;
+select unique1, unique2, generate_series(1,10)
+  from tenk1 order by tenthous limit 7;