Commit 1c1a4726 authored by Bruce Momjian's avatar Bruce Momjian

docs: Improve window function docs

Specifically, the behavior of general-purpose and statistical aggregates
as window functions was not clearly documented, and terms were
inconsistently used.  Also add docs about the difference between
cume_dist and percent_rank, rather than just the formulas.

Discussion: 20170406214918.GA5757@momjian.us
parent a4777f35
...@@ -328,8 +328,8 @@ COMMIT; ...@@ -328,8 +328,8 @@ COMMIT;
A <firstterm>window function</> performs a calculation across a set of A <firstterm>window function</> performs a calculation across a set of
table rows that are somehow related to the current row. This is comparable table rows that are somehow related to the current row. This is comparable
to the type of calculation that can be done with an aggregate function. to the type of calculation that can be done with an aggregate function.
But unlike regular aggregate functions, use of a window function does not However, window functions do not cause rows to become grouped into a single
cause rows to become grouped into a single output row &mdash; the output row like non-window aggregate calls would. Instead, the
rows retain their separate identities. Behind the scenes, the window rows retain their separate identities. Behind the scenes, the window
function is able to access more than just the current row of the query function is able to access more than just the current row of the query
result. result.
...@@ -363,20 +363,19 @@ SELECT depname, empno, salary, avg(salary) OVER (PARTITION BY depname) FROM emps ...@@ -363,20 +363,19 @@ SELECT depname, empno, salary, avg(salary) OVER (PARTITION BY depname) FROM emps
<structname>empsalary</>, and there is one output row for each row in the <structname>empsalary</>, and there is one output row for each row in the
table. The fourth column represents an average taken across all the table table. The fourth column represents an average taken across all the table
rows that have the same <structfield>depname</> value as the current row. rows that have the same <structfield>depname</> value as the current row.
(This actually is the same function as the regular <function>avg</> (This actually is the same function as the non-window <function>avg</>
aggregate function, but the <literal>OVER</> clause causes it to be aggregate, but the <literal>OVER</> clause causes it to be
treated as a window function and computed across an appropriate set of treated as a window function and computed across the window frame.)
rows.)
</para> </para>
<para> <para>
A window function call always contains an <literal>OVER</> clause A window function call always contains an <literal>OVER</> clause
directly following the window function's name and argument(s). This is what directly following the window function's name and argument(s). This is what
syntactically distinguishes it from a regular function or aggregate syntactically distinguishes it from a normal function or non-window
function. The <literal>OVER</> clause determines exactly how the aggregate. The <literal>OVER</> clause determines exactly how the
rows of the query are split up for processing by the window function. rows of the query are split up for processing by the window function.
The <literal>PARTITION BY</> list within <literal>OVER</> specifies The <literal>PARTITION BY</> clause within <literal>OVER</>
dividing the rows into groups, or partitions, that share the same divides the rows into groups, or partitions, that share the same
values of the <literal>PARTITION BY</> expression(s). For each row, values of the <literal>PARTITION BY</> expression(s). For each row,
the window function is computed across the rows that fall into the the window function is computed across the rows that fall into the
same partition as the current row. same partition as the current row.
...@@ -411,8 +410,8 @@ FROM empsalary; ...@@ -411,8 +410,8 @@ FROM empsalary;
</screen> </screen>
As shown here, the <function>rank</> function produces a numerical rank As shown here, the <function>rank</> function produces a numerical rank
within the current row's partition for each distinct <literal>ORDER BY</> for each distinct <literal>ORDER BY</> value in the current row's
value, in the order defined by the <literal>ORDER BY</> clause. partition, using the order defined by the <literal>ORDER BY</> clause.
<function>rank</> needs no explicit parameter, because its behavior <function>rank</> needs no explicit parameter, because its behavior
is entirely determined by the <literal>OVER</> clause. is entirely determined by the <literal>OVER</> clause.
</para> </para>
...@@ -424,20 +423,20 @@ FROM empsalary; ...@@ -424,20 +423,20 @@ FROM empsalary;
if any. For example, a row removed because it does not meet the if any. For example, a row removed because it does not meet the
<literal>WHERE</> condition is not seen by any window function. <literal>WHERE</> condition is not seen by any window function.
A query can contain multiple window functions that slice up the data A query can contain multiple window functions that slice up the data
in different ways by means of different <literal>OVER</> clauses, but in different ways using different <literal>OVER</> clauses, but
they all act on the same collection of rows defined by this virtual table. they all act on the same collection of rows defined by this virtual table.
</para> </para>
<para> <para>
We already saw that <literal>ORDER BY</> can be omitted if the ordering We already saw that <literal>ORDER BY</> can be omitted if the ordering
of rows is not important. It is also possible to omit <literal>PARTITION of rows is not important. It is also possible to omit <literal>PARTITION
BY</>, in which case there is just one partition containing all the rows. BY</>, in which case there is a single partition containing all rows.
</para> </para>
<para> <para>
There is another important concept associated with window functions: There is another important concept associated with window functions:
for each row, there is a set of rows within its partition called its for each row, there is a set of rows within its partition called its
<firstterm>window frame</>. Many (but not all) window functions act only <firstterm>window frame</>. Some window functions act only
on the rows of the window frame, rather than of the whole partition. on the rows of the window frame, rather than of the whole partition.
By default, if <literal>ORDER BY</> is supplied then the frame consists of By default, if <literal>ORDER BY</> is supplied then the frame consists of
all rows from the start of the partition up through the current row, plus all rows from the start of the partition up through the current row, plus
...@@ -515,7 +514,7 @@ SELECT salary, sum(salary) OVER (ORDER BY salary) FROM empsalary; ...@@ -515,7 +514,7 @@ SELECT salary, sum(salary) OVER (ORDER BY salary) FROM empsalary;
elsewhere, such as in <literal>GROUP BY</>, <literal>HAVING</> elsewhere, such as in <literal>GROUP BY</>, <literal>HAVING</>
and <literal>WHERE</literal> clauses. This is because they logically and <literal>WHERE</literal> clauses. This is because they logically
execute after the processing of those clauses. Also, window functions execute after the processing of those clauses. Also, window functions
execute after regular aggregate functions. This means it is valid to execute after non-window aggregate functions. This means it is valid to
include an aggregate function call in the arguments of a window function, include an aggregate function call in the arguments of a window function,
but not vice versa. but not vice versa.
</para> </para>
......
...@@ -13166,13 +13166,14 @@ NULL baz</literallayout>(3 rows)</entry> ...@@ -13166,13 +13166,14 @@ NULL baz</literallayout>(3 rows)</entry>
<para> <para>
<firstterm>Aggregate functions</firstterm> compute a single result <firstterm>Aggregate functions</firstterm> compute a single result
from a set of input values. The built-in normal aggregate functions from a set of input values. The built-in general-purpose aggregate
are listed in functions are listed in <xref linkend="functions-aggregate-table">
<xref linkend="functions-aggregate-table"> and and statistical aggregates in <xref
<xref linkend="functions-aggregate-statistics-table">. linkend="functions-aggregate-statistics-table">.
The built-in ordered-set aggregate functions The built-in within-group ordered-set aggregate functions
are listed in <xref linkend="functions-orderedset-table"> and are listed in <xref linkend="functions-orderedset-table">
<xref linkend="functions-hypothetical-table">. Grouping operations, while the built-in within-group hypothetical-set ones are in <xref
linkend="functions-hypothetical-table">. Grouping operations,
which are closely related to aggregate functions, are listed in which are closely related to aggregate functions, are listed in
<xref linkend="functions-grouping-table">. <xref linkend="functions-grouping-table">.
The special syntax considerations for aggregate The special syntax considerations for aggregate
...@@ -14413,17 +14414,19 @@ SELECT xmlagg(x) FROM (SELECT x FROM test ORDER BY y DESC) AS tab; ...@@ -14413,17 +14414,19 @@ SELECT xmlagg(x) FROM (SELECT x FROM test ORDER BY y DESC) AS tab;
<para> <para>
The built-in window functions are listed in The built-in window functions are listed in
<xref linkend="functions-window-table">. Note that these functions <xref linkend="functions-window-table">. Note that these functions
<emphasis>must</> be invoked using window function syntax; that is an <emphasis>must</> be invoked using window function syntax, i.e., an
<literal>OVER</> clause is required. <literal>OVER</> clause is required.
</para> </para>
<para> <para>
In addition to these functions, any built-in or user-defined normal In addition to these functions, any built-in or user-defined
aggregate function (but not ordered-set or hypothetical-set aggregates) general-purpose or statistical
aggregate (i.e., not ordered-set or hypothetical-set aggregates)
can be used as a window function; see can be used as a window function; see
<xref linkend="functions-aggregate"> for a list of the built-in aggregates. <xref linkend="functions-aggregate"> for a list of the built-in aggregates.
Aggregate functions act as window functions only when an <literal>OVER</> Aggregate functions act as window functions only when an <literal>OVER</>
clause follows the call; otherwise they act as regular aggregates. clause follows the call; otherwise they act as non-window aggregates
and return a single row for the entire set.
</para> </para>
<table id="functions-window-table"> <table id="functions-window-table">
...@@ -14488,7 +14491,7 @@ SELECT xmlagg(x) FROM (SELECT x FROM test ORDER BY y DESC) AS tab; ...@@ -14488,7 +14491,7 @@ SELECT xmlagg(x) FROM (SELECT x FROM test ORDER BY y DESC) AS tab;
<entry> <entry>
<type>double precision</type> <type>double precision</type>
</entry> </entry>
<entry>relative rank of the current row: (<function>rank</> - 1) / (total rows - 1)</entry> <entry>relative rank of the current row: (<function>rank</> - 1) / (total partition rows - 1)</entry>
</row> </row>
<row> <row>
...@@ -14501,7 +14504,7 @@ SELECT xmlagg(x) FROM (SELECT x FROM test ORDER BY y DESC) AS tab; ...@@ -14501,7 +14504,7 @@ SELECT xmlagg(x) FROM (SELECT x FROM test ORDER BY y DESC) AS tab;
<entry> <entry>
<type>double precision</type> <type>double precision</type>
</entry> </entry>
<entry>relative rank of the current row: (number of rows preceding or peer with current row) / (total rows)</entry> <entry>cumulative distribution: (number of partition rows preceding or peer with current row) / total partition rows</entry>
</row> </row>
<row> <row>
...@@ -14634,9 +14637,10 @@ SELECT xmlagg(x) FROM (SELECT x FROM test ORDER BY y DESC) AS tab; ...@@ -14634,9 +14637,10 @@ SELECT xmlagg(x) FROM (SELECT x FROM test ORDER BY y DESC) AS tab;
All of the functions listed in All of the functions listed in
<xref linkend="functions-window-table"> depend on the sort ordering <xref linkend="functions-window-table"> depend on the sort ordering
specified by the <literal>ORDER BY</> clause of the associated window specified by the <literal>ORDER BY</> clause of the associated window
definition. Rows that are not distinct in the <literal>ORDER BY</> definition. Rows that are not distinct when considering only the
ordering are said to be <firstterm>peers</>; the four ranking functions <literal>ORDER BY</> columns are said to be <firstterm>peers</>.
are defined so that they give the same answer for any two peer rows. The four ranking functions (including <function>cume_dist</>) are
defined so that they give the same answer for all peer rows.
</para> </para>
<para> <para>
...@@ -14679,6 +14683,14 @@ SELECT xmlagg(x) FROM (SELECT x FROM test ORDER BY y DESC) AS tab; ...@@ -14679,6 +14683,14 @@ SELECT xmlagg(x) FROM (SELECT x FROM test ORDER BY y DESC) AS tab;
</para> </para>
</note> </note>
<para>
<function>cume_dist</> computes the fraction of partition rows that
are less than or equal to the current row and its peers, while
<function>percent_rank</> computes the fraction of partition rows that
are less than the current row, assuming the current row does not exist
in the partition.
</para>
</sect1> </sect1>
<sect1 id="functions-subquery"> <sect1 id="functions-subquery">
......
...@@ -1664,7 +1664,8 @@ SELECT string_agg(a ORDER BY a, ',') FROM table; -- incorrect ...@@ -1664,7 +1664,8 @@ SELECT string_agg(a ORDER BY a, ',') FROM table; -- incorrect
<para> <para>
Placing <literal>ORDER BY</> within the aggregate's regular argument Placing <literal>ORDER BY</> within the aggregate's regular argument
list, as described so far, is used when ordering the input rows for list, as described so far, is used when ordering the input rows for
a <quote>normal</> aggregate for which ordering is optional. There is a general-purpose and statistical aggregates, for which ordering is
optional. There is a
subclass of aggregate functions called <firstterm>ordered-set subclass of aggregate functions called <firstterm>ordered-set
aggregates</> for which an <replaceable>order_by_clause</replaceable> aggregates</> for which an <replaceable>order_by_clause</replaceable>
is <emphasis>required</>, usually because the aggregate's computation is is <emphasis>required</>, usually because the aggregate's computation is
...@@ -1675,7 +1676,7 @@ SELECT string_agg(a ORDER BY a, ',') FROM table; -- incorrect ...@@ -1675,7 +1676,7 @@ SELECT string_agg(a ORDER BY a, ',') FROM table; -- incorrect
inside <literal>WITHIN GROUP (...)</>, as shown in the final syntax inside <literal>WITHIN GROUP (...)</>, as shown in the final syntax
alternative above. The expressions in alternative above. The expressions in
the <replaceable>order_by_clause</replaceable> are evaluated once per the <replaceable>order_by_clause</replaceable> are evaluated once per
input row just like normal aggregate arguments, sorted as per input row just like regular aggregate arguments, sorted as per
the <replaceable>order_by_clause</replaceable>'s requirements, and fed the <replaceable>order_by_clause</replaceable>'s requirements, and fed
to the aggregate function as input arguments. (This is unlike the case to the aggregate function as input arguments. (This is unlike the case
for a non-<literal>WITHIN GROUP</> <replaceable>order_by_clause</>, for a non-<literal>WITHIN GROUP</> <replaceable>order_by_clause</>,
...@@ -1683,7 +1684,7 @@ SELECT string_agg(a ORDER BY a, ',') FROM table; -- incorrect ...@@ -1683,7 +1684,7 @@ SELECT string_agg(a ORDER BY a, ',') FROM table; -- incorrect
argument expressions preceding <literal>WITHIN GROUP</>, if any, are argument expressions preceding <literal>WITHIN GROUP</>, if any, are
called <firstterm>direct arguments</> to distinguish them from called <firstterm>direct arguments</> to distinguish them from
the <firstterm>aggregated arguments</> listed in the <firstterm>aggregated arguments</> listed in
the <replaceable>order_by_clause</replaceable>. Unlike normal aggregate the <replaceable>order_by_clause</replaceable>. Unlike regular aggregate
arguments, direct arguments are evaluated only once per aggregate call, arguments, direct arguments are evaluated only once per aggregate call,
not once per input row. This means that they can contain variables only not once per input row. This means that they can contain variables only
if those variables are grouped by <literal>GROUP BY</>; this restriction if those variables are grouped by <literal>GROUP BY</>; this restriction
...@@ -1779,10 +1780,10 @@ FROM generate_series(1,10) AS s(i); ...@@ -1779,10 +1780,10 @@ FROM generate_series(1,10) AS s(i);
<para> <para>
A <firstterm>window function call</firstterm> represents the application A <firstterm>window function call</firstterm> represents the application
of an aggregate-like function over some portion of the rows selected of an aggregate-like function over some portion of the rows selected
by a query. Unlike regular aggregate function calls, this is not tied by a query. Unlike non-window aggregate calls, this is not tied
to grouping of the selected rows into a single output row &mdash; each to grouping of the selected rows into a single output row &mdash; each
row remains separate in the query output. However the window function row remains separate in the query output. However the window function
is able to scan all the rows that would be part of the current row's has access to all the rows that would be part of the current row's
group according to the grouping specification (<literal>PARTITION BY</> group according to the grouping specification (<literal>PARTITION BY</>
list) of the window function call. list) of the window function call.
The syntax of a window function call is one of the following: The syntax of a window function call is one of the following:
...@@ -1831,20 +1832,20 @@ UNBOUNDED FOLLOWING ...@@ -1831,20 +1832,20 @@ UNBOUNDED FOLLOWING
named window in the <literal>WINDOW</literal> clause; see the named window in the <literal>WINDOW</literal> clause; see the
<xref linkend="sql-select"> reference page for details. It's worth <xref linkend="sql-select"> reference page for details. It's worth
pointing out that <literal>OVER wname</> is not exactly equivalent to pointing out that <literal>OVER wname</> is not exactly equivalent to
<literal>OVER (wname)</>; the latter implies copying and modifying the <literal>OVER (wname ...)</>; the latter implies copying and modifying the
window definition, and will be rejected if the referenced window window definition, and will be rejected if the referenced window
specification includes a frame clause. specification includes a frame clause.
</para> </para>
<para> <para>
The <literal>PARTITION BY</> option groups the rows of the query into The <literal>PARTITION BY</> clause groups the rows of the query into
<firstterm>partitions</>, which are processed separately by the window <firstterm>partitions</>, which are processed separately by the window
function. <literal>PARTITION BY</> works similarly to a query-level function. <literal>PARTITION BY</> works similarly to a query-level
<literal>GROUP BY</> clause, except that its expressions are always just <literal>GROUP BY</> clause, except that its expressions are always just
expressions and cannot be output-column names or numbers. expressions and cannot be output-column names or numbers.
Without <literal>PARTITION BY</>, all rows produced by the query are Without <literal>PARTITION BY</>, all rows produced by the query are
treated as a single partition. treated as a single partition.
The <literal>ORDER BY</> option determines the order in which the rows The <literal>ORDER BY</> clause determines the order in which the rows
of a partition are processed by the window function. It works similarly of a partition are processed by the window function. It works similarly
to a query-level <literal>ORDER BY</> clause, but likewise cannot use to a query-level <literal>ORDER BY</> clause, but likewise cannot use
output-column names or numbers. Without <literal>ORDER BY</>, rows are output-column names or numbers. Without <literal>ORDER BY</>, rows are
...@@ -1921,17 +1922,17 @@ UNBOUNDED FOLLOWING ...@@ -1921,17 +1922,17 @@ UNBOUNDED FOLLOWING
<para> <para>
The built-in window functions are described in <xref The built-in window functions are described in <xref
linkend="functions-window-table">. Other window functions can be added by linkend="functions-window-table">. Other window functions can be added by
the user. Also, any built-in or user-defined normal aggregate function the user. Also, any built-in or user-defined general-purpose or
can be used as a window function. Ordered-set aggregates presently statistical aggregate can be used as a window function. (Ordered-set
cannot be used as window functions, however. and hypothetical-set aggregates cannot presently be used as window functions.)
</para> </para>
<para> <para>
The syntaxes using <literal>*</> are used for calling parameter-less The syntaxes using <literal>*</> are used for calling parameter-less
aggregate functions as window functions, for example aggregate functions as window functions, for example
<literal>count(*) OVER (PARTITION BY x ORDER BY y)</>. <literal>count(*) OVER (PARTITION BY x ORDER BY y)</>.
The asterisk (<literal>*</>) is customarily not used for non-aggregate window functions. The asterisk (<literal>*</>) is customarily not used for
Aggregate window functions, unlike normal aggregate functions, do not window-specific functions. Window-specific functions do not
allow <literal>DISTINCT</> or <literal>ORDER BY</> to be used within the allow <literal>DISTINCT</> or <literal>ORDER BY</> to be used within the
function argument list. function argument list.
</para> </para>
......
Markdown is supported
0% or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment