Commit ff6fe150 authored by Tom Lane's avatar Tom Lane

Update documentation to reflect availability of aggregate(DISTINCT).

Try to provide a more lucid discussion in 'Using Aggregate Functions'
tutorial section.
parent 662371cc
......@@ -361,39 +361,90 @@ DELETE FROM classname;
Like most other query languages,
<ProductName>PostgreSQL</ProductName> supports
aggregate functions.
The current implementation of
<ProductName>Postgres</ProductName> aggregate functions have some limitations.
Specifically, while there are aggregates to compute
such functions as the <Function>count</Function>, <Function>sum</Function>,
An aggregate function computes a single result from multiple input rows.
For example, there are aggregates to compute the
<Function>count</Function>, <Function>sum</Function>,
<Function>avg</Function> (average), <Function>max</Function> (maximum) and
<Function>min</Function> (minimum) over a set of instances, aggregates can only
appear in the target list of a query and not directly in the
qualification (the where clause). As an example,
<Function>min</Function> (minimum) over a set of instances.
</para>
<Para>
It is important to understand the interaction between aggregates and
SQL's <Command>where</Command> and <Command>having</Command> clauses.
The fundamental difference between <Command>where</Command> and
<Command>having</Command> is this: <Command>where</Command> selects
input rows before groups and aggregates are computed (thus, it controls
which rows go into the aggregate computation), whereas
<Command>having</Command> selects group rows after groups and
aggregates are computed. Thus, the
<Command>where</Command> clause may not contain aggregate functions;
it makes no sense to try to use an aggregate to determine which rows
will be inputs to the aggregates. On the other hand,
<Command>having</Command> clauses always contain aggregate functions.
(Strictly speaking, you are allowed to write a <Command>having</Command>
clause that doesn't use aggregates, but it's wasteful; the same condition
could be used more efficiently at the <Command>where</Command> stage.)
</para>
<Para>
As an example, we can find the highest low-temperature reading anywhere
with
<ProgramListing>
SELECT max(temp_lo) FROM weather;
</ProgramListing>
is allowed, while
If we want to know which city (or cities) that reading occurred in,
we might try
<ProgramListing>
SELECT city FROM weather WHERE temp_lo = max(temp_lo);
</ProgramListing>
is not. However, as is often the case the query can be restated to accomplish
the intended result; here by using a <FirstTerm>subselect</FirstTerm>:
but this will not work since the aggregate max() can't be used in
<Command>where</Command>. However, as is often the case the query can be
restated to accomplish the intended result; here by using a
<FirstTerm>subselect</FirstTerm>:
<ProgramListing>
SELECT city FROM weather WHERE temp_lo = (SELECT max(temp_lo) FROM weather);
</ProgramListing>
This is OK because the sub-select is an independent computation that
computes its own aggregate separately from what's happening in the outer
select.
</Para>
<Para>
Aggregates may also have <FirstTerm>group by</FirstTerm> clauses:
Aggregates are also very useful in combination with
<FirstTerm>group by</FirstTerm> clauses. For example, we can get the
maximum low temperature observed in each city with
<ProgramListing>
SELECT city, max(temp_lo)
FROM weather
GROUP BY city;
</ProgramListing>
which gives us one output row per city. We can filter these grouped
rows using <Command>having</Command>:
<ProgramListing>
SELECT city, max(temp_lo)
FROM weather
GROUP BY city
HAVING min(temp_lo) < 0;
</ProgramListing>
which gives us the same results for only the cities that have some
below-zero readings. Finally, if we only care about cities whose
names begin with 'P', we might do
<ProgramListing>
SELECT city, max(temp_lo)
FROM weather
WHERE city like 'P%'
GROUP BY city
HAVING min(temp_lo) < 0;
</ProgramListing>
Note that we can apply the city-name restriction in
<Command>where</Command>, since it needs no aggregate. This is
more efficient than adding the restriction to <Command>having</Command>,
because we avoid doing the grouping and aggregate calculations
for all rows that fail the <Command>where</Command> check.
</Para>
</sect1>
</Chapter>
......
<!--
$Header: /cvsroot/pgsql/doc/src/sgml/ref/select.sgml,v 1.22 1999/08/06 13:50:31 thomas Exp $
$Header: /cvsroot/pgsql/doc/src/sgml/ref/select.sgml,v 1.23 1999/12/13 17:39:38 tgl Exp $
Postgres documentation
-->
......@@ -202,10 +202,10 @@ SELECT [ ALL | DISTINCT [ ON <replaceable class="PARAMETER">column</replaceable>
<para>
<command>DISTINCT</command> will eliminate all duplicate rows from the
selection.
result.
<command>DISTINCT ON <replaceable class="PARAMETER">column</replaceable></command>
will eliminate all duplicates in the specified column; this is
equivalent to using
similar to using
<command>GROUP BY <replaceable class="PARAMETER">column</replaceable></command>.
<command>ALL</command> will return all candidate rows,
including duplicates.
......@@ -320,11 +320,13 @@ GROUP BY <replaceable class="PARAMETER">column</replaceable> [, ...]
<para>
GROUP BY will condense into a single row all rows that share the
same values for the
grouped columns; aggregates return values derived from all rows
that make up the group. The value returned for an ungrouped
and unaggregated column is dependent on the order in which rows
happen to be read from the database.
same values for the grouped columns. Aggregate functions, if any,
are computed across all rows making up each group, producing a
separate value for each group (whereas without GROUP BY, an
aggregate produces a single value computed across all the selected
rows). When GROUP BY is present, it is not valid to refer to
ungrouped columns except within aggregate functions, since there
would be more than one possible value to return for an ungrouped column.
</para>
</refsect2>
......@@ -354,7 +356,8 @@ HAVING <replaceable class="PARAMETER">cond_expr</replaceable>
<para>
Each column referenced in
<replaceable class="PARAMETER">cond_expr</replaceable> shall unambiguously
reference a grouping column.
reference a grouping column, unless the reference appears within an
aggregate function.
</para>
</refsect2>
......
......@@ -642,15 +642,16 @@ CAST '<replaceable>string</replaceable>' AS <replaceable>type</replaceable>
<member><replaceable>a_expr</replaceable> <replaceable>right_unary_operator</replaceable></member>
<member><replaceable>left_unary_operator</replaceable> <replaceable>a_expr</replaceable></member>
<member>parameter</member>
<member>functional expressions</member>
<member>aggregate expressions</member>
<member>functional expression</member>
<member>aggregate expression</member>
</simplelist>
</para>
<para>
We have already discussed constants and attributes. The two kinds of
operator expressions indicate respectively binary and left_unary
expressions. The following sections discuss the remaining options.
We have already discussed constants and attributes. The three kinds of
operator expressions indicate respectively binary (infix), right-unary
(suffix) and left-unary (prefix) operators. The following sections
discuss the remaining options.
</para>
<sect2>
......@@ -690,7 +691,7 @@ CREATE FUNCTION dept (name)
enclosed in parentheses:
<synopsis>
<replaceable>function</replaceable> (<replaceable>a_expr</replaceable> [, <replaceable>a_expr</replaceable> )
<replaceable>function</replaceable> (<replaceable>a_expr</replaceable> [, <replaceable>a_expr</replaceable> ... ] )
</synopsis>
</para>
......@@ -705,20 +706,40 @@ sqrt(emp.salary)
</sect2>
<sect2>
<title>Aggregate Expression</title>
<title>Aggregate Expressions</title>
<para>
An <firstterm>aggregate expression</firstterm>
represents a simple aggregate (i.e., one that computes a single value)
or an aggregate function (i.e., one that computes a set of values).
The syntax is the following:
An <firstterm>aggregate expression</firstterm> represents the application
of an aggregate function across the rows selected by a query.
An aggregate function reduces multiple inputs to a single output value,
such as the sum or average of the inputs.
The syntax of an aggregate expression is one of the following:
<synopsis>
<replaceable>aggregate_name</replaceable> (<replaceable>attribute</replaceable>)
</synopsis>
<simplelist>
<member><replaceable>aggregate_name</replaceable> (<replaceable>expression</replaceable>)</member>
<member><replaceable>aggregate_name</replaceable> (DISTINCT <replaceable>expression</replaceable>)</member>
<member><replaceable>aggregate_name</replaceable> ( * )</member>
</simplelist>
where <replaceable>aggregate_name</replaceable> is a previously defined
aggregate, and <replaceable>expression</replaceable> is any expression
that doesn't itself contain an aggregate expression.
</para>
where <replaceable>aggregate_name</replaceable>
must be a previously defined aggregate.
<para>
The first form of aggregate expression invokes the aggregate across all
input rows for which the given expression yields a non-null value.
The second form invokes the aggregate for all distinct non-null values
of the expression found in the input rows. The last form invokes the
aggregate once for each input row regardless of null or non-null values;
since no particular input value is specified, it is generally only useful
for the count() aggregate.
</para>
<para>
For example, count(*) yields the total number of input rows;
count(f1) yields the number of input rows in which f1 is non-null;
count(distinct f1) yields the number of distinct non-null values of f1.
</para>
</sect2>
......
Markdown is supported
0% or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment