Commit be45be9c authored by Tomas Vondra's avatar Tomas Vondra

Implement GROUP BY DISTINCT

With grouping sets, it's possible that some of the grouping sets are
duplicate.  This is especially common with CUBE and ROLLUP clauses. For
example GROUP BY CUBE (a,b), CUBE (b,c) is equivalent to

  GROUP BY GROUPING SETS (
    (a, b, c),
    (a, b, c),
    (a, b, c),
    (a, b),
    (a, b),
    (a, b),
    (a),
    (a),
    (a),
    (c, a),
    (c, a),
    (c, a),
    (c),
    (b, c),
    (b),
    ()
  )

Some of the grouping sets are calculated multiple times, which is mostly
unnecessary.  This commit implements a new GROUP BY DISTINCT feature, as
defined in the SQL standard, which eliminates the duplicate sets.

Author: Vik Fearing
Reviewed-by: Erik Rijkers, Georgios Kokolatos, Tomas Vondra
Discussion: https://postgr.es/m/bf3805a8-d7d1-ae61-fece-761b7ff41ecc@postgresfriends.org
parent cd91de0d
...@@ -1372,6 +1372,55 @@ GROUP BY GROUPING SETS ( ...@@ -1372,6 +1372,55 @@ GROUP BY GROUPING SETS (
</programlisting> </programlisting>
</para> </para>
<para>
<indexterm zone="queries-grouping-sets">
<primary>ALL</primary>
<secondary>GROUP BY ALL</secondary>
</indexterm>
<indexterm zone="queries-grouping-sets">
<primary>DISTINCT</primary>
<secondary>GROUP BY DISTINCT</secondary>
</indexterm>
When specifying multiple grouping items together, the final set of grouping
sets might contain duplicates. For example:
<programlisting>
GROUP BY ROLLUP (a, b), ROLLUP (a, c)
</programlisting>
is equivalent to
<programlisting>
GROUP BY GROUPING SETS (
(a, b, c),
(a, b),
(a, b),
(a, c),
(a),
(a),
(a, c),
(a),
()
)
</programlisting>
If these duplicates are undesirable, they can be removed using the
<literal>DISTINCT</literal> clause directly on the <literal>GROUP BY</literal>.
Therefore:
<programlisting>
GROUP BY <emphasis>DISTINCT</emphasis> ROLLUP (a, b), ROLLUP (a, c)
</programlisting>
is equivalent to
<programlisting>
GROUP BY GROUPING SETS (
(a, b, c),
(a, b),
(a, c),
(a),
()
)
</programlisting>
This is not the same as using <literal>SELECT DISTINCT</literal> because the output
rows may still contain duplicates. If any of the ungrouped columns contains NULL,
it will be indistinguishable from the NULL used when that same column is grouped.
</para>
<note> <note>
<para> <para>
The construct <literal>(a, b)</literal> is normally recognized in expressions as The construct <literal>(a, b)</literal> is normally recognized in expressions as
...@@ -1560,8 +1609,13 @@ SELECT a "from", b + c AS sum FROM ... ...@@ -1560,8 +1609,13 @@ SELECT a "from", b + c AS sum FROM ...
<sect2 id="queries-distinct"> <sect2 id="queries-distinct">
<title><literal>DISTINCT</literal></title> <title><literal>DISTINCT</literal></title>
<indexterm zone="queries-distinct">
<primary>ALL</primary>
<secondary>SELECT ALL</secondary>
</indexterm>
<indexterm zone="queries-distinct"> <indexterm zone="queries-distinct">
<primary>DISTINCT</primary> <primary>DISTINCT</primary>
<secondary>SELECT DISTINCT</secondary>
</indexterm> </indexterm>
<indexterm zone="queries-distinct"> <indexterm zone="queries-distinct">
......
...@@ -37,7 +37,7 @@ SELECT [ ALL | DISTINCT [ ON ( <replaceable class="parameter">expression</replac ...@@ -37,7 +37,7 @@ SELECT [ ALL | DISTINCT [ ON ( <replaceable class="parameter">expression</replac
[ * | <replaceable class="parameter">expression</replaceable> [ [ AS ] <replaceable class="parameter">output_name</replaceable> ] [, ...] ] [ * | <replaceable class="parameter">expression</replaceable> [ [ AS ] <replaceable class="parameter">output_name</replaceable> ] [, ...] ]
[ FROM <replaceable class="parameter">from_item</replaceable> [, ...] ] [ FROM <replaceable class="parameter">from_item</replaceable> [, ...] ]
[ WHERE <replaceable class="parameter">condition</replaceable> ] [ WHERE <replaceable class="parameter">condition</replaceable> ]
[ GROUP BY <replaceable class="parameter">grouping_element</replaceable> [, ...] ] [ GROUP BY [ ALL | DISTINCT ] <replaceable class="parameter">grouping_element</replaceable> [, ...] ]
[ HAVING <replaceable class="parameter">condition</replaceable> ] [ HAVING <replaceable class="parameter">condition</replaceable> ]
[ WINDOW <replaceable class="parameter">window_name</replaceable> AS ( <replaceable class="parameter">window_definition</replaceable> ) [, ...] ] [ WINDOW <replaceable class="parameter">window_name</replaceable> AS ( <replaceable class="parameter">window_definition</replaceable> ) [, ...] ]
[ { UNION | INTERSECT | EXCEPT } [ ALL | DISTINCT ] <replaceable class="parameter">select</replaceable> ] [ { UNION | INTERSECT | EXCEPT } [ ALL | DISTINCT ] <replaceable class="parameter">select</replaceable> ]
...@@ -778,7 +778,7 @@ WHERE <replaceable class="parameter">condition</replaceable> ...@@ -778,7 +778,7 @@ WHERE <replaceable class="parameter">condition</replaceable>
<para> <para>
The optional <literal>GROUP BY</literal> clause has the general form The optional <literal>GROUP BY</literal> clause has the general form
<synopsis> <synopsis>
GROUP BY <replaceable class="parameter">grouping_element</replaceable> [, ...] GROUP BY [ ALL | DISTINCT ] <replaceable class="parameter">grouping_element</replaceable> [, ...]
</synopsis> </synopsis>
</para> </para>
...@@ -802,7 +802,10 @@ GROUP BY <replaceable class="parameter">grouping_element</replaceable> [, ...] ...@@ -802,7 +802,10 @@ GROUP BY <replaceable class="parameter">grouping_element</replaceable> [, ...]
independent <replaceable>grouping sets</replaceable>. The effect of this is independent <replaceable>grouping sets</replaceable>. The effect of this is
equivalent to constructing a <literal>UNION ALL</literal> between equivalent to constructing a <literal>UNION ALL</literal> between
subqueries with the individual grouping sets as their subqueries with the individual grouping sets as their
<literal>GROUP BY</literal> clauses. For further details on the handling <literal>GROUP BY</literal> clauses. The optional <literal>DISTINCT</literal>
clause removes duplicate sets before processing; it does <emphasis>not</emphasis>
transform the <literal>UNION ALL</literal> into a <literal>UNION DISTINCT</literal>.
For further details on the handling
of grouping sets see <xref linkend="queries-grouping-sets"/>. of grouping sets see <xref linkend="queries-grouping-sets"/>.
</para> </para>
......
...@@ -482,7 +482,7 @@ T351 Bracketed SQL comments (/*...*/ comments) YES ...@@ -482,7 +482,7 @@ T351 Bracketed SQL comments (/*...*/ comments) YES
T431 Extended grouping capabilities YES T431 Extended grouping capabilities YES
T432 Nested and concatenated GROUPING SETS YES T432 Nested and concatenated GROUPING SETS YES
T433 Multiargument GROUPING function YES T433 Multiargument GROUPING function YES
T434 GROUP BY DISTINCT NO T434 GROUP BY DISTINCT YES
T441 ABS and MOD functions YES T441 ABS and MOD functions YES
T461 Symmetric BETWEEN predicate YES T461 Symmetric BETWEEN predicate YES
T471 Result sets return value NO T471 Result sets return value NO
......
...@@ -3135,6 +3135,7 @@ _copyQuery(const Query *from) ...@@ -3135,6 +3135,7 @@ _copyQuery(const Query *from)
COPY_NODE_FIELD(onConflict); COPY_NODE_FIELD(onConflict);
COPY_NODE_FIELD(returningList); COPY_NODE_FIELD(returningList);
COPY_NODE_FIELD(groupClause); COPY_NODE_FIELD(groupClause);
COPY_SCALAR_FIELD(groupDistinct);
COPY_NODE_FIELD(groupingSets); COPY_NODE_FIELD(groupingSets);
COPY_NODE_FIELD(havingQual); COPY_NODE_FIELD(havingQual);
COPY_NODE_FIELD(windowClause); COPY_NODE_FIELD(windowClause);
...@@ -3221,6 +3222,7 @@ _copySelectStmt(const SelectStmt *from) ...@@ -3221,6 +3222,7 @@ _copySelectStmt(const SelectStmt *from)
COPY_NODE_FIELD(fromClause); COPY_NODE_FIELD(fromClause);
COPY_NODE_FIELD(whereClause); COPY_NODE_FIELD(whereClause);
COPY_NODE_FIELD(groupClause); COPY_NODE_FIELD(groupClause);
COPY_SCALAR_FIELD(groupDistinct);
COPY_NODE_FIELD(havingClause); COPY_NODE_FIELD(havingClause);
COPY_NODE_FIELD(windowClause); COPY_NODE_FIELD(windowClause);
COPY_NODE_FIELD(valuesLists); COPY_NODE_FIELD(valuesLists);
......
...@@ -977,6 +977,7 @@ _equalQuery(const Query *a, const Query *b) ...@@ -977,6 +977,7 @@ _equalQuery(const Query *a, const Query *b)
COMPARE_NODE_FIELD(onConflict); COMPARE_NODE_FIELD(onConflict);
COMPARE_NODE_FIELD(returningList); COMPARE_NODE_FIELD(returningList);
COMPARE_NODE_FIELD(groupClause); COMPARE_NODE_FIELD(groupClause);
COMPARE_SCALAR_FIELD(groupDistinct);
COMPARE_NODE_FIELD(groupingSets); COMPARE_NODE_FIELD(groupingSets);
COMPARE_NODE_FIELD(havingQual); COMPARE_NODE_FIELD(havingQual);
COMPARE_NODE_FIELD(windowClause); COMPARE_NODE_FIELD(windowClause);
...@@ -1053,6 +1054,7 @@ _equalSelectStmt(const SelectStmt *a, const SelectStmt *b) ...@@ -1053,6 +1054,7 @@ _equalSelectStmt(const SelectStmt *a, const SelectStmt *b)
COMPARE_NODE_FIELD(fromClause); COMPARE_NODE_FIELD(fromClause);
COMPARE_NODE_FIELD(whereClause); COMPARE_NODE_FIELD(whereClause);
COMPARE_NODE_FIELD(groupClause); COMPARE_NODE_FIELD(groupClause);
COMPARE_SCALAR_FIELD(groupDistinct);
COMPARE_NODE_FIELD(havingClause); COMPARE_NODE_FIELD(havingClause);
COMPARE_NODE_FIELD(windowClause); COMPARE_NODE_FIELD(windowClause);
COMPARE_NODE_FIELD(valuesLists); COMPARE_NODE_FIELD(valuesLists);
......
...@@ -1506,6 +1506,22 @@ list_sort(List *list, list_sort_comparator cmp) ...@@ -1506,6 +1506,22 @@ list_sort(List *list, list_sort_comparator cmp)
qsort(list->elements, len, sizeof(ListCell), (qsort_comparator) cmp); qsort(list->elements, len, sizeof(ListCell), (qsort_comparator) cmp);
} }
/*
* list_sort comparator for sorting a list into ascending int order.
*/
int
list_int_cmp(const ListCell *p1, const ListCell *p2)
{
int v1 = lfirst_int(p1);
int v2 = lfirst_int(p2);
if (v1 < v2)
return -1;
if (v1 > v2)
return 1;
return 0;
}
/* /*
* list_sort comparator for sorting a list into ascending OID order. * list_sort comparator for sorting a list into ascending OID order.
*/ */
......
...@@ -2771,6 +2771,7 @@ _outSelectStmt(StringInfo str, const SelectStmt *node) ...@@ -2771,6 +2771,7 @@ _outSelectStmt(StringInfo str, const SelectStmt *node)
WRITE_NODE_FIELD(fromClause); WRITE_NODE_FIELD(fromClause);
WRITE_NODE_FIELD(whereClause); WRITE_NODE_FIELD(whereClause);
WRITE_NODE_FIELD(groupClause); WRITE_NODE_FIELD(groupClause);
WRITE_BOOL_FIELD(groupDistinct);
WRITE_NODE_FIELD(havingClause); WRITE_NODE_FIELD(havingClause);
WRITE_NODE_FIELD(windowClause); WRITE_NODE_FIELD(windowClause);
WRITE_NODE_FIELD(valuesLists); WRITE_NODE_FIELD(valuesLists);
...@@ -2996,6 +2997,7 @@ _outQuery(StringInfo str, const Query *node) ...@@ -2996,6 +2997,7 @@ _outQuery(StringInfo str, const Query *node)
WRITE_NODE_FIELD(onConflict); WRITE_NODE_FIELD(onConflict);
WRITE_NODE_FIELD(returningList); WRITE_NODE_FIELD(returningList);
WRITE_NODE_FIELD(groupClause); WRITE_NODE_FIELD(groupClause);
WRITE_BOOL_FIELD(groupDistinct);
WRITE_NODE_FIELD(groupingSets); WRITE_NODE_FIELD(groupingSets);
WRITE_NODE_FIELD(havingQual); WRITE_NODE_FIELD(havingQual);
WRITE_NODE_FIELD(windowClause); WRITE_NODE_FIELD(windowClause);
......
...@@ -271,6 +271,7 @@ _readQuery(void) ...@@ -271,6 +271,7 @@ _readQuery(void)
READ_NODE_FIELD(onConflict); READ_NODE_FIELD(onConflict);
READ_NODE_FIELD(returningList); READ_NODE_FIELD(returningList);
READ_NODE_FIELD(groupClause); READ_NODE_FIELD(groupClause);
READ_BOOL_FIELD(groupDistinct);
READ_NODE_FIELD(groupingSets); READ_NODE_FIELD(groupingSets);
READ_NODE_FIELD(havingQual); READ_NODE_FIELD(havingQual);
READ_NODE_FIELD(windowClause); READ_NODE_FIELD(windowClause);
......
...@@ -2442,7 +2442,7 @@ preprocess_grouping_sets(PlannerInfo *root) ...@@ -2442,7 +2442,7 @@ preprocess_grouping_sets(PlannerInfo *root)
ListCell *lc_set; ListCell *lc_set;
grouping_sets_data *gd = palloc0(sizeof(grouping_sets_data)); grouping_sets_data *gd = palloc0(sizeof(grouping_sets_data));
parse->groupingSets = expand_grouping_sets(parse->groupingSets, -1); parse->groupingSets = expand_grouping_sets(parse->groupingSets, parse->groupDistinct, -1);
gd->any_hashable = false; gd->any_hashable = false;
gd->unhashable_refs = NULL; gd->unhashable_refs = NULL;
......
...@@ -1264,6 +1264,7 @@ transformSelectStmt(ParseState *pstate, SelectStmt *stmt) ...@@ -1264,6 +1264,7 @@ transformSelectStmt(ParseState *pstate, SelectStmt *stmt)
qry->sortClause, qry->sortClause,
EXPR_KIND_GROUP_BY, EXPR_KIND_GROUP_BY,
false /* allow SQL92 rules */ ); false /* allow SQL92 rules */ );
qry->groupDistinct = stmt->groupDistinct;
if (stmt->distinctClause == NIL) if (stmt->distinctClause == NIL)
{ {
......
...@@ -134,6 +134,13 @@ typedef struct SelectLimit ...@@ -134,6 +134,13 @@ typedef struct SelectLimit
LimitOption limitOption; LimitOption limitOption;
} SelectLimit; } SelectLimit;
/* Private struct for the result of group_clause production */
typedef struct GroupClause
{
bool distinct;
List *list;
} GroupClause;
/* ConstraintAttributeSpec yields an integer bitmask of these flags: */ /* ConstraintAttributeSpec yields an integer bitmask of these flags: */
#define CAS_NOT_DEFERRABLE 0x01 #define CAS_NOT_DEFERRABLE 0x01
#define CAS_DEFERRABLE 0x02 #define CAS_DEFERRABLE 0x02
...@@ -250,6 +257,8 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query); ...@@ -250,6 +257,8 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
PartitionBoundSpec *partboundspec; PartitionBoundSpec *partboundspec;
RoleSpec *rolespec; RoleSpec *rolespec;
struct SelectLimit *selectlimit; struct SelectLimit *selectlimit;
SetQuantifier setquantifier;
struct GroupClause *groupclause;
} }
%type <node> stmt schema_stmt %type <node> stmt schema_stmt
...@@ -405,7 +414,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query); ...@@ -405,7 +414,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
target_list opt_target_list insert_column_list set_target_list target_list opt_target_list insert_column_list set_target_list
set_clause_list set_clause set_clause_list set_clause
def_list operator_def_list indirection opt_indirection def_list operator_def_list indirection opt_indirection
reloption_list group_clause TriggerFuncArgs opclass_item_list opclass_drop_list reloption_list TriggerFuncArgs opclass_item_list opclass_drop_list
opclass_purpose opt_opfamily transaction_mode_list_or_empty opclass_purpose opt_opfamily transaction_mode_list_or_empty
OptTableFuncElementList TableFuncElementList opt_type_modifiers OptTableFuncElementList TableFuncElementList opt_type_modifiers
prep_type_clause prep_type_clause
...@@ -418,6 +427,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query); ...@@ -418,6 +427,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
vacuum_relation_list opt_vacuum_relation_list vacuum_relation_list opt_vacuum_relation_list
drop_option_list drop_option_list
%type <groupclause> group_clause
%type <list> group_by_list %type <list> group_by_list
%type <node> group_by_item empty_grouping_set rollup_clause cube_clause %type <node> group_by_item empty_grouping_set rollup_clause cube_clause
%type <node> grouping_sets_clause %type <node> grouping_sets_clause
...@@ -443,7 +453,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query); ...@@ -443,7 +453,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
%type <node> for_locking_item %type <node> for_locking_item
%type <list> for_locking_clause opt_for_locking_clause for_locking_items %type <list> for_locking_clause opt_for_locking_clause for_locking_items
%type <list> locked_rels_list %type <list> locked_rels_list
%type <boolean> all_or_distinct %type <setquantifier> set_quantifier
%type <node> join_qual %type <node> join_qual
%type <jtype> join_type %type <jtype> join_type
...@@ -11294,7 +11304,8 @@ simple_select: ...@@ -11294,7 +11304,8 @@ simple_select:
n->intoClause = $4; n->intoClause = $4;
n->fromClause = $5; n->fromClause = $5;
n->whereClause = $6; n->whereClause = $6;
n->groupClause = $7; n->groupClause = ($7)->list;
n->groupDistinct = ($7)->distinct;
n->havingClause = $8; n->havingClause = $8;
n->windowClause = $9; n->windowClause = $9;
$$ = (Node *)n; $$ = (Node *)n;
...@@ -11309,7 +11320,8 @@ simple_select: ...@@ -11309,7 +11320,8 @@ simple_select:
n->intoClause = $4; n->intoClause = $4;
n->fromClause = $5; n->fromClause = $5;
n->whereClause = $6; n->whereClause = $6;
n->groupClause = $7; n->groupClause = ($7)->list;
n->groupDistinct = ($7)->distinct;
n->havingClause = $8; n->havingClause = $8;
n->windowClause = $9; n->windowClause = $9;
$$ = (Node *)n; $$ = (Node *)n;
...@@ -11334,17 +11346,17 @@ simple_select: ...@@ -11334,17 +11346,17 @@ simple_select:
n->fromClause = list_make1($2); n->fromClause = list_make1($2);
$$ = (Node *)n; $$ = (Node *)n;
} }
| select_clause UNION all_or_distinct select_clause | select_clause UNION set_quantifier select_clause
{ {
$$ = makeSetOp(SETOP_UNION, $3, $1, $4); $$ = makeSetOp(SETOP_UNION, $3 == SET_QUANTIFIER_ALL, $1, $4);
} }
| select_clause INTERSECT all_or_distinct select_clause | select_clause INTERSECT set_quantifier select_clause
{ {
$$ = makeSetOp(SETOP_INTERSECT, $3, $1, $4); $$ = makeSetOp(SETOP_INTERSECT, $3 == SET_QUANTIFIER_ALL, $1, $4);
} }
| select_clause EXCEPT all_or_distinct select_clause | select_clause EXCEPT set_quantifier select_clause
{ {
$$ = makeSetOp(SETOP_EXCEPT, $3, $1, $4); $$ = makeSetOp(SETOP_EXCEPT, $3 == SET_QUANTIFIER_ALL, $1, $4);
} }
; ;
...@@ -11542,10 +11554,10 @@ opt_table: TABLE ...@@ -11542,10 +11554,10 @@ opt_table: TABLE
| /*EMPTY*/ | /*EMPTY*/
; ;
all_or_distinct: set_quantifier:
ALL { $$ = true; } ALL { $$ = SET_QUANTIFIER_ALL; }
| DISTINCT { $$ = false; } | DISTINCT { $$ = SET_QUANTIFIER_DISTINCT; }
| /*EMPTY*/ { $$ = false; } | /*EMPTY*/ { $$ = SET_QUANTIFIER_DEFAULT; }
; ;
/* We use (NIL) as a placeholder to indicate that all target expressions /* We use (NIL) as a placeholder to indicate that all target expressions
...@@ -11771,8 +11783,20 @@ first_or_next: FIRST_P { $$ = 0; } ...@@ -11771,8 +11783,20 @@ first_or_next: FIRST_P { $$ = 0; }
* GroupingSet node of some type. * GroupingSet node of some type.
*/ */
group_clause: group_clause:
GROUP_P BY group_by_list { $$ = $3; } GROUP_P BY set_quantifier group_by_list
| /*EMPTY*/ { $$ = NIL; } {
GroupClause *n = (GroupClause *) palloc(sizeof(GroupClause));
n->distinct = $3 == SET_QUANTIFIER_DISTINCT;
n->list = $4;
$$ = n;
}
| /*EMPTY*/
{
GroupClause *n = (GroupClause *) palloc(sizeof(GroupClause));
n->distinct = false;
n->list = NIL;
$$ = n;
}
; ;
group_by_list: group_by_list:
...@@ -15145,7 +15169,8 @@ PLpgSQL_Expr: opt_distinct_clause opt_target_list ...@@ -15145,7 +15169,8 @@ PLpgSQL_Expr: opt_distinct_clause opt_target_list
n->targetList = $2; n->targetList = $2;
n->fromClause = $3; n->fromClause = $3;
n->whereClause = $4; n->whereClause = $4;
n->groupClause = $5; n->groupClause = ($5)->list;
n->groupDistinct = ($5)->distinct;
n->havingClause = $6; n->havingClause = $6;
n->windowClause = $7; n->windowClause = $7;
n->sortClause = $8; n->sortClause = $8;
......
...@@ -1071,7 +1071,7 @@ parseCheckAggregates(ParseState *pstate, Query *qry) ...@@ -1071,7 +1071,7 @@ parseCheckAggregates(ParseState *pstate, Query *qry)
* The limit of 4096 is arbitrary and exists simply to avoid resource * The limit of 4096 is arbitrary and exists simply to avoid resource
* issues from pathological constructs. * issues from pathological constructs.
*/ */
List *gsets = expand_grouping_sets(qry->groupingSets, 4096); List *gsets = expand_grouping_sets(qry->groupingSets, qry->groupDistinct, 4096);
if (!gsets) if (!gsets)
ereport(ERROR, ereport(ERROR,
...@@ -1735,6 +1735,33 @@ cmp_list_len_asc(const ListCell *a, const ListCell *b) ...@@ -1735,6 +1735,33 @@ cmp_list_len_asc(const ListCell *a, const ListCell *b)
return (la > lb) ? 1 : (la == lb) ? 0 : -1; return (la > lb) ? 1 : (la == lb) ? 0 : -1;
} }
/* list_sort comparator to sort sub-lists by length and contents */
static int
cmp_list_len_contents_asc(const ListCell *a, const ListCell *b)
{
int res = cmp_list_len_asc(a, b);
if (res == 0)
{
List *la = (List *) lfirst(a);
List *lb = (List *) lfirst(b);
ListCell *lca;
ListCell *lcb;
forboth(lca, la, lcb, lb)
{
int va = intVal(lca);
int vb = intVal(lcb);
if (va > vb)
return 1;
if (va < vb)
return -1;
}
}
return res;
}
/* /*
* Expand a groupingSets clause to a flat list of grouping sets. * Expand a groupingSets clause to a flat list of grouping sets.
* The returned list is sorted by length, shortest sets first. * The returned list is sorted by length, shortest sets first.
...@@ -1743,7 +1770,7 @@ cmp_list_len_asc(const ListCell *a, const ListCell *b) ...@@ -1743,7 +1770,7 @@ cmp_list_len_asc(const ListCell *a, const ListCell *b)
* some consistency checks. * some consistency checks.
*/ */
List * List *
expand_grouping_sets(List *groupingSets, int limit) expand_grouping_sets(List *groupingSets, bool groupDistinct, int limit)
{ {
List *expanded_groups = NIL; List *expanded_groups = NIL;
List *result = NIL; List *result = NIL;
...@@ -1801,8 +1828,31 @@ expand_grouping_sets(List *groupingSets, int limit) ...@@ -1801,8 +1828,31 @@ expand_grouping_sets(List *groupingSets, int limit)
result = new_result; result = new_result;
} }
/* Now sort the lists by length */ /* Now sort the lists by length and deduplicate if necessary */
if (!groupDistinct || list_length(result) < 2)
list_sort(result, cmp_list_len_asc); list_sort(result, cmp_list_len_asc);
else
{
ListCell *cell;
List *prev;
/* Sort each groupset individually */
foreach(cell, result)
list_sort(lfirst(cell), list_int_cmp);
/* Now sort the list of groupsets by length and contents */
list_sort(result, cmp_list_len_contents_asc);
/* Finally, remove duplicates */
prev = list_nth_node(List, result, 0);
for_each_from(cell, result, 1)
{
if (equal(lfirst(cell), prev))
foreach_delete_current(result, cell);
else
prev = lfirst(cell);
}
}
return result; return result;
} }
......
...@@ -5517,6 +5517,8 @@ get_basic_select_query(Query *query, deparse_context *context, ...@@ -5517,6 +5517,8 @@ get_basic_select_query(Query *query, deparse_context *context,
appendContextKeyword(context, " GROUP BY ", appendContextKeyword(context, " GROUP BY ",
-PRETTYINDENT_STD, PRETTYINDENT_STD, 1); -PRETTYINDENT_STD, PRETTYINDENT_STD, 1);
if (query->groupDistinct)
appendStringInfoString(buf, "DISTINCT ");
save_exprkind = context->special_exprkind; save_exprkind = context->special_exprkind;
context->special_exprkind = EXPR_KIND_GROUP_BY; context->special_exprkind = EXPR_KIND_GROUP_BY;
......
...@@ -62,6 +62,14 @@ typedef enum SortByNulls ...@@ -62,6 +62,14 @@ typedef enum SortByNulls
SORTBY_NULLS_LAST SORTBY_NULLS_LAST
} SortByNulls; } SortByNulls;
/* Options for [ ALL | DISTINCT ] */
typedef enum SetQuantifier
{
SET_QUANTIFIER_DEFAULT,
SET_QUANTIFIER_ALL,
SET_QUANTIFIER_DISTINCT
} SetQuantifier;
/* /*
* Grantable rights are encoded so that we can OR them together in a bitmask. * Grantable rights are encoded so that we can OR them together in a bitmask.
* The present representation of AclItem limits us to 16 distinct rights, * The present representation of AclItem limits us to 16 distinct rights,
...@@ -146,6 +154,7 @@ typedef struct Query ...@@ -146,6 +154,7 @@ typedef struct Query
List *returningList; /* return-values list (of TargetEntry) */ List *returningList; /* return-values list (of TargetEntry) */
List *groupClause; /* a list of SortGroupClause's */ List *groupClause; /* a list of SortGroupClause's */
bool groupDistinct; /* is the group by clause distinct? */
List *groupingSets; /* a list of GroupingSet's if present */ List *groupingSets; /* a list of GroupingSet's if present */
...@@ -1629,6 +1638,7 @@ typedef struct SelectStmt ...@@ -1629,6 +1638,7 @@ typedef struct SelectStmt
List *fromClause; /* the FROM clause */ List *fromClause; /* the FROM clause */
Node *whereClause; /* WHERE qualification */ Node *whereClause; /* WHERE qualification */
List *groupClause; /* GROUP BY clauses */ List *groupClause; /* GROUP BY clauses */
bool groupDistinct; /* Is this GROUP BY DISTINCT? */
Node *havingClause; /* HAVING conditional-expression */ Node *havingClause; /* HAVING conditional-expression */
List *windowClause; /* WINDOW window_name AS (...), ... */ List *windowClause; /* WINDOW window_name AS (...), ... */
......
...@@ -604,6 +604,7 @@ extern pg_nodiscard List *list_copy_deep(const List *oldlist); ...@@ -604,6 +604,7 @@ extern pg_nodiscard List *list_copy_deep(const List *oldlist);
typedef int (*list_sort_comparator) (const ListCell *a, const ListCell *b); typedef int (*list_sort_comparator) (const ListCell *a, const ListCell *b);
extern void list_sort(List *list, list_sort_comparator cmp); extern void list_sort(List *list, list_sort_comparator cmp);
extern int list_int_cmp(const ListCell *p1, const ListCell *p2);
extern int list_oid_cmp(const ListCell *p1, const ListCell *p2); extern int list_oid_cmp(const ListCell *p1, const ListCell *p2);
#endif /* PG_LIST_H */ #endif /* PG_LIST_H */
...@@ -26,7 +26,7 @@ extern void transformWindowFuncCall(ParseState *pstate, WindowFunc *wfunc, ...@@ -26,7 +26,7 @@ extern void transformWindowFuncCall(ParseState *pstate, WindowFunc *wfunc,
extern void parseCheckAggregates(ParseState *pstate, Query *qry); extern void parseCheckAggregates(ParseState *pstate, Query *qry);
extern List *expand_grouping_sets(List *groupingSets, int limit); extern List *expand_grouping_sets(List *groupingSets, bool groupDistinct, int limit);
extern int get_aggregate_argtypes(Aggref *aggref, Oid *inputTypes); extern int get_aggregate_argtypes(Aggref *aggref, Oid *inputTypes);
......
...@@ -1929,4 +1929,115 @@ set work_mem to default; ...@@ -1929,4 +1929,115 @@ set work_mem to default;
drop table gs_group_1; drop table gs_group_1;
drop table gs_hash_1; drop table gs_hash_1;
-- GROUP BY DISTINCT
-- "normal" behavior...
select a, b, c
from (values (1, 2, 3), (4, null, 6), (7, 8, 9)) as t (a, b, c)
group by all rollup(a, b), rollup(a, c)
order by a, b, c;
a | b | c
---+---+---
1 | 2 | 3
1 | 2 |
1 | 2 |
1 | | 3
1 | | 3
1 | |
1 | |
1 | |
4 | | 6
4 | | 6
4 | | 6
4 | |
4 | |
4 | |
4 | |
4 | |
7 | 8 | 9
7 | 8 |
7 | 8 |
7 | | 9
7 | | 9
7 | |
7 | |
7 | |
| |
(25 rows)
-- ...which is also the default
select a, b, c
from (values (1, 2, 3), (4, null, 6), (7, 8, 9)) as t (a, b, c)
group by rollup(a, b), rollup(a, c)
order by a, b, c;
a | b | c
---+---+---
1 | 2 | 3
1 | 2 |
1 | 2 |
1 | | 3
1 | | 3
1 | |
1 | |
1 | |
4 | | 6
4 | | 6
4 | | 6
4 | |
4 | |
4 | |
4 | |
4 | |
7 | 8 | 9
7 | 8 |
7 | 8 |
7 | | 9
7 | | 9
7 | |
7 | |
7 | |
| |
(25 rows)
-- "group by distinct" behavior...
select a, b, c
from (values (1, 2, 3), (4, null, 6), (7, 8, 9)) as t (a, b, c)
group by distinct rollup(a, b), rollup(a, c)
order by a, b, c;
a | b | c
---+---+---
1 | 2 | 3
1 | 2 |
1 | | 3
1 | |
4 | | 6
4 | | 6
4 | |
4 | |
7 | 8 | 9
7 | 8 |
7 | | 9
7 | |
| |
(13 rows)
-- ...which is not the same as "select distinct"
select distinct a, b, c
from (values (1, 2, 3), (4, null, 6), (7, 8, 9)) as t (a, b, c)
group by rollup(a, b), rollup(a, c)
order by a, b, c;
a | b | c
---+---+---
1 | 2 | 3
1 | 2 |
1 | | 3
1 | |
4 | | 6
4 | |
7 | 8 | 9
7 | 8 |
7 | | 9
7 | |
| |
(11 rows)
-- end -- end
...@@ -529,4 +529,30 @@ set work_mem to default; ...@@ -529,4 +529,30 @@ set work_mem to default;
drop table gs_group_1; drop table gs_group_1;
drop table gs_hash_1; drop table gs_hash_1;
-- GROUP BY DISTINCT
-- "normal" behavior...
select a, b, c
from (values (1, 2, 3), (4, null, 6), (7, 8, 9)) as t (a, b, c)
group by all rollup(a, b), rollup(a, c)
order by a, b, c;
-- ...which is also the default
select a, b, c
from (values (1, 2, 3), (4, null, 6), (7, 8, 9)) as t (a, b, c)
group by rollup(a, b), rollup(a, c)
order by a, b, c;
-- "group by distinct" behavior...
select a, b, c
from (values (1, 2, 3), (4, null, 6), (7, 8, 9)) as t (a, b, c)
group by distinct rollup(a, b), rollup(a, c)
order by a, b, c;
-- ...which is not the same as "select distinct"
select distinct a, b, c
from (values (1, 2, 3), (4, null, 6), (7, 8, 9)) as t (a, b, c)
group by rollup(a, b), rollup(a, c)
order by a, b, c;
-- end -- end
Markdown is supported
0% or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment