Tweak GROUP BY so that it will still accept result-column names, but only

after trying to resolve the item as an input-column name. This allows us to be compliant with the SQL92 spec for queries that fall within the spec, while still accepting the same out-of-spec queries as 6.5 did. You'll only lose if there is an output column name that is the same as an input column name, but doesn't refer to the same value. 7.0 will interpret such a GROUP BY spec differently than 6.5 did. No way around that, because 6.5 was clearly not spec compliant.

Tweak GROUP BY so that it will still accept result-column names, but only
after trying to resolve the item as an input-column name. This allows us to be compliant with the SQL92 spec for queries that fall within the spec, while still accepting the same out-of-spec queries as 6.5 did. You'll only lose if there is an output column name that is the same as an input column name, but doesn't refer to the same value. 7.0 will interpret such a GROUP BY spec differently than 6.5 did. No way around that, because 6.5 was clearly not spec compliant.
1763a7c1 · Tom Lane · 8e079051 · 1763a7c1 · 1763a7c1 · 1763a7c1
Commit 1763a7c1 authored Mar 15, 2000 by Tom Lane
4 changed files
--- a/doc/src/sgml/ref/select.sgml
+++ b/doc/src/sgml/ref/select.sgml
--- a/src/backend/parser/parse_clause.c
+++ b/src/backend/parser/parse_clause.c
@@ -8,7 +8,7 @@
 *
 *
 * IDENTIFICATION
- *	  $Header: /cvsroot/pgsql/src/backend/parser/parse_clause.c,v 1.56 2000/03/14 23:06:32 thomas Exp $
+ *	  $Header: /cvsroot/pgsql/src/backend/parser/parse_clause.c,v 1.57 2000/03/15 23:31:04 tgl Exp $
 *
 *-------------------------------------------------------------------------
 */
@@ -719,9 +719,9 @@ parseFromClause(ParseState *pstate, List *frmList)
 *	  list as a "resjunk" node.
 *
 * node		the ORDER BY, GROUP BY, or DISTINCT ON expression to be matched
- * tlist	the existing target list (NB: this cannot be NIL, which is a
+ * tlist	the existing target list (NB: this will never be NIL, which is a
- *			good thing since we'd be unable to append to it...)
+ *			good thing since we'd be unable to append to it if it were...)
- * clause	identifies clause type (mainly for error messages).
+ * clause	identifies clause type being processed.
 */
 static TargetEntry *
 findTargetlistEntry(ParseState *pstate, Node *node, List *tlist, int clause)
@@ -733,7 +733,7 @@ findTargetlistEntry(ParseState *pstate, Node *node, List *tlist, int clause)
 	/*----------
 	 * Handle two special cases as mandated by the SQL92 spec:
 	 *
-	 * 1. ORDER BY ColumnName
+	 * 1. Bare ColumnName (no qualifier or subscripts)
 	 *    For a bare identifier, we search for a matching column name
 	 *	  in the existing target list.  Multiple matches are an error
 	 *	  unless they refer to identical values; for example,
@@ -741,49 +741,76 @@ findTargetlistEntry(ParseState *pstate, Node *node, List *tlist, int clause)
 	 *	  but not   SELECT a AS b, b FROM table ORDER BY b
 	 *	  If no match is found, we fall through and treat the identifier
 	 *	  as an expression.
-	 *	  We do NOT attempt this match for GROUP BY, since it is clearly
+	 *	  For GROUP BY, it is incorrect to match the grouping item against
-	 *	  contrary to the spec to use an output column name in preference
+	 *	  targetlist entries: according to SQL92, an identifier in GROUP BY
-	 *	  to an underlying column name in GROUP BY.  DISTINCT ON isn't in
+	 *	  is a reference to a column name exposed by FROM, not to a target
-	 *	  the standard, so we can do what we like there; we choose to make
+	 *	  list column.  However, many implementations (including pre-7.0
-	 *	  it work like GROUP BY.
+	 *	  PostgreSQL) accept this anyway.  So for GROUP BY, we look first
+	 *	  to see if the identifier matches any FROM column name, and only
+	 *	  try for a targetlist name if it doesn't.  This ensures that we
+	 *	  adhere to the spec in the case where the name could be both.
+	 *	  DISTINCT ON isn't in the standard, so we can do what we like there;
+	 *	  we choose to make it work like ORDER BY, on the rather flimsy
+	 *	  grounds that ordinary DISTINCT works on targetlist entries.
 	 *
-	 * 2. ORDER BY/GROUP BY/DISTINCT ON IntegerConstant
+	 * 2. IntegerConstant
 	 *	  This means to use the n'th item in the existing target list.
 	 *	  Note that it would make no sense to order/group/distinct by an
 	 *	  actual constant, so this does not create a conflict with our
 	 *	  extension to order/group by an expression.
-	 *	  I believe that GROUP BY column-number is not sanctioned by SQL92,
+	 *	  GROUP BY column-number is not allowed by SQL92, but since
-	 *	  but since the standard has no other behavior defined for this
+	 *	  the standard has no other behavior defined for this syntax,
-	 *	  syntax, we may as well continue to support our past behavior.
+	 *	  we may as well accept this common extension.
 	 *
-	 * Note that pre-existing resjunk targets must not be used in either case.
+	 * Note that pre-existing resjunk targets must not be used in either case,
+	 * since the user didn't write them in his SELECT list.
+	 *
+	 * If neither special case applies, fall through to treat the item as
+	 * an expression.
 	 *----------
 	 */
-	if (clause == ORDER_CLAUSE &&
+	if (IsA(node, Ident) && ((Ident *) node)->indirection == NIL)
-		IsA(node, Ident) && ((Ident *) node)->indirection == NIL)
 	{
 		char	   *name = ((Ident *) node)->name;
-		foreach(tl, tlist)
+		if (clause == GROUP_CLAUSE)
 		{
-			TargetEntry *tle = (TargetEntry *) lfirst(tl);
+			/*
-			Resdom	   *resnode = tle->resdom;
+			 * In GROUP BY, we must prefer a match against a FROM-clause
+			 * column to one against the targetlist.  Look to see if there is
+			 * a matching column.  If so, fall through to let transformExpr()
+			 * do the rest.  NOTE: if name could refer ambiguously to more
+			 * than one column name exposed by FROM, colnameRangeTableEntry
+			 * will elog(ERROR).  That's just what we want here.
+			 */
+			if (colnameRangeTableEntry(pstate, name) != NULL)
+				name = NULL;
+		}
-			if (!resnode->resjunk &&
+		if (name != NULL)
-				strcmp(resnode->resname, name) == 0)
+		{
+			foreach(tl, tlist)
 			{
-				if (target_result != NULL)
+				TargetEntry *tle = (TargetEntry *) lfirst(tl);
+				Resdom	   *resnode = tle->resdom;
+				if (!resnode->resjunk &&
+					strcmp(resnode->resname, name) == 0)
 				{
-					if (! equal(target_result->expr, tle->expr))
+					if (target_result != NULL)
-						elog(ERROR, "%s '%s' is ambiguous",
+					{
-							 clauseText[clause], name);
+						if (! equal(target_result->expr, tle->expr))
+							elog(ERROR, "%s '%s' is ambiguous",
+								 clauseText[clause], name);
+					}
+					else
+						target_result = tle;
+					/* Stay in loop to check for ambiguity */
 				}
-				else
-					target_result = tle;
-				/* Stay in loop to check for ambiguity */
 			}
+			if (target_result != NULL)
+				return target_result; /* return the first match */
 		}
-		if (target_result != NULL)
-			return target_result; /* return the first match */
 	}
 	if (IsA(node, A_Const))
 	{

--- a/src/test/regress/expected/numerology.out
+++ b/src/test/regress/expected/numerology.out
@@ -96,14 +96,18 @@ SELECT f1 AS two, max(f3) AS max_float, min(f3) as min_float
   2 |                    0 | -1.2345678901234e+200
 (2 rows)
-- Postgres used to accept this, but it is clearly against SQL92 to
+-- GROUP BY a result column name is not legal per SQL92, but we accept it
-- interpret GROUP BY arguments as result column names; they should
+-- anyway (if the name is not the name of any column exposed by FROM).
-- be source column names *only*.  An error is expected.
 SELECT f1 AS two, max(f3) AS max_float, min(f3) AS min_float
  FROM TEMP_GROUP
  GROUP BY two
  ORDER BY two, max_float, min_float;
-ERROR:  Attribute 'two' not found
+ two |      max_float       |       min_float       
+-----+----------------------+-----------------------
+   1 | 1.2345678901234e+200 |                     0
+   2 |                    0 | -1.2345678901234e+200
+(2 rows)
 SELECT f1 AS two, (max(f3) + 1) AS max_plus_1, (min(f3) - 1) AS min_minus_1
  FROM TEMP_GROUP
  GROUP BY f1

--- a/src/test/regress/sql/numerology.sql
+++ b/src/test/regress/sql/numerology.sql
@@ -70,9 +70,8 @@ SELECT f1 AS two, max(f3) AS max_float, min(f3) as min_float
  GROUP BY f1
  ORDER BY two, max_float, min_float;
-- Postgres used to accept this, but it is clearly against SQL92 to
+-- GROUP BY a result column name is not legal per SQL92, but we accept it
-- interpret GROUP BY arguments as result column names; they should
+-- anyway (if the name is not the name of any column exposed by FROM).
-- be source column names *only*.  An error is expected.
 SELECT f1 AS two, max(f3) AS max_float, min(f3) AS min_float
  FROM TEMP_GROUP
  GROUP BY two