Make operator precedence follow the SQL standard more closely.

While the SQL standard is pretty vague on the overall topic of operator precedence (because it never presents a unified BNF for all expressions), it does seem reasonable to conclude from the spec for <boolean value expression> that OR has the lowest precedence, then AND, then NOT, then IS tests, then the six standard comparison operators, then everything else (since any non-boolean operator in a WHERE clause would need to be an argument of one of these). We were only sort of on board with that: most notably, while "<" ">" and "=" had properly low precedence, "<=" ">=" and "<>" were treated as generic operators and so had significantly higher precedence. And "IS" tests were even higher precedence than those, which is very clearly wrong per spec. Another problem was that "foo NOT SOMETHING bar" constructs, such as "x NOT LIKE y", were treated inconsistently because of a bison implementation artifact: they had the documented precedence with respect to operators to their right, but behaved like NOT (i.e., very low priority) with respect to operators to their left. Fixing the precedence issues is just a small matter of rearranging the precedence declarations in gram.y, except for the NOT problem, which requires adding an additional lookahead case in base_yylex() so that we can attach a different token precedence to NOT LIKE and allied two-word operators. The bulk of this patch is not the bug fix per se, but adding logic to parse_expr.c to allow giving warnings if an expression has changed meaning because of these precedence changes. These warnings are off by default and are enabled by the new GUC operator_precedence_warning. It's believed that very few applications will be affected by these changes, but it was agreed that a warning mechanism is essential to help debug any that are.

Make operator precedence follow the SQL standard more closely.
While the SQL standard is pretty vague on the overall topic of operator precedence (because it never presents a unified BNF for all expressions), it does seem reasonable to conclude from the spec for <boolean value expression> that OR has the lowest precedence, then AND, then NOT, then IS tests, then the six standard comparison operators, then everything else (since any non-boolean operator in a WHERE clause would need to be an argument of one of these). We were only sort of on board with that: most notably, while "<" ">" and "=" had properly low precedence, "<=" ">=" and "<>" were treated as generic operators and so had significantly higher precedence. And "IS" tests were even higher precedence than those, which is very clearly wrong per spec. Another problem was that "foo NOT SOMETHING bar" constructs, such as "x NOT LIKE y", were treated inconsistently because of a bison implementation artifact: they had the documented precedence with respect to operators to their right, but behaved like NOT (i.e., very low priority) with respect to operators to their left. Fixing the precedence issues is just a small matter of rearranging the precedence declarations in gram.y, except for the NOT problem, which requires adding an additional lookahead case in base_yylex() so that we can attach a different token precedence to NOT LIKE and allied two-word operators. The bulk of this patch is not the bug fix per se, but adding logic to parse_expr.c to allow giving warnings if an expression has changed meaning because of these precedence changes. These warnings are off by default and are enabled by the new GUC operator_precedence_warning. It's believed that very few applications will be affected by these changes, but it was agreed that a warning mechanism is essential to help debug any that are.
c6b3c939 · Tom Lane · 21dcda27 · c6b3c939 · c6b3c939 · c6b3c939
Commit c6b3c939 authored Mar 11, 2015 by Tom Lane
18 changed files
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -6816,6 +6816,29 @@ dynamic_library_path = 'C:\tools\postgresql;H:\my_project\lib;$libdir'
      </listitem>
     </varlistentry>

+     <varlistentry id="guc-operator-precedence-warning" xreflabel="operator_precedence_warning">
+      <term><varname>operator_precedence_warning</varname> (<type>boolean</type>)
+      <indexterm>
+       <primary><varname>operator_precedence_warning</> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        When on, the parser will emit a warning for any construct that might
+        have changed meanings since <productname>PostgreSQL</> 9.4 as a result
+        of changes in operator precedence.  This is useful for auditing
+        applications to see if precedence changes have broken anything; but it
+        is not meant to be kept turned on in production, since it will warn
+        about some perfectly valid, standard-compliant SQL code.
+        The default is <literal>off</>.
+       </para>
+
+       <para>
+        See <xref linkend="sql-precedence"> for more information.
+       </para>
+      </listitem>
+     </varlistentry>
+
    <varlistentry id="guc-quote-all-identifiers" xreflabel="quote-all-identifiers">
      <term><varname>quote_all_identifiers</varname> (<type>boolean</type>)
      <indexterm>

--- a/doc/src/sgml/syntax.sgml
+++ b/doc/src/sgml/syntax.sgml
@@ -984,10 +984,11 @@ CAST ( '<replaceable>string</replaceable>' AS <replaceable>type</replaceable> )
    associativity of the operators in <productname>PostgreSQL</>.
    Most operators have the same precedence and are left-associative.
    The precedence and associativity of the operators is hard-wired
-    into the parser.  This can lead to non-intuitive behavior; for
-    example the Boolean operators <literal>&lt;</> and
-    <literal>&gt;</> have a different precedence than the Boolean
-    operators <literal>&lt;=</> and <literal>&gt;=</>.  Also, you will
+    into the parser.
+   </para>
+
+   <para>
+    You will
    sometimes need to add parentheses when using combinations of
    binary and unary operators.  For instance:
 <programlisting>
@@ -1008,7 +1009,7 @@ SELECT (5 !) - 6;
   </para>

   <table id="sql-precedence-table">
-    <title>Operator Precedence (decreasing)</title>
+    <title>Operator Precedence (highest to lowest)</title>

    <tgroup cols="3">
     <thead>
@@ -1063,41 +1064,11 @@ SELECT (5 !) - 6;
      </row>

      <row>
-       <entry><token>IS</token></entry>
-       <entry></entry>
-       <entry><literal>IS TRUE</>, <literal>IS FALSE</>, <literal>IS NULL</>, etc</entry>
-      </row>
-
-      <row>
-       <entry><token>ISNULL</token></entry>
-       <entry></entry>
-       <entry>test for null</entry>
-      </row>
-
-      <row>
-       <entry><token>NOTNULL</token></entry>
-       <entry></entry>
-       <entry>test for not null</entry>
-      </row>
-
-      <row>
-       <entry>(any other)</entry>
+       <entry>(any other operator)</entry>
       <entry>left</entry>
       <entry>all other native and user-defined operators</entry>
      </row>

-      <row>
-       <entry><token>IN</token></entry>
-       <entry></entry>
-       <entry>set membership</entry>
-      </row>
-
-      <row>
-       <entry><token>BETWEEN</token></entry>
-       <entry></entry>
-       <entry>range containment</entry>
-      </row>
-
      <row>
       <entry><token>OVERLAPS</token></entry>
       <entry></entry>
@@ -1105,21 +1076,23 @@ SELECT (5 !) - 6;
      </row>

      <row>
-       <entry><token>LIKE</token> <token>ILIKE</token> <token>SIMILAR</token></entry>
+       <entry><token>BETWEEN</token> <token>IN</token> <token>LIKE</token> <token>ILIKE</token> <token>SIMILAR</token></entry>
       <entry></entry>
-       <entry>string pattern matching</entry>
+       <entry>range containment, set membership, string matching</entry>
      </row>

      <row>
-       <entry><token>&lt;</token> <token>&gt;</token></entry>
+       <entry><token>&lt;</token> <token>&gt;</token> <token>=</token> <token>&lt;=</token> <token>&gt;=</token> <token>&lt;&gt;</token>
+</entry>
       <entry></entry>
-       <entry>less than, greater than</entry>
+       <entry>comparison operators</entry>
      </row>

      <row>
-       <entry><token>=</token></entry>
-       <entry>right</entry>
-       <entry>equality, assignment</entry>
+       <entry><token>IS</token> <token>ISNULL</token> <token>NOTNULL</token></entry>
+       <entry></entry>
+       <entry><literal>IS TRUE</>, <literal>IS FALSE</>, <literal>IS
+       NULL</>, <literal>IS DISTINCT FROM</>, etc</entry>
      </row>

      <row>
@@ -1159,9 +1132,32 @@ SELECT (5 !) - 6;
 SELECT 3 OPERATOR(pg_catalog.+) 4;
 </programlisting>
    the <literal>OPERATOR</> construct is taken to have the default precedence
-    shown in <xref linkend="sql-precedence-table"> for <quote>any other</> operator.  This is true no matter
+    shown in <xref linkend="sql-precedence-table"> for
+    <quote>any other operator</>.  This is true no matter
    which specific operator appears inside <literal>OPERATOR()</>.
   </para>
+
+   <note>
+    <para>
+     <productname>PostgreSQL</> versions before 9.5 used slightly different
+     operator precedence rules.  In particular, <token>&lt;=</token>
+     <token>&gt;=</token> and <token>&lt;&gt;</token> used to be treated as
+     generic operators; <literal>IS</> tests used to have higher priority;
+     and <literal>NOT BETWEEN</> and related constructs acted inconsistently,
+     being taken in some cases as having the precedence of <literal>NOT</>
+     rather than <literal>BETWEEN</>.  These rules were changed for better
+     compliance with the SQL standard and to reduce confusion from
+     inconsistent treatment of logically equivalent constructs.  In most
+     cases, these changes will result in no behavioral change, or perhaps
+     in <quote>no such operator</> failures which can be resolved by adding
+     parentheses.  However there are corner cases in which a query might
+     change behavior without any parsing error being reported.  If you are
+     concerned about whether these changes have silently broken something,
+     you can test your application with the configuration
+     parameter <xref linkend="guc-operator-precedence-warning"> turned on
+     to see if any warnings are logged.
+    </para>
+   </note>
  </sect2>
 </sect1>


--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -2546,6 +2546,9 @@ _outAExpr(StringInfo str, const A_Expr *node)
 			appendStringInfoString(str, " NOT_BETWEEN_SYM ");
 			WRITE_NODE_FIELD(name);
 			break;
+		case AEXPR_PAREN:
+			appendStringInfoString(str, " PAREN");
+			break;
 		default:
 			appendStringInfoString(str, " ??");
 			break;

--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
--- a/src/backend/parser/parse_expr.c
+++ b/src/backend/parser/parse_expr.c
--- a/src/backend/parser/parse_target.c
+++ b/src/backend/parser/parse_target.c
@@ -1654,12 +1654,17 @@ FigureColnameInternal(Node *node, char **name)
 			*name = strVal(llast(((FuncCall *) node)->funcname));
 			return 2;
 		case T_A_Expr:
-			/* make nullif() act like a regular function */
 			if (((A_Expr *) node)->kind == AEXPR_NULLIF)
 			{
+				/* make nullif() act like a regular function */
 				*name = "nullif";
 				return 2;
 			}
+			if (((A_Expr *) node)->kind == AEXPR_PAREN)
+			{
+				/* look through dummy parenthesis node */
+				return FigureColnameInternal(((A_Expr *) node)->lexpr, name);
+			}
 			break;
 		case T_TypeCast:
 			strength = FigureColnameInternal(((TypeCast *) node)->arg,

--- a/src/backend/parser/parser.c
+++ b/src/backend/parser/parser.c
@@ -107,6 +107,9 @@ base_yylex(YYSTYPE *lvalp, YYLTYPE *llocp, core_yyscan_t yyscanner)
 	 */
 	switch (cur_token)
 	{
+		case NOT:
+			cur_token_length = 3;
+			break;
 		case NULLS_P:
 			cur_token_length = 5;
 			break;
@@ -151,6 +154,20 @@ base_yylex(YYSTYPE *lvalp, YYLTYPE *llocp, core_yyscan_t yyscanner)
 	/* Replace cur_token if needed, based on lookahead */
 	switch (cur_token)
 	{
+		case NOT:
+			/* Replace NOT by NOT_LA if it's followed by BETWEEN, IN, etc */
+			switch (next_token)
+			{
+				case BETWEEN:
+				case IN_P:
+				case LIKE:
+				case ILIKE:
+				case SIMILAR:
+					cur_token = NOT_LA;
+					break;
+			}
+			break;
+
 		case NULLS_P:
 			/* Replace NULLS_P by NULLS_LA if it's followed by FIRST or LAST */
 			switch (next_token)

--- a/src/backend/parser/scan.l
+++ b/src/backend/parser/scan.l
@@ -331,10 +331,15 @@ ident_cont		[A-Za-z\200-\377_0-9\$]

 identifier		{ident_start}{ident_cont}*

+/* Assorted special-case operators and operator-like tokens */
 typecast		"::"
 dot_dot			\.\.
 colon_equals	":="
 equals_greater	"=>"
+less_equals		"<="
+greater_equals	">="
+less_greater	"<>"
+not_equals		"!="

 /*
 * "self" is the set of chars that should be returned as single-character
@@ -814,6 +819,28 @@ other			.
 					return EQUALS_GREATER;
 				}

+{less_equals}	{
+					SET_YYLLOC();
+					return LESS_EQUALS;
+				}
+
+{greater_equals} {
+					SET_YYLLOC();
+					return GREATER_EQUALS;
+				}
+
+{less_greater}	{
+					/* We accept both "<>" and "!=" as meaning NOT_EQUALS */
+					SET_YYLLOC();
+					return NOT_EQUALS;
+				}
+
+{not_equals}	{
+					/* We accept both "<>" and "!=" as meaning NOT_EQUALS */
+					SET_YYLLOC();
+					return NOT_EQUALS;
+				}
+
 {self}			{
 					SET_YYLLOC();
 					return yytext[0];
@@ -891,10 +918,6 @@ other			.
 					if (nchars >= NAMEDATALEN)
 						yyerror("operator too long");

-					/* Convert "!=" operator to "<>" for compatibility */
-					if (strcmp(yytext, "!=") == 0)
-						yylval->str = pstrdup("<>");
-					else
 					yylval->str = pstrdup(yytext);
 					return Op;
 				}

--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -1599,6 +1599,16 @@ static struct config_bool ConfigureNamesBool[] =
 		NULL, NULL, NULL
 	},

+	{
+		{"operator_precedence_warning", PGC_USERSET, COMPAT_OPTIONS_PREVIOUS,
+			gettext_noop("Emit a warning for constructs that changed meaning since PostgreSQL 9.4."),
+			NULL,
+		},
+		&operator_precedence_warning,
+		false,
+		NULL, NULL, NULL
+	},
+
 	{
 		{"quote_all_identifiers", PGC_USERSET, COMPAT_OPTIONS_PREVIOUS,
 			gettext_noop("When generating SQL fragments, quote all identifiers."),

--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -586,6 +586,7 @@
 #default_with_oids = off
 #escape_string_warning = on
 #lo_compat_privileges = off
+#operator_precedence_warning = off
 #quote_all_identifiers = off
 #sql_inheritance = on
 #standard_conforming_strings = on

--- a/src/bin/psql/psqlscan.l
+++ b/src/bin/psql/psqlscan.l
@@ -355,10 +355,15 @@ ident_cont		[A-Za-z\200-\377_0-9\$]

 identifier		{ident_start}{ident_cont}*

+/* Assorted special-case operators and operator-like tokens */
 typecast		"::"
 dot_dot			\.\.
 colon_equals	":="
 equals_greater	"=>"
+less_equals		"<="
+greater_equals	">="
+less_greater	"<>"
+not_equals		"!="

 /*
 * "self" is the set of chars that should be returned as single-character
@@ -674,6 +679,22 @@ other			.
 					ECHO;
 				}

+{less_equals}	{
+					ECHO;
+				}
+
+{greater_equals} {
+					ECHO;
+				}
+
+{less_greater}	{
+					ECHO;
+				}
+
+{not_equals}	{
+					ECHO;
+				}
+
 	/*
 	 * These rules are specific to psql --- they implement parenthesis
 	 * counting and detection of command-ending semicolon.  These must

--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -239,7 +239,8 @@ typedef enum A_Expr_Kind
 	AEXPR_BETWEEN,				/* name must be "BETWEEN" */
 	AEXPR_NOT_BETWEEN,			/* name must be "NOT BETWEEN" */
 	AEXPR_BETWEEN_SYM,			/* name must be "BETWEEN SYMMETRIC" */
-	AEXPR_NOT_BETWEEN_SYM		/* name must be "NOT BETWEEN SYMMETRIC" */
+	AEXPR_NOT_BETWEEN_SYM,		/* name must be "NOT BETWEEN SYMMETRIC" */
+	AEXPR_PAREN					/* nameless dummy node for parentheses */
 } A_Expr_Kind;

 typedef struct A_Expr

--- a/src/include/parser/parse_expr.h
+++ b/src/include/parser/parse_expr.h
@@ -16,6 +16,7 @@
 #include "parser/parse_node.h"

 /* GUC parameters */
+extern bool operator_precedence_warning;
 extern bool Transform_null_equals;

 extern Node *transformExpr(ParseState *pstate, Node *expr, ParseExprKind exprKind);

--- a/src/include/parser/scanner.h
+++ b/src/include/parser/scanner.h
@@ -51,6 +51,7 @@ typedef union core_YYSTYPE
 *	%token <str>	IDENT FCONST SCONST BCONST XCONST Op
 *	%token <ival>	ICONST PARAM
 *	%token			TYPECAST DOT_DOT COLON_EQUALS EQUALS_GREATER
+ *	%token			LESS_EQUALS GREATER_EQUALS NOT_EQUALS
 * The above token definitions *must* be the first ones declared in any
 * bison parser built atop this scanner, so that they will have consistent
 * numbers assigned to them (specifically, IDENT = 258 and so on).

--- a/src/interfaces/ecpg/preproc/parse.pl
+++ b/src/interfaces/ecpg/preproc/parse.pl
@@ -42,12 +42,17 @@ my %replace_token = (

 # or in the block
 my %replace_string = (
+	'NOT_LA'          => 'not',
 	'NULLS_LA'        => 'nulls',
 	'WITH_LA'         => 'with',
 	'TYPECAST'        => '::',
 	'DOT_DOT'         => '..',
 	'COLON_EQUALS'    => ':=',
-	'EQUALS_GREATER'  => '=>',);
+	'EQUALS_GREATER'  => '=>',
+	'LESS_EQUALS'     => '<=',
+	'GREATER_EQUALS'  => '>=',
+	'NOT_EQUALS'      => '<>',
+);

 # specific replace_types for specific non-terminals - never include the ':'
 # ECPG-only replace_types are defined in ecpg-replace_types

--- a/src/interfaces/ecpg/preproc/parser.c
+++ b/src/interfaces/ecpg/preproc/parser.c
@@ -75,6 +75,9 @@ filtered_base_yylex(void)
 	 */
 	switch (cur_token)
 	{
+		case NOT:
+			cur_token_length = 3;
+			break;
 		case NULLS_P:
 			cur_token_length = 5;
 			break;
@@ -119,6 +122,20 @@ filtered_base_yylex(void)
 	/* Replace cur_token if needed, based on lookahead */
 	switch (cur_token)
 	{
+		case NOT:
+			/* Replace NOT by NOT_LA if it's followed by BETWEEN, IN, etc */
+			switch (next_token)
+			{
+				case BETWEEN:
+				case IN_P:
+				case LIKE:
+				case ILIKE:
+				case SIMILAR:
+					cur_token = NOT_LA;
+					break;
+			}
+			break;
+
 		case NULLS_P:
 			/* Replace NULLS_P by NULLS_LA if it's followed by FIRST or LAST */
 			switch (next_token)

--- a/src/interfaces/ecpg/preproc/pgc.l
+++ b/src/interfaces/ecpg/preproc/pgc.l
@@ -233,10 +233,16 @@ ident_cont		[A-Za-z\200-\377_0-9\$]
 identifier		{ident_start}{ident_cont}*

 array			({ident_cont}|{whitespace}|[\[\]\+\-\*\%\/\(\)\>\.])*
+
+/* Assorted special-case operators and operator-like tokens */
 typecast		"::"
 dot_dot			\.\.
 colon_equals	":="
 equals_greater	"=>"
+less_equals		"<="
+greater_equals	">="
+less_greater	"<>"
+not_equals		"!="

 /*
 * "self" is the set of chars that should be returned as single-character
@@ -622,6 +628,10 @@ cppline			{space}*#([^i][A-Za-z]*|{if}|{ifdef}|{ifndef}|{import})(.*\\{space})*.
 <SQL>{dot_dot}		{ return DOT_DOT; }
 <SQL>{colon_equals}	{ return COLON_EQUALS; }
 <SQL>{equals_greater} { return EQUALS_GREATER; }
+<SQL>{less_equals}	{ return LESS_EQUALS; }
+<SQL>{greater_equals} { return GREATER_EQUALS; }
+<SQL>{less_greater}	{ return NOT_EQUALS; }
+<SQL>{not_equals}	{ return NOT_EQUALS; }
 <SQL>{informix_special}	{
 			  /* are we simulating Informix? */
 				if (INFORMIX_MODE)
@@ -701,10 +711,6 @@ cppline			{space}*#([^i][A-Za-z]*|{if}|{ifdef}|{ifndef}|{import})(.*\\{space})*.
 								return yytext[0];
 						}

-						/* Convert "!=" operator to "<>" for compatibility */
-						if (strcmp(yytext, "!=") == 0)
-							yylval.str = mm_strdup("<>");
-						else
 						yylval.str = mm_strdup(yytext);
 						return Op;
 					}

--- a/src/pl/plpgsql/src/pl_gram.y
+++ b/src/pl/plpgsql/src/pl_gram.y
@@ -227,6 +227,7 @@ static	void			check_raise_parameters(PLpgSQL_stmt_raise *stmt);
 %token <str>	IDENT FCONST SCONST BCONST XCONST Op
 %token <ival>	ICONST PARAM
 %token			TYPECAST DOT_DOT COLON_EQUALS EQUALS_GREATER
+%token			LESS_EQUALS GREATER_EQUALS NOT_EQUALS

 /*
 * Other tokens recognized by plpgsql's lexer interface layer (pl_scanner.c).