Commit c6b3c939 authored by Tom Lane's avatar Tom Lane

Make operator precedence follow the SQL standard more closely.

While the SQL standard is pretty vague on the overall topic of operator
precedence (because it never presents a unified BNF for all expressions),
it does seem reasonable to conclude from the spec for <boolean value
expression> that OR has the lowest precedence, then AND, then NOT, then IS
tests, then the six standard comparison operators, then everything else
(since any non-boolean operator in a WHERE clause would need to be an
argument of one of these).

We were only sort of on board with that: most notably, while "<" ">" and
"=" had properly low precedence, "<=" ">=" and "<>" were treated as generic
operators and so had significantly higher precedence.  And "IS" tests were
even higher precedence than those, which is very clearly wrong per spec.

Another problem was that "foo NOT SOMETHING bar" constructs, such as
"x NOT LIKE y", were treated inconsistently because of a bison
implementation artifact: they had the documented precedence with respect
to operators to their right, but behaved like NOT (i.e., very low priority)
with respect to operators to their left.

Fixing the precedence issues is just a small matter of rearranging the
precedence declarations in gram.y, except for the NOT problem, which
requires adding an additional lookahead case in base_yylex() so that we
can attach a different token precedence to NOT LIKE and allied two-word
operators.

The bulk of this patch is not the bug fix per se, but adding logic to
parse_expr.c to allow giving warnings if an expression has changed meaning
because of these precedence changes.  These warnings are off by default
and are enabled by the new GUC operator_precedence_warning.  It's believed
that very few applications will be affected by these changes, but it was
agreed that a warning mechanism is essential to help debug any that are.
parent 21dcda27
......@@ -6816,6 +6816,29 @@ dynamic_library_path = 'C:\tools\postgresql;H:\my_project\lib;$libdir'
</listitem>
</varlistentry>
<varlistentry id="guc-operator-precedence-warning" xreflabel="operator_precedence_warning">
<term><varname>operator_precedence_warning</varname> (<type>boolean</type>)
<indexterm>
<primary><varname>operator_precedence_warning</> configuration parameter</primary>
</indexterm>
</term>
<listitem>
<para>
When on, the parser will emit a warning for any construct that might
have changed meanings since <productname>PostgreSQL</> 9.4 as a result
of changes in operator precedence. This is useful for auditing
applications to see if precedence changes have broken anything; but it
is not meant to be kept turned on in production, since it will warn
about some perfectly valid, standard-compliant SQL code.
The default is <literal>off</>.
</para>
<para>
See <xref linkend="sql-precedence"> for more information.
</para>
</listitem>
</varlistentry>
<varlistentry id="guc-quote-all-identifiers" xreflabel="quote-all-identifiers">
<term><varname>quote_all_identifiers</varname> (<type>boolean</type>)
<indexterm>
......
......@@ -984,10 +984,11 @@ CAST ( '<replaceable>string</replaceable>' AS <replaceable>type</replaceable> )
associativity of the operators in <productname>PostgreSQL</>.
Most operators have the same precedence and are left-associative.
The precedence and associativity of the operators is hard-wired
into the parser. This can lead to non-intuitive behavior; for
example the Boolean operators <literal>&lt;</> and
<literal>&gt;</> have a different precedence than the Boolean
operators <literal>&lt;=</> and <literal>&gt;=</>. Also, you will
into the parser.
</para>
<para>
You will
sometimes need to add parentheses when using combinations of
binary and unary operators. For instance:
<programlisting>
......@@ -1008,7 +1009,7 @@ SELECT (5 !) - 6;
</para>
<table id="sql-precedence-table">
<title>Operator Precedence (decreasing)</title>
<title>Operator Precedence (highest to lowest)</title>
<tgroup cols="3">
<thead>
......@@ -1063,41 +1064,11 @@ SELECT (5 !) - 6;
</row>
<row>
<entry><token>IS</token></entry>
<entry></entry>
<entry><literal>IS TRUE</>, <literal>IS FALSE</>, <literal>IS NULL</>, etc</entry>
</row>
<row>
<entry><token>ISNULL</token></entry>
<entry></entry>
<entry>test for null</entry>
</row>
<row>
<entry><token>NOTNULL</token></entry>
<entry></entry>
<entry>test for not null</entry>
</row>
<row>
<entry>(any other)</entry>
<entry>(any other operator)</entry>
<entry>left</entry>
<entry>all other native and user-defined operators</entry>
</row>
<row>
<entry><token>IN</token></entry>
<entry></entry>
<entry>set membership</entry>
</row>
<row>
<entry><token>BETWEEN</token></entry>
<entry></entry>
<entry>range containment</entry>
</row>
<row>
<entry><token>OVERLAPS</token></entry>
<entry></entry>
......@@ -1105,21 +1076,23 @@ SELECT (5 !) - 6;
</row>
<row>
<entry><token>LIKE</token> <token>ILIKE</token> <token>SIMILAR</token></entry>
<entry><token>BETWEEN</token> <token>IN</token> <token>LIKE</token> <token>ILIKE</token> <token>SIMILAR</token></entry>
<entry></entry>
<entry>string pattern matching</entry>
<entry>range containment, set membership, string matching</entry>
</row>
<row>
<entry><token>&lt;</token> <token>&gt;</token></entry>
<entry><token>&lt;</token> <token>&gt;</token> <token>=</token> <token>&lt;=</token> <token>&gt;=</token> <token>&lt;&gt;</token>
</entry>
<entry></entry>
<entry>less than, greater than</entry>
<entry>comparison operators</entry>
</row>
<row>
<entry><token>=</token></entry>
<entry>right</entry>
<entry>equality, assignment</entry>
<entry><token>IS</token> <token>ISNULL</token> <token>NOTNULL</token></entry>
<entry></entry>
<entry><literal>IS TRUE</>, <literal>IS FALSE</>, <literal>IS
NULL</>, <literal>IS DISTINCT FROM</>, etc</entry>
</row>
<row>
......@@ -1159,9 +1132,32 @@ SELECT (5 !) - 6;
SELECT 3 OPERATOR(pg_catalog.+) 4;
</programlisting>
the <literal>OPERATOR</> construct is taken to have the default precedence
shown in <xref linkend="sql-precedence-table"> for <quote>any other</> operator. This is true no matter
shown in <xref linkend="sql-precedence-table"> for
<quote>any other operator</>. This is true no matter
which specific operator appears inside <literal>OPERATOR()</>.
</para>
<note>
<para>
<productname>PostgreSQL</> versions before 9.5 used slightly different
operator precedence rules. In particular, <token>&lt;=</token>
<token>&gt;=</token> and <token>&lt;&gt;</token> used to be treated as
generic operators; <literal>IS</> tests used to have higher priority;
and <literal>NOT BETWEEN</> and related constructs acted inconsistently,
being taken in some cases as having the precedence of <literal>NOT</>
rather than <literal>BETWEEN</>. These rules were changed for better
compliance with the SQL standard and to reduce confusion from
inconsistent treatment of logically equivalent constructs. In most
cases, these changes will result in no behavioral change, or perhaps
in <quote>no such operator</> failures which can be resolved by adding
parentheses. However there are corner cases in which a query might
change behavior without any parsing error being reported. If you are
concerned about whether these changes have silently broken something,
you can test your application with the configuration
parameter <xref linkend="guc-operator-precedence-warning"> turned on
to see if any warnings are logged.
</para>
</note>
</sect2>
</sect1>
......
......@@ -2546,6 +2546,9 @@ _outAExpr(StringInfo str, const A_Expr *node)
appendStringInfoString(str, " NOT_BETWEEN_SYM ");
WRITE_NODE_FIELD(name);
break;
case AEXPR_PAREN:
appendStringInfoString(str, " PAREN");
break;
default:
appendStringInfoString(str, " ??");
break;
......
This diff is collapsed.
This diff is collapsed.
......@@ -1654,12 +1654,17 @@ FigureColnameInternal(Node *node, char **name)
*name = strVal(llast(((FuncCall *) node)->funcname));
return 2;
case T_A_Expr:
/* make nullif() act like a regular function */
if (((A_Expr *) node)->kind == AEXPR_NULLIF)
{
/* make nullif() act like a regular function */
*name = "nullif";
return 2;
}
if (((A_Expr *) node)->kind == AEXPR_PAREN)
{
/* look through dummy parenthesis node */
return FigureColnameInternal(((A_Expr *) node)->lexpr, name);
}
break;
case T_TypeCast:
strength = FigureColnameInternal(((TypeCast *) node)->arg,
......
......@@ -107,6 +107,9 @@ base_yylex(YYSTYPE *lvalp, YYLTYPE *llocp, core_yyscan_t yyscanner)
*/
switch (cur_token)
{
case NOT:
cur_token_length = 3;
break;
case NULLS_P:
cur_token_length = 5;
break;
......@@ -151,6 +154,20 @@ base_yylex(YYSTYPE *lvalp, YYLTYPE *llocp, core_yyscan_t yyscanner)
/* Replace cur_token if needed, based on lookahead */
switch (cur_token)
{
case NOT:
/* Replace NOT by NOT_LA if it's followed by BETWEEN, IN, etc */
switch (next_token)
{
case BETWEEN:
case IN_P:
case LIKE:
case ILIKE:
case SIMILAR:
cur_token = NOT_LA;
break;
}
break;
case NULLS_P:
/* Replace NULLS_P by NULLS_LA if it's followed by FIRST or LAST */
switch (next_token)
......
......@@ -331,10 +331,15 @@ ident_cont [A-Za-z\200-\377_0-9\$]
identifier {ident_start}{ident_cont}*
/* Assorted special-case operators and operator-like tokens */
typecast "::"
dot_dot \.\.
colon_equals ":="
equals_greater "=>"
less_equals "<="
greater_equals ">="
less_greater "<>"
not_equals "!="
/*
* "self" is the set of chars that should be returned as single-character
......@@ -814,6 +819,28 @@ other .
return EQUALS_GREATER;
}
{less_equals} {
SET_YYLLOC();
return LESS_EQUALS;
}
{greater_equals} {
SET_YYLLOC();
return GREATER_EQUALS;
}
{less_greater} {
/* We accept both "<>" and "!=" as meaning NOT_EQUALS */
SET_YYLLOC();
return NOT_EQUALS;
}
{not_equals} {
/* We accept both "<>" and "!=" as meaning NOT_EQUALS */
SET_YYLLOC();
return NOT_EQUALS;
}
{self} {
SET_YYLLOC();
return yytext[0];
......@@ -891,10 +918,6 @@ other .
if (nchars >= NAMEDATALEN)
yyerror("operator too long");
/* Convert "!=" operator to "<>" for compatibility */
if (strcmp(yytext, "!=") == 0)
yylval->str = pstrdup("<>");
else
yylval->str = pstrdup(yytext);
return Op;
}
......
......@@ -1599,6 +1599,16 @@ static struct config_bool ConfigureNamesBool[] =
NULL, NULL, NULL
},
{
{"operator_precedence_warning", PGC_USERSET, COMPAT_OPTIONS_PREVIOUS,
gettext_noop("Emit a warning for constructs that changed meaning since PostgreSQL 9.4."),
NULL,
},
&operator_precedence_warning,
false,
NULL, NULL, NULL
},
{
{"quote_all_identifiers", PGC_USERSET, COMPAT_OPTIONS_PREVIOUS,
gettext_noop("When generating SQL fragments, quote all identifiers."),
......
......@@ -586,6 +586,7 @@
#default_with_oids = off
#escape_string_warning = on
#lo_compat_privileges = off
#operator_precedence_warning = off
#quote_all_identifiers = off
#sql_inheritance = on
#standard_conforming_strings = on
......
......@@ -355,10 +355,15 @@ ident_cont [A-Za-z\200-\377_0-9\$]
identifier {ident_start}{ident_cont}*
/* Assorted special-case operators and operator-like tokens */
typecast "::"
dot_dot \.\.
colon_equals ":="
equals_greater "=>"
less_equals "<="
greater_equals ">="
less_greater "<>"
not_equals "!="
/*
* "self" is the set of chars that should be returned as single-character
......@@ -674,6 +679,22 @@ other .
ECHO;
}
{less_equals} {
ECHO;
}
{greater_equals} {
ECHO;
}
{less_greater} {
ECHO;
}
{not_equals} {
ECHO;
}
/*
* These rules are specific to psql --- they implement parenthesis
* counting and detection of command-ending semicolon. These must
......
......@@ -239,7 +239,8 @@ typedef enum A_Expr_Kind
AEXPR_BETWEEN, /* name must be "BETWEEN" */
AEXPR_NOT_BETWEEN, /* name must be "NOT BETWEEN" */
AEXPR_BETWEEN_SYM, /* name must be "BETWEEN SYMMETRIC" */
AEXPR_NOT_BETWEEN_SYM /* name must be "NOT BETWEEN SYMMETRIC" */
AEXPR_NOT_BETWEEN_SYM, /* name must be "NOT BETWEEN SYMMETRIC" */
AEXPR_PAREN /* nameless dummy node for parentheses */
} A_Expr_Kind;
typedef struct A_Expr
......
......@@ -16,6 +16,7 @@
#include "parser/parse_node.h"
/* GUC parameters */
extern bool operator_precedence_warning;
extern bool Transform_null_equals;
extern Node *transformExpr(ParseState *pstate, Node *expr, ParseExprKind exprKind);
......
......@@ -51,6 +51,7 @@ typedef union core_YYSTYPE
* %token <str> IDENT FCONST SCONST BCONST XCONST Op
* %token <ival> ICONST PARAM
* %token TYPECAST DOT_DOT COLON_EQUALS EQUALS_GREATER
* %token LESS_EQUALS GREATER_EQUALS NOT_EQUALS
* The above token definitions *must* be the first ones declared in any
* bison parser built atop this scanner, so that they will have consistent
* numbers assigned to them (specifically, IDENT = 258 and so on).
......
......@@ -42,12 +42,17 @@ my %replace_token = (
# or in the block
my %replace_string = (
'NOT_LA' => 'not',
'NULLS_LA' => 'nulls',
'WITH_LA' => 'with',
'TYPECAST' => '::',
'DOT_DOT' => '..',
'COLON_EQUALS' => ':=',
'EQUALS_GREATER' => '=>',);
'EQUALS_GREATER' => '=>',
'LESS_EQUALS' => '<=',
'GREATER_EQUALS' => '>=',
'NOT_EQUALS' => '<>',
);
# specific replace_types for specific non-terminals - never include the ':'
# ECPG-only replace_types are defined in ecpg-replace_types
......
......@@ -75,6 +75,9 @@ filtered_base_yylex(void)
*/
switch (cur_token)
{
case NOT:
cur_token_length = 3;
break;
case NULLS_P:
cur_token_length = 5;
break;
......@@ -119,6 +122,20 @@ filtered_base_yylex(void)
/* Replace cur_token if needed, based on lookahead */
switch (cur_token)
{
case NOT:
/* Replace NOT by NOT_LA if it's followed by BETWEEN, IN, etc */
switch (next_token)
{
case BETWEEN:
case IN_P:
case LIKE:
case ILIKE:
case SIMILAR:
cur_token = NOT_LA;
break;
}
break;
case NULLS_P:
/* Replace NULLS_P by NULLS_LA if it's followed by FIRST or LAST */
switch (next_token)
......
......@@ -233,10 +233,16 @@ ident_cont [A-Za-z\200-\377_0-9\$]
identifier {ident_start}{ident_cont}*
array ({ident_cont}|{whitespace}|[\[\]\+\-\*\%\/\(\)\>\.])*
/* Assorted special-case operators and operator-like tokens */
typecast "::"
dot_dot \.\.
colon_equals ":="
equals_greater "=>"
less_equals "<="
greater_equals ">="
less_greater "<>"
not_equals "!="
/*
* "self" is the set of chars that should be returned as single-character
......@@ -622,6 +628,10 @@ cppline {space}*#([^i][A-Za-z]*|{if}|{ifdef}|{ifndef}|{import})(.*\\{space})*.
<SQL>{dot_dot} { return DOT_DOT; }
<SQL>{colon_equals} { return COLON_EQUALS; }
<SQL>{equals_greater} { return EQUALS_GREATER; }
<SQL>{less_equals} { return LESS_EQUALS; }
<SQL>{greater_equals} { return GREATER_EQUALS; }
<SQL>{less_greater} { return NOT_EQUALS; }
<SQL>{not_equals} { return NOT_EQUALS; }
<SQL>{informix_special} {
/* are we simulating Informix? */
if (INFORMIX_MODE)
......@@ -701,10 +711,6 @@ cppline {space}*#([^i][A-Za-z]*|{if}|{ifdef}|{ifndef}|{import})(.*\\{space})*.
return yytext[0];
}
/* Convert "!=" operator to "<>" for compatibility */
if (strcmp(yytext, "!=") == 0)
yylval.str = mm_strdup("<>");
else
yylval.str = mm_strdup(yytext);
return Op;
}
......
......@@ -227,6 +227,7 @@ static void check_raise_parameters(PLpgSQL_stmt_raise *stmt);
%token <str> IDENT FCONST SCONST BCONST XCONST Op
%token <ival> ICONST PARAM
%token TYPECAST DOT_DOT COLON_EQUALS EQUALS_GREATER
%token LESS_EQUALS GREATER_EQUALS NOT_EQUALS
/*
* Other tokens recognized by plpgsql's lexer interface layer (pl_scanner.c).
......
Markdown is supported
0% or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment