Fix detection of unfinished Unicode surrogate pair at end of string.

The U&'...' and U&"..." syntaxes silently discarded a surrogate pair start (that is, a code between U+D800 and U+DBFF) if it occurred at the very end of the string. This seems like an obvious oversight, since we throw an error for every other invalid combination of surrogate characters, including the very same situation in E'...' syntax. This has been wrong since the pair processing was added (in 9.0), so back-patch to all supported branches. Discussion: https://postgr.es/m/19113.1482337898@sss.pgh.pa.us

Fix detection of unfinished Unicode surrogate pair at end of string.
The U&'...' and U&"..." syntaxes silently discarded a surrogate pair start (that is, a code between U+D800 and U+DBFF) if it occurred at the very end of the string. This seems like an obvious oversight, since we throw an error for every other invalid combination of surrogate characters, including the very same situation in E'...' syntax. This has been wrong since the pair processing was added (in 9.0), so back-patch to all supported branches. Discussion: https://postgr.es/m/19113.1482337898@sss.pgh.pa.us
a8ae1232 · Tom Lane · 89fcea1a · a8ae1232
Commit a8ae1232 authored Dec 21, 2016 by Tom Lane
Show whitespace changes
Inline Side-by-side

Showing with 7 additions and 0 deletions

src/backend/parser/scan.l src/backend/parser/scan.l +7 -0

No files found.
--- a/src/backend/parser/scan.l
+++ b/src/backend/parser/scan.l
@@ -1435,6 +1435,13 @@ litbuf_udeescape(unsigned char escape, core_yyscan_t yyscanner)
 		}
 	}
+	/* unfinished surrogate pair? */
+	if (pair_first)
+	{
+		ADVANCE_YYLLOC(in - litbuf + 3);				/* 3 for U&" */
+		yyerror("invalid Unicode surrogate pair");
+	}
 	*out = '\0';
 	/*