Commit 451d2808 authored by Tom Lane

Fix jsonb Unicode escape processing, and in consequence disallow \u0000.

We've been trying to support \u0000 in JSON values since commit
78ed8e03, and have introduced increasingly worse hacks to try to
make it work, such as commit 0ad1a816.  However, it fundamentally
can't work in the way envisioned, because the stored representation looks
the same as for \\u0000 which is not the same thing at all.  It's also
entirely bogus to output \u0000 when de-escaped output is called for.
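
The ambiguity can be reproduced outside the server. The C sketch below is an
illustration only, not code from json.c: old_style_deescape() and parse_hex4()
are hypothetical helpers that model the pre-fix rule as described above (a
doubled backslash collapses to one, an ASCII \uXXXX escape is replaced by its
character, and \u0000 is copied through as six literal characters). Other
escapes and non-ASCII code points are deliberately out of scope.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Parse exactly four hex digits at s; return 0 on success, -1 otherwise. */
static int
parse_hex4(const char *s, unsigned int *ch)
{
    unsigned int v = 0;

    for (int i = 0; i < 4; i++)
    {
        char c = s[i];

        if (c >= '0' && c <= '9')
            v = v * 16 + (unsigned int) (c - '0');
        else if (c >= 'a' && c <= 'f')
            v = v * 16 + (unsigned int) (c - 'a' + 10);
        else if (c >= 'A' && c <= 'F')
            v = v * 16 + (unsigned int) (c - 'A' + 10);
        else
            return -1;          /* also stops at a premature end of string */
    }
    *ch = v;
    return 0;
}

/*
 * Hypothetical model of the pre-fix de-escaping rule, ASCII escapes only.
 * Writes a NUL-terminated result into out; returns 0 on success, -1 on
 * malformed or unsupported input or if out is too small.
 */
static int
old_style_deescape(const char *in, char *out, size_t outlen)
{
    size_t n = 0;

    assert(in != NULL && out != NULL);
    if (outlen == 0)
        return -1;
    while (*in != '\0')
    {
        if (n + 7 > outlen)     /* worst case: six bytes plus terminator */
            return -1;
        if (in[0] == '\\' && in[1] == '\\')
        {
            out[n++] = '\\';    /* escaped backslash becomes one backslash */
            in += 2;
        }
        else if (in[0] == '\\' && in[1] == 'u')
        {
            unsigned int ch;

            if (parse_hex4(in + 2, &ch) != 0 || ch > 0x7F)
                return -1;
            if (ch == 0)
            {
                /* the old hack: keep \u0000 as six literal characters */
                memcpy(out + n, "\\u0000", 6);
                n += 6;
            }
            else
                out[n++] = (char) ch;
            in += 6;
        }
        else
            out[n++] = *in++;   /* everything else is copied unchanged */
    }
    out[n] = '\0';
    return 0;
}
```

Given the JSON string bodies null \u0000 escape and null \\u0000 escape
(written in C source as "null \\u0000 escape" and "null \\\\u0000 escape"),
the function produces byte-identical output, so nothing downstream can tell
the two inputs apart. The same pair with \u0024 stays distinguishable.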

The right way to do this would be to store an actual 0x00 byte, and then
throw error only if asked to produce de-escaped textual output.  However,
getting to that point seems likely to take considerable work and may well
never be practical in the 9.4.x series.

To preserve our options for better behavior while getting rid of the nasty
side-effects of 0ad1a816, revert that commit in toto and instead
throw error if \u0000 is used in a context where it needs to be de-escaped.
(These are the same contexts where non-ASCII Unicode escapes throw error
if the database encoding isn't UTF8, so this behavior is by no means
without precedent.)

In passing, make both the \u0000 case and the non-ASCII Unicode case report
ERRCODE_UNTRANSLATABLE_CHARACTER / "unsupported Unicode escape sequence"
rather than claiming there's something wrong with the input syntax.
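
As a rough illustration of the resulting rules, the C sketch below classifies
a single \uXXXX escape the way the message describes for contexts that need
de-escaping. It is not the json.c code: check_unicode_escape(), hex_value()
and the EscapeStatus values are hypothetical names, the db_is_utf8 flag stands
in for the database encoding check, and surrogate-pair validation is not
modeled (surrogate code units are only flagged).

```c
#include <assert.h>
#include <stddef.h>

typedef enum
{
    ESCAPE_OK,              /* can be de-escaped */
    ESCAPE_BAD_SYNTAX,      /* not "\u" followed by four hex digits */
    ESCAPE_UNTRANSLATABLE,  /* well-formed, but cannot be converted to text */
    ESCAPE_SURROGATE        /* needs pair validation, not modeled here */
} EscapeStatus;

/* Return the value of one hex digit, or -1 if c is not a hex digit. */
static int
hex_value(char c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    if (c >= 'A' && c <= 'F')
        return c - 'A' + 10;
    return -1;
}

/* Classify the escape starting at esc, which should point at a backslash. */
static EscapeStatus
check_unicode_escape(const char *esc, int db_is_utf8)
{
    unsigned int ch = 0;

    assert(esc != NULL);
    if (esc[0] != '\\' || esc[1] != 'u')
        return ESCAPE_BAD_SYNTAX;
    for (int i = 2; i < 6; i++)
    {
        int d = hex_value(esc[i]);

        if (d < 0)
            return ESCAPE_BAD_SYNTAX;   /* also catches a short string */
        ch = ch * 16 + (unsigned int) d;
    }
    if (ch >= 0xD800 && ch <= 0xDFFF)
        return ESCAPE_SURROGATE;
    if (ch == 0)
        return ESCAPE_UNTRANSLATABLE;   /* text cannot hold a NUL byte */
    if (ch > 0x7F && !db_is_utf8)
        return ESCAPE_UNTRANSLATABLE;   /* non-ASCII needs a UTF8 database */
    return ESCAPE_OK;
}
```

In this model ESCAPE_UNTRANSLATABLE corresponds to the cases that now report
ERRCODE_UNTRANSLATABLE_CHARACTER, while ESCAPE_BAD_SYNTAX corresponds to
input that is still reported as a syntax problem.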

Back-patch to 9.4, where we have to do something because 0ad1a816
broke things for many cases having nothing to do with \u0000.  9.3 also has
bogus behavior, but only for that specific escape value, so given the lack
of field complaints it seems better to leave 9.3 alone.
parent e40d43f8
@@ -69,12 +69,14 @@
 regardless of the database encoding, and are checked only for syntactic
 correctness (that is, that four hex digits follow <literal>\u</>).
 However, the input function for <type>jsonb</> is stricter: it disallows
-Unicode escapes for non-ASCII characters (those
-above <literal>U+007F</>) unless the database encoding is UTF8. It also
-insists that any use of Unicode surrogate pairs to designate characters
-outside the Unicode Basic Multilingual Plane be correct. Valid Unicode
-escapes, except for <literal>\u0000</>, are then converted to the
-equivalent ASCII or UTF8 character for storage.
+Unicode escapes for non-ASCII characters (those above <literal>U+007F</>)
+unless the database encoding is UTF8. The <type>jsonb</> type also
+rejects <literal>\u0000</> (because that cannot be represented in
+<productname>PostgreSQL</productname>'s <type>text</> type), and it insists
+that any use of Unicode surrogate pairs to designate characters outside
+the Unicode Basic Multilingual Plane be correct. Valid Unicode escapes
+are converted to the equivalent ASCII or UTF8 character for storage;
+this includes folding surrogate pairs into a single character.
 </para>
 <note>
@@ -101,7 +103,7 @@
 constitutes valid <type>jsonb</type> data that do not apply to
 the <type>json</type> type, nor to JSON in the abstract, corresponding
 to limits on what can be represented by the underlying data type.
-Specifically, <type>jsonb</> will reject numbers that are outside the
+Notably, <type>jsonb</> will reject numbers that are outside the
 range of the <productname>PostgreSQL</productname> <type>numeric</> data
 type, while <type>json</> will not. Such implementation-defined
 restrictions are permitted by <acronym>RFC</> 7159. However, in
@@ -134,7 +136,8 @@
 <row>
 <entry><type>string</></entry>
 <entry><type>text</></entry>
-<entry>See notes above concerning encoding restrictions</entry>
+<entry><literal>\u0000</> is disallowed, as are non-ASCII Unicode
+escapes if database encoding is not UTF8</entry>
 </row>
 <row>
 <entry><type>number</></entry>
......
@@ -101,22 +101,6 @@
 </para>
 </listitem>
-<listitem>
-<para>
-Unicode escapes in <link linkend="datatype-json"><type>JSON</type></link>
-text values are no longer rendered with the backslash escaped
-(Andrew Dunstan)
-</para>
-<para>
-Previously, all backslashes in text values being formed into JSON
-were escaped. Now a backslash followed by <literal>u</> and four
-hexadecimal digits is not escaped, as this is a legal sequence in a
-JSON string value, and escaping the backslash led to some perverse
-results.
-</para>
-</listitem>
 <listitem>
 <para>
 When converting values of type <type>date</>, <type>timestamp</>
......
@@ -806,14 +806,17 @@ json_lex_string(JsonLexContext *lex)
      * For UTF8, replace the escape sequence by the actual
      * utf8 character in lex->strval. Do this also for other
      * encodings if the escape designates an ASCII character,
-     * otherwise raise an error. We don't ever unescape a
-     * \u0000, since that would result in an impermissible nul
-     * byte.
+     * otherwise raise an error.
      */
     if (ch == 0)
     {
-        appendStringInfoString(lex->strval, "\\u0000");
+        /* We can't allow this, since our TEXT type doesn't */
+        ereport(ERROR,
+                (errcode(ERRCODE_UNTRANSLATABLE_CHARACTER),
+                 errmsg("unsupported Unicode escape sequence"),
+                 errdetail("\\u0000 cannot be converted to text."),
+                 report_json_context(lex)));
     }
     else if (GetDatabaseEncoding() == PG_UTF8)
     {
@@ -833,8 +836,8 @@ json_lex_string(JsonLexContext *lex)
     else
     {
         ereport(ERROR,
-                (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-                 errmsg("invalid input syntax for type json"),
+                (errcode(ERRCODE_UNTRANSLATABLE_CHARACTER),
+                 errmsg("unsupported Unicode escape sequence"),
                  errdetail("Unicode escape values cannot be used for code point values above 007F when the server encoding is not UTF8."),
                  report_json_context(lex)));
     }
@@ -1284,8 +1287,8 @@ json_categorize_type(Oid typoid,
     /*
      * We need to get the output function for everything except date and
-     * timestamp types, array and composite types, booleans,
-     * and non-builtin types where there's a cast to json.
+     * timestamp types, array and composite types, booleans, and non-builtin
+     * types where there's a cast to json.
      */
     switch (typoid)
@@ -1339,7 +1342,8 @@ json_categorize_type(Oid typoid,
     CoercionPathType ctype;
     ctype = find_coercion_pathway(JSONOID, typoid,
-                                  COERCION_EXPLICIT, &castfunc);
+                                  COERCION_EXPLICIT,
+                                  &castfunc);
     if (ctype == COERCION_PATH_FUNC && OidIsValid(castfunc))
     {
         *tcategory = JSONTYPE_CAST;
@@ -2382,29 +2386,6 @@ escape_json(StringInfo buf, const char *str)
         appendStringInfoString(buf, "\\\"");
         break;
     case '\\':
-        /*
-         * Unicode escapes are passed through as is. There is no
-         * requirement that they denote a valid character in the
-         * server encoding - indeed that is a big part of their
-         * usefulness.
-         *
-         * All we require is that they consist of \uXXXX where the Xs
-         * are hexadecimal digits. It is the responsibility of the
-         * caller of, say, to_json() to make sure that the unicode
-         * escape is valid.
-         *
-         * In the case of a jsonb string value being escaped, the only
-         * unicode escape that should be present is \u0000, all the
-         * other unicode escapes will have been resolved.
-         */
-        if (p[1] == 'u' &&
-            isxdigit((unsigned char) p[2]) &&
-            isxdigit((unsigned char) p[3]) &&
-            isxdigit((unsigned char) p[4]) &&
-            isxdigit((unsigned char) p[5]))
-            appendStringInfoCharMacro(buf, *p);
-        else
-            appendStringInfoString(buf, "\\\\");
+        appendStringInfoString(buf, "\\\\");
         break;
     default:
......
@@ -426,20 +426,6 @@ select to_json(timestamptz '2014-05-28 12:22:35.614298-04');
 (1 row)
 COMMIT;
--- unicode escape - backslash is not escaped
-select to_json(text '\uabcd');
-to_json
-----------
-"\uabcd"
-(1 row)
--- any other backslash is escaped
-select to_json(text '\abcd');
-to_json
-----------
-"\\abcd"
-(1 row)
 --json_agg
 SELECT json_agg(q)
 FROM ( SELECT $$a$$ || x AS b, y AS c,
@@ -1400,6 +1386,36 @@ ERROR: invalid input syntax for type json
 DETAIL: Unicode low surrogate must follow a high surrogate.
 CONTEXT: JSON data, line 1: { "a":...
 --handling of simple unicode escapes
+select json '{ "a": "the Copyright \u00a9 sign" }' as correct_in_utf8;
+correct_in_utf8
+---------------------------------------
+{ "a": "the Copyright \u00a9 sign" }
+(1 row)
+select json '{ "a": "dollar \u0024 character" }' as correct_everywhere;
+correct_everywhere
+-------------------------------------
+{ "a": "dollar \u0024 character" }
+(1 row)
+select json '{ "a": "dollar \\u0024 character" }' as not_an_escape;
+not_an_escape
+--------------------------------------
+{ "a": "dollar \\u0024 character" }
+(1 row)
+select json '{ "a": "null \u0000 escape" }' as not_unescaped;
+not_unescaped
+--------------------------------
+{ "a": "null \u0000 escape" }
+(1 row)
+select json '{ "a": "null \\u0000 escape" }' as not_an_escape;
+not_an_escape
+---------------------------------
+{ "a": "null \\u0000 escape" }
+(1 row)
 select json '{ "a": "the Copyright \u00a9 sign" }' ->> 'a' as correct_in_utf8;
 correct_in_utf8
 ----------------------
@@ -1412,8 +1428,18 @@ select json '{ "a": "dollar \u0024 character" }' ->> 'a' as correct_everywhere;
 dollar $ character
 (1 row)
-select json '{ "a": "null \u0000 escape" }' ->> 'a' as not_unescaped;
-not_unescaped
+select json '{ "a": "dollar \\u0024 character" }' ->> 'a' as not_an_escape;
+not_an_escape
+-------------------------
+dollar \u0024 character
+(1 row)
+select json '{ "a": "null \u0000 escape" }' ->> 'a' as fails;
+ERROR: unsupported Unicode escape sequence
+DETAIL: \u0000 cannot be converted to text.
+CONTEXT: JSON data, line 1: { "a":...
+select json '{ "a": "null \\u0000 escape" }' ->> 'a' as not_an_escape;
+not_an_escape
 --------------------
 null \u0000 escape
 (1 row)
......
@@ -426,20 +426,6 @@ select to_json(timestamptz '2014-05-28 12:22:35.614298-04');
 (1 row)
 COMMIT;
--- unicode escape - backslash is not escaped
-select to_json(text '\uabcd');
-to_json
-----------
-"\uabcd"
-(1 row)
--- any other backslash is escaped
-select to_json(text '\abcd');
-to_json
-----------
-"\\abcd"
-(1 row)
 --json_agg
 SELECT json_agg(q)
 FROM ( SELECT $$a$$ || x AS b, y AS c,
@@ -1378,7 +1364,7 @@ select * from json_populate_recordset(row('def',99,null)::jpop,'[{"a":[100,200,3
 -- handling of unicode surrogate pairs
 select json '{ "a": "\ud83d\ude04\ud83d\udc36" }' -> 'a' as correct_in_utf8;
-ERROR: invalid input syntax for type json
+ERROR: unsupported Unicode escape sequence
 DETAIL: Unicode escape values cannot be used for code point values above 007F when the server encoding is not UTF8.
 CONTEXT: JSON data, line 1: { "a":...
 select json '{ "a": "\ud83d\ud83d" }' -> 'a'; -- 2 high surrogates in a row
@@ -1398,8 +1384,38 @@ ERROR: invalid input syntax for type json
 DETAIL: Unicode low surrogate must follow a high surrogate.
 CONTEXT: JSON data, line 1: { "a":...
 --handling of simple unicode escapes
+select json '{ "a": "the Copyright \u00a9 sign" }' as correct_in_utf8;
+correct_in_utf8
+---------------------------------------
+{ "a": "the Copyright \u00a9 sign" }
+(1 row)
+select json '{ "a": "dollar \u0024 character" }' as correct_everywhere;
+correct_everywhere
+-------------------------------------
+{ "a": "dollar \u0024 character" }
+(1 row)
+select json '{ "a": "dollar \\u0024 character" }' as not_an_escape;
+not_an_escape
+--------------------------------------
+{ "a": "dollar \\u0024 character" }
+(1 row)
+select json '{ "a": "null \u0000 escape" }' as not_unescaped;
+not_unescaped
+--------------------------------
+{ "a": "null \u0000 escape" }
+(1 row)
+select json '{ "a": "null \\u0000 escape" }' as not_an_escape;
+not_an_escape
+---------------------------------
+{ "a": "null \\u0000 escape" }
+(1 row)
 select json '{ "a": "the Copyright \u00a9 sign" }' ->> 'a' as correct_in_utf8;
-ERROR: invalid input syntax for type json
+ERROR: unsupported Unicode escape sequence
 DETAIL: Unicode escape values cannot be used for code point values above 007F when the server encoding is not UTF8.
 CONTEXT: JSON data, line 1: { "a":...
 select json '{ "a": "dollar \u0024 character" }' ->> 'a' as correct_everywhere;
@@ -1408,8 +1424,18 @@ select json '{ "a": "dollar \u0024 character" }' ->> 'a' as correct_everywhere;
 dollar $ character
 (1 row)
-select json '{ "a": "null \u0000 escape" }' ->> 'a' as not_unescaped;
-not_unescaped
+select json '{ "a": "dollar \\u0024 character" }' ->> 'a' as not_an_escape;
+not_an_escape
+-------------------------
+dollar \u0024 character
+(1 row)
+select json '{ "a": "null \u0000 escape" }' ->> 'a' as fails;
+ERROR: unsupported Unicode escape sequence
+DETAIL: \u0000 cannot be converted to text.
+CONTEXT: JSON data, line 1: { "a":...
+select json '{ "a": "null \\u0000 escape" }' ->> 'a' as not_an_escape;
+not_an_escape
 --------------------
 null \u0000 escape
 (1 row)
......
@@ -60,12 +60,18 @@ LINE 1: SELECT '"\u000g"'::jsonb;
 ^
 DETAIL: "\u" must be followed by four hexadecimal digits.
 CONTEXT: JSON data, line 1: "\u000g...
-SELECT '"\u0000"'::jsonb; -- OK, legal escape
+SELECT '"\u0045"'::jsonb; -- OK, legal escape
 jsonb
-----------
-"\u0000"
+-------
+"E"
 (1 row)
+SELECT '"\u0000"'::jsonb; -- ERROR, we don't support U+0000
+ERROR: unsupported Unicode escape sequence
+LINE 1: SELECT '"\u0000"'::jsonb;
+^
+DETAIL: \u0000 cannot be converted to text.
+CONTEXT: JSON data, line 1: ...
 -- use octet_length here so we don't get an odd unicode char in the
 -- output
 SELECT octet_length('"\uaBcD"'::jsonb::text); -- OK, uppercase and lower case both OK
@@ -324,20 +330,6 @@ select to_jsonb(timestamptz '2014-05-28 12:22:35.614298-04');
 (1 row)
 COMMIT;
--- unicode escape - backslash is not escaped
-select to_jsonb(text '\uabcd');
-to_jsonb
-----------
-"\uabcd"
-(1 row)
--- any other backslash is escaped
-select to_jsonb(text '\abcd');
-to_jsonb
-----------
-"\\abcd"
-(1 row)
 --jsonb_agg
 CREATE TEMP TABLE rows AS
 SELECT x, 'txt' || x as y
@@ -1971,20 +1963,62 @@ LINE 1: SELECT jsonb '{ "a": "\ude04X" }' -> 'a';
 DETAIL: Unicode low surrogate must follow a high surrogate.
 CONTEXT: JSON data, line 1: { "a":...
 -- handling of simple unicode escapes
-SELECT jsonb '{ "a": "the Copyright \u00a9 sign" }' ->> 'a' AS correct_in_utf8;
+SELECT jsonb '{ "a": "the Copyright \u00a9 sign" }' as correct_in_utf8;
+correct_in_utf8
+-------------------------------
+{"a": "the Copyright © sign"}
+(1 row)
+SELECT jsonb '{ "a": "dollar \u0024 character" }' as correct_everywhere;
+correct_everywhere
+-----------------------------
+{"a": "dollar $ character"}
+(1 row)
+SELECT jsonb '{ "a": "dollar \\u0024 character" }' as not_an_escape;
+not_an_escape
+-----------------------------------
+{"a": "dollar \\u0024 character"}
+(1 row)
+SELECT jsonb '{ "a": "null \u0000 escape" }' as fails;
+ERROR: unsupported Unicode escape sequence
+LINE 1: SELECT jsonb '{ "a": "null \u0000 escape" }' as fails;
+^
+DETAIL: \u0000 cannot be converted to text.
+CONTEXT: JSON data, line 1: { "a":...
+SELECT jsonb '{ "a": "null \\u0000 escape" }' as not_an_escape;
+not_an_escape
+------------------------------
+{"a": "null \\u0000 escape"}
+(1 row)
+SELECT jsonb '{ "a": "the Copyright \u00a9 sign" }' ->> 'a' as correct_in_utf8;
 correct_in_utf8
 ----------------------
 the Copyright © sign
 (1 row)
-SELECT jsonb '{ "a": "dollar \u0024 character" }' ->> 'a' AS correct_everyWHERE;
+SELECT jsonb '{ "a": "dollar \u0024 character" }' ->> 'a' as correct_everywhere;
 correct_everywhere
 --------------------
 dollar $ character
 (1 row)
-SELECT jsonb '{ "a": "null \u0000 escape" }' ->> 'a' AS not_unescaped;
-not_unescaped
+SELECT jsonb '{ "a": "dollar \\u0024 character" }' ->> 'a' as not_an_escape;
+not_an_escape
+-------------------------
+dollar \u0024 character
+(1 row)
+SELECT jsonb '{ "a": "null \u0000 escape" }' ->> 'a' as fails;
+ERROR: unsupported Unicode escape sequence
+LINE 1: SELECT jsonb '{ "a": "null \u0000 escape" }' ->> 'a' as fai...
+^
+DETAIL: \u0000 cannot be converted to text.
+CONTEXT: JSON data, line 1: { "a":...
+SELECT jsonb '{ "a": "null \\u0000 escape" }' ->> 'a' as not_an_escape;
+not_an_escape
 --------------------
 null \u0000 escape
 (1 row)
......
@@ -60,16 +60,22 @@ LINE 1: SELECT '"\u000g"'::jsonb;
 ^
 DETAIL: "\u" must be followed by four hexadecimal digits.
 CONTEXT: JSON data, line 1: "\u000g...
-SELECT '"\u0000"'::jsonb; -- OK, legal escape
+SELECT '"\u0045"'::jsonb; -- OK, legal escape
 jsonb
-----------
-"\u0000"
+-------
+"E"
 (1 row)
+SELECT '"\u0000"'::jsonb; -- ERROR, we don't support U+0000
+ERROR: unsupported Unicode escape sequence
+LINE 1: SELECT '"\u0000"'::jsonb;
+^
+DETAIL: \u0000 cannot be converted to text.
+CONTEXT: JSON data, line 1: ...
 -- use octet_length here so we don't get an odd unicode char in the
 -- output
 SELECT octet_length('"\uaBcD"'::jsonb::text); -- OK, uppercase and lower case both OK
-ERROR: invalid input syntax for type json
+ERROR: unsupported Unicode escape sequence
 LINE 1: SELECT octet_length('"\uaBcD"'::jsonb::text);
 ^
 DETAIL: Unicode escape values cannot be used for code point values above 007F when the server encoding is not UTF8.
@@ -324,20 +330,6 @@ select to_jsonb(timestamptz '2014-05-28 12:22:35.614298-04');
 (1 row)
 COMMIT;
--- unicode escape - backslash is not escaped
-select to_jsonb(text '\uabcd');
-to_jsonb
-----------
-"\uabcd"
-(1 row)
--- any other backslash is escaped
-select to_jsonb(text '\abcd');
-to_jsonb
-----------
-"\\abcd"
-(1 row)
 --jsonb_agg
 CREATE TEMP TABLE rows AS
 SELECT x, 'txt' || x as y
@@ -1941,7 +1933,7 @@ SELECT * FROM jsonb_populate_recordset(row('def',99,NULL)::jbpop,'[{"a":[100,200
 -- handling of unicode surrogate pairs
 SELECT octet_length((jsonb '{ "a": "\ud83d\ude04\ud83d\udc36" }' -> 'a')::text) AS correct_in_utf8;
-ERROR: invalid input syntax for type json
+ERROR: unsupported Unicode escape sequence
 LINE 1: SELECT octet_length((jsonb '{ "a": "\ud83d\ude04\ud83d\udc3...
 ^
 DETAIL: Unicode escape values cannot be used for code point values above 007F when the server encoding is not UTF8.
@@ -1971,20 +1963,62 @@ LINE 1: SELECT jsonb '{ "a": "\ude04X" }' -> 'a';
 DETAIL: Unicode low surrogate must follow a high surrogate.
 CONTEXT: JSON data, line 1: { "a":...
 -- handling of simple unicode escapes
-SELECT jsonb '{ "a": "the Copyright \u00a9 sign" }' ->> 'a' AS correct_in_utf8;
-ERROR: invalid input syntax for type json
+SELECT jsonb '{ "a": "the Copyright \u00a9 sign" }' as correct_in_utf8;
+ERROR: unsupported Unicode escape sequence
+LINE 1: SELECT jsonb '{ "a": "the Copyright \u00a9 sign" }' as corr...
+^
+DETAIL: Unicode escape values cannot be used for code point values above 007F when the server encoding is not UTF8.
+CONTEXT: JSON data, line 1: { "a":...
+SELECT jsonb '{ "a": "dollar \u0024 character" }' as correct_everywhere;
+correct_everywhere
+-----------------------------
+{"a": "dollar $ character"}
+(1 row)
+SELECT jsonb '{ "a": "dollar \\u0024 character" }' as not_an_escape;
+not_an_escape
+-----------------------------------
+{"a": "dollar \\u0024 character"}
+(1 row)
+SELECT jsonb '{ "a": "null \u0000 escape" }' as fails;
+ERROR: unsupported Unicode escape sequence
+LINE 1: SELECT jsonb '{ "a": "null \u0000 escape" }' as fails;
+^
+DETAIL: \u0000 cannot be converted to text.
+CONTEXT: JSON data, line 1: { "a":...
+SELECT jsonb '{ "a": "null \\u0000 escape" }' as not_an_escape;
+not_an_escape
+------------------------------
+{"a": "null \\u0000 escape"}
+(1 row)
+SELECT jsonb '{ "a": "the Copyright \u00a9 sign" }' ->> 'a' as correct_in_utf8;
+ERROR: unsupported Unicode escape sequence
 LINE 1: SELECT jsonb '{ "a": "the Copyright \u00a9 sign" }' ->> 'a'...
 ^
 DETAIL: Unicode escape values cannot be used for code point values above 007F when the server encoding is not UTF8.
 CONTEXT: JSON data, line 1: { "a":...
-SELECT jsonb '{ "a": "dollar \u0024 character" }' ->> 'a' AS correct_everyWHERE;
+SELECT jsonb '{ "a": "dollar \u0024 character" }' ->> 'a' as correct_everywhere;
 correct_everywhere
 --------------------
 dollar $ character
 (1 row)
-SELECT jsonb '{ "a": "null \u0000 escape" }' ->> 'a' AS not_unescaped;
-not_unescaped
+SELECT jsonb '{ "a": "dollar \\u0024 character" }' ->> 'a' as not_an_escape;
+not_an_escape
+-------------------------
+dollar \u0024 character
+(1 row)
+SELECT jsonb '{ "a": "null \u0000 escape" }' ->> 'a' as fails;
+ERROR: unsupported Unicode escape sequence
+LINE 1: SELECT jsonb '{ "a": "null \u0000 escape" }' ->> 'a' as fai...
+^
+DETAIL: \u0000 cannot be converted to text.
+CONTEXT: JSON data, line 1: { "a":...
+SELECT jsonb '{ "a": "null \\u0000 escape" }' ->> 'a' as not_an_escape;
+not_an_escape
 --------------------
 null \u0000 escape
 (1 row)
......
@@ -111,14 +111,6 @@ SET LOCAL TIME ZONE -8;
 select to_json(timestamptz '2014-05-28 12:22:35.614298-04');
 COMMIT;
--- unicode escape - backslash is not escaped
-select to_json(text '\uabcd');
--- any other backslash is escaped
-select to_json(text '\abcd');
 --json_agg
 SELECT json_agg(q)
@@ -401,9 +393,17 @@ select json '{ "a": "\ude04X" }' -> 'a'; -- orphan low surrogate
 --handling of simple unicode escapes
+select json '{ "a": "the Copyright \u00a9 sign" }' as correct_in_utf8;
+select json '{ "a": "dollar \u0024 character" }' as correct_everywhere;
+select json '{ "a": "dollar \\u0024 character" }' as not_an_escape;
+select json '{ "a": "null \u0000 escape" }' as not_unescaped;
+select json '{ "a": "null \\u0000 escape" }' as not_an_escape;
 select json '{ "a": "the Copyright \u00a9 sign" }' ->> 'a' as correct_in_utf8;
 select json '{ "a": "dollar \u0024 character" }' ->> 'a' as correct_everywhere;
-select json '{ "a": "null \u0000 escape" }' ->> 'a' as not_unescaped;
+select json '{ "a": "dollar \\u0024 character" }' ->> 'a' as not_an_escape;
+select json '{ "a": "null \u0000 escape" }' ->> 'a' as fails;
+select json '{ "a": "null \\u0000 escape" }' ->> 'a' as not_an_escape;
 --json_typeof() function
 select value, json_typeof(value)
......
@@ -10,7 +10,8 @@ SELECT '"\v"'::jsonb; -- ERROR, not a valid JSON escape
 SELECT '"\u"'::jsonb; -- ERROR, incomplete escape
 SELECT '"\u00"'::jsonb; -- ERROR, incomplete escape
 SELECT '"\u000g"'::jsonb; -- ERROR, g is not a hex digit
-SELECT '"\u0000"'::jsonb; -- OK, legal escape
+SELECT '"\u0045"'::jsonb; -- OK, legal escape
+SELECT '"\u0000"'::jsonb; -- ERROR, we don't support U+0000
 -- use octet_length here so we don't get an odd unicode char in the
 -- output
 SELECT octet_length('"\uaBcD"'::jsonb::text); -- OK, uppercase and lower case both OK
@@ -73,14 +74,6 @@ SET LOCAL TIME ZONE -8;
 select to_jsonb(timestamptz '2014-05-28 12:22:35.614298-04');
 COMMIT;
--- unicode escape - backslash is not escaped
-select to_jsonb(text '\uabcd');
--- any other backslash is escaped
-select to_jsonb(text '\abcd');
 --jsonb_agg
 CREATE TEMP TABLE rows AS
@@ -488,9 +481,18 @@ SELECT jsonb '{ "a": "\ud83dX" }' -> 'a'; -- orphan high surrogate
 SELECT jsonb '{ "a": "\ude04X" }' -> 'a'; -- orphan low surrogate
 -- handling of simple unicode escapes
-SELECT jsonb '{ "a": "the Copyright \u00a9 sign" }' ->> 'a' AS correct_in_utf8;
-SELECT jsonb '{ "a": "dollar \u0024 character" }' ->> 'a' AS correct_everyWHERE;
-SELECT jsonb '{ "a": "null \u0000 escape" }' ->> 'a' AS not_unescaped;
+SELECT jsonb '{ "a": "the Copyright \u00a9 sign" }' as correct_in_utf8;
+SELECT jsonb '{ "a": "dollar \u0024 character" }' as correct_everywhere;
+SELECT jsonb '{ "a": "dollar \\u0024 character" }' as not_an_escape;
+SELECT jsonb '{ "a": "null \u0000 escape" }' as fails;
+SELECT jsonb '{ "a": "null \\u0000 escape" }' as not_an_escape;
+SELECT jsonb '{ "a": "the Copyright \u00a9 sign" }' ->> 'a' as correct_in_utf8;
+SELECT jsonb '{ "a": "dollar \u0024 character" }' ->> 'a' as correct_everywhere;
+SELECT jsonb '{ "a": "dollar \\u0024 character" }' ->> 'a' as not_an_escape;
+SELECT jsonb '{ "a": "null \u0000 escape" }' ->> 'a' as fails;
+SELECT jsonb '{ "a": "null \\u0000 escape" }' ->> 'a' as not_an_escape;
 -- jsonb_to_record and jsonb_to_recordset
......