Commit 3df9c374 authored by Robert Haas

Disable abbreviated keys for string-sorting in non-C locales.

Unfortunately, every version of glibc thus far tested has bugs whereby
strcoll() ordering does not match strxfrm() ordering as required by
the standard.  This can result in, for example, corrupted indexes.
Disabling abbreviated keys in these cases slows down non-C-collation
string sorting considerably, but there seems to be no practical
alternative.  Users who are confident that their libc implementations
are solid in this regard can re-enable the optimization by compiling
with TRUST_STRXFRM.

Users who have built indexes using PostgreSQL 9.5 or PostgreSQL 9.5.1
should REINDEX if there is a possibility that they may have been
affected by this problem.

Report by Marc-Olaf Jaschke.  Investigation mostly by Tom Lane, with
help from Peter Geoghegan, Noah Misch, Stephen Frost, and me.  Patch
by me, reviewed by Peter Geoghegan and Tom Lane.
parent 3151f16e
@@ -1832,17 +1832,30 @@ varstr_sortsupport(SortSupport ssup, Oid collid, bool bpchar)
 	}
 
 	/*
-	 * It's possible that there are platforms where the use of abbreviated
-	 * keys should be disabled at compile time. Having only 4 byte datums
-	 * could make worst-case performance drastically more likely, for example.
-	 * Moreover, Darwin's strxfrm() implementations is known to not
-	 * effectively concentrate a significant amount of entropy from the
-	 * original string in earlier transformed blobs. It's possible that other
-	 * supported platforms are similarly encumbered. However, even in those
-	 * cases, the abbreviated keys optimization may win, and if it doesn't,
-	 * the "abort abbreviation" code may rescue us. So, for now, we don't
-	 * disable this anywhere on the basis of performance.
+	 * Unfortunately, it seems that abbreviation for non-C collations is
+	 * broken on many common platforms; testing of multiple versions of glibc
+	 * reveals that, for many locales, strcoll() and strxfrm() do not return
+	 * consistent results, which is fatal to this optimization. While no
+	 * other libc other than Cygwin has so far been shown to have a problem,
+	 * we take the conservative course of action for right now and disable
+	 * this categorically. (Users who are certain this isn't a problem on
+	 * their system can define TRUST_STRXFRM.)
+	 *
+	 * Even apart from the risk of broken locales, it's possible that there
+	 * are platforms where the use of abbreviated keys should be disabled at
+	 * compile time. Having only 4 byte datums could make worst-case
+	 * performance drastically more likely, for example. Moreover, Darwin's
+	 * strxfrm() implementations is known to not effectively concentrate a
+	 * significant amount of entropy from the original string in earlier
+	 * transformed blobs. It's possible that other supported platforms are
+	 * similarly encumbered. So, if we ever get past disabling this
+	 * categorically, we may still want or need to disable it for particular
+	 * platforms.
 	 */
+#ifndef TRUST_STRXFRM
+	if (!collate_c)
+		abbreviate = false;
+#endif
 
 	/*
 	 * If we're using abbreviated keys, or if we're using a locale-aware
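
The invariant this commit depends on is that comparing two strings with strcoll() gives the same ordering as comparing their strxfrm() transformations with strcmp(). The following is a minimal sketch of a check for one pair of strings; the helper names sign_of and strxfrm_matches_strcoll are hypothetical and do not appear in PostgreSQL, and a clean result for a handful of pairs does not show that a libc is safe to build with TRUST_STRXFRM.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Reduce a comparison result to -1, 0, or 1. */
static int
sign_of(int v)
{
	return (v > 0) - (v < 0);
}

/*
 * Hypothetical helper, not part of PostgreSQL: returns true if strcoll()
 * and strxfrm() agree on the relative order of a and b under the current
 * LC_COLLATE setting.  Returns false on disagreement or if memory
 * allocation fails.
 */
static bool
strxfrm_matches_strcoll(const char *a, const char *b)
{
	/* With n == 0, strxfrm() only reports the length it would need. */
	size_t		alen = strxfrm(NULL, a, 0) + 1;
	size_t		blen = strxfrm(NULL, b, 0) + 1;
	char	   *ax = malloc(alen);
	char	   *bx = malloc(blen);
	bool		ok = false;

	if (ax != NULL && bx != NULL)
	{
		strxfrm(ax, a, alen);
		strxfrm(bx, b, blen);
		/* The standard requires both comparisons to have the same sign. */
		ok = sign_of(strcoll(a, b)) == sign_of(strcmp(ax, bx));
	}
	free(ax);
	free(bx);
	return ok;
}
```

To exercise a non-C collation, a caller would first select it with setlocale(LC_COLLATE, ...) and then run the check over many string pairs; any pair for which the function returns false indicates the kind of inconsistency described in the commit message.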