    Avoid doing encoding conversions by double-conversion via MULE_INTERNAL. · 8d32717b
    Tom Lane authored
    Previously, we did many conversions for Cyrillic and Central European
    single-byte encodings by converting to a related MULE_INTERNAL coding
    scheme before converting to the destination.  This seems unnecessarily
    inefficient.  Moreover, if the conversion encounters an untranslatable
    character, the error message will confusingly complain about failure
    to convert to or from MULE_INTERNAL, rather than the user-visible
    encodings.  Worse still, this approach results in some completely
    unnecessary conversion failures; there are cases where the chosen
    MULE subset lacks characters that exist in both of the user-visible
    encodings, causing a conversion failure that need not occur.
    
    This patch fixes the first two of those deficiencies by introducing
    a new local2local() conversion support subroutine for direct conversion
    between any two single-byte character sets, and adding new conversion
    tables where needed.  However, I generated the new conversion tables by
    testing PG 9.5's behavior, so that the actual conversion behavior is
    bug-compatible with previous releases; the only user-visible behavior
    change is that the error messages for conversion failures are saner.
    Changes in the conversion behavior will probably ensue after discussion.
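
    A minimal sketch of the table-driven idea behind direct single-byte
    conversion follows. It is not the actual local2local() from conv.c: the
    name sketch_local2local, its signature, the return-code convention, and
    the bad_pos parameter are all assumptions made for illustration. The
    sketch assumes bytes below 0x80 are ASCII and pass through unchanged,
    that a 128-entry table maps each high-bit source byte to its destination
    byte, and that a zero entry means "no equivalent".

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_HIGHBIT 0x80

/*
 * Hypothetical illustration only, not PostgreSQL's local2local().
 *
 * Converts len bytes from src into dst with a single table lookup per
 * high-bit byte. tab has 128 entries, indexed by (byte - 0x80); an entry
 * of 0 marks a character with no equivalent in the destination encoding.
 *
 * Returns 0 on success. On an untranslatable byte it stops, stores the
 * offending offset in *bad_pos (if bad_pos is not NULL), and returns -1,
 * so the caller can report the failure in terms of the two user-visible
 * encodings rather than an intermediate one.
 */
int
sketch_local2local(const unsigned char *src, unsigned char *dst, size_t len,
                   const unsigned char *tab, size_t *bad_pos)
{
    size_t i;

    assert(src != NULL && dst != NULL && tab != NULL);

    for (i = 0; i < len; i++)
    {
        unsigned char c = src[i];

        if (c < SKETCH_HIGHBIT)
        {
            /* ASCII range: identical in both encodings, copy as-is */
            dst[i] = c;
        }
        else
        {
            unsigned char mapped = tab[c - SKETCH_HIGHBIT];

            if (mapped == 0)
            {
                /* no direct equivalent: report where conversion failed */
                if (bad_pos != NULL)
                    *bad_pos = i;
                return -1;
            }
            dst[i] = mapped;
        }
    }
    return 0;
}
```

    Because the output is always exactly as long as the input, the caller
    only needs a destination buffer of len bytes. With one table per
    encoding pair, no intermediate representation is involved, which is
    why a failure can be attributed directly to the source and destination
    encodings.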
    
    Interestingly, although this approach requires more tables, the .so files
    actually end up smaller (at least on my x86_64 machine); the tables are
    smaller than the management code needed for double conversion.
    
    Per a complaint from Albe Laurenz.
conv.c 15.1 KB