    Avoid using %c printf format for potentially non-ASCII characters. · 16e3ad5d
    Tom Lane authored
    Since %c only passes a C "char" to printf, it's incapable of dealing
    with multibyte characters.  Passing just the first byte of such a
    character leads to an output string that is visibly not correctly
    encoded, resulting in undesirable behavior such as encoding conversion
    failures while sending error messages to clients.
    
    We've lived with this issue for a long time because it was inconvenient
    to avoid in a portable fashion.  However, now that we always use our own
    snprintf code, it's reasonable to use the %.*s format to print just one
    possibly-multibyte character in a string.  (We previously avoided that
    obvious-looking answer in order to work around glibc's bug #6530, cf
    commits 54cd4f04 and ed437e2b.)
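
    As a rough illustration of the difference, here is a minimal standalone
    sketch.  It is not code from this commit.  The helper names
    (utf8_char_len, char_len_clamped, format_bad_char,
    format_bad_char_percent_c) and the message text are hypothetical, and
    the sketch assumes UTF-8 input and calls the C library's snprintf so it
    can be compiled on its own.  In the server, the character length would
    come from an encoding-aware routine such as pg_mblen().

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical stand-in for an encoding-aware length routine: returns the
 * byte length of the UTF-8 character starting at s, judged from its lead
 * byte.  Assumes the input is valid UTF-8.
 */
static int
utf8_char_len(const char *s)
{
    unsigned char c;

    assert(s != NULL);
    c = (unsigned char) *s;

    if (c < 0x80)
        return 1;
    if ((c & 0xE0) == 0xC0)
        return 2;
    if ((c & 0xF0) == 0xE0)
        return 3;
    if ((c & 0xF8) == 0xF0)
        return 4;
    return 1;                   /* invalid lead byte: treat as one byte */
}

/* Same as above, but never counts bytes past the string terminator. */
static int
char_len_clamped(const char *s)
{
    int         len = utf8_char_len(s);
    int         i;

    for (i = 0; i < len && s[i] != '\0'; i++)
        ;
    return i;
}

/* Old style: %c passes only the first byte of the character at p. */
static void
format_bad_char_percent_c(char *buf, size_t buflen, const char *p)
{
    snprintf(buf, buflen, "invalid character \"%c\"", *p);
}

/* New style: %.*s prints every byte of the one character at p. */
static void
format_bad_char(char *buf, size_t buflen, const char *p)
{
    snprintf(buf, buflen, "invalid character \"%.*s\"",
             char_len_clamped(p), p);
}
```

    With a two-byte character at p, the %c variant emits only its lead
    byte, which leaves the message invalidly encoded, while the %.*s
    variant emits the whole character and nothing after it.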
    
    Hence, run around and fix a bunch of places that used %c to report
    a character found in a user-supplied string.  For simplicity, I did
    not touch places that were emitting non-user-facing debug messages,
    or reporting catalog data that should always be ASCII.  (It's also
    unclear how useful this approach could be in frontend code, where
    it's less certain that we know what encoding we're dealing with.)
    
    In passing, improve a couple of poorly-written error messages in
    pageinspect/heapfuncs.c.
    
    This is a longstanding issue, but I'm hesitant to back-patch because
    of the impact on translatable message strings.  In any case this fix
    would not work reliably before v12.
    
    Tom Lane and Quan Zongliang
    
    Discussion: https://postgr.es/m/a120087c-4c88-d9d4-1ec5-808d7a7f133d@gmail.com