Tom Lane authored · 0d323425
    Teach the regular expression functions to do case-insensitive matching and
    locale-dependent character classification properly when the database encoding
    is UTF8.
    
    The previous coding worked okay in single-byte encodings, or in any case for
    ASCII characters, but failed entirely on multibyte characters.  The fix
    assumes that the <wctype.h> functions use Unicode code points as the wchar_t
    representation for Unicode, i.e., that wchar_t matches pg_wchar.
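
The dispatch described above can be sketched roughly as follows. This is an illustration only, not the code from this commit: the names `sketch_wchar`, `sketch_isalpha`, and `sketch_tolower` are hypothetical, the `encoding_is_utf8` flag stands in for however the backend reports the database encoding, and the `sizeof(wchar_t)` guard is an assumption about how to stay safe on platforms with a 16-bit `wchar_t`.

```c
#include <ctype.h>
#include <limits.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical stand-in for pg_wchar: holds one Unicode code point. */
typedef unsigned int sketch_wchar;

/*
 * Classify a character as alphabetic.
 *
 * In UTF8 the code point is handed directly to the <wctype.h> function,
 * which is valid only under the assumption that the platform's wchar_t
 * values are Unicode code points.  Otherwise, fall back to the
 * single-byte <ctype.h> function and report "no" for anything wider.
 */
int
sketch_isalpha(sketch_wchar c, int encoding_is_utf8)
{
    if (encoding_is_utf8 &&
        (sizeof(wchar_t) >= 4 || c <= (sketch_wchar) 0xFFFF))
        return iswalpha((wint_t) c);
    return (c <= (sketch_wchar) UCHAR_MAX && isalpha((unsigned char) c));
}

/*
 * Fold a character to lower case using the same dispatch.  Characters
 * that cannot be handled are returned unchanged.
 */
sketch_wchar
sketch_tolower(sketch_wchar c, int encoding_is_utf8)
{
    if (encoding_is_utf8 &&
        (sizeof(wchar_t) >= 4 || c <= (sketch_wchar) 0xFFFF))
        return (sketch_wchar) towlower((wint_t) c);
    if (c <= (sketch_wchar) UCHAR_MAX)
        return (sketch_wchar) tolower((unsigned char) c);
    return c;
}
```

Results for non-ASCII code points depend on the active LC_CTYPE locale, so a caller would need `setlocale()` to have selected a suitable locale before the wide-character branch does anything useful beyond ASCII.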
    
    This is only a partial solution, since we're still stupid about non-ASCII
    characters in multibyte encodings other than UTF8.  The practical effect
    of that is limited, however, since those cases are generally Far Eastern
    glyphs for which concepts like case-folding don't apply anyway.  Certainly
    all or nearly all of the field reports of problems have been about UTF8.
    A more general solution would require switching to the platform's wchar
    representation for all regex operations, which is possible but would have
    substantial disadvantages.  Let's try this and see if it's sufficient in
    practice.
regcustom.h 2.88 KB