    Fix buffer overrun in unicode string normalization with empty input · b609db71
    Michael Paquier authored
    PostgreSQL 13 and newer versions are directly impacted by this through
    the SQL function normalize(), which would cause a call of this function
    to write one byte past its allocation when given an empty string as
    input, while recomposing the string with NFC or NFKC.  Older versions
    (v10~v12) are not directly affected by this problem, as the only code
    path using normalization is SASLprep in SCRAM authentication, which
    forbids the case of an empty string, but let's make the code more robust
    there anyway so that any out-of-core callers of this function are covered.
    
    The solution chosen to fix this issue is simple: add a fast-exit path
    if the decomposed string is found to be empty.  This can only happen
    for an empty input string, because at its lowest level a codepoint is
    decomposed as itself if it has no entry in the decomposition table or
    if it has a decomposition size of 0.
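
    As a rough illustration of the fast-exit idea, here is a minimal,
    self-contained sketch.  It is not the code of this commit: the names
    codepoint, cp_len() and toy_normalize() are hypothetical, the
    decomposition is modeled as the identity, and the recomposition only
    copies code points.  It assumes a recomposition step that seeds its
    output with the first decomposed code point before looping, which is
    the shape that needs an early return when there is nothing to seed
    from.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a code point type. */
typedef uint32_t codepoint;

/* Length of a zero-terminated code point array. */
static size_t
cp_len(const codepoint *s)
{
	size_t		n = 0;

	while (s[n] != 0)
		n++;
	return n;
}

/*
 * Toy model only: "decomposition" maps every code point to itself and
 * "recomposition" copies the decomposed array.  Returns a malloc'd,
 * zero-terminated array, or NULL on allocation failure.
 */
static codepoint *
toy_normalize(const codepoint *input)
{
	size_t		decomp_size;
	size_t		count;
	size_t		target_pos;
	codepoint  *decomp;
	codepoint  *recomp;

	assert(input != NULL);

	/* Identity decomposition: same length as the input. */
	decomp_size = cp_len(input);
	decomp = malloc((decomp_size + 1) * sizeof(codepoint));
	if (decomp == NULL)
		return NULL;
	for (count = 0; count < decomp_size; count++)
		decomp[count] = input[count];
	decomp[decomp_size] = 0;

	/* Fast exit: an empty decomposed string has nothing to recompose. */
	if (decomp_size == 0)
		return decomp;

	recomp = malloc((decomp_size + 1) * sizeof(codepoint));
	if (recomp == NULL)
	{
		free(decomp);
		return NULL;
	}

	/*
	 * Seed the output with the first code point.  Without the fast exit,
	 * an empty input would get a one-element allocation here, and the
	 * terminator written at target_pos == 1 would land past its end.
	 */
	recomp[0] = decomp[0];
	target_pos = 1;
	for (count = 1; count < decomp_size; count++)
		recomp[target_pos++] = decomp[count];
	recomp[target_pos] = 0;

	free(decomp);
	return recomp;
}
```

    In this sketch the early return hands back the already-terminated
    decomposed buffer, so the caller frees a single allocation in both
    the empty and non-empty cases.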
    
    Some tests are added to cover this issue in v13~.  Note that an empty
    string has always been considered as normalized (grammar "IS NF[K]{C,D}
    NORMALIZED", through the SQL function is_normalized()) for all the
    operations allowed (NFC, NFD, NFKC and NFKD) since this feature was
    introduced in 2991ac5f.  This behavior is unchanged, but some tests are
    added in v13~ to check it.
    
    I have also checked "make normalization-check" in src/common/unicode/
    while at it (it works in v13~, and breaks in older stable branches
    independently of this commit).
    
    The release notes should just mention this commit for v13~.
    
    Reported-by: Matthijs van der Vleuten
    Discussion: https://postgr.es/m/17277-0c527a373794e802@postgresql.org
    Backpatch-through: 10
unicode_norm.c 15.4 KB