    Use radix tree for character encoding conversions. · aeed17d0
    Heikki Linnakangas authored
    Replace the mapping tables used to convert between UTF-8 and other
    character encodings with new radix tree-based maps. Looking up an entry in
    a radix tree is much faster than a binary search in the old maps. As a
    bonus, the radix tree representation is also more compact, making the
    binaries slightly smaller.
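
    As a rough illustration of why a radix tree lookup avoids the repeated
    comparisons of a binary search, here is a minimal sketch for 2-byte
    input codes. It is a toy, not the layout of the generated .map files:
    the names toy_root, toy_leaves, toy_radix_add and toy_radix_lookup are
    hypothetical, and the fixed block limit is an assumption made to keep
    the example short.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Toy two-level radix map for 2-byte codes (hypothetical, for illustration
 * only; this is not the structure used by the real conversion maps).
 *
 * Level 1: toy_root[lead byte] gives a block number.
 * Level 2: toy_leaves[block][trail byte] gives the converted value.
 * Block 0 is reserved and stays all-zero, so unmapped lead bytes fall
 * through to "no mapping" (0) without any special-case branch.
 */
#define TOY_MAX_BLOCKS 8

uint32_t toy_leaves[TOY_MAX_BLOCKS][256];
uint8_t  toy_root[256];
int      toy_used_blocks = 1;   /* block 0 is the shared empty block */

/* Register one mapping; returns 0 on success, -1 if out of blocks. */
int
toy_radix_add(uint16_t code, uint32_t value)
{
    uint8_t lead = (uint8_t) (code >> 8);
    uint8_t trail = (uint8_t) (code & 0xFF);

    if (toy_root[lead] == 0)
    {
        if (toy_used_blocks >= TOY_MAX_BLOCKS)
            return -1;
        toy_root[lead] = (uint8_t) toy_used_blocks++;
    }
    toy_leaves[toy_root[lead]][trail] = value;
    return 0;
}

/* Lookup is two array indexings, independent of the number of entries. */
uint32_t
toy_radix_lookup(uint16_t code)
{
    return toy_leaves[toy_root[code >> 8]][code & 0xFF];
}
```

    Lead bytes that have no mappings all share the empty block, which is
    one way such a layout can stay compact; a binary search over N sorted
    entries instead needs about log2(N) comparisons per lookup.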
    
    The "combined" maps work the same as before, with binary search. They are
    much smaller than the main tables, so the slower lookup matters little
    there. However,
    the "combined" maps are now stored in the same .map files as the main
    tables. This seems clearer, since they're always used together and are
    generated from the same source files.
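
    The binary search kept for the "combined" maps can be sketched as
    follows, assuming each entry maps a pair of input codes to a single
    output code and that the table is sorted by that pair. The struct
    toy_combined and the function toy_combined_lookup are hypothetical
    names for illustration, not the definitions used in the tree.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical entry: two input codes that together map to one output code. */
typedef struct
{
    uint32_t first;
    uint32_t second;
    uint32_t code;
} toy_combined;

/* Compare the key (first, second) against an entry; <0, 0 or >0. */
static int
toy_combined_cmp(uint32_t first, uint32_t second, const toy_combined *e)
{
    if (first != e->first)
        return (first < e->first) ? -1 : 1;
    if (second != e->second)
        return (second < e->second) ? -1 : 1;
    return 0;
}

/*
 * Binary search over a table sorted by (first, second).
 * Returns the mapped code, or 0 if the pair is not present.
 */
uint32_t
toy_combined_lookup(const toy_combined *map, size_t n,
                    uint32_t first, uint32_t second)
{
    size_t lo = 0;
    size_t hi = n;

    while (lo < hi)
    {
        size_t mid = lo + (hi - lo) / 2;
        int    c = toy_combined_cmp(first, second, &map[mid]);

        if (c == 0)
            return map[mid].code;
        if (c < 0)
            hi = mid;
        else
            lo = mid + 1;
    }
    return 0;
}
```

    For a table with only a handful of entries, the few comparisons this
    loop performs cost very little, which fits the reasoning above for
    leaving these maps as they were.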
    
    Patch by Kyotaro Horiguchi, with a lot of hacking by me at various stages.
    Reviewed by Michael Paquier and Daniel Gustafsson.
    
    Discussion: https://www.postgresql.org/message-id/20170306.171609.204324917.horiguchi.kyotaro%40lab.ntt.co.jp
gbk_to_utf8.map 318 KB