    Sync our Snowball stemmer dictionaries with current upstream. · fd582317
    Tom Lane authored
    We haven't touched these since text search functionality landed in core
    in 2007 :-(.  While the upstream project isn't a beehive of activity,
    they do make additions and bug fixes from time to time.  Update our
    copies of these files.
    
    Also update our documentation about how to keep things in sync, since
    they're not making distribution tarballs these days.  Fortunately,
    their source code turns out to be a breeze to build.
    
    Notable changes:
    
    * The non-UTF8 version of the hungarian stemmer now works in LATIN2,
    not LATIN1.
    
    * New stemmers have appeared for arabic, indonesian, irish, lithuanian,
    nepali, and tamil.  These all work in UTF8, and the indonesian and
    irish ones also work in LATIN1.
    
    (There are some new stemmers that I did not incorporate, mainly because
    their names don't match the underlying languages, suggesting that they're
    not to be considered mainstream.)
    
    Worth noting: the upstream Nepali dictionary was contributed by
    Arthur Zakirov.
    
    initdb forced because the contents of snowball_create.sql have
    changed.
    
    Still TODO: see about updating the stopword lists.
    
    Arthur Zakirov, minor mods and doc work by me
    
    Discussion: https://postgr.es/m/20180626122025.GA12647@zakirov.localdomain
    Discussion: https://postgr.es/m/20180219140849.GA9050@zakirov.localdomain
stem_UTF_8_finnish.c 25.5 KB