    Tom Lane authored · dbaec70c
    Rename and slightly redefine the default text search parser's "word"
    categories, as per discussion.  asciiword (formerly lword) is still
    ASCII-letters-only, and numword (formerly word) is still the most general
    mixed-alpha-and-digits case.  But word (formerly nlword) is now
    any-group-of-letters-with-at-least-one-non-ASCII, rather than all-non-ASCII as
    before.  This is no worse than before for parsing mixed Russian/English text,
    which seems to have been the design center for the original coding; and it
    should simplify matters for parsing most European languages.  In particular
    it will not be necessary for any language to accept strings containing digits
    as being regular "words".  The hyphenated-word categories are adjusted
    similarly.
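
The relationship between the three renamed categories can be illustrated with a rough sketch. This is Python, not the parser's actual C code, and it is a simplification: the function name `classify` and the catch-all `"other"` result are assumptions made for illustration, and hyphenated words, numbers, and every other token type the real parser recognizes are deliberately not modeled.

```python
def classify(token: str) -> str:
    """Simplified sketch of the renamed word categories (not PostgreSQL code)."""
    # Anything containing punctuation or whitespace, and digits-only tokens,
    # fall outside the three word categories; the real parser has separate
    # token types for them, which this sketch lumps into "other" (assumption).
    if not token or not token.isalnum() or token.isdigit():
        return "other"
    if token.isalpha():
        # Letters only: asciiword if every letter is ASCII,
        # word if at least one letter is non-ASCII.
        return "asciiword" if token.isascii() else "word"
    # Letters mixed with digits: the most general case.
    return "numword"


if __name__ == "__main__":
    for sample in ["hello", "caf\u00e9", "beta1", "foo-bar"]:
        print(sample, classify(sample))
```

Under these assumptions, a token mixing ASCII and non-ASCII letters lands in `word`, where the previous definition required every letter to be non-ASCII.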
tsdicts.out 8.49 KB