    Allow complemented character class escapes within regex brackets. · 2a0af7fe
    Tom Lane authored
    The complement-class escapes \D, \S, \W are now allowed within
    bracket expressions.  There is no semantic difficulty with doing
    that, but the rather hokey macro-expansion-based implementation
    previously used here couldn't cope.
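
To illustrate why there is no semantic difficulty, a bracket expression can be modeled as a union of member sets, and a complemented class is just one more member. The sketch below is a toy model in Python, not PostgreSQL code: the helper names are hypothetical, and `str.isdigit` only approximates the `digit` class.

```python
# Toy model (assumption: not how the regex engine is implemented).
# A bracket expression matches a character if ANY of its members does.

def is_digit(ch):
    """Approximation of the 'digit' class using Python's str.isdigit."""
    return ch.isdigit()

def complement(pred):
    """Model of a complement-class escape such as \\D (not a digit)."""
    return lambda ch: not pred(ch)

def literal(c):
    """Model of a single literal character inside brackets."""
    return lambda ch: ch == c

def bracket(*members):
    """Model of [ ... ]: the union of all members."""
    return lambda ch: any(m(ch) for m in members)

# Models "[\D5]": any non-digit, or the literal character "5".
non_digit_or_five = bracket(complement(is_digit), literal("5"))

for ch in "a54":
    print(repr(ch), non_digit_or_five(ch))
```

In this model "a" and "5" are accepted while "4" is rejected, which is the reading one would expect for a bracket containing \D plus a literal.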
    
    Also, invent "word" as an allowed character class name, thus "\w"
    is now equivalent to "[[:word:]]" outside brackets, or "[:word:]"
    within brackets.  POSIX allows such implementation-specific
    extensions, and the same name is used in e.g. bash.
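
The equivalence described above can be written down as a small lookup. This is only a sketch of the notation: `escape_to_class` is a hypothetical helper, and the `\d` and `\s` rows are included on the assumption that they follow the same pattern with the `digit` and `space` class names.

```python
# Hypothetical helper showing the notation only; it performs no matching.
CLASS_NAMES = {
    r"\d": "digit",  # assumed to follow the same pattern as \w
    r"\s": "space",  # assumed to follow the same pattern as \w
    r"\w": "word",   # the class name introduced by this commit
}

def escape_to_class(escape, inside_brackets):
    """Return the POSIX-style spelling equivalent to a class escape."""
    inner = "[:" + CLASS_NAMES[escape] + ":]"
    # Outside brackets the class needs its own enclosing bracket expression.
    return inner if inside_brackets else "[" + inner + "]"

print(escape_to_class(r"\w", inside_brackets=False))  # [[:word:]]
print(escape_to_class(r"\w", inside_brackets=True))   # [:word:]
```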
    
    One surprising compatibility issue this raises is that constructs
    such as "[\w-_]" are now disallowed, as our documentation has always
    said they should be: character classes can't be endpoints of a range.
    Previously, because \w was just a macro for "[:alnum:]_", such a
    construct was read as "[[:alnum:]_-_]", so it was accepted so long as
    the character after "-" was numerically greater than or equal to "_".
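
The old reading described above can be reproduced with plain text substitution. The following is a toy model of that behavior, not the actual lexer code; `old_expand` and `old_range_accepted` are hypothetical names.

```python
# Toy model of the previous macro-expansion behavior (assumption: the real
# lexer worked on its own token stream, not on Python strings).
OLD_W_MACRO = "[:alnum:]_"  # the expansion of \w named in the text above

def old_expand(bracket_body):
    """Textually replace \\w inside the body of a bracket expression."""
    return bracket_body.replace(r"\w", OLD_W_MACRO)

def old_range_accepted(endpoint):
    """After expansion, "_-X" forms a range; it is valid when X >= "_"."""
    return ord(endpoint) >= ord("_")

print("[" + old_expand(r"\w-_") + "]")  # [[:alnum:]_-_]
print(old_range_accepted("_"))          # True: "_" to "_" is a valid range
print(old_range_accepted("."))          # False: "." sorts before "_"
```

Under this model, whether a pattern such as "[\w-X]" was accepted depended only on the code point of X relative to "_", which is why the old behavior looked arbitrary.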
    
    Some implementation cleanup along the way:
    
    * Remove the lexnest() hack, and in consequence clean up wordchrs()
    to not interact with the lexer.
    
    * Fix colorcomplement() to not be O(N^2) in the number of colors
    involved.
    
    * Get rid of useless-as-far-as-I-can-see calls of element()
    on single-character character element names in brackpart().
    element() always maps these to the character itself, and things
    would be quite broken if it didn't --- should "[a]" match something
    different than "a" does?  Besides, the shortcut path in brackpart()
    wasn't doing this anyway, making it even more inconsistent.
    
    Discussion: https://postgr.es/m/2845172.1613674385@sss.pgh.pa.us
    Discussion: https://postgr.es/m/3220564.1613859619@sss.pgh.pa.us
regguts.h 19 KB