    Reduce size of backend scanner's tables. · 7f380c59
    Tom Lane authored
    Previously, the core scanner's yy_transition[] array had 37045 elements.
    Since that number is larger than INT16_MAX, Flex generated the array to
    contain 32-bit integers.  By reimplementing some of the bulkier scanner
    rules, this patch reduces the array to 20495 elements.  The much smaller
    total length, combined with the consequent use of 16-bit integers for
    the array elements, reduces the binary size by over 200kB.  This was
    accomplished in two ways:
    
    1. Consolidate handling of quote continuations into a new start condition,
    rather than duplicating that logic for five different string types.
    
    2. Treat Unicode strings and identifiers followed by a UESCAPE sequence
    as three separate tokens, rather than one.  The logic to de-escape
    Unicode strings is moved to the filter code in parser.c, which already
    had the ability to provide special processing for token sequences.
    While we could have implemented the conversion in the grammar, that
    approach was rejected for performance and maintainability reasons.
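The size figures quoted above can be checked with a little arithmetic. The sketch below is only an illustration: it assumes that each yy_transition[] entry holds two integers of equal width (a "verify" value and a "next" value), which is how Flex's full-table output is commonly laid out. The function names are hypothetical and do not come from the patch.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Assumption: the entry width is chosen by whether the element count
 * fits in a signed 16-bit integer, as the commit message describes.
 */
static size_t
entry_width_bytes(size_t n_elements)
{
    return (n_elements > (size_t) INT16_MAX) ? sizeof(int32_t)
                                             : sizeof(int16_t);
}

/*
 * Assumption: every table entry is a pair of integers of that width.
 */
static size_t
table_bytes(size_t n_elements)
{
    return n_elements * 2 * entry_width_bytes(n_elements);
}

/* Bytes saved when shrinking the table from old_n to new_n elements. */
static size_t
table_savings(size_t old_n, size_t new_n)
{
    return table_bytes(old_n) - table_bytes(new_n);
}
```

Under these assumptions, table_savings(37045, 20495) comes to 214380 bytes, which is consistent with the "over 200kB" reduction reported for the binary.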
    
    Performance in microbenchmarks of raw parsing seems equal or slightly
    faster in most cases, and it's reasonable to expect that in real-world
    usage (with more competition for the CPU cache) there will be a larger
    win.  The exception is UESCAPE sequences; lexing those is about 10%
    slower, primarily because the scanner now has to be called three times
    rather than one.  This seems acceptable since that feature is very
    rarely used.
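To make the three-token approach concrete, here is a minimal sketch of a filter that sits between scanner and grammar and folds a Unicode string, a UESCAPE keyword, and the escape-character string into one ordinary string token. It is not the code in parser.c: the token kinds, struct names, and the array-backed stand-in scanner are all hypothetical, hex digits are not validated, and only code points up to U+FFFF are handled.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical token kinds; the real parser's names may differ. */
typedef enum { T_EOF, T_SCONST, T_USCONST, T_UESCAPE, T_OTHER } TokKind;

typedef struct { TokKind kind; char text[64]; } Token;

/* Stand-in scanner: hands out tokens from an array and counts calls. */
typedef struct { const Token *toks; size_t pos; int calls; } Scanner;

static Token
scan(Scanner *s)
{
    Token t = s->toks[s->pos];

    s->calls++;
    if (t.kind != T_EOF)
        s->pos++;
    return t;
}

/* Expand <esc>XXXX (four hex digits) to UTF-8; <esc><esc> is a literal. */
static void
unescape(const char *in, char esc, char *out, size_t outsz)
{
    size_t o = 0;

    while (*in && o + 4 < outsz)
    {
        if (in[0] == esc && in[1] == esc)
        {
            out[o++] = esc;
            in += 2;
        }
        else if (in[0] == esc && strlen(in) >= 5)
        {
            char hex[5];
            unsigned long cp;

            memcpy(hex, in + 1, 4);
            hex[4] = '\0';
            cp = strtoul(hex, NULL, 16);    /* sketch: no validation */
            if (cp < 0x80)
                out[o++] = (char) cp;
            else if (cp < 0x800)
            {
                out[o++] = (char) (0xC0 | (cp >> 6));
                out[o++] = (char) (0x80 | (cp & 0x3F));
            }
            else
            {
                out[o++] = (char) (0xE0 | (cp >> 12));
                out[o++] = (char) (0x80 | ((cp >> 6) & 0x3F));
                out[o++] = (char) (0x80 | (cp & 0x3F));
            }
            in += 5;
        }
        else
            out[o++] = *in++;
    }
    out[o] = '\0';
}

/* Filter state: one token of lookahead that can be pushed back. */
typedef struct { Scanner *scanner; Token lookahead; int have_lookahead; } Filter;

static Token
filter_next(Filter *f)
{
    Token cur, next, out = { T_SCONST, "" };
    char esc = '\\';            /* default escape character */

    if (f->have_lookahead)
    {
        cur = f->lookahead;
        f->have_lookahead = 0;
    }
    else
        cur = scan(f->scanner);

    if (cur.kind != T_USCONST)
        return cur;

    next = scan(f->scanner);
    if (next.kind == T_UESCAPE)
    {
        Token arg = scan(f->scanner);   /* third scanner call */

        /* Sketch: a real filter would raise an error on a bad argument. */
        if (arg.kind == T_SCONST && strlen(arg.text) == 1)
            esc = arg.text[0];
    }
    else
    {
        f->lookahead = next;            /* not ours; hand it back later */
        f->have_lookahead = 1;
    }
    unescape(cur.text, esc, out.text, sizeof(out.text));
    return out;
}
```

In this sketch a Unicode string followed by UESCAPE costs three calls to scan() before the grammar sees a single token, which mirrors the source of the slowdown described above.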
    
    The psql and ecpg lexers are likewise modified, primarily because we
    want to keep them all in sync.  Since those lexers don't use the
    space-hogging -CF option, the space savings are much smaller, but
    still good for perhaps 10kB apiece.
    
    While at it, merge the ecpg lexer's handling of C-style comments used
    in SQL and in C.  Those have different rules regarding nested comments,
    but since we already have the ability to keep track of the previous
    start condition, we can use that to handle both cases within a single
    start condition.  This matches the core scanner more closely.
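The idea of one comment state serving both contexts can be sketched as a single routine that consults the previous start condition to decide whether nested openers count. This is an illustration in plain C rather than the Flex rules from the patch. It assumes the usual behavior that SQL block comments nest while C comments end at the first terminator, and the enum and function names are hypothetical.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the lexer's saved start condition. */
typedef enum { STATE_SQL, STATE_C } StartCond;

/*
 * s must begin with a comment opener (slash, star).  Returns the number
 * of characters consumed by the whole comment, or 0 if it is unterminated.
 */
static size_t
skip_comment(const char *s, StartCond prev)
{
    size_t i = 2;       /* skip the opening slash-star */
    int depth = 1;

    while (s[i] != '\0')
    {
        if (s[i] == '*' && s[i + 1] == '/')
        {
            i += 2;
            if (--depth == 0)
                return i;
        }
        else if (prev == STATE_SQL && s[i] == '/' && s[i + 1] == '*')
        {
            /* Only the SQL context treats an inner opener as nesting. */
            depth++;
            i += 2;
        }
        else
            i++;
    }
    return 0;
}
```

The only difference between the two contexts is the single test on prev, which is why one start condition can cover both cases once the previous condition is remembered.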
    
    John Naylor
    
    Discussion: https://postgr.es/m/CACPNZCvaoa3EgVWm5yZhcSTX6RAtaLgniCPcBVOCwm8h3xpWkw@mail.gmail.com