    Improve parser's one-extra-token lookahead mechanism. · d809fd00
    Tom Lane authored
    There are a couple of places in our grammar that fail to be strict LALR(1),
    by requiring more than a single token of lookahead to decide what to do.
    Up to now we've dealt with that by using a filter between the lexer and
    parser that merges adjacent tokens into one in the places where two tokens
    of lookahead are necessary.  But that creates a number of user-visible
    anomalies, for instance that you can't name a CTE "ordinality" because
    "WITH ordinality AS ..." triggers folding of WITH and ORDINALITY into one
    token.  I realized that there's a better way.
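For comparison, the old merge-style filter behaves roughly like the following sketch. It works on a pre-tokenized array with hypothetical TOK_* codes, and TOK_WITH_ORDINALITY and TOK_WITH_TIME stand in for the merged tokens. This is not the PostgreSQL source, only an illustration of the anomaly: once WITH and ORDINALITY are folded together, no ORDINALITY token is left for the grammar to read as a CTE name.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token codes; a real grammar's codes are generated by bison. */
enum token {
    TOK_EOF, TOK_IDENT, TOK_AS, TOK_WITH, TOK_ORDINALITY, TOK_TIME,
    TOK_WITH_ORDINALITY, TOK_WITH_TIME   /* merged pseudo-tokens */
};

/*
 * Old-style filter: when WITH is followed by ORDINALITY or TIME, emit one
 * merged token and consume both inputs.  Returns the output length.
 */
size_t merge_filter(const enum token *in, size_t n, enum token *out)
{
    size_t i = 0, o = 0;

    while (i < n) {
        if (in[i] == TOK_WITH && i + 1 < n) {
            if (in[i + 1] == TOK_ORDINALITY) {
                out[o++] = TOK_WITH_ORDINALITY;
                i += 2;             /* the second token is swallowed */
                continue;
            }
            if (in[i + 1] == TOK_TIME) {
                out[o++] = TOK_WITH_TIME;
                i += 2;
                continue;
            }
        }
        out[o++] = in[i++];
    }
    assert(o <= n);                 /* merging can only shrink the stream */
    return o;
}
```

Fed WITH ORDINALITY AS (the start of "WITH ordinality AS ..."), this returns TOK_WITH_ORDINALITY followed by TOK_AS, so the parser never sees an identifier where the CTE name should be.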
    
    In this patch, we still do the lookahead basically as before, but we never
    merge the second token into the first; we replace just the first token by
    a special lookahead symbol when one of the lookahead pairs is seen.
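A minimal sketch of the new scheme, assuming a one-token buffer between a lexer and the parser. The names raw_lex, filtered_lex, struct filter and the TOK_* codes (including TOK_WITH_LA and TOK_NULLS_LA) are hypothetical stand-ins, and the "lexer" just reads from an array. The point it shows is that only the first token of a pair is replaced; the peeked token is kept and handed out unchanged on the next call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical token codes; *_LA are lookahead variants of the first token. */
enum token {
    TOK_EOF, TOK_IDENT, TOK_AS, TOK_WITH, TOK_ORDINALITY, TOK_TIME,
    TOK_NULLS, TOK_FIRST, TOK_LAST,
    TOK_WITH_LA, TOK_NULLS_LA
};

/* Hypothetical stand-in for the real lexer: reads from a fixed array. */
struct lexer {
    const enum token *toks;
    size_t n, pos;
};

enum token raw_lex(struct lexer *lx)
{
    return lx->pos < lx->n ? lx->toks[lx->pos++] : TOK_EOF;
}

/* Filter state: at most one buffered lookahead token. */
struct filter {
    struct lexer *lx;
    bool have_lookahead;
    enum token lookahead;
};

enum token filtered_lex(struct filter *f)
{
    enum token cur, next;

    /* Use the buffered token if the previous call peeked ahead. */
    if (f->have_lookahead) {
        cur = f->lookahead;
        f->have_lookahead = false;
    } else
        cur = raw_lex(f->lx);

    /* Most tokens need no lookahead at all. */
    if (cur != TOK_WITH && cur != TOK_NULLS)
        return cur;

    /* Peek at the next token and keep it for the following call. */
    assert(!f->have_lookahead);
    next = raw_lex(f->lx);
    f->lookahead = next;
    f->have_lookahead = true;

    /* Replace only the current token; 'next' is returned later, unmerged. */
    if (cur == TOK_WITH && (next == TOK_TIME || next == TOK_ORDINALITY))
        return TOK_WITH_LA;
    if (cur == TOK_NULLS && (next == TOK_FIRST || next == TOK_LAST))
        return TOK_NULLS_LA;
    return cur;
}
```

For WITH ORDINALITY AS the parser now receives TOK_WITH_LA, TOK_ORDINALITY, TOK_AS. Because ORDINALITY survives as its own token, a grammar that also accepts WITH_LA where a WITH clause can begin (presumably the kind of extra production the commit mentions) can still treat it as a CTE name.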
    
    This requires a couple of extra productions in the grammar, but it involves
    fewer special tokens, so that the grammar tables come out a bit smaller
    than before.  The filter logic is no slower than before, perhaps a bit
    faster.
    
    I also fixed the filter logic so that when backing up after a lookahead,
    the current token's terminator is correctly restored; this eliminates some
    weird behavior in error message issuance, as is shown by the one change in
    existing regression test outputs.
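The terminator problem can be sketched as follows, assuming a flex-style scanner that writes '\0' just past each token in its input buffer and undoes that at the start of the next call. All names here (struct scanner, scan_word, peek_next, take_peeked, struct peek_state) are hypothetical, and the scanner splits on whitespace only. Without the re-termination in peek_next, code that reads the current token's text out of the buffer (for instance when reporting an error near it) would see "WITH ordinality" instead of "WITH".

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical flex-like scanner: it writes '\0' just past each word it
 * returns, remembers the character it overwrote, and puts that character
 * back at the start of the next call.
 */
struct scanner {
    char   *buf;        /* modifiable input text */
    size_t  pos;        /* next unread offset */
    char   *hold_at;    /* where the last '\0' was written, or NULL */
    char    hold_char;  /* the character that '\0' replaced */
};

char *scan_word(struct scanner *s)
{
    size_t start;

    if (s->hold_at != NULL) {           /* undo the previous truncation */
        *s->hold_at = s->hold_char;
        s->hold_at = NULL;
    }
    while (s->buf[s->pos] != '\0' && isspace((unsigned char) s->buf[s->pos]))
        s->pos++;
    if (s->buf[s->pos] == '\0')
        return NULL;
    start = s->pos;
    while (s->buf[s->pos] != '\0' && !isspace((unsigned char) s->buf[s->pos]))
        s->pos++;
    s->hold_at = &s->buf[s->pos];
    s->hold_char = *s->hold_at;
    *s->hold_at = '\0';                 /* terminate the word in place */
    return &s->buf[start];
}

/* Filter-side state for one pending lookahead word. */
struct peek_state {
    char *next;      /* the peeked word, or NULL at end of input */
    char *cur_end;   /* position just past the current word */
    char  cur_hold;  /* character the filter overwrote at cur_end */
};

/* Peek one word past cur, then back up by re-terminating cur. */
void peek_next(struct scanner *s, char *cur, struct peek_state *p)
{
    p->cur_end = cur + strlen(cur);
    assert(*p->cur_end == '\0');        /* the scanner's own terminator */
    p->next = scan_word(s);             /* restores the char that followed cur */
    p->cur_hold = *p->cur_end;
    *p->cur_end = '\0';                 /* cur reads as just itself again */
}

/* Hand out the peeked word, first undoing the filter's own terminator. */
char *take_peeked(struct peek_state *p)
{
    *p->cur_end = p->cur_hold;
    return p->next;
}
```

After peek_next, the buffer reads "WITH\0ordinality\0AS"; after take_peeked it reads "WITH ordinality\0AS", which is exactly the state a plain second call to scan_word would have left.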
    
    I believe that this patch entirely eliminates odd behaviors caused by
    lookahead for WITH.  It doesn't really improve the situation for NULLS
    followed by FIRST/LAST unfortunately: those sequences still act like a
    reserved word, even though there are cases where they should be seen as two
    ordinary identifiers, e.g. "SELECT nulls first FROM ...".  I experimented
    with additional grammar hacks but couldn't find any simple solution for
    that.  Still, this is better than before, and it seems much more likely
    that we *could* somehow solve the NULLS case on the basis of this filter
    behavior than the previous one.
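A small sketch of why NULLS remains awkward, assuming hypothetical TOK_* codes and the same pairwise rule applied over a whole token array (apply_nulls_lookahead is an illustrative name, not a PostgreSQL function). The replacement depends only on the NULLS/FIRST pair and never on what precedes it, so an ORDER BY clause and a select list such as "SELECT nulls first FROM t" (a column named nulls with the alias first) produce the same NULLS_LA FIRST sequence at that spot.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token codes for the illustration. */
enum token {
    TOK_EOF, TOK_IDENT, TOK_SELECT, TOK_FROM, TOK_ORDER, TOK_BY,
    TOK_NULLS, TOK_FIRST, TOK_LAST, TOK_NULLS_LA
};

/*
 * Apply the one-token lookahead rule to a token array: NULLS becomes
 * NULLS_LA whenever the very next token is FIRST or LAST.  Only the pair
 * is examined; the surrounding clause plays no part in the decision.
 */
void apply_nulls_lookahead(enum token *toks, size_t n)
{
    assert(toks != NULL || n == 0);
    for (size_t i = 0; i + 1 < n; i++)
        if (toks[i] == TOK_NULLS &&
            (toks[i + 1] == TOK_FIRST || toks[i + 1] == TOK_LAST))
            toks[i] = TOK_NULLS_LA;
}
```

Both inputs end up with NULLS_LA immediately before FIRST, so any fix has to come from the grammar side, where the commit reports no simple solution was found.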
parser.c