    Add some more query-cancel checks to regular expression matching. · 9fe8fe9c
    Tom Lane authored
    Commit 9662143f added infrastructure to
    allow regular-expression operations to be terminated early in the event
    of SIGINT etc.  However, fuzz testing by Greg Stark disclosed that there
    are still cases where regex compilation could run for a long time without
    noticing a cancel request.  Specifically, the fixempties() phase never
    adds new states, only new arcs, so it doesn't hit the cancel check I'd put
    in newstate().  Add one to newarc() as well to cover that.
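
    As a rough illustration of why the check has to live in the
    arc-creation path too, here is a minimal sketch. It is not the actual
    regex compiler code: every name in it (sketch_nfa, sketch_newstate,
    sketch_newarc, sketch_arc_only_phase, cancel_pending) is hypothetical,
    and the cancel request is simulated with a plain flag rather than a
    real signal.

```c
#include <assert.h>
#include <signal.h>

/* Hypothetical error code; not a PostgreSQL constant. */
#define SKETCH_CANCEL 1

/* In real use a signal handler would set this; here the test loop does. */
volatile sig_atomic_t cancel_pending = 0;

struct sketch_nfa {
    int nstates;
    int narcs;
    int err;        /* first error seen, 0 while healthy */
};

/* State creation already polls for a pending cancel. */
int sketch_newstate(struct sketch_nfa *nfa)
{
    if (nfa->err)
        return -1;
    if (cancel_pending) {
        nfa->err = SKETCH_CANCEL;
        return -1;
    }
    return nfa->nstates++;
}

/* Arc creation polls as well, so arc-only phases can be interrupted. */
void sketch_newarc(struct sketch_nfa *nfa, int from, int to)
{
    assert(from >= 0 && to >= 0);
    if (nfa->err)
        return;
    if (cancel_pending) {
        nfa->err = SKETCH_CANCEL;
        return;
    }
    nfa->narcs++;
}

/*
 * A phase that only ever adds arcs between existing states.  Without the
 * check in sketch_newarc() it would run all `rounds` iterations no matter
 * when the cancel arrived.  `cancel_at` simulates the arrival point.
 */
int sketch_arc_only_phase(struct sketch_nfa *nfa, int rounds, int cancel_at)
{
    for (int i = 0; i < rounds && !nfa->err; i++) {
        if (i == cancel_at)
            cancel_pending = 1;
        sketch_newarc(nfa, 0, 0);
    }
    return nfa->err;
}
```

    In this sketch the phase stops on the first arc requested after the
    flag is set, even though no state is ever created.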
    
    Some experimentation of my own found that regex execution could also run
    for a long time despite a pending cancel.  We'd put a high-level cancel
    check into cdissect(), but there was none inside the core text-matching
    routines longest() and shortest().  Ordinarily those inner loops are very
    very fast ... but in the presence of lookahead constraints, not so much.
    As a compromise, stick a cancel check into the stateset cache-miss
    function, which is enough to guarantee a cancel check at least once per
    lookahead constraint test.
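
    The trade-off can be pictured with a toy lazily-built transition
    cache. This is only a sketch under stated assumptions, not the code in
    the regex executor: the struct, the transition rule, and all sketch_*
    names are invented for illustration. The point is that the hot lookup
    path stays free of checks, while the slow cache-miss path polls for a
    cancel and reports it through an error field the caller must test.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_NSTATES 4
#define SKETCH_NSYMS   2
#define SKETCH_CANCEL  1    /* hypothetical error code */

struct sketch_dfa {
    int cache[SKETCH_NSTATES][SKETCH_NSYMS];    /* -1 = not computed yet */
    int misses;
    int err;
    int cancel_pending;     /* stands in for a real interrupt flag */
};

void sketch_init(struct sketch_dfa *d)
{
    for (int s = 0; s < SKETCH_NSTATES; s++)
        for (int c = 0; c < SKETCH_NSYMS; c++)
            d->cache[s][c] = -1;
    d->misses = 0;
    d->err = 0;
    d->cancel_pending = 0;
}

/* Slow path: the only place that polls for a cancel request. */
int sketch_miss(struct sketch_dfa *d, int state, int sym)
{
    assert(state >= 0 && state < SKETCH_NSTATES);
    assert(sym >= 0 && sym < SKETCH_NSYMS);
    if (d->cancel_pending) {
        d->err = SKETCH_CANCEL;
        return -1;
    }
    d->misses++;
    int next = (state + sym + 1) % SKETCH_NSTATES;  /* toy transition rule */
    d->cache[state][sym] = next;
    return next;
}

/* Returns the final state, or -1 with d->err set on failure. */
int sketch_run(struct sketch_dfa *d, const int *input, size_t n)
{
    int state = 0;
    for (size_t i = 0; i < n; i++) {
        int next = d->cache[state][input[i]];   /* fast path, no checks */
        if (next < 0) {
            next = sketch_miss(d, state, input[i]);
            if (next < 0)
                return -1;                      /* propagate the error */
        }
        state = next;
    }
    return state;
}
```

    Note the limitation this sketch shares with the compromise described
    above: input that is served entirely from the cache never reaches the
    check, so the guarantee is only as strong as the frequency of misses.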
    
    Making this work required more attention to error handling throughout the
    regex executor.  Henry Spencer had apparently originally intended longest()
    and shortest() to be incapable of incurring errors while running, so
    neither they nor their subroutines had well-defined error reporting
    behaviors.  However, that was already broken by the lookahead constraint
    feature, since lacon() can surely suffer an out-of-memory failure ---
    which, in the code as it stood, might never be reported to the user at all,
    but just silently be treated as a non-match of the lookahead constraint.
    Normalize all that by inserting explicit error tests as needed.  I took the
    opportunity to add some more comments to the code, too.
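
    The failure mode is easy to reproduce in miniature. The sketch below
    is an assumption-laden illustration, not the executor's code:
    sketch_vars, sketch_lookahead, sketch_match and SKETCH_NOMEM are all
    hypothetical, and the allocation failure is forced with a flag. A
    boolean-returning helper cannot distinguish "constraint not satisfied"
    from "could not evaluate the constraint", so the caller has to test
    the error field explicitly before trusting the result.

```c
#include <assert.h>
#include <stdlib.h>

#define SKETCH_NOMEM 1      /* hypothetical error code */

struct sketch_vars {
    int err;                /* 0 while no error has occurred */
    int fail_alloc;         /* test hook: force an allocation failure */
};

/*
 * Returns 1 if the constraint holds at p, else 0.  On failure it sets
 * v->err and also returns 0, which by itself looks like a non-match.
 */
int sketch_lookahead(struct sketch_vars *v, const char *p, char want)
{
    char *scratch = v->fail_alloc ? NULL : malloc(16);
    if (scratch == NULL) {
        v->err = SKETCH_NOMEM;
        return 0;
    }
    int ok = (*p == want);
    free(scratch);
    return ok;
}

/* Returns 1 for a match, 0 for no match, -1 if an error occurred. */
int sketch_match(struct sketch_vars *v, const char *text, char want)
{
    assert(v != NULL && text != NULL);
    int ok = sketch_lookahead(v, text, want);
    if (v->err != 0)
        return -1;          /* explicit error test, not a silent non-match */
    return ok;
}
```

    Dropping the v->err test in sketch_match() would make the forced
    failure indistinguishable from an ordinary non-match.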
    
    Back-patch to all supported branches, like the previous patch.
regexec.c 32.9 KB