    Fix regular-expression compiler to handle loops of constraint arcs. · 48789c5d
    Tom Lane authored
    It's possible to construct regular expressions that contain loops of
    constraint arcs (that is, ^ $ AHEAD BEHIND or LACON arcs).  There's no use
    in fully traversing such a loop at execution, since you'd just end up in
    the same NFA state without having consumed any input.  Worse, such a loop
    leads to infinite looping in the pullback/pushfwd stage of compilation,
    because we keep pushing or pulling the same constraints around the loop
    in a vain attempt to move them to the pre or post state.  Such looping was
    previously recognized in CVE-2007-4772; but the fix only handled the case
    of trivial single-state loops (that is, a constraint arc leading back to
    its source state) ... and not only that, it was incorrect even for that
    case, because it broke the admittedly-not-very-clearly-stated API contract
    of the pull() and push() subroutines.  The first two regression test cases
    added by this commit exhibit patterns that result in assertion failures
    because of that (though there seem to be no ill effects in non-assert
    builds).  The other new test cases exhibit multi-state constraint loops;
    in an unpatched build they will run until the NFA state-count limit is
    exceeded.
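
As an illustration of what a "loop of constraint arcs" means, here is a minimal Python sketch that searches a toy NFA for a cycle made up only of constraint arcs. It is not the PostgreSQL regex code: the arc representation, the "PLAIN" label, and the function name find_constraint_loop are assumptions made for this example, and it only detects a loop rather than breaking it.

```python
from typing import Dict, List, Tuple

# Arc labels are assumptions for this sketch. "PLAIN" stands for any arc
# that consumes an input character; the others are constraint arcs.
CONSTRAINT_TYPES = {"^", "$", "AHEAD", "BEHIND", "LACON"}

Arc = Tuple[int, str, int]  # (source state, arc type, target state)


def find_constraint_loop(arcs: List[Arc]) -> List[int]:
    """Return the states of one loop made only of constraint arcs, or []."""
    # Keep only constraint arcs: a loop through a character-consuming arc
    # makes progress on the input and is not the problem case.
    graph: Dict[int, List[int]] = {}
    for src, kind, dst in arcs:
        if kind in CONSTRAINT_TYPES:
            graph.setdefault(src, []).append(dst)

    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    color: Dict[int, int] = {}
    path: List[int] = []

    def visit(state: int) -> List[int]:
        color[state] = GREY
        path.append(state)
        for nxt in graph.get(state, []):
            seen = color.get(nxt, WHITE)
            if seen == GREY:
                # Reached a state already on the current path: a loop.
                return path[path.index(nxt):]
            if seen == WHITE:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        color[state] = BLACK
        return []

    for start in list(graph):
        if color.get(start, WHITE) == WHITE:
            loop = visit(start)
            if loop:
                return loop
    return []
```

A single-state loop such as [(1, "^", 1)] and a multi-state loop such as [(1, "^", 2), (2, "$", 3), (3, "AHEAD", 1)] are both reported, while a cycle that passes through a "PLAIN" arc is not.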
    
    To fix, remove the code added for CVE-2007-4772, and instead create a
    general-purpose constraint-loop-breaking phase of regex compilation that
    executes before we do pullback/pushfwd.  Since we never need to traverse
    a constraint loop fully, we can just break the loop at any chosen spot,
    if we add clone states that can replicate any sequence of arc transitions
    that would've traversed just part of the loop.
    
    Also add some commentary clarifying why we have to have all these
    machinations in the first place.
    
    This class of problems has been known for some time --- we had a report
    from Marc Mamin about two years ago, for example, and there are related
    complaints in the Tcl bug tracker.  I had discussed a fix of this kind
    off-list with Henry Spencer, but didn't get around to doing something
    about it until the issue was rediscovered by Greg Stark recently.
    
    Back-patch to all supported branches.