    Make regexp engine's backref-related compilation state more bulletproof. · 5e6ad63c
    Tom Lane authored
    Up to now, we remembered the definition of a capturing parenthesis
    subexpression by storing a pointer to the associated subRE node.
    That was okay before, because that subRE didn't get modified anymore
    while parsing the rest of the regexp.  However, in the wake of
    commit ea1268f6, that's no longer true: the outer invocation of
    parseqatom() feels free to scribble on that subRE.  This seems to
    work anyway, because the states we jam into the child atom in the
    "prepare a general-purpose state skeleton" stanza aren't really
    semantically different from the original endpoints of the child atom.
    But that would be mighty easy to break, and it's definitely not how
    things worked before.
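    A minimal sketch of the hazard described above, assuming heavily
    simplified stand-ins for the real types. struct state, struct subre,
    subs[], remember_group(), rewrite_endpoints() and backref_begin() are
    hypothetical names, not the regcomp.c definitions. If the compiler
    remembers a capturing group only by a pointer to its subRE, any later
    rewrite of that node's endpoints is also seen by a later backref.

```c
#include <assert.h>

/* Hypothetical, heavily simplified stand-ins for the NFA state and
 * subRE node types; the real structs in regcomp.c carry much more. */
struct state { int no; };
struct subre { struct state *begin; struct state *end; };

/* Old-style bookkeeping (sketch): remember capturing group n by
 * keeping a pointer to its subRE node. */
static struct subre *subs[10];

static void remember_group(int n, struct subre *sr)
{
    subs[n] = sr;
}

/* Hypothetical stand-in for the outer parseqatom() reusing the child's
 * subRE and re-pointing its endpoints at a newly built state skeleton. */
static void rewrite_endpoints(struct subre *sr, struct state *s,
                              struct state *s2)
{
    sr->begin = s;
    sr->end = s2;
}

/* What a later backref to group n would read: whatever the subRE
 * points at *now*, not what it pointed at when the group closed. */
static struct state *backref_begin(int n)
{
    return subs[n]->begin;
}
```

    Because the backref goes through the shared node, it silently follows
    the rewrite; this only stays harmless while the new endpoints happen to
    be semantically equivalent to the old ones.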
    
    Between this and the issue fixed in the prior commit, it seems best
    to get rid of this dependence on subRE nodes entirely.  We don't need
    the whole child subRE for future backrefs, only its starting and ending
    NFA states; so let's just store pointers to those.
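    A minimal sketch of the replacement bookkeeping, again with hypothetical
    simplified types and names (group_states, groups[], remember_group(),
    backref_begin(), backref_end()) that are not the actual regcomp.c
    identifiers. The idea is that when a capturing group closes, its begin
    and end NFA states are copied out, so nothing done to the subRE node
    afterwards can change what a backref refers to.

```c
#include <assert.h>

/* Hypothetical simplified types (not the real regcomp.c definitions). */
struct state { int no; };
struct subre { struct state *begin; struct state *end; };

/* New-style bookkeeping (sketch): for each capturing group, remember
 * only the NFA states where it starts and ends. */
struct group_states {
    struct state *begin;
    struct state *end;
};

#define MAX_GROUPS 10
static struct group_states groups[MAX_GROUPS];

/* Called when capturing group n closes: copy the endpoints out of the
 * subRE so that later changes to the subRE itself cannot affect them. */
static void remember_group(int n, const struct subre *sr)
{
    assert(n > 0 && n < MAX_GROUPS);
    groups[n].begin = sr->begin;
    groups[n].end = sr->end;
}

/* What a later backref to group n would use. */
static struct state *backref_begin(int n) { return groups[n].begin; }
static struct state *backref_end(int n)   { return groups[n].end; }
```

    The design choice is to keep only the data a backref actually needs;
    the subRE node is then free to be reused or modified by outer parsing
    code without any aliasing concerns.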
    
    Also, in the corner case where we make an extra subRE to handle
    immediately-nested capturing parentheses, it seems smarter to give the
    extra subRE the same begin/end states as the original child subRE
    (s/s2, not lp/rp).  I think that linking it from lp to
    rp might actually be semantically wrong, though since Spencer's original
    code did it that way, I'm not totally certain.  Using s/s2 is certainly
    not wrong, in any case.
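    A minimal sketch of the wrapping step for a pattern like ((x)), where a
    capturing group's body is itself a capturing group, assuming simplified
    hypothetical types and a hypothetical helper wrap_subre() (not the
    regcomp.c code). The extra subRE takes its endpoints from the child
    (s/s2) instead of from the outer parenthesis states (lp/rp).

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical simplified types (not the real regcomp.c definitions). */
struct state { int no; };
struct subre {
    struct state *begin;
    struct state *end;
    struct subre *child;
};

/* Hypothetical helper: wrap an existing child subRE in an extra subRE
 * node, giving the wrapper the child's own endpoints (s/s2) rather than
 * the outer parenthesis states (lp/rp). Returns NULL on allocation
 * failure; the caller owns the returned node. */
static struct subre *wrap_subre(struct subre *child)
{
    struct subre *w = malloc(sizeof *w);

    if (w == NULL)
        return NULL;
    w->begin = child->begin;
    w->end = child->end;
    w->child = child;
    return w;
}
```

    With both nodes spanning exactly the same states, the wrapper cannot
    cover anything the child does not, which is the property the message
    argues is "certainly not wrong".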
    
    Per report from Mark Dilger.  Back-patch to v14 where the problematic
    patches came in.
    
    Discussion: https://postgr.es/m/0203588E-E609-43AF-9F4F-902854231EE7@enterprisedb.com
Changed file: regcomp.c