    Rethink regexp engine's backref-related compilation state. · 5227d998
    Tom Lane authored
    I had committer's remorse almost immediately after pushing cb76fbd7e,
    upon finding that removing capturing subexpressions' subREs from the
    data structure broke my proposed patch for REG_NOSUB optimization.
    Revert that data structure change.  Instead, address the concern
    about not changing capturing subREs' endpoints by not changing the
    endpoints.  We don't need to, because the point of that bit was just
    to ensure that the atom has endpoints distinct from the outer state
    pair that we're stringing the branch between.  We already made
    suitable states in the parenthesized-subexpression case, so the
    additional ones were just useless overhead.  This seems more
    understandable than Spencer's original coding, and it ought to be
    a shade faster too by saving a few state creations and arc changes.
    (I actually see a couple percent improvement on Jacobson's web
    corpus, though that's barely above the noise floor so I wouldn't
    put much stock in that result.)
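    The following toy C model roughly illustrates why the additional states were redundant. All names in it (struct toy_nfa, toy_paren_atom, toy_mark_capture) are hypothetical and are not regcomp.c's API. In this model the parenthesized case already creates fresh endpoint states and links them to the outer pair (lp, rp) with EMPTY arcs. The capture step therefore only needs to confirm that the atom's endpoints are distinct from lp and rp, and it creates no new states.

```c
#include <assert.h>

/* Hypothetical toy NFA: states are integers handed out by a counter. */
struct toy_nfa {
    int nstates;  /* states created so far */
    int narcs;    /* EMPTY arcs added so far */
};

/* An atom occupies the NFA fragment between its begin and end states. */
struct toy_atom {
    int begin;
    int end;
};

static int toy_newstate(struct toy_nfa *nfa)
{
    return nfa->nstates++;
}

static void toy_emptyarc(struct toy_nfa *nfa, int from, int to)
{
    (void) from;  /* the toy only counts arcs, it does not store them */
    (void) to;
    nfa->narcs++;
}

/*
 * Parenthesized atom (toy version): make fresh endpoints and link them to
 * the outer pair with EMPTY arcs; the subexpression would be built between
 * the returned begin and end states.
 */
static struct toy_atom toy_paren_atom(struct toy_nfa *nfa, int lp, int rp)
{
    struct toy_atom a;

    a.begin = toy_newstate(nfa);
    a.end = toy_newstate(nfa);
    toy_emptyarc(nfa, lp, a.begin);
    toy_emptyarc(nfa, a.end, rp);
    return a;
}

/*
 * Capturing step (toy version): all it needs is endpoints distinct from
 * (lp, rp).  A parenthesized atom already has them, so instead of creating
 * another pair of states we just check the invariant.
 */
static void toy_mark_capture(const struct toy_atom *a, int lp, int rp)
{
    assert(a->begin != lp && a->begin != rp);
    assert(a->end != lp && a->end != rp);
}
```

    With lp and rp occupying states 0 and 1, building a parenthesized atom adds exactly two states and two EMPTY arcs, and marking it as a capture adds nothing further.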
    
    Also, fix the logic added by ea1268f6 to ensure that the subRE
    recorded in v->subs[subno] is exactly the one with capno == subno.
    Spencer's original coding recorded the child subRE of the capture
    node, which is okay so far as having the right endpoint states is
    concerned, but as of cb76fbd7e the capturing subRE itself always
    has those endpoints too.  I think the inconsistency is confusing
    for the REG_NOSUB optimization.
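    The invariant that v->subs[subno] holds the node whose capno equals subno can be sketched with a toy subRE tree. struct toy_subre, toy_record_capture and TOY_NSUBS are hypothetical simplifications, not the real struct subre. The capture node and its child share endpoint states, as the message describes after cb76fbd7e, and the sketch records the capture node itself instead of its child.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NSUBS 4  /* hypothetical size of the subs[] table */

/* Hypothetical, simplified subRE tree node. */
struct toy_subre {
    char op;                  /* '(' marks a capture node */
    int capno;                /* capture number, 0 if not capturing */
    int begin, end;           /* endpoint states */
    struct toy_subre *child;  /* the subexpression inside the parens */
};

/*
 * Record the capture node itself, so that subs[subno]->capno == subno.
 * Recording node->child would yield the same endpoint states but a
 * different node, which is the inconsistency being removed.
 */
static void toy_record_capture(struct toy_subre *subs[],
                               struct toy_subre *node)
{
    assert(node->op == '(');
    assert(node->capno > 0 && node->capno < TOY_NSUBS);
    subs[node->capno] = node;
}
```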
    
    As before, backpatch to v14.
    
    Discussion: https://postgr.es/m/0203588E-E609-43AF-9F4F-902854231EE7@enterprisedb.com