    Fix regexp misbehavior with capturing parens inside "{0}". · 244dd799
    Tom Lane authored
    Regexps like "(.){0}...\1" drew an "invalid backreference number".
    That's not unreasonable on its face, since the capture group will
    never be matched if it's iterated zero times.  However, other engines
    such as Perl's don't complain about this, nor do we throw an error for
    related cases such as "(.)|\1", even though that backref can never
    succeed either.  Also, if the zero-iterations case happens at runtime
    rather than compile time --- say, "(x)*...\1" when there's no "x" to
    be found --- that's not an error, we just deem the backref to not
    match.  Making this even less defensible, no error was thrown for
    nested cases such as "((.)){0}...\2"; and to add insult to injury,
    those cases could result in assertion failures instead.  (It seems
    that nothing especially bad happened in non-assert builds, though.)
    
    Let's just fix it so that no error is thrown and instead the backref
    is deemed to never match, so that compile-time detection of no
    iterations behaves the same as run-time detection.
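
As a rough illustration of the "backref is deemed to not match" rule, here is a minimal sketch using Python's `re` module. That is a different engine from Spencer's library, so it serves only as an analogy for the run-time case described above, not as a test of PostgreSQL itself. The helper name `outcome` and the sample patterns and subjects are made up for this example.

```python
import re


def outcome(pattern, subject):
    """Classify how Python's re engine treats pattern against subject.

    Returns "error" if the pattern is rejected at compile time,
    otherwise "match" or "no match" for an unanchored search.
    """
    try:
        compiled = re.compile(pattern)
    except re.error:
        return "error"
    return "match" if compiled.search(subject) else "no match"


# Run-time zero iterations: "(x)*" finds no "x", so group 1 never
# participates and the backref simply fails to match.
runtime_unset = outcome(r"(x)*y\1", "y")

# Same pattern when the group does participate in the match.
runtime_set = outcome(r"(x)*y\1", "xyx")

# Compile-time zero iterations, the shape this commit is about.
# The result is printed rather than assumed.
compile_time_zero = outcome(r"(.){0}y\1", "yy")

if __name__ == "__main__":
    print("run-time, group unset:  ", runtime_unset)
    print("run-time, group set:    ", runtime_set)
    print("compile-time {0} group: ", compile_time_zero)
```

The point of the comparison is that an engine can treat a never-set capture group as "backref cannot match" rather than as a compile-time error, which is the behavior this commit adopts for the "{0}" case.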
    
    Per report from Mark Dilger.  This appears to be an aboriginal error
    in Spencer's library, so back-patch to all supported versions.
    
    Pre-v14, it turns out to also be necessary to back-patch one aspect of
    commits cb76fbd7e/00116dee5, namely to create capture-node subREs with
    the begin/end states of their subexpressions, not the current lp/rp
    of the outer parseqatom invocation.  Otherwise delsub complains that
    we're trying to disconnect a state from itself.  This is a bit scary
    but code examination shows that it's safe: in the pre-v14 code, if we
    want to wrap iteration around the subexpression, the first thing we do
    is overwrite the atom's begin/end fields with new states.  So the
    bogus values didn't survive long enough to be used for anything, except
    if no iteration is required, in which case it doesn't matter.
    
    Discussion: https://postgr.es/m/A099E4A8-4377-4C64-A98C-3DEDDC075502@enterprisedb.com