• Tom Lane's avatar
    Improve performance of regular expression back-references. · 0c3405cf
    Tom Lane authored
    In some cases, at the time that we're doing an NFA-based precheck
    of whether a backref subexpression can match at a particular place
    in the string, we already know which substring the referenced
    subexpression matched.  If so, we might as well forget about the NFA
    and just compare the substring; this is faster and it gives an exact
    rather than approximate answer.
    
    In general, this optimization can help while we are prechecking within
    the second child expression of a concat node, while the capture was
    within the first child expression; then the substring was saved during
    cdissect() of the first child and will be available to NFA checks done
    while cdissect() recurses into the second child.  It can help quite a
    lot if the tree looks like
    
                  concat
                 /      \
          capture        concat
                        /      \
         expensive stuff        backref
    
    as we will be able to avoid recursively dissecting the "expensive
    stuff" before discovering that the backref isn't satisfied with a
    particular midpoint that the lower concat node is testing.  This
    doesn't help if the concat tree is left-deep, as the capture node
    won't get set soon enough (and it's hard to fix that without changing
    the engine's match behavior).  Fortunately, right-deep concat trees
    are the common case.
    
    Patch by me, reviewed by Joel Jacobson
    
    Discussion: https://postgr.es/m/661609.1614560029@sss.pgh.pa.us
    0c3405cf
rege_dfa.c 27.4 KB