• Tom Lane's avatar
    Convert regex engine's subre tree from binary to N-ary style. · 58104308
    Tom Lane authored
    Instead of having left and right child links in subre structs,
    have a single child link plus a sibling link.  Multiple children
    of a tree node are now reached by chasing the sibling chain.
    
    The beneficiary of this is alternation tree nodes.  A regular
    expression with N (>1) branches is now represented by one alternation
    node with N children, rather than a tree that includes N alternation
    nodes as well as N children.  While the old representation didn't
    really cost anything extra at execution time, it was pretty horrid
    for compilation purposes, because each of the alternation nodes had
    its own NFA, which we were too stupid not to separately optimize.
    (To make matters worse, all of those NFAs described the entire
    alternation pattern, not just the portion of it that one might
    expect from the tree structure.)
    
    We continue to require concatenation nodes to have exactly two
    children.  This data structure is now prepared to support more,
    but the executor's logic would need some careful redesign, and
    it's not clear that a lot of benefit could be had.
    
    This is part of a patch series that in total reduces the regex engine's
    runtime by about a factor of four on a large corpus of real-world regexes.
    
    Patch by me, reviewed by Joel Jacobson
    
    Discussion: https://postgr.es/m/1340281.1613018383@sss.pgh.pa.us
    58104308
regexec.c 36.3 KB