Commit 579840ca authored by Tom Lane's avatar Tom Lane

Fix O(N^2) performance problems in regular-expression compiler.

Change the singly-linked in-arc and out-arc lists to be doubly-linked,
so that arc deletion is constant time rather than having worst-case time
proportional to the number of other arcs on the connected states.

Modify the bulk arc transfer operations copyins(), copyouts(), moveins(),
moveouts() so that they use a sort-and-merge algorithm whenever there's
more than a small number of arcs to be copied or moved.  The previous
method is O(N^2) in the number of arcs involved, because it performs
duplicate checking independently for each copied arc.  The new method may
change the ordering of existing arcs for the destination state, but nothing
really cares about that.

Provide another bulk arc copying method mergeins(), which is unused as
of this commit but is needed for the next one.  It basically is like
copyins(), but the source arcs might not all come from the same state.

Replace the O(N^2) bubble-sort algorithm used in carcsort() with a qsort()
call.

These changes greatly improve the performance of regex compilation for
large or complex regexes, at the cost of extra space for arc storage during
compilation.  The original tradeoff was probably fine when it was made, but
now we care more about speed and less about memory consumption.

Back-patch to all supported branches.
parent 48789c5d
This diff is collapsed.
......@@ -124,15 +124,22 @@ static void dropstate(struct nfa *, struct state *);
static void freestate(struct nfa *, struct state *);
static void destroystate(struct nfa *, struct state *);
static void newarc(struct nfa *, int, pcolor, struct state *, struct state *);
static void createarc(struct nfa *, int, pcolor, struct state *, struct state *);
static struct arc *allocarc(struct nfa *, struct state *);
static void freearc(struct nfa *, struct arc *);
static void changearctarget(struct arc *, struct state *);
static int hasnonemptyout(struct state *);
static int nonemptyouts(struct state *);
static int nonemptyins(struct state *);
static struct arc *findarc(struct state *, int, pcolor);
static void cparc(struct nfa *, struct arc *, struct state *, struct state *);
static void sortins(struct nfa *, struct state *);
static int sortins_cmp(const void *, const void *);
static void sortouts(struct nfa *, struct state *);
static int sortouts_cmp(const void *, const void *);
static void moveins(struct nfa *, struct state *, struct state *);
static void copyins(struct nfa *, struct state *, struct state *, int);
static void mergeins(struct nfa *, struct state *, struct arc **, int);
static void moveouts(struct nfa *, struct state *, struct state *);
static void copyouts(struct nfa *, struct state *, struct state *, int);
static void cloneouts(struct nfa *, struct state *, struct state *, struct state *, int);
......@@ -168,7 +175,8 @@ static void markreachable(struct nfa *, struct state *, struct state *, struct s
static void markcanreach(struct nfa *, struct state *, struct state *, struct state *);
static long analyze(struct nfa *);
static void compact(struct nfa *, struct cnfa *);
static void carcsort(struct carc *, struct carc *);
static void carcsort(struct carc *, size_t);
static int carc_cmp(const void *, const void *);
static void freecnfa(struct cnfa *);
static void dumpnfa(struct nfa *, FILE *);
......
......@@ -289,8 +289,10 @@ struct arc
struct state *from; /* where it's from (and contained within) */
struct state *to; /* where it's to */
struct arc *outchain; /* link in *from's outs chain or free chain */
#define freechain outchain
struct arc *outchainRev; /* back-link in *from's outs chain */
#define freechain outchain /* we do not maintain "freechainRev" */
struct arc *inchain; /* link in *to's ins chain */
struct arc *inchainRev; /* back-link in *to's ins chain */
struct arc *colorchain; /* link in color's arc chain */
struct arc *colorchainRev; /* back-link in color's arc chain */
};
......
Markdown is supported
0% or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment