Commit 88709176 authored by Tom Lane

Apply auto-vectorization to the inner loop of numeric multiplication.

Compile numeric.c with -ftree-vectorize where available, and adjust
the innermost loop of mul_var() so that it is amenable to being
auto-vectorized.  (Mainly, that involves making it process the arrays
left-to-right not right-to-left.)

Applying -ftree-vectorize actually makes numeric.o smaller, at least
with my compiler (gcc 8.3.1 on x86_64), and it's a little faster too.
Independently of that, fixing the inner loop to be vectorizable also
makes things a bit faster.  But doing both is a huge win for
multiplications with lots of digits.  For me, the numeric regression
test is the same speed to within measurement noise, but numeric_big
is a full 45% faster.

We also looked into applying -funroll-loops, but that makes numeric.o
bloat quite a bit, and the additional speed improvement is very
marginal.

Amit Khandekar, reviewed and edited a little by me

Discussion: https://postgr.es/m/CAJ3gD9evtA_vBo+WMYMyT-u=keHX7-r8p2w7OSRfXf42LTwCZQ@mail.gmail.com
parent 695de5d1
@@ -125,6 +125,9 @@ clean distclean maintainer-clean:
 like.o: like.c like_match.c
+# Some code in numeric.c benefits from auto-vectorization
+numeric.o: CFLAGS += ${CFLAGS_VECTORIZE}
 varlena.o: varlena.c levenshtein.c
 include $(top_srcdir)/src/backend/common.mk
@@ -8191,6 +8191,7 @@ mul_var(const NumericVar *var1, const NumericVar *var2, NumericVar *result,
     int         res_weight;
     int         maxdigits;
     int        *dig;
+    int        *dig_i1_2;
     int         carry;
     int         maxdig;
     int         newdig;
@@ -8327,10 +8328,18 @@ mul_var(const NumericVar *var1, const NumericVar *var2, NumericVar *result,
          *
          * As above, digits of var2 can be ignored if they don't contribute,
          * so we only include digits for which i1+i2+2 <= res_ndigits - 1.
+         *
+         * This inner loop is the performance bottleneck for multiplication,
+         * so we want to keep it simple enough so that it can be
+         * auto-vectorized.  Accordingly, process the digits left-to-right
+         * even though schoolbook multiplication would suggest right-to-left.
+         * Since we aren't propagating carries in this loop, the order does
+         * not matter.
          */
-        for (i2 = Min(var2ndigits - 1, res_ndigits - i1 - 3), i = i1 + i2 + 2;
-             i2 >= 0; i2--)
-            dig[i--] += var1digit * var2digits[i2];
+        i = Min(var2ndigits - 1, res_ndigits - i1 - 3);
+        dig_i1_2 = &dig[i1 + 2];
+        for (i2 = 0; i2 <= i; i2++)
+            dig_i1_2[i2] += var1digit * var2digits[i2];
     }
     /*
......
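
The claim in the new comment, that the traversal order does not matter because no carries are propagated inside this loop, can be checked in isolation. The following is a minimal standalone sketch, not code from numeric.c: the helper names (accumulate_right_to_left, accumulate_left_to_right, orders_agree), the NumericDigit typedef, and the sample digits are all assumptions made for illustration. The two helpers mirror the shape of the removed and added loops, with the loop bound passed in as `last`.

```c
#include <assert.h>		/* for callers that assert on orders_agree() */
#include <string.h>

/* Assumption: stands in for the 16-bit base-10000 digit type in numeric.c. */
typedef short NumericDigit;

/* Shape of the removed loop: both indexes walk right-to-left. */
static void
accumulate_right_to_left(int *dig, int i1, int var1digit,
						 const NumericDigit *var2digits, int last)
{
	int			i2;
	int			i;

	for (i2 = last, i = i1 + i2 + 2; i2 >= 0; i2--)
		dig[i--] += var1digit * var2digits[i2];
}

/* Shape of the added loop: one ascending index, unit stride on both arrays. */
static void
accumulate_left_to_right(int *dig, int i1, int var1digit,
						 const NumericDigit *var2digits, int last)
{
	int		   *dig_i1_2 = &dig[i1 + 2];
	int			i2;

	for (i2 = 0; i2 <= last; i2++)
		dig_i1_2[i2] += var1digit * var2digits[i2];
}

/* Returns 1 if both orderings leave identical accumulator arrays. */
static int
orders_agree(void)
{
	const NumericDigit var2digits[4] = {1234, 5678, 9012, 3456};	/* made-up digits */
	int			a[8] = {0};
	int			b[8] = {0};

	accumulate_right_to_left(a, 1, 4321, var2digits, 3);
	accumulate_left_to_right(b, 1, 4321, var2digits, 3);
	return memcmp(a, b, sizeof(a)) == 0;
}
```

Each iteration touches a distinct element of dig and reads nothing written by another iteration, so the updates are independent and can be performed in any order, or several at a time in SIMD lanes. Whether a given compiler actually vectorizes the second form still depends on its flags and target.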