Commit 3c7042a7 authored by Robert Haas

pgbench: Change terminology from "threshold" to "parameter".

Per a recommendation from Tomas Vondra, it's more helpful to refer to
the value that determines how skewed a Gaussian or exponential
distribution is as a parameter rather than a threshold.

Since it's not quite too late to get this right in 9.5, where it was
introduced, back-patch this.  Most of the patch changes only comments
and documentation, but a few pgbench messages are altered to match.

Fabien Coelho, reviewed by Michael Paquier and by me.
parent 6e7b3359
...@@ -788,7 +788,7 @@ pgbench <optional> <replaceable>options</> </optional> <replaceable>dbname</> ...@@ -788,7 +788,7 @@ pgbench <optional> <replaceable>options</> </optional> <replaceable>dbname</>
<varlistentry> <varlistentry>
<term> <term>
<literal>\setrandom <replaceable>varname</> <replaceable>min</> <replaceable>max</> [ uniform | { gaussian | exponential } <replaceable>threshold</> ]</literal> <literal>\setrandom <replaceable>varname</> <replaceable>min</> <replaceable>max</> [ uniform | { gaussian | exponential } <replaceable>parameter</> ]</literal>
</term> </term>
<listitem> <listitem>
...@@ -804,54 +804,63 @@ pgbench <optional> <replaceable>options</> </optional> <replaceable>dbname</> ...@@ -804,54 +804,63 @@ pgbench <optional> <replaceable>options</> </optional> <replaceable>dbname</>
By default, or when <literal>uniform</> is specified, all values in the By default, or when <literal>uniform</> is specified, all values in the
range are drawn with equal probability. Specifying <literal>gaussian</> range are drawn with equal probability. Specifying <literal>gaussian</>
or <literal>exponential</> options modifies this behavior; each or <literal>exponential</> options modifies this behavior; each
requires a mandatory threshold which determines the precise shape of the requires a mandatory parameter which determines the precise shape of the
distribution. distribution.
</para> </para>
<para> <para>
For a Gaussian distribution, the interval is mapped onto a standard For a Gaussian distribution, the interval is mapped onto a standard
normal distribution (the classical bell-shaped Gaussian curve) truncated normal distribution (the classical bell-shaped Gaussian curve) truncated
at <literal>-threshold</> on the left and <literal>+threshold</> at <literal>-parameter</> on the left and <literal>+parameter</>
on the right. on the right.
Values in the middle of the interval are more likely to be drawn.
To be precise, if <literal>PHI(x)</> is the cumulative distribution To be precise, if <literal>PHI(x)</> is the cumulative distribution
function of the standard normal distribution, with mean <literal>mu</> function of the standard normal distribution, with mean <literal>mu</>
defined as <literal>(max + min) / 2.0</>, then value <replaceable>i</> defined as <literal>(max + min) / 2.0</>, with
between <replaceable>min</> and <replaceable>max</> inclusive is drawn <literallayout>
with probability: f(x) = PHI(2.0 * parameter * (x - mu) / (max - min + 1)) /
<literal> (2.0 * PHI(parameter) - 1.0)
(PHI(2.0 * threshold * (i - min - mu + 0.5) / (max - min + 1)) - </literallayout>
PHI(2.0 * threshold * (i - min - mu - 0.5) / (max - min + 1))) / then value <replaceable>i</> between <replaceable>min</> and
(2.0 * PHI(threshold) - 1.0)</>. <replaceable>max</> inclusive is drawn with probability:
Intuitively, the larger the <replaceable>threshold</>, the more <literal>f(i + 0.5) - f(i - 0.5)</>.
Intuitively, the larger <replaceable>parameter</>, the more
frequently values close to the middle of the interval are drawn, and the frequently values close to the middle of the interval are drawn, and the
less frequently values close to the <replaceable>min</> and less frequently values close to the <replaceable>min</> and
<replaceable>max</> bounds. <replaceable>max</> bounds. About 67% of values are drawn from the
About 67% of values are drawn from the middle <literal>1.0 / threshold</> middle <literal>1.0 / parameter</>, that is a relative
and 95% in the middle <literal>2.0 / threshold</>; for instance, if <literal>0.5 / parameter</> around the mean, and 95% in the middle
<replaceable>threshold</> is 4.0, 67% of values are drawn from the middle <literal>2.0 / parameter</>, that is a relative
quarter and 95% from the middle half of the interval. <literal>1.0 / parameter</> around the mean; for instance, if
The minimum <replaceable>threshold</> is 2.0 for performance of <replaceable>parameter</> is 4.0, 67% of values are drawn from the
the Box-Muller transform. middle quarter (1.0 / 4.0) of the interval (i.e. from
<literal>3.0 / 8.0</> to <literal>5.0 / 8.0</>) and 95% from
the middle half (<literal>2.0 / 4.0</>) of the interval (second and
third quartiles). The minimum <replaceable>parameter</> is 2.0 for
performance of the Box-Muller transform.
</para> </para>
<para> <para>
For an exponential distribution, the <replaceable>threshold</> For an exponential distribution, <replaceable>parameter</>
parameter controls the distribution by truncating a quickly-decreasing controls the distribution by truncating a quickly-decreasing
exponential distribution at <replaceable>threshold</>, and then exponential distribution at <replaceable>parameter</>, and then
projecting onto integers between the bounds. projecting onto integers between the bounds.
To be precise, value <replaceable>i</> between <replaceable>min</> and To be precise, with
<literallayout>
f(x) = exp(-parameter * (x - min) / (max - min + 1)) / (1.0 - exp(-parameter))
</literallayout>
then value <replaceable>i</> between <replaceable>min</> and
<replaceable>max</> inclusive is drawn with probability: <replaceable>max</> inclusive is drawn with probability:
<literal>(exp(-threshold*(i-min)/(max+1-min)) - <literal>f(i) - f(i + 1)</>.
exp(-threshold*(i+1-min)/(max+1-min))) / (1.0 - exp(-threshold))</>. Intuitively, the larger <replaceable>parameter</>, the more
Intuitively, the larger the <replaceable>threshold</>, the more
frequently values close to <replaceable>min</> are accessed, and the frequently values close to <replaceable>min</> are accessed, and the
less frequently values close to <replaceable>max</> are accessed. less frequently values close to <replaceable>max</> are accessed.
The closer to 0 the threshold, the flatter (more uniform) the access The closer to 0 <replaceable>parameter</>, the flatter (more uniform)
distribution. the access distribution.
A crude approximation of the distribution is that the most frequent 1% A crude approximation of the distribution is that the most frequent 1%
values in the range, close to <replaceable>min</>, are drawn values in the range, close to <replaceable>min</>, are drawn
<replaceable>threshold</>% of the time. <replaceable>parameter</>% of the time.
The <replaceable>threshold</> value must be strictly positive. <replaceable>parameter</> value must be strictly positive.
</para> </para>
<para> <para>
......
...@@ -90,7 +90,7 @@ static int pthread_join(pthread_t th, void **thread_return); ...@@ -90,7 +90,7 @@ static int pthread_join(pthread_t th, void **thread_return);
#define LOG_STEP_SECONDS 5 /* seconds between log messages */ #define LOG_STEP_SECONDS 5 /* seconds between log messages */
#define DEFAULT_NXACTS 10 /* default nxacts */ #define DEFAULT_NXACTS 10 /* default nxacts */
#define MIN_GAUSSIAN_THRESHOLD 2.0 /* minimum threshold for gauss */ #define MIN_GAUSSIAN_PARAM 2.0 /* minimum parameter for gauss */
int nxacts = 0; /* number of transactions per client */ int nxacts = 0; /* number of transactions per client */
int duration = 0; /* duration in seconds */ int duration = 0; /* duration in seconds */
...@@ -488,47 +488,47 @@ getrand(TState *thread, int64 min, int64 max) ...@@ -488,47 +488,47 @@ getrand(TState *thread, int64 min, int64 max)
/* /*
* random number generator: exponential distribution from min to max inclusive. * random number generator: exponential distribution from min to max inclusive.
* the threshold is so that the density of probability for the last cut-off max * the parameter is so that the density of probability for the last cut-off max
* value is exp(-threshold). * value is exp(-parameter).
*/ */
static int64 static int64
getExponentialRand(TState *thread, int64 min, int64 max, double threshold) getExponentialRand(TState *thread, int64 min, int64 max, double parameter)
{ {
double cut, double cut,
uniform, uniform,
rand; rand;
Assert(threshold > 0.0); Assert(parameter > 0.0);
cut = exp(-threshold); cut = exp(-parameter);
/* erand in [0, 1), uniform in (0, 1] */ /* erand in [0, 1), uniform in (0, 1] */
uniform = 1.0 - pg_erand48(thread->random_state); uniform = 1.0 - pg_erand48(thread->random_state);
/* /*
* inner expression in (cut, 1] (if threshold > 0), rand in [0, 1) * inner expression in (cut, 1] (if parameter > 0), rand in [0, 1)
*/ */
Assert((1.0 - cut) != 0.0); Assert((1.0 - cut) != 0.0);
rand = -log(cut + (1.0 - cut) * uniform) / threshold; rand = -log(cut + (1.0 - cut) * uniform) / parameter;
/* return int64 random number between min and max */ /* return int64 random number between min and max */
return min + (int64) ((max - min + 1) * rand); return min + (int64) ((max - min + 1) * rand);
} }
/* random number generator: gaussian distribution from min to max inclusive */ /* random number generator: gaussian distribution from min to max inclusive */
static int64 static int64
getGaussianRand(TState *thread, int64 min, int64 max, double threshold) getGaussianRand(TState *thread, int64 min, int64 max, double parameter)
{ {
double stdev; double stdev;
double rand; double rand;
/* /*
* Get user specified random number from this loop, with -threshold < * Get user specified random number from this loop,
* stdev <= threshold * with -parameter < stdev <= parameter
* *
* This loop is executed until the number is in the expected range. * This loop is executed until the number is in the expected range.
* *
* As the minimum threshold is 2.0, the probability of looping is low: * As the minimum parameter is 2.0, the probability of looping is low:
* sqrt(-2 ln(r)) <= 2 => r >= e^{-2} ~ 0.135, then when taking the * sqrt(-2 ln(r)) <= 2 => r >= e^{-2} ~ 0.135, then when taking the
* average sinus multiplier as 2/pi, we have a 8.6% looping probability in * average sinus multiplier as 2/pi, we have a 8.6% looping probability in
* the worst case. For a 5.0 threshold value, the looping probability is * the worst case. For a parameter value of 5.0, the looping probability is
* about e^{-5} * 2 / pi ~ 0.43%. * about e^{-5} * 2 / pi ~ 0.43%.
*/ */
do do
...@@ -553,10 +553,10 @@ getGaussianRand(TState *thread, int64 min, int64 max, double threshold) ...@@ -553,10 +553,10 @@ getGaussianRand(TState *thread, int64 min, int64 max, double threshold)
* over. * over.
*/ */
} }
while (stdev < -threshold || stdev >= threshold); while (stdev < -parameter || stdev >= parameter);
/* stdev is in [-threshold, threshold), normalization to [0,1) */ /* stdev is in [-parameter, parameter), normalization to [0,1) */
rand = (stdev + threshold) / (threshold * 2.0); rand = (stdev + parameter) / (parameter * 2.0);
/* return int64 random number between min and max */ /* return int64 random number between min and max */
return min + (int64) ((max - min + 1) * rand); return min + (int64) ((max - min + 1) * rand);
...@@ -1483,7 +1483,7 @@ top: ...@@ -1483,7 +1483,7 @@ top:
char *var; char *var;
int64 min, int64 min,
max; max;
double threshold = 0; double parameter = 0;
char res[64]; char res[64];
if (*argv[2] == ':') if (*argv[2] == ':')
...@@ -1554,41 +1554,49 @@ top: ...@@ -1554,41 +1554,49 @@ top:
{ {
if ((var = getVariable(st, argv[5] + 1)) == NULL) if ((var = getVariable(st, argv[5] + 1)) == NULL)
{ {
fprintf(stderr, "%s: invalid threshold number: \"%s\"\n", fprintf(stderr, "%s: invalid parameter: \"%s\"\n",
argv[0], argv[5]); argv[0], argv[5]);
st->ecnt++; st->ecnt++;
return true; return true;
} }
threshold = strtod(var, NULL); parameter = strtod(var, NULL);
} }
else else
threshold = strtod(argv[5], NULL); parameter = strtod(argv[5], NULL);
if (pg_strcasecmp(argv[4], "gaussian") == 0) if (pg_strcasecmp(argv[4], "gaussian") == 0)
{ {
if (threshold < MIN_GAUSSIAN_THRESHOLD) if (parameter < MIN_GAUSSIAN_PARAM)
{ {
fprintf(stderr, "gaussian threshold must be at least %f (not \"%s\")\n", MIN_GAUSSIAN_THRESHOLD, argv[5]); fprintf(stderr, "gaussian parameter must be at least %f (not \"%s\")\n", MIN_GAUSSIAN_PARAM, argv[5]);
st->ecnt++; st->ecnt++;
return true; return true;
} }
#ifdef DEBUG #ifdef DEBUG
printf("min: " INT64_FORMAT " max: " INT64_FORMAT " random: " INT64_FORMAT "\n", min, max, getGaussianRand(thread, min, max, threshold)); printf("min: " INT64_FORMAT " max: " INT64_FORMAT " random: " INT64_FORMAT "\n",
min, max,
getGaussianRand(thread, min, max, parameter));
#endif #endif
snprintf(res, sizeof(res), INT64_FORMAT, getGaussianRand(thread, min, max, threshold)); snprintf(res, sizeof(res), INT64_FORMAT,
getGaussianRand(thread, min, max, parameter));
} }
else if (pg_strcasecmp(argv[4], "exponential") == 0) else if (pg_strcasecmp(argv[4], "exponential") == 0)
{ {
if (threshold <= 0.0) if (parameter <= 0.0)
{ {
fprintf(stderr, "exponential threshold must be greater than zero (not \"%s\")\n", argv[5]); fprintf(stderr,
"exponential parameter must be greater than zero (not \"%s\")\n",
argv[5]);
st->ecnt++; st->ecnt++;
return true; return true;
} }
#ifdef DEBUG #ifdef DEBUG
printf("min: " INT64_FORMAT " max: " INT64_FORMAT " random: " INT64_FORMAT "\n", min, max, getExponentialRand(thread, min, max, threshold)); printf("min: " INT64_FORMAT " max: " INT64_FORMAT " random: " INT64_FORMAT "\n",
min, max,
getExponentialRand(thread, min, max, parameter));
#endif #endif
snprintf(res, sizeof(res), INT64_FORMAT, getExponentialRand(thread, min, max, threshold)); snprintf(res, sizeof(res), INT64_FORMAT,
getExponentialRand(thread, min, max, parameter));
} }
} }
else /* this means an error somewhere in the parsing phase... */ else /* this means an error somewhere in the parsing phase... */
...@@ -2282,8 +2290,9 @@ process_commands(char *buf, const char *source, const int lineno) ...@@ -2282,8 +2290,9 @@ process_commands(char *buf, const char *source, const int lineno)
if (pg_strcasecmp(my_commands->argv[0], "setrandom") == 0) if (pg_strcasecmp(my_commands->argv[0], "setrandom") == 0)
{ {
/* /*
* parsing: \setrandom variable min max [uniform] \setrandom * parsing:
* variable min max (gaussian|exponential) threshold * \setrandom variable min max [uniform]
* \setrandom variable min max (gaussian|exponential) parameter
*/ */
if (my_commands->argc < 4) if (my_commands->argc < 4)
...@@ -2308,7 +2317,7 @@ process_commands(char *buf, const char *source, const int lineno) ...@@ -2308,7 +2317,7 @@ process_commands(char *buf, const char *source, const int lineno)
if (my_commands->argc < 6) if (my_commands->argc < 6)
{ {
syntax_error(source, lineno, my_commands->line, my_commands->argv[0], syntax_error(source, lineno, my_commands->line, my_commands->argv[0],
"missing threshold argument", my_commands->argv[4], -1); "missing parameter", my_commands->argv[4], -1);
} }
else if (my_commands->argc > 6) else if (my_commands->argc > 6)
{ {
......
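The validation rules that the renamed messages report can be summarized in one place. This is a hedged sketch rather than pgbench code: `check_parameter` is a hypothetical helper, it compares names with case-sensitive `strcmp` where the diff uses `pg_strcasecmp`, and it is deliberately stricter than the diff in that it rejects trailing garbage and NaN, which `strtod(..., NULL)` alone would let through.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define MIN_GAUSSIAN_PARAM 2.0	/* minimum parameter for gauss */

/*
 * Hypothetical helper: parse text as a distribution parameter and apply the
 * limits from the diff. Returns NULL on success, or a short description of
 * the problem otherwise.
 */
const char *
check_parameter(const char *distribution, const char *text, double *parameter)
{
	char	   *end;

	assert(distribution != NULL && text != NULL && parameter != NULL);

	errno = 0;
	*parameter = strtod(text, &end);
	if (end == text || *end != '\0' || errno == ERANGE)
		return "invalid parameter";

	if (strcmp(distribution, "gaussian") == 0)
	{
		/* written as !(a >= b) so that NaN is rejected as well */
		if (!(*parameter >= MIN_GAUSSIAN_PARAM))
			return "gaussian parameter must be at least 2.0";
		return NULL;
	}
	if (strcmp(distribution, "exponential") == 0)
	{
		if (!(*parameter > 0.0))
			return "exponential parameter must be greater than zero";
		return NULL;
	}
	return "unknown distribution";
}
```

For example, `check_parameter("gaussian", "2.5", &p)` returns `NULL` and stores 2.5 in `p`, while `check_parameter("exponential", "0", &p)` returns the "greater than zero" message.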