Commit 044c99bc authored by Tom Lane

Use query collation, not column's collation, while examining statistics.

Commit 5e092800 changed the planner so that, instead of blindly using
DEFAULT_COLLATION_OID when invoking operators for selectivity estimation,
it would use the collation of the column whose statistics we're
considering.  This was recognized as still being not quite the right
thing, but it seemed like a good incremental improvement.  However,
shortly thereafter we introduced nondeterministic collations, and that
creates cases where operators can fail if they're passed the wrong
collation.  We don't want planning to fail in cases where the query itself
would work, so this means that we *must* use the query's collation when
invoking operators for estimation purposes.
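A minimal standalone sketch of the point above. All names are hypothetical: `Oid`, `COLL_C`, `COLL_NONDET`, `CollatedOp`, `like_prefix` and `mcv_selectivity_sketch` are illustrations, not PostgreSQL's actual API. An operator error is modelled as a `false` return rather than an `ereport`. The estimator simply forwards whatever collation it is given to the operator, so that choice alone decides whether estimation can fail:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef unsigned int Oid;               /* stand-in for PostgreSQL's Oid */

/* Hypothetical collation identifiers, for this sketch only. */
#define COLL_C      ((Oid) 1)           /* deterministic */
#define COLL_NONDET ((Oid) 2)           /* nondeterministic */

/* An operator returns false to model "raised an error"; otherwise it
 * stores the comparison result in *result and returns true. */
typedef bool (*CollatedOp)(const char *lhs, const char *rhs, Oid collation,
                           bool *result);

/* Hypothetical "lhs starts with rhs" operator that, like some real string
 * operators, refuses to run under a nondeterministic collation. */
static bool like_prefix(const char *lhs, const char *rhs, Oid collation,
                        bool *result)
{
    if (collation == COLL_NONDET)
        return false;                   /* the real thing would raise an error */
    *result = strncmp(lhs, rhs, strlen(rhs)) == 0;
    return true;
}

/* Sum the frequencies of MCV entries for which "mcv OP constval" holds,
 * invoking the operator with the collation supplied by the caller.
 * Returns -1.0 if the operator fails. */
static double mcv_selectivity_sketch(const char *const *mcvs,
                                     const double *freqs, size_t n,
                                     CollatedOp op, Oid collation,
                                     const char *constval)
{
    double sel = 0.0;

    assert(n == 0 || (mcvs != NULL && freqs != NULL));
    for (size_t i = 0; i < n; i++)
    {
        bool match;

        if (!op(mcvs[i], constval, collation, &match))
            return -1.0;
        if (match)
            sel += freqs[i];
    }
    return sel;
}
```

Suppose the column is declared with the nondeterministic collation and the query specifies `COLL_C`. Passing the query's collation yields the summed MCV frequency. Passing the column's collation makes the sketch report failure, even though the query itself would have run.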

The only real problem this creates is in ineq_histogram_selectivity, where
the binary search might produce a garbage answer if we perform comparisons
using a different collation than the column's histogram is ordered with.
However, when the query's collation is significantly different from the
column's default collation, the estimate we previously generated would be
pretty irrelevant anyway; so it's not clear that this will result in
noticeably worse estimates in practice.  (A follow-on patch will improve
this situation in HEAD, but it seems too invasive for back-patch.)
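A simplified sketch of that hazard. It uses hypothetical helpers (`hist_bsearch_below`, `hist_count_below`, `bytewise_lt`, `caseless_lt`) rather than the real ineq_histogram_selectivity, which also interpolates within the bin it locates. The histogram bounds are sorted bytewise. A binary search that compares them case-insensitively still returns an answer, but not necessarily the right one:

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef bool (*LessThan)(const char *a, const char *b);

/* The ordering the histogram was built with: plain byte order. */
static bool bytewise_lt(const char *a, const char *b)
{
    return strcmp(a, b) < 0;
}

/* A different ordering: ASCII case-insensitive comparison. */
static bool caseless_lt(const char *a, const char *b)
{
    for (;; a++, b++)
    {
        int ca = tolower((unsigned char) *a);
        int cb = tolower((unsigned char) *b);

        if (ca != cb || ca == '\0')
            return ca < cb;
    }
}

/* Binary search for the number of leading bounds with bound < constval.
 * The answer is only trustworthy if hist[] is sorted according to lt. */
static size_t hist_bsearch_below(const char *const *hist, size_t n,
                                 LessThan lt, const char *constval)
{
    size_t lo = 0, hi = n;

    assert(n == 0 || hist != NULL);
    while (lo < hi)
    {
        size_t mid = lo + (hi - lo) / 2;

        if (lt(hist[mid], constval))
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

/* Ground truth: count every bound that compares below constval. */
static size_t hist_count_below(const char *const *hist, size_t n,
                               LessThan lt, const char *constval)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++)
        if (lt(hist[i], constval))
            count++;
    return count;
}
```

Take the condition `x < "b"` over the bounds `Apple, Zebra, apple, mango, zebra`. The bytewise search agrees with a bytewise linear count (3 bounds). The case-insensitive search also reports 3, although only 2 bounds actually sort below `"b"` in that ordering. The search therefore lands in the wrong bin without raising any error.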

The patch requires changing the signatures of mcv_selectivity and allied
functions, which are exported and very possibly are used by extensions.
In HEAD, I just did that, but an API/ABI break of this sort isn't
acceptable in stable branches.  Therefore, in v12 the patch introduces
"mcv_selectivity_ext" and so on, with signatures matching HEAD, and makes
the old functions into wrappers that assume DEFAULT_COLLATION_OID should
be used.  That does not match the prior behavior, but it should avoid risk
of failure in most cases.  (In practice, I think most extension datatypes
aren't collation-aware, so the change probably doesn't matter to them.)
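The back-branch compatibility arrangement can be sketched as follows. The names `mcv_selectivity_ext_sketch`, `mcv_selectivity_sketch` and `SKETCH_DEFAULT_COLLATION`, and the int-only signatures, are hypothetical simplifications, not the real v12 declarations. The `_ext` variant takes the collation explicitly. The old entry point keeps its signature and forwards the call with an assumed default collation. The sample operator ignores its collation argument, as most extension datatypes presumably would:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned int Oid;               /* stand-in for PostgreSQL's Oid */

/* Placeholder value standing in for DEFAULT_COLLATION_OID in this sketch. */
#define SKETCH_DEFAULT_COLLATION ((Oid) 100)

typedef bool (*IntOp)(int lhs, int rhs, Oid collation);

/* New-style entry point: the caller passes the collation explicitly. */
static double mcv_selectivity_ext_sketch(const int *mcvs, const double *freqs,
                                         size_t n, IntOp op, Oid collation,
                                         int constval)
{
    double sel = 0.0;

    assert(n == 0 || (mcvs != NULL && freqs != NULL));
    for (size_t i = 0; i < n; i++)
        if (op(mcvs[i], constval, collation))
            sel += freqs[i];
    return sel;
}

/* Old-style entry point with its signature unchanged, so existing callers
 * keep working: it forwards to the _ext version with the default collation. */
static double mcv_selectivity_sketch(const int *mcvs, const double *freqs,
                                     size_t n, IntOp op, int constval)
{
    return mcv_selectivity_ext_sketch(mcvs, freqs, n, op,
                                      SKETCH_DEFAULT_COLLATION, constval);
}

/* A collation-unaware "<" operator.  It records the collation it was
 * handed so the forwarding behavior can be observed. */
static Oid last_collation_seen;

static bool int_lt_recording(int lhs, int rhs, Oid collation)
{
    last_collation_seen = collation;
    return lhs < rhs;
}
```

Because `int_lt_recording` never looks at the collation, both entry points return the same estimate here. Only the collation handed to the operator differs.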

Per report from James Lucas.  Back-patch to v12 where the problem was
introduced.

Discussion: https://postgr.es/m/CAAFmbbOvfi=wMM=3qRsPunBSLb8BFREno2oOzSBS=mzfLPKABw@mail.gmail.com
parent f0d2c65f
@@ -582,7 +582,7 @@ ltreeparentsel(PG_FUNCTION_ARGS)
 double selec;
 /* Use generic restriction selectivity logic, with default 0.001. */
-selec = generic_restriction_selectivity(root, operator,
+selec = generic_restriction_selectivity(root, operator, InvalidOid,
 args, varRelid,
 0.001);
......
@@ -92,6 +92,7 @@ static Pattern_Prefix_Status pattern_fixed_prefix(Const *patt,
 static Selectivity prefix_selectivity(PlannerInfo *root,
 VariableStatData *vardata,
 Oid eqopr, Oid ltopr, Oid geopr,
+Oid collation,
 Const *prefixcon);
 static Selectivity like_selectivity(const char *patt, int pattlen,
 bool case_insensitive);
@@ -534,12 +535,6 @@ patternsel_common(PlannerInfo *root,
  * something binary-compatible but different.) We can use it to identify
  * the comparison operators and the required type of the comparison
  * constant, much as in match_pattern_prefix().
- *
- * NOTE: this logic does not consider collations. Ideally we'd force use
- * of "C" collation, but since ANALYZE only generates statistics for the
- * column's specified collation, we have little choice but to use those.
- * But our results are so approximate anyway that it probably hardly
- * matters.
  */
 vartype = vardata.vartype;
@@ -622,7 +617,7 @@ patternsel_common(PlannerInfo *root,
 /*
  * Pattern specifies an exact match, so estimate as for '='
  */
-result = var_eq_const(&vardata, eqopr, prefix->constvalue,
+result = var_eq_const(&vardata, eqopr, collation, prefix->constvalue,
 false, true, false);
 }
 else
@@ -654,7 +649,8 @@ patternsel_common(PlannerInfo *root,
 opfuncid = get_opcode(oprid);
 fmgr_info(opfuncid, &opproc);
-selec = histogram_selectivity(&vardata, &opproc, constval, true,
+selec = histogram_selectivity(&vardata, &opproc, collation,
+constval, true,
 10, 1, &hist_size);
 /* If not at least 100 entries, use the heuristic method */
@@ -666,6 +662,7 @@ patternsel_common(PlannerInfo *root,
 if (pstatus == Pattern_Prefix_Partial)
 prefixsel = prefix_selectivity(root, &vardata,
 eqopr, ltopr, geopr,
+collation,
 prefix);
 else
 prefixsel = 1.0;
@@ -698,7 +695,8 @@ patternsel_common(PlannerInfo *root,
  * directly to the result selectivity. Also add up the total fraction
  * represented by MCV entries.
  */
-mcv_selec = mcv_selectivity(&vardata, &opproc, constval, true,
+mcv_selec = mcv_selectivity(&vardata, &opproc, collation,
+constval, true,
 &sumcommon);
 /*
@@ -1196,7 +1194,7 @@ pattern_fixed_prefix(Const *patt, Pattern_Type ptype, Oid collation,
  * population represented by the histogram --- the caller must fold this
  * together with info about MCVs and NULLs.
  *
- * We use the specified btree comparison operators to do the estimation.
+ * We use the given comparison operators and collation to do the estimation.
  * The given variable and Const must be of the associated datatype(s).
  *
  * XXX Note: we make use of the upper bound to estimate operator selectivity
@@ -1207,11 +1205,11 @@ pattern_fixed_prefix(Const *patt, Pattern_Type ptype, Oid collation,
 static Selectivity
 prefix_selectivity(PlannerInfo *root, VariableStatData *vardata,
 Oid eqopr, Oid ltopr, Oid geopr,
+Oid collation,
 Const *prefixcon)
 {
 Selectivity prefixsel;
 FmgrInfo opproc;
-AttStatsSlot sslot;
 Const *greaterstrcon;
 Selectivity eq_sel;
@@ -1220,6 +1218,7 @@ prefix_selectivity(PlannerInfo *root, VariableStatData *vardata,
 prefixsel = ineq_histogram_selectivity(root, vardata,
 &opproc, true, true,
+collation,
 prefixcon->constvalue,
 prefixcon->consttype);
@@ -1229,27 +1228,18 @@ prefix_selectivity(PlannerInfo *root, VariableStatData *vardata,
 return DEFAULT_MATCH_SEL;
 }
-/*-------
- * If we can create a string larger than the prefix, say
- * "x < greaterstr". We try to generate the string referencing the
- * collation of the var's statistics, but if that's not available,
- * use DEFAULT_COLLATION_OID.
- *-------
+/*
+ * If we can create a string larger than the prefix, say "x < greaterstr".
  */
-if (HeapTupleIsValid(vardata->statsTuple) &&
-get_attstatsslot(&sslot, vardata->statsTuple,
-STATISTIC_KIND_HISTOGRAM, InvalidOid, 0))
-/* sslot.stacoll is set up */ ;
-else
-sslot.stacoll = DEFAULT_COLLATION_OID;
 fmgr_info(get_opcode(ltopr), &opproc);
-greaterstrcon = make_greater_string(prefixcon, &opproc, sslot.stacoll);
+greaterstrcon = make_greater_string(prefixcon, &opproc, collation);
 if (greaterstrcon)
 {
 Selectivity topsel;
 topsel = ineq_histogram_selectivity(root, vardata,
 &opproc, false, false,
+collation,
 greaterstrcon->constvalue,
 greaterstrcon->consttype);
@@ -1278,7 +1268,7 @@ prefix_selectivity(PlannerInfo *root, VariableStatData *vardata,
  * probably off the end of the histogram, and thus we probably got a very
  * small estimate from the >= condition; so we still need to clamp.
  */
-eq_sel = var_eq_const(vardata, eqopr, prefixcon->constvalue,
+eq_sel = var_eq_const(vardata, eqopr, collation, prefixcon->constvalue,
 false, true, false);
 prefixsel = Max(prefixsel, eq_sel);
......
@@ -137,7 +137,8 @@ networksel(PG_FUNCTION_ARGS)
  * by MCV entries.
  */
 fmgr_info(get_opcode(operator), &proc);
-mcv_selec = mcv_selectivity(&vardata, &proc, constvalue, varonleft,
+mcv_selec = mcv_selectivity(&vardata, &proc, InvalidOid,
+constvalue, varonleft,
 &sumcommon);
 /*
......
@@ -88,11 +88,7 @@
  * (if any) is passed using the standard fmgr mechanism, so that the estimator
  * function can fetch it with PG_GET_COLLATION(). Note, however, that all
  * statistics in pg_statistic are currently built using the relevant column's
- * collation. Thus, in most cases where we are looking at statistics, we
- * should ignore the operator collation and use the stats entry's collation.
- * We expect that the error induced by doing this is usually not large enough
- * to justify complicating matters. In any case, doing otherwise would yield
- * entirely garbage results for ordered stats data such as histograms.
+ * collation.
  *----------
  */
@@ -149,14 +145,14 @@ get_relation_stats_hook_type get_relation_stats_hook = NULL;
 get_index_stats_hook_type get_index_stats_hook = NULL;
 static double eqsel_internal(PG_FUNCTION_ARGS, bool negate);
-static double eqjoinsel_inner(Oid opfuncoid,
+static double eqjoinsel_inner(Oid opfuncoid, Oid collation,
 VariableStatData *vardata1, VariableStatData *vardata2,
 double nd1, double nd2,
 bool isdefault1, bool isdefault2,
 AttStatsSlot *sslot1, AttStatsSlot *sslot2,
 Form_pg_statistic stats1, Form_pg_statistic stats2,
 bool have_mcvs1, bool have_mcvs2);
-static double eqjoinsel_semi(Oid opfuncoid,
+static double eqjoinsel_semi(Oid opfuncoid, Oid collation,
 VariableStatData *vardata1, VariableStatData *vardata2,
 double nd1, double nd2,
 bool isdefault1, bool isdefault2,
@@ -194,10 +190,11 @@ static double convert_timevalue_to_scalar(Datum value, Oid typid,
 static void examine_simple_variable(PlannerInfo *root, Var *var,
 VariableStatData *vardata);
 static bool get_variable_range(PlannerInfo *root, VariableStatData *vardata,
-Oid sortop, Datum *min, Datum *max);
+Oid sortop, Oid collation,
+Datum *min, Datum *max);
 static bool get_actual_variable_range(PlannerInfo *root,
 VariableStatData *vardata,
-Oid sortop,
+Oid sortop, Oid collation,
 Datum *min, Datum *max);
 static bool get_actual_variable_endpoint(Relation heapRel,
 Relation indexRel,
@@ -235,6 +232,7 @@ eqsel_internal(PG_FUNCTION_ARGS, bool negate)
 Oid operator = PG_GETARG_OID(1);
 List *args = (List *) PG_GETARG_POINTER(2);
 int varRelid = PG_GETARG_INT32(3);
+Oid collation = PG_GET_COLLATION();
 VariableStatData vardata;
 Node *other;
 bool varonleft;
@@ -268,12 +266,12 @@ eqsel_internal(PG_FUNCTION_ARGS, bool negate)
  * in the query.)
  */
 if (IsA(other, Const))
-selec = var_eq_const(&vardata, operator,
+selec = var_eq_const(&vardata, operator, collation,
 ((Const *) other)->constvalue,
 ((Const *) other)->constisnull,
 varonleft, negate);
 else
-selec = var_eq_non_const(&vardata, operator, other,
+selec = var_eq_non_const(&vardata, operator, collation, other,
 varonleft, negate);
 ReleaseVariableStats(vardata);
@@ -287,7 +285,7 @@ eqsel_internal(PG_FUNCTION_ARGS, bool negate)
  * This is exported so that some other estimation functions can use it.
  */
 double
-var_eq_const(VariableStatData *vardata, Oid operator,
+var_eq_const(VariableStatData *vardata, Oid operator, Oid collation,
 Datum constval, bool constisnull,
 bool varonleft, bool negate)
 {
@@ -356,7 +354,7 @@ var_eq_const(VariableStatData *vardata, Oid operator,
  * eqproc returns NULL, though really equality functions should
  * never do that.
  */
-InitFunctionCallInfoData(*fcinfo, &eqproc, 2, sslot.stacoll,
+InitFunctionCallInfoData(*fcinfo, &eqproc, 2, collation,
 NULL, NULL);
 fcinfo->args[0].isnull = false;
 fcinfo->args[1].isnull = false;
@@ -458,7 +456,7 @@ var_eq_const(VariableStatData *vardata, Oid operator,
  * This is exported so that some other estimation functions can use it.
  */
 double
-var_eq_non_const(VariableStatData *vardata, Oid operator,
+var_eq_non_const(VariableStatData *vardata, Oid operator, Oid collation,
 Node *other,
 bool varonleft, bool negate)
 {
@@ -573,6 +571,7 @@ neqsel(PG_FUNCTION_ARGS)
  */
 static double
 scalarineqsel(PlannerInfo *root, Oid operator, bool isgt, bool iseq,
+Oid collation,
 VariableStatData *vardata, Datum constval, Oid consttype)
 {
 Form_pg_statistic stats;
@@ -672,7 +671,7 @@ scalarineqsel(PlannerInfo *root, Oid operator, bool isgt, bool iseq,
  * to the result selectivity. Also add up the total fraction represented
  * by MCV entries.
  */
-mcv_selec = mcv_selectivity(vardata, &opproc, constval, true,
+mcv_selec = mcv_selectivity(vardata, &opproc, collation, constval, true,
 &sumcommon);
 /*
@@ -681,6 +680,7 @@ scalarineqsel(PlannerInfo *root, Oid operator, bool isgt, bool iseq,
  */
 hist_selec = ineq_histogram_selectivity(root, vardata,
 &opproc, isgt, iseq,
+collation,
 constval, consttype);
 /*
@@ -722,7 +722,7 @@ scalarineqsel(PlannerInfo *root, Oid operator, bool isgt, bool iseq,
  * if there is no MCV list.
  */
 double
-mcv_selectivity(VariableStatData *vardata, FmgrInfo *opproc,
+mcv_selectivity(VariableStatData *vardata, FmgrInfo *opproc, Oid collation,
 Datum constval, bool varonleft,
 double *sumcommonp)
 {
@@ -749,7 +749,7 @@ mcv_selectivity(VariableStatData *vardata, FmgrInfo *opproc,
  * operators that can return NULL. A small side benefit is to not
  * need to re-initialize the fcinfo struct from scratch each time.
  */
-InitFunctionCallInfoData(*fcinfo, opproc, 2, sslot.stacoll,
+InitFunctionCallInfoData(*fcinfo, opproc, 2, collation,
 NULL, NULL);
 fcinfo->args[0].isnull = false;
 fcinfo->args[1].isnull = false;
@@ -813,7 +813,8 @@ mcv_selectivity(VariableStatData *vardata, FmgrInfo *opproc,
  * prudent to clamp the result range, ie, disbelieve exact 0 or 1 outputs.
  */
 double
-histogram_selectivity(VariableStatData *vardata, FmgrInfo *opproc,
+histogram_selectivity(VariableStatData *vardata,
+FmgrInfo *opproc, Oid collation,
 Datum constval, bool varonleft,
 int min_hist_size, int n_skip,
 int *hist_size)
@@ -846,7 +847,7 @@ histogram_selectivity(VariableStatData *vardata, FmgrInfo *opproc,
  * is to not need to re-initialize the fcinfo struct from scratch
  * each time.
  */
-InitFunctionCallInfoData(*fcinfo, opproc, 2, sslot.stacoll,
+InitFunctionCallInfoData(*fcinfo, opproc, 2, collation,
 NULL, NULL);
 fcinfo->args[0].isnull = false;
 fcinfo->args[1].isnull = false;
@@ -903,7 +904,7 @@ histogram_selectivity(VariableStatData *vardata, FmgrInfo *opproc,
  * Otherwise, fall back to the default selectivity provided by the caller.
  */
 double
-generic_restriction_selectivity(PlannerInfo *root, Oid oproid,
+generic_restriction_selectivity(PlannerInfo *root, Oid oproid, Oid collation,
 List *args, int varRelid,
 double default_selectivity)
 {
@@ -946,7 +947,8 @@ generic_restriction_selectivity(PlannerInfo *root, Oid oproid,
 /*
  * Calculate the selectivity for the column's most common values.
  */
-mcvsel = mcv_selectivity(&vardata, &opproc, constval, varonleft,
+mcvsel = mcv_selectivity(&vardata, &opproc, collation,
+constval, varonleft,
 &mcvsum);
 /*
@@ -955,7 +957,7 @@ generic_restriction_selectivity(PlannerInfo *root, Oid oproid,
  * population. Otherwise use the default selectivity for the non-MCV
  * population.
  */
-selec = histogram_selectivity(&vardata, &opproc,
+selec = histogram_selectivity(&vardata, &opproc, collation,
 constval, varonleft,
 10, 1, &hist_size);
 if (selec < 0)
@@ -1029,6 +1031,7 @@ double
 ineq_histogram_selectivity(PlannerInfo *root,
 VariableStatData *vardata,
 FmgrInfo *opproc, bool isgt, bool iseq,
+Oid collation,
 Datum constval, Oid consttype)
 {
 double hist_selec;
@@ -1042,9 +1045,11 @@ ineq_histogram_selectivity(PlannerInfo *root,
  * column type. However, to make that work we will need to figure out
  * which staop to search for --- it's not necessarily the one we have at
  * hand! (For example, we might have a '<=' operator rather than the '<'
- * operator that will appear in staop.) For now, assume that whatever
- * appears in pg_statistic is sorted the same way our operator sorts, or
- * the reverse way if isgt is true.
+ * operator that will appear in staop.) The collation might not agree
+ * either. For now, just assume that whatever appears in pg_statistic is
+ * sorted the same way our operator sorts, or the reverse way if isgt is
+ * true. This could result in a bogus estimate, but it still seems better
+ * than falling back to the default estimate.
  */
 if (HeapTupleIsValid(vardata->statsTuple) &&
 statistic_proc_security_check(vardata, opproc->fn_oid) &&
@@ -1090,6 +1095,7 @@ ineq_histogram_selectivity(PlannerInfo *root,
 have_end = get_actual_variable_range(root,
 vardata,
 sslot.staop,
+collation,
 &sslot.values[0],
 &sslot.values[1]);
@@ -1107,17 +1113,19 @@ ineq_histogram_selectivity(PlannerInfo *root,
 have_end = get_actual_variable_range(root,
 vardata,
 sslot.staop,
+collation,
 &sslot.values[0],
 NULL);
 else if (probe == sslot.nvalues - 1 && sslot.nvalues > 2)
 have_end = get_actual_variable_range(root,
 vardata,
 sslot.staop,
+collation,
 NULL,
 &sslot.values[probe]);
 ltcmp = DatumGetBool(FunctionCall2Coll(opproc,
-sslot.stacoll,
+collation,
 sslot.values[probe],
 constval));
 if (isgt)
@@ -1202,7 +1210,7 @@ ineq_histogram_selectivity(PlannerInfo *root,
  * values to a uniform comparison scale, and do a linear
  * interpolation within this bin.
  */
-if (convert_to_scalar(constval, consttype, sslot.stacoll,
+if (convert_to_scalar(constval, consttype, collation,
 &val,
 sslot.values[i - 1], sslot.values[i],
 vardata->vartype,
@@ -1342,6 +1350,7 @@ scalarineqsel_wrapper(PG_FUNCTION_ARGS, bool isgt, bool iseq)
 Oid operator = PG_GETARG_OID(1);
 List *args = (List *) PG_GETARG_POINTER(2);
 int varRelid = PG_GETARG_INT32(3);
+Oid collation = PG_GET_COLLATION();
 VariableStatData vardata;
 Node *other;
 bool varonleft;
@@ -1394,7 +1403,7 @@ scalarineqsel_wrapper(PG_FUNCTION_ARGS, bool isgt, bool iseq)
 }
 /* The rest of the work is done by scalarineqsel(). */
-selec = scalarineqsel(root, operator, isgt, iseq,
+selec = scalarineqsel(root, operator, isgt, iseq, collation,
 &vardata, constval, consttype);
 ReleaseVariableStats(vardata);
@@ -1459,7 +1468,7 @@ boolvarsel(PlannerInfo *root, Node *arg, int varRelid)
  * A boolean variable V is equivalent to the clause V = 't', so we
  * compute the selectivity as if that is what we have.
  */
-selec = var_eq_const(&vardata, BooleanEqualOperator,
+selec = var_eq_const(&vardata, BooleanEqualOperator, InvalidOid,
 BoolGetDatum(true), false, true, false);
 }
 else
@@ -2185,6 +2194,7 @@ eqjoinsel(PG_FUNCTION_ARGS)
 JoinType jointype = (JoinType) PG_GETARG_INT16(3);
 #endif
 SpecialJoinInfo *sjinfo = (SpecialJoinInfo *) PG_GETARG_POINTER(4);
+Oid collation = PG_GET_COLLATION();
 double selec;
 double selec_inner;
 VariableStatData vardata1;
@@ -2235,7 +2245,7 @@ eqjoinsel(PG_FUNCTION_ARGS)
 }
 /* We need to compute the inner-join selectivity in all cases */
-selec_inner = eqjoinsel_inner(opfuncoid,
+selec_inner = eqjoinsel_inner(opfuncoid, collation,
 &vardata1, &vardata2,
 nd1, nd2,
 isdefault1, isdefault2,
@@ -2262,7 +2272,7 @@ eqjoinsel(PG_FUNCTION_ARGS)
 inner_rel = find_join_input_rel(root, sjinfo->min_righthand);
 if (!join_is_reversed)
-selec = eqjoinsel_semi(opfuncoid,
+selec = eqjoinsel_semi(opfuncoid, collation,
 &vardata1, &vardata2,
 nd1, nd2,
 isdefault1, isdefault2,
@@ -2275,7 +2285,7 @@ eqjoinsel(PG_FUNCTION_ARGS)
 Oid commop = get_commutator(operator);
 Oid commopfuncoid = OidIsValid(commop) ? get_opcode(commop) : InvalidOid;
-selec = eqjoinsel_semi(commopfuncoid,
+selec = eqjoinsel_semi(commopfuncoid, collation,
 &vardata2, &vardata1,
 nd2, nd1,
 isdefault2, isdefault1,
@@ -2323,7 +2333,7 @@ eqjoinsel(PG_FUNCTION_ARGS)
  * that it's worth trying to distinguish them here.
  */
 static double
-eqjoinsel_inner(Oid opfuncoid,
+eqjoinsel_inner(Oid opfuncoid, Oid collation,
 VariableStatData *vardata1, VariableStatData *vardata2,
 double nd1, double nd2,
 bool isdefault1, bool isdefault2,
@@ -2373,7 +2383,7 @@ eqjoinsel_inner(Oid opfuncoid,
  * returns NULL, though really equality functions should never do
  * that.
  */
-InitFunctionCallInfoData(*fcinfo, &eqproc, 2, sslot1->stacoll,
+InitFunctionCallInfoData(*fcinfo, &eqproc, 2, collation,
 NULL, NULL);
 fcinfo->args[0].isnull = false;
 fcinfo->args[1].isnull = false;
@@ -2520,7 +2530,7 @@ eqjoinsel_inner(Oid opfuncoid,
  * Unlike eqjoinsel_inner, we have to cope with opfuncoid being InvalidOid.
  */
 static double
-eqjoinsel_semi(Oid opfuncoid,
+eqjoinsel_semi(Oid opfuncoid, Oid collation,
 VariableStatData *vardata1, VariableStatData *vardata2,
 double nd1, double nd2,
 bool isdefault1, bool isdefault2,
@@ -2603,7 +2613,7 @@ eqjoinsel_semi(Oid opfuncoid,
  * returns NULL, though really equality functions should never do
  * that.
  */
-InitFunctionCallInfoData(*fcinfo, &eqproc, 2, sslot1->stacoll,
+InitFunctionCallInfoData(*fcinfo, &eqproc, 2, collation,
 NULL, NULL);
 fcinfo->args[0].isnull = false;
 fcinfo->args[1].isnull = false;
@@ -2851,6 +2861,7 @@ mergejoinscansel(PlannerInfo *root, Node *clause,
 Oid op_lefttype;
 Oid op_righttype;
 Oid opno,
+collation,
 lsortop,
 rsortop,
 lstatop,
@@ -2875,6 +2886,7 @@ mergejoinscansel(PlannerInfo *root, Node *clause,
 if (!is_opclause(clause))
 return; /* shouldn't happen */
 opno = ((OpExpr *) clause)->opno;
+collation = ((OpExpr *) clause)->inputcollid;
 left = get_leftop((Expr *) clause);
 right = get_rightop((Expr *) clause);
 if (!right)
@@ -3008,20 +3020,20 @@ mergejoinscansel(PlannerInfo *root, Node *clause,
 /* Try to get ranges of both inputs */
 if (!isgt)
 {
-if (!get_variable_range(root, &leftvar, lstatop,
+if (!get_variable_range(root, &leftvar, lstatop, collation,
 &leftmin, &leftmax))
 goto fail; /* no range available from stats */
-if (!get_variable_range(root, &rightvar, rstatop,
+if (!get_variable_range(root, &rightvar, rstatop, collation,
 &rightmin, &rightmax))
 goto fail; /* no range available from stats */
 }
 else
 {
 /* need to swap the max and min */
-if (!get_variable_range(root, &leftvar, lstatop,
+if (!get_variable_range(root, &leftvar, lstatop, collation,
 &leftmax, &leftmin))
 goto fail; /* no range available from stats */
-if (!get_variable_range(root, &rightvar, rstatop,
+if (!get_variable_range(root, &rightvar, rstatop, collation,
 &rightmax, &rightmin))
 goto fail; /* no range available from stats */
 }
@@ -3031,13 +3043,13 @@ mergejoinscansel(PlannerInfo *root, Node *clause,
  * fraction that's <= the right-side maximum value. But only believe
  * non-default estimates, else stick with our 1.0.
*/ */
selec = scalarineqsel(root, leop, isgt, true, &leftvar, selec = scalarineqsel(root, leop, isgt, true, collation, &leftvar,
rightmax, op_righttype); rightmax, op_righttype);
if (selec != DEFAULT_INEQ_SEL) if (selec != DEFAULT_INEQ_SEL)
*leftend = selec; *leftend = selec;
/* And similarly for the right variable. */ /* And similarly for the right variable. */
selec = scalarineqsel(root, revleop, isgt, true, &rightvar, selec = scalarineqsel(root, revleop, isgt, true, collation, &rightvar,
leftmax, op_lefttype); leftmax, op_lefttype);
if (selec != DEFAULT_INEQ_SEL) if (selec != DEFAULT_INEQ_SEL)
*rightend = selec; *rightend = selec;
...@@ -3061,13 +3073,13 @@ mergejoinscansel(PlannerInfo *root, Node *clause, ...@@ -3061,13 +3073,13 @@ mergejoinscansel(PlannerInfo *root, Node *clause,
* minimum value. But only believe non-default estimates, else stick with * minimum value. But only believe non-default estimates, else stick with
* our own default. * our own default.
*/ */
selec = scalarineqsel(root, ltop, isgt, false, &leftvar, selec = scalarineqsel(root, ltop, isgt, false, collation, &leftvar,
rightmin, op_righttype); rightmin, op_righttype);
if (selec != DEFAULT_INEQ_SEL) if (selec != DEFAULT_INEQ_SEL)
*leftstart = selec; *leftstart = selec;
/* And similarly for the right variable. */ /* And similarly for the right variable. */
selec = scalarineqsel(root, revltop, isgt, false, &rightvar, selec = scalarineqsel(root, revltop, isgt, false, collation, &rightvar,
leftmin, op_lefttype); leftmin, op_lefttype);
if (selec != DEFAULT_INEQ_SEL) if (selec != DEFAULT_INEQ_SEL)
*rightstart = selec; *rightstart = selec;
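scalarineqsel, called above with the clause's collation, relies on ineq_histogram_selectivity. The commit message notes that its binary search over the histogram may produce a garbage answer if comparisons use a collation other than the one the histogram is ordered with. The toy sketch below illustrates the effect. The collation enum and the case-insensitive ordering are hypothetical stand-ins. The bounds are sorted in byte order, so a binary search that compares case-insensitively can disagree with a linear count under the same comparator.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical collations for illustration only. */
typedef enum { COLL_C, COLL_CASE_INSENSITIVE } Collation;

/* Three-way comparison whose ordering depends on the collation. */
static int text_cmp(const char *a, const char *b, Collation coll)
{
    if (coll == COLL_C)
        return strcmp(a, b);        /* byte order: 'Z' sorts before 'a' */
    for (;; a++, b++)
    {
        int ca = tolower((unsigned char) *a);
        int cb = tolower((unsigned char) *b);
        if (ca != cb || ca == '\0')
            return ca - cb;
    }
}

/* Binary search for the number of leading bounds that compare less than
 * probe.  Correct only if hist is actually sorted under coll. */
static int bounds_below_bsearch(const char *const *hist, int n,
                                const char *probe, Collation coll)
{
    int lo = 0, hi = n;
    while (lo < hi)
    {
        int mid = lo + (hi - lo) / 2;
        if (text_cmp(hist[mid], probe, coll) < 0)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

/* Linear scan: the true count of bounds less than probe under coll. */
static int bounds_below_linear(const char *const *hist, int n,
                               const char *probe, Collation coll)
{
    int count = 0;
    for (int i = 0; i < n; i++)
        if (text_cmp(hist[i], probe, coll) < 0)
            count++;
    return count;
}
```

For the byte-ordered bounds `{"Apple", "Mango", "Zebra", "banana", "cherry"}` and the probe `"carrot"`, both methods return 4 under `COLL_C`. Under the case-insensitive comparator, the binary search returns 1 while the true count is 2.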
...@@ -3147,10 +3159,11 @@ matchingsel(PG_FUNCTION_ARGS) ...@@ -3147,10 +3159,11 @@ matchingsel(PG_FUNCTION_ARGS)
Oid operator = PG_GETARG_OID(1); Oid operator = PG_GETARG_OID(1);
List *args = (List *) PG_GETARG_POINTER(2); List *args = (List *) PG_GETARG_POINTER(2);
int varRelid = PG_GETARG_INT32(3); int varRelid = PG_GETARG_INT32(3);
Oid collation = PG_GET_COLLATION();
double selec; double selec;
/* Use generic restriction selectivity logic. */ /* Use generic restriction selectivity logic. */
selec = generic_restriction_selectivity(root, operator, selec = generic_restriction_selectivity(root, operator, collation,
args, varRelid, args, varRelid,
DEFAULT_MATCHING_SEL); DEFAULT_MATCHING_SEL);
...@@ -5337,9 +5350,11 @@ get_variable_numdistinct(VariableStatData *vardata, bool *isdefault) ...@@ -5337,9 +5350,11 @@ get_variable_numdistinct(VariableStatData *vardata, bool *isdefault)
* *
* sortop is the "<" comparison operator to use. This should generally * sortop is the "<" comparison operator to use. This should generally
* be "<" not ">", as only the former is likely to be found in pg_statistic. * be "<" not ">", as only the former is likely to be found in pg_statistic.
* The collation must be specified too.
*/ */
static bool static bool
get_variable_range(PlannerInfo *root, VariableStatData *vardata, Oid sortop, get_variable_range(PlannerInfo *root, VariableStatData *vardata,
Oid sortop, Oid collation,
Datum *min, Datum *max) Datum *min, Datum *max)
{ {
Datum tmin = 0; Datum tmin = 0;
...@@ -5359,7 +5374,7 @@ get_variable_range(PlannerInfo *root, VariableStatData *vardata, Oid sortop, ...@@ -5359,7 +5374,7 @@ get_variable_range(PlannerInfo *root, VariableStatData *vardata, Oid sortop,
* before enabling this. * before enabling this.
*/ */
#ifdef NOT_USED #ifdef NOT_USED
if (get_actual_variable_range(root, vardata, sortop, min, max)) if (get_actual_variable_range(root, vardata, sortop, collation, min, max))
return true; return true;
#endif #endif
...@@ -5387,7 +5402,7 @@ get_variable_range(PlannerInfo *root, VariableStatData *vardata, Oid sortop, ...@@ -5387,7 +5402,7 @@ get_variable_range(PlannerInfo *root, VariableStatData *vardata, Oid sortop,
* *
* If there is a histogram that is sorted with some other operator than * If there is a histogram that is sorted with some other operator than
* the one we want, fail --- this suggests that there is data we can't * the one we want, fail --- this suggests that there is data we can't
* use. * use. XXX consider collation too.
*/ */
if (get_attstatsslot(&sslot, vardata->statsTuple, if (get_attstatsslot(&sslot, vardata->statsTuple,
STATISTIC_KIND_HISTOGRAM, sortop, STATISTIC_KIND_HISTOGRAM, sortop,
...@@ -5434,14 +5449,14 @@ get_variable_range(PlannerInfo *root, VariableStatData *vardata, Oid sortop, ...@@ -5434,14 +5449,14 @@ get_variable_range(PlannerInfo *root, VariableStatData *vardata, Oid sortop,
continue; continue;
} }
if (DatumGetBool(FunctionCall2Coll(&opproc, if (DatumGetBool(FunctionCall2Coll(&opproc,
sslot.stacoll, collation,
sslot.values[i], tmin))) sslot.values[i], tmin)))
{ {
tmin = sslot.values[i]; tmin = sslot.values[i];
tmin_is_mcv = true; tmin_is_mcv = true;
} }
if (DatumGetBool(FunctionCall2Coll(&opproc, if (DatumGetBool(FunctionCall2Coll(&opproc,
sslot.stacoll, collation,
tmax, sslot.values[i]))) tmax, sslot.values[i])))
{ {
tmax = sslot.values[i]; tmax = sslot.values[i];
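In the hunk above, get_variable_range now compares MCV entries with the caller-supplied `collation` instead of `sslot.stacoll` when tracking the running minimum and maximum. The sketch below is a simplified, hypothetical version of that scan over plain C strings. Its collations are toy stand-ins. It shows that the same MCV list can yield different endpoints depending on the collation used for the comparisons.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical collations for illustration only. */
typedef enum { COLL_C, COLL_CASE_INSENSITIVE } Collation;

/* Three-way comparison whose ordering depends on the collation. */
static int text_cmp(const char *a, const char *b, Collation coll)
{
    if (coll == COLL_C)
        return strcmp(a, b);
    for (;; a++, b++)
    {
        int ca = tolower((unsigned char) *a);
        int cb = tolower((unsigned char) *b);
        if (ca != cb || ca == '\0')
            return ca - cb;
    }
}

/* Find the smallest and largest MCV under coll, using only a "<" test
 * in each direction as the loop above does.  Requires n >= 1. */
static void mcv_range(const char *const *values, int n, Collation coll,
                      const char **min, const char **max)
{
    assert(n >= 1);
    *min = values[0];
    *max = values[0];
    for (int i = 1; i < n; i++)
    {
        if (text_cmp(values[i], *min, coll) < 0)
            *min = values[i];
        if (text_cmp(*max, values[i], coll) < 0)
            *max = values[i];
    }
}
```

For `{"apple", "Banana", "Cherry"}`, byte order gives the range `"Banana"` to `"apple"`. The case-insensitive ordering gives `"apple"` to `"Cherry"`.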
...@@ -5471,10 +5486,11 @@ get_variable_range(PlannerInfo *root, VariableStatData *vardata, Oid sortop, ...@@ -5471,10 +5486,11 @@ get_variable_range(PlannerInfo *root, VariableStatData *vardata, Oid sortop,
* If no data available, return false. * If no data available, return false.
* *
* sortop is the "<" comparison operator to use. * sortop is the "<" comparison operator to use.
* collation is the required collation.
*/ */
static bool static bool
get_actual_variable_range(PlannerInfo *root, VariableStatData *vardata, get_actual_variable_range(PlannerInfo *root, VariableStatData *vardata,
Oid sortop, Oid sortop, Oid collation,
Datum *min, Datum *max) Datum *min, Datum *max)
{ {
bool have_data = false; bool have_data = false;
...@@ -5514,9 +5530,11 @@ get_actual_variable_range(PlannerInfo *root, VariableStatData *vardata, ...@@ -5514,9 +5530,11 @@ get_actual_variable_range(PlannerInfo *root, VariableStatData *vardata,
continue; continue;
/* /*
* The first index column must match the desired variable and sort * The first index column must match the desired variable, sortop, and
* operator --- but we can use a descending-order index. * collation --- but we can use a descending-order index.
*/ */
if (collation != index->indexcollations[0])
continue; /* test first 'cause it's cheapest */
if (!match_index_to_operand(vardata->var, 0, index)) if (!match_index_to_operand(vardata->var, 0, index))
continue; continue;
switch (get_op_opfamily_strategy(sortop, index->sortopfamily[0])) switch (get_op_opfamily_strategy(sortop, index->sortopfamily[0]))
......
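The get_actual_variable_range hunk above skips any index whose first column's collation (`index->indexcollations[0]`) differs from the requested collation. Per the code comment, it runs this test before match_index_to_operand because it is the cheapest. The sketch below models only that filtering order. The `ToyIndex` struct, `find_range_index`, and the numeric collation values in the usage are hypothetical, not PostgreSQL's planner structures or catalog OIDs.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int Oid;   /* stand-in for PostgreSQL's Oid */

/* Hypothetical, heavily simplified description of an index. */
typedef struct
{
    Oid first_collation;    /* collation of the first index column */
    int first_attno;        /* column number of the first index column */
} ToyIndex;

/* Return the position of the first index whose leading column has the
 * requested collation and is the requested column, or -1 if none.  The
 * collation test comes first because it is a single integer compare. */
static int find_range_index(const ToyIndex *indexes, size_t n,
                            Oid collation, int attno)
{
    for (size_t i = 0; i < n; i++)
    {
        if (indexes[i].first_collation != collation)
            continue;       /* index is ordered under another collation */
        if (indexes[i].first_attno != attno)
            continue;       /* index does not lead with our column */
        return (int) i;
    }
    return -1;
}
```

Given indexes `{{100, 1}, {950, 2}, {950, 1}}`, a lookup for column 1 under collation 950 selects the third index. A lookup for column 2 under collation 100 finds nothing.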
...@@ -144,24 +144,30 @@ extern void get_join_variables(PlannerInfo *root, List *args, ...@@ -144,24 +144,30 @@ extern void get_join_variables(PlannerInfo *root, List *args,
bool *join_is_reversed); bool *join_is_reversed);
extern double get_variable_numdistinct(VariableStatData *vardata, extern double get_variable_numdistinct(VariableStatData *vardata,
bool *isdefault); bool *isdefault);
extern double mcv_selectivity(VariableStatData *vardata, FmgrInfo *opproc, extern double mcv_selectivity(VariableStatData *vardata,
FmgrInfo *opproc, Oid collation,
Datum constval, bool varonleft, Datum constval, bool varonleft,
double *sumcommonp); double *sumcommonp);
extern double histogram_selectivity(VariableStatData *vardata, FmgrInfo *opproc, extern double histogram_selectivity(VariableStatData *vardata,
FmgrInfo *opproc, Oid collation,
Datum constval, bool varonleft, Datum constval, bool varonleft,
int min_hist_size, int n_skip, int min_hist_size, int n_skip,
int *hist_size); int *hist_size);
extern double generic_restriction_selectivity(PlannerInfo *root, Oid oproid, extern double generic_restriction_selectivity(PlannerInfo *root,
Oid oproid, Oid collation,
List *args, int varRelid, List *args, int varRelid,
double default_selectivity); double default_selectivity);
extern double ineq_histogram_selectivity(PlannerInfo *root, extern double ineq_histogram_selectivity(PlannerInfo *root,
VariableStatData *vardata, VariableStatData *vardata,
FmgrInfo *opproc, bool isgt, bool iseq, FmgrInfo *opproc, bool isgt, bool iseq,
Oid collation,
Datum constval, Oid consttype); Datum constval, Oid consttype);
extern double var_eq_const(VariableStatData *vardata, Oid oproid, extern double var_eq_const(VariableStatData *vardata,
Oid oproid, Oid collation,
Datum constval, bool constisnull, Datum constval, bool constisnull,
bool varonleft, bool negate); bool varonleft, bool negate);
extern double var_eq_non_const(VariableStatData *vardata, Oid oproid, extern double var_eq_non_const(VariableStatData *vardata,
Oid oproid, Oid collation,
Node *other, Node *other,
bool varonleft, bool negate); bool varonleft, bool negate);
......
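In HEAD, the header hunk above simply adds a `collation` parameter to mcv_selectivity and its allied functions. The commit message explains that v12 instead adds `mcv_selectivity_ext` and similar functions with the HEAD signatures, and turns the old functions into wrappers that assume DEFAULT_COLLATION_OID. The sketch below shows that wrapper pattern with toy types. The operator signature, the function names, and the collation constant are hypothetical stand-ins, not the PostgreSQL API. Like mcv_selectivity, the `_ext` function sums the frequencies of the most common values that satisfy the operator.

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned int Oid;               /* stand-in for PostgreSQL's Oid */
#define TOY_DEFAULT_COLLATION_OID 100   /* stand-in for DEFAULT_COLLATION_OID */

/* Hypothetical operator signature: the collation travels with every call. */
typedef bool (*ToyOperator)(Oid collation, int value, int constval);

static Oid toy_seen_collation;  /* records the collation the operator saw */

/* Example operator: integer "<" that records the collation it received. */
static bool toy_int_lt(Oid collation, int value, int constval)
{
    toy_seen_collation = collation;
    return value < constval;
}

/* New entry point (cf. mcv_selectivity_ext): the caller passes the query's
 * collation, which is forwarded to every operator call. */
static double toy_mcv_selectivity_ext(const int *mcv, const double *freq,
                                      int n, ToyOperator op, Oid collation,
                                      int constval)
{
    double selec = 0.0;
    for (int i = 0; i < n; i++)
        if (op(collation, mcv[i], constval))
            selec += freq[i];
    return selec;
}

/* Old entry point with its original argument list, kept so existing
 * callers still compile and link; it assumes the default collation. */
static double toy_mcv_selectivity(const int *mcv, const double *freq,
                                  int n, ToyOperator op, int constval)
{
    return toy_mcv_selectivity_ext(mcv, freq, n, op,
                                   TOY_DEFAULT_COLLATION_OID, constval);
}
```

As the commit message points out, the wrapper's choice of the default collation does not match the prior behavior of using the column's collation. It is a compromise intended to avoid operator failures for callers that cannot supply a collation.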