Commit 044c99bc authored by Tom Lane's avatar Tom Lane

Use query collation, not column's collation, while examining statistics.

Commit 5e092800 changed the planner so that, instead of blindly using
DEFAULT_COLLATION_OID when invoking operators for selectivity estimation,
it would use the collation of the column whose statistics we're
considering.  This was recognized as still being not quite the right
thing, but it seemed like a good incremental improvement.  However,
shortly thereafter we introduced nondeterministic collations, and that
creates cases where operators can fail if they're passed the wrong
collation.  We don't want planning to fail in cases where the query itself
would work, so this means that we *must* use the query's collation when
invoking operators for estimation purposes.

The only real problem this creates is in ineq_histogram_selectivity, where
the binary search might produce a garbage answer if we perform comparisons
using a different collation than the column's histogram is ordered with.
However, when the query's collation is significantly different from the
column's default collation, the estimate we previously generated would be
pretty irrelevant anyway; so it's not clear that this will result in
noticeably worse estimates in practice.  (A follow-on patch will improve
this situation in HEAD, but it seems too invasive for back-patch.)

The patch requires changing the signatures of mcv_selectivity and allied
functions, which are exported and very possibly are used by extensions.
In HEAD, I just did that, but an API/ABI break of this sort isn't
acceptable in stable branches.  Therefore, in v12 the patch introduces
"mcv_selectivity_ext" and so on, with signatures matching HEAD, and makes
the old functions into wrappers that assume DEFAULT_COLLATION_OID should
be used.  That does not match the prior behavior, but it should avoid risk
of failure in most cases.  (In practice, I think most extension datatypes
aren't collation-aware, so the change probably doesn't matter to them.)

Per report from James Lucas.  Back-patch to v12 where the problem was
introduced.

Discussion: https://postgr.es/m/CAAFmbbOvfi=wMM=3qRsPunBSLb8BFREno2oOzSBS=mzfLPKABw@mail.gmail.com
parent f0d2c65f
......@@ -582,7 +582,7 @@ ltreeparentsel(PG_FUNCTION_ARGS)
double selec;
/* Use generic restriction selectivity logic, with default 0.001. */
selec = generic_restriction_selectivity(root, operator,
selec = generic_restriction_selectivity(root, operator, InvalidOid,
args, varRelid,
0.001);
......
......@@ -92,6 +92,7 @@ static Pattern_Prefix_Status pattern_fixed_prefix(Const *patt,
static Selectivity prefix_selectivity(PlannerInfo *root,
VariableStatData *vardata,
Oid eqopr, Oid ltopr, Oid geopr,
Oid collation,
Const *prefixcon);
static Selectivity like_selectivity(const char *patt, int pattlen,
bool case_insensitive);
......@@ -534,12 +535,6 @@ patternsel_common(PlannerInfo *root,
* something binary-compatible but different.) We can use it to identify
* the comparison operators and the required type of the comparison
* constant, much as in match_pattern_prefix().
*
* NOTE: this logic does not consider collations. Ideally we'd force use
* of "C" collation, but since ANALYZE only generates statistics for the
* column's specified collation, we have little choice but to use those.
* But our results are so approximate anyway that it probably hardly
* matters.
*/
vartype = vardata.vartype;
......@@ -622,7 +617,7 @@ patternsel_common(PlannerInfo *root,
/*
* Pattern specifies an exact match, so estimate as for '='
*/
result = var_eq_const(&vardata, eqopr, prefix->constvalue,
result = var_eq_const(&vardata, eqopr, collation, prefix->constvalue,
false, true, false);
}
else
......@@ -654,7 +649,8 @@ patternsel_common(PlannerInfo *root,
opfuncid = get_opcode(oprid);
fmgr_info(opfuncid, &opproc);
selec = histogram_selectivity(&vardata, &opproc, constval, true,
selec = histogram_selectivity(&vardata, &opproc, collation,
constval, true,
10, 1, &hist_size);
/* If not at least 100 entries, use the heuristic method */
......@@ -666,6 +662,7 @@ patternsel_common(PlannerInfo *root,
if (pstatus == Pattern_Prefix_Partial)
prefixsel = prefix_selectivity(root, &vardata,
eqopr, ltopr, geopr,
collation,
prefix);
else
prefixsel = 1.0;
......@@ -698,7 +695,8 @@ patternsel_common(PlannerInfo *root,
* directly to the result selectivity. Also add up the total fraction
* represented by MCV entries.
*/
mcv_selec = mcv_selectivity(&vardata, &opproc, constval, true,
mcv_selec = mcv_selectivity(&vardata, &opproc, collation,
constval, true,
&sumcommon);
/*
......@@ -1196,7 +1194,7 @@ pattern_fixed_prefix(Const *patt, Pattern_Type ptype, Oid collation,
* population represented by the histogram --- the caller must fold this
* together with info about MCVs and NULLs.
*
* We use the specified btree comparison operators to do the estimation.
* We use the given comparison operators and collation to do the estimation.
* The given variable and Const must be of the associated datatype(s).
*
* XXX Note: we make use of the upper bound to estimate operator selectivity
......@@ -1207,11 +1205,11 @@ pattern_fixed_prefix(Const *patt, Pattern_Type ptype, Oid collation,
static Selectivity
prefix_selectivity(PlannerInfo *root, VariableStatData *vardata,
Oid eqopr, Oid ltopr, Oid geopr,
Oid collation,
Const *prefixcon)
{
Selectivity prefixsel;
FmgrInfo opproc;
AttStatsSlot sslot;
Const *greaterstrcon;
Selectivity eq_sel;
......@@ -1220,6 +1218,7 @@ prefix_selectivity(PlannerInfo *root, VariableStatData *vardata,
prefixsel = ineq_histogram_selectivity(root, vardata,
&opproc, true, true,
collation,
prefixcon->constvalue,
prefixcon->consttype);
......@@ -1229,27 +1228,18 @@ prefix_selectivity(PlannerInfo *root, VariableStatData *vardata,
return DEFAULT_MATCH_SEL;
}
/*-------
* If we can create a string larger than the prefix, say
* "x < greaterstr". We try to generate the string referencing the
* collation of the var's statistics, but if that's not available,
* use DEFAULT_COLLATION_OID.
*-------
/*
* If we can create a string larger than the prefix, say "x < greaterstr".
*/
if (HeapTupleIsValid(vardata->statsTuple) &&
get_attstatsslot(&sslot, vardata->statsTuple,
STATISTIC_KIND_HISTOGRAM, InvalidOid, 0))
/* sslot.stacoll is set up */ ;
else
sslot.stacoll = DEFAULT_COLLATION_OID;
fmgr_info(get_opcode(ltopr), &opproc);
greaterstrcon = make_greater_string(prefixcon, &opproc, sslot.stacoll);
greaterstrcon = make_greater_string(prefixcon, &opproc, collation);
if (greaterstrcon)
{
Selectivity topsel;
topsel = ineq_histogram_selectivity(root, vardata,
&opproc, false, false,
collation,
greaterstrcon->constvalue,
greaterstrcon->consttype);
......@@ -1278,7 +1268,7 @@ prefix_selectivity(PlannerInfo *root, VariableStatData *vardata,
* probably off the end of the histogram, and thus we probably got a very
* small estimate from the >= condition; so we still need to clamp.
*/
eq_sel = var_eq_const(vardata, eqopr, prefixcon->constvalue,
eq_sel = var_eq_const(vardata, eqopr, collation, prefixcon->constvalue,
false, true, false);
prefixsel = Max(prefixsel, eq_sel);
......
......@@ -137,7 +137,8 @@ networksel(PG_FUNCTION_ARGS)
* by MCV entries.
*/
fmgr_info(get_opcode(operator), &proc);
mcv_selec = mcv_selectivity(&vardata, &proc, constvalue, varonleft,
mcv_selec = mcv_selectivity(&vardata, &proc, InvalidOid,
constvalue, varonleft,
&sumcommon);
/*
......
This diff is collapsed.
......@@ -144,24 +144,30 @@ extern void get_join_variables(PlannerInfo *root, List *args,
bool *join_is_reversed);
extern double get_variable_numdistinct(VariableStatData *vardata,
bool *isdefault);
extern double mcv_selectivity(VariableStatData *vardata, FmgrInfo *opproc,
extern double mcv_selectivity(VariableStatData *vardata,
FmgrInfo *opproc, Oid collation,
Datum constval, bool varonleft,
double *sumcommonp);
extern double histogram_selectivity(VariableStatData *vardata, FmgrInfo *opproc,
extern double histogram_selectivity(VariableStatData *vardata,
FmgrInfo *opproc, Oid collation,
Datum constval, bool varonleft,
int min_hist_size, int n_skip,
int *hist_size);
extern double generic_restriction_selectivity(PlannerInfo *root, Oid oproid,
extern double generic_restriction_selectivity(PlannerInfo *root,
Oid oproid, Oid collation,
List *args, int varRelid,
double default_selectivity);
extern double ineq_histogram_selectivity(PlannerInfo *root,
VariableStatData *vardata,
FmgrInfo *opproc, bool isgt, bool iseq,
Oid collation,
Datum constval, Oid consttype);
extern double var_eq_const(VariableStatData *vardata, Oid oproid,
extern double var_eq_const(VariableStatData *vardata,
Oid oproid, Oid collation,
Datum constval, bool constisnull,
bool varonleft, bool negate);
extern double var_eq_non_const(VariableStatData *vardata, Oid oproid,
extern double var_eq_non_const(VariableStatData *vardata,
Oid oproid, Oid collation,
Node *other,
bool varonleft, bool negate);
......
Markdown is supported
0% or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment