Commit a314c340 authored by Tom Lane's avatar Tom Lane

Clamp semijoin selectivity to be not more than inner-join selectivity.

We should never estimate the output of a semijoin to be more rows than
we estimate for an inner join with the same input rels and join condition;
it's obviously impossible for that to happen.  However, given the
relatively poor quality of our semijoin selectivity estimates ---
particularly, but not only, in cases where we punt and return a default
estimate --- we did often deliver such estimates.  To improve matters,
calculate both estimates inside eqjoinsel() and take the smaller one.

The bulk of this patch is just mechanical refactoring to avoid repetitive
information lookup when we call both eqjoinsel_semi and eqjoinsel_inner.
The actual new behavior is just

	selec = Min(selec, inner_rel->rows * selec_inner);

which looks a bit odd but is correct because of our different definitions
for inner and semi join selectivity.

There is one ensuing plan change in the regression tests, but it looks
reasonable enough (and checking the actual row counts shows that the
estimate moved closer to reality, not further away).

Per bug #15160 from Alexey Ermakov.  Although this is arguably a bug fix,
I won't risk destabilizing plan choices in stable branches by
back-patching.

Tom Lane, reviewed by Melanie Plageman

Discussion: https://postgr.es/m/152395805004.19366.3107109716821067806@wrigleys.postgresql.org
parent 3be5fe2b
This diff is collapsed.
...@@ -802,7 +802,7 @@ SELECT t1.* FROM prt1 t1 WHERE t1.a IN (SELECT t1.b FROM prt2 t1, prt1_e t2 WHER ...@@ -802,7 +802,7 @@ SELECT t1.* FROM prt1 t1 WHERE t1.a IN (SELECT t1.b FROM prt2 t1, prt1_e t2 WHER
EXPLAIN (COSTS OFF) EXPLAIN (COSTS OFF)
SELECT t1.* FROM prt1 t1 WHERE t1.a IN (SELECT t1.b FROM prt2 t1 WHERE t1.b IN (SELECT (t1.a + t1.b)/2 FROM prt1_e t1 WHERE t1.c = 0)) AND t1.b = 0 ORDER BY t1.a; SELECT t1.* FROM prt1 t1 WHERE t1.a IN (SELECT t1.b FROM prt2 t1 WHERE t1.b IN (SELECT (t1.a + t1.b)/2 FROM prt1_e t1 WHERE t1.c = 0)) AND t1.b = 0 ORDER BY t1.a;
QUERY PLAN QUERY PLAN
------------------------------------------------------------------------------- -------------------------------------------------------------------------
Sort Sort
Sort Key: t1.a Sort Key: t1.a
-> Append -> Append
...@@ -831,9 +831,8 @@ SELECT t1.* FROM prt1 t1 WHERE t1.a IN (SELECT t1.b FROM prt2 t1 WHERE t1.b IN ( ...@@ -831,9 +831,8 @@ SELECT t1.* FROM prt1 t1 WHERE t1.a IN (SELECT t1.b FROM prt2 t1 WHERE t1.b IN (
Index Cond: (a = t1_4.b) Index Cond: (a = t1_4.b)
Filter: (b = 0) Filter: (b = 0)
-> Nested Loop -> Nested Loop
-> Unique -> HashAggregate
-> Sort Group Key: t1_5.b
Sort Key: t1_5.b
-> Hash Semi Join -> Hash Semi Join
Hash Cond: (t1_5.b = ((t1_8.a + t1_8.b) / 2)) Hash Cond: (t1_5.b = ((t1_8.a + t1_8.b) / 2))
-> Seq Scan on prt2_p3 t1_5 -> Seq Scan on prt2_p3 t1_5
...@@ -843,7 +842,7 @@ SELECT t1.* FROM prt1 t1 WHERE t1.a IN (SELECT t1.b FROM prt2 t1 WHERE t1.b IN ( ...@@ -843,7 +842,7 @@ SELECT t1.* FROM prt1 t1 WHERE t1.a IN (SELECT t1.b FROM prt2 t1 WHERE t1.b IN (
-> Index Scan using iprt1_p3_a on prt1_p3 t1_2 -> Index Scan using iprt1_p3_a on prt1_p3 t1_2
Index Cond: (a = t1_5.b) Index Cond: (a = t1_5.b)
Filter: (b = 0) Filter: (b = 0)
(40 rows) (39 rows)
SELECT t1.* FROM prt1 t1 WHERE t1.a IN (SELECT t1.b FROM prt2 t1 WHERE t1.b IN (SELECT (t1.a + t1.b)/2 FROM prt1_e t1 WHERE t1.c = 0)) AND t1.b = 0 ORDER BY t1.a; SELECT t1.* FROM prt1 t1 WHERE t1.a IN (SELECT t1.b FROM prt2 t1 WHERE t1.b IN (SELECT (t1.a + t1.b)/2 FROM prt1_e t1 WHERE t1.c = 0)) AND t1.b = 0 ORDER BY t1.a;
a | b | c a | b | c
......
Markdown is supported
0% or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment