• Tom Lane's avatar
    Fix ancient thinko in mergejoin cost estimation. · d364e881
    Tom Lane authored
    "rescanratio" was computed as 1 + rescanned-tuples / total-inner-tuples,
    which is sensible if it's to be multiplied by total-inner-tuples or a cost
    value corresponding to scanning all the inner tuples.  But in reality it
    was (mostly) multiplied by inner_rows or a related cost, numbers that take
    into account the possibility of stopping short of scanning the whole inner
    relation thanks to a limited key range in the outer relation.  This'd
    still make sense if we could expect that stopping short would result in a
    proportional decrease in the number of tuples that have to be rescanned.
    It does not, however.  The argument that establishes the validity of our
    estimate for that number is independent of whether we scan all of the inner
    relation or stop short, and experimentation also shows that stopping short
    doesn't reduce the number of rescanned tuples.  So the correct calculation
    is 1 + rescanned-tuples / inner_rows, and we should be sure to multiply
    that by inner_rows or a corresponding cost value.
    
    Most of the time this doesn't make much difference, but if we have
    both a high rescan rate (due to lots of duplicate values) and an outer
    key range much smaller than the inner key range, then the error can
    be significant, leading to a large underestimate of the cost associated
    with rescanning.
    
    Per report from Vijaykumar Jain.  This thinko appears to go all the way
    back to the introduction of the rescan estimation logic in commit
    70fba704, so back-patch to all supported branches.
    
    Discussion: https://postgr.es/m/CAE7uO5hMb_TZYJcZmLAgO6iD68AkEK6qCe7i=vZUkCpoKns+EQ@mail.gmail.com
    d364e881
costsize.c 177 KB