• Tom Lane's avatar
    Use Append rather than MergeAppend for scanning ordered partitions. · 959d00e9
    Tom Lane authored
    If we need ordered output from a scan of a partitioned table, but
    the ordering matches the partition ordering, then we don't need to
    use a MergeAppend to combine the pre-ordered per-partition scan
    results: a plain Append will produce the same results.  This
    both saves useless comparison work inside the MergeAppend proper,
    and allows us to start returning tuples after istarting up just
    the first child node not all of them.
    
    However, all is not peaches and cream, because if some of the
    child nodes have high startup costs then there will be big
    discontinuities in the tuples-returned-versus-elapsed-time curve.
    The planner's cost model cannot handle that (yet, anyway).
    If we model the Append's startup cost as being just the first
    child's startup cost, we may drastically underestimate the cost
    of fetching slightly more tuples than are available from the first
    child.  Since we've had bad experiences with over-optimistic choices
    of "fast start" plans for ORDER BY LIMIT queries, that seems scary.
    As a klugy workaround, set the startup cost estimate for an ordered
    Append to be the sum of its children's startup costs (as MergeAppend
    would).  This doesn't really describe reality, but it's less likely
    to cause a bad plan choice than an underestimated startup cost would.
    In practice, the cases where we really care about this optimization
    will have child plans that are IndexScans with zero startup cost,
    so that the overly conservative estimate is still just zero.
    
    David Rowley, reviewed by Julien Rouhaud and Antonin Houska
    
    Discussion: https://postgr.es/m/CAKJS1f-hAqhPLRk_RaSFTgYxd=Tz5hA7kQ2h4-DhJufQk8TGuw@mail.gmail.com
    959d00e9
partbounds.c 82.9 KB