    Add some more defenses against silly estimates to gincostestimate(). · 3c93a60f
    Tom Lane authored
    A report from Andy Colson showed that gincostestimate() was not being
    nearly paranoid enough about whether to believe the statistics it finds in
    the index metapage.  The problem is that the metapage stats (other than the
    pending-pages count) are only updated by VACUUM, and in the worst case
    could still reflect the index's original empty state even when it has grown
    to many entries.  We attempted to deal with that by scaling up the stats to
    match the current index size, but if nEntries is zero then scaling it up
    still gives zero.  Moreover, the proportion of pages that are entry pages
    vs. data pages vs. pending pages is unlikely to be estimated very well by
    scaling if the index is now orders of magnitude larger than before.
    
    We can improve matters by expanding the use of the rule-of-thumb estimates
    I introduced in commit 7fb008c5: if the index has grown by more
    than a cutoff amount (here set at 4X growth) since VACUUM, then use the
    rule-of-thumb numbers instead of scaling.  This might not be exactly right
    but it seems much less likely to produce insane estimates.
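
The decision described above can be sketched as a single predicate. This is a minimal illustration, not the code in selfuncs.c: the function name, parameter names, and the use of plain doubles are assumptions made for the example. Only the 4X cutoff and the "zero entries cannot be scaled" observation come from the commit message.

```c
#include <assert.h>
#include <stdbool.h>

/* Cutoff named in the commit message: more than 4X growth since VACUUM. */
#define GROWTH_CUTOFF 4.0

/*
 * Hypothetical helper: return true if the metapage stats recorded at the
 * last VACUUM look usable for scaling up to the current index size.
 * When it returns false, the caller would fall back to rule-of-thumb numbers.
 */
bool
stats_are_scalable(double cur_pages, double old_total_pages,
                   double old_entry_pages, double old_entries)
{
    assert(cur_pages >= 0 && old_total_pages >= 0);

    return cur_pages > 0 &&
        /* not grown by more than the cutoff since the stats were taken */
        cur_pages <= GROWTH_CUTOFF * old_total_pages &&
        /* scaling zero still gives zero, so zero stats are useless */
        old_entry_pages > 0 &&
        old_entries > 0;
}
```

For example, an index that was 500 pages at VACUUM and is 1000 pages now would still be scaled, while one that was 100 pages and is now 1000 pages would get the rule-of-thumb treatment.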
    
    I also improved both the scaling estimate and the rule-of-thumb estimate
    to account for numPendingPages, since that count can be expected to be
    accurate in any case, and pages that are in the pending list are certainly
    neither entry pages nor data pages.
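
A rough sketch of how both estimates might set the pending pages aside before dividing up the rest of the index. Everything here is hypothetical: the struct, the function names, the integer arithmetic, and in particular the two rule-of-thumb constants (90% entry pages, 100 entries per entry page) are assumptions chosen for illustration and are not stated in the commit message.

```c
#include <assert.h>

typedef struct
{
    long long entry_pages;
    long long data_pages;
    long long entries;
} GinSizeEstimate;

/* Illustrative constants, assumed for this sketch only. */
#define ENTRY_PAGE_PERCENT 90
#define ENTRIES_PER_ENTRY_PAGE 100

static long long
ceil_div(long long num, long long den)
{
    return (num + den - 1) / den;
}

static long long
min_ll(long long a, long long b)
{
    return a < b ? a : b;
}

/* Scale the VACUUM-time stats to the current size, excluding pending pages. */
GinSizeEstimate
scale_estimate(long long cur_pages, long long pending_pages,
               long long old_total_pages, long long old_entry_pages,
               long long old_data_pages, long long old_entries)
{
    GinSizeEstimate est;
    long long usable;

    assert(old_total_pages > 0);
    assert(pending_pages >= 0 && pending_pages <= cur_pages);

    usable = cur_pages - pending_pages;
    est.entry_pages = min_ll(ceil_div(old_entry_pages * cur_pages,
                                      old_total_pages), usable);
    est.data_pages = min_ll(ceil_div(old_data_pages * cur_pages,
                                     old_total_pages),
                            usable - est.entry_pages);
    est.entries = ceil_div(old_entries * cur_pages, old_total_pages);
    return est;
}

/* Rule-of-thumb split of the non-pending pages. */
GinSizeEstimate
rule_of_thumb_estimate(long long cur_pages, long long pending_pages)
{
    GinSizeEstimate est;
    long long usable;

    assert(pending_pages >= 0 && pending_pages <= cur_pages);

    usable = cur_pages - pending_pages;
    est.entry_pages = usable * ENTRY_PAGE_PERCENT / 100;
    est.data_pages = usable - est.entry_pages;
    est.entries = est.entry_pages * ENTRIES_PER_ENTRY_PAGE;
    return est;
}
```

In both functions the entry and data page counts can never add up to more than the pages left over once the pending list is subtracted, which is the property the paragraph above is after.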
    
    As a somewhat separate issue, adjust the estimation equations that are
    concerned with extra fetches for partial-match searches.  These equations
    suppose that a fraction partialEntries / numEntries of the entry and data
    pages will be visited as a consequence of a partial-match search.  Now,
    it's physically impossible for that fraction to exceed one, but our
    estimate of partialEntries is mostly bunk, and our estimate of numEntries
    isn't exactly gospel either, so we could arrive at a silly value.  In the
    example presented by Andy we were coming out with a value of 100, leading
    to insane cost estimates.  Clamp the fraction to one to avoid that.
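
The clamp itself is small. A minimal sketch, assuming a hypothetical helper name and double-valued inputs (the commit message only names partialEntries and numEntries):

```c
#include <assert.h>

/*
 * Hypothetical helper: fraction of entry and data pages assumed to be
 * visited by a partial-match search, clamped so it can never exceed one.
 */
double
partial_match_fraction(double partial_entries, double num_entries)
{
    double fraction;

    assert(num_entries > 0);

    fraction = partial_entries / num_entries;
    return fraction > 1.0 ? 1.0 : fraction;
}
```

With inputs whose unclamped ratio would be 100, as in the reported case, this returns 1.0; a ratio already below one, such as 50 / 200, passes through unchanged as 0.25.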
    
    Like the previous patch, back-patch to all supported branches; this
    problem can be demonstrated in one form or another in all of them.
selfuncs.c 222 KB