    Ignore old stats file timestamps when starting the stats collector. · ad1b5c84
    Tom Lane authored
    The stats collector disregards inquiry messages whose cutoff_time is
    earlier than the time it last wrote the relevant stats file.  That's
    fine, but at startup, when it reads the "permanent" stats files, it
    absorbed their
    timestamps as if they were the times at which the corresponding temporary
    stats files had been written.  In reality, of course, no temporary
    stats files exist at that point.  This led to disregarding inquiry
    messages soon after startup if the postmaster had been shut down and
    restarted within less than PGSTAT_STAT_INTERVAL, which is a pretty
    common scenario, both for testing and in the field.  Requesting
    backends would hang for 10 seconds
    and then report failure to read statistics, unless they got bailed out
    by some other backend coming along and making a newer request within
    that interval.
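
The failure mode described above can be illustrated with a minimal sketch. This is not the actual pgstat.c code: the struct, the function names, and the `ignore_file_timestamp` flag are hypothetical, and timestamps are plain integers in arbitrary units. It only shows why absorbing the permanent file's timestamp makes the collector treat a fresh inquiry as already satisfied, and why treating that timestamp as "never written" (zero) avoids the problem.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for a timestamp type; arbitrary units (assumption). */
typedef int64_t DemoTimestamp;

/* Hypothetical, simplified collector-side entry for one stats file. */
typedef struct DemoStatsEntry
{
    /* When the temporary stats file was last written; 0 means "never". */
    DemoTimestamp stats_timestamp;
} DemoStatsEntry;

/*
 * Simplified inquiry check: the collector writes a new file only when the
 * requested cutoff is later than its recorded last-write time; otherwise
 * the inquiry is disregarded as already satisfied.
 */
bool
demo_inquiry_needs_write(const DemoStatsEntry *entry, DemoTimestamp cutoff_time)
{
    return cutoff_time > entry->stats_timestamp;
}

/*
 * Loading an entry from a "permanent" file at startup.  With
 * ignore_file_timestamp = false (the old behavior as described), the
 * on-disk timestamp is absorbed.  With true (the behavior the fix aims
 * for), the entry is marked as never written.
 */
void
demo_load_permanent_entry(DemoStatsEntry *entry,
                          DemoTimestamp file_timestamp,
                          bool ignore_file_timestamp)
{
    entry->stats_timestamp = ignore_file_timestamp ? 0 : file_timestamp;
}
```

For example, if the permanent file was stamped at time 1000 during shutdown and a backend asks with a cutoff of 900 shortly after restart, the old behavior disregards the inquiry even though no temporary file exists, while the zeroed timestamp causes a write.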
    
    I came across this through investigating unexpected delays in the
    src/test/recovery TAP tests: it manifests there because the autovacuum
    launcher hangs for 10 seconds when it can't get statistics at startup,
    thus preventing a second shutdown from occurring promptly.  We might
    want to do some things in the autovac code to make it less prone to
    getting stuck that way, but this change is a good bug fix regardless.
    
    In passing, also fix pgstat_read_statsfiles() to ensure that it
    re-zeroes its global stats variables if they are corrupted by a
    short read from the stats file.  (Other reads in that function
    go into temp variables, so that the issue doesn't arise.)
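
The short-read hazard can be sketched as follows. This is an illustration under assumptions, not the code in pgstat_read_statsfiles(): the struct layout and the function name are hypothetical. The point is that reading directly into a long-lived variable can leave it partially overwritten when the file is truncated, so the variable is re-zeroed before reporting failure.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a block of global statistics counters. */
typedef struct DemoGlobalStats
{
    long checkpoints;
    long buffers_written;
} DemoGlobalStats;

/*
 * Read one DemoGlobalStats record from fp directly into *dest.
 * On a short read, fread() may already have overwritten part of *dest,
 * so reset it to all zeroes before returning false.
 */
bool
demo_read_global_stats(FILE *fp, DemoGlobalStats *dest)
{
    if (fread(dest, 1, sizeof(*dest), fp) != sizeof(*dest))
    {
        memset(dest, 0, sizeof(*dest));
        return false;
    }
    return true;
}
```

Reading into a temporary and copying only on success, as the other reads are said to do, would avoid the issue in a different way; the sketch shows only the re-zeroing approach.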
    
    This has been broken since we created the separation between permanent
    and temporary stats files in 8.4, so back-patch to all supported branches.
    
    Discussion: https://postgr.es/m/16860.1498442626@sss.pgh.pa.us
pgstat.c 165 KB