    PANIC on fsync() failure. · 9ccdd7f6
    Thomas Munro authored
    On some operating systems, it doesn't make sense to retry fsync(),
    because dirty data cached by the kernel may have been dropped on
    write-back failure.  In that case the only remaining copy of the
    data is in the WAL.  A subsequent fsync() could appear to succeed,
    but not have flushed the data.  That means that a future checkpoint
    could apparently complete successfully but have lost data.
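    The failure mode described above can be illustrated with a toy model.
    This is only a sketch: ToyFile and toy_fsync() are hypothetical names
    invented for the illustration, not kernel or PostgreSQL interfaces, and
    the model assumes a system that reports a write-back error once and then
    discards the dirty pages.

```c
#include <stdbool.h>

/* Toy model (an assumption of this sketch), not a real kernel interface. */
typedef struct ToyFile
{
    int  dirty_pages;     /* written by the application, not yet on disk */
    int  durable_pages;   /* actually persisted */
    bool device_failing;  /* the next write-back attempt will fail */
} ToyFile;

/*
 * Hypothetical fsync(): on write-back failure it reports the error once
 * and drops the dirty pages, so nothing is left to flush afterwards.
 * Returns 0 on success and -1 on failure.
 */
int
toy_fsync(ToyFile *f)
{
    if (f->dirty_pages > 0 && f->device_failing)
    {
        f->dirty_pages = 0;         /* cached data is thrown away */
        f->device_failing = false;  /* treat the error as transient */
        return -1;
    }

    /* Nothing failed: whatever is still dirty becomes durable. */
    f->durable_pages += f->dirty_pages;
    f->dirty_pages = 0;
    return 0;
}
```

    In this model, calling toy_fsync() twice on a file with 8 dirty pages
    and a failing device returns -1 and then 0, yet durable_pages is still 0.
    The retry looks like a success although no data reached the disk.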
    
    Therefore, violently prevent any future checkpoint attempts by
    panicking on the first fsync() failure.  Note that we already
    did the same for WAL data; this change extends that behavior to
    non-temporary data files.
    
    Provide a GUC data_sync_retry to control this new behavior, for
    users of operating systems that don't eject dirty data, and possibly
    forensic/testing uses.  If it is set to on and the write-back error
    was transient, a later checkpoint might genuinely succeed (on a
    system that does not throw away buffers on failure); if the error is
    permanent, later checkpoints will continue to fail.  The GUC defaults
    to off, meaning that we panic.
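    The effect of the setting can be sketched as a small helper that picks
    the severity for a failed data-file fsync().  The names and numeric
    values below (sketch_elevel, sketch_data_sync_retry,
    sketch_sync_elevel) are hypothetical stand-ins chosen for this
    illustration; they are not taken from the patch itself.

```c
#include <stdbool.h>

/* Illustrative severity levels; the values are assumptions of this sketch. */
enum sketch_elevel
{
    SKETCH_WARNING = 1,
    SKETCH_ERROR = 2,
    SKETCH_PANIC = 3
};

/* Stand-in for the data_sync_retry setting; off by default, as described. */
bool sketch_data_sync_retry = false;

/*
 * Choose the level at which to report an fsync() failure on a data file.
 * With retry off, every failure is promoted to PANIC so that no later
 * checkpoint can run; with retry on, the caller's requested level is kept
 * and a later checkpoint may try again.
 */
int
sketch_sync_elevel(int requested)
{
    return sketch_data_sync_retry ? requested : SKETCH_PANIC;
}
```

    With the default setting, sketch_sync_elevel(SKETCH_ERROR) yields
    SKETCH_PANIC; after setting sketch_data_sync_retry to true it yields
    SKETCH_ERROR, leaving the failure retryable.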
    
    Back-patch to all supported releases.
    
    There is still a narrow window in which an error can be lost on some
    operating systems: if a file is closed and later reopened, and a
    write-back error occurs in the intervening time, but the inode has the
    bad luck to be evicted due to memory pressure before we reopen it, we
    could miss the error.  A later patch will address that with a scheme
    for keeping files with dirty data open at all times, but we judge
    that to be too complicated to back-patch.
    
    Author: Craig Ringer, with some adjustments by Thomas Munro
    Reported-by: Craig Ringer
    Reviewed-by: Robert Haas, Thomas Munro, Andres Freund
    Discussion: https://postgr.es/m/20180427222842.in2e4mibx45zdth5%40alap3.anarazel.de