• Tom Lane's avatar
    Redesign handling of SIGTERM/control-C in parallel pg_dump/pg_restore. · e652273e
    Tom Lane authored
    Formerly, Unix builds of pg_dump/pg_restore would trap SIGINT and similar
    signals and set a flag that was tested in various data-transfer loops.
    This was prone to errors of omission (cf commit 3c8aa665); and even if
    the client-side response was prompt, we did nothing that would cause
    long-running SQL commands (e.g. CREATE INDEX) to terminate early.
    Also, the master process would effectively do nothing at all upon receipt
    of SIGINT; the only reason it seemed to work was that in typical scenarios
    the signal would also be delivered to the child processes.  We should
    support termination when a signal is delivered only to the master process,
    though.
    
    Windows builds had no console interrupt handler, so they would just fall
    over immediately at control-C, again leaving long-running SQL commands to
    finish unmolested.
    
    To fix, remove the flag-checking approach altogether.  Instead, allow the
    Unix signal handler to send a cancel request directly and then exit(1).
    In the master process, also have it forward the signal to the children.
    On Windows, add a console interrupt handler that behaves approximately
    the same.  The main difference is that a single execution of the Windows
    handler can send all the cancel requests since all the info is available
    in one process, whereas on Unix each process sends a cancel only for its
    own database connection.
    
    In passing, fix an old problem that DisconnectDatabase tends to send a
    cancel request before exiting a parallel worker, even if nothing went
    wrong.  This is at least a waste of cycles, and could lead to unexpected
    log messages, or maybe even data loss if it happened in pg_restore (though
    in the current code the problem seems to affect only pg_dump).  The cause
    was that after a COPY step, pg_dump was leaving libpq in PGASYNC_BUSY
    state, causing PQtransactionStatus() to report PQTRANS_ACTIVE.  That's
    normally harmless because the next PQexec() will silently clear the
    PGASYNC_BUSY state; but in a parallel worker we might exit without any
    additional SQL commands after a COPY step.  So add an extra PQgetResult()
    call after a COPY to allow libpq to return to PGASYNC_IDLE state.
    
    This is a bug fix, IMO, so back-patch to 9.3 where parallel dump/restore
    were introduced.
    
    Thanks to Kyotaro Horiguchi for Windows testing and code suggestions.
    
    Original-Patch: <7005.1464657274@sss.pgh.pa.us>
    Discussion: <20160602.174941.256342236.horiguchi.kyotaro@lab.ntt.co.jp>
    e652273e
pg_backup_archiver.h 15.1 KB