• Tom Lane's avatar
    Fix parallel pg_dump/pg_restore for failure to create worker processes. · 2425f8f7
    Tom Lane authored
    If we failed to fork a worker process, or create a communication pipe
    for one, WaitForTerminatingWorkers would suffer an assertion failure
    if assert-enabled, otherwise crash or go into an infinite loop.  This
    was a consequence of not accounting for the startup condition where
    we've not yet forked all the workers.
    
    The original bug was that ParallelBackupStart would set workerStatus to
    WRKR_IDLE before it had successfully forked a worker.  I made things
    worse in commit b7b8cc0c by not understanding the undocumented fact
    that the WRKR_TERMINATED state was also meant to represent the case
    where a worker hadn't been started yet: I changed enum T_WorkerStatus
    so that *all* the worker slots were initially in WRKR_IDLE state.  But
    this wasn't any more broken in practice, since even one slot in the
    wrong state would keep WaitForTerminatingWorkers from terminating.
    
    In v10 and later, introduce an explicit T_WorkerStatus value for
    worker-not-started, in hopes of preventing future oversights of the
    same ilk.  Before that, just document that WRKR_TERMINATED is supposed
    to cover that case (partly because it wasn't actively broken, and
    partly because the enum is exposed outside parallel.c in those branches,
    so there's microscopically more risk involved in changing it).
    In all branches, introduce a WORKER_IS_RUNNING status test macro
    to hide which T_WorkerStatus values mean that, and be more careful
    not to access ParallelSlot fields till we're sure they're valid.
    
    Per report from Vignesh C, though this is my patch not his.
    Back-patch to all supported branches.
    
    Discussion: https://postgr.es/m/CALDaNm1Luv-E3sarR+-unz-BjchquHHyfP+YC+2FS2pt_J+wxg@mail.gmail.com
    2425f8f7
parallel.c 47.8 KB