    Fix various concurrency issues in logical replication worker launching · de438971
    Peter Eisentraut authored
    The code was originally written with the assumption that the launcher
    is the only process starting a worker.  However, that hasn't been true
    since commit 7c4f5240, which did not adequately update the worker
    management code.
    
    This patch adds an in_use field to the LogicalRepWorker struct to
    indicate whether the worker slot is being used and uses proper locking
    everywhere this flag is set or read.
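
As a rough illustration of the in_use protocol, here is a minimal C sketch. It assumes a fixed array of slots guarded by a single pthread mutex. The names WorkerSlot, acquire_slot, release_slot, slots_lock and MAX_WORKERS are hypothetical and only stand in for the real LogicalRepWorker array and its lock in launcher.c. What it shows is that in_use is tested and set under the same lock, so two processes launching workers at the same time cannot claim the same slot.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define MAX_WORKERS 4  /* hypothetical pool size */

/* Hypothetical, simplified stand-in for a worker slot in shared memory. */
typedef struct WorkerSlot
{
    bool in_use;  /* slot reserved for a (possibly still starting) worker */
} WorkerSlot;

static WorkerSlot slots[MAX_WORKERS];
static pthread_mutex_t slots_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Reserve a free slot and return its index, or -1 if all slots are taken.
 * in_use is read and written only while holding slots_lock, so concurrent
 * callers can never both see the same slot as free.
 */
int
acquire_slot(void)
{
    int result = -1;

    pthread_mutex_lock(&slots_lock);
    for (int i = 0; i < MAX_WORKERS; i++)
    {
        if (!slots[i].in_use)
        {
            slots[i].in_use = true;
            result = i;
            break;
        }
    }
    pthread_mutex_unlock(&slots_lock);
    return result;
}

/* Return a slot to the pool; clearing in_use is also done under the lock. */
void
release_slot(int i)
{
    pthread_mutex_lock(&slots_lock);
    assert(slots[i].in_use);
    slots[i].in_use = false;
    pthread_mutex_unlock(&slots_lock);
}
```

A caller that gets -1 back knows every slot is reserved. Because all reads and writes of in_use go through the lock, no caller ever acts on a stale view of a slot.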
    
    However, if the parent process dies while the new worker is starting
    and the new worker fails to attach to shared memory, this flag would
    never get cleared.  We solve this rare corner case by adding a sort of
    garbage collector for in_use slots.  It uses another field in the
    LogicalRepWorker struct, named launch_time, which records when the
    worker was started.  If a request to start a new worker does not find
    a free slot, we check for workers that were supposed to start but took
    too long to actually do so, and reuse their slots.
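
The slot garbage collector can be sketched the same way. In this hypothetical version (WorkerSlot, acquire_slot_at, attach_slot and START_TIMEOUT_SECS are illustrative names, and pid == 0 stands in for "the worker never attached"), a reservation counts as stale once its launch_time is more than START_TIMEOUT_SECS in the past and the worker still has not attached. The threshold the real code uses is not stated here, so the constant is an assumption. The current time is passed in explicitly so the behaviour is easy to test.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

#define MAX_WORKERS 2            /* hypothetical pool size */
#define START_TIMEOUT_SECS 60.0  /* hypothetical "took too long" threshold */

typedef struct WorkerSlot
{
    bool   in_use;       /* reserved for a worker, started or not */
    int    pid;          /* 0 until the worker attaches (hypothetical) */
    time_t launch_time;  /* when the slot was handed out */
} WorkerSlot;

static WorkerSlot slots[MAX_WORKERS];
static pthread_mutex_t slots_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Reserve a slot at time 'now'.  If no slot is free, reclaim one whose
 * worker was launched more than START_TIMEOUT_SECS ago but never attached
 * (e.g. its parent died before the worker could start).
 * Returns the slot index, or -1 if nothing is free or reclaimable.
 */
int
acquire_slot_at(time_t now)
{
    int result = -1;

    pthread_mutex_lock(&slots_lock);
    for (int i = 0; i < MAX_WORKERS && result < 0; i++)
        if (!slots[i].in_use)
            result = i;

    /* No free slot: garbage-collect a stale, never-attached reservation. */
    for (int i = 0; i < MAX_WORKERS && result < 0; i++)
        if (slots[i].in_use && slots[i].pid == 0 &&
            difftime(now, slots[i].launch_time) > START_TIMEOUT_SECS)
            result = i;

    if (result >= 0)
    {
        slots[result].in_use = true;
        slots[result].pid = 0;
        slots[result].launch_time = now;
    }
    pthread_mutex_unlock(&slots_lock);
    return result;
}

/* Called by the worker itself once it has attached to its slot. */
void
attach_slot(int i, int pid)
{
    pthread_mutex_lock(&slots_lock);
    assert(slots[i].in_use);
    slots[i].pid = pid;
    pthread_mutex_unlock(&slots_lock);
}
```

Reclaiming only when no slot is free keeps the common path cheap, and it avoids second-guessing a worker that is merely slow to start while spare capacity remains. Attached workers are never reclaimed, however old their launch_time is.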
    
    In passing, also fix possible race conditions when stopping a worker
    that hasn't finished starting yet.
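
One way to close the stop-while-starting race is for the stopping process to wait until a reserved slot either gets a pid or is released, rather than treating "in use but not attached yet" as "nothing to stop". The sketch below assumes a pthread condition variable as the wakeup mechanism. The names worker_to_stop, reserve_slot, attach_slot, release_slot and slots_changed are hypothetical, and the real launcher may wait in a different way.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define MAX_WORKERS 4  /* hypothetical pool size */

typedef struct WorkerSlot
{
    bool in_use;
    int  pid;  /* 0 while the worker is still starting (hypothetical) */
} WorkerSlot;

static WorkerSlot slots[MAX_WORKERS];
static pthread_mutex_t slots_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  slots_changed = PTHREAD_COND_INITIALIZER;

/* Launcher side: reserve slot i for a worker that has not started yet. */
void
reserve_slot(int i)
{
    pthread_mutex_lock(&slots_lock);
    slots[i].in_use = true;
    slots[i].pid = 0;
    pthread_mutex_unlock(&slots_lock);
}

/* Worker side: publish our pid and wake anyone waiting on the slot. */
void
attach_slot(int i, int pid)
{
    pthread_mutex_lock(&slots_lock);
    assert(slots[i].in_use);
    slots[i].pid = pid;
    pthread_cond_broadcast(&slots_changed);
    pthread_mutex_unlock(&slots_lock);
}

/* Worker (or cleanup) side: give the slot back and wake waiters. */
void
release_slot(int i)
{
    pthread_mutex_lock(&slots_lock);
    slots[i].in_use = false;
    slots[i].pid = 0;
    pthread_cond_broadcast(&slots_changed);
    pthread_mutex_unlock(&slots_lock);
}

/*
 * Find the worker to stop in slot i.  If it is still starting (in_use but
 * no pid yet), wait until it either attaches or releases the slot, so a
 * starting worker cannot slip past the stop request.
 * Returns the pid that should be signalled, or 0 if there is no worker.
 */
int
worker_to_stop(int i)
{
    int pid;

    pthread_mutex_lock(&slots_lock);
    while (slots[i].in_use && slots[i].pid == 0)
        pthread_cond_wait(&slots_changed, &slots_lock);
    pid = slots[i].in_use ? slots[i].pid : 0;
    pthread_mutex_unlock(&slots_lock);
    return pid;
}
```

As written, worker_to_stop blocks forever if a worker never attaches and never releases its slot, so a production version would also need a timeout or a check like the launch_time test described above.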
    
    Author: Petr Jelinek <petr.jelinek@2ndquadrant.com>
    Reported-by: Fujii Masao <masao.fujii@gmail.com>