    Fix, or at least ameliorate, bugs in logicalrep_worker_launch(). · 3e1683d3
    Tom Lane authored
    If we failed to get a background worker slot, the code just walked
    away from the logicalrep-worker slot it already had, leaving that
    looking like the worker is still starting up.  This led to an indefinite
    hang in subscription startup, as reported by Thomas Munro.  We must
    release the slot on failure.
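
    The release-on-failure rule can be sketched as below. This is a minimal,
    single-process model, not the code in launcher.c: WorkerSlot, claim_slot(),
    release_slot(), register_bgworker() and launch_worker() are hypothetical
    stand-ins, and the locking the real code needs is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_WORKERS 4

/* Hypothetical stand-in for a logical replication worker slot. */
typedef struct WorkerSlot
{
    bool     in_use;      /* slot has been claimed by a launcher */
    bool     attached;    /* worker process has attached to the slot */
    unsigned generation;  /* bumped every time the slot is claimed */
} WorkerSlot;

static WorkerSlot slots[MAX_WORKERS];

/* Hypothetical stand-in for background worker registration; it fails
 * when no background worker slot is available. */
static bool bgworker_slots_available = true;

static bool
register_bgworker(void)
{
    return bgworker_slots_available;
}

/* Claim the first free slot, or return NULL if all are taken. */
static WorkerSlot *
claim_slot(void)
{
    for (size_t i = 0; i < MAX_WORKERS; i++)
    {
        if (!slots[i].in_use)
        {
            slots[i].in_use = true;
            slots[i].attached = false;
            slots[i].generation++;
            return &slots[i];
        }
    }
    return NULL;
}

/* Give a claimed slot back so it no longer looks like a starting worker. */
static void
release_slot(WorkerSlot *slot)
{
    assert(slot->in_use);
    slot->in_use = false;
    slot->attached = false;
}

/* Launch a worker; on registration failure, release the slot we hold. */
bool
launch_worker(void)
{
    WorkerSlot *slot = claim_slot();

    if (slot == NULL)
        return false;

    if (!register_bgworker())
    {
        /* Without this call the slot would stay in_use indefinitely. */
        release_slot(slot);
        return false;
    }
    return true;
}

/* Count claimed slots; used to check that nothing leaked. */
int
slots_in_use(void)
{
    int n = 0;

    for (size_t i = 0; i < MAX_WORKERS; i++)
        if (slots[i].in_use)
            n++;
    return n;
}
```

    In this model a failed launch leaves slots_in_use() unchanged, so a later
    attempt can reuse the slot instead of waiting on a worker that never starts.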
    
    Also fix a thinko: we must capture the worker slot's generation before
    releasing LogicalRepWorkerLock the first time, else testing to see if
    it's changed is pretty meaningless.
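
    Why the capture has to happen before the lock is dropped can be shown
    with a small model. The names here (WorkerSlot, claim_slot(),
    recycle_slot(), check_attach()) are hypothetical and simplified, and the
    lock itself is only implied by the comments.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a logical replication worker slot. */
typedef struct WorkerSlot
{
    bool     in_use;      /* slot has been claimed */
    bool     attached;    /* worker process has attached */
    unsigned generation;  /* bumped every time the slot is claimed */
} WorkerSlot;

typedef enum
{
    ATTACH_OK,           /* our worker attached */
    ATTACH_PENDING,      /* our worker has not attached yet */
    ATTACH_SLOT_REUSED   /* slot was freed or now belongs to someone else */
} AttachStatus;

/* Claim the slot and return the generation seen at claim time. In the
 * real code this would run while the lock is still held, which is the
 * only point where the value is guaranteed to describe our own claim. */
unsigned
claim_slot(WorkerSlot *slot)
{
    assert(!slot->in_use);
    slot->in_use = true;
    slot->attached = false;
    slot->generation++;
    return slot->generation;
}

/* Model another process freeing and re-claiming the slot after the
 * lock has been released. */
void
recycle_slot(WorkerSlot *slot)
{
    slot->in_use = false;
    (void) claim_slot(slot);
}

/* One poll of the attach check against the generation we expect. */
AttachStatus
check_attach(const WorkerSlot *slot, unsigned expected_generation)
{
    if (!slot->in_use || slot->generation != expected_generation)
        return ATTACH_SLOT_REUSED;
    return slot->attached ? ATTACH_OK : ATTACH_PENDING;
}
```

    With the generation saved at claim time, check_attach() reports
    ATTACH_SLOT_REUSED after recycle_slot(). If the caller instead read
    slot->generation only after the slot had been recycled, the comparison
    would always match and the reuse would go unnoticed.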
    
    BTW, the CHECK_FOR_INTERRUPTS() in WaitForReplicationWorkerAttach is a
    ticking time bomb, even without considering the possibility of elog(ERROR)
    in one of the other functions it calls.  Really, this entire business needs
    a redesign with some actual thought about error recovery.  But for now
    I'm just band-aiding the case observed in testing.
    
    Back-patch to v10 where this code was added.
    
    Discussion: https://postgr.es/m/CAEepm=2bP3TBMFBArP6o20AZaRduWjMnjCjt22hSdnA-EvrtCw@mail.gmail.com
launcher.c 26.8 KB