Fix race condition between shutdown and unstarted background workers.
If a database shutdown (smart or fast) is commanded between the time some process decides to request a new background worker and the time that the postmaster can launch that worker, then nothing happens because the postmaster won't launch any bgworkers once it's exited PM_RUN state. This is fine ... unless the requesting process is waiting for that worker to finish (or even for it to start); in that case the requestor is stuck, and only manual intervention will get us to the point of being able to shut down. To fix, cancel pending requests for workers when the postmaster sends shutdown (SIGTERM) signals, and similarly cancel any new requests that arrive after that point. (We can optimize things slightly by only doing the cancellation for workers that have waiters.) To fit within the existing bgworker APIs, the "cancel" is made to look like the worker was started and immediately stopped, causing deregistration of the bgworker entry. Waiting processes would have to deal with premature worker exit anyway, so this should introduce no bugs that weren't there before. We do have a side effect that registration records for restartable bgworkers might disappear when theoretically they should have remained in place; but since we're shutting down, that shouldn't matter. Back-patch to v10. There might be value in putting this into 9.6 as well, but the management of bgworkers is a bit different there (notably see 8ff51869) and I'm not convinced it's worth the effort to validate the patch for that branch. Discussion: https://postgr.es/m/661570.1608673226@sss.pgh.pa.us
Showing
Please register or sign in to comment