Commit 8ff51869 authored by Robert Haas's avatar Robert Haas

Fix confusion of max_parallel_workers mechanism following crash.

Commit b460f5d6 failed to contemplate
the possibilit that a parallel worker registered before a crash would
be unregistered only after the crash; if that happened, we'd end up
with parallel_terminate_count > parallel_register_count and the
system would refuse to launch any more parallel workers.

The easiest way to fix that seems to be to forget BGW_NEVER_RESTART
workers in ResetBackgroundWorkerCrashTimes() rather than leaving them
around to be cleaned up after the conclusion of the restart, so that
they go away before rather than after shared memory is reset.

To make sure that this fix is water-tight, don't allow parallel
workers to be anything other than BGW_NEVER_RESTART, so that after
recovering from a crash, 0 is guaranteed to be the correct starting
value for parallel_register_count.  The core code wouldn't do this
anyway, but somebody might try to do it in extension code.

Report by Thomas Vondra.  Patch by me, reviewed by Kuntal Ghosh.

Discussion: http://postgr.es/m/CAGz5QC+AVEVS+3rBKRq83AxkJLMZ1peMt4nnrQwczxOrmo3CNw@mail.gmail.com
parent 1e298b8d
......@@ -515,13 +515,34 @@ ResetBackgroundWorkerCrashTimes(void)
rw = slist_container(RegisteredBgWorker, rw_lnode, iter.cur);
/*
* For workers that should not be restarted, we don't want to lose the
* information that they have crashed; otherwise, they would be
* restarted, which is wrong.
*/
if (rw->rw_worker.bgw_restart_time != BGW_NEVER_RESTART)
if (rw->rw_worker.bgw_restart_time == BGW_NEVER_RESTART)
{
/*
* Workers marked BGW_NVER_RESTART shouldn't get relaunched after
* the crash, so forget about them. (If we wait until after the
* crash to forget about them, and they are parallel workers,
* parallel_terminate_count will get incremented after we've
* already zeroed parallel_register_count, which would be bad.)
*/
ForgetBackgroundWorker(&iter);
}
else
{
/*
* The accounting which we do via parallel_register_count and
* parallel_terminate_count would get messed up if a worker marked
* parallel could survive a crash and restart cycle. All such
* workers should be marked BGW_NEVER_RESTART, and thus control
* should never reach this branch.
*/
Assert((rw->rw_worker.bgw_flags & BGWORKER_CLASS_PARALLEL) == 0);
/*
* Allow this worker to be restarted immediately after we finish
* resetting.
*/
rw->rw_crashed_at = 0;
}
}
}
......@@ -589,6 +610,21 @@ SanityCheckBackgroundWorker(BackgroundWorker *worker, int elevel)
return false;
}
/*
* Parallel workers may not be configured for restart, because the
* parallel_register_count/parallel_terminate_count accounting can't
* handle parallel workers lasting through a crash-and-restart cycle.
*/
if (worker->bgw_restart_time != BGW_NEVER_RESTART &&
(worker->bgw_flags & BGWORKER_CLASS_PARALLEL) != 0)
{
ereport(elevel,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("background worker \"%s\": parallel workers may not be configured for restart",
worker->bgw_name)));
return false;
}
return true;
}
......
Markdown is supported
0% or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment