Commit 5b6289c1 authored by Tom Lane

Handle elog(FATAL) during ROLLBACK more robustly.

Stress testing by Andreas Seltenreich disclosed longstanding problems that
occur if a FATAL exit (e.g. due to receipt of SIGTERM) occurs while we are
trying to execute a ROLLBACK of an already-failed transaction.  In such a
case, xact.c is in TBLOCK_ABORT state, so that AbortOutOfAnyTransaction
would skip AbortTransaction and go straight to CleanupTransaction.  This
led to an assert failure in an assert-enabled build (due to the ROLLBACK's
portal still having a cleanup hook) or, without assertions, to a FATAL exit
complaining about "cannot drop active portal".  The latter's not
disastrous, perhaps, but it's messy enough to want to improve it.

We don't really want to run all of AbortTransaction in this code path.
The minimum required to clean up the open portal safely is to do
AtAbort_Memory and AtAbort_Portals.  It seems like a good idea to
do AtAbort_Memory unconditionally, to be entirely sure that we are
starting with a safe CurrentMemoryContext.  That means that if the
main loop in AbortOutOfAnyTransaction does nothing, we need an extra
step at the bottom to restore CurrentMemoryContext = TopMemoryContext,
which I chose to do by invoking AtCleanup_Memory.  This'll result in
calling AtCleanup_Memory twice in many of the paths through this function,
but that seems harmless and reasonably inexpensive.
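
The ordering described above can be illustrated with a small toy model. This
is only a sketch, not PostgreSQL source: every Toy*/toy_* name is a
hypothetical stand-in, the block states are reduced to two, and "memory
context" is just an enum. It assumes that the cleanup step ends by calling the
memory cleanup routine, which is why the failed-transaction path ends up
calling it twice.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: hypothetical stand-ins, not PostgreSQL code. */
typedef enum { TOY_CTX_DOOMED, TOY_CTX_ABORT, TOY_CTX_TOP } ToyContext;
typedef enum { TOY_BLOCK_DEFAULT, TOY_BLOCK_ABORT } ToyBlockState;

typedef struct
{
    ToyBlockState block_state;
    ToyContext    current_context;
    bool          portal_active;        /* a ROLLBACK portal is still open */
    int           cleanup_memory_calls; /* how often memory cleanup ran */
} ToyBackend;

/* Stand-in for AtAbort_Memory: leave whatever context we were in. */
static void toy_at_abort_memory(ToyBackend *b)
{
    b->current_context = TOY_CTX_ABORT;
}

/* Stand-in for AtAbort_Portals: shut down any still-active portal. */
static void toy_at_abort_portals(ToyBackend *b)
{
    b->portal_active = false;
}

/* Stand-in for AtCleanup_Memory: return to the top-level context. */
static void toy_at_cleanup_memory(ToyBackend *b)
{
    b->current_context = TOY_CTX_TOP;
    b->cleanup_memory_calls++;
}

/* Stand-in for CleanupTransaction: must not see an active portal. */
static void toy_cleanup_transaction(ToyBackend *b)
{
    assert(!b->portal_active);
    toy_at_cleanup_memory(b);
}

/* Stand-in for the patched AbortOutOfAnyTransaction. */
static void toy_abort_out_of_any_transaction(ToyBackend *b)
{
    /* Step 1: unconditionally get into a safe memory context. */
    toy_at_abort_memory(b);

    switch (b->block_state)
    {
        case TOY_BLOCK_DEFAULT:
            /* Nothing to do; the final step restores the top context. */
            break;
        case TOY_BLOCK_ABORT:
            /* Abort already done, but a portal may still be live. */
            toy_at_abort_portals(b);
            toy_cleanup_transaction(b);
            b->block_state = TOY_BLOCK_DEFAULT;
            break;
    }

    /* Step 2: harmless if cleanup already ran, required if the loop did nothing. */
    toy_at_cleanup_memory(b);
}
```

In this model an idle backend runs the memory cleanup once and a backend in
the aborted state runs it twice; both end in the top-level context, which is
the property the extra call at the bottom is meant to guarantee.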

The original motivation for the assertion in AtCleanup_Portals was that
we wanted to be sure that any user-defined code executed as a consequence
of the cleanup hook runs during AbortTransaction not CleanupTransaction.
That still seems like a valid concern, and now that we've seen one case
of the assertion firing --- which means that exactly that would have
happened in a production build --- let's replace the Assert with a runtime
check.  If we see the cleanup hook still set, we'll emit a WARNING and
just drop the hook unexecuted.
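
The difference between the old Assert and the new runtime check can be
sketched in isolation. Again this is a toy, not the portalmem.c code: ToyPortal
and the toy_* functions and counters are hypothetical, and the warning is
written with fprintf rather than elog. The point it shows is that the
cleanup-phase path never invokes the hook; it only reports and clears it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Toy model only: hypothetical stand-ins, not PostgreSQL code. */
typedef struct ToyPortal ToyPortal;
typedef void (*ToyCleanupHook) (ToyPortal *portal);

struct ToyPortal
{
    const char    *name;
    ToyCleanupHook cleanup;     /* possibly user-defined code */
};

static int toy_hooks_run = 0;
static int toy_warnings_emitted = 0;

/* Example hook standing in for user-defined cleanup code. */
static void toy_user_hook(ToyPortal *portal)
{
    (void) portal;
    toy_hooks_run++;
}

/* Abort phase: the right place to run the hook, exactly once. */
static void toy_abort_portal(ToyPortal *portal)
{
    assert(portal != NULL);
    if (portal->cleanup != NULL)
    {
        portal->cleanup(portal);
        portal->cleanup = NULL;
    }
}

/*
 * Cleanup phase: no user-defined code may run here.  Instead of asserting
 * that the hook is already gone, warn and drop it unexecuted.
 */
static void toy_cleanup_portal(ToyPortal *portal)
{
    assert(portal != NULL);
    if (portal->cleanup != NULL)
    {
        fprintf(stderr, "WARNING: skipping cleanup for portal \"%s\"\n",
                portal->name);
        toy_warnings_emitted++;
        portal->cleanup = NULL;
    }
    /* The portal itself would be dropped here. */
}
```

A portal that went through the abort phase first reaches cleanup with no hook
set and produces no warning; a portal that skipped the abort phase gets the
warning, and its hook is never called.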

This has been like this a long time, so back-patch to all supported
branches.

Discussion: https://postgr.es/m/877ey7bmun.fsf@ansel.ydns.eu
parent 7f1bb1d7
@@ -4226,6 +4226,9 @@ AbortOutOfAnyTransaction(void)
 {
     TransactionState s = CurrentTransactionState;
+    /* Ensure we're not running in a doomed memory context */
+    AtAbort_Memory();
     /*
      * Get out of any transaction or nested transaction
      */
@@ -4267,7 +4270,14 @@ AbortOutOfAnyTransaction(void)
                 break;
             case TBLOCK_ABORT:
             case TBLOCK_ABORT_END:
-                /* AbortTransaction already done, still need Cleanup */
+                /*
+                 * AbortTransaction is already done, still need Cleanup.
+                 * However, if we failed partway through running ROLLBACK,
+                 * there will be an active portal running that command, which
+                 * we need to shut down before doing CleanupTransaction.
+                 */
+                AtAbort_Portals();
                 CleanupTransaction();
                 s->blockState = TBLOCK_DEFAULT;
                 break;
@@ -4290,6 +4300,14 @@ AbortOutOfAnyTransaction(void)
             case TBLOCK_SUBABORT_END:
             case TBLOCK_SUBABORT_RESTART:
                 /* As above, but AbortSubTransaction already done */
+                if (s->curTransactionOwner)
+                {
+                    /* As in TBLOCK_ABORT, might have a live portal to zap */
+                    AtSubAbort_Portals(s->subTransactionId,
+                                       s->parent->subTransactionId,
+                                       s->curTransactionOwner,
+                                       s->parent->curTransactionOwner);
+                }
                 CleanupSubTransaction();
                 s = CurrentTransactionState;    /* changed by pop */
                 break;
@@ -4298,6 +4316,9 @@ AbortOutOfAnyTransaction(void)
     /* Should be out of all subxacts now */
     Assert(s->parent == NULL);
+    /* If we didn't actually have anything to do, revert to TopMemoryContext */
+    AtCleanup_Memory();
 }
 /*
......
@@ -415,8 +415,8 @@ MarkPortalDone(Portal portal)
      * well do that now, since the portal can't be executed any more.
      *
      * In some cases involving execution of a ROLLBACK command in an already
-     * aborted transaction, this prevents an assertion failure caused by
-     * reaching AtCleanup_Portals with the cleanup hook still unexecuted.
+     * aborted transaction, this is necessary, or we'd reach AtCleanup_Portals
+     * with the cleanup hook still unexecuted.
      */
     if (PointerIsValid(portal->cleanup))
     {
@@ -443,8 +443,8 @@ MarkPortalFailed(Portal portal)
      * well do that now, since the portal can't be executed any more.
      *
      * In some cases involving cleanup of an already aborted transaction, this
-     * prevents an assertion failure caused by reaching AtCleanup_Portals with
-     * the cleanup hook still unexecuted.
+     * is necessary, or we'd reach AtCleanup_Portals with the cleanup hook
+     * still unexecuted.
      */
     if (PointerIsValid(portal->cleanup))
     {
@@ -842,8 +842,15 @@ AtCleanup_Portals(void)
         if (portal->portalPinned)
             portal->portalPinned = false;
-        /* We had better not be calling any user-defined code here */
-        Assert(portal->cleanup == NULL);
+        /*
+         * We had better not call any user-defined code during cleanup, so if
+         * the cleanup hook hasn't been run yet, too bad; we'll just skip it.
+         */
+        if (PointerIsValid(portal->cleanup))
+        {
+            elog(WARNING, "skipping cleanup for portal \"%s\"", portal->name);
+            portal->cleanup = NULL;
+        }
         /* Zap it. */
         PortalDrop(portal, false);
@@ -1026,8 +1033,15 @@ AtSubCleanup_Portals(SubTransactionId mySubid)
         if (portal->portalPinned)
             portal->portalPinned = false;
-        /* We had better not be calling any user-defined code here */
-        Assert(portal->cleanup == NULL);
+        /*
+         * We had better not call any user-defined code during cleanup, so if
+         * the cleanup hook hasn't been run yet, too bad; we'll just skip it.
+         */
+        if (PointerIsValid(portal->cleanup))
+        {
+            elog(WARNING, "skipping cleanup for portal \"%s\"", portal->name);
+            portal->cleanup = NULL;
+        }
         /* Zap it. */
         PortalDrop(portal, false);
......