Commit 26e00793 authored by Thomas Munro's avatar Thomas Munro

Fix race between DROP TABLESPACE and checkpointing.

Commands like ALTER TABLE SET TABLESPACE may leave files for the next
checkpoint to clean up.  If such files are not removed by the time DROP
TABLESPACE is called, we request a checkpoint so that they are deleted.
However, there is presently a window before checkpoint start where new
unlink requests won't be scheduled until the following checkpoint.  This
means that the checkpoint forced by DROP TABLESPACE might not remove the
files we expect it to remove, and the following ERROR will be emitted:

	ERROR:  tablespace "mytblspc" is not empty

To fix, add a call to AbsorbSyncRequests() just before advancing the
unlink cycle counter.  This ensures that any unlink requests forwarded
prior to checkpoint start (i.e., when ckpt_started is incremented) will
be processed by the current checkpoint.  Since AbsorbSyncRequests()
performs memory allocations, it cannot be called within a critical
section, so we also need to move SyncPreCheckpoint() to before
CreateCheckPoint()'s critical section.

This is an old bug, so back-patch to all supported versions.

Author: Nathan Bossart <nathandbossart@gmail.com>
Reported-by: default avatarNathan Bossart <nathandbossart@gmail.com>
Reviewed-by: default avatarThomas Munro <thomas.munro@gmail.com>
Reviewed-by: default avatarAndres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/20220215235845.GA2665318%40nathanxps13
parent dc5b3bda
...@@ -9020,6 +9020,14 @@ CreateCheckPoint(int flags) ...@@ -9020,6 +9020,14 @@ CreateCheckPoint(int flags)
MemSet(&CheckpointStats, 0, sizeof(CheckpointStats)); MemSet(&CheckpointStats, 0, sizeof(CheckpointStats));
CheckpointStats.ckpt_start_t = GetCurrentTimestamp(); CheckpointStats.ckpt_start_t = GetCurrentTimestamp();
/*
* Let smgr prepare for checkpoint; this has to happen outside the
* critical section and before we determine the REDO pointer. Note that
* smgr must not do anything that'd have to be undone if we decide no
* checkpoint is needed.
*/
SyncPreCheckpoint();
/* /*
* Use a critical section to force system panic if we have trouble. * Use a critical section to force system panic if we have trouble.
*/ */
...@@ -9034,13 +9042,6 @@ CreateCheckPoint(int flags) ...@@ -9034,13 +9042,6 @@ CreateCheckPoint(int flags)
LWLockRelease(ControlFileLock); LWLockRelease(ControlFileLock);
} }
/*
* Let smgr prepare for checkpoint; this has to happen before we determine
* the REDO pointer. Note that smgr must not do anything that'd have to
* be undone if we decide no checkpoint is needed.
*/
SyncPreCheckpoint();
/* Begin filling in the checkpoint WAL record */ /* Begin filling in the checkpoint WAL record */
MemSet(&checkPoint, 0, sizeof(checkPoint)); MemSet(&checkPoint, 0, sizeof(checkPoint));
checkPoint.time = (pg_time_t) time(NULL); checkPoint.time = (pg_time_t) time(NULL);
......
...@@ -173,7 +173,9 @@ InitSync(void) ...@@ -173,7 +173,9 @@ InitSync(void)
* counter is incremented here. * counter is incremented here.
* *
* This must be called *before* the checkpoint REDO point is determined. * This must be called *before* the checkpoint REDO point is determined.
* That ensures that we won't delete files too soon. * That ensures that we won't delete files too soon. Since this calls
* AbsorbSyncRequests(), which performs memory allocations, it cannot be
* called within a critical section.
* *
* Note that we can't do anything here that depends on the assumption * Note that we can't do anything here that depends on the assumption
* that the checkpoint will be completed. * that the checkpoint will be completed.
...@@ -181,6 +183,16 @@ InitSync(void) ...@@ -181,6 +183,16 @@ InitSync(void)
void void
SyncPreCheckpoint(void) SyncPreCheckpoint(void)
{ {
/*
* Operations such as DROP TABLESPACE assume that the next checkpoint will
* process all recently forwarded unlink requests, but if they aren't
* absorbed prior to advancing the cycle counter, they won't be processed
* until a future checkpoint. The following absorb ensures that any
* unlink requests forwarded before the checkpoint began will be processed
* in the current checkpoint.
*/
AbsorbSyncRequests();
/* /*
* Any unlink requests arriving after this point will be assigned the next * Any unlink requests arriving after this point will be assigned the next
* cycle counter, and won't be unlinked until next checkpoint. * cycle counter, and won't be unlinked until next checkpoint.
......
Markdown is supported
0% or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment