Fix order of shutdown cleanup operations in PostgresNode.pm.

Previously, database clusters created by a TAP test were shut down by DESTROY methods attached to the PostgresNode objects representing them. The trouble with that is that if the objects survive into the final global destruction phase (which they do), Perl executes the DESTROY methods in an unspecified order. Thus, the order of shutdown of multiple clusters was indeterminate, which might lead to not-very-reproducible errors getting logged (eg from a slave whose master might or might not get killed first). Worse, the File::Temp objects representing the temporary PGDATA directories might get destroyed before the PostgresNode objects, resulting in attempts to delete PGDATA directories that still have live servers in them. On Windows, this would lead to directory deletion failures; on Unix, it usually had no effects worse than erratic "could not open temporary statistics file "pg_stat/global.tmp": No such file or directory" log messages.

While none of this would affect the reported result of the TAP test, which is already determined, it could be very confusing when one is trying to understand from the logs what went wrong with a failed test.

To fix, do the postmaster shutdowns in an END block rather than at object destruction time. The END block will execute at a well-defined (and reasonable) time during script termination, and it will stop the postmasters in order of PostgresNode object creation. (Perhaps we should change that to be reverse order of creation, but the main point here is that we now have control which we did not before.) Use "pg_ctl stop", not an asynchronous kill(SIGQUIT), so that we wait for the postmasters to shut down before proceeding with directory deletion.

Deletion of temporary directories still happens in an unspecified order during global destruction, but I can see no reason to care about that once the postmasters are stopped.
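The approach described above can be illustrated with a minimal, self-contained sketch. It is not the PostgresNode.pm code: DemoNode, stop_all, and @stop_log are hypothetical names invented for this illustration, and the "server" is just a fake pid value, so nothing is actually started or stopped. The sketch only shows the pattern of registering objects in creation order and stopping them from an END block while preserving the script's exit status.

```perl
use strict;
use warnings;

package DemoNode;    # hypothetical stand-in for PostgresNode

my @all_nodes;       # registry of nodes, in creation order
our @stop_log;       # names of nodes, in the order they were stopped

sub new
{
	my ($class, $name) = @_;

	# _pid is a fake value here; defined means "running"
	my $self = bless { name => $name, _pid => 1 }, $class;
	push @all_nodes, $self;
	return $self;
}

sub stop
{
	my ($self) = @_;

	# already stopped: nothing to do, so repeated calls are harmless
	return unless defined $self->{_pid};
	print "### Stopping node \"$self->{name}\"\n";
	push @stop_log, $self->{name};
	$self->{_pid} = undef;
	return;
}

# Stop every registered node, in the order the nodes were created.
sub stop_all
{
	$_->stop foreach @all_nodes;
	return;
}

# Runs at a well-defined point during script termination, before the
# global destruction phase in which DESTROY order is unspecified.
END
{
	# take care not to change the script's exit value
	my $exit_code = $?;
	stop_all();
	$? = $exit_code;
}

package main;

DemoNode->new('master');
DemoNode->new('standby');
```

Because the registry is an ordinary array, switching to reverse order of creation, as the message suggests might be preferable, would only require iterating over `reverse @all_nodes` in this sketch.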

Commit 08af9219 authored Apr 26, 2016 by Tom Lane

Showing with 17 additions and 9 deletions

src/test/perl/PostgresNode.pm +17 -9

--- a/src/test/perl/PostgresNode.pm
+++ b/src/test/perl/PostgresNode.pm
@@ -662,6 +662,7 @@ sub stop
 	my $pgdata = $self->data_dir;
 	my $name   = $self->name;
 	$mode = 'fast' unless defined $mode;
+	return unless defined $self->{_pid};
 	print "### Stopping node \"$name\" using mode $mode\n";
 	TestLib::system_log('pg_ctl', '-D', $pgdata, '-m', $mode, 'stop');
 	$self->{_pid} = undef;
@@ -826,8 +827,8 @@ sub _update_pid
 Build a new PostgresNode object, assigning a free port number. Standalone
 function that's automatically imported.

-We also register the node, to avoid the port number from being reused
-for another node even when this one is not active.
+Remembers the node, to prevent its port number from being reused for another
+node, and to ensure that it gets shut down when the test script exits.

 You should generally use this instead of PostgresNode::new(...).

@@ -889,14 +890,21 @@ sub get_new_node
 	return $node;
 }

-# Attempt automatic cleanup
-sub DESTROY
+# Automatically shut down any still-running nodes when the test script exits.
+# Note that this just stops the postmasters (in the same order the nodes were
+# created in).  Temporary PGDATA directories are deleted, in an unspecified
+# order, later when the File::Temp objects are destroyed.
+END
 {
-	my $self = shift;
-	my $name = $self->name;
-	return unless defined $self->{_pid};
-	print "### Signalling QUIT to $self->{_pid} for node \"$name\"\n";
-	TestLib::system_log('pg_ctl', 'kill', 'QUIT', $self->{_pid});
+	# take care not to change the script's exit value
+	my $exit_code = $?;
+
+	foreach my $node (@all_nodes)
+	{
+		$node->teardown_node;
+	}
+
+	$? = $exit_code;
 }

 =pod