Commit f13ea95f authored by Tom Lane

Change pg_ctl to detect server-ready by watching status in postmaster.pid.

Traditionally, "pg_ctl start -w" has waited for the server to become
ready to accept connections by attempting a connection once per second.
That has the major problem that connection issues (for instance, a
kernel packet filter blocking traffic) can't be reliably told apart
from server startup issues, and the minor problem that if server startup
isn't quick, we accumulate "the database system is starting up" spam
in the server log.  We've hacked around many of the possible connection
issues, but it resulted in ugly and complicated code in pg_ctl.c.

In commit c61559ec, I changed the probe rate to every tenth of a second.
That prompted Jeff Janes to complain that the log-spam problem had become
much worse.  In the ensuing discussion, Andres Freund pointed out that
we could dispense with connection attempts altogether if the postmaster
were changed to report its status in postmaster.pid, which "pg_ctl start"
already relies on being able to read.  This patch implements that, teaching
postmaster.c to report a status string into the pidfile at the same
state-change points already identified as being of interest for systemd
status reporting (cf commit 7d17e683).  pg_ctl no longer needs to link
with libpq at all; all its functions now depend on reading server files.

In support of this, teach AddToDataDirLockFile() to allow addition of
postmaster.pid lines in not-necessarily-sequential order.  This is needed
on Windows where the SHMEM_KEY line will never be written at all.  We still
have the restriction that we don't want to truncate the pidfile; document
the reasons for that a bit better.

Also, fix the pg_ctl TAP tests so they'll notice if "start -w" mode
is broken --- before, they'd just wait out the sixty seconds until
the loop gives up, and then report success anyway.  (Yes, I found that
out the hard way.)

While at it, arrange for pg_ctl to not need to #include miscadmin.h;
as a rather low-level backend header, requiring that to be compilable
client-side is pretty dubious.  This requires moving the #define's
associated with the pidfile into a new header file, and moving
PG_BACKEND_VERSIONSTR someplace else.  For lack of a clearly better
"someplace else", I put it into port.h, beside the declaration of
find_other_exec(), since most users of that macro are passing the value to
find_other_exec().  (initdb still depends on miscadmin.h, but at least
pg_ctl and pg_upgrade no longer do.)

In passing, fix main.c so that PG_BACKEND_VERSIONSTR actually defines the
output of "postgres -V", which remarkably it had never done before.

Discussion: https://postgr.es/m/CAMkU=1xJW8e+CTotojOMBd-yzUvD0e_JZu2xHo=MnuZ4__m7Pg@mail.gmail.com
parent 8c55244a
@@ -169,7 +169,7 @@ main(int argc, char *argv[])
 }
 if (strcmp(argv[1], "--version") == 0 || strcmp(argv[1], "-V") == 0)
 {
-puts("postgres (PostgreSQL) " PG_VERSION);
+fputs(PG_BACKEND_VERSIONSTR, stdout);
 exit(0);
 }
......
@@ -38,6 +38,7 @@
 #include "storage/ipc.h"
 #include "storage/pg_shmem.h"
 #include "utils/guc.h"
+#include "utils/pidfile.h"
 /*
......
@@ -125,6 +125,7 @@
 #include "utils/datetime.h"
 #include "utils/dynamic_loader.h"
 #include "utils/memutils.h"
+#include "utils/pidfile.h"
 #include "utils/ps_status.h"
 #include "utils/timeout.h"
 #include "utils/varlena.h"
@@ -1340,6 +1341,12 @@ PostmasterMain(int argc, char *argv[])
 gettimeofday(&random_start_time, NULL);
 #endif
+/*
+* Report postmaster status in the postmaster.pid file, to allow pg_ctl to
+* see what's happening.
+*/
+AddToDataDirLockFile(LOCK_FILE_LINE_PM_STATUS, PM_STATUS_STARTING);
 /*
 * We're ready to rock and roll...
 */
@@ -2608,6 +2615,9 @@ pmdie(SIGNAL_ARGS)
 Shutdown = SmartShutdown;
 ereport(LOG,
 (errmsg("received smart shutdown request")));
+/* Report status */
+AddToDataDirLockFile(LOCK_FILE_LINE_PM_STATUS, PM_STATUS_STOPPING);
 #ifdef USE_SYSTEMD
 sd_notify(0, "STOPPING=1");
 #endif
@@ -2663,6 +2673,9 @@ pmdie(SIGNAL_ARGS)
 Shutdown = FastShutdown;
 ereport(LOG,
 (errmsg("received fast shutdown request")));
+/* Report status */
+AddToDataDirLockFile(LOCK_FILE_LINE_PM_STATUS, PM_STATUS_STOPPING);
 #ifdef USE_SYSTEMD
 sd_notify(0, "STOPPING=1");
 #endif
@@ -2727,6 +2740,9 @@ pmdie(SIGNAL_ARGS)
 Shutdown = ImmediateShutdown;
 ereport(LOG,
 (errmsg("received immediate shutdown request")));
+/* Report status */
+AddToDataDirLockFile(LOCK_FILE_LINE_PM_STATUS, PM_STATUS_STOPPING);
 #ifdef USE_SYSTEMD
 sd_notify(0, "STOPPING=1");
 #endif
@@ -2872,6 +2888,8 @@ reaper(SIGNAL_ARGS)
 ereport(LOG,
 (errmsg("database system is ready to accept connections")));
+/* Report status */
+AddToDataDirLockFile(LOCK_FILE_LINE_PM_STATUS, PM_STATUS_READY);
 #ifdef USE_SYSTEMD
 sd_notify(0, "READY=1");
 #endif
@@ -5005,10 +5023,18 @@ sigusr1_handler(SIGNAL_ARGS)
 if (XLogArchivingAlways())
 PgArchPID = pgarch_start();
-#ifdef USE_SYSTEMD
+/*
+* If we aren't planning to enter hot standby mode later, treat
+* RECOVERY_STARTED as meaning we're out of startup, and report status
+* accordingly.
+*/
 if (!EnableHotStandby)
+{
+AddToDataDirLockFile(LOCK_FILE_LINE_PM_STATUS, PM_STATUS_STANDBY);
+#ifdef USE_SYSTEMD
 sd_notify(0, "READY=1");
 #endif
+}
 pmState = PM_RECOVERY;
 }
@@ -5024,6 +5050,8 @@ sigusr1_handler(SIGNAL_ARGS)
 ereport(LOG,
 (errmsg("database system is ready to accept read only connections")));
+/* Report status */
+AddToDataDirLockFile(LOCK_FILE_LINE_PM_STATUS, PM_STATUS_READY);
 #ifdef USE_SYSTEMD
 sd_notify(0, "READY=1");
 #endif
......
@@ -47,6 +47,7 @@
 #include "utils/builtins.h"
 #include "utils/guc.h"
 #include "utils/memutils.h"
+#include "utils/pidfile.h"
 #include "utils/syscache.h"
 #include "utils/varlena.h"
@@ -1149,8 +1150,9 @@ TouchSocketLockFiles(void)
 *
 * Note: because we don't truncate the file, if we were to rewrite a line
 * with less data than it had before, there would be garbage after the last
-* line. We don't ever actually do that, so not worth adding another kernel
-* call to cover the possibility.
+* line. While we could fix that by adding a truncate call, that would make
+* the file update non-atomic, which we'd rather avoid. Therefore, callers
+* should endeavor never to shorten a line once it's been written.
 */
 void
 AddToDataDirLockFile(int target_line, const char *str)
@@ -1193,18 +1195,25 @@ AddToDataDirLockFile(int target_line, const char *str)
 srcptr = srcbuffer;
 for (lineno = 1; lineno < target_line; lineno++)
 {
-if ((srcptr = strchr(srcptr, '\n')) == NULL)
-{
-elog(LOG, "incomplete data in \"%s\": found only %d newlines while trying to add line %d",
-DIRECTORY_LOCK_FILE, lineno - 1, target_line);
-close(fd);
-return;
-}
-srcptr++;
+char *eol = strchr(srcptr, '\n');
+if (eol == NULL)
+break; /* not enough lines in file yet */
+srcptr = eol + 1;
 }
 memcpy(destbuffer, srcbuffer, srcptr - srcbuffer);
 destptr = destbuffer + (srcptr - srcbuffer);
+/*
+* Fill in any missing lines before the target line, in case lines are
+* added to the file out of order.
+*/
+for (; lineno < target_line; lineno++)
+{
+if (destptr < destbuffer + sizeof(destbuffer))
+*destptr++ = '\n';
+}
 /*
 * Write or rewrite the target line.
 */
......
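The out-of-order handling described in the commit message can be pictured with a small in-memory sketch. This is not the backend code: `replace_line` is a hypothetical helper that operates on strings rather than on the pidfile, and its buffer handling is simplified. It keeps the lines before the target, pads with empty lines when the input is still too short, writes the target line, and then appends whatever followed the old target line.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, for illustration only: build in "dest" a copy of
 * "src" in which line number target_line (1-based) is set to "str".
 * Lines missing before the target are created as empty lines.
 * Returns the length of the result, or -1 if it does not fit.
 */
static int
replace_line(const char *src, int target_line, const char *str,
             char *dest, size_t destsize)
{
    const char *srcptr = src;
    const char *rest;
    size_t      used;
    size_t      need;
    int         lineno;

    if (target_line < 1)
        return -1;

    /* Skip over the lines preceding the target, as far as they exist. */
    for (lineno = 1; lineno < target_line; lineno++)
    {
        const char *eol = strchr(srcptr, '\n');

        if (eol == NULL)
            break;              /* not enough lines yet */
        srcptr = eol + 1;
    }

    /* Old lines after the target line, if the target already existed. */
    rest = strchr(srcptr, '\n');

    /* prefix + filler newlines + new line + its '\n' + trailing NUL */
    used = (size_t) (srcptr - src);
    need = used + (size_t) (target_line - lineno) + strlen(str) + 2;
    if (rest != NULL)
        need += strlen(rest + 1);
    if (need > destsize)
        return -1;

    memcpy(dest, src, used);

    /* Fill in any missing lines before the target line. */
    for (; lineno < target_line; lineno++)
        dest[used++] = '\n';

    /* Write or rewrite the target line. */
    memcpy(dest + used, str, strlen(str));
    used += strlen(str);
    dest[used++] = '\n';

    /* Append the remaining old lines. */
    if (rest != NULL)
    {
        size_t      restlen = strlen(rest + 1);

        memcpy(dest + used, rest + 1, restlen);
        used += restlen;
    }
    dest[used] = '\0';
    return (int) used;
}
```

For example, given the two-line image `"123\n/data\n"`, setting line 4 to `5432` yields `"123\n/data\n\n5432\n"`: line 3 is created empty so that line 4 lands in the right place.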
@@ -16,14 +16,12 @@ subdir = src/bin/pg_ctl
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
-override CPPFLAGS := -I$(libpq_srcdir) $(CPPFLAGS)
 OBJS= pg_ctl.o $(WIN32RES)
 all: pg_ctl
-pg_ctl: $(OBJS) | submake-libpq submake-libpgport
-$(CC) $(CFLAGS) $(OBJS) $(libpq_pgport) $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@$(X)
+pg_ctl: $(OBJS) | submake-libpgport
+$(CC) $(CFLAGS) $(OBJS) $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@$(X)
 install: all installdirs
 $(INSTALL_PROGRAM) pg_ctl$(X) '$(DESTDIR)$(bindir)/pg_ctl$(X)'
......
This diff is collapsed.
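Since the pg_ctl.c changes are not displayed here, the following is only a rough sketch of the idea from the commit message: decide whether the server is up by reading line 8 of postmaster.pid instead of opening a connection. The function names, buffer size, and file handling are assumptions made for illustration; the constants mirror the definitions in utils/pidfile.h. A real client would presumably also check the PID and start timestamp, which this sketch omits.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Local copies mirroring utils/pidfile.h, so the sketch is self-contained. */
#define SKETCH_LINE_PM_STATUS  8
#define SKETCH_STATUS_READY    "ready   "
#define SKETCH_STATUS_STANDBY  "standby "

/*
 * Hypothetical helper: fetch line number target_line (1-based) of the file
 * at "path" into buf, without its trailing newline.  Returns false if the
 * file cannot be opened, has too few lines, or contains a line that does
 * not fit in buf.
 */
static bool
read_pidfile_line(const char *path, int target_line,
                  char *buf, size_t bufsize)
{
    FILE       *fp = fopen(path, "r");
    int         lineno = 0;
    bool        found = false;

    if (fp == NULL)
        return false;

    while (fgets(buf, (int) bufsize, fp) != NULL)
    {
        size_t      len = strlen(buf);

        if (len == 0 || buf[len - 1] != '\n')
            break;              /* overlong or unterminated line: give up */
        if (++lineno == target_line)
        {
            buf[len - 1] = '\0';
            found = true;
            break;
        }
    }
    fclose(fp);
    return found;
}

/*
 * Hypothetical check: report whether the status line says the postmaster
 * has finished starting up ("ready" or "standby").
 */
static bool
postmaster_is_up(const char *pidfile_path)
{
    char        status[1024];

    if (!read_pidfile_line(pidfile_path, SKETCH_LINE_PM_STATUS,
                           status, sizeof(status)))
        return false;           /* status line not written yet */

    return strcmp(status, SKETCH_STATUS_READY) == 0 ||
           strcmp(status, SKETCH_STATUS_STANDBY) == 0;
}
```

A wait loop built on this would call `postmaster_is_up()` periodically until it returns true or a timeout expires. Because the comparison uses the full padded strings, a partially written or differently padded status line is simply treated as "not ready yet".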
@@ -4,7 +4,7 @@ use warnings;
 use Config;
 use PostgresNode;
 use TestLib;
-use Test::More tests => 17;
+use Test::More tests => 19;
 my $tempdir = TestLib::tempdir;
 my $tempdir_short = TestLib::tempdir_short;
@@ -32,12 +32,14 @@ else
 print $conf "listen_addresses = '127.0.0.1'\n";
 }
 close $conf;
-command_ok([ 'pg_ctl', 'start', '-D', "$tempdir/data" ], 'pg_ctl start');
+command_like([ 'pg_ctl', 'start', '-D', "$tempdir/data",
+'-l', "$TestLib::log_path/001_start_stop_server.log" ],
+qr/done.*server started/s, 'pg_ctl start');
 # sleep here is because Windows builds can't check postmaster.pid exactly,
 # so they may mistake a pre-existing postmaster.pid for one created by the
 # postmaster they start. Waiting more than the 2 seconds slop time allowed
-# by test_postmaster_connection prevents that mistake.
+# by wait_for_postmaster() prevents that mistake.
 sleep 3 if ($windows_os);
 command_fails([ 'pg_ctl', 'start', '-D', "$tempdir/data" ],
 'second pg_ctl start fails');
......
@@ -14,8 +14,8 @@
 #include <io.h>
 #endif
-#include "miscadmin.h"
 #include "getopt_long.h"
+#include "utils/pidfile.h"
 #include "pg_upgrade.h"
......
@@ -21,7 +21,6 @@
 #endif
 #include "common/config_info.h"
-#include "miscadmin.h"
 /*
......
@@ -28,8 +28,6 @@
 #include "pgtime.h" /* for pg_time_t */
-#define PG_BACKEND_VERSIONSTR "postgres (PostgreSQL) " PG_VERSION "\n"
 #define InvalidPid (-1)
@@ -431,31 +429,6 @@ extern char *session_preload_libraries_string;
 extern char *shared_preload_libraries_string;
 extern char *local_preload_libraries_string;
-/*
-* As of 9.1, the contents of the data-directory lock file are:
-*
-* line #
-* 1 postmaster PID (or negative of a standalone backend's PID)
-* 2 data directory path
-* 3 postmaster start timestamp (time_t representation)
-* 4 port number
-* 5 first Unix socket directory path (empty if none)
-* 6 first listen_address (IP address or "*"; empty if no TCP port)
-* 7 shared memory key (not present on Windows)
-*
-* Lines 6 and up are added via AddToDataDirLockFile() after initial file
-* creation.
-*
-* The socket lock file, if used, has the same contents as lines 1-5.
-*/
-#define LOCK_FILE_LINE_PID 1
-#define LOCK_FILE_LINE_DATA_DIR 2
-#define LOCK_FILE_LINE_START_TIME 3
-#define LOCK_FILE_LINE_PORT 4
-#define LOCK_FILE_LINE_SOCKET_DIR 5
-#define LOCK_FILE_LINE_LISTEN_ADDR 6
-#define LOCK_FILE_LINE_SHMEM_KEY 7
 extern void CreateDataDirLockFile(bool amPostmaster);
 extern void CreateSocketLockFile(const char *socketfile, bool amPostmaster,
 const char *socketDir);
......
@@ -98,6 +98,9 @@ extern int find_my_exec(const char *argv0, char *retpath);
 extern int find_other_exec(const char *argv0, const char *target,
 const char *versionstr, char *retpath);
+/* Doesn't belong here, but this is used with find_other_exec(), so... */
+#define PG_BACKEND_VERSIONSTR "postgres (PostgreSQL) " PG_VERSION "\n"
 /* Windows security token manipulation (in exec.c) */
 #ifdef WIN32
 extern BOOL AddUserToTokenDacl(HANDLE hToken);
......
/*-------------------------------------------------------------------------
*
* pidfile.h
* Declarations describing the data directory lock file (postmaster.pid)
*
* Portions Copyright (c) 1996-2017, PostgreSQL Global Development Group
* Portions Copyright (c) 1994, Regents of the University of California
*
* src/include/utils/pidfile.h
*
*-------------------------------------------------------------------------
*/
#ifndef UTILS_PIDFILE_H
#define UTILS_PIDFILE_H
/*
* As of Postgres 10, the contents of the data-directory lock file are:
*
* line #
* 1 postmaster PID (or negative of a standalone backend's PID)
* 2 data directory path
* 3 postmaster start timestamp (time_t representation)
* 4 port number
* 5 first Unix socket directory path (empty if none)
* 6 first listen_address (IP address or "*"; empty if no TCP port)
* 7 shared memory key (empty on Windows)
* 8 postmaster status (see values below)
*
* Lines 6 and up are added via AddToDataDirLockFile() after initial file
* creation; also, line 5 is initially empty and is changed after the first
* Unix socket is opened.
*
* Socket lock file(s), if used, have the same contents as lines 1-5, with
* line 5 being their own directory.
*/
#define LOCK_FILE_LINE_PID 1
#define LOCK_FILE_LINE_DATA_DIR 2
#define LOCK_FILE_LINE_START_TIME 3
#define LOCK_FILE_LINE_PORT 4
#define LOCK_FILE_LINE_SOCKET_DIR 5
#define LOCK_FILE_LINE_LISTEN_ADDR 6
#define LOCK_FILE_LINE_SHMEM_KEY 7
#define LOCK_FILE_LINE_PM_STATUS 8
/*
* The PM_STATUS line may contain one of these values. All these strings
* must be the same length, per comments for AddToDataDirLockFile().
* We pad with spaces as needed to make that true.
*/
#define PM_STATUS_STARTING "starting" /* still starting up */
#define PM_STATUS_STOPPING "stopping" /* in shutdown sequence */
#define PM_STATUS_READY "ready   " /* ready for connections */
#define PM_STATUS_STANDBY "standby " /* up, won't accept connections */
#endif /* UTILS_PIDFILE_H */
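The requirement that all status strings have the same length follows from the no-truncation rule noted for AddToDataDirLockFile(). The sketch below illustrates the effect with an in-memory stand-in for the file; `rewrite_without_truncate` is a hypothetical helper, and it assumes the file is rewritten from offset 0 without being shortened.

```c
#include <string.h>

/*
 * Simulate rewriting a file from offset 0 without truncating it: the new
 * image replaces the first strlen(newimg) bytes, and any bytes of the old
 * image beyond that point stay where they were.  "file" must have room for
 * the longer of the two images plus a terminating NUL.
 */
static void
rewrite_without_truncate(char *file, const char *newimg)
{
    size_t      oldlen = strlen(file);
    size_t      newlen = strlen(newimg);

    memcpy(file, newimg, newlen);
    if (newlen > oldlen)
        file[newlen] = '\0';    /* the "file" grew; terminate the new end */
}
```

Rewriting the image `"pid\nstarting\n"` with the shorter `"pid\nready\n"` leaves `"pid\nready\nng\n"`, that is, a stray `ng` line left over from `starting`. With the padded value `"pid\nready   \n"` the new image has the same length as the old one, so nothing is left behind.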
@@ -49,7 +49,7 @@ my @contrib_excludes = (
 # Set of variables for frontend modules
 my $frontend_defines = { 'initdb' => 'FRONTEND' };
-my @frontend_uselibpq = ('pg_ctl', 'pg_upgrade', 'pgbench', 'psql', 'initdb');
+my @frontend_uselibpq = ('pg_upgrade', 'pgbench', 'psql', 'initdb');
 my @frontend_uselibpgport = (
 'pg_archivecleanup', 'pg_test_fsync',
 'pg_test_timing', 'pg_upgrade',
......