Commit 85ccb689 authored by Tom Lane's avatar Tom Lane

Rearrange pgstat_bestart() to avoid failures within its critical section.

We long ago decided to design the shared PgBackendStatus data structure to
minimize the cost of writing status updates, which means that writers just
have to increment the st_changecount field twice.  That isn't hooked into
any sort of resource management mechanism, which means that if something
were to throw error between the two increments, the st_changecount field
would be left odd indefinitely.  That would cause readers to lock up.
Now, since it's also a bad idea to leave the field odd for longer than
absolutely necessary (because readers will spin while we have it set),
the expectation was that we'd treat these segments like spinlock critical
sections, with only short, more or less straight-line, code in them.

That was fine as originally designed, but commit 9029f4b3 broke it
by inserting a significant amount of non-straight-line code into
pgstat_bestart(), code that is very capable of throwing errors, not to
mention taking a significant amount of time during which readers will spin.
We have a report from Neeraj Kumar of readers actually locking up, which
I suspect was due to an encoding conversion error in X509_NAME_to_cstring,
though conceivably it was just a garden-variety OOM failure.

Subsequent commits have loaded even more dubious code into pgstat_bestart's
critical section (and commit fc70a4b0 deserves some kind of booby prize
for managing to miss the critical section entirely, although the negative
consequences seem minimal given that the PgBackendStatus entry should be
seen by readers as inactive at that point).

The right way to fix this mess seems to be to compute all these values
into a local copy of the process' PgBackendStatus struct, and then just
copy the data back within the critical section proper.  This plan can't
be implemented completely cleanly because of the struct's heavy reliance
on out-of-line strings, which we must initialize separately within the
critical section.  But still, the critical section is far smaller and
safer than it was before.

In hopes of forestalling future errors of the same ilk, rename the
macros for st_changecount management to make it more apparent that
the writer-side macros create a critical section.  And to prevent
the worst consequences if we nonetheless manage to mess it up anyway,
adjust those macros so that they really are a critical section, ie
they now bump CritSectionCount.  That doesn't add much overhead, and
it guarantees that if we do somehow throw an error while the counter
is odd, it will lead to PANIC and a database restart to reset shared
memory.

Back-patch to 9.5 where the problem was introduced.

In HEAD, also fix an oversight in commit b0b39f72: it failed to teach
pgstat_read_current_status to copy st_gssstatus data from shared memory to
local memory.  Hence, subsequent use of that data within the transaction
would potentially see changing data that it shouldn't see.

Discussion: https://postgr.es/m/CAPR3Wj5Z17=+eeyrn_ZDG3NQGYgMEOY6JV6Y-WRRhGgwc16U3Q@mail.gmail.com
parent 4217d15d
......@@ -2849,66 +2849,80 @@ pgstat_initialize(void)
* Apart from auxiliary processes, MyBackendId, MyDatabaseId,
* session userid, and application_name must be set for a
* backend (hence, this cannot be combined with pgstat_initialize).
* Note also that we must be inside a transaction if this isn't an aux
* process, as we may need to do encoding conversion on some strings.
* ----------
*/
void
pgstat_bestart(void)
{
SockAddr clientaddr;
volatile PgBackendStatus *beentry;
volatile PgBackendStatus *vbeentry = MyBEEntry;
PgBackendStatus lbeentry;
#ifdef USE_SSL
PgBackendSSLStatus lsslstatus;
#endif
#ifdef ENABLE_GSS
PgBackendGSSStatus lgssstatus;
#endif
/*
* To minimize the time spent modifying the PgBackendStatus entry, fetch
* all the needed data first.
*/
/* pgstats state must be initialized from pgstat_initialize() */
Assert(vbeentry != NULL);
/*
* We may not have a MyProcPort (eg, if this is the autovacuum process).
* If so, use all-zeroes client address, which is dealt with specially in
* pg_stat_get_backend_client_addr and pg_stat_get_backend_client_port.
* To minimize the time spent modifying the PgBackendStatus entry, and
* avoid risk of errors inside the critical section, we first copy the
* shared-memory struct to a local variable, then modify the data in the
* local variable, then copy the local variable back to shared memory.
* Only the last step has to be inside the critical section.
*
* Most of the data we copy from shared memory is just going to be
* overwritten, but the struct's not so large that it's worth the
* maintenance hassle to copy only the needful fields.
*/
if (MyProcPort)
memcpy(&clientaddr, &MyProcPort->raddr, sizeof(clientaddr));
else
MemSet(&clientaddr, 0, sizeof(clientaddr));
memcpy(&lbeentry,
unvolatize(PgBackendStatus *, vbeentry),
sizeof(PgBackendStatus));
/* These structs can just start from zeroes each time, though */
#ifdef USE_SSL
memset(&lsslstatus, 0, sizeof(lsslstatus));
#endif
#ifdef ENABLE_GSS
memset(&lgssstatus, 0, sizeof(lgssstatus));
#endif
/*
* Initialize my status entry, following the protocol of bumping
* st_changecount before and after; and make sure it's even afterwards. We
* use a volatile pointer here to ensure the compiler doesn't try to get
* cute.
* Now fill in all the fields of lbeentry, except for strings that are
* out-of-line data. Those have to be handled separately, below.
*/
beentry = MyBEEntry;
/* pgstats state must be initialized from pgstat_initialize() */
Assert(beentry != NULL);
lbeentry.st_procpid = MyProcPid;
if (MyBackendId != InvalidBackendId)
{
if (IsAutoVacuumLauncherProcess())
{
/* Autovacuum Launcher */
beentry->st_backendType = B_AUTOVAC_LAUNCHER;
lbeentry.st_backendType = B_AUTOVAC_LAUNCHER;
}
else if (IsAutoVacuumWorkerProcess())
{
/* Autovacuum Worker */
beentry->st_backendType = B_AUTOVAC_WORKER;
lbeentry.st_backendType = B_AUTOVAC_WORKER;
}
else if (am_walsender)
{
/* Wal sender */
beentry->st_backendType = B_WAL_SENDER;
lbeentry.st_backendType = B_WAL_SENDER;
}
else if (IsBackgroundWorker)
{
/* bgworker */
beentry->st_backendType = B_BG_WORKER;
lbeentry.st_backendType = B_BG_WORKER;
}
else
{
/* client-backend */
beentry->st_backendType = B_BACKEND;
lbeentry.st_backendType = B_BACKEND;
}
}
else
......@@ -2918,99 +2932,92 @@ pgstat_bestart(void)
switch (MyAuxProcType)
{
case StartupProcess:
beentry->st_backendType = B_STARTUP;
lbeentry.st_backendType = B_STARTUP;
break;
case BgWriterProcess:
beentry->st_backendType = B_BG_WRITER;
lbeentry.st_backendType = B_BG_WRITER;
break;
case CheckpointerProcess:
beentry->st_backendType = B_CHECKPOINTER;
lbeentry.st_backendType = B_CHECKPOINTER;
break;
case WalWriterProcess:
beentry->st_backendType = B_WAL_WRITER;
lbeentry.st_backendType = B_WAL_WRITER;
break;
case WalReceiverProcess:
beentry->st_backendType = B_WAL_RECEIVER;
lbeentry.st_backendType = B_WAL_RECEIVER;
break;
default:
elog(FATAL, "unrecognized process type: %d",
(int) MyAuxProcType);
proc_exit(1);
}
}
do
{
pgstat_increment_changecount_before(beentry);
} while ((beentry->st_changecount & 1) == 0);
beentry->st_procpid = MyProcPid;
beentry->st_proc_start_timestamp = MyStartTimestamp;
beentry->st_activity_start_timestamp = 0;
beentry->st_state_start_timestamp = 0;
beentry->st_xact_start_timestamp = 0;
beentry->st_databaseid = MyDatabaseId;
lbeentry.st_proc_start_timestamp = MyStartTimestamp;
lbeentry.st_activity_start_timestamp = 0;
lbeentry.st_state_start_timestamp = 0;
lbeentry.st_xact_start_timestamp = 0;
lbeentry.st_databaseid = MyDatabaseId;
/* We have userid for client-backends, wal-sender and bgworker processes */
if (beentry->st_backendType == B_BACKEND
|| beentry->st_backendType == B_WAL_SENDER
|| beentry->st_backendType == B_BG_WORKER)
beentry->st_userid = GetSessionUserId();
if (lbeentry.st_backendType == B_BACKEND
|| lbeentry.st_backendType == B_WAL_SENDER
|| lbeentry.st_backendType == B_BG_WORKER)
lbeentry.st_userid = GetSessionUserId();
else
beentry->st_userid = InvalidOid;
lbeentry.st_userid = InvalidOid;
beentry->st_clientaddr = clientaddr;
if (MyProcPort && MyProcPort->remote_hostname)
strlcpy(beentry->st_clienthostname, MyProcPort->remote_hostname,
NAMEDATALEN);
/*
* We may not have a MyProcPort (eg, if this is the autovacuum process).
* If so, use all-zeroes client address, which is dealt with specially in
* pg_stat_get_backend_client_addr and pg_stat_get_backend_client_port.
*/
if (MyProcPort)
memcpy(&lbeentry.st_clientaddr, &MyProcPort->raddr,
sizeof(lbeentry.st_clientaddr));
else
beentry->st_clienthostname[0] = '\0';
MemSet(&lbeentry.st_clientaddr, 0, sizeof(lbeentry.st_clientaddr));
#ifdef USE_SSL
if (MyProcPort && MyProcPort->ssl != NULL)
{
beentry->st_ssl = true;
beentry->st_sslstatus->ssl_bits = be_tls_get_cipher_bits(MyProcPort);
beentry->st_sslstatus->ssl_compression = be_tls_get_compression(MyProcPort);
strlcpy(beentry->st_sslstatus->ssl_version, be_tls_get_version(MyProcPort), NAMEDATALEN);
strlcpy(beentry->st_sslstatus->ssl_cipher, be_tls_get_cipher(MyProcPort), NAMEDATALEN);
be_tls_get_peer_subject_name(MyProcPort, beentry->st_sslstatus->ssl_client_dn, NAMEDATALEN);
be_tls_get_peer_serial(MyProcPort, beentry->st_sslstatus->ssl_client_serial, NAMEDATALEN);
be_tls_get_peer_issuer_name(MyProcPort, beentry->st_sslstatus->ssl_issuer_dn, NAMEDATALEN);
lbeentry.st_ssl = true;
lsslstatus.ssl_bits = be_tls_get_cipher_bits(MyProcPort);
lsslstatus.ssl_compression = be_tls_get_compression(MyProcPort);
strlcpy(lsslstatus.ssl_version, be_tls_get_version(MyProcPort), NAMEDATALEN);
strlcpy(lsslstatus.ssl_cipher, be_tls_get_cipher(MyProcPort), NAMEDATALEN);
be_tls_get_peer_subject_name(MyProcPort, lsslstatus.ssl_client_dn, NAMEDATALEN);
be_tls_get_peer_serial(MyProcPort, lsslstatus.ssl_client_serial, NAMEDATALEN);
be_tls_get_peer_issuer_name(MyProcPort, lsslstatus.ssl_issuer_dn, NAMEDATALEN);
}
else
{
beentry->st_ssl = false;
lbeentry.st_ssl = false;
}
#else
beentry->st_ssl = false;
lbeentry.st_ssl = false;
#endif
#ifdef ENABLE_GSS
if (MyProcPort && MyProcPort->gss != NULL)
{
beentry->st_gss = true;
beentry->st_gssstatus->gss_auth = be_gssapi_get_auth(MyProcPort);
beentry->st_gssstatus->gss_enc = be_gssapi_get_enc(MyProcPort);
lbeentry.st_gss = true;
lgssstatus.gss_auth = be_gssapi_get_auth(MyProcPort);
lgssstatus.gss_enc = be_gssapi_get_enc(MyProcPort);
if (beentry->st_gssstatus->gss_auth)
strlcpy(beentry->st_gssstatus->gss_princ, be_gssapi_get_princ(MyProcPort), NAMEDATALEN);
if (lgssstatus.gss_auth)
strlcpy(lgssstatus.gss_princ, be_gssapi_get_princ(MyProcPort), NAMEDATALEN);
}
else
{
beentry->st_gss = false;
lbeentry.st_gss = false;
}
#else
beentry->st_gss = false;
lbeentry.st_gss = false;
#endif
beentry->st_state = STATE_UNDEFINED;
beentry->st_appname[0] = '\0';
beentry->st_activity_raw[0] = '\0';
/* Also make sure the last byte in each string area is always 0 */
beentry->st_clienthostname[NAMEDATALEN - 1] = '\0';
beentry->st_appname[NAMEDATALEN - 1] = '\0';
beentry->st_activity_raw[pgstat_track_activity_query_size - 1] = '\0';
beentry->st_progress_command = PROGRESS_COMMAND_INVALID;
beentry->st_progress_command_target = InvalidOid;
lbeentry.st_state = STATE_UNDEFINED;
lbeentry.st_progress_command = PROGRESS_COMMAND_INVALID;
lbeentry.st_progress_command_target = InvalidOid;
/*
* we don't zero st_progress_param here to save cycles; nobody should
......@@ -3018,7 +3025,45 @@ pgstat_bestart(void)
* than PROGRESS_COMMAND_INVALID
*/
pgstat_increment_changecount_after(beentry);
/*
* We're ready to enter the critical section that fills the shared-memory
* status entry. We follow the protocol of bumping st_changecount before
* and after; and make sure it's even afterwards. We use a volatile
* pointer here to ensure the compiler doesn't try to get cute.
*/
PGSTAT_BEGIN_WRITE_ACTIVITY(vbeentry);
/* make sure we'll memcpy the same st_changecount back */
lbeentry.st_changecount = vbeentry->st_changecount;
memcpy(unvolatize(PgBackendStatus *, vbeentry),
&lbeentry,
sizeof(PgBackendStatus));
/*
* We can write the out-of-line strings and structs using the pointers
* that are in lbeentry; this saves some de-volatilizing messiness.
*/
lbeentry.st_appname[0] = '\0';
if (MyProcPort && MyProcPort->remote_hostname)
strlcpy(lbeentry.st_clienthostname, MyProcPort->remote_hostname,
NAMEDATALEN);
else
lbeentry.st_clienthostname[0] = '\0';
lbeentry.st_activity_raw[0] = '\0';
/* Also make sure the last byte in each string area is always 0 */
lbeentry.st_appname[NAMEDATALEN - 1] = '\0';
lbeentry.st_clienthostname[NAMEDATALEN - 1] = '\0';
lbeentry.st_activity_raw[pgstat_track_activity_query_size - 1] = '\0';
#ifdef USE_SSL
memcpy(lbeentry.st_sslstatus, &lsslstatus, sizeof(PgBackendSSLStatus));
#endif
#ifdef ENABLE_GSS
memcpy(lbeentry.st_gssstatus, &lgssstatus, sizeof(PgBackendGSSStatus));
#endif
PGSTAT_END_WRITE_ACTIVITY(vbeentry);
/* Update app name to current GUC setting */
if (application_name)
......@@ -3053,11 +3098,11 @@ pgstat_beshutdown_hook(int code, Datum arg)
* before and after. We use a volatile pointer here to ensure the
* compiler doesn't try to get cute.
*/
pgstat_increment_changecount_before(beentry);
PGSTAT_BEGIN_WRITE_ACTIVITY(beentry);
beentry->st_procpid = 0; /* mark invalid */
pgstat_increment_changecount_after(beentry);
PGSTAT_END_WRITE_ACTIVITY(beentry);
}
......@@ -3096,7 +3141,7 @@ pgstat_report_activity(BackendState state, const char *cmd_str)
* non-disabled state. As our final update, change the state and
* clear fields we will not be updating anymore.
*/
pgstat_increment_changecount_before(beentry);
PGSTAT_BEGIN_WRITE_ACTIVITY(beentry);
beentry->st_state = STATE_DISABLED;
beentry->st_state_start_timestamp = 0;
beentry->st_activity_raw[0] = '\0';
......@@ -3104,14 +3149,14 @@ pgstat_report_activity(BackendState state, const char *cmd_str)
/* st_xact_start_timestamp and wait_event_info are also disabled */
beentry->st_xact_start_timestamp = 0;
proc->wait_event_info = 0;
pgstat_increment_changecount_after(beentry);
PGSTAT_END_WRITE_ACTIVITY(beentry);
}
return;
}
/*
* To minimize the time spent modifying the entry, fetch all the needed
* data first.
* To minimize the time spent modifying the entry, and avoid risk of
* errors inside the critical section, fetch all the needed data first.
*/
start_timestamp = GetCurrentStatementStartTimestamp();
if (cmd_str != NULL)
......@@ -3128,7 +3173,7 @@ pgstat_report_activity(BackendState state, const char *cmd_str)
/*
* Now update the status entry
*/
pgstat_increment_changecount_before(beentry);
PGSTAT_BEGIN_WRITE_ACTIVITY(beentry);
beentry->st_state = state;
beentry->st_state_start_timestamp = current_timestamp;
......@@ -3140,7 +3185,7 @@ pgstat_report_activity(BackendState state, const char *cmd_str)
beentry->st_activity_start_timestamp = start_timestamp;
}
pgstat_increment_changecount_after(beentry);
PGSTAT_END_WRITE_ACTIVITY(beentry);
}
/*-----------
......@@ -3158,11 +3203,11 @@ pgstat_progress_start_command(ProgressCommandType cmdtype, Oid relid)
if (!beentry || !pgstat_track_activities)
return;
pgstat_increment_changecount_before(beentry);
PGSTAT_BEGIN_WRITE_ACTIVITY(beentry);
beentry->st_progress_command = cmdtype;
beentry->st_progress_command_target = relid;
MemSet(&beentry->st_progress_param, 0, sizeof(beentry->st_progress_param));
pgstat_increment_changecount_after(beentry);
PGSTAT_END_WRITE_ACTIVITY(beentry);
}
/*-----------
......@@ -3181,9 +3226,9 @@ pgstat_progress_update_param(int index, int64 val)
if (!beentry || !pgstat_track_activities)
return;
pgstat_increment_changecount_before(beentry);
PGSTAT_BEGIN_WRITE_ACTIVITY(beentry);
beentry->st_progress_param[index] = val;
pgstat_increment_changecount_after(beentry);
PGSTAT_END_WRITE_ACTIVITY(beentry);
}
/*-----------
......@@ -3203,7 +3248,7 @@ pgstat_progress_update_multi_param(int nparam, const int *index,
if (!beentry || !pgstat_track_activities || nparam == 0)
return;
pgstat_increment_changecount_before(beentry);
PGSTAT_BEGIN_WRITE_ACTIVITY(beentry);
for (i = 0; i < nparam; ++i)
{
......@@ -3212,7 +3257,7 @@ pgstat_progress_update_multi_param(int nparam, const int *index,
beentry->st_progress_param[index[i]] = val[i];
}
pgstat_increment_changecount_after(beentry);
PGSTAT_END_WRITE_ACTIVITY(beentry);
}
/*-----------
......@@ -3233,10 +3278,10 @@ pgstat_progress_end_command(void)
&& beentry->st_progress_command == PROGRESS_COMMAND_INVALID)
return;
pgstat_increment_changecount_before(beentry);
PGSTAT_BEGIN_WRITE_ACTIVITY(beentry);
beentry->st_progress_command = PROGRESS_COMMAND_INVALID;
beentry->st_progress_command_target = InvalidOid;
pgstat_increment_changecount_after(beentry);
PGSTAT_END_WRITE_ACTIVITY(beentry);
}
/* ----------
......@@ -3262,12 +3307,12 @@ pgstat_report_appname(const char *appname)
* st_changecount before and after. We use a volatile pointer here to
* ensure the compiler doesn't try to get cute.
*/
pgstat_increment_changecount_before(beentry);
PGSTAT_BEGIN_WRITE_ACTIVITY(beentry);
memcpy((char *) beentry->st_appname, appname, len);
beentry->st_appname[len] = '\0';
pgstat_increment_changecount_after(beentry);
PGSTAT_END_WRITE_ACTIVITY(beentry);
}
/*
......@@ -3287,9 +3332,11 @@ pgstat_report_xact_timestamp(TimestampTz tstamp)
* st_changecount before and after. We use a volatile pointer here to
* ensure the compiler doesn't try to get cute.
*/
pgstat_increment_changecount_before(beentry);
PGSTAT_BEGIN_WRITE_ACTIVITY(beentry);
beentry->st_xact_start_timestamp = tstamp;
pgstat_increment_changecount_after(beentry);
PGSTAT_END_WRITE_ACTIVITY(beentry);
}
/* ----------
......@@ -3310,6 +3357,9 @@ pgstat_read_current_status(void)
*localactivity;
#ifdef USE_SSL
PgBackendSSLStatus *localsslstatus;
#endif
#ifdef ENABLE_GSS
PgBackendGSSStatus *localgssstatus;
#endif
int i;
......@@ -3344,6 +3394,11 @@ pgstat_read_current_status(void)
MemoryContextAlloc(pgStatLocalContext,
sizeof(PgBackendSSLStatus) * NumBackendStatSlots);
#endif
#ifdef ENABLE_GSS
localgssstatus = (PgBackendGSSStatus *)
MemoryContextAlloc(pgStatLocalContext,
sizeof(PgBackendGSSStatus) * NumBackendStatSlots);
#endif
localNumBackends = 0;
......@@ -3363,14 +3418,19 @@ pgstat_read_current_status(void)
int before_changecount;
int after_changecount;
pgstat_save_changecount_before(beentry, before_changecount);
pgstat_begin_read_activity(beentry, before_changecount);
localentry->backendStatus.st_procpid = beentry->st_procpid;
/* Skip all the data-copying work if entry is not in use */
if (localentry->backendStatus.st_procpid > 0)
{
memcpy(&localentry->backendStatus, unvolatize(PgBackendStatus *, beentry), sizeof(PgBackendStatus));
/*
* For each PgBackendStatus field that is a pointer, copy the
* pointed-to data, then adjust the local copy of the pointer
* field to point at the local copy of the data.
*
* strcpy is safe even if the string is modified concurrently,
* because there's always a \0 at the end of the buffer.
*/
......@@ -3380,19 +3440,26 @@ pgstat_read_current_status(void)
localentry->backendStatus.st_clienthostname = localclienthostname;
strcpy(localactivity, (char *) beentry->st_activity_raw);
localentry->backendStatus.st_activity_raw = localactivity;
localentry->backendStatus.st_ssl = beentry->st_ssl;
#ifdef USE_SSL
if (beentry->st_ssl)
{
memcpy(localsslstatus, beentry->st_sslstatus, sizeof(PgBackendSSLStatus));
localentry->backendStatus.st_sslstatus = localsslstatus;
}
#endif
#ifdef ENABLE_GSS
if (beentry->st_gss)
{
memcpy(localgssstatus, beentry->st_gssstatus, sizeof(PgBackendGSSStatus));
localentry->backendStatus.st_gssstatus = localgssstatus;
}
#endif
}
pgstat_save_changecount_after(beentry, after_changecount);
if (before_changecount == after_changecount &&
(before_changecount & 1) == 0)
pgstat_end_read_activity(beentry, after_changecount);
if (pgstat_read_activity_complete(before_changecount,
after_changecount))
break;
/* Make sure we can break out of loop if stuck... */
......@@ -3413,6 +3480,9 @@ pgstat_read_current_status(void)
localactivity += pgstat_track_activity_query_size;
#ifdef USE_SSL
localsslstatus++;
#endif
#ifdef ENABLE_GSS
localgssstatus++;
#endif
localNumBackends++;
}
......@@ -4090,14 +4160,14 @@ pgstat_get_backend_current_activity(int pid, bool checkUser)
int before_changecount;
int after_changecount;
pgstat_save_changecount_before(vbeentry, before_changecount);
pgstat_begin_read_activity(vbeentry, before_changecount);
found = (vbeentry->st_procpid == pid);
pgstat_save_changecount_after(vbeentry, after_changecount);
pgstat_end_read_activity(vbeentry, after_changecount);
if (before_changecount == after_changecount &&
(before_changecount & 1) == 0)
if (pgstat_read_activity_complete(before_changecount,
after_changecount))
break;
/* Make sure we can break out of loop if stuck... */
......
......@@ -1035,11 +1035,12 @@ typedef struct PgBackendStatus
* the copy is valid; otherwise start over. This makes updates cheap
* while reads are potentially expensive, but that's the tradeoff we want.
*
* The above protocol needs the memory barriers to ensure that the
* apparent order of execution is as it desires. Otherwise, for example,
* the CPU might rearrange the code so that st_changecount is incremented
* twice before the modification on a machine with weak memory ordering.
* This surprising result can lead to bugs.
* The above protocol needs memory barriers to ensure that the apparent
* order of execution is as it desires. Otherwise, for example, the CPU
* might rearrange the code so that st_changecount is incremented twice
* before the modification on a machine with weak memory ordering. Hence,
* use the macros defined below for manipulating st_changecount, rather
* than touching it directly.
*/
int st_changecount;
......@@ -1099,42 +1100,66 @@ typedef struct PgBackendStatus
} PgBackendStatus;
/*
* Macros to load and store st_changecount with the memory barriers.
* Macros to load and store st_changecount with appropriate memory barriers.
*
* pgstat_increment_changecount_before() and
* pgstat_increment_changecount_after() need to be called before and after
* PgBackendStatus entries are modified, respectively. This makes sure that
* st_changecount is incremented around the modification.
* Use PGSTAT_BEGIN_WRITE_ACTIVITY() before, and PGSTAT_END_WRITE_ACTIVITY()
* after, modifying the current process's PgBackendStatus data. Note that,
* since there is no mechanism for cleaning up st_changecount after an error,
* THESE MACROS FORM A CRITICAL SECTION. Any error between them will be
* promoted to PANIC, causing a database restart to clean up shared memory!
* Hence, keep the critical section as short and straight-line as possible.
* Aside from being safer, that minimizes the window in which readers will
* have to loop.
*
* Also pgstat_save_changecount_before() and pgstat_save_changecount_after()
* need to be called before and after PgBackendStatus entries are copied into
* private memory, respectively.
* Reader logic should follow this sketch:
*
* for (;;)
* {
* int before_ct, after_ct;
*
* pgstat_begin_read_activity(beentry, before_ct);
* ... copy beentry data to local memory ...
* pgstat_end_read_activity(beentry, after_ct);
* if (pgstat_read_activity_complete(before_ct, after_ct))
* break;
* CHECK_FOR_INTERRUPTS();
* }
*
* For extra safety, we generally use volatile beentry pointers, although
* the memory barriers should theoretically be sufficient.
*/
#define pgstat_increment_changecount_before(beentry) \
do { \
beentry->st_changecount++; \
#define PGSTAT_BEGIN_WRITE_ACTIVITY(beentry) \
do { \
START_CRIT_SECTION(); \
(beentry)->st_changecount++; \
pg_write_barrier(); \
} while (0)
#define pgstat_increment_changecount_after(beentry) \
do { \
#define PGSTAT_END_WRITE_ACTIVITY(beentry) \
do { \
pg_write_barrier(); \
beentry->st_changecount++; \
Assert((beentry->st_changecount & 1) == 0); \
(beentry)->st_changecount++; \
Assert(((beentry)->st_changecount & 1) == 0); \
END_CRIT_SECTION(); \
} while (0)
#define pgstat_save_changecount_before(beentry, save_changecount) \
do { \
save_changecount = beentry->st_changecount; \
pg_read_barrier(); \
#define pgstat_begin_read_activity(beentry, before_changecount) \
do { \
(before_changecount) = (beentry)->st_changecount; \
pg_read_barrier(); \
} while (0)
#define pgstat_save_changecount_after(beentry, save_changecount) \
do { \
pg_read_barrier(); \
save_changecount = beentry->st_changecount; \
#define pgstat_end_read_activity(beentry, after_changecount) \
do { \
pg_read_barrier(); \
(after_changecount) = (beentry)->st_changecount; \
} while (0)
#define pgstat_read_activity_complete(before_changecount, after_changecount) \
((before_changecount) == (after_changecount) && \
((before_changecount) & 1) == 0)
/* ----------
* LocalPgBackendStatus
*
......
Markdown is supported
0% or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment