Commit d91a4a6c authored by Robert Haas's avatar Robert Haas

Cosmetic improvements to group locking.

Reflow text in lock manager README so that it fits within 80 columns.
Correct some mistakes.  Expand the README to explain not only why group
locking exists but also the data structures that support it.  Improve
comments related to group locking in several files.  Change the name of a
macro argument for improved clarity.

Most of these problems were reported by Tom Lane, but I found a few
of them myself.

Robert Haas and Tom Lane
parent dd077ef8
@@ -591,26 +591,27 @@ Group Locking
As if all of that weren't already complicated enough, PostgreSQL now supports
parallelism (see src/backend/access/transam/README.parallel), which means that
we might need to resolve deadlocks that occur between gangs of related
processes rather than individual processes. This doesn't change the basic
deadlock detection algorithm very much, but it makes the bookkeeping more
complicated.

We choose to regard locks held by processes in the same parallel group as
non-conflicting. This means that two processes in a parallel group can hold a
self-exclusive lock on the same relation at the same time, or one process can
acquire an AccessShareLock while the other already holds AccessExclusiveLock.
This might seem dangerous and could be in some cases (more on that below), but
if we didn't do this then parallel query would be extremely prone to
self-deadlock. For example, a parallel query against a relation on which the
leader already had AccessExclusiveLock would hang, because the workers would
try to lock the same relation and be blocked by the leader; yet the leader
can't finish until it receives completion indications from all workers. An
undetected deadlock results. This is far from the only scenario where such a
problem happens. The same thing will occur if the leader holds only
AccessShareLock, the worker seeks AccessShareLock, but between the time the
leader attempts to acquire the lock and the time the worker attempts to
acquire it, some other process queues up waiting for an AccessExclusiveLock.
In this case, too, an indefinite hang results.

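The effect of the same-group exemption on conflict checking can be sketched as follows. The types and the helper are hypothetical stand-ins, not the lock manager's real API; the leader-pointer field mirrors the lockGroupLeader field the data-structure discussion below describes:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, pared-down stand-in for a process and its lock group leader. */
typedef struct GroupProc
{
    struct GroupProc *lockGroupLeader;  /* NULL if not in any lock group */
} GroupProc;

/*
 * Sketch: a lock request that would ordinarily conflict is treated as
 * non-conflicting when both processes belong to the same lock group.
 */
static bool
locks_conflict(const GroupProc *p1, const GroupProc *p2, bool modes_conflict)
{
    if (p1->lockGroupLeader != NULL &&
        p1->lockGroupLeader == p2->lockGroupLeader)
        return false;            /* same parallel group: never a conflict */
    return modes_conflict;       /* otherwise the lock-mode table decides */
}
```

This is exactly why the leader-holds-AccessExclusiveLock scenario above no longer hangs: the worker's request is filtered out before the ordinary conflict table is consulted.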
It might seem that we could predict which locks the workers will attempt to
acquire and ensure before going parallel that those locks would be acquired
@@ -618,7 +619,7 @@ successfully. But this is very difficult to make work in a general way. For
example, a parallel worker's portion of the query plan could involve an
SQL-callable function which generates a query dynamically, and that query
might happen to hit a table on which the leader happens to hold
AccessExclusiveLock. By imposing enough restrictions on what workers can do,
we could eventually create a situation where their behavior can be adequately
restricted, but these restrictions would be fairly onerous, and even then, the
system required to decide whether the workers will succeed at acquiring the
@@ -627,27 +628,56 @@ necessary locks would be complex and possibly buggy.
So, instead, we take the approach of deciding that locks within a lock group
do not conflict. This eliminates the possibility of an undetected deadlock,
but also opens up some problem cases: if the leader and worker try to do some
operation at the same time which would ordinarily be prevented by the
heavyweight lock mechanism, undefined behavior might result. In practice, the
dangers are modest. The leader and worker share the same transaction,
snapshot, and combo CID hash, and neither can perform any DDL or, indeed,
write any data at all. Thus, for either to read a table locked exclusively by
the other is safe enough. Problems would occur if the leader initiated
parallelism from a point in the code at which it had some backend-private
state that made table access from another process unsafe; for example, if it
did so after calling SetReindexProcessing and before calling
ResetReindexProcessing, catastrophe could ensue, because the worker won't
have that state. Similarly, problems could occur with certain kinds of
non-relation locks, such as relation extension locks. It's no safer for two
related processes to extend the same relation at the same time than for
unrelated processes to do the same. However, since parallel mode is strictly
read-only at present, neither this nor most of the similar cases can arise at
present. To allow parallel writes, we'll either need to (1) further enhance
the deadlock detector to handle those types of locks in a different way than
other types; or (2) have parallel workers use some other mutual exclusion
method for such cases; or (3) revise those cases so that they no longer use
heavyweight locking in the first place (which is not a crazy idea, given that
such lock acquisitions are not expected to deadlock and that heavyweight lock
acquisition is fairly slow anyway).

Group locking adds four new members to each PGPROC: lockGroupLeaderIdentifier,
lockGroupLeader, lockGroupMembers, and lockGroupLink. The first is simply a
safety mechanism. A newly started parallel worker has to try to join the
leader's lock group, but it has no guarantee that the group leader is still
alive by the time it gets started. We try to ensure that the parallel leader
dies after all workers in normal cases, but also that the system could survive
relatively intact if that somehow fails to happen. This is one of the
precautions against such a scenario: the leader relays its PGPROC and also its
PID to the worker, and the worker fails to join the lock group unless the
given PGPROC still has the same PID. We assume that PIDs are not recycled
quickly enough for this interlock to fail.

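The PID interlock itself amounts to a one-field comparison. A minimal sketch, using a hypothetical pared-down struct and helper name (the real check runs inside BecomeLockGroupMember, under the lock manager partition lock):

```c
#include <stdbool.h>

/* Hypothetical stand-in for the one PGPROC field this check needs. */
typedef struct FakePGPROC
{
    int lockGroupLeaderIdentifier;  /* the leader's PID, while it lives */
} FakePGPROC;

/*
 * The worker was handed the leader's PGPROC and PID at startup; it may join
 * the group only if that PGPROC still carries the same PID, i.e. the slot
 * has not been recycled for some unrelated process.
 */
static bool
can_join_lock_group(const FakePGPROC *leader, int expected_pid)
{
    return leader->lockGroupLeaderIdentifier == expected_pid;
}
```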
A PGPROC's lockGroupLeader is NULL for processes not involved in parallel
query. When a process wants to cooperate with parallel workers, it becomes a
lock group leader, which means setting this field to point to its own
PGPROC. When a parallel worker starts up, it points this field at the leader,
with the above-mentioned interlock. The lockGroupMembers field is only used in
the leader; it is a list of the member PGPROCs of the lock group (the leader
and all workers). The lockGroupLink field is the list link for this list.
All four of these fields are considered to be protected by a lock manager
partition lock. The partition lock that protects these fields within a given
lock group is chosen by taking the leader's pgprocno modulo the number of lock
manager partitions. This unusual arrangement has a major advantage: the
deadlock detector can count on the fact that no lockGroupLeader field can
change while the deadlock detector is running, because it knows that it holds
all the lock manager locks. Also, holding this single lock allows safe
manipulation of the lockGroupMembers list for the lock group.
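The mapping from leader to partition lock can be sketched as below. The helper name is made up (the real code goes through the LockHashPartitionLockByProc macro), and NUM_LOCK_PARTITIONS is the stock build value:

```c
/* Stock value: 1 << LOG2_NUM_LOCK_PARTITIONS, i.e. 16 partitions. */
#define NUM_LOCK_PARTITIONS 16

/* Hypothetical helper: which partition lock guards a given lock group. */
static int
lock_group_partition(int leader_pgprocno)
{
    /* Every member hashes the leader's pgprocno, so all members agree
     * on the single partition lock protecting the group's fields. */
    return leader_pgprocno % NUM_LOCK_PARTITIONS;
}
```

Because workers and leader all feed the same pgprocno into this function, there is one well-defined lock per group, and the deadlock detector, which holds all partition locks, automatically holds it.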

User Locks (Advisory Locks)
---------------------------
@@ -33,12 +33,21 @@
#include "utils/memutils.h"
/*
 * One edge in the waits-for graph.
 *
 * waiter and blocker may or may not be members of a lock group, but if either
 * is, it will be the leader rather than any other member of the lock group.
 * The group leaders act as representatives of the whole group even though
 * those particular processes need not be waiting at all.  There will be at
 * least one member of the waiter's lock group on the wait queue for the given
 * lock, maybe more.
 */
typedef struct
{
    PGPROC     *waiter;     /* the leader of the waiting lock group */
    PGPROC     *blocker;    /* the leader of the group it is waiting for */
    LOCK       *lock;       /* the lock being waited for */
    int         pred;       /* workspace for TopoSort */
    int         link;       /* workspace for TopoSort */
} EDGE;
@@ -1006,8 +1015,8 @@ TopoSort(LOCK *lock,
return false;
/*
 * Output everything in the lock group.  There's no point in
 * outputting an ordering where members of the same lock group are not
 * consecutive on the wait queue: if some other waiter is between two
 * requests that belong to the same group, then either it conflicts
 * with both of them and is certainly not a solution; or it conflicts
@@ -1004,7 +1004,7 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable)
int         i;
/*
 * If group locking is in use, locks held by members of my locking group
 * need to be included in myHeldLocks.
 */
if (leader != NULL)
@@ -1050,7 +1050,8 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable)
{
    /*
     * If we're part of the same locking group as this waiter, its
     * locks neither conflict with ours nor contribute to
     * aheadRequests.
     */
    if (leader != NULL && leader == proc->lockGroupLeader)
    {
@@ -1797,9 +1798,17 @@ BecomeLockGroupMember(PGPROC *leader, int pid)
/* PID must be valid. */
Assert(pid != 0);
/*
 * Get lock protecting the group fields.  Note LockHashPartitionLockByProc
 * accesses leader->pgprocno in a PGPROC that might be free.  This is safe
 * because all PGPROCs' pgprocno fields are set during shared memory
 * initialization and never change thereafter; so we will acquire the
 * correct lock even if the leader PGPROC is in process of being recycled.
 */
leader_lwlock = LockHashPartitionLockByProc(leader);
LWLockAcquire(leader_lwlock, LW_EXCLUSIVE);
/* Try to join the group */
if (leader->lockGroupLeaderIdentifier == pid)
{
    ok = true;
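Putting the diff's pieces together, the join protocol can be sketched with simplified stand-in types. Field names follow the source; the struct, the member counter, and the function body here are illustrative, with the real locking reduced to comments:

```c
#include <stdbool.h>
#include <stddef.h>

/* Pared-down stand-in for PGPROC's group-locking fields. */
typedef struct MiniPGPROC
{
    int lockGroupLeaderIdentifier;      /* leader's PID, if a live leader */
    struct MiniPGPROC *lockGroupLeader;
    int nGroupMembers;                  /* stand-in for the members dlist */
} MiniPGPROC;

/*
 * Sketch of BecomeLockGroupMember: under the leader's partition lock,
 * verify the PID interlock, then link into the group.
 */
static bool
become_lock_group_member(MiniPGPROC *me, MiniPGPROC *leader, int pid)
{
    bool ok = false;

    /* LWLockAcquire(LockHashPartitionLockByProc(leader), LW_EXCLUSIVE); */
    if (leader->lockGroupLeaderIdentifier == pid)
    {
        ok = true;
        me->lockGroupLeader = leader;
        leader->nGroupMembers++;        /* real code: dlist_push_tail(...) */
    }
    /* LWLockRelease(leader_lwlock); */
    return ok;
}
```

The caller (the worker) checks the return value and errors out if the join failed, since a dead leader means the parallel operation cannot proceed.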
@@ -478,10 +478,11 @@ typedef enum
 * by one of the lock hash partition locks.  Since the deadlock detector
 * acquires all such locks anyway, this makes it safe for it to access these
 * fields without doing anything extra.  To avoid contention as much as
 * possible, we map different PGPROCs to different partition locks.  The lock
 * used for a given lock group is determined by the group leader's pgprocno.
 */
#define LockHashPartitionLockByProc(leader_pgproc) \
    LockHashPartitionLock((leader_pgproc)->pgprocno)
/*
 * function prototypes
@@ -152,7 +152,7 @@ struct PGPROC
 */
TransactionId procArrayGroupMemberXid;

/* Per-backend LWLock.  Protects fields below (but not group fields). */
LWLock      backendLock;

/* Lock manager data, recording fast-path locks taken by this backend. */
@@ -163,11 +163,11 @@ struct PGPROC
 * lock */

/*
 * Support for lock groups.  Use LockHashPartitionLockByProc on the group
 * leader to get the LWLock protecting these fields.
 */
int         lockGroupLeaderIdentifier;  /* MyProcPid, if I'm a leader */
PGPROC     *lockGroupLeader;    /* lock group leader, if I'm a member */
dlist_head  lockGroupMembers;   /* list of members, if I'm a leader */
dlist_node  lockGroupLink;      /* my member link, if I'm a member */
};