Commit 378802e3 authored by Thomas Munro

Update the names of Parallel Hash Join phases.

Commit 3048898e dropped -ING from some wait event names that correspond
to barrier phases.  Update the phases' names to match.

While we're here making cosmetic changes, also rename "DONE" to "FREE".
That pairs better with "ALLOCATE", and describes the activity that
actually happens in that phase (as we do for the other phases) rather
than describing a state.  The distinction is clearer after bugfix commit
3b8981b6 split the phase into two.  As for the growth barriers, rename
their "ALLOCATE" phase to "REALLOCATE", which is probably a better
description of what happens then.  Also improve the comments about
the phases a bit.

Discussion: https://postgr.es/m/CA%2BhUKG%2BMDpwF2Eo2LAvzd%3DpOh81wUTsrwU1uAwR-v6OGBB6%2B7g%40mail.gmail.com
parent 3b8981b6
@@ -39,27 +39,30 @@
  *
  * One barrier called build_barrier is used to coordinate the hashing phases.
  * The phase is represented by an integer which begins at zero and increments
- * one by one, but in the code it is referred to by symbolic names as follows:
+ * one by one, but in the code it is referred to by symbolic names as follows.
+ * An asterisk indicates a phase that is performed by a single arbitrarily
+ * chosen process.
  *
- *   PHJ_BUILD_ELECTING              -- initial state
- *   PHJ_BUILD_ALLOCATING            -- one sets up the batches and table 0
- *   PHJ_BUILD_HASHING_INNER         -- all hash the inner rel
- *   PHJ_BUILD_HASHING_OUTER         -- (multi-batch only) all hash the outer
- *   PHJ_BUILD_RUNNING               -- building done, probing can begin
- *   PHJ_BUILD_DONE                  -- all work complete, one frees batches
+ *   PHJ_BUILD_ELECT                 -- initial state
+ *   PHJ_BUILD_ALLOCATE*             -- one sets up the batches and table 0
+ *   PHJ_BUILD_HASH_INNER            -- all hash the inner rel
+ *   PHJ_BUILD_HASH_OUTER            -- (multi-batch only) all hash the outer
+ *   PHJ_BUILD_RUN                   -- building done, probing can begin
+ *   PHJ_BUILD_FREE*                 -- all work complete, one frees batches
  *
- * While in the phase PHJ_BUILD_HASHING_INNER a separate pair of barriers may
+ * While in the phase PHJ_BUILD_HASH_INNER a separate pair of barriers may
  * be used repeatedly as required to coordinate expansions in the number of
  * batches or buckets. Their phases are as follows:
  *
- *   PHJ_GROW_BATCHES_ELECTING       -- initial state
- *   PHJ_GROW_BATCHES_ALLOCATING     -- one allocates new batches
- *   PHJ_GROW_BATCHES_REPARTITIONING -- all repartition
- *   PHJ_GROW_BATCHES_FINISHING      -- one cleans up, detects skew
+ *   PHJ_GROW_BATCHES_ELECT          -- initial state
+ *   PHJ_GROW_BATCHES_REALLOCATE*    -- one allocates new batches
+ *   PHJ_GROW_BATCHES_REPARTITION    -- all repartition
+ *   PHJ_GROW_BATCHES_DECIDE*        -- one detects skew and cleans up
+ *   PHJ_GROW_BATCHES_FINISH         -- finished one growth cycle
  *
- *   PHJ_GROW_BUCKETS_ELECTING       -- initial state
- *   PHJ_GROW_BUCKETS_ALLOCATING     -- one allocates new buckets
- *   PHJ_GROW_BUCKETS_REINSERTING    -- all insert tuples
+ *   PHJ_GROW_BUCKETS_ELECT          -- initial state
+ *   PHJ_GROW_BUCKETS_REALLOCATE*    -- one allocates new buckets
+ *   PHJ_GROW_BUCKETS_REINSERT       -- all insert tuples
  *
  * If the planner got the number of batches and buckets right, those won't be
  * necessary, but on the other hand we might finish up needing to expand the
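
To make the ELECT/ALLOCATE convention above concrete, here is a minimal standalone C sketch (an editor's illustration, not part of this patch or of PostgreSQL). A POSIX barrier hands PTHREAD_BARRIER_SERIAL_THREAD to exactly one waiter, playing the role of PostgreSQL's BarrierArriveAndWait() returning true to one arbitrarily chosen participant, which then performs the serial step alone while everyone else waits for the next phase. The names NWORKERS, worker() and shared_table_ready are invented for the example; build with cc -pthread.

#include <pthread.h>
#include <stdio.h>

#define NWORKERS 4

static pthread_barrier_t phase_barrier;
static int shared_table_ready = 0;  /* stands in for "batches and table 0" */

static void *
worker(void *arg)
{
    int         id = (int) (long) arg;

    /* ELECT phase: all arrive; the barrier picks exactly one winner. */
    if (pthread_barrier_wait(&phase_barrier) == PTHREAD_BARRIER_SERIAL_THREAD)
    {
        shared_table_ready = 1;     /* serial setup, done by one worker */
        printf("worker %d was elected to do the setup\n", id);
    }

    /* ALLOCATE phase: everyone waits here until the elected worker is done. */
    pthread_barrier_wait(&phase_barrier);

    printf("worker %d proceeds; table_ready=%d\n", id, shared_table_ready);
    return NULL;
}

int
main(void)
{
    pthread_t   threads[NWORKERS];

    pthread_barrier_init(&phase_barrier, NULL, NWORKERS);
    for (long i = 0; i < NWORKERS; i++)
        pthread_create(&threads[i], NULL, worker, (void *) i);
    for (int i = 0; i < NWORKERS; i++)
        pthread_join(threads[i], NULL);
    pthread_barrier_destroy(&phase_barrier);
    return 0;
}
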
@@ -67,27 +70,27 @@
  * within our memory budget and load factor target. For that reason it's a
  * separate pair of barriers using circular phases.
  *
- * The PHJ_BUILD_HASHING_OUTER phase is required only for multi-batch joins,
+ * The PHJ_BUILD_HASH_OUTER phase is required only for multi-batch joins,
  * because we need to divide the outer relation into batches up front in order
  * to be able to process batches entirely independently. In contrast, the
  * parallel-oblivious algorithm simply throws tuples 'forward' to 'later'
  * batches whenever it encounters them while scanning and probing, which it
  * can do because it processes batches in serial order.
  *
- * Once PHJ_BUILD_RUNNING is reached, backends then split up and process
+ * Once PHJ_BUILD_RUN is reached, backends then split up and process
  * different batches, or gang up and work together on probing batches if there
  * aren't enough to go around. For each batch there is a separate barrier
  * with the following phases:
  *
- *   PHJ_BATCH_ELECTING       -- initial state
- *   PHJ_BATCH_ALLOCATING     -- one allocates buckets
- *   PHJ_BATCH_LOADING        -- all load the hash table from disk
- *   PHJ_BATCH_PROBING        -- all probe
- *   PHJ_BATCH_DONE           -- end
+ *   PHJ_BATCH_ELECT          -- initial state
+ *   PHJ_BATCH_ALLOCATE*      -- one allocates buckets
+ *   PHJ_BATCH_LOAD           -- all load the hash table from disk
+ *   PHJ_BATCH_PROBE          -- all probe
+ *   PHJ_BATCH_FREE*          -- one frees memory
  *
  * Batch 0 is a special case, because it starts out in phase
- * PHJ_BATCH_PROBING; populating batch 0's hash table is done during
- * PHJ_BUILD_HASHING_INNER so we can skip loading.
+ * PHJ_BATCH_PROBE; populating batch 0's hash table is done during
+ * PHJ_BUILD_HASH_INNER so we can skip loading.
  *
  * Initially we try to plan for a single-batch hash join using the combined
  * hash_mem of all participants to create a large shared hash table. If that
@@ -99,8 +102,8 @@
  * finished. Practically, that means that we never emit a tuple while attached
  * to a barrier, unless the barrier has reached a phase that means that no
  * process will wait on it again. We emit tuples while attached to the build
- * barrier in phase PHJ_BUILD_RUNNING, and to a per-batch barrier in phase
- * PHJ_BATCH_PROBING. These are advanced to PHJ_BUILD_DONE and PHJ_BATCH_DONE
+ * barrier in phase PHJ_BUILD_RUN, and to a per-batch barrier in phase
+ * PHJ_BATCH_PROBE. These are advanced to PHJ_BUILD_FREE and PHJ_BATCH_FREE
  * respectively without waiting, using BarrierArriveAndDetach(). The last to
  * detach receives a different return value so that it knows that it's safe to
  * clean up. Any straggler process that attaches after that phase is reached
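
The rule in the paragraph above is the heart of the deadlock avoidance scheme, so here is a standalone sketch of the detach side (again an editor's illustration, not PostgreSQL code; MiniBarrier and mini_arrive_and_detach() are invented stand-ins for the real Barrier in src/backend/storage/ipc/barrier.c). Each participant leaves without waiting, and only the one that drops the attached count to zero learns, via the return value, that it is safe to free the shared state, which mirrors what BarrierArriveAndDetach() reports to the last to detach.

#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

#define NWORKERS 4

typedef struct MiniBarrier
{
    pthread_mutex_t mutex;
    int             attached;
} MiniBarrier;

static MiniBarrier barrier = {PTHREAD_MUTEX_INITIALIZER, NWORKERS};
static int *shared_state;       /* stands in for a shared hash table */

/* Leave without waiting; report whether we were the last one attached. */
static bool
mini_arrive_and_detach(MiniBarrier *b)
{
    bool        last;

    pthread_mutex_lock(&b->mutex);
    last = (--b->attached == 0);
    pthread_mutex_unlock(&b->mutex);
    return last;
}

static void *
worker(void *arg)
{
    (void) arg;

    /* ... probe and emit tuples; nobody will wait on this barrier again ... */

    if (mini_arrive_and_detach(&barrier))
    {
        free(shared_state);     /* the last to detach cleans up */
        puts("last to detach: freed the shared state");
    }
    return NULL;
}

int
main(void)
{
    pthread_t   threads[NWORKERS];

    shared_state = calloc(1024, sizeof(int));
    for (int i = 0; i < NWORKERS; i++)
        pthread_create(&threads[i], NULL, worker, NULL);
    for (int i = 0; i < NWORKERS; i++)
        pthread_join(threads[i], NULL);
    return 0;
}
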
@@ -322,10 +325,10 @@ ExecHashJoinImpl(PlanState *pstate, bool parallel)
             Barrier    *build_barrier;
 
             build_barrier = &parallel_state->build_barrier;
-            Assert(BarrierPhase(build_barrier) == PHJ_BUILD_HASHING_OUTER ||
-                   BarrierPhase(build_barrier) == PHJ_BUILD_RUNNING ||
-                   BarrierPhase(build_barrier) == PHJ_BUILD_DONE);
-            if (BarrierPhase(build_barrier) == PHJ_BUILD_HASHING_OUTER)
+            Assert(BarrierPhase(build_barrier) == PHJ_BUILD_HASH_OUTER ||
+                   BarrierPhase(build_barrier) == PHJ_BUILD_RUN ||
+                   BarrierPhase(build_barrier) == PHJ_BUILD_FREE);
+            if (BarrierPhase(build_barrier) == PHJ_BUILD_HASH_OUTER)
             {
                 /*
                  * If multi-batch, we need to hash the outer relation
@@ -336,7 +339,7 @@ ExecHashJoinImpl(PlanState *pstate, bool parallel)
                 BarrierArriveAndWait(build_barrier,
                                      WAIT_EVENT_HASH_BUILD_HASH_OUTER);
             }
-            else if (BarrierPhase(build_barrier) == PHJ_BUILD_DONE)
+            else if (BarrierPhase(build_barrier) == PHJ_BUILD_FREE)
             {
                 /*
                  * If we attached so late that the job is finished and
@@ -347,7 +350,7 @@ ExecHashJoinImpl(PlanState *pstate, bool parallel)
             }
 
             /* Each backend should now select a batch to work on. */
-            Assert(BarrierPhase(build_barrier) == PHJ_BUILD_RUNNING);
+            Assert(BarrierPhase(build_barrier) == PHJ_BUILD_RUN);
             hashtable->curbatch = -1;
             node->hj_JoinState = HJ_NEED_NEW_BATCH;
...@@ -1139,7 +1142,7 @@ ExecParallelHashJoinNewBatch(HashJoinState *hjstate) ...@@ -1139,7 +1142,7 @@ ExecParallelHashJoinNewBatch(HashJoinState *hjstate)
switch (BarrierAttach(batch_barrier)) switch (BarrierAttach(batch_barrier))
{ {
case PHJ_BATCH_ELECTING: case PHJ_BATCH_ELECT:
/* One backend allocates the hash table. */ /* One backend allocates the hash table. */
if (BarrierArriveAndWait(batch_barrier, if (BarrierArriveAndWait(batch_barrier,
@@ -1147,13 +1150,13 @@ ExecParallelHashJoinNewBatch(HashJoinState *hjstate)
                         ExecParallelHashTableAlloc(hashtable, batchno);
                     /* Fall through. */
 
-                case PHJ_BATCH_ALLOCATING:
+                case PHJ_BATCH_ALLOCATE:
                     /* Wait for allocation to complete. */
                     BarrierArriveAndWait(batch_barrier,
                                          WAIT_EVENT_HASH_BATCH_ALLOCATE);
                     /* Fall through. */
 
-                case PHJ_BATCH_LOADING:
+                case PHJ_BATCH_LOAD:
                     /* Start (or join in) loading tuples. */
                     ExecParallelHashTableSetCurrentBatch(hashtable, batchno);
                     inner_tuples = hashtable->batches[batchno].inner_tuples;
@@ -1173,7 +1176,7 @@ ExecParallelHashJoinNewBatch(HashJoinState *hjstate)
                                          WAIT_EVENT_HASH_BATCH_LOAD);
                     /* Fall through. */
 
-                case PHJ_BATCH_PROBING:
+                case PHJ_BATCH_PROBE:
 
                     /*
                      * This batch is ready to probe. Return control to
@@ -1183,13 +1186,13 @@ ExecParallelHashJoinNewBatch(HashJoinState *hjstate)
                      * this barrier again (or else a deadlock could occur).
                      * All attached participants must eventually call
                      * BarrierArriveAndDetach() so that the final phase
-                     * PHJ_BATCH_DONE can be reached.
+                     * PHJ_BATCH_FREE can be reached.
                      */
                     ExecParallelHashTableSetCurrentBatch(hashtable, batchno);
                     sts_begin_parallel_scan(hashtable->batches[batchno].outer_tuples);
                     return true;
 
-                case PHJ_BATCH_DONE:
+                case PHJ_BATCH_FREE:
 
                     /*
                      * Already done. Detach and go around again (if any
@@ -1516,7 +1519,7 @@ ExecHashJoinReInitializeDSM(HashJoinState *state, ParallelContext *cxt)
     /*
      * It would be possible to reuse the shared hash table in single-batch
      * cases by resetting and then fast-forwarding build_barrier to
-     * PHJ_BUILD_DONE and batch 0's batch_barrier to PHJ_BATCH_PROBING, but
+     * PHJ_BUILD_FREE and batch 0's batch_barrier to PHJ_BATCH_PROBE, but
      * currently shared hash tables are already freed by now (by the last
      * participant to detach from the batch). We could consider keeping it
      * around for single-batch joins. We'd also need to adjust
@@ -1535,7 +1538,7 @@ ExecHashJoinReInitializeDSM(HashJoinState *state, ParallelContext *cxt)
     /* Clear any shared batch files. */
     SharedFileSetDeleteAll(&pstate->fileset);
 
-    /* Reset build_barrier to PHJ_BUILD_ELECTING so we can go around again. */
+    /* Reset build_barrier to PHJ_BUILD_ELECT so we can go around again. */
     BarrierInit(&pstate->build_barrier, 0);
 }
@@ -4043,8 +4043,8 @@ pgstat_get_wait_ipc(WaitEventIPC w)
         case WAIT_EVENT_HASH_BUILD_HASH_OUTER:
             event_name = "HashBuildHashOuter";
             break;
-        case WAIT_EVENT_HASH_GROW_BATCHES_ALLOCATE:
-            event_name = "HashGrowBatchesAllocate";
+        case WAIT_EVENT_HASH_GROW_BATCHES_REALLOCATE:
+            event_name = "HashGrowBatchesReallocate";
             break;
         case WAIT_EVENT_HASH_GROW_BATCHES_DECIDE:
             event_name = "HashGrowBatchesDecide";
@@ -4058,8 +4058,8 @@ pgstat_get_wait_ipc(WaitEventIPC w)
         case WAIT_EVENT_HASH_GROW_BATCHES_REPARTITION:
             event_name = "HashGrowBatchesRepartition";
             break;
-        case WAIT_EVENT_HASH_GROW_BUCKETS_ALLOCATE:
-            event_name = "HashGrowBucketsAllocate";
+        case WAIT_EVENT_HASH_GROW_BUCKETS_REALLOCATE:
+            event_name = "HashGrowBucketsReallocate";
             break;
         case WAIT_EVENT_HASH_GROW_BUCKETS_ELECT:
             event_name = "HashGrowBucketsElect";
@@ -254,32 +254,32 @@ typedef struct ParallelHashJoinState
 } ParallelHashJoinState;
 
 /* The phases for building batches, used by build_barrier. */
-#define PHJ_BUILD_ELECTING              0
-#define PHJ_BUILD_ALLOCATING            1
-#define PHJ_BUILD_HASHING_INNER         2
-#define PHJ_BUILD_HASHING_OUTER         3
-#define PHJ_BUILD_RUNNING               4
-#define PHJ_BUILD_DONE                  5
+#define PHJ_BUILD_ELECT                 0
+#define PHJ_BUILD_ALLOCATE              1
+#define PHJ_BUILD_HASH_INNER            2
+#define PHJ_BUILD_HASH_OUTER            3
+#define PHJ_BUILD_RUN                   4
+#define PHJ_BUILD_FREE                  5
 
 /* The phases for probing each batch, used by batch_barrier. */
-#define PHJ_BATCH_ELECTING              0
-#define PHJ_BATCH_ALLOCATING            1
-#define PHJ_BATCH_LOADING               2
-#define PHJ_BATCH_PROBING               3
-#define PHJ_BATCH_DONE                  4
+#define PHJ_BATCH_ELECT                 0
+#define PHJ_BATCH_ALLOCATE              1
+#define PHJ_BATCH_LOAD                  2
+#define PHJ_BATCH_PROBE                 3
+#define PHJ_BATCH_FREE                  4
 
 /* The phases of batch growth while hashing, for grow_batches_barrier. */
-#define PHJ_GROW_BATCHES_ELECTING       0
-#define PHJ_GROW_BATCHES_ALLOCATING     1
-#define PHJ_GROW_BATCHES_REPARTITIONING 2
-#define PHJ_GROW_BATCHES_DECIDING       3
-#define PHJ_GROW_BATCHES_FINISHING      4
+#define PHJ_GROW_BATCHES_ELECT          0
+#define PHJ_GROW_BATCHES_REALLOCATE     1
+#define PHJ_GROW_BATCHES_REPARTITION    2
+#define PHJ_GROW_BATCHES_DECIDE         3
+#define PHJ_GROW_BATCHES_FINISH         4
 #define PHJ_GROW_BATCHES_PHASE(n)       ((n) % 5)   /* circular phases */
 
 /* The phases of bucket growth while hashing, for grow_buckets_barrier. */
-#define PHJ_GROW_BUCKETS_ELECTING       0
-#define PHJ_GROW_BUCKETS_ALLOCATING     1
-#define PHJ_GROW_BUCKETS_REINSERTING    2
+#define PHJ_GROW_BUCKETS_ELECT          0
+#define PHJ_GROW_BUCKETS_REALLOCATE     1
+#define PHJ_GROW_BUCKETS_REINSERT       2
 #define PHJ_GROW_BUCKETS_PHASE(n)       ((n) % 3)   /* circular phases */
 
 typedef struct HashJoinTableData
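
A small self-contained demonstration of why the two PHASE macros exist (an illustration only; the macro is copied from the hunk above, while the names[] table is invented for the example): a barrier's phase number only ever increases across the whole build, but batch and bucket growth can be triggered any number of times, so the raw phase is reduced modulo the cycle length to recover the symbolic growth phase.

#include <stdio.h>

#define PHJ_GROW_BATCHES_PHASE(n) ((n) % 5) /* circular phases */

int
main(void)
{
    static const char *const names[] = {
        "ELECT", "REALLOCATE", "REPARTITION", "DECIDE", "FINISH"
    };

    /* Two consecutive growth cycles: phases 0..9 map onto the cycle twice. */
    for (int phase = 0; phase < 10; phase++)
        printf("barrier phase %d -> PHJ_GROW_BATCHES_%s\n",
               phase, names[PHJ_GROW_BATCHES_PHASE(phase)]);
    return 0;
}
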
@@ -982,12 +982,12 @@ typedef enum
     WAIT_EVENT_HASH_BUILD_ELECT,
     WAIT_EVENT_HASH_BUILD_HASH_INNER,
     WAIT_EVENT_HASH_BUILD_HASH_OUTER,
-    WAIT_EVENT_HASH_GROW_BATCHES_ALLOCATE,
+    WAIT_EVENT_HASH_GROW_BATCHES_REALLOCATE,
     WAIT_EVENT_HASH_GROW_BATCHES_DECIDE,
     WAIT_EVENT_HASH_GROW_BATCHES_ELECT,
     WAIT_EVENT_HASH_GROW_BATCHES_FINISH,
     WAIT_EVENT_HASH_GROW_BATCHES_REPARTITION,
-    WAIT_EVENT_HASH_GROW_BUCKETS_ALLOCATE,
+    WAIT_EVENT_HASH_GROW_BUCKETS_REALLOCATE,
     WAIT_EVENT_HASH_GROW_BUCKETS_ELECT,
     WAIT_EVENT_HASH_GROW_BUCKETS_REINSERT,
     WAIT_EVENT_LOGICAL_SYNC_DATA,