Commit c1ef4e5c authored by Robert Haas

Make some more improvements to parallel query documentation.

Many places that mentioned only Gather should also mention Gather
Merge, or should be phrased in a more neutral way.  Be more clear
about the fact that max_parallel_workers_per_gather affects the number
of workers the planner may want to use.  Fix a typo.  Explain how
Gather Merge works.  Adjust wording around parallel scans to be a bit
more clear.  Adjust wording around parallel-restricted operations for
the fact that uncorrelated subplans are no longer restricted.

Patch by me, reviewed by Erik Rijkers

Discussion: http://postgr.es/m/CA+TgmoZsTjgVGn=ei5ht-1qGFKy_m1VgB3d8+Rg304hz91N5ww@mail.gmail.com
parent e6940107
@@ -2050,8 +2050,8 @@ include_dir 'conf.d'
 <listitem>
 <para>
 Sets the maximum number of workers that can be started by a single
-<literal>Gather</literal> node. Parallel workers are taken from the
-pool of processes established by
+<literal>Gather</literal> or <literal>Gather Merge</literal> node.
+Parallel workers are taken from the pool of processes established by
 <xref linkend="guc-max-worker-processes">, limited by
 <xref linkend="guc-max-parallel-workers">. Note that the requested
 number of workers may not actually be available at run time. If this
......
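The hunk above involves three settings. Roughly, max_parallel_workers_per_gather caps what the planner asks for, while max_worker_processes and max_parallel_workers cap what can actually be started at run time. The Python sketch below models these two caps separately. It is a simplified model, not PostgreSQL source. The function names and the "desired" worker count are hypothetical, and it assumes a free slot is simply a configured limit minus the workers already running.

```python
def planned_workers(desired: int, max_parallel_workers_per_gather: int) -> int:
    """Workers the planner will consider for one Gather or Gather Merge node.

    `desired` is a hypothetical stand-in for whatever count the cost model
    would otherwise prefer.
    """
    return min(desired, max_parallel_workers_per_gather)


def launched_workers(planned: int,
                     max_worker_processes: int,
                     max_parallel_workers: int,
                     busy_background_workers: int,
                     busy_parallel_workers: int) -> int:
    """Workers actually started at run time (simplified model).

    Assumes a slot must be free under both limits; busy parallel workers
    are also counted among the busy background workers.
    """
    free_background = max_worker_processes - busy_background_workers
    free_parallel = max_parallel_workers - busy_parallel_workers
    # Never more than planned, never fewer than zero.
    return max(0, min(planned, free_background, free_parallel))


if __name__ == "__main__":
    plan = planned_workers(desired=6, max_parallel_workers_per_gather=2)
    print(plan)  # 2
    # An idle server can supply every planned worker.
    print(launched_workers(plan, 8, 8, 0, 0))  # 2
    # A busy server may supply none, so the query runs with no workers.
    print(launched_workers(plan, 8, 8, 8, 8))  # 0
```

Modelling the two stages separately shows why a plan that was costed for two workers can end up running with none.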
@@ -28,7 +28,8 @@
 <para>
 When the optimizer determines that parallel query is the fastest execution
 strategy for a particular query, it will create a query plan which includes
-a <firstterm>Gather node</firstterm>. Here is a simple example:
+a <firstterm>Gather</firstterm> or <firstterm>Gather Merge</firstterm>
+node. Here is a simple example:
 <screen>
 EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
@@ -43,15 +44,16 @@ EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
 </para>
 <para>
-In all cases, the <literal>Gather</literal> node will have exactly one
+In all cases, the <literal>Gather</literal> or
+<literal>Gather Merge</literal> node will have exactly one
 child plan, which is the portion of the plan that will be executed in
-parallel. If the <literal>Gather</> node is at the very top of the plan
-tree, then the entire query will execute in parallel. If it is somewhere
-else in the plan tree, then only the portion of the plan below it will run
-in parallel. In the example above, the query accesses only one table, so
-there is only one plan node other than the <literal>Gather</> node itself;
-since that plan node is a child of the <literal>Gather</> node, it will
-run in parallel.
+parallel. If the <literal>Gather</> or <literal>Gather Merge</> node is
+at the very top of the plan tree, then the entire query will execute in
+parallel. If it is somewhere else in the plan tree, then only the portion
+of the plan below it will run in parallel. In the example above, the
+query accesses only one table, so there is only one plan node other than
+the <literal>Gather</> node itself; since that plan node is a child of the
+<literal>Gather</> node, it will run in parallel.
 </para>
 <para>
@@ -60,35 +62,47 @@ EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
 during query execution, the process which is implementing the user's
 session will request a number of <link linkend="bgworker">background
 worker processes</link> equal to the number
-of workers chosen by the planner. The total number of background
-workers that can exist at any one time is limited by both
+of workers chosen by the planner. The number of background workers that
+the planner will consider using is limited to at most
+<xref linkend="guc-max-parallel-workers-per-gather">. The total number
+of background workers that can exist at any one time is limited by both
 <xref linkend="guc-max-worker-processes"> and
-<xref linkend="guc-max-parallel-workers">, so it is possible for a
+<xref linkend="guc-max-parallel-workers">. Therefore, it is possible for a
 parallel query to run with fewer workers than planned, or even with
 no workers at all. The optimal plan may depend on the number of workers
 that are available, so this can result in poor query performance. If this
-occurrence is frequent, considering increasing
+occurrence is frequent, consider increasing
 <varname>max_worker_processes</> and <varname>max_parallel_workers</>
 so that more workers can be run simultaneously or alternatively reducing
-<xref linkend="guc-max-parallel-workers-per-gather"> so that the planner
+<varname>max_parallel_workers_per_gather</varname> so that the planner
 requests fewer workers.
 </para>
 <para>
 Every background worker process which is successfully started for a given
-parallel query will execute the portion of the plan below
-the <literal>Gather</> node. The leader will also execute that portion
-of the plan, but it has an additional responsibility: it must also read
-all of the tuples generated by the workers. When the parallel portion of
-the plan generates only a small number of tuples, the leader will often
-behave very much like an additional worker, speeding up query execution.
-Conversely, when the parallel portion of the plan generates a large number
-of tuples, the leader may be almost entirely occupied with reading the
-tuples generated by the workers and performing any further processing
-steps which are required by plan nodes above the level of the
-<literal>Gather</literal> node. In such cases, the leader will do very
-little of the work of executing the parallel portion of the plan.
+parallel query will execute the parallel portion of the plan. The leader
+will also execute that portion of the plan, but it has an additional
+responsibility: it must also read all of the tuples generated by the
+workers. When the parallel portion of the plan generates only a small
+number of tuples, the leader will often behave very much like an additional
+worker, speeding up query execution. Conversely, when the parallel portion
+of the plan generates a large number of tuples, the leader may be almost
+entirely occupied with reading the tuples generated by the workers and
+performing any further processing steps which are required by plan nodes
+above the level of the <literal>Gather</literal> node or
+<literal>Gather Merge</literal> node. In such cases, the leader will
+do very little of the work of executing the parallel portion of the plan.
 </para>
+<para>
+When the node at the top of the parallel portion of the plan is
+<literal>Gather Merge</> rather than <literal>Gather</>, it indicates that
+each process executing the parallel portion of the plan is producing
+tuples in sorted order, and that the leader is performing an
+order-preserving merge. In contrast, <literal>Gather</> reads tuples
+from the workers in whatever order is convenient, destroying any sort
+order that may have existed.
+</para>
 </sect1>
 <sect1 id="when-can-parallel-query-be-used">
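The Python sketch below illustrates the difference between Gather and Gather Merge described in this hunk. It is an illustration only, built on three modelling choices:

- Each worker's output is modelled as an already-sorted list.
- Gather's "whatever order is convenient" is modelled as simple round-robin reading. Real arrival order depends on timing.
- Gather Merge is modelled as a heap-based k-way merge using the standard library's `heapq.merge`, which is one common way to implement an order-preserving merge.

```python
import heapq
from typing import List


def gather(worker_outputs: List[List[int]]) -> List[int]:
    """Gather: interleave tuples from workers in arrival order.

    Round-robin stands in for "whatever order is convenient"; every tuple
    is kept, but no overall sort order is preserved.
    """
    result: List[int] = []
    streams = [iter(output) for output in worker_outputs]
    while streams:
        still_active = []
        for stream in streams:
            try:
                result.append(next(stream))
                still_active.append(stream)
            except StopIteration:
                pass  # this worker has finished
        streams = still_active
    return result


def gather_merge(worker_outputs: List[List[int]]) -> List[int]:
    """Gather Merge: order-preserving merge of per-worker sorted streams."""
    return list(heapq.merge(*worker_outputs))


if __name__ == "__main__":
    workers = [[1, 2, 9], [3, 4, 5]]  # each list is sorted within one worker
    print(gather(workers))        # [1, 3, 2, 4, 9, 5]
    print(gather_merge(workers))  # [1, 2, 3, 4, 5, 9]
```

Both functions return the same multiset of tuples. Only `gather_merge` keeps the sort order that each worker produced.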
@@ -221,9 +235,9 @@ EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
 send such a message, this can only occur when using a client that
 does not rely on libpq. If this is a frequent
 occurrence, it may be a good idea to set
-<xref linkend="guc-max-parallel-workers-per-gather"> in sessions
-where it is likely, so as to avoid generating query plans that may
-be suboptimal when run serially.
+<xref linkend="guc-max-parallel-workers-per-gather"> to zero in
+sessions where it is likely, so as to avoid generating query plans
+that may be suboptimal when run serially.
 </para>
 </listitem>
@@ -262,6 +276,8 @@ EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
 so that each process which executes the plan will generate only a
 subset of the output rows in such a way that each required output row
 is guaranteed to be generated by exactly one of the cooperating processes.
+Generally, this means that the scan on the driving table of the query
+must be a parallel-aware scan.
 </para>
 <sect2 id="parallel-scans">
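The hunk above requires that each output row be generated by exactly one cooperating process. One simple way to get that guarantee is for processes to claim table blocks from a shared counter, so that no block is ever handed out twice. That scheme is assumed here purely for illustration. The Python sketch below uses threads as stand-ins for worker processes, and the function name and block layout are hypothetical.

```python
import threading
from typing import Dict, List


def parallel_scan(table_blocks: List[List[int]], n_workers: int) -> Dict[int, List[int]]:
    """Split a scan across workers; each block is read by exactly one worker."""
    next_block = 0
    lock = threading.Lock()
    output: Dict[int, List[int]] = {w: [] for w in range(n_workers)}

    def worker(worker_id: int) -> None:
        nonlocal next_block
        while True:
            # Claim the next unread block atomically.
            with lock:
                if next_block >= len(table_blocks):
                    return
                block = next_block
                next_block += 1
            # Rows from the claimed block are emitted by this worker only.
            output[worker_id].extend(table_blocks[block])

    threads = [threading.Thread(target=worker, args=(w,)) for w in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return output


if __name__ == "__main__":
    blocks = [[1, 2], [3, 4], [5], [6, 7, 8]]
    per_worker = parallel_scan(blocks, n_workers=3)
    all_rows = sorted(r for rows in per_worker.values() for r in rows)
    print(all_rows)  # [1, 2, 3, 4, 5, 6, 7, 8]
```

Which worker receives which block varies from run to run. The union of all workers' rows is always the whole table, and no row appears twice.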
@@ -302,9 +318,8 @@ EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
 </listitem>
 </itemizedlist>
-Only the scan types listed above may be used for a scan on the driving
-table within a parallel plan. Other scan types, such as parallel scans of
-non-btree indexes, may be supported in the future.
+Other scan types, such as scans of non-btree indexes, may support
+parallel scans in the future.
 </para>
 </sect2>
@@ -343,10 +358,10 @@ EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
 the query performs an aggregation step, producing a partial result for
 each group of which that process is aware. This is reflected in the plan
 as a <literal>Partial Aggregate</> node. Second, the partial results are
-transferred to the leader via the <literal>Gather</> node. Finally, the
-leader re-aggregates the results across all workers in order to produce
-the final result. This is reflected in the plan as a
-<literal>Finalize Aggregate</> node.
+transferred to the leader via <literal>Gather</> or <literal>Gather
+Merge</>. Finally, the leader re-aggregates the results across all
+workers in order to produce the final result. This is reflected in the
+plan as a <literal>Finalize Aggregate</> node.
 </para>
 <para>
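The two-stage aggregation in this hunk can be illustrated with an average. Here each partial state must carry both a sum and a count, because averaging the per-worker averages gives the wrong answer when workers see different numbers of rows. The Python sketch below is a hypothetical model of the Partial Aggregate and Finalize Aggregate steps. It does not reproduce PostgreSQL's aggregate API.

```python
from typing import Dict, Iterable, List, Tuple

# Per-group transition state for avg(): (running sum, row count).
State = Dict[str, Tuple[float, int]]


def partial_aggregate(rows: Iterable[Tuple[str, float]]) -> State:
    """Partial Aggregate: summarize only the rows this process saw."""
    state: State = {}
    for group, value in rows:
        total, count = state.get(group, (0.0, 0))
        state[group] = (total + value, count + 1)
    return state


def finalize_aggregate(partials: List[State]) -> Dict[str, float]:
    """Finalize Aggregate: combine every process's partial states, then finish."""
    combined: State = {}
    for partial in partials:
        for group, (total, count) in partial.items():
            c_total, c_count = combined.get(group, (0.0, 0))
            combined[group] = (c_total + total, c_count + count)
    return {group: total / count for group, (total, count) in combined.items()}


if __name__ == "__main__":
    worker_1 = [("a", 1), ("a", 2), ("b", 10)]
    worker_2 = [("a", 3), ("b", 20), ("b", 30)]
    partials = [partial_aggregate(worker_1), partial_aggregate(worker_2)]
    print(finalize_aggregate(partials))  # {'a': 2.0, 'b': 20.0}
```

In this example, averaging group "a" per worker would give 1.5 and 3.0, whose mean is 2.25. Combining the sums and counts first gives the correct 2.0.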
@@ -416,8 +431,8 @@ EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
 operation is one which cannot be performed in a parallel worker, but which
 can be performed in the leader while parallel query is in use. Therefore,
 parallel restricted operations can never occur below a <literal>Gather</>
-node, but can occur elsewhere in a plan which contains a
-<literal>Gather</> node. A parallel unsafe operation is one which cannot
+or <literal>Gather Merge</> node, but can occur elsewhere in a plan which
+contains such a node. A parallel unsafe operation is one which cannot
 be performed while parallel query is in use, not even in the leader.
 When a query contains anything which is parallel unsafe, parallel query
 is completely disabled for that query.
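The placement rule in this hunk can be expressed as a check over a plan tree. Parallel restricted operations may appear in a plan that uses Gather or Gather Merge, but never below such a node. Anything parallel unsafe rules out such a plan entirely. The `PlanNode` class and the "safe", "restricted" and "unsafe" labels below are hypothetical, chosen for this Python sketch. It illustrates the rule only and does not describe how the planner implements it.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple

GATHER_NODES = {"Gather", "Gather Merge"}


@dataclass
class PlanNode:
    name: str
    safety: str = "safe"  # hypothetical labels: "safe", "restricted", "unsafe"
    children: List["PlanNode"] = field(default_factory=list)


def walk(node: PlanNode, below_gather: bool = False) -> Iterator[Tuple[PlanNode, bool]]:
    """Yield every node with a flag saying whether it sits below a Gather node."""
    yield node, below_gather
    child_below = below_gather or node.name in GATHER_NODES
    for child in node.children:
        yield from walk(child, child_below)


def is_allowed(root: PlanNode) -> bool:
    nodes = list(walk(root))
    if not any(node.name in GATHER_NODES for node, _ in nodes):
        return True  # no parallel portion, so nothing to check
    for node, below in nodes:
        if node.safety == "unsafe":
            return False  # unsafe operations disable parallel query entirely
        if node.safety == "restricted" and below:
            return False  # restricted operations must run in the leader
    return True


if __name__ == "__main__":
    scan = PlanNode("Parallel Seq Scan")
    above = PlanNode("Result", "restricted", [PlanNode("Gather", children=[scan])])
    below = PlanNode("Gather", children=[PlanNode("Result", "restricted", [scan])])
    print(is_allowed(above), is_allowed(below))  # True False
```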
@@ -449,7 +464,7 @@ EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
 <listitem>
 <para>
-Access to an <literal>InitPlan</> or <literal>SubPlan</>.
+Access to an <literal>InitPlan</> or correlated <literal>SubPlan</>.
 </para>
 </listitem>
 </itemizedlist>
@@ -514,8 +529,8 @@ EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
 parallel-restricted functions or aggregates involved in the query in
 order to obtain a superior plan. So, for example, if a <literal>WHERE</>
 clause applied to a particular table is parallel restricted, the query
-planner will not consider placing the scan of that table below a
-<literal>Gather</> node. In some cases, it would be
+planner will not consider performing a scan of that table in the parallel
+portion of a plan. In some cases, it would be
 possible (and perhaps even efficient) to include the scan of that table in
 the parallel portion of the query and defer the evaluation of the
 <literal>WHERE</> clause so that it happens above the <literal>Gather</>
......