Commit 4cad2534 authored by Tomas Vondra

Use CP_SMALL_TLIST for hash aggregate

Commit 1f39bce0 added disk-based hash aggregation, which may spill
incoming tuples to disk. However, it did not request projection to make
the tuples as narrow as possible, so it may spill much more data than
necessary (increasing I/O, evicting other data from the page cache,
etc.).
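
To make the cost concrete, here is a rough back-of-the-envelope sketch of how tuple width drives spill volume. The helper spill_bytes is hypothetical (it is not a PostgreSQL function), and it deliberately ignores per-tuple headers, alignment, and repeated spill passes, so treat it as an illustration rather than the executor's actual accounting.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of PostgreSQL): rough number of bytes
 * written when ntuples tuples of width_bytes each are spilled once.
 * Per-tuple headers, alignment, and repeated spill passes are ignored.
 */
static uint64_t
spill_bytes(uint64_t ntuples, uint32_t width_bytes)
{
	/* Guard against overflow in the multiplication below. */
	assert(width_bytes == 0 || ntuples <= UINT64_MAX / width_bytes);

	return ntuples * (uint64_t) width_bytes;
}
```

With purely illustrative numbers, spilling 1,000,000 tuples that are 200 bytes wide writes about 200,000,000 bytes, while projecting them down to 16 bytes first writes about 16,000,000 bytes.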

This adds CP_SMALL_TLIST in places that may use hash aggregation - we do
that only for AGG_HASHED. It's unnecessary for AGG_SORTED, because that
either uses explicit Sort (which already does projection) or pre-sorted
input (which does not need spilling to disk).
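
The flag selection described above can be sketched in isolation as follows. This is a minimal stand-alone illustration, not the planner code itself: the helper agg_child_flags is hypothetical, and the flag values and enum members are assumed to mirror the definitions in the PostgreSQL sources (only the names CP_LABEL_TLIST, CP_SMALL_TLIST, AGG_HASHED and AGG_SORTED appear in this commit).

```c
#include <assert.h>

/* Assumed to mirror the planner's flag bits; the values are illustrative. */
#define CP_SMALL_TLIST	0x0002	/* prefer a narrower tlist */
#define CP_LABEL_TLIST	0x0004	/* tlist must carry sort/group labels */

/* Assumed to mirror the executor's aggregation strategies. */
typedef enum AggStrategy
{
	AGG_PLAIN,
	AGG_SORTED,
	AGG_HASHED,
	AGG_MIXED
} AggStrategy;

/*
 * Hypothetical helper: choose the flags passed down to the child plan.
 * Grouping columns must always be labeled; a small tlist is requested
 * only for hash aggregation, because that is the strategy that may
 * spill its input tuples to disk.
 */
static int
agg_child_flags(AggStrategy strategy)
{
	int			flags = CP_LABEL_TLIST;

	/* Sanity check on the enum range. */
	assert(strategy >= AGG_PLAIN && strategy <= AGG_MIXED);

	/* ensure small tlist for hash aggregate */
	if (strategy == AGG_HASHED)
		flags |= CP_SMALL_TLIST;

	return flags;
}
```

Under this check, any strategy other than AGG_HASHED receives only CP_LABEL_TLIST, which matches the condition used in the diff.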

Author: Tomas Vondra
Reviewed-by: Jeff Davis
Discussion: https://postgr.es/m/20200519151202.u2p2gpiawoaznsv2%40development
parent 9b60c4b9
@@ -2715,7 +2715,7 @@ select sum(c1) from ft1 group by c2 having avg(c1 * (random() <= 1)::int) > 100
 Group Key: ft1.c2
 Filter: (avg((ft1.c1 * ((random() <= '1'::double precision))::integer)) > '100'::numeric)
 -> Foreign Scan on public.ft1
-Output: c1, c2
+Output: c2, c1
 Remote SQL: SELECT "C 1", c2 FROM "S 1"."T 1"
 (10 rows)
@@ -2964,7 +2964,7 @@ select sum(c1) filter (where (c1 / c1) * random() <= 1) from ft1 group by c2 ord
 Output: sum(c1) FILTER (WHERE ((((c1 / c1))::double precision * random()) <= '1'::double precision)), c2
 Group Key: ft1.c2
 -> Foreign Scan on public.ft1
-Output: c1, c2
+Output: c2, c1
 Remote SQL: SELECT "C 1", c2 FROM "S 1"."T 1"
 (9 rows)
......
@@ -2113,12 +2113,22 @@ create_agg_plan(PlannerInfo *root, AggPath *best_path)
 	Plan *subplan;
 	List *tlist;
 	List *quals;
+	int flags;
 	/*
 	 * Agg can project, so no need to be terribly picky about child tlist, but
-	 * we do need grouping columns to be available
+	 * we do need grouping columns to be available. We are a bit more careful
+	 * with hash aggregate, where we explicitly request small tlist to minimize
+	 * I/O needed for spilling (we can't be sure spilling won't be necessary,
+	 * so we just do it every time).
 	 */
-	subplan = create_plan_recurse(root, best_path->subpath, CP_LABEL_TLIST);
+	flags = CP_LABEL_TLIST;
+	/* ensure small tlist for hash aggregate */
+	if (best_path->aggstrategy == AGG_HASHED)
+		flags |= CP_SMALL_TLIST;
+	subplan = create_plan_recurse(root, best_path->subpath, flags);
 	tlist = build_path_tlist(root, &best_path->path);
@@ -2200,6 +2210,7 @@ create_groupingsets_plan(PlannerInfo *root, GroupingSetsPath *best_path)
 	int maxref;
 	List *chain;
 	ListCell *lc;
+	int flags;
 	/* Shouldn't get here without grouping sets */
 	Assert(root->parse->groupingSets);
@@ -2207,9 +2218,18 @@ create_groupingsets_plan(PlannerInfo *root, GroupingSetsPath *best_path)
 	/*
 	 * Agg can project, so no need to be terribly picky about child tlist, but
-	 * we do need grouping columns to be available
+	 * we do need grouping columns to be available. We are a bit more careful
+	 * with hash aggregate, where we explicitly request small tlist to minimize
+	 * I/O needed for spilling (we can't be sure spilling won't be necessary,
+	 * so we just do it every time).
 	 */
-	subplan = create_plan_recurse(root, best_path->subpath, CP_LABEL_TLIST);
+	flags = CP_LABEL_TLIST;
+	/* ensure small tlist for hash aggregate */
+	if (best_path->aggstrategy == AGG_HASHED)
+		flags |= CP_SMALL_TLIST;
+	subplan = create_plan_recurse(root, best_path->subpath, flags);
 	/*
 	 * Compute the mapping from tleSortGroupRef to column index in the child's
......