• Andres Freund's avatar
    Support GROUPING SETS, CUBE and ROLLUP. · f3d31185
    Andres Freund authored
    This SQL standard functionality allows to aggregate data by different
    GROUP BY clauses at once. Each grouping set returns rows with columns
    grouped by in other sets set to NULL.
    
    This could previously be achieved by doing each grouping as a separate
    query, conjoined by UNION ALLs. Besides being considerably more concise,
    grouping sets will in many cases be faster, requiring only one scan over
    the underlying data.
    
    The current implementation of grouping sets only supports using sorting
    for input. Individual sets that share a sort order are computed in one
    pass. If there are sets that don't share a sort order, additional sort &
    aggregation steps are performed. These additional passes are sourced by
    the previous sort step; thus avoiding repeated scans of the source data.
    
    The code is structured in a way that adding support for purely using
    hash aggregation or a mix of hashing and sorting is possible. Sorting
    was chosen to be supported first, as it is the most generic method of
    implementation.
    
    Instead of, as in an earlier versions of the patch, representing the
    chain of sort and aggregation steps as full blown planner and executor
    nodes, all but the first sort are performed inside the aggregation node
    itself. This avoids the need to do some unusual gymnastics to handle
    having to return aggregated and non-aggregated tuples from underlying
    nodes, as well as having to shut down underlying nodes early to limit
    memory usage.  The optimizer still builds Sort/Agg node to describe each
    phase, but they're not part of the plan tree, but instead additional
    data for the aggregation node. They're a convenient and preexisting way
    to describe aggregation and sorting.  The first (and possibly only) sort
    step is still performed as a separate execution step. That retains
    similarity with existing group by plans, makes rescans fairly simple,
    avoids very deep plans (leading to slow explains) and easily allows to
    avoid the sorting step if the underlying data is sorted by other means.
    
    A somewhat ugly side of this patch is having to deal with a grammar
    ambiguity between the new CUBE keyword and the cube extension/functions
    named cube (and rollup). To avoid breaking existing deployments of the
    cube extension it has not been renamed, neither has cube been made a
    reserved keyword. Instead precedence hacking is used to make GROUP BY
    cube(..) refer to the CUBE grouping sets feature, and not the function
    cube(). To actually group by a function cube(), unlikely as that might
    be, the function name has to be quoted.
    
    Needs a catversion bump because stored rules may change.
    
    Author: Andrew Gierth and Atri Sharma, with contributions from Andres Freund
    Reviewed-By: Andres Freund, Noah Misch, Tom Lane, Svenne Krap, Tomas
        Vondra, Erik Rijkers, Marti Raudsepp, Pavel Stehule
    Discussion: CAOeZVidmVRe2jU6aMk_5qkxnB7dfmPROzM7Ur8JPW5j8Y5X-Lw@mail.gmail.com
    f3d31185
parallel_schedule 3.67 KB