• David Rowley's avatar
    Add Result Cache executor node (take 2) · 9eacee2e
    David Rowley authored
    Here we add a new executor node type named "Result Cache".  The planner
    can include this node type in the plan to have the executor cache the
    results from the inner side of parameterized nested loop joins.  This
    allows caching of tuples for sets of parameters so that in the event that
    the node sees the same parameter values again, it can just return the
    cached tuples instead of rescanning the inner side of the join all over
    again.  Internally, result cache uses a hash table in order to quickly
    find tuples that have been previously cached.
    
    For certain data sets, this can significantly improve the performance of
    joins.  The best cases for using this new node type are for join problems
    where a large portion of the tuples from the inner side of the join have
    no join partner on the outer side of the join.  In such cases, hash join
    would have to hash values that are never looked up, thus bloating the hash
    table and possibly causing it to multi-batch.  Merge joins would have to
    skip over all of the unmatched rows.  If we use a nested loop join with a
    result cache, then we only cache tuples that have at least one join
    partner on the outer side of the join.  The benefits of using a
    parameterized nested loop with a result cache increase when there are
    fewer distinct values being looked up and the number of lookups of each
    value is large.  Also, hash probes to lookup the cache can be much faster
    than the hash probe in a hash join as it's common that the result cache's
    hash table is much smaller than the hash join's due to result cache only
    caching useful tuples rather than all tuples from the inner side of the
    join.  This variation in hash probe performance is more significant when
    the hash join's hash table no longer fits into the CPU's L3 cache, but the
    result cache's hash table does.  The apparent "random" access of hash
    buckets with each hash probe can cause a poor L3 cache hit ratio for large
    hash tables.  Smaller hash tables generally perform better.
    
    The hash table used for the cache limits itself to not exceeding work_mem
    * hash_mem_multiplier in size.  We maintain a dlist of keys for this cache
    and when we're adding new tuples and realize we've exceeded the memory
    budget, we evict cache entries starting with the least recently used ones
    until we have enough memory to add the new tuples to the cache.
    
    For parameterized nested loop joins, we now consider using one of these
    result cache nodes in between the nested loop node and its inner node.  We
    determine when this might be useful based on cost, which is primarily
    driven off of what the expected cache hit ratio will be.  Estimating the
    cache hit ratio relies on having good distinct estimates on the nested
    loop's parameters.
    
    For now, the planner will only consider using a result cache for
    parameterized nested loop joins.  This works for both normal joins and
    also for LATERAL type joins to subqueries.  It is possible to use this new
    node for other uses in the future.  For example, to cache results from
    correlated subqueries.  However, that's not done here due to some
    difficulties obtaining a distinct estimation on the outer plan to
    calculate the estimated cache hit ratio.  Currently we plan the inner plan
    before planning the outer plan so there is no good way to know if a result
    cache would be useful or not since we can't estimate the number of times
    the subplan will be called until the outer plan is generated.
    
    The functionality being added here is newly introducing a dependency on
    the return value of estimate_num_groups() during the join search.
    Previously, during the join search, we only ever needed to perform
    selectivity estimations.  With this commit, we need to use
    estimate_num_groups() in order to estimate what the hit ratio on the
    result cache will be.   In simple terms, if we expect 10 distinct values
    and we expect 1000 outer rows, then we'll estimate the hit ratio to be
    99%.  Since cache hits are very cheap compared to scanning the underlying
    nodes on the inner side of the nested loop join, then this will
    significantly reduce the planner's cost for the join.   However, it's
    fairly easy to see here that things will go bad when estimate_num_groups()
    incorrectly returns a value that's significantly lower than the actual
    number of distinct values.  If this happens then that may cause us to make
    use of a nested loop join with a result cache instead of some other join
    type, such as a merge or hash join.  Our distinct estimations have been
    known to be a source of trouble in the past, so the extra reliance on them
    here could cause the planner to choose slower plans than it did previous
    to having this feature.  Distinct estimations are also fairly hard to
    estimate accurately when several tables have been joined already or when a
    WHERE clause filters out a set of values that are correlated to the
    expressions we're estimating the number of distinct value for.
    
    For now, the costing we perform during query planning for result caches
    does put quite a bit of faith in the distinct estimations being accurate.
    When these are accurate then we should generally see faster execution
    times for plans containing a result cache.  However, in the real world, we
    may find that we need to either change the costings to put less trust in
    the distinct estimations being accurate or perhaps even disable this
    feature by default.  There's always an element of risk when we teach the
    query planner to do new tricks that it decides to use that new trick at
    the wrong time and causes a regression.  Users may opt to get the old
    behavior by turning the feature off using the enable_resultcache GUC.
    Currently, this is enabled by default.  It remains to be seen if we'll
    maintain that setting for the release.
    
    Additionally, the name "Result Cache" is the best name I could think of
    for this new node at the time I started writing the patch.  Nobody seems
    to strongly dislike the name. A few people did suggest other names but no
    other name seemed to dominate in the brief discussion that there was about
    names. Let's allow the beta period to see if the current name pleases
    enough people.  If there's some consensus on a better name, then we can
    change it before the release.  Please see the 2nd discussion link below
    for the discussion on the "Result Cache" name.
    
    Author: David Rowley
    Reviewed-by: Andy Fan, Justin Pryzby, Zhihong Yu, Hou Zhijie
    Tested-By: Konstantin Knizhnik
    Discussion: https://postgr.es/m/CAApHDvrPcQyQdWERGYWx8J%2B2DLUNgXu%2BfOSbQ1UscxrunyXyrQ%40mail.gmail.com
    Discussion: https://postgr.es/m/CAApHDvq=yQXr5kqhRviT2RhNKwToaWr9JAN5t+5_PzhuRJ3wvg@mail.gmail.com
    9eacee2e
setrefs.c 93.4 KB