Fix incorrect row estimates used for Memoize costing

In order to estimate the cache hit ratio of a Memoize node, one of the inputs we require is the estimated number of times the Memoize node will be rescanned. The higher this number, the larger the cache hit ratio is likely to become. Unfortunately, the value being passed as the number of "calls" to the Memoize was incorrectly using the Nested Loop's outer_path->parent->rows instead of outer_path->rows. This failed to account for the fact that the outer_path might be parameterized by some upper-level Nested Loop. For a parameterized path, outer_path->rows reflects the extra join conditions applied to each scan and can be much smaller than the relation-level estimate in outer_path->parent->rows.

This problem could lead to Memoize plans appearing more favorable than they might actually be. It could also lead to extended executor startup times when work_mem was large: the planner would set an overly large MemoizePath->est_entries, so the Memoize hash table was initially made much larger than required.

Fix this simply by passing outer_path->rows rather than outer_path->parent->rows. Also, adjust the expected regression test output for a plan change.

Reported-by: Pavel Stehule
Author: David Rowley
Discussion: https://postgr.es/m/CAFj8pRAMp%3DQsMi6sPQJ4W3hczoFJRvyXHJV3AZAZaMyTVM312Q%40mail.gmail.com
Backpatch-through: 14, where Memoize was introduced

Commit 23c2b76a authored May 16, 2022 by David Rowley

Showing with 11 additions and 17 deletions

src/backend/optimizer/path/joinpath.c +1 -1

src/test/regress/expected/join.out +10 -16

--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -595,7 +595,7 @@ get_memoize_path(PlannerInfo *root, RelOptInfo *innerrel,
 											hash_operators,
 											extra->inner_unique,
 											binary_mode,
-											outer_path->parent->rows);
+											outer_path->rows);
 	}
 	return NULL;

--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -3673,8 +3673,8 @@ select * from tenk1 t1 left join
  (tenk1 t2 join tenk1 t3 on t2.thousand = t3.unique2)
  on t1.hundred = t2.hundred and t1.ten = t3.ten
 where t1.unique1 = 1;
-                          QUERY PLAN                          
--------------------------------------------------------------
+                       QUERY PLAN                       
+--------------------------------------------------------
 Nested Loop Left Join
   ->  Index Scan using tenk1_unique1 on tenk1 t1
         Index Cond: (unique1 = 1)
@@ -3684,20 +3684,17 @@ where t1.unique1 = 1;
               Recheck Cond: (t1.hundred = hundred)
               ->  Bitmap Index Scan on tenk1_hundred
                     Index Cond: (hundred = t1.hundred)
-         ->  Memoize
-               Cache Key: t2.thousand
-               Cache Mode: logical
-               ->  Index Scan using tenk1_unique2 on tenk1 t3
-                     Index Cond: (unique2 = t2.thousand)
-(14 rows)
+         ->  Index Scan using tenk1_unique2 on tenk1 t3
+               Index Cond: (unique2 = t2.thousand)
+(11 rows)
 
 explain (costs off)
 select * from tenk1 t1 left join
  (tenk1 t2 join tenk1 t3 on t2.thousand = t3.unique2)
  on t1.hundred = t2.hundred and t1.ten + t2.ten = t3.ten
 where t1.unique1 = 1;
-                          QUERY PLAN                          
--------------------------------------------------------------
+                       QUERY PLAN                       
+--------------------------------------------------------
 Nested Loop Left Join
   ->  Index Scan using tenk1_unique1 on tenk1 t1
         Index Cond: (unique1 = 1)
@@ -3707,12 +3704,9 @@ where t1.unique1 = 1;
               Recheck Cond: (t1.hundred = hundred)
               ->  Bitmap Index Scan on tenk1_hundred
                     Index Cond: (hundred = t1.hundred)
-         ->  Memoize
-               Cache Key: t2.thousand
-               Cache Mode: logical
-               ->  Index Scan using tenk1_unique2 on tenk1 t3
-                     Index Cond: (unique2 = t2.thousand)
-(14 rows)
+         ->  Index Scan using tenk1_unique2 on tenk1 t3
+               Index Cond: (unique2 = t2.thousand)
+(11 rows)
 
 explain (costs off)
 select count(*) from