Delete empty pages in each pass during GIST VACUUM.

Earlier, we use to postpone deleting empty pages till the second stage of vacuum to amortize the cost of scanning internal pages. However, that can sometimes (say vacuum is canceled or errored between first and second stage) delay the pages to be recycled. Another thing is that to facilitate deleting empty pages in the second stage, we need to share the information about internal and empty pages between different stages of vacuum. It will be quite tricky to share this information via DSM which is required for the upcoming parallel vacuum patch. Also, it will bring the logic to reclaim deleted pages closer to nbtree where we delete empty pages in each pass. Overall, the advantages of deleting empty pages in each pass outweigh the advantages of postponing the same. Author: Dilip Kumar, with changes by Amit Kapila Reviewed-by: Sawada Masahiko and Amit Kapila Discussion: https://postgr.es/m/CAA4eK1LGr+MN0xHZpJ2dfS8QNQ1a_aROKowZB+MPNep8FVtwAA@mail.gmail.com

Delete empty pages in each pass during GIST VACUUM.
Earlier, we use to postpone deleting empty pages till the second stage of vacuum to amortize the cost of scanning internal pages. However, that can sometimes (say vacuum is canceled or errored between first and second stage) delay the pages to be recycled. Another thing is that to facilitate deleting empty pages in the second stage, we need to share the information about internal and empty pages between different stages of vacuum. It will be quite tricky to share this information via DSM which is required for the upcoming parallel vacuum patch. Also, it will bring the logic to reclaim deleted pages closer to nbtree where we delete empty pages in each pass. Overall, the advantages of deleting empty pages in each pass outweigh the advantages of postponing the same. Author: Dilip Kumar, with changes by Amit Kapila Reviewed-by: Sawada Masahiko and Amit Kapila Discussion: https://postgr.es/m/CAA4eK1LGr+MN0xHZpJ2dfS8QNQ1a_aROKowZB+MPNep8FVtwAA@mail.gmail.com
4e514c61 · Amit Kapila · eae056c1 · 4e514c61 · 4e514c61
Commit 4e514c61 authored Jan 13, 2020 by Amit Kapila
Expand all Hide whitespace changes
Inline Side-by-side

Showing with 78 additions and 105 deletions

src/backend/access/gist/README src/backend/access/gist/README +11 -12

src/backend/access/gist/gistvacuum.c src/backend/access/gist/gistvacuum.c +67 -93

No files found.
--- a/src/backend/access/gist/README
+++ b/src/backend/access/gist/README
@@ -429,18 +429,17 @@ splits during searches, we don't need a "vacuum cycle ID" concept for that
 like B-tree does.

 While we scan all the pages, we also make note of any completely empty leaf
-pages. We will try to unlink them from the tree in the second stage. We also
-record the block numbers of all internal pages; they are needed in the second
-stage, to locate parents of the empty pages.
-
-In the second stage, we try to unlink any empty leaf pages from the tree, so
-that their space can be reused. In order to delete an empty page, its
-downlink must be removed from the parent. We scan all the internal pages,
-whose block numbers we memorized in the first stage, and look for downlinks
-to pages that we have memorized as being empty. Whenever we find one, we
-acquire a lock on the parent and child page, re-check that the child page is
-still empty. Then, we remove the downlink and mark the child as deleted, and
-release the locks.
+pages. We will try to unlink them from the tree after the scan. We also record
+the block numbers of all internal pages; they are needed to locate parents of
+the empty pages while unlinking them.
+
+We try to unlink any empty leaf pages from the tree, so that their space can
+be reused. In order to delete an empty page, its downlink must be removed from
+the parent. We scan all the internal pages, whose block numbers we memorized
+in the first stage, and look for downlinks to pages that we have memorized as
+being empty. Whenever we find one, we acquire a lock on the parent and child
+page, re-check that the child page is still empty. Then, we remove the
+downlink and mark the child as deleted, and release the locks.

 The insertion algorithm would get confused, if an internal page was completely
 empty. So we never delete the last child of an internal page, even if it's

--- a/src/backend/access/gist/gistvacuum.c
+++ b/src/backend/access/gist/gistvacuum.c