Commit 1383e2a1 authored by Tom Lane

Improve FSM management for BRIN indexes.

BRIN indexes like to propagate additions of free space into the upper pages
of their free space maps as soon as the new space is known, even when it's
just on one individual index page.  Previously this required calling
FreeSpaceMapVacuum, which is quite an expensive thing if the map is large.
Use the FreeSpaceMapVacuumRange function recently added by commit c79f6df7
to reduce the amount of work done for this purpose.
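
For illustration, the per-page pattern now looks roughly like this (a sketch
rather than the verbatim committed code; idxrel, blk, and freespace stand for
whatever the call site has at hand):

    /* publish this page's free space in the FSM ... */
    RecordPageWithFreeSpace(idxrel, blk, freespace);
    /* ... then propagate just the one-page range [blk, blk+1) upward,
     * rather than doing the old whole-map FreeSpaceMapVacuum(idxrel) */
    FreeSpaceMapVacuumRange(idxrel, blk, blk + 1);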

Fix a couple of places that neglected to do the upper-page vacuuming at all
after recording new free space.  If the policy is to be that BRIN should do
that, it should do it everywhere.

Do RecordPageWithFreeSpace unconditionally in brin_page_cleanup, and do
FreeSpaceMapVacuum unconditionally in brin_vacuum_scan.  Because of the
FSM's imprecise storage of free space, the old complications here seldom
bought anything; they just slowed things down.  This approach also
provides a predictable path for FSM corruption to be repaired.

Remove premature RecordPageWithFreeSpace call in brin_getinsertbuffer
where it's about to return an extended page to the caller.  The caller
should do that, instead, after it's inserted its new tuple.  Fix the
one caller that forgot to do so.
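
In outline, the caller-side pattern is meant to be the following (a hedged
sketch with illustrative names; "extended" reports whether
brin_getinsertbuffer had to extend the relation):

    /* ... place the new tuple on the returned page, then unlock ... */
    if (extended)
    {
        /* only now is the page's remaining free space worth advertising */
        RecordPageWithFreeSpace(idxrel, blk, freespace);
        FreeSpaceMapVacuumRange(idxrel, blk, blk + 1);
    }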

Simplify logic in brin_doupdate's same-page-update case by postponing
brin_initialize_empty_new_buffer to after the critical section; I see
little point in doing it before.
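
The reordered shape is roughly this (illustrative, not the exact source):

    START_CRIT_SECTION();
    /* ... overwrite the tuple in place, MarkBufferDirty, emit WAL ... */
    END_CRIT_SECTION();

    LockBuffer(oldbuf, BUFFER_LOCK_UNLOCK);

    /* a page we extended but ended up not needing can be handled here */
    if (newbuf != InvalidBuffer)
        brin_initialize_empty_new_buffer(idxrel, newbuf);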

Avoid repeat calls of RelationGetNumberOfBlocks in brin_vacuum_scan.
Avoid duplicate BufferGetBlockNumber and BufferGetPage calls in
a couple of places where we already had the right values.

Move a BRIN_elog debug logging call out of a critical section; logging
there is pretty unsafe, and I don't think we lose anything by waiting
till after the critical section.
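
The safe shape is simply (a sketch; the message text here is made up):

    START_CRIT_SECTION();
    /* ... page modifications and WAL insertion only ... */
    END_CRIT_SECTION();

    /*
     * elog/ereport can allocate memory or even throw an error, neither of
     * which is acceptable inside a critical section, so log afterwards.
     */
    BRIN_elog((DEBUG2, "tuple updated in place"));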

Move the "*extended = false" step in brin_getinsertbuffer into the
routine's main loop.  There's no actual bug there, since the loop can't
iterate with *extended still true, but it doesn't seem very future-proof
as coded; and it's certainly not documented as a loop invariant.
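
That is, rather than relying on the implicit guarantee, the flag is now
re-established at the top of each pass (sketch):

    for (;;)
    {
        /* invariant: nothing has been extended yet in this iteration */
        *extended = false;

        /* ... try to find or create a usable page, setting *extended = true
         * only if the relation is actually extended; retry on failure ... */
    }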

This is all from follow-on investigation inspired by commit c79f6df7.

Discussion: https://postgr.es/m/5801.1522429460@sss.pgh.pa.us
parent 3de241db
@@ -1121,16 +1121,22 @@ initialize_brin_buildstate(Relation idxRel, BrinRevmap *revmap,
 static void
 terminate_brin_buildstate(BrinBuildState *state)
 {
-	/* release the last index buffer used */
+	/*
+	 * Release the last index buffer used.  We might as well ensure that
+	 * whatever free space remains in that page is available in FSM, too.
+	 */
 	if (!BufferIsInvalid(state->bs_currentInsertBuf))
 	{
 		Page		page;
+		Size		freespace;
+		BlockNumber blk;
 
 		page = BufferGetPage(state->bs_currentInsertBuf);
-		RecordPageWithFreeSpace(state->bs_irel,
-								BufferGetBlockNumber(state->bs_currentInsertBuf),
-								PageGetFreeSpace(page));
+		freespace = PageGetFreeSpace(page);
+		blk = BufferGetBlockNumber(state->bs_currentInsertBuf);
 		ReleaseBuffer(state->bs_currentInsertBuf);
+		RecordPageWithFreeSpace(state->bs_irel, blk, freespace);
+		FreeSpaceMapVacuumRange(state->bs_irel, blk, blk + 1);
 	}
 
 	brin_free_desc(state->bs_bdesc);
@@ -1445,14 +1451,15 @@ union_tuples(BrinDesc *bdesc, BrinMemTuple *a, BrinTuple *b)
 static void
 brin_vacuum_scan(Relation idxrel, BufferAccessStrategy strategy)
 {
-	bool		vacuum_fsm = false;
+	BlockNumber nblocks;
 	BlockNumber blkno;
 
 	/*
 	 * Scan the index in physical order, and clean up any possible mess in
 	 * each page.
 	 */
-	for (blkno = 0; blkno < RelationGetNumberOfBlocks(idxrel); blkno++)
+	nblocks = RelationGetNumberOfBlocks(idxrel);
+	for (blkno = 0; blkno < nblocks; blkno++)
 	{
 		Buffer		buf;
@@ -1461,15 +1468,15 @@ brin_vacuum_scan(Relation idxrel, BufferAccessStrategy strategy)
 		buf = ReadBufferExtended(idxrel, MAIN_FORKNUM, blkno,
 								 RBM_NORMAL, strategy);
 
-		vacuum_fsm |= brin_page_cleanup(idxrel, buf);
+		brin_page_cleanup(idxrel, buf);
 
 		ReleaseBuffer(buf);
 	}
 
 	/*
-	 * If we made any change to the FSM, make sure the new info is visible all
-	 * the way to the top.
+	 * Update all upper pages in the index's FSM, as well.  This ensures not
+	 * only that we propagate leaf-page FSM updates made by brin_page_cleanup,
+	 * but also that any pre-existing damage or out-of-dateness is repaired.
 	 */
-	if (vacuum_fsm)
-		FreeSpaceMapVacuum(idxrel);
+	FreeSpaceMapVacuum(idxrel);
 }
(A further diff in this commit is collapsed and not shown here.)
@@ -33,6 +33,6 @@ extern bool brin_start_evacuating_page(Relation idxRel, Buffer buf);
 extern void brin_evacuate_page(Relation idxRel, BlockNumber pagesPerRange,
 							   BrinRevmap *revmap, Buffer buf);
 
-extern bool brin_page_cleanup(Relation idxrel, Buffer buf);
+extern void brin_page_cleanup(Relation idxrel, Buffer buf);
 
 #endif							/* BRIN_PAGEOPS_H */