Commit ccc4c074 authored by Alvaro Herrera

Close some holes in BRIN page assignment

In some corner cases, it is possible for the BRIN index relation to be
extended by brin_getinsertbuffer but for the new page not to be used
immediately for anything by its callers; when this happens, the page is
initialized and the FSM is updated (by brin_getinsertbuffer) with the
info about that page, but these actions are not WAL-logged.  A later
index insert/update can use the page, but since the page is already
initialized, the initialization itself is not WAL-logged then either.
Replay of this sequence of events causes recovery to fail altogether.
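
The failure mode described above can be illustrated with a toy model.
This is only a sketch under stated assumptions: ToyWalRecord, toy_replay
and the two record types are hypothetical names invented for the
illustration, not PostgreSQL's actual WAL structures, and real recovery
is far more involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NPAGES 4

/* Hypothetical record types; real BRIN WAL records carry far more detail. */
typedef enum { TOY_WAL_INIT_PAGE, TOY_WAL_INSERT } ToyWalType;

typedef struct
{
    ToyWalType type;
    int        blkno;
} ToyWalRecord;

/*
 * Replay a WAL stream against pages that start out uninitialized, the way
 * recovery sees a relation whose extension was never logged.  Returns false
 * as soon as a record tries to insert into an uninitialized page.
 */
bool
toy_replay(const ToyWalRecord *wal, size_t nrecords)
{
    bool   initialized[TOY_NPAGES] = {false};
    size_t i;

    for (i = 0; i < nrecords; i++)
    {
        int blkno = wal[i].blkno;

        assert(blkno >= 0 && blkno < TOY_NPAGES);
        if (wal[i].type == TOY_WAL_INIT_PAGE)
            initialized[blkno] = true;
        else if (!initialized[blkno])
            return false;       /* the toy equivalent of recovery failing */
    }
    return true;
}

/* Page 1 was initialized without a WAL record; only the insert is logged. */
bool
toy_replay_unlogged_init(void)
{
    const ToyWalRecord wal[] = {{TOY_WAL_INSERT, 1}};

    return toy_replay(wal, sizeof(wal) / sizeof(wal[0]));
}

/* Same insert, but the initialization reached the WAL first. */
bool
toy_replay_logged_init(void)
{
    const ToyWalRecord wal[] = {{TOY_WAL_INIT_PAGE, 1}, {TOY_WAL_INSERT, 1}};

    return toy_replay(wal, sizeof(wal) / sizeof(wal[0]));
}
```

In this model toy_replay_unlogged_init() reports failure while
toy_replay_logged_init() succeeds; the only difference between the two
streams is whether the page initialization was logged.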

There is a related corner case within brin_getinsertbuffer itself, in
which we extend the relation to put a new index tuple there, but later
find out that we cannot do so, and do not return the buffer; the page
obtained from extension is not even initialized.  The resulting page is
lost forever.

To fix, shuffle the code so that initialization is not the
responsibility of brin_getinsertbuffer anymore, in normal cases;
instead, the initialization is done by its callers (brin_doinsert and
brin_doupdate) once they're certain that the page is going to be used.
When either of those functions determines that the new page cannot be used,
before bailing out they initialize the page as an empty regular page,
enter it in FSM and WAL-log all this.  This way, the page is usable for
future index insertions, and WAL replay doesn't find itself trying to insert
tuples in pages whose initialization didn't make it to the WAL.  The
same strategy is used in brin_getinsertbuffer when it cannot return the
new page.
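
The division of labour described above can be sketched as a toy model in
which the extension step never initializes the page and the caller picks
one of two logged outcomes.  Everything here (ToyIndex, toy_extend,
toy_use_new_page, toy_give_up_new_page, toy_replay_matches) is a
hypothetical stand-in chosen for the illustration; it is not the code
in brin_doinsert or brin_doupdate.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAXPAGES 8
#define TOY_MAXWAL   16

/* Hypothetical record types for the two outcomes a caller can log. */
typedef enum { TOY_WAL_INIT_EMPTY, TOY_WAL_INIT_AND_INSERT } ToyWalType;

typedef struct
{
    ToyWalType type;
    int        blkno;
} ToyWalRecord;

typedef struct
{
    bool         initialized[TOY_MAXPAGES];
    int          ntuples[TOY_MAXPAGES];
    bool         in_fsm[TOY_MAXPAGES];
    int          npages;
    ToyWalRecord wal[TOY_MAXWAL];
    int          nwal;
} ToyIndex;

/* Extend the relation by one page; deliberately leaves it uninitialized. */
int
toy_extend(ToyIndex *idx)
{
    assert(idx->npages < TOY_MAXPAGES);
    return idx->npages++;
}

static void
toy_log(ToyIndex *idx, ToyWalType type, int blkno)
{
    assert(idx->nwal < TOY_MAXWAL);
    idx->wal[idx->nwal].type = type;
    idx->wal[idx->nwal].blkno = blkno;
    idx->nwal++;
}

/* The caller will use the page: initialize, insert, and log both together. */
void
toy_use_new_page(ToyIndex *idx, int blkno)
{
    idx->initialized[blkno] = true;
    idx->ntuples[blkno] = 1;
    toy_log(idx, TOY_WAL_INIT_AND_INSERT, blkno);
}

/* The caller cannot use the page: make it an empty, findable, logged page. */
void
toy_give_up_new_page(ToyIndex *idx, int blkno)
{
    idx->initialized[blkno] = true;
    idx->ntuples[blkno] = 0;
    idx->in_fsm[blkno] = true;
    toy_log(idx, TOY_WAL_INIT_EMPTY, blkno);
}

/* Rebuild page state from the WAL alone and compare it with the live state. */
bool
toy_replay_matches(const ToyIndex *live)
{
    ToyIndex replayed = {0};
    int      i;

    for (i = 0; i < live->nwal; i++)
    {
        int blkno = live->wal[i].blkno;

        if (blkno >= replayed.npages)
            replayed.npages = blkno + 1;
        replayed.initialized[blkno] = true;
        replayed.ntuples[blkno] =
            (live->wal[i].type == TOY_WAL_INIT_AND_INSERT) ? 1 : 0;
    }
    if (replayed.npages != live->npages)
        return false;
    for (i = 0; i < live->npages; i++)
    {
        if (replayed.initialized[i] != live->initialized[i] ||
            replayed.ntuples[i] != live->ntuples[i])
            return false;
    }
    return true;
}
```

In this model, a page that goes through either caller path is fully
reconstructed by toy_replay_matches, whereas a page that is only passed
through toy_extend and then dropped leaves no trace in the WAL.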

Additionally, add a new step to vacuuming so that all pages of the index
are scanned; whenever an uninitialized page is found, it is initialized
as empty and WAL-logged.  This closes the hole in which the relation is
extended but the system crashes before anything is WAL-logged about it.
We also take this opportunity to update the FSM, in case it has gotten
out of date.
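
A rough sketch of the per-page work this vacuum step implies is shown
below.  The brin_page_cleanup implementation is not visible in this
diff, so toy_page_cleanup only models the behaviour the message
describes; ToyIndex, TOY_PAGE_SIZE and the rule that the recorded free
space is only ever raised are assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NPAGES    4
#define TOY_PAGE_SIZE 8192

typedef struct
{
    bool initialized;
    int  used_bytes;
} ToyPage;

typedef struct
{
    ToyPage pages[TOY_NPAGES];
    int     fsm_free[TOY_NPAGES];   /* free space the toy FSM has recorded */
    int     wal_init_records;       /* count of logged page initializations */
} ToyIndex;

/* Clean up one page; returns true if the toy FSM was changed. */
bool
toy_page_cleanup(ToyIndex *idx, int blkno)
{
    ToyPage *page = &idx->pages[blkno];
    int      freespace;

    assert(blkno >= 0 && blkno < TOY_NPAGES);
    if (!page->initialized)
    {
        /* Lost page: initialize as empty, log it, and advertise it. */
        page->initialized = true;
        page->used_bytes = 0;
        idx->wal_init_records++;
        idx->fsm_free[blkno] = TOY_PAGE_SIZE;
        return true;
    }

    /* Assumption: only correct an FSM entry that understates free space. */
    freespace = TOY_PAGE_SIZE - page->used_bytes;
    if (freespace > idx->fsm_free[blkno])
    {
        idx->fsm_free[blkno] = freespace;
        return true;
    }
    return false;
}

/* Visit every page in physical order, as the new vacuum step does. */
bool
toy_vacuum_scan(ToyIndex *idx)
{
    bool fsm_changed = false;
    int  blkno;

    for (blkno = 0; blkno < TOY_NPAGES; blkno++)
        fsm_changed |= toy_page_cleanup(idx, blkno);
    return fsm_changed;
}

/* Build an index with one lost page (1) and one stale FSM entry (2). */
ToyIndex
toy_make_damaged_index(void)
{
    ToyIndex idx = {0};

    idx.pages[0].initialized = true;
    idx.pages[0].used_bytes = 100;
    idx.fsm_free[0] = TOY_PAGE_SIZE - 100;
    idx.pages[2].initialized = true;
    idx.pages[2].used_bytes = 1000;
    idx.fsm_free[2] = 0;
    idx.pages[3].initialized = true;
    idx.pages[3].used_bytes = TOY_PAGE_SIZE;
    idx.fsm_free[3] = 0;
    return idx;
}
```

Running toy_vacuum_scan twice on the result of toy_make_damaged_index
repairs the lost page and the stale entry on the first pass and changes
nothing on the second, which is the property a repeated VACUUM needs.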

Thanks to Heikki Linnakangas for finding the problem that kicked off some
additional analysis of BRIN page assignment code.

Backpatch to 9.5, where BRIN was introduced.

Discussion: https://www.postgresql.org/message-id/20150723204810.GY5596@postgresql.org
parent a4b059fd
@@ -68,6 +68,7 @@ static void brinsummarize(Relation index, Relation heapRel,
 static void form_and_insert_tuple(BrinBuildState *state);
 static void union_tuples(BrinDesc *bdesc, BrinMemTuple *a,
 			 BrinTuple *b);
+static void brin_vacuum_scan(Relation idxrel, BufferAccessStrategy strategy);
 /*
@@ -736,6 +737,8 @@ brinvacuumcleanup(PG_FUNCTION_ARGS)
 	heapRel = heap_open(IndexGetRelation(RelationGetRelid(info->index), false),
 						AccessShareLock);
+	brin_vacuum_scan(info->index, info->strategy);
 	brinsummarize(info->index, heapRel,
 				  &stats->num_index_tuples, &stats->num_index_tuples);
@@ -1150,3 +1153,43 @@ union_tuples(BrinDesc *bdesc, BrinMemTuple *a, BrinTuple *b)
 	MemoryContextDelete(cxt);
 }
+/*
+ * brin_vacuum_scan
+ *		Do a complete scan of the index during VACUUM.
+ *
+ * This routine scans the complete index looking for uncatalogued index pages,
+ * i.e. those that might have been lost due to a crash after index extension
+ * and such.
+ */
+static void
+brin_vacuum_scan(Relation idxrel, BufferAccessStrategy strategy)
+{
+	bool		vacuum_fsm = false;
+	BlockNumber blkno;
+	/*
+	 * Scan the index in physical order, and clean up any possible mess in
+	 * each page.
+	 */
+	for (blkno = 0; blkno < RelationGetNumberOfBlocks(idxrel); blkno++)
+	{
+		Buffer		buf;
+		CHECK_FOR_INTERRUPTS();
+		buf = ReadBufferExtended(idxrel, MAIN_FORKNUM, blkno,
+								 RBM_NORMAL, strategy);
+		vacuum_fsm |= brin_page_cleanup(idxrel, buf);
+		ReleaseBuffer(buf);
+	}
+	/*
+	 * If we made any change to the FSM, make sure the new info is visible all
+	 * the way to the top.
+	 */
+	if (vacuum_fsm)
+		FreeSpaceMapVacuum(idxrel);
+}
This diff is collapsed.
@@ -52,6 +52,7 @@ typedef struct BrinSpecialSpace
 #define BRIN_PAGETYPE_REVMAP		0xF092
 #define BRIN_PAGETYPE_REGULAR		0xF093
+#define BRIN_IS_META_PAGE(page) (BrinPageType(page) == BRIN_PAGETYPE_META)
 #define BRIN_IS_REVMAP_PAGE(page) (BrinPageType(page) == BRIN_PAGETYPE_REVMAP)
 #define BRIN_IS_REGULAR_PAGE(page) (BrinPageType(page) == BRIN_PAGETYPE_REGULAR)
@@ -33,4 +33,6 @@ extern bool brin_start_evacuating_page(Relation idxRel, Buffer buf);
 extern void brin_evacuate_page(Relation idxRel, BlockNumber pagesPerRange,
 				   BrinRevmap *revmap, Buffer buf);
+extern bool brin_page_cleanup(Relation idxrel, Buffer buf);
 #endif   /* BRIN_PAGEOPS_H */