    Avoid killing btree items that are already dead · 242dfcba
    Alvaro Herrera authored
    _bt_killitems marks btree items dead when a scan leaves the page where
    they live, but it does so with only share lock (to improve concurrency).
    This was historically okay, since killing a dead item has no
    consequences.  However, with the advent of data checksums and
    wal_log_hints, this action incurs a WAL full-page-image record of the
    page.  Multiple concurrent processes would write the same page several
    times, leading to WAL bloat.  The probability of this happening can be
    reduced by only killing items if they're not already dead, so change the
    code to do that.
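
    As a rough illustration of the idea (not the actual nbtutils.c code), the
    sketch below models a page as an array of dead flags and dirties the page
    only when at least one item was newly killed.  DemoPage, demo_kill_items
    and dirty_hint_count are hypothetical names invented for this example.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a btree page: one "dead" flag per item. */
typedef struct DemoPage
{
    bool    item_dead[8];       /* true once an item has been killed */
    size_t  nitems;             /* number of valid entries in item_dead */
    int     dirty_hint_count;   /* how many times the page was dirtied */
} DemoPage;

/*
 * Mark the listed items dead, skipping any that are already dead.
 * The page is dirtied (here: a counter is bumped) only if at least one
 * item actually changed state.  Returns true if anything was killed.
 */
static bool
demo_kill_items(DemoPage *page, const size_t *killed, size_t nkilled)
{
    bool    killed_something = false;

    for (size_t i = 0; i < nkilled; i++)
    {
        size_t  off = killed[i];

        if (off >= page->nitems)
            continue;           /* ignore out-of-range offsets */

        if (!page->item_dead[off])
        {
            page->item_dead[off] = true;
            killed_something = true;
        }
    }

    /* Skipping this when nothing changed is what avoids the extra write. */
    if (killed_something)
        page->dirty_hint_count++;

    return killed_something;
}
```

    In this model, a second caller passing the same item list finds every
    item already dead and leaves the page untouched, which corresponds to
    the redundant full-page image being avoided.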
    
    The problem could be eliminated completely by having _bt_killitems upgrade
    to exclusive lock upon seeing a killable item, but that would reduce
    concurrency so it's considered a cure worse than the disease.
    
    Backpatch all the way back to 9.5, since wal_log_hints was introduced in
    9.4.
    
    Author: Masahiko Sawada <masahiko.sawada@2ndquadrant.com>
    Discussion: https://postgr.es/m/CA+fd4k6PeRj2CkzapWNrERkja5G0-6D-YQiKfbukJV+qZGFZ_Q@mail.gmail.com
nbtutils.c 84.6 KB