• Alvaro Herrera's avatar
    Fix "base" snapshot handling in logical decoding · f49a80c4
    Alvaro Herrera authored
    Two closely related bugs are fixed.  First, xmin of logical slots was
    advanced too early.  During xl_running_xacts processing, xmin of the
    slot was set to the oldest running xid in the record, but that's wrong:
    actually, snapshots which will be used for not-yet-replayed transactions
    might consider older txns as running too, so we need to keep xmin back
    for them.  The problem wasn't noticed earlier because DDL which allows
    to delete tuple (set xmax) while some another not-yet-committed
    transaction looks at it is pretty rare, if not unique: e.g. all forms of
    ALTER TABLE which change schema acquire ACCESS EXCLUSIVE lock
    conflicting with any inserts. The included test case (test_decoding's
    oldest_xmin) uses ALTER of a composite type, which doesn't have such
    interlocking.
    
    To deal with this, we must be able to quickly retrieve oldest xmin
    (oldest running xid among all assigned snapshots) from ReorderBuffer. To
    fix, add another list of ReorderBufferTXNs to the reorderbuffer, where
    transactions are sorted by base-snapshot-LSN.  This is slightly
    different from the existing (sorted by first-LSN) list, because a
    transaction can have an earlier LSN but a later Xmin, if its first
    record does not obtain an xmin (eg. xl_xact_assignment).  Note this new
    list doesn't fully replace the existing txn list: we still need that one
    to prevent WAL recycling.
    
    The second issue concerns SnapBuilder snapshots and subtransactions.
    SnapBuildDistributeNewCatalogSnapshot never assigned a snapshot to a
    transaction that is known to be a subtxn, which is good in the common
    case that the top-level transaction already has one (no point in doing
    so), but a bug otherwise.  To fix, arrange to transfer the snapshot from
    the subtxn to its top-level txn as soon as the kinship gets known.
    test_decoding's snapshot_transfer verifies this.
    
    Also, fix a minor memory leak: refcount of toplevel's old base snapshot
    was not decremented when the snapshot is transferred from child.
    
    Liberally sprinkle code comments, and rewrite a few existing ones.  This
    part is my (Álvaro's) contribution to this commit, as I had to write all
    those comments in order to understand the existing code and Arseny's
    patch.
    Reported-by: default avatarArseny Sher <a.sher@postgrespro.ru>
    Diagnosed-by: default avatarArseny Sher <a.sher@postgrespro.ru>
    Co-authored-by: default avatarArseny Sher <a.sher@postgrespro.ru>
    Co-authored-by: default avatarÁlvaro Herrera <alvherre@alvh.no-ip.org>
    Reviewed-by: default avatarAntonin Houska <ah@cybertec.at>
    Discussion: https://postgr.es/m/87lgdyz1wj.fsf@ars-thinkpad
    f49a80c4
snapshot_transfer.spec 1.84 KB