    Fix visibility check when XID is committed in CLOG but not in procarray. · e24615a0
    Heikki Linnakangas authored
    TransactionIdIsInProgress had a fast path to return 'false' if the
    single-item CLOG cache said that the transaction was known to be
    committed. However, that was wrong, because a transaction is first
    marked as committed in the CLOG but doesn't become visible to others
    until it has removed its XID from the proc array. That could lead to an
    error:
    
        ERROR:  t_xmin is uncommitted in tuple to be updated
    
    or for an UPDATE to go ahead without blocking, before the previous
    UPDATE on the same row was made visible.
    
    The window is usually very short, but synchronous replication makes it
    much wider, because the wait for the synchronous replica happens in
    that window.
    
    Another thing that makes it hard to hit is that it's hard to get such
    a commit-in-progress transaction into the single-item CLOG cache.
    Normally, if you call TransactionIdIsInProgress on such a transaction,
    it determines that the XID is in progress without checking the CLOG
    and without populating the cache. One way to prime the cache is to
    explicitly call pg_xact_status() on the XID. Another way is to use a
    lot of subtransactions, so that the subxid cache in the proc array
    overflows, making TransactionIdIsInProgress rely on pg_subtrans and
    CLOG checks.
    
    The fast path has been broken ever since it was introduced in 2008,
    but the race condition is very hard to hit, especially without
    synchronous replication. There were a couple of reports of the error
    starting from summer 2021, but no one was able to find the root cause
    then.
    
    TransactionIdIsKnownCompleted() is now unused. It is removed in
    'master' but left in place in the back branches in case extensions
    use it.
    
    Also change pg_xact_status() to check TransactionIdIsInProgress().
    Previously, it only checked the CLOG, and returned "committed" before
    the transaction was actually made visible to other queries. Note that
    this also means that you cannot use pg_xact_status() to reproduce the
    bug anymore, even if the code weren't fixed.
    
    Report and analysis by Konstantin Knizhnik. Patch by Simon Riggs, with
    the pg_xact_status() change added by me.
    
    Author: Simon Riggs
    Reviewed-by: Andres Freund
    Discussion: https://www.postgresql.org/message-id/flat/4da7913d-398c-e2ad-d777-f752cf7f0bbb%40garret.ru
procarray.c 162 KB