    Fix race between GetNewTransactionId and GetOldestActiveTransactionId. · 74fc8386
    Heikki Linnakangas authored
    The race condition goes like this:
    
    1. GetNewTransactionId advances nextXid e.g. from 100 to 101
    2. GetOldestActiveTransactionId reads the new nextXid, 101
    3. GetOldestActiveTransactionId loops through the proc array. There are no
       active XIDs there, so it returns 101 as the oldest active XID.
    4. GetNewTransactionId stores XID 100 in MyPgXact->xid
    
    So GetOldestActiveTransactionId returned 101 as the oldest active XID, even
    though XID 100 had only just been assigned and is surely still running.
    
    This would be hard to hit in practice, and any ill effect would be even
    harder to spot if it did happen. GetOldestActiveTransactionId is only used
    when creating a checkpoint in a master server, and the race condition can
    only happen on an online checkpoint, as there are no backends running
    during a shutdown checkpoint. The oldestActiveXid value of an online
    checkpoint is only used when starting up a hot standby server, to
    determine the point from which pg_subtrans is initialized. For the race
    condition to happen, there must be no other XIDs in the proc array that
    would hold back the oldest-active XID value, which means that the missed
    XID must be a top-level transaction's XID. However, pg_subtrans is not
    used for top-level XIDs, so I believe the off-by-one error is in fact
    inconsequential. Nevertheless, let's fix it, as it's clearly wrong and the
    fix is simple.
    
    This has been wrong ever since hot standby was introduced, so backport to
    all supported versions.
    
    Discussion: https://www.postgresql.org/message-id/e7258662-82b6-7a45-56d4-99b337a32bf7@iki.fi