    Fix traversal of half-frozen update chains · a5736bf7
    Alvaro Herrera authored
    When some tuple versions in an update chain are frozen because they are
    older than freeze_min_age, the xmax/xmin trail can become broken.  This
    breaks HOT (and probably other things).  A subsequent VACUUM can break
    things in more serious ways, such as leaving orphan heap-only tuples
    whose root HOT redirect items were removed.  This shows up as index
    creation (or REINDEX) failing with an error like
      ERROR:  XX000: failed to find parent tuple for heap-only tuple at (0,7) in table "t"
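    A minimal sketch of why a strict link test stops early.  It uses a
    hypothetical model (TupleVersion, strict_link and walk_chain are
    illustrative names, not PostgreSQL code) in which each version keeps
    only xmin and xmax, and a 9.3-style freeze overwrites xmin with
    FrozenTransactionId (2):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical, heavily simplified model of an update chain.  This is not
 * PostgreSQL's HeapTupleHeader; only xmin and xmax are kept.
 */
typedef uint32_t TransactionId;

#define INVALID_XID ((TransactionId) 0) /* "not updated" */
#define FROZEN_XID  ((TransactionId) 2) /* same value as FrozenTransactionId */

typedef struct TupleVersion
{
    TransactionId xmin; /* inserter, or FROZEN_XID after a 9.3-style freeze */
    TransactionId xmax; /* updater, or INVALID_XID for the newest version */
} TupleVersion;

/* Strict link test: the next version's xmin must equal the previous xmax. */
static bool
strict_link(TransactionId prev_xmax, const TupleVersion *next)
{
    return prev_xmax != INVALID_XID && next->xmin == prev_xmax;
}

/* Follow chain[0..n-1] from the root; return how many versions were reached. */
static size_t
walk_chain(const TupleVersion *chain, size_t n)
{
    size_t reached = 1;

    assert(n > 0);
    while (reached < n && strict_link(chain[reached - 1].xmax, &chain[reached]))
        reached++;
    return reached;
}
```

    With {{100, 200}, {200, 300}, {300, 0}} the walk reaches all three
    versions; after freezing the first two, {{2, 200}, {2, 300}, {300, 0}},
    it stops after the first one, because the second version's xmin no
    longer equals 200.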
    
    Because of relfrozenxid constraints, we cannot avoid the freezing of the
    early tuples, so we must cope with the results: whenever we see an Xmin
    of FrozenTransactionId, consider it a match for whatever the previous
    Xmax value was.
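    The rule above can be sketched on the same kind of hypothetical model
    (lenient_link and walk_chain are illustrative names, not the actual
    patch).  An xmax of 0 is taken to mean the version was never updated,
    so there is nothing to follow:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical simplified model; not PostgreSQL's real tuple header. */
typedef uint32_t TransactionId;

#define INVALID_XID ((TransactionId) 0) /* "not updated" */
#define FROZEN_XID  ((TransactionId) 2) /* same value as FrozenTransactionId */

typedef struct TupleVersion
{
    TransactionId xmin;
    TransactionId xmax;
} TupleVersion;

/*
 * Lenient link test: an exact xmax/xmin match still counts, and a frozen
 * xmin is accepted as a match for any previous xmax, because its original
 * value is no longer known.
 */
static bool
lenient_link(TransactionId prev_xmax, const TupleVersion *next)
{
    if (prev_xmax == INVALID_XID)
        return false;           /* previous version was never updated */
    if (next->xmin == prev_xmax)
        return true;
    return next->xmin == FROZEN_XID;
}

/* Follow chain[0..n-1] from the root; return how many versions were reached. */
static size_t
walk_chain(const TupleVersion *chain, size_t n)
{
    size_t reached = 1;

    assert(n > 0);
    while (reached < n && lenient_link(chain[reached - 1].xmax, &chain[reached]))
        reached++;
    return reached;
}
```

    On the half-frozen chain {{2, 200}, {2, 300}, {300, 0}} this walk
    reaches all three versions, while a genuine mismatch such as
    {{100, 200}, {999, 300}} still ends the walk.  The price is that any
    frozen version is accepted as a successor, even one that was not.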
    
    This problem seems to have appeared in 9.3 with the multixact changes,
    though strictly speaking it seems unrelated to them.
    
    Since 9.4 we have commit 37484ad2 "Change the way we mark tuples as
    frozen", so the fix is simple: just compare the raw Xmin (still stored
    in the tuple header, since freezing merely set an infomask bit) to the
    Xmax.  But in 9.3 we rewrite the Xmin value to FrozenTransactionId, so
    the original value is lost and we have nothing to compare the Xmax with.
    To cope with that case we need to compare the Xmin with
    FrozenTransactionId, assume it's a match, and hope for the best.  Sadly,
    since you can
    pg_upgrade a 9.3 instance containing half-frozen pages to newer
    releases, we need to keep the old check in newer versions too, which
    seems a bit brittle; I hope we can somehow get rid of that.
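    A sketch of the combined check under the same kind of simplified,
    hypothetical model: XMIN_FROZEN_BIT stands in for the real infomask
    bits (HEAP_XMIN_FROZEN in 9.4 and later), raw_xmin for what
    HeapTupleHeaderGetRawXmin returns, and xmax_matches_xmin is an
    illustrative name, not necessarily what the patch calls it.  The raw
    comparison handles unfrozen and 9.4-style frozen tuples; the
    FrozenTransactionId fallback is only for pages frozen by 9.3 and
    carried forward by pg_upgrade:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical simplified header; not PostgreSQL's HeapTupleHeaderData. */
typedef uint32_t TransactionId;

#define INVALID_XID     ((TransactionId) 0)
#define FROZEN_XID      ((TransactionId) 2)  /* same value as FrozenTransactionId */
#define XMIN_FROZEN_BIT 0x0001u              /* stand-in for the real infomask bits */

typedef struct TupleHeader
{
    TransactionId raw_xmin;  /* xmin as stored on the page */
    TransactionId xmax;
    uint16_t      infomask;
} TupleHeader;

/* What visibility checks see: a tuple marked frozen reports FROZEN_XID. */
static TransactionId
effective_xmin(const TupleHeader *tup)
{
    return (tup->infomask & XMIN_FROZEN_BIT) ? FROZEN_XID : tup->raw_xmin;
}

/* Does 'tup' continue the chain whose previous version had 'prev_xmax'? */
static bool
xmax_matches_xmin(TransactionId prev_xmax, const TupleHeader *tup)
{
    assert(tup != NULL);

    if (prev_xmax == INVALID_XID)
        return false;           /* previous version was never updated */

    /* Unfrozen, or frozen 9.4-style: the raw xmin survives, so compare it. */
    if (tup->raw_xmin == prev_xmax)
        return true;

    /*
     * Frozen 9.3-style (possibly carried over by pg_upgrade): xmin was
     * overwritten, so there is nothing to compare; assume a match.
     */
    return tup->raw_xmin == FROZEN_XID;
}
```

    For a 9.4-style frozen tuple {200, 300, XMIN_FROZEN_BIT},
    effective_xmin() reports 2, yet xmax_matches_xmin(200, ...) is true and
    xmax_matches_xmin(999, ...) is false, so the chain is followed only to
    its real successor.  A 9.3-style tuple whose stored xmin is 2 matches
    any valid xmax.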
    
    I didn't optimize the new function for performance.  The new coding is
    probably a bit slower than before, since there is a function call rather
    than a straight comparison, but I'd rather have it work correctly than
    be fast but wrong.
    
    This is a follow-up to 20b65522, which fixed a few related problems.
    Apparently, in 9.6 and up there are more ways to get into trouble, but
    with this patch I can no longer reproduce a problem in 9.3 to 9.5, so
    there must be a separate bug.
    
    Reported-by: Peter Geoghegan
    Diagnosed-by: Peter Geoghegan, Michael Paquier, Daniel Wood,
    	Yi Wen Wong, Álvaro
    Discussion: https://postgr.es/m/CAH2-Wznm4rCrhFAiwKPWTpEw2bXDtgROZK7jWWGucXeH3D1fmA@mail.gmail.com