    Fix race condition with toast table access from a stale syscache entry. · 08e261cb
    Tom Lane authored
    If a tuple in a syscache contains an out-of-line toasted field, and we
    try to fetch that field shortly after some other transaction has committed
    an update or deletion of the tuple, there is a race condition: vacuum
    could come along and remove the toast tuples before we can fetch them.
    This leads to transient failures like "missing chunk number 0 for toast
    value NNNNN in pg_toast_2619", as seen in recent reports from Andrew
    Hammond and Tim Uckun.
    
    The design idea of syscache is that access to stale syscache entries
    should be prevented by relation-level locks, but that fails for at least
    two cases where toasted fields are possible: ANALYZE updates pg_statistic
    rows without locking out sessions that might want to plan queries on the
    same table, and CREATE OR REPLACE FUNCTION updates pg_proc rows without
    any meaningful lock at all.
    
    The least risky fix seems to be an idea that Heikki suggested when we
    were dealing with a related problem back in August: forcibly detoast any
    out-of-line fields before putting a tuple into syscache in the first place.
    This avoids the problem because at the time we fetch the parent tuple from
    the catalog, we should be holding an MVCC snapshot that will prevent
    removal of the toast tuples, even if the parent tuple is outdated
    immediately after we fetch it.  (Note: I'm not convinced that this
    statement holds true at every instant where we could be fetching a syscache
    entry at all, but it does appear to hold true at the times where we could
    fetch an entry that could have a toasted field.  We will need to be a bit
    wary of adding toast tables to low-level catalogs that don't have them
    already.)  An additional benefit is that subsequent uses of the syscache
    entry should be faster, since they won't have to detoast the field.
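
    As a rough illustration of the idea only, the toy model below (not
    PostgreSQL code) shows why flattening before caching closes the window.
    Every name in it (ToastSlot, Field, toast_store, toast_vacuum,
    field_read, field_flatten, demo_race) is hypothetical, and the "toast
    table" is just a small array; it assumes nothing about the real
    catalog cache or tuptoaster interfaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy model: all names here are hypothetical, not PostgreSQL's. */
#define TOAST_SLOTS 8
#define VALUE_MAX   64

typedef struct {
    bool live;                 /* false once "vacuum" has removed the chunk */
    char data[VALUE_MAX];
} ToastSlot;

static ToastSlot toast_table[TOAST_SLOTS];

typedef struct {
    bool external;             /* true: value lives in toast_table[toast_id] */
    int  toast_id;
    char inline_data[VALUE_MAX]; /* meaningful only when !external */
} Field;

/* Store a value out of line and return a field that points at it. */
static Field toast_store(int id, const char *value)
{
    Field f = { .external = true, .toast_id = id };

    assert(id >= 0 && id < TOAST_SLOTS);
    toast_table[id].live = true;
    strncpy(toast_table[id].data, value, VALUE_MAX - 1);
    toast_table[id].data[VALUE_MAX - 1] = '\0';
    return f;
}

/* Simulated vacuum: the out-of-line chunk disappears. */
static void toast_vacuum(int id)
{
    toast_table[id].live = false;
}

/* Read a field; NULL models the "missing chunk" failure. */
static const char *field_read(const Field *f)
{
    if (!f->external)
        return f->inline_data;
    if (!toast_table[f->toast_id].live)
        return NULL;
    return toast_table[f->toast_id].data;
}

/* The fix in miniature: copy the value inline while it is still there. */
static bool field_flatten(Field *f)
{
    const char *v;

    if (!f->external)
        return true;
    v = field_read(f);
    if (v == NULL)
        return false;
    memcpy(f->inline_data, v, VALUE_MAX);
    f->external = false;
    return true;
}

/* Cache a field, vacuum its chunk, then report whether the read succeeds. */
static bool demo_race(bool flatten_first)
{
    Field cached = toast_store(0, "pg_statistic row");

    if (flatten_first && !field_flatten(&cached))
        return false;
    toast_vacuum(0);           /* another session's update, then vacuum */
    return field_read(&cached) != NULL;
}
```

    Calling demo_race(false) models the old behavior: the cached entry
    still points out of line, so the read fails once the chunk is gone.
    demo_race(true) flattens at cache-insert time, so the later read no
    longer depends on the toast data.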
    
    Back-patch to all supported versions.  The problem is significantly harder
    to reproduce in pre-9.0 releases, because of their willingness to flush
    every entry in a syscache whenever the underlying catalog is vacuumed
    (cf. CatalogCacheFlushRelation); but there is still a window for trouble.