    Do not decode TOAST data for table rewrites · f69c959d
    Tomas Vondra authored
    During table rewrites (VACUUM FULL and CLUSTER), the main heap is logged
    using XLOG / FPI records, and thus (correctly) ignored in decoding.
    But the associated TOAST table is WAL-logged as plain INSERT records,
    and so was logically decoded and passed to the reorder buffer.
    
    That has severe consequences with TOAST tables of non-trivial size.
    Firstly, the reorder buffer has to keep all those changes, possibly
    spilling them to a file, which costs both I/O and disk space.
    
    Secondly, ReorderBufferCommit() was stashing all those TOAST chunks into
    a hash table, which got discarded only after processing the row from the
    main heap.  But as the main heap is not decoded for rewrites, this never
    happened, so all the TOAST data accumulated in memory, resulting either
    in excessive memory consumption or OOM.
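
The accumulation described above can be illustrated with a small standalone model. This is only a sketch and not PostgreSQL code: ToastStash, ChangeKind, stash_replay() and replay_rewrite() are hypothetical names, and the real reorder buffer keeps the chunks in a hash table rather than in two counters.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-transaction stash of TOAST chunks. */
typedef struct ToastStash
{
    size_t pending_chunks;  /* chunks still waiting for a main-heap row */
    size_t pending_bytes;   /* memory held by those chunks */
} ToastStash;

typedef enum
{
    CHANGE_TOAST_CHUNK,     /* decoded INSERT into the TOAST table */
    CHANGE_MAIN_HEAP_ROW    /* decoded change to the main heap */
} ChangeKind;

/* Replay one change: chunks pile up, a main-heap row releases them. */
void
stash_replay(ToastStash *s, ChangeKind kind, size_t bytes)
{
    assert(s != NULL);

    if (kind == CHANGE_TOAST_CHUNK)
    {
        s->pending_chunks++;
        s->pending_bytes += bytes;
    }
    else
    {
        /* The stash is discarded only once the owning row is processed. */
        s->pending_chunks = 0;
        s->pending_bytes = 0;
    }
}

/*
 * Model a table rewrite: only TOAST chunks are decoded, never a main-heap
 * row, so nothing ever releases the stash.  Returns the bytes still held.
 */
size_t
replay_rewrite(size_t n_chunks, size_t chunk_bytes)
{
    ToastStash s = {0, 0};

    for (size_t i = 0; i < n_chunks; i++)
        stash_replay(&s, CHANGE_TOAST_CHUNK, chunk_bytes);

    return s.pending_bytes;
}
```

In this model the held memory grows linearly with the number of TOAST chunks in the rewritten table, which matches the unbounded growth the message describes.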
    
    The fix is simple, as commit e9edc1ba already introduced infrastructure
    (namely the HEAP_INSERT_NO_LOGICAL flag) to skip logical decoding of TOAST
    tables, but applied it only to system tables.  So simply use it for
    all TOAST data in raw_heap_insert().
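
As a rough sketch of the change in behaviour, the option handling can be modelled as below. The bit value is illustrative and not taken from the PostgreSQL headers; toast_insert_options() and its parameters are hypothetical and exist only to contrast the old and new conditions.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit value only; the real definition lives in PostgreSQL. */
#define HEAP_INSERT_NO_LOGICAL 0x0010

/*
 * Hypothetical helper: compute the insert options used for TOAST data
 * written during a table rewrite.
 *
 * Before the fix, the flag was set only for system tables.
 * After the fix, it is set for all TOAST data.
 */
int
toast_insert_options(int options, bool fix_applied, bool is_system_table)
{
    assert(options >= 0);

    if (fix_applied || is_system_table)
        options |= HEAP_INSERT_NO_LOGICAL;

    return options;
}
```

With fix_applied set, the result always carries HEAP_INSERT_NO_LOGICAL, whatever kind of table is being rewritten.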
    
    On its own, however, that would solve only the memory consumption issue:
    the TOAST changes would still be decoded, added to the reorder buffer,
    and spilled to disk (although without TOAST tuple data, so much smaller).
    But we can solve that by tweaking DecodeInsert() to just ignore such
    INSERT records altogether, using the XLH_INSERT_CONTAINS_NEW_TUPLE flag,
    instead of skipping them later in ReorderBufferCommit().
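
The early exit can be sketched in isolation as follows. This is a simplified model, not the actual DecodeInsert(): InsertRecord, ReorderQueue and decode_insert() are hypothetical, the flag value is illustrative, and the model assumes that records written with HEAP_INSERT_NO_LOGICAL do not carry XLH_INSERT_CONTAINS_NEW_TUPLE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative bit value only; the real definition lives in PostgreSQL. */
#define XLH_INSERT_CONTAINS_NEW_TUPLE (1 << 3)

/* Hypothetical, heavily reduced view of a heap INSERT WAL record. */
typedef struct InsertRecord
{
    uint8_t flags;
} InsertRecord;

/* Hypothetical stand-in for the reorder buffer: counts queued changes. */
typedef struct ReorderQueue
{
    int queued;
} ReorderQueue;

/*
 * Decode one INSERT record.  Records without a new tuple are dropped
 * immediately, so they never reach the queue and are never spilled.
 * Returns true if the record was queued.
 */
bool
decode_insert(ReorderQueue *q, const InsertRecord *rec)
{
    assert(q != NULL && rec != NULL);

    if (!(rec->flags & XLH_INSERT_CONTAINS_NEW_TUPLE))
        return false;       /* nothing to decode, ignore the record */

    q->queued++;
    return true;
}
```

Filtering at decode time rather than at commit time means the skipped records cost neither reorder buffer memory nor spill files.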
    
    Review: Masahiko Sawada
    Discussion: https://www.postgresql.org/message-id/flat/1a17c643-e9af-3dba-486b-fbe31bc1823a%402ndquadrant.com
    Backpatch: 9.4-, where logical decoding was introduced
decode.c 30.4 KB