    Allow multiple xacts during table sync in logical replication. · ce0fdbfe
    Amit Kapila authored
    For the initial table data synchronization in logical replication, we use
    a single transaction to copy the entire table and then synchronize the
    position in the stream with the main apply worker.
    
    There are multiple downsides of this approach: (a) We have to perform the
    entire copy operation again if there is any error (network breakdown,
    error in the database operation, etc.) while we synchronize the WAL
    position between tablesync worker and apply worker; this will be onerous
    especially for large copies, (b) Using a single transaction in the
    synchronization phase (where we can receive WAL from multiple
    transactions) risks exceeding the CID limit, (c) The slot will hold the
    WAL until the entire sync is complete because we never commit until the
    end.
    
    This patch solves all the above downsides by allowing multiple
    transactions during the tablesync phase. The initial copy is done in a
    single transaction and after that, we commit each transaction as we
    receive it. To allow recovery after any error or crash, we use a permanent
    slot and origin to track the progress. The slot and origin will be removed
    once we finish the synchronization of the table. We also remove the slot
    and origin of tablesync workers if the user performs DROP SUBSCRIPTION ..
    or ALTER SUBSCRIPTION .. REFRESH and some of the table syncs are still
    not finished.
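
    The flow described above can be sketched as a toy model. This is not
    PostgreSQL code: SubscriberState, tablesync, and the integer LSNs are
    hypothetical names chosen for illustration, and a "commit" is modelled
    simply as updating state that survives a restart.

```python
from dataclasses import dataclass, field


@dataclass
class SubscriberState:
    """Durable state that survives a crash in this toy model (hypothetical)."""
    table_rows: list = field(default_factory=list)
    copy_done: bool = False     # the initial copy has been committed
    origin_lsn: int = 0         # last remote commit position applied locally
    slot_exists: bool = False   # stands in for the permanent slot


def tablesync(state, snapshot_rows, stream, sync_lsn, fail_at_lsn=None):
    """Run or resume a table sync; return True once caught up to sync_lsn.

    stream is a list of (commit_lsn, rows) remote transactions in LSN order.
    fail_at_lsn simulates an error while applying that transaction.
    """
    if not state.copy_done:
        # Initial copy: a single transaction covering the whole table.
        state.slot_exists = True
        state.table_rows = list(snapshot_rows)
        state.copy_done = True

    for commit_lsn, rows in stream:
        if commit_lsn <= state.origin_lsn:
            continue            # already committed before the restart
        if commit_lsn > sync_lsn:
            break               # past the point agreed with the apply worker
        if commit_lsn == fail_at_lsn:
            return False        # error: this transaction is not applied
        # Commit each remote transaction on its own and record progress.
        state.table_rows.extend(rows)
        state.origin_lsn = commit_lsn

    # Caught up: the slot is no longer needed.
    state.slot_exists = False
    return True


if __name__ == "__main__":
    state = SubscriberState()
    stream = [(10, ["a"]), (20, ["b"]), (30, ["c"])]
    tablesync(state, ["snap"], stream, sync_lsn=30, fail_at_lsn=20)
    tablesync(state, ["snap"], stream, sync_lsn=30)  # resumes, no second copy
    print(state.table_rows)
```

    In this model the second call skips the copy and the transaction at
    position 10, because both were recorded before the simulated failure.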
    
    The commands ALTER SUBSCRIPTION ... REFRESH PUBLICATION and
    ALTER SUBSCRIPTION ... SET PUBLICATION ... with refresh option as true
    cannot be executed inside a transaction block because they can now drop
    the slots for which we have no provision to rollback.
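
    A minimal sketch of why this restriction is needed, again as a toy
    model rather than PostgreSQL code. Publisher, Subscriber, refresh, and
    the "sync_<table>" slot names are hypothetical. The point it
    illustrates is that dropping a remote slot takes effect immediately,
    so the command refuses to run where a later rollback could undo only
    the local half of the change.

```python
class Publisher:
    """Holds replication slots; dropping one cannot be rolled back."""

    def __init__(self, slots):
        self.slots = set(slots)

    def drop_slot(self, name):
        self.slots.discard(name)    # immediate and permanent


class Subscriber:
    def __init__(self, tables):
        self.syncing = set(tables)  # tables whose sync is not yet finished
        self.in_transaction_block = False

    def refresh(self, publisher, removed_tables):
        """Model of a refresh that drops slots of removed, unsynced tables."""
        if self.in_transaction_block:
            # Refuse up front: the slot drop below could not be undone.
            raise RuntimeError(
                "refresh cannot run inside a transaction block")
        for table in removed_tables:
            if table in self.syncing:
                publisher.drop_slot(f"sync_{table}")
                self.syncing.discard(table)


if __name__ == "__main__":
    pub = Publisher({"sync_t1", "sync_t2"})
    sub = Subscriber({"t1", "t2"})
    sub.refresh(pub, ["t1"])
    print(sorted(pub.slots))
```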
    
    This will also open up the path for logical replication of 2PC
    transactions on the subscriber side. Previously, we couldn't do that
    because of the requirement of maintaining a single transaction in
    tablesync workers.
    
    Bump catalog version due to change of state in the catalog
    (pg_subscription_rel).
    
    Author: Peter Smith, Amit Kapila, and Takamichi Osumi
    Reviewed-by: Ajin Cherian, Petr Jelinek, Hou Zhijie and Amit Kapila
    Discussion: https://postgr.es/m/CAA4eK1KHJxaZS-fod-0fey=0tq3=Gkn4ho=8N4-5HWiCfu0H1A@mail.gmail.com
worker_internal.h 2.69 KB