    snapshot scalability: Don't compute global horizons while building snapshots. · dc7420c2
    Andres Freund authored
    To make GetSnapshotData() more scalable, it cannot look at each proc's
    xmin: While snapshot contents do not need to change whenever a read-only
    transaction commits or a snapshot is released, a proc's xmin is modified in
    those cases. Particularly on systems with higher core counts, the frequency
    of xmin modifications leads to many cache misses inside GetSnapshotData(),
    even though the data underlying a snapshot has not changed. That is the most
    significant source of GetSnapshotData() scaling poorly on larger systems.
    
    Without accessing xmins, GetSnapshotData() cannot calculate accurate horizons /
    thresholds as it has so far. But we don't really have to: The horizons don't
    actually change that much between GetSnapshotData() calls. Nor are the horizons
    actually used every time a snapshot is built.
    
    The trick this commit introduces is to delay computation of accurate horizons
    until their use, and to use horizon boundaries to determine whether accurate
    horizons need to be computed.
    
    The use of RecentGlobal[Data]Xmin to decide whether a row version could be
    removed has been replaced with new GlobalVisTest* functions.  These use two
    thresholds to determine whether a row can be pruned:
    1) definitely_needed, indicating that rows deleted by XIDs >= definitely_needed
       are definitely still visible.
    2) maybe_needed, indicating that rows deleted by XIDs < maybe_needed can
       definitely be removed.
    GetSnapshotData() updates definitely_needed to be the xmin of the computed
    snapshot.
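
The two thresholds split the XID space into three regions. The following is a
minimal sketch of that classification, not the code from this commit: the
names VisState, VisResult and vis_classify are hypothetical, and XIDs are
modelled as plain 64-bit integers so that wraparound can be ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the horizon boundaries described above. */
typedef struct VisState
{
    uint64_t maybe_needed;      /* XIDs <  this: row can be removed */
    uint64_t definitely_needed; /* XIDs >= this: row is still visible */
} VisState;

typedef enum VisResult
{
    VIS_REMOVABLE,              /* below maybe_needed */
    VIS_NEEDED,                 /* at or above definitely_needed */
    VIS_UNCERTAIN               /* in between: accurate horizon required */
} VisResult;

/* Classify the XID that deleted a row against the two boundaries. */
static VisResult
vis_classify(const VisState *state, uint64_t xid)
{
    /* Assumed invariant: the boundaries never cross. */
    assert(state->maybe_needed <= state->definitely_needed);

    if (xid < state->maybe_needed)
        return VIS_REMOVABLE;
    if (xid >= state->definitely_needed)
        return VIS_NEEDED;
    return VIS_UNCERTAIN;
}
```

Only the VIS_UNCERTAIN case requires any further work, which is why the cheap
boundaries are sufficient for most tests.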
    
    When testing whether a row can be removed (with GlobalVisTestIsRemovableXid())
    and the tested XID falls in between the two (i.e. XID >= maybe_needed && XID <
    definitely_needed) the boundaries can be recomputed to be more accurate. As it
    is not cheap to compute accurate boundaries, we limit the number of times that
    happens in short succession.  As the boundaries used by
    GlobalVisTestIsRemovableXid() are never reset (with maybe_needed updated by
    GetSnapshotData()), it is likely that further tests can benefit from an earlier
    computation of accurate horizons.
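
A rough sketch of the recompute-on-demand behaviour described above. Everything
here is an assumption made for illustration: the names VisTest,
vis_is_removable and demo_compute_horizon are hypothetical, XIDs are plain
64-bit integers, and the throttle (recompute at most once per observed snapshot
xmin) is just one possible way to limit recomputation in short succession.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-test state; the boundaries persist across calls. */
typedef struct VisTest
{
    uint64_t maybe_needed;
    uint64_t definitely_needed;
    uint64_t last_recompute_xmin; /* snapshot xmin at last recomputation */
} VisTest;

/* Demo stand-in for the expensive scan that yields an accurate horizon. */
static uint64_t demo_horizon = 0;
static int      demo_calls = 0;

static uint64_t
demo_compute_horizon(void)
{
    demo_calls++;
    return demo_horizon;
}

/* Return true if a row deleted by xid can be removed. */
static bool
vis_is_removable(VisTest *test, uint64_t xid, uint64_t cur_snapshot_xmin,
                 uint64_t (*compute_horizon) (void))
{
    assert(test->maybe_needed <= test->definitely_needed);

    if (xid < test->maybe_needed)
        return true;            /* cheap answer: removable */
    if (xid >= test->definitely_needed)
        return false;           /* cheap answer: still needed */

    /* Uncertain range: recompute only if the throttle allows it. */
    if (test->last_recompute_xmin != cur_snapshot_xmin)
    {
        uint64_t    horizon = compute_horizon();

        /* Boundaries only ever move forward. */
        if (horizon > test->maybe_needed)
            test->maybe_needed = horizon;
        if (test->definitely_needed < test->maybe_needed)
            test->definitely_needed = test->maybe_needed;
        test->last_recompute_xmin = cur_snapshot_xmin;

        return xid < test->maybe_needed;
    }

    /* Throttled: be conservative and keep the row. */
    return false;
}
```

Because the updated maybe_needed is kept in the state, later tests for XIDs
below it take the cheap path without triggering another recomputation.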
    
    To avoid regressing performance when old_snapshot_threshold is set (as that
    requires an accurate horizon to be computed), heap_page_prune_opt() doesn't
    unconditionally call TransactionIdLimitedForOldSnapshots() anymore. Both the
    computation of the limited horizon, and the triggering of errors (with
    SetOldSnapshotThresholdTimestamp()) are now only done when necessary to remove
    tuples.
    
    This commit just removes the accesses to PGXACT->xmin from
    GetSnapshotData(), but other members of PGXACT residing in the same
    cache line are still accessed. Therefore this in itself does not result in a
    significant improvement. Subsequent commits will take advantage of the
    fact that GetSnapshotData() now does not need to access xmins anymore.
    
    Note: This contains a workaround in heap_page_prune_opt() to keep the
    snapshot_too_old tests working. While that workaround is ugly, the tests
    currently are not meaningful, and it seems best to address them separately.
    
    Author: Andres Freund <andres@anarazel.de>
    Reviewed-By: Robert Haas <robertmhaas@gmail.com>
    Reviewed-By: Thomas Munro <thomas.munro@gmail.com>
    Reviewed-By: David Rowley <dgrowleyml@gmail.com>
    Discussion: https://postgr.es/m/20200301083601.ews6hz5dduc3w2se@alap3.anarazel.de