    Allocate consecutive blocks during parallel seqscans · 56788d21
    David Rowley authored
    Previously, a parallel sequential scan allocated blocks to parallel
    workers one block at a time.  Since other workers were likely to request
    a block before a given worker returned for its next block number, each
    worker could see a non-sequential I/O pattern, which could cause the
    operating system's readahead to perform poorly or not at all.
    
    Here we change things so that we allocate consecutive "chunks" of blocks
    to workers and have them work on those until they're done, at which time
    we allocate another chunk for the worker.  The size of these chunks is
    based on the size of the relation.
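
    As a rough illustration of the allocation scheme described above, the
    sketch below hands out whole chunks from a shared counter, so each worker
    reads a consecutive run of blocks before asking again.  The struct and
    function names (SharedScan, WorkerScan, next_block) are hypothetical and
    the chunk size is fixed here for simplicity; this is not the code in the
    patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define INVALID_BLOCK UINT32_MAX

/* Hypothetical shared state: a single counter of blocks handed out so far. */
typedef struct SharedScan
{
    uint32_t         nblocks;     /* total blocks in the relation */
    uint32_t         chunk_size;  /* blocks handed out per request */
    _Atomic uint64_t nallocated;  /* blocks allocated across all workers */
} SharedScan;

/* Hypothetical per-worker state: the chunk currently being consumed. */
typedef struct WorkerScan
{
    uint64_t next;       /* next block number in the current chunk */
    uint32_t remaining;  /* blocks left in the current chunk */
} WorkerScan;

/*
 * Return the next block for this worker, or INVALID_BLOCK when the scan is
 * exhausted.  A worker only touches the shared counter when its current
 * chunk is used up, so the blocks it reads in between are consecutive.
 */
static uint32_t
next_block(SharedScan *shared, WorkerScan *worker)
{
    assert(shared->chunk_size > 0);

    if (worker->remaining == 0)
    {
        /* Claim a whole chunk with one atomic add. */
        worker->next = atomic_fetch_add(&shared->nallocated,
                                        shared->chunk_size);
        worker->remaining = shared->chunk_size;
    }

    /* The last chunk may extend past the end of the relation. */
    if (worker->next >= shared->nblocks)
        return INVALID_BLOCK;

    worker->remaining--;
    return (uint32_t) worker->next++;
}
```

    With a chunk size of 4 and two workers calling next_block() alternately,
    the first worker sees blocks 0, 1, 2, 3 and the second sees 4, 5, 6, 7,
    rather than the interleaved 0, 2, 4, ... and 1, 3, 5, ... that
    one-block-at-a-time allocation would produce.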
    
    The initial patch was by Thomas Munro and showed some good improvements
    with just a fixed chunk size of 64 blocks and a simple ramp-down near
    the end of the scan.  The revisions to make the chunk size depend on the
    relation size and to ramp down in powers of two were done by me, along
    with quite extensive benchmarking to determine the optimal chunk sizes.
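
    To make the sizing and ramp-down idea concrete, here is a minimal sketch
    of one way to derive a power-of-two chunk size from the relation size and
    to halve it as the scan nears its end.  The three constants and all
    function names are illustrative assumptions; this message does not state
    the values the patch actually uses.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative tuning constants; assumptions, not taken from this message. */
#define TARGET_NCHUNKS   2048  /* aim for roughly this many chunks per scan */
#define RAMPDOWN_CHUNKS  64    /* start shrinking this many chunks from the end */
#define MAX_CHUNK_SIZE   8192  /* upper bound on blocks per chunk */

/* Round v up to the next power of two; v must be in [1, 2^31]. */
static uint32_t
next_power_of_two(uint32_t v)
{
    uint32_t p = 1;

    assert(v >= 1 && v <= (UINT32_C(1) << 31));
    while (p < v)
        p <<= 1;
    return p;
}

/* Pick a power-of-two chunk size proportional to the relation size. */
static uint32_t
initial_chunk_size(uint32_t nblocks)
{
    uint32_t target = nblocks / TARGET_NCHUNKS;
    uint32_t size = next_power_of_two(target > 0 ? target : 1);

    return size > MAX_CHUNK_SIZE ? MAX_CHUNK_SIZE : size;
}

/*
 * Halve the chunk size once allocation enters the last
 * chunk_size * RAMPDOWN_CHUNKS blocks of the relation.  Calling this before
 * each chunk allocation shrinks the chunks in powers of two, so workers
 * finish at about the same time instead of one worker being left with a
 * large final chunk.
 */
static uint32_t
rampdown_chunk_size(uint32_t chunk_size, uint64_t nallocated, uint32_t nblocks)
{
    uint64_t tail = (uint64_t) chunk_size * RAMPDOWN_CHUNKS;

    if (chunk_size > 1 && nallocated + tail > nblocks)
        chunk_size >>= 1;
    return chunk_size;
}
```

    Under these assumed constants, a relation of 32768 blocks starts with
    16-block chunks, and the size drops to 8 once fewer than 1024 blocks
    remain unallocated.  Small relations fall back to one block per chunk.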
    
    For the most part, benchmarks have shown significant performance
    improvements for large parallel sequential scans on Linux, FreeBSD and
    Windows using SSDs.  It's less clear how this affects performance on
    cloud providers; tests done so far have not produced results stable
    enough to give meaningful benchmarks.  It is possible that
    this could cause some performance regressions on more obscure filesystems,
    so we may need to later provide users with some ability to get something
    closer to the old behavior.  For now, let's leave that until we see that
    it's really required.
    
    Author: Thomas Munro, David Rowley
    Reviewed-by: Ranier Vilela, Soumyadeep Chakraborty, Robert Haas
    Reviewed-by: Amit Kapila, Kirk Jamison
    Discussion: https://postgr.es/m/CA+hUKGJ_EErDv41YycXcbMbCBkztA34+z1ts9VQH+ACRuvpxig@mail.gmail.com