    Allow multi-inserts during COPY into a partitioned table · 0d5f05cd
    Peter Eisentraut authored
    CopyFrom allows multi-inserts to be used for non-partitioned tables, but
    this was disabled for partitioned tables.  The reason appeared to be
    that each tuple may not belong to the same partition as the previous
    one.  Not allowing multi-inserts here greatly slowed down imports into
    partitioned tables, which could take twice as long as a copy into an
    equivalent non-partitioned table.  It seems wise to do something about
    this, so this change allows multi-inserts by flushing the tuples
    buffered so far to their partition whenever the next tuple belongs to a
    different partition, or when the buffer fills.  This improves
    performance when consecutive tuples in the stream commonly belong to
    the same partition.
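
    A minimal standalone sketch of that buffering rule follows (this is not
    the copy.c code; the types, names, and buffer size are hypothetical
    simplifications):

        #include <stddef.h>

        #define BUFFER_MAX 1000     /* flush once this many tuples are buffered */

        typedef struct Tuple
        {
            int partition_id;       /* partition the tuple routes to */
            /* ... column data ... */
        } Tuple;

        static Tuple  buffer[BUFFER_MAX];
        static size_t nbuffered = 0;
        static int    current_partition = -1;

        /* Stand-in for a heap multi-insert of all buffered tuples. */
        static void
        flush_buffer(int partition)
        {
            (void) partition;
            nbuffered = 0;
        }

        static void
        copy_one_tuple(Tuple tup)
        {
            /*
             * Flush when the incoming tuple targets a different partition
             * than the buffered tuples, or when the buffer is full.
             */
            if (nbuffered > 0 &&
                (tup.partition_id != current_partition ||
                 nbuffered >= BUFFER_MAX))
                flush_buffer(current_partition);

            current_partition = tup.partition_id;
            buffer[nbuffered++] = tup;
        }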
    
    In cases where the target partition changes on every tuple, using
    multi-inserts slightly slows performance.  To get around this we track
    the average size of the batches that have been inserted and adaptively
    enable or disable multi-inserts based on that size.  Some testing was
    done and the regression only seems to exist when the average size of
    the insert batch is close to 1, so let's just enable multi-inserts when
    the average size is at least 1.3.  More performance testing might
    reveal a better number for this, but since the slowdown was only 1-2%
    it does not seem critical enough to spend much time calculating it.  In
    any case it may depend on factors other than just the batch size.
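
    A sketch of that adaptive heuristic, with hypothetical names; the exact
    bookkeeping in copy.c may differ, and the averaging weights below are
    an assumption, but the 1.3 cutoff is the one mentioned above:

        #include <stdbool.h>
        #include <stddef.h>

        #define AVG_BATCH_THRESHOLD 1.3

        static double avg_tuples_per_batch = 1.0;
        static bool   use_multi_insert = false;

        /* Called each time a batch of tuples is flushed to a partition. */
        static void
        record_flush(size_t tuples_flushed)
        {
            /* running average; the 0.75/0.25 weighting is an assumption */
            avg_tuples_per_batch =
                avg_tuples_per_batch * 0.75 + (double) tuples_flushed * 0.25;

            /* fall back to single inserts when batches barely exceed one tuple */
            use_multi_insert = (avg_tuples_per_batch >= AVG_BATCH_THRESHOLD);
        }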
    
    Allowing multi-inserts for partitions required a bit of work around the
    per-tuple memory contexts, as we must flush the buffered tuples when
    the next tuple does not belong to the same partition, at which point
    there is no good time to reset the per-tuple context, since we've
    already built the new tuple in it by then.  To work around this we
    maintain two per-tuple contexts, switch between them every time the
    partition changes, and reset the old one.  This does mean that the
    first tuple of each batch is not allocated in the same memory context
    as the others, but that does not matter since we only reset a context
    once the previous batch has been inserted.
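
    A sketch of that context juggling in terms of PostgreSQL's
    memory-context API; the variable and helper names are hypothetical and
    this is not the actual copy.c code:

        #include "postgres.h"
        #include "utils/memutils.h"

        static MemoryContext cur_tuple_context;    /* current batch's tuples */
        static MemoryContext prev_tuple_context;   /* previous batch's tuples */

        /* Stand-in for the multi-insert flush of the buffered tuples. */
        static void
        flush_buffer(void)
        {
        }

        /* Called when the incoming tuple targets a different partition. */
        static void
        on_partition_change(void)
        {
            MemoryContext tmp;

            /* insert everything buffered for the previous partition first */
            flush_buffer();

            /*
             * Swap the two per-tuple contexts and reset the one we switch
             * to: it only holds tuples from the batch before last, which
             * has already been inserted.  The tuple that triggered the
             * change stays in the other context and simply becomes the
             * first tuple of the new batch.
             */
            tmp = cur_tuple_context;
            cur_tuple_context = prev_tuple_context;
            prev_tuple_context = tmp;

            MemoryContextReset(cur_tuple_context);
        }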
    
    Author: David Rowley <david.rowley@2ndquadrant.com>
    Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>