    Do COPY FROM encoding conversion/verification in larger chunks. · f82de5c4
    Heikki Linnakangas authored
    This gives a small performance gain by reducing the number of calls
    to the conversion/verification function and letting it work with
    larger inputs. Also, reorganizing the input pipeline makes it easier
    to parallelize the input parsing: after the input has been converted
    to the database encoding, the next stage of finding the newlines can
    be done in parallel, because there cannot be any newline chars
    "embedded" in multi-byte characters in the encodings that we support
    as server encodings.
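
A minimal sketch of why the newline search becomes a plain byte scan once the input is in a server encoding. It assumes UTF-8 as the database encoding, where every byte of a multi-byte character has its high bit set, so a 0x0A byte is always a real newline. The helper `find_line_ends` is hypothetical and is not taken from copyfrom.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not PostgreSQL code: store the offset just past each
 * '\n' byte found in buf[0..len). Returns the number of offsets stored,
 * which is at most max.
 *
 * Assumption: buf is already in a server encoding such as UTF-8, so no
 * multi-byte character can contain the byte 0x0A and the scan needs no
 * knowledge of character boundaries.
 */
static size_t
find_line_ends(const char *buf, size_t len, size_t *ends, size_t max)
{
    size_t      n = 0;
    const char *p = buf;
    const char *end = buf + len;

    assert(buf != NULL);

    while (n < max && p < end)
    {
        /* memchr looks at raw bytes only */
        const char *nl = memchr(p, '\n', (size_t) (end - p));

        if (nl == NULL)
            break;
        ends[n++] = (size_t) (nl - buf) + 1;
        p = nl + 1;
    }
    return n;
}
```

Because the scan depends only on byte values, disjoint slices of a converted buffer could in principle be handed to separate workers, each running the same loop on its own slice.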
    
    This changes behavior in one corner case: if client and server
    encodings are the same single-byte encoding (e.g. latin1), previously
    the input would not be checked for zero bytes ('\0'). Any fields
    containing zero bytes would be truncated at the zero. But if encoding
    conversion was needed, the conversion routine would throw an error on
    the zero. After this commit, the input is always checked for zeros.
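
A small illustration of the corner case, offered as a sketch rather than the actual verification code. A field held as a C string ends at its first zero byte, which is how the truncation described above arises. The function `chunk_has_zero_byte` is a hypothetical stand-in for a check applied to every input chunk, whatever the encodings are.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical check, not the PostgreSQL routine: report whether the raw
 * input chunk buf[0..len) contains a zero byte anywhere.
 */
static bool
chunk_has_zero_byte(const char *buf, size_t len)
{
    assert(buf != NULL);

    return memchr(buf, '\0', len) != NULL;
}

/*
 * Length a field would appear to have if it were treated as a
 * NUL-terminated C string without any prior check.
 */
static size_t
apparent_field_length(const char *field)
{
    return strlen(field);
}
```

For the five input bytes `a`, `b`, zero, `c`, `d`, the apparent length is 2, while the check flags the chunk so that an error can be raised instead of silently dropping `cd`.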
    
    Reviewed-by: John Naylor
    Discussion: https://www.postgresql.org/message-id/e7861509-3960-538a-9025-b75a61188e01%40iki.fi
copyfrom.c 48.6 KB