    Change floating-point output format for improved performance. · 02ddd499
    Andrew Gierth authored
    Previously, floating-point output was done by rounding to a specific
    decimal precision: by default, to 6 significant digits for float4 or
    15 for float8 (losing information), or to a precision adjusted by
    extra_float_digits. Drivers that wanted exact float values, and
    applications like pg_dump that must preserve values exactly, set
    extra_float_digits=3 (or sometimes 2 for historical reasons, though
    this isn't enough for float4).
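    The C sketch below shows why 2 extra digits fall short for float4. It
    assumes the old output behaved like printf's %.*g with FLT_DIG (6) plus
    extra_float_digits significant digits, and its helper names are
    hypothetical. Floats just above 1000 are spaced 2^-14 apart (about
    6.1e-5). At 8 significant digits, decimals there only resolve steps of
    1e-4, so some adjacent floats collapse to the same string. At 9 digits
    every float reads back exactly.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Next float above f; valid only for positive finite f (bumps the bit pattern). */
static float next_float_up(float f)
{
    uint32_t bits;

    memcpy(&bits, &f, sizeof bits);
    bits++;
    memcpy(&f, &bits, sizeof bits);
    return f;
}

/* Does f survive printing with ndig significant digits and reading back? */
static int float4_roundtrips(float f, int ndig)
{
    char buf[64];

    snprintf(buf, sizeof buf, "%.*g", ndig, (double) f);
    return strtof(buf, NULL) == f;
}

/* Count round-trip failures over `count` consecutive floats from `start`. */
static int count_failures(float start, int count, int ndig)
{
    int   failures = 0;
    float f = start;

    for (int i = 0; i < count; i++)
    {
        if (!float4_roundtrips(f, ndig))
            failures++;
        f = next_float_up(f);
    }
    return failures;
}
```

    With a correctly rounding C library such as glibc,
    count_failures(1000.0f, 1000, 8) is nonzero (FLT_DIG + 2 digits) and
    count_failures(1000.0f, 1000, 9) is zero (FLT_DIG + 3 digits).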
    
    Unfortunately, decimal rounded output is slow enough to become a
    noticeable bottleneck when dealing with large result sets or COPY of
    large tables when many floating-point values are involved.
    
    Floating-point output can be done much faster when the output is not
    rounded to a specific decimal length, but rather is chosen as the
    shortest decimal representation that is closer to the original float
    value than to any other value representable in the same precision. The
    recently published Ryu algorithm by Ulf Adams is both relatively
    simple and remarkably fast.
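    A deliberately naive C sketch makes "the shortest representation that
    reads back as the same value" concrete. It tries 1, 2, ... 17
    significant digits with %.*g and stops at the first string that
    strtod() maps back to the identical double. This is not Ryu. It may
    format and parse up to 17 times per value, which is exactly the cost
    Ryu avoids. At power-of-two boundaries, where the rounding interval is
    asymmetric, it may also pick a longer string than Ryu would. The
    sketch assumes a correctly rounding printf/strtod (as in glibc). The
    layout (e.g. "1e+02") is printf's, not the patch's.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Naive shortest round-trip formatting for a finite double.  Tries 1..17
 * significant digits; 17 always suffices for IEEE binary64.  Slow by
 * design: a reference for the definition, not an implementation of Ryu. */
static void shortest_naive(double v, char *buf, size_t len)
{
    for (int ndig = 1; ndig <= 17; ndig++)
    {
        snprintf(buf, len, "%.*g", ndig, v);
        if (strtod(buf, NULL) == v)
            return;             /* first (shortest) string that reads back */
    }
}
```

    For example, 0.1 comes out as "0.1", 1.0/3.0 as "0.3333333333333333"
    (16 digits), and 0.1 + 0.2 needs all 17 digits: "0.30000000000000004".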
    
    Accordingly, change float4out/float8out to output shortest decimal
    representations if extra_float_digits is greater than 0, and make that
    the new default (extra_float_digits=1). Applications that need rounded
    output can set extra_float_digits back to 0 or below, and take the
    resulting performance hit.
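    The resulting float8 rule can be sketched in C as follows.
    float8_out_sketch is a hypothetical name, and the shortest branch uses
    a naive search rather than Ryu. The rounded branch assumes the old
    behavior: print DBL_DIG (15) plus extra_float_digits significant
    digits, clamped to at least 1.

```c
#include <float.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch of the float8 output rule: shortest round-trip
 * digits when extra_float_digits > 0, otherwise rounded output. */
static void float8_out_sketch(double v, int extra_float_digits,
                              char *buf, size_t len)
{
    if (extra_float_digits > 0)
    {
        /* Shortest digits via a naive search (Ryu computes these directly). */
        for (int ndig = 1; ndig <= 17; ndig++)
        {
            snprintf(buf, len, "%.*g", ndig, v);
            if (strtod(buf, NULL) == v)
                return;
        }
    }
    else
    {
        /* Rounded output: DBL_DIG (15) reduced by a zero or negative setting. */
        int ndig = DBL_DIG + extra_float_digits;

        if (ndig < 1)
            ndig = 1;
        snprintf(buf, len, "%.*g", ndig, v);
    }
}
```

    Under this rule every positive setting produces the same text.
    For example, 0.1 + 0.2 prints as "0.30000000000000004" with
    extra_float_digits=1 but as "0.3" with 0, and 1.0/3.0 prints as
    "0.33333" with -10.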
    
    We make one concession to portability for systems with buggy
    floating-point input: we do not output decimal values that fall
    exactly halfway between adjacent representable binary values (which
    would rely on the reader doing round-to-nearest-even correctly). This
    is known to be a problem at least for VS2013 on Windows.
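    Here is what such a reader must get right. 2^53 + 1 =
    9007199254740993 lies exactly halfway between the adjacent doubles
    2^53 and 2^53 + 2, so a correct strtod() must round to the neighbour
    with the even significand. The C sketch below (function name
    hypothetical) probes two such midpoints. A reader with the kind of bug
    described above could plausibly fail it, which is why never emitting
    midpoint strings sidesteps the issue.

```c
#include <stdlib.h>

/* Probe whether strtod() rounds exact decimal midpoints to even.
 * 9007199254740993 = 2^53 + 1 lies halfway between 2^53 and 2^53 + 2;
 * 9007199254740995 = 2^53 + 3 lies halfway between 2^53 + 2 and 2^53 + 4.
 * In each case the neighbour with the even significand is the correct
 * result: 2^53 and 2^53 + 4 respectively. */
static int reader_rounds_ties_to_even(void)
{
    return strtod("9007199254740993", NULL) == 9007199254740992.0
        && strtod("9007199254740995", NULL) == 9007199254740996.0;
}
```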
    
    Our version of the Ryu code originates from
    https://github.com/ulfjack/ryu/ at commit c9c3fb1979, but with the
    following (significant) modifications:
    
     - Output format is changed to use fixed-point notation for small
       exponents, as printf would, and also to use lowercase 'e', a
       minimum of 2 exponent digits, and a mandatory sign on the exponent,
       to keep the formatting as close as possible to previous output.
    
     - The output of exact midpoint values is disabled as noted above.
    
     - The integer fast-path code is changed somewhat (since we have
       fixed-point output and the upstream did not).
    
     - Our project style has been largely applied to the code, except that
       C99 declaration-after-statement has been retained, as an exception
       to our present policy.
    
     - Most of upstream's debugging and conditionals are removed, and we
       use our own configure tests to determine things like uint128
       availability.
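    The output layout described in the first item of the list above can be
    sketched as a formatter. It takes the digit string and decimal exponent
    produced by the shortest-digits step. The name format_shortest and the
    switch-over rule are assumptions for illustration. The rule uses
    fixed-point when the scientific exponent is between -4 and 14, as
    printf's %g does at precision 15. The patch's actual thresholds may
    differ. The sign is left to the caller.

```c
#include <stdio.h>
#include <string.h>

/* Lay out a positive value given its significant digits and scientific
 * exponent, i.e. value = d1.d2d3... x 10^exp10.  Assumed rule: fixed-point
 * for -4 <= exp10 < 15, otherwise d.ddde+XX with lowercase 'e', an explicit
 * exponent sign, and at least two exponent digits.  out must hold at least
 * 32 bytes for up to 17 digits. */
static void format_shortest(const char *digits, int exp10, char *out)
{
    int   n = (int) strlen(digits);
    char *p = out;

    if (exp10 >= -4 && exp10 < 15)
    {
        if (exp10 < 0)
        {
            /* 0.000ddd: -exp10 - 1 zeros between the point and the digits */
            *p++ = '0';
            *p++ = '.';
            for (int i = 0; i < -exp10 - 1; i++)
                *p++ = '0';
            memcpy(p, digits, n);
            p += n;
        }
        else
        {
            /* exp10 + 1 integer digits, padded with zeros if needed */
            int intlen = exp10 + 1;

            for (int i = 0; i < intlen; i++)
                *p++ = (i < n) ? digits[i] : '0';
            if (n > intlen)
            {
                *p++ = '.';
                memcpy(p, digits + intlen, n - intlen);
                p += n - intlen;
            }
        }
        *p = '\0';
    }
    else
    {
        *p++ = digits[0];
        if (n > 1)
        {
            *p++ = '.';
            memcpy(p, digits + 1, n - 1);
            p += n - 1;
        }
        /* "%+03d": sign always shown, zero-padded to width 3 (e.g. "+05") */
        sprintf(p, "e%+03d", exp10);
    }
}
```

    For example, ("123", 4) becomes "12300", ("1", -4) becomes "0.0001",
    and ("12345", -7) becomes "1.2345e-07".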
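    For the integer fast-path item, one plausible check is sketched below.
    It decodes the double's bits and accepts the value only if it is an
    integer in [1, 2^53), whose digits could then be printed directly with
    ordinary integer formatting. The helper name small_int_value is
    hypothetical, and this is not claimed to match the patch's code.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* If d is an integer in [1, 2^53), store it in *out and return true.
 * IEEE binary64: value = m * 2^e2 with m = 2^52 | fraction bits and
 * e2 = biased_exponent - 1075 for normal numbers. */
static bool small_int_value(double d, uint64_t *out)
{
    uint64_t bits;

    memcpy(&bits, &d, sizeof bits);

    uint64_t fraction = bits & ((UINT64_C(1) << 52) - 1);
    int      biased_exp = (int) ((bits >> 52) & 0x7FF);
    int      e2 = biased_exp - 1075;

    if ((bits >> 63) != 0 || biased_exp == 0)
        return false;           /* negative, zero, or subnormal */
    if (e2 > 0 || e2 < -52)
        return false;           /* >= 2^53 (or inf/NaN), or < 1 */

    uint64_t m = fraction | (UINT64_C(1) << 52);
    uint64_t below_point = (UINT64_C(1) << -e2) - 1;

    if ((m & below_point) != 0)
        return false;           /* has a fractional part */
    *out = m >> -e2;
    return true;
}
```

    With fixed-point output, a value accepted here can be printed as its
    plain integer digits (1000.0 as "1000").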
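    The uint128 item concerns the 64x64->128-bit multiplication at the core
    of Ryu. The sketch below shows the usual pattern. It uses a native
    128-bit type when the compiler has one, and otherwise falls back to four
    32x32->64-bit partial products. Here the compiler-predefined
    __SIZEOF_INT128__ macro stands in for a configure test, and the
    function names are hypothetical.

```c
#include <stdint.h>

/* Portable 64x64 -> 128-bit multiply built from 32-bit halves.  Returns
 * the low 64 bits and stores the high 64 bits in *hi. */
static uint64_t umul128_portable(uint64_t a, uint64_t b, uint64_t *hi)
{
    uint64_t aLo = (uint32_t) a, aHi = a >> 32;
    uint64_t bLo = (uint32_t) b, bHi = b >> 32;

    uint64_t b00 = aLo * bLo;
    uint64_t b01 = aLo * bHi;
    uint64_t b10 = aHi * bLo;
    uint64_t b11 = aHi * bHi;

    uint64_t mid1 = b10 + (b00 >> 32);         /* cannot overflow */
    uint64_t mid2 = b01 + (uint32_t) mid1;     /* cannot overflow */

    *hi = b11 + (mid1 >> 32) + (mid2 >> 32);
    return (mid2 << 32) | (uint32_t) b00;
}

#if defined(__SIZEOF_INT128__)
/* Native path: the compiler provides a 128-bit unsigned type. */
static uint64_t umul128(uint64_t a, uint64_t b, uint64_t *hi)
{
    unsigned __int128 p = (unsigned __int128) a * b;

    *hi = (uint64_t) (p >> 64);
    return (uint64_t) p;
}
#else
#define umul128 umul128_portable
#endif
```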
    
    Changing the float output format obviously affects a number of
    regression tests. This patch uses an explicit setting of
    extra_float_digits=0 for test output that is not expected to be
    exactly reproducible (e.g. due to numerical instability or differing
    algorithms for transcendental functions).
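    Here is why extra_float_digits=0 helps. With it, float8 values print
    with 15 significant digits, so results that differ only in the last bit
    print identically. Transcendental functions computed by different
    algorithms can produce such results. Shortest output exposes the
    difference. The C sketch below uses two adjacent doubles: 0.1 and the
    next double above it, 0.10000000000000002. same_text is a hypothetical
    helper.

```c
#include <stdio.h>
#include <string.h>

/* Do a and b print identically with ndig significant digits? */
static int same_text(double a, double b, int ndig)
{
    char sa[32], sb[32];

    snprintf(sa, sizeof sa, "%.*g", ndig, a);
    snprintf(sb, sizeof sb, "%.*g", ndig, b);
    return strcmp(sa, sb) == 0;
}
```

    At 15 digits both values print as "0.1". At 17 digits they print as
    "0.10000000000000001" and "0.10000000000000002".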
    
    Conversions from floats to numeric are unchanged by this patch. These
    may appear in index expressions, and it is not yet clear whether any
    change should be made, so that question is left for another day.
    
    This patch assumes that the only supported floating-point format is
    now IEEE 754, and the documentation is updated to reflect that.
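    A minimal compile-time guard for that assumption might look like the
    sketch below. The checks are necessary rather than sufficient. They
    test the standard <float.h> parameters of binary32 and binary64. The
    hypothetical helper double_bits exposes a double's bit pattern so that
    known encodings can be spot-checked, such as 1.0 = 0x3FF0000000000000.

```c
#include <float.h>
#include <stdint.h>
#include <string.h>

/* Reject builds whose float/double parameters are not IEEE 754
 * binary32/binary64. */
_Static_assert(FLT_RADIX == 2, "binary floating point expected");
_Static_assert(FLT_MANT_DIG == 24 && FLT_MAX_EXP == 128,
               "float must be IEEE binary32");
_Static_assert(DBL_MANT_DIG == 53 && DBL_MAX_EXP == 1024,
               "double must be IEEE binary64");
_Static_assert(sizeof(double) == sizeof(uint64_t), "double must be 64 bits");

/* Raw bit pattern of a double, for spot-checking its encoding. */
static uint64_t double_bits(double d)
{
    uint64_t bits;

    memcpy(&bits, &d, sizeof bits);
    return bits;
}
```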
    
    Code by me, adapting the work of Ulf Adams and other contributors.
    
    References:
    https://dl.acm.org/citation.cfm?id=3192369
    
    Reviewed-by: Tom Lane, Andres Freund, Donald Dong
    Discussion: https://postgr.es/m/87r2el1bx6.fsf@news-spur.riddles.org.uk