This directory contains source for a library of binary -> decimal | |

and decimal -> binary conversion routines, for single-, double-, | |

and extended-precision IEEE binary floating-point arithmetic, and | |

other IEEE-like binary floating-point, including "double double", | |

as in | |

T. J. Dekker, "A Floating-Point Technique for Extending the | |

Available Precision", Numer. Math. 18 (1971), pp. 224-242 | |

and | |

"Inside Macintosh: PowerPC Numerics", Addison-Wesley, 1994 | |

The conversion routines use double-precision floating-point arithmetic | |

and, where necessary, high precision integer arithmetic. The routines | |

are generalizations of the strtod and dtoa routines described in | |

David M. Gay, "Correctly Rounded Binary-Decimal and | |

Decimal-Binary Conversions", Numerical Analysis Manuscript | |

No. 90-10, Bell Labs, Murray Hill, 1990; | |

http://cm.bell-labs.com/cm/cs/what/ampl/REFS/rounding.ps.gz | |

(based in part on papers by Clinger and Steele & White: see the | |

references in the above paper). | |

The present conversion routines should be able to use any of IEEE binary, | |

VAX, or IBM-mainframe double-precision arithmetic internally, but I (dmg) | |

have so far only had a chance to test them with IEEE double precision | |

arithmetic. | |

The core conversion routines are strtodg for decimal -> binary conversions | |

and gdtoa for binary -> decimal conversions. These routines operate | |

on arrays of unsigned 32-bit integers of type ULong, a signed 32-bit | |

exponent of type Long, and arithmetic characteristics described in | |

struct FPI; FPI, Long, and ULong are defined in gdtoa.h. File arith.h | |

is supposed to provide #defines that cause gdtoa.h to define its | |

types correctly. File arithchk.c is source for a program that | |

generates a suitable arith.h on all systems where I've been able to | |

test it. | |

The core conversion routines are meant to be called by helper routines | |

that know details of the particular binary arithmetic of interest and | |

convert. The present directory provides helper routines for 5 variants | |

of IEEE binary floating-point arithmetic, each indicated by one or | |

two letters: | |

f IEEE single precision | |

d IEEE double precision | |

x IEEE extended precision, as on Intel 80x87 | |

and software emulations of Motorola 68xxx chips | |

that do not pad the way the 68xxx does, but | |

only store 80 bits | |

xL IEEE extended precision, as on Motorola 68xxx chips | |

Q quad precision, as on Sun Sparc chips | |

dd double double, pairs of IEEE double numbers | |

whose sum is the desired value | |

For decimal -> binary conversions, there are three families of | |

helper routines: one for round-nearest (or the current rounding | |

mode on IEEE-arithmetic systems that provide the C99 fegetround() | |

function, if compiled with -DHonor_FLT_ROUNDS): | |

strtof | |

strtod | |

strtodd | |

strtopd | |

strtopf | |

strtopx | |

strtopxL | |

strtopQ | |

one with rounding direction specified: | |

strtorf | |

strtord | |

strtordd | |

strtorx | |

strtorxL | |

strtorQ | |

and one for computing an interval (at most one bit wide) that contains | |

the decimal number: | |

strtoIf | |

strtoId | |

strtoIdd | |

strtoIx | |

strtoIxL | |

strtoIQ | |

The latter call strtoIg, which makes one call on strtodg and adjusts | |

the result to provide the desired interval. On systems where native | |

arithmetic can easily make one-ulp adjustments on values in the | |

desired floating-point format, it might be more efficient to use the | |

native arithmetic. Routine strtodI is a variant of strtoId that | |

illustrates one way to do this for IEEE binary double-precision | |

arithmetic -- but whether this is more efficient remains to be seen. | |

Functions strtod and strtof have "natural" return types, float and | |

double -- strtod is specified by the C standard, and strtof appears | |

in the stdlib.h of some systems, such as (at least some) Linux systems. | |

The other functions write their results to their final argument(s): | |

to the final two argument for the strtoI... (interval) functions, | |

and to the final argument for the others (strtop... and strtor...). | |

Where possible, these arguments have "natural" return types (double* | |

or float*), to permit at least some type checking. In reality, they | |

are viewed as arrays of ULong (or, for the "x" functions, UShort) | |

values. On systems where long double is the appropriate type, one can | |

pass long double* final argument(s) to these routines. The int value | |

that these routines return is the return value from the call they make | |

on strtodg; see the enum of possible return values in gdtoa.h. | |

Source files g_ddfmt.c, misc.c, smisc.c, strtod.c, strtodg.c, and ulp.c | |

should use true IEEE double arithmetic (not, e.g., double extended), | |

at least for storing (and viewing the bits of) the variables declared | |

"double" within them. | |

One detail indicated in struct FPI is whether the target binary | |

arithmetic departs from the IEEE standard by flushing denormalized | |

numbers to 0. On systems that do this, the helper routines for | |

conversion to double-double format (when compiled with | |

Sudden_Underflow #defined) penalize the bottom of the exponent | |

range so that they return a nonzero result only when the least | |

significant bit of the less significant member of the pair of | |

double values returned can be expressed as a normalized double | |

value. An alternative would be to drop to 53-bit precision near | |

the bottom of the exponent range. To get correct rounding, this | |

would (in general) require two calls on strtodg (one specifying | |

126-bit arithmetic, then, if necessary, one specifying 53-bit | |

arithmetic). | |

By default, the core routine strtodg and strtod set errno to ERANGE | |

if the result overflows to +Infinity or underflows to 0. Compile | |

these routines with NO_ERRNO #defined to inhibit errno assignments. | |

Routine strtod is based on netlib's "dtoa.c from fp", and | |

(f = strtod(s,se)) is more efficient for some conversions than, say, | |

strtord(s,se,1,&f). Parts of strtod require true IEEE double | |

arithmetic with the default rounding mode (round-to-nearest) and, on | |

systems with IEEE extended-precision registers, double-precision | |

(53-bit) rounding precision. If the machine uses (the equivalent of) | |

Intel 80x87 arithmetic, the call | |

_control87(PC_53, MCW_PC); | |

does this with many compilers. Whether this or another call is | |

appropriate depends on the compiler; for this to work, it may be | |

necessary to #include "float.h" or another system-dependent header | |

file. | |

Source file strtodnrp.c gives a strtod that does not require 53-bit | |

rounding precision on systems (such as Intel IA32 systems) that may | |

suffer double rounding due to use of extended-precision registers. | |

For some conversions this variant of strtod is less efficient than the | |

one in strtod.c when the latter is run with 53-bit rounding precision. | |

The values that the strto* routines return for NaNs are determined by | |

gd_qnan.h, which the makefile generates by running the program whose | |

source is qnan.c. Note that the rules for distinguishing signaling | |

from quiet NaNs are system-dependent. For cross-compilation, you need | |

to determine arith.h and gd_qnan.h suitably, e.g., using the | |

arithmetic of the target machine. | |

C99's hexadecimal floating-point constants are recognized by the | |

strto* routines (but this feature has not yet been heavily tested). | |

Compiling with NO_HEX_FP #defined disables this feature. | |

When compiled with -DINFNAN_CHECK, the strto* routines recognize C99's | |

NaN and Infinity syntax. Moreover, unless No_Hex_NaN is #defined, the | |

strto* routines also recognize C99's NaN(...) syntax: they accept | |

(case insensitively) strings of the form NaN(x), where x is a string | |

of hexadecimal digits and spaces; if there is only one string of | |

hexadecimal digits, it is taken for the fraction bits of the resulting | |

NaN; if there are two or more strings of hexadecimal digits, each | |

string is assigned to the next available sequence of 32-bit words of | |

fractions bits (starting with the most significant), right-aligned in | |

each sequence. | |

For binary -> decimal conversions, I've provided just one family | |

of helper routines: | |

g_ffmt | |

g_dfmt | |

g_ddfmt | |

g_xfmt | |

g_xLfmt | |

g_Qfmt | |

which do a "%g" style conversion either to a specified number of decimal | |

places (if their ndig argument is positive), or to the shortest | |

decimal string that rounds to the given binary floating-point value | |

(if ndig <= 0). They write into a buffer supplied as an argument | |

and return either a pointer to the end of the string (a null character) | |

in the buffer, if the buffer was long enough, or 0. Other forms of | |

conversion are easily done with the help of gdtoa(), such as %e or %f | |

style and conversions with direction of rounding specified (so that, if | |

desired, the decimal value is either >= or <= the binary value). | |

On IEEE-arithmetic systems that provide the C99 fegetround() function, | |

if compiled with -DHonor_FLT_ROUNDS, these routines honor the current | |

rounding mode. | |

For an example of more general conversions based on dtoa(), see | |

netlib's "printf.c from ampl/solvers". | |

For double-double -> decimal, g_ddfmt() assumes IEEE-like arithmetic | |

of precision max(126, #bits(input)) bits, where #bits(input) is the | |

number of mantissa bits needed to represent the sum of the two double | |

values in the input. | |

The makefile creates a library, gdtoa.a. To use the helper | |

routines, a program only needs to include gdtoa.h. All the | |

source files for gdtoa.a include a more extensive gdtoaimp.h; | |

among other things, gdtoaimp.h has #defines that make "internal" | |

names end in _D2A. To make a "system" library, one could modify | |

these #defines to make the names start with __. | |

Various comments about possible #defines appear in gdtoaimp.h, | |

but for most purposes, arith.h should set suitable #defines. | |

Systems with preemptive scheduling of multiple threads require some | |

manual intervention. On such systems, it's necessary to compile | |

dmisc.c, dtoa.c gdota.c, and misc.c with MULTIPLE_THREADS #defined, | |

and to provide (or suitably #define) two locks, acquired by | |

ACQUIRE_DTOA_LOCK(n) and freed by FREE_DTOA_LOCK(n) for n = 0 or 1. | |

(The second lock, accessed in pow5mult, ensures lazy evaluation of | |

only one copy of high powers of 5; omitting this lock would introduce | |

a small probability of wasting memory, but would otherwise be harmless.) | |

Routines that call dtoa or gdtoa directly must also invoke freedtoa(s) | |

to free the value s returned by dtoa or gdtoa. It's OK to do so whether | |

or not MULTIPLE_THREADS is #defined, and the helper g_*fmt routines | |

listed above all do this indirectly (in gfmt_D2A(), which they all call). | |

By default, there is a private pool of memory of length 2000 bytes | |

for intermediate quantities, and MALLOC (see gdtoaimp.h) is called only | |

if the private pool does not suffice. 2000 is large enough that MALLOC | |

is called only under very unusual circumstances (decimal -> binary | |

conversion of very long strings) for conversions to and from double | |

precision. For systems with preemptively scheduled multiple threads | |

or for conversions to extended or quad, it may be appropriate to | |

#define PRIVATE_MEM nnnn, where nnnn is a suitable value > 2000. | |

For extended and quad precisions, -DPRIVATE_MEM=20000 is probably | |

plenty even for many digits at the ends of the exponent range. | |

Use of the private pool avoids some overhead. | |

Directory test provides some test routines. See its README. | |

I've also tested this stuff (except double double conversions) | |

with Vern Paxson's testbase program: see | |

V. Paxson and W. Kahan, "A Program for Testing IEEE Binary-Decimal | |

Conversion", manuscript, May 1991, | |

ftp://ftp.ee.lbl.gov/testbase-report.ps.Z . | |

(The same ftp directory has source for testbase.) | |

Some system-dependent additions to CFLAGS in the makefile: | |

HU-UX: -Aa -Ae | |

OSF (DEC Unix): -ieee_with_no_inexact | |

SunOS 4.1x: -DKR_headers -DBad_float_h | |

If you want to put this stuff into a shared library and your | |

operating system requires export lists for shared libraries, | |

the following would be an appropriate export list: | |

dtoa | |

freedtoa | |

g_Qfmt | |

g_ddfmt | |

g_dfmt | |

g_ffmt | |

g_xLfmt | |

g_xfmt | |

gdtoa | |

strtoIQ | |

strtoId | |

strtoIdd | |

strtoIf | |

strtoIx | |

strtoIxL | |

strtod | |

strtodI | |

strtodg | |

strtof | |

strtopQ | |

strtopd | |

strtopdd | |

strtopf | |

strtopx | |

strtopxL | |

strtorQ | |

strtord | |

strtordd | |

strtorf | |

strtorx | |

strtorxL | |

When time permits, I (dmg) hope to write in more detail about the | |

present conversion routines; for now, this README file must suffice. | |

Meanwhile, if you wish to write helper functions for other kinds of | |

IEEE-like arithmetic, some explanation of struct FPI and the bits | |

array may be helpful. Both gdtoa and strtodg operate on a bits array | |

described by FPI *fpi. The bits array is of type ULong, a 32-bit | |

unsigned integer type. Floating-point numbers have fpi->nbits bits, | |

with the least significant 32 bits in bits[0], the next 32 bits in | |

bits[1], etc. These numbers are regarded as integers multiplied by | |

2^e (i.e., 2 to the power of the exponent e), where e is the second | |

argument (be) to gdtoa and is stored in *exp by strtodg. The minimum | |

and maximum exponent values fpi->emin and fpi->emax for normalized | |

floating-point numbers reflect this arrangement. For example, the | |

P754 standard for binary IEEE arithmetic specifies doubles as having | |

53 bits, with normalized values of the form 1.xxxxx... times 2^(b-1023), | |

with 52 bits (the x's) and the biased exponent b represented explicitly; | |

b is an unsigned integer in the range 1 <= b <= 2046 for normalized | |

finite doubles, b = 0 for denormals, and b = 2047 for Infinities and NaNs. | |

To turn an IEEE double into the representation used by strtodg and gdtoa, | |

we multiply 1.xxxx... by 2^52 (to make it an integer) and reduce the | |

exponent e = (b-1023) by 52: | |

fpi->emin = 1 - 1023 - 52 | |

fpi->emax = 1046 - 1023 - 52 | |

In various wrappers for IEEE double, we actually write -53 + 1 rather | |

than -52, to emphasize that there are 53 bits including one implicit bit. | |

Field fpi->rounding indicates the desired rounding direction, with | |

possible values | |

FPI_Round_zero = toward 0, | |

FPI_Round_near = unbiased rounding -- the IEEE default, | |

FPI_Round_up = toward +Infinity, and | |

FPI_Round_down = toward -Infinity | |

given in gdtoa.h. | |

Field fpi->sudden_underflow indicates whether strtodg should return | |

denormals or flush them to zero. Normal floating-point numbers have | |

bit fpi->nbits in the bits array on. Denormals have it off, with | |

exponent = fpi->emin. Strtodg provides distinct return values for normals | |

and denormals; see gdtoa.h. | |

Compiling g__fmt.c, strtod.c, and strtodg.c with -DUSE_LOCALE causes | |

the decimal-point character to be taken from the current locale; otherwise | |

it is '.'. | |

Source files dtoa.c and strtod.c in this directory are derived from | |

netlib's "dtoa.c from fp" and are meant to function equivalently. | |

When compiled with Honor_FLT_ROUNDS #defined (on systems that provide | |

FLT_ROUNDS and fegetround() as specified in the C99 standard), they | |

honor the current rounding mode. Because FLT_ROUNDS is buggy on some | |

(Linux) systems -- not reflecting calls on fesetround(), as the C99 | |

standard says it should -- when Honor_FLT_ROUNDS is #defined, the | |

current rounding mode is obtained from fegetround() rather than from | |

FLT_ROUNDS, unless Trust_FLT_ROUNDS is also #defined. | |

Compile with -DUSE_LOCALE to use the current locale; otherwise | |

decimal points are assumed to be '.'. With -DUSE_LOCALE, unless | |

you also compile with -DNO_LOCALE_CACHE, the details about the | |

current "decimal point" character string are cached and assumed not | |

to change during the program's execution. | |

Please send comments to David M. Gay (dmg at acm dot org, with " at " | |

changed at "@" and " dot " changed to "."). |