Accuracy Supplement
Introduction
VectorZ is designed to provide the highest performance while maintaining superior accuracy. The accuracy achieved depends on the data representation (single-precision floating point) and on the algorithms employed. Single precision provides 24 bits of precision in the mantissa (23 stored bits plus an implicit leading bit). This greatly exceeds the precision of most A/D converters and corresponds to a signal-to-noise ratio of approximately 144 dB.
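As a rough check, treating one part in 2^24 as the noise floor and using 20 log10(2) ≈ 6.02 dB per bit reproduces the 144 dB figure:

```latex
\mathrm{SNR} \approx 20\log_{10}\!\left(2^{24}\right) = 24 \cdot 20\log_{10} 2 \approx 24 \times 6.02\ \mathrm{dB} \approx 144.5\ \mathrm{dB}
```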
Iterative Approximation Algorithms
Some functions cannot be computed exactly and are instead approximated by iterative methods. Such functions always have some error when compared with the exact results. Functions like sin, exp, log, and sqrt, and even reciprocal and divide, are evaluated using iterative methods.
VectorZ is designed to provide high accuracy for functions evaluated using iterative methods. This is more challenging to accomplish with SIMD instructions. A scalar method can iterate until a desired error criterion is met for its single input. Because a SIMD instruction operates on multiple data elements at once, iterations must continue until every data element in the SIMD register has converged.
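To illustrate why SIMD makes this harder, the sketch below runs a scalar Newton-Raphson reciprocal until each value meets its own tolerance and counts the steps per value. Newton-Raphson, the linear first guess 48/17 - 32/17·m, and the tolerance 2^-24 are assumed, illustrative choices; the document does not say which iteration VectorZ uses, and the function name is hypothetical.

```python
import math

def reciprocal_until_converged(x: float, tol: float = 2.0 ** -24) -> tuple[float, int]:
    """Scalar Newton-Raphson for 1/x that stops as soon as this one value converges.

    Assumes x is positive and finite. Returns (estimate of 1/x, iterations used).
    """
    m, e = math.frexp(x)                   # x = m * 2**e with 0.5 <= m < 1
    y = 48.0 / 17.0 - (32.0 / 17.0) * m    # linear first guess for 1/m
    iterations = 0
    while abs(1.0 - m * y) > tol:          # data-dependent stopping test
        y = y * (2.0 - m * y)              # Newton-Raphson step; squares the relative error
        iterations += 1
    return math.ldexp(y, -e), iterations


# Hypothetical contents of one four-lane SIMD register.
lanes = [1.0, 0.57, 3.0, 1000.0]
counts = [reciprocal_until_converged(x)[1] for x in lanes]
print("iterations per lane:", counts)
print("iterations a SIMD loop must run:", max(counts))
```

The lane holding 0.57 starts closer to its answer and needs fewer steps than the others, but a SIMD loop over the same register would still have to run as many iterations as its slowest lane.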
VectorZ addresses this by using a fixed number of iterations for each function. Functions incorporate special code to reduce arguments and improve convergence, followed by a fixed number of iterations designed to provide high accuracy. These algorithms are the result of careful implementation based on sound numerical analysis methods. Because the iteration count does not depend on the data, this approach provides deterministic performance as well as high accuracy over a wide range of input values.
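A fixed-count reciprocal is sketched below, with Python lists standing in for SIMD lanes. It is an assumed illustration, not VectorZ's implementation: argument reduction with math.frexp maps every input to a mantissa in [0.5, 1), so one linear first guess (48/17 - 32/17·m, a textbook choice) is within 1/17 of the answer. Newton-Raphson squares the relative error on each step, so three steps bring every lane below (1/17)^8 ≈ 1.4e-10 regardless of the input. The names vector_reciprocal and FIXED_ITERATIONS are hypothetical.

```python
import math

# Hypothetical count: (1/17)**8 ~ 1.4e-10 is well below single-precision epsilon.
FIXED_ITERATIONS = 3

def vector_reciprocal(lanes: list[float], iterations: int = FIXED_ITERATIONS) -> list[float]:
    """Reciprocal of every lane via argument reduction plus a fixed number of
    Newton-Raphson steps. Assumes every lane is finite and nonzero."""
    # Argument reduction: |x| = m * 2**e with 0.5 <= m < 1, so 1/x = sign(x) * (1/m) * 2**-e.
    reduced = [math.frexp(abs(x)) for x in lanes]
    ms = [m for m, _ in reduced]
    # Linear first guess for 1/m on [0.5, 1); its relative error is at most 1/17.
    ys = [48.0 / 17.0 - (32.0 / 17.0) * m for m in ms]
    # Each pass applies the same operations to every lane, like one SIMD instruction.
    # There is no per-lane convergence test, so the work done does not depend on the data.
    for _ in range(iterations):
        ys = [y * (2.0 - m * y) for m, y in zip(ms, ys)]
    return [math.copysign(math.ldexp(y, -e), x) for x, y, (_, e) in zip(lanes, ys, reduced)]


lanes = [1.0, 0.57, -3.0, 1000.0]
for x, r in zip(lanes, vector_reciprocal(lanes)):
    print(f"x={x:>8}  1/x~{r:.9g}  relative error={abs(r * x - 1.0):.1e}")
```

Dropping to a single iteration leaves some lanes (for example x = 1.0) with a relative error of about 1/289, which is why the fixed count has to be chosen for the worst-case lane rather than a typical one.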
Absolute and Relative Errors
The tables in this document show the absolute and relative errors between VectorZ functions and double-precision scalar functions whose results are rounded to single precision (the model function). The Absolute Error is the absolute value of the difference between the expected and the computed result, |expected - computed|. The Relative Error is the absolute error divided by the absolute value of the expected result, |expected - computed| / |expected|. The relative error is akin to a percentage error and is often more useful for determining function accuracy.
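A minimal sketch of these definitions in Python, assuming the model function is emulated by evaluating in double precision and rounding the result to single precision through the standard struct module. The helper names and the perturbed "computed" value are hypothetical, chosen only to exercise the formulas.

```python
import math
import struct

def to_f32(x: float) -> float:
    """Round a double to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def model(f, x: float) -> float:
    """Model function: evaluate f in double precision, then round the result to single."""
    return to_f32(f(x))

def absolute_error(expected: float, computed: float) -> float:
    return abs(expected - computed)

def relative_error(expected: float, computed: float) -> float:
    # Undefined when expected == 0; such inputs must be skipped or handled separately.
    return abs(expected - computed) / abs(expected)


x = to_f32(0.7)
expected = model(math.sin, x)
computed = to_f32(math.sin(x) * (1.0 + 2.0 ** -23))  # hypothetical result, off by 1-2 LSBs
print(absolute_error(expected, computed), relative_error(expected, computed))
```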
A relative error of 1.2e-7 (2^-23) represents an error in the least significant bit (LSB) of a single-precision value. From a signal processing point of view, a function that introduces an error in the LSB of a single-precision result provides a signal-to-noise ratio of about 138 dB. A relative error 16 times larger, 1.9e-6, represents an error in the lowest 4 bits; at roughly 6 dB per bit this costs about 24 dB, leaving an SNR of about 114 dB.
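These dB figures follow from SNR = -20·log10(relative error), treating the error as noise. The sketch below only reproduces that arithmetic; the helper name snr_db is hypothetical.

```python
import math

def snr_db(relative_error: float) -> float:
    """SNR in dB implied by a relative error: -20 * log10(error)."""
    return -20.0 * math.log10(relative_error)

lsb = 2.0 ** -23  # ~1.19e-7: one LSB relative to a single-precision value near 1.0
for label, err in (("1 LSB", lsb), ("16 LSBs", 16 * lsb), ("2^-24", 2.0 ** -24)):
    print(f"{label:>8}: relative error {err:.2e} -> {snr_db(err):.1f} dB")
```

The 2^-24 case recovers the roughly 144 dB figure quoted for a 24-bit mantissa.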
Absolute errors are provided for reference. Functions like tan, which approaches ±∞ as x approaches nπ/2 for odd n, will have large absolute errors near these poles even though the relative error remains small: when the result of tan is large, the absolute error may be large as well, but it is still a small fraction of the output value. The same is true for exp when the input value is large, and for recip and log when the input value is close to zero.
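A small sketch of the effect, using Python's struct module to emulate single precision: it takes the model result for tan at 0.5 and at 1.57 (just below the pole at π/2) and treats the next single-precision value above each as a hypothetical computed result, so both are off by exactly one LSB. The helper names are hypothetical.

```python
import math
import struct

def to_f32(x: float) -> float:
    """Round a double to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def next_f32_up(x: float) -> float:
    """Next single-precision value above x; assumes x is a positive, finite float32 value."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits + 1))[0]

for x in (0.5, 1.57):  # 1.57 lies just below the pole of tan at pi/2
    expected = to_f32(math.tan(to_f32(x)))  # model result
    computed = next_f32_up(expected)        # hypothetical result, one LSB too large
    abs_err = abs(computed - expected)
    rel_err = abs_err / abs(expected)
    print(f"tan({x}) ~ {expected:.6g}: absolute error {abs_err:.2e}, relative error {rel_err:.2e}")
```

Near the pole the absolute error is 2048 times larger (2^-13 versus 2^-24), while both relative errors stay near 1e-7.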
For most functions, the average relative error across the input range is on the order of 2e-8. Because a discrepancy of even one LSB produces a relative error of at least about 6e-8 (2^-24), an average this low indicates that most VectorZ results agree exactly with the results of the model function. The maximum relative error is also recorded; it is often on the order of 2e-7, indicating a worst-case error of 1 to 3 units in the least significant bit.
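The sketch below shows one way average and maximum relative errors over an input range might be gathered. The function under test, candidate_exp, is hypothetical (it computes exp(x) as the square of a single-precision exp(x/2)) and merely stands in for a VectorZ function; the input range and sample count are arbitrary.

```python
import math
import struct

def to_f32(x: float) -> float:
    """Round a double to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def candidate_exp(x: float) -> float:
    """Hypothetical single-precision exp: round exp(x/2) to single, square it, round again."""
    half = to_f32(math.exp(x / 2.0))
    return to_f32(half * half)

def error_stats(inputs: list[float]) -> tuple[float, float]:
    """Average and maximum relative error of candidate_exp against the model function."""
    rel = []
    for x in inputs:
        expected = to_f32(math.exp(x))  # model: double-precision result rounded to single
        computed = candidate_exp(x)
        rel.append(abs(expected - computed) / abs(expected))
    return sum(rel) / len(rel), max(rel)


n = 2001
inputs = [to_f32(-10.0 + 20.0 * i / (n - 1)) for i in range(n)]
avg_rel, max_rel = error_stats(inputs)
print(f"average relative error {avg_rel:.1e}, maximum relative error {max_rel:.1e}")
```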
Forward trigonometric functions are tested over a very wide range, and the maximum relative error increases as the range widens. This is because the model function, evaluated in double precision, extends the single-precision input to double before subtracting a double-precision multiple of 2π. For large inputs this gives the model a false level of accuracy that is not realizable in single-precision algorithms.
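To see the size of the effect, the sketch below assumes, for illustration only, a single-precision algorithm that reduces its argument with 2π rounded to single precision; the document does not describe VectorZ's actual reduction. At x = 1e6 the two reductions differ by about 0.028 radians, and adjacent single-precision inputs are already 0.0625 apart, so the extra digits the double-precision model relies on are not present in the single-precision input at all.

```python
import math
import struct

def to_f32(x: float) -> float:
    """Round a double to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

TWO_PI = 2.0 * math.pi
TWO_PI_F32 = to_f32(TWO_PI)  # assumed single-precision constant for 2*pi

x = to_f32(1.0e6)                     # a large single-precision input
n = round(x / TWO_PI)                 # whole periods to remove
reduced_double = x - n * TWO_PI       # what the double-precision model effectively does
reduced_single = x - n * TWO_PI_F32   # same reduction using the single-precision constant
spacing = 2.0 ** (math.frexp(x)[1] - 24)  # gap between adjacent float32 values near x

print(f"n = {n}")
print(f"reduction with double 2*pi: {reduced_double:.6f}")
print(f"reduction with single 2*pi: {reduced_single:.6f}")
print(f"difference: {reduced_single - reduced_double:.4f} rad; float32 spacing at x: {spacing}")
```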
Functions in the tables are tested for real inputs only. Functions of complex inputs will have greater errors, depending on the function: trigonometric functions of complex variables, for example, involve products of real exponentials and real sinusoids. Average and worst-case errors are more difficult to characterize for complex functions.
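The identity sin(x + iy) = sin(x)cosh(y) + i·cos(x)sinh(y) shows where the extra error comes from: each component is a product of two separately approximated real functions, so to first order their relative errors add. A minimal Python sketch of the identity, checked against the standard library's cmath.sin (the name complex_sin is hypothetical):

```python
import cmath
import math

def complex_sin(z: complex) -> complex:
    """sin(x + iy) = sin(x)cosh(y) + i cos(x)sinh(y): products of real sinusoids
    and real exponential functions (cosh and sinh)."""
    x, y = z.real, z.imag
    return complex(math.sin(x) * math.cosh(y), math.cos(x) * math.sinh(y))


z = complex(1.0, 2.0)
print(complex_sin(z), cmath.sin(z))
```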
