Efficient evaluation methods of elementary functions suitable for SIMD computation

Shibata, Naoki

doi:10.1007/s00450-010-0108-2

Special Issue Paper
Published: 20 April 2010

Volume 25, pages 25–32, (2010)

Computer Science - Research and Development

Naoki Shibata¹

Abstract

Data-parallel architectures such as SIMD (Single Instruction Multiple Data) and SIMT (Single Instruction Multiple Thread) have been adopted in many recent CPU and GPU architectures. Although some SIMD and SIMT instruction sets include double-precision arithmetic and bitwise operations, they provide no instructions dedicated to evaluating elementary functions, such as the trigonometric functions, in double precision. Thus, these functions have to be evaluated one by one using an FPU or a software library. However, traditional algorithms for evaluating these elementary functions rely heavily on conditional branches and/or table look-ups, which are not suitable for SIMD computation. In this paper, efficient methods are proposed for evaluating the sine, cosine, arc tangent, exponential and logarithmic functions in double precision without table look-ups, scattering from or gathering into SIMD registers, or conditional branches. We implemented these methods using the Intel SSE2 instruction set to evaluate their accuracy and speed. The results showed that the average error was less than 0.67 ulp, and the maximum error was 6 ulps. The computation was faster than the FPUs on Intel Core 2 and Core i7 processors.
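
To make the idea of evaluation without table look-ups or conditional branches concrete, the following is a minimal scalar sketch in C of a branch-free sine. It is an illustration only and does not reproduce the paper's algorithm, coefficients, or SSE2 code: the function names `sin_branchfree` and `sin_array`, the split of pi into `PI_HI` and `PI_LO`, the rounding constant, and the use of plain Taylor coefficients are all assumptions chosen for clarity. It also assumes IEEE-754 double arithmetic in round-to-nearest mode, no `-ffast-math`, and inputs of moderate magnitude.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative sketch only, not the paper's method.
   Assumes IEEE-754 doubles, round-to-nearest, no -ffast-math,
   and |x| of moderate size so that q * PI_HI stays exact. */
double sin_branchfree(double x)
{
    const double INV_PI = 0.3183098861837907;     /* 1/pi */
    const double ROUND  = 6755399441055744.0;     /* 1.5 * 2^52 */
    const double PI_HI  = 3.140625;               /* leading bits of pi */
    const double PI_LO  = 9.67653589793238463e-4; /* pi - PI_HI */

    /* Round x/pi to the nearest integer without a branch or library call:
       adding 1.5 * 2^52 pushes the fraction bits out of the significand. */
    double q = (x * INV_PI + ROUND) - ROUND;

    /* Two-step range reduction: r = x - q*pi, roughly in [-pi/2, pi/2]. */
    double r = (x - q * PI_HI) - q * PI_LO;
    double s = r * r;

    /* Odd Taylor polynomial up to r^19, evaluated in Horner form. */
    double p = -1.0 / 121645100408832000.0;  /* -1/19! */
    p = p * s + 1.0 / 355687428096000.0;     /* +1/17! */
    p = p * s - 1.0 / 1307674368000.0;       /* -1/15! */
    p = p * s + 1.0 / 6227020800.0;          /* +1/13! */
    p = p * s - 1.0 / 39916800.0;            /* -1/11! */
    p = p * s + 1.0 / 362880.0;              /* +1/9!  */
    p = p * s - 1.0 / 5040.0;                /* -1/7!  */
    p = p * s + 1.0 / 120.0;                 /* +1/5!  */
    p = p * s - 1.0 / 6.0;                   /* -1/3!  */
    double y = r + r * s * p;

    /* sin(x) = (-1)^q * sin(r): XOR the parity of q into the sign bit. */
    uint64_t odd = (uint64_t)(int64_t)q & 1u;
    uint64_t bits;
    memcpy(&bits, &y, sizeof bits);
    bits ^= odd << 63;
    memcpy(&y, &bits, sizeof y);
    return y;
}

/* The loop body is straight-line code, so every element follows the same
   instruction sequence, which is the property SIMD lanes require. */
void sin_array(const double *in, double *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = sin_branchfree(in[i]);
}
```

Every step here (multiply, add, integer conversion, bitwise XOR) has a direct packed counterpart in instruction sets such as SSE2, which is why this style of code maps onto SIMD registers while a table-driven or branch-heavy routine does not. A production implementation would use minimax coefficients and a more careful range reduction for large arguments.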

Author information

Authors and Affiliations

Department of Information Processing & Management, Shiga University, 1-1-1 Bamba, Hikone, 522-8522, Japan
Naoki Shibata

Authors

Naoki Shibata

Corresponding author

Correspondence to Naoki Shibata.

About this article

Cite this article

Shibata, N. Efficient evaluation methods of elementary functions suitable for SIMD computation. Comput Sci Res Dev 25, 25–32 (2010). https://doi.org/10.1007/s00450-010-0108-2

Published: 20 April 2010
Issue Date: May 2010
DOI: https://doi.org/10.1007/s00450-010-0108-2
