Abstract
The aim of this paper is to show that Kahan's and the Gill-Møller compensated summation algorithms, which achieve high accuracy when summing long sequences of floating-point numbers, can be efficiently vectorized and parallelized. The new implementation uses Intel AVX-512 intrinsics together with OpenMP constructs to exploit the SIMD extensions of modern multicore processors. We describe the vectorization technique in detail and show how to define custom reduction operators in OpenMP. Numerical experiments performed on a server with Intel Xeon Gold 6342 processors show that the new implementations of the compensated summation algorithms achieve much better accuracy than ordinary summation, while their performance is comparable to that of an automatically optimized ordinary summation algorithm. Moreover, the experiments show that the vectorized implementation of the Gill-Møller algorithm is faster than the vectorized implementation of Kahan's algorithm.
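For context, the two compensated summation schemes named in the abstract can be sketched in scalar C as follows. This is a minimal illustration of the well-known algorithms, not the paper's vectorized implementation: Kahan's method carries a running correction term applied to each addend, while the Gill-Møller method accumulates the per-step rounding errors separately and folds them in once at the end.

```c
#include <stddef.h>

/* Kahan compensated summation: c captures the low-order bits
   lost in each addition and feeds them into the next one. */
double kahan_sum(const double *x, size_t n) {
    double s = 0.0, c = 0.0;
    for (size_t i = 0; i < n; i++) {
        double y = x[i] - c;  /* apply the pending correction */
        double t = s + y;     /* low-order digits of y may be lost here */
        c = (t - s) - y;      /* recover the excess that was added */
        s = t;
    }
    return s;
}

/* Gill-Møller summation: e accumulates the rounding error of
   every step; it is added to the sum only once, at the end. */
double gill_moller_sum(const double *x, size_t n) {
    double s = 0.0, old = 0.0, e = 0.0;
    for (size_t i = 0; i < n; i++) {
        s = old + x[i];
        e += x[i] - (s - old);  /* rounding error of this step */
        old = s;
    }
    return s + e;
}
```

The shorter dependency chain in the Gill-Møller loop body (the error accumulation is independent of the next addition) is consistent with the abstract's observation that its vectorized form outruns Kahan's.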
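The abstract also mentions defining custom reduction operators in OpenMP. The paper's own operators are not reproduced here, but the general mechanism is `#pragma omp declare reduction`; the following hedged sketch shows one plausible way to reduce a (sum, correction) pair so that each thread runs Kahan summation on its chunk and the partial compensated sums are then merged. The struct name `ksum_t` and the combine rule are illustrative assumptions, not taken from the paper.

```c
#include <stddef.h>

/* Pair of running sum s and Kahan correction c (illustrative type). */
typedef struct { double s, c; } ksum_t;

/* One compensated addition into the accumulator. */
static inline void ksum_add(ksum_t *a, double x) {
    double y = x - a->c;
    double t = a->s + y;
    a->c = (t - a->s) - y;
    a->s = t;
}

/* Merge two partial compensated sums: add b's sum with compensation,
   then carry over b's pending correction as well. */
static inline ksum_t ksum_combine(ksum_t a, ksum_t b) {
    ksum_add(&a, b.s);
    a.c += b.c;
    return a;
}

/* Custom OpenMP reduction over the (s, c) pair. */
#pragma omp declare reduction(ksum : ksum_t :                 \
        omp_out = ksum_combine(omp_out, omp_in))              \
        initializer(omp_priv = (ksum_t){0.0, 0.0})

double kahan_sum_omp(const double *x, size_t n) {
    ksum_t acc = {0.0, 0.0};
    #pragma omp parallel for reduction(ksum : acc)
    for (size_t i = 0; i < n; i++)
        ksum_add(&acc, x[i]);
    return acc.s - acc.c;  /* apply the final pending correction */
}
```

Without `-fopenmp` the pragmas are ignored and the function degenerates to the sequential Kahan loop, so the result is accurate either way; the paper's actual operators additionally reduce AVX-512 vector lanes.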
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Dmitruk, B., Stpiczyński, P. (2023). Parallel Vectorized Implementations of Compensated Summation Algorithms. In: Wyrzykowski, R., Dongarra, J., Deelman, E., Karczewski, K. (eds) Parallel Processing and Applied Mathematics. PPAM 2022. Lecture Notes in Computer Science, vol 13827. Springer, Cham. https://doi.org/10.1007/978-3-031-30445-3_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-30444-6
Online ISBN: 978-3-031-30445-3