Abstract
Double-precision summation is at the core of numerous important algorithms, including Newton–Krylov methods and other operations built on inner products, such as matrix multiplication and dot products. However, the effectiveness of summation is limited by the accumulation of rounding errors inherent in finite-precision floating-point representations, a growing problem as modern HPC systems and data sets scale to summations with millions or billions of operands. To reduce the impact of precision loss, researchers have proposed increased- and arbitrary-precision libraries that provide reproducible results or even bounded error accumulation for large sums. However, such libraries increase computation and communication time significantly, and do not always guarantee an exact result. In this article, we propose a fixed-point representation of double-precision variables that enables arbitrarily large summations without error and provides exact, reproducible results. We call this format big integer (BigInt). Even though such formats have been studied for local processor computations, we make the case that using fixed-point representation for distributed computation over a system-wide network is feasible, with performance comparable to that of double-precision floating-point summation. This is made possible by adding simple, inexpensive logic to modern NICs, or by using the programmable logic many modern NICs already provide, to accelerate reductions on large-scale systems while avoiding waking up processors.
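To make the fixed-point idea concrete, here is a minimal software sketch (not the paper's NIC-based implementation): every finite IEEE-754 double is an integer multiple of 2^-1074, so an arbitrary-precision integer scaled by 2^-1074 can accumulate any sequence of doubles exactly, with a single rounding at the end. Python's built-in big integers stand in for the BigInt accumulator; the function names are illustrative.

```python
import math

# Every finite IEEE-754 double is an integer multiple of 2**-1074, so a
# single arbitrary-precision integer scaled by 2**-1074 can hold any sum
# of doubles exactly (the "long accumulator" / BigInt idea).
SCALE = 1074

def to_fixed(x: float) -> int:
    """Convert a finite double to its exact integer multiple of 2**-1074."""
    m, e = math.frexp(x)            # x == m * 2**e, with 0.5 <= |m| < 1
    mant = int(m * (1 << 53))       # exact 53-bit integer significand
    shift = SCALE + e - 53
    # shift is negative only for subnormals, whose low significand bits
    # are zero, so the right shift below is still exact
    return mant << shift if shift >= 0 else mant >> -shift

def exact_sum(values) -> float:
    """Sum doubles without intermediate rounding; round once at the end."""
    acc = 0                          # fixed-point accumulator (exact)
    for v in values:
        acc += to_fixed(v)
    return acc / (1 << SCALE)        # one correctly rounded division

# Ten copies of 0.1 sum exactly to 1 + 2**-54, which rounds to 1.0;
# a naive left-to-right float sum returns 0.9999999999999999 instead.
print(exact_sum([0.1] * 10))         # 1.0
print(sum([0.1] * 10))               # 0.9999999999999999
```

Because the accumulator is exact, the result is also independent of operand order, which is what makes the format attractive for network reductions whose reduction-tree shape varies from run to run.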
Acknowledgments
This work was supported by the Director, Office of Science, of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231.
Disclaimer: This document was prepared as an account of work sponsored by the United States Government. While this document is believed to contain correct information, neither the United States Government nor any agency thereof, nor the Regents of the University of California, nor any of their employees, makes any warranty, express or implied, or assumes any legal responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed, or represents that its use would not infringe privately owned rights. Reference herein to any specific commercial product, process, or service by its trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favoring by the United States Government or any agency thereof, or the Regents of the University of California. The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States Government or any agency thereof or the Regents of the University of California.
Copyright Notice: This manuscript has been authored by an author at Lawrence Berkeley National Laboratory under Contract No. DE-AC02-05CH11231 with the U.S. Department of Energy. The U.S. Government retains, and the publisher, by accepting the article for publication, acknowledges, that the U.S. Government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this manuscript, or allow others to do so, for U.S. Government purposes.
Cite this article
Michelogiannakis, G., Li, X.S., Bailey, D.H. et al. Extending Summation Precision for Network Reduction Operations. Int J Parallel Prog 43, 1218–1243 (2015). https://doi.org/10.1007/s10766-014-0326-5