Using Accurate Arithmetics to Improve Numerical Reproducibility and Stability in Parallel Applications

The Journal of Supercomputing

Abstract

Numerical reproducibility and stability of large-scale scientific simulations, especially climate modeling, on distributed-memory parallel computers are becoming critical issues. In particular, global summation of distributed arrays is the operation most susceptible to rounding errors, and the propagation and accumulation of those errors cause uncertainty in the final simulation results. We analyzed several accurate summation methods and found two to be particularly effective at improving (ensuring) reproducibility and stability: Kahan's self-compensated summation and Bailey's double-double precision summation. We provide an MPI operator, MPI_SUMDD, that works with MPI collective operations to ensure a scalable implementation on large numbers of processors. The resulting methods are simple to adopt in practical codes, not only for global summations but also for vector-vector dot products and matrix-vector or matrix-matrix operations.
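The two summation methods named in the abstract can be illustrated with a minimal Python sketch. This is not the paper's code: the function names (`naive_sum`, `kahan_sum`, `two_sum`, `dd_add`, `dd_sum`, `reduce_dd`) are hypothetical, the double-double value is represented as a plain `(hi, lo)` tuple, and the parallel reduction is only simulated by splitting a list into chunks that stand in for MPI ranks. The idea behind an operator such as MPI_SUMDD is assumed here to be that each rank contributes a `(hi, lo)` pair and the pairs are combined with a double-double add.

```python
import functools
import math


def naive_sum(values):
    """Plain left-to-right floating-point summation."""
    total = 0.0
    for x in values:
        total += x
    return total


def kahan_sum(values):
    """Kahan compensated summation: carry each add's rounding error forward."""
    total = 0.0
    comp = 0.0  # running estimate of the low-order bits lost so far
    for x in values:
        y = x - comp            # apply the correction to the next term
        t = total + y           # low-order bits of y may be lost here
        comp = (t - total) - y  # recover what was lost (with opposite sign)
        total = t
    return total


def two_sum(a, b):
    """Error-free addition (Knuth): returns (s, e) with a + b == s + e exactly."""
    s = a + b
    bb = s - a
    e = (a - (s - bb)) + (b - bb)
    return s, e


def dd_add(x, y):
    """Add two double-double numbers, each given as a (hi, lo) pair."""
    s, e = two_sum(x[0], y[0])
    e += x[1] + y[1]
    return two_sum(s, e)  # renormalize so that hi carries the leading bits


def dd_sum(values):
    """Sum ordinary floats into a double-double accumulator."""
    acc = (0.0, 0.0)
    for x in values:
        acc = dd_add(acc, (x, 0.0))
    return acc


def reduce_dd(chunks):
    """Simulated reduction: each chunk plays the role of one rank's local data."""
    partials = [dd_sum(chunk) for chunk in chunks]
    return functools.reduce(dd_add, partials, (0.0, 0.0))


# Small terms that plain summation drops entirely.
small = [1.0] + [1e-16] * 10
print(naive_sum(small))                      # 1.0: every small term is rounded away
print(kahan_sum(small), math.fsum(small))    # compensated sum vs. correctly rounded sum

# Heavy cancellation: the exact sum is 2.0.
values = [1e16, 1.0, -1e16, 1.0]
two_ranks = [values[:2], values[2:]]
print(naive_sum(values))                             # 1.0
print(naive_sum([naive_sum(c) for c in two_ranks]))  # 0.0: answer depends on the partition
print(reduce_dd([values])[0])                        # 2.0
print(reduce_dd(two_ranks)[0])                       # 2.0: same answer on two "ranks"
```

In this sketch the plain sum changes with the way the data is partitioned, while the double-double reduction returns the same value for both partitions. In a real MPI code the combining step would be registered as a user-defined reduction operator rather than called through `functools.reduce`.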

References

  1. D. H. Bailey. A Fortran-90 suite of double-double precision programs. See web page at http://www.nersc.gov/~dhb/mpdist/mpdist.html.

  2. D. H. Bailey. Multiprecision translation and execution of Fortran programs. ACM Transactions on Mathematical Software, 19:288-319, 1993.

  3. R. P. Brent. A Fortran multiple precision arithmetic package. ACM Transactions on Mathematical Software, 4:57-70, 1978.

  4. X. S. Li, J. W. Demmel, D. H. Bailey, G. Henry, Y. Hida, J. Iskandar, W. Kahan, A. Kapur, M. C. Martin, T. Tung and D. J. Yoo. Design, implementation and testing of extended and mixed precision BLAS. LBL report LBNL-47372, 2000, and ACM Transactions on Mathematical Software, submitted.

  5. C. H. Q. Ding and R. D. Ferraro. A parallel climate data assimilation package. SIAM News, pp. 1-12, November 1996.

  6. C. H. Q. Ding and Y. He. Data organization and I/O in an ocean circulation model. In Proceedings of Supercomputing'99, November 1999; also LBL report LBNL-43384, May 1999.

  7. C. H. Q. Ding, P. Lyster, J. Larson, J. Guo, and A. da Silva. Atmospheric data assimilation on distributed parallel supercomputers. In P. Sloot et al., eds. Lecture Notes in Computer Science, Vol. 1401, pp. 115-124. Springer, 1998.

  8. J. Drake, I. Foster, J. Michalakes, B. Toonen, and P. Worley. Design and performance of a scalable parallel community climate model. Parallel Computing (PCCM2), 21:1571, 1995.

  9. G. Fox, M. Johnson, G. Lyzenga, S. Otto, J. Salmon, and D. Walker. Solving Problems on Concurrent Processors, Vol. 1. Prentice Hall, Englewood Cliffs, NJ, 1988.

  10. D. Goldberg. What every computer scientist should know about floating-point arithmetic. ACM Computing Surveys, March 1991.

  11. A. Greenbaum. Iterative Methods for Solving Linear Systems, Frontiers in Applied Mathematics, Vol. 17. SIAM, Philadelphia, 1997.

  12. S. M. Griffies, R. C. Pacanowski, M. Schmidt, and V. Balaji. The explicit free surface method in the GFDL modular ocean model. Monthly Weather Review, submitted.

  13. J. J. Hack, J. M. Rosinski, D. L. Williamson, B. A. Boville, and J. E. Truesdale. Computational design of NCAR community climate model. Parallel Computing, 21:1545, 1995.

  14. Y. He and C. H. Q. Ding. Numerical Reproducibility and Stability/NERSC Homepage. See web page at http://www.nersc.gov/research/SCG/ocean/NRS.

  15. N. J. Higham. Accuracy and Stability of Numerical Algorithms. SIAM Press, Philadelphia, 1996.

  16. W. Kahan. Further remarks on reducing truncation errors. Communications of the ACM, p. 40, 1965.

  17. D. E. Knuth. Arithmetic. In The Art of Computer Programming, Vol. 2, Chap. 4. Addison-Wesley Press, Reading, Mass., 1969.

  18. D. Moore. Class Notes for CAAM 420: Introduction to Computational Science. Rice University, Spring 1999. See web page at http://www.owlnet.rice.edu/~caam420/Outline.html.

  19. The NCAR Ocean Model User's Guide, Version 1.4. See web page at http://www.cgd.ucar.edu/csm/models/ocn-ncom/UserGuide1_4.html, 1998.

  20. R. C. Pacanowski and S. M. Griffies. MOM 3.0 Manual. GFDL Ocean Circulation Group, Geophysical Fluid Dynamics Laboratory, Princeton, NJ, 1999.

  21. B. N. Parlett. The Symmetric Eigenvalue Problem, Classics in Applied Mathematics, 20. SIAM, Philadelphia, 1997.

  22. W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery. Numerical Recipes in Fortran: The Art of Scientific Computing, 2nd ed. Cambridge University Press, Cambridge, UK, 1992.

  23. D. M. Priest. Algorithms for arbitrary precision floating point arithmetic. On properties of floating point arithmetics: numerical stability and the cost of accurate computations. Ph.D. thesis, Mathematics Department, University of California, Berkeley, 1992.

  24. R. D. Smith, J. K. Dukowicz, and R. C. Malone. Parallel ocean general circulation modeling. Physica, D60:38, 1992. See web page at http://www.acl.lanl.gov/climate/models/pop.

  25. Second International Workshop for Software Engineering and Code Design for Parallel Meteorological and Oceanographic Applications, Scottsdale, Ariz., June 1998.

  26. Workshop on Numerical Benchmarks for Climate/Ocean/Weather Modeling Community, Boulder, Colo., June 1999.

About this article

Cite this article

He, Y., Ding, C.H.Q. Using Accurate Arithmetics to Improve Numerical Reproducibility and Stability in Parallel Applications. The Journal of Supercomputing 18, 259–277 (2001). https://doi.org/10.1023/A:1008153532043

  • DOI: https://doi.org/10.1023/A:1008153532043
