Efficiency of Reproducible Level 1 BLAS

  • Chemseddine Chohra
  • Philippe Langlois
  • David Parello
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9553)

Abstract

Numerical reproducibility failures appear in massively parallel floating-point computations. One way to guarantee reproducibility is to extend IEEE-754 correct rounding to longer sequences of operations, e.g. to the BLAS routines. Is the extra cost of numerical reproducibility acceptable in practice? We present solutions and experiments for the level 1 BLAS and draw conclusions about their efficiency.
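To make the idea in the abstract concrete, here is a minimal Python sketch of why a correctly rounded sum is reproducible. It is not the authors' implementation: the helper names `naive_sum` and `correctly_rounded_sum` are hypothetical, and exact rational arithmetic stands in for the far faster algorithms a real BLAS would need. The sketch only illustrates that a plain left-to-right sum depends on the order of the operands (as it does when a parallel reduction changes its schedule), whereas rounding the exact sum once gives the same result for every order.

```python
from fractions import Fraction
from itertools import permutations


def naive_sum(xs):
    """Left-to-right floating-point sum: one rounding per addition."""
    total = 0.0
    for x in xs:
        total += x
    return total


def correctly_rounded_sum(xs):
    """Exact rational sum, rounded to a float once at the end.

    Assumption for illustration only: exact arithmetic replaces the
    efficient algorithms a reproducible BLAS would actually use.
    """
    exact = sum((Fraction(x) for x in xs), Fraction(0))
    return float(exact)  # single, correct rounding of the exact value


values = [1e16, 1.0, -1e16, 1.0]  # the exact sum is 2

# The naive result changes with the summation order.
naive_results = {naive_sum(p) for p in permutations(values)}

# The correctly rounded result is the same for every order.
exact_results = {correctly_rounded_sum(p) for p in permutations(values)}

print(sorted(naive_results))
print(sorted(exact_results))
```

Because the correctly rounded result is a function of the exact sum alone, it cannot depend on how the operands are ordered or partitioned across threads. The open question the paper addresses is what this guarantee costs in running time.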

Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Chemseddine Chohra (1, 2)
  • Philippe Langlois (1, 2)
  • David Parello (1, 2)

  1. Digits, Architectures et Logiciels Informatiques, Univ. Perpignan Via Domitia, Perpignan, France
  2. Laboratoire d’Informatique Robotique et de Microélectronique de Montpellier, Univ. Montpellier II, UMR 5506, CNRS, Montpellier, France
