
Multiple-Precision Scaled Vector Addition on Graphics Processing Unit

  • Konstantin Isupov
  • Alexander Kuvaev
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11657)

Abstract

Many large problems require linear algebra operations with a precision exceeding that of the standard binary64 floating-point format. In this paper, we implement a multiple-precision scaled vector addition BLAS routine, WAXPBY (w ← αx + βy), on graphics processing units. We use a residue number system (RNS) to represent the significands of floating-point values. In RNS, large numbers are replaced by their residues modulo a set of pairwise coprime moduli, and addition, subtraction and multiplication are performed on these residues in parallel and without carry propagation. Our parallel WAXPBY algorithm is divided into a number of steps, each carried out by a separate GPU kernel. Experiments show that the developed routine clearly outperforms parallel CPU-based multiple-precision implementations.
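
To make the RNS idea concrete, here is a minimal Python sketch, not the authors' GPU code. It assumes nonnegative integers standing in for significands and four hypothetical moduli (distinct primes, hence pairwise coprime); signs, exponent alignment, rounding and the multi-kernel GPU structure of the actual routine are all omitted. It shows forward conversion, carry-free digit-wise addition and multiplication, reverse conversion by the Chinese Remainder Theorem, and an integer-only analogue of WAXPBY, w_i = αx_i + βy_i, computed entirely on residues.

```python
from itertools import combinations
from math import gcd, prod

# Hypothetical moduli for illustration: distinct primes, hence pairwise coprime.
MODULI = (1_000_000_007, 998_244_353, 2_147_483_647, 65_537)
assert all(gcd(a, b) == 1 for a, b in combinations(MODULI, 2))
M = prod(MODULI)  # dynamic range: each integer in [0, M) has a unique residue tuple


def to_rns(x):
    """Forward conversion: replace x by its residues modulo each m_i."""
    return tuple(x % m for m in MODULI)


def rns_add(a, b):
    """Digit-wise addition; no carry propagates between residue channels."""
    return tuple((ai + bi) % m for ai, bi, m in zip(a, b, MODULI))


def rns_mul(a, b):
    """Digit-wise multiplication, again independent per channel."""
    return tuple((ai * bi) % m for ai, bi, m in zip(a, b, MODULI))


def from_rns(r):
    """Reverse conversion via the Chinese Remainder Theorem."""
    x = 0
    for ri, m in zip(r, MODULI):
        Mi = M // m
        x += ri * Mi * pow(Mi, -1, m)  # modular inverse; needs Python 3.8+
    return x % M


def rns_waxpby(alpha, x, beta, y):
    """Integer analogue of WAXPBY: w_i = alpha*x_i + beta*y_i on residues.

    The reconstructed result is exact only if every w_i lies in [0, M).
    """
    a, b = to_rns(alpha), to_rns(beta)
    return [rns_add(rns_mul(a, to_rns(xi)), rns_mul(b, to_rns(yi)))
            for xi, yi in zip(x, y)]


# Usage: compare against ordinary Python integer arithmetic.
x = [12345678901234567890, 3, 10**20]
y = [98765432109876543210, 7, 1]
w = [from_rns(r) for r in rns_waxpby(5, x, 11, y)]
assert w == [5 * xi + 11 * yi for xi, yi in zip(x, y)]
```

Each residue channel in the sketch is updated independently of the others, which is what would let a GPU implementation spread the work across threads; the reverse conversion, by contrast, couples all channels and is comparatively costly. The sketch is exact only while every result stays below the dynamic range M, so a real multiple-precision format would also need rounding and overflow handling.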

Keywords

High-precision computations, Computer arithmetic, Residue number system, BLAS, CUDA

Notes

Acknowledgement

This work was supported by the Russian Science Foundation (grant number 18-71-00063).

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Vyatka State University, Kirov, Russia
