A Randomized LU-based Solver Using GPU and Intel Xeon Phi Accelerators

  • Marc Baboulin
  • Amal Khabou
  • Adrien Rémy
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9523)


We present a fast hybrid solver for dense linear systems based on LU factorization. To achieve good performance, we avoid pivoting by using random butterfly transformations, for which we developed efficient implementations on heterogeneous architectures. We used both Graphics Processing Units (GPUs) and Intel Xeon Phi coprocessors as accelerators. The performance results show that the cost of the pre-processing due to randomization is negligible and that the solver outperforms the corresponding routines based on partial pivoting.
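The idea of replacing pivoting with a random butterfly transformation (RBT) can be illustrated with a minimal sketch. It is not the authors' GPU or Xeon Phi implementation: it builds the butterflies as dense matrices in pure Python, whereas an efficient code would store them in packed form. The helper names (`butterfly`, `recursive_butterfly`, `solve_no_pivot`, `rbt_solve`), the depth of 2, and the choice of random diagonal entries exp(r/10) with r uniform in [-1/2, 1/2] are assumptions taken from the way RBT is usually described in the literature, not details stated on this page. The sketch transforms A into U^T A V, factorizes the result without pivoting, solves for y, and recovers x = V y.

```python
import math
import random

def matmul(X, Y):
    """Plain dense product of two square list-of-lists matrices."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def transpose(X):
    return [list(col) for col in zip(*X)]

def butterfly(n, rng):
    """Dense n-by-n butterfly (1/sqrt(2)) * [[R0, R1], [R0, -R1]], n even."""
    h = n // 2
    # Assumed choice of random diagonal entries: exp(r/10), r in [-0.5, 0.5].
    r0 = [math.exp(rng.uniform(-0.5, 0.5) / 10) for _ in range(h)]
    r1 = [math.exp(rng.uniform(-0.5, 0.5) / 10) for _ in range(h)]
    s = 1.0 / math.sqrt(2.0)
    B = [[0.0] * n for _ in range(n)]
    for i in range(h):
        B[i][i] = s * r0[i]
        B[i][h + i] = s * r1[i]
        B[h + i][i] = s * r0[i]
        B[h + i][h + i] = -s * r1[i]
    return B

def recursive_butterfly(n, depth, rng):
    """Product of `depth` block-diagonal butterfly levels (2**depth must divide n)."""
    W = [[float(i == j) for j in range(n)] for i in range(n)]
    for level in range(depth):
        size = n >> level  # block size halves at each level
        D = [[0.0] * n for _ in range(n)]
        for start in range(0, n, size):
            B = butterfly(size, rng)
            for i in range(size):
                for j in range(size):
                    D[start + i][start + j] = B[i][j]
        W = matmul(D, W)
    return W

def solve_no_pivot(A, b):
    """Gaussian elimination WITHOUT pivoting, then back substitution.
    Raises ZeroDivisionError when a pivot is exactly zero."""
    n = len(A)
    A = [row[:] for row in A]
    b = b[:]
    for k in range(n):
        for i in range(k + 1, n):
            m = A[i][k] / A[k][k]
            for j in range(k, n):
                A[i][j] -= m * A[k][j]
            b[i] -= m * b[k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(A[i][j] * x[j] for j in range(i + 1, n))) / A[i][i]
    return x

def rbt_solve(A, b, depth=2, seed=0):
    """Solve A x = b via A_r = U^T A V, A_r y = U^T b, x = V y."""
    n = len(A)
    rng = random.Random(seed)
    U = recursive_butterfly(n, depth, rng)
    V = recursive_butterfly(n, depth, rng)
    Ut = transpose(U)
    Ar = matmul(matmul(Ut, A), V)
    c = [sum(Ut[i][j] * b[j] for j in range(n)) for i in range(n)]
    y = solve_no_pivot(Ar, c)
    return [sum(V[i][j] * y[j] for j in range(n)) for i in range(n)]

if __name__ == "__main__":
    # A has a zero in the (0, 0) position, so unpivoted elimination fails on A itself.
    A = [[0.0, 1.0, 0.0, 0.0],
         [1.0, 0.0, 0.0, 0.0],
         [0.0, 0.0, 0.0, 1.0],
         [0.0, 0.0, 1.0, 0.0]]
    print(rbt_solve(A, [1.0, 2.0, 3.0, 4.0]))
```

In this toy case the untransformed matrix cannot be factorized without pivoting, while the randomized matrix can. The dense construction costs far more than a packed butterfly would, so the sketch says nothing about the performance claims above; it only shows the algebra of the transformation.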


  • Random Butterfly Transformations (RBT)
  • LU factorization
  • Graphics Processing Units (GPU)
  • Intel Xeon Phi



The authors would like to thank Stanimire Tomov and Ichitaro Yamasaki from the University of Tennessee for their support with the MAGMA library.



Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. Université Paris-Sud, Orsay, France
