Parallel Assembly of ACA BEM Matrices on Xeon Phi Clusters

  • Michal Kravcenko
  • Lukas Maly
  • Michal Merta
  • Jan Zapletal
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10777)


The paper presents a parallelization of the boundary element method (BEM) in the distributed memory of a cluster with many-core compute nodes. We describe a method for efficiently distributing boundary element matrices among MPI processes based on cyclic graph decompositions. In addition, we focus on the intra-node optimization of the code, which is necessary to fully utilize many-core processors with wide SIMD registers. Numerical experiments carried out on a cluster of Intel Xeon Phi processors of the Knights Landing generation are presented.
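The abstract only names the distribution strategy. As a rough illustration of what a cyclic graph decomposition can buy, the following Python sketch makes two assumptions: the boundary mesh is split into n submeshes, so the matrix consists of n × n blocks, and there are n MPI processes. Submeshes are treated as vertices of the complete graph K_n. An edge {i, j} stands for the blocks (i, j) and (j, i), and a loop at vertex i stands for the diagonal block (i, i). The helper names `cyclic_block_assignment` and `check_partition` are hypothetical. The base vertex set is assumed to be a perfect difference set modulo n, for example {0, 1, 3} for n = 7. This is a textbook construction chosen for illustration, not necessarily the decomposition used by the authors.

```python
from itertools import combinations


def cyclic_block_assignment(n, base):
    """Hypothetical helper: map each of n processes to its matrix blocks.

    Process k receives the submeshes (base + k) mod n. If `base` is a
    perfect difference set modulo n (an assumption of this sketch), the
    complete subgraphs on the shifted sets partition the edges of K_n,
    so each off-diagonal block pair {i, j} is owned by exactly one process.
    """
    if 0 not in base:
        raise ValueError("base must contain 0 so process k can own block (k, k)")
    assignment = {}
    for k in range(n):
        verts = sorted((b + k) % n for b in base)
        # One graph edge {i, j} stands for blocks (i, j) and (j, i);
        # only i < j is listed here (enough for a symmetric matrix).
        blocks = list(combinations(verts, 2))
        # The loop at vertex k stands for the diagonal block (k, k).
        blocks.append((k, k))
        assignment[k] = {"submeshes": verts, "blocks": blocks}
    return assignment


def check_partition(n, assignment):
    """Return True if every block pair (i, j), i <= j, is owned exactly once."""
    counts = {}
    for data in assignment.values():
        for i, j in data["blocks"]:
            key = (min(i, j), max(i, j))
            counts[key] = counts.get(key, 0) + 1
    expected = {(i, j) for i in range(n) for j in range(i, n)}
    return set(counts) == expected and all(c == 1 for c in counts.values())


if __name__ == "__main__":
    # n = 7 submeshes/processes, base {0, 1, 3} (a perfect difference set mod 7).
    a = cyclic_block_assignment(7, (0, 1, 3))
    print(a[4]["submeshes"])      # [0, 4, 5]
    print(check_partition(7, a))  # True
```

With n = 7, each process needs only 3 of the 7 submeshes. With n = 13 and base {0, 1, 3, 9}, each process needs only 4 of the 13. In both cases the upper block triangle is still covered exactly once, because a perfect difference set has roughly the square root of n elements. For a nonsymmetric matrix, one natural choice in this sketch is for the owner of edge {i, j} to assemble both blocks (i, j) and (j, i).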


Keywords: Boundary element method, Adaptive cross approximation, Distributed parallelization, Intel Xeon Phi, Many-core processors



This work was supported by the Ministry of Education, Youth and Sports from the Large Infrastructures for Research, Experimental Development and Innovations project “IT4Innovations National Supercomputing Center – LM2015070” and from the National Programme of Sustainability (NPU II) project “IT4Innovations excellence in science – LQ1602”. This work was partially supported by the SGS grant No. SP2017/165 “Efficient implementation of the boundary element method III”, VŠB – Technical University of Ostrava, Czech Republic. The authors thank HLRN for providing access to the HLRN Berlin Test and Development System.



Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  • Michal Kravcenko (1, 2)
  • Lukas Maly (1, 2)
  • Michal Merta (1, 2), corresponding author
  • Jan Zapletal (1, 2)

  1. IT4Innovations, VŠB – Technical University of Ostrava, Ostrava-Poruba, Czech Republic
  2. Department of Applied Mathematics, VŠB – Technical University of Ostrava, Ostrava-Poruba, Czech Republic
