Parallel Assembly of ACA BEM Matrices on Xeon Phi Clusters
This paper presents a parallelization of the boundary element method for distributed-memory clusters equipped with many-core compute nodes. We describe a method for efficiently distributing boundary element matrices among MPI processes based on cyclic graph decompositions. In addition, we focus on intra-node optimization of the code, which is necessary to fully utilize many-core processors with wide SIMD registers. We present numerical experiments carried out on a cluster of Intel Xeon Phi processors of the Knights Landing generation.
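To illustrate the general idea of a cyclic graph decomposition, the following minimal sketch distributes the blocks of a 7 x 7 block matrix among 7 processes. It assumes the mesh is split into 7 parts and uses the generator set {0, 1, 3} modulo 7, whose pairwise differences cover every nonzero residue exactly once. The function name, the generator, and the rule for diagonal blocks are hypothetical choices made for this example and are not taken from the paper.

```python
from itertools import combinations


def cyclic_block_assignment(n_parts, generator):
    """Map each block (i, j) of an n_parts x n_parts block matrix to a process.

    Hypothetical illustration: process p works with the mesh parts
    {(g + p) % n_parts for g in generator} and owns every off-diagonal
    block whose row and column parts both lie in that set.
    """
    owner = {}
    for p in range(n_parts):
        # Rotate the generator graph by p to get the parts used by process p.
        parts = sorted((g + p) % n_parts for g in generator)
        # Each edge of the rotated graph yields two off-diagonal blocks.
        for i, j in combinations(parts, 2):
            for block in ((i, j), (j, i)):
                if block in owner:
                    raise ValueError(f"block {block} assigned twice")
                owner[block] = p
        # Assumed rule: process p also takes one diagonal block.
        diag = (generator[0] + p) % n_parts
        owner[(diag, diag)] = p
    return owner


N_PARTS = 7
GENERATOR = (0, 1, 3)  # assumed generator; differences 1, 2, 3 cover Z_7

owner = cyclic_block_assignment(N_PARTS, GENERATOR)
blocks_per_process = [
    sum(1 for q in owner.values() if q == p) for p in range(N_PARTS)
]
print(blocks_per_process)
```

In this sketch every process owns the same number of blocks while touching only 3 of the 7 mesh parts, which is the kind of balance between work and required data that a cyclic decomposition aims for.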
Keywords: Boundary element method, Adaptive cross approximation, Distributed parallelization, Intel Xeon Phi, Many-core processors
This work was supported by the Ministry of Education, Youth and Sports from the Large Infrastructures for Research, Experimental Development and Innovations project “IT4Innovations National Supercomputing Center – LM2015070” and from the National Programme of Sustainability (NPU II) project “IT4Innovations excellence in science – LQ1602”. It was also partially supported by SGS grant No. SP2017/165 “Efficient implementation of the boundary element method III”, VŠB – Technical University of Ostrava, Czech Republic. The authors thank HLRN for providing access to the HLRN Berlin Test and Development System.