Abstract
The simulation of three-dimensional stress and strain is a research hot spot in computational structural mechanics. As project complexity increases, so does the size of the matrix generated during simulation, so a fast and efficient solver is needed. In this paper, we present swParaFEM, a highly efficient parallel finite element solver for the Sunway many-core architecture. It is based on the preconditioned conjugate gradient iteration algorithm. We adopt a master–slave acceleration model to exploit the computational power of the Sunway supercomputer. We propose a kernel aggregation optimization scheme to address the computing resources wasted by the frequent creation and destruction of threads. Moreover, we improve the data transfer speed from the slave core to the master core through memory access optimization. With these optimizations, we achieve a speedup of 10.5\(\times\) over the naive implementation on one compute group of an SW26010-Pro processor and a strong scaling efficiency of 62.8% on 512 compute groups.
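For readers unfamiliar with the preconditioned conjugate gradient iteration named above, the following is a minimal serial sketch in pure Python. It is not the swParaFEM implementation: the Jacobi (diagonal) preconditioner, the dense list-of-lists matrix storage, and the names `matvec`, `dot`, and `pcg` are assumptions made for illustration only, and no Sunway-specific parallelism is shown.

```python
# Illustrative sketch only: Jacobi-preconditioned conjugate gradient.
# The preconditioner choice and every name below are assumptions,
# not taken from swParaFEM.

def matvec(a, x):
    """Dense matrix-vector product for a list-of-lists matrix."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(u_i * v_i for u_i, v_i in zip(u, v))


def pcg(a, b, tol=1e-10, max_iter=1000):
    """Solve a x = b for a symmetric positive definite matrix a.

    Returns the approximate solution and the number of iterations used.
    """
    n = len(b)
    inv_diag = [1.0 / a[i][i] for i in range(n)]  # Jacobi preconditioner M^-1
    x = [0.0] * n
    r = list(b)                                   # residual b - a x for x = 0
    z = [d * r_i for d, r_i in zip(inv_diag, r)]  # preconditioned residual
    p = list(z)                                   # initial search direction
    rz = dot(r, z)
    b_norm = dot(b, b) ** 0.5 or 1.0              # avoid a zero reference norm

    for k in range(max_iter):
        # Stop once the relative residual norm is below the tolerance.
        if dot(r, r) ** 0.5 <= tol * b_norm:
            return x, k
        ap = matvec(a, p)
        alpha = rz / dot(p, ap)                   # step length along p
        x = [x_i + alpha * p_i for x_i, p_i in zip(x, p)]
        r = [r_i - alpha * ap_i for r_i, ap_i in zip(r, ap)]
        z = [d * r_i for d, r_i in zip(inv_diag, r)]
        rz_new = dot(r, z)
        beta = rz_new / rz                        # conjugacy coefficient
        p = [z_i + beta * p_i for z_i, p_i in zip(z, p)]
        rz = rz_new
    return x, max_iter


# Small symmetric positive definite example system (values chosen arbitrarily).
a = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
b = [1.0, 2.0, 3.0]
x, iters = pcg(a, b)
print(x, iters)
```

Each iteration is dominated by one matrix-vector product plus a few inner products and vector updates, which are the kinds of kernels a many-core implementation would typically offload to its accelerator cores.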
Data availability
The data used to support the findings of this study are available from the corresponding author upon request.
Acknowledgements
This work was supported by the National Natural Science Foundation of China under Grant 62002186, the Shandong Youth Innovation Talent Introduction and Education Plan (Parallel Computing Industrial Software Innovation Team Based on Chinese Supercomputer), and the Qingdao National Laboratory for Marine Science and Technology under Grant No. 2018ASKJ01.
Funding
This work was supported by the National Natural Science Foundation of China under Grant 62002186, the Qingdao National Laboratory for Marine Science and Technology under Grant No. 2018ASKJ01, and the Shandong Youth Innovation Talent Introduction and Education Plan (Parallel Computing Industrial Software Innovation Team Based on Chinese Supercomputer).
Author information
Contributions
LX and JP wrote the main manuscript text, and the other authors prepared Figures 1–10. All authors reviewed the manuscript.
Ethics declarations
Conflict of interest
We declare that the authors have no competing interests as defined by Springer, or other interests that might be perceived to influence the results and/or discussion reported in this paper.
Ethical approval
This declaration is not applicable to this paper.
About this article
Cite this article
Pan, J., Xiao, L., Tian, M. et al. swParaFEM: a highly efficient parallel finite element solver on Sunway many-core architecture. J Supercomput 79, 11427–11451 (2023). https://doi.org/10.1007/s11227-023-05114-5
DOI: https://doi.org/10.1007/s11227-023-05114-5