
swParaFEM: a highly efficient parallel finite element solver on Sunway many-core architecture

The Journal of Supercomputing

Abstract

The simulation of three-dimensional stress and strain is an active research area in computational structural mechanics. As engineering projects grow more complex, the matrices generated during simulation grow larger, so a fast and efficient solver is needed. In this paper, we present swParaFEM, a highly efficient parallel finite element solver for the Sunway many-core architecture. It is based on the preconditioned conjugate gradient (PCG) iterative algorithm. We adopt a master–slave acceleration model to exploit the computational power of the Sunway supercomputer. We propose a kernel aggregation scheme to avoid wasting computing resources on the frequent creation and destruction of threads. Moreover, we speed up data transfer from the slave cores to the master core through memory access optimization. With these optimizations, swParaFEM achieves a speedup of 10.5× over the naive implementation on one compute group of an SW26010-Pro processor, and a strong scaling efficiency of 62.8% on 512 compute groups.
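
The abstract does not state which preconditioner or matrix storage format swParaFEM uses. As a point of reference only, here is a minimal sequential sketch of the PCG iteration, assuming a Jacobi (diagonal) preconditioner and CSR storage; both choices, and every function name below, are our assumptions rather than swParaFEM's implementation. In a master–slave port, the sparse matrix-vector product, dot products and vector updates inside the loop are the kind of kernels that would be offloaded to the slave cores.

```python
from typing import List, Tuple

# Assumed CSR (compressed sparse row) storage: (row_ptr, col_idx, values).
CSR = Tuple[List[int], List[int], List[float]]


def spmv(a: CSR, x: List[float]) -> List[float]:
    """Sparse matrix-vector product y = A x."""
    row_ptr, col_idx, vals = a
    n = len(row_ptr) - 1
    y = [0.0] * n
    for i in range(n):
        s = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            s += vals[k] * x[col_idx[k]]
        y[i] = s
    return y


def dot(u: List[float], v: List[float]) -> float:
    return sum(ui * vi for ui, vi in zip(u, v))


def inverse_diagonal(a: CSR) -> List[float]:
    """Jacobi preconditioner M^{-1} = diag(A)^{-1} (assumes a nonzero diagonal)."""
    row_ptr, col_idx, vals = a
    n = len(row_ptr) - 1
    m_inv = [0.0] * n
    for i in range(n):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            if col_idx[k] == i:
                m_inv[i] = 1.0 / vals[k]
    return m_inv


def pcg(a: CSR, b: List[float], tol: float = 1e-10, max_iter: int = 1000):
    """Solve A x = b for symmetric positive definite A; returns (x, iterations)."""
    n = len(b)
    m_inv = inverse_diagonal(a)
    x = [0.0] * n
    r = list(b)                                   # r0 = b - A*x0 with x0 = 0
    z = [mi * ri for mi, ri in zip(m_inv, r)]     # z0 = M^{-1} r0
    p = list(z)
    rz = dot(r, z)
    b_norm = dot(b, b) ** 0.5
    for it in range(max_iter):
        if dot(r, r) ** 0.5 <= tol * b_norm:      # relative residual test
            return x, it
        q = spmv(a, p)                            # SpMV: usually the dominant kernel
        alpha = rz / dot(p, q)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, q)]
        z = [mi * ri for mi, ri in zip(m_inv, r)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


def laplacian_1d(n: int) -> CSR:
    """Tridiagonal SPD test matrix: 2 on the diagonal, -1 next to it."""
    row_ptr, col_idx, vals = [0], [], []
    for i in range(n):
        for j, v in ((i - 1, -1.0), (i, 2.0), (i + 1, -1.0)):
            if 0 <= j < n:
                col_idx.append(j)
                vals.append(v)
        row_ptr.append(len(col_idx))
    return row_ptr, col_idx, vals


if __name__ == "__main__":
    A = laplacian_1d(4)
    b = spmv(A, [1.0, 1.0, 1.0, 1.0])   # right-hand side with known solution
    x, iters = pcg(A, b)
    print(iters, [round(xi, 8) for xi in x])
```

The small tridiagonal system is only a correctness check; a finite element stiffness matrix would be far larger, which is why the per-iteration kernels are worth parallelizing.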

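The kernel aggregation idea can be illustrated with a toy model. Below, a hypothetical `SlaveCoreLauncher` counts how many times work is dispatched to the slave cores; on real hardware each dispatch would correspond to a thread create/join cycle (for example through the Sunway athread library), which this sketch does not model. Fusing the two vector updates and the residual dot product of one CG step into a single kernel reduces three dispatches to one while producing the same result. The operations fused here are our choice for illustration, not necessarily those fused in swParaFEM.

```python
from typing import Callable, List, Tuple


class SlaveCoreLauncher:
    """Hypothetical stand-in for dispatching a kernel to the slave cores.

    Each launch() models one thread create/join cycle; here we only count
    the cycles and run the kernel sequentially on the calling thread.
    """

    def __init__(self) -> None:
        self.launches = 0

    def launch(self, kernel: Callable, *args):
        self.launches += 1
        return kernel(*args)


def axpy(alpha: float, x: List[float], y: List[float]) -> List[float]:
    """Return y + alpha * x."""
    return [yi + alpha * xi for xi, yi in zip(x, y)]


def dot(u: List[float], v: List[float]) -> float:
    return sum(ui * vi for ui, vi in zip(u, v))


def cg_update_separate(launcher, alpha, x, p, r, q):
    """Naive version: three kernels, three launches."""
    x = launcher.launch(axpy, alpha, p, x)      # x <- x + alpha * p
    r = launcher.launch(axpy, -alpha, q, r)     # r <- r - alpha * q
    rr = launcher.launch(dot, r, r)             # ||r||^2 for the convergence test
    return x, r, rr


def fused_update(alpha, x, p, r, q) -> Tuple[List[float], List[float], float]:
    """All three operations in a single pass over the vectors."""
    new_x, new_r, rr = [], [], 0.0
    for xi, pi, ri, qi in zip(x, p, r, q):
        ri_new = ri - alpha * qi
        new_x.append(xi + alpha * pi)
        new_r.append(ri_new)
        rr += ri_new * ri_new
    return new_x, new_r, rr


def cg_update_aggregated(launcher, alpha, x, p, r, q):
    """Aggregated version: one kernel, one launch."""
    return launcher.launch(fused_update, alpha, x, p, r, q)


if __name__ == "__main__":
    args = (0.5, [1.0, 2.0], [0.5, 0.5], [1.0, -1.0], [2.0, 0.0])
    naive, fused = SlaveCoreLauncher(), SlaveCoreLauncher()
    print(cg_update_separate(naive, *args), naive.launches)      # 3 launches
    print(cg_update_aggregated(fused, *args), fused.launches)    # 1 launch
```

Besides removing two dispatches per iteration, the fused kernel reads each vector once instead of re-reading `r` for the dot product, which also reduces memory traffic between slave-core local memory and main memory.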
Data availability

The data used to support the findings of this study are available from the corresponding author upon request.

Acknowledgements

This work was supported by the National Natural Science Foundation of China under Grant 62002186, the Shandong Youth Innovation Talent Introduction and Education Plan (Parallel Computing Industrial Software Innovation Team Based on Chinese Supercomputer), and the Qingdao National Laboratory for Marine Science and Technology under Grant No. 2018ASKJ01.

Funding

This work was supported by the National Natural Science Foundation of China under Grant 62002186, the Qingdao National Laboratory for Marine Science and Technology under Grant No. 2018ASKJ01, and the Shandong Youth Innovation Talent Introduction and Education Plan (Parallel Computing Industrial Software Innovation Team Based on Chinese Supercomputer).

Author information

Contributions

LX and JP wrote the main manuscript text, and the other authors prepared Figs. 1–10. All authors reviewed the manuscript.

Corresponding author

Correspondence to Min Tian.

Ethics declarations

Conflict of interest

We declare that the authors have no competing interests as defined by Springer, or other interests that might be perceived to influence the results and/or discussion reported in this paper.

Ethical approval

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Pan, J., Xiao, L., Tian, M. et al. swParaFEM: a highly efficient parallel finite element solver on Sunway many-core architecture. J Supercomput 79, 11427–11451 (2023). https://doi.org/10.1007/s11227-023-05114-5

  • DOI: https://doi.org/10.1007/s11227-023-05114-5
