swParaFEM: a highly efficient parallel finite element solver on Sunway many-core architecture

Pan, Jingshan; Xiao, Lei; Tian, Min; Liu, Tao; Wang, Yinglong

doi:10.1007/s11227-023-05114-5

Published: 28 February 2023

Volume 79, pages 11427–11451, (2023)

The Journal of Supercomputing

Jingshan Pan¹,²,
Lei Xiao¹,
Min Tian¹,
Tao Liu¹ &
…
Yinglong Wang¹

Abstract

The simulation of three-dimensional stress and strain is a research hot spot in computational structural mechanics. As project complexity increases, so does the size of the matrix generated during the simulation, so a fast and efficient solver is needed. In this paper, we present swParaFEM, a highly efficient parallel finite element solver for the Sunway many-core architecture, based on the preconditioned conjugate gradient iteration algorithm. We use a master–slave acceleration model to exploit the computational power of the Sunway supercomputer. A kernel aggregation optimization scheme is proposed to address the computing resources wasted by the frequent creation and destruction of threads. Moreover, we improve the data transfer speed from the slave core to the master core through memory access optimization. With these optimizations, we achieve a speedup of 10.5× over the naive implementation on one compute group of an SW26010-Pro processor and a strong scaling efficiency of 62.8% on 512 compute groups.
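For readers unfamiliar with the iteration the abstract names, the following is a minimal serial sketch of a preconditioned conjugate gradient solve. It is not the swParaFEM implementation: the function name `pcg`, the dense list-of-lists matrix, and the Jacobi (diagonal) preconditioner are all assumptions chosen for illustration, since the abstract does not state which preconditioner or storage format the solver uses.

```python
def pcg(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive definite matrix A.

    Illustrative assumptions: A is a dense list of lists and the
    preconditioner is Jacobi (the inverse of the diagonal of A).
    Returns the solution and the number of iterations performed.
    """
    n = len(b)

    def matvec(v):
        # Dense matrix-vector product A v.
        return [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]

    def dot(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))

    inv_diag = [1.0 / A[i][i] for i in range(n)]

    x = [0.0] * n
    r = list(b)                                   # r = b - A x with x = 0
    z = [d * ri for d, ri in zip(inv_diag, r)]    # z = M^{-1} r
    p = list(z)                                   # first search direction
    rz = dot(r, z)
    b_norm = dot(b, b) ** 0.5 or 1.0              # avoid dividing by zero

    for k in range(max_iter):
        # Stop once the relative residual norm is below the tolerance.
        if dot(r, r) ** 0.5 <= tol * b_norm:
            return x, k
        Ap = matvec(p)
        alpha = rz / dot(p, Ap)                   # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = [d * ri for d, ri in zip(inv_diag, r)]
        rz_new = dot(r, z)
        beta = rz_new / rz                        # conjugacy coefficient
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


if __name__ == "__main__":
    A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
    b = [1.0, 2.0, 3.0]
    x, iters = pcg(A, b)
    print(x, iters)
```

Each iteration is dominated by one matrix-vector product plus a few dot products and vector updates. These are the kinds of kernels that a many-core port would typically offload to the accelerator cores.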

Data availability

The data used to support the findings of this study are available from the corresponding author upon request.

Acknowledgements

This work was supported by the National Natural Science Foundation of China under Grant 62002186, the Shandong Youth Innovation Talent Introduction and Education Plan (Parallel Computing Industrial Software Innovation Team Based on Chinese Supercomputer), and the Qingdao National Laboratory for Marine Science and Technology under Grant No. 2018ASKJ01.

Funding

This work was supported by the National Natural Science Foundation of China under Grant 62002186, the Qingdao National Laboratory for Marine Science and Technology under Grant No. 2018ASKJ01, and the Shandong Youth Innovation Talent Introduction and Education Plan (Parallel Computing Industrial Software Innovation Team Based on Chinese Supercomputer).

Author information

Authors and Affiliations

Shandong Computer Science Center (National Supercomputer Center in Jinan), Qilu University of Technology (Shandong Academy of Sciences), Jinan, Shandong, China
Jingshan Pan, Lei Xiao, Min Tian, Tao Liu & Yinglong Wang
Faculty of Information Science and Engineering, Ocean University of China, Qingdao, China
Jingshan Pan

Contributions

LX and JP wrote the main manuscript text, and the other authors prepared Figures 1–10. All authors reviewed the manuscript.

Corresponding author

Correspondence to Min Tian.

Ethics declarations

Conflict of interest

We declare that the authors have no competing interests as defined by Springer, or other interests that might be perceived to influence the results and/or discussion reported in this paper.

Ethical approval

This declaration is not applicable to this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Pan, J., Xiao, L., Tian, M. et al. swParaFEM: a highly efficient parallel finite element solver on Sunway many-core architecture. J Supercomput 79, 11427–11451 (2023). https://doi.org/10.1007/s11227-023-05114-5

Accepted: 11 February 2023
Published: 28 February 2023
Issue Date: July 2023
DOI: https://doi.org/10.1007/s11227-023-05114-5
