Parallel Region Reconstruction Technique for Sunway High-Performance Multi-core Processors

Nie, Kai; Zhou, Qinglei; Qian, Hong; Pang, Jianmin; Xu, Jinlong; Li, Yapeng

doi:10.1007/978-981-16-5940-9_13

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1451)

Included in the following conference series:

International Conference of Pioneering Computer Scientists, Engineers and Educators

Abstract

OpenMP is the leading way to achieve thread-level parallelism on Sunway high-performance multi-core processors. To address the low parallel efficiency caused by high thread group control overhead in compiled Sunway OpenMP programs, this paper proposes a parallel region reconstruction technique. The technique widens the scope of parallel regions in OpenMP programs through parallel region merging and parallel region extending. This reduces the number of parallel regions, lowers the overhead of frequently creating and joining thread groups, and converts standard fork-join OpenMP programs into higher-performance SPMD-model OpenMP programs. On the Sunway 1621 server, the technique improved the running efficiency of NPB3.3-OMP and SPEC OMP2012 by 8.9% and 7.9%, respectively. These results indicate that parallel region reconstruction is feasible and effective, and that it provides technical support for fully exploiting the multi-core parallelism of Sunway high-performance processors.
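
To make the idea of parallel region merging concrete, the following is a minimal hand-written sketch in C with OpenMP pragmas. It is not the authors' compiler transformation; the function names, the array size `N`, and the two example loops are assumptions chosen only for illustration. The first function uses two separate parallel regions (fork-join), and the second encloses both loops in a single parallel region, so the thread group is created and joined once.

```c
#include <assert.h>
#include <stddef.h>

#define N 1024  /* hypothetical problem size, for illustration only */

/* Fork-join form: each "parallel for" is its own parallel region,
   so the thread group is created and joined twice. */
static void scale_then_shift_fork_join(double *a, double *b, long n)
{
    assert(a != NULL && b != NULL);

    #pragma omp parallel for
    for (long i = 0; i < n; i++)
        a[i] = 2.0 * a[i];

    #pragma omp parallel for
    for (long i = 0; i < n; i++)
        b[i] = a[i] + 1.0;
}

/* Merged form: one parallel region encloses both worksharing loops.
   The implicit barrier at the end of the first "omp for" keeps the
   second loop from reading a[] before it has been fully updated. */
static void scale_then_shift_merged(double *a, double *b, long n)
{
    assert(a != NULL && b != NULL);

    #pragma omp parallel
    {
        #pragma omp for
        for (long i = 0; i < n; i++)
            a[i] = 2.0 * a[i];

        #pragma omp for
        for (long i = 0; i < n; i++)
            b[i] = a[i] + 1.0;
    }
}

/* Returns 1 when both forms produce identical arrays, 0 otherwise. */
int merged_matches_fork_join(void)
{
    static double a1[N], b1[N], a2[N], b2[N];

    for (long i = 0; i < N; i++) {
        a1[i] = a2[i] = (double)i;
        b1[i] = b2[i] = 0.0;
    }

    scale_then_shift_fork_join(a1, b1, N);
    scale_then_shift_merged(a2, b2, N);

    for (long i = 0; i < N; i++)
        if (a1[i] != a2[i] || b1[i] != b2[i])
            return 0;
    return 1;
}
```

Calling `merged_matches_fork_join()` from a `main` function checks that the two forms compute the same result. With GCC the pragmas take effect when the file is compiled with `-fopenmp`; without that flag they are ignored and the code runs serially. Whether merging is legal in a real program depends on the code between the regions, which this sketch does not model.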

Author information

Authors and Affiliations

Information Engineering University, Zhengzhou, 450001, Henan, China
Kai Nie, Jianmin Pang, Jinlong Xu & Yapeng Li
Zhengzhou University, Zhengzhou, 450001, Henan, China
Qinglei Zhou
Jiangnan Institute of Computing Technology, Wuxi, 214083, Jiangsu, China
Hong Qian

Editor information

Editors and Affiliations

North University of China, Taiyuan, China
Jianchao Zeng
North University of China, Taiyuan, China
Pinle Qin
Northeast Forestry University, Harbin, China
Weipeng Jing
Harbin University of Science and Technology, Harbin, China
Xianhua Song
National Academy of Guo Ding Institute of Data Science, Beijing, China
Zeguang Lu

About this paper

Cite this paper

Nie, K., Zhou, Q., Qian, H., Pang, J., Xu, J., Li, Y. (2021). Parallel Region Reconstruction Technique for Sunway High-Performance Multi-core Processors. In: Zeng, J., Qin, P., Jing, W., Song, X., Lu, Z. (eds) Data Science. ICPCSEE 2021. Communications in Computer and Information Science, vol 1451. Springer, Singapore. https://doi.org/10.1007/978-981-16-5940-9_13

DOI: https://doi.org/10.1007/978-981-16-5940-9_13
Published: 10 September 2021
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-5939-3
Online ISBN: 978-981-16-5940-9
eBook Packages: Computer Science, Computer Science (R0)
