Parallel Region Reconstruction Technique for Sunway High-Performance Multi-core Processors

  • Conference paper
Data Science (ICPCSEE 2021)

Abstract

OpenMP is the leading way to achieve thread-level parallelism on Sunway high-performance multi-core processors. Compiled Sunway OpenMP programs, however, suffer from low parallel efficiency because of the high overhead of controlling thread groups. To address this problem, this paper proposes a parallel region reconstruction technique. The technique widens the scope of parallel regions in OpenMP programs through parallel region merging and parallel region extending. This reduces the number of parallel regions, lowers the overhead of frequently creating and joining thread groups, and converts standard fork-join OpenMP programs into higher-performance SPMD OpenMP programs. On the Sunway 1621 server, the technique improved the running efficiency of NPB3.3-OMP and SPEC OMP2012 by 8.9% and 7.9%, respectively. These results indicate that parallel region reconstruction is feasible and effective, and that it provides technical support for fully exploiting the multi-core parallelism of Sunway high-performance processors.
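As a rough illustration of what parallel region merging means at the source level, the sketch below contrasts two adjacent `parallel for` regions with a single enclosing `parallel` region that holds both work-sharing loops. It is a hand-written example, not the compiler transformation described in the paper; the function names and the loop bodies are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Fork-join style (hypothetical example): each loop is its own parallel
 * region, so a thread group is created and joined twice. */
void scale_then_shift_forkjoin(double *a, double *b, int n)
{
    assert(a != NULL && b != NULL);

    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        a[i] = 2.0 * a[i];

    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        b[i] = a[i] + 1.0;
}

/* Merged style (hypothetical example): one parallel region encloses both
 * work-sharing loops, so the thread group is created and joined once. */
void scale_then_shift_merged(double *a, double *b, int n)
{
    assert(a != NULL && b != NULL);

    #pragma omp parallel
    {
        #pragma omp for
        for (int i = 0; i < n; i++)
            a[i] = 2.0 * a[i];
        /* The implicit barrier at the end of the first "omp for" ensures
         * every a[i] is written before any thread starts the second loop. */

        #pragma omp for
        for (int i = 0; i < n; i++)
            b[i] = a[i] + 1.0;
    }
}
```

Both functions compute the same result. The merged version keeps the barrier between the loops but avoids a second creation and join of the thread group, which is the kind of overhead the abstract refers to. Compiled without OpenMP support, the pragmas are ignored and both functions run serially.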

Copyright information

© 2021 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Nie, K., Zhou, Q., Qian, H., Pang, J., Xu, J., Li, Y. (2021). Parallel Region Reconstruction Technique for Sunway High-Performance Multi-core Processors. In: Zeng, J., Qin, P., Jing, W., Song, X., Lu, Z. (eds) Data Science. ICPCSEE 2021. Communications in Computer and Information Science, vol 1451. Springer, Singapore. https://doi.org/10.1007/978-981-16-5940-9_13

  • DOI: https://doi.org/10.1007/978-981-16-5940-9_13

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-16-5939-3

  • Online ISBN: 978-981-16-5940-9

  • eBook Packages: Computer Science, Computer Science (R0)
