Performance Modeling of Stencil Computation on SW26010 Processors

Liu, Yao; Liu, Li; Hu, Mengtao; Wang, Wei; Xue, Wei; Zhu, Qingting

doi:10.1007/978-3-030-60245-1_27

Yao Liu⁹,
Li Liu⁹,
Mengtao Hu⁹,
Wei Wang⁹,
Wei Xue¹⁰ &
…
Qingting Zhu⁹

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 12452))

Included in the following conference series:

International Conference on Algorithms and Architectures for Parallel Processing

1532 Accesses
1 Citations

Abstract

Stencil computation is a basic part in a large variety of scientific computing programs, especially for those containing partial differential equations. Due to the limited memory bandwidth, it is a challenge to improve the parallel efficiency of stencil computation on modern supercomputers. Performance modeling is the most common method of performance analysis. In this paper, we propose the generic performance model based on Sunway TaihuLight which is powered by SW26010 heterogeneous many-core processors. The generic model indicates the interaction between the programs and the computing platform from the architecture perspective, and points out the performance bottlenecks of the programs from the optimization perspective. Furthermore, we propose the specific performance model of stencil computation on SW26010 processors, and optimize the performance of stencil computation under the guidance of the model. The experimental results show that the performance models proposed in this paper are effective—the average error ratio of the predicted performance is less than 7%. Guided by the specific model, the optimized stencil computation achieves better performance than the unoptimized many-core version by 154.71% on 4096 cores.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 84.99; Price excludes VAT (USA)

Softcover Book: USD 109.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

Ao, Y., et al.: 26 PFLOPS stencil computations for atmospheric modeling on Sunway TaihuLight. In: 2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pp. 535–544. IEEE (2017)
Google Scholar
Barnes, B.J., Rountree, B., Lowenthal, D.K., Reeves, J., De Supinski, B., Schulz, M.: A regression-based approach to scalability prediction. In: Proceedings of the 22nd Annual International Conference on Supercomputing, pp. 368–377 (2008)
Google Scholar
Burtscher, M., Kim, B.D., Diamond, J., McCalpin, J., Koesterke, L., Browne, J.: Perfexpert: an easy-to-use performance diagnosis tool for HPC applications. In: SC 2010: Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1–11. IEEE (2010)
Google Scholar
Chen, B., et al.: Simulating the Wenchuan earthquake with accurate surface topography on Sunway TaihuLight. In: SC18: International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 517–528. IEEE (2018)
Google Scholar
Chen, G., Wu, B., Li, D., Shen, X.: Porple: an extensible optimizer for portable data placement on GPU. In: 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture, pp. 88–100. IEEE (2014)
Google Scholar
Datta, K., Kamil, S., Williams, S., Oliker, L., Shalf, J., Yelick, K.: Optimization and performance modeling of stencil computations on modern microprocessors. SIAM Rev. 51(1), 129–159 (2009)
Article Google Scholar
Dennis, J.M., et al.: Cam-se: a scalable spectral element dynamical core for the community atmosphere model. Int. J. High Perform. Comput. Appl. 26(1), 74–89 (2012)
Article Google Scholar
Ding, N., Xu, S., Song, Z., Zhang, B., Li, J., Zheng, Z.: Using hardware counter-based performance model to diagnose scaling issues of HPC applications. Neural Comput. Appl. 31(5), 1563–1575 (2019). https://doi.org/10.1007/s00521-018-3496-z
Article Google Scholar
Dong, W., Li, K., Kang, L., Quan, Z., Li, K.: Implementing molecular dynamics simulation on the Sunway TaihuLight system with heterogeneous many-core processors. Concurr. Comput.: Pract. Exp. 30(16), e4468 (2018)
Article Google Scholar
Fu, H., et al.: Refactoring and optimizing the community atmosphere model (CAM) on the Sunway TaihuLight supercomputer. In: SC 2016: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 969–980. IEEE (2016)
Google Scholar
Fu, H., et al.: The Sunway TaihuLight supercomputer: system and applications. Sci. China Inf. Sci. 59(7), 072001 (2016). https://doi.org/10.1007/s11432-016-5588-7
Article Google Scholar
Hoefler, T., Gropp, W., Kramer, W., Snir, M.: Performance modeling for systematic performance tuning. In: SC 2011: Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1–12. IEEE (2011)
Google Scholar
Hong, S., Kim, H.: An analytical model for a GPU architecture with memory-level and thread-level parallelism awareness. In: Proceedings of the 36th Annual International Symposium on Computer Architecture, pp. 152–163 (2009)
Google Scholar
Langtangen, H.P.: Computational Partial Differential Equations: Numerical Methods and Diffpack Programming, vol. 2. Springer, Berlin (1999). https://doi.org/10.1007/978-3-662-01170-6
Book MATH Google Scholar
Li, L., et al.: swCaffe: a parallel framework for accelerating deep learning applications on Sunway TaihuLight. In: 2018 IEEE International Conference on Cluster Computing (CLUSTER), pp. 413–422. IEEE (2018)
Google Scholar
Liu, Y., Liao, Q., Sun, J., Hu, M., Liu, L., Zheng, L.: A heterogeneous parallel genetic algorithm based on sw26010 processors. In: 2019 IEEE 21st International Conference on High Performance Computing and Communications; IEEE 17th International Conference on Smart City; IEEE 5th International Conference on Data Science and Systems (HPCC/SmartCity/DSS), pp. 54–61. IEEE (2019)
Google Scholar
Shirako, J., et al.: Analytical bounds for optimal tile size selection. In: O’Boyle, Michael (ed.) CC 2012. LNCS, vol. 7210, pp. 101–121. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-28652-0_6
Chapter Google Scholar
Vizitiu, A., Itu, L., Niţă, C., Suciu, C.: Optimized three-dimensional stencil computation on Fermi and Kepler GPUs. In: 2014 IEEE High Performance Extreme Computing Conference (HPEC), pp. 1–6. IEEE (2014)
Google Scholar
Williams, S., Waterman, A., Patterson, D.: Roofline: an insightful visual performance model for multicore architectures. Commun. ACM 52(4), 65–76 (2009)
Article Google Scholar
Xu, Z., Lin, J., Matsuoka, S.: Benchmarking sw26010 many-core processor. In: 2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pp. 743–752. IEEE (2017)
Google Scholar
Yang, C., et al.: 10m-core scalable fully-implicit solver for nonhydrostatic atmospheric dynamics. In: SC 2016: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 57–68. IEEE (2016)
Google Scholar
You, Y., et al.: Accelerating the 3D elastic wave forward modeling on GPU and MIC. In: 2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum, pp. 1088–1096. IEEE (2013)
Google Scholar
Zhang, G., Zhao, Y.: Modeling the performance of 2.5 d blocking of 3D stencil code on GPUs. In: IEEE High Performance Extreme Computing Conference, HPEC (2016)
Google Scholar
Zhang, J., et al.: Extreme-scale phase field simulations of coarsening dynamics on the Sunway TaihuLight supercomputer. In: SC 2016: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 34–45. IEEE (2016)
Google Scholar

Download references

Acknowledgement

This work is supported by the Ministry of Education’s University-Industry Collaborative Education Program (No. 201902146019)

Author information

Authors and Affiliations

School of Data Science and Engineering, East China Normal University, Shanghai, China
Yao Liu, Li Liu, Mengtao Hu, Wei Wang & Qingting Zhu
Department of Computer Science and Technology, Tsinghua University, Beijing, China
Wei Xue

Authors

Yao Liu
View author publications
You can also search for this author in PubMed Google Scholar
Li Liu
View author publications
You can also search for this author in PubMed Google Scholar
Mengtao Hu
View author publications
You can also search for this author in PubMed Google Scholar
Wei Wang
View author publications
You can also search for this author in PubMed Google Scholar
Wei Xue
View author publications
You can also search for this author in PubMed Google Scholar
Qingting Zhu
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Qingting Zhu .

Editor information

Editors and Affiliations

Columbia University, New York, NY, USA
Meikang Qiu

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Liu, Y., Liu, L., Hu, M., Wang, W., Xue, W., Zhu, Q. (2020). Performance Modeling of Stencil Computation on SW26010 Processors. In: Qiu, M. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2020. Lecture Notes in Computer Science(), vol 12452. Springer, Cham. https://doi.org/10.1007/978-3-030-60245-1_27

Download citation

DOI: https://doi.org/10.1007/978-3-030-60245-1_27
Published: 29 September 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-60244-4
Online ISBN: 978-3-030-60245-1
eBook Packages: Mathematics and StatisticsMathematics and Statistics (R0)

Publish with us

Policies and ethics