Abstract
The Sunway processor has demonstrated superior performance across various scientific applications, domain-specific frameworks, and numerical algorithms. However, the optimization techniques that fully exploit its architectural features are usually buried deep in large code bases, which prevents average programmers from understanding them. As a result, the existing complex software offers little guidance to other programs that could benefit from the computational power of the Sunway processor. In this paper, we build a benchmark suite, swRodinia, by porting and optimizing the well-known Rodinia benchmark for the Sunway processor. Specifically, we demonstrate several optimization techniques by tailoring the benchmarks to better leverage the architectural features for higher performance. Moreover, based on our optimization experience, we derive several useful insights from both software and hardware perspectives that not only guide better utilization of the current Sunway processor, but also reveal directions for hardware improvements in future Sunway processors. We open-source the swRodinia benchmark suite and encourage the community to continuously enhance the benchmark with us.
Acknowledgment
This work is supported by the National Key R&D Program of China (Grant No. 2020YFB150001), the National Natural Science Foundation of China (Grant Nos. 62072018, 61502019, and 61732002), and the Open Project Program of the State Key Laboratory of Mathematical Engineering and Advanced Computing (Grant No. 2019A12).
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Chen, B. et al. (2021). swRodinia: A Benchmark Suite for Exploiting Architecture Properties of Sunway Processor. In: Wolf, F., Gao, W. (eds) Benchmarking, Measuring, and Optimizing. Bench 2020. Lecture Notes in Computer Science, vol 12614. Springer, Cham. https://doi.org/10.1007/978-3-030-71058-3_2
DOI: https://doi.org/10.1007/978-3-030-71058-3_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-71057-6
Online ISBN: 978-3-030-71058-3
eBook Packages: Computer Science, Computer Science (R0)