Abstract
In this paper, we give a comprehensive overview of our node-level optimization of the high-order finite element software SeisSol, aiming at minimizing energy- and time-to-solution. SeisSol simulates dynamic rupture and seismic wave propagation at petascale performance in production runs. In this context, we analyze the impact that convergence order, CPU clock frequency, vector instruction sets and chip-level parallelism have on the execution time, energy consumption and accuracy of the obtained solution. From a performance perspective, especially on state-of-the-art and future architectures, the shift from a memory- to a compute-bound scheme and the need for double precision arithmetic with increasing orders of convergence are compelling. Our results show that we are able to reduce the computational error by up to five orders of magnitude when increasing the order of the scheme from 2 to 7, while consuming the same amount of energy.
Notes
1. WSM, HSW: Wattsup powermeter (accuracy: \(\pm\) 1.5 %), SNB: MEGWARE clustsafe.
2. Doubling the number of transistors doubles the amount of computations within the same energy budget; number of transistors: WSM: 2 \(\times \) 1.17 B, SNB: 2 \(\times \) 2.26 B, HSW: 2 \(\times \) 5.57 B.
3. Available at http://www.sismowine.org.
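The rule of thumb in note 2 can be turned into a small worked calculation. The sketch below is illustrative only: it takes the transistor counts exactly as listed in the note and computes the ratio between architectures, which under the note's assumption is the expected growth in computations per unit of energy. The function name `expected_compute_scaling` and the dictionary layout are hypothetical, and reading the factor 2 as two chips per node is an assumption; it cancels out in the ratios either way.

```python
# Transistor counts in billions, copied from note 2.
# Assumption: the leading factor 2 stands for two chips per node.
TRANSISTORS_BILLION = {
    "WSM": 2 * 1.17,
    "SNB": 2 * 2.26,
    "HSW": 2 * 5.57,
}


def expected_compute_scaling(baseline: str, target: str) -> float:
    """Return the transistor-count ratio target/baseline.

    Under the assumption of note 2 (doubling the transistors doubles the
    computations within the same energy budget), this ratio is the expected
    increase in computations per unit of energy. Hypothetical helper.
    """
    return TRANSISTORS_BILLION[target] / TRANSISTORS_BILLION[baseline]


if __name__ == "__main__":
    for baseline, target in [("WSM", "SNB"), ("SNB", "HSW"), ("WSM", "HSW")]:
        ratio = expected_compute_scaling(baseline, target)
        print(f"{baseline} -> {target}: {ratio:.2f}x")
```

With the numbers from the note, the ratios come out at roughly 1.93 (WSM to SNB), 2.46 (SNB to HSW) and 4.76 (WSM to HSW). These are transistor-count ratios only, not measured energy figures.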
Acknowledgments
Our project was supported by the Intel Parallel Computing Centre “ExScaMIC - Extreme Scalability on x86/MIC”. We gratefully acknowledge the respective support by Intel Corporation.
Optimization Notice: Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more information go to http://www.intel.com/performance. Intel, Xeon, and Intel Xeon Phi are trademarks of Intel Corporation in the U.S. and/or other countries.
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Breuer, A., Heinecke, A., Rannabauer, L., Bader, M. (2015). High-Order ADER-DG Minimizes Energy- and Time-to-Solution of SeisSol. In: Kunkel, J., Ludwig, T. (eds) High Performance Computing. ISC High Performance 2015. Lecture Notes in Computer Science, vol 9137. Springer, Cham. https://doi.org/10.1007/978-3-319-20119-1_25
DOI: https://doi.org/10.1007/978-3-319-20119-1_25
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-20118-4
Online ISBN: 978-3-319-20119-1
eBook Packages: Computer Science, Computer Science (R0)