High-Order ADER-DG Minimizes Energy- and Time-to-Solution of SeisSol

Breuer, Alexander; Heinecke, Alexander; Rannabauer, Leonhard; Bader, Michael

doi:10.1007/978-3-319-20119-1_25

High-Order ADER-DG Minimizes Energy- and Time-to-Solution of SeisSol

Alexander Breuer¹⁵,
Alexander Heinecke¹⁶,
Leonhard Rannabauer¹⁵ &
…
Michael Bader¹⁵

Conference paper
First Online: 01 January 2015

2883 Accesses
7 Citations

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 9137))

Abstract

In this paper we give a comprehensive overview of our node-level optimization of the high-order finite element software SeisSol aiming at minimizing energy- and time-to-solution. SeisSol simulates dynamic rupture and seismic wave propagation at petascale performance in production runs. In this context we analyze the impact that convergence order, CPU clock frequency, vector instruction sets and chip-level parallelism have on the execution time, energy consumption and accuracy of the obtained solution. From a performance perspective, especially on state-of-the-art and future architectures, the shift from a memory- to a compute-bound scheme and the need for double precision arithmetic with increasing orders of convergence is compelling. Our results show that we are able to reduce the computational error by up to five orders of magnitudes when increasing the order of the scheme from 2 to 7, while consuming the same amount of energy.

This is a preview of subscription content, log in via an institution.

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Learn about institutional subscriptions

Notes

1.
WSM, HSW: Wattsup powermeter (accuracy: \(+/-\) 1.5 %), SNB: MEGWARE clustsafe.
2.
Doubling the number of transistors doubles the amount of computations within the same energy budget; number of transistors: WSM: 2 \(\times \) 1.17 B, SNB: 2 \(\times \) 2.26 B, HSW: 2 \(\times \) 5.57 B.
3.
Available at http://www.sismowine.org.

References

Aliaga, J.I., Barreda, M., Dolz, M.F., Quintana-Orti, E.S.: Are our dense linear Algebra libraries energy-friendly? Comput. Sci. Res. Dev. 30(2), 187–196 (2015)
Article Google Scholar
Anzt, H., Beglarian, A., Chilingaryan, S., Ferrone, A., Heuveline, V., Kopmann, A.: A unified energy footprint for simulation software. Comput. Sci. -Res. Dev. 29(2), 131–138 (2014)
Article Google Scholar
Auweter, A., Bode, A., Brehm, M., Brochard, L., Hammer, N., Huber, H., Panda, R., Thomas, F., Wilde, T.: A case study of energy aware scheduling on supermuc. In: Kunkel, J.M., Ludwig, T., Meuer, H.W. (eds.) ISC 2014. LNCS, vol. 8488, pp. 394–409. Springer, Heidelberg (2014)
Google Scholar
Bosilca, G., Ltaief, H., Dongarra, J.: Power profiling of cholesky and qr factorizations on distributed memory systems. In: Third International Conference on Energy-Aware High Performance Computing, Hamburg, September 2012
Google Scholar
Breuer, A., Heinecke, A., Rettenberger, S., Bader, M., Gabriel, A.-A., Pelties, C.: Sustained petascale performance of seismic simulations with seissol on supermuc. In: Kunkel, J.M., Ludwig, T., Meuer, H.W. (eds.) ISC 2014. LNCS, vol. 8488, pp. 1–18. Springer, Heidelberg (2014)
Google Scholar
Cebrian,J.W., Natvig, L., Meyer, J.C.: Improving energy efficiency through parallelization and vectorization on Intel Core i5 and i7 processors. In: High Performance Computing, Networking Storage and Analysis, SC Companion: 0:675–684 (2012)
Google Scholar
Charles, J., Sawyer, W., Dolz, M.F., Catalń, S.: Evaluating the performance and energy efficiency of the COSMO-ART model system. Comput. Sci. Res. Dev. 30(2), 177–186 (2015)
Article Google Scholar
Chen, T., Du, Z., Sun, N., Wang, J., Wu, C., Chen, Y., Temam, O.: Diannao: a small-footprint high-throughput accelerator for ubiquitous machine-learning. In: Proceedings of the 19th International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS 2014, pp. 269–284. ACM, New York (2014)
Google Scholar
Chen, Y., Luo, T., Liu, S., Zhang, S., He, L., Wang, J., Li, L., Chen, T., Xu, Z., Sun, N., Temam, O.: Dadiannao: a machine-learning supercomputer. In: ACM/IEEE International Symposium on Microarchitecture (MICRO), December 2014
Google Scholar
Cheveresan, R., Ramsay, M., Feucht, C., Sharapov, I.: Characteristics of workloads used in high performance and technical computing. In: Proceedings of the 21st Annual International Conference on Supercomputing, ICS 2007, pp. 73–82. ACM, New York (2007)
Google Scholar
Demmel, J., Gearhart, A.: Instrumenting linear algebra energy consumption via on-chip energy counters. Technical report (2012)
Google Scholar
Dongarra, J., Ltaief, H., Luszczek, P., Weaver, V.M.: Energy footprint of advanced dense numerical linear algebra using tile algorithms on multicore architecture. In: 2012 Second International Conference on Cloud and Green Computing (CGC), pp. 274–281. IEEE (2012)
Google Scholar
Dumbser, M., Käser, M.: An arbitrary high-order discontinuous Galerkin method for elastic waves on unstructured meshes - II. The three-dimensional isotropic case. Geophys. J. Int. 167(1), 319–336 (2006)
Article Google Scholar
Hager, G., Treibig, J., Habich, J., Wellein, G.: Exploring performance and power properties of modern multicore chips via simple machine models. CoRR, abs/1208.2908, 2012
Google Scholar
Hähnel, M., Döbel, B., Völp, M., Härtig, H.: Measuring energy consumption for short code paths using rapl. SIGMETRICS Perform. Eval. Rev. 40(3), 13–17 (2012)
Article Google Scholar
Heinecke, A., Breuer, A., Rettenberger, S., Bader, M., Gabriel, A.-A., Pelties, C., Bode, A., Barth, W., Liao, X-K., Vaidyanathan, K., Smelyanskiy, M., Dubey, P.: Petascale high order dynamic rupture earthquake simulations on heterogeneous supercomputers. In: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis SC14, pp. 3–14. IEEE, New Orleans, November 2014. Gordon Bell Finalist
Google Scholar
Heinecke, A., Vaidyanathan, K., Smelyanskiy, M., Kobotov, A., Dubtsov, R., Henry, G., Chrysos, G., Shet, A.G., Dubey, P.: Design and implementation of the linpack benchmark for single and multi-node systems based on intel(r) xeon phi(tm) coprocessor. In: 27th IEEE International Symposium on Parallel and Distributed Processing, IPDPS 2013, pp. 126–137. IEEE Computer Society, Cambridge, Boston, USA, 20–24 May 2013
Google Scholar
Käser, M., Dumbser, M.: An arbitrary high-order discontinuous galerkin method for elasticwaves on unstructured meshesi. the two-dimensional isotropic case withexternal source terms. Geophysical Journal International 166(2), 855–877 (2006)
Article Google Scholar
Lawson, G., Sosonkina, M., Shen, Y.: Energy evaluation for applications with different thread affinities on the intel xeon phi. In: 2014 International Symposium on Computer Architecture and High Performance Computing Workshop (SBAC-PADW), pp. 54–59, October 2014
Google Scholar
Lawson, G., Sosonkina, M., Shen, Y.: Performance and energy evaluation of comd on intel xeon phi co-processors. In: Proceedings of the 1st International Workshop on Hardware-Software Co-Design for High Performance Computing, Co-HPC 2014, pp. 49–54, IEEE Press, Piscataway, NJ, USA (2014)
Google Scholar
Ltaief, H., Luszczek, P., Dongarra, J.: Profiling high performance dense linear algebra algorithms on multicore architectures for power and energy efficiency. In: International Conference on Energy-Aware High Performance Computing (EnA-HPC 2011), Hamburg, Germany, September 2011
Google Scholar
Moczo, P., Kristek, J., Galis, M., Pazak, P., Balazovjech, M.: The finite-difference and finite-element modeling of seismic wave propagation and earthquake motion. Acta phys. slovaca 57(2), 177–406 (2007)
Article Google Scholar
Rahman, S.F.,Guo, J., Yi, Q.: Automated empirical tuning of scientific codes for performance and power consumption. In: Proceedings of the 6th International Conference on High Performance and Embedded Architectures and Compilers, HiPEAC 2011, pp. 107–116. ACM, New York, NY, USA (2011)
Google Scholar
Rotem, E., Naveh, A., Ananthakrishnan, A., Rajwan, D., Weissmann, E.: Power-management architecture of the intel microarchitecture code-named sandy bridge. Micro, IEEE 32(2), 20–27 (2012)
Article Google Scholar
Taylor, M.B.: Is dark silicon useful?: harnessing the four horsemen of the coming dark silicon apocalypse. In: Proceedings of the 49th Annual Design Automation Conference, DAC 2012, pp. 1131–1136. ACM, New York (2012)
Google Scholar
Tiwari, A., Laurenzano, M.A., Carrington, L., Snavely, A.: Auto-tuning for energy usage in scientific applications. In: Proceedings of the 2011 International Conference on Parallel Processing - vol. 2, Euro-Par 2011, pp. 178–187. Springer-Verlag, Berlin, Heidelberg (2012)
Google Scholar
Zecena, I., Burtscher, M., Jin, T., Zong, Z.: Evaluating the performance and energy efficiency of n-body codes on multi-core cpus and gpus. In: 2013 IEEE 32nd International Performance Computing and Communications Conference (IPCCC), pp. 1–8. IEEE (2013)
Google Scholar

Download references

Acknowledgments

Our project was supported by the Intel Parallel Computing Centre “ExScaMIC - Extreme Scalability on x86/MIC”. We gratefully acknowledge the respective support by Intel Corporation.

Optimization Notice: Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more information go to http://www.intel.com/performance. Intel, Xeon, and Intel Xeon Phi are trademarks of Intel Corporation in the U.S. and/or other countries.

Author information

Authors and Affiliations

Technische Universität München, Boltzmannstr. 3, 85748, Garching, Germany
Alexander Breuer, Leonhard Rannabauer & Michael Bader
Intel Corporation, 2200 Mission College Blvd., Santa Clara, CA, 95054, USA
Alexander Heinecke

Authors

Alexander Breuer
View author publications
You can also search for this author in PubMed Google Scholar
Alexander Heinecke
View author publications
You can also search for this author in PubMed Google Scholar
Leonhard Rannabauer
View author publications
You can also search for this author in PubMed Google Scholar
Michael Bader
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Alexander Breuer .

Editor information

Editors and Affiliations

Deutsches Klimarechenzentrum (DKRZ), Hamburg, Germany
Julian M. Kunkel
Deutsches Klimarechenzentrum (DKRZ), Hamburg, Germany
Thomas Ludwig

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Breuer, A., Heinecke, A., Rannabauer, L., Bader, M. (2015). High-Order ADER-DG Minimizes Energy- and Time-to-Solution of SeisSol. In: Kunkel, J., Ludwig, T. (eds) High Performance Computing. ISC High Performance 2015. Lecture Notes in Computer Science(), vol 9137. Springer, Cham. https://doi.org/10.1007/978-3-319-20119-1_25

Download citation

DOI: https://doi.org/10.1007/978-3-319-20119-1_25
Published: 20 June 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-20118-4
Online ISBN: 978-3-319-20119-1
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics