High-Order ADER-DG Minimizes Energy- and Time-to-Solution of SeisSol

  • Alexander Breuer
  • Alexander Heinecke
  • Leonhard Rannabauer
  • Michael Bader
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9137)


In this paper we give a comprehensive overview of our node-level optimization of the high-order finite element software SeisSol, aiming at minimizing energy- and time-to-solution. SeisSol simulates dynamic rupture and seismic wave propagation at petascale performance in production runs. In this context we analyze the impact that convergence order, CPU clock frequency, vector instruction sets and chip-level parallelism have on the execution time, energy consumption and accuracy of the obtained solution. From a performance perspective, especially on state-of-the-art and future architectures, the shift from a memory- to a compute-bound scheme and the need for double precision arithmetic with increasing orders of convergence are compelling. Our results show that we are able to reduce the computational error by up to five orders of magnitude when increasing the order of the scheme from 2 to 7, while consuming the same amount of energy.
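The shift from a memory- to a compute-bound scheme with increasing order can be made plausible with a small back-of-the-envelope sketch. It is not taken from SeisSol: it assumes that convergence order O corresponds to polynomials of degree below O on a three-dimensional element, that nine elastic quantities are stored per element, and it uses a deliberately crude cost model (one hypothetical dense B×B matrix applied to the B×9 unknowns, with only the unknowns counted as memory traffic). The function names are illustrative only.

```python
"""Sketch: growth of per-element data and of a crude flops-per-byte ratio with order."""

N_QUANTITIES = 9      # assumption: 6 stress + 3 velocity components per element
BYTES_PER_VALUE = 8   # double precision


def basis_functions(order: int) -> int:
    """Number of 3D polynomials of degree < order: order*(order+1)*(order+2)/6."""
    return order * (order + 1) * (order + 2) // 6


def dofs_per_element(order: int, quantities: int = N_QUANTITIES) -> int:
    """Unknowns stored per element: one coefficient per basis function and quantity."""
    return quantities * basis_functions(order)


def crude_flops_per_byte(order: int) -> float:
    """Crude model (assumption): one dense (B x B) times (B x 9) product per element,
    counting only the B x 9 unknowns as data moved from memory."""
    b = basis_functions(order)
    flops = 2 * b * b * N_QUANTITIES
    bytes_moved = b * N_QUANTITIES * BYTES_PER_VALUE
    return flops / bytes_moved


if __name__ == "__main__":
    for order in range(2, 8):
        print(order, basis_functions(order), dofs_per_element(order),
              crude_flops_per_byte(order))
```

In this model the stored data grows with the number of basis functions B, while the arithmetic grows with B squared, so the work per byte rises with the order. Real kernels exploit sparsity and other structure, so the absolute numbers here should not be read as measurements.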


Energy- and time-to-solution · High-order · Vectorization · ADER · Discontinuous Galerkin · Finite element method



Our project was supported by the Intel Parallel Computing Centre “ExScaMIC - Extreme Scalability on x86/MIC”. We gratefully acknowledge the respective support by Intel Corporation.

Optimization Notice: Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more information go to [...].

Intel, Xeon, and Intel Xeon Phi are trademarks of Intel Corporation in the U.S. and/or other countries.



Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Alexander Breuer (1), corresponding author
  • Alexander Heinecke (2)
  • Leonhard Rannabauer (1)
  • Michael Bader (1)
  1. Technische Universität München, Garching, Germany
  2. Intel Corporation, Santa Clara, USA
