Sustained Petascale Performance of Seismic Simulations with SeisSol on SuperMUC

  • Alexander Breuer
  • Alexander Heinecke
  • Sebastian Rettenberger
  • Michael Bader
  • Alice-Agnes Gabriel
  • Christian Pelties
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8488)


Seismic simulations in realistic 3D Earth models require peta- or even exascale computing power to capture small-scale features of high relevance for scientific and industrial applications. In this paper, we present optimizations of SeisSol – a seismic wave propagation solver based on the Arbitrary high-order accurate DERivative Discontinuous Galerkin (ADER-DG) method on fully adaptive, unstructured tetrahedral meshes – to run simulations under production conditions at petascale performance. Improvements cover the entire simulation chain: from an enhanced ADER time integration via highly scalable routines for mesh input up to hardware-aware optimization of the innermost sparse-/dense-matrix kernels. Strong and weak scaling studies on the SuperMUC machine demonstrated up to 90% parallel efficiency and 45% floating point peak efficiency on 147k cores. For a simulation under production conditions (10⁸ grid cells, 5·10¹⁰ degrees of freedom, 5 seconds simulated time), we achieved a sustained performance of 1.09 PFLOPS.
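The efficiency figures quoted in the abstract are ratios, and a minimal sketch of how such ratios are commonly computed may help when reading them. The function names, the exact core count of 147,456, and the per-core peak of 2.7 GHz × 8 double-precision flops per cycle are assumptions made for illustration; the abstract itself only states "147k cores" and does not define its metrics.

```python
def peak_fraction(sustained_flops: float, cores: int, flops_per_core: float) -> float:
    """Sustained performance as a fraction of theoretical peak."""
    return sustained_flops / (cores * flops_per_core)


def strong_scaling_efficiency(t_base: float, cores_base: int,
                              t_scaled: float, cores_scaled: int) -> float:
    """Parallel efficiency of a strong-scaling run relative to a baseline run.

    Ideal scaling keeps time * cores constant, which gives an efficiency of 1.0.
    """
    return (t_base * cores_base) / (t_scaled * cores_scaled)


# Assumed machine parameters (not taken from the abstract).
CORES = 147_456                # assumed exact core count behind "147k cores"
FLOPS_PER_CORE = 2.7e9 * 8     # assumed clock rate * DP flops per cycle

# Sustained performance of the production run, as stated in the abstract.
SUSTAINED = 1.09e15

production_fraction = peak_fraction(SUSTAINED, CORES, FLOPS_PER_CORE)

# Hypothetical timings, only to show how the efficiency formula behaves.
example_efficiency = strong_scaling_efficiency(100.0, 1000, 55.0, 2000)

if __name__ == "__main__":
    print(f"production run, fraction of assumed peak: {production_fraction:.1%}")
    print(f"hypothetical strong-scaling efficiency:   {example_efficiency:.1%}")
```

Under these assumed machine parameters the production run comes out at roughly a third of peak, which is lower than the 45% reported for the scaling studies; the two figures refer to different runs.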


seismic wave and earthquake simulations, petascale, vectorization, ADER-DG, parallel I/O





Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Alexander Breuer (1)
  • Alexander Heinecke (1)
  • Sebastian Rettenberger (1)
  • Michael Bader (1)
  • Alice-Agnes Gabriel (2)
  • Christian Pelties (2)

  1. Department of Informatics, Technische Universität München, Germany
  2. Department of Earth and Environmental Sciences, Geophysics, Ludwig-Maximilians-Universität München, Germany
