Automated Performance Modeling of the UG4 Simulation Framework

  • Andreas Vogel
  • Alexandru Calotoiu
  • Arne Nägel
  • Sebastian Reiter
  • Alexandre Strube
  • Gabriel Wittum
  • Felix Wolf
Conference paper
Part of the Lecture Notes in Computational Science and Engineering book series (LNCSE, volume 113)


Many scientific research questions, such as drug diffusion through the upper part of the human skin, are formulated in terms of partial differential equations, which are solved numerically using grid-based finite element methods. For detailed and more realistic physical models this computational task becomes challenging, so complex numerical codes that scale well up to millions of computing cores are required. Using empirical tests, we demonstrated very good scaling properties for the geometric multigrid solver of the UG4 framework, which is used to address such problems, in Reiter et al. (Comput Vis Sci 16(4):151–164, 2013). To further validate the scalability of the code, we applied automated performance modeling to UG4 simulations and showed how performance bottlenecks can be detected and resolved in Vogel et al. (10,000 performance models per minute – scalability of the UG4 simulation framework. In: Träff JL, Hunold S, Versaci F (eds) Euro-Par 2015: Parallel processing, theoretical computer science and general issues, vol 9233. Springer, Heidelberg, pp 519–531, 2015). In this paper we give an overview of the results obtained, present a more detailed analysis via performance models for the components of the geometric multigrid solver, and comment on how the performance models agree with our expectations.


Keywords: Multigrid Method, Grid Level, Multigrid Algorithm, Weak Scaling, Multigrid Solver



Financial support from the DFG Priority Program 1648 Software for Exascale Computing (SPPEXA) is gratefully acknowledged. The authors also thank the Gauss Centre for Supercomputing (GCS) for providing computing time on the GCS share of the supercomputer JUQUEEN at Jülich Supercomputing Centre (JSC).


  1. Baker, A., Falgout, R., Kolev, T., Yang, U.: Multigrid smoothers for ultra-parallel computing. SIAM J. Sci. Comput. 33, 2864–2887 (2011)
  2. Baker, A.H., Falgout, R.D., Gamblin, T., Kolev, T.V., Schulz, M., Yang, U.M.: Scaling algebraic multigrid solvers: on the road to exascale. In: Competence in High Performance Computing 2010, pp. 215–226. Springer, Berlin/New York (2012)
  3. Bastian, P., Blatt, M., Scheichl, R.: Algebraic multigrid for discontinuous Galerkin discretizations of heterogeneous elliptic problems. Numer. Linear Algebra 19(2), 367–388 (2012)
  4. Bergen, B., Gradl, T., Rude, U., Hulsemann, F.: A massively parallel multigrid method for finite elements. Comput. Sci. Eng. 8(6), 56–62 (2006)
  5. Boyd, E.L., Azeem, W., Lee, H.H., Shih, T.P., Hung, S.H., Davidson, E.S.: A hierarchical approach to modeling and improving the performance of scientific applications on the KSR1. In: Proceedings of the 1994 International Conference on Parallel Processing, St. Charles, vol. III, pp. 188–192. IEEE (1994)
  6. Braess, D.: Finite Elemente. Springer, Berlin (2003)
  7. Calotoiu, A., Hoefler, T., Poke, M., Wolf, F.: Using automated performance modeling to find scalability bugs in complex codes. In: Proceedings of the ACM/IEEE Conference on Supercomputing (SC13), Denver. ACM (2013)
  8. Calotoiu, A., Hoefler, T., Wolf, F.: Mass-producing insightful performance models. In: Workshop on Modeling & Simulation of Systems and Applications, Seattle, Aug 2014
  9. Carrington, L., Laurenzano, M., Tiwari, A.: Characterizing large-scale HPC applications through trace extrapolation. Parallel Process. Lett. 23(4), 1340008 (2013)
  10. Ciarlet, P.G., Lions, J.: Finite Element Methods (Part 1). North-Holland, Amsterdam (1991)
  11. Gahvari, H., Gropp, W.: An introductory exascale feasibility study for FFTs and multigrid. In: International Symposium on Parallel & Distributed Processing (IPDPS), pp. 1–9. IEEE, Piscataway (2010)
  12. Gahvari, H., Baker, A.H., Schulz, M., Yang, U.M., Jordan, K.E., Gropp, W.: Modeling the performance of an algebraic multigrid cycle on HPC platforms. In: Proceedings of the International Conference on Supercomputing, pp. 172–181. ACM, New York (2011)
  13. Girona, S., Labarta, J., Badia, R.M.: Validation of dimemas communication model for MPI collective operations. In: Proceedings of the 7th European PVM/MPI Users’ Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface, pp. 39–46. Springer, London (2000)
  14. Gmeiner, B., Köstler, H., Stürmer, M., Rüde, U.: Parallel multigrid on hierarchical hybrid grids: a performance study on current high performance computing clusters. Concurr. Comput.: Pract. Exp. 26(1), 217–240 (2014)
  15. Hackbusch, W.: Multi-grid Methods and Applications, vol. 4. Springer, Berlin/New York (1985)
  16. Hackbusch, W.: Theorie und Numerik elliptischer Differentialgleichungen: mit Beispielen und Übungsaufgaben. Teubner (1996)
  17. Heppner, I., Lampe, M., Nägel, A., Reiter, S., Rupp, M., Vogel, A., Wittum, G.: Software framework ug4: parallel multigrid on the hermit supercomputer. In: High Performance Computing in Science and Engineering ’12, pp. 435–449. Springer, Cham (2013)
  18. Ierusalimschy, R., De Figueiredo, L.H., Celes Filho, W.: Lua-an extensible extension language. Softw. Pract. Exp. 26(6), 635–652 (1996)
  19. Lee, B.C., Brooks, D.M., de Supinski, B.R., Schulz, M., Singh, K., McKee, S.A.: Methods of inference and learning for performance modeling of parallel applications. In: Proceedings of the 12th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP ’07), pp. 249–258. ACM, New York (2007)
  20. Nägel, A., Heisig, M., Wittum, G.: A comparison of two- and three-dimensional models for the simulation of the permeability of human stratum corneum. Eur. J. Pharm. Biopharm. 72(2), 332–338 (2009)
  21. Nägel, A., Heisig, M., Wittum, G.: Detailed modeling of skin penetration – an overview. Adv. Drug Deliv. Rev. 65(2), 191–207 (2013)
  22.
  23. Petrini, F., Kerbyson, D.J., Pakin, S.: The case of the missing supercomputer performance: achieving optimal performance on the 8,192 processors of ASCI Q. In: Proceedings of the ACM/IEEE Conference on Supercomputing (SC’03), pp. 55ff. ACM, New York (2003)
  24. Picard, R.R., Cook, R.D.: Cross-validation of regression models. J. Am. Stat. Assoc. 79(387), 575–583 (1984)
  25. Reiter, S.: Efficient algorithms and data structures for the realization of adaptive, hierarchical grids on massively parallel systems. Ph.D. thesis, University of Frankfurt, Germany (2014)
  26. Reiter, S.: Promesh (Nov 2015),
  27. Reiter, S., Vogel, A., Heppner, I., Rupp, M., Wittum, G.: A massively parallel geometric multigrid solver on hierarchically distributed grids. Comput. Vis. Sci. 16(4), 151–164 (2013)
  28. Saad, Y.: Iterative Methods for Sparse Linear Systems. SIAM, Philadelphia (2003)
  29. Sampath, R., Biros, G.: A parallel geometric multigrid method for finite elements on octree meshes. SIAM J. Sci. Comput. 32, 1361–1392 (2010)
  30. Shudler, S., Calotoiu, A., Hoefler, T., Strube, A., Wolf, F.: Exascaling your library: will your implementation meet your expectations? In: Proceedings of the International Conference on Supercomputing (ICS), Newport Beach, pp. 1–11. ACM (2015)
  31. Siebert, C., Wolf, F.: Parallel sorting with minimal data. In: Recent Advances in the Message Passing Interface, pp. 170–177. Springer, Berlin/New York (2011)
  32. Spafford, K.L., Vetter, J.S.: Aspen: a domain specific language for performance modeling. In: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, SC ’12, pp. 84:1–84:11. IEEE Computer Society Press, Los Alamitos (2012)
  33. Sundar, H., Biros, G., Burstedde, C., Rudi, J., Ghattas, O., Stadler, G.: Parallel geometric-algebraic multigrid on unstructured forests of octrees. In: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, p. 43. IEEE Computer Society Press, Los Alamitos (2012)
  34. Tallent, N.R., Hoisie, A.: Palm: easing the burden of analytical performance modeling. In: Proceedings of the International Conference on Supercomputing (ICS), pp. 221–230. ACM, New York (2014)
  35. UG4 (Nov 2015),
  36. Vogel, A.: Flexible und kombinierbare Implementierung von Finite-Volumen-Verfahren höherer Ordnung mit Anwendungen für die Konvektions-Diffusions-, Navier-Stokes- und Nernst-Planck-Gleichungen sowie dichtegetriebene Grundwasserströmung in porösen Medien. Ph.D. thesis, Universität Frankfurt am Main (2014)
  37. Vogel, A., Reiter, S., Rupp, M., Nägel, A., Wittum, G.: UG 4: a novel flexible software system for simulating PDE based models on high performance computers. Comput. Vis. Sci. 16(4), 165–179 (2013)
  38. Vogel, A., Calotoiu, A., Strube, A., Reiter, S., Nägel, A., Wolf, F., Wittum, G.: 10,000 performance models per minute – scalability of the UG4 simulation framework. In: Träff, J.L., Hunold, S., Versaci, F. (eds.) Euro-Par 2015: Parallel Processing, Theoretical Computer Science and General Issues, vol. 9233, pp. 519–531. Springer, Heidelberg (2015)
  39. Williams, S., Lijewski, M., Almgren, A., Straalen, B.V., Carson, E., Knight, N., Demmel, J.: s-step Krylov subspace methods as bottom solvers for geometric multigrid. In: 28th International Parallel and Distributed Processing Symposium, pp. 1149–1158. IEEE, Piscataway (2014)
  40. Wolf, F., Bischof, C., Hoefler, T., Mohr, B., Wittum, G., Calotoiu, A., Iwainsky, C., Strube, A., Vogel, A.: Catwalk: a quick development path for performance models. In: Euro-Par 2014: Parallel Processing Workshops. Lecture Notes in Computer Science, pp. 589–600. Springer, Cham (2014)
  41. Wu, X., Mueller, F.: ScalaExtrap: trace-based communication extrapolation for SPMD programs. In: Proceedings of the 16th ACM Symposium on Principles and Practice of Parallel Programming (PPoPP ’11), pp. 113–122. ACM, New York (2011)
  42. Zhai, J., Chen, W., Zheng, W.: Phantom: predicting performance of parallel applications on large-scale parallel machines using a single node. Sigplan Not. 45(5), 305–314 (2010)

Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Andreas Vogel (1)
  • Alexandru Calotoiu (2)
  • Arne Nägel (1)
  • Sebastian Reiter (1)
  • Alexandre Strube (3)
  • Gabriel Wittum (1)
  • Felix Wolf (2)

  1. Goethe Universität Frankfurt, Frankfurt, Germany
  2. Technische Universität Darmstadt, Darmstadt, Germany
  3. Jülich Supercomputing Center, Jülich, Germany
