10,000 Performance Models per Minute – Scalability of the UG4 Simulation Framework

  • Andreas Vogel
  • Alexandru Calotoiu
  • Alexandre Strube
  • Sebastian Reiter
  • Arne Nägel
  • Felix Wolf
  • Gabriel Wittum
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9233)


Numerically addressing scientific questions such as simulating drug diffusion through the human stratum corneum is a challenging task requiring complex codes and plenty of computational resources. The UG4 framework is used for such simulations, and though empirical tests have shown good scalability so far, its sheer size precludes analytical modeling of the entire code. We have developed a process which combines the power of our automated performance modeling method and the workflow manager JUBE to create insightful models for entire UG4 simulations. Examining three typical use cases, we identified and resolved a previously unknown latent scalability bottleneck. In collaboration with the code developers, we validated the performance expectations in each of the use cases, creating over 10,000 models in less than a minute, a feat previously impossible without our automation techniques.


Stratum Corneum · Multigrid Method · Process Count · Iteration Count · Skin Permeation
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Financial support from the DFG Priority Program 1648 Software for Exascale Computing (SPPEXA) is gratefully acknowledged. The authors also thank the Gauss Centre for Supercomputing (GCS) for providing computing time on the GCS share of the supercomputer JUQUEEN at Jülich Supercomputing Centre (JSC).



Copyright information

© Springer-Verlag Berlin Heidelberg 2015

Authors and Affiliations

  • Andreas Vogel (1, corresponding author)
  • Alexandru Calotoiu (2)
  • Alexandre Strube (3)
  • Sebastian Reiter (1)
  • Arne Nägel (1)
  • Felix Wolf (4)
  • Gabriel Wittum (1)

  1. Goethe Universität Frankfurt, Frankfurt, Germany
  2. German Research School for Simulation Sciences, Aachen, Germany
  3. Forschungszentrum Jülich, Jülich, Germany
  4. Technische Universität Darmstadt, Darmstadt, Germany
