International Journal of Parallel Programming, Volume 43, Issue 6, pp 1004–1027

Invasive Compute Balancing for Applications with Shared and Hybrid Parallelization

  • Martin Schreiber
  • Christoph Riesinger
  • Tobias Neckel
  • Hans-Joachim Bungartz
  • Alexander Breuer


Achieving high scalability with dynamically adaptive algorithms in high-performance computing (HPC) is a non-trivial task. For such algorithms, the invasive paradigm based on compute migration is an efficient alternative to classical data migration approaches. We present a core-distribution scheduler that realizes the migration of computational power by distributing cores according to the requirements specified by one or more parallel program instances. We validate our approach with different benchmark suites, covering simulations with artificial workload as well as applications based on dynamically adaptive shallow water simulations, and we investigate concurrently executed adaptivity parameter studies on realistic tsunami simulations. The invasive approach results in significantly faster overall execution times and higher hardware utilization than alternative approaches. Dynamic resource management is therefore mandatory for a more efficient execution of scenarios similar to our simulations, e.g., several tsunami simulations in urgent computing, and for overcoming strong-scalability challenges in HPC. The optimizations obtained by invasive migration of cores can be generalized to similar classes of algorithms with dynamic resource requirements.
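The idea of distributing cores according to the requirements of several program instances can be illustrated with a minimal sketch. It is not the scheduler described in the article: the function name `distribute_cores`, the use of integer workload units as the "requirement", and the proportional largest-remainder rule are all assumptions made purely for illustration.

```python
from typing import List


def distribute_cores(total_cores: int, workloads: List[int]) -> List[int]:
    """Split total_cores among program instances in proportion to workload.

    Hypothetical helper, not the article's scheduler. Each instance keeps at
    least one core; the remaining cores are shared proportionally, and
    rounding leftovers go to the largest remainders.
    """
    n = len(workloads)
    if n == 0:
        return []
    if total_cores < n:
        raise ValueError("need at least one core per program instance")
    if any(w < 0 for w in workloads):
        raise ValueError("workloads must be non-negative")

    spare = total_cores - n  # cores left after reserving one per instance
    total_work = sum(workloads)
    if total_work == 0:
        # No instance reports work: fall back to an even split.
        weights, total_work = [1] * n, n
    else:
        weights = list(workloads)

    # Exact integer quota and remainder for each instance.
    quotas = [divmod(spare * w, total_work) for w in weights]
    cores = [1 + q for q, _ in quotas]

    # Hand the cores lost to rounding to the largest remainders.
    leftover = total_cores - sum(cores)
    order = sorted(range(n), key=lambda i: quotas[i][1], reverse=True)
    for i in order[:leftover]:
        cores[i] += 1
    return cores
```

Calling the function again whenever the reported workloads change yields a new assignment; the difference between two consecutive assignments corresponds to cores moving from one instance to another, while the data of each instance stays where it is.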


Keywords: Invasive computing · Compute migration · High-performance computing · Hybrid parallelization · Dynamic adaptive mesh refinement



This work was supported by the German Research Foundation (DFG) as part of the Transregional Collaborative Research Centre “Invasive Computing” (SFB/TR 89).



Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  • Martin Schreiber (1)
  • Christoph Riesinger (1)
  • Tobias Neckel (1)
  • Hans-Joachim Bungartz (1)
  • Alexander Breuer (1)

  1. Fakultät für Informatik, Technische Universität München, Garching, Germany
