Invasive Compute Balancing for Applications with Shared and Hybrid Parallelization

Schreiber, Martin; Riesinger, Christoph; Neckel, Tobias; Bungartz, Hans-Joachim; Breuer, Alexander

doi:10.1007/s10766-014-0336-3

Published: 26 October 2014

Volume 43, pages 1004–1027, (2015)

International Journal of Parallel Programming

Martin Schreiber¹,
Christoph Riesinger¹,
Tobias Neckel¹,
Hans-Joachim Bungartz¹ &
Alexander Breuer¹

Abstract

Achieving high scalability with dynamically adaptive algorithms in high-performance computing (HPC) is a non-trivial task. The invasive paradigm, which uses compute migration, represents an efficient alternative to classical data migration approaches for such algorithms in HPC. We present a core-distribution scheduler that realizes the migration of computational power by distributing cores according to the requirements specified by one or more parallel program instances. We validate our approach with different benchmark suites for simulations with artificial workload as well as applications based on dynamically adaptive shallow water simulations, and we investigate concurrently executed adaptivity parameter studies on realistic tsunami simulations. The invasive approach results in significantly faster overall execution times and higher hardware utilization than alternative approaches. Dynamic resource management is therefore mandatory for a more efficient execution of scenarios similar to our simulations, e.g. several tsunami simulations in urgent computing, to overcome strong scalability challenges in HPC. The optimizations obtained by invasive migration of cores can be generalized to similar classes of algorithms with dynamic resource requirements.
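
The idea of distributing cores according to requirements reported by several program instances can be illustrated with a minimal sketch. This is not the scheduler from the article: the `Request` type, its `workload`, `min_cores` and `max_cores` fields, and the greedy rule (give the next free core to the instance with the highest workload per assigned core) are all assumptions chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """Hypothetical resource requirement reported by one program instance."""
    name: str
    workload: float            # assumed load estimate, e.g. number of grid cells
    min_cores: int = 1         # cores the instance needs at the very least
    max_cores: int = 10**9     # cores beyond which the instance does not scale


def distribute_cores(requests, total_cores):
    """Greedily assign cores so that workload per core is roughly balanced.

    Every instance first receives its minimum. Each remaining core then goes
    to the instance that currently has the highest workload per assigned core
    and has not yet reached its maximum.
    """
    for r in requests:
        if r.min_cores < 1:
            raise ValueError("min_cores must be at least 1")

    alloc = {r.name: r.min_cores for r in requests}
    free = total_cores - sum(alloc.values())
    if free < 0:
        raise ValueError("not enough cores for the minimum requirements")

    for _ in range(free):
        candidates = [r for r in requests if alloc[r.name] < r.max_cores]
        if not candidates:
            break  # all instances are saturated, leftover cores stay idle
        neediest = max(candidates, key=lambda r: r.workload / alloc[r.name])
        alloc[neediest.name] += 1
    return alloc


if __name__ == "__main__":
    # Two concurrently running simulations sharing 8 cores.
    print(distribute_cores([Request("A", 300.0), Request("B", 100.0)], 8))
    # After adaptive refinement shifts the load, cores migrate to B.
    print(distribute_cores([Request("A", 100.0), Request("B", 300.0)], 8))
```

Calling `distribute_cores` again whenever an instance reports a changed workload moves cores between instances instead of moving data, which is the contrast with data migration that the abstract draws.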

Notes

http://invasive-computing.de

Acknowledgments

This work was supported by the German Research Foundation (DFG) as part of the Transregional Collaborative Research Centre “Invasive Computing” (SFB/TR 89).

Author information

Authors and Affiliations

Fakultät für Informatik, Technische Universität München, Boltzmannstraße 3, 85748, Garching, Germany
Martin Schreiber, Christoph Riesinger, Tobias Neckel, Hans-Joachim Bungartz & Alexander Breuer

Corresponding author

Correspondence to Martin Schreiber.

About this article

Cite this article

Schreiber, M., Riesinger, C., Neckel, T. et al. Invasive Compute Balancing for Applications with Shared and Hybrid Parallelization. Int J Parallel Prog 43, 1004–1027 (2015). https://doi.org/10.1007/s10766-014-0336-3

Received: 07 January 2014
Accepted: 09 October 2014
Published: 26 October 2014
Issue Date: December 2015
DOI: https://doi.org/10.1007/s10766-014-0336-3
