Towards Elastic Resource Management

  • Isaías A. Comprés Ureña
  • Michael Gerndt
Conference paper


A new paradigm for HPC resource management, called Elastic Computing, is under development at the Invasive Computing Transregional Collaborative Research Center. Two components have been implemented: an extension to MPI for programming elastic applications, and a resource manager that extends the SLURM batch scheduler. Resource elasticity allows the resource manager to change the resource allocations of running applications. The scheduler decides these changes based on performance feedback from the applications. Collecting this feedback from running applications poses unique challenges for the runtime system. This paper presents our current performance feedback system.


Resource management, MPI, Performance monitoring



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Technical University of Munich (TUM), München, Germany
