Towards Elastic Resource Management

  • Conference paper
Tools for High Performance Computing 2017 (PTHPC 2017)

Abstract

A new paradigm for HPC resource management, called Elastic Computing, is under development at the Invasive Computing Transregional Collaborative Research Center. An extension to MPI for programming elastic applications and a resource manager have been implemented; the resource manager is an extension of the SLURM batch scheduler. Resource elasticity allows the resource manager to change the resource allocations of running applications. The scheduler decides on these changes based on performance feedback from the applications. Collecting performance feedback from running applications poses unique challenges for the runtime system. This document presents our current performance feedback system.

Support for this work was provided by the Transregional Collaborative Research Centre 89: Invasive Computing (InvasIC) [29].

References

  1. Aguilar, X., Fürlinger, K., Laure, E.: MPI trace compression using event flow graphs. In: Euro-Par 2014 Parallel Processing: 20th International Conference, Porto, Portugal, August 25–29, 2014. Proceedings, pp. 1–12. Springer International Publishing (2014). https://doi.org/10.1007/978-3-319-09873-9_1

  2. Aguilar, X., Fürlinger, K., Laure, E.: Automatic on-line detection of MPI application structure with event flow graphs. In: Euro-Par 2015: Parallel Processing: 21st International Conference on Parallel and Distributed Computing, Vienna, Austria, August 24–28, 2015, Proceedings, pp. 70–81. Springer, Berlin, Heidelberg (2015). https://doi.org/10.1007/978-3-662-48096-0_6

  3. Aguilar, X., Fürlinger, K., Laure, E.: Visual MPI performance analysis using event flow graphs. Proced. Comput. Sci. 51, 1353–1362 (2015). https://doi.org/10.1016/j.procs.2015.05.322. URL http://www.sciencedirect.com/science/article/pii/S1877050915011308

  4. Aguilar, X., Fürlinger, K., Laure, E.: Event flow graphs for MPI performance monitoring and analysis. In: Tools for High Performance Computing 2015: Proceedings of the 9th International Workshop on Parallel Tools for High Performance Computing, September 2015, Dresden, Germany, pp. 103–115. Springer International Publishing, Cham (2016). https://doi.org/10.1007/978-3-319-39589-0_8

  5. Casavant, T.L., Kuhl, J.G.: A taxonomy of scheduling in general-purpose distributed computing systems. IEEE Trans. Softw. Eng. 14(2), 141–154 (1988). https://doi.org/10.1109/32.4634

  6. Coffman Jr., E.G., Garey, M.R., Johnson, D.S.: An application of bin-packing to multiprocessor scheduling. SIAM J. Comput. 7(1), 1–17 (1978). https://doi.org/10.1137/0207001

  7. Davis, R.I., Burns, A.: A survey of hard real-time scheduling for multiprocessor systems. ACM Comput. Surv. 43(4), 35:1–35:44 (2011). https://doi.org/10.1145/1978802.1978814

  8. Etsion, Y., Tsafrir, D.: A short survey of commercial cluster batch schedulers. Sch. Comput. Sci. Eng. Hebr. Univ. Jerus. 44221, 2005–13 (2005)

  9. Feitelson, D.G., Rudolph, L., Schwiegelshohn, U.: Parallel job scheduling: a status report. In: Proceedings of the 10th International Conference on Job Scheduling Strategies for Parallel Processing, JSSPP 2004, pp. 1–16. Springer, Berlin, Heidelberg (2005). https://doi.org/10.1007/11407522_1

  10. Fortnow, L.: The status of the P versus NP problem. Commun. ACM 52(9), 78–86 (2009). https://doi.org/10.1145/1562164.1562186

  11. Fürlinger, K., Skinner, D.: Capturing and visualizing event flow graphs of MPI applications. In: Euro-Par 2009 Parallel Processing Workshops: HPPC, HeteroPar, PROPER, ROIA, UNICORE, VHPC, Delft, The Netherlands, August 25–28, 2009, Revised Selected Papers, pp. 218–227. Springer, Berlin, Heidelberg (2010). https://doi.org/10.1007/978-3-642-14122-5_26

  12. Garey, M.R., Johnson, D.S.: Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman & Co., New York, NY, USA (1990)

  13. Graham, R., Lawler, E., Lenstra, J., Kan, A.: Optimization and approximation in deterministic sequencing and scheduling: a survey. In: Proceedings of the Advanced Research Institute on Discrete Optimization and Systems Applications, Annals of Discrete Mathematics, vol. 5, pp. 287–326. Elsevier (1979). https://doi.org/10.1016/S0167-5060(08)70356-X

  14. Havlak, P.: Nesting of reducible and irreducible loops. ACM Trans. Program. Lang. Syst. 19(4), 557–567 (1997). https://doi.org/10.1145/262004.262005

  15. Ioannou, N., Kauschke, M., Gries, M., Cintra, M.: Phase-based application-driven hierarchical power management on the single-chip cloud computer. In: 2011 International Conference on Parallel Architectures and Compilation Techniques, pp. 131–142 (2011). https://doi.org/10.1109/PACT.2011.19

  16. Jackson, D.B., Snell, Q., Clement, M.J.: Core algorithms of the Maui scheduler. In: Revised Papers from the 7th International Workshop on Job Scheduling Strategies for Parallel Processing, JSSPP 2001, pp. 87–102. Springer, London, UK (2001). http://dl.acm.org/citation.cfm?id=646382.689682

  17. Karp, R.M.: Reducibility among combinatorial problems. In: Complexity of Computer Computations: Proceedings of a symposium on the Complexity of Computer Computations, pp. 85–103. Springer US, Boston, MA (1972). https://doi.org/10.1007/978-1-4684-2001-2_9

  18. Khan, A.A., Mccreary, C.L., Jones, M.S.: A comparison of multiprocessor scheduling heuristics. In: International Conference on Parallel Processing Vol. 2, vol. 2, pp. 243–250 (1994). https://doi.org/10.1109/ICPP.1994.19

  19. Lawler, E.L., Lenstra, J.K., Kan, A.H.R., Shmoys, D.B.: Chapter 9 sequencing and scheduling: Algorithms and complexity. In: Logistics of Production and Inventory, Handbooks in Operations Research and Management Science, vol. 4, pp. 445–522. Elsevier (1993). https://doi.org/10.1016/S0927-0507(05)80189-6

  20. Lee, I., Iliopoulos, C.S., Park, K.: Linear time algorithm for the longest common repeat problem. J. Discret. Algorithms 5(2), 243–249 (2007). https://doi.org/10.1016/j.jda.2006.03.019. 2004 Symposium on String Processing and Information Retrieval

  21. Lenstra, J., Kan, A.R., Brucker, P.: Complexity of machine scheduling problems. In: Studies in Integer Programming, Annals of Discrete Mathematics, vol. 1, pp. 343–362. Elsevier (1977). https://doi.org/10.1016/S0167-5060(08)70743-X

  22. Lopes, R.V., Menascé, D.: A taxonomy of job scheduling on distributed computing systems. IEEE Trans. Parallel Distrib. Syst. 27(12), 3412–3428 (2016). https://doi.org/10.1109/TPDS.2016.2537821

  23. Mu’alem, A.W., Feitelson, D.G.: Utilization, predictability, workloads, and user runtime estimates in scheduling the IBM SP2 with backfilling. IEEE Trans. Parallel Distrib. Syst. 12(6), 529–543 (2001). https://doi.org/10.1109/71.932708

  24. Ramalingam, G.: Identifying loops in almost linear time. ACM Trans. Program. Lang. Syst. 21(2), 175–188 (1999). https://doi.org/10.1145/316686.316687

  25. Rotithor, H.G.: Taxonomy of dynamic task scheduling schemes in distributed computing systems. IEE Proc. Comput. Digital Techn. 141(1), 1–10 (1994). https://doi.org/10.1049/ip-cdt:19949630

  26. Srinivasan, S., Kettimuthu, R., Subramani, V., Sadayappan, P.: Selective reservation strategies for backfill job scheduling. In: Job Scheduling Strategies for Parallel Processing: 8th International Workshop, JSSPP 2002 Edinburgh, Scotland, UK, July 24, 2002 Revised Papers, pp. 55–71. Springer, Berlin, Heidelberg (2002). https://doi.org/10.1007/3-540-36180-4_4

  27. SuperMUC Petascale System (2017). https://www.lrz.de/services/compute/supermuc/. [Online]

  28. Tarjan, R.: Testing flow graph reducibility. In: Proceedings of the Fifth Annual ACM Symposium on Theory of Computing, STOC 1973, pp. 96–107. ACM, New York, NY, USA (1973). https://doi.org/10.1145/800125.804040

  29. Transregional Research Center InvasIC (2017). http://www.invasic.de. [Online]

  30. Ukkonen, E.: On-line construction of suffix trees. Algorithmica 14(3), 249–260 (1995). https://doi.org/10.1007/BF01206331

  31. Ullman, J.: NP-complete scheduling problems. J. Comput. Syst. Sci. 10(3), 384–393 (1975). https://doi.org/10.1016/S0022-0000(75)80008-0

  32. Wei, T., Mao, J., Zou, W., Chen, Y.: A new algorithm for identifying loops in decompilation. In: Static Analysis: 14th International Symposium, SAS 2007, Kongens Lyngby, Denmark, August 22–24, 2007. Proceedings, pp. 170–183. Springer, Berlin, Heidelberg (2007). https://doi.org/10.1007/978-3-540-74061-2_11

Author information

Corresponding author

Correspondence to Isaías A. Comprés Ureña.

Copyright information

© 2019 Springer Nature Switzerland AG

Cite this paper

Comprés Ureña, I.A., Gerndt, M. (2019). Towards Elastic Resource Management. In: Niethammer, C., Resch, M., Nagel, W., Brunst, H., Mix, H. (eds) Tools for High Performance Computing 2017. PTHPC 2017. Springer, Cham. https://doi.org/10.1007/978-3-030-11987-4_7
