Alea – Complex Job Scheduling Simulator

  • Dalibor Klusáček (email author)
  • Mehmet Soysal
  • Frédéric Suter
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12044)

Abstract

Using large computer systems such as HPC clusters to their full potential can be hard. Many problems and inefficiencies relate to the interactions between user workloads and system-level policies. These policies involve various setup choices of the resource management system (RMS) as well as the applied scheduling policy. While expert assessment and well-known best practices do their job when tuning performance, there is usually plenty of room for further improvement, e.g., by considering more efficient system setups or even radically new scheduling policies. Because such modifications are potentially damaging, it is advisable to evaluate them in a simulator first, which allows for repeated evaluations of various setups in a fully controlled manner. This paper presents the latest improvements of the Alea job scheduling simulator, which has been actively developed for over 10 years. We present both the recently added advanced simulation capabilities and a set of case studies based on real-life experience, in which Alea has been used to evaluate major modifications of real HPC and HTC systems.

Keywords

Alea, Simulation, Scheduling, HPC, HTC

Notes

Acknowledgments

We acknowledge the support and computational resources provided by the MetaCentrum under the program LM2015042, and the support provided by the project Reg. No. CZ.02.1.01/0.0/0.0/16_013/0001797 co-funded by the Ministry of Education, Youth and Sports of the Czech Republic.

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  • Dalibor Klusáček (1), email author
  • Mehmet Soysal (2)
  • Frédéric Suter (3)
  1. CESNET a.l.e., Brno, Czech Republic
  2. Steinbuch Centre for Computing, Karlsruhe Institute of Technology, Karlsruhe, Germany
  3. IN2P3 Computing Center/CNRS, Lyon-Villeurbanne, France
