Batsim: A Realistic Language-Independent Resources and Jobs Management Systems Simulator

  • Pierre-François Dutot
  • Michael Mercier
  • Millian Poquet
  • Olivier Richard
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10353)

Abstract

As large-scale computing systems grow toward exascale, Resources and Jobs Management Systems (RJMS) need to evolve to cope with this change of scale. However, RJMS are difficult to study because they are critical production systems, on which experimenting is extremely costly in terms of downtime and energy. Meanwhile, many scheduling algorithms proposed in theoretical studies have not been transferred to production tools for lack of realistic experimental validation. To tackle these problems we propose Batsim, an extensible, language-independent and scalable RJMS simulator. It allows researchers and engineers to test and compare any scheduling algorithm through a simple event-based communication interface that supports different levels of realism. In this paper we show that Batsim's behaviour matches that of the real RJMS OAR. Our evaluation process was designed with reproducibility in mind, and all the experimental material is freely available.
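
The abstract describes a scheduler that interacts with the simulator only by exchanging events, which is what makes the scheduler independent of any particular language. As a rough illustration of such an event-driven decision loop, here is a minimal Python sketch of a first-come-first-served (FCFS) scheduler. The event and decision names (`JOB_SUBMITTED`, `JOB_COMPLETED`, `EXECUTE_JOB`), their dictionary shapes and the class `FCFSScheduler` are hypothetical and do not reproduce Batsim's actual message format; the transport layer (sockets, serialization) is omitted entirely.

```python
from collections import deque


class FCFSScheduler:
    """Hypothetical event-driven FCFS scheduler (illustrative, not Batsim's API).

    The simulator would send it one event at a time; it answers with a list
    of decisions. Event and decision shapes here are assumptions.
    """

    def __init__(self, nb_resources):
        self.free = set(range(nb_resources))  # ids of idle resources
        self.queue = deque()                  # waiting (job_id, nb_res) pairs
        self.running = {}                     # job_id -> set of allocated ids

    def on_event(self, event):
        """Update internal state from one event and return new decisions."""
        if event["type"] == "JOB_SUBMITTED":
            self.queue.append((event["job_id"], event["res"]))
        elif event["type"] == "JOB_COMPLETED":
            # Give the finished job's resources back to the idle pool.
            self.free |= self.running.pop(event["job_id"])
        return self._schedule()

    def _schedule(self):
        decisions = []
        # Strict FCFS: start jobs in submission order and stop at the
        # first one that does not fit (no backfilling).
        while self.queue and self.queue[0][1] <= len(self.free):
            job_id, nb_res = self.queue.popleft()
            alloc = set(sorted(self.free)[:nb_res])
            self.free -= alloc
            self.running[job_id] = alloc
            decisions.append(
                {"type": "EXECUTE_JOB", "job_id": job_id, "alloc": sorted(alloc)}
            )
        return decisions


if __name__ == "__main__":
    sched = FCFSScheduler(nb_resources=4)
    events = [
        {"type": "JOB_SUBMITTED", "job_id": "j1", "res": 3},
        {"type": "JOB_SUBMITTED", "job_id": "j2", "res": 2},
        {"type": "JOB_COMPLETED", "job_id": "j1"},
    ]
    for ev in events:
        print(ev["type"], ev["job_id"], "->", sched.on_event(ev))
```

In this toy run, j2 waits until j1 completes, because only one of the four resources is idle once j1 has started. Since the scheduler only consumes and produces plain messages, the same logic could be written in any language that can speak the simulator's protocol.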

Keywords

RJMS, Scheduling, Simulation, Reproducibility

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Pierre-François Dutot (1, 2)
  • Michael Mercier (1, 2, 3)
  • Millian Poquet (1, 2)
  • Olivier Richard (1, 2)

  1. Univ. Grenoble Alpes, LIG, Grenoble, France
  2. Inria, CNRS, LIG, Grenoble, France
  3. Atos, Bezons, France