
Accuracy Comparison of Various Supercomputer Job Management System Models

Lobachevskii Journal of Mathematics

Abstract

Supercomputer job management systems (JMS) are complex software with many parameters and settings. Various simulation methods have been used to explore the impact of these parameters on JMS efficiency metrics. Evaluating the accuracy (adequacy) of the applied JMS models is therefore a key issue. This paper presents the results of adequacy measurement experiments for several JMS models, including simulation with virtual supercomputer nodes and simulation with the Alea job scheduling simulator. The SUPPZ JMS, which operates at the Joint Supercomputer Center of the Russian Academy of Sciences (JSCC RAS), was used in the experiments. The source data for the simulations were derived from the statistics of the MVS-10P OP supercomputer installed at JSCC RAS. The normalized Euclidean distance between the vectors of job residence (turnaround) times, obtained from the job streams of the real supercomputer and of the JMS model, was used as the measure of adequacy. The experimental results confirmed intuitive expectations about the accuracy of the studied simulation methods, which supports using the normalized Euclidean distance between job turnaround time vectors as a measure of adequacy for various JMS models.
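The adequacy measure named in the abstract can be illustrated with a minimal sketch. The abstract does not state how the distance is normalized, so the code below assumes division by the Euclidean norm of the real system's turnaround time vector; the function name and the sample values are hypothetical and are not taken from the paper.

```python
import math


def normalized_euclidean_distance(real_times, model_times):
    """Distance between two job turnaround time vectors.

    Element i of each vector is the turnaround time of job i, so both
    vectors must describe the same jobs in the same order.
    ASSUMPTION: the distance is normalized by the Euclidean norm of the
    real system's vector; the paper may normalize differently.
    """
    if len(real_times) != len(model_times):
        raise ValueError("vectors must contain the same number of jobs")
    if not real_times:
        raise ValueError("vectors must not be empty")

    # Euclidean distance between the two vectors.
    diff = math.sqrt(sum((r - m) ** 2 for r, m in zip(real_times, model_times)))
    # Euclidean norm of the reference (real supercomputer) vector.
    ref = math.sqrt(sum(r * r for r in real_times))
    if ref == 0:
        raise ValueError("reference vector has zero norm")
    return diff / ref


# Hypothetical turnaround times in seconds for three jobs.
real = [3600.0, 1800.0, 7200.0]
exact_model = list(real)                 # reproduces the real system exactly
coarse_model = [4000.0, 1500.0, 9000.0]  # deviates on every job

print(normalized_euclidean_distance(real, exact_model))
print(normalized_euclidean_distance(real, coarse_model))
```

Under this assumed normalization, a model that reproduces every turnaround time exactly scores 0, and larger values indicate a less adequate model.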



Funding

The work was carried out at JSCC RAS as part of the government assignment. The MVS-10P OP supercomputer was used.

Author information

Corresponding authors

Correspondence to A. V. Baranov or D. S. Lyakhovets.

Additional information

(Submitted by A. M. Elizarov)

Cite this article

Baranov, A.V., Lyakhovets, D.S. Accuracy Comparison of Various Supercomputer Job Management System Models. Lobachevskii J Math 42, 2510–2519 (2021). https://doi.org/10.1134/S199508022111007X

  • DOI: https://doi.org/10.1134/S199508022111007X
