Abstract
Supercomputer job management systems (JMS) are complex software systems with many parameters and settings. Various simulation methods have been used to explore the impact of these parameters on JMS efficiency metrics. At the same time, evaluating the accuracy (adequacy) of the applied JMS models is a key issue. The paper presents the results of experiments measuring the adequacy of various JMS models, including simulation with virtual supercomputer nodes and with the Alea job scheduling simulator. The experiments used the JMS SUPPZ, which operates at the Joint Supercomputer Center of the Russian Academy of Sciences (JSCC RAS). The source data for the simulations were derived from the statistics of the MVS-10P OP supercomputer installed at JSCC RAS. The normalized Euclidean distance between the job residence (turnaround) time vectors, obtained from the job streams of the real supercomputer and of the JMS model, was used as the measure of adequacy. The experimental results confirmed intuitive expectations about the relative accuracy of the studied simulation methods, which supports using the normalized Euclidean distance between job turnaround time vectors as a measure of adequacy for various JMS models.
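The adequacy measure can be illustrated with a short sketch. A job's residence (turnaround) time is its completion time minus its submission time, and the same jobs are compared between the real job stream and the model's job stream. The abstract does not state the exact normalization, so the sketch assumes dividing the Euclidean distance by the sum of the two vectors' norms; by the triangle inequality this keeps the value in [0, 1], with 0 meaning identical turnaround times. The function names and the dictionary-of-job-IDs representation are hypothetical and are not taken from the paper.

```python
import math
from typing import Mapping


def turnaround_times(submit: Mapping[str, float],
                     finish: Mapping[str, float]) -> dict[str, float]:
    """Residence (turnaround) time per job: finish time minus submit time."""
    if submit.keys() != finish.keys():
        raise ValueError("submit and finish must describe the same jobs")
    return {job: finish[job] - submit[job] for job in submit}


def normalized_euclidean_distance(real: Mapping[str, float],
                                  model: Mapping[str, float]) -> float:
    """Distance between the turnaround-time vectors of two job streams.

    ASSUMPTION: the normalization ||real - model|| / (||real|| + ||model||)
    is used here for illustration; the paper's exact formula may differ.
    """
    if real.keys() != model.keys():
        raise ValueError("both streams must contain the same jobs")
    jobs = sorted(real)  # fixed job order so both vectors align
    diff = math.sqrt(sum((real[j] - model[j]) ** 2 for j in jobs))
    norm_real = math.sqrt(sum(real[j] ** 2 for j in jobs))
    norm_model = math.sqrt(sum(model[j] ** 2 for j in jobs))
    denom = norm_real + norm_model
    # Two all-zero vectors are identical, so their distance is 0.
    return diff / denom if denom > 0 else 0.0


# Usage: times in seconds for two hypothetical jobs "a" and "b".
real = turnaround_times({"a": 0.0, "b": 10.0}, {"a": 3.0, "b": 14.0})
model = {"a": 6.0, "b": 8.0}
print(normalized_euclidean_distance(real, model))
```

Requiring identical job sets reflects the idea of replaying the same job stream on the real system and on the model; a model that drops or adds jobs would need a separate policy for unmatched jobs.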
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1134%2FS199508022111007X/MediaObjects/12202_2021_6541_Fig1_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1134%2FS199508022111007X/MediaObjects/12202_2021_6541_Fig2_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1134%2FS199508022111007X/MediaObjects/12202_2021_6541_Fig3_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1134%2FS199508022111007X/MediaObjects/12202_2021_6541_Fig4_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1134%2FS199508022111007X/MediaObjects/12202_2021_6541_Fig5_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1134%2FS199508022111007X/MediaObjects/12202_2021_6541_Fig6_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1134%2FS199508022111007X/MediaObjects/12202_2021_6541_Fig7_HTML.png)
Funding
The work was carried out at JSCC RAS as part of a government assignment. The MVS-10P OP supercomputer was used.
Additional information
(Submitted by A. M. Elizarov)
About this article
Cite this article
Baranov, A.V., Lyakhovets, D.S. Accuracy Comparison of Various Supercomputer Job Management System Models. Lobachevskii J Math 42, 2510–2519 (2021). https://doi.org/10.1134/S199508022111007X
DOI: https://doi.org/10.1134/S199508022111007X