Abstract
Over the past two decades, many studies have addressed efficient resource management and job scheduling in large computational systems such as HPC clusters and Grids. Many of these works propose Artificial Intelligence-based methods such as metaheuristics. This chapter surveys the works that involve metaheuristics and discusses why mainstream resource management and scheduling systems nevertheless rely on only a limited set of rather simple scheduling policies. We identify several reasons for this situation, e.g., the common use of oversimplified problem definitions with rather naive job and machine models, or the application of unrealistic optimization criteria. To address these issues, this chapter proposes new, complex, and well-designed approaches built around a metaheuristic that periodically optimizes the job scheduling plan using several optimization criteria based on real-life requirements. Importantly, the approaches described in this chapter are used successfully in practice, i.e., within a production job scheduler that manages the computing infrastructure of the Czech Centre for Education, Research and Innovation in ICT (CERIT Scientific Cloud).
Keywords
- Job scheduling
- Metaheuristic
- Optimization
- Fairness
Notes
- 1.
To avoid huge slowdowns of extremely short jobs, the minimal job runtime is bounded by some predefined time constant (e.g., 10 s), sometimes called a “threshold of interactivity” [10].
- 2.
If a job is not the first in the queue, new jobs that arrive later may skip it in the queue. While such jobs do not delay the first job in the queue, they may delay all other jobs and the system cannot predict when a queued job will eventually run [4].
- 3.
When required, the schedule can be recreated from scratch, e.g., due to a machine failure or early job completion as discussed in Sect. 3.2.1. Still, no optimization or evaluation is applied during this process.
- 4.
- 5.
- 6.
- 7.
Except for CERIT-SC, all workloads come from the Parallel Workloads Archive [47]. The CERIT-SC workload can be obtained at http://www.fi.muni.cz/~xklusac/workload/.
- 8.
Other policies, such as Conservative backfilling using linear compression with RS optimization disabled (CONS-L), First Come First Served (FCFS) [12], and Shortest Job First (SJF) [16], were also tested, but they performed poorly compared to the other algorithms. We therefore omit them from the figures to keep the figures readable.
- 9.
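Note 1 describes bounding the minimal job runtime by a threshold so that extremely short jobs do not produce huge slowdowns. The note does not give a formula, so the sketch below assumes the commonly used bounded slowdown definition, max(1, (wait + runtime) / max(runtime, threshold)). The function name, parameter names, and the 10 s default are illustrative choices, not taken from the chapter.

```python
def bounded_slowdown(wait_time: float, runtime: float, threshold: float = 10.0) -> float:
    """Return the bounded slowdown of a single job (all times in seconds).

    Assumed definition (not stated in the notes):
        max(1, (wait_time + runtime) / max(runtime, threshold))
    The threshold plays the role of the "threshold of interactivity".
    """
    if wait_time < 0 or runtime < 0 or threshold <= 0:
        raise ValueError("times must be non-negative and threshold positive")
    response_time = wait_time + runtime
    # Dividing by max(runtime, threshold) keeps very short jobs from
    # dominating the metric; the outer max keeps the result at least 1.
    return max(1.0, response_time / max(runtime, threshold))


if __name__ == "__main__":
    # A 1 s job that waited 100 s: the plain slowdown would be 101.
    print(bounded_slowdown(wait_time=100.0, runtime=1.0))
    # A 1 h job that waited 1 h is unaffected by the threshold.
    print(bounded_slowdown(wait_time=3600.0, runtime=3600.0))
```

With the 10 s threshold, the 1 s job is treated as if it had run for 10 s, so its slowdown is about 10 rather than 101, while jobs longer than the threshold keep their ordinary slowdown.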
References
Kleban, S.D., Clearwater, S.H.: Fair share on high performance computing systems: What does fair really mean? In: Third IEEE International Symposium on Cluster Computing and the Grid (CCGrid’03), pp. 146–153. IEEE (2003)
Klusáček, D., Rudová, H.: Performance and fairness for users in parallel job scheduling. In: Cirne, W. (ed.) Job Scheduling Strategies for Parallel Processing. LNCS, vol. 7698, pp. 235–252. Springer (2012)
Hovestadt, M., Kao, O., Keller, A., Streit, A.: Scheduling in HPC resource management systems: queuing vs. planning. In: Feitelson, D.G., Rudolph, L., Schwiegelshohn, U. (eds.) Job Scheduling Strategies for Parallel Processing. LNCS, vol. 2862, pp. 1–20. Springer (2003)
Mu’alem, A.W., Feitelson, D.G.: Utilization, predictability, workloads, and user runtime estimates in scheduling the IBM SP2 with backfilling. IEEE Trans. Parallel Distrib. Syst. 12(6), 529–543 (2001)
Xhafa, F., Abraham, A.: Metaheuristics for Scheduling in Distributed Computing Environments. Studies in Computational Intelligence, vol. 146. Springer, Berlin (2008)
Klusáček, D., Tóth, Š.: On interactions among scheduling policies: finding efficient queue setup using high-resolution simulations. In: Silva, F., Dutra, I., Costa, V.S. (eds.) Euro-Par 2014. LNCS, vol. 8632, pp. 138–149. Springer (2014)
Adaptive Computing Enterprises, Inc.: Moab Workload Manager, Jan 2015. http://docs.adaptivecomputing.com/
Klusáček, D.: Event-based optimization of schedules for grid jobs. Ph.D. thesis, Masaryk University, 2011
Klusáček, D., Rudová, H.: Efficient grid scheduling through the incremental schedule-based approach. Comput. Intell.: Int. J. 27(1), 4–22 (2011)
Feitelson, D.G., Rudolph, L., Schwiegelshohn, U., Sevcik, K.C., Wong, P.: Theory and practice in parallel job scheduling. In: Feitelson, D.G., Rudolph, L. (eds.) Job Scheduling Strategies for Parallel Processing. LNCS, vol. 1291, pp. 1–34. Springer (1997)
Tsafrir, D., Etsion, Y., Feitelson, D.G.: Modeling user runtime estimates. In: Feitelson, D.G., Frachtenberg, E., Rudolph, L., Schwiegelshohn, U. (eds.) Job Scheduling Strategies for Parallel Processing. LNCS, vol. 3834, pp. 1–35. Springer (2005)
PBS Works: PBS Professional 12.1, Administrator’s Guide, Jan 2015. http://www.pbsworks.com
Ernemann, C., Hamscher, V., Yahyapour, R.: Benefits of global grid computing for job scheduling. In: GRID’04: Proceedings of the 5th IEEE/ACM International Workshop on Grid Computing, pp. 374–379. IEEE (2004)
Sabin, G., Kochhar, G., Sadayappan, P.: Job fairness in non-preemptive job scheduling. In: International Conference on Parallel Processing (ICPP’04), pp. 186–194. IEEE Computer Society (2004)
Sabin, G., Sadayappan, P.: Unfairness metrics for space-sharing parallel job schedulers. In: Feitelson, D.G., Frachtenberg, E., Rudolph, L., Schwiegelshohn, U. (eds.) Job Scheduling Strategies for Parallel Processing. LNCS, vol. 3834, pp. 238–256. Springer (2005)
Srinivasan, S., Kettimuthu, R., Subramani, V., Sadayappan, P.: Selective reservation strategies for backfill job scheduling. In: Feitelson, D.G., Rudolph, L., Schwiegelshohn, U. (eds.) Job Scheduling Strategies for Parallel Processing. LNCS, vol. 2537, pp. 55–71. Springer (2002)
Jackson, D., Snell, Q., Clement, M.: Core algorithms of the Maui scheduler. In: Feitelson, D.G., Rudolph, L. (eds.) Job Scheduling Strategies for Parallel Processing. LNCS, vol. 2221, pp. 87–102. Springer (2001)
Adaptive Computing Enterprises, Inc.: TORQUE Resource Manager, Jan 2015. http://docs.adaptivecomputing.com/
Lifka, D.A.: The ANL/IBM SP scheduling system. In: Feitelson, D.G., Rudolph, L. (eds.) Job Scheduling Strategies for Parallel Processing. LNCS, vol. 949, pp. 295–303. Springer (1995)
Talby, D., Feitelson, D.G.: Supporting priorities and improving utilization of the IBM SP scheduler using slack-based backfilling. In: IPPS’99/SPDP’99: Proceedings of the 13th International Symposium on Parallel Processing and the 10th Symposium on Parallel and Distributed Processing, pp. 513–517. IEEE Computer Society (1999)
Feitelson, D.G.: Experimental analysis of the root causes of performance evaluation results: a backfilling case study. IEEE Trans. Parallel Distrib. Syst. 16(2), 175–182 (2005)
Li, B., Zhao, D.: Performance impact of advance reservations from the grid on backfill algorithms. In: Sixth International Conference on Grid and Cooperative Computing (GCC 2007), pp. 456–461 (2007)
Ngubiri, J.: Techniques and evaluation of processor co-allocation in multi-cluster systems. Ph.D. thesis, Radboud University Nijmegen, 2008
Feitelson, D.G., Weil, A.M.: Utilization and predictability in scheduling the IBM SP2 with backfilling. In: 12th International Parallel Processing Symposium, pp. 542–546. IEEE (1998)
Chiang, S.-H., Arpaci-Dusseau, A., Vernon, M.K.: The impact of more accurate requested runtimes on production job scheduling performance. In: Feitelson, D.G., Rudolph, L., Schwiegelshohn, U. (eds.) Job Scheduling Strategies for Parallel Processing. LNCS, vol. 2537, pp. 103–127. Springer (2002)
Smith, W., Taylor, V., Foster, I.: Using run-time predictions to estimate queue wait times and improve scheduler performance. In: Feitelson, D.G., Rudolph, L. (eds.) Job Scheduling Strategies for Parallel Processing. LNCS, vol. 1659, pp. 202–219. Springer (1999)
Srinivasan, S., Kettimuthu, R., Subramani, V., Sadayappan, P.: Characterization of backfilling strategies for parallel job scheduling. In: Proceedings of 2002 International Workshops on Parallel Processing, pp. 514–519. IEEE Computer Society (2002)
Yousif, A., Abdullah, A.H., Nor, S.M., Abdelaziz, A.A.: Scheduling jobs on grid computing using firefly algorithm. J. Theor. Appl. Inf. Technol. 33(2), 155–164 (2011)
Abraham, A., Liu, H., Grosan, C., Xhafa, F.: Nature inspired meta-heuristics for grid scheduling: single and multi-objective optimization approaches. In: Metaheuristics for Scheduling in Distributed Computing Environments [5], pp. 247–272 (2008)
Abramson, D., Buyya, R., Murshed, M., Venugopal, S.: Scheduling parameter sweep applications on global grids: a deadline and budget constrained cost-time optimisation algorithm. Softw.: Pract. Exper. 35(5), 491–512 (2005)
Stucky, K.-U., Jakob, W., Quinte, A., Süß, W.: Solving scheduling problems in grid resource management using an evolutionary algorithm. In: On the Move to Meaningful Internet Systems 2006: CoopIS, DOA, GADA, and ODBASE. LNCS, vol. 4276, pp. 1252–1262. Springer (2006)
Kumar, R., Vadhiyar, S.: Prediction of queue waiting times for metascheduling on parallel batch systems. In: Cirne, W. (ed.) Job Scheduling Strategies for Parallel Processing. LNCS, vol. 8828. Springer (2015)
Nurmi, D., Brevik, J., Wolski, R.: QBETS: queue bounds estimation from time series. In: Frachtenberg, E., Schwiegelshohn, U. (eds.) Job Scheduling Strategies for Parallel Processing. LNCS, vol. 4942, pp. 76–101. Springer (2007)
Klusáček, D., Chlumský, V., Rudová, H.: Optimizing user oriented job scheduling within TORQUE. In: SuperComputing—The International Conference for High Performance Computing, Networking, Storage and Analysis. Poster, 2013
Keller, A., Reinefeld, A.: Anatomy of a resource management system for HPC clusters. Annu. Rev. Scalable Comput. 3, 1–31 (2001)
Subrata, R., Zomaya, A.Y., Landfeldt, B.: Artificial life techniques for load balancing in computational grids. J. Comput. Syst. Sci. 73(8), 1176–1190 (2007)
Ritchie, G., Levine, J.: A fast, effective local search for scheduling independent jobs in heterogeneous computing environments. In: Porteous, J. (ed.) 22nd Workshop of the UK Planning and Scheduling Special Interest Group (PlanSig 03), 2003
Carretero, J., Xhafa, F.: Using genetic algorithms for scheduling jobs in large scale grid applications. J. Technol. Econ. Dev. Res. J. Vilnius Gediminas Tech. Univ. 12(1), 11–17 (2006)
YarKhan, A., Dongarra, J.J.: Experiments with scheduling using simulated annealing in a grid environment. In: Parashar, M. (ed.) GRID. LNCS, vol. 2536. Springer (2002)
Kołodziej, J., Xhafa, F.: Integration of task abortion and security requirements in GA-based meta-heuristics for independent batch grid scheduling. Comput. Math. Appl. 63(2), 350–364 (2012)
Switalski, P., Seredynski, F.: Scheduling parallel batch jobs in grids with evolutionary metaheuristics. J. Sched. 1–13 (2014)
Pooranian, Z., Shojafar, M., Abawajy, J., Abraham, A.: An efficient meta-heuristic algorithm for grid computing. J. Comb. Optim. 1–22 (2013)
Xhafa, F., Abraham, A.: Computational models and heuristic methods for grid scheduling problems. Future Gener. Comput. Syst. 26(4), 608–621 (2010)
Süß, W., Jakob, W., Quinte, A., Stucky, K.-U.: GORBA: a global optimising resource broker embedded in a Grid resource management system. In: International Conference on Parallel and Distributed Computing Systems, PDCS 2005, pp. 19–24. IASTED/ACTA Press (2005)
Jakob, W., Quinte, A., Stucky, K.-U., Süß, W.: Optimised scheduling of Grid resources using hybrid evolutionary algorithms. In: Wyrzykowski, R., Dongarra, J., Meyer, N., Wasniewski, J. (eds.) Parallel Processing and Applied Mathematics, 6th International Conference, PPAM 2005. LNCS, vol. 3911, pp. 406–413. Springer (2005)
Sulistio, A., Cibej, U., Venugopal, S., Robic, B., Buyya, R.: A toolkit for modelling and simulating data grids: an extension to GridSim. Concurr. Comput.: Pract. Exper. 20(13), 1591–1609 (2008)
Feitelson, D.G.: Parallel workloads archive (PWA), Jan 2015. http://www.cs.huji.ac.il/labs/parallel/workload/
Acknowledgments
We highly appreciate the support of the Grant Agency of the Czech Republic under the grant No. P202/12/0306. The access to the MetaCentrum computing facilities and workloads provided under the program “Projects of Large Infrastructure for Research, Development, and Innovations” LM2010005 funded by the Ministry of Education, Youth, and Sports of the Czech Republic is highly appreciated.
Copyright information
© 2015 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Klusáček, D., Rudová, H. (2015). A Metaheuristic for Optimizing the Performance and the Fairness in Job Scheduling Systems. In: Laalaoui, Y., Bouguila, N. (eds) Artificial Intelligence Applications in Information and Communication Technologies. Studies in Computational Intelligence, vol 607. Springer, Cham. https://doi.org/10.1007/978-3-319-19833-0_1
DOI: https://doi.org/10.1007/978-3-319-19833-0_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-19832-3
Online ISBN: 978-3-319-19833-0
eBook Packages: Engineering, Engineering (R0)