A Metaheuristic for Optimizing the Performance and the Fairness in Job Scheduling Systems

  • Dalibor Klusáček
  • Hana Rudová
Chapter
Part of the Studies in Computational Intelligence book series (SCI, volume 607)

Abstract

Over the past two decades, many studies have focused on efficient resource management and job scheduling in large computational systems such as HPC clusters and Grids. Many works have proposed applying Artificial Intelligence-based methods such as metaheuristics for this purpose. This chapter provides an overview of works that involve metaheuristics and discusses why mainstream resource management and scheduling systems instead use only a limited set of rather simple scheduling policies. We identify several reasons for this situation, e.g., the common use of overly simplified problem definitions with rather naive job and machine models, or the application of unrealistic optimization criteria. To address these issues, this chapter proposes new complex, well-designed approaches built around a metaheuristic that periodically optimizes the job scheduling plan using several optimization criteria based on real-life requirements. Importantly, the approaches described in this chapter are successfully used in practice, i.e., within a production job scheduler that manages the computing infrastructure of the Czech Centre for Education, Research and Innovation in ICT (CERIT Scientific Cloud).

Keywords

Job scheduling, Metaheuristic, Optimization, Fairness

Notes

Acknowledgments

We highly appreciate the support of the Grant Agency of the Czech Republic under the grant No. P202/12/0306. The access to the MetaCentrum computing facilities and workloads provided under the program “Projects of Large Infrastructure for Research, Development, and Innovations” LM2010005 funded by the Ministry of Education, Youth, and Sports of the Czech Republic is highly appreciated.

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. Faculty of Informatics, Masaryk University, Brno, Czech Republic
