
On Effective Scheduling in Computing Clusters

Programming and Computer Software

Abstract

Presently, big companies such as Amazon, Google, Facebook, Microsoft, and Yahoo! own huge datacenters with thousands of nodes. These clusters are used simultaneously by many clients. Users submit jobs consisting of one or more tasks. The task flow is usually a mix of short, long, interactive, and batch tasks with different priorities. The cluster scheduler decides on which server each task should run as a process, container, or virtual machine. Scheduler optimizations are important as they provide higher server utilization, lower latency, improved load balancing, and fault tolerance. Optimal task placement is a multidimensional problem that requires computationally expensive optimization; this increases placement latency and limits cluster scalability. In this paper, we consider different cluster scheduler architectures and optimization problems.


Notes

  1. A computing cluster is a set of workstations (nodes) that are connected by a communication medium and can operate as a whole owing to additional software called a cluster management system.

  2. More precisely, process-level isolation: Borg appeared before the widespread use of containerization and virtualization.


Funding

This work was supported by the Russian Foundation for Basic Research, project no. 17-07-01006.

Author information


Correspondence to D. A. Grushin or N. N. Kuzyurin.

Additional information

Translated by Yu. Kornienko


Cite this article

Grushin, D.A., Kuzyurin, N.N. On Effective Scheduling in Computing Clusters. Program Comput Soft 45, 398–404 (2019). https://doi.org/10.1134/S0361768819070077
