On Effective Scheduling in Computing Clusters

Grushin, D. A.; Kuzyurin, N. N.

doi:10.1134/S0361768819070077

Published: 16 December 2019

Volume 45, pages 398–404, (2019)

Programming and Computer Software

D. A. Grushin¹ & N. N. Kuzyurin¹,²

Abstract

Presently, big companies such as Amazon, Google, Facebook, Microsoft, and Yahoo! own huge datacenters with thousands of nodes. These clusters are used simultaneously by many clients. The users submit jobs containing one or more tasks. The task flow is usually a mix of short, long, interactive, and batch tasks with different priorities. The cluster scheduler decides on which server the task should be run as a process, container, or virtual machine. Scheduler optimizations are important as they provide higher server utilization, lower latency, improved load balancing, and fault tolerance. Optimal task placement is a complex problem that has multiple dimensions and requires algorithmically complex optimizations. This increases placement latency and limits cluster scalability. In this paper, we consider different cluster scheduler architectures and optimization problems.
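
The placement decision described in the abstract can be illustrated with a minimal sketch, assuming a greedy first-fit policy over two resource dimensions (CPU and memory) with higher-priority tasks placed first. This is not the authors' method; the names `Server`, `Task`, and `first_fit` are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    # Hypothetical server model: free capacity in two dimensions.
    name: str
    cpu_free: int
    mem_free: int
    tasks: list = field(default_factory=list)


@dataclass
class Task:
    # Hypothetical task model: resource demand plus a priority.
    name: str
    cpu: int
    mem: int
    priority: int = 0


def first_fit(tasks, servers):
    """Greedy first-fit: higher-priority tasks are placed first, each on the
    first server with enough free CPU and memory.
    Returns {task name: server name, or None if nothing fits}."""
    placement = {}
    for task in sorted(tasks, key=lambda t: -t.priority):
        placement[task.name] = None
        for server in servers:
            if server.cpu_free >= task.cpu and server.mem_free >= task.mem:
                server.cpu_free -= task.cpu
                server.mem_free -= task.mem
                server.tasks.append(task.name)
                placement[task.name] = server.name
                break
    return placement


if __name__ == "__main__":
    servers = [Server("a", cpu_free=4, mem_free=8),
               Server("b", cpu_free=8, mem_free=32)]
    tasks = [Task("batch", cpu=6, mem=16, priority=0),
             Task("web", cpu=2, mem=4, priority=5),
             Task("big", cpu=16, mem=1, priority=1)]
    print(first_fit(tasks, servers))
```

A single pass like this is fast but makes no attempt at optimality, which is the trade-off between placement latency and placement quality that the abstract points to.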

Notes

A computing cluster is a set of workstations (nodes) that are connected by a communication medium and can operate as a whole owing to additional software called a cluster management system.
More precisely, process-level isolation: Borg appeared before the widespread use of containerization and virtualization.

REFERENCES

Sinnen, O., Task Scheduling for Parallel Systems, Wiley, 2007.
Verma, A., Pedrosa, L., Korupolu, M., et al., Large-scale cluster management at Google with Borg, Proc. 10th Eur. Conf. Computer Systems, 2015, pp. 18:1–18:17.
Ehrgott, M., Multicriteria Optimization, Springer, 2005.
Avetisyan, A., Grushin, D., and Ryzhov, A., Cluster management systems, Tr. Inst. Sistemnogo Program. Ross. Akad. Nauk, 2002, vol. 3, pp. 39–62.
MPI: A message-passing interface standard, Technical Report, University of Tennessee, Knoxville, 1994.
Kaplan, J.A. and Nelson, M.L., A comparison of queueing, cluster, and distributed computing systems, NASA Langley Technical Report, 1994.
Baker, M., Fox, G., and Yau, H., A review of commercial and research cluster management software, Northeast Parallel Architectures Center, 1996.
Foster, I. and Kesselman, C., The Grid: Blueprint for a New Computing Infrastructure, Morgan Kaufmann, 1999.
Boutin, E., Ekanayake, J., Lin, W., et al., Apollo: Scalable and coordinated scheduling for cloud-scale computing, Proc. 11th USENIX Conf. Operating Systems Design and Implementation, 2014, pp. 285–300.
Delgado, P., Dinu, F., Kermarrec, A.-M., et al., Hawk: Hybrid datacenter scheduling, Proc. USENIX Annual Technical Conf., 2015, pp. 499–510.
Schwarzkopf, M., Cluster scheduling for data centers, Queue, 2017, vol. 15, no. 5.
Herbein, S., Dusia, A., Landwehr, A., et al., Resource management for running HPC applications in container clouds, Lect. Notes Comput. Sci., 2016, vol. 9697, pp. 261–278.
Ye, D., Han, X., and Zhang, G., Online multiple-strip packing, Theor. Comput. Sci., 2011, vol. 412, no. 3, pp. 233–239.
Hurink, J.L. and Paulus, J.J., Online algorithm for parallel job scheduling and strip packing, Lect. Notes Comput. Sci., 2007, vol. 4927, pp. 67–74.
Zhuk, S., On-line algorithms for packing rectangles into several strips, Discrete Math. Appl., 2007, vol. 17, no. 5, pp. 517–531.
Johnson, D.S., Demers, A., Ullman, J.D., et al., Worst-case performance bounds for simple one-dimensional packing algorithms, SIAM J. Comput., 1974, vol. 3, no. 4, pp. 299–325.
Garey, M.R., Graham, R.L., Johnson, D.S., et al., Resource constrained scheduling as generalized bin packing, J. Comb. Theory, Ser. A, 1976, vol. 21, no. 3, pp. 257–298.
Zhuk, S., Chernykh, A., Avetisyan, A., et al., Comparison of scheduling heuristics for grid resource broker, Proc. 5th Mexican Int. Conf., 2004, pp. 388–392.
Tchernykh, A., Ramírez-Alcaraz, J.M., Avetisyan, A., et al., Two level job-scheduling strategies for a computational grid, Lect. Notes Comput. Sci., 2005, vol. 3911, pp. 774–781.
Tchernykh, A., Schwiegelshohn, U., Yahyapour, R., et al., On-line hierarchical job scheduling on grids with admissible allocation, J. Scheduling, 2010, vol. 13, no. 5, pp. 545–552.
Tchernykh, A., Schwiegelshohn, U., Yahyapour, R., et al., Online hierarchical job scheduling on grids, From Grids to Service and Pervasive Computing, Springer, 2008, pp. 77–91.
Avetisyan, A.I., Gaissaryan, S.S., Grushin, D.A., et al., Scheduling heuristics for grid resource broker, Tr. Inst. Sistemnogo Program. Ross. Akad. Nauk, 2004, vol. 5, pp. 41–62.
Baraglia, R., Capannini, G., Pasquali, M., et al., Backfilling strategies for scheduling streams of jobs on computational farms, Making Grids Work, Springer, 2008.
Mu'alem, A.W. and Feitelson, D.G., Utilization, predictability, workloads, and user runtime estimates in scheduling the IBM SP2 with backfilling, IEEE Trans. Parallel Distrib. Syst., 2001, vol. 12, no. 6, pp. 529–543.
Nissimov, A. and Feitelson, D.G., Probabilistic backfilling, Lect. Notes Comput. Sci., 2008, vol. 4942, pp. 102–115.
Ivannikov, V.P., Grushin, D.A., Kuzyurin, N.N., Pospelov, A.I., and Shokurov, A.V., Software for improving the energy efficiency of a computer cluster, Program. Comput. Software, 2010, vol. 36, no. 6, pp. 327–336.
Baraglia, R., Capannini, G., Dazzi, P., et al., A multi-criteria job scheduling framework for large computing farms, J. Comput. Syst. Sci., 2013, vol. 79, no. 2, pp. 230–244.
Csirik, J. and Van Vliet, A., An online algorithm for multidimensional bin packing, Oper. Res. Lett., 1993, vol. 13, no. 3, pp. 149–158.
Kuzjurin, N.N. and Pospelov, A.I., Probabilistic analysis of a new class of rectangle strip packing algorithms, Comput. Math. Math. Phys., 2011, vol. 51, no. 10, pp. 1931–1936.
Trushnikov, M.A., Probabilistic analysis of a new rectangle strip packing algorithm, Tr. Inst. Sistemnogo Program. Ross. Akad. Nauk, 2013, vol. 24, pp. 457–468.
Lazarev, D.O. and Kuzjurin, N.N., Rectangle packing algorithm into several strips and average case analysis, Tr. Inst. Sistemnogo Program. Ross. Akad. Nauk, 2017, vol. 29, no. 6, pp. 221–228.
Zhuk, S., On parallel task scheduling on a group of clusters with different speeds, Tr. Inst. Sistemnogo Program. Ross. Akad. Nauk, 2012, vol. 23, pp. 447–454.
Tsai, C.-W. and Rodrigues, J.J., Metaheuristic scheduling for cloud: A survey, IEEE Syst. J., 2014, vol. 8, no. 1, pp. 279–291.
Grushin, D.A. and Kuzjurin, N.N., Energy effective computations on a group of clusters, Tr. Inst. Sistemnogo Program. Ross. Akad. Nauk, 2012, vol. 23, pp. 433–446.
Goldberg, R.P., Survey of virtual machine research, Computer, 1974, vol. 7, no. 6, pp. 34–45.
Magic quadrant for cloud infrastructure as a service. http://www.gartner.com/doc/3875999/magic-quadrant-cloud-infrastructure-service.
Helsley, M., LXC: Linux container tools, IBM developerWorks Technical Library, 2009.
Merkel, D., Docker: Lightweight Linux containers for consistent development and deployment, Linux J., 2014, no. 239.
Barroso, L.A., Clidaras, J., and Hoelzle, U., The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines, Morgan & Claypool, 2013.
Delimitrou, C. and Kozyrakis, C., Paragon: QoS-aware scheduling for heterogeneous datacenters, ACM SIGPLAN Not., 2013, vol. 48, no. 4, pp. 77–88.
Delimitrou, C. and Kozyrakis, C., Quasar: Resource-efficient and QoS-aware cluster management, ACM SIGPLAN Not., 2014, vol. 49, no. 4, pp. 127–144.
Romero, F. and Delimitrou, C., Mage: Online and interference-aware scheduling for multi-scale heterogeneous systems, Proc. 27th Int. Conf. Parallel Architectures and Compilation Techniques, 2018, pp. 19:1–19:13.
Gog, I., Schwarzkopf, M., Gleave, A., et al., Firmament: Fast, centralized cluster scheduling at scale, Proc. 12th USENIX Conf. Operating Systems Design and Implementation, 2016, pp. 99–115.
Breitgand, D. and Epstein, A., Improving consolidation of virtual machines with risk-aware bandwidth oversubscription in compute clouds, Proc. IEEE INFOCOM, 2012, pp. 2861–2865.
Wang, M., Meng, X., and Zhang, L., Consolidating virtual machines with dynamic bandwidth demand in data centers, Proc. IEEE INFOCOM, 2011, pp. 71–75.
Urgaonkar, B., Shenoy, P., and Roscoe, T., Resource overbooking and application profiling in shared hosting platforms, Proc. 5th Symp. Operating Systems Design and Implementation, 2002, pp. 239–254.
Hindman, B., Konwinski, A., Zaharia, M., et al., Mesos: A platform for fine-grained resource sharing in the data center, Proc. 8th USENIX Conf. Networked Systems Design and Implementation, 2011, pp. 295–308.
Vavilapalli, V.K., Murthy, A.C., Douglas, C., et al., Apache Hadoop YARN: Yet another resource negotiator, Proc. 4th Annu. Symp. Cloud Computing, 2013, pp. 5:1–5:16.
Schwarzkopf, M., Konwinski, A., Abd-El-Malek, M., et al., Omega: Flexible, scalable schedulers for large compute clusters, Proc. 8th ACM Eur. Conf. Computer Systems, 2013, pp. 351–364.
Scheduling in Nomad. http://www.nomadproject.io/docs/internals/scheduling.html. Accessed December 1, 2018.
Mitzenmacher, M., The power of two choices in randomized load balancing, IEEE Trans. Parallel Distrib. Syst., 2001, vol. 12, no. 10, pp. 1094–1104.
Ousterhout, K., Wendell, P., Zaharia, M., et al., Sparrow: Distributed, low latency scheduling, Proc. 24th ACM Symp. Operating Systems Principles, 2013, pp. 69–84.
Fagin, R. and Williams, J.H., A fair carpool scheduling algorithm, IBM J. Res. Dev., 1983, vol. 27, no. 2, pp. 133–139.
Delimitrou, C., Sanchez, D., and Kozyrakis, C., Tarcil: Reconciling scheduling speed and quality in large shared clusters, Proc. 6th ACM Symp. Cloud Computing, 2015, pp. 97–110.
Karanasos, K., Rao, S., Curino, C., et al., Mercury: Hybrid centralized and distributed scheduling in large shared clusters, Proc. USENIX Annual Technical Conf., 2015, pp. 485–497.
Delgado, P., Dinu, F., Kermarrec, A.-M., et al., Hawk: Hybrid datacenter scheduling, Proc. USENIX Annual Technical Conf., 2015, pp. 499–510.
The Open Container Initiative. http://www.opencontainers.org. Accessed December 1, 2018.
Avetisyan, A.A., Gaissaryan, S.S., Samovarov, O.I., et al., Organization of scientific centers in university cluster program, Proc. All-Russ. Conf. Scientific Service in Internet: Supercomputer Centers and Problems, 2010, pp. 213–215.
Samovarov, O.I. and Gaissaryan, S.S., Architecture and implementation details of Unihub platform in cloud computing architecture based on Openstack package, Tr. Inst. Sistemnogo Program. Ross. Akad. Nauk, 2014, vol. 26, no. 1, pp. 403–420.
Grushin, D.A. and Kuzyurin, N.N., Load balancing in Unihub SaaS system based on user behavior prediction, Tr. Inst. Sistemnogo Program. Ross. Akad. Nauk, 2015, vol. 27, no. 5, pp. 23–34.
Grushin, D.A. and Kuzyurin, N.N., Optimization problems running MPI-based HPC applications, Tr. Inst. Sistemnogo Program. Ross. Akad. Nauk, 2017, vol. 29, no. 6, pp. 229–244.

Funding

This work was supported by the Russian Foundation for Basic Research, project no. 17-07-01006.

Author information

Authors and Affiliations

Ivannikov Institute for System Programming, Russian Academy of Sciences, ul. Solzhenitsyna 25, 109004, Moscow, Russia
D. A. Grushin & N. N. Kuzyurin
Moscow Institute of Physics and Technology, Institutskii per. 9, 141700, Dolgoprudnyi, Moscow oblast, Russia
N. N. Kuzyurin

Authors

D. A. Grushin
N. N. Kuzyurin

Corresponding authors

Correspondence to D. A. Grushin or N. N. Kuzyurin.

Additional information

Translated by Yu. Kornienko

About this article

Cite this article

Grushin, D.A., Kuzyurin, N.N. On Effective Scheduling in Computing Clusters. Program Comput Soft 45, 398–404 (2019). https://doi.org/10.1134/S0361768819070077

Received: 13 February 2019
Revised: 13 February 2019
Accepted: 15 February 2019
Published: 16 December 2019
Issue Date: December 2019
DOI: https://doi.org/10.1134/S0361768819070077
