A Review of Resource Scheduling in Large-Scale Server Cluster

He, Libo; Qiang, Zhenping; Zhou, Wei; Yao, Shaowen

doi:10.1007/978-3-319-62698-7_41

Part of the book series: Communications in Computer and Information Science (CCIS, volume 731)

Included in the following conference series:

International Conference on Knowledge Management in Organizations

Abstract

Resource scheduling plays a crucial role in improving the resource utilization of server clusters. Five types of scheduling architectures are widely used for scheduling resources in server clusters: statically partitioned schedulers, monolithic schedulers, two-level scheduling, shared-state scheduling, and distributed schedulers. This paper discusses several scheduling architectures and illustrates their key techniques, including resource representation and sharing models, scheduling algorithms, and some other techniques. Different scheduling techniques are applied in different scheduling architectures. The review concludes that a large body of work on scheduling strategies in large-scale clusters has been conducted. However, increasingly complicated applications and growing cluster sizes place new requirements on scheduling techniques, so some scheduling techniques can still be improved.

Author information

Authors and Affiliations

School of Information Science and Engineering, Yunnan University, No. 2 Cuihu North Road, Kunming, Yunnan, China
Libo He & Zhenping Qiang
School of Software, Yunnan University, No. 2 Cuihu North Road, Kunming, Yunnan, China
Wei Zhou & Shaowen Yao

Corresponding author

Correspondence to Shaowen Yao.

Editor information

Editors and Affiliations

University of Staffordshire, Stoke-on-Trent, Staffordshire, United Kingdom
Lorna Uden
Beijing Jiaotong University, Beijing, China
Wei Lu
Department of Information Management, University of Kaohsiung, Kaohsiung, Taiwan
I-Hsien Ting

About this paper

Cite this paper

He, L., Qiang, Z., Zhou, W., Yao, S. (2017). A Review of Resource Scheduling in Large-Scale Server Cluster. In: Uden, L., Lu, W., Ting, IH. (eds) Knowledge Management in Organizations. KMO 2017. Communications in Computer and Information Science, vol 731. Springer, Cham. https://doi.org/10.1007/978-3-319-62698-7_41

DOI: https://doi.org/10.1007/978-3-319-62698-7_41
Published: 12 July 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-62697-0
Online ISBN: 978-3-319-62698-7
eBook Packages: Computer Science, Computer Science (R0)
