A Review of Resource Scheduling in Large-Scale Server Cluster

  • Conference paper
Knowledge Management in Organizations (KMO 2017)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 731)

Abstract

Resource scheduling plays a crucial role in improving the resource utilization of server clusters. Five types of scheduling architecture are widely used for scheduling resources in server clusters: statically partitioned schedulers, monolithic schedulers, two-level schedulers, shared-state schedulers, and distributed schedulers. This paper discusses several scheduling architectures and illustrates their key techniques, including resource representation and sharing models, scheduling algorithms, and other techniques. Different scheduling techniques are applied in different scheduling architectures. The review shows that a large body of work exists on scheduling strategies for large-scale clusters. However, increasingly complicated applications and growing cluster sizes place new requirements on scheduling techniques, so some of these techniques can still be improved.

Author information

Corresponding author

Correspondence to Shaowen Yao.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

He, L., Qiang, Z., Zhou, W., Yao, S. (2017). A Review of Resource Scheduling in Large-Scale Server Cluster. In: Uden, L., Lu, W., Ting, IH. (eds) Knowledge Management in Organizations. KMO 2017. Communications in Computer and Information Science, vol 731. Springer, Cham. https://doi.org/10.1007/978-3-319-62698-7_41

  • DOI: https://doi.org/10.1007/978-3-319-62698-7_41

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-62697-0

  • Online ISBN: 978-3-319-62698-7

  • eBook Packages: Computer Science, Computer Science (R0)
