Resource Utilization Analysis of Alibaba Cloud

  • Li Deng
  • Yu-Lin Ren
  • Fei Xu
  • Heng He
  • Chao Li
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10954)


Low resource utilization and high operating costs are becoming major challenges for cloud providers. However, for confidentiality reasons, few providers are willing to publish resource utilization data for their platforms, which makes it difficult to design effective cloud resource schedulers. Fortunately, Alibaba released its cloud resource usage data in September 2017. This paper analyzes the Alibaba cloud trace data in depth from several aspects and reveals important features of resource utilization that can help in designing effective resource management approaches for cloud platforms: (1) The maximum resource utilization of an online service is closely related to its average utilization. (2) The longer a batch instance has already run, the longer it may continue to run. (3) The type of job running in a container can be estimated from the amount of resources the container consumes and its lifetime. (4) The resources actually used by different batch jobs vary greatly over time, so static resource allocation wastes a substantial amount of resources.
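As a rough illustration of findings (1) and (4), the sketch below computes the correlation between average and maximum utilization across services, and the share of a static allocation that goes unused. It is not the authors' analysis code: the function names, the sample values, and the choice of Pearson correlation are all assumptions made for illustration, and the numbers are invented rather than taken from the Alibaba trace.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def static_allocation_waste(usage, allocated):
    """Fraction of the statically allocated resource-time left unused."""
    used = sum(min(u, allocated) for u in usage)
    return 1.0 - used / (allocated * len(usage))


# Hypothetical CPU utilization samples (percent) for three online services.
services = {
    "svc_a": [10, 12, 15, 11],
    "svc_b": [30, 35, 40, 33],
    "svc_c": [55, 60, 70, 58],
}
avg_util = [mean(samples) for samples in services.values()]
max_util = [max(samples) for samples in services.values()]
corr = pearson(avg_util, max_util)

# Hypothetical per-interval CPU usage (cores) of one batch job that was
# statically allocated 4 cores for its whole lifetime.
batch_usage = [1.0, 4.0, 2.0, 1.0]
waste = static_allocation_waste(batch_usage, allocated=4.0)

print(f"avg/max correlation: {corr:.3f}")
print(f"wasted share of static allocation: {waste:.0%}")
```

On real trace data the same two measures would be computed per service and per batch job; a high correlation would support estimating peak demand from average demand, and a large wasted share would argue for dynamic allocation.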


Keywords: Cloud platform, Online services, Batch jobs, Resource utilization ratio



Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  • Li Deng (1, 2)
  • Yu-Lin Ren (1, 2)
  • Fei Xu (1, 2)
  • Heng He (1, 2)
  • Chao Li (3), corresponding author
  1. College of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan, China
  2. Hubei Province Key Laboratory of Intelligent Information Processing and Real-Time Industrial System, Wuhan, China
  3. Department of Information Development and Management, Hubei University, Wuhan, China
