Performance modeling of big data applications in the cloud centers

Abstract

Cloud computing has evolved into an efficient paradigm for processing big data applications. Performance evaluation of a cloud center is a necessary prerequisite for guaranteeing quality of service. However, effectively analyzing the performance of a cloud service is a challenging task because of the complexity of cloud resources and the diversity of big data applications. In this paper, we leverage queuing theory and probabilistic statistics to propose a performance evaluation model for a cloud center under big data application arrivals. In this model, tasks (i.e., big data applications) arrive according to a Poisson process, each task is divided into many parallel subtasks, and the number of subtasks per task follows a general distribution. The model makes it possible to calculate important performance indicators such as the mean number of subtasks in the system, the probability that a task obtains immediate service, the task waiting time, and the blocking probability. The model can also be used to predict the time cost of running an application. Finally, we use simulations and benchmark runs of the WordCount and TeraSort applications on a Hadoop platform to demonstrate the utility of the model.
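
The setting described above (Poisson task arrivals, each task split into a random number of parallel subtasks, finite capacity) can be explored with a small discrete-event simulation. The sketch below is illustrative only and is not the analytical model of the paper. It assumes exponentially distributed subtask service times, a FIFO buffer, and rejection of any task whose subtasks do not all fit in the system. The function name `simulate_cloud_center`, its parameters, and the example batch-size distribution are hypothetical choices made for this sketch.

```python
import heapq
import random


def simulate_cloud_center(arrival_rate, service_rate, servers, buffer_size,
                          batch_size_sampler, num_tasks=20000, seed=0):
    """Simulate tasks (batches of subtasks) arriving at a finite multi-server queue.

    Assumptions of this sketch (not taken from the paper): exponential
    subtask service times, FIFO buffer, and total rejection of a task
    whose subtasks do not all fit into servers + buffer.
    """
    rng = random.Random(seed)
    capacity = servers + buffer_size
    departures = []   # min-heap of completion times of subtasks in service
    waiting = 0       # subtasks queued in the buffer
    now = 0.0
    area = 0.0        # time integral of the number of subtasks in the system
    blocked = 0
    immediate = 0

    for _ in range(num_tasks):
        next_arrival = now + rng.expovariate(arrival_rate)

        # Process every subtask completion that happens before the arrival.
        while departures and departures[0] <= next_arrival:
            t = heapq.heappop(departures)
            area += (len(departures) + 1 + waiting) * (t - now)
            now = t
            if waiting > 0:
                # A queued subtask takes over the freed server.
                waiting -= 1
                heapq.heappush(departures, now + rng.expovariate(service_rate))

        area += (len(departures) + waiting) * (next_arrival - now)
        now = next_arrival

        batch = batch_size_sampler(rng)
        if len(departures) + waiting + batch > capacity:
            blocked += 1          # whole task rejected
            continue

        free = servers - len(departures)
        if batch <= free:
            immediate += 1        # every subtask starts service at once
        started = min(batch, free)
        for _ in range(started):
            heapq.heappush(departures, now + rng.expovariate(service_rate))
        waiting += batch - started

    return {
        "blocking_probability": blocked / num_tasks,
        "immediate_service_probability": immediate / num_tasks,
        "mean_subtasks_in_system": area / now,
    }


if __name__ == "__main__":
    # Hypothetical workload: 1 to 4 subtasks per task, chosen uniformly.
    result = simulate_cloud_center(
        arrival_rate=1.0, service_rate=1.0, servers=8, buffer_size=8,
        batch_size_sampler=lambda rng: rng.randint(1, 4),
    )
    print(result)
```

With one subtask per task and no buffer, the sketch reduces to the classical Erlang loss system, which gives a simple way to sanity-check the simulated blocking probability against a known closed form.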

Acknowledgements

This work is supported by the Innovation Action Plan of the Science and Technology Commission of Shanghai Municipality (15DZ1100305).

Author information

Corresponding author

Correspondence to Chao Shen.

About this article

Cite this article

Shen, C., Tong, W., Hwang, JN. et al. Performance modeling of big data applications in the cloud centers. J Supercomput 73, 2258–2283 (2017). https://doi.org/10.1007/s11227-017-2005-y

Keywords

  • Cloud computing
  • Big data
  • Performance modeling
  • Embedded Markov chain
  • Response time