Cluster Computing, Volume 21, Issue 2, pp 1439–1454

Performance prediction of parallel computing models to analyze cloud-based big data applications

  • Chao Shen
  • Weiqin Tong
  • Kim-Kwang Raymond Choo
  • Samina Kausar


Abstract

Performance evaluation of cloud centers is a necessary prerequisite to fulfilling contractual quality of service, particularly for big data applications. However, effectively evaluating the performance of cloud services is challenging due to the complexity of cloud services and the diversity of big data applications. In this paper, we propose a performance evaluation model for parallel computing models deployed in cloud centers to support big data applications. In this evaluation model, a big data application is divided into many parallel tasks, and task arrivals follow a general distribution. Our approach also accounts for factors that affect the performance of parallel computing models: resource heterogeneity, resource contention among cloud nodes, and the data storage strategy. The model allows us to calculate key performance indicators of a cloud center, such as the mean number of tasks in the system, the probability that a task obtains immediate service, and the task waiting time, and it can also be used to predict application execution time. We then demonstrate the utility of the model through simulations and benchmarking using the WordCount and TeraSort applications.
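The paper's model handles generally distributed arrivals via an embedded Markov chain, which has no simple closed form. As a simplified illustration of the kind of indicators the abstract lists, the sketch below computes them for the classical M/M/m queue (Poisson arrivals, exponential service, m homogeneous servers) using the Erlang-C formula and Little's law; the function name and interface are our own, not from the paper.

```python
import math

def mmm_metrics(lam, mu, m):
    """Steady-state indicators for an M/M/m queue.

    lam: task arrival rate, mu: per-server service rate, m: number of servers.
    Returns the mean number of tasks in the system, the probability that an
    arriving task is served immediately, and the mean waiting time in queue.
    """
    rho = lam / (m * mu)                     # per-server utilization
    assert rho < 1, "queue is unstable (rho >= 1)"
    a = lam / mu                             # offered load in Erlangs

    # P0: probability that the system is empty
    partial = sum(a**k / math.factorial(k) for k in range(m))
    tail = a**m / (math.factorial(m) * (1.0 - rho))
    p0 = 1.0 / (partial + tail)

    p_wait = tail * p0                       # Erlang-C: P(arrival must queue)
    lq = p_wait * rho / (1.0 - rho)          # mean number waiting in queue
    wq = lq / lam                            # mean waiting time (Little's law)
    l_sys = lq + a                           # mean number of tasks in system
    return {
        "mean_in_system": l_sys,
        "p_immediate_service": 1.0 - p_wait,
        "mean_wait": wq,
    }
```

For m = 1 this reduces to the familiar M/M/1 results, e.g. with lam = 1 and mu = 2 the mean number in system is rho / (1 - rho) = 1. The paper's general-arrival model generalizes these quantities to heterogeneous, contended cloud nodes.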


Keywords: Parallel computing model · Big data application · Cloud center · Performance modeling · Embedded Markov chain · Service time



This work is partially supported by the Shanghai Innovation Action Plan Project under Grant No. 16511101200.



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2017

Authors and Affiliations

  • Chao Shen (corresponding author) — 1, 2
  • Weiqin Tong — 2
  • Kim-Kwang Raymond Choo — 3, 4
  • Samina Kausar — 2

  1. School of Information and Electrical Engineering, Hebei University of Engineering, Handan, China
  2. School of Computer Engineering and Science, Shanghai University, Shanghai, China
  3. School of Computer Science, China University of Geosciences, Wuhan, China
  4. Department of Information Systems and Cyber Security, The University of Texas at San Antonio, San Antonio, USA
