Soft Computing, Volume 21, Issue 16, pp 4501–4512

Autonomic deployment decision making for big data analytics applications in the cloud

  • Qinghua Lu
  • Zheng Li
  • Weishan Zhang
  • Laurence T. Yang


When big data analytics (BDA) applications in the Cloud change at runtime, the affected applications have to be re-deployed to accommodate the changes. Deciding on the most suitable deployment is critical and complicated. Although various research studies have addressed BDA application management, autonomic deployment decision making is still an open research issue. This paper proposes a deployment decision making solution for BDA applications in the Cloud. First, we propose a novel language, named DepPolicy, to specify runtime deployment information as policies. Second, we model the deployment decision making problem as a constraint programming problem using MiniZinc. Third, we propose a decision making algorithm that can make different deployment decisions for different jobs in a way that maximises overall utility while satisfying all given constraints (e.g., a cost limit). Fourth, we design and implement a decision making middleware, named DepWare, for BDA application deployment in the Cloud. The proposed solution is evaluated in terms of feasibility, functional correctness, performance and scalability.
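To make the idea of "maximising overall utility while satisfying all given constraints" concrete, the following is a minimal sketch in Python rather than MiniZinc. It is not the authors' algorithm or model: the job names, deployment options, cost and utility figures, and the function `best_deployment` are all hypothetical, and the exhaustive search merely stands in for what a constraint solver would do.

```python
from itertools import product

# Hypothetical data: each job has candidate deployment options with an
# assumed (cost, utility) pair. None of these values come from the paper.
OPTIONS = {
    "job_a": {"small": (2, 3), "large": (5, 8)},
    "job_b": {"small": (1, 2), "large": (4, 6)},
}


def best_deployment(options, cost_limit):
    """Pick one option per job by exhaustive search, maximising total
    utility while keeping total cost within cost_limit.

    Returns (plan, utility), or (None, 0) if no plan fits the limit.
    """
    jobs = sorted(options)
    best_plan, best_utility = None, 0
    # Enumerate every combination of one option per job.
    for choice in product(*(sorted(options[j]) for j in jobs)):
        cost = sum(options[j][c][0] for j, c in zip(jobs, choice))
        utility = sum(options[j][c][1] for j, c in zip(jobs, choice))
        # Keep the plan only if it satisfies the cost constraint and
        # improves on the best utility seen so far.
        if cost <= cost_limit and (best_plan is None or utility > best_utility):
            best_plan, best_utility = dict(zip(jobs, choice)), utility
    return best_plan, best_utility
```

With an assumed cost limit of 7, calling `best_deployment(OPTIONS, 7)` selects a different option for each job, which illustrates how per-job decisions can differ under a shared constraint. Exhaustive search grows exponentially with the number of jobs, which is one reason a dedicated constraint solver is a natural fit for this kind of problem.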


Big data analytics, Deployment, Decision making, Cloud, QoS, Autonomic computing



Copyright information

© Springer-Verlag Berlin Heidelberg 2015

Authors and Affiliations

  • Qinghua Lu (1)
  • Zheng Li (2)
  • Weishan Zhang (1)
  • Laurence T. Yang (3)

  1. College of Computer and Communication Engineering, China University of Petroleum, Qingdao, China
  2. Department of Electrical and Information Technology, Lund University, Lund, Sweden
  3. Department of Computer Science, St. Francis Xavier University, Antigonish, Canada
