Autonomic deployment decision making for big data analytics applications in the cloud

  • Focus
Soft Computing

Abstract

When changes happen to big data analytics (BDA) applications in the Cloud at runtime, the affected BDA applications have to be re-deployed to accommodate the changes. Deciding the most suitable deployment is critical and complicated. Although there have been various research studies working on BDA application management, autonomic deployment decision making is still an open research issue. This paper proposes a deployment decision making solution for BDA applications in the Cloud: first, we propose a novel language, named DepPolicy, to specify runtime deployment information as policies; second, we model the deployment decision making problem as a constraint programming problem using MiniZinc; third, we propose a decision making algorithm that can make different deployment decisions for different jobs in a way that maximises overall utility while satisfying all given constraints (e.g., cost limit); fourth, we design and implement a decision making middleware, named DepWare, for BDA application deployment in the Cloud. The proposed solution is evaluated in terms of feasibility, functional correctness, performance and scalability.
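The decision problem described above (choose one deployment per job so that overall utility is maximised while a cost limit holds) can be illustrated with a minimal Python sketch. This is not the paper's DepPolicy language, MiniZinc model, or DepWare algorithm: the job names, deployment options, costs, and utilities below are hypothetical, and exhaustive search stands in for a constraint solver purely to show the shape of the problem.

```python
from itertools import product

# Hypothetical data: each job has candidate deployments (name, cost, utility).
# These numbers are illustrative only and are not taken from the paper.
OPTIONS = {
    "job_a": [("small", 2, 3), ("large", 5, 8)],
    "job_b": [("small", 1, 2), ("large", 4, 5)],
}


def best_deployment(options, cost_limit):
    """Pick one deployment per job by exhaustive search.

    Maximises total utility subject to total cost <= cost_limit.
    Returns (assignment, utility, cost), or None if nothing fits.
    """
    jobs = sorted(options)
    best = None
    for combo in product(*(options[j] for j in jobs)):
        cost = sum(c for _, c, _ in combo)
        utility = sum(u for _, _, u in combo)
        if cost > cost_limit:
            continue  # violates the cost constraint
        if best is None or utility > best[1]:
            assignment = {j: name for j, (name, _, _) in zip(jobs, combo)}
            best = (assignment, utility, cost)
    return best


if __name__ == "__main__":
    print(best_deployment(OPTIONS, cost_limit=6))
```

Exhaustive search grows exponentially with the number of jobs, which is one reason a dedicated constraint solver is a natural fit for larger instances of this kind of problem.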

Acknowledgments

This project is supported by the National Natural Science Foundation of China (Grant No. 61402533).

Author information

Corresponding author

Correspondence to Zheng Li.

Ethics declarations

Funding

This study was funded by the National Natural Science Foundation of China (Grant No. 61402533).

Conflict of interest

Qinghua Lu declares that she has no conflict of interest. Zheng Li declares that he has no conflict of interest. Weishan Zhang declares that he has no conflict of interest. Laurence T. Yang declares that he has no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Communicated by F. Pop, C. Dobre and A. Costan.

About this article

Cite this article

Lu, Q., Li, Z., Zhang, W. et al. Autonomic deployment decision making for big data analytics applications in the cloud. Soft Comput 21, 4501–4512 (2017). https://doi.org/10.1007/s00500-015-1945-5

  • DOI: https://doi.org/10.1007/s00500-015-1945-5
