Abstract
When big data analytics (BDA) applications in the Cloud change at runtime, the affected applications have to be redeployed to accommodate the changes. Deciding on the most suitable deployment is both critical and complicated. Although various studies have addressed BDA application management, autonomic deployment decision making remains an open research issue. This paper proposes a deployment decision making solution for BDA applications in the Cloud. First, we propose a novel language, named DepPolicy, to specify runtime deployment information as policies. Second, we model the deployment decision making problem as a constraint programming problem using MiniZinc. Third, we propose a decision making algorithm that can make different deployment decisions for different jobs in a way that maximises overall utility while satisfying all given constraints (e.g., a cost limit). Fourth, we design and implement a decision making middleware, named DepWare, for BDA application deployment in the Cloud. The proposed solution is evaluated in terms of feasibility, functional correctness, performance and scalability.
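To make the decision problem concrete, the following is a minimal sketch of choosing one deployment per job so that total utility is maximised under a cost limit. It is an illustration only and not the paper's method: the paper models the problem in MiniZinc, whereas this sketch uses plain Python with exhaustive search. The job names, deployment options, cost and utility values, and the function name `decide` are all hypothetical.

```python
from itertools import product

# Hypothetical data: each job lists candidate deployments as
# (option name, cost, utility). None of these values come from the paper.
OPTIONS = {
    "job_a": [("small", 2, 3), ("large", 5, 8)],
    "job_b": [("small", 1, 2), ("large", 4, 6)],
}


def decide(options, cost_limit):
    """Pick one deployment per job, maximising total utility within cost_limit.

    Returns (total_utility, {job: option_name}), or None when no combination
    of deployments satisfies the cost constraint.
    """
    jobs = sorted(options)
    best = None
    # Enumerate every combination of one option per job (exhaustive search).
    for combo in product(*(options[job] for job in jobs)):
        total_cost = sum(cost for _, cost, _ in combo)
        if total_cost > cost_limit:
            continue  # violates the cost constraint
        total_utility = sum(utility for _, _, utility in combo)
        if best is None or total_utility > best[0]:
            plan = {job: name for job, (name, _, _) in zip(jobs, combo)}
            best = (total_utility, plan)
    return best


if __name__ == "__main__":
    print(decide(OPTIONS, cost_limit=7))
```

Exhaustive search grows exponentially with the number of jobs, which is one reason a constraint solver is a more natural fit for problems of realistic size.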
Acknowledgments
This project is supported by the National Natural Science Foundation of China (Grant No. 61402533).
Ethics declarations
Funding
This study was funded by the National Natural Science Foundation of China (Grant No. 61402533).
Conflict of interest
Qinghua Lu declares that she has no conflict of interest. Zheng Li declares that he has no conflict of interest. Weishan Zhang declares that he has no conflict of interest. Laurence T. Yang declares that he has no conflict of interest.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Additional information
Communicated by F. Pop, C. Dobre and A. Costan.
About this article
Cite this article
Lu, Q., Li, Z., Zhang, W. et al. Autonomic deployment decision making for big data analytics applications in the cloud. Soft Comput 21, 4501–4512 (2017). https://doi.org/10.1007/s00500-015-1945-5
DOI: https://doi.org/10.1007/s00500-015-1945-5