Abstract
Meeting tenants' performance expectations without sacrificing economic benefit is a difficult challenge for cloud providers. We propose a data replication strategy that satisfies both tenant performance requirements and provider profit. The response time of database queries is estimated with parallel execution taken into account. If the estimated response time is not acceptable, bottlenecks are identified in the query plan, and data is replicated to resolve them. Data placement is performed heuristically so that query response times are satisfied at minimal cost for the provider. We demonstrate the validity of our strategy in a performance evaluation study.
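The steps summarized in the abstract (estimate response time, locate a bottleneck, replicate at minimal cost) can be illustrated with a minimal sketch. It is not the authors' cost model or algorithm: the `Operator` fields, the stage-based parallelism model, the candidate tuples, and all function names are hypothetical assumptions made only for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Operator:
    """One operator of a query plan (hypothetical, simplified model)."""
    name: str
    stage: int         # assumption: operators sharing a stage run in parallel
    compute_s: float   # local processing time, in seconds
    transfer_s: float  # time spent fetching remote data (0 if data is local)

    @property
    def time_s(self) -> float:
        return self.compute_s + self.transfer_s


def estimate_response_time(plan):
    """Sum, over stages, of the slowest operator in each stage."""
    slowest = {}
    for op in plan:
        slowest[op.stage] = max(slowest.get(op.stage, 0.0), op.time_s)
    return sum(slowest.values())


def find_bottleneck(plan):
    """Among the slowest operators of each stage, pick the one that
    spends the most time on remote data access."""
    critical = {}
    for op in plan:
        if op.stage not in critical or op.time_s > critical[op.stage].time_s:
            critical[op.stage] = op
    return max(critical.values(), key=lambda op: op.transfer_s)


def place_replica(plan, bottleneck, candidates, threshold_s):
    """Return the cheapest candidate node that brings the estimated
    response time under the threshold, or None if no candidate does.

    candidates: iterable of (node, replication_cost, new_transfer_s).
    """
    feasible = []
    for node, cost, new_transfer_s in candidates:
        trial = [
            replace(op, transfer_s=new_transfer_s) if op is bottleneck else op
            for op in plan
        ]
        if estimate_response_time(trial) <= threshold_s:
            feasible.append((cost, node))
    return min(feasible)[1] if feasible else None


# Illustrative data only; none of these numbers come from the paper.
plan = [
    Operator("scan_A", stage=0, compute_s=2.0, transfer_s=0.0),
    Operator("scan_B", stage=0, compute_s=1.0, transfer_s=4.0),
    Operator("join", stage=1, compute_s=3.0, transfer_s=0.0),
]
candidates = [("node1", 5.0, 0.0), ("node2", 2.0, 0.5), ("node3", 1.0, 3.0)]

if estimate_response_time(plan) > 6.0:
    target = place_replica(plan, find_bottleneck(plan), candidates, 6.0)
```

In this toy setting the cheapest candidate is rejected because it does not bring the estimate under the threshold, which is the trade-off between response time and provider cost that the abstract describes.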
Acknowledgements
The work presented in this paper is supported in part by the Scientific and Technological Research Council of Turkey (TÜBİTAK). This work is also supported in part by the French Ministries of Europe and Foreign Affairs (MEFA) and of Higher Education, Research and Innovation (MHERI) and Amadeus Program 2020 (French-Austrian Hubert Curien Partnership—PHC) Grant Number 44086TD.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Human and animal participants
This article does not contain any studies with human participants or animals performed by any of the authors.
Cite this article
Tos, U., Mokadem, R., Hameurlain, A. et al. Achieving query performance in the cloud via a cost-effective data replication strategy. Soft Comput 25, 5437–5454 (2021). https://doi.org/10.1007/s00500-020-05544-w
DOI: https://doi.org/10.1007/s00500-020-05544-w