Abstract
Large cloud-based applications place increasingly high demands on data center storage. In a large-scale cloud environment, data replication is an appropriate way to manage data files, because it improves data reliability and availability. In this paper, we propose a data replication algorithm called the hybrid replication strategy (HRS), which covers the replica placement, selection, and replacement steps. HRS has three main phases and is suitable for replicating data files in the cloud. In the first phase, it selects the best site for storing a new replica (i.e., the most central site with a high number of accesses) in order to reduce access time. In the second phase, HRS chooses the best replica node for users based on several parameters: CPU processing capability, network transmission capability, disk I/O capability, load, and network latency. In the third phase, it makes the replacement decision in order to provide better response time. HRS determines the value of each replica using a fuzzy inference system with three input parameters (i.e., number of accesses, cost, and the last time the replica was accessed). The new replication policy is simulated using the CloudSim toolkit. The proposed mechanism distributes replicas over the cloud nodes reasonably well and is easy to implement in a real environment. Experimental results show that HRS can significantly improve availability, performance, and load balance for data-intensive applications, and that it does so without incurring additional overhead.
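As an illustration only, the third phase (fuzzy-based replacement) could be sketched as follows. The abstract names the three inputs (number of accesses, cost, and last access time) but does not give the membership functions or the rule base, so the linear `low`/`high` memberships, the five rules, the normalisation of inputs to [0, 1], and all identifiers (`Replica`, `replica_value`, `pick_victim`) are assumptions made for this sketch and are not taken from HRS.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    accesses: int      # number of accesses to this replica
    cost: float        # cost of re-fetching it, normalised to [0, 1] (assumed scale)
    idle_time: float   # time since last access, normalised to [0, 1] (assumed scale)


def low(x: float) -> float:
    """Assumed membership in 'low': 1 at x=0, falling linearly to 0 at x=1."""
    return max(0.0, min(1.0, 1.0 - x))


def high(x: float) -> float:
    """Assumed membership in 'high': 0 at x=0, rising linearly to 1 at x=1."""
    return max(0.0, min(1.0, x))


def replica_value(r: Replica, max_accesses: int) -> float:
    """Return a value in [0, 1]; higher means the replica is more worth keeping."""
    a = r.accesses / max_accesses if max_accesses else 0.0
    # Hypothetical rule base: (firing strength, crisp output for that rule).
    rules = [
        (min(high(a), high(r.cost), low(r.idle_time)), 1.0),  # popular, costly, recent
        (min(high(a), low(r.idle_time)), 0.8),                # popular and recent
        (min(high(r.cost), low(a)), 0.5),                     # costly but unpopular
        (min(low(a), high(r.idle_time)), 0.2),                # unpopular and stale
        (min(low(a), low(r.cost), high(r.idle_time)), 0.0),   # unpopular, cheap, stale
    ]
    total = sum(strength for strength, _ in rules)
    if total == 0:
        return 0.5  # no rule fires: fall back to a neutral value
    # Weighted-average defuzzification over the fired rules.
    return sum(strength * out for strength, out in rules) / total


def pick_victim(replicas: list[Replica]) -> Replica:
    """Choose the replica with the lowest value as the one to delete."""
    max_acc = max(r.accesses for r in replicas)
    return min(replicas, key=lambda r: replica_value(r, max_acc))
```

With a frequently used, expensive, recently accessed replica and a rarely used, cheap, stale one, `pick_victim` selects the latter for deletion. A weighted-average (zero-order Sugeno style) defuzzifier is used here only because it keeps the sketch short; the paper may use a different inference method.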
Cite this article
Mansouri, N., Javidi, M.M. A hybrid data replication strategy with fuzzy-based deletion for heterogeneous cloud data centers. J Supercomput 74, 5349–5372 (2018). https://doi.org/10.1007/s11227-018-2427-1
DOI: https://doi.org/10.1007/s11227-018-2427-1