Hierarchical data replication strategy to improve performance in cloud computing

  • Research Article
  • Frontiers of Computer Science

Abstract

Cloud computing is attracting growing interest as a new paradigm for data management. Data replication has been widely applied to improve data access in distributed systems such as Grid and Cloud. However, because each site has finite storage capacity, copies that would be useful for future jobs can be wastefully deleted and replaced with less valuable ones. It is therefore important to have an appropriate replication strategy that can dynamically store replicas while satisfying quality of service (QoS) requirements and storage capacity constraints. In this paper, we present a dynamic replication algorithm named hierarchical data replication strategy (HDRS). HDRS consists of three parts: replica creation, which adaptively increases the number of replicas based on an exponential growth or decay rate; replica placement, which is driven by the access load and a labeling technique; and replica replacement, which is based on the expected future value of a file. We evaluate different dynamic data replication methods using the CloudSim simulator. Experiments demonstrate that HDRS reduces response time and bandwidth usage compared with other algorithms. This indicates that HDRS can identify a popular file and replicate it to the best site. The method avoids useless replications and decreases access latency by balancing the load across sites.
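The abstract does not give the formula behind the replica creation step, so the following is only a minimal sketch of what a decision driven by an exponential growth or decay rate might look like. It assumes that access counts are recorded per fixed time interval, that the rate is the average of the per-interval ratios, and that a file is replicated when its forecast exceeds a threshold. The function names (`growth_rates`, `forecast_next`, `should_replicate`) and the threshold parameter are hypothetical and are not taken from the paper.

```python
from typing import List


def growth_rates(counts: List[int]) -> List[float]:
    """Per-interval growth (positive) or decay (negative) rates.

    Assumed definition: r_i = n_{i+1} / n_i - 1. Intervals whose starting
    count is zero are skipped to avoid division by zero.
    """
    return [(b / a) - 1.0 for a, b in zip(counts, counts[1:]) if a > 0]


def forecast_next(counts: List[int]) -> float:
    """Forecast the next interval's accesses as n_last * (1 + average rate)."""
    if not counts:
        return 0.0
    rates = growth_rates(counts)
    if not rates:
        # Not enough history to estimate a rate: assume no change.
        return float(counts[-1])
    avg_rate = sum(rates) / len(rates)
    return counts[-1] * (1.0 + avg_rate)


def should_replicate(counts: List[int], threshold: float) -> bool:
    """Hypothetical rule: replicate when the forecast exceeds the threshold."""
    return forecast_next(counts) > threshold
```

Under these assumptions, a file whose accesses double every interval (10, 20, 40) is forecast to be requested more often in the next interval and becomes a replication candidate, while a file whose accesses halve (40, 20, 10) is forecast to decline and is left alone.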


Author information

Corresponding author

Correspondence to Najme Mansouri.

Additional information

Najme Mansouri is currently a faculty member in Computer Science at Shahid Bahonar University of Kerman, Iran. She received her PhD in Computer Science from Shahid Bahonar University of Kerman, Iran, and her MSc in software engineering from the Department of Computer Science & Engineering, College of Electrical & Computer Engineering, Shiraz University, Iran. She received her BSc (Honor Student) in Computer Science from Shahid Bahonar University of Kerman, Iran, in 2009. She has published more than 40 scientific papers in the fields of high-performance and parallel processing, grid computing, and cloud computing, and has contributed to more than 20 research and development programs.

Mohammad Masoud Javidi is currently an associate professor of Computer Science in the Department of Computer Science, Shahid Bahonar University of Kerman, Iran. His research interests include distributed systems and cloud computing. He has published more than 50 articles in journals and conferences.

Behnam Mohammad Hasani Zade received his MSc degree in computer science from Shahid Bahonar University of Kerman, Kerman, Iran, in 2018. His research interests are in the areas of evolutionary algorithms, swarm intelligence, multi-objective optimization, machine learning, and cloud computing.

About this article

Cite this article

Mansouri, N., Javidi, M.M. & Zade, B.M.H. Hierarchical data replication strategy to improve performance in cloud computing. Front. Comput. Sci. 15, 152501 (2021). https://doi.org/10.1007/s11704-019-9099-8

  • DOI: https://doi.org/10.1007/s11704-019-9099-8
