The Journal of Supercomputing, Volume 62, Issue 1, pp 227–250

A novel dynamic network data replication scheme based on historical access record and proactive deletion

  • Zhe Wang
  • Tao Li
  • Naixue Xiong
  • Yi Pan


Data replication is becoming a popular technology in many fields, such as cloud storage, data grids, and P2P systems. By replicating files to other servers or nodes, we can reduce network traffic and file access time and increase data availability in the face of natural and man-made disasters. However, more replicas do not always yield better system performance. Replicas do decrease read access time and provide better fault tolerance, but once write access is considered, maintaining a large number of replicas results in a huge update overhead. Hence, a trade-off between read access time and write update cost is needed. File popularity is an important factor in making decisions about data replication. To smooth out fluctuations in data access, historical file popularity can be used to select the files that are genuinely popular. In this research, a dynamic data replication strategy is proposed based on two ideas. The first employs historical access records, which are useful for choosing which file to replicate. The second is a proactive deletion method, which controls the number of replicas to reach an optimal balance between read access time and write update overhead. A unified cost model is used to measure and compare the performance of our data replication algorithm against other existing algorithms. The results indicate that our new algorithm performs much better than those algorithms.
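The trade-off described in the abstract can be illustrated with a small sketch. This is not the authors' algorithm or cost model, which the abstract does not specify. It is a toy under stated assumptions: popularity is an exponentially decayed sum of per-interval access counts, read cost falls in proportion to the replica count, and update cost grows in proportion to it. All names and parameters (`historical_popularity`, `unified_cost`, `decay`, `read_cost`, `update_cost`) are hypothetical.

```python
def historical_popularity(access_counts, decay=0.5):
    """Decayed sum of per-interval access counts (oldest first, newest last).

    Assumption: the newest interval has weight 1, the one before it
    `decay`, then `decay**2`, and so on, so a brief spike long ago
    counts for less than steady recent demand.
    """
    score = 0.0
    weight = 1.0
    for count in reversed(access_counts):
        score += weight * count
        weight *= decay
    return score


def unified_cost(reads, writes, replicas, read_cost=1.0, update_cost=1.0):
    """Toy cost: reads get cheaper with more replicas, writes get dearer."""
    return reads * read_cost / replicas + writes * update_cost * replicas


def best_replica_count(reads, writes, max_replicas):
    """Replica count in 1..max_replicas that minimises the toy cost."""
    return min(range(1, max_replicas + 1),
               key=lambda k: unified_cost(reads, writes, k))


def replica_delta(current, reads, writes, max_replicas):
    """Positive: create replicas. Negative: delete replicas proactively."""
    return best_replica_count(reads, writes, max_replicas) - current


if __name__ == "__main__":
    # Read-heavy file: the toy model favours many replicas.
    print(best_replica_count(reads=100, writes=1, max_replicas=20))
    # Write-heavy file holding 3 replicas: the model suggests deleting some.
    print(replica_delta(current=3, reads=4, writes=4, max_replicas=5))
```

In this toy model a read-heavy file is pushed toward more replicas, while a file whose writes rival its reads is pushed back toward a single copy, which is the balance that proactive deletion is meant to maintain.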


Keywords: Data replication, Read overhead, Update overhead, Historical access record, Proactive deletion





Copyright information

© Springer Science+Business Media, LLC 2011

Authors and Affiliations

  1. Dept. of Computer Science, Sichuan University, Chengdu, China
  2. Dept. of Computer Science, Georgia State University, Atlanta, USA
