A New Data Replication Scheme for PVFS2

  • Nianyuan Bao
  • Jie Tang (email author)
  • Xiaoyu Zhang
  • Gangshan Wu
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9530)

Abstract

PVFS is one of the most popular parallel distributed file systems and is still widely used today. Its current major version is PVFS2. PVFS2 delivers leading I/O performance, but its reliability and stability are weaker, in part because it lacks data replication. This paper presents a new data replication scheme for PVFS2. In our approach, the backup operation is performed on the servers, so the user experience is not affected while copies of files are being created. In addition, we optimize the read operation of PVFS2. With copies available, we can choose which servers to read from, so the read operation stays parallel under complex conditions, such as when a server is down or when the load on some servers is clearly higher than on others. Experimental results verify the effectiveness and efficiency of our method.
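
The idea of choosing which servers to read from can be illustrated with a small sketch. It is not the paper's implementation: the `Server` class, the `load` metric, the `choose_read_servers` function, and the tie-breaking rule are all hypothetical names and assumptions introduced here. The sketch assumes each stripe of a file is stored on several servers and picks, for each stripe, a live server with the lowest combined load.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    alive: bool = True
    load: float = 0.0  # hypothetical load metric, e.g. pending requests


def choose_read_servers(stripe_replicas, servers):
    """Pick one server per stripe, preferring live and lightly loaded ones.

    stripe_replicas: list where entry i names the servers holding a copy
        of stripe i (an assumed layout, not PVFS2's actual metadata).
    servers: dict mapping server name to Server.
    """
    # Count stripes already assigned so that reads spread across servers.
    assigned = {name: 0 for name in servers}
    plan = []
    for stripe, holders in enumerate(stripe_replicas):
        # Skip servers that are down.
        candidates = [servers[h] for h in holders if servers[h].alive]
        if not candidates:
            raise RuntimeError(f"no live copy of stripe {stripe}")
        # Assumption: existing load and newly assigned stripes are comparable
        # units; ties are broken by server name for determinism.
        best = min(candidates, key=lambda s: (s.load + assigned[s.name], s.name))
        assigned[best.name] += 1
        plan.append(best.name)
    return plan
```

With three servers where one is down, every stripe is still served from a surviving copy; with one server heavily loaded, its stripes shift to the servers holding the other copies.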

Keywords

PVFS2, Distributed file system, Parallel file system, Data replication, Read optimization

Notes

Acknowledgments

We would like to thank the anonymous reviewers for helping us refine this paper. Their constructive comments and suggestions were very helpful. This work was partly funded by the National Science and Technology Major Project of the Ministry of Science and Technology of China under grant 2011ZX05035-004-004HZ. The corresponding author of this paper is Jie Tang.

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Nianyuan Bao (1)
  • Jie Tang (1), email author
  • Xiaoyu Zhang (1)
  • Gangshan Wu (1)
  1. State Key Laboratory for Novel Software Technology, Department of Computer Science and Technology, Nanjing University, Nanjing, China