Peer-to-Peer Networking and Applications, Volume 6, Issue 4, pp 397–408

An SSD-based accelerator for directory parsing in storage systems containing massive files

  • Zhiguang Chen
  • Nong Xiao
  • Fang Liu

Abstract

Data explosion introduces new challenges to storage systems. A file system for big data contains a large number of directories and files, usually organized as a large tree, and parsing directories in such a tree is expensive. In this paper, we propose an accelerator that helps file systems fetch file metadata rapidly. This work makes two contributions. First, we propose an accelerator for directory parsing. The accelerator is an SSD-based (Solid State Drive-based) cache that keeps the metadata of frequently or recently accessed files and directories. When a file is requested, the accelerator attempts to obtain its metadata directly from the SSD. If the metadata is in the SSD, the file system obtains it rapidly; if it is not, the accelerator spends a long time accessing the SSD to no avail. To avoid such non-beneficial SSD accesses, the accelerator predicts whether the metadata is kept in the SSD before issuing a read request, and issues the request only if the metadata has a high probability of being there. Second, we propose a new Bloom filter to predict whether a piece of data is kept in the SSD. A Bloom filter is a space-efficient data structure that supports membership queries, but the standard Bloom filter does not support element deletion. Because our accelerator is a cache that evicts items periodically, the standard Bloom filter is not suitable for it. We therefore designed a new, low-overhead Bloom filter that supports element deletion and suits the proposed accelerator. With predictions from this Bloom filter, the accelerator speeds up directory parsing with almost no negative impact. We evaluated the accelerator using a prototype.
Experimental results demonstrate that the accelerator speeds up the directory parsing process by nearly four times compared with a file system without an accelerator.
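The abstract does not describe the internals of the authors' new Bloom filter, so the Python sketch below substitutes the standard counting Bloom filter (small counters instead of single bits) to illustrate why deletion support matters for a cache that evicts entries: every eviction removes the key from the filter, so the filter keeps predicting hits and misses accurately. Everything here is an illustrative assumption rather than the paper's design: the names `CountingBloomFilter`, `MetadataAccelerator` and `slow_lookup`, the LRU eviction policy, SHA-1 double hashing, and the 1024-counter / 4-hash sizing. A dictionary simulates the SSD.

```python
import hashlib
from collections import OrderedDict


class CountingBloomFilter:
    """Standard counting Bloom filter: counters instead of bits, so keys can be deleted.

    Illustrative stand-in only; the paper's own low-overhead filter is different.
    """

    def __init__(self, num_counters=1024, num_hashes=4):
        self.m = num_counters
        self.k = num_hashes
        self.counters = [0] * num_counters

    def _positions(self, key):
        # Derive k counter positions from one SHA-1 digest via double hashing.
        digest = hashlib.sha1(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, key):
        for p in self._positions(key):
            self.counters[p] += 1

    def remove(self, key):
        # Only remove keys that were added; otherwise other keys may become false negatives.
        for p in self._positions(key):
            if self.counters[p] > 0:
                self.counters[p] -= 1

    def might_contain(self, key):
        # False means "definitely absent"; True means "probably present".
        return all(self.counters[p] > 0 for p in self._positions(key))


class MetadataAccelerator:
    """Hypothetical LRU metadata cache guarded by the filter; a dict stands in for the SSD."""

    def __init__(self, capacity, slow_lookup):
        self.capacity = capacity          # assumed to be at least 1
        self.ssd = OrderedDict()          # simulated SSD contents, least recently used first
        self.filter = CountingBloomFilter()
        self.slow_lookup = slow_lookup    # stands in for walking the directory tree
        self.ssd_reads = 0
        self.skipped_ssd_reads = 0

    def get_metadata(self, path):
        if self.filter.might_contain(path):
            self.ssd_reads += 1           # predicted hit: pay for an SSD read
            if path in self.ssd:
                self.ssd.move_to_end(path)
                return self.ssd[path]
            # False positive: this SSD read was wasted.
        else:
            self.skipped_ssd_reads += 1   # predicted miss: skip the SSD entirely
        meta = self.slow_lookup(path)
        self._insert(path, meta)
        return meta

    def _insert(self, path, meta):
        if len(self.ssd) >= self.capacity:
            victim, _ = self.ssd.popitem(last=False)  # evict the least recently used entry
            self.filter.remove(victim)                # keep the filter in sync with the cache
        self.ssd[path] = meta
        self.filter.add(path)


if __name__ == "__main__":
    acc = MetadataAccelerator(capacity=2, slow_lookup=lambda p: {"path": p})
    acc.get_metadata("/data/a")  # filter is empty, so the SSD read is skipped
    acc.get_metadata("/data/a")  # filter predicts a hit and the simulated SSD serves it
    print(acc.ssd_reads, acc.skipped_ssd_reads)  # prints: 1 1
```

A counting filter pays for deletion with several bits per slot instead of one; that space cost is the kind of overhead the abstract says the authors' new filter is designed to keep low.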

Keywords

SSD · Directory parsing · Cache · Accelerator · Storage system · File system

Notes

Acknowledgments

We are grateful to our anonymous reviewers for their suggestions. This work is supported by the National High Technology Research and Development 863 Program of China under Grant No. 2013AA013201, the National Natural Science Foundation of China under Grant Nos. 61025009, 61232003, 61120106005, 61170288, 61070198.

Copyright information

© Springer Science+Business Media New York 2013

Authors and Affiliations

  1. State Key Laboratory of High Performance Computing, National University of Defense Technology, Changsha, China
  2. School of Computer, National University of Defense Technology, Changsha, China
