An SSD-based accelerator for directory parsing in storage systems containing massive files

Chen, Zhiguang; Xiao, Nong; Liu, Fang

doi:10.1007/s12083-013-0209-3

Published: 17 May 2013

Volume 6, pages 397–408 (2013)

Peer-to-Peer Networking and Applications

385 Accesses
2 Citations

Abstract

Data explosion introduces new challenges to storage systems. A file system for big data holds a large number of directories and files, usually organized as a large tree, and parsing directories in such a tree is expensive. In this paper, we propose an accelerator that helps file systems fetch file metadata rapidly. This work makes two contributions. First, we propose an accelerator for directory parsing. The accelerator is an SSD-based (Solid State Drive-based) cache that keeps the metadata of frequently or recently accessed files and directories. When a file is requested, the accelerator attempts to obtain its metadata directly from the SSD. If the metadata is cached on the SSD, the file system obtains it quickly; if it is not, the SSD access costs time and returns nothing useful. To avoid such non-beneficial SSD accesses, the accelerator predicts whether the metadata is held on the SSD before issuing a read request, and it issues the request only if the metadata is likely to be there. The second contribution is a new Bloom filter used to make this prediction. A Bloom filter is a space-efficient data structure that supports membership queries, but the standard Bloom filter cannot support element deletion. Because our accelerator is a cache that evicts items periodically, the standard Bloom filter is not suitable for it. We therefore designed a new, low-overhead Bloom filter that supports element deletion and suits the proposed accelerator well. Guided by this filter's predictions, the accelerator speeds up directory parsing with nearly no negative impact. We evaluated the accelerator using a prototype. Experimental results demonstrate that the accelerator speeds up directory parsing by a factor of nearly four compared with a file system without an accelerator.
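The abstract does not describe how the new deletion-capable filter works. The sketch below therefore substitutes the classic counting Bloom filter, a different and well-known technique that also supports deletion, to illustrate how such a filter can gate SSD lookups during directory parsing. Everything in it is an illustrative assumption rather than the authors' design: the names `CountingBloomFilter`, `MetadataAccelerator` and `resolve`, the LRU eviction policy, the SHA-1 double-hashing scheme, and the modelling of the SSD as an in-memory `OrderedDict` and of the underlying file system as a plain `dict`.

```python
import hashlib
from collections import OrderedDict


class CountingBloomFilter:
    """Hypothetical stand-in: a counting Bloom filter, not the paper's filter.
    Each cell is a counter instead of a bit, so keys can be deleted."""

    def __init__(self, num_cells: int = 1024, num_hashes: int = 4) -> None:
        self.counters = [0] * num_cells
        self.num_hashes = num_hashes

    def _cells(self, key: str) -> list[int]:
        # Derive k cell indices from one SHA-1 digest via double hashing.
        digest = hashlib.sha1(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step
        return [(h1 + i * h2) % len(self.counters) for i in range(self.num_hashes)]

    def add(self, key: str) -> None:
        for cell in self._cells(key):
            self.counters[cell] += 1

    def remove(self, key: str) -> None:
        # Caller must only remove keys that were previously added.
        for cell in self._cells(key):
            self.counters[cell] -= 1

    def might_contain(self, key: str) -> bool:
        # False positives are possible; false negatives are not.
        return all(self.counters[cell] > 0 for cell in self._cells(key))


class MetadataAccelerator:
    """Hypothetical metadata cache; the SSD is modelled by an in-memory LRU dict."""

    def __init__(self, capacity: int, backing_store: dict[str, str]) -> None:
        self.capacity = capacity
        self.ssd = OrderedDict()  # stands in for the SSD-resident cache
        self.filter = CountingBloomFilter()
        self.backing_store = backing_store  # stands in for slow metadata access
        self.ssd_reads = 0
        self.wasted_ssd_reads = 0

    def lookup(self, path: str) -> str:
        # Issue an "SSD read" only when the filter predicts a hit.
        if self.filter.might_contain(path):
            self.ssd_reads += 1
            if path in self.ssd:
                self.ssd.move_to_end(path)  # refresh LRU position
                return self.ssd[path]
            self.wasted_ssd_reads += 1  # filter false positive
        meta = self.backing_store[path]
        self._insert(path, meta)
        return meta

    def _insert(self, path: str, meta: str) -> None:
        if len(self.ssd) >= self.capacity:
            victim, _ = self.ssd.popitem(last=False)  # evict least recently used
            self.filter.remove(victim)  # deletion keeps the filter in sync
        self.ssd[path] = meta
        self.filter.add(path)


def resolve(accel: MetadataAccelerator, path: str) -> str:
    """Walk a path one component at a time, fetching each prefix's metadata."""
    meta = ""
    prefix = ""
    for part in [p for p in path.split("/") if p]:
        prefix += "/" + part
        meta = accel.lookup(prefix)
    return meta


store = {"/a": "dir a", "/a/b": "dir b", "/a/b/f.txt": "file f"}
accel = MetadataAccelerator(capacity=3, backing_store=store)
resolve(accel, "/a/b/f.txt")  # cold: every component misses and is cached
resolve(accel, "/a/b/f.txt")  # warm: every component is served from the cache
```

The step that needs deletion is in `_insert`: when the cache evicts a path, it also decrements that path's counters, so the filter stops predicting a hit for it. A plain bit-array Bloom filter cannot do this safely, because clearing a bit may erase evidence of other keys and cause false negatives. The price is memory, since a counting filter stores a multi-bit counter per cell instead of a single bit, so this sketch is better read as a baseline than as the low-overhead filter the paper proposes.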

Similar content being viewed by others

Stochastic gradient descent without full data shuffle: with applications to in-database machine learning and deep learning systems

Article Open access 12 April 2024

Accelerating erasure coding by exploiting multiple repair paths in distributed storage systems

Article 12 April 2024

The big data system, components, tools, and technologies: a survey

Article 18 September 2018

Acknowledgments

We are grateful to our anonymous reviewers for their suggestions. This work is supported by the National High Technology Research and Development 863 Program of China under Grant No. 2013AA013201, the National Natural Science Foundation of China under Grant Nos. 61025009, 61232003, 61120106005, 61170288, 61070198.

Author information

Authors and Affiliations

State Key Laboratory of High Performance Computing, National University of Defense Technology, Changsha, 410073, China
Zhiguang Chen, Nong Xiao & Fang Liu
School of Computer, National University of Defense Technology, Changsha, 410073, China
Zhiguang Chen, Nong Xiao & Fang Liu

Corresponding author

Correspondence to Zhiguang Chen.

About this article

Cite this article

Chen, Z., Xiao, N. & Liu, F. An SSD-based accelerator for directory parsing in storage systems containing massive files. Peer-to-Peer Netw. Appl. 6, 397–408 (2013). https://doi.org/10.1007/s12083-013-0209-3

Received: 28 October 2012
Accepted: 11 April 2013
Published: 17 May 2013
Issue Date: December 2013
DOI: https://doi.org/10.1007/s12083-013-0209-3
