Abstract
Storing original pages as a very large number of small files degrades the performance of search engines. In this paper, we first analyze the disadvantages of the existing EXT3 file system in accessing small files. We then measure the compression ratio and speed of candidate compression algorithms in order to choose a suitable one for storage. We also design an original-page-oriented file organization structure and a read–write query tree for storing large-scale small files that never need to be modified. When search engines use these techniques to store original-page small files, both access response time and wasted disk space decrease markedly.
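The selection step described in the abstract, comparing compression algorithms by ratio and speed, can be illustrated with a minimal sketch. It is not the authors' benchmark: the three codecs are simply those in the Python standard library, the sample page is synthetic, and the names `measure`, `benchmark`, `SAMPLE_PAGE`, and `CODECS` are hypothetical.

```python
import bz2
import lzma
import time
import zlib

# Synthetic stand-in for one small crawled page (assumption: real pages
# would be read from the crawler's output instead).
SAMPLE_PAGE = (
    b"<html><head><title>example</title></head><body>"
    + b"<p>small original page content</p>" * 200
    + b"</body></html>"
)

# Candidate codecs, chosen only because they ship with Python.
CODECS = [
    ("zlib", zlib.compress, zlib.decompress),
    ("bz2", bz2.compress, bz2.decompress),
    ("lzma", lzma.compress, lzma.decompress),
]


def measure(name, compress, decompress, payload, repeats=20):
    """Return the compression ratio and average timings for one codec."""
    start = time.perf_counter()
    for _ in range(repeats):
        packed = compress(payload)
    compress_s = (time.perf_counter() - start) / repeats

    start = time.perf_counter()
    for _ in range(repeats):
        restored = decompress(packed)
    decompress_s = (time.perf_counter() - start) / repeats

    # A codec is only usable for storage if the round trip is lossless.
    if restored != payload:
        raise ValueError(f"{name} round trip failed")

    return {
        "codec": name,
        "ratio": len(payload) / len(packed),  # original size / compressed size
        "compress_ms": compress_s * 1000,
        "decompress_ms": decompress_s * 1000,
    }


def benchmark(payload=SAMPLE_PAGE):
    """Measure every candidate codec on the same payload."""
    return [measure(name, c, d, payload) for name, c, d in CODECS]


if __name__ == "__main__":
    for row in benchmark():
        print(
            f"{row['codec']:>5}  ratio={row['ratio']:.1f}  "
            f"compress={row['compress_ms']:.3f} ms  "
            f"decompress={row['decompress_ms']:.3f} ms"
        )
```

Timings depend on the machine and on the payload, so the numbers are only meaningful relative to each other. For a read-heavy page store, decompression time is likely to matter more than compression time.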
Acknowledgments
This work is supported by the National Basic Research Program of China (973 Program) under Grant No. 2011CB302605, the National Science Foundation of China (NSFC) under Grant No. 61173145, and also the Doctoral Program of Higher Education of China under Grant No. 20132302110037.
Cite this article
Zhang, W., Lu, G., He, H. et al. Exploring large-scale small file storage for search engines. J Supercomput 72, 2911–2923 (2016). https://doi.org/10.1007/s11227-015-1394-z
DOI: https://doi.org/10.1007/s11227-015-1394-z