Abstract
There is an increasing demand for efficient indexing techniques to support queries on large string databases. In this paper, a hybrid RAM/disk-based index structure, called the Hybrid Digital tree (HD-tree), is proposed. The HD-tree keeps internal nodes in RAM to minimize the number of disk I/Os, while keeping leaf nodes on disk to maximize the tree's capacity for indexing large databases. Experimental results on real data show that the HD-tree outperformed the Prefix B-tree for prefix and substring searches. In particular, for distinctive random queries in the experiments, the average number of disk I/Os was reduced by a factor of two to three, while the running time was reduced by an order of magnitude.
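The split described in the abstract (internal nodes in RAM, leaf nodes on disk) can be illustrated with a small sketch. This is not the authors' HD-tree algorithm, whose node layout and split policy are not given here. It is a plain digital tree (trie) over the first few characters of each key, with each leaf bucket stored as a file. The names `LeafStore`, `HybridTrie`, and the `depth` parameter are hypothetical, and the count of leaf-file reads is only a rough stand-in for disk I/Os.

```python
import json
import os
import tempfile


class LeafStore:
    """Hypothetical on-disk leaf storage: one JSON file per leaf bucket."""

    def __init__(self, directory):
        self.directory = directory
        self.reads = 0      # leaf reads, used as a proxy for disk I/Os
        self.next_id = 0

    def _path(self, leaf_id):
        return os.path.join(self.directory, f"leaf_{leaf_id}.json")

    def new_leaf(self):
        leaf_id = self.next_id
        self.next_id += 1
        self.write(leaf_id, [])
        return leaf_id

    def write(self, leaf_id, keys):
        with open(self._path(leaf_id), "w") as f:
            json.dump(keys, f)

    def read(self, leaf_id):
        self.reads += 1
        with open(self._path(leaf_id)) as f:
            return json.load(f)


class HybridTrie:
    """Trie kept in RAM over the first `depth` characters; keys live in disk leaves."""

    def __init__(self, store, depth=2):
        self.store = store
        self.depth = depth
        # Nested dicts: character -> child node; a leaf id is stored under key None.
        self.root = {}

    def insert(self, key):
        node = self.root
        for ch in key[: self.depth]:
            node = node.setdefault(ch, {})
        if None not in node:
            node[None] = self.store.new_leaf()
        keys = self.store.read(node[None])
        if key not in keys:
            keys.append(key)
            keys.sort()
            self.store.write(node[None], keys)

    def _leaves_under(self, node):
        # Walk the in-memory subtree and yield every leaf id below it.
        for label, child in node.items():
            if label is None:
                yield child
            else:
                yield from self._leaves_under(child)

    def prefix_search(self, prefix):
        # Descend in RAM only; no disk access until the leaves are reached.
        node = self.root
        for ch in prefix[: self.depth]:
            if ch not in node:
                return []
            node = node[ch]
        result = []
        for leaf_id in self._leaves_under(node):
            result.extend(k for k in self.store.read(leaf_id) if k.startswith(prefix))
        return sorted(result)


with tempfile.TemporaryDirectory() as tmp:
    store = LeafStore(tmp)
    index = HybridTrie(store, depth=2)
    for word in ["tree", "trie", "trap", "table", "disk", "digital"]:
        index.insert(word)
    store.reads = 0                      # count only the reads made by the query
    matches = index.prefix_search("tr")
    leaf_reads = store.reads

print(matches, leaf_reads)  # ['trap', 'tree', 'trie'] 1
```

In this toy version the prefix query touches a single leaf file because the whole descent happens in memory, which is the intuition behind keeping internal nodes in RAM. A real index would also need leaf splitting and a bounded leaf size, which the sketch omits.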
Copyright information
© 2007 Springer
About this paper
Cite this paper
Xue, Q., Pramanik, S., Qian, G., Zhu, Q. (2007). THE HYBRID DIGITAL TREE. In: Chen, CS., Filipe, J., Seruca, I., Cordeiro, J. (eds) Enterprise Information Systems VII. Springer, Dordrecht. https://doi.org/10.1007/978-1-4020-5347-4_5
DOI: https://doi.org/10.1007/978-1-4020-5347-4_5
Publisher Name: Springer, Dordrecht
Print ISBN: 978-1-4020-5323-8
Online ISBN: 978-1-4020-5347-4
eBook Packages: Computer Science, Computer Science (R0)