
THE HYBRID DIGITAL TREE

A New Indexing Technique for Large String Databases

  • Conference paper
Enterprise Information Systems VII

Abstract

There is an increasing demand for efficient indexing techniques to support queries on large string databases. This paper proposes a hybrid RAM/disk-based index structure called the Hybrid Digital tree (HD-tree). The HD-tree keeps internal nodes in RAM to minimize the number of disk I/Os, while keeping leaf nodes on disk so that the tree can index large databases. Experimental results on real data show that the HD-tree outperforms the Prefix B-tree for prefix and substring searches. In particular, for distinctive random queries in the experiments, the average number of disk I/Os was reduced by a factor of two to three, and the running time was reduced by an order of magnitude.
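The RAM/disk split described in the abstract can be illustrated with a small toy. The following is only a sketch under stated assumptions, not the authors' HD-tree: it uses a plain character trie whose internal nodes are Python dicts held in memory and whose leaves are small JSON files on disk. The class name `HybridTrie`, the `capacity` parameter, the `END` terminator, and the `disk_reads` counter (a stand-in for disk I/Os) are all hypothetical names introduced for illustration. Only prefix search is shown; substring search is not covered.

```python
import json
import os
import tempfile

END = "\0"  # hypothetical terminator so a key that ends at an internal node still gets a leaf


class HybridTrie:
    """Toy hybrid index: internal nodes are dicts in RAM, leaves are JSON files on disk."""

    def __init__(self, directory, capacity=4):
        self.directory = directory
        self.capacity = capacity   # max keys per leaf before the leaf is split
        self.root = {}             # internal node: char -> child dict or integer leaf id
        self.next_leaf = 0
        self.disk_reads = 0        # counts leaf reads, standing in for disk I/Os

    def _path(self, leaf_id):
        return os.path.join(self.directory, f"leaf_{leaf_id}.json")

    def _read(self, leaf_id):
        self.disk_reads += 1
        with open(self._path(leaf_id)) as f:
            return json.load(f)

    def _write(self, leaf_id, keys):
        with open(self._path(leaf_id), "w") as f:
            json.dump(sorted(keys), f)

    def insert(self, key):
        self._insert_from(self.root, 0, key)

    def _insert_from(self, node, depth, key):
        padded = key + END
        while True:
            ch = padded[depth]
            child = node.get(ch)
            if child is None:
                # No child yet: create a new on-disk leaf holding just this key.
                node[ch] = self.next_leaf
                self._write(self.next_leaf, [key])
                self.next_leaf += 1
                return
            if isinstance(child, dict):
                node, depth = child, depth + 1   # descend through RAM only
                continue
            keys = self._read(child)
            if key in keys:
                return
            keys.append(key)
            if len(keys) <= self.capacity:
                self._write(child, keys)
                return
            # Overflow: replace the leaf with a new in-memory internal node
            # and redistribute its keys one level deeper.
            os.remove(self._path(child))
            node[ch] = {}
            for k in keys:
                self._insert_from(node[ch], depth + 1, k)
            return

    def prefix_search(self, prefix):
        node = self.root
        for ch in prefix:
            child = node.get(ch)
            if child is None:
                return []
            if not isinstance(child, dict):
                # Reached a leaf before the prefix ran out: one read, then filter.
                return [k for k in self._read(child) if k.startswith(prefix)]
            node = child
        # Prefix consumed at an internal node: every leaf below it matches.
        out, stack = [], [node]
        while stack:
            for child in stack.pop().values():
                if isinstance(child, dict):
                    stack.append(child)
                else:
                    out.extend(self._read(child))
        return sorted(out)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        index = HybridTrie(d, capacity=2)
        for word in ["car", "card", "care", "cat", "dog", "do"]:
            index.insert(word)
        index.disk_reads = 0  # ignore reads done while building
        print(index.prefix_search("car"), index.disk_reads)
```

In this toy, navigation to the matching subtree touches only in-memory dicts, so the counted reads are limited to the leaves that actually hold candidate keys. That is the design idea the abstract attributes to the HD-tree, though the real structure and its node layout may differ substantially.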




Copyright information

© 2007 Springer

About this paper

Cite this paper

Xue, Q., Pramanik, S., Qian, G., Zhu, Q. (2007). THE HYBRID DIGITAL TREE. In: Chen, CS., Filipe, J., Seruca, I., Cordeiro, J. (eds) Enterprise Information Systems VII. Springer, Dordrecht. https://doi.org/10.1007/978-1-4020-5347-4_5


  • DOI: https://doi.org/10.1007/978-1-4020-5347-4_5

  • Publisher Name: Springer, Dordrecht

  • Print ISBN: 978-1-4020-5323-8

  • Online ISBN: 978-1-4020-5347-4

  • eBook Packages: Computer Science, Computer Science (R0)
