Skip to main content

Hybrid index organizations for text databases

  • Conference paper
  • First Online:
Advances in Database Technology — EDBT '92 (EDBT 1992)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 580))

Included in the following conference series:

Abstract

Due to the skewed nature of the frequency distribution of term occurrence (e.g., Zipf's law) it is unlikely that any single technique for indexing text can do well in all situations. In this paper we propose a hybrid approach to indexing text, and show how it can outperform the traditional inverted B-tree index both in storage overhead, in time to perform a retrieval, and, for dynamic databases, in time for an insertion, both for single term and for multiple term queries. We demonstrate the benefits of our technique on a database of stories from the Associated Press news wire, and we provide formulae and guidelines on how to make optimal choices of the design parameters in real applications.

This research was sponsored in pan by the University of Maryland Institute for Advanced Computer Studies, by the National Science Foundation under grants DCR-86-16833, IRI-8719458 and IRI-8938546 and by the Air Force Office of Scientific Research under Grant AFOSR-89-0303.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Bourne, C.P., “Frequency and Impact of Spelling Errors in Bibliographic Databases,” Information Processing and Management 13(1), pp. 1–12 (1977).

    Google Scholar 

  2. Cardenas, A.F., “Analysis and Performance of Inverted Data Base Structures,” CACM 18(5), pp. 253–263 (May 1975).

    Google Scholar 

  3. Christodoulakis, S., M. Theodoridou, F. Ho, M. Papa, and A. Pathria, “Multimedia Document Presentation. Information Extraction and Document Formation in MINOS: A Model and a System,” ACM TOOIS 4(4) (Oct. 1986).

    Google Scholar 

  4. Comer, D., “The Ubiquitous B-Tree,” Computing Surveys 11(2), pp. 121–137 (June 1979).

    Google Scholar 

  5. Faloutsos, C., “Access Methods for Text,” ACM Computing Surveys 17(1), pp. 49–74 (March 1985).

    Google Scholar 

  6. Faloutsos, C. and R. Chan, “Fast Text Access Methods for Optical and Large Magnetic Disks: Designs and Performance Comparison,” Proc. 14th International Conf. on VLDB, Long Beach, California, pp. 280–293 (Aug. 1988).

    Google Scholar 

  7. Faloutsos, C., “Signature-Based Text Retrieval Methods: A Survey,” IEEE Data Engineering 13(1), pp. 25–32 (March 1990).

    Google Scholar 

  8. Haskin, R.L., “Special-Purpose Processors for Text Retrieval,” Database Engineering 4(1), pp. 16–29 (Sept. 1981).

    Google Scholar 

  9. Hollaar, L.A., K.F. Smith, W.H. Chow, P.A. Emrath, and R.L. Haskin, “Architecture and Operation of a Large, Full-Text Information-Retrieval System,” pp. 256–299 in Advanced Database Machine Architecture, ed. D.K. Hsiao, Prentice-Hall, Englewood Cliffs, New Jersey (1983).

    Google Scholar 

  10. King, D. R., “The Binary Vector as the Basis of an Inverted Index File,” J. Lib. Autom. 7(4), p. 307 (1974).

    Google Scholar 

  11. Lesk, M.E., “Some Applications of Inverted Indexes on the UNIX System,” UNIX Programmer's Manual, Bell Laboratories, Murray Hill, New Jersey (1978).

    Google Scholar 

  12. Lin, Z. and C. Faloutsos, “Frame Sliced Signature Files,” CS-TR-2146 and UMIACS-TR-88-88. Dept. of Computer Science, Univ. of Maryland (Dec. 1988).

    Google Scholar 

  13. Rothnic, J.B. and T. Lozano, “Attribute Based File Organization in a Paged Memory Environment,” CACM 17(2), pp. 63–69 (Feb. 1974).

    Google Scholar 

  14. Salton, G. and M. J. McGill, Introduction to Modern Information Retrieval, McGraw-Hill (1983).

    Google Scholar 

  15. Schuegraf, E.J., “Compression of Large Inverted Files with Hyperbolic Term Distribution,” Information Processing and Management 12, pp. 377–384 (1976).

    Google Scholar 

  16. Zipf, G.K., Human Behavior and Principle of Least Effort: An Introduction to Human Ecology, Addison Wesley, Cambridge, Massachusetts (1949).

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Alain Pirotte Claude Delobel Goerg Gottlob

Rights and permissions

Reprints and permissions

Copyright information

© 1992 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Faloutsos, C., Jagadish, H.V. (1992). Hybrid index organizations for text databases. In: Pirotte, A., Delobel, C., Gottlob, G. (eds) Advances in Database Technology — EDBT '92. EDBT 1992. Lecture Notes in Computer Science, vol 580. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0032439

Download citation

  • DOI: https://doi.org/10.1007/BFb0032439

  • Published:

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-55270-3

  • Online ISBN: 978-3-540-47003-8

  • eBook Packages: Springer Book Archive

Publish with us

Policies and ethics