Supporting full-text information retrieval with a persistent object store

  • Eric W. Brown
  • James P. Callan
  • W. Bruce Croft
  • J. Eliot B. Moss
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 779)

Abstract

The inverted file index common to many full-text information retrieval systems presents unusual and challenging data management requirements. These requirements are usually met with custom data management software. Rather than build this custom software, we would prefer to use an existing database management system. Attempts to do this with traditional (e.g., relational) database management systems have produced discouraging results. Instead, we have used a persistent object store, Mneme, to support the inverted file of a full-text information retrieval system, INQUERY. The result is an improvement in performance along with opportunities for INQUERY to take advantage of the standard data management services provided by Mneme. We describe our implementation, present performance results on a variety of document collections, and discuss the advantages of using a persistent object store to support information retrieval.

Copyright information

© Springer-Verlag 1994

Authors and Affiliations

  • Eric W. Brown (1)
  • James P. Callan (1)
  • W. Bruce Croft (1)
  • J. Eliot B. Moss (1)
  1. Department of Computer Science, University of Massachusetts, Amherst, USA
