Combining Approaches to Information Retrieval

  • Chapter
Advances in Information Retrieval

Part of the book series: The Information Retrieval Series (INRE, volume 7)

Abstract

The combination of different text representations and search strategies has become a standard technique for improving the effectiveness of information retrieval. Combination, for example, has been studied extensively in the TREC evaluations and is the basis of the “meta-search” engines used on the Web. This paper examines the development of this technique, including both experimental results and the retrieval models that have been proposed as formal frameworks for combination. We show that combining approaches for information retrieval can be modeled as combining the outputs of multiple classifiers based on one or more representations, and that this simple model can provide explanations for many of the experimental results. We also show that this view of combination is very similar to the inference net model, and that a new approach to retrieval based on language models supports combination and can be integrated with the inference net model.
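
As a rough illustration of what "combining the outputs of multiple classifiers" can look like in practice, the sketch below fuses the scores of two retrieval systems with a weighted sum of min-max normalized scores (a CombSUM-style rule). It is not taken from the chapter: the function names, the normalization choice, and the toy scores are all assumptions made for the example.

```python
from collections import defaultdict


def min_max_normalize(scores):
    """Rescale one system's scores to [0, 1] so different systems are comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores equal: treat every document as equally strong evidence.
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def combine(runs, weights=None):
    """Weighted sum of normalized scores.

    A document missing from a run simply receives no contribution from it.
    Returns (doc, fused_score) pairs sorted by descending fused score.
    """
    weights = weights or [1.0] * len(runs)
    fused = defaultdict(float)
    for run, w in zip(runs, weights):
        for doc, s in min_max_normalize(run).items():
            fused[doc] += w * s
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)


# Toy scores from two hypothetical retrieval systems (illustrative only).
run_a = {"d1": 12.0, "d2": 9.0, "d3": 3.0}
run_b = {"d2": 0.9, "d3": 0.8, "d4": 0.1}
ranking = combine([run_a, run_b])
```

In this toy case a document that both systems score highly ends up ahead of one that only a single system retrieves, which is the intuition behind treating each system as one classifier contributing evidence. Unequal `weights` would let a more reliable system count for more.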

Copyright information

© 2002 Kluwer Academic Publishers

About this chapter

Cite this chapter

Croft, W.B. (2002). Combining Approaches to Information Retrieval. In: Croft, W.B. (eds) Advances in Information Retrieval. The Information Retrieval Series, vol 7. Springer, Boston, MA. https://doi.org/10.1007/0-306-47019-5_1

  • DOI: https://doi.org/10.1007/0-306-47019-5_1

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-0-7923-7812-9

  • Online ISBN: 978-0-306-47019-6

  • eBook Packages: Springer Book Archive
