Combining Approaches to Information Retrieval

Croft, W. Bruce

doi:10.1007/0-306-47019-5_1

Part of the book series: The Information Retrieval Series (INRE, volume 7)

Abstract

The combination of different text representations and search strategies has become a standard technique for improving the effectiveness of information retrieval. Combination, for example, has been studied extensively in the TREC evaluations and is the basis of the “meta-search” engines used on the Web. This paper examines the development of this technique, including both experimental results and the retrieval models that have been proposed as formal frameworks for combination. We show that combining approaches for information retrieval can be modeled as combining the outputs of multiple classifiers based on one or more representations, and that this simple model can explain many of the experimental results. We also show that this view of combination is very similar to the inference net model, and that a new approach to retrieval based on language models supports combination and can be integrated with the inference net model.
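The idea of treating each retrieval run as a classifier whose outputs are merged can be made concrete at the score level. The sketch below shows two widely used fusion rules, CombSUM and CombMNZ (usually attributed to Fox and Shaw, 1994). It is an illustration only, not the chapter's own formulation. The run format (a dict from document id to score), the per-run min-max normalization, and all function names are assumptions chosen for the example, and the toy scores are made up.

```python
"""Score-level fusion of several ranked runs (CombSUM / CombMNZ sketch).

Assumptions (hypothetical, for illustration): each run is a dict mapping
doc id -> retrieval score, and scores are min-max normalized per run.
"""
from typing import Dict, List

Run = Dict[str, float]


def min_max_normalize(run: Run) -> Run:
    """Rescale one run's scores to [0, 1] so different systems are comparable."""
    if not run:
        return {}
    lo, hi = min(run.values()), max(run.values())
    if hi == lo:
        # All scores equal: treat every retrieved doc as maximally scored.
        return {doc: 1.0 for doc in run}
    return {doc: (s - lo) / (hi - lo) for doc, s in run.items()}


def comb_sum(runs: List[Run]) -> Run:
    """CombSUM: sum of normalized scores; a run that misses a doc adds 0."""
    fused: Run = {}
    for run in runs:
        for doc, s in min_max_normalize(run).items():
            fused[doc] = fused.get(doc, 0.0) + s
    return fused


def comb_mnz(runs: List[Run]) -> Run:
    """CombMNZ: CombSUM times the number of runs that retrieved the doc."""
    summed = comb_sum(runs)
    hits: Dict[str, int] = {}
    for run in runs:
        for doc in run:
            hits[doc] = hits.get(doc, 0) + 1
    return {doc: s * hits[doc] for doc, s in summed.items()}


def rank(scores: Run) -> List[str]:
    """Sort doc ids by descending fused score, breaking ties by doc id."""
    return sorted(scores, key=lambda d: (-scores[d], d))


if __name__ == "__main__":
    # Two hypothetical runs for one query, on different score scales.
    run_a = {"d1": 12.0, "d2": 9.0, "d3": 3.0}
    run_b = {"d2": -4.1, "d3": -4.5, "d4": -6.0}
    print(rank(comb_sum([run_a, run_b])))  # → ['d2', 'd1', 'd3', 'd4']
    print(rank(comb_mnz([run_a, run_b])))  # → ['d2', 'd3', 'd1', 'd4']
```

Normalizing before summing keeps one system's larger score range from dominating the fused ranking. The multiplier in CombMNZ rewards documents that several runs agree on, which is why `d3` overtakes `d1` in the second ranking.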

Author information

Authors and Affiliations

Department of Computer Science, University of Massachusetts, Amherst
W. Bruce Croft

Editor information

Editors and Affiliations

University of Massachusetts, Amherst
W. Bruce Croft

About this chapter

Cite this chapter

Croft, W.B. (2002). Combining Approaches to Information Retrieval. In: Croft, W.B. (eds) Advances in Information Retrieval. The Information Retrieval Series, vol 7. Springer, Boston, MA. https://doi.org/10.1007/0-306-47019-5_1

DOI: https://doi.org/10.1007/0-306-47019-5_1
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-7923-7812-9
Online ISBN: 978-0-306-47019-6
eBook Packages: Springer Book Archive
