Abstract
The combination of different text representations and search strategies has become a standard technique for improving the effectiveness of information retrieval. Combination, for example, has been studied extensively in the TREC evaluations and is the basis of the “meta-search” engines used on the Web. This paper examines the development of this technique, including both experimental results and the retrieval models that have been proposed as formal frameworks for combination. We show that combining approaches for information retrieval can be modeled as combining the outputs of multiple classifiers based on one or more representations, and that this simple model can provide explanations for many of the experimental results. We also show that this view of combination is very similar to the inference net model, and that a new approach to retrieval based on language models supports combination and can be integrated with the inference net model.
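As a rough illustration of what "combining the outputs of multiple classifiers" can look like in practice, the sketch below sums min-max-normalized scores from two retrieval runs. It is only one simple combination rule chosen for illustration, not necessarily the model developed in the chapter; the function names, the document identifiers, and the scores are all hypothetical.

```python
from collections import defaultdict


def min_max_normalize(scores):
    """Rescale one system's scores to [0, 1] so that systems are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All documents tied: treat them as equally strong evidence.
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def combine(runs, weights=None):
    """Weighted sum of normalized scores.

    A document missing from a run contributes 0 for that run (an assumption
    of this sketch, not a claim about any particular system).
    """
    weights = weights or [1.0] * len(runs)
    combined = defaultdict(float)
    for run, w in zip(runs, weights):
        for doc, s in min_max_normalize(run).items():
            combined[doc] += w * s
    # Highest combined score first; ties broken by document id.
    return sorted(combined.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical scores from two search strategies over the same collection.
term_run = {"d1": 12.0, "d2": 7.0, "d3": 2.0}
phrase_run = {"d2": 4.0, "d3": 2.0, "d4": 0.0}

ranking = combine([term_run, phrase_run])
```

Here `d2` ranks first because both runs support it, even though neither run's raw scores are on the same scale; passing `weights` lets one run count for more than the other.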
Copyright information
© 2002 Kluwer Academic Publishers
About this chapter
Cite this chapter
Croft, W.B. (2002). Combining Approaches to Information Retrieval. In: Croft, W.B. (eds) Advances in Information Retrieval. The Information Retrieval Series, vol 7. Springer, Boston, MA. https://doi.org/10.1007/0-306-47019-5_1
DOI: https://doi.org/10.1007/0-306-47019-5_1
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-7923-7812-9
Online ISBN: 978-0-306-47019-6
eBook Packages: Springer Book Archive