Abstract
This chapter introduces a feature-based retrieval model built on Markov random fields (the MRF model), which serves as the primary retrieval model throughout the remainder of the book. Although there are many different ways to formulate a general feature-based model for information retrieval, this work focuses on the MRF model because it satisfies the following desiderata: 1) it supports basic information retrieval tasks (e.g., ranking and query expansion), 2) it easily and intuitively models query term dependencies, 3) it handles arbitrary textual and non-textual features, and 4) it consistently and significantly improves effectiveness over bag-of-words models across a wide range of tasks and data sets. The chapter covers the basic theoretical and practical foundations of the model; subsequent chapters provide more detail and describe various extensions of the basic model.
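As a rough illustration of what "feature-based" and "query term dependencies" mean in practice, the sketch below scores a document as a weighted sum of clique features over single query terms, adjacent query-term pairs matched as exact phrases, and the same pairs matched in either order within a window. It is modeled loosely on the sequential dependence variant of the MRF model. Everything concrete here is an assumption for illustration, not the chapter's implementation: the Dirichlet-smoothed log-probability feature functions, the weights 0.85/0.10/0.05, μ = 2500, the 8-token window, and all function names.

```python
import math
from typing import List, Sequence

# Assumed parameters (not taken from this chapter): clique-type weights,
# Dirichlet smoothing parameter, and unordered window width.
LAMBDA_T, LAMBDA_O, LAMBDA_U = 0.85, 0.10, 0.05
MU = 2500.0
WINDOW = 8


def count_ordered(doc: Sequence[str], a: str, b: str) -> int:
    """Count occurrences of `a` immediately followed by `b`."""
    return sum(1 for i in range(len(doc) - 1) if doc[i] == a and doc[i + 1] == b)


def count_unordered(doc: Sequence[str], a: str, b: str, width: int = WINDOW) -> int:
    """Count position pairs of `a` and `b`, in either order, spanning at most `width` tokens."""
    pos_a = [i for i, t in enumerate(doc) if t == a]
    pos_b = [j for j, t in enumerate(doc) if t == b]
    return sum(1 for i in pos_a for j in pos_b if i != j and abs(i - j) < width)


def dirichlet_feature(tf: int, doc_len: int, cf: int, coll_len: int, mu: float = MU) -> float:
    """Log of a Dirichlet-smoothed probability; cf is floored at 1 to avoid log(0)."""
    background = max(cf, 1) / coll_len
    return math.log((tf + mu * background) / (doc_len + mu))


def mrf_score(query: List[str], doc: List[str], collection: List[List[str]]) -> float:
    """Weighted sum of term, ordered-pair, and unordered-pair clique features."""
    coll_len = sum(len(d) for d in collection)
    score = 0.0
    for q in query:  # single-term cliques
        tf = doc.count(q)
        cf = sum(d.count(q) for d in collection)
        score += LAMBDA_T * dirichlet_feature(tf, len(doc), cf, coll_len)
    for a, b in zip(query, query[1:]):  # adjacent query-term pairs
        tf_o = count_ordered(doc, a, b)
        cf_o = sum(count_ordered(d, a, b) for d in collection)
        score += LAMBDA_O * dirichlet_feature(tf_o, len(doc), cf_o, coll_len)
        tf_u = count_unordered(doc, a, b)
        cf_u = sum(count_unordered(d, a, b) for d in collection)
        score += LAMBDA_U * dirichlet_feature(tf_u, len(doc), cf_u, coll_len)
    return score


# Toy usage: three tiny "documents" that also serve as the collection.
docs = [
    "markov random fields model term dependencies".split(),
    "random walks on markov chains".split(),
    "fields of random flowers".split(),
]
query = "markov random fields".split()
ranked = sorted(range(len(docs)), key=lambda i: mrf_score(query, docs[i], docs), reverse=True)
print(ranked)  # the document containing the exact phrase should rank first
```

Keeping the score a linear combination of features is what makes this style of model easy to extend: another textual or non-textual feature would enter as one more weighted term. The assumed μ = 2500 is sized for real collections, so on a three-document toy collection the score differences are small even though the ordering is sensible.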
Notes
- 1.
Note that we assume relevance is binary, as is common for information retrieval tasks. If relevance is non-binary, a separate relevance distribution can be estimated for each relevance level.
- 2.
Although some TREC collections do have ternary (i.e., not relevant, relevant, and highly relevant) judgments, the ternary grades have never been used in official evaluations. When ternary judgments exist, both relevant (rating 1) and highly relevant (rating 2) documents are considered relevant, thereby binarizing the judgments.
- 3.
Average precision equals reciprocal rank for queries with only one relevant document, so the two measures differ only for topics that have more than one relevant document.
- 4.
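Notes 2 and 3 can be made concrete with a small sketch. It assumes judgments are stored as a mapping from document identifiers to grades 0, 1, or 2, collapses grades 1 and 2 into "relevant" as Note 2 describes, and computes non-interpolated average precision and reciprocal rank so that the equality claimed in Note 3 can be checked. The function names, the data layout, and the toy data are hypothetical.

```python
from typing import Dict, List, Set


def binarize(qrels: Dict[str, int]) -> Set[str]:
    """Treat relevant (grade 1) and highly relevant (grade 2) documents as relevant."""
    return {doc for doc, grade in qrels.items() if grade >= 1}


def average_precision(ranking: List[str], relevant: Set[str]) -> float:
    """Mean of precision at each relevant rank; unretrieved relevant docs contribute 0."""
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def reciprocal_rank(ranking: List[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant document, or 0 if none is retrieved."""
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


ranking = ["d3", "d7", "d1", "d5"]

# One relevant document (d1, at rank 3): the two measures coincide.
single = binarize({"d1": 2, "d3": 0, "d7": 0})
print(average_precision(ranking, single), reciprocal_rank(ranking, single))

# Two relevant documents (d7 at rank 2, d1 at rank 3): the measures differ.
multi = binarize({"d1": 1, "d7": 2, "d3": 0})
print(average_precision(ranking, multi), reciprocal_rank(ranking, multi))
```

With a single relevant document, average precision has exactly one term, precision at that document's rank, which is 1/rank, and it is divided by one relevant document, so it reduces to reciprocal rank.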
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Metzler, D. (2011). Feature-Based Ranking. In: A Feature-Centric View of Information Retrieval. The Information Retrieval Series, vol 27. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-22898-8_3
DOI: https://doi.org/10.1007/978-3-642-22898-8_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-22897-1
Online ISBN: 978-3-642-22898-8
eBook Packages: Computer Science; Computer Science (R0)