Part of the book series: The Information Retrieval Series (INRE, volume 27)

Abstract

This chapter introduces a feature-based retrieval model based on Markov random fields (the MRF model), which serves as the primary retrieval model throughout the remainder of the book. Although there are many ways to formulate a general feature-based model for information retrieval, this work focuses on the MRF model because it satisfies the following desiderata: it 1) supports basic information retrieval tasks (e.g., ranking and query expansion), 2) easily and intuitively models query term dependencies, 3) handles arbitrary textual and non-textual features, and 4) consistently and significantly improves effectiveness over bag-of-words models across a wide range of tasks and data sets. The chapter covers the basic theoretical and practical foundations of the model, whereas subsequent chapters provide more detail and describe various extensions of the basic model.

Notes

  1. Note that we make the assumption that relevance is binary, which is commonly used for information retrieval tasks. If relevance is non-binary, then a different relevance distribution can be estimated for each relevance level.

  2. Although some TREC collections actually do have ternary (i.e., not relevant, relevant, and highly relevant) judgments, they have never been used during official evaluations. When ternary judgments do exist, all relevant (rating 1) and highly relevant (rating 2) documents are considered relevant, which thereby binarizes the judgments.

  3. Average precision is equal to reciprocal rank for queries with only one relevant document, so the two measures will only differ for those topics that have more than one relevant document.

  4. http://www.google.com/technology/.
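
Notes 2 and 3 can be illustrated with a short sketch. The code below is not taken from the chapter; the function names (`binarize`, `average_precision`, `reciprocal_rank`) are hypothetical, and it assumes that every relevant document for the query appears somewhere in the ranked list, so that average precision can be normalized by the number of relevant documents seen.

```python
from typing import List, Sequence


def binarize(grades: Sequence[int]) -> List[int]:
    """Collapse ternary judgments (0, 1, 2) to binary: rating >= 1 is relevant."""
    return [1 if g >= 1 else 0 for g in grades]


def average_precision(rels: Sequence[int]) -> float:
    """Average precision of a ranked list; rels[i] is the binary relevance at rank i + 1.

    Assumption: all relevant documents for the query occur in `rels`.
    """
    hits = 0
    precision_sum = 0.0
    for rank, rel in enumerate(rels, start=1):
        if rel:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant rank
    return precision_sum / hits if hits else 0.0


def reciprocal_rank(rels: Sequence[int]) -> float:
    """1 / rank of the first relevant document, or 0.0 if none is retrieved."""
    for rank, rel in enumerate(rels, start=1):
        if rel:
            return 1.0 / rank
    return 0.0


# One relevant document (graded "highly relevant"): the two measures coincide.
single = binarize([0, 0, 2, 0])
print(average_precision(single), reciprocal_rank(single))

# Two relevant documents: the measures can differ.
double = binarize([1, 0, 0, 2])
print(average_precision(double), reciprocal_rank(double))
```

With a single relevant document at rank 3, both measures equal 1/3. With relevant documents at ranks 1 and 4, reciprocal rank is 1 while average precision is (1/1 + 2/4) / 2 = 0.75.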

Author information

Corresponding author

Correspondence to Donald Metzler.

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Metzler, D. (2011). Feature-Based Ranking. In: A Feature-Centric View of Information Retrieval. The Information Retrieval Series, vol 27. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-22898-8_3

  • DOI: https://doi.org/10.1007/978-3-642-22898-8_3

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-22897-1

  • Online ISBN: 978-3-642-22898-8

  • eBook Packages: Computer Science, Computer Science (R0)
