Encyclopedia of Database Systems

2018 Edition
Editors: Ling Liu, M. Tamer Özsu

Field-Based Information Retrieval Models

  • Vassilis Plachouras
Reference work entry
DOI: https://doi.org/10.1007/978-1-4614-8265-9_927

Definition

A document D consists of n fields and is represented by a set of n vectors, one per field. A field-based Information Retrieval (IR) model assigns a score, or Retrieval Status Value (RSV), to a document D with respect to a query Q by distinguishing the occurrences of query terms in the different field vectors and by weighting the contribution of each field appropriately.
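The definition can be illustrated with a minimal sketch. It assumes raw term frequencies as the field vectors and a weighted linear sum as the combination rule; both are simplifications chosen for brevity, not the specific models the entry goes on to discuss. The field names, the weights, and the helper names `field_vector` and `rsv` are hypothetical.

```python
from collections import Counter


def field_vector(text):
    """Build a term-frequency vector for one field (lowercased whitespace tokens)."""
    return Counter(text.lower().split())


def rsv(document, query, weights):
    """Score a document for a query as a weighted sum of per-field term matches.

    document: dict mapping field name -> term-frequency vector (one per field)
    weights:  dict mapping field name -> field weight (assumed, hand-picked values)
    """
    query_terms = query.lower().split()
    score = 0.0
    for field, vector in document.items():
        weight = weights.get(field, 0.0)  # a field without a weight contributes nothing
        # Count query-term occurrences in this field only, then apply the field weight.
        score += weight * sum(vector[term] for term in query_terms)
    return score


# Hypothetical document with n = 3 fields.
document = {
    "title": field_vector("field-based retrieval models"),
    "body": field_vector("a retrieval model scores documents and fields refine the retrieval score"),
    "anchor": field_vector("retrieval models entry"),
}
weights = {"title": 3.0, "body": 1.0, "anchor": 2.0}

score = rsv(document, "retrieval models", weights)
print(score)  # 12.0
```

In this example the title and anchor fields each match both query terms once, while the body matches "retrieval" twice and "models" not at all, so the weighted contributions are 6.0, 4.0, and 2.0. With a single undifferentiated vector, the same document would score from the total count alone, losing the distinction between a match in the title and a match in the body.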

Historical Background

Textual documents, whether they are newswire items, scientific publications, or Web pages, are rich in structure. For example, depending on its length, a text can be organized into chapters, sections, and paragraphs, each of which can have a concise description in the form of a title. Shorter texts, such as emails, also consist of both free text and formatted text. In information retrieval (IR), however, documents are usually represented as a single vector whose dimensions correspond to the terms occurring in the document. Such a representation...

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Yahoo! Research, Barcelona, Spain

Section editors and affiliations

  • Giambattista Amati, Fondazione Ugo Bordoni, Rome, Italy