A Field Relevance Model for Structured Document Retrieval

  • Jin Young Kim
  • W. Bruce Croft
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7224)

Abstract

Many search applications involve documents with structure or fields. Since query terms are often related to specific structural components, mapping queries to fields and assigning weights to those fields is critical for retrieval effectiveness. Although several field-based retrieval models have been developed, there has not been a formal justification of field weighting.

In this work, we aim to improve field weighting for structured document retrieval. We first introduce the notion of field relevance as a generalization of field weights, and discuss how it can be estimated using relevant documents, which effectively implements relevance feedback for field weighting. We then propose a framework for estimating field relevance based on the combination of several sources. Evaluation on several structured document collections shows that field weighting based on the suggested framework improves retrieval effectiveness significantly.
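
As a rough illustration of the idea, the sketch below treats field relevance as a per-query-term distribution over fields, estimates it from a set of known relevant documents, and uses it to weight a mixture of field language models. The abstract does not give the estimator or the scoring function, so everything here is an assumption: the function names (`term_prob`, `estimate_field_relevance`, `score`), the relative-frequency estimator, the Dirichlet smoothing, and the constant `collection_prob` are all hypothetical stand-ins rather than the authors' formulation.

```python
import math
from collections import Counter


def term_prob(term, field_tokens, collection_prob, mu=1.0):
    """Dirichlet-smoothed P(term | one field of one document)."""
    count = Counter(field_tokens)[term]
    return (count + mu * collection_prob) / (len(field_tokens) + mu)


def estimate_field_relevance(query, relevant_docs, fields):
    """Per-term field weights from relevant documents (assumed estimator).

    For each query term, take its relative frequency in each field of the
    relevant documents and normalise across fields so the weights sum to 1.
    """
    weights = {}
    for term in query:
        raw = {}
        for field in fields:
            tokens = [t for doc in relevant_docs for t in doc.get(field, [])]
            raw[field] = tokens.count(term) / len(tokens) if tokens else 0.0
        total = sum(raw.values())
        # Fall back to uniform weights if the term never occurs.
        weights[term] = {
            f: (raw[f] / total if total > 0 else 1.0 / len(fields))
            for f in fields
        }
    return weights


def score(query, doc, weights, collection_prob=0.1, mu=1.0):
    """Log-likelihood of the query under a field-weighted mixture model."""
    log_score = 0.0
    for term in query:
        mixture = sum(
            weights[term][field] * term_prob(term, tokens, collection_prob, mu)
            for field, tokens in doc.items()
        )
        log_score += math.log(mixture)
    return log_score


# Toy data, invented for illustration only.
fields = ["author", "title"]
query = ["croft", "retrieval"]
relevant = [{"author": ["croft"], "title": ["retrieval", "model"]}]
weights = estimate_field_relevance(query, relevant, fields)

doc_a = {"author": ["croft"], "title": ["field", "retrieval"]}
doc_b = {"author": ["kim"], "title": ["croft", "retrieval"]}
print(score(query, doc_a, weights), score(query, doc_b, weights))
```

In this toy setting the relevant document places "croft" in the author field, so a document matching "croft" in its author field scores higher than one that only mentions it in the title. A single global weight per field could not express that distinction per query term, which is the sense in which per-term field relevance generalizes field weights.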

Keywords

Relevant Document, Relevance Feedback, Retrieval Model, Query Term, Mean Average Precision

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Jin Young Kim (1)
  • W. Bruce Croft (1)
  1. Center for Intelligent Information Retrieval, Department of Computer Science, University of Massachusetts, Amherst, USA
