Relation Based Term Weighting Regularization

  • Hao Wu
  • Hui Fang
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7224)

Abstract

Traditional retrieval models compute term weights using only information about individual terms, such as TF and IDF. However, query terms are related to one another, and intuitively these relations could provide useful information about the importance of a term in the context of the other query terms. For example, the query “perl tutorial” indicates that the user is looking for information relevant to both “perl” and “tutorial”. Thus, a document containing both terms should receive a higher relevance score than documents containing only one of them. However, if the IDF value of “tutorial” is much smaller than that of “perl”, existing retrieval models may assign such a document a lower score than documents containing multiple occurrences of “perl” alone. The importance of a term should therefore depend not only on collection statistics but also on its relations with the other query terms. In this work, we study how to utilize semantic relations among query terms to regularize term weighting. Experimental results over TREC collections show that the proposed strategy is effective in improving retrieval performance.
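The “perl tutorial” problem described above can be made concrete with a small numeric sketch. This is not the authors' model or their regularization method; it is a generic log-scaled TF-IDF scorer, and the function name `tf_idf_score`, the IDF values, and the term counts are all hypothetical, chosen only for illustration.

```python
import math


def tf_idf_score(query_terms, doc_tf, idf):
    """Toy relevance score: sum of (1 + ln tf) * idf over matched query terms.

    This is a generic textbook-style weighting, assumed here for
    illustration; it is not the scoring function used in the paper.
    """
    score = 0.0
    for term in query_terms:
        tf = doc_tf.get(term, 0)
        if tf > 0:
            score += (1.0 + math.log(tf)) * idf[term]
    return score


# Hypothetical IDF values: "perl" is rare, "tutorial" is common.
idf = {"perl": 5.0, "tutorial": 0.5}
query = ["perl", "tutorial"]

# Hypothetical term counts for two documents.
doc_both = {"perl": 1, "tutorial": 1}   # covers both query terms
doc_perl_only = {"perl": 3}             # repeats "perl", no "tutorial"

score_both = tf_idf_score(query, doc_both, idf)
score_perl_only = tf_idf_score(query, doc_perl_only, idf)

print("both terms:", score_both)
print("perl only: ", round(score_perl_only, 3))
```

Under these assumed numbers the document matching both terms scores 5.5, while the document that only repeats “perl” scores about 10.49, so the less complete match ranks first. This is the kind of outcome that motivates making a term's weight depend on its relations with the other query terms.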

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Hao Wu (1)
  • Hui Fang (1)
  1. Department of Electrical and Computer Engineering, University of Delaware, USA
