
LSM: Language Sense Model for Information Retrieval

  • Shenghua Bao
  • Lei Zhang
  • Erdong Chen
  • Min Long
  • Rui Li
  • Yong Yu
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4016)

Abstract

Much work has tried to bring word senses into retrieval to address the word sense ambiguity problem, but most of it has reported negative results. In this paper, we first implement a word sense disambiguation (WSD) system for nouns and verbs, and then propose the language sense model (LSM) for information retrieval. The LSM combines the terms and senses of a document seamlessly through an EM algorithm. Retrieval experiments on TREC collections show that the LSM significantly outperforms both the vector space model (BM25) and the traditional language model on medium and long queries (7.53%-16.90%). Based on these experiments, we also conclude empirically that fine-grained senses improve retrieval performance when they are used properly.
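The abstract does not spell out how EM is used, so the following is only a generic sketch of one common pattern: estimating an interpolation weight between two fixed component models by EM. The function names, the two probability lists, and the linear mixture form are all assumptions made for illustration and are not taken from the paper.

```python
import math

def em_mixture_weight(p_term, p_sense, lam=0.5, iters=50):
    """Estimate lam in the mixture lam * p_term[i] + (1 - lam) * p_sense[i].

    p_term and p_sense are hypothetical per-observation probabilities under a
    term model and a sense model. All values are assumed strictly positive.
    """
    assert len(p_term) == len(p_sense) and len(p_term) > 0
    for _ in range(iters):
        # E-step: posterior probability that each observation came from
        # the term component under the current weight.
        post = [lam * pt / (lam * pt + (1.0 - lam) * ps)
                for pt, ps in zip(p_term, p_sense)]
        # M-step: the new weight is the average posterior.
        lam = sum(post) / len(post)
    return lam

def log_likelihood(p_term, p_sense, lam):
    """Log-likelihood of the observations under the mixture with weight lam."""
    return sum(math.log(lam * pt + (1.0 - lam) * ps)
               for pt, ps in zip(p_term, p_sense))

# Illustrative, made-up probabilities (not data from the paper).
p_term = [0.20, 0.10, 0.30]
p_sense = [0.05, 0.20, 0.10]
lam_hat = em_mixture_weight(p_term, p_sense)
```

Each EM iteration cannot decrease the log-likelihood, so `log_likelihood(p_term, p_sense, lam_hat)` is at least as large as the value at the starting weight of 0.5.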

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Shenghua Bao (1)
  • Lei Zhang (1)
  • Erdong Chen (1)
  • Min Long (1)
  • Rui Li (1)
  • Yong Yu (1)
  1. APEX Data and Knowledge Management Lab, Department of Computer Science & Engineering, Shanghai Jiao Tong University, Shanghai, P.R. China
