Information Retrieval, Volume 17, Issue 1, pp 21–51

Latent word context model for information retrieval

  • Bernard Brosseau-Villeneuve
  • Jian-Yun Nie
  • Noriko Kando

Abstract

The application of word sense disambiguation (WSD) techniques to information retrieval (IR) has yet to provide convincing retrieval results. Major obstacles to effective WSD in IR include the coverage and granularity problems of word sense inventories, the sparsity of document context, and the limited information provided by short queries. In this paper, to alleviate these issues, we propose constructing latent context models for terms using latent Dirichlet allocation. We build one latent context model per word, using a well-principled representation of local context based on word features. In particular, context words are weighted with a decaying function of their distance to the target word, which is learnt from data in an unsupervised manner. The resulting latent features are used to discriminate word contexts, so as to constrain the query's semantic scope. Consistent and substantial improvements, including on difficult queries, are observed on TREC test collections, and the technique combines well with blind relevance feedback. Compared with traditional topic modeling, WSD and positional indexing techniques, the proposed retrieval model is more effective and scales well to large collections.
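The abstract outlines the core construction: for each vocabulary word, its local contexts are aggregated into a pseudo-document in which co-occurring words are weighted by a function that decays with their distance to the target word, and a topic model (LDA) is then trained over these pseudo-documents to yield latent context features. The following is a minimal sketch of that idea, not the authors' implementation: the window size and exponential decay rate are assumed fixed values (the paper learns the decay from data), and scikit-learn's LDA stands in for the paper's own estimation procedure.

```python
import math
from collections import defaultdict

import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

# Hypothetical parameters: the paper learns the decay from data; here both are fixed.
WINDOW = 10   # maximum distance considered around a target occurrence
DECAY = 0.5   # assumed exponential decay rate

def build_context_counts(corpus):
    """Aggregate distance-weighted context counts, one pseudo-document per word.

    corpus: iterable of tokenised documents (lists of strings).
    Returns (vocab, matrix) where matrix[i, j] is the decayed count of word j
    in the contexts of target word i.
    """
    counts = defaultdict(lambda: defaultdict(float))
    for doc in corpus:
        for i, target in enumerate(doc):
            lo, hi = max(0, i - WINDOW), min(len(doc), i + WINDOW + 1)
            for j in range(lo, hi):
                if j == i:
                    continue
                # weight falls off with distance to the target occurrence
                counts[target][doc[j]] += math.exp(-DECAY * abs(j - i))
    vocab = sorted(counts)
    index = {w: k for k, w in enumerate(vocab)}
    mat = np.zeros((len(vocab), len(vocab)))
    for target, ctx in counts.items():
        for w, c in ctx.items():
            mat[index[target], index[w]] = c
    return vocab, mat

if __name__ == "__main__":
    corpus = [
        "the bank raised interest rates".split(),
        "we walked along the river bank".split(),
    ]
    vocab, X = build_context_counts(corpus)
    # Rows of the resulting word-topic matrix act as latent context features per word.
    lda = LatentDirichletAllocation(n_components=2, random_state=0)
    word_topics = lda.fit_transform(X)
    print(dict(zip(vocab, word_topics.round(2))))
```

In a retrieval setting, such per-word latent features could then be matched between query terms and the local contexts of their occurrences in documents, narrowing the query's semantic scope as described above; how that matching is scored is specific to the paper's retrieval model and is not reproduced here.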

Keywords

Retrieval models · Word context discrimination (WCD) · Word context · Topic models · Word sense disambiguation (WSD)


Copyright information

© Springer Science+Business Media New York 2013

Authors and Affiliations

  • Bernard Brosseau-Villeneuve (1)
  • Jian-Yun Nie (1)
  • Noriko Kando (2)

  1. University of Montréal, Montreal, Canada
  2. National Institute of Informatics, Tokyo, Japan
