An Effective Approach to Verbose Queries Using a Limited Dependencies Language Model

  • Eduard Hoenkamp
  • Peter Bruza
  • Dawei Song
  • Qiang Huang
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5766)

Abstract

Intuitively, any ‘bag of words’ approach in IR should benefit from taking term dependencies into account. Unfortunately, for years the results of exploiting such dependencies have been mixed or inconclusive. To improve the situation, this paper shows how the natural language properties of the target documents can be used to transform and enrich the term dependencies into more useful statistics. This is done in three steps. First, the term co-occurrence statistics of queries and documents are each represented by a Markov chain. The paper proves that such a chain is ergodic, and therefore its asymptotic behavior is unique, stationary, and independent of the initial state. Next, the stationary distribution is taken to model queries and documents, rather than their initial distributions. Finally, ranking is achieved following the customary language modeling paradigm. The main contribution of this paper is to argue why the asymptotic behavior of the document model is a better representation than the document’s initial distribution alone. A secondary contribution is to investigate the practical application of this representation as queries become increasingly verbose. In the experiments (based on Lemur’s search engine substrate) the default query model was replaced by the stable distribution of the query. Modeling the query this way already resulted in significant improvements over a standard language model baseline. The results were on a par with, or better than, those of more sophisticated algorithms that use fine-tuned parameters or extensive training. Moreover, the more verbose the query, the more effective the approach appears to become.
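The abstract describes the pipeline only in prose. As a rough illustration (not the authors' implementation), the sketch below builds a term co-occurrence Markov chain from a verbose query, computes its stationary distribution by power iteration, and uses that distribution as the query model in a standard query-likelihood ranking with Dirichlet smoothing. The window size, the smoothing parameter mu, the small epsilon mass that keeps the toy chain ergodic, and the toy corpus are all assumptions made for illustration.

```python
# Minimal sketch of the stationary-distribution query model described in the
# abstract. Window size, eps, mu, and the toy data are illustrative assumptions.
from collections import Counter
import math

def cooccurrence_chain(tokens, window=3):
    """Row-normalised windowed co-occurrence counts -> Markov transition matrix."""
    vocab = sorted(set(tokens))
    counts = {w: Counter() for w in vocab}
    for i, w in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                counts[w][tokens[j]] += 1
    eps = 1e-3  # small uniform mass keeps the chain irreducible and aperiodic (ergodic)
    P = {}
    for w in vocab:
        row = {v: counts[w][v] + eps for v in vocab}
        total = sum(row.values())
        P[w] = {v: c / total for v, c in row.items()}
    return vocab, P

def stationary_distribution(vocab, P, iters=200):
    """Power iteration pi <- pi P; the limit is unique for an ergodic chain."""
    pi = {w: 1.0 / len(vocab) for w in vocab}
    for _ in range(iters):
        pi = {v: sum(pi[w] * P[w][v] for w in vocab) for v in vocab}
    return pi

def score(pi_query, doc_tokens, collection_counts, collection_len, mu=2000):
    """Query likelihood: sum_w pi(w) * log p(w | doc), with Dirichlet smoothing."""
    doc_counts = Counter(doc_tokens)
    dl = len(doc_tokens)
    s = 0.0
    for w, weight in pi_query.items():
        p_coll = (collection_counts[w] + 1) / (collection_len + len(collection_counts))
        p_doc = (doc_counts[w] + mu * p_coll) / (dl + mu)
        s += weight * math.log(p_doc)
    return s

# Toy usage on illustrative data: rank two tiny documents for a verbose query.
docs = {"d1": "markov chains model term dependencies in documents".split(),
        "d2": "verbose queries benefit from language models".split()}
coll = Counter(t for d in docs.values() for t in d)
query = "how do markov chains help model verbose queries".split()
vocab, P = cooccurrence_chain(query)
pi = stationary_distribution(vocab, P)
ranking = sorted(docs, key=lambda d: score(pi, docs[d], coll, sum(coll.values()), mu=10),
                 reverse=True)
print(ranking)
```

The stationary distribution replaces the raw (initial) term distribution of the query, so terms that are well connected in the co-occurrence chain receive more weight than terms that merely occur often, which is the intuition the abstract gives for handling increasingly verbose queries.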

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Eduard Hoenkamp, University of Maastricht, The Netherlands
  • Peter Bruza, Queensland University of Technology, Australia
  • Dawei Song, Robert Gordon University, UK
  • Qiang Huang, Robert Gordon University, UK
