A Latent Dirichlet Framework for Relevance Modeling

  • Viet Ha-Thuc
  • Padmini Srinivasan
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5839)


Relevance-based language models operate by estimating the probabilities of observing words in documents relevant (or pseudo-relevant) to a topic. However, these models assume that if a document is relevant to a topic, then all tokens in the document are relevant to that topic. This assumption could limit model robustness and effectiveness. In this study, we propose a Latent Dirichlet relevance model, which relaxes this assumption. Our approach derives from current research on Latent Dirichlet Allocation (LDA) topic models. LDA has been extensively explored, especially for discovering a set of topics from a corpus. LDA itself, however, has a limitation that is also addressed in our work: topics generated by LDA from a corpus are synthetic, i.e., they do not necessarily correspond to topics identified by humans for the same corpus. In contrast, our model explicitly considers the relevance relationships between documents and given topics (queries). Unlike standard LDA, our model is therefore directly applicable to goals such as relevance feedback for query modification and text classification, where topics (classes and queries) are provided upfront. Although the focus of our paper is on improving relevance-based language models, in effect our approach bridges relevance-based language models and LDA, addressing limitations of both.
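The assumption that every token of a relevant document is relevant to the topic can be illustrated with a minimal sketch. It is not the estimator from the paper or from any cited work: it assumes a plain, unsmoothed maximum-likelihood estimate over pooled tokens, and the function name `pooled_relevance_model` and the toy documents are hypothetical.

```python
from collections import Counter


def pooled_relevance_model(relevant_docs):
    """Estimate P(word | topic) by pooling every token of every relevant document.

    Assumption for illustration: all tokens count equally, i.e. each token in a
    relevant document is treated as relevant to the topic.
    """
    counts = Counter()
    for doc in relevant_docs:
        counts.update(doc.lower().split())
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


# Toy documents (invented) judged relevant to a hypothetical "jaguar cars" topic.
docs = [
    "the jaguar is a car made in the uk",
    "the new jaguar car has a fast engine",
]

model = pooled_relevance_model(docs)
top_word = max(model, key=model.get)  # the most probable word under the pooled estimate
```

In this toy case the most probable word is the general-purpose word "the" rather than a topical word such as "jaguar", because every token is counted as evidence for the topic. A model that relaxes the assumption would have to decide, token by token, whether a word belongs to the topic; the abstract does not spell out how that decision is made.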


LDA, topic models, relevance-based language models





Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Viet Ha-Thuc
    • 1
  • Padmini Srinivasan
    • 1
  1. Computer Science Department, The University of Iowa, Iowa City, USA
