A Comparative Study of Utilizing Topic Models for Information Retrieval

  • Conference paper
Advances in Information Retrieval (ECIR 2009)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5478)

Abstract

We explore the utility of different types of topic models for retrieval. Drawing on prior work, we describe several ways to integrate topic models into the retrieval process and evaluate the effectiveness of different topic models within those approaches. We show that: (1) topic models are effective for document smoothing; (2) more rigorous topic models such as Latent Dirichlet Allocation provide gains over cluster-based models; (3) more elaborate topic models that capture topic dependencies provide no additional gains; (4) smoothing documents with their similar documents is as effective as smoothing them with topic models; (5) query expansion should use topics discovered in the top feedback documents rather than coarse-grained topics from the whole corpus; (6) in general, incorporating topics from the feedback documents when building relevance models benefits performance more for queries that have more relevant documents.
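
To make finding (1) concrete, the sketch below shows one common way a topic model can smooth a document language model: a Dirichlet-smoothed unigram estimate is linearly interpolated with a topic-based estimate, the sum over topics z of P(w|z)P(z|D). This is a minimal illustration under stated assumptions and may not match the exact formulation evaluated in the paper. The function names, the parameters `mu` and `lam`, and the toy distributions are all hypothetical, and the topic distributions are set by hand rather than learned.

```python
import math

def smoothed_word_prob(word, doc_counts, collection_prob,
                       topic_word, doc_topic, mu=1000.0, lam=0.7):
    """P(word | document) as a mix of a Dirichlet-smoothed unigram
    estimate and a topic-model estimate (illustrative, not the paper's code)."""
    doc_len = sum(doc_counts.values())
    # Dirichlet-smoothed maximum-likelihood estimate.
    dirichlet = (doc_counts.get(word, 0) + mu * collection_prob[word]) / (doc_len + mu)
    # Topic-based estimate: sum over topics z of P(word | z) * P(z | document).
    topic = sum(p_z * topic_word[z].get(word, 0.0)
                for z, p_z in enumerate(doc_topic))
    return lam * dirichlet + (1.0 - lam) * topic

def query_log_likelihood(query, doc_counts, collection_prob,
                         topic_word, doc_topic, mu=1000.0, lam=0.7):
    """Score a document by the log-likelihood of the query terms."""
    return sum(
        math.log(smoothed_word_prob(w, doc_counts, collection_prob,
                                    topic_word, doc_topic, mu, lam))
        for w in query
    )

# Toy data (assumed values, chosen only so the arithmetic is easy to follow).
DOC_COUNTS = {"apple": 2}                      # the document is "apple apple"
COLLECTION = {"apple": 0.2, "fruit": 0.3, "car": 0.5}
TOPIC_WORD = [
    {"apple": 0.5, "fruit": 0.5, "car": 0.0},  # a "food" topic
    {"apple": 0.0, "fruit": 0.0, "car": 1.0},  # a "vehicle" topic
]
DOC_TOPIC = [0.9, 0.1]                         # P(z | document)

p_fruit = smoothed_word_prob("fruit", DOC_COUNTS, COLLECTION,
                             TOPIC_WORD, DOC_TOPIC, mu=2.0, lam=0.5)
p_car = smoothed_word_prob("car", DOC_COUNTS, COLLECTION,
                           TOPIC_WORD, DOC_TOPIC, mu=2.0, lam=0.5)
```

In this toy setup "fruit" never occurs in the document, yet it receives more probability than "car" because the document's dominant topic favors it, even though "car" is the more frequent word in the collection. That is the intuition behind using topics for document smoothing.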

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Yi, X., Allan, J. (2009). A Comparative Study of Utilizing Topic Models for Information Retrieval. In: Boughanem, M., Berrut, C., Mothe, J., Soule-Dupuy, C. (eds) Advances in Information Retrieval. ECIR 2009. Lecture Notes in Computer Science, vol 5478. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00958-7_6

  • DOI: https://doi.org/10.1007/978-3-642-00958-7_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-00957-0

  • Online ISBN: 978-3-642-00958-7

  • eBook Packages: Computer Science, Computer Science (R0)
