An Ad Hoc Information Retrieval Perspective on PLSI through Language Model Identification

  • Jean-Cédric Chappelier
  • Emmanuel Eckard
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5766)


This paper proposes a new document–query similarity for PLSI that allows queries to be handled without folding-in. We compare this similarity to Fisher kernels, the state-of-the-art approach for PLSI, on a corpus of more than one million word occurrences drawn from TREC–AP.


Keywords: Information Retrieval; Language Model; Latent Variable Model; Fisher Kernel; Word Occurrence





Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Jean-Cédric Chappelier (1)
  • Emmanuel Eckard (1)
  1. School of Computer and Communication Sciences, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
