PLSI: The True Fisher Kernel and beyond

Chappelier, Jean-Cédric; Eckard, Emmanuel

doi:10.1007/978-3-642-04180-8_30

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5781)

Included in the following conference series:

Joint European Conference on Machine Learning and Knowledge Discovery in Databases

Abstract

The Probabilistic Latent Semantic Indexing model, introduced by T. Hofmann (1999), has engendered applications in numerous fields, notably document classification and information retrieval. In this context, the Fisher kernel was found to be an appropriate document similarity measure. However, the kernels published so far contain unjustified features, some of which hinder their performance. Furthermore, PLSI is not generative for unknown documents, a shortcoming usually remedied by “folding” them into the PLSI parameter space.
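The abstract refers to “folding in” unseen documents. As usually described (e.g. Hofmann 1999), folding-in keeps the word-topic distributions P(w|z) learned on the training collection fixed and runs EM only on the topic mixture P(z|q) of the new document q. The following is a minimal sketch of that standard procedure under stated assumptions, not of the similarity the paper proposes (which is designed to avoid folding-in). The function name `fold_in`, the sparse dictionary input, the uniform initialisation, plain (untempered) EM and the fixed iteration count are all illustrative assumptions.

```python
from typing import Dict, List


def fold_in(
    counts: Dict[int, int],
    p_w_given_z: List[List[float]],
    n_iter: int = 50,
) -> List[float]:
    """Estimate P(z|q) for an unseen document q by EM, with P(w|z) held fixed.

    counts: hypothetical sparse bag of words, word index -> n(q, w).
    p_w_given_z: p_w_given_z[z][w] = P(w|z), learned on the training collection.
    """
    n_topics = len(p_w_given_z)
    # Start from a uniform topic mixture for the new document.
    p_z_given_q = [1.0 / n_topics] * n_topics
    for _ in range(n_iter):
        expected = [0.0] * n_topics
        for w, n_qw in counts.items():
            # E-step: P(z|q,w) is proportional to P(z|q) * P(w|z).
            joint = [p_z_given_q[z] * p_w_given_z[z][w] for z in range(n_topics)]
            norm = sum(joint)
            if norm == 0.0:
                continue  # word unexplained by every topic: skip it
            for z in range(n_topics):
                expected[z] += n_qw * joint[z] / norm
        # M-step: P(z|q) is proportional to sum_w n(q,w) * P(z|q,w).
        total = sum(expected)
        if total == 0.0:
            break  # no word of q is covered by the model
        p_z_given_q = [e / total for e in expected]
    return p_z_given_q


# Usage: two topics over a four-word vocabulary.
p_w_given_z = [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]]
print(fold_in({0: 1, 2: 3}, p_w_given_z))  # prints [0.25, 0.75]
```

A fixed number of iterations keeps the sketch short; a fuller implementation would more likely stop once the log-likelihood of q stops improving.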

This paper contributes on both points by (1) introducing a new, rigorous development of the Fisher kernel for PLSI, addressing the role of the Fisher Information Matrix, and uncovering its relation to the kernels proposed so far; and (2) proposing a novel and theoretically sound document similarity, which avoids the problem of “folding in” unknown documents. For both aspects, experimental results are provided on several information retrieval evaluation sets.
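For readers unfamiliar with the construction, the generic Fisher kernel of Jaakkola and Haussler (1999), listed in the references, can be written as below. The notation is ours rather than the paper's, and the PLSI-specific parameterisation is not reproduced here.

```latex
U_x = \nabla_{\theta} \log P(x \mid \theta), \qquad
I(\theta) = \mathbb{E}_{x \sim P(\cdot \mid \theta)}\left[ U_x U_x^{\top} \right], \qquad
K(x, y) = U_x^{\top} \, I(\theta)^{-1} \, U_y
```

Here U_x is the Fisher score of x and I(θ) is the Fisher Information Matrix. Practical kernels often replace I(θ) by the identity or another approximation; the role this matrix plays for PLSI is one of the points the abstract says the paper addresses.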

Work supported by projects 200021-111817 and 200020-119745 of the Swiss National Science Foundation.

Similar content being viewed by others

A comparative study of data-dependent approaches without learning in measuring similarities of data objects (Article, 30 October 2019)

On the Replicability of Combining Word Embeddings and Retrieval Models

A unified statistical approach to non-negative matrix factorization and probabilistic latent semantic indexing (Article, 15 November 2014)

References

Ahrendt, P., Goutte, C., Larsen, J.: Co-occurrence models in music genre classification. In: IEEE Int. Workshop on Machine Learning for Signal Processing (2005)
Bast, H., Weber, I.: Insights from viewing ranked retrieval as rank aggregation. In: Proc. of Int. Workshop on Challenges in Web Information Retrieval and Integration (WIRI 2005), pp. 232–239 (2005)
Blei, D., Lafferty, J.: A correlated topic model of Science. Annals of Applied Statistics 1(1), 17–35 (2007)
Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent Dirichlet allocation. Journal of Machine Learning Research 3, 993–1022 (2003)
Bosch, A., Zisserman, A., Munoz, X.: Scene classification via pLSA. In: Proc. of the European Conf. on Computer Vision (2006)
Gaussier, E., Goutte, C., Popat, K., Chen, F.: A hierarchical model for clustering and categorising documents. In: Proc. of 24th BCS-IRSG Europ. Coll. on IR Research, pp. 229–247 (2002)
Gehler, P.V., Holub, A.D., Welling, M.: The rate adapting Poisson model for information retrieval and object recognition. In: Proc. 23rd Int. Conf. on Machine Learning, pp. 337–344 (2006)
Harman, D.: Overview of the fourth Text REtrieval Conference (TREC–4). In: Proc. of the 4th Text REtrieval Conf., pp. 1–23 (1995)
Hinneburg, A., Gabriel, H.-H., Gohr, A.: Bayesian folding-in with Dirichlet kernels for PLSI. In: Proc. of the 7th IEEE Int. Conf. on Data Mining, pp. 499–504 (2007)
Hofmann, T.: Probabilistic latent semantic indexing. In: Proc. of 22nd Annual Int. ACM SIGIR Conf. on Research and Development in Information Retrieval, pp. 50–57 (1999)
Hofmann, T.: Learning the similarity of documents: An information-geometric approach to document retrieval and categorization. In: Advances in Neural Information Processing Systems, vol. 12, pp. 914–920 (2000)
Hofmann, T.: Unsupervised learning by probabilistic latent semantic analysis. Machine Learning 42(1), 177–196 (2001)
Jaakkola, T., Haussler, D.: Exploiting generative models in discriminative classifiers. In: Advances in Neural Information Processing Systems, vol. 11, pp. 487–493. MIT Press, Cambridge (1999)
Jin, X., Zhou, Y., Mobasher, B.: Web usage mining based on probabilistic latent semantic analysis. In: Proc. of 10th Int. Conf. on Knowledge Discovery and Data Mining, pp. 197–205 (2004)
Lafferty, J., Zhai, C.: Document language models, query models, and risk minimization for information retrieval. In: Proc. ACM SIGIR Conf. on Research and Development in Information Retrieval (2001)
Lienhart, R., Slaney, M.: pLSA on large-scale image databases. In: Proc. of the 2007 Int. Conf. on Acoustics, Speech and Signal Processing, IEEE (ICASSP 2007), vol. 4, pp. 1217–1220 (2007)
McLachlan, G., Peel, D.: Finite Mixture Models. Wiley, Chichester (2000)
Mei, Q., Zhai, C.: A mixture model for contextual text mining. In: Proc. of 12th Int. Conf. on Knowledge Discovery and Data Mining, pp. 649–655 (2006)
Monay, F., Gatica-Perez, D.: PLSA-based image auto-annotation: Constraining the latent space. In: Proc. ACM Int. Conf. on Multimedia, ACM MM (2004)
Monay, F., Gatica-Perez, D.: Modeling semantic aspects for cross-media image indexing. IEEE Transactions on Pattern Analysis and Machine Intelligence 29 (2007)
Nyffenegger, M., Chappelier, J.-C., Gaussier, E.: Revisiting Fisher kernels for document similarities. In: Proc. of 17th European Conf. on Machine Learning, pp. 727–734 (2006)
Ponte, J.M., Croft, W.B.: A language modeling approach to information retrieval. In: 21st SIGIR Conf. on Research and Development in Information Retrieval, pp. 275–281 (1998)
Popescul, A., Ungar, L.H., Pennock, D.M., Lawrence, S.: Probabilistic models for unified collaborative and content-based recommendation in sparse-data environments. In: Proc. of the 17th Conf. in Uncertainty in Artificial Intelligence, pp. 437–444 (2001)
Quelhas, P., Monay, F., Odobez, J.-M., Gatica-Perez, D., Tuytelaars, T., Gool, L.V.: Modeling scenes with local descriptors and latent aspects. In: Proc. of ICCV 2005, vol. 1, pp. 883–890 (2005)
Robertson, S.E., Walker, S., Jones, S., Hancock-Beaulieu, M., Gatford, M.: Okapi at TREC–3. In: Proc. of the 3rd Text REtrieval Conf. (1994)
Steyvers, M., Smyth, P., Rosen-Zvi, M., Griffiths, T.: Probabilistic author-topic models for information discovery. In: Proc. 10th Int. Conf. on Knowl. Discovery and Data Mining, pp. 306–315 (2004)
Vinokourov, A., Girolami, M.: A probabilistic framework for the hierarchic organisation and classification of document collections. Journal of Intelligent Information Systems 18(2/3), 153–172 (2002)
Welling, M., Rosen-Zvi, M., Hinton, G.: Exponential family harmoniums with an application to information retrieval. In: Advances in Neural Information Processing Systems, vol. 17, pp. 1481–1488 (2005)
Zhai, C.: Statistical language models for information retrieval: a critical review. Found. Trends Inf. Retr. 2(3), 137–213 (2008)

Author information

Authors and Affiliations

School of Computer and Communication Sciences, École Polytechnique Fédérale de Lausanne, Switzerland
Jean-Cédric Chappelier & Emmanuel Eckard

Editor information

Editors and Affiliations

NICTA, Locked Bag 8001, Canberra, 2601, Australia and Helsinki Institute of IT, Finland
Wray Buntine
Dept. of Knowledge Technologies, Jožef Stefan Institute, Jamova 39, 1000, Ljubljana, Slovenia
Marko Grobelnik & Dunja Mladenić
University College London, The Centre for Computational Statistics and Machine Learning, Department of Computer Science, Gower St., WC1E 6BT, London, UK
John Shawe-Taylor

About this paper

Cite this paper

Chappelier, JC., Eckard, E. (2009). PLSI: The True Fisher Kernel and beyond. In: Buntine, W., Grobelnik, M., Mladenić, D., Shawe-Taylor, J. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2009. Lecture Notes in Computer Science, vol 5781. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04180-8_30

DOI: https://doi.org/10.1007/978-3-642-04180-8_30
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04179-2
Online ISBN: 978-3-642-04180-8
eBook Packages: Computer Science, Computer Science (R0)
