Machine Learning, Volume 42, Issue 1–2, pp 177–196

Unsupervised Learning by Probabilistic Latent Semantic Analysis

  • Thomas Hofmann


This paper presents a novel statistical method for factor analysis of binary and count data which is closely related to a technique known as Latent Semantic Analysis. In contrast to the latter method, which stems from linear algebra and performs a Singular Value Decomposition of co-occurrence tables, the proposed technique uses a generative latent class model to perform a probabilistic mixture decomposition. This results in a more principled approach with a solid foundation in statistical inference. More precisely, we propose to fit the model with a temperature-controlled version of the Expectation Maximization algorithm, which has shown excellent performance in practice. Probabilistic Latent Semantic Analysis has many applications, most prominently in information retrieval, natural language processing, machine learning from text, and related areas. The paper presents perplexity results for different types of text and linguistic data collections and discusses an application in automated document indexing. The experiments indicate substantial and consistent improvements of the probabilistic method over standard Latent Semantic Analysis.
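The abstract names the ingredients (a latent class model, a mixture decomposition of the co-occurrence table, and a temperature-controlled EM fit) without giving equations. The following is a minimal sketch of how these could fit together, assuming the usual aspect-model form P(d, w) = Σ_z P(z) P(d|z) P(w|z) and a tempered E-step in which the likelihood term is raised to a power beta (beta = 1 gives ordinary EM). The function names, the toy count table, and the fixed value of beta are illustrative assumptions, not taken from the paper.

```python
import math
import random


def plsa_tempered_em(counts, n_topics, beta=1.0, n_iter=50, seed=0):
    """Fit P(z), P(d|z), P(w|z) to a document-by-word count table (a sketch)."""
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])

    def random_dist(n):
        # Strictly positive random point on the probability simplex.
        raw = [rng.random() + 0.1 for _ in range(n)]
        total = sum(raw)
        return [x / total for x in raw]

    p_z = [1.0 / n_topics] * n_topics
    p_d_z = [random_dist(n_docs) for _ in range(n_topics)]   # indexed [z][d]
    p_w_z = [random_dist(n_words) for _ in range(n_topics)]  # indexed [z][w]

    for _ in range(n_iter):
        new_z = [0.0] * n_topics
        new_d = [[0.0] * n_docs for _ in range(n_topics)]
        new_w = [[0.0] * n_words for _ in range(n_topics)]
        for d in range(n_docs):
            for w in range(n_words):
                n = counts[d][w]
                if n == 0:
                    continue
                # Tempered E-step (assumed form): likelihood raised to beta.
                post = [p_z[z] * (p_d_z[z][d] * p_w_z[z][w]) ** beta
                        for z in range(n_topics)]
                norm = sum(post)
                if norm == 0.0:
                    continue  # guard against numerical underflow
                for z in range(n_topics):
                    r = n * post[z] / norm  # expected count assigned to z
                    new_z[z] += r
                    new_d[z][d] += r
                    new_w[z][w] += r
        # M-step: renormalise the expected counts.
        total = sum(new_z)
        for z in range(n_topics):
            if new_z[z] > 0.0:
                p_d_z[z] = [x / new_z[z] for x in new_d[z]]
                p_w_z[z] = [x / new_z[z] for x in new_w[z]]
            p_z[z] = new_z[z] / total
    return p_z, p_d_z, p_w_z


def joint_prob(d, w, p_z, p_d_z, p_w_z):
    """P(d, w) under the mixture decomposition."""
    return sum(p_z[z] * p_d_z[z][d] * p_w_z[z][w] for z in range(len(p_z)))


def log_likelihood(counts, p_z, p_d_z, p_w_z):
    """Sum of n(d, w) * log P(d, w) over the observed (nonzero) cells."""
    return sum(n * math.log(joint_prob(d, w, p_z, p_d_z, p_w_z))
               for d, row in enumerate(counts)
               for w, n in enumerate(row) if n > 0)


# Toy table: two documents use words 0-1, two documents use words 2-3.
counts = [
    [4, 3, 0, 0],
    [5, 2, 0, 0],
    [0, 0, 3, 4],
    [0, 0, 2, 5],
]
p_z, p_d_z, p_w_z = plsa_tempered_em(counts, n_topics=2, beta=0.9)
```

In this sketch beta is held fixed for simplicity; how the temperature is chosen or scheduled is a design decision that the abstract does not describe. Values of beta below 1 flatten the posteriors over latent classes, which is the sense in which the fit is temperature controlled.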

unsupervised learning, latent class models, mixture models, dimension reduction, EM algorithm, information retrieval, natural language processing, language modeling



Copyright information

© Kluwer Academic Publishers 2001

Authors and Affiliations

  • Thomas Hofmann
    • Department of Computer Science, Brown University, Providence, USA
