Probabilistic Topic Maps: Navigating through Large Text Collections

  • Thomas Hofmann
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1642)


The visualization of large text databases and document collections is an important step towards more flexible and interactive types of information retrieval. This paper presents a probabilistic approach which combines a statistical, model-based analysis with a topological visualization principle. Our method can be utilized to derive topic maps which represent topical information by characteristic keyword distributions arranged in a two-dimensional spatial layout. Combined with multi-resolution techniques, this provides a three-dimensional space for interactive information navigation in large text collections.


Keywords: Document Collection · Latent Semantic Analysis · Latent Class Model · Probabilistic Latent Semantic Analysis · Dimensional Grid




Copyright information

© Springer-Verlag Berlin Heidelberg 1999

Authors and Affiliations

  • Thomas Hofmann
    Computer Science Division, UC Berkeley & International CS Institute, Berkeley
