TopicExplorer: Exploring Document Collections with Topic Models

  • Alexander Hinneburg
  • Rico Preiss
  • René Schröder
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7524)


The demo presents a prototype, called TopicExplorer, that combines topic modeling, keyword search and visualization techniques to explore a large collection of Wikipedia documents. Topics derived by Latent Dirichlet Allocation are represented by their top words. In addition, each topic is accompanied by image thumbnails extracted from related Wikipedia documents, which aids sense-making of the derived topics during browsing. Topics are shown in a linear order such that similar topics are close together, and colors are assigned to topics along that order. The auto-completion of search terms suggests words together with their color-coded topics, which lets users explore the relation between search terms and topics. Retrieved documents are shown with color-coded topics as well. Relevant documents and topics found during browsing can be put onto a shortlist. The tool can recommend further documents based on the average topic mixture of the shortlist.
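
The color mapping and the shortlist recommendation described above can be illustrated with a small sketch. It is not taken from TopicExplorer itself: the function names (`topic_colors`, `average_mixture`, `recommend`), the hue range, and the use of cosine similarity to compare topic mixtures are all assumptions made for illustration, since the abstract does not state how colors are chosen or how similarity is measured.

```python
import colorsys
import math


def topic_colors(ordered_topic_ids):
    """Assign a hex color to each topic, following the given linear order.

    Assumption: hues are spread evenly over part of the color wheel
    (0.0 to 0.8) so the first and last topic do not end up with the same hue.
    """
    n = len(ordered_topic_ids)
    colors = {}
    for position, topic_id in enumerate(ordered_topic_ids):
        hue = 0.8 * position / max(n - 1, 1)  # neighbours get nearby hues
        r, g, b = colorsys.hsv_to_rgb(hue, 0.6, 0.9)
        colors[topic_id] = "#{:02x}{:02x}{:02x}".format(
            round(r * 255), round(g * 255), round(b * 255)
        )
    return colors


def average_mixture(mixtures):
    """Component-wise mean of a list of equally long topic mixtures."""
    k = len(mixtures[0])
    return [sum(m[i] for m in mixtures) / len(mixtures) for i in range(k)]


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def recommend(doc_mixtures, shortlist, top_n=3):
    """Rank documents outside the shortlist by similarity to its average mixture.

    doc_mixtures maps a document id to its topic mixture (a list of weights);
    shortlist is a list of document ids already selected by the user.
    Cosine similarity is an assumed choice, not the tool's documented measure.
    """
    target = average_mixture([doc_mixtures[d] for d in shortlist])
    candidates = [d for d in doc_mixtures if d not in shortlist]
    candidates.sort(key=lambda d: cosine(doc_mixtures[d], target), reverse=True)
    return candidates[:top_n]
```

With per-document topic mixtures from any LDA implementation, `recommend(doc_mixtures, shortlist)` returns the ids of the documents whose mixtures are closest to the shortlist average, and `topic_colors` can be applied to the topic ids once they have been arranged in their similarity order.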


topic model, document browser



Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Alexander Hinneburg (1)
  • Rico Preiss (1)
  • René Schröder (1)
  1. Informatik, Martin-Luther-University Halle-Wittenberg, Halle, Germany
