Discovering Emerging Topics in Unlabelled Text Collections

  • Rene Schult
  • Myra Spiliopoulou
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4152)


As document collections accumulate over time, some of the subjects discussed in them go out of fashion while new ones emerge, so existing classification schemes need to be updated. In this paper, we address the challenge of finding emerging and persistent “themes”, i.e. subjects that live long enough to be incorporated into a taxonomy or ontology describing the document collection. We focus on identifying cluster labels that “survive” changes in the composition of the underlying document population, including changes in the feature space of dominant words, because the terminology of the document archive also changes over time. We have conducted a set of promising experiments on the identification of themes that manifested themselves in section H.2.8 of the ACM digital library, and we juxtapose these themes with the classes foreseen in the ACM taxonomy for this section.
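
The notion of a cluster label that “survives” across time can be illustrated with a minimal sketch. It is not the method of the paper: it assumes that each time period has already been clustered, that every cluster is labelled by a set of its dominant words, and that two labels from consecutive periods count as the same theme when their Jaccard overlap reaches a threshold. The function names, the threshold, and the toy data are all hypothetical.

```python
from typing import FrozenSet, List, Tuple

Label = FrozenSet[str]
Chain = List[Tuple[int, Label]]  # (period index, label) pairs


def jaccard(a: Label, b: Label) -> float:
    """Share of words common to both labels (assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def track_labels(periods: List[List[Label]], min_overlap: float = 0.5) -> List[Chain]:
    """Link each label to its best match in the next period, if similar enough."""
    finished: List[Chain] = []
    active: List[Chain] = []
    for t, labels in enumerate(periods):
        unmatched = set(labels)
        next_active: List[Chain] = []
        for chain in active:
            last = chain[-1][1]
            best = max(labels, key=lambda l: jaccard(last, l), default=None)
            if best is not None and jaccard(last, best) >= min_overlap:
                chain.append((t, best))      # the theme survives into period t
                next_active.append(chain)
                unmatched.discard(best)
            else:
                finished.append(chain)       # the theme disappears
        # Labels that continue no earlier chain start new (emerging) themes.
        next_active.extend([(t, l)] for l in labels if l in unmatched)
        active = next_active
    return finished + active


def persistent_themes(periods, min_overlap=0.5, min_periods=3) -> List[Chain]:
    """Keep only themes observed in at least `min_periods` consecutive periods."""
    return [c for c in track_labels(periods, min_overlap) if len(c) >= min_periods]


# Toy data: three periods, each with two hypothetical cluster labels.
example_periods = [
    [frozenset({"data", "mining", "rule"}), frozenset({"image", "retrieval", "query"})],
    [frozenset({"data", "mining", "stream"}), frozenset({"web", "usage", "log"})],
    [frozenset({"data", "mining", "stream"}), frozenset({"web", "usage", "page"})],
]
themes = persistent_themes(example_periods)
```

In this toy run only the “data mining” label persists through all three periods, although one of its words changes along the way; the “web usage” label emerges in the second period and the “image retrieval” label vanishes after the first. Two chains may attach to the same label in this sketch, which a real analysis would have to treat as a merge of themes.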


Data Mining; Feature Space; Image Retrieval; Image Database; Document Collection
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Rene Schult (1)
  • Myra Spiliopoulou (1)
  1. Institute of Technical and Business Information Systems, Otto-von-Guericke-University Magdeburg, Germany
