Exploration of a Text Collection and Identification of Topics by Clustering

  • Antoine Naud
  • Shiro Usui
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4881)


This paper presents an application of cluster analysis to identify topics in a collection of poster abstracts from the 2006 Society for Neuroscience (SfN) Annual Meeting. Topics were identified by selecting, from the abstracts belonging to each cluster, the terms with the highest scores under different ranking schemes. The ranking scheme based on log-entropy performed better in this task than the more classical TF-IDF schemes. The extracted topics were evaluated by assigning each cluster to one dominant category and comparing them with previously defined thematic categories for which titles are available. The results show that repeated bisecting k-means performs better than standard k-means.
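The term-ranking and cluster-labelling steps summarized in the abstract can be illustrated with a minimal sketch. It assumes the standard log-entropy weighting (local weight log(1 + tf), global weight 1 + Σ p log p / log n); the exact variant used in the paper is not stated in the abstract. The function names, the choice to sum weights over a cluster's documents, and the tie-breaking rule are hypothetical and are not the authors' implementation.

```python
import math
from collections import Counter, defaultdict


def log_entropy_weights(docs):
    """Return one {term: weight} dict per document (docs: list of token lists).

    Assumed scheme: weight = log(1 + tf) * (1 + sum_j p_ij * log(p_ij) / log(n)),
    where p_ij = tf_ij / (total count of term i) and n = number of documents.
    """
    n = len(docs)
    counts = [Counter(doc) for doc in docs]
    global_freq = Counter()
    for c in counts:
        global_freq.update(c)

    # Accumulate sum_j p_ij * log(p_ij) for every term (always <= 0).
    entropy_sum = defaultdict(float)
    for c in counts:
        for term, tf in c.items():
            p = tf / global_freq[term]
            entropy_sum[term] += p * math.log(p)

    # Global weight is 1 for a term confined to one document and
    # 0 for a term spread evenly over all n documents.
    log_n = math.log(n) if n > 1 else 1.0
    global_weight = {t: 1.0 + entropy_sum[t] / log_n for t in global_freq}

    return [
        {t: math.log(1 + tf) * global_weight[t] for t, tf in c.items()}
        for c in counts
    ]


def top_terms_per_cluster(weights, labels, k=3):
    """Sum term weights over each cluster's documents; keep the k best terms.

    Summation over the cluster is an illustrative assumption.
    Ties are broken alphabetically so the result is deterministic.
    """
    totals = defaultdict(lambda: defaultdict(float))
    for doc_weights, label in zip(weights, labels):
        for term, w in doc_weights.items():
            totals[label][term] += w
    return {
        label: [t for t, _ in sorted(scores.items(), key=lambda x: (-x[1], x[0]))[:k]]
        for label, scores in totals.items()
    }


def dominant_categories(labels, categories):
    """Assign each cluster to the category most frequent among its documents."""
    by_cluster = defaultdict(Counter)
    for label, category in zip(labels, categories):
        by_cluster[label][category] += 1
    return {label: c.most_common(1)[0][0] for label, c in by_cluster.items()}


# Toy usage with made-up tokens and cluster labels (not data from the paper).
docs = [
    ["neuron", "spike", "neuron"],
    ["spike", "neuron", "the"],
    ["retina", "vision", "the"],
    ["vision", "retina", "retina"],
]
labels = [0, 0, 1, 1]
topics = top_terms_per_cluster(log_entropy_weights(docs), labels, k=2)
```

In this toy input the word "the" occurs in both clusters, so its global weight is lower than that of a term confined to a single document and it falls out of each cluster's top two terms. The cluster labels here are given by hand; in practice they would come from a clustering step such as k-means or repeated bisecting k-means.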


Nonnegative Matrix Factorization; Vector Space Model; Document Frequency; Original Category; Ranking Scheme
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Antoine Naud (1, 2)
  • Shiro Usui (1)
  1. RIKEN Brain Science Institute, 2-1 Hirosawa, Wako City, 351-0198 Saitama, Japan
  2. Department of Informatics, N. Copernicus University, ul. Grudziadzka 5, 87-100 Torun, Poland
