Document Clustering by Semantic Smoothing and Dynamic Growing Cell Structure (DynGCS) for Biomedical Literature

  • Min Song
  • Xiaohua Hu
  • Illhoi Yoo
  • Eric Koppel
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5182)


The general goal of clustering is to group data elements so that intra-group similarity is high and inter-group similarity is low. In this paper, we propose a novel hybrid clustering technique that incorporates semantic smoothing of document models into a neural network framework. Semantic smoothing has recently been reported to improve retrieval quality in Information Retrieval (IR). Inspired by that result, we apply a context-sensitive semantic smoothing model to boost the accuracy of the clusters generated by a dynamic growing cell structure algorithm, a variant of the neural network technique. We evaluated the proposed technique on article sets from MEDLINE, the largest biomedical digital library. Our experimental evaluations show that the proposed algorithm significantly improves clustering quality over traditional clustering techniques.
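The clustering goal stated in the abstract can be illustrated with a minimal sketch. It is not the DynGCS algorithm or the semantic smoothing model from the paper. It only measures average intra-cluster and inter-cluster cosine similarity for a given labeling. The function names, the toy term-count vectors, and both labelings are assumptions made up for illustration.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length term-weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def intra_inter_similarity(vectors, labels):
    """Return (mean intra-cluster, mean inter-cluster) pairwise similarity."""
    intra, inter = [], []
    for i, j in combinations(range(len(vectors)), 2):
        sim = cosine(vectors[i], vectors[j])
        if labels[i] == labels[j]:
            intra.append(sim)
        else:
            inter.append(sim)

    def mean(xs):
        return sum(xs) / len(xs) if xs else 0.0

    return mean(intra), mean(inter)


# Hypothetical toy data: four documents over a four-term vocabulary.
docs = [
    [3, 1, 0, 0],
    [2, 1, 0, 0],
    [0, 0, 1, 3],
    [0, 0, 2, 2],
]

# A labeling that groups similar documents, and one that mixes them.
good_labels = [0, 0, 1, 1]
bad_labels = [0, 1, 0, 1]

intra_good, inter_good = intra_inter_similarity(docs, good_labels)
intra_bad, inter_bad = intra_inter_similarity(docs, bad_labels)

print(f"good labeling: intra={intra_good:.3f} inter={inter_good:.3f}")
print(f"bad labeling:  intra={intra_bad:.3f} inter={inter_bad:.3f}")
```

Under this sketch, a labeling that matches the stated goal yields a higher intra-cluster than inter-cluster average, while a labeling that mixes dissimilar documents reverses the relationship.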


Keywords: document clustering, feature selection, neural network





Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Min Song (1)
  • Xiaohua Hu (2)
  • Illhoi Yoo (3)
  • Eric Koppel (4)
  1. Information Systems, New Jersey Institute of Technology, USA
  2. College of Information Science and Technology, Drexel University, USA
  3. Department of Health Management and Informatics, School of Medicine, University of Missouri-Columbia, USA
  4. Computer Science, New Jersey Institute of Technology
