A Comparative Study on Text Clustering Methods

  • Yan Zheng
  • Xiaochun Cheng
  • Ronghuai Huang
  • Yi Man
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4093)


Text clustering is one of the most important research areas in text mining, which processes text automatically to discover implicit knowledge. It groups texts into clusters by content, without a priori knowledge. In this paper, we study different text clustering methods and three text clustering validation criteria, and we use those criteria to evaluate the experimental results. We compare the effectiveness of the k-means and FIHC text clustering methods experimentally and discuss the differing quality of the resulting text clusters.
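As a rough illustration of the kind of method compared here, the following is a minimal sketch of k-means applied to short texts. It is not the authors' implementation: the toy documents, the length-normalised term-frequency representation, the fixed initial centroids, and all function names (`tf_vector`, `sq_dist`, `kmeans`) are assumptions chosen for the example. FIHC and the validation criteria are not sketched.

```python
from collections import Counter
import math


def tf_vector(text, vocab):
    """Length-normalised term-frequency vector over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    vec = [float(counts[w]) for w in vocab]
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, init_indices, max_iter=20):
    """Plain k-means; initial centroids are the vectors at init_indices
    (an assumption that keeps the example deterministic)."""
    centroids = [list(vectors[i]) for i in init_indices]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each document goes to its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda c: sq_dist(v, centroids[c]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(len(centroids)):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical toy corpus: two finance texts and two sports texts.
docs = [
    "stock market shares trading",
    "market shares fall stock",
    "football match goal team",
    "team wins football match",
]
vocab = sorted({w for d in docs for w in d.lower().split()})
vectors = [tf_vector(d, vocab) for d in docs]
labels = kmeans(vectors, init_indices=[0, 2])
print(labels)
```

In this toy setting the two finance texts share one label and the two sports texts share the other. On real collections the result depends strongly on the initial centroids and on the choice of k, which is one reason external validation criteria are needed to compare methods.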


Recall Rate; Text Cluster; Division Function; Feedback Neural Network; Apriori Knowledge
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Yan Zheng (1, 2)
  • Xiaochun Cheng (2, 3)
  • Ronghuai Huang (2)
  • Yi Man (1)
  1. School of Computer Science and Technology, Beijing University of Posts and Telecommunications, Beijing, China
  2. Knowledge Science and Engineering Institute, Beijing Normal University, Beijing, China
  3. Middlesex University, UK
