Clustering Text Documents Using Kernel Possibilistic C-Means

  • M. B. Revanasiddappa
  • B. S. Harish
  • S. V. Aruna Kumar
Conference paper
Part of the Lecture Notes in Networks and Systems book series (LNNS, volume 14)

Abstract

Text document clustering is one of the classic topics in text mining: it groups text documents in an unsupervised way. Various clustering techniques are available for this task. Fuzzy C-Means (FCM) is one of the most popular fuzzy clustering algorithms, but it is highly sensitive to noise. Possibilistic C-Means (PCM) overcomes this drawback by relaxing the probabilistic constraint on the membership function. In this paper, we propose a Kernel Possibilistic C-Means (KPCM) method for text document clustering. Unlike the classical Possibilistic C-Means algorithm, the proposed method uses a kernel distance metric to calculate the distance between a cluster center and a text document. We used the standard 20NewsGroups dataset for experimentation and compared the proposed method (KPCM) with Fuzzy C-Means, Kernel Fuzzy C-Means and Possibilistic C-Means. The experimental results show that Kernel Possibilistic C-Means outperforms the other methods in terms of accuracy.
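
The abstract does not give the update equations, so the following is only a minimal sketch of one common way to kernelize Possibilistic C-Means, not necessarily the formulation used in the paper. It assumes a Gaussian kernel with width `sigma`, the kernel-induced squared distance 2(1 - K(x, v)), a fuzzifier `m`, and a fixed scale parameter `eta` per cluster (in practice `eta` is usually estimated from a preliminary FCM run). The function names `gaussian_kernel` and `kpcm_step` and the toy data are hypothetical.

```python
import numpy as np

def gaussian_kernel(X, V, sigma):
    """Return K with K[i, j] = exp(-||V[i] - X[j]||^2 / sigma^2)."""
    sq = ((V[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq / sigma**2)

def kpcm_step(X, V, eta, m=2.0, sigma=1.0):
    """One assumed KPCM iteration: update typicalities, then centers."""
    K = gaussian_kernel(X, V, sigma)      # shape (clusters, documents)
    d2 = 2.0 * (1.0 - K)                  # kernel-induced squared distance
    # Possibilistic typicality: each cluster is scored independently,
    # so the columns of T are NOT forced to sum to 1 as they are in FCM.
    T = 1.0 / (1.0 + (d2 / eta[:, None]) ** (1.0 / (m - 1.0)))
    # Center update for a Gaussian kernel, kept in the input space.
    W = (T ** m) * K
    V_new = (W @ X) / W.sum(axis=1, keepdims=True)
    return T, V_new

# Toy data (assumed): two tight groups plus one far-away noise point.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1],
              [20.0, 20.0]])
V = np.array([[0.2, 0.2], [4.8, 4.8]])    # initial centers (assumed)
eta = np.ones(2)                          # fixed scale per cluster (assumed)

for _ in range(20):
    T, V = kpcm_step(X, V, eta)

print(np.round(V, 3))
print(np.round(T, 3))
```

In this toy run the noise point receives a low typicality for both clusters and barely influences the centers, which illustrates the noise robustness that the abstract attributes to relaxing the probabilistic constraint. Real term-document vectors would replace `X`.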

Keywords

Text document clustering · Term document matrix · Fuzzy C-Means · Possibilistic C-Means

Copyright information

© Springer Nature Singapore Pte Ltd. 2018

Authors and Affiliations

  • M. B. Revanasiddappa (1)
  • B. S. Harish (1)
  • S. V. Aruna Kumar (1)
  1. Department of Information Science and Engineering, Sri Jayachamarajendra College of Engineering, Mysore, India
