Applied Intelligence, Volume 45, Issue 2, pp 293–304

Ambiguity-driven fuzzy C-means clustering: how to detect uncertain clustered records

Abstract

As a well-known clustering algorithm, Fuzzy C-Means (FCM) allows each input sample to belong to more than one cluster, providing more flexibility than non-fuzzy clustering methods. However, the accuracy of FCM is subject to false detections caused by noisy records, weak feature selection, and low certainty of the algorithm in some cases. Such false detections matter greatly in decision-making domains like network security and medical diagnosis, where weak decisions based on them may lead to catastrophic outcomes. They mainly arise when decisions are made about a subset of records that do not provide sufficient evidence for a good decision. In this paper, we propose a method for detecting such ambiguous records in FCM by introducing a certainty factor to decrease invalid detections. This approach enables us to send the detected ambiguous records to another discrimination method for deeper investigation, thus increasing accuracy by lowering the error rate. Most of the records are still processed quickly and with a low error rate, which prevents the performance loss that is common in similar hybrid methods. Experimental results of applying the proposed method to several datasets from different domains show a significant decrease in error rate as well as improved sensitivity of the algorithm.
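
The idea of setting aside low-certainty records can be illustrated with a minimal sketch. The abstract does not define the certainty factor, so the score below (the gap between a record's two largest FCM memberships) is only an assumed stand-in, and the names `certainty`, `split_by_certainty`, and the `threshold` value are hypothetical. The membership formula itself is the standard FCM one for fixed cluster centers and requires a fuzzifier m > 1.

```python
import math

def fcm_memberships(point, centers, m=2.0):
    """Standard FCM membership of one point in each cluster, for fixed centers."""
    dists = [math.dist(point, c) for c in centers]
    # A point lying exactly on a center belongs fully to that cluster.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        total = sum(hits)
        return [h / total for h in hits]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** exponent for d_k in dists) for d_i in dists]

def certainty(memberships):
    """Assumed certainty score (not the paper's): gap between the top two memberships."""
    top, second = sorted(memberships, reverse=True)[:2]
    return top - second

def split_by_certainty(points, centers, threshold=0.3, m=2.0):
    """Return (confident, ambiguous); confident holds (point, cluster_index) pairs."""
    confident, ambiguous = [], []
    for p in points:
        u = fcm_memberships(p, centers, m)
        if certainty(u) >= threshold:
            confident.append((p, u.index(max(u))))
        else:
            # Ambiguous records would be handed to a second, slower method.
            ambiguous.append(p)
    return confident, ambiguous

centers = [(0.0, 0.0), (10.0, 0.0)]
points = [(1.0, 0.0), (9.0, 1.0), (5.0, 0.5)]
confident, ambiguous = split_by_certainty(points, centers)
```

In this toy setup the two points near a center are assigned directly, while the point equidistant from both centers is routed to the ambiguous list. This mirrors the hybrid arrangement the abstract describes, in which only the uncertain minority incurs the cost of deeper investigation.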

Keywords

FCM clustering; Intrusion detection; Classification with ambiguity; Certainty factor; Location privacy; Fuzzy image segmentation


Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  1. Department of Electrical and Computer Engineering, Isfahan University of Technology, Isfahan, Iran
