International Journal of Fuzzy Systems, Volume 19, Issue 5, pp 1585–1602

Tune Up Fuzzy C-Means for Big Data: Some Novel Hybrid Clustering Algorithms Based on Initial Selection and Incremental Clustering

Abstract

Data volumes keep growing, and much of that data is essential to business operations. This rapid growth raises challenges related to data complexity and to capturing the most important knowledge in reasonable time. Fuzzy C-means (FCM), one of the most efficient clustering algorithms and widely used in pattern recognition, data compression, image segmentation, computer vision and many other fields, also struggles to process large datasets. In this paper, we propose novel hybrid clustering algorithms based on incremental clustering and initial selection to tune up FCM for the Big Data problem. The first algorithm uses rectangular meshes covering the data points as the representatives, while the second uses data points that have high influence on others as the representatives. The representatives are then clustered by FCM, and the resulting centers are used as the initial centers for clustering the entire dataset. We analyze the new algorithms theoretically, including a comparison of solution quality when clustering the representative set versus the entire set. Experimental results on both simulated and real datasets show that the total computational time of the new methods, including the time to find the representatives and the time to cluster, is lower than that of other relevant algorithms. Clustering quality is also validated. The findings are significant for research in soft computing and Big Data processing: computing methodologies now face huge amounts of diverse and complex data structures, and processing speed is a main priority when assessing the effectiveness of a method. The paper presents practical algorithms and investigates their characteristics, which other researchers can reference in similar applications. The usefulness and significance of this research are demonstrated in real-life applications.
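
The two-stage idea described in the abstract (find representatives, cluster them with FCM, then reuse the resulting centers as initial centers for the entire dataset) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the uniform grid with `cells_per_axis` cells, the use of per-cell means as representatives, and the farthest-point choice of starting centers for the first stage are all assumptions made for illustration only.

```python
def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def fcm(points, centers, m=2.0, max_iter=100, tol=1e-6):
    """Plain fuzzy C-means, started from the given initial centers."""
    dim = len(points[0])
    centers = [list(c) for c in centers]
    for _ in range(max_iter):
        sums = [[0.0] * dim for _ in centers]
        weights = [0.0] * len(centers)
        for x in points:
            # Floor the squared distances to avoid division by zero.
            d2 = [max(dist2(x, c), 1e-12) for c in centers]
            # Membership is proportional to d^(-2/(m-1)), normalised to sum to 1.
            inv = [d ** (-1.0 / (m - 1.0)) for d in d2]
            total = sum(inv)
            for k, v in enumerate(inv):
                w = (v / total) ** m
                weights[k] += w
                for j in range(dim):
                    sums[k][j] += w * x[j]
        new = [[s / w for s in row] for row, w in zip(sums, weights)]
        shift = max(abs(a - b) for old, cur in zip(centers, new)
                    for a, b in zip(old, cur))
        centers = new
        if shift < tol:
            break
    return centers

def grid_representatives(points, cells_per_axis=10):
    """Assumed representative rule: the mean of each occupied grid cell."""
    dim = len(points[0])
    lo = [min(p[j] for p in points) for j in range(dim)]
    hi = [max(p[j] for p in points) for j in range(dim)]
    buckets = {}
    for p in points:
        key = tuple(
            min(int((p[j] - lo[j]) / ((hi[j] - lo[j]) or 1.0) * cells_per_axis),
                cells_per_axis - 1)
            for j in range(dim))
        buckets.setdefault(key, []).append(p)
    return [[sum(col) / len(cell) for col in zip(*cell)]
            for cell in buckets.values()]

def farthest_point_init(points, c):
    """Illustrative deterministic start: repeatedly pick the farthest point."""
    chosen = [points[0]]
    while len(chosen) < c:
        chosen.append(max(points,
                          key=lambda p: min(dist2(p, q) for q in chosen)))
    return chosen

def two_stage_fcm(points, c):
    """Cluster the representatives first, then refine on the full dataset."""
    reps = grid_representatives(points)
    rep_centers = fcm(reps, farthest_point_init(reps, c))
    return fcm(points, rep_centers)
```

Calling `two_stage_fcm(data, 3)` on a list of coordinate tuples returns three centers. The first FCM run touches only the (much smaller) representative set, so the second run on the full data starts near a solution and should need fewer iterations; how much is saved depends on the data and the grid resolution.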

Keywords

Big data, Density-based clustering, Fuzzy C-means, Grid-based clustering, Incremental clustering, Initial selection

Notes

Acknowledgments

The authors are greatly indebted to the editor-in-chief, Prof. Shun-Feng Su, and the anonymous reviewers for their comments and valuable suggestions, which improved the quality and clarity of the paper. We sincerely thank MSc. Nguyen Duc Thien for his discussions and support in the theoretical validation of this paper. We acknowledge the Center for High Performance Computing, VNU, for running the code on the IBM 1350 system.

Copyright information

© Taiwan Fuzzy Systems Association and Springer-Verlag Berlin Heidelberg 2016

Authors and Affiliations

  1. Division of Data Science, Ton Duc Thang University, Ho Chi Minh City, Vietnam
  2. Faculty of Information Technology, Ton Duc Thang University, Ho Chi Minh City, Vietnam
  3. People’s Police University of Technology and Logistics, Bac Ninh, Vietnam
  4. VNU University of Science, Vietnam National University, Hanoi, Vietnam
