
A hybrid clustering algorithm based on missing attribute interval estimation for incomplete data

  • Short Paper
Pattern Analysis and Applications

Abstract

Data sets with partially missing attributes are a common problem in cluster analysis. We propose a hybrid algorithm that combines fuzzy clustering with particle swarm optimization (PSO) to cluster incomplete data, with missing attributes represented as intervals. We further develop a neighbor interval reconstruction (NIR) method that uses pre-classification results and the nearest-neighbor rule to estimate a nearest-neighbor interval for each missing attribute. Using the pre-classification prevents the interval endpoints from being determined by information from different classes, which improves the accuracy of the missing-attribute intervals and makes the imputation more robust. The PSO and fuzzy c-means (FCM) hybrid algorithm is then used to cluster the resulting interval-valued data set; the global optimization ability of PSO can improve clustering accuracy compared with gradient-based optimization methods. Experimental results on several UCI data sets show the superiority of the proposed NIR hybrid algorithm.
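The abstract does not spell out the NIR procedure, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes: pre-classification labels are supplied by the caller; donors for a missing value are ranked by a partial distance over the attributes both samples observe (rescaled to the full dimension); the interval for a missing value is the [min, max] of that attribute over the q nearest donors in the same pre-classified group; and clustering is a plain alternating fuzzy c-means on the interval endpoints with a farthest-first initialization. The names `nn_intervals`, `dist2` and `interval_fcm`, and the parameters `q`, `m` and `eps`, are hypothetical.

```python
import random  # not required by the sketch itself; kept out of the logic on purpose
from typing import List, Optional, Tuple

Interval = Tuple[float, float]


def nn_intervals(X: List[List[Optional[float]]], labels: List[int],
                 q: int = 2) -> List[List[Interval]]:
    """Turn each sample into a row of intervals (hypothetical NIR-style sketch).

    Observed values become degenerate intervals [x, x]. A missing value (None)
    becomes [min, max] of that attribute over the q nearest donors that share
    the sample's pre-classification label and observe the attribute.
    """
    n, s = len(X), len(X[0])
    out: List[List[Interval]] = []
    for i in range(n):
        row: List[Interval] = []
        for k in range(s):
            if X[i][k] is not None:
                row.append((X[i][k], X[i][k]))
                continue
            donors = []
            for j in range(n):
                if j == i or labels[j] != labels[i] or X[j][k] is None:
                    continue
                shared = [t for t in range(s)
                          if X[i][t] is not None and X[j][t] is not None]
                if not shared:
                    continue
                # Partial distance over co-observed attributes, rescaled to s dims (assumption).
                d = sum((X[i][t] - X[j][t]) ** 2 for t in shared) * s / len(shared)
                donors.append((d, X[j][k]))
            donors.sort()
            vals = [v for _, v in donors[:q]]
            if not vals:
                raise ValueError(f"no same-group donor for sample {i}, attribute {k}")
            row.append((min(vals), max(vals)))
        out.append(row)
    return out


def dist2(x: List[Interval], v: List[Interval]) -> float:
    """Squared distance between interval vectors, summed over both endpoints."""
    return sum((a - va) ** 2 + (b - vb) ** 2 for (a, b), (va, vb) in zip(x, v))


def interval_fcm(D: List[List[Interval]], c: int, m: float = 2.0,
                 iters: int = 100, eps: float = 1e-12):
    """Plain alternating fuzzy c-means on interval data (no PSO stage)."""
    n, s = len(D), len(D[0])
    # Farthest-first initialisation (assumption): start from D[0], then add the
    # sample farthest from all prototypes chosen so far.
    V = [list(D[0])]
    while len(V) < c:
        far = max(range(n), key=lambda i: min(dist2(D[i], v) for v in V))
        V.append(list(D[far]))
    U: List[List[float]] = []
    for _ in range(iters):
        # Membership update: u_ir = 1 / sum_l (d2_ir / d2_il)^(1/(m-1)).
        U = []
        for x in D:
            d2 = [dist2(x, v) + eps for v in V]
            U.append([1.0 / sum((d2[r] / d2[l]) ** (1.0 / (m - 1)) for l in range(c))
                      for r in range(c)])
        # Prototype update: membership-weighted mean of lower and upper endpoints.
        V = []
        for r in range(c):
            w = [U[i][r] ** m for i in range(n)]
            tot = sum(w)
            V.append([(sum(w[i] * D[i][k][0] for i in range(n)) / tot,
                       sum(w[i] * D[i][k][1] for i in range(n)) / tot)
                      for k in range(s)])
    return U, V


if __name__ == "__main__":
    X = [[1.0, 2.0], [1.2, None], [1.1, 2.2],
         [8.0, 9.0], [8.2, 9.1], [None, 8.8]]
    D = nn_intervals(X, labels=[0, 0, 0, 1, 1, 1], q=2)
    U, V = interval_fcm(D, c=2)
    print(D[1][1], D[5][0])  # (2.0, 2.2) (8.0, 8.2)
```

The PSO global-search stage described in the abstract is not included here; in this sketch the prototypes are refined only by the standard FCM alternating updates.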

Fig. 1
Fig. 2

Acknowledgments

This work is supported by the National Natural Science Foundation of China (No. 61174115, No. 51104044).

Author information

Corresponding author

Correspondence to Zhaohong Bing.

About this article

Cite this article

Zhang, L., Bing, Z. & Zhang, L. A hybrid clustering algorithm based on missing attribute interval estimation for incomplete data. Pattern Anal Applic 18, 377–384 (2015). https://doi.org/10.1007/s10044-014-0376-8

  • DOI: https://doi.org/10.1007/s10044-014-0376-8
