A hybrid clustering algorithm based on missing attribute interval estimation for incomplete data

Zhang, Li; Bing, Zhaohong; Zhang, Liyong

doi:10.1007/s10044-014-0376-8

Short Paper
Published: 01 June 2014

Volume 18, pages 377–384 (2015)

Pattern Analysis and Applications

Li Zhang¹, Zhaohong Bing¹ & Liyong Zhang²

Abstract

Partially missing data sets are a prevailing problem in cluster analysis. We propose a hybrid algorithm that combines fuzzy clustering with particle swarm optimization (PSO) for clustering incomplete data, in which missing attributes are represented as intervals. Furthermore, we develop a neighbor interval reconstruction (NIR) method based on pre-classification results. NIR estimates the nearest-neighbor interval of each missing attribute using the nearest-neighbor rule while avoiding interval endpoints that are determined by information from different classes, thereby improving the accuracy of the missing attribute intervals and enhancing the robustness of missing attribute imputation. The hybrid PSO and fuzzy c-means algorithm is then used to cluster the interval-valued data set; the global optimization ability of PSO can improve the accuracy of the clustering results compared with gradient-based optimization methods. Experimental results on several UCI data sets show the superiority of the proposed NIR hybrid algorithm.
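The abstract does not spell out the NIR procedure, so the following is only a minimal sketch of the general idea of nearest-neighbor intervals, not the authors' implementation. It assumes that neighbors are ranked by a partial distance computed over the attributes two samples both observe, that each missing attribute is bounded by the minimum and maximum of that attribute among the q nearest neighbors, and that pre-classification results arrive as an optional array of labels used to restrict the neighbor search. The function names, the parameter q, and the toy data are hypothetical.

```python
import numpy as np

def partial_distance(a, b):
    """Squared Euclidean distance over attributes observed in both
    vectors, rescaled by (total attributes / shared attributes)."""
    shared = ~np.isnan(a) & ~np.isnan(b)
    n_shared = shared.sum()
    if n_shared == 0:
        return np.inf  # no common attributes: treat as infinitely far
    diff = a[shared] - b[shared]
    return a.size / n_shared * np.sum(diff ** 2)

def neighbor_intervals(X, q=3, labels=None):
    """Return (lower, upper) bounds for every entry of X.

    Observed entries get a degenerate interval [x, x]. A missing entry
    (NaN) gets [min, max] of that attribute over its q nearest
    neighbors. If labels is given (assumed pre-classification output),
    only samples sharing the label are candidate neighbors."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    lower, upper = X.copy(), X.copy()
    for i in range(n):
        missing = np.flatnonzero(np.isnan(X[i]))
        if missing.size == 0:
            continue
        cand = [j for j in range(n)
                if j != i and (labels is None or labels[j] == labels[i])]
        dists = np.array([partial_distance(X[i], X[j]) for j in cand])
        order = np.argsort(dists)
        for k in missing:
            # q nearest candidates that actually observe attribute k
            vals = [X[cand[j], k] for j in order
                    if np.isfinite(dists[j]) and not np.isnan(X[cand[j], k])][:q]
            if vals:  # otherwise the bounds stay NaN
                lower[i, k], upper[i, k] = min(vals), max(vals)
    return lower, upper

# Toy data: two well-separated groups; sample 2 is missing attribute 1.
X = np.array([[1.0, 2.0], [1.1, 2.2], [0.9, np.nan],
              [5.0, 9.0], [5.2, 9.5]])
labels = np.array([0, 0, 0, 1, 1])

lo_plain, hi_plain = neighbor_intervals(X, q=3)
lo_pre, hi_pre = neighbor_intervals(X, q=3, labels=labels)
print(lo_plain[2, 1], hi_plain[2, 1])
print(lo_pre[2, 1], hi_pre[2, 1])
```

In this toy case the unrestricted search pulls one neighbor from the other group, which stretches the interval; restricting the search to samples with the same label keeps both endpoints inside one group. That is the kind of effect the abstract attributes to using pre-classification results.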

Acknowledgments

This work was supported by the National Natural Science Foundation of China (Nos. 61174115 and 51104044).

Author information

Authors and Affiliations

School of Information, Liaoning University, Shenyang, 110036, China
Li Zhang & Zhaohong Bing
School of Control Science and Engineering, Dalian University of Technology, Dalian, 116024, China
Liyong Zhang

Corresponding author

Correspondence to Zhaohong Bing.

About this article

Cite this article

Zhang, L., Bing, Z. & Zhang, L. A hybrid clustering algorithm based on missing attribute interval estimation for incomplete data. Pattern Anal Applic 18, 377–384 (2015). https://doi.org/10.1007/s10044-014-0376-8

Received: 16 January 2013
Accepted: 15 May 2014
Published: 01 June 2014
Issue Date: May 2015
DOI: https://doi.org/10.1007/s10044-014-0376-8
