Missing value imputation using unsupervised machine learning techniques

Raja, P. S.; Thangavel, K.

doi:10.1007/s00500-019-04199-6

Methodologies and Application
Published: 08 July 2019

Volume 24, pages 4361–4392, (2020)

Soft Computing

Abstract

In data mining, preprocessing is an essential step that involves data normalization, noise removal, handling of missing values, etc. This paper focuses on handling missing values using unsupervised machine learning techniques. Soft computing approaches are combined with clustering techniques to form a novel method for handling missing values, which helps to overcome the problem of inconsistency. A rough K-means centroid-based imputation method is proposed and compared with the K-means centroid-based, fuzzy C-means centroid-based, K-means parameter-based, fuzzy C-means parameter-based, and rough K-means parameter-based imputation methods. The experimental analysis is carried out on four benchmark datasets, viz. Dermatology, Pima, Wisconsin, and Yeast, taken from the UCI data repository. The proposed method proves effective across the different datasets, and the results are promising.

Acknowledgements

The authors would like to thank UGC, New Delhi, for the financial support received under the UGC Rajiv Gandhi National Fellowship (F1-17.1/2016-17/RGNF-2015-17-SC-TAM-28324) and the UGC Major Research Project (43-274/2014). The authors also extend their sincere thanks to the anonymous referees for their suggestions to improve the paper.

Author information

Authors and Affiliations

Department of Computer Science, Periyar University, Salem, Tamil Nadu, India
P. S. Raja & K. Thangavel

Corresponding author

Correspondence to P. S. Raja.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Communicated by V. Loia.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

Tables 10 to 21 present the information needed to impute the missing values of an object. In the centroid-based missing value imputation method, the missing values are imputed using the values of the closest cluster centroid. Each table lists the distance between the object with missing values and each of the K cluster centroids, together with the minimum-distance cluster and the minimum distance value.
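The centroid-based step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `impute_from_nearest_centroid`, the toy centroids, and the choice to measure Euclidean distance over the observed attributes only are assumptions made for illustration, and the centroids are taken as already computed by some clustering run (K-means, fuzzy C-means, or rough K-means).

```python
import numpy as np

def impute_from_nearest_centroid(obj, centroids):
    """Fill the NaN entries of `obj` with the values of its closest centroid.

    Hypothetical helper for illustration only. Distances are computed over
    the observed (non-NaN) attributes, which is an assumption of this sketch.
    """
    obj = np.asarray(obj, dtype=float)
    centroids = np.asarray(centroids, dtype=float)

    observed = ~np.isnan(obj)
    if not observed.any():
        raise ValueError("object has no observed attributes")

    # Euclidean distance from the object to each of the K centroids,
    # restricted to the attributes that are actually observed.
    diffs = centroids[:, observed] - obj[observed]
    distances = np.sqrt((diffs ** 2).sum(axis=1))

    # Minimum-distance cluster, as reported per object in the tables.
    nearest = int(np.argmin(distances))

    # Copy the centroid's values into the missing positions only.
    imputed = obj.copy()
    imputed[~observed] = centroids[nearest, ~observed]
    return imputed, distances, nearest

# Toy data (made up): two cluster centroids and one object missing attribute 2.
centroids = np.array([[1.0, 2.0, 3.0],
                      [8.0, 9.0, 10.0]])
obj = np.array([7.5, np.nan, 9.0])

imputed, distances, nearest = impute_from_nearest_centroid(obj, centroids)
print("distances to centroids:", distances)
print("minimum-distance cluster:", nearest)
print("imputed object:", imputed)
```

The returned `distances`, `nearest`, and `distances[nearest]` correspond to the three kinds of quantities the tables report: the distance to each centroid, the minimum-distance cluster, and the minimum distance value.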

Table 10 K-means centroid-based imputation method for Dermatology
