Abstract
Preprocessing is one of the steps in data mining; it involves noise removal, outlier identification, normalization, data transformation, handling of missing values, etc. Missing values are a common problem in large datasets. The most frequently used statistical method for handling missing values is to discard the instances that contain them. However, deleting such instances can cause a loss of essential information, which degrades the performance of statistical and machine learning algorithms. This paper focuses on handling missing values using unsupervised learning techniques. A Rough K-Means based missing value imputation method is proposed and compared with K-Means and Fuzzy C-Means based imputation methods. The experimental analysis is carried out on two datasets: Lung Cancer and Cleveland Heart. The proposed method achieves the best accuracy on some of the datasets.
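The abstract names Rough K-Means based imputation but does not spell out its steps. The following is a minimal sketch that assumes the common rough k-means formulation. Each object goes to the lower approximation of its nearest cluster unless other centroids are nearly as close, in which case it joins the boundary of all of them. Centroids are a weighted mix of the lower-approximation mean and the boundary mean. One plausible imputation scheme built on this clusters the complete rows and fills each missing attribute from the centroids whose upper approximation contains the incomplete row. The function names (`rough_kmeans`, `impute`), the ratio threshold (1.2), the lower weight `w_lower` (0.7), the first-k-rows seeding and the fill rule are all illustrative assumptions, not the authors' published procedure.

```python
import math
from typing import List, Optional, Sequence

Row = List[Optional[float]]  # None marks a missing attribute value


def _dist(x: Sequence[Optional[float]], c: Sequence[float]) -> float:
    """Euclidean distance computed only over the attributes observed in x."""
    return math.sqrt(sum((xi - ci) ** 2 for xi, ci in zip(x, c) if xi is not None))


def _mean(rows: Sequence[Sequence[float]]) -> List[float]:
    """Attribute-wise mean of a non-empty list of rows."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def _upper(x, centroids, threshold: float) -> List[int]:
    """Clusters whose upper approximation contains x.

    The nearest centroid i is always included; another centroid j is added
    when d(x, c_j) <= threshold * d(x, c_i). A single index means x lies in
    the lower approximation of cluster i; several mean x is in the boundary.
    """
    d = [_dist(x, c) for c in centroids]
    i = min(range(len(d)), key=d.__getitem__)
    return [i] + [j for j in range(len(d)) if j != i and d[j] <= threshold * d[i]]


def rough_kmeans(data, k, threshold=1.2, w_lower=0.7, max_iter=100):
    """Rough k-means on complete rows; returns the k centroids."""
    if len(data) < k:
        raise ValueError("need at least k complete rows")
    centroids = [list(r) for r in data[:k]]  # hypothetical seeding: first k rows
    for _ in range(max_iter):
        lower = [[] for _ in range(k)]
        boundary = [[] for _ in range(k)]
        for x in data:
            members = _upper(x, centroids, threshold)
            if len(members) == 1:
                lower[members[0]].append(x)
            else:
                for j in members:
                    boundary[j].append(x)
        new = []
        for j in range(k):
            if lower[j] and boundary[j]:
                lo, bd = _mean(lower[j]), _mean(boundary[j])
                new.append([w_lower * a + (1 - w_lower) * b for a, b in zip(lo, bd)])
            elif lower[j] or boundary[j]:
                new.append(_mean(lower[j] or boundary[j]))
            else:
                new.append(centroids[j])  # empty cluster keeps its old centroid
        if new == centroids:  # assignments stable, so centroids stopped moving
            break
        centroids = new
    return centroids


def impute(rows: List[Row], k: int, threshold=1.2, w_lower=0.7) -> List[List[float]]:
    """Fill each None from the mean of the centroids whose upper approximation holds the row."""
    complete = [[float(v) for v in r] for r in rows if None not in r]
    centroids = rough_kmeans(complete, k, threshold, w_lower)
    out = []
    for r in rows:
        if None not in r:
            out.append([float(v) for v in r])
            continue
        fill = _mean([centroids[j] for j in _upper(r, centroids, threshold)])
        out.append([f if v is None else v for v, f in zip(r, fill)])
    return out


rows = [[1.0, 1.0], [8.0, 8.0], [1.2, 0.8], [8.2, 7.9],
        [0.9, 1.1], [7.8, 8.1], [None, 1.0], [8.1, None]]
result = impute(rows, k=2)
print([[round(v, 3) for v in r] for r in result[-2:]])  # [[1.033, 1.0], [8.1, 8.0]]
```

With `threshold=1.0` the boundary region is populated only by exact distance ties, so the sketch behaves essentially like K-Means based imputation. This is one way to set up the kind of comparison the abstract describes.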
Acknowledgment
The present work is supported by Special Assistance Programme of University Grants Commission, New Delhi, India (Grant No. F.3-50/2011(SAP-II)).
Copyright information
© 2016 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Raja, P.S., Thangavel, K. (2016). Soft Clustering Based Missing Value Imputation. In: Subramanian, S., Nadarajan, R., Rao, S., Sheen, S. (eds) Digital Connectivity – Social Impact. CSI 2016. Communications in Computer and Information Science, vol 679. Springer, Singapore. https://doi.org/10.1007/978-981-10-3274-5_10
DOI: https://doi.org/10.1007/978-981-10-3274-5_10
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-3273-8
Online ISBN: 978-981-10-3274-5
eBook Packages: Computer Science, Computer Science (R0)