Soft Clustering Based Missing Value Imputation

Raja, P. S.; Thangavel, K.

doi:10.1007/978-981-10-3274-5_10

P. S. Raja & K. Thangavel

Part of the book series: Communications in Computer and Information Science (CCIS, volume 679)

Included in the following conference series:

Annual Convention of the Computer Society of India

Abstract

Preprocessing is one of the steps in data mining and involves noise removal, outlier identification, normalization, data transformation, handling of missing values, and so on. Missing values are a common problem in large datasets. The most frequently used statistical method for handling missing values is to discard the instances that contain them. Deleting such instances can sometimes cause a loss of essential information, which affects the performance of statistical and machine learning algorithms. This paper focuses on handling missing values using unsupervised learning techniques. Rough K-Means based missing value imputation is proposed and compared with K-Means and Fuzzy C-Means based imputation methods. The experimental analysis is carried out on two datasets, Lung Cancer and Cleveland Heart. The proposed method achieves the best accuracy for some of the datasets.
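The general idea of clustering-based imputation can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract gives no algorithmic details, so the function names, the farthest-first initialization, the use of partial distances over observed features, and the toy data are all assumptions made for illustration. Only a hard (K-Means style) and a soft (Fuzzy C-Means style) variant are shown; Rough K-Means is omitted because the abstract does not describe it.

```python
import numpy as np


def farthest_first_init(filled, k):
    """Pick k spread-out starting centroids deterministically (assumed heuristic)."""
    centroids = [filled[0]]
    for _ in range(1, k):
        # Distance of every row to its closest already-chosen centroid.
        d2 = np.min([((filled - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(filled[d2.argmax()])
    return np.array(centroids, dtype=float)


def cluster_impute(X, k=2, soft=False, m=2.0, n_iter=50):
    """Fill NaN entries of X from cluster centroids (illustrative sketch).

    soft=False: copy the coordinate of the nearest centroid (K-Means style).
    soft=True:  blend all centroids by fuzzy membership (Fuzzy C-Means style).
    Assumes every column has at least one observed value.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    observed = ~missing
    X0 = np.where(missing, 0.0, X)  # zeros are masked out wherever they are used
    filled = np.where(missing, np.nanmean(X, axis=0), X)  # mean-filled start
    centroids = farthest_first_init(filled, k)

    for _ in range(n_iter):
        # Partial squared distance: only observed features contribute.
        diff = (X0[:, None, :] - centroids[None, :, :]) * observed[:, None, :]
        d2 = np.maximum((diff ** 2).sum(axis=2), 1e-12)

        if soft:
            inv = d2 ** (-1.0 / (m - 1.0))
            U = inv / inv.sum(axis=1, keepdims=True)  # fuzzy memberships
            W = U ** m
        else:
            U = np.eye(k)[d2.argmin(axis=1)]  # one-hot hard assignment
            W = U

        # Weighted centroid update using observed values only.
        num = W.T @ X0
        den = W.T @ observed.astype(float)
        centroids = np.where(den > 1e-12, num / np.maximum(den, 1e-12), centroids)

    # Observed entries are returned unchanged; gaps come from the centroids.
    return np.where(missing, U @ centroids, X), U


# Toy data (hypothetical): two well-separated groups, one missing value.
X = np.array([[1.0, 1.1], [0.9, 1.0], [1.1, np.nan],
              [5.0, 5.1], [5.1, 4.9], [4.9, 5.0]])
hard, _ = cluster_impute(X, k=2, soft=False)
soft, U = cluster_impute(X, k=2, soft=True)
print(hard[2, 1], soft[2, 1])
```

On this toy data the hard variant fills the gap with the centroid coordinate of the nearby cluster (about 1.05), while the soft variant lands slightly higher because the distant cluster still contributes a small membership weight. Unlike discarding the incomplete row, both variants keep its observed value available to downstream algorithms.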

Acknowledgment

The present work is supported by the Special Assistance Programme of the University Grants Commission, New Delhi, India (Grant No. F.3-50/2011(SAP-II)).

Author information

Authors and Affiliations

Department of Computer Science, Periyar University, Salem, 636 011, India
P. S. Raja & K. Thangavel

Corresponding author

Correspondence to P. S. Raja.

Editor information

Editors and Affiliations

Karpagam Academy of Higher Education, Coimbatore, India
S. Subramanian
PSG College of Technology, Coimbatore, India
R. Nadarajan
International Institute of Information Technology, Bengaluru, Karnataka, India
Shrisha Rao
Applied Mathematics and Computational Sciences, PSG College of Technology, Coimbatore, Tamil Nadu, India
Shina Sheen

About this paper

Cite this paper

Raja, P.S., Thangavel, K. (2016). Soft Clustering Based Missing Value Imputation. In: Subramanian, S., Nadarajan, R., Rao, S., Sheen, S. (eds) Digital Connectivity – Social Impact. CSI 2016. Communications in Computer and Information Science, vol 679. Springer, Singapore. https://doi.org/10.1007/978-981-10-3274-5_10

DOI: https://doi.org/10.1007/978-981-10-3274-5_10
Published: 23 November 2016
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-3273-8
Online ISBN: 978-981-10-3274-5
eBook Packages: Computer Science, Computer Science (R0)
