Soft Clustering Based Missing Value Imputation

  • Conference paper
Digital Connectivity – Social Impact (CSI 2016)

Part of the book series: Communications in Computer and Information Science ((CCIS,volume 679))

Abstract

Preprocessing is one of the steps in data mining; it involves noise removal, outlier identification, normalization, data transformation, handling of missing values, and similar tasks. Missing values are a common problem in large datasets. The statistical method most frequently used to handle them is to discard the instances that contain missing values. Deleting such instances can sometimes cause a loss of essential information, which degrades the performance of statistical and machine learning algorithms. This paper focuses on handling missing values using unsupervised learning techniques. A Rough K-Means based missing value imputation method is proposed and compared with K-Means and Fuzzy C-Means based imputation methods. The experimental analysis is carried out on two datasets, Lung Cancer and Cleveland Heart. The proposed method achieves the best accuracy for some of the datasets.
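As an illustration of the general idea of clustering-based imputation, the following is a minimal sketch of the plain K-Means baseline mentioned above, not the Rough K-Means method proposed in the paper, whose details are not given in the abstract. The function name `kmeans_impute`, its parameters, the toy data, and the choice to assign rows to clusters using only their observed attributes are all assumptions made for this example.

```python
import numpy as np

def kmeans_impute(X, k=2, n_iter=20, seed=0):
    """Fill NaN cells with the matching coordinate of the assigned centroid.

    Hypothetical helper for illustration; assumes every column has at
    least one observed value and that k <= number of rows.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    # Initial guess: column means computed over observed values only.
    filled = np.where(missing, np.nanmean(X, axis=0), X)
    rng = np.random.default_rng(seed)
    centroids = filled[rng.choice(len(filled), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared distance to each centroid over observed attributes only,
        # so that imputed guesses do not influence the cluster assignment.
        diff = np.where(missing[:, None, :], 0.0,
                        filled[:, None, :] - centroids[None, :, :])
        labels = (diff ** 2).sum(axis=2).argmin(axis=1)
        # Update centroids; an empty cluster keeps its previous centroid.
        for j in range(k):
            members = filled[labels == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
        # Re-impute each missing cell from its row's assigned centroid.
        filled = np.where(missing, centroids[labels], X)
    return filled, labels

# Toy data (made up): two well-separated groups, one missing cell.
X_demo = np.array([
    [1.0, 1.0],
    [1.2, 0.8],
    [0.8, np.nan],
    [8.0, 8.0],
    [8.2, 7.9],
    [7.9, 8.1],
])
filled_demo, labels_demo = kmeans_impute(X_demo, k=2)
print(filled_demo[2])
```

In this toy case the missing cell ends up close to the mean of the observed values of the same attribute within its cluster, while all observed cells are left unchanged. A soft variant in the spirit of Fuzzy C-Means would instead weight several centroids by membership degree; that is not shown here.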

Acknowledgment

The present work is supported by Special Assistance Programme of University Grants Commission, New Delhi, India (Grant No. F.3-50/2011(SAP-II)).

Author information

Corresponding author

Correspondence to P. S. Raja.

Copyright information

© 2016 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Raja, P.S., Thangavel, K. (2016). Soft Clustering Based Missing Value Imputation. In: Subramanian, S., Nadarajan, R., Rao, S., Sheen, S. (eds) Digital Connectivity – Social Impact. CSI 2016. Communications in Computer and Information Science, vol 679. Springer, Singapore. https://doi.org/10.1007/978-981-10-3274-5_10

  • DOI: https://doi.org/10.1007/978-981-10-3274-5_10

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-3273-8

  • Online ISBN: 978-981-10-3274-5

  • eBook Packages: Computer Science, Computer Science (R0)
