Performance Analysis of Various Missing Value Imputation Methods on Heart Failure Dataset

  • Conference paper
Proceedings of SAI Intelligent Systems Conference (IntelliSys) 2016 (IntelliSys 2016)

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 16)

Abstract

Missing data are a fundamental challenge for data analysis and classification. Incomplete data can degrade classification performance and yield accuracy results that differ from those obtained on complete data. In this work we compare six scalable imputation methods applied to a heart failure dataset. The comparison is based on the performance metrics of three classifiers: J48, REPTree, and Random Forest. The aim of the research is to identify the classifier that performs best after the missing data have been imputed with the different methods. The results show that, in general, Random Forest achieves better results than the decision trees J48 and REPTree. Furthermore, classification performance improved when the missing values were imputed with concept most common (CMC) and support vector machine (SVM) imputation.
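
Concept most common (CMC) imputation, one of the two methods that improved classification here, fills a missing entry using only the records that share that record's class label. The sketch below assumes the common description of CMC (as used, for example, in the KEEL tool): a numeric attribute takes the class-conditional mean, and a nominal attribute takes the class-conditional most common value. The abstract does not state the exact variant the authors used, so treat this as a minimal illustration rather than their implementation; the function name `cmc_impute`, the use of `None` to mark missing values, and the toy records are all hypothetical.

```python
from collections import Counter, defaultdict
from statistics import mean
from typing import Any, Optional

Row = list[Optional[Any]]


def cmc_impute(rows: list[Row], labels: list[Any], numeric: set[int]) -> list[Row]:
    """Concept most common (CMC) imputation, sketched under the assumption that
    numeric columns take the class-conditional mean and nominal columns take the
    class-conditional most common value. Missing entries are marked with None."""
    # Collect the observed (non-missing) values per (class label, column index).
    observed: dict[tuple[Any, int], list[Any]] = defaultdict(list)
    for row, label in zip(rows, labels):
        for j, value in enumerate(row):
            if value is not None:
                observed[(label, j)].append(value)

    # One fill value per (class, column) pair that has at least one observation.
    fill: dict[tuple[Any, int], Any] = {}
    for (label, j), values in observed.items():
        if j in numeric:
            fill[(label, j)] = mean(values)
        else:
            # Ties are broken by first occurrence in the data.
            fill[(label, j)] = Counter(values).most_common(1)[0][0]

    # Build new rows; the input is left unchanged. If a class has no observed
    # value for a column, fill.get returns None and the entry stays missing.
    return [
        [fill.get((label, j)) if value is None else value for j, value in enumerate(row)]
        for row, label in zip(rows, labels)
    ]


# Hypothetical toy records: column 0 is numeric, column 1 is nominal.
example_rows: list[Row] = [
    [63.0, "II"],
    [None, "II"],
    [71.0, None],
    [67.0, "III"],
    [58.0, "III"],
    [80.0, None],
    [None, "III"],
]
example_labels = ["alive", "alive", "alive", "alive", "dead", "dead", "dead"]

result = cmc_impute(example_rows, example_labels, numeric={0})
for row in result:
    print(row)
```

Because CMC conditions on the class label, it can only be applied where labels are known, such as a training set; records whose class is unknown at prediction time would need a label-free strategy (for example, the overall mean or mode).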

Author information

Corresponding author

Correspondence to Mohammad Al Khaldy.

Copyright information

© 2018 Springer International Publishing AG

About this paper

Cite this paper

Al Khaldy, M., Kambhampati, C. (2018). Performance Analysis of Various Missing Value Imputation Methods on Heart Failure Dataset. In: Bi, Y., Kapoor, S., Bhatia, R. (eds) Proceedings of SAI Intelligent Systems Conference (IntelliSys) 2016. IntelliSys 2016. Lecture Notes in Networks and Systems, vol 16. Springer, Cham. https://doi.org/10.1007/978-3-319-56991-8_31

  • DOI: https://doi.org/10.1007/978-3-319-56991-8_31

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-56990-1

  • Online ISBN: 978-3-319-56991-8

  • eBook Packages: Engineering, Engineering (R0)
