Multiple Imputation and Ensemble Learning for Classification with Incomplete Data

  • Conference paper
Intelligent and Evolutionary Systems

Part of the book series: Proceedings in Adaptation, Learning and Optimization (PALO, volume 8)

Abstract

Missing values are common in real-world datasets, and handling them well is essential for classification because inadequate treatment of missing values often leads to large classification errors. One of the most popular ways to address incomplete data is to use imputation methods that fill missing fields with plausible values. Multiple imputation, which fills each missing field with a set of plausible values, is a powerful approach to dealing with incomplete data, but it is mainly used for statistical analysis. Ensemble learning, which constructs a set of classifiers instead of a single classifier, has proven capable of improving classification accuracy, but it has mainly been applied to complete data. This paper proposes combining multiple imputation and ensemble learning to build an ensemble of classifiers for classification with incomplete data. A multiple imputation method generates a set of diverse imputed datasets, which are then used to build a set of diverse classifiers. Experiments on ten benchmark datasets use a decision tree as the classification algorithm and compare the proposed approach with two other popular approaches to dealing with incomplete data. The results show that, in almost all cases, the proposed method achieves significantly better classification accuracy than the other methods.
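The pipeline outlined in the abstract (impute several times, train one classifier per imputed dataset, combine the classifiers) can be illustrated with a minimal sketch. The abstract does not name the imputation method, the tree learner, or the combination rule, so everything concrete here is an assumption made for illustration: missing fields are filled by random draws from the observed values of the same feature (a crude stand-in for a proper multiple imputation method), the base classifier is a depth-1 decision tree (a stump), and predictions are combined by majority vote. The names `impute_once`, `fit_stump`, `fit_ensemble`, and `predict` are hypothetical.

```python
import random
from collections import Counter

MISSING = None  # marker for a missing field in this sketch


def impute_once(rows, rng):
    """Return a completed copy of rows: each missing field is filled with a
    random draw from the observed values of the same feature (column).
    Assumes every feature has at least one observed value."""
    n_features = len(rows[0])
    observed = [[r[j] for r in rows if r[j] is not MISSING]
                for j in range(n_features)]
    return [[v if v is not MISSING else rng.choice(observed[j])
             for j, v in enumerate(r)] for r in rows]


def fit_stump(X, y):
    """Fit a depth-1 decision tree by exhaustive search over
    (feature, threshold) pairs, minimising training errors."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({x[j] for x in X}):
            left = [label for x, label in zip(X, y) if x[j] <= t]
            right = [label for x, label in zip(X, y) if x[j] > t]
            left_label = Counter(left).most_common(1)[0][0]
            right_label = (Counter(right).most_common(1)[0][0]
                           if right else left_label)
            errors = (sum(label != left_label for label in left)
                      + sum(label != right_label for label in right))
            if best is None or errors < best[0]:
                best = (errors, j, t, left_label, right_label)
    _, j, t, left_label, right_label = best
    return lambda x: left_label if x[j] <= t else right_label


def fit_ensemble(X_incomplete, y, n_imputations=5, seed=0):
    """Build one classifier per randomly imputed copy of the training data."""
    rng = random.Random(seed)
    return [fit_stump(impute_once(X_incomplete, rng), y)
            for _ in range(n_imputations)]


def predict(ensemble, x):
    """Combine the ensemble members by majority vote (an assumed rule)."""
    votes = Counter(clf(x) for clf in ensemble)
    return votes.most_common(1)[0][0]


# Toy training data with missing fields (made up for this example).
X_train = [[1.0, 5.0], [2.0, MISSING], [MISSING, 4.0],
           [8.0, 1.0], [9.0, MISSING], [MISSING, 0.5]]
y_train = [0, 0, 0, 1, 1, 1]

ensemble = fit_ensemble(X_train, y_train, n_imputations=5, seed=0)
print(predict(ensemble, [1.5, 4.5]))
```

The sketch assumes that query instances are complete; an incomplete query instance would need to be imputed before voting. Diversity among the ensemble members comes only from the randomness of the imputation step, which mirrors the idea in the abstract that diverse imputed datasets yield diverse classifiers.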

Corresponding author

Correspondence to Cao Truong Tran.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Tran, C.T., Zhang, M., Andreae, P., Xue, B., Bui, L.T. (2017). Multiple Imputation and Ensemble Learning for Classification with Incomplete Data. In: Leu, G., Singh, H., Elsayed, S. (eds) Intelligent and Evolutionary Systems. Proceedings in Adaptation, Learning and Optimization, vol 8. Springer, Cham. https://doi.org/10.1007/978-3-319-49049-6_29

  • DOI: https://doi.org/10.1007/978-3-319-49049-6_29

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-49048-9

  • Online ISBN: 978-3-319-49049-6

  • eBook Packages: Engineering, Engineering (R0)
