Multiple Imputation and Ensemble Learning for Classification with Incomplete Data

Tran, Cao Truong; Zhang, Mengjie; Andreae, Peter; Xue, Bing; Bui, Lam Thu

doi:10.1007/978-3-319-49049-6_29

Part of the book series: Proceedings in Adaptation, Learning and Optimization (PALO, volume 8)

Abstract

Missing values are common in real-world datasets, and classifiers must be able to cope with them, because inadequate treatment of missing values often leads to large classification errors. One of the most popular ways to address incomplete data is to use imputation methods to fill missing fields with plausible values. Multiple imputation, which fills each missing field with a set of plausible values, is a powerful approach to dealing with incomplete data, but it is mainly used for statistical analysis. Ensemble learning, which constructs a set of classifiers instead of a single classifier, has proven capable of improving classification accuracy, but it has mainly been applied to complete data. This paper proposes a combination of multiple imputation and ensemble learning to build an ensemble of classifiers for incomplete data classification tasks. A multiple imputation method is used to generate a set of diverse imputed datasets, which is then used to build a set of diverse classifiers. Experiments on ten benchmark datasets use a decision tree as the classification algorithm and compare the proposed approach with two other popular approaches to dealing with incomplete data. The results show that, in almost all cases, the proposed method achieves significantly better classification accuracy than the other methods.
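The pipeline described in the abstract (impute the incomplete data several times, train one classifier per imputed dataset, then combine the classifiers) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the imputation step is a simple random draw from each column's observed values, standing in for a proper multiple imputation method; the classifier is a one-split decision stump, standing in for a full decision tree; and majority voting is assumed as the combination rule, since the abstract does not say how predictions are combined. All function names and the toy data are hypothetical.

```python
import random
from collections import Counter

MISSING = None  # marker for a missing field (an assumption of this sketch)

def impute_once(rows, rng):
    """Fill each missing field with a random draw from that column's observed values.
    Assumes every column has at least one observed value."""
    n_cols = len(rows[0])
    observed = [[r[j] for r in rows if r[j] is not MISSING] for j in range(n_cols)]
    return [[v if v is not MISSING else rng.choice(observed[j])
             for j, v in enumerate(r)] for r in rows]

def fit_stump(X, y):
    """Train a one-split decision stump: the (feature, threshold) with the fewest errors."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({x[j] for x in X}):
            left = [lab for x, lab in zip(X, y) if x[j] <= t]
            right = [lab for x, lab in zip(X, y) if x[j] > t]
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0] if right else left_label
            errors = (sum(lab != left_label for lab in left)
                      + sum(lab != right_label for lab in right))
            if best is None or errors < best[0]:
                best = (errors, j, t, left_label, right_label)
    _, feat, thr, left_label, right_label = best
    return lambda x: left_label if x[feat] <= thr else right_label

def fit_ensemble(X_incomplete, y, n_imputations=5, seed=0):
    """Build one classifier per imputed copy of the incomplete training data."""
    rng = random.Random(seed)
    return [fit_stump(impute_once(X_incomplete, rng), y) for _ in range(n_imputations)]

def predict(ensemble, x):
    """Combine the ensemble members by majority vote (x is assumed complete)."""
    votes = Counter(clf(x) for clf in ensemble)
    return votes.most_common(1)[0][0]

# Toy incomplete dataset: two numeric features, two classes.
X = [[1.0, 5.0], [2.0, None], [None, 6.0], [8.0, 1.0], [9.0, None], [None, 2.0]]
y = ["a", "a", "a", "b", "b", "b"]
ensemble = fit_ensemble(X, y)
print(predict(ensemble, [1.5, 5.5]), predict(ensemble, [8.5, 1.5]))
```

Because each imputed copy fills the gaps differently, the stumps can disagree with one another; that disagreement is the source of ensemble diversity the abstract refers to. How incomplete test instances are handled is not covered by this sketch.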

Author information

Authors and Affiliations

School of Engineering and Computer Science, Victoria University of Wellington, PO Box 600, 6140, Wellington, New Zealand
Cao Truong Tran, Mengjie Zhang, Peter Andreae & Bing Xue
Faculty of Information Technology, Le Qui Don Technical University, Hanoi, Vietnam
Cao Truong Tran & Lam Thu Bui

Corresponding author

Correspondence to Cao Truong Tran.

Editor information

Editors and Affiliations

School of Engineering and Information Technology, Australian Defence Force Academy, The University of New South Wales, Canberra, Australian Capital Territory, Australia
George Leu, Hemant Kumar Singh & Saber Elsayed

About this paper

Cite this paper

Tran, C.T., Zhang, M., Andreae, P., Xue, B., Bui, L.T. (2017). Multiple Imputation and Ensemble Learning for Classification with Incomplete Data. In: Leu, G., Singh, H., Elsayed, S. (eds) Intelligent and Evolutionary Systems. Proceedings in Adaptation, Learning and Optimization, vol 8. Springer, Cham. https://doi.org/10.1007/978-3-319-49049-6_29

DOI: https://doi.org/10.1007/978-3-319-49049-6_29
Published: 09 November 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-49048-9
Online ISBN: 978-3-319-49049-6
eBook Packages: Engineering, Engineering (R0)
