Bagging and Feature Selection for Classification with Incomplete Data

Tran, Cao Truong; Zhang, Mengjie; Andreae, Peter; Xue, Bing

doi:10.1007/978-3-319-55849-3_31

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10199)

Included in the following conference series:

European Conference on the Applications of Evolutionary Computation

Abstract

Missing values are an unavoidable issue in many real-world datasets. Dealing with missing values is an essential requirement in classification problems, because inadequate treatment of missing values often leads to large classification errors. Some classifiers can work directly with incomplete data, but they often produce large classification errors and generate complex models. Feature selection and bagging have been successfully used to improve classification, but they are mainly applied to complete data. This paper proposes a combination of bagging and feature selection to improve classification with incomplete data. To achieve this purpose, a wrapper-based feature selection method that can work directly with incomplete data is used to select suitable feature subsets for bagging. Experiments on eight incomplete datasets were designed to compare the proposed method with three other popular methods that are able to deal with incomplete data, using C4.5/REPTree as classifiers and Particle Swarm Optimisation as the search technique in feature selection. Results show that the combination of bagging and feature selection not only achieves better classification accuracy than the other methods but also generates less complex models than the bagging method.
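The general idea described in the abstract (each bagged model is trained on a bootstrap sample restricted to a feature subset chosen by a wrapper) can be illustrated with a minimal Python sketch. This is not the authors' implementation. It makes several assumptions that are not in the paper: missing values are encoded as `None`, a small nearest-centroid classifier that skips missing values stands in for C4.5/REPTree, a random search over subsets stands in for Particle Swarm Optimisation, and training accuracy on the bootstrap sample serves as the wrapper fitness. All function and class names are hypothetical.

```python
import random
from collections import Counter

MISSING = None  # assumption: missing values are encoded as None


class CentroidClassifier:
    """Nearest-centroid classifier that skips missing values.

    A short stand-in for C4.5/REPTree, not the paper's classifier.
    """

    def __init__(self, features):
        self.features = list(features)
        self.centroids = {}

    def fit(self, X, y):
        for label in set(y):
            rows = [x for x, t in zip(X, y) if t == label]
            centroid = {}
            for f in self.features:
                observed = [r[f] for r in rows if r[f] is not MISSING]
                if observed:  # skip features never observed for this class
                    centroid[f] = sum(observed) / len(observed)
            self.centroids[label] = centroid
        return self

    def predict_one(self, x):
        def distance(centroid):
            # Compare only on features observed in both x and the centroid.
            shared = [f for f in centroid if x[f] is not MISSING]
            if not shared:
                return float("inf")
            return sum((x[f] - centroid[f]) ** 2 for f in shared) / len(shared)

        return min(sorted(self.centroids),
                   key=lambda c: distance(self.centroids[c]))


def accuracy(model, X, y):
    return sum(model.predict_one(x) == t for x, t in zip(X, y)) / len(y)


def select_features(X, y, n_features, rng, n_trials=20):
    """Wrapper selection by random search (a stand-in for PSO)."""
    best_subset, best_score = list(range(n_features)), -1.0
    for _ in range(n_trials):
        subset = [f for f in range(n_features) if rng.random() < 0.5]
        if not subset:
            continue
        score = accuracy(CentroidClassifier(subset).fit(X, y), X, y)
        # Prefer higher accuracy, then fewer features.
        if (score, -len(subset)) > (best_score, -len(best_subset)):
            best_subset, best_score = subset, score
    return best_subset


def bagging_with_fs(X, y, n_models=10, seed=0):
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    ensemble = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        subset = select_features(Xb, yb, d, rng)
        ensemble.append(CentroidClassifier(subset).fit(Xb, yb))
    return ensemble


def predict(ensemble, x):
    """Majority vote over the ensemble members."""
    votes = Counter(m.predict_one(x) for m in ensemble)
    return votes.most_common(1)[0][0]


# Toy incomplete data (made up for illustration only).
X = [[0.1, 5.0], [0.2, None], [None, 1.0], [0.0, 9.0],
     [0.9, 4.0], [1.0, None], [0.8, 2.0], [None, 7.0]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
ensemble = bagging_with_fs(X, y)
print(predict(ensemble, [0.15, None]), predict(ensemble, [0.95, 3.0]))
```

Because each member keeps only its selected features, the individual models are smaller than those built on all features, which is the sense in which the abstract speaks of less complex models than plain bagging.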

Author information

Authors and Affiliations

School of Engineering and Computer Science, Victoria University of Wellington, PO Box 600, Wellington, 6140, New Zealand
Cao Truong Tran, Mengjie Zhang, Peter Andreae & Bing Xue

Corresponding author

Correspondence to Cao Truong Tran.

Editor information

Editors and Affiliations

Politecnico di Torino, Turin, Italy
Giovanni Squillero
Edinburgh Napier University, Edinburgh, United Kingdom
Kevin Sim

About this paper

Cite this paper

Tran, C.T., Zhang, M., Andreae, P., Xue, B. (2017). Bagging and Feature Selection for Classification with Incomplete Data. In: Squillero, G., Sim, K. (eds) Applications of Evolutionary Computation. EvoApplications 2017. Lecture Notes in Computer Science, vol 10199. Springer, Cham. https://doi.org/10.1007/978-3-319-55849-3_31

DOI: https://doi.org/10.1007/978-3-319-55849-3_31
Published: 25 March 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-55848-6
Online ISBN: 978-3-319-55849-3
eBook Packages: Computer Science, Computer Science (R0)
