Abstract
Missing values are an unavoidable issue of many real-world datasets. Dealing with missing values is an essential requirement in classification problem, because inadequate treatment with missing values often leads to large classification errors. Some classifiers can directly work with incomplete data, but they often result in big classification errors and generate complex models. Feature selection and bagging have been successfully used to improve classification, but they are mainly applied to complete data. This paper proposes a combination of bagging and feature selection to improve classification with incomplete data. To achieve this purpose, a wrapper-based feature selection which can directly work with incomplete data is used to select suitable feature subsets for bagging. The experiments on eight incomplete datasets were designed to compare the proposed method with three other popular methods that are able to deal with incomplete data using C4.5/REPTree as classifiers and using Particle Swam Optimisation as a search technique in feature selection. Results show that the combination of bagging and feature selection can not only achieve better classification accuracy than the other methods but also generate less complex models compared to the bagging method.
References
Chandrashekar, G., Sahin, F.: A survey on feature selection methods. Comput. Electr. Eng. 40(1), 16–28 (2014)
Chen, H., Du, Y., Jiang, K.: Classification of incomplete data using classifier ensembles. In: 2012 International Conference on Systems and Informatics (ICSAI), pp. 2229–2232 (2012)
Clerc, M., Kennedy, J.: The particle swarm-explosion, stability, and convergence in a multidimensional complex space. IEEE Trans. Evol. Comput. 6, 58–73 (2002)
Dietterich, T.G.: Ensemble methods in machine learning. In: International Workshop on Multiple Classifier Systems, pp. 1–15 (2000)
Doquire, G., Verleysen, M.: Feature selection with missing data using mutual information estimators. Neurocomputing 90, 3–11 (2012)
GarcÃa-Laencina, P.J., Sancho-Gómez, J.L., Figueiras-Vidal, A.R.: Pattern classification with missing data: a review. Neural Comput. Appl. 19, 263–282 (2010)
Guerra-Salcedo, C., Whitley, D.: Feature selection mechanisms for ensemble creation: a genetic search perspective. In: Data Mining with Evolutionary Algorithms: Research Directions. Papers from the AAAI Workshop (1999)
Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I.H.: The weka data mining software: An update. SIGKDD Explor. Newsl. 11, 10–18 (2009)
Han, J., Pei, J., Kamber, M.: Data Mining: Concepts and Techniques. Elsevier, Waltham (2011)
Kennedy, J.: Particle swarm optimization. In: Encyclopedia of Machine Learning, pp. 760–766 (2011)
Krause, S., Polikar, R.: An ensemble of classifiers approach for the missing feature problem. In: 2003 Proceedings of the International Joint Conference on Neural Networks, vol. 1, pp. 553–558 (2003)
Lichman, M.: UCI machine learning repository (2013). http://archive.ics.uci.edu/ml
Little, R.J., Rubin, D.B.: Statistical Analysis with Missing Data. Wiley, New York (2014)
Liu, H., Motoda, H.: Feature Selection for Knowledge Discovery and Data Mining, vol. 454. Springer, Heidelberg (2012)
Oliveira, L.S., Morita, M., Sabourin, R.: Feature selection for ensembles applied to handwriting recognition. Int. J. Doc. Anal. Recogn. (IJDAR) 8, 262–279 (2006)
Opitz, D., Maclin, R.: Popular ensemble methods: An empirical study. J. Artif. Intell. Res. 11, 169–198 (1999)
Opitz, D.W.: Feature selection for ensembles. In: AAAI/IAAI 379–384 (1999)
Qian, W., Shu, W.: Mutual information criterion for feature selection from incomplete data. Neurocomputing 168, 210–220 (2015)
Quinlan, J.R.: C4.5: Programs for Machine Learning. Elsevier, New York (2014)
Saar-Tsechansky, M., Provost, F.: Handling missing values when applying classification models. J. Mach. Learn. Res. 8, 1623–1657 (2007)
Su, J., Zhang, H.: A fast decision tree learning algorithm. In: Proceedings of the 21st National Conference on Artificial Intelligence, vol. 1, pp. 500–505 (2006)
Tran, C.T., Zhang, M., Andreae, P., Xue, B.: Improving performance for classification with incomplete data using wrapper-based feature selection. Evol. Intell. 9, 81–94 (2016)
Tran, C.T., Zhang, M., Andreae, P., Xue, B.: A wrapper feature selection approach to classification with missing data. In: Squillero, G., Burelli, P. (eds.) EvoApplications 2016. LNCS, vol. 9597, pp. 685–700. Springer, Cham (2016). doi:10.1007/978-3-319-31204-0_44
Xue, B., Zhang, M., Browne, W., Yao, X.: A survey on evolutionary computation approaches to feature selection. IEEE Trans. Evol. Comput. 20, 606–626 (2016)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Tran, C.T., Zhang, M., Andreae, P., Xue, B. (2017). Bagging and Feature Selection for Classification with Incomplete Data. In: Squillero, G., Sim, K. (eds) Applications of Evolutionary Computation. EvoApplications 2017. Lecture Notes in Computer Science(), vol 10199. Springer, Cham. https://doi.org/10.1007/978-3-319-55849-3_31
Download citation
DOI: https://doi.org/10.1007/978-3-319-55849-3_31
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-55848-6
Online ISBN: 978-3-319-55849-3
eBook Packages: Computer ScienceComputer Science (R0)