Genetic Programming with Interval Functions and Ensemble Learning for Classification with Incomplete Data
Abstract
Missing values are an unavoidable issue in many real-world datasets. Classification with incomplete data must be handled carefully because inadequate treatment of missing values often leads to large classification errors. Interval genetic programming (IGP) uses genetic programming directly to evolve an effective and efficient classifier for incomplete data. This paper proposes improving IGP for classification with incomplete data by integrating it with ensemble learning to build a set of classifiers. Experimental results show that integrating IGP with ensemble learning to evolve a set of classifiers achieves better accuracy than IGP alone. The proposed method is also more accurate than other common methods for classification with incomplete data.
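The general idea of combining interval functions with an ensemble can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on several assumptions that the abstract does not state: a missing value is represented by the feature's observed range, a program's output interval is turned into a class label by the sign of its midpoint, and the ensemble combines members by majority vote. The three programs are hand-written stand-ins for evolved GP trees, and all function names are hypothetical.

```python
from collections import Counter

def to_interval(value, lo, hi):
    # Assumption: a missing value (None) becomes the feature's full range;
    # an observed value becomes a degenerate interval.
    return (lo, hi) if value is None else (value, value)

def iadd(a, b):
    # Interval addition: add the lower bounds and the upper bounds.
    return (a[0] + b[0], a[1] + b[1])

def isub(a, b):
    # Interval subtraction: the lower bound uses b's upper bound, and vice versa.
    return (a[0] - b[1], a[1] - b[0])

def imul(a, b):
    # Interval multiplication: take the extremes of the four endpoint products.
    products = [a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1]]
    return (min(products), max(products))

def classify(program, instance, ranges):
    # Assumption: class 1 if the midpoint of the output interval is >= 0, else 0.
    intervals = [to_interval(v, lo, hi) for v, (lo, hi) in zip(instance, ranges)]
    lo, hi = program(intervals)
    return 1 if (lo + hi) / 2 >= 0 else 0

def ensemble_predict(programs, instance, ranges):
    # Assumption: members are combined by simple majority vote.
    votes = Counter(classify(p, instance, ranges) for p in programs)
    return votes.most_common(1)[0][0]

# Hand-written stand-ins for evolved programs over two features x[0], x[1].
programs = [
    lambda x: isub(x[0], x[1]),
    lambda x: isub(imul(x[0], x[0]), x[1]),
    lambda x: iadd(x[0], (-0.5, -0.5)),
]
ranges = [(0.0, 1.0), (0.0, 1.0)]  # hypothetical per-feature (min, max)

print(ensemble_predict(programs, (0.9, None), ranges))
print(ensemble_predict(programs, (None, 0.9), ranges))
```

Because every operator accepts and returns intervals, an instance with missing fields can be classified directly, without a separate imputation step. Each ensemble member may handle the resulting uncertainty differently, which is why a vote across members can help.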
Keywords
Incomplete data, Classification, Genetic programming, Interval functions, Ensemble learning