Genetic Programming with Interval Functions and Ensemble Learning for Classification with Incomplete Data

  • Cao Truong Tran
  • Mengjie Zhang
  • Bing Xue
  • Peter Andreae
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11320)


Missing values are an unavoidable issue in many real-world datasets. Classification with incomplete data must be handled carefully because inadequate treatment of missing values often leads to large classification errors. Interval genetic programming (IGP) uses genetic programming directly to evolve an effective and efficient classifier for incomplete data. This paper proposes a method that improves IGP for classification with incomplete data by integrating it with ensemble learning to build a set of classifiers. Experimental results show that evolving a set of classifiers with IGP and ensemble learning achieves better accuracy on incomplete data than IGP alone. The proposed method is also more accurate than other common methods for classification with incomplete data.
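The abstract does not spell out how interval functions or the ensemble vote work, so the following is only a minimal sketch under stated assumptions: a missing value is replaced by the observed range of its feature, known values become zero-width intervals, each program is built from interval arithmetic, a program's output interval is reduced to its midpoint and thresholded at zero, and the ensemble takes a majority vote. All names (`to_interval`, `classify`, `ensemble_predict`, `PROGRAMS`, `RANGES`) are hypothetical, and the three programs are hand-written stand-ins for evolved ones, not the authors' classifiers.

```python
from collections import Counter
from typing import Callable, List, Optional, Sequence, Tuple

Interval = Tuple[float, float]
Program = Callable[[List[Interval]], Interval]


def to_interval(value: Optional[float], feature_range: Interval) -> Interval:
    """Assumption: a missing value (None) becomes the feature's [min, max]."""
    return feature_range if value is None else (value, value)


def add(a: Interval, b: Interval) -> Interval:
    return (a[0] + b[0], a[1] + b[1])


def sub(a: Interval, b: Interval) -> Interval:
    return (a[0] - b[1], a[1] - b[0])


def mul(a: Interval, b: Interval) -> Interval:
    products = [a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1]]
    return (min(products), max(products))


def midpoint(a: Interval) -> float:
    return (a[0] + a[1]) / 2.0


def classify(program: Program, instance: Sequence[Optional[float]],
             ranges: Sequence[Interval]) -> int:
    """Binary decision: class 1 if the output midpoint is non-negative."""
    intervals = [to_interval(v, r) for v, r in zip(instance, ranges)]
    return 1 if midpoint(program(intervals)) >= 0 else 0


def ensemble_predict(programs: Sequence[Program],
                     instance: Sequence[Optional[float]],
                     ranges: Sequence[Interval]) -> int:
    """Majority vote over the individual program decisions."""
    votes = Counter(classify(p, instance, ranges) for p in programs)
    return votes.most_common(1)[0][0]


# Hypothetical stand-ins for evolved programs over two features.
PROGRAMS: List[Program] = [
    lambda x: sub(x[0], x[1]),
    lambda x: sub(mul(x[0], x[0]), x[1]),
    lambda x: sub(x[1], x[0]),
]

# Hypothetical per-feature ranges, as if observed on training data.
RANGES: List[Interval] = [(0.0, 4.0), (0.0, 4.0)]

if __name__ == "__main__":
    # The second feature is missing; no imputation step is needed.
    print(ensemble_predict(PROGRAMS, [3.0, None], RANGES))
```

In this sketch the incomplete instance `[3.0, None]` is classified without imputing the missing feature: each program propagates the interval `(0.0, 4.0)` through its arithmetic, and the vote combines the three decisions. An odd number of programs avoids ties in the binary case.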


Incomplete data · Classification · Genetic programming · Interval functions · Ensemble learning



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Cao Truong Tran (1), corresponding author
  • Mengjie Zhang (1)
  • Bing Xue (1)
  • Peter Andreae (1)
  1. School of Engineering and Computer Science, Victoria University of Wellington, Wellington, New Zealand
