Evolutionary Intelligence, Volume 9, Issue 3, pp 81–94

Improving performance for classification with incomplete data using wrapper-based feature selection

  • Cao Truong Tran
  • Mengjie Zhang
  • Peter Andreae
  • Bing Xue
Special Issue


Missing values are an unavoidable problem in many real-world datasets. Inadequate treatment of missing values can lead to large classification errors, so handling them well is essential for classification. Feature selection is well known for improving classification, but it has seldom been used to improve classification with incomplete datasets. Moreover, although some classifiers such as C4.5 can directly classify incomplete datasets, they often generate more complex models with larger classification errors. The purpose of this paper is to propose a wrapper-based feature selection method that improves classifiers able to classify incomplete datasets. To achieve this, the feature selection method evaluates feature subsets using such a classifier. Empirical results on 14 datasets, using particle swarm optimisation to search for feature subsets and C4.5 to evaluate them, show that the wrapper-based feature selection not only improves the classification accuracy of the classifier, but also reduces the size of the trees it generates.
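The wrapper approach described above can be sketched in a few dozen lines: binary particle swarm optimisation searches the space of feature-subset masks, and each mask is scored by training a classifier that tolerates missing values directly on the masked, incomplete data. The sketch below is a minimal illustration, not the paper's implementation: for brevity it substitutes a leave-one-out 1-NN classifier with partial distances (skipping missing values) for C4.5, and the dataset, parameter values, and function names are all assumptions.

```python
import math
import random

random.seed(42)

def partial_distance(a, b, mask):
    """Mean squared distance over selected features where both values are present."""
    d, used = 0.0, 0
    for j, on in enumerate(mask):
        if on and a[j] is not None and b[j] is not None:
            d += (a[j] - b[j]) ** 2
            used += 1
    # No comparable selected feature: treat the pair as maximally distant.
    return d / used if used else float("inf")

def loo_1nn_accuracy(X, y, mask):
    """Fitness of a feature subset: leave-one-out 1-NN accuracy on incomplete data.
    Stands in for the paper's C4.5-based evaluation (an assumption for brevity)."""
    if not any(mask):
        return 0.0
    correct = 0
    for i in range(len(X)):
        best, best_d = None, float("inf")
        for k in range(len(X)):
            if k == i:
                continue
            d = partial_distance(X[i], X[k], mask)
            if d < best_d:
                best, best_d = k, d
        if best is not None and y[best] == y[i]:
            correct += 1
    return correct / len(X)

def bpso_feature_selection(X, y, n_particles=10, n_iter=30, w=0.7, c1=1.5, c2=1.5):
    """Binary PSO: each particle is a 0/1 feature mask; a sigmoid transfer
    function maps real-valued velocities to selection probabilities."""
    n_feat = len(X[0])
    swarm = [[random.randint(0, 1) for _ in range(n_feat)] for _ in range(n_particles)]
    vel = [[0.0] * n_feat for _ in range(n_particles)]
    pbest = [list(p) for p in swarm]
    pbest_fit = [loo_1nn_accuracy(X, y, p) for p in swarm]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = list(pbest[g]), pbest_fit[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for j in range(n_feat):
                vel[i][j] = (w * vel[i][j]
                             + c1 * random.random() * (pbest[i][j] - swarm[i][j])
                             + c2 * random.random() * (gbest[j] - swarm[i][j]))
                # Sigmoid transfer: velocity -> probability of selecting feature j.
                swarm[i][j] = 1 if random.random() < 1 / (1 + math.exp(-vel[i][j])) else 0
            fit = loo_1nn_accuracy(X, y, swarm[i])
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = list(swarm[i]), fit
                if fit > gbest_fit:
                    gbest, gbest_fit = list(swarm[i]), fit
    return gbest, gbest_fit

# Hypothetical toy dataset: feature 0 is informative, the rest are noise,
# and None marks missing values.
X = [[0.1, 5.0, None], [0.2, None, 2.0], [0.15, 1.0, 7.0], [0.3, 9.0, 4.0],
     [0.9, 4.0, 3.0], [1.0, None, 8.0], [0.85, 2.0, None], [1.1, 7.0, 1.0]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
mask, fit = bpso_feature_selection(X, y)
```

Because the evaluator itself works on incomplete data, no imputation step is needed anywhere in the loop, which is the point of the wrapper design; the paper's version additionally reports that the resulting C4.5 trees shrink, which this simplified fitness function does not reward explicitly.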


Keywords: Missing data · Incomplete data · Missing values · Feature selection · Classification · C4.5 · Particle swarm optimisation



Copyright information

© Springer-Verlag Berlin Heidelberg 2016

Authors and Affiliations

  1. Evolutionary Computation Research Group, School of Engineering and Computer Science, Victoria University of Wellington, Wellington, New Zealand
