Bagging and Feature Selection for Classification with Incomplete Data

  • Conference paper
  • Published in: Applications of Evolutionary Computation (EvoApplications 2017)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10199)

Abstract

Missing values are an unavoidable issue in many real-world datasets. Dealing with missing values is an essential requirement in classification problems, because inadequate treatment of missing values often leads to large classification errors. Some classifiers can work directly with incomplete data, but they often produce large classification errors and generate complex models. Feature selection and bagging have been used successfully to improve classification, but they are mainly applied to complete data. This paper proposes a combination of bagging and feature selection to improve classification with incomplete data. To achieve this, a wrapper-based feature selection method that can work directly with incomplete data is used to select suitable feature subsets for bagging. Experiments on eight incomplete datasets compare the proposed method with three other popular methods able to deal with incomplete data, using C4.5/REPTree as classifiers and Particle Swarm Optimisation as the search technique in feature selection. Results show that the combination of bagging and feature selection not only achieves better classification accuracy than the other methods but also generates less complex models than bagging alone.
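
The sketch below is a minimal illustration of the idea described in the abstract, not the authors' implementation: a binary Particle Swarm Optimisation wrapper scores candidate feature subsets by the cross-validated accuracy of a decision tree, and a bagging ensemble is then trained on the selected features only. It assumes scikit-learn's DecisionTreeClassifier as a stand-in for C4.5/REPTree (recent scikit-learn releases accept NaN-encoded missing values in tree inputs; otherwise a missing-value-tolerant learner or an imputation step would be needed), and the function names pso_feature_selection and bagging_on_selected are illustrative only.

import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score


def pso_feature_selection(X, y, n_particles=20, n_iters=30, seed=0):
    """Binary PSO wrapper: search for a feature subset maximising CV accuracy."""
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    pos = rng.random((n_particles, n_features))   # continuous positions in [0, 1]
    vel = np.zeros((n_particles, n_features))

    def fitness(particle):
        mask = particle > 0.5                     # threshold position into a subset
        if not mask.any():
            return 0.0                            # discard empty subsets
        tree = DecisionTreeClassifier(random_state=seed)
        return cross_val_score(tree, X[:, mask], y, cv=3).mean()

    pbest = pos.copy()
    pbest_fit = np.array([fitness(p) for p in pos])
    gbest = pbest[pbest_fit.argmax()].copy()

    w, c1, c2 = 0.7298, 1.49618, 1.49618          # commonly used PSO coefficients
    for _ in range(n_iters):
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, 0.0, 1.0)
        fit = np.array([fitness(p) for p in pos])
        improved = fit > pbest_fit
        pbest[improved], pbest_fit[improved] = pos[improved], fit[improved]
        gbest = pbest[pbest_fit.argmax()].copy()
    return gbest > 0.5                            # boolean mask of selected features


def bagging_on_selected(X, y, mask, n_trees=30, seed=0):
    """Train a bagging ensemble of decision trees on the selected features only."""
    model = BaggingClassifier(DecisionTreeClassifier(random_state=seed),
                              n_estimators=n_trees, random_state=seed)
    return model.fit(X[:, mask], y)

With missing entries encoded as np.nan, one would call mask = pso_feature_selection(X_train, y_train), then model = bagging_on_selected(X_train, y_train, mask), and predict with model.predict(X_test[:, mask]).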

Author information

Correspondence to Cao Truong Tran.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Tran, C.T., Zhang, M., Andreae, P., Xue, B. (2017). Bagging and Feature Selection for Classification with Incomplete Data. In: Squillero, G., Sim, K. (eds) Applications of Evolutionary Computation. EvoApplications 2017. Lecture Notes in Computer Science, vol 10199. Springer, Cham. https://doi.org/10.1007/978-3-319-55849-3_31

  • DOI: https://doi.org/10.1007/978-3-319-55849-3_31

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-55848-6

  • Online ISBN: 978-3-319-55849-3
