Combination of Single Feature Classifiers for Fast Feature Selection

Abstract

Feature selection is an important step in many classification tasks. Its aim is to reduce the number of features while maintaining, or even improving, the performance of the classifier. The selection methods described in the literature have limitations at various levels: some are too complex to run in reasonable time, some are too dependent on the classifier used for evaluation, and others overlook interactions between features. In this paper, to limit these drawbacks, we propose a fast feature selection method in which each feature is associated with a single feature classifier. The weak classifiers we consider have several degrees of freedom and are optimized on the training dataset. Within a genetic algorithm, the individuals, which are subsets of classifiers, are evaluated by a fitness function based on a combination of single feature classifiers, and several combination operators are compared. The whole method is implemented, and extensive trials are performed on four databases built from the MNIST handwritten digits database using four different descriptors. The results show that our approach is robust and efficient: on average, the number of selected features is about 70% smaller than the initial set while the recognition rate is maintained.
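As a concrete illustration of the fitness evaluation the abstract describes, here is a minimal sketch assuming decision stumps as the single feature classifiers, a simple majority vote as the combination operator, and binary labels in {-1, +1}. All names (`train_stump`, `fitness`, ...) are illustrative, not the paper's actual implementation: the weak classifiers in the paper have more degrees of freedom, and several combination operators beyond majority voting are compared.

```python
import numpy as np

def train_stump(x, y):
    """Fit a decision stump on one feature column x with labels y in
    {-1, +1}: scan candidate thresholds and polarities, keep the pair
    with the best training accuracy."""
    best_t, best_p, best_acc = 0.0, 1, -1.0
    for t in np.unique(x):
        for p in (1, -1):
            acc = np.mean(np.where(p * (x - t) >= 0, 1, -1) == y)
            if acc > best_acc:
                best_t, best_p, best_acc = t, p, acc
    return best_t, best_p

def stump_predict(x, stump):
    """Predict {-1, +1} labels from a single feature with a trained stump."""
    t, p = stump
    return np.where(p * (x - t) >= 0, 1, -1)

def fitness(subset, X, y, stumps):
    """Fitness of a GA individual: accuracy of the majority vote of the
    single feature classifiers selected by `subset` (feature indices).
    Ties vote positive."""
    votes = sum(stump_predict(X[:, j], stumps[j]) for j in subset)
    return np.mean(np.where(votes >= 0, 1, -1) == y)

# Toy usage: 200 samples, 10 features, labels driven by feature 0.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = np.where(X[:, 0] > 0, 1, -1)
stumps = [train_stump(X[:, j], y) for j in range(X.shape[1])]
print(fitness([0, 3, 7], X, y, stumps))
```

In a full genetic algorithm, each individual would be a binary mask over the features, and this fitness score would drive selection, crossover, and mutation of the masks; the stumps are trained once up front, which is what makes the per-individual evaluation fast.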

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Hassan Chouaib¹
  • Florence Cloppet¹
  • Nicole Vincent¹

  1. Laboratoire LIPADE, Université Paris Descartes, Paris Cedex 06, France