Feature Subset Selection, Class Separability, and Genetic Algorithms

  • Conference paper
Genetic and Evolutionary Computation – GECCO 2004 (GECCO 2004)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3102)

Abstract

The performance of classification algorithms in machine learning is affected by the features used to describe the labeled examples presented to the inducers. Therefore, the problem of feature subset selection has received considerable attention. Genetic approaches to this problem usually follow the wrapper approach: treat the inducer as a black box that is used to evaluate candidate feature subsets. The evaluations might take a considerable time and the traditional approach might be impractical for large data sets. This paper describes a hybrid of a simple genetic algorithm and a method based on class separability applied to the selection of feature subsets for classification problems. The proposed hybrid was compared against each of its components and two other feature selection wrappers that are used widely. The objective of this paper is to determine if the proposed hybrid presents advantages over the other methods in terms of accuracy or speed in this problem. The experiments used a Naive Bayes classifier and public-domain and artificial data sets. The experiments suggest that the hybrid usually finds compact feature subsets that give the most accurate results, while beating the execution time of the other wrappers.
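The abstract does not say how the genetic algorithm and the class-separability method are combined, so the following is only a minimal sketch under stated assumptions, not the paper's implementation. It assumes the separability scores bias the initial population of a simple GA, uses a Fisher-style two-class score as a stand-in for the separability measure, and uses a small hand-written Gaussian Naive Bayes as the wrapped inducer. All function names, parameters, and the synthetic data are hypothetical.

```python
import math
import random


def make_data(n, seed):
    """Synthetic two-class data: features 0-1 are informative, 2-5 are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        informative = [rng.gauss(3.0 * label, 1.0) for _ in range(2)]
        noise = [rng.gauss(0.0, 1.0) for _ in range(4)]
        X.append(informative + noise)
        y.append(label)
    return X, y


def separability(X, y, j):
    """Assumed filter score: squared gap of class means over summed variances."""
    groups = {}
    for row, label in zip(X, y):
        groups.setdefault(label, []).append(row[j])
    a, b = groups.values()  # assumes exactly two classes

    def mean(v):
        return sum(v) / len(v)

    def var(v):
        m = mean(v)
        return sum((x - m) ** 2 for x in v) / len(v)

    return (mean(a) - mean(b)) ** 2 / (var(a) + var(b) + 1e-9)


def nb_accuracy(X_tr, y_tr, X_te, y_te, mask):
    """Wrapper evaluation: hold-out accuracy of Gaussian Naive Bayes on the masked features."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0
    stats = {}
    for label in set(y_tr):
        rows = [r for r, l in zip(X_tr, y_tr) if l == label]
        params = []
        for j in cols:
            vals = [r[j] for r in rows]
            mu = sum(vals) / len(vals)
            params.append((mu, sum((v - mu) ** 2 for v in vals) / len(vals) + 1e-6))
        stats[label] = (len(rows) / len(X_tr), params)

    def log_posterior(label, row):
        prior, params = stats[label]
        lp = math.log(prior)
        for j, (mu, var) in zip(cols, params):
            lp += -0.5 * math.log(2 * math.pi * var) - (row[j] - mu) ** 2 / (2 * var)
        return lp

    hits = sum(max(stats, key=lambda c: log_posterior(c, r)) == l
               for r, l in zip(X_te, y_te))
    return hits / len(X_te)


def hybrid_ga(X_tr, y_tr, X_te, y_te, pop_size=20, generations=15, seed=0):
    """Simple GA over feature bit masks, seeded by the filter scores (an assumption)."""
    rng = random.Random(seed)
    n = len(X_tr[0])
    scores = [separability(X_tr, y_tr, j) for j in range(n)]
    top = max(scores) or 1.0
    probs = [0.1 + 0.8 * s / top for s in scores]  # inclusion probability per feature
    pop = [[int(rng.random() < p) for p in probs] for _ in range(pop_size)]
    cache = {}  # wrapper evaluations are the expensive step, so memoize them

    def fitness(mask):
        key = tuple(mask)
        if key not in cache:
            cache[key] = nb_accuracy(X_tr, y_tr, X_te, y_te, mask)
        return cache[key]

    best = max(pop, key=fitness)
    for _ in range(generations):
        new_pop = []
        while len(new_pop) < pop_size:
            p1 = max(rng.sample(pop, 2), key=fitness)  # binary tournament selection
            p2 = max(rng.sample(pop, 2), key=fitness)
            child = [rng.choice(pair) for pair in zip(p1, p2)]  # uniform crossover
            child = [bit ^ (rng.random() < 1.0 / n) for bit in child]  # bit-flip mutation
            new_pop.append(child)
        pop = new_pop
        best = max(pop + [best], key=fitness)
    return best, fitness(best)


if __name__ == "__main__":
    X_tr, y_tr = make_data(100, seed=1)
    X_te, y_te = make_data(100, seed=2)
    print(hybrid_ga(X_tr, y_tr, X_te, y_te))
```

Seeding the population from the filter is one plausible way to cut the number of costly wrapper evaluations, which is the concern the abstract raises about large data sets. A real comparison would also use cross-validation rather than a single hold-out split.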


Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Cantú-Paz, E. (2004). Feature Subset Selection, Class Separability, and Genetic Algorithms. In: Deb, K. (eds) Genetic and Evolutionary Computation – GECCO 2004. GECCO 2004. Lecture Notes in Computer Science, vol 3102. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24854-5_96

  • DOI: https://doi.org/10.1007/978-3-540-24854-5_96

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-22344-3

  • Online ISBN: 978-3-540-24854-5

  • eBook Packages: Springer Book Archive
