Feature Construction and Dimension Reduction Using Genetic Programming

  • Kourosh Neshatian
  • Mengjie Zhang
  • Mark Johnston
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4830)


This paper describes a new approach to using genetic programming (GP) for feature construction in classification problems. Rather than wrapping a particular classifier to construct a single feature, as most existing methods do, this approach uses GP to construct multiple high-level features from the original features. These constructed features are then used by decision trees for classification. Because feature construction is independent of classification, the fitness function is based on class dispersion and entropy. The approach is examined and compared with the standard decision tree method, using the original features, and using a combination of the original features and constructed features, on 12 benchmark classification problems. The results show that the new approach outperforms the standard way of using decision trees on these problems in terms of classification performance, dimension reduction, and learned decision tree size.
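
To make the idea of a classifier-independent fitness measure concrete, here is a minimal Python sketch. It is not the paper's fitness function, which the abstract does not specify. It assumes one plausible reading of "entropy": split the range of a constructed feature into equal-width intervals and take the weighted average class entropy over those intervals, so that lower scores mean purer intervals. The function names, the `n_bins` parameter, the toy data, and the hand-written expression standing in for a GP-evolved tree are all hypothetical.

```python
import math
from collections import Counter


def class_entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def interval_entropy_fitness(values, labels, n_bins=4):
    """Weighted average class entropy over equal-width intervals of a feature.

    Lower is better: 0.0 means every occupied interval holds a single class.
    Illustrative stand-in only, not the fitness function from the paper.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant feature
    bins = {}
    for v, y in zip(values, labels):
        idx = min(int((v - lo) / width), n_bins - 1)  # clamp the maximum value
        bins.setdefault(idx, []).append(y)
    n = len(values)
    return sum(len(ys) / n * class_entropy(ys) for ys in bins.values())


# Hypothetical data: two original features per instance, two classes.
X = [(1.0, 5.0), (2.0, 4.0), (5.0, 1.0), (4.0, 2.0)]
y = ["a", "a", "b", "b"]

# Hand-written expressions standing in for GP-evolved expression trees.
difference_feature = [x1 - x2 for x1, x2 in X]  # separates the two classes
sum_feature = [x1 + x2 for x1, x2 in X]         # constant, so uninformative

good = interval_entropy_fitness(difference_feature, y)
poor = interval_entropy_fitness(sum_feature, y)
print(good, poor)
```

On this toy data the difference feature scores lower (better) than the sum feature. In a GP setting, a score of this kind could rank candidate expression trees without training any classifier inside the fitness evaluation.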


Decision Tree, Genetic Programming, Dimension Reduction, Original Feature, Class Interval
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Kourosh Neshatian (1)
  • Mengjie Zhang (1)
  • Mark Johnston (1)
  1. School of Mathematics, Statistics and Computer Science, Victoria University of Wellington, P.O. Box 600, Wellington, New Zealand
