Using Genetic Programming for Feature Creation with a Genetic Algorithm Feature Selector

  • Matthew G. Smith
  • Larry Bull
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3242)


The use of machine learning techniques to automatically analyse data for information is becoming increasingly widespread. In this paper we primarily examine the use of Genetic Programming and a Genetic Algorithm to pre-process data before it is classified using the C4.5 decision tree learning algorithm. Genetic Programming is used to construct new features from those available in the data, a potentially significant process for data mining since it gives consideration to hidden relationships between features. A Genetic Algorithm is used to determine which such features are the most predictive. Using ten well-known datasets we show that our approach, in comparison to C4.5 alone, provides marked improvement in a number of cases. We then examine its use with other well-known machine learning techniques.
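The pre-processing pipeline described above can be illustrated with a minimal sketch. It is not the authors' implementation, and several parts are assumptions made only to keep the example self-contained: constructed features are random expression trees (the representation Genetic Programming evolves, but with no evolution of the trees here), the function set is `+`, `-`, `*`, the classifier used to score a feature subset is leave-one-out 1-nearest-neighbour standing in for C4.5, and all names and parameter values (`select_features`, `pop_size`, `p_mut`, and so on) are hypothetical.

```python
import random

# Function set for constructed features (an assumption; the abstract does not list one).
OPS = {"+": lambda a, b: a + b, "-": lambda a, b: a - b, "*": lambda a, b: a * b}

def random_tree(n_inputs, depth, rng):
    """Grow a random expression tree; leaves are indices of original features."""
    if depth == 0 or rng.random() < 0.3:
        return rng.randrange(n_inputs)
    op = rng.choice(sorted(OPS))
    return (op, random_tree(n_inputs, depth - 1, rng), random_tree(n_inputs, depth - 1, rng))

def evaluate(tree, row):
    """Compute the value of a constructed feature for one data row."""
    if isinstance(tree, int):
        return row[tree]
    op, left, right = tree
    return OPS[op](evaluate(left, row), evaluate(right, row))

def loo_1nn_accuracy(columns, labels):
    """Leave-one-out 1-nearest-neighbour accuracy (a stand-in for C4.5)."""
    if not columns:
        return 0.0
    points = list(zip(*columns))
    hits = 0
    for i, p in enumerate(points):
        nearest = min((j for j in range(len(points)) if j != i),
                      key=lambda j: sum((a - b) ** 2 for a, b in zip(p, points[j])))
        hits += labels[nearest] == labels[i]
    return hits / len(points)

def select_features(columns, labels, rng, pop_size=12, generations=15, p_mut=0.1):
    """Bit-mask GA wrapper around the classifier: returns (best_mask, best_fitness)."""
    n = len(columns)

    def fit(mask):
        return loo_1nn_accuracy([c for c, keep in zip(columns, mask) if keep], labels)

    # Start from the all-features mask plus random masks.
    pop = [[1] * n] + [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size - 1)]
    scored = sorted(((fit(m), m) for m in pop), reverse=True)
    for _ in range(generations):
        children = [scored[0][1]]                  # elitism: keep the best mask
        while len(children) < pop_size:
            a = max(rng.sample(scored, 2))[1]      # binary tournament selection
            b = max(rng.sample(scored, 2))[1]
            cut = rng.randrange(1, n)              # one-point crossover
            child = a[:cut] + b[cut:]
            child = [bit ^ (rng.random() < p_mut) for bit in child]  # bit-flip mutation
            children.append(child)
        scored = sorted(((fit(m), m) for m in children), reverse=True)
    best_fit, best_mask = scored[0]
    return best_mask, best_fit

def demo(seed=0):
    """Synthetic data whose class depends on a hidden product of two features."""
    rng = random.Random(seed)
    rows = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(40)]
    labels = [int(r[0] * r[1] > 0) for r in rows]
    trees = [random_tree(2, 2, rng) for _ in range(6)]
    columns = [[r[i] for r in rows] for i in range(2)]
    columns += [[evaluate(t, r) for r in rows] for t in trees]
    baseline = loo_1nn_accuracy(columns[:2], labels)   # original features only
    mask, acc = select_features(columns, labels, rng)
    return baseline, acc, mask

if __name__ == "__main__":
    print(demo())
```

Scoring each mask by running the classifier on the selected columns is a wrapper approach to feature selection. Because the all-features mask is in the initial population and the best mask is always carried forward, the returned fitness can never fall below that of using every original and constructed feature.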





Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • Matthew G. Smith (1)
  • Larry Bull (1)
  1. Faculty of Computing, Engineering & Mathematical Sciences, University of the West of England, Bristol, UK
