Abstract
The use of machine learning techniques to automatically analyse data for information is becoming increasingly widespread. In this paper we primarily examine the use of Genetic Programming and a Genetic Algorithm to pre-process data before it is classified using the C4.5 decision tree learning algorithm. Genetic Programming is used to construct new features from those available in the data, a potentially significant process for data mining since it gives consideration to hidden relationships between features. A Genetic Algorithm is used to determine which such features are the most predictive. Using ten well-known datasets we show that our approach, in comparison to C4.5 alone, provides marked improvement in a number of cases. We then examine its use with other well-known machine learning techniques.
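The pipeline described above can be illustrated with a minimal sketch. It is not the authors' implementation: the single-threshold `stump_accuracy` classifier is a deliberately simple stand-in for C4.5, the search uses mutation only, and every name and parameter value here is a hypothetical choice made for illustration. Each individual pairs a set of expression trees (the constructed features, in the style of Genetic Programming) with a bit mask (the feature selection, in the style of a Genetic Algorithm), and is scored by the accuracy of the classifier trained on the selected features.

```python
import operator
import random

OPS = [operator.add, operator.sub, operator.mul]  # assumed function set

def random_tree(n_inputs, rng, depth=2):
    """Grow a random expression tree; an int leaf indexes an original feature."""
    if depth == 0 or rng.random() < 0.3:
        return rng.randrange(n_inputs)
    return (rng.choice(OPS),
            random_tree(n_inputs, rng, depth - 1),
            random_tree(n_inputs, rng, depth - 1))

def evaluate(tree, row):
    """Compute the value of a constructed feature for one data row."""
    if isinstance(tree, int):
        return row[tree]
    op, left, right = tree
    return op(evaluate(left, row), evaluate(right, row))

def stump_accuracy(columns, labels):
    """Stand-in for C4.5: accuracy of the best single-threshold split."""
    best = max(labels.count(0), labels.count(1)) / len(labels)  # majority class
    for col in columns:
        for t in set(col):
            hits = sum((v > t) == bool(y) for v, y in zip(col, labels))
            best = max(best, hits / len(labels), 1 - hits / len(labels))
    return best

def fitness(individual, rows, labels):
    """Score an individual using only the features its mask selects."""
    trees, mask = individual
    columns = [[evaluate(t, r) for r in rows]
               for t, keep in zip(trees, mask) if keep]
    return stump_accuracy(columns, labels)

def mutate(individual, n_inputs, rng):
    """Either replace one constructed feature or flip one selection bit."""
    trees, mask = list(individual[0]), list(individual[1])
    i = rng.randrange(len(trees))
    if rng.random() < 0.5:
        trees[i] = random_tree(n_inputs, rng)  # feature construction step
    else:
        mask[i] = not mask[i]                  # feature selection step
    return trees, mask

def evolve(rows, labels, n_features=3, pop_size=20, generations=30, seed=0):
    """Truncation selection: keep the better half, refill with mutants."""
    rng = random.Random(seed)
    n_inputs = len(rows[0])
    pop = [([random_tree(n_inputs, rng) for _ in range(n_features)],
            [True] * n_features) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda ind: fitness(ind, rows, labels), reverse=True)
        parents = pop[:pop_size // 2]
        pop = parents + [mutate(rng.choice(parents), n_inputs, rng)
                         for _ in parents]
    return max(pop, key=lambda ind: fitness(ind, rows, labels))

# Toy data: the class depends on the sign of x * y, a relationship that
# neither original feature reveals on its own.
rows = [(x, y) for x in (-2, -1, 1, 2) for y in (-2, -1, 1, 2)]
labels = [int(x * y > 0) for x, y in rows]

raw_score = stump_accuracy([[r[0] for r in rows], [r[1] for r in rows]], labels)
product_score = fitness(([(operator.mul, 0, 1)], [True]), rows, labels)
evolved_score = fitness(evolve(rows, labels), rows, labels)

print("original features:", raw_score)              # 0.5
print("hand-built product feature:", product_score)  # 1.0
print("evolved features:", evolved_score)
```

On this toy grid the stand-in classifier cannot beat chance on the original features, while the constructed product feature separates the classes exactly. That is the kind of hidden relationship between features that construction is meant to expose. How close the evolved score comes to that depends on the seed and search budget, and scoring on the training data alone, as here, says nothing about generalisation.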
Cite this article
Smith, M.G., Bull, L. Genetic Programming with a Genetic Algorithm for Feature Construction and Selection. Genet Program Evolvable Mach 6, 265–281 (2005). https://doi.org/10.1007/s10710-005-2988-7