Abstract
Because genetic algorithms can perform extensive global search, they have the potential to find the optimal set of model parameters for a classification algorithm. However, if test set error is used to calculate fitness, the computational costs can be high and there is a danger of over-fitting to the test set. This paper empirically examines the over-fitting problem in a feature selection context and then proposes techniques for modifying the fitness function to improve speed and accuracy. It is shown that test set sampling can dramatically speed up the evaluation function and hence make the GA approach feasible for large data sets. A technique is then proposed that combines Occam's razor with statistical confidence tests to determine the number of samples used by the evaluation function.
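The central speed-up described in the abstract, scoring a candidate feature subset on a random sample of the test set rather than the whole set, can be illustrated with a small sketch. This is not the authors' implementation: the 1-NN classifier, the toy Gaussian data, and the `sample_size` parameter are assumptions made purely for the example.

```python
import random

def one_nn_error(features, train, test):
    """Error rate of a 1-nearest-neighbour classifier restricted to `features`."""
    def dist(a, b):
        return sum((a[i] - b[i]) ** 2 for i in features)
    errors = 0
    for x, label in test:
        nearest = min(train, key=lambda t: dist(t[0], x))
        if nearest[1] != label:
            errors += 1
    return errors / len(test)

def sampled_fitness(features, train, test, sample_size, rng):
    """GA fitness from a random sample of the test set: cheaper, but noisier."""
    if not features:
        return 0.0
    sample = rng.sample(test, min(sample_size, len(test)))
    return 1.0 - one_nn_error(features, train, sample)

rng = random.Random(0)

# Toy data: features 0 and 1 track the class label; feature 2 is pure noise.
def make_point(label):
    return ([label + rng.gauss(0, 0.3),
             label + rng.gauss(0, 0.3),
             rng.gauss(0, 1.0)], label)

train = [make_point(l) for l in (0, 1) for _ in range(30)]
test = [make_point(l) for l in (0, 1) for _ in range(30)]

# Evaluate one chromosome (the subset {0, 1}) on a 20-point sample
# instead of all 60 test points.
fit = sampled_fitness({0, 1}, train, test, sample_size=20, rng=rng)
```

Each fitness call now costs a fraction of a full test-set pass, at the price of sampling noise; the paper's contribution is deciding how large `sample_size` must be for the comparison between chromosomes to remain statistically sound.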
© 1997 Springer-Verlag
Cite this paper
Glover, R., Sharpe, P. (1997). Efficient GA based techniques for automating the design of classification models. In: Liu, X., Cohen, P., Berthold, M. (eds) Advances in Intelligent Data Analysis. Reasoning about Data. IDA 1997. Lecture Notes in Computer Science, vol 1280. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0052837
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-63346-4
Online ISBN: 978-3-540-69520-2