Abstract
In pattern recognition, the curse of dimensionality can be handled either by reducing the number of features, e.g. with decision trees, or by extracting new features.
We propose a genetic programming (GP) framework for automatic extraction of features with the express aim of dimension reduction and the additional aim of improving accuracy of the k-nearest neighbour (k-NN) classifier. We will show that our system is capable of reducing most datasets to one or two features while k-NN accuracy improves or stays the same. Such a small number of features has the great advantage of allowing visual inspection of the dataset in a two-dimensional plot.
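The target classifier above is k-NN, which labels a query point by a majority vote among its k nearest training samples. As a minimal sketch (not the paper's implementation; the function name and Euclidean metric are illustrative assumptions):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    # Euclidean distance from the query point to every training sample
    d = np.linalg.norm(X_train - x, axis=1)
    # Majority vote among the labels of the k nearest neighbours
    nearest = y_train[np.argsort(d)[:k]]
    return Counter(nearest).most_common(1)[0][0]
```

With GP-extracted features, `X_train` would hold the transformed (one- or two-dimensional) feature vectors rather than the raw inputs.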
Since k-NN is a non-linear classification algorithm [2], we compare several linear fitness measures. We will show that a very simple one, the accuracy of the minimal-distance-to-means (mdm) classifier, outperforms all other fitness measures.
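The mdm classifier computes one mean vector per class and assigns each sample to the class with the nearest mean. A minimal sketch under that definition (function names and the Euclidean metric are assumptions, not taken from the paper):

```python
import numpy as np

def mdm_fit(X, y):
    """Compute the mean feature vector of each class."""
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

def mdm_predict(means, X):
    """Assign each sample to the class whose mean is nearest."""
    classes = list(means)
    centroids = np.stack([means[c] for c in classes])
    # Distance from every sample to every class mean
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return np.array([classes[i] for i in d.argmin(axis=1)])
```

Because fitness is evaluated many times per GP run, such a cheap linear classifier is an attractive stand-in for full k-NN accuracy.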
We introduce a stopping criterion gleaned from numerical mathematics: new features are only added if the relative increase in training accuracy exceeds a constant d, estimated for the mdm classifier to be 3.3%.
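The stopping criterion amounts to a relative-improvement test. A sketch, assuming the abstract's d = 0.033 and a hypothetical helper name:

```python
def should_add_feature(prev_acc, new_acc, d=0.033):
    """Accept a candidate feature only if the relative increase in
    training accuracy exceeds the threshold d (3.3% for mdm)."""
    return (new_acc - prev_acc) / prev_acc > d
```

For example, going from 80% to 86% training accuracy is a 7.5% relative gain and would justify the extra feature, while 90% to 91% (about 1.1%) would trigger the stop.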
References
E.I. Chang and R.P. Lippman. Using genetic algorithms to improve pattern classification performance. In Advances in Neural Information Processing Systems, 1991.
J. Friedman, J. Bentley, and R. Finkel. An algorithm for finding best matches in logarithmic expected time. ACM Transactions on Mathematical Software, 3(3):209–226, 1977.
K. Fukunaga. Introduction to Statistical Pattern Recognition. Academic Press, New York, second edition, 1990.
J. Edward Jackson. A User’s Guide to Principal Components. John Wiley & Sons, Inc, 1991.
M. Kotani, M. Nakai, and K. Akazawa. Feature extraction using evolutionary computation. In CEC 1999, pages 1230–1236, 1999.
H. Liu and R. Setiono. Feature transformation and multivariate decision tree induction. In Discovery Science, pages 279–290, 1998.
B. Masand and G. Piatetsky-Shapiro. Discovering time oriented abstractions in historical data to optimize decision tree classification. In P. Angeline and E. Kinnear Jr, editors, Advances in Genetic Programming, volume 2, pages 489–498. MIT Press, 1996.
T. Mitchell. Machine Learning. WCB/McGraw-Hill, 1997.
M.L. Raymer, W.F. Punch, E.D. Goodman, and L.A. Kuhn. Genetic programming for improved data mining: An application to the biochemistry of protein interactions. In Proceedings GP 1996, pages 375–380. MIT Press, 1996.
R. Setiono and H. Liu. Fragmentation problem and automated feature construction. In Proc. 10th IEEE Int. Conf on Tools with AI, pages 208–215, 1998.
J. Sherrah. Automatic Feature Extraction for Pattern Recognition. PhD thesis, University of Adelaide, South Australia, 1998.
Zijian Zheng. A comparison of constructive induction with different types of new attribute. Technical Report TR C96/8, Deakin University, Geelong, Australia, May 1996.
© 2001 Springer-Verlag Berlin Heidelberg
Bot, M.C.J. (2001). Feature Extraction for the k-Nearest Neighbour Classifier with Genetic Programming. In: Miller, J., Tomassini, M., Lanzi, P.L., Ryan, C., Tettamanzi, A.G.B., Langdon, W.B. (eds) Genetic Programming. EuroGP 2001. Lecture Notes in Computer Science, vol 2038. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45355-5_20
Print ISBN: 978-3-540-41899-3
Online ISBN: 978-3-540-45355-0