Abstract
In pattern recognition, the curse of dimensionality can be handled either by reducing the number of features, e.g. with decision trees, or by extracting new features.
We propose a genetic programming (GP) framework for automatic extraction of features with the express aim of dimension reduction and the additional aim of improving accuracy of the k-nearest neighbour (k-NN) classifier. We will show that our system is capable of reducing most datasets to one or two features while k-NN accuracy improves or stays the same. Such a small number of features has the great advantage of allowing visual inspection of the dataset in a two-dimensional plot.
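The target classifier above is k-NN, which labels a query point by a majority vote among its k nearest training samples. As a minimal sketch (not the paper's implementation; the function name and Euclidean metric are illustrative assumptions):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    # Euclidean distance from the query point to every training sample
    d = np.linalg.norm(X_train - x, axis=1)
    # Majority vote among the labels of the k nearest neighbours
    nearest = y_train[np.argsort(d)[:k]]
    return Counter(nearest).most_common(1)[0][0]
```

With GP-extracted features, `X_train` would hold the transformed (one- or two-dimensional) feature vectors rather than the raw inputs.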
Since k-NN is a non-linear classification algorithm [2], we compare several linear fitness measures. We will show that a very simple one, the accuracy of the minimal-distance-to-means (mdm) classifier, outperforms all other fitness measures.
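The mdm classifier computes one mean vector per class and assigns each sample to the class with the nearest mean. A minimal sketch under that definition (function names and the Euclidean metric are assumptions, not taken from the paper):

```python
import numpy as np

def mdm_fit(X, y):
    """Compute the mean feature vector of each class."""
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

def mdm_predict(means, X):
    """Assign each sample to the class whose mean is nearest."""
    classes = list(means)
    centroids = np.stack([means[c] for c in classes])
    # Distance from every sample to every class mean
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return np.array([classes[i] for i in d.argmin(axis=1)])
```

Because fitness is evaluated many times per GP run, such a cheap linear classifier is an attractive stand-in for full k-NN accuracy.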
We introduce a stopping criterion gleaned from numerical mathematics: new features are only added if the relative increase in training accuracy exceeds a constant d, estimated for the mdm classifier to be 3.3%.
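The stopping criterion amounts to a relative-improvement test. A sketch, assuming the abstract's d = 0.033 and a hypothetical helper name:

```python
def should_add_feature(prev_acc, new_acc, d=0.033):
    """Accept a candidate feature only if the relative increase in
    training accuracy exceeds the threshold d (3.3% for mdm)."""
    return (new_acc - prev_acc) / prev_acc > d
```

For example, going from 80% to 86% training accuracy is a 7.5% relative gain and would justify the extra feature, while 90% to 91% (about 1.1%) would trigger the stop.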
References
E.I. Chang and R.P. Lippman. Using genetic algorithms to improve pattern classification performance. In Advances in Neural Information Processing Systems, 1991.
J. Friedman, J. Bentley, and R. Finkel. An algorithm for finding best matches in logarithmic expected time. ACM Transactions on Mathematical Software, 3(3):209–226, 1977.
K. Fukunaga. Introduction to Statistical Pattern Recognition. Academic Press, New York, second edition, 1990.
J. Edward Jackson. A User’s Guide to Principal Components. John Wiley & Sons, Inc, 1991.
M. Kotani, M. Nakai, and K. Akazawa. Feature extraction using evolutionary computation. In CEC 1999, pages 1230–1236, 1999.
H. Liu and R. Setiono. Feature transformation and multivariate decision tree induction. In Discovery Science, pages 279–290, 1998.
B. Masand and G. Piatetsky-Shapiro. Discovering time oriented abstractions in historical data to optimize decision tree classification. In P. Angeline and E. Kinnear Jr, editors, Advances in Genetic Programming, volume 2, pages 489–498. MIT Press, 1996.
T. Mitchell. Machine Learning. WCB/McGraw-Hill, 1997.
M.L. Raymer, W.F. Punch, E.D. Goodman, and L.A. Kuhn. Genetic programming for improved data mining: An application to the biochemistry of protein interactions. In Proceedings GP 1996, pages 375–380. MIT Press, 1996.
R. Setiono and H. Liu. Fragmentation problem and automated feature construction. In Proc. 10th IEEE Int. Conf on Tools with AI, pages 208–215, 1998.
J. Sherrah. Automatic Feature Extraction for Pattern Recognition. PhD thesis, University of Adelaide, South Australia, 1998.
Zijian Zheng. A comparison of constructive induction with different types of new attribute. Technical Report TR C96/8, Deakin University, Geelong, Australia, May 1996.
© 2001 Springer-Verlag Berlin Heidelberg
Bot, M.C.J. (2001). Feature Extraction for the k-Nearest Neighbour Classifier with Genetic Programming. In: Miller, J., Tomassini, M., Lanzi, P.L., Ryan, C., Tettamanzi, A.G.B., Langdon, W.B. (eds) Genetic Programming. EuroGP 2001. Lecture Notes in Computer Science, vol 2038. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45355-5_20
Print ISBN: 978-3-540-41899-3
Online ISBN: 978-3-540-45355-0