Abstract
Both theory and a wealth of empirical studies have established that ensembles are more accurate than single predictive models. For the ensemble approach to work, the base classifiers must be not only accurate but also diverse, i.e., they should commit their errors on different instances. Instance-based learners, however, are very robust to variations in a data set, so standard resampling methods normally produce only limited diversity. For this reason, instance-based learners are rarely used as base classifiers in ensembles. In this chapter, we introduce a method in which genetic programming is used to generate kNN base classifiers with optimized k-values and feature weights. Because of the inherent inconsistency of genetic programming (different runs on identical data with identical parameters still produce different solutions), a group of independently evolved base classifiers tends to be not only accurate but also diverse. In experiments on 30 data sets from the UCI repository, two slightly different versions of kNN ensembles are shown to significantly outperform both the corresponding base classifiers and standard kNN with optimized k-values, with respect to both accuracy and AUC.
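The method described above combines feature-weighted kNN base classifiers into a voting ensemble. A minimal Python sketch of such an ensemble is given below as an illustration only: the hand-picked (k, weights) pairs stand in for the genetic-programming search described in the chapter, and the function names are assumptions, not the authors' G-REX implementation.

```python
import numpy as np

def weighted_knn_predict(X_train, y_train, x, k, weights):
    """Predict the class of x by majority vote among its k nearest
    training instances under a feature-weighted Euclidean distance."""
    diffs = (X_train - x) * weights            # scale each feature by its weight
    dists = np.sqrt((diffs ** 2).sum(axis=1))  # weighted Euclidean distances
    nearest = np.argsort(dists)[:k]            # indices of the k closest points
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]           # plurality class among neighbours

def ensemble_predict(members, X_train, y_train, x):
    """Majority vote over base classifiers, each defined by its own
    (k, feature-weight vector) pair. Here the pairs are fixed by hand,
    whereas the chapter evolves them with genetic programming."""
    votes = [weighted_knn_predict(X_train, y_train, x, k, w)
             for k, w in members]
    labels, counts = np.unique(votes, return_counts=True)
    return labels[np.argmax(counts)]
```

Because each base classifier uses a different k and a different weighting of the feature space, the members tend to err on different instances, which is the diversity the ensemble exploits.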
References
Asuncion, A., Newman, D.J.: UCI machine learning repository (2007)
Bishop, C.M.: Neural Networks for Pattern Recognition. Oxford University Press, Oxford (1995)
Boström, H.: Estimating class probabilities in random forests. In: ICMLA ’07: Proceedings of the Sixth International Conference on Machine Learning and Applications, pp. 211–216. IEEE Computer Society, Washington, DC, USA (2007)
Breiman, L.: Bagging predictors. Machine Learning 24(2), 123–140 (1996)
Breiman, L., Friedman, J., Stone, C.J., Olshen, R.A.: Classification and Regression Trees. Chapman & Hall/CRC, Boca Raton, FL (1984)
Brier, G.: Verification of forecasts expressed in terms of probability. Monthly Weather Review 78, 1–3 (1950)
Brown, G., Wyatt, J., Harris, R., Yao, X.: Diversity creation methods: a survey and categorisation. Journal of Information Fusion 6(1), 5–20 (2005)
Dietterich, T.G.: Machine-learning research: Four current directions. The AI Magazine 18(4), 97–136 (1998)
Domeniconi, C., Yan, B.: Nearest neighbor ensemble. In: 17th International Conference on Pattern Recognition, vol. 1, pp. 228–231. IEEE Computer Society, Los Alamitos, CA, USA (2004)
Fawcett, T.: Using rule sets to maximize ROC performance. In: Proceedings of the 2001 IEEE International Conference on Data Mining, ICDM'01, pp. 131–138. IEEE Computer Society, Washington, DC, USA (2001)
Johansson, U.: Obtaining Accurate and Comprehensible Data Mining Models: An Evolutionary Approach. PhD-thesis. Institute of Technology, Linköping University (2007)
Johansson, U., König, R., Niklasson, L.: Rule extraction from trained neural networks using genetic programming. In: 13th International Conference on Artificial Neural Networks, supplementary proceedings, pp. 13–16 (2003)
Johansson, U., König, R., Niklasson, L.: Evolving a locally optimized instance based learner. In: 4th International Conference on Data Mining – DMIN’08, pp. 124–129. CSREA Press (2008)
König, R., Johansson, U., Niklasson, L.: G-REX: A versatile framework for evolutionary data mining. In: IEEE International Conference on Data Mining (ICDM'08), demo paper, in press (2008)
Krogh, A., Vedelsby, J.: Neural network ensembles, cross validation, and active learning. Advances in Neural Information Processing Systems 2, 231–238 (1995)
Quinlan, J.R.: C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers Inc., San Francisco, CA (1993)
Schapire, R.E.: The strength of weak learnability. Machine Learning 5(2), 197–227 (1990)
Wettschereck, D., Dietterich, T.G.: Locally adaptive nearest neighbor algorithms. Advances in Neural Information Processing Systems 6, 184–191 (1994)
Witten, I.H., Frank, E.: Data Mining: Practical Machine Learning Tools and Techniques, Second Edition (Morgan Kaufmann Series in Data Management Systems). Morgan Kaufmann, San Francisco, CA (2005)
Wolpert, D.H.: Stacked generalization. Neural Networks 5, 241–259 (1992)
Zavrel, J.: An empirical re-examination of weighted voting for k-NN. In: Proceedings of the 7th Belgian-Dutch Conference on Machine Learning, pp. 139–148 (1997)
Acknowledgments
This work was supported by the Information Fusion Research Program (University of Skövde, Sweden) in partnership with the Swedish Knowledge Foundation under grant 2003/0104 (URL: http://www.infofusion.se).
Copyright information
© 2010 Springer Science+Business Media, LLC
Cite this chapter
Johansson, U., König, R., Niklasson, L. (2010). Genetically Evolved kNN Ensembles. In: Stahlbock, R., Crone, S., Lessmann, S. (eds) Data Mining. Annals of Information Systems, vol 8. Springer, Boston, MA. https://doi.org/10.1007/978-1-4419-1280-0_13
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4419-1279-4
Online ISBN: 978-1-4419-1280-0