Data Mining, pp. 299–313

Genetically Evolved kNN Ensembles

  • Ulf Johansson
  • Rikard König
  • Lars Niklasson
Part of the Annals of Information Systems book series (AOIS, volume 8)


Both theory and a wealth of empirical studies have established that ensembles are more accurate than single predictive models. For the ensemble approach to work, base classifiers must not only be accurate but also diverse, i.e., they should make their errors on different instances. Instance-based learners are, however, very robust with respect to variations in the training data, so standard resampling methods will normally produce only limited diversity. Because of this, instance-based learners are rarely used as base classifiers in ensembles. In this chapter, we introduce a method where genetic programming is used to generate kNN base classifiers with optimized k-values and feature weights. Due to the inherent inconsistency of genetic programming (i.e., different runs on identical data with identical parameters will still produce different solutions), a group of independently evolved base classifiers tends to be not only accurate but also diverse. In the experiments, using 30 data sets from the UCI repository, two slightly different versions of kNN ensembles are shown to significantly outperform both their base classifiers and standard kNN with optimized k-values, with respect to both accuracy and AUC.
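To make the idea concrete, the sketch below shows a feature-weighted kNN base classifier and a majority-vote ensemble in Python. It is a minimal illustration under stated assumptions, not the authors' G-REX genetic-programming system: the evolve_member function stands in for one GP run by randomly searching over k-values and feature weights and keeping the candidate with the best leave-one-out training accuracy, so independently built members end up with different parameters, mimicking the run-to-run inconsistency that produces diversity as described above. All names (WeightedKNN, loo_accuracy, evolve_member, ensemble_predict) are illustrative, not from the chapter.

import numpy as np

class WeightedKNN:
    """kNN classifier using a per-feature weighted Euclidean distance."""

    def __init__(self, k, feature_weights):
        self.k = k
        self.w = np.asarray(feature_weights, dtype=float)

    def fit(self, X, y):
        self.X = np.asarray(X, dtype=float)
        self.y = np.asarray(y)
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        preds = np.empty(len(X), dtype=self.y.dtype)
        for i, x in enumerate(X):
            # weighted squared Euclidean distance to every training instance
            dist = (self.w * (self.X - x) ** 2).sum(axis=1)
            nearest = self.y[np.argsort(dist)[: self.k]]
            labels, counts = np.unique(nearest, return_counts=True)
            preds[i] = labels[np.argmax(counts)]  # majority class among the k
        return preds

def loo_accuracy(X, y, k, w):
    """Leave-one-out accuracy on the training data, used as fitness."""
    dist = ((X[:, None, :] - X[None, :, :]) ** 2 * w).sum(axis=2)
    np.fill_diagonal(dist, np.inf)  # an instance may not vote for itself
    hits = 0
    for i in range(len(X)):
        nearest = y[np.argsort(dist[i])[: k]]
        labels, counts = np.unique(nearest, return_counts=True)
        hits += labels[np.argmax(counts)] == y[i]
    return hits / len(X)

def evolve_member(X, y, rng, candidates=25):
    """Stand-in for one GP run: random search over k and feature weights."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    best, best_fit = None, -1.0
    for _ in range(candidates):
        k = int(rng.choice([1, 3, 5, 7, 9, 11]))
        w = rng.uniform(0.0, 1.0, size=X.shape[1])
        fit = loo_accuracy(X, y, k, w)
        if fit > best_fit:
            best, best_fit = WeightedKNN(k, w), fit
    return best.fit(X, y)

def ensemble_predict(members, X):
    """Unweighted majority vote over the base classifiers."""
    votes = np.stack([m.predict(X) for m in members])  # members x instances
    out = []
    for column in votes.T:
        labels, counts = np.unique(column, return_counts=True)
        out.append(labels[np.argmax(counts)])
    return np.array(out)

# Usage, assuming X_train, y_train, X_test are given as arrays:
# rng = np.random.default_rng(0)
# members = [evolve_member(X_train, y_train, rng) for _ in range(10)]
# y_pred = ensemble_predict(members, X_test)

The chapter's second ensemble variant uses weighted voting; in this sketch that would amount to weighting each member's vote by, for example, its leave-one-out training accuracy instead of counting every vote equally.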


Keywords: Genetic Programming · Test Instance · Feature Weight · Weighted Vote · Brier Score





This work was supported by the Information Fusion Research Program (University of Skövde, Sweden) in partnership with the Swedish Knowledge Foundation under grant 2003/0104.


References

  1. Asuncion, A., Newman, D.J.: UCI Machine Learning Repository (2007)
  2. Bishop, C.M.: Neural Networks for Pattern Recognition. Oxford University Press, Oxford (1995)
  3. Boström, H.: Estimating class probabilities in random forests. In: ICMLA '07: Proceedings of the Sixth International Conference on Machine Learning and Applications, pp. 211–216. IEEE Computer Society, Washington, DC, USA (2007)
  4. Breiman, L.: Bagging predictors. Machine Learning 24(2), 123–140 (1996)
  5. Breiman, L., Friedman, J., Olshen, R.A., Stone, C.J.: Classification and Regression Trees. Chapman & Hall/CRC, Boca Raton, FL (1984)
  6. Brier, G.: Verification of forecasts expressed in terms of probability. Monthly Weather Review 78, 1–3 (1950)
  7. Brown, G., Wyatt, J., Harris, R., Yao, X.: Diversity creation methods: a survey and categorisation. Journal of Information Fusion 6(1), 5–20 (2005)
  8. Dietterich, T.G.: Machine-learning research: four current directions. AI Magazine 18(4), 97–136 (1997)
  9. Domeniconi, C., Yan, B.: Nearest neighbor ensemble. In: 17th International Conference on Pattern Recognition, vol. 1, pp. 228–231. IEEE Computer Society, Los Alamitos, CA, USA (2004)
  10. Fawcett, T.: Using rule sets to maximize ROC performance. In: Proceedings of the 2001 IEEE International Conference on Data Mining, ICDM '01, pp. 131–138. IEEE Computer Society, Washington, DC, USA (2001)
  11. Johansson, U.: Obtaining Accurate and Comprehensible Data Mining Models: An Evolutionary Approach. PhD thesis, Institute of Technology, Linköping University (2007)
  12. Johansson, U., König, R., Niklasson, L.: Rule extraction from trained neural networks using genetic programming. In: 13th International Conference on Artificial Neural Networks, supplementary proceedings, pp. 13–16 (2003)
  13. Johansson, U., König, R., Niklasson, L.: Evolving a locally optimized instance based learner. In: 4th International Conference on Data Mining – DMIN '08, pp. 124–129. CSREA Press (2008)
  14. König, R., Johansson, U., Niklasson, L.: G-REX: a versatile framework for evolutionary data mining. In: IEEE International Conference on Data Mining (ICDM '08), demo paper, in press (2008)
  15. Krogh, A., Vedelsby, J.: Neural network ensembles, cross validation, and active learning. Advances in Neural Information Processing Systems 7, 231–238 (1995)
  16. Quinlan, J.R.: C4.5: Programs for Machine Learning. Morgan Kaufmann, San Francisco, CA (1993)
  17. Schapire, R.E.: The strength of weak learnability. Machine Learning 5(2), 197–227 (1990)
  18. Wettschereck, D., Dietterich, T.G.: Locally adaptive nearest neighbor algorithms. Advances in Neural Information Processing Systems 6, 184–191 (1994)
  19. Witten, I.H., Frank, E.: Data Mining: Practical Machine Learning Tools and Techniques, 2nd edn. Morgan Kaufmann, San Francisco, CA (2005)
  20. Wolpert, D.H.: Stacked generalization. Neural Networks 5, 241–259 (1992)
  21. Zavrel, J.: An empirical re-examination of weighted voting for k-NN. In: Proceedings of the 7th Belgian-Dutch Conference on Machine Learning, pp. 139–148 (1997)

Copyright information

© Springer Science+Business Media, LLC 2010

Authors and Affiliations

  1. School of Business and Informatics, University of Borås, Borås, Sweden
  2. Informatics Research Centre, University of Skövde, Skövde, Sweden
