Ensembles of Nearest Neighbors for Gene Expression Based Cancer Classification

  • Oleg Okun
  • Helen Priisalu
Part of the Studies in Computational Intelligence book series (SCI, volume 126)


Gene expression levels are useful in discriminating between cancer and normal examples and/or between different types of cancer. In this chapter, ensembles of k-nearest neighbors are employed for gene expression based cancer classification. The ensembles are created by randomly sampling subsets of genes, assigning each subset to a k-nearest neighbor (k-NN) classifier, and finally combining the k-NN predictions by majority vote. Selection of subsets is governed by the statistical dependence between dataset complexity and classification error, confirmed by the copula method, so that the least complex subsets are preferred, since they are associated with more accurate predictions. Experiments carried out on six gene expression datasets show that our ensemble scheme outperforms both the single best classifier in the ensemble and a redundancy-based filter specifically designed to remove irrelevant genes.
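The scheme described in the abstract (random gene subsets, one k-NN per subset, majority vote, preference for low-complexity subsets) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the function names, the parameter defaults, the restriction to two classes labelled 0 and 1, and in particular the `complexity` score, which is a simple Fisher-ratio stand-in and not the complexity measure or the copula-based analysis used in the chapter.

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k=3):
    """Plain k-NN with Euclidean distance; labels must be non-negative ints."""
    # Pairwise squared distances, shape (n_test, n_train).
    d2 = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d2, axis=1)[:, :k]   # indices of the k closest training samples
    votes = y_train[nearest]                  # their labels, shape (n_test, k)
    return np.array([np.bincount(v).argmax() for v in votes])

def complexity(X, y):
    """Hypothetical stand-in complexity score (NOT the chapter's measure):
    lower when at least one gene in the subset separates classes 0 and 1 well."""
    a, b = X[y == 0], X[y == 1]
    fisher = (a.mean(axis=0) - b.mean(axis=0)) ** 2 / (a.var(axis=0) + b.var(axis=0) + 1e-12)
    return 1.0 / (1.0 + fisher.max())

def ensemble_predict(X_train, y_train, X_test, n_candidates=50, n_members=5,
                     subset_size=10, k=3, seed=0):
    """Sample random gene subsets, keep the least complex ones, vote over k-NNs."""
    rng = np.random.default_rng(seed)
    n_genes = X_train.shape[1]
    # Candidate subsets: random gene indices drawn without replacement.
    subsets = [rng.choice(n_genes, size=subset_size, replace=False)
               for _ in range(n_candidates)]
    # Prefer the subsets with the lowest (stand-in) complexity on the training data.
    subsets.sort(key=lambda s: complexity(X_train[:, s], y_train))
    chosen = subsets[:n_members]
    # One k-NN per chosen subset, each seeing only its own genes.
    member_preds = np.array([knn_predict(X_train[:, s], y_train, X_test[:, s], k)
                             for s in chosen])
    # Majority vote across ensemble members for every test sample.
    return np.array([np.bincount(col).argmax() for col in member_preds.T])

if __name__ == "__main__":
    # Synthetic data only: 100 "genes", of which the first 5 differ between classes.
    rng = np.random.default_rng(42)
    y_train = np.array([0, 1] * 20)
    y_test = np.array([0, 1] * 10)
    X_train = rng.normal(size=(40, 100))
    X_test = rng.normal(size=(20, 100))
    X_train[:, :5] += 4.0 * y_train[:, None]
    X_test[:, :5] += 4.0 * y_test[:, None]
    pred = ensemble_predict(X_train, y_train, X_test)
    print("accuracy on synthetic data:", (pred == y_test).mean())
```

An odd number of ensemble members and an odd k avoid ties in the two-class case; with more classes or even counts, `np.bincount(...).argmax()` resolves ties in favour of the smallest label, which is a design shortcut of this sketch rather than anything stated in the source.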


Ensemble of classifiers, k-nearest neighbor, gene expression, cancer classification, dataset complexity, copula, bolstered error





Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Oleg Okun (1)
  • Helen Priisalu (2)
  1. University of Oulu, Oulu, Finland
  2. Teradata, Espoo, Finland
