Computational Statistics, Volume 26, Issue 2, pp 321–340

On the fusion of threshold classifiers for categorization and dimensionality reduction

  • Hans A. Kestler
  • Ludwig Lausser
  • Wolfgang Lindner
  • Günther Palm
Original Paper


We study ensembles of simple threshold classifiers for the categorization of high-dimensional data of low cardinality and give a compression bound on their prediction risk. Two approaches are utilized to produce such classifiers. One is based on univariate feature selection employing the area under the ROC curve as ranking criterion. The other approach uses a greedy selection strategy. The methods are applied to artificial data, published microarray expression profiles, and highly imbalanced data.
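The ranking-based approach mentioned in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the function names (`feature_auc`, `fit_stump`, `fit_ensemble`, `predict`), the choice of threshold by minimal training error, the ranking by distance of the AUC from 0.5, and the unweighted majority vote are all assumptions made for illustration. Labels are assumed to be coded as 0 and 1.

```python
import numpy as np

def feature_auc(x, y):
    """AUC of one feature x for labels y in {0, 1}, in Mann-Whitney form."""
    pos, neg = x[y == 1], x[y == 0]
    # Count (positive, negative) pairs ranked correctly; ties count one half.
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

def fit_stump(x, y):
    """Choose threshold and direction with minimal training error (assumption)."""
    values = np.unique(x)
    # Candidate thresholds: midpoints between consecutive distinct values.
    candidates = (values[:-1] + values[1:]) / 2.0 if len(values) > 1 else values
    best_err, best_t, best_sign = np.inf, candidates[0], 1
    for t in candidates:
        for sign in (1, -1):
            pred = (sign * (x - t) > 0).astype(int)
            err = np.mean(pred != y)
            if err < best_err:
                best_err, best_t, best_sign = err, t, sign
    return best_t, best_sign

def fit_ensemble(X, y, k=3):
    """Rank features by |AUC - 0.5|, keep the top k, fit one stump per feature."""
    aucs = np.array([feature_auc(X[:, j], y) for j in range(X.shape[1])])
    top = np.argsort(-np.abs(aucs - 0.5))[:k]
    return [(int(j),) + fit_stump(X[:, j], y) for j in top]

def predict(ensemble, X):
    """Fuse the single-feature threshold classifiers by unweighted majority vote."""
    votes = np.array([(s * (X[:, j] - t) > 0).astype(int) for j, t, s in ensemble])
    return (votes.mean(axis=0) > 0.5).astype(int)

# Toy data: six samples, three features (hypothetical values).
y = np.array([0, 0, 0, 1, 1, 1])
X = np.array([
    [0.1, 0.9, 0.5],
    [0.2, 0.8, 0.1],
    [0.3, 0.7, 0.9],
    [0.7, 0.3, 0.4],
    [0.8, 0.2, 0.6],
    [0.9, 0.1, 0.2],
])
ensemble = fit_ensemble(X, y, k=3)
labels = predict(ensemble, X)
```

Using an odd `k` avoids tied votes. Only the selected features enter the final classifier, which is where the dimensionality reduction comes from.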


Keywords: Feature reduction; Threshold classifiers; High dimensional data





Copyright information

© Springer-Verlag 2011

Authors and Affiliations

  • Hans A. Kestler (1, 2)
  • Ludwig Lausser (1)
  • Wolfgang Lindner (1)
  • Günther Palm (2)
  1. Internal Medicine I, University Hospital Ulm, Ulm, Germany
  2. Institute of Neural Information Processing, University of Ulm, Ulm, Germany
