Applied Intelligence, Volume 33, Issue 3, pp 357–369

A low variance error boosting algorithm

Abstract

This paper introduces cw-AdaBoost, a robust variant of AdaBoost that uses weight perturbation to reduce variance error. It is particularly effective on data sets, such as microarray data, that have a large number of features and a small number of instances. The algorithm is compared with AdaBoost, Arcing and MultiBoost on twelve gene expression data sets using 10-fold cross-validation, and it consistently achieves higher classification accuracy on all of them. Unlike other AdaBoost variants, the algorithm does not run into difficulties when it encounters a base classifier with zero error.
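The zero-error problem mentioned above can be seen in the vote weight used by the common binary form of AdaBoost, α = ½ ln((1 − ε)/ε), where ε is the weighted training error of a base classifier: as ε approaches 0, α grows without bound and the reweighting step collapses. The Python sketch below illustrates this with the standard AdaBoost reweighting rule, alongside a generic random weight perturbation. The `perturb` function and its `scale` parameter are hypothetical illustrations only; they are not the cw-AdaBoost update, whose exact form is not given in this abstract.

```python
import math
import random


def adaboost_alpha(weighted_error):
    """Vote weight of a base classifier in binary AdaBoost:
    alpha = 0.5 * ln((1 - eps) / eps).
    Returns infinity for a zero-error base classifier, where the
    formula diverges."""
    if weighted_error <= 0.0:
        return math.inf
    return 0.5 * math.log((1.0 - weighted_error) / weighted_error)


def reweight(weights, correct, alpha):
    """Standard AdaBoost update: shrink the weight of correctly
    classified instances by exp(-alpha), grow the rest by exp(alpha),
    then renormalise so the weights sum to 1."""
    new = [w * math.exp(-alpha if c else alpha) for w, c in zip(weights, correct)]
    total = sum(new)  # becomes 0.0 when alpha is infinite and all are correct
    return [w / total for w in new]


def perturb(weights, rng, scale=0.5):
    """HYPOTHETICAL weight perturbation, for illustration only (not the
    cw-AdaBoost rule): multiply each weight by a random factor drawn
    uniformly from [1 - scale, 1 + scale], then renormalise."""
    noisy = [w * rng.uniform(1.0 - scale, 1.0 + scale) for w in weights]
    total = sum(noisy)
    return [w / total for w in noisy]


if __name__ == "__main__":
    weights = [0.25] * 4
    alpha = adaboost_alpha(0.25)  # one of four equally weighted instances is wrong
    print(round(alpha, 4))
    print(reweight(weights, [True, True, True, False], alpha))
    print(perturb(weights, random.Random(0)))
```

For example, `adaboost_alpha(0.25)` returns ½ ln 3 ≈ 0.5493, and after `reweight` the misclassified instance carries half of the total weight, the usual AdaBoost property. Passing the infinite α from `adaboost_alpha(0.0)` to `reweight` when every instance is classified correctly drives all weights to zero, so the normalisation raises `ZeroDivisionError`.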

Keywords

Boosting, Bagging, Arcing, MultiBoost, Ensemble machine learning, Random resampling weighted instances, Variance error, Bias error

Copyright information

© Springer Science+Business Media, LLC 2009

Authors and Affiliations

  1. University of Lincoln, Lincoln, UK