A low variance error boosting algorithm

Applied Intelligence

Abstract

This paper introduces cw-AdaBoost, a robust variant of AdaBoost that uses weight perturbation to reduce variance error. It is particularly effective on datasets, such as microarray data, that have a large number of features and a small number of instances. The algorithm is compared with AdaBoost, Arcing, and MultiBoost on twelve gene expression datasets using 10-fold cross-validation. The new algorithm consistently achieves higher classification accuracy on all of these datasets. In contrast to other AdaBoost variants, it is not susceptible to problems when a zero-error base classifier is encountered.
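To illustrate the zero-error problem the abstract refers to, the following is a minimal sketch of the update used by standard (discrete) AdaBoost, not of cw-AdaBoost, whose weight perturbation scheme is not described in this abstract. The function names `adaboost_alpha` and `reweight` are hypothetical, chosen only for this example. In standard AdaBoost a base classifier with weighted error e receives the vote weight 0.5 * ln((1 - e) / e), which is unbounded as e approaches zero.

```python
import math


def adaboost_alpha(error: float) -> float:
    """Vote weight of a base classifier in standard discrete AdaBoost."""
    if error <= 0.0:
        # A zero-error base classifier makes the ratio (1 - e) / e unbounded.
        return math.inf
    return 0.5 * math.log((1.0 - error) / error)


def reweight(weights, misclassified, alpha):
    """One standard AdaBoost instance-weight update, then normalisation."""
    updated = [
        w * math.exp(alpha if miss else -alpha)
        for w, miss in zip(weights, misclassified)
    ]
    total = sum(updated)
    return [w / total for w in updated]


# Ordinary round: one of four equally weighted instances is misclassified.
alpha = adaboost_alpha(0.25)
print(reweight([0.25] * 4, [True, False, False, False], alpha))

# Zero-error round: every instance weight is multiplied by exp(-inf) = 0,
# so the normalising sum is zero and the next distribution is undefined.
try:
    reweight([0.25] * 4, [False] * 4, adaboost_alpha(0.0))
except ZeroDivisionError:
    print("zero-error base classifier: weight distribution is undefined")
```

In the ordinary round the misclassified instance ends up holding half of the total weight, which is the usual AdaBoost behaviour. Implementations of standard AdaBoost typically handle the zero-error case by stopping early or resetting the weights; the abstract states that cw-AdaBoost does not suffer from this problem.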

Corresponding author

Correspondence to Ching-Wei Wang.

About this article

Cite this article

Wang, CW., Hunter, A. A low variance error boosting algorithm. Appl Intell 33, 357–369 (2010). https://doi.org/10.1007/s10489-009-0172-0

  • DOI: https://doi.org/10.1007/s10489-009-0172-0
