A boosting method for maximization of the area under the ROC curve

Annals of the Institute of Statistical Mathematics

Abstract

We discuss the receiver operating characteristic (ROC) curve and the area under the ROC curve (AUC) for binary classification problems in clinical fields. We propose a statistical method for combining multiple feature variables, based on a boosting algorithm that maximizes the AUC. In this iterative procedure, various simple classifiers built from the feature variables are combined flexibly into a single strong classifier. To prevent overfitting, we regularize the algorithm with a penalty term for nonsmoothness. This regularization not only improves classification performance but also gives a clearer understanding of how each feature variable is related to the binary outcome variable. We demonstrate the usefulness of score plots constructed componentwise by the boosting method. We describe two simulation studies and a real data analysis to illustrate the utility of our method.
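
For orientation, the AUC that the method targets can be estimated from data as the proportion of (positive, negative) pairs in which the positive subject receives the higher score, with ties counted as one half (the Mann-Whitney form of the empirical AUC). The Python sketch below illustrates only this criterion, not the boosting algorithm or the penalty term described in the abstract. The function name `empirical_auc` and the example scores are hypothetical and chosen for illustration.

```python
from typing import Sequence


def empirical_auc(scores_pos: Sequence[float], scores_neg: Sequence[float]) -> float:
    """Empirical AUC: share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as one half.
    """
    if len(scores_pos) == 0 or len(scores_neg) == 0:
        raise ValueError("need at least one score in each class")
    total = 0.0
    for sp in scores_pos:
        for sn in scores_neg:
            if sp > sn:
                total += 1.0   # pair ranked correctly
            elif sp == sn:
                total += 0.5   # tie counts as half
    return total / (len(scores_pos) * len(scores_neg))


# Hypothetical classifier scores (illustrative values, not from the article).
diseased = [0.9, 0.8, 0.4]   # positive class
healthy = [0.7, 0.3, 0.2]    # negative class
auc = empirical_auc(diseased, healthy)
print(f"empirical AUC = {auc:.3f}")
```

Here eight of the nine pairs are ordered correctly, so the empirical AUC is 8/9. Because this quantity is a step function of the scores, methods that maximize it directly typically work with a smooth surrogate; the specific choice made in the article is not shown here.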

Author information

Corresponding author

Correspondence to Osamu Komori.

About this article

Cite this article

Komori, O. A boosting method for maximization of the area under the ROC curve. Ann Inst Stat Math 63, 961–979 (2011). https://doi.org/10.1007/s10463-009-0264-y

  • DOI: https://doi.org/10.1007/s10463-009-0264-y
