A boosting method for maximization of the area under the ROC curve

Komori, Osamu

doi:10.1007/s10463-009-0264-y

Published: 28 October 2009

Volume 63, pages 961–979 (2011)

Annals of the Institute of Statistical Mathematics

Osamu Komori¹

Abstract

We discuss the receiver operating characteristic (ROC) curve and the area under the ROC curve (AUC) for binary classification problems in clinical fields. We propose a statistical method for combining multiple feature variables, based on a boosting algorithm for maximization of the AUC. In this iterative procedure, various simple classifiers that consist of the feature variables are combined flexibly into a single strong classifier. We consider a regularization to prevent overfitting to data in the algorithm, using a penalty term for nonsmoothness. This regularization method not only improves the classification performance but also helps us to get a clearer understanding of how each feature variable is related to the binary outcome variable. We demonstrate the usefulness of score plots constructed componentwise by the boosting method. We describe two simulation studies and a real data analysis in order to illustrate the utility of our method.
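
To make the idea of combining simple classifiers to raise the AUC concrete, the following is a minimal sketch and not the algorithm of the article. It assumes decision stumps as the simple classifiers, a fixed step size, and a greedy search over the raw empirical AUC (the Mann-Whitney statistic); it omits the penalty term for nonsmoothness entirely. The names `empirical_auc`, `stump`, and `boost_auc` are hypothetical and chosen only for illustration.

```python
from itertools import product

def empirical_auc(scores_neg, scores_pos):
    """Mann-Whitney form of the AUC: the fraction of (negative, positive)
    pairs ranked correctly, counting ties as one half."""
    total = 0.0
    for s0, s1 in product(scores_neg, scores_pos):
        if s1 > s0:
            total += 1.0
        elif s1 == s0:
            total += 0.5
    return total / (len(scores_neg) * len(scores_pos))

def stump(j, threshold, sign):
    """Simple classifier on feature j (illustrative assumption):
    returns sign if x[j] > threshold, otherwise -sign."""
    return lambda x: sign if x[j] > threshold else -sign

def boost_auc(X_neg, X_pos, n_rounds=5, step=1.0):
    """Greedy stagewise sketch: each round adds the stump that most
    increases the training AUC of the combined score. No regularization."""
    n_features = len(X_neg[0])
    chosen = []  # list of (weight, classifier) pairs

    def score(x, extra=None):
        terms = chosen + ([extra] if extra else [])
        return sum(w * f(x) for w, f in terms)

    def auc_with(extra=None):
        return empirical_auc([score(x, extra) for x in X_neg],
                             [score(x, extra) for x in X_pos])

    best_auc = auc_with()  # all scores are 0 at the start, so this is 0.5
    for _ in range(n_rounds):
        # Candidate thresholds are the observed feature values.
        candidates = [
            (step, stump(j, x[j], sign))
            for j in range(n_features)
            for x in X_neg + X_pos
            for sign in (+1, -1)
        ]
        cand_aucs = [auc_with(c) for c in candidates]
        top = max(range(len(candidates)), key=cand_aucs.__getitem__)
        if cand_aucs[top] <= best_auc:
            break  # no stump improves the training AUC any further
        chosen.append(candidates[top])
        best_auc = cand_aucs[top]
    return (lambda x: score(x)), best_auc

# Toy data (made up for illustration): two features per subject.
X_neg = [[0.1, 5.0], [0.4, 3.0], [0.3, 4.0]]
X_pos = [[0.6, 4.5], [0.9, 3.5], [0.35, 2.0]]
F, train_auc = boost_auc(X_neg, X_pos)
```

Because the combined score is a sum of per-feature terms, the contribution of each feature could be plotted separately, which is the kind of componentwise score plot the abstract refers to. A practical version would replace the exhaustive search with a smooth surrogate of the AUC and add the smoothness penalty described above.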

Author information

Authors and Affiliations

Department of Statistical Science, The Graduate University for Advanced Studies, Minami-azabu, Tokyo, 106-8569, Japan
Osamu Komori

Corresponding author

Correspondence to Osamu Komori.

About this article

Cite this article

Komori, O. A boosting method for maximization of the area under the ROC curve. Ann Inst Stat Math 63, 961–979 (2011). https://doi.org/10.1007/s10463-009-0264-y

Received: 01 December 2008
Revised: 08 July 2009
Published: 28 October 2009
Issue Date: October 2011
DOI: https://doi.org/10.1007/s10463-009-0264-y
