Development and Evaluation of Classifiers

  • Protocol
Topics in Biostatistics

Part of the book series: Methods in Molecular Biology™ (MIMB, volume 404)

Abstract

Diagnostic tests, medical tests, screening tests, biomarkers, and prediction rules are all types of classifiers. This chapter introduces methods for classifier development and evaluation. We first introduce measures of classification performance including sensitivity, specificity, and receiver operating characteristic (ROC) curves. We then review some issues in the design of studies to assess and compare the performance of classifiers. Approaches for using the data to estimate and compare classifier accuracy are then introduced. Next, methods for combining multiple classifiers into a single classifier are presented. Lastly, we discuss other important aspects of classifier development and evaluation. The methods presented are illustrated with real data.
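As a rough illustration of the performance measures named in the abstract, the sketch below computes sensitivity and specificity at a single threshold, and a nonparametric (Mann-Whitney) estimate of the area under the ROC curve. It is a minimal sketch, not the chapter's own code: the function names, the toy scores, and the convention that higher scores indicate disease are all assumptions made for this example.

```python
def sensitivity_specificity(scores, labels, threshold):
    """Call a subject positive when score >= threshold (assumed convention).

    labels: 1 = diseased (case), 0 = non-diseased (control).
    Returns (sensitivity, specificity).
    """
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if y == 1 and s >= threshold)
    fn = sum(1 for s, y in pairs if y == 1 and s < threshold)
    tn = sum(1 for s, y in pairs if y == 0 and s < threshold)
    fp = sum(1 for s, y in pairs if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def empirical_auc(scores, labels):
    """Mann-Whitney estimate of the AUC.

    Proportion of case-control pairs in which the case has the higher
    score, with ties counted as one half.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    total = 0.0
    for c in cases:
        for d in controls:
            if c > d:
                total += 1.0
            elif c == d:
                total += 0.5
    return total / (len(cases) * len(controls))


# Hypothetical toy data: three cases followed by three controls.
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
labels = [1, 1, 1, 0, 0, 0]

sens, spec = sensitivity_specificity(scores, labels, threshold=0.5)
auc = empirical_auc(scores, labels)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} AUC={auc:.3f}")
```

Varying the threshold over all observed scores and plotting sensitivity against 1 − specificity would trace out the empirical ROC curve for the same data.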


Copyright information

© 2007 Humana Press Inc., Totowa, NJ

About this protocol

Cite this protocol

Alonzo, T.A., Pepe, M.S. (2007). Development and Evaluation of Classifiers. In: Ambrosius, W.T. (eds) Topics in Biostatistics. Methods in Molecular Biology™, vol 404. Humana Press. https://doi.org/10.1007/978-1-59745-530-5_6

  • DOI: https://doi.org/10.1007/978-1-59745-530-5_6

  • Publisher Name: Humana Press

  • Print ISBN: 978-1-58829-531-6

  • Online ISBN: 978-1-59745-530-5

  • eBook Packages: Springer Protocols
