European Journal of Epidemiology, Volume 27, Issue 10, pp 761–770

Assessing the discriminative ability of risk models for more than two outcome categories

  • Ben Van Calster
  • Yvonne Vergouwe
  • Caspar W. N. Looman
  • Vanya Van Belle
  • Dirk Timmerman
  • Ewout W. Steyerberg


The discriminative ability of risk models for dichotomous outcomes is often evaluated with the concordance index (c-index). However, many medical prediction problems are polytomous, meaning that more than two outcome categories need to be predicted. Unfortunately, such problems are often dichotomized in prediction research. We present a perspective on the evaluation of the discriminative ability of polytomous risk models, which may encourage researchers to consider polytomous prediction models more often. First, we suggest a “discrimination plot” as a tool to visualize the model’s discriminative ability. Second, we discuss the use of one overall polytomous c-index versus a set of dichotomous measures to summarize the performance of the model. Third, we address several aspects to consider when constructing a polytomous c-index. These involve the assessment of concordance in pairs versus sets of patients, weighting by outcome prevalence, the value obtained for models with random performance, the reduction to the dichotomous c-index for dichotomous problems, and interpretation. We illustrate these issues using case studies dealing with ovarian cancer (four outcome categories) and testicular cancer (three categories). We recommend the use of a discrimination plot together with an overall c-index such as the Polytomous Discrimination Index. If the overall c-index suggests that the model has relevant discriminative ability, pairwise c-indexes for each pair of outcome categories are informative. For pairwise c-indexes we recommend the ‘conditional-risk’ method, which is consistent with the analytical approach of the multinomial logistic regression used to develop polytomous risk models.
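As a rough illustration of a pairwise c-index, the sketch below computes one for two outcome categories i and j. It assumes that the ‘conditional-risk’ method means ranking only the patients whose observed outcome is i or j by the conditional risk p_i / (p_i + p_j), and then computing the ordinary dichotomous c-index on that score. That reading, the function names, and the toy data are our assumptions for illustration; they are not taken from the article.

```python
from itertools import product


def c_index(scores_pos, scores_neg):
    """Dichotomous c-index: share of (pos, neg) pairs in which the
    positive case has the higher score; tied pairs count as 1/2."""
    if not scores_pos or not scores_neg:
        raise ValueError("both groups need at least one patient")
    total = 0.0
    for s_pos, s_neg in product(scores_pos, scores_neg):
        if s_pos > s_neg:
            total += 1.0
        elif s_pos == s_neg:
            total += 0.5
    return total / (len(scores_pos) * len(scores_neg))


def pairwise_c_index(probs, outcomes, i, j):
    """Pairwise c-index for category i versus j (assumed conditional-risk form).

    probs    : per-patient lists of predicted probabilities, one per category
    outcomes : observed category index for each patient
    Assumes p[i] + p[j] > 0 for every patient, as holds for multinomial
    logistic regression predictions.
    """
    def conditional_risk(p):
        return p[i] / (p[i] + p[j])

    cases_i = [conditional_risk(p) for p, y in zip(probs, outcomes) if y == i]
    cases_j = [conditional_risk(p) for p, y in zip(probs, outcomes) if y == j]
    return c_index(cases_i, cases_j)


# Hypothetical predictions for six patients and three outcome categories.
toy_probs = [
    [0.7, 0.2, 0.1],  # observed category 0
    [0.5, 0.3, 0.2],  # observed category 0
    [0.3, 0.6, 0.1],  # observed category 1
    [0.6, 0.3, 0.1],  # observed category 1
    [0.2, 0.2, 0.6],  # observed category 2
    [0.3, 0.1, 0.6],  # observed category 2
]
toy_outcomes = [0, 0, 1, 1, 2, 2]

for i, j in [(0, 1), (0, 2), (1, 2)]:
    print(i, j, pairwise_c_index(toy_probs, toy_outcomes, i, j))
```

With only two outcome categories the conditional risk equals the predicted probability itself, so this sketch reduces to the usual dichotomous c-index. The brute-force loop over all pairs is chosen for readability and would be slow on large validation sets.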


Polytomous risk prediction · Discrimination · c-index · Discrimination plot



Copyright information

© Springer Science+Business Media Dordrecht 2012

Authors and Affiliations

  • Ben Van Calster (1, 2)
  • Yvonne Vergouwe (2)
  • Caspar W. N. Looman (2)
  • Vanya Van Belle (3, 4)
  • Dirk Timmerman (1)
  • Ewout W. Steyerberg (2)

  1. Department of Development and Regeneration, KU Leuven, University of Leuven, Leuven, Belgium
  2. Department of Public Health, Erasmus MC, Rotterdam, The Netherlands
  3. Department of Electrical Engineering, KU Leuven, Leuven, Belgium
  4. IBBT-Future Health Department, KU Leuven, Leuven, Belgium
