Lifetime Data Analysis, Volume 19, Issue 2, pp 170–201

Estimating improvement in prediction with matched case–control designs



When an existing risk prediction model is not sufficiently predictive, additional variables are sought for inclusion in the model. This paper addresses study designs to evaluate the improvement in prediction performance that is gained by adding a new predictor to a risk prediction model. We consider studies that measure the new predictor in a case–control subset of the study cohort, a practice that is common in biomarker research. We ask whether matching controls to cases with regard to baseline predictors improves efficiency. A variety of measures of prediction performance are studied. We find through simulation studies that matching improves the efficiency with which most measures are estimated, but can reduce efficiency for some. Efficiency gains are smaller when more controls per case are included in the study. A method that models the distribution of the new predictor in controls appears to improve estimation efficiency considerably.
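To make the sampling designs concrete, the following is a minimal sketch of drawing an unmatched versus a frequency-matched case–control subset from a simulated cohort. It is not the authors' simulation code: the function names (`simulate_cohort`, `sample_controls`), the three-level baseline predictor, the logistic risk model, and all coefficients are assumptions chosen only for illustration.

```python
import math
import random
from collections import Counter


def simulate_cohort(n, seed=0):
    """Return a list of (x, y, d): baseline stratum, new marker, outcome.

    Every distribution and coefficient here is a hypothetical choice.
    """
    rng = random.Random(seed)
    cohort = []
    for _ in range(n):
        x = rng.randrange(3)          # baseline predictor with three strata
        y = rng.gauss(0.0, 1.0)       # new predictor, standard normal
        risk = 1.0 / (1.0 + math.exp(-(-3.0 + 0.8 * x + 0.5 * y)))
        d = 1 if rng.random() < risk else 0
        cohort.append((x, y, d))
    return cohort


def sample_controls(cohort, controls_per_case=1, matched=True, seed=1):
    """Keep all cases and sample controls, optionally matched on x.

    Matched: within each baseline stratum, draw controls_per_case controls
    per case in that stratum. Unmatched: draw the same total number of
    controls at random from all controls. random.sample raises ValueError
    if a pool holds too few controls.
    """
    rng = random.Random(seed)
    cases = [s for s in cohort if s[2] == 1]
    controls = [s for s in cohort if s[2] == 0]
    if not matched:
        return cases, rng.sample(controls, controls_per_case * len(cases))
    by_stratum = {}
    for s in controls:
        by_stratum.setdefault(s[0], []).append(s)
    picked = []
    for x, n_cases in Counter(s[0] for s in cases).items():
        picked.extend(rng.sample(by_stratum[x], controls_per_case * n_cases))
    return cases, picked


cohort = simulate_cohort(5000)
cases, matched_controls = sample_controls(cohort, matched=True)
_, random_controls = sample_controls(cohort, matched=False)
print("cases by stratum:            ", sorted(Counter(s[0] for s in cases).items()))
print("matched controls by stratum: ", sorted(Counter(s[0] for s in matched_controls).items()))
print("unmatched controls by stratum:", sorted(Counter(s[0] for s in random_controls).items()))
```

Under matching, the controls share the cases' distribution of the baseline predictor by construction, so any analysis of prediction performance has to account for the sampling design rather than treat the controls as a random sample of non-cases.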


Classification, Diagnosis, Medical decision making, Receiver operating characteristic curve



The support for this research was provided by RO1-GM054438 and PO1-CA-053996. The authors thank Mr. Jing Fan for his contribution to the simulation studies.



Copyright information

© Springer Science+Business Media New York 2013

Authors and Affiliations

  1. Department of Biostatistics, University of Washington, Seattle, USA
  2. Fred Hutchinson Cancer Research Center, Seattle, USA
