The present issue of Lifetime Data Analysis contains six articles related to risk assessment and prediction based on work presented at the October 12–14, 2011 Conference on Risk Assessment and Evaluation of Predictions in Silver Spring, Maryland, organized by Dr. Mei-Ling Ting Lee and the Biostatistics and Risk Assessment Center of the School of Public Health, University of Maryland, College Park. In addition to tutorials on Absolute Risk Prediction and on Current Methods for Evaluating Prediction Performance of Biomarkers and Tests, the conference covered a wide range of topics, including pharmaceutical risk assessment, selecting subsets of a population likely to show a treatment benefit in a future comparative study, identifying high-risk regions or time periods, food safety, using time-varying markers in risk prediction, dose-response methods in toxicology, evaluation of risk prediction models, genetic and epigenetic risk factors, and risk-benefit analyses. Six papers cannot, of course, represent the full breadth of topics presented at the conference. More detail on the Conference will be published by Springer in a book volume titled Risk Assessment and Evaluation of Predictions.

Three of the papers in this special issue concern the assessment of improvements in risk models that result from adding one or more new risk factors. Zhou et al. (2013) show that even when there is no global improvement in prediction, there may be subgroups, defined by risk levels of a previously available risk model, within which the new model improves performance. They give methods for identifying such subgroups and for inference on the increment in prediction performance. When the average risk of disease in the general population is known, case–control data can be used to estimate improvements in prospective measures of model performance, such as changes in positive predictive value or in expected loss. Even without knowledge of disease risk, case–control data can be used to estimate improvements in retrospective measures that are conditional on disease status, such as increases in the area under the receiver operating characteristic (ROC) curve (AUC), the mean risk difference between cases and controls, and the net reclassification index. Bansal and Pepe (2013) discuss how matching in the case–control data affects such inferences. Naïve analyses can be biased, but Bansal and Pepe show how to adjust for the fact that matched controls are not representative of controls in the general population, yielding unbiased estimates. Matching usually improves the precision of the estimated improvements in risk model performance. Although the AUC has been criticized as a measure of model performance, Pencina et al. (2013) ask to what extent increases in AUC reflect increases in other measures of model performance.
Assuming that the risk factors are jointly normally distributed in both cases and controls, with a common covariance matrix, the authors show that many performance measures are monotone functions of the Mahalanobis distance between the case and control means. They conclude that changes in AUC are usually indicative of changes in other measures of performance unless very high specificity is required.
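As a small numerical sketch of this binormal setting (the means and covariance below are hypothetical, not taken from Pencina et al.), one can check the classical result that, under equal covariance matrices, the AUC of the linear discriminant risk score equals Φ(Δ/√2), where Δ is the Mahalanobis distance between the case and control means:

```python
import numpy as np
from math import erf

rng = np.random.default_rng(0)

# Hypothetical parameters: two risk factors, common covariance matrix.
mu0 = np.array([0.0, 0.0])           # control means
mu1 = np.array([1.0, 0.5])           # case means
cov = np.array([[1.0, 0.3],
                [0.3, 1.0]])

# Mahalanobis distance between case and control means.
d = mu1 - mu0
delta = np.sqrt(d @ np.linalg.solve(cov, d))

# Classical binormal result: AUC = Phi(delta / sqrt(2)),
# written here via erf: Phi(x) = 0.5 * (1 + erf(x / sqrt(2))).
auc_theory = 0.5 * (1.0 + erf(delta / 2.0))

# Check by simulation: project onto the discriminant direction and
# estimate P(case score > control score) over all case-control pairs.
w = np.linalg.solve(cov, d)
s0 = rng.multivariate_normal(mu0, cov, 3000) @ w
s1 = rng.multivariate_normal(mu1, cov, 3000) @ w
auc_sim = (s1[:, None] > s0[None, :]).mean()
```

Because Φ is strictly increasing, the AUC in this model is a monotone function of Δ, which is the mechanism behind the monotonicity results described above.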

The three other papers cover various topics related to risk models. If outcomes in clinical trials do not follow the proportional hazards model, it may be desirable to express treatment effects in terms of other measures, such as the decrease in the pure survival probability or the increase in restricted mean lifetime. Yang (2013) describes inference for these quantities under a semi-parametric model. Independent samples of cases and controls are frequently used to estimate and perform inference on the AUC, which can also be expressed as the probability that a randomly selected case has a higher risk than a randomly selected control, by relying on the relationship of the AUC to the Mann–Whitney statistic. Rosner et al. (2013) consider the more complex setting in which comparisons are made across clusters containing both cases and controls. For example, an affected eye (case) might be compared with an unaffected eye (control) from the same person (i.e., the same cluster) or with an unaffected eye from another person (a different cluster). The authors weight all discordant comparisons equally and account for correlations within clusters. Wang and Li (2013) generalize the concept of the ROC to multiple markers. Rather than basing the ROC on a scalar function of the markers, they partition the marker space into a decision region for disease and a decision region for non-disease, using the same types of partitions as those given by the terminal nodes of a classification tree. They define ROC(q) as the conditional expectation of sensitivity for the partition defined by the marker cut-points in the decision tree, given that those cut-points are chosen so that specificity equals 1 − q. This conditional expectation is based on the joint distribution of the markers in non-cases, which is also used to calculate the AUC as an average of ROC(q).
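The Mann–Whitney representation of the AUC mentioned above can be illustrated with a minimal sketch (the risk scores below are hypothetical): the AUC is estimated as the proportion of case–control pairs in which the case has the higher risk score, with ties counted as one half.

```python
import numpy as np

def auc_mann_whitney(risk_cases, risk_controls):
    """AUC estimated as the proportion of case-control pairs in which
    the case has the higher risk score; tied pairs count one half."""
    cases = np.asarray(risk_cases, dtype=float)[:, None]
    controls = np.asarray(risk_controls, dtype=float)[None, :]
    return (cases > controls).mean() + 0.5 * (cases == controls).mean()

# Hypothetical risk scores for independent cases and controls.
cases = [0.9, 0.8, 0.6, 0.4]
controls = [0.7, 0.5, 0.3, 0.2]
print(auc_mann_whitney(cases, controls))  # 13 of 16 pairs concordant: 0.8125
```

Rosner et al.'s clustered setting generalizes this pairwise comparison: discordant pairs may come from the same cluster or from different clusters, so the independence of comparisons assumed here no longer holds.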

Readers are encouraged to consult the forthcoming Springer volume, Risk Assessment and Evaluation of Predictions, to learn about other valuable presentations.