PredictABEL: an R package for the assessment of risk prediction models
The rapid identification of genetic markers for multifactorial diseases from genome-wide association studies is fuelling interest in investigating the predictive ability and health care utility of genetic risk models. Various measures are available for the assessment of risk prediction models, each addressing a different aspect of performance and utility. We developed PredictABEL, an R package that covers the descriptive tables, measures and figures used in the analysis of risk prediction studies, such as measures of model fit, predictive ability and clinical utility, as well as risk distributions, the calibration plot and the receiver operating characteristic plot. Tables and figures are saved as separate files in a user-specified format, including publication-quality EPS and TIFF formats. All figures are available in a ready-made layout, but they can be customized to the preferences of the user. The package has been developed for the analysis of genetic risk prediction studies, but can also be used for studies that only include non-genetic risk factors. PredictABEL is freely available at the websites of GenABEL (http://www.genabel.org) and CRAN (http://cran.r-project.org/).
Keywords: Risk prediction, Genetic, Assessment, Measures, Software
Abbreviations
AUC: Area under the ROC curve
IDI: Integrated discrimination improvement
NRI: Net reclassification improvement
ROC: Receiver operating characteristic
The rapid identification of genetic markers for multifactorial diseases from genome-wide association studies is fuelling interest in investigating the predictive ability and health care utility of genetic risk models. Genetic risk models are investigated for their potential to target diagnostic, preventive and therapeutic interventions for multifactorial diseases. Implementation of these models in health care requires a series of studies that encompass all phases of translational research [1, 2], starting with a comprehensive evaluation of genetic risk prediction.
Various measures are available for the assessment of risk prediction models, each addressing a different aspect of performance and utility [3, 4]. The GRIPS Statement recommends that transparent and complete reporting should provide a description of the risk factors and the risk model by reporting univariate and multivariate odds ratios for the predictors, present risk distributions for individuals with and without the outcome of interest, and report measures of model fit, predictive ability and others, if pertinent [5, 6]. Examples of measures include the Hosmer–Lemeshow statistic and Nagelkerke’s R² for model fit, the area under the receiver operating characteristic (ROC) curve (AUC) and integrated discrimination improvement (IDI) for predictive ability, and percentages of total reclassification and net reclassification improvement (NRI) for clinical utility.
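To make one of the model-fit measures named above concrete, the following is a minimal Python sketch of a Hosmer–Lemeshow-type statistic. It is not PredictABEL code (the package is written in R) and does not claim to reproduce its output; the function name `hosmer_lemeshow`, the default of 10 groups and the equal-sized grouping by sorted risk are assumptions made for illustration.

```python
def hosmer_lemeshow(outcomes, risks, groups=10):
    """Hosmer-Lemeshow chi-square statistic over equal-sized risk groups.

    outcomes: observed disease status (0 or 1) per individual.
    risks:    predicted risks (probabilities) per individual.
    groups:   number of subgroups (10 is a common choice; an assumption here).
    """
    # Sort individuals by predicted risk so that each subgroup covers a
    # contiguous range of risks.
    ordered = sorted(zip(risks, outcomes))
    n = len(ordered)
    statistic = 0.0
    for g in range(groups):
        chunk = ordered[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue  # more groups than individuals
        size = len(chunk)
        expected = sum(p for p, _ in chunk)  # expected number of cases
        observed = sum(y for _, y in chunk)  # observed number of cases
        mean_risk = expected / size
        # Assumes 0 < mean_risk < 1; otherwise this term is undefined.
        statistic += (observed - expected) ** 2 / (
            size * mean_risk * (1 - mean_risk)
        )
    return statistic


# Hypothetical toy data: eight individuals split into two risk groups.
toy_outcomes = [0, 0, 0, 1, 1, 0, 1, 1]
toy_risks = [0.1, 0.1, 0.2, 0.2, 0.6, 0.6, 0.8, 0.8]
print(hosmer_lemeshow(toy_outcomes, toy_risks, groups=2))
```

A small value indicates that observed and expected numbers of cases agree within the subgroups. The statistic is conventionally compared with a chi-square distribution with `groups - 2` degrees of freedom; the p-value is left out here to keep the sketch free of external dependencies.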
Even though the assessment of risk prediction models is relatively standard, no single statistical package allows the computation and production of all these measures and plots. Therefore, we developed PredictABEL, a freely available R package, which contains functions to obtain all descriptive tables, measures and plots that are used in genetic risk prediction studies.
Description of PredictABEL
Measures and plots covered in PredictABEL (version 1.1)
Measures and plots

Description of the data
Allele and genotype frequencies by disease status
Univariate odds ratios: Odds ratios per allele and per genotype

Description of the model
Multivariate odds ratios: Odds ratios adjusted for all predictors in the logistic regression model^a
Histogram of predicted risks by disease status
Cumulative percentage of individuals against predicted risks

Overall model performance
Percentage of variance in the outcome explained by predictors in the logistic regression model^a
Average squared difference between predicted risks and observed disease status
Average difference between observed and predicted risks across subgroups
Observed and predicted risks across subgroups

Receiver operating characteristic (ROC) curve: Sensitivity and specificity for all possible cut-off values of predicted risks
Area under the ROC curve (AUC): Measure of discriminative accuracy
Discrimination box plot: Box plot of predicted risks by disease status
Integrated discrimination improvement (IDI): Comparison of mean difference in predicted risks of individuals with and without the disease between initial and updated model

Number of individuals per risk category of the initial against the updated model by disease status
Net reclassification improvement (NRI): Net improvement in risk classification in individuals with and without the disease
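Several of the definitions in the table above translate directly into a few lines of arithmetic. The sketch below, in plain Python, follows the table's wording for the average squared difference between predicted risks and observed status (commonly called the Brier score), the AUC, and the IDI. It is an illustration under stated assumptions and not the PredictABEL implementation: all function names and the toy data are hypothetical, and the AUC is computed with the pairwise (Mann-Whitney) formulation, in which ties count one half.

```python
def mean(values):
    return sum(values) / len(values)


def brier_score(outcomes, risks):
    """Average squared difference between predicted risk and observed status."""
    return mean([(p - y) ** 2 for y, p in zip(outcomes, risks)])


def split_by_status(outcomes, risks):
    """Return (risks of individuals with disease, risks of those without)."""
    cases = [p for y, p in zip(outcomes, risks) if y == 1]
    noncases = [p for y, p in zip(outcomes, risks) if y == 0]
    return cases, noncases


def auc(outcomes, risks):
    """Probability that a random case has a higher predicted risk than a
    random non-case; tied risks count one half."""
    cases, noncases = split_by_status(outcomes, risks)
    score = 0.0
    for pc in cases:
        for pn in noncases:
            if pc > pn:
                score += 1.0
            elif pc == pn:
                score += 0.5
    return score / (len(cases) * len(noncases))


def mean_risk_difference(outcomes, risks):
    """Mean predicted risk in cases minus mean predicted risk in non-cases."""
    cases, noncases = split_by_status(outcomes, risks)
    return mean(cases) - mean(noncases)


def idi(outcomes, risks_initial, risks_updated):
    """Change in the mean risk difference from the initial to the updated model."""
    return (mean_risk_difference(outcomes, risks_updated)
            - mean_risk_difference(outcomes, risks_initial))


# Hypothetical toy data: 3 individuals with and 5 without the disease.
outcomes = [1, 1, 1, 0, 0, 0, 0, 0]
initial = [0.6, 0.4, 0.2, 0.5, 0.3, 0.2, 0.1, 0.1]
updated = [0.7, 0.5, 0.3, 0.4, 0.3, 0.1, 0.1, 0.1]

print("Brier:", brier_score(outcomes, initial), brier_score(outcomes, updated))
print("AUC:  ", auc(outcomes, initial), auc(outcomes, updated))
print("IDI:  ", idi(outcomes, initial, updated))
```

The pairwise AUC is quadratic in the number of individuals, which is fine for a demonstration; for a dataset of 10,000 individuals a rank-based computation would be the more practical choice.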
The tables and plots generated using PredictABEL are saved as separate files in the working directory. Tables can be saved as Excel or tab-delimited text files and figures can be saved as publication-quality EPS or TIFF files or as JPEG files for insertion in manuscripts. All figures are available in a ready-made layout, but they can be customized to the journal style or preferences of the user. A hypothetical dataset and examples of use are included in the package to demonstrate all functions.
The hypothetical dataset included in the package was reconstructed from an empirical study on age-related macular degeneration (AMD), using a simulation method that has been described in detail elsewhere. Based on published frequencies and odds ratios of the genetic variants and non-genetic risk factors implicated in AMD and on published population disease risks, we created a dataset that contains genotype data and disease status for 10,000 individuals. Predicted risks were obtained using logistic regression analysis, for which the code is provided in the package. Two risk models were constructed: a model based on non-genetic risk factors only and a model based on genetic and non-genetic predictors.
Reclassification table comparing clinical risk models without and with genetic factors
Headings: Without genetic predictors; With genetic predictors; Net correctly reclassified (%)
Groups: Individuals without AMD; Individuals with AMD
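The logic behind a reclassification table and the NRI can be sketched as follows. This is a hedged Python illustration, not the package's R code: the function names, the cut-off values 0.10 and 0.30, the toy risks, and the convention that a risk equal to a cut-off falls in the higher category are all assumptions. The NRI is computed as the net proportion of individuals with the disease who move to a higher risk category plus the net proportion of individuals without the disease who move to a lower one.

```python
from bisect import bisect_right


def risk_category(risk, cutoffs):
    """Index of the risk category for sorted cut-offs.

    Assumption: a risk equal to a cut-off belongs to the higher category.
    """
    return bisect_right(cutoffs, risk)


def reclassification_table(outcomes, risks_initial, risks_updated, cutoffs, status):
    """Cross-tabulate initial (rows) against updated (columns) risk
    categories for individuals whose disease status equals `status`."""
    k = len(cutoffs) + 1
    table = [[0] * k for _ in range(k)]
    for y, p1, p2 in zip(outcomes, risks_initial, risks_updated):
        if y == status:
            table[risk_category(p1, cutoffs)][risk_category(p2, cutoffs)] += 1
    return table


def nri(outcomes, risks_initial, risks_updated, cutoffs):
    """Return (NRI in cases, NRI in non-cases, overall NRI)."""
    up = {0: 0, 1: 0}
    down = {0: 0, 1: 0}
    total = {0: 0, 1: 0}
    for y, p1, p2 in zip(outcomes, risks_initial, risks_updated):
        before = risk_category(p1, cutoffs)
        after = risk_category(p2, cutoffs)
        total[y] += 1
        if after > before:
            up[y] += 1
        elif after < before:
            down[y] += 1
    # Moving up is correct for cases; moving down is correct for non-cases.
    nri_cases = (up[1] - down[1]) / total[1]
    nri_noncases = (down[0] - up[0]) / total[0]
    return nri_cases, nri_noncases, nri_cases + nri_noncases


# Hypothetical toy data: 4 individuals with and 6 without the disease.
outcomes = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
initial = [0.35, 0.25, 0.15, 0.05, 0.32, 0.20, 0.12, 0.08, 0.05, 0.02]
updated = [0.40, 0.32, 0.08, 0.12, 0.25, 0.08, 0.15, 0.12, 0.03, 0.01]
cutoffs = [0.10, 0.30]  # hypothetical category boundaries

print(reclassification_table(outcomes, initial, updated, cutoffs, status=1))
print(reclassification_table(outcomes, initial, updated, cutoffs, status=0))
print(nri(outcomes, initial, updated, cutoffs))
```

Because the NRI depends on the chosen cut-offs, the same pair of models can yield different values under different category boundaries, so the cut-offs should be reported alongside the result.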
PredictABEL is a comprehensive software package designed for the development and assessment of genetic risk prediction models. PredictABEL is part of the GenABEL software suite for statistical genomics [14, 15] and is therefore written in R, to enable easy transfer of data from gene discovery to genetic prediction studies. A detailed manual is available that demonstrates and explains all the functions in the package. The manual is accessible to researchers who do not regularly use R. The manual and the package are freely available from the GenABEL project website (http://www.genabel.org) and from CRAN (http://cran.r-project.org/).
The current version of PredictABEL (version 1.1) includes all basic descriptive tables, measures and plots that are used in the assessment of risk prediction models. Planned extensions of the package include other strategies to construct risk models, e.g., using Cox proportional hazards analysis for prospective data, and functions to construct simulated data for the evaluation of genetic risk models. Furthermore, we will optimize the interconnectivity between PredictABEL and other packages in the GenABEL suite.
Where the GRIPS Statement aims to improve the transparency, quality and completeness of reporting [5, 6], PredictABEL has similar goals for the assessment of genetic risk prediction studies. The collection of all measures and plots in a single software package gives a comprehensive overview of the various measures that are available for the assessment of risk prediction studies. This overview emphasizes that different measures are available to answer different questions in the assessment of risk models and facilitates the selection of the most appropriate measure for the question under study.
This work was supported by the Vidi grant from the Netherlands Organization for Scientific Research (NWO), the Young Investigator grant from the Erasmus University Medical Center Rotterdam and by the Center for Medical Systems Biology within the framework of the Netherlands Genomics Initiative.
This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.