Abstract
We review the development, validation, and translation of risk prediction models into clinical and population practice. We focus on issues at each of these steps and on the gaps in the field across the continuum of risk prediction model development (many models published), validation (few validated), and implementation (even fewer implemented in clinical settings, with much implementation confined to web sites). Design of models for end users and critical issues in implementing and evaluating models are addressed with examples from first-hand experience.
Introduction
Purposes of Risk Prediction
In cancer prevention research and practice, risk prediction models have been used to determine study eligibility [1]. Risk stratification may be used to identify high-risk women, say in breast cancer families, for referral to counseling or to guide lifestyle modification or chemoprevention. More recently, with recommendations for MRI screening of women at high risk for breast cancer, risk prediction guides an intervention decision by classifying women as eligible for screening or not [2]. Similar eligibility for covered services now applies to low-dose CT scanning for lung cancer as implemented by CMS coverage. Finally, refining models to better understand disease etiology through temporal relations of risk factors can improve approaches to prevention [3].
Regardless of these purposes, the process of multivariable risk prediction model development, validation, implementation, and adjustment underlies the continuous process of development and refinement. We propose the model in Fig. 1 as a continuing process for model application.
Approaches to Model Development
In the field of cancer risk prediction, two distinct classes of mathematical models have been used in cancer epidemiology. Statistical models may draw on established multivariable regressions (including linear and logistic regression) to relate risk factors to cancer incidence. Biomathematical models, on the other hand, aim to translate the presumed biologic process of carcinogenesis into mathematical models [4]. The best-known models, developed by Armitage and Doll, underpin a long history of applying mathematical models to cancer incidence rates. Moving beyond age relations and adding epidemiologic risk factors, this approach now provides a structure to view the contribution of these risk factors to the underlying biologic process of carcinogenesis [5]. With regard to age relations, Fisher and Hollomon [6] used stomach cancer mortality, and Nordling [7] combined all cancer sites. They noted that, for ages 25 to 74 years, the logarithm of the death rate increased directly in relation to the logarithm of age. Armitage and Doll then evaluated cancer mortality in the UK in men and women in 1950 and 1951. Importantly, they focused on the slope or gradient in risk with age. A gradient of 6 to 1 (i.e., 6 units increase in the logarithm of the death rate per unit increase in the logarithm of age) was relatively consistent across 17 cancer sites. Based on this, they concluded that cancer is the end-result of several successive cellular changes. However, for breast, ovary, and cervical cancers, there was a deficit or reduction in the slope in older age groups. They concluded that this was due to a reduction (after about age 50 in their regressions) in the rate of one of the later changes in the process of carcinogenesis [5]. Thus, they proposed a multistage model of carcinogenesis.
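The log-log relation underlying this gradient can be illustrated with a short numerical sketch. The rates and constant below are synthetic, chosen only to show how a power-law age relation yields a constant log-log gradient:

```python
import numpy as np

# Synthetic illustration (not Armitage and Doll's data): if the death rate
# follows a power law, rate = c * age^k, then log(rate) is linear in log(age)
# with slope k. We generate rates with k = 6 and an arbitrary constant c,
# then recover the gradient by least squares, mirroring the roughly 6-to-1
# gradient observed for ages 25 to 74.
ages = np.arange(25, 75).astype(float)   # ages 25 to 74
rates = 1e-12 * ages ** 6                # hypothetical death rates, k = 6

slope, intercept = np.polyfit(np.log(ages), np.log(rates), 1)
print(round(slope, 2))  # recovered log-log gradient: 6.0
```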
Mathematical models can also summarize the impact of multiple variables such as change in risk factors across the life course, which may modify the incidence rates [8]. These models can refine and improve understanding of disease relations or disease development and then add to precision in risk estimation. More precise models may then lead to better tools for clinical risk assessment and decision-making [9]. Doll and Peto [10] applied this multistage cancer incidence model to lung cancer within the British Doctor’s Study. They observed that lung cancer incidence is proportional to (dose + 6)^2 × (age − 22.5)^4.5, where dose = cigarettes per day. This result was consistent with the multistage model of carcinogenesis. They interpreted the coefficients for the components of the model as approximations for the number of stages in the carcinogenesis process, that is, incidence is proportional to the fourth to sixth power of time (age), suggesting four to six independent steps in the process of carcinogenesis. These model-based extrapolations have been confirmed by Vogelstein and colleagues in the setting of colon cancer [11]. For lung cancer, these models implied that more than one stage of carcinogenesis is strongly affected by smoking [12, 13]. Extensive application of the Armitage and Doll model to radiation exposure also attests to its utility [14, 15].
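The arithmetic behind the Doll-Peto relation can be sketched as follows. The function name and comparison values are illustrative, and the unknown proportionality constant cancels when taking a ratio:

```python
# Worked arithmetic for the Doll-Peto relation:
#   incidence ∝ (dose + 6)^2 * (age - 22.5)^4.5,  dose = cigarettes per day.
# Only relative incidence can be computed, since the proportionality
# constant is unknown; it cancels when comparing two individuals.

def relative_incidence(dose: float, age: float) -> float:
    """Incidence up to an unknown multiplicative constant."""
    return (dose + 6.0) ** 2 * (age - 22.5) ** 4.5

# A 60-year-old smoking 20 cigarettes/day vs. a never-smoker the same age:
# the age terms cancel, leaving (20 + 6)^2 / (0 + 6)^2 = 676 / 36.
ratio = relative_incidence(20, 60) / relative_incidence(0, 60)
print(round(ratio, 1))  # about 18.8-fold
```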
Pike et al. [16•] took the Armitage and Doll approach and applied it to breast cancer, including risk factors (menarche, first birth, and menopause) as modifiers of the effect of time. Pike assumed that breast tissue “aged” at a constant rate starting at menarche and continuing to first birth. After an adverse effect of first birth, the rate of “tissue aging” decreased, and it decreased further after menopause. This replicated the observation for breast cancer mortality reported by Armitage and Doll [5]. Pike’s model had a term only for parous vs. nulliparous status; it did not include terms for second and subsequent pregnancies, nor did it account for the timing of these births or for any difference in the effect of natural menopause vs. bilateral oophorectomy. Rosner and Colditz expanded from the Pike model by adding more details of reproductive history, including the timing of births and type of menopause (natural vs. surgical) [17–19]. Like the Doll and Peto lung cancer model, this model generated a set of parameters for the rate of breast tissue aging before first pregnancy, the rate of tissue aging after menopause, and the magnitude of the adverse effect of first pregnancy. The Rosner and Colditz model has been further refined with the addition of benign breast disease [20], circulating hormone levels [21, 22], and so forth, but the underlying approach remains a life course accumulation of cancer risk that can be used to estimate annual and cumulative risk of cancer. Applications in colon [23], melanoma [24], and ovary [25] all use this approach.
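The "tissue aging" idea can be sketched as a piecewise accumulation of risk-relevant time. This is a simplified illustration, not the fitted Pike or Rosner-Colditz model: the rate constants are hypothetical placeholders, and the one-time adverse effect of first birth is omitted for brevity:

```python
# Simplified sketch of Pike-style "breast tissue aging" (hypothetical rate
# constants, NOT the fitted Pike or Rosner-Colditz parameters; the transient
# adverse effect of first birth is omitted). Tissue age accumulates at a
# full rate from menarche, at a reduced rate after a first birth, and at a
# further reduced rate after menopause.
def breast_tissue_age(age, menarche, first_birth=None, menopause=None,
                      r_pre=1.0, r_post_birth=0.7, r_post_meno=0.1):
    """Cumulative 'tissue age' at a given chronological age."""
    # Transition points that have occurred by the given age, in order.
    events = sorted(e for e in (first_birth, menopause)
                    if e is not None and e < age)
    t, rate, total = menarche, r_pre, 0.0
    for e in events:
        total += rate * (e - t)          # accumulate over the interval
        t = e
        rate = r_post_meno if e == menopause else r_post_birth
    total += rate * (age - t)            # remaining interval up to `age`
    return total

# Menarche 13, first birth 25, menopause 50, current age 60:
# 12 * 1.0 + 25 * 0.7 + 10 * 0.1 = 30.5 units of tissue age.
print(breast_tissue_age(60, 13, first_birth=25, menopause=50))  # 30.5
```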
A simpler form of this multivariable risk factor approach is to take a model from an existing epidemiologic data set and assess its performance in predicting cancer. One example is the multivariable model originally developed for lung cancer [26] that has been expanded to assess performance based on inclusion of DNA repair markers [27], gender, and smoking history [28].
Focusing on the age-incidence data for breast cancer incidence from high- and low-risk countries, Moolgavkar et al. [29, 30] took an alternative approach to modeling. Specifically, they fitted a two-stage model that allowed for normal cells to progress through transformed cells to cancer. They noted that across high- and low-risk countries, the shape of the breast cancer incidence curves was constant. Pathak and Whittemore applied a breast cancer incidence rate function to data from countries with high, medium, and low breast cancer incidence rates. They confirmed the observation of Moolgavkar that age at first birth and age at menopause exert similar effects on all women regardless of the breast cancer incidence rates in their country [31]. Pike and colleagues subsequently used traditional survival analysis methods to show that reproductive risk factors apply equally across ethnic groups in the USA [32]. The underlying approach of modeling the two-stage model of cancer has continued to be applied by Moolgavkar and colleagues in settings of lung, colon, and so forth [13, 33, 34].
Missing Data
A common gap in model development is description of how missing data are handled. Model development is often limited to complete cases, that is, records with no missing values. This has implications for the final application—will those with one or more missing data points be excluded from prediction? How will this impact clinical decision-making, testing or referral, or acceptability in clinical and public health settings? Rosner has overcome this in the application of his macular degeneration prediction model [35] using NHANES data to impute missing variables (personal communication). On the other hand, at the Joanne Knight Breast Health Center, where some 50,000 screening mammograms are performed annually, a sufficiently large data set of similar women is available to impute missing variables when the Rosner-Colditz model is implemented in the clinical setting. Too often, lack of information on how missing data are handled limits the transfer of models from development to broader application.
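A minimal sketch of one such strategy, substituting a missing input with its mean in a reference sample of similar individuals, in the spirit of the approaches described above; the data and function name are hypothetical:

```python
import numpy as np

# Minimal sketch of reference-based imputation (hypothetical data and
# function names): when a model input is missing at prediction time,
# substitute the mean of that variable from a large reference sample of
# similar individuals.
def impute_missing(x, reference):
    """Replace NaNs in feature vector `x` with column means of `reference`."""
    x = np.asarray(x, dtype=float)
    col_means = np.nanmean(np.asarray(reference, dtype=float), axis=0)
    return np.where(np.isnan(x), col_means, x)

reference = [[1.0, 10.0], [3.0, 30.0]]           # reference sample, 2 variables
print(impute_missing([np.nan, 5.0], reference))  # [2. 5.]
```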
Summary
Regardless of the approach to building a model, the proliferation in number of risk prediction models published since the NCI workshop in 2005 is impressive and indicates how an NCI initiative can help move a field forward [9]. Models are typically developed following one of three general approaches: (1) explicit selection of known causal factors; (2) biologic/lifespan or life calendar approaches; and (3) data-driven and regression applications, typically from large databases. Despite the publication of many models, few seem to progress to validation in independent settings. In breast cancer, a systematic review of models by Meads and colleagues notes that 17 models had been published as of 2012, 3 had been validated (Gail, Rosner, Cuzick), and none had been evaluated for clinical impact. Similarly, models for predicting colorectal neoplasia have been developed, though many lack validation, and only a few have been evaluated for implementation in clinical practice [36–38]. A unique characteristic of colorectal neoplasia is the opportunity to develop risk models for the precursor lesion. This type of model has direct applications in clinical practice with respect to counseling for colorectal cancer screening.
Validation Comments
While Steyerberg in his text [39] discusses in detail the approaches to adjusting models for overfitting and other strategies in the context of splitting data sets into development and testing subsets, along with more advanced bootstrapping-type approaches, an underlying limitation of these statistical approaches is that the extant data set can hide issues of bias. Accordingly, Moons and others advocate for independent validation, that is, validation in an independent prospective data set [40, 41•, 42]. Validation is a key step in moving to application of the risk prediction model for cancer prevention.
One major challenge in epidemiologic risk prediction model building is obtaining access to an independent data set with the necessary variables. In breast cancer modeling, Rosner and Colditz collaborated with the California Teachers Study to achieve this [43•]; for models that add SNPs to other risk factors, however, validation of the new models in independent data sets with the necessary SNP measures remains a challenge.
Although statistical methods can mitigate the potential overestimate of performance associated with an internal validation, the goal is for a model to predict risk in groups other than the original population and ultimately to be used in a clinical setting. Evaluating the generalizability of the model in other populations and quantifying any deficiencies in the model development require an external validation [40, 44]. When the validation population varies in an obvious way from the development population, the interpretation of the validation is straightforward, e.g., a model developed in one country that is validated in another country. When the development and validation populations vary in subtler or complex ways, the interpretation of the validation can be more challenging. Recent methods to better quantify the differences between the development and validation populations allow for more rigorous evaluation of external validation studies [45]. As suggested by Park [46•], comparison studies of different risk models’ performance on the same population (e.g., group external validation), such as the one by D’Amelio and colleagues [47], may be of even greater value than individual external validation studies that assess the performance of any particular model.
The calibration of a model is a particularly important component of determining a model’s performance and utility when applied beyond the data set from which it was developed, such as at the population level. Calibration provides information on the agreement between predicted and observed risks. In practice, the majority of prediction model articles do not report the model’s performance assessed by calibration [44]. One example of how calibration methods were used in an external validation was the external validation of the Rosner-Colditz model using the California Teachers Study (CTS) as an independent data set [43•] and using calibration methods described by Gail [1]. When observed and expected numbers of cases were calculated by decile in the CTS based on the Rosner-Colditz beta coefficients, the model demonstrated an overall good fit to SEER data [43•]. Other considerations related to validation and calibration are discussed in more detail by Park as part of this series [46•].
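A generic sketch of decile-based calibration on simulated data (not the exact procedure used by Gail or in the CTS validation) may make the idea concrete:

```python
import numpy as np

# Generic sketch of decile-based calibration on simulated data: subjects are
# sorted into deciles of predicted risk, and the expected number of cases
# (sum of predicted risks) is compared with the observed count per decile.
# An expected/observed ratio near 1 in every decile indicates good calibration.
def decile_calibration(pred_risk, observed):
    pred_risk = np.asarray(pred_risk, dtype=float)
    observed = np.asarray(observed, dtype=float)
    order = np.argsort(pred_risk)
    deciles = np.array_split(order, 10)                # ten risk groups
    expected = np.array([pred_risk[d].sum() for d in deciles])
    obs = np.array([observed[d].sum() for d in deciles])
    return expected, obs

rng = np.random.default_rng(0)
p = rng.uniform(0.01, 0.30, size=5000)    # simulated predicted risks
y = rng.binomial(1, p)                    # outcomes drawn from those risks
e, o = decile_calibration(p, y)
print(np.round(e / np.maximum(o, 1), 2))  # E/O per decile, near 1 if calibrated
```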
Reporting of Methods Used
As the number of risk prediction models and validation studies (internal and external) has grown, the need for a systematic way of reporting results has become paramount. Without consistent reporting of methods, choosing a model for application in cancer prevention can be quite subjective. Meta-analyses and systematic reviews of risk prediction modeling articles consistently find poor quality reporting across all aspects of prediction model development and for multiple disease sites [44, 48, 49]. In response to this, Collins and others developed the TRIPOD Statement, a checklist of 22 items determined to be essential for high-quality reporting of multivariable prediction models (diagnostic or prognostic) [50•]. The checklist is organized according to the sections of a standard research manuscript and differentiates which sections apply to development, validation, or both types of models. The authors propose to include the checklist with manuscripts submitted for peer review. As the literature in the field of risk prediction continues to grow, this type of structured guideline should improve the quality of reporting methods and will facilitate model comparisons and improvements.
Implementation
While models are developed and can be applied in a number of settings as noted earlier, the underlying challenge is for the model to be useful in the clinical or public health setting, improving outcomes such as satisfaction with decisions and quality of life, or reducing disease endpoints [41•]. To achieve successful implementation, which is the true measure of a prediction model’s utility, the end user must be considered, preferably from the beginning of the model development process. An example may help illustrate how important this can be. If a sophisticated model is built on an extensive assessment of lifestyle factors and is not short enough to be completed in, say, a clinic setting, then noncompletion makes the model, no matter how good, of no practical use in that clinic. The requirement of simple variables for implementation also increases the number of data sets that could be used for the validation of existing models, a current gap in the field of risk prediction as discussed above. We extended this basic premise when developing the cancer risk assessment tools from the Harvard Center for Cancer Prevention in the 1990s [51, 52]. We chose a simple dichotomy of risk factors to ease completion, and after focus group testing [52], we moved to computer administration to reduce errors in arithmetic by users. We chose an engaging presentation with seven categories of risk, as recommended by Weinstein, and provided a lower limit of achievable risk reduction to convey the point that risk of cancer cannot go to zero [53, 54]. Ongoing research on risk perception and presentation of risk will help refine the usefulness of output from models [55–59]. Better integration of these insights into output from the beginning phases of model development may increase uptake of models for cancer prevention.
Adaptation
In cardiovascular disease, we find numerous models of risk prediction—Framingham, Scottish, New Zealand, etc. For cancer, where we have standardized population-based incidence reporting through registration systems, adjusting models to fit national cancer incidence should be less problematic. However, beyond the approach of Gail and Rosner, no systematic study of adaptation has been reported. Should one take a validated model and apply it while assessing performance in a new setting, or should we go back to deriving a model from scratch? Starting over at the model development stage when a validation study suggests poor performance implies reselecting predictors, giving up any knowledge gained from the initial development of the model [41•], and ultimately leads to more models that are not carried beyond the initial development or validation stage. Although several general methods for updating prediction models have been proposed and evaluated, and can improve the generalizability and transportability of existing models [41•], no broader standards or guidelines have been established that could guide efforts to adapt existing models. A systematic approach might help reduce redundancy and the proliferation of models that have not been validated. This would then facilitate more models reaching the stage of assessment for use in clinical or prevention settings and ultimately lead to the intended positive impact on public health.
Conclusion
Risk prediction models have great potential to improve current cancer prevention strategies. Building on Armitage and Doll’s work on stages of carcinogenesis, risk models for cancer, and breast cancer in particular, have provided insights into etiology and moved clinical practice and research forward. Models that follow the full cycle (i.e., model development, validation, implementation, and adaptation) will result in the greatest impact on identifying specific groups for screening, targeting specific populations for cancer prevention counseling, more finely defining study eligibility criteria, and improving our understanding of etiologic heterogeneity. The challenges of each step in the cycle include the following: forethought regarding implementation during model development; accurate methods of handling missing data; careful and complete validation, including identifying an appropriate external validation data set; accurate and comprehensive reporting across the spectrum of development and validation; pragmatic studies of implementation in real-world clinical settings; and appropriate adaptation as knowledge grows. Perhaps due to these challenges, the proliferation of risk models has occurred largely without appropriate attention to the full cycle and eventual goal, resulting in many models that have little or no clinical or population-level impact. The need for wide-scale improvement in risk/screening stratification has been highlighted by the recently launched National Precision Medicine Initiative, which asserts the need for more precise clinical decision-making. However, much of the immediate attention given to the National Precision Medicine Initiative has focused on treatment, e.g., classifying an individual’s response to specific pharmaceutical agents. This unfortunately overshadows the many applications to prevention—where risk prediction models can result in targeted and cost-effective screening [60].
In summary, risk prediction modeling is still a growing field with many methodological challenges and opportunities. However, what we do not know, or areas in which we can still improve, should not hinder us from using our current knowledge in risk modeling to advance population-level cancer prevention.
References
Papers of particular interest, published recently, have been highlighted as: • Of importance
Gail MH, Brinton LA, Byar DP, Corle DK, Green SB, Schairer C, et al. Projecting individualized probabilities of developing breast cancer for white females who are being examined annually. J Natl Cancer Inst. 1989;81(24):1879–86.
Saslow D, Boetes C, Burke W, Harms S, Leach MO, Lehman CD, et al. American Cancer Society guidelines for breast screening with MRI as an adjunct to mammography. CA Cancer J Clin. 2007;57(2):75–89.
Colditz GA, Rosner BA. What can be learnt from models of incidence rates? Breast Cancer Res. 2006;8(3):208. Summarizes the value of cancer risk prediction models in the setting of breast cancer.
Kaldor J, Day N. Mathematical models in cancer epidemiology. In: Schottenfeld D, Fraumeni J, editors. Cancer epidemiology. New York: Oxford University Press; 1996. p. 127–37.
Armitage P, Doll R. The age distribution of cancer and a multistage theory of carcinogenesis. Br J Cancer. 1954;8:1–12.
Fisher JC, Hollomon JH. A hypothesis for the origin of cancer foci. Cancer. 1951;4(5):916–8.
Nordling CO. A new theory on cancer-inducing mechanism. Br J Cancer. 1953;7(1):68–72.
Moolgavkar S. Cancer models. Epidemiology. 1990;1:419–20.
Freedman AN, Seminara D, Gail MH, Hartge P, Colditz GA, Ballard-Barbash R, et al. Cancer risk prediction models: a workshop on development, evaluation, and application. J Natl Cancer Inst. 2005;97(10):715–23.
Doll R, Peto R. Cigarette smoking and bronchial carcinoma: dose and time relationships among regular smokers and lifelong non-smokers. J Epidemiol Community Health. 1978;32:303–13.
Vogelstein B, Fearon ER, Hamilton SR, Kern SE, Preisinger AC, Leppert M, et al. Genetic alterations during colorectal-tumor development. N Engl J Med. 1988;319:525–32.
Brown C, Chu K. Use of multistage models to infer stage affected by carcinogenic exposure: example of lung cancer and cigarette smoking. J Chron Dis. 1987;40 Suppl 2:171s–9s.
Hazelton W, Clements M, Moolgavkar S. Multistage carcinogenesis and lung cancer mortality in three cohorts. Cancer Epidemiol Biomarkers Prev. 2005;14:1171–81.
Little M, Hawkins M, Charles M, Hildreth N. Fitting the Armitage-Doll model to radiation-exposed cohorts and implications for population cancer risks. Radiat Res. 1992;132:207–21.
Day N. The Armitage-Doll multistage model of carcinogenesis. Stat Med. 1990;9:677–9.
Pike MC, Krailo MD, Henderson BE, Casagrande JT, Hoel DG. “Hormonal” risk factors, “breast tissue age” and the age-incidence of breast cancer. Nature. 1983;303:767–70. Seminal work applying previous understanding of carcinogenesis to breast cancer risk incidence.
Rosner B, Colditz GA. Nurses’ health study: log-incidence mathematical model of breast cancer incidence. J Natl Cancer Inst. 1996;88(6):359–64.
Rosner B, Colditz GA, Willett WC. Reproductive risk factors in a prospective study of breast cancer: the Nurses’ Health Study. Am J Epidemiol. 1994;139(8):819–35.
Berkey C, Rockett H, Field A, Gillman M, Frazier A, Camargo C, et al. Activity, dietary intake and weight change in a longitudinal study of adolescent boys and girls. Pediatrics. 2000;105:E56.
Tamimi RM, Rosner B, Colditz GA. Evaluation of a breast cancer risk prediction model expanded to include category of prior benign breast disease lesion. Cancer. 2010;116(21):4944–53. doi:10.1002/cncr.25386.
Rosner B, Colditz GA, Iglehart JD, Hankinson SE. Risk prediction models with incomplete data with application to prediction of estrogen receptor-positive breast cancer: prospective data from the Nurses’ Health Study. Breast Cancer Res. 2008;10(4):R55.
Tworoger SS, Zhang X, Eliassen AH, Qian J, Colditz GA, Willett WC, et al. Inclusion of endogenous hormone levels in risk prediction models of postmenopausal breast cancer. J Clin Oncol. 2014;32(28):3111–7. doi:10.1200/JCO.2014.56.1068.
Wei EK, Colditz GA, Giovannucci EL, Fuchs CS, Rosner BA. Cumulative risk of colon cancer up to age 70 years by risk factor status using data from the Nurses’ Health Study. Am J Epidemiol. 2009;170(7):863–72. doi:10.1093/aje/kwp210.
Cho E, Rosner BA, Feskanich D, Colditz GA. Risk factors and individual probabilities of melanoma for whites. J Clin Oncol. 2005;23(12):2669–75.
Rosner BA, Colditz GA, Webb PM, Hankinson SE. Mathematical models of ovarian cancer incidence. Epidemiology. 2005;16(4):508–15.
Spitz MR, Hong WK, Amos CI, Wu X, Schabath MB, Dong Q, et al. A risk model for prediction of lung cancer. J Natl Cancer Inst. 2007;99(9):715–26. doi:10.1093/jnci/djk153.
Spitz MR, Etzel CJ, Dong Q, Amos CI, Wei Q, Wu X, et al. An expanded risk prediction model for lung cancer. Cancer Prev Res (Phila). 2008;1(4):250–4. doi:10.1158/1940-6207.CAPR-08-0060.
Foy M, Spitz MR, Kimmel M, Gorlova OY. A smoking-based carcinogenesis model for lung cancer risk prediction. Int J Cancer. 2011;129(8):1907–13. doi:10.1002/ijc.25834.
Moolgavkar SH, Day NE, Stevens RG. Two-stage model for carcinogenesis: epidemiology of breast cancer in females. J Natl Cancer Inst. 1980;65:559–69.
Moolgavkar S, Knudson Jr A. Mutation and cancer: a model for human carcinogenesis. J Natl Cancer Inst. 1981;66:1037–52.
Pathak DR, Whittemore AS. Combined effects of body size, parity, and menstrual events on breast cancer incidence in seven countries. Am J Epidemiol. 1992;135:153–68.
Pike MC, Kolonel LN, Henderson BE, Wilkens LR, Hankin JH, Feigelson HS, et al. Breast cancer in a multiethnic cohort in Hawaii and Los Angeles: risk factor-adjusted incidence in Japanese equals and in Hawaiians exceeds that in whites. Cancer Epidemiol Biomarkers Prev. 2002;11(9):795–800.
Meza R, Hazelton WD, Colditz GA, Moolgavkar SH. Analysis of lung cancer incidence in the Nurses’ Health and the Health Professionals’ Follow-Up Studies using a multistage carcinogenesis model. Cancer Causes Control. 2008;19(3):317–28.
Hazelton WD, Goodman G, Rom WN, Tockman M, Thornquist M, Moolgavkar S, et al. Longitudinal multistage model for lung cancer incidence, mortality, and CT detected indolent and aggressive cancers. Math Biosci. 2012;240(1):20–34. doi:10.1016/j.mbs.2012.05.008.
Seddon JM, Reynolds R, Yu Y, Daly MJ, Rosner B. Risk models for progression to advanced age-related macular degeneration using demographic, environmental, genetic, and ocular factors. Ophthalmology. 2011;118(11):2203–11. doi:10.1016/j.ophtha.2011.04.029.
Ma GK, Ladabaum U. Personalizing colorectal cancer screening: a systematic review of models to predict risk of colorectal neoplasia. Clin Gastroenterol Hepatol. 2014;12(10):1624–34 e1. doi:10.1016/j.cgh.2014.01.042.
Schroy PC, Wong JB, O’Brien MJ, Chen CA, Griffith JL. A risk prediction index for advanced colorectal neoplasia at screening colonoscopy. Am J Gastroenterol. 2015;110(7):1062–71. doi:10.1038/ajg.2015.146.
Cao Y, Rosner BA, Ma J, Tamimi RM, Chan AT, Fuchs CS, et al. Assessing individual risk for high-risk colorectal adenoma at first-time screening colonoscopy. Int J Cancer. 2015;137(7):1719–28. doi:10.1002/ijc.29533.
Steyerberg EW. Clinical prediction models. A practical approach to development, validation, and updating. Springer; 2009.
Altman DG, Vergouwe Y, Royston P, Moons KG. Prognosis and prognostic research: validating a prognostic model. BMJ. 2009;338:b605. doi:10.1136/bmj.b605.
Moons KG, Kengne AP, Grobbee DE, Royston P, Vergouwe Y, Altman DG, et al. Risk prediction models: II. External validation, model updating, and impact assessment. Heart. 2012;98(9):691–8. doi:10.1136/heartjnl-2011-301247. Summarizes key issues in and importance of external validation, implementation and adaptation.
Moons KG, Kengne AP, Woodward M, Royston P, Vergouwe Y, Altman DG, et al. Risk prediction models: I. Development, internal validation, and assessing the incremental value of a new (bio)marker. Heart. 2012;98(9):683–90. doi:10.1136/heartjnl-2011-301246.
Rosner BA, Colditz GA, Hankinson SE, Sullivan-Halley J, Lacey Jr JV, Bernstein L. Validation of Rosner-Colditz breast cancer incidence model using an independent data set, the California Teachers Study. Breast Cancer Res Treat. 2013;142(1):187–202. doi:10.1007/s10549-013-2719-3. This article presents a detailed example of external validation of a breast cancer model.
Collins GS, de Groot JA, Dutton S, Omar O, Shanyinde M, Tajar A, et al. External validation of multivariable prediction models: a systematic review of methodological conduct and reporting. BMC Med Res Methodol. 2014;14:40. doi:10.1186/1471-2288-14-40.
Debray TP, Vergouwe Y, Koffijberg H, Nieboer D, Steyerberg EW, Moons KG. A new framework to enhance the interpretation of external validation studies of clinical prediction models. J Clin Epidemiol. 2015;68(3):279–89. doi:10.1016/j.jclinepi.2014.06.018.
Park Y. Predicting cancer risk: practical considerations in developing and validating a cancer risk prediction model. Curr Epidemiol Rep. 2015;2:197–204. doi:10.1007/s40471-015-0048-2. Discusses practical issues in the development and validation of a risk prediction model.
D’Amelio Jr AM, Cassidy A, Asomaning K, Raji OY, Duffy SW, Field JK, et al. Comparison of discriminatory power and accuracy of three lung cancer risk models. Br J Cancer. 2010;103(3):423–9. doi:10.1038/sj.bjc.6605759.
Mallett S, Royston P, Dutton S, Waters R, Altman DG. Reporting methods in studies developing prognostic models in cancer: a review. BMC Med. 2010;8:20. doi:10.1186/1741-7015-8-20.
Bouwmeester W, Zuithoff NP, Mallett S, Geerlings MI, Vergouwe Y, Steyerberg EW, et al. Reporting and methods in clinical prediction research: a systematic review. PLoS Med. 2012;9(5):1–12. doi:10.1371/journal.pmed.1001221.
Collins GS, Reitsma JB, Altman DG, Moons KG. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD statement. BMJ. 2015;350:g7594. A systematic checklist for development and validation papers to assist in accurate reporting.
Colditz GA, Atwood KA, Emmons K, Monson RR, Willett WC, Trichopoulos D, et al. Harvard report on cancer prevention volume 4: Harvard Cancer Risk Index. Risk Index Working Group, Harvard Center for Cancer Prevention. Cancer Causes Control. 2000;11(6):477–88.
Emmons KM, Koch-Weser S, Atwood K, Conboy L, Rudd R, Colditz G. A qualitative evaluation of the Harvard Cancer Risk Index. J Health Commun. 1999;4(3):181–93.
Weinstein ND, Atwood K, Puleo E, Fletcher R, Colditz G, Emmons KM. Colon cancer: risk perceptions and risk communication. J Health Commun. 2004;9(1):53–65.
Waters EA, Weinstein ND, Colditz GA, Emmons K. Formats for improving risk communication in medical tradeoff decisions. J Health Commun. 2006;11(2):167–82.
Waters EA, Klein WM, Moser RP, Yu M, Waldron WR, McNeel TS, et al. Correlates of unrealistic risk beliefs in a nationally representative sample. J Behav Med. 2011;34(3):225–35. doi:10.1007/s10865-010-9303-7.
Taber JM, Klein WM, Ferrer RA, Lewis KL, Biesecker LG, Biesecker BB. Dispositional optimism and perceived risk interact to predict intentions to learn genome sequencing results. Health Psychol. 2015;34(7):718–28. doi:10.1037/hea0000159.
Portnoy DB, Ferrer RA, Bergman HE, Klein WM. Changing deliberative and affective responses to health risk: a meta-analysis. Health Psychol Rev. 2014;8(3):296–318. doi:10.1080/17437199.2013.798829.
Klein WM, Hamilton JG, Harris PR, Han PK. Health messaging to individuals who perceive ambiguity in health communications: the promise of self-affirmation. J Health Commun. 2015;20(5):566–72. doi:10.1080/10810730.2014.999892.
Han PK, Klein WM, Killam B, Lehman T, Massett H, Freedman AN. Representing randomness in the communication of individualized cancer risk estimates: effects on cancer risk perceptions, worry, and subjective uncertainty about risk. Patient Educ Couns. 2012;86(1):106–13. doi:10.1016/j.pec.2011.01.033.
Ashley EA. The precision medicine initiative: a new national effort. JAMA. 2015;313(21):2119–20. doi:10.1001/jama.2015.3595.
Acknowledgments
This study was supported by Siteman Cancer Center and the Foundation for Barnes Jewish Hospital.
Compliance with Ethics Guidelines
Conflict of Interest
Graham A. Colditz and Esther K. Wei declare that they have no conflict of interest.
Human and Animal Rights and Informed Consent
This article does not contain any studies with human or animal subjects performed by any of the authors.
This article is part of the Topical Collection on Cancer Epidemiology
Colditz, G.A., Wei, E.K. Risk Prediction Models: Applications in Cancer Prevention. Curr Epidemiol Rep 2, 245–250 (2015). https://doi.org/10.1007/s40471-015-0057-1