Introduction

Purposes of Risk Prediction

In cancer prevention research and practice, risk prediction models have been used to determine study eligibility [1]. Risk stratification may be used to identify high-risk women, for example in breast cancer families, for referral to counseling or to guide lifestyle modification or chemoprevention. More recently, with recommendations for MRI screening of women at high risk of breast cancer, risk prediction guides an intervention decision by classifying women as eligible for screening or not [2]. Similar eligibility criteria for covered services now apply to low-dose CT scanning for lung cancer, as implemented through CMS coverage. Finally, refining models to better understand disease etiology through the temporal relations of risk factors can improve approaches to prevention [3].

Regardless of the purpose, multivariable risk prediction model development, validation, implementation, and adjustment underlie a continuous process of development and refinement. We propose the cycle in Fig. 1 as a framework for this continuing process of model application.

Fig. 1

The cycle of development, validation, implementation, and adjustment for application of risk prediction models

Approaches to Model Development

Two distinct classes of mathematical models have been used in cancer epidemiology. Statistical models draw on established multivariable regressions (including linear and logistic regression) to relate risk factors to cancer incidence. Biomathematical models, on the other hand, aim to translate the presumed biologic process of carcinogenesis into mathematical form [4]. The best-known biomathematical models, developed by Armitage and Doll, underpin a long history of applying mathematical models to cancer incidence rates. Moving beyond age relations and adding epidemiologic risk factors, this approach now provides a structure for viewing the contribution of these risk factors to the underlying biologic process of carcinogenesis [5]. With regard to age relations, Fisher and Hollomon [6] examined stomach cancer mortality, and Nordling [7] combined all cancer sites. They noted that, for ages 25 to 74 years, the logarithm of the death rate increased linearly with the logarithm of age. Armitage and Doll then evaluated cancer mortality in the UK in men and women in 1950 and 1951, focusing on the slope or gradient in risk with age. A gradient of 6 to 1 (i.e., 6 units increase in the logarithm of the death rate per unit increase in the logarithm of age) was relatively consistent across 17 cancer sites. Based on this, they concluded that cancer is the end result of several successive cellular changes. However, for breast, ovarian, and cervical cancers, the slope was reduced in older age groups; they attributed this to a reduction (after about age 50 in their regressions) in the rate of one of the later changes in the process of carcinogenesis [5]. Thus, they proposed a multistage model of carcinogenesis.
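For readers who want the algebra behind this log-log regression, a minimal sketch of the multistage relation is given below; the stage count k and the rates λ are generic model symbols, not values fitted in the original papers.

```latex
% Armitage-Doll multistage relation (sketch): for a process requiring k
% rate-limiting cellular changes with small rates \lambda_1,\dots,\lambda_k,
% incidence at age t is approximately
I(t) \approx \frac{\lambda_1 \lambda_2 \cdots \lambda_k}{(k-1)!}\, t^{\,k-1},
\qquad
\log I(t) = \mathrm{const} + (k-1)\log t .
% A straight line on the log-log scale with slope near 6 is what the
% "gradient of 6 to 1" quoted above refers to.
```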

Mathematical models can also summarize the impact of multiple variables, such as change in risk factors across the life course, that may modify incidence rates [8]. These models can refine understanding of disease relations and disease development and thereby add precision to risk estimation; more precise models may then lead to better tools for clinical risk assessment and decision-making [9]. Doll and Peto [10] applied this multistage cancer incidence model to lung cancer within the British Doctors' Study. They observed that lung cancer incidence is proportional to (dose + 6)^2 × (age − 22.5)^4.5, where dose = cigarettes per day. This result was consistent with the multistage model of carcinogenesis. They interpreted the exponents in the model as approximations of the number of stages in the carcinogenesis process; that is, incidence proportional to the fourth to sixth power of time (age) suggests four to six independent steps in the process of carcinogenesis. These model-based extrapolations have been confirmed by Vogelstein and colleagues in the setting of colon cancer [11]. For lung cancer, these models implied that more than one stage of carcinogenesis was strongly affected by smoking [12, 13]. Extensive application of the Armitage and Doll model to radiation exposure also attests to its utility [14, 15].
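A short sketch may make the quoted Doll-Peto relation concrete. The code below simply evaluates the published proportionality, (dose + 6)^2 × (age − 22.5)^4.5; the proportionality constant and the example values are arbitrary, so only relative comparisons are meaningful.

```python
# Sketch of the Doll-Peto multistage fit for lung cancer incidence.
# dose is in cigarettes per day; k is an arbitrary proportionality constant.

def relative_lung_cancer_incidence(cigs_per_day: float, age: float, k: float = 1.0) -> float:
    """Relative incidence under the Doll-Peto relation quoted in the text."""
    if age <= 22.5:
        return 0.0  # the fitted relation applies only above the age offset
    return k * (cigs_per_day + 6) ** 2 * (age - 22.5) ** 4.5

# Example: incidence ratio at age 60 for 20 cigarettes/day vs. a nonsmoker.
ratio = (relative_lung_cancer_incidence(20, 60)
         / relative_lung_cancer_incidence(0, 60))
print(f"Smoker vs. nonsmoker at age 60: {ratio:.1f}x")  # (20+6)^2 / 6^2 ≈ 18.8x
```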

Pike et al. [16•] took the Armitage and Doll approach and applied it to breast cancer, including risk factors (menarche, first birth, and menopause) as modifiers of the effect of time. Pike assumed that breast tissue "aged" at a constant rate starting at menarche and continuing to first birth; after an adverse effect of first birth, the rate of "tissue aging" decreased, and it decreased further after menopause. This replicated the observation for breast cancer mortality reported by Armitage and Doll [5]. Pike's model had only a term for parous vs. nulliparous status; it did not include terms for second and subsequent pregnancies, nor did it account for the timing of these births or for differences in the effect of natural menopause vs. bilateral oophorectomy. Rosner and Colditz extended the Pike model by adding more details of reproductive history, including the timing of births and type of menopause (natural vs. surgical) [17–19]. Like the Doll and Peto lung cancer model, this model generated a set of parameters for the rate of breast tissue aging before first pregnancy, the rate of tissue aging after menopause, and the magnitude of the adverse effect of first pregnancy. The Rosner and Colditz model has been further refined with the addition of benign breast disease [20], circulating hormone levels [21, 22], and so forth, but the underlying approach remains a life course accumulation of cancer risk that can be used to estimate annual and cumulative risk of cancer. Applications in colon cancer [23], melanoma [24], and ovarian cancer [25] all use this approach.
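The following is a minimal sketch of the Pike-style "tissue aging" bookkeeping described above. The event ages and piecewise rates are hypothetical placeholders, not fitted values from Pike or Rosner-Colditz, and the one-time adverse increment at first birth is omitted for brevity.

```python
# Sketch of Pike-style "breast tissue aging": exposure accumulates at full
# rate from menarche to first birth, at a reduced rate thereafter, and at a
# further reduced rate after menopause. All rates are hypothetical.
from typing import Optional

def cumulative_tissue_age(age: float,
                          age_menarche: float = 13.0,
                          age_first_birth: Optional[float] = 25.0,
                          age_menopause: Optional[float] = 50.0,
                          rate_post_birth: float = 0.7,
                          rate_post_menopause: float = 0.1) -> float:
    """Accumulated 'tissue age' at a given age under piecewise-constant rates."""
    events = [(age_menarche, 1.0)]           # (start age, aging rate)
    if age_first_birth is not None:
        events.append((age_first_birth, rate_post_birth))
    if age_menopause is not None:
        events.append((age_menopause, rate_post_menopause))

    total = 0.0
    for i, (start, rate) in enumerate(events):
        end = events[i + 1][0] if i + 1 < len(events) else age
        end = min(end, age)
        if end > start:
            total += rate * (end - start)
    return total

# Incidence in such models is taken proportional to a power of tissue age,
# echoing the Armitage-Doll age dependence.
print(cumulative_tissue_age(60.0))
```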

A simpler form of this multivariable risk factor approach is to take a model from an existing epidemiologic data set and assess its performance in predicting cancer. One example is the multivariable model originally developed for lung cancer [26] that has been expanded to assess performance based on inclusion of DNA repair markers [27], gender, and smoking history [28].

Focusing on age-incidence data for breast cancer from high- and low-risk countries, Moolgavkar et al. [29, 30] took an alternative approach to modeling. Specifically, they fitted a two-stage model in which normal cells progress through a transformed intermediate stage to cancer. They noted that across high- and low-risk countries, the shape of the breast cancer incidence curves was constant. Pathak and Whittemore applied a breast cancer incidence rate function to data from countries with high, medium, and low breast cancer incidence rates. They confirmed Moolgavkar's observation that age at first birth and age at menopause exert similar effects on all women regardless of the breast cancer incidence rates in their country [31]. Pike and colleagues subsequently used traditional survival analysis methods to show that reproductive risk factors apply equally across ethnic groups in the USA [32]. The two-stage modeling approach has continued to be applied by Moolgavkar and colleagues to lung cancer, colon cancer, and other sites [13, 33, 34].
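To make the structure of this alternative concrete, the sketch below states one standard approximation to the two-stage (clonal expansion) hazard. The symbols are generic model parameters rather than Moolgavkar's fitted values, and the approximation assumes malignant conversion is rare.

```latex
% Two-stage clonal expansion model, rare-malignancy approximation:
% X normal cells mutate to intermediate cells at rate \mu_1; intermediate
% cells divide at rate \alpha, die at rate \beta, and convert to malignant
% cells at rate \mu_2. The hazard at age t is then approximately
h(t) \approx \mu_1 \mu_2 X \, \frac{e^{(\alpha-\beta)t} - 1}{\alpha - \beta},
\qquad \alpha > \beta .
```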

Missing Data

A common gap in model development is the description of how missing data are handled. Restricting model development to complete cases is often reported. This has implications for the final application: will those with one or more missing data points be excluded from prediction? How will this affect clinical decision-making, testing or referral, or acceptability in clinical and public health settings? Rosner has overcome this in the application of his macular degeneration prediction model [35] by using NHANES data to impute missing variables (personal communication). Likewise, at the Joanne Knight Breast Health Center, where some 50,000 screening mammograms are performed annually, a sufficiently large data set of similar women is available to impute missing variables when the Rosner-Colditz model is implemented in the clinical setting. Too often, lack of information on how missing data are handled limits the transfer of models from development to broader application.
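A minimal sketch of this reference-cohort imputation strategy, using scikit-learn's SimpleImputer, is shown below; the column names and data are hypothetical stand-ins for whatever reference cohort (NHANES, a local screening population) holds the model's predictors.

```python
# Sketch: impute a patient's missing risk factors from a large reference
# cohort before scoring, rather than excluding the patient from prediction.
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

reference_cohort = pd.DataFrame({    # hypothetical reference data
    "age_menarche":    [12, 13, 14, 13, 11],
    "age_first_birth": [24, 28, np.nan, 31, 22],
    "bmi":             [23.1, 27.4, 30.2, np.nan, 25.0],
})

# Fit the imputer on the reference cohort (medians ignore missing entries).
imputer = SimpleImputer(strategy="median").fit(reference_cohort)

new_patient = pd.DataFrame({"age_menarche": [13],
                            "age_first_birth": [np.nan],  # missing at intake
                            "bmi": [26.3]})
completed = imputer.transform(new_patient)  # gaps filled from cohort medians
print(completed)
```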

Summary

Regardless of the approach to building a model, the proliferation of risk prediction models published since the NCI workshop in 2005 is impressive and indicates how an NCI initiative can help move a field forward [9]. Models are typically developed following one of three general approaches: (1) explicit selection of known causal factors; (2) biologic/lifespan or life calendar approaches; and (3) data-driven and regression applications, typically from large databases. Despite the publication of many models, few seem to progress to validation in independent settings. In breast cancer, a systematic review by Meads and colleagues notes that, of 17 models published as of 2012, 3 had been validated (Gail, Rosner, Cuzick) and none had been evaluated for clinical impact. Similarly, models for predicting colorectal neoplasia have been developed, though many lack validation, and only a few have been evaluated for implementation in clinical practice [36–38]. A unique characteristic of colorectal neoplasia is the opportunity to develop risk models for the precursor lesion. This type of model has direct applications in clinical practice with respect to counseling for colorectal cancer screening.

Validation Comments

Steyerberg's text [39] discusses in detail approaches to adjusting models for overfitting, including splitting data sets into development and testing subsets as well as more advanced bootstrapping approaches. An underlying limitation of these statistical approaches, however, is that the extant data set can hide issues of bias. Accordingly, Moons and others advocate for independent validation, that is, validation in an independent prospective data set [40, 41•, 42]. Validation is a key step in moving to application of the risk prediction model for cancer prevention.
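As an illustration of the bootstrapping family of internal-validation methods, the sketch below implements a Harrell-style optimism correction for a logistic model's discrimination (AUC) on simulated data; it is a sketch of the general technique, not the specific procedure of any study cited here.

```python
# Bootstrap optimism correction: refit the model on bootstrap samples,
# measure how much performance drops when each refit is scored on the
# original data, and subtract the average optimism from the apparent AUC.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))                       # simulated risk factors
y = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))     # simulated outcomes

apparent = roc_auc_score(y, LogisticRegression().fit(X, y).predict_proba(X)[:, 1])

optimism = []
for _ in range(200):
    idx = rng.integers(0, len(y), len(y))           # bootstrap resample
    m = LogisticRegression().fit(X[idx], y[idx])
    auc_boot = roc_auc_score(y[idx], m.predict_proba(X[idx])[:, 1])
    auc_orig = roc_auc_score(y, m.predict_proba(X)[:, 1])
    optimism.append(auc_boot - auc_orig)

print(f"apparent AUC {apparent:.3f}, "
      f"optimism-corrected {apparent - np.mean(optimism):.3f}")
```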

One major challenge in epidemiologic risk prediction model building is obtaining access to an independent data set with the necessary variables. In breast cancer modeling, Rosner and Colditz collaborated with the California Teachers Study to achieve this [43•]; for newer models that add SNPs to established risk factors, validation in independent data sets with the necessary SNP measures remains a challenge.

Although statistical methods can mitigate the overestimate of performance associated with internal validation, the goal is for a model to predict risk in groups other than the original population and ultimately to be used in a clinical setting. Evaluating the generalizability of the model in other populations and quantifying any deficiencies in model development require external validation [40, 44]. When the validation population differs from the development population in an obvious way, the interpretation of the validation is straightforward, e.g., a model developed in one country that is validated in another. When the development and validation populations differ in subtler or more complex ways, the interpretation can be more challenging. Recent methods to better quantify the differences between development and validation populations allow for more rigorous evaluation of external validation studies [45]. As suggested by Park [46•], studies comparing the performance of different risk models on the same population (e.g., group external validation), such as the one by D'Amelio and colleagues [47], may be of even greater value than external validation studies that assess the performance of any single model.

Calibration is a particularly important component of a model's performance and utility when the model is applied beyond the data set from which it was developed, such as at the population level. Calibration measures the agreement between predicted and observed risks. In practice, the majority of prediction model articles do not report calibration [44]. One example is the external validation of the Rosner-Colditz model using the California Teachers Study (CTS) as an independent data set [43•], with calibration assessed by the methods described by Gail [1]. Comparing observed and expected numbers of cases by decile in the CTS, based on the Rosner-Colditz beta coefficients, the model demonstrated an overall good fit to SEER data [43•]. Other considerations related to validation and calibration are discussed in more detail by Park as part of this series [46•].
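The sketch below shows the generic decile-based observed-vs-expected comparison described above; the inputs are simulated arrays, not CTS data, and the function is a hypothetical helper rather than the published implementation.

```python
# Decile calibration check: compare observed case counts with expected
# counts (the sum of predicted risks) within deciles of predicted risk.
import numpy as np
import pandas as pd

def calibration_by_decile(predicted_risk: np.ndarray,
                          observed: np.ndarray) -> pd.DataFrame:
    """Observed vs. expected cases, and their ratio, per risk decile."""
    df = pd.DataFrame({"p": predicted_risk, "y": observed})
    df["decile"] = pd.qcut(df["p"], 10, labels=False, duplicates="drop")
    out = df.groupby("decile").agg(observed=("y", "sum"), expected=("p", "sum"))
    out["O_over_E"] = out["observed"] / out["expected"]
    return out  # O/E near 1 in every decile indicates good calibration

# Example with simulated, well-calibrated predictions.
rng = np.random.default_rng(1)
p = rng.uniform(0.01, 0.30, size=2000)
y = rng.binomial(1, p)
print(calibration_by_decile(p, y))
```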

Reporting of Methods Used

As the number of risk prediction models and validation studies (internal and external) has grown, the need for a systematic way of reporting results has become paramount. Without consistent reporting of methods, choosing a model for application in cancer prevention can be quite subjective. Meta-analyses and systematic reviews of risk prediction modeling articles consistently find poor-quality reporting across all aspects of prediction model development and for multiple disease sites [44, 48, 49]. In response, Collins and others developed the TRIPOD Statement, a checklist of 22 items determined to be essential for high-quality reporting of multivariable prediction models (diagnostic or prognostic) [50•]. The checklist is organized according to the sections of a standard research manuscript and indicates which items apply to model development, validation, or both. The authors propose that the checklist be included with manuscripts submitted for peer review. As the literature in the field of risk prediction continues to grow, this type of structured guideline should improve the quality of methods reporting and facilitate model comparisons and improvements.

Implementation

While models are developed and can be applied in a number of settings as noted earlier, the underlying challenge is for a model to be useful in the clinical or public health setting, improving outcomes such as satisfaction with decisions and quality of life, or reducing disease endpoints [41•]. To achieve successful implementation, which is the true measure of a prediction model's utility, the end user must be considered, preferably from the beginning of the model development process. An example may help illustrate how important this can be: if a sophisticated model is built on an extensive assessment of lifestyle factors and is too long to be completed in, say, a clinic setting, then noncompletion makes the model, no matter how accurate, of no practical use in that clinic. The requirement of simple variables for implementation also increases the number of data sets that could be used for validation of existing models, a current gap in the field of risk prediction as discussed above. We built on this basic premise when developing the cancer risk assessment tools from the Harvard Center for Cancer Prevention in the 1990s [51, 52]. We chose simple dichotomies of risk factors to ease completion, and after focus group testing [52], we moved to computer administration to reduce errors in arithmetic by users. We chose an engaging presentation with seven categories of risk, as recommended by Weinstein, and provided a lower limit of achievable risk reduction to convey the point that the risk of cancer cannot go to zero [53, 54]. Ongoing research on risk perception and presentation of risk will help refine the usefulness of output from models [55–59]. Better integration of these insights into model output from the beginning phases of development may increase uptake of models for cancer prevention.

Adaptation

In cardiovascular disease, we find numerous risk prediction models (Framingham, Scottish, New Zealand, etc.). For cancer, where we have standardized population-based incidence reporting through registration systems, adjusting models to fit national cancer incidence should be less problematic. However, beyond the approaches of Gail and Rosner, no systematic study of adaptation has been reported. Should one take a validated model and assess its performance in a new setting, or go back and derive a model from scratch? Starting over at the model development stage when a validation study suggests poor performance implies reselecting predictors, giving up the knowledge gained from the initial development of the model [41•], and will ultimately lead to more models that are not carried beyond the initial development or validation stage. Although several general methods for updating prediction models have been proposed and evaluated, and can improve the generalizability and transportability of existing models [41•], no broader standards or guidelines have been established to guide efforts to adapt existing models. A systematic approach might help reduce redundancy and the proliferation of models that have not been validated. This would then facilitate more models reaching the stage of assessment for use in clinical or prevention settings and ultimately lead to the intended positive impact on public health.
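One of the simplest updating methods in this family is logistic recalibration: keep the original model's linear predictor and re-estimate only an intercept and slope in the new population instead of rebuilding the model. The sketch below illustrates this on simulated data; the original coefficients are hypothetical, and this is one representative technique, not a prescribed standard.

```python
# Sketch of logistic recalibration as a model-updating strategy.
import numpy as np
from sklearn.linear_model import LogisticRegression

original_beta = np.array([0.8, -0.3, 0.5])    # hypothetical development-study coefficients
original_intercept = -3.0

def linear_predictor(X: np.ndarray) -> np.ndarray:
    """Original model's linear predictor, kept fixed during updating."""
    return original_intercept + X @ original_beta

# Simulated stand-in for a local validation cohort with shifted baseline risk.
rng = np.random.default_rng(2)
X_new = rng.normal(size=(1000, 3))
y_new = rng.binomial(1, 1 / (1 + np.exp(-(0.5 * linear_predictor(X_new) - 0.5))))

# Refit only intercept and slope on the old linear predictor.
lp = linear_predictor(X_new).reshape(-1, 1)
recal = LogisticRegression().fit(lp, y_new)
print("recalibration slope:", recal.coef_[0][0],
      "intercept:", recal.intercept_[0])
# Updated risk = expit(intercept + slope * original linear predictor)
```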

Conclusion

Risk prediction models have great potential to improve current cancer prevention strategies. Building on Armitage and Doll's work on the stages of carcinogenesis, risk models for cancer, and for breast cancer in particular, have provided insights into etiology and moved clinical practice and research forward. Models that follow the full cycle (development, validation, implementation, and adaptation) will have the greatest impact on identifying specific groups for screening, targeting specific populations for cancer prevention counseling, more finely defining study eligibility criteria, and improving our understanding of etiologic heterogeneity. The challenges of each step in the cycle include the following: forethought regarding implementation during model development; accurate methods of handling missing data; careful and complete validation, including identifying an appropriate external validation data set; accurate and comprehensive reporting across the spectrum of development and validation; pragmatic studies of implementation in real-world clinical settings; and appropriate adaptation as knowledge grows. Perhaps because of these challenges, the proliferation of risk models has occurred largely without appropriate attention to the full cycle and its eventual goal, resulting in many models that have little or no clinical or population-level impact. The need for wide-scale improvement in risk/screening stratification has been highlighted by the recently launched National Precision Medicine Initiative, which asserts the need for more precise clinical decision-making. However, much of the immediate attention given to the Initiative has focused on treatment, e.g., classifying an individual's response to specific pharmaceutical agents. This unfortunately overshadows the many applications to prevention, where risk prediction models can enable targeted and cost-effective screening [60]. In summary, risk prediction modeling is still a growing field with many methodological challenges and opportunities. However, what we do not know, or areas in which we can still improve, should not hinder us from using our current knowledge in risk modeling to advance population-level cancer prevention.