Breast Cancer Research and Treatment

Volume 133, Issue 1, pp 1–10

Risk prediction models of breast cancer: a systematic review of model performances

Authors

  • Thunyarat Anothaisintawee
    • Section for Clinical Epidemiology and Biostatistics, Department of Family Medicine, Ramathibodi Hospital
  • Yot Teerawattananon
    • Health Intervention and Technology Assessment Program, Ministry of Public Health
  • Chollathip Wiratkapun
    • Department of Radiology, Faculty of Medicine, Ramathibodi Hospital, Mahidol University
  • Vijj Kasamesup
    • Department of Community Medicine, Faculty of Medicine, Ramathibodi Hospital, Mahidol University
    • Section for Clinical Epidemiology and Biostatistics, Faculty of Medicine, Ramathibodi Hospital, Mahidol University
Review

DOI: 10.1007/s10549-011-1853-z

Cite this article as:
Anothaisintawee, T., Teerawattananon, Y., Wiratkapun, C. et al. Breast Cancer Res Treat (2012) 133: 1. doi:10.1007/s10549-011-1853-z

Abstract

An increasing number of risk prediction models have been developed to estimate breast cancer risk in individual women. However, their performance is questionable. We therefore conducted a study with the aim of systematically reviewing previous risk prediction models. The results from this review help to identify the most reliable model and indicate the strengths and weaknesses of each model for guiding future model development. We searched MEDLINE (PubMed) from 1949 and EMBASE (Ovid) from 1974 until October 2010. Observational studies that constructed models using regression methods were selected. Information about model development and performance was extracted. Twenty-five of 453 studies were eligible. Of these, 18 developed prediction models and 7 validated existing prediction models. Up to 13 variables were included in the models, and sample sizes ranged from 550 to 2,404,636. Internal validation was performed for four models, while five models were externally validated. The Gail model and the Rosner and Colditz model were the principal models and were subsequently modified by other investigators. Calibration of most models was fair to good (expected/observed ratio: 0.87–1.12), but discriminatory accuracy was poor to fair in both internal validation (concordance statistic: 0.53–0.66) and external validation (concordance statistic: 0.56–0.63). This poor discriminatory accuracy may reflect a lack of knowledge about risk factors, the heterogeneous subtypes of breast cancer, and different distributions of risk factors across populations. In addition, the concordance statistic itself is insensitive to improvements in discrimination. Therefore, newer methods such as the net reclassification index should be considered to evaluate the improvement in performance of a newly developed model.

Keywords

Breast cancer; Risk prediction model; Systematic review

Introduction

Breast cancer is the most common cancer in women worldwide, accounting for 23% of all new cancers in women [1]. Although around two-thirds of new cases occur in developed countries, breast cancer is also the most common cancer in women in developing countries, with an age-standardized incidence rate of around 20 per 100,000 [2]. Strong evidence from a meta-analysis suggests that early detection through mammographic screening can improve patient survival [3]. Organized mammography screening programs have therefore been well established in most countries in Europe and North America, contributing to significantly improved five-year survival rates of as high as 89% [4]. In contrast, most women living in developing countries have never had access to mammography because of severe shortages of human resources and infrastructure, gaps that would take decades to fill. As a result, providing breast cancer screening to every woman is not feasible in most developing countries, but identifying women at relatively higher risk of developing breast cancer appears to be a promising alternative.

A breast cancer prediction model is a mathematical equation designed to quantify the risk that an individual woman will develop breast cancer within a defined period [5]. Demographic and biomedical characteristics are typically required as input variables. At present, a number of risk prediction models have been constructed by many investigators using various statistical methods [6–13]. However, no systematic review has yet evaluated the methods used to construct these models, and their performance remains questionable. A better understanding of model performance, including the strengths and limitations of each prediction model, is needed before the models are applied in clinical practice. Therefore, this study aims to systematically review the development and performance of existing breast cancer risk prediction models available globally. We expect that the results of this review can help to identify the most reliable model for use in other settings or, if no particular model is promising, help to identify the strengths and weaknesses of each model's development and so guide future research in this field.

Methods

Search strategy

We searched MEDLINE from 1949 and EMBASE from 1974 to October 2010 for relevant articles published in English. The search strategy was ((“breast cancer”) OR (“breast neoplasm”)) AND ((“risk prediction”) OR (“assessment tool”) OR (“predictive model”) OR (“risk predicting”) OR (“risk assessment model”) OR (“prediction model”) OR (“prediction score”)).

Study selection

Identified studies were screened based on titles and abstracts. If a decision could not be made from the abstract, the full article was retrieved. Observational studies (cohort, case–control, or cross-sectional) published in English were selected if they met all of the following criteria: considered more than one risk factor simultaneously in the prediction model; had breast cancer versus non-breast cancer as the outcome; applied a regression equation (e.g., logistic, Poisson, or Cox regression) to build the prediction model; and reported the model's performance (i.e., expected-over-observed ratio or concordance statistic).

Data extraction

The general characteristics of studies (i.e., author, journal, publication year, setting, type of studied population, ethnicity, study design, number of subjects, and specific objective(s), i.e., to develop or validate a model, or both) were extracted. If the study was the first development of a prediction model, specific information about its creation (i.e., type of statistical model, risk factors, and whether scores were created using coefficients or exponentiated coefficients) was extracted. The methods used to assess predictive ability, namely calibration (how closely predicted values agree with observed values) and discrimination (how well the model distinguishes women who develop breast cancer from those who do not), were also extracted. For calibration, the ratio of expected to observed values (E/O ratio) with its 95% confidence interval (CI), or the goodness-of-fit test, was recorded. For discrimination assessed by ROC analysis, the concordance statistic (C-statistic) with its 95% CI was extracted.
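To make the two performance measures concrete, the following Python sketch computes an E/O ratio with an approximate 95% CI and a C-statistic from individual predicted risks. It is an illustration under stated assumptions, not the method used by any reviewed study: the CI treats the observed number of cancers as Poisson and uses E/O × exp(±1.96 × √(1/O)), a commonly used approximation, and the function names and toy data are hypothetical.

```python
import math
from itertools import product


def eo_ratio(expected: float, observed: int, z: float = 1.96):
    """Return (E/O, lower, upper) for an expected/observed ratio.

    Assumption: the observed number of cancers is Poisson, so the CI is
    approximated as E/O * exp(+/- z * sqrt(1 / O)).
    """
    ratio = expected / observed
    half_width = z * math.sqrt(1.0 / observed)
    return ratio, ratio * math.exp(-half_width), ratio * math.exp(half_width)


def c_statistic(risks, outcomes):
    """Concordance statistic for a binary outcome (equal to the ROC AUC).

    Counts the share of (case, non-case) pairs in which the case was given
    the higher predicted risk; tied predictions count as half-concordant.
    """
    cases = [r for r, y in zip(risks, outcomes) if y == 1]
    non_cases = [r for r, y in zip(risks, outcomes) if y == 0]
    score = 0.0
    for r_case, r_non in product(cases, non_cases):
        if r_case > r_non:
            score += 1.0
        elif r_case == r_non:
            score += 0.5
    return score / (len(cases) * len(non_cases))


# Hypothetical toy data, for illustration only.
print(eo_ratio(expected=105.0, observed=100))
print(c_statistic([0.3, 0.5, 0.5, 0.1], [1, 0, 1, 0]))  # 0.625
```

An E/O ratio of 1 indicates perfect calibration overall, while a C-statistic of 0.5 means the model separates cases from non-cases no better than chance and 1.0 means perfect separation. The pairwise loop is quadratic in sample size and is only meant for small examples.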

If studies had validated the prediction models, the type of validation, method used, and results of validation were also recorded. If authors had modified previous prediction models, we recorded whether any of the standard variables had been removed or modified and whether any new risk factor had been added. For each risk factor, we recorded whether it was treated as continuous or categorical. For the outcome, we recorded whether the authors compared invasive breast cancer versus non-cancer with or without including ductal carcinoma in situ (DCIS) as breast cancer.

Results

Description of studies

We identified 453 studies, of which 25 were eligible for the review (see Fig. 1). Eighteen studies [6–23] developed or modified prediction models estimating the likelihood that an individual woman will develop breast cancer, and seven studies [24–30] only validated existing prediction models. Table 1 shows the characteristics of the 18 studies that developed or modified prediction models. Of these, 13 studies [6–10, 12, 13, 16, 17, 19, 20, 22, 23] focused on predicting overall invasive breast cancer, while five studies [11, 14, 15] focused on both non-invasive (carcinoma in situ) and invasive breast cancer. Two studies [7, 18] estimated the risk of developing hormone receptor-specific breast cancer.
Fig. 1 Flow chart of study selection

Table 1

Characteristics of studies that developed prediction models

Author | Year | Model | Study design | Type of participant | Ethnicity | Outcome | Statistical method | Breast cancers (n) | Non-breast cancers (n) | Risk factors considered in the models
Gail [11] | 1989 | Gail | Nested case–control | General women | Caucasian | Invasive breast cancer with CIS | Logistic regression | 2,852 | 3,146 | Age, age at menarche, age at first live birth, family history of breast cancer, number of previous breast biopsies, history of atypical hyperplasia
Tice [21] | 2005 | Modified Gail | Cohort | General women | Mixed | Invasive breast cancer with DCIS | Cox regression | 955 | 80,822 | Gail model risk factors plus breast density
Tice [22] | 2005 | Modified Gail | Cohort | General women | Mixed | Invasive breast cancer | Cox regression | 400 | 6,504 | Gail model risk factors plus nipple aspirate fluid cytology
Chen [15] | 2006 | Modified Gail | Nested case–control | General women | Caucasian | Invasive breast cancer with CIS | Logistic regression | 1,280 | 4,035 | Age, age at birth of first child, family history of breast cancer, number of previous breast biopsies, breast density, weight
Decarli [10] | 2006 | Modified Gail | Case–control | General women | Caucasian | Invasive breast cancer | Logistic regression | 2,569 | 2,588 | Gail model risk factors
Tice [20] | 2008 | Modified Gail | Cohort | General women | Mixed | Invasive breast cancer | Cox regression | 14,766 | 629,229 | Age, race, breast density, family history of breast cancer, history of breast biopsy
Gail [16] | 2007 | Modified Gail (CARE) | Case–control | General women | African–American | Invasive breast cancer | Logistic regression | 1,607 | 1,674 | Gail model risk factors
Novoty [17] | 2006 | Modified Gail (Czech) | Case–control | General women | Mixed | Invasive breast cancer | Logistic regression | 2,299 | 2,299 | Gail model risk factors plus family history of any cancer, history of breast inflammation, body mass index, parity
Rosner [13] | 1996 | Rosner and Colditz | Cohort | General women | Caucasian | Invasive breast cancer | Poisson regression | 2,249 | 89,132 | Age at menarche, age at first live birth, age at subsequent births, age at menopause
Colditz [8] | 2000 | Modified Rosner and Colditz | Cohort | General women | Caucasian | Invasive breast cancer | Poisson regression | 1,761 | 58,520 | Rosner and Colditz model risk factors plus benign breast disease, type of menopause, hormone replacement therapy, weight, height, alcohol
Colditz [9] | 2004 | Modified Rosner and Colditz | Cohort | General women | Caucasian | Invasive breast cancer | Poisson regression | 2,096 | 66,145 | Modified Rosner and Colditz model risk factors
Rosner [18] | 2008 | Modified Rosner and Colditz | Cohort | Postmenopausal women | Caucasian | Hormone receptor-defined breast cancer | Poisson regression | 1,559 | 59,812 | Modified Rosner and Colditz model risk factors plus serum estradiol
Tamimi [19] | 2010 | Modified Rosner and Colditz | Cohort | General women | Caucasian | Invasive breast cancer | Poisson regression | 3,221 | 75,022 | Modified Rosner and Colditz model risk factors plus type of benign breast disease
Ueda [23] | 2003 | Other model | Case–control | General women | Asian | Invasive breast cancer | Logistic regression | 376 | 430 | Age, age at menarche, age at first live birth, family history of breast cancer, BMI
Boyle [6] | 2004 | Other model | Case–control | General women | Caucasian | Invasive breast cancer | Logistic regression | 2,569 | 2,588 | Age at menarche, age at first birth, age at menopause, family history of breast cancer, BMI, alcohol, hormone replacement therapy, physical activity
Lee [12] | 2004 | Other model | Case–control | General women | Asian | Invasive breast cancer | Logistic regression | 384 | 166 | Age, age at menarche, age at menopause, age at first live birth, family history of breast cancer, breast feeding, alcohol, smoking
Barlow [14] | 2006 | Other model | Cohort | General women | Mixed | Invasive breast cancer with DCIS | Logistic regression | 11,638 | 2,392,998 | Age, age at first birth, family history of breast cancer, breast density, prior breast procedure, BMI, hormone replacement therapy, false-positive mammogram
Chlebowski [7] | 2007 | Other model | Cohort | Postmenopausal women | Mixed | Invasive breast cancer | Logistic regression | 3,236 | 147,916 | Gail model risk factors plus parity, breast feeding, smoking, alcohol, BMI, physical activity, hormone replacement therapy

Most studies constructed models using data from general women (i.e., pre- and post-menopausal women), except two models [7, 18] that were developed using postmenopausal data only. Nine models [6, 8–11, 13, 15, 18, 19] were developed in Caucasian populations and six in mixed populations in which Caucasians were the majority; only a few models were developed using data from Asian [12, 23] or African–American [16] women.

Regarding methodological approach, cohort was the most common study design, used in ten studies [7–9, 13, 14, 18–22], followed by case–control (six studies) [6, 10, 12, 16, 17, 23] and nested case–control designs (two studies) [11, 15]. Study sample sizes ranged from 550 to 2,404,636. Three regression approaches were applied for model development: logistic regression in ten studies [6, 7, 10–12, 14–17, 23], Poisson regression in five studies [8, 9, 13, 18, 19], and Cox regression in three studies [20–22]. Among the 18 studies, only four [6, 7, 14, 20] performed internal validation, while five models were externally validated (i.e., the Gail [6, 7, 10, 20–22, 24–26, 28–30], CARE [16], modified Gail [10], Rosner and Colditz [27], and modified Rosner and Colditz [27] models). The two prediction models developed by Gail et al. [11] and Rosner et al. [13] were subsequently modified by many investigators.

Gail model

In 1989, Gail et al. [11] published a landmark study describing the first prediction model, hereafter referred to as the reference Gail model, to estimate the risk that an American woman of a given age with given risk factors would develop breast cancer over a specified time interval. The model was constructed from 2,852 cases and 3,146 controls. The six risk factors included in the model were age, age at first live birth, age at menarche, history of breast cancer in first-degree relatives, number of previous breast biopsies, and history of atypical hyperplasia. Although the authors did not examine the model's performance, ten studies by other investigators [6, 7, 10, 20–22, 25, 26, 28, 30] validated the model in general US and Italian populations, and two other studies validated it in subpopulations, i.e., women with a family history of breast cancer [24] and postmenopausal women [29]. All six variables from the reference Gail model were included in the validated models. Calibration across these 12 studies was generally good, with a median E/O ratio of 1.04 (range 0.79–1.12). However, discrimination was poor to fair, with a median C-statistic of 0.59 (range 0.57–0.67).
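The reference Gail model converts a woman's relative risk into an absolute risk over an interval while allowing for the chance of dying from other causes first. The sketch below shows that general calculation for hazards assumed constant within each year. It is a simplified illustration, not the published Gail implementation: the baseline breast cancer hazards (`h1`), competing mortality hazards (`h2`), and the single relative risk (`rr`) are hypothetical inputs, and the published model's age-dependent relative risks and population incidence rates are not reproduced here.

```python
import math


def absolute_risk(h1, h2, rr):
    """Probability of developing breast cancer over len(h1) one-year intervals.

    h1: baseline breast cancer hazard for each year (per woman-year)
    h2: competing (non-breast-cancer) mortality hazard for each year
    rr: the woman's relative risk from her risk-factor profile
    Hazards are assumed constant within each year (hypothetical inputs).
    """
    risk = 0.0
    event_free = 1.0  # probability of starting the year alive and cancer-free
    for base, other in zip(h1, h2):
        cancer_hazard = base * rr
        total_hazard = cancer_hazard + other
        if total_hazard == 0.0:
            continue
        p_any_event = 1.0 - math.exp(-total_hazard)
        # Share of this year's first events that are breast cancer diagnoses.
        risk += event_free * (cancer_hazard / total_hazard) * p_any_event
        event_free *= math.exp(-total_hazard)
    return risk


# Hypothetical 5-year projection for a woman with relative risk 1.8.
print(absolute_risk(h1=[0.002] * 5, h2=[0.003] * 5, rr=1.8))
```

With `rr = 1` and no competing mortality, the result reduces to 1 − exp(−Σ h1), the usual cumulative incidence; adding competing mortality always lowers the projected risk.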

Modified Gail model

The reference Gail model was subsequently modified in seven studies [10, 15–17, 20–22]. In 2005, Tice et al. [21] modified the Gail model using data from 81,777 women in the San Francisco Mammography Registry. Adding breast density as a predictor yielded only a very small improvement in discrimination: the C-statistic rose from 0.67 (95% CI, 0.65–0.68) to 0.68 (95% CI, 0.66–0.70). Tice et al. [22] also added nipple aspirate fluid cytology, using data from an observational cohort of 6,904 women in California. Again, the modified model provided only a minor improvement in discrimination, with the C-statistic rising from 0.62 to 0.64.

In 2006, three groups of investigators [10, 15, 17] modified the reference Gail model by adding or removing a few variables. Chen et al. [15] extended the cohort used for the reference Gail model, added two variables (breast density and weight), and removed age at menarche. The modified model discriminated better than the reference model, with C-statistics of 0.643 versus 0.596, respectively. Decarli et al. [10] were the first to apply the Gail model outside the US; using data from Italian women, they changed the coding of the categorical variable for the number of previous breast biopsies. This recoding gave only a very minor improvement in discrimination, with a C-statistic of 0.59 (95% CI, 0.55–0.63). Novoty et al. [17] added four variables (family history of any cancer in first-degree relatives, history of breast inflammation, body mass index, and parity) to the reference model using data from Czech women. They reported that their modified model was more accurate than the reference Gail model but did not report a C-statistic.

In 2007, Gail et al. [16] modified the reference Gail model for African–American women by removing age at first live birth from the model (the CARE model). They also examined the performance of the modified model using data from the Women's Health Initiative (WHI) study and found a relatively low C-statistic of 0.56 (95% CI, 0.54–0.58). Chlebowski et al. [7] added five further risk factors (i.e., breast feeding, smoking, alcohol, physical activity, and hormone replacement therapy) to the reference Gail model; these had not been considered in the previous modified Gail models. Model performance was fair, with a C-statistic of 0.61 (95% CI, 0.59–0.63). The most recent modified Gail model was developed by Tice et al. [20] in 2008. They simplified the reference Gail model by keeping only three risk factors (age, family history of breast cancer, and history of breast biopsy) and adding two more (race and breast density). The modified model was created using a large dataset from seven mammography registries in the US. It discriminated better than the reference Gail model, with C-statistics of 0.66 (95% CI, 0.651–0.669) versus 0.613 (95% CI, 0.604–0.622), respectively. In addition, calibration was good, with an E/O ratio of 1.03 (95% CI, 0.99–1.06).

Rosner and Colditz model

Rosner and Colditz [13] developed their prediction model in 1996 (hereafter referred to as the reference Rosner and Colditz model) based on the assumption that breast cancer incidence is closely related to "breast tissue aging", so that reproductive events enter the model through their effect on this aging process rather than as separate risk factors, as in the Gail model. The model was constructed using data from a cohort of about 90,000 Caucasian nurses in the US. Poisson regression was applied, with variables related to breast tissue aging (i.e., current age, age at menarche, age at first live birth, age at subsequent births, and age at menopause) included in the model. The Rosner and Colditz model was later externally validated by Rockhill et al. [27]. Although the model showed good calibration, with an E/O ratio of 1.00 (95% CI, 0.93–1.07), its discrimination was relatively poor, with a C-statistic of 0.57 (95% CI, 0.55–0.59).

Because the reference Rosner and Colditz model performed poorly, it was subsequently modified by Colditz et al. [8] in 2000. Six risk factors (i.e., history of benign breast disease, type of menopause, use of hormone replacement therapy, weight, height, and alcohol intake) were added to the five variables already in the reference model. Model performance was not reported. Rockhill et al. [27] subsequently validated the modified model in the Nurses' Health Study between 1992 and 1997 and found similar calibration (E/O ratio = 1.01, 95% CI, 0.94–1.09) but improved discrimination (C-statistic = 0.64, 95% CI, 0.62–0.66 vs. 0.57, 95% CI, 0.55–0.59). More recently, Tamimi et al. [19] added the type of benign breast disease (atypical hyperplasia) to the modified Rosner and Colditz model [8] using the same nurse cohort data. This offered little improvement: the C-statistic increased from 0.628 for the modified model developed in 2000 to 0.635.

Colditz et al. [9] later used the modified Rosner and Colditz model to discriminate hormone receptor-defined breast cancers (ER+/PR+ and ER−/PR−) from non-cancer. They found that age at menarche, age at menopause, body mass index, and history of benign breast disease were significant risk factors for both ER+/PR+ and ER−/PR− breast cancer, whereas parity and age at each birth were significant protective factors for ER+/PR+ breast cancer but were not significantly associated with ER−/PR− breast cancer. Discriminatory accuracy was slightly higher for ER+/PR+ than for ER−/PR− breast cancer, with C-statistics of 0.64 (95% CI, 0.63–0.66) and 0.61 (95% CI, 0.58–0.64), respectively. In addition, Rosner et al. [18] later added serum estradiol to the modified Rosner and Colditz model; the model with serum estradiol had significantly higher discriminatory accuracy than the model without it.

Other models

Some prediction models were constructed by combining risk factors from the modified Gail and modified Rosner and Colditz models, with or without additional risk factors (see Table 1). Two models were constructed from case–control studies in Japanese [23] and Korean [12] women. The Japanese study used an approach similar to the modified Gail model of Novoty et al. [17], whereas the Korean study [12] added breast feeding and smoking to selected variables from the modified Gail and modified Rosner and Colditz models. However, neither model has been validated.

The other two models were constructed using data from Caucasian [6] and mixed-population [14] women. Risk factors that had not been considered in the modified Gail or modified Rosner and Colditz models were physical activity [6], prior breast procedure [14], and false-positive mammography [14]. Barlow et al. [14] also built separate models for pre- and post-menopausal women. Both the Boyle and Barlow models were internally validated (see Table 2); calibration was good in both the Boyle [6] and Barlow [14] studies, with E/O ratios ranging from 0.96 to 1.00. However, discrimination remained poor in Boyle's study [6] (C-statistic 0.58) and was fair in Barlow's study [14] (C-statistic 0.63).
Table 2

Model performance

Calibration values are E/O ratios (95% CI) unless marked; discrimination values are C-statistics (95% CI).

Author | Year | Calibration: internal validation | Calibration: external validation | Discrimination: derived model | Discrimination: internal validation | Discrimination: external validation
Gail [11] | 1989 | | 0.79–1.12a | | | 0.58–0.67a
Tice [21] | 2005 | | | 0.68 (0.66–0.70) | |
Tice [22] | 2005 | | | 0.64 | |
Chen [15] | 2006 | | | 0.64 | |
Decarli [10] | 2006 | | 0.96 (0.84–1.11) | | | 0.59 (0.54–0.63)
Tice [20] | 2008 | 1.03 (0.99–1.06) | | | 0.66 (0.65–0.66) |
Gail [16] | 2007 | | 0.93 (0.97–1.20) | | | 0.56 (0.54–0.58)
Novoty [17] | 2006 | | | | |
Rosner [13] | 1996 | 33.22 (P = 0.096)b | 1.00 (0.93–1.07) | | | 0.57 (0.55–0.59)
Colditz [8] | 2000 | | 1.01 (0.94–1.09) | | | 0.63 (0.61–0.65)
Colditz [9] | 2004 | | | 0.64 (0.63–0.66) for ER+/PR+; 0.61 (0.58–0.64) for ER−/PR− | |
Rosner [18] | 2008 | | | 0.64 (0.63–0.65) | |
Tamimi [19] | 2010 | | | 0.64 | |
Ueda [23] | 2003 | | | | |
Boyle [6] | 2004 | 1.03 (0.99–1.06) | | | 0.58 |
Lee [12] | 2004 | | | | |
Barlow [14] | 2006 | 1.00 (premenopausal); 1.01 (postmenopausal) | | | 0.63 (0.60–0.66) (premenopausal); 0.63 (0.62–0.64) (postmenopausal) |
Chlebowski [7] | 2007 | | | | 0.62 (0.60–0.64) for ER+; 0.53 (0.47–0.58) for ER− |

a Range

b Chi-square goodness-of-fit statistic with P value

Discussion

Our systematic review has demonstrated that the number of breast cancer prediction models has grown steadily over the past two decades. Although many investigators have put substantial effort into developing prediction models, the overall results are not promising. Most models perform relatively poorly, particularly in discrimination, with a median C-statistic of 0.63 (range 0.53 to 0.66) in the settings where the models were developed (i.e., internal validation) and 0.59 (range 0.56 to 0.63) in the settings where the models were adopted (i.e., external validation or generalizability).

Most studies reported model accuracy in terms of calibration and discrimination. The goodness-of-fit test or E/O ratio is commonly used to measure how closely the predicted and observed values agree [31, 32]. The C-statistic measures how well a model assigns a higher probability of the event to cases than to non-cases [33]. Poor model performance in internal and external validation may be explained by several factors. Associations between risk factors and breast cancer estimated from the development data may arise by chance. This problem is prominent when the sample size is relatively small and many risk factors are included in the model: with a small sample, unimportant variables are more likely to be selected and important ones omitted [34]. Conversely, with a very large sample, statistically significant variables that lack clinical importance are more likely to be included. Simulation studies suggest that there should be at least 10, and preferably 20 or more, events per risk factor to build a valid model [35, 36]. In our review, the models included 5 to 13 variables, so at least 50 to 130 breast cancer cases would be required, or 100 to 260 to be safer. None of the included studies had fewer breast cancer cases than required; an overoptimistic (overfitted) model is therefore unlikely.
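The events-per-variable arithmetic in this paragraph can be written out as a small check. The sketch below counts one parameter per risk factor, as the text does; a categorical risk factor with k levels would in practice use k − 1 parameters, so these figures are lower bounds. The function names are illustrative only.

```python
def required_events(n_risk_factors: int, events_per_variable: int = 10) -> int:
    """Minimum number of breast cancer cases for a given number of risk factors."""
    return n_risk_factors * events_per_variable


def max_risk_factors(n_events: int, events_per_variable: int = 10) -> int:
    """Largest number of risk factors supported by a given number of cases."""
    return n_events // events_per_variable


# Models in this review included 5 to 13 risk factors.
for epv in (10, 20):
    print(epv, required_events(5, epv), required_events(13, epv))
# 10 50 130
# 20 100 260

# Smallest case series in Table 1 (Ueda [23], 376 cases) at 20 events per variable.
print(max_risk_factors(376, events_per_variable=20))  # 18
```

Even the smallest case series in the review could support far more than the five risk factors that study used, which is consistent with the conclusion that overfitting is an unlikely explanation for the poor discrimination.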

Differences in the distribution of risk factors across populations may also limit the generalizability of a model to other populations. Consistent with this, our review found that C-statistics from external validation were lower than those from internal validation (median 0.59 versus 0.63).

Moreover, some clinically important risk factors for breast cancer may not have been included in the prediction models [37, 38]. Ideally, all important variables would be included, but in practice not all of them are known. In previous meta-analyses [39–41], the odds ratios of known risk factors (i.e., reproductive history, family history of breast cancer in first-degree relatives, hormone replacement therapy) were only modest (ranging from 1.14 to 2.10), suggesting that some important variables have not been considered. Our own meta-analysis (to be published) also suggests that some new risk factors identified during the last decade have not been widely incorporated into the models. These factors include breast feeding, diabetes mellitus, obesity, active smoking, alcohol drinking, and oral contraceptive use, with odds ratios ranging from 1.10 to 1.60.

Variables included in a model should be reliable and accurately measured; observer variability generally dilutes the predictive and discriminative abilities of a model [5]. A prospective study with well-planned data collection helps to minimize measurement bias. In addition, candidate variables should be easy to measure and either readily available in routine practice or known by patients. Variables that require special measurement techniques (e.g., BRCA gene testing, breast density, type of benign breast disease) may not be cost-effective and are therefore difficult to apply in screening [42]. However, in settings where genetic testing is routinely performed, models that combine genetic data with other risk factors (e.g., Tyrer's model [43]) perform better than models without genetic data and should be applied [44].

The nature of breast cancer itself may also contribute to poor performance. Different subtypes of breast cancer may have different risk factors, or the same factors with different effects. This is supported by Colditz et al. [9] and the Rosner and Colditz model studies [8], which found that ER+/PR+ and ER−/PR− subtypes had different risk profiles and model performance. Most previous models pooled all breast cancers together, which may dilute the effects of subtype-specific risk factors.

The AUC, or C-statistic, is itself insensitive: it barely increases even when very strong predictors are added to a model [45]. A reclassification table, intended to move beyond the C-statistic, has therefore been proposed [46]. However, a reclassification table alone does not quantify whether the reclassification is an improvement [33]. The net reclassification improvement (NRI) and integrated discrimination improvement (IDI) were later developed to quantify the degree of correct reclassification and the net improvement [33]. Jane et al. [47] applied these methods to re-evaluate the performance of the modified Gail model with breast density [21] compared with the reference Gail model [11]. They found that adding breast density improved risk stratification and classification accuracy, with an NRI of 8.7, although the C-statistic increased by only 0.01. The NRI and IDI should therefore be used more widely to evaluate the performance of prediction models.
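The reclassification measures discussed above can be sketched in a few lines. This minimal Python illustration implements the categorical NRI and the IDI as they are usually defined: the NRI adds the net proportion of cases moved to a higher risk category to the net proportion of non-cases moved to a lower one, and the IDI is the gain in mean predicted risk among cases minus the gain among non-cases. The risk cut-offs and toy data are hypothetical, and the code does not reproduce the analysis of Jane et al. [47].

```python
from bisect import bisect_right
from statistics import mean


def risk_category(p, cutoffs):
    """Index of the risk category for probability p (cutoffs sorted ascending)."""
    return bisect_right(cutoffs, p)


def nri(p_old, p_new, outcomes, cutoffs):
    """Categorical net reclassification improvement of p_new over p_old."""
    up = {0: 0, 1: 0}
    down = {0: 0, 1: 0}
    n = {0: 0, 1: 0}
    for po, pn, y in zip(p_old, p_new, outcomes):
        move = risk_category(pn, cutoffs) - risk_category(po, cutoffs)
        n[y] += 1
        if move > 0:
            up[y] += 1
        elif move < 0:
            down[y] += 1
    # Cases should move up; non-cases should move down.
    return (up[1] - down[1]) / n[1] + (down[0] - up[0]) / n[0]


def idi(p_old, p_new, outcomes):
    """Integrated discrimination improvement of p_new over p_old."""
    def mean_by(p, label):
        return mean(v for v, y in zip(p, outcomes) if y == label)

    gain_cases = mean_by(p_new, 1) - mean_by(p_old, 1)
    gain_non_cases = mean_by(p_new, 0) - mean_by(p_old, 0)
    return gain_cases - gain_non_cases


# Hypothetical predicted risks from an old and a new model.
p_old = [0.15, 0.05, 0.25, 0.12]
p_new = [0.22, 0.04, 0.18, 0.08]
outcomes = [1, 0, 1, 0]  # 1 = developed breast cancer
print(nri(p_old, p_new, outcomes, cutoffs=[0.10, 0.20]))  # 0.5
print(idi(p_old, p_new, outcomes))
```

Both quantities compare two models directly, which is why they can register an improvement that barely moves the C-statistic. The categorical NRI depends on the chosen cut-offs; the ones used here are arbitrary.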

Our review found that the reporting of methods and results in breast cancer risk prediction studies was inconsistent. To our knowledge, there is no specific reporting guideline for studies in this area. However, some items from the reporting recommendations for tumor marker prognostic studies (REMARK) [48] may be applicable, including the type of subjects studied, study design, validity of outcome and risk factor measurements, and use of statistical methods. Beyond general methodological issues, the study design should include derivation and validation phases [34] and, if possible, external validation. The statistical models used should be clearly described and their assumptions checked. Calibration and discrimination parameters should also be reported. Furthermore, if general physicians and health care providers who are unfamiliar with mathematical and statistical jargon are to apply the model in practice, it should be simple and easy to interpret.

Our review has some limitations. We included only studies published in English; studies in the gray literature (e.g., research reports and conference proceedings) or published in other languages were not considered. The review also focused on risk prediction models developed using regression techniques; models developed using other approaches, such as recursive partitioning (e.g., classification and regression trees (CART)) or neural networks, were not covered.

In conclusion, a reliable risk prediction model for breast cancer is still needed. Although developing such a model poses many challenges, doing so is important for better understanding how risk factors relate to breast cancer in particular populations, and it provides an opportunity to improve study design in clinical research. More recently identified risk factors should also be considered for inclusion in models. The NRI and IDI should be used to evaluate model performance in addition to the C-statistic. Lessons learned from prior efforts by many investigators can inform the future development of a reliable prediction model.

Acknowledgment

This study was supported by the Health Intervention and Technology Assessment Program, the Thai Health Promotion Foundation, the Health Systems Research Institute, the Bureau of Policy and Strategy of the Ministry of Public Health, and Thai Health-Global Link Initiative Project.

Conflict of interest

TA received an honorarium from the Health Intervention and Technology Assessment Program, Ministry of Public Health, Thailand, and travel grants for conferences from the Health Intervention and Technology Assessment Program and the Faculty of Graduate Studies, Mahidol University, Thailand. YT, CW, VK, and AT have no conflicts of interest to declare.

Copyright information

© Springer Science+Business Media, LLC. 2011