Background

Vitamin D is a precursor to 1,25-dihydroxyvitamin D (1,25(OH)2D), a steroid hormone that mediates numerous actions in the body, including pathways involved in cancer. Mechanisms underlying the possible anticancer effects of 1,25(OH)2D include induction of apoptosis, stimulation of differentiation, anti-inflammatory effects, anti-proliferative effects, and inhibition of angiogenesis, invasion, and metastasis [1]. Substantial evidence from epidemiologic studies links higher vitamin D status to a reduced risk of colon cancer [2, 3]. However, results from studies of vitamin D status and breast cancer risk have been conflicting, with several prospective studies finding no association with circulating levels of 25-hydroxyvitamin D (25(OH)D) [4–7]. There has been little research on the relationship between vitamin D status and risk of breast cancer in African Americans [8–11]. Vitamin D deficiency (<20 ng/mL) or insufficiency (20 to 29 ng/mL) is common in African Americans, in large part due to darker skin pigmentation on average, which reduces the penetration of sunlight and subsequent production of vitamin D3 in the skin [12, 13].

Each of the study designs used for assessment of the relationship between vitamin D status and risk of breast cancer has drawbacks. Case-control studies of plasma or serum 25(OH)D levels are prone to reverse causality because blood specimens are usually drawn around the time of breast cancer surgery, at which point 25(OH)D levels may have been affected by the disease process or patients may have changed their habits regarding time spent outdoors, diet, and supplementation.

Prospective cohort studies with pre-diagnostic blood specimens overcome the problem of reverse causality. However, they typically have only a single blood draw for each participant, representing exposure status at only one point in time, whereas evidence suggests that 25(OH)D levels vary over time depending upon season, age, weight, and other characteristics [14, 15]. Studies of dietary intake or use of vitamin D supplements may have measures at more than one time point, but these measures do not take into account sun exposure and skin pigmentation, which also influence blood levels of 25(OH)D [16]. These limitations may be overcome by use of predicted vitamin D status, with updating and averaging of the predicted values over time. This method was first demonstrated by Giovannucci et al. in one of the initial studies to show an association between vitamin D status and colon cancer [2].

While predicted 25(OH)D will be imprecise for any given individual, it can be effective for ranking study participants into disparate categories of exposure, such as lowest quartile vs. highest quartile of predicted value. The validity of this method will depend on the specimens used for establishing the prediction model, collection of data on the important determinants of 25(OH)D levels in the study population, and availability of repeated measures of those determinants over time.

We used prospectively collected data and blood specimens from 2856 study participants in the Black Women’s Health Study to develop a prediction model for 25(OH)D. We then assessed the relationship between predicted vitamin D status and risk of breast cancer in the entire cohort.

Methods

Study population

The Black Women’s Health Study (BWHS) began in 1995 when 59,000 African American women aged 21–69 years from across the USA completed mailed health questionnaires. Participants have completed follow-up questionnaires every two years. Follow-up was complete through 2013 for over 85 % of person-time since 1995. The Institutional Review Board of Boston University approved the protocol and reviewed the study annually.

At baseline, participants were asked about use of vitamin D supplements, use of multivitamins, weight, height, number of births, timing of each full-term birth, lactation, age at menarche, use of oral contraceptives, breast cancer in first-degree relatives, vigorous physical activity, alcohol consumption, cigarette smoking, menopausal status, age at menopause, use of supplemental female hormones, years of education, and many other factors. The biennial follow-up questionnaires ascertained occurrences of incident breast cancer and updated information on use of vitamin D supplements and multivitamins, and most other variables. A modified version of the NCI-Block food frequency questionnaire was used to ascertain usual diet in 1995 and 2001 [17].

Breast cancer cases

Each BWHS questionnaire asks about new diagnoses of breast cancer and the year of diagnosis. Participants are contacted for permission to obtain pathology reports and other medical records, and data are also obtained from state cancer registries in the 24 states in which 95 % of participants live. We were able to obtain medical records, cancer registry records, or both for approximately 95 % of women who reported incident breast cancer, of which 99 % were confirmed. Only cases of confirmed incident breast cancer were included in the present analysis. In the early years of the study, 1995–2000, testing for estrogen receptor (ER) and progesterone receptor (PR) was not universal, and thus we have missing data on ER and PR status for some participants. Among cases with known status, the proportions with ER+/PR+, ER+/PR-, ER-/PR+, and ER-/PR- tumors are 50 %, 14 %, 2 %, and 34 %, respectively, similar to the distributions observed for African American women in the SEER registry and other population-based data [18–20]. In previous comparisons of cases with data on receptor status to cases with unknown receptor status, the two groups were similar with regard to the prevalence of known breast cancer risk factors [21].

Blood collection and laboratory assays

Collection of blood specimens from BWHS participants began in 2012 and will continue through 2017, by which time all living study participants will have had an opportunity to provide a sample. Of participants approached to date, about 25 % have provided samples. Participants are mailed an informed consent, explanatory materials, a pre-printed laboratory requisition form, and instructions for locating a nearby blood collection site. Blood specimens are collected and tested by Quest Diagnostics (Madison, NJ, USA), an accredited national clinical laboratory [22, 23]. Liquid chromatography-tandem mass spectrometry (LC-MS/MS) was used for measurement of 25(OH)D [24, 25], which was carried out at three Quest central laboratories. National Institute of Standards and Technology Standard Reference Material for 25(OH)D in human serum (NIST SRM 972) was used for quality control. Written informed consent to use the blood samples for health-related research for the entirety of the study was obtained from participants who provided samples.

25(OH)D prediction model

Development of the prediction model was based on 25(OH)D values obtained from assays of plasma samples provided between 2013 and early 2015, the period immediately following completion of the 2013 BWHS questionnaire. During that period, 3539 participants provided blood samples with signed informed consent. We excluded 276 women with prevalent cancer at the time of the blood draw and 407 who had missing data on any of the candidate predictors of 25(OH)D, for an analytic sample of 2856. Variables that were known or suspected to be related to endogenous levels of 25(OH)D and for which data were available from the 2013 questionnaire were considered as possible predictors: use of vitamin D supplements (with or without calcium, at least twice a week), multivitamin use (at least twice a week, not specified whether they included vitamin D or how much), body mass index (BMI, kg/m2) (considered as both a categorical and a continuous variable), vigorous exercise, walking for exercise, current cigarette smoking, current alcohol consumption, use of female hormones, use of oral contraceptives, and menopausal status. Dietary intake of vitamin D derived from responses to a modified version of the NCI-Block food frequency questionnaire completed in 2001 was also considered as a predictor. A solar UV-B flux variable (high, medium, low levels of solar UV-B radiation) was created as a proxy for ambient sun exposure based on state of residence in 2013 and the reported average annual UV-B radiation in each state [26], and considered for the prediction model.

Repeated k-fold cross-validation was used to derive the best 25(OH)D prediction model [27]. The 2856 specimens were divided into five groups of equal size, with 4/5 serving as a training set and the remaining 1/5 serving as the test set. Using the ‘caret’ package in R [28], stepwise selection by Akaike’s information criterion was performed on each training set to identify the optimal predictors in a generalized linear model for continuous 25(OH)D. The best fit model parameters were then used to predict 25(OH)D in the test set, at which point Pearson’s correlation coefficient was computed for the linear association between observed and predicted 25(OH)D values. This procedure was performed another four times with each remaining 1/5 serving as the test set. We repeated this 100 times, with the full sample divided into a different set of five groups each time. Overall model prediction performance was calculated as the average R-squared value across the 100 repetitions and each of the five folds. The generalized linear models were adjusted for season of blood draw, laboratory, and age, for the purpose of controlling variability when estimating beta coefficients for predictors.
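The repeated five-fold procedure described above can be illustrated with a minimal Python sketch. This is not the R 'caret' pipeline used in the study: it assumes a single hypothetical predictor fitted by ordinary least squares, omits the stepwise selection by Akaike's information criterion and the adjustment terms, and runs on synthetic data. The function names, the simulated BMI values, and the simulated 25(OH)D values are all illustrative assumptions.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope


def repeated_kfold_r(x, y, k=5, repeats=100, seed=0):
    """Mean test-fold Pearson r over `repeats` random k-fold partitions."""
    rng = random.Random(seed)
    indices = list(range(len(x)))
    scores = []
    for _ in range(repeats):
        rng.shuffle(indices)  # a different partition on every repetition
        folds = [indices[i::k] for i in range(k)]
        for test_idx in folds:
            held_out = set(test_idx)
            train_idx = [i for i in indices if i not in held_out]
            intercept, slope = fit_line(
                [x[i] for i in train_idx], [y[i] for i in train_idx]
            )
            predicted = [intercept + slope * x[i] for i in test_idx]
            observed = [y[i] for i in test_idx]
            scores.append(pearson_r(predicted, observed))
    return sum(scores) / len(scores), len(scores)


# Synthetic data only: a hypothetical predictor and a noisy outcome.
_rng = random.Random(42)
bmi = [_rng.uniform(18, 45) for _ in range(200)]
vit_d = [45 - 0.5 * b + _rng.gauss(0, 8) for b in bmi]
mean_r, n_scores = repeated_kfold_r(bmi, vit_d, k=5, repeats=20)
```

Each observation is held out exactly once per repetition, so the averaged correlation reflects out-of-sample performance rather than fit to the training data.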

Baseline data from the 1995 BWHS questionnaire were then used in combination with beta-coefficients for each of the predictors to compute a predicted 25(OH)D level at baseline in 1995 for the entire BWHS cohort. Participants were excluded from the analyses if they had missing data on any of the predictors at baseline. Predicted 25(OH)D level was then updated every two years using the same beta coefficients and new values of the predictors. If data were missing for a given variable at some point during follow-up, the value from the previous cycle was carried forward. We then computed a cumulative average of predicted 25(OH)D. The cumulative average method [29, 30] has been used previously in vitamin D prediction models and for exposures such as dietary intake and physical activity [2, 31, 32]. For this approach, the predicted 25(OH)D score for a given time point is the average of scores from previous time points up to and including that time point. This method may better represent average long-term vitamin D status over the period of follow-up for each individual [33].
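The updating scheme can be made concrete with a short Python sketch: carry the last observed predictor value forward when a questionnaire cycle is missing, recompute the predicted score each cycle, then take the running mean. The intercept and beta coefficient below are hypothetical placeholders, not values from Table 1, and a single binary predictor stands in for the full model.

```python
def carry_forward(values):
    """Replace None with the most recent earlier non-missing value.

    The first (baseline) value is assumed to be present, mirroring the
    exclusion of participants with missing baseline predictors.
    """
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled


def cumulative_average(scores):
    """Running mean: entry t is the mean of scores[0..t]."""
    averages, total = [], 0.0
    for t, s in enumerate(scores, start=1):
        total += s
        averages.append(total / t)
    return averages


# Hypothetical coefficients for illustration only.
INTERCEPT = 25.0        # ng/mL
BETA_SUPPLEMENT = 10.0  # ng/mL difference for supplement users

# Supplement use (1 = yes, 0 = no) at four cycles; the second is missing.
supplement_use = [0, None, 1, 1]
filled_use = carry_forward(supplement_use)
scores = [INTERCEPT + BETA_SUPPLEMENT * u for u in filled_use]
cumulative = cumulative_average(scores)
```

In this toy example the per-cycle scores are 25, 25, 35, and 35 ng/mL, and the cumulative average rises gradually to 30 ng/mL, which shows how the method smooths a change in exposure rather than treating the latest value as the whole history.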

Statistical analysis of 25(OH)D in relation to breast cancer risk

Analyses of the association between predicted vitamin D status and breast cancer risk included all BWHS participants who had not been diagnosed with cancer prior to enrollment in the cohort and had complete information from the baseline questionnaire on each of the variables included in the 25(OH)D prediction model. Each participant contributed person-time from baseline in 1995 until diagnosis of breast cancer, death, loss to follow-up, or end of follow-up in 2013, whichever came first. Predicted 25(OH)D status was analyzed in quartiles, with highest quartile as the reference category. We used Cox proportional hazards regression, stratified by age (year) and questionnaire cycle (two years), to estimate the incidence rate ratio (IRR) and 95 % confidence interval (CI) for quartile of predicted 25(OH)D in relation to breast cancer incidence, with adjustment for number of births (0, 1, 2, ≥3), age at first birth (<20, 20–24, ≥25), age at menarche (<11, 12–13, ≥14 years), age at menopause (<45, 45–49, ≥50 years, or premenopausal), first-degree family history of breast cancer (yes, no), recent oral contraceptive use (within the previous 5 years), long-term oral contraceptive use (≥10 years), duration of use of estrogen with progesterone postmenopausal hormones (≥5 years), and BMI (<25, 25–29, 30–34, ≥35 kg/m2). Covariates that changed over time were treated as time-dependent. In addition to the overall analyses, we conducted analyses separately for ER+ and ER- breast cancer and within strata of age (<45 years and ≥45 years) and current use of vitamin D supplements (yes, no).
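The person-time rule (follow-up ends at the earliest of diagnosis, death, loss to follow-up, or end of follow-up) can be sketched in Python as follows. This shows only the bookkeeping and a crude incidence rate; it is not the stratified Cox model used in the analysis. Years are treated as whole calendar years, and the function names and toy records are illustrative assumptions.

```python
def follow_up(start, end_of_study, diagnosis=None, death=None, lost=None):
    """Return (years contributed, whether follow-up ended in a diagnosis)."""
    # Censoring time: the earliest of death, loss, or administrative end.
    censor = min([end_of_study] + [y for y in (death, lost) if y is not None])
    if diagnosis is not None and diagnosis <= censor:
        return diagnosis - start, True
    return censor - start, False


def crude_rate(records):
    """Cases per person-year across (years, event) records."""
    cases = sum(1 for _, event in records if event)
    person_years = sum(years for years, _ in records)
    return cases / person_years


# Toy records, not study data.
toy = [
    follow_up(1995, 2013),                  # followed to end of study
    follow_up(1995, 2013, diagnosis=2003),  # incident case
    follow_up(1995, 2013, death=2001),      # censored at death
]
toy_rate = crude_rate(toy)
```

A diagnosis reported after a participant was lost to follow-up does not count as an event in this sketch, because her person-time stopped accruing at the earlier date.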

Results

Predicted 25(OH)D

Figure 1 displays the frequency distribution of measured plasma 25(OH)D in the 2856 specimens that were included in model development. Quartiles of measured plasma 25(OH)D had the following cut points: 21 ng/mL, 31 ng/mL, and 40 ng/mL. Overall, 22 % of specimens had plasma 25(OH)D levels <20 ng/mL (a commonly used cut-point for deficiency) and 47 % had a value <30 ng/mL (cut-point for insufficiency) [34]. Among women who did not report taking a vitamin D supplement, 34 % had <20 ng/mL and 64 % had <30 ng/mL of plasma 25(OH)D.

Fig. 1 Measured plasma 25-hydroxyvitamin D (25(OH)D) (ng/mL) among 2856 participants in the Black Women’s Health Study

Table 1 shows the variables retained in the prediction model. Beta-coefficients for age, season of blood draw, and UV-B flux, which were included as adjustment factors but not used in the derivation of predicted vitamin D status, are also given in Table 1. The strongest predictor, as indicated by squared semi-partial correlation coefficients, was vitamin D supplementation, which independently accounted for 10 % of the total variation in the observed vitamin D levels after adjustment for the other retained predictors in the model. Multivitamin use, dietary intake, physical activity, use of female hormones, and use of oral contraceptives were associated with higher levels of predicted 25(OH)D, whereas cigarette smoking, alcohol consumption, and higher BMI were associated with lower levels. Overall, the model was estimated to explain 25.2 % of variation in 25(OH)D. The correlation coefficient for predicted vs. observed 25(OH)D averaged across all cross-validation runs was 0.49 (SD 0.026). On average, across the 100 repetitions of five-fold cross-validation, 40 % of the testing set participants were classified into the same quartile for observed and predicted (agreement diagonal) and 82 % were classified in either the same or an adjacent quartile.
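The quartile agreement statistics reported above (same quartile, and same or adjacent quartile) can be computed along the following lines. This Python sketch assumes rank-based quartiles within the test set and uses made-up observed and predicted values; the function names are illustrative and the output bears no relation to the study results.

```python
def quartile_labels(values):
    """Assign each value to a quartile (0 = lowest, 3 = highest) by rank."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    labels = [0] * n
    for rank, i in enumerate(order):
        labels[i] = rank * 4 // n
    return labels


def quartile_agreement(observed, predicted):
    """Proportions classified in the same, and same-or-adjacent, quartile."""
    obs_q = quartile_labels(observed)
    pred_q = quartile_labels(predicted)
    n = len(observed)
    same = sum(1 for a, b in zip(obs_q, pred_q) if a == b) / n
    same_or_adjacent = sum(1 for a, b in zip(obs_q, pred_q) if abs(a - b) <= 1) / n
    return same, same_or_adjacent


# Made-up values in ng/mL for illustration only.
observed = [12, 18, 25, 31, 36, 40, 44, 52]
predicted = [20, 15, 33, 27, 30, 45, 41, 38]
same, same_or_adjacent = quartile_agreement(observed, predicted)
```

Agreement within one quartile is the more relevant figure for an analysis that contrasts the lowest and highest quartiles, since it bounds how often a participant is misclassified by two or more categories.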

Table 1 Predictors of plasma 25-hydroxyvitamin D (25(OH)D) in 2856 participants from the Black Women’s Health Study

Association between predicted 25(OH)D and incidence of breast cancer

A total of 1454 cases of incident invasive breast cancer were identified during follow-up from 1995 through 2013, including 433 ER- cases, 802 ER+ cases, and 219 cases with unknown ER status.

Women in the lowest quartile of predicted 25(OH)D had an increased risk of breast cancer: the IRR for the lowest vs. highest quartile (reference) was 1.06 (95 % CI 0.92, 1.23) in analyses adjusted for age and period and 1.23 (95 % CI 1.04, 1.46) in multivariable analyses (Table 2). The strongest confounder was pre-diagnostic BMI; higher BMI was associated with lower levels of measured 25(OH)D and, in this dataset, with lower incidence of breast cancer.

Table 2 Cumulative predicted vitamin D status in relation to breast cancer incidence, overall and by estrogen receptor (ER) status of the breast tumor

There was evidence of a linear trend toward increasing risk with decreasing quartile of predicted 25(OH)D (P trend = 0.015). The association between predicted 25(OH)D and ER+ breast cancer was similar to that for breast cancer overall, with a multivariable-adjusted IRR for lowest vs. highest quartile of 1.26 (95 % CI 1.00, 1.58). For ER- breast cancer, the IRR was 1.12 (95 % CI 0.82, 1.52) for the same comparison.

In age-specific analyses (Table 3), IRRs for the lowest versus highest quartiles of predicted 25(OH)D were 1.28 (95 % CI 0.90, 1.82) in women younger than 45 years and 1.25 (95 % CI 1.03, 1.51) in older women. The observed association was present regardless of vitamin D supplementation: IRRs for the lowest vs. highest quartiles of predicted score were 1.23 (95 % CI 1.02, 1.49) among non-users and 1.25 (95 % CI 0.82, 1.90) among users. Results were inconsistent across strata of BMI. Among women who were obese at baseline, a strong significant association was observed, with an IRR of 1.65 (95 % CI 1.25, 2.19) for lowest relative to highest quartile, whereas among overweight and normal weight women, IRRs for the same comparisons were 1.04 (95 % CI 0.80, 1.35) and 1.09 (95 % CI 0.87, 1.45), respectively. However, there was not a statistically significant interaction (P interaction = 0.18).

Table 3 Cumulative predicted vitamin D status in relation to breast cancer incidence, within strata of age and vitamin D supplement use

When the analyses were repeated using a simple update of predicted 25(OH)D instead of a cumulative average, the results were essentially the same: the IRR for lowest quartile to highest quartile for all breast cancer was 1.23 (95 % CI 1.04, 1.44).

Discussion

To our knowledge, this is the first study to use a 25(OH)D prediction model to assess the relationship between vitamin D status and incidence of breast cancer. Women in the lowest quartile of predicted 25(OH)D over the course of follow-up were estimated to have a 23 % increased risk of breast cancer compared with those in the highest quartile. Based on the distribution of measured plasma 25(OH)D levels in the 2856 specimens that formed the basis of the prediction model, almost all women in the lowest quartile would have had a level considered deficient, and all women in the highest quartile would have had levels considered sufficient. In analyses stratified by BMI, a significant positive association was observed only among obese women.

Most previous studies of vitamin D status in relation to breast cancer risk used serum or plasma levels of 25(OH)D from a single point in time as the measure of vitamin D exposure. Higher levels were associated with lower risk in a number of case-control studies [35, 36]. The majority of prospective cohort studies have yielded null results [35, 36], but a significant inverse association was observed between 25(OH)D levels and overall breast cancer risk in the E3N, a large prospective cohort study from France [37]. Three of the largest prospective studies – the European Prospective Investigation into Cancer and Nutrition [7], the Nurses’ Health Study II [5], and a combined analysis of the NYU Women’s Health Study and the Swedish Mammography Cohort [6] – found no association between 25(OH)D levels and overall breast cancer risk. However, inverse associations were observed in some subgroups. The NYU/Swedish analysis observed an inverse association among women under age 45 years and among premenopausal women [6]. In the Multiethnic Cohort Study, higher levels of plasma vitamin D were associated with a significant reduction in breast cancer incidence in white women but not in other racial/ethnic groups [10]. In the Nurses’ Health Study, there was a statistically significant inverse trend across quintiles among women aged 60 years and older, but no trend among younger women [38]. In the present study, IRRs were similar across different ages.

In the Nurses’ Health Study II, there was a significant interaction with BMI, with a strong positive association observed among women in the highest category of BMI (≥25) but no association among women with BMI <25 [5]. Our BMI-stratified analyses produced similar findings, with a positive association observed among women with BMI ≥30, but not among women with lower BMIs; we were able to examine three strata of BMI because of the greater number of breast cancer cases and the higher prevalence of obesity. Other studies have reported no interaction with BMI [37–40]. The reasons for this interaction, if not due to chance, are unclear.

Although findings from basic research suggest that vitamin D may have a greater impact on ER+ breast cancer than ER- breast cancer through attenuation of estrogen signaling and synthesis [41–43], previous investigations that assessed ER+ and ER- breast cancer have not found a stronger association for ER+ breast cancer. In the Nurses’ Health Study [38] and two other studies [44, 45], there was evidence of a stronger association with ER- breast cancer, but most findings have been consistent across subtypes. In the present study, there was a significant association with incidence of ER+ breast cancer, and a weaker association with ER- cancer.

Circulating levels of 25(OH)D change over time, with a single measurement reflecting vitamin D from dietary and solar sources within a three-week half-life [46]. Since most observational studies have relied on plasma 25(OH)D measured from a single blood draw, often taken many years before the diagnosis of breast cancer, non-differential misclassification with bias towards the null is likely. The reproducibility of 25(OH)D measurements obtained at two time points has been examined in a few studies. In most instances, including among African American participants in the Southern Community Cohort Study [47], specimens taken 1–2 years apart were strongly correlated, but correlation diminished after longer intervals since first blood draw. Thus, 25(OH)D levels from a single blood draw appear to be valid measures of usual levels within two years, but no inference can be made about levels more removed in time. An important strength of using a vitamin D prediction model is the use of repeated measures to estimate vitamin D levels at multiple time points. In the present study, we were able to calculate predicted 25(OH)D at baseline in 1995 and every two years afterwards through the end of follow-up. We created a cumulative average exposure variable, which represented an average predicted level for each participant from baseline through the end of her follow-up. A cumulative average of predicted 25(OH)D is likely to be a better proxy for extremes of vitamin D status in the years prior to breast cancer diagnosis compared with a single blood draw.

Other studies have assessed intake of foods containing vitamin D or intake of vitamin D supplements in relation to breast cancer risk. Several studies found reduced risk of breast cancer among women who took supplements or were in the highest categories of dietary intake [48–50], whereas others found no association [51–57].

A notable strength of the present study is the k-fold cross-validation method used to develop and test the prediction model. Most previous vitamin D status prediction models have been based on model development in a single training set with testing in a single test set [2, 33, 58]. Our machine-learning approach repeated the training and testing steps 100 times, each time using a different subset of the sample for each step. The validity of our model was evidenced by the high correlation coefficient, 0.49. Bertrand et al. developed and tested 25(OH)D prediction models in similarly sized samples from the Nurses’ Health Study, the Nurses’ Health Study II, and the Health Professionals Follow-Up Study and reported correlation coefficients of 0.33, 0.42, and 0.30, respectively [33].

Prediction models for vitamin D status may perform better in populations with African ancestry than in populations with European ancestry: sun exposure, which is difficult to quantify, contributes less to 25(OH)D levels among African Americans because persons of African ancestry have, on average, darker skin pigmentation. Thus, variables such as use of vitamin D supplements, BMI, and cigarette smoking will tend to be more important predictors in an African American population. In a study of circulating 25(OH)D levels in African American and white participants from a nationwide study of radiologic technologists, UV radiation factors (e.g., time spent outdoors, season) were associated with 25(OH)D levels in white Americans, but not in African Americans [14]. In the Adventist Health Study II, age, BMI, season, supplement use, total vitamin D intake, skin type, and sun exposure factors were significant predictors in white Americans, whereas only season, supplement use, and total vitamin D intake were predictors in African Americans [59]. In the Health ABC Study of elderly African Americans, significant predictors of 25(OH)D were similar to those in the present study: supplement use, dietary intake, BMI, walking, and season of blood draw, with supplement use being the strongest predictor [60].

Several other surveys have reported higher proportions of vitamin D deficiency among African American women than were observed in our study sample [61, 62]. The BWHS population is not representative of all African American women: in particular BMI is lower in the BWHS and use of female hormones is more common, and both would result in relatively higher levels of 25(OH)D. Nevertheless, the wide range of values in our sample permitted development of a prediction model with validation parameters on a par with those from prediction models in other populations. The internal validity of the work presented here would not be compromised by use of a sample that does not represent all women.

A limitation of the present study is the lack of granular data on vitamin D supplements. We did not collect data on the dose of the supplement. The prediction model might have been stronger if separate variables for low dose (e.g., 400 IU as part of a multivitamin) and high dose (e.g., 1000–2000 IU individual supplement) were included in the model. Nevertheless, supplement use accounted for 10 % of the variability in predicted vitamin D status in our study. Another potential limitation is lack of data on degree of skin pigmentation. However, two recent studies of skin pigmentation and 25(OH)D levels among several racial/ethnic groups found that measured constitutive skin color did not improve prediction of 25(OH)D concentrations when included in a model that had terms for race/ethnicity [63, 64].

Genetic variation may explain some of the interpersonal variation in 25(OH)D concentrations. We did not include genetic variants in the prediction model because there is a lack of consensus on which variants are associated with 25(OH)D in African Americans [8, 60, 65], and a previous study found that adding genetic variants to a 25(OH)D prediction model made little difference in explaining overall variation in serum 25(OH)D, especially in African Americans [65].

Conclusions

In conclusion, data from this large prospective study suggest that African American women who have low predicted 25(OH)D have a greater risk of breast cancer relative to those who have sufficiently high levels. Vitamin D deficiency is common among African Americans. Indeed, of the 2856 BWHS participants who provided a blood sample in 2013–2015, 47 % had levels below 30 ng/mL (insufficient or deficient) and 22 % had levels below 20 ng/mL (deficient). If the present findings are confirmed in other prospective studies, preventing vitamin D deficiency may be an effective means of reducing breast cancer incidence in African American women.

Abbreviations

1,25(OH)2D, 1,25-dihydroxyvitamin D; 25(OH)D, 25-hydroxyvitamin D; BMI, body mass index; BWHS, Black Women’s Health Study; CI, confidence interval; ER, estrogen receptor; IRR, incidence rate ratio; LC-MS/MS, liquid chromatography-tandem mass spectrometry; SD, standard deviation