Introduction

Vitamin D regulates parathyroid hormone levels and has numerous physiological effects, including regulation of bone formation and mineralization, calcium homeostasis, immunity, and insulin secretion [1]. Vitamin D insufficiency correlates with various diseases, including multiple sclerosis, cardiovascular disease, infectious disease, and cancer [2–6]. In cancer, the vitamin D hypothesis was first proposed based on the observation that cancer mortality varies with latitude, being higher at higher latitudes [7]. This hypothesis relies on the fact that vitamin D is produced in the skin during exposure to ultraviolet light; persons living at higher latitudes are, on average, exposed to less ultraviolet light than persons living closer to the equator and therefore produce lower levels of vitamin D. Animal studies [8–10] and case–control studies in humans [11–15] have provided strong evidence that vitamin D protects against colorectal cancer (CRC). Concentrations of 25-hydroxyvitamin D3 [25(OH)D3 or calcidiol] have been associated with both CRC incidence (op. cit.) and adenoma recurrence [16–18]. Lower dietary intakes of vitamin D have been associated with a higher risk of developing colonic neoplasia [12, 19–21]. Some have suggested that vitamin D insufficiency, which increases with advancing age, is a factor contributing to sporadic CRC, which is also associated with aging [22]. Other factors that are associated with variance in serum 25(OH)D3 levels, and coincidentally with CRC risk, include calcium intake, obesity, skin color, and genetic background [23]. Finally, vitamin D has anti-proliferative, anti-invasive, pro-apoptotic, and pro-differentiation activities [24, 25], making it a potentially potent cancer-preventive agent.

CRC is the second leading cause of cancer-related deaths in both sexes in the United States [26, 27]. Incidence and mortality rates for CRC in the US have declined since the late 1980s, but these trends have been less pronounced in African Americans (AAs), resulting in 20 % higher incidence and 44 % higher mortality rates in AAs compared to European Americans (EAs) [28]. AAs have lower serum vitamin D levels than other Americans [15, 29]. The lower vitamin D levels could be explained in part by darker skin pigmentation, which attenuates cutaneous production of vitamin D. Consequently, differences in serum vitamin D levels could contribute to the CRC health disparities between AAs and non-Hispanic whites. Inverse associations between serum vitamin D levels and CRC in AAs have been reported in the Health Professionals Follow-Up Study and the Multi-Ethnic Cohort [30, 31]; however, the role of vitamin D in protection against CRC in AAs remains understudied.

In the present study, we hypothesized that genetic polymorphisms could affect serum or tissue vitamin D levels and thereby be associated with CRC risk, explaining in part the CRC health disparity in the AA population. We selected genes for analysis that synthesize, metabolize, and transport vitamin D and act in vitamin D-related transcriptional regulation (CYP2R1, CYP3A4, CYP24A1, CYP27A1, CYP27B1, GC, DHCR7, and VDR, which we will hereafter refer to as vitamin D-related genes). These genes are highly polymorphic in different human populations, and as a group, they have been extensively analyzed in numerous cancer association studies. Although there have been many genetic association studies of CRC [31–35], few studies have focused on the role of vitamin D-related genes in AA CRC [36–38]. Consequently, in the present study, we compared the frequencies of potentially functional single-nucleotide polymorphisms (SNPs) in vitamin D-related genes in AA CRC cases and AA controls.

Materials and methods

Human subjects

CRC cases and population-matched, healthy controls were ascertained from the North Carolina Colorectal Cancer Study (NCCCS) and the Chicago Colorectal Cancer Consortium (CCCC). In total, we included DNA samples from 961 AA CRC cases (371 NCCCS, 590 CCCC) and 838 AA controls (380 NCCCS, 458 CCCC). Samples from the NCCCS were obtained through a large-scale, population-based case–control study of colon and rectal cancer conducted in a 33-county area in central and eastern North Carolina. Histologically confirmed cases were drawn at random from all CRC cases reported to the North Carolina Central Cancer Registry through the rapid ascertainment system. There were two phases of the NCCCS: one from 1996–2000 and one from 2001–2006, in which rectal and rectal–sigmoid cancers were over-sampled. Controls were selected from North Carolina Division of Motor Vehicle lists if under the age of 65, or from a list of Medicare-eligible beneficiaries obtained from the Health Care Financing Administration if over the age of 65. Controls were matched to cases using randomized recruitment strategies, with sampling probabilities based on 5-year age group, sex, and race. The details of this study have been published previously [37, 39].

The CCCC was established to ascertain a significant proportion of the CRC cases occurring in Cook County, Illinois. Prospective enrollment of CRC patients, under IRB approval, began in 2011 at six major hospitals in the county (Advocate Christ Medical Center, Jesse Brown Veterans Administration Medical Center, Rush University Medical Center, John H. Stroger Hospital of Cook County, the University of Chicago Medicine, and the University of Illinois Hospital and Health Sciences System). CRC cases are identified in endoscopy, oncology, or surgery clinics and enrolled in the study. Detailed clinical and epidemiological data are collected, and the CCCC preserves and stores specimens of tumor tissue and noninvolved normal colonic mucosa as well as serum, plasma, red blood cells, and DNA from blood.

In the present study, most of the DNA samples from the CCCC CRC cases (590 individuals) were prepared from noncancerous tissues obtained from colon or rectal surgical specimens (formalin-fixed, paraffin-embedded tissues) archived over the period 1985–2012 and ascertained through the records of the Departments of Pathology at the Jesse Brown Veterans Administration Medical Center, John H. Stroger Hospital of Cook County, the University of Chicago Medicine, and the University of Illinois Hospital and Health Sciences System. Individuals known to have hereditary syndromes (familial adenomatous polyposis and Lynch syndrome) or inflammatory bowel disease were excluded. Available baseline characteristics, including age, gender, race, colorectal tumor location, histological grade, depth of invasion, nodal involvement, and metastases, were recorded. The remaining CCCC case DNA samples were prepared from blood specimens obtained through the prospective ascertainment of the CCCC. CCCC control subjects were individuals with a tumor-free colon and rectum as determined by colonoscopy, or cancer-free individuals as determined by review of their available medical records, ascertained through the centralized biobanks of the University of Chicago Medical Center, Department of Medicine, and of the University of Illinois Hospital and Health Sciences System. The age at time of sample collection was used as the age for each control.

Germline DNAs were prepared using Gentra Puregene kits (Qiagen) according to the manufacturer’s instructions. For formalin-fixed, paraffin-embedded tissues, the paraffin was first removed with octane–methanol and the proteinase K extraction step was extended to 3 days, adding fresh enzyme on each day, followed by heating the sample at 95 °C for 15 min prior to protein precipitation.

Genotyping

The vitamin D-related genes were selected for analysis based on their functions in the synthesis, metabolism, and transport of vitamin D and in the regulation of vitamin D transcriptional responses. SNPs in the vitamin D-related genes were identified by direct sequencing of PCR products covering each exon of each gene, the 5′ and 3′ untranslated regions, and 2,000 base pairs upstream of the transcription start site of each gene. The PCR products were prepared from DNA of 48 healthy AA persons ascertained at Howard University in Washington, DC, from 2000 to 2005 [38]. SNPs were selected based on the following criteria: the SNP (1) had a minor allele frequency greater than 5 % and (2) was potentially functional, that is, the base-pair change was in a predicted regulatory sequence in a promoter, 5′ untranslated, or 3′ untranslated region, could affect splicing, or caused an amino acid substitution, or (3) was associated with serum vitamin D levels in genome-wide association studies [40, 41]. The list of 39 potentially functional SNPs is shown in Table 1. We noted that many of the variants identified by the sequencing of AA men and selected by these criteria had high Fst values comparing African- and European-ancestry populations (Table 1). Fst is a measure of allele frequency differences between populations.
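The text does not state which Fst estimator underlies Table 1. As a minimal sketch, assuming Wright's heterozygosity-based definition with the two populations weighted equally, Fst for one biallelic SNP can be computed from the two allele frequencies alone. Published values are often computed with other estimators (for example Weir and Cockerham's or Hudson's) and will differ somewhat; the function name `wright_fst` and the example frequencies are hypothetical.

```python
def wright_fst(p1: float, p2: float) -> float:
    """Wright's Fst for one biallelic SNP in two equally weighted populations.

    p1, p2: frequency of the same allele in population 1 and population 2
    (treated here as known without sampling error).
    """
    p_bar = (p1 + p2) / 2.0
    # Expected heterozygosity if the two populations were pooled into one
    h_total = 2.0 * p_bar * (1.0 - p_bar)
    # Mean expected heterozygosity within each population
    h_within = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0
    if h_total == 0.0:
        return 0.0  # monomorphic in both populations: nothing to differentiate
    return (h_total - h_within) / h_total


# Hypothetical frequencies for illustration only, not values from Table 1
print(round(wright_fst(0.9, 0.1), 2))    # → 0.64, a strongly differentiated allele
# An allele at 7.4 % in one population and absent from the other
print(round(wright_fst(0.074, 0.0), 3))  # → 0.038
```

The second example shows that an allele can be entirely absent from one population yet yield a modest Fst when it is also uncommon in the other, because Fst scales the between-population difference by the pooled heterozygosity.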

Table 1 Characteristics of the single-nucleotide polymorphisms used in the study

We genotyped the 39 potentially functional SNPs and 100 ancestry informative markers (AIMs) [42] using the Sequenom MassARRAY platform. For quality control, we excluded SNPs with Hardy–Weinberg equilibrium p values < 0.001 in controls, which is the significance threshold after adjustment for multiple testing. SNPs and individuals with missingness >10 % were also excluded. As a result, 35 SNPs were included for analysis of the study group. Genotyping rates were >98.6 % for all samples. The concordance rate for 32 duplicate samples was 99.9 %.
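The quality-control rules above (HWE p < 0.001 in controls; more than 10 % missing calls, applied to SNPs and to individuals) can be sketched as follows. This is a minimal illustration with hypothetical function names; it uses a 1-df chi-square HWE test as a simple stand-in, whereas exact tests, such as the one PLINK uses by default, are preferable for rare variants. The text does not say which HWE test was applied.

```python
import math
from typing import Optional, Sequence, Tuple

# A genotype call: 0, 1 or 2 copies of the minor allele, or None if the call failed
Call = Optional[int]


def hwe_chisq_p(n_hom_major: int, n_het: int, n_hom_minor: int) -> float:
    """1-df chi-square goodness-of-fit p value for Hardy-Weinberg equilibrium."""
    n = n_hom_major + n_het + n_hom_minor
    p = (2 * n_hom_major + n_het) / (2.0 * n)  # major allele frequency
    q = 1.0 - p
    expected = (n * p * p, 2.0 * n * p * q, n * q * q)
    observed = (n_hom_major, n_het, n_hom_minor)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of a 1-df chi-square: P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2))
    return math.erfc(math.sqrt(chi2 / 2.0))


def missing_rate(calls: Sequence[Call]) -> float:
    return sum(c is None for c in calls) / len(calls)


def genotype_counts(calls: Sequence[Call]) -> Tuple[int, int, int]:
    called = [c for c in calls if c is not None]
    return called.count(0), called.count(1), called.count(2)


def snp_passes_qc(all_calls, control_calls, hwe_alpha=0.001, max_missing=0.10) -> bool:
    """Keep a SNP unless missingness > max_missing or HWE p < hwe_alpha in controls."""
    if missing_rate(all_calls) > max_missing:
        return False
    return hwe_chisq_p(*genotype_counts(control_calls)) >= hwe_alpha


def sample_passes_qc(sample_calls: Sequence[Call], max_missing=0.10) -> bool:
    """Keep an individual unless more than max_missing of its genotypes failed."""
    return missing_rate(sample_calls) <= max_missing


# Toy genotype vectors (hypothetical)
in_hwe = [0] * 25 + [1] * 50 + [2] * 25   # exactly the HWE expectation for p = 0.5
no_hets = [0] * 50 + [2] * 50             # no heterozygotes: gross HWE violation
print(snp_passes_qc(in_hwe, in_hwe))      # True
print(snp_passes_qc(no_hets, no_hets))    # False
```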

Statistical analysis

Global individual ancestry was determined for each individual in the study group using 100 AIMs for West African ancestry (WAA). Individual ancestry estimates were obtained from the genotype data using the Markov Chain Monte Carlo (MCMC) method implemented in the program STRUCTURE 2.1 [43]. STRUCTURE 2.1 assumes an admixture model using prior population information and independent allele frequencies. The MCMC model was run using K = 3 populations with genotype data from 58 Europeans, 67 Native Americans, and 62 West Africans. We used a burn-in length of 30,000 iterations followed by 70,000 replications. To test for heterogeneity between the two study groups (NCCCS and CCCC), we compared WAA using a principal component analysis (PCA) of the 100 AIMs, gender using a two-sided chi-square test, and age using a two-sided t test.
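A PCA of AIM genotypes, as used here to compare the two study groups, can be sketched with a singular value decomposition. This is a minimal illustration on simulated data only: the per-marker scaling by sqrt(2p(1 − p)) is a common convention for genotype PCA but is an assumption, since the scaling used in the study is not stated, and `aim_pca`, the allele frequencies, and the Beta(8, 2) ancestry distribution are all hypothetical.

```python
import numpy as np


def aim_pca(genotypes: np.ndarray, n_components: int = 2):
    """PCA of an individuals x markers matrix of 0/1/2 allele counts.

    Returns per-individual PC scores and the fraction of total variance per PC.
    """
    g = genotypes.astype(float)
    mean = g.mean(axis=0)
    p = mean / 2.0
    scale = np.sqrt(2.0 * p * (1.0 - p))
    scale[scale == 0] = 1.0  # monomorphic markers: the centred column is all zeros anyway
    z = (g - mean) / scale
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = s ** 2 / np.sum(s ** 2)
    return scores, explained[:n_components]


# Simulated data only: two cohorts drawn from the same admixture distribution
rng = np.random.default_rng(0)
n_aims = 100
freq_afr = rng.uniform(0.05, 0.95, n_aims)  # hypothetical West African allele frequencies
freq_eur = rng.uniform(0.05, 0.95, n_aims)  # hypothetical European allele frequencies


def simulate_cohort(n: int) -> np.ndarray:
    waa = rng.beta(8, 2, size=n)  # hypothetical individual WAA proportions (mean 0.8)
    freqs = waa[:, None] * freq_afr + (1.0 - waa[:, None]) * freq_eur
    return rng.binomial(2, freqs)


cohort_a, cohort_b = simulate_cohort(300), simulate_cohort(300)
scores, explained = aim_pca(np.vstack([cohort_a, cohort_b]))
print("variance explained by PC1, PC2:", explained.round(3))
print("mean PC1, cohort A vs B:", round(float(scores[:300, 0].mean()), 2),
      round(float(scores[300:, 0].mean()), 2))
```

If the two groups share the same ancestry distribution, their PC1 score distributions should overlap, which is what a PCA plot such as Supplementary Figure 1 is used to check visually.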

We tested the 35 potentially functional SNPs for association with CRC in the combined NCCCS and CCCC study groups and in each study group individually. We calculated odds ratios (ORs) and 95 % confidence intervals (CIs) using logistic regression assuming a log-additive genetic model. For stratified association testing by anatomic site, we defined right-sided CRC (R-CRC) as adenocarcinoma in the colon proximal to the splenic flexure and left-sided CRC (L-CRC) as adenocarcinoma in the colon and rectum distal to and including the splenic flexure. We also performed an analysis of SNP associations with rectal cancer. To adjust for multiple testing, we calculated gene-wide significance levels by permuting case–control status and repeating the analysis 1,000 times to determine the p value from the empirical distribution; p values less than 0.05 were taken as significant. Logistic regression analyses were carried out using Golden Helix (Bozeman, MT) and PLINK (http://pngu.mgh.harvard.edu/~purcell/plink/).
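The log-additive test and the gene-wide permutation adjustment can be sketched in NumPy as a stand-in for the Golden Helix and PLINK analyses actually used. Two assumptions are labelled here: the logistic model is fitted by Newton-Raphson with Wald z statistics, and the gene-wide adjustment uses the max-|z| scheme (each SNP's observed |z| is compared with the largest |z| across the gene's SNPs in each permutation), since the text does not say how the empirical distribution was summarized across SNPs. All function names and the simulated data are hypothetical.

```python
import numpy as np


def logistic_fit(y, X, max_iter=50, tol=1e-8):
    """Logistic regression by Newton-Raphson (IRLS). X must contain an intercept column.
    Returns the coefficient vector and its Wald standard errors."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        mu = 1.0 / (1.0 + np.exp(-(X @ beta)))
        info = X.T @ (X * (mu * (1.0 - mu))[:, None])  # Fisher information matrix
        step = np.linalg.solve(info, X.T @ (y - mu))  # Newton step from the score vector
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    mu = 1.0 / (1.0 + np.exp(-(X @ beta)))
    info = X.T @ (X * (mu * (1.0 - mu))[:, None])
    return beta, np.sqrt(np.diag(np.linalg.inv(info)))


def snp_wald(y, dosage, covariates):
    """Log-additive model: dosage = 0, 1 or 2 copies of the tested allele.
    Returns (log OR per allele, SE), adjusted for the covariate columns."""
    X = np.column_stack([np.ones(len(y)), dosage, covariates])
    beta, se = logistic_fit(y, X)
    return beta[1], se[1]


def gene_wide_p(y, gene_dosages, covariates, n_perm=1000, seed=0):
    """Permutation p values adjusted across the SNPs of one gene (max-|z| scheme, assumed)."""
    rng = np.random.default_rng(seed)
    observed = np.array([abs(b / s) for b, s in (snp_wald(y, d, covariates) for d in gene_dosages)])
    exceed = np.zeros(len(observed))
    for _ in range(n_perm):
        y_perm = rng.permutation(y)  # shuffle case-control labels; genotypes and covariates stay fixed
        max_z = max(abs(b / s) for b, s in (snp_wald(y_perm, d, covariates) for d in gene_dosages))
        exceed += max_z >= observed
    return (exceed + 1.0) / (n_perm + 1.0)


# Simulated data only: a hypothetical gene with three SNPs, the first one protective
rng = np.random.default_rng(1)
n = 400
covariates = np.column_stack([
    rng.normal(0.0, 1.0, n),  # standardised age
    rng.integers(0, 2, n),    # sex
    rng.beta(8, 2, n),        # West African ancestry proportion
])
gene_dosages = [rng.binomial(2, 0.3, n) for _ in range(3)]
linear_predictor = -0.2 - 0.3 * gene_dosages[0]
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-linear_predictor))).astype(float)

log_or, se = snp_wald(y, gene_dosages[0], covariates)
adjusted = gene_wide_p(y, gene_dosages, covariates, n_perm=200)  # the study used 1,000
print("OR per allele:", round(float(np.exp(log_or)), 2), "gene-wide p:", adjusted.round(3))
```

Taking the maximum over the gene's SNPs in each permutation is what makes the adjustment gene-wide: a SNP is only called significant if its statistic beats the best statistic that chance alone produces anywhere in the gene.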

Results

Analysis of all AA CRC cases

Table 2 shows the distribution of CRC cases and controls by sex, age, and percent WAA in the NCCCS and CCCC study groups and in the two study groups combined. The two study groups were comparable with respect to WAA by PCA plot (Supplementary Figure 1) and gender (p = 0.463). Age was significantly different between the two study groups (p < 0.001); however, the age difference was not large (mean ages were 63.6 in the NCCCS vs. 61.5 in the CCCC). There was significant heterogeneity within the study groups with respect to age, gender, and ancestry; consequently, we adjusted for these parameters in the logistic regression models.

Table 2 Clinical characteristics of the two study groups

Association ORs and p values were calculated from comparisons of CRC cases and controls in the combined NCCCS and CCCC study groups, and selected SNPs (p < 0.1) are shown in Table 3. ORs and p values for all SNPs in the combined and individual study groups are shown in Supplementary Table 1. After adjustment for age, sex, and WAA, the A allele of SNP rs12794714, located in the 25-hydroxylase gene CYP2R1, was associated with a decreased risk of CRC in the combined study groups (p = 0.019; OR = 0.79, 95 % CI 0.65–0.96). The associations of the minor alleles of two other SNPs, rs17467825 and rs7041, trended toward significance, with p values between 0.05 and 0.1 (Table 3). rs12794714 was not significantly associated with CRC in either study group alone (Supplementary Table 1); however, this SNP remained significant after adjustment for multiple testing on a gene-wide basis (Adj p = 0.048).
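Each OR, CI, and p value above is a transformation of one logistic-regression coefficient (the log OR per allele) and its standard error. As a consistency check, the sketch below, which assumes a Wald interval that is symmetric on the log scale (the usual construction, though the text does not state it), recovers the coefficient and SE from the reported CI for rs12794714 and recomputes the OR and p value. The function names are hypothetical.

```python
import math

Z_975 = 1.959964  # 97.5th percentile of the standard normal distribution


def wald_summary(log_or: float, se: float):
    """OR, 95 % CI and two-sided Wald p value from a log-odds estimate and its SE."""
    z = log_or / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided standard normal tail probability
    return math.exp(log_or), math.exp(log_or - Z_975 * se), math.exp(log_or + Z_975 * se), p


def log_or_from_ci(lower: float, upper: float):
    """Recover log(OR) and its SE from a 95 % CI assumed symmetric on the log scale."""
    lo, hi = math.log(lower), math.log(upper)
    return (lo + hi) / 2.0, (hi - lo) / (2.0 * Z_975)


# rs12794714, combined study groups: OR = 0.79, 95 % CI 0.65-0.96 (as reported above)
log_or, se = log_or_from_ci(0.65, 0.96)
odds_ratio, lower, upper, p = wald_summary(log_or, se)
print(f"OR = {odds_ratio:.2f}, 95% CI {lower:.2f}-{upper:.2f}, p = {p:.3f}")
```

The recomputed p value is about 0.018, close to the reported 0.019; a gap of this size is consistent with the CI bounds having been rounded to two decimals.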

Table 3 Selected genetic polymorphisms that associate with colorectal cancer in the combined series

Analysis of AA CRC cases stratified by tumor location

Because cancers of the right and left sides of the colon differ at the molecular level [44], we analyzed R-CRC and L-CRC separately. We compared the 292 R-CRC cases from the combined NCCCS and CCCC study groups with all 838 controls; similarly, we compared the 443 combined L-CRC cases with all 838 controls. Association ORs and p values were calculated by logistic regression, and selected SNPs (p < 0.1) are shown in Table 4. ORs and p values for all SNPs in the combined and individual study groups are shown in Supplementary Table 2. In the analysis of genotype data from R-CRC cases, we obtained results comparable to those of the analysis of all CRC cases. The A allele of the CYP2R1 SNP rs12794714 and the G allele of the GC SNP rs7041 were weakly associated with a decreased risk of R-CRC (p = 0.056 and p = 0.059, respectively) (Table 4). rs12794714 was significantly associated with R-CRC in the NCCCS group but not in the CCCC study group. None of the p values adjusted for multiple testing were less than 0.1.
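The stratified comparisons pair the cases from one side of the colon with the full control set, using the splenic-flexure boundary defined in the Methods. A minimal sketch of that grouping follows; the site vocabulary, the `Subject` record, and the function names are hypothetical, and in this sketch cases with a missing or unrecognised site simply drop out of both strata.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical site vocabulary. Right = proximal to the splenic flexure;
# left = the splenic flexure and everything distal to it, including the rectum.
RIGHT_SITES = {"cecum", "ascending colon", "hepatic flexure", "transverse colon"}
LEFT_SITES = {"splenic flexure", "descending colon", "sigmoid colon",
              "rectosigmoid junction", "rectum"}


@dataclass
class Subject:
    subject_id: str
    is_case: bool
    site: Optional[str] = None  # tumour site for cases; None for controls or unknown


def side_of(site: Optional[str]) -> Optional[str]:
    """Map a tumour site to 'R-CRC' or 'L-CRC'; None if missing or unrecognised."""
    if site is None:
        return None
    s = site.strip().lower()
    if s in RIGHT_SITES:
        return "R-CRC"
    if s in LEFT_SITES:
        return "L-CRC"
    return None


def stratum(subjects: List[Subject], side: str) -> List[Subject]:
    """Cases on one side of the colon plus all controls."""
    return [s for s in subjects if not s.is_case or side_of(s.site) == side]


subjects = [
    Subject("c1", True, "Cecum"),
    Subject("c2", True, "Sigmoid colon"),
    Subject("c3", True, "Rectum"),
    Subject("c4", True, None),  # site unknown: excluded from both strata
    Subject("k1", False),
    Subject("k2", False),
]
right = stratum(subjects, "R-CRC")
left = stratum(subjects, "L-CRC")
print([s.subject_id for s in right], [s.subject_id for s in left])
```

Reusing every control in both strata keeps each comparison as powerful as possible, at the cost of making the R-CRC and L-CRC tests statistically dependent.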

Table 4 Selected genetic polymorphisms that associate with left-sided colorectal cancer or right-sided colorectal cancer

In the analysis of genotype data from L-CRC cases, the T allele of rs16847024 in GC was associated with an increased risk of L-CRC (p = 0.015; OR = 1.49, 95 % CI 1.08–2.06) (Table 4). The G allele of rs6022990 in CYP24A1 was also associated with an increased risk of L-CRC (p = 0.018; OR = 1.41, 95 % CI 1.06–1.86) (Table 4). Two additional SNPs, rs17467825 in GC and rs73913757 in CYP24A1, trended toward significance in L-CRC (Table 4). rs16847024 and rs6022990 were both significantly associated with L-CRC in the NCCCS group but not in the CCCC, and their p values adjusted for multiple testing trended toward significance. Neither was significantly associated with R-CRC or with all CRC.

We also analyzed the genotype data comparing rectal cancer cases with controls. The GC SNP rs16847024 was more strongly associated with rectal cancer than with L-CRC (p = 0.002; OR = 2.29, 95 % CI 1.39–3.75), but none of the other SNPs associated with L-CRC or R-CRC were associated with rectal cancer (Supplementary Table 3). There were two additional SNPs with p values less than 0.05, but we note that this genotype analysis was based on only 101 rectal cancer cases, and these results should be interpreted cautiously.

Discussion

In the present study, we identified several nominally significant associations between SNPs in vitamin D-related genes and AA CRC. When testing all CRC cases, we identified an association between SNP rs12794714 in CYP2R1 and AA CRC. The association between rs12794714 and CRC remained significant after gene-wide adjustment for multiple testing. In the subgroup analyses by location in the colon, rs12794714 trended toward significance in R-CRC. Two different SNPs, rs16847024 and rs6022990, had nominally significant p values in L-CRC but not in R-CRC or in all CRC cases. These associations trended toward significance after gene-wide adjustment for multiple testing. We used a gene-wide adjustment for multiple testing because significant associations with cancer or serum vitamin D levels have been previously reported in vitamin D-related genes and, in particular, for the three genes that showed associations in this study, as detailed below. These results suggest that genetic variation in the vitamin D-related genes could account for differences in CRC risk among AAs, although replication in large, independent AA CRC study groups is needed to confirm these associations.

A recent study of the five SNPs (rs2282679, rs10741657, rs12785878, rs11234027, and rs6013897) most strongly associated with serum 25(OH)D3 levels, conducted in 13 large cohorts and examining 10,061 CRC cases and 12,768 controls, failed to identify any associations with CRC [36]. We tested all of these SNPs in the present study and similarly failed to identify a significant effect on risk of CRC in AAs. These SNPs account for approximately 5 % of the variance in serum 25(OH)D3 levels, which could be too small an effect to impact CRC risk.

The effect of SNPs on serum vitamin D levels may be too small to be important in CRC; however, most of the genes examined in the present study are expressed in colonic mucosa and could affect the levels of active hormone in the tissues, where vitamin D exerts its antitumor effects. CYP27B1 catalyzes the second activation step, 1α-hydroxylation of 25(OH)D3, in the kidney and in some extra-renal tissues to produce the active form of vitamin D, 1,25-dihydroxyvitamin D3 [1,25(OH)2D3 or calcitriol]. Cells in the colon and many other tissues, including the prostate, cervix, breast, placenta, pancreas, and brain, express CYP27B1 mRNA and other 1α-hydroxylase enzymes and thus have the capacity to synthesize 1,25(OH)2D3 from 25(OH)D3 [45–47]. Because of this local capacity, the levels of the active hormone 1,25(OH)2D3 in colonic tissue could be strongly influenced by serum levels of 25(OH)D3.

CYP2R1 encodes a hepatic microsomal enzyme, one of the enzymes catalyzing 25-hydroxylation of vitamin D in the liver, but it is also expressed in the colon, where it could convert cholecalciferol to 25(OH)D3. The SNP that we found associated with all CRC, rs12794714, was previously found to be associated with serum 25(OH)D3 concentrations [40]. The SNP lies in a promoter region and a DNase I hypersensitive site. Its protective effect could be explained if it increased transcription of CYP2R1, raising enzyme levels and producing more 25(OH)D3 in colon tissue, which could in turn yield more active hormone and a lower risk of CRC.

Almost all of the 25(OH)D3 and active hormone 1,25(OH)2D3 in the circulation is bound to serum vitamin D-binding protein (VDBP; also referred to as Gc-globulin), which is encoded by the gene group-specific component (GC) [48, 49]. One important function of VDBP is to regulate the half-life of 25(OH)D3 in the circulation by stabilizing the hormone, but VDBP also helps maintain serum vitamin D levels through reuptake by proximal tubule cells in the kidney [50]. 25(OH)D3 and 1,25(OH)2D3 can enter cells either by diffusion of free vitamin D across the cell membrane or by megalin-receptor-mediated endocytosis of the vitamin D-VDBP complex [51]. GC is highly polymorphic, with more than 120 known variants and different frequency distributions in diverse populations. Some GC SNPs have been previously associated with plasma concentrations of 25(OH)D3 [52, 53] and with breast cancer [54, 55]. Here, we found a nominally significant association between risk of L-CRC in AAs and rs16847024 in GC. The rs16847024 SNP was selected for study because it is near the promoter of GC; however, it is not located within any known regulatory element, and it remains possible that rs16847024 is in linkage disequilibrium with another SNP that affects GC function. Theoretically, lower levels of VDBP or reduced VDBP function could result in less 25(OH)D3 delivery to tissues. We note that rs16847024 has not been associated with AA prostate cancer [38] or with vitamin D levels in AAs [56]. rs16847024 is relatively common in African-ancestry populations (minor allele frequency = 0.074) but is not present in European-ancestry populations (Table 1), making this association specific to African ancestry.

Although the SNP rs7041 exhibited only a trend toward association with AA CRC in our data, this SNP warrants greater scrutiny because recent work on the relationship between VDBP and serum 25(OH)D3 levels has suggested that this polymorphism regulates the bioavailability of 25(OH)D3 [57, 58]. rs7041 encodes the electrophoretically distinguishable protein isoforms Gc1F, common in Africans, and Gc1S, common in Europeans; Gc1F binds less 25(OH)D3 and is associated with lower serum 25(OH)D3 levels, which, by the authors’ calculation, results in more bioavailable vitamin D [58]. In our data, the minor allele of rs7041 was associated with protection against CRC, consistent with the usual theory that higher serum 25(OH)D3 levels lead to more antitumor activity at the level of the tissues.

CYP24A1 initiates the catabolism of 1,25(OH)2D3 and may also catabolize 25(OH)D3, a process that could limit the availability of this molecule [59]. Some genetic polymorphisms in CYP24A1 have been associated with the concentrations of serum vitamin D metabolites [60] and with risk of CRC [32] in European-ancestry populations. Here, we found an association between risk of L-CRC and rs6022990, another SNP that is relatively common in African-ancestry populations (minor allele frequency = 0.098) but not present in European-ancestry populations (Table 1). rs6022990 is a missense variant that substitutes threonine for methionine at amino acid residue 374 of CYP24A1. Both PolyPhen and SIFT predict that this change damages enzyme function, and analysis of the over-expressed mutant protein in HCT116 colon cancer cells demonstrated decreased CYP24A1 activity and increased intracellular levels of 1,25(OH)2D3 [61]. Based on the usual theory, these experimental results predict that the threonine-encoding allele of rs6022990 would be associated with lower risk of CRC, because more active hormone in the colonic mucosa should result in greater antitumor activity. Our results were not consistent with this prediction. Additional AA CRC association studies and direct measurements of CYP24A1 activity in different genetic backgrounds would help resolve this apparent conflict.

Associations between SNPs and development of CRC by location in the colon have been noted in several studies [62–64]. Biological differences by location in the colon could explain why a SNP affects risk in one part of the colon but not in the other. For instance, the arterial supply of the right and left colon is different, the lymphatic drainage into colic nodes varies by location, the mucosa is thinner on the right than on the left, and luminal contents differ [65–67]. Clinically, a higher proportion of CRCs in the right colon exhibit lymphocytic infiltration compared with the left, and, on average, R-CRC has a worse prognosis than L-CRC [68–72]. AAs have a higher rate of R-CRC development compared to EAs [73], and consistent with this observation, AAs have a higher proportion of CRCs with lymphocytic infiltration [63]. There is evidence that the vitamin D-related genes VDR in humans and Cyp24a1 and Cyp27b1 in mice are regulated differently in the proximal and distal colon [74, 75], raising the question of whether vitamin D levels could affect CRC development and outcomes by location within the colon [76].

An important limitation of this study is the lack of data on serum vitamin D levels from cases and controls that would allow us to test our central hypothesis more directly. Our study was also limited by lack of information about important covariates that modify vitamin D levels. Sunlight exposure strongly influences serum vitamin D levels; however, only about a quarter of the interindividual variability in serum vitamin D concentration is attributable to season, geographical latitude, or vitamin D intake [77]. Variation in serum vitamin D levels is modified by calcium intake, age, obesity, skin pigmentation, physical activity, race, and genetic background [36]. Further studies in AA study groups with available serum 25(OH)D3 levels and additional epidemiological data are needed to control for these effects. Although we have some mechanistic clues about the possible functions of the polymorphisms that exhibited associations in this study, the in vitro information is relatively limited, and studies in human tissues have not been performed. None of the associated SNPs have been reported as cis-expression quantitative trait loci (cis-eQTL), either in publicly available databases (http://www.scandb.org) or in our own unpublished eQTL study of AA colon tissue (data not shown). Finally, although this study includes, to our knowledge, the largest number of AAs genotyped for these vitamin D-related SNPs, larger study groups are needed to test the validity of the associations we report here and to increase overall study power.

Although the exact molecular mechanisms by which the SNPs in CYP2R1, CYP24A1, and GC influence CRC development remain to be determined, our study provides evidence that SNPs in vitamin D-related genes may play a role in CRC susceptibility in AAs. Our findings are important because there has been limited focus on the role of vitamin D-related polymorphisms in CRC among AAs. Identifying genetic variants that affect the functional status of vitamin D-related genes is important for understanding the role of this regulatory hormone in tumor development and progression.