Background

Colorectal cancer (CRC) remains the third most common cancer in both men and women worldwide. In 2018, 1.8 million new CRC cases were diagnosed and 609,000 CRC deaths were reported [1]. More importantly, increasing CRC incidence and mortality have been reported in young adults in Asia, including China [2,3,4]. The etiology of CRC is complex, and multiple factors are involved in carcinogenesis, including environmental exposures, lifestyle factors, and especially multiple inherited genetic variations [5,6,7,8,9]. Non-coding RNAs (ncRNAs) are regarded as “genomic dark matter”, and a growing number of studies have indicated a strong association between single-nucleotide polymorphisms (SNPs) in ncRNAs and CRC risk [10,11,12,13,14,15,16,17]. Therefore, identifying genetic variations, including those in long non-coding RNAs (lncRNAs), and their interactions with environmental factors could reveal novel biomarkers for CRC diagnosis, prognosis, and treatment assessment.

Long non-coding RNAs (lncRNAs), first identified in the 1990s [18, 19], are single-stranded non-coding RNAs longer than 200 nucleotides that lack open reading frames (ORFs) [20]. Rather than being transcriptional noise, lncRNAs are key players with multiple functions in carcinogenesis, including regulation of the cancer cell cycle, proliferation, and apoptosis through control of gene transcription and post-transcriptional processing [21,22,23,24]. The H19 gene is located on human chromosome 11p15.5 within a cluster of imprinted genes that includes H19 and insulin-like growth factor 2 (IGF2). H19 encodes a 2.3 kb spliced and polyadenylated lncRNA [25,26,27]. H19 is highly expressed in the early stages of embryogenesis and down-regulated with tissue maturation, but is (re-)expressed in human carcinoma tissues such as CRC [28,29,30,31]. Thus, H19 is involved in cancer initiation, development, and progression, suggesting that it could be a critical diagnostic and prognostic biomarker as well as a potential novel target for cancer therapy.

Recent functional studies have provided insights into the roles of genetic variants in the H19 promoter region in cancer risk, inter-individual chemotherapy response, and prognosis [10, 32,33,34]. H19 expression is mainly regulated by the 5′-flanking region upstream of the H19 gene, which contains differentially methylated regions (DMRs) and mutations [35]. To date, more than 100 SNPs have been found in the H19 gene (http://www.ncbi.nlm.nih.gov/projects/SNP), and some potentially functional SNPs in the promoter region play critical roles in individual susceptibility to cancer, interaction with environmental factors, and clinical outcomes in CRC [12, 16, 17, 36,37,38,39]. Bhatti et al. demonstrated that the H19 rs2107425 polymorphism was closely associated with radiation therapy response in breast cancer patients in the United States (n = 859) [40]. O’Brien et al. further found that the H19 rs2107425 polymorphism was significantly associated with breast cancer susceptibility among African Americans [41]. Yang et al. also reported that the T allele of the H19 promoter SNP rs2839698 contributes to increased gastric cancer risk in a Chinese population [25]. These previous studies focused on the H19 promoter SNPs rs2107425 and rs2839698, which are not located in the high-incidence region upstream of the H19 gene. Therefore, there is an urgent need to identify potentially functional SNPs in the H19 promoter region, which might benefit early screening and merit further investigation.

In this study, we screened the distribution of genetic variants in the region approximately 3 kb upstream of the H19 promoter and further investigated the possible associations of three SNPs in the human H19 gene (rs4930101, rs11042170, and rs2735970) with advanced CRC risk, environmental factors, and clinical outcomes. This study may provide a novel diagnostic biomarker for patients with advanced CRC.

Materials and methods

Patients and clinical information

This hospital-based case–control study was conducted at China Medical University (Shenyang, China) and approved by the Medical Ethics Committee of China Medical University. Specifically, 572 patients with advanced CRC were recruited from 2008 to 2013 at the First Affiliated Hospital and Shengjing Hospital of China Medical University. The inclusion criteria for CRC patients were: (1) availability of complete clinical data and follow-up status; (2) clinical stage III or IV; and (3) treatment with FOLFOX6 chemotherapy. The exclusion criteria were: (1) incomplete clinical data; (2) unavailable blood samples for genotyping; (3) treatment with radiation therapy only; (4) other cancers or cancers of unknown primary site; and (5) no treatment with the FOLFOX6 regimen. Clinicopathological data, including age, gender, first-degree family history of CRC, smoking status, tumor size, tumor differentiation, pathological grade, and lymph-node metastasis, were collected from interviewer-administered health risk questionnaires and medical records. Non-smokers were defined as individuals who had smoked < 100 cigarettes in their lifetime. BMI was calculated from self-reported height and body weight. Tumor differentiation and pathological grade were determined according to World Health Organization criteria. The patients received at least 2–3 cycles of the FOLFOX6 regimen and were followed up monthly until recurrence or death. Age-, gender-, and ethnicity-matched healthy control volunteers (n = 555) were recruited from the same hospitals. After the interview, a 5 ml blood sample was collected from each participant for SNP genotyping.

Genotyping

Genomic DNA was extracted from peripheral blood leukocytes using the TIANGEN DNA Blood Mini Kit (TIANGEN Biotech CO., LTD, Beijing, China), and SNPs were genotyped by TaqMan assay. The probes, primers, and assay conditions are available upon request. SNP allele-specific probes were labeled with the fluorescent dyes VIC and FAM, and genotyping was performed with TaqMan SNP Genotyping Assays on the ABI 7500 Fast Real-Time PCR platform (Applied Biosystems, Life Technologies Corporation, Foster City, CA, USA). The genotyping call rates of all SNPs were above 90%. For quality control, approximately 10% of samples were randomly selected for repeat genotyping, and some of these were also confirmed by DNA sequencing. The concordance rate of the repeated samples was 100%, indicating that the genotyping method and results were reliable.

Statistical analysis

All data were analyzed using SPSS version 19.0 (SPSS Inc. Chicago, Illinois, USA), and P < 0.05 was considered statistically significant. Associations of genetic polymorphisms with CRC susceptibility and clinical variables were assessed by odds ratios (ORs) and 95% confidence intervals (CIs) from unconditional logistic regression adjusted for age, gender, body weight, and smoking status. Overall survival (OS) was defined as the time from surgery to death or last known follow-up. Disease-free survival (DFS) was defined as the time from surgery to recurrence, death, or last known follow-up. DFS and OS were estimated with Kaplan–Meier curves, and their associations with SNPs were tested by the log-rank test. Multivariate Cox proportional hazards regression models were used to estimate adjusted hazard ratios (HRs) and their 95% CIs, in order to evaluate the independent prognostic value of each genotype and clinical variable. High-order interactions between the SNPs and clinicopathological parameters were assessed by multifactor dimensionality reduction (MDR) analysis.
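
The crude odds ratio behind a single genotype comparison, and its Woolf (log-scale) 95% CI, can be illustrated with a short Python sketch. This is not the authors' SPSS procedure: the ORs reported in this study are adjusted by logistic regression (an adjusted OR is exp(β) for the genotype coefficient), whereas the sketch below is unadjusted, and the function name and counts are hypothetical.

```python
import math


def odds_ratio_woolf(a: int, b: int, c: int, d: int,
                     z: float = 1.959964) -> tuple[float, float, float]:
    """Crude odds ratio with a Woolf (log-based) confidence interval.

    Hypothetical 2x2 layout (not data from this study):
        a = cases carrying the risk genotype,   b = controls carrying it
        c = cases with the reference genotype,  d = controls with it
    z = 1.959964 gives a two-sided 95% interval.
    """
    if min(a, b, c, d) == 0:
        # log(OR) is undefined with an empty cell; a continuity correction is needed.
        raise ValueError("zero cell count")
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, for illustration only.
print(odds_ratio_woolf(40, 20, 60, 80))
```

The interval is symmetric on the log scale, which is why it is asymmetric around the OR itself, as in the CIs reported in the Results.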

Results

Identification of SNPs in the promoter region of the H19 gene

To investigate differences in the distribution of genetic variants in the H19 promoter region, SNPs in the region approximately 3 kb upstream of the H19 promoter were genotyped by DNA sequencing in CRC patients (n = 51) and healthy controls (n = 50). Sixteen SNPs were identified by comparison with the NCBI SNP database (https://www.ncbi.nlm.nih.gov/snp/): rs10840167 (G/T), rs2525883 (C/T), rs4930101 (G/T), rs2525882 (T/C), rs2735970 (A/G), rs2735971 (A/G), rs11042170 (G/A), rs2735972 (G/A), rs2071094 (C/A), rs2107425 (C/T), rs4930098 (C/G), rs11042167 (A/G), rs2071095 (G/T), rs2251312 (G/C), rs2251375 (A/C), and rs2525881 (T/C) (Additional file 1: Table S1; Fig. 1a). The genotype distributions of these SNPs in the control group were consistent with Hardy–Weinberg equilibrium (P > 0.05, Additional file 1: Table S1). To further evaluate whether these SNPs could affect CRC risk, we performed a standard allelic association analysis using Pearson's χ2 test and logistic regression. The frequency distributions of rs4930101 (G/T), rs2735970 (A/G), and rs11042170 (G/A) differed significantly between CRC patients and healthy controls (Additional file 1: Table S1, Fig. 1b–d). Specifically, the rs4930101 GG genotype was associated with a 5.211-fold increase in CRC risk, and the GT/GG genotype and the G allele were also associated with significantly increased CRC risk. Carriers of the rs11042170 GG genotype, or of the GA/GG genotypes under the dominant model, had a higher risk of CRC (GG vs. AA: P = 0.033, adjusted OR = 5.500, 95% CI 1.027–29.451; GA/GG vs. AA: P = 0.034, adjusted OR = 5.067, 95% CI 1.001–25.647). Moreover, the frequency of the rs2735970 AG genotype was significantly higher in CRC patients than in healthy controls. No statistically significant association was observed between CRC susceptibility and the other SNPs in the H19 promoter region in this cohort (Additional file 1: Table S1).
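
The text does not state which Hardy–Weinberg test was used. A common choice is the χ2 goodness-of-fit test with 1 degree of freedom (3 genotype classes, minus 1, minus 1 estimated allele frequency), sketched below in Python with hypothetical genotype counts. For a χ2 variable with 1 df the upper-tail probability equals erfc(√(χ2/2)), so only the standard library is needed; with small genotype counts an exact test is usually preferred.

```python
import math


def hwe_chi_square(n_hom_ref: int, n_het: int, n_hom_alt: int) -> tuple[float, float]:
    """Chi-square test (1 df) of Hardy-Weinberg equilibrium for one biallelic SNP."""
    n = n_hom_ref + n_het + n_hom_alt
    p = (2 * n_hom_ref + n_het) / (2 * n)  # reference-allele frequency
    q = 1 - p
    # Expected genotype counts under HWE: p^2, 2pq, q^2.
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_hom_ref, n_het, n_hom_alt)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of a chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical control-group genotype counts (hom. reference, het., hom. alternative).
chi2, p_value = hwe_chi_square(25, 50, 25)
print(chi2, p_value)
```

A P value above 0.05, as reported for the controls here, means the observed genotype frequencies do not deviate detectably from those expected under HWE.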

Fig. 1

Distribution of the 16 SNPs identified in the region approximately 3 kb upstream of the H19 promoter. a Distribution of the 16 SNPs in the H19 promoter region. b DNA sequencing genotyping of the tagSNP rs4930101. c DNA sequencing genotyping of the tagSNP rs2735970. d DNA sequencing genotyping of the tagSNP rs11042170

Association of H19 rs4930101, rs11042170, and rs2735970 with colorectal cancer risk

To study whether the H19 promoter SNPs rs4930101, rs11042170, and rs2735970 affect susceptibility to CRC, we enrolled 572 CRC patients and 555 age- and gender-matched healthy controls. The median ages (range) of the CRC and control groups were 59 (26–82) and 59 (25–80) years, respectively, with no statistically significant difference between the groups (P = 0.789). Demographic data, risk factors, and clinical variables, including tumor size, clinical stage, pathological type, lymph node metastasis status, and chemotherapy regimen, are listed in Additional file 1: Table S2.

By adjusted logistic regression analysis, we found that CRC risk was significantly increased in carriers of rs4930101 risk genotypes: the heterozygous GT genotype (P = 0.007, adjusted OR = 1.92, 95% CI 1.19–3.10), the homozygous GG genotype (P = 0.001, adjusted OR = 2.12, 95% CI 1.32–3.39), the GT/GG genotype under the dominant model (P = 0.002, adjusted OR = 2.03, 95% CI 1.28–3.21), and the G allele (P = 0.009, adjusted OR = 1.28, 95% CI 1.06–1.54) (Table 1 and Fig. 2a). SNP rs2735970 was also significantly associated with increased CRC risk, including the heterozygous GA genotype (P = 0.001, adjusted OR = 1.64, 95% CI 1.26–2.12) and the homozygous GG genotype (P = 0.029, adjusted OR = 1.48, 95% CI 1.04–2.11) (Table 1 and Fig. 2a). The GA/GG genotype (P = 0.001, adjusted OR = 1.60, 95% CI 1.25–2.04) and the G allele (P = 0.003, adjusted OR = 1.29, 95% CI 1.09–1.52) of rs2735970 were likewise associated with increased CRC susceptibility (Table 1 and Fig. 2a). Moreover, the rs11042170 GA and GG genotypes, the G allele, and the GA/GG genotype under the dominant model were significantly associated with increased CRC risk [GA vs. AA: adjusted OR (95% CI) 1.69 (1.07–2.67), P = 0.023; GG vs. AA: adjusted OR (95% CI) 2.00 (1.28–3.13), P = 0.002; G vs. A allele: adjusted OR (95% CI) 1.32 (1.09–1.58), P = 0.003; and GA/GG vs. AA: 1.86 (1.20–2.87), P = 0.005] (Table 1 and Fig. 2a). More importantly, we further examined the combined effect of the risk genotypes (rs4930101 GT/GG, rs2735970 GA/GG, and rs11042170 GA/GG) and found that carrying 1, 2, or 3 risk genotypes was associated with a remarkable increase in cancer risk [1 risk genotype: P = 0.001, adjusted OR (95% CI) 3.53 (1.58–7.86); 2 risk genotypes: P < 0.0001, adjusted OR (95% CI) 10.08 (4.56–22.28); 3 risk genotypes: P = 0.009, adjusted OR (95% CI) 2.79 (1.26–6.18)] (Table 1 and Fig. 2a).
Furthermore, carriers of more than 1 risk genotype had significantly increased susceptibility to cancer compared with those carrying ≤ 1 risk genotype [P < 0.0001, adjusted OR (95% CI) 6.48 (2.97–14.15)] (Table 1 and Fig. 2a). Taken together, these data indicate that the three potentially functional SNPs of the H19 gene are significantly associated with CRC risk.
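
The combined-genotype grouping used above (count the risk genotypes a person carries, then compare > 1 with ≤ 1) can be expressed as a small Python helper. The risk-genotype sets follow the text (rs4930101 GT/GG, rs2735970 GA/GG, rs11042170 GA/GG); the function names and the per-person dictionary format are hypothetical, and treating a missing genotype as non-risk is an assumption that a real analysis might handle differently, for example by excluding that subject.

```python
# Risk genotypes as defined in the text (dominant model for each SNP).
RISK_GENOTYPES = {
    "rs4930101": {"GT", "GG"},
    "rs2735970": {"GA", "GG"},
    "rs11042170": {"GA", "GG"},
}


def normalize(genotype: str) -> str:
    """Order-independent genotype key, so 'GA' and 'AG' compare equal."""
    return "".join(sorted(genotype.upper()))


NORMALIZED_RISK = {snp: {normalize(g) for g in gts} for snp, gts in RISK_GENOTYPES.items()}


def count_risk_genotypes(person: dict[str, str]) -> int:
    """Number of the three SNPs at which this person carries a risk genotype.

    A missing SNP is treated as non-risk (an assumption of this sketch).
    """
    return sum(
        normalize(person.get(snp, "")) in risk
        for snp, risk in NORMALIZED_RISK.items()
    )


def risk_group(person: dict[str, str]) -> str:
    """Dichotomize as in the text: more than one risk genotype vs. at most one."""
    return ">1" if count_risk_genotypes(person) > 1 else "<=1"


# Hypothetical genotypes for one participant.
example = {"rs4930101": "TG", "rs2735970": "AA", "rs11042170": "AG"}
print(count_risk_genotypes(example), risk_group(example))
```

The resulting group label would then enter the logistic regression as a binary covariate in place of the individual SNP genotypes.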

Table 1 Logistic regression analysis of associations between genotypes of H19 promoter SNPs and advanced CRC susceptibility
Fig. 2

Histogram and box plots illustrating the frequency distributions of rs4930101, rs2735970, and rs11042170 and stratified clinicopathological characteristics. a Pie charts illustrating the frequency distributions of rs4930101, rs2735970, and rs11042170 in controls (n = 555) and cases (n = 572). b Histograms showing the frequency distribution of rs2735970 genotypes by gender (male, female) and of rs11042170 genotypes by first-degree family history of cancer (no, yes)

Interactions of H19 promoter SNPs with environmental factors and clinical variables

To explore the clinical utility of the SNP genotypes, the interactive effects of the H19 SNPs rs4930101, rs11042170, and rs2735970 with environmental factors or clinical variables were determined by the χ2 test and unconditional logistic regression adjusted for gender, age, smoking status, and family history of cancer (Fig. 2b, Table 2 and Additional file 1: Table S2). We found a significant gender difference in the frequency of the H19 rs2735970 GA/GG genotype [65.4% in male vs. 75.9% in female CRC patients, P = 0.006; adjusted OR (95% CI) 1.700 (1.163–2.485)]. The frequency of the rs11042170 GA/AA genotype was significantly higher in patients with a family history of cancer (58.8%) than in those without [45.9%, P = 0.035; adjusted OR (95% CI) 1.677 (1.038–2.710)] (Fig. 2b and Additional file 1: Table S3). Because body weight, smoking, and family history of cancer are environmental risk factors for CRC, we further analyzed the interactions between these factors and the genetic factors, and found that the combined risk genotypes (> 1 vs. ≤ 1) were significantly associated with family history of cancer (P = 0.028, Table 2).

Table 2 Gene-environmental factor interactions (logistic regression)

Evaluation of H19 rs4930101, rs11042170, and rs2735970 as prognostic markers in advanced CRC patients

To further clarify whether the 3 SNPs in the H19 promoter region were independent prognostic factors in this cohort, we performed log-rank tests and multivariate Cox proportional hazards regression analysis including all variables that could affect DFS and OS in CRC patients treated with the FOLFOX6 regimen. Overall, there was no statistically significant association between the 3 SNPs of the H19 gene and prognosis. However, markedly worse clinical outcomes were found in patients with combined risk genotypes (> 1), especially those with body weight ≥ 61 kg, smoking, and first-degree family history of cancer (log-rank test: P = 0.006, P = 0.018, and P = 0.013, respectively) (Fig. 3a–c). Among CRC patients with body weight ≥ 61 kg, the median survival time (MST) of those carrying more than 1 risk genotype [MST (95% CI) 65 (59–70) months] was much shorter than that of those carrying ≤ 1 risk genotype [MST (95% CI) 83 (76–89) months] (Fig. 3a). Similarly, compared with the corresponding reference groups (≤ 1 risk genotype; MST 83 or 85 months), > 1 combined risk genotype was associated with worse overall survival in patients who smoked [MST (95% CI) 56 (52–60) months] (Fig. 3b) and in those with a family history of cancer [MST (95% CI) 66 (60–71) months] (Fig. 3c). More importantly, multivariate Cox regression analyses further confirmed that > 1 combined risk genotype was a prognostic risk factor for CRC patients with body weight ≥ 61 kg [P = 0.002, HR (95% CI) 1.79 (1.09–2.94)], smoking [P = 0.008, HR (95% CI) 2.64 (1.84–3.88)], and a family history of cancer [P = 0.006, HR (95% CI) 2.75 (1.17–6.60)] (Table 3).
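
The Kaplan–Meier estimator and the two-group log-rank test used in this stratified survival analysis can be sketched in plain Python. This is a textbook implementation under stated assumptions, not the authors' SPSS workflow: an event value of 1 marks death (or recurrence, for DFS) and 0 marks censoring, the function names and data are hypothetical, and the adjusted hazard ratios in Table 3 would additionally require a Cox proportional hazards model, which is not shown.

```python
import math


def kaplan_meier(times: list[float], events: list[int]) -> list[tuple[float, float]]:
    """Kaplan-Meier survival estimate S(t) at each distinct event time.

    events[i] is 1 for an observed event (death or recurrence), 0 if censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all subjects whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed  # events and censorings both leave the risk set
    return curve


def log_rank(times1, events1, times2, events2) -> tuple[float, float]:
    """Two-group log-rank chi-square statistic (1 df) and its p-value."""
    pooled = [(t, e, 0) for t, e in zip(times1, events1)]
    pooled += [(t, e, 1) for t, e in zip(times2, events2)]
    event_times = sorted({t for t, e, _ in pooled if e})
    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        n = sum(1 for tt, _, _ in pooled if tt >= t)               # at risk, both groups
        n1 = sum(1 for tt, _, g in pooled if tt >= t and g == 0)   # at risk, group 1
        d = sum(1 for tt, e, _ in pooled if tt == t and e)         # events, both groups
        d1 = sum(1 for tt, e, g in pooled if tt == t and e and g == 0)
        observed_minus_expected += d1 - d * n1 / n
        if n > 1:
            # Hypergeometric variance of d1 at this event time.
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = observed_minus_expected ** 2 / variance if variance > 0 else 0.0
    return chi2, math.erfc(math.sqrt(chi2 / 2))


# Hypothetical follow-up times (months) and event indicators.
print(kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]))
print(log_rank([1, 2, 3, 4, 5], [1] * 5, [6, 7, 8, 9, 10], [1] * 5))
```

The median survival time reported in the text is the earliest time at which the Kaplan–Meier curve drops to 0.5 or below; the log-rank test compares observed with expected events in one group across all event times.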

Fig. 3

Kaplan–Meier stratification analysis of the association between OS and combined genotypes of the H19 gene in advanced CRC patients. Patients carrying combined risk genotypes of rs4930101, rs2735970, and rs11042170 (risk genotypes > 1) had shorter OS among advanced CRC patients with body weight ≥ 61 kg (P = 0.006) (a), smoking history (P = 0.018) (b), and family history of cancer (P = 0.013) (c)

Table 3 Multivariate Cox proportional hazards analyses of H19 rs4930101, rs2735970, and rs11042170 in association with DFS and OS in advanced CRC patients

High-order interactions with CRC prognosis by MDR analysis

To further evaluate possible gene–environment interactions associated with clinical outcomes, high-order interactions were assessed by multifactor dimensionality reduction (MDR) analysis of the 3 SNPs (rs4930101, rs2735970, and rs11042170), the combined genotypes, and 8 known risk factors (i.e., age, body weight, gender, smoking status, first-degree family history of cancer, tumor size, tumor differentiation, and clinical stage). In the MDR analysis, the combination of the 8 risk factors was the best model, with the highest cross-validation consistency (CVC) and the lowest prediction error compared with the one-factor model among all 5 risk factors. The 12-factor model (the 8 risk factors plus the 3 SNPs and the combined genotypes) had the maximum CVC and the minimum prediction error, and the prediction error was statistically significant for both DFS and OS (Table 4). Taken together, the 12-factor model predicted prognosis better than the 8-factor model and was the best model for predicting CRC prognosis in this study population.
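
The core MDR step, which pools the cells of a multi-factor combination into a single high-risk/low-risk attribute and scores the result by classification error, can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the software or settings used in this study: the outcome is binary (a survival endpoint such as DFS or OS would first have to be dichotomized, which the text does not describe), factors are small categorical codes, the cross-validation used to compute CVC and the balanced-accuracy variant are omitted, and the function names and toy data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def high_risk_cells(rows, labels, combo):
    """Label each multi-factor cell high-risk if its case:control ratio exceeds the overall ratio."""
    cases, controls = Counter(), Counter()
    for row, y in zip(rows, labels):
        cell = tuple(row[j] for j in combo)
        if y:
            cases[cell] += 1
        else:
            controls[cell] += 1
    n_cases = sum(labels)
    n_controls = len(labels) - n_cases
    if n_cases == 0 or n_controls == 0:
        raise ValueError("need both outcome classes")
    threshold = n_cases / n_controls
    return {
        cell
        for cell in set(cases) | set(controls)
        # A cell with no controls has an infinite ratio and is high-risk.
        if controls[cell] == 0 or cases[cell] / controls[cell] > threshold
    }


def classification_error(rows, labels, combo, high):
    """Fraction of subjects misclassified when 'in a high-risk cell' predicts the outcome."""
    wrong = sum(
        (tuple(row[j] for j in combo) in high) != bool(y)
        for row, y in zip(rows, labels)
    )
    return wrong / len(labels)


def best_combination(rows, labels, k):
    """Exhaustively search all k-factor combinations; return (combo, training error)."""
    best = None
    for combo in combinations(range(len(rows[0])), k):
        high = high_risk_cells(rows, labels, combo)
        error = classification_error(rows, labels, combo, high)
        if best is None or error < best[1]:
            best = (combo, error)
    return best


# Toy data: factors 0 and 1 interact (XOR); factor 2 is noise.
rows = [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)] * 5
labels = [a ^ b for a, b, _ in rows]
print(best_combination(rows, labels, 2))
```

The XOR toy data show why MDR is used for high-order interactions: neither factor 0 nor factor 1 alone separates cases from controls, but their two-factor combination does.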

Table 4 MDR analysis for the prediction of prognosis with and without the genotypes of the 3 SNPs in advanced CRC patients

Discussion

Although only a small number of lncRNAs have been well characterized, current studies have revealed that lncRNAs such as H19 are functionally associated with disease occurrence, development, and progression, particularly in cancer [42, 43]. Dysregulation of lncRNAs has been implicated in breast, bladder, gastric, and colorectal cancers [44,45,46,47]. Dysregulated H19 expression clearly affects cellular functions such as cell proliferation, imprinting, migration, invasion, and metastasis [28, 43, 48,49,50]. Therefore, genetic variations in H19, especially in the promoter region, may play a critical role in cancer susceptibility. In the current case–control study of 572 CRC cases and 555 healthy controls from a northeastern Chinese population, we explored for the first time the potential association between H19 promoter polymorphisms and CRC risk. We found that 3 of the 16 identified SNPs in the DMR upstream of the H19 gene, namely rs4930101, rs11042170, and rs2735970, and especially their combined risk genotypes, were remarkably associated with increased advanced CRC risk, with environmental factors, and with clinical outcomes in advanced CRC patients with body weight ≥ 61 kg, smoking, and first-degree family history of cancer.

In the current study, we first detected SNPs located in the DMR upstream of the H19 gene in a training set of 51 CRC patients and 50 healthy controls, and identified a total of 16 SNPs. As the first lncRNA to be discovered, H19 is involved in regulating gene expression in the imprinted gene network and contributes to growth control during development [19, 51,52,53,54]. Owing to their importance in forensic identification, these 16 SNPs have also been detected in two other populations, the Chinese Han and Chinese Korean populations [55, 56], consistent with our findings. Because high-quality DNA can easily be prepared from peripheral blood, the SNPs in this study were genotyped only from blood genomic DNA. Van Huis-Tanja et al. [57] genotyped 11 SNPs in 9 genes in matched blood and FFPE colorectal tumor samples using pyrosequencing and TaqMan techniques, and found that only GSTP1 showed significant discordance between FFPE tissue and blood genotypes, with a discordance rate of only 1.4%. More recently, Shao et al. [58] evaluated genome-wide genotyping concordance between tumor tissues and peripheral blood and found a high concordance rate (97.42%). Given this high concordance, we further investigated the relevance of these SNPs to advanced CRC risk and found that 3 of the 16 SNPs, rs4930101, rs2735970, and rs11042170, were significantly associated with cancer susceptibility. To examine these associations further, we studied a relatively large sample of 572 advanced CRC patients and 555 healthy controls using genomic DNA. Specifically, significantly increased CRC risk was observed in advanced CRC patients carrying the homozygous genotypes of rs4930101, rs2735970, and rs11042170 and under the dominant model.
More importantly, a remarkable 6.48-fold increase in susceptibility to CRC was observed for the first time in individuals carrying > 1 risk genotype compared with those carrying ≤ 1 risk genotype (risk genotypes: rs4930101 GT/GG, rs2735970 GA/GG, and rs11042170 GA/GG). To our knowledge, it remains unclear whether these 3 SNPs affect H19 expression and thereby influence cancer risk. However, we found a strong synergistic effect of the combined risk genotypes, suggesting that they could serve as a biomarker for CRC screening and diagnosis.

In this cohort, we further explored gene–environment interactions of the H19 promoter SNPs rs4930101, rs11042170, and rs2735970 with clinicopathological parameters of CRC patients, including gender, body weight, smoking, and family history of cancer. Although no association was found between rs4930101 and clinical variables, the frequency of the rs2735970 AA genotype was significantly lower in female CRC patients. Importantly, the rs11042170 genotype and the combined risk genotypes (> 1 vs. ≤ 1) were remarkably associated with a family history of cancer, indicating that the G allele might be a genetic predisposition factor in advanced CRC. The effect of the combined risk genotypes (> 1 vs. ≤ 1) was more significant than that of any single genotype. Because cancer is multifactorial, combinations of genotypes could dramatically affect cancer development. A recent study found that some CRC risk variants (rs10505477, rs6983267, rs10795668, and rs11255841) are associated with family history of CRC [59]. However, interactions between these 3 H19 SNPs and CRC environmental factors have not been reported until now. Only one recent case–control study reported that the combined effect of another H19 promoter SNP, rs2107425, and cooking smoke exposure on lung cancer risk was greater than their individual effects [38]. These results indicate that the 3 tag SNPs could serve as potential biomarkers for evaluating interactions between clinicopathological parameters and advanced CRC-associated polymorphisms. Studies in other cancer types and with larger sample sizes are needed to validate these findings.

To further identify independent prognostic factors in this cohort, we performed, for the first time, log-rank tests, multivariate Cox regression analysis, and MDR analysis on all variables that could affect DFS and OS in advanced CRC patients. No significant association was found between H19 SNPs and overall survival in CRC patients treated with the FOLFOX6 regimen. However, stratification analysis revealed markedly worse clinical outcomes in CRC patients with combined risk genotypes (> 1 vs. ≤ 1) who had body weight ≥ 61 kg, a smoking history, or a first-degree family history of cancer, suggesting that the combined genotypes of the 3 SNPs may affect CRC prognosis and could be a promising prognostic biomarker for advanced CRC. As previously reported, H19 expression can be induced by cigarette smoke and other factors. Therefore, these data suggest that the combined genotypes of these potentially functional SNPs could serve as biomarkers for predicting prognosis, especially in CRC patients with specific clinical characteristics, including greater body weight, ever-smoking, and first-degree family history of cancer.

In this study, we comprehensively evaluated, for the first time, the associations of SNPs in the H19 promoter region with CRC risk, pathological features, and clinical outcomes in advanced CRC patients. We identified 16 SNPs in the DMR upstream of the H19 gene. The G alleles of 3 potentially functional SNPs (rs4930101, rs11042170, and rs2735970) and the combined risk genotypes were associated with increased advanced CRC risk in both the training set and the overall cohort. Furthermore, these SNPs and combined risk genotypes interacted with environmental factors and were associated with prognosis in advanced CRC patients with body weight ≥ 61 kg, smoking, and first-degree family history of cancer. However, functional experiments are warranted to elucidate the role of H19 and the underlying molecular mechanisms in CRC tumorigenesis.

Conclusions

1. Three SNPs, rs4930101, rs11042170, and rs2735970, among the 16 SNPs identified in the DMR upstream of the H19 gene were remarkably associated with an increased risk of advanced CRC.

2. Individuals harboring > 1 combined risk genotype showed a remarkably increased CRC risk (6.48-fold) and a significant interaction with environmental factors.

3. Notably, stratification analysis showed significantly worse clinical outcomes in CRC patients harboring combined risk genotypes (> 1 vs. ≤ 1) who had body weight ≥ 61 kg, ever-smoking, or a first-degree family history of cancer.

4. Further in vitro and in vivo studies, as well as studies in patients with other cancers, are needed to confirm these findings.