Background

The nucleotide excision repair (NER) system participates in the removal of the bulky adducts of DNA lesions from the genome under environmental stimuli, such as UV irradiation, tobacco, alkylating agents or pollutants, and xeroderma pigmentosum group A (XPA) acts as an essential NER member [1, 2]. XPA protein, as a zinc finger DNA binding protein and an important damage verifier, can bind the NER core repair factors to identify the damage site of the DNA substrate [2,3,4]. Abnormal DNA repair mechanisms or mutated NER proteins are involved in the process of mutagenesis and oncogenesis and are often linked to a group of clinical disorders [1, 2]. The human XPA rs1800975 T/C polymorphism is a common single nucleotide polymorphism (SNP) in the 5′-untranslated region of the XPA gene [5]. In the present study, we are interested in comprehensively exploring the possible effect of the XPA rs1800975 genetic variant on the susceptibility to different cancer diseases, such as skin cancer, lung cancer, breast cancer, esophageal cancer, gastric cancer, colorectal cancer or endometrial cancer.

Reports on the genetic relationship between the XPA rs1800975 polymorphism and cancer susceptibility have reached different conclusions in different populations. For example, the XPA rs1800975 polymorphism was reported to be related to the risk of lung cancer in Norwegian [6], German [7, 8] and Korean [9] populations but not in patients from Belgium [10] or the USA [11]. These conflicting results merit a comprehensive evaluation by means of a meta-analysis.

To the best of our knowledge, only two meta-analyses of the association between the XPA rs1800975 polymorphism and susceptibility to overall cancer diseases have been reported, both in 2012 [12, 13]. Nevertheless, no more than 36 case–control studies were enrolled in either prior meta-analysis. Therefore, we performed an updated comprehensive meta-analysis in 2020 based on the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines [14]. In total, 71 case–control studies that conformed to Hardy–Weinberg equilibrium (HWE) were enrolled for pooling. We conducted a series of stratified analyses, Begg’s test, Egger’s test, sensitivity analysis, false-positive report probability (FPRP) analysis and trial sequential analysis (TSA), together with expression pattern, eQTL and sQTL analyses.

Methods

Database retrieval

Potentially relevant publications from six online databases, including PubMed, Excerpta Medica Database (EMBASE), Cochrane, China National Knowledge Infrastructure (CNKI), WANFANG and VIP, were retrieved until April 8, 2020. We did not set up any geographical or language restrictions for publications. Additional file 1: Table S1 shows our specific search terms during the database retrieval.

Screening criteria

The articles were then screened and evaluated for eligibility according to our screening criteria. The inclusion criterion was the availability of genotypic frequency data for the XPA rs1800975 polymorphism in both cases and controls. The exclusion criteria were as follows: duplicate information; cell, plant or animal assay data; other diseases, genes or SNPs; review, meeting or meta-analysis publications; lack of normal controls; lack of full genotypic data; and a genotypic distribution in controls that was not in line with HWE.

Data extraction and quality evaluation

We used a table to independently extract the basic information, including first author, publication year, country, race, genotypic distribution, cancer type, control source, genotyping method, genotype frequency, and sample size. Disagreements were resolved by full discussion, and we attempted to obtain missing data by contacting the corresponding author via e-mail. The P value of HWE in controls was obtained by the chi-square test. We evaluated the methodological quality of studies using the criteria of the Newcastle–Ottawa quality assessment scale (NOS), with a score ranging from one to nine. If the NOS score was less than five, the study was considered to be of poor quality.
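The HWE check in controls can be illustrated with a short sketch. This is not the authors' code: the function name `hwe_chi_square` and the example genotype counts are hypothetical, and the sketch assumes a biallelic A/G locus tested with a one-degree-of-freedom chi-square goodness-of-fit test.

```python
import math


def hwe_chi_square(n_aa, n_ag, n_gg):
    """Chi-square goodness-of-fit test for Hardy-Weinberg equilibrium.

    Takes observed genotype counts in controls (hypothetical inputs) and
    returns (chi2, p_value) for a test with 1 degree of freedom.
    """
    n = n_aa + n_ag + n_gg
    p = (2 * n_aa + n_ag) / (2 * n)  # frequency of the A allele
    q = 1.0 - p                      # frequency of the G allele
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ag, n_gg)
    # Skip cells with zero expectation (monomorphic samples).
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


if __name__ == "__main__":
    chi2, p_value = hwe_chi_square(25, 50, 25)
    print(f"chi2 = {chi2:.3f}, P = {p_value:.3f}")
```

Under the screening criteria described above, a control group whose P value falls below the chosen significance level (commonly 0.05, an assumption here) would be treated as deviating from HWE.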

Heterogeneity and association test

If the I² value (variation in ORs attributable to heterogeneity) was > 50% and the P-value of heterogeneity was < 0.05, we adopted a random-effect model for the test of association. Otherwise, a fixed-effect model was used, owing to the absence of significant interstudy heterogeneity. P-values of association, ORs and 95% CIs (confidence intervals) were calculated for the allelic (G vs. A), carrier (G vs. A), homozygotic (GG vs. AA), heterozygotic (AG vs. AA), dominant (AG + GG vs. AA) and recessive (GG vs. AA + AG) models. In addition, subgroup analyses for race, control source and genotyping method were conducted. Each subgroup analysis was required to include a minimum of three case–control studies to obtain a relatively reliable conclusion.
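The pooling logic described above can be sketched as follows. This is a minimal Python illustration, not the STATA routines used in the study: the function names `log_or` and `pool` are hypothetical, the random-effect estimate assumes the DerSimonian and Laird method, and the heterogeneity P-value (a chi-square test of Q with k − 1 degrees of freedom) is left to a statistics package.

```python
import math


def log_or(a, b, c, d):
    """Log odds ratio and its standard error from a 2x2 table.

    a, b: cases with / without the genotype or allele of interest
    c, d: controls with / without it (all cells assumed non-zero)
    """
    return math.log((a * d) / (b * c)), math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)


def pool(effects, ses):
    """Inverse-variance pooling of per-study log ORs.

    Returns the fixed-effect estimate, Cochran's Q, I^2 (in percent),
    the DerSimonian-Laird tau^2 and the random-effect estimate.
    """
    k = len(effects)
    w = [1.0 / se ** 2 for se in ses]
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = k - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    w_re = [1.0 / (se ** 2 + tau2) for se in ses]
    random = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    return {"fixed": fixed, "q": q, "df": df, "i2": i2,
            "tau2": tau2, "random": random}


if __name__ == "__main__":
    # Hypothetical 2x2 counts for three studies under one genetic model.
    tables = [(120, 80, 100, 100), (60, 40, 70, 60), (200, 150, 180, 170)]
    ys, ses = zip(*(log_or(*t) for t in tables))
    res = pool(list(ys), list(ses))
    print(f"I2 = {res['i2']:.1f}%, fixed OR = {math.exp(res['fixed']):.2f}, "
          f"random OR = {math.exp(res['random']):.2f}")
```

Following the rule stated above, the random-effect estimate would be reported when I² exceeds 50% and the heterogeneity P-value is below 0.05, and the fixed-effect estimate otherwise.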

Publication bias assessment

Begg’s test and Egger’s test were carried out for the quantitative evaluation of potential publication bias. We obtained the P-values for Begg’s test and Egger’s test, together with Begg’s funnel plot (pseudo 95% confidence limits) and Egger’s publication bias plot. A basically symmetrical funnel plot and P-values larger than 0.05 were taken to suggest the absence of significant publication bias.
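As an illustration of the idea behind Egger’s test, the sketch below regresses the standardized effect (log OR divided by its standard error) on precision (the reciprocal of the standard error); an intercept far from zero points to funnel plot asymmetry. The function name `egger_regression` and the input values are hypothetical, and the P-value, which comes from a t distribution with k − 2 degrees of freedom, is not computed here.

```python
import math


def egger_regression(log_ors, ses):
    """Egger's regression: standardized effect versus precision.

    Returns (intercept, se_intercept, df). The ratio intercept / se_intercept
    is the t statistic for asymmetry, with df = k - 2.
    """
    k = len(log_ors)
    if k < 3:
        raise ValueError("Egger's regression needs at least three studies")
    x = [1.0 / s for s in ses]                  # precision
    z = [y / s for y, s in zip(log_ors, ses)]   # standardized effect
    x_bar = sum(x) / k
    z_bar = sum(z) / k
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("all standard errors are identical")
    sxz = sum((xi - x_bar) * (zi - z_bar) for xi, zi in zip(x, z))
    slope = sxz / sxx
    intercept = z_bar - slope * x_bar
    sse = sum((zi - intercept - slope * xi) ** 2 for xi, zi in zip(x, z))
    resid_var = sse / (k - 2)
    se_intercept = math.sqrt(resid_var * (1.0 / k + x_bar ** 2 / sxx))
    return intercept, se_intercept, k - 2


if __name__ == "__main__":
    # Hypothetical per-study log ORs and standard errors.
    ys = [0.10, 0.30, 0.05, 0.25, 0.18]
    ss = [0.10, 0.25, 0.12, 0.40, 0.20]
    b0, se_b0, df = egger_regression(ys, ss)
    print(f"intercept = {b0:.3f} (SE {se_b0:.3f}), df = {df}")
```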

Data sensitivity

We also conducted sensitivity analyses under the above six genetic models. Each case–control study was removed in turn, and an obvious change in the pooled estimates after any removal was taken to indicate a lack of statistical stability. STATA 12.0 software (StataCorp, College Station, USA) was used for the above statistical analysis.
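The sequential-removal procedure can be sketched in a few lines. For brevity this example assumes fixed-effect inverse-variance pooling, whereas the study used a random-effect model for most comparisons; the function names `pooled_or` and `leave_one_out` and the input values are hypothetical.

```python
import math


def pooled_or(log_ors, ses):
    """Fixed-effect pooled OR with its 95% CI (inverse-variance weights)."""
    w = [1.0 / s ** 2 for s in ses]
    est = sum(wi * yi for wi, yi in zip(w, log_ors)) / sum(w)
    se = math.sqrt(1.0 / sum(w))
    return math.exp(est), math.exp(est - 1.96 * se), math.exp(est + 1.96 * se)


def leave_one_out(log_ors, ses):
    """Re-pool the data k times, omitting one study each time."""
    results = []
    for i in range(len(log_ors)):
        rest_y = log_ors[:i] + log_ors[i + 1:]
        rest_s = ses[:i] + ses[i + 1:]
        results.append(pooled_or(rest_y, rest_s))
    return results


if __name__ == "__main__":
    # Hypothetical per-study log ORs and standard errors.
    ys = [0.10, 0.25, 0.05, 0.30]
    ss = [0.10, 0.15, 0.12, 0.20]
    for i, (or_, lo, hi) in enumerate(leave_one_out(ys, ss), start=1):
        print(f"omit study {i}: OR = {or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

If the pooled OR or its confidence interval shifts markedly when a single study is omitted, that study is driving the result and the estimate would be regarded as unstable.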

False-positive report probability test

Following previous studies [15,16,17], a false-positive report probability (FPRP) test was carried out to assess the probability of a true genetic relationship, with an FPRP threshold of 0.2, a power OR of 1.5, and prior probability levels of 0.25, 0.1, 0.01, 0.001, 0.0001 and 0.00001. If the FPRP value was < 0.2 at the prior probability level of 0.1, the association between XPA rs1800975 and cancer risk was considered noteworthy.
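The FPRP calculation can be illustrated with the commonly used formula FPRP = α(1 − π) / [α(1 − π) + (1 − β)π], where α is the observed P-value, 1 − β is the power to detect the chosen OR, and π is the prior probability. The sketch below is not the authors' implementation: the function names are hypothetical, the standard error is recovered from the 95% CI, and the power term uses a normal approximation, which is an assumption.

```python
import math
from statistics import NormalDist

_NORMAL = NormalDist()


def fprp(p_value, power, prior):
    """False-positive report probability for one prior probability."""
    false_pos = p_value * (1.0 - prior)
    return false_pos / (false_pos + power * prior)


def fprp_from_ci(or_hat, lower, upper, prior, or_alt=1.5):
    """FPRP from an OR and its 95% CI (normal approximation, an assumption)."""
    se = (math.log(upper) - math.log(lower)) / (2 * 1.96)
    z = abs(math.log(or_hat)) / se
    p_value = 2.0 * (1.0 - _NORMAL.cdf(z))
    # Two-sided power to detect or_alt when the observed P-value is the alpha level.
    shift = math.log(or_alt) / se
    power = _NORMAL.cdf(shift - z) + _NORMAL.cdf(-shift - z)
    return fprp(p_value, power, prior)


if __name__ == "__main__":
    # Hypothetical OR and CI, evaluated at the prior levels listed above.
    for prior in (0.25, 0.1, 0.01, 0.001, 0.0001, 0.00001):
        value = fprp_from_ci(1.30, 1.10, 1.54, prior)
        print(f"prior = {prior}: FPRP = {value:.3f}")
```

With the threshold described above, an association would be called noteworthy when the value at the prior probability of 0.1 is below 0.2.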

Trial sequential analysis

We applied a trial sequential analysis (TSA) approach to adjust for the risk of random and systematic errors and to estimate the required sample size for pooling, by means of TSA viewer software (Copenhagen Trial Unit, Copenhagen), similar to several reported studies [17,18,19]. The TSA plot with a two-sided boundary type was obtained with a type I error probability of 5%, a statistical power of 80%, and a relative risk reduction of 20%. For the genetic model of AG + GG vs. AA, if the cumulative Z-curve crossed the TSA monitoring boundary and reached the required information size, the result was regarded as robust.

Expression pattern analysis

Based on the dataset of GTEx (Genotype-Tissue Expression) analysis release V8 (dbGaP accession phs000424.v8.p2) [20], we analyzed the expression profile of the XPA gene (ENSG00000136936.10) across multiple tissues, such as heart, brain, lung, stomach and colon. Expression was plotted on a log10 [TPM (Transcripts Per Million) + 1] scale. In addition, we applied the TIMER (Tumor Immune Estimation Resource) approach [21] to compare the expression of the XPA gene between tumor and adjacent normal tissues across all TCGA (The Cancer Genome Atlas) tumors. The Wilcoxon test was used to assess statistical significance. The results were visualized as violin plots or box plots.

The eQTL and sQTL analysis

Based on the dataset of GTEx [20], we also analyzed the “Significant Single-Tissue” eQTL (expression quantitative trait loci) and sQTL (splicing quantitative trait loci) in all tissues, for the XPA gene and the rs1800975 SNP. The values of sample number, NES (Normalized Effect Size), p-value, m-value were obtained. When m-value was larger than 0.9, an eQTL effect was considered [22]. The violin plots of eQTL and sQTL, and multi-tissue eQTL plots of the cross-tissue meta-analysis were provided, respectively. The normalized intron-excision ratio was used for the scale of sQTL.

Results

Enrolled case–control studies

A schematic illustration of eligible case–control study selection is shown in Fig. 1. We initially obtained 400 publications from six databases. Then, duplicate publications were excluded, and the remaining 269 publications were screened. Of them, we further removed 195 publications using our screening criteria. A total of 22 full-text articles were also excluded due to “lack full genotypic data”, “not in line with HWE” or “duplicate or overlapped data”. We finally extracted a total of 71 case–control studies from 52 publications [6,7,8,9,10,11, 23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68] for our integrated analysis. Table 1 lists the main characteristics of the enrolled case–control studies with good methodological quality (NOS score ≥ 5).

Fig. 1 Schematic illustration of case–control identification in our meta-analysis

Table 1 Characteristics of included case–control studies

Overall meta-analysis results

As shown in Table 2, our overall meta-analysis enrolled a total of 71 case–control studies with 19,257 cases and 30,208 controls under the recessive model (GG vs. AA + AG) and 69 case–control studies with 19,039 cases and 29,707 controls under the other genetic models. Because heterogeneity was low under the carrier G vs. A model (Table 2, I² = 22.3%, P = 0.056), a fixed-effects pooling model was used for it, and a random-effects pooling model was applied for the others. For the pooling results shown in Table 2, a statistically significant difference in the susceptibility to cancer between cases and controls was detected under the allelic (P = 0.026, OR = 1.07), carrier (P = 0.009, OR = 1.04) and recessive (P = 0.001, OR = 1.12) genetic models. However, negative results were observed under the other models (Table 2, P > 0.05). Overall, we failed to obtain consistent evidence across the genetic models regarding the relationship between the XPA rs1800975 polymorphism and the risk of cancer in the overall population.

Table 2 Overall meta-analysis and publication bias data

Subgroup analysis results

Next, we conducted a series of subgroup meta-analyses stratified by race, control source and genotyping method. As shown in Table 3, an increased cancer risk in cases was observed compared with negative controls in the Caucasian subgroup analysis under the models of allelic G vs. A (P < 0.001, OR = 1.12), carrier G vs. A (P = 0.001, OR = 1.08), homozygotic GG vs AA (P < 0.001, OR = 1.24), heterozygotic AG vs. AA (P = 0.046, OR = 1.10), dominant AG + GG vs. AA (P = 0.004, OR = 1.16) and recessive GG vs. AA + AG (P < 0.001, OR = 1.16). A similar positive conclusion was detected in the subgroup analysis of the “population-based control, PB” under the allelic, carrier, homozygotic and recessive models (Table 3, P < 0.05, OR > 1). For the PCR-RFLP subgroup analysis, we only observed an increased risk of cancer in the carrier (Table 3, P = 0.016, OR = 1.06) and recessive (P = 0.018, OR = 1.16) models.

Table 3 Subgroup analyses by race, control source and genotyping assay

As shown in Tables 4 and 5, compared with controls, a decreased lung cancer risk was detected in cases under the GG vs AA (P = 0.032, OR = 0.87), AG vs. AA (P = 0.014, OR = 0.86), AG + GG vs. AA (P = 0.021, OR = 0.87) models, but not allelic G vs. A (P = 0.155), carrier G vs. A (P = 0.345), and GG vs. AA + AG (P = 0.755) models. For the subgroup of digestive system cancer, a positive association was detected under the carrier (Table 4, P = 0.013, OR = 1.09) and recessive (Table 5, P = 0.025, OR = 1.26) models. Moreover, we observed an enhanced risk of colorectal cancer under allelic (Table 4, P = 0.021, OR = 1.20), homozygotic (P = 0.007, OR = 1.68), heterozygotic (Table 5, P = 0.041, OR = 1.46), and dominant (P = 0.016, OR = 1.54) conditions, implying the potential effect of the AG genotype of XPA rs1800975 on the risk of colorectal cancer.

Table 4 Subgroup analyses by cancer type under the allelic, carrier and homozygotic models
Table 5 Subgroup analyses by cancer type under the heterozygotic, dominant and recessive models

Interestingly, as shown in Tables 4 and 5, we detected a significant difference between skin cancer cases and controls under the allelic (P < 0.001, OR = 1.17), carrier (P = 0.005, OR = 1.12), homozygotic (P < 0.001, OR = 1.36), heterozygotic (P = 0.029, OR = 1.18), dominant (P = 0.001, OR = 1.27), and recessive (P < 0.001, OR = 1.20) models. There was a similar positive association in the “skin BCC” subgroup under the allelic, carrier, homozygotic, dominant, and recessive models (all P < 0.05, OR > 1). These data suggested that XPA rs1800975 may be associated with a high susceptibility to skin cancer, especially skin BCC.

There were no significant differences between cases and controls in the majority of comparisons (Tables 2, 3, 4, P > 0.05), indicating that XPA rs1800975 does not seem to contribute to the risk of specific cancer types, such as breast cancer, esophageal cancer, gastric cancer, reproductive system cancer, endometrial cancer, or head and neck cancer. Forest plots of subgroup analyses by race (Fig. 2 of allelic model; Additional file 2: Fig. S1 of carrier model; Additional file 3: Fig. S2 of dominant model), control source (Additional file 4: Fig. S3 of allelic model; Additional file 5: Fig. S4 of carrier model; Additional file 6: Fig. S5 of dominant model), and cancer type (Fig. 3 of allelic model; Additional file 7: Fig. S6 of homozygotic model; Additional file 8: Fig. S7 of heterozygotic model; Additional file 9: Fig. S8 of dominant model) are presented as examples.

Fig. 2 Forest plot data of subgroup analysis by race (allelic model)

Fig. 3 Forest plot data of subgroup analysis by cancer type (allelic model)

FPRP and TSA results

To strengthen our results in the subgroup analysis of “lung cancer”, “colorectal cancer”, and “skin cancer”, we performed the FPRP test. As shown in Table 6, under the 0.1 prior probability level, the FPRP value for lung cancer was less than 0.20 under the heterozygotic and dominant models but not the homozygotic model, suggesting the lack of notable associations. We found that the subjects in different populations or the mixed source-based controls were included for the pooling analysis of lung cancer. Considering the above positive results in the subgroup of “Caucasian” and “PB”, we also performed another pooling analysis limited to the Caucasian population. As shown in Additional file 1: Table S2, when we only included the Caucasian subjects for the pooling analysis, we did not observe positive conclusions (all P > 0.05). A similar negative conclusion was further detected in the meta-analysis using PB-based controls in the Caucasian population (Additional file 1: Table S3, P > 0.05). Collectively, this evidence did not support the strong association between lung cancer risk and XPA rs1800975.

Table 6 FPRP values for the association between XPA rs1800975 and the risk of lung, skin, and colorectal cancers

With regard to colorectal cancer, the FPRP value was less than 0.20 only in the allelic and homozygotic models, under the prior probability level of 0.1 (Table 6). Only three case–control studies [36, 40, 43] in the Caucasian population were included in the pooling analysis. After removing one study with the HB-based control [36], only two studies with 460 cases and 921 controls remained for the pooling analysis (Additional file 1: Table S3). Although we observed an increased risk of colorectal cancer under the homozygotic, heterozygotic and dominant models (Additional file 1: Table S3, P < 0.05, OR > 1), this comparison does not meet our minimum requirement of at least three case–control studies for a pooling analysis. We therefore cannot draw a reliable conclusion regarding the potential link between XPA rs1800975 and colorectal cancer risk.

As shown in Table 6, under the 0.1 prior probability level, the FPRP values for skin cancer were all less than 0.20, confirming notable associations. Caucasian subjects and PB-based controls were enrolled in all case–control studies. We further performed the TSA test, and the TSA plot in Fig. 4 shows that the cumulative Z-curve of the dominant model crossed both the TSA monitoring boundary and the required information size line, suggesting a credible conclusion regarding the association between XPA rs1800975 and skin cancer susceptibility.

Fig. 4 TSA plot for skin cancer under the dominant model

Publication bias and sensitivity analysis results

For the evaluation of publication bias, the two-sided P-values of Begg’s and Egger’s tests were > 0.05 (Table 2), and the funnel plots showed no obvious asymmetry under each genetic model (Fig. 5a, b show the plots of the allelic model as examples), suggesting no evidence of large publication bias in the pooling analysis mentioned above. In addition, our leave-one-out sensitivity analysis did not detect large changes in the ORs and 95% CIs (Fig. 5c shows the allelic model as an example).

Fig. 5 Publication bias and sensitivity analysis (allelic model). a Begg’s test data; b Egger’s test data; c sensitivity analysis data

The eQTL and sQTL analysis results

Finally, based on GTEx datasets, we analyzed the expression profile of the XPA gene in different tissues, and the correlation between the gene expression and rs1800975 SNP of XPA. As shown in Additional file 10: Fig. S9, the XPA gene is expressed in various tissues, such as the brain, colon, esophagus, lung and skin tissues, suggesting a low tissue specificity. Based on the “Significant Single-Tissue” eQTL data (Fig. 6), we observed a potential association between XPA gene expression and the rs1800975 SNP in the tissues of artery aorta (P-value = 1.8e−9), artery tibial (P-value = 1.55e−6), esophagus muscularis (P-value = 3.59e−9) and muscle skeletal (P-value = 6.39e−12), but not in skin tissue, either “not sun exposed (suprapubic)” (P-value = 7.87e−1) or “sun exposed (lower leg)” (P-value = 5.16e−1). The data of multi-tissue eQTL comparison also suggested that four tissues (artery aorta, artery tibial, esophagus muscularis, muscle skeletal) were predicted to have an eQTL effect (Fig. 7, all m-value = 1.00). Cross-tissue meta-analysis further showed a potential overall correlation between gene expression and rs1800975 SNP of XPA (Fig. 7, P-value = 3.07e−50). In addition, our sQTL data further showed a potential association between rs1800975 SNP and the splicing changes of the XPA gene in the thyroid tissue (Fig. 8).

Fig. 6 Violin plots of eQTL across multiple tissues of GTEx project

Fig. 7 Multi-tissue eQTL plots of cross-tissue meta-analysis (GTEx)

Fig. 8 Violin plot of sQTL in the thyroid tissue of GTEx project

Discussion

Although we observed a group of publications regarding the influence of XPA rs1800975 on the risk of certain specific cancers, such as lung cancer [69, 70], head and neck cancer [71], breast cancer [72], and digestive system cancer [73, 74], the evaluation strategies, study number and statistical power differed. We were interested in comprehensively exploring the impact of XPA rs1800975 on overall cancer susceptibility by pooling all currently available evidence. To date, there are only two reported meta-analyses from 2012 [12, 13] describing the association between XPA rs1800975 and susceptibility to overall cancer diseases. In the current study, we searched six online electronic databases, including PubMed, EMBASE, Cochrane, CNKI, WANFANG and VIP, with the last retrieval on April 8, 2020, to include a total of 71 case–control studies. Based on six genetic models (allelic, carrier, homozygotic, heterozygotic, dominant and recessive), a series of overall meta-analyses and subgroup analyses using the factors of race, control source and genotyping method, were used to scientifically assess the association between XPA rs1800975 polymorphism and the risk of cancer. Additionally, Begg’s test and Egger’s test, sensitivity analysis, FPRP analysis and TSA test were conducted.

In 2012, Ding et al. included a total of thirty-six case–control or case-cohort studies from twenty-eight publications to conduct a meta-analysis for the genetic effect of XPA rs1800975 on the susceptibility to overall cancer [13]. They did not detect a positive conclusion in the overall meta-analysis but a significant difference between controls and cases in the “lung cancer” subgroup analysis under the homozygotic and recessive models, the “Asian” subgroup in the dominant models, and the “skin cancer” subgroup in the homozygotic, heterozygotic, dominant and recessive models. In our updated meta-analysis, we excluded three publications in which the genotypic distribution of the control group was not in line with the HWE principle [75,76,77] and one publication related to oral premalignant lesions [78]. We also replaced one publication [79] with another one [67]. In addition, we added a total of twenty-eight publications for our new pooled analysis. In 2012, Liu et al. included twenty-four publications to conduct another meta-analysis and reported an increased colorectal cancer risk under the homozygotic and dominant models but a decreased susceptibility to lung cancer under the homozygotic and dominant models [12]. In the present study, we removed two publications owing to HWE [75, 77] and added another thirty new publications for our updated integrative analysis.

Our new findings showed a positive conclusion in the overall meta-analysis only under the carrier and recessive models, and in the “Caucasian” subgroup analysis under each model. We failed to detect a significant difference between cases and controls in the Asian population. Differences in sample size may contribute to the inconsistency with the data of Ding et al. [13].

Additionally, we detected a decreased lung cancer risk in cases under the GG vs. AA, AG vs. AA and AG + GG vs. AA models but an increased risk of colorectal cancer under the allelic, homozygotic, heterozygotic and dominant models, indicating the possible effect of the AG genotype of XPA rs1800975 on the susceptibility to colorectal cancer. These findings are partly in line with the conclusions of the above prior meta-analyses [12, 13]. Nevertheless, our data from FPRP analysis and another pooling analysis with only the population-based controls in the Caucasian population did not strongly support a role of the G allele within the XPA rs1800975 polymorphism in the risk of lung or colorectal cancer. Our data from the pooling analysis, FPRP analysis and TSA demonstrated a significant difference between skin cancer cases and negative controls under all six genetic models, suggesting the contribution of the G allele within XPA rs1800975 to an enhanced susceptibility to skin cancer. Our eQTL and sQTL analysis data of GTEx showed that XPA rs1800975 might not be associated with the gene expression or splicing changes of XPA in the skin tissue, suggesting the existence of other molecular mechanisms.

Our pooling analysis has several strengths. No case–control study of poor quality was enrolled. We also excluded studies in which the genotypic distribution in the control group was not in Hardy–Weinberg equilibrium. In addition, both the absence of large publication bias and the stability of the pooled data were observed in all comparisons.

Our analyses also have several limitations that need to be discussed. First, fewer than ten case–control studies were enrolled in some comparisons, such as the subgroup meta-analyses of “breast cancer”, “gastric cancer”, “colorectal cancer”, “endometrial cancer”, “head and neck cancer”, and “skin cancer”. For the same reason, several comparisons, such as subgroup analyses of “oral cancer” or “skin SCC”, were not carried out. In addition, high heterogeneity was present, and the “random-effect with DerSimonian and Laird method” was set in the overall meta-analyses under the allelic, homozygotic, heterozygotic, dominant and recessive models. Between-study heterogeneity was lower in some “Caucasian” subgroup comparisons (data not shown), indicating that ethnicity may be a source of heterogeneity.

After investigating the expression difference of the XPA gene between tumor and adjacent normal tissues in the TCGA project (Additional file 11: Fig. S10), we observed a higher expression level of XPA in the tissues of CHOL (Cholangiocarcinoma, P < 0.001) and LIHC (Liver hepatocellular carcinoma, P < 0.001), but a lower level in the tissues of BLCA (Bladder Urothelial Carcinoma), BRCA (Breast invasive carcinoma), KICH (Kidney Chromophobe), KIRC (Kidney renal clear cell carcinoma), KIRP (Kidney renal papillary cell carcinoma), LUAD (Lung adenocarcinoma), LUSC (Lung squamous cell carcinoma), READ (Rectum adenocarcinoma), THCA (Thyroid carcinoma), and UCEC (Uterine Corpus Endometrial Carcinoma) (all P < 0.05), compared with the corresponding control tissues. Apart from that, we predicted that the tissues of artery aorta, artery tibial, esophagus muscularis and muscle skeletal have an eQTL effect, while the thyroid tissue has an sQTL effect. Thus, it is meaningful to explore the potential genetic influence of all XPA genetic variants or the combined variants of XPA and other relevant genes (such as xeroderma pigmentosum group D, XPD) in the pathogenesis of the above tumors and of arterial or muscular system-related diseases. Larger sample sizes are warranted, and factors such as age, sex, smoking, drinking and therapy should be adjusted for.

Conclusions

To summarize, our comprehensive integrative analysis data demonstrated statistical evidence on the association between the XPA rs1800975 A/G polymorphism and susceptibility to skin cancer, especially skin BCC, in the Caucasian population. The enrollment of more case–control studies following the HWE principle in diverse ethnicities will help researchers to further verify the potential genetic role of the XPA rs1800975 polymorphism in the risk of lung or colorectal cancer.