Background

Gastrointestinal stromal tumors, or GISTs, are soft tissue sarcomas that develop from mesenchymal connective tissue anywhere in the gastrointestinal tract, though they most frequently appear in the stomach or small intestine [1–3]. Recent advances in molecular biology have revealed a distinct subset of these tumors that express tyrosine kinase receptors or platelet-derived growth factor receptors [4–6]. The presence of these receptors, encoded by the KIT or PDGFRA oncogenes, respectively, is an indication that these tumors share a common origin in the interstitial cells of Cajal, the pacemaker cells of the gut. The Cajal cells normally express both CD117, the immunohistochemical (IHC) marker for tyrosine kinase receptors, and CD34, the IHC marker for platelet-derived growth factor receptors, and GISTs likely develop from Cajal cells with acquired gain-of-function KIT or PDGFRA mutations [4, 7, 8]. KIT exon 11 (50-60% of cases) and KIT exon 9 (5-10% of cases) are the most common mutation sites [2, 9, 10]. Such mutations can enable receptor activation in the absence of normal stem cell factor signaling mechanisms, thereby over-stimulating cell proliferation and leading to tumor development. Based on this discovery, a 2001 National Institutes of Health consensus panel [8] agreed to formally define GISTs as mesenchymal neoplasms of the gastrointestinal tract displaying positivity for CD117 or CD34, with some exceptions allowed for immunonegative tumors with otherwise consistent histology.

These relatively new and complex diagnostic criteria make disease surveillance and etiologic study difficult. In a recent evaluation of patients in the National Cancer Institute’s Surveillance, Epidemiology and End Results database, Rubin et al. [11] estimated an annual age-adjusted incidence rate of 0.32 cases per 100,000 individuals in the United States (US). The rarity of the disease makes it a difficult subject for population-based research or for prompt and unbiased assessment of non-genetic risk factors in any study population. An evaluation of the genetic determinants of GIST is much more feasible, as the germline DNA of individuals does not change over time or in response to disease processes. To date, no other research groups have published such evaluations.

To help fill this knowledge gap, we investigated the genetic determinants of GIST in a case-only analysis. Specifically, we examined the associations between select single nucleotide polymorphisms (SNPs), which are inherited variations in individuals’ DNA, and the several common types of acquired KIT and PDGFRA mutations present in GIST tissue. Certain susceptibility loci have been linked to characteristic mutations, or mutational “signatures”, in other cancers. These include associations between GSTM1-null genotype and TP53 transversion mutations among bladder cancer patients [12], and certain functional polymorphisms in XPD and G:C→T:A TP53 mutations among lung cancer patients [13]. Similarly, we hypothesized that the characteristic somatic mutations in the KIT and PDGFRA genes in GIST tumors may be mutational signatures that are causally linked to specific mutagens or susceptibility loci. As such, identifying risk factors for the individual tumor subtypes may be fundamental to understanding the disease.

We conducted our evaluation in two phases. The first phase included genes previously linked to soft tissue sarcoma or to environmental risk factors for soft tissue sarcoma, such as dioxins, phenoxyherbicides, insecticides, and vinyl chloride [14–18], as well as genes previously linked to mutational signatures in other cancers [12, 13, 19–21]. We found that several SNPs were associated with GIST tumor subtypes, including SNPs on two xenobiotic metabolizing genes, CYP1B1 and GSTM1, and two DNA repair genes, RAD23B and ERCC2 [22].

The present report includes results from the second phase of the study, in which we examined the relationship between these 7 somatic mutation categories and 522 additional candidate SNPs. These SNPs are located on genes that play secondary or less well-understood roles in dioxin response or toxin-metabolizing pathways, as well as SNPs on PDGFRA and 10 matrix metalloproteinase (MMP) genes, which are often over-expressed in GISTs and other soft tissue sarcomas and may be linked to tumor invasion and metastasis [23–25]. Based on previous evidence that immune and inflammation-related genes are associated with osteosarcomas and other gastrointestinal tumors [26–28], we also selected certain SNPs from the GeneChip® Human Immune and Inflammation SNP Kit genotyping panel designed by Affymetrix® [29, 30].

Our main objective for these analyses was to identify genes or gene pathways potentially related to GIST carcinogenesis. Therefore, in addition to assessing the effect of each individual SNP, we also assessed the joint effects of SNPs in the same functional categories. By conducting these exploratory analyses with a large and diverse gene panel, our goal is to identify specific variants, genes, or functional pathways meriting further investigation.

Methods

Study population

The study population consists of the first 279 individuals from the American College of Surgeons Oncology Group (ACOSOG) Z9001 clinical trial who provided blood and tumor tissue samples for ancillary research and had sufficient tumor tissue for mutation analysis. The Z9001 trial was a multicenter, Phase III, randomized, double-blind study of adjuvant imatinib (Gleevec™; Novartis Pharmaceuticals) versus placebo. To be eligible for the clinical trial, cases had to have a CD117+, resected, localized, primary GIST of at least 3 cm diagnosed between July 1, 2002 and April 18, 2007. IHC staining for CD117 was completed using the Dako antibody (DakoCytomation, Copenhagen, Denmark). Additional information on the Z9001 trial is published elsewhere [31]. Institutional Review Boards at all participating institutions approved this ancillary study and all of the included participants consented to the use of their blood and tissue specimens.

SNP selection

Among the target genes described above, we identified SNPs within 2000 or 500 base pairs of the 5′ and 3′ ends of the coding regions, respectively, that could affect gene function and had at least a 10% minor allele frequency (MAF) in the HapMap CEU population [32]. These included nonsense, missense and splice site mutations, as well as SNPs overlapping with microRNA seed regions or transcription factor binding sites. If a selected SNP did not meet design phase quality control standards (designability score <1 or final score <0.7), we selected a surrogate SNP in high linkage disequilibrium with the desired SNP. SNPs related to dioxin response or toxin metabolism (final n = 68), matrix metalloproteinases (final n = 24), or on PDGFRA (final n = 4) were selected in this manner.

We selected the remaining SNPs (final n = 426) from a pre-existing Affymetrix® panel designed to include potentially functional non-synonymous SNPs from 318 genes related to immune and inflammatory response. Genes were selected for inclusion based on their gene ontology (GO) categorization [33]. GO categories were also used to separate the genes into functional subgroups (see Additional file 1: Table S1).

Lab assays

DNA for mutation analysis was extracted from snap-frozen tumor tissue and tested for KIT exon 11 mutations using polymerase chain reaction (PCR) analysis (Platinum Taq DNA Polymerase High Fidelity; Life Technologies, Inc., Gaithersburg, MD). The PCR conditions were as follows: 1) 94°C for 4 min; 2) 94°C for 30 sec; 3) the relevant annealing temperature for each primer set for 30 sec; 4) 72°C for 30 sec (steps 2 to 4 repeated for 35 cycles); and 5) 72°C for 3 min. The PCR products were identified by agarose gel electrophoresis using a 2% MetaPhor™ agarose gel (BioWhittaker Applications, Rockland, ME). The PCR products were purified with the QIAquick™ PCR Purification Kit (Qiagen Inc., Valencia, CA) before sequencing. The sequencing reactions for each case were performed in both the forward and reverse directions. Tumors lacking exon 11 mutations were genotyped for mutations in KIT exons 9, 13, 14 and 17 and PDGFRA exons 12 and 18. A more detailed description of these assays can be found elsewhere [34, 35].

An initial 544 candidate SNPs were genotyped using the GoldenGate genotyping assay (Illumina Inc., San Diego, CA). Briefly, allele-specific oligos were hybridized directly to genomic DNA extracted from the blood samples. The hybridized DNA was extended and ligated to downstream locus-specific oligos and then amplified using fluorescently labeled universal PCR primers and allele-specific primers [30, 36]. After the resulting products were hybridized to their complementary bead types, the arrays were assessed using the BeadArray™ Reader.

Twenty-seven participants underwent duplicate genotype analysis for quality assurance purposes. Concordance for duplicate samples was 99.9%. After excluding SNPs that were mono-allelic (n = 3), had >5% missing data (n = 3), or showed poor clustering (n = 16) among our study subjects, we had 522 evaluable SNPs, as listed above. We retained three SNPs that showed evidence of possible copy number variation, but designated them accordingly.

Statistical analysis

We conducted descriptive analyses of selected demographic variables, tumor characteristics and genotypes. As this population includes some non-white participants, we calculated race-specific MAFs and compared genotype distributions across racial groups (white vs. other) using a Pearson χ² test of association. Fisher’s exact test was used if one or more cells had fewer than 5 observations.
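The test-selection rule above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names and the example counts are hypothetical, and the table is assumed to be a 2 × 3 layout (racial group by genotype), for which the χ² p-value with 2 degrees of freedom has the closed form exp(−χ²/2). Fisher's exact test itself is not implemented here; the sketch only flags when it would be required.

```python
import math

def choose_test(table, min_count=5):
    """Apply the stated rule: use Fisher's exact test if any observed
    cell count is below `min_count`, otherwise Pearson's chi-square."""
    if any(cell < min_count for row in table for cell in row):
        return "fisher_exact"
    return "pearson_chi2"

def pearson_chi2_2x3(table):
    """Pearson chi-square statistic and p-value for a 2 x 3 table.

    Assumes every row and column total is positive (call this only when
    choose_test() returns "pearson_chi2"). With 2 degrees of freedom the
    chi-square survival function is exp(-x / 2).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat, math.exp(-stat / 2)

# Hypothetical genotype counts (0, 1, 2 minor alleles) for white vs. other.
counts = [[120, 90, 20], [20, 22, 7]]
if choose_test(counts) == "pearson_chi2":
    stat, p_value = pearson_chi2_2x3(counts)
```

The closed-form p-value holds only for 2 degrees of freedom; a table with a different shape would need a general χ² distribution function.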

We categorized each individual’s tumor based on the presence or absence of each of the following outcomes: i) a deletion of KIT exon 11 codons 557–558, ii) any other (i.e. non-codon 557–8) deletion in KIT exon 11, iii) a KIT exon 11 insertion, iv) a KIT exon 11 point mutation, v) a KIT exon 9, exon 13, exon 14, or exon 17 mutation, vi) a PDGFRA exon 12 or 18 mutation, and vii) no KIT or PDGFRA mutation (wild type). KIT mutations in exons 9, 13, 14 and 17 were too rare for independent evaluation.

We obtained odds ratios (ORs), 95% confidence intervals (CIs) and p-values for each SNP-mutation combination using logistic regression. All models were adjusted for race, sex, and age at diagnosis. We assumed additive genetic models, denoting whether an individual had 0, 1 or 2 copies of the minor allele. P-values were calculated using trend tests and were corrected for multiple testing by controlling the false discovery rate at 25%. This method is less conservative than a Bonferroni approach and is thus better suited for a hypothesis-generating study such as this [37].
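For readers unfamiliar with false discovery rate control, the following is a minimal sketch of one common procedure, the Benjamini-Hochberg step-up method. The text does not name the exact procedure used, so treating it as Benjamini-Hochberg is an assumption, and the function name and example p-values are hypothetical.

```python
def benjamini_hochberg(p_values, fdr=0.25):
    """Return a list of booleans marking which tests are declared
    significant under the Benjamini-Hochberg step-up procedure
    (an assumed choice; the source only states FDR control at 25%)."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= (k / m) * fdr.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            cutoff_rank = rank
    # Every test ranked at or below the cutoff is rejected.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[idx] = True
    return rejected

# Hypothetical p-values for four SNP-mutation tests.
flags = benjamini_hochberg([0.001, 0.2, 0.03, 0.8], fdr=0.25)
```

With these four hypothetical p-values the thresholds are 0.0625, 0.125, 0.1875 and 0.25, so only the two smallest p-values are flagged. A Bonferroni threshold of 0.05/4 = 0.0125 would flag only the first, which illustrates why the FDR approach is the less conservative of the two.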

We used a sequence kernel association test (SKAT) to assess the joint effect of a group of SNPs on an outcome [38, 39]. We grouped SNPs according to their functional category, as described above. This method is well-powered to detect associations when SNPs in a group are correlated with one another but only moderately associated with the outcome. Briefly, for individual $i$, the log odds of having the outcome given genotypes $z_{i1}$ to $z_{ip}$ and covariates $x_{i1}$ to $x_{im}$ is modeled semi-parametrically using a logistic kernel-machine regression model:

$$\operatorname{logit} P(y_i = 1) = \alpha_0 + \alpha_1 x_{i1} + \cdots + \alpha_m x_{im} + h(z_{i1}, z_{i2}, \ldots, z_{ip}),$$

where $h(Z_i)$ is a function of a positive definite kernel function, $K(\cdot,\cdot)$, and some $\gamma_1, \ldots, \gamma_n$:

$$h(Z_i) = \sum_{i'=1}^{n} \gamma_{i'} K(Z_i, Z_{i'})$$

We used an identity-by-state kernel, $K(Z_i, Z_{i'}) = \sum_{j=1}^{p} \left(2 - |Z_{ij} - Z_{i'j}|\right)$, as this does not require linearity assumptions and allows for epistasis [39]. Assuming $h$ follows an arbitrary distribution with a mean of 0 and variance $\tau K$, testing the null hypothesis $H_0: h(Z) = 0$ is equivalent to testing $H_0: \tau = 0$. This is accomplished using a modified variance-component score statistic:

$$Q = \frac{(y - \hat{p}_0)' K (y - \hat{p}_0)}{2},$$

where

$$\operatorname{logit} \hat{p}_{0i} = \hat{\alpha}_0 + \hat{\alpha}_1 x_{i1} + \hat{\alpha}_2 x_{i2} + \cdots + \hat{\alpha}_m x_{im}.$$

Here, $Q$ is comparable to a $\chi^2$ distribution with scale parameter $\kappa$ and $\nu$ degrees of freedom, both of which are modified to account for correlation between SNPs in the same SNP-set (for calculations, see Appendix A in Wu et al. [38]).
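To make the score statistic concrete, the sketch below computes an identity-by-state kernel matrix and the resulting Q for a toy data set. It is an illustration under stated assumptions rather than the study's implementation: the kernel is taken as unweighted and unnormalized, the fitted null probabilities are supplied directly instead of being estimated from a covariate-only logistic model, and all names and values are hypothetical. The scaled χ² reference distribution used to obtain the p-value is not implemented.

```python
def ibs_kernel(genotypes):
    """Identity-by-state kernel matrix.

    genotypes[i][j] is the minor-allele count (0, 1 or 2) of individual i
    at SNP j. Entry K[i][k] is the sum over SNPs of 2 - |z_ij - z_kj|,
    i.e. the number of alleles the two individuals share by state.
    """
    n = len(genotypes)
    return [
        [sum(2 - abs(a - b) for a, b in zip(genotypes[i], genotypes[k]))
         for k in range(n)]
        for i in range(n)
    ]

def skat_q(y, p0, kernel):
    """Variance-component score statistic Q = (y - p0)' K (y - p0) / 2.

    y holds the 0/1 outcomes and p0 the fitted probabilities from the
    null model (covariates only), assumed here to be already available.
    """
    resid = [yi - pi for yi, pi in zip(y, p0)]
    n = len(resid)
    quad = sum(resid[i] * kernel[i][k] * resid[k]
               for i in range(n) for k in range(n))
    return quad / 2

# Toy example: three individuals, two SNPs, hypothetical null fit of 0.5.
K = ibs_kernel([[0, 1], [2, 1], [0, 0]])
q_stat = skat_q([1, 0, 1], [0.5, 0.5, 0.5], K)
```

In this toy example individuals with the same outcome also share more alleles, so their residuals reinforce each other in the quadratic form. Larger values of Q indicate that genotype similarity tracks outcome similarity more strongly than the null model predicts.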

Results

Males and females were approximately equally represented in our study population, while 18% were non-white (Table 1). Median age at diagnosis was 58 years (range 18–85), though non-white participants tended to be younger (median age = 53 years). Non-white participants were also more likely to have a smaller tumor than white participants (median tumor sizes of 6.0 cm versus 6.5 cm), and more likely to have stomach tumors (74% versus 64%). Most patients had mutations in KIT exon 11 (70% overall), the largest proportion of which were codon 557–558 deletions (34% of exon 11 mutations). Demographic and tumor characteristics were very similar for males and females.

Table 1 Demographic information and tumor characteristics of patients included in genotyping ancillary study

Compared with the larger ACOSOG Z9001 cohort, whites were somewhat over-represented in this ancillary study, which had otherwise similar characteristics (data not shown). Race-stratified MAF and association p-values for all 522 SNPs are displayed in Additional file 1: Table S2. As expected based on genotype distributions in ethnically diverse HapMap populations [32], genotype distributions in this study differed by race for many of the candidate polymorphisms.

The top 5 SNPs for each mutation subtype are displayed in Table 2. Only one SNP, rs1716 on ITGAE, was statistically significant after adjusting for multiple comparisons. This SNP was associated with KIT exon 11 non-codon 557–8 deletions (OR = 2.86, 95% CI: 1.71, 4.78; p = 6.4 × 10⁻⁵).
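As a rough consistency check on the reported estimate, the log odds ratio and its standard error can be recovered from the OR and the confidence limits. The sketch below assumes a Wald-type interval on the log-odds scale and a normal approximation for the p-value; the text states only that p-values came from trend tests, so this is an approximation, and the function name is hypothetical.

```python
import math

def wald_from_ci(odds_ratio, ci_low, ci_high, z_crit=1.96):
    """Back-calculate beta, SE, z and a two-sided p-value from an OR and
    its 95% CI, assuming a symmetric Wald interval on the log scale."""
    beta = math.log(odds_ratio)
    # The CI spans 2 * z_crit standard errors on the log-odds scale.
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = beta / se
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return beta, se, z, p_value

# Reported estimate for rs1716 and KIT exon 11 non-codon 557-8 deletions.
beta, se, z, p_value = wald_from_ci(2.86, 1.71, 4.78)
```

Under these assumptions z is close to 4 and the p-value is on the order of 6 × 10⁻⁵, in line with the reported value once rounding of the published OR and CI is taken into account.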

Table 2 Odds ratios (ORs) for top 5 SNP-mutation associations, by mutation type

Though not statistically significant after adjusting for multiple comparisons, rs3024498 (IL10) and rs1050783 (F13A1) were strongly associated with PDGFRA mutations (OR = 0.31, 95% CI: 0.16, 0.60 and OR = 0.31, 95% CI: 0.16, 0.61, respectively) and rs2071888 (TAPBP) was strongly associated with wild type tumors (OR = 0.37, 95% CI: 0.20, 0.67). Additionally, several SNPs in matrix metalloproteinase genes were associated with tumor subtypes. MMP10 and MMP1 SNPs were associated with KIT exon 11 codon 557–558 deletions, 2 MMP7 SNPs were associated with KIT exon 11 point mutations and an MMP1 SNP was associated with KIT exon 9, 13, 14, or 17 mutations. Effect estimates and p-values for all 522 SNPs can be found in Additional file 1: Table S3. The relative magnitude of all SNP-subtype associations is depicted in Figure 1.

Figure 1 Log p-values for the association between each candidate SNP and tumor mutation type.

Despite strong SNP-level effects, the MMP pathway was not associated with any of the tumor subtypes in the SKAT analyses (minimum p-value = 0.2 for KIT exon 11 point mutations; Additional file 1: Table S4). As seen in Table 3, the strongest pathway-level associations were in relation to somatic mutations in PDGFRA. These included defense response (p = 0.005), negative regulation of immune response (p = 0.01), protein phosphorylation (p = 0.02), positive regulation of immune response (p = 0.03), and AHR/dioxin response (p = 0.03). Additionally, negative regulation of cell proliferation was associated with PDGFRA mutations (p = 0.04; Additional file 1: Table S4). In total, only 5 other pathways were associated with a tumor subtype at p < 0.05. These were AHR/dioxin response with non-exon 11 KIT mutations (p = 0.01), humoral immune response with wild type tumors (p = 0.02), and response to stress, negative regulation of apoptosis, and protein tyrosine kinase activity with non-codon 557–8 KIT exon 11 deletions (p = 0.02, p = 0.03, and p = 0.04, respectively). No pathways were statistically significant after correcting for multiple testing. Log p-values for all pathway analyses can be seen in Figure 2.

Table 3 SKAT p-values for top 5 functional pathway-mutation associations, by mutation type

Figure 2 SKAT log p-values for the association between each functional pathway and tumor mutation type.

Discussion

In this exploratory analysis of genetic risk factors for GIST tumor subtypes, we identified one statistically significant association and a number of other potentially important associations for individual polymorphisms. We also identified several potentially relevant functional pathways. These novel findings offer clues about the etiology of these rare and poorly understood tumors.

The SNP with the strongest association with a tumor subtype was rs1716 on ITGAE. This SNP results in a missense mutation on ITGAE (also known as CD103), a gene involved in protein tyrosine phosphatase activity. This SNP was previously associated with increased risk of melanoma [40], as was another SNP in the gene. The CD103 protein is commonly expressed in intraepithelial lymphocytic T cells and hairy cell leukemia cells [41, 42].

The IL10 SNP associated with PDGFRA mutations, rs3024498, is located in a microRNA seed region. One previous study found an association between rs3024498 and colorectal cancer [43]. The IL10 gene encodes a cytokine that plays a role in immunoregulation and inflammation, and has been previously linked to several cancers, including osteosarcoma [26], cervical cancer [44], and gastric cancer [45, 46].

rs1050783 in F13A1 is also in a microRNA seed region, but neither the SNP nor the gene has been previously linked to cancer. The same is true for rs2071888 in TAPBP, a missense mutation. As noted above, several studies have observed over-expression of matrix metalloproteinase genes in GISTs and other soft tissue sarcomas [23–25], though none of the evaluated SNPs have previously been associated with cancer risk.

Pathway analyses are common in cancer epidemiology, but there is little consistency in how the pathways are defined. We selected the well-documented and publicly available Gene Ontology [33] classification system to facilitate replication and follow-up studies. Although we did not identify any studies that examined the specific pathways included in the present analyses, numerous studies have observed associations of inflammatory or immune response genes with risk of sarcomas or gastrointestinal cancers [26, 28, 30, 47].

The present exploratory study was undertaken in light of evidence suggesting that some mutagens and susceptibility loci are associated with specific mutational “signatures,” i.e., characteristic mutation patterns [12, 13, 19–21]. At this point, GIST etiology has been insufficiently researched, and we do not know how well the mutation-based tumor classifications used in this study correspond to distinct carcinogenic processes. However, the existence of multiple types of mutations suggests that more than one mutagenic process could be involved, and we believe that identifying associations between germline genetic polymorphisms and unique tumor phenotypes could contribute valuable new information about disease etiology. This information could also help to elucidate environmental risk factors for this disease.

Because tumors with KIT exon 11 mutations were not assessed for other KIT or PDGFRA mutations, some tumors may be misclassified, though evidence from population-based studies suggests that few GISTs have more than one mutation type [2, 9]. Our study participants had similar mutation profiles to the individuals included in these population-based investigations, but the results from this predominantly white clinical trial population may not be generalizable to all GIST patients. Lastly, this study had a small sample size. As such, we had limited power to detect true associations, particularly when the evaluated genotype and mutation type were rare.

Conclusions

In this novel study of genetic risk factors for GIST, we identified several SNPs and gene pathways associated with GIST mutation subtypes. This included SNPs involved in dioxin response, toxin metabolism, matrix metalloproteinase synthesis, and inflammatory or immune response. While only a single SNP was statistically significant after correcting for multiple comparisons, our overall findings provide an important starting point for future studies of genetic and environmental risk factors for this rare and poorly understood disease.