Background

Despite the recent decline in mortality rates from gastric cancer in North America and in most of Northern and Western Europe, stomach cancer remains one of the major causes of death worldwide and is common in Japan, Korea, Chile, Costa Rica, the Russian Federation and other countries of the former Soviet Union [1]. Despite improvements in treatment modalities and screening, the prognosis of patients with gastric adenocarcinoma remains poor [2]. To understand the pathogenesis and to develop new therapeutic strategies, it is essential to dissect the molecular mechanisms that regulate the progression of gastric cancer, in particular the oncogenic mechanisms that can be targeted by personalized medicine.

The term "oncogene addiction" was introduced by Weinstein to describe cancer cells that are highly dependent on a given oncogene or oncogenic pathway [3, 4]. The concept underpins the development of targeted therapies, which attempt to inactivate an oncogene critical to the survival of cancer cells whilst sparing normal cells that are not similarly addicted.

Several oncogenes activated at high frequency in other cancers have also been shown to be mutated in gastric cancer. It follows that marketed therapeutics targeting these oncogenes would effectively treat a proportion of gastric carcinomas, either as single agents or in combination. In January 2010, trastuzumab was approved in combination with chemotherapy for the first-line treatment of ERBB2-positive advanced and metastatic gastric cancer. Trastuzumab is the first targeted agent to be approved for the treatment of gastric carcinoma, and an increase of 12.8% in response rate was seen with the addition of trastuzumab to chemotherapy in ERBB2-positive gastric adenocarcinoma [5, 6]. It has been estimated that 2-27% of gastric cancers harbour ERBB2 amplifications and may be treated with ERBB2 inhibitors [7, 8]. Similarly, overexpression of another receptor tyrosine kinase (RTK), EGFR, has been noted in gastric cancer and multiple trials of EGFR inhibitors in this cancer type are ongoing (reviewed in [9, 10]). Furthermore, some gastric cancers harbour DNA amplification or overexpression of the RTK MET [11, 12] and its paralogue MST1R [13] and may be treated with MET or MST1R inhibitors [14-20]. Finally, FGFR2 overexpression and amplification have been observed in a small proportion of gastric cancers (scirrhous) [21] and inhibitors have shown some efficacy in the clinic [22].

Downstream of the RTKs, amplification of wildtype KRAS and KRAS mutation have also been found in about 9-15% of gastric cancers [23, 24], and such tumours may be effectively treated with MEK inhibitors [25, 26]. Activation of the PI3K/AKT/mTOR pathway has also been seen in 4-16% of gastric cancers [27-30], which may therefore be sensitive to PI3K inhibitors [31-34]. Similarly, the cell cycle kinase AURKA has been shown to be activated in gastric cancer [35, 36] and AURKA inhibitors in clinical development [37] may have clinical benefit.

Reports of the frequency of different types of oncogenic activation and their co-occurrence are limited. In contrast to gastrointestinal stromal tumours (GIST), which are characterized by a high frequency of KIT and PDGFRA activation [38] and hence are effectively treated in the majority of cases by imatinib and sunitinib [39, 40], gastric adenocarcinoma appears to be a molecularly heterogeneous disease with no high-frequency oncogenic perturbation discovered thus far. This is illustrated by a recent survey of somatic mutation in kinase coding genes across 14 gastric cancer cell lines and three gastric cancer tissues, which discovered more than 300 novel kinase single nucleotide variations and kinase-related structural variants. However, no very frequently recurrent mutation or mutated kinase was uncovered [41].

With the aim of elucidating the potential for treatment of gastric carcinoma with targeted therapies either on the market, in development or to be discovered, we have characterized clinical gastric carcinoma samples to detect oncogene activation.

We took a global approach by assaying the samples on Affymetrix SNP arrays and Illumina mRNA expression arrays. These technologies are well validated for detection of genotype, DNA copy number variation and mRNA expression profile, and they are amenable to heterogeneous clinical samples. The samples were also interrogated by second generation (Illumina) sequencing. Relatively novel second generation sequencing technologies offer both increased throughput and deep sequencing capacity. The latter is especially important for characterizing cancer samples, which tend to include a mixture of cell types including infiltrating normal cells, vasculature and tumour cells of different genotypes. In this study we utilized target enrichment and Illumina sequencing technology to sequence the coding regions of 384 genes. We decided to favour depth of coverage over wider coverage in order to capture mutations present in subpopulations within the tumours. Recent studies have shown that cancers tend to harbour many mutations in a smaller number of signalling pathways [42, 43]; we therefore concentrated on genes in these pathways. We also included genes coding for proteins previously shown to affect response to targeted therapies and more likely to be successfully targeted by small molecule intervention, as our aim is to find more effective and novel ways of treating gastric carcinoma.

Methods

Tissue samples

DNA and RNA samples were obtained from hospitals in Russia and Vietnam according to IRB-approved protocols and with IRB-approved consent forms for molecular and genetic analysis. The medical centres themselves also have internal ethical committees, which reviewed the protocol and informed consent forms (ICFs). The samples were sourced through Tissue Solutions Ltd http://www.tissue-solutions.com/. For sample characteristics see additional file 1 table S1.

Arrays

Genotypes and copy number profiles were generated for each sample using 1 μg of DNA run on Affymetrix SNP V6 arrays following Affymetrix protocols. Copy number variation data was analysed within the ArrayStudio software http://www.Omicsoft.com. Data was normalized using the Affymetrix algorithm and segmented using circular binary segmentation (CBS). A transcript profile was generated for each sample using 1 μg of total RNA run on Illumina HG-12 RNA expression arrays following the Illumina protocols. Data was analysed within the Illumina GenomeStudio software http://www.illumina.com/software/genomestudio_software.ilmn. As a data pre-processing procedure, a probe set was only retained if it had a "present" call (i.e. two standard deviations above background) in at least one of the samples. Signal values of the remaining probe sets were transformed to a base-2 logarithmic scale and quantile normalization was performed. DNA copy number and RNA expression levels were integrated at the gene level within the ArrayStudio software http://www.Omicsoft.com. Pathway enrichment analysis was performed within the GeneGO MetaCore analysis suite http://www.genego.com/. All array data from this study is available in GEO http://www.ncbi.nlm.nih.gov/geo/ under series accession number GSE29999.
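The expression pre-processing steps described above (present-call filter, log2 transform, quantile normalization) can be sketched as follows. This is a minimal illustration in Python with NumPy, not the GenomeStudio or ArrayStudio implementation; the function name, the probes-by-samples input layout and the arbitrary handling of tied values are assumptions made for the example.

```python
import numpy as np

def preprocess_expression(signal, present):
    """Filter, log2-transform and quantile-normalize an expression matrix.

    signal  : probes x samples array of raw (positive) intensities.
    present : boolean array of the same shape, True where a probe was
              called "present" in that sample (hypothetical input).
    Returns the boolean mask of retained probes and the normalized matrix.
    """
    signal = np.asarray(signal, dtype=float)
    present = np.asarray(present, dtype=bool)

    # Retain a probe only if it is present in at least one sample.
    keep = present.any(axis=1)
    logged = np.log2(signal[keep])

    # Quantile normalization: each value is replaced by the mean, across
    # samples, of the values holding the same rank within their sample.
    # Ties are ranked in arbitrary order (a simplification).
    order = np.argsort(logged, axis=0)
    ranks = np.argsort(order, axis=0)
    mean_by_rank = np.sort(logged, axis=0).mean(axis=1)
    normalized = mean_by_rank[ranks]
    return keep, normalized
```

After normalization every sample (column) shares the same distribution of values, which is the property the downstream comparisons between samples rely on.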

Targeted deep DNA sequencing

5 μg of DNA was PCR-enriched for the coding exons of any known transcript of 384 genes of interest (additional file 2 table S2) using the Raindance platform http://www.raindancetechnologies.com/.

The resulting target libraries were sequenced using an Illumina GAII at a read length of 54 nt. Sequence reads were mapped to the reference genome (hg18) using the BWA program [44]. Bases outside the targeted regions were ignored when summarizing coverage statistics and variant calls. SAMtools was used to parse the alignments and make genotype calls [45], and any call that deviated from the reference base was regarded as a potential variant. The SAMtools package generates consensus quality and variant quality estimates to characterize the genotype calls. Accuracy of genotype calls was estimated by concordance with genotype calls from the Affymetrix 6.0 SNP microarray. Concordance matrices of samples based on both SNP and sequence data were generated to check for sample mislabelling (additional file 3 figure S1). Concordance and quantity of genotype calls were tabulated for thresholds of consensus quality, variant quality, and depth. The final set of variant calls was identified using consensus quality greater than or equal to 50 and variant quality greater than 0. To exclusively identify somatic changes, only those mutations present in the cancer sample and not detected in any of the normal samples were retained. As an additional filter for germline variants, all variants present in the dbSNP and 1000 Genomes polymorphism datasets were removed.
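The filtering logic described above can be sketched as follows. This is a minimal Python illustration, not the pipeline used in the study: the `Call` record, its field names and the `somatic_candidates` function are hypothetical, and the sketch assumes that any non-reference call in a normal sample, whatever its quality, is enough to mark a site as germline.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Call:
    """One genotype call at one position (hypothetical record layout)."""
    sample: str
    chrom: str
    pos: int
    ref: str
    alt: str
    consensus_quality: int
    variant_quality: int

def site(call):
    """Key identifying a variant independently of the sample."""
    return (call.chrom, call.pos, call.alt)

def somatic_candidates(tumour_calls, normal_calls, known_polymorphisms,
                       min_consensus_quality=50, min_variant_quality=0):
    """Keep tumour variants passing the quality thresholds that are absent
    from every normal sample and from the known polymorphism set."""
    # Assumption: normal calls are not quality filtered before exclusion.
    germline_sites = {site(c) for c in normal_calls if c.alt != c.ref}
    germline_sites |= set(known_polymorphisms)

    kept = []
    for call in tumour_calls:
        is_variant = call.alt != call.ref
        passes_quality = (call.consensus_quality >= min_consensus_quality
                          and call.variant_quality > min_variant_quality)
        if is_variant and passes_quality and site(call) not in germline_sites:
            kept.append(call)
    return kept
```

Here `known_polymorphisms` stands in for the union of dbSNP and 1000 Genomes sites, expressed as `(chrom, pos, alt)` tuples.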

Q-PCR

Q-PCR was performed via standard protocol using a Fluidigm 48*48 dynamic array. First, a validation run was conducted using pooled control RNA from three specimens. Four input RNA amounts were tested (125 ng, 250 ng, 375 ng and 500 ng). Triplicate data points were obtained for the subsequent 10-point serial dilution for each condition and each assay. The best overall results were at 250 or 500 ng, which yielded efficiency values of ~85%. Therefore a 250 ng input amount was used for the experimental samples. Data was produced in triplicate and the replicates were combined by taking the mean. CT values were converted to abundance using the standard formula abundance = 10^((40 - CT)/3.5). Test data was normalised to housekeepers using the analysis of covariance method, whereby the two housekeepers (GAPDH and beta-actin) were used to compute a robust score and the score was used as a covariate to adjust the other genes. Data analysis was performed in the ArrayStudio software.
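As a worked illustration of the CT conversion, the sketch below reads the formula as abundance = 10^((40 - CT)/3.5), so that a CT of 40 maps to an abundance of 1 and each decrease of 3.5 cycles corresponds to a ten-fold increase. The function names are hypothetical, and averaging the replicates after conversion (rather than averaging the CT values first) is an assumption; the housekeeper adjustment by analysis of covariance is not reproduced here.

```python
def ct_to_abundance(ct):
    """Convert a Q-PCR cycle threshold to relative abundance."""
    return 10 ** ((40.0 - ct) / 3.5)

def mean_abundance(ct_replicates):
    """Combine replicate CT values into one mean abundance.

    Assumption: replicates are averaged after conversion to abundance.
    """
    values = [ct_to_abundance(ct) for ct in ct_replicates]
    return sum(values) / len(values)
```

For example, `ct_to_abundance(36.5)` gives an abundance of about 10, ten times that of a sample with a CT of 40.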

Sanger Sequencing

Genomic DNA PCR primers were ordered from IDT (Integrated DNA Technologies Inc, Coralville, Iowa). PCR reactions were carried out using Invitrogen Platinum polymerase (Invitrogen, Carlsbad, CA). 50 ng of genomic DNA was amplified for 35 cycles at 94°C for 30 seconds, 58°C for 30 seconds and 68°C for 45 seconds. PCR products were purified using Agencourt AMPure (Agencourt Bioscience Corporation, Beverly, MA). Direct sequencing of purified PCR products with sequencing primers was performed with the AB v3.1 BigDye-terminator cycle sequencing kit (Applied Biosystems, Foster City, CA) and sequencing reactions were purified using Agencourt CleanSeq (Agencourt Bioscience Corporation, Beverly, MA). The sequencing reactions were analyzed using a Genetic Analyzer 3730XL (Applied Biosystems, Foster City, CA). All sequence results were assembled and analyzed using CodonCode Aligner (CodonCode Corporation, Dedham, MA).

Results

DNA and RNA amplification patterns across samples are consistent with previous studies

Consistent with most other human cancers, copy number changes occurred across the genomes of the 50 gastric cancer samples compared to matched normal samples (Figure 1). Large regions of frequent amplification were found at chromosomal regions 8q, 13q, 20q, and 20p. Known oncogenes MYC and CCNE1 are located in the 8q and 20p amplicons, respectively and likely contribute to a growth advantage conferred by the amplification. These amplifications have been seen in prior studies in gastric cancer along with amplification of 20p for which ZNF217 and TNFRSF6B have been suggested as candidate driver genes [46].

Figure 1

View of CNV aberrations across all 50 gastric carcinoma samples, for each autosome. The y-axis corresponds to the sum of the number of positive or negative changes for a particular segment with the log2 ratio of those changes. Areas with increased or decreased copy number consistent throughout all the samples analysed, or with very large changes in a few samples, will show large positive and negative change sizes. Each dot or segment in the figure is coloured by sample. The colour code is arbitrary, with each of the 50 cancer samples being assigned a colour. Amplified segments include chromosome 8q, 20q, 20p, 3q, 7p, and 1q.

Concordance between DNA copy number gain and RNA expression among the cancer samples was evaluated, and the top 200 genes that were contained within a region of frequent high DNA copy number in cancer samples and that had high mRNA levels (compared to matched normal tissue) are tabulated in additional file 4 table S3. Most of the genes on this list are from chromosomal regions 20q and 8q, suggesting that these amplifications have the most effect on mRNA levels; genes from 20p, 3q, 7p, and 1q are in the minority. Figure 2 shows the RNA profiles, measured by Q-PCR, of an exemplar gene from each region, showing general overexpression in gastric cancer, particularly in certain samples. Besides MYC and CCNE1, there are multiple genes in these regions that could contribute to a growth advantage for the cancer cell. The biological pathways most significantly enriched for amplified and overexpressed genes are involved in regulation of translation (p = 0.000015) and DNA damage repair (p = 0.003). Samples with amplifications in these genomic regions are annotated in Figure 3. There is no discernible tendency for amplifications in these regions to co-occur or to be mutually exclusive. In agreement with a previous study [47], the PERLD1 locus was amplified (within the ERBB2 amplicon) in sample 08280 and MMP9 was overexpressed but not discernibly amplified. Figure 3 also denotes focal DNA amplifications with concordant RNA expression of genes likely to affect the response to targeted therapies; for an example of the underlying data see additional file 5 figure S2.

Figure 2

Expression of example genes from each amplified chromosomal region across study samples confirmed by Q-PCR. Red dots denote cancer samples and white dots denote normal samples. The y-axis denotes the mRNA abundance.

Figure 3

Mutational profile of samples. Tissue samples are displayed across the top and annotations relevant to them are in columns below. Red boxes denote DNA amplification and concordant mRNA overexpression, orange boxes denote RNA overexpression with no evidence of DNA amplification, and red dots denote DNA loss. Blue boxes denote somatic nonsynonymous mutations validated by Sanger sequencing and purple boxes denote nonsynonymous somatic mutations observed in the Illumina data with no attempt to confirm by Sanger sequencing. Amino acid changes are noted in the boxes and changes leading to loss or gain of a stop codon are in red text.

Sequencing data shows high concordance with genotyping

Sequencing library preparation failed for six of the original 50 cancer samples and fourteen of the original matched normal samples. Therefore two more matched pairs were added to the analysis, resulting in a dataset of 44 cancer samples, 36 with matched normal pairs (additional file 1 table S1). The targeted region included 3.28 Mb across 6,547 unique exons in 384 genes (additional file 2 table S2). Median coverage of the targeted region across all samples was 88.3% and dropped to 74% when requiring a minimum coverage of 20. All sequencing was carried out to a minimum of 110x average read coverage across the enriched genomic regions for each sample. The reads were aligned against the human genome and variants from the reference genome were called. As a control, genotype calls from the Affymetrix V6 SNP arrays were compared with those from the Illumina sequencing. The regions targeted for sequencing contained 1005 loci covered by the Affymetrix V6 SNP arrays. With no filtering of the sequencing variant calls for quality metrics, the median agreement between the genotyping and sequencing results was 97.8% with a range of 65-99% (additional file 6a, Figure S3a). The raw overall genotype call concordance was 96.8%. Quality metrics were chosen to maximize the agreement between the genotyping and the sequencing calls while minimizing false negatives. The most informative metric was consensus quality, and a cut-off of ≥ 50 resulted in loss of about 10% of the shared genotypes but an overall 2% increase in concordance to 98.7% (additional file 6b, Figure S3b). Variant genotype calls were isolated for further concordance analysis. In this set, a variant quality threshold of > 0 increased the accuracy of variant genotype calls to 98.9% (additional file 6c, Figure S3c). When both quality thresholds were applied the median sample concordance was 99.5% (additional file 6d, Figure S3d), which is within the region of genotyping array error. Six samples (08362T1, 08373T2, 336MHAXA, 08337T1, 89362T2, DV41BNOH) had a concordance of < 98% and two of these (08393T2 and DV41BNOH) had a concordance of 82% and 88% respectively. Therefore, with a consensus quality ≥ 50 and a variant quality > 0, the false positive rate was 0.5% and 1.6% for reference genotypes and variant genotypes, respectively (additional file 6e Figure S3e).
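The concordance calculation can be illustrated with a short sketch. It is a simplified Python example under stated assumptions, not the analysis code used here: genotypes are represented as two-letter strings, the sequencing calls carry a consensus quality, and the function and argument names are hypothetical.

```python
def normalise(genotype):
    """Make allele order irrelevant, so that 'GA' and 'AG' compare equal."""
    return "".join(sorted(genotype.upper()))

def genotype_concordance(array_calls, sequence_calls, min_consensus_quality=0):
    """Fraction of shared loci at which array and sequencing genotypes agree.

    array_calls    : dict locus -> genotype string from the SNP array.
    sequence_calls : dict locus -> (genotype string, consensus quality).
    Returns (concordance, number of loci compared); concordance is None
    when no locus survives the quality threshold.
    """
    compared = 0
    agreeing = 0
    for locus, array_genotype in array_calls.items():
        if locus not in sequence_calls:
            continue
        seq_genotype, consensus_quality = sequence_calls[locus]
        if consensus_quality < min_consensus_quality:
            continue
        compared += 1
        if normalise(array_genotype) == normalise(seq_genotype):
            agreeing += 1
    rate = agreeing / compared if compared else None
    return rate, compared
```

Running the function at several values of `min_consensus_quality` reproduces the kind of trade-off reported above: a stricter threshold compares fewer shared loci but tends to raise the concordance among those that remain.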

From all single nucleotide changes passing the above thresholds, all variants present in any of the normal samples or in the polymorphism databases of dbSNP (v130) or 1000 Genomes were assumed to be germline variants and discarded. Variants present only in the exons of cancer samples were assumed to be somatic and retained. In total, 18,549 somatic variants were detected across all 44 samples (additional file 7 Table S4), of which 3357 were predicted to be exonic and nonsynonymous. To prioritise mutations with functional impact we concentrated all further analyses on nonsynonymous mutations and highlighted mutations leading to loss or gain of stop codons. We applied the SIFT algorithm [48] to predict amino acid changes that are not tolerated in evolution and so are more likely to affect the function of the protein; 1509 somatic nonsynonymous mutations had a SIFT score of < 0.05. The rate of mutations with SIFT score < 0.05 per gene, corrected for CDS length, was calculated (Figure 4). The genes with the highest concentration of low SIFT scoring mutations were S1PR2, LPAR2, SSTR1, TP53, GPR78 and RET, with S1PR2 being the most extreme. There are fifteen mutations with SIFT score < 0.05 across the 353 aa CDS of S1PR2, concentrated in nine samples. S1PR2, also known as EDG5, codes for a G-protein coupled receptor of S1P and activates the RhoGEF LARG [49]. Little is known of its role in cancer and somatic mutations have not been observed in the 44 tissues sequenced for S1PR2 in the COSMIC database [50].
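The length-corrected rate of low SIFT scoring mutations can be sketched as the number of such mutations in a gene divided by the number of amino acids in its CDS. The Python function below is a minimal illustration; its name and input layout are assumptions, and the data in the usage note are taken only from the S1PR2 figures quoted in the text.

```python
from collections import Counter

def deleterious_rate(mutations, cds_length_aa, sift_cutoff=0.05):
    """Rate of low-SIFT mutations per amino acid for each gene.

    mutations     : iterable of (gene, sift_score) pairs, one per somatic
                    nonsynonymous mutation.
    cds_length_aa : dict gene -> CDS length in amino acids.
    Only genes with at least one mutation below the cutoff are returned.
    """
    counts = Counter(gene for gene, score in mutations if score < sift_cutoff)
    return {gene: counts[gene] / cds_length_aa[gene] for gene in counts}
```

With fifteen qualifying mutations across the 353 aa CDS of S1PR2, the rate is 15/353, or roughly 0.042 low SIFT scoring mutations per amino acid.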

Figure 4

Bar chart of the rate of deleterious mutations across the genes sequenced. Genes sequenced are shown on the x-axis. The number of deleterious somatic nonsynonymous mutations observed in each gene, divided by the number of amino acids in its CDS, is plotted.

Sequencing data is confirmed by Sanger sequencing

Some nonsynonymous somatic mutations were selected to be confirmed by Sanger sequencing. All mutations reported in blue in Figure 3 were confirmed by Sanger sequencing and were also confirmed to be somatic by sequencing of the wildtype sequence in the matched normal tissue (see additional file 8 Figure S4 for example sequencing traces). Although 74% were confirmed, some mutations detected in the Illumina sequencing were not confirmed as somatic mutations by Sanger sequencing. Sixteen of the 68 (24%) mutations we attempted to confirm were present in both the normal and the cancer sample; these are germline mutations that were not detected in any of the normal samples by Illumina sequencing and are also not represented in dbSNP or 1000 Genomes data. Five of the sixteen germline mutations were from cancer samples with no matched normal tissue included in the dataset; the other eleven came from cancer samples with matched normal tissue sequence included in the dataset. This indicates that some germline variants are not eliminated by the matched normal controls or by the comparison to known polymorphism databases. It may be that the coverage of the substitutions in the normal tissue happens to be lower than in the cancer sample and so some germline mutations remain despite the somatic filters. Two of the 68 (3%) mutations we attempted to confirm were not present in the normal or cancer sample by Sanger sequencing. One cause could be false positives in the Illumina data due to artefact; however, additional file 6 Figure S3 shows the false positive rate to be low, at least for those variants represented on the Affymetrix V6 arrays. Another possibility is that these mutations are present in a subset of the sample below the sensitivity of the Sanger methodology but are detected by the Illumina sequencing. Mutations observed only in the Illumina sequencing are therefore also reported, in purple, in Figure 3; some caution is warranted when interpreting these results as they may be germline polymorphisms or present only in a subset of the tumour sample.

Alterations in the RAS/RAF/MEK/ERK pathway

Three tumour samples had KRAS genetic alterations (Figure 3), suggesting a therapeutic opportunity for treatment with MEK inhibitors. One of these alterations is a G12D mutation. KRAS G12D mutations have been shown to initiate carcinogenesis and to promote tumour survival [51]. Amplification and overexpression of wildtype KRAS were seen in the other two samples. KRAS amplification has been observed before in 5% of primary gastric cancers. Gastric cancer cell lines with wildtype KRAS amplification show constitutive KRAS activation and sensitivity to KRAS RNAi knockdown [24]. A novel mutation in KRAS was also observed (in sample 08393); its functional consequence is unknown.

The PIK3CA mutation co-occurring with KRAS G12D is known to affect sensitivity to MEK inhibitors [25]; in addition, novel mutations observed in this study may also have consequences for the same class of therapeutics. For instance, KSR2 functions as a molecular scaffold to promote ERK signalling [52, 53]. Therefore, mutations in KSR2, such as those seen in seven samples, may affect sensitivity to MEK inhibitors. A second example is ULK1, which positively controls autophagy downstream of mTOR [54] and is mutated in fourteen samples. Autophagy is increased along with ERK phosphorylation when gastric cancer cells are treated with a proteasome inhibitor [55]; mutations in ULK1 may therefore affect sensitivity to proteasome inhibitor treatments such as bortezomib, as a single agent or in combination with MEK inhibitors.

Alterations in the PI3K/AKT pathway

There was substantial sequence disruption of the phosphoinositide-3-kinase (PI3K) pathway genes in the sample set. There are a number of PI3K/AKT/mTOR inhibitors in clinical development and patients with activating mutations in the pathway are candidates for treatment [56]. PIK3CA mutations of known oncogenicity were found in four samples. This results in a frequency of PIK3CA hotspot mutation of 9%, slightly higher than previous estimates of 6% (12/185) [27] and 4.3% (4/94) [57]. The common PIK3CA hotspot mutations of known oncogenicity (E545K and H1047R) [58] were observed twice each. Another PIK3CA mutation, K111E, which has been observed before in four samples in COSMIC, was observed once, and potentially novel somatic mutations were observed in two more samples.

Five nonsynonymous AKT1 mutations were observed. Although AKT1 mutations are found in about 2% of all cancers, they mainly occur at amino acid 15 and the functional importance of mutation at other sites is unknown. A nonsynonymous mutation in AKT2 was also observed, in sample 08407. AKT2 mutations are much rarer than AKT1 mutations, although an AKT2 mutation has been observed before in gastric carcinoma, at a 2% frequency [59]. Finally, mutation of PTEN or MTOR may affect response to pathway inhibitors. Several PTEN mutations were noted and MTOR mutations were frequent.

Alterations in Receptor Tyrosine Kinases

The receptor tyrosine kinases (RTKs) and drug targets EGFR, ERBB2 and MET were each amplified (log2 > 0.6) and overexpressed at the RNA level in one cancer sample. It follows that the tumours may be sensitive to the inhibitors of the amplified RTKs. In addition, multiple nonsynonymous mutations are observed in their coding regions. Downstream mutations would be expected to influence response. For instance, in the MET amplified sample a truncating mutation in AKT3 may affect sensitivity to MET inhibitors.
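The criterion of amplification with concordant RNA overexpression can be sketched as a simple per-sample rule. In the Python example below, the copy number cut-off of 0.6 on the log2 ratio is taken from the text; the expression cut-off (tumour at least two-fold above matched normal on a log2 scale), the function name and the dictionary inputs are assumptions made for illustration.

```python
def amplified_and_overexpressed(cn_log2, expr_tumour, expr_normal,
                                cn_cutoff=0.6, expr_cutoff=1.0):
    """Genes that are both amplified and overexpressed in one tumour.

    cn_log2     : dict gene -> copy number log2 ratio (tumour vs normal).
    expr_tumour : dict gene -> log2 expression in the tumour.
    expr_normal : dict gene -> log2 expression in the matched normal.
    expr_cutoff : assumed minimum log2 difference (1.0 = two-fold).
    Genes lacking expression data in either tissue are skipped.
    """
    hits = []
    for gene, ratio in cn_log2.items():
        if gene not in expr_tumour or gene not in expr_normal:
            continue
        overexpressed = expr_tumour[gene] - expr_normal[gene] > expr_cutoff
        if ratio > cn_cutoff and overexpressed:
            hits.append(gene)
    return sorted(hits)
```

A gene with a copy number log2 ratio above 0.6 but unchanged RNA levels is not reported by this rule, which mirrors the distinction drawn in Figure 3 between amplification with concordant overexpression and overexpression alone.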

FGFR2 was amplified and overexpressed at the RNA level in two samples; there are also multiple mutations in FGFR1-4. Broad range RTK inhibitors, which target FGFRs among other kinases, may be efficacious in these patients [60, 61].

Alterations in Cell Cycle Proteins

The viral oncogene homolog SRC is mutated in four of the tumour samples; two of the mutations are predicted to have a deleterious effect, including the introduction of a stop codon. This may contraindicate SRC inhibitors. MET amplification is also a known resistance marker for anti-SRC therapeutics such as dasatinib [62, 63]. The cell cycle related kinase AURKA was amplified and overexpressed in one sample. AURKA inhibitors are in development for solid tumours [37] and may be indicated in this case. CCNE1 was amplified in two samples (08390 and 08357). High levels of CCNE1 have been shown to be frequently associated with early gastric cancer and metastasis, but expression levels do not correlate with survival [64, 65]. High CCNE1 levels have been suggested as a sensitivity marker for the gene-directed pro-drug enzyme-activated therapies [66].

Activation of the Wnt pathway is common in the carcinoma samples

Mutations were observed in the APC gene in 22 samples. APC is a tumour suppressor known to regulate CTNNB1 and Wnt pathway signalling, amongst other effects [67]. The Wnt pathway has previously been found to be frequently activated in gastric cancer [68]. We used a transcriptional signature, generated from previous studies [69, 70] and available in the Broad Institute MSigDB database, to classify the study samples by their Wnt transcriptional signatures. Figure 5A shows a heat map of the transcriptional levels of the Wnt signature genes in the dataset. Activation of this pathway is higher in nearly all the cancer samples compared to the normal samples. Wnt inhibitors are the subject of intense investigation in pharmaceutical and academic research [71-73]. These results suggest they will have an indication in gastric cancer as well as in many other cancers.

Figure 5

Transcriptional signatures across samples. Clustered heatmap showing expression of (A) Wnt signature genes and (B) hedgehog signature genes across samples in the study. All expression values are Z-score normalized. Z-scores < -1 are blue and Z-scores > 1 are red, with a graded colouring through white at 0. Sample names are on the x-axis; they are clustered by expression pattern and samples with high signature scores are to the right. Samples with somatic nonsynonymous APC mutations (A) or PTCH1 mutations (B) are denoted by an asterisk above the heatmaps. WNT signature genes (top to bottom): FSTL1, DACT1, CD99, LMNA, SERPINE1, TNFAIP3, GNAI2, ID2, MVP, ACTN4, CAPN1, LUZP1, MTA1, RPS19, PTPRE, AXIN2, NKD2, SFRS6, CCND1, SCAP, CPSF4, SENP2, DKK1, PRKCSH, SLC1A5, HDGF, CBX3, SCML1, PCNA, RPS11, SNRPA1, TGM2, LY6E, IFITM1, NSMAF, TCF20, BCAP31, AXIN1, AGRN, PLEKHA1, SLC2A1, CTNNB1, EIF5A, IMPDH2, GSK3B, PFN1, UBE, MAP3K11, ARHGDIA, HNRPUL1, FLOT2, GYPC, NCOA3, CENTB1, SYK, POLR2A, KRT5, DHX36, ELF1, SMG2, FGD6, MAPKAP1, LOC389435, RPL27A, SRP19, RPL39L, SFRS2IP, FUSIP1; Hedgehog signature genes (top to bottom): LRFN4, JAG2, RPL29, WNT5A, SNAI2, FST, MYCN, BMP4, CCND1, BMI1, CFLAR, PRDM1, GREM1, FOXF1, CCND2, CD44.
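The Z-score normalization used for the heatmaps, and one possible way of ordering samples by signature score, can be sketched as follows. How the signature score was computed is not specified here, so the mean Z-score across signature genes used below is purely an assumption, as are the function names; the sketch also assumes that every signature gene varies across samples, so that its standard deviation is non-zero.

```python
import numpy as np

def zscore_rows(expr):
    """Z-score each gene (row) across samples (columns)."""
    expr = np.asarray(expr, dtype=float)
    mean = expr.mean(axis=1, keepdims=True)
    sd = expr.std(axis=1, ddof=1, keepdims=True)  # sample standard deviation
    return (expr - mean) / sd

def signature_scores(expr):
    """Assumed signature score: mean Z-score over the signature genes."""
    return zscore_rows(expr).mean(axis=0)

def order_by_score(expr):
    """Column indices sorted so that high-scoring samples come last."""
    return np.argsort(signature_scores(expr))
```

Under this convention, samples whose signature genes are mostly above their across-sample mean receive positive scores and are placed towards the right-hand side of the heatmap.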

Activation of the hedgehog pathway is also common in the carcinoma samples

PTCH1 is a tumour suppressor that acts as a receptor for the hedgehog ligands and inhibits the function of smoothened. When smoothened is freed, it signals intracellularly, leading to the activation of the GLI transcription factors [74]. Multiple somatic mutations of PTCH1 are recorded in COSMIC, consistent with its tumour suppressor role. The D362Y mutation seen in this study in sample FICJG is in the fourth transmembrane domain of PTCH1 and has previously been seen as a loss-of-function germline mutation in a patient with Gorlin syndrome, predisposing to neoplasms (numbered D513Y due to a different transcript) [75]. Therefore, sample FICJG is very likely to have deregulated hedgehog signalling and does indeed have high levels of GLI target genes (as defined by [74] (Figure 5B)). Other samples also contain PTCH1 mutations in the Illumina sequence data, including a truncating stop codon (Y140X) in sample 08379, and have high levels of hedgehog signature genes. Hedgehog signalling has previously been shown to be frequently activated in gastric cancer [76], though no genetic cause has previously been implicated. Inhibitors of the hedgehog pathway are in clinical development [77, 78].

Loss of Epithelial phenotype

Epithelial or mesenchymal status has been shown to affect response to multiple drugs [79] and samples may be more resistant due to loss of an epithelial phenotype. Both hedgehog and Wnt signalling upregulate mesenchymal precursors such as BMP4, and mutations can lead directly to loss of epithelial phenotype. CDH1 is a marker of an epithelial phenotype; it is often lost in gastric tumours due to the process of epithelial to mesenchymal transformation (EMT) and is a negative prognostic marker [80]. Mutations in CDH1 were observed in nine samples, including a D254G mutation detected in sample 08359. A mutation at the same site (D254Y) has been recorded in COSMIC in a breast tumour, and 211 somatic mutations have been observed in the 2732 samples sequenced for CDH1 in COSMIC. Mutation in SMAD4 is also likely to affect epithelial phenotype. Loss of SMAD4 function facilitates EMT and its re-expression reverses the process in cancer cell lines [81]. Mutations in the tumour suppressor SMAD4 were observed in ten samples.

Sensitivity to chemotherapy

Multiple substitutions in BRCA1 were observed in ten samples, including three cases of substitution to a stop codon. Germline mutations in BRCA1 predispose patients to breast and ovarian cancer, and multiple somatic mutations have been found in tumours [82]. BRCA1 expression levels and polymorphic status have been shown to correlate with sensitivity to chemotherapeutics in gastric cancer [83, 84]. Therefore, the observed mutations of BRCA1 may affect sensitivity to chemotherapy.

Another commonly mutated gene which is linked to sensitivity to chemotherapy in gastric cancer is TP53 [85]. Eight examples of TP53 mutation, including two stop codons, are seen in the dataset.

Mutations in TRRAP were found in 22 samples, including one mutation to a stop codon. TRRAP is a component of histone acetyltransferase complexes and is implicated in oncogenic transformation and cell fate decisions through chromatin regulation [86]. Loss of function mutations of the Schizosaccharomyces pombe orthologue of TRRAP cause defects in G2/M cell cycle control and resistance to CHK1 overexpression [87]. Mutations in TRRAP are likely to affect response to HDAC and CHK1 inhibitors currently approved and in trials for use as anticancer agents [88-92].

Novel targets for therapies in gastric cancer

An additional aim of our study was to uncover novel drug targets for gastric cancer. Many novel perturbations were observed in tractable target genes; the following are three examples that warrant further investigation.

Thyrotropin receptor (TSHR) is mutated in four samples. The A553T mutation of TSHR found in sample 08360 has previously been observed in two siblings with congenital hypothyroidism and was found to be inactivating [93]. Both loss and gain of function TSHR mutations are often found in thyroid cancer [94]. However, a role for TSHR in other cancers has not been elucidated, although infrequent mutations in lung cancer are recorded in COSMIC and TSHR has been shown to be lost at the DNA level in some gastric cancers [95]. Three of the four TSHR mutations found have very low SIFT scores and may suggest deregulation of this growth hormone pathway.

We used the COPA algorithm [96] to identify mRNAs with outlier expression in the cancer samples. The top gene identified was KLK6. KLK6 is not detected, or is detected at very low levels, in the normal samples, whilst its expression is very high in eleven of the cancer samples. Figure 6 shows the expression profile of KLK6 across the samples, confirmed by Q-PCR. KLK6 has previously been shown to be overexpressed in gastric cancer, and RNAi-mediated knockdown of KLK6 in gastric cancer cell lines has been shown to be anti-proliferative and anti-invasive [97, 98].
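The outlier score can be sketched as follows, assuming the commonly described form of COPA [96]: each gene is centred on its median, scaled by its median absolute deviation (MAD), and scored by a high percentile of the transformed values. The Python function below is an illustration only; the choice of the 90th percentile, the function name and the treatment of genes with zero MAD are assumptions, not parameters reported in this study.

```python
import numpy as np

def copa_scores(expr, percentile=90):
    """COPA-style outlier score for each gene (row) across samples (columns).

    Genes with a MAD of zero cannot be scaled and receive a score of NaN
    (an assumption of this sketch).
    """
    expr = np.asarray(expr, dtype=float)
    median = np.median(expr, axis=1, keepdims=True)
    centred = expr - median
    mad = np.median(np.abs(centred), axis=1, keepdims=True)
    mad[mad == 0] = np.nan  # avoid division by zero
    scaled = centred / mad
    return np.percentile(scaled, percentile, axis=1)
```

Genes are then ranked by score, highest first. Because the median and MAD are insensitive to a minority of extreme values, a gene that is very highly expressed in only a subset of samples obtains a high score, whereas a gene that differs modestly across all samples does not.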

Figure 6

Expression of KLK6 across study samples confirmed by q-PCR. Red dots denote cancer samples and white dots denote normal samples. Patient IDs are arranged on the x-axis. The y-axis is the mRNA abundance.

Finally, mutations in the Rho-associated coiled-coil containing protein kinases (ROCK1 and ROCK2) are interesting in view of their role as effectors of the RhoA GTPase, and in view of the recent finding that truncating mutations in ROCK1 (similar to the confirmed ROCK2 mutation in this study) are activating and lead to increased motility and adhesion in cancer cells [99].

Discussion

Gastric adenocarcinoma rates vary widely across geographical regions, gender, ethnicity and time [100]. Diet has been shown to significantly influence gastric cancer risk, as have tobacco smoking and obesity [101]. The infectious agent Helicobacter pylori is intimately associated with the development of the most common types of gastric adenocarcinoma [102]. H. pylori colonizes the stomach of at least half the world's population. Virtually all persons infected with H. pylori develop gastric inflammation, which confers an increased risk of developing gastric cancer; however, only a fraction of infected individuals develop clinical disease [103]. H. pylori induces generalized mutation and genomic instability in host DNA [104], which, along with the complex risk profile, suggests diverse routes to oncogenesis in gastric adenocarcinoma.

Therefore, an individualized, personalized medicine approach, measuring molecular targets in tumours and suggesting treatment regimens based on the results, is attractive. A recent study using this approach across tumour types has reported improved outcomes [105]. The trial used IHC, FISH and microarray technologies to assay levels of molecular targets in tumours. As the authors mention, second-generation sequencing techniques offer a more complete picture of the tumour mutagenic profile and will be even more informative in identifying sensitivity and resistance biomarkers.

Conclusions

This study provides evidence for previously observed perturbations of the KRAS, ERBB2, EGFR, MET, PIK3CA, FGFR2 and AURKA genes in gastric cancer and suggests that some of the targeted therapies approved or in clinical development would be of benefit to 11 of the 50 patients studied. The data also suggest that agents targeting the Wnt and Hedgehog pathways would be of benefit to a majority of patients. The previously undocumented DNA mutations discovered are likely to affect clinical response to marketed therapeutics and may be good drug targets. Detection of these mutations was enabled by Illumina sequencing, and the concordance with genotyping arrays shows its suitability for heterogeneous cancer samples. These "nextgen sequencing" techniques are only beginning to expand our ability to detect genome-wide DNA mutation, DNA copy number, RNA levels and epigenetic changes in each patient's genome. However, it remains a challenge to filter germline from somatic mutations and to separate driver mutations with functional import from passenger mutations.

Whole-genome studies using both Sanger and nextgen sequencing have revealed mutagenic profiles of other cancers in unprecedented completeness and detail [41, 106-112]. Similar studies with large numbers of samples will be critical to fully appreciate the mutagenic diversity in gastric cancer and to identify the important driver mutations. Bodies such as the ICGC (International Cancer Genome Consortium) are currently collecting gastric adenocarcinoma samples.

Translation of these findings to the clinic will require pinpointing of important mutations, as well as easier access to broad diagnostic assays and clinical development of agents targeting low-frequency events [113]. Data such as those presented here are a necessary preliminary step in delivering the maximum benefit from the major advances of targeted therapies and personalized medicine to gastric cancer patients.