Background

Alternative splicing of pre-messenger RNA is nearly universal, involving more than 90% of human genes [1]. It is an important regulatory mechanism of gene expression for tissue-specific functions. Each gene maintains a delicate balance of its alternative transcripts in normal cells, disruption of which affects normal cellular processes and may cause various diseases, including cancer [2]. Although numerous studies have sought tumor-specific splicing variants as diagnostic markers or therapeutic targets, only recently has alternative splicing in cancer been studied using genome-wide profiling methods [3-7]. Most of these studies focus on identifying splicing variants in tumors, while the role of alternative splicing in tumors of different stages and molecular subtypes remains largely unexplored.

Neuroblastoma is the most common solid extracranial tumor in children. The incidence rate is 10.2 cases per million children under 15 years of age, and the median age at diagnosis is 17 months [8]. Stage, age, and other biological features of tumor cells are important prognostic factors for risk stratification and disease management. The "International Neuroblastoma Staging System" (INSS) classifies the tumor into six stages (1, 2A, 2B, 3, 4, 4S) according to its anatomical presence at diagnosis [9]. Localized disease has a favorable outcome, with an overall survival rate for stage 1 disease of 75-90%. Patients over 18 months with stage 4 disease have a 2-year disease-free survival rate of only 30-40% [10]. MYCN is the most important biologic marker for neuroblastoma. It is amplified in approximately 25% of de novo neuroblastoma cases, and amplification is more common in patients with advanced-stage disease. MYCN-amplified tumors are highly aggressive and have a poor outcome [11]. Alternative splicing has been shown to be involved in neuroblastoma development [12]. For example, kinesin family member 1B isoform beta (KIF1Bbeta) but not alpha is down-regulated in advanced stages of neuroblastoma. KIF1Bbeta induces apoptotic cell death, suggesting its role as a haploinsufficient tumor suppressor [13, 14]. In this study, we used the Affymetrix Human Exon 1.0 ST Array (HuEx) to measure exon expression levels in 47 neuroblastoma samples of different clinical stages and molecular subtypes, including stage 1 with normal MYCN copy number (1-) and stage 4 with MYCN amplification (4+) or normal MYCN copy number (4-). The goal was to identify stage- and MYCN amplification-specific splicing patterns, in comparison to whole gene expression changes, to improve the understanding of cancer biology and support the discovery of biomarkers or therapeutic targets in neuroblastoma.

Results

Identification of alternatively spliced candidate genes

To study the role of splicing regulation in high stage and MYCN amplified neuroblastoma, we used HuEx array to measure exon expression levels in 47 neuroblastoma samples from 10 stage 1-, 28 stage 4-, and 9 stage 4+ tumors (Table 1). HuEx array allows the detection of differential inclusion or skipping of exons by measuring expression levels of individual exons in different groups of samples. Alternative splicing events may be detected by normalized intensity (NI), which is defined as the ratio of exon-level probeset expression to gene-level transcript cluster expression [15]. To examine the global splicing pattern in neuroblastoma, we performed principal component analysis (PCA) using NI values of all core probesets (n = 221,809) across all samples after quality filtering. Figure 1 shows that stage 1- and 4+ samples are clearly separated from each other, while stage 4- samples are located between the other two groups. The separation of three groups of tumors by NI values suggests distinct alternative splicing patterns associated with clinical stage and MYCN status.

Table 1 Tumor samples used in the study
Figure 1

Principal component analysis (PCA) of 47 neuroblastoma samples by log2-transformed normalized intensity of core probesets. PCA was performed using NI values of core probesets (n = 221,809) across all samples after quality filtering. Stage 1- and 4+ samples are clearly separated from each other, while stage 4- samples are located between them. Blue, stage 1-; green, stage 4-; red, stage 4+ tumors.
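As a rough illustration of the PCA step, the sketch below extracts the leading principal component of a small samples-by-probesets matrix of log2-transformed NI values by power iteration. It is not the software used in the study; the function name `first_principal_component`, the start vector, and the toy matrix are assumptions made for illustration only.

```python
import math


def first_principal_component(rows, n_iter=200):
    """Leading PC of `rows` (samples x features) by power iteration.

    Returns the unit-length loading vector and the per-sample scores.
    Assumes the data are not constant across samples.
    """
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    v = [1.0 / (j + 1) for j in range(p)]  # arbitrary non-zero start vector
    for _ in range(n_iter):
        scores = [sum(c[j] * v[j] for j in range(p)) for c in centered]
        w = [sum(centered[i][j] * scores[i] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    scores = [sum(c[j] * v[j] for j in range(p)) for c in centered]
    return v, scores


# Hypothetical NI values (4 samples x 2 probesets), log2-transformed before PCA.
ni = [[1.0, 1.0], [2.0, 2.0], [4.0, 4.0], [8.0, 8.0]]
log2_ni = [[math.log2(x) for x in row] for row in ni]
loadings, sample_scores = first_principal_component(log2_ni)
```

Samples with similar splicing profiles receive similar scores, which is what a plot such as Figure 1 displays for the first components.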

Probesets with significantly different NI values in two groups of samples represent exons that may be differentially spliced in two disease states. We used the Significance Analysis of Microarrays (SAM) procedure [16] with a stringent false discovery rate threshold (q-value < 0.05) to compare NI among the three groups of patients. There were 1501 differentially spliced candidate genes between 4- and 4+, while only 362 genes were differentially spliced between 1- and 4-. Using AltAnalyze, we checked for prior evidence of alternative splicing in the Ensembl and/or UCSC genome browser databases [17]. Of the candidate gene lists derived from the comparison of 4-/4+ and 1-/4- tumors, 44.7% (671) of the former list and 29.6% (107) of the latter were supported by prior evidence of alternative splicing. The largest difference was between 4+ and 1-, with 2775 differentially spliced candidate genes, of which 46.9% (1302) had alternative exons with prior evidence of alternative splicing (Additional file 1, Figure 2A). Therefore, both stage and MYCN amplification seem to affect the alternative splicing patterns in neuroblastoma.

Figure 2

Comparison of exon level splicing change and gene level expression change. Pairwise comparison of stage 1-, 4-, and 4+ tumors was performed for both exon level splicing change and gene level expression change; the genes with FDR < 0.05 were identified. (A) Venn diagram of alternatively spliced candidate genes. (B) Venn diagram of differentially expressed candidate genes. (C) Comparison of exon level splicing change and gene level expression change. Blue column shows number of genes with whole gene expression change but not splicing change, while red column shows number of genes with splicing change but not whole gene expression change. Green column represents genes with both whole gene expression and splicing changes.

Potential impact of alternative splicing in neuroblastoma

To estimate the functional impact of alternative splicing, we predicted changes in domain, motif, and miRNA binding site composition of protein sequences for alternatively regulated exons in stage 1-, 4-, and 4+ tumors [17]. The percentage of alternative exons associated with a predicted domain/motif change ranged from 74.1% to 76.2% across the pairwise comparisons of the three tumor groups. The number of genes including at least one alternative domain/motif was 296 (81.8%) for stage 4- vs 1-, 1181 (78.7%) for stage 4+ vs 4-, and 2203 (79.4%) for stage 4+ vs 1-, respectively. In addition to the impact on protein domains/motifs, alternative splicing may result in gain or loss of miRNA binding sites. Our analysis identified 19, 280, and 538 genes containing alternative exons overlapping with predicted miRNA binding sites for the comparison of 4-/1-, 4+/4-, and 4+/1- tumors, respectively. These results suggest diverse changes in protein function and expression regulated by alternative splicing in high stage neuroblastoma.

Several known aberrant splicing events in neuroblastoma and/or other tumors are evident in our candidate lists. For example, exon array data indicated decreased expression of KIF1Bbeta but not KIF1Balpha in high stage neuroblastoma, which is consistent with a previous report (Additional file 2) [18].

More interestingly, our results showed increased expression of the M2 isoform of PKM2 in high stage neuroblastoma, while the M1 isoform exhibited decreased expression (Figure 3). The splicing switch of pyruvate kinase has been demonstrated in multiple tumor types [19], but has not previously been reported in neuroblastoma. The two isoforms are expressed through exchange of two cassette exons, which are equally long and share 60% protein sequence identity [20]. Functional annotation by AltAnalyze identified the gain of the allosteric activator fructose 1,6-bisphosphate (FBP) binding region and intersubunit contact in high stage neuroblastoma, consistent with the allosteric regulation of the M2 isoform but not the M1 isoform by FBP [20]. The upregulation of the M2 isoform in stage 4+ vs 1- tumors was also evident at the proteome level, as demonstrated by a previous proteomics study in our lab, which applied isotope-coded affinity tags (ICAT) in combination with mass spectrometry to quantify peptide expression levels in stage 1- and 4+ tumors [21]. The quantitative ICAT analysis identified five peptides matched against the PKM2 protein sequence, among which KCCSGAIIVLTKS was from the exon specific to the M2 isoform and the other four were from constitutive exons in PKM2. The M2 isoform-specific peptide demonstrated a mean log2-transformed expression ratio of 2.73, and the other peptides had a mean ratio of 1.67 between 4+ and 1- tumors [21]. Although no peptide was found for the M1 isoform-specific exon, the larger expression change of the M2 isoform-specific exon relative to the constitutive exons suggested upregulation of the M2 isoform but not the M1 isoform at the proteome level.

Figure 3

Alternative splicing of pyruvate kinase (PKM2) detected by Affymetrix exon (HuEx) array. (A) Gene structure of known isoforms is shown on the top panel with predicted domains/motifs that differ in protein isoforms. Green oval shows intersubunit contact (ISC) sequence, and red boxes point to fructose 1,6-bisphosphate (FBP) binding regions as defined by UniProt. The HuEx expression is shown on the bottom panel. Each point represents mean log2-expression of each group that was then median-centered across three groups. Orange lines point to probeset 3631984, which mapped to the unique exon in isoform M1 (NM_182470 and NM_182471), and probeset 3631977, which mapped to the unique exon in isoform M2 (NM_002654). While the isoform M2-specific probeset showed increased expression in stage 4+ compared to stage 4-/1- tumors, the isoform M1-specific probeset indicated lower expression in stage 4+ tumors. (B) Normalized intensity (NI) values for probeset 3631984 in stage 1-, 4- and 4+ tumors. (C) Normalized intensity values for probeset 3631977 in stage 1-, 4- and 4+ tumors. The expression for probesets 3631984 and 3631977 was significantly different between stage 1- and 4+, suggesting increased expression of isoform M2 and reduced expression of isoform M1 in MYCN-amplified neuroblastoma.

Alternative splicing and whole gene expression changes in neuroblastoma

To compare global splicing and transcription regulation in high stage neuroblastoma, we derived whole gene expression signatures by pairwise comparison of gene-level transcript cluster signals. Thus, a splicing signature comprises genes with differentially included/excluded exons between two tumor groups, while a whole gene expression signature comprises genes with different gene-level signals between two tumor groups. SAM analysis resulted in 185, 2318, and 3298 genes showing whole gene expression changes for stage 1- vs 4-, 4- vs 4+, and 1- vs 4+, respectively (FDR ≤ 0.05 and Fold Change ≥ 1.5; Figure 2B, Additional file 3).

Comparing stage 4- with 1-, only 27 genes were shared between splicing signature (n = 362) and whole gene expression signature (n = 185) (Figure 2C). Using the Database for Annotation, Visualization and Integrated Discovery (DAVID) [22], we found that genes with known splice variants were significantly overrepresented in splicing signature (n = 153) but not expression signature based on UniProt annotation (FDR < 0.05), which indicated the validity of our splicing analysis process. The most enriched Gene Ontology terms in the biological process category in the splicing signature included nervous system development, cell adhesion, synaptic transmission, and cytoskeleton organization and biogenesis (Figure 4A, Additional file 4). In contrast, the whole gene expression signature is enriched with genes involved in cell cycle, cell division, and DNA metabolic processes (Figure 4B, Additional file 4), which is consistent with the results derived using traditional gene expression microarray platforms [23]. Different sets of genes with different biological functions were affected by alternative splicing and transcription regulation, suggesting independent roles of splicing and transcription regulation in stage 4- neuroblastoma.

Figure 4

Enriched Gene Ontology (GO) biological processes in alternatively spliced (A) or differentially expressed (B) genes in stage 4- vs 1- neuroblastomas. GO enrichment analysis was done by DAVID [22], and overrepresented biological processes were shown as a GO graph in which child terms are connected to their parent terms by directed lines. Color scale denotes Benjamini-corrected p-values generated by a hypergeometric test, and node size is proportional to the number of genes annotated to corresponding GO terms.
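The enrichment statistic named in this caption can be sketched as a hypergeometric upper-tail probability followed by a Benjamini-Hochberg adjustment. This is a minimal illustration of the general technique and not DAVID's implementation, which may differ in detail; the function names and example numbers are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, n_list, k_term, n_background):
    """P(X >= k): chance of seeing at least k term-annotated genes when
    n_list genes are drawn from n_background genes, k_term of which carry the term."""
    upper = min(n_list, k_term)
    total = comb(n_background, n_list)
    hits = sum(
        comb(k_term, i) * comb(n_background - k_term, n_list - i)
        for i in range(k, upper + 1)
    )
    return hits / total


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical example: 12 of 100 signature genes carry a term that annotates
# 300 of 15000 background genes.
p_term = hypergeom_upper_tail(12, 100, 300, 15000)
adjusted = benjamini_hochberg([0.01, 0.04, 0.03])
```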

MYCN-amplified neuroblastoma is the most aggressive disease subtype. By comparing stage 1- and stage 4+ neuroblastomas, we identified 9044 probesets within 2775 alternatively spliced candidate genes, along with 3298 differentially expressed genes between the two tumor groups. Again, significant enrichment of genes with known splice variants (n = 311) was found in those showing only splicing changes (n = 852), but not in those having only whole gene expression changes (n = 1375) (FDR < 0.05). In contrast to stage 4- tumors, a large number of genes (n = 1923) appeared to have both changes in stage 4+ tumors (Figure 2C). Biological processes enriched in alternatively spliced genes in stage 4+ include those involved in the stage-specific splicing signature and the whole gene expression signature (Table 2). Similar GO terms were enriched in genes showing whole gene expression changes in stage 4+ (Additional file 5). To derive MYCN amplification-specific signatures, we compared stage 4- and 4+ tumors, and detected 4602 probesets within 1501 genes that may undergo splicing disruption, and 2318 transcript clusters with expression changes associated with MYCN status. The overlap between the two signatures was also large, with 1127 genes in common (Figure 2C). GO enrichment analysis results were similar to those found in the comparison of stage 4+ vs 1- tumors (Additional file 6). These results suggested a correlation between MYCN amplification and splicing regulation. While splicing and transcription regulation may affect different sets of genes involved in different biological processes in stage 4- tumors, these biological processes may be disrupted by both gene expression regulatory processes to a greater extent in the more severe stage 4+ neuroblastoma.
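The three-way split shown in Figure 2C amounts to simple set operations on the two gene signatures. A minimal sketch follows, in which the function name `partition_signatures` and the gene identifiers are placeholders rather than anything taken from the study; the final check only confirms that the counts reported above for stage 1- vs 4+ add up.

```python
def partition_signatures(splicing_genes, expression_genes):
    """Split two gene signatures into splicing-only, shared, and expression-only sets."""
    splicing, expression = set(splicing_genes), set(expression_genes)
    return {
        "splicing_only": splicing - expression,
        "both": splicing & expression,
        "expression_only": expression - splicing,
    }


# Placeholder identifiers, not genes from the study.
parts = partition_signatures({"g1", "g2", "g3"}, {"g3", "g4"})
counts = {name: len(genes) for name, genes in parts.items()}

# Consistency of the reported stage 1- vs 4+ counts.
splicing_total = 852 + 1923     # splicing-only + both
expression_total = 1375 + 1923  # expression-only + both
```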

Table 2 Top 40 overrepresented Gene Ontology biological processes (FDR < 0.0001) in alternatively spliced genes in stage 4+ vs. 1- tumors

Validation of splice variants differentially expressed in stage 1- and stage 4+ tumors

Three alternatively spliced genes were selected for quantitative RT-PCR validation: PKM2 (NM_002654 vs. NM_182470 & NM_182471), KIF1B (NM_015074 vs. NM_183416) and MAP2 (NM_001039538 vs. NM_002374). Figure 5 shows gene expression fold changes between splice variants in 5 stage 1- and 5 stage 4+ tumors. The HuEx results for all three alternatively spliced genes were validated by the qRT-PCR results, and the expression of splice variants was significantly different between stage 1- and stage 4+ tumors, with p-values (t-test) of 0.0038, 0.0014 and 0.0209 for PKM2, KIF1B and MAP2, respectively. The expression of the M2 isoform (NM_002654) of PKM2 is increased in stage 4+ compared with the M1 isoform; the expression of KIF1Bβ (NM_015074) is decreased in stage 4+ compared with KIF1Bα (NM_183416); and the expression of isoform NM_001039538 of MAP2 is higher in stage 4+ compared with isoform NM_002374.

Figure 5

qRT-PCR validation of splice variants differentially expressed in stage 1- and stage 4+ tumors. Quantitative RT-PCR was performed for three spliced genes, PKM2 (NM_002654 vs. NM_182470 & NM_182471), KIF1B (NM_015074 vs. NM_183416) and MAP2 (NM_001039538 vs. NM_002374), in 5 stage 1- and 5 stage 4+ tumors. Differential expression of splice variants was evaluated by calculating expression fold changes between splice variants of the spliced gene in each sample, which were further centered by the median of values obtained in stage 1- and stage 4+ tumors. The expression of splice variants is significantly different between stage 1- and stage 4+ tumors for all three tested spliced genes.

Discussion

Alternative splicing of precursor mRNA is an essential step in gene expression and is responsible for much of the proteome diversity in mammalian genomes. Although splice variants have long been known to be associated with many human diseases, very little is understood about the global properties of alternative splicing in cancer development. Using the Affymetrix Human Exon 1.0 ST array, we compared splicing disruption and whole gene expression change in different stages of a pediatric cancer, neuroblastoma. Our results suggested a significant role of splicing regulation in high stage and MYCN-amplified neuroblastoma tumors.

In stage 4- neuroblastoma, transcription regulation and alternative splicing may affect different sets of genes involved in different biological processes with more genes showing splicing disruption than whole gene expression change. Regulation of transcription and splicing seem to be two independent processes that result in distinct functional outcomes in stage 4- tumor, which is consistent with the independent roles of these two processes in determination of tissue specificity [24] and regulation of immune response [25]. In contrast, a large group of genes underwent both splicing and whole gene expression changes in stage 4+ tumor. Similar biological processes were enriched in splicing and whole gene expression signatures in stage 4+ tumor, including those that were affected separately by alternative splicing and whole gene expression changes in stage 4- tumor. It indicates that different biological processes may be affected by splicing and transcription regulation in stage 4- tumors and these processes may need to be disrupted by both gene regulatory processes in the more severe stage 4+ tumors.

MYCN-amplified tumor is a highly aggressive subtype with poor prognosis in 20% of neuroblastoma patients. Several studies have shown the differential expression of a large number of genes involved in cell cycle and differentiation in these tumors [23, 26]. Our current study indicates, for the first time, that MYCN amplification is related not only to large-scale gene expression changes but also to profound splicing regulation in neuroblastoma. MYCN is a global transcriptional regulator for both protein-coding genes and genes encoding non-coding RNA products [27, 28]. Transcriptional targets of MYCN include splicing factors [29], which may in turn regulate alternative splicing of various target genes. A recent study has shown that the splicing switch of pyruvate kinase in human gliomas may be controlled by c-myc through splicing factors including polypyrimidine tract binding protein (PTBP1) [30]. The PKM2 splicing switch (Figure 3) and the upregulation of PTBP1 expression we found in stage 4+ neuroblastoma (Additional file 3) suggest a MYCN-controlled pathway for PKM2 splicing in neuroblastoma. Other splicing targets of PTBP1 have also been found in our splicing signatures of MYCN-amplified tumors, including reticulon 4 (RTN4) [31] and ROD1 regulator of differentiation 1 [32]. In addition, multiple differentially expressed MYC/MYCN target genes in stage 4+ tumors appear to be associated with the splicing processing complex, such as small nuclear ribonucleoprotein polypeptides (SNRPA, SNRPB, SNRPD2), dead-box polypeptides (DDX1, DDX18), RNA-binding motif protein (RBM3), and cleavage and polyadenylation specific factor (CPSF1) [29, 33]. These splicing regulators may be involved in MYCN-associated splicing regulation, through which MYCN may exert part of its phenotypic effects on neuroblastoma.

Alternative splicing plays important roles in various diseases. It may be a direct cause of the disease, or a modifier of disease susceptibility and severity [2]. In this study, we identified candidate genes undergoing splicing disruption in high stage and MYCN-amplified neuroblastoma, which helps the understanding of disease biology. Neural development and cell adhesion genes exhibit splicing changes in stage 4- disease, and they undergo both splicing regulation and whole gene expression level change in stage 4+ patients. A previous proteomics study demonstrated that proteins involved in these processes are significantly suppressed in stage 4+ neuroblastoma patients [21]. Our results suggest that alternative splicing may be responsible, at least partially, for the changes observed at the proteome level. Defects in splicing machinery may cause alternative splicing and further whole gene expression changes of neural development and adhesion genes, resulting in the protein level changes observed in our previous study.

Alternatively spliced candidate genes identified in this study provide a useful resource for the discovery of diagnostic biomarkers or therapeutic targets in neuroblastoma. One example is the splicing switch of PKM2 variants. Previous studies in multiple tumor types have shown that the M2 isoform is expressed while the M1 isoform disappears during tumor development [34]. PKM2 plays an important role in cancer metabolism, and it has been proposed as a potential metabolic target for the treatment of cancer [19, 35]. Our results demonstrated the splicing switch from isoform M1 to isoform M2 in high stage neuroblastoma (Figure 3), suggesting a role of PKM2 in neuroblastoma development similar to that in other tumors [35]. Another interesting result concerns the alternative splicing of microtubule-associated genes. Our data demonstrated that genes involved in microtubule-associated processes were significantly enriched among differentially spliced candidate genes in stage 4+/4- vs stage 1- tumors. There were 57, 27, and 15 microtubule-associated genes showing aberrant splicing patterns in stage 4+ vs 1-, 4+ vs 4-, and 4- vs 1-, respectively (Figure 4A; Additional files 4, 5, 6). Among these genes, KIF1Bbeta but not alpha has been shown to be a potential 1p36 tumor suppressor for neuroblastoma. Protein regulator of cytokinesis 1 (PRC1) is essential for organization of the central spindle and midzone formation, and its interaction with KIF2C has been shown to be involved in breast cancer tumorigenesis [36]. Microtubule-associated protein 2 (MAP2) has been shown to be a prognostic marker for melanoma patients [37], and splice variants of the MAPT gene demonstrated opposite changes in normal versus prostate tumor tissue [38]. Recent studies have suggested a significant role of deregulated microtubule dynamics in enhanced genomic instability and tumor development [39]. Our results suggest that alteration of microtubule dynamics by alternative splicing may be an important pathogenetic mechanism in high stage neuroblastoma. Further functional study of these microtubule-associated genes may reveal novel tumor suppressors or oncogenes for neuroblastoma.

Conclusions

In summary, our study demonstrated the important roles of splicing regulation in high-stage and MYCN amplified neuroblastoma. There may be a generalized shift in global splicing patterns synchronizing with the development of malignant phenotypes of tumor cells. Deciphering the "splicing code" is essential for our understanding of cancer etiology and progression pathway.

Methods

Tumor samples

We used 47 pretreatment primary neuroblastoma tumor samples in our study. All patients were blinded and anonymized to us. Our protocol was exempt from the NIH Multiple Project Assurance and our research activity involving human subjects was exempt from the office of Human Subjects Research (OHSR). The median age of forty-seven patients at diagnosis was 1.6 years (range 0.1 - 11.6 years). Ten patients had stage 1 disease while the remaining thirty-seven patients had stage 4 disease of which nine demonstrated MYCN gene amplification.

Array experiments and data analysis

We extracted total RNA from tumor samples [40], which was then processed and labeled using the Affymetrix whole transcript one-cycle labeling kit. After hybridization to Affymetrix Human Exon 1.0 arrays, chips were stained and scanned as per the manufacturer's instructions. For data analysis, we used Affymetrix Power Tools (APT) to obtain signal intensities for probesets and transcript clusters. Probeset signals were estimated by PLIER, and a detection p-value was assigned to each probeset by the DABG algorithm. Gene-level signals were derived with the IterPLIER algorithm for 17800 transcript clusters composed of core probesets that are supported by RefSeq transcripts and full-length mRNAs. IterPLIER identifies highly correlated probesets of each transcript cluster to derive the transcript cluster signal; thus the gene-level expression estimate mostly includes the expression level of constitutive exons, which reflects gene expression regulation at the level of transcription and/or RNA stability but not splicing. A probeset is considered to be expressed in a sample if its detection p-value is less than 0.05, and a transcript cluster is considered expressed if more than half of its constituent probesets have p-values less than 0.05.
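The two detection rules stated above can be written down directly. The sketch below assumes the DABG p-values are already available per probeset and per sample; the function names and example values are hypothetical and are not part of APT.

```python
def probeset_expressed(dabg_p, alpha=0.05):
    """A probeset is called expressed in a sample when its detection p-value is below alpha."""
    return dabg_p < alpha


def cluster_expressed(probeset_pvalues, alpha=0.05):
    """A transcript cluster is called expressed in a sample when more than
    half of its probesets are detected in that sample."""
    detected = sum(1 for p in probeset_pvalues if probeset_expressed(p, alpha))
    return detected > len(probeset_pvalues) / 2


# Hypothetical detection p-values for the probesets of one transcript cluster.
example_pvalues = [0.001, 0.20, 0.03, 0.04]
example_call = cluster_expressed(example_pvalues)  # 3 of 4 probesets detected
```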

For each core probeset (~284,000), we calculated the normalized intensity (NI), which is the probeset (exon-level) intensity divided by the transcript cluster (gene-level) intensity in each sample. The comparison of NI between two sample groups may reveal exons that are differentially spliced in the two groups. To reduce the false positive rate in splicing variant identification, several filtering steps were applied to the signals of both probesets and transcript clusters. First, the detection p-value was used to remove probesets and transcript clusters with undetectable signals. Probesets were required to be expressed in >50% of samples in at least one group, while transcript clusters had to be expressed in >50% of samples in both sample groups. Second, we removed probesets with very low variance based on the interquartile range of probeset intensities across all samples. The least variable 20% of probesets were discarded to reduce the number of cross-hybridizing or non-responsive probesets. Third, we set expression values of less than 30 to 30 to reduce noise [7]. Lastly, we reduced the effect of inaccurate annotations by removing transcript clusters with more than one annotated Entrez Gene identifier, as well as multiple transcript clusters annotated by the same Entrez Gene identifier.
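A minimal sketch of the NI calculation and of two of the filters described above is given here. It assumes that the floor of 30 is applied to both the probeset and the transcript cluster signal before the ratio is taken, which the text does not state explicitly; all function names and the toy data are hypothetical.

```python
import statistics

FLOOR = 30.0  # expression values below 30 are set to 30


def normalized_intensity(probeset_signal, cluster_signal, floor=FLOOR):
    """NI = exon-level signal / gene-level signal.
    Assumption: both signals are floored before the ratio is taken."""
    return max(probeset_signal, floor) / max(cluster_signal, floor)


def passes_detection(flags_group_a, flags_group_b):
    """Probeset rule: expressed in >50% of samples in at least one group."""
    frac_a = sum(flags_group_a) / len(flags_group_a)
    frac_b = sum(flags_group_b) / len(flags_group_b)
    return frac_a > 0.5 or frac_b > 0.5


def iqr(values):
    """Interquartile range of one probeset's intensities across samples."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1


def drop_least_variable(signals_by_probeset, fraction=0.20):
    """Discard the given fraction of probesets with the smallest IQR."""
    ranked = sorted(signals_by_probeset, key=lambda pid: iqr(signals_by_probeset[pid]))
    n_drop = int(len(ranked) * fraction)
    return {pid: signals_by_probeset[pid] for pid in ranked[n_drop:]}


# Toy intensities for five hypothetical probesets across four samples.
toy = {
    "p1": [100, 100, 100, 100],
    "p2": [50, 150, 250, 400],
    "p3": [80, 120, 200, 260],
    "p4": [40, 90, 300, 500],
    "p5": [60, 70, 180, 220],
}
kept = drop_least_variable(toy)
```

With five probesets and a 20% cutoff, the single flattest probeset is removed.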

After the filtering steps, we used the Significance Analysis of Microarrays (SAM) [16] two-class unpaired method to identify probesets with statistically significant changes in NI values between two groups of patients. Briefly, a score $d_i$ is calculated for each probeset $i$ to measure the relative difference in splicing index between two groups of samples,

$$ d_i = \frac{\bar{x}_{1}(i) - \bar{x}_{2}(i)}{s_i + s_0} $$

where $\bar{x}_{1}(i)$ and $\bar{x}_{2}(i)$ are the mean NI values of probeset $i$ in the two groups, $s_i$ is a pooled standard deviation over the two groups of samples, and $s_0$ is a small positive constant that adjusts for the small variability in the data. These $d_i$ values are used to rank probesets in ascending order and derive the observed order statistics $d(i)$. Then, a permutation procedure (n = 100) is applied to obtain a set of permuted relative difference values and the corresponding order statistics for each probeset. The average of the permuted order statistics is defined as the expected statistic $d_E(i)$, which is plotted against the observed statistic $d(i)$ in a scatter plot. Probesets that lie away from the $d(i) = d_E(i)$ line by a distance greater than an adjustable threshold Δ are called significant, and the percentage of such probesets identified by chance (the false discovery rate) is estimated from the permuted datasets. In our study, the threshold Δ was chosen to achieve a false discovery rate of 0.05, and the fold change of NI was required to be more than 1.5. The SAM analysis was done with the R package samr. Similarly, differentially expressed genes were identified by applying SAM to the signal estimates of transcript clusters. Non-specific filtering was applied to remove transcript clusters that were expressed in less than 20% of all samples. The false discovery rate was again required to be less than 0.05, and the signal fold change to be more than 1.5.
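For readers who want to see the mechanics, the following sketch computes a SAM-style relative difference, assuming the standard form d = (mean_1 - mean_2) / (s + s0) with the pooled standard deviation of the two-class unpaired case, and derives expected order statistics by permuting sample labels. It is an illustration only and does not reproduce the samr package, which chooses s0 from the data and handles the Δ threshold and FDR estimate; the function names, the sign convention, and the toy data are assumptions.

```python
import random
import statistics


def relative_difference(group_a, group_b, s0):
    """SAM-style score d = (mean_a - mean_b) / (s + s0) for one probeset."""
    n_a, n_b = len(group_a), len(group_b)
    mean_a, mean_b = statistics.fmean(group_a), statistics.fmean(group_b)
    ss = sum((x - mean_a) ** 2 for x in group_a) + sum((x - mean_b) ** 2 for x in group_b)
    s = ((1 / n_a + 1 / n_b) * ss / (n_a + n_b - 2)) ** 0.5  # pooled standard deviation
    return (mean_a - mean_b) / (s + s0)


def expected_order_statistics(data, n_a, s0, n_perm=100, seed=0):
    """Average the sorted d values over random relabelings of the samples.
    `data` holds one row of NI values per probeset; the first n_a columns are group A."""
    rng = random.Random(seed)
    totals = [0.0] * len(data)
    for _ in range(n_perm):
        order = list(range(len(data[0])))
        rng.shuffle(order)
        a_idx, b_idx = order[:n_a], order[n_a:]
        d_perm = sorted(
            relative_difference([row[j] for j in a_idx], [row[j] for j in b_idx], s0)
            for row in data
        )
        totals = [t + d for t, d in zip(totals, d_perm)]
    return [t / n_perm for t in totals]


# Toy NI values: 3 probesets x 6 samples (first 3 samples form group A).
toy_ni = [
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    [2.0, 2.1, 1.9, 2.0, 2.2, 1.8],
    [5.0, 4.0, 6.0, 1.0, 2.0, 1.5],
]
observed = sorted(relative_difference(row[:3], row[3:], s0=0.1) for row in toy_ni)
expected = expected_order_statistics(toy_ni, n_a=3, s0=0.1)
```

Probesets whose observed score departs from the expected score by more than Δ would then be called significant.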

For each probeset with significantly different NI between two tumor groups, we used AltAnalyze to identify competitive transcript isoforms that contain or do not contain the exon overlapping with that probeset. Possible splicing events caused by the alternative exon were classified into seven types (alternative N/C-terminus, alternative cassette exon, alternative 5'/3' splice site, retained intron, bleeding exon) based on the comparison of exon structures of competitive isoforms. Furthermore, we predicted changes in protein domains/motifs and miRNA binding sites associated with the probeset using both competitive isoform analysis and direct alignment method in AltAnalyze [17, 41].

Array data access and visualization

To facilitate exon array data access and visualization, we have developed a novel database and visualization system displaying both gene-level and exon-level expression for array probesets and transcript clusters (http://pob.abcc.ncifcrf.gov/cgi-bin/JK). Coupled with exon array annotation tools, such as ArrayCheck [42], this system allows easy query and visual inspection of alternatively spliced exons. Clinical outcome was also integrated with gene expression data such that a Kaplan-Meier plot may be dynamically generated for each probeset and transcript cluster. The raw data and the processed data are also available at http://www.ncbi.nlm.nih.gov/geo/ (GEO accession: GSE27608).

Quantitative RT-PCR validation

One microgram of total RNA was reverse transcribed to cDNA using Superscript II reverse transcriptase (Invitrogen) and random hexamer primer (Invitrogen) at 42°C for 1 hour. PCR reactions were performed with 40 cycles at 95°C for 30 sec, 60°C for 30 sec, and 72°C for 45 sec using SYBR Green PCR Master Mix (Applied Biosystems) and splice variant-specific primers on an ABI 7000 Sequence Detection System (Applied Biosystems). Splice variant-specific primers were designed using the Primer 3 program (http://frodo.wi.mit.edu/primer3) and synthesized by Integrated DNA Technologies. Primer sequences are as follows: PKM2 (NM_182470 & NM_182471) forward 5'-CTA TCC TCT GGA GGC TGT GC-3', reverse 5'-GAG GCT CGC ACA AGT TCT TC-3'; PKM2 (NM_002654) forward 5'-ATC GTC CTC ACC AAG TCT GG-3', reverse 5'-GAA GAT GCC ACG GTA CAG GT-3'; KIF1Bα (NM_183416) forward 5'-GAA GAT CGA AGA CGT CAT GGC C-3', reverse 5'-ACA CCA GCA CCA ACA GGC TCC-3'; KIF1Bβ (NM_015074) forward 5'-ACT TCT AGC TGG CAC AAT ACG-3', reverse 5'-GTC ACC GTC AAG AAT CAC AAA G-3'; MAP2 (NM_002374) forward 5'-TGG GTG GAC ACT CAA GAT GA-3', reverse 5'-TGA TCT CCG AGC TTC CTT TT-3'; MAP2 (NM_001039538) forward 5'-GCT CTG GCT CCC AGT GTA TT-3', reverse 5'-CTG CTG AGG TGG GCT GTA TT-3'.

To evaluate differential expression of splice variants in stage 1- and stage 4+ tumors, we calculated expression fold changes between different splice variants of the spliced gene in each sample, which were further centered by median of values obtained in stage 1- and stage 4+ tumors.
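One way to carry out this calculation is sketched below. It assumes that the raw measurements are threshold cycle (Ct) values and that amplification efficiency is close to 100%, so that the log2 ratio of two variants in a sample is the difference of their Ct values; neither assumption is stated in the text, and the Ct values shown are invented for illustration.

```python
import statistics


def log2_variant_ratio(ct_variant_a, ct_variant_b):
    """log2(variant A / variant B) within one sample.
    Assumption: ~100% amplification efficiency, so one cycle is a two-fold change."""
    return ct_variant_b - ct_variant_a


def median_centered(values):
    """Subtract the median taken over all samples from both tumor groups."""
    center = statistics.median(values)
    return [v - center for v in values]


# Hypothetical Ct pairs (variant A, variant B) per tumor; not data from the study.
stage1 = [(24.0, 22.0), (23.5, 22.0), (24.5, 22.5)]
stage4 = [(21.0, 23.0), (21.5, 23.5), (20.5, 23.0)]

ratios = [log2_variant_ratio(a, b) for a, b in stage1 + stage4]
centered = median_centered(ratios)
centered_stage1 = centered[:len(stage1)]
centered_stage4 = centered[len(stage1):]
```

The two lists of centered values are what a two-sample t-test would then compare between stage 1- and stage 4+ tumors.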