Transcriptome sequencing and detection of fusion genes
To identify fusion genes in pediatric ALL, we sequenced the transcriptomes of 134 ALL samples, including 116 BCP-ALL and 18 T-ALL patients (Table 1; Additional file 1: Table S1). RNA sequencing yielded between 19 and 120 million (average 46 million) paired-end reads per sample. Prior to filtering, we detected 2136 candidate fusion transcripts across all samples with the FusionCatcher software. On average, we discovered 31 candidate fusion genes per sample (range 2–158). To reduce the number of potential false-positive fusion transcripts, we performed stringent filtering of the candidates, including removal of fusion genes called in normal B and T cells, to enrich for cancer-specific fusion genes in the ALL samples. This filtering procedure yielded a set of 197 unique candidate fusion genes identified in 97 of the ALL patient samples (Additional file 3: Fig. S1B). Next, we validated candidate fusion genes by visual examination of the aligned sequencing reads that supported a fusion junction in the RNA sequencing data. Of these candidates, 104 were selected for further validation by PCR followed by Sanger sequencing, and 61 of the fusion genes were experimentally validated (Additional file 1: Table S6). In addition, we performed targeted screening of 22 well-established ALL fusion genes, which detected additional fusion genes, including DUX4-IGH (n = 8 patients), TAF15-ZNF384 (n = 1), and STIL-TAL1 (n = 1) (Additional file 1: Table S4). Thus, after filtering and experimental validation, we detected a total of 64 unique fusion events, corresponding to 136 fusion genes in 80 of the patients included in the study (Fig. 1).
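The control-based part of the filtering can be illustrated with a minimal sketch. It assumes that each fusion call is reduced to a (5′ gene, 3′ gene) pair and that any pair also called in a normal B or T cell sample is discarded. The function name, the data layout, and the toy calls are hypothetical; they do not reproduce the FusionCatcher output format or the full set of filters applied in the study.

```python
from typing import Dict, Set, Tuple

Fusion = Tuple[str, str]  # (5' partner, 3' partner)


def filter_against_controls(
    calls_by_sample: Dict[str, Set[Fusion]],
    control_calls: Set[Fusion],
) -> Dict[str, Set[Fusion]]:
    """Drop fusion pairs that were also called in normal control cells."""
    filtered: Dict[str, Set[Fusion]] = {}
    for sample, calls in calls_by_sample.items():
        kept = calls - control_calls
        if kept:  # keep only samples that retain at least one candidate
            filtered[sample] = kept
    return filtered


# Toy input: sample names and the GENE1/GENE2 pair are made up for illustration.
calls = {
    "sample_A": {("ETV6", "RUNX1"), ("GENE1", "GENE2")},
    "sample_B": {("GENE1", "GENE2")},
}
controls = {("GENE1", "GENE2")}  # hypothetical call also seen in normal B/T cells

filtered = filter_against_controls(calls, controls)
unique_candidates = set().union(*filtered.values()) if filtered else set()
print(len(unique_candidates), "unique candidate(s) in", len(filtered), "sample(s)")
```

In this toy input, the pair shared with the controls is removed from both samples, so only one sample retains a candidate.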
Characteristics of the fusion genes in pediatric ALL
A fusion gene was detected in 74 of the 116 BCP-ALL patients, whereas a fusion gene was detected in only six out of the 18 T-ALL patients analyzed. No difference in fusion gene calls was observed between samples originating from the bone marrow or peripheral blood. Among the BCP-ALL patients with a detectable fusion gene, the frequency of fusion genes varied from one to four co-occurring fusion genes, although in the majority of the BCP-ALL patients (38/74), only one fusion gene was detected per sample (Fig. 1a). In the six T-ALL patients with a detectable fusion gene, only a single fusion gene was observed per sample.
In the BCP-ALL subtypes t(12;21)ETV6-RUNX1, t(9;22)BCR-ABL1, and 11q23/MLL, a fusion gene was detected by RNA sequencing in all but one t(12;21) patient (ALL_504), whereas patients with the HeH subtype had the lowest frequency of fusion genes (Fig. 1b). In addition to fusion genes that are characteristic of ALL, we identified several other fusion genes in our patient cohort (Fig. 1c, d). The largest proportion of unique fusion genes was detected in the BCP-ALL “other” subgroup, where a fusion gene was detected in 33 of the 42 patients, 16 of whom expressed more than one.
Forty of the 64 unique fusion genes occurred between two genes on different chromosomes, while 24 fusion genes were caused by presumptive intra-chromosomal rearrangements; in 18 (75%) of these, the involved genes are located more than 1 Mbp apart (Fig. 1c, d, Additional file 1: Table S6). Thirty-six fusion genes had an open reading frame, and in-frame fusion genes were more common in the t(12;21), t(9;22), and 11q23/MLL subtypes than in the ALL patients with HeH, where only two out of 12 fusion genes were in-frame. Most fusion genes (43/64) were detected in only a single patient, and 36 (56%) have not been previously described in ALL.
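The distinction between inter-chromosomal fusions and intra-chromosomal fusions with partners more than 1 Mbp apart can be sketched as follows. This is an illustrative classification only: the `GeneLocus` type, the gap definition (distance between the closest gene boundaries), and the coordinates are assumptions, not the annotation or the procedure used in the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneLocus:
    chrom: str
    start: int  # gene start coordinate (bp)
    end: int    # gene end coordinate (bp)


def classify_fusion(a: GeneLocus, b: GeneLocus, distal_bp: int = 1_000_000) -> str:
    """Classify a fusion by the genomic relationship of its two partner genes."""
    if a.chrom != b.chrom:
        return "inter-chromosomal"
    # Gap between the closest gene boundaries; overlapping genes get a gap of 0.
    gap = max(max(a.start, b.start) - min(a.end, b.end), 0)
    if gap > distal_bp:
        return "intra-chromosomal (> 1 Mbp apart)"
    return "intra-chromosomal (<= 1 Mbp apart)"


# Placeholder coordinates, chosen only to exercise each branch.
gene_x = GeneLocus("chr12", 11_000_000, 11_200_000)
gene_y = GeneLocus("chr12", 15_000_000, 15_300_000)
gene_z = GeneLocus("chr21", 36_000_000, 36_400_000)

print(classify_fusion(gene_x, gene_z))
print(classify_fusion(gene_x, gene_y))
```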
The most common fusion gene was the well-known ETV6-RUNX1 and its reciprocal RUNX1-ETV6 in the t(12;21) BCP-ALL subtype (Fig. 1d). We also detected eight fusion genes, six of which were novel in-frame fusion genes, that were expressed concurrently with ETV6-RUNX1. In contrast to the t(12;21) subtype, we identified only a single in-frame fusion gene (TNKS-ATL3) in addition to the canonical BCR-ABL1/ABL1-BCR in the t(9;22) subtype. In the 11q23/MLL subtype, no in-frame fusion genes were detected besides the canonical KMT2A-AFF1/AFF1-KMT2A and KMT2A-MLLT3. Among the BCP-ALL “other” patients, DUX4-IGH was detected in eight out of 42 patients and was thus the most common fusion gene in this subgroup, followed by recurrent fusion events involving ZNF384 or PAX5.
Information from karyotyping performed at diagnosis provided support for genomic breakpoints giving rise to the fusion genes. In addition to the canonical fusion genes in t(12;21), t(9;22), and 11q23/MLL, we found evidence for genomic breakpoints for 12 fusion genes in the karyotype data. Supporting evidence in the karyotype data was primarily observed as translocations between the chromosomes where the fusion genes are located (Additional file 1: Table S7). Furthermore, we used array-based copy number analysis (CNA) to detect evidence for chromosomal rearrangements within or in close proximity to the genes involved in a fusion event. Deletions or amplifications in the genomic regions of genes involved in a fusion event were identified for an additional 11 fusion genes (Additional file 1: Table S7, Additional file 3: Fig. S2). Thus, in total, we detected the presumed genomic breakpoint that gave rise to 23 out of the 57 unique non-canonical fusion genes.
Recurrent fusion genes
Of the 64 fusion genes, 21 were recurrent in BCP-ALL (Fig. 2a). Patients with t(12;21)ETV6-RUNX1 (n = 18), t(9;22)BCR-ABL1 (n = 6), and 11q23/MLL (n = 7) all harbored translocations resulting in fusion genes that had been verified by either FISH or RT-PCR, and they served as positive controls for fusion gene detection by RNA sequencing. We detected the characteristic subtype-defining fusion transcripts, including their reciprocal fusion genes, in 29/31 (94%) of the patients in this positive control group (Additional file 1: Table S1). The expected canonical fusion gene was not detected in two patients. In both cases (ALL_504 t(12;21) and ALL_16 11q23/MLL), the libraries were among those with the lowest sequencing depth in the study (32 and 35 million read-pairs, respectively) (Additional file 1: Table S1). The targeted approach to identify fusion-supporting reads for ETV6-RUNX1 in ALL_504 revealed only two reads, while no fusion-supporting reads were detected in ALL_16 for 11q23/MLL-related fusion partners, although this does not exclude the presence of an unknown or rare fusion partner that was not included in our targeted approach. It is highly likely that low sequencing depth contributed to the false-negative results in these two cases.
Balanced translocations, which can result in expression of reciprocal fusion genes, occur in the t(12;21)ETV6-RUNX1, t(9;22)BCR-ABL1, and 11q23/MLL subtypes. In agreement with this, we found co-expression of the reciprocal fusion gene in several patients belonging to these three BCP-ALL subtypes (Fig. 2a). In the 11q23/MLL subgroup, we found co-expression of KMT2A-AFF1 and AFF1-KMT2A in all four patients with t(4;11), but no evidence for co-expression of the reciprocal fusion gene in the two patients with KMT2A-MLLT3. We also identified reciprocal BCR-ABL1 and ABL1-BCR in two out of the six t(9;22) patients. In the t(12;21) subtype, we detected co-expressed ETV6-RUNX1 and reciprocal RUNX1-ETV6 in eight out of 18 t(12;21) patients. Moreover, in three out of the seven patients with ETV6-RUNX1, but without the reciprocal RUNX1-ETV6, we identified co-expression of two other previously unreported in-frame fusion genes, namely DCAF5-ETV6 (ALL_386) and RUNX1-PTPRO (ALL_9 and ALL_678), which appear to have arisen from cryptic unbalanced translocation of the truncated ETV6 or RUNX1 gene to another chromosomal region (Additional file 1: Table S1). Interestingly, the PTPRO gene is located on chromosome 12 approximately 3.6 Mbp downstream of ETV6. The DCAF5 gene is located on chromosome 14, and the karyotype of ALL_386 revealed a complex translocation involving chromosomes 3, 12, and 14.
The DUX4-IGH fusion gene (n = 8) and ZNF384 rearrangements involving EP300 (n = 3) or TCF3 (n = 2) were the most frequently occurring fusion genes in the BCP-ALL “other” group in addition to P2RY8-CRLF2. Together with PAX5-ETV6 (n = 2), DUX4-IGH and ZNF384 rearrangements were observed exclusively in BCP-ALL “other” patients. The novel PDGFRA-SF1 (n = 2), VASH2-ATF3 (n = 3), and CD69-HIST1H2BG (n = 2) were also found exclusively in BCP-ALL “other” and were co-expressed with the aforementioned fusion genes such as DUX4-IGH or ZNF384-TCF3. Interestingly, TTYH3-PDGFA was found exclusively in three patients with HeH.
Fusion gene nodes
In addition to the recurrent fusion genes described above, we identified promiscuous genes that were fused with several gene partners (Fig. 2b). Of the 89 unique genes involved in fusion events, 11 genes had more than one fusion partner and constitute six independent nodes. These genes include the well-known ALL genes ETV6, RUNX1, KMT2A, and PAX5 that form fusion genes with up to five different partners; the emerging BCP-ALL subgroup defined by ZNF384 rearrangements; and the new P2RY8, PDGFA, and PDGFRA nodes.
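The notion of promiscuous genes and nodes can be made concrete with a small graph sketch. It assumes that a promiscuous gene is one with more than one distinct fusion partner and that promiscuous genes linked through shared fusions belong to the same node; both definitions and the function names are illustrative assumptions. The input is only a subset of the fusion genes named in the text, so the counts it produces differ from the 11 genes and six nodes reported for the full dataset.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def partner_map(fusions: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Map every gene to the set of genes it is fused with (orientation ignored)."""
    partners: Dict[str, Set[str]] = defaultdict(set)
    for five_prime, three_prime in fusions:
        partners[five_prime].add(three_prime)
        partners[three_prime].add(five_prime)
    return partners


def promiscuous_genes(partners: Dict[str, Set[str]]) -> Set[str]:
    """Genes with more than one distinct fusion partner."""
    return {gene for gene, others in partners.items() if len(others) > 1}


def nodes(partners: Dict[str, Set[str]]) -> List[List[str]]:
    """Group promiscuous genes that are connected through fusion events."""
    hubs = promiscuous_genes(partners)
    seen: Set[str] = set()
    result: List[List[str]] = []
    for hub in sorted(hubs):
        if hub in seen:
            continue
        component: Set[str] = set()
        stack = [hub]
        while stack:  # depth-first walk over the fusion graph
            gene = stack.pop()
            if gene in component:
                continue
            component.add(gene)
            stack.extend(partners[gene] - component)
        seen |= component
        result.append(sorted(component & hubs))
    return result


# Subset of fusion genes mentioned in the text (not the full dataset).
fusions = [
    ("ETV6", "RUNX1"), ("RUNX1", "ETV6"), ("DCAF5", "ETV6"), ("RUNX1", "PTPRO"),
    ("CBX3", "ETV6"), ("RUNX1", "ASXL1"), ("PAX5", "ETV6"), ("PAX5", "JAK2"),
    ("EP300", "ZNF384"), ("TCF3", "ZNF384"), ("TAF15", "ZNF384"),
    ("KMT2A", "AFF1"), ("KMT2A", "MLLT3"),
]
partners = partner_map(fusions)
print(sorted(promiscuous_genes(partners)))
print(nodes(partners))
```

With this subset, ETV6, RUNX1, and PAX5 fall into one node because PAX5-ETV6 and ETV6-RUNX1 link them, while KMT2A and ZNF384 each form a node of their own.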
ETV6 and PAX5 were the most frequently translocated genes and formed the largest network of gene connections, often resulting in in-frame fusion genes. Although fusion genes involving ETV6 and RUNX1 were predominantly detected in the t(12;21) subgroup, CBX3-ETV6, RUNX1-ASXL1, and ETV6-AK125726 were detected in patients with other subtypes. These three out-of-frame fusion genes have been previously described in these patients based on a t(12;21)-like DNA methylation signature. This is contrary to the general pattern in the t(12;21) subgroup, where additional in-frame fusion genes with ETV6 or RUNX1 are expected. For example, DCAF5-ETV6 and RUNX1-PTPRO are both in-frame.
DNA methylation and transcriptional signatures
Next, to obtain a view of the molecular variation associated with recurrent fusion genes in our dataset, we performed unsupervised clustering analysis using array-based genome-wide DNA methylation data and gene expression data from RNA sequencing (Fig. 3, Additional file 3: Fig. S3). As mentioned above, t(12;21)-like DNA methylation patterns have previously been described for three of the patients included in this study (ALL_11, ALL_106, and ALL_495). These patients were found to harbor fusion genes involving either ETV6 or RUNX1, but not the canonical ETV6-RUNX1 fusion gene. In agreement with these findings, these three patients clustered together with the t(12;21)ETV6-RUNX1 patients in DNA methylation and gene expression data (Fig. 3a, b, e, f). Notably, the patient with a CBX3-ETV6 fusion gene, who was diagnosed as HeH and verified by CNA, clustered together with the t(12;21) rather than with the HeH subgroup. Furthermore, three t(12;21) patients (ALL_386, ALL_9, and ALL_678) harboring unbalanced t(12;21) translocations with DCAF5-ETV6 or RUNX1-PTPRO, and six patients in which only one of the reciprocal fusion genes was detected, clustered with the t(12;21) subgroup based on both DNA methylation and gene expression data. The two patients with PAX5-ETV6 did not cluster with the t(12;21) patients, most likely due to the downstream effect of altered PAX5 rather than ETV6 in these patients.
We also examined the DNA methylation and gene expression patterns for recurrent fusion gene families characterized by DUX4-IGH, ZNF384, or PAX5 rearrangements in the BCP-ALL “other” group. Three major clusters defined by the recurrent fusion gene families emerged (Fig. 3c, d). One additional patient (ALL_205) clustered together with the eight DUX4-IGH patients, although DUX4-IGH was not detected by RNA sequencing. DUX4-rearranged cases have previously been associated with a distinct over-expression of the DUX4 gene. In agreement with this, DUX4 expression in ALL_205 was as high as in the DUX4-IGH-positive cases and much higher than in other BCP-ALL subgroups and controls (Additional file 3: Fig. S4A). 3′ RACE confirmed the presence of a DUX4-IGH fusion gene in ALL_205; however, a ~470 bp segment from chromosome 8q24.21 (chr8:130,691,944-130,692,413) was inserted between the DUX4 and IGH genes in the fusion transcript, which explains why it was not initially detected by our targeted screening approach (Additional file 3: Fig. S5A).
The six patients with ZNF384 rearrangements clustered together in both the DNA methylation and gene expression data (Fig. 3c–f). In the ZNF384 cluster, we observed three additional patients (ALL_58, ALL_61, and ALL_257) with no evidence for a ZNF384 rearrangement despite targeted screening (Fig. 3c, d). Unlike the DUX4-IGH group, patients with ZNF384 rearrangements lacked differential expression of the ZNF384 gene and its associated fusion partners (Additional file 3: Fig. S4B). 5′ RACE was performed to amplify the fusion transcripts without prior knowledge of the ZNF384 fusion partner in these three patients. A novel fusion between the ATP5C1 gene on chromosome 10 (5′ fusion partner) and ZNF384 (3′ fusion partner) was detected in ALL_257 (Additional file 3: Fig. S5B). The RACE experiments were inconclusive for ALL_58 and ALL_61, and additional experiments will be needed to identify which, if any, ZNF384 fusion gene is present in these patients.
Consistent with the previously reported Philadelphia-like signature associated with PAX5-JAK2, patient ALL_539 clustered together with the t(9;22)BCR-ABL1 patients (Fig. 3e, f) [4, 33]. The remaining patients with other PAX5 fusion genes clustered together.
Differential DNA methylation and gene expression
To date, differential DNA methylation has not been comprehensively studied in the DUX4-IGH- and ZNF384-rearranged subgroups. We therefore highlight differentially methylated CpG sites (DMCs) in combination with differentially expressed genes in patients with DUX4-IGH and ZNF384 rearrangements compared to patients with well-established ALL subtypes and normal CD19+ B cells.
We detected 2740 and 3516 DMCs specific to the DUX4-IGH- and ZNF384-rearranged subgroups, respectively (Additional file 1: Table S8–S9). DUX4-IGH was characterized by widespread hypomethylation compared with normal B cells and the other ALL subtypes, whereas the group with ZNF384 rearrangements was hypermethylated (Fig. 4, Additional file 1: Table S10). The DMCs were distributed across 245 and 192 genes unique to the DUX4-IGH and ZNF384-rearranged groups, respectively (Additional file 1: Table S10). While no enrichment for known pathways was observed, an enrichment of genes regulated by the transcription factor E2F1 was found in the DUX4-IGH subgroup (p = 8.2 × 10⁻⁵). E2F1 is a member of the E2F family of transcription factors and acts as a potent transcriptional activator and master regulator of cell cycle progression.
To provide additional insights into possible functional implications of the subtype-specific DMCs, we examined their distribution across functional genomic regions defined by chromatin marks and DNaseI hypersensitive sites and in relation to CpG islands and gene-centric regions. The majority of the DMCs (88%) in DUX4-IGH were hypomethylated, enriched in gene bodies, and depleted in regions with open chromatin and high CG density (Fig. 4a). In contrast, the predominantly hypermethylated DMCs (80%) in the ZNF384-rearranged group were strongly enriched in CpG islands marked by bivalent chromatin marks (H3K4me3 and H3K27me3) and in open chromatin regions (Fig. 4b).
To investigate whether differential methylation is associated with gene expression, differential gene expression analysis was performed in the DUX4-IGH- and ZNF384-rearranged groups (Additional file 1: Table S11–S12). Approximately 3% of the genes with DMCs overlapped with differentially expressed genes, corresponding to 47 and 63 overlapping genes in the DUX4-IGH- and ZNF384-rearranged groups, respectively (Fig. 4c, d, Additional file 1: Table S13). Most of the overlapping genes in the DUX4-IGH group showed an inverse correlation between methylation and gene expression (85%), with the majority of genes upregulated in DUX4-IGH compared to the other subtypes. Several of the overlapping genes (n = 9), such as GATA3, WT1, and ITGA6, were directly regulated by or associated with the hypomethylated ESR1 gene. Alterations involving ESR1 have mainly been described in breast cancer. However, a study performed in non-hyperdiploid multiple myeloma proposed that ESR1 contributes to cell cycle dysregulation, thus affecting the transcription of several downstream genes including E2F1. Similarly, in the ZNF384-rearranged group, the majority of the genes showed an inverse correlation between methylation and gene expression (84%); however, the differentially expressed and methylated genes were downregulated compared to the other ALL subgroups. These include genes involved in hematological system development such as SETBP1 and NRP1 and putative tumor suppressor genes such as PARD3 and LRIG1 [36–38]. A notable exception is the overexpression of SALL4 in the ZNF384-rearranged group, which has been described as an oncogene in leukemia [39, 40].
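The overlap between genes with DMCs and differentially expressed genes, and the share of overlapping genes with an inverse relationship, can be sketched as follows. The sketch assumes that each gene is summarized by one methylation difference (delta beta) and one log2 fold change, and that "inverse" means the two values have opposite signs. The gene names and values are placeholders, not results from the study.

```python
from typing import Dict, List, Tuple


def overlap_and_inverse(
    delta_beta: Dict[str, float],
    log2_fc: Dict[str, float],
) -> Tuple[List[str], List[str]]:
    """Return genes present in both inputs and those with opposite-sign changes."""
    shared = sorted(set(delta_beta) & set(log2_fc))
    # Hypomethylated and upregulated, or hypermethylated and downregulated.
    inverse = [g for g in shared if delta_beta[g] * log2_fc[g] < 0]
    return shared, inverse


# Placeholder per-gene summaries (hypothetical values).
dmc_genes = {"GENE_A": -0.35, "GENE_B": -0.28, "GENE_C": 0.40, "GENE_D": -0.30}
de_genes = {"GENE_A": 2.1, "GENE_C": -1.7, "GENE_D": -1.2, "GENE_E": 3.0}

shared, inverse = overlap_and_inverse(dmc_genes, de_genes)
print(f"{len(inverse)}/{len(shared)} overlapping genes show an inverse relationship")
```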
To further characterize the subgroups with DUX4-IGH and ZNF384 rearrangements, we utilized previously published targeted exome sequencing data including 872 cancer genes and array-based copy number data derived from the HumanMethylation 450k arrays from the same patients. Targeted sequencing data were available for five out of the nine DUX4-IGH-positive cases. Non-synonymous somatic mutations in the mutation hotspot p.G12 of the NRAS gene were found in all five DUX4-IGH-positive samples that were analyzed by targeted sequencing (Table 2). We screened for NRAS mutations in the RNA-sequencing reads, which confirmed the presence of the five aforementioned NRAS mutations, but no additional NRAS mutations were detected in the remaining four patients. We detected ERG deletions, which are a common alteration in this subgroup [14, 15], in seven out of the nine DUX4-IGH patients (Additional file 3: Fig. S6). Interestingly, we also detected shared non-synonymous PTPN11 mutations and chr7q deletions in the two patients with TCF3-ZNF384 (Table 2). These mutations were confirmed in the RNA-sequencing data, and no additional PTPN11 mutations were detected in the other patients with ZNF384 rearrangements. Furthermore, Hirabayashi et al. reported that two out of six patients with TCF3-ZNF384 harbored PTPN11 mutations detected by whole exome sequencing or RNA sequencing; however, no additional copy number analysis was performed. Together, these findings, albeit in a small sample set, suggest a common pattern related to the TCF3-ZNF384 fusion gene.
Recent studies have shown that DUX4 rearrangements are associated with a favorable prognosis of pediatric ALL, whereas ZNF384 rearrangements appear to be associated with an intermediate outcome. To assess the prognostic impact of recurrent fusion genes in our cohort, we determined the event-free survival (EFS) in the BCP-ALL subgroups (Additional file 3: Fig. S7). One relapse was observed among the nine patients with DUX4-IGH, which is consistent with previous reports of a generally favorable outcome associated with DUX4 rearrangements [9, 14, 15]. Inferior prognosis has been associated with TCF3-ZNF384, while favorable prognosis has been associated with the EP300-ZNF384 fusion gene. In the present study, no relapses were observed in patients with EP300-ZNF384. Relapses were observed in patients with each of the other fusion gene partners (TCF3-ZNF384 (ALL_622), TAF15-ZNF384 (ALL_8), and ATP5C1-ZNF384 (ALL_257)).
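For readers unfamiliar with EFS estimation, the following is a minimal sketch of a Kaplan-Meier estimate. It assumes a Kaplan-Meier estimator, which is a common choice for EFS but is not stated in this section, and the follow-up times and event indicators are invented toy values rather than patient data.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Kaplan-Meier estimate: (time, event-free probability) at each event time.

    events[i] is 1 if an event (e.g. relapse) occurred at times[i] and 0 if the
    patient was censored at that time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve: List[Tuple[float, float]] = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Collect all patients whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events:
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_leaving
    return curve


# Toy follow-up data in years (hypothetical): 1 = event, 0 = censored.
toy_times = [2, 5, 5, 8, 10]
toy_events = [1, 0, 1, 0, 0]
for t, s in kaplan_meier(toy_times, toy_events):
    print(f"t = {t}: EFS = {s:.2f}")
```

In the toy data, one of five patients has an event at year 2 and one of the four remaining at year 5, giving estimates of 0.80 and 0.60.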