Background

Medulloblastoma (MB) is the most common malignant brain tumour of childhood, accounting for around 10% of paediatric cancer deaths. Despite recent therapeutic advances, up to 40% of patients still die from their disease and cure is often associated with disabling therapy-related effects in later life. New therapeutic approaches based on improved biological understanding of the disease will be essential to improve outcomes, through strategies such as the delivery of risk-adapted therapies guided by molecular prognostic biomarkers, and the stratified use of molecularly-targeted agents (reviewed in [1]).

Four major molecular subtypes of medulloblastoma with distinct biological, clinical and pathological features are now recognised, and this subgrouping is beginning to have clinical impact [1-3]. WNT subgroup tumours (~15% of total) are characterised by activation of the WNT/Wingless pathway through CTNNB1 mutations, and appear to originate from progenitor cells derived from the dorsal brain stem [4]. Patients with WNT-associated tumours have a favourable prognosis and will receive reduced therapies in forthcoming international clinical trials [1, 5, 6]. The SHH subgroup (~25% of total) is defined by activation of the sonic hedgehog signalling pathway, and mutations in SHH pathway genes (e.g. PTCH1, SUFU) arise in a significant subset [2, 3]. Current evidence indicates that these tumours originate from cerebellar granule neuron precursors (cGNPs) within the external granular layer of the developing cerebellum (reviewed in [7]) or from the cochlear nuclei of the brainstem [8]. SHH tumours are associated with an intermediate prognosis. Early clinical trials of SHH pathway inhibitors are underway, although acquired resistance has been reported [9] and tumours with downstream pathway mutations (e.g. GLI2, SUFU) are predicted to be insensitive to their action [10]. Group 3 and 4 tumours are more heterogeneous and show overlapping molecular features such as frequent chromosome 17 defects. However, Group 3 tumours (~25% of total) have been associated with high-risk features, such as MYC gene amplification and large-cell/anaplastic (LCA) pathology, and a poor prognosis [2].

Recent genome-wide studies have further underlined the complexity of MB; heterogeneous mutations targeting processes including histone methylation and chromatin remodelling have been discovered, but these typically describe limited subsets of tumours and few additional recurrent mutations targeting specific genes and/or pathways have been identified [11-15]. Critical genes therefore remain to be uncovered in all MB subgroups and there is a growing need to identify low-frequency alterations which drive disease progression, distinguish these from passenger mutations, and determine their mechanisms of action and clinical significance. Primary tumour data alone may not be sufficient to achieve these goals.

A number of murine MB models have been created allowing comparative analyses (reviewed in [16]), including models which recapitulate WNT [4] and Group 3 [17, 18] tumours. However, the most widely studied mouse model is a knockout of the Shh transmembrane receptor, Ptch [19], which mimics SHH subgroup tumours. Ptch+/- heterozygotes develop MBs at a frequency which is significantly influenced by genetic background [20]. Sleeping Beauty (SB) murine mutagenesis [21, 22], coupled to statistical analysis of insertion site distribution [23, 24], has emerged as a powerful method to identify genes involved in a wide variety of human cancers [24-29]. Recently the SB11 transposase [22] driven by the Math1 promoter has been used to mutagenise developing neuronal tissues in both the Ptch+/- and p53 loss-of-function models of MB [30]. This identified a large number of candidate genes potentially involved in MB progression, and demonstrated that the genetic events observed in metastases show limited overlap with those in matched primary tumours, supporting a bicompartmental genetic model of metastatic disease [30, 31].

Here, we report the application of whole-body SB mutagenesis [26] to the Ptch+/- tumour model, and the identification of 17 genes associated with enhanced medulloblastoma formation. We show that these genes are enriched for neuronal transcription factors defining a novel gene network which, when mutagenised by SB, is associated with increased cell proliferation and reduced neuronal differentiation. Significantly, increased expression of Igf2, a gene known to be essential for tumour formation in the Ptch+/- model, is associated with insertional mutations within this network. Moreover, we show that in human disease, network activity predicts poor survival specifically within the SHH tumour subgroup. Together, these findings provide important novel insights into the molecular mechanisms of medulloblastoma pathogenesis, and identify exploitable therapeutic targets and prognostic biomarkers for development towards improved therapy.

Results

The incidence of MB but not RMS is enhanced in mutagenised Ptch+/- mice

A total of 243 mutagenised Ptch+/- animals (Ptch+/-; SB11+/-;T2Onc+/-) and 195 control littermates were aged for up to 15 months and monitored for tumour development. Mortality in mutagenised Ptch+/- animals was approximately 90% after 1 year, significantly higher than in the predisposition only (Ptch+/-;T2Onc+/-) or transposition only (T2Onc+/-;SB11+/-) control genotypes (Figure 1a and Additional file 1: Table S1). Approximately 28% of mutagenised Ptch+/- animals aged for more than 6 months succumbed to haematological neoplasms, which usually presented as large thoracic tumours and/or hepatosplenomegaly. A low frequency of parenchymal brain lesions consistent with glial tumours (2%) was also observed. These malignancies occurred at comparable frequencies within the transposition controls, have previously been reported using the same transposase/transposon combination [26, 29], and are not analysed further here.

Figure 1

Mortality in experimental/control cohorts and tumour pathology. a, d, g. Kaplan-Meier survival curves. All p-values are from log-rank tests. a. Mortality due to all causes in experimental cohort and controls. b. Medulloblastoma (MB) arising from cerebellum (C). Scale bar = 500 μm. c. High resolution image showing typical cytomorphology of small, polygonal or slightly elongated cells. Perivascular pseudorosetting and occasional necrotic foci were also seen in some tumours. Scale bar = 50 μm. d. Mortality due to medulloblastoma alone. The difference between predisposition and transposition controls was not significant (p=0.118). e. Typical appearance of RMS consisting of variable numbers of small, round mitotically active cells together with larger, more well-differentiated rhabdomyoblasts arranged in interlacing fascicles. Scale bar = 200 μm. f. High resolution image showing large rhabdomyoblasts with characteristic cross-striations, accompanied by smaller less well differentiated cells. Scale bar = 50 μm. g. Mortality due to RMS alone. h. Hepatocellular adenoma containing mixed pattern (micro- and macrovesicular) steatosis and leukaemic infiltration. Scale bar = 50 μm. i. Low power view of a well demarcated hepatocellular carcinoma with leukaemic infiltration (T=tumour, L=liver parenchyma). Scale bar = 500 μm. j. High resolution image of hepatocellular carcinoma showing tumour cells (right-hand side of image) with eosinophilic cytoplasmic inclusions. Scale bar = 50 μm. All sections are stained with Haematoxylin and Eosin.

Despite this mutagen-specific burden, significant mortality within mutagenised Ptch+/- animals was related to the Ptch genotype. Large exophytic and/or invasive MBs (Figure 1b and c) developed in ~23% of mutagenised Ptch+/- animals aged for over 6 months, with no macroscopic meningeal masses being observed (typical of the Ptch model [19]). Tumours with indistinguishable pathology were observed in the predisposition controls but at a much lower frequency (6%), with survival analysis providing clear evidence that SB mutagenesis enhanced the predisposition of Ptch+/- mice to MB (p<0.0001, Figure 1d). Slow-growing and locally invasive rhabdomyosarcomas (RMSs), also typical of the Ptch+/- model [32], developed in ~22% of experimental animals aged over 6 months (Figure 1e and f). However, mutagenesis did not significantly alter RMS-related mortality relative to predisposition controls (Figure 1g, p=0.31). Finally, multiple liver tumours (Figure 1h-j), morphologically similar to those generated using a conditional SB11 screen for hepatocellular carcinoma [27], were observed in 19 mutagenised Ptch+/- mice but not in predisposition controls. Although not previously reported in whole-body SB mutagenesis screens, liver tumours were also observed at low frequency in transposition only controls (Additional file 1: Table S1), suggesting that they can be induced by SB mutagenesis alone.

SB insertions target neuronal transcription factors

To identify genes responsible for the impact of mutagenesis upon MB incidence, transposon insertion sites within MBs and control cerebellar tissues were recovered using Splinkerette PCR, sequenced, and their distribution analysed using Gaussian kernel convolution (GKC [23]) to identify common insertion sites (CISs, see Methods). A total of 17 genes were identified within the 20 CISs recovered [median p-value = 0.008 (Table 1)]. The majority were also recovered using Monte Carlo simulation analysis [24], and these are highlighted in bold in Table 1. Three of the CISs target known tumourigenic genes (Crebbp, Nfib and Pten), orthologs of two of these (Crebbp and Pten) are somatically mutated at low frequency in human MB [11, 33], and six CIS genes (Crebbp, Nfia, Nfib, Pten, Sfi1, and Tead1) have recently been identified as MB CISs in a tissue-specific mutagenesis screen [30]. Strikingly, ontological analysis established that six of the 17 CIS genes have transcription factor activity (Nfia, Tead1, Tgif2, Nfib, Myt1l and L3mbtl4), a highly significant excess relative to expectation (FDR-corrected p-value = 2×10^-5). Furthermore, seven genes (Tgif2, Pten, Nfia, Nfib, Myt1l, Slit3 and Fgf13) are implicated in neuronal biological processes. In contrast, only three genic CISs were identified in control tissue, all with modest p-values (0.01-0.05, Table 1). All of these observations are consistent with insertional mutations at CISs contributing to increased penetrance of the tumour phenotype. As all SB insertions within Tgif2 mapped to a single ~4 kb intron, these were analysed in detail using both genomic DNA and cDNA templates. This both validated our sequence data and confirmed the inferred upregulation of this gene by SB insertion (Additional file 2: Figure S1).
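The idea behind kernel-based CIS calling can be illustrated with a simplified, single-scale version. This is a sketch under stated assumptions, not the published GKC implementation [23], which differs in detail (for example in its choice of kernel widths and in how significance is estimated). Here insertions on one chromosome are smoothed with a Gaussian kernel, and regions whose summed density exceeds the 95th percentile of the maximum density obtained from uniformly random insertions are reported. The function names, the 30 kb kernel width and the 5 kb grid step are hypothetical choices for illustration only.

```python
import math
import random


def kernel_density(insertions, positions, width):
    """Sum of Gaussian kernels (s.d. = width, in bp) centred on each insertion,
    evaluated at every query position."""
    return [sum(math.exp(-0.5 * ((x - i) / width) ** 2) for i in insertions)
            for x in positions]


def null_peak_threshold(n_insertions, chrom_length, width,
                        n_perm=200, alpha=0.05, seed=0):
    """(1 - alpha) quantile of the maximum density seen when the same number of
    insertions is scattered uniformly at random (hypothetical null model)."""
    rng = random.Random(seed)
    maxima = []
    for _ in range(n_perm):
        sim = [rng.uniform(0, chrom_length) for _ in range(n_insertions)]
        # Evaluating at the simulated insertions themselves approximates the peak.
        maxima.append(max(kernel_density(sim, sim, width)))
    maxima.sort()
    return maxima[int(round((1 - alpha) * n_perm)) - 1]


def call_cis(insertions, chrom_length, width=30_000, step=5_000, **null_kw):
    """Return (start, end) intervals whose kernel density exceeds the null threshold."""
    threshold = null_peak_threshold(len(insertions), chrom_length, width, **null_kw)
    grid = list(range(0, chrom_length + 1, step))
    density = kernel_density(insertions, grid, width)
    regions, start = [], None
    for x, d in zip(grid, density):
        if d > threshold and start is None:
            start = x                      # entering an above-threshold run
        elif d <= threshold and start is not None:
            regions.append((start, x))     # leaving it
            start = None
    if start is not None:
        regions.append((start, chrom_length))
    return regions


# Hypothetical example: 10 clustered insertions near 1 Mb plus 20 evenly spread ones.
cluster = [995_000 + 1_000 * k for k in range(10)]
background = [50_000 + 100_000 * k for k in range(20)]
print(call_cis(cluster + background, chrom_length=2_000_000))
```

Only the dense cluster is expected to be called; isolated insertions contribute a density of about 1, which random placement of the same number of insertions routinely exceeds.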

Table 1 Common Insertion Sites (CISs) in Medulloblastomas and Cerebellum Controls

Association of CIS genes with survival and focal copy number alterations

To investigate the association between CIS gene expression and survival, log-rank (Mantel-Cox) tests were performed on microarray expression data from human tumours [35], dichotomised at the median expression of each gene. Reduced expression of two genes, PTEN (a tumour suppressor previously implicated in MB) and MYT1L, was associated with poor outcome (Figure 2). This is consistent with the inferred mode of action of the SB insertions that target these genes (Table 1). Furthermore, the association of MYT1L expression with survival remained significant within a Cox-regression model incorporating high-risk clinical features using the data from Cho, Tsherniak et al. [35], even after exclusion of the good prognosis WNT subgroup (p=0.011, see Additional file 3: Table S2).
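The median-split survival comparison can be sketched as a two-group log-rank (Mantel-Cox) test. The minimal implementation below assumes one expression value, one follow-up time and one event indicator (1 = death, 0 = censored) per patient; assigning ties at the median to the low-expression group is an assumption, and the helper names are hypothetical. The p-value uses the chi-square distribution with one degree of freedom.

```python
import math
from statistics import median


def logrank(times, events, groups):
    """Two-group log-rank (Mantel-Cox) test.

    times: follow-up times; events: 1 = death observed, 0 = censored;
    groups: 0/1 labels. Returns (chi-square statistic, p-value), 1 d.f.
    """
    data = list(zip(times, events, groups))
    observed_minus_expected, variance = 0.0, 0.0
    for t in sorted({tt for tt, e, _ in data if e}):            # distinct event times
        n = sum(1 for tt, _, _ in data if tt >= t)                # at risk, both groups
        n1 = sum(1 for tt, _, g in data if tt >= t and g == 1)    # at risk, group 1
        d = sum(1 for tt, e, _ in data if tt == t and e)          # deaths at t
        d1 = sum(1 for tt, e, g in data if tt == t and e and g == 1)
        observed_minus_expected += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = observed_minus_expected ** 2 / variance
    # Upper tail of chi-square with 1 d.f.: P(X > x) = erfc(sqrt(x / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))


def median_split_logrank(expression, times, events):
    """Dichotomise at the median expression (ties -> low group) and run the test."""
    cut = median(expression)
    groups = [1 if x > cut else 0 for x in expression]
    return logrank(times, events, groups)


# Hypothetical cohort in which high expressers die earlier.
expr = list(range(1, 11))
times = [10, 12, 14, 16, 18, 1, 2, 3, 4, 5]
events = [1] * 10
print(median_split_logrank(expr, times, events))
```

In practice an established survival library (for example the log-rank test in the Python `lifelines` package) would normally be used; the hand-written version only makes the observed-minus-expected and variance terms explicit.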

Figure 2

Expression of MYT1L and PTEN correlates with survival. Kaplan-Meier curves showing survival in high- and low-expressing groups in the data set of [35], split at the median expression across all MBs regardless of subgroup. a. MYT1L, b. PTEN. Numbers at risk and log-rank test p-values are shown.

To investigate the relationship between CIS genes and regions of chromosomal loss or gain defined within primary tumours [35, 36], a GISTIC analysis was also performed [37]: PTEN maps within a well-established region of common chromosomal loss on 10q associated with SHH tumours [38], and both NFIB and TMEM45B were found to be present within peak regions of localised copy number gain (Additional file 4: Figure S2). While this is consistent with the mode of action inferred for PTEN and NFIB, the mode of action of TMEM45B remains unclear from insert data alone (Table 1).

CIS genes are differentially expressed in MB clinicogenetic subgroups

To establish whether the CIS genes are relevant specifically to the SHH subgroup of tumours, we used published data sets [38, 39] to compare expression of human orthologs in SHH subgroup tumours with expression in all other subgroups. Of the 17 CIS genes, 9 show significant differential expression when SHH subgroup tumours are compared to all others (including MYT1L and PTEN discussed above), and 15 show differential expression in one or more clinicogenetic subgroups (Table 2). However, only two (ITGBL1 and L3MBTL4) show clear differential expression in the SHH subgroup alone. Some show marked differential expression in a single non-SHH subgroup, such as NFIA (Group 3 tumours) and FGF13 (Group 4 tumours), or in more than one subgroup (e.g. TGIF2 and MYT1L), suggesting that these genes may be relevant to MB in general. We also investigated expression with respect to the presence/absence of metastatic disease, and 7 genes show a significant association, most notably genes with extreme expression values in Group 3 and 4 tumours where metastatic disease is common (Table 2).

Table 2 CIS gene expression according to clinicogenetic groups and presence of metastatic disease

For 13 genes, a comparison of expression in human tumours and normal cerebellum was also possible (using data from Cho, Tsherniak et al. [35]), and a total of 10 genes show significantly different expression between cerebellum and either SHH subgroup tumours alone or all tumours (Additional file 5: Table S3), consistent with dysregulation of expression during tumourigenesis. Furthermore, the direction of expression change observed is generally consistent with the predicted mode of action of each CIS. For instance, expression levels of FGF13, NFIB, TEAD1 and TGIF2 are all increased versus normal cerebellum, whilst expression of MYT1L, SFI1 and SLIT3 is reduced (Additional file 5: Table S3), in line with the inferred mechanism of action (Table 1).

CIS genes define a neuronal transcription factor network in human MBs

The significant enrichment for transcription factor (TF) activity within the MB CIS genes raised the possibility that they could be present within co-ordinated signalling or developmental pathways. ARACNE [40, 41] is a method which uses gene-gene co-regulation measures, and elimination of indirect relationships, to infer TF-target interactions within expression data. It has successfully been used to identify novel oncogenes in expression datasets from glioma [42] and acute lymphoblastic leukaemia [43]. We used ARACNE to infer regulatory networks within publicly available MB gene expression data ([38, 39], see Methods). Strikingly, seven CIS genes, including four of the five CISs with the most significant GKC p-values, were linked within a single network either directly or via nearest neighbours (Figure 3). This cluster of CIS nodes is highly significant (p=0.006 using 1000 randomly re-sampled networks) and consists of 6 genes with transcription factor/cofactor activity (CREBBP, MYT1L, NFIA, NFIB, TEAD1 and TGIF2) and one neuronal growth factor (FGF13). Gene ontology analysis (see Methods) established that the extended network is enriched both for transcription factors/regulators (p=0.0026/0.0035, Additional file 6: Table S4, yellow in Figure 3) and for genes with ontologies relating to cellular components of differentiated neurons (p=0.0042-0.019, Additional file 6: Table S4, green in Figure 3). This suggests that the network consists primarily of neuronal transcription factors and their targets.
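The resampling test of network topology can be sketched as follows. The sketch rests on three interpretations that are assumptions rather than details taken from the Methods: the network is treated as an unweighted undirected graph; "average minimum path distance" is taken to mean, for each CIS gene, the shortest-path distance to its nearest other CIS gene, averaged over all CIS genes; and "randomly re-sampled networks" is taken to mean random gene sets of the same size drawn from the same network. Unreachable pairs receive a penalty equal to the number of nodes (also an assumption), and all function names are hypothetical.

```python
import random
from collections import deque


def bfs_distances(adj, source):
    """Unweighted shortest-path distances from source (breadth-first search)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def avg_min_distance(adj, nodes):
    """Mean over nodes of the distance to the nearest *other* node in the set."""
    penalty = len(adj)  # assumed distance when no other set member is reachable
    total = 0
    for s in nodes:
        d = bfs_distances(adj, s)
        total += min(d.get(t, penalty) for t in nodes if t != s)
    return total / len(nodes)


def clustering_pvalue(adj, cis_nodes, n_perm=1000, seed=0):
    """Empirical p-value: fraction of random same-size gene sets that are at
    least as tightly clustered as the CIS genes (with a +1 correction)."""
    rng = random.Random(seed)
    observed = avg_min_distance(adj, cis_nodes)
    universe = sorted(adj)
    hits = sum(
        avg_min_distance(adj, rng.sample(universe, len(cis_nodes))) <= observed
        for _ in range(n_perm)
    )
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical 60-node ring network with four adjacent "CIS" nodes.
ring = {i: {(i - 1) % 60, (i + 1) % 60} for i in range(60)}
print(clustering_pvalue(ring, [0, 1, 2, 3]))
```

The +1 correction keeps the empirical p-value away from zero; a figure quoted as "lower than in 994/1000 networks" corresponds to the uncorrected ratio 6/1000.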

Figure 3

A neuronally enriched transcription factor network is defined by MB CIS genes. MB CIS genes (enlarged) linked via nearest neighbours are shown. The topological arrangement of connected CIS nodes is highly significant (p=0.006), as the average minimum path distance between the 7 CISs is lower than in 994/1000 randomly resampled networks. All network edges are supported with a bootstrapped p-value of <1×10^-8. Genes identified as biallelically mutated MB CISs by Wu et al. [30] are shown as diamonds. Genes with transcription regulator activity (GO ID 30528) are shown in yellow. Genes with enriched ontologies associated with differentiated neurons (GO IDs 45202, 44456, 45211, 43005, 43198) are shown in green. Other genes with a known role in neuronal development are shown with a green border. For details of network construction, see Additional file 7. For details of enriched ontologies see Additional file 6: Table S4.

To investigate the relative activity of genes within these enriched ontologies in human tumours, expression heatmaps of CIS genes, transcription factors, and neuronal genes within the network were generated (Figure 4a). There are clear expression differences between clinicogenetic subgroups, consistent both with the ANOVA analysis of CIS gene expression across subgroups (Table 2) and with the presence of genes previously shown to be highly expressed in Group 3 and 4 tumours (e.g. NEUROD2, GABBR2 [38]). However, most striking are the neuronal genes which include neurotransmitter receptors and synaptic scaffold/matrix proteins, the vast majority of which show low expression in the SHH and WNT tumours.

Figure 4

Network activity correlates with tumour subgroup, metastasis and survival. a. Gene expression heatmaps of CIS genes and network genes with enriched ontologies in all 4 clinicogenetic subgroups are shown. Where more than one probe per gene was present in the network, the probe with the highest mean expression level across all samples is shown. b. Box and whisker plot showing log of metagene expression in MB clinicogenetic subgroups. ANOVA p-value is shown. Boxes represent the 25th to 75th percentile, with median values shown as solid lines. Whiskers represent 95th percentile, and outliers are shown individually. c. Log of metagene expression in tumours with metastases and those without. T-test p-value is shown. d. Kaplan-Meier curve of SHH subgroup tumours according to metagene score.

Four of the seven networked CIS genes have recently been identified as MB CISs in a tissue-specific SB mutagenesis screen of primary tumours generated using the Ptch model (Crebbp, Nfia, Nfib and Tead1 [30]), and orthologs of a further three genes within our network (Dip2c, Edil3 and Erbb4) were identified as MB CISs in the same screen. Strikingly, with the exception of Tead1, both alleles of all of these genes were targeted by inserts in primary tumours in this screen [30], indicative of key tumour promoting events. We therefore analysed the distribution of all 17 biallelic events identified in this tissue-specific study and found them to be significantly enriched within our network (6/90, 7%) compared to outside our network (11/5823, 0.19%; Fisher’s Exact Test p <0.00001). This provides evidence that the TF network identified here was also targeted in an independent SB screen.
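The enrichment of biallelic events within the network can be checked with Fisher's exact test on a 2×2 table built from the counts quoted above (6 of 90 genes inside the network, 11 of 5823 outside). Whether the original test was one- or two-sided is not stated here, so the one-sided (enrichment) form below is an assumption; the probability is computed exactly from the hypergeometric distribution, and the function name is hypothetical.

```python
from math import comb


def fisher_enrichment_p(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table

                     biallelic   not biallelic
        in network       a             b
        outside          c             d

    Returns P(X >= a) under the hypergeometric null with all margins fixed.
    """
    row1 = a + b              # genes in the network
    col1 = a + c              # biallelic genes overall
    n = a + b + c + d         # all genes
    upper = min(row1, col1)
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k) for k in range(a, upper + 1))
    return tail / comb(n, col1)   # exact integers until this final division


# Counts quoted in the text: 6/90 inside the network, 11/5823 outside.
p = fisher_enrichment_p(6, 90 - 6, 11, 5823 - 11)
print(f"{6/90:.1%} vs {11/5823:.2%}, one-sided p = {p:.2e}")
```

Library routines such as SciPy's `scipy.stats.fisher_exact` compute the same quantity and also offer the two-sided form.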

Network activity correlates with advanced disease and survival in SHH tumours

To assess network activity within human tumour datasets further, we generated a single “metagene” metric to summarise the expression of CIS network genes. The expression of each gene was signed according to the direction of its correlation with the other CIS genes, such that a single score reflected the unified action of all genes (see Additional file 7). As a result, the expression of genes where CISs are inferred to cause loss of function was positively correlated with metagene score (e.g. MYT1L), whereas the expression of genes where CISs are inferred to cause gain of function (e.g. TGIF2, Additional file 2: Figure S1) was negatively correlated. Metagene activity was then investigated with respect to MB subgroups and clinical features. As expected from the expression heatmaps (Figure 4a), the activity of the network metagene differs significantly between MB clinicogenetic subgroups (F = 62.8, p<0.0001; Figure 4b), with the highest network activity being observed in Group 4 tumours. Metagene activity is also higher in tumours presenting with metastatic disease than in those without (bootstrapped t = 2.388, p<0.013; Figure 4c). This is likely to reflect the high metagene expression in Group 4 tumours, where metastases are frequently observed. However, a subgroup-specific analysis of available expression data with associated survival information [35] showed that network activity correlates significantly with survival in SHH subgroup tumours (log-rank statistic = 8.03, p<0.005; Figure 4d), but not in other subgroups (data not shown).
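One possible way to build such a signed metagene is sketched below, assuming a genes × samples expression matrix restricted to the CIS network genes. Each gene is standardised, its sign is taken from the leading eigenvector of the gene-gene correlation matrix, and the signs are oriented so that a chosen reference gene (for example a loss-of-function gene such as MYT1L) contributes positively. This eigenvector-based signing is an assumption and may differ from the procedure described in Additional file 7; the function name and the simulated data are hypothetical.

```python
import numpy as np


def signed_metagene(expr, reference=0):
    """expr: genes x samples matrix for the CIS network genes.

    Returns (signs, scores): a +1/-1 sign per gene and one metagene score per sample.
    """
    expr = np.asarray(expr, dtype=float)
    # Standardise each gene across samples (z-scores).
    z = (expr - expr.mean(axis=1, keepdims=True)) / expr.std(axis=1, keepdims=True)
    corr = z @ z.T / z.shape[1]            # gene-gene Pearson correlation matrix
    _, eigvecs = np.linalg.eigh(corr)      # eigenvalues in ascending order
    signs = np.sign(eigvecs[:, -1])        # leading eigenvector: shared direction
    signs[signs == 0] = 1.0
    if signs[reference] < 0:               # orient so the reference gene is positive
        signs = -signs
    scores = (signs[:, None] * z).mean(axis=0)
    return signs, scores


# Hypothetical data: two genes track a latent programme, a third
# ("gain-of-function"-like) gene is anti-correlated with it.
rng = np.random.default_rng(0)
latent = rng.normal(size=50)
expr = np.vstack([latent, latent, -latent]) + 0.3 * rng.normal(size=(3, 50))
signs, scores = signed_metagene(expr, reference=0)
print(signs)
```

The anti-correlated gene receives a negative sign, so a high score reflects the coordinated state of all genes rather than the raw sum of their expression.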

Microarray expression analysis of mutagenised tumours identifies Igf2 as a key network associated gene

To identify specific genes whose expression might be altered by network mutations, we generated murine expression data using Illumina bead arrays (see Methods) from 30 SB-induced tumours, 6 non-mutagenised tumours from Ptch+/- animals, and 6 normal cerebella. We first validated the mutagenised murine model in terms of gene expression as follows. Four metagenes were generated from human tumour expression data using NMF (see Methods) to define the four MB biological subgroups. These metagenes were then projected across the mouse tumour data using all available orthologous probes (Figure 5a), and the subgroup identity of the mouse tumours was tested with a Support Vector Machine (SVM), using the metagene scores for the human data as the training set and the mouse tumours as the test set. All human tumours in the training set were classified correctly (zero training errors). The mouse SB tumours were predicted to be SHH tumours in 29/30 (97%) of cases (Figure 5b), as were 6/6 (100%) of the non-mutagenised Ptch+/- MB controls, establishing that gene expression in SB-induced mouse tumours is similar to expression in human SHH tumours.
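The cross-species subgroup assignment can be sketched as a basic NMF of the human training matrix (Lee-Seung multiplicative updates), followed by projection of the mouse samples onto the fixed human metagenes. For brevity, the SVM used in the study is replaced here by a nearest-centroid classifier, and each sample's metagene profile is scaled to unit sum before classification to reduce platform-level intensity differences; both substitutions, the iteration counts and all names are assumptions. The rows of the test matrix are assumed to be orthologous probes aligned with the training rows.

```python
import numpy as np

EPS = 1e-9


def nmf(V, k, n_iter=1000, seed=0):
    """Basic NMF, V ~ W @ H, via Lee-Seung multiplicative updates (Frobenius loss)."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], k)) + EPS
    H = rng.random((k, V.shape[1])) + EPS
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + EPS)
        W *= (V @ H.T) / (W @ H @ H.T + EPS)
    return W, H


def project(V_new, W, n_iter=1000, seed=0):
    """Metagene scores for new samples with the training metagenes W held fixed."""
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[1], V_new.shape[1])) + EPS
    for _ in range(n_iter):
        H *= (W.T @ V_new) / (W.T @ W @ H + EPS)
    return H


def nearest_centroid(H_train, labels, H_test):
    """Assign each test sample to the subgroup with the closest mean metagene profile."""
    labels = np.asarray(labels)
    classes = np.unique(labels)

    def unit_sum(H):  # remove overall intensity differences between platforms
        return H / (H.sum(axis=0, keepdims=True) + EPS)

    train, test = unit_sum(H_train), unit_sum(H_test)
    centroids = np.stack([train[:, labels == c].mean(axis=1) for c in classes], axis=1)
    dist = ((test[:, :, None] - centroids[:, None, :]) ** 2).sum(axis=0)
    return classes[dist.argmin(axis=1)]


# Hypothetical data: 4 subgroups, each marked by its own block of 10 genes.
rng = np.random.default_rng(1)
groups = ["WNT", "SHH", "G3", "G4"]


def simulate(n_per_group, scale):
    cols, labs = [], []
    for g, name in enumerate(groups):
        for _ in range(n_per_group):
            v = np.full(40, 0.5)
            v[10 * g:10 * (g + 1)] = 5.0
            cols.append(scale * (v + 0.2 * rng.random(40)))
            labs.append(name)
    return np.array(cols).T, labs


V_train, y_train = simulate(20, scale=1.0)   # "human" training set
V_test, y_test = simulate(2, scale=3.0)      # "mouse" test set on a different scale
W, H_train = nmf(V_train, k=4)
pred = nearest_centroid(H_train, y_train, project(V_test, W))
print(list(pred))
```

Holding W fixed during projection is what makes the mouse scores comparable with the human ones: both are expressed in terms of the same human-derived metagenes.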

Figure 5

Ptch+/- MBs are SHH subgroup tumours and Igf2 is a key network associated gene. a-b. Projection of four sub-group specific metagenes derived from 108 human primary MB expression profiles (Training Set) onto MB mouse model expression profiles (Test Set) using Non-negative Matrix Factorisation (NMF). a. Heatmaps showing expression of metagenes across human tumours (training; left panel) and expression of metagenes across murine tumours (test; right panel). b. Pseudo-plot of NMF projections coloured according to confident MB sub-group classification of murine tumours using metagene expression values. This unsupervised analysis shows that the majority of Ptch mouse model samples recapitulate the expression profiles of primary SHH MB. c. Box and whisker plots showing relative Igf2 expression assayed by Real Time PCR (for details, see Methods): Left Panel - tumours with inserts in 1 or more network CIS gene versus tumours with no inserts; Right Panel - Igf2 expression in tumours with 1 or more insertions in Nfia versus all other tumours with inserts in at least 1 network CIS gene. Boxes represent the 25th to 75th percentile, with median values shown as solid lines. Whiskers represent 95th percentile, and outliers are shown individually. d-e. GSEA plots for human tumours ranked by metagene score (left panels) and mouse tumours ranked by fold change of expression in tumours with network hits compared to predisposition control MBs not exposed to mutagenesis (right panels). ES-Enrichment Score, RM-Ranked list Metric. d. Lein_Neuron_Markers. Human NES=2.15 p<0.001, Mouse NES=-2.24 p<0.001. e. KEGG_Ribosome. Human NES=-3.00 p<0.001, Mouse NES=2.59 p<0.001.

We then looked for genes differentially expressed between tumours with hits in CIS network genes and tumours with no hits in these genes. This identified insulin-like growth factor 2 (Igf2) as the most differentially expressed gene, with a mean fold change of 3.58 (p=0.002, Additional file 8: Table S5). To validate the association between network insertions and Igf2 expression, we quantified Igf2 expression by Real Time PCR in all of our SB-induced MB tumours for which RNA was available (see Methods). This confirmed that Igf2 was expressed at a significantly higher level in tumours with one or more insertions in a network CIS gene than in tumours with no insertion in a network CIS gene (p<0.0001, Figure 5c). Furthermore, tumours with hits in Nfia, the CIS gene most frequently affected by SB insertion, expressed Igf2 at higher levels than network tumours with no insert in Nfia (Figure 5c).

CIS-related network activity is associated with proliferation and reduced differentiation

Finally, to gain insight into biological processes which may be affected by the network, we also performed a Gene Set Enrichment Analysis (GSEA) which uses gene ranking to test for enrichment of predefined genesets [44]. This was performed both in mouse tumours (ranking genes by fold change of expression in SB-induced tumours with network hits compared to Ptch+/- MBs not exposed to mutagenesis) and in primary human MBs (ranking by association with metagene score).
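At its core, GSEA computes a weighted running-sum (Kolmogorov-Smirnov-like) enrichment score along a ranked gene list [44]. The minimal sketch below computes only that score, with gene weights |r|^p and p = 1; normalisation to an NES and permutation-based p-values, which the full method also performs, are omitted. The list is assumed to be pre-ranked (for example by fold change in mouse tumours or by association with the metagene score in human tumours), and the function name is hypothetical.

```python
def enrichment_score(ranked_genes, scores, gene_set, p=1.0):
    """Weighted running-sum enrichment score (GSEA-style).

    ranked_genes: gene names sorted from most positive to most negative metric;
    scores: the ranking metric in the same order; gene_set: genes of interest.
    """
    in_set = [g in gene_set for g in ranked_genes]
    hit_norm = sum(abs(s) ** p for s, hit in zip(scores, in_set) if hit)
    miss_step = 1.0 / (len(ranked_genes) - sum(in_set))
    running, es = 0.0, 0.0
    for s, hit in zip(scores, in_set):
        if hit:
            running += abs(s) ** p / hit_norm   # step up, weighted by the metric
        else:
            running -= miss_step                # uniform step down for non-members
        if abs(running) > abs(es):              # keep the maximum deviation from zero
            es = running
    return es


# Hypothetical ranked list of 20 genes with metric 10, 9, ..., -9.
genes = [f"G{i}" for i in range(20)]
metric = [10 - i for i in range(20)]
print(enrichment_score(genes, metric, {"G0", "G1", "G2"}))      # set at the top
print(enrichment_score(genes, metric, {"G17", "G18", "G19"}))   # set at the bottom
```

A gene set concentrated at the top of the ranking gives a score near +1 and one concentrated at the bottom gives a score near -1, which is how opposite enrichment in the mouse and human rankings can be read as concordant.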

GSEA revealed a broad picture of increased cell proliferation and reduced differentiation associated with network hits in mice and low network metagene activity in human tumours (the concordant gene expression pattern), as demonstrated by significant enrichment of multiple genesets (see Additional file 9: Table S6). For instance, genesets indicative of neuronal differentiation, such as “Cahoy_Neuronal” and “Lein_Neuron_Markers”, are significantly enriched in human MBs with high metagene expression and in MBs from murine Ptch+/- controls with no transposition (e.g. Figure 5d). A similar effect is seen for genesets describing genes containing CREB and cAMP responsive elements (e.g. Additional file 10: Figure S3A). In contrast, genesets denoting proliferation and elevated cell growth are significantly enriched in human MBs with low network metagene activity and in mouse Ptch+/- MBs with CIS network hits. These include genesets linked to mitosis and cell cycle, MYC targets (Additional file 10: Figure S3B), and ribosome biogenesis (Figure 5e).

Consistent with increased Igf2 expression in MBs from mice with network hits, enrichment of IGF-related genesets is also observed in human and mouse tumours; e.g. enrichment of Pacher_Targets_of_IGF1_and_IGF2_up in mouse tumours with network hits (Additional file 10: Figure S3C), and of Boudoukha_Bound_by_IGF2BP2 in human tumours with a low metagene score (Additional file 10: Figure S3D). In addition, the genesets Lee_Targets_of_PTCH1_and_SUFU_up and Lee_Targets_of_PTCH1_and_SUFU_dn, indicative of SHH-dependent murine tumourigenesis [45], show consistent differential enrichment; upregulated targets are enriched in mouse network tumours and human tumours with a low metagene score (Additional file 10: Figure S3E), and downregulated targets are enriched in mouse tumours with no mutagenesis and human tumours with a high metagene score (Additional file 10: Figure S3F). Collectively, the broad concordance of human and murine datasets indicates conservation of CIS mutational function across species, and substantiates the concerted action of CIS network genes as a tumourigenic process promoting cell proliferation and inhibiting neuronal differentiation.

Discussion

We have shown that whole body SB mutagenesis of Ptch+/- mice significantly enhances MB frequency without affecting latency, and does not induce these tumours in wild type mice. The majority of candidate genes identified have either been implicated in neuronal development, differentiation and/or migration, have been linked to SHH signalling, or have been shown to be mutated in SHH subgroup tumours in humans. Furthermore, we found that one gene identified from our screen, MYT1L, is of prognostic value within a multivariate analysis of human MB survival data. These genes, therefore, warrant individual assessment as potential therapeutic targets and/or biomarkers for the improved stratification and treatment of medulloblastoma.

Notably, 7 of the genes (6 transcription factors/cofactors and 1 growth factor) show significant associations with each other within a novel MB expression network, implying a previously unidentified functional relationship which does not map to established canonical pathways. This network is enriched for transcriptional regulators and genes with neuronal ontologies, and links genes with roles in stem cell/neuron migration, neurite growth and neuronal cell cycle progression to genes which encode structural and functional elements of mature neurons (see Additional file 11: Table S7 for known gene functions). This suggests that the network comprises transcription factors involved in the proliferation and differentiation of cGNPs, the cell of origin for SHH MB [7], and their targets. Interestingly, lineage commitment to cGNP identity is a pre-requisite for SHH tumour development [46, 47].

The relationships identified here between murine network activity, metagene activity in human tumour subtypes, and individual CIS gene expression, are summarised in Figure 6. This highlights the variable network activity within SHH tumours and Group 3 tumours, and identifies CIS genes with similar, or wholly divergent, expression patterns relative to the metagene. Of these, TGIF2 and MYT1L are of particular interest as they influence neuronal SHH expression and development, respectively: A conditional Tgif1/Tgif2 double knock-out has recently been shown to reduce Shh expression in the developing brain and to recapitulate holoprosencephaly [a human condition caused by SHH and TGIF1 mutations (OMIM# 142945 and 142946)], while the transcriptional repressor MYT1L can contribute to the re-programming of human fibroblasts into neurons [48, 49]. As neither gene has been implicated in MB development to date, both are prime targets for further investigation.

Figure 6

Integrated summary of metagene expression, associated clinical features, and GSEA. The schematic relationship between mouse and human tumours in terms of network genes is shown in the top panel. Expression heatmaps showing the network metagene and CIS genes in 108 MBs by subgroup [38, 39], together with the incidence of metastatic disease, are shown in the middle panel. For all genes, data from the most highly expressed Affymetrix probe are shown. In the metastases track, red = presence, green = absence, grey = no data. In the histological subtype track, magenta = classic, green = desmoplasia, blue = LCA. A schematic of key processes relevant to tumour biology identified by GSEA is shown in the bottom panel. For details of GSEA, see text and Additional file 9: Table S6.

Importantly, network activity has clinical relevance, as high activity is associated with advanced disease in all tumours and low activity is associated with poor survival specifically in SHH subgroup tumours (Figure 6). These associations appear incongruous, but the former is likely due to the high incidence of metastases in Group 4 tumours where network activity is uniformly high. In contrast, the SHH subgroup-specific association with outcome may reflect clinically important variation in the developmental status of individual tumours, and highlights the potential utility of network activity as a prognostic biomarker for the prediction of outcome within the SHH subgroup.

The GSEA analysis in mouse and human tumours clearly demonstrates a role for this network in inhibiting neuronal differentiation and promoting cell proliferation. Consistent with this, several common functional pathways of potential relevance to disease were identified in both species. Of these, the MYC and IGF-dependent signalling pathways are of particular interest, the latter having recently been highlighted in an independent SB screen [30]. Furthermore, our analysis of gene expression in SB-mutagenised mouse tumours identified Igf2 upregulation as a key output of SB-induced network perturbation. Igf2 is already known to be required for MB development in the Ptch+/- model: no tumours are observed in Igf2-null; Ptch+/- mice [50], over-expression of Igf2 in Ptch+/- mice increases the frequency of MBs generated by Shh transfection of cerebellar neural progenitors [51], and at the cellular level Igf2 acts synergistically with Shh to increase murine cGNP proliferation 10-fold [52].

The results presented here suggest that network mutations converge to inhibit differentiation and upregulate Igf2. This extends the existing model of MB formation in SB-mutagenised Ptch+/- mice by identifying genes underpinning the upregulation of Igf2, which leads to the persistence of Ptch+/--induced cerebellar proliferative lesions and their progression to MB [53]. Consistent with this model, several network CIS genes, or genes that they bind or modulate, have already been implicated in Igf2 expression or activity, including Tead1 [54], Nfia and Nfib [55, 56], and Crebbp [57]. There is an unmet clinical need for SHH pathway-independent targeted therapies for SHH subgroup tumours, particularly in view of the reported acquired resistance and predicted intrinsic resistance to current SMO inhibitors [9, 10]. The implication of insulin-dependent signalling in both human and mouse SHH tumours strongly supports the development of this pathway as a therapeutic target for SHH subgroup tumours.

The application of SB mutagenesis to additional murine MB models [17, 18] could identify genes relevant to other tumour subgroups. Our results, however, contrast sharply with those of a recent Math1-driven, tissue-restricted screen in a more penetrant Ptch+/- model [30], in which tumour latency was reduced from 8 to 2.5 months, a high frequency of metastases (80%) was observed, and divergent insertional mutation signatures were defined for primary and metastatic tumours. We observed no metastases following whole-body mutagenesis in this study. The two screens are therefore not directly comparable, and their differences suggest that the penetrance of the tumour predisposition and the power of the mutagen are likely to determine the nature of the genes identified in future screens.

Finally, to our knowledge, this is the first time that mutagenesis data from a murine cancer model have been integrated with human expression networks to explore the biological mechanisms of tumourigenesis. The identification of novel, biologically relevant candidate genes linked within a single expression network, whose activity correlates with disease state and with survival in the subgroup of tumours being modelled, illustrates the utility of this cross-species approach. Clarifying the interactions between the network genes identified here and their roles in the pathways highlighted by GSEA, and establishing their therapeutic relevance, will however require extensive functional analyses of multiple genes, both individually and in concert.

Conclusions

Here, we have used SB mutagenesis to define a novel neuronal transcription factor network involved in medulloblastoma formation in the Ptch+/- model, and provided evidence that disruption of this network upregulates Igf2, which is critical for GNP proliferation and tumour formation. Moreover, we have identified rational therapeutic targets for SHH subgroup tumours, alongside prognostic biomarkers for identifying poor-risk SHH patients, supporting further development of these findings as a basis for improved, individualised therapy. Our results also suggest that integrating mutagenesis data with expression network analysis may help to unravel key events in other cancers that disrupt complex developmental programmes and for which murine models are available.

Methods

Mouse strains

The following strains were used: B6 Ptch1tm1Mps/J mice [19]; CBA wild type (Charles River Laboratories, Margate, UK); T2Onc line 76 [58]; and SB11 Rosa26 [22]. All animal work adhered to UK Home Office guidelines and was performed under Project Licence PLL/60/3621. Animal numbers, together with tumour incidence in each genotype, are given in Additional file 1: Table S1.

Sample processing, insertion site mapping and CIS identification

Tumours and other abnormal tissues identified at post-mortem examination were collected for histological examination and DNA/RNA isolation. Insertion sites were identified using splinkerette PCR [21] coupled to GS-FLX amplicon sequencing [59], and sequence reads were mapped to the mouse genome (NCBI37/mm9) as described previously [60]. Common insertion sites (CISs) were identified using Gaussian kernel convolution (GKC) [23]. The raw p-value of each CIS peak was corrected for the total number of CIS peaks on the chromosome to which it maps, with a significance cut-off of p < 0.05. Monte Carlo simulation methods [24] were also used for comparison.
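The GKC step and the per-chromosome peak correction described above can be illustrated with the minimal Python sketch below. It is not the implementation from [23]: it assumes a single fixed kernel width (a hypothetical `sigma` of 30 kb), a simplified null model in which the same number of insertions land uniformly at random on the chromosome (ignoring TA-dinucleotide target bias), and a Bonferroni-style multiplication of each raw peak p-value by the number of peaks on that chromosome. The function names `kernel_density`, `local_peaks` and `gkc_cis` are hypothetical.

```python
import numpy as np


def kernel_density(positions, grid, sigma):
    """Sum of unnormalised Gaussian kernels centred on each insertion, evaluated on a grid."""
    diffs = grid[:, None] - np.asarray(positions, dtype=float)[None, :]
    return np.exp(-0.5 * (diffs / sigma) ** 2).sum(axis=1)


def local_peaks(density):
    """Indices of local maxima (strictly higher than the left neighbour)."""
    inner = (density[1:-1] > density[:-2]) & (density[1:-1] >= density[2:])
    return np.where(inner)[0] + 1


def gkc_cis(positions, chrom_length, sigma=30_000, step=5_000,
            n_sim=500, alpha=0.05, seed=0):
    """Hypothetical GKC-style CIS caller for the insertions on one chromosome.

    Returns (peak position, kernel density, corrected p-value) for each significant peak.
    """
    rng = np.random.default_rng(seed)
    grid = np.arange(0, chrom_length + step, step, dtype=float)
    density = kernel_density(positions, grid, sigma)
    peaks = local_peaks(density)
    # Assumed null: the same number of insertions placed uniformly at random.
    null_max = np.array([
        kernel_density(rng.uniform(0, chrom_length, len(positions)), grid, sigma).max()
        for _ in range(n_sim)
    ])
    hits = []
    for i in peaks:
        raw_p = (1 + np.sum(null_max >= density[i])) / (n_sim + 1)
        # Correct for the total number of peaks on this chromosome.
        adj_p = min(1.0, raw_p * len(peaks))
        if adj_p < alpha:
            hits.append((grid[i], density[i], adj_p))
    return hits
```

For example, a cluster of 20 insertions within 10 kb of position 5,000,000, set against 10 scattered background insertions on a 10 Mb chromosome, should yield a single significant peak near 5 Mb. Simulating the maximum of the whole-chromosome density, rather than the density at each position, keeps the null conservative with respect to peak location.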

Illumina expression analysis and real-time PCR

200 ng of each RNA sample was amplified and biotin-labelled using the Illumina TotalPrep RNA amplification kit (Applied Biosystems, Foster City, CA, USA). cRNA size distribution was assessed using an Agilent Bioanalyzer. Approximately 750 ng of each cRNA was hybridised to the Illumina Mouse8 Reference Array (Illumina, Essex, UK) according to the manufacturer's recommended protocols by the Wellcome Trust Clinical Research Facility (Edinburgh, UK). Real-time PCR was performed using the 5' nuclease assay on the ABI PRISM 7700 Sequence Detector (Perkin-Elmer, Applied Biosystems, Foster City, CA, USA). Oligonucleotides were designed using Primer Express software (v3.0, PE Biosystems, Foster City, CA, USA) to span an intron, avoiding amplification from genomic DNA. Mean Ct values were normalised against the average expression of the endogenous control gene β-actin. Relative gene expression was calculated by the 2^−ΔΔCt method [61], using as calibrator the control cerebellar RNA sample with the highest expression level in the Illumina microarray data. For primer sequences, see Additional file 7.
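The relative quantification step above can be sketched as follows. This is an illustration of the standard 2^−ΔΔCt calculation [61] only: Ct values are assumed to be already averaged over technical replicates, and the helper names (`relative_expression`, `choose_calibrator`) and sample labels are hypothetical.

```python
def choose_calibrator(array_expression, control_samples):
    """Pick the control cerebellar sample with the highest microarray expression."""
    return max(control_samples, key=lambda s: array_expression[s])


def relative_expression(ct_target, ct_reference, calibrator):
    """Relative expression by the 2^-ddCt method.

    ct_target, ct_reference: dicts of sample -> mean Ct for the target gene and
    for the endogenous control (here beta-actin).
    calibrator: sample whose relative expression is defined as 1.0.
    """
    # dCt: target Ct normalised to the endogenous control in the same sample.
    delta_ct = {s: ct_target[s] - ct_reference[s] for s in ct_target}
    # ddCt: each dCt relative to the calibrator's dCt; fold change = 2^-ddCt.
    return {s: 2.0 ** -(delta_ct[s] - delta_ct[calibrator]) for s in delta_ct}
```

For instance, a tumour with a target Ct of 22 and a β-actin Ct of 18 (ΔCt = 4), compared with a calibrator having ΔCt = 7, gives ΔΔCt = −3 and a relative expression of 2^3 = 8.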

Bioinformatic analyses

Expression profiles comprising 119 Affymetrix HGU133p2 arrays were taken from published studies [38, 39]. CEL files were processed using the Bioconductor RMA package [62]. The ARACNE network was constructed using the aracne2 standalone software package according to the authors' instructions [40]. Ontology analyses were performed using the BiNGO 2.44 Cytoscape plug-in [63], with Benjamini-Hochberg FDR-corrected hypergeometric tests and the whole annotation set as background. The significance of the topological arrangement of the 7 network genes was tested by calculating their mean shortest-path distance to the nearest connecting CIS gene and comparing it with a null distribution generated from 10,000 permutations of 7 randomly selected genes.
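The permutation test of network topology described above can be sketched as below, using only the Python standard library. The sketch makes several assumptions not stated in the text: the ARACNE network is held as an unweighted adjacency dict (loading it from aracne2 output is not shown); "distance to the nearest connecting CIS gene" is read as the shortest-path distance from each gene to the nearest *other* gene in the tested set; and unreachable genes are skipped. The function names are hypothetical.

```python
import random
from collections import deque
from statistics import mean


def bfs_distances(graph, source):
    """Unweighted shortest-path distances from source (breadth-first search)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def mean_nearest_distance(graph, genes):
    """Mean over genes of the distance to the nearest other gene in the set."""
    genes = set(genes)
    nearest = []
    for g in genes:
        dist = bfs_distances(graph, g)
        others = [dist[h] for h in genes if h != g and h in dist]
        if others:  # skip genes with no reachable partner
            nearest.append(min(others))
    return mean(nearest) if nearest else float("inf")


def permutation_p_value(graph, genes, n_perm=10_000, seed=0):
    """Compare the observed statistic with randomly drawn gene sets of equal size."""
    rng = random.Random(seed)
    nodes = list(graph)
    observed = mean_nearest_distance(graph, genes)
    # One-sided test: how often is a random set at least as tightly clustered?
    hits = sum(
        mean_nearest_distance(graph, rng.sample(nodes, len(genes))) <= observed
        for _ in range(n_perm)
    )
    return observed, (hits + 1) / (n_perm + 1)
```

On a toy 100-node ring network, 7 consecutive nodes have a mean nearest-neighbour distance of 1, which random 7-node sets almost never match, so the returned p-value is small. The `+ 1` terms keep the permutation p-value strictly positive.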

GISTIC analyses [37] were performed using the module provided in GenePattern [64], with Affymetrix SNP chip .CEL files [35, 36] processed using the Aroma package [65] and segmented using the CBS algorithm [66]. GSEA [44] was performed using the standalone package (http://www.broadinstitute.org/gsea/), with gene sets taken from the MSigDB library [67]. Non-negative matrix factorisation (NMF) was performed using a script adapted from [68]. All other statistical tests were performed using R [69]. For further details of procedures and analyses, see Additional file 7.
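The general idea of NMF-based metagene scoring can be illustrated with the short NumPy sketch below. It does not reproduce the script adapted from [68], which may use a different objective (for example the Kullback-Leibler divergence) and different initialisation. Here the Lee-Seung multiplicative updates for the Euclidean objective factorise a non-negative genes × samples matrix V as W·H. With rank 1, the single row of H can be read as a per-sample metagene activity. The function name `nmf` and its defaults are assumptions.

```python
import numpy as np


def nmf(V, rank, n_iter=500, seed=0, eps=1e-9):
    """Factorise non-negative V (genes x samples) as W @ H.

    Uses Lee-Seung multiplicative updates minimising ||V - WH||_F^2;
    eps guards against division by zero.
    """
    rng = np.random.default_rng(seed)
    n_genes, n_samples = V.shape
    W = rng.random((n_genes, rank))    # gene loadings (metagene definitions)
    H = rng.random((rank, n_samples))  # per-sample metagene activities
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H
```

W and H are determined only up to a scaling factor (W·c and H/c give the same product), so metagene scores are best compared across samples rather than interpreted on an absolute scale. Input expression values must be non-negative. Log-scale RMA values generally are, but this should be checked before factorisation.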

Availability of supporting data

Microarray gene expression data from this study have been deposited in the Gene Expression Omnibus as submission GSE43994. Insertion site data in the form of BED files are provided as Additional files 12 and 13.