Distinct DNA Methylation Patterns of Subependymal Giant Cell Astrocytomas in Tuberous Sclerosis Complex

Tuberous sclerosis complex (TSC) is a monogenic disorder caused by mutations in either the TSC1 or TSC2 gene, two key regulators of the mechanistic target of the rapamycin complex pathway. Phenotypically, this leads to growth and formation of hamartomas in several organs, including the brain. Subependymal giant cell astrocytomas (SEGAs) are low-grade brain tumors commonly associated with TSC. Recently, gene expression studies provided evidence that the immune system, the MAPK pathway and extracellular matrix organization play an important role in SEGA development. However, the precise mechanisms behind the gene expression changes in SEGA are still largely unknown, providing a potential role for DNA methylation. We investigated the methylation profile of SEGAs using the Illumina Infinium HumanMethylation450 BeadChip (SEGAs n = 42, periventricular control n = 8). The SEGA methylation profile was enriched for the adaptive immune system, T cell activation, leukocyte mediated immunity, extracellular structure organization and the ERK1 & ERK2 cascade. More interestingly, we identified two subgroups in the SEGA methylation data and show that the differentially expressed genes between the two subgroups are related to the MAPK cascade and adaptive immune response. Overall, this study shows that the immune system, the MAPK pathway and extracellular matrix organization are also affected on DNA methylation level, suggesting that therapeutic intervention on DNA level could be useful for these specific pathways in SEGA. Moreover, we identified two subgroups in SEGA that seem to be driven by changes in the adaptive immune response and MAPK pathway and could potentially hold predictive information on target treatment response. Supplementary Information The online version contains supplementary material available at 10.1007/s10571-021-01157-5.


Background
Tuberous sclerosis complex (TSC) is a multisystem monogenetic disorder caused by mutations in either TSC1 or TSC2 and is characterized by hamartoma development in several organs, including the brain, kidneys, lungs, heart, eyes, and skin (Curatolo et al. 2008). Patients with TSC often have neurological manifestations including neurodevelopmental disorders (such as autism) and severe epilepsy (Curatolo et al. 2015). The majority of patients with TSC have seizure onset before the age of two (Davis et al. 2017). The hallmark brain lesions in TSC include cortical/subcortical tubers, subependymal nodules (SENs) and subependymal giant cell astrocytomas (SEGAs) (Aronica et al. 2012;Aronica and Crino 2014). SEGAs are benign, slow growing tumors classified as WHO grade I and make up 1-2% of all paediatric brain tumors (Jozwiak et al. 2015;Louis et al. 2016). Usually, SEGAs develop during the first two decades of life in patients with TSC, with a mean age at presentation below 18 years (Jozwiak et al. 2015;Adriaensen et al. 2009). They are typically located near the foramen of Monro where extended growth of the tumor can result in blockage of the cerebral fluid circulation and subsequent obstructive hydrocephalus (Cuccia et al. 2003). SEGAs are thought to arise from SEN along the ependymal lining of the lateral ventricles (Buccoliero et al. 2009;Fujiwara et al. 1989;Morimoto and Mogami 1986). Histologically, they are characterized by spindle cells, gemistocytic-like cells and giant cells and demonstrate an immature neuroglial phenotype. Tumor suppressors hamartin (TSC1) and tuberin (TSC2) can form an intracellular complex with TBC1 domain family member 7 (TBC1D7) that exerts GTPase-activating protein (GAP) activity towards the small GTPase Ras homolog enriched in brain 1 (RHEB1) (Dibble et al. 2012;Inoki et al. 2003). Inhibition of RHEB1 is important in regulating the mechanistic target of rapamycin complex (mTOR) pathway, which can affect cell growth and proliferation. Pathogenic loss of function mutations in TSC1 or TSC2 result in constitutive activation of the mTOR pathway and uncontrolled cell cycle progression (Chan et al. 2004). Besides the mTOR pathway, the immune system, the mitogen-activated protein kinase (MAPK) pathway and extracellular matrix (ECM) organization have been suggested to play a role in SEGA development based on gene expression studies (Bongaarts et al. 2019;Martin et al. 2017;Tyburczy et al. 2010). However, the precise mechanisms behind these gene expression changes in SEGA are still largely unknown.
Gene expression can be controlled through regulation of the epigenome, via epigenetic mechanisms (Keshet et al. 1985). DNA methylation is one of most recognized epigenetic markers, generally associated with silencing of gene expression, and its role in tumorigenesis has become a topic of interest (Klutstein et al. 2016). It is characterized by the addition of a methyl or hydroxymethyl by DNA methyltransferases (DNMTs) to cytosine residues in CG (CpG sites), CXG and CXX DNA sequences (where X corresponds to A, T, or C). Changes in DNA methylation have been well studied in cancer including central nervous system (CNS) tumors (Binder et al. 2019;Jones and Baylin 2007) and profound changes of methylation profiles have also been seen in neuro-psychiatric diseases such as autism spectrum disorders, epilepsy and TSC (Martin et al. 2017;Henshall and Kobow 2015;Gos 2013). Furthermore, DNA methylation profiling is highly robust and reproducible and has therefore been successfully used to distinguish subtypes in CNS tumors and focal cortical dysplasia (Laffaire et al. 2011;Kobow et al. 2019;Capper et al. 2018a,b;Stone et al. 2018). These DNA methylation-based classifications of CNS tumors have proven helpful for better diagnostics especially in cases with ambiguous histology or contradictory molecular profiles. Although, SEGAs have been included in previous methylation-based studies, none of these studies have performed an in-depth exploration of the molecular contribution of DNA methylation in SEGAs (Martin et al. 2017;Capper et al. 2018a,b;Capper et al. 2018a,b). Therefore, in this study, we aimed to identify distinct methylation patterns and pathways that might contribute to SEGA pathogenesis by performing a comprehensive analysis of genomic DNA methylation patterns in SEGAs.

The Methylation Profile of SEGAs
To characterize the methylation profile of SEGAs, DNA was extracted from SEGA samples and control brain samples and analyzed using the 450k methylation array. In total, 42 SEGA samples were included from 39 TSC patients and 3 patients with no other manifestations of TSC (surgical specimens) and 8 location-matched periventricular controls (autopsy specimens; see materials and methods and Table 1). A total of 421,352 CpGs were analyzed with a principal component analysis (PCA) indicating that the major source of variability was the diagnosis (SEGA or control; Fig. 1a & Supplementary Fig. 2), which was confirmed with a spearman's correlation matrix using the top 5% most variable CpGs (Fig. 1b). Furthermore, no specific clustering was seen based on the TSC mutation (Fig. 1b). To further assess other potential confounders on the methylation profile a principal variance component analysis (PVCA) was performed, showing that the major contributor to the variance between the samples was again the diagnosis ( Supplementary Fig. 1a).
Since the majority of the differentially methylated CpGs (adjusted p-value 0.01, β-value difference of > 0.2) were located at the TSS-associated regions (Fig. 1c), we narrowed our data set to these CpGs and found 4616 CpGs hypomethylated and 2526 hypermethylated in SEGA compared to controls (adjusted p-value 0.01, β-value difference of > 0.2, TSS-associated regions; Fig. 1d). The 7142 differentially methylated CpGs were located on 3875 genes. We identified 227 enriched GO terms (adjusted p-value < 0.05) for these genes ( Fig. 1e; Table 2) including adaptive immune system, T cell activation, leukocyte mediated immunity, extracellular structure organization and the ERK1 & ERK2 cascade. When accounting for the number of probes in each gene (using missMethyl) we found that the adaptive immune system, T cell activation, leukocyte mediated immunity, the ERK1 & ERK2 cascade and the extracellular matrix were still among the enriched GO terms (Supplementary Table 1). The majority of the enriched GO terms contained more hypomethylated genes then hypermethylated genes ( Fig. 1f; Table 2).

Expression of Inflammation, mTOR Activation, Glial and Neuronal Markers in SEGAs
SEGAs are considered mixed glio-neuronal tumors, with mTOR activity and presence of inflammation markers. Therefore, we wanted to evaluate the commonalities and differences in the expression of CD3, HLA-DP/DQ/DR, GFAP, MAP2 and pS6 in 42 SEGAs and 8 location-matched controls. In periventricular control tissue CD3, MAP2 and pS6 were not detected, whereas a moderate expression of HLA-DP/DQ/DR and high expression of GFAP was seen (Fig. 3a). In SEGA, we found several positive CD3 cells and observed an overall increase in positive area for CD3 in SEGA compared to control tissue ( Fig. 3b; p < 0.0001). HLA-DP/DQ/DR, GFAP, MAP2 and pS6 were expressed in a heterogeneous manner in SEGAs (Fig. 3a). The percentage of positive area of HLA-DP/DQ/DR (p < 0.0001), MAP2 (p = 0.0114) and PS6 (p < 0.0001) were increased in SEGA compared to control tissue, whereas the positive area for GFAP was decreased in SEGA (p = 0.0016).

Two Distinct Methylation Groups in SEGAs
To evaluate potential subgroups within the SEGA samples the top 5% most variable CpGs were analysed with hierarchical clustering, consensus clustering and silhouette clustering. Hierarchical clustering indicated 2 major groups with one group subdividing into two smaller groups (Fig. 4a). This was confirmed by both consensus clustering  and silhouette plots ( Fig. 4e-g), which indicated k = 3 as the most robust number of clusters. To assess other potential confounders on the methylation profile another PVCA was performed, showing that the major variance between the SEGA samples matched with the identified subgroups k = 3, followed by the subgroups k = 2 ( Supplementary Fig. 1b).
Other clinical data contributed minimally to the overall variance seen amongst the samples. Clustering of subgroups was re-evaluated in an additional independent SEGA cohort from  Age at surgery is the same as the age at tissue collection Heidelberg (50 additional cases) showing the robustness of the two groups ( Supplementary Fig. 3). We further investigated the two largest groups identified and performed differential testing between group 1 compared to control (SEGA1-control), group 2 compared to control (SEGA2-control) and group 1 compared to group 2 (SEGA1-SEGA2). We found 4377 hypomethylated and 1411 hypermethylated CpGs in SEGA1-control (Fig. 5a), 4883 hypomethylated and 3132 hypermethylated CpGs in SEGA2-control (Fig. 5b) and 321 hypomethylated and 70 hypermethylated CpGs in SEGA1-SEGA2 ( Fig. 5c; adjusted p-value 0.01, β-value difference of > 0.2, TSSassociated regions). In order to identify differentially methylated genes, genes corresponding to the differentially methylated CpGs were extracted. Genes that were overlapping between SEGA1-control and SEGA1-SEGA2 and did not overlap with SEGA2-control were considered unique for SEGA1, whereas genes that overlapped between SEGA2-control and SEGA1-SEGA2 but did not overlap with SEGA1-control were considered unique for SEGA2 (70 SEGA1 unique genes and 58 SEGA2 unique genes; Fig. 5d). GO analysis revealed 15 GO terms enriched for these 128 unique genes, which were related mainly to the MAPK cascade and adaptive immune response (p-adjusted < 0.05; Fig. 5e). We further evaluated the RNA expression of these 128 genes and performed correlations between the normalized count matrix of genes that were expressed (106/128 genes) and the β-values of the corresponding CpGs. We identified 11 genes that inversely correlated with their corresponding CpG (Table 3). Based on the RNA expression no clear clustering was found between SEGA1 and SEGA2, although the number of cases in each group was relatively small (Fig. 5f).
Next, we investigated if the expression of CD3, HLA-DP/ DQ/DR, GFAP, MAP2 or pS6 could explain the subgroups found in the methylation data. We found a higher positive area of CD3 in SEGA2 compared to SEGA1 (p = 0.0068; Fig ; AUC = 0.539). Furthermore, using Random Forest we found that none of the clinical data could properly separate between SEGA1 and SEGA2 ( Fig. 6f) or between SEGA1, SEGA2a and SEGA2b (Fig. 6g). The tumor size contributed most to separating the two groups but showed no significant difference between SEGA1 and SEGA2 (Fig. 6h).

Discussion
In this study, we performed DNA methylation profiling of SEGAs from TSC patients and showed that the differential methylation profile between SEGAs and control tissue was enriched for GO terms including the adaptive immune system, T cell activation, leukocyte mediated immunity, extracellular structure organization and the ERK1 & ERK2 cascade. Histological markers for T cells, microglia reactivity, mTOR activation and neurons were higher expressed, whereas the glial marker GFAP was lower expressed in SEGA compared to periventricular control tissue. Furthermore, we identified two robust subgroups in the DNA methylation of SEGA, with a distinct methylation profile of genes related to the adaptive immune response and the MAPK pathway. Moreover, we found differences in positivity of the T cell marker CD3 between the two largest subgroups.
Previous studies on SEGA methylation have shown that SEGAs are a unique entity among CNS tumors (Martin et al. 2017;Capper et al. 2018a,b). However, the molecular mechanisms targeted by methylation changes in SEGA have not been well studied. In the present study, we identified substantial methylation changes in SEGAs compared to periventricular control tissue that appeared to be independent of the TSC1/TSC2 mutation or other clinical information available. GO analysis showed an enrichment of the adaptive immune system, T cell activation, leukocyte-mediated immunity, extracellular structure organization and the ERK1 & ERK2 cascade. Previous gene expression studies on SEGA found differential expression of similar pathways, indicating that these pathways are already affected on DNA level and might therefore be important drivers in SEGA pathogenesis (Bongaarts et al. 2019;Martin et al. 2017;Tyburczy et al. 2010). Differential expression of genes related to the immune system and the ECM organization has also been seen in cortical tubers (Martin et al. 2017;Mills et al. 2017). Furthermore, several studies have documented dysregulation of inflammation and ECM organization related pathways in cortical tubers, suggesting that these processes might be conserved across TSC pathology (Boer et al. 2008(Boer et al. , 2010Prabowo et al. 2013;Broekaart et al. 2019). Therefore, it would be of interest to investigate if DNA methylation changes related to these processes are also present in cortical tubers and other TSC lesions.
The role of mTOR pathway activation due to loss of function mutations in TSC1/TSC2 in TSC is well established. Furthermore, loss of heterozygosity (LOH) of TSC1 or TSC2 has been reported in approximately 80% of SEGAs and has also been found in other TSC hamartomas (Martin et al. 2017;Bongaarts et al. 2017;Chan et al. 2004). In vitro experiments show that mTOR inhibitors can reduce cell size and cell proliferation of SEGA cells. Currently, mTOR 1 3 inhibitors are amongst the treatment options for SEGA associated with TSC (Franz et al. 2015). Therefore, we aimed to investigate methylation changes on genes related to the mTOR signaling pathway. GO term analysis on the differentially methylated CpGs between SEGA and control tissue did not reveal the mTOR pathway as a principal target. Moreover, by directly investigating mTOR pathway related genes, we found only a small number of differentially methylated CpGs on 6/35 mTOR related genes, indicating that DNA methylation changes most likely do not contribute to the mTOR activation in SEGA. It has been suggested that TSC1/TSC2 epigenetic silencing might contribute to tumor formation in TSC and could explain cases where the second hit mutation in TSC1/TSC2 is not found (Jiang et al. 2005;Lesma et al. 2009). In accordance with previous study, we did not find evidence of epigenetic silencing of the promoter of TSC1 or TSC2 in SEGA (Martin et al. 2017).
Initial studies suggested an astrocytic nature of SEGAs, whereas more recent studies demonstrate a mixed glioneuronal phenotype, with mTOR activity and presence of inflammation markers Boer et al. 2008;Chan et al. 2004;Buccoliero et al. 2009Buccoliero et al. , 2016. We evaluated the expression of CD3, HLA-DP/DQ/DR, GFAP, MAP2 and pS6 using whole slide scanning in 42 SEGAs and 8 location and age matched controls. In accordance with previous literature, we confirmed the presence of inflammation markers and mTOR pathway activation in SEGA compared to control tissue Boer et al. 2008;Chan et al. 2004;Buccoliero et al. 2009Buccoliero et al. , 2016. Previous research showed that the mTOR activation is mainly present in giant cells and not in spindle cells of SEGA, which could explain the variability between the SEGA samples seen in this study (Buccoliero et al. 2016). The mTOR pathway can also regulate inflammatory responses and a previous study has shown that HLA-DR positive microglial cells were localized around giant cells in SEGA (Boer et al. 2008;Lim et al. 2003). In accordance, we found a positive correlation between the expression HLA-DP/DQ/DR and pS6. Furthermore, we found expression of both GFAP and MAP2 in SEGAs confirming a glio-neuronal nature of SEGAs. However, the percentage of positive GFAP area was lower in SEGA compared to control tissue. This lower expression could be explained by the diffuse staining of GFAP, which has been reported in prior immunohistochemical studies, indicating that some SEGA cells might lose their glial phenotype (Buccoliero et al. 2009(Buccoliero et al. , 2016. It could be that GFAP negative SEGA cells have a more neuronal expression, however, no negative correlation was found between the expression of GFAP and MAP2 in SEGA. Interestingly, we identified two subgroups (SEGA1 and SEGA2) in the SEGA methylation data with one group subdividing further into two smaller groups. These two groups were distinct in the methylation of genes related to the adaptive immune response and the MAPK cascade. However, no correlation was found between the methylation and RNA expression of these specific genes. Due to the complexity of RNA expression regulation, the effect of methylation changes might not be directly reflected in the RNA expression data. Higher expression of the T cell marker CD3 was found in SEGA2 compared to SEGA1 confirming differences in the adaptive immune response between the two groups. It could be possible that these methylation subgroups reflect differences in inflammatory cell content. However, it must be noted that the differences are small and only detectible with quantification. Furthermore, CD3 was not found differentially methylated between SEGA1 and SEGA2, suggesting an indirect effect of methylation on CD3 expression. The precise role of T-cells in SEGAs is still unknown, it could indicate a more responsive immune response to tumor cells but we also know that, neuroinflammation can increase the expression and activity of MMPs, which are increased in SEGAs and can play a role in tumorgenesis (Bongaarts et al. 2020). The MAPK pathway has been shown to be an important pathway for SEGA growth and has been suggested as a novel target for therapy for patients with SEGA (Bongaarts et al. 2019;Tyburczy et al. 2010;Mi et al. 2009). Therefore, the differences in methylation of this pathway between the two groups found in this study is interesting and could potentially reflect how these tumors would respond to MAPK inhibitors. Since MAPK inhibitors are not routinely used for SEGA treatment this could not be evaluated within this study. Furthermore, none of the clinical data available could Fig. 1 The methylation profile of SEGAs. a A principal component analysis (PCA) of the methylation data in SEGA (n = 42) and periventricular control tissue (n = 8) showing that the major source of variability in CpG methylation was the diagnosis. x-axis: the first principal component (PC); y-axis: the second PC. b Spearman's rank correlation matrix of the methylation data showing separate clustering of SEGAs from periventricular control tissue. The scale bar indicates the strength of the correlation with 1 indicating a strong positive correlation (dark blue) and − 1 indicating a negative correlation (dark red) between samples. c Pie charts showing the distribution of CpGs on the gene region (TSS200, TSS1500, 5'UTR and Exon 1, IGR, 3'UTR or gene body). The upper pie chart shows the distribution for 421,352 CpGs selected after filtering for probes with a detection p-values of more than 0.01, located on the sex chromosomes, or in SNPs were removed as well as cross-hybridization probes. The lower pie chart shows the gene distribution after selecting for an adjusted p-value < 0.01 and a β-value difference of > 0.2. d Volcano plot showing the differentially methylated CpGs on the TSS-associated regions (adjusted p-value < 0.01 and a β-value difference of > 0.2) between SEGAs and control tissue. A total of 4616 CpGs were hypomethylated and 2526 were hypermethylated in SEGA compared to control tissue. e Schematic overview using Cytoscape of GO terms enriched in SEGA compared to control tissue (adjusted p-value < 0.02). Lines indicate genes in common between GO terms. f Graphical representation of hypermethylated (red) and hypomethylated (blue) in the top 50 GO terms ◂  2) between TSC1 mutated SEGAs and control tissue (TSC1-control). A total of 4008 CpGs were hypomethylated and 2111 were hypermethylated in TSC1-control. b Volcano plot showing the differentially methylated CpGs on the TSS-associated regions (adjusted p-value < 0.01 and a β-value difference of > 0.2) between TSC2 mutated SEGAs and control tissue (TSC2-control). A total of 4008 CpGs were hypomethylated and 2111 were hypermethyl-ated in TSC2-control. c Venn diagram showing the overlap of CpGs between TSC1-control and TSC2-control. 5293 CpGs were overlapping, whereas 1773 CpGs were only found differentially methylated in TSC1-control and 826 CpGs in TSC2-control. d Pie chart showing the distribution of differentially methylated and not differentially methylated mTOR pathway related genes (based on Reactome). A total of 18 CpGs located on 6 mTOR pathway related genes were differentially methylated, whereas 441 CpGs located on 29 mTOR pathway related genes were not differentially methylated explain the subgroups found, including the TSC mutation. We did not find any confounding effects, however we cannot exclude confounding by other parameters such as Body Mass Index (BMI). It is very well possible that these two groups do reflect other important clinical features that were not evaluated in this study, such as tumor progression, or any differences in the clinical phenotypes either related or unrelated to the TSC mutation. Moreover, since the majority of patients included were not treated with mTOR inhibitors, it cannot be excluded that the biological make up of these subgroups reflect potential response to mTOR inhibitors. Therefore, DNA methylation analysis on SEGAs in retrospective studies are highly needed in order to unravel the clinical relevance of these subgroups.

Conclusions
Overall, this study shows that the DNA methylation profile of SEGAs is enriched for the immune system and the MAPK pathway and the ECM organization, strengthening the importance of these pathways in SEGA development and suggests that therapeutic intervention on DNA level could be useful. Moreover, we identified two subgroups in SEGA that seem to be driven by changes in the adaptive immune response and MAPK pathway. Although the clinical relevance of these subgroups remains uncertain they could potentially reflect tumor progression or response to treatment other than anti-epileptic drugs and deserve further investigation.  (Louis et al. 2016). TSC1/TSC2 mutation analysis was performed in blood or tumor sample DNA (for samples with a sufficient amount of DNA) as part of routine clinical care or was detected using massively parallel sequencing as described previously (Table 1)

RNA Isolation and RNA Sequencing
Frozen tissue of 12 SEGAs was homogenized with Qiazol Lysis Reagent (Qiagen Benelux, Venlo, The Netherlands) and total RNA was isolated using the miRNeasy Mini kit (Qiagen Benelux, Venlo, the Netherlands) according to the manufacturer's instructions. RNA concentration was determined using a Qubit® 2.0 Fluorometer (Life Technologies, Carlsbad, CA, USA) and the RNA integrity was assessed using a Bioanalyser 2100 (Agilent). Library preparation and sequencing were completed at GenomeScan (Leiden, the Netherlands). The Illumina (San Diego, California, USA) NEBNext Ultra Directional RNA Library preparation kit was used to prepare sequencing libraries in accordance to manufacturers guidelines. Clustering and DNA sequencing was performed using the Illumina cBot and HiSeq 4000 according to manufacturer's protocols. Each library was subjected to paired-end sequencing, producing reads of 150 nucleotides in length with a read-depth of 36 million reads. Data were processed as previously described (Bongaarts et al. 2019). Briefly, quality control of the reads was done using FastQC v0.11.5 (Babraham Institute, Babraham, Cambridgeshire, UK). Sequence reads were trimmed and filtered using FastQC v0.11.5 (Babraham Institute, Babraham, Fig. 3 Expression of inflammatory, mTOR activation, glial and neuronal markers SEGAs. a Immunohistochemistry showing the expression of CD3, HLA-DP/DQ/DR, pS6, GFAP and MAP2 in SEGA and periventricular control. Scale bar = 200 μm. b Quantification of immunohistochemistry indicated in % of positive area of CD3, HLA-DP/DQ/DR, pS6, GFAP and MAP2 in SEGA and periventricular control. An higher expression of CD3, HLA-DP/ DQ/DR, pS6 and MAP2 and a lower expression of GFAP was found in SEGA compared to control. *p-value < 0.05, **p-value < 0.01, ***p-value < 0.001, Mann-Whitney U test ◂ 1 3 Cambridgeshire, UK) and Trimmomatic v0.36 (Bolger et al. 2014). Paired-end reads were aligned to the human reference genome (GRCh38) with TopHat2 v2.0.13 and default settings (Kim et al. 2013). The number of reads that mapped to each gene, based on Gencode v25, was determined using featureCounts from the SubRead package (Liao et al. 2014). The count matrix was normalized using the R package DESeq2 (Love et al. 2014). Confounding variables were assessed with PCA and linear regression.

Bioinformatic Analysis
Raw IDAT files from the 450k were passed to the minfi package in R and quality control was performed using both minfi and shinymethyl. Two samples failed quality control based on the minfi quality control plot and were therefore not included in this manuscript. Normalization included a Noob background correction and dye-correction based on the control probes using the function preprocessFunnorm from the R package minfi, which removes any between-array variation (Fortin et al. 2014). Probes with detection p-values of more than 0.01, located on the sex chromosomes, or in SNPs were removed as well as cross-hybridization probes. After these steps, beta (β)-values ranging from 0.0 to 1.0 from 421,352 probes were used for further analysis.
Using the ConsensusClusterPlus package (Wilkerson and HaYes 2010), consensus clustering was performed with h-clust average linkage to detect robust clusters, where the metric was 1 minus the Spearmans correlation coefficient. The procedure was run over 1000 iterations and with a subsampling ratio of 0.99. Additionally, we applied a silhouette analysis to identify robust clusters. PCA was performed considering all CpG probes. Hierarchical clustering was performed on the top 5% most variable CpG probes using h-clust with average linkage. PCA, PVCA, linear regression, receiver operating characteristic (ROC) analysis and Random Forest (using the R package Random Forest with standard settings) were used to assess potential confounding factors as well as the contribution of histological and clinical data to the clusters identified. To determine the CpGs that were differentially methylated between groups (e.g. SEGA compared to control) a non-parametric Mann-Whitney U test was used at each CpG probe. The distribution of CpGs on the gene (TSS200, TSS1500, 5'UTR and Exon 1, Intergenic region (IGR), 3'UTR or gene body) was evaluated by calculating the percentage of CpGs per gene region. CpGs that were located on multiple genes or transcript variants were counted as one for each corresponding gene region. CpG probes located at the TSS-associated regions (TSS200, TSS1500, 5'UTR and Exon 1) with a benjamini-Hochberg adjusted p-value < 0.01 and a β-value difference of > 0.2 were considered differentially methylated. Gene ontology (GO) analysis was performed using the R package clusterProfiler (Yu et al. 2012) and the R package missMethyl. In order to identify mTOR pathway methylation changes the mTOR pathway or mTORC1 signaling pathway genelist from the Reactome database was extracted (Croft et al. 2011). CpGs mapping to these genes with a Benjamini-Hochberg adjusted p-value < 0.01 and a β-value difference of > 0.2 were considered differentially methylated. Clustering of subgroups was re-evaluated in an additional independent SEGA cohort from Heidelberg (50 additional cases) using hierarchical clustering and t-distributed stochastic neighbor embedding (TSNE) plot ( Supplementary  Fig. 3).

Evaluation of Immunohistochemistry
Olympus dotSlide system (vs 2.5, Olympus, Tokyo, Japan) was used for whole slide scanning at a × 20 magnification with a resolution of 0.32 μm/pixel. The scans were converted Fig. 4 Two robust groups found in the methylation data of SEGAs. a Heatmap using hierarchical clustering showing two/three subgroups in SEGA. b-d Consensus clustering showing that k = 3 is most robust indicated by delta area plot (b) consensus Cumulative Distribution Function (CDF) plot (c) and consensus matrix for k = 3 (d). e Silhouette plot showing the average silhouette with for each k, indicating that k = 3 is most robust. f Barchart of the silhouette clustering with for k = 2 showing which samples cluster together. g Barchart of the silhouette clustering with for k = 3 showing which samples cluster together ◂ 1 3 into TIFF files. The DAB positive area was separated from background using an adjusted protocol for segmentation and was corrected for total area of the tissue. For samples with blood contamination, regions of interest (ROI) were selected in order to select representative tumor regions. The overall percentage of positivity was assessed for each case and used for statistical analysis.

Statistical Analysis
Statistical analysis was performed with GraphPad Prism software (Graphpad software Inc., La Jolla, CA) using the nonparametric Mann-Whitney U-test or, for multiple groups, the non-parametric Kruskal-Wallis test followed by Mann-Whitney U-test. Correlations were assessed with R using the Spearman's rank correlation test. An adjusted p-value < 0.05 was considered statistically significant except for the differentially methylation analysis where an adjusted p-value < 0.01 was considered statistically significant.

Fig. 5
Unique gene expression between two SEGA subgroups. a Volcano plot showing the differentially methylated CpGs on the TSS-associated regions (adjusted p-value < 0.01 and a β-value difference of > 0.2) between one subgroup of SEGAs and control tissue (SEGA1-control). A total of 4377 CpGs were hypomethylated and 1411 were hypermethylated in SEGA1-control. b Volcano plot showing the differentially methylated CpGs on the TSS-associated regions (adjusted p-value < 0.01 and a β-value difference of > 0.2) between the other subgroup of SEGAs and control tissue (SEGA2control). A total of 4883 CpGs were hypomethylated and 3132 were hypermethylated in SEGA2-control. c Volcano plot showing the differentially methylated CpGs on the TSS-associated regions (adjusted p-value < 0.01 and a β-value difference of > 0.2) between the two subgroups of SEGA (SEGA1-SEGA2). A total of 321 CpGs were hypomethylated and 70 were hypermethylated in SEGA1-SEGA2. d Venn diagram showing the overlap of genes corresponding to the differentially methylated CpGs between SEGA1-control, SEGA2-control and SEGA1-SEGA2. Genes overlapping between SEGA1-control and SEGA1-SEGA2 but not with SEGA2-control were considered unique for SEGA1. Genes overlapping between SEGA2-control and SEGA1-SEGA2 but not with SEGA1-control were considered unique for SEGA2 (70 genes SEGA1 unique and 58 genes SEGA2 unique). e Bar chart showing the GO terms enriched for the SEGA1 and SEGA2 unique genes. Each GO term is listed on the y-axis, the log10(1/ adjusted p-value) on the x-axis and the n is equal to the number of differentially methylated genes in each GO term. f Heatmap with Z-score hierarchical clustering for the RNA expression data of 12 SEGAs. Each row indicates one of unique SEGA1 or SEGA2 genes based on the methylation data that was expressed on RNA level in SEGA. The color scale means the gene expression standard deviations from the mean, with green for low expression and red for the high expression levels ◂

Data Availability
The datasets used and/or analysed during the current study are available from the corresponding author on request.

Conflict of interest
The authors have no duality or conflicts of interest to declare.
Ethical Approval Specimens were obtained and used in accordance with the Declaration of Helsinki and this study was approved by the Medical Ethics Committees of each institution.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http:// creat iveco mmons. org/ licen ses/ by/4. 0/.

Fig. 6
Detection of SEGA subgroups based on histological markers and clinical data. a-e ROC curves and scatterplots for detection of SEGA1 or SEGA2 based on CD3 (a), HLA-DP/DQ/DR (b), GFAP (c), MAP2 (d) and pS6 (e) positivity. The point in the ROC curve indicates the most optimal % of positivity to separate the SEGA1 and SEGA2 group, followed by the proportion of SEGA1 and SEGA2 that is correctly detected. Scatterplots show the spread between samples for the % of positive area for SEGA1 (blue) and SEGA2 (SEGA2a light red and SEGA2b dark red). The red line indicates the most optimal % of positivity to separate the SEGA1 and SEGA2 group based on the ROC curve. *p-value < 0.05, **p-value < 0.01, ***p-value < 0.001, Mann-Whitney U test. f Variable Importance plots obtained from Random Forest in R for detection of SEGA1 and SEGA2 based on the clinical data (Table 1). Each point represents the mean decrease Gini value, indicative of the importance of each variable. Variables are listed from most important to least important. Antiepileptic drugs were also categorized in 3 groups: GABA blockers (treatment GABA), valproic acid (treatment valproic acid) and sodium channel blockers (treatment sodium channels). g Variable Importance plots obtained from Random Forest in R for detection of SEGA1, SEGA2a and SEGA2b based on the clinical data (Table 1). Each point represents the mean decrease Gini value, indicative of the importance of each variable. Variables are listed from most important to least important. Antiepileptic drugs were also categorized in 3 groups: GABA blockers (treatment GABA), valproic acid (treatment valproic acid) and sodium channel blockers (treatment sodium channels). h Scatterplot showing no difference in tumor size (mm) in SEGA1 (blue) compared to SEGA2 (SEGA2a light red and SEGA2b dark red) ◂