Transcriptomic and epigenetic dissection of spinal ependymoma (SP-EPN) identifies clinically relevant subtypes enriched for tumors with and without NF2 mutation

Ependymomas encompass multiple clinically relevant tumor types based on localization and molecular profiles. Tumors of the methylation class “spinal ependymoma” (SP-EPN) represent the most common intramedullary neoplasms in children and adults. However, their developmental origin is ill-defined, molecular data are scarce, and the potential heterogeneity within SP-EPN remains unexplored. The only known recurrent genetic events in SP-EPN are loss of chromosome 22q and NF2 mutations, but neither types and frequency of these alterations nor their clinical relevance have been described in a large, epigenetically defined series. Transcriptomic (n = 72), epigenetic (n = 225), genetic (n = 134), and clinical data (n = 112) were integrated for a detailed molecular overview on SP-EPN. Additionally, we mapped SP-EPN transcriptomes to developmental atlases of the developing and adult spinal cord to uncover potential developmental origins of these tumors. The integration of transcriptomic ependymoma data with single-cell atlases of the spinal cord revealed that SP-EPN display the highest similarities to mature adult ependymal cells. Unsupervised hierarchical clustering of transcriptomic data together with integrated analysis of methylation profiles identified two molecular SP-EPN subtypes. Subtype A tumors primarily carried previously known germline or sporadic NF2 mutations together with 22q loss (bi-allelic NF2 loss), resulting in decreased NF2 expression. Furthermore, they more often presented as multilocular disease and demonstrated a significantly reduced progression-free survival as compared to SP-EP subtype B. In contrast, subtype B predominantly contained samples without NF2 mutation detected in sequencing together with 22q loss (monoallelic NF2 loss). These tumors showed regular NF2 expression but more extensive global copy number alterations. Based on integrated molecular profiling of a large multi-center cohort, we identified two distinct SP-EPN subtypes with important implications for genetic counseling, patient surveillance, and drug development priorities. Supplementary Information The online version contains supplementary material available at 10.1007/s00401-023-02668-9.

SNPs, sex chromosomes, cross-reactive sites, and sites that were not represented on the 450k or the EPIC BeadChip were excluded as described previously 1 .Previously published samples that were present multiple times in the cohort, were identified based on their SNP profile using getSnpBeta in minfi and subsequently excluded.The predicted match to the methylation class "SP-EPN" was obtained using the "Heidelberg" brain tumor classifier developed by Capper et al 1 (https://www.molecularneuropathology.org/mnp/).Classifier version [v12.5] was used and the cutoff value of ³0.9 was implemented to indicate a significant match.Samples with detection p-values ³0.01 were excluded from the cohort.
For unsupervised clustering of SP-EPN cases with other EPN cases, as well as for unfiltered clustering of methylation data of all SP-EPN cases, the 10,000 most variable CpG sites were used.Differentially methylated probes (DMP) between the two SP-EPN RNA subtypes were determined with ChAMP 2 , using adjusted (Benjamini-Hochberg procedure for multiple comparisons) p-values <0.05 and logFC <-0.2 or >0.2 as thresholds.1,518 DMP were identified, of which 616 were also covered by the Infinium HumanMethylation450 BeadChip that was used for metyhylation analysis of a subset of SP-EPN samples.All SP-EPN cases were subsequently clustered based on the methylation of these 616 DMP between the two RNA subtypes to establish extended molecular subtypes.UMAPs were generated with umap using the default parameters.
Copy number plots were generated using Conumee and GISTIC2 for the segmentation of armlevel methylation-based copy number profiles with default parameters 3,4 .One case was excluded for copy number analysis because of low quality in the copy number plot.A threshold of -0.1 was applied to detect relevant losses of chromosomal arm 22q.For all cases with 22q values between 0 and -0.2, manual assessment of CNV plots was additionally performed to validate 22q copy number status.The threshold for homozygous loss of 22q and NF2 was -0.75.MGMT promoter methylation status was assessed as described previously 1 .

Sanger sequencing
PCR was performed using KAPA Taq Extra kit (Kapa Biosystems) and confirmed by gel electrophoresis (1% agarose gel at 100V for 15 min).PCR products were purified by Illustra ExoProStar Kit (Cytiva) and further processed with BigDye™ Terminator v3.Identified variants were annotated based on publicly available data bases.Variants were further filtered for variants with an annotated population frequency <0.0001 and an allele frequency ³0.3.
Variants occurring in introns, UTRs or classified as synonymous variant were excluded.Putative clinical relevance of detected, filtered variants was determined based on annotation in ClinVar and Varsome.Variants classified as "benign" or "likely benign" were further excluded.Gene ontology analysis of genes affected by the remaining variants was performed using the DAVID functional annotation tool with the following thresholds: count=2, ease = 0.05.Enriched gene ontology terms were clustered and visualized with Revigo 5 .

Bulk RNA sequencing
RNA of SP-EPN was isolated from FFPE tissues using the Maxwell® RSC RNA FFPE Kit (Promega).Concentrations were measured with a Qubit 2.0 Fluorometer (Thermo Fisher Scientific) and RNA integrity was analyzed using the RNA 6000 Nano Kit on a 2100 Bioanalyzer (Agilent Technologies).

Bulk RNA sequencing data processing
All analyses were performed in R. Summarized read counts (SP-EPN n=61,552; microarray EPN data n = 54,675 transcripts total) with transcript IDs were mapped to HUGO Gene Nomenclature Committee (HGNC) symbols using BiomaRt.Non-protein coding transcripts and those without HGNC annotation were removed.The means of transcripts that matched the same HGNC symbol were calculated and genes of the Y chromosome were excluded to avoid gender bias.19,271 transcripts for SP-EPN, and 16,779 for microarray EPN data were retrieved.
For SP-EPN samples, batch effects were detected based on principal component analysis (PCA) and subsequent correction was performed between the five different hospitals, where the samples were collected.The ComBat_seq function of the sva package was used on raw count data to remove batch effects.Afterwards, batch corrected counts were normalized using the "median of ratios" method of the DESeq2 package 8 .For clustering analyses, a subsequent variance stabilization transformation followed by calculation of z-scores was performed.To obtain the 4,000 genes for clustering of SP-EPN, the most variable genes within each batch were determined first to reduce the influence of technical variability between batches on the analysis.
Therefore, variance stabilization transformation was performed on each batch individually and 3,000 genes with highest intra-batch variance were determined and the union taken.For each gene, the variance within each batch was weighted by batch size and for this purpose multiplied by sample size per batch/ total sample size.Afterwards, we calculated the sum for each gene of the weighted variance of all batches.Genes were ordered by the resulting batch-weighted variance and the 4,000 genes with the highest values were selected for clustering analysis.
The average Silhouette width was determined by the factoextra::fviz_nbclust function and for consensus clustering the ConsensusClusterPlus package was employed 9 .Here, hierarchical clustering using the options "ward.D2" and "pearson" was performed with 1000 resamplings, each on 80% of samples and all features.
ComplexHeatmap visualized hierarchical clustering of SP-EPN using Ward's method (D2) and Pearson distance 10,11 .UMAP dimensionality reduction was performed on all features and variance stabilized counts (19257 transcripts) with n_neighbors set to 15 using umap.
For differential expression analysis between SP-EPN subtype A and B, DESeq2::DESeq was used on batch-corrected counts.The volcano plot of differentially expressed genes was created with EnhancedVolcano 12 .Significantly differentially expressed genes were defined by adjusted p-Value < 0.05 and log2 fold change (log2FC) >1.
To determine potential marker genes of each SP-EPN cluster, differential expression was calculated between SP-EPN cluster tumors and all other ependymomas from the analysis using limma.Significantly differentially expressed genes were selected with Benjamini-Hochberg adjusted p-value <0.01 and logFC >1.Afterwards, only genes that also demonstrated significant differential expression in the comparison of transcriptional subtypes A and B in our cohort of 72 bulk RNA sequencing SP-EPN were considered as potential marker genes for subtypes.
Immune cell abundance in SP-EPN samples was estimated as described by Bockmayr et al. 15 The DESeq2-normalized and log2 transformed counts of SP-EPN were used for calculation of immune scores.Differential pathway activation analysis was conducted utilizing HiPathia 16 , which performs a canonical circuit activity analysis.Variance stabilized count data of SP-EPN were scaled to values between 0 and 1 with hipathia::normalize_data.When calculating signaling circuit activation, only effector pathways were included and differential activation was tested with hipathia::do_wilcoxon.

Single-cell/single-nucleus RNA sequencing analysis
Human adult lumbar spinal cord single nucleus RNA sequencing data of seven donors 17 were downloaded from GEO repository GSE190442 as aggregated counts.Human embryonal cervical and lumbar spinal cord single cell RNA sequencing samples 18 were selected from GEO repository GSE136719 (GSM4080448, GSM4080410, GSM408043, GSM4080449, GSM4080411, GSM4080437).All reference samples were integrated using the standard Seurat v4 workflow 19 .
Quality control was performed by selecting cells with nFeatures >200 and <3000 and excluding cells with more than 5% mitochondrial transcripts.Default parameters were used for integration of the first 30 principal components of the reference samples and seurat_clusters were determined at a resolution of 0.5.Differences between G2/M and S cell cycle phases were removed as recommend by Seurat vignette "Cell-Cycle Scoring and Regression", where cell cycle scores are calculated based on specific marker genes 20 .
As Yadav et al. provided a thorough annotation, these data were employed in the process of cell type annotation of the embryonal data.Furthermore, cluster gene expression was compared to appropriate literature 17,18,21 .
DESeq2-normalized bulk SP-EPN RNA sequencing data were subjected to Seurat v4 referencebased mapping workflow 19,22 , where the SP-EPN query is projected into the UMAP of the integrated human spinal cord reference.Here, we used the first 30 principle components for calculation of transfer anchors and filtering was turned off.During UMAP projection, the PCA was chosen as the reference reduction and UMAP as the reduction model.Afterwards, query and reference were merged based on reference PCA and newly generated query PCA to generate a new UMAP.
For detailed analysis of SP-EPN and ependymal cells, the same procedure was conducted on the isolated ependymal cluster of the data set described above.For Pearson correlation analysis, the isolated ependymal cells and SP-EPN data were log-normalized and mean centered before together.The 8,000 most variable genes were selected and gene expression averaged for each cell population and SP-EPN subtypes.Pearson correlation was calculated using stats::cor.

Histopathological analysis
Tumor samples were histomorphologically assessed in a blinded manner by at least two experienced institutional neuropathologists based on H&E-stained tumor sections.

Statistics
Differences between clinical and biological characteristics of SP-EPN subtypes were assessed using the appropriate test as indicated in the respective figure legends.P-values <0.05 were considered statistically significant.
DNA panel-based sequencing and data processingNext-generation DNA sequencing was performed using a customized targeted gene panel, manufactured by Qiagen (CDHS-21330Z-424).The panel covers the complete coding regions and splice-sites of six genes (ATRX, EGFR, NF1, NF2, PTEN, TP53), as well as mutation hotspots of further 14 genes (AKT, BRAF, CTNNB1, FGFR1, FGFR2, H3F3A, HIST1H3B, HIST1H3C, IDH1, IDH2, KRAS, PI3CA, PIK3R1, TERT promoter).Library preparation was done according to the manufacturer's instructions.Samples were sequenced on the Illumina MiniSeq sequencing system (paired-end, 2 x 151 bp, average coverage 500x).Data were analyzed with the Qiagen CLC Genomics workbench, using a customized workflow.Variants were annotated with information from the 1000 genome project, dbSNP, ClinVar and COSMIC.Only variants with an allele frequency ≥5% and a total target coverage of ≥40x were further analyzed.Variants not annotated by ClinVar were additionally assessed with VarSome (www.varsome.com).Lollipopof reads in target areas was performed with ABRA to achieve more accurate indel calling.Proprietary software (cegat) was used to discard duplicate reads, which originate most likely from the same PCR amplicon, and reads that were aligned with the same mapping score at more than one location in the genome.For variant calling, another proprietary software (cegat) was used.