Introduction

Parkinson’s disease (PD) is the second most common neurodegenerative disorder. It is characterized by the pathologic aggregation of alpha-synuclein and its pathology is known to progress in a predictable spatiotemporal manner. The spread of the disease pathology commences in the olfactory bulb and the lower brainstem and moves through the substantia nigra pars compacta (SN) in the midbrain, eventually reaching the meso- and neocortical areas [1, 2]. PD arises from a complex interplay of various factors, such as aging, genetic predisposition, and environmental factors. While the cause of PD is unknown in most cases, specific mutations in genes such as LRRK2 and GBA1 have been identified to significantly increase the risk of developing the disease, although likely by affecting different molecular pathways. This may be one reason why the impact of LRRK2 and GBA1 mutations on clinical presentations may differ [3]. In recent decades, transcriptome profiling has emerged as a preeminent methodology for exploring human pathologies at the molecular and cellular level. In PD, alpha-synuclein pathology has been shown to be associated with the transcriptional programs of various brain cell types, including neurons and glial cells [4]. Furthermore, although the degeneration of dopaminergic neurons (DA) in PD primarily occurs in the SN region, Lewy bodies can form in other brain regions, such as the limbic system and the prefrontal cortex [5]. The aim of our study is to identify transcriptional changes between available brain samples from neocortex with LRRK2 or GBA1 mutations and those from Healthy Controls, and to correlate these changes with clinical outcomes.

Results and discussion

In this study, we performed single-nucleus RNA-sequencing (snRNA-seq) on the prefrontal cortex (PFC) and anterior cingulate (ACC) brain regions from the same individuals (2 LRRK2 PD, 2 GBA1 PD and 2 Healthy Controls) (Fig. 1A and Table S1). After data cleaning and quality control (see Methods), 88,876 high-quality single nuclei were retained. The clustering of these high-quality nuclei identified 13 clusters covering major cell types in the brain, i.e. excitatory neurons (ExN), inhibitory neurons (InN), oligodendrocytes (ODCs), oligodendrocyte precursor cells (OPCs), microglia (MG), astrocytes (Astro) and vascular cells (Vas) (Fig. 1B). While ~ 50% of the total nuclei were annotated as ODCs, only 4.3 and 0.5% of nuclei accounted for OPCs and Vas cells, respectively (Fig. 1C). The observed proportions of various cell populations in our data aligned with histopathological studies, indicating ODCs as the predominant glial cell population [6]. On average, we obtained 7406 high-quality nuclei per sample ranging from 3.6 to 16.5% of the total nuclei (Fig. 1C and Table S2). While the percentage of each cell type varied across the samples, no significant differences (see Methods) in cell type proportions were observed between brain regions or mutation groups (Fig. 1D, E and Table S2). The clusters were annotated based on the expression of well-known cell-type markers (Fig. 1F and Table S3). The top 5 markers of each cell type cluster are shown in Fig. 1G.

Fig. 1
figure 1

Overview of snRNA-seq profiling in the human post-mortem brain tissues. A Schematic overview of the experimental plan (see Table S1). B Uniform manifold approximation and projection (UMAP) visualization of the snRNA-seq clusters from 88,876 high quality nuclei. C Percentage of nuclei for each cell type across samples, mutation group and brain regions (see Table S2). D UMAP embeddings of nuclei colored by mutation group and brain regions. E Barplot displaying the distribution of cell-type percentage sample-wise. F Violin plot illustrating the expression distribution of known gene markers (see Table S3). G Genes most up-regulated in identified cell types: excitatory neurons (ExN), inhibitory neurons (InN), oligodendrocytes (ODCs), oligodendrocyte precursor cells (OPCs), microglia (MG), astrocytes (Astro) and vascular cells (Vas)

Integration of snRNA-seq data, which includes all expressed genes, with PD genome-wide association studies (GWAS) [7] showed the significant association (adjusted P < 0.01) of ODCs, OPCs and MG with PD-linked risk loci (Fig. 2A and Table S4). Next, we performed differential expression gene analysis to identify dysregulated genes between PD and controls in ODCs and OPCs. While 1040 and 543 genes were differentially expressed between GBA1 and control samples in ODCs and OPCs, respectively, only 278 genes in ODCs and 108 genes in OPCs were differentially expressed between LRRK2 and control samples (Fig. 2B and Table S5). Using all the DEGs, MAGMA indicated that OPCs had the highest association with PD-linked risk loci (Fig. 2C and Table S6). The most prominent MAGMA association was observed among DEGs in OPCs when comparing LRRK2 vs HC, aligning with earlier findings indicating highest LRRK2 expression in OPCs [8]. Enriched biological processes in up-regulated DEGs (Fig. 2D and Table S7) exhibited negative regulation of inclusion body assembly (in GBA1/Control ODCs), regulation of microtubule nucleation (in LRRK2/Control ODCs), positive regulation of tau-protein kinase activity (in GBA1/Control OPCs) and regulation of protein polymerization (in LRRK2/Control OPCs). On the other hand, enriched biological processes in down-regulated DEGs exhibited protein processing involved in protein targeting to mitochondria (in GBA1/Control ODCs), regulation of potassium ion transport (in LRRK2/Control ODCs), modulation of chemical synaptic transmission (in GBA1/Control OPCs) and cholesterol biosynthetic process (in LRRK2/Control OPCs). The low overlap between DEGs of LRRK2 or GBA1 compared to Healthy Controls, along with the distinct enriched biological processes in LRRK2 and GBA1 DEGs, indicates that GBA1 and LRRK2 mutations have unique molecular impacts in PD.

Fig. 2
figure 2

Association of PD susceptibility with ODCs and OPCs. A Multi-marker analysis of genomic annotation (MAGMA) gene set enrichment based on all the 88,876 high quality nuclei showed significant associations with oligodendrocytes (ODCs) and oligodendrocyte precursor cells (OPCs) (see Table S4). B Number of differentially expressed genes (DEGs) in each comparison and cell-type (see Table S5). C MAGMA gene set enrichment based on DEGs in LRRK2 vs HC (upper) and in GBA1 vs HC (lower) (see Table S6). D Gene ontology enrichment analysis of up-regulated (left) or down-regulated (right) genes. Top five biological process terms for each gene list are indicated. Enrichr combined score is calculated by the logarithmic transformation of the p-value obtained from Fisher’s exact test, multiplied by the z-score representing the deviation from the expected rank (see Table S7)

Next, we performed polygenic prediction in an independent PD cohort from Tuebingen (681 cases and 549 controls), which has a uniquely rich and detailed set of PD clinical measures. Polygenic Risk Scores (PRS) were calculated using the GWAS summary statistics for PD excluding the data from our Tuebingen cohort (see Methods). We computed six different PRS derived from six gene lists obtained through our analysis of differential expression in ODCs, OPCs and MG (Table S8). All the scores significantly predicted case–control status (Table S8). Subsequently, we conducted prediction of PD measures among patients. We focused on Unified Parkinson Disease Rating Scale-III (UPDRS-III), Montreal Cognitive Assessment (MoCA) and Beck Depression Inventory-II (BDI-II) among PD patients (N ranges from 379 to 514). Significant associations were observed between GBA1_OPC_DEG score with MoCA (P-value = 0.055, Table 1), LRRK2_ODC_DEG and LRRK2_OPC_DEG scores with BDI-II (P-value = 0.054 and P-value = 0.002, respectively). No significant associations with clinical outcomes were found using PRS based on DEGs in MG (Table 1). It is noteworthy that none of the scores were able to predict motor examination measures i.e. UPDRS-III. Expanding beyond region-specific prediction, we computed a general PRS using all SNPs as a control. As anticipated, we observed an association with UPDRS-III (P-value = 0.076) but not with non-motor symptoms (Table S8).

Table 1 Polygenic prediction of Parkinson’s measures in the Tuebingen patient sample

In order to confirm and broaden our findings beyond cortical regions, we utilized three distinct single-cell datasets for predicting PD measures using PRS derived from DEGs in ODCs and OPCs [9,10,11] (Table S9). Using a dataset from the first study focusing on the ACC region [9], we uncovered an association between ODC DEGs and BDI-II (P-value = 0.066, Fig. 3A and Table S10). In the second study centered on the SN region [10], we discovered a substantial association between the OPC DEGs and UPDRS-III (P-value = 0.012, Fig. 3B and Table S10), indicating a potential specificity to different brain regions. To further uncover the association of PD measures and subtypes of OPCs and ODCs, we used snRNA-seq generated by Kamath and colleagues [11], providing well-defined subtypes within the SN region. As a positive control, we first used DEGs from a highly vulnerable DA neuronal subpopulation, marked by SOX6_AGTR1. Indeed, as expected the PRS derived from DEGs in SOX6_AGTR1, were significantly associated with all three PD measures i.e. UPDRS-III, MoCA and BDI-II (P-value < 0.1, Fig. 3C and Table S10). Intriguingly, besides SOX6_AGTR1, we found an association of the OPC_ADM subtype with UPDRS-III (P-value = 0.052, Fig. 3C and Table S10). In addition, BDI-II was associated with the OPC_ADM, OPC_HOXD3 and ODC_ENPP6_EMILIN subtypes (P-value < 0.1). It’s crucial to highlight that the OPC_ADM population showed a notable increase in PD samples (Fig. 3 of Kamath et al. [11]). Enriched biological processes in OPC_ADM DEGs exhibited distinct terms like regulation of myelination and glial cell differentiation whereas OPC_HOXD3 exhibited terms like regulation of receptor-mediated endocytosis, neuroblast proliferation and energy reserve metabolic process (Fig. 3D and Table S11). ODC_ENPP6_EMILIN displayed terms related to unfolded protein and chaperone-mediated protein complex assembly (Fig. 3D and Table S11). Similar processes have recently been found to be enriched in PD-associated oligodendrocytes [12], suggesting that oligodendrocytes are affected by protein folding stress in PD and antigen processing/presentation as reviewed by Sutter and Crocker [13].

Fig. 3
figure 3

Polygenic prediction of PD measures using the ODCs and OPCs DEGs in publicly available datasets. AC Prediction of clinical outcomes using Feleke et al. [9] from anterior cingulate cortex region (A), Lee et al. [10] from substantia nigra region (B) and Kamath et al. [11] from substantia nigra region (C) (see Table S10). D Gene ontology enrichment analysis of DEGs in predictive subpopulation of cell-types in Kamath et al. [11] from substantia nigra region. Top five biological process terms for each gene list are indicated (see Table S11)

To summarize, in this short report, we found a highest and significant association of PD GWAS risk loci in ODC and OPC expressed genes within PD GWAS risk loci and revealed several dysregulated genes and pathways, including regulation of tau-protein kinase activity, regulation of inclusion body assembly and protein processing involved in protein targeting to mitochondria. It is now widely acknowledged that oligodendrocytes and/or precursor cells play a role in supporting neurons through mechanisms beyond the insulating function of myelin [14,15,16]. Therefore, it is tempting to speculate that the abnormally regulated pathways, which extend beyond myelination, such as those involved in metabolic support to neurons, may contribute to the pathology of PD. ODCs form dynamic myelin sheaths around axons, which are essential for higher brain functions such as learning and memory. Additionally, they offer vital metabolic support to neurons, underscoring their significant contribution to overall brain function. OPCs are integrated into local neural circuits and play roles in synaptic and axonal remodeling, phagocytosis, and immunomodulation. Here, we would also address certain limitations of our study. We recognize that the sample sizes of the newly generated data in this study are comparatively small, primarily due to the challenge of obtaining these rare samples from brain banks, given the rarity of the mutations. Therefore, we utilized three distinct single-cell datasets from previously published studies to validate our findings. However, it is crucial to highlight that not all datasets exhibit uniform distribution in terms of age, gender, and mutations. Additionally, we are currently expanding our Tuebingen cohort to enhance the predictive power of clinical outcomes. We have incorporated clinical data from newly received (in July 2024) PD patient data (n = 358) and conducted polygenic predictions of Parkinson's measures using this replicate cohort. Indeed, we found associations between differentially expressed polygenic risk scores and non-motor symptoms (Table S12). However, the results do not fully match those in Table 1, indicating that clinical prediction outcomes are highly dependent on sample size and genetic background. In the future, it will be essential to integrate and replicate the results in a larger cohort characterized by a balanced metadata. It is noteworthy that in line with our results, previous studies indicate a significant enrichment of PD heritability in glial cell types like oligodendrocytes and astrocytes [17,18,19]. Two decades ago, Wakabyashi et al., observed an abnormal accumulation of alpha-synuclein in the oligodendrocytes within the substantia nigra of PD patients [20]. By integrating GWAS results with single-cell transcriptomic data, Bryois et al., and Agarwal et al., observed oligodendrocytes and oligodendrocyte precursor cells to be significantly associated with PD [18, 19]. Moreover, while the loss of DA neurons in the SN region of the midbrain is a well-known pathological hallmark of PD closely associated with motor symptoms, it is important to note that PD patients also encounter various non-motor symptoms, including cognitive and psychopathological manifestations, which are associated with prefrontal cortex dysfunction [21]. Szabolcs and colleagues found higher rates of psychiatric morbidity (especially mood disorders, cognitive impairment, anxiety disorders, schizophrenia) in the premotor phase of PD and these were more common in PD patients before PD diagnosis [22]. In line with this, the polygenic predictions in this study showed notable associations with non-motor symptoms, suggesting a crucial involvement of glial cells in neuropsychiatric symptoms that may extend beyond the SN region of the midbrain. On the other hand, a significant correlation was observed between the OPC subpopulation in the substantia nigra (OPC_ADM) and PD motor symptoms. This suggests that region-specific alterations in molecular pathways and gene expression changes in glial cells contribute to distinct clinical outcome predictions. Altogether, we anticipate that our study will serve as a valuable resource and prompt further research into the involvement of oligodendrocytes and oligodendrocyte precursor cells in the pathology of PD.

Methods

Samples used in this study

The research was conducted using fresh-frozen postmortem brain tissues obtained from four PD patients, two of which had the LRRK2 p.G2019S mutation and the other two had the GBA1 mutation (one with p.L444P and one with p.E326K), along with two healthy controls. All donors were males, aged between 65 and 80, and PD patients had Lewy body Braak stages of 5–6. Two brain regions, namely the prefrontal cortex and anterior cingulate cortex, were investigated for each donor. All tissues were procured from the Netherlands brain bank except one LRRK2 brain from UCL Queen Square Brain Bank for Neurological Disorders, following the policies and regulations of the institutional ethics board at the University Hospital Tuebingen in Germany. For transcriptome analysis, single nuclei were extracted from all samples, and snRNA-seq was performed. The clinical characteristics of the donors have been elaborated in Table S1.

Generation of single nuclei from postmortem human brains

The process of isolating nuclei entails the utilization of a detergent lysis technique, where a detergent is employed to break down the cellular membranes, followed by the centrifugal separation of the nuclei. In brief, 300 mg of post-mortem brain tissue was dounce-homogenized in 2 ml of Nuclei EZ Prep Lysis Buffer (Sigma Aldrich, MA, USA) spiked with 0.2 U μl−1 RNase inhibitor (Sigma Aldrich, MA, USA), 3.3 μl DTT (Thermo Fisher Scientific, MA, USA) and 33 μl of 10% Triton X100 which were added before incubating on ice for 5 min in a final volume of 10 ml. Homogenized tissue was washed with 3–4 ml washing buffer which was fresh PBSB 1% and then was filtered through a 70-μm cell strainer (BD Bioscience, NY, USA). Then using a long tube, 10 ml of 1.8 M ice-cold sucrose cushion solution is added to each sample i.e. the roughly 3 ml lysate. After carefully and completely discarding the supernatant and the sucrose cushion layer containing debris and myelin, 1 ml PBS buffer added to resuspend the nuclei and 4 ml nuclei suspension buffer (1% BSA-PBS solution). It was finally centrifuged on 500 g for 5 min. At the end, 2 μl DAPI (Sigma Aldrich, MA, USA) with the concentration of 1:100 was added to stain the nuclei. Final centrifugation step on 500 g for 5 min was preceded by incubating the DAPI added suspension for 15–20 min in the cold & dark room on the rotation wheel at 4 C. Sorting buffer which consists of 99 μl PBS and 1 μl RNase inhibitor (Sigma Aldrich, MA, USA) was added to resuspend the nuclei and make them ready for quality/quantity inspection and then to run on a 10 × genomic chromium controller. Quality assessment was performed using fluorescence-activated cell sorting (FACS) to detect all DAPI-positive events, i.e. individual nuclei comprising more than 95% of all events.

Droplet-based snRNA-seq using 10 × Genomics

Single-nuclei suspension concentration was determined by automatic cell counting (DeNovix CellDrop, DE, USA) using an AO/PI viability assay (DeNovix, DE, USA) and counting nuclei as dead cells. Single-nucleus gene expression libraries were generated using the 10 × Chromium Next gel beads-in-emulsion (GEM) Single Cell 3’ Reagent Kit v3.1 (10 × Genomics, CA, USA) according to manufacturer’s instructions. In brief, cells were loaded on the Chromium Next GEM Chip G, which was subsequently run on the Chromium Controller (10 × Genomics, CA, USA) to partition cells into GEMs. Cell lysis and reverse transcription of poly-adenylated mRNA occurred within the GEMs and resulted in cDNA with GEM-specific barcodes and transcript-specific unique molecular identifiers (UMIs). After breaking the emulsion, cDNA was amplified by PCR, enzymatically fragmented, end-repaired, extended with 3′ A-overhangs, and ligated to adapters. P5 and P7 sequences, as well as sample indices (Chromium i7 Multiplex kit, 10 × Genomics, CA, USA), were added during the final PCR amplification step. The fragment size of the final libraries was determined using the Bioanalyzer High-Sensitivity DNA Kit (Agilent, CA, USA). Library concentration was determined using the Qubit dsDNA HS Assay Kit (Thermo Fisher Scientific, MA, USA). snRNA libraries were pooled and paired-end-sequenced on the Illumina NovaSeq 6000 platform (Illumina, CA, USA).

snRNA-seq quality control

Samples were demultiplexed using Illumina's bcl2fastq conversion tool and the 10 × Genomics pipeline Cell Ranger count v6.0.1 to perform alignment against the 10 × Genomics pre-built Cell Ranger reference GRCh38-2020-A (introns included), filtering, barcode counting, and UMI counting. As a default, a cut-off value of 200 unique molecular identifiers expressed in at least 3 cells was used to select nuclei of sufficient complexity for further analysis. Each sample’s count was normalized by the SCTransform method in Seurat v4.1.0 [23] with mitochondrial reads regressed out. Two approaches were combined for quality control: (1) Doublets and multiplets were filtered out using DoubletFinder v2.0.3 [24] for each individual sample; (2) outliers with a high ratio of mitochondrial and ribosomal counts (each > 10%) and cells with low a number of genes (N < 1000) were removed. The core statistical parameters of DoubletFinder used to build artificial doublets for true doublet classification were determined automatically using recommended settings. After applying these filtering steps on 105,781 input nuclei, the dataset contained 88,876 high-quality single nuclei that were eligible for further analysis. We used the speckle R package v0.99.7 to analyze differences in cell type proportions [25]. We used the propeller function with CellType, SamplID and Mutation/Region columns from the Seurat MetaData object as input for clusters, sample and group, respectively.

Cell annotations and differential expression

After combining all samples into a single Seurat object, genes were projected into principal component space using the principal component analysis (RunPCA). Harmony R package [26] was used for integration as well as for removing unwanted effects across subjects. The first 12 PC dimensions of data processed with Harmony were used as inputs into the FindNeighbours, FindClusters [at 0.1 resolution obtained out of a range of tested resolutions (0.1, 0.2, 0.5, 1.0)] and RunUMAP functions of Seurat. In brief, a shared-nearest-neighbor graph was constructed on the basis of the Euclidean distance metric in principal component space, and cells were clustered using the Louvain algorithm. The RunUMAP function with default settings was used to calculate 2D UMAP coordinates and search for distinct cell populations. Cluster markers and differential expression testing was performed on Seurat “RNA” assay containing seurat log-normalized counts using default Wilcoxon method implemented in Seurat v4.1.0. Differential gene expression test between cases and controls was performed for each cell type using the Wilcoxon ranked sum method implemented within the FindMarkers function with min.pct of 0.25 and logfc of 0.25 (Adjusted P-Value < 0.05). Gene ontology enrichment analysis for biological processes was performed using EnrichR [27]. In addition, hierarchical clustering of enriched GO terms was performed for a set of paired comparisons, including LRRK2 vs. HC and GBA1 vs. HC brains, in which differentially expressed genes showed significant enrichment (Adjusted P-Value < 0.05).

Cell-type association with genetic risk of PD

Association analysis of cell type-specific expressed genes with genetic risk of PD was performed at NIH as described previously [28], using Multi-marker Analysis of GenoMic Annotation (MAGMA) v2.0.2, in order to identify disease-relevant cell types in the data [29, 30]. MAGMA, as a gene set enrichment analysis method, tests the joint association of all SNPs in a gene with the phenotype, while accounting for LD structure between SNPs. Competitive gene set analysis was performed on SNP p-values from the latest PD GWAS summary statistics including 23andMe data and the publicly available European subset of 1000 Genomes Phase 3 was used as a reference panel to estimate LD between SNPs. SNPs were mapped to genes using NCBI GRCh37 build (annotation release 105). Gene boundaries were defined as the transcribed region of each gene. An extended window of 10 kb upstream and 1.5 kb downstream of each gene was added to the gene boundaries.

Polygenic risk scores (PRS)

Using summary statistics data from PD meta GWAS (https://pdgenetics.org/resources “GWAS META5 summary stats”), we performed a comprehensive annotation using the Region Annotation function in Annovar [31] to find all genes corresponding to the SNPs in the whole genome and then generated the GWAS gene list. We performed an overlap analysis between the GWAS annotated genes and gene lists obtained from our differential expression comparisons in ODCs and OPCs. Next, we retrieved SNPs in each of the gene lists, including MAF, beta, and p-value from the base data summary statistics. For the prediction, we used imputed genotypes of 681 cases and 549 controls from the Tuebingen cohort [32], which were not included in the base data “GWAS META5 summary stats”. To construct polygenic risk score (PRS) models, we utilized the R package PRSice2 v2.3.5 [33]. LD reference was not required as our target sample has more than 500 samples (https://choishingwan.github.io/PRSice/step_by_step/). We applied clumping procedure using r2 > 0.1 and 1000 kb as the clumping parameters in PRSice2 and a p-value of 0.05 was chosen as the threshold to exclude non-significant SNPs. In other words, lead SNPs with a p-value of 0.05 from the LD-clumped list were included in the calculation of PRS used in the regression models. The null model is a logistic regression model that measures the power of covariates including age, sex, and genetic principal components (PC1-4) in the prediction of PD status, whereas the full model adds the PRS to the null model, thereby isolating the additive influence of the PRS on risk prediction. The R2 was also adjusted for an estimated PD prevalence of 0.005 on the liability scale. To predict the clinical measures, we followed a strategy similar to our previous work [34] i.e. PRS were included in a linear regression model using the “lm” function in R and standardized beta was obtained using the “lm.beta” function.

Publicly available datasets used in this study

We used three published snRNA-seq datasets from human postmortem specimens: (1) Feleke et al., anterior cingulate cortex [9] (n = 7 per group PD vs Controls, proportion female: control = 1/7, PD = 5/7), (2) Lee et al., substantia nigra [10] (n = 6 PD vs n = 13 Controls, proportion female: control = 4/13, PD = 3/6), and (3) Kamath et al., substantia nigra [11] (n = 7 PD vs n = 8 Controls, proportion female: control = 6/8, PD = 2/7). With the exception of Kamath et al., DEGs were directly obtained from the respective studies. DEGs between cases and controls from Kamath et al., were calculated using the Wilcoxon ranked sum method as implemented in Seurat’s FindMarkers function (Seurat v4.3.0.1). DEGs were computed for each cell type subpopulation based on subpopulations defined in Kamath et al. Filtered gene expression matrices and subpopulation annotations for Kamath et al. data were downloaded from Single Cell Portal (https://singlecell.broadinstitute.org/single_cell/study/SCP1768/) and converted into Seurat objects for log-normalization and differential gene expression analysis.