Large-scale pathway specific polygenic risk and transcriptomic community network analysis identifies novel functional pathways in Parkinson disease

A Correction to this article was published on 04 May 2021

This article has been updated

Abstract

Polygenic inheritance plays a central role in Parkinson disease (PD). A priority in elucidating PD etiology lies in defining the biological basis of genetic risk. Unraveling how risk leads to disruption will yield disease-modifying therapeutic targets that may be effective. Here, we utilized a high-throughput and hypothesis-free approach to determine biological processes underlying PD using the largest currently available cohorts of genetic and gene expression data from International Parkinson’s Disease Genetics Consortium (IPDGC) and the Accelerating Medicines Partnership-Parkinson’s disease initiative (AMP-PD), among other sources. We applied large-scale gene-set specific polygenic risk score (PRS) analyses to assess the role of common variation on PD risk focusing on publicly annotated gene sets representative of curated pathways. We nominated specific molecular sub-processes underlying protein misfolding and aggregation, post-translational protein modification, immune response, membrane and intracellular trafficking, lipid and vitamin metabolism, synaptic transmission, endosomal–lysosomal dysfunction, chromatin remodeling and apoptosis mediated by caspases among the main contributors to PD etiology. We assessed the impact of rare variation on PD risk in an independent cohort of whole-genome sequencing data and found evidence for a burden of rare damaging alleles in a range of processes, including neuronal transmission-related pathways and immune response. We explored enrichment linked to expression cell specificity patterns using single-cell gene expression data and demonstrated a significant risk pattern for dopaminergic neurons, serotonergic neurons, hypothalamic GABAergic neurons, and neural progenitors. Subsequently, we created a novel way of building de novo pathways by constructing a network expression community map using transcriptomic data derived from the blood of PD patients, which revealed functional enrichment in inflammatory signaling pathways, cell death machinery related processes, and dysregulation of mitochondrial homeostasis. Our analyses highlight several specific promising pathways and genes for functional prioritization and provide a cellular context in which such work should be done.

Introduction

Although a great deal of progress in understanding the genetic underpinnings of familial and sporadic Parkinson disease (PD) has been made, the biological basis and cellular context of this risk remain unclear. We have learned that about 1–2% of PD is associated with a classical Mendelian inheritance pattern, while the majority of disease is driven by a complex set of factors in which polygenic risk seems to play a crucial role [3]. The fact that many of the genes that contain disease-causing mutations also map within risk loci identified by genome-wide association studies (GWAS), supports the notion that common pathways are involved in both forms, and therefore, these pleomorphic genes might interact to regulate downstream common targets in both monogenic and non-monogenic PD [24].

Several common molecular processes have been suggested as critical in PD pathophysiology, including lysosome mediated autophagy, mitochondrial dysfunction, endosomal protein sorting and recycling, immune response, alpha-synuclein aggregation, lipid metabolism and synaptic transmission [2]. A goal in much of this work has been to unify the proteins encoded by PD-linked genes into common pathways. For instance, some success has been seen in this regard within the autosomal recessive genes PINK1, PRKN, and DJ-1, which share a common cellular mechanism: mitochondrial quality control and regulation [12, 18]. However, despite this success, the PD genetics field is still facing the challenge of understanding how genetic risk variants may disrupt biological processes and drive the underlying pathobiology of the disease. In the current era, using genetics to understand the disease process is a key milestone to facilitate the development of targeted therapies.

A priority in elucidating PD etiology lies in defining cumulative risk. GWAS continues to expand the number of genes and loci associated with disease [17], but the majority of these contributors individually exert small effects on PD risk. Current estimates of heritability explained by GWAS loci suggest that there is still an important component of risk yet to be discovered.

Here, we present a novel high-throughput and hypothesis-free approach to detect the existence of PD genetic risk linked to any particular biological pathway. We apply polygenic risk score (PRS) to a total of 2199 curated and well-defined gene sets representative of canonical pathways publicly available in the Molecular Signature Database v7.2 (MSigDB) [26] to define the cumulative effect of pathway-specific genetic variation on PD risk. To assess the impact of rare variation on PD risk explained by significant pathways, we perform gene-set burden analyses in an independent cohort of whole-genome sequencing (WGS) data, including 2101 cases and 2230 controls.

Additionally, we explore cell-type expression specificity enrichment linked to PD etiology by using single-cell RNA sequencing data from brain cells. Furthermore, we use graph-based analyses to generate de novo pathways that could be involved in disease etiology by constructing a transcriptome map of network communities based on RNA sequencing data derived from the blood of 1612 PD patients and 1042 healthy subjects.

Subsequently, we perform summary-data-based Mendelian randomization (SMR) analyses to prioritize genes from significant gene-sets by exploring possible genomic associations with expression quantitative trait loci (eQTL) in public databases and nominate overlapping genes within our transcriptome communities for follow-up functional studies.

Finally, we present a user-friendly platform for the PD research community that enables easy and interactive access to these results (https://pdgenetics.shinyapps.io/pathwaysbrowser/).

Methods

Gene set selection representative of canonical pathways

The Molecular Signatures Database (MSigDB database v7.2) is a compilation of annotated gene sets from various sources such as online pathway databases, the biomedical literature, and manual curation by domain experts [15, 26]. We selected the collection “Canonical Pathways” composed of 2199 curated gene sets of pathways annotated from the following databases; Reactome (1499), KEGG (186), BIOCARTA (289), Pathway Interaction Database (196), Matrisome project (10), Signaling Gateway (8), Sigma Aldrich (10), SuperArray SABiosciences (1) (http://software.broadinstitute.org/gsea/msigdb).

Genotyping data: cohort characteristics, quality control procedures, and study design

To assess PD risk, summary statistics from Chang et al. [7] PD GWAS meta-analysis involving 26,035 PD cases and 403,190 controls of European ancestry were used as the reference dataset for the primary analysis to define risk allele weights. In this study, there were 7,909,453 imputed SNPs tested for association with PD with a minor allele frequency (MAF) > 0.03. Recruitment and genotyping quality control procedures were described in the original report [7]. Individual-level genotyping data not included in Chang et al. [7] and from the last GWAS meta-analysis [17] was then randomly divided as the training and testing datasets. The training dataset used to construct the PRS consisted of 7218 PD cases and 9424 controls, while the testing dataset to validate the results consisted of 5429 PD cases and 5814 controls, all of European ancestry (see Fig. 1 for analysis workflow and rationale summary). Demographic and clinical characteristics of the cohorts under study are given in Supplementary Table 1, online resource.

Fig. 1
figure1

Workflow and rationale summary

Additional details of these cohorts, along with detailed quality control (QC) methods, can be found in Nalls et al. [17]. For sample QC, in short, individuals with low call rates (< 95%), discordance between genetic and reported sex, heterozygosity outliers (F-statistic cutoff of > − 0.15 and < 0.15) and ancestry outliers (± 6 standard deviations from means of eigenvectors 1 and 2 of the 1000 Genomes phase 3 CEU and TSI populations from principal components) were excluded. Further, for genotype QC, variants with a missingness rate of > 5%, minor allele frequency < 0.05, exhibiting Hardy–Weinberg Equilibrium (HWE) < 1E−5 and palindromic SNPs were excluded. Remaining samples were imputed using the Haplotype Reference Consortium (HRC) on the University of Michigan imputation server under default settings with Eagle v2.3 phasing based on Haplotype Reference Consortium r1.1 2016 (http://www.haplotype-reference-consortium.org), and variants with an imputation quality (R2 > 0.3) were included.

Polygenic effect scores for individual biological gene-sets versus PD risk

A polygenic effect score (PES) was generated to estimate polygenic risk for each of the 2199 gene sets representative of biological pathways and then tested for association with PD. PES was calculated based on the weighted allele dose as implemented in PRSice2 (v2.1.1) (https://github.com/choishingwan/PRSice) [9]. Using the reference dataset, we selected variants with a summary statistic p value of association less than or equal to 0.05 and with MAF > 1%. We extracted these variants from the training dataset, and linkage disequilibrium (LD) clumping was performed using the default r2 = 0.1 and 250 Kb of distance. Then, 1000 permutations of sample labels were implemented to generate association p-value estimates for each gene-set. A p-value threshold = 0.05 was considered to prefilter the inclusion of variants in an effort to avoid overfitting when comparing across gene sets as well as improve computational efficiency. The permutation test in the training dataset provided a Nagelkerke’s pseudo r2 value after adjusting for an estimated prevalence of 0.005 (aged population estimate as per Gasser and colleagues), age at onset for cases and age at examination for controls, gender, and 20 PCs to account for population stratification. For those gene-sets surpassing Bonferroni multiple testing correction (p-value corrected = 0.05/2199 gene-sets = 2.27E−5), PES was then tested in an independent cohort (testing dataset) in a similar way, and overlapping gene-sets significantly associated with PD risk were reported. In an attempt to explore what biological processes were associated with PD risk after excluding known risk factors, the same analyses were performed after removing the 90 known PD GWAS hits [17] and additional SNPs located 1 Mb upstream and downstream from the signal. PES analyses considered that all the variants conferred risk under the additive model and did not cover regulatory regions adjacent to the up or downstream of the genes or intergenic variants.

Whole-genome sequencing data: cohort characteristics and quality control procedures

The following eight cohorts were utilized in this study; Biofind (https://biofind.loni.usc.edu/), NABEC [11], LNG Path confirmed, PDBP (https://pdbp.ninds.nih.gov/), NIH PD CLINIC, PPMI (https://www.ppmi-info.org/), WELLDERLY and UKBEC. Clinical and demographic characteristics of the cohorts under study are summarised in Supplementary Table 2, online resource. Participants included sporadic PD cases clinically diagnosed by experienced neurologists. PD cases met criteria defined by the UK PD Society Brain Bank. This included 2101 cases and 2230 controls. All individuals were of European descent and were not age- or gender-matched.

DNA sequencing was performed using two vendors: Macrogen and USUHS. For samples sequenced at Macrogen, one microgram of each DNA sample was fragmented by the Covaris System and further prepared according to the Illumina TruSeq DNA Sample preparation guide to obtain a final library of 300–400 bp average insert size. Libraries were multiplexed and sequenced on the Illumina HiSeq X platform. For samples sequenced by USUHS, DNA samples were processed using the Illumina TruSeq DNA PCRFree Sample Preparation kit, starting with 500 ng input and resulting in an average insert size of 310 bp. USUHS processed single-libraries on single lanes on HiSeq X flow cells, and the Macrogen protocol used multiplexing. Paired-end read sequences were processed in accordance with the pipeline standard developed by the Centers for Common Disease Genomics [5]. The GRCh38DH reference genome was used for alignment as specified in the FE standardized pipeline [31]. The Broad Institute’s implementation of this FE standardized pipeline, which incorporates the GATK [8] Best Practices is publicly available and used for WGS processing. Single-nucleotide (SNV) and InDel variants were called from the processed WGS data following the GATK [8] Best Practices [8] using the Broad Institute’s workflow for joint discovery and variant quality score recalibration (VQSR). For quality control, each sample was checked using common methods for genotypes as well and sequence-related metrics. Using Plink v1.9 [6], each sample’s genotype missingness rate (< 95%), heterozygosity rate (exceeding ± 0.15 F-stat), and gender were checked. The King v2.1.3 kinship tool (8) was used to check for the presence of duplicate samples. Sequence and alignment related metrics generated by the Broad’s implementation of the FE standardized pipeline were inspected for potential quality problems. This included the sample’s mean sequence depth (< 30×) and contamination rate (> 2%), as reported by VerifyBamID (9), and single nucleotide variant count as reported by Picard’s CollectVariantCallingMetrics (< 3 StDev) based on the sample’s genomic vcf (gvcf). Principal components (PCs) were created for each dataset using PLINK. For the PC calculation, variants were filtered for minor allele frequency (> 0.01), genotype missingness (< 0.05), and HWE (P≥1E−6), and minor allele count < 3. GCTA [33] was used to remove cryptically related at the level of first cousins or closer (sharing proportionally more than 12.5% of alleles).

Gene-set burden analyses

The sequence kernel association test-optimal (SKAT-O) [14] was implemented using default parameters in RVTESTS [35] to determine the difference in the aggregate burden of rare coding genetic variants (minor allele count ≥ 3) between PD cases and controls for the nominated gene-sets by PRS. SKAT-O was applied to aggregate genetic information across defined genomic regions to test for associations with gene-sets of interest under two frequency levels (MAF ≤ 0.03 and MAF ≤ 0.01) and three functional categories (missense, loss of function and Combined Annotation Dependent Depletion (CADD) score > 12 representing between 1 and 10% predicted most pathogenic variants in the genome). Covariates including gender, age at onset (cases), age at enrollment (controls), and 10 PCs were included to adjust the analyses. ANNOVAR was used for variant annotation [30].

Network expression community map in gene expression data

Baseline peri-diagnostic RNA sequencing data derived from the blood for 1612 PD patients and 1042 healthy subjects available from the Parkinson Progression Marker Initiative (PPMI) was used to construct a network of expression communities based on a graph model with Louvain clusters. This cleaned and normalized data was downloaded from the Accelerating Medicines Partnership for Parkinson’s disease (AMP-PD) on March 1st, 2020. Library preparation, protocol, and transcriptomic quality control procedures can be found in detail in the original source https://amp-pd.org/transcriptomics-data. Prior to analyses, all data for the baseline visit were extracted. Data for each gene was then z-transformed to a mean of zero and a standard deviation of one. Scikit-learn’s extraTreeClassifier option was used to extract coding gene features for inclusion in the network builds that are likely to contribute to classifying cases versus controls under default settings in the feature selection phase, leaving 8.3 k protein-coding genes for candidate networks [22]. Following this feature extraction phase, controls were excluded, and case-only correlations were calculated for all remaining gene features. Next, this correlation structure was converted to a graph object using NetworkX [28]. We filtered for network links at positive correlations (upregulated in cases together) between genes greater than or equal to 0.8. Subsequently, the Louvain algorithm was employed to build network communities within this graph object derived from the selected feature set [1].

Finally, pathway enrichment analysis within expression communities was performed to further dissect its biological function using the function g:GOSt from g:ProfileR [19]. The significance of each pathway was tested by hypergeometric tests with Bonferroni correction to calculate the error rate of each network.

Cell-type polygenic risk enrichment analysis

Single-cell RNA sequencing data [25] based on a total of 9970 cells obtained from several mouse brain regions (neocortex, hippocampus, hypothalamus, striatum, and midbrain) was used to explore cell types associated with PD risk. There are certainly differences between the mouse and the human brain. We used the package EWCE (v. 0.99.2) (https://github.com/NathanSkene/EWCE) to perform mouse to human homolog gene conversion. The package contains a dataset with the human orthologs of Mouse-Genomics-Informatics (MGI) mouse genes (mouse_to_human_homologs list). Out of the 14,579 mouse genes reported in Supplementary Table 4, Skene et al. [25], a total of 13,533 genes (92.82%) were converted to human HGNC symbols. Only genes with a high-confidence (1:1 mapping) were retained. As described in Skene et al. [25], a large fraction of non-matches is reasonable given evolutionary differences between humans and mice. The dataset described by Skene et al. [25], includes the specificity of expression for each gene within each cell type where values range from zero to one and represent the proportion of the total expression of a gene found in one cell type compared to all cell types. The closer the score is to 1, the more specific is the expression in that particular cell type. Taking this into account, PRS R2 (variance) was calculated within each cell type using PRSice2 (v2.1.1) as previously described in this manuscript. Cell type expression specificity levels ranging from 0 to 1 were then distributed in deciles. If a particular cell type is associated with PD risk, it is expected to observe a shift in the curve distribution with low PRS R2 in non-specific gene sets (i.e., lower deciles) and a higher PRS R2 in more specific gene sets (i.e., higher deciles). Linear regression adjusted by the number of SNPs included in the PRS was performed to assess the trend of increased PRS R2 per decile of cell-type expression specificity.

Summary-data-based Mendelian randomization quantitative trait loci analyses

Two-sample SMR was applied to explore the enrichment of cis eQTLs within the 46 gene-sets nominated by our large-scale PRS analysis. The methodology can be interpreted as an analysis to test if the effect size of genetic variants influencing PD risk is mediated by gene expression or methylation to prioritize genes underlying these gene-sets for follow-up functional studies [37]. QTL association summary statistics from well-curated expression datasets were compared to Nalls et al. [17] summary statistics after extracting the gene-set-specific independent SNPs considered as the instrumental variables. Expression datasets used for these analyses include estimates for cis-expression from the Genotype-Tissue Expression (GTEx) Consortium (v6; whole blood and 10 brain regions), the Common Mind Consortium (CMC; dorsolateral prefrontal cortex), the Religious Orders Study and Memory and Aging Project (ROSMAP), and the Brain eQTL Almanac project (Braineac; 10 brain regions). Additionally, we studied expression patterns in blood from the largest eQTL meta-analysis so far [29]. LD pruning and clumping were carried out using default SMR protocols (http://cnsgenomics.com/software/smr). Multi-SMR p-values (gene-level expression summaries for eQTLs) were adjusted by Bonferroni multiple test correction considering the number of genes tested per gene-set, and HEIDI was used to detect pleiotropic associations between the expression levels and PD risk that could be biasing the model at a p-value < 0.01 [32]. Effect estimates represent the change in PD odds ratio per one standard deviation increase in gene expression. Enrichment of cis expression was assessed per gene-set and per tissue. The number of genes tested per gene-set were Bonferroni corrected, and a Chisq test was applied to assess whether the proportion of QTLs per gene-set was significantly higher than expected by chance.

Results

Large-scale PES analysis nominates biological processes involved in PD risk

Out of the 2199 gene sets representative of biological processes included in this report, 279 gene-sets were significantly associated with PD risk in the training phase (Bonferroni threshold for significance 0.05/2,199 = 2.27E−5) (Supplementary Table 3, online resource, https://pdgenetics.shinyapps.io/pathwaysbrowser/). Following the same analysis workflow, a total of 46 gene sets were replicated in the testing phase and nominated as potentially linked to PD risk through common genetic variation (Table 1, Fig. 2a, b).

Table 1 Canonical pathways significantly associated with PD risk in the discovery and replication phases through common variation
Fig. 2
figure2figure2

Canonical pathways associated with Parkinson disease risk through common genetic variation based on PES analyses. Forest plots showing polygenic risk score estimates for the significant canonical pathways in the replication phase including (a) and removing (b) PD known risk loci ± 1Mb upstream and downstream. Estimates of variance explained by PRS for the significant canonical pathways including (c) or excluding (d) PD known risk loci ± 1Mb upstream and downstream

Supplementary Table 4, (online resource) summarizes what SNPs within the 90 risk loci located up to 1 Mb upstream and downstream from the GWAS signal were included for each of the 2199 gene-sets as part of the large-scale polygenic risk score analyses for both the training and testing phases.

After excluding the 90 PD risk loci and SNPs located 1 Mb upstream and downstream from the GWAS hits, six gene sets including adaptive immune system, innate immune system, vesicle mediated transport, signaling by G protein-coupled receptors (GPCR) ligand binding, metabolism of lipids and neutrophil degranulation remained significant, suggesting as yet unidentified risk within these gene-sets (Bonferroni threshold for significance 2.27E−5) (Table 2, Fig. 2c, d). For an easy interpretation of these findings, significant gene-sets were clustered in hierarchies according to genetic redundancy, as highlighted in Supplementary Figs. 1, 2, online resource. Additionally, considering genetic pleiotropy across the 46 gene-sets, we prioritized the top 1% of genes involved in multiple pathways as a way of nominating promising PD candidate genes (Supplementary Table 5).

Table 2 Canonical pathways significantly associated with PD risk through common variation in the discovery and replication phases after excluding PD known risk loci ± 1 Mb upstream and downstream

In an attempt to define etiological subtypes of PD, we performed Uniform Manifold Approximation and Projection for Dimension Reduction Analysis (UMAP) to explore the possibility of clustering different subgroups of patients that could be enriched for risk in certain molecular pathways. UMAP analysis showed two different clusters of patients according to the pathway-specific PES (subgroup 1 and subgroup 2; Supplementary Fig. 3a, online resource). Subgroup 1 was not enriched on any LRRK2 G2019S carriers, while all patients from subgroup 2 (N = 100) were LRRK2 G2019S carriers. When LRRK2 gene boundaries were removed from the analysis and PES were calculated per individual, no subgroups were observed. We assume that since LRRK2 G2019S is the main risk factor for PD, this variant overweights PES for those pathways in which LRRK2 plays a role in (Supplementary Fig. 3b, online resource). This would suggest that pathway-specific PES by itself is not an accurate way to define etiological subgroups of the disease since association does not involve prediction. Future multimodality studies are necessary to increase discriminative accuracy given the heterogeneous nature of PD.

Gene-set-based burden analyses identifies gene-sets involved in PD risk through rare variation

To test whether the same biological processes are enriched by rare coding variants, we implemented gene-set based SKAT-O in a large WGS cohort composed of 2101 PD cases and 2230 controls. Out of the 46 gene-sets significantly associated with PD risk through common variation, 20 were linked through low-frequency genetic variation (MAF ≤ 3%) and 19 through rare variation (MAF ≤ 1%), at a p-value < 0.05 (Table 3). At a MAF threshold ≤ 3%, 12 gene-sets remained significantly associated with PD risk when focusing only on missense mutations, 4 when considering only loss of function variants and 6 when filtering by CADD score > 12 (~ among the 1–10% most pathogenic variants in the genome) (Table 3). At a MAF threshold ≤ 1%, 12 gene-sets remained significantly associated with PD risk when focusing only on missense mutations, four when considering only loss of function variants and five when filtering by CADD score > 12 (Table 3). Considering a more stringent p-value (Bonferroni threshold for significance 0.05/46 gene-sets = 0.001), five gene sets including Alzheimer’s disease, Parkinson’s disease, Transmission across chemical synapses, Neuroactive ligand receptor interaction and GPCR ligand binding remained significant at MAF ≤ 3%. When focusing on MAF ≤ 1%, the above mentioned gene sets in addition to Aspargine-N-glycosylation were significantly associated with PD risk.

Table 3 Association of canonical pathways and PD risk through rare variation

After removing PD GWAS hits and SNPs located 1 Mb upstream and downstream, innate immune system and signaling by GPCR remained significantly associated with PD suggesting that rare variation within these gene-sets contributes to PD heritability (Supplementary Table 6, online resource).

In an effort to prioritize the top genes within significant gene-sets showing the highest cumulative effect on PD risk, individual gene-based SKAT-O analyses were performed considering a MAF threshold ≤ 3% and three functional categories (missense, loss of function and CADD score > 12). Using this approach, gene-level prioritization is highlighted in Supplementary Table 7, online resource.

Transcriptome map reveals expression modules linked to PD etiology

Using Louvain community detection, we generated transcriptomic networks among PD cases. We identified 54 de novo expression communities (Supplementary Table 8, Supplementary Fig. 4, online resource). Overall, the communities generated were relatively robust, with a modularity score of 0.523 (modularity ranges from − 1 to 1, with closer to 1 suggesting stronger connectivity between network members). The 54 network communities were found to be enriched via hypergeometric tests after Bonferroni correction for processes relating to immune system response, ribosome RNA processing to the nucleus and cytosol, cell cycle, oxidative stress, and mitochondrial impairment (Fig. 3, Supplementary Table 9, online resource).

Fig. 3
figure3

Functional enrichment analyses of transcriptomic community maps. The x-axis represents the gene set enrichment (%) based on the community map gene lists. Intersection size denotes the number of input genes within an enrichment category. Blue color indicates the adjusted association p-values on a − log10 scale. ***By chemiosmotic coupling and heat production by uncoupling proteins

Dopaminergic neurons, serotonergic neurons and neural progenitors play a role on PD etiology

We used single-cell RNA sequencing data from 24 different brain cell types [25, 34]. For each of those cell types, genes were clustered into 10 gene sets according to the level of expression specificity, ranging from 0 to 1 (0 means that a gene is not expressed at all and 1 means the gene expression is highly specific for that cell type). Then, PRS was calculated per quintile of specificity within cells. Increased PRS R2, consistent with increased cell expression specificity, was observed for embryonic dopaminergic neurons, serotonergic neurons, hypothalamic GABAergic neurons, and neural progenitors at P < 0.05 in both the training and replication phases (Supplementary Table 10, online resource).

Mendelian randomization prioritizes pathways and genes based on their functional consequence

We aimed at nominating genes within significant gene-sets contributing to PD etiology by assessing changes in expression across blood and brain. Out of the 46 gene-sets of interest, 7 showed a significant enrichment of QTLs more than expected by chance in the brain, 1 in substantia nigra and 11 in the blood (Supplementary Table 11, online resource).

SMR revealed functional genomic associations with eQTLs in 201 genes (Supplementary Table 12, online resource) of which 88 were found to be part of the network communities significantly associated with PD in our transcriptome community map (Supplementary Table 13, online resource).

Discussion

Despite success at uncovering genetic risk factors associated with PD, our understanding of the molecular processes involved in disease is still limited. Using the largest genomic and transcriptomic PD cohorts currently available, our study sought to define both cumulative genetic risk and functional consequences linked to myriad biological pathways in an unbiased and data-directed manner. To our knowledge, there are no previous reports in the PD field where a similar approach has been implemented to explore the contribution of thousands of molecular processes on both the trigger (risk) and the effect (expression changes) in a systematic manner.

Our large-scale PRS analysis identified multiple biological pathways associated with PD risk through common genetic variation. Overall, our results found that molecular processes underlying protein misfolding and aggregation, post-translational protein modification, immune response, membrane and intracellular trafficking, lipid metabolism, synaptic transmission, endosomal–lysosomal dysfunction and apoptosis mediated by initiator and executioner caspases are among the main contributors to PD etiology.

PD heritability remains incompletely deciphered by the genes and variants identified to date [17]. Here, we demonstrate that some of these significant gene-sets contribute to the heritability of PD outside of what is explained by current GWAS [17]. Notably, our genetic analyses provide definitive evidence for the role of several signal transduction mechanisms affecting adaptive and innate immune response, vesicular-mediated transport, and lipid metabolism on the risk for PD even after excluding PD known GWAS loci. The present study suggests that additional targets within these pathways are yet to be identified and prioritizes genes for follow-up functional studies.

A novel aspect of our study is that we nominate pathways whose implication on PD pathology has been poorly studied or debatable before. Our results support the hypothesis that chromatin remodeling and epigenetic mechanisms contribute to the development of PD [13]. An appropriate balance and distribution of active and repressed chromatin is required for proper transcriptional control, maintaining nuclear architecture and genomic stability, as well as regulation of the cell cycle [10]. Dysfunction in the epigenetic machinery has been shown to play a role in the etiology of a number of neurodegenerative and neurodevelopmental disorders either by genetic variation in an epigenetic gene or by changes in DNA methylation or histone modifications [13]. Similarly, our approach supports a role for vitamin metabolism on PD risk. Vitamins are crucial cofactors in the metabolism of carbohydrates, fat, and proteins, and vitamin deficiency has been widely proven to promote oxidative stress and neuroinflammation [16].

Interestingly, some of the nominated gene-sets seem to span the etiological risk spectrum in which both common and rare variation contribute to PD susceptibility. In concordance with previous studies [21], our study identified an increased collective effect of rare lysosomal related variants in PD etiology. Additionally, we found evidence for a burden of rare damaging alleles in a range of specific processes, including neuronal transmission-related pathways and immune response.

The present study represents a significant step forward in our understanding of important connections between genetic factors, functional consequences and PD etiology. We constructed a transcriptome map by clustering de novo pathways relevant to disease pathology. Functional characterization analysis of these expression communities revealed that dysregulation of the immune system and inflammatory response including neutrophil degranulation, interferon alpha beta signaling, and other cytokine-related signaling pathways are key disease processes. Strikingly, when looking at molecular mechanisms significantly associated with PD risk, a cumulative effect of rare loss of function variants was found to be linked to disease through the adaptive immune system pathway. Both inflammation and autoimmune response have been widely studied with regard to PD etiology. Previous genetic studies have identified risk loci spanning key immune-associated genes such as BST1 (bone marrow stromal cell antigen 1), a gene known to play role in neutrophil adhesion and migration, and HLA (human leukocyte antigen) [17, 23]. In support of this, it has been reported that α-synuclein-derived fragments act as antigenic epitopes displayed by HLA receptors, where both helper and cytotoxic T-cell responses are present in a high percentage of patients when tested [27].

Our analysis provides compelling evidence that dysregulation in genes that play a pivotal role in mitochondrial homeostasis exists in genetically complex PD. Despite not identifying these pathways as part of the stringent large-scale PRS analysis, our transcriptome community map showed an enrichment for the respiratory electron transport ATP synthesis by chemiosmotic coupling process and mitochondrial oxidative phosphorylation, in concordance with other reports [36]. Among the expression networks to highlight, it should be pointed out an enrichment in cell cycle and cell death machinery related processes and ribosome RNA processing to the nucleus and cytosol.

Our study aimed at pinpointing the specific drivers underlying these significant networks. Focusing on gene-sets linked to PD risk, SMR was applied to prioritize genes whose variation was found to be associated with expression changes linked with PD risk. Interestingly, we managed to replicate 88 of these genes after validating the functional consequence within our transcriptome community map.

Despite genetic efforts, it remains a matter of study in what cell types risk variants are active, which is essential for understanding etiology and experimental modeling. By integrating genetics and single-cell expression data, we found that PD risk is linked to expression specificity patterns in dopaminergic neurons, serotonergic neurons, hypothalamic GABAergic neurons, and neural progenitors, suggesting that these cell types disrupt biological networks that impact PD risk. Although our study failed at replicating specific enrichment patterns for oligodendrocytes and microglia as previously reported using other approaches [20], our results are in concordance with previous literature that applies various methodologies to gain similar conclusions [4].

The strengths of this study include an unbiased effort to link risk variants to biological pathways and characterize the functional consequence. While this study marks major progress in integrating human genetic and functional evidence, much remains to be established. A caveat of this study is that our approach was limited by the canonical gene sets publicly defined that were used for pathway analysis, and the relatively few brain regions studied for cell type analysis, which was based on mice data. We are aware that additional molecular networks and cell types from unsampled regions could contribute to PD. In addition, PRS analyses considered that all the variants conferred risk under the additive model and did not cover regulatory regions adjacent to the up or downstream of the genes or intergenic variants, which may be crucial for the disease. A further limitation of our study is that although we used state-of-the-art methodologies such as SMR to nominate candidate pathways and genes related to PD etiology, QTL datasets and associations are affected by both small sample size and low cis-SNP density. In addition, trans-QTL could not be assessed. Furthermore, our study focused on individuals of European ancestry, given that large sample sizes were required to create this resource. Replication in ancestrally diverse populations would be necessary for future studies. We also assume the limitation that gene redundancy might exist across the tested gene-sets and therefore overrepresentation of certain genes might lead to missing important gene-sets that in turn are associated with PD etiology. We anticipate that substantial collaborative efforts will lead to an improvement in statistical power and accuracy to define gene-sets linked to PD.

In conclusion, our high-throughput and hypothesis-free approach exemplifies a powerful strategy to provide valuable mechanistic insights into PD etiology and pathogenesis. We highlight several promising pathways, cell types, and genes for further functional prioritization, aware that further in-depth investigation will be required to prove a definite link. As part of this study, we created a foundational resource for the PD community that can be applied to other neurodegenerative diseases with complex genetic etiologies (https://pdgenetics.shinyapps.io/pathwaysbrowser/). In future studies, linking specific phenotypic aspects of PD to pathways will constitute a critical effort using large longitudinal cohorts of well clinically characterized PD patients, with the hope of yielding disease-modifying therapeutic targets that are effective across PD subtypes.

Change history

References

  1. 1.

    Combe D, Largeron C, Géry M, Egyed-Zsigmond E (2015) I-Louvain: an attributed graph clustering method. In: Fromont E, De Bie T, van Leeuwen M (eds) Advances in intelligent data analysis. Lecture notes in computer science, vol 9385. Springer, Cham

    Google Scholar 

  2. 2.

    Bandres-Ciga S, Diez-Fairen M, Kim JJ, Singleton AB (2020) Genetics of Parkinson’s disease: an introspection of its journey towards precision medicine. Neurobiol Dis 137:104782

    CAS  Article  Google Scholar 

  3. 3.

    Blauwendraat C, Nalls MA, Singleton AB (2020) The genetic architecture of Parkinson’s disease. Lancet Neurol 19:170–178

    CAS  Article  Google Scholar 

  4. 4.

    Bryois J, Skene NG, Hansen TF, Kogelman LJA, Watson HJ, Liu Z et al (2020) Genetic identification of cell types underlying brain complex traits yields insights into the etiology of Parkinson’s disease. Nat Genet. https://doi.org/10.1038/s41588-020-0610-9

    Article  PubMed  PubMed Central  Google Scholar 

  5. 5.

    Centers for Common Disease Genomics. In: Genome.gov. https://www.genome.gov/Funded-Programs-Projects/NHGRI-Genome-Sequencing-Program/Centers-for-Common-Disease-Genomics. Accessed 12 Nov 2019

  6. 6.

    Chang CC, Chow CC, Tellier LC, Vattikuti S, Purcell SM, Lee JJ (2015) Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience 4:7

    Article  Google Scholar 

  7. 7.

    Chang D, Nalls MA, Hallgrímsdóttir IB, Hunkapiller J, van der Brug M, Cai F et al (2017) A meta-analysis of genome-wide association studies identifies 17 new Parkinson's disease risk loci. Nat Genet 49(10):1511–1516

    CAS  Article  Google Scholar 

  8. 8.

    Auwera GAV der, Van der Auwera GA, Carneiro MO, Hartl C, Poplin R, del Angel G et al (2013) From FastQ data to high-confidence variant calls: the genome analysis toolkit best practices pipeline. Curr Protoc Bioinformat 11.10.1–11.10.33

  9. 9.

    Euesden J, Lewis CM, O’Reilly PF (2015) PRSice: polygenic risk score software. Bioinformatics 31:1466–1468

    CAS  Article  Google Scholar 

  10. 10.

    Frost B, Hemberg M, Lewis J, Feany MB (2014) Tau promotes neurodegeneration through global chromatin relaxation. Nat Neurosci 17:357–366

    CAS  Article  Google Scholar 

  11. 11.

    Gibbs JR, van der Brug MP, Hernandez DG, Traynor BJ, Nalls MA, Lai S-L et al (2010) Abundant quantitative trait loci exist for DNA methylation and gene expression in human brain. PLoS Genet 6:e1000952

    Article  Google Scholar 

  12. 12.

    Hao L-Y, Giasson BI, Bonini NM (2010) DJ-1 is critical for mitochondrial function and rescues PINK1 loss of function. Proc Natl Acad Sci USA 107:9747–9752

    CAS  Article  Google Scholar 

  13. 13.

    Hwang J-Y, Aromolaran KA, Zukin RS (2017) The emerging field of epigenetics in neurodegeneration and neuroprotection. Nat Rev Neurosci 18:347–361

    CAS  Article  Google Scholar 

  14. 14.

    Lee S, Emond MJ, Bamshad MJ, Barnes KC, Rieder MJ, Nickerson DA et al (2012) Optimal unified approach for rare-variant association testing with application to small-sample case-control whole-exome sequencing studies. Am J Hum Genet 91:224–237

    CAS  Article  Google Scholar 

  15. 15.

    Liberzon A, Subramanian A, Pinchback R, Thorvaldsdóttir H, Tamayo P, Mesirov JP (2011) Molecular signatures database (MSigDB) 3.0. Bioinformatics 27:1739–1740

    CAS  Article  Google Scholar 

  16. 16.

    Lin MT, Beal MF (2006) Mitochondrial dysfunction and oxidative stress in neurodegenerative diseases. Nature 443:787–795

    CAS  Article  Google Scholar 

  17. 17.

    Nalls MA, Blauwendraat C, Vallerga CL, Heilbron K, Bandres-Ciga S, Chang D et al (2019) Identification of novel risk loci, causal insights, and heritable risk for Parkinson’s disease: a meta-analysis of genome-wide association studies. Lancet Neurol 18:1091–1102

    CAS  Article  Google Scholar 

  18. 18.

    Narendra DP, Jin SM, Tanaka A, Suen D-F, Gautier CA, Shen J et al (2010) PINK1 is selectively stabilized on impaired mitochondria to activate Parkin. PLoS Biol 8:e1000298

    Article  Google Scholar 

  19. 19.

    Raudvere U, Kolberg L, Kuzmin I, Arak T, Adler P, Peterson H et al (2019) g:Profiler: a web server for functional enrichment analysis and conversions of gene lists (2019 update). Nucleic Acids Res 47:W191–W198

    CAS  Article  Google Scholar 

  20. 20.

    Reynolds RH, Botía J, Nalls MA, Hardy J, International Parkinson’s Disease Genomics Consortium (IPDGC), System Genomics of Parkinson’s Disease (SGPD) et al (2019) Moving beyond neurons: the role of cell type-specific gene regulation in Parkinson’s disease heritability. NPJ Parkinsons Dis 5:6

    Article  Google Scholar 

  21. 21.

    Robak LA, Jansen IE, van Rooij J, Uitterlinden AG, Kraaij R, Jankovic J et al (2017) Excessive burden of lysosomal storage disorder gene variants in Parkinson’s disease. Brain 140:3191–3203

    Article  Google Scholar 

  22. 22.

    Saleh H (2018) Machine learning fundamentals: use python and SCIKIT-learn to get up and running with the hottest developments in machine learning. Packt Publishing Ltd, Birmingham

    Google Scholar 

  23. 23.

    Satake W, Nakabayashi Y, Mizuta I, Hirota Y, Ito C, Kubo M et al (2009) Genome-wide association study identifies common variants at four loci as genetic risk factors for Parkinson’s disease. Nat Genet 41:1303–1307

    CAS  Article  Google Scholar 

  24. 24.

    Singleton A, Hardy J (2011) A generalizable hypothesis for the genetic architecture of disease: pleomorphic risk loci. Hum Mol Genet 20:R158–R162

    CAS  Article  Google Scholar 

  25. 25.

    Skene NG, Bryois J, Bakken TE, Breen G, Crowley JJ, Major Depressive Disorder Working Group of the Psychiatric Genomics Consortium et al (2018) Genetic identification of brain cell types underlying schizophrenia. Nat Genet 50:825–833

    CAS  Article  Google Scholar 

  26. 26.

    Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA et al (2005) Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci USA 102:15545–15550

    CAS  Article  Google Scholar 

  27. 27.

    Sulzer D, Alcalay RN, Garretti F, Cote L, Kanter E, Agin-Liebes J et al (2017) T cells from patients with Parkinson’s disease recognize α-synuclein peptides. Nature 546:656–661

    CAS  Article  Google Scholar 

  28. 28.

    Torrents J, Ferraro F (2015) Structural cohesion: visualization and heuristics for fast computation with NetworkX and matplotlib. In: Proceedings of the 14th python in science conference

  29. 29.

    Võsa U, Claringbould A, Westra H-J, Bonder MJ, Deelen P, Zeng B et al (2018) Unraveling the polygenic architecture of complex traits using blood eQTL metaanalysis. Genomics 228

  30. 30.

    Wang K, Li M, Hakonarson H (2010) ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res 38:e164

    Article  Google Scholar 

  31. 31.

    Website. CCDG. CCDG/Pipeline-Standardization. GitHub. https://github.com/CCDG/Pipeline-Standardization. (Accessed: 12th November 2019). Accessed 20 Nov 2019

  32. 32.

    Wu Y, Zeng J, Zhang F, Zhu Z, Qi T, Zheng Z et al (2018) Integrative analysis of omics summary data reveals putative mechanisms underlying complex traits. Nat Commun 9:918

    Article  Google Scholar 

  33. 33.

    Yang J, Hong Lee S, Goddard ME, Visscher PM (2011) GCTA: a tool for genome-wide complex trait analysis. Am J Hum Genet 88:76–82

    CAS  Article  Google Scholar 

  34. 34.

    Zeisel A, Hochgerner H, Lönnerberg P, Johnsson A, Memic F, van der Zwan J et al (2018) Molecular architecture of the mouse nervous system. Cell 174:999.e22–1014.e22

    Article  Google Scholar 

  35. 35.

    Zhan X, Hu Y, Li B, Abecasis GR, Liu DJ (2016) RVTESTS: an efficient and comprehensive tool for rare variant association analysis using sequence data. Bioinformatics 32:1423–1426

    CAS  Article  Google Scholar 

  36. 36.

    Zheng B, Liao Z, Locascio JJ, Lesniak KA, Roderick SS, Watt ML et al (2010) PGC-1α, a potential therapeutic target for early intervention in Parkinson’s disease. Sci Transl Med 2:52ra73

    Article  Google Scholar 

  37. 37.

    Zhu Z, Zhang F, Hu H, Bakshi A, Robinson MR, Powell JE et al (2016) Integration of summary data from GWAS and eQTL studies predicts complex trait gene targets. Nat Genet 48:481–487

    CAS  Article  Google Scholar 

Download references

Acknowledgements

We would like to thank all of the subjects who donated their time and biological samples to be a part of this study. This work was supported in part by the Intramural Research Programs of the National Institute of Neurological Disorders and Stroke (NINDS), the National Institute on Aging (NIA), and the National Institute of Environmental Health Sciences both part of the National Institutes of Health, Department of Health and Human Services; project numbers Z01-AG000949-02 and Z01-ES101986. In addition this work was supported by the Department of Defense (award W81XWH-09-2-0128), and The Michael J Fox Foundation for Parkinson’s Research. We would also like to thank all members of the International Parkinson Disease Genomics Consortium (IPDGC). For a complete overview of members, acknowledgements and funding, please see http://pdgenetics.org/partners. We would like to thank the Accelerating Medicines Partnership initiative (AMP-PD) for the publicly available whole genome sequencing data. This work utilized the computational resources of the NIH HPC Biowulf cluster. (http://hpc.nih.gov).

International Parkinson Disease Genomics Consortium

We thank contributors who collected samples used in this initiative, as well as patients and families, whose help and participation made this work possible. Part of the data used for the analyses described in this manuscript was obtained from dbGaP at http://www.ncbi.nlm.nih.gov/sites/entrez?db=gap through dbGaP accession phs000200, we thank members of the North American Brain Expression Consortium (NABEC) for providing DNA samples derived from brain tissue. Brain tissue for the NABEC cohort was obtained from the Baltimore Longitudinal Study on Aging at the Johns Hopkins School of Medicine, and from the NICHD Brain and Tissue Bank for Developmental Disorders at the University of Maryland, Baltimore, MD, USA. We would like to thank the United Kingdom Brain Expression Consortium (UKBEC) for providing DNA samples. This study acknowledges the National Institute of Neurological Disorders and Stroke (NINDS) supported Parkinson’s Disease Biomarkers Program Investigators (https://pdbp.ninds.nih.gov/sites/default/files/assets/ PDBP_investigator_list.pdf). A full list of PDBP investigators can be found at https://pdbp.ninds.nih.gov/policy. Data and biospecimens used in the preparation of this manuscript were obtained from the Parkinson’s Disease Biomarkers Program (PDBP) Consortium, part of the National Institute of Neurological Disorders and Stroke at the National Institutes of Health. Investigators include: Roger Albin, Roy Alcalay, Alberto Ascherio, Thomas Beach, Sarah Berman, Bradley Boeve, F. DuBois Bowman, Shu Chen, Alice Chen-Plotkin, William Dauer, Ted Dawson, Paula Desplats, Richard Dewey, Ray Dorsey, Jori Fleisher, Kirk Frey, Douglas Galasko, James Galvin, Dwight German, Lawrence Honig, Xuemei Huang, David Irwin, Kejal Kantarci, Anumantha Kanthasamy, Daniel Kaufer, James Leverenz, Carol Lippa, Irene Litvan, Oscar Lopez, Jian Ma, Lara Mangravite, Karen Marder, Laurie Ozelius, Vladislav Petyuk, Judith Potashkin, Liana Rosenthal, Rachel Saunders-Pullman, Clemens Scherzer, Michael Schwarzschild, Tanya Simuni, Andrew Singleton, David Standaert, Debby Tsuang, David Vaillancourt, David Walt, Andrew West, Cyrus Zabetian, Jing Zhang, and Wenquan Zou. The PDBP Investigators have not participated in reviewing the data analysis or content of the manuscript. This work was supported by Scripps Research Translational Institute, an NIH-NCATS Clinical and Translational Science Award (CTSA; 5 UL1 RR025774). We are grateful to the NIH NeuroBioBank for providing brain tissue samples for Parkinson’s disease cases. We are grateful to the Banner Sun Health Research Institute Brain and Body Donation Program of Sun City, Arizona, for the provision of human brain tissue (PI: Thomas G. Beach, MD). The Brain and Body Donation Program is supported by the National Institute of Neurological Disorders and Stroke (U24 NS072026 National Brain and Tissue Resource for Parkinson’s Disease and Related Disorders), the National Institute on Aging (P30 AG19610 Arizona Alzheimer’s Disease Core Center), the Arizona Department of Health Services (contract 211002, Arizona Alzheimer’s Research Center), the Arizona Biomedical Research Commission (contracts 4001, 0011, 05-901 and 1001 to the Arizona Parkinson’s Disease Consortium) and the Michael J. Fox Foundation for Parkinson’s Research.

Funding

This research was supported in part by the Intramural Research Program of the National Institutes of Health (National Institute on Aging, National Institute of Neurological Disorders and Stroke; project numbers: project numbers 1ZIA-NS003154, Z01-AG000949-02 and Z01-ES101986). In addition, this work was supported by the Department of Defense (award W81XWH-09-2-0128), and The Michael J Fox Foundation for Parkinson’s Research. C.R.S. was supported in part by NIH grants U01NS095736, U01NS100603, R01AG057331, and R01NS115144 and by the MJFF. Mike A. Nalls’ participation is supported by a consulting contract between Data Tecnica International and the National Institute on Aging, NIH, Bethesda, MD, USA, as a possible conflict of interest, Dr. Nalls also consults for Neuron 23 s Inc, Lysosomal Therapeutics Inc, and Illumina Inc among others. C.R.S. is named as co-inventor on a US patent application on sphingolipids biomarkers that is jointly held by Brigham & Women’s Hospital and Sanofi. C.R.S has consulted for Sanofi Inc.; has collaborated with Pfizer, Opko, and Proteome Sciences, and Genzyme Inc.

Author information

Affiliations

Authors

Consortia

Contributions

(1) Research project: A. Conception (SBC, SSA, MN, CD, AS), B. Organization (SBC, SSA, MN), C. Execution (SBC, SSA, JK, MM, MN). Data generation: A. Experimental (RG, JD, CB, MN). (3) Statistical Analysis: A. Design (SBC, SSA, MN, FF), B. Execution (SBC, SSA, MM, MN, FF, JJ). (4) Manuscript Preparation: A. Writing of the first draft (SBC), B. Review and Critique (all authors).

Corresponding author

Correspondence to A. B. Singleton.

Ethics declarations

Conflict of interest

No other disclosures were reported.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

We included the member of International Parkinson Disease Genomics Consortium in Acknowledgement section.

The original online version of this article was revised due to a Retrospective Open Access.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (PDF 8915 kb)

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and Permissions

About this article

Verify currency and authenticity via CrossMark

Cite this article

Bandres-Ciga, S., Saez-Atienzar, S., Kim, J.J. et al. Large-scale pathway specific polygenic risk and transcriptomic community network analysis identifies novel functional pathways in Parkinson disease. Acta Neuropathol 140, 341–358 (2020). https://doi.org/10.1007/s00401-020-02181-3

Download citation

Keywords

  • Parkinson disease
  • Polygenic risk
  • Transcriptome community maps
  • Mendelian randomization