Introduction

Neurodegenerative diseases promise to exert an increasingly onerous toll on society’s aging population in the coming years [1]. However, despite decades of research and hundreds of unique animal models, no therapy has yet emerged to overcome the insidious loss of neurons in neurodegenerative diseases such as Alzheimer’s disease (AD), Parkinson’s disease (PD), Huntington’s disease (HD), amyotrophic lateral sclerosis (ALS), or frontotemporal lobar dementia (FTLD). The diversity of identified contributors to neurodegeneration, including vascular pathology [2], excitotoxicity [3], oxidative stress [4], prior traumatic brain injury [5], environmental exposures [6], and genetic mutations in mitochondrial [7], RNA processing [8], proteasomal and autophagy-related genes [9], points to a multiple-hit hypothesis of neurodegeneration. In contrast to deterministic animal models wherein neurodegeneration may be induced through a single mutation and cured with a single compound [10], most human neurodegeneration occurs sporadically and may therefore reflect the cumulative effects of numerous low penetrance risk factors and stressors.

Despite the challenges posed in identifying the individual causes of neurodegeneration, common themes of mitochondrial dysfunction, protein aggregation, oxidative stress and neuroinflammation have emerged in most neurodegenerative diseases [11],[12]. Increasing numbers of transcriptome studies have addressed individual neurodegenerative diseases, including those focused on understanding regional susceptibility [13], disease progression [14], cell type-specific signals [15], and disease-specific meta-analysis [16]. However, these studies are usually limited by relatively small sample sizes and significant heterogeneity between experiments, particularly in the tissue sampled, expression analysis platform, sample procurement method, and background of the investigated patient populations. Availability of large amounts of expression profiling data in public repositories such as the NCBI Gene Expression Omnibus (GEO) and EBI ArrayExpress presents novel opportunities to carry out an integrated multi-cohort analysis of diseases, and such data has been used to identify common transcriptional signatures in cancer and infections [17],[18]. There are approximately 40 publicly available gene expression microarray studies that profiled brain tissue in neurodegenerative diseases. Collectively these studies better represent the heterogeneity of neurodegeneration observed in the real world as different research groups carried out these experiments independently using different tissue samples and microarray technologies. However, this inherent heterogeneity in public data also presents challenges in terms of how to integrate these independent studies cohesively into a single analysis. We recently proposed a meta-analysis approach that leverages the heterogeneity across different data sets to identify robust, reproducible disease gene signatures. We have successfully used this meta-analysis approach to reveal novel insights into lung cancer [19] and to predict FDA-approved drugs that can be repurposed to treat organ transplant patients [20].

No systematic multi-cohort analysis has yet evaluated transcriptional alterations that are conserved across neurodegenerative diseases. We applied our meta-analysis approach to analyze publicly available gene expression datasets of post-mortem central nervous system (CNS) tissue for AD, HD, PD, and ALS. We hypothesized that such an analysis would identify the transcriptional alterations that define neurodegeneration, regardless of the specific neurodegenerative disease. Our results identified a conserved signature of neurodegeneration, applicable even to variants of FTLD, which were not included in the original meta-analysis. We analyzed this signature with respect to normal aging brain gene expression data, cell type specificity, microglial polarization and gliosis, revealing novel insights into the neurodegenerative process. Finally, we identified patterns of gene dysregulation unique to each neurodegenerative disease relative to the others.

Materials and methods

All analyses were completed in R/Bioconductor unless otherwise noted. Heat maps were generated using the R package pheatmap [21]. The analysis workflow is shown in Figure 1.

Figure 1

Meta-analysis workflow schematic. See Materials and Methods for details. GSEA, Gene Set Enrichment Analysis; TF, transcription factor.

Data collection and pre-processing

We searched the public data repository ArrayExpress (search date: March 15, 2014) for gene expression microarray data sets from neurodegenerative disease experiments using the search terms “neurodegeneration”, “dementia”, “Alzheimer”, “Parkinson”, “Huntington”, “amyotrophic lateral sclerosis”, “frontotemporal”, “motor neurone disease”, “spinocerebellar ataxia”, “spinal muscular atrophy” and “prion”. We first identified data sets that satisfied the following criteria: (1) samples were from human post-mortem CNS tissue, (2) the data were originally acquired using a genome-wide gene expression microarray platform, (3) the microarray platform had reasonably accessible and clear probe-to-gene mapping annotations and (4) there were ≥5 cases and ≥5 controls total for the relevant patient cohort in each data set. We identified a total of 28 patient cohorts containing 1475 samples from 19 independent data sets that satisfied these criteria. Note that some data sets included more than one disease: we refer to each disease-specific group and its respective control group as a patient cohort.

Next, we divided these patient cohorts into two groups based on their sample sizes. We chose smaller patient cohorts (<100 samples) for the initial meta-analysis (Discovery cohorts), and we reserved larger patient cohorts (>100 samples) for validation analysis (Validation cohorts). For the discovery cohorts, we ensured that there were at least two patient cohorts for each disease that met our selection criteria. We then identified up to three different CNS regions affected by the disease process at the transcriptional level in each specific neurodegenerative disease, as identified in the included studies [16],[22]-[37] (e.g. AD pathology involves the entorhinal cortex, hippocampus and frontal cortex; see Table 1 for all CNS regions used). We separated the patient cohorts by these CNS regions, and we selected the largest independent cohort by sample size for each region. Thus, in the discovery cohorts, we used two to three different disease-affected brain regions for each disease. If a patient cohort contained samples from multiple CNS regions in the same individuals, we used that data set only once, selecting samples from a single CNS region. This approach ensured that every sample in the analysis came from a different individual. When possible, for data sets including multiple CNS regions, we took advantage of the opportunity to use disease-affected regions not represented in other data sets to ensure regional generalizability. Based on these criteria, we chose 10 patient cohorts containing 285 samples to include in the discovery cohorts.

Table 1 Summary of public gene expression data sets used in the discovery, validation, and secondary validation data set meta-analyses

For the validation cohorts, we used three patient cohorts containing 985 samples from AD and HD patients. We performed a secondary validation meta-analysis on the 15 remaining patient cohorts (205 samples) that met our inclusion criteria, which included several smaller studies of PD and AD as well as five variants of FTLD. The GEO accession numbers of the data sets used in our analysis are summarized in Table 1 [16],[22]-[37].

Because the available data sets use many different microarray platforms, we downloaded the processed gene expression data for each data set, most of which were associated with peer-reviewed publications analyzing the data. Phenotypic data for each sample were also extracted where available. We log2 transformed and quantile normalized gene expression signal intensities within each data set, if not already processed as such.
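The within-data-set pre-processing described above can be sketched as follows. This is a minimal illustration and not the code used in the study: the function names and the toy intensity matrix are hypothetical, and ties are broken by sort order rather than averaged, which is a simplification relative to established Bioconductor routines.

```python
import math

def log2_transform(matrix):
    """Log2-transform a genes x samples matrix of positive intensities."""
    return [[math.log2(v) for v in row] for row in matrix]

def quantile_normalize(matrix):
    """Give every sample (column) the same empirical distribution.

    Each value is replaced by the mean, across samples, of the values that
    hold the same rank. Ties are broken by sort order (a simplification).
    """
    n_genes = len(matrix)
    n_samples = len(matrix[0])
    columns = [[matrix[g][s] for g in range(n_genes)] for s in range(n_samples)]
    # Mean of the k-th smallest value across all samples.
    sorted_cols = [sorted(col) for col in columns]
    rank_means = [sum(col[k] for col in sorted_cols) / n_samples
                  for k in range(n_genes)]
    normalized = [[0.0] * n_samples for _ in range(n_genes)]
    for s, col in enumerate(columns):
        order = sorted(range(n_genes), key=lambda g: col[g])
        for rank, g in enumerate(order):
            normalized[g][s] = rank_means[rank]
    return normalized

# Hypothetical raw intensities: 3 genes (rows) x 3 samples (columns).
raw = [[2.0, 4.0, 8.0],
       [16.0, 8.0, 4.0],
       [4.0, 32.0, 16.0]]
normalized = quantile_normalize(log2_transform(raw))
```

After normalization, every column contains the same set of values and only their assignment to genes differs between samples.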

Gene expression meta-analysis

We conducted gene expression meta-analysis by combining effect sizes (standardized mean differences) as previously described in detail [20]. This approach determines a meta-effect size for each gene, which estimates the change in gene expression across all data sets given a common two-class comparison (i.e. disease vs. control). Microarray probes from each data set were mapped onto HUGO gene symbols. If a probe matched more than one gene, an additional record was added for each mapped gene. The effect size for each gene in each data set was estimated as Hedges’ adjusted g. If multiple probes matched to a gene, that gene’s effect size was summarized using the fixed effects inverse-variance model. Study-specific effect sizes were then combined to determine the pooled effect size and its standard error using the random effects inverse-variance technique. Nominal p-values were determined by comparing a Z-statistic (ratio of the pooled effect size to its standard error for each gene) to a standard normal distribution. The p-values were corrected for multiple hypothesis testing using Benjamini-Hochberg false discovery rate (FDR) correction [39]. In the discovery meta-analysis, genes were deemed to be significantly differentially expressed if FDR ≤ 5% and the gene was measured in all 10 patient cohorts.
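The per-gene calculation described above can be sketched as follows. This is an illustration under stated assumptions, not the published implementation: the function names and toy values are hypothetical, the variance of Hedges’ g uses a common large-sample approximation, and the between-study variance is estimated with the DerSimonian-Laird method, which the text does not specify.

```python
import math
from statistics import NormalDist, mean, variance

def hedges_g(cases, controls):
    """Return Hedges' adjusted g and its approximate variance for one gene."""
    n1, n0 = len(cases), len(controls)
    pooled_sd = math.sqrt(((n1 - 1) * variance(cases) +
                           (n0 - 1) * variance(controls)) / (n1 + n0 - 2))
    d = (mean(cases) - mean(controls)) / pooled_sd
    j = 1 - 3 / (4 * (n1 + n0) - 9)           # small-sample correction factor
    var_d = (n1 + n0) / (n1 * n0) + d ** 2 / (2 * (n1 + n0))
    return j * d, j ** 2 * var_d

def summarize_probes(effects, variances):
    """Fixed-effects inverse-variance summary of several probes for one gene."""
    w = [1 / v for v in variances]
    return sum(wi * e for wi, e in zip(w, effects)) / sum(w), 1 / sum(w)

def pool_random_effects(effects, variances):
    """Random-effects pooling across studies (DerSimonian-Laird, an assumption)."""
    w = [1 / v for v in variances]
    fixed = sum(wi * e for wi, e in zip(w, effects)) / sum(w)
    q = sum(wi * (e - fixed) ** 2 for wi, e in zip(w, effects))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(effects) - 1)) / c) if c > 0 else 0.0
    w_star = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * e for wi, e in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    z = pooled / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))    # two-sided, standard normal
    return pooled, se, p

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):              # from largest p to smallest
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical log2 intensities for one gene in two studies (cases, controls).
studies = [
    ([7.1, 7.4, 6.9, 7.8, 7.3], [6.2, 6.5, 6.0, 6.4, 6.6]),
    ([8.0, 8.3, 7.7, 8.1, 8.4], [7.5, 7.2, 7.6, 7.1, 7.4]),
]
per_study = [hedges_g(cases, controls) for cases, controls in studies]
pooled, se, p = pool_random_effects([g for g, _ in per_study],
                                    [v for _, v in per_study])
```

In a full analysis, `pool_random_effects` would be applied to every gene and the resulting nominal p-values passed together to `benjamini_hochberg`.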

Leave-one-disease-out analysis

To ensure that our meta-analysis was not influenced by or biased towards a specific neurodegenerative disease, we repeated our meta-analysis four times by removing data sets corresponding to one disease at a time (e.g. in the first iteration, HD data sets were removed, and the meta-analysis was completed on the combined AD, PD, and ALS data sets). At each iteration, we identified significantly differentially expressed genes (FDR ≤ 5%). Genes that were significant, irrespective of which subset of neurodegenerative diseases was analyzed, formed the pre-validation common neurodegeneration module (CNM). We have previously shown the utility of the leave-one-disease-out approach in identifying a robust gene expression signature during acute rejection across different transplanted solid organs [20].
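The leave-one-disease-out logic amounts to intersecting the significant gene sets from each iteration. The sketch below illustrates this with hypothetical names and data; `toy_meta_analysis` is only a stand-in that averages effects and assigns a made-up FDR, and is not the meta-analysis procedure itself. Requiring a consistent direction across iterations is an assumption of this sketch.

```python
def leave_one_disease_out(cohorts, meta_analysis, fdr_cutoff=0.05):
    """Keep genes significant, in the same direction, whichever disease is removed.

    cohorts: list of dicts with a "disease" key.
    meta_analysis: callable mapping a list of cohorts to {gene: (effect, fdr)}.
    """
    diseases = sorted({c["disease"] for c in cohorts})
    surviving = None
    for held_out in diseases:
        subset = [c for c in cohorts if c["disease"] != held_out]
        result = meta_analysis(subset)
        significant = {(gene, effect > 0)
                       for gene, (effect, fdr) in result.items()
                       if fdr <= fdr_cutoff}
        surviving = significant if surviving is None else surviving & significant
    return {gene: ("up" if is_up else "down")
            for gene, is_up in (surviving or set())}

def toy_meta_analysis(subset):
    """Stand-in only: mean effect per gene with a fabricated FDR from its size."""
    genes = set.intersection(*(set(c["effects"]) for c in subset))
    out = {}
    for gene in genes:
        avg = sum(c["effects"][gene] for c in subset) / len(subset)
        out[gene] = (avg, 0.01 if abs(avg) >= 0.5 else 0.5)
    return out

# Hypothetical per-cohort effect sizes; GENE_B is driven by HD alone.
cohorts = [
    {"disease": "AD", "effects": {"GENE_A": 1.0, "GENE_B": 0.1, "GENE_C": -0.8}},
    {"disease": "PD", "effects": {"GENE_A": 0.9, "GENE_B": 0.1, "GENE_C": -0.7}},
    {"disease": "HD", "effects": {"GENE_A": 1.1, "GENE_B": 3.0, "GENE_C": -0.9}},
    {"disease": "ALS", "effects": {"GENE_A": 1.0, "GENE_B": 0.1, "GENE_C": -0.8}},
]
module = leave_one_disease_out(cohorts, toy_meta_analysis)
```

In this toy example the gene driven by a single disease drops out once that disease is held out, which is the behavior the procedure is designed to produce.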

Validation analyses

We first validated each of the genes in the CNM in the three large patient cohorts, including two AD data sets [30],[38] and an HD data set (not yet published) (Table 1). We used the meta-analysis approach described above to identify significantly differentially expressed genes across the three validation data sets (FDR ≤ 5%). Genes that were significantly differentially expressed in the same direction in both the discovery and validation analyses were considered validated. We removed the genes from the CNM that were not validated in the independent cohorts.
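The validation rule can be expressed as a simple filter. The following sketch uses hypothetical gene names and values and assumes that the discovery and validation meta-analyses have already been run.

```python
def validate_module(discovery, validation, fdr_cutoff=0.05):
    """Keep module genes that are significant in the same direction on validation.

    discovery: {gene: pooled_effect} for genes in the pre-validation module.
    validation: {gene: (pooled_effect, fdr)} from the validation meta-analysis.
    """
    validated = {}
    for gene, disc_effect in discovery.items():
        if gene not in validation:
            continue  # gene not assessed in the validation cohorts
        val_effect, val_fdr = validation[gene]
        same_direction = (disc_effect > 0) == (val_effect > 0)
        if val_fdr <= fdr_cutoff and same_direction:
            validated[gene] = disc_effect
    return validated

# Hypothetical results: G2 flips direction and G3 is not significant.
discovery = {"G1": 0.9, "G2": -0.7, "G3": 0.6}
validation = {"G1": (0.5, 0.01), "G2": (0.4, 0.02), "G3": (0.3, 0.20)}
validated = validate_module(discovery, validation)
```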

We further validated the CNM in 15 additional patient cohorts containing 205 samples from 10 studies of PD, AD, and variants of frontotemporal dementia (Table 1).

Analysis of the CNM association with histologic disease severity

Some of the data sets used provided neuropathological annotations. In three data sets from the discovery cohorts, Braak stage or Huntington grade information was available for each sample [24],[28],[29]. To assess the CNM’s association with histologic disease severity, we calculated the geometric mean of the gene expression intensity for the up-regulated and down-regulated components of the CNM separately within each sample. The geometric mean of the CNM was centered and standardized across all samples in a given experiment, giving a z-score. We also calculated the difference between the up-regulated CNM z-score and the down-regulated CNM z-score in each sample. The Jonckheere trend test was used to evaluate the significance of trends (two-tailed test). We generated bar plots using the R package ggplot2 [40].
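The per-sample CNM score can be sketched as follows. The gene names and intensities are hypothetical, standardization here uses the sample standard deviation (an assumption, since the text does not specify it), and the trend test itself is not shown.

```python
from statistics import geometric_mean, mean, stdev

def module_z_scores(expression, genes):
    """Standardized geometric mean of a gene set within each sample.

    expression: {gene: [positive intensity per sample]}, one data set.
    genes: module genes; those absent from the data set are skipped.
    """
    present = [g for g in genes if g in expression]
    n_samples = len(expression[present[0]])
    gm = [geometric_mean([expression[g][s] for g in present])
          for s in range(n_samples)]
    mu, sd = mean(gm), stdev(gm)
    return [(x - mu) / sd for x in gm]

# Hypothetical intensities for four samples ordered by increasing severity.
expression = {
    "UP1": [4.0, 8.0, 16.0, 32.0],
    "UP2": [4.0, 8.0, 16.0, 32.0],
    "DOWN1": [32.0, 16.0, 8.0, 4.0],
    "DOWN2": [32.0, 16.0, 8.0, 4.0],
}
up_z = module_z_scores(expression, ["UP1", "UP2"])
down_z = module_z_scores(expression, ["DOWN1", "DOWN2"])
# Per-sample difference between the up- and down-regulated z-scores.
difference = [u - d for u, d in zip(up_z, down_z)]
```

In this toy data set the difference score rises monotonically with severity, which is the pattern a trend test would then evaluate.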

Gene ontology, pathway, and network analysis

We used Gene Set Enrichment Analysis (GSEA) [41] to identify the enrichment of pre-established gene sets across neurodegenerative diseases. We used the GSEA PreRank option to input the complete list of genes with their corresponding meta-effect sizes from the discovery meta-analysis, regardless of significance. This approach allowed us to first assess pathway enrichment without arbitrary thresholds for significance. We used the curated gene sets for Gene Ontology (GO) terms from the Broad Institute’s Molecular Signature Database (MSigDB). We set the false discovery rate q-value ≤ 0.05 as the threshold for significance. We constructed networks of overlapping significantly enriched gene sets using the EnrichmentMap plugin in the Cytoscape software [42],[43].

The MetaCore software suite (Thomson Reuters) was used to functionally analyze the CNM and generate gene networks. We set the background gene list in MetaCore to all of the genes assessed in all 10 discovery cohorts. We conducted enrichment analysis of the CNM for MetaCore’s curated pathways. We then generated a network from CNM genes using only the direct interactions between network objects. We generated additional networks using the default “analyze network” algorithm in MetaCore (50 nodes per sub-network).

The MetaCore “Interactions by Protein Functions” tool was used to identify proteins that are functionally over-connected with proteins corresponding to genes in the CNM. We opted to include protein complexes in this analysis.

Correlation with normal aging

We investigated the correlation of each gene in the CNM with normal brain aging. We searched the EBI ArrayExpress for aging CNS microarray data sets. We identified three independent normal aging human CNS data sets (221 samples) from various tissues that had a minimum of 30 samples per experiment covering a broad age range (Table 2) [44]-[46]. For a data set that used samples from multiple CNS areas, we only used samples from the hippocampus because it has been reported to vary the least based on gender [44]. For each CNM gene in each data set, we determined Kendall’s tau coefficient between log2 transformed gene signal intensity and age using the “Kendall” R package [47]. In this package, when ties are present in the data, a normal approximation with continuity correction is made. If more than one probe existed for each gene, the geometric mean of the signal intensity of the multiple probes was used. Genes that were positively (negatively) correlated with a p-value ≤ 0.05 in ≥2 out of 3 of the normal aging CNS data sets were deemed to be significantly positively (negatively) correlated with aging.
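The aging-correlation rule can be sketched as follows. This is a simplified illustration and not the “Kendall” R package: it always uses the tie-corrected normal approximation with continuity correction (the package may compute exact p-values when no ties are present), and the function names are hypothetical.

```python
import math
from collections import Counter
from statistics import NormalDist

def kendall_tau(x, y):
    """Kendall's tau-b and a two-sided normal-approximation p-value.

    Assumes at least three observations and that neither variable is constant.
    """
    n = len(x)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            prod = (x[j] - x[i]) * (y[j] - y[i])
            s += (prod > 0) - (prod < 0)      # concordant minus discordant
    tx = list(Counter(x).values())            # tie group sizes in x
    ty = list(Counter(y).values())            # tie group sizes in y
    n0 = n * (n - 1) / 2
    n1 = sum(t * (t - 1) / 2 for t in tx)
    n2 = sum(u * (u - 1) / 2 for u in ty)
    tau = s / math.sqrt((n0 - n1) * (n0 - n2))
    var = (n * (n - 1) * (2 * n + 5)
           - sum(t * (t - 1) * (2 * t + 5) for t in tx)
           - sum(u * (u - 1) * (2 * u + 5) for u in ty)) / 18
    var += (sum(t * (t - 1) * (t - 2) for t in tx)
            * sum(u * (u - 1) * (u - 2) for u in ty)) / (9 * n * (n - 1) * (n - 2))
    var += (sum(t * (t - 1) for t in tx)
            * sum(u * (u - 1) for u in ty)) / (2 * n * (n - 1))
    if s == 0:
        return tau, 1.0
    z = (s - math.copysign(1, s)) / math.sqrt(var)   # continuity correction
    return tau, 2 * (1 - NormalDist().cdf(abs(z)))

def classify_aging_gene(results, alpha=0.05, min_datasets=2):
    """Apply the >=2-of-3 rule to per-data-set (tau, p) pairs for one gene."""
    pos = sum(1 for tau, p in results if tau > 0 and p <= alpha)
    neg = sum(1 for tau, p in results if tau < 0 and p <= alpha)
    if pos >= min_datasets:
        return "positive"
    if neg >= min_datasets:
        return "negative"
    return "not correlated"
```

For each CNM gene, `kendall_tau` would be called once per aging data set with sample ages and log2 intensities, and the three results passed to `classify_aging_gene`.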

Table 2 Summary of public gene expression data from normal aging human brain studies used in analysis

We used the Database for Annotation, Visualization and Integrated Discovery (DAVID) [48] tool to assess the enrichment for GO terms in the aging correlated and non-correlated components of the CNM.

Assessment of cell type specificity in differentially expressed genes

To evaluate whether the CNM may reflect changes in cell-type composition, we assessed the overlap between our module and genes enriched in isolated neurons, astrocytes, oligodendrocytes [49],[50], microglia or peripheral macrophages [51],[52] from normal mice, as well as genes enriched in astrocytes isolated from mice following stroke or LPS treatment [53]. While these data sets were derived from multiple experiments and could not be compared directly, they all used the same platform, the Affymetrix Mouse Genome 430 2.0 Array. This permitted us to use the Gene Expression Commons (GEXC) tool [52] to evaluate gene expression activity in these cell type specific data sets relative to 11,939 public gene array data sets, which allows for the classification of genes as “active” or “inactive”. We used the “Expression Pattern Search” function in GEXC to identify genes from the CNM that were “active” in the cell type of interest and “inactive” in all others. We then performed manual visualization of differential gene expression in GEXC to confirm and finalize the assigned cell-type enrichment category for each gene in the CNM. For genes only modestly differentially expressed between neurons, oligodendrocytes and astrocytes, we deferred to the directly compared cell type-specific gene lists from [49]. Genes not enriched in a single given cell type based on these criteria were regarded as not being cell-type specific.

Assessment of enrichment for microglia polarization states and gliosis

We created custom gene sets from published gene lists generated from transcriptome analyses of human M1 and M2-polarized macrophages [54], microglia from the end stage of a mouse model of ALS [55], and astrocytes from mice 24 hours following treatment with lipopolysaccharide (LPS) or middle cerebral artery occlusion (MCAO) [53]. For the mouse data, we downloaded the gene lists and converted mouse gene symbols to human HUGO gene symbols prior to inputting the custom gene sets into GSEA [41]. We used the GSEA PreRank option to assess the enrichment for these custom gene sets in the complete list of discovery meta-analysis genes with their corresponding meta-effect sizes, regardless of significance.

Identification of enriched transcription factors using ENCODE data

We used the ENCODE ChIP-Seq Significance Tool [56] to identify transcription factors enriched in the up-regulated and down-regulated components of the CNM. We used the following parameters: organism, human (hg19); regulatory element type, protein-coding genes; ID type, symbol; background regions, a list of all genes assessed across all 10 discovery data sets; analysis window center, TSS/5’ end (transcription start site); upstream and downstream window size relative to TSS, 500 bp; and cell lines, all. We repeated this analysis for genes positively and negatively correlated with aging and for published gene lists where indicated.

Identification of unique disease-specific patterns of gene expression changes

To identify patterns in gene expression changes that are unique to each of the neurodegenerative diseases evaluated, for each specific disease, we repeated the aforementioned gene expression meta-analysis on the disease by itself, as well as separately on the other three diseases together. We then removed genes from the individual disease meta-analysis output gene list that were significantly differentially expressed in the three-disease meta-analysis (FDR ≤ 5%), thereby omitting common differentially expressed genes in neurodegeneration from disease-specific gene lists. We restricted the gene lists to genes that were measured in all 10 data sets. The resulting individual disease meta-analysis gene list was then input into GSEA PreRank for assessment of GO term enrichment as described earlier.
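The filtering step that produces each disease-specific ranked list can be sketched as follows. The names and values are hypothetical, and the output format (a list ranked by effect size, as would be supplied to a pre-ranked enrichment tool) is an assumption of this sketch.

```python
def disease_specific_ranking(single_disease, other_three, measured_in_all,
                             fdr_cutoff=0.05):
    """Rank genes for one disease after removing shared signals.

    single_disease, other_three: {gene: (pooled_effect, fdr)}.
    measured_in_all: set of genes measured in every discovery data set.
    """
    # Genes significant in the three-disease analysis are treated as shared.
    shared = {gene for gene, (_, fdr) in other_three.items()
              if fdr <= fdr_cutoff}
    kept = [(gene, effect) for gene, (effect, _) in single_disease.items()
            if gene not in shared and gene in measured_in_all]
    return sorted(kept, key=lambda item: item[1], reverse=True)

# Hypothetical results: G1 is shared and G4 was not measured everywhere.
single = {"G1": (1.2, 0.001), "G2": (-0.9, 0.01),
          "G3": (0.4, 0.2), "G4": (0.7, 0.03)}
others = {"G1": (1.0, 0.001), "G2": (-0.1, 0.6),
          "G3": (0.1, 0.9), "G4": (0.2, 0.4)}
ranking = disease_specific_ranking(single, others, {"G1", "G2", "G3"})
```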

Results

Meta-analysis identifies a common gene signature of neurodegeneration

For our discovery meta-analysis of neurodegenerative diseases, we collected microarray data sets containing 10 independent patient cohorts that profiled human post-mortem CNS tissues in 285 samples (150 cases, 135 controls) (Table 1, Additional file 1: Table S1) [22]-[29]. These samples were obtained from various cortical regions, hippocampus, basal ganglia, and spinal cord in four neurodegenerative diseases (AD, PD, HD, and ALS). These experiments used seven different gene expression microarray platforms. As some data sets do not provide raw data and optimal microarray pre-processing techniques differ across platforms, we downloaded processed signal intensities, and checked that all data were log2 transformed and quantile normalized across all samples in the specific experiment. If not, we log2 transformed and quantile normalized the data. We used disease phenotypes as defined in the original publications for disease versus control tissue comparisons. The included studies also generally showed an effort to age-match cases and controls.

To identify the most robust and consistently differentially expressed genes across neurodegenerative diseases, we used a gene expression meta-analysis approach [20]. Briefly, this approach combines the effect sizes, calculated as Hedges’ adjusted g, for each gene from each data set to estimate a standardized mean difference in gene expression (see Materials and Methods for details). For a gene to be considered differentially expressed in the meta-analysis, we required it to be measured in all 10 patient cohorts and for its effect size to have a significant false discovery rate (FDR ≤ 5%). This analysis yielded lists of 3,078 and 3,565 significantly up-regulated and down-regulated genes, respectively (Additional file 1: Table S2).

However, because of the heterogeneity in effect sizes, it is possible that some of the genes may be differentially expressed in one or more neurodegenerative diseases, but not all. Because our goal is to identify a set of common genes that are differentially expressed in the same direction across all neurodegenerative diseases, we carried out “leave-one-disease-out” analysis. In this analysis, we repeated the meta-analysis four additional times, each time removing patient cohorts corresponding to one disease prior to analysis of the remaining patient cohorts for the other three diseases. Genes that remained significantly differentially expressed (FDR ≤ 5%) in all four iterations of the “leave-one-disease-out” analysis were considered to represent the common genes dysregulated across neurodegenerative diseases. We identified 322 such consistently differentially expressed genes (95 up-regulated, 227 down-regulated) irrespective of which subset of neurodegenerative diseases was analyzed (Additional file 1: Table S3). It is possible that there may still be significant heterogeneity in the effect sizes of these genes between different neurodegenerative diseases; however, this heterogeneity may indicate that the same pathway is expressed at different levels between neurodegenerative diseases, but in the same direction. Therefore, we did not further consider this heterogeneity for the 322 genes.

Next, we validated these genes in three additional patient cohorts of neurodegenerative disease consisting of 985 samples (643 cases, 342 controls) (Table 1, Additional file 1: Table S1) [30],[38]. These data sets profiled human post-mortem CNS tissue samples from patients with AD or HD. Large validation data sets were not publicly available for PD and ALS. We found 73/95 (76.8%) up-regulated genes and 170/227 (74.9%) down-regulated genes (total of 243 genes) were also significantly differentially expressed in the validation cohorts (Figure 2 and Additional file 2: Figure S1). Henceforth, these 243 validated genes are referred to as a common neurodegeneration module (CNM) (Table 3 and Additional file 1: Table S4).

Figure 2

Meta-analysis and leave-one-disease-out analysis reveal common differentially expressed genes across neurodegenerative diseases. Heat map shows consistent differential expression in the discovery, validation, and secondary validation data sets. Columns denote CNM genes ranked from highest to lowest standardized mean difference (Hedges’ g in log2 scale), from left to right. Rows denote data sets used in each stage of meta-analysis. Heat map colors indicate Hedges’ g in log2 scale. Refer to Table 1 for data set information. ALS, amyotrophic lateral sclerosis; HD, Huntington’s disease; PD, Parkinson’s disease; AD, Alzheimer’s disease; PiD, classical Pick’s disease; FTLD, frontotemporal lobar dementia (Constantinidis type C); PSP, progressive supranuclear palsy; FTLD-GRNpos, frontotemporal lobar dementia with ubiquitin- and TDP-43-positive inclusions, progranulin mutation positive; FTLD-GRNneg, frontotemporal lobar dementia with ubiquitin- and TDP-43-positive inclusions, progranulin mutation negative.

Table 3 Common neurodegeneration module (CNM) genes

Finally, we extended our analysis to include data sets and neurodegenerative diseases that were not part of the discovery or validation cohorts in order to test the generalizability of the CNM. This secondary validation included 205 samples from 15 patient cohorts from 10 independent experiments, including PD, AD, and five variants of FTLD (Table 1, Additional file 1: Table S1). Restricting our analysis to the 243 CNM genes, 42/72 (65.3%) and 156/170 (91.7%) of the up- and down-regulated CNM genes were differentially expressed in this secondary validation meta-analysis (because some data sets did not assess all 243 CNM genes, we only required the gene to be assessed in half the experiments to be included in the analysis, but one of the 73 up-regulated CNM genes did not meet this criterion). Since these data sets were small and inherently noisy, we did not further alter our CNM gene list based on the results of this secondary validation meta-analysis. Nevertheless, visual inspection of a heat map of the CNM genes (Figure 2 and Additional file 2: Figure S2) shows that the CNM pattern of expression is generally highly consistent between the discovery, validation and secondary validation meta-analyses, further supporting the generalizability of the CNM to neurodegenerative diseases.

Two PD data sets (GSE19587 and GSE20146) in the secondary validation analysis did not show the CNM pattern of expression. In GSE19587, tissue was sampled from the dorsal motor nucleus of the vagus nerve [34], which showed uniquely decreased cerebral blood volume in PD on MRI relative to other brainstem regions. The impact of vascular perfusion on gene expression in neurodegeneration requires further evaluation. GSE20146 used samples from the globus pallidus interna [57], a region not typically associated with the neurodegenerative aspect of PD.

In addition, we assessed the association of the CNM with histologic disease severity in individual patient samples. Three of the discovery cohorts categorized patients based upon histologic criteria of disease severity, including “HD grade” [24] and AD Braak stage [28],[29]. It should be noted that disease severity was not considered during meta-analysis, and every sample was classified as either “control” or “case.” We calculated the geometric mean of up-regulated and down-regulated CNM genes separately for each sample, as well as the difference. We found that the geometric mean of the up-regulated CNM genes increases with disease severity, while that of the down-regulated CNM genes decreases with disease severity (Figure 3). Furthermore, the difference in each sample between the geometric mean of the up-regulated CNM genes and that of the down-regulated CNM genes increases with disease severity. This trend was statistically significant (two-sided p < 0.05, Jonckheere’s trend test) in five of the six cases where the up-regulated and down-regulated components were analyzed separately and in all three cases when the difference was analyzed. In summary, the CNM represents a shared core signature of neurodegeneration and is associated with disease severity.

Figure 3

CNM significantly associates with histologic disease severity. Boxplots of the CNM z-score (standardized geometric mean of the up-regulated or down-regulated CNM) and the difference between the up- and down-regulated z-scores for samples in each disease neuropathology category in three independent data sets (GSE3790, GSE29378, GSE36980). Blue dots correspond to individual samples. The up-regulated CNM trends upward with increasing disease severity, while the down-regulated CNM trends downwards with increasing disease severity. The difference between the z-scores increases with disease progression. Jonckheere's trend test shows significant association (two-tailed p ≤ 0.05) in 8 out of 9 plots (left to right, HD-GSE3790: p = 0.00356, p = 0.00003, p = 0.00022; AD-GSE29378: p = 0.00758, p = 0.04707, p = 0.01460; AD-GSE36980: p = 0.07324, p = 0.00874, p = 0.01815). HD, Huntington’s disease; AD, Alzheimer’s disease.

Meta-analysis highlights common mechanisms of neurodegenerative diseases

We hypothesized that the results of our meta-analysis would enable us to identify conserved pathways dysregulated across neurodegenerative diseases. We used Gene Set Enrichment Analysis (GSEA) [41] to evaluate the enrichment of Gene Ontology (GO) terms in the complete ranked gene list from the discovery meta-analysis, prior to “leave-one-disease-out” analysis and validation analysis (Figure 1). Using the GSEA PreRank option and the discovery meta-analysis gene list, we found that 10 and 48 GO terms were significantly enriched (FDR ≤ 5%) in neurodegeneration and normal control tissue, respectively (Additional file 1: Table S5). We further generated networks connecting overlapping GO gene sets to aid in the interpretation of these results (Figure 4A and Additional file 2: Figure S3). We found that gene sets enriched in neurodegeneration relative to control tissue formed clusters relating to NFκB signaling, immune response and cytokine binding, whereas gene sets enriched in control tissue relative to neurodegeneration included clusters relating to mitochondrial and oxidative metabolism, cation channel activity, synaptic transmission, protein channel regulation, and nucleotide metabolism. Although not clustered together, the proteasome complex and ubiquitin cycle, which are both related to protein degradation, were both enriched in control tissue. Collectively, these findings are consistent with established literature regarding common pathways in neurodegenerative diseases, including chronic neuroinflammation, oxidative stress, mitochondrial dysfunction, altered synaptic transmission, and disrupted protein degradation [12].

Figure 4

Network and pathway analyses reveal common pathways and hubs in neurodegeneration. (A) EnrichmentMap [42] network for overlapping enriched Gene Ontology gene sets identified by GSEA. Each node represents a significantly enriched gene set (FDR q-value ≤ 0.05), and more significant nodes are proportionally larger. Red nodes denote gene sets enriched in neurodegenerative disease tissue, while blue nodes denote those enriched in control tissue. Green lines appear between any gene sets with > 50% overlap, and are proportionally thicker given greater overlap. See Additional file 2: Figure S3 for full annotations of nodes. (B and C) MetaCore analyses generated using all 243 CNM genes as input. (B) Network generated using only direct interactions between CNM genes. Smaller red and blue circles denote up-regulated and down-regulated genes respectively. Refer to MetaCore website for detailed network symbol legend. (C) MetaCore “Interactions by Protein Function” analysis identifying proteins, not necessarily within the CNM or differentially expressed at all, that are highly functionally connected with proteins corresponding to genes in the CNM. Z-score, standardized connectivity ratio (higher ratios denote greater connectivity).

Network analyses reveal shared pathways and hub genes in neurodegeneration

To gain insight into the functional characteristics of the CNM specifically, we used MetaCore, an integrated functional analysis tool based on a manually curated database of published molecular biology data. Enrichment analysis in MetaCore for disease biomarkers, process networks, and pathway maps largely reiterated what we found in the Gene Ontology analysis, with the additional identification of gene sets related to altered cell adhesion, cytoskeletal changes, and endocrine signaling (Additional file 1: Table S6).

We then used MetaCore to generate a network, which was restricted to direct interactions between the protein products of input genes to conservatively avoid potentially spurious interactions. From the 243 CNM genes, a network of 43 directly connected proteins was identified (Figure 4B) centered on the hub gene CEBPB. CEBPB (CCAAT-enhancer binding protein beta), which is up-regulated in the CNM (Additional file 2: Figure S4), is a transcription factor known to be involved in regulating inflammatory responses. It has recently been shown to be up-regulated in AD and ALS microglia [58],[59]. CEBPB is also enriched in astrocytes, relative to neurons and oligodendrocytes [49], and has been found to be up-regulated in reactive astrocytes responding to stroke or LPS [53]. Additional hub proteins included CDK5, CALM1 (part of the calmodulin protein complex), and BCL6. While aberrant CDK5 and calmodulin activity have been associated with neurodegenerative diseases through tau hyperphosphorylation and calcium signaling respectively [57],[60], BCL6 has not been previously associated with neurodegeneration. However, BCL6 does play a role in inflammatory signaling in macrophages [61]. Three secreted proteins also appear in this network: CSF1, CCL2, and Substance P—proteins all associated with inflammatory signaling [62]-[64]. In summary, the CNM direct interaction network reveals a common network of inflammation-related protein interactions underlying neurodegenerative disease.

Removing the direct interaction restriction from the MetaCore network-building algorithm, we built additional networks using the default “analyze network” algorithm, which generates a comprehensive network of interactions based on CNM genes prior to fragmenting it into smaller, more manageable sub-networks. This analysis yielded 28 sub-networks. The top sub-network (p = 3.41 × 10^-20) contained 19 CNM genes, and was centered on the SP1 transcription factor (Additional file 2: Figure S5A). Although SP1 is not a CNM gene, it was significantly up-regulated in the discovery meta-analysis list (FDR = 0.001) (Additional file 2: Figure S4). Moreover, it is elevated in AD [65], responds to oxidative stress [66], regulates expression of APP and tau [67], and is a proposed hub gene common to both AD and PD pathogenesis [12]. Other hub proteins in this network include GRP78, NFKBIA, ATM, and YB-1. The network is also enriched for Gene Ontology terms related to apoptosis and contains a MetaCore canonical pathway pertaining to heat shock protein and proteasome signaling that includes the proteins Parkin and Huntingtin (Additional file 2: Figure S5A). The second sub-network (p = 4.37 × 10^-12) is centered on NR3C1 (GCR-alpha), the glucocorticoid receptor (Additional file 2: Figure S5B). Although not differentially expressed in our meta-analysis (Additional file 2: Figure S4), GCR-alpha signaling has been implicated in neuroinflammation, particularly in relation to stress [68]. The third sub-network (p = 3.40 × 10^-12) is centered on c-Myc (Additional file 2: Figure S5C), and appears to be related to ephrin signaling, which is implicated in aberrant synaptic function [69]. Our MetaCore network analysis identified additional common core networks of genes dysregulated across neurodegenerative diseases (Additional file 1: Table S7).

Novel common neurodegenerative hub proteins

Next, we used the MetaCore “Interactions by Protein Function” analysis tool to identify proteins, not necessarily within the CNM or differentially expressed at all, that are highly functionally connected with proteins corresponding to genes in the CNM. This analysis allows for the identification of hub proteins that may not be dysregulated at the gene expression level, but are influencing the CNM, possibly through altered protein translation, post-translational modification or molecular interactions. We identified 24 candidate hub proteins (Figure 4C and Additional file 1: Table S8). Among these hub proteins are many that have a well-established role in neurodegeneration. SOD1, SNCA, and APP are central to current hypotheses around ALS, PD, and AD pathogenesis respectively [70]. As such, the hub proteins identified here may represent different disease pathologies that converge on the CNM. In addition, the top three most highly connected genes NLGN1, GPHN, and DLG4, as well as PPP1R9B, are all associated with synaptic function [71]. Not surprisingly, many other identified hub proteins have known associations with aspects of neurodegeneration. NR3C1, the glucocorticoid receptor, is associated with elevated stress signaling in neurodegeneration. IKBKG is a part of the NFκB cascade, which is associated with neuroinflammation [72]. Ubiquitin is central in protein degradation [73]. c-Myc and dysregulated cell cycling are associated with AD [74]. 14-3-3 beta/alpha is associated with Creutzfeldt-Jakob disease. CASK and calmodulin are associated with dysregulated calcium signaling in neurodegeneration [60]. Chromogranin A is a pro-inflammatory peptide implicated in AD and ALS [75].

In addition to providing further evidence supporting the role of these proteins in neurodegenerative processes, our analysis identified 9 hub proteins (gene symbols in parentheses if different from protein) that have not previously been implicated in neurodegenerative disease. C2orf18 (SLC35F6) is a protein localized to mitochondria that is involved in apoptosis [76]. HSP20 (HSPB6) is a heat shock protein that may be involved in excitotoxicity [77]. PLAP-like (ALPPL2), a germ cell alkaline phosphatase, is aberrantly expressed in seminoma [78]. CLIP170 (CLIP1) regulates microtubule dynamics [79]. STAU2 is a hub gene involved in neuronal RNA transport [80] and is also down-regulated in the CNM (FDR = 7.83 × 10^-5). NUDEL (NDEL1) is a neurodevelopment protein involved in assembly, transport and neuronal integrity [81]. EPB41, also known as protein 4.1R, is a part of the red cell membrane cytoskeletal network, but has been implicated in post-synaptic molecule organization [82]. ERR3 (ESRRG) is a nuclear estrogen receptor-related protein highly expressed in the brain [83]. MaxiK alpha subunit (KCNMA1) is associated with synaptic transmission [84]. These novel hub proteins may serve as candidate genes for further investigation into disease mechanisms and the development of novel therapies for neurodegenerative diseases.

Characterizing the association of CNM genes with normal aging

Aging is an important risk factor for neurodegenerative diseases and is associated with altered microglial activity [85], synaptic plasticity [86] and a component of “normal” cognitive decline [87]. However, normal healthy aging does not involve the severe progressive loss of function observed across neurodegenerative diseases. It is known that aging is associated with increased inflammation and oxidative stress [85], but the healthy brain has adaptive strategies to maintain “normal” function in spite of the normal stresses of aging. Relentless, progressive destruction is observed only in neurodegenerative diseases. Therefore, we hypothesized that the CNM genes that are down-regulated in neurodegeneration, but not in aging, may be particularly critical to maintenance of the “healthy aging” process. Conversely, the CNM genes up-regulated specifically in neurodegeneration, but not in normal aging, may be specific drivers of progressive neurodegeneration and could be biomarkers of the degenerative process.

We identified and analyzed three independent post-mortem human microarray data sets investigating the normal aging cortex from age 20 to 106 years (Table 2). We removed samples from patients younger than 20 years of age to avoid developmental changes in gene expression. These data sets included 221 independent samples from the hippocampus, frontal cortex, and dorsolateral prefrontal cortex, all areas associated with changes in aging [44]-[46]. Heat map visualization of these data sets suggests that some genes in the CNM are correlated with aging, and these changes may be the largest in the hippocampus (Additional file 2: Figure S6).

For each gene in the CNM, we determined the non-parametric Kendall rank correlation coefficient (τ) between the gene’s log2 transformed signal intensity and age, and we calculated the two-tailed p-value for each coefficient. Genes that were significantly correlated with age (p ≤ 0.05) in the same direction in at least two out of the three independent data sets were considered correlated with age. Genes that did not meet this criterion were considered to be unchanged in aging. Among all genes evaluated in the discovery meta-analysis, 545 were positively correlated with aging, while 499 were negatively correlated with aging (Additional file 1: Table S9). Of these, we identified 126 genes that were down in the CNM and unchanged in aging (i.e., candidate genes required to prevent neurodegeneration), 48 genes that were up in the CNM and unchanged in aging (candidate neurodegeneration biomarkers), 25 genes that were up in both the CNM and aging, and 44 genes that were down in both the CNM and aging (Table 3; Additional file 2: Figure S6; Additional file 1: Tables S10 and S11). No genes were detected that were up in the CNM and down in aging or down in the CNM and up in aging. The overlap between CNM genes and genes correlated with aging was highly significant (p < 2.2 × 10^-16, Fisher’s exact test).
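As an illustration only, the two-out-of-three rule described above can be sketched in Python. The sketch assumes that a Kendall τ and its two-tailed p-value have already been computed for one gene in each aging data set (for example with `scipy.stats.kendalltau`, which is an assumption about tooling rather than something stated in the text). The function names and the example values are hypothetical.

```python
from typing import List, Tuple

ALPHA = 0.05  # significance threshold given in the text (p <= 0.05)


def aging_direction(results: List[Tuple[float, float]], min_sets: int = 2) -> str:
    """Classify one gene from per-data-set (tau, p_value) pairs.

    A gene counts as correlated with age only if it is significant in the
    same direction in at least `min_sets` data sets; otherwise it is
    treated as unchanged in aging.
    """
    up = sum(1 for tau, p in results if p <= ALPHA and tau > 0)
    down = sum(1 for tau, p in results if p <= ALPHA and tau < 0)
    if up >= min_sets:
        return "up"
    if down >= min_sets:
        return "down"
    return "unchanged"


def cnm_aging_group(cnm_direction: str, results: List[Tuple[float, float]]) -> str:
    """Combine the CNM direction ('up' or 'down') with the aging call."""
    return f"{cnm_direction} in CNM, {aging_direction(results)} in aging"


# Hypothetical (tau, p) values for one gene in three aging data sets.
example = [(-0.25, 0.01), (0.02, 0.80), (-0.10, 0.20)]
print(cnm_aging_group("down", example))
```

A gene that is significant in opposite directions in two data sets falls into the "unchanged" group under this rule, because neither direction reaches the two-data-set minimum.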

We used the DAVID [48] bioinformatics tool to assess the functional enrichment of these aging-related subgroups of CNM genes with GO terms. None of the groups were significantly enriched for any terms (Benjamini-Hochberg corrected p-value ≤ 0.05), except for the GO cellular component term “mitochondrion” for the CNM genes that were down-regulated in neurodegeneration and unchanged in aging, which suggests that impaired mitochondrial function might be the most consistent specific feature of neurodegeneration, when compared to aging. This finding is consistent with mitochondria-related gene sets being the most significantly dysregulated process in our GSEA (Figure 4A and Additional file 1: Table S4).
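For readers unfamiliar with the Benjamini-Hochberg correction mentioned above, the following is a minimal Python sketch of the standard step-up adjustment. It is not DAVID's own implementation, and the input p-values are hypothetical.

```python
from typing import List


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values never
    # increase as the raw p-value decreases (monotonicity step).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four GO terms.
raw = [0.01, 0.04, 0.03, 0.005]
significant = [q <= 0.05 for q in benjamini_hochberg(raw)]
print(significant)
```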

Assessment of CNS tissue composition and cell type-specific changes

Because neurodegenerative diseases involve the loss of neurons, the proportions of cell types in a CNS tissue sample may change. As such, genes up-regulated in the CNM may reflect increased glial cell density, while genes that are down-regulated in the CNM may reflect decreased neuronal density. To test this hypothesis, we determined whether or not each gene in the CNM demonstrated a cell type-specific expression pattern based on public data sets for purified cell types, including neurons, astrocytes, reactive astrocytes, oligodendrocytes, and microglia/macrophages [49],[50],[52],[53] (Additional file 1: Table S12). We used the Gene Expression Commons to allow for comparison of gene expression between these data sets (see Materials and Methods for details) [52]. Although some astrocyte-associated genes were present in both the up- and down-regulated components of the CNM, the up-regulated component consisted predominantly of genes enriched in reactive astrocytes, monocytes, or both, groups that were largely absent from the down-regulated portion of the CNM. Conversely, 70 of the 170 down-regulated CNM genes were enriched in neurons, whereas no neuron-specific genes were present in the up-regulated portion of the CNM (Figure 5A). These results suggest that the decreased expression of the neuron-specific CNM genes could be due in part either to a reduction in neuronal cell density in neurodegenerative disease or to decreased expression within neurons without any change in neuronal cell density.

Figure 5

Cell type and activation state analysis of CNM genes. (A) Cell type-specificity of the CNM genes. CNM genes were categorized by cell type enrichment based on analysis of public data (see Materials and Methods). The distribution of genes in the categories is shown for CNM genes up-regulated (left) and down-regulated (right). (B) Assessment of microglia/monocyte activation and reactive astrocyte states in the discovery meta-analysis gene list. GSEA for custom gene sets for glial cell polarization states from isolated cell transcriptome analyses in the literature. Positive normalized enrichment scores indicate enrichment in neurodegeneration, while negative scores indicate enrichment in control tissue. All enrichments are significant (p < 0.005), except for ALS microglia (down). ALS (up or down), amyotrophic lateral sclerosis, up-regulated or down-regulated genes (mouse model); LPS, lipopolysaccharide treated mouse, up-regulated genes; MCAO, middle cerebral artery occlusion mouse, up-regulated genes.

Furthermore, we found that approximately half of the neuron-enriched genes have never previously been associated with neurodegeneration, despite in many cases having a variety of potentially important roles, including in neural development (Additional file 1: Table S12).

Enrichment for activated microglial and reactive astrocyte states

Microglial activation, monocyte infiltration and gliosis are common features of neurodegenerative disease. To date, no human transcriptome data are available for microglia or reactive astrocytes from neurodegenerative disease. In order to gain insights into the transcriptional contributions of microglia/macrophages and gliosis to neurodegenerative disease, we performed GSEA [41] using custom gene sets for various defined populations of microglia/monocytes and astrocytes (Figure 5B) [53]-[55]. We observed significant enrichment in the complete discovery meta-analysis ranked gene list for genes up-regulated in mouse astrocytes responding to stroke or LPS, relative to controls (p < 0.001), as well as for genes up-regulated in human M1 polarized, relative to M2 polarized macrophages (p < 0.001). We also observed enrichment for genes up-regulated in microglia isolated from a mouse SOD1 model of ALS (p < 0.001). Of note, the discovery meta-analysis gene list was depleted (i.e. normal controls were enriched) of genes differentially expressed in human M2 macrophages (p = 0.005).
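To make the direction of these enrichments concrete, the following Python sketch computes a simplified, unweighted running-sum enrichment score for a gene set against a ranked gene list. It is a teaching simplification and not the GSEA software used here: GSEA additionally weights hits by the ranking metric, normalizes the score, and estimates significance by permutation. The gene names are hypothetical, and the ranked list is assumed to contain each gene once, ordered from most up-regulated to most down-regulated.

```python
from typing import Iterable, List


def enrichment_score(ranked_genes: List[str], gene_set: Iterable[str]) -> float:
    """Signed maximum deviation of an unweighted running sum.

    Positive values mean the set clusters toward the top of the list
    (up-regulated end); negative values mean it clusters toward the bottom.
    """
    members = set(gene_set) & set(ranked_genes)
    n_hit = len(members)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the ranked list only partially")
    running = 0.0
    best = 0.0
    for gene in ranked_genes:
        # Step up on a set member, step down otherwise; the steps are
        # scaled so that the running sum returns to zero at the end.
        running += 1.0 / n_hit if gene in members else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


# Hypothetical ranked list, most up-regulated first.
ranked = ["g1", "g2", "g3", "g4", "g5", "g6"]
print(enrichment_score(ranked, {"g1", "g2"}))  # set concentrated at the top
print(enrichment_score(ranked, {"g5", "g6"}))  # set concentrated at the bottom
```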

Transcription factors associated with the CNM versus normal aging

Next, we carried out enrichment analysis of transcription factor (TF) targets using the ENCODE ChIP-Seq Significance Tool [56], which integrates data from hundreds of public ChIP-Seq data sets, to evaluate potential transcriptional regulators of the differentially expressed genes in the CNM. The 170 down-regulated CNM genes were significantly enriched (q-value < 0.05) for targets of six transcription factors: REST, RBBP5, YY1, SIN3A, ZNF143 and SP2. The 73 up-regulated genes were significantly enriched (q-value < 0.05) for targets of three transcription factors: IKZF1, STAT3 (in cells exposed to ethanol or tamoxifen), and FOS (in cells exposed to tamoxifen) (Figure 6A and 6B). As a significant portion of the CNM genes have expression levels correlated with normal aging, we repeated this analysis on the 545 and 449 genes positively and negatively correlated with aging, respectively. Unlike the CNM, no transcription factors were predicted for the genes negatively correlated with aging; however, the genes positively correlated with aging yielded an almost identical set of predicted transcription factors as the up-regulated component of the CNM, only including POLR2A instead of IKZF1. These findings suggest that genes up-regulated in both aging and neurodegeneration may share similar regulatory mechanisms, while those genes down-regulated in the CNM may be transcriptionally regulated in a manner unique to neurodegeneration.
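The kind of over-representation test that underlies this sort of transcription factor target analysis can be sketched as follows. The sketch assumes a one-sided hypergeometric test, which is a common choice for such tools but is not specified in the text above; the function name and all counts in the example are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) for X ~ Hypergeometric.

    N: genes in the background
    K: background genes bound by the transcription factor
    n: genes in the query list
    k: query genes bound by the transcription factor
    """
    total = comb(N, n)
    upper = min(n, K)
    # Sum the probability of observing k, k+1, ..., upper bound targets.
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total


# Hypothetical example: 170 query genes, 40 of them bound by a factor that
# binds 1,500 of 11,564 background genes.
p_value = hypergeom_upper_tail(k=40, n=170, K=1500, N=11564)
print(p_value)
```

The resulting p-values, one per transcription factor, would then be corrected for multiple testing to give q-values of the kind reported above.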

Figure 6

ENCODE ChIP-seq significance analysis identifies transcription factors upstream of CNM genes. (A) Heat map shows genes in CNM bound by transcription factors across discovery, validation, and secondary validation analyses. Heat map colors correspond to log2 standardized mean difference (Hedges’ g). Up and down-regulated CNM were analyzed separately. (B) Bar plot shows –log(q-value) for predicted transcription factors. All shown are significant (q < 0.05). Refer to Table 1 for data set information. ALS, amyotrophic lateral sclerosis; HD, Huntington’s disease; PD, Parkinson’s disease; AD, Alzheimer’s disease; PiD, classical Pick’s disease; FTLD, frontotemporal lobar dementia (Constantinidis type C); PSP, progressive supranuclear palsy; FTLD-GRNpos, frontotemporal lobar dementia with ubiquitin- and TDP-43-positive inclusions, progranulin mutation positive; FTLD-GRNneg, frontotemporal lobar dementia with ubiquitin- and TDP-43-positive inclusions, progranulin mutation negative.

Of the six transcription factors predicted to be upstream of the down-regulated CNM, REST and YY1 have previously been implicated in neurodegeneration [88]-[92]. REST is a master regulator of neuronal genes, whose protein abundance increases with stress and aging, but decreases with AD, frontotemporal dementia and dementia with Lewy bodies [88]. YY1 is a ubiquitous transcription factor previously noted to regulate several genes associated with neurodegenerative diseases including BACE1 and APP [89], SNCA [90], EAAT2 [91], MTOR and PPARGC1A [92].

Assessment of disease-specific changes

As many elements of differential gene expression are shared across neurodegenerative diseases, we hypothesized that by removing elements common to other neurodegenerative diseases from a disease-specific gene signature and then functionally analyzing the remaining genes, we would be able to gain insights into the unique pathogenic mechanisms underlying each individual disease. Thus, for each disease, we used the meta-analysis approach to generate a rank ordered list of up- and down-regulated genes relative to controls. Examining where the CNM genes fall in this ordered list of genes for each individual disease validates that the CNM genes are similarly dysregulated in each neurodegenerative disease (Figure 7A). We then utilized the “leave-one-disease-out” meta-analyses previously generated (Figure 1), comprising the other 3 diseases (e.g. meta-analysis on HD alone vs. meta-analysis on AD, PD, and ALS together). These two analyses yielded ranked gene lists. We then removed significantly differentially expressed genes (FDR ≤ 0.05) identified from the 3-disease analysis gene list from the complete 1-disease analysis ranked gene list. We removed 1524, 780, 786 and 1019 genes from ALS, HD, PD, and AD-specific meta-analysis gene lists respectively. The shortened disease-specific gene lists represent genes that are expressed more strongly in a specific neurodegenerative disease (Figure 7B). As these genes represent disease-specific pathways, we then input these lists into GSEA PreRank for GO term enrichment analysis, as described earlier.
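The subtraction step described above can be sketched in a few lines of Python. This is an illustration under stated assumptions: the data structures, function name, and gene names are hypothetical, and only the FDR ≤ 0.05 cutoff is taken from the text.

```python
from typing import Dict, List, Tuple


def disease_specific_list(
    one_disease_ranked: List[Tuple[str, float]],
    other_diseases_fdr: Dict[str, float],
    fdr_cutoff: float = 0.05,
) -> List[Tuple[str, float]]:
    """Drop genes that are significant in the other-diseases meta-analysis.

    one_disease_ranked: (gene, effect size) pairs, already rank ordered.
    other_diseases_fdr: gene -> FDR from the 3-disease meta-analysis.
    The original rank order of the remaining genes is preserved.
    """
    shared = {g for g, fdr in other_diseases_fdr.items() if fdr <= fdr_cutoff}
    return [(g, es) for g, es in one_disease_ranked if g not in shared]


# Hypothetical single-disease ranking and 3-disease FDR values.
ranked = [("geneA", 1.8), ("geneB", 0.9), ("geneC", -0.4), ("geneD", -1.5)]
other_fdr = {"geneA": 0.001, "geneB": 0.30, "geneC": 0.05, "geneD": 0.70}
print(disease_specific_list(ranked, other_fdr))
```

The shortened list keeps the full ranking of the remaining genes, so it can be passed directly to a pre-ranked enrichment analysis.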

Figure 7

Disease-specific meta-analysis. (A) Distribution of the 243 CNM genes among individual disease meta-analysis gene lists. Each line represents the presence of a CNM gene among the 11,564 genes generated from disease-specific meta-analysis, ranked from most positive standardized mean difference (left) to most negative standardized mean difference (right). (B) Disease-specific meta-analysis, after removing genes differentially expressed across the other three diseases, identifies genes more strongly expressed in a single disease. Top 10 up-regulated and top 10 down-regulated genes shown. ALS, amyotrophic lateral sclerosis; HD, Huntington’s disease; PD, Parkinson’s disease; AD, Alzheimer’s disease.

This analysis demonstrated significant unique up-regulation of immune and inflammatory genes in ALS specifically, including genes in the JAK-STAT cascade, suggesting additional inflammation over and above that shared by other neurodegenerative diseases. JAK-STAT genes have been found to be enriched in an independent gene expression analysis of ALS [93]. ALS and PD demonstrated down-regulation of additional proteasomal gene sets, and ALS showed down-regulation of genes involved in chromatin assembly, suggestive of potentially unique epigenetic alterations. Notably, each of the 4 diseases, even after subtraction of genes significant in the other 3 diseases, revealed persistently significant down-regulation of mitochondria-related genes, and all but ALS additionally revealed down-regulation of genes related to synaptic transmission, changes that were particularly prominent in AD (Additional file 1: Table S13).

Discussion

Although each neurodegenerative disease has been studied in detail individually, no integrated analysis has previously determined what genes and pathways are consistently conserved across all neurodegenerative diseases (Figure 8A). In total, in this study we examined 31 separate patient cohorts consisting of 1,696 independent patient samples collected using various microarray platforms from diverse institutions in different countries to identify a robust and reproducible signature. Key findings from our analysis include: (1) a common signature of neurodegeneration that correlates with histologic disease severity and (2) identification of novel candidate convergent networks, hub proteins, and transcription factors for neurodegenerative diseases. We further analyzed expression of the CNM genes in normal aging brain to identify CNM genes that are altered in both aging and neurodegeneration, versus those altered in neurodegeneration alone. We also analyzed expression of the CNM genes enriched in specific cell types to better understand whether the changes in expression are due to changes in the number of specific functional transcripts or due to reduction in neuronal density. Our results identify down-regulation of genes important for neuronal maintenance and synaptic transmission, but relative preservation of most constitutively expressed neuronal genes. Finally, we performed disease-specific meta-analysis relative to common signatures of neurodegeneration. Although there are diverse genetic and environmental causes of different neurodegenerative disease processes, our results show that the CNM represents the most reproducibly convergent pathways. Furthermore, our unbiased approach validates that neurodegeneration commonly involves elements beyond neuroinflammation. As such, our data provide a valuable resource for interpreting disease mechanisms, connecting findings from one neurodegenerative disease to another, and driving novel hypotheses.

Figure 8

Conserved elements of neurodegeneration. (A) Schematic diagram of conserved elements of neurodegeneration. Select CNM genes of interest are shown in black text. Predicted hub genes and upstream transcription factors shown in green text. (B) Forest plots for highlighted candidate novel genes of interest. Forest plot x-axes show standardized mean difference (Hedges’ g in log2 scale) for genes in multiple data sets. Blue box sizes are inversely proportional to the SEM difference of the gene in each data set. Whiskers denote 95% confidence interval. Yellow diamonds represent combined mean difference for each gene. Yellow diamond width denotes 95% confidence interval.
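For orientation, the quantities shown in such forest plots can be sketched in Python: Hedges' g for one gene in one data set, and an inverse-variance pooled estimate across data sets. This is a simplified fixed-effect sketch using the standard textbook formulas, not necessarily the combination model used in this study, and the expression values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance
from typing import List, Sequence, Tuple


def hedges_g(cases: Sequence[float], controls: Sequence[float]) -> Tuple[float, float]:
    """Return (g, variance of g) for one gene in one data set."""
    n1, n0 = len(cases), len(controls)
    pooled_sd = sqrt(
        ((n1 - 1) * variance(cases) + (n0 - 1) * variance(controls)) / (n1 + n0 - 2)
    )
    d = (mean(cases) - mean(controls)) / pooled_sd
    # Small-sample correction factor that turns Cohen's d into Hedges' g.
    j = 1.0 - 3.0 / (4.0 * (n1 + n0) - 9.0)
    g = j * d
    var_g = j ** 2 * ((n1 + n0) / (n1 * n0) + d ** 2 / (2.0 * (n1 + n0)))
    return g, var_g


def pooled_effect(effects: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Inverse-variance weighted mean of (g, variance) pairs and its standard error."""
    weights = [1.0 / v for _, v in effects]
    g = sum(w * e for w, (e, _) in zip(weights, effects)) / sum(weights)
    return g, sqrt(1.0 / sum(weights))


# Hypothetical log2 expression values for one gene in two data sets.
study_1 = hedges_g([7.1, 6.8, 7.4, 6.9], [8.0, 8.3, 7.9, 8.1])
study_2 = hedges_g([5.2, 5.0, 5.5], [5.9, 6.1, 6.4])
print(pooled_effect([study_1, study_2]))
```

Data sets with smaller variance receive larger weights, which corresponds to the larger boxes in the forest plot.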

Novel insights into mechanisms of neuronal degeneration

We identified 70 genes enriched in neurons that are decreased in the CNM, 53 of which are not altered with normal aging. Fewer than half of these 70 genes have previously been investigated for a specific role in neurodegeneration, despite substantial evidence in the literature suggesting that many of them could be of significant interest. We describe three examples, each of which is decreased in the CNM, but unchanged in aging, implying specificity to the neurodegenerative process (Figure 8B).

First, STAU2, a hub gene also identified in the MetaCore protein interaction analysis, is involved in neuronal RNA transport [80]. STAU2 regulates the balance of neural stem cell maintenance versus differentiation during development [94], modulates long term depression by directing dendritic localization of protein synthesis in hippocampal neurons [95], and stabilizes the RNA of dendritic and synaptic proteins including RGS4 (regulator of G protein signaling 4) [80]. Indeed, RGS4 itself was the second-most highly down-regulated gene in our entire meta-analysis (Figure 2) and has previously been associated with diseases ranging from AD [96] and HD [97] to schizophrenia [98] and depression [99]. Hence, given the role of STAU2 in maintaining the fundamental structure of neuronal projections and synapses, pathologic decrease of STAU2 expression could exacerbate neurodegeneration.

Second, necdin (NDN) is expressed predominantly in post-mitotic neurons where it forms a stable complex with p53 and sirtuin1 to down-regulate p53 acetylation and protect neurons from DNA damage-induced apoptosis [100]. Though an association with neurodegenerative disease has not previously been established, one study showed that necdin ablation in mice led to exacerbated dopaminergic cell loss after MPTP exposure, while overexpression of an AAV-necdin construct almost completely abrogated MPTP-induced dopaminergic cell loss [101]. These data suggest that NDN could be critical for maintaining neuronal resilience against exogenous stressors.

Third, NAP1L2 promotes histone acetylation activity during neuronal differentiation [102]; NAP1L2 mutants are embryonic lethal due to neural tube defects [103]. The importance of chromatin regulation in neurodegenerative disease was recently highlighted in experiments showing that Tau-induced heterochromatin loss results in aberrant gene expression in tauopathies [104]. However, it has previously been reported that brains with AD have a lower percentage of euchromatin than control brains [105]. Therefore, down-regulated NAP1L2 would be consistent with the idea that neurodegeneration may additionally result from loss of essential regions of euchromatin, secondary to dysregulation of neuron-specific epigenetic regulators such as NAP1L2. If true, loss of NAP1L2 could help to explain the diverse panel of down-regulated neuron-specific genes across neurodegenerative diseases.

These three genes are only a few from the list of neuron-associated CNM genes not previously associated with neurodegeneration. However, this list also contains numerous other potentially interesting candidates, including FGF12, a regulator of NFκB signaling in neurons [106]; FGF13, a microtubule stabilizing protein regulating neuronal polarization [107]; MOAP1, a modulator of apoptosis [108]; and REEP1, a gene involved in endoplasmic reticulum maintenance that is mutated in hereditary spastic paraplegia [109], among others. While several of these genes have established neurologic phenotypes in mutants, others are entirely unstudied.

Common transcriptional regulators of neurodegenerative disease

We used the ENCODE ChIP-Seq significance tool to predict six transcription factors upstream of the 170 down-regulated CNM genes. These transcription factors were not identified by our analysis of normal aging brain gene expression changes. Four of these have not previously been implicated in neurodegenerative diseases. SIN3A is a multifunctional scaffolding protein that forms part of a large co-repressor transcriptional regulatory complex. It recruits a wide variety of epigenetic modifiers that collectively repress gene expression in non-neuronal cells by regulating histone deacetylation and DNA methylation. In neurons, SIN3A works in concert with calcium-sensitive transcription factors to facilitate plasticity and activity-dependent gene regulation, processes fundamental to learning and memory [110]. RBBP5 is a ubiquitously expressed transcriptional activator with histone methyltransferase activity [111]. RBBP5 co-purifies with the noncoding RNA 116HG, paternal deletion of which leads to Prader-Willi syndrome, a disease characterized in part by intellectual disability [112]. Furthermore, RBBP5 has been implicated as a potential oncogene in glioblastoma [113]. SP2 is a cell cycle regulator, deletion of which disrupts neurogenesis in embryonic and postnatal brain [114]. Interestingly, SP2 was the only predicted TF from the down-regulated CNM list whose mRNA level was decreased with aging. This suggests a potential dose-dependent effect of SP2 activity, with decreases in aging, and more severe decreases in neurodegeneration, both of which are associated with decreased neurogenesis. Finally, ZNF143 plays roles in response to oxidative stress [115] and cell cycle regulation [116]. Collectively, these data identify potential transcriptional regulators of the core set of genes down-regulated in neurodegeneration.

Shared insights across neurodegenerative diseases

An important implication of our work is that prior insights gained regarding any of the genes in the CNM may prove broadly relevant across neurodegenerative diseases. For many genes in the CNM that have been associated with neurodegeneration previously, such a role has typically only been investigated for one or two diseases. However, such insights, when considering the convergent pathways revealed by our analysis, may benefit research in other diseases. For example, RNF11 is a regulator of NF-κB signaling previously associated with only PD [117]. Low levels of axonal NEFL mRNA have previously been linked to ALS [118]. RAB3B, originally identified in a screen for genes enriched in MPTP-resistant A10 dopaminergic neurons relative to MPTP-susceptible A9 neurons, proved capable of protecting dopaminergic neurons when overexpressed in A9 neurons [119]. Neuronal CD200 is a negative regulator of inflammation, previously found to be decreased in AD [120] and PD [121]. Our data suggest that each of these genes is in fact decreased across each of the neurodegenerative diseases in our analysis (AD, PD, HD, ALS, and likely others), which points to broader potential applications of these prior findings.

Metallothioneins and oxidative stress

Four of the top 10 up-regulated genes in the CNM were metallothioneins (MT2A, MT1Z, MT1H, and MT1F), each of which was also up-regulated in the normal aging brain. Metallothioneins are increasingly being explored in neurological diseases as potential therapeutic targets [122]. Metallothioneins 1 and 2 are expressed predominantly in astrocytes and are critical for buffering zinc, which has been implicated in the production of reactive oxygen species (ROS) in association with aging and inflammation [123]. States of increased oxidative stress promote mobilization of zinc from matrix metallothioneins, after which the zinc is taken up by mitochondria, where it impedes respiration and incites further ROS production. This finding underscores the central additive role of further oxidative stress and mitochondrial dysfunction in all neurodegenerative diseases. That these genes were also up-regulated in normal aging in our analysis suggests that changes in these metallothioneins tended to be adaptive, rather than pathological in nature, which in turn suggests that astrocytes may attempt to mitigate this increased oxidative stress by further up-regulation of metallothioneins. Our identification of multiple reproducibly up-regulated metallothioneins across all neurodegenerative diseases provides impetus for further work in this area.

Limitations

Despite its comprehensiveness, our analysis has several limitations. Although we have made inferences about genes likely altered in neurons, the resolution of cell-type specific data from mixed tissue is inherently limited. A number of methods for statistical deconvolution of mixed tissue gene expression data have been developed, which should be used to further explore cell-type specific expression in neurodegeneration once further human brain cell-specific gene expression profiles have been established [124]. Degeneration of specific neuronal subtypes in different diseases is believed to result from selective vulnerability—an issue that is not addressed in our analysis. Based on the use of microarray data, including that from multiple platforms, we can draw no conclusions regarding the broadly observed alterations in splice variants that are increasingly implicated in neurodegeneration; future analysis of transcriptome data derived from RNA-seq will illuminate this issue. Work to evaluate conserved epigenetic signatures of neurodegeneration will also be of great interest once sufficient relevant data are available in the future. Samples included in our analysis were largely derived from late stage disease, thereby masking potentially important early changes that could offer targets for preventative therapies. Nevertheless, the CNM was found to associate with histologic disease severity (Figure 3). Further work to collect region- and cell-type specific transcriptome data at multiple stages of disease with next generation sequencing technology will dramatically enhance the insights obtainable through bioinformatics analysis in the future.

In addition, while most included studies attempted to use brain tissue without co-pathologies, other pathologies may nonetheless be present in the samples. Given the size of our study and the number of sources that the samples in these studies came from, we are optimistic that such confounding is minimized. Furthermore, the common neurodegeneration pathways identified may also be shared with other prevalent human diseases, like diabetes mellitus and atherosclerosis, a possibility that requires further investigation but is beyond the scope of the current study.

Finally, although our data shed light on the conserved signature of neurodegeneration, direct experimentation will be required to determine which of these newly highlighted changes are (1) direct etiological contributors to degeneration; (2) appropriate “survival” reactions activated in an attempt to preserve cellular viability; or (3) stress-related changes that, though adaptive in the acute setting, lead to neurodegenerative sequelae in the long term.

Conclusions

We carried out an integrated multi-cohort analysis of CNS tissue microarrays from AD, PD, HD, and ALS, thereby identifying a conserved transcriptional signature of neurodegeneration. These results were confirmed in additional independent publicly available neurodegeneration CNS tissue microarray data sets meeting our inclusion criteria. Impaired bioenergetics with global down-regulation of mitochondria-related genes was the most prominent conserved theme of neurodegeneration, accompanied by evidence of neuroinflammation, protein mishandling, oxidative stress, microglial activation, gliosis, and coordinated down-regulation of a host of genes essential for neurotransmission and normal neuronal function (Figure 8A). Overall, our functional analysis of the CNM, using Gene Ontology terms, MetaCore canonical pathways, and ChIP-Seq transcription factor prediction analysis, confirmed established findings and revealed additional novel insights. We believe the CNM represents a rich repository of convergent candidate genes that may be harnessed to improve our understanding of neurodegeneration, provide unique biomarkers for neurodegeneration, and facilitate the development of therapeutic strategies. We hope that these data will aid those studying neurodegeneration and pursuing therapies for these devastating diseases.

Availability of supporting data

The data sets supporting the results of this article are available in the ArrayExpress and the Gene Expression Omnibus online repositories, at http://www.ebi.ac.uk/arrayexpress/ and http://www.ncbi.nlm.nih.gov/geo/. Data set and individual sample accession numbers can be found in Table 1 and Additional file 1: Table S1.

Additional files