Towards a pathway definition of Parkinson’s disease: a complex disorder with links to cancer, diabetes and inflammation
- Cite this article as:
- Moran, L.B. & Graeber, M.B. Neurogenetics (2008) 9: 1. doi:10.1007/s10048-007-0116-y
We have previously established a first whole genome transcriptomic profile of sporadic Parkinson’s disease (PD). After extensive brain tissue-based validation combined with cycles of iterative data analysis and by focusing on the most comparable cases of the cohort, we have refined our analysis and established a list of 892 highly dysregulated priority genes that are considered to form the core of the diseased Parkinsonian metabolic network. The substantia nigra pathways, now under scrutiny, contain more than 100 genes whose association with PD is known from the literature. Of those, more than 40 genes belong to the highly significantly dysregulated group identified in our dataset. Apart from the complete list of 892 priority genes, we present pathways revealing PD ‘hub’ as well as ‘peripheral’ network genes. The latter include Lewy body components or interact with known PD genes. Biological associations of PD with cancer, diabetes and inflammation are discussed and interactions of the priority genes with several drugs are provided. Our study illustrates the value of rigorous clinico-pathological correlation when analysing high-throughput data to make optimal use of the histopathological phenome, or morphonome which currently serves as the key diagnostic reference for most human diseases. The need for systematic human tissue banking, following the highest possible professional and ethical standard to enable sustainability, becomes evident.
Keywords: Aposklesis; Expression analysis; Lewy bodies; Microarrays; Network medicine; Neurodegeneration; Synaptic dysfunction
Our understanding of Parkinson’s disease (PD) is largely incomplete. However, the pace of discovery in this field is rapidly accelerating. It took more than 100 years for the key region of neuronal damage, the substantia nigra, to be identified, and it took almost 80 years for the first disease-causing mutation to be discovered. Not even 10 years later, the first whole genome transcriptome analysis had been performed, and a number of other microarray studies focusing on known sequences were carried out (e.g. [4, 5, 6]). We now provide the complete list of 892 highly dysregulated PD nigral genes derived from a brain tissue-validated whole genome expression microarray data set. In addition, predicted interactions of a number of these genes are reported as potential drug targets. We would like to emphasise that the neurohistological validation that is so crucial for our work, and which has already led to the identification of two novel Lewy body components predicted on the basis of this dataset [7, 8], could not have been performed without generous brain donations. In addition, the iterative analysis performed, combining histological phenome (morphonome) data and clinical criteria with in silico data mining, would not have been possible without significant advances in computing, notably virtual machine technology. It is readily apparent that a publication of this format requires the Internet, as it would not have been possible to publish on paper its almost 3,200 hyperlinked files, which are provided as electronic supplemental material.
PD is a severely disabling neurodegenerative disorder second in frequency only to Alzheimer’s disease and has a significant socio-economic impact. Unlike in Alzheimer’s disease, however, the brain region taking the brunt of the disease process is rather well circumscribed. In addition, there is widespread consensus on diagnostic criteria both clinically and neuropathologically (http://www.ICDNS.org). This is at least in part due to the fact that the leading motor symptoms are less complex and easier to recognize and define than the clinical signs in disorders mainly affecting higher brain functions such as cognition. Furthermore, there is symptomatic treatment for PD pointing to key pathways involved. All these are important prerequisites when working with high throughput technologies such as microarrays which require precise tissue sampling because the procedures employed are both laborious and expensive.
Major known pathways involved in PD include the ubiquitin-proteasome system, dysfunction of which may lead to abnormal protein deposition, mitochondrial failure and decreased expression of synaptic proteins [6, 9, 10, 11]. Oxidative stress has been traditionally implicated in the aetiology of the disease, but the changes observed could be secondary. The concept of ‘neuroinflammation’ has become very popular recently [12, 13, 14], but our own work in this field does not currently support a role for microgliosis as a driver of the disease process. The effective failure of recent studies employing non-steroidal anti-inflammatory drugs supports this notion [16, 17]. Thus, there are leads and popular ideas, but the big picture of PD pathogenesis is still missing. A true understanding of PD and its subtypes will require integrated knowledge from several system biological levels, ranging from genomics to proteomics and metabonomics as well as clinical data and neuroimaging. Through this study, we aim to contribute a validated transcriptomic data layer.
Materials and methods
Data set used
The 94 .CEL files used for this study have been deposited at the National Center for Biotechnology Information, Gene Expression Omnibus (http://www.ncbi.nlm.nih.gov/projects/geo) with GEO Series accession number GSE8397 (scheduled release date: 1 January 2008). The patient samples employed have been described previously. This data set is based on the Affymetrix HU_133A and HU_133B gene chip set and has been extensively validated over a period of 2 years using qRT-PCR, immunocytochemistry and in situ hybridisation to cellularly ‘back-map’ sequences of interest [7, 8, 10, 18].
Data analysis procedures employed
In silico analyses were performed with the help of programme packages from different suppliers. Our initial microarray data analysis was performed using ArrayAssist 3.0 (Stratagene), but there were intermittent problems with some versions of the software (3.2–3.4). We have repeated our analysis using newer versions (4.0, 4.1; Linux and/or Windows) and the latest version of this program (ArrayAssist 5.5 for Windows, Stratagene). In addition, we have reproduced our results using an independent software package for microarray analysis, PathwayStudio 5.0. The (GC-)RMA algorithm, which has become a gold standard for Affymetrix microarray normalisation, was applied in all cases. In addition, PathwayAssist (Ariadne) and Pathway Architect (Stratagene) were employed during phases of our work. For ease of use, most software installations were performed in virtual machines (VMware Workstation and Fusion, respectively; http://www.vmware.com) and two virtual machines were frequently run in parallel on Windows, Linux, or Macintosh platforms. The ability to create virtual machine ‘snapshots’ of critical stages of the analysis proved invaluable for the backtracking of results and to allow comparability over time. It proved essential in the case of the most complex network analyses, where software stability was a factor.
We have used the microarray data in a hypothesis generating rather than hypothesis testing way (cf. [4, 10]). Since our original predictions based on the ArrayAssist 3.0 dataset turned out to be very reliable with respect to subsequent in situ tissue validation results, we have performed our refined analysis by means of this programme following extensive comparison with readings generated by the latest versions of ArrayAssist 5.5 and PathwayStudio 5.0. Anyone intending to reproduce our results is referred to the original CEL files (GEO ID GSE8397).
The following original cases were excluded from our refined analysis to obtain more homogeneous cohorts taking newly obtained histological and expression data into account: Con4, 8 and MS155 as well as the sample of medial substantia nigra from PD22. Thus, nine control nigra and 23 PD nigra samples remained in the study. To find differentially expressed genes, the p value cut off was kept at 0.001 (differential expression = 1 log2). Multiple testing corrections (FDR, Bonferroni) were carried out for comparative purposes. The 892 top genes identified on the basis of 1,145 probes are referred to as the ‘priority genes’.
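The selection step described above can be illustrated with a minimal sketch. It assumes that each probe carries an unadjusted p value and a log2 differential expression value, that the log2 threshold is applied to the absolute value, and that the FDR correction is of the Benjamini-Hochberg type; none of these details is specified in the text, and the names `ProbeResult`, `select_priority`, `bonferroni` and `benjamini_hochberg` as well as the demo values are hypothetical. It is not the ArrayAssist implementation.

```python
from typing import NamedTuple


class ProbeResult(NamedTuple):
    probe_id: str      # hypothetical probe identifier
    p_value: float     # unadjusted p value, PD vs control
    log2_diff: float   # log2 differential expression, PD minus control


def select_priority(results, p_cutoff=0.001, min_abs_log2=1.0):
    """Keep probes passing both the p value and the log2 thresholds."""
    return [r for r in results
            if r.p_value < p_cutoff and abs(r.log2_diff) >= min_abs_log2]


def bonferroni(p_values):
    """Bonferroni-adjusted p values: p * m, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p values (assumed FDR procedure)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy values for illustration only; they are not taken from the dataset.
demo = [
    ProbeResult("probe_A", 0.0002, 1.4),
    ProbeResult("probe_B", 0.0005, -1.2),
    ProbeResult("probe_C", 0.0400, 2.0),   # fails the p value cut-off
    ProbeResult("probe_D", 0.0001, 0.3),   # fails the log2 threshold
]
selected = [r.probe_id for r in select_priority(demo)]
```

Keeping the raw cut-off and the corrections as separate functions mirrors the text, where the corrections were carried out for comparison rather than as the selection criterion.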
Hierarchical clustering was executed on both rows and columns using ArrayAssist 5.5 (Pearson centred distance metric, centroid linkage rule). Similarity images were produced to visualise the quality of clustering results (SI_Figure_1). In addition, self-organising map clustering was performed on both rows and columns using a Euclidean distance metric in ArrayAssist 5.5 (maximum number of iterations 50, number of grid rows 3, number of grid columns 4, initial learning rate 0.03, initial neighbourhood radius 5, grid topology hexagonal, neighbourhood type bubble).
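To make the self-organising map settings listed above more concrete, the following is a minimal sketch in plain Python. It uses the stated grid size (3 × 4), 50 iterations, initial learning rate 0.03, initial neighbourhood radius 5, Euclidean distance and a bubble neighbourhood. Several points are simplifying assumptions and not taken from ArrayAssist: the grid is rectangular rather than hexagonal, the learning rate and radius decay linearly, and the node weights are initialised by cycling through the data. The function names are hypothetical.

```python
import math


def best_matching_unit(weights, x):
    """Grid node whose weight vector is closest to x (Euclidean)."""
    return min(weights, key=lambda node: math.dist(weights[node], x))


def train_som(data, rows=3, cols=4, iterations=50,
              learning_rate=0.03, radius=5.0):
    """Train a small SOM with a bubble neighbourhood.

    Assumptions: rectangular grid (the text uses a hexagonal one),
    linear decay of rate and radius, deterministic initialisation.
    """
    dim = len(data[0])
    # One weight vector per grid node, initialised by cycling over the data.
    weights = {(r, c): list(data[(r * cols + c) % len(data)])
               for r in range(rows) for c in range(cols)}
    for it in range(iterations):
        frac = 1.0 - it / iterations
        rate, rad = learning_rate * frac, radius * frac
        for x in data:
            bmu = best_matching_unit(weights, x)
            for node, w in weights.items():
                # Bubble: full update inside the radius, none outside.
                if math.dist(node, bmu) <= rad:
                    for k in range(dim):
                        w[k] += rate * (x[k] - w[k])
    return weights


# Toy two-dimensional data with two separated groups (illustration only).
toy = [(0.0, 0.0), (0.1, 0.0), (10.0, 10.0), (10.0, 10.1)]
som = train_som(toy)
node_low = best_matching_unit(som, toy[0])
node_high = best_matching_unit(som, toy[2])
```

In an expression analysis, each data vector would be the profile of one probe (or one sample), and probes mapping to the same grid node would be read as one cluster.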
To find cell processes regulated by the differentially expressed genes, PathwayStudio software (version 5.0) was used. The 164 top up-regulated priority genes showing a differential expression >1 log2 were selected, and the ‘find common targets’ algorithm was employed in the build pathway tool with the cell process filter option set. The procedure was repeated for the disease entity type. Cell processes and disease conditions showing the highest number of biological associations, i.e. the strongest probabilistic relationship based on literature evidence to the group of 164 top up-regulated priority genes, were selected. Furthermore, known interactions between all 892 dysregulated genes were identified using the ResNet 5.0 database of molecular interactions, which has been derived from the published literature by means of a natural language processing technology called Medscan. A filter for the parameters promoter binding, expression and regulation was applied in the latter case.
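The idea of ranking cell processes or diseases by the number of associations with a gene group can be sketched as a simple counting step. This is only an illustration of the principle under stated assumptions: the association table is a hypothetical dictionary, the gene and process names are placeholders, and the actual ‘find common targets’ algorithm in PathwayStudio may weight or filter relations differently.

```python
from collections import Counter


def rank_common_targets(associations, gene_set):
    """Count, per target, how many genes of gene_set are linked to it.

    associations: dict mapping a gene to the targets (cell processes or
    diseases) it is associated with. Hypothetical input format.
    """
    counts = Counter()
    for gene, targets in associations.items():
        if gene in gene_set:
            counts.update(set(targets))  # count each gene once per target
    # Highest count first; ties broken alphabetically for stable output.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


# Placeholder data for illustration only.
toy_associations = {
    "GENE_A": ["process_X", "process_Y"],
    "GENE_B": ["process_X"],
    "GENE_C": ["process_X", "process_Z"],
    "GENE_D": ["process_Y"],  # not in the selected group, so ignored
}
ranked = rank_common_targets(toy_associations, {"GENE_A", "GENE_B", "GENE_C"})
```

The targets at the top of `ranked` correspond to what the text calls the entities with the highest number of biological associations with the selected gene group.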
For additional validation, the commercial PathArt database (PathwayAssist plug-in, Jubilant Biosys) containing a large number of manually curated pathways was queried. A number of GEO datasets (e.g. astrocytes and microglia in culture) were downloaded to evaluate cell-type specificity of expression of individual priority genes. A search for novel secreted biomarkers was also performed, of which NPTX2 may be one.
To retrieve known drug interactions of the 892 priority genes, more than 9,000,000 database objects were checked in ResNet 5.0 (PathwayStudio 5.0). Subsequently, all known interactions of individual relevant drugs contained in ResNet 5.0 were retrieved and the results overlaid with the set of priority genes. Drugs scoring high in terms of the number of their known interactions with the 892 priority genes were identified and the relationship to PD analysed employing the PubMed database (http://www.ncbi.nlm.nih.gov/sites/entrez).
The extended set of 892 PD priority genes
Gene ontology and pathway analysis of the differentially expressed genes
We did not observe an effect for gender. Hierarchical clustering of the male and female PD patients on the basis of the expression values of the 892 priority genes did not separate the groups, nor did a whole genome clustering omitting sex chromosomal sequences.
Relationship to diseases and drug interactions
A search of the ResNet database identified three disease conditions that showed the strongest probabilistic relationship to the group of top up-regulated priority genes: cancer, diabetes and inflammation, as illustrated in Fig. 2b. A hyperlinked version of this figure providing details on all genes and their interactions is available online (SI_Figure_3b). Known drug interactions of some of the priority genes were retrieved from the ResNet 5.0 database by checking more than 9,000,000 database objects. It is noteworthy that drugs such as clozapine, cocaine and haloperidol, which are used in the treatment of PD or which cause Parkinsonian side effects, appear to interact with a large number of PD priority genes (SI_Figure_8). A search of the ResNet database also yielded information on the interactions of two cytostatic drugs, paclitaxel and vincristine, with the priority genes identified in this study (SI_Pathways_2&3). Both paclitaxel and vincristine have been reported to induce parkinsonian side effects [22, 23].
‘Hub’ vs ‘peripheral’ genes
A number of genes known to have numerous interactions with other genes were found amongst the priority genes. These represent so-called network ‘hubs’ and include HSPA1A, NFKBIA, CDC42, GSK3B, ACHE, AGTR1, IGF1R and TH as well as about 200 others (50 to >1,600 connectivities). Figure 2a and b contain a number of hubs. However, almost 30% of the priority genes have no known interactions according to the ResNet 5.0 database although the cellular localisation is known in some instances. Such genes may be called ‘peripheral’ genes of the human interactome [24, 25]. Apparently, peripheral priority genes with still unknown pathway connections and cellular localisation are ACTR10, ANKRD29, ANKRD34, ANKRD50, ARMCX4, ARRDC2, ASMTL, ATAD1, BLOC1S2, CAP2, CCDC4, CCDC85A, CKMT1A, CNIH3, DBNDD1, DCUN1D4, DIRAS2, DOPEY1, EHBP1, ELMOD1, FARSLB, FBXO9, FHOD3, GABARAPL3, GARNL4, GPRASP2, GPRIN3, GUSBP1, HISPPD1, HNRPUL2, IPW, KLHL1, KRT222P, LRRC49, LRRC55, LYNX1, MANEAL, MAP1LC3A, MAP9, MGC22265, MGC39606, MGC4677, MIA3, MRC1L1, NCDN, NGFRAP1L1, NRIP3, NUDT11, OCIAD2, OGDHL, OSBPL10, PCYOX1L, PFAAP5, PGM2L1, PLCXD3, PNMA6A, PRKY, PRMT8, PRPS1, RFPL1S, RUTBC2, SLC35F1, SNX10, SNX25, TAGLN3, TBC1D24, TBC1D9, TMEM130, TMEM132B, TMEM35, TRIM4, TRIM9, TSGA14, TTC7B, TUBB2B, TUBB3, UBPH, USP34, WDR47, XKR4, ZNF204.
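The distinction between ‘hub’ and ‘peripheral’ genes drawn above can be sketched as a classification by connectivity. The sketch assumes that an interaction database can be exported as a list of gene pairs; the hub threshold of 50 is taken from the lower end of the connectivity range quoted in the text, while the label ‘intermediate’, the function names and all gene names are hypothetical placeholders.

```python
from collections import defaultdict


def connectivity(edges):
    """Number of distinct interaction partners per gene."""
    neighbours = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-interactions
            neighbours[a].add(b)
            neighbours[b].add(a)
    return {gene: len(partners) for gene, partners in neighbours.items()}


def classify(genes, edges, hub_min=50):
    """Label each gene as 'hub', 'peripheral' (no known interactions)
    or 'intermediate' (a hypothetical label for everything in between)."""
    degree = connectivity(edges)
    labels = {}
    for gene in genes:
        d = degree.get(gene, 0)
        if d >= hub_min:
            labels[gene] = "hub"
        elif d == 0:
            labels[gene] = "peripheral"
        else:
            labels[gene] = "intermediate"
    return labels


# Placeholder network for illustration only.
toy_edges = [("GENE_HUB", f"partner_{i}") for i in range(60)]
toy_edges.append(("GENE_MID", "partner_0"))
labels = classify(["GENE_HUB", "GENE_MID", "GENE_LONE"], toy_edges)
```

A gene labelled ‘peripheral’ here simply has no entry in the interaction list, which matches the text's usage: absence of known interactions in the database, not proven absence of interactions.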
Up-regulated ‘peripheral’ priority genes
Our study reveals a significant up-regulation of substantia nigra genes in PD which have known biological associations with cancer, diabetes and inflammation. This includes major ‘hub’ genes [24, 25, 26] such as p53, somatic mutations of which can cause cancer. This is of note as p53 forms part of a molecular network that integrates tumour suppression and ageing. DJ-1 is another cancer- and Parkinson’s disease-associated protein, and it is of special interest in the present context that the ubiquitin-proteasomal pathway has an established role in neoplastic processes. Furthermore, both parkin and PINK1 might be tumour suppressor genes, and it has been suggested that although cancer is rare in PD, unravelling the link between PD and cancer [30, 31] may open a therapeutic window for both diseases. The finding of a molecular biological association between diabetes and PD is not truly surprising either [33, 34, 35, 36, 37]. Thirdly, the link of PD with inflammation which emerges from our unsupervised analysis seems almost expected considering the very lively debate of this topic in the literature. The whole genome transcriptome data presented here certainly justify additional scrutiny of the underlying mechanisms in relation to PD pathogenesis.
The problem of defining what causes PD at a system level has become more complex with the recent finding that disease-relevant genes may reside at the periphery of disease networks. It is interesting to note that the neighbour of a disease node appears more likely to be another disease protein, which also preferentially interacts with other disease nodes. Proteins that are associated with the same disease show a 10-fold greater tendency to interact with each other than those not associated with the same disease. This should direct our attention also to genes that do not form major network hubs but which are either likely to be involved in PD on cell biological grounds (e.g. Lewy body components) or which interact with PD-causing genes. A significant fraction of the genes identified in this study still represent functionally ill-characterised entities.
The on-line material of this manuscript illustrates some of the pathways and biological association networks that emerge from our analysis. Networks are now recognized to pervade all aspects of human biology, and the question where function lies within a cell is shifting from a simple focus on genes to the understanding that behind each cellular function there is a discernible network module consisting of genes, transcription factors, RNAs, enzymes and metabolites. However, ‘network medicine’ is still in its infancy and the present study may be the first where an iterative multidimensional tissue analysis approach, http://www.neurogenetics.net/Multidimensional.html, has been applied to a human neurological disorder. The ultimate goal of such analyses is the precise cellular localisation of all expressed human disease genes in their affected tissues. The present PD dataset has so far yielded two novel components of Lewy bodies [7, 8], but much more back-mapping work will need to be performed.
For instance, the exact mechanism of cell death in PD is still unknown. Recent evidence has suggested that one mechanism linked to the death of terminally differentiated neurons is aberrant re-entry into the cell cycle, and possible connections between oxidative stress and unscheduled cell cycle re-entry in PD have been proposed. However, as neuroscientists, we may have to move beyond the description of the cell cycle that has been propagated by those in the cancer field because the regulation of the cell cycle in the neuron is much more nuanced (K Herrup, http://www.alzforum.org/new/detailprint.asp?id=1688). This raises the possibility that some of the data supporting cell processes such as mutagenesis in this study may have to be re-read and interpreted in a modified way. It is worth noting in this context that absence of RET signalling in mice causes progressive and late degeneration of the nigrostriatal system. We would also like to point out that the present study provides additional evidence for the importance of changes in the neuronal cytoskeleton in PD [43, 44] because neurofilament subunit as well as microtubule-associated protein genes were found to be highly dysregulated. Dysregulation of signal transduction, heat shock and synaptic proteins also featured very prominently.
The view that the 892 nigral genes presented here are relevant for sporadic PD is supported by the finding that their pattern of expression is characteristic of nervous tissue (SI_Figure_2). It is further clear from our data that the human substantia nigra in PD does not represent dead tissue but that there is an active ongoing disease process, understanding of which may hold the key to halting PD. The dysregulated priority genes may reside at the core of the disease process and could serve as novel targets for therapeutic intervention. This idea is supported by the observation that a number of priority genes interact with drugs whose actions are associated with a Parkinsonian clinical phenotype.
The uncertainty whether inflammatory processes truly represent a causative factor in the aetiology of PD requires an answer. Our own work and that of others suggest a direct role of primary glial degeneration in the pathogenetic process underlying PD [10, 46]. This means that PD extends beyond the neuron. The disease is also not confined to the substantia nigra anatomically. Detailed cellular back-mapping of all priority genes to brain tissue will help to settle these questions. New algorithms are required to explain the links of PD as defined in the living patient and under the microscope with the underlying high-throughput datasets.
Collection of links to Electronic Supplemental Material (ESM)
SI_Table_1a List of all 892 highly dysregulated genes (‘priority genes’) contained in the ResNet 5.0 database (Ariadne) http://www.morphonom.net/ng/ESM/t/SI_Table_1a.xls
SI_Table_1b List of the 1,145 Affymetrix probes identifying highly dysregulated sequences including all 892 ‘priority genes’ http://www.morphonom.net/ng/ESM/t/SI_Table_1b.txt
SI_Table_1c P and differential expression values for the 1,145 Affymetrix probes http://www.morphonom.net/ng/ESM/t/SI_Table_1c.xls
SI_Table_1g List of priority genes that are published AD candidate genes http://www.morphonom.net/ng/ESM/t/SI_Table_1g.rtf
SI_Table_1h List of priority genes with known functional links to PD (PubMed) http://www.morphonom.net/ng/ESM/t/SI_Table_1h.xls
The literature references for this table can be found in http://www.morphonom.net/ng/ESM/r/SI_References_1.rtf
SI_Table_2 Gene Ontology analysis of the 892 genes http://www.morphonom.net/ng/ESM/t/SI_Table_2.xls
SI_Table_3 Signalling pathway analysis of the 892 genes (PathwayStudio 5.0) http://www.morphonom.net/ng/ESM/t/SI_Table_3.xls
SI_Figures_1a Hierarchical clustering of the 1,145 probes (dendrogram description: clustering on rows and columns, Pearson centred distance metric, centroid linkage rule); the clustering separates PD cases (green, right) from controls (red, left); designation of samples as indicated in the dataset submitted to GEO (see ‘Materials and methods’ section). http://www.morphonom.net/ng/ESM/f/SI_Figure_1a.png
Abbreviations used: F, female; M, male; LN, lateral substantia nigra; MN, medial substantia nigra; CON, PDC, controls; PD, Parkinson’s disease; followed by the number indicating the age of each subject (cf. Moran et al.); LN, MNCON10, M, 71; LN, MNCON2, M, 77; MNCON3, M, 81; LN, MNCON9, M, 57; LN, MNPD01, F, 87; LN, MNPD02, M, 83; LN, MNPD04, M, 68; LN, MNPD07, M, 78; LN, MNPD09, F, 86; LN, MNPD10, F, 81; LN, MNPD16, F, 85; MNPD20, M, 75; MNPD21, M, 76; LN, MNPD28, M, 82; LN, MNPD29, M, 76; MNPD32, M, 89; MNPD34, F, 84; MNPD36, M, 76; LN, MNPDC1, M, 76.
An explanation for the colour coding is provided in http://www.morphonom.net/ng/ESM/f/SI_Figures_1a&c-Color_range.jpg
The labels used for the clustering and the corresponding Affymetrix probe set IDs are explained in http://www.morphonom.net/ng/ESM/f/SI_Figures_1a&c-Labels_used_for_clustering.xls
The four probes at the very bottom of the figure (XIST, X (inactive)-specific transcript) in PD01, 9, 10, 16 and 34 identify the female patients in our refined cohort. They served as an internal control.
SI_Figures_1b Column similarity image for SI_Figure_1a.png http://www.morphonom.net/ng/ESM/f/SI_Figure_1b.png
SI_Figures_1c Self-organising map corresponding to SI_Figure_1a (dendrogram description: clustering on rows and columns, Euclidean distance metric, maximum number of iterations 50, number of grid rows 3, number of grid columns 4, initial learning rate 0.03, initial neighbourhood radius 5, grid topology hexagonal, neighbourhood type bubble) http://www.morphonom.net/ng/ESM/f/SI_Figure_1c.png
SI_Figure_2 Hierarchical clustering of the 1,145 probes using 64 whole genome array datasets (Affymetrix Human Genome U133 Plus 2.0 arrays) representing individual organ samples including 20 brain regions and three ganglia. GSM numbers refer to the respective file names in the complete dataset which comprises 353 whole genome arrays (GEO database, GSE ID GSE3526). There is a complete separation of nervous tissue (left) from other organs on the basis of the 1,145 probes. http://www.morphonom.net/ng/ESM/f/SI_Figure_2.png
The designations of all samples and their code numbers are provided in http://www.morphonom.net/ng/ESM/r/SI_References_2.rtf
SI_Figure_3a Online version (with links) of Fig. 2a http://www.morphonom.net/ng/ESM/f/Cell_Processes.html
SI_Figures_4a-c Online permutations of Fig. 3
Layout by cellular localisation with links http://www.morphonom.net/ng/ESM/f/892.html
Symmetrical layout with links http://www.morphonom.net/ng/ESM/f/892s.html
Hierarchical layout with links http://www.morphonom.net/ng/ESM/f/892h.html
SI_Figure_5 Known components of Lewy bodies are indicated by the blue highlighting (this figure is identical to SI_Figure_1c except that it has a lower resolution) http://www.morphonom.net/ng/ESM/f/SI_Figure_5.png
SI_Figure_6 Online version of Fig. 4 with expression values overlaid (red indicates up- and blue indicates down-regulation in PD nigra, respectively). The blue shading of the priority genes has been replaced with yellow. http://www.morphonom.net/ng/ESM/f/Lewy_Body.html
SI_Figure_7 Hyperlinked online version of Fig. 5. The gene colour range in this figure indicates significance: high, white; low, red. The blue shading indicates that the respective gene is a priority gene. http://www.morphonom.net/ng/ESM/f/PD_genes_interactions_with_892_direct_no_DorCP.html
SI_Figures_8a-c Known drug interactions of some of the priority genes as derived from the ResNet 5.0 database (more than 9,000,000 database objects were checked). It is noteworthy that drugs such as clozapine, cocaine and haloperidol which are used in the treatment of PD or which cause Parkinsonian side effects appear to interact with an especially large number of PD priority genes.
SI_Pathway_1 This signalling pathway was identified based on a search of 192 canonical pathways and 555 signalling pathways in PathwayStudio 5.0 and ranked highest (also see SI_Table_3). Six priority genes are represented in this pathway and are marked by the blue shading. http://www.morphonom.net/ng/ESM/p/RET_HSF1_signaling_pathway.html
SI_Pathway_2 Interactions of the cytostatic drug paclitaxel with a total of 13 priority genes (blue shading) are shown. Display by effect. Hierarchical layout. http://www.morphonom.net/ng/ESM/p/paclitaxel_interactions.html
SI_Pathway_3 Interactions of the cytostatic drug vincristine with 3 priority genes (violet shading, bottom of figure) are illustrated. Red shaded genes also showed dysregulation in the PD nigra (p < 0.001). Display by reference count (darker blue indicates a larger number of references supporting the respective connection). Hierarchical layout. http://www.morphonom.net/ng/ESM/p/vincristine_interactions.html
SI_References_1 References for SI_Table_1h http://www.morphonom.net/ng/ESM/r/SI_References_1.rtf
SI_References_2 Designations and code numbers for SI_Figure_2 http://www.morphonom.net/ng/ESM/r/SI_References_2.rtf
We express our deepest appreciation to the donors and their families for donating human brain tissue for research to the UK Parkinson’s Disease Society Brain Bank. This work was funded by the Parkinson’s Disease Society of the United Kingdom, registered charity 948776. We are also most grateful to BUG. LBM would like to thank the UK Parkinson’s Disease Society for special support.