Increased brain expression of GPNMB is associated with genome wide significant risk for Parkinson’s disease on chromosome 7p15.3

Genome wide association studies (GWAS) for Parkinson’s disease (PD) have previously revealed a significant association with a locus on chromosome 7p15.3, initially designated as the glycoprotein non-metastatic melanoma protein B (GPNMB) locus. In this study, the functional consequences of this association on expression were explored in depth by integrating different expression quantitative trait locus (eQTL) datasets (Braineac, CAGEseq, GTEx, and Phenotype-Genotype Integrator (PheGenI)). Top risk SNP rs199347 eQTLs demonstrated increased expressions of GPNMB, KLHL7, and NUPL2 with the major allele (AA) in brain, with most significant eQTLs in cortical regions, followed by putamen. In addition, decreased expression of the antisense RNA KLHL7-AS1 was observed in GTEx. Furthermore, rs199347 is an eQTL with long non-coding RNA (AC005082.12) in human tissues other than brain. Interestingly, transcript-specific eQTLs in immune-related tissues (spleen and lymphoblastoid cells) for NUPL2 and KLHL7-AS1 were observed, which suggests a complex functional role of this eQTL in specific tissues, cell types at specific time points. Significantly increased expression of GPNMB linked to rs199347 was consistent across all datasets, and taken in combination with the risk SNP being located within the GPNMB gene, these results suggest that increased expression of GPNMB is the causative link explaining the association of this locus with PD. However, other transcript eQTLs and subsequent functional roles cannot be excluded. This highlights the importance of further investigations to understand the functional interactions between the coding genes, antisense, and non-coding RNA species considering the tissue and cell-type specificity to understand the underlying biological mechanisms in PD. Electronic supplementary material The online version of this article (doi:10.1007/s10048-017-0514-8) contains supplementary material, which is available to authorized users.


Introduction
Parkinson's disease (PD) is the second most common neurodegenerative disease, characterized by movement-related symptoms including bradykinesia, rigidity, and tremor, as well as an increasingly appreciated array of non-movement issues [1]. The symptoms derive from extensive neuronal cell death, most notably (but not exclusively) of dopaminergic neurons within the substantia nigra pars compacta. The etiology of PD is complex, and is thought to involve the interplay of several factors, including environmental exposure and genetic predisposition. Our understanding of the latter has undergone a transformation in the last two decades, moving from fully penetrant causative variants inherited in a Mendelian fashion to subtle risk factors impacting on transcript expression, gene-gene interactions, gene-protein interactions, and other downstream processes in different tissues and specific cell types [2]. In recent years, genome wide association (GWA) meta-analyses have opened a new window on how common variation in the general population can increase lifetime risk of developing PD. The most recent of these, a meta-analysis conducted by Nalls et al. that included 13,708 cases and 95,282 controls, identified 26 risk loci, of which 6 were novel. Thirty significant associations between SNPs of interest and either CpG methylation or messenger RNA (mRNA) expression profiles across the six newly identified loci were identified [3]. Thus, the application of GWA approaches over the past decade has identified a large number of loci associated with increased risk of PD and helped prioritize genomic regions of interest for further functional characterization.

The complete list of the IPDGC members is provided in the Supplementary Material.
A major challenge for the Parkinson's community is, therefore, to decipher the functional sequelae of the variants identified by GWA studies in order to achieve a deeper understanding of the genetic etiology of PD and uncover novel drug targets/pathways, thereby accelerating drug development. Parallel studies by a number of groups using different experimental approaches have investigated the functional roles of these variants such as their effects on gene expression (expression quantitative trait loci (eQTLs)) [4][5][6][7], long non-coding RNA trans-regulation [8], and protein-protein interaction networks [9] providing substantial insights into disease mechanisms for a number of common disorders. The impact of eQTLs is of particular interest as it can provide compelling evidence linking a risk variant and disease-specific genetic alterations in terms of altered expression and splicing levels, therefore yielding insight into the disease association and mechanism. Hence, an eQTL analytical approach can bridge the gap between the structural variants and their functional and regulatory implications which can facilitate further integrative analyses.
A number of eQTL studies have been conducted for different human diseases in order to understand the effect of the associated variants on the candidate transcripts. For example, prostate cancer risk SNPs were analyzed in 471 prostate tissues [10], identifying 51 significant eQTLs associated with 88 genes. In a subsequent study, an eQTL mapping approach was applied to human inflammatory bowel disease (IBD) in five primary immune cell types. This study involved 91 patients with active inflammatory disease, 46 with antineutrophil cytoplasmic antibody-associated vasculitis, and 43 healthy controls. As a result, novel eQTLs in 34 IBD-associated loci were reported [11]. Other eQTL studies were performed at a genome wide scale in different control human tissues such as liver [12], blood and brain [13], and monocytes [14]. The outcomes of these studies highlighted the importance of tissue-specific eQTLs and splicing QTLs in human disease.
In the context of brain disorders, a number of datasets are now publicly available to look at gene expression on a regional and temporal basis [15,16]. The Braineac dataset integrates whole genome genotype and transcript expression data from 134 human control brain samples across 10 brain regions [7,17], and its eQTL results allow examination of genes implicated in PD by GWA analyses. Three specific PD loci, alpha synuclein (SNCA) [18], microtubule-associated protein tau (MAPT) [19], and leucine-rich repeat kinase 2 (LRRK2) [4], have been investigated separately in detail and published previously. In addition, targeted eQTL approaches have been applied in the context of PD by Latourelle and coworkers. Transcript expression profiling was performed on prefrontal cortex from 23 PD cases and 24 controls at 5 GWAS-identified loci (SNCA, MAPT, GAK/DGKQ, HLA, and RIT2). The study identified multiple eQTLs, including both cis-acting SNP effects and trans-effects [20].
In the current study, we hypothesized that the PD GWA risk SNP rs199347, tagging the genomic location Chr7:23,145,089-23,314,256 bp (GRCh37) (Fig. 1), segregates as an eQTL with some or all of the transcripts at this genomic region. This hypothesis was tested by exploring all eQTLs of the five transcripts at this locus (GPNMB, KLHL7, KLHL7-AS1, NUPL2, and AC005082.12) using the Braineac microarray dataset [17]; a recent CAGEseq dataset [21]; the GTEx Portal, which uses an RNA sequencing platform [22]; and NCBI's Phenotype-Genotype Integrator (PheGenI) [6]. The analysis performed here does not cover epigenetic effects; however, it is a comprehensive analysis of the reported GWAS signal (rs199347) in different brain tissues as well as other human tissues using multiple datasets, including our in-house dataset (Braineac), and as such is a step forward from the study by Nalls et al. [3].

Results
In this study, we examined the functional effect of the PD risk SNP rs199347 on the mRNA expression levels of the transcripts at the Chr7p15.3 locus. This was performed through an eQTL approach that integrated expression data from different human brain tissues in Braineac, other human tissues accessed via the GTEx portal (refer to Table 1 and Supplementary Table 1 for further details on the human tissues in GTEx), and eQTL data generated from human frontal lobe tissues based on cap analysis gene expression sequencing (CAGEseq).
Firstly, expression profiles for the five transcripts in the locus under consideration (GPNMB, KLHL7, KLHL7-AS1, NUPL2, and AC005082.12) (Fig. 1) were compared using the Braineac and GTEx datasets (refer to Fig. 2 for the locus details). The expression profiles for the antisense KLHL7-AS1 and long non-coding RNA (AC005082.12) species were obtained from GTEx only, as the microarray platform design does not cover long non-coding RNA species. The expression pattern of glycoprotein non-metastatic melanoma protein B (GPNMB) from the Braineac dataset revealed significant regional expression differences (2.4-fold change (FC), p = 4.5 × 10^−43; refer to the "Materials and methods" section for further details), with temporal cortex (TCTX) showing the highest expression and cerebellum (CRBL) showing the lowest expression (Fig. 3a). The same pattern was confirmed in the GTEx data, which showed the lowest expression in CRBL and the highest in cortical regions. Due to differences between the precise regions assessed in the different datasets, comparisons were performed between the most closely matching brain regions in the Braineac and GTEx datasets (Table 2). The expression level difference in the Kelch-like protein 7 (KLHL7) transcript observed in the Braineac dataset was 1.5 FC, with CRBL exhibiting the highest expression and white matter (WHMT) showing the lowest expression (p = 1.2 × 10^−31; Fig. 3b); a similar pattern was observed in the GTEx dataset, with high expression in the cerebellar hemisphere in comparison with other brain regions. For the nucleoporin-like protein 2 (NUPL2) transcript, a 1.2 FC was observed in Braineac, with substantia nigra (SNIG) being the lowest and TCTX being the highest (p = 8.7 × 10^−13) (Fig. 3c). However, this was not the case in GTEx, as CRBL showed the highest expression level followed by cortical regions.
This discrepancy is understandable, as expression variability can arise from differences in platforms, dissection and extraction protocols, and quality control between the two datasets.
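The fold changes quoted above are ratios on the linear scale, whereas microarray intensities are usually handled on a log2 scale. As a minimal sketch, assuming the fold change is taken between the mean log2 expression of two regions (the function name and the numbers below are hypothetical and are not taken from Braineac), the conversion could look like this:

```python
from statistics import mean


def fold_change(log2_a, log2_b):
    """Linear-scale fold change of group A over group B.

    Assumption: inputs are log2-scale expression values, so a difference
    of means on the log2 scale maps to a ratio of 2 ** difference.
    """
    return 2 ** (mean(log2_a) - mean(log2_b))


# Hypothetical log2 intensities for two brain regions (illustration only).
tctx = [9.3, 9.1, 9.2]
crbl = [8.0, 7.9, 7.8]

fc = fold_change(tctx, crbl)
print(f"fold change TCTX vs CRBL: {fc:.2f}")
```

A log2 difference of 1 therefore corresponds to a doubling, and a difference of about 1.3 corresponds to a fold change close to the 2.4-2.5 range.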
Secondly, the eQTL analyses for the PD risk SNP rs199347 in relation to the five transcripts were studied in detail using the abovementioned datasets. The investigation of rs199347, which is located in introns 2-3 of the GPNMB gene, showed significant eQTLs in several brain regions in all four eQTL datasets (Braineac, CAGEseq, GTEx, and PheGenI). In Braineac, rs199347 was recorded as a significant eQTL with the GPNMB transcript (p = 8 × 10^−13, average across all regions). The SNP is associated with increased mRNA expression with the major allele (AA) in temporal cortex (TCTX), frontal cortex (FCTX), hippocampus (HIPP), putamen (PUTM), occipital cortex (OCTX), and CRBL in normal individuals (Fig. 4a and Table 1). In the CAGEseq dataset, rs199347 was a significant eQTL in FCTX (p = 1.6 × 10^−11), also for the major allele AA (Table 3 and Supplementary Table 3). In GTEx, rs199347 was also a significant eQTL in the brain (FCTX, caudate, HIPP, PUTM, and CRBL) and in heart and skin, showing the same mode of effect (MOE) on expression (+). Table 1 provides further details and a summary of the eQTLs, mode of effect, and false discovery rate (FDR) values for GPNMB in the four eQTL datasets. It is worth mentioning that several other SNPs were found to be significant eQTLs for GPNMB. For example, rs156425, rs6967526, and rs858272 were significant in several brain regions across all datasets studied. rs156425 was observed with the highest significance in FCTX, TCTX, OCTX, and PUTM, while rs6967526 and rs858272 showed high significance in FCTX, TCTX, OCTX, PUTM, and HIPP. The fact that these SNPs are in the same linkage disequilibrium (LD) block as rs199347 explains their significance as eQTLs in similar brain regions (refer to Table 1 for further details).
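The mode of effect (+/−) reported for an eQTL can be read as the sign of the slope when expression is regressed on the number of copies of the major allele. The sketch below is an illustration under that assumption only: it uses a plain least-squares slope without covariates, the helper names `eqtl_slope` and `mode_of_effect` are hypothetical, and the genotype and expression values are invented. It is not the Matrix eQTL implementation used in the study.

```python
def eqtl_slope(major_allele_counts, expression):
    """Least-squares slope of expression on major-allele count (0, 1, or 2)."""
    n = len(expression)
    mean_g = sum(major_allele_counts) / n
    mean_e = sum(expression) / n
    sxy = sum((g - mean_g) * (e - mean_e)
              for g, e in zip(major_allele_counts, expression))
    sxx = sum((g - mean_g) ** 2 for g in major_allele_counts)
    if sxx == 0:
        raise ValueError("genotype shows no variation across samples")
    return sxy / sxx


def mode_of_effect(slope):
    """'+' if expression rises with the major allele, '-' if it falls."""
    if slope > 0:
        return "+"
    if slope < 0:
        return "-"
    return "0"


# Hypothetical samples: copies of the major allele and log2 expression.
genotypes = [0, 1, 1, 2, 2, 2]
expression = [7.8, 8.1, 8.0, 8.5, 8.4, 8.6]

slope = eqtl_slope(genotypes, expression)
moe = mode_of_effect(slope)
print(f"slope per major allele: {slope:.2f}, MOE: {moe}")
```

In a full analysis the slope would be accompanied by a test statistic and adjusted for covariates such as batch, sex, and ancestry components; only the direction of effect is shown here.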
Thirdly, deeper analysis revealed that the rs199347 SNP is associated not only with increased expression of the GPNMB transcript in brain but also with altered expression of four other transcripts at this locus, namely KLHL7, KLHL7-AS1, NUPL2, and AC005082.12 (Tables 3 and 4 and Supplementary Table 2). In GTEx, these transcripts demonstrated significant eQTL associations, with decreased expression of KLHL7 mRNA in thyroid and brain (nucleus accumbens) and increased expression in heart and skeletal muscle, with FDR values ranging from 2.4 × 10^−06 to 3.7 × 10^−10 (Table 3 and Supplementary Table 2); although a similar expression effect was observed in Braineac, this eQTL did not pass the multiple correction FDR threshold (Fig. 4b). Most importantly, rs199347 shows the most significant eQTL associations with KLHL7-AS1 (antisense RNA 1), with a decreasing effect on expression in 41 tissues (FDR ranging from 1 × 10^−13 to 1 × 10^−39) such as heart, lung, and many others, including brain regions (anterior cingulate cortex, HIPP, caudate, CRBL, FCTX, PUTM, and cortex) with less significant association (FDR ranging from 1 × 10^−06 to 1 × 10^−13), and an increasing effect on expression in immune system tissue (spleen; FDR 3 × 10^−10) (refer to Table 4 and Supplementary Table 2). In addition, rs199347 is associated with expression of a long non-coding RNA (AC005082.12) in tissues other than brain. This long non-coding RNA is located 29 kb 5′ to the GPNMB transcript (Fig. 2, Table 4, and Supplementary Table 2).

Fig. 1 Regional association plot for the rs199347 SNP at the Chr7p15.3 locus from the discovery phase. The plot shows the region ±1 Mb around the most significant SNP from the PD GWAS study and the five transcripts in the locus investigated in this study. This locus was named the GPNMB locus. Only the five transcripts at this locus are shown in this figure. Figure modified from Nalls et al. [3]
It is important to mention that the expression profile and eQTL analyses for KLHL7-AS1 and the long non-coding RNA (AC005082.12) could not be obtained from Braineac, as it is a microarray platform and probes specific for these transcripts were not included in the array design. Therefore, the comparison between GTEx and Braineac could not be performed. Finally, the SNP rs199347 is an eQTL with the NUPL2 transcript and shows increased expression in 24 tissues including brain, heart, lung, spleen, and skin (FDR ranging from 4.9 × 10^−06 to 1.2 × 10^−19) in GTEx (Table 3 and Supplementary Table 2). A similar pattern of increasing mRNA expression of NUPL2 in relation to rs199347 was also observed in the FCTX, CRBL, and PUTM in Braineac, but it did not pass the multiple test FDR correction (Fig. 4c). In the CAGEseq dataset, rs199347 was a significant eQTL in FCTX (FDR = 3.6 × 10^−05), also for the major allele AA (Table 3 and Supplementary Table 3). An additional eQTL (rs1474347) with NUPL2 was revealed in lymphoblastoid cells, with FDR 1 × 10^−04.

Table 1 footnote: The table shows information extracted, summarized, and compared from the Braineac, CAGEseq, GTEx, and PheGenI datasets. rs199347 is reported as a GWAS SNP for PD, and it is a significant eQTL mostly in brain, specifically in cortical regions. GPNMB eQTLs are found predominantly in brain, followed by heart and skin (for more details about other tissues and other SNPs, see Supplementary Table 1). Low numbers of less significant QTLs are reported in other tissues, such as gastrointestinal tissues. No eQTLs were detected in the other 21 human tissues that GTEx tested, such as liver and kidney. Other reported SNPs are significant eQTLs in the three datasets, and they are in the same linkage disequilibrium (LD) block as the SNP of interest, rs199347. It is worth noting that the different datasets reported the same effect of rs199347 on GPNMB expression. MOE is the mode of effect. The (−) and (+) indicate the mode of effect of the QTL on expression, either increase (+) or decrease (−), in association with the major allele. The p value is the unadjusted value of the eQTL. False discovery rate (FDR) is the adjusted p value. The FDR was calculated within each tissue. The Braineac and CAGEseq FDR threshold is 1%. The GTEx and PheGenI FDR threshold is 5% (for more details, please see Supplementary Tables 1 and 3)

Discussion
Over the past two decades, GWA studies have revolutionized our understanding of common genetic variation and helped us to map genomic loci that are associated with increased risk for common human disease. The majority of these risk variants, however, are not associated with coding changes in expressed proteins [23], and a major challenge for the research community is to identify and understand the subtle functional consequences of non-coding genetic variation linked to disease risk in the human genome. These functional modifications can be via altered expression, splicing, and methylation patterns of targeted transcripts and proteins that can be localized in specific tissues, regions or cells, and at specific time points in development or ageing. GWA data provides, therefore, only the starting point in terms of understanding the functional impact on transcripts and proteins in the context of disease etiology. One approach to achieve greater understanding of the link between genomic variation and functional consequence is to combine GWAS and multiple eQTL studies to understand the functional effects of risk loci and provide further information about the link between genetic association and cellular mechanisms [24]. Previous analysis of eQTL results from the Braineac resource revealed that 17.4% of GWAS SNPs associated with brain-related traits were functional eQTLs [7]. A number of other studies have used same approach by overlapping GWAS and whole genome eQTL results for different human diseases to prioritize targeted loci/transcripts for further biological experiments [2,10,21].
Applying an eQTL analysis approach can shed light on variation in gene expression associated with PD and help to develop our understanding of disease etiology. Both the Braineac and GTEx gene expression datasets revealed differential expression levels between different brain regions (Braineac) and other human tissues (GTEx) for the named transcripts at the chromosome 7 PD association locus. The data presented above demonstrate that the risk SNP rs199347 is an eQTL with the five transcripts we investigated at this locus (GPNMB, KLHL7, KLHL7-AS1, NUPL2, and AC005082.12) at different significance levels in different brain regions, with cortex and PUTM being the highest, as well as in other human tissues such as heart and skin. It is important to note that this genomic locus on Chr7p15.3 (∼169 kb) is in a high LD block structure based on the HapMap project [25], a fact further emphasized by the spread of genome wide significant SNPs identified in the PD GWAS and displayed in Fig. 1. It is therefore challenging to dissect and specify from which gene/transcript the signal is driven, although the increase in the expression of GPNMB in multiple datasets (Braineac, CAGEseq, and GTEx), and the localization of the most significant risk SNP at the locus to the GPNMB gene, suggest that GPNMB is the most logical candidate coding gene in the Chr7p15.3 locus. These data, however, do not exclude potentially important functional roles for the other transcripts, antisense, and sense non-coding RNA species within this locus. GPNMB revealed brain-specific eQTLs in most brain regions, which are reported and confirmed in both datasets (refer to Table 1). In addition, in light of a recent study that linked PD risk loci to the immune system [23], it is notable that no eQTLs were observed for this transcript in any immune system tissues (e.g., spleen and lymphoblastoid cells) in any of the three datasets. For NUPL2 and KLHL7, only eQTLs in brain and other human tissues from CAGEseq and GTEx passed the FDR threshold (refer to Table 3). KLHL7-AS1 revealed the most significant eQTLs in brain tissues in the GTEx dataset (refer to Table 4). It is worth noting that the KLHL7 transcript demonstrates a significant eQTL in only 5 out of 44 tissues in GTEx, while KLHL7-AS1 shows significant associations with the risk SNP in 43 tissues with an opposite effect on expression. This supports a role for the antisense RNA as a key regulator of KLHL7 in diverse tissues and demands more consideration in future studies to understand its interaction with other transcripts in greater detail. In addition, expression of the long non-coding RNA AC005082.12 was increased in a range of human tissues (although notably not brain tissue) as an eQTL associated with rs199347 (refer to Table 4).

Fig. 3 Brain region abbreviations: CRBL cerebellum, OCTX occipital cortex, FCTX frontal cortex, TCTX temporal cortex, SNIG substantia nigra, WHMT white matter, HIPP hippocampus, PUTM putamen, THAL thalamus, MEDU medulla. This plot shows that GPNMB expression in TCTX is higher by 2.4-fold change (FC) compared with CRBL. B Box plot of mRNA expression levels for KLHL7 in 10 brain regions, from microarray experiments on a log2 scale (y axis). This plot shows that KLHL7 expression in CRBL is higher by 1.5 FC compared with WHMT. C Box plot of mRNA expression levels for NUPL2 in 10 brain regions, from microarray experiments on a log2 scale (y axis). This plot shows that NUPL2 expression in TCTX is higher by 1.2 FC compared with SNIG. Whiskers extend from the box to 1.5 times the inter-quartile range
Interestingly, the rs199347 eQTL differs in its effects on mRNA expression patterns in brain tissues: the major allele is associated with higher expression of GPNMB, KLHL7, and NUPL2, but with decreased expression of KLHL7-AS1. The data reported by Nalls et al. indicated that rs199347 is associated with increased expression of NUPL2 and decreased methylation of GPNMB in FCTX and CRBL brain regions [3]. Previous studies compared single-cell-type specific expression patterns for human GPNMB in mouse astrocytes, neurons, oligodendrocyte precursor cells (OPC), oligodendrocytes, microglia, and endothelial tissues, demonstrating that GPNMB is highly expressed in glial cell populations, while expression in neurons is minimal. This calls for further human single-cell expression studies (Supplementary Fig. 1) [26], which would aid in building on the existing knowledge regarding cell-specific functional mechanisms in PD. These observations suggest a complex role of the eQTL that could be transcript, tissue, cell type, and species specific, and demand further investigation of possible functional interactions between these coding transcripts and antisense and sense non-coding RNA species in the brain.
A confounding factor when interpreting genome wide association data is that reported associations can be skewed by population-specific aspects of the results. It is of note that a GWAS conducted in PD, amyotrophic lateral sclerosis (ALS), and multiple system atrophy (MSA) cohorts from a Han Chinese population reported no association with PD for rs156429, which is located in intron 6-7 of the GPNMB gene (Chr7:23,266,401). This SNP is in strong LD with rs199347, suggesting either that the association is population specific or that meta-analyses in large cohorts are needed to rule out false negative results [27].
In terms of biological roles, GPNMB is a transmembrane glycoprotein of unknown function. It has been reported to have a potential neuroprotective role in the spinal cord of an ALS mouse model and showed a high protein expression level in the CSF of human ALS patients [28]. Intriguingly, GPNMB mRNA and protein expression have also been linked to Gaucher's disease [29] and Niemann Pick type C [30], two lysosomal storage disorders. The former has important genetic links with PD, reinforcing a potential link between this protein and PD. Equally of interest is a role for GPNMB in the severity of IBD models [31], as other PD-linked genes (notably LRRK2) have demonstrated a phenotypic overlap with IBD [4]. A number of studies have linked increased expression of GPNMB to tumors, and indeed, GPNMB is being used as a potential binding partner for targeting drugs to cancerous cells [28,32]. Gene ontology suggests that GPNMB plays a role in many molecular functions, for example, integrin binding, protein complex binding, ion binding, and receptor binding. NUPL2 is involved in nuclear export signal receptor activity, mRNA transport, and establishment of RNA localization. KLHL7 acts as a mediator for protein ubiquitination and modification [33,34].

Fig. 4 A similar pattern was observed in other brain regions, but not as significantly. B Box plot shows KLHL7 expression stratified by rs199347 in 134 brain samples. No significant association was observed after multiple testing correction FDR was applied. C Box plot shows NUPL2 expression stratified by rs199347 in 134 brain samples. The SNP is associated with increased expression in CRBL, TCTX, and FCTX, although no significant association was observed after multiple testing correction FDR was applied. Whiskers extend from the box to 1.5 times the interquartile range
Further biological investigation relating to the role of these genes in cellular pathways and function is vital and could clarify a putative role for one of these genes in association with the eQTL in PD. In summary, the results of this study reinforce a need for greater functional characterization of the biological roles of the genes at this locus in order to determine their potential role in the etiology of PD, with GPNMB prioritized for such characterization. These data also further emphasize the challenges presented by GWA analyses with regard to developing a detailed mechanistic understanding of pathways to disease and highlight the importance of combining genetic approaches with functional analysis and investigations to improve resolution of these issues. The data presented herein suggest that an increased expression of GPNMB in brain tissue underlies the association between PD risk and chromosome 7p15.3. With currently available datasets and analysis techniques, however, it is not possible to exclude alterations in other genes at the locus as the causative link between 7p15.3 and PD. Further experimental investigation into gene expression and functional variation at this locus is, therefore, a priority.

Table 3 footnote: The table shows information extracted and summarized from the GTEx, CAGEseq, and PheGenI datasets for rs199347 in association with the other transcripts in the GPNMB locus, KLHL7 and NUPL2. rs199347 is a significant eQTL with KLHL7 in the basal ganglia in brain and in thyroid, decreasing expression of the transcript with the major allele, while it increases expression of the same transcript in heart and muscles. For NUPL2, this SNP is an eQTL in 24 tissues including adipose, heart, brain, muscle, and lung, with the most significant association in CRBL. MOE is the mode of effect. The (−) and (+) indicate the mode of effect of the QTL on expression, either increase (+) or decrease (−), in association with the major allele. The GTEx FDR threshold is 5% and the CAGEseq FDR threshold is 1% (for more details, please see Supplementary Tables 2 and 3)

Table 4 footnote: The table shows the information extracted and summarized from GTEx datasets for rs199347 in association with the other transcripts in the GPNMB locus, KLHL7-AS1 and AC005082.12. rs199347 is a significant eQTL with KLHL7-AS1 in the brain (cortex, PUTM, and HIPP) and 33 other human tissues with more significant FDR values. The SNP has the same effect in all brain regions, decreasing expression of the transcript with the major allele. However, there is an opposite effect in the spleen. The same effect on the expression of the long non-coding RNA AC005082.12 is also observed in 18 human tissues but not the brain. MOE is the mode of effect. The (−) and (+) indicate the mode of effect of the QTL on expression, either increase (+) or decrease (−), in association with the major allele. The GTEx FDR threshold is 5% (for more details, please see Supplementary

Materials and methods
[3] were taken as potential candidates for expression (eQTL) analysis. eQTL reporting and analysis were performed on several datasets including the in-house dataset, Braineac, which contains brain tissues originating from 134 control individuals collected by the Medical Research Council (MRC) Sudden Death Brain and Tissue Bank, Edinburgh. The dataset contains brain tissues from the following regions: FCTX Brodmann areas 9 and 46; TCTX Brodmann areas 21, 41, and 42; parietal parasagittal (PCTX) Brodmann areas 3, 1, and 2; OCTX (specifically primary visual cortex) Brodmann area 17; HIPP; thalamus (THAL); PUTM; SNIG; medulla (MEDU; specifically inferior olivary nucleus); CRBL; and intralobular WHMT below Brodmann areas 39 and 40. RNA isolation and processing of brain samples were performed and analyzed using Affymetrix Exon 1.0 ST Arrays. In parallel, genomic DNA was extracted and genotyped on the Illumina Infinium Human Omni1-Quad BeadChip.
The QTL analysis was run for each expression profile (either exon level or transcript level) against every genetic marker (either SNP or indel) in Matrix eQTL [35]. Subsequent analyses were conducted in the open source software R. A detailed description of the samples used in the study, tissue processing, dissection, and analysis pipeline is provided in the main published papers for the Braineac dataset [4,7,17]. ANOVA (method of moments) was performed using Partek® Genomics Suite™ to determine differentially expressed transcripts among the 10 regions. The date of array hybridization (batch effects), gender, region, and individual were included as covariates to account for variability that could influence the expression profiles. All p values were corrected for multiple comparisons using the FDR step-up method. The eQTL results were classified by marker type (SNP or indel), expression type (exon or gene/transcript level), and the distance of the SNP to the transcription start site (cis or trans). Then, the FDR was calculated by Matrix eQTL [35] based on the Benjamini-Hochberg method. In essence, this takes into account the multiple tests performed for a single probe, which include all the SNPs within a 1 Mb window of the boundaries of the probe. Only associations with FDR <1% were considered for the subsequent analyses. All data are now publicly available online at http://www.braineac.org/. The eQTLs obtained for the transcripts in the Chr7p15.3 locus in the Braineac dataset were cross-verified in multiple datasets from the GTEx portal and NCBI's PheGenI in brain and other tissues. All the QTL data were downloaded, collected, and summarized in Table 1 and Supplementary Table 1 based on the most significant SNP as eQTL and tissue specificity.
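The Benjamini-Hochberg step-up correction mentioned above can be sketched in a few lines. This is a generic illustration of the procedure, not the Matrix eQTL code, and the p values are hypothetical. Each p value is multiplied by the number of tests divided by its rank, and the adjusted values are then forced to be monotonic from the largest p value downwards.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    m = len(p_values)
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Step up from the largest p value, keeping adjusted values monotonic.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical unadjusted p values for five SNP-probe tests.
p = [0.001, 0.008, 0.039, 0.041, 0.6]
q = benjamini_hochberg(p)

# Keep associations below a 1% FDR threshold, as used for Braineac.
significant = [qi < 0.01 for qi in q]
print(list(zip(p, q, significant)))
```

With these invented inputs only the first test passes the 1% threshold; at a 5% threshold the second would pass as well, which shows how the choice of threshold changes the set of reported eQTLs.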
The GTEx dataset [36] consists of a total of 8555 samples from 53 tissues of 544 donors for which RNAseq was conducted. The dataset includes eQTL analysis for 7051 samples from 44 tissues of 449 individuals, combining genotype data from whole exome and genome sequencing with expression data from microarray and RNA sequencing. eQTL analysis was performed using Matrix eQTL [35]. An FDR threshold of 5% was used to correct for multiple hypothesis testing. Data are available from the public database at http://www.gtexportal.org/home/. Data were downloaded in July 2016 (version 6). It is noteworthy that not all the brain regions in the Braineac and GTEx datasets directly overlap. In these cases, the most relevant and closest region was taken for comparison. See Table 2 for more details.
PheGenI merges the NHGRI-GWAS catalogue data with several databases at NCBI, including Gene, dbGaP, OMIM, GTEx, and dbSNP. The eQTL data consist of 1269 samples from 7 tissues. The data are available at NCBI's PheGenI website (http://www.ncbi.nlm.nih.gov/gap/phegeni) or in the eQTL browser (https://www.ncbi.nlm.nih.gov/projects/gap/eqtl/index.cgi). CAGEseq data were obtained from a previously published study consisting of 119 FCTX samples. eQTL analysis was performed using Matrix eQTL with postmortem interval, age, gender, RNA integrity number, and the first six principal components as covariates. A detailed description of the included samples, library preparation, and analysis pipeline is provided in the main published paper [21].
Expression in single-cell types of human and mouse brain tissues
The expression pattern for the transcripts was studied for eight single-cell types, namely, neurons, astrocytes, oligodendrocyte precursor cells, newly formed oligodendrocytes, myelinating oligodendrocytes, microglia, and endothelial cells from the RNA sequence transcriptome and splicing database of glia, neurons, and vascular cells of the cerebral cortex [26]. The data is publicly available at http://web.stanford.edu/group/barres_lab/brainseqMariko/brainseq2.html.
Conflict of interest The authors declare that they have no conflict of interest.
Ethical approval All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.