Introduction

Headache is one of the commonest symptoms that present to clinicians in general practice or in specialist neurology clinics (Fuller and Kaye 2007). Its lifetime prevalence in individuals is as high as 93% (Boardman et al. 2003). Globally, around 46% of the adult population suffers with an active headache disorder (Stovner et al. 2007). According to the current definitions of the International Headache Society, headaches can be classified into three categories: (1) primary headaches (including migraine, tension-type headache, and trigeminal autonomic cephalalgias); (2) secondary headaches (including headaches attributed to other disorders such as trauma, infection); (3) painful cranial neuropathies, other facial pain and other headaches (Headache Classification Committee of the International Headache Society 2018).

Among all types of headache, tension-type headache is the commonest form, causing over 40% of all headaches in the general population, while migraine is the most disabling type at population level, with a prevalence of around 10% of all headaches (Riesco et al. 2017). It is important to note that an individual can experience more than one type of headache at the same time (Fuller and Kaye 2007).

According to the Global Burden of Diseases 2019 study, headache disorders represent the 14th leading cause of disability-adjusted life years (DALYs) when considering all ages and both genders in 174 causes (GBD 2019 Diseases and Injuries Collaborators 2020). It was estimated in 2003 that migraine alone costs the UK over two billion pounds a year (Steiner et al. 2003). Migraine is still one of the leading causes of disability among over 300 diseases (Steiner et al. 2020).

It has been confirmed that headaches such as migraine are heritable. The single nucleotide polymorphism (SNP) based-heritabilities of migraine and self-reported headache were 0.15 and 0.21 in Caucasians, respectively (Gormley et al. 2016; Meng et al. 2018). Genome-wide association studies (GWAS) have revealed that there are significant genetic components contributing to migraine (Anttila et al. 2010, 2013; Chasman et al. 2011; Freilinger et al. 2012; Ligthart et al. 2011). A GWAS meta-analysis paper consisting of 22 cohorts by Gormley et al. identified 38 genetic loci for migraine (Gormley et al. 2016). Our study based on the UK Biobank resource also revealed 28 risk loci for self-reported headache, of which 14 loci had been previously identified by Gormley et al. (2016) and 14 loci were newly reported (Meng et al. 2018). A recent large GWAS meta-analysis on migraine has suggested 123 migraine-related loci (Hautakangas et al. 2022).

Recently, researchers have been encouraged to perform GWAS meta-analysis on genetically correlated phenotypes, due to the increasing recognition of pleiotropy in GWAS, to boost study power to detect more genetic components (Masotti et al. 2019). Pleiotropy refers to the phenomenon where a genetic variant or a gene has non-zero effect on multiple phenotypic traits and can contribute to genetic correlations among these traits (Stearns 2010). Successful examples of recent GWAS meta-analysis studies on genetically correlated phenotypes have included education and intelligence as well as different hypertension phenotypes (Hill et al. 2019; Zhu et al. 2015).

In a previous study, we reported that the self-reported headache phenotype from the UK Biobank and the self-reported migraine phenotype from the 23andMe were genetically correlated, with a high correlation value of 0.72 (p = 1.66 × 10–68, standard error = 0.04) (Meng et al. 2020). Therefore, we aimed to perform a joint GWAS meta-analysis study of these two different but highly genetically correlated phenotypes with a view to replicating previously identified genetic associations and identifying new associations arising from the increased power of this approach.

Materials and Methods

Cohorts’ Information

The two sets of GWAS summary statistics used in this study were from the GWAS on self-reported headache based on the UK Biobank cohort and the GWAS on self-reported migraine provided by the 23andMe (Gormley et al. 2016; Meng et al. 2018).

The definitions of self-reported headache (UK Biobank) were as follows: cases (N = 74,461), defined as those who self-reported headache symptoms affecting daily lives within last month using the UK Biobank online questionnaire; controls (N = 149,312), defined as those who did not have any pain affecting daily lives within last month (UK Biobank code 6159). The corresponding GWAS analysis was performed using a linear mixed model adjusting for age, sex, nine population principal components, genotyping arrays, and assessment centers (Meng et al. 2018). The dataset contains 9,304,965 SNPs (minor allele frequency > 0.005, imputation score > 0.1).

The definitions of self-reported migraine (23andMe) were: cases (N = 30,465), defined as those who self-reported a migraine history (diagnosed by doctors or self-diagnosing) using the 23andMe online questionnaire; controls (N = 143,147), those who self-reported having no migraine. The corresponding GWAS was performed using a linear mixed model adjusting for age, sex, and five population principal components (Gormley et al. 2016). The dataset contains 19,023,436 SNPs (containing minor allele frequency and imputation score information for all SNPs). This dataset was also used by Gormley et al. (2016).

All of the participants in both GWAS were of European descent. In addition, as the UK Biobank cohort only recruited within the UK, while the 23andMe mainly recruited from the USA, there was little sample overlap between the two cohorts (linkage disequilibrium score regression intercept = 0.009) (Meng et al. 2020). The detailed cohorts’ information and the statistical methods of the two GWAS can be found in the original papers (Gormley et al. 2016; Meng et al. 2018).

The Preprocessing of the GWAS Summary Statistics

SNPs in both datasets were coded in a forward direction and according to the GRCh37 genome build. In total, 8,500,802 SNPs with minor allele frequency > 0.005 in both datasets were extracted. To ensure the datasets could be jointly meta-analysed, these SNPs were checked for same effect alleles and flipped accordingly in R (https://www.r-project.org/).

The Meta-Analysis Method

The Unified Score-based Association Test (metaUSAT) is a software package for performing GWAS meta-analysis studies on genetically correlated phenotypes (Ray and Boehnke 2018) (https://github.com/RayDebashree/metaUSAT). The metaUSAT software applies a multivariate meta-analysis approach instead of a univariate approach of analysing each related trait separately. Unlike traditional GWAS meta-analysis on a single trait, where several sets of summary statistics on a single trait are combined into a single summary measure for that trait, the multivariate meta-analysis implemented by metaUSAT does not combine the summary statistics; instead, a joint analysis is performed using summary statistics from related traits. It is a statistical inference approach that leverages related traits to provide a p value for the test of no association of any trait with a SNP against the alternative that at least one trait is associated with the SNP. Being a complex data-adaptive approach, the metaUSAT software does not output an overall effect size (Beta) and standard error (SE) values for each SNP. The metaUSAT software is robust to the association structure of correlated traits and potential sample overlap (Ray and Boehnke 2018).

The Annotation Method

The output generated from metaUSAT was uploaded to Functional Mapping and Annotation (FUMA v1.3.6b) for SNP annotation (Watanabe et al. 2017). We used the 1000 Genome Phase 3 reference panel by default and other default values adapted by FUMA in terms of defining lead SNPs and risk loci. FUMA also generates a Manhattan plot and a corresponding Q-Q plot for the meta-analysis result (https://fuma.ctglab.nl/). FUMA uses “maximum distance of linkage disequilibrium (LD) blocks to merge’’ (default value = 250 kb) to determine the number of associated loci and the r2 value (default value r2 ≥ 0.6 to be considered as non-independent) to determine the number of independent significant SNPs. In addition, the gene-based association analysis and the gene-set analysis were performed with Multi-marker Analysis of Genomic Annotation (MAGMA v1.08), which was integrated in FUMA (de Leeuw et al. 2015). In gene-based association analysis, summary statistics of SNPs were aggregated to the level of whole genes, testing the joint association of all SNPs in the gene with the phenotype. In other words, all the SNPs were mapped to 19,436 protein coding genes if the SNPs are located within genes. In gene-set analysis, individual genes were aggregated to groups of genes sharing certain biological, functional or other characteristics. This was done to provide insight into the involvement of specific biological pathways or cellular functions in the genetic aetiology of a phenotype. A total of 10,894 gene sets were tested and a competitive test model was applied. Tissue expression analysis was obtained from the Genotype-Tissue Expression (GTEx) project (https://www.gtexportal.org/home/) which was also integrated in FUMA. In the tissue expression analysis, average gene-expression per tissue type was used as gene covariate to test positive relationships between gene expression in a specific tissue type and genetic associations. In addition, regional plots of the suggested new loci were generated by LozusZoom (http://locuszoom.org/).

Results

There were 8,500,802 common SNPs from both cohorts analysed by the metaUSAT software. FUMA reported 38 independent genetic loci across autosomal chromosomes with the LDL Receptor related Protein 1 (LRP1)—Signal Transducer and Activator of Transcription 6 (STAT6)—Short chain Dehydrogenase/Reductase family 9C member 7 (SDR9C7) region in chromosome 12q13.3 being the most significantly associated locus with a leading p value of 1.24 × 10–62 for rs11172113. The Four And A Half LIM Domains 5 (FHL5)—UFM1 Specific Ligase 1 (UFL1) locus in chromosome 6q16.1 was the second most significantly associated, with a p value of 6.57 × 10–39 for rs9486715. A Manhattan plot showing these loci is shown in Fig. 1. A corresponding Q-Q plot is included as shown in Supplementary Fig. 1. Among the 38 identified loci, there were 2228 SNPs that demonstrated an association with genome-wide significance, with p value < 5 × 10–8. Among these SNPs, 113 SNPs were considered as independent associations (r2 < 0.6 with any SNP within the 2228 SNPs) (Supplementary Tables 1 and 2).

Fig. 1
figure 1

The Manhattan plot of the GWAS meta-analysis on headaches (N = 397,385). The dashed red line indicates the cut-off p value of 5 × 10−8

Table 1 summarises the information relating to the 38 associated loci. Among these loci, 25 loci had previously been reported by Gormley et al. (2016), and nine further loci had been separately identified by Meng et al. (2018). Four of the 38 loci were newly suggested. The One Cut homeobox 2 (ONECUT2) gene locus (18q21.31) was the strongest signal among these four new loci, associated with a p value of 1.29 × 10–9 for rs673939. Table 2 summarises the details of the four newly suggested loci. Regional plots of these four new loci are included in Fig. 2.

Table 1 The 38 loci generated by the GWAS meta-analysis study on self-reported headache and self-reported migraine
Table 2 The summary statistics of the four newly suggested loci of headaches
Fig. 2
figure 2

The regional plots of the four new loci. Up left: the ONECUT2 region; Up right: the MAU2 region; Bottom left: the Intergenic region (Near KCNK17, 6p21.2); Bottom right: the ZNF462 region

The gene-based association study identified 51 genes (cut-off p value = 0.05/19,436 = 2.57 × 10–6) that were associated with headaches, with the PR/SET domain 16 (PRDM16) gene showing the strongest association (p value of 7.10 × 10–15). All significantly associated genes are summarised in Supplementary Table 3.

The gene-set analysis found that no specific pathway was significantly associated with headaches after Bonferroni correction (cut-off p value = 0.05/10,894 = 4.6 × 10–6). The top 10 pathways are included in the Supplementary Table 4.

Two types of tissue analysis were performed. The tissue expression analysis on 30 general tissues revealed that both brain tissues and vascular tissues are potentially involved in the disease mechanisms (Fig. 3). The tissue expression analysis on 54 specific tissues also found 11 brain tissues with significant association (p < 0.05/54 = 9.26 × 10–4) (Fig. 4).

Fig. 3
figure 3

Tissue expression results on 30 specific tissue types by GTEx in the FUMA. The dashed line shows the cut-off p value for significance with Bonferroni adjustment for multiple hypothesis testing

Fig. 4
figure 4

Tissue expression results on 53 specific tissue types by GTEx in the FUMA. The dashed line shows the cut-off p value for significance with Bonferroni adjustment for multiple hypothesis testing

We also compared the 28 loci suggested by Meng et al. (2018) with the 38 loci suggested by this study. We noticed that 24 loci reported by Meng et al. (2018) still exist in the current study while four loci dropped out. Fourteen loci showed up in the current study which were not identified by Meng et al. (2018) (Table 3, Supplementary Tables 5 and 6).

Table 3 Four loci reported by Meng et al. (2018) while dropped out in the current study

With the specific permission obtained from the 23andMe, we have included the summary results of the GWAS on self-reported migraine using the 23andMe data in a supplementary file. We also provided a loci comparison table among the four studies (the current study, Gormley et al. 2016; Meng et al. 2018 and 23andMe) (Supplementary Table 7).

Discussion

We performed a GWAS meta-analysis study on two highly genetically correlated phenotypes based on summary statistics from two large GWAS: self-reported headache and self-reported migraine (genetic correlation value = 0.72, p = 1.66 × 10–68, standard error = 0.04). This analysis identified 38 loci associated with headaches, of which 34 had been previously identified (Gormley et al. 2016; Meng et al. 2018) and four were newly suggested loci.

GWAS on complex traits have achieved great success in the past decade (Mills and Rahal 2019). Furthermore, GWAS meta-analysis on same phenotypes from multi-centers and multi-cohorts also improve statistical power to identify genetic variants which otherwise cannot be detected by a single cohort study (Evangelou and Ioannidis 2013). However, the number of cohorts in a genetic consortium will reach a bottleneck when most of the existing cohorts are already included. It becomes more difficult to include extra cohorts into the consortium to achieve a higher study power. Meanwhile, it might also be challenging to fund and allocate resources to increase sample size of cohorts in a genetic consortium. This has led to the development and use of statistical methods that leverage other aspects of a study to increase detection power. Software developed for joint meta-analysis of GWAS on existing correlated phenotypes can improve power with minimum additional resource requirement, particularly as it is now a routine requirement for GWAS summary statistics to be shared publicly after publication (Guo and Wu 2019). Specific software created for this purpose includes Software for Correlated Phenotype Analysis (META-SCOPA), Canonical Correlation Analysis (metaCCA), and Multi-Trait Analysis of GWAS (MTAG) (Cichonska et al. 2016; Mägi et al. 2017; Turley et al. 2018).

In this study, we used a recently developed software called metaUSAT, whose properties have been illustrated using simulated data, and the Type 2 Diabetes Genetic Exploration by Next-generation sequencing in multi-Ethnic Samples (T2D-GENES) and the METabolic Syndrome In Men (METSIM) datasets (Fuchsberger et al. 2016; Stancáková et al. 2009). The software does not generate an overall Beta or standard error for each SNP but it calculates a Z value representing the contribution of each SNP from each cohort or each phenotype. As with Beta values, if Z values are both positive or both negative, then the direction of the SNP effect on the traits is the same. In Table 2, we see that the most strongly associated SNPs in the top nine loci all have the same direction of effect. However, the lead SNP in the Long Intergenic Non-Protein Coding RNA 2210 (LINC02210)—Corticotropin Releasing Hormone Receptor 1 (CRHR1)—Microtubule Associated Protein Tau (MAPT) locus (ranked 10th), although significantly associated, showed a different direction of effect in the two datasets. This might indicate that the role of this locus might be different in these two phenotypes. It is possible that a locus could contribute to non-migraine type headaches while contributing minimally to migraine. However, this assumption definitely needs further lab evidence. Comparing the Z-values could be a novel way to differentiate the genetic impact of certain SNPs in genetically correlated yet different phenotypes (Ray and Chatterjee 2020).

Consistent with previous studies, the most significantly associated locus in the LRP1-STAT6-SDR9C7 region was the strongest locus identified in the meta-analysis (p = 1.24 × 10–62 for rs11172113) (Gormley et al. 2016; Meng et al. 2018). This locus, ranging from 57,244,168 to 57,629,608 in chromosome 12, contained 122 SNPs associated with genome-wide significance, among the 166 SNPs in the output dataset. The LRP1 gene has been well established as a migraine gene (Anttila et al. 2010, 2013). One theory about its possible link with migraine is that the LRP1 protein interacts with the glutamate receptors on neurons while the pathophysiology of migraine has been suggested to be related with the glutamate homeostasis (Andreou and Goadsby 2009). The gene-based association study revealed that the PRDM16 gene was the most significantly associated gene, followed by CRHR1, MAPT, and KAT8 regulatory NSL complex subunit 1 (KANSL1) (Supplementary Table 3). Through the tissue expression analysis, both brain and vascular tissues were indicated as being involved in the mechanisms of headaches. Gormley et al. (2016) found that vascular factors played a main role in migraine, while in our UK Biobank study, we found that neural tissues were major factors in self-reported headache. We, therefore, deduce that for other types of headaches, such as tension-type headache which produces most headaches in the general population, the role of neural tissue is likely to be greater than that of vascular factors. The cortex has demonstrated the strongest link with headache in the tissue expression analysis. It has been reported that migraine is associated with the changes in cortex functions (Barbanti et al. 2020; Charles and Brennan 2010).

In this study, we suggested four new loci which have not previously been reported to be associated with headaches. The ONECUT2 gene region was the most strongly associated among these four loci. The Z values from the headache study (Z = –5.24) and the migraine study (Z = –4.05) were in the same direction. The original p values of the top SNP (rs673939) in this region were found to be 9.00 × 10–8 by Meng et al. (2018) and 4.88 × 10–5 in the 23andMe migraine dataset (Gormley et al. 2016). The ONECUT2 gene, also termed OC‑2, is a newly discovered member of the ONECUT transcription factor family (Jacquemin et al. 1999). ONECUT2 can widely regulate the protein expression associated with cell proliferation, migration, adhesion, and differentiation, thus being involved in the regulation of the development of an organism (Yu et al. 2020). It has been well reported for its associations with multiple cancers. Although we do not know why it is statistically associated with headaches, the gene is expressed in the brain (https://www.ncbi.nlm.nih.gov/gene/9480). It is not uncommon that SNPs can be associated with multiple phenotypes which seem completely unrelated (Solovieff et al. 2013). The MAU2 sister chromatid cohesion factor (MAU2) is a protein-coding gene, which plays an important role when cohesions (chromosome-associated multi-subunit protein complex) try to bind to DNA to carry out a large spectrum of chromatin-related functions, including sister chromatid cohesion, DNA repair, transcriptional regulation, and three-dimensional organization of chromatin (Zhu and Wang 2019). Mutations of MAU2 have been linked with a rare disorder of Cornelia de Lange Syndrome (Parenti et al. 2020). The Potassium two pore domain channel subfamily K member 17 (KCNK17) is the nearest gene to the leading SNP of rs72854120 in the third new locus. Variants of this gene have been reported to be associated with ischaemic stroke, cerebral hemorrhage, and arrhythmia (Friedrich et al. 2014; He et al. 2014). The protein products of Zinc Finger protein 462 (ZNF462) have shown important roles in embryonic development in animal models (Cosemans et al. 2018). Variants of this gene have been reported to contribute to craniofacial and neurodevelopmental abnormalities (Weiss et al. 2017). It is worth noting that the Z values of each leading SNP in the four new loci were all in the same direction in each of the two cohorts. We also noted that Meng et al. (2018) suggested 28 loci associated with self-reported headache; with 24 of these loci still found in the current study while four dropped out (Table 3), which means we found 14 newly suggested loci when performing the meta-analysis with the 23andMe data in the current study. One reason for dropping the four previously suggested loci might be that the studies are based on the genetically correlated samples but are examining different phenotypes. Of note is that the p values of the dropped loci in Meng et al. (2018) could be considered as of marginal GWAS significance, and further work needs to be done to explore their relevance.

Notably, we had particular advantages in this study that would be important to consider if applying these methods to other phenotypes and or samples. One was that the two phenotypes we chose (self-reported headache and migraine) are highly genetically correlated. However, the ability of a study to detect more new variants would be reduced if the two phenotypes are not so highly correlated. Second, our two cohorts were both of mainly European descent and with minimum sample overlap; therefore, we avoided some negative impact (such as increasing type I and type II errors) which could be caused by these factors in the study. There are limitations associated with this approach, being novel in its application. For example, although we successfully addressed phenotypes and datasets which are highly genetically correlated, there are insufficient published studies to allow us to determine the strength of correlation which is required to allow this approach for future studies. This will require our approach to be replicated with other phenotypes and datasets, followed by formal statistical appraisal of the results. Similarly, although it was not directly relevant in our study, consideration will need to be given to applying this approach when there is sample overlapping. It is worth mentioning that the genetic correlation between diagnosed cluster headache phenotype from a mixed UK and Swedish cohort and self-reported headache from the UK Biobank was 0.50 (O’Connor et al. 2021). At the time we conducted this analysis, the four novel loci we suggested were previously unknown. During the preparation stage of this paper, a large GWAS meta-analysis on migraine has been published suggesting 123 migraine-related loci (Hautakangas et al. 2022). Among our four newly suggested loci, three loci have been reported (ONECUT2, MAU2 and ZNF462). Although that study was specifically addressing migraine, rather than the more general headache phenotype that we addressed, this overlap between the new loci suggested in the two studies both helps to confirm the findings of Hautakangas et al. (2022), and the success of our methodological approach. The loci near KCNK17 on chromosome 6p21.2 was not clearly reported and its nearest loci in the paper was potassium two pore domain channel subfamily K member 5 (KCNK5) which is 60 kb away from KCNK17. A recent study on rare variants of migraine showed that significant cis-expression quantitative trait loci (eQTL) in the polycomb response elements (regulatory sites that mediate the silencing of homeotic and other genes) mapped to the KCNK17 (Techlo et al. 2020).

It is also important to note that the control definitions in the two GWAS datasets were different. The controls used in the UK Biobank self-reported headache phenotype reported no pain within previous month, while the controls used in the 23andMe self-reported migraine phenotype could have had pain in body sites other than the head. This mean that genes identified in the UK Biobank cohort may not be specific to headache, but could be more generally associated with pain. Please also note that self-reported headache or migraine phenotypes are different from clinical ascertained phenotypes. Although our results on self-reported phenotypes will provide reference values to other researchers, there could be potential biases because of this and the results should be interpreted with caution in a clinical setting.

Conclusion

In summary, our study suggested four new genetic loci which are associated with self-reported headaches and/or migraine, and shed further light on their potential mechanisms. Further research could attempt a meta-analysis study on GWAS of different types of primary headaches (on the condition that they are reasonably genetically correlated) to identify further genetic components.