Background

Cluster headache is one of the most painful disorders in the world, with a prevalence of around 0.1% and a clear male predominance [1]. It is characterized by excruciating unilateral orbital, supraorbital or temporal pain attacks, with accompanying cranial autonomic symptoms or restlessness/agitation. The attacks last between 15 and 180 min but may occur up to 8 times per day and take weeks to months to remit. In some patients, the cluster headache presents as a chronic form, with attacks occurring for > 1 year without remission, or with remission periods lasting < 3 months [2]. The clinical features of cluster headache in Asian populations differ from those in Western populations, including a lower prevalence of chronic cluster headache, higher male/female ratio, and lower frequencies of restlessness or aura [3, 4]. Mechanisms underlying these ethnic differences remain unclear.

Despite its tremendous impact on sufferers, the detailed pathogenesis of cluster headache remains enigmatic. Because cluster headache shows significant familial aggregation, attempts have been made to determine its underlying genetic architecture. Hypothesis-driven approaches using candidate gene association studies, however, have not identified replicable signals. The first genome-wide association study (GWAS), which investigated 99 Italian patients with cluster headache, reported suggestive associations with genetic variants in ADCYAP1R1 (ADCYAP receptor type I) and MME (membrane metalloendopeptidase) [5]; however, these findings were not subsequently replicated. Two recent studies provided the first evidence of genome-wide significant variants contributing to the predisposition to cluster headache in European cohorts [6, 7]. The Dutch and Norwegian study identified rs11579212 near RP11-815M8.1, rs6541998 near MERTK (MER Proto-Oncogene, Tyrosine Kinase), rs10184573 near AC093590.1, and rs2499799 near UFL1 (UFM1 specific ligase 1)/FHL5 (four and a half LIM domains 5) [6], whereas the combined United Kingdom (UK) and Swedish cohorts identified rs113658130 near LINC01877/SATB2 (SATB homeobox 2), rs4519530 in MERTK, rs12121134 near LINC01705/DUSP10 (Dual Specificity Phosphatase 10), and rs11153082 in FHL5 as associated with cluster headache [7]. Considering the inter-ethnic variability of clinical characteristics [3], it is uncertain whether these novel loci are replicable in other populations.

To interrogate the genetic architecture of cluster headache in Asians, we performed a two-stage GWAS in a total of 734 clinic-based patients and 9,846 population-based controls. We also established polygenic risk score (PRS) models to differentiate patients from controls and conducted downstream analyses to investigate genes and potential pathogenic mechanisms of cluster headache.

Methods

Study participants and clinical evaluations

This study was a two-stage case–control GWAS, including a discovery cohort and a replication cohort, followed by a combined analysis of both cohorts. All participants were unrelated and of Han Chinese ancestry. Patients with cluster headache were enrolled from the headache or neurological clinics of TVGH and TSGH, Taiwan, as well as the Taiwan Precision Medicine Initiative, from 2007 through 2022. Patients who were recruited after May 2012 were assigned to the discovery cohort and subjects recruited prior to that were assigned to the replication cohort. The control subjects were recruited from the Taiwan Biobank, the largest publicly available genetic database of individuals with East Asian ancestry, which provides adequate coverage of genetic diversity across all Han Chinese [8]. To evaluate the genetic correlation of cluster headache and migraine, we also recruited an independent cohort of migraine patients from TVGH. The inclusion criteria for all participants were (1) age between 20 and 70 years and (2) the ability to fully understand the study objective and willingness to provide written informed consent. Patients were eligible if they were diagnosed with cluster headache or migraine by board-certified neurologists based on the criteria of the International Classification of Headache Disorders, third edition (ICHD-3) [2]. The exclusion criteria for all participants included a history of major systemic illnesses or major neuropsychiatric disorders. Control subjects who had a history of moderate or severe headaches, first-degree relatives with a history of migraine or severe headaches, or any kinship with the patients were also excluded.

Single nucleotide polymorphism (SNP) genotyping

SNP genotyping was conducted with the Affymetrix Axiom Genome-Wide TWB2 Array Plate at the National Center for Genomics Medicine, Academia Sinica, Taiwan. SNP genotypes were called using the Axiom GT1 algorithm. Quality control (QC) criteria were applied to exclude SNPs if they (a) were monomorphic in both cases and controls, (b) had a minor allele frequency < 0.01, (c) had a total call rate < 95% in cases and controls combined, (d) showed significant (P < 0.00001) deviation from Hardy–Weinberg equilibrium in cases or controls, or (e) showed a significant difference in genotype call rates between cases and controls (P < 0.001). For sample filtering, we excluded arrays with generated genotypes for fewer than 95% of loci. Heterozygosity of SNPs on the X-chromosome was used to verify the sex of the samples. PLINK software (version 1.90) [9] was used to identify samples with genetic relatedness, indicating that they were from the same individual (or monozygotic twins) or from first-, second- or third-degree relatives. These determinations were made based on evidence for cryptic relatedness from identity-by-descent status (pi-hat cutoff of 0.125).
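The SNP-level filters (a) through (d) can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names and the genotype-count input format are hypothetical, the Hardy–Weinberg check uses a 1-df chi-square approximation (the source does not say whether an exact test was used), and criterion (e), differential missingness between cases and controls, is omitted for brevity.

```python
import math

def hwe_chi2_p(n_aa, n_ab, n_bb):
    """P-value of a 1-df chi-square test for Hardy-Weinberg equilibrium
    (an approximation; assumed here, not taken from the source)."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        return 1.0
    p = (2 * n_aa + n_ab) / (2 * n)          # frequency of allele A
    q = 1 - p
    if p == 0 or q == 0:
        return 1.0                            # monomorphic: nothing to test
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return math.erfc(math.sqrt(chi2 / 2))     # chi-square survival function, 1 df

def snp_passes_qc(case_counts, control_counts, n_missing,
                  maf_min=0.01, call_rate_min=0.95, hwe_p_min=1e-5):
    """case_counts / control_counts are (AA, AB, BB) genotype counts;
    n_missing is the number of samples without a genotype call."""
    called = sum(case_counts) + sum(control_counts)
    if called == 0:
        return False
    # (c) total call rate in cases and controls combined
    if called / (called + n_missing) < call_rate_min:
        return False
    # (a) + (b) monomorphic SNPs have MAF 0 and fail the MAF filter too
    n_a = 2 * (case_counts[0] + control_counts[0]) + case_counts[1] + control_counts[1]
    freq_a = n_a / (2 * called)
    if min(freq_a, 1 - freq_a) < maf_min:
        return False
    # (d) HWE deviation in cases or in controls
    if hwe_chi2_p(*case_counts) < hwe_p_min or hwe_chi2_p(*control_counts) < hwe_p_min:
        return False
    return True

print(snp_passes_qc((25, 50, 25), (250, 500, 250), n_missing=0))
```

In practice these filters are usually applied with a dedicated tool such as PLINK rather than hand-written code; the sketch only makes the thresholds concrete.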

Genotype imputation analysis

We conducted genotype imputation with IMPUTE2 [11,12,13], using the 1000 Genomes Phase 3 data [10] and TWB whole-genome sequencing data as reference panels. Well-imputed SNPs (information score > 0.8) were retained, followed by systematic QC as described above [14]. For sample filtering, we excluded arrays with generated genotypes for fewer than 95% of loci.

Statistical analysis

Association analyses were carried out by comparing allele or genotype frequencies between cases and controls. The Manhattan plot and quantile–quantile (Q-Q) plots were generated using the “qqman” package in R [15]. Population stratification was assessed using principal component analysis (PCA). The genomic inflation factor (GIF) and the top 10 principal components (PCs) were calculated via Eigenstrat in the genotyping datasets. The observed p values were compared with the distribution of p values expected under the null hypothesis (genomic inflation value λ = 1). Logistic regression under an additive genetic model, adjusted for age, sex, and the top 10 PCs (PC1‒PC10), was used to evaluate the association of SNPs with cluster headache using PLINK (version 1.9) [9]. P < 5 × 10⁻⁸ was designated as GWAS significance and p < 5 × 10⁻⁵ as suggestive GWAS significance. Positional gene mapping and fine mapping of significant loci were performed using LocusZoom [16] and Probabilistic Identification of Causal SNPs (PICS2) [17]. The proportion of variance explained by a given SNP was calculated using Nagelkerke pseudo R².
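As an illustration of the genomic inflation factor mentioned above, the following sketch uses the common median-based definition: the median of the observed 1-df chi-square statistics divided by the median expected under the null (about 0.455). This definition is an assumption; the source does not spell out how λ was computed, and the function name is hypothetical.

```python
from statistics import NormalDist, median

def genomic_inflation_factor(p_values):
    """Median-based lambda: median observed 1-df chi-square statistic
    divided by the null median. Each p-value must lie in (0, 1]."""
    norm = NormalDist()
    # A two-sided p-value corresponds to a z-score of inv_cdf(p / 2);
    # squaring it gives the 1-df chi-square statistic.
    chi2 = [norm.inv_cdf(p / 2) ** 2 for p in p_values]
    null_median = norm.inv_cdf(0.75) ** 2     # about 0.455
    return median(chi2) / null_median

# Evenly spaced p-values mimic the null distribution, so lambda is close to 1.
null_like = [(i + 0.5) / 1000 for i in range(1000)]
print(round(genomic_inflation_factor(null_like), 2))
```

Values well above 1 would suggest inflation from stratification, cryptic relatedness, or polygenicity, which is why the LDSC intercept is often reported alongside λ.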

Gene-based analysis

We performed the Multimarker Analysis of GenoMic Annotation (MAGMA) gene-based association analysis [18] implemented in FUMA [19], which calculates a gene test-statistic (p-value) based on all SNPs located within genes, using imputation data.

Previously reported cluster headache or migraine loci

Genetic loci previously reported to be associated with cluster headache were tested for association in our samples [5,6,7, 20,21,22,23,24,25,26]. We also conducted meta-analysis of these loci by combining the results from previous studies and ours with METAL [27]. As recent GWASs suggested a potential genetic correlation of cluster headache with migraine [6, 7], we also tested the association of migraine-associated loci [28, 29] in the current sample.
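To make the per-locus meta-analysis concrete, here is a minimal sketch of a fixed-effect, inverse-variance-weighted combination of effect estimates, together with Cochran's Q and I² as heterogeneity measures. METAL offers both a sample-size-weighted and an inverse-variance-weighted scheme, and the source does not state which was used, so the weighting here is an assumption; the function name and example inputs are hypothetical.

```python
import math

def fixed_effect_meta(betas, ses):
    """Inverse-variance-weighted fixed-effect meta-analysis of per-study
    log odds ratios (betas) and their standard errors (ses)."""
    weights = [1 / se ** 2 for se in ses]
    total_w = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_w
    se = math.sqrt(1 / total_w)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))      # two-sided normal p-value
    # Cochran's Q and I^2 quantify between-study heterogeneity.
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))
    k = len(betas)
    i2 = max(0.0, (q - (k - 1)) / q) if q > 0 else 0.0
    return {"beta": beta, "se": se, "p": p, "q": q, "i2": i2}

# Two hypothetical studies with discordant effect sizes for the same SNP.
result = fixed_effect_meta([0.0, 0.4], [0.1, 0.1])
print(result["beta"], result["i2"])
```

A high I² for a locus signals that the pooled estimate should be read with caution, because the contributing studies disagree more than sampling error alone would explain.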

Univariate LD-score regression

Linkage Disequilibrium Score Regression (LDSC v1.0.1) was used to estimate the proportion of true polygenic signal as opposed to confounding biases, and to calculate SNP-based heritability [30]. We estimated LD Scores from 1,446 samples in the Taiwan Biobank whole-genome sequencing data using an unbiased estimator of r² with 1-cM windows, singletons excluded and no r² cutoff. Heritability estimates were converted to the liability scale assuming a population prevalence of cluster headache of 0.1%.
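The conversion to the liability scale can be sketched with the transformation commonly used for ascertained case–control samples: h²(liability) = h²(observed) × K²(1 − K)² / [P(1 − P) z²], where K is the population prevalence, P is the proportion of cases in the sample, and z is the standard normal density at the liability threshold. That the authors used exactly this formula, and that P equals the case fraction of the combined cohort (734 of 10,580), are assumptions; the function name is hypothetical.

```python
from statistics import NormalDist

def observed_to_liability_h2(h2_obs, population_prevalence, sample_prevalence):
    """Convert observed-scale SNP heritability from a case-control sample
    to the liability scale."""
    norm = NormalDist()
    k, p = population_prevalence, sample_prevalence
    threshold = norm.inv_cdf(1 - k)           # liability threshold for prevalence k
    z = norm.pdf(threshold)                   # normal density at the threshold
    return h2_obs * (k * (1 - k)) ** 2 / (p * (1 - p) * z ** 2)

# Inputs taken from the text: observed-scale h2 of 16.51% (reported in Results),
# assumed prevalence of 0.1%, and 734 cases among 734 + 9,846 participants.
n_cases, n_controls = 734, 9846
h2_liab = observed_to_liability_h2(0.1651, 0.001, n_cases / (n_cases + n_controls))
print(round(h2_liab, 3))  # about 0.225, in line with the 22.5% reported in Results
```

Because the assumed prevalence enters through both K and z, the liability-scale estimate is sensitive to it; a different prevalence assumption would shift the result.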

Polygenic risk score derivation

Four sets of polygenic risk scores were developed to discriminate patients from controls. In the first set, we computed a genome-wide PRS derived from the whole-genome imputation data following a previously reported method [31]. We used the GWAS of the discovery cohort (i.e., the training set) to generate the estimated regression coefficient, its standard error, and the associated p-value (\({\widehat{\beta }}_{j}\), \({\widehat{\sigma }}_{j}\) and \({p}_{j}\)) for each SNP j based on univariate logistic regression analysis. Among SNPs that passed QC in the GWAS analysis, those with MAF less than 1% and indels were excluded. To calculate the PRS, we used the standard clumping and thresholding (C + T) method [32] with two hyperparameters for model building: the correlation cut-off \({r}^{2}\) and the p-value threshold p. The optimal PRS model was chosen as the pair (\({r}^{2}, p\)) that maximized the prediction performance, measured as the area under the receiver-operator characteristics curve (AUC), in the testing (i.e., replication) cohort. In the second set, the PRS was derived from all previously reported loci associated with cluster headache, and the prediction performance was likewise evaluated with the AUC. The 3rd and 4th sets of PRS were derived from the significant SNPs of the two previously reported European cluster headache GWASs [6, 7], respectively.
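The C + T procedure and the grid search over (\({r}^{2}, p\)) can be sketched as follows. This is a toy illustration under stated assumptions rather than the authors' implementation: clumping is done greedily on a precomputed pairwise r² lookup, genotypes are plain allele dosages, and all SNP names, effect sizes, and data are hypothetical.

```python
def clump(pvals, r2, r2_cut):
    """Greedy clumping: walk SNPs from most to least significant and keep a
    SNP only if its r2 with every SNP already kept is below r2_cut."""
    kept = []
    for snp in sorted(pvals, key=pvals.get):
        if all(r2.get(frozenset((snp, k)), 0.0) < r2_cut for k in kept):
            kept.append(snp)
    return kept

def c_plus_t(pvals, r2, r2_cut, p_cut):
    """Clumping followed by p-value thresholding."""
    return [s for s in clump(pvals, r2, r2_cut) if pvals[s] <= p_cut]

def prs(dosages, betas, snps):
    """Score = sum over selected SNPs of effect size times allele dosage."""
    return sum(betas[s] * dosages[s] for s in snps)

def auc(case_scores, control_scores):
    """AUC as the probability that a random case outscores a random control
    (ties count one half)."""
    wins = sum((c > k) + 0.5 * (c == k)
               for c in case_scores for k in control_scores)
    return wins / (len(case_scores) * len(control_scores))

def grid_search(pvals, betas, r2, cases, controls, r2_grid, p_grid):
    """Return (best AUC, r2 cut-off, p threshold) over the hyperparameter grid."""
    best = None
    for r2_cut in r2_grid:
        for p_cut in p_grid:
            snps = c_plus_t(pvals, r2, r2_cut, p_cut)
            if not snps:
                continue
            score = auc([prs(d, betas, snps) for d in cases],
                        [prs(d, betas, snps) for d in controls])
            if best is None or score > best[0]:
                best = (score, r2_cut, p_cut)
    return best

# Hypothetical training-set summary statistics and testing-set dosages.
pvals = {"rs_a": 1e-6, "rs_b": 1e-4, "rs_c": 0.3}
betas = {"rs_a": 0.5, "rs_b": 0.4, "rs_c": -0.1}
r2 = {frozenset(("rs_a", "rs_b")): 0.8}
cases = [{"rs_a": 2, "rs_b": 2, "rs_c": 0}, {"rs_a": 1, "rs_b": 1, "rs_c": 1}]
controls = [{"rs_a": 0, "rs_b": 0, "rs_c": 1}, {"rs_a": 1, "rs_b": 0, "rs_c": 2}]
print(grid_search(pvals, betas, r2, cases, controls, [0.1, 0.9], [0.05]))
```

Because the hyperparameters are tuned on the same testing cohort in which the AUC is reported, the resulting AUC may be somewhat optimistic; an additional held-out set would give a cleaner estimate.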

Gene expression, tissue enrichment analysis, and pathway analysis

The tissue-specific expression of the cluster headache-associated genes was explored based on the expression data from the Genotype-Tissue Expression (GTEx) project [33] using the GTEx platform or the MAGMA gene-property analysis in FUMA. We also implemented GIGSEA (Genotype Imputed Gene Set Enrichment Analysis) [34] to infer differential gene expression and interrogate gene set enrichment for the trait-associated SNPs. The gene expression levels were imputed from GWAS summary data of cluster headache by Elastic net (ENet) regression models using MetaXcan [35]. For the pathway analyses of the MetaXcan output, we employed the weighted linear regression model of GIGSEA [34] with empirical p-values incorporating 1,000 permutations for Gene Ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG), and Reactome (REAC) [36,37,38]. Following the guidance of GIGSEA, the Bayes factor was used to correct for multiple hypothesis testing [39]. Stringent thresholds for significant associations were set as empirical p-values < 0.05 and Bayes factor (BF) > 100 in the pathway enrichment analyses in brain tissues. Of note, we selected enriched pathways whose gene sets contain from 15 to 500 genes, as suggested by the GSEA manual.

Results

Characteristics of study subjects

The two-stage GWAS included a discovery cohort consisting of 359 cluster headache patients and 4,786 population-based controls and a replication cohort comprising 375 patients and 5,060 population-based controls. Demographic characteristics of participants are shown in Additional File 1.

Association analysis

After applying stringent QC criteria, we obtained 549,611 SNPs with an average call rate of 99.78% ± 0.25%. Imputation yielded 6,630,979 loci. The Q-Q plot (Additional File 2) and PCA (Additional File 3) showed no significant population stratification. In the combined analysis, the LDSC intercept was 0.9962 (SE 0.0061), indicating little inflation due to factors other than polygenic architecture. In addition, we estimated the SNP-based heritability (h²) of cluster headache at 16.51% (SE 4.42%) on the observed scale and 22.5% (SE 6.03%) on the liability scale.

The two-stage genome-wide analysis using imputation data identified one replicable genomic locus in Calpain 2 (CAPN2), with multiple SNPs reaching GWAS significance in both the discovery and replication cohorts (Fig. 1A and Additional File 4). The combined analysis found two additional loci with GWAS significance in MERTK and SATB2 (Table 1) and one locus with suggestive GWAS significance in CYP2C18/CYP2C19 (Fig. 1A). Of note, indels or singleton SNPs without significant LD with other SNPs were deemed spurious association signals. The regional association plots for the significant loci are shown in Fig. 2. In addition, fine mapping with PICS identified multiple variants with causal probability larger than 0.2, but only rs4653500 (PICS probability = 0.46) has an eQTL on CAPN2.

Fig. 1

Manhattan plots for the GWAS and gene-based analysis. A Manhattan plot for the associations of SNPs with cluster headache. The horizontal axis shows the chromosomal position, and the vertical axis shows the significance of tested markers. The threshold for genome-wide significance (p < 5 × 10⁻⁸) is indicated by a red dashed line. B Manhattan plot of the gene-based test as computed by Multimarker Analysis of GenoMic Annotation (MAGMA). The input SNPs were mapped to 18,460 protein-coding genes. The red dashed line indicates genome-wide significance, defined as p = 0.05/18,460 = 2.709 × 10⁻⁶

Table 1 Basic demographics of participants
Fig. 2

Regional association plots of the three genome-wide significant cluster headache loci (A) CAPN2, (B) MERTK, and (C) SATB2. Each dot represents a single nucleotide polymorphism (SNP) derived from the fine-mapped imputation data. The horizontal axis gives the genomic coordinate and the vertical axis the significance level (-log10 p value). The top SNP for each locus is marked with a purple diamond (GRCh38/hg19). SNPs are colored based on their correlation (r²) with the labeled lead SNP according to the legend. The solid blue line shows the recombination rate from 1000 Genomes EAS data (right vertical axis). The gray dashed line corresponds to p = 5 × 10⁻⁸. Figures were obtained from LocusZoom

Gene-based association testing

Gene-based association testing used the mean association signal from all SNPs within each gene, accounting for LD. Seven genes passed the threshold of statistical significance: MERTK, ANAPC1 (anaphase promoting complex subunit 1), CAPN2, NUTM2F (NUT Family Member 2F), TP53BP2 (Tumor Protein P53 Binding Protein 2), HIATL1 (Hippocampus Abundant Transcript-Like Protein 1), and CYP2C18 (Fig. 1B). The Q-Q plot of the gene-based test computed by MAGMA is shown in Additional File 5.

Previously reported cluster headache loci

Among the previously reported cluster headache-associated SNPs, rs4519530 in MERTK and rs6541998 near MERTK were successfully replicated in our sample using imputation data. Some SNPs were unavailable or not significant in our sample, but the lead SNPs within or near the same genes were found to have suggestive GWAS significance (Table 2). Most of the variants, particularly those in MERTK and FHL5, were more significant in the meta-analysis (Table 2); however, substantial heterogeneity was noted for some of them (Additional File 6).

Table 2 The Association of previously reported cluster headache loci in the current sample

Polygenic risk score

We calculated four sets of PRS. For the first set, which used the C + T method on genome-wide data, the distribution of the PRS for cluster headache in the testing dataset is shown in Fig. 3. Grid search found that the PRS consisting of SNPs selected with r² = 0.1 and p-value = 0.05 was the best model to differentiate patients from controls (AUC = 0.772, 95% confidence interval (CI) 0.747–0.797). The SNPs within the PRS model had a combined explained variance (i.e., Nagelkerke pseudo R²) of 16.97%. In the second set, which used only the previously reported cluster headache loci (listed in Table 3) to derive the PRS, the AUC was 0.633 (95% CI 0.604–0.662). The 3rd set of PRS, derived from the Dutch and Norwegian study [6], had an AUC of 0.536 (95% CI 0.504–0.567), while the 4th set, derived from the UK and Swedish study [7], had an AUC of 0.583 (95% CI 0.553–0.613).

Fig. 3

Genome-wide polygenic risk score (PRS) for cluster headache. The density distributions of polygenic score for cluster headache in testing dataset. The x-axis represents polygenic score, with values scaled to a mean of 0 and standard deviation of 1. Cases: blue; Controls: yellow

Table 3 Lead SNPs of GWAS significant loci in the combined analysis for patients with cluster headache in Taiwan

Genetic correlation between cluster headache and migraine

We recruited a cohort consisting of 3,173 Taiwanese migraine patients and 24,528 controls and estimated that the genetic covariance between cluster headache and migraine on the liability scale was 21.0% (SE 5.7%) and the genetic correlation was 42.91% (SE 16.31%, P = 0.009). We also examined 125 migraine-associated loci reported in previous migraine GWASs [28, 29] in the current sample of cluster headache. Some SNPs, or variants in high LD, within or near the migraine-associated loci were found to have suggestive GWAS significance in the current cluster headache dataset (Table 4).

Table 4 Previously reported migraine loci with suggestive GWAS significance in the current sample

eQTL analysis

eQTL analysis using the GTEx online platform [33] (accessed on June 20, 2022) found that the lead SNP rs1556780 in CAPN2 and rs10188640 in MERTK had significant eQTLs on several genes (Table 1). With MetaXcan to infer the correlation between genetic variants and gene expression, we found that the genetic variants in MERTK were significantly associated with the gene expression in the pituitary gland and caudate (basal ganglia) with a false discovery rate (FDR) of less than 0.001 (Additional File 7).

Functional enrichment and pathway analysis

MAGMA gene-property analyses implemented in FUMA did not demonstrate significantly enriched tissues. GIGSEA biological pathway enrichment analyses based on the MetaXcan gene-level summary statistics found that the pathways were predominantly enriched in the pituitary gland, hippocampus, cerebellum, and basal ganglia. Top significantly associated pathways in these tissues that may be relevant to cluster headache include pathways involved in synaptic transmission, transmembrane protein kinase activity, immune responses, mitochondrial function, and metalloendopeptidase activity (Additional File 8 and Additional File 9).

Discussion

We identified three susceptibility loci, in CAPN2, MERTK, and SATB2, as well as one suggestive locus at CYP2C18/CYP2C19, at the genome-wide level in patients with cluster headache in Taiwan. To the best of our knowledge, this is the first GWAS of cluster headache performed in Han Chinese and the first in Asians. While the replication of susceptibility genes recently identified in GWASs of patients of European ancestry [6, 7] supports the validity of our study, we also identified novel genes, implicating potential inter-ethnic differences. The association effect sizes are relatively large and similar to those observed in European GWASs [6, 7]. These results suggest that some phenotypes of cluster headache might be driven by these selected loci with large effect sizes, although further studies are needed to explore the genotype–phenotype association. In addition, several other risk loci identified in previous studies are of suggestive GWAS significance in our samples, suggesting that future studies with larger sample sizes might validate the associations of these loci with cluster headache. Moreover, the superior discriminative capacity of the genome-wide PRS compared with the PRS composed of known cluster headache-associated loci suggests that additional loci with smaller effect sizes might also contribute to the genetic basis of cluster headache. Nevertheless, the clinical utility of PRS remains to be explored.

The novel gene CAPN2 identified in our study encodes calpain 2, a calcium-regulated non-lysosomal thiol-protease involved in cytoskeletal remodeling and signal transduction. Calpain 1/2 are known to mediate Ca²⁺ influx and the degradation of suprachiasmatic nucleus (SCN) circadian oscillatory protein (SCOP) [40] in SCN neurons, which may contribute to coordinated regulation of circadian rhythms. In addition, calpain 1 and calpain 2 play opposite roles in retinal ganglion cell (RGC) degeneration induced by ischemia/reperfusion injury [41], and degeneration of RGCs could lead to impaired circadian rhythmicity. Another potential implication of CAPN2 is that the most commonly used preventive drug for cluster headache, the L-type calcium channel blocker verapamil, is known to abrogate calpain activation [42]. Furthermore, calpain was found to mediate capsaicin-induced ablation of transient receptor potential vanilloid subtype 1 (TRPV1)-positive trigeminal afferent terminals [43], which may mediate the release of calcitonin gene-related peptide (CGRP) and contribute to the pathogenesis of cluster headache.

We successfully replicated MERTK and SATB2, which were identified in previous cluster headache GWASs [6, 7]. MERTK encodes a protein that belongs to the MER/AXL/TYRO3 receptor kinase family. In addition to regulating microglia-mediated neuroinflammation and astrocyte-mediated neuronal synaptic remodeling, as proposed in previous studies [6, 7], MERTK functions in the retinal pigment epithelium as a regulator of photoreceptor phagocytosis, which is a circadian-regulated process indispensable for vision [44]. Mutations of MERTK cause degeneration of photoreceptors, which in turn leads to the loss of photic and circadian control and reduced production of melanopsin mRNA in RGCs [45]. As melanopsin-expressing RGCs are responsible for circadian photoreception and project to the SCN and hypothalamus [46], MERTK may thus indirectly participate in the pathogenesis of cluster headache. In addition, enrichment analysis showed significant expression of MERTK in the pituitary gland, which could potentially contribute to the altered hormonal expression in cluster headache [47]. SATB2 encodes a DNA-binding protein that specifically binds nuclear matrix attachment regions and is involved in transcription regulation and chromatin remodeling. Previous GWASs suggested that it may be associated with hypothalamic dopaminergic neurons and structures responsible for nociceptive processing [6, 7]. SATB2 is also known as an important transcription factor of RGCs in primates [48]. In addition, both GWAS and gene-based association analysis suggested CYP2C18 as a potential novel susceptibility gene for cluster headache. Interestingly, CYP2C18 is a cytochrome P450 monooxygenase involved in retinoid metabolism [49], and retinoic acid is a molecular trigger of RGC hyperactivity [50]. Taken together, the top implicated genes in our study may modulate the function of RGCs; however, how this could be involved in cluster headache pathogenesis, particularly circadian rhythmicity, requires further validation.

Gene-based association testing also identified several possible additional loci. Among these, ANAPC1 has also been identified in another cluster headache GWAS using gene-based analysis, with enriched expression in the brain, particularly in neurons [7]. In addition, among the genes that may be affected by variants in MERTK, i.e., those with significant eQTLs, TMEM87B (transmembrane protein 87B) and FBLN7 (Fibulin 7) have been identified in prior GWASs via gene-based analysis or eQTL analysis [6, 7], suggesting their involvement in cluster headache. Moreover, we found evidence of suggestive association at loci previously suggested to be involved in cluster headache, including LINC01877, LINC01705, ADCYAP1R1, and MME. Although these genes are functionally plausible for the pathogenesis of cluster headache, future studies with larger sample sizes are needed to validate these findings.

We found considerable heritability of cluster headache and migraine and a significant genetic correlation between these two primary headache disorders, which corroborates the clinical observations that both disorders exhibit features of trigeminovascular activation and respond to similar treatments such as triptans or CGRP monoclonal antibodies. Some previously reported migraine loci were also found to have suggestive GWAS significance in our current sample, which may contribute to the shared biology between migraine and cluster headache. In addition, pathway analyses suggested that biological processes associated with synaptic transmission or immune responses may be involved in the pathogenesis of cluster headache. In fact, immunological processes have long been considered important in the pathogenesis of migraine, and we have recently found that HLA class I alleles are associated with clinic-based migraine [51].

Our study has limitations. First, the sample size is relatively small. However, the successful validation of the findings of previous GWASs and the replicable signals in both stages of our GWAS after stringent quality control suggest that our findings are unlikely to be spurious. Second, the identified variants might not be the true causal variants, and it remains unknown whether the expression of the implicated genes is truly altered in patients, even though eQTL analysis suggested that these variants could affect gene expression in certain tissues. Finally, although the implicated genes were plausibly relevant to the pathogenesis of cluster headache in in silico functional analysis, we were unable to validate the function of these genes in vivo at the current stage owing to the limited availability of biological samples, such as brain tissue or retina, from the patients. Further in-depth functional analysis at the molecular level, at least in a subset of the patients, is needed to increase the credibility of the findings.

Conclusions

In conclusion, our data provide insights into the genetic architecture and potential mechanisms of cluster headache, particularly in Asians, who have been under-represented in previous genetic studies of cluster headache. Future GWASs with patients from multiple ethnicities are required to identify shared and independent genetic determinants of cluster headache across different populations and to better understand the molecular mechanisms of this debilitating disorder.