Familial analysis reveals rare risk variants for migraine in regulatory regions

The most recent genome-wide association study of migraine increased the total number of known migraine risk loci to 38. Still, most of the heritability of migraine remains unexplained, and it has been suggested that rare gene dysregulatory variants play an important role in migraine etiology. Addressing the missing heritability of migraine, we aim to fine-map signals from the known migraine risk loci to regulatory mechanisms and associate these to downstream genic targets. We analyzed a large cohort of whole-genome sequenced patients from extended migraine pedigrees (1040 individuals from 155 families). We test for association between rare variants segregating in regulatory regions with migraine. The findings were replicated in an independent case-control cohort (2027 migraineurs, 1650 controls). We report an increased burden of rare variants in one CpG island and three polycomb group response elements near four migraine risk loci. We found that the association is independent of the common risk variants in the loci. The regulatory regions are suggested to affect different genes than those originally tagged by the index SNPs of the migraine loci. Families with familial clustering of migraine have an increased burden of rare variants in regulatory regions near known migraine risk loci, with effects that are independent of the variants in the loci. The possible regulatory targets suggest different genes than those originally tagged by the index SNPs of the migraine loci. Electronic supplementary material The online version of this article (10.1007/s10048-020-00606-5) contains supplementary material, which is available to authorized users.


Introduction
Migraine is a debilitating and complex genetic disorder with 1 billion affected individuals worldwide and a life-time prevalence of 15-20% [1]. Given the high prevalence, migraine has a considerable social and economic impact, with annual expenses estimated at €27 billion in Europe alone [2]. Furthermore, the Global Burden of Disease Study (2016) ranks migraine the second highest cause of disability worldwide [3]. Migraine is characterized by episodes of moderate to severe throbbing, unilateral headache, which intensifies with an increase in physical activity, and is accompanied by nausea and increased sensitivity to light and sound [4]. Rare variants with large effect sizes may be implicated with a high migraine risk [5]. The most recent meta-analysis on migraine genomewide association studies (GWAS) identified 44 independent SNPs defining 38 migraine risk loci [6]. However, a GWAS typically assesses common SNPs derived from imputation and genotyping, and thus effect sizes are small. Using next generation sequencing, it is possible to assess rare variants. Such Electronic supplementary material The online version of this article (https://doi.org/10.1007/s10048-020-00606-5) contains supplementary material, which is available to authorized users. rare variants are expected to have a larger effect size, particularly if the variant segregates with disease within families [7]. Common complex disorders, such as migraine, are thought to be influenced by alterations in gene regulation [8,9]. As migraine symptoms occur episodically and can be affected by regulatory factors, such as hormones [10,11], it is very likely that gene dysregulation plays an important role in migraine etiology. SNPs that affect gene regulation or are present in regulatory regions have been linked with migraine [12,13], which further supports the impact of regulatory dysfunction on the disorder. However, most of these SNPs are common with small effect sizes and fail to explain a high migraine risk.
We assessed whether rare variants in regulatory regions in known migraine risk loci have an increased risk burden for migraine. Additionally, we investigated whether the rare regulatory variants had effects that are independent of the variants in the migraine risk loci. We employed a cohort of extended families with familial clustering of migraine and performed whole-genome sequencing to identify rare variants. The findings were replicated in an independent case-control cohort of sporadic migraineurs with no familial history of migraine and controls.

Familial cohort
A total of 155 families with familial clustering of migraine were recruited. The families consisted of 1040 subjects (mean number of individuals per family = 6.7) of which 746 individuals were diagnosed with migraine (mean number of migraineurs per family = 4.8) and 294 had no history of migraine. The smallest families consisted of at least two individuals and at least one migraineur. A proband from each family was initially recruited at the Danish Headache Center, Rigshospitalet-Glostrup, Denmark, and the remaining family members were subsequently recruited, as described elsewhere [14]. All subjects were assessed by a neurologist or a senior medical student trained in headache diagnostics using a validated semi-structured interview [15,16] based on the International Classification of Headache Disorders [4].

Replication cohort
The replication cohort consisted of 2027 sporadic migraineurs (no first-degree relatives with migraine) and 1650 controls. The sporadic migraineurs were recruited and interviewed at the Danish Headache Center using the same procedures as described for the probands of the familial cohort. The controls were recruited for the Danish study of Non-Invasive testing in Coronary Artery Disease (Dan-NICAD) according to procedures described by Nissen et al. [17].

Sequencing
Genomic DNA was extracted from whole blood. All samples from both the familial cohort and the replication cohort were subjected to the same whole-genome sequencing procedures using an Illumina NovaSeq 6000 sequencing platform and S4 flow cells and subsequently subjected to quality control by deCODE genetics, as described elsewhere [18].

Defining genomic regions for analysis
All regulatory regions surrounding the index SNP defining the known, autosomal migraine risk loci [6] were analyzed. We defined the genomic search regions as 1 Mb pairs upstream and downstream of the index SNP (see supplementary Table 1 for regions). In the 2 Mb pair window, the genomic positions (according to the Genome Reference Consortium Human Build 38 GRCh38.p12 (hg38)) of all regulatory regions were annotated using BED files downloaded from the UCSC Table browser [19]. These BED files included insulators [20,21], polycomb group response elements (PREs) [20,21], enhancers [22], transcription factor binding sites (TFBS) [21], CpG islands [23], and promoters [24]. All BED files were downloaded on 02/21/2019 except for the BED file of the PREs (downloaded on 02/28/2019). Any genomic coordinates from the hg19 assembly were subsequently converted to the hg38 assembly. In total, 1079 insulators; 518 PREs; 1651 enhancers; 16,190 TFBSs; 828 CpG islands; and 135 promoters were mapped and subjected to familial association analysis. Within the regulatory regions, only genetic variants with a minor allele frequency (MAF) < 5% were assessed.

Familial association analysis
VCF files of each participant in the study were merged, multiallelic sites were split into a biallelic representation, the positions of the regulatory regions were annotated, and rare genetic variants (MAF < 5%) were isolated. The rarevariant association analysis was carried out with the software of Family Sequence Kernel Association Test (F-SKAT) [25]. Age and gender were used as covariates. The age was defined as the age at the time of the interview (the time of diagnosis). The analysis was performed with all default options in effect, and the resulting p values were controlled for multiple testing using the Bonferroni method. A Bonferroni-corrected p value < 0.05 was considered significant. To assess whether the findings were dependent on the original index SNP of the migraine GWAS, the number of alleles of the index SNP (0, 1, or 2) were applied as additional covariates in a separate rare-variant association analysis with F-SKAT.

Replication
The findings from the familial association analysis were assessed in a case-control design using an independent cohort of sporadic migraineurs with no familial history of migraine and controls. Aggregated data were available on rare variants (MAF < 5%) in the regulatory regions for the controls. Thus, we tested whether the frequency of rare variants was increased in sporadic migraineurs compared to the controls under the null hypothesis of no association with migraine and under a dominant model of penetrance, using a χ 2 test [26]. The resulting p values were controlled for multiple testing using the Bonferroni method. A Bonferroni-corrected p value < 0.05 was considered significant.

eQTL and gene expression analysis
Regulatory regions with an increased burden of rare variants were subsequently assessed using gene expression data and single-tissue cis-eQTL data from the Genotype-Tissue Expression (GTEx) version 8.0 dataset (dbGaP Accession phs000424.v8.p2, 2019-08-26 release). All SNP identifiers were found in the Database of Single Nucleotide Polymorphisms (dbSNP) [27] build 152.

Increased burden of rare variants found in four regulatory regions
We analyzed 155 migraine families (n = 1040) and 37 autosomal migraine risk loci (supplementary Table 1) and found a significant increased burden of rare variants segregating with migraine in 39 regulatory regions (see supplementary Table 2 for regulatory regions and p values). As expected, when analyzing rare variants in regions, that are known to harbor polygenic effects in the known risk loci [6,28,29], we found that the distribution of the test statistics deviated from the null, with a genomic inflation factor of λ = 1.89 (supplementary Fig. 1).
In an independent case-control cohort using sporadic migraineurs with no familial history of migraine (n cases = 2027) and controls (n controls = 1650), assuming a dominant model of penetrance, we verified that four of the regulatory regions had an increased burden of rare variants ( Table 1). The four regulatory regions include a CpG island located near the PHACTR1 migraine risk locus and three PREs located near the KCNK5, ASTN2, and RNF213 migraine risk locus. The CpG island located near the PHACTR1 migraine risk locus associated with the highest migraine risk, having an odds ratio (OR) of 1.54 (Fig. 1). Extensive information of the rare variants identified for the four regulatory regions is available in supplementary Table 3.
Association of migraine and rare regulatory variants independent of known risk loci Subsequently, we tested whether the four regulatory regions with increased burden were correlated with the index SNP of the migraine risk loci. Using the risk allele count of the index SNP as covariate, we found a nonsignificant effect on the results defining each migraine risk locus as additional covariates (Table 2).

Possible regulatory targets identified for three regulatory regions
To shed light on the biological mechanisms regulated by the regulatory regions, the genic regulatory targets were sought to be identified. The CpG island near the PHACTR1 migraine risk locus is overlapping the promoter region of various splice variants of the GFOD1 gene and the non-coding GFOD1-AS1 gene (Fig. 2a). We also found two cis-eQTLs within the CpG island that associated with GFOD1 transcript levels (supplementary Table 4). Analysis of the gene expression in the GTEx database indicates that GFOD1 is expressed across different kinds of tissue and is highly expressed in brain and vascular tissue (Fig. 3a).
PREs can be situated far away from their target genes, in introns, or in the 3′ untranslated regions (UTRs) [30]. Thus,  Table 4). The PRE is situated across two introns and one exon of the KCNK17 gene (Fig. 2b). According to GTEx, KCNK17 is primarily expressed in vascular tissue (Fig. 3b) while KIF6 is mainly expressed in the brain (Fig. 3c).
The PRE near the RNF213 locus harbored cis-eQTLs that associated with the transcript levels of five different genes. These include the nearby genes of CBX2, LINC0207, ENPP7, TBC1D16, and CARD14 (Fig. 2c). We do not note a higher expression of these genes in brain or vascular tissue compared to other tissue types (Fig. 3d-h).
The PRE situated near the ASTN2 locus is located in an intron of the PAPPA gene (Fig. 2d). No significant cis-eQTLs could be found within this PRE.

No increased burden of rare variants in regulatory targets
We assessed whether the possible gene targets of the four regulatory regions had an increased burden of rare variants segregating with migraine. Using a familial association analysis, we did not find a significantly increased burden of rare variants segregating with migraine after Bonferroni correction ( Table 3). The analysis was performed on the protein-coding genes only.

Discussion
Migraine is most often an episodic disorder and can be affected by regulatory factors, such as hormones. It is therefore very likely that gene dysregulation plays a causal role in migraine etiology. Here, we have assessed the hypothesis that an increased burden of rare regulatory variants in known migraine risk loci can be associated with a risk of migraine. We also addressed whether the association is independent of the common risk variants in the loci. We found four regulatory regions with an OR > 1, in which an increased burden of rare variants is independently associated with migraine. We report eight possible regulatory target genes. These genes are different than those in which the index SNPs of the known migraine risk loci resides in.
The CpG island near the PHACTR1 locus is overlapping the promoter region of various splice variants of the GFOD1 gene and the non-coding GFOD1-AS1 gene. As about 70% of all promoters are located in a CpG island-rich area [31], it is highly likely that this CpG island is involved in the regulation of the GFOD1 and the GFOD1-AS1 gene. The regulation of the GFOD1 gene is supported by the existence of two cis-eQTLs within the CpG island, that are associated with GFOD1 transcript levels. Interestingly, the research group of Lasky-Su et al. [32] has reported a SNP in an intron of the GFOD1 gene that associated with ADHD, which is a comorbid disorder of migraine [33,34]. Thus, the findings of this study could help elucidate the association between the disorders. Analysis of the gene expression in the GTEx database indicates that GFOD1 is expressed predominantly in brain and vascular tissue, which are tissue types that have been connected with migraine [35,36]. Transcription of antisense non- Fig. 1 Effect of regulatory regions on migraine risk. The figure presents the effect of the significant regulatory regions on migraine risk. X-axis: OR (dot) with the 95% confidence intervals (lines). Y-axis: The genomic positions (given as chromosome:start:end) and type of regulatory regions and the migraine risk loci in brackets  coding RNAs has been shown to participate in both cis and trans gene regulation [37], and it is thus possible that GFOD1-AS1 plays an important regulatory role on GFOD1.
Variations within CpG islands have been shown to influence DNA methylation patterns. Depending on the increase or elimination of CpG dinucleotides, the mutations may lead to hyper-or hypomethylation and, therefore, an altered transcriptional activity [38]. If CpG islands are hypermethylated, they may cause a stable silencing of the gene. In contrast, a hypomethylation can result in an overexpression [39]. Therefore, such changes of the CpG sites with the CpG island could affect the expression of GFOD1.
Significant cis-eQTLs in the PRE near the KCNK5 locus drive a change in the gene expression of KCNK17 and KIF6. The KCNK17 gene encodes a membrane protein and belongs to the family of two-pore domain potassium (K2P) channels. Of the same K2P channel subfamily is the protein product of the KCNK5 gene [40], in which the index SNP defining the migraine risk locus resides. From a different subfamily is the KCNK18 gene where a frameshift mutation has been found to segregate perfectly in families with migraine with aura [41]. Additionally, previously identified migraine-associated genes include genes encoding potassium channels [41,42]. Cortical spreading depression has been associated with migraine attacks [4], and because cortical spreading depression is characterized by ion influxes and neuronal depolarization, genetic alterations affecting potassium channels could play a role in this. The findings of our study could support that migraine may be a channelopathy and that potassium ion channels influence the disorder.
The PRE near the RNF213 locus harbored cis-eQTLs that act upon five different genes. These include the nearby genes of CBX2, LINC0207, ENPP7, TBC1D16, and CARD14. The protein encoded by the CARD14 gene mediates the activation of transcription factor NF-κB [43]. NF-κB plays a part in induction of nitric oxide production [44], which has been implied as having a key causative role in migraine [45] and can induce headache in healthy subjects [46]. Therefore, inhibition of NF-κB has been proposed as a possible therapeutic approach to treat migraine [47].
PREs are a class of silencers that have a suppressive effect on gene expression, when polycomb group (PcG) proteins are  Table 3 Results from familial association analysis on rare variants segregating with migraine in the possible genic regulatory targets. The name of the migraine risk loci, the regulatory regions, and the corresponding possible genic regulatory target are presented. Additionally, the nominal p value and the Bonferroni-corrected p value of the familial association analysis are displayed Aside from finding a connection between a PcG protein and migraine, we have also found associations between an increased burden of rare variants in three PREs and migraine. It is interesting, that we find multiple connections between an increased migraine risk and PREs and proteins that bind to them. In humans, PREs are known to be influential on human embryonic development and epigenetic memory [48] but the function and regulatory mechanism of action of mammalian PREs is still largely unknown [49]. It is therefore speculative, what effect PREs have on migraine. As more knowledge on PREs is gained in the future, we may be able to understand how their altered regulatory mechanism may affect migraine. PcG proteins have been suggested to act synergistically in order to silence gene expression [50]. Consequently, any mutated PcG binding site within a PRE may cause an altered phenotype. Future research could include determining if the PREs we find associated with migraine harbor PcG bindings sites that have been altered by the rare mutations.
Lastly, we addressed whether only the regulatory regions had a higher burden of rare variants segregating with migraine or if the genes possibly targeted by the regulatory regions had a higher burden of rare variants. We found that none of the protein-coding possible genic regulatory targets had a significantly increased burden of rare variants segregating with migraine. This suggests that a higher burden of rare variants in the regulatory regions, and not their respective regulatory targets, is implicated in the pathophysiology of migraine. This supports that migraine is influenced by gene dysregulation and that regulatory dysfunction has a crucial impact on the disorder.

Conclusion
We report that families with familial clustering of migraine have an increased burden of rare variants segregating with migraine in regulatory regions. The regulatory regions are located near known migraine risk loci but display effects independent of the common variants defining the loci. The possible regulatory targets suggest different genes than those originally tagged by the index SNPs of the migraine loci. The findings support that gene dysregulation plays a crucial, if not causal, role in migraine etiology.
Funding information The study was funded by a grant from Candys Foundation "CEHEAD". The funding body was not involved in the design of the study and collection, analysis, and interpretation of data.
Data availability Summary data supporting the conclusions of this article are available on request.

Compliance with ethical standards
Conflict of interest The authors declare that they have no conflict of interest.
Ethics approval All procedures performed in studies involving human participants were in accordance with the ethical standards of the National Committee on Health Research Ethics (file no. H-2-2010-122) and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards. The study was reported to the data-protection agency.
Informed consent A written informed consent was obtained from all participants.
Abbreviations CI, Confidence interval; F-SKAT, Family Sequence Kernel Association Test; GWAS, Genome-wide association study; K2P, Two-pore domain potassium; MAF, Minor allele frequency; OR, Odds ratio; PcG, Polycomb group; PRE, Polycomb group response elements; TFBS, Transcription factor binding sites Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.