Family studies to find rare high risk variants in migraine
Migraine has long been known as a common complex disease caused by genetic and environmental factors. The pathophysiology and the specific genetic susceptibility are poorly understood. Common variants explain only a small part of the heritability of migraine. It is thought that rare genetic variants with larger effect sizes may be involved in the disease. Since migraine tends to cluster in families, a family approach might be the way to find these variants. This is also indicated by the identification of migraine-associated loci in classical linkage analyses of migraine families. A single migraine study using a candidate-gene approach, performed in 2010, identified a rare mutation in the TRESK potassium channel segregating in a large family with migraine with aura, but this finding has since been questioned. Next-generation sequencing (NGS) technologies now provide an affordable tool to investigate the genetic variation in the entire exome or genome. The family-based study design using NGS is described in this paper. We also review family studies using NGS that have been successful in finding rare variants in other common complex diseases in order to argue for the promising application of a family approach to migraine.
PubMed was searched to find studies that looked for rare genetic variants in common complex diseases through a family-based design using NGS, excluding studies looking for de novo mutations, studies using a candidate-gene approach and studies on cancer. All issues of Nature Genetics and PLOS Genetics from 2014, 2015 and 2016 (until June) were screened for relevant papers. Reference lists from included and other relevant papers were also searched. For the description of the family-based study design using NGS an in-house protocol was used.
Thirty-two successful studies, which covered 16 different common complex diseases, were included in this paper. We also found a single migraine study. Twenty-three studies found one or a few family-specific variants (fewer than five), while the other studies found several possible variants. Not all of them were genome-wide significant. Four studies performed follow-up analyses in unrelated cases and controls and calculated odds ratios that supported an association between detected variants and risk of disease. Studies of 11 diseases identified rare variants that segregated fully or to a large degree with the disease in the pedigrees.
It is possible to find rare high risk variants for common complex diseases through a family-based approach. One study using a family approach and NGS to find rare variants in migraine has already been published but with strong limitations. More studies are under way.
Keywords: Next generation sequencing; Common complex disease; Whole genome sequencing; Family approach; Whole exome sequencing; Migraine genetics
Abbreviations
AMD: Age-related macular degeneration
ASD: Autism spectrum disorder
CNV: Copy number variant
GWAS: Genome wide association study
IBD: Identity by descent
LOAD: Late onset Alzheimer’s disease
LOD: Logarithm of the odds
MA: Migraine with aura
MAF: Minor allele frequency
MAMO: Migraine with and without aura
MO: Migraine without aura
NGS: Next generation sequencing
NSCLP: Nonsyndromic cleft lip and palate
SNP: Single nucleotide polymorphism
WES: Whole exome sequencing
WGS: Whole genome sequencing
With a lifetime prevalence of 16%, migraine affects 75 million Europeans. It can be very disabling for the individual and is a large economic burden to society. Unraveling the genetics of migraine is therefore highly relevant. Migraine is a complex disorder caused by several genes and environmental factors [2, 3]. A higher concordance of migraine in monozygotic than in dizygotic twins, and the 1.9- to 3.8-fold higher risk of migraine among first-degree relatives of affected individuals, indicate an important genetic component [2, 3, 4, 5]. Twin studies show a heritability of 34%–65% [5, 6]. The heritability of migraine with typical aura (MA), which affects approximately one third of migraineurs, is higher than that of migraine without aura (MO), and evidence supports that MA and MO have different, though somewhat overlapping, etiologies [7, 8]. In the absence of validated biomarkers, the diagnosis of migraine is based solely on the patient's history.
Explanation of genetic methods and terms
Single nucleotide polymorphism (SNP)
A SNP is a substitution of a single base pair in the genome that occurs in >1% of a population, a so-called common variant [84, 85]. SNPs that occur in <1% of a population are considered rare. On average, there is one SNP for every 0.75–1.91 kb throughout the genome [86, 87]. Many of these reside outside protein-coding areas, and a proportion of them reside in other functional elements. Fewer than 1% of SNPs lead to changes in protein function.
Since the completion of the HapMap project and the 1000 Genomes Project in particular, the vast majority of SNPs and structural variants have been mapped throughout the genome [86, 87, 90, 91, 92]. More than 38 million SNPs have been identified, and these are estimated to constitute more than 95% of all common SNPs. The SNPs known to date are gathered in public databases like dbSNP.
LOD = logarithm of the odds. The LOD score is a measure of the probability that two genetic loci are located close to each other on a chromosome and are therefore likely to be inherited together (be linked). A LOD score > 3 means that the likelihood of the two loci being located close together (and being linked) is more than 1,000 times the likelihood of no linkage.
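The factor of 1,000 follows directly from the base-10 logarithm. As an illustration only, the following minimal Python sketch computes a two-point LOD score under the simplest textbook assumption of phase-known, fully informative meioses. The function name `lod_score` and its parameters are hypothetical and are not taken from any of the reviewed studies; real pedigree analyses use dedicated linkage software and handle unknown phase, missing genotypes and incomplete penetrance.

```python
import math


def lod_score(n_meioses: int, n_recombinants: int, theta: float) -> float:
    """Two-point LOD score for phase-known, fully informative meioses.

    Compares the likelihood of the observed recombinants at recombination
    fraction `theta` with the likelihood under no linkage (theta = 0.5).
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie between 0 and 0.5")
    if not 0 <= n_recombinants <= n_meioses:
        raise ValueError("n_recombinants must lie between 0 and n_meioses")
    n_nonrecombinants = n_meioses - n_recombinants
    likelihood_linked = (1.0 - theta) ** n_nonrecombinants * theta ** n_recombinants
    likelihood_unlinked = 0.5 ** n_meioses
    if likelihood_linked == 0.0:
        # A recombinant was observed although theta = 0 forbids it.
        return float("-inf")
    return math.log10(likelihood_linked / likelihood_unlinked)


if __name__ == "__main__":
    for n in (5, 9, 10):
        print(n, "non-recombinant meioses -> LOD", round(lod_score(n, 0, 0.0), 2))
```

Under these assumptions, ten meioses without a single recombinant give a LOD score of 10 × log10(2) ≈ 3.01, which is why large multigenerational pedigrees are needed to reach the conventional threshold of 3.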
Genome wide association study (GWAS)
The rationale is to find variants that occur more often than expected by chance in the genomes of individuals with a specific phenotype. It is carried out by an association analysis on genotyped cases and controls. SNPs are the most widely used genetic markers. Genomes are genotyped at the specific points in the DNA where the chosen markers are located, if present. Every SNP represents a block of genes, a haplotype. The alleles in such a block are inherited together more often than by chance and are said to be in linkage disequilibrium. Tag-SNPs present in the sample are tested for association with a phenotype of interest, e.g. migraine, by comparing the frequencies of the SNPs in cases vs. controls.
Next generation sequencing (NGS)
Sequencing of the nucleotides in the entire exome or genome by whole exome or whole genome sequencing (WES or WGS, see below).
Whole exome sequencing (WES)
WES is sequencing of every nucleotide in all exons of a genome. The exons, collectively called the exome, are the protein-coding part of the DNA. This means that the remaining DNA between the exons is not sequenced.
Whole genome sequencing (WGS)
WGS is complete sequencing of the entire genome, which consists of almost 3 billion base pairs. Thus, the non-coding parts of the DNA are also sequenced. Non-protein-coding DNA contains many functional elements that influence gene expression and regulation, e.g. RNA-coding sequences, transcription factor binding sites, regions of modification, regions that influence chromatin structure (chromatin being the DNA, RNA and proteins that chromosomes are made of) and other interacting regions.
Linkage analysis attempts to find chromosome segments that are shared between affected family members. Thus, no prior hypothesis about the loci involved is needed. To screen for shared DNA blocks, markers are needed. Often, sets of microsatellite markers are used. Microsatellites contain a short sequence of base pairs that is repeated a variable number of times. Every microsatellite represents a block of DNA, a haplotype. Thus, having a specific microsatellite means having a specific haplotype. The aim is then to find linkage between a phenotype, e.g. a disease, and a haplotype. If a haplotype segregates with a disease in a family, they are probably linked.
Each gene has a specific position on a chromosome, a so-called locus. A haplotype is a combination of gene alleles on a chromosome that are inherited together more often than by chance. On average, haplotypes span 25,000 nucleotides [84, 85]. Haplotypes are longer in younger, inbred or isolated populations and shorter in old or very outbred populations.
Sanger sequencing is a classic method to sequence every nucleotide in a DNA fragment of interest. The method uses modified nucleotides labeled radioactively or by fluorescence, together with gel electrophoresis. It is more precise, with fewer read errors, than WES/WGS and is used to confirm findings from WES/WGS.
Phasing and imputation
Imputation is performed with different kinds of software and is a way to predict ungenotyped variants located between genotyped variants in haplotype blocks, by using a reference sample in which a greater number of variants are genotyped. Phasing means sorting out which genotypes are located on the paternal and which on the maternal chromosome.
Identity by descent (IBD)
Genomic regions that are inherited identically from parents by more than one child. This means that the siblings share the DNA combination in that region. IBD can persist over many generations and reveal the familial relationship (a common ancestor) between very distantly related individuals.
This is a focused review and by no means exhaustive on the topic of NGS in common complex diseases.
PubMed was searched to find studies on rare genetic variants using NGS in common complex diseases with positive findings. There is no clear definition of the term “common disease”. The definition of a “rare disease” is not clear either, but it is defined by the European Commission as “prevalence of less than five per 10,000 in the Community”. We therefore defined diseases not fitting this definition as common. Successful studies with a family-based design using an NGS method, not looking for de novo mutations or already known candidate genes, were included. Studies focusing on cancer were also excluded. We searched the following terms: “Exome AND sequencing AND pedigree AND rare AND variants”, which resulted in 192 articles, of which 29 abstracts were read, 15 were read in full and 14 were included. The terms “family-based AND "exome-sequencing" AND rare AND variants” were also searched and yielded 17 articles (three already included), of which nine abstracts were read and eight articles were read in full, resulting in three included studies.
The terms “Migraine AND "whole genome sequencing"” and “migraine AND "whole exome sequencing"” yielded only three and nine hits, respectively, and one article was read in full and included.
We especially wanted to include studies of bipolar disorder and schizophrenia because the diagnosis, as for migraine, relies on clinical characteristics in the absence of a biomarker. Therefore, we also searched the terms “Schizophrenia AND "whole genome sequencing" AND families”, yielding six articles of which two were included, and “Schizophrenia AND "whole exome sequencing" AND families”, yielding 13 articles (one already included), of which eight abstracts were read and five articles were read in full, resulting in three included studies. The terms “"bipolar disorder" AND "whole exome sequencing" AND families” resulted in one article, which was not included, and “"bipolar disorder" AND "whole genome sequencing" AND families” resulted in two articles, of which one was included.
The last search was performed on 12 June 2016.
Methodology of the family approach
In the following we describe a promising design, used in our ongoing studies, for a family study to find rare high risk variants in complex diseases. For an explanation of the different methods and terms mentioned, see Table 1.
Multigenerational pedigrees with as many affected and unaffected individuals as possible will optimize the chance of finding causal variants. The affected family members have to be in direct bloodline, with a minimum of affected spouses. There must be a trio consisting of an affected child, an affected parent and a non-affected parent, and also an affected relative in direct bloodline as many meioses away as possible, to narrow down the number of shared variants.
To distinguish affected individuals from unaffected ones, the diagnostic process is of crucial importance. A single wrongly diagnosed individual in a family, whether affected or unaffected, will diminish the chance of identifying the causal genetic variant(s) in the segregation analysis. For migraine this is difficult as no diagnostic biomarker exists. The diagnosis relies on a detailed recording of symptoms and the unambiguous diagnostic criteria of the International Headache Society (International Classification of Headache Disorders, third edition (ICHD 3-beta)). A validated semi-structured interview based on the diagnostic criteria allows a reliable diagnosis. The interview should be performed by a specially trained physician or senior medical student. An example is the validated, semi-structured migraine interview used at the Danish Headache Center.
Whole exome sequencing (WES) or preferably whole genome sequencing (WGS) can be performed on DNA from blood samples or cells from other biological materials like mucosal cells from a buccal swab. The analysis will be a combination of an association- and a segregation analysis which will require a combination of several sequencing and genotyping methods. Based on an in-house protocol and literature review we propose the following.
Several hundred variants are usually shared between the affected individuals. The disease-causing variants will often not be present in DNA from the unaffected relative. Including a more distantly related affected individual in the analysis will decrease the number of possible causal variants, because a distantly related person shares less DNA with the two affected individuals in the trio. Quality testing of the sequencing, such as depth of coverage (the average number of times a nucleotide is expected and observed to be sequenced), will not be explained here in detail, but the importance of this step should be noted.
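The sharing logic of this step can be illustrated with simple set operations. The following is a minimal Python sketch, assuming each individual's variant calls are already available as a set of string identifiers; the function name `shared_candidates` and the variant labels are hypothetical and do not come from the in-house protocol. Note that strictly removing every variant seen in the unaffected relative is itself a simplifying assumption, since unaffected carriers can occur.

```python
from typing import Iterable, Set


def shared_candidates(affected: Iterable[Set[str]],
                      unaffected: Iterable[Set[str]]) -> Set[str]:
    """Variants carried by every affected individual and by no unaffected one."""
    affected = list(affected)
    if not affected:
        raise ValueError("at least one affected individual is required")
    shared = set.intersection(*affected)          # present in all affected
    seen_in_unaffected = set().union(*unaffected)  # present in any unaffected
    return shared - seen_in_unaffected


if __name__ == "__main__":
    # Hypothetical variant identifiers, for illustration only.
    affected_child = {"v1", "v2", "v3", "v4"}
    affected_parent = {"v1", "v2", "v3", "v5"}
    unaffected_parent = {"v3", "v6"}
    distant_affected = {"v1", "v7"}

    trio_only = shared_candidates([affected_child, affected_parent],
                                  [unaffected_parent])
    with_distant = shared_candidates(
        [affected_child, affected_parent, distant_affected],
        [unaffected_parent])
    print(sorted(trio_only), sorted(with_distant))
```

In this toy example the distantly related affected individual reduces the candidate set further, which mirrors the reasoning given above for including such a relative.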
To narrow down the number of variants, a filtering process is then necessary. This can be done in several ways, all of which depend on existing databases of genetic information. These databases consist of collections of variants like SNPs and structural variants like copy number variants (CNVs) detected by different studies. All SNPs known to date are, for example, gathered in public databases like dbSNP. Other examples are the 1000 Genomes Project data, the Database of Genomic Variants, and more local databases like LuCamp, which contains 700,000 SNPs from 2,000 Danish individuals. One strategy is to filter out and exclude all variants known in all available genetic databases, as the causal variants are expected to be private to a family. Some studies choose to include rare variants present in databases with a minor allele frequency (MAF) <0.5% or <0.1%. A third strategy is to filter out variants with e.g. MAF >0.1%, thereby excluding variants common in the population from the study data [37, 38].
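The filtering strategies differ only in the frequency threshold applied. The following minimal Python sketch illustrates this, assuming that each variant has already been annotated with its MAF from a reference database, with `None` marking a variant absent from all databases. The function names, the default threshold and the variant labels are hypothetical illustrations, not part of any published pipeline.

```python
from typing import Dict, List, Optional


def passes_maf_filter(maf: Optional[float], max_maf: float) -> bool:
    """Keep variants absent from databases (None) or rarer than max_maf."""
    return maf is None or maf < max_maf


def filter_rare(variants: Dict[str, Optional[float]],
                max_maf: float = 0.001) -> List[str]:
    """Return the identifiers of variants that pass the MAF filter.

    max_maf = 0.001 corresponds to MAF < 0.1%; max_maf = 0.0 keeps only
    variants not found in any database (family-private variants).
    """
    return sorted(v for v, maf in variants.items()
                  if passes_maf_filter(maf, max_maf))


if __name__ == "__main__":
    # Hypothetical annotated variants: identifier -> MAF (None = not in databases).
    annotated = {"a": None, "b": 0.0005, "c": 0.004, "d": 0.2}
    print(filter_rare(annotated, 0.0))    # strictest: private variants only
    print(filter_rare(annotated, 0.001))  # MAF < 0.1%
    print(filter_rare(annotated, 0.005))  # MAF < 0.5%
```

A stricter threshold yields fewer candidates to carry into the segregation analysis but risks discarding a causal variant that happens to be listed in a database at low frequency.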
The remaining family specific variants then have to be tested for segregation with disease in the pedigree. This can be done by a classical Sanger sequencing of the relevant genes in all the non-sequenced family members or by SNP genotyping. Ideally the causal variant is present only in affected individuals, but the possibility of unaffected carriers has to be taken into account because of incomplete penetrance, or later debut of the disease. Calculating a so called LOD score (LOD = logarithm of the odds), can help as a measure of probability of linkage between a variant and a disease.
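A segregation check of this kind can be summarized with a few counts per variant. The following minimal Python sketch assumes that carrier status and affection status are known for every genotyped family member; the function name `segregation_summary` and the individual labels are hypothetical. It only tabulates the pattern and is no substitute for a formal LOD score calculation.

```python
from typing import Dict, Set


def segregation_summary(carriers: Set[str],
                        affected: Set[str],
                        unaffected: Set[str]) -> Dict[str, object]:
    """Tabulate how a variant is distributed across a pedigree."""
    return {
        "affected_carriers": len(carriers & affected),
        # Affected without the variant: migraine from another cause.
        "affected_noncarriers": len(affected - carriers),
        # Unaffected with the variant: incomplete penetrance or later onset.
        "unaffected_carriers": len(carriers & unaffected),
        # Ideal case: all affected carry it and no unaffected does.
        "complete_segregation": affected <= carriers
                                and not (carriers & unaffected),
    }


if __name__ == "__main__":
    # Hypothetical pedigree members, for illustration only.
    affected = {"I-1", "II-2", "II-3", "III-1"}
    unaffected = {"I-2", "II-1", "III-2"}
    print(segregation_summary({"I-1", "II-2", "II-3", "III-1"}, affected, unaffected))
    print(segregation_summary({"I-1", "II-2", "II-1"}, affected, unaffected))
```

Reporting the two kinds of exceptions separately keeps variants with a few unaffected carriers in view instead of discarding them outright.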
Oligogenic inheritance, where more than one variant is found to segregate in a pedigree, is very likely in common diseases. Therefore, it is possible that this approach will result in multiple variants segregating in one family, all playing a role in the pathogenesis in that family.
Successful use of the family approach in diseases other than migraine
Family studies using different variants of the approach described above or some of its elements have already provided interesting results in common complex diseases other than migraine.
Thirty-two studies that fulfilled the criteria were included in this review [36, 37, 38, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69]. Six of them studied bipolar disorder, five studied schizophrenia (SCZ), three studied autism spectrum disorder (ASD) and two studied type II diabetes, while four studies focused on late onset Alzheimer’s disease (LOAD). There was one study focusing on each of the following: Parkinson’s disease, age-related macular degeneration (AMD), adiponectin level, atrial fibrillation, intracranial aneurysms, nonsyndromic cleft lip and palate (NSCLP), nonsyndromic hearing impairment, otitis media, preeclampsia, primary open angle glaucoma, inflammatory bowel disease and rheumatoid arthritis. The study groups ranged from one to eighty families of varying size. They all used a combination of genetic methods like WGS, WES, Sanger sequencing, GWAS and linkage analysis. Some also used identity by descent (IBD) analysis. Only three studies did WGS and three studies made use of family trios. One study searched for rare CNVs instead of SNPs. Twenty-three of the studies found one or a few variants (<5) specific to a family and the rest found several variants without finding one causal variant. Study details on disease, sample, techniques and findings are listed in Additional file 1: Table S1.
Four studies performed a follow-up analysis in unrelated cases and controls and presented an OR. Wetzel-Smith et al. detected a missense variant (rs137875858 in UNC5C) in a large family with 8 LOAD cases using WGS, WES, linkage analysis and assay genotyping. The same variant was found in four other pedigrees. Further genotyping of LOAD cases (8,050) and controls (98,194) resulted in an OR = 2.15 (95% CI = [1.21; 3.84], P = 0.0095). By WES and genotyping, Cruchaga et al. detected a rare variant (rs145999145 in PLD3) segregating in 2 pedigrees with multiple LOAD cases. When testing independent cohorts of sporadic cases (4,998) and controls (6,356) they found an OR = 2.10 (95% CI = [1.47; 2.99], P = 2.39×10⁻¹⁰). For familial LOAD cases (1,106) vs. unrelated controls (6,356) an OR of 3.39 (95% CI = [2.14; 5.39], P = 1.18×10⁻⁶) was calculated. Kohli et al. made use of WES, linkage analysis, genotyping and Sanger sequencing to study one pedigree comprising 15 individuals. They found a rare variant (rs377155188 in TTC3) which segregated perfectly with LOAD in the pedigree. They calculated an OR for LOAD of 3.35 for sporadic cases (6,669) vs. controls (5,585). The result did not reach statistical significance (CI not specified). Goes et al. exome sequenced 36 individuals from 8 families and calculated ORs for variants in three genes with association to bipolar disorder in a case–control follow-up study (3,541 cases, 4,774 controls). This resulted in ORs of 2.73 (P = 0.016), 6.7 (P = 0.0039) and 2.78 (P = 0.045) for variants in MLK4, APPL2 and HSP90AA1, respectively.
Some studies found complete segregation between a variant and the studied disease, and a few of them are highlighted in the following. Cruceanu et al. exome-sequenced DNA from Caucasian individuals affected with a highly heritable subtype of bipolar disorder in multigenerational families with three to seven affected individuals. The focus was on variants with MAF <1% in the general population. A missense variant was detected on chromosome 11 (position 116652892) that leads to substitution of an amino acid in the encoded protein. The variant segregated with affected family members in one family (five individuals) and was not present in unaffected family members selected as controls (six individuals). Whether this variant contributes to bipolar disorder is not known because of a lack of knowledge about the gene involved.
Nyegaard et al. studied a five-generation family containing 17 individuals with nonsyndromic hearing impairment (two of them deceased). Seven already known hearing loss genes were not involved. Eleven individuals were selected for SNP genotyping. The 11,034 detected SNPs were included in a parametric linkage analysis, which resulted in a significant linkage peak on chromosome 6. To narrow down the size of the detected locus, 26 family members were genotyped for seven microsatellite markers. Then DNA from an affected individual was selected to undergo next generation sequencing (NGS) of the locus using a custom-designed sequence array. This identified 28,300 variants. After excluding common variants, one variant located in a coding region and predicted to have a functional effect was identified. The mutation c.574C>T in CD164 was found in all affected individuals, including a young girl with signs of incipient hearing loss. The variant was absent in 12 unaffected family members, in one with unknown phenotype and in 1200 unrelated controls.
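For readers unfamiliar with how such figures arise, the following minimal Python sketch shows how an OR and its 95% confidence interval can be derived from a 2×2 table of carrier counts, using the standard log-odds (Woolf) approximation. The carrier counts in the example are hypothetical, since the reviewed papers' underlying counts are not reproduced here, and the individual studies may have used other statistical models (for example with covariate adjustment). The function name `odds_ratio_ci` is likewise an illustration.

```python
import math
from typing import Tuple


def odds_ratio_ci(case_carriers: int, case_noncarriers: int,
                  control_carriers: int, control_noncarriers: int,
                  z: float = 1.96) -> Tuple[float, float, float]:
    """Odds ratio with a Wald confidence interval on the log-odds scale.

    z = 1.96 gives an approximate 95% interval. All four cells must be
    non-zero for this approximation to be defined.
    """
    cells = (case_carriers, case_noncarriers,
             control_carriers, control_noncarriers)
    if min(cells) <= 0:
        raise ValueError("all four cell counts must be positive")
    odds_ratio = (case_carriers * control_noncarriers) / (
        case_noncarriers * control_carriers)
    se_log_or = math.sqrt(sum(1.0 / c for c in cells))
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


if __name__ == "__main__":
    # Hypothetical counts: 20 of 1,000 cases and 10 of 1,000 controls carry the variant.
    or_, lo, hi = odds_ratio_ci(20, 980, 10, 990)
    print(f"OR = {or_:.2f} (95% CI = [{lo:.2f}; {hi:.2f}])")
```

Because rare variants have few carriers, the interval is wide unless the follow-up cohorts are very large, which is consistent with the cohort sizes of several thousand cases and controls reported above.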
Suggested application of the family approach to migraine
The studies reviewed here largely followed the steps we have described, but with many variations. In our description of a family-based study design, we suggest using family trios for WGS. Only three studies [46, 47, 48], one of them still ongoing, did this. Also, only three studies [42, 46, 49] made use of WGS, which probably reflects that the cost of WGS has become manageable only very recently. It is highly likely that the non-coding regulatory areas play an important role in common migraine. Georgi et al. chose to study a pedigree in an isolated population, which has also been suggested as a possible approach to finding rare variants [16, 46, 71, 72]. Twenty-three studies [36, 37, 38, 41, 42, 43, 44, 45, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62] reported finding fewer than five family-specific variants (not all significant). Some studies could not find one specific causal variant, probably because disease susceptibility is caused by more than one rare variant specific to a family (oligogenic inheritance). Detection of variants only makes sense if it is followed by further studies to clarify the causality of the variant. An et al. found several variants associated with autism spectrum disorder. They found enrichment of rare causal variants in key neurobiological processes, and overrepresentation of the rare causal variants in functions involving neuronal development, signal transduction and synapse development.
In the future, combinations of gene variants might be analyzed by an “omics” approach, in which bioinformatics integrates genomics, epigenomics, transcriptomics, proteomics and metabolomics [73, 74]. We excluded Ratnapriya et al. from the reviewing part of this study because they studied a rare subtype of macular degeneration. They found a rare variant in FBN2 in a family with early onset macular degeneration, but also a common variant in the same gene with a modest association to AMD cases. It is an excellent example of how both rare and common variants in a single gene can contribute to complex forms of a disease phenotype and to the understanding of its pathophysiology.
Few studies focused on CNVs, and only Van Den Bossche et al., studying schizophrenia, succeeded in finding a CNV associated with disease. Rare inherited CNVs were more frequent in familial schizophrenia than in an unaffected control cohort. This supports CNVs as an area of interest when searching for rare disease variants in migraine. In the future, NGS methods will be able to capture CNVs.
As mentioned, ORs for SNPs associated with migraine found through GWAS ranged between 0.85 and 1.24. In the studies using a family approach reviewed here, the ORs in follow-up case–control studies ranged between 2.10 and 6.7, the latter for the association between a variant in APPL2 and bipolar disorder. Like migraine, bipolar disorder and schizophrenia are common and complex disorders with a clear heritable component and a diagnosis based on history in the absence of a biomarker. Success in these disorders raises hope of finding specific rare variants with high relative risk in migraine families. Jiang et al. reported the preliminary finding of six novel rare non-synonymous mutations in a Chinese family with clustering of migraine without aura using WES. They included four cases (a father and three children) and four unrelated controls. However, the study had several limitations. As far as we know, Jiang et al. is the only published study using a family approach and NGS in migraine. F. Michael Cutrer, Mayo Clinic, Rochester, has carried out WES in two large migraine families, according to a published grant description. Five candidate genes were found to segregate with MA in one family. In the other family, which included individuals with varying phenotypes, a single variant was detected. Whether the variants are rare was not stated. These results have unfortunately not been published. A similar study is ongoing at the Danish Headache Center. This project aims to find rare genetic variants conferring a high risk of migraine using a family approach exactly as described previously. Extended Danish families with MA or MO are included. The study is still collecting data and biological material for sequencing and genotyping. A family approach in migraine will encounter obstacles. Even correct phenotyping cannot rule out that unaffected controls develop migraine later.
The probability that some affected family members do not carry a family-specific variant is high because of the high prevalence of migraine in the general population. Unaffected carriers are also a possibility because of low penetrance. This will complicate the analysis. Many families contain a mix of individuals with MO, MA or so-called MAMO (co-occurring MA and MO). It is not known whether the two phenotypes are part of a spectrum of the same disease or are different diseases. Taking these problems into account, we still believe that a family approach is the best way to find variants with a high relative risk. Such variants can be the key to understanding the pathophysiological mechanisms of migraine, much more so than the common variants discovered by GWAS. It is obvious that migraine is highly genetically heterogeneous. Pathophysiology as well as response to prophylactic drugs vary considerably [80, 81]. On the other hand, 80% of patients respond to injection of sumatriptan, suggesting the existence of a final common pathway [82, 83]. If the etiology of just a few sub-phenotypes can be identified with certainty, it seems possible to identify one or more migraine pathways that may be relevant for many patients, even if the genetic cause that leads to the discovery is rare. New targets for better and more specific treatments may then be discovered.
Review and conclusions
It has proven possible to find rare high risk variants for common complex diseases through a family based approach. One study using a family approach and NGS to find rare variants in migraine has already been published but it has strong limitations. More studies are under way.
Future family approach studies could be advanced by choosing isolated populations or individuals with severe phenotypes as study groups and by including analysis of mitochondrial DNA and “omics”.
We would like to thank Torben Hansen, Professor at Novo Nordisk Foundation Center for Basic Metabolic Research, University of Copenhagen, for comments that improved the manuscript.
JO and AC were involved in the conception and design of this work. RH collected the literature and, in cooperation with AC, selected the relevant studies to be reviewed. The manuscript was mainly drafted by RH, with guidance and input from JO and AC, who also revised it critically. All authors read and approved the final manuscript.
The authors declare that they have no competing interests.
- 4.Ulrich V, Gervil M, Kyvik KO et al (1999) Evidence of a genetic factor in migraine with aura: a population-based Danish twin study. Ann Neurol 45:242–246. doi: 10.1002/1531-8249(199902)45:2<242::AID-ANA15>3.0.CO;2-1
- 30.European Commission (2005) Useful Information on rare diseases from an EU perspective, pp 2005–2006
- 33.dbSNP. https://www.ncbi.nlm.nih.gov/SNP/. Accessed 14 Mar 2016
- 34.Database of Genomic Variants. http://dgv.tcag.ca/dgv/app/about?ref. Accessed 14 Mar 2016
- 35.LuCamp. http://www.lucamp.org/#/172977/. Accessed 15 Mar 2016
- 39.Polyphen-2. http://genetics.bwh.harvard.edu/pph2/. Accessed 14 Mar 2016
- 40.SIFT. http://sift.jcvi.org/. Accessed 14 Mar 2016
- 47.N. Matoba, M. Kataoka, K. Fujii, Y. Suzuki, S. Sugano TK Trio-based pathway analysis of bipolar disorder. http://www.ashg.org/2013meeting/abstracts/fulltext/f130121585.htm.
- 49.Thygesen JH, Zambach SK, Ingason A, et al. (2015) Linkage and whole genome sequencing identify a locus on 6q25–26 for formal thought disorder and implicate MEF2A regulation. Schizophr Res 6–11. doi: 10.1016/j.schres.2015.08.037.
- 55.Deng H-X, Shi Y, Yang Y, et al. (2016) Identification of TMEM230 mutations in familial Parkinson’s disease. Nat Genet 48:733–739. doi: 10.1038/ng.3589.
- 64.Weeke P, Muhammad R, Delaney JT, et al. (2014) Whole-exome sequencing in familial atrial fibrillation. Eur Heart J 1–7. doi: 10.1093/eurheartj/ehu156.
- 69.Homann OR, Misura K, Lamas E, et al. (2016) Whole-genome sequencing in multiplex families with psychoses reveals mutations in the SHANK2 and SMARCA1 genes segregating with illness. Mol Psychiatry 1–6. doi: 10.1038/mp.2016.24.
- 79.Cutrer F Whole Exome Sequencing as a Strategy for Gene Discovery in a Large Well Characterized Family with Migraine. http://www.migraineresearchfoundation.org/completed-research.html. Accessed 14 Mar 2016
Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.